Mailing List Archive

Virtio in Xen on Arm (based on IOREQ concept)
Hello all.

We would like to resume the Virtio in Xen on Arm activities. You can find some background at [1] and the Virtio specification at [2].

*A few words about importance:*
There is an increasing interest, I would even say a requirement, to have a flexible, generic and standardized cross-hypervisor solution for I/O virtualization in the automotive and embedded areas. The target is quite clear here. By providing a standardized interface and device models for device para-virtualization in hypervisor environments, Virtio allows us to move guest domains between different hypervisors without further modification on the guest side. What is more, Virtio support is available in Linux, Android and many other operating systems, and there are a lot of existing Virtio drivers (frontends) which could simply be reused without reinventing the wheel. Many organisations are pushing Virtio as a common interface. To summarize, Virtio support would be a great feature in Xen on Arm, in addition to the traditional Xen PV drivers, letting users choose which one to use.

*A few words about the solution:*
As mentioned at [1], in order to implement virtio-mmio, Xen on Arm requires an implementation to forward guest MMIO accesses to a device model. As it turned out, Xen on x86 already contains most of the pieces needed to use that transport (via the existing IOREQ concept). Julien has already done a large amount of work in his PoC (xen/arm: Add support for Guest IO forwarding to a device emulator). Using that code as a base we managed to create a completely functional PoC with a DomU running on a virtio block device instead of a traditional Xen PV driver, without modifications to the DomU Linux. Our work is mostly about rebasing Julien's code on the current codebase (Xen 4.14-rc4), various tweaks to be able to run the emulator (virtio-disk backend) in a domain other than Dom0 (in our system we have a thin Dom0 and keep all backends in a driver domain), misc fixes for our use-cases, and tool support for the configuration. Unfortunately, Julien doesn't have much time to allocate to this work anymore, so we would like to step in and continue.

*A few words about the Xen code:*
You can find the whole Xen series at [5]. The patches are in RFC state because some parts of the series should be reconsidered and implemented properly. Before submitting the final code for review, the first IOREQ patch (which is quite big) will be split into x86, Arm and common parts. Please note that the x86 part hasn't even been build-tested so far and could be broken by that series. Also, the series probably wants splitting into adding IOREQ on Arm (which should be focused on first) and tools support for the virtio-disk configuration (virtio-disk is going to be the first Virtio driver) before going to the mailing list.

What I would like to add here is that the IOREQ feature on Arm could be used not only for implementing Virtio, but also for other use-cases which require some emulator entity outside Xen, such as a custom (non-ECAM compatible) PCI emulator, for example.
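
To give a feeling of what such an external emulator has to do, the sketch below shows how a device model claims a guest MMIO range through an IOREQ server with the existing libxendevicemodel API. This is a simplified illustration only: everything except the libxendevicemodel calls is a made-up placeholder, and the exact flow in the series and in our backend may differ.

    /* Sketch: create an IOREQ server and ask Xen to forward guest accesses to
     * [base, base + size - 1] (e.g. a virtio-mmio window) to this process. */
    #include <stdint.h>
    #include <xendevicemodel.h>

    static int claim_mmio_range(uint32_t domid, uint64_t base, uint64_t size)
    {
        xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);
        ioservid_t id;
        int rc;

        if (!dmod)
            return -1;

        /* 0: no buffered ioreq page needed for this simple case. */
        rc = xendevicemodel_create_ioreq_server(dmod, domid, 0, &id);
        if (rc)
            goto out;

        rc = xendevicemodel_map_io_range_to_ioreq_server(dmod, domid, id,
                                                         1 /* MMIO */,
                                                         base, base + size - 1);
        if (rc)
            goto out;

        /* Start receiving ioreqs; the emulation loop then waits on the
         * event channel bound to the shared ioreq page. */
        rc = xendevicemodel_set_ioreq_server_state(dmod, domid, id, 1);
    out:
        xendevicemodel_close(dmod);
        return rc;
    }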

*A few words about the backend(s):*
One of the main problems with Virtio in Xen on Arm is the absence of “ready-to-use” and “out-of-Qemu” Virtio backends (at least I am not aware of any). We managed to create a virtio-disk backend based on demu [3] and kvmtool [4] using that series. It is worth mentioning that although Xenbus/Xenstore is not supposed to be used with native Virtio, that interface was chosen just to pass configuration from the toolstack to the backend and to notify it about guest domain creation/destruction (I think this is acceptable, since backends are usually tied to the hypervisor and can use services provided by it). The most important thing here is that the whole Virtio subsystem in the guest was left unmodified. The backend needs some cleanup and, probably, refactoring. We plan to publish it in a while.
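
To give an idea of how lightweight that Xenstore usage is, the create/destroy notification is basically an ordinary Xenstore watch. A minimal sketch follows; the watched path and token are illustrative, not necessarily what the backend actually uses.

    /* Sketch: notice when the frontend domain appears/disappears by watching
     * its Xenstore directory. Path and token are illustrative. */
    #include <xenstore.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void wait_for_guest_events(struct xs_handle *xs, int guest_domid)
    {
        char path[64];

        snprintf(path, sizeof(path), "/local/domain/%d", guest_domid);

        if (!xs_watch(xs, path, "guest-state"))
            return;

        for (;;) {
            unsigned int num;
            char **ev = xs_read_watch(xs, &num); /* blocks until the path changes */

            if (!ev)
                break;

            /*
             * If ev[XS_WATCH_PATH] is no longer readable the guest has been
             * destroyed/rebooted: release virtio-disk resources here;
             * otherwise (re)connect to the new instance.
             */
            free(ev);
        }

        xs_unwatch(xs, path, "guest-state");
    }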

Our next plan is to start preparing the series for review. Any feedback would be highly appreciated.

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg01746.html
[2] https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html
[3] https://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git;a=summary
[4] https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/
[5] https://github.com/xen-troops/xen/commits/ioreq_4.14_ml

--
Regards,

Oleksandr Tyshchenko
Re: Virtio in Xen on Arm (based on IOREQ concept)
Hello,

I'm very happy to see this proposal, as I think having proper (1st
class) VirtIO support on Xen is crucial to our survival. Almost all
OSes have VirtIO frontends, while the same can't be said about Xen PV
frontends. It would also allow us to piggyback on any new VirtIO
devices without having to re-invent the wheel by creating a clone Xen
PV device.

On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
> Hello all.
>
> We would like to resume Virtio in Xen on Arm activities. You can find some
> background at [1] and Virtio specification at [2].
>
> *A few words about importance:*
> There is an increasing interest, I would even say, the requirement to have
> flexible, generic and standardized cross-hypervisor solution for I/O
> virtualization
> in the automotive and embedded areas. The target is quite clear here.
> Providing a standardized interface and device models for device
> para-virtualization
> in hypervisor environments, Virtio interface allows us to move Guest domains
> among different hypervisor systems without further modification at the
> Guest side.
> What is more that Virtio support is available in Linux, Android and many
> other
> operating systems and there are a lot of existing Virtio drivers (frontends)
> which could be just reused without reinventing the wheel. Many
> organisations push
> Virtio direction as a common interface. To summarize, Virtio support would
> be
> the great feature in Xen on Arm in addition to traditional Xen PV drivers
> for
> the user to be able to choose which one to use.

I think most of the above also applies to x86, and fully agree.

>
> *A few word about solution:*
> As it was mentioned at [1], in order to implement virtio-mmio Xen on Arm

Any plans for virtio-pci? Arm seems to be moving to the PCI bus, and
it would be very interesting from an x86 PoV, as I don't think
virtio-mmio is something that you can easily use on x86 (or even use
at all).

> requires
> some implementation to forward guest MMIO access to a device model. And as
> it
> turned out the Xen on x86 contains most of the pieces to be able to use that
> transport (via existing IOREQ concept). Julien has already done a big amount
> of work in his PoC (xen/arm: Add support for Guest IO forwarding to a
> device emulator).
> Using that code as a base we managed to create a completely functional PoC
> with DomU
> running on virtio block device instead of a traditional Xen PV driver
> without
> modifications to DomU Linux. Our work is mostly about rebasing Julien's
> code on the actual
> codebase (Xen 4.14-rc4), various tweeks to be able to run emulator
> (virtio-disk backend)
> in other than Dom0 domain (in our system we have thin Dom0 and keep all
> backends
> in driver domain),

How do you handle this use-case? Are you using grants in the VirtIO
ring, or rather allowing the driver domain to map all the guest memory
and then placing gfn on the ring like it's commonly done with VirtIO?

Do you have any plans to try to upstream a modification to the VirtIO
spec so that grants (ie: abstract references to memory addresses) can
be used on the VirtIO ring?

> misc fixes for our use-cases and tool support for the
> configuration.
> Unfortunately, Julien doesn’t have much time to allocate on the work
> anymore,
> so we would like to step in and continue.
>
> *A few word about the Xen code:*
> You can find the whole Xen series at [5]. The patches are in RFC state
> because
> some actions in the series should be reconsidered and implemented properly.
> Before submitting the final code for the review the first IOREQ patch
> (which is quite
> big) will be split into x86, Arm and common parts. Please note, x86 part
> wasn’t
> even build-tested so far and could be broken with that series. Also the
> series probably
> wants splitting into adding IOREQ on Arm (should be focused first) and
> tools support
> for the virtio-disk (which is going to be the first Virtio driver)
> configuration before going
> into the mailing list.

Sending first a patch series to enable IOREQs on Arm seems perfectly
fine, and it doesn't have to come with the VirtIO backend. In fact I
would recommend that you send that ASAP, so that you don't spend time
working on the backend that would likely need to be modified
according to the review received on the IOREQ series.

>
> What I would like to add here, the IOREQ feature on Arm could be used not
> only
> for implementing Virtio, but for other use-cases which require some
> emulator entity
> outside Xen such as custom PCI emulator (non-ECAM compatible) for example.
>
> *A few word about the backend(s):*
> One of the main problems with Virtio in Xen on Arm is the absence of
> “ready-to-use” and “out-of-Qemu” Virtio backends (I least am not aware of).
> We managed to create virtio-disk backend based on demu [3] and kvmtool [4]
> using
> that series. It is worth mentioning that although Xenbus/Xenstore is not
> supposed
> to be used with native Virtio, that interface was chosen to just pass
> configuration from toolstack
> to the backend and notify it about creating/destroying Guest domain (I
> think it is

I would prefer if a single instance was launched to handle each
backend, and that the configuration was passed on the command line.
Killing the user-space backend from the toolstack is fine I think,
there's no need to notify the backend using xenstore or any other
out-of-band methods.

xenstore has proven to be a bottleneck in terms of performance, and it
would be better if we could avoid using it when possible, especially here
where you have to do this from scratch anyway.

Thanks, Roger.
Re: Virtio in Xen on Arm (based on IOREQ concept)
On 17.07.20 18:00, Roger Pau Monné wrote:
> Hello,

Hello Roger


> I'm very happy to see this proposal, as I think having proper (1st
> class) VirtIO support on Xen is crucial to our survival. Almost all
> OSes have VirtIO frontends, while the same can't be said about Xen PV
> frontends. It would also allow us to piggyback on any new VirtIO
> devices without having to re-invent the wheel by creating a clone Xen
> PV device.

Thank you.


>
> On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
>> Hello all.
>>
>> We would like to resume Virtio in Xen on Arm activities. You can find some
>> background at [1] and Virtio specification at [2].
>>
>> *A few words about importance:*
>> There is an increasing interest, I would even say, the requirement to have
>> flexible, generic and standardized cross-hypervisor solution for I/O
>> virtualization
>> in the automotive and embedded areas. The target is quite clear here.
>> Providing a standardized interface and device models for device
>> para-virtualization
>> in hypervisor environments, Virtio interface allows us to move Guest domains
>> among different hypervisor systems without further modification at the
>> Guest side.
>> What is more that Virtio support is available in Linux, Android and many
>> other
>> operating systems and there are a lot of existing Virtio drivers (frontends)
>> which could be just reused without reinventing the wheel. Many
>> organisations push
>> Virtio direction as a common interface. To summarize, Virtio support would
>> be
>> the great feature in Xen on Arm in addition to traditional Xen PV drivers
>> for
>> the user to be able to choose which one to use.
> I think most of the above also applies to x86, and fully agree.
>
>> *A few word about solution:*
>> As it was mentioned at [1], in order to implement virtio-mmio Xen on Arm
> Any plans for virtio-pci? Arm seems to be moving to the PCI bus, and
> it would be very interesting from a x86 PoV, as I don't think
> virtio-mmio is something that you can easily use on x86 (or even use
> at all).

To be honest, I didn't consider virtio-pci so far. Julien's PoC (which we are based on) provides support for the virtio-mmio transport, which is enough to start working with VirtIO and is not as complex as virtio-pci. But that doesn't mean there is no way for virtio-pci in Xen.

I think this could be added in a next step, but the nearest target is the virtio-mmio approach (of course, if the community agrees on that).


>> requires
>> some implementation to forward guest MMIO access to a device model. And as
>> it
>> turned out the Xen on x86 contains most of the pieces to be able to use that
>> transport (via existing IOREQ concept). Julien has already done a big amount
>> of work in his PoC (xen/arm: Add support for Guest IO forwarding to a
>> device emulator).
>> Using that code as a base we managed to create a completely functional PoC
>> with DomU
>> running on virtio block device instead of a traditional Xen PV driver
>> without
>> modifications to DomU Linux. Our work is mostly about rebasing Julien's
>> code on the actual
>> codebase (Xen 4.14-rc4), various tweeks to be able to run emulator
>> (virtio-disk backend)
>> in other than Dom0 domain (in our system we have thin Dom0 and keep all
>> backends
>> in driver domain),
> How do you handle this use-case? Are you using grants in the VirtIO
> ring, or rather allowing the driver domain to map all the guest memory
> and then placing gfn on the ring like it's commonly done with VirtIO?

The second option. Xen grants are not used at all, nor are the event channel and Xenbus. That allows us to keep the guest *unmodified*, which is one of the main goals. Yes, this may sound (or even is) non-secure, but the backend, which runs in the driver domain, is allowed to map all guest memory.

In the current backend implementation a part of guest memory is mapped just to process a guest request and then unmapped again; there are no mappings in advance. The xenforeignmemory_map call is used for that purpose. As an experiment I tried to map all guest memory in advance and just calculate pointers at runtime; of course that logic performed better.
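
The per-request pattern is roughly the sketch below (simplified; the helper name and parameters are illustrative, error handling is trimmed, and fmem comes from an earlier xenforeignmemory_open() call):

    /* Sketch: map a batch of guest frames only for the duration of one request. */
    #include <sys/mman.h>
    #include <string.h>
    #include <xenforeignmemory.h>

    static int copy_from_guest(xenforeignmemory_handle *fmem, uint32_t domid,
                               const xen_pfn_t *gfns, size_t num,
                               size_t offset, void *dst, size_t len)
    {
        void *va = xenforeignmemory_map(fmem, domid, PROT_READ, num, gfns, NULL);

        if (!va)
            return -1;

        memcpy(dst, (char *)va + offset, len);

        /* No mappings are kept around between requests. */
        return xenforeignmemory_unmap(fmem, va, num);
    }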

I was also thinking about static guest memory regions and forcing the guest to allocate descriptors from them (in order to map only a predefined region instead of all guest memory). But that implies modifying the guest...


>
> Do you have any plans to try to upstream a modification to the VirtIO
> spec so that grants (ie: abstract references to memory addresses) can
> be used on the VirtIO ring?

But the VirtIO spec hasn't been modified, nor has the VirtIO infrastructure in the guest. Nothing to upstream)


>
>> misc fixes for our use-cases and tool support for the
>> configuration.
>> Unfortunately, Julien doesn’t have much time to allocate on the work
>> anymore,
>> so we would like to step in and continue.
>>
>> *A few word about the Xen code:*
>> You can find the whole Xen series at [5]. The patches are in RFC state
>> because
>> some actions in the series should be reconsidered and implemented properly.
>> Before submitting the final code for the review the first IOREQ patch
>> (which is quite
>> big) will be split into x86, Arm and common parts. Please note, x86 part
>> wasn’t
>> even build-tested so far and could be broken with that series. Also the
>> series probably
>> wants splitting into adding IOREQ on Arm (should be focused first) and
>> tools support
>> for the virtio-disk (which is going to be the first Virtio driver)
>> configuration before going
>> into the mailing list.
> Sending first a patch series to enable IOREQs on Arm seems perfectly
> fine, and it doesn't have to come with the VirtIO backend. In fact I
> would recommend that you send that ASAP, so that you don't spend time
> working on the backend that would likely need to be modified
> according to the review received on the IOREQ series.

I completely agree with you; I will send it after splitting the IOREQ patch and performing some cleanup.

However, it is going to take some time to do it properly, taking into account that I personally won't be able to test on x86.


>
>> What I would like to add here, the IOREQ feature on Arm could be used not
>> only
>> for implementing Virtio, but for other use-cases which require some
>> emulator entity
>> outside Xen such as custom PCI emulator (non-ECAM compatible) for example.
>>
>> *A few word about the backend(s):*
>> One of the main problems with Virtio in Xen on Arm is the absence of
>> “ready-to-use” and “out-of-Qemu” Virtio backends (I least am not aware of).
>> We managed to create virtio-disk backend based on demu [3] and kvmtool [4]
>> using
>> that series. It is worth mentioning that although Xenbus/Xenstore is not
>> supposed
>> to be used with native Virtio, that interface was chosen to just pass
>> configuration from toolstack
>> to the backend and notify it about creating/destroying Guest domain (I
>> think it is
> I would prefer if a single instance was launched to handle each
> backend, and that the configuration was passed on the command line.
> Killing the user-space backend from the toolstack is fine I think,
> there's no need to notify the backend using xenstore or any other
> out-of-band methods.
>
> xenstore has proven to be a bottleneck in terms of performance, and it
> would be better if we can avoid using it when possible, specially here
> that you have to do this from scratch anyway.

Let me elaborate a bit more on this.

In the current backend implementation, Xenstore is *not* used for communication between the backend (VirtIO device) and the frontend (VirtIO driver); the frontend knows nothing about it.

Xenstore was chosen as an interface in order to be able to pass configuration from the toolstack in Dom0 to a backend which may reside in a domain other than Dom0 (DomD in our case). Also, by looking at the Xenstore entries, the backend always knows when the intended guest has been created or destroyed.

I may be mistaken, but I don't think we can avoid using Xenstore (or another interface provided by the toolstack), for several reasons.

Besides the virtio-disk configuration (a disk to be assigned to the guest, R/O mode, etc), for each virtio-mmio device instance a pair (MMIO range + IRQ) is allocated by the toolstack at guest construction time and inserted into the virtio-mmio device tree node in the guest device tree. For the backend to operate properly, these variable parameters are also passed to the backend via Xenstore.
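
Just to illustrate how little plumbing is involved on the backend side, fetching those parameters boils down to a few libxenstore reads, roughly like the sketch below (the Xenstore path layout is purely illustrative, not the one we actually use):

    /* Sketch: read per-instance virtio-mmio parameters written by the toolstack.
     * The path layout shown here is made up for illustration. */
    #include <xenstore.h>
    #include <stdio.h>
    #include <stdlib.h>

    static char *read_cfg(struct xs_handle *xs, int guest_domid, const char *key)
    {
        char path[128];
        unsigned int len;

        snprintf(path, sizeof(path),
                 "/local/domain/%d/device/virtio_disk/0/%s", guest_domid, key);

        /* Returns a malloc()'ed, NUL-terminated value; caller must free() it. */
        return xs_read(xs, XBT_NULL, path, &len);
    }

    /*
     * Usage (error handling omitted):
     *   struct xs_handle *xs = xs_open(0);
     *   char *base = read_cfg(xs, guest_domid, "base");  // MMIO base
     *   char *irq  = read_cfg(xs, guest_domid, "irq");   // SPI number
     *   char *disk = read_cfg(xs, guest_domid, "disk");  // image/block device
     */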

The other reasons are:

1. Automation. With the current backend implementation we don't need to pause the guest right after creating it, then go to the driver domain to spawn the backend, and after that go back to Dom0 and unpause the guest.

2. The ability to detect when a guest with the involved frontend has gone away and to properly release resources (guest destroy/reboot).

3. The ability to (re)connect to a newly created guest with the involved frontend (guest create/reboot).

4. What is more, having Xenstore support, the backend is able to detect the dom_id it runs in as well as the guest dom_id, so there is no need to pass them via the command line.

I will be happy to explain in detail after publishing the backend code).


--
Regards,

Oleksandr Tyshchenko
Re: Virtio in Xen on Arm (based on IOREQ concept)
On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
> On 17.07.20 18:00, Roger Pau Monné wrote:
> > On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
> > > requires
> > > some implementation to forward guest MMIO access to a device model. And as
> > > it
> > > turned out the Xen on x86 contains most of the pieces to be able to use that
> > > transport (via existing IOREQ concept). Julien has already done a big amount
> > > of work in his PoC (xen/arm: Add support for Guest IO forwarding to a
> > > device emulator).
> > > Using that code as a base we managed to create a completely functional PoC
> > > with DomU
> > > running on virtio block device instead of a traditional Xen PV driver
> > > without
> > > modifications to DomU Linux. Our work is mostly about rebasing Julien's
> > > code on the actual
> > > codebase (Xen 4.14-rc4), various tweeks to be able to run emulator
> > > (virtio-disk backend)
> > > in other than Dom0 domain (in our system we have thin Dom0 and keep all
> > > backends
> > > in driver domain),
> > How do you handle this use-case? Are you using grants in the VirtIO
> > ring, or rather allowing the driver domain to map all the guest memory
> > and then placing gfn on the ring like it's commonly done with VirtIO?
>
> Second option. Xen grants are not used at all as well as event channel and
> Xenbus. That allows us to have guest
>
> *unmodified* which one of the main goals. Yes, this may sound (or even
> sounds) non-secure, but backend which runs in driver domain is allowed to
> map all guest memory.

Supporting unmodified guests is certainly a fine goal, but I don't
think it's incompatible with also trying to expand the spec in
parallel in order to support grants in a negotiated way (see below).

That way you could (long term) regain some of the lost security.

> > Do you have any plans to try to upstream a modification to the VirtIO
> > spec so that grants (ie: abstract references to memory addresses) can
> > be used on the VirtIO ring?
>
> But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
> guest. Nothing to upsteam)

OK, so there's no intention to add grants (or a similar interface) to
the spec?

I understand that you want to support unmodified VirtIO frontends, but
I also think that long term frontends could negotiate with backends on
the usage of grants in the shared ring, like any other VirtIO feature
negotiated between the frontend and the backend.

This of course needs to be on the spec first before we can start
implementing it, and hence my question whether a modification to the
spec in order to add grants has been considered.

It's fine to say that you don't have any plans in this regard.

>
> >
> > > misc fixes for our use-cases and tool support for the
> > > configuration.
> > > Unfortunately, Julien doesn’t have much time to allocate on the work
> > > anymore,
> > > so we would like to step in and continue.
> > >
> > > *A few word about the Xen code:*
> > > You can find the whole Xen series at [5]. The patches are in RFC state
> > > because
> > > some actions in the series should be reconsidered and implemented properly.
> > > Before submitting the final code for the review the first IOREQ patch
> > > (which is quite
> > > big) will be split into x86, Arm and common parts. Please note, x86 part
> > > wasn’t
> > > even build-tested so far and could be broken with that series. Also the
> > > series probably
> > > wants splitting into adding IOREQ on Arm (should be focused first) and
> > > tools support
> > > for the virtio-disk (which is going to be the first Virtio driver)
> > > configuration before going
> > > into the mailing list.
> > Sending first a patch series to enable IOREQs on Arm seems perfectly
> > fine, and it doesn't have to come with the VirtIO backend. In fact I
> > would recommend that you send that ASAP, so that you don't spend time
> > working on the backend that would likely need to be modified
> > according to the review received on the IOREQ series.
>
> Completely agree with you, I will send it after splitting IOREQ patch and
> performing some cleanup.
>
> However, it is going to take some time to make it properly taking into the
> account
>
> that personally I won't be able to test on x86.

We have gitlab and the osstest CI loop (plus all the reviewers) so we
should be able to spot any regressions. Build testing on x86 would be
nice so that you don't need to resend to fix build issues.

>
> >
> > > What I would like to add here, the IOREQ feature on Arm could be used not
> > > only
> > > for implementing Virtio, but for other use-cases which require some
> > > emulator entity
> > > outside Xen such as custom PCI emulator (non-ECAM compatible) for example.
> > >
> > > *A few word about the backend(s):*
> > > One of the main problems with Virtio in Xen on Arm is the absence of
> > > “ready-to-use” and “out-of-Qemu” Virtio backends (I least am not aware of).
> > > We managed to create virtio-disk backend based on demu [3] and kvmtool [4]
> > > using
> > > that series. It is worth mentioning that although Xenbus/Xenstore is not
> > > supposed
> > > to be used with native Virtio, that interface was chosen to just pass
> > > configuration from toolstack
> > > to the backend and notify it about creating/destroying Guest domain (I
> > > think it is
> > I would prefer if a single instance was launched to handle each
> > backend, and that the configuration was passed on the command line.
> > Killing the user-space backend from the toolstack is fine I think,
> > there's no need to notify the backend using xenstore or any other
> > out-of-band methods.
> >
> > xenstore has proven to be a bottleneck in terms of performance, and it
> > would be better if we can avoid using it when possible, specially here
> > that you have to do this from scratch anyway.
>
> Let me elaborate a bit more on this.
>
> In current backend implementation, the Xenstore is *not* used for
> communication between backend (VirtIO device) and frontend (VirtIO driver),
> frontend knows nothing about it.
>
> Xenstore was chosen as an interface in order to be able to pass
> configuration from toolstack in Dom0 to backend which may reside in other
> than Dom0 domain (DomD in our case),

There's 'xl devd' which can be used on the driver domain to spawn
backends, maybe you could add the logic there so that 'xl devd' calls
the backend executable with the required command line parameters, so
that the backend itself doesn't need to interact with xenstore in any
way?

That way in the future we could use something else instead of
xenstore, like Argo for instance in order to pass the backend data
from the control domain to the driver domain.

> also looking into the Xenstore entries backend always knows when the
> intended guest is been created/destroyed.

xl devd should also do the killing of backends anyway when a domain is
destroyed, or else malfunctioning user-space backends could keep
running after the domain they are serving is destroyed.

> I may mistake, but I don't think we can avoid using Xenstore (or other
> interface provided by toolstack) for the several reasons.
>
> Besides a virtio-disk configuration (a disk to be assigned to the guest, R/O
> mode, etc), for each virtio-mmio device instance
>
> a pair (mmio range + IRQ) are allocated by toolstack at the guest
> construction time and inserted into virtio-mmio device tree node
>
> in the guest device tree. And for the backend to properly operate these
> variable parameters are also passed to the backend via Xenstore.

I think you could pass all these parameters as command line arguments
to the backend?

> The other reasons are:
>
> 1. Automation. With current backend implementation we don't need to pause
> guest right after creating it, then go to the driver domain and spawn
> backend and
>
> after that go back to the dom0 and unpause the guest.

xl devd should be capable of handling this for you on the driver
domain.

> 2. Ability to detect when guest with involved frontend has gone away and
> properly release resource (guest destroy/reboot).
>
> 3. Ability to (re)connect to the newly created guest with involved frontend
> (guest create/reboot).
>
> 4. What is more that having Xenstore support the backend is able to detect
> the dom_id it runs into and the guest dom_id, there is no need pass them via
> command line.
>
>
> I will be happy to explain in details after publishing backend code).

As I'm not the one doing the work I certainly won't stop you from
using xenstore on the backend. I would certainly prefer if the backend
gets all the information it needs from the command line so that the
configuration data is completely agnostic to the transport layer used
to convey it.

Thanks, Roger.
Re: Virtio in Xen on Arm (based on IOREQ concept)
On 20/07/2020 10:17, Roger Pau Monné wrote:
> On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
>> On 17.07.20 18:00, Roger Pau Monné wrote:
>>> On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
>>>> requires
>>>> some implementation to forward guest MMIO access to a device model. And as
>>>> it
>>>> turned out the Xen on x86 contains most of the pieces to be able to use that
>>>> transport (via existing IOREQ concept). Julien has already done a big amount
>>>> of work in his PoC (xen/arm: Add support for Guest IO forwarding to a
>>>> device emulator).
>>>> Using that code as a base we managed to create a completely functional PoC
>>>> with DomU
>>>> running on virtio block device instead of a traditional Xen PV driver
>>>> without
>>>> modifications to DomU Linux. Our work is mostly about rebasing Julien's
>>>> code on the actual
>>>> codebase (Xen 4.14-rc4), various tweeks to be able to run emulator
>>>> (virtio-disk backend)
>>>> in other than Dom0 domain (in our system we have thin Dom0 and keep all
>>>> backends
>>>> in driver domain),
>>> How do you handle this use-case? Are you using grants in the VirtIO
>>> ring, or rather allowing the driver domain to map all the guest memory
>>> and then placing gfn on the ring like it's commonly done with VirtIO?
>>
>> Second option. Xen grants are not used at all as well as event channel and
>> Xenbus. That allows us to have guest
>>
>> *unmodified* which one of the main goals. Yes, this may sound (or even
>> sounds) non-secure, but backend which runs in driver domain is allowed to
>> map all guest memory.
>
> Supporting unmodified guests is certainly a fine goal, but I don't
> think it's incompatible with also trying to expand the spec in
> parallel in order to support grants in a negotiated way (see below).
>
> That way you could (long term) regain some of the lost security.

FWIW, Xen is not the only hypervisor/community interested in creating
"less privileged" backends.

>
>>> Do you have any plans to try to upstream a modification to the VirtIO
>>> spec so that grants (ie: abstract references to memory addresses) can
>>> be used on the VirtIO ring?
>>
>> But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
>> guest. Nothing to upsteam)
>
> OK, so there's no intention to add grants (or a similar interface) to
> the spec?
>
> I understand that you want to support unmodified VirtIO frontends, but
> I also think that long term frontends could negotiate with backends on
> the usage of grants in the shared ring, like any other VirtIO feature
> negotiated between the frontend and the backend.
>
> This of course needs to be on the spec first before we can start
> implementing it, and hence my question whether a modification to the
> spec in order to add grants has been considered.
The problem is not really the specification but the adoption in the
ecosystem. A protocol based on grant-tables would mostly only be used by
Xen, therefore:
- It may be difficult to convince a proprietary OS vendor to invest
resources in implementing the protocol
- It would be more difficult to move in/out of the Xen ecosystem.

Both may slow the adoption of Xen in some areas.

If one is interested in security, then it would be better to work with
the other interested parties. I think it would be possible to use a
virtual IOMMU for this purpose.

Cheers,

--
Julien Grall
Re: Virtio in Xen on Arm (based on IOREQ concept)
On Mon, Jul 20, 2020 at 10:40:40AM +0100, Julien Grall wrote:
>
>
> On 20/07/2020 10:17, Roger Pau Monné wrote:
> > On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
> > > On 17.07.20 18:00, Roger Pau Monné wrote:
> > > > On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
> > > > Do you have any plans to try to upstream a modification to the VirtIO
> > > > spec so that grants (ie: abstract references to memory addresses) can
> > > > be used on the VirtIO ring?
> > >
> > > But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
> > > guest. Nothing to upsteam)
> >
> > OK, so there's no intention to add grants (or a similar interface) to
> > the spec?
> >
> > I understand that you want to support unmodified VirtIO frontends, but
> > I also think that long term frontends could negotiate with backends on
> > the usage of grants in the shared ring, like any other VirtIO feature
> > negotiated between the frontend and the backend.
> >
> > This of course needs to be on the spec first before we can start
> > implementing it, and hence my question whether a modification to the
> > spec in order to add grants has been considered.
> The problem is not really the specification but the adoption in the
> ecosystem. A protocol based on grant-tables would mostly only be used by Xen
> therefore:
> - It may be difficult to convince a proprietary OS vendor to invest
> resource on implementing the protocol
> - It would be more difficult to move in/out of Xen ecosystem.
>
> Both, may slow the adoption of Xen in some areas.

Right, just to be clear, my suggestion wasn't to force the usage of
grants, but to ask whether adding something along these lines was on the
roadmap, see below.

> If one is interested in security, then it would be better to work with the
> other interested parties. I think it would be possible to use a virtual
> IOMMU for this purpose.

Yes, I've also heard rumors about using the (I assume VirtIO) IOMMU in
order to protect what backends can map. This seems like a fine idea,
and would allow us to regain the lost security without having to do all
the work ourselves.

Do you know if there's anything published about this? I'm curious
about how and where in the system the VirtIO IOMMU is/should be
implemented.

Thanks, Roger.
Re: Virtio in Xen on Arm (based on IOREQ concept)
On 20.07.20 12:17, Roger Pau Monné wrote:

Hello Roger

> On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
>> On 17.07.20 18:00, Roger Pau Monné wrote:
>>> On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
>>>> requires
>>>> some implementation to forward guest MMIO access to a device model. And as
>>>> it
>>>> turned out the Xen on x86 contains most of the pieces to be able to use that
>>>> transport (via existing IOREQ concept). Julien has already done a big amount
>>>> of work in his PoC (xen/arm: Add support for Guest IO forwarding to a
>>>> device emulator).
>>>> Using that code as a base we managed to create a completely functional PoC
>>>> with DomU
>>>> running on virtio block device instead of a traditional Xen PV driver
>>>> without
>>>> modifications to DomU Linux. Our work is mostly about rebasing Julien's
>>>> code on the actual
>>>> codebase (Xen 4.14-rc4), various tweeks to be able to run emulator
>>>> (virtio-disk backend)
>>>> in other than Dom0 domain (in our system we have thin Dom0 and keep all
>>>> backends
>>>> in driver domain),
>>> How do you handle this use-case? Are you using grants in the VirtIO
>>> ring, or rather allowing the driver domain to map all the guest memory
>>> and then placing gfn on the ring like it's commonly done with VirtIO?
>> Second option. Xen grants are not used at all as well as event channel and
>> Xenbus. That allows us to have guest
>>
>> *unmodified* which one of the main goals. Yes, this may sound (or even
>> sounds) non-secure, but backend which runs in driver domain is allowed to
>> map all guest memory.
> Supporting unmodified guests is certainly a fine goal, but I don't
> think it's incompatible with also trying to expand the spec in
> parallel in order to support grants in a negotiated way (see below).
>
> That way you could (long term) regain some of the lost security.
>
>>> Do you have any plans to try to upstream a modification to the VirtIO
>>> spec so that grants (ie: abstract references to memory addresses) can
>>> be used on the VirtIO ring?
>> But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
>> guest. Nothing to upsteam)
> OK, so there's no intention to add grants (or a similar interface) to
> the spec?
>
> I understand that you want to support unmodified VirtIO frontends, but
> I also think that long term frontends could negotiate with backends on
> the usage of grants in the shared ring, like any other VirtIO feature
> negotiated between the frontend and the backend.
>
> This of course needs to be on the spec first before we can start
> implementing it, and hence my question whether a modification to the
> spec in order to add grants has been considered.
>
> It's fine to say that you don't have any plans in this regard.
Adding grants (or a similar interface) to the spec hasn't been
considered so far.

But I understand and completely agree that some solution should be found
in order not to reduce security.


>>>> misc fixes for our use-cases and tool support for the
>>>> configuration.
>>>> Unfortunately, Julien doesn’t have much time to allocate on the work
>>>> anymore,
>>>> so we would like to step in and continue.
>>>>
>>>> *A few word about the Xen code:*
>>>> You can find the whole Xen series at [5]. The patches are in RFC state
>>>> because
>>>> some actions in the series should be reconsidered and implemented properly.
>>>> Before submitting the final code for the review the first IOREQ patch
>>>> (which is quite
>>>> big) will be split into x86, Arm and common parts. Please note, x86 part
>>>> wasn’t
>>>> even build-tested so far and could be broken with that series. Also the
>>>> series probably
>>>> wants splitting into adding IOREQ on Arm (should be focused first) and
>>>> tools support
>>>> for the virtio-disk (which is going to be the first Virtio driver)
>>>> configuration before going
>>>> into the mailing list.
>>> Sending first a patch series to enable IOREQs on Arm seems perfectly
>>> fine, and it doesn't have to come with the VirtIO backend. In fact I
>>> would recommend that you send that ASAP, so that you don't spend time
>>> working on the backend that would likely need to be modified
>>> according to the review received on the IOREQ series.
>> Completely agree with you, I will send it after splitting IOREQ patch and
>> performing some cleanup.
>>
>> However, it is going to take some time to make it properly taking into the
>> account
>>
>> that personally I won't be able to test on x86.
> We have gitlab and the osstest CI loop (plus all the reviewers) so we
> should be able to spot any regressions. Build testing on x86 would be
> nice so that you don't need to resend to fix build issues.

Of course, before sending the series to the ML I will definitely perform a build test on x86.


>>>> What I would like to add here, the IOREQ feature on Arm could be used not
>>>> only
>>>> for implementing Virtio, but for other use-cases which require some
>>>> emulator entity
>>>> outside Xen such as custom PCI emulator (non-ECAM compatible) for example.
>>>>
>>>> *A few word about the backend(s):*
>>>> One of the main problems with Virtio in Xen on Arm is the absence of
>>>> “ready-to-use” and “out-of-Qemu” Virtio backends (I least am not aware of).
>>>> We managed to create virtio-disk backend based on demu [3] and kvmtool [4]
>>>> using
>>>> that series. It is worth mentioning that although Xenbus/Xenstore is not
>>>> supposed
>>>> to be used with native Virtio, that interface was chosen to just pass
>>>> configuration from toolstack
>>>> to the backend and notify it about creating/destroying Guest domain (I
>>>> think it is
>>> I would prefer if a single instance was launched to handle each
>>> backend, and that the configuration was passed on the command line.
>>> Killing the user-space backend from the toolstack is fine I think,
>>> there's no need to notify the backend using xenstore or any other
>>> out-of-band methods.
>>>
>>> xenstore has proven to be a bottleneck in terms of performance, and it
>>> would be better if we can avoid using it when possible, specially here
>>> that you have to do this from scratch anyway.
>> Let me elaborate a bit more on this.
>>
>> In current backend implementation, the Xenstore is *not* used for
>> communication between backend (VirtIO device) and frontend (VirtIO driver),
>> frontend knows nothing about it.
>>
>> Xenstore was chosen as an interface in order to be able to pass
>> configuration from toolstack in Dom0 to backend which may reside in other
>> than Dom0 domain (DomD in our case),
> There's 'xl devd' which can be used on the driver domain to spawn
> backends, maybe you could add the logic there so that 'xl devd' calls
> the backend executable with the required command line parameters, so
> that the backend itself doesn't need to interact with xenstore in any
> way?
>
> That way in the future we could use something else instead of
> xenstore, like Argo for instance in order to pass the backend data
> from the control domain to the driver domain.
>
>> also looking into the Xenstore entries backend always knows when the
>> intended guest is been created/destroyed.
> xl devd should also do the killing of backends anyway when a domain is
> destroyed, or else malfunctioning user-space backends could keep
> running after the domain they are serving is destroyed.
>
>> I may mistake, but I don't think we can avoid using Xenstore (or other
>> interface provided by toolstack) for the several reasons.
>>
>> Besides a virtio-disk configuration (a disk to be assigned to the guest, R/O
>> mode, etc), for each virtio-mmio device instance
>>
>> a pair (mmio range + IRQ) are allocated by toolstack at the guest
>> construction time and inserted into virtio-mmio device tree node
>>
>> in the guest device tree. And for the backend to properly operate these
>> variable parameters are also passed to the backend via Xenstore.
> I think you could pass all these parameters as command line arguments
> to the backend?
>
>> The other reasons are:
>>
>> 1. Automation. With current backend implementation we don't need to pause
>> guest right after creating it, then go to the driver domain and spawn
>> backend and
>>
>> after that go back to the dom0 and unpause the guest.
> xl devd should be capable of handling this for you on the driver
> domain.
>
>> 2. Ability to detect when guest with involved frontend has gone away and
>> properly release resource (guest destroy/reboot).
>>
>> 3. Ability to (re)connect to the newly created guest with involved frontend
>> (guest create/reboot).
>>
>> 4. What is more that having Xenstore support the backend is able to detect
>> the dom_id it runs into and the guest dom_id, there is no need pass them via
>> command line.
>>
>>
>> I will be happy to explain in details after publishing backend code).
> As I'm not the one doing the work I certainly won't stop you from
> using xenstore on the backend. I would certainly prefer if the backend
> gets all the information it needs from the command line so that the
> configuration data is completely agnostic to the transport layer used
> to convey it.
>
> Thanks, Roger.

Thank you for pointing out another possible way. I feel I need to investigate what "xl devd" (+ Argo?) is and how it works. If it is able to provide the backend with the support/information it needs, and xenstore is not welcome, then I would be absolutely OK with considering another solution.

I propose to get back to that discussion after I prepare and send out the proper IOREQ series.


--
Regards,

Oleksandr Tyshchenko
Re: Virtio in Xen on Arm (based on IOREQ concept)
On Mon, Jul 20, 2020 at 01:56:51PM +0300, Oleksandr wrote:
> On 20.07.20 12:17, Roger Pau Monné wrote:
> > On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
> > > On 17.07.20 18:00, Roger Pau Monné wrote:
> > > > On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
> > > The other reasons are:
> > >
> > > 1. Automation. With current backend implementation we don't need to pause
> > > guest right after creating it, then go to the driver domain and spawn
> > > backend and
> > >
> > > after that go back to the dom0 and unpause the guest.
> > xl devd should be capable of handling this for you on the driver
> > domain.
> >
> > > 2. Ability to detect when guest with involved frontend has gone away and
> > > properly release resource (guest destroy/reboot).
> > >
> > > 3. Ability to (re)connect to the newly created guest with involved frontend
> > > (guest create/reboot).
> > >
> > > 4. What is more that having Xenstore support the backend is able to detect
> > > the dom_id it runs into and the guest dom_id, there is no need pass them via
> > > command line.
> > >
> > >
> > > I will be happy to explain in details after publishing backend code).
> > As I'm not the one doing the work I certainly won't stop you from
> > using xenstore on the backend. I would certainly prefer if the backend
> > gets all the information it needs from the command line so that the
> > configuration data is completely agnostic to the transport layer used
> > to convey it.
> >
> > Thanks, Roger.
>
> Thank you for pointing another possible way. I feel I need to investigate
> what is the "xl devd" (+ Argo?) and how it works. If it is able to provide
> backend with

That's what x86 at least uses to manage backends on driver domains: xl
devd will for example launch the QEMU instance required to handle a
Xen PV disk backend in user-space.

Note that there's currently no support for Argo or any communication
channel different than xenstore, but I think it would be cleaner to
place the fetching of data from xenstore in xl devd and just pass
those as command line arguments to the VirtIO backend if possible. I
would prefer the VirtIO backend to be fully decoupled from xenstore.

Note that for a backend running on dom0 there would be no need to
pass any data on xenstore, as the backend would be launched directly
from xl with the appropriate command line arguments.

> the support/information it needs and xenstore is not welcome then I would be
> absolutely ok to consider using other solution.
>
> I propose to get back to that discussion after I prepare and send out the
> proper IOREQ series.

Sure, that's fine.

Roger.
Re: Virtio in Xen on Arm (based on IOREQ concept)
On Mon, 20 Jul 2020, Roger Pau Monné wrote:
> On Mon, Jul 20, 2020 at 10:40:40AM +0100, Julien Grall wrote:
> >
> >
> > On 20/07/2020 10:17, Roger Pau Monné wrote:
> > > On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
> > > > On 17.07.20 18:00, Roger Pau Monné wrote:
> > > > > On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
> > > > > Do you have any plans to try to upstream a modification to the VirtIO
> > > > > spec so that grants (ie: abstract references to memory addresses) can
> > > > > be used on the VirtIO ring?
> > > >
> > > > But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
> > > > guest. Nothing to upsteam)
> > >
> > > OK, so there's no intention to add grants (or a similar interface) to
> > > the spec?
> > >
> > > I understand that you want to support unmodified VirtIO frontends, but
> > > I also think that long term frontends could negotiate with backends on
> > > the usage of grants in the shared ring, like any other VirtIO feature
> > > negotiated between the frontend and the backend.
> > >
> > > This of course needs to be on the spec first before we can start
> > > implementing it, and hence my question whether a modification to the
> > > spec in order to add grants has been considered.
> > The problem is not really the specification but the adoption in the
> > ecosystem. A protocol based on grant-tables would mostly only be used by Xen
> > therefore:
> > - It may be difficult to convince a proprietary OS vendor to invest
> > resource on implementing the protocol
> > - It would be more difficult to move in/out of Xen ecosystem.
> >
> > Both, may slow the adoption of Xen in some areas.
>
> Right, just to be clear my suggestion wasn't to force the usage of
> grants, but whether adding something along this lines was in the
> roadmap, see below.
>
> > If one is interested in security, then it would be better to work with the
> > other interested parties. I think it would be possible to use a virtual
> > IOMMU for this purpose.
>
> Yes, I've also heard rumors about using the (I assume VirtIO) IOMMU in
> order to protect what backends can map. This seems like a fine idea,
> and would allow us to gain the lost security without having to do the
> whole work ourselves.
>
> Do you know if there's anything published about this? I'm curious
> about how and where in the system the VirtIO IOMMU is/should be
> implemented.

Not yet (as far as I know), but we have just started some discussions on
this topic within Linaro.


You should also be aware that there is another proposal based on
pre-shared-memory and memcpys to solve the virtio security issue:

https://marc.info/?l=linux-kernel&m=158807398403549

It would certainly be slower than the "virtio IOMMU" solution, but it
would take far less time to develop and could work as a short-term
stop-gap. (In my view the "virtio IOMMU" is the only clean long-term
solution to the problem.)
Re: Virtio in Xen on Arm (based on IOREQ concept)
On Fri, 17 Jul 2020, Oleksandr wrote:
> > > *A few word about solution:*
> > > As it was mentioned at [1], in order to implement virtio-mmio Xen on Arm
> > Any plans for virtio-pci? Arm seems to be moving to the PCI bus, and
> > it would be very interesting from a x86 PoV, as I don't think
> > virtio-mmio is something that you can easily use on x86 (or even use
> > at all).
>
> Being honest I didn't consider virtio-pci so far. Julien's PoC (we are based
> on) provides support for the virtio-mmio transport
>
> which is enough to start working around VirtIO and is not as complex as
> virtio-pci. But it doesn't mean there is no way for virtio-pci in Xen.
>
> I think, this could be added in next steps. But the nearest target is
> virtio-mmio approach (of course if the community agrees on that).

Hi Julien, Oleksandr,

Aside from complexity and ease of development, are there any other
architectural reasons for using virtio-mmio?

I am not asking because I intend to suggest doing something different
(virtio-mmio is fine as far as I can tell). I am asking because there
was a virtio-pci/virtio-mmio discussion recently in Linaro and I
would like to understand if there are any implications from a Xen point
of view that I don't yet know about.

For instance, what's your take on notifications with virtio-mmio? How
are they modelled today? Are they good enough or do we need MSIs?
Re: Virtio in Xen on Arm (based on IOREQ concept)
On Mon, 20 Jul 2020, Roger Pau Monné wrote:
> On Mon, Jul 20, 2020 at 01:56:51PM +0300, Oleksandr wrote:
> > On 20.07.20 12:17, Roger Pau Monné wrote:
> > > On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
> > > > On 17.07.20 18:00, Roger Pau Monné wrote:
> > > > > On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
> > > > The other reasons are:
> > > >
> > > > 1. Automation. With current backend implementation we don't need to pause
> > > > guest right after creating it, then go to the driver domain and spawn
> > > > backend and
> > > >
> > > > after that go back to the dom0 and unpause the guest.
> > > xl devd should be capable of handling this for you on the driver
> > > domain.
> > >
> > > > 2. Ability to detect when guest with involved frontend has gone away and
> > > > properly release resource (guest destroy/reboot).
> > > >
> > > > 3. Ability to (re)connect to the newly created guest with involved frontend
> > > > (guest create/reboot).
> > > >
> > > > 4. What is more that having Xenstore support the backend is able to detect
> > > > the dom_id it runs into and the guest dom_id, there is no need pass them via
> > > > command line.
> > > >
> > > >
> > > > I will be happy to explain in details after publishing backend code).
> > > As I'm not the one doing the work I certainly won't stop you from
> > > using xenstore on the backend. I would certainly prefer if the backend
> > > gets all the information it needs from the command line so that the
> > > configuration data is completely agnostic to the transport layer used
> > > to convey it.
> > >
> > > Thanks, Roger.
> >
> > Thank you for pointing another possible way. I feel I need to investigate
> > what is the "xl devd" (+ Argo?) and how it works. If it is able to provide
> > backend with
>
> That's what x86 at least uses to manage backends on driver domains: xl
> devd will for example launch the QEMU instance required to handle a
> Xen PV disk backend in user-space.
>
> Note that there's currently no support for Argo or any communication
> channel different than xenstore, but I think it would be cleaner to
> place the fetching of data from xenstore in xl devd and just pass
> those as command line arguments to the VirtIO backend if possible. I
> would prefer the VirtIO backend to be fully decoupled from xenstore.
>
> Note that for a backend running on dom0 there would be no need to
> pass any data on xenstore, as the backend would be launched directly
> from xl with the appropriate command line arguments.

If I can paraphrase Roger's point, I think we all agree that xenstore is
very convenient to use and great to get something up and running
quickly. But it has several limitations, so it would be fantastic if we
could kill two birds with one stone and find a way to deploy the system
without xenstore, given that with virtio it is not actually needed except
for some very limited initial configuration. It would certainly be a big
win. However, it is fair to say that the xenstore alternative, whatever
that might be, needs work.
Re: Virtio in Xen on Arm (based on IOREQ concept)
On 20.07.20 23:38, Stefano Stabellini wrote:

Hello Stefano

> On Fri, 17 Jul 2020, Oleksandr wrote:
>>>> *A few word about solution:*
>>>> As it was mentioned at [1], in order to implement virtio-mmio Xen on Arm
>>> Any plans for virtio-pci? Arm seems to be moving to the PCI bus, and
>>> it would be very interesting from a x86 PoV, as I don't think
>>> virtio-mmio is something that you can easily use on x86 (or even use
>>> at all).
>> Being honest I didn't consider virtio-pci so far. Julien's PoC (we are based
>> on) provides support for the virtio-mmio transport
>>
>> which is enough to start working around VirtIO and is not as complex as
>> virtio-pci. But it doesn't mean there is no way for virtio-pci in Xen.
>>
>> I think, this could be added in next steps. But the nearest target is
>> virtio-mmio approach (of course if the community agrees on that).
> Hi Julien, Oleksandr,
>
> Aside from complexity and easy-of-development, are there any other
> architectural reasons for using virtio-mmio?
>
> I am not asking because I intend to suggest to do something different
> (virtio-mmio is fine as far as I can tell.) I am asking because recently
> there was a virtio-pci/virtio-mmio discussion recently in Linaro and I
> would like to understand if there are any implications from a Xen point
> of view that I don't yet know.
Unfortunately, I can't say anything regarding virtio-pci/MSI. Could virtio-pci work in a virtual environment without PCI support (various embedded platforms)?

It feels to me that both transports (the easy and lightweight virtio-mmio and the complex and powerful virtio-pci) will have their consumers and are worth implementing in Xen.


>
> For instance, what's your take on notifications with virtio-mmio? How
> are they modelled today? Are they good enough or do we need MSIs?

Notifications are sent from the device (backend) to the driver (frontend) using interrupts. An additional DM function, xendevicemodel_set_irq_level(), was introduced for that purpose; it results in a vgic_inject_irq() call.

Currently, if the device wants to notify the driver, it triggers the interrupt by calling that function twice (high level first, then low level).
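
In code, that boils down to something like the sketch below (illustrative only; xendevicemodel_set_irq_level() is the new DM call added by the series, so its final name/signature may still change during review):

    /* Sketch: pulse the virtio-mmio SPI to notify the frontend. */
    #include <xendevicemodel.h>

    static int notify_frontend(xendevicemodel_handle *dmod, uint32_t domid,
                               uint32_t virtio_irq)
    {
        int rc;

        rc = xendevicemodel_set_irq_level(dmod, domid, virtio_irq, 1 /* high */);
        if (rc)
            return rc;

        return xendevicemodel_set_irq_level(dmod, domid, virtio_irq, 0 /* low */);
    }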


--
Regards,

Oleksandr Tyshchenko
Re: Virtio in Xen on Arm (based on IOREQ concept)
Hi Stefano,

On 20/07/2020 21:37, Stefano Stabellini wrote:
> On Mon, 20 Jul 2020, Roger Pau Monné wrote:
>> On Mon, Jul 20, 2020 at 10:40:40AM +0100, Julien Grall wrote:
>>>
>>>
>>> On 20/07/2020 10:17, Roger Pau Monné wrote:
>>>> On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
>>>>> On 17.07.20 18:00, Roger Pau Monné wrote:
>>>>>> On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
>>>>>> Do you have any plans to try to upstream a modification to the VirtIO
>>>>>> spec so that grants (ie: abstract references to memory addresses) can
>>>>>> be used on the VirtIO ring?
>>>>>
>>>>> But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
>>>>> guest. Nothing to upsteam)
>>>>
>>>> OK, so there's no intention to add grants (or a similar interface) to
>>>> the spec?
>>>>
>>>> I understand that you want to support unmodified VirtIO frontends, but
>>>> I also think that long term frontends could negotiate with backends on
>>>> the usage of grants in the shared ring, like any other VirtIO feature
>>>> negotiated between the frontend and the backend.
>>>>
>>>> This of course needs to be on the spec first before we can start
>>>> implementing it, and hence my question whether a modification to the
>>>> spec in order to add grants has been considered.
>>> The problem is not really the specification but the adoption in the
>>> ecosystem. A protocol based on grant-tables would mostly only be used by Xen
>>> therefore:
>>> - It may be difficult to convince a proprietary OS vendor to invest
>>> resource on implementing the protocol
>>> - It would be more difficult to move in/out of Xen ecosystem.
>>>
>>> Both, may slow the adoption of Xen in some areas.
>>
>> Right, just to be clear my suggestion wasn't to force the usage of
>> grants, but whether adding something along this lines was in the
>> roadmap, see below.
>>
>>> If one is interested in security, then it would be better to work with the
>>> other interested parties. I think it would be possible to use a virtual
>>> IOMMU for this purpose.
>>
>> Yes, I've also heard rumors about using the (I assume VirtIO) IOMMU in
>> order to protect what backends can map. This seems like a fine idea,
>> and would allow us to gain the lost security without having to do the
>> whole work ourselves.
>>
>> Do you know if there's anything published about this? I'm curious
>> about how and where in the system the VirtIO IOMMU is/should be
>> implemented.
>
> Not yet (as far as I know), but we have just started some discussons on
> this topic within Linaro.
>
>
> You should also be aware that there is another proposal based on
> pre-shared-memory and memcpys to solve the virtio security issue:
>
> https://marc.info/?l=linux-kernel&m=158807398403549
>
> It would be certainly slower than the "virtio IOMMU" solution but it
> would take far less time to develop and could work as a short-term
> stop-gap.

I don't think I agree with this blanket statement. In the case of the
"virtio IOMMU", you would potentially need to map/unmap pages for every
request, which would result in a lot of back and forth to the
hypervisor.

So it may turn out that pre-shared memory is faster on some setups.

Cheers,

--
Julien Grall
Re: Virtio in Xen on Arm (based on IOREQ concept)
On 20.07.20 23:40, Stefano Stabellini wrote:

Hello Stefano

> On Mon, 20 Jul 2020, Roger Pau Monné wrote:
>> On Mon, Jul 20, 2020 at 01:56:51PM +0300, Oleksandr wrote:
>>> On 20.07.20 12:17, Roger Pau Monné wrote:
>>>> On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
>>>>> On 17.07.20 18:00, Roger Pau Monné wrote:
>>>>>> On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
>>>>> The other reasons are:
>>>>>
>>>>> 1. Automation. With current backend implementation we don't need to pause
>>>>> guest right after creating it, then go to the driver domain and spawn
>>>>> backend and
>>>>>
>>>>> after that go back to the dom0 and unpause the guest.
>>>> xl devd should be capable of handling this for you on the driver
>>>> domain.
>>>>
>>>>> 2. Ability to detect when guest with involved frontend has gone away and
>>>>> properly release resource (guest destroy/reboot).
>>>>>
>>>>> 3. Ability to (re)connect to the newly created guest with involved frontend
>>>>> (guest create/reboot).
>>>>>
>>>>> 4. What is more that having Xenstore support the backend is able to detect
>>>>> the dom_id it runs into and the guest dom_id, there is no need pass them via
>>>>> command line.
>>>>>
>>>>>
>>>>> I will be happy to explain in details after publishing backend code).
>>>> As I'm not the one doing the work I certainly won't stop you from
>>>> using xenstore on the backend. I would certainly prefer if the backend
>>>> gets all the information it needs from the command line so that the
>>>> configuration data is completely agnostic to the transport layer used
>>>> to convey it.
>>>>
>>>> Thanks, Roger.
>>> Thank you for pointing another possible way. I feel I need to investigate
>>> what is the "xl devd" (+ Argo?) and how it works. If it is able to provide
>>> backend with
>> That's what x86 at least uses to manage backends on driver domains: xl
>> devd will for example launch the QEMU instance required to handle a
>> Xen PV disk backend in user-space.
>>
>> Note that there's currently no support for Argo or any communication
>> channel different than xenstore, but I think it would be cleaner to
>> place the fetching of data from xenstore in xl devd and just pass
>> those as command line arguments to the VirtIO backend if possible. I
>> would prefer the VirtIO backend to be fully decoupled from xenstore.
>>
>> Note that for a backend running on dom0 there would be no need to
>> pass any data on xenstore, as the backend would be launched directly
>> from xl with the appropriate command line arguments.
> If I can paraphrase Roger's point, I think we all agree that xenstore is
> very convenient to use and great to get something up and running
> quickly. But it has several limitations, so it would be fantastic if we
> could kill two birds with one stone and find a way to deploy the system
> without xenstore, given that with virtio it is not actually needed if not
> for very limited initial configurations. It would certainly be a big
> win. However, it is fair to say that the xenstore alternative, whatever
> that might be, needs work.

Well, why not, actually?

For example, the idea "to place the fetching of data from xenstore in
xl devd and just pass those as command line arguments to the VirtIO
backend if possible" sounds fine to me. But this needs additional
investigation.

--
Regards,

Oleksandr Tyshchenko
Re: Virtio in Xen on Arm (based on IOREQ concept)
(+ Andre for the vGIC).

Hi Stefano,

On 20/07/2020 21:38, Stefano Stabellini wrote:
> On Fri, 17 Jul 2020, Oleksandr wrote:
>>>> *A few word about solution:*
>>>> As it was mentioned at [1], in order to implement virtio-mmio Xen on Arm
>>> Any plans for virtio-pci? Arm seems to be moving to the PCI bus, and
>>> it would be very interesting from a x86 PoV, as I don't think
>>> virtio-mmio is something that you can easily use on x86 (or even use
>>> at all).
>>
>> Being honest I didn't consider virtio-pci so far. Julien's PoC (we are based
>> on) provides support for the virtio-mmio transport
>>
>> which is enough to start working around VirtIO and is not as complex as
>> virtio-pci. But it doesn't mean there is no way for virtio-pci in Xen.
>>
>> I think, this could be added in next steps. But the nearest target is
>> virtio-mmio approach (of course if the community agrees on that).

> Aside from complexity and easy-of-development, are there any other
> architectural reasons for using virtio-mmio?

From the hypervisor PoV, the main/only difference between virtio-mmio
and virtio-pci is that for the latter we need to forward PCI config
space accesses to the device emulator. IOW, we would need to add
support for vPCI. This shouldn't require much more work, but I didn't
want to invest in it for the PoC.

Long term, I don't think we should tie Xen to any of the virtio
protocols. We just need to offer facilities so users can easily build
virtio backends for Xen.

>
> I am not asking because I intend to suggest to do something different
> (virtio-mmio is fine as far as I can tell.) I am asking because recently
> there was a virtio-pci/virtio-mmio discussion recently in Linaro and I
> would like to understand if there are any implications from a Xen point
> of view that I don't yet know.

virtio-mmio is going to require more work in the toolstack because we
would need to do the memory/interrupt allocation ourselves (see the
sketch after the list below). In the case of virtio-pci, we only need
to pass a range of memory/interrupts to the guest and let it decide the
allocation.

Regarding virtio-pci vs virtio-mmio:
    - flexibility: virtio-mmio is a good fit when you know all your
devices at boot. If you want to hotplug disk/network, then virtio-pci
is going to be a better fit.
    - interrupts: I would expect each virtio-mmio device to have its
own SPI. In the case of virtio-pci, legacy interrupts would be shared
between all the PCI devices on the same host controller. This could
lead to performance issues if you have many devices. So for virtio-pci,
we should consider MSIs.
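To make the allocation point concrete, below is a rough sketch of the
toolstack-side bookkeeping virtio-mmio would need; the base address,
window size and SPI number are purely illustrative, not taken from the
series:

#include <stdint.h>

/* Illustrative constants, not from the series. */
#define GUEST_VIRTIO_MMIO_BASE  0x02000000UL
#define GUEST_VIRTIO_MMIO_SIZE  0x200UL
#define GUEST_VIRTIO_MMIO_SPI   33U

struct virtio_mmio_slot {
    uint64_t base;  /* start of the MMIO window for this device */
    uint32_t spi;   /* SPI dedicated to this device */
};

/* One MMIO window and one SPI per device, chosen by the toolstack. */
static void alloc_virtio_mmio_slot(struct virtio_mmio_slot *slot,
                                   unsigned int index)
{
    slot->base = GUEST_VIRTIO_MMIO_BASE + index * GUEST_VIRTIO_MMIO_SIZE;
    slot->spi  = GUEST_VIRTIO_MMIO_SPI + index;
}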

>
> For instance, what's your take on notifications with virtio-mmio? How
> are they modelled today?

The backend will notify the frontend using an SPI. The other way around
(frontend -> backend) is based on an MMIO write.

We have an interface to allow the backend to control the interrupt
level (i.e. low, high). However, the "old" vGIC doesn't handle level
interrupts properly, so we would end up treating level interrupts as
edge.

Technically, the problem already exists with HW interrupts, but the HW
should fire the interrupt again if the line is still asserted. Another
issue is that the interrupt may fire even if the line was deasserted
(IIRC this caused some interesting problems with the Arch timer).

I am a bit concerned that the issue will be more prominent for virtual
interrupts. I know that we have some gross hack in the vpl011 to handle
a level interrupt. So maybe it is time to switch to the new vGIC?

> Are they good enough or do we need MSIs?

I am not sure whether virtio-mmio supports MSIs. However, for
virtio-pci, MSIs are going to be useful to improve performance. This
may mean exposing an ITS, so we would need to add guest support for it.

Cheers,

--
Julien Grall
Re: Virtio in Xen on Arm (based on IOREQ concept)
On Tue, Jul 21, 2020 at 01:31:48PM +0100, Julien Grall wrote:
> Hi Stefano,
>
> On 20/07/2020 21:37, Stefano Stabellini wrote:
> > On Mon, 20 Jul 2020, Roger Pau Monné wrote:
> > > On Mon, Jul 20, 2020 at 10:40:40AM +0100, Julien Grall wrote:
> > > >
> > > >
> > > > On 20/07/2020 10:17, Roger Pau Monné wrote:
> > > > > On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
> > > > > > On 17.07.20 18:00, Roger Pau Monné wrote:
> > > > > > > On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
> > > > > > > Do you have any plans to try to upstream a modification to the VirtIO
> > > > > > > spec so that grants (ie: abstract references to memory addresses) can
> > > > > > > be used on the VirtIO ring?
> > > > > >
> > > > > > But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
> > > > > > guest. Nothing to upsteam)
> > > > >
> > > > > OK, so there's no intention to add grants (or a similar interface) to
> > > > > the spec?
> > > > >
> > > > > I understand that you want to support unmodified VirtIO frontends, but
> > > > > I also think that long term frontends could negotiate with backends on
> > > > > the usage of grants in the shared ring, like any other VirtIO feature
> > > > > negotiated between the frontend and the backend.
> > > > >
> > > > > This of course needs to be on the spec first before we can start
> > > > > implementing it, and hence my question whether a modification to the
> > > > > spec in order to add grants has been considered.
> > > > The problem is not really the specification but the adoption in the
> > > > ecosystem. A protocol based on grant-tables would mostly only be used by Xen
> > > > therefore:
> > > > - It may be difficult to convince a proprietary OS vendor to invest
> > > > resource on implementing the protocol
> > > > - It would be more difficult to move in/out of Xen ecosystem.
> > > >
> > > > Both, may slow the adoption of Xen in some areas.
> > >
> > > Right, just to be clear my suggestion wasn't to force the usage of
> > > grants, but whether adding something along this lines was in the
> > > roadmap, see below.
> > >
> > > > If one is interested in security, then it would be better to work with the
> > > > other interested parties. I think it would be possible to use a virtual
> > > > IOMMU for this purpose.
> > >
> > > Yes, I've also heard rumors about using the (I assume VirtIO) IOMMU in
> > > order to protect what backends can map. This seems like a fine idea,
> > > and would allow us to gain the lost security without having to do the
> > > whole work ourselves.
> > >
> > > Do you know if there's anything published about this? I'm curious
> > > about how and where in the system the VirtIO IOMMU is/should be
> > > implemented.
> >
> > Not yet (as far as I know), but we have just started some discussons on
> > this topic within Linaro.
> >
> >
> > You should also be aware that there is another proposal based on
> > pre-shared-memory and memcpys to solve the virtio security issue:
> >
> > https://marc.info/?l=linux-kernel&m=158807398403549
> >
> > It would be certainly slower than the "virtio IOMMU" solution but it
> > would take far less time to develop and could work as a short-term
> > stop-gap.
>
> I don't think I agree with this blank statement. In the case of "virtio
> IOMMU", you would need to potentially map/unmap pages every request which
> would result to a lot of back and forth to the hypervisor.
>
> So it may turn out that pre-shared-memory may be faster on some setup.

AFAICT you could achieve the same with an IOMMU: pre-share (ie: add to
the device IOMMU page tables) a bunch of pages and keep bouncing data
to/from them in order to interact with the device. That way you could
avoid the maps and unmaps (this is effectively how persistent grants
work in the blkif protocol).
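Just to illustrate the pattern, assuming a region that has already been
pre-shared with the backend (all names here are made up for the
sketch):

#include <stddef.h>
#include <string.h>

struct bounce_pool {
    void   *va;     /* pre-shared region, already mapped by the backend */
    size_t  size;
};

/* Copy request data through the pre-shared region instead of
 * mapping/unmapping guest pages for every request. */
static void *bounce_out(struct bounce_pool *pool, const void *src,
                        size_t len)
{
    if (len > pool->size)
        return NULL;

    return memcpy(pool->va, src, len);
}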

The thread referenced by Stefano seems to point out that this shared
memory model is targeted at very limited hypervisors that don't have
the capacity to trap, decode and emulate accesses to memory?

I certainly don't know much about it.

Roger.
Re: Virtio in Xen on Arm (based on IOREQ concept)
Hi Roger,

On 21/07/2020 14:25, Roger Pau Monné wrote:
> On Tue, Jul 21, 2020 at 01:31:48PM +0100, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 20/07/2020 21:37, Stefano Stabellini wrote:
>>> On Mon, 20 Jul 2020, Roger Pau Monné wrote:
>>>> On Mon, Jul 20, 2020 at 10:40:40AM +0100, Julien Grall wrote:
>>>>>
>>>>>
>>>>> On 20/07/2020 10:17, Roger Pau Monné wrote:
>>>>>> On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
>>>>>>> On 17.07.20 18:00, Roger Pau Monné wrote:
>>>>>>>> On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
>>>>>>>> Do you have any plans to try to upstream a modification to the VirtIO
>>>>>>>> spec so that grants (ie: abstract references to memory addresses) can
>>>>>>>> be used on the VirtIO ring?
>>>>>>>
>>>>>>> But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
>>>>>>> guest. Nothing to upsteam)
>>>>>>
>>>>>> OK, so there's no intention to add grants (or a similar interface) to
>>>>>> the spec?
>>>>>>
>>>>>> I understand that you want to support unmodified VirtIO frontends, but
>>>>>> I also think that long term frontends could negotiate with backends on
>>>>>> the usage of grants in the shared ring, like any other VirtIO feature
>>>>>> negotiated between the frontend and the backend.
>>>>>>
>>>>>> This of course needs to be on the spec first before we can start
>>>>>> implementing it, and hence my question whether a modification to the
>>>>>> spec in order to add grants has been considered.
>>>>> The problem is not really the specification but the adoption in the
>>>>> ecosystem. A protocol based on grant-tables would mostly only be used by Xen
>>>>> therefore:
>>>>> - It may be difficult to convince a proprietary OS vendor to invest
>>>>> resource on implementing the protocol
>>>>> - It would be more difficult to move in/out of Xen ecosystem.
>>>>>
>>>>> Both, may slow the adoption of Xen in some areas.
>>>>
>>>> Right, just to be clear my suggestion wasn't to force the usage of
>>>> grants, but whether adding something along this lines was in the
>>>> roadmap, see below.
>>>>
>>>>> If one is interested in security, then it would be better to work with the
>>>>> other interested parties. I think it would be possible to use a virtual
>>>>> IOMMU for this purpose.
>>>>
>>>> Yes, I've also heard rumors about using the (I assume VirtIO) IOMMU in
>>>> order to protect what backends can map. This seems like a fine idea,
>>>> and would allow us to gain the lost security without having to do the
>>>> whole work ourselves.
>>>>
>>>> Do you know if there's anything published about this? I'm curious
>>>> about how and where in the system the VirtIO IOMMU is/should be
>>>> implemented.
>>>
>>> Not yet (as far as I know), but we have just started some discussons on
>>> this topic within Linaro.
>>>
>>>
>>> You should also be aware that there is another proposal based on
>>> pre-shared-memory and memcpys to solve the virtio security issue:
>>>
>>> https://marc.info/?l=linux-kernel&m=158807398403549
>>>
>>> It would be certainly slower than the "virtio IOMMU" solution but it
>>> would take far less time to develop and could work as a short-term
>>> stop-gap.
>>
>> I don't think I agree with this blank statement. In the case of "virtio
>> IOMMU", you would need to potentially map/unmap pages every request which
>> would result to a lot of back and forth to the hypervisor.
>>
>> So it may turn out that pre-shared-memory may be faster on some setup.
>
> AFAICT you could achieve the same with an IOMMU: pre-share (ie: add to
> the device IOMMU page tables) a bunch of pages and keep bouncing data
> to/from them in order to interact with the device, that way you could
> avoid the map and unmaps (and is effectively how persistent grants
> work in the blkif protocol).

Yes, it is possible to do the same with the virtio IOMMU. I was more
arguing against the statement that pre-shared memory is going to be
slower than the IOMMU case.

>
> The thread referenced by Stefano seems to point out this shared memory
> model is targeted for very limited hypervisors that don't have the
> capacity to trap, decode and emulate accesses to memory?

Technically we are in the same case for Xen on Arm, as we don't have
IOREQ support yet. But I think IOREQ is worthwhile as it would enable
existing, unmodified Linux with virtio drivers to boot on Xen.

Cheers,

--
Julien Grall
Re: Virtio in Xen on Arm (based on IOREQ concept)
On Tue, Jul 21, 2020 at 02:32:38PM +0100, Julien Grall wrote:
> Hi Roger,
>
> On 21/07/2020 14:25, Roger Pau Monné wrote:
> > On Tue, Jul 21, 2020 at 01:31:48PM +0100, Julien Grall wrote:
> > > Hi Stefano,
> > >
> > > On 20/07/2020 21:37, Stefano Stabellini wrote:
> > > > On Mon, 20 Jul 2020, Roger Pau Monné wrote:
> > > > > On Mon, Jul 20, 2020 at 10:40:40AM +0100, Julien Grall wrote:
> > > > > >
> > > > > >
> > > > > > On 20/07/2020 10:17, Roger Pau Monné wrote:
> > > > > > > On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
> > > > > > > > On 17.07.20 18:00, Roger Pau Monné wrote:
> > > > > > > > > On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
> > > > > > > > > Do you have any plans to try to upstream a modification to the VirtIO
> > > > > > > > > spec so that grants (ie: abstract references to memory addresses) can
> > > > > > > > > be used on the VirtIO ring?
> > > > > > > >
> > > > > > > > But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
> > > > > > > > guest. Nothing to upsteam)
> > > > > > >
> > > > > > > OK, so there's no intention to add grants (or a similar interface) to
> > > > > > > the spec?
> > > > > > >
> > > > > > > I understand that you want to support unmodified VirtIO frontends, but
> > > > > > > I also think that long term frontends could negotiate with backends on
> > > > > > > the usage of grants in the shared ring, like any other VirtIO feature
> > > > > > > negotiated between the frontend and the backend.
> > > > > > >
> > > > > > > This of course needs to be on the spec first before we can start
> > > > > > > implementing it, and hence my question whether a modification to the
> > > > > > > spec in order to add grants has been considered.
> > > > > > The problem is not really the specification but the adoption in the
> > > > > > ecosystem. A protocol based on grant-tables would mostly only be used by Xen
> > > > > > therefore:
> > > > > > - It may be difficult to convince a proprietary OS vendor to invest
> > > > > > resource on implementing the protocol
> > > > > > - It would be more difficult to move in/out of Xen ecosystem.
> > > > > >
> > > > > > Both, may slow the adoption of Xen in some areas.
> > > > >
> > > > > Right, just to be clear my suggestion wasn't to force the usage of
> > > > > grants, but whether adding something along this lines was in the
> > > > > roadmap, see below.
> > > > >
> > > > > > If one is interested in security, then it would be better to work with the
> > > > > > other interested parties. I think it would be possible to use a virtual
> > > > > > IOMMU for this purpose.
> > > > >
> > > > > Yes, I've also heard rumors about using the (I assume VirtIO) IOMMU in
> > > > > order to protect what backends can map. This seems like a fine idea,
> > > > > and would allow us to gain the lost security without having to do the
> > > > > whole work ourselves.
> > > > >
> > > > > Do you know if there's anything published about this? I'm curious
> > > > > about how and where in the system the VirtIO IOMMU is/should be
> > > > > implemented.
> > > >
> > > > Not yet (as far as I know), but we have just started some discussons on
> > > > this topic within Linaro.
> > > >
> > > >
> > > > You should also be aware that there is another proposal based on
> > > > pre-shared-memory and memcpys to solve the virtio security issue:
> > > >
> > > > https://marc.info/?l=linux-kernel&m=158807398403549
> > > >
> > > > It would be certainly slower than the "virtio IOMMU" solution but it
> > > > would take far less time to develop and could work as a short-term
> > > > stop-gap.
> > >
> > > I don't think I agree with this blank statement. In the case of "virtio
> > > IOMMU", you would need to potentially map/unmap pages every request which
> > > would result to a lot of back and forth to the hypervisor.
> > >
> > > So it may turn out that pre-shared-memory may be faster on some setup.
> >
> > AFAICT you could achieve the same with an IOMMU: pre-share (ie: add to
> > the device IOMMU page tables) a bunch of pages and keep bouncing data
> > to/from them in order to interact with the device, that way you could
> > avoid the map and unmaps (and is effectively how persistent grants
> > work in the blkif protocol).
>
> Yes it is possible to do the same with the virtio IOMMU. I was more arguing
> on the statement that pre-shared-memory is going to be slower than the IOMMU
> case.
>
> >
> > The thread referenced by Stefano seems to point out this shared memory
> > model is targeted for very limited hypervisors that don't have the
> > capacity to trap, decode and emulate accesses to memory?
>
> Technically we are in the same case for Xen on Arm as we don't have the
> IOREQ support yet. But I think IOREQ is worthwhile as it would enable
> existing unmodified Linux with virtio driver to boot on Xen.

Yes, I fully agree.

Roger.
Re: Virtio in Xen on Arm (based on IOREQ concept)
(+ Andre)

Hi Oleksandr,

On 21/07/2020 13:26, Oleksandr wrote:
> On 20.07.20 23:38, Stefano Stabellini wrote:
>> For instance, what's your take on notifications with virtio-mmio? How
>> are they modelled today? Are they good enough or do we need MSIs?
>
> Notifications are sent from device (backend) to the driver (frontend)
> using interrupts. Additional DM function was introduced for that purpose
> xendevicemodel_set_irq_level() which results in vgic_inject_irq() call.
>
> Currently, if device wants to notify a driver it should trigger the
> interrupt by calling that function twice (high level at first, then low
> level).

This doesn't look right to me. Assuming the interrupt is triggered when
the line is high, the backend should only issue the hypercall once to
set the level to high. Once the guest has finished processing all the
notifications, the backend would then call the hypercall to lower the
interrupt line.

This means the interrupt should keep firing as long as the interrupt
line is high.

It is quite possible that I took some shortcuts when implementing the
hypercall, so this should be corrected before anyone starts to rely on
it.

Cheers,

--
Julien Grall
Re: Virtio in Xen on Arm (based on IOREQ concept)
Julien Grall <julien@xen.org> writes:

> Hi Stefano,
>
> On 20/07/2020 21:37, Stefano Stabellini wrote:
>> On Mon, 20 Jul 2020, Roger Pau Monné wrote:
>>> On Mon, Jul 20, 2020 at 10:40:40AM +0100, Julien Grall wrote:
>>>>
>>>>
>>>> On 20/07/2020 10:17, Roger Pau Monné wrote:
>>>>> On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
>>>>>> On 17.07.20 18:00, Roger Pau Monné wrote:
>>>>>>> On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
>>>>>>> Do you have any plans to try to upstream a modification to the VirtIO
>>>>>>> spec so that grants (ie: abstract references to memory addresses) can
>>>>>>> be used on the VirtIO ring?
>>>>>>
>>>>>> But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
>>>>>> guest. Nothing to upsteam)
>>>>>
>>>>> OK, so there's no intention to add grants (or a similar interface) to
>>>>> the spec?
>>>>>
>>>>> I understand that you want to support unmodified VirtIO frontends, but
>>>>> I also think that long term frontends could negotiate with backends on
>>>>> the usage of grants in the shared ring, like any other VirtIO feature
>>>>> negotiated between the frontend and the backend.
>>>>>
>>>>> This of course needs to be on the spec first before we can start
>>>>> implementing it, and hence my question whether a modification to the
>>>>> spec in order to add grants has been considered.
>>>> The problem is not really the specification but the adoption in the
>>>> ecosystem. A protocol based on grant-tables would mostly only be used by Xen
>>>> therefore:
>>>> - It may be difficult to convince a proprietary OS vendor to invest
>>>> resource on implementing the protocol
>>>> - It would be more difficult to move in/out of Xen ecosystem.
>>>>
>>>> Both, may slow the adoption of Xen in some areas.
>>>
>>> Right, just to be clear my suggestion wasn't to force the usage of
>>> grants, but whether adding something along this lines was in the
>>> roadmap, see below.
>>>
>>>> If one is interested in security, then it would be better to work with the
>>>> other interested parties. I think it would be possible to use a virtual
>>>> IOMMU for this purpose.
>>>
>>> Yes, I've also heard rumors about using the (I assume VirtIO) IOMMU in
>>> order to protect what backends can map. This seems like a fine idea,
>>> and would allow us to gain the lost security without having to do the
>>> whole work ourselves.
>>>
>>> Do you know if there's anything published about this? I'm curious
>>> about how and where in the system the VirtIO IOMMU is/should be
>>> implemented.
>>
>> Not yet (as far as I know), but we have just started some discussons on
>> this topic within Linaro.
>>
>>
>> You should also be aware that there is another proposal based on
>> pre-shared-memory and memcpys to solve the virtio security issue:
>>
>> https://marc.info/?l=linux-kernel&m=158807398403549
>>
>> It would be certainly slower than the "virtio IOMMU" solution but it
>> would take far less time to develop and could work as a short-term
>> stop-gap.
>
> I don't think I agree with this blank statement. In the case of "virtio
> IOMMU", you would need to potentially map/unmap pages every request
> which would result to a lot of back and forth to the hypervisor.

Can a virtio-iommu just set bounds, when a device is initialised, on
where the memory will be in the kernel address space?

> So it may turn out that pre-shared-memory may be faster on some setup.

Certainly having to update the page permissions on every transaction is
going to be too slow for something that wants to avoid the performance
penalty of a bounce buffer.

--
Alex Bennée
Re: Virtio in Xen on Arm (based on IOREQ concept)
Julien Grall <julien@xen.org> writes:

> (+ Andree for the vGIC).
>
> Hi Stefano,
>
> On 20/07/2020 21:38, Stefano Stabellini wrote:
>> On Fri, 17 Jul 2020, Oleksandr wrote:
>>>>> *A few word about solution:*
>>>>> As it was mentioned at [1], in order to implement virtio-mmio Xen on Arm
>>>> Any plans for virtio-pci? Arm seems to be moving to the PCI bus, and
>>>> it would be very interesting from a x86 PoV, as I don't think
>>>> virtio-mmio is something that you can easily use on x86 (or even use
>>>> at all).
>>>
>>> Being honest I didn't consider virtio-pci so far. Julien's PoC (we are based
>>> on) provides support for the virtio-mmio transport
>>>
>>> which is enough to start working around VirtIO and is not as complex as
>>> virtio-pci. But it doesn't mean there is no way for virtio-pci in Xen.
>>>
>>> I think, this could be added in next steps. But the nearest target is
>>> virtio-mmio approach (of course if the community agrees on that).
>
>> Aside from complexity and easy-of-development, are there any other
>> architectural reasons for using virtio-mmio?
>
<snip>
>>
>> For instance, what's your take on notifications with virtio-mmio? How
>> are they modelled today?
>
> The backend will notify the frontend using an SPI. The other way around
> (frontend -> backend) is based on an MMIO write.
>
> We have an interface to allow the backend to control whether the
> interrupt level (i.e. low, high). However, the "old" vGIC doesn't handle
> properly level interrupts. So we would end up to treat level interrupts
> as edge.
>
> Technically, the problem is already existing with HW interrupts, but the
> HW should fire it again if the interrupt line is still asserted. Another
> issue is the interrupt may fire even if the interrupt line was
> deasserted (IIRC this caused some interesting problem with the Arch timer).
>
> I am a bit concerned that the issue will be more proeminent for virtual
> interrupts. I know that we have some gross hack in the vpl011 to handle
> a level interrupts. So maybe it is time to switch to the new vGIC?
>
>> Are they good enough or do we need MSIs?
>
> I am not sure whether virtio-mmio supports MSIs. However for virtio-pci,
> MSIs is going to be useful to improve performance. This may mean to
> expose an ITS, so we would need to add support for guest.

virtio-mmio doesn't support MSIs at the moment, although there have
been proposals to update the spec to allow them. At the moment the cost
of reading the ISR value and then writing an ack in vm_interrupt:

/* Read and acknowledge interrupts */
status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);

puts an extra vmexit cost, as we trap and emulate each access. Getting
an MSI via an exitless access to the GIC would be better, I think. I'm
not quite sure what the path for IRQs from Xen is.


>
> Cheers,


--
Alex Bennée
Re: Virtio in Xen on Arm (based on IOREQ concept)
Hi,

On 17/07/2020 19:34, Oleksandr wrote:
>
> On 17.07.20 18:00, Roger Pau Monné wrote:
>>> requires
>>> some implementation to forward guest MMIO access to a device model.
>>> And as
>>> it
>>> turned out the Xen on x86 contains most of the pieces to be able to
>>> use that
>>> transport (via existing IOREQ concept). Julien has already done a big
>>> amount
>>> of work in his PoC (xen/arm: Add support for Guest IO forwarding to a
>>> device emulator).
>>> Using that code as a base we managed to create a completely
>>> functional PoC
>>> with DomU
>>> running on virtio block device instead of a traditional Xen PV driver
>>> without
>>> modifications to DomU Linux. Our work is mostly about rebasing Julien's
>>> code on the actual
>>> codebase (Xen 4.14-rc4), various tweeks to be able to run emulator
>>> (virtio-disk backend)
>>> in other than Dom0 domain (in our system we have thin Dom0 and keep all
>>> backends
>>> in driver domain),
>> How do you handle this use-case? Are you using grants in the VirtIO
>> ring, or rather allowing the driver domain to map all the guest memory
>> and then placing gfn on the ring like it's commonly done with VirtIO?
>
> Second option. Xen grants are not used at all as well as event channel
> and Xenbus. That allows us to have guest
>
> *unmodified* which one of the main goals. Yes, this may sound (or even
> sounds) non-secure, but backend which runs in driver domain is allowed
> to map all guest memory.
>
> In current backend implementation a part of guest memory is mapped just
> to process guest request then unmapped back, there is no mappings in
> advance. The xenforeignmemory_map
>
> call is used for that purpose. For experiment I tried to map all guest
> memory in advance and just calculated pointer at runtime. Of course that
> logic performed better.

That works well for a PoC; however, I am not sure you can rely on it
long term, as a guest is free to modify its memory layout. For
instance, Linux may balloon memory in/out. You probably want to
consider something similar to the mapcache in QEMU.
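For reference, a minimal sketch of that per-request pattern using the
libxenforeignmemory calls mentioned above (error handling trimmed, and
the request handling itself is obviously made up):

#include <sys/mman.h>
#include <xenforeignmemory.h>

static void process_request(xenforeignmemory_handle *fmem, uint32_t domid,
                            const xen_pfn_t *gfns, size_t nr_pages)
{
    /* Map only the guest pages backing this request... */
    void *va = xenforeignmemory_map(fmem, domid, PROT_READ | PROT_WRITE,
                                    nr_pages, gfns, NULL);
    if (!va)
        return;

    /* ... handle the virtio request against 'va' here ... */

    /* ... and drop the mapping straight away, no mappings in advance. */
    xenforeignmemory_unmap(fmem, va, nr_pages);
}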

On a similar topic, I am a bit surprised you didn't encounter memory
exhaustion when trying to use virtio. Because of how Linux currently
works (see XSA-300), the backend domain has to have at least as much
RAM as the domains it serves. For instance, if you serve two domains
with 1GB of RAM each, then your backend would need at least 2GB, plus
some for its own purposes.

This probably wants to be resolved by allowing foreign mappings to be
paged out, as you would for memory assigned to userspace.

> I was thinking about guest static memory regions and forcing guest to
> allocate descriptors from them (in order not to map all guest memory,
> but a predefined region). But that implies modifying guest...

[...]

>>> misc fixes for our use-cases and tool support for the
>>> configuration.
>>> Unfortunately, Julien doesn’t have much time to allocate on the work
>>> anymore,
>>> so we would like to step in and continue.
>>>
>>> *A few word about the Xen code:*
>>> You can find the whole Xen series at [5]. The patches are in RFC state
>>> because
>>> some actions in the series should be reconsidered and implemented
>>> properly.
>>> Before submitting the final code for the review the first IOREQ patch
>>> (which is quite
>>> big) will be split into x86, Arm and common parts. Please note, x86 part
>>> wasn’t
>>> even build-tested so far and could be broken with that series. Also the
>>> series probably
>>> wants splitting into adding IOREQ on Arm (should be focused first) and
>>> tools support
>>> for the virtio-disk (which is going to be the first Virtio driver)
>>> configuration before going
>>> into the mailing list.
>> Sending first a patch series to enable IOREQs on Arm seems perfectly
>> fine, and it doesn't have to come with the VirtIO backend. In fact I
>> would recommend that you send that ASAP, so that you don't spend time
>> working on the backend that would likely need to be modified
>> according to the review received on the IOREQ series.
>
> Completely agree with you, I will send it after splitting IOREQ patch
> and performing some cleanup.
>
> However, it is going to take some time to make it properly taking into
> the account
>
> that personally I won't be able to test on x86.
I think other members of the community should be able to help here.
However, nowadays testing Xen on x86 is pretty easy with QEMU :).

Cheers,

--
Julien Grall
Re: Virtio in Xen on Arm (based on IOREQ concept)
On 21/07/2020 14:43, Julien Grall wrote:
> (+ Andre)
>
> Hi Oleksandr,
>
> On 21/07/2020 13:26, Oleksandr wrote:
>> On 20.07.20 23:38, Stefano Stabellini wrote:
>>> For instance, what's your take on notifications with virtio-mmio? How
>>> are they modelled today? Are they good enough or do we need MSIs?
>>
>> Notifications are sent from device (backend) to the driver (frontend)
>> using interrupts. Additional DM function was introduced for that
>> purpose xendevicemodel_set_irq_level() which results in
>> vgic_inject_irq() call.
>>
>> Currently, if device wants to notify a driver it should trigger the
>> interrupt by calling that function twice (high level at first, then
>> low level).
>
> This doesn't look right to me. Assuming the interrupt is trigger when
> the line is high-level, the backend should only issue the hypercall once
> to set the level to high. Once the guest has finish to process all the
> notifications the backend would then call the hypercall to lower the
> interrupt line.
>
> This means the interrupts should keep firing as long as the interrupt
> line is high.
>
> It is quite possible that I took some shortcut when implementing the
> hypercall, so this should be corrected before anyone start to rely on it.

So I think the key question is: are virtio interrupts level or edge
triggered? Both QEMU and kvmtool advertise virtio-mmio interrupts as
edge-triggered.

From skimming through the virtio spec I can't find any explicit mention
of the IRQ type, but the usage of MSIs indeed hints at an edge
property. Apparently reading the PCI ISR status register clears it,
which again sounds like edge. For virtio-mmio the driver needs to
explicitly clear the interrupt status register, which again says: edge
(as it's not the device clearing the status).

So the device should just notify the driver once, which would cause one
vgic_inject_irq() call. It would then be up to the driver to clear that
status, by reading the PCI ISR status or writing to virtio-mmio's
interrupt-acknowledge register.
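As a sketch of what that means on the device side (the names are
illustrative; only the status/ack behaviour matters here):

#include <stdint.h>

#define VIRTIO_MMIO_INT_VRING  (1u << 0)  /* "used ring updated" bit */

struct virtio_mmio_state {
    uint32_t isr;   /* exposed via VIRTIO_MMIO_INTERRUPT_STATUS */
    uint32_t spi;
};

/* Hypothetical helper: one edge notification, i.e. a single
 * vgic_inject_irq() underneath. */
void inject_spi_once(uint32_t spi);

static void notify_used_ring(struct virtio_mmio_state *s)
{
    s->isr |= VIRTIO_MMIO_INT_VRING;
    inject_spi_once(s->spi);
}

/* The driver clears the bits it has handled by writing them to
 * VIRTIO_MMIO_INTERRUPT_ACK. */
static void ack_write(struct virtio_mmio_state *s, uint32_t val)
{
    s->isr &= ~val;
}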

Does that make sense?

Cheers,
Andre
Re: Virtio in Xen on Arm (based on IOREQ concept)
Hi Alex,

Thank you for your feedback!

On 21/07/2020 15:15, Alex Bennée wrote:
> Julien Grall <julien@xen.org> writes:
>
>> (+ Andree for the vGIC).
>>
>> Hi Stefano,
>>
>> On 20/07/2020 21:38, Stefano Stabellini wrote:
>>> On Fri, 17 Jul 2020, Oleksandr wrote:
>>>>>> *A few word about solution:*
>>>>>> As it was mentioned at [1], in order to implement virtio-mmio Xen on Arm
>>>>> Any plans for virtio-pci? Arm seems to be moving to the PCI bus, and
>>>>> it would be very interesting from a x86 PoV, as I don't think
>>>>> virtio-mmio is something that you can easily use on x86 (or even use
>>>>> at all).
>>>>
>>>> Being honest I didn't consider virtio-pci so far. Julien's PoC (we are based
>>>> on) provides support for the virtio-mmio transport
>>>>
>>>> which is enough to start working around VirtIO and is not as complex as
>>>> virtio-pci. But it doesn't mean there is no way for virtio-pci in Xen.
>>>>
>>>> I think, this could be added in next steps. But the nearest target is
>>>> virtio-mmio approach (of course if the community agrees on that).
>>
>>> Aside from complexity and easy-of-development, are there any other
>>> architectural reasons for using virtio-mmio?
>>
> <snip>
>>>
>>> For instance, what's your take on notifications with virtio-mmio? How
>>> are they modelled today?
>>
>> The backend will notify the frontend using an SPI. The other way around
>> (frontend -> backend) is based on an MMIO write.
>>
>> We have an interface to allow the backend to control whether the
>> interrupt level (i.e. low, high). However, the "old" vGIC doesn't handle
>> properly level interrupts. So we would end up to treat level interrupts
>> as edge.
>>
>> Technically, the problem is already existing with HW interrupts, but the
>> HW should fire it again if the interrupt line is still asserted. Another
>> issue is the interrupt may fire even if the interrupt line was
>> deasserted (IIRC this caused some interesting problem with the Arch timer).
>>
>> I am a bit concerned that the issue will be more proeminent for virtual
>> interrupts. I know that we have some gross hack in the vpl011 to handle
>> a level interrupts. So maybe it is time to switch to the new vGIC?
>>
>>> Are they good enough or do we need MSIs?
>>
>> I am not sure whether virtio-mmio supports MSIs. However for virtio-pci,
>> MSIs is going to be useful to improve performance. This may mean to
>> expose an ITS, so we would need to add support for guest.
>
> virtio-mmio doesn't support MSI's at the moment although there have been
> proposals to update the spec to allow them. At the moment the cost of
> reading the ISR value and then writing an ack in vm_interrupt:
>
> /* Read and acknowledge interrupts */
> status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
> writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
>

Hmmmm, the current way to handle MMIO is the following:
 * Pause the vCPU
 * Forward the access to the backend domain
 * Schedule the backend domain
 * Wait for the access to be handled
 * Unpause the vCPU

So the sequence is going to be fairly expensive on Xen.
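For context, the backend's side of that sequence boils down to the
usual ioreq state machine. A rough sketch based on the public ioreq
interface (the header path and the handle_mmio_* helpers are
assumptions of mine):

#include <stdint.h>
#include <xen/hvm/ioreq.h>

/* Hypothetical device-model callbacks. */
uint64_t handle_mmio_read(uint64_t addr, uint32_t size);
void handle_mmio_write(uint64_t addr, uint32_t size, uint64_t data);

/* Handle one forwarded access; Xen unpauses the vCPU once the state
 * reaches STATE_IORESP_READY and the event channel is notified. */
static void serve_one(ioreq_t *req)
{
    if (req->state != STATE_IOREQ_READY)
        return;

    req->state = STATE_IOREQ_INPROCESS;

    if (req->dir == IOREQ_READ)
        req->data = handle_mmio_read(req->addr, req->size);
    else
        handle_mmio_write(req->addr, req->size, req->data);

    req->state = STATE_IORESP_READY;
}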

> puts an extra vmexit cost to trap an emulate each exit. Getting an MSI
> via an exitless access to the GIC would be better I think.
> I'm not quite
> sure what the path to IRQs from Xen is.

vmexit on Xen on Arm is pretty cheap compared to KVM, as we don't save
a lot of things. In this situation, handling an extra trap for the
interrupt is likely to be negligible compared to the sequence above.

I am assuming the sequence is also going to be used for MSIs, right?

It feels to me that it would be worth spending time investigating the
cost of that sequence. It might be possible to optimize the ACK and
avoid waiting for the backend to handle the access.

Cheers,

--
Julien Grall
Re: Virtio in Xen on Arm (based on IOREQ concept)
On 21.07.20 17:32, André Przywara wrote:
> On 21/07/2020 14:43, Julien Grall wrote:

Hello Andre, Julien


>> (+ Andre)
>>
>> Hi Oleksandr,
>>
>> On 21/07/2020 13:26, Oleksandr wrote:
>>> On 20.07.20 23:38, Stefano Stabellini wrote:
>>>> For instance, what's your take on notifications with virtio-mmio? How
>>>> are they modelled today? Are they good enough or do we need MSIs?
>>> Notifications are sent from device (backend) to the driver (frontend)
>>> using interrupts. Additional DM function was introduced for that
>>> purpose xendevicemodel_set_irq_level() which results in
>>> vgic_inject_irq() call.
>>>
>>> Currently, if device wants to notify a driver it should trigger the
>>> interrupt by calling that function twice (high level at first, then
>>> low level).
>> This doesn't look right to me. Assuming the interrupt is trigger when
>> the line is high-level, the backend should only issue the hypercall once
>> to set the level to high. Once the guest has finish to process all the
>> notifications the backend would then call the hypercall to lower the
>> interrupt line.
>>
>> This means the interrupts should keep firing as long as the interrupt
>> line is high.
>>
>> It is quite possible that I took some shortcut when implementing the
>> hypercall, so this should be corrected before anyone start to rely on it.
> So I think the key question is: are virtio interrupts level or edge
> triggered? Both QEMU and kvmtool advertise virtio-mmio interrupts as
> edge-triggered.
> From skimming through the virtio spec I can't find any explicit
> mentioning of the type of IRQ, but the usage of MSIs indeed hints at
> using an edge property. Apparently reading the PCI ISR status register
> clears it, which again sounds like edge. For virtio-mmio the driver
> needs to explicitly clear the interrupt status register, which again
> says: edge (as it's not the device clearing the status).
>
> So the device should just notify the driver once, which would cause one
> vgic_inject_irq() call. It would be then up to the driver to clear up
> that status, by reading PCI ISR status or writing to virtio-mmio's
> interrupt-acknowledge register.
>
> Does that make sense?
When implementing the Xen backend, I didn't have an already working
example, so I only guessed. I looked at how kvmtool behaves when
actually triggering the interrupt on Arm [1].

Taking into account that the Xen PoC on Arm advertises [2] the same irq
type (TYPE_EDGE_RISING) as kvmtool [3], I decided to follow its model
of triggering an interrupt. Could you please explain, is this wrong?


[1]
https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/tree/arm/gic.c#n418

[2]
https://github.com/xen-troops/xen/blob/ioreq_4.14_ml/tools/libxl/libxl_arm.c#L727

[3]
https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/tree/virtio/mmio.c#n270

--
Regards,

Oleksandr Tyshchenko
Re: Virtio in Xen on Arm (based on IOREQ concept)
On 21/07/2020 15:52, Oleksandr wrote:
>
> On 21.07.20 17:32, André Przywara wrote:
>> On 21/07/2020 14:43, Julien Grall wrote:
>
> Hello Andre, Julien
>
>
>>> (+ Andre)
>>>
>>> Hi Oleksandr,
>>>
>>> On 21/07/2020 13:26, Oleksandr wrote:
>>>> On 20.07.20 23:38, Stefano Stabellini wrote:
>>>>> For instance, what's your take on notifications with virtio-mmio? How
>>>>> are they modelled today? Are they good enough or do we need MSIs?
>>>> Notifications are sent from device (backend) to the driver (frontend)
>>>> using interrupts. Additional DM function was introduced for that
>>>> purpose xendevicemodel_set_irq_level() which results in
>>>> vgic_inject_irq() call.
>>>>
>>>> Currently, if device wants to notify a driver it should trigger the
>>>> interrupt by calling that function twice (high level at first, then
>>>> low level).
>>> This doesn't look right to me. Assuming the interrupt is trigger when
>>> the line is high-level, the backend should only issue the hypercall once
>>> to set the level to high. Once the guest has finish to process all the
>>> notifications the backend would then call the hypercall to lower the
>>> interrupt line.
>>>
>>> This means the interrupts should keep firing as long as the interrupt
>>> line is high.
>>>
>>> It is quite possible that I took some shortcut when implementing the
>>> hypercall, so this should be corrected before anyone start to rely on
>>> it.
>> So I think the key question is: are virtio interrupts level or edge
>> triggered? Both QEMU and kvmtool advertise virtio-mmio interrupts as
>> edge-triggered.
>>  From skimming through the virtio spec I can't find any explicit
>> mentioning of the type of IRQ, but the usage of MSIs indeed hints at
>> using an edge property. Apparently reading the PCI ISR status register
>> clears it, which again sounds like edge. For virtio-mmio the driver
>> needs to explicitly clear the interrupt status register, which again
>> says: edge (as it's not the device clearing the status).
>>
>> So the device should just notify the driver once, which would cause one
>> vgic_inject_irq() call. It would be then up to the driver to clear up
>> that status, by reading PCI ISR status or writing to virtio-mmio's
>> interrupt-acknowledge register.
>>
>> Does that make sense?
> When implementing Xen backend, I didn't have an already working example
> so only guessed. I looked how kvmtool behaved when actually triggering
> the interrupt on Arm [1].
>
> Taking into the account that Xen PoC on Arm advertises [2] the same irq
> type (TYPE_EDGE_RISING) as kvmtool [3] I decided to follow the model of
> triggering an interrupt. Could you please explain, is this wrong?

Yes, kvmtool does a double call needlessly (on x86, ppc and arm; mips
is correct).
I just chased it down in the kernel: a KVM_IRQ_LINE ioctl with
level=low is ignored when the target IRQ is configured as edge (which
it is, because the DT says so); check vgic_validate_injection() in the
kernel.

So you should only ever need one call to set the line "high" (actually:
to trigger the edge pulse).
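So, following that, the two-call sketch from earlier in the thread
collapses to a single assertion, something like:

#include <xendevicemodel.h>

/* Sketch: with an edge-triggered virtio-mmio SPI, one call is enough. */
static int notify_guest(xendevicemodel_handle *dmod, domid_t domid,
                        unsigned int virtio_irq)
{
    return xendevicemodel_set_irq_level(dmod, domid, virtio_irq, 1);
}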

Cheers,
Andre.

>
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/tree/arm/gic.c#n418
>
>
> [2]
> https://github.com/xen-troops/xen/blob/ioreq_4.14_ml/tools/libxl/libxl_arm.c#L727
>
>
> [3]
> https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/tree/virtio/mmio.c#n270
>
>
Re: Virtio in Xen on Arm (based on IOREQ concept)
On 21.07.20 17:58, André Przywara wrote:
> On 21/07/2020 15:52, Oleksandr wrote:
>> On 21.07.20 17:32, André Przywara wrote:
>>> On 21/07/2020 14:43, Julien Grall wrote:
>> Hello Andre, Julien
>>
>>
>>>> (+ Andre)
>>>>
>>>> Hi Oleksandr,
>>>>
>>>> On 21/07/2020 13:26, Oleksandr wrote:
>>>>> On 20.07.20 23:38, Stefano Stabellini wrote:
>>>>>> For instance, what's your take on notifications with virtio-mmio? How
>>>>>> are they modelled today? Are they good enough or do we need MSIs?
>>>>> Notifications are sent from device (backend) to the driver (frontend)
>>>>> using interrupts. Additional DM function was introduced for that
>>>>> purpose xendevicemodel_set_irq_level() which results in
>>>>> vgic_inject_irq() call.
>>>>>
>>>>> Currently, if device wants to notify a driver it should trigger the
>>>>> interrupt by calling that function twice (high level at first, then
>>>>> low level).
>>>> This doesn't look right to me. Assuming the interrupt is trigger when
>>>> the line is high-level, the backend should only issue the hypercall once
>>>> to set the level to high. Once the guest has finish to process all the
>>>> notifications the backend would then call the hypercall to lower the
>>>> interrupt line.
>>>>
>>>> This means the interrupts should keep firing as long as the interrupt
>>>> line is high.
>>>>
>>>> It is quite possible that I took some shortcut when implementing the
>>>> hypercall, so this should be corrected before anyone start to rely on
>>>> it.
>>> So I think the key question is: are virtio interrupts level or edge
>>> triggered? Both QEMU and kvmtool advertise virtio-mmio interrupts as
>>> edge-triggered.
>>>  From skimming through the virtio spec I can't find any explicit
>>> mentioning of the type of IRQ, but the usage of MSIs indeed hints at
>>> using an edge property. Apparently reading the PCI ISR status register
>>> clears it, which again sounds like edge. For virtio-mmio the driver
>>> needs to explicitly clear the interrupt status register, which again
>>> says: edge (as it's not the device clearing the status).
>>>
>>> So the device should just notify the driver once, which would cause one
>>> vgic_inject_irq() call. It would be then up to the driver to clear up
>>> that status, by reading PCI ISR status or writing to virtio-mmio's
>>> interrupt-acknowledge register.
>>>
>>> Does that make sense?
>> When implementing Xen backend, I didn't have an already working example
>> so only guessed. I looked how kvmtool behaved when actually triggering
>> the interrupt on Arm [1].
>>
>> Taking into the account that Xen PoC on Arm advertises [2] the same irq
>> type (TYPE_EDGE_RISING) as kvmtool [3] I decided to follow the model of
>> triggering an interrupt. Could you please explain, is this wrong?
> Yes, kvmtool does a double call needlessly (on x86, ppc and arm, mips is
> correct).
> I just chased it down in the kernel, a KVM_IRQ_LINE ioctl with level=low
> is ignored when the target IRQ is configured as edge (which it is,
> because the DT says so), check vgic_validate_injection() in the kernel.
>
> So you should only ever need one call to set the line "high" (actually:
> trigger the edge pulse).

Got it, thanks for the explanation. I have just removed the extra
action (setting the level low) and checked that it still works.


--
Regards,

Oleksandr Tyshchenko
Re: Virtio in Xen on Arm (based on IOREQ concept)
On Tue, 21 Jul 2020, Alex Bennée wrote:
> Julien Grall <julien@xen.org> writes:
>
> > Hi Stefano,
> >
> > On 20/07/2020 21:37, Stefano Stabellini wrote:
> >> On Mon, 20 Jul 2020, Roger Pau Monné wrote:
> >>> On Mon, Jul 20, 2020 at 10:40:40AM +0100, Julien Grall wrote:
> >>>>
> >>>>
> >>>> On 20/07/2020 10:17, Roger Pau Monné wrote:
> >>>>> On Fri, Jul 17, 2020 at 09:34:14PM +0300, Oleksandr wrote:
> >>>>>> On 17.07.20 18:00, Roger Pau Monné wrote:
> >>>>>>> On Fri, Jul 17, 2020 at 05:11:02PM +0300, Oleksandr Tyshchenko wrote:
> >>>>>>> Do you have any plans to try to upstream a modification to the VirtIO
> >>>>>>> spec so that grants (ie: abstract references to memory addresses) can
> >>>>>>> be used on the VirtIO ring?
> >>>>>>
> >>>>>> But VirtIO spec hasn't been modified as well as VirtIO infrastructure in the
> >>>>>> guest. Nothing to upsteam)
> >>>>>
> >>>>> OK, so there's no intention to add grants (or a similar interface) to
> >>>>> the spec?
> >>>>>
> >>>>> I understand that you want to support unmodified VirtIO frontends, but
> >>>>> I also think that long term frontends could negotiate with backends on
> >>>>> the usage of grants in the shared ring, like any other VirtIO feature
> >>>>> negotiated between the frontend and the backend.
> >>>>>
> >>>>> This of course needs to be on the spec first before we can start
> >>>>> implementing it, and hence my question whether a modification to the
> >>>>> spec in order to add grants has been considered.
> >>>> The problem is not really the specification but the adoption in the
> >>>> ecosystem. A protocol based on grant-tables would mostly only be used by Xen
> >>>> therefore:
> >>>> - It may be difficult to convince a proprietary OS vendor to invest
> >>>> resource on implementing the protocol
> >>>> - It would be more difficult to move in/out of Xen ecosystem.
> >>>>
> >>>> Both, may slow the adoption of Xen in some areas.
> >>>
> >>> Right, just to be clear my suggestion wasn't to force the usage of
> >>> grants, but whether adding something along this lines was in the
> >>> roadmap, see below.
> >>>
> >>>> If one is interested in security, then it would be better to work with the
> >>>> other interested parties. I think it would be possible to use a virtual
> >>>> IOMMU for this purpose.
> >>>
> >>> Yes, I've also heard rumors about using the (I assume VirtIO) IOMMU in
> >>> order to protect what backends can map. This seems like a fine idea,
> >>> and would allow us to gain the lost security without having to do the
> >>> whole work ourselves.
> >>>
> >>> Do you know if there's anything published about this? I'm curious
> >>> about how and where in the system the VirtIO IOMMU is/should be
> >>> implemented.
> >>
> >> Not yet (as far as I know), but we have just started some discussons on
> >> this topic within Linaro.
> >>
> >>
> >> You should also be aware that there is another proposal based on
> >> pre-shared-memory and memcpys to solve the virtio security issue:
> >>
> >> https://marc.info/?l=linux-kernel&m=158807398403549
> >>
> >> It would be certainly slower than the "virtio IOMMU" solution but it
> >> would take far less time to develop and could work as a short-term
> >> stop-gap.
> >
> > I don't think I agree with this blank statement. In the case of "virtio
> > IOMMU", you would need to potentially map/unmap pages every request
> > which would result to a lot of back and forth to the hypervisor.

Yes, that's true.


> Can a virtio-iommu just set bounds when a device is initialised as to
> where memory will be in the kernel address space?

First, to avoid possible miscommunication, let me premise that what
Julien and I are calling "virtio IOMMU" is not an existing virtio-iommu
driver of some sort, but an idea for a cross-domain virtual IOMMU that
lets the frontends explicitly permit memory to be accessed by the
backends. Hopefully it was clear already, but better to be sure :-)


If you are asking whether it would be possible to use the virtual IOMMU
just to set up memory at startup time, it certainly could be done, but
effectively we would end up with one of the following scenarios:

1) one pre-shared bounce buffer
Effectively the same as https://marc.info/?l=linux-kernel&m=158807398403549
still requires memcpys
could still be nicer than Qualcomm's proposal because easier to
configure?

2) all domU memory allowed access to the backend
Not actually any more secure than placing the backends in dom0


Otherwise we need the dynamic maps/unmaps.

For completeness, if we could write the whole software stack from
scratch, it would also be possible to architect a protocol (like
virtio-net) and the software stack above it to always allocate memory
from a given buffer (the pre-shared buffer), hence greatly reducing the
amount of required memcpys, maybe even down to zero. In reality, most
interfaces in Linux and POSIX userspace expect the application to be the
one providing the buffer, hence they would require memcpys in the kernel
to move data between the user-provided buffers and the pre-shared buffers.
Re: Virtio in Xen on Arm (based on IOREQ concept)
On Tue, 21 Jul 2020, Julien Grall wrote:
> Hi Alex,
>
> Thank you for your feedback!
>
> On 21/07/2020 15:15, Alex Bennée wrote:
> > Julien Grall <julien@xen.org> writes:
> >
> > > (+ Andree for the vGIC).
> > >
> > > Hi Stefano,
> > >
> > > On 20/07/2020 21:38, Stefano Stabellini wrote:
> > > > On Fri, 17 Jul 2020, Oleksandr wrote:
> > > > > > > *A few word about solution:*
> > > > > > > As it was mentioned at [1], in order to implement virtio-mmio Xen
> > > > > > > on Arm
> > > > > > Any plans for virtio-pci? Arm seems to be moving to the PCI bus, and
> > > > > > it would be very interesting from a x86 PoV, as I don't think
> > > > > > virtio-mmio is something that you can easily use on x86 (or even use
> > > > > > at all).
> > > > >
> > > > > Being honest I didn't consider virtio-pci so far. Julien's PoC (we are
> > > > > based
> > > > > on) provides support for the virtio-mmio transport
> > > > >
> > > > > which is enough to start working around VirtIO and is not as complex
> > > > > as
> > > > > virtio-pci. But it doesn't mean there is no way for virtio-pci in Xen.
> > > > >
> > > > > I think, this could be added in next steps. But the nearest target is
> > > > > virtio-mmio approach (of course if the community agrees on that).
> > >
> > > > Aside from complexity and easy-of-development, are there any other
> > > > architectural reasons for using virtio-mmio?
> > >
> > <snip>
> > > >
> > > > For instance, what's your take on notifications with virtio-mmio? How
> > > > are they modelled today?
> > >
> > > The backend will notify the frontend using an SPI. The other way around
> > > (frontend -> backend) is based on an MMIO write.
> > >
> > > We have an interface to allow the backend to control the interrupt
> > > level (i.e. low, high). However, the "old" vGIC doesn't properly handle
> > > level interrupts. So we would end up treating level interrupts as edge.
> > >
> > > Technically, the problem already exists with HW interrupts, but the
> > > HW should fire it again if the interrupt line is still asserted. Another
> > > issue is that the interrupt may fire even if the interrupt line was
> > > deasserted (IIRC this caused some interesting problems with the Arch
> > > timer).
> > >
> > > I am a bit concerned that the issue will be more prominent for virtual
> > > interrupts. I know that we have some gross hack in the vpl011 to handle
> > > level interrupts. So maybe it is time to switch to the new vGIC?
> > >
> > > > Are they good enough or do we need MSIs?
> > >
> > > I am not sure whether virtio-mmio supports MSIs. However for virtio-pci,
> > > MSIs are going to be useful to improve performance. This may mean
> > > exposing an ITS, so we would need to add support for it in the guest.
> >
> > virtio-mmio doesn't support MSIs at the moment, although there have been
> > proposals to update the spec to allow them. At the moment the cost of
> > reading the ISR value and then writing an ack in vm_interrupt:
> >
> > /* Read and acknowledge interrupts */
> > status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS);
> > writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK);
> >
>
> Hmmmm, the current way to handle MMIO is the following:
> * pause the vCPU
> * Forward the access to the backend domain
> * Schedule the backend domain
> * Wait for the access to be handled
> * unpause the vCPU
>
> So the sequence is going to be fairly expensive on Xen.
>
> > puts an extra vmexit cost to trap and emulate each exit. Getting an MSI
> > via an exitless access to the GIC would be better I think.
> > I'm not quite
> > sure what the path to IRQs from Xen is.
>
> vmexit on Xen on Arm is pretty cheap compared to KVM as we don't save a lot of
> things. In this situation, handling an extra trap for the interrupt is
> likely to be negligible compared to the sequence above.

+1


> I am assuming the sequence is also going to be used by the MSIs, right?
>
> It feels to me that it would be worth spending time to investigate the cost of
> that sequence. It might be possible to optimize the ACK and avoid waiting for
> the backend to handle the access.

+1
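
To make the cost above concrete, here is roughly what the guest side does
(a simplified sketch, not the actual vm_interrupt() code); with the current
scheme each of the two MMIO accesses below goes through the
pause/forward/schedule/wait/unpause sequence quoted above:

    #include <linux/interrupt.h>
    #include <linux/io.h>
    #include <linux/virtio_mmio.h>

    /* Simplified sketch of the guest interrupt path for virtio-mmio. */
    static irqreturn_t vm_interrupt_sketch(int irq, void *opaque)
    {
        void __iomem *base = opaque;
        unsigned long status;

        /* Trap #1: forwarded to the backend domain to read the ISR. */
        status = readl(base + VIRTIO_MMIO_INTERRUPT_STATUS);

        /* Trap #2: forwarded again to acknowledge the interrupt. */
        writel(status, base + VIRTIO_MMIO_INTERRUPT_ACK);

        /* ... then process the used rings and/or config change ... */
        return IRQ_HANDLED;
    }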
Re: Virtio in Xen on Arm (based on IOREQ concept) [ In reply to ]
On 21/07/2020 17:09, Oleksandr wrote:
>
> On 21.07.20 17:58, André Przywara wrote:
>> On 21/07/2020 15:52, Oleksandr wrote:
>>> On 21.07.20 17:32, André Przywara wrote:
>>>> On 21/07/2020 14:43, Julien Grall wrote:
>>> Hello Andre, Julien
>>>
>>>
>>>>> (+ Andre)
>>>>>
>>>>> Hi Oleksandr,
>>>>>
>>>>> On 21/07/2020 13:26, Oleksandr wrote:
>>>>>> On 20.07.20 23:38, Stefano Stabellini wrote:
>>>>>>> For instance, what's your take on notifications with virtio-mmio?
>>>>>>> How
>>>>>>> are they modelled today? Are they good enough or do we need MSIs?
>>>>>> Notifications are sent from device (backend) to the driver (frontend)
>>>>>> using interrupts. Additional DM function was introduced for that
>>>>>> purpose xendevicemodel_set_irq_level() which results in
>>>>>> vgic_inject_irq() call.
>>>>>>
>>>>>> Currently, if the device wants to notify a driver it should trigger the
>>>>>> interrupt by calling that function twice (high level first, then
>>>>>> low level).
>>>>> This doesn't look right to me. Assuming the interrupt is triggered when
>>>>> the line is high, the backend should only issue the hypercall once
>>>>> to set the level to high. Once the guest has finished processing all the
>>>>> notifications, the backend would then call the hypercall to lower the
>>>>> interrupt line.
>>>>>
>>>>> This means the interrupts should keep firing as long as the interrupt
>>>>> line is high.
>>>>>
>>>>> It is quite possible that I took some shortcuts when implementing the
>>>>> hypercall, so this should be corrected before anyone starts to rely on
>>>>> it.
>>>> So I think the key question is: are virtio interrupts level or edge
>>>> triggered? Both QEMU and kvmtool advertise virtio-mmio interrupts as
>>>> edge-triggered.
>>>>   From skimming through the virtio spec I can't find any explicit
>>>> mentioning of the type of IRQ, but the usage of MSIs indeed hints at
>>>> using an edge property. Apparently reading the PCI ISR status register
>>>> clears it, which again sounds like edge. For virtio-mmio the driver
>>>> needs to explicitly clear the interrupt status register, which again
>>>> says: edge (as it's not the device clearing the status).
>>>>
>>>> So the device should just notify the driver once, which would cause one
>>>> vgic_inject_irq() call. It would be then up to the driver to clear up
>>>> that status, by reading PCI ISR status or writing to virtio-mmio's
>>>> interrupt-acknowledge register.
>>>>
>>>> Does that make sense?
>>> When implementing the Xen backend, I didn't have an already working example,
>>> so I only guessed. I looked at how kvmtool behaves when actually triggering
>>> the interrupt on Arm [1].
>>>
>>> Taking into account that the Xen PoC on Arm advertises [2] the same IRQ
>>> type (TYPE_EDGE_RISING) as kvmtool [3], I decided to follow its model of
>>> triggering an interrupt. Could you please explain, is this wrong?
>> Yes, kvmtool does a double call needlessly (on x86, ppc and arm, mips is
>> correct).
>> I just chased it down in the kernel, a KVM_IRQ_LINE ioctl with level=low
>> is ignored when the target IRQ is configured as edge (which it is,
>> because the DT says so), check vgic_validate_injection() in the kernel.
>>
>> So you should only ever need one call to set the line "high" (actually:
>> trigger the edge pulse).
>
> Got it, thanks for the explanation. Have just removed an extra action
> (setting low level) and checked.
>

Just for the record: the KVM API documentation explicitly mentions:
"Note that edge-triggered interrupts require the level to be set to 1
and then back to 0." So kvmtool is just following the book.

Setting it to 0 still does nothing *on ARM*, and the x86 IRQ code is far
too convoluted to easily judge what's really happening here. For MSIs at
least it's equally ignored.

So I guess a clean implementation in Xen does not need two calls, but
some folks with understanding of x86 IRQ handling in Xen should confirm.

Cheers,
Andre.
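
Following the conclusion above, the backend notification should then reduce
to a single pulse per notification. A minimal sketch, assuming the DM call
from the RFC series keeps roughly this signature (it may still change
before merging):

    #include <stdint.h>
    #include <xendevicemodel.h>

    /* Notify the frontend that new used-ring entries are available.  With
     * the interrupt advertised as edge-rising (as in the PoC device tree),
     * one call to raise the line should be enough; no second call to lower
     * it again is needed on Arm, per the discussion above. */
    static int notify_guest(xendevicemodel_handle *dmod, domid_t domid,
                            uint32_t virtio_irq)
    {
        /* Signature as used by the RFC series; assumption, not a final API. */
        return xendevicemodel_set_irq_level(dmod, domid, virtio_irq, 1);
    }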
Re: Virtio in Xen on Arm (based on IOREQ concept) [ In reply to ]
On 21.07.20 17:27, Julien Grall wrote:
> Hi,

Hello Julien


>
> On 17/07/2020 19:34, Oleksandr wrote:
>>
>> On 17.07.20 18:00, Roger Pau Monné wrote:
>>>> requires
>>>> some implementation to forward guest MMIO access to a device model.
>>>> And as
>>>> it
>>>> turned out the Xen on x86 contains most of the pieces to be able to
>>>> use that
>>>> transport (via existing IOREQ concept). Julien has already done a
>>>> big amount
>>>> of work in his PoC (xen/arm: Add support for Guest IO forwarding to a
>>>> device emulator).
>>>> Using that code as a base we managed to create a completely
>>>> functional PoC
>>>> with DomU
>>>> running on virtio block device instead of a traditional Xen PV driver
>>>> without
>>>> modifications to DomU Linux. Our work is mostly about rebasing
>>>> Julien's
>>>> code on the actual
>>>> codebase (Xen 4.14-rc4), various tweeks to be able to run emulator
>>>> (virtio-disk backend)
>>>> in other than Dom0 domain (in our system we have thin Dom0 and keep
>>>> all
>>>> backends
>>>> in driver domain),
>>> How do you handle this use-case? Are you using grants in the VirtIO
>>> ring, or rather allowing the driver domain to map all the guest memory
>>> and then placing gfn on the ring like it's commonly done with VirtIO?
>>
>> The second option. Xen grants are not used at all, nor are the event
>> channel and Xenbus. That allows us to have the guest *unmodified*, which
>> is one of the main goals. Yes, this may sound (or even does sound)
>> non-secure, but the backend which runs in the driver domain is allowed
>> to map all guest memory.
>>
>> In the current backend implementation a part of guest memory is mapped
>> just to process a guest request and then unmapped again; there are no
>> mappings in advance. The xenforeignmemory_map call is used for that
>> purpose. As an experiment I tried to map all guest memory in advance and
>> just calculated the pointer at runtime. Of course that logic performed
>> better.
>
> That works well for a PoC, however I am not sure you can rely on it
> long term as a guest is free to modify its memory layout. For
> instance, Linux may balloon in/out memory. You probably want to
> consider something similar to mapcache in QEMU.
Yes, that was considered and even tried.
The current backend implementation maps/unmaps only the needed part of
guest memory for each request, with some kind of mapcache. I borrowed the
x86 logic on Arm to invalidate the mapcache on the
XENMEM_decrease_reservation call, so if the mapcache is in use it will be
cleared. Hopefully a DomU without backends running is not going to balloon
memory in/out often.
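
For reference, the per-request model described above is essentially the
following pattern (simplified sketch, error handling trimmed; the actual
backend adds the mapcache on top of this):

    #include <sys/mman.h>
    #include <xenforeignmemory.h>

    /* Map the guest pages referenced by one request, process it, unmap.
     * The real backend caches recent mappings and invalidates the cache
     * on XENMEM_decrease_reservation, as described above. */
    static int handle_request(xenforeignmemory_handle *fmem, domid_t domid,
                              const xen_pfn_t *gfns, size_t nr_pages)
    {
        int err[nr_pages];
        void *va = xenforeignmemory_map(fmem, domid, PROT_READ | PROT_WRITE,
                                        nr_pages, gfns, err);
        if (!va)
            return -1;

        /* ... read/write the request payload through 'va' ... */

        xenforeignmemory_unmap(fmem, va, nr_pages);
        return 0;
    }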


>
> On a similar topic, I am a bit surprised you didn't encounter memory
> exhaustion when trying to use virtio. Because of how Linux currently
> works (see XSA-300), the backend domain has to have at least as much RAM
> as the domains it serves. For instance, if you serve two domains with
> 1GB of RAM each, then your backend would need at least 2GB + some for
> its own purposes.
I understand these bits. You have already warned me about that. When
playing with mapping the whole guest memory in advance, I gave the DomU
only 512MB, which was enough to avoid memory exhaustion in my
environment. Then I switched to the "map/unmap at runtime" model.


>>>>
>>>> *A few word about the Xen code:*
>>>> You can find the whole Xen series at [5]. The patches are in RFC state
>>>> because
>>>> some actions in the series should be reconsidered and implemented
>>>> properly.
>>>> Before submitting the final code for the review the first IOREQ patch
>>>> (which is quite
>>>> big) will be split into x86, Arm and common parts. Please note, x86
>>>> part
>>>> wasn’t
>>>> even build-tested so far and could be broken with that series. Also
>>>> the
>>>> series probably
>>>> wants splitting into adding IOREQ on Arm (should be focused first) and
>>>> tools support
>>>> for the virtio-disk (which is going to be the first Virtio driver)
>>>> configuration before going
>>>> into the mailing list.
>>> Sending first a patch series to enable IOREQs on Arm seems perfectly
>>> fine, and it doesn't have to come with the VirtIO backend. In fact I
>>> would recommend that you send that ASAP, so that you don't spend time
>>> working on the backend that would likely need to be modified
>>> according to the review received on the IOREQ series.
>>
>> Completely agree with you, I will send it after splitting IOREQ patch
>> and performing some cleanup.
>>
>> However, it is going to take some time to do it properly, taking
>> into account that personally I won't be able to test on x86.
> I think other members of the community should be able to help here.
> However, nowadays testing Xen on x86 is pretty easy with QEMU :).

That's good.


--
Regards,

Oleksandr Tyshchenko
Re: Virtio in Xen on Arm (based on IOREQ concept) [ In reply to ]
On 21.07.20 17:27, Julien Grall wrote:
> Hi,

Hello


>
> On a similar topic, I am a bit surprised you didn't encounter memory
> exhaustion when trying to use virtio. Because on how Linux currently
> works (see XSA-300), the backend domain as to have a least as much RAM
> as the domain it serves. For instance, you have serve two domains with
> 1GB of RAM each, then your backend would need at least 2GB + some for
> its own purpose.
>
> This probably wants to be resolved by allowing foreign mappings to be
> "paged" out as you would for memory assigned to userspace.

I didn't notice the last sentence initially. Could you please explain your
idea in detail if possible? Does it mean that, if implemented, it would be
feasible to map all guest memory regardless of how much memory the guest
has? Avoiding map/unmap of memory for each guest request would give us
better performance (of course, taking care of the fact that the guest
memory layout could change)... Actually, what I understand from looking at
kvmtool is that it does not map/unmap memory dynamically, it just
calculates virtual addresses according to the gfn provided.
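
For comparison, the reason kvmtool can get away without map/unmap is that
guest RAM is simply part of the VMM's own address space, so the translation
is plain arithmetic. A sketch of the idea (not kvmtool's actual code):

    #include <stdint.h>
    #include <stddef.h>

    /* KVM-style model: guest RAM is one big mmap() in the VMM process, so a
     * guest physical address converts to a host virtual address with a
     * simple offset calculation -- no per-request mapping needed. */
    struct guest_ram {
        uint64_t gpa_base;    /* guest physical base of the RAM region */
        uint64_t size;        /* size of the region                    */
        uint8_t *host_base;   /* where the region is mmap()ed          */
    };

    static void *gpa_to_hva(const struct guest_ram *ram, uint64_t gpa)
    {
        if (gpa < ram->gpa_base || gpa - ram->gpa_base >= ram->size)
            return NULL;      /* address outside guest RAM */
        return ram->host_base + (gpa - ram->gpa_base);
    }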


--
Regards,

Oleksandr Tyshchenko
Re: Virtio in Xen on Arm (based on IOREQ concept) [ In reply to ]
Hi Oleksandr,

On 21/07/2020 19:16, Oleksandr wrote:
>
> On 21.07.20 17:27, Julien Grall wrote:
>> On a similar topic, I am a bit surprised you didn't encounter memory
>> exhaustion when trying to use virtio. Because on how Linux currently
>> works (see XSA-300), the backend domain as to have a least as much RAM
>> as the domain it serves. For instance, you have serve two domains with
>> 1GB of RAM each, then your backend would need at least 2GB + some for
>> its own purpose.
>>
>> This probably wants to be resolved by allowing foreign mapping to be
>> "paging" out as you would for memory assigned to a userspace.
>
> Didn't notice the last sentence initially. Could you please explain your
> idea in detail if possible. Does it mean if implemented it would be
> feasible to map all guest memory regardless of how much memory the guest
> has?
>
> Avoiding map/unmap memory each guest request would allow us to have
> better performance (of course with taking care of the fact that guest
> memory layout could be changed)...

I will explain that below. But first, let me comment on KVM.

> Actually what I understand looking at
> kvmtool is the fact it does not map/unmap memory dynamically, just
> calculate virt addresses according to the gfn provided.

The memory management between KVM and Xen is quite different. In the
case of KVM, the guest RAM is effectively memory from the userspace
(allocated via mmap) and then shared with the guest.

From the userspace PoV, the guest memory will always be accessible from
the same virtual region. However, behind the scenes, the pages may not
always reside in memory. They are basically managed the same way as
"normal" userspace memory.

In the case of Xen, we are basically stealing a guest physical page
allocated via kmalloc() and providing no facilities for Linux to reclaim
the page if it needs to do so before the userspace decides to unmap the
foreign mapping.

I think it would be good to handle foreign mappings the same way as
userspace memory. By that I mean that Linux could reclaim the physical
page used by the foreign mapping if it needs to.

The process for reclaiming the page would look like:
1) Unmap the foreign page
2) Balloon in the backend domain physical address used by the
foreign mapping (allocate the page in the physmap)

The next time the userspace is trying to access the foreign page, Linux
will receive a data abort that would result in:
1) Allocate a backend domain physical page
2) Balloon out the physical address (remove the page from the physmap)
3) Map the foreign mapping at the new guest physical address
4) Map the guest physical page in the userspace address space

With this approach, we should be able to have a backend domain that can
handle frontend domains without requiring a lot of memory.
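
Spelled out as (very) rough pseudo-code, with every helper a stub and only
the comments hinting at the hypercall each step would roughly correspond
to, the two paths would look something like this:

    typedef unsigned long gfn_t;

    /* All helpers below are placeholders for the purpose of the sketch. */
    static void xen_unmap_foreign(gfn_t gfn)            { (void)gfn; }
    static void xen_populate_physmap(gfn_t gfn)         { (void)gfn; }  /* XENMEM_populate_physmap     */
    static void xen_decrease_reservation(gfn_t gfn)     { (void)gfn; }  /* XENMEM_decrease_reservation */
    static void xen_map_foreign(gfn_t gfn, gfn_t fgfn)  { (void)gfn; (void)fgfn; } /* foreign mapping  */
    static void map_into_userspace(void *va, gfn_t gfn) { (void)va; (void)gfn; }
    static gfn_t alloc_backend_gfn(void)                { return 0; }

    /* Reclaim path: Linux needs the physical page back for its own use. */
    static void reclaim_foreign_page(gfn_t backend_gfn)
    {
        xen_unmap_foreign(backend_gfn);          /* 1) unmap the foreign page  */
        xen_populate_physmap(backend_gfn);       /* 2) balloon the gfn back in */
    }

    /* Fault-in path: userspace touches the (now unmapped) foreign page. */
    static void fault_in_foreign_page(gfn_t frontend_gfn, void *uaddr)
    {
        gfn_t backend_gfn = alloc_backend_gfn();    /* 1) allocate a backend page    */
        xen_decrease_reservation(backend_gfn);      /* 2) balloon its gfn out        */
        xen_map_foreign(backend_gfn, frontend_gfn); /* 3) map the foreign page there */
        map_into_userspace(uaddr, backend_gfn);     /* 4) map it into userspace      */
    }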

Note that I haven't looked at the Linux code yet, so I don't know how
complex it would be to implement, or all the pitfalls.

One pitfall I can think of right now is that the frontend guest may have
removed the page from its physmap. Therefore the backend domain wouldn't
be able to re-map the page. We definitely don't want to crash the
backend app in this case. However, I am not entirely sure what the
correct action would be.

Long term, we may want to consider using a separate region in the backend
domain physical address space. This may remove the pressure on the backend
domain RAM and reduce the number of pages that may be "swapped out".

Cheers,

--
Julien Grall
Re: Virtio in Xen on Arm (based on IOREQ concept) [ In reply to ]
On Tue, Jul 21, 2020 at 10:12:40PM +0100, Julien Grall wrote:
> Hi Oleksandr,
>
> On 21/07/2020 19:16, Oleksandr wrote:
> >
> > On 21.07.20 17:27, Julien Grall wrote:
> > > On a similar topic, I am a bit surprised you didn't encounter memory
> > > exhaustion when trying to use virtio. Because on how Linux currently
> > > works (see XSA-300), the backend domain as to have a least as much
> > > RAM as the domain it serves. For instance, you have serve two
> > > domains with 1GB of RAM each, then your backend would need at least
> > > 2GB + some for its own purpose.
> > >
> > > This probably wants to be resolved by allowing foreign mapping to be
> > > "paging" out as you would for memory assigned to a userspace.
> >
> > Didn't notice the last sentence initially. Could you please explain your
> > idea in detail if possible. Does it mean if implemented it would be
> > feasible to map all guest memory regardless of how much memory the guest
> > has?
> >
> > Avoiding map/unmap memory each guest request would allow us to have
> > better performance (of course with taking care of the fact that guest
> > memory layout could be changed)...
>
> I will explain that below. Before let me comment on KVM first.
>
> > Actually what I understand looking at kvmtool is the fact it does not
> > map/unmap memory dynamically, just calculate virt addresses according to
> > the gfn provided.
>
> The memory management between KVM and Xen is quite different. In the case of
> KVM, the guest RAM is effectively memory from the userspace (allocated via
> mmap) and then shared with the guest.
>
> From the userspace PoV, the guest memory will always be accessible from the
> same virtual region. However, behind the scene, the pages may not always
> reside in memory. They are basically managed the same way as "normal"
> userspace memory.
>
> In the case of Xen, we are basically stealing a guest physical page
> allocated via kmalloc() and provide no facilities for Linux to reclaim the
> page if it needs to do it before the userspace decide to unmap the foreign
> mapping.
>
> I think it would be good to handle the foreing mapping the same way as
> userspace memory. By that I mean, that Linux could reclaim the physical page
> used by the foreing mapping if it needs to.
>
> The process for reclaiming the page would look like:
> 1) Unmap the foreign page
> 2) Ballon in the backend domain physical address used by the foreing
> mapping (allocate the page in the physmap)
>
> The next time the userspace is trying to access the foreign page, Linux will
> receive a data abort that would result to:
> 1) Allocate a backend domain physical page
> 2) Balloon out the physical address (remove the page from the physmap)
> 3) Map the foreing mapping at the new guest physical address
> 4) Map the guest physical page in the userspace address space

This is going to shatter all the super pages in the stage-2
translation.

> With this approach, we should be able to have backend domain that can handle
> frontend domain without require a lot of memory.

Linux on x86 has the option to use empty hotplug memory ranges to map
foreign memory: the balloon driver hotplugs an unpopulated physical
memory range that's not made available to the OS free memory allocator
and it's just used as scratch space to map foreign memory. Not sure
whether Arm has something similar, or if it could be implemented.

You can still use the map-on-fault behaviour as above, but I would
recommend that you try to limit the number of hypercalls issued.
Having to issue a single hypercall for each page fault is going to
be slow, so I would instead use mmap batch to map the whole range in
unpopulated physical memory and then the OS fault handler just needs to
fill the page tables with the corresponding addresses.

Roger.
Re: Virtio in Xen on Arm (based on IOREQ concept) [ In reply to ]
Hi Roger,

On 22/07/2020 09:21, Roger Pau Monné wrote:
> On Tue, Jul 21, 2020 at 10:12:40PM +0100, Julien Grall wrote:
>> Hi Oleksandr,
>>
>> On 21/07/2020 19:16, Oleksandr wrote:
>>>
>>> On 21.07.20 17:27, Julien Grall wrote:
>>>> On a similar topic, I am a bit surprised you didn't encounter memory
>>>> exhaustion when trying to use virtio. Because on how Linux currently
>>>> works (see XSA-300), the backend domain as to have a least as much
>>>> RAM as the domain it serves. For instance, you have serve two
>>>> domains with 1GB of RAM each, then your backend would need at least
>>>> 2GB + some for its own purpose.
>>>>
>>>> This probably wants to be resolved by allowing foreign mapping to be
>>>> "paging" out as you would for memory assigned to a userspace.
>>>
>>> Didn't notice the last sentence initially. Could you please explain your
>>> idea in detail if possible. Does it mean if implemented it would be
>>> feasible to map all guest memory regardless of how much memory the guest
>>> has?
>>>
>>> Avoiding map/unmap memory each guest request would allow us to have
>>> better performance (of course with taking care of the fact that guest
>>> memory layout could be changed)...
>>
>> I will explain that below. Before let me comment on KVM first.
>>
>>> Actually what I understand looking at kvmtool is the fact it does not
>>> map/unmap memory dynamically, just calculate virt addresses according to
>>> the gfn provided.
>>
>> The memory management between KVM and Xen is quite different. In the case of
>> KVM, the guest RAM is effectively memory from the userspace (allocated via
>> mmap) and then shared with the guest.
>>
>> From the userspace PoV, the guest memory will always be accessible from the
>> same virtual region. However, behind the scene, the pages may not always
>> reside in memory. They are basically managed the same way as "normal"
>> userspace memory.
>>
>> In the case of Xen, we are basically stealing a guest physical page
>> allocated via kmalloc() and provide no facilities for Linux to reclaim the
>> page if it needs to do it before the userspace decide to unmap the foreign
>> mapping.
>>
>> I think it would be good to handle the foreing mapping the same way as
>> userspace memory. By that I mean, that Linux could reclaim the physical page
>> used by the foreing mapping if it needs to.
>>
>> The process for reclaiming the page would look like:
>> 1) Unmap the foreign page
>> 2) Ballon in the backend domain physical address used by the foreing
>> mapping (allocate the page in the physmap)
>>
>> The next time the userspace is trying to access the foreign page, Linux will
>> receive a data abort that would result to:
>> 1) Allocate a backend domain physical page
>> 2) Balloon out the physical address (remove the page from the physmap)
>> 3) Map the foreing mapping at the new guest physical address
>> 4) Map the guest physical page in the userspace address space
>
> This is going to shatter all the super pages in the stage-2
> translation.

Yes, but this is nothing really new, as ballooning would result in
(AFAICT) the same behavior on Linux.

>
>> With this approach, we should be able to have backend domain that can handle
>> frontend domain without require a lot of memory.
>
> Linux on x86 has the option to use empty hotplug memory ranges to map
> foreign memory: the balloon driver hotplugs an unpopulated physical
> memory range that's not made available to the OS free memory allocator
> and it's just used as scratch space to map foreign memory. Not sure
> whether Arm has something similar, or if it could be implemented.

We already discussed that last year :). This was attempted in the past
(I was still at Citrix) and indefinitely paused for Arm.

/proc/iomem can be incomplete on Linux if we didn't load a driver for
all the devices. This means that Linux doesn't have a full view of
which physical ranges are free.

Additionally, in the case of Dom0, all the regions corresponding to the
host RAM are unusable when using the SMMU. This is because we would do
1:1 mapping for the foreign mapping as well.

It might be possible to take advantage of the direct mapping property if
Linux does some bookkeeping. However, this wouldn't work for a 32-bit Dom0
using short page tables (e.g. some versions of Debian do) as it may not
be able to access all the host RAM. Whether we still care about that is a
different question :).

For all the other domains, I think we would want the toolstack to
provide a region that can be safely used for foreign mapping (similar to
what we already do for the grant-table).

>
> You can still use the map-on-fault behaviour as above, but I would
> recommend that you try to limit the number of hypercalls issued.
> Having to issue a single hypercall for each page fault it's going to
> be slow, so I would instead use mmap batch to map the hole range in
> unpopulated physical memory and then the OS fault handler just needs to
> fill the page tables with the corresponding address.
IIUC your proposal, you are assuming that you will have enough free
space in the physical address space to map the foreign mappings.

However, that amount of free space is not unlimited and may be quite
small (see above). It would be fairly easy to exhaust it, given that a
userspace application can map the same guest physical address many times.

So I still think we need to be able to allow Linux to swap a foreign
page with another page.

Cheers,

--
Julien Grall
Re: Virtio in Xen on Arm (based on IOREQ concept) [ In reply to ]
On Wed, Jul 22, 2020 at 11:47:18AM +0100, Julien Grall wrote:
> Hi Roger,
>
> On 22/07/2020 09:21, Roger Pau Monné wrote:
> > On Tue, Jul 21, 2020 at 10:12:40PM +0100, Julien Grall wrote:
> > > Hi Oleksandr,
> > >
> > > On 21/07/2020 19:16, Oleksandr wrote:
> > > >
> > > > On 21.07.20 17:27, Julien Grall wrote:
> > > > > On a similar topic, I am a bit surprised you didn't encounter memory
> > > > > exhaustion when trying to use virtio. Because on how Linux currently
> > > > > works (see XSA-300), the backend domain as to have a least as much
> > > > > RAM as the domain it serves. For instance, you have serve two
> > > > > domains with 1GB of RAM each, then your backend would need at least
> > > > > 2GB + some for its own purpose.
> > > > >
> > > > > This probably wants to be resolved by allowing foreign mapping to be
> > > > > "paging" out as you would for memory assigned to a userspace.
> > > >
> > > > Didn't notice the last sentence initially. Could you please explain your
> > > > idea in detail if possible. Does it mean if implemented it would be
> > > > feasible to map all guest memory regardless of how much memory the guest
> > > > has?
> > > >
> > > > Avoiding map/unmap memory each guest request would allow us to have
> > > > better performance (of course with taking care of the fact that guest
> > > > memory layout could be changed)...
> > >
> > > I will explain that below. Before let me comment on KVM first.
> > >
> > > > Actually what I understand looking at kvmtool is the fact it does not
> > > > map/unmap memory dynamically, just calculate virt addresses according to
> > > > the gfn provided.
> > >
> > > The memory management between KVM and Xen is quite different. In the case of
> > > KVM, the guest RAM is effectively memory from the userspace (allocated via
> > > mmap) and then shared with the guest.
> > >
> > > From the userspace PoV, the guest memory will always be accessible from the
> > > same virtual region. However, behind the scene, the pages may not always
> > > reside in memory. They are basically managed the same way as "normal"
> > > userspace memory.
> > >
> > > In the case of Xen, we are basically stealing a guest physical page
> > > allocated via kmalloc() and provide no facilities for Linux to reclaim the
> > > page if it needs to do it before the userspace decide to unmap the foreign
> > > mapping.
> > >
> > > I think it would be good to handle the foreing mapping the same way as
> > > userspace memory. By that I mean, that Linux could reclaim the physical page
> > > used by the foreing mapping if it needs to.
> > >
> > > The process for reclaiming the page would look like:
> > > 1) Unmap the foreign page
> > > 2) Ballon in the backend domain physical address used by the foreing
> > > mapping (allocate the page in the physmap)
> > >
> > > The next time the userspace is trying to access the foreign page, Linux will
> > > receive a data abort that would result to:
> > > 1) Allocate a backend domain physical page
> > > 2) Balloon out the physical address (remove the page from the physmap)
> > > 3) Map the foreing mapping at the new guest physical address
> > > 4) Map the guest physical page in the userspace address space
> >
> > This is going to shatter all the super pages in the stage-2
> > translation.
>
> Yes, but this is nothing really new as ballooning would result to (AFAICT)
> the same behavior on Linux.
>
> >
> > > With this approach, we should be able to have backend domain that can handle
> > > frontend domain without require a lot of memory.
> >
> > Linux on x86 has the option to use empty hotplug memory ranges to map
> > foreign memory: the balloon driver hotplugs an unpopulated physical
> > memory range that's not made available to the OS free memory allocator
> > and it's just used as scratch space to map foreign memory. Not sure
> > whether Arm has something similar, or if it could be implemented.
>
> We already discussed that last year :). This was attempted in the past (I
> was still at Citrix) and indefinitely paused for Arm.
>
> /proc/iomem can be incomplete on Linux if we didn't load a driver for all
> the devices. This means that Linux doesn't have the full view of what is
> physical range is freed.
>
> Additionally, in the case of Dom0, all the regions corresponding to the host
> RAM are unusable when using the SMMU. This is because we would do 1:1
> mapping for the foreign mapping as well.

Right, that's a PITA because on x86 PVH dom0 I was planning to use
those RAM regions as scratch space for foreign mappings, lacking a
better alternative ATM.

> It might be possible to take advantage of the direct mapping property if
> Linux do some bookeeping. Although, this wouldn't work for 32-bit Dom0 using
> short page tables (e.g some version of Debian does) as it may not be able to
> access all the host RAM. Whether we still care about is a different
> situation :).
>
> For all the other domains, I think we would want the toolstack to provide a
> region that can be safely used for foreign mapping (similar to what we
> already do for the grant-table).

Yes, that would be the plan on x86 also - have some way for the
hypervisor to report safe ranges where a domU can create foreign
mappings.

> >
> > You can still use the map-on-fault behaviour as above, but I would
> > recommend that you try to limit the number of hypercalls issued.
> > Having to issue a single hypercall for each page fault it's going to
> > be slow, so I would instead use mmap batch to map the hole range in
> > unpopulated physical memory and then the OS fault handler just needs to
> > fill the page tables with the corresponding address.
> IIUC your proposal, you are assuming that you will have enough free space in
> the physical address space to map the foreign mapping.
>
> However that amount of free space is not unlimited and may be quite small
> (see above). It would be fairly easy to exhaust it given that a userspace
> application can map many times the same guest physical address.
>
> So I still think we need to be able to allow Linux to swap a foreign page
> with another page.

Right, but you will have to be careful to make sure physical addresses
are not swapped while being used for IO with devices, as in that case
you won't get a recoverable fault. This is safe now because physical
mappings created by privcmd are never swapped out, but if you go the
route you propose you will have to figure out a way to correctly populate
physical ranges used for IO with devices, even when the CPU hasn't
accessed them.

Relying solely on CPU page faults to populate them will not be enough,
as the CPU won't necessarily access all the pages that would be sent
to devices for IO.

Roger.
Re: Virtio in Xen on Arm (based on IOREQ concept) [ In reply to ]
On 22/07/2020 12:10, Roger Pau Monné wrote:
> On Wed, Jul 22, 2020 at 11:47:18AM +0100, Julien Grall wrote:
>>>
>>> You can still use the map-on-fault behaviour as above, but I would
>>> recommend that you try to limit the number of hypercalls issued.
>>> Having to issue a single hypercall for each page fault it's going to
>>> be slow, so I would instead use mmap batch to map the hole range in
>>> unpopulated physical memory and then the OS fault handler just needs to
>>> fill the page tables with the corresponding address.
>> IIUC your proposal, you are assuming that you will have enough free space in
>> the physical address space to map the foreign mapping.
>>
>> However that amount of free space is not unlimited and may be quite small
>> (see above). It would be fairly easy to exhaust it given that a userspace
>> application can map many times the same guest physical address.
>>
>> So I still think we need to be able to allow Linux to swap a foreign page
>> with another page.
>
> Right, but you will have to be careful to make sure physical addresses
> are not swapped while being used for IO with devices, as in that case
> you won't get a recoverable fault. This is safe now because physical
> mappings created by privcmd are never swapped out, but if you go the
> route you propose you will have to figure a way to correctly populate
> physical ranges used for IO with devices, even when the CPU hasn't
> accessed them.
>
> Relying solely on CPU page faults to populate them will not be enough,
> as the CPU won't necessarily access all the pages that would be send
> to devices for IO.

The problem you described here doesn't seem to be specific to foreign
mappings. So I would really be surprised if Linux doesn't already have a
generic mechanism to deal with this.

Hence why I suggested earlier to deal with foreign mappings the same way
as Linux does with user memory.

Cheers,

--
Julien Grall
Re: Virtio in Xen on Arm (based on IOREQ concept) [ In reply to ]
On Wed, Jul 22, 2020 at 12:17:26PM +0100, Julien Grall wrote:
>
>
> On 22/07/2020 12:10, Roger Pau Monné wrote:
> > On Wed, Jul 22, 2020 at 11:47:18AM +0100, Julien Grall wrote:
> > > >
> > > > You can still use the map-on-fault behaviour as above, but I would
> > > > recommend that you try to limit the number of hypercalls issued.
> > > > Having to issue a single hypercall for each page fault it's going to
> > > > be slow, so I would instead use mmap batch to map the hole range in
> > > > unpopulated physical memory and then the OS fault handler just needs to
> > > > fill the page tables with the corresponding address.
> > > IIUC your proposal, you are assuming that you will have enough free space in
> > > the physical address space to map the foreign mapping.
> > >
> > > However that amount of free space is not unlimited and may be quite small
> > > (see above). It would be fairly easy to exhaust it given that a userspace
> > > application can map many times the same guest physical address.
> > >
> > > So I still think we need to be able to allow Linux to swap a foreign page
> > > with another page.
> >
> > Right, but you will have to be careful to make sure physical addresses
> > are not swapped while being used for IO with devices, as in that case
> > you won't get a recoverable fault. This is safe now because physical
> > mappings created by privcmd are never swapped out, but if you go the
> > route you propose you will have to figure a way to correctly populate
> > physical ranges used for IO with devices, even when the CPU hasn't
> > accessed them.
> >
> > Relying solely on CPU page faults to populate them will not be enough,
> > as the CPU won't necessarily access all the pages that would be send
> > to devices for IO.
>
> The problem you described here doesn't seem to be specific to foreign
> mapping. So I would really be surprised if Linux doesn't already have
> generic mechanism to deal with this.

Right, Linux will pre-fault and lock the pages before using them for
IO, and unlock them afterwards, in which case it should be safe.
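
(That pre-fault-and-lock step is what drivers typically get from
get_user_pages_fast(); a minimal sketch, with signatures as in kernels of
that era, so details may differ across versions:)

    #include <linux/errno.h>
    #include <linux/mm.h>

    /* Pin the user pages backing a buffer before handing them to a device
     * for IO, and keep them pinned until the IO completes.  This is what
     * keeps the "swap the foreign page on fault" idea safe for device IO. */
    static int pin_user_buffer(unsigned long uaddr, int nr_pages,
                               struct page **pages)
    {
        int pinned = get_user_pages_fast(uaddr, nr_pages, FOLL_WRITE, pages);

        if (pinned < nr_pages) {
            /* Drop the references we did get and report failure. */
            while (pinned-- > 0)
                put_page(pages[pinned]);
            return -EFAULT;
        }
        return 0;
    }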

> Hence why I suggested before to deal with foreign mapping the same way as
> Linux would do with user memory.

That should work; on FreeBSD privcmd I also populate the pages in the page
fault handler, but the hypercall to create the foreign mappings is
executed only once, when the ioctl is issued.

Roger.