Mailing List Archive

Re: [openxt-dev] VirtIO-Argo initial development proposal
On Dec 17, 2020, at 07:13, Jean-Philippe Ouellet <jpo@vt.edu> wrote:
> On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
>> Hi all,
>>
>> I have written a page for the OpenXT wiki describing a proposal for
>> initial development towards the VirtIO-Argo transport driver, and the
>> related system components to support it, destined for OpenXT and
>> upstream projects:
>>
>> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
>>
>> Please review ahead of tomorrow's OpenXT Community Call.
>>
>> I would draw your attention to the Comparison of Argo interface options section:
>>
>> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Comparison-of-Argo-interface-options
>>
>> where further input to the table would be valuable;
>> and would also appreciate input on the IOREQ project section:
>>
>> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Project:-IOREQ-for-VirtIO-Argo
>>
>> in particular, whether an IOREQ implementation to support the
>> provision of devices to the frontends can replace the need for any
>> userspace software to interact with an Argo kernel interface for the
>> VirtIO-Argo implementation.
>>
>> thanks,
>> Christopher
>
> Hi,
>
> Really excited to see this happening, and disappointed that I'm not
> able to contribute at this time. I don't think I'll be able to join
> the call, but wanted to share some initial thoughts from my
> middle-of-the-night review anyway.
>
> Super rough notes in raw unedited notes-to-self form:
>
> main point of feedback is: I love the desire to get a non-shared-mem
> transport backend for virtio standardized. It moves us closer to an
> HMX-only world. BUT: virtio is relevant to many hypervisors beyond
> Xen, not all of which have the same views on how policy enforcement
> should be done, namely some have a preference for capability-oriented
> models over type-enforcement / MAC models. It would be nice if any
> labeling encoded into the actual specs / guest-boundary protocols
> would be strictly a mechanism, and be policy-agnostic, in particular
> not making implicit assumptions about XSM / SELinux / similar. I don't
> have specific suggestions at this point, but would love to discuss.
>
> thoughts on how to handle device enumeration? hotplug notifications?
> - can't rely on xenstore
> - need some internal argo messaging for this?
> - name service w/ well-known names? starts to look like xenstore
> pretty quickly...
> - granular disaggregation of backend device-model providers desirable
>
> how does resource accounting work? each side pays for their own delivery ring?
> - init in already-guest-mapped mem & simply register?
> - how does it compare to grant tables?
> - do you need to go through linux driver to alloc (e.g. xengntalloc)
> or has way to share arbitrary otherwise not-special userspace pages
> (e.g. u2mfn, with all its issues (pinning, reloc, etc.))?
>
> ioreq is tangled with grant refs, evt chans, generic vmexit
> dispatcher, instruction decoder, etc. none of which seems desirable if
> trying to move towards world with strictly safer guest interfaces
> exposed (e.g. HMX-only)
> - there's no io to trap/decode here, it's explicitly exclusively via
> hypercall to HMX, no?
> - also, do we want argo sendv hypercall to be always blocking & synchronous?
> - or perhaps async notify & background copy to other vm addr space?
> - possibly better scaling?
> - accounting of in-flight io requests to handle gets complicated
> (see recent XSA)
> - PCI-like completion request semantics? (argo as cross-domain
> software dma engine w/ some basic protocol enforcement?)
>
> "port" v4v driver => argo:
> - yes please! something without all the confidence-inspiring
> DEBUG_{APPLE,ORANGE,BANANA} indicators of production-worthy code would
> be great ;)
> - seems like you may want to redo argo hypercall interface too? (at
> least the syscall interface...)
> - targeting synchronous blocking sendv()?
> - or some async queue/completion thing too? (like PF_RING, but with
> *iov entries?)
> - both could count as HMX, both could enforce no double-write racing
> games at dest ring, etc.
>
> re v4vchar & doing similar for argo:
> - we may prefer "can write N bytes? -> yes/no" or "how many bytes can
> write? -> N" over "try to write N bytes -> only wrote M, EAGAIN"
> - the latter can be implemented over the former, but not the other way around
> - starts to matter when you want to be able to implement in userspace
> & provide backpressure to peer userspace without additional buffering
> & potential lying about durability of writes
> - breaks cross-domain EPIPE boundary correctness
> - Qubes ran into same issues when porting vchan from Xen to KVM
> initially via vsock
>
> some virtio drivers explicitly use shared mem for more than just
> communication rings:
> - e.g. virtio-fs, which can map pages as DAX-like fs backing to share page cache
> - e.g. virtio-gpu, virtio-wayland, virtio-video, which deal in framebuffers
> - needs thought about how best to map semantics to (or at least
> interoperate cleanly & safely with) HMX-{only,mostly} world
> - the performance of shared mem actually can meaningfully matter for
> e.g. large framebuffers in particular due to fundamental memory
> bandwidth constraints
>
> what is mentioned PX hypervisor? presumably short for PicoXen? any
> public information?

Not much at the moment, but there is prior public work. PX is an OSS L0 "Protection Hypervisor" in the Hardened Access Terminal (HAT) architecture presented by Daniel Smith at the 2020 Xen Summit: https://youtube.com/watch?v=Wt-SBhFnDZY&t=3m48s

PX is intended to build on lessons learned from IBM Ultravisor, HP/Bromium AX and AIS Bareflank L0 hypervisors:

IBM: https://www.platformsecuritysummit.com/2019/speaker/hunt/

HP/Bromium: https://www.platformsecuritysummit.com/2018/speaker/pratt/
Dec 2019 meeting in Cambridge, where the Day 2 discussion included the L0 nesting hypervisor, UUID semantics, Argo, and communication between nested hypervisors: https://lists.archive.carbon60.com/xen/devel/577800

Bareflank: https://youtube.com/channel/UCH-7Pw96K5V1RHAPn5-cmYA
Xen Summit 2020 design session notes: https://lists.archive.carbon60.com/xen/devel/591509

In the long-term, efficient hypervisor nesting will require close cooperation with silicon and firmware vendors. Note that Intel is introducing TDX (Trust Domain Extensions):

https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
https://www.brighttalk.com/webcast/18206/453600

There are also a couple of recent papers from Shanghai Jiao Tong University, on using hardware instructions to accelerate inter-domain HMX.

March 2019: https://ipads.se.sjtu.edu.cn/_media/publications/skybridge-eurosys19.pdf

> we present SkyBridge, a new communication facility designed and optimized for synchronous IPC in microkernels. SkyBridge requires no involvement of kernels during communication and allows a process to directly switch to the virtual address space of the target process and invoke the target function. SkyBridge retains the traditional virtual address space isolation and thus can be easily integrated into existing microkernels. The key idea of SkyBridge is to leverage a commodity hardware feature for virtualization (i.e., [Intel EPT] VMFUNC) to achieve efficient IPC. To leverage the hardware feature, SkyBridge inserts a tiny virtualization layer (Rootkernel) beneath the original microkernel (Subkernel). The Rootkernel is carefully designed to eliminate most virtualization overheads. SkyBridge also integrates a series of techniques to guarantee the security properties of IPC. We have implemented SkyBridge on three popular open-source microkernels (seL4, Fiasco.OC, and Google Zircon). The evaluation results show that SkyBridge improves the speed of IPC by 1.49x to 19.6x for microbenchmarks. For real-world applications (e.g., SQLite3 database), SkyBridge improves the throughput by 81.9%, 1.44x and 9.59x for the three microkernels on average.

July 2020: https://ipads.se.sjtu.edu.cn/_media/publications/guatc20.pdf

> a redesign of traditional microkernel OSes to harmonize the tension between messaging performance and isolation. UnderBridge moves the OS components of a microkernel between user space and kernel space at runtime while enforcing consistent isolation. It retrofits Intel Memory Protection Key for Userspace (PKU) in kernel space to achieve such isolation efficiently and design a fast IPC mechanism across those OS components. Thanks to PKU’s extremely low overhead, the inter-process communication (IPC) roundtrip cost in UnderBridge can be as low as 109 cycles. We have designed and implemented a new microkernel called ChCore based on UnderBridge and have also ported UnderBridge to three mainstream microkernels, i.e., seL4, Google Zircon, and Fiasco.OC. Evaluations show that UnderBridge speeds up the IPC by 3.0× compared with the state-of-the-art (e.g., SkyBridge) and improves the performance of IPC-intensive applications by up to 13.1× for the above three microkernels



For those interested in Argo and VirtIO, there will be a conference call on Thursday, Jan 14th 2021, at 1600 UTC.

Rich
Re: [openxt-dev] VirtIO-Argo initial development proposal
On Wed, Dec 23, 2020 at 04:32:01PM -0500, Rich Persaud wrote:
> On Dec 17, 2020, at 07:13, Jean-Philippe Ouellet <jpo@vt.edu> wrote:
> > On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> > <christopher.w.clark@gmail.com> wrote:
> >> Hi all,
> >>
> >> I have written a page for the OpenXT wiki describing a proposal for
> >> initial development towards the VirtIO-Argo transport driver, and the
> >> related system components to support it, destined for OpenXT and
> >> upstream projects:
> >>
> >> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1

Thanks for the detailed document; I've taken a look and there's indeed
a lot of work to do listed there :). I have some suggestions and
questions.

Overall I think it would be easier for VirtIO to take a new transport
if it's not tied to a specific hypervisor. The way Argo is implemented
right now is using hypercalls, which is a mechanism specific to Xen.
IMO it might be easier to start by having an Argo interface using
MSRs that all hypervisors can implement, and then base the VirtIO
implementation on top of that interface. It could be presented as a
hypervisor-agnostic mediated interface for inter-domain communication
or some such.
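
To make that a bit more concrete, here is a rough sketch of the kind of
hypervisor-agnostic shape the transport could call through; all of the
type and function names below are invented for illustration, not a
proposed ABI:

    /*
     * Sketch only: the VirtIO-Argo transport calls through a small
     * hypervisor-agnostic ops table, and each hypervisor (or interface
     * flavour: hypercall, MSR, vmcall) supplies its own backend.
     */
    #include <stddef.h>
    #include <stdint.h>

    struct hmx_addr {                /* communication endpoint */
        uint64_t domain_id;
        uint32_t port;
    };

    struct hmx_iov {
        void   *base;
        size_t  len;
    };

    struct hmx_transport_ops {
        /* Register a receive ring backed by the caller's own memory. */
        int (*register_ring)(const struct hmx_addr *local,
                             void *ring, size_t ring_len);
        /* Copy data into the destination's ring. */
        int (*sendv)(const struct hmx_addr *dst,
                     const struct hmx_iov *iov, unsigned int niov);
        /* Ask to be notified when the peer's ring has space again. */
        int (*notify)(const struct hmx_addr *dst);
    };

    /*
     * A Xen build would install hypercall-backed ops; another hypervisor
     * could install MSR- or vmcall-backed ops without touching the
     * VirtIO transport code above this interface.
     */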

That kind of links to a question, has any of this been discussed with
the VirtIO folks, either at OASIS or the Linux kernel?

The document mentions: "Destination: mainline Linux kernel, via the
Xen community" regarding the upstreamability of the VirtIO-Argo
transport driver, but I think this would have to go through the VirtIO
maintainers and not the Xen ones, so you might want their feedback
quite early to make sure they are OK with the approach taken. In turn,
this might also require OASIS to agree to have a new transport
documented.

> >>
> >> Please review ahead of tomorrow's OpenXT Community Call.
> >>
> >> I would draw your attention to the Comparison of Argo interface options section:
> >>
> >> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Comparison-of-Argo-interface-options
> >>
> >> where further input to the table would be valuable;
> >> and would also appreciate input on the IOREQ project section:
> >>
> >> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Project:-IOREQ-for-VirtIO-Argo
> >>
> >> in particular, whether an IOREQ implementation to support the
> >> provision of devices to the frontends can replace the need for any
> >> userspace software to interact with an Argo kernel interface for the
> >> VirtIO-Argo implementation.
> >>
> >> thanks,
> >> Christopher
> >
> > Hi,
> >
> > Really excited to see this happening, and disappointed that I'm not
> > able to contribute at this time. I don't think I'll be able to join
> > the call, but wanted to share some initial thoughts from my
> > middle-of-the-night review anyway.
> >
> > Super rough notes in raw unedited notes-to-self form:
> >
> > main point of feedback is: I love the desire to get a non-shared-mem
> > transport backend for virtio standardized. It moves us closer to an
> > HMX-only world. BUT: virtio is relevant to many hypervisors beyond
> > Xen, not all of which have the same views on how policy enforcement
> > should be done, namely some have a preference for capability-oriented
> > models over type-enforcement / MAC models. It would be nice if any
> > labeling encoded into the actual specs / guest-boundary protocols
> > would be strictly a mechanism, and be policy-agnostic, in particular
> > not making implicit assumptions about XSM / SELinux / similar. I don't
> > have specific suggestions at this point, but would love to discuss.
> >
> > thoughts on how to handle device enumeration? hotplug notifications?
> > - can't rely on xenstore
> > - need some internal argo messaging for this?
> > - name service w/ well-known names? starts to look like xenstore
> > pretty quickly...
> > - granular disaggregation of backend device-model providers desirable

I'm also curious about this part; I was assuming it would be done
using some kind of Argo messages, but there's no mention of that in
the document. It would be nice for the document to elaborate a little
more on this.

> > how does resource accounting work? each side pays for their own delivery ring?
> > - init in already-guest-mapped mem & simply register?
> > - how does it compare to grant tables?
> > - do you need to go through linux driver to alloc (e.g. xengntalloc)
> > or has way to share arbitrary otherwise not-special userspace pages
> > (e.g. u2mfn, with all its issues (pinning, reloc, etc.))?
> >
> > ioreq is tangled with grant refs, evt chans, generic vmexit
> > dispatcher, instruction decoder, etc. none of which seems desirable if
> > trying to move towards world with strictly safer guest interfaces
> > exposed (e.g. HMX-only)

I think this needs Christopher's clarification, but it's my
understanding that the Argo transport wouldn't need IOREQs at all:
since all data exchange would be done using the Argo interfaces, there
would be no MMIO emulation or anything similar. IOREQs are mentioned
because the Arm folks are working on using them to enable virtio-mmio
on Xen.

From my reading of the document, it seems Argo VirtIO would still rely
on event channels; IMO it would be better if interrupts were instead
delivered using a native mechanism, something like MSI delivery using
a destination APIC ID, vector, delivery mode and trigger mode.
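
As a sketch of what the guest-visible configuration for that could look
like (the field names below are purely illustrative; nothing here is
specified):

    #include <stdint.h>

    /* Hypothetical per-ring notification configuration, if Argo rings
     * signalled via an MSI-style message instead of an event channel. */
    struct argo_ring_notify_cfg {
        uint32_t dest_apic_id;   /* destination local APIC ID         */
        uint8_t  vector;         /* interrupt vector to raise         */
        uint8_t  delivery_mode;  /* e.g. fixed, lowest-priority       */
        uint8_t  trigger_mode;   /* edge or level                     */
        uint8_t  flags;          /* e.g. start masked                 */
    };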

Roger.
Re: [openxt-dev] VirtIO-Argo initial development proposal
On Thu, Dec 17, 2020 at 4:13 AM Jean-Philippe Ouellet <jpo@vt.edu> wrote:
>
> On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> <christopher.w.clark@gmail.com> wrote:
> > Hi all,
> >
> > I have written a page for the OpenXT wiki describing a proposal for
> > initial development towards the VirtIO-Argo transport driver, and the
> > related system components to support it, destined for OpenXT and
> > upstream projects:
> >
> > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
> >
> > Please review ahead of tomorrow's OpenXT Community Call.
> >
> > I would draw your attention to the Comparison of Argo interface options section:
> >
> > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Comparison-of-Argo-interface-options
> >
> > where further input to the table would be valuable;
> > and would also appreciate input on the IOREQ project section:
> >
> > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Project:-IOREQ-for-VirtIO-Argo
> >
> > in particular, whether an IOREQ implementation to support the
> > provision of devices to the frontends can replace the need for any
> > userspace software to interact with an Argo kernel interface for the
> > VirtIO-Argo implementation.
> >
> > thanks,
> > Christopher
>
> Hi,
>
> Really excited to see this happening, and disappointed that I'm not
> able to contribute at this time. I don't think I'll be able to join
> the call, but wanted to share some initial thoughts from my
> middle-of-the-night review anyway.

Thanks for the review and positive feedback - appreciated.

> Super rough notes in raw unedited notes-to-self form:
>
> main point of feedback is: I love the desire to get a non-shared-mem
> transport backend for virtio standardized. It moves us closer to an
> HMX-only world. BUT: virtio is relevant to many hypervisors beyond
> Xen, not all of which have the same views on how policy enforcement
> should be done, namely some have a preference for capability-oriented
> models over type-enforcement / MAC models. It would be nice if any
> labeling encoded into the actual specs / guest-boundary protocols
> would be strictly a mechanism, and be policy-agnostic, in particular
> not making implicit assumptions about XSM / SELinux / similar. I don't
> have specific suggestions at this point, but would love to discuss.

That is an interesting point; thanks. It is more about the features
and specification of Argo itself and its interfaces than the use of it
to implement a VirtIO transport, but it is good to consider. We have
an OpenXT wiki page for Argo development, with a related item
described there about having the hypervisor and remote guest kernel
provide message context about the communication source to the
receiver, to support policy decisions:

https://openxt.atlassian.net/wiki/spaces/DC/pages/737345538/Argo+Hypervisor-Mediated+data+eXchange+Development

> thoughts on how to handle device enumeration? hotplug notifications?
> - can't rely on xenstore
> - need some internal argo messaging for this?
> - name service w/ well-known names? starts to look like xenstore
> pretty quickly...

I don't think we have a firm decision on this. We have been
considering using ACPI tables and/or Device Tree for device
enumeration, which is viable for devices that are statically assigned;
hotplug is an additional case to design for. We'll be looking at the
existing VirtIO transports too.

Handling notifications on a well-known Argo port is a reasonable
direction to go and fits with applying XSM policy to govern Argo port
connectivity between domains.

https://openxt.atlassian.net/wiki/spaces/DC/pages/1333428225/Analysis+of+Argo+as+a+transport+medium+for+VirtIO#Argo:-Device-discovery-and-driver-registration-with-Virtio-Argo-transport
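
To make the hotplug-notification idea slightly more concrete, one
possible shape for announce messages on a well-known Argo port is
sketched below; the port number and message layout are invented for
illustration and not yet designed:

    #include <stdint.h>

    #define ARGO_WELLKNOWN_DEVICE_PORT  1u   /* hypothetical port number */

    enum argo_dev_msg_type {
        ARGO_DEV_ANNOUNCE = 1,      /* backend offers a device           */
        ARGO_DEV_WITHDRAW = 2,      /* backend removes a device          */
    };

    struct argo_dev_msg {
        uint32_t type;              /* enum argo_dev_msg_type            */
        uint32_t virtio_device_id;  /* e.g. 1 = net, 2 = block           */
        uint64_t backend_domid;     /* domain serving this device        */
        uint32_t backend_port;      /* Argo port to contact for it       */
        uint32_t instance;          /* index for multiple devices        */
    };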

> - granular disaggregation of backend device-model providers desirable

agreed

> how does resource accounting work? each side pays for their own delivery ring?
> - init in already-guest-mapped mem & simply register?

Yes: rings are registered with a domain's own memory for receiving messages.
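
As a conceptual sketch of that flow (this is not the real Xen ABI; see
xen/include/public/argo.h for the actual structures), the receiver
allocates the ring in its own memory and then asks the hypervisor to
register it:

    #include <stdint.h>
    #include <stdlib.h>

    struct ring_registration {
        uint32_t  port;        /* local port this ring receives on     */
        uint32_t  len;         /* ring size in bytes                   */
        uint64_t *frames;      /* guest frame numbers backing the ring */
        uint32_t  nr_frames;
    };

    int register_receive_ring(uint32_t port, size_t len)
    {
        /* Round up so the buffer is page-sized and page-aligned. */
        size_t alloc_len = (len + 4095) & ~(size_t)4095;
        void *ring = aligned_alloc(4096, alloc_len);  /* our own memory */

        if (ring == NULL)
            return -1;

        struct ring_registration reg = {
            .port = port,
            .len  = (uint32_t)alloc_len,
            /* .frames / .nr_frames would be derived from 'ring' ...   */
        };

        /* ... and the registration handed to the hypervisor via the
         * Argo register-ring operation. */
        (void)reg;
        return 0;
    }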

> - how does it compare to grant tables?

The grant tables are the Xen mechanism for a VM to instruct the
hypervisor to grant another VM permission to establish shared memory
mappings, or to copy data between domains. Argo is an alternative
mechanism for communicating between VMs that does not share memory
between them and provides different properties that are supportive of
isolation and access control.

There's a presentation with an overview of Argo from the 2019 Xen
Design and Developer Summit:
https://static.sched.com/hosted_files/xensummit19/92/Argo%20and%20HMX%20-%20OpenXT%20-%20Christopher%20Clark%20-%20Xen%20Summit%202019.pdf
https://www.youtube.com/watch?v=cnC0Tg3jqJQ&list=PLYyw7IQjL-zHmP6CuqwuqfXNK5QLmU7Ur&index=15

> - do you need to go through linux driver to alloc (e.g. xengntalloc)
> or has way to share arbitrary otherwise not-special userspace pages
> (e.g. u2mfn, with all its issues (pinning, reloc, etc.))?

In the current Argo device driver implementations, userspace does not
have direct access to Argo message rings. Instead, the kernel provides
device nodes through which data can be sent and received using
familiar I/O primitives.

For the VirtIO-Argo transport, userspace would not need to be aware of
the use of Argo - the VirtIO virtual devices will present themselves
to userspace with the same VirtIO device interfaces as when they use
any other transport.
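
For illustration of the current driver model described above,
userspace interaction is ordinary file I/O on a kernel-provided device
node; the device path here is hypothetical:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/argo_stream", O_RDWR);  /* hypothetical node */
        if (fd < 0)
            return 1;

        const char msg[] = "hello";
        (void)write(fd, msg, sizeof msg);           /* send to the peer  */

        char reply[64];
        (void)read(fd, reply, sizeof reply);        /* receive a reply   */

        close(fd);
        return 0;
    }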

> ioreq is tangled with grant refs, evt chans, generic vmexit
> dispatcher, instruction decoder, etc. none of which seems desirable if
> trying to move towards world with strictly safer guest interfaces
> exposed (e.g. HMX-only)

ack

> - there's no io to trap/decode here, it's explicitly exclusively via
> hypercall to HMX, no?

Yes; as Roger noted in his reply in this thread, the interest in IOREQ
has been motivated by other recent VirtIO activity in the Xen
community, and by whether some potential might exist for alignment
with that work.

> - also, do we want argo sendv hypercall to be always blocking & synchronous?
> - or perhaps async notify & background copy to other vm addr space?
> - possibly better scaling?
> - accounting of in-flight io requests to handle gets complicated
> (see recent XSA)
> - PCI-like completion request semantics? (argo as cross-domain
> software dma engine w/ some basic protocol enforcement?)

I think implementation of an asynchronous delivery primitive for Argo
is worth exploring, given its potential for different performance
characteristics that could enable it to support additional use cases.
It is likely beyond the scope of the initial VirtIO-Argo driver
development, but enabling VirtIO guest drivers to use Argo will allow
testing to determine which uses of it could benefit from further
investment.
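
Purely for discussion, an asynchronous interface might end up looking
something like a submit/completion pair; the names and types below are
exploratory, not designed:

    #include <stddef.h>
    #include <stdint.h>

    struct argo_async_desc {
        uint64_t    dst_domid;
        uint32_t    dst_port;
        const void *buf;
        size_t      len;
        uint64_t    cookie;      /* echoed in the matching completion */
    };

    struct argo_async_completion {
        uint64_t cookie;
        int32_t  status;         /* 0 on success, negative otherwise  */
    };

    /* Submit up to n descriptors; returns how many were accepted. */
    int argo_async_submit(const struct argo_async_desc *descs,
                          unsigned int n);

    /* Harvest up to n completions; returns how many were written. */
    int argo_async_poll(struct argo_async_completion *out, unsigned int n);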

> "port" v4v driver => argo:
> - yes please! something without all the confidence-inspiring
> DEBUG_{APPLE,ORANGE,BANANA} indicators of production-worthy code would
> be great ;)
> - seems like you may want to redo argo hypercall interface too?

The Xen community has plans to remove all uses of virtual addresses
from the hypervisor interface, and the Argo interface will need to be
updated as part of that work. In addition, work to incorporate further
features from v4v, and the Argo items on the OpenXT Argo development
wiki page, will also involve changes to the interface.

> (at least the syscall interface...)

Yes: a new Argo Linux driver will likely have quite a different
userspace interface from the current one; it has been discussed in the
OpenXT community and the notes from that discussion are here:

https://openxt.atlassian.net/wiki/spaces/DC/pages/775389197/New+Linux+Driver+for+Argo

There is motivation to support both a networking and non-networking
interface, so that network-enabled guest OSes can use familiar
primitives and software, and non-network-enabled guests are still able
to use Argo communication.

> - targeting synchronous blocking sendv()?
> - or some async queue/completion thing too? (like PF_RING, but with
> *iov entries?)
> - both could count as HMX, both could enforce no double-write racing
> games at dest ring, etc.

The immediate focus is on building a modern, hopefully simple, driver
that unblocks the immediate use cases we have, allows us to retire
the existing driver, and is suitable for submission to and maintenance
in the upstream kernel.

> re v4vchar & doing similar for argo:
> - we may prefer "can write N bytes? -> yes/no" or "how many bytes can
> write? -> N" over "try to write N bytes -> only wrote M, EAGAIN"
> - the latter can be implemented over the former, but not the other way around
> - starts to matter when you want to be able to implement in userspace
> & provide backpressure to peer userspace without additional buffering
> & potential lying about durability of writes
> - breaks cross-domain EPIPE boundary correctness
> - Qubes ran into same issues when porting vchan from Xen to KVM
> initially via vsock

Thanks - that's helpful; we will look at it when the driver work proceeds.
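
For reference, a minimal sketch of the layering point made above (the
primitive names are invented): a "how many bytes can I write?" query is
enough to build the short-write/EAGAIN behaviour on top of it, while
the reverse would need extra buffering:

    #include <errno.h>
    #include <stddef.h>
    #include <sys/types.h>

    size_t chan_writable_bytes(void);                 /* provided primitive */
    void   chan_copy_in(const void *buf, size_t len); /* succeeds for any
                                                         len <= writable    */

    ssize_t chan_write(const void *buf, size_t len)
    {
        size_t room = chan_writable_bytes();

        if (room == 0)
            return -EAGAIN;          /* honest backpressure to the caller */

        size_t n = len < room ? len : room;
        chan_copy_in(buf, n);
        return (ssize_t)n;           /* short write, no hidden buffering  */
    }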

> some virtio drivers explicitly use shared mem for more than just
> communication rings:
> - e.g. virtio-fs, which can map pages as DAX-like fs backing to share page cache
> - e.g. virtio-gpu, virtio-wayland, virtio-video, which deal in framebuffers
> - needs thought about how best to map semantics to (or at least
> interoperate cleanly & safely with) HMX-{only,mostly} world
> - the performance of shared mem actually can meaningfully matter for
> e.g. large framebuffers in particular due to fundamental memory
> bandwidth constraints

This is an important point, and given the clear utility of these
drivers it will be worth exploring what can be done to meet their
performance requirements and satisfy the semantics needed for them to
function. It may be the case that shared memory regions are going to
be necessary for some classes of driver - some investigation is
required. Along the lines of the research that Rich included in his
reply, it would be interesting to see whether modern hardware provides
primitives that can support efficient cross-domain data transport that
could be used for this. Thanks for raising it.

Christopher
Re: [openxt-dev] VirtIO-Argo initial development proposal
On Tue, Dec 29, 2020 at 1:17 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Dec 23, 2020 at 04:32:01PM -0500, Rich Persaud wrote:
> > On Dec 17, 2020, at 07:13, Jean-Philippe Ouellet <jpo@vt.edu> wrote:
> > > On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> > > <christopher.w.clark@gmail.com> wrote:
> > >> Hi all,
> > >>
> > >> I have written a page for the OpenXT wiki describing a proposal for
> > >> initial development towards the VirtIO-Argo transport driver, and the
> > >> related system components to support it, destined for OpenXT and
> > >> upstream projects:
> > >>
> > >> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
>
> Thanks for the detailed document, I've taken a look and there's indeed
> a lot of work to do listed there :). I have some suggestion and
> questions.
>
> Overall I think it would be easier for VirtIO to take a new transport
> if it's not tied to a specific hypervisor. The way Argo is implemented
> right now is using hypercalls, which is a mechanism specific to Xen.
> IMO it might be easier to start by having an Argo interface using
> MSRs, that all hypervisors can implement, and then base the VirtIO
> implementation on top of that interface. It could be presented as a
> hypervisor agnostic mediated interface for inter-domain communication
> or some such.

Thanks - that is an interesting option for a new interface and it
would definitely be advantageous to be able to extend the benefits of
this approach beyond the Xen hypervisor. I have added it to our
planning document to investigate.

> That kind of links to a question, has any of this been discussed with
> the VirtIO folks, either at OASIS or the Linux kernel?

We identified a need within the Automotive Grade Linux community for
the ability to enforce access control; they want to use VirtIO for
the usual reasons of standardization and the existing pool of
available drivers, but there is currently no good answer for having
both. So we put Argo forward in a presentation to the AGL
Virtualization Experts group in August, and they are discussing it.

The slides are available here:
https://lists.automotivelinux.org/g/agl-dev-community/attachment/8595/0/Argo%20and%20VirtIO.pdf

If you think there's anyone we should invite to the upcoming call on
the 14th of January, please let me know off-list.

> The document mentions: "Destination: mainline Linux kernel, via the
> Xen community" regarding the upstreamability of the VirtIO-Argo
> transport driver, but I think this would have to go through the VirtIO
> maintainers and not the Xen ones, hence you might want their feedback
> quite early to make sure they are OK with the approach taken, and in
> turn this might also require OASIS to agree to have a new transport
> documented.

We're aiming to get requirements within the Xen community first, since
there are multiple approaches to VirtIO with Xen ongoing at the
moment, but you are right that a design review by the VirtIO community
in the near term is important. I think it would be helpful to that
process if the Xen community has tried to reach a consensus on the
design beforehand.

> > > thoughts on how to handle device enumeration? hotplug notifications?
> > > - can't rely on xenstore
> > > - need some internal argo messaging for this?
> > > - name service w/ well-known names? starts to look like xenstore
> > > pretty quickly...
> > > - granular disaggregation of backend device-model providers desirable
>
> I'm also curious about this part and I was assuming this would be
> done using some kind of Argo messages, but there's no mention in the
> document. Would be nice to elaborate a little more about this in the
> document.

Ack, noted: some further design work is needed on this.

> > > how does resource accounting work? each side pays for their own delivery ring?
> > > - init in already-guest-mapped mem & simply register?
> > > - how does it compare to grant tables?
> > > - do you need to go through linux driver to alloc (e.g. xengntalloc)
> > > or has way to share arbitrary otherwise not-special userspace pages
> > > (e.g. u2mfn, with all its issues (pinning, reloc, etc.))?
> > >
> > > ioreq is tangled with grant refs, evt chans, generic vmexit
> > > dispatcher, instruction decoder, etc. none of which seems desirable if
> > > trying to move towards world with strictly safer guest interfaces
> > > exposed (e.g. HMX-only)
>
> I think this needs Christopher's clarification, but it's my
> understanding that the Argo transport wouldn't need IOREQs at all,
> since all data exchange would be done using the Argo interfaces, there
> would be no MMIO emulation or anything similar. The mention about
> IOREQs is because the Arm folks are working on using IOREQs in Arm to
> enable virtio-mmio on Xen.

Yes, that is correct.

> Fro my reading of the document, it seem Argo VirtIO would still rely
> on event channels, it would IMO be better if instead interrupts are
> delivered using a native mechanism, something like MSI delivery by
> using a destination APIC ID, vector, delivery mode and trigger mode.

Yes, Argo could deliver interrupts via another mechanism rather than
event channels; I have added this to our planning doc for investigation.
https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development+Phase+1

thanks,

Christopher
Re: [openxt-dev] VirtIO-Argo initial development proposal
Hi Roger,

On 29/12/2020 09:17, Roger Pau Monné wrote:
> On Wed, Dec 23, 2020 at 04:32:01PM -0500, Rich Persaud wrote:
>> On Dec 17, 2020, at 07:13, Jean-Philippe Ouellet <jpo@vt.edu> wrote:
>>> On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
>>> <christopher.w.clark@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> I have written a page for the OpenXT wiki describing a proposal for
>>>> initial development towards the VirtIO-Argo transport driver, and the
>>>> related system components to support it, destined for OpenXT and
>>>> upstream projects:
>>>>
>>>> https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
>
> Thanks for the detailed document, I've taken a look and there's indeed
> a lot of work to do listed there :). I have some suggestion and
> questions.
>
> Overall I think it would be easier for VirtIO to take a new transport
> if it's not tied to a specific hypervisor. The way Argo is implemented
> right now is using hypercalls, which is a mechanism specific to Xen.
The concept of a hypervisor call is not Xen-specific. Any other
hypervisor can easily implement one. At least this is the case on Arm,
because we have an instruction, 'hvc', that acts the same way as a
syscall but for the hypervisor.

What we would need to do is reserve a range for Argo. It should be
possible to do that on Arm thanks to the SMCCC (see [1]).
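
For illustration, an Argo call identified under the SMCCC could be
encoded along these lines; the owner and function numbers are
placeholders pending a real allocation, so treat them as assumptions
to be checked against the spec:

    #include <stdint.h>

    #define SMCCC_FAST_CALL     (1u << 31)
    #define SMCCC_CONV_64       (1u << 30)
    #define SMCCC_OWNER_SHIFT   24

    /* Standard hypervisor service owner; confirm against the SMCCC spec. */
    #define SMCCC_OWNER_HYP     5u

    /* Placeholder function number, pending a real allocation. */
    #define ARGO_SMCCC_FN_SENDV 0x0001u

    #define ARGO_SMCCC_SENDV_ID                              \
        (SMCCC_FAST_CALL | SMCCC_CONV_64 |                   \
         (SMCCC_OWNER_HYP << SMCCC_OWNER_SHIFT) | ARGO_SMCCC_FN_SENDV)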

I am not sure whether you have something similar on x86.

> IMO it might be easier to start by having an Argo interface using
> MSRs, that all hypervisors can implement, and then base the VirtIO
> implementation on top of that interface.
My concern is that the interface would need to be arch-specific. Would
you mind explaining what the problem is with implementing something
based on vmcall?

Cheers,

[1]
https://developer.arm.com/architectures/system-architectures/software-standards/smccc

--
Julien Grall
Re: [openxt-dev] VirtIO-Argo initial development proposal
On Wed, Dec 30, 2020 at 11:30:03AM +0000, Julien Grall wrote:
> Hi Roger,
>
> On 29/12/2020 09:17, Roger Pau Monné wrote:
> > On Wed, Dec 23, 2020 at 04:32:01PM -0500, Rich Persaud wrote:
> > > On Dec 17, 2020, at 07:13, Jean-Philippe Ouellet <jpo@vt.edu> wrote:
> > > > On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> > > > <christopher.w.clark@gmail.com> wrote:
> > > > > Hi all,
> > > > >
> > > > > I have written a page for the OpenXT wiki describing a proposal for
> > > > > initial development towards the VirtIO-Argo transport driver, and the
> > > > > related system components to support it, destined for OpenXT and
> > > > > upstream projects:
> > > > >
> > > > > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
> >
> > Thanks for the detailed document, I've taken a look and there's indeed
> > a lot of work to do listed there :). I have some suggestion and
> > questions.
> >
> > Overall I think it would be easier for VirtIO to take a new transport
> > if it's not tied to a specific hypervisor. The way Argo is implemented
> > right now is using hypercalls, which is a mechanism specific to Xen.
> The concept of hypervisor call is not Xen specific. Any other hypervisor can
> easily implement them. At least this is the case on Arm because we have an
> instruction 'hvc' that acts the same way as a syscall but for the
> hypervisor.
>
> What we would need to do is reserving a range for Argos. It should be
> possible to do it on Arm thanks to the SMCCC (see [1]).
>
> I am not sure whether you have something similar on x86.

On x86 Intel has vmcall and AMD vmmcall, but those are only available
to HVM guests.

> > IMO it might be easier to start by having an Argo interface using
> > MSRs, that all hypervisors can implement, and then base the VirtIO
> > implementation on top of that interface.
> My concern is the interface would need to be arch-specific. Would you mind
> to explain what's the problem to implement something based on vmcall?

More of a recommendation than a concern, but I think it would be more
attractive for upstream if we could provide an interface to Argo (or
hypervisor-mediated data exchange) that's not too tied to Xen
specifics. Using a defined vmcall/vmmcall interface (and leaving PV out
of the picture?) could be one option.

My suggestion to use MSRs was because I think every x86 hypervisor
must already have the logic to trap and handle some of those, every
OS has helpers to read/write MSRs, and those instructions are not
vendor-specific.

Roger.
Re: [openxt-dev] VirtIO-Argo initial development proposal
On Thu, 31 Dec 2020 at 08:46, Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Dec 30, 2020 at 11:30:03AM +0000, Julien Grall wrote:
> > Hi Roger,
> >
> > On 29/12/2020 09:17, Roger Pau Monné wrote:
> > > On Wed, Dec 23, 2020 at 04:32:01PM -0500, Rich Persaud wrote:
> > > > On Dec 17, 2020, at 07:13, Jean-Philippe Ouellet <jpo@vt.edu> wrote:
> > > > > On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> > > > > <christopher.w.clark@gmail.com> wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I have written a page for the OpenXT wiki describing a proposal for
> > > > > > initial development towards the VirtIO-Argo transport driver, and the
> > > > > > related system components to support it, destined for OpenXT and
> > > > > > upstream projects:
> > > > > >
> > > > > > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
> > >
> > > Thanks for the detailed document, I've taken a look and there's indeed
> > > a lot of work to do listed there :). I have some suggestion and
> > > questions.
> > >
> > > Overall I think it would be easier for VirtIO to take a new transport
> > > if it's not tied to a specific hypervisor. The way Argo is implemented
> > > right now is using hypercalls, which is a mechanism specific to Xen.
> > The concept of hypervisor call is not Xen specific. Any other hypervisor can
> > easily implement them. At least this is the case on Arm because we have an
> > instruction 'hvc' that acts the same way as a syscall but for the
> > hypervisor.
> >
> > What we would need to do is reserving a range for Argos. It should be
> > possible to do it on Arm thanks to the SMCCC (see [1]).
> >
> > I am not sure whether you have something similar on x86.
>
> On x86 Intel has vmcall and AMD vmmcall, but those are only available
> to HVM guests.

Right, as it would be for other architectures if one decided to
implement PV (or binary-translated) guests.
While it may be possible to use a different way to communicate on x86
(see more below), I am not sure this would be the case for other
architectures.
The closest thing to MSRs on Arm would be the System Registers, but I
am not aware of a range of IDs that could be used by software.

>
> > > IMO it might be easier to start by having an Argo interface using
> > > MSRs, that all hypervisors can implement, and then base the VirtIO
> > > implementation on top of that interface.
> > My concern is the interface would need to be arch-specific. Would you mind
> > to explain what's the problem to implement something based on vmcall?
>
> More of a recommendation than a concern, but I think it would be more
> attractive for upstream if we could provide an interface to Argo (or
> hypervisor mediated data exchange) that's no too tied into Xen
> specifics.

I agree with this statement. We also need an interface that is ideally
not too arch-specific, otherwise it will be more complicated to get it
adopted on other architectures.
For instance, at the moment, I don't really see what else could be used
on Arm (other than MMIO and PCI) if you want to care about PV (or
binary-translated) guests.

> My suggestion for using MSRs was because I think every x86 hypervisor
> must have the logic to trap and handle some of those, and also every
> OS has the helpers to read/write MSRs, and those instructions are not
> vendor specific.

In order to use MSRs, you would need to reserve a range of IDs. Does
x86 have a range of IDs that can be used for software purposes (i.e.
that current and future processors will not use)?

Cheers,
Re: [openxt-dev] VirtIO-Argo initial development proposal
On Thu, Dec 31, 2020 at 11:02:40AM +0000, Julien Grall wrote:
> On Thu, 31 Dec 2020 at 08:46, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Wed, Dec 30, 2020 at 11:30:03AM +0000, Julien Grall wrote:
> > > Hi Roger,
> > >
> > > On 29/12/2020 09:17, Roger Pau Monné wrote:
> > > > On Wed, Dec 23, 2020 at 04:32:01PM -0500, Rich Persaud wrote:
> > > > > On Dec 17, 2020, at 07:13, Jean-Philippe Ouellet <jpo@vt.edu> wrote:
> > > > > > On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> > > > > > <christopher.w.clark@gmail.com> wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I have written a page for the OpenXT wiki describing a proposal for
> > > > > > > initial development towards the VirtIO-Argo transport driver, and the
> > > > > > > related system components to support it, destined for OpenXT and
> > > > > > > upstream projects:
> > > > > > >
> > > > > > > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
> > > >
> > > > Thanks for the detailed document, I've taken a look and there's indeed
> > > > a lot of work to do listed there :). I have some suggestion and
> > > > questions.
> > > >
> > > > Overall I think it would be easier for VirtIO to take a new transport
> > > > if it's not tied to a specific hypervisor. The way Argo is implemented
> > > > right now is using hypercalls, which is a mechanism specific to Xen.
> > > The concept of hypervisor call is not Xen specific. Any other hypervisor can
> > > easily implement them. At least this is the case on Arm because we have an
> > > instruction 'hvc' that acts the same way as a syscall but for the
> > > hypervisor.
> > >
> > > What we would need to do is reserving a range for Argos. It should be
> > > possible to do it on Arm thanks to the SMCCC (see [1]).
> > >
> > > I am not sure whether you have something similar on x86.
> >
> > On x86 Intel has vmcall and AMD vmmcall, but those are only available
> > to HVM guests.
>
> Right, as it would for other architecture if one decided to implement
> PV (or binary translated) guests.
> While it may be possible to use a different way to communicate on x86
> (see more below), I am not sure this would be the case for other
> architectures.
> The closest thing to MSR on Arm would be the System Registers. But I
> am not aware of a range of IDs that could be used by the software.

I don't really know enough about Arm to make any helpful statement
here. My point was to keep in mind that it might be interesting to try
to define a hypervisor-agnostic mediated data exchange interface, so
that whatever is implemented in the VirtIO transport layer could also
be used by other hypervisors without having to modify the transport
layer itself. If that's better done using a vmcall interface, that's
fine, as long as we provide a sane interface that other hypervisors
can (easily) implement.

> >
> > > > IMO it might be easier to start by having an Argo interface using
> > > > MSRs, that all hypervisors can implement, and then base the VirtIO
> > > > implementation on top of that interface.
> > > My concern is the interface would need to be arch-specific. Would you mind
> > > to explain what's the problem to implement something based on vmcall?
> >
> > More of a recommendation than a concern, but I think it would be more
> > attractive for upstream if we could provide an interface to Argo (or
> > hypervisor mediated data exchange) that's no too tied into Xen
> > specifics.
>
> I agree with this statement. We also need an interface that is ideally
> not to arch-specific otherwise it will be more complicated to get
> adopted on other arch.
> For instance, at the moment, I don't really see what else can be used
> on Arm (other that MMIO and PCI) if you want to care about PV (or
> binary translated) guests.

My recommendation was mostly to make Argo easier to propose as a
hypervisor-agnostic mediated data exchange, which in turn could make
adding a VirtIO transport layer based on it easier. If that's not the
case, let's just forget about this.

> > My suggestion for using MSRs was because I think every x86 hypervisor
> > must have the logic to trap and handle some of those, and also every
> > OS has the helpers to read/write MSRs, and those instructions are not
> > vendor specific.
>
> In order to use MSRs, you would need to reserve a range of IDs. Do x86
> have a range of ID that can be used for software purposes (i.e. the
> current and future processors will not use it)?

There's a range of MSRs (0x40000000-0x400000FF) that Intel guarantees
will always be invalid on bare metal, so VMware, Hyper-V and others
have started using this range to add virtualization-specific MSRs. You
can grep for the HV_X64_MSR_* defines in Xen for some of the Hyper-V
ones.
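
As a purely hypothetical illustration (the indices below are
placeholders inside that range and are not allocated anywhere), an
MSR-based Argo interface could expose something like:

    #include <stdint.h>

    /* Placeholder base inside the synthetic range; not allocated anywhere. */
    #define ARGO_MSR_BASE      0x400000F0u

    /* rdmsr: report Argo capabilities to the guest. */
    #define ARGO_MSR_FEATURES  (ARGO_MSR_BASE + 0)

    /* wrmsr: guest-physical address of an operation descriptor. */
    #define ARGO_MSR_OP_GPA    (ARGO_MSR_BASE + 1)

    /* wrmsr: doorbell asking the hypervisor to process the descriptor. */
    #define ARGO_MSR_DOORBELL  (ARGO_MSR_BASE + 2)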

Roger.
Re: [openxt-dev] VirtIO-Argo initial development proposal
On Thu, Dec 31, 2020 at 3:39 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Thu, Dec 31, 2020 at 11:02:40AM +0000, Julien Grall wrote:
> > On Thu, 31 Dec 2020 at 08:46, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Wed, Dec 30, 2020 at 11:30:03AM +0000, Julien Grall wrote:
> > > > Hi Roger,
> > > >
> > > > On 29/12/2020 09:17, Roger Pau Monné wrote:
> > > > > On Wed, Dec 23, 2020 at 04:32:01PM -0500, Rich Persaud wrote:
> > > > > > On Dec 17, 2020, at 07:13, Jean-Philippe Ouellet <jpo@vt.edu> wrote:
> > > > > > > On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> > > > > > > <christopher.w.clark@gmail.com> wrote:
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I have written a page for the OpenXT wiki describing a proposal for
> > > > > > > > initial development towards the VirtIO-Argo transport driver, and the
> > > > > > > > related system components to support it, destined for OpenXT and
> > > > > > > > upstream projects:
> > > > > > > >
> > > > > > > > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
> > > > >
> > > > > Thanks for the detailed document, I've taken a look and there's indeed
> > > > > a lot of work to do listed there :). I have some suggestion and
> > > > > questions.
> > > > >
> > > > > Overall I think it would be easier for VirtIO to take a new transport
> > > > > if it's not tied to a specific hypervisor. The way Argo is implemented
> > > > > right now is using hypercalls, which is a mechanism specific to Xen.
> > > > The concept of hypervisor call is not Xen specific. Any other hypervisor can
> > > > easily implement them. At least this is the case on Arm because we have an
> > > > instruction 'hvc' that acts the same way as a syscall but for the
> > > > hypervisor.
> > > >
> > > > What we would need to do is reserving a range for Argos. It should be
> > > > possible to do it on Arm thanks to the SMCCC (see [1]).
> > > >
> > > > I am not sure whether you have something similar on x86.
> > >
> > > On x86 Intel has vmcall and AMD vmmcall, but those are only available
> > > to HVM guests.
> >
> > Right, as it would for other architecture if one decided to implement
> > PV (or binary translated) guests.
> > While it may be possible to use a different way to communicate on x86
> > (see more below), I am not sure this would be the case for other
> > architectures.
> > The closest thing to MSR on Arm would be the System Registers. But I
> > am not aware of a range of IDs that could be used by the software.
>
> I don't really know that much about Arm to make any helpful statement
> here. My point was to keep in mind that it might be interesting to try
> to define an hypervisor agnostic mediated data exchange interface, so
> that whatever is implemented in the VirtIO transport layer could also
> be used by other hypervisors without having to modify the transport
> layer itself. If that's better done using a vmcall interface that's
> fine, as long as we provide a sane interface that other hypervisors
> can (easily) implement.
>
> > >
> > > > > IMO it might be easier to start by having an Argo interface using
> > > > > MSRs, that all hypervisors can implement, and then base the VirtIO
> > > > > implementation on top of that interface.
> > > > My concern is the interface would need to be arch-specific. Would you mind
> > > > to explain what's the problem to implement something based on vmcall?
> > >
> > > More of a recommendation than a concern, but I think it would be more
> > > attractive for upstream if we could provide an interface to Argo (or
> > > hypervisor mediated data exchange) that's no too tied into Xen
> > > specifics.
> >
> > I agree with this statement. We also need an interface that is ideally
> > not to arch-specific otherwise it will be more complicated to get
> > adopted on other arch.
> > For instance, at the moment, I don't really see what else can be used
> > on Arm (other that MMIO and PCI) if you want to care about PV (or
> > binary translated) guests.
>
> My recommendation was mostly to make Argo easier to propose as an
> hypervisor agnostic mediated data exchange, which in turn could make
> adding a VirtIO transport layer based on it easier. If that's not the
> case let's just forget about this.
>
> > > My suggestion for using MSRs was because I think every x86 hypervisor
> > > must have the logic to trap and handle some of those, and also every
> > > OS has the helpers to read/write MSRs, and those instructions are not
> > > vendor specific.
> >
> > In order to use MSRs, you would need to reserve a range of IDs. Do x86
> > have a range of ID that can be used for software purposes (i.e. the
> > current and future processors will not use it)?
>
> There's a range of MSRs (0x40000000-0x400000FF) that are guaranteed to
> always be invalid on bare metal by Intel, so VMware, HyperV and
> others have started using this range to add virtualization specific
> MSRs. You can grep for HV_X64_MSR_* defines on Xen for some of the
> HyperV ones.

I've written a summary of the points from this thread in a project
description at this wiki page:

https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development+Phase+1#Project:-Hypervisor-agnostic-Hypervisor-Interface

-- please let me know if anything is captured incorrectly, or of any
amendments that you would like made.

As a reminder, we have an upcoming VirtIO-Argo call on Thursday next
week, 14th of January at 16:00 UTC.

thanks,

Christopher