Mailing List Archive: [RFC] Extend the number of event channels availabe to guests

[RFC] Extend the number of event channels availabe to guests

Sep 19, 2012, 4:49 PM

Post #1 of 7 (415 views)

Hello,
reported below there is a request for comment on the plan to extend the
number of event channel the hypervisor can grok. I've informally
discussed some parts of this with Ian Campbell but I would like to
formalize it someway, hear more opinion on it and possibly give the
project more exposure and guidance, as this is supposed to be one of the
major features for 4.3.

SYNOPSIS
Currently the number of eventchannel every guest can setup is 1k or 4k,
depending if the guest is 32 or 64 bits. This is a limitation in the
number of guests an host can actively run, because the host and guest
will need to setup eventchannel among them and thus at some point Dom0
will exhaust the number of available eventchannel for all guests. Scope
of this work is to raise the number of available eventchannel for every
guest (and then, also for Dom0).

The 4k number cames out directly by the eventchannel organization. In
order to address a single channel, every guest keeps a map of
corresponding bits in its shared page with the hypervisor. However, in
order to avoid to search anytime for 4k bits, a per-cpu, upper level
further mask is present to address singularly smaller words of the
pending eventchannel mask (making the organization of the code at all
the effects a two-level lookup table.

In order to expand the number of available eventchannels, one must take
into account 2 important aspects, related to compatibility: ABI and
ability to run both old and new method altogether.
The former one is about the fact that all the controlling structures
related to eventchannels live in public ABI of the hypervisor. A valid
solution, then, must not enforce any ABI changes at all.
The latter one is about the ability to leave the hypervisor to work with
both the old model and the new one. This is to keep support with guests
running an older kernel than the patched one.

Proposal
The proposal is pretty simple: the eventchannel search will become a
three-level lookup table, with the leaf level being composed by shared
pages registered at boot time by the guests.
The bitmap working now as leaf (then called "second level") will work
alternatively as leaf level still (for older kernel) or for intermediate
level to address into a new array of shared pages (for newer kernels).
This leaves the possibility to reuse the existing mechanisms without
modifying its internals.

More specifically, what needs to happen:
- Add new members to struct domain to handle an array of pages (to
contain the actual evtchn bitmaps), a further array of pages (to contain
the evtchn masks) and a control bit to say if it is subjective to the
new mode or not. Initially the arrays will be empty and the control bit
will be OFF.
- At init_platform() time, the guest must allocate the pages to compose
the 2 arrays and invoke a novel hypercall which, at big lines, does the
following:
* Creates some pages to populate the new arrays in struct domain via
alloc_xenheap_pages()
* Recreates the mapping with the gpfn passed from the userland, using
basically guest_physmap_add_page()
* Sets the control bit to ON
- Places that need to access to the actual leaf bit (like, for example,
xen_evtchn_do_upcall()) will need to double check the control bit. If it
is OFF they consider the second level as the leaf one, otherwise they
will do a further lookup to get the bit from the new array of pages.

Of course there are some nits to be decided yet, like, for example:
* How many pages should the new level have? We can start by populating
just one, for example
* Who should have really the knowledge of how many pages to allocate?
Likely the hypervisor should have a threshhold, but in general we may
want to have a posting mechanism to have the guest ask the hypervisor
before-hand and satisfy its actual request
* How many bits should be indirected in the third-level by every single
bit in the second-level? (that is a really minor factor, but still).

Please let me know what do you think about this.

Thanks,
Attilio

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel