On Tue, Apr 11, 2017 at 02:21:07AM -0600, Jan Beulich wrote:
>>>> On 11.04.17 at 02:59, <email@example.com> wrote:
>> As you know, with VT-d PI enabled, hardware can directly deliver external
>> interrupts to guest without any VMM intervention. It will reduces overall
>> interrupt latency to guest and reduces overheads otherwise incurred by the
>> VMM for virtualizing interrupts. In my mind, it's an important feature to
>> interrupt virtualization.
>> But VT-d PI feature is disabled by default on Xen for some corner
>> cases and bugs. Based on Feng's work, we have fixed those corner
>> cases related to VT-d PI. Do you think it is a time to enable VT-d PI by
>> default. If no, could you list your concerns so that we can resolve them?
>I don't recall you addressing the main issue (blocked vCPU-s list
>length; see the comment next to the iommu_intpost definition).
Indeed. I have gone through the discussion that happened in April 2016 [1, 2].
First of all, I admit this is an issue in extreme cases and we should
come up with a solution.
The problem we are facing is:
There is a per-cpu list that holds all the blocked vCPUs on a
pCPU. When a wakeup interrupt arrives, the interrupt handler traverses
the list to wake the vCPUs whose pi_desc indicates an interrupt has
been posted. There is no policy restricting the size of the list, so
in some extreme cases the list can grow long enough to cause
problems (the most obvious one being interrupt latency).
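To make the cost concrete, here is a minimal standalone model of that scan. It is only a sketch, not the actual Xen handler: struct blocked_vcpu, its fields and pi_wakeup_scan() are hypothetical names, and the model leaves woken vCPUs on the list for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a blocked vCPU; not Xen's struct vcpu. */
struct blocked_vcpu {
    bool pi_on;                 /* models "an interrupt has been posted" in pi_desc */
    bool woken;                 /* set when the handler wakes this vCPU */
    struct blocked_vcpu *next;  /* next entry on the same pCPU's blocked list */
};

/*
 * Model of the wakeup handler for one pCPU. Every entry is visited, so
 * the cost is proportional to the list length, no matter how few vCPUs
 * actually have an interrupt posted. Returns the number of entries visited.
 */
static size_t pi_wakeup_scan(struct blocked_vcpu *head)
{
    size_t visited = 0;

    for (struct blocked_vcpu *v = head; v != NULL; v = v->next) {
        visited++;
        if (v->pi_on)
            v->woken = true;
    }
    return visited;
}
```

Even when a single vCPU has an interrupt pending, the handler still walks the whole list, which is why the list length directly bounds the time spent in the handler.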
The theoretical maximum number of entries in the list is 4M, as one host can
have 32k domains and every domain can have 128 vCPUs. If all the vCPUs
are blocked on one list, the list reaches its theoretical maximum.
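The arithmetic behind the 4M figure, written out as a small sketch. The two limits are the ones quoted above; the macro and function names are illustrative only and are not taken from the Xen tree.

```c
/* Figures taken from the text above; the names are illustrative only. */
#define MAX_DOMAINS          (32UL * 1024UL)  /* "32k domains" */
#define MAX_VCPUS_PER_DOMAIN 128UL            /* "128 vCPUs" per domain */

/* Worst case: every vCPU of every domain is blocked on the same pCPU. */
static unsigned long worst_case_list_len(void)
{
    return MAX_DOMAINS * MAX_VCPUS_PER_DOMAIN;  /* 32768 * 128 = 4194304 (4M) */
}
```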
The root cause of this issue, I think, is that the wakeup interrupt
vector is shared by all the vCPUs on one pCPU. Lacking enough
information (such as which device sent the interrupt or which IRTE
translated it), there is no effective way to identify the
interrupt's destination vCPU except traversing this list. Right? So we
can only mitigate this issue by decreasing or limiting the
maximum number of entries in one list.
There are several methods we could take to mitigate this issue:
1. According to your discussions, evenly distributing all the blocked
vCPUs among all pCPUs can mitigate this issue. With this approach, the
case where all vCPUs are blocked on one list can be avoided. It can
decrease the maximum number of entries in one list by a factor of N
(N is the number of pCPUs).
2. Don't put blocked vCPUs which won't be woken by the wakeup
interrupt into the per-cpu list. Currently, we put the blocked vCPUs
belonging to domains which have assigned devices into the list. But if a
blocked vCPU of such a domain is not the destination of any posted-format
IRTE, it needn't be added to the per-cpu list. Such a blocked vCPU
will be woken by IPIs or other virtual interrupts. In this way, we
can decrease the number of entries in the per-cpu list.
3. Like what we do in struct irq_guest_action_t, can we limit the
maximum number of entries we support in the list? With this approach, during
domain creation, we calculate the available entries and compare them with
the domain's vCPU count to decide whether the domain can use VT-d PI.
This method imposes a strict limit on the maximum number of entries in
one list. But it may affect vCPU hotplug.
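A rough sketch of how methods 1 and 3 might look, as a standalone model rather than a patch. Everything here is an assumption made for illustration: NR_PCPUS, PI_LIST_LIMIT, pi_list_len[], pick_least_loaded_pcpu() and reserve_pi_entries() do not exist in Xen, and locking is ignored entirely.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PCPUS      4    /* hypothetical host size */
#define PI_LIST_LIMIT 128  /* hypothetical cap on entries, for method 3 */

/* Number of blocked vCPUs currently queued on each pCPU's list. */
static size_t pi_list_len[NR_PCPUS];

/* Entries already promised to domains that were allowed to use VT-d PI. */
static size_t pi_entries_reserved;

/* Method 1: queue a newly blocked vCPU on the currently shortest list. */
static unsigned int pick_least_loaded_pcpu(void)
{
    unsigned int best = 0;

    for (unsigned int cpu = 1; cpu < NR_PCPUS; cpu++)
        if (pi_list_len[cpu] < pi_list_len[best])
            best = cpu;
    return best;
}

/*
 * Method 3: at domain creation, compare the domain's vCPU count with the
 * entries still available. The cap applies to the total, so even if all
 * reserved vCPUs block on one pCPU the list never exceeds PI_LIST_LIMIT.
 */
static bool reserve_pi_entries(size_t nr_vcpus)
{
    if (nr_vcpus > PI_LIST_LIMIT - pi_entries_reserved)
        return false;  /* domain falls back to not using VT-d PI */
    pi_entries_reserved += nr_vcpus;
    return true;
}
```

In this model the vCPU hotplug concern shows up directly: a domain that reserved entries for its initial vCPU count would need another successful reserve_pi_entries() call before adding a vCPU.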
In your view, which of these methods are feasible and
acceptable? I will attempt to mitigate this issue per your advice.