Mailing List Archive

[PVH]: Help: msi.c
The second is msi.c. I don't understand it very well, and need to
figure what to do for PVH. Would appreciate suggestions if anyone knows.

Thanks for your time,
Mukesh



diff -r ba5e9253d04d xen/arch/x86/msi.c
--- a/xen/arch/x86/msi.c Thu Nov 01 16:53:31 2012 -0700
+++ b/xen/arch/x86/msi.c Fri Dec 07 17:45:07 2012 -0800
@@ -766,6 +766,9 @@ static int msix_capability_init(struct p
WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
dev->msix_pba.last));

+/* PVH: fixme: not a clue what to do here :) */
+if (is_pvh_domain(dev->domain) && dev->domain->domain_id != 0)
+{
if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
dev->msix_table.last) )
WARN();
@@ -793,6 +796,7 @@ static int msix_capability_init(struct p
/* XXX How to deal with existing mappings? */
}
}
+}
}
WARN_ON(dev->msix_nr_entries != nr_entries);
WARN_ON(dev->msix_table.first != (table_paddr >> PAGE_SHIFT));

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
>>> On 08.12.12 at 02:46, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> The second is msi.c. I don't understand it very well, and need to
> figure what to do for PVH. Would appreciate suggestions if anyone knows.

Why do you think you need to do something specially for PVH here
in the first place? The only adjustment I would expect might be
needed is address translation (depending on how PVH deal with
MMIO addresses).

In any case is checking the domain ID here pointless - this code
can only be run in the context of Dom0 anyway.

What the code here does is protect the MSI-X table of a device
from Dom0 write accesses - Xen itself needs to be able to fully
control the mask bits, and hence Dom0 shouldn't (and qemu, even
if running in Dom0) mustn't write to it (which was happening in the
past).

Jan

> --- a/xen/arch/x86/msi.c Thu Nov 01 16:53:31 2012 -0700
> +++ b/xen/arch/x86/msi.c Fri Dec 07 17:45:07 2012 -0800
> @@ -766,6 +766,9 @@ static int msix_capability_init(struct p
> WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
> dev->msix_pba.last));
>
> +/* PVH: fixme: not a clue what to do here :) */
> +if (is_pvh_domain(dev->domain) && dev->domain->domain_id != 0)
> +{
> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
> dev->msix_table.last) )
> WARN();
> @@ -793,6 +796,7 @@ static int msix_capability_init(struct p
> /* XXX How to deal with existing mappings? */
> }
> }
> +}
> }
> WARN_ON(dev->msix_nr_entries != nr_entries);
> WARN_ON(dev->msix_table.first != (table_paddr >> PAGE_SHIFT));




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Mon, 10 Dec 2012 09:43:34 +0000
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 08.12.12 at 02:46, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > The second is msi.c. I don't understand it very well, and need to
> > figure what to do for PVH. Would appreciate suggestions if anyone
> > knows.
>
> Why do you think you need to do something specially for PVH here
> in the first place? The only adjustment I would expect might be
> needed is address translation (depending on how PVH deal with
> MMIO addresses).

Ok, thanks. Looks like I'm missing some address translation somewhere,
getting EPT violation from dom0. Time to read up more on msi-x and debug.

thanks
Mukesh


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Tue, 11 Dec 2012, Mukesh Rathor wrote:
> On Mon, 10 Dec 2012 09:43:34 +0000
> "Jan Beulich" <JBeulich@suse.com> wrote:
>
> > >>> On 08.12.12 at 02:46, Mukesh Rathor <mukesh.rathor@oracle.com>
> > >>> wrote:
> > > The second is msi.c. I don't understand it very well, and need to
> > > figure what to do for PVH. Would appreciate suggestions if anyone
> > > knows.
> >
> > Why do you think you need to do something specially for PVH here
> > in the first place? The only adjustment I would expect might be
> > needed is address translation (depending on how PVH deal with
> > MMIO addresses).
>
> Ok, thanks. Looks like I'm missing some address translation somewhere,
> getting EPT violation from dom0. Time to read up more on msi-x and debug.

That's strange because AFAIK Linux is never editing the MSI-X entries
directly: give a look at arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs,
Linux only remaps MSIs into pirqs using hypercalls.
Xen should be the only one to touch the real MSI-X table.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Tue, 11 Dec 2012 12:10:19 +0000
Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:

> On Tue, 11 Dec 2012, Mukesh Rathor wrote:
> > On Mon, 10 Dec 2012 09:43:34 +0000
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >
> > > >>> On 08.12.12 at 02:46, Mukesh Rathor <mukesh.rathor@oracle.com>
> > > >>> wrote:
> > > > The second is msi.c. I don't understand it very well, and need
> > > > to figure what to do for PVH. Would appreciate suggestions if
> > > > anyone knows.
> > >
> > > Why do you think you need to do something specially for PVH here
> > > in the first place? The only adjustment I would expect might be
> > > needed is address translation (depending on how PVH deal with
> > > MMIO addresses).
> >
> > Ok, thanks. Looks like I'm missing some address translation
> > somewhere, getting EPT violation from dom0. Time to read up more on
> > msi-x and debug.
>
> That's strange because AFAIK Linux is never editing the MSI-X entries
> directly: give a look at
> arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs, Linux only remaps MSIs
> into pirqs using hypercalls. Xen should be the only one to touch the
> real MSI-X table.

So, this is what's happening. The side effect of :

if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
dev->msix_table.last) )
WARN();
if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
dev->msix_pba.last) )
WARN();

in msix_capability_init() in xen is that the dom0 EPT entries that I've
mapped are going from RW to read only. Then when dom0 accesses it, I
get EPT violation. In case of pure PV, the PTE entry to access the
iomem is RW, and the above rangeset adding doesn't affect it. I don't
understand why? Looking into that now...

thanks
Mukesh

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Wed, 12 Dec 2012 17:15:23 -0800
Mukesh Rathor <mukesh.rathor@oracle.com> wrote:

> On Tue, 11 Dec 2012 12:10:19 +0000
> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>
> > On Tue, 11 Dec 2012, Mukesh Rathor wrote:
> > > On Mon, 10 Dec 2012 09:43:34 +0000
> > > "Jan Beulich" <JBeulich@suse.com> wrote:
> >
> > That's strange because AFAIK Linux is never editing the MSI-X
> > entries directly: give a look at
> > arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs, Linux only remaps
> > MSIs into pirqs using hypercalls. Xen should be the only one to
> > touch the real MSI-X table.
>
> So, this is what's happening. The side effect of :
>
> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
> dev->msix_table.last) )
> WARN();
> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
> dev->msix_pba.last) )
> WARN();
>
> in msix_capability_init() in xen is that the dom0 EPT entries that
> I've mapped are going from RW to read only. Then when dom0 accesses
> it, I get EPT violation. In case of pure PV, the PTE entry to access
> the iomem is RW, and the above rangeset adding doesn't affect it. I
> don't understand why? Looking into that now...

Ah, it's coming from:

ept_p2m_type_to_flags():
case p2m_mmio_direct:
entry->r = entry->x = 1;
entry->w = !rangeset_contains_singleton(mmio_ro_ranges, entry->mfn);

I suppose, the best fix would be to add a check here for dom0 and let it
have w access?

thanks
mukesh

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
>>> On 13.12.12 at 02:43, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Wed, 12 Dec 2012 17:15:23 -0800
> Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>
>> On Tue, 11 Dec 2012 12:10:19 +0000
>> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>>
>> > On Tue, 11 Dec 2012, Mukesh Rathor wrote:
>> > > On Mon, 10 Dec 2012 09:43:34 +0000
>> > > "Jan Beulich" <JBeulich@suse.com> wrote:
>> >
>> > That's strange because AFAIK Linux is never editing the MSI-X
>> > entries directly: give a look at
>> > arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs, Linux only remaps
>> > MSIs into pirqs using hypercalls. Xen should be the only one to
>> > touch the real MSI-X table.
>>
>> So, this is what's happening. The side effect of :
>>
>> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
>> dev->msix_table.last) )
>> WARN();
>> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
>> dev->msix_pba.last) )
>> WARN();
>>
>> in msix_capability_init() in xen is that the dom0 EPT entries that
>> I've mapped are going from RW to read only. Then when dom0 accesses
>> it, I get EPT violation. In case of pure PV, the PTE entry to access
>> the iomem is RW, and the above rangeset adding doesn't affect it. I
>> don't understand why? Looking into that now...

As far as I was able to tell back at the time when I implemented
this, existing code shouldn't have mappings for these tables in
place at the time these ranges get added here. But I noted in
the patch description that this is a potential issue (and may need
fixing if deemed severe enough - back then, apparently nobody
really cared, perhaps largely because passthrough to PV guests
isn't considered fully secure anyway).

Now - did that change? I.e. can you describe where the mappings
come from that cause the problem here? Because if that is
possible even with non-hostile code, that might warrant properly
dealing with that, even if pretty involved (I don't think there's a
way to find all mappings of a given page other than a brute force
scan of all L1 page tables the domain currently has).

> Ah, it's coming from:
>
> ept_p2m_type_to_flags():
> case p2m_mmio_direct:
> entry->r = entry->x = 1;
> entry->w = !rangeset_contains_singleton(mmio_ro_ranges, entry->mfn);
>
> I suppose, the best fix would be to add a check here for dom0 and let it
> have w access?

No, absolutely not. Nor would that help with the same issue in a
DomU.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Thu, 13 Dec 2012, Jan Beulich wrote:
> >>> On 13.12.12 at 02:43, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > On Wed, 12 Dec 2012 17:15:23 -0800
> > Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> >
> >> On Tue, 11 Dec 2012 12:10:19 +0000
> >> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> >>
> >> > On Tue, 11 Dec 2012, Mukesh Rathor wrote:
> >> > > On Mon, 10 Dec 2012 09:43:34 +0000
> >> > > "Jan Beulich" <JBeulich@suse.com> wrote:
> >> >
> >> > That's strange because AFAIK Linux is never editing the MSI-X
> >> > entries directly: give a look at
> >> > arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs, Linux only remaps
> >> > MSIs into pirqs using hypercalls. Xen should be the only one to
> >> > touch the real MSI-X table.
> >>
> >> So, this is what's happening. The side effect of :
> >>
> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
> >> dev->msix_table.last) )
> >> WARN();
> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
> >> dev->msix_pba.last) )
> >> WARN();
> >>
> >> in msix_capability_init() in xen is that the dom0 EPT entries that
> >> I've mapped are going from RW to read only. Then when dom0 accesses
> >> it, I get EPT violation. In case of pure PV, the PTE entry to access
> >> the iomem is RW, and the above rangeset adding doesn't affect it. I
> >> don't understand why? Looking into that now...
>
> As far as I was able to tell back at the time when I implemented
> this, existing code shouldn't have mappings for these tables in
> place at the time these ranges get added here. But I noted in
> the patch description that this is a potential issue (and may need
> fixing if deemed severe enough - back then, apparently nobody
> really cared, perhaps largely because passthrough to PV guests
> isn't considered fully secure anyway).
>
> Now - did that change? I.e. can you describe where the mappings
> come from that cause the problem here?

The generic Linux MSI-X handling code does that, before calling the
arch specific msi setup function, for which we have a xen version
(xen_initdom_setup_msi_irqs):

pci_enable_msix -> msix_capability_init -> msix_map_region

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
>>> On 13.12.12 at 13:19, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
wrote:
> On Thu, 13 Dec 2012, Jan Beulich wrote:
>> >>> On 13.12.12 at 02:43, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> > On Wed, 12 Dec 2012 17:15:23 -0800
>> > Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> >
>> >> On Tue, 11 Dec 2012 12:10:19 +0000
>> >> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>> >>
>> >> > On Tue, 11 Dec 2012, Mukesh Rathor wrote:
>> >> > > On Mon, 10 Dec 2012 09:43:34 +0000
>> >> > > "Jan Beulich" <JBeulich@suse.com> wrote:
>> >> >
>> >> > That's strange because AFAIK Linux is never editing the MSI-X
>> >> > entries directly: give a look at
>> >> > arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs, Linux only remaps
>> >> > MSIs into pirqs using hypercalls. Xen should be the only one to
>> >> > touch the real MSI-X table.
>> >>
>> >> So, this is what's happening. The side effect of :
>> >>
>> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
>> >> dev->msix_table.last) )
>> >> WARN();
>> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
>> >> dev->msix_pba.last) )
>> >> WARN();
>> >>
>> >> in msix_capability_init() in xen is that the dom0 EPT entries that
>> >> I've mapped are going from RW to read only. Then when dom0 accesses
>> >> it, I get EPT violation. In case of pure PV, the PTE entry to access
>> >> the iomem is RW, and the above rangeset adding doesn't affect it. I
>> >> don't understand why? Looking into that now...
>>
>> As far as I was able to tell back at the time when I implemented
>> this, existing code shouldn't have mappings for these tables in
>> place at the time these ranges get added here. But I noted in
>> the patch description that this is a potential issue (and may need
>> fixing if deemed severe enough - back then, apparently nobody
>> really cared, perhaps largely because passthrough to PV guests
>> isn't considered fully secure anyway).
>>
>> Now - did that change? I.e. can you describe where the mappings
>> come from that cause the problem here?
>
> The generic Linux MSI-X handling code does that, before calling the
> arch specific msi setup function, for which we have a xen version
> (xen_initdom_setup_msi_irqs):
>
> pci_enable_msix -> msix_capability_init -> msix_map_region

Ah, okay, (of course?) I had looked only at the forward ported
version of this. Is all that fiddling with the mask bits really
being suppressed properly when running under Xen? Otherwise
pv-ops is quite broken in this regard at present... And if it is,
I don't see what the respective ioremap() is good for here in
the Xen case.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Thu, 13 Dec 2012, Jan Beulich wrote:
> >>> On 13.12.12 at 13:19, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> wrote:
> > On Thu, 13 Dec 2012, Jan Beulich wrote:
> >> >>> On 13.12.12 at 02:43, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> >> > On Wed, 12 Dec 2012 17:15:23 -0800
> >> > Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> >> >
> >> >> On Tue, 11 Dec 2012 12:10:19 +0000
> >> >> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> >> >>
> >> >> > On Tue, 11 Dec 2012, Mukesh Rathor wrote:
> >> >> > > On Mon, 10 Dec 2012 09:43:34 +0000
> >> >> > > "Jan Beulich" <JBeulich@suse.com> wrote:
> >> >> >
> >> >> > That's strange because AFAIK Linux is never editing the MSI-X
> >> >> > entries directly: give a look at
> >> >> > arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs, Linux only remaps
> >> >> > MSIs into pirqs using hypercalls. Xen should be the only one to
> >> >> > touch the real MSI-X table.
> >> >>
> >> >> So, this is what's happening. The side effect of :
> >> >>
> >> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
> >> >> dev->msix_table.last) )
> >> >> WARN();
> >> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
> >> >> dev->msix_pba.last) )
> >> >> WARN();
> >> >>
> >> >> in msix_capability_init() in xen is that the dom0 EPT entries that
> >> >> I've mapped are going from RW to read only. Then when dom0 accesses
> >> >> it, I get EPT violation. In case of pure PV, the PTE entry to access
> >> >> the iomem is RW, and the above rangeset adding doesn't affect it. I
> >> >> don't understand why? Looking into that now...
> >>
> >> As far as I was able to tell back at the time when I implemented
> >> this, existing code shouldn't have mappings for these tables in
> >> place at the time these ranges get added here. But I noted in
> >> the patch description that this is a potential issue (and may need
> >> fixing if deemed severe enough - back then, apparently nobody
> >> really cared, perhaps largely because passthrough to PV guests
> >> isn't considered fully secure anyway).
> >>
> >> Now - did that change? I.e. can you describe where the mappings
> >> come from that cause the problem here?
> >
> > The generic Linux MSI-X handling code does that, before calling the
> > arch specific msi setup function, for which we have a xen version
> > (xen_initdom_setup_msi_irqs):
> >
> > pci_enable_msix -> msix_capability_init -> msix_map_region
>
> Ah, okay, (of course?) I had looked only at the forward ported
> version of this. Is all that fiddling with the mask bits really
> being suppressed properly when running under Xen? Otherwise
> pv-ops is quite broken in this regard at present... And if it is,
> I don't see what the respective ioremap() is good for here in
> the Xen case.

Actually I think that you might be right: just looking at the code it
seems that the mask bits get written to the table once as part of the
initialization process:

pci_enable_msix -> msix_capability_init -> msix_program_entries

Unfortunately msix_program_entries is called few lines after
arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI as
a pirq.
However after that is done, all the masking/unmask is done via irq_mask
that we handle properly masking/unmasking the corresponding event
channels.


Possible solutions on top of my head:

- in msix_program_entries instead of writing to the table directly
(__msix_mask_irq), call desc->irq_data.chip->irq_mask();

- replace msix_program_entries with arch_msix_program_entries, but it
would probably be unpopular.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Thu, 13 Dec 2012 14:25:16 +0000
Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:

> On Thu, 13 Dec 2012, Jan Beulich wrote:
> > >>> On 13.12.12 at 13:19, Stefano Stabellini
> > >>> <stefano.stabellini@eu.citrix.com>
> > wrote:
> > > On Thu, 13 Dec 2012, Jan Beulich wrote:
> > >> >>> On 13.12.12 at 02:43, Mukesh Rathor
> > >> >>> <mukesh.rathor@oracle.com> wrote:
>
> Actually I think that you might be right: just looking at the code it
> seems that the mask bits get written to the table once as part of the
> initialization process:
>
> pci_enable_msix -> msix_capability_init -> msix_program_entries
>
> Unfortunately msix_program_entries is called few lines after
> arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI
> as a pirq.
> However after that is done, all the masking/unmask is done via
> irq_mask that we handle properly masking/unmasking the corresponding
> event channels.
>
>
> Possible solutions on top of my head:
>
> - in msix_program_entries instead of writing to the table directly
> (__msix_mask_irq), call desc->irq_data.chip->irq_mask();
>
> - replace msix_program_entries with arch_msix_program_entries, but it
> would probably be unpopular.


Can you or Jan or somebody please take that over? I can focus on other
PVH things then and try to get a patch in asap.

Thanks,
Mukesh


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Fri, 14 Dec 2012, Mukesh Rathor wrote:
> On Thu, 13 Dec 2012 14:25:16 +0000
> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>
> > On Thu, 13 Dec 2012, Jan Beulich wrote:
> > > >>> On 13.12.12 at 13:19, Stefano Stabellini
> > > >>> <stefano.stabellini@eu.citrix.com>
> > > wrote:
> > > > On Thu, 13 Dec 2012, Jan Beulich wrote:
> > > >> >>> On 13.12.12 at 02:43, Mukesh Rathor
> > > >> >>> <mukesh.rathor@oracle.com> wrote:
> >
> > Actually I think that you might be right: just looking at the code it
> > seems that the mask bits get written to the table once as part of the
> > initialization process:
> >
> > pci_enable_msix -> msix_capability_init -> msix_program_entries
> >
> > Unfortunately msix_program_entries is called few lines after
> > arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI
> > as a pirq.
> > However after that is done, all the masking/unmask is done via
> > irq_mask that we handle properly masking/unmasking the corresponding
> > event channels.
> >
> >
> > Possible solutions on top of my head:
> >
> > - in msix_program_entries instead of writing to the table directly
> > (__msix_mask_irq), call desc->irq_data.chip->irq_mask();
> >
> > - replace msix_program_entries with arch_msix_program_entries, but it
> > would probably be unpopular.
>
>
> Can you or Jan or somebody please take that over? I can focus on other
> PVH things then and try to get a patch in asap.

The following patch moves the MSI-X masking before arch_setup_msi_irqs,
that is when Linux calls PHYSDEVOP_map_pirq that is the hypercall that
causes Xen to execute msix_capability_init.

I don't have access to a machine with an MSI-X device right now so I
have only tested the appended patch with MSI.

---



diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a825d78..ef73e80 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -652,11 +652,22 @@ static void msix_program_entries(struct pci_dev *dev,
int i = 0;

list_for_each_entry(entry, &dev->msi_list, list) {
+ entries[i].vector = entry->irq;
+ irq_set_msi_desc(entry->irq, entry);
+ i++;
+ }
+}
+
+static void msix_mask_entries(struct pci_dev *dev,
+ struct msix_entry *entries)
+{
+ struct msi_desc *entry;
+ int i = 0;
+
+ list_for_each_entry(entry, &dev->msi_list, list) {
int offset = entries[i].entry * PCI_MSIX_ENTRY_SIZE +
PCI_MSIX_ENTRY_VECTOR_CTRL;

- entries[i].vector = entry->irq;
- irq_set_msi_desc(entry->irq, entry);
entry->masked = readl(entry->mask_base + offset);
msix_mask_irq(entry, 1);
i++;
@@ -696,6 +707,8 @@ static int msix_capability_init(struct pci_dev *dev,
if (ret)
return ret;

+ msix_mask_entries(dev, entries);
+
ret = arch_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX);
if (ret)
goto error;

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
>>> On 17.12.12 at 13:42, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
wrote:
> On Fri, 14 Dec 2012, Mukesh Rathor wrote:
>> On Thu, 13 Dec 2012 14:25:16 +0000
>> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>>
>> > On Thu, 13 Dec 2012, Jan Beulich wrote:
>> > > >>> On 13.12.12 at 13:19, Stefano Stabellini
>> > > >>> <stefano.stabellini@eu.citrix.com>
>> > > wrote:
>> > > > On Thu, 13 Dec 2012, Jan Beulich wrote:
>> > > >> >>> On 13.12.12 at 02:43, Mukesh Rathor
>> > > >> >>> <mukesh.rathor@oracle.com> wrote:
>> >
>> > Actually I think that you might be right: just looking at the code it
>> > seems that the mask bits get written to the table once as part of the
>> > initialization process:
>> >
>> > pci_enable_msix -> msix_capability_init -> msix_program_entries
>> >
>> > Unfortunately msix_program_entries is called few lines after
>> > arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI
>> > as a pirq.
>> > However after that is done, all the masking/unmask is done via
>> > irq_mask that we handle properly masking/unmasking the corresponding
>> > event channels.
>> >
>> >
>> > Possible solutions on top of my head:
>> >
>> > - in msix_program_entries instead of writing to the table directly
>> > (__msix_mask_irq), call desc->irq_data.chip->irq_mask();
>> >
>> > - replace msix_program_entries with arch_msix_program_entries, but it
>> > would probably be unpopular.
>>
>>
>> Can you or Jan or somebody please take that over? I can focus on other
>> PVH things then and try to get a patch in asap.
>
> The following patch moves the MSI-X masking before arch_setup_msi_irqs,
> that is when Linux calls PHYSDEVOP_map_pirq that is the hypercall that
> causes Xen to execute msix_capability_init.

And in what way would that help?

Jan

> I don't have access to a machine with an MSI-X device right now so I
> have only tested the appended patch with MSI.
>
> ---
>
>
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index a825d78..ef73e80 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -652,11 +652,22 @@ static void msix_program_entries(struct pci_dev *dev,
> int i = 0;
>
> list_for_each_entry(entry, &dev->msi_list, list) {
> + entries[i].vector = entry->irq;
> + irq_set_msi_desc(entry->irq, entry);
> + i++;
> + }
> +}
> +
> +static void msix_mask_entries(struct pci_dev *dev,
> + struct msix_entry *entries)
> +{
> + struct msi_desc *entry;
> + int i = 0;
> +
> + list_for_each_entry(entry, &dev->msi_list, list) {
> int offset = entries[i].entry * PCI_MSIX_ENTRY_SIZE +
> PCI_MSIX_ENTRY_VECTOR_CTRL;
>
> - entries[i].vector = entry->irq;
> - irq_set_msi_desc(entry->irq, entry);
> entry->masked = readl(entry->mask_base + offset);
> msix_mask_irq(entry, 1);
> i++;
> @@ -696,6 +707,8 @@ static int msix_capability_init(struct pci_dev *dev,
> if (ret)
> return ret;
>
> + msix_mask_entries(dev, entries);
> +
> ret = arch_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX);
> if (ret)
> goto error;




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Mon, 17 Dec 2012, Jan Beulich wrote:
> >>> On 17.12.12 at 13:42, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> wrote:
> > On Fri, 14 Dec 2012, Mukesh Rathor wrote:
> >> On Thu, 13 Dec 2012 14:25:16 +0000
> >> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> >>
> >> > On Thu, 13 Dec 2012, Jan Beulich wrote:
> >> > > >>> On 13.12.12 at 13:19, Stefano Stabellini
> >> > > >>> <stefano.stabellini@eu.citrix.com>
> >> > > wrote:
> >> > > > On Thu, 13 Dec 2012, Jan Beulich wrote:
> >> > > >> >>> On 13.12.12 at 02:43, Mukesh Rathor
> >> > > >> >>> <mukesh.rathor@oracle.com> wrote:
> >> >
> >> > Actually I think that you might be right: just looking at the code it
> >> > seems that the mask bits get written to the table once as part of the
> >> > initialization process:
> >> >
> >> > pci_enable_msix -> msix_capability_init -> msix_program_entries
> >> >
> >> > Unfortunately msix_program_entries is called few lines after
> >> > arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI
> >> > as a pirq.
> >> > However after that is done, all the masking/unmask is done via
> >> > irq_mask that we handle properly masking/unmasking the corresponding
> >> > event channels.
> >> >
> >> >
> >> > Possible solutions on top of my head:
> >> >
> >> > - in msix_program_entries instead of writing to the table directly
> >> > (__msix_mask_irq), call desc->irq_data.chip->irq_mask();
> >> >
> >> > - replace msix_program_entries with arch_msix_program_entries, but it
> >> > would probably be unpopular.
> >>
> >>
> >> Can you or Jan or somebody please take that over? I can focus on other
> >> PVH things then and try to get a patch in asap.
> >
> > The following patch moves the MSI-X masking before arch_setup_msi_irqs,
> > that is when Linux calls PHYSDEVOP_map_pirq that is the hypercall that
> > causes Xen to execute msix_capability_init.
>
> And in what way would that help?

I was working under the assumption that before the call to
msix_capability_init (in particular before
rangeset_add_range(mmio_ro_ranges, dev->msix_table...) in Xen, the table
is actually writeable by the guest.

If that is the case, then this scheme should work.
If it is not the case, this patch is wrong.



> > I don't have access to a machine with an MSI-X device right now so I
> > have only tested the appended patch with MSI.
> >
> > ---
> >
> >
> >
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index a825d78..ef73e80 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -652,11 +652,22 @@ static void msix_program_entries(struct pci_dev *dev,
> > int i = 0;
> >
> > list_for_each_entry(entry, &dev->msi_list, list) {
> > + entries[i].vector = entry->irq;
> > + irq_set_msi_desc(entry->irq, entry);
> > + i++;
> > + }
> > +}
> > +
> > +static void msix_mask_entries(struct pci_dev *dev,
> > + struct msix_entry *entries)
> > +{
> > + struct msi_desc *entry;
> > + int i = 0;
> > +
> > + list_for_each_entry(entry, &dev->msi_list, list) {
> > int offset = entries[i].entry * PCI_MSIX_ENTRY_SIZE +
> > PCI_MSIX_ENTRY_VECTOR_CTRL;
> >
> > - entries[i].vector = entry->irq;
> > - irq_set_msi_desc(entry->irq, entry);
> > entry->masked = readl(entry->mask_base + offset);
> > msix_mask_irq(entry, 1);
> > i++;
> > @@ -696,6 +707,8 @@ static int msix_capability_init(struct pci_dev *dev,
> > if (ret)
> > return ret;
> >
> > + msix_mask_entries(dev, entries);
> > +
> > ret = arch_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX);
> > if (ret)
> > goto error;
>
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
>>> On 17.12.12 at 15:16, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
wrote:
> On Mon, 17 Dec 2012, Jan Beulich wrote:
>> >>> On 17.12.12 at 13:42, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> wrote:
>> > On Fri, 14 Dec 2012, Mukesh Rathor wrote:
>> >> On Thu, 13 Dec 2012 14:25:16 +0000
>> >> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>> >>
>> >> > On Thu, 13 Dec 2012, Jan Beulich wrote:
>> >> > > >>> On 13.12.12 at 13:19, Stefano Stabellini
>> >> > > >>> <stefano.stabellini@eu.citrix.com>
>> >> > > wrote:
>> >> > > > On Thu, 13 Dec 2012, Jan Beulich wrote:
>> >> > > >> >>> On 13.12.12 at 02:43, Mukesh Rathor
>> >> > > >> >>> <mukesh.rathor@oracle.com> wrote:
>> >> >
>> >> > Actually I think that you might be right: just looking at the code it
>> >> > seems that the mask bits get written to the table once as part of the
>> >> > initialization process:
>> >> >
>> >> > pci_enable_msix -> msix_capability_init -> msix_program_entries
>> >> >
>> >> > Unfortunately msix_program_entries is called few lines after
>> >> > arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI
>> >> > as a pirq.
>> >> > However after that is done, all the masking/unmask is done via
>> >> > irq_mask that we handle properly masking/unmasking the corresponding
>> >> > event channels.
>> >> >
>> >> >
>> >> > Possible solutions on top of my head:
>> >> >
>> >> > - in msix_program_entries instead of writing to the table directly
>> >> > (__msix_mask_irq), call desc->irq_data.chip->irq_mask();
>> >> >
>> >> > - replace msix_program_entries with arch_msix_program_entries, but it
>> >> > would probably be unpopular.
>> >>
>> >>
>> >> Can you or Jan or somebody please take that over? I can focus on other
>> >> PVH things then and try to get a patch in asap.
>> >
>> > The following patch moves the MSI-X masking before arch_setup_msi_irqs,
>> > that is when Linux calls PHYSDEVOP_map_pirq that is the hypercall that
>> > causes Xen to execute msix_capability_init.
>>
>> And in what way would that help?
>
> I was working under the assumption that before the call to
> msix_capability_init (in particular before
> rangeset_add_range(mmio_ro_ranges, dev->msix_table...) in Xen, the table
> is actually writeable by the guest.
>
> If that is the case, then this scheme should work.
> If it is not the case, this patch is wrong.

The question is not if or when the table is writable - the Dom0
kernel should _never_ try to write to the mask bits.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Thu, Dec 13, 2012 at 02:25:16PM +0000, Stefano Stabellini wrote:
> On Thu, 13 Dec 2012, Jan Beulich wrote:
> > >>> On 13.12.12 at 13:19, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > wrote:
> > > On Thu, 13 Dec 2012, Jan Beulich wrote:
> > >> >>> On 13.12.12 at 02:43, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > >> > On Wed, 12 Dec 2012 17:15:23 -0800
> > >> > Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > >> >
> > >> >> On Tue, 11 Dec 2012 12:10:19 +0000
> > >> >> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > >> >>
> > >> >> > On Tue, 11 Dec 2012, Mukesh Rathor wrote:
> > >> >> > > On Mon, 10 Dec 2012 09:43:34 +0000
> > >> >> > > "Jan Beulich" <JBeulich@suse.com> wrote:
> > >> >> >
> > >> >> > That's strange because AFAIK Linux is never editing the MSI-X
> > >> >> > entries directly: give a look at
> > >> >> > arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs, Linux only remaps
> > >> >> > MSIs into pirqs using hypercalls. Xen should be the only one to
> > >> >> > touch the real MSI-X table.
> > >> >>
> > >> >> So, this is what's happening. The side effect of :
> > >> >>
> > >> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
> > >> >> dev->msix_table.last) )
> > >> >> WARN();
> > >> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
> > >> >> dev->msix_pba.last) )
> > >> >> WARN();
> > >> >>
> > >> >> in msix_capability_init() in xen is that the dom0 EPT entries that
> > >> >> I've mapped are going from RW to read only. Then when dom0 accesses
> > >> >> it, I get EPT violation. In case of pure PV, the PTE entry to access
> > >> >> the iomem is RW, and the above rangeset adding doesn't affect it. I
> > >> >> don't understand why? Looking into that now...
> > >>
> > >> As far as I was able to tell back at the time when I implemented
> > >> this, existing code shouldn't have mappings for these tables in
> > >> place at the time these ranges get added here. But I noted in
> > >> the patch description that this is a potential issue (and may need
> > >> fixing if deemed severe enough - back then, apparently nobody
> > >> really cared, perhaps largely because passthrough to PV guests
> > >> isn't considered fully secure anyway).
> > >>
> > >> Now - did that change? I.e. can you describe where the mappings
> > >> come from that cause the problem here?
> > >
> > > The generic Linux MSI-X handling code does that, before calling the
> > > arch specific msi setup function, for which we have a xen version
> > > (xen_initdom_setup_msi_irqs):
> > >
> > > pci_enable_msix -> msix_capability_init -> msix_map_region
> >
> > Ah, okay, (of course?) I had looked only at the forward ported
> > version of this. Is all that fiddling with the mask bits really
> > being suppressed properly when running under Xen? Otherwise
> > pv-ops is quite broken in this regard at present... And if it is,
> > I don't see what the respective ioremap() is good for here in
> > the Xen case.
>
> Actually I think that you might be right: just looking at the code it
> seems that the mask bits get written to the table once as part of the
> initialization process:
>
> pci_enable_msix -> msix_capability_init -> msix_program_entries
>
> Unfortunately msix_program_entries is called few lines after
> arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI as
> a pirq.
> However after that is done, all the masking/unmask is done via irq_mask
> that we handle properly masking/unmasking the corresponding event
> channels.
>
>
> Possible solutions on top of my head:

There is also the potential to piggyback on Joerg's patches
that introduce a new x86_msi_ops: compose_msi_msg.

See here: https://lkml.org/lkml/2012/8/20/432
(I think there was also a more recent one posted at some point).

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Tue, 18 Dec 2012, Konrad Rzeszutek Wilk wrote:
> On Thu, Dec 13, 2012 at 02:25:16PM +0000, Stefano Stabellini wrote:
> > On Thu, 13 Dec 2012, Jan Beulich wrote:
> > > >>> On 13.12.12 at 13:19, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > > wrote:
> > > > On Thu, 13 Dec 2012, Jan Beulich wrote:
> > > >> >>> On 13.12.12 at 02:43, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > > >> > On Wed, 12 Dec 2012 17:15:23 -0800
> > > >> > Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > > >> >
> > > >> >> On Tue, 11 Dec 2012 12:10:19 +0000
> > > >> >> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > > >> >>
> > > >> >> > On Tue, 11 Dec 2012, Mukesh Rathor wrote:
> > > >> >> > > On Mon, 10 Dec 2012 09:43:34 +0000
> > > >> >> > > "Jan Beulich" <JBeulich@suse.com> wrote:
> > > >> >> >
> > > >> >> > That's strange because AFAIK Linux is never editing the MSI-X
> > > >> >> > entries directly: give a look at
> > > >> >> > arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs, Linux only remaps
> > > >> >> > MSIs into pirqs using hypercalls. Xen should be the only one to
> > > >> >> > touch the real MSI-X table.
> > > >> >>
> > > >> >> So, this is what's happening. The side effect of :
> > > >> >>
> > > >> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
> > > >> >> dev->msix_table.last) )
> > > >> >> WARN();
> > > >> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
> > > >> >> dev->msix_pba.last) )
> > > >> >> WARN();
> > > >> >>
> > > >> >> in msix_capability_init() in xen is that the dom0 EPT entries that
> > > >> >> I've mapped are going from RW to read only. Then when dom0 accesses
> > > >> >> it, I get EPT violation. In case of pure PV, the PTE entry to access
> > > >> >> the iomem is RW, and the above rangeset adding doesn't affect it. I
> > > >> >> don't understand why? Looking into that now...
> > > >>
> > > >> As far as I was able to tell back at the time when I implemented
> > > >> this, existing code shouldn't have mappings for these tables in
> > > >> place at the time these ranges get added here. But I noted in
> > > >> the patch description that this is a potential issue (and may need
> > > >> fixing if deemed severe enough - back then, apparently nobody
> > > >> really cared, perhaps largely because passthrough to PV guests
> > > >> isn't considered fully secure anyway).
> > > >>
> > > >> Now - did that change? I.e. can you describe where the mappings
> > > >> come from that cause the problem here?
> > > >
> > > > The generic Linux MSI-X handling code does that, before calling the
> > > > arch specific msi setup function, for which we have a xen version
> > > > (xen_initdom_setup_msi_irqs):
> > > >
> > > > pci_enable_msix -> msix_capability_init -> msix_map_region
> > >
> > > Ah, okay, (of course?) I had looked only at the forward ported
> > > version of this. Is all that fiddling with the mask bits really
> > > being suppressed properly when running under Xen? Otherwise
> > > pv-ops is quite broken in this regard at present... And if it is,
> > > I don't see what the respective ioremap() is good for here in
> > > the Xen case.
> >
> > Actually I think that you might be right: just looking at the code it
> > seems that the mask bits get written to the table once as part of the
> > initialization process:
> >
> > pci_enable_msix -> msix_capability_init -> msix_program_entries
> >
> > Unfortunately msix_program_entries is called few lines after
> > arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI as
> > a pirq.
> > However after that is done, all the masking/unmask is done via irq_mask
> > that we handle properly masking/unmasking the corresponding event
> > channels.
> >
> >
> > Possible solutions on top of my head:
>
> There is also the potential to piggyback on Joerg's patches
> that introduce a new x86_msi_ops: compose_msi_msg.
>
> See here: https://lkml.org/lkml/2012/8/20/432
> (I think there was also a more recent one posted at some point).

Given that dom0 should never write to the MSI-X table, introducing a new
msi_ops that replaces msix_program_entries (or at least the part of
msix_program_entries that masks all the entried) is the only solution
left.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Wed, Dec 19, 2012 at 11:19:22AM +0000, Stefano Stabellini wrote:
> On Tue, 18 Dec 2012, Konrad Rzeszutek Wilk wrote:
> > On Thu, Dec 13, 2012 at 02:25:16PM +0000, Stefano Stabellini wrote:
> > > On Thu, 13 Dec 2012, Jan Beulich wrote:
> > > > >>> On 13.12.12 at 13:19, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > > > wrote:
> > > > > On Thu, 13 Dec 2012, Jan Beulich wrote:
> > > > >> >>> On 13.12.12 at 02:43, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > > > >> > On Wed, 12 Dec 2012 17:15:23 -0800
> > > > >> > Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > > > >> >
> > > > >> >> On Tue, 11 Dec 2012 12:10:19 +0000
> > > > >> >> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > > > >> >>
> > > > >> >> > On Tue, 11 Dec 2012, Mukesh Rathor wrote:
> > > > >> >> > > On Mon, 10 Dec 2012 09:43:34 +0000
> > > > >> >> > > "Jan Beulich" <JBeulich@suse.com> wrote:
> > > > >> >> >
> > > > >> >> > That's strange because AFAIK Linux is never editing the MSI-X
> > > > >> >> > entries directly: give a look at
> > > > >> >> > arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs, Linux only remaps
> > > > >> >> > MSIs into pirqs using hypercalls. Xen should be the only one to
> > > > >> >> > touch the real MSI-X table.
> > > > >> >>
> > > > >> >> So, this is what's happening. The side effect of :
> > > > >> >>
> > > > >> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
> > > > >> >> dev->msix_table.last) )
> > > > >> >> WARN();
> > > > >> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
> > > > >> >> dev->msix_pba.last) )
> > > > >> >> WARN();
> > > > >> >>
> > > > >> >> in msix_capability_init() in xen is that the dom0 EPT entries that
> > > > >> >> I've mapped are going from RW to read only. Then when dom0 accesses
> > > > >> >> it, I get EPT violation. In case of pure PV, the PTE entry to access
> > > > >> >> the iomem is RW, and the above rangeset adding doesn't affect it. I
> > > > >> >> don't understand why? Looking into that now...
> > > > >>
> > > > >> As far as I was able to tell back at the time when I implemented
> > > > >> this, existing code shouldn't have mappings for these tables in
> > > > >> place at the time these ranges get added here. But I noted in
> > > > >> the patch description that this is a potential issue (and may need
> > > > >> fixing if deemed severe enough - back then, apparently nobody
> > > > >> really cared, perhaps largely because passthrough to PV guests
> > > > >> isn't considered fully secure anyway).
> > > > >>
> > > > >> Now - did that change? I.e. can you describe where the mappings
> > > > >> come from that cause the problem here?
> > > > >
> > > > > The generic Linux MSI-X handling code does that, before calling the
> > > > > arch specific msi setup function, for which we have a xen version
> > > > > (xen_initdom_setup_msi_irqs):
> > > > >
> > > > > pci_enable_msix -> msix_capability_init -> msix_map_region
> > > >
> > > > Ah, okay, (of course?) I had looked only at the forward ported
> > > > version of this. Is all that fiddling with the mask bits really
> > > > being suppressed properly when running under Xen? Otherwise
> > > > pv-ops is quite broken in this regard at present... And if it is,
> > > > I don't see what the respective ioremap() is good for here in
> > > > the Xen case.
> > >
> > > Actually I think that you might be right: just looking at the code it
> > > seems that the mask bits get written to the table once as part of the
> > > initialization process:
> > >
> > > pci_enable_msix -> msix_capability_init -> msix_program_entries
> > >
> > > Unfortunately msix_program_entries is called few lines after
> > > arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI as
> > > a pirq.
> > > However after that is done, all the masking/unmask is done via irq_mask
> > > that we handle properly masking/unmasking the corresponding event
> > > channels.
> > >
> > >
> > > Possible solutions on top of my head:
> >
> > There is also the potential to piggyback on Joerg's patches
> > that introduce a new x86_msi_ops: compose_msi_msg.
> >
> > See here: https://lkml.org/lkml/2012/8/20/432
> > (I think there was also a more recent one posted at some point).
>
> Given that dom0 should never write to the MSI-X table, introducing a new

How does this work with QEMU setting up MSI and MSI-X on behalf of
guests? Or is that actually handled by Xen hypervisor?

> msi_ops that replaces msix_program_entries (or at least the part of
> msix_program_entries that masks all the entried) is the only solution
> left.

so this one (__msix_mask_irq):

mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
198 if (flag)
199 mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
200 writel(mask_bits, desc->mask_base + offset);





_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Wed, 19 Dec 2012, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 19, 2012 at 11:19:22AM +0000, Stefano Stabellini wrote:
> > On Tue, 18 Dec 2012, Konrad Rzeszutek Wilk wrote:
> > > On Thu, Dec 13, 2012 at 02:25:16PM +0000, Stefano Stabellini wrote:
> > > > On Thu, 13 Dec 2012, Jan Beulich wrote:
> > > > > >>> On 13.12.12 at 13:19, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > > > > wrote:
> > > > > > On Thu, 13 Dec 2012, Jan Beulich wrote:
> > > > > >> >>> On 13.12.12 at 02:43, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > > > > >> > On Wed, 12 Dec 2012 17:15:23 -0800
> > > > > >> > Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > > > > >> >
> > > > > >> >> On Tue, 11 Dec 2012 12:10:19 +0000
> > > > > >> >> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > > > > >> >>
> > > > > >> >> > On Tue, 11 Dec 2012, Mukesh Rathor wrote:
> > > > > >> >> > > On Mon, 10 Dec 2012 09:43:34 +0000
> > > > > >> >> > > "Jan Beulich" <JBeulich@suse.com> wrote:
> > > > > >> >> >
> > > > > >> >> > That's strange because AFAIK Linux is never editing the MSI-X
> > > > > >> >> > entries directly: give a look at
> > > > > >> >> > arch/x86/pci/xen.c:xen_initdom_setup_msi_irqs, Linux only remaps
> > > > > >> >> > MSIs into pirqs using hypercalls. Xen should be the only one to
> > > > > >> >> > touch the real MSI-X table.
> > > > > >> >>
> > > > > >> >> So, this is what's happening. The side effect of :
> > > > > >> >>
> > > > > >> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
> > > > > >> >> dev->msix_table.last) )
> > > > > >> >> WARN();
> > > > > >> >> if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
> > > > > >> >> dev->msix_pba.last) )
> > > > > >> >> WARN();
> > > > > >> >>
> > > > > >> >> in msix_capability_init() in xen is that the dom0 EPT entries that
> > > > > >> >> I've mapped are going from RW to read only. Then when dom0 accesses
> > > > > >> >> it, I get EPT violation. In case of pure PV, the PTE entry to access
> > > > > >> >> the iomem is RW, and the above rangeset adding doesn't affect it. I
> > > > > >> >> don't understand why? Looking into that now...
> > > > > >>
> > > > > >> As far as I was able to tell back at the time when I implemented
> > > > > >> this, existing code shouldn't have mappings for these tables in
> > > > > >> place at the time these ranges get added here. But I noted in
> > > > > >> the patch description that this is a potential issue (and may need
> > > > > >> fixing if deemed severe enough - back then, apparently nobody
> > > > > >> really cared, perhaps largely because passthrough to PV guests
> > > > > >> isn't considered fully secure anyway).
> > > > > >>
> > > > > >> Now - did that change? I.e. can you describe where the mappings
> > > > > >> come from that cause the problem here?
> > > > > >
> > > > > > The generic Linux MSI-X handling code does that, before calling the
> > > > > > arch specific msi setup function, for which we have a xen version
> > > > > > (xen_initdom_setup_msi_irqs):
> > > > > >
> > > > > > pci_enable_msix -> msix_capability_init -> msix_map_region
> > > > >
> > > > > Ah, okay, (of course?) I had looked only at the forward ported
> > > > > version of this. Is all that fiddling with the mask bits really
> > > > > being suppressed properly when running under Xen? Otherwise
> > > > > pv-ops is quite broken in this regard at present... And if it is,
> > > > > I don't see what the respective ioremap() is good for here in
> > > > > the Xen case.
> > > >
> > > > Actually I think that you might be right: just looking at the code it
> > > > seems that the mask bits get written to the table once as part of the
> > > > initialization process:
> > > >
> > > > pci_enable_msix -> msix_capability_init -> msix_program_entries
> > > >
> > > > Unfortunately msix_program_entries is called few lines after
> > > > arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI as
> > > > a pirq.
> > > > However after that is done, all the masking/unmask is done via irq_mask
> > > > that we handle properly masking/unmasking the corresponding event
> > > > channels.
> > > >
> > > >
> > > > Possible solutions on top of my head:
> > >
> > > There is also the potential to piggyback on Joerg's patches
> > > that introduce a new x86_msi_ops: compose_msi_msg.
> > >
> > > See here: https://lkml.org/lkml/2012/8/20/432
> > > (I think there was also a more recent one posted at some point).
> >
> > Given that dom0 should never write to the MSI-X table, introducing a new
>
> How does this work with QEMU setting up MSI and MSI-X on behalf of
> guests? Or is that actually handled by Xen hypervisor?

In the case of HVM guests, QEMU emulates the PCI config space and the
table, so it is OK for the guest to write to it.


> > msi_ops that replaces msix_program_entries (or at least the part of
> > msix_program_entries that masks all the entried) is the only solution
> > left.
>
> so this one (__msix_mask_irq):
>
> mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
> 198 if (flag)
> 199 mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
> 200 writel(mask_bits, desc->mask_base + offset);
>

Yes, that's the one. Once could argue that __msix_mask_irq should call
mask_irq rather than writing to the table directly.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
> > > > > pci_enable_msix -> msix_capability_init -> msix_program_entries
> > > > >
> > > > > Unfortunately msix_program_entries is called few lines after
> > > > > arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI as
> > > > > a pirq.
> > > > > However after that is done, all the masking/unmask is done via irq_mask
> > > > > that we handle properly masking/unmasking the corresponding event
> > > > > channels.
> > > > >
> > > > >
> > > > > Possible solutions on top of my head:
> > > >
> > > > There is also the potential to piggyback on Joerg's patches
> > > > that introduce a new x86_msi_ops: compose_msi_msg.
> > > >
> > > > See here: https://lkml.org/lkml/2012/8/20/432
> > > > (I think there was also a more recent one posted at some point).
> > >
> > > Given that dom0 should never write to the MSI-X table, introducing a new
> >
> > How does this work with QEMU setting up MSI and MSI-X on behalf of
> > guests? Or is that actually handled by Xen hypervisor?
>
> In the case of HVM guests, QEMU emulates the PCI config space and the
> table, so it is OK for the guest to write to it.
>
>
> > > msi_ops that replaces msix_program_entries (or at least the part of
> > > msix_program_entries that masks all the entried) is the only solution
> > > left.
> >
> > so this one (__msix_mask_irq):
> >
> > mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > 198 if (flag)
> > 199 mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > 200 writel(mask_bits, desc->mask_base + offset);
> >
>
> Yes, that's the one. Once could argue that __msix_mask_irq should call
> mask_irq rather than writing to the table directly.

You mean 'irq_mask ' ? Not really - that is within the IOAPIC domain.

To be more generic it should encompass then also the other usages -
that is the 'readl' and 'writel' users.

My understading of the reason we have been fortunate enough to have this
working right now is b/c the hypercall we do beforehand writes the
'pirq' in the MSI-X BAR and that is later what the Linux kernel
does (by doing readl) - and we end up re-writing that value
by the Linux kernel.

The other thing we can do and entirely bypass the msi.c writes is
xen_initdom_setup_msi_irqs make the desc->mask_base point to
somewhere safe. Meaning point to an page we allocate when
we setup the IRQs and we fill it with whatever we want
(which I guess would be the pirq values we just got).


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Fri, 21 Dec 2012, Konrad Rzeszutek Wilk wrote:
> > > > > > pci_enable_msix -> msix_capability_init -> msix_program_entries
> > > > > >
> > > > > > Unfortunately msix_program_entries is called few lines after
> > > > > > arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI as
> > > > > > a pirq.
> > > > > > However after that is done, all the masking/unmask is done via irq_mask
> > > > > > that we handle properly masking/unmasking the corresponding event
> > > > > > channels.
> > > > > >
> > > > > >
> > > > > > Possible solutions on top of my head:
> > > > >
> > > > > There is also the potential to piggyback on Joerg's patches
> > > > > that introduce a new x86_msi_ops: compose_msi_msg.
> > > > >
> > > > > See here: https://lkml.org/lkml/2012/8/20/432
> > > > > (I think there was also a more recent one posted at some point).
> > > >
> > > > Given that dom0 should never write to the MSI-X table, introducing a new
> > >
> > > How does this work with QEMU setting up MSI and MSI-X on behalf of
> > > guests? Or is that actually handled by Xen hypervisor?
> >
> > In the case of HVM guests, QEMU emulates the PCI config space and the
> > table, so it is OK for the guest to write to it.
> >
> >
> > > > msi_ops that replaces msix_program_entries (or at least the part of
> > > > msix_program_entries that masks all the entried) is the only solution
> > > > left.
> > >
> > > so this one (__msix_mask_irq):
> > >
> > > mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > > 198 if (flag)
> > > 199 mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > > 200 writel(mask_bits, desc->mask_base + offset);
> > >
> >
> > Yes, that's the one. Once could argue that __msix_mask_irq should call
> > mask_irq rather than writing to the table directly.
>
> You mean 'irq_mask ' ? Not really - that is within the IOAPIC domain.

The concept of IOAPIC domain is a bit blurry to me when it comes to MSI-X.
But I see what you mean.


> To be more generic it should encompass then also the other usages -
> that is the 'readl' and 'writel' users.
>
> My understading of the reason we have been fortunate enough to have this
> working right now is b/c the hypercall we do beforehand writes the
> 'pirq' in the MSI-X BAR and that is later what the Linux kernel
> does (by doing readl) - and we end up re-writing that value
> by the Linux kernel.
>
> The other thing we can do and entirely bypass the msi.c writes is
> xen_initdom_setup_msi_irqs make the desc->mask_base point to
> somewhere safe. Meaning point to an page we allocate when
> we setup the IRQs and we fill it with whatever we want
> (which I guess would be the pirq values we just got).

Ah yes, that would work. Even though more code to work around generic
Linux infrastructure is not ideal.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PVH]: Help: msi.c [ In reply to ]
On Fri, Jan 04, 2013 at 04:57:13PM +0000, Stefano Stabellini wrote:
> On Fri, 21 Dec 2012, Konrad Rzeszutek Wilk wrote:
> > > > > > > pci_enable_msix -> msix_capability_init -> msix_program_entries
> > > > > > >
> > > > > > > Unfortunately msix_program_entries is called few lines after
> > > > > > > arch_setup_msi_irqs, where we call PHYSDEVOP_map_pirq to map the MSI as
> > > > > > > a pirq.
> > > > > > > However after that is done, all the masking/unmask is done via irq_mask
> > > > > > > that we handle properly masking/unmasking the corresponding event
> > > > > > > channels.
> > > > > > >
> > > > > > >
> > > > > > > Possible solutions on top of my head:
> > > > > >
> > > > > > There is also the potential to piggyback on Joerg's patches
> > > > > > that introduce a new x86_msi_ops: compose_msi_msg.
> > > > > >
> > > > > > See here: https://lkml.org/lkml/2012/8/20/432
> > > > > > (I think there was also a more recent one posted at some point).
> > > > >
> > > > > Given that dom0 should never write to the MSI-X table, introducing a new
> > > >
> > > > How does this work with QEMU setting up MSI and MSI-X on behalf of
> > > > guests? Or is that actually handled by Xen hypervisor?
> > >
> > > In the case of HVM guests, QEMU emulates the PCI config space and the
> > > table, so it is OK for the guest to write to it.
> > >
> > >
> > > > > msi_ops that replaces msix_program_entries (or at least the part of
> > > > > msix_program_entries that masks all the entried) is the only solution
> > > > > left.
> > > >
> > > > so this one (__msix_mask_irq):
> > > >
> > > > mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > > > 198 if (flag)
> > > > 199 mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
> > > > 200 writel(mask_bits, desc->mask_base + offset);
> > > >
> > >
> > > Yes, that's the one. Once could argue that __msix_mask_irq should call
> > > mask_irq rather than writing to the table directly.
> >
> > You mean 'irq_mask ' ? Not really - that is within the IOAPIC domain.
>
> The concept of IOAPIC domain is a bit blurry to me when it comes to MSI-X.
> But I see what you mean.
>
>
> > To be more generic it should encompass then also the other usages -
> > that is the 'readl' and 'writel' users.
> >
> > My understading of the reason we have been fortunate enough to have this
> > working right now is b/c the hypercall we do beforehand writes the
> > 'pirq' in the MSI-X BAR and that is later what the Linux kernel
> > does (by doing readl) - and we end up re-writing that value
> > by the Linux kernel.
> >
> > The other thing we can do and entirely bypass the msi.c writes is
> > xen_initdom_setup_msi_irqs make the desc->mask_base point to
> > somewhere safe. Meaning point to an page we allocate when
> > we setup the IRQs and we fill it with whatever we want
> > (which I guess would be the pirq values we just got).
>
> Ah yes, that would work. Even though more code to work around generic
> Linux infrastructure is not ideal.

I concur. I am just thinking of a fail-safe mechanism. The abstraction
of how to do the MSI-X read/write is not yet completly clear in my mind.

>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel