Mailing List Archive: [PATCH] pci: cleanup MSI interrupts before removing device from IOMMU

[PATCH] pci: cleanup MSI interrupts before removing device from IOMMU

Oct 21, 2020, 1:19 AM

Post #1 of 3 (82 views)

Doing the MSI cleanup after removing the device from the IOMMU leads
to the following panic on AMD hardware:

Assertion 'table.ptr && (index < intremap_table_entries(table.ptr, iommu))' failed at iommu_intr.c:172
----[ Xen-4.13.1-10.0.3-d x86_64 debug=y Not tainted ]----
CPU: 3
RIP: e008:[<ffff82d08026ae3c>] drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
[...]
Xen call trace:
[<ffff82d08026ae3c>] R drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
[<ffff82d08026af25>] F drivers/passthrough/amd/iommu_intr.c#update_intremap_entry_from_msi_msg+0xc0/0x342
[<ffff82d08026ba65>] F amd_iommu_msi_msg_update_ire+0x98/0x129
[<ffff82d08025dd36>] F iommu_update_ire_from_msi+0x1e/0x21
[<ffff82d080286862>] F msi_free_irq+0x55/0x1a0
[<ffff82d080286f25>] F pci_cleanup_msi+0x8c/0xb0
[<ffff82d08025cf52>] F pci_remove_device+0x1af/0x2da
[<ffff82d0802a42d1>] F do_physdev_op+0xd18/0x1187
[<ffff82d080383925>] F pv_hypercall+0x1f5/0x567
[<ffff82d08038a432>] F lstar_enter+0x112/0x120

That's because the call to iommu_remove_device on AMD hardware will
remove the per-device interrupt remapping table, and hence the call to
pci_cleanup_msi done afterwards will find a null intremap table and
crash.

Reorder the calls so that MSI interrupts are torn down before removing
the device from the IOMMU.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
I've discussed the issue with Andrew and maybe we should try to avoid
removing the interrupt remapping table on device removal, but then the
tables would have to be sized to support the maximum amount of
interrupts instead of the maximum supported by the device currently
plugged in.

I think the currently proposed fix can be easily backported. We can
see about improving the resilience going ahead.
---
xen/drivers/passthrough/pci.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index b035067975..64b8a77ce0 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -834,10 +834,15 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
if ( pdev->bus == bus && pdev->devfn == devfn )
{
+ /*
+ * Cleanup MSI interrupts before removing the device from the
+ * IOMMU, or else the internal IOMMU data used to track the device
+ * interrupts might be already gone.
+ */
+ pci_cleanup_msi(pdev);
ret = iommu_remove_device(pdev);
if ( pdev->domain )
list_del(&pdev->domain_list);
- pci_cleanup_msi(pdev);
printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
free_pdev(pseg, pdev);
break;
--
2.29.0

Re: [PATCH] pci: cleanup MSI interrupts before removing device from IOMMU [ In reply to ]

jbeulich at suse

Oct 21, 2020, 4:20 AM

Post #2 of 3 (82 views)

Permalink

On 21.10.2020 10:19, Roger Pau Monne wrote:
> Doing the MSI cleanup after removing the device from the IOMMU leads
> to the following panic on AMD hardware:
>
> Assertion 'table.ptr && (index < intremap_table_entries(table.ptr, iommu))' failed at iommu_intr.c:172
> ----[ Xen-4.13.1-10.0.3-d x86_64 debug=y Not tainted ]----
> CPU: 3
> RIP: e008:[<ffff82d08026ae3c>] drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
> [...]
> Xen call trace:
> [<ffff82d08026ae3c>] R drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
> [<ffff82d08026af25>] F drivers/passthrough/amd/iommu_intr.c#update_intremap_entry_from_msi_msg+0xc0/0x342
> [<ffff82d08026ba65>] F amd_iommu_msi_msg_update_ire+0x98/0x129
> [<ffff82d08025dd36>] F iommu_update_ire_from_msi+0x1e/0x21
> [<ffff82d080286862>] F msi_free_irq+0x55/0x1a0
> [<ffff82d080286f25>] F pci_cleanup_msi+0x8c/0xb0
> [<ffff82d08025cf52>] F pci_remove_device+0x1af/0x2da
> [<ffff82d0802a42d1>] F do_physdev_op+0xd18/0x1187
> [<ffff82d080383925>] F pv_hypercall+0x1f5/0x567
> [<ffff82d08038a432>] F lstar_enter+0x112/0x120
>
> That's because the call to iommu_remove_device on AMD hardware will
> remove the per-device interrupt remapping table, and hence the call to
> pci_cleanup_msi done afterwards will find a null intremap table and
> crash.
>
> Reorder the calls so that MSI interrupts are torn down before removing
> the device from the IOMMU.

I guess this wants

Fixes: d7cfeb7c13ed ("AMD/IOMMU: don't blindly allocate interrupt remapping tables")

?

> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> I've discussed the issue with Andrew and maybe we should try to avoid
> removing the interrupt remapping table on device removal, but then the
> tables would have to be sized to support the maximum amount of
> interrupts instead of the maximum supported by the device currently
> plugged in.

We've specifically limited allocation sizes not so long ago (the
commit above was the first of two steps in that direction). So
I'd rather not see us go back unless there's truly new information
available now.

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -834,10 +834,15 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
> list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
> if ( pdev->bus == bus && pdev->devfn == devfn )
> {
> + /*
> + * Cleanup MSI interrupts before removing the device from the
> + * IOMMU, or else the internal IOMMU data used to track the device
> + * interrupts might be already gone.
> + */
> + pci_cleanup_msi(pdev);
> ret = iommu_remove_device(pdev);
> if ( pdev->domain )
> list_del(&pdev->domain_list);
> - pci_cleanup_msi(pdev);

To be honest I'm not sure about the comment. It should really have
been this way from the very beginning, and VT-d not being affected
makes me wonder what possible improvements are there waiting to be
noticed and then carried out.

Jan

Re: [PATCH] pci: cleanup MSI interrupts before removing device from IOMMU [ In reply to ]

roger.pau at citrix

Oct 21, 2020, 4:34 AM

Post #3 of 3 (82 views)

Permalink

On Wed, Oct 21, 2020 at 01:20:27PM +0200, Jan Beulich wrote:
> On 21.10.2020 10:19, Roger Pau Monne wrote:
> > Doing the MSI cleanup after removing the device from the IOMMU leads
> > to the following panic on AMD hardware:
> >
> > Assertion 'table.ptr && (index < intremap_table_entries(table.ptr, iommu))' failed at iommu_intr.c:172
> > ----[ Xen-4.13.1-10.0.3-d x86_64 debug=y Not tainted ]----
> > CPU: 3
> > RIP: e008:[<ffff82d08026ae3c>] drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
> > [...]
> > Xen call trace:
> > [<ffff82d08026ae3c>] R drivers/passthrough/amd/iommu_intr.c#get_intremap_entry+0x52/0x7b
> > [<ffff82d08026af25>] F drivers/passthrough/amd/iommu_intr.c#update_intremap_entry_from_msi_msg+0xc0/0x342
> > [<ffff82d08026ba65>] F amd_iommu_msi_msg_update_ire+0x98/0x129
> > [<ffff82d08025dd36>] F iommu_update_ire_from_msi+0x1e/0x21
> > [<ffff82d080286862>] F msi_free_irq+0x55/0x1a0
> > [<ffff82d080286f25>] F pci_cleanup_msi+0x8c/0xb0
> > [<ffff82d08025cf52>] F pci_remove_device+0x1af/0x2da
> > [<ffff82d0802a42d1>] F do_physdev_op+0xd18/0x1187
> > [<ffff82d080383925>] F pv_hypercall+0x1f5/0x567
> > [<ffff82d08038a432>] F lstar_enter+0x112/0x120
> >
> > That's because the call to iommu_remove_device on AMD hardware will
> > remove the per-device interrupt remapping table, and hence the call to
> > pci_cleanup_msi done afterwards will find a null intremap table and
> > crash.
> >
> > Reorder the calls so that MSI interrupts are torn down before removing
> > the device from the IOMMU.
>
> I guess this wants
>
> Fixes: d7cfeb7c13ed ("AMD/IOMMU: don't blindly allocate interrupt remapping tables")
>
> ?

Oh yes, I didn't git blame the file to figure out when such allocating
and freeing was added.

> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> > --- a/xen/drivers/passthrough/pci.c
> > +++ b/xen/drivers/passthrough/pci.c
> > @@ -834,10 +834,15 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
> > list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
> > if ( pdev->bus == bus && pdev->devfn == devfn )
> > {
> > + /*
> > + * Cleanup MSI interrupts before removing the device from the
> > + * IOMMU, or else the internal IOMMU data used to track the device
> > + * interrupts might be already gone.
> > + */
> > + pci_cleanup_msi(pdev);
> > ret = iommu_remove_device(pdev);
> > if ( pdev->domain )
> > list_del(&pdev->domain_list);
> > - pci_cleanup_msi(pdev);
>
> To be honest I'm not sure about the comment. It should really have
> been this way from the very beginning, and VT-d not being affected
> makes me wonder what possible improvements are there waiting to be
> noticed and then carried out.

I'm fine with dropping the comment, I would also expect the normal
flow to be to cleanup any interrupt and then remove the device,
instead of the other way around.

Roger.