Mailing List Archive

i915 dma faults on Xen
Hi,

Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576

I'm seeing DMA faults for the i915 graphics hardware on a Dell
Latitude 5500. These were captured when I plugged into a Dell
Thunderbolt dock with two DisplayPort monitors attached. This is with
Xen 4.12.4 staging and Linux 5.4.70 (and some earlier versions).

Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler
[i915]] *ERROR* Fault errors on pipe A: 0x00000080
Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler
[i915]] *ERROR* Fault errors on pipe A: 0x00000080
Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
Request device [0000:00:02.0] fault addr 39b5845000, iommu reg =
ffff82c00021d000
Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
PTE Read access is not set
Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler
[i915]] *ERROR* Fault errors on pipe A: 0x00000080
Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler
[i915]] *ERROR* Fault errors on pipe A: 0x00000080
Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg =
ffff82c00021d000
Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
PTE Read access is not set

They repeat. In the log attached to
https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
"Oct 14 18:41:49.056589" and continue until I unplug the dock around
"Oct 14 18:41:54.801802".

I've also seen similar messages when attaching the laptop's HDMI port
to a 4k monitor. The eDP display by itself seems okay.

I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
didn't see any errors.

This is a kernel & xen log with drm.debug=0x1e. It also includes some
application (glass) logging when it changes resolutions which seems to
set off the DMA faults. 5500-igfx-messages-kern-xen-glass

Running xen with iommu=no-igfx disables the iommu for the i915
graphics and no faults are reported. However, that breaks some other
devices (Dell Latitude 7200 and 5580) giving a black screen with:

Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed
to idle engines, declaring wedged!
Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed
to initialize GPU, declaring it wedged!

Any suggestions welcome.

Thanks,
Jason
Re: i915 dma faults on Xen
On 14/10/2020 20:28, Jason Andryuk wrote:
> Hi,
>
> Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576
>
> I'm seeing DMA faults for the i915 graphics hardware on a Dell
> Latitude 5500. These were captured when I plugged into a Dell
> Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4
> staging and Linux 5.4.70 (and some earlier versions).
>
> Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler
> [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler
> [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> Request device [0000:00:02.0] fault addr 39b5845000, iommu reg =
> ffff82c00021d000
> Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> PTE Read access is not set
> Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler
> [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler
> [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg =
> ffff82c00021d000
> Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> PTE Read access is not set
>
> They repeat. In the log attached to
> https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
> "Oct 14 18:41:49.056589" and continue until I unplug the dock around
> "Oct 14 18:41:54.801802".
>
> I've also seen similar messages when attaching the laptop's HDMI port
> to a 4k monitor. The eDP display by itself seems okay.
>
> I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
> didn't see any errors
>
> This is a kernel & xen log with drm.debug=0x1e. It also includes some
> application (glass) logging when it changes resolutions which seems to
> set off the DMA faults. 5500-igfx-messages-kern-xen-glass
>
> Running xen with iommu=no-igfx disables the iommu for the i915
> graphics and no faults are reported. However, that breaks some other
> devices (Dell Latitude 7200 and 5580) giving a black screen with:
>
> Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed
> to idle engines, declaring wedged!
> Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed
> to initialize GPU, declaring it wedged!
>
> Any suggestions welcome.

Presumably this is with a PV dom0.  What are 39b5845000 and 4238d0a000
in the machine memory map?

This smells like a missing RMRR in the ACPI tables.

~Andrew
Re: i915 dma faults on Xen
On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote:
> On 14/10/2020 20:28, Jason Andryuk wrote:
> > Hi,
> >
> > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576
> >
> > I'm seeing DMA faults for the i915 graphics hardware on a Dell
> > Latitude 5500. These were captured when I plugged into a Dell
> > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4
> > staging and Linux 5.4.70 (and some earlier versions).
> >
> > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > Request device [0000:00:02.0] fault addr 39b5845000, iommu reg =
> > ffff82c00021d000
> > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > PTE Read access is not set
> > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg =
> > ffff82c00021d000
> > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > PTE Read access is not set
> >
> > They repeat. In the log attached to
> > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
> > "Oct 14 18:41:49.056589" and continue until I unplug the dock around
> > "Oct 14 18:41:54.801802".
> >
> > I've also seen similar messages when attaching the laptop's HDMI port
> > to a 4k monitor. The eDP display by itself seems okay.
> >
> > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
> > didn't see any errors
> >
> > This is a kernel & xen log with drm.debug=0x1e. It also includes some
> > application (glass) logging when it changes resolutions which seems to
> > set off the DMA faults. 5500-igfx-messages-kern-xen-glass
> >
> > Running xen with iommu=no-igfx disables the iommu for the i915
> > graphics and no faults are reported. However, that breaks some other
> > devices (Dell Latitude 7200 and 5580) giving a black screen with:
> >
> > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed
> > to idle engines, declaring wedged!
> > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed
> > to initialize GPU, declaring it wedged!
> >
> > Any suggestions welcome.
>
> Presumably this is with a PV dom0.  What are 39b5845000 and 4238d0a000
> in the machine memory map?
>
> This smells like a missing RMRR in the ACPI tables.

I agree.

Can you paste the memory map as printed by Xen when booting, and the
command line you are using to boot Xen?

Have you tried adding dom0-iommu=map-inclusive to the Xen command
line?

Roger.
Re: i915 dma faults on Xen
On Thu, Oct 15, 2020 at 7:31 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote:
> > On 14/10/2020 20:28, Jason Andryuk wrote:
> > > Hi,
> > >
> > > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576
> > >
> > > I'm seeing DMA faults for the i915 graphics hardware on a Dell
> > > Latitude 5500. These were captured when I plugged into a Dell
> > > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4
> > > staging and Linux 5.4.70 (and some earlier versions).
> > >
> > > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler
> > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler
> > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > Request device [0000:00:02.0] fault addr 39b5845000, iommu reg =
> > > ffff82c00021d000
> > > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > PTE Read access is not set
> > > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler
> > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler
> > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg =
> > > ffff82c00021d000
> > > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > PTE Read access is not set
> > >
> > > They repeat. In the log attached to
> > > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
> > > "Oct 14 18:41:49.056589" and continue until I unplug the dock around
> > > "Oct 14 18:41:54.801802".
> > >
> > > I've also seen similar messages when attaching the laptop's HDMI port
> > > to a 4k monitor. The eDP display by itself seems okay.
> > >
> > > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
> > > didn't see any errors
> > >
> > > This is a kernel & xen log with drm.debug=0x1e. It also includes some
> > > application (glass) logging when it changes resolutions which seems to
> > > set off the DMA faults. 5500-igfx-messages-kern-xen-glass
> > >
> > > Running xen with iommu=no-igfx disables the iommu for the i915
> > > graphics and no faults are reported. However, that breaks some other
> > > devices (Dell Latitude 7200 and 5580) giving a black screen with:
> > >
> > > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed
> > > to idle engines, declaring wedged!
> > > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed
> > > to initialize GPU, declaring it wedged!
> > >
> > > Any suggestions welcome.
> >
> > Presumably this is with a PV dom0. What are 39b5845000 and 4238d0a000
> > in the machine memory map?

They are bogus?
End of RAM is 0x47c800000
That's:
0x047c800000
vs.
0x39b5845000
0x4238d0a000

> > This smells like a missing RMRR in the ACPI tables.
>
> I agree.
>
> Can you paste the memory map as printed by Xen when booting, and what
> command line are you using to boot Xen.

So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen

Here's the memory map:
(XEN) TBOOT RAM map:
(XEN) 0000000000000000 - 0000000000060000 (usable)
(XEN) 0000000000060000 - 0000000000068000 (reserved)
(XEN) 0000000000068000 - 000000000009e000 (usable)
(XEN) 000000000009e000 - 000000000009f000 (reserved)
(XEN) 000000000009f000 - 00000000000a0000 (usable)
(XEN) 00000000000a0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 0000000040000000 (usable)
(XEN) 0000000040000000 - 0000000040400000 (reserved)
(XEN) 0000000040400000 - 000000007024b000 (usable)
(XEN) 000000007024b000 - 000000007024c000 (ACPI NVS)
(XEN) 000000007024c000 - 000000007024d000 (reserved)
(XEN) 000000007024d000 - 0000000077f19000 (usable)
(XEN) 0000000077f19000 - 0000000078987000 (reserved)
(XEN) 0000000078987000 - 0000000078a04000 (ACPI data)
(XEN) 0000000078a04000 - 0000000078ea3000 (ACPI NVS)
(XEN) 0000000078ea3000 - 000000007acff000 (reserved)
(XEN) 000000007acff000 - 000000007ad00000 (usable)
(XEN) 000000007ad00000 - 000000007f800000 (reserved)
(XEN) 00000000f0000000 - 00000000f8000000 (reserved)
(XEN) 00000000fe000000 - 00000000fe011000 (reserved)
(XEN) 00000000fec00000 - 00000000fec01000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ff000000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000047c800000 (usable)
(XEN) EFI memory map:
(XEN) 0000000000000-000000009dfff type=7 attr=000000000000000f
(XEN) 000000009e000-000000009efff type=0 attr=000000000000000f
(XEN) 000000009f000-000000009ffff type=3 attr=000000000000000f
(XEN) 0000000100000-000003fffffff type=7 attr=000000000000000f
(XEN) 0000040000000-00000403fffff type=0 attr=000000000000000f
(XEN) 0000040400000-000005e359fff type=7 attr=000000000000000f
(XEN) 000005e35a000-000005e399fff type=4 attr=000000000000000f
(XEN) 000005e39a000-000006a47dfff type=7 attr=000000000000000f
(XEN) 000006a47e000-000006c3eefff type=2 attr=000000000000000f
(XEN) 000006c3ef000-000006d5eefff type=1 attr=000000000000000f
(XEN) 000006d5ef000-000006d86cfff type=2 attr=000000000000000f
(XEN) 000006d86d000-000006d978fff type=1 attr=000000000000000f
(XEN) 000006d979000-000006dc7afff type=4 attr=000000000000000f
(XEN) 000006dc7b000-000006dc98fff type=3 attr=000000000000000f
(XEN) 000006dc99000-000006dcc7fff type=4 attr=000000000000000f
(XEN) 000006dcc8000-000006dccdfff type=3 attr=000000000000000f
(XEN) 000006dcce000-00000701a5fff type=4 attr=000000000000000f
(XEN) 00000701a6000-00000701c8fff type=3 attr=000000000000000f
(XEN) 00000701c9000-00000701edfff type=4 attr=000000000000000f
(XEN) 00000701ee000-0000070204fff type=3 attr=000000000000000f
(XEN) 0000070205000-000007022cfff type=4 attr=000000000000000f
(XEN) 000007022d000-000007024afff type=3 attr=000000000000000f
(XEN) 000007024b000-000007024bfff type=10 attr=000000000000000f
(XEN) 000007024c000-000007024cfff type=6 attr=800000000000000f
(XEN) 000007024d000-000007024dfff type=4 attr=000000000000000f
(XEN) 000007024e000-0000070282fff type=3 attr=000000000000000f
(XEN) 0000070283000-00000702c3fff type=4 attr=000000000000000f
(XEN) 00000702c4000-00000702c8fff type=3 attr=000000000000000f
(XEN) 00000702c9000-00000702defff type=4 attr=000000000000000f
(XEN) 00000702df000-0000070307fff type=3 attr=000000000000000f
(XEN) 0000070308000-0000070317fff type=4 attr=000000000000000f
(XEN) 0000070318000-0000070319fff type=3 attr=000000000000000f
(XEN) 000007031a000-0000070331fff type=4 attr=000000000000000f
(XEN) 0000070332000-0000070349fff type=3 attr=000000000000000f
(XEN) 000007034a000-0000070356fff type=2 attr=000000000000000f
(XEN) 0000070357000-0000070357fff type=7 attr=000000000000000f
(XEN) 0000070358000-0000070358fff type=2 attr=000000000000000f
(XEN) 0000070359000-0000076f3efff type=4 attr=000000000000000f
(XEN) 0000076f3f000-00000772affff type=7 attr=000000000000000f
(XEN) 00000772b0000-0000077f18fff type=3 attr=000000000000000f
(XEN) 0000077f19000-0000078986fff type=0 attr=000000000000000f
(XEN) 0000078987000-0000078a03fff type=9 attr=000000000000000f
(XEN) 0000078a04000-0000078ea2fff type=10 attr=000000000000000f
(XEN) 0000078ea3000-000007ab22fff type=6 attr=800000000000000f
(XEN) 000007ab23000-000007acfefff type=5 attr=800000000000000f
(XEN) 000007acff000-000007acfffff type=4 attr=000000000000000f
(XEN) 0000100000000-000047c7fffff type=7 attr=000000000000000f
(XEN) 00000000a0000-00000000fffff type=0 attr=0000000000000000
(XEN) 000007ad00000-000007adfffff type=0 attr=070000000000000f
(XEN) 000007ae00000-000007f7fffff type=0 attr=0000000000000000
(XEN) 00000f0000000-00000f7ffffff type=11 attr=800000000000100d
(XEN) 00000fe000000-00000fe010fff type=11 attr=8000000000000001
(XEN) 00000fec00000-00000fec00fff type=11 attr=8000000000000001
(XEN) 00000fee00000-00000fee00fff type=11 attr=8000000000000001
(XEN) 00000ff000000-00000ffffffff type=11 attr=800000000000100d
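
A minimal sketch of checking the faulting addresses against the TBOOT
RAM map above. The helper names (parse_ram_map, classify) are made up
for illustration, and treating each range's end as exclusive is an
assumption based on adjacent entries sharing a boundary.

```python
import re

# TBOOT RAM map lines copied from the Xen boot log above.
TBOOT_RAM_MAP = """\
(XEN) 0000000000000000 - 0000000000060000 (usable)
(XEN) 0000000000060000 - 0000000000068000 (reserved)
(XEN) 0000000000068000 - 000000000009e000 (usable)
(XEN) 000000000009e000 - 000000000009f000 (reserved)
(XEN) 000000000009f000 - 00000000000a0000 (usable)
(XEN) 00000000000a0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 0000000040000000 (usable)
(XEN) 0000000040000000 - 0000000040400000 (reserved)
(XEN) 0000000040400000 - 000000007024b000 (usable)
(XEN) 000000007024b000 - 000000007024c000 (ACPI NVS)
(XEN) 000000007024c000 - 000000007024d000 (reserved)
(XEN) 000000007024d000 - 0000000077f19000 (usable)
(XEN) 0000000077f19000 - 0000000078987000 (reserved)
(XEN) 0000000078987000 - 0000000078a04000 (ACPI data)
(XEN) 0000000078a04000 - 0000000078ea3000 (ACPI NVS)
(XEN) 0000000078ea3000 - 000000007acff000 (reserved)
(XEN) 000000007acff000 - 000000007ad00000 (usable)
(XEN) 000000007ad00000 - 000000007f800000 (reserved)
(XEN) 00000000f0000000 - 00000000f8000000 (reserved)
(XEN) 00000000fe000000 - 00000000fe011000 (reserved)
(XEN) 00000000fec00000 - 00000000fec01000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ff000000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000047c800000 (usable)
"""

REGION_RE = re.compile(r"([0-9a-f]{16}) - ([0-9a-f]{16}) \((.+)\)")


def parse_ram_map(text):
    """Return a list of (start, end, kind) tuples, end treated as exclusive."""
    regions = []
    for line in text.splitlines():
        m = REGION_RE.search(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def classify(addr, regions):
    """Return the kind of the region containing addr, or None if unlisted."""
    for start, end, kind in regions:
        if start <= addr < end:
            return kind
    return None


regions = parse_ram_map(TBOOT_RAM_MAP)
end_of_ram = max(end for _, end, _ in regions)

# The two faulting addresses from the original report.
for addr in (0x39b5845000, 0x4238d0a000):
    print(f"{addr:#x}: {classify(addr, regions)} (end of RAM {end_of_ram:#x})")
```

Both addresses come back as None, i.e. they are not in any listed
range and sit well past the last usable entry.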

Command line
console=com1 dom0_mem=min:420M,max:420M,420M efi=no-rs,attr=uc
com1=115200,8n1,pci mbi-video vga=current flask=enforcing loglvl=debug
guest_loglvl=debug smt=0 ucode=-1 bootscrub=1
argo=yes,mac-permissive=1 iommu=force,igfx

I added a DMI quirk to set no-igfx on this platform as a temporary
workaround, so iommu=force,igfx is there to force igfx back on.

> Have you tried adding dom0-iommu=map-inclusive to the Xen command
> line?

I have not. I can try that tomorrow when I have access to the system again.

Thanks,
Jason
Re: i915 dma faults on Xen
> > Can you paste the memory map as printed by Xen when booting, and what
> > command line are you using to boot Xen.
>
> So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen

Unrelated comment: since tboot now has a PE build
(http://hg.code.sf.net/p/tboot/code/rev/5c68f0963a78) I think it would
be time for OpenXT to drop the weird efi->xen->tboot->xen flow and
just do efi->tboot->xen. Only reason we did efi->xen->tboot was
because tboot didn't have a PE build at the time. It's a very hackish
solution that's no longer needed.

Tamas
Re: i915 dma faults on Xen
On Thu, Oct 15, 2020 at 12:39 PM Tamas K Lengyel
<tamas.k.lengyel@gmail.com> wrote:
>
> > > Can you paste the memory map as printed by Xen when booting, and what
> > > command line are you using to boot Xen.
> >
> > So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen
>
> Unrelated comment: since tboot now has a PE build
> (http://hg.code.sf.net/p/tboot/code/rev/5c68f0963a78) I think it would
> be time for OpenXT to drop the weird efi->xen->tboot->xen flow and
> just do efi->tboot->xen. Only reason we did efi->xen->tboot was
> because tboot didn't have a PE build at the time. It's a very hackish
> solution that's no longer needed.

Thanks for the pointer, Tamas. If I recall correctly, there was also
an issue with ExitBootServices. Do you know if that has been
addressed?

Depending on timing, OpenXT may just move to TrenchBoot for a DRTM solution.

Regards,
Jason
Re: i915 dma faults on Xen
On Thu, Oct 15, 2020 at 11:16 AM Jason Andryuk <jandryuk@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 7:31 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote:
> > > On 14/10/2020 20:28, Jason Andryuk wrote:
> > > > Hi,
> > > >
> > > > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576
> > > >
> > > > I'm seeing DMA faults for the i915 graphics hardware on a Dell
> > > > Latitude 5500. These were captured when I plugged into a Dell
> > > > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4
> > > > staging and Linux 5.4.70 (and some earlier versions).
> > > >
> > > > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler
> > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler
> > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > > Request device [0000:00:02.0] fault addr 39b5845000, iommu reg =
> > > > ffff82c00021d000
> > > > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > > PTE Read access is not set
> > > > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler
> > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler
> > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > > Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg =
> > > > ffff82c00021d000
> > > > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > > PTE Read access is not set
> > > >
> > > > They repeat. In the log attached to
> > > > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
> > > > "Oct 14 18:41:49.056589" and continue until I unplug the dock around
> > > > "Oct 14 18:41:54.801802".
> > > >
> > > > I've also seen similar messages when attaching the laptop's HDMI port
> > > > to a 4k monitor. The eDP display by itself seems okay.
> > > >
> > > > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
> > > > didn't see any errors
> > > >
> > > > This is a kernel & xen log with drm.debug=0x1e. It also includes some
> > > > application (glass) logging when it changes resolutions which seems to
> > > > set off the DMA faults. 5500-igfx-messages-kern-xen-glass
> > > >
> > > > Running xen with iommu=no-igfx disables the iommu for the i915
> > > > graphics and no faults are reported. However, that breaks some other
> > > > devices (Dell Latitude 7200 and 5580) giving a black screen with:
> > > >
> > > > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed
> > > > to idle engines, declaring wedged!
> > > > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed
> > > > to initialize GPU, declaring it wedged!
> > > >
> > > > Any suggestions welcome.
> > >
> > > Presumably this is with a PV dom0. What are 39b5845000 and 4238d0a000
> > > in the machine memory map?
>
> They are bogus?
> End of RAM is 0x47c800000
> Thats:
> 0x047c800000
> vs.
> 0x39b5845000
> 0x4238d0a000
>
> > > This smells like a missing RMRR in the ACPI tables.

The RMRRs are:
(XEN) [VT-D]Host address width 39
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D] dmaru->address = fed90000
(XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000
(XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e
(XEN) [VT-D] endpoint: 0000:00:02.0
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D] dmaru->address = fed91000
(XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000
(XEN) [VT-D]cap = d2008c40660462 ecap = f050da
(XEN) [VT-D] IOAPIC: 0000:00:1e.7
(XEN) [VT-D] MSI HPET: 0000:00:1e.6
(XEN) [VT-D] flags: INCLUDE_ALL
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:14.0
(XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:02.0
(XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d000000 end_addr 7f7fffff
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:16.7
(XEN) [VT-D]dmar.c:581: Non-existent device (0000:00:16.7) is
reported in RMRR (78907000, 78986fff)'s scope!
(XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to
devices under its scope are not PCI discoverable!
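
A rough sketch of comparing the faulting addresses with the RMRRs Xen
accepted and with the 39-bit host address width reported above. The
dictionary layout and function names are only for illustration; the
RMRR for 0000:00:16.7 is left out because Xen says it ignores it.

```python
# Values copied from the VT-d boot messages above.
HOST_ADDRESS_WIDTH = 39

# RMRR regions Xen kept, keyed by endpoint; end addresses are inclusive.
RMRRS = {
    "0000:00:14.0": (0x78863000, 0x78882FFF),
    "0000:00:02.0": (0x7D000000, 0x7F7FFFFF),
}

# Faulting addresses from the original report.
FAULT_ADDRS = [0x39B5845000, 0x4238D0A000]


def in_rmrr(addr, device):
    """True if addr lies inside the RMRR recorded for the given device."""
    base, end = RMRRS[device]
    return base <= addr <= end


def fits_host_width(addr, width=HOST_ADDRESS_WIDTH):
    """True if addr is representable in the reported host address width."""
    return addr < (1 << width)


for addr in FAULT_ADDRS:
    print(
        f"{addr:#x}: in igfx RMRR={in_rmrr(addr, '0000:00:02.0')}, "
        f"fits {HOST_ADDRESS_WIDTH}-bit width={fits_host_width(addr)}"
    )
```

Neither address is inside the igfx RMRR, although both still fit in
39 bits.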

> > I agree.
> >
> > Can you paste the memory map as printed by Xen when booting, and what
> > command line are you using to boot Xen.
>
> So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen
>
> There's the memory map
> (XEN) TBOOT RAM map:
> (XEN) 0000000000000000 - 0000000000060000 (usable)
> (XEN) 0000000000060000 - 0000000000068000 (reserved)
> (XEN) 0000000000068000 - 000000000009e000 (usable)
> (XEN) 000000000009e000 - 000000000009f000 (reserved)
> (XEN) 000000000009f000 - 00000000000a0000 (usable)
> (XEN) 00000000000a0000 - 0000000000100000 (reserved)
> (XEN) 0000000000100000 - 0000000040000000 (usable)
> (XEN) 0000000040000000 - 0000000040400000 (reserved)
> (XEN) 0000000040400000 - 000000007024b000 (usable)
> (XEN) 000000007024b000 - 000000007024c000 (ACPI NVS)
> (XEN) 000000007024c000 - 000000007024d000 (reserved)
> (XEN) 000000007024d000 - 0000000077f19000 (usable)
> (XEN) 0000000077f19000 - 0000000078987000 (reserved)
> (XEN) 0000000078987000 - 0000000078a04000 (ACPI data)
> (XEN) 0000000078a04000 - 0000000078ea3000 (ACPI NVS)
> (XEN) 0000000078ea3000 - 000000007acff000 (reserved)
> (XEN) 000000007acff000 - 000000007ad00000 (usable)
> (XEN) 000000007ad00000 - 000000007f800000 (reserved)
> (XEN) 00000000f0000000 - 00000000f8000000 (reserved)
> (XEN) 00000000fe000000 - 00000000fe011000 (reserved)
> (XEN) 00000000fec00000 - 00000000fec01000 (reserved)
> (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
> (XEN) 00000000ff000000 - 0000000100000000 (reserved)
> (XEN) 0000000100000000 - 000000047c800000 (usable)
> (XEN) EFI memory map:
> (XEN) 0000000000000-000000009dfff type=7 attr=000000000000000f
> (XEN) 000000009e000-000000009efff type=0 attr=000000000000000f
> (XEN) 000000009f000-000000009ffff type=3 attr=000000000000000f
> (XEN) 0000000100000-000003fffffff type=7 attr=000000000000000f
> (XEN) 0000040000000-00000403fffff type=0 attr=000000000000000f
> (XEN) 0000040400000-000005e359fff type=7 attr=000000000000000f
> (XEN) 000005e35a000-000005e399fff type=4 attr=000000000000000f
> (XEN) 000005e39a000-000006a47dfff type=7 attr=000000000000000f
> (XEN) 000006a47e000-000006c3eefff type=2 attr=000000000000000f
> (XEN) 000006c3ef000-000006d5eefff type=1 attr=000000000000000f
> (XEN) 000006d5ef000-000006d86cfff type=2 attr=000000000000000f
> (XEN) 000006d86d000-000006d978fff type=1 attr=000000000000000f
> (XEN) 000006d979000-000006dc7afff type=4 attr=000000000000000f
> (XEN) 000006dc7b000-000006dc98fff type=3 attr=000000000000000f
> (XEN) 000006dc99000-000006dcc7fff type=4 attr=000000000000000f
> (XEN) 000006dcc8000-000006dccdfff type=3 attr=000000000000000f
> (XEN) 000006dcce000-00000701a5fff type=4 attr=000000000000000f
> (XEN) 00000701a6000-00000701c8fff type=3 attr=000000000000000f
> (XEN) 00000701c9000-00000701edfff type=4 attr=000000000000000f
> (XEN) 00000701ee000-0000070204fff type=3 attr=000000000000000f
> (XEN) 0000070205000-000007022cfff type=4 attr=000000000000000f
> (XEN) 000007022d000-000007024afff type=3 attr=000000000000000f
> (XEN) 000007024b000-000007024bfff type=10 attr=000000000000000f
> (XEN) 000007024c000-000007024cfff type=6 attr=800000000000000f
> (XEN) 000007024d000-000007024dfff type=4 attr=000000000000000f
> (XEN) 000007024e000-0000070282fff type=3 attr=000000000000000f
> (XEN) 0000070283000-00000702c3fff type=4 attr=000000000000000f
> (XEN) 00000702c4000-00000702c8fff type=3 attr=000000000000000f
> (XEN) 00000702c9000-00000702defff type=4 attr=000000000000000f
> (XEN) 00000702df000-0000070307fff type=3 attr=000000000000000f
> (XEN) 0000070308000-0000070317fff type=4 attr=000000000000000f
> (XEN) 0000070318000-0000070319fff type=3 attr=000000000000000f
> (XEN) 000007031a000-0000070331fff type=4 attr=000000000000000f
> (XEN) 0000070332000-0000070349fff type=3 attr=000000000000000f
> (XEN) 000007034a000-0000070356fff type=2 attr=000000000000000f
> (XEN) 0000070357000-0000070357fff type=7 attr=000000000000000f
> (XEN) 0000070358000-0000070358fff type=2 attr=000000000000000f
> (XEN) 0000070359000-0000076f3efff type=4 attr=000000000000000f
> (XEN) 0000076f3f000-00000772affff type=7 attr=000000000000000f
> (XEN) 00000772b0000-0000077f18fff type=3 attr=000000000000000f
> (XEN) 0000077f19000-0000078986fff type=0 attr=000000000000000f
> (XEN) 0000078987000-0000078a03fff type=9 attr=000000000000000f
> (XEN) 0000078a04000-0000078ea2fff type=10 attr=000000000000000f
> (XEN) 0000078ea3000-000007ab22fff type=6 attr=800000000000000f
> (XEN) 000007ab23000-000007acfefff type=5 attr=800000000000000f
> (XEN) 000007acff000-000007acfffff type=4 attr=000000000000000f
> (XEN) 0000100000000-000047c7fffff type=7 attr=000000000000000f
> (XEN) 00000000a0000-00000000fffff type=0 attr=0000000000000000
> (XEN) 000007ad00000-000007adfffff type=0 attr=070000000000000f
> (XEN) 000007ae00000-000007f7fffff type=0 attr=0000000000000000
> (XEN) 00000f0000000-00000f7ffffff type=11 attr=800000000000100d
> (XEN) 00000fe000000-00000fe010fff type=11 attr=8000000000000001
> (XEN) 00000fec00000-00000fec00fff type=11 attr=8000000000000001
> (XEN) 00000fee00000-00000fee00fff type=11 attr=8000000000000001
> (XEN) 00000ff000000-00000ffffffff type=11 attr=800000000000100d
>
> Command line
> console=com1 dom0_mem=min:420M,max:420M,420M efi=no-rs,attr=uc
> com1=115200,8n1,pci mbi-video vga=current flask=enforcing loglvl=debug
> guest_loglvl=debug smt=0 ucode=-1 bootscrub=1
> argo=yes,mac-permissive=1 iommu=force,igfx
>
> iommu=force,igfx was to force igfx back on. I added a dmi quirk to
> set no-igfx on this platform as a temporary workaround.
>
> > Have you tried adding dom0-iommu=map-inclusive to the Xen command
> > line?

Still seeing faults with dom0-iommu=map-inclusive. At a different
address this time:
Oct 16 15:58:05.110768 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
Request device [0000:00:02.0] fault addr ea0c4f000, iommu reg =
ffff82c00021d000
Oct 16 15:58:05.110774 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
PTE Read access is not set
Oct 16 15:58:05.110777 VM hypervisor: (XEN) print_vtd_entries: iommu
#0 dev 0000:00:02.0 gmfn ea0c4f
Oct 16 15:58:05.110780 VM hypervisor: (XEN) root_entry[00] = 46e129001
Oct 16 15:58:05.110782 VM hypervisor: (XEN) context[10] = 2_46e128001
Oct 16 15:58:05.110785 VM hypervisor: (XEN) l4[000] = 46e11b003
Oct 16 15:58:05.110787 VM hypervisor: (XEN) l3[03a] = 0
Oct 16 15:58:05.110789 VM hypervisor: (XEN) l3[03a] not present
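
For reference, a small sketch of how the indices in that
print_vtd_entries dump relate to the faulting address, assuming 4 KiB
pages and 9 index bits per page-table level, and assuming the context
index is the PCI devfn. The function names are made up for
illustration.

```python
PAGE_SHIFT = 12   # 4 KiB pages (assumption)
LEVEL_BITS = 9    # 512 entries per page-table level (assumption)


def vtd_walk_indices(addr, levels=4):
    """Return (gfn, {level_name: index}) for a DMA address."""
    gfn = addr >> PAGE_SHIFT
    indices = {}
    for level in range(levels, 0, -1):
        shift = PAGE_SHIFT + LEVEL_BITS * (level - 1)
        indices[f"l{level}"] = (addr >> shift) & ((1 << LEVEL_BITS) - 1)
    return gfn, indices


def devfn(device, function):
    """PCI devfn encoding: 5 bits of device number, 3 bits of function."""
    return (device << 3) | function


gfn, idx = vtd_walk_indices(0xEA0C4F000)
print(f"gmfn {gfn:x}")
print(f"context[{devfn(2, 0):02x}]")
print(f"l4[{idx['l4']:03x}] l3[{idx['l3']:03x}]")
```

With the address from the fault above this gives gmfn ea0c4f,
context index 10, l4 index 000 and l3 index 03a, which lines up with
the dump: the walk stops at l3[03a] because that entry is 0.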

In the previous posting, the two faulting addresses repeated in pairs.
Here it is only this one address repeating.

I unplugged and replugged, and a different address was repeating, with
a few other random addresses showing 1 or 2 faults in between. Here is
uniq -c style output of the address and count pulled from the logs:
0x1ce9d6b000 2007
0x31b50d5000 1
0x1ce9d6b000 882
0x707741000 1
0x1ce9d6b000 1114
0x20d2099000 1
0x1ce9d6b000 3489
0xeb98eb000 1
0x1ce9d6b000 2430
0xeb98eb000 1
0x1ce9d6b000 1300
0x22f20bb000 1
0x1ce9d6b000 269
0x22f20bb000 1
0x1ce9d6b000 5091
0x6c99ec9000 1
0x1ce9d6b000 29
0xeb98eb000 1
0x1ce9d6b000 4599
0x6c99ec9000 1
0x1ce9d6b000 1989
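
A sketch of producing that kind of run-length summary directly from
the log, roughly what piping the addresses through uniq -c does. The
regex and the function name are assumptions, and the sample lines are
shortened stand-ins rather than real log content. It expects each
fault message on a single line, so wrapped lines would need joining
first.

```python
import re
from itertools import groupby

# Matches the address in "... fault addr ea0c4f000, iommu reg = ..." lines.
FAULT_RE = re.compile(r"fault addr ([0-9a-f]+)")


def fault_runs(lines):
    """Return [(addr, count), ...] for consecutive runs of the same address."""
    addrs = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            addrs.append(int(m.group(1), 16))
    return [(addr, sum(1 for _ in run)) for addr, run in groupby(addrs)]


# Shortened, made-up sample in the same shape as the Xen messages.
sample = [
    "(XEN) [VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] fault addr 1ce9d6b000, iommu reg = ffff82c00021d000",
    "(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set",
    "(XEN) [VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] fault addr 1ce9d6b000, iommu reg = ffff82c00021d000",
    "(XEN) [VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] fault addr 31b50d5000, iommu reg = ffff82c00021d000",
    "(XEN) [VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] fault addr 1ce9d6b000, iommu reg = ffff82c00021d000",
]

for addr, count in fault_runs(sample):
    print(f"{addr:#x} {count}")
```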

In the i915 bug report, LAKSHMINARAYANA VUDUM commented "We have a
similar issue on SKL on our CI system
https://gitlab.freedesktop.org/drm/intel/-/issues/2017"

Regards,
Jason
Re: i915 dma faults on Xen
On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote:
> On Thu, Oct 15, 2020 at 11:16 AM Jason Andryuk <jandryuk@gmail.com> wrote:
> >
> > On Thu, Oct 15, 2020 at 7:31 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote:
> > > > On 14/10/2020 20:28, Jason Andryuk wrote:
> > > > > Hi,
> > > > >
> > > > > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576
> > > > >
> > > > > I'm seeing DMA faults for the i915 graphics hardware on a Dell
> > > > > Latitude 5500. These were captured when I plugged into a Dell
> > > > > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4
> > > > > staging and Linux 5.4.70 (and some earlier versions).
> > > > >
> > > > > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > > > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > > > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > > > Request device [0000:00:02.0] fault addr 39b5845000, iommu reg =
> > > > > ffff82c00021d000
> > > > > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > > > PTE Read access is not set
> > > > > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > > > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > > > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > > > Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg =
> > > > > ffff82c00021d000
> > > > > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > > > PTE Read access is not set
> > > > >
> > > > > They repeat. In the log attached to
> > > > > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
> > > > > "Oct 14 18:41:49.056589" and continue until I unplug the dock around
> > > > > "Oct 14 18:41:54.801802".
> > > > >
> > > > > I've also seen similar messages when attaching the laptop's HDMI port
> > > > > to a 4k monitor. The eDP display by itself seems okay.
> > > > >
> > > > > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
> > > > > didn't see any errors
> > > > >
> > > > > This is a kernel & xen log with drm.debug=0x1e. It also includes some
> > > > > application (glass) logging when it changes resolutions which seems to
> > > > > set off the DMA faults. 5500-igfx-messages-kern-xen-glass
> > > > >
> > > > > Running xen with iommu=no-igfx disables the iommu for the i915
> > > > > graphics and no faults are reported. However, that breaks some other
> > > > > devices (Dell Latitude 7200 and 5580) giving a black screen with:
> > > > >
> > > > > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed
> > > > > to idle engines, declaring wedged!
> > > > > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed
> > > > > to initialize GPU, declaring it wedged!
> > > > >
> > > > > Any suggestions welcome.
> > > >
> > > > Presumably this is with a PV dom0. What are 39b5845000 and 4238d0a000
> > > > in the machine memory map?
> >
> > They are bogus?
> > End of RAM is 0x47c800000
> > Thats:
> > 0x047c800000
> > vs.
> > 0x39b5845000
> > 0x4238d0a000
> >
> > > > This smells like a missing RMRR in the ACPI tables.
>
> The RMRRs are:
> (XEN) [VT-D]Host address width 39
> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> (XEN) [VT-D] dmaru->address = fed90000
> (XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000
> (XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e
> (XEN) [VT-D] endpoint: 0000:00:02.0
> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> (XEN) [VT-D] dmaru->address = fed91000
> (XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000
> (XEN) [VT-D]cap = d2008c40660462 ecap = f050da
> (XEN) [VT-D] IOAPIC: 0000:00:1e.7
> (XEN) [VT-D] MSI HPET: 0000:00:1e.6
> (XEN) [VT-D] flags: INCLUDE_ALL
> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> (XEN) [VT-D] endpoint: 0000:00:14.0
> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff
> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> (XEN) [VT-D] endpoint: 0000:00:02.0
> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d000000 end_addr 7f7fffff
> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> (XEN) [VT-D] endpoint: 0000:00:16.7
> (XEN) [VT-D]dmar.c:581: Non-existent device (0000:00:16.7) is
> reported in RMRR (78907000, 78986fff)'s scope!
> (XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to

This is also part of a reserved region, so should be added to the
iommu page tables anyway regardless of this message.

> devices under its scope are not PCI discoverable!
>
> > > I agree.
> > >
> > > Can you paste the memory map as printed by Xen when booting, and what
> > > command line are you using to boot Xen.
> >
> > So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen
> >
> > There's the memory map
> > (XEN) TBOOT RAM map:
> > (XEN) 0000000000000000 - 0000000000060000 (usable)
> > (XEN) 0000000000060000 - 0000000000068000 (reserved)
> > (XEN) 0000000000068000 - 000000000009e000 (usable)
> > (XEN) 000000000009e000 - 000000000009f000 (reserved)
> > (XEN) 000000000009f000 - 00000000000a0000 (usable)
> > (XEN) 00000000000a0000 - 0000000000100000 (reserved)
> > (XEN) 0000000000100000 - 0000000040000000 (usable)
> > (XEN) 0000000040000000 - 0000000040400000 (reserved)
> > (XEN) 0000000040400000 - 000000007024b000 (usable)
> > (XEN) 000000007024b000 - 000000007024c000 (ACPI NVS)
> > (XEN) 000000007024c000 - 000000007024d000 (reserved)
> > (XEN) 000000007024d000 - 0000000077f19000 (usable)
> > (XEN) 0000000077f19000 - 0000000078987000 (reserved)
> > (XEN) 0000000078987000 - 0000000078a04000 (ACPI data)
> > (XEN) 0000000078a04000 - 0000000078ea3000 (ACPI NVS)
> > (XEN) 0000000078ea3000 - 000000007acff000 (reserved)
> > (XEN) 000000007acff000 - 000000007ad00000 (usable)
> > (XEN) 000000007ad00000 - 000000007f800000 (reserved)
> > (XEN) 00000000f0000000 - 00000000f8000000 (reserved)
> > (XEN) 00000000fe000000 - 00000000fe011000 (reserved)
> > (XEN) 00000000fec00000 - 00000000fec01000 (reserved)
> > (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
> > (XEN) 00000000ff000000 - 0000000100000000 (reserved)
> > (XEN) 0000000100000000 - 000000047c800000 (usable)
> > (XEN) EFI memory map:
> > (XEN) 0000000000000-000000009dfff type=7 attr=000000000000000f
> > (XEN) 000000009e000-000000009efff type=0 attr=000000000000000f
> > (XEN) 000000009f000-000000009ffff type=3 attr=000000000000000f
> > (XEN) 0000000100000-000003fffffff type=7 attr=000000000000000f
> > (XEN) 0000040000000-00000403fffff type=0 attr=000000000000000f
> > (XEN) 0000040400000-000005e359fff type=7 attr=000000000000000f
> > (XEN) 000005e35a000-000005e399fff type=4 attr=000000000000000f
> > (XEN) 000005e39a000-000006a47dfff type=7 attr=000000000000000f
> > (XEN) 000006a47e000-000006c3eefff type=2 attr=000000000000000f
> > (XEN) 000006c3ef000-000006d5eefff type=1 attr=000000000000000f
> > (XEN) 000006d5ef000-000006d86cfff type=2 attr=000000000000000f
> > (XEN) 000006d86d000-000006d978fff type=1 attr=000000000000000f
> > (XEN) 000006d979000-000006dc7afff type=4 attr=000000000000000f
> > (XEN) 000006dc7b000-000006dc98fff type=3 attr=000000000000000f
> > (XEN) 000006dc99000-000006dcc7fff type=4 attr=000000000000000f
> > (XEN) 000006dcc8000-000006dccdfff type=3 attr=000000000000000f
> > (XEN) 000006dcce000-00000701a5fff type=4 attr=000000000000000f
> > (XEN) 00000701a6000-00000701c8fff type=3 attr=000000000000000f
> > (XEN) 00000701c9000-00000701edfff type=4 attr=000000000000000f
> > (XEN) 00000701ee000-0000070204fff type=3 attr=000000000000000f
> > (XEN) 0000070205000-000007022cfff type=4 attr=000000000000000f
> > (XEN) 000007022d000-000007024afff type=3 attr=000000000000000f
> > (XEN) 000007024b000-000007024bfff type=10 attr=000000000000000f
> > (XEN) 000007024c000-000007024cfff type=6 attr=800000000000000f
> > (XEN) 000007024d000-000007024dfff type=4 attr=000000000000000f
> > (XEN) 000007024e000-0000070282fff type=3 attr=000000000000000f
> > (XEN) 0000070283000-00000702c3fff type=4 attr=000000000000000f
> > (XEN) 00000702c4000-00000702c8fff type=3 attr=000000000000000f
> > (XEN) 00000702c9000-00000702defff type=4 attr=000000000000000f
> > (XEN) 00000702df000-0000070307fff type=3 attr=000000000000000f
> > (XEN) 0000070308000-0000070317fff type=4 attr=000000000000000f
> > (XEN) 0000070318000-0000070319fff type=3 attr=000000000000000f
> > (XEN) 000007031a000-0000070331fff type=4 attr=000000000000000f
> > (XEN) 0000070332000-0000070349fff type=3 attr=000000000000000f
> > (XEN) 000007034a000-0000070356fff type=2 attr=000000000000000f
> > (XEN) 0000070357000-0000070357fff type=7 attr=000000000000000f
> > (XEN) 0000070358000-0000070358fff type=2 attr=000000000000000f
> > (XEN) 0000070359000-0000076f3efff type=4 attr=000000000000000f
> > (XEN) 0000076f3f000-00000772affff type=7 attr=000000000000000f
> > (XEN) 00000772b0000-0000077f18fff type=3 attr=000000000000000f
> > (XEN) 0000077f19000-0000078986fff type=0 attr=000000000000000f
> > (XEN) 0000078987000-0000078a03fff type=9 attr=000000000000000f
> > (XEN) 0000078a04000-0000078ea2fff type=10 attr=000000000000000f
> > (XEN) 0000078ea3000-000007ab22fff type=6 attr=800000000000000f
> > (XEN) 000007ab23000-000007acfefff type=5 attr=800000000000000f
> > (XEN) 000007acff000-000007acfffff type=4 attr=000000000000000f
> > (XEN) 0000100000000-000047c7fffff type=7 attr=000000000000000f
> > (XEN) 00000000a0000-00000000fffff type=0 attr=0000000000000000
> > (XEN) 000007ad00000-000007adfffff type=0 attr=070000000000000f
> > (XEN) 000007ae00000-000007f7fffff type=0 attr=0000000000000000
> > (XEN) 00000f0000000-00000f7ffffff type=11 attr=800000000000100d
> > (XEN) 00000fe000000-00000fe010fff type=11 attr=8000000000000001
> > (XEN) 00000fec00000-00000fec00fff type=11 attr=8000000000000001
> > (XEN) 00000fee00000-00000fee00fff type=11 attr=8000000000000001
> > (XEN) 00000ff000000-00000ffffffff type=11 attr=800000000000100d
> >
> > Command line
> > console=com1 dom0_mem=min:420M,max:420M,420M efi=no-rs,attr=uc
> > com1=115200,8n1,pci mbi-video vga=current flask=enforcing loglvl=debug
> > guest_loglvl=debug smt=0 ucode=-1 bootscrub=1
> > argo=yes,mac-permissive=1 iommu=force,igfx
> >
> > iommu=force,igfx was to force igfx back on. I added a dmi quirk to
> > set no-igfx on this platform as a temporary workaround.

I assume setting no-igfx fixed the issue and the card works fine in
that case?

> > > Have you tried adding dom0-iommu=map-inclusive to the Xen command
> > > line?
>
> Still seeing faults with dom0-iommu=map-inclusive. At a different
> address this time:
> Oct 16 15:58:05.110768 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> Request device [0000:00:02.0] fault addr ea0c4f000, iommu reg = ffff

That's also past the end of RAM.

> 82c00021d000
> Oct 16 15:58:05.110774 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> PTE Read access is not set
> Oct 16 15:58:05.110777 VM hypervisor: (XEN) print_vtd_entries: iommu
> #0 dev 0000:00:02.0 gmfn ea0c4f
> Oct 16 15:58:05.110780 VM hypervisor: (XEN) root_entry[00] = 46e129001
> Oct 16 15:58:05.110782 VM hypervisor: (XEN) context[10] = 2_46e128001
> Oct 16 15:58:05.110785 VM hypervisor: (XEN) l4[000] = 46e11b003
> Oct 16 15:58:05.110787 VM hypervisor: (XEN) l3[03a] = 0
> Oct 16 15:58:05.110789 VM hypervisor: (XEN) l3[03a] not present
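
For what it's worth, the indices in that dump are consistent with the
faulting address. A minimal sketch, assuming the usual 4-level VT-d
layout (4 KiB pages, 9 index bits per level); the function names are
made up for illustration and are not Xen's:

```python
# Sketch: split a DMA address into VT-d page-table indices.
# Assumption: 4-level table, 4 KiB pages, 9 index bits per level.

PAGE_SHIFT = 12
LEVEL_BITS = 9


def vtd_indices(addr):
    """Return (page frame number, {level name: index}) for levels l4..l1."""
    mask = (1 << LEVEL_BITS) - 1
    indices = {}
    for level in (4, 3, 2, 1):
        shift = PAGE_SHIFT + LEVEL_BITS * (level - 1)
        indices[f"l{level}"] = (addr >> shift) & mask
    return addr >> PAGE_SHIFT, indices


def devfn(device, function):
    """PCI devfn (device in bits 7:3, function in bits 2:0)."""
    return (device << 3) | function


if __name__ == "__main__":
    # Fault address taken from the log quoted above.
    gfn, idx = vtd_indices(0xea0c4f000)
    print(f"gfn {gfn:x}", {name: f"{value:03x}" for name, value in idx.items()})
    print(f"context index for 00:02.0: {devfn(2, 0):02x}")
```

This reproduces gmfn ea0c4f, l4[000], l3[03a] and context[10] from the
dump, so the walk stops at l3 because nothing is mapped in that 1 GiB
region.
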
>
> The previous posting, the two faulting addresses repeated in pairs.
> Here it is only this one address repeating.
>
> I plugged and unplugged and a different address was repeating with a
> few other random addresses with 1 or 2 faults. Here is uniq -c output
> of the address and count pulled from the logs:
> 0x1ce9d6b000 2007
> 0x31b50d5000 1
> 0x1ce9d6b000 882
> 0x707741000 1
> 0x1ce9d6b000 1114
> 0x20d2099000 1
> 0x1ce9d6b000 3489
> 0xeb98eb000 1
> 0x1ce9d6b000 2430
> 0xeb98eb000 1
> 0x1ce9d6b000 1300
> 0x22f20bb000 1
> 0x1ce9d6b000 269
> 0x22f20bb000 1
> 0x1ce9d6b000 5091
> 0x6c99ec9000 1
> 0x1ce9d6b000 29
> 0xeb98eb000 1
> 0x1ce9d6b000 4599
> 0x6c99ec9000 1
> 0x1ce9d6b000 1989

Hm, it's hard to tell what's going on. In my limited experience with
IOMMU faults on broken systems, there's a small range that initially
triggers those, and then the device goes wonky and starts accessing a
whole load of invalid addresses.

You could try adding those manually using the rmrr Xen command line
option [0], maybe you can figure out which range(s) are missing?
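
As a starting point for that, here is a rough sketch that tallies
consecutive faults per address from a Xen log (like the uniq -c output
above) and lists the distinct page frames involved. It relies only on
the "fault addr" text seen in the messages in this thread; the helper
names are invented for illustration. Check the rmrr documentation at
[0] for the exact option syntax before turning the output into a
command line.

```python
# Sketch: summarise VT-d fault addresses from a Xen log read on stdin.
import re
import sys
from itertools import groupby

# Tolerate the address being wrapped onto the next line by a mail client.
FAULT_RE = re.compile(r"fault\s+addr\s+([0-9a-f]+)")
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def fault_runs(log_text):
    """Return [(address, count)] for each run of identical consecutive faults."""
    addrs = [int(m.group(1), 16) for m in FAULT_RE.finditer(log_text)]
    return [(addr, sum(1 for _ in run)) for addr, run in groupby(addrs)]


def distinct_pages(runs):
    """Return the sorted set of page frame numbers that faulted."""
    return sorted({addr >> PAGE_SHIFT for addr, _ in runs})


if __name__ == "__main__":
    runs = fault_runs(sys.stdin.read())
    for addr, count in runs:
        print(f"{addr:#x} {count}")
    print("distinct page frames:", [f"{pfn:#x}" for pfn in distinct_pages(runs)])
```
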

Roger.

[0] https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#rmrr
Re: i915 dma faults on Xen
On 21.10.2020 11:58, Roger Pau Monné wrote:
> On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote:
>> The RMRRs are:
>> (XEN) [VT-D]Host address width 39
>> (XEN) [VT-D]found ACPI_DMAR_DRHD:
>> (XEN) [VT-D] dmaru->address = fed90000
>> (XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000
>> (XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e
>> (XEN) [VT-D] endpoint: 0000:00:02.0
>> (XEN) [VT-D]found ACPI_DMAR_DRHD:
>> (XEN) [VT-D] dmaru->address = fed91000
>> (XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000
>> (XEN) [VT-D]cap = d2008c40660462 ecap = f050da
>> (XEN) [VT-D] IOAPIC: 0000:00:1e.7
>> (XEN) [VT-D] MSI HPET: 0000:00:1e.6
>> (XEN) [VT-D] flags: INCLUDE_ALL
>> (XEN) [VT-D]found ACPI_DMAR_RMRR:
>> (XEN) [VT-D] endpoint: 0000:00:14.0
>> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff
>> (XEN) [VT-D]found ACPI_DMAR_RMRR:
>> (XEN) [VT-D] endpoint: 0000:00:02.0
>> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d000000 end_addr 7f7fffff
>> (XEN) [VT-D]found ACPI_DMAR_RMRR:
>> (XEN) [VT-D] endpoint: 0000:00:16.7
>> (XEN) [VT-D]dmar.c:581: Non-existent device (0000:00:16.7) is
>> reported in RMRR (78907000, 78986fff)'s scope!
>> (XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to
>
> This is also part of a reserved region, so should be added to the
> iommu page tables anyway regardless of this message.

Could you clarify why you think so? RMRRs are tied to devices, so
if a device in reality doesn't exist (and no other one uses the
same range), I don't see why an IOMMU mapping would be needed
(unless to work around some related firmware bug). Plus aiui none
of the IOMMU faults actually report this range as having got
accessed.

Jan
Re: i915 dma faults on Xen
On Wed, Oct 21, 2020 at 12:33:05PM +0200, Jan Beulich wrote:
> On 21.10.2020 11:58, Roger Pau Monné wrote:
> > On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote:
> >> The RMRRs are:
> >> (XEN) [VT-D]Host address width 39
> >> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> >> (XEN) [VT-D] dmaru->address = fed90000
> >> (XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000
> >> (XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e
> >> (XEN) [VT-D] endpoint: 0000:00:02.0
> >> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> >> (XEN) [VT-D] dmaru->address = fed91000
> >> (XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000
> >> (XEN) [VT-D]cap = d2008c40660462 ecap = f050da
> >> (XEN) [VT-D] IOAPIC: 0000:00:1e.7
> >> (XEN) [VT-D] MSI HPET: 0000:00:1e.6
> >> (XEN) [VT-D] flags: INCLUDE_ALL
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: 0000:00:14.0
> >> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: 0000:00:02.0
> >> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d000000 end_addr 7f7fffff
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: 0000:00:16.7
> >> (XEN) [VT-D]dmar.c:581: Non-existent device (0000:00:16.7) is
> >> reported in RMRR (78907000, 78986fff)'s scope!
> >> (XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to
> >
> > This is also part of a reserved region, so should be added to the
> > iommu page tables anyway regardless of this message.
>
> Could you clarify why you think so? RMRRs are tied to devices, so
> if a device in reality doesn't exist (and no other one uses the
> same range), I don't see why an IOMMU mapping would be needed
> (unless to work around some related firmware bug). Plus aiui none
> of the IOMMU faults actually report this range as having got
> accessed.

Since it's the hardware domain that gets the gfx card assigned here it
will get any reserved regions added to the IOMMU page tables in
arch_iommu_hwdom_init. I agree it's not relevant here, since those are
not the regions reported in the IOMMU faults.
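
That can be checked mechanically. A small sketch, with the RMRR ranges
and fault addresses copied from earlier in this thread (the helper name
is just for illustration):

```python
# Sketch: do any of the reported fault addresses fall inside an RMRR?
# Ranges are inclusive (base_addr, end_addr) as printed by Xen.
RMRRS = {
    "0000:00:14.0": (0x78863000, 0x78882FFF),
    "0000:00:02.0": (0x7D000000, 0x7F7FFFFF),
    "0000:00:16.7 (ignored)": (0x78907000, 0x78986FFF),
}

# Fault addresses reported so far in this thread.
FAULT_ADDRS = [
    0x39B5845000, 0x4238D0A000, 0xEA0C4F000, 0x1CE9D6B000, 0x31B50D5000,
    0x707741000, 0x20D2099000, 0xEB98EB000, 0x22F20BB000, 0x6C99EC9000,
]


def covering_rmrrs(addr, rmrrs=RMRRS):
    """Return the names of all RMRRs whose range contains addr."""
    return [name for name, (start, end) in rmrrs.items() if start <= addr <= end]


if __name__ == "__main__":
    for addr in FAULT_ADDRS:
        hits = covering_rmrrs(addr)
        print(f"{addr:#x}: {', '.join(hits) if hits else 'not in any RMRR'}")
```
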

Roger.
Re: i915 dma faults on Xen
On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote:
> >
> > The RMRRs are:
> > (XEN) [VT-D]Host address width 39
> > (XEN) [VT-D]found ACPI_DMAR_DRHD:
> > (XEN) [VT-D] dmaru->address = fed90000
> > (XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000
> > (XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e
> > (XEN) [VT-D] endpoint: 0000:00:02.0
> > (XEN) [VT-D]found ACPI_DMAR_DRHD:
> > (XEN) [VT-D] dmaru->address = fed91000
> > (XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000
> > (XEN) [VT-D]cap = d2008c40660462 ecap = f050da
> > (XEN) [VT-D] IOAPIC: 0000:00:1e.7
> > (XEN) [VT-D] MSI HPET: 0000:00:1e.6
> > (XEN) [VT-D] flags: INCLUDE_ALL
> > (XEN) [VT-D]found ACPI_DMAR_RMRR:
> > (XEN) [VT-D] endpoint: 0000:00:14.0
> > (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff
> > (XEN) [VT-D]found ACPI_DMAR_RMRR:
> > (XEN) [VT-D] endpoint: 0000:00:02.0
> > (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d000000 end_addr 7f7fffff
> > (XEN) [VT-D]found ACPI_DMAR_RMRR:
> > (XEN) [VT-D] endpoint: 0000:00:16.7
> > (XEN) [VT-D]dmar.c:581: Non-existent device (0000:00:16.7) is
> > reported in RMRR (78907000, 78986fff)'s scope!
> > (XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to
>
> This is also part of a reserved region, so should be added to the
> iommu page tables anyway regardless of this message.

I wonder if this is for the Intel AMT PCI device? I assumed it is
disabled, but I actually can't find it listed in the BIOS
configuration to verify.

> > devices under its scope are not PCI discoverable!
> >
> > > > I agree.
> > > >
> > > > Can you paste the memory map as printed by Xen when booting, and what
> > > > command line are you using to boot Xen.
> > >
> > > So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen
> > >
> > > There's the memory map
> > > (XEN) TBOOT RAM map:
> > > (XEN) 0000000000000000 - 0000000000060000 (usable)
> > > (XEN) 0000000000060000 - 0000000000068000 (reserved)
> > > (XEN) 0000000000068000 - 000000000009e000 (usable)
> > > (XEN) 000000000009e000 - 000000000009f000 (reserved)
> > > (XEN) 000000000009f000 - 00000000000a0000 (usable)
> > > (XEN) 00000000000a0000 - 0000000000100000 (reserved)
> > > (XEN) 0000000000100000 - 0000000040000000 (usable)
> > > (XEN) 0000000040000000 - 0000000040400000 (reserved)
> > > (XEN) 0000000040400000 - 000000007024b000 (usable)
> > > (XEN) 000000007024b000 - 000000007024c000 (ACPI NVS)
> > > (XEN) 000000007024c000 - 000000007024d000 (reserved)
> > > (XEN) 000000007024d000 - 0000000077f19000 (usable)
> > > (XEN) 0000000077f19000 - 0000000078987000 (reserved)
> > > (XEN) 0000000078987000 - 0000000078a04000 (ACPI data)
> > > (XEN) 0000000078a04000 - 0000000078ea3000 (ACPI NVS)
> > > (XEN) 0000000078ea3000 - 000000007acff000 (reserved)
> > > (XEN) 000000007acff000 - 000000007ad00000 (usable)
> > > (XEN) 000000007ad00000 - 000000007f800000 (reserved)
> > > (XEN) 00000000f0000000 - 00000000f8000000 (reserved)
> > > (XEN) 00000000fe000000 - 00000000fe011000 (reserved)
> > > (XEN) 00000000fec00000 - 00000000fec01000 (reserved)
> > > (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
> > > (XEN) 00000000ff000000 - 0000000100000000 (reserved)
> > > (XEN) 0000000100000000 - 000000047c800000 (usable)
> > > (XEN) EFI memory map:
> > > (XEN) 0000000000000-000000009dfff type=7 attr=000000000000000f
> > > (XEN) 000000009e000-000000009efff type=0 attr=000000000000000f
> > > (XEN) 000000009f000-000000009ffff type=3 attr=000000000000000f
> > > (XEN) 0000000100000-000003fffffff type=7 attr=000000000000000f
> > > (XEN) 0000040000000-00000403fffff type=0 attr=000000000000000f
> > > (XEN) 0000040400000-000005e359fff type=7 attr=000000000000000f
> > > (XEN) 000005e35a000-000005e399fff type=4 attr=000000000000000f
> > > (XEN) 000005e39a000-000006a47dfff type=7 attr=000000000000000f
> > > (XEN) 000006a47e000-000006c3eefff type=2 attr=000000000000000f
> > > (XEN) 000006c3ef000-000006d5eefff type=1 attr=000000000000000f
> > > (XEN) 000006d5ef000-000006d86cfff type=2 attr=000000000000000f
> > > (XEN) 000006d86d000-000006d978fff type=1 attr=000000000000000f
> > > (XEN) 000006d979000-000006dc7afff type=4 attr=000000000000000f
> > > (XEN) 000006dc7b000-000006dc98fff type=3 attr=000000000000000f
> > > (XEN) 000006dc99000-000006dcc7fff type=4 attr=000000000000000f
> > > (XEN) 000006dcc8000-000006dccdfff type=3 attr=000000000000000f
> > > (XEN) 000006dcce000-00000701a5fff type=4 attr=000000000000000f
> > > (XEN) 00000701a6000-00000701c8fff type=3 attr=000000000000000f
> > > (XEN) 00000701c9000-00000701edfff type=4 attr=000000000000000f
> > > (XEN) 00000701ee000-0000070204fff type=3 attr=000000000000000f
> > > (XEN) 0000070205000-000007022cfff type=4 attr=000000000000000f
> > > (XEN) 000007022d000-000007024afff type=3 attr=000000000000000f
> > > (XEN) 000007024b000-000007024bfff type=10 attr=000000000000000f
> > > (XEN) 000007024c000-000007024cfff type=6 attr=800000000000000f
> > > (XEN) 000007024d000-000007024dfff type=4 attr=000000000000000f
> > > (XEN) 000007024e000-0000070282fff type=3 attr=000000000000000f
> > > (XEN) 0000070283000-00000702c3fff type=4 attr=000000000000000f
> > > (XEN) 00000702c4000-00000702c8fff type=3 attr=000000000000000f
> > > (XEN) 00000702c9000-00000702defff type=4 attr=000000000000000f
> > > (XEN) 00000702df000-0000070307fff type=3 attr=000000000000000f
> > > (XEN) 0000070308000-0000070317fff type=4 attr=000000000000000f
> > > (XEN) 0000070318000-0000070319fff type=3 attr=000000000000000f
> > > (XEN) 000007031a000-0000070331fff type=4 attr=000000000000000f
> > > (XEN) 0000070332000-0000070349fff type=3 attr=000000000000000f
> > > (XEN) 000007034a000-0000070356fff type=2 attr=000000000000000f
> > > (XEN) 0000070357000-0000070357fff type=7 attr=000000000000000f
> > > (XEN) 0000070358000-0000070358fff type=2 attr=000000000000000f
> > > (XEN) 0000070359000-0000076f3efff type=4 attr=000000000000000f
> > > (XEN) 0000076f3f000-00000772affff type=7 attr=000000000000000f
> > > (XEN) 00000772b0000-0000077f18fff type=3 attr=000000000000000f
> > > (XEN) 0000077f19000-0000078986fff type=0 attr=000000000000000f
> > > (XEN) 0000078987000-0000078a03fff type=9 attr=000000000000000f
> > > (XEN) 0000078a04000-0000078ea2fff type=10 attr=000000000000000f
> > > (XEN) 0000078ea3000-000007ab22fff type=6 attr=800000000000000f
> > > (XEN) 000007ab23000-000007acfefff type=5 attr=800000000000000f
> > > (XEN) 000007acff000-000007acfffff type=4 attr=000000000000000f
> > > (XEN) 0000100000000-000047c7fffff type=7 attr=000000000000000f
> > > (XEN) 00000000a0000-00000000fffff type=0 attr=0000000000000000
> > > (XEN) 000007ad00000-000007adfffff type=0 attr=070000000000000f
> > > (XEN) 000007ae00000-000007f7fffff type=0 attr=0000000000000000
> > > (XEN) 00000f0000000-00000f7ffffff type=11 attr=800000000000100d
> > > (XEN) 00000fe000000-00000fe010fff type=11 attr=8000000000000001
> > > (XEN) 00000fec00000-00000fec00fff type=11 attr=8000000000000001
> > > (XEN) 00000fee00000-00000fee00fff type=11 attr=8000000000000001
> > > (XEN) 00000ff000000-00000ffffffff type=11 attr=800000000000100d
> > >
> > > Command line
> > > console=com1 dom0_mem=min:420M,max:420M,420M efi=no-rs,attr=uc
> > > com1=115200,8n1,pci mbi-video vga=current flask=enforcing loglvl=debug
> > > guest_loglvl=debug smt=0 ucode=-1 bootscrub=1
> > > argo=yes,mac-permissive=1 iommu=force,igfx
> > >
> > > iommu=force,igfx was to force igfx back on. I added a dmi quirk to
> > > set no-igfx on this platform as a temporary workaround.
>
> I assume setting no-igfx fixed the issue and the card works fine in
> that case?

Yes, it seems to work. The internal and 2 external monitors are
displaying and seem okay. If I unplug the dock with those 2 displays,
then plug in a different dock with a different monitor, I've seen
(unclear how often) the i915 report errors configuring its
"pipe", and the built-in display (eDP) is black. But it may recover
sometimes?

> > > > Have you tried adding dom0-iommu=map-inclusive to the Xen command
> > > > line?
> >
> > Still seeing faults with dom0-iommu=map-inclusive. At a different
> > address this time:
> > Oct 16 15:58:05.110768 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > Request device [0000:00:02.0] fault addr ea0c4f000, iommu reg = ffff
>
> That's also past the end of RAM.
>
> > 82c00021d000
> > Oct 16 15:58:05.110774 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > PTE Read access is not set
> > Oct 16 15:58:05.110777 VM hypervisor: (XEN) print_vtd_entries: iommu
> > #0 dev 0000:00:02.0 gmfn ea0c4f
> > Oct 16 15:58:05.110780 VM hypervisor: (XEN) root_entry[00] = 46e129001
> > Oct 16 15:58:05.110782 VM hypervisor: (XEN) context[10] = 2_46e128001
> > Oct 16 15:58:05.110785 VM hypervisor: (XEN) l4[000] = 46e11b003
> > Oct 16 15:58:05.110787 VM hypervisor: (XEN) l3[03a] = 0
> > Oct 16 15:58:05.110789 VM hypervisor: (XEN) l3[03a] not present
> >
> > The previous posting, the two faulting addresses repeated in pairs.
> > Here it is only this one address repeating.
> >
> > I plugged and unplugged and a different address was repeating with a
> > few other random addresses with 1 or 2 faults. Here is uniq -c output
> > of the address and count pulled from the logs:
> > 0x1ce9d6b000 2007
> > 0x31b50d5000 1
> > 0x1ce9d6b000 882
> > 0x707741000 1
> > 0x1ce9d6b000 1114
> > 0x20d2099000 1
> > 0x1ce9d6b000 3489
> > 0xeb98eb000 1
> > 0x1ce9d6b000 2430
> > 0xeb98eb000 1
> > 0x1ce9d6b000 1300
> > 0x22f20bb000 1
> > 0x1ce9d6b000 269
> > 0x22f20bb000 1
> > 0x1ce9d6b000 5091
> > 0x6c99ec9000 1
> > 0x1ce9d6b000 29
> > 0xeb98eb000 1
> > 0x1ce9d6b000 4599
> > 0x6c99ec9000 1
> > 0x1ce9d6b000 1989
>
> Hm, it's hard to tell what's going on. My limited experience with
> IOMMU faults on broken systems there's a small range that initially
> triggers those, and then the device goes wonky and starts accessing a
> whole load of invalid addresses.
>
> You could try adding those manually using the rmrr Xen command line
> option [0], maybe you can figure out which range(s) are missing?

They seem to change, so it's hard to know. Would there be harm in
adding one to cover the end of RAM ( 0x04,7c80,0000 ) to (
0xff,ffff,ffff )? Maybe that would just quiet the pointless faults
while leaving the IOMMU enabled?
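
To put numbers on that idea, a quick sketch. It assumes the "Host
address width 39" line in the DMAR output means the IOMMU handles 39
address bits, and it uses 4 KiB pages; the names are mine.

```python
# Sketch: size of the proposed catch-all range and how the observed
# faults relate to it.
PAGE_SIZE = 0x1000
END_OF_RAM = 0x47C800000        # end of the last usable region in the RAM map
PROPOSED_END = 0xFFFFFFFFFF     # upper bound floated above (inclusive)
HOST_ADDR_BITS = 39             # assumption: taken from "Host address width 39"

# Fault addresses reported so far in this thread.
FAULT_ADDRS = [
    0x39B5845000, 0x4238D0A000, 0xEA0C4F000, 0x1CE9D6B000, 0x31B50D5000,
    0x707741000, 0x20D2099000, 0xEB98EB000, 0x22F20BB000, 0x6C99EC9000,
]


def range_pages(start, end_inclusive):
    """Number of 4 KiB pages in [start, end_inclusive]."""
    return (end_inclusive + 1 - start) // PAGE_SIZE


def covered(addr, start=END_OF_RAM, end=PROPOSED_END):
    """True if addr lies inside the proposed range."""
    return start <= addr <= end


if __name__ == "__main__":
    pages = range_pages(END_OF_RAM, PROPOSED_END)
    print(f"{pages} pages, roughly {pages * PAGE_SIZE / 2**30:.0f} GiB")
    print("all faults covered:", all(covered(a) for a in FAULT_ADDRS))
    print("highest fault below 39-bit limit:",
          max(FAULT_ADDRS) < (1 << HOST_ADDR_BITS))
```

All the addresses seen so far sit between the end of RAM and the 39-bit
limit, while 0xff,ffff,ffff lies above it, so 0x7f,ffff,ffff may be the
more sensible upper bound if that assumption holds.
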

Thanks for taking a look.

Regards,
Jason
Re: i915 dma faults on Xen
On 21.10.2020 14:45, Jason Andryuk wrote:
> On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>> Hm, it's hard to tell what's going on. My limited experience with
>> IOMMU faults on broken systems there's a small range that initially
>> triggers those, and then the device goes wonky and starts accessing a
>> whole load of invalid addresses.
>>
>> You could try adding those manually using the rmrr Xen command line
>> option [0], maybe you can figure out which range(s) are missing?
>
> They seem to change, so it's hard to know. Would there be harm in
> adding one to cover the end of RAM ( 0x04,7c80,0000 ) to (
> 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults
> while leaving the IOMMU enabled?

While they may quieten the faults, I don't think those faults are
pointless. They indicate some problem with the software (less
likely the hardware, possibly the firmware) that you're using.
Also there's the question of what the overall behavior is going
to be when devices are permitted to access unpopulated address
ranges. I assume you did check already that no devices have their
BARs placed in that range?

Jan
Re: i915 dma faults on Xen
On Wed, Oct 21, 2020 at 8:53 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 21.10.2020 14:45, Jason Andryuk wrote:
> > On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >> Hm, it's hard to tell what's going on. My limited experience with
> >> IOMMU faults on broken systems there's a small range that initially
> >> triggers those, and then the device goes wonky and starts accessing a
> >> whole load of invalid addresses.
> >>
> >> You could try adding those manually using the rmrr Xen command line
> >> option [0], maybe you can figure out which range(s) are missing?
> >
> > They seem to change, so it's hard to know. Would there be harm in
> > adding one to cover the end of RAM ( 0x04,7c80,0000 ) to (
> > 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults
> > while leaving the IOMMU enabled?
>
> While they may quieten the faults, I don't think those faults are
> pointless. They indicate some problem with the software (less
> likely the hardware, possibly the firmware) that you're using.
> Also there's the question of what the overall behavior is going
> to be when devices are permitted to access unpopulated address
> ranges. I assume you did check already that no devices have their
> BARs placed in that range?

Isn't no-igfx already letting them try to read those unpopulated addresses?

Looks like all PCI BARs are below 4GB. The graphics ones are:
00:02.0 VGA compatible controller: Intel Corporation Device 3ea0 (rev 02) (prog-if 00 [VGA controller])
        Subsystem: Dell Device 08b9
        Flags: bus master, fast devsel, latency 0, IRQ 177
        Memory at cb000000 (64-bit, non-prefetchable) [size=16M]
        Memory at 80000000 (64-bit, prefetchable) [size=256M]

Yes, I agree the faults aren't pointless. I'm wondering if it's
something with the i915 driver or hardware having assumptions that
aren't met by Xen swiotlb.

Regards,
Jason
Re: i915 dma faults on Xen
On 21.10.2020 15:36, Jason Andryuk wrote:
> On Wed, Oct 21, 2020 at 8:53 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 21.10.2020 14:45, Jason Andryuk wrote:
>>> On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>> Hm, it's hard to tell what's going on. My limited experience with
>>>> IOMMU faults on broken systems there's a small range that initially
>>>> triggers those, and then the device goes wonky and starts accessing a
>>>> whole load of invalid addresses.
>>>>
>>>> You could try adding those manually using the rmrr Xen command line
>>>> option [0], maybe you can figure out which range(s) are missing?
>>>
>>> They seem to change, so it's hard to know. Would there be harm in
>>> adding one to cover the end of RAM ( 0x04,7c80,0000 ) to (
>>> 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults
>>> while leaving the IOMMU enabled?
>>
>> While they may quieten the faults, I don't think those faults are
>> pointless. They indicate some problem with the software (less
>> likely the hardware, possibly the firmware) that you're using.
>> Also there's the question of what the overall behavior is going
>> to be when devices are permitted to access unpopulated address
>> ranges. I assume you did check already that no devices have their
>> BARs placed in that range?
>
> Isn't no-igfx already letting them try to read those unpopulated addresses?

Yes, and it is for that reason that the documentation for the
option says "If specifying `no-igfx` fixes anything, please
report the problem." I infer from it in particular that one
better wouldn't use it for non-development purposes of whatever
kind.

Jan
Re: i915 dma faults on Xen
On Wed, Oct 21, 2020 at 9:59 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 21.10.2020 15:36, Jason Andryuk wrote:
> > On Wed, Oct 21, 2020 at 8:53 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 21.10.2020 14:45, Jason Andryuk wrote:
> >>> On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>> Hm, it's hard to tell what's going on. My limited experience with
> >>>> IOMMU faults on broken systems there's a small range that initially
> >>>> triggers those, and then the device goes wonky and starts accessing a
> >>>> whole load of invalid addresses.
> >>>>
> >>>> You could try adding those manually using the rmrr Xen command line
> >>>> option [0], maybe you can figure out which range(s) are missing?
> >>>
> >>> They seem to change, so it's hard to know. Would there be harm in
> >>> adding one to cover the end of RAM ( 0x04,7c80,0000 ) to (
> >>> 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults
> >>> while leaving the IOMMU enabled?
> >>
> >> While they may quieten the faults, I don't think those faults are
> >> pointless. They indicate some problem with the software (less
> >> likely the hardware, possibly the firmware) that you're using.
> >> Also there's the question of what the overall behavior is going
> >> to be when devices are permitted to access unpopulated address
> >> ranges. I assume you did check already that no devices have their
> >> BARs placed in that range?
> >
> > Isn't no-igfx already letting them try to read those unpopulated addresses?
>
> Yes, and it is for the reason that the documentation for the
> option says "If specifying `no-igfx` fixes anything, please
> report the problem." I infer from it in particular that one
> better wouldn't use it for non-development purposes of whatever
> kind.

I stopped seeing these DMA faults, but I didn't know what made them go
away. Then when working with an older 5.4.64 kernel, I saw them
again. Eric bisected down to the 5.4.y version of mainline Linux
commit:

commit 8195400f7ea95399f721ad21f4d663a62c65036f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Oct 19 11:15:23 2020 +0100

drm/i915: Force VT'd workarounds when running as a guest OS

If i915.ko is being used as a passthrough device, it does not know if
the host is using intel_iommu. Mixing the iommu and gfx causes a few
issues (such as scanout overfetch) which we need to workaround inside
the driver, so if we detect we are running under a hypervisor, also
assume the device access is being virtualised.

Regards,
Jason
Re: i915 dma faults on Xen [ In reply to ]
On Fri, Feb 19, 2021 at 12:30:23PM -0500, Jason Andryuk wrote:
> I stopped seeing these DMA faults, but I didn't know what made them go
> away. Then when working with an older 5.4.64 kernel, I saw them
> again. Eric bisected down to the 5.4.y version of mainline linux
> commit:
>
> commit 8195400f7ea95399f721ad21f4d663a62c65036f
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Mon Oct 19 11:15:23 2020 +0100
>
> drm/i915: Force VT'd workarounds when running as a guest OS
>
> If i915.ko is being used as a passthrough device, it does not know if
> the host is using intel_iommu. Mixing the iommu and gfx causes a few
> issues (such as scanout overfetch) which we need to workaround inside
> the driver, so if we detect we are running under a hypervisor, also
> assume the device access is being virtualised.

So the commit above fixes the DMA faults seen on Linux when using an
i915 gfx card?

Thanks for digging into this.

Roger.
Re: i915 dma faults on Xen [ In reply to ]
On Mon, Feb 22, 2021 at 5:18 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Fri, Feb 19, 2021 at 12:30:23PM -0500, Jason Andryuk wrote:
> > I stopped seeing these DMA faults, but I didn't know what made them go
> > away. Then when working with an older 5.4.64 kernel, I saw them
> > again. Eric bisected down to the 5.4.y version of mainline linux
> > commit:
> >
> > commit 8195400f7ea95399f721ad21f4d663a62c65036f
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Mon Oct 19 11:15:23 2020 +0100
> >
> > drm/i915: Force VT'd workarounds when running as a guest OS
> >
> > If i915.ko is being used as a passthrough device, it does not know if
> > the host is using intel_iommu. Mixing the iommu and gfx causes a few
> > issues (such as scanout overfetch) which we need to workaround inside
> > the driver, so if we detect we are running under a hypervisor, also
> > assume the device access is being virtualised.
>
> So the commit above fixes the DMA faults seen on Linux when using an
> i915 gfx card?

Yes, the DMA faults are not seen with this commit. i915 behaves
differently when it detects that VT-d is active, and this commit
enables that VT-d behavior whenever it is running under a hypervisor.

Regards,
Jason