
Hang booting Dom0: nvme timeout, completion polled
Hi,

We are working with Ubuntu 22.04 (Ubuntu kernel 5.15.0-67) and Xen 4.16
(latest commit from stable) and cannot boot into Dom0 on a new
server. The NVMe drives are in a software RAID 1.

With the same kernel without Xen, we can boot the server without problems.

The host is hanging on:
nvme nvme0: I/O 0 QID 0 timeout, completion polled
nvme nvme1: I/O 8 QID 0 timeout, completion polled
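For context: in the Linux nvme PCI driver, this warning comes from the request-timeout handler. After a command times out, the handler polls the queue and finds that the command had in fact completed. The number after "I/O" is the command tag, and QID 0 is the admin queue. So the controller posted the completion, but the interrupt apparently never reached the driver. That would be consistent with an interrupt-delivery problem (MSI/MSI-X, IOMMU interrupt remapping) rather than failing disks. The following is a minimal sketch for pulling these fields out of a boot log. `parse_nvme_timeout` and `NvmeTimeout` are hypothetical helper names, and the regex is written only against the two messages quoted above.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "nvme nvme0: I/O 0 QID 0 timeout, completion polled".
# search() is used so a leading dmesg timestamp like "[   35.1] " is tolerated.
# Assumption: other kernel versions may word this message differently.
TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) "
    r"timeout, completion polled"
)


class NvmeTimeout(NamedTuple):
    controller: str
    tag: int  # command tag of the timed-out request
    qid: int  # submission/completion queue ID

    @property
    def admin_queue(self) -> bool:
        # Queue ID 0 is the NVMe admin queue; I/O queues start at 1.
        return self.qid == 0


def parse_nvme_timeout(line: str) -> Optional[NvmeTimeout]:
    """Return the parsed fields, or None if the line is not this warning."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return NvmeTimeout(m.group("ctrl"), int(m.group("tag")), int(m.group("qid")))


log = """\
nvme nvme0: I/O 0 QID 0 timeout, completion polled
nvme nvme1: I/O 8 QID 0 timeout, completion polled
"""
for entry in filter(None, map(parse_nvme_timeout, log.splitlines())):
    print(entry.controller, entry.tag, "admin" if entry.admin_queue else "io")
```

Both quoted messages parse as admin-queue timeouts on the two controllers, which fits a hang during controller initialization.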

Some specs:

Motherboard:
    description: Motherboard
    product: Pro WS 565-ACE
    vendor: ASUSTeK COMPUTER INC.
    physical id: 0
    version: Rev X.0x
    serial: 221112256701775
    slot: Default string
  *-firmware
       description: BIOS
       vendor: American Megatrends Inc.
       physical id: 0
       version: 9901
       date: 10/13/2022
       size: 64KiB
       capacity: 16MiB
       capabilities: pci apm upgrade shadowing cdboot bootselect
           socketedrom edd int13floppy1200 int13floppy720 int13floppy2880
           int5printscreen int9keyboard int14serial int17printer acpi usb
           biosbootspecification uefi


PCI / NVMe:
*-pci:1
     description: PCI bridge
     product: Starship/Matisse GPP Bridge
     vendor: Advanced Micro Devices, Inc. [AMD]
     physical id: 3.1
     bus info: pci@0000:00:03.1
     version: 00
     width: 32 bits
     clock: 33MHz
     capabilities: pci pm pciexpress msi ht normal_decode
         bus_master cap_list
     configuration: driver=pcieport
     resources: irq:27 memory:fc900000-fc9fffff
   *-nvme
        description: NVMe device
        product: SAMSUNG MZQL23T8HCLS-00A07
        vendor: Samsung Electronics Co Ltd
        physical id: 0
        bus info: pci@0000:08:00.0
        logical name: /dev/nvme0
        version: GDC5802Q
        serial: S64HNJ0T643797
        width: 64 bits
        clock: 33MHz
        capabilities: nvme pm msi pciexpress msix nvm_express
            bus_master cap_list rom
        configuration: driver=nvme latency=0
            nqn=nqn.1994-11.com.samsung:nvme:PM9A3:2.5-inch:S64HNJ0T643797 state=live
        resources: irq:41 memory:fc910000-fc913fff
            memory:fc900000-fc90ffff

Can anyone give hints for solving this problem?

Best regards and thank you in advance,
Jan Kellermann
Re: Hang booting Dom0: nvme timeout, completion polled
On Wed, Mar 15, 2023 at 07:22:49PM +0000, Jan Kellermann wrote:
> The host is hanging on:
> nvme nvme0: I/O 0 QID 0 timeout, completion polled
> nvme nvme1: I/O 8 QID 0 timeout, completion polled
>
> Can anyone give hints for solving this problem?
>
> Best regards and thank you in advance,
> Jan Kellermann

I've experienced the same problem on an X399 motherboard with a Ryzen 5000
(Zen 3). Apparently, a malfunctioning IOMMU is currently a systematic
issue on AMD Ryzen with Xen (it really should be reported as an upstream
bug, if nobody has done so yet). I was able to solve it by disabling SMT
(hyperthreading) in the BIOS [1].
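To confirm that the BIOS change actually took effect, recent Linux kernels (including the 5.15 kernel used here) expose /sys/devices/system/cpu/smt/active. This file reads "1" when sibling hardware threads are online and "0" otherwise. Below is a minimal sketch, assuming the check is run on the plain kernel without Xen: under Xen, Dom0 only sees its virtual CPUs, so the file may not reflect the host's physical topology. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# sysfs file provided by recent Linux kernels; "1" means SMT siblings are
# online, "0" means they are not. It may be missing on kernels built
# without SMT hotplug support.
SMT_ACTIVE = Path("/sys/devices/system/cpu/smt/active")


def parse_smt_active(text: str) -> bool:
    """Interpret the contents of the smt/active file."""
    return text.strip() == "1"


def smt_active(path: Path = SMT_ACTIVE) -> Optional[bool]:
    """Return True if SMT is on, False if off, None if the file can't be read."""
    try:
        return parse_smt_active(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    state = smt_active()
    print({True: "SMT is on", False: "SMT is off", None: "SMT state unknown"}[state])
```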

If SMT is still needed, another solution suggested by QubesOS users [2]
(QubesOS uses the Xen hypervisor) is booting with the following Xen
hypervisor command-line parameters:

dom0_max_vcpus=1 dom0_vcpus_pin

This limits Dom0 to a single vCPU and pins it to a physical CPU [3],
although I haven't tested it personally.
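These parameters belong on Xen's own command line, not on Dom0's kernel command line. With GRUB, Xen's command line is set through the GRUB_CMDLINE_XEN_DEFAULT variable. Debian/Ubuntu Xen packaging typically keeps it in /etc/default/grub.d/xen.cfg, and running `sudo update-grub` and rebooting applies it. The following is a minimal sketch of merging the parameters into that assignment without duplicating existing ones. `add_xen_params` is a hypothetical helper, and the regex only recognizes a double-quoted assignment on a single line. If none is found, a new assignment is appended, which overrides any earlier one.

```python
import re

# Matches a line such as: GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=4G"
# Assumption: the value is double-quoted and fits on one line.
XEN_LINE_RE = re.compile(
    r'^GRUB_CMDLINE_XEN_DEFAULT="(?P<args>[^"]*)"[ \t]*$', re.MULTILINE
)


def add_xen_params(config: str, params: list[str]) -> str:
    """Return config with params appended to GRUB_CMDLINE_XEN_DEFAULT.

    Parameters already present are kept once; if the variable is not
    assigned, a new assignment is appended at the end of the text.
    """
    match = XEN_LINE_RE.search(config)
    existing = match.group("args").split() if match else []
    merged = existing + [p for p in params if p not in existing]
    new_line = f'GRUB_CMDLINE_XEN_DEFAULT="{" ".join(merged)}"'
    if match:
        # Replace only the matched line; the trailing newline is preserved.
        return config[: match.start()] + new_line + config[match.end():]
    sep = "" if not config or config.endswith("\n") else "\n"
    return config + sep + new_line + "\n"


# "dom0_mem=4G" is a hypothetical pre-existing setting, used for illustration.
before = 'GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=4G"\n'
print(add_xen_params(before, ["dom0_max_vcpus=1", "dom0_vcpus_pin"]), end="")
# GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=4G dom0_max_vcpus=1 dom0_vcpus_pin"
```

Running the helper twice gives the same result, so it is safe to re-apply after package upgrades that rewrite the file.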

Cheers,
Tom Li

[1] https://github.com/QubesOS/qubes-issues/issues/8136

[2] https://forum.qubes-os.org/t/gpd-win-max-2-unable-to-boot-installer/14466/14

[3] https://wiki.xenproject.org/wiki/Xen_FAQ_Dom0