Ryzen 6000 (Mobile)
Hi All,

I'm having issues getting QubesOS running on my Lenovo Yoga Slim 7 ProX (AMD Ryzen 6800HS)

Firstly in order to boot the device at all, I'm required to add `dom0_max_vcpus=1 dom0_vcpus_pin` to dom0's CMDLINE, this is similar to what I had to do previously - https://xen.markmail.org/search/?q=Ryzen#query:Ryzen+page:1+mid:f3hel4yj25qilabv+state:results with the Ryzen 4000 series, however without these options added dom0 never fully boots into Fedora.

The other interesting issue I'm having is upon booting any VM, just a normal simple VM without any PCI devices attached, it'll successfully start, about 1 second will pass then the entire device will hang and reset, it's virtually impossible to get any logs at all out of the device when it's in this state.

FYI: QubesOS uses Xen 4.14

Thanks all
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 19.07.2022 01:04, Dylanger Daly wrote:
> I'm having issues getting QubesOS running on my Lenovo Yoga Slim 7 ProX (AMD Ryzen 6800HS)
>
> Firstly in order to boot the device at all, I'm required to add `dom0_max_vcpus=1 dom0_vcpus_pin` to dom0's CMDLINE, this is similar to what I had to do previously - https://xen.markmail.org/search/?q=Ryzen#query:Ryzen+page:1+mid:f3hel4yj25qilabv+state:results with the Ryzen 4000 series, however without these options added dom0 never fully boots into Fedora.
>
> The other interesting issue I'm having is upon booting any VM, just a normal simple VM without any PCI devices attached, it'll successfully start, about 1 second will pass then the entire device will hang and reset, it's virtually impossible to get any logs at all out of the device when it's in this state.
>
> FYI: QubesOS uses Xen 4.14

I guess you understand that with no logs or anything else technical
there's very little chance anyone is going to be able to do anything
about this, without having access to an affected system?

Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
Yes. Do you know if it's possible to obtain logs some other way for a system that doesn't have a COM port? `console=vga` exists, but I can't seem to flip over to the VGA "console" after I trigger the start of a VM.

-------- Original Message --------
On Jul 19, 2022, 4:29 PM, Jan Beulich wrote:

> On 19.07.2022 01:04, Dylanger Daly wrote:
>> I'm having issues getting QubesOS running on my Lenovo Yoga Slim 7 ProX (AMD Ryzen 6800HS)
>>
>> Firstly in order to boot the device at all, I'm required to add `dom0_max_vcpus=1 dom0_vcpus_pin` to dom0's CMDLINE, this is similar to what I had to do previously - https://xen.markmail.org/search/?q=Ryzen#query:Ryzen+page:1+mid:f3hel4yj25qilabv+state:results with the Ryzen 4000 series, however without these options added dom0 never fully boots into Fedora.
>>
>> The other interesting issue I'm having is upon booting any VM, just a normal simple VM without any PCI devices attached, it'll successfully start, about 1 second will pass then the entire device will hang and reset, it's virtually impossible to get any logs at all out of the device when it's in this state.
>>
>> FYI: QubesOS uses Xen 4.14
>
> I guess you understand that with no logs or anything else technical
> there's very little chance anyone is going to be able to do anything
> about this, without having access to an affected system?
>
> Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
Dom0 (being a VM itself) boots perfectly; it's any other domU that triggers the issue. I'm hoping I can hook up gdb or something similar to Xen.

-------- Original Message --------
On Jul 19, 2022, 4:29 PM, Jan Beulich wrote:

> On 19.07.2022 01:04, Dylanger Daly wrote:
>> I'm having issues getting QubesOS running on my Lenovo Yoga Slim 7 ProX (AMD Ryzen 6800HS)
>>
>> Firstly in order to boot the device at all, I'm required to add `dom0_max_vcpus=1 dom0_vcpus_pin` to dom0's CMDLINE, this is similar to what I had to do previously - https://xen.markmail.org/search/?q=Ryzen#query:Ryzen+page:1+mid:f3hel4yj25qilabv+state:results with the Ryzen 4000 series, however without these options added dom0 never fully boots into Fedora.
>>
>> The other interesting issue I'm having is upon booting any VM, just a normal simple VM without any PCI devices attached, it'll successfully start, about 1 second will pass then the entire device will hang and reset, it's virtually impossible to get any logs at all out of the device when it's in this state.
>>
>> FYI: QubesOS uses Xen 4.14
>
> I guess you understand that with no logs or anything else technical
> there's very little chance anyone is going to be able to do anything
> about this, without having access to an affected system?
>
> Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 19.07.2022 08:47, Dylanger Daly wrote:
> Yes, do you know if it's possible to obtain logs some other way for a system that doesn't have a COM port? console=vga exists but I can't seem to flip over to the vga "console" after I trigger the start of a VM

I'd focus on the booting issues first. And I guess you can take a video
of that (assuming that a single screenshot likely isn't going to be
enough), possibly with "vga=keep" in place (albeit that introduces
extra slowness)?

There's also the option of using an EHCI debug port for the serial
console, but this requires (a) a special cable and (b) the system
designers not having inserted any hubs between the controller and
the connector.

Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 19/07/2022 00:04, Dylanger Daly wrote:
> Hi All,
>
> I'm having issues getting QubesOS running on my Lenovo Yoga Slim 7 ProX (AMD Ryzen 6800HS)
>
> Firstly in order to boot the device at all, I'm required to add `dom0_max_vcpus=1 dom0_vcpus_pin` to dom0's CMDLINE, this is similar to what I had to do previously - https://xen.markmail.org/search/?q=Ryzen#query:Ryzen+page:1+mid:f3hel4yj25qilabv+state:results with the Ryzen 4000 series, however without these options added dom0 never fully boots into Fedora.
>
> The other interesting issue I'm having is upon booting any VM, just a normal simple VM without any PCI devices attached, it'll successfully start, about 1 second will pass then the entire device will hang and reset, it's virtually impossible to get any logs at all out of the device when it's in this state.
>
> FYI: QubesOS uses Xen 4.14

Ok, these sound like two different things. One is dom0 failing to boot, and one is the hang/reset when starting the VMs.

Let's start with the dom0 problem first. The link you provide suggests a credit2 bug. Does dom0 boot if you pass `sched=credit` on the command line, in place of `dom0_max_vcpus=1 dom0_vcpus_pin`?

~Andrew
Re: Ryzen 6000 (Mobile) [ In reply to ]
> I'd focus on the booting issues first. And I guess you can take a video
> of that (assuming that a single screenshot likely isn't going to be
> enough), possibly with "vga=keep" in place (albeit that introduces
> extra slowness)?
>
> There's also the option of using an EHCI debug port for the serial
> console, but this requires (a) a special cable and (b) the system
> designers not having inserted any hubs between the controller and
> the connector.

Do you know if it's possible to have `console=vga vga=keep` and specify a secondary monitor? This would be very useful if I could have Xen log via a secondary monitor. In any case, I'll record a video today. I can't seem to get anything useful out of /var/log/xen/console/hypervisor.log; I assume this log file isn't written to on a 'live' basis.

I would assume AMD has disabled any sort of debugging/NIDnT/CCD, surprisingly or unsurprisingly it's easier to debug Chromebooks with their CCD USB-C cables.

> Ok, these sound like two different things. One is dom0 failing to boot, and one is the hang/reset when starting the VMs.
> Let's start with the dom0 problem first. The link you provide suggests a credit2 bug. Does dom0 boot if you pass `sched=credit` on the command line, in place of `dom0_max_vcpus=1 dom0_vcpus_pin`?

Yes, this is correct, I think the first problem is an AMD 6000 Series CPU issue, as others have reported this same first issue: https://github.com/QubesOS/qubes-issues/issues/7570 (having to add `dom0_max_vcpus=1 dom0_vcpus_pin`)

I believe the second issue could be platform specific, that being a UEFI Option relating to the scheduler or something else causing the device to hang, anecdotally others that have the same-ish CPU aren't having this issue, so it could be specific to the Lenovo Yoga Slim 7 Pro X (Gen 7).

Issue #1 seems to be common with newer AMD Ryzen Mobile CPUs
Issue #2 seems to be Lenovo specific, I've tried limiting other domU's to 1 vcpu to no avail, I haven't tried pinning a vcpu to a domU yet.

Unfortunately, I tried adding `sched=credit` in place of the pinning config and dom0 didn't get as far as asking for the LUKS password. Dom0 does start booting; it just doesn't make it past the early stage of kernel setup.

Cheers, Dylanger
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 20.07.2022 02:33, Dylanger Daly wrote:
>> I'd focus on the booting issues first. And I guess you can take a video
>> of that (assuming that a single screenshot likely isn't going to be
>> enough), possibly with "vga=keep" in place (albeit that introduces
>> extra slowness)?
>>
>> There's also the option of using an EHCI debug port for the serial
>> console, but this requires (a) a special cable and (b) the system
>> designers not having inserted any hubs between the controller and
>> the connector.
>
> Do you know if it's possible to have `console=vga vga=keep` and specify a secondary monitor? This would be very useful if I could have Xen log via a secondary monitor,

No, if anything it could be the other way around. Xen wants to use the
(sole) VGAish thing in the system; Dom0 kernel and Dom0 userspace (X)
may be happy to use any secondary (non-VGA) graphics card. On EFI it
might in principle be possible, but that would require (perhaps quite
a bit of) work in Xen.

Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 20.07.2022 10:11, Jan Beulich wrote:
> On 20.07.2022 02:33, Dylanger Daly wrote:
>>> I'd focus on the booting issues first. And I guess you can take a video
>>> of that (assuming that a single screenshot likely isn't going to be
>>> enough), possibly with "vga=keep" in place (albeit that introduces
>>> extra slowness)?
>>>
>>> There's also the option of using an EHCI debug port for the serial
>>> console, but this requires (a) a special cable and (b) the system
>>> designers not having inserted any hubs between the controller and
>>> the connector.
>>
>> Do you know if it's possible to have `console=vga vga=keep` and specify a secondary monitor? This would be very useful if I could have Xen log via a secondary monitor,
>
> No, if anything it could be the other way around. Xen wants to use the
> (sole) VGAish thing in the system; Dom0 kernel and Dom0 userspace (X)
> may be happy to use any secondary (non-VGA) graphics card. On EFI it
> might in principle be possible, but that would require (perhaps quite
> a bit of) work in Xen.

Oh - and then likely still only when there are two gfx cards in the
system. Otherwise we'd likely get into card-specific-driver territory.

Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
Hi All,

It would appear this issue isn't specific to the Lenovo Yoga Slim 7 ProX, someone else in the Qubes community is having the same issue (https://github.com/QubesOS/qubes-issues/issues/7620#issuecomment-1209114810)

Can anyone shed some light on what might be making a Xen 4.14 hypervisor crash after attempting to start a domU? Dom0 starts just fine. It 'feels' like a memory violation or DMA/IOMMU issue, because the VM does successfully start; however, 1 or 2 seconds after it boots, the mouse (in dom0) locks up for 2-3 seconds and the entire device resets.

I can't seem to get any logs at all, xen's console, dom0 dmesg and domU's dmesg all appear to be fine in the lead up to the crash. I assume no one has had a chance to use Xen on Ryzen 6000 (Rembrandt) yet due to the fact it's hard to get your hands on with the chip shortage etc.

I'm hoping it's something that can be fixed with a cmdline flag, it's very frustrating having this shiny new laptop sitting on my desk :P

Cheers all
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 15.08.2022 13:07, Dylanger Daly wrote:
> It would appear this issue isn't specific to the Lenovo Yoga Slim 7 ProX, someone else in the Qubes community is having the same issue (https://github.com/QubesOS/qubes-issues/issues/7620#issuecomment-1209114810)
>
> Can anyone shed some light on what possibly might be making a Xen 4.14 Hypervisor crash after attempting to start a domU?

Well, to shed light on what's going on we need logged output, which I
understand isn't easy with laptops. Simply trying to guess what's
going wrong isn't very likely to lead us anywhere.

> Dom0 starts just fine. It 'feels' like a memory violation or DMA/IOMMU issue, because the VM does successfully start; however, 1 or 2 seconds after it boots, the mouse (in dom0) locks up for 2-3 seconds and the entire device resets.

In the Qubes report there's talk of a 5s delay, which makes me assume
Xen crashes (or Dom0 reports itself crashed to Xen), which - unless
overridden - would result in a 5s delay (after logging state) until a
reboot would be attempted. This aspect could be verified by passing
"noreboot" on the Xen command line, in which case the device shouldn't
try to reboot itself at all. But all we'd learn from this is that
there's _some_ form of a crash (but not e.g. a triple fault), still
without knowing any details.

Jan

> I can't seem to get any logs at all, xen's console, dom0 dmesg and domU's dmesg all appear to be fine in the lead up to the crash. I assume no one has had a chance to use Xen on Ryzen 6000 (Rembrandt) yet due to the fact it's hard to get your hands on with the chip shortage etc.
>
> I'm hoping it's something that can be fixed with a cmdline flag, it's very frustrating having this shiny new laptop sitting on my desk :P
>
> Cheers all
Re: Ryzen 6000 (Mobile) [ In reply to ]
Hi Jan,

Indeed adding noreboot does result in the device just hanging there after starting a VM.

I wonder if it's possible to have Xen write its log to some memory address. If the device is doing a warm reset, the log messages should hopefully still be present afterwards.

Cheers
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 15.08.2022 13:50, Dylanger Daly wrote:
> Indeed adding noreboot does result in the device just hanging there after starting a VM.
>
> I wonder if it's possible to have Xen write its log to some memory address. If the device is doing a warm reset, the log messages should hopefully still be present afterwards.

Well, it's certainly possible, but would require code to be written and an
un-clobbered address range to be determined. (In earlier projects I found
it easiest to store data towards the end of the video card's memory, as on
most systems only the part actually used for displaying purposes would be
overwritten.)

Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 15/08/2022 12:07, Dylanger Daly wrote:
> Hi All,
>
> It would appear this issue isn't specific to the Lenovo Yoga Slim 7 ProX, someone else in the Qubes community is having the same issue (https://github.com/QubesOS/qubes-issues/issues/7620#issuecomment-1209114810)
>
> Can anyone shed some light on what might be making a Xen 4.14 hypervisor crash after attempting to start a domU? Dom0 starts just fine. It 'feels' like a memory violation or DMA/IOMMU issue, because the VM does successfully start; however, 1 or 2 seconds after it boots, the mouse (in dom0) locks up for 2-3 seconds and the entire device resets.
>
> I can't seem to get any logs at all, xen's console, dom0 dmesg and domU's dmesg all appear to be fine in the lead up to the crash. I assume no one has had a chance to use Xen on Ryzen 6000 (Rembrandt) yet due to the fact it's hard to get your hands on with the chip shortage etc.
>
> I'm hoping it's something that can be fixed with a cmdline flag, it's very frustrating having this shiny new laptop sitting on my desk :P

Append `,keep` to your existing `vga=` option for Xen, and add the
`noreboot` option too.

That should cause Xen to write its backtrace out over whatever was on
the screen.
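
As a rough illustration of the advice above, the following Python sketch rewrites a Xen command line string so that an existing `vga=` option gains `,keep` and `noreboot` is appended. The function name `add_debug_options` and the sample command line are hypothetical and not taken from this thread. On a GRUB-based install the real string would typically live in `GRUB_CMDLINE_XEN_DEFAULT` in /etc/default/grub, but that is an assumption about the setup.

```python
def add_debug_options(xen_cmdline: str) -> str:
    """Return xen_cmdline with ',keep' on any vga= option and 'noreboot' added.

    Hypothetical helper for illustration; it only edits a string and does not
    touch any bootloader configuration.
    """
    updated = []
    for opt in xen_cmdline.split():
        # Only extend a vga= option that is already present, as suggested.
        if opt.startswith("vga=") and "keep" not in opt[len("vga="):].split(","):
            opt += ",keep"
        updated.append(opt)
    if "noreboot" not in updated:
        updated.append("noreboot")
    return " ".join(updated)


# The sample input below is made up for the example.
print(add_debug_options("console=vga vga=current dom0_max_vcpus=1 dom0_vcpus_pin"))
```

The helper deliberately leaves a command line without any `vga=` option alone apart from adding `noreboot`, since the suggestion was to extend an existing option rather than pick a video mode.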

~Andrew
Re: Ryzen 6000 (Mobile) [ In reply to ]
> Append `,keep` to your existing `vga=` option for Xen, and add the
> `noreboot` option too.
>
> That should cause Xen to write its backtrace out over whatever was on
> the screen.

Hi Andrew,

Great news! I managed to get it to log the error with your cmdlines

Please see the attached images

The error "BUG: unable to handle page fault for address: ffffc90040639019"

It appears to be a memory violation error?

Thanks everyone!
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 15.08.2022 17:39, Dylanger Daly wrote:
> Great news! I managed to get it to log the error with your cmdlines
>
> Please see the attached images
>
> The error "BUG: unable to handle page fault for address: ffffc90040639019"
>
> It appears to be a memory violation error?

Yes, there's an attempt to access something which wasn't (successfully)
mapped. I expect there's a log message ahead of the actual crash telling
us what it was that was attempted to be mapped. A wild guess of mine
would be PCI MMCFG space. We may be able to read something out of the
system's ACPI tables, if you could extract them (at least DSDT, maybe
also SSDTs) into files. It would then also be useful to see the
hypervisor and kernel boot messages, at the very least to know where
certain things live in physical address space.

Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
Hi Jan,

Please see the attached dom0 dmesg log, verbose lspci output and a tar of all SSDT and DSDT decompiled ACPI tables.

Please let me know if I can send anything else.

Cheers
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 15.08.2022 18:54, Dylanger Daly wrote:
> Please see the attached dom0 dmesg log, verbose lspci output and a tar of all SSDT and DSDT decompiled ACPI tables.
>
> Please let me know if I can send anything else.

The lspci output reminds me of having forgotten to ask which device it
is that the domain about to be created is being handed. (I can't help
the impression that the issue isn't with the creation of _any_ DomU,
but only with such where some specific device is passed to it.)

Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
Hi Jan,

Indeed no devices are being passed into the domU, I'm simply trying to start a vanilla VM with no PCIe devices attached.

Could it be a misconfiguration with ACPI tables? I originally thought it could be AMD's SEV but I think it might just be that Xen is attempting to use a memory region that it shouldn't

Cheers
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 16.08.2022 10:34, Dylanger Daly wrote:
> Indeed no devices are being passed into the domU, I'm simply trying to start a vanilla VM with no PCIe devices attached.

Hmm, looking more closely it's the sound device which is being opened by
some ALSA process. I have no clue at all why this would happen while
starting a VM. If the firmware setup allows you to, you may want to try
turning off that device and see if then the VM starts successfully.

> Could it be a misconfiguration with ACPI tables? I originally thought it could be AMD's SEV but I think it might just be that Xen is attempting to use a memory region that it shouldn't

No, it's clearly ACPI which tries to evaluate / modify something. That's not
initiated by Xen at all. It's merely likely that under Xen something works
differently than under native. As said, since now you're able to see log
output on the screen, quite likely you would also see an earlier message
about some mapping operation having failed. Whether that would be a Xen or
kernel message (or both) is uncertain, as it would depend on the specific
operation that is being attempted.

Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
Hi Jan,

Interesting morning indeed!

Opening sound settings in dom0 and setting the HD Audio Controller to "Off" allowed the VM to boot!

Very strange indeed

Cheers
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 16.08.2022 11:19, Dylanger Daly wrote:
> Opening sound settings in dom0 and setting the HD Audio Controller to "Off" allowed the VM to boot!

"The HD Audio Controller" is somewhat ambiguous - according to lspci
apparently you've got three of them, one named as "multimedia controller"
(and hence likely having functions beyond just sound).

In any event we still need to figure out what ACPI is trying to do when
the controller is being "opened" and why that doesn't work under Xen.

Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 15.08.2022 18:54, Dylanger Daly wrote:
> Please see the attached dom0 dmesg log, verbose lspci output and a tar of all SSDT and DSDT decompiled ACPI tables.

The only way I can currently explain all aspects of the behavior that
I'm aware of is for Dom0's kernel somehow not identifying the page
that ACPI wants to map (via ioremap_cache()) as identity mapped. As
far as ACPI goes, this is what I read out of the tables:

In SSDT27.dsl we have

    Scope (\_SB.PCI0.GP17.AZAL)
    {
        Method (_PS0, 0, NotSerialized)  // _PS0: Power State 0
        {
            Acquire (\M27E, 0xFFFF)
            M460 ("FEA-ASL-\\_SB.PCI0.PBC.AZAL._PS0 CpmAzaliaPresentState = 1\n", Zero, Zero, Zero, Zero, Zero, Zero)
            M279 = One
            M276 ()
            Release (\M27E)
        }

M276() then invokes

    Local0 = M017 (Zero, 0x08, One, 0x19, Zero, 0x08)

with M017() located in SSDT16.dsl:

    Method (M017, 6, Serialized)
    {
        Local0 = M083 /* \M083 */
        Local1 = (M083 >> 0x14)
        Local2 = (Local1 & 0x0F00)
        Local2 += 0x0100
        If (((Local1 + Arg0) >= Local2))
        {
            Local3 = 0x7FFFFFFF
            Local3 |= 0x80000000
            Local4 = ((Local3 >> Arg4) & (Local3 >> (0x20 - Arg5)))
            Return (Local4)
        }

        Local0 += (Arg0 << 0x14)
        Local0 += (Arg1 << 0x0F)
        Local0 += (Arg2 << 0x0C)
        Return (M013 (Local0, Arg3, Arg4, Arg5))
    }

M013 carries out the actual memory access (32 bits at offset 0x19 from
Local0 that was determined here; oddly enough a mis-aligned access,
but that itself isn't a problem). The base address therefore is M083
offset by (0 << 0x14) + (8 << 0xf) + (1 << 0xc) = 0x41000 if I got
things right.
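
The address arithmetic described above can be modelled outside ACPI. The sketch below is a Python transcription of the M017 method as quoted. The value used for M083 (`0xF0000000`) is purely hypothetical, since the real contents of that field are not known at this point in the thread, and the function name `m017_target` is made up. The shifts by 0x14, 0x0F and 0x0C match the bus/device/function layout of PCI ECAM (MMCFG) addressing, which fits the earlier guess about MMCFG space but is not confirmed here.

```python
def m017_target(m083: int, arg0: int, arg1: int, arg2: int):
    """Model of the address M017 computes before it calls M013.

    Returns None where the AML takes the early-return (out-of-range) branch.
    """
    local1 = m083 >> 0x14
    local2 = (local1 & 0x0F00) + 0x0100
    if local1 + arg0 >= local2:
        return None
    return m083 + (arg0 << 0x14) + (arg1 << 0x0F) + (arg2 << 0x0C)


# Offset contributed by the arguments of the M276 call: (Zero, 0x08, One).
offset = (0 << 0x14) + (8 << 0x0F) + (1 << 0x0C)
print(hex(offset))  # 0x41000

# HYPOTHETICAL base value, chosen only to exercise the function.
ASSUMED_M083 = 0xF0000000
print(hex(m017_target(ASSUMED_M083, 0, 8, 1)))  # 0xf0041000

# Page offset of the faulting address reported earlier in the thread.
print(hex(0xFFFFC90040639019 & 0xFFF))  # 0x19
```

If M083 turns out to be page-aligned, the access at offset 0x19 lands at page offset 0x019, which is at least consistent with the faulting address ffffc90040639019 reported earlier.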

M083 in turn is a field in

    OperationRegion (CPNV, SystemMemory, 0x7AF67018, 0x000100F7)
    Field (CPNV, AnyAcc, Lock, Preserve)
    {
        M082, 32,
        M083, 32,
        M084, 32,
        ...

so the first few words of machine memory at 0x7af67018 would be of
interest (assuming of course that address doesn't change across
boots). 0x7af67018 itself is within the ACPI NVS range. Could you
perhaps obtain this from one of the /proc or /sys interfaces (perhaps
from a native kernel), or should I make a debugging patch for the
hypervisor? (Making one right away, with further logging added,
doesn't seem useful until it's clear whether you can actually also
observe output slightly before the actual crash, which has a risk of
being overwritten or scrolling off the screen.)
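
One possible way to fetch those words on a native (non-Xen) Linux boot is to map the physical address through /dev/mem. This is only a sketch under several assumptions: that /dev/mem exists and can be opened by root, that the kernel does not block the access (for example through strict /dev/mem restrictions or lockdown), and that the address is the same on that boot. The helper name `read_u32s` is made up for this example.

```python
import mmap
import os
import struct


def read_u32s(path: str, phys_addr: int, count: int) -> list:
    """Read `count` little-endian 32-bit words at byte offset `phys_addr` of `path`."""
    gran = mmap.ALLOCATIONGRANULARITY
    base = phys_addr - (phys_addr % gran)  # mmap offsets must be aligned
    delta = phys_addr - base               # position of the data inside the mapping
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, delta + 4 * count, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ, offset=base) as mem:
            return list(struct.unpack_from("<%dI" % count, mem, delta))
    finally:
        os.close(fd)
```

On a real system this might be called as `read_u32s("/dev/mem", 0x7AF67018, 3)`, which would return the raw values of M082, M083 and M084 in that order, given the field layout quoted above. Passing the path as a parameter also lets the function be tried against an ordinary file first.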

The situation of course isn't helped by the kernel's PFN <-> MFN
translation asymmetry in pte_pfn_to_mfn() nor pte_mfn_to_pfn()'s
anomaly (as already noted over two years ago in
https://lists.xen.org/archives/html/xen-devel/2020-05/msg00549.html),
albeit the exception error code suggests that the former is what is
getting in the way (and what would then also result in entirely
silent mapping failure). While I would like to patch the kernel at
least as much for the PFN/MFN to survive and hence appear in the page
table entry dump associated with the page fault, I'm afraid the
resulting entry could be recognized as a swap one. Such a patch could
hence only be used for debugging purposes when no swap space is in
use.

Jan
Re: Ryzen 6000 (Mobile) [ In reply to ]
Hi Jan,

I'm sorry I didn't get where in /sys/firmware you'd like to take a look at.

Sometimes when I power the laptop off I can see it's crashing somewhere in ACPI/weird address issue

Is there anyone else struggling with AMD Ryzen 6000 on Xen?
Re: Ryzen 6000 (Mobile) [ In reply to ]
On 24.08.2022 20:15, Dylanger Daly wrote:
> I'm sorry I didn't get where in /sys/firmware you'd like to take a look at.

It's been a long time since I last needed to access that, when it
was still /proc/mem and/or /proc/kmem. Their modern equivalents might
be /sys/devices/virtual/mem/{,k}mem ... But if that's not usable to
get at the needed data, perhaps we should go with logging it by way
of a patch to Xen. Please let me know if I need to hand you a patch
to do so.

> Sometimes when I power the laptop off I can see it's crashing somewhere in ACPI/weird address issue

In ACPI or in EFI? In the latter case suppressing the use of the EFI
runtime service for shutdown/reboot may help ("efi=no-rs" to disable
all runtime services use might be a good first try).

> Is there anyone else struggling with AMD Ryzen 6000 on Xen?

Don't know.

Jan
