Mailing List Archive

Ryzen 4000 (Mobile) Softlocks/Micro-stutters
Hi All,

I'm currently using Xen 4.14 (Qubes 4.1 OS) on a Ryzen 7 4750U PRO. By default I experience softlocks where, for example, the mouse will jolt from time to time; in this state it's not usable.

Adding `dom0_max_vcpus=1 dom0_vcpus_pin` to Xen's CMDLINE results in no more jolting; however, performance isn't what it should be on an 8 core CPU, and softlocks are still a problem within domUs, for example in any sort of UI animation.
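
As a rough illustration (not part of the original report), options like these can be merged into an existing Xen command line string without leaving stale duplicates behind. The helper name `add_xen_options` is hypothetical, and where the command line is actually stored (for example in the bootloader configuration) is deliberately left outside this sketch.

```python
def add_xen_options(cmdline: str, extra: list[str]) -> str:
    """Return cmdline with the options in extra appended.

    An existing option is dropped only when extra contains an option
    with the same name (the part before '='), so the new value wins.
    """
    def name(opt: str) -> str:
        return opt.split("=", 1)[0]

    extra_names = {name(opt) for opt in extra}
    # Keep every existing option that is not being overridden.
    kept = [opt for opt in cmdline.split() if name(opt) not in extra_names]
    return " ".join(kept + extra)


# Example input is made up; only the two dom0 options come from the thread.
print(add_xen_options("console=none smt=off",
                      ["dom0_max_vcpus=1", "dom0_vcpus_pin"]))
```

Options that legitimately appear more than once (such as a repeated `dom0_mem=`) are left untouched unless one of the new options has the same name.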

Reverting [this commit (8e2aa76dc1670e82eaa15683353853bc66bf54fc)](https://github.com/xen-project/xen/commit/8e2aa76dc1670e82eaa15683353853bc66bf54fc) results in even worse performance with or without the above changes to CMDLINE, and it's not usable at all.

Does anyone have any pointers?

Cheers
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
On 15.10.2020 02:38, Dylanger Daly wrote:
> I'm currently using Xen 4.14 (Qubes 4.1 OS) on a Ryzen 7 4750U PRO, by default I'll experience softlocks where the mouse for example will jolt from time to time, in this state it's not usable.

From what you say below I infer this is in Dom0?

> Adding `dom0_max_vcpus=1 dom0_vcpus_pin` to Xen's CMDLINE results in no more jolting however performance isn't what it should be on an 8 core CPU, softlocks are still a problem within domU's, any sort of UI animation for example.
>
> Reverting [this commit (8e2aa76dc1670e82eaa15683353853bc66bf54fc)](https://github.com/xen-project/xen/commit/8e2aa76dc1670e82eaa15683353853bc66bf54fc) results in even worse performance with or without the above changes to CMDLINE, and it's not usable at all.

You saying this surely has a reason, but making the connection would
help. I don't consider it surprising that a revert of an improvement
makes things worse. You having bothered to find a certain code change
also makes me suspect you've experimented with other scheduler
related settings - if so, please share all data you've got. (FAOD -
with the information provided I have no idea what to suggest, sorry.)

Jan
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
Hi Jan, thank you for responding.

Indeed this is for dom0. I only recently tried limiting a domU to 1 core and observed absolutely no softlocks; UI animations are smooth as butter with 1 core only.

Indeed I believe this is a CPU scheduling issue. I've tried both the older credit scheduler and RTDS; however, neither boots correctly.
This CPU has 8 cores (16 threads); however, Qubes disables SMT by default. `sched_credit2_max_cpus_runqueue` is 16 by default; I've tried setting it to 7 or 8, but either the system doesn't boot or nothing changes.

There are a number of credit2 tweak-ables so I'm hoping to play around and drop the `dom0_max_vcpus=1`, I suspect `sched_credit2_max_cpus_runqueue` is the main thing to play with.

I did manage to get it booting with sched_credit2_max_cpus_runqueue=7 but it ended up locking up shortly after X launched on dom0

------- Original Message -------

On Thursday, October 15th, 2020 at 7:18 PM, Jan Beulich <jbeulich@suse.com> wrote:

> On 15.10.2020 02:38, Dylanger Daly wrote:
>
> > I'm currently using Xen 4.14 (Qubes 4.1 OS) on a Ryzen 7 4750U PRO, by default I'll experience softlocks where the mouse for example will jolt from time to time, in this state it's not usable.
>
> From what you say below I imply this is in Dom0?
>
> > Adding `dom0_max_vcpus=1 dom0_vcpus_pin` to Xen's CMDLINE results in no more jolting however performance isn't what it should be on an 8 core CPU, softlocks are still a problem within domU's, any sort of UI animation for example.
> >
> > Reverting this commit (8e2aa76dc1670e82eaa15683353853bc66bf54fc) results in even worse performance with or without the above changes to CMDLINE, and it's not usable at all.
>
> You saying this surely has a reason, but making the connection would
> help. I don't consider it surprising that a revert of an improvement
> makes things worse. You having bothered to find a certain code change
> also makes me suspect you've experimented with other scheduler
> related settings - if so, please share all data you've got. (FAOD -
> with the information provided I have no idea what to suggest, sorry.)
>
> Jan
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
On 15.10.2020 11:14, Dylanger Daly wrote:
> Indeed this is for dom0, I only recently tried limiting a domU to 1 core and observed absolutely no softlocks, UI animations are smooth as butter with 1 core only.
>
> Indeed I believe this is a CPU Scheduling issue, I've tried both the older credit and RTDS however both don't boot correctly.

This wants reporting (with sufficient data, i.e. at least a serial log)
as separate issues.

> The number of cores on this CPU is 8, 16 threads however Qubes by default disables SMT, sched_credit2_max_cpus_runqueue is 16 by default, I've tried testing with setting this to 7 or 8 however it'll either not boot, or nothing will change.

Failure to boot, unless with insane command line options, should always
be reported so it can be fixed.

I'm afraid neither part of the reply gets you/us any closer to an
understanding of your softlockup issues. As a random thought, have you
tried disabling use of (deep) C-states? This is known to have helped
to work around errata on other hardware, so may be worth a try.

Jan
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
On 15/10/2020 01:38, Dylanger Daly wrote:
> Hi All,
>
> I'm currently using Xen 4.14 (Qubes 4.1 OS) on a Ryzen 7 4750U PRO, by
> default I'll experience softlocks where the mouse for example will
> jolt from time to time, in this state it's not usable.
>
> Adding `dom0_max_vcpus=1 dom0_vcpus_pin` to Xen's CMDLINE results in
> no more jolting however performance isn't what it should be on an 8
> core CPU, softlocks are still a problem within domU's, any sort of UI
> animation for example.
>
> Reverting this commit (8e2aa76dc1670e82eaa15683353853bc66bf54fc)
> <https://github.com/xen-project/xen/commit/8e2aa76dc1670e82eaa15683353853bc66bf54fc> results
> in even worse performance with or without the above changes to
> CMDLINE, and it's not usable at all.
>
> Does anyone have any pointers?

Does booting with sched=credit alter the symptoms?

~Andrew
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
> Does booting with sched=credit alter the symptoms?

Indeed I've tried this; the result is an observable delay and unusable performance. credit2 seems to be the only usable scheduler. I'm certain it has something to do with SMT being disabled, resulting in 8 cores instead of the expected 16 threads.

> As a random thought, have you tried disabling use of (deep) C-states?

Yeah I've tried both `processor.max_cstate=1|5`
I've also tried adding `0xC0010292` and `0xC0010296` MSRs into arch/x86/msr.c (guest_{rdmsr,wrmsr})

The above allowed me to use https://github.com/r4m0n/ZenStates-Linux/blob/master/zenstates.py

After removing `dom0_max_vcpus=1 dom0_vcpus_pin` from Xen's CMDLINE, and disabling C6 I observed no change.

> This wants reporting (with sufficient data, i.e. at least a serial log)

Hm, I'm not sure there's UART on this Laptop, can I save the boot log somewhere?
------- Original Message -------
On Thursday, October 15th, 2020 at 10:57 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

> On 15/10/2020 01:38, Dylanger Daly wrote:
>
>> Hi All,
>>
>> I'm currently using Xen 4.14 (Qubes 4.1 OS) on a Ryzen 7 4750U PRO, by default I'll experience softlocks where the mouse for example will jolt from time to time, in this state it's not usable.
>>
>> Adding `dom0_max_vcpus=1 dom0_vcpus_pin` to Xen's CMDLINE results in no more jolting however performance isn't what it should be on an 8 core CPU, softlocks are still a problem within domU's, any sort of UI animation for example.
>>
>> Reverting [this commit (8e2aa76dc1670e82eaa15683353853bc66bf54fc)](https://github.com/xen-project/xen/commit/8e2aa76dc1670e82eaa15683353853bc66bf54fc) results in even worse performance with or without the above changes to CMDLINE, and it's not usable at all.
>>
>> Does anyone have any pointers?
>
> Does booting with sched=credit alter the symptoms?
>
> ~Andrew
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
On 20.10.2020 01:37, Dylanger Daly wrote:
>> This wants reporting (with sufficient data, i.e. at least a serial log)
>
> Hm, I'm not sure there's UART on this Laptop, can I save the boot log somewhere?

If the system remains sufficiently usable "xl dmesg" will give you
the log. But you won't be able to get away without a serial-like
console (USB2 debug port may be an alternative, if you have a
suitable cable and if the USB topology in the laptop doesn't
prevent it functioning). Yes, laptops are always problematic in
this regard.
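
For the case where the system stays usable, a minimal sketch of saving that log to a file could look like the following. The helper name `save_command_output` and the file name `xen-boot.log` are made up for illustration; the sketch assumes it runs in dom0 with enough privileges for `xl dmesg` to succeed.

```python
import subprocess
from pathlib import Path


def save_command_output(cmd: list[str], dest: Path) -> int:
    """Run cmd, write its stdout to dest, and return the number of lines saved.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    dest.write_text(result.stdout)
    return len(result.stdout.splitlines())


# Intended use in dom0 (not executed here):
#   save_command_output(["xl", "dmesg"], Path("xen-boot.log"))
```

This only helps when dom0 comes up far enough to run the command; it is no substitute for a serial-like console when the boot itself fails.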

Jan
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
Hey All,

I think I've narrowed down what could be the issue.

I think disabling SMT on any AMD Zen 2 CPU is breaking Xen's Credit2 scheduler. I can only test on AMD Ryzen 4000 based Mobile CPUs, but I think this is what is causing the softlocks and the need to pin dom0 to 1 vcpu.

I'm currently trying to re-enable SMT on Qubes 4.1 (Xen 4.14) and I'll report my findings here.
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
On 23.12.2020 00:04, Dylanger Daly wrote:
> I think I've narrowed down what could be the issue.
>
> I think disabling SMT on any AMD Zen 2 CPU is breaking Xen's Credit2 scheduler, I can only test on AMD Ryzen 4000 based Mobile CPUs, but I think this is what is causing issues with softlocks/having to pin dom0 1 vcpu.

Dario,

does this maybe ring any bells?

Jan

> I'm currently trying to re-enable SMT on Qubes 4.1 (Xen 4.14) and I'll report my findings here.
>
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
Hi,

Interesting situation (so to speak... :-O)

On Thu, 2020-10-15 at 11:20 +0200, Jan Beulich wrote:
> On 15.10.2020 11:14, Dylanger Daly wrote:
> > Indeed this is for dom0, I only recently tried limiting a domU to 1
> > core and observed absolutely no softlocks, UI animations are smooth
> > as butter with 1 core only.
> >
> > Indeed I believe this is a CPU Scheduling issue, I've tried both
> > the older credit and RTDS however both don't boot correctly.
>
> This wants reporting (with sufficient data, i.e. at least a serial
> log) as separate issues.
>
Indeed.

So, just to be sure I am understanding the symptoms correctly: here you
say that Credit (and RTDS) "don't boot correctly". In another mail, I
think you said that Credit boots, but is unusable due to lag and
lockups... Which is which?

Also, since this looks like it is SMT related, is Credit bootable
and/or usable with SMT off? And with SMT on?

> > The number of cores on this CPU is 8, 16 threads however Qubes by
> > default disables SMT, sched_credit2_max_cpus_runqueue is 16 by
> > default, I've tried testing with setting this to 7 or 8 however
> > it'll either not boot, or nothing will change.
>
> Failure to boot, unless with insane command line options, should
> always be reported so it can be fixed.
>
Yeah and facts are:

1) no value of the sched_credit2_max_cpus_runqueue option should 
prevent the system from booting. If it does, it's definitely a bug.

It'd be "wonderful" to see _how_ it does that, by seeing the
stacktrace (preferably of a debug build), if there is one. Or, if
the system locks, e.g., knowing whether it is responsive at least
to debug keys (and, if yes, what the output of the 'r' debug key
looks like)

2) A suboptimal value of sched_credit2_max_cpus_runqueue may indeed be
associated with performance issues, including lags and lockups.
*BUT* that usually happens on large boxes, with like 128 or 256
CPUs. In your case, having either 8 or 16 CPUs in the same Credit2
runqueue (or in two runqueues if you leave SMT on and use 8 as the
value of that param) should work just fine. And, for sure, it
shouldn't hang.

So, again, I'm not doubting it's happening, but I can't immediately
think of a root cause, especially without seeing more info.

In absence of that, I only have more questions. :-/ E.g., how are you
enabling and disabling SMT, via the command line parameter, or via
BIOS?

Also, can you perhaps try either upstream 4.14 Xen (from sources, I
mean) or the packages for a distro different than QubesOS (perhaps
installing such distro, temporarily, in an external HD or whatever).

Note that I by no means am trying to blame Qubes or anything else in
particular... I'm just trying to understand.

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
On Wed, 2020-12-23 at 10:59 +0100, Jan Beulich wrote:
> On 23.12.2020 00:04, Dylanger Daly wrote:
> > I think I've narrowed down what could be the issue.
> >
> > I think disabling SMT on any AMD Zen 2 CPU is breaking Xen's
> > Credit2 scheduler, I can only test on AMD Ryzen 4000 based Mobile
> > CPUs, but I think this is what is causing issues with
> > softlocks/having to pin dom0 1 vcpu.
>
> Dario,
>
Hi, and thanks for bringing me in. :-)

> does this maybe ring any bells?
>
Not really. :-(

Unfortunately, I don't think I have access to a Ryzen CPU (but I'll try
to look better).

I do have access to an EPYC2 (Rome) CPU, i.e., an EPYC 7742 with 256
CPUs (128 cores x 2 threads). I have just tried booting Xen 4.14 there
and:

1) with all the 256 CPUs enabled (i.e., smt=1), Credit2 scheduler and
the default value (16) for sched_credit2_max_cpus_runqueue, the
system seems to work fine.

There are 16 runqueues with 16 CPUs inside each of them, and they 
seem to be constructed reasonably (siblings are in the same 
runqueue, etc).

I don't have a GUI on that box for checking whether mouse movements
are fluid, but I've run some basic tests from the terminal and
everything looks normal.

Dom0 has 256 vCPUs and no pinning.

2) with only 128 CPUs (i.e., booting Xen with smt=0), Credit2 and  
still 16 in sched_credit2_max_cpus_runqueue, it also seems to work 
fine.

There are again 16 runqueues, each one with 8 CPUs and the system 
seems responsive enough.

Dom0 has 128 vCPUs and no pinning.

I can try Credit as well, later, but if this is something CPU arch/gen
related, it seems to be a Ryzen rather than a Zen 2 thing...

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
Hi Dario,

Thank you for your reply

This issue is made much worse with: https://github.com/QubesOS/qubes-vmm-xen/commit/c28754bdb458281a22e9a9779213c941531b6dff#diff-d98b01176d360f55f58c25d2dfbfadc115718806181ef40d1838d2efa6b2bea1

Reverting `xen: credit2: limit the max number of CPUs in a runqueue` results in stuttering even with dom0 pinned to 1 vcpu. Currently AMD Ryzen 4000 users are maintaining a separate build without this change; I think this commit has something to do with the issues we're experiencing.

Without pinning dom0 1 vcpu this is what the lockups look like: https://imgur.com/a/q7MQRez. Another weird artifact is the mouse (trackpad) will quickly jerk when being moved.

The other interesting thing is that appVMs can only use 2 vcpus; allocating more results in that appVM exhibiting stuttering/microlockups.

So to get the device working, the following is required:
1. Do not revert `xen: credit2: limit the max number of CPUs in a runqueue`
2. Pin dom0 to 1 vcpu
3. Limit each appVM to 2 vcpus
> So, just to be sure I am understanding the symptoms correctly: here you
> say that Credit (and RTDS) "don't boot correctly". In another mail, I
> think you said that Credit boots, but is unusable due to lag and
> lockups... Which is which?

credit2 is the only scheduler that I can get working, other schedulers don't boot at all

> Also, since this looks like it is SMT related, is Credit bootable
> and/or usable with SMT off? And with SMT on?

Qubes by default disables SMT. Just after I sent my email yesterday I was actually able to boot with SMT enabled as long as dom0 was allocated 1 vcpu; without that limit the device won't boot at all.

> It'd be "wonderful" to see _how_ it does that, by seeing the
> stacktrace (preferably of a debug build), if there is one. Or, if
> the system locks, e.g., knowing whether it is responsive at least
> to debug keys (and, if yes, what the output of the 'r' debug key
> looks like)

Because I'm compiling Xen myself, I should absolutely be able to enable debug/verbose logging, I'll try to capture more logging today

Here's what I can dig up currently

```
[chairman@dom0 ~]$ xl info
host : dom0
release : 5.8.18-200.fc32.x86_64
version : #1 SMP Mon Nov 2 19:49:11 UTC 2020
machine : x86_64
nr_cpus : 8
max_cpu_id : 15
nr_nodes : 1
cores_per_socket : 8
threads_per_core : 1
cpu_mhz : 1696.837
hw_caps : 178bf3ff:76d8320b:2e500800:244037ff:0000000f:219c91a9:00400004:00000500
virt_caps : pv hvm hvm_directio pv_directio hap
total_memory : 30439
free_memory : 6644
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 14
xen_extra : .0
xen_version : 4.14.0
xen_caps : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit2
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset :
xen_commandline : placeholder console=none dom0_mem=min:1024M dom0_mem=max:4096M dom0_max_vcpus=1 dom0_vcpus_pin ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 ept=exec-sp no-real-mode edd=off
cc_compiler : gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)
cc_compile_by : user
cc_compile_domain : [unknown]
cc_compile_date : Wed Dec 16 00:00:00 UTC 2020
build_id : 9eb1d06c8bbc4686c4a8a6c9ee46d91e106df81d
xend_config_format : 4

[chairman@dom0 ~]$ xl vcpu-list
Name              ID VCPU CPU  State Time(s)  Affinity (Hard / Soft)
Domain-0           0    0   0  r--     367.3  0 / 0,2,4,6,8,10,12,14
sys-net            1    0  12  -b-      59.9  all / all
sys-net            1    1  10  -b-      61.0  all / all
sys-net-dm         2    0  10  -b-      19.4  all / all
sys-usb            3    0   8  -b-      31.0  all / all
sys-usb            3    1  14  -b-      36.9  all / all
sys-usb            3    2  10  -b-      34.5  all / all
sys-usb            3    3  12  -b-      32.8  all / all
sys-usb-dm         4    0  12  -b-      20.8  all / all
sys-firewall       5    0  10  -b-       1.4  all / all
sys-firewall       5    1  14  -b-      11.0  all / all
xxxxxxxxxxxxxx     6    0  12  -b-       9.2  all / all
xxxxxxxxxxxxxx     6    1  12  -b-       8.0  all / all
xxxxxxxxxxxxxxx    7    0   8  -b-       8.9  all / all
xxxxxxxxxxxxxxx    7    1   8  -b-       6.1  all / all
xxxxxxxxxxx        8    0  14  -b-     399.2  all / all
xxxxxxxxxxx        8    1  12  -b-     454.6  all / all
xxxxxxxxxxxxxxxx   9    0  14  -b-      33.7  all / all
xxxxxxxxxxxxxxxx   9    1   8  -b-      54.1  all / all
xxxxxxxxxxxxxxx   10    0  10  -b-       4.2  all / all
xxxxxxxxxxxxxxx   10    1   8  -b-       7.3  all / all
xxxxxxxxxxxx      11    0   2  -b-      39.7  all / all
xxxxxxxxxxxx      11    1  12  -b-     121.9  all / all
email             12    0   8  -b-      29.2  all / all
email             12    1  14  -b-      84.2  all / all


[2020-12-24 08:34:49] Logfile Opened
[2020-12-24 08:34:49] (XEN) Built-in command line: ept=exec-sp
[2020-12-24 08:34:49] (XEN) parameter no-real-mode unknown!
[2020-12-24 08:34:49] (XEN) parameter edd unknown!
[2020-12-24 08:34:49] Xen 4.14.0
[2020-12-24 08:34:49] (XEN) Xen version 4.14.0 (user@[unknown]) (gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)) debug=n Wed Dec 16 00:00:00 UTC 2020
[2020-12-24 08:34:49] (XEN) Latest ChangeSet:
[2020-12-24 08:34:49] (XEN) Bootloader: GRUB 2.04
[2020-12-24 08:34:49] (XEN) Command line: placeholder console=none dom0_mem=min:1024M dom0_mem=max:4096M dom0_max_vcpus=1 dom0_vcpus_pin ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 ept=exec-sp no-real-mode edd=off
[2020-12-24 08:34:49] (XEN) Xen image load base address: 0xb8e00000
[2020-12-24 08:34:49] (XEN) Video information:
[2020-12-24 08:34:49] (XEN) VGA is graphics mode 1920x1080, 32 bpp
[2020-12-24 08:34:49] (XEN) Disc information:
[2020-12-24 08:34:49] (XEN) Found 0 MBR signatures
[2020-12-24 08:34:49] (XEN) Found 2 EDD information structures
[2020-12-24 08:34:49] (XEN) EFI RAM map:
[2020-12-24 08:34:49] (XEN) [0000000000000000, 000000000009efff] (usable)
[2020-12-24 08:34:49] (XEN) [000000000009f000, 000000000009ffff] (reserved)
[2020-12-24 08:34:49] (XEN) [00000000000e0000, 00000000000fffff] (reserved)
[2020-12-24 08:34:49] (XEN) [0000000000100000, 0000000009bfffff] (usable)
[2020-12-24 08:34:49] (XEN) [0000000009c00000, 0000000009d00fff] (reserved)
[2020-12-24 08:34:49] (XEN) [0000000009d01000, 0000000009efffff] (usable)
[2020-12-24 08:34:49] (XEN) [0000000009f00000, 0000000009f0ffff] (ACPI NVS)
[2020-12-24 08:34:49] (XEN) [0000000009f10000, 00000000bd9ddfff] (usable)
[2020-12-24 08:34:49] (XEN) [00000000bd9de000, 00000000ca37dfff] (reserved)
[2020-12-24 08:34:49] (XEN) [00000000ca37e000, 00000000cc37dfff] (ACPI NVS)
[2020-12-24 08:34:49] (XEN) [00000000cc37e000, 00000000cc3fdfff] (ACPI data)
[2020-12-24 08:34:49] (XEN) [00000000cc3fe000, 00000000cdffffff] (usable)
[2020-12-24 08:34:49] (XEN) [00000000ce000000, 00000000cfffffff] (reserved)
[2020-12-24 08:34:49] (XEN) [00000000f8000000, 00000000fbffffff] (reserved)
[2020-12-24 08:34:49] (XEN) [00000000fde00000, 00000000fdefffff] (reserved)
[2020-12-24 08:34:49] (XEN) [00000000fed80000, 00000000fed80fff] (reserved)
[2020-12-24 08:34:49] (XEN) [0000000100000000, 00000007af33ffff] (usable)
[2020-12-24 08:34:49] (XEN) [00000007af340000, 000000082fffffff] (reserved)
[2020-12-24 08:34:49] (XEN) ACPI: RSDP CC3FD014, 0024 (r2 LENOVO)
[2020-12-24 08:34:49] (XEN) ACPI: XSDT CC3FB188, 0104 (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: FACP BE499000, 0114 (r6 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: DSDT BE484000, F08E (r1 LENOVO TP-R1C 1290 INTL 20180313)
[2020-12-24 08:34:49] (XEN) ACPI: FACS CC218000, 0040
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BF751000, 00A2 (r1 LENOVO PID0Ssdt 1 INTL 20180313)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BF750000, 0CCC (r1 LENOVO UsbCTabl 1 INTL 20180313)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BF743000, 7216 (r2 LENOVO TP-R1C 2 MSFT 4000000)
[2020-12-24 08:34:49] (XEN) ACPI: IVRS BF742000, 01A4 (r2 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BF704000, 0266 (r1 LENOVO STD3 1 INTL 20180313)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BF6F0000, 0632 (r2 LENOVO Tpm2Tabl 1000 INTL 20180313)
[2020-12-24 08:34:49] (XEN) ACPI: TPM2 BF6EF000, 0034 (r3 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BF6EE000, 0924 (r1 LENOVO WmiTable 1 INTL 20180313)
[2020-12-24 08:34:49] (XEN) ACPI: MSDM BF6B5000, 0055 (r3 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: BATB BF6A0000, 004A (r2 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: HPET BE498000, 0038 (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: APIC BE497000, 0138 (r2 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: MCFG BE496000, 003C (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: SBST BE495000, 0030 (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: WSMT BE494000, 0028 (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: VFCT BE476000, D484 (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BE472000, 39F4 (r1 LENOVO TP-R1C 1 AMD 1)
[2020-12-24 08:34:49] (XEN) ACPI: CRAT BE471000, 0F00 (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: CDIT BE470000, 0029 (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: FPDT BF6C7000, 0034 (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BE46E000, 13CF (r1 LENOVO TP-R1C 1 INTL 20180313)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BE46C000, 1576 (r1 LENOVO TP-R1C 1 INTL 20180313)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BE468000, 353C (r1 LENOVO TP-R1C 1 INTL 20180313)
[2020-12-24 08:34:49] (XEN) ACPI: BGRT BE467000, 0038 (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: UEFI CC217000, 013E (r1 LENOVO TP-R1C 1290 PTEC 2)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BF74F000, 0090 (r1 LENOVO TP-R1C 1 INTL 20180313)
[2020-12-24 08:34:49] (XEN) ACPI: SSDT BF74E000, 09AD (r1 LENOVO TP-R1C 1 INTL 20180313)
[2020-12-24 08:34:49] (XEN) System RAM: 30439MB (31170232kB)
[2020-12-24 08:34:49] (XEN) Domain heap initialised
[2020-12-24 08:34:49] (XEN) ACPI: 32/64X FACS address mismatch in FADT - cc218000/0000000000000000, using 32
[2020-12-24 08:34:49] (XEN) IOAPIC[0]: apic_id 32, version 33, address 0xfec00000, GSI 0-23
[2020-12-24 08:34:49] (XEN) IOAPIC[1]: apic_id 33, version 33, address 0xfec01000, GSI 24-55
[2020-12-24 08:34:49] (XEN) Enabling APIC mode: Phys. Using 2 I/O APICs
[2020-12-24 08:34:49] (XEN) CPU0: 1400..1700 MHz
[2020-12-24 08:34:49] (XEN) xstate: size: 0x380 and states: 0x207
[2020-12-24 08:34:49] (XEN) Speculative mitigation facilities:
[2020-12-24 08:34:49] (XEN) Hardware features: IBPB
[2020-12-24 08:34:49] (XEN) Compiled-in support: INDIRECT_THUNK
[2020-12-24 08:34:49] (XEN) Xen settings: BTI-Thunk LFENCE, SPEC_CTRL: No, Other: IBPB BRANCH_HARDEN
[2020-12-24 08:34:49] (XEN) Support for HVM VMs: RSB
[2020-12-24 08:34:49] (XEN) Support for PV VMs: RSB
[2020-12-24 08:34:49] (XEN) XPTI (64-bit PV only): Dom0 disabled, DomU disabled (without PCID)
[2020-12-24 08:34:49] (XEN) PV L1TF shadowing: Dom0 disabled, DomU disabled
[2020-12-24 08:34:49] (XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
[2020-12-24 08:34:49] (XEN) Initializing Credit2 scheduler
[2020-12-24 08:34:49] (XEN) Platform timer is 14.318MHz HPET
[2020-12-24 08:34:49] (XEN) Detected 1696.837 MHz processor.
[2020-12-24 08:34:49] (XEN) Unknown cachability for MFNs 0xe0-0xff
[2020-12-24 08:34:49] (XEN) AMD-Vi: IOMMU Extended Features:
[2020-12-24 08:34:49] (XEN) - Peripheral Page Service Request
[2020-12-24 08:34:49] (XEN) - x2APIC
[2020-12-24 08:34:49] (XEN) - NX bit
[2020-12-24 08:34:49] (XEN) - Invalidate All Command
[2020-12-24 08:34:49] (XEN) - Guest APIC
[2020-12-24 08:34:49] (XEN) - Performance Counters
[2020-12-24 08:34:49] (XEN) - Host Address Translation Size: 0x2
[2020-12-24 08:34:49] (XEN) - Guest Address Translation Size: 0
[2020-12-24 08:34:49] (XEN) - Guest CR3 Root Table Level: 0x1
[2020-12-24 08:34:49] (XEN) - Maximum PASID: 0xf
[2020-12-24 08:34:49] (XEN) - SMI Filter Register: 0x1
[2020-12-24 08:34:49] (XEN) - SMI Filter Register Count: 0x1
[2020-12-24 08:34:49] (XEN) - Guest Virtual APIC Modes: 0x1
[2020-12-24 08:34:49] (XEN) - Dual PPR Log: 0x2
[2020-12-24 08:34:49] (XEN) - Dual Event Log: 0x2
[2020-12-24 08:34:49] (XEN) - User / Supervisor Page Protection
[2020-12-24 08:34:49] (XEN) - Device Table Segmentation: 0x3
[2020-12-24 08:34:49] (XEN) - PPR Log Overflow Early Warning
[2020-12-24 08:34:49] (XEN) - PPR Automatic Response
[2020-12-24 08:34:49] (XEN) - Memory Access Routing and Control: 0
[2020-12-24 08:34:49] (XEN) - Block StopMark Message
[2020-12-24 08:34:49] (XEN) - Performance Optimization
[2020-12-24 08:34:49] (XEN) - MSI Capability MMIO Access
[2020-12-24 08:34:49] (XEN) - Guest I/O Protection
[2020-12-24 08:34:49] (XEN) - Enhanced PPR Handling
[2020-12-24 08:34:49] (XEN) - Attribute Forward
[2020-12-24 08:34:49] (XEN) - Invalidate IOTLB Type
[2020-12-24 08:34:49] (XEN) - VM Table Size: 0
[2020-12-24 08:34:49] (XEN) - Guest Access Bit Update Disable
[2020-12-24 08:34:49] (XEN) AMD-Vi: IOMMU 0 Enabled.
[2020-12-24 08:34:49] (XEN) I/O virtualisation enabled
[2020-12-24 08:34:49] (XEN) - Dom0 mode: Relaxed
[2020-12-24 08:34:49] (XEN) Interrupt remapping enabled
[2020-12-24 08:34:49] (XEN) ENABLING IO-APIC IRQs
[2020-12-24 08:34:49] (XEN) -> Using new ACK method
[2020-12-24 08:34:49] (XEN) Allocated console ring of 32 KiB.
[2020-12-24 08:34:49] (XEN) HVM: ASIDs enabled.
[2020-12-24 08:34:49] (XEN) SVM: Supported advanced features:
[2020-12-24 08:34:49] (XEN) - Nested Page Tables (NPT)
[2020-12-24 08:34:49] (XEN) - Last Branch Record (LBR) Virtualisation
[2020-12-24 08:34:49] (XEN) - Next-RIP Saved on #VMEXIT
[2020-12-24 08:34:49] (XEN) - VMCB Clean Bits
[2020-12-24 08:34:49] (XEN) - DecodeAssists
[2020-12-24 08:34:49] (XEN) - Virtual VMLOAD/VMSAVE
[2020-12-24 08:34:49] (XEN) - Virtual GIF
[2020-12-24 08:34:49] (XEN) - Pause-Intercept Filter
[2020-12-24 08:34:49] (XEN) - Pause-Intercept Filter Threshold
[2020-12-24 08:34:49] (XEN) - TSC Rate MSR
[2020-12-24 08:34:49] (XEN) HVM: SVM enabled
[2020-12-24 08:34:49] (XEN) HVM: Hardware Assisted Paging (HAP) detected
[2020-12-24 08:34:49] (XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
[2020-12-24 08:34:49] (XEN) CPU 1 still not dead...
[2020-12-24 08:34:49] (XEN) CPU 1 still not dead...
[2020-12-24 08:34:49] (XEN) Brought up 8 CPUs
[2020-12-24 08:34:49] (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
[2020-12-24 08:34:49] (XEN) xenoprof: Initialization failed. AMD processor family 23 is not supported
[2020-12-24 08:34:49] (XEN) TSC warp detected, disabling TSC_RELIABLE
[2020-12-24 08:34:49] (XEN) Dom0 has maximum 264 PIRQs
[2020-12-24 08:34:49] (XEN) Xen kernel: 64-bit, lsb, compat32
[2020-12-24 08:34:49] (XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x3e00000
[2020-12-24 08:34:49] (XEN) PHYSICAL MEMORY ARRANGEMENT:
[2020-12-24 08:34:49] (XEN) Dom0 alloc.: 000000078c000000->0000000790000000 (1022908 pages to be allocated)
[2020-12-24 08:34:49] (XEN) Init. ramdisk: 00000007acdbc000->00000007af1ff5cd
[2020-12-24 08:34:49] (XEN) VIRTUAL MEMORY ARRANGEMENT:
[2020-12-24 08:34:49] (XEN) Loaded kernel: ffffffff81000000->ffffffff83e00000
[2020-12-24 08:34:49] (XEN) Init. ramdisk: 0000000000000000->0000000000000000
[2020-12-24 08:34:49] (XEN) Phys-Mach map: 0000008000000000->0000008000800000
[2020-12-24 08:34:49] (XEN) Start info: ffffffff83e00000->ffffffff83e004b8
[2020-12-24 08:34:49] (XEN) Xenstore ring: 0000000000000000->0000000000000000
[2020-12-24 08:34:49] (XEN) Console ring: 0000000000000000->0000000000000000
[2020-12-24 08:34:49] (XEN) Page tables: ffffffff83e01000->ffffffff83e24000
[2020-12-24 08:34:49] (XEN) Boot stack: ffffffff83e24000->ffffffff83e25000
[2020-12-24 08:34:49] (XEN) TOTAL: ffffffff80000000->ffffffff84000000
[2020-12-24 08:34:49] (XEN) ENTRY ADDRESS: ffffffff83128180
[2020-12-24 08:34:49] (XEN) Dom0 has maximum 1 VCPUs
[2020-12-24 08:34:49] (XEN) Initial low memory virq threshold set at 0x4000 pages.
[2020-12-24 08:34:49] (XEN) Scrubbing Free RAM in background
[2020-12-24 08:34:49] (XEN) Std. Loglevel: Errors and warnings
[2020-12-24 08:34:49] (XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
[2020-12-24 08:34:49] (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
[2020-12-24 08:34:49] (XEN) Freed 536kB init memory
```
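
The log above shows Xen reporting two command line parameters as unknown. A small sketch for pulling such lines out of a saved log follows; it assumes the message keeps the `(XEN) parameter <name> unknown!` shape seen in this log, and the function name `unknown_parameters` is hypothetical.

```python
import re

# Pattern taken from the two "parameter ... unknown!" lines in the log above.
UNKNOWN_PARAM = re.compile(r"\(XEN\) parameter (\S+) unknown!")


def unknown_parameters(log_text: str) -> list[str]:
    """Return the parameter names Xen reported as unknown, in log order."""
    return UNKNOWN_PARAM.findall(log_text)


sample = """\
[2020-12-24 08:34:49] (XEN) Built-in command line: ept=exec-sp
[2020-12-24 08:34:49] (XEN) parameter no-real-mode unknown!
[2020-12-24 08:34:49] (XEN) parameter edd unknown!
"""
print(unknown_parameters(sample))  # ['no-real-mode', 'edd']
```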

> In absence of that, I only have more questions. :-/ E.g., how are you
> enabling and disabling SMT, via the command line parameter, or via
> BIOS?

Currently via Xen's CMDLINE, the Lenovo X13 doesn't have an option to disable SMT :(

> Also, can you perhaps try either upstream 4.14 Xen (from sources, I
> mean) or the packages for a distro different than QubesOS (perhaps
> installing such distro, temporarily, in an external HD or whatever).

I can give this a shot. I'll try to dump out as much logging as I can, then I'll try to build from the master branch of xen.

> I can try Credit as well, later, but if this is something CPU arch/gen
> related, it seems to be a Ryzen rather than a Zen 2 thing...

Absolutely, I agree, this seems to be Ryzen 3000/4000 related, another Qubes user running a Ryzen 9 3900X CPU appears to be having the same issue: https://imgur.com/a/EYOMmRe
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters
Hi All,

Trying to debug Credit2 https://wiki.xenproject.org/wiki/Credit2_Scheduler#Dumping_Status_and_Params

It should be possible to get some debug output on what Credit2 is doing via pressing 'r' on the Serial Debug port

Does anyone know if it's at all possible to use a USB-TTY adapter? The wiki (https://wiki.xenproject.org/wiki/Xen_Serial_Console) says it's not possible, this makes debugging any modern laptop really hard, I don't think I've ever owned a laptop with a serial port.

Failing that, I'll just have to add a bunch of printk's to the Credit2 functions.
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters [ In reply to ]
On 03.01.2021 07:40, Dylanger Daly wrote:
> Trying to debug Credit2 https://wiki.xenproject.org/wiki/Credit2_Scheduler#Dumping_Status_and_Params
>
> It should be possible to get some debug output on what Credit2 is doing via pressing 'r' on the Serial Debug port
>
> Does anyone know if it's at all possible to use a USB-TTY adapter? The wiki (https://wiki.xenproject.org/wiki/Xen_Serial_Console) says it's not possible, this makes debugging any modern laptop really hard, I don't think I've ever owned a laptop with a serial port.

If you have a USB2 debug cable and the laptop has a USB2 (EHCI)
debug port directly underneath the host controller (no hubs
in between) this should work ("console=dbgp" and some suitable
"dbgp=..."). However, besides USB2 being quite old and typically
laptops coming with newer ports / controllers nowadays, I've
not come across a laptop where the debug port wasn't behind some
hub ...

Jan
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters [ In reply to ]
Hi Everyone,

I just wanted to close this off and let everyone know the issue ended up being a faulty/misconfigured HPET clock.

Appending `clocksource=tsc tsc=unstable hpetbroadcast=0` to Xen's CMDLINE totally fixed my issue, I assume Xen was detecting TSC may have been 'off' and was trying to recover/self-correct?

In any case it's working perfectly.

Cheers
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters [ In reply to ]
On 16.03.2021 03:10, Dylanger Daly wrote:
> I just wanted to close this off and let everyone know the issue ended up being a faulty/misconfigured HPET clock.
>
> Appending `clocksource=tsc tsc=unstable hpetbroadcast=0` to Xen's CMDLINE totally fixed my issue, I assume Xen was detecting TSC may have been 'off' and was trying to recover/self-correct?

I find this a very confusing combination of command line options.
In particular "tsc=unstable" clears one of the feature prereqs
(TSC_RELIABLE) that are required for "clocksource=tsc" to take
any effect, afaict. I therefore would conclude that you're not
actually running with TSC as the clock source. Did you check the
hypervisor log (which might prove me wrong)?

Jan
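
For anyone wanting to sanity-check a command line against the point above, here is a minimal sketch, not anything taken from Xen itself. It only encodes the rule as stated in this message (that "tsc=unstable" defeats "clocksource=tsc"), and it looks for the boot-time line naming the platform timer. The helper names are hypothetical, and the "Platform timer is ..." wording is an assumption about how the hypervisor log reports the chosen clock source, so verify it against your own log.

```python
import re


def parse_xen_cmdline(cmdline):
    """Split a Xen command line into {option: value}; bare flags map to None."""
    opts = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        opts[key] = value if sep else None
    return opts


def tsc_clocksource_conflict(cmdline):
    """Return True when clocksource=tsc is requested together with tsc=unstable.

    This mirrors the rule described in the thread only; it does not
    reproduce Xen's actual option handling.
    """
    opts = parse_xen_cmdline(cmdline)
    # "tsc=" is treated as a comma-separated list of flags.
    tsc_flags = (opts.get("tsc") or "").split(",")
    return opts.get("clocksource") == "tsc" and "unstable" in tsc_flags


def platform_timer(log_text):
    """Return (frequency, name) from a 'Platform timer is ...' log line, or None.

    The message wording is assumed, not confirmed by this thread.
    """
    match = re.search(r"Platform timer is (\S+) (.+)", log_text)
    return match.groups() if match else None
```

Feeding the output of `xl dmesg` to `platform_timer()` should show which clock source is really in use, and `tsc_clocksource_conflict("clocksource=tsc tsc=unstable hpetbroadcast=0")` flags the combination questioned here.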
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters [ In reply to ]
On Tue, 2021-03-16 at 09:02 +0100, Jan Beulich wrote:
> On 16.03.2021 03:10, Dylanger Daly wrote:
> > I just wanted to close this off and let everyone know the issue
> > ended up being a faulty/misconfigured HPET clock.
> >
> > Appending `clocksource=tsc tsc=unstable hpetbroadcast=0` to Xen's
> > CMDLINE totally fixed my issue, I assume Xen was detecting TSC may
> > have been 'off' and was trying to recover/self-correct?
>
> I find this a very confusing combination of command line options.
> In particular "tsc=unstable" clears one of the feature prereqs
> (TSC_RELIABLE) that are required for "clocksource=tsc" to take
> any effect, afaict. I therefore would conclude that you're not
> actually running with TSC as the clock source. 
>
Right. Also, isn't hpetbroadcast set to 0 by default already?

Dario
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters [ In reply to ]
Hi Everyone, I've just confirmed only `tsc=unstable` is required, that specific change has fixed the issues I was having on the Lenovo X13, I assume this is because Lenovo's Clock isn't correct?

> Right. Also, isn't hpetbroadcast set to 0 by default already?
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters [ In reply to ]
On 19.03.2021 13:45, Dylanger Daly wrote:
> Hi Everyone, I've just confirmed only `tsc=unstable` is required,
> that specific change has fixed the issues I was having on the
> Lenovo X13, I assume this is because Lenovo's Clock isn't correct?

Hard to tell without knowing what actually went wrong. It wasn't
very long ago that we had to fix an issue where iirc a machine
wouldn't even boot because of some strange state firmware put
the TSCs in. We were able to work around this, such that
"tsc=unstable" wouldn't be needed. So while there may indeed be
some oddity with what firmware does, there may also be a way to
work around this. But I guess it would take someone debugging
this on an affected system...

Jan
Re: Ryzen 4000 (Mobile) Softlocks/Micro-stutters [ In reply to ]
Hi all,

Lenovo released a new UEFI update for the Lenovo X13/T14s, changelog is here: https://download.lenovo.com/pccbbs/mobiles/r1cuj63wd.txt

It lists "Fixed an issue that Fixed TSC synchronization failed under linux." as a fix. I can confirm that after removing `tsc=unstable` from the CMDLINE, everything is functioning perfectly.

The root of this issue was indeed Lenovo's shoddy firmware.

Thank you to everyone.