Mailing List Archive

text-tsx fails on Intel core 8th gen system
Hi,

I've noticed that tools/tests/tsx/test-tsx fails on a system with Intel
Core i7-8750H. Specific error I get:

[user@dom0 tsx]$ ./test-tsx
TSX tests
Got 16 CPUs
Testing MSR_TSX_FORCE_ABORT consistency
CPU0 val 0
Testing MSR_TSX_CTRL consistency
Testing MSR_MCU_OPT_CTRL consistency
CPU0 val 0
Testing RTM behaviour
Got #UD
Host reports RTM, but appears unavailable
Testing PV default/max policies
Max: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
Def: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
HLE/RTM offered to guests despite not being available
Testing HVM default/max policies
Max: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
Def: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
HLE/RTM offered to guests despite not being available
Testing PV guest
Created d8
Cur: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
Cur: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
Testing HVM guest
Created d9
Cur: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
Cur: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
[user@dom0 tsx]$ echo $?
1


When I try it on a newer system (11th gen) then it works fine (exit code
0, just "Got #UD", no "Host reports RTM, but appears unavailable" line).


/proc/cpuinfo says:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
stepping : 10
microcode : 0xf6
cpu MHz : 2207.990
cache size : 9216 KB
physical id : 0
siblings : 6
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp fsgsbase bmi1 avx2 bmi2 erms rdseed adx clflushopt xsaveopt xsavec xgetbv1 md_clear arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds mmio_stale_data retbleed
bogomips : 4415.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

...


Full `xen-cpuid detail` output attached.

Just in case, I'm also attaching the full xl dmesg, but I don't see
anything related there.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: text-tsx fails on Intel core 8th gen system
On 03.04.2024 16:50, Marek Marczykowski-Górecki wrote:
> Hi,
>
> I've noticed that tools/tests/tsx/test-tsx fails on a system with Intel
> Core i7-8750H. Specific error I get:
>
> [user@dom0 tsx]$ ./test-tsx
> TSX tests
> Got 16 CPUs
> Testing MSR_TSX_FORCE_ABORT consistency
> CPU0 val 0
> Testing MSR_TSX_CTRL consistency
> Testing MSR_MCU_OPT_CTRL consistency
> CPU0 val 0
> Testing RTM behaviour
> Got #UD
> Host reports RTM, but appears unavailable

Isn't this ...

> Testing PV default/max policies
> Max: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> Def: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> HLE/RTM offered to guests despite not being available
> Testing HVM default/max policies
> Max: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> Def: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> HLE/RTM offered to guests despite not being available
> Testing PV guest
> Created d8
> Cur: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> Cur: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> Testing HVM guest
> Created d9
> Cur: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> Cur: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> [user@dom0 tsx]$ echo $?
> 1

... the reason for this?

Jan
Re: text-tsx fails on Intel core 8th gen system
On Wed, Apr 03, 2024 at 05:04:20PM +0200, Jan Beulich wrote:
> On 03.04.2024 16:50, Marek Marczykowski-Górecki wrote:
> > Hi,
> >
> > I've noticed that tools/tests/tsx/test-tsx fails on a system with Intel
> > Core i7-8750H. Specific error I get:
> >
> > [user@dom0 tsx]$ ./test-tsx
> > TSX tests
> > Got 16 CPUs
> > Testing MSR_TSX_FORCE_ABORT consistency
> > CPU0 val 0
> > Testing MSR_TSX_CTRL consistency
> > Testing MSR_MCU_OPT_CTRL consistency
> > CPU0 val 0
> > Testing RTM behaviour
> > Got #UD
> > Host reports RTM, but appears unavailable
>
> Isn't this ...
>
> > Testing PV default/max policies
> > Max: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> > Def: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> > HLE/RTM offered to guests despite not being available
> > Testing HVM default/max policies
> > Max: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> > Def: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> > HLE/RTM offered to guests despite not being available
> > Testing PV guest
> > Created d8
> > Cur: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> > Cur: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> > Testing HVM guest
> > Created d9
> > Cur: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> > Cur: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
> > [user@dom0 tsx]$ echo $?
> > 1
>
> ... the reason for this?

I think so, but the question is why it behaves this way. Could be an
issue with MSR/CPUID values presented by Xen, or values Xen gets from
the CPU.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: text-tsx fails on Intel core 8th gen system
On 03/04/2024 3:50 pm, Marek Marczykowski-Górecki wrote:
> Hi,
>
> I've noticed that tools/tests/tsx/test-tsx fails on a system with Intel
> Core i7-8750H. Specific error I get:
>
> [user@dom0 tsx]$ ./test-tsx
> TSX tests
> Got 16 CPUs
> Testing MSR_TSX_FORCE_ABORT consistency
> CPU0 val 0
> Testing MSR_TSX_CTRL consistency
> Testing MSR_MCU_OPT_CTRL consistency
> CPU0 val 0
> Testing RTM behaviour
> Got #UD
> Host reports RTM, but appears unavailable

Hmm - I should make this failure report more obviously distinguishable
from the general logging.

This is reporting a consistency-check failure, with a mismatch between
actual-behaviour and what's in CPUID (host policy in practice).
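A rough sketch of the consistency check described here, for readers following along. This is an illustration and not the code in tools/tests/tsx: the enum, the function name check_rtm_consistency() and the wording of the second message are assumptions; only the first message appears in the log above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Observed outcome of attempting XBEGIN (names are illustrative). */
enum rtm_behaviour { RTM_UD, RTM_ABORT, RTM_OK };

/*
 * Compare what the host policy's CPUID says about RTM with what the
 * instruction actually did.  Returns NULL when the two agree, or a
 * failure message when they do not.
 */
const char *check_rtm_consistency(bool host_rtm, enum rtm_behaviour seen)
{
    /* CPUID advertises RTM, yet XBEGIN raised #UD. */
    if (host_rtm && seen == RTM_UD)
        return "Host reports RTM, but appears unavailable";

    /* CPUID hides RTM, yet XBEGIN executed (aborting or succeeding). */
    if (!host_rtm && seen != RTM_UD)
        return "Host reports no RTM, but appears available"; /* assumed wording */

    return NULL;
}
```

With the values from the failing run (host policy RTM set, "Got #UD"), check_rtm_consistency(true, RTM_UD) returns the first message, which matches the reported failure.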

This is CoffeeLake, and was one of the CPUs which had TSX taken out, but
something looks wonky.


> When I try it on a newer system (11th gen) then it works fine (exit code
> 0, just "Got #UD", no "Host reports RTM, but appears unavailable" line).

RocketLake was after the decision to remove TSX from the client line, so
will either genuinely not have the silicon, or it should be properly
fused out.

Anyway, back to CoffeeLake.

The Raw policy shows rtm-always-abort and tsx-force-abort.  Test-tsx
says the value in MSR_TSX_FORCE_ABORT is 0, which shouldn't be the
case: the RTM/HLE bits are hidden in real CPUID, yet the CPUID_HIDE
bit is clear.
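To make the oddity concrete, here is a minimal sketch of the relationship being described. The bit positions are an assumption based on my understanding of the documented MSR_TSX_FORCE_ABORT layout (bit 1 being the CPUID-hiding control referred to as CPUID_HIDE above) and should be checked against the Intel SDM or Xen's MSR definitions; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of MSR_TSX_FORCE_ABORT; verify before relying on it. */
#define TSX_FORCE_ABORT_RTM  (UINT64_C(1) << 0)
#define TSX_CPUID_CLEAR      (UINT64_C(1) << 1)  /* "CPUID_HIDE" in the mail */
#define TSX_ENABLE_RTM       (UINT64_C(1) << 2)

/*
 * Returns true when the CPUID state is explained by the MSR value:
 * either RTM/HLE are visible, or they are hidden and the hide bit is set.
 * Returns false for the combination described above (hidden, bit clear).
 */
bool tsx_hiding_explained(uint64_t msr_val, bool cpuid_rtm, bool cpuid_hle)
{
    bool hidden = !cpuid_rtm && !cpuid_hle;
    bool hide_requested = (msr_val & TSX_CPUID_CLEAR) != 0;

    return !hidden || hide_requested;
}
```

For the reported system (MSR value 0, RTM/HLE absent from real CPUID), tsx_hiding_explained(0, false, false) is false, i.e. something other than the hide bit is masking the features.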

We do intentionally force RTM_ALWAYS_ABORT in some cases, because it
self-hides under some conditions.  I wonder if we've got a bug in that
path.

From the state in Raw, we then synthesise HLE/RTM in the Host policy
because MSR_TSX_FORCE_ABORT only exists on TSX-capable systems. 
However, if XBEGIN is really #UD-ing, we can't offer it as an opt-in to
guests.
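The reasoning in this paragraph can be sketched as two small predicates. This is purely illustrative and is not Xen's policy code: both function names are hypothetical, and the second one encodes the guard being suggested here, not behaviour that exists today.

```c
#include <stdbool.h>

/*
 * Host-policy view: TSX is treated as present if Raw shows RTM or HLE
 * directly, or if MSR_TSX_FORCE_ABORT is enumerated, on the grounds that
 * the MSR only exists on TSX-capable parts.
 */
bool host_synthesises_tsx(bool raw_rtm, bool raw_hle, bool raw_tsx_force_abort)
{
    return raw_rtm || raw_hle || raw_tsx_force_abort;
}

/*
 * Suggested guard (assumption): only offer HLE/RTM as a guest opt-in
 * when XBEGIN does not actually raise #UD on this hardware.
 */
bool may_offer_tsx_opt_in(bool host_tsx, bool xbegin_raises_ud)
{
    return host_tsx && !xbegin_raises_ud;
}
```

On the CoffeeLake system in question, Raw has tsx-force-abort set and XBEGIN raises #UD, so the first predicate is true while the second is false, which is the mismatch test-tsx reports as "HLE/RTM offered to guests despite not being available".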

Let me see about putting some debugging together.

~Andrew
Re: text-tsx fails on Intel core 8th gen system
On 03.04.2024 18:41, Marek Marczykowski-Górecki wrote:
> On Wed, Apr 03, 2024 at 05:04:20PM +0200, Jan Beulich wrote:
>> On 03.04.2024 16:50, Marek Marczykowski-Górecki wrote:
>>> Hi,
>>>
>>> I've noticed that tools/tests/tsx/test-tsx fails on a system with Intel
>>> Core i7-8750H. Specific error I get:
>>>
>>> [user@dom0 tsx]$ ./test-tsx
>>> TSX tests
>>> Got 16 CPUs
>>> Testing MSR_TSX_FORCE_ABORT consistency
>>> CPU0 val 0
>>> Testing MSR_TSX_CTRL consistency
>>> Testing MSR_MCU_OPT_CTRL consistency
>>> CPU0 val 0
>>> Testing RTM behaviour
>>> Got #UD
>>> Host reports RTM, but appears unavailable
>>
>> Isn't this ...
>>
>>> Testing PV default/max policies
>>> Max: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
>>> Def: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
>>> HLE/RTM offered to guests despite not being available
>>> Testing HVM default/max policies
>>> Max: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
>>> Def: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
>>> HLE/RTM offered to guests despite not being available
>>> Testing PV guest
>>> Created d8
>>> Cur: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
>>> Cur: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
>>> Testing HVM guest
>>> Created d9
>>> Cur: RTM 0, HLE 0, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
>>> Cur: RTM 1, HLE 1, TSX_FORCE_ABORT 0, RTM_ALWAYS_ABORT 0, TSX_CTRL 0
>>> [user@dom0 tsx]$ echo $?
>>> 1
>>
>> ... the reason for this?
>
> I think so, but the question is why it behaves this way. Could be an
> issue with MSR/CPUID values presented by Xen, or values Xen gets from
> the CPU.

Can't test_rtm_behaviour() be run even without Xen underneath? Maybe this
could be run irrespective of xc_interface_open() failing?
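As an illustration of what a Xen-independent probe could look like, here is a minimal standalone sketch. It is not the test_rtm_behaviour() from test-tsx: the names, the RTM_UNKNOWN value, the retry count of 1000 and the sigsetjmp-based recovery are all assumptions made for this example. It issues XBEGIN as raw bytes so that no TSX compiler support is needed, and treats SIGILL as #UD.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Illustrative result type; RTM_UNKNOWN covers "could not probe". */
enum rtm_probe_result { RTM_UD, RTM_ABORT, RTM_OK, RTM_UNKNOWN };

static sigjmp_buf probe_env;

/* SIGILL means XBEGIN raised #UD; unwind back into the probe. */
static void on_sigill(int signo)
{
    (void)signo;
    siglongjmp(probe_env, 1);
}

enum rtm_probe_result probe_rtm_standalone(void)
{
    struct sigaction sa, old;
    volatile enum rtm_probe_result res = RTM_ABORT;

    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGILL, &sa, &old) != 0)
        return RTM_UNKNOWN;

    if (sigsetjmp(probe_env, 1)) {
        res = RTM_UD;            /* arrived here from the SIGILL handler */
    } else {
#if defined(__x86_64__) || defined(__i386__)
        /* Retry a number of times, as transactions may abort spuriously. */
        for (int i = 0; i < 1000 && res != RTM_OK; i++) {
            unsigned int status = ~0u;

            /* XBEGIN with a zero displacement: aborts resume right here
             * with the abort status in EAX, a started transaction falls
             * through with EAX untouched. */
            __asm__ volatile (".byte 0xc7, 0xf8, 0x00, 0x00, 0x00, 0x00"
                              : "+a" (status) :: "memory");
            if (status == ~0u) {
                /* XEND: commit the (empty) transaction. */
                __asm__ volatile (".byte 0x0f, 0x01, 0xd5" ::: "memory");
                res = RTM_OK;
            }
        }
#else
        res = RTM_UNKNOWN;       /* not an x86 build, nothing to probe */
#endif
    }

    sigaction(SIGILL, &old, NULL);   /* restore the previous handler */
    return res;
}
```

A probe like this only depends on the CPU and the OS signal machinery, so in principle it could be run before, or regardless of, the attempt to open the Xen control interface.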

Jan