Mailing List Archive

xen domU segfaults with xpti on intel based systems
Hello,
we are observing random PV domU segfaults on Intel based systems with XPTI
enabled. These segfaults were not present in Xen 4.9.2 and can be
reproduced on 4.9.3/4.10.2/4.11.1. The problem can be mitigated by adding
xpti=false to xen command line options.

Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults on
Debian, but on NetBSD it's almost instant.

The NetBSD 7.0 kernels can be downloaded from:
https://cdn.netbsd.org/pub/NetBSD/NetBSD-7.0/amd64/binary/kernel/netbsd-INSTALL_XEN3_DOMU.gz
https://cdn.netbsd.org/pub/NetBSD/NetBSD-7.0/amd64/binary/kernel/netbsd-XEN3_DOMU.gz

netbsd.conf:
kernel = "7.0/netbsd-INSTALL_XEN3_DOMU.gz"
memory = 512
vcpus = 1
name = "netbsd"
vif = [ '' ]
disk = ['phy:/dev/vg_data/netbsd,xvda,w']

The installation goes fine, but at the end it fails:

Status: Command failed
Command: /bin/sh MAKEDEV all
Hit enter to continue
--------------------------------------------------------------------------------
[1] Done eval "${before}"... |
Segmentation fault (core dumped) eval "${after}";...
[1] Done eval "${before}"... |
Segmentation fault (core dumped) eval "${after}";...

Now we boot the installed system after changing the kernel to:
kernel = "7.0/netbsd-XEN3_DOMU.gz"

...
Updating motd.
Starting powerd.
[1] Segmentation fault (core dumped) sysctl -n hw.dis...
/usr/sbin/postconf: warning: valid_hostname: misplaced delimiter: .domain.tld
/usr/sbin/postconf: fatal: unable to use my own hostname
Jan 11 05:49:49 .sygic postfix[1550]: fatal: unable to use my own hostname
/etc/rc.d/postfix exited with code 1
Starting inetd.
...

Is this a known issue, or can something be done about it?

Thank you,
Tomas
Re: xen domU segfaults with xpti on intel based systems
On 11/01/2019 07:05, Tomas Mozes wrote:
> Hello,
> we are observing random PV domU segfaults on Intel based systems with
> XPTI enabled. These segfaults were not present in Xen 4.9.2 and can be
> reproduced on 4.9.3/4.10.2/4.11.1. The problem can be
> mitigated by adding xpti=false to xen command line options.
>
> Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
> seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
> on Debian, but on NetBSD it's almost instant.

Hmm, as we haven't received any similar reports, I suspect there is
something special on your side.

Can you please be more specific regarding:

- hardware (machine type(s), processor model(s), ...)
- other config options (hypervisor command line, hypervisor .config)

A hypervisor log (output of "xl dmesg") would help, too. Please add
"loglvl=all guest_loglvl=all" to the hypervisor command line for that
purpose. If possible use a debug hypervisor for this test, as that
will produce more diagnostic output.


Juergen

_______________________________________________
Xen-users mailing list
Xen-users@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-users
Re: xen domU segfaults with xpti on intel based systems
On Fri, Jan 11, 2019 at 9:21 AM Juergen Gross <jgross@suse.com> wrote:

> On 11/01/2019 07:05, Tomas Mozes wrote:
> > Hello,
> > we are observing random PV domU segfaults on Intel based systems with
> > XPTI enabled. These segfaults were not present in Xen 4.9.2 and can be
> > reproduced on 4.9.3/4.10.2/4.11.1. The problem can be
> > mitigated by adding xpti=false to xen command line options.
> >
> > Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
> > seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
> > on Debian, but on NetBSD it's almost instant.
>
> Hmm, as we haven't received any similar reports, I suspect there is
> something special on your side.
>
> Can you please be more specific regarding:
>
> - hardware (machine type(s), processor model(s), ...)
> - other config options (hypervisor command line, hypervisor .config)
>
> A hypervisor log (output of "xl dmesg") would help, too. Please add
> "loglvl=all guest_loglvl=all" to the hypervisor command line for that
> purpose. If possible use a debug hypervisor for this test, as that
> will produce more diagnostic output.
>
>
> Juergen
>

These segfaults were first spotted by the GMP project maintainer and were
only later reproduced locally on another machine (also Intel).

A machine on which the problem can be reproduced: Intel DH87MC with Intel
Core i7-4770 CPU @ 3.40GHz (Haswell), running Gentoo Linux.
However, I cannot reproduce it on my desktop machine: Intel DH77EB with
Intel Core i5-3570 CPU @ 3.40GHz (Ivy Bridge).

Grub options for the kernel/xen:
GRUB_CMDLINE_LINUX="panic=30 net.ifnames=0 domdadm"
GRUB_CMDLINE_XEN="dom0_mem=4G gnttab_max_frames=256 ucode=scan loglvl=all guest_loglvl=all console_to_ring console_timestamps=date conring_size=1m smt=true"
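
Testing a workaround such as xpti=false means changing the GRUB_CMDLINE_XEN
string shown above. A minimal Python sketch of doing that without creating
duplicate or conflicting entries follows. The helper name add_xen_option is
hypothetical and is not part of Xen or GRUB; it only manipulates the string
and does not touch any file.

```python
def add_xen_option(cmdline, option):
    """Return cmdline with option set, replacing any existing setting of the same key."""
    key = option.split("=", 1)[0]
    # Drop every token that sets the same key, then append the new setting.
    kept = [tok for tok in cmdline.split() if tok.split("=", 1)[0] != key]
    return " ".join(kept + [option])


# Shortened version of the command line quoted in this message.
current = "dom0_mem=4G gnttab_max_frames=256 ucode=scan smt=true"
print(add_xen_option(current, "xpti=false"))
```

After the value is changed in the GRUB defaults file, the GRUB configuration
has to be regenerated and the host rebooted before the hypervisor sees it.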
Re: xen domU segfaults with xpti on intel based systems
On 11/01/2019 14:05, Tomas Mozes wrote:
> On Fri, Jan 11, 2019 at 9:21 AM Juergen Gross <jgross@suse.com> wrote:
>
> > On 11/01/2019 07:05, Tomas Mozes wrote:
> > > Hello,
> > > we are observing random PV domU segfaults on Intel based systems with
> > > XPTI enabled. These segfaults were not present in Xen 4.9.2 and can be
> > > reproduced on 4.9.3/4.10.2/4.11.1. The problem can be
> > > mitigated by adding xpti=false to xen command line options.
> > >
> > > Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
> > > seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
> > > on Debian, but on NetBSD it's almost instant.
> >
> > Hmm, as we haven't received any similar reports, I suspect there is
> > something special on your side.
> >
> > Can you please be more specific regarding:
> >
> > - hardware (machine type(s), processor model(s), ...)
> > - other config options (hypervisor command line, hypervisor .config)
> >
> > A hypervisor log (output of "xl dmesg") would help, too. Please add
> > "loglvl=all guest_loglvl=all" to the hypervisor command line for that
> > purpose. If possible use a debug hypervisor for this test, as that
> > will produce more diagnostic output.
> >
> >
> > Juergen
>
> These segfaults were actually spotted by the gmp project maintainer and
> only later they were locally reproduced on another machine (intel too).
>
> A machine on which it can be reproduced: Intel DH87MC with Intel Core
> i7-4770 CPU @ 3.40GHz on Linux Gentoo (Haswell)
> But for example I cannot reproduce on my desktop machine: Intel DH77EB
> with Intel Core i5-3570 CPU @ 3.40GHz (Ivy Bridge)

Okay, those two CPUs differ in a critical feature: on Ivy Bridge XPTI
can't make use of the processor's PCID feature due to a lack of the
INVPCID instruction.

Can you test whether adding "pcid=false" to the hypervisor command line
on the Haswell machine makes any difference?
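
To see which side of this PCID/INVPCID split a given machine falls on, the
CPU flags reported by Linux can be inspected. The following is a minimal
Python sketch, assuming the usual "flags : ..." line of /proc/cpuinfo and the
Linux flag names "pcid" and "invpcid". The function names and both sample
strings are illustrative only. Flags seen inside a Xen PV domain (including
dom0) may be filtered by the hypervisor, so this is a rough check that is
best run on a bare-metal Linux boot.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    return set()


def pcid_support(cpuinfo_text):
    """Report whether the PCID feature and the INVPCID instruction are advertised."""
    flags = cpu_flags(cpuinfo_text)
    return {"pcid": "pcid" in flags, "invpcid": "invpcid" in flags}


# Abbreviated, illustrative samples (not real /proc/cpuinfo dumps).
sample_with_invpcid = "flags\t\t: fpu vme pcid invpcid\n"
sample_without_invpcid = "flags\t\t: fpu vme pcid\n"
print(pcid_support(sample_with_invpcid))
print(pcid_support(sample_without_invpcid))
```

On a real host, pass the contents of /proc/cpuinfo to pcid_support() instead
of the sample strings.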

And one other question: could it be the problem occurred at the same
time when

(XEN) [2019-01-11 12:41:06] d1 L1TF-vulnerable L4e 000000070cb93004 - Shadowing

was issued?


Juergen

Re: xen domU segfaults with xpti on intel based systems
Hi Juergen, Tomas,

On Fri, Jan 11, 2019 at 09:21:09AM +0100, Juergen Gross wrote:
> On 11/01/2019 07:05, Tomas Mozes wrote:
> > Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
> > seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
> > on Debian, but on NetBSD it's almost instant.
>
> Hmm, as we haven't received any similar reports, I suspect there is
> something special on your side.

I did report slightly similar problems to xen-devel:

https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg02811.html

I currently work around it by ensuring the guests have updated their
kernels to have the L1TF mitigations (you can tell because
/sys/devices/system/cpu/vulnerabilities/l1tf appears).
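
A minimal Python sketch of that check, to be run inside the guest, is shown
here. It only tests for the sysfs file named above and prints whatever the
kernel reports; the function name l1tf_status is hypothetical, and the
wording of the status string depends on the guest kernel.

```python
from pathlib import Path

L1TF_PATH = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def l1tf_status(path=L1TF_PATH):
    """Return the kernel's L1TF status string, or None if the file is absent."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


status = l1tf_status()
if status is None:
    print("no l1tf entry: this kernel likely predates the L1TF mitigations")
else:
    print("l1tf:", status)
```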

The other way was to set the Xen command line options pv-l1tf=false
or pcid=0.

For me this only affected 64-bit PV domains, but I only run Linux. I
didn't try xpti=false because the logs about shadowing made me try
the L1TF-related options first.

For me the above behaviour is experienced on Xeon D-1540 and Xeon
E5-1680v4 systems. I don't have any other types of system so don't
know how widespread it is.

Also please note that within weeks I started experiencing a much
worse problem: host crashes, for which the only suggestion so far is
to try pcid=0. As that is hard for me to reproduce, with a time to
re-occurrence currently somewhere between 8 and 14 days, I am not
yet sure if pcid=0 helps. We're 9 days into a test on that.

https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00938.html

Cheers,
Andy

Re: xen domU segfaults with xpti on intel based systems
On Fri, Jan 11, 2019 at 2:36 PM Juergen Gross <jgross@suse.com> wrote:

> On 11/01/2019 14:05, Tomas Mozes wrote:
> > On Fri, Jan 11, 2019 at 9:21 AM Juergen Gross <jgross@suse.com> wrote:
> >
> > > On 11/01/2019 07:05, Tomas Mozes wrote:
> > > > Hello,
> > > > we are observing random PV domU segfaults on Intel based systems with
> > > > XPTI enabled. These segfaults were not present in Xen 4.9.2 and can be
> > > > reproduced on 4.9.3/4.10.2/4.11.1. The problem can be
> > > > mitigated by adding xpti=false to xen command line options.
> > > >
> > > > Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
> > > > seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
> > > > on Debian, but on NetBSD it's almost instant.
> > >
> > > Hmm, as we haven't received any similar reports, I suspect there is
> > > something special on your side.
> > >
> > > Can you please be more specific regarding:
> > >
> > > - hardware (machine type(s), processor model(s), ...)
> > > - other config options (hypervisor command line, hypervisor .config)
> > >
> > > A hypervisor log (output of "xl dmesg") would help, too. Please add
> > > "loglvl=all guest_loglvl=all" to the hypervisor command line for that
> > > purpose. If possible use a debug hypervisor for this test, as that
> > > will produce more diagnostic output.
> > >
> > >
> > > Juergen
> >
> > These segfaults were actually spotted by the gmp project maintainer and
> > only later they were locally reproduced on another machine (intel too).
> >
> > A machine on which it can be reproduced: Intel DH87MC with Intel Core
> > i7-4770 CPU @ 3.40GHz on Linux Gentoo (Haswell)
> > But for example I cannot reproduce on my desktop machine: Intel DH77EB
> > with Intel Core i5-3570 CPU @ 3.40GHz (Ivy Bridge)
>
> Okay, those two CPUs differ in a critical feature: on Ivy Bridge XPTI
> can't make use of the processor's PCID feature due to a lack of the
> INVPCID instruction.
>
> Can you test whether adding "pcid=false" to the hypervisor command line
> on the Haswell machine makes any difference?
>

Setting "pcid=false" makes the segfault go away too.


>
> And one other question: could it be the problem occurred at the same
> time when
>
> (XEN) [2019-01-11 12:41:06] d1 L1TF-vulnerable L4e 000000070cb93004 - Shadowing
>
> was issued?
>
>
It's printed shortly after the domU is started, like 10 seconds before the
segfault. It's printed in both cases (with/without pcid=false).
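
To line such messages up with the time of a guest segfault, the shadowing
lines can be pulled out of saved "xl dmesg" output. The following is a
minimal Python sketch; the regular expression is modelled only on the single
log line quoted in this thread and assumes console_timestamps=date, so other
Xen versions or timestamp settings may need a different pattern. The function
name shadowing_events is hypothetical.

```python
import re
from datetime import datetime

# Modelled on: (XEN) [2019-01-11 12:41:06] d1 L1TF-vulnerable L4e 000000070cb93004 - Shadowing
PATTERN = re.compile(
    r"\(XEN\) \[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"d(?P<domid>\d+) L1TF-vulnerable (?P<level>L\de) "
    r"(?P<entry>[0-9a-f]+) -\s+Shadowing"
)


def shadowing_events(log_text):
    """Return (timestamp, domain id, page-table level, entry) for each shadowing message."""
    events = []
    for m in PATTERN.finditer(log_text):
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        events.append((ts, int(m.group("domid")), m.group("level"), m.group("entry")))
    return events


sample = "(XEN) [2019-01-11 12:41:06] d1 L1TF-vulnerable L4e 000000070cb93004 - Shadowing\n"
for ts, domid, level, entry in shadowing_events(sample):
    print(ts.isoformat(), "domain", domid, level, entry)
```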


>
> Juergen
>
Re: xen domU segfaults with xpti on intel based systems
On 11/01/2019 17:52, Andy Smith wrote:
> Hi Juergen, Tomas,
>
> On Fri, Jan 11, 2019 at 09:21:09AM +0100, Juergen Gross wrote:
>> On 11/01/2019 07:05, Tomas Mozes wrote:
>>> Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
>>> seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
>>> on Debian, but on NetBSD it's almost instant.
>>
>> Hmm, as we haven't received any similar reports, I suspect there is
>> something special on your side.
>
> I did report slightly similar problems to xen-devel:
>
> https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg02811.html
>
> I currently work around it by ensuring the guests have updated their
> kernels to have the L1TF mitigations (you can tell because
> /sys/devices/system/cpu/vulnerabilities/l1tf appears).
>
> The other way was to set the Xen command line options pv-l1tf=false
> or pcid=0.
>
> For me this only affected 64-bit PV domains, but I only run Linux. I
> didn't try xpti=false because the logs about shadowing made me try
> the L1TF-related options first.

This is XSA-294. A patch has just been committed to Xen staging; patches
for older Xen releases will follow soon.


Juergen
