Mailing List Archive

Inconsistent behavior with unified EFI binaries
I've been evaluating Xen as a candidate hypervisor for my next server.
I've looked at XCP-NG, but it falls short of two of my main
requirements: secure boot and full-disk encryption (FDE).
FDE could potentially be worked around by using a storage VM that sets
itself up as an NFS/SMB/CIFS share. But then secure boot is still an
issue and XCP-NG does not provide a xen.efi.

So I'm trying out using Arch Linux as my Xen Dom0, which works fine sans
the features offered by XCP-NG.
So using https://xenbits.xen.org/docs/unstable/misc/efi.html as my
source, I've built a script that builds a unified Xen+Linux+initrd EFI
binary (now referred to as "UXE"). I've done sanity checks on the EFI
binary to make sure all the data in the inserted sections is intact
(matching SHA256 sums after dumping).
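
For illustration, a sanity check of this kind could look roughly like
the sketch below. It is not the code from the actual script. It assumes
each inserted section has already been dumped back out of the EFI binary
into a file, and the file names in the usage note are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def section_intact(original, dumped):
    """True if the dumped section is byte-identical to the original file."""
    return sha256_of(original) == sha256_of(dumped)
```

A call such as section_intact("vmlinuz-linux", "kernel.dump") would then
report whether the embedded kernel survived intact. A check like this
only covers the contents of the sections; it says nothing about where
the sections are placed inside the image.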

Trying to boot the UXE on my laptop (ASUS GL703GS) results in a loader error:
"ERROR: Will only load images built for the generic loader or Linux
images (Not '' and '') or with PHYS32_ENTRY set"
(typed out by hand since I do not have a serial connection to my laptop)
Regular xen.efi will boot just fine however.

On my friend's laptop (MSI GE75 Raider), however, the UXE gets past the
loading stage and into the OS. He runs EndeavourOS, which is a fork of
Arch Linux but uses the same kernel (identical hashes).

On my home server (SuperMicro X9DRH-7F; Intel Xeon E5-2670 ES C1 QBF5)
Linux fails to boot regardless of it being a UXE or standalone xen.efi.
It does load the kernel into memory and executes it, but the kernel
fails almost immediately after hitting a BUG in physaddr.c.
I have boot logs thanks to serial over LAN, but since this CPU is an
engineering sample I would chalk it up to CPU instability. The boot log
is identical for regular xen.efi and the UXE.

And finally, in a nested VM through libvirt (KVM) the UXE fails claiming
the kernel is not an ELF binary. Regular xen.efi does boot, but the
kernel will hit a few BUGs. The host is a desktop with an
ASUS PRIME Z370-A motherboard and an i7-8700K CPU, running Arch Linux
with a custom kernel.

I am most interested in why loading the kernel fails on my laptop but
not on my friend's laptop or on my home server. I want to confirm that
secure boot is possible on my soon-to-be-ordered server before even
considering Xen. How would I even begin to troubleshoot this? It is
not something I've been able to reproduce in user mode: running
readnotes on the extracted kernel image returns the ELF notes just
fine.

Logs can be found here: https://gist.github.com/RA-Kooi/69bc2283d73e923c6b7f40ca379a2527

Xen configuration for my laptop:

[global]
default=xen

[xen]
options=console=vga iommu=force:true,qinval:true,debug:true loglvl=all noreboot=true reboot=no vga=ask
kernel=vmlinuz-linux root=/dev/mapper/root rd.luks.name=31475bde-3aba-4f6c-adf8-73355337d44d=root rw add_efi_memmap earlyprintk=xen
ramdisk=initramfs-linux.img
ucode=xen-efi-intel-ucode.bin
Re: Inconsistent behavior with unified EFI binaries
On 10/05/2023 04:49, Rafaël Kooi wrote:
> [...]

The issue ended up being two-fold. First, while I did sanity check my
EFI binary, I did not align the sections to 4K address boundaries.
Second, there is an issue related to my NVMe SSD. On the xen-devel
mailing list I said the SSD had died, but I may have spoken too soon:
once I am in the OS, everything works fine and SMART reports the NVMe
drive as healthy. As a workaround, I simply let my script output the
EFI binary to a USB stick and then use the UEFI shell to boot from
that stick.

Aligning the sections also made the unified Xen EFI binary work in
QEMU/KVM.
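
For illustration, the alignment step could look roughly like the sketch
below. It is not the code from the sbupdate fork. It assumes the
sections are added with objcopy's --add-section and
--change-section-vma options, as in the Xen EFI documentation. The base
address and file paths are placeholders: a real base has to lie above
the sections already present in xen.efi.

```python
import os

PAGE = 0x1000  # 4 KiB alignment for every added section


def align_up(value, alignment=PAGE):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def layout_sections(base, sizes):
    """Assign a 4K-aligned VMA to each (name, size) pair, in order."""
    vma = align_up(base)
    layout = []
    for name, size in sizes:
        layout.append((name, vma))
        # Reserve at least one byte so empty sections never share a VMA.
        vma = align_up(vma + max(size, 1))
    return layout


def objcopy_args(base, files):
    """Build objcopy arguments for (section_name, path) pairs."""
    sizes = [(name, os.path.getsize(path)) for name, path in files]
    vmas = dict(layout_sections(base, sizes))
    args = []
    for name, path in files:
        args += ["--add-section", f"{name}={path}",
                 "--change-section-vma", f"{name}={vmas[name]:#x}"]
    return args
```

The resulting list would be passed to objcopy together with the input
xen.efi and the output file name. Deriving each address from the
previous section's size keeps the sections from overlapping while
keeping every start address on a 4K boundary.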

If anybody is interested in the script, I forked the sbupdate AUR
package and you can find it here: https://github.com/RA-Kooi/sbupdate/tree/xen
Do keep in mind that the script is Arch Linux centric, but I did also
port it to Debian. The Debian branch does not support Xen, as I wrote it
for Proxmox, but one could probably merge the debian and xen branches
to get similar functionality on Debian and derivatives.