Mailing List Archive

E820 memory allocation issue on Threadripper platforms
Hi,

I think I ran into a memory allocation issue. It is the same as
https://github.com/QubesOS/qubes-issues/issues/8791, where marmarek
recommended at the end that the reporter forward the issue to this list.
I searched the list and did not find it here already, so I'm doing that
now.

Hardware:
I have an AMD Threadripper 7960X on an ASRock TRX50 WS motherboard. The
Qubes reporter had a Threadripper 3970X on an ASUS Prime TRX40-Pro
motherboard. I saw a third report of a similar issue on another
Threadripper, so I think this may be Threadripper-specific.

Setup:
The QubesOS reporter was using the Qubes installer.
I started from a fresh install of Debian 12 (no GUI), then ran
`apt install xen-system-amd64` and rebooted.

The issue:
Any boot of Xen on the hardware results in a halted machine. When
monitoring the logs with `vga=,keep`, we get:

(XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
(XEN) Freed 644kB init memory
mapping kernel into physical memory
about to get started…
xen hypervisor allocated kernel memory conflicts with E820
(XEN) Hardware Dom0 halted: halting machine

None of the settings I or the Qubes reporter have tried have been able to
get past this failure.

I am happy to provide debugging support.

Patrick
Re: E820 memory allocation issue on Threadripper platforms
On 11.01.2024 03:29, Patrick Plenefisch wrote:
> Hi,
>
> I ran into a memory allocation issue, I think. It is the same as
> https://github.com/QubesOS/qubes-issues/issues/8791 and I saw at the end it
> was recommended (by marmarek) that the issue reporter forward the issue to
> this list. I searched the list, but as I didn't see it in the list already,
> I'm doing that now.
>
> Hardware:
> I have an AMD Threadripper 7960X on a ASRock TRX50 WS motherboard. The
> Qubes reporter had a Threadripper 3970X on an ASUS Prime TRX40-Pro
> Motherboard. I saw a 3rd issue report of a similar issue on another
> Threadripper, so I think this may be Threadripper-specific.
>
> Setup:
> The QuebesOS reporter was using Qubes Installer.
> My install was that I had a fresh install of Debian 12 (no gui), and then
> did `apt install xen-system-amd64` and rebooted.
>
> The issue:
> Any boot of Xen on the hardware results in a halted machine. When
> monitoring the logs with `vga=,keep`, we get:
>
> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
> (XEN) Freed 644kB init memory
> mapping kernel into physical memory
> about to get started…
> xen hypervisor allocated kernel memory conflicts with E820

So first of all (the title doesn't say it) this is a Linux Dom0 issue.
Whether or not it needs addressing in Xen is unknown at this point.

> (XEN) Hardware Dom0 halted: halting machine
>
> None of the settings I or the Qubes reporter have tried have been able to
> get past this failure.
>
> I am happy to provide debugging support.

Well, the crucial piece of data initially is going to be: What's the
E820 map Xen gets to see, what's the E820 map Dom0 gets to see, and
what address range is the conflict detected for? The first question
is possible to answer by supplying a serial log. The second question
likely means adding some debugging code to either Xen or Linux. The
answer to the third question may be possible to infer from the other
data, but would likely be better to obtain explicitly by adjusting /
amending the message Linux emits.

Jan
Re: E820 memory allocation issue on Threadripper platforms
On 11/1/24 10:37, Jan Beulich wrote:
> On 11.01.2024 03:29, Patrick Plenefisch wrote:
>> Hi,
>>
>> I ran into a memory allocation issue, I think. It is the same as
>> https://github.com/QubesOS/qubes-issues/issues/8791 and I saw at the end it
>> was recommended (by marmarek) that the issue reporter forward the issue to
>> this list. I searched the list, but as I didn't see it in the list already,
>> I'm doing that now.
>>
>> Hardware:
>> I have an AMD Threadripper 7960X on a ASRock TRX50 WS motherboard. The
>> Qubes reporter had a Threadripper 3970X on an ASUS Prime TRX40-Pro
>> Motherboard. I saw a 3rd issue report of a similar issue on another
>> Threadripper, so I think this may be Threadripper-specific.
>>
>> Setup:
>> The QuebesOS reporter was using Qubes Installer.
>> My install was that I had a fresh install of Debian 12 (no gui), and then
>> did `apt install xen-system-amd64` and rebooted.
>>
>> The issue:
>> Any boot of Xen on the hardware results in a halted machine. When
>> monitoring the logs with `vga=,keep`, we get:
>>
>> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
>> (XEN) Freed 644kB init memory
>> mapping kernel into physical memory
>> about to get started…
>> xen hypervisor allocated kernel memory conflicts with E820
>
> So first of all (the title doesn't say it) this is a Linux Dom0 issue.
> Whether or not needing addressing in Xen is unknown at this point.
>
>> (XEN) Hardware Dom0 halted: halting machine
>>
>> None of the settings I or the Qubes reporter have tried have been able to
>> get past this failure.
>>
>> I am happy to provide debugging support.
>
> Well, the crucial piece of data initially is going to be: What's the
> E820 map Xen gets to see, what's the E820 map Dom0 gets to see, and
> what address range is the conflict detected for? The first question
> is possible to answer by supplying a serial log. The second question
> likely means adding some debugging code to either Xen or Linux. The
> answer to third question may be possible to infer from the other
> data, but would likely be better to obtain explicitly by adjusting /
> amending the message Linux emits.

We've already hit a similar issue, because Xen doesn't take the
reserved memory regions into account when loading the dom0 kernel (even
if it is relocatable). It can be worked around by changing
CONFIG_PHYSICAL_START accordingly in the kernel config.

Let me provide more details on how to get the info Jan requested:

1) in the xen cmdline add: e820-verbose=true console_to_ring

2) in the dom0 kernel cmdline add: earlyprintk=xen

3) change the xen log message emitted by the linux kernel to print the
conflicting address, like below

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index cfa99e8f054b..ad88b700d58e 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -717,7 +717,7 @@ static void __init xen_reserve_xen_mfnlist(void)
 	xen_relocate_p2m();
 	memblock_phys_free(start, size);
 }
-
+void xen_raw_printk(const char *fmt, ...);
 /**
  * xen_memory_setup - Hook for machine specific memory setup.
  **/
@@ -853,7 +853,8 @@ char * __init xen_memory_setup(void)
 	 */
 	if (xen_is_e820_reserved(__pa_symbol(_text),
 			__pa_symbol(__bss_stop) - __pa_symbol(_text))) {
-		xen_raw_console_write("Xen hypervisor allocated kernel memory conflicts with E820 map\n");
+		xen_raw_printk("Xen hypervisor allocated kernel memory conflicts with E820 map: %#lx - %#lx\n",
+			       __pa_symbol(_text), __pa_symbol(__bss_stop));
 		BUG();
 	}

>
> Jan
>
Re: E820 memory allocation issue on Threadripper platforms
On 11.01.24 09:37, Jan Beulich wrote:
> On 11.01.2024 03:29, Patrick Plenefisch wrote:
>> Hi,
>>
>> I ran into a memory allocation issue, I think. It is the same as
>> https://github.com/QubesOS/qubes-issues/issues/8791 and I saw at the end it
>> was recommended (by marmarek) that the issue reporter forward the issue to
>> this list. I searched the list, but as I didn't see it in the list already,
>> I'm doing that now.
>>
>> Hardware:
>> I have an AMD Threadripper 7960X on a ASRock TRX50 WS motherboard. The
>> Qubes reporter had a Threadripper 3970X on an ASUS Prime TRX40-Pro
>> Motherboard. I saw a 3rd issue report of a similar issue on another
>> Threadripper, so I think this may be Threadripper-specific.
>>
>> Setup:
>> The QuebesOS reporter was using Qubes Installer.
>> My install was that I had a fresh install of Debian 12 (no gui), and then
>> did `apt install xen-system-amd64` and rebooted.
>>
>> The issue:
>> Any boot of Xen on the hardware results in a halted machine. When
>> monitoring the logs with `vga=,keep`, we get:
>>
>> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
>> (XEN) Freed 644kB init memory
>> mapping kernel into physical memory
>> about to get started…
>> xen hypervisor allocated kernel memory conflicts with E820
>
> So first of all (the title doesn't say it) this is a Linux Dom0 issue.
> Whether or not needing addressing in Xen is unknown at this point.
>
>> (XEN) Hardware Dom0 halted: halting machine
>>
>> None of the settings I or the Qubes reporter have tried have been able to
>> get past this failure.
>>
>> I am happy to provide debugging support.
>
> Well, the crucial piece of data initially is going to be: What's the
> E820 map Xen gets to see, what's the E820 map Dom0 gets to see, and
> what address range is the conflict detected for? The first question
> is possible to answer by supplying a serial log. The second question
> likely means adding some debugging code to either Xen or Linux. The
> answer to third question may be possible to infer from the other
> data, but would likely be better to obtain explicitly by adjusting /
> amending the message Linux emits.

The needed information should all be in the hypervisor messages.

The hypervisor is initially presenting a memory map to dom0 which is not the
same as the native memory map. Dom0 tries to rearrange its memory layout to
be compatible with the native memory map.

The message seen ("xen hypervisor allocated kernel memory conflicts with E820")
tells us that the kernel position conflicts with the native memory map
(at least one guest pfn occupied by the kernel would end up at a location not
populated with RAM after the rearrangement of memory).

In theory it would be possible to cover this case, too, but it would be quite
cumbersome. Right now only the initrd is allowed to conflict with the memory map
(it will be moved in this case), kernel and initial page table conflicts are not
handled.

When I added the conflict handling nearly 10 years ago, there was no hardware
known to have memory holes at addresses which would conflict with Xen's initial
idea of dom0 memory layout.

I can look into this later, but right now I'm just about to go offline probably
until end of January.


Juergen
Re: E820 memory allocation issue on Threadripper platforms
I managed to set up serial access and saved the output with the requested
flags; the logs are attached.

Patrick

Re: E820 memory allocation issue on Threadripper platforms
On 16.01.2024 01:22, Patrick Plenefisch wrote:
> I managed to set up serial access and saved the output with the requested
> flags as the attached logs

Thanks. While you didn't ...

> On Thu, Jan 11, 2024 at 5:13 AM Juergen Gross <jgross@suse.com> wrote:
>> On 11.01.24 09:37, Jan Beulich wrote:
>>> On 11.01.2024 03:29, Patrick Plenefisch wrote:
>>>> I ran into a memory allocation issue, I think. It is the same as
>>>> https://github.com/QubesOS/qubes-issues/issues/8791 and I saw at the
>> end it
>>>> was recommended (by marmarek) that the issue reporter forward the issue
>> to
>>>> this list. I searched the list, but as I didn't see it in the list
>> already,
>>>> I'm doing that now.
>>>>
>>>> Hardware:
>>>> I have an AMD Threadripper 7960X on a ASRock TRX50 WS motherboard. The
>>>> Qubes reporter had a Threadripper 3970X on an ASUS Prime TRX40-Pro
>>>> Motherboard. I saw a 3rd issue report of a similar issue on another
>>>> Threadripper, so I think this may be Threadripper-specific.
>>>>
>>>> Setup:
>>>> The QuebesOS reporter was using Qubes Installer.
>>>> My install was that I had a fresh install of Debian 12 (no gui), and
>> then
>>>> did `apt install xen-system-amd64` and rebooted.
>>>>
>>>> The issue:
>>>> Any boot of Xen on the hardware results in a halted machine. When
>>>> monitoring the logs with `vga=,keep`, we get:
>>>>
>>>> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch
>> input)
>>>> (XEN) Freed 644kB init memory
>>>> mapping kernel into physical memory
>>>> about to get started…
>>>> xen hypervisor allocated kernel memory conflicts with E820
>>>
>>> So first of all (the title doesn't say it) this is a Linux Dom0 issue.
>>> Whether or not needing addressing in Xen is unknown at this point.
>>>
>>>> (XEN) Hardware Dom0 halted: halting machine
>>>>
>>>> None of the settings I or the Qubes reporter have tried have been able
>> to
>>>> get past this failure.
>>>>
>>>> I am happy to provide debugging support.
>>>
>>> Well, the crucial piece of data initially is going to be: What's the
>>> E820 map Xen gets to see, what's the E820 map Dom0 gets to see, and
>>> what address range is the conflict detected for? The first question
>>> is possible to answer by supplying a serial log. The second question
>>> likely means adding some debugging code to either Xen or Linux. The
>>> answer to third question may be possible to infer from the other
>>> data, but would likely be better to obtain explicitly by adjusting /
>>> amending the message Linux emits.

... fiddle with the Linux message, ...

>> The needed information should all be in the hypervisor messages.
>>
>> The hypervisor is initially presenting a memory map to dom0 which is not
>> the
>> same as the native memory map. Dom0 tries to rearrange its memory layout to
>> be compatible with the native memory map.
>>
>> The seen message ("xen hypervisor allocated kernel memory conflicts with
>> E820")
>> tells us that the kernel position is conflicting with the native memory map
>> (at least one guest pfn occupied by the kernel would be at a non-RAM
>> populated
>> location after rearrangement of memory).
>>
>> In theory it would be possible to cover this case, too, but it would be
>> quite
>> cumbersome. Right now only the initrd is allowed to conflict with the
>> memory map
>> (it will be moved in this case), kernel and initial page table conflicts
>> are not
>> handled.
>>
>> When I added the conflict handling nearly 10 years ago, there was no
>> hardware
>> known to have memory holes at addresses which would conflict with Xen's
>> initial
>> idea of dom0 memory layout.

... as per

(XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x4a00000

there's an overlap with not exactly a hole, but with an
EfiACPIMemoryNVS region:

(XEN) 0000000100000-0000003159fff type=2 attr=000000000000000f
(XEN) 000000315a000-0000003ffffff type=7 attr=000000000000000f
(XEN) 0000004000000-0000004045fff type=10 attr=000000000000000f
(XEN) 0000004046000-0000009afefff type=7 attr=000000000000000f

(the 3rd of the 4 lines). Considering there's another region higher
up:

(XEN) 00000a747f000-00000a947efff type=10 attr=000000000000000f

I'm inclined to say it is poor firmware (or, far less likely, boot
loader) behavior to clobber a rather low and entirely arbitrary RAM
range, rather than consolidating all such regions near the top of
RAM below 4 GiB. There are further such odd regions, btw:

(XEN) 0000009aff000-0000009ffffff type=0 attr=000000000000000f
...
(XEN) 000000b000000-000000b020fff type=0 attr=000000000000000f

If the kernel image were sufficiently larger, these could become
a problem as well. OTOH, if the kernel weren't built with
CONFIG_PHYSICAL_START=0x1000000, i.e. to start at 16 MiB, but at, say,
2 MiB, things should apparently work even with this unusual memory
layout (until the kernel grows enough to run into that very region
again).
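
The arithmetic behind this can be checked with a small C sketch. The
addresses are taken from the log lines quoted in this message; the helper
names (ranges_overlap, image_hits_nvs) are made up for illustration, and
the sketch assumes the image keeps its current size when built for a
different start address, which may not hold exactly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the serial log quoted in this message. */
#define KERNEL_START UINT64_C(0x1000000) /* Dom0 kernel paddr start */
#define KERNEL_END   UINT64_C(0x4a00000) /* Dom0 kernel paddr end (exclusive) */
#define NVS_START    UINT64_C(0x4000000) /* type=10 region, first byte */
#define NVS_LAST     UINT64_C(0x4045fff) /* type=10 region, last byte */

/* True if [a_start, a_end) and [b_start, b_last] share at least one byte. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
                           uint64_t b_start, uint64_t b_last)
{
    return a_start <= b_last && b_start < a_end;
}

/*
 * Hypothetical helper: would an image of the same size, placed at
 * new_start, collide with the ACPI NVS region?
 */
static bool image_hits_nvs(uint64_t new_start)
{
    uint64_t size = KERNEL_END - KERNEL_START; /* 0x3a00000, i.e. 58 MiB */

    return ranges_overlap(new_start, new_start + size, NVS_START, NVS_LAST);
}
```

With these numbers, image_hits_nvs(0x1000000) is true, while
image_hits_nvs(0x200000) (a 2 MiB start) is false, because the image would
then end at 0x3c00000, below the start of the type=10 region.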

It remains to be seen how far it is reasonably possible to work
around this in the kernel. While (sadly) still unsupported, in the
meantime you may want to consider running Dom0 in PVH mode.

Jan
Re: E820 memory allocation issue on Threadripper platforms
On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich <jbeulich@suse.com> wrote:

> On 16.01.2024 01:22, Patrick Plenefisch wrote:
> > I managed to set up serial access and saved the output with the requested
> > flags as the attached logs
>
> Thanks. While you didn't ...
>
>
> ... fiddle with the Linux message, ...
>

I last built the kernel over a decade ago, and so was hoping to not have to
look up how to do that again, but I can research how to go about that again
if it would help?


>
> ... as per
>
> (XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x4a00000
>
> there's an overlap with not exactly a hole, but with an
> EfiACPIMemoryNVS region:
>
> (XEN) 0000000100000-0000003159fff type=2 attr=000000000000000f
> (XEN) 000000315a000-0000003ffffff type=7 attr=000000000000000f
> (XEN) 0000004000000-0000004045fff type=10 attr=000000000000000f
> (XEN) 0000004046000-0000009afefff type=7 attr=000000000000000f
>
> (the 3rd of the 4 lines). Considering there's another region higher
> up:
>
> (XEN) 00000a747f000-00000a947efff type=10 attr=000000000000000f
>
> I'm inclined to say it is poor firmware (or, far less likely, boot
> loader) behavior to clobber a rather low and entirely arbitrary RAM
>

The bootloader is GRUB 2.06 (EFI platform) as packaged by Debian 12.



> range, rather than consolidating all such regions near the top of
> RAM below 4Gb. There are further such odd regions, btw:
>
> (XEN) 0000009aff000-0000009ffffff type=0 attr=000000000000000f
> ...
> (XEN) 000000b000000-000000b020fff type=0 attr=000000000000000f
>
> If the kernel image was sufficiently much larger, these could become
> a problem as well. Otoh if the kernel wasn't built with
> CONFIG_PHYSICAL_START=0x1000000, i.e. to start at 16Mb, but at, say,
> 2Mb, things should apparently work even with this unusual memory
> layout (until the kernel would grow enough to again run into that
> very region).
>

I'm currently talking to the vendor's support team and testing a beta BIOS
for unrelated reasons. Is there something specific I should forward to
them, either as a question or as a request for a fix?

As someone who hasn't built a kernel in over a decade, should I figure out
how to do a kernel build with CONFIG_PHYSICAL_START=0x2000000 and report
back?


> It remains to be seen in how far it is reasonably possible to work
> around this in the kernel. While (sadly) still unsupported, in the
> meantime you may want to consider running Dom0 in PVH mode.
>

I tried this by adding dom0=pvh, and instead got this boot error:

(XEN) xenoprof: Initialization failed. AMD processor family 25 is not supported
(XEN) NX (Execute Disable) protection active
(XEN) Dom0 has maximum 1400 PIRQs
(XEN) *** Building a PVH Dom0 ***
(XEN) Failed to load kernel: -1
(XEN) Xen dom0 kernel broken ELF: <NULL>
(XEN) Failed to load Dom0 kernel
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not construct domain 0
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...




>
> Jan
>
Re: E820 memory allocation issue on Threadripper platforms
On 17.01.2024 07:12, Patrick Plenefisch wrote:
> On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich <jbeulich@suse.com> wrote:
>
>> On 16.01.2024 01:22, Patrick Plenefisch wrote:
>>> I managed to set up serial access and saved the output with the requested
>>> flags as the attached logs
>>
>> Thanks. While you didn't ...
>>
>>
>> ... fiddle with the Linux message, ...
>>
>
> I last built the kernel over a decade ago, and so was hoping to not have to
> look up how to do that again, but I can research how to go about that again
> if it would help?
>
>
>>
>> ... as per
>>
>> (XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x4a00000
>>
>> there's an overlap with not exactly a hole, but with an
>> EfiACPIMemoryNVS region:
>>
>> (XEN) 0000000100000-0000003159fff type=2 attr=000000000000000f
>> (XEN) 000000315a000-0000003ffffff type=7 attr=000000000000000f
>> (XEN) 0000004000000-0000004045fff type=10 attr=000000000000000f
>> (XEN) 0000004046000-0000009afefff type=7 attr=000000000000000f
>>
>> (the 3rd of the 4 lines). Considering there's another region higher
>> up:
>>
>> (XEN) 00000a747f000-00000a947efff type=10 attr=000000000000000f
>>
>> I'm inclined to say it is poor firmware (or, far less likely, boot
>> loader) behavior to clobber a rather low and entirely arbitrary RAM
>>
>
> Bootloader is Grub 2.06 EFI platform as packaged by Debian 12
>
>
>
>> range, rather than consolidating all such regions near the top of
>> RAM below 4Gb. There are further such odd regions, btw:
>>
>> (XEN) 0000009aff000-0000009ffffff type=0 attr=000000000000000f
>> ...
>> (XEN) 000000b000000-000000b020fff type=0 attr=000000000000000f
>>
>> If the kernel image was sufficiently much larger, these could become
>> a problem as well. Otoh if the kernel wasn't built with
>> CONFIG_PHYSICAL_START=0x1000000, i.e. to start at 16Mb, but at, say,
>> 2Mb, things should apparently work even with this unusual memory
>> layout (until the kernel would grow enough to again run into that
>> very region).
>
> I'm currently talking to the vendor's support team and testing a beta BIOS
> for unrelated reasons, is there something specific I should forward to
> them, either as a question or as a request for a fix?

Well, first one would need to figure out whether the "interesting" regions
are being put in place by the firmware or the boot loader. If it's firmware
(pretty likely at least for the region you're having trouble with), you
may want to ask them to change where they place that specific data.

> As someone who hasn't built a kernel in over a decade, should I figure out
> how to do a kernel build with CONFIG_PHYSICAL_START=0x2000000 and report
> back?

That was largely a suggestion to perhaps allow you to gain some
workable setup. It would be of interest to us largely for completeness.

>> It remains to be seen in how far it is reasonably possible to work
>> around this in the kernel. While (sadly) still unsupported, in the
>> meantime you may want to consider running Dom0 in PVH mode.
>>
>
> I tried this by adding dom0=pvh, and instead got this boot error:
>
> (XEN) xenoprof: Initialization failed. AMD processor family 25 is not
> supported
> (XEN) NX (Execute Disable) protection active
> (XEN) Dom0 has maximum 1400 PIRQs
> (XEN) *** Building a PVH Dom0 ***
> (XEN) Failed to load kernel: -1
> (XEN) Xen dom0 kernel broken ELF: <NULL>
> (XEN) Failed to load Dom0 kernel
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Could not construct domain 0
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...

Hmm, that's sad, all the more so because the error messages aren't really
informative. You did check though that your kernel is PVH-capable?
(With a debug build of Xen, and with a suitably high logging level,
various of the ELF properties would be logged. Such output may or
may not give further hints towards what's actually wrong. Since you
are using 4.17, though, this would further require you to pull in commit
ea3dabfb80d7 ["x86/PVH: allow Dom0 ELF parsing to be verbose"].)

But wait - aren't you running into the same collision there with
that memory region? I think that explains the unhelpful output.
Whereas I assume the native kernel can deal with that as long as
it's built with CONFIG_RELOCATABLE=y. I don't think we want to
get into the business of interpreting the kernel's internal
representation of the relocations needed, so it's not really
clear to me what we might do in such a case. Perhaps the only way
is to signal to the kernel that it needs to apply relocations
itself (which in turn would require the kernel to signal to us
that it's capable of doing so). Cc-ing Roger in case he has any
neat idea.

Jan
Re: E820 memory allocation issue on Threadripper platforms
On Wed, Jan 17, 2024 at 01:12:30AM -0500, Patrick Plenefisch wrote:
> On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> > On 16.01.2024 01:22, Patrick Plenefisch wrote:
> > It remains to be seen in how far it is reasonably possible to work
> > around this in the kernel. While (sadly) still unsupported, in the
> > meantime you may want to consider running Dom0 in PVH mode.
> >
>
> I tried this by adding dom0=pvh, and instead got this boot error:
>
> (XEN) xenoprof: Initialization failed. AMD processor family 25 is not
> supported
> (XEN) NX (Execute Disable) protection active
> (XEN) Dom0 has maximum 1400 PIRQs
> (XEN) *** Building a PVH Dom0 ***
> (XEN) Failed to load kernel: -1
> (XEN) Xen dom0 kernel broken ELF: <NULL>
> (XEN) Failed to load Dom0 kernel
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Could not construct domain 0
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...

PVH dom0 also re-uses the host memory map in order to build the dom0
memory map, and will fail to load the kernel if the physical addresses
in the ELF program headers do not fall within RAM regions (or, more
generally, are destination guest physical addresses for which
hvm_copy_to_guest_phys() returns failure).
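
As an illustration only, the containment condition can be sketched in C as
below. This is not the Xen code path: struct mem_region and
segment_in_ram() are hypothetical names, the map is a simplified
[start, end) list, and adjacent RAM entries are assumed to have been
merged beforehand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory-map entry (not a Xen structure). */
struct mem_region {
    uint64_t start;  /* first byte of the region */
    uint64_t end;    /* one past the last byte (exclusive) */
    bool     is_ram; /* true for usable RAM, false for anything else */
};

/*
 * True if the segment [paddr, paddr + size) lies entirely inside a single
 * RAM entry of the map. A segment that starts in RAM but runs into a
 * non-RAM entry is rejected.
 */
static bool segment_in_ram(const struct mem_region *map, size_t nr,
                           uint64_t paddr, uint64_t size)
{
    for (size_t i = 0; i < nr; i++) {
        if (map[i].is_ram &&
            paddr >= map[i].start &&
            paddr < map[i].end &&
            size <= map[i].end - paddr)
            return true;
    }
    return false;
}
```

Fed with a map derived from the log earlier in the thread (RAM up to
0x4000000, a non-RAM entry at 0x4000000-0x4045fff, RAM again from
0x4046000), a 0x3a00000-byte segment at 0x1000000 is rejected, since it
crosses into the non-RAM entry.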

Roger.
Re: E820 memory allocation issue on Threadripper platforms
On Wed, Jan 17, 2024 at 09:46:27AM +0100, Jan Beulich wrote:
> On 17.01.2024 07:12, Patrick Plenefisch wrote:
> > On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 16.01.2024 01:22, Patrick Plenefisch wrote:
> >> It remains to be seen in how far it is reasonably possible to work
> >> around this in the kernel. While (sadly) still unsupported, in the
> >> meantime you may want to consider running Dom0 in PVH mode.
> >>
> >
> > I tried this by adding dom0=pvh, and instead got this boot error:
> >
> > (XEN) xenoprof: Initialization failed. AMD processor family 25 is not
> > supported
> > (XEN) NX (Execute Disable) protection active
> > (XEN) Dom0 has maximum 1400 PIRQs
> > (XEN) *** Building a PVH Dom0 ***
> > (XEN) Failed to load kernel: -1
> > (XEN) Xen dom0 kernel broken ELF: <NULL>
> > (XEN) Failed to load Dom0 kernel
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Could not construct domain 0
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Reboot in five seconds...
>
> Hmm, that's sad. The more that the error messages aren't really
> informative. You did check though that your kernel is PVH-capable?
> (With a debug build of Xen, and with suitably high logging level,
> various of the ELF properties would be logged. Such output may or
> may not give further hints towards what's actually wrong. Albeit
> you using 4.17 this would further require you to pull in commit
> ea3dabfb80d7 ["x86/PVH: allow Dom0 ELF parsing to be verbose"].)
>
> But wait - aren't you running into the same collision there with
> that memory region? I think that explains the unhelpful output.

I think so: elf_memcpy() in elf_load_image() is failing to load to the
given destination address. The error messages should be more helpful
there.

> Whereas I assume the native kernel can deal with that as long as
> it's built with CONFIG_RELOCATABLE=y. I don't think we want to
> get into the business of interpreting the kernel's internal
> representation of the relocations needed, so it's not really
> clear to me what we might do in such a case. Perhaps the only way
> is to signal to the kernel that it needs to apply relocations
> itself (which in turn would require the kernel to signal to us
> that it's capable of doing so). Cc-ing Roger in case he has any
> neat idea.

Hm, no, not really.

We could do like multiboot2: the kernel provides us with some
placement data (min/max addresses, alignment), and Xen lets the
kernel deal with relocations itself.
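
A minimal sketch of what the loader side of such a scheme might look like,
in C. Nothing here is an existing Xen or Linux interface: struct placement
is only loosely modelled on the min/max/alignment idea above, and
pick_load_addr() is a hypothetical first-fit search over usable RAM
regions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Usable RAM as [start, end); hypothetical, not a Xen structure. */
struct ram_region {
    uint64_t start;
    uint64_t end;
};

/* Hypothetical placement constraints advertised by the kernel. */
struct placement {
    uint64_t min_addr; /* lowest acceptable load address */
    uint64_t max_addr; /* the image must end at or below this address */
    uint64_t align;    /* required alignment, must be a power of two */
};

/*
 * First fit: return the lowest suitably aligned address at which an image
 * of `size` bytes fits into one RAM region while honouring the constraints.
 */
static bool pick_load_addr(const struct ram_region *ram, size_t nr,
                           const struct placement *p, uint64_t size,
                           uint64_t *out)
{
    for (size_t i = 0; i < nr; i++) {
        uint64_t lo = ram[i].start > p->min_addr ? ram[i].start : p->min_addr;
        uint64_t hi = ram[i].end < p->max_addr ? ram[i].end : p->max_addr;
        uint64_t base = (lo + p->align - 1) & ~(p->align - 1); /* round up */

        if (base < lo)  /* rounding up wrapped around */
            continue;
        if (base <= hi && size <= hi - base) {
            *out = base;
            return true;
        }
    }
    return false;
}
```

The kernel would still have to apply its own relocations for whatever
address is chosen; the sketch only covers picking that address.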

Additionally we could support the kernel providing a section with the
relocations and apply them from Xen, but that's likely hm, complicated
at best, as I don't even know which kinds of relocations we would have
to support.

I'm not sure how Linux deals with this in the bare metal case: are
relocations done after decompressing and before jumping into the entry
point?

I would also need to check FreeBSD at least to have an idea of how
it's done there.

Thanks, Roger.
Re: E820 memory allocation issue on Threadripper platforms
On 17.01.2024 11:13, Roger Pau Monné wrote:
> On Wed, Jan 17, 2024 at 09:46:27AM +0100, Jan Beulich wrote:
>> Whereas I assume the native kernel can deal with that as long as
>> it's built with CONFIG_RELOCATABLE=y. I don't think we want to
>> get into the business of interpreting the kernel's internal
>> representation of the relocations needed, so it's not really
>> clear to me what we might do in such a case. Perhaps the only way
>> is to signal to the kernel that it needs to apply relocations
>> itself (which in turn would require the kernel to signal to us
>> that it's capable of doing so). Cc-ing Roger in case he has any
>> neat idea.
>
> Hm, no, not really.
>
> We could do like multiboot2: the kernel provides us with some
> placement data (min/max addresses, alignment), and Xen lets the
> kernel deal with relocations itself.

Requiring the kernel's entry point to take a sufficiently different
flow then compared to how it's today, I expect.

> Additionally we could support the kernel providing a section with the
> relocations and apply them from Xen, but that's likely hm, complicated
> at best, as I don't even know which kinds of relocations we would have
> to support.

If the kernel was properly linked to a PIE, there'd generally be only
one kind of relocation (per arch) that ought to need dealing with -
for x86-64 that's R_X86_64_RELATIVE iirc. Hence why (I suppose) they
don't use ELF relocation structures (for being wastefully large), but
rather a more compact custom representation. Even without building PIE
(presumably in part not possible because of how per-CPU data needs
dealing with), they get away with handling just very few relocs (and
from looking at the reloc processing code I'm getting the impression
they mistreat R_X86_64_32 as being the same as R_X86_64_32S, when it
isn't; needing to get such quirks right is one more aspect of why I
think we should leave relocation handling to the kernel).
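
For readers unfamiliar with the two relocation types: per the x86-64 psABI, R_X86_64_32 requires the value to survive truncation to 32 bits followed by zero-extension, while R_X86_64_32S requires it to survive sign-extension. The sketch below only illustrates that difference; the helper names are hypothetical and it makes no claim about what the Linux relocation code actually does.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
MASK32 = 0xFFFFFFFF


def fits_r_x86_64_32(value: int) -> bool:
    """True if the 64-bit value equals its low 32 bits zero-extended."""
    value &= MASK64
    return value == (value & MASK32)


def fits_r_x86_64_32s(value: int) -> bool:
    """True if the 64-bit value equals its low 32 bits sign-extended."""
    value &= MASK64
    low = value & MASK32
    if low & 0x80000000:
        sign_extended = low | 0xFFFFFFFF00000000
    else:
        sign_extended = low
    return value == sign_extended


# Example values chosen for illustration only.
for v in (0x1000000, 0x81000000, 0xFFFFFFFF81000000):
    print(f"{v:#018x}  32: {fits_r_x86_64_32(v)}  32S: {fits_r_x86_64_32s(v)}")
```

A low address such as 0x1000000 satisfies both, a value with bit 31 set and a zero upper half satisfies only R_X86_64_32, and a typical high kernel virtual address satisfies only R_X86_64_32S.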

> I'm not sure how Linux deals with this in the bare metal case, are
> relocations done after decompressing and before jumping into the entry
> point?

That's how it was last time I looked, yes.

Jan

> I would also need to check FreeBSD at least to have an idea of how
> it's done there.
>
> Thanks, Roger.
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Tue, Jan 16, 2024 at 10:33:26AM +0100, Jan Beulich wrote:
> ... as per
>
> (XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x4a00000
>
> there's an overlap with not exactly a hole, but with an
> EfiACPIMemoryNVS region:
>
> (XEN) 0000000100000-0000003159fff type=2 attr=000000000000000f
> (XEN) 000000315a000-0000003ffffff type=7 attr=000000000000000f
> (XEN) 0000004000000-0000004045fff type=10 attr=000000000000000f
> (XEN) 0000004046000-0000009afefff type=7 attr=000000000000000f
>
> (the 3rd of the 4 lines). Considering there's another region higher
> up:
>
> (XEN) 00000a747f000-00000a947efff type=10 attr=000000000000000f
>
> I'm inclined to say it is poor firmware (or, far less likely, boot
> loader) behavior to clobber a rather low and entirely arbitrary RAM
> range, rather than consolidating all such regions near the top of
> RAM below 4Gb.

FWIW, we have two more similar reports, with different motherboards and
firmware versions, but the common factor is Threadripper CPU. It doesn't
exclude firmware issue (it can be an issue in some common template, like
edk2?), but makes it a bit less likely.

> There are further such odd regions, btw:
>
> (XEN) 0000009aff000-0000009ffffff type=0 attr=000000000000000f
> ...
> (XEN) 000000b000000-000000b020fff type=0 attr=000000000000000f
>
> If the kernel image was sufficiently much larger, these could become
> a problem as well. Otoh if the kernel wasn't built with
> CONFIG_PHYSICAL_START=0x1000000, i.e. to start at 16Mb, but at, say,
> 2Mb, things should apparently work even with this unusual memory
> layout (until the kernel would grow enough to again run into that
> very region).

Shouldn't CONFIG_RELOCATABLE=y take care of this? At least in the case
of Qubes OS, it's enabled and the issue still happens.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Wed, Jan 17, 2024 at 11:40:20AM +0100, Jan Beulich wrote:
> On 17.01.2024 11:13, Roger Pau Monné wrote:
> > On Wed, Jan 17, 2024 at 09:46:27AM +0100, Jan Beulich wrote:
> >> Whereas I assume the native kernel can deal with that as long as
> >> it's built with CONFIG_RELOCATABLE=y. I don't think we want to
> >> get into the business of interpreting the kernel's internal
> >> representation of the relocations needed, so it's not really
> >> clear to me what we might do in such a case. Perhaps the only way
> >> is to signal to the kernel that it needs to apply relocations
> >> itself (which in turn would require the kernel to signal to us
> >> that it's capable of doing so). Cc-ing Roger in case he has any
> >> neat idea.
> >
> > Hm, no, not really.
> >
> > We could do like multiboot2: the kernel provides us with some
> > placement data (min/max addresses, alignment), and Xen lets the
> > kernel deal with relocations itself.
>
> Requiring the kernel's entry point to take a sufficiently different
> flow then compared to how it's today, I expect.

Indeed, I would expect that.

> > Additionally we could support the kernel providing a section with the
> > relocations and apply them from Xen, but that's likely hm, complicated
> > at best, as I don't even know which kinds of relocations we would have
> > to support.
>
> If the kernel was properly linked to a PIE, there'd generally be only
> one kind of relocation (per arch) that ought to need dealing with -
> for x86-64 that's R_X86_64_RELATIVE iirc. Hence why (I suppose) they
> don't use ELF relocation structures (for being wastefully large), but
> rather a more compact custom representation. Even without building PIE
> (presumably in part not possible because of how per-CPU data needs
> dealing with), they get away with handling just very few relocs (and
> from looking at the reloc processing code I'm getting the impression
> they mistreat R_X86_64_32 as being the same as R_X86_64_32S, when it
> isn't; needing to get such quirks right is one more aspect of why I
> think we should leave relocation handling to the kernel).

I would have to look into this in more detail, but I think leaving any
relocs for the OS to perform would be my initial approach.

> > I'm not sure how Linux deals with this in the bare metal case, are
> > relocations done after decompressing and before jumping into the entry
> > point?
>
> That's how it was last time I looked, yes.

I've created a gitlab ticket for it:

https://gitlab.com/xen-project/xen/-/issues/180

I don't have time to look into this right now, but I think it's
important enough that it should be tracked so we don't forget.

For PV it's a bit more unclear how we want to deal with it, as it's
IMO a specific Linux behavior that makes it fail to boot.

Roger.
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Wed, Jan 17, 2024 at 01:06:53PM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Jan 16, 2024 at 10:33:26AM +0100, Jan Beulich wrote:
> > ... as per
> >
> > (XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x4a00000
> >
> > there's an overlap with not exactly a hole, but with an
> > EfiACPIMemoryNVS region:
> >
> > (XEN) 0000000100000-0000003159fff type=2 attr=000000000000000f
> > (XEN) 000000315a000-0000003ffffff type=7 attr=000000000000000f
> > (XEN) 0000004000000-0000004045fff type=10 attr=000000000000000f
> > (XEN) 0000004046000-0000009afefff type=7 attr=000000000000000f
> >
> > (the 3rd of the 4 lines). Considering there's another region higher
> > up:
> >
> > (XEN) 00000a747f000-00000a947efff type=10 attr=000000000000000f
> >
> > I'm inclined to say it is poor firmware (or, far less likely, boot
> > loader) behavior to clobber a rather low and entirely arbitrary RAM
> > range, rather than consolidating all such regions near the top of
> > RAM below 4Gb.
>
> FWIW, we have two more similar reports, with different motherboards and
> firmware versions, but the common factor is Threadripper CPU. It doesn't
> exclude firmware issue (it can be an issue in some common template, like
> edk2?), but makes it a bit less likely.
>
> > There are further such odd regions, btw:
> >
> > (XEN) 0000009aff000-0000009ffffff type=0 attr=000000000000000f
> > ...
> > (XEN) 000000b000000-000000b020fff type=0 attr=000000000000000f
> >
> > If the kernel image was sufficiently much larger, these could become
> > a problem as well. Otoh if the kernel wasn't built with
> > CONFIG_PHYSICAL_START=0x1000000, i.e. to start at 16Mb, but at, say,
> > 2Mb, things should apparently work even with this unusual memory
> > layout (until the kernel would grow enough to again run into that
> > very region).
>
> Shouldn't CONFIG_RELOCATABLE=y take care of this?

No, because PV doesn't use the native entry point.

> At least in the case
> of Qubes OS, it's enabled and the issue still happens.

I think for PV it should be possible to work around this in Linux
itself, maybe by changing the pfn -> mfn relations of the kernel
area?

Those overlaps are not real, as the loaded kernel is scattered across
mfns, and those certainly belong to RAM regions in the memory map.

For PVH it's going to require some changes in Xen itself.

Roger.
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On 17.01.2024 13:59, Roger Pau Monné wrote:
> On Wed, Jan 17, 2024 at 01:06:53PM +0100, Marek Marczykowski-Górecki wrote:
>> On Tue, Jan 16, 2024 at 10:33:26AM +0100, Jan Beulich wrote:
>>> ... as per
>>>
>>> (XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x4a00000
>>>
>>> there's an overlap with not exactly a hole, but with an
>>> EfiACPIMemoryNVS region:
>>>
>>> (XEN) 0000000100000-0000003159fff type=2 attr=000000000000000f
>>> (XEN) 000000315a000-0000003ffffff type=7 attr=000000000000000f
>>> (XEN) 0000004000000-0000004045fff type=10 attr=000000000000000f
>>> (XEN) 0000004046000-0000009afefff type=7 attr=000000000000000f
>>>
>>> (the 3rd of the 4 lines). Considering there's another region higher
>>> up:
>>>
>>> (XEN) 00000a747f000-00000a947efff type=10 attr=000000000000000f
>>>
>>> I'm inclined to say it is poor firmware (or, far less likely, boot
>>> loader) behavior to clobber a rather low and entirely arbitrary RAM
>>> range, rather than consolidating all such regions near the top of
>>> RAM below 4Gb.
>>
>> FWIW, we have two more similar reports, with different motherboards and
>> firmware versions, but the common factor is Threadripper CPU. It doesn't
>> exclude firmware issue (it can be an issue in some common template, like
>> edk2?), but makes it a bit less likely.
>>
>>> There are further such odd regions, btw:
>>>
>>> (XEN) 0000009aff000-0000009ffffff type=0 attr=000000000000000f
>>> ...
>>> (XEN) 000000b000000-000000b020fff type=0 attr=000000000000000f
>>>
>>> If the kernel image was sufficiently much larger, these could become
>>> a problem as well. Otoh if the kernel wasn't built with
>>> CONFIG_PHYSICAL_START=0x1000000, i.e. to start at 16Mb, but at, say,
>>> 2Mb, things should apparently work even with this unusual memory
>>> layout (until the kernel would grow enough to again run into that
>>> very region).
>>
>> Shouldn't CONFIG_RELOCATABLE=y take care of this?
>
> No, because PV doesn't use the native entry point.
>
>> At least in the case
>> of Qubes OS, it's enabled and the issue still happens.
>
> I think for PV it should be possible to work around this in Linux
> itself, maybe by changing the pfn -> mfn relations of the kernel
> area?

Right, that's what I understand Jürgen is intending to look into once
he's back.

Jan

> Those overlaps are not real, as the loaded kernel is scattered across
> mfns, and those certainly belong to RAM regions in the memory map.
>
> For PVH it's going to require some changes in Xen itself.
>
> Roger.
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Wed, Jan 17, 2024 at 3:46 AM Jan Beulich <jbeulich@suse.com> wrote:

> On 17.01.2024 07:12, Patrick Plenefisch wrote:
> > On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich <jbeulich@suse.com> wrote:
> >
> >> On 16.01.2024 01:22, Patrick Plenefisch wrote:
> >>> I managed to set up serial access and saved the output with the
> >>> requested flags as the attached logs
> >>
> >> Thanks. While you didn't ...
> >>
> >>
> >> ... fiddle with the Linux message, ...
> >>
> >
> > I last built the kernel over a decade ago, and so was hoping to not have
> > to look up how to do that again, but I can research how to go about that
> > again if it would help?
> >
>

The nice thing about Threadripper is the fast kernel build times. I have
added that patch to the kernel and confirmed:

about to get started...
Xen hypervisor allocated kernel memory conflicts with E820 map: 0x1000000 - 0x4400000
(XEN) Hardware Dom0 halted: halting machine
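
As a rough cross-check of that message, the sketch below tests the reported kernel range against the EfiACPIMemoryNVS region (0000004000000-0000004045fff, type=10) from the memory map quoted elsewhere in this thread. It assumes the reported range is half-open, that the image size stays the same when the start address moves, and that the region sits at the same address on every boot, none of which is guaranteed. The names are hypothetical.

```python
def ranges_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


# Size derived from the reported range 0x1000000 - 0x4400000 (assumed half-open).
KERNEL_SIZE = 0x4400000 - 0x1000000

# EfiACPIMemoryNVS region from the quoted map; inclusive end converted to half-open.
NVS_START, NVS_END = 0x4000000, 0x4045FFF + 1


def kernel_hits_nvs(start: int) -> bool:
    """Would a kernel of KERNEL_SIZE bytes loaded at `start` touch the NVS region?"""
    return ranges_overlap(start, start + KERNEL_SIZE, NVS_START, NVS_END)


print("16 MiB start conflicts:", kernel_hits_nvs(0x1000000))
print(" 2 MiB start conflicts:", kernel_hits_nvs(0x200000))
```

Under those assumptions the default 16 MiB start runs into the region, while a 2 MiB start ends below it.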




> >
> > I'm currently talking to the vendor's support team and testing a beta
> > BIOS for unrelated reasons, is there something specific I should forward
> > to them, either as a question or as a request for a fix?
>
> Well, first it would need figuring whether the "interesting" regions
> are being put in place by firmware or the boot loader. If it's firmware
> (pretty likely at least for the region you're having trouble with), you
> may want to ask them to re-do where they place that specific data.
>

This section changes boot-to-boot and grub vs EFI direct load, but my
untrained eyes don't see an obvious pattern. I've attached several logs.
Name format:

xen-XENVERSION_LOADER_KERNELNAME_TYPE.log

where XENVERSION is 4.17 (packaged in debian 12) or 4.18 (I built from
source) or 4.18p (I applied the patch you mention below and built from
source)

where LOADER is grub for grub2 (from debian 12) or UEFI (direct boot via
efibootmgr-configured UEFI entry)

where KERNELNAME is either empty (PVH failure), or linuxpatch (linux with
the patch requested above), or linuxoffset (with PHYSICAL_START=2MiB), or
linux6 (debian 12 kernel)

where TYPE is either pvh or pv
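
For anyone sorting through the attachments, the naming scheme above could be parsed along these lines. This is only a sketch: the pattern is inferred from the description and from the one file name mentioned in this thread (xen-4.18p_grub_linuxoffset_pvh.log), and how an empty KERNELNAME is written (with or without a doubled underscore) is a guess, so both spellings are accepted.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern; field values are the ones listed in the description.
LOG_NAME = re.compile(
    r"^xen-(?P<xen>4\.17|4\.18p?)"
    r"_(?P<loader>grub|uefi)"
    r"_(?:(?P<kernel>linuxpatch|linuxoffset|linux6)?_)?"
    r"(?P<type>pvh|pv)\.log$",
    re.IGNORECASE,
)


def parse_log_name(name: str) -> Optional[Dict[str, str]]:
    """Split a log file name into its fields, or return None if it does not match."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    return {
        "xen": m.group("xen"),
        "loader": m.group("loader"),
        "kernel": m.group("kernel") or "",  # empty means the PVH failure case
        "type": m.group("type"),
    }


print(parse_log_name("xen-4.18p_grub_linuxoffset_pvh.log"))
```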

For the two logs that actually booted (linuxoffset), I truncated them
during pcie initialization, but they did go all the way to give me a login
screen



>
> > As someone who hasn't built a kernel in over a decade, should I figure
> > out how to do a kernel build with CONFIG_PHYSICAL_START=0x2000000 and
> > report back?
>
> That was largely a suggestion to perhaps allow you to gain some
> workable setup. It would be of interest to us largely for completeness.
>

Typo aside, setting the start address to 2MiB works! It works better for PV,
while PVH has some graphics card issues, namely that I have to interact over
serial, and dmesg has some concerning radeon errors.
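
The typo is easy to miss in hex, so here is a small sketch that spells out the three values that appear in this thread. The labels and the helper name are mine, added only for illustration.

```python
MIB = 1 << 20  # bytes per MiB


def in_mib(addr: int) -> float:
    """Convert a byte address to MiB."""
    return addr / MIB


VALUES = [
    ("kernel default", 0x1000000),
    ("as typed in the question", 0x2000000),
    ("as actually tested", 0x200000),
]

for label, addr in VALUES:
    print(f"{label:26s} {addr:#x} = {in_mib(addr):g} MiB")
```

The value in the quoted question has one zero too many and would place the kernel at 32 MiB rather than 2 MiB.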


>
>
> Hmm, that's sad. The more that the error messages aren't really
> informative. You did check though that your kernel is PVH-capable?
> (With a debug build of Xen, and with suitably high logging level,
> various of the ELF properties would be logged. Such output may or
> may not give further hints towards what's actually wrong.




> Albeit
> you using 4.17 this would further require you to pull in commit
> ea3dabfb80d7 ["x86/PVH: allow Dom0 ELF parsing to be verbose"].)
>

This was applied in "4.18p" logs (above)


>
> But wait - aren't you running into the same collision there with
> that memory region? I think that explains the unhelpful output.
> Whereas I assume the native kernel can deal with that as long as
> it's built with CONFIG_RELOCATABLE=y. I don't think we want to
> get into the business of interpreting the kernel's internal
> representation of the relocations needed, so it's not really
> clear to me what we might do in such a case. Perhaps the only way
> is to signal to the kernel that it needs to apply relocations
> itself (which in turn would require the kernel to signal to us
> that it's capable of doing so). Cc-ing Roger in case he has any
> neat idea.
>

Yes, PVH, PV, and Relocatable are all enabled in the debian kernel I was
using, and then basing my kernel config on.

Said kernel, with its config file can be found at
https://packages.debian.org/bookworm/linux-image-6.1.0-17-amd64


>
> Jan
>
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Thu, Jan 18, 2024 at 01:23:56AM -0500, Patrick Plenefisch wrote:
> On Wed, Jan 17, 2024 at 3:46 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> > On 17.01.2024 07:12, Patrick Plenefisch wrote:
> > > On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich <jbeulich@suse.com> wrote:
> > >
> > >> On 16.01.2024 01:22, Patrick Plenefisch wrote:
> For the two logs that actually booted (linuxoffset), I truncated them
> during pcie initialization, but they did go all the way to give me a login
> screen

I'm not seeing any Linux output on the provided logs, they just seem
to contain Xen output ...

> >
> > > As someone who hasn't built a kernel in over a decade, should I figure
> > > out how to do a kernel build with CONFIG_PHYSICAL_START=0x2000000 and
> > > report back?
> >
> > That was largely a suggestion to perhaps allow you to gain some
> > workable setup. It would be of interest to us largely for completeness.
> >
>
> Typo aside, setting the boot to 2MiB works! It works better for PV, while
> PVH has some graphics card issues, namely that I have to interact over
> serial and dmesg has some concerning radeon errors

... and so the radeon errors mentioned here seem to be missing. IIRC
for radeon cards to work on PVH dom0 you will need a hypervisor with
the following commit:

https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=f69c5991595c92756860d038346569464c1b9ea1

(included in 4.18)

There were also some changes not long ago in order to propagate the
video console information from Xen into dom0, those are also included
in 4.18, but I don't recall in which Linux version they landed.

Anyway, would be good if you can provide the full Xen + Linux logs
when the radeon issue happens.

Regards, Roger.
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Thu, Jan 18, 2024 at 3:27 AM Roger Pau Monné <roger.pau@citrix.com>
wrote:

> On Thu, Jan 18, 2024 at 01:23:56AM -0500, Patrick Plenefisch wrote:
> > On Wed, Jan 17, 2024 at 3:46 AM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > > On 17.01.2024 07:12, Patrick Plenefisch wrote:
> > > > On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich <jbeulich@suse.com> wrote:
> > > >
> > > >> On 16.01.2024 01:22, Patrick Plenefisch wrote:
> > For the two logs that actually booted (linuxoffset), I truncated them
> > during pcie initialization, but they did go all the way to give me a
> > login screen
>
> I'm not seeing any Linux output on the provided logs, they just seem
> to contain Xen output ...
>
> > >
> > > > As someone who hasn't built a kernel in over a decade, should I
> > > > figure out how to do a kernel build with
> > > > CONFIG_PHYSICAL_START=0x2000000 and report back?
> > >
> > > That was largely a suggestion to perhaps allow you to gain some
> > > workable setup. It would be of interest to us largely for completeness.
> > >
> >
> > Typo aside, setting the boot to 2MiB works! It works better for PV, while
> > PVH has some graphics card issues, namely that I have to interact over
> > serial and dmesg has some concerning radeon errors
>
> ... and so the radeon errors mentioned here seem to be missing. IIRC
> for radeon cards to work on PVH dom0 you will need a hypervisor with
> the following commit:
>
>
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=f69c5991595c92756860d038346569464c1b9ea1
>
> (included in 4.18)
>

Hmm, that would mean I was already running with that commit, as I used 4.18
for that run.


>
> There were also some changes not long ago in order to propagate the
> video console information from Xen into dom0, those are also included
> in 4.18, but I don't recall in which Linux version they landed.
>
> Anyway, would be good if you can provide the full Xen + Linux logs
> when the radeon issue happens.
>

Luckily the Linux logs are mercifully short. Append this to
xen-4.18p_grub_linuxoffset_pvh.log:

[ 0.778770] i2c_designware AMDI0010:00: Unknown Synopsys component type: 0xffffffff
[ 0.914664] amd_gpio AMDI0030:00: error -EINVAL: IRQ index 0 not found
[ 0.930112] xen_mcelog: Failed to get CPU numbers
[ 8.324907] ccp 0000:06:00.5: pcim_iomap_regions failed (-16)
[ 8.338604] sp5100-tco sp5100-tco: Watchdog hardware is disabled
[ 8.909366] [drm:radeon_get_bios [radeon]] *ERROR* ACPI VFCT table present but broken (too short #2)

Debian GNU/Linux 12 testos hvc0

[ 11.891845] amdgpu 0000:01:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd (-110).
[ 12.915854] amdgpu 0000:01:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd_enc0 (-110).
[ 13.939868] amdgpu 0000:01:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd_enc1 (-110).
[ 15.059840] amdgpu 0000:01:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vce0 (-110).






>
> Regards, Roger.
>
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On 18.01.2024 07:23, Patrick Plenefisch wrote:
> On Wed, Jan 17, 2024 at 3:46 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 17.01.2024 07:12, Patrick Plenefisch wrote:
>>> I'm currently talking to the vendor's support team and testing a beta
>>> BIOS for unrelated reasons, is there something specific I should forward
>>> to them, either as a question or as a request for a fix?
>>
>> Well, first it would need figuring whether the "interesting" regions
>> are being put in place by firmware or the boot loader. If it's firmware
>> (pretty likely at least for the region you're having trouble with), you
>> may want to ask them to re-do where they place that specific data.
>
> This section changes boot-to-boot and grub vs EFI direct load, but my
> untrained eyes don't see an obvious pattern. I've attached several logs.
> Name format:
>
> xen-XENVERSION_LOADER_KERNELNAME_TYPE.log
>
> where XENVERSION is 4.17 (packaged in debian 12) or 4.18 (I built from
> source) or 4.18p (I applied the patch you mention below and built from
> source)
>
> where LOADER is grub for grub2 (from debian 12) or UEFI (direct boot via
> efibootmgr-configured UEFI entry)
>
> where KERNELNAME is either empty (PVH failure), or linuxpatch (linux with
> the patch requested above), or linuxoffset (with PHYSICAL_START=2MiB), or
> linux6 (debian 12 kernel)
>
> where TYPE is either pvh or pv
>
> For the two logs that actually booted (linuxoffset), I truncated them
> during pcie initialization, but they did go all the way to give me a login
> screen

The LOADER=UEFI logs confirm it's firmware (in the widest sense, as it could
also be a UEFI driver) which puts in place these unhelpful regions.

Jan
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Thu, Jan 18, 2024 at 03:34:13AM -0500, Patrick Plenefisch wrote:
> On Thu, Jan 18, 2024 at 3:27 AM Roger Pau Monné <roger.pau@citrix.com>
> wrote:
>
> > On Thu, Jan 18, 2024 at 01:23:56AM -0500, Patrick Plenefisch wrote:
> > > On Wed, Jan 17, 2024 at 3:46 AM Jan Beulich <jbeulich@suse.com> wrote:
> > >
> > > > On 17.01.2024 07:12, Patrick Plenefisch wrote:
> > > > > On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich <jbeulich@suse.com> wrote:
> > > > >
> > > > >> On 16.01.2024 01:22, Patrick Plenefisch wrote:
> > > For the two logs that actually booted (linuxoffset), I truncated them
> > > during pcie initialization, but they did go all the way to give me a
> > > login screen
> >
> > I'm not seeing any Linux output on the provided logs, they just seem
> > to contain Xen output ...
> >
> > > >
> > > > > As someone who hasn't built a kernel in over a decade, should I
> > > > > figure out how to do a kernel build with
> > > > > CONFIG_PHYSICAL_START=0x2000000 and report back?
> > > >
> > > > That was largely a suggestion to perhaps allow you to gain some
> > > > workable setup. It would be of interest to us largely for completeness.
> > > >
> > >
> > > Typo aside, setting the boot to 2MiB works! It works better for PV, while
> > > PVH has some graphics card issues, namely that I have to interact over
> > > serial and dmesg has some concerning radeon errors
> >
> > ... and so the radeon errors mentioned here seem to be missing. IIRC
> > for radeon cards to work on PVH dom0 you will need a hypervisor with
> > the following commit:
> >
> >
> > https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=f69c5991595c92756860d038346569464c1b9ea1
> >
> > (included in 4.18)
> >
>
> hmm.. that would mean I was running with them as I used 4.18 for that run
>
>
> >
> > There were also some changes not long ago in order to propagate the
> > video console information from Xen into dom0, those are also included
> > in 4.18, but I don't recall in which Linux version they landed.
> >
> > Anyway, would be good if you can provide the full Xen + Linux logs
> > when the radeon issue happens.
> >
>
> Luckily linux logs are mercifully short. Append this to
> xen-4.18p_grub_linuxoffset_pvh.log:
>
> [ 0.778770] i2c_designware AMDI0010:00: Unknown Synopsys component type: 0xffffffff
> [ 0.914664] amd_gpio AMDI0030:00: error -EINVAL: IRQ index 0 not found
> [ 0.930112] xen_mcelog: Failed to get CPU numbers
> [ 8.324907] ccp 0000:06:00.5: pcim_iomap_regions failed (-16)
> [ 8.338604] sp5100-tco sp5100-tco: Watchdog hardware is disabled
> [ 8.909366] [drm:radeon_get_bios [radeon]] *ERROR* ACPI VFCT table present but broken (too short #2)

Hm, interesting. I will have to add more debug in order to check
what's going on here, seems like the table is corrupted somehow.

Would you be able to build a new version of Xen if I provide you with
an extra debug patch?

Thanks, Roger.
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Thu, Jan 18, 2024 at 4:46 AM Roger Pau Monné <roger.pau@citrix.com>
wrote:

> >
> > Luckily linux logs are mercifully short. Append this to
> > xen-4.18p_grub_linuxoffset_pvh.log:
> >
> > [ 0.778770] i2c_designware AMDI0010:00: Unknown Synopsys component type: 0xffffffff
> > [ 0.914664] amd_gpio AMDI0030:00: error -EINVAL: IRQ index 0 not found
> > [ 0.930112] xen_mcelog: Failed to get CPU numbers
> > [ 8.324907] ccp 0000:06:00.5: pcim_iomap_regions failed (-16)
> > [ 8.338604] sp5100-tco sp5100-tco: Watchdog hardware is disabled
> > [ 8.909366] [drm:radeon_get_bios [radeon]] *ERROR* ACPI VFCT table present but broken (too short #2)
>
> Hm, interesting. I will have to add more debug in order to check
> what's going on here, seems like the table is corrupted somehow.
>
> Would you be able to build a new version of Xen if I provide you with
> an extra debug patch?
>

Yes, I now have a build env setup for testing xen and the linux kernel.


>
> Thanks, Roger.
>
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Thu, Jan 18, 2024 at 01:23:56AM -0500, Patrick Plenefisch wrote:
> On Wed, Jan 17, 2024 at 3:46 AM Jan Beulich <jbeulich@suse.com> wrote:
> > On 17.01.2024 07:12, Patrick Plenefisch wrote:
> > > As someone who hasn't built a kernel in over a decade, should I figure
> > > out how to do a kernel build with CONFIG_PHYSICAL_START=0x2000000 and
> > > report back?
> >
> > That was largely a suggestion to perhaps allow you to gain some
> > workable setup. It would be of interest to us largely for completeness.
> >
>
> Typo aside, setting the boot to 2MiB works! It works better for PV

Are there any downsides to running a kernel with
CONFIG_PHYSICAL_START=0x200000? I can confirm it fixes the issue on
another affected system, and if there aren't any practical downsides,
I'm tempted to change it in the default kernel in Qubes OS.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On 19.01.2024 14:40, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 18, 2024 at 01:23:56AM -0500, Patrick Plenefisch wrote:
>> On Wed, Jan 17, 2024 at 3:46 AM Jan Beulich <jbeulich@suse.com> wrote:
>>> On 17.01.2024 07:12, Patrick Plenefisch wrote:
>>>> As someone who hasn't built a kernel in over a decade, should I figure
>>>> out how to do a kernel build with CONFIG_PHYSICAL_START=0x2000000 and
>>>> report back?
>>>
>>> That was largely a suggestion to perhaps allow you to gain some
>>> workable setup. It would be of interest to us largely for completeness.
>>>
>>
>> Typo aside, setting the boot to 2MiB works! It works better for PV
>
> Are there any downsides to running a kernel with
> CONFIG_PHYSICAL_START=0x200000? I can confirm it fixes the issue on
> another affected system, and if there aren't any practical downsides,
> I'm tempted to change it in the default kernel in Qubes OS.

There must have been a reason to make the default 16Mb. You may want
to fish out the commit doing so ... In Qubes, though, I understand
you're always running with Xen underneath, so unless this same kernel
is also needed to run in HVM guests, some of whatever the reasons may
have been may go away.

Jan
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Fri, Jan 19, 2024 at 02:50:38PM +0100, Jan Beulich wrote:
> On 19.01.2024 14:40, Marek Marczykowski-Górecki wrote:
> > On Thu, Jan 18, 2024 at 01:23:56AM -0500, Patrick Plenefisch wrote:
> >> On Wed, Jan 17, 2024 at 3:46 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>> On 17.01.2024 07:12, Patrick Plenefisch wrote:
> >>>> As someone who hasn't built a kernel in over a decade, should I figure
> >>>> out how to do a kernel build with CONFIG_PHYSICAL_START=0x2000000 and
> >>>> report back?
> >>>
> >>> That was largely a suggestion to perhaps allow you to gain some
> >>> workable setup. It would be of interest to us largely for completeness.
> >>>
> >>
> >> Typo aside, setting the boot to 2MiB works! It works better for PV
> >
> > Are there any downsides to running a kernel with
> > CONFIG_PHYSICAL_START=0x200000? I can confirm it fixes the issue on
> > another affected system, and if there aren't any practical downsides,
> > I'm tempted to change it in the default kernel in Qubes OS.
>
> There must have been a reason to make the default 16Mb. You may want
> to fish out the commit doing so ...

https://git.kernel.org/torvalds/c/ceefccc93932b920

Default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN each to 16 MB,
so that both non-relocatable and relocatable kernels are loaded at
16 MB by a non-relocating bootloader. This is somewhat hacky, but it
appears to be the only way to do this that does not break some
set of existing bootloaders.

We want to avoid the bottom 16 MB because of large page breakup,
memory holes, and ZONE_DMA. Embedded systems may need to reduce this,
or update their bootloaders to be aware of the new min_alignment field.

Large pages (in practice) do not apply to PV dom0, but the other points
could in theory. That said, I checked a few other systems and I don't see
any reserved regions there (there is a large usable region at 0x100000,
and the other reserved regions are near the 4GB boundary).
This isn't a very representative sample, though...

> In Qubes, though, I understand
> you're always running with Xen underneath, so unless this same kernel
> is also needed to run in HVM guests, some of whatever the reasons may
> have been may go away.

The same kernel is used for PVH/HVM guests too.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: E820 memory allocation issue on Threadripper platforms [ In reply to ]
On Fri, Jan 19, 2024 at 02:40:06PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 18, 2024 at 01:23:56AM -0500, Patrick Plenefisch wrote:
> > On Wed, Jan 17, 2024 at 3:46 AM Jan Beulich <jbeulich@suse.com> wrote:
> > > On 17.01.2024 07:12, Patrick Plenefisch wrote:
> > > > As someone who hasn't built a kernel in over a decade, should I
> > > > figure out how to do a kernel build with
> > > > CONFIG_PHYSICAL_START=0x2000000 and report back?
> > >
> > > That was largely a suggestion to perhaps allow you to gain some
> > > workable setup. It would be of interest to us largely for completeness.
> > >
> >
> > Typo aside, setting the boot to 2MiB works! It works better for PV
>
> Are there any downsides to running a kernel with
> CONFIG_PHYSICAL_START=0x200000? I can confirm it fixes the issue on
> another affected system, and if there aren't any practical downsides,
> I'm tempted to change it in the default kernel in Qubes OS.

I have the answer here: CONFIG_PHYSICAL_START=0x200000 breaks booting
Xen in KVM with OVMF. There, the memory map has:
(XEN) 0000000100000-00000007fffff type=7 attr=000000000000000f
(XEN) 0000000800000-0000000807fff type=10 attr=000000000000000f
(XEN) 0000000808000-000000080afff type=7 attr=000000000000000f
(XEN) 000000080b000-000000080bfff type=10 attr=000000000000000f
(XEN) 000000080c000-000000080ffff type=7 attr=000000000000000f
(XEN) 0000000810000-00000008fffff type=10 attr=000000000000000f
(XEN) 0000000900000-00000015fffff type=4 attr=000000000000000f

So, starting at 0x1000000 worked since type=4 (boot service data) is
available at that time already, but with 0x200000 it conflicts with
those AcpiNvs areas around 0x800000.
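
To reproduce that reasoning mechanically, the sketch below parses memory map lines in the format shown above and lists the regions a kernel image would collide with. It treats type=7 (conventional memory) and type=4 (boot services data) as usable, as described above. The image size is an assumption borrowed from the 0x1000000 - 0x4400000 range reported earlier in this thread, the function names are hypothetical, and only the regions in this excerpt are checked, so it says nothing about what lies above 0x15fffff.

```python
import re
from typing import List, Tuple

Region = Tuple[int, int, int]  # (start, inclusive end, EFI memory type)

USABLE_TYPES = {4, 7}  # EfiBootServicesData, EfiConventionalMemory
MAP_LINE = re.compile(r"\(XEN\)\s+([0-9a-f]+)-([0-9a-f]+)\s+type=(\d+)")

OVMF_MAP = """
(XEN) 0000000100000-00000007fffff type=7 attr=000000000000000f
(XEN) 0000000800000-0000000807fff type=10 attr=000000000000000f
(XEN) 0000000808000-000000080afff type=7 attr=000000000000000f
(XEN) 000000080b000-000000080bfff type=10 attr=000000000000000f
(XEN) 000000080c000-000000080ffff type=7 attr=000000000000000f
(XEN) 0000000810000-00000008fffff type=10 attr=000000000000000f
(XEN) 0000000900000-00000015fffff type=4 attr=000000000000000f
"""


def parse_map(text: str) -> List[Region]:
    """Extract (start, end, type) tuples from Xen EFI memory map log lines."""
    return [(int(s, 16), int(e, 16), int(t)) for s, e, t in MAP_LINE.findall(text)]


def conflicts(regions: List[Region], start: int, size: int) -> List[Region]:
    """Return the non-usable regions that intersect [start, start + size)."""
    end = start + size - 1
    return [r for r in regions
            if r[2] not in USABLE_TYPES and r[0] <= end and start <= r[1]]


regions = parse_map(OVMF_MAP)
IMAGE_SIZE = 0x3400000  # assumed, see the lead-in
for start in (0x200000, 0x1000000):
    hits = conflicts(regions, start, IMAGE_SIZE)
    print(f"start {start:#x}: {len(hits)} conflicting region(s)",
          [f"{s:#x}-{e:#x}" for s, e, _ in hits])
```

Any image larger than about 6 MiB loaded at 0x200000 reaches the first type=10 region at 0x800000, so the exact size does not change the outcome for this map.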

I'm cc-ing Jason since I see he claimed the relevant GitLab issue. This
conflict at least gives an easy test environment with the console logged
to a file.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
