Mailing List Archive

Re: [REGRESSION] mainline boot regression on AMD Stoney Ridge Chromebooks
Hi Thomas,

On 4/10/24 15:57, Thomas Gleixner wrote:
> Laura!
>
> On Wed, Apr 10 2024 at 10:15, Laura Nao wrote:
>> On 4/9/24 14:25, Thomas Gleixner wrote:
>>> Can you please replace that patch with the one below?
>>
>> So, with this patch applied on top of ace278e7eca6 the kernel doesn't
>> boot anymore - reference test job:
>> https://lava.collabora.dev/scheduler/job/13324010
>>
>> I see the only change between the second and third patch you provided,
>> besides the debug prints, is:
>>
>> -	if (!topo_is_converted(c))
>> -		return;
>> -
>
> Right. So this limits the area to search significantly.
>
>> Printing the debug information without this probably doesn't really help,
>> but just in case it's useful: I tried excluding the change above from the
>> patch while leaving everything else unchanged - reference test job:
>> https://lava.collabora.dev/scheduler/job/13324298 (also pasted the
>> kernel log here for easier consultation:
>> https://pastebin.com/raw/TQBDvCah)
>>
>> Hope this helps,
>
> It does. Good idea!
>
> I just moved the exit check a bit so we should see the scan info. That
> should tell me what goes south.
>

Here's the full kernel log with the latest patch applied:
https://pastebin.com/raw/r2CkP396

Reference test job: https://lava.collabora.dev/scheduler/job/13328709

Thanks!

Laura
Re: [REGRESSION] mainline boot regression on AMD Stoney Ridge Chromebooks
Laura!

On Wed, Apr 10 2024 at 18:11, Laura Nao wrote:
> On 4/10/24 15:57, Thomas Gleixner wrote:
>> I just moved the exit check a bit so we should see the scan info. That
>> should tell me what goes south.
>
> Here's the full kernel log with the latest patch applied:
> https://pastebin.com/raw/r2CkP396

Ok. Now I can see it.

	pr_info("NPP %u\n", tscan.amd_nodes_per_pkg);

The output is:

	<6>[ 0.000000] NPP 3

The original code then reads out 1, which is what's expected on that
machine.

Now that bogus value '3' is used later in

	a %= b / 3

Since b == 2, the integer division b / 3 yields zero, so the %= operation
crashes with a division by zero. As there are no exception handlers
installed that early in boot, there is nothing to see.
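
To illustrate the arithmetic, here is a standalone sketch. It is not the
kernel code and not the actual fix: the helper name safe_mod_assign() and
its parameters are made up for this example. It only shows that unsigned
integer division truncates 2 / 3 to 0, and how a guard would keep that
zero from reaching the %= operation.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only (not from the kernel tree).
 * Performs "*a %= b / npp" when the divisor is non-zero and returns true.
 * Returns false and leaves *a untouched when the operation would divide
 * by zero.
 */
static bool safe_mod_assign(unsigned int *a, unsigned int b, unsigned int npp)
{
	unsigned int divisor;

	if (npp == 0)
		return false;

	/* Integer division truncates: with b == 2 and npp == 3 this is 0. */
	divisor = b / npp;
	if (divisor == 0)
		return false;

	*a %= divisor;
	return true;
}
```

With b == 2, a value of 1 for npp gives a divisor of 2 and the modulo is
well defined, while the bogus value 3 gives a divisor of 0 and the helper
refuses to perform the operation.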

The problem causing the bogus readout sits between my keyboard and my
chair.

I'll send out the fixes with proper change logs later tonight.

Thanks for all your help!

tglx
