Mailing List Archive

Invalid opcode after kernel update
A few months ago, after updating my kernel, I started getting an invalid
opcode error during boot in the init process of my initramfs, which I did
rebuild. Switching back to the old kernel and initramfs fixed the problem, so
I kept that kernel for a few months for lack of time.

Today I rebuilt the whole system using `emerge -e @world`, and after that
I'm able to boot the new kernel. But now some pre-compiled packages (and
some that emerge -e missed because the ebuild was masked) crash with
illegal opcode. Chrome doesn't crash, but it only renders garbage for
webpages.

Does anyone have a clue what is happening? It's like the instruction set
changed after the kernel update (or was it the microcode?)

Thanks,

--

Fernando Rodriguez
Re: Invalid opcode after kernel update
Hello, Fernando.

On Sun, Sep 17, 2023 at 17:49:22 -0400, Fernando Rodriguez wrote:
> A few months ago after updating my kernel I started getting an invalid
> opcode error during boot on the init process on my initramfs which I did
> rebuilt. Switching to the old kernel and initramfs fixed the problem so
> I kept that kernel for a few months for lack of time.

> Today I rebuilt the whole system using `emerge -e @world` and after that
> I'm able to boot the new kernel but now some pre-compiled packages (and
> some that emerge -e missed because the ebuild was masked) crash with
> illegal opcode. In the case of chrome it's not crashing but it only
> renders garbage for webpages.

> Does anyone have a clue what is happening? It's like the instruction set
> changed after the kernel update (or was it the microcode?)

Could it be that you've got a sporadic RAM failure? Running the
standard RAM test (the one you boot into, I've forgotten its name) for
many hours might pin down the problem.

> Thanks,

> --

> Fernando Rodriguez

--
Alan Mackenzie (Nuremberg, Germany).
Re: Invalid opcode after kernel update
On 9/17/23 18:03, Alan Mackenzie wrote:
> Hello, Fernando.
>
> On Sun, Sep 17, 2023 at 17:49:22 -0400, Fernando Rodriguez wrote:
>> A few months ago after updating my kernel I started getting an invalid
>> opcode error during boot on the init process on my initramfs which I did
>> rebuilt. Switching to the old kernel and initramfs fixed the problem so
>> I kept that kernel for a few months for lack of time.
>
>> Today I rebuilt the whole system using `emerge -e @world` and after that
>> I'm able to boot the new kernel but now some pre-compiled packages (and
>> some that emerge -e missed because the ebuild was masked) crash with
>> illegal opcode. In the case of chrome it's not crashing but it only
>> renders garbage for webpages.
>
>> Does anyone have a clue what is happening? It's like the instruction set
>> changed after the kernel update (or was it the microcode?)
>
> Could it be that you've got a sporadic RAM failure? Running the
> standard RAM test (the one you boot into, I've forgotten its name) for
> many hours might pin down the problem.

I ran the test to be sure, but the problem is not sporadic. It happens every
time, with the same pre-built binaries. My last working kernel was 5.15.122;
if I boot that kernel, everything works. Before the update everything was
built with -march=native, and before the 'emerge -e' I switched to
-mtune=generic. I don't think it was the flags that messed it up, but rather
the act of rebuilding, because after rebuilding the whole system I'm still
having issues with pre-compiled binaries, and those should be generic
builds. Strangely, the same binaries that crash on the host system run fine
in a VM using hardware virtualization.

I will try to run it under gdb to find out which instruction is triggering
the fault.

Thanks,
Fernando
Re: Invalid opcode after kernel update
On 9/18/23 11:04, Fernando Rodriguez wrote:
> On 9/17/23 18:03, Alan Mackenzie wrote:
> I will try to run it on gdb to find out which instruction is triggering
> the fault.
>
> Thanks,
> Fernando
>

The crash is happening on AVX2 instructions. My CPU is an Intel(R) Core(TM)
i7-8809G CPU @ 3.10GHz, which is supposed to have AVX2, but I don't see it
listed in /proc/cpuinfo. I can't reboot into the old kernel right now,
but I suspect that when I do it will be there, because I vaguely remember
seeing it there. Any clues?
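
For anyone checking the same thing, here is a minimal Python sketch. It
assumes the usual "flags : ..." line layout of /proc/cpuinfo on x86 Linux;
the helper names cpu_flags and has_avx2 are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx2(cpuinfo_text):
    """True if 'avx2' appears among the flags the kernel reports."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("avx2 listed:", has_avx2(f.read()))
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
```

Running it under both the old and the new kernel would show whether the
flag disappears only on the new one.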

--

Fernando Rodriguez
Re: Invalid opcode after kernel update
On 9/18/23 14:52, Fernando Rodriguez wrote:
> On 9/18/23 11:04, Fernando Rodriguez wrote:
>> On 9/17/23 18:03, Alan Mackenzie wrote:
>> I will try to run it on gdb to find out which instruction is
>> triggering the fault.
>>
>> Thanks,
>> Fernando
>>
>
> The crash is happening on AVX2 instructions. My CPU is Intel(R) Core(TM)
> i7-8809G CPU @ 3.10GHz and it's supposed to have AVX2 but I don't see it
> listed on /proc/cpuinfo. I can't reboot into the old kernel right now
> but I suspect that when I do it will be there because I kind of remember
> seeing it there. Any clues?
>

Found this in my journal: "GDS: Microcode update needed! Disabling AVX
as mitigation." So I guess it's a microcode issue. I'm using dracut with
--early-microcode, I have CONFIG_MICROCODE_INTEL set, and I have the
latest (as of Friday) intel-microcode. I don't have the initramfs USE flag
enabled for intel-microcode, but I never did and it was working. I will try
it when I get back, gotta run now. Any more ideas?
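
One way to see whether early loading actually takes effect is to compare the
microcode revision the kernel reports before and after changing the
initramfs. A rough Python sketch, assuming /proc/cpuinfo carries a
"microcode : 0x..." line per CPU as it normally does on x86; the function
name microcode_revisions is hypothetical.

```python
def microcode_revisions(cpuinfo_text):
    """Collect the distinct 'microcode' values reported across all CPUs."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # The value is a hex string such as 0xf0.
            revisions.add(int(value.strip(), 16))
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            revs = microcode_revisions(f.read())
        print("loaded microcode revision(s):", sorted(hex(r) for r in revs))
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
```

If the revision is the same with and without the early-microcode image, the
update is probably not being applied at boot.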

--

Fernando Rodriguez
Re: Invalid opcode after kernel update
On Monday, 18 September 2023, 20:52:27 CEST, Fernando Rodriguez wrote:
> On 9/18/23 11:04, Fernando Rodriguez wrote:
> > On 9/17/23 18:03, Alan Mackenzie wrote:
> > I will try to run it on gdb to find out which instruction is triggering
> > the fault.
> >
> > Thanks,
> > Fernando
>
> The crash is happening on AVX2 instructions. My CPU is Intel(R) Core(TM)
> i7-8809G CPU @ 3.10GHz and it's supposed to have AVX2 but I don't see it
> listed on /proc/cpuinfo. I can't reboot into the old kernel right now
> but I suspect that when I do it will be there because I kind of remember
> seeing it there. Any clues?

It is Intel Downfall, also called GDS (Gather Data Sampling).

Maybe you want to read: https://www.phoronix.com/review/downfall
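
On kernels that carry the GDS mitigation, the current state should also be
readable from sysfs. A rough Python sketch, assuming the file path and status
strings below match the kernel's hw-vuln documentation (they have not been
checked against every kernel version); classify_gds and its labels are
hypothetical names.

```python
GDS_STATUS_PATH = "/sys/devices/system/cpu/vulnerabilities/gather_data_sampling"


def classify_gds(status):
    """Map the sysfs status text to a short label (labels are made up here)."""
    status = status.strip()
    if status.startswith("Not affected"):
        return "not-affected"
    if status.startswith("Mitigation: AVX disabled"):
        # Assumed to correspond to the "Disabling AVX as mitigation" log line.
        return "avx-disabled-no-microcode"
    if status.startswith("Mitigation: Microcode"):
        return "microcode-mitigation"
    if status.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


if __name__ == "__main__":
    try:
        with open(GDS_STATUS_PATH) as f:
            raw = f.read()
        print(raw.strip(), "->", classify_gds(raw))
    except OSError as exc:
        print("GDS status not available:", exc)
```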

Regards,
Peter