Mailing List Archive

AMD microcode error?
Hello list,

For the first time ever, I received an mce error today:

[11473.528812] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 14: 9090909090909090
[11473.529657] mce: [Hardware Error]: TSC 0
[11473.530146] mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1706457141 SOCKET 0 APIC 2 microcode a201009

This is an AMD Ryzen 9 5900X.
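For reference, the third log line already identifies the processor. The sketch below is a minimal Python illustration, not part of any tool mentioned in this thread. It assumes that the value after PROCESSOR is the kernel's vendor code (2 for AMD) followed by the CPUID leaf 1 EAX signature, and it decodes that signature with the standard x86 family/model/stepping rules. The function names are hypothetical.

```python
import re

# Vendor codes as printed by the Linux MCE handler (assumption: only the two
# common values are mapped here; anything else is reported as unknown).
VENDORS = {0: "Intel", 2: "AMD"}


def decode_cpuid_signature(sig):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF
    # The extended family field is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model field is used for base family 0x6 (Intel) or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


def parse_mce_processor_line(line):
    """Extract vendor, family/model/stepping and microcode from an MCE line."""
    m = re.search(r"PROCESSOR (\d+):([0-9a-fA-F]+) .*microcode ([0-9a-fA-F]+)", line)
    if m is None:
        raise ValueError("no 'PROCESSOR ... microcode ...' fields found")
    vendor = int(m.group(1))
    family, model, stepping = decode_cpuid_signature(int(m.group(2), 16))
    return {
        "vendor": VENDORS.get(vendor, f"unknown ({vendor})"),
        "family": family,
        "model": model,
        "stepping": stepping,
        "microcode": int(m.group(3), 16),
    }


line = ("[11473.530146] mce: [Hardware Error]: PROCESSOR 2:a20f10 "
        "TIME 1706457141 SOCKET 0 APIC 2 microcode a201009")
info = parse_mce_processor_line(line)
print(info["vendor"], hex(info["family"]), hex(info["model"]),
      info["stepping"], hex(info["microcode"]))
# → AMD 0x19 0x21 0 0xa201009
```

Family 0x19, model 0x21, stepping 0 is the form in which the CPU appears in the kernel's boot messages and in microcode listings.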

Hits on the web suggest downgrading linux-firmware, which I've now done and
will await results. The latest upgrade was to version 20240115-r1, four days
ago.

Has anyone else experienced this?

--
Regards,
Peter.
Re: AMD microcode error?
On Sunday, 28 January 2024 16:39:56 GMT I wrote:

> Hits on the web suggest downgrading linux-firmware, which I've now done and
> will await results. The latest upgrade was to version 20240115-r1, four days
> ago.

s/Hits/Hints/

--
Regards,
Peter.
Re: AMD microcode error?
On Sun, Jan 28, 2024 at 9:49 AM Peter Humphrey <peter@prh.myzen.co.uk> wrote:
>
> On Sunday, 28 January 2024 16:39:56 GMT I wrote:
>
> > Hits on the web suggest downgrading linux-firmware, which I've now done and
> > will await results. The latest upgrade was to version 20240115-r1, four days
> > ago.
>
> s/Hits/Hints/
>
> --
> Regards,
> Peter.
>

If it is a memory error, then there are three possibilities:

1) The new linux-firmware has a problem and the error is false

2) The DRAM was already bad but had not been tested before, and the error is real

3) The DRAM has since gone bad, and the error is real

A reasonable next step is to run a longer-term memory test: memtest86,
memtest64 or something else of your choice.

Good luck,
Mark
Re: AMD microcode error?
On Sunday, 28 January 2024 16:59:52 GMT Mark Knecht wrote:
> On Sun, Jan 28, 2024 at 9:49 AM Peter Humphrey <peter@prh.myzen.co.uk> wrote:
> > On Sunday, 28 January 2024 16:39:56 GMT I wrote:
> > > Hits on the web suggest downgrading linux-firmware, which I've now done and
> > > will await results. The latest upgrade was to version 20240115-r1, four days
> > > ago.
> >
> > s/Hits/Hints/
> >
> > --
> > Regards,
> > Peter.
>
> If it is a memory error, then there are three possibilities:
>
> 1) The new linux-firmware has a problem and the error is false
>
> 2) The DRAM was already bad but had not been tested before, and the error is real
>
> 3) The DRAM has since gone bad, and the error is real
>
> A reasonable next step is to run a longer-term memory test: memtest86,
> memtest64 or something else of your choice.
>
> Good luck,
> Mark

I'm not sure a microcode update has been released yet by AMD as a blob,
outside what they make available to MoBo OEMs within 'BIOS firmware' updates.
To find what's in the box use:

dmesg | grep -i 'family:'

Then check what CPU family and model microcodes the linux-firmware contains:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amd-ucode/README
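The lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the dmesg line is assumed to contain `(family: 0x.., model: 0x.., stepping: 0x..)`, and README entries are assumed to look like `Family=0x.. Model=0x.. Stepping=0x..: Patch=0x...`. Check both patterns against your own dmesg output and the current README before trusting the result. The helper names are hypothetical.

```python
import re

# Assumed dmesg form: "... (family: 0x19, model: 0x21, stepping: 0x0)"
DMESG_RE = re.compile(
    r"family: (0x[0-9a-f]+), model: (0x[0-9a-f]+), stepping: (0x[0-9a-f]+)", re.I)
# Assumed README form: "Family=0x.. Model=0x.. Stepping=0x..: Patch=0x..."
README_RE = re.compile(
    r"Family=(0x[0-9a-f]+)\s+Model=(0x[0-9a-f]+)\s+Stepping=(0x[0-9a-f]+):"
    r"\s*Patch=(0x[0-9a-f]+)", re.I)


def cpu_from_dmesg(text):
    """Return (family, model, stepping) from dmesg text, or None if absent."""
    m = DMESG_RE.search(text)
    if m is None:
        return None
    return tuple(int(g, 16) for g in m.groups())


def find_patch(readme_text, cpu):
    """Return the patch level listed for cpu, or None if there is no entry."""
    for m in README_RE.finditer(readme_text):
        family, model, stepping, patch = (int(g, 16) for g in m.groups())
        # Compare as integers so "0x0" and "0x00" are treated alike.
        if (family, model, stepping) == cpu:
            return patch
    return None
```

Feed `cpu_from_dmesg()` the output of `dmesg` and pass its result, together with the text of amd-ucode/README, to `find_patch()`. A result of `None` means the README lists no patch for that family, model and stepping.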

If you can't find your family and model in the above, then you could check
what firmware updates are available from the MoBo's OEM. These would include
microcode made directly available by AMD to the OEM.
Re: AMD microcode error?
On 28/01/24 17:39, Peter Humphrey wrote:
> Hello list,
>
> For the first time ever, I received an mce error today:
>
> [11473.528812] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 14: 9090909090909090
> [11473.529657] mce: [Hardware Error]: TSC 0
> [11473.530146] mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1706457141 SOCKET 0 APIC 2 microcode a201009
>
> This is an AMD Ryzen 9 5900X.
>
> Hits on the web suggest downgrading linux-firmware, which I've now done and
> will await results. The latest upgrade was to version 20240115-r1, four days
> ago.
>
> Has anyone else experienced this?

No:

$ eix -I linux-firmware
[I] sys-kernel/linux-firmware
     Available versions:  (~)20231111-r1^bstd 20231211^bstd 20240115^bstd (~)20240115-r1^bstd **99999999*l^bstd {compress-xz compress-zstd deduplicate initramfs +redistributable savedconfig unknown-license}
     Installed versions:  20240115-r1^bst(10:52:28 01/27/24)(redistributable savedconfig -compress-xz -compress-zstd -deduplicate -initramfs -unknown-license)

$ grep -e "microcode\|model name" /proc/cpuinfo
model name    : AMD Ryzen 9 5900X 12-Core Processor
microcode    : 0xa20120e
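The grep above shows the microcode revision that is actually loaded. The same check can be done as a small Python sketch that collects the `microcode` field for every core; the helper name is hypothetical.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions listed in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has its own "microcode : 0x..." line.
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))
    return revisions
```

Call it as `microcode_revisions(open("/proc/cpuinfo").read())` and compare the result with the `microcode` field of an MCE line (a201009 in Peter's report, versus 0xa20120e in the output above). This shows whether two machines are running the same revision. A set with more than one element would mean the cores disagree.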

raf
Re: AMD microcode error?
On Sunday, 28 January 2024 17:39:56 GMT Michael wrote:

> I'm not sure a microcode update has been released yet by AMD as a blob,
> outside what they make available to MoBo OEMs within 'BIOS firmware'
> updates. To find what's in the box use:
>
> dmesg | grep -i 'family:'
>
> Then check what CPU family and model microcodes the linux-firmware
> contains:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amd-ucode/README

No luck with those.

> If you can't find your family and model in the above, then you could check
> what firmware updates are available from the MoBo's OEM. These would include
> microcode made directly available by AMD to the OEM.

That's an ASRock X570 Taichi. Their pages suggest that they only acknowledge
Windows 10 & 11.

I'll keep my eyes open for another glitch. Maybe the microcode isn't to blame
at all, in which case I'd better not sleep on the job.

Thanks for the pointers.

--
Regards,
Peter.
Re: AMD microcode error?
On Monday, 29 January 2024 16:18:22 GMT Peter Humphrey wrote:
> On Sunday, 28 January 2024 17:39:56 GMT Michael wrote:
> > I'm not sure a microcode update has been released yet by AMD as a blob,
> > outside what they make available to MoBo OEMs within 'BIOS firmware'
> > updates. To find what's in the box use:
> >
> > dmesg | grep -i 'family:'
> >
> > Then check what CPU family and model microcodes the linux-firmware
> > contains:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amd-ucode/README
>
> No luck with those.

OK, this means there is no microcode to load via the linux-firmware releases
(yet).


> > If you can't find your family and model in the above, then you could check
> > what firmware updates are available from the MoBo's OEM. These would
> > include microcode made directly available by AMD to the OEM.
>
> That's an ASRock X570 Taichi. Their pages suggest that they only acknowledge
> Windows 10 & 11.

Check the BIOS version in dmesg and compare it with the latest BIOS image
ASRock offers for the board on the asrock.com website. If the versions/dates
are the same, you have nothing more to do. If the version on the website is
more recent, then you may want to flash the MoBo with it.
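Besides the DMI line in dmesg, the kernel exports the same identification strings as files under /sys/class/dmi/id. Here is a minimal Python sketch that reads the ones needed for comparison with the vendor's download page. The function name is hypothetical, and the directory is a parameter only so that it can be pointed elsewhere for testing.

```python
from pathlib import Path


def bios_info(dmi_dir="/sys/class/dmi/id"):
    """Read board and BIOS identification strings exported by the kernel."""
    fields = ("board_vendor", "board_name", "bios_version", "bios_date")
    info = {}
    for name in fields:
        try:
            info[name] = (Path(dmi_dir) / name).read_text().strip()
        except OSError:
            info[name] = None  # attribute missing or unreadable on this system
    return info
```

`print(bios_info())` on the machine itself gives the board name plus the BIOS version and date to compare with the list on the vendor's site.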

Download the zip archive on offer and unzip it, then store the new image on a
USB stick formatted with FAT32. Some OEMs require you to rename the firmware
image file; the website, or a README within the zip archive, will say so.
Reboot and press [F2] during POST to get into the BIOS setup menu, then go to
the Tools tab to flash it from the USB stick.

After the PC reboots, you may have to re-apply in the BIOS menu any changes
you had previously made, because restoring the settings from a backup file
doesn't always work.


> I'll keep my eyes open for another glitch. Maybe the microcode isn't to
> blame at all, in which case I'd better not sleep on the job.

Well, the latest BIOS firmware often contains bug fixes and microcode patches
for CPU vulnerabilities. However, that does not mean it will address the MCE
error you experienced.