Mailing List Archive

Approx monthly hard lockups
May be off topic. I've been experiencing approximately monthly hard
lockups on my machine. Dell Inspiron 3252 Shipped Feb 19th, 2016. Atom
based machine with 8 gigs of ram.

gcc -c -Q -march=native --help=target | grep march=

...tells me it's a "silvermont". I'm running bog standard 64-bit
no-multilib Gentoo, a bit on the lean side. By "hard lockup" I mean...

* {CTRL}{ALT}{DEL} won't reboot
* Magic SysRq won't reboot
* The Power button powers down, but is unable to reboot (there's a light
inside the case that stays on)
* I have to flip off the power bar, wait, flip power bar back on, and
press the power button

Are there any utilities I can use to possibly diagnose the problem?
Let's just say, I'm now somewhat fanatical about backups every couple of
days.

--
Walter Dnes <waltdnes@waltdnes.org>
I don't run "desktop environments"; I run useful applications
Re: Approx monthly hard lockups
On Wed, May 12, 2021 at 9:35 AM Walter Dnes <waltdnes@waltdnes.org> wrote:
>
> May be off topic. I've been experiencing approximately monthly hard
> lockups on my machine. Dell Inspiron 3252 Shipped Feb 19th, 2016. Atom
> based machine with 8 gigs of ram.
>
> gcc -c -Q -march=native --help=target | grep march=
>
> ...tells me it's a "silvermont". I'm running bog standard 64-bit
> no-multilib Gentoo, a bit on the lean side. By "hard lockup" I mean...
>
> * {CTRL}{ALT}{DEL} won't reboot
> * Magic SysRq won't reboot
> * The Power button powers down, but is unable to reboot (there's a light
> inside the case that stays on)
> * I have to flip off the power bar, wait, flip power bar back on, and
> press the power button
>
> Are there any utilities I can use to possibly diagnose the problem?
> Let's just say, I'm now somewhat fanatical about backups every couple of
> days.
>
> --
> Walter Dnes <waltdnes@waltdnes.org>
> I don't run "desktop environments"; I run useful applications

Is there nothing in the system logs? What's near the end before whatever
you get from booting after this issue?
Re: Approx monthly hard lockups
On 5/12/21 10:35 AM, Walter Dnes wrote:
> May be off topic. I've been experiencing approximately monthly hard
> lockups on my machine. Dell Inspiron 3252 Shipped Feb 19th, 2016. Atom
> based machine with 8 gigs of ram.
>
> gcc -c -Q -march=native --help=target | grep march=
>
> ...tells me it's a "silvermont". I'm running bog standard 64-bit
> no-multilib Gentoo, a bit on the lean side. By "hard lockup" I mean...
>
> * {CTRL}{ALT}{DEL} won't reboot
> * Magic SysRq won't reboot
> * The Power button powers down, but is unable to reboot (there's a light
> inside the case that stays on)
Is that a short press or a long press (at least six seconds) on the power
button? A short press should send the power-down signal to the OS, but a
long press should actually cut power.
> * I have to flip off the power bar, wait, flip power bar back on, and
> press the power button
>
> Are there any utilities I can use to possibly diagnose the problem?
> Let's just say, I'm now somewhat fanatical about backups every couple of
> days.
>
Re: Approx monthly hard lockups
On Wed, May 12, 2021 at 11:16:56AM -0700, Mark Knecht wrote
>
> Is there nothing in the system logs? What's near the end before whatever
> you get from booting after this issue?

Where would those logfiles be? "dmesg" starts off with...

[ 0.000000] Linux version 5.10.27-gentoo (root@i3) (gcc (Gentoo 10.2.0-r5 p6) 10.2.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #1 SMP Mon Apr 19 20:56:52 EDT 2021
[ 0.000000] Command line: BOOT_IMAGE=Experimental ro root=801 noexec=on net.ifnames=0 intel_pstate=disable ipv6.disable=1
[ 0.000000] x86/fpu: x87 FPU will use FXSAVE
[ 0.000000] BIOS-provided physical RAM map:

etc. This has happened with various kernels. Note that "Experimental"
is the latest kernel version. "Production" is the previous working
kernel I've compiled. It serves as a fallback in case I screw things up
on the "make menuconfig" step. It also saved me years ago on the switch
from /dev/hdx to /dev/sdx.

--
Walter Dnes <waltdnes@waltdnes.org>
I don't run "desktop environments"; I run useful applications
Re: Approx monthly hard lockups
On Wed, May 12, 2021 at 03:40:30PM -0600, Jack wrote
> On 5/12/21 10:35 AM, Walter Dnes wrote:

> > * The Power button powers down, but is unable to reboot (there's a light
> > inside the case that stays on)

> Is that a short press or long press (at least six seconds?) on the
> power button. Short push should send the power down signal to the
> OS, but long push should actually cut power.

> > * I have to flip off the power bar, wait, flip power bar back on, and
> > press the power button

I don't remember what I used to power down. I'll try to remember next
time (which hopefully won't happen soon or at all).

--
Walter Dnes <waltdnes@waltdnes.org>
I don't run "desktop environments"; I run useful applications
Re: Approx monthly hard lockups
Walter Dnes wrote:
> On Wed, May 12, 2021 at 11:16:56AM -0700, Mark Knecht wrote
>> Is there nothing in the system logs? What's near the end before whatever
>> you get from booting after this issue?
> Where would those logfiles be? "dmesg" starts off with...
>


Most likely in /var/log/messages. The bad thing about dmesg is that, I
think, it resets on reboot, which erases the previous info.

As Mark pointed out, you need to go back to the entries logged just before
the boot-up messages begin.

Hope that helps.

Dale

:-)  :-) 
Re: Approx monthly hard lockups
Thanks for stepping in, Dale. Yes, I was thinking /var/log/messages or
something similar.

If the machine was completely hung, there was no disk activity, so there
should be a big jump in the logged times. That should make it somewhat
easier to find the last stuff written.
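
The "big jump in the times logged" idea can be sketched in a few lines of Python. This is only an illustration, assuming the traditional "May 12 11:26:31" syslog timestamp prefix seen elsewhere in this thread. The function names and the abbreviated sample lines are made up for the example, and the year is an assumption because that timestamp format does not carry one.

```python
from datetime import datetime, timedelta

def parse_syslog_time(line, year=2021):
    """Parse the 'May 12 11:26:31' prefix of a traditional syslog line.

    The year is an assumption: this timestamp format does not include it,
    so gaps that span New Year will come out wrong.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # continuation line or a different log format

def largest_gaps(lines, top=3, min_gap=timedelta(0)):
    """Return up to `top` (gap, line_before, line_after) tuples, largest first."""
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_syslog_time(line)
        if stamp is None:
            continue
        if prev_time is not None and stamp - prev_time >= min_gap:
            gaps.append((stamp - prev_time, prev_line.rstrip(), line.rstrip()))
        prev_time, prev_line = stamp, line
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]

# Abbreviated sample lines in the style of the log excerpts in this thread.
SAMPLE = [
    "May 12 11:26:31 i3 kernel: FECESBOOK:IN= OUT=eth0",
    "May 12 11:26:32 i3 kernel: FECESBOOK:IN= OUT=eth0",
    "May 12 11:34:04 i3 syslog-ng[1384]: syslog-ng starting up",
]

for gap, before, after in largest_gaps(SAMPLE, top=1):
    print(gap, "|", before, "->", after)
```

On a real system the same function could be fed the log file directly, for example largest_gaps(open("/var/log/messages", errors="replace")). The line just before the largest gap is the last thing that reached the disk before the hang.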

Please excuse the top posting. I'm on my phone with limited editing
capabilities in GMail.

Good luck

On Wed, May 12, 2021, 7:50 PM Dale <rdalek1967@gmail.com> wrote:

> Walter Dnes wrote:
> > On Wed, May 12, 2021 at 11:16:56AM -0700, Mark Knecht wrote
> >> Is there nothing in the system logs? What's near the end before whatever
> >> you get from booting after this issue?
> > Where would those logfiles be? "dmesg" starts off with...
> >
>
>
> Most likely in /var/log/messages. The bad thing about dmesg, I think it
> resets when rebooting which erases previous info.
>
> As Mark pointed out, you need to go back to the time before it starts
> adding the boot up process.
>
> Hope that helps.
>
> Dale
>
> :-) :-)
>
Re: Approx monthly hard lockups
On Wed, 12 May 2021 22:44:56 -0400
"Walter Dnes" <waltdnes@waltdnes.org> wrote:

> On Wed, May 12, 2021 at 11:16:56AM -0700, Mark Knecht wrote
> >
> > Is there nothing in the system logs? What's near the end before whatever
> > you get from booting after this issue?
>
> Where would those logfiles be? "dmesg" starts off with...
>
> [ 0.000000] Linux version 5.10.27-gentoo (root@i3) (gcc (Gentoo 10.2.0-r5 p6) 10.2.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #1 SMP Mon Apr 19 20:56:52 EDT 2021
> [ 0.000000] Command line: BOOT_IMAGE=Experimental ro root=801 noexec=on net.ifnames=0 intel_pstate=disable ipv6.disable=1
> [ 0.000000] x86/fpu: x87 FPU will use FXSAVE
> [ 0.000000] BIOS-provided physical RAM map:
>
> etc. This has happened with various kernels. Note that "Experimental"
> is the latest kernel version. "Production" is the previous working
> kernel I've compiled. It serves as a fallback in case I screw things up
> on the "make menuconfig" step. It also saved me years ago on the switch
> from /dev/hdx to /dev/sdx.
>

Maybe you can set the variable
previous_dmesg="YES"
in /etc/conf.d/bootmisc. That should preserve the last dmesg across a
reboot, i.e. /var/log/dmesg.old would be the file you want to look at
after a hard reset.

Cheers
Andreas
Re: Approx monthly hard lockups
On Wed, May 12, 2021 at 09:50:29PM -0500, Dale wrote
> Walter Dnes wrote:
> > On Wed, May 12, 2021 at 11:16:56AM -0700, Mark Knecht wrote
> >> Is there nothing in the system logs? What's near the end before whatever
> >> you get from booting after this issue?
> > Where would those logfiles be? "dmesg" starts off with...
> >
>
>
> Most likely in /var/log/messages. The bad thing about dmesg, I think it
> resets when rebooting which erases previous info.
>
> As Mark pointed out, you need to go back to the time before it starts
> adding the boot up process.

No help. /var/log/messages goes straight from iptables messages to
bootup with no indication of shutdown...

May 12 11:26:31 i3 kernel: FECESBOOK:IN= OUT=eth0 SRC=192.168.1.249 DST=31.13.80.12 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=5824 DF PROTO=TCP SPT=50566 DPT=443 WINDOW=64170 RES=0x00 SYN URGP=0
May 12 11:26:32 i3 kernel: FECESBOOK:IN= OUT=eth0 SRC=192.168.1.249 DST=31.13.80.12 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=60152 DF PROTO=TCP SPT=50568 DPT=443 WINDOW=64170 RES=0x00 SYN URGP=0
May 12 11:34:04 i3 syslog-ng[1384]: syslog-ng starting up; version='3.30.1'
May 12 11:34:04 i3 crond[1414]: /usr/sbin/crond 4.5 dillon's cron daemon, started with loglevel notice
May 12 11:34:04 i3 /usr/sbin/gpm[1443]: *** info [daemon/startup.c(136)]:
May 12 11:34:04 i3 /usr/sbin/gpm[1443]: Started gpm successfully. Entered daemon mode.

When doing a regular shutdown I get stuff like...

May 12 11:42:31 i3 shutdown[1974]: shutting down for system reboot
May 12 11:42:31 i3 init[1]: Switching to runlevel: 6
May 12 11:42:32 i3 init[1]: Trying to re-exec init
May 12 11:42:32 i3 start-stop-daemon[2053]: Will stop /usr/sbin/sshd
May 12 11:42:32 i3 start-stop-daemon[2053]: Will stop PID 1764
May 12 11:42:32 i3 start-stop-daemon[2053]: Sending signal 15 to PID 1764
May 12 11:42:32 i3 sshd[1764]: Received signal 15; terminating.

--
Walter Dnes <waltdnes@waltdnes.org>
I don't run "desktop environments"; I run useful applications
Re: Approx monthly hard lockups
On Thu, May 13, 2021 at 12:20 AM Walter Dnes <waltdnes@waltdnes.org> wrote:
>
> On Wed, May 12, 2021 at 09:50:29PM -0500, Dale wrote
> > Walter Dnes wrote:
> > > On Wed, May 12, 2021 at 11:16:56AM -0700, Mark Knecht wrote
> > >> Is there nothing in the system logs? What's near the end before whatever
> > >> you get from booting after this issue?
> > > Where would those logfiles be? "dmesg" starts off with...
> > >
> >
> >
> > Most likely in /var/log/messages. The bad thing about dmesg, I think it
> > resets when rebooting which erases previous info.
> >
> > As Mark pointed out, you need to go back to the time before it starts
> > adding the boot up process.
>
> No help. /var/log/messages goes straight from iptables messages to
> bootup with no indication of shutdown...
>
> May 12 11:26:31 i3 kernel: FECESBOOK:IN= OUT=eth0 SRC=192.168.1.249 DST=31.13.80.12 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=5824 DF PROTO=TCP SPT=50566 DPT=443 WINDOW=64170 RES=0x00 SYN URGP=0
> May 12 11:26:32 i3 kernel: FECESBOOK:IN= OUT=eth0 SRC=192.168.1.249 DST=31.13.80.12 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=60152 DF PROTO=TCP SPT=50568 DPT=443 WINDOW=64170 RES=0x00 SYN URGP=0
> May 12 11:34:04 i3 syslog-ng[1384]: syslog-ng starting up; version='3.30.1'
> May 12 11:34:04 i3 crond[1414]: /usr/sbin/crond 4.5 dillon's cron daemon, started with loglevel notice
> May 12 11:34:04 i3 /usr/sbin/gpm[1443]: *** info [daemon/startup.c(136)]:
> May 12 11:34:04 i3 /usr/sbin/gpm[1443]: Started gpm successfully. Entered daemon mode.
>
> When doing a regular shutdown I get stuff like...
>
> May 12 11:42:31 i3 shutdown[1974]: shutting down for system reboot
> May 12 11:42:31 i3 init[1]: Switching to runlevel: 6
> May 12 11:42:32 i3 init[1]: Trying to re-exec init
> May 12 11:42:32 i3 start-stop-daemon[2053]: Will stop /usr/sbin/sshd
> May 12 11:42:32 i3 start-stop-daemon[2053]: Will stop PID 1764
> May 12 11:42:32 i3 start-stop-daemon[2053]: Sending signal 15 to PID 1764
> May 12 11:42:32 i3 sshd[1764]: Received signal 15; terminating.
>

OK, but those messages are from yesterday, May 12th, at 11:42 in the
morning. Given that you started this thread on May 12th at 9:35 (when I
received the first email, anyway), the actual lockup presumably occurred
earlier.

I believe you're in the correct file, or set of files. Depending on how
your logging works (what logger, what options), the logs generally roll
over into successive files (*.1, *.2, *.3, etc.). Assuming you have some
idea when the lockup occurred, you want to look for what was going on just
before that. Do the logs hit a date/time and stop? Or do they go on logging
with lots of similar or repeating messages for hours or days?

Do double check that you log into a directory that doesn't run out of space
once a month. That could cause problems also.

HTH,
Mark
Re: Approx monthly hard lockups
On Thu, 13 May 2021 03:19:54 -0400, Walter Dnes wrote:

> >
> > Most likely in /var/log/messages. The bad thing about dmesg, I think
> > it resets when rebooting which erases previous info.
> >
> > As Mark pointed out, you need to go back to the time before it starts
> > adding the boot up process.
>
> No help. /var/log/messages goes straight from iptables messages to
> bootup with no indication of shutdown...


That's not surprising. The messages are cached before being written to
disk, so the most recent ones will be lost in a hard lockup. You could try
mounting the filesystem containing /var with sync, which will at least
avoid the disk writes being cached. It's possible that syslog-ng also
caches writes, in which case you'll also need to look for an option to
prevent that.
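
To illustrate the caching point, here is a minimal Python sketch of a "heartbeat" writer that forces every line out to the disk instead of leaving it in a cache. It is not something anyone in this thread has run. The function names, the log path and the one-minute interval are assumptions for the example, and fsync only asks the kernel to write the data, so a drive with its own write cache could still lose the final line.

```python
import os
import time

def write_heartbeat(path, message):
    """Append one timestamped line and push it all the way to the disk."""
    stamp = time.strftime("%b %d %H:%M:%S")
    with open(path, "a") as log:
        log.write(f"{stamp} heartbeat: {message}\n")
        log.flush()             # empty Python's own buffer into the kernel
        os.fsync(log.fileno())  # ask the kernel to write the data to disk now

def run_forever(path="/var/log/heartbeat.log", interval=60):
    """Write the load average once per interval (hypothetical path)."""
    while True:
        load1, load5, load15 = os.getloadavg()
        write_heartbeat(path, f"load {load1:.2f} {load5:.2f} {load15:.2f}")
        time.sleep(interval)
```

After a lockup, the last heartbeat line narrows the time of the hang down to within one interval, which tells you where to look in the other logs.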


--
Neil Bothwick

Math and alcohol don't mix. Don't drink and derive.
Re: Approx monthly hard lockups
On Thu, May 13, 2021 at 06:10:08AM -0700, Mark Knecht wrote
>
> Do double check that you log into a directory that doesn't run out of space
> once a month. That could cause problems also.

After the "separate /usr" brouhaha, I gave up and went to one file
system. We all back up our PCs regularly, don't we? <G> "fdisk -l"
shows my "1 terabyte" drive as...

Device          Start        End    Sectors  Size Type
/dev/sda1        2048 1929381887 1929379840  920G Linux filesystem
/dev/sda2  1929381888 1953525134   24143247 11.5G Linux swap

I am *NOT* running out of disk space. mc (Midnight Commander) shows
617 of 905 gigabytes free when logged in as regular user. 663 of 905
free when logged in as root. Soon to be several gigabytes more free
space when I do some cleaning up of obsolete unused cruft accumulated
over the years.
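
The difference between the regular-user and root figures (617 versus 663 of 905) can be reproduced with a short Python sketch. It assumes, and the thread does not say, that the filesystem keeps some blocks in reserve for root, as ext2/3/4 do by default (about 5%, which would be roughly 45 of 905). The function name is made up for the example.

```python
import os

def free_space_gib(path="/"):
    """Report total, root-visible free and user-visible free space in GiB."""
    st = os.statvfs(path)
    gib = 1024 ** 3
    return {
        "total": st.f_blocks * st.f_frsize / gib,
        "free_for_root": st.f_bfree * st.f_frsize / gib,    # includes reserved blocks
        "free_for_users": st.f_bavail * st.f_frsize / gib,  # excludes reserved blocks
    }

for name, value in free_space_gib("/").items():
    print(f"{name:15s} {value:8.1f} GiB")
```

Either way, both numbers are far from zero, which supports the point that a full disk is not the cause here.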

--
Walter Dnes <waltdnes@waltdnes.org>
I don't run "desktop environments"; I run useful applications
Re: Approx monthly hard lockups
On 5/13/21 11:01 PM, Walter Dnes wrote:
> On Thu, May 13, 2021 at 06:10:08AM -0700, Mark Knecht wrote
>> Do double check that you log into a directory that doesn't run out of space
>> once a month. That could cause problems also.
> After the "separate /usr" brouhaha, I gave up and went to one file
> system. We all back up our PC's regularly, don't we? <G> "fdisk -l"
> shows my "1 terabyte" drive as...
>
> Device          Start        End    Sectors  Size Type
> /dev/sda1        2048 1929381887 1929379840  920G Linux filesystem
> /dev/sda2  1929381888 1953525134   24143247 11.5G Linux swap
>
> I am *NOT* running out of disk space. mc (Midnight Commander) shows
> 617 of 905 gigabytes free when logged in as regular user. 663 of 905
> free when logged in as root. Soon to be several gigabytes more free
> space when I do some cleaning up of obsolete unused cruft accumulated
> over the years.
One thing I've done in a similar situation is to boot to a live image.
That way, the installed system does not touch anything in /var/log, and
you might find something that a fresh boot would otherwise overwrite.