Mailing List Archive

System freezes
Since the end of February I've been getting occasional freezes of my
Fedora 36 / MythTV master box, needing a complete reboot. Their onset
coincided with a major update of the desktop environment (KDE) and I
have posted about the problem on the fedora-kde users list; there hasn't
been much response.

My best clue about the cause seems to be a 'canary' message from
rtkit-daemon; the freeze has often followed its 'Taking action'. My
fixes-32 box (still on el7) has no canary events, and I've added a few
lines from its rtkit-daemon log at the end.

Any suggestions? Thanks. The 'grep -v tda1004x' filters drop tuner
reinit events that seem to be routine.

John P

{{{

[john@HPFed ~]$ sudo SYSTEMD_COLORS=false journalctl --since -40d | grep -v tda1004x | grep canary

[sudo] password for john:
Feb 27 01:39:02 HPFed rtkit-daemon[888]: The canary thread is apparently starving. Taking action.
Feb 28 18:29:01 HPFed rtkit-daemon[906]: The canary thread is apparently starving. Taking action.
Mar 02 02:41:48 HPFed rtkit-daemon[903]: The canary thread is apparently starving. Taking action.
Mar 04 02:34:24 HPFed rtkit-daemon[889]: The canary thread is apparently starving. Taking action.
Mar 07 18:54:38 HPFed rtkit-daemon[903]: The canary thread is apparently starving. Taking action.
Mar 10 01:58:04 HPFed rtkit-daemon[909]: The canary thread is apparently starving. Taking action.
Mar 13 01:19:45 HPFed rtkit-daemon[887]: The canary thread is apparently starving. Taking action.
Mar 15 02:18:06 HPFed rtkit-daemon[891]: The canary thread is apparently starving. Taking action.
Mar 20 01:57:02 HPFed rtkit-daemon[887]: The canary thread is apparently starving. Taking action.
[john@HPFed ~]$

Fixes-32 system:

[root@HP_Box john]# journalctl -S -4d | grep rtkit

Mar 20 15:13:36 HP_Box rtkit-daemon[875]: Supervising 6 threads of 3 processes of 1 users.
Mar 20 15:13:36 HP_Box rtkit-daemon[875]: Supervising 6 threads of 3 processes of 1 users.
Mar 20 15:13:36 HP_Box rtkit-daemon[875]: Supervising 6 threads of 3 processes of 1 users.

Fedora 36 most recent examples:

[john@HPFed ~]$ sudo SYSTEMD_COLORS=false journalctl --since -6d | grep -v tda1004x | grep -C 5 canary
--
Mar 15 02:01:03 HPFed anacron[56959]: Anacron started on 2023-03-15
Mar 15 02:01:03 HPFed run-parts[56961]: (/etc/cron.hourly) finished 0anacron
Mar 15 02:01:03 HPFed CROND[56947]: (root) CMDEND (run-parts /etc/cron.hourly)
Mar 15 02:01:03 HPFed anacron[56959]: Normal exit (0 jobs run)
Mar 15 02:09:32 HPFed kioslave5[57197]: kf.coreaddons: Expected a KPluginFactory, got a KIOPluginForMetaData
Mar 15 02:18:06 HPFed rtkit-daemon[891]: The canary thread is apparently starving. Taking action.


-- Boot 9895bb71d2bb46228c4a08159c32218a --
Mar 15 08:32:40 HPFed kernel: Linux version 6.1.15-100.fc36.x86_64 (mockbuild@bkernel02.iad2.fedoraproject.org) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Fri Mar 3 17:22:46 UTC 2023
Mar 15 08:32:40 HPFed kernel: Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-6.1.15-100.fc36.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.lvm.lv=fedora/root rhgb quiet
Mar 15 08:32:40 HPFed kernel: x86/fpu: x87 FPU will use FXSAVE
Mar 15 08:32:40 HPFed kernel: signal: max sigframe size: 1440
--
Mar 20 01:31:17 HPFed systemd[1]: fstrim.service: Deactivated successfully.
Mar 20 01:31:17 HPFed systemd[1]: Finished fstrim.service - Discard unused blocks on filesystems from /etc/fstab.
Mar 20 01:31:17 HPFed audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=fstrim comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 20 01:31:17 HPFed audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=fstrim comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 20 01:47:39 HPFed chronyd[1043]: Source 82.219.4.30 replaced with 85.199.214.102 (2.fedora.pool.ntp.org)
Mar 20 01:57:02 HPFed rtkit-daemon[887]: The canary thread is apparently starving. Taking action.
Mar 20 01:57:02 HPFed rtkit-daemon[887]: Demoting known real-time threads.


-- Boot 4d5dcb8446434ece8c5b94538498ea6e --
Mar 20 10:43:49 HPFed kernel: Linux version 6.1.18-100.fc36.x86_64 (mockbuild@bkernel02.iad2.fedoraproject.org) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Sat Mar 11 16:46:48 UTC 2023
Mar 20 10:43:49 HPFed kernel: Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-6.1.18-100.fc36.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.lvm.lv=fedora/root rhgb quiet
Mar 20 10:43:49 HPFed kernel: x86/fpu: x87 FPU will use FXSAVE
[john@HPFed ~]$

}}}
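For a hard freeze, the last entry each boot managed to write is often the most useful clue. A minimal sketch, assuming journalctl marks boot boundaries with a "-- Boot <id> --" separator as in the output above, that prints the line just before each separator; the function name last_before_boot is made up for illustration.

```sh
# Print the last non-empty line before each "-- Boot <id> --" separator
# in journalctl output read from stdin, i.e. the final entry that each
# earlier boot managed to log. The current boot has no following
# separator, so it is not reported.
last_before_boot() {
  awk '
    /^-- Boot [0-9a-f]+ --$/ {
      if (prev != "") print prev
      prev = ""
      next
    }
    NF { prev = $0 }
  '
}

# Example (needs a persistent journal):
#   sudo journalctl --since -40d | last_before_boot
# More context for the most recent freeze (last 50 entries of the
# previous boot):
#   sudo journalctl -b -1 -n 50
```

If the canary message keeps turning up as the last line, that strengthens the link; if the last lines vary, the canary may be a symptom rather than the trigger.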
_______________________________________________
mythtv-users mailing list
mythtv-users@mythtv.org
http://lists.mythtv.org/mailman/listinfo/mythtv-users
http://wiki.mythtv.org/Mailing_List_etiquette
MythTV Forums: https://forum.mythtv.org
Re: System freezes
On 20/03/2023 17:22, John Pilkington wrote:
> Since the end of February I've been getting occasional freezes of my
> Fedora 36 / MythTV master box, needing a complete reboot.  Their onset
> coincided with a major update of the desktop environment (KDE) and I
> have posted about the problem on the fedora-kde users list; there hasn't
> been much response.
>
> My best clue about the cause seems to be a 'canary' from the
> rtkit-daemon, where the 'action' has often included the freeze.  My
> fixes-32 box (still with el7) has no canary events and I've added a few
> lines relating to its daemon at the end.
>
> Any suggestions?  Thanks.  The 'tda1004' refs are to drop tuner reinit
> events that seem to be routine.

I'm curious: what else does your system run that requires real-time
scheduling? In my experience MythTV itself doesn't need it.

Jan
_______________________________________________
Re: System freezes
On 20/03/2023 16:31, Jan Ceuleers wrote:
> I'm curious: what else does your system run that requires real-time
> scheduling? In my experience MythTV itself doesn't need it.
>
> Jan

Thanks for the query. I haven't knowingly added any strange jobs; there
are just software notification tools etc. Grepping the journal for
"Successfully made thread" does find mythfrontend, firefox, pipewire,
pipewire-pulse and wireplumber, with increasing thread IDs...

There are many fewer "Successfully demoted thread" events, the latest on
7 March.

And there was a pipewire update today.
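The real-time grants rtkit has made can be tallied per process to see which programs are asking for RT scheduling. This is a minimal sketch that assumes rtkit's log line names the process in parentheses after its PID (the example in the comment shows the assumed form); lines without a name fall back to the PID. The function name is hypothetical.

```sh
# Tally rtkit's "Successfully made thread ... RT" grants per process.
# ASSUMPTION: the message names the process in parentheses, e.g.
#   Successfully made thread 2301 of process 2290 (/usr/bin/pipewire) owned by '1000' RT at priority 20.
# If no name is present, the PID is used instead ("pid 2290").
rt_grants_by_process() {
  awk '
    /Successfully made thread/ {
      if (match($0, /of process [0-9]+ \([^)]*\)/)) {
        name = substr($0, RSTART, RLENGTH)
        sub(/^of process [0-9]+ \(/, "", name)   # strip the prefix
        sub(/\)$/, "", name)                      # and the closing paren
      } else if (match($0, /of process [0-9]+/)) {
        name = "pid " substr($0, RSTART + 11, RLENGTH - 11)
      } else {
        next
      }
      count[name]++
    }
    END { for (k in count) print k ": " count[k] }
  ' | LC_ALL=C sort
}

# Example:
#   sudo journalctl -u rtkit-daemon --since -40d | rt_grants_by_process
```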

John




_______________________________________________
Re: System freezes
On Mon, 20 Mar 2023 16:22:22 +0000, you wrote:

>Since the end of February I've been getting occasional freezes of my
>Fedora 36 / MythTV master box, needing a complete reboot. Their onset
>coincided with a major update of the desktop environment (KDE) and I
>have posted about the problem on the fedora-kde users list; there hasn't
>been much response.
>
>My best clue about the cause seems to be a 'canary' from the
>rtkit-daemon, where the 'action' has often included the freeze. My
>fixes-32 box (still with el7) has no canary events and I've added a few
>lines relating to its daemon at the end.
>
>Any suggestions? Thanks. The 'tda1004' refs are to drop tuner reinit
>events that seem to be routine.

There are endless different scenarios for system freezes, so it would
be helpful to have a bit more information about your particular one.

What exactly happens when it freezes? Can you still ssh into the box
and reboot it that way (i.e. is the display frozen but the box still
running underneath that)? Can you still ping it and get a response?
Do you have a systemd early debug shell enabled? See here for how to
do that:

https://freedesktop.org/wiki/Software/systemd/Debugging/

If you do have an early debug shell available, does Ctrl-Alt-F9 get
you to it? Does it still work and allow you to do a reboot command?

What sort of video card does it use? Nvidia driver updates are a
common cause of freeze problems.

When you reboot, how are you doing that? Do you have SysRq support
enabled so you can use the SysRq REISUB keyboard sequence to do the
reboot as safely as possible to prevent filesystem corruption?

https://fedoraproject.org/wiki/QA/Sysrq
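Whether REISUB will actually work depends on the kernel.sysrq bitmask in /proc/sys/kernel/sysrq. Per the kernel's SysRq documentation, 1 enables everything; otherwise R needs bit 4, S bit 16, U bit 32, E and I bit 64, and B bit 128. Systemd-based distributions often default to 16 (sync only). This sketch checks a value against those bits; the function and variable names are illustrative.

```sh
# Bits of kernel.sysrq needed for REISUB (see the kernel SysRq docs):
#   4 = keyboard control (R)      16 = sync (S)
#  32 = remount read-only (U)     64 = signal processes (E, I)
# 128 = reboot/poweroff (B)
REISUB_MASK=$((4 + 16 + 32 + 64 + 128))   # 244

# Succeeds if a kernel.sysrq value allows the whole REISUB sequence.
# The special value 1 enables every SysRq function.
sysrq_allows_reisub() {
  [ "$1" -eq 1 ] && return 0
  [ $(( $1 & $REISUB_MASK )) -eq "$REISUB_MASK" ]
}

# Example:
#   sysrq_allows_reisub "$(cat /proc/sys/kernel/sysrq)" || echo "REISUB not fully enabled"
# Enable everything now, and persist it with a sysctl.d drop-in
# (the file name is arbitrary):
#   sudo sysctl -w kernel.sysrq=1
#   echo 'kernel.sysrq = 1' | sudo tee /etc/sysctl.d/90-sysrq.conf
```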

Do you do a full fsck of at least the system partition(s) before
rebooting the system normally, to ensure the system is not getting
more and more corrupted? The easiest way to do that is to boot a live
USB image and do it from there, or to have another bootable partition
on the system.
_______________________________________________
Re: System freezes
On 21/03/2023 01:59, Stephen Worthington wrote:
> There are endless different scenarios for system freezes, so it would
> be helpful to have a bit more information about your particular one.

Stephen, thanks for your reply and suggestions. I'll work on them, but
here's more info:

The freezes have typically happened at night when screens are off.
First warning is that the disk access light is found on. Keyboard and
mouse unresponsive; caps lock key doesn't affect its light. ssh from
another box gets no response.

I power down with a long press of the front button. On reboot, with
both the VGA monitor and the HDMI Sony Android TV active, everything at
first looks normal, but on completion nvidia-settings (470xx) may show
the TV screen as 'off', and it needs to be reset (1360x768) and
repositioned to abut the Dell monitor. (I'm still having problems
getting consistent screen assignments as the monitor and TV go through
their various power-saving changes of state.)

At that point I have usually run the DB-optimise-and-backup from a
konsole tab, and an in-and-out mythtv-setup(.real) before restarting the
backend and frontend(.real), again in konsole tabs. All back to normal.

The usual boot fsck checks have been ok, but I haven't yet run one from
a live disk. The system is still looking ok after yesterday's pipewire
update and reboot.

There are more details here, from earlier in the chase, but I dislike
the HyperKitty archive...

https://lists.fedoraproject.org/archives/list/kde@lists.fedoraproject.org/thread/R4PC7CZWD2JYMSX645KYY6SWXKIITO2Z/

_______________________________________________
Re: System freezes
On Tue, 21 Mar 2023 11:09:13 +0000, you wrote:

>Stephen, thanks for your reply and suggestions. I'll work on them, but
>here's more info:
>
>The freezes have typically happened at night when screens are off.
>First warning is that the disk access light is found on. Keyboard and
>mouse unresponsive; caps lock key doesn't affect its light. ssh from
>another box gets no response.

That sounds like a fairly full-on freeze. It would still be worth
enabling SysRq and seeing whether that still works. If it does, the
problem is less likely to be what I suspect, which is an Nvidia driver
problem. It would be worth trying an older Nvidia driver. If you can
remember, did a driver update happen just before the freezes started?
If so, go back to the version from before they started. If that fixes
the problem (or even if it does not), it would be worth updating to the
very latest drivers and seeing whether it is also fixed there. I am
currently on 525.85 on Ubuntu, but if your options do not go that high
yet, try 515 or 520.

>Power down by front button long press. Reboot with both vga monitor and
>hdmi Sony Android tv active, at first looks normal but on completion the
>tv screen may be seen as 'off' by nVidia-settings 470xx. and needs to be
>reset (1360x768) and repositioned to abut the DELL monitor. (I'm still
>having problems in getting consistent screen assignments as monitor and
>tv do their various power-saving changes of status).

It may be that you need to make local copies of your displays' EDID
data and set the drivers to use the copies, so that both sets of data
are available at all times. If that does not work, then you will
likely need to make customised modelines for each monitor and that is
a big pain.
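If you do save EDID copies, it is worth checking each saved file is intact before pointing the driver at it. An EDID blob is made of 128-byte blocks; the base block starts with the fixed header 00 FF FF FF FF FF FF 00, and each block's bytes sum to 0 modulo 256. This sketch checks only those properties (it does not decode the contents); the function name and file paths are made up.

```sh
# Sanity-check a saved EDID file: size is a whole number of 128-byte
# blocks, the base block starts with 00 FF FF FF FF FF FF 00, and each
# block's bytes sum to 0 modulo 256. Prints "ok N bytes" on success.
edid_check() {
  od -A n -v -t u1 "$1" | awk '
    { for (i = 1; i <= NF; i++) b[n++] = $i }
    END {
      if (n == 0 || n % 128 != 0) { print "bad size: " (n + 0) " bytes"; exit 1 }
      split("0 255 255 255 255 255 255 0", hdr, " ")
      for (i = 0; i < 8; i++)
        if (b[i] != hdr[i + 1]) { print "bad header"; exit 1 }
      for (blk = 0; blk < n / 128; blk++) {
        sum = 0
        for (i = 0; i < 128; i++) sum += b[blk * 128 + i]
        if (sum % 256 != 0) { print "bad checksum in block " blk; exit 1 }
      }
      print "ok " n " bytes"
    }'
}

# Example (hypothetical path):
#   edid_check /etc/X11/tv-edid.bin
# The NVIDIA driver can be pointed at a saved copy with its CustomEDID
# X configuration option, e.g.:
#   Option "CustomEDID" "DFP-1:/etc/X11/tv-edid.bin"
# (display name and path are placeholders; see the 470 driver README)
```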

>At that point I have usually run the DB-optimise-and-backup from a
>konsole tab, and an in-and-out mythtv-setup(.real) before restarting the
>backend and frontend(.real), again in konsole tabs. All back to normal.
>
>The usual boot fsck checks have been ok, but I haven't yet run one from
>a live disk. The system is still looking ok after yesterday's pipewire
>update and reboot.

The automatic boot fsck checks are not sufficient. When a freeze
happens while writes are in progress on the boot partition, there can
be complex damage to the filesystem. I had one occasion on my
mother's MythTV box that needed fsck to be run 7 times before it did
not flag any more errors. So after every freeze, you need to boot a
live image and use that to run fsck as many times as necessary until
it does a run where there are no errors found. If you do get damage
that is not fixed by the single automatic boot time fsck, then another
freeze causing more damage on top of the existing damage can cause
unfixable errors and you will lose the boot partition and have to
reinstall.
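The "run fsck as many times as necessary" step can be scripted. e2fsck's exit status is a bitmask (0 = no errors, 1 = errors corrected, 2 = corrected and a reboot is advised, 4 = errors left uncorrected, 8 = operational error, and so on), so 0 means a clean pass, 1 or 2 is worth another pass, and anything higher needs a human. A minimal sketch, with an arbitrary cap on the number of passes; the function name is made up.

```sh
# Re-run a checker until a pass reports no errors, giving up after
# max_runs passes. Intended for e2fsck, whose exit status is a bitmask:
#   0 no errors, 1 errors corrected, 2 corrected (reboot advised),
#   4 errors left uncorrected, 8 operational error, 16 usage error,
#   32 cancelled by user, 128 shared-library error.
# Statuses 1 and 2 are retried; anything else stops immediately.
repeat_until_clean() {
  max_runs=$1
  shift
  run=1
  rc=1
  while [ "$run" -le "$max_runs" ]; do
    "$@"
    rc=$?
    echo "pass $run: exit status $rc" >&2
    [ "$rc" -eq 0 ] && return 0
    if [ $(( $rc & ~3 )) -ne 0 ]; then
      return "$rc"          # needs a human, not another pass
    fi
    run=$((run + 1))
  done
  return "$rc"              # still finding errors after max_runs passes
}

# Example, from a live image with the filesystem unmounted (activate
# the LVM volume group first, e.g. with: vgchange -ay):
#   repeat_until_clean 10 e2fsck -f -y /dev/mapper/fedora-root
```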

And there is also the problem that the automatic boot time fsck does
not log what it fixed (on Ubuntu anyway), so you have no idea if it
actually had to fix anything or not and what fixes it applied. At the
time that fsck is run, there is nowhere for it to log anything to
except RAM, so it is not set up to log at all.

After rebooting normally, you do still need to run a full check and
repair of the database. And re-check any table that got fixed, until
there are no more fixes done. In freezes like this, if I am recording
at the time, I almost always get the recordedseek table damaged.
Again, failing to fix the database before using it again can cause
irreparable damage.

>There are more details here, from earlier in the chase. But I dislike
>the HYPERKITTY archive...
>
>https://lists.fedoraproject.org/archives/list/kde@lists.fedoraproject.org/thread/R4PC7CZWD2JYMSX645KYY6SWXKIITO2Z/

Another thing I have had that caused freezes similar to yours was on
an old PC where I think the thermal paste between the CPU and its
heatsink had dried out and was not working well. The CPU overheated
and did an instant shutdown to save itself, and that often happened
while a command was active on the SATA controller, resulting in the
SATA activity LED being stuck on. So, is it the hotter time of the
year for you? Is it an older motherboard?

I think thermal shutdown is less likely than an Nvidia driver problem,
mainly because of it happening more at night, when things are
generally cooler. With my Nvidia driver problems, freezes could
happen at any time, but were more likely when the screen was changing
mode, for example from 1080p for the MythTV GUI to 1080i for playback.
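Since the freezes happen overnight with nobody watching, one cheap way to rule thermal shutdown in or out is to log temperatures periodically, so that the last readings before a freeze end up in the journal. This sketch reads the kernel's generic thermal sysfs interface (thermal_zone*/temp, in millidegrees Celsius); which zones exist, and whether the GPU is among them, varies with hardware and driver. The function name and script path are illustrative.

```sh
# Print each thermal zone's type and temperature in degrees Celsius.
# sysfs reports "temp" in millidegrees. The base directory defaults to
# /sys/class/thermal and can be overridden (handy for testing).
thermal_report() {
  base=${1:-/sys/class/thermal}
  for zone in "$base"/thermal_zone*; do
    [ -r "$zone/temp" ] || continue                 # no such zone
    milli=$(cat "$zone/temp" 2>/dev/null) || continue
    ztype=$(cat "$zone/type" 2>/dev/null || echo unknown)
    # one decimal place, truncated; assumes a non-negative reading
    echo "$ztype: $(( $milli / 1000 )).$(( $milli % 1000 / 100 )) C"
  done
}

# Example: log readings every 5 minutes from root's crontab via a script
# that defines and calls thermal_report (the path is hypothetical):
#   */5 * * * * /usr/local/bin/thermal-report | logger -t thermal
# then, after the next freeze:
#   journalctl -b -1 -t thermal | tail
```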
_______________________________________________
Re: System freezes
On 21/03/2023 02:59, Stephen Worthington wrote:
> Do you do a full fsck of at least the system partition(s) before
> rebooting the system normally, to ensure the system is not getting
> more and more corrupted? The easiest way to do that is to boot a live
> USB image and do it from there, or to have another bootable partition
> on the system.

In my experience an automatic repairing fsck at every boot is sufficient
to avoid unrecoverable damage.

I use the ext4 filesystem, and I have used tune2fs to set its "maximum
mount count" setting to 1 so that fsck runs at every boot on every
filesystem. I have furthermore specified the fsck.repair=yes kernel
option so that fsck has permission to fix the issues it encounters.

This prevents the complicated errors that develop when fsck is not run
after the filesystem has been uncleanly unmounted.

Takes very little time on SSDs.
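The mount-count setting described above can be read back from "tune2fs -l". A small sketch that pulls out the relevant fields; per tune2fs(8), a maximum mount count of 0 or -1 means mount counts are ignored, while 1 means a check is due at every mount. The function name is made up.

```sh
# Summarise the fsck-scheduling fields from "tune2fs -l" output on stdin.
fsck_schedule() {
  awk -F: '
    /^Mount count:/         { mc = $2 }
    /^Maximum mount count:/ { mx = $2 }
    /^Last checked:/        { sub(/^Last checked:[ \t]*/, ""); lc = $0 }
    END {
      gsub(/[ \t]/, "", mc)
      gsub(/[ \t]/, "", mx)
      mx = mx + 0                      # compare numerically, not as text
      if (mx <= 0) sched = "mount-count checks disabled"
      else         sched = "check every " mx " mounts"
      printf "mounts since check: %s; %s; last checked: %s\n", mc, sched, lc
    }'
}

# Example:
#   sudo tune2fs -l /dev/mapper/fedora-root | fsck_schedule
# Check at every mount, per ext4 filesystem:
#   sudo tune2fs -c 1 /dev/mapper/fedora-root
```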
_______________________________________________
Re: System freezes
On 21/03/2023 14:58, Stephen Worthington wrote:
> The automatic boot fsck checks are not sufficient. With a freeze
> happens when there are writes happening on the boot partition, there
> can be complex damage to the filesystem.

So that's where my experience differs: if you run fsck every time the
filesystem is mounted the issues it potentially needs to fix are not
complex (because the filesystem hasn't yet been written to since the
damage first occurred).

But this does involve changing the defaults: from memory fsck is run
only once every 20 mounts or so, and that should be changed to every
time ("once every 1 mounts"), and fsck has to be given permission to fix
the issues it encounters (which is achieved by means of kernel parameter
fsck.repair=yes).
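Whether fsck.repair is already set can be checked from /proc/cmdline. The sketch below reports the two systemd-fsck kernel parameters, falling back to systemd-fsck's documented defaults (fsck.mode=auto, fsck.repair=preen) when they are absent. Run against the command line in John's boot log above, it reports the defaults, i.e. fsck.repair=yes is not set there. The function name is hypothetical.

```sh
# Report the systemd-fsck kernel parameters found on a command line.
# Runs in a subshell so "set -f" does not leak to the caller.
fsck_cmdline_params() (
  set -f                      # do not glob-expand words of the command line
  mode=auto                   # systemd-fsck default
  repair=preen                # systemd-fsck default
  for word in $1; do
    case $word in
      fsck.mode=*)   mode=${word#fsck.mode=} ;;
      fsck.repair=*) repair=${word#fsck.repair=} ;;
    esac
  done
  echo "fsck.mode=$mode fsck.repair=$repair"
)

# Example:
#   fsck_cmdline_params "$(cat /proc/cmdline)"
# On Fedora, add fsck.repair=yes to every installed kernel's entry:
#   sudo grubby --update-kernel=ALL --args="fsck.repair=yes"
```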

HTH, Jan

_______________________________________________
Re: System freezes
On 21/03/2023 13:58, Stephen Worthington wrote:
> On Tue, 21 Mar 2023 11:09:13 +0000, you wrote:
>
>> Stephen, thanks for your reply and suggestions. I'll work on them, but
>> here's more info:
>>
>> The freezes have typically happened at night when screens are off.
>> First warning is that the disk access light is found on. Keyboard and
>> mouse unresponsive; caps lock key doesn't affect its light. ssh from
>> another box gets no response.
>
> That sounds like a fairly full on freeze. It would still be worth
> enabling SysRq and seeing if that is still working. If it is working,
> it means the problem is less likely to be what I suspect, which is an
> Nvidia driver problem.

My recollection of SysRq is that it needed too many fingers and a
working keyboard. Haven't tried it recently.

Just after my first freeze, rpmfusion put a new nVidia build into
'testing'; I installed that, and it ran without problems for a day or
two before freezing again. It's still the current offering for 470xx,
the latest supporting my GT710 card.

The first freeze came soon after 60+ KDE package updates, and the last
was before the latest pipewire. I'm in wait-and-see mode at present,
and looking out a 'live' image.

_______________________________________________
Re: System freezes
On 21/03/2023 15:39, Jan Ceuleers wrote:
> On 21/03/2023 14:58, Stephen Worthington wrote:
>> The automatic boot fsck checks are not sufficient. With a freeze
>> happens when there are writes happening on the boot partition, there
>> can be complex damage to the filesystem.
>
> So that's where my experience differs: if you run fsck every time the
> filesystem is mounted the issues it potentially needs to fix are not
> complex (because the filesystem hasn't yet been written to since the
> damage first occurred).
>
> But this does involve changing the defaults: from memory fsck is run
> only once every 20 mounts or so, and that should be changed to every
> time ("once every 1 mounts"), and fsck has to be given permission to fix
> the issues it encounters (which is achieved by means of kernel parameter
> fsck.repair=yes).
>
> HTH, Jan

Yes, certainly it helps. tune2fs is part of the e2fsprogs package. But
the freeze happened again last night. The usual symptoms. I found that
ping worked, but not ssh or rsync.

Running "tune2fs -l <device>" on the 7 devices listed in fstab (which are
on 2 disks), I found that one had last been checked 2 days ago, with the
next check due in 6 months or 21 mounts. The others had not been checked
since installation: one 7 years ago, one last year. :-(

On booting again, without any manual fsck-ery, I find

{{{

[john@HPFed ~]$ sudo SYSTEMD_COLORS=false journalctl --since -1d | grep ": clean,"
Mar 22 08:36:36 HPFed systemd-fsck[462]: /dev/mapper/fedora-root: clean, 717649/3276800 files, 11378621/13107200 blocks
Mar 22 08:37:00 HPFed systemd-fsck[795]: /dev/sda1: clean, 467/128016 files, 306115/512000 blocks
Mar 22 08:37:00 HPFed systemd-fsck[793]: sam1: clean, 48267/32832000 files, 121560973/131072000 blocks
Mar 22 08:37:01 HPFed systemd-fsck[803]: /dev/sdb2: clean, 305/51200 files, 143591/204800 blocks
Mar 22 08:37:01 HPFed systemd-fsck[806]: /dev/sdb3: clean, 768/28139520 files, 105258046/112542464 blocks
Mar 22 08:37:04 HPFed systemd-fsck[794]: /dev/mapper/fedora-home: clean, 131957/11821056 files, 42123377/47272960 blocks
Mar 22 08:37:31 HPFed systemd-fsck[796]: /dev/sda3: clean, 427344/106831872 files, 407146188/427327488 blocks
[john@HPFed ~]$

}}}

which looks reassuring, but clearly ought not to be relied on in future.
The full journalctl output shows some "Clearing of orphaned inode"
messages for fedora-home and sdb3.

John P


Re: System freezes
On 3/21/23 08:33, Jan Ceuleers wrote:
> In my experience an automatic repairing fsck at every boot is sufficient
> to avoid unrecoverable damage.

This is my experience as well.

I also had a system freeze issue when I changed my Linux distribution
(and kernel version). It was very difficult to diagnose. What "fixed" the
issue in the end was getting some new RAM. I still don't know whether I was
exhausting the RAM (it was 4GB, now 8GB) or whether the previous set of DIMMs
went bad. They did not 'test' bad, but I have been told they don't
always test bad using memtest86+.

Bob


Re: System freezes
On Sun, Mar 26, 2023 at 1:25 PM Bob <mythtv@cox.net> wrote:

> They did not 'test' bad, but I have been told they don't
> always test bad using memtest86+.


I once had suspect RAM. I could "use" the system pretty much fine, but if I
tried to compile GCC within a 3 GB tmpfs (I had 6 GB of RAM and a slow
HDD, so I wanted compiles to be faster, hence a tmpfs RAM disk to compile in),
it would ALWAYS fail with an internal compiler error.

One pass of memtest86 found nothing.
One hour found nothing.
Only after some 27 hours of constant hammering with memtest86 did it show 1
error. But GCC *reliably* crashed compiling itself, so the RAM *was*
reliably bad.

Mike
Re: System freezes
On 20/03/2023 16:22, John Pilkington wrote:
> Since the end of February I've been getting occasional freezes of my
> Fedora 36 / MythTV master box, needing a complete reboot.  Their onset
> coincided with a major update of the desktop environment (KDE) and I
> have posted about the problem on the fedora-kde users list; there hasn't
> been much response.
>
> My best clue about the cause seems to be a 'canary' from the
> rtkit-daemon, where the 'action' has often included the freeze.  My
> fixes-32 box (still with el7) has no canary events and I've added a few
> lines relating to its daemon at the end.
>
> Any suggestions?  Thanks.  The 'tda1004' refs are to drop tuner reinit
> events that seem to be routine.

Today, for the first time, I got a similar freeze on my el7-clone box
running fixes/32. It had been using kernel-lt 5.4.36 from elrepo, but
last week an update failed to see the video chip and I followed a
suggestion to use the mainline kernel-ml, currently 6.2.8. The canary
has moved in there too. In F36 that happened at kernel 6.1.13.

I opened https://github.com/MythTV/mythtv/issues/741 yesterday.
>
> John P
>
> {{{
>
> [john@HPFed ~]$ sudo SYSTEMD_COLORS=false journalctl --since -40d | grep -v tda1004x | grep canary
>
> [sudo] password for john:
> Feb 27 01:39:02 HPFed rtkit-daemon[888]: The canary thread is apparently starving. Taking action.
> Feb 28 18:29:01 HPFed rtkit-daemon[906]: The canary thread is apparently starving. Taking action.
> Mar 02 02:41:48 HPFed rtkit-daemon[903]: The canary thread is apparently starving. Taking action.
> Mar 04 02:34:24 HPFed rtkit-daemon[889]: The canary thread is apparently starving. Taking action.
> Mar 07 18:54:38 HPFed rtkit-daemon[903]: The canary thread is apparently starving. Taking action.
> Mar 10 01:58:04 HPFed rtkit-daemon[909]: The canary thread is apparently starving. Taking action.
> Mar 13 01:19:45 HPFed rtkit-daemon[887]: The canary thread is apparently starving. Taking action.
> Mar 15 02:18:06 HPFed rtkit-daemon[891]: The canary thread is apparently starving. Taking action.
> Mar 20 01:57:02 HPFed rtkit-daemon[887]: The canary thread is apparently starving. Taking action.
> [john@HPFed ~]$
>
> Fixes-32 system:
>
> [root@HP_Box john]# journalctl -S -4d | grep rtkit
>
> Mar 20 15:13:36 HP_Box rtkit-daemon[875]: Supervising 6 threads of 3 processes of 1 users.
> Mar 20 15:13:36 HP_Box rtkit-daemon[875]: Supervising 6 threads of 3 processes of 1 users.
> Mar 20 15:13:36 HP_Box rtkit-daemon[875]: Supervising 6 threads of 3 processes of 1 users.
>
> Fedora 36 most recent examples:
>
> [john@HPFed ~]$ sudo SYSTEMD_COLORS=false journalctl --since -6d | grep -v tda1004x | grep -C 5 canary
> --
> Mar 15 02:01:03 HPFed anacron[56959]: Anacron started on 2023-03-15
> Mar 15 02:01:03 HPFed run-parts[56961]: (/etc/cron.hourly) finished 0anacron
> Mar 15 02:01:03 HPFed CROND[56947]: (root) CMDEND (run-parts /etc/cron.hourly)
> Mar 15 02:01:03 HPFed anacron[56959]: Normal exit (0 jobs run)
> Mar 15 02:09:32 HPFed kioslave5[57197]: kf.coreaddons: Expected a KPluginFactory, got a KIOPluginForMetaData
> Mar 15 02:18:06 HPFed rtkit-daemon[891]: The canary thread is apparently starving. Taking action.
>
>
> -- Boot 9895bb71d2bb46228c4a08159c32218a --
> Mar 15 08:32:40 HPFed kernel: Linux version 6.1.15-100.fc36.x86_64 (mockbuild@bkernel02.iad2.fedoraproject.org) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Fri Mar  3 17:22:46 UTC 2023
> Mar 15 08:32:40 HPFed kernel: Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-6.1.15-100.fc36.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.lvm.lv=fedora/root rhgb quiet
> Mar 15 08:32:40 HPFed kernel: x86/fpu: x87 FPU will use FXSAVE
> Mar 15 08:32:40 HPFed kernel: signal: max sigframe size: 1440
> --
> Mar 20 01:31:17 HPFed systemd[1]: fstrim.service: Deactivated successfully.
> Mar 20 01:31:17 HPFed systemd[1]: Finished fstrim.service - Discard unused blocks on filesystems from /etc/fstab.
> Mar 20 01:31:17 HPFed audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=fstrim comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> Mar 20 01:31:17 HPFed audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=fstrim comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> Mar 20 01:47:39 HPFed chronyd[1043]: Source 82.219.4.30 replaced with 85.199.214.102 (2.fedora.pool.ntp.org)
> Mar 20 01:57:02 HPFed rtkit-daemon[887]: The canary thread is apparently starving. Taking action.
> Mar 20 01:57:02 HPFed rtkit-daemon[887]: Demoting known real-time threads.
>
>
> -- Boot 4d5dcb8446434ece8c5b94538498ea6e --
> Mar 20 10:43:49 HPFed kernel: Linux version 6.1.18-100.fc36.x86_64 (mockbuild@bkernel02.iad2.fedoraproject.org) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Sat Mar 11 16:46:48 UTC 2023
> Mar 20 10:43:49 HPFed kernel: Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-6.1.18-100.fc36.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.lvm.lv=fedora/root rhgb quiet
> Mar 20 10:43:49 HPFed kernel: x86/fpu: x87 FPU will use FXSAVE
> [john@HPFed ~]$
>
> }}}
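Earlier in the thread it was suggested that the freezes happen more at night, which argued against thermal shutdown. The canary timestamps above can be tallied by hour to check that. This is a minimal Python sketch, assuming journalctl's default short output format with one entry per line (the mail client wrapped the lines, so the embedded sample has been re-joined). It counts the rtkit-daemon "canary thread is apparently starving" lines per hour and reports how many fall between midnight and 06:00. Canary events are not the same thing as freezes, so this shows timing only, not cause.

```python
#!/usr/bin/env python3
# Sketch: tally rtkit-daemon "canary" events by hour of day from journalctl output.
# Assumes the default short format ("Mon DD HH:MM:SS host ident[pid]: message")
# with one entry per line.
import re
import sys
from collections import Counter

CANARY = "The canary thread is apparently starving"
STAMP = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} (\d{2}):\d{2}:\d{2} ")

# The nine canary lines posted in this thread (Feb 27 to Mar 20, 2023), unwrapped.
SAMPLE = """\
Feb 27 01:39:02 HPFed rtkit-daemon[888]: The canary thread is apparently starving. Taking action.
Feb 28 18:29:01 HPFed rtkit-daemon[906]: The canary thread is apparently starving. Taking action.
Mar 02 02:41:48 HPFed rtkit-daemon[903]: The canary thread is apparently starving. Taking action.
Mar 04 02:34:24 HPFed rtkit-daemon[889]: The canary thread is apparently starving. Taking action.
Mar 07 18:54:38 HPFed rtkit-daemon[903]: The canary thread is apparently starving. Taking action.
Mar 10 01:58:04 HPFed rtkit-daemon[909]: The canary thread is apparently starving. Taking action.
Mar 13 01:19:45 HPFed rtkit-daemon[887]: The canary thread is apparently starving. Taking action.
Mar 15 02:18:06 HPFed rtkit-daemon[891]: The canary thread is apparently starving. Taking action.
Mar 20 01:57:02 HPFed rtkit-daemon[887]: The canary thread is apparently starving. Taking action.
"""


def canary_hours(journal_text):
    """Return the hour (0-23) of each canary line, in log order."""
    hours = []
    for line in journal_text.splitlines():
        m = STAMP.match(line)
        if m and CANARY in line:
            hours.append(int(m.group(1)))
    return hours


def summarize(hours, night=range(0, 6)):
    """Print a per-hour histogram; return how many events fell in the night hours."""
    for hour, count in sorted(Counter(hours).items()):
        print(f"{hour:02d}:xx  {'#' * count}")
    at_night = sum(1 for h in hours if h in night)
    print(f"{at_night} of {len(hours)} between "
          f"{night.start:02d}:00 and {night.stop:02d}:00")
    return at_night


if __name__ == "__main__":
    # Optional argument: a journal excerpt saved to a file; otherwise use SAMPLE.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE
    summarize(canary_hours(text))
```

For the nine lines posted in this thread it reports 7 of 9 between 00:00 and 06:00 (all seven fall between 01:00 and 03:00). To run it on fresh data, save an excerpt first, for example with "journalctl --since -40d > journal.txt", and pass the file name as the only argument.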
