Hello
I tried heartbeat-0.4.7b under Linux 2.2.15 with heavy load, and it still terminates:
May 16 13:10:28 florix /usr/lib/heartbeat/heartbeat[1396]: node florix: is dead
May 16 13:10:28 florix /usr/lib/heartbeat/heartbeat[1396]: No local heartbeat. Forcing shutdown.
May 16 13:10:31 florix heartbeat: INFO: Running /etc/ha.d/rc.d/status status
May 16 13:10:35 florix /usr/lib/heartbeat/heartbeat[1394]: Heartbeat shutdown in progress.
May 16 13:10:36 florix /usr/lib/heartbeat/heartbeat[20067]: Giving up all HA resources.
May 16 13:10:42 florix /usr/lib/heartbeat/heartbeat[20067]: All HA resources relinquished.
May 16 13:10:42 florix /usr/lib/heartbeat/heartbeat[1394]: Heartbeat shutdown complete.
Some additional information: I did this test on the secondary node.
The machine is a dual PIII-450 with a software RAID 5 spread across five disks.
The load was around 34 when heartbeat gave up. However, the disk usage at that
time was very high (copying lots of small files via ftp locally). By the way,
does anyone know how to get disk/filesystem statistics in Linux, to see how
busy the disks/filesystems are?
But maybe this behaviour is correct after all: if the disk is that busy,
heartbeat has to assume that the disk, i.e. the node, is dead. But I think
that with nice_failback care should be taken that heartbeat does not terminate.
Giving away the resources is okay, but terminating is not.
Another thing I noticed is that two heartbeat processes are not locked
into memory (20136 and 20138 show S instead of SL):
20132 ttyS0 SL 0:00 /usr/lib/heartbeat/heartbeat
20134 ttyS0 SL 0:00 /usr/lib/heartbeat/heartbeat
20135 ttyS0 SL 0:00 /usr/lib/heartbeat/heartbeat
20136 ttyS0 S 0:00 /usr/lib/heartbeat/heartbeat
20137 ttyS0 SL 0:00 /usr/lib/heartbeat/heartbeat
20138 ttyS0 S 0:00 /usr/lib/heartbeat/heartbeat
Is this correct?
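
For reference, a minimal sketch of how a process locks itself into memory, using the standard mlockall() call. The helper name lock_process_memory is made up for illustration and is not taken from the heartbeat source. One thing worth knowing: memory locks are not inherited across fork(), so every child has to make the call itself. Whether that explains the two unlocked processes above is only a guess.

```c
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the heartbeat source): lock all pages
 * that are mapped now and all pages mapped in the future into RAM.
 * Returns 0 on success and -1 on failure, for example when the process
 * lacks the privilege or the locked-memory limit is too low.
 */
int lock_process_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}
```

A child created with fork() would call lock_process_memory() again right after the fork if it is supposed to show up as locked too.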
Then I still have two cosmetic wishes for heartbeat (I know Christmas
is still far away, but still ... ;-))
- When heartbeat starts, it should always print its version number.
- In the logfile, don't print the full path of heartbeat every time, e.g. instead of:
May 16 08:18:23 florix /usr/lib/heartbeat/heartbeat[1389]: Configuration ...
just
May 16 08:18:23 florix heartbeat[1389]: Configuration ...
This would make reading the log files much easier.
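
Both wishes could be sketched roughly like this, assuming the log lines go through syslog(3). The function names, the LOG_DAEMON facility and the message text are my assumptions, not what heartbeat actually does.

```c
#include <string.h>
#include <syslog.h>

/* Return the part of path after the last '/', or path itself if it has none. */
const char *program_basename(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}

/*
 * Hypothetical startup helper: open syslog with the short program name
 * as ident and log the version once. openlog() keeps the ident pointer,
 * so argv0 must stay valid for the lifetime of the process (argv[0] does).
 */
void open_log_and_announce(const char *argv0, const char *version)
{
    openlog(program_basename(argv0), LOG_PID, LOG_DAEMON);
    syslog(LOG_INFO, "starting, version %s", version);
}
```

Called as open_log_and_announce(argv[0], "0.4.7b") early in main(), this would tag the entries with heartbeat[pid] instead of the full path.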
Looking at the /proc/<heartbeat-proc-id>/fd directories, I notice that all of
the processes have /etc/ha.d/haresources open. Maybe this is another candidate
for close-on-exec.
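
A minimal sketch of marking a descriptor close-on-exec with fcntl(). The helper name set_close_on_exec is made up here; for a file opened with fopen() the descriptor would come from fileno(fp).

```c
#include <fcntl.h>

/*
 * Hypothetical helper: set the FD_CLOEXEC flag on fd so the descriptor
 * is closed automatically when the process calls one of the exec functions.
 * Existing descriptor flags are preserved.
 * Returns 0 on success and -1 on failure (errno is set by fcntl).
 */
int set_close_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}
```

Note that this only affects programs started via exec; children that merely fork() still inherit the open file.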
Holger