Hello
I still have the problem that heartbeat gives up its resources under
high load. During a test on my secondary node (florix), heartbeat wrote the
following to the logfile:
Jul 18 17:29:25 florix heartbeat[1050]: WARN: node florix: is dead
Jul 18 17:29:25 florix heartbeat[1050]: ERROR: No local heartbeat. Forcing shutdown.
Jul 18 17:29:25 florix heartbeat[1050]: info: Node florix: status active
Jul 18 17:29:25 florix heartbeat[1048]: info: Heartbeat shutdown in progress.
Jul 18 17:29:25 florix heartbeat[25949]: info: Giving up all HA resources.
Jul 18 17:29:28 florix heartbeat: info: Running /etc/ha.d/rc.d/status status
Jul 18 17:29:31 florix heartbeat: info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
Jul 18 17:29:35 florix heartbeat[25949]: info: All HA resources relinquished.
Jul 18 17:29:36 florix heartbeat[1048]: info: Heartbeat shutdown complete.
This test was done with heartbeat 0.4.8 on an SMP Linux 2.2.15 system. I notice
that some of the heartbeat processes are not locked into memory (no "L" in
the STAT column):
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 10680 0.2 0.2 1300 720 ttyS0 SL 06:49 0:00 /usr/lib/heartbeat/heartbeat
root 10682 0.0 0.2 1300 724 ttyS0 SL 06:49 0:00 /usr/lib/heartbeat/heartbeat
root 10683 0.0 0.2 1300 708 ttyS0 SL 06:49 0:00 /usr/lib/heartbeat/heartbeat
root 10684 0.0 0.2 1296 712 ttyS0 S 06:49 0:00 /usr/lib/heartbeat/heartbeat
root 10685 0.0 0.2 1300 708 ttyS0 SL 06:49 0:00 /usr/lib/heartbeat/heartbeat
root 10686 0.0 0.2 1296 704 ttyS0 S 06:49 0:00 /usr/lib/heartbeat/heartbeat
Could this be the problem?
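To make the check above repeatable, here is a minimal sketch that picks out
the processes whose STAT field lacks the "L" (pages locked into memory) flag.
The function name unlocked_pids and the PS_OUTPUT constant are made up for
illustration; the sample text is simply the ps listing quoted above, and the
column layout is assumed to match it.

```python
# Sample taken from the ps listing quoted in this message.
PS_OUTPUT = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 10680 0.2 0.2 1300 720 ttyS0 SL 06:49 0:00 /usr/lib/heartbeat/heartbeat
root 10682 0.0 0.2 1300 724 ttyS0 SL 06:49 0:00 /usr/lib/heartbeat/heartbeat
root 10683 0.0 0.2 1300 708 ttyS0 SL 06:49 0:00 /usr/lib/heartbeat/heartbeat
root 10684 0.0 0.2 1296 712 ttyS0 S 06:49 0:00 /usr/lib/heartbeat/heartbeat
root 10685 0.0 0.2 1300 708 ttyS0 SL 06:49 0:00 /usr/lib/heartbeat/heartbeat
root 10686 0.0 0.2 1296 704 ttyS0 S 06:49 0:00 /usr/lib/heartbeat/heartbeat
"""


def unlocked_pids(ps_text):
    """Return the PIDs whose STAT column does not contain the 'L' flag.

    Assumes the 11-column layout shown above, where STAT is the 8th field.
    """
    pids = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip the header and any line that does not look like a process row.
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        if "L" not in fields[7]:
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    print(unlocked_pids(PS_OUTPUT))  # prints [10684, 10686]
```

For the listing above this reports PIDs 10684 and 10686, the two processes
shown with STAT "S" instead of "SL".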
Or could it be due to the very high disk load? Lots of very small files are
being written to disk and then deleted again. I am already using a
software RAID 5 spread across 5 disks to make the disk I/O faster, which did
help a lot. I notice heartbeat makes use of FIFOs. Does anyone know whether
FIFOs are affected by disk performance, i.e. whether FIFOs will block
when the disks are very busy?
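One way to look into the FIFO question is to time a FIFO round trip while
the disks are busy. As far as I understand it, data written to a FIFO goes
through a kernel pipe buffer and only the FIFO's name lives in the
filesystem, but that is worth measuring rather than assuming. The sketch
below is only a probe under that assumption: the function name
fifo_round_trip and the file name probe.fifo are made up for illustration
and have nothing to do with the FIFOs heartbeat itself uses.

```python
import os
import tempfile
import time


def fifo_round_trip(payload=b"ping"):
    """Write payload into a scratch FIFO, read it back, return (data, seconds)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.fifo")  # hypothetical scratch FIFO
        os.mkfifo(path)
        # Open the read end non-blocking first, so that opening the write
        # end does not block waiting for a reader.
        rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        wfd = os.open(path, os.O_WRONLY)
        try:
            start = time.monotonic()
            os.write(wfd, payload)
            data = os.read(rfd, len(payload))
            elapsed = time.monotonic() - start
        finally:
            os.close(wfd)
            os.close(rfd)
    return data, elapsed


if __name__ == "__main__":
    data, elapsed = fifo_round_trip()
    print(data, "round trip took %.6f s" % elapsed)
```

Running this in a loop while the small-file workload is active, and
comparing the timings with an idle machine, should show whether FIFO
traffic slows down together with the disks.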
I plan to use this system as a file distribution system that can have very
high loads at certain times. If heartbeat terminates itself on one of the
nodes, the other node will take over the load but will then terminate as well,
and my system will be dead to the outside world.
What can I do so that this does not happen?
Thanks,
Holger