Hello,
While testing a new cluster we found the following behavior, which I
discussed on #linux-ha with "andreask" afterwards. We both agree the
behavior is wrong.
Bug scenario:
A 3-node cluster: 2 active nodes, plus 1 standby node that is there only
to bring the node count to 3.
When we powered off a machine (similar to pulling the power cable from
it), the cluster failed to fail over to the next node.
This is because of the following setting:
RESETPOWERON was set to 0, so a machine that is powered off stays powered
off. With the current code path, a machine in the poweroff state is
considered a failure for the stonith reset operation. As a result, no
resources are started on the second node, and the machine stays in an
unclean state.
The analogy with real hardware and a powerbar, and IMHO the correct behavior:
---
If I pull the plug of node1, node2 will fence it with the powerbar. The
powerbar will power-cycle the socket without any result, because I pulled
the plug. But the fencing operation is a success, and all resources are
started on the second node.
---
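To make the intended decision explicit, here is a rough sketch in Python.
It is not the attached patch and not the plugin's real code: the names
PowerState, reset_succeeded and power_cycle are hypothetical, and I am
assuming that RESETPOWERON=1 means "power the machine on if it was off".

```python
from enum import Enum


class PowerState(Enum):
    ON = "on"
    OFF = "off"


def reset_succeeded(state_before, reset_power_on, power_cycle):
    """Decide whether a stonith reset counts as a successful fence.

    state_before   -- PowerState of the target before the reset (hypothetical)
    reset_power_on -- the RESETPOWERON setting as a bool (assumed semantics)
    power_cycle    -- callable that performs the real reset, returns True on success
    """
    if state_before is PowerState.OFF and not reset_power_on:
        # The target is already off and must stay off. It cannot be running
        # any resources, so the goal of fencing is met: report success
        # instead of failure.
        return True
    # Target is on, or it is off and should be powered on again:
    # the outcome depends on the real power operation.
    return power_cycle()
```

With this logic a powered-off node and RESETPOWERON=0 gives a successful
fence without touching the machine, which matches the powerbar analogy
above.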
A patch to fix this, with what I hope is a minimal change, is attached.
After finding this bug I got ill and have to stay at home for a few
days, so I don't have access to an environment to test this patch atm.
Regards
Robbert Müller