Hi,
I've been thinking seriously about a new feature for heartbeat:
ping membership
I'd like your comments and thoughts on it:
What ping membership would do is allow a switch, a router, or anything else that
you can ping to become a pseudo-member of the cluster.
A pseudo-member is one which we report on through the API almost as though it
were a real member, except that it doesn't have to run heartbeat - it just has
to respond to a ping.
Such pseudo-members could become tie-breakers for 2-node clusters.
For example:
Node1 and Node2 are real members. Switch1 is a pseudo-member. Switch1 would be
pinged at an appropriate interval, and as long as the pings returned often
enough and rapidly enough, switch1 is considered a member of the cluster. If
it dies or connectivity to it is lost, then heartbeat thinks that it has left
the cluster.
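To make "often enough and rapidly enough" concrete, here is a rough sketch of how the membership test for a pinged device might look. It is not heartbeat code: the class name, the two thresholds and their default values are all made up for illustration, and the actual sending of pings is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PingMember:
    """Tracks whether a pinged device currently counts as a pseudo-member."""
    name: str
    max_rtt: float = 0.5      # assumed: replies slower than this (seconds) don't count
    dead_after: float = 5.0   # assumed: this long without a good reply means "left"
    last_good_reply: Optional[float] = None

    def record_reply(self, now: float, rtt: float) -> None:
        # Only a reply that came back rapidly enough refreshes membership.
        if rtt <= self.max_rtt:
            self.last_good_reply = now

    def is_member(self, now: float) -> bool:
        # No good reply yet, or none recently enough: not a member.
        if self.last_good_reply is None:
            return False
        return (now - self.last_good_reply) <= self.dead_after
```

With these made-up defaults, switch1 stays a member as long as some reply with a round trip of at most 0.5 seconds arrived within the last 5 seconds. A late reply is treated the same as no reply at all.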
This allows quorum decisions in which the pseudo-member has a "vote". For
example, if you pull the ethernet from node2, then it looks around, sees that
node1 and switch1 have "died". It sees that it does not have enough members to
constitute quorum, so it gives up resources and waits for something to change.
On the other hand, node1 sees that node2 has died but switch1 is still alive,
which is two out of three "votes". It can then continue as the viable cluster.
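The vote counting in this example can be sketched in a few lines. This assumes quorum means a strict majority of all configured members, real and pseudo alike; that rule is my assumption for the sketch, and the function name is hypothetical.

```python
def has_quorum(visible, all_members):
    """Return True if the members this node can see form a strict majority.

    visible     -- names of members this node currently believes are alive
                   (including itself)
    all_members -- names of every configured member, real or pseudo
    """
    all_members = set(all_members)
    votes = len(set(visible) & all_members)  # ignore anything not configured
    return 2 * votes > len(all_members)


members = {"node1", "node2", "switch1"}

# node2 with its ethernet pulled sees only itself: 1 of 3 votes.
node2_view = {"node2"}

# node1 still sees itself and switch1: 2 of 3 votes.
node1_view = {"node1", "switch1"}
```

Here has_quorum(node2_view, members) is False, so node2 gives up its resources, while has_quorum(node1_view, members) is True, so node1 carries on.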
Of course, it isn't really necessary to make a new membership type, but it seems
like a nice, uniform way of looking at it.
Normal nodes have status "dead", "up" or "active". Pseudo nodes might have
status "ping" or "dead". I'm undecided if the dead status should be the same
between the two types, or not. If they're the same, then there should be a
node-type API call that would tell you if it's a normal member or a
pseudo-member.
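As a sketch of the "shared dead status plus a node-type call" option, something along these lines would do. None of these names exist in the heartbeat API; they are placeholders to show the shape of the idea.

```python
from enum import Enum


class NodeType(Enum):
    NORMAL = "normal"  # runs heartbeat
    PING = "ping"      # pseudo-member, only answers pings


# Status values allowed for each node type; "dead" is shared by both.
ALLOWED_STATUS = {
    NodeType.NORMAL: {"dead", "up", "active"},
    NodeType.PING: {"dead", "ping"},
}


class Node:
    def __init__(self, name, node_type, status="dead"):
        self.name = name
        self.node_type = node_type
        self.status = None
        self.set_status(status)

    def set_status(self, status):
        # Reject a status that makes no sense for this kind of node.
        if status not in ALLOWED_STATUS[self.node_type]:
            raise ValueError(
                f"{status!r} is not valid for a {self.node_type.value} node")
        self.status = status


def node_type(nodes, name):
    """Hypothetical node-type query for a named member."""
    return nodes[name].node_type
```

Since a dead switch and a dead node both report "dead", a client that cares about the difference asks node_type() rather than guessing from the status string.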
A few observations:
You must choose the ping resource such that it's "impossible" for both
machines to reach the pseudo-member while being unable to reach each other,
whether over this interface or another one.
This does not eliminate the need for I/O fencing.
Arbitrarily perverse hardware failures can cause arbitrarily perverse problems
- and this is no exception.
You still want more than one heartbeat network - especially if you have shared
storage.
Thoughts? Comments?
-- Alan Robertson
alanr@suse.com