I have coded a quorum daemon with an associated life monitor and
am experiencing a problem with the quorum algorithm. I use the one described
by Stephen Tweedie in structure.txt[1]. The point of the quorum is
to be sure that at -most- one partition has the quorum. Let's consider
the following scenario:
[node1]<--->[many nodes]<----a link L--->[fewer nodes]<--->[node2]
At the beginning, there are no partitions, so every node has the quorum.
Suddenly the link L fails and the cluster is split, so only the part
with node1 is supposed to have the quorum.
But the information about the link failure doesn't reach node2
instantly. During this delay, node2 wrongly believes it has the
quorum. So this algorithm seems not to guarantee the quorum, but simply to
give a 'good probability'.
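
To make the window concrete, here is a minimal sketch of the scenario above.
It is only a toy model, not the daemon's code: the class NodeView, the
function node2_belief_timeline, the one-vote-per-node assumption, the 7-node
cluster size and the tick-based detection delay are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """What one node currently believes (hypothetical model)."""
    cluster_votes: int   # votes of the nodes it still believes are members
    expected_votes: int  # votes of every node ever seen

    def believes_quorum(self) -> bool:
        # Quorum rule from structure.txt: MORE than half of the expected votes.
        return 2 * self.cluster_votes > self.expected_votes


def node2_belief_timeline(total_votes, minority_votes, detect_delay, steps):
    """Link L fails at tick 0; node2 only learns of it at tick detect_delay."""
    view = NodeView(cluster_votes=total_votes, expected_votes=total_votes)
    timeline = []
    for tick in range(steps):
        if tick >= detect_delay:
            # The failure has finally reached node2: only its own side counts.
            view.cluster_votes = minority_votes
        timeline.append(view.believes_quorum())
    return timeline


# Assumed numbers: 7 votes in total, node2's side holds 3, detection takes 2 ticks.
timeline = node2_belief_timeline(total_votes=7, minority_votes=3,
                                 detect_delay=2, steps=4)
```

In this model node2 keeps believing it has the quorum for every tick before
detect_delay, although its partition already holds a minority of the votes.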
I have 2 questions:
1. Am I missing something?
2. Is a guarantee required, or is a 'good probability' enough?
[1] from structure.txt:
" Quorum is necessary to protect cluster-wide shared persistent
state. It is essential to avoid problems when we have "cluster
partition": a possible type of fault in which some of the cluster
members have lost communications with the rest, but where the nodes
themselves are still working. In a partitioned cluster, we need
some mechanism we can rely on to ensure that at most one partition
has the right to update the cluster's shared persistent state.
(That state might be a shared disk, for example.)
Quorum is maintained by assigning a number of votes to each node.
This is a configuration property of the node. The Quorum manager
keeps track of two separate vote counts: the "Cluster Votes", which
is the sum of the votes of every node which is a member of the
cluster, and the "Expected Votes", which is the sum of the votes on
every node which has ever been seen by any voting member of the
cluster. (The storage of those node records is one reason why the
Quorum layer requires a JDB in this design.)
The cluster has Quorum if, and only if, it possesses MORE than half
of the Expected Votes. This guarantees that the known nodes which
are not in this cluster can not possibly form a Quorum on their own."
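
The rule quoted above can be sketched as a small check. This is an
illustration under assumptions, not code from structure.txt: the function
name has_quorum and the example vote counts are made up for the example.

```python
def has_quorum(cluster_votes: int, expected_votes: int) -> bool:
    """True only if the partition holds MORE than half of the expected votes."""
    # Compare doubled votes to avoid fractional halves with odd totals.
    return 2 * cluster_votes > expected_votes


# Hypothetical 7-vote cluster split 4 / 3 by a partition.
majority_side = has_quorum(4, 7)
minority_side = has_quorum(3, 7)

# Hypothetical 6-vote cluster split evenly: neither half is MORE than half.
even_split = has_quorum(3, 6)
```

Because the comparison is strict, two disjoint partitions can never both
pass it when they evaluate it against the same, up-to-date vote counts. The
check says nothing about a node whose counts are stale.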