Mailing List Archive

Problems with cluster integration due to distributed states
Currently I am working on integrating DRBD with a clustering SW other
than 'heartbeat'. However, there seems to be a general problem with cluster
integration.
One of the benefits of a cluster is that you have a central intelligence
which knows the state of all resources. Based on that it can decide which
resource becomes active. The status of the resources can be gathered by
probing or by capturing events.
The problem with DRBD is that the intelligence about the state is
distributed across the two nodes. In practice you read the state from each
node's proc filesystem. On the other hand there is an administrative
dependency between these two instances: e.g. you may not switch one instance
to primary if the other one is already primary.
The specific problem I am faced with is that during a graceful failover the
clustering software switches the active node to secondary. After the
appropriate 'drbdsetup /dev/nb0 secondary' command has been executed, the
former standby node is switched to primary with the 'drbdsetup /dev/nb0
primary' command. The problem is that these commands are not applied
synchronously to the other node.
This means that after switching one node to secondary, the DRBD instance on
the other node may still assume its peer is primary, which can lead to a
Primary/Primary situation. If this happens the other node is forced into a
StandAlone state, which is not what we wanted.
One solution would be to provide some kind of synchronous state transition
which notifies the caller once the other node has received the information
as well.

I appreciate your feedback :-)

/Wolfram



=======================================================================
Wolfram Weyer FORCE COMPUTERS GmbH
Staff Engineer - Systems Engineering A Solectron Subsidiary

phone: +49 89 60814-523 Street: Prof.-Messerschmitt-Str. 1
fax: +49 89 60814-112 City: D-85579 Neubiberg/Muenchen
mailto:Wolfram.Weyer@example.com
http://www.forcecomputers.com
=======================================================================
Re: Problems with cluster integration due to distributed states
I am trying to be more visible on the list again, so I guess I should
share my opinion on this...

* Weyer, Wolfram <Wolfram.Weyer@example.com> [011213 15:40]:
> Currently I am working on integrating DRBD with a clustering SW other
> than 'heartbeat'. However, there seems to be a general problem with cluster
> integration.
> One of the benefits of a cluster is that you have a central intelligence
> which knows the state of all resources. Based on that it can decide which
> resource becomes active. The status of the resources can be gathered by
> probing or by capturing events.
> The problem with DRBD is that the intelligence about the state is
> distributed across the two nodes. In practice you read the state from each
> node's proc filesystem.

This point was raised before. To address it, the "st:" field contains the
state of both the local and the remote node.
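For illustration, that field can be picked out of the proc output with
standard tools. This is only a sketch; the sample line and its exact layout
are assumptions based on this thread, so check them against the actual proc
output of your DRBD version:

```shell
# Hypothetical sample line from DRBD's proc file; "st:" reports
# LocalState/RemoteState as seen by this node.
line="0: cs:Connected st:Secondary/Primary ns:12 nr:0 dw:12 dr:0"

# Cut out the st: field and split it at the slash.
st=$(echo "$line" | sed -n 's/.*st:\([A-Za-z]*\/[A-Za-z]*\).*/\1/p')
echo "local=${st%/*} remote=${st#*/}"
```

A cluster agent can poll this periodically instead of relying on events.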

> On the other hand there is an administrative dependency
> between these two instances: e.g. you may not switch one instance to primary
> if the other one is already primary.
> The specific problem I am faced with is that during a graceful failover the
> clustering software switches the active node to secondary. After the
> appropriate 'drbdsetup /dev/nb0 secondary' command has been executed, the
> former standby node is switched to primary with the 'drbdsetup /dev/nb0
> primary' command. The problem is that these commands are not applied
> synchronously to the other node.
> This means that after switching one node to secondary, the DRBD instance on
> the other node may still assume its peer is primary, which can lead to a
> Primary/Primary situation. If this happens the other node is forced into a
> StandAlone state, which is not what we wanted.
> One solution would be to provide some kind of synchronous state transition
> which notifies the caller once the other node has received the information
> as well.
>
> I appreciate your feedback :-)
>
> /Wolfram
>

If you are issuing the two commands (drbdsetup xxx primary and
drbdsetup xxx secondary) really concurrently (which is a hard problem in its
own right), it should not lead to a disconnect.

Reason:
If you set the state of a node from sec->pri, it does not care about the
state of the other node (since these state transitions are not done in a
transactional way in DRBD). It simply sends a message to its partner
telling it about the state change.
The other node receives the state change and checks whether the change leads
to an invalid cluster state (pri/pri). If it is invalid, it will
disconnect = cs:StandAlone.
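The receiving side's rule can be sketched like this (a toy shell
illustration of the check just described, not DRBD source):

```shell
# The node that receives a peer state-change message checks whether the
# resulting combination would be the invalid Primary/Primary cluster
# state; in DRBD that case shows up as a disconnect (cs:StandAlone).
local_state=Primary
peer_state=Primary   # state-change message just received from the peer

if [ "$local_state" = Primary ] && [ "$peer_state" = Primary ]; then
    result=disconnect   # invalid: Primary/Primary
else
    result=accept
fi
echo "$result"
```

With local_state=Secondary the same message would simply be accepted.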

Solution for your problem:
Ensure a causal order of your drbdsetup calls, e.g. by sending a message
after issuing drbdsetup xxx secondary. The other node issues
drbdsetup xxx primary only after it has seen that message.
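As a toy model of this ordering (plain shell, no real DRBD), two state
files can stand in for the nodes; the promote helper mimics DRBD's refusal
of a Primary/Primary transition:

```shell
set -e
dir=$(mktemp -d)
echo Primary   > "$dir/nodeA"   # current active node
echo Secondary > "$dir/nodeB"   # current standby node

promote() {   # promote node $1 unless its peer $2 is still Primary
    if [ "$(cat "$dir/$2")" = Primary ]; then
        echo "refused: would create Primary/Primary" >&2
        return 1
    fi
    echo Primary > "$dir/$1"
}
demote() { echo Secondary > "$dir/$1"; }

demote nodeA          # step 1: demote the active node first
promote nodeB nodeA   # step 2: promote only after the demotion is visible
final="$(cat "$dir/nodeA")/$(cat "$dir/nodeB")"
echo "$final"
rm -rf "$dir"
```

Running the two steps in the opposite order would make promote fail, which
is the analogue of the disconnect Wolfram observed.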

A perhaps more pragmatic solution:
Control the whole graceful failover process from the current secondary node
(= future primary node) by using:
drbdsetup xxx secondary_remote
drbdsetup xxx primary

(I just noticed that "secondary_remote" is only documented in the man page;
it is still missing from the usage output.)
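Put together, the pragmatic variant looks like this when driven from the
current secondary. drbdsetup is stubbed here so the ordering can be shown
(and executed) without a DRBD device; on a real node the stub would be
dropped:

```shell
# Stub standing in for the real binary -- illustration only.
drbdsetup() { echo "drbdsetup $*"; }

drbdsetup /dev/nb0 secondary_remote   # demote the peer over the DRBD link
drbdsetup /dev/nb0 primary            # then take over locally
```

Because both commands run on one node, the causal order is guaranteed by
plain sequential execution, with no extra cluster message needed.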

Unfortunately secondary_remote does not have any reasonable return
semantics/value yet. -- One of the many things I should fix...

-Philipp

--
: Dipl-Ing Philipp Reisner Tel +43-1-8974897-750 :
: LINBIT Information Technologies GmbH http://www.linbit.com :
: Sechshauserstr 48, 1150 Wien :