Mailing List Archive

working with a cluster manager (heartbeat)
Hi,

Sorry if this has been discussed before (I'm pretty far behind on the list),
but it doesn't seem to be resolved (by reading the code and trying it out).
So, if I'm being an idiot, tell me right away, but be gentle ;-)

Last week in Germany I finally tried out drbd with heartbeat. As far as I
can tell, the two don't really get along too well. Heartbeat thinks it's
in charge, yet drbd really needs more control than it is getting.

The problem is not drbd per se, but the scripts which start it and stop it.
Since I know heartbeat pretty well, and think I understand the basics of
what drbd does, I'll go ahead and jump in and make a proposal for how the
scripts ought to work. I slept too much on the plane to write the code, but
I meant to ;-) If people think this will work, I'll write it and test it
this week.

I have clear ideas on how to do this in 3 phases. I foolishly think they
might even work ;-) I asked Marius to go over this with me, and I convinced
him it would work too.

drbd comes with two scripts:

An init script to activate the drbd service, and either become secondary
or prepare to become primary.

A heartbeat start/stop script which instructs drbd to become primary
or drop its primary status and become secondary

I'll go through three phases in this discussion, each giving a different
level of solution. They are in order of implementation complexity.

The first phase will *only* work with nice_failback in heartbeat.

The second phase will work with either nice_failback or not.

The third phase will handle two servers going down at once and both coming
back up. It will not handle both going down, and only one coming back up.

I use these abbreviations for the state you get when you ask drbd about the
other side:
NC: No connection to the other side
PRI: Other side is primary
SEC: Other side is secondary

I call this state of the other side "OtherState" in the text below.
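
For concreteness, here is a minimal sketch of a helper a script could use to
compute OtherState. The /proc/drbd field layout it matches against is an
assumption (it varies between drbd versions), so treat the patterns as
illustrative only:

    # Print NC, PRI or SEC for drbd device number $1.
    # ASSUMPTION: /proc/drbd shows a connection state ("cs:") and the
    # roles as "local/remote"; adjust the patterns to your version.
    other_state() {
        line=`grep "^ *$1:" /proc/drbd`
        case "$line" in
            *cs:Connected*/Primary*)   echo PRI ;;
            *cs:Connected*/Secondary*) echo SEC ;;
            *)                         echo NC ;;
        esac
    }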

For all phases, one needs to add an option to the init script with which a
human being can force a machine to become primary: "/etc/rc.d/init.d/drbd
primary!" or something like that. This option sets State := GOOD.
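
As a tiny sketch, that option might look like this in the init script (the
state file path /var/run/drbd.state is my own invention; the state file
itself is described in the first phase below):

    # Hypothetical "force primary" fragment for /etc/rc.d/init.d/drbd:
    case "$1" in
      primary!)
          # A human asserts that our copy of the data is good.
          echo GOOD > /var/run/drbd.state    # State := GOOD
          ;;
    esac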

==== First phase:=============================================
This phase requires a state file with one of three possible states stored in
it:
GOOD: we have a good copy of the data
BAD: we have no good data
SYNC: we are synchronizing with the other side right now.

I use the variable name "State" to refer to this state of our local data.
This should not (need not) persist between reboots.
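
A minimal sketch of the state file handling, using the invented path from
above (keeping it under /var/run means it naturally will not survive a
reboot):

    STATEFILE=/var/run/drbd.state
    # Read the local data state; default to BAD if the file is missing.
    get_state() { cat $STATEFILE 2>/dev/null || echo BAD; }
    set_state() { echo "$1" > $STATEFILE; }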

INIT SCRIPT "start" logic:-------:

OtherState == NC:  State := BAD. GET HUMAN HELP. This is the case where
                   we cannot start because we don't have good data, and
                   we can't get it from the other side.

OtherState == PRI: State := SYNC. This is the case where the other side
                   is primary, and we're just coming up.

                   Try a quick sync.
                   If the quick sync succeeds,
                       State := GOOD
                   else try a full sync:
                       monitor the full sync in the background;
                       if it fails, State := BAD;
                       when it succeeds, State := GOOD.

OtherState == SEC: State := BAD. GET HUMAN HELP. This is the case where
                   neither of us knows it has good data. We need some
                   human to appoint one of us primary before we can
                   continue. This happens after both machines crash and
                   then come back up.
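
As a rough sketch, using the helpers invented earlier (quick_sync and
full_sync are placeholders for whatever commands drbd actually provides),
the "start" logic could look like:

    case `other_state 0` in
      NC)  set_state BAD            # no good data, and nowhere to get it
           echo "GET HUMAN HELP" ;;
      PRI) set_state SYNC
           if quick_sync; then
               set_state GOOD
           else
               # Fall back to a full sync, monitored in the background.
               ( full_sync && set_state GOOD || set_state BAD ) &
           fi ;;
      SEC) set_state BAD            # neither side knows it has good data
           echo "GET HUMAN HELP" ;;
    esac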


STARTUP SCRIPT logic:-------:
OtherState == NC or SEC: if State == GOOD, force primary role
                         else GET HUMAN HELP

OtherState == PRI: GET HUMAN HELP. This means the other side is still
                   trying to own the resource too. This shouldn't happen.
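
A sketch of the corresponding start action in the heartbeat resource
script; the drbdsetup invocation is my guess at the syntax, so check it
against your drbd version:

    case `other_state 0` in
      NC|SEC)
          if [ "`get_state`" = GOOD ]; then
              drbdsetup /dev/nb0 primary    # force the primary role
          else
              echo "GET HUMAN HELP"; exit 1
          fi ;;
      PRI)
          echo "Other side is still primary: GET HUMAN HELP"; exit 1 ;;
    esac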

==== Second phase:=============================================

Same as first phase EXCEPT FOR

STARTUP SCRIPT logic:-------:
OtherState == NC or SEC: if State == GOOD, force primary role
                         else GET HUMAN HELP {same as before}

OtherState == PRI: If State == SYNC:
                       wait for the sync to complete, then go on.
                       If it fails, GET HUMAN HELP.

                   If State == BAD:
                       GET HUMAN HELP.

                   If State == GOOD:
                       send the other side a message asking it to become
                       secondary, then force the local side into primary.
                       Ideally, this would be done with a DRBD command,
                       but I'm not sure if it can be. Failing that, use
                       the local cluster manager API to send the message.
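
A sketch of the second-phase start action; wait_for_sync and
ask_peer_to_yield are invented names standing in for a future DRBD command
or a cluster manager API message, as described above:

    case `other_state 0` in
      NC|SEC)
          # Same as the first phase.
          ;;
      PRI)
          case `get_state` in
            SYNC) wait_for_sync || { echo "GET HUMAN HELP"; exit 1; } ;;
            BAD)  echo "GET HUMAN HELP"; exit 1 ;;
            GOOD) ask_peer_to_yield           # peer drops to secondary
                  drbdsetup /dev/nb0 primary ;;
          esac ;;
    esac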


==== Third phase:=============================================

For the third phase, one needs permanent generation tuples. Each is an
ordered pair {manual, auto}. They must persist across reboots.

The manual number is incremented every time a node is forced to become
primary manually. When this happens, the auto number is reset to 1. The
auto element of the tuple is incremented every time a node becomes primary.
The secondary keeps the same generation tuple as the primary; it only
increments it when it takes over from the primary.

There is a ">" relation on generation tuples that compares the tuple
elements in (manual, auto) order: (1,5) > (1,4); (2,1) > (1,4); and so on.
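
To make the ordering concrete, here is a small sketch that compares two
tuples passed as "manual auto" strings:

    # gen_gt "2 1" "1 4" succeeds; gen_gt "1 4" "1 5" fails.
    gen_gt() {
        m1=`echo $1 | cut -d' ' -f1`; a1=`echo $1 | cut -d' ' -f2`
        m2=`echo $2 | cut -d' ' -f1`; a2=`echo $2 | cut -d' ' -f2`
        [ $m1 -gt $m2 ] || { [ $m1 -eq $m2 ] && [ $a1 -gt $a2 ]; }
    }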

This technique requires either rewriting DRBD to accommodate these
generation numbers, or writing code which uses the local cluster manager
API to send them around.

In an ideal world these numbers would be stored inside the partition,
because then they would not get confused when disks get replaced, etc. It
would be nice for DRBD to support them in this way at least as an option.
This would make it more bullet-proof in the real world.

On to how they're used...

The scripts above deal with every case except this one:

INIT SCRIPT "start" logic:-------:

OtherState == SEC:

In this case, the two sides exchange generation tuples, and if
one of them has a higher number, it changes its state to GOOD,
and the other one does a full sync from the GOOD side. When
the sync completes, the secondary also sets its state to GOOD.
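
A sketch of that SEC branch, with exchange_generations standing in for
whichever transport (a rewritten DRBD, or the cluster manager API) actually
carries the tuples, and /var/lib/drbd/generation as an invented persistent
location for our own tuple:

    SEC)
        ours=`cat /var/lib/drbd/generation`
        theirs=`exchange_generations "$ours"`   # invented glue function
        if gen_gt "$ours" "$theirs"; then
            set_state GOOD     # we have the newer data; the peer syncs from us
        else
            set_state SYNC
            full_sync && set_state GOOD || set_state BAD
        fi ;;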

There's a little more logic to making this work with heartbeat and making
sure the side which wants to be heartbeat master is also the DRBD master. I
leave these details as an exercise to the reader :-)

-- Alan Robertson
alanr@example.com
RE: working with a cluster manager (heartbeat)
Alan,

Have you played with the CVS version of datadisk?
As far as I can tell... it's a lot better than the one packaged with 0.5.7.
Although we're still having problems with it and DRBD... I don't know.
Hope you can get your implementation out soon ;-) (I have a system we're
putting into production in a few weeks, and heartbeat + DRBD + RAID1 + Reiser
seems to be a not-so-stable mix at the moment... darn :-)

Best Regards,

--
Omar Kilani
Systems Administrator
Mail Call Couriers Pty Ltd
Re: working with a cluster manager (heartbeat)
Omar Kilani wrote:
>
> Alan,
>
> Have you played with the CVS version of datadisk?
> As far as I can tell... it's a lot better than the one packaged with 0.5.7.
> Although we're still having problems with it and DRBD... I don't know.
> Hope you can get your implementation out soon ;-) (I have a system we're
> putting into production in a few weeks, and heartbeat + DRBD + RAID1 + Reiser
> seems to be a not-so-stable mix at the moment... darn :-)

I can make it work pretty quickly if you are willing to use nice_failback
(which is probably what you want for DRBD anyway).

-- Alan Robertson
alanr@example.com
Re: working with a cluster manager (heartbeat)
Alan,

What about without nice_failback?
Is that just an issue of figuring out which node is the primary and so on?

Regards

On Tuesday 21 November 2000 09:05, you wrote:
> Omar Kilani wrote:
> > Alan,
> >
> > Have you played with the CVS version of datadisk?
> > As far as I can tell... it's a lot better than the one packaged with
> > 0.5.7. Although we're still having problems with it and DRBD... I
> > don't know. Hope you can get your implementation out soon ;-) (I have a
> > system we're putting into production in a few weeks, and heartbeat + DRBD
> > + RAID1 + Reiser seems to be a not-so-stable mix at the moment... darn
> > :-)
>
> I can make it work pretty quickly if you are willing to use nice_failback
> (which is probably what you want for DRBD anyway).
>
> -- Alan Robertson
> alanr@example.com

--
Omar Kilani
Systems Administrator
Mail Call Couriers Pty Ltd
Re: working with a cluster manager (heartbeat)
Omar Kilani wrote:
>
> Alan,
>
> What about without nice_failback?
> Is that just an issue of figuring out which node is the primary and so on?
>
> Regards
>
> On Tuesday 21 November 2000 09:05, you wrote:
> > Omar Kilani wrote:
> > > Alan,
> > >
> > > Have you played with the CVS version of datadisk?
> > > As far as I can tell... it's a lot better than the one packaged with
> > > 0.5.7. Although we're still having problems with it and DRBD... I
> > > don't know. Hope you can get your implementation out soon ;-) (I have a
> > > system we're putting into production in a few weeks, and heartbeat +
> > > DRBD + RAID1 + Reiser seems to be a not-so-stable mix at the moment...
> > > darn :-)
> >
> > I can make it work pretty quickly if you are willing to use nice_failback
> > (which is probably what you want for DRBD anyway).


Nice failback is much better when you run DRBD, for a couple of reasons.
First, you have fewer outages: if a machine crashes and comes back up, you
only get the one takeover outage, not a second outage when the resource is
taken back. Second, the takeback can't actually happen until the crashed
machine comes back up and gets resynced, which can make the takeback
interval very long. Neither of these is a consideration when using
nice_failback.

When you use nice_failback, any machine which has a resource keeps it until
it dies. Without nice_failback, a designated machine (the natural master,
if you will) will always take back any resources it wants, when it wants
them.
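
For reference, turning this on is a one-line directive in heartbeat's
configuration:

    # /etc/ha.d/ha.cf (fragment)
    nice_failback on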

-- Alan Robertson
alanr@example.com
Re: working with a cluster manager (heartbeat)
> Alan,
>
> Have you played with the CVS version of datadisk?
> As far as I can tell... it's a lot better than the one packaged with 0.5.7.
> Although we're still having problems with it and DRBD... I don't know.
> Hope you can get your implementation out soon ;-) (I have a system we're
> putting into production in a few weeks, and heartbeat + DRBD + RAID1 + Reiser
> seems to be a not-so-stable mix at the moment... darn :-)

Let me know what problems you have and I will do my best to help you
with datadisk.

Thomas
--
Thomas Mangin (mailto:thomas.mangin@example.com)
System Administrator (mailto:systems@example.com)
Legend Internet Ltd. (http://www.legend.co.uk:/)
--
The urgent is done, the impossible is on the way, for miracles expect a
small delay
Re: working with a cluster manager (heartbeat)
Thomas Mangin wrote:
>
> > Alan,
> >
> > Have you played with the CVS version of datadisk?
> > As far as I can tell... it's a lot better than the one packaged with 0.5.7.
> > Although we're still having problems with it and DRBD... I don't know.
> > Hope you can get your implementation out soon ;-) (I have a system we're
> > putting into production in a few weeks, and heartbeat + DRBD + RAID1 + Reiser
> > seems to be a not-so-stable mix at the moment... darn :-)
>
> Let me know what problems you have and I will do my best to help you
> with datadisk.

Thanks Thomas!

It has been pointed out to me that there has been much good work in the CVS
tree which I haven't looked at. I should have known better; my apologies
for being in a hurry and bringing this to the list before investigating the
CVS tree. I will go away and read that code before proceeding.


-- Alan Robertson
alanr@example.com