Mailing List Archive: State information

I've just successfully set up DRBD 0.6.1-pre3 and heartbeat on two new
Athlon boxes to provide failover NFS service. It's succeeded beyond my
dreams so far - so congratulations all.

When setting it up, there seemed to be a couple of errors in the
"datadisk" script - it calls drbdsetup with the arguments "pri" and
"sec" rather than "primary" and "secondary", which caused it to fail on
my setup.

I have a few questions which I need to answer before moving this onto
any kind of "production" box though, and I hope someone will be able to
help.

Firstly, what (2.4) kernel versions are recommended, if any, to work
best with DRBD?

Secondly, I don't really have a good understanding of the different
states that the system can be in. Where can I look for documentation /
code on this? For example: clearly, if host1 is primary and host2 is
secondary, and if host1 goes down, then datadisk will make host2
primary, and then when host1 comes back up it will become secondary and
do a SyncAll. But what happens if both machines go down? Will they
both come back up in secondary state? When is manual intervention
required? What happens if host1 and host2 are primary/secondary as
before, host1 goes down, comes back up and starts synchronising, but
then host2 goes down while this sync is occurring? Is there some sort
of state diagram available somewhere? (there are clearly several
possibilities I haven't mentioned here, and I'd like to have a good
understanding of what might happen before deciding how to test and
benchmark my setup)

Lastly, I'm getting a lot of errors in the syslog from drbd:

drbd0: transferlog too small!!
drbd0: Epoch set size wrong!!found=512 reported=1535

I know from the mailing list archive that the first one isn't a problem
- although I have tried to remove it by setting tl-size=1024, with no
success. But what does the second message mean? Should I be worried?

Hope someone can help!

Many thanks

Jack Bertram

* Jack Bertram <drbd@example.com> [011017 12:45]:
> I've just successfully set up DRBD 0.6.1-pre3 and heartbeat on two new
> Athlon boxes to provide failover NFS service. It's succeeded beyond my
> dreams so far - so congratulations all.
>
> When setting it up, there seemed to be a couple of errors in the
> "datadisk" script - it calls drbdsetup with the arguments "pri" and
> "sec" rather than "primary" and "secondary", which caused it to fail on
> my setup.

Oops, you got the wrong datadisk script. The datadisk script should
be a symlink to /etc/init.d/drbd, which in turn is a "multihomed script".
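
For readers unfamiliar with the term, "multihomed" here presumably means that one script file behaves differently depending on the name it was invoked under. The following is only a minimal sketch of that idea, not the actual DRBD script: the function name role_for_name and the mode names are hypothetical, and the real script's logic may differ.

```sh
#!/bin/sh
# Hypothetical sketch of a "multihomed" script: the same file is reached
# through two names (the real file and a symlink) and selects its behaviour
# from the name it was called with.

role_for_name() {
    # $1 is the path the script was invoked as (normally "$0").
    case "$(basename "$1")" in
        datadisk) echo "resource" ;;  # invoked via the datadisk symlink
        drbd)     echo "init" ;;      # invoked directly as the init script
        *)        echo "unknown" ;;   # any other name is not handled
    esac
}

echo "invoked as $(basename "$0"): mode $(role_for_name "$0")"
```

With a layout like this, the datadisk name would be created with something like "ln -s /etc/init.d/drbd datadisk" in whichever directory heartbeat looks for resource scripts (that location is an assumption and depends on the installation).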

>
> I have a few questions which I need to answer before moving this onto
> any kind of "production" box though, and I hope someone will be able to
> help.
>
> Firstly, what (2.4) kernel versions are recommended, if any, to work
> best with DRBD?

Hahahaha, choose one with a working VM system! I am still using only
Linus' kernels, but I must say, I got the impression that there are
a few good reasons to switch to Alan's tree.

>
> Secondly, I don't really have a good understanding of the different
> states that the system can be in. Where can I look for documentation /
> code on this? For example: clearly, if host1 is primary and host2 is
> secondary, and if host1 goes down, then datadisk will make host2
> primary, and then when host1 comes back up it will become secondary and
> do a SyncAll. But what happens if both machines go down? Will they
> both come back up in secondary state? When is manual intervention
> required? What happens if host1 and host2 are primary/secondary as
> before, host1 goes down, comes back up and starts synchronising, but
> then host2 goes down while this sync is occurring? Is there some sort
> of state diagram available somewhere? (there are clearly several
> possibilities I haven't mentioned here, and I'd like to have a good
> understanding of what might happen before deciding how to test and
> benchmark my setup)
>

The best way to get an overview is to read the Ede paper:
http://www.complang.tuwien.ac.at/reisner/drbd/publications/drbd_paper_for_NLUUG_2001.ps.gz

> Lastly, I'm getting a lot of errors in the syslog from drbd:
>
> drbd0: transferlog too small!!
> drbd0: Epoch set size wrong!!found=512 reported=1535

Try tl-size=6000

> I know from the mailing list archive that the first one isn't a problem
> - although I have tried to remove it by setting tl-size=1024, with no
> success. But what does the second message mean? Should I be worried?

The second message is a consequence of the transfer log being too small.
Only the write-after-write dependency analysis failed; your running
kernel is not messed up.

>
> Hope someone can help!
>
> Many thanks
>
> Jack Bertram

-Philipp