drbd-reactor v0.9.0
Dear DRBD users,

This is drbd-reactor version 0.9.0. There are no changes since the last
RC; the original announcement is repeated here for convenience:

The main new feature is that the promoter plugin can now freeze the
services of the currently active node when it loses quorum and then thaw
them when the node gains quorum again. This can be an advantage when
starting services takes a long time (e.g., huge databases). Freezing and
thawing are instant and use the corresponding cgroup features via
systemctl freeze/thaw.
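
To illustrate what "via systemctl freeze/thaw" amounts to, here is a minimal
Python sketch that freezes or thaws a list of units one by one. It is not the
promoter's actual code; the helper names and the unit names in the usage note
are made up for illustration. It only assumes that "systemctl freeze UNIT"
and "systemctl thaw UNIT" are available on the system.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Iterable


def systemctl_cmd(action: str, unit: str) -> list[str]:
    """Build the systemctl command line for freezing or thawing one unit."""
    if action not in ("freeze", "thaw"):
        raise ValueError(f"unsupported action: {action}")
    return ["systemctl", action, unit]


def apply_to_units(action: str, units: Iterable[str],
                   run: Callable = subprocess.run) -> None:
    """Call systemctl once per unit; 'run' is injectable for dry runs."""
    for unit in units:
        # check=True raises CalledProcessError if systemctl fails.
        run(systemctl_cmd(action, unit), check=True)
```

For example, apply_to_units("freeze", ["example-db.service"]) would run
"systemctl freeze example-db.service", where example-db.service is a
hypothetical unit name.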

Copying from the documentation [1]:
The default behavior when a DRBD Primary loses quorum is to immediately
stop the generated target unit and hope that other nodes that still have
quorum will successfully start the service. This works well if services
can be failed over/started on another node in reasonable time.
Unfortunately there are services that take a very long time to start,
for example huge databases.

When a DRBD Primary loses its quorum we basically have two possibilities:

- The rest of the nodes, or at least some of them, still have quorum. Then
  these nodes have to start the service, as they are the only ones with
  quorum, but we could still keep the old Primary in a frozen state. When
  the nodes with quorum come into contact with the old Primary again, it
  should stop the service and its storage should resync with the other
  nodes.
- The rest of the nodes are not able to form a partition with quorum. In
  such a scenario there are no alternatives anyway: we would need to
  keep the Primary frozen. But if the nodes eventually rejoin the old
  Primary and quorum is restored, we could just unfreeze/thaw the old
  Primary (which is also the new Primary).
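
The two cases above can be summarized as a small decision function. This is
only a sketch of the reasoning, not drbd-reactor code: the names Action and
frozen_primary_action, and the two boolean inputs, are invented for
illustration.

```python
from enum import Enum


class Action(Enum):
    KEEP_FROZEN = "keep the service frozen"
    THAW = "thaw the service"
    STOP_AND_RESYNC = "stop the service and resync storage"


def frozen_primary_action(quorum_restored: bool,
                          other_primary_seen: bool) -> Action:
    """Decide what a frozen old Primary should do next.

    other_primary_seen: the old Primary reconnected to a partition that
        had quorum and started the service there (first case).
    quorum_restored: enough nodes rejoined the old Primary so that it
        has quorum again (second case).
    """
    if other_primary_seen:
        # Another node took over: give up the service and catch up on data.
        return Action.STOP_AND_RESYNC
    if quorum_restored:
        # Nobody took over: the old Primary is also the new Primary.
        return Action.THAW
    # Still isolated without quorum: nothing else to do but wait.
    return Action.KEEP_FROZEN
```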

There are several requirements for this to work properly:

- A system with unified cgroups. If the file
  /sys/fs/cgroup/cgroup.controllers exists you should be fine. That
  requires a relatively recent kernel. Note that even RHEL8, for example,
  needs the addition of systemd.unified_cgroup_hierarchy on the kernel
  command line.
- A service that can tolerate being frozen
- DRBD option on-suspended-primary-outdated set to force-secondary
- DRBD option on-no-quorum set to suspend-io
- DRBD net option rr-conflict set to retry-connect

If these requirements are fulfilled, then one can set the promoter
option "on-quorum-loss" to "freeze".
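
As a rough pre-flight check, the machine-checkable requirements can be
verified with a few lines of Python. This is a sketch under assumptions: the
function names are hypothetical, the DRBD options are passed in as a plain
dict (how you obtain them from your resource configuration is up to you),
and the DRBD quorum option is spelled on-no-quorum, its name in the DRBD 9
configuration, to distinguish it from the promoter's own on-quorum-loss
setting. Whether the service tolerates being frozen cannot be checked this
way.

```python
from __future__ import annotations

from pathlib import Path

# DRBD settings required for freezing, as listed above.
REQUIRED_DRBD_OPTIONS = {
    "on-suspended-primary-outdated": "force-secondary",
    "on-no-quorum": "suspend-io",
    "rr-conflict": "retry-connect",
}


def has_unified_cgroups(cgroup_root: str = "/sys/fs/cgroup") -> bool:
    """True if the cgroup.controllers file exists below cgroup_root."""
    return (Path(cgroup_root) / "cgroup.controllers").is_file()


def freeze_problems(drbd_options: dict[str, str],
                    cgroup_root: str = "/sys/fs/cgroup") -> list[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems = []
    if not has_unified_cgroups(cgroup_root):
        problems.append("unified cgroups not available")
    for name, wanted in REQUIRED_DRBD_OPTIONS.items():
        actual = drbd_options.get(name)
        if actual != wanted:
            problems.append(f"{name} is {actual!r}, expected {wanted!r}")
    return problems
```

If freeze_problems(...) returns an empty list, the checks listed here pass;
the option checks reported by drbd-reactor itself remain the authoritative
source.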

It is a feature that might be handy in specific situations; the more
classic behavior of stopping the services might be the better default
for most users.

Also, and this is important in general, check the output of "systemctl
status drbd-reactor.service". It runs all kinds of DRBD option checks on
your DRBD resources and tells you which options are missing or wrong.
Follow these suggestions!

Regards, rck

GIT: https://github.com/LINBIT/drbd-reactor/commit/b9639b431f6d6e0fcc53a2a17d85717acb29d43e
TGZ: https://pkg.linbit.com//downloads/drbd/utils/drbd-reactor-0.9.0.tar.gz
PPA: https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack

Changelog:
[ Roland Kammerer ]
* doc: make man pages o+r
* docs,promoter: hint to use provided packages
* promoter: warn if mount unit is topmost unit
* promoter: implement on-quorum-loss policy
* promoter: relax ocf parser
* ctl: add resource filter
* ctl: fix status without res filter
* promoter: call systemctl freeze/thaw for every unit

[1] https://github.com/LINBIT/drbd-reactor/blob/master/doc/promoter.md#freezing-resources