
drbd-reactor v0.5.0-rc.1 (including HA FS mount example)
Dear DRBD(-reactor) users,

this is the first release candidate of version 0.5.0.

Besides minor fixes for Ubuntu Bionic and some upgrades for containers,
the main feature is proper demote-failure handling in the promoter
plugin.

There was an "on-stop-failure" action, which worked at one point, but
it has done nothing since we switched to a fancier systemd.target
logic. What we really care about when managing services (and a potential
fail-over caused by a service failure) is whether things are stopped in
a way that allows the DRBD device to be demoted to Secondary. If not, we
might need to halt or reboot the node so that another node can take over
the DRBD resource and the services depending on it.

This is done via the new setting "on-drbd-demote-failure";
"on-stop-failure" is deprecated and ignored. The new option can be set
to any action defined for "FailureAction" in systemd.unit(5), for
example "reboot", "reboot-force", or "poweroff". If the DRBD resource
cannot be demoted, that action is executed.
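As a minimal sketch, a promoter section using the new option could look
like this (the resource name "r0" and "foo.service" are placeholders;
a complete, concrete example follows below):

```toml
[[promoter]]
id = "example"
[promoter.resources.r0]
start = ["foo.service"]
# reboot this node if the DRBD resource cannot be demoted on stop
on-drbd-demote-failure = "reboot"
```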

Let's see what that looks like in an HA cluster providing a file system
mount. I assume a working LINSTOR cluster (though it is not strictly
required). A good way to help us test is to use the PPA. If you are
using fresh VMs, make sure that you restart multipathd after the first
drbd-utils install. So, let's assume a 3-node cluster, which also has
this RC of drbd-reactor installed:

Let's create a 3-node DRBD resource:
$ linstor rg c --place-count 3 promoter
$ linstor rg drbd-options promoter --auto-promote no
$ linstor rg drbd-options promoter --quorum majority
$ linstor rg drbd-options promoter --on-no-quorum io-error
$ linstor vg c promoter
$ linstor rg spawn promoter test 20M

And a file system:
$ drbdadm primary test
$ mkfs.ext4 /dev/drbd1000
$ drbdadm secondary test

And a mount unit for the storage:
on *all* nodes:
$ cat <<EOF > /etc/systemd/system/mnt-test.mount
[Unit]
Description=Mount /dev/drbd1000 to /mnt/test

[Mount]
What=/dev/drbd1000
Where=/mnt/test
Type=ext4
EOF

And a simple drbd-reactor::promoter config:
on *all* nodes:
$ cat <<EOF > /etc/drbd-reactor.d/mnt-test.toml
[[promoter]]
id = "mnt-test"
[promoter.resources.test]
start = ["mnt-test.mount"]
on-drbd-demote-failure = "reboot"
EOF

on *all* nodes:
$ systemctl daemon-reload  # pick up the new mount unit
$ systemctl start drbd-reactor

Then you can check which node is Primary and has the device mounted:
$ drbd-reactorctl status mnt-test

On the node that is Primary you can do a switch-over, just for testing:
$ drbd-reactorctl disable --now mnt-test
$ # another node should be primary now and have the FS mounted
$ drbd-reactorctl enable mnt-test # to re-enable the config again

To test the demote-failure handling, connect to the node that is Primary:
$ touch /mnt/test/lock
$ sleep 3600 < /mnt/test/lock &
$ # ^^ this creates an opener and the mount unit will be unable to stop
$ # and the DRBD device will be unable to demote
$ # trigger a stop/restart of the target:
$ systemctl restart drbd-services@test.target

This should trigger the reboot action and another node should take over
the mount.

Please help with testing.

Regards, rck

GIT: https://github.com/LINBIT/drbd-reactor/commit/3df014d63611b97470728838afdaf2d313f0d786
TGZ: https://pkg.linbit.com//downloads/drbd/utils/drbd-reactor-0.5.0-rc.1.tar.gz
PPA: https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack/