Hi,
This is just a note on what nice_failback is, and why it works so nicely
with drbd (DRBD was one of the main reasons nice_failback was added).
In heartbeat's old mode, a machine was designated the "natural master" of a
given resource, like drbd. This means that whenever that machine is up, it
becomes the master of that resource, taking it back from whichever machine
was holding it.
In nice_failback mode, a resource moves to another machine only when the
machine currently providing the resource goes down.
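To make the difference between the two modes concrete, here is a small
simulation sketch. It is not heartbeat code: the function names, the event
format, and the rule of picking the alphabetically first live node as the
new owner are all assumptions made up for illustration.

```python
def owner_history(events, natural_master, nice_failback):
    """Return the resource owner after each event.

    events: list of (node, is_up) tuples, applied in order.
    All names here are hypothetical; this only models the policy.
    """
    up = set()
    owner = None
    history = []
    for node, is_up in events:
        if is_up:
            up.add(node)
        else:
            up.discard(node)
        if nice_failback:
            # The resource moves only when its current owner is down.
            if owner not in up:
                owner = min(up) if up else None
        else:
            # Old mode: the natural master takes the resource whenever it is up.
            if natural_master in up:
                owner = natural_master
            elif owner not in up:
                owner = min(up) if up else None
        history.append(owner)
    return history


def count_transitions(history):
    """Count how many times the owner changed between consecutive events."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)


# Node "a" boots, node "b" boots, "a" dies, "a" comes back.
events = [("a", True), ("b", True), ("a", False), ("a", True)]
old_mode = owner_history(events, natural_master="a", nice_failback=False)
nice_mode = owner_history(events, natural_master="a", nice_failback=True)
```

With this event sequence the old mode moves the resource twice (to "b" when
"a" dies, then back to "a" when it returns), while nice_failback moves it
once and leaves it on "b".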
When possible, it is better for drbd to use nice_failback takeover for the
following reasons:
When a machine comes back up, it has to resync, which is a potentially
expensive operation. Any transition back to that machine has to wait until
the resync has finished. Nice_failback makes fewer transitions (failovers)
than normal failback does, so there's less of this going on.
Making drbd stop one end from being master and forcing it to be a slave, and
doing the reverse on the other end, is messy and complicated, particularly
if the slave isn't yet in sync (usually the case after it comes back up).
This doesn't *ever* happen with nice_failback. When the transition occurs,
the other machine is already down, so either you already have good data and
you can just fail over, or you don't and you can't get it at all. Either
way, it's easy.
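As a rough sketch of why the nice_failback case is simpler, the two
situations can be written out as the steps each one needs. The function
names and step descriptions below are invented for illustration; they are
not drbd or heartbeat commands.

```python
def takeover_steps(survivor_has_good_data):
    """Steps for the surviving node when the active node dies (nice_failback).

    Hypothetical model: the peer is already gone, so there is no role swap.
    """
    if survivor_has_good_data:
        return ["promote survivor to master"]
    # No good local copy and the peer is down: there is nothing to fail over to.
    return []


def failback_steps(returning_node_in_sync):
    """Steps to hand the resource back to a returning natural master (old mode)."""
    steps = []
    if not returning_node_in_sync:
        # The returning node is usually stale, so the handover must wait.
        steps.append("wait for resync to finish")
    steps.append("demote current master to slave")
    steps.append("promote returning node to master")
    return steps
```

In this model the nice_failback takeover is at most one step, while a
failback in the old mode needs a role swap on both ends, and usually a
resync wait before that.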
So, it is fair to say that drbd and nice_failback get along pretty well in
some conceptual sense.
However, the startup scripts assume that one machine is going to be the
master whenever it comes up. Of course, with nice_failback that doesn't
happen. What I wrote up as Phase I assumes nice_failback, and that's why
it's simpler.
-- Alan Robertson
alanr@example.com