Marcelo Tosatti wrote:
>
> On Mon, 17 Apr 2000, Alan Robertson wrote:
>
> > Horms wrote:
> > >
> > > On Mon, Apr 17, 2000 at 09:39:47AM -0600, Alan Robertson wrote:
> > > > "Luis Claudio R. Goncalves" wrote:
> > > > >
> > > > > Hello!
> > > > >
> > > > > I think (I *hope*, to be honest) this is the nicest nice_failback
> > > > > patch I've ever done. It adds some features that I'll extend next
> > > > > Monday, like periodic resources_held messages and so on.
> > > > > I'd challenge the brave ones to test this code (I'm still hard-testing
> > > > > it). If someone survives and gives me some feedback, I'll put it in
> > > > > CVS on Monday :)
> > > >
> > > > Sorry I was out of commission for a few days, so this is coming late,
> > > > and sounds like a broken record. I am generally opposed to putting
> > > > support of resources into heartbeat, particularly if they restrict
> > > > things to only two machines.
> > >
> > > The problem is that heartbeat as it stands has a _serious_ flaw.
> >
> > Agreed.
> >
> > > If all links fail then resources become owned by more than one
> > > machine and will not be relinquished once links are re-established.
> >
> > Yes, but a better way might be to have pseudo-quorum based on the
> > reachability of something like a router or switch or hub. And do
> > something I'll outline below, ALSO.
> That was exactly our idea when we started to help with heartbeat.
> The "FIXME: do something useful here" on my patch is there because we
> need scripts to "do" the pseudo-quorum. Luis already explained this
> in a past message.
> (http://lists.tummy.com/pipermail/linux-ha-dev/2000-March/000460.html)
I knew it had been discussed, but forgot when. If you do this, then
doesn't this largely solve the problem?
I believe that the low-level protocol is capable of noticing the joining
together of two independent clusters, and activating a special "oops"
case when it detects that the systems have rejoined. This could be the
audit script we've talked about. This *can* happen with perverse enough
failures even with a pseudo-quorum device. For those cases where a
pseudo-quorum device isn't configured, it is much more likely.
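As a rough illustration (not existing heartbeat code), the "oops" case could amount to each node comparing the resource groups a rejoining peer claims against its own; any overlap means two clusters have merged and the audit script should fire. All names here are made up:

```shell
#!/bin/sh
# Sketch of the "oops" case: each node announces the resource groups it
# holds, and any overlap with a rejoining peer's claims means two
# clusters have merged. Names (MY_GROUPS, check_merge) are hypothetical.

MY_GROUPS="group1 group3"       # groups this node believes it owns

# check_merge PEER GROUP... -> prints an oops line and fails on overlap
check_merge() {
    peer=$1; shift
    for g in "$@"; do
        case " $MY_GROUPS " in
        *" $g "*)
            echo "oops: $peer also holds $g"   # here we'd exec the audit script
            return 1 ;;
        esac
    done
    return 0
}

check_merge node2 group2                # no overlap: silent
check_merge node2 group3 group4 || :    # overlap on group3: prints the oops line
```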
> > > At the moment the hack is to have as many links used for heartbeat
> > > communication as possible and hope that you never run into a situation
> > > where nodes lose communication with each other and yet are fully
> > > functional. This is, in my opinion, an acceptable situation in the short
> > > term, as in the case of 2 nodes a serial link should give you a
> > > fair amount of security against all links failing.
> >
> > I don't think I'd call it a "hack", and would recommend it even if it
> > didn't help solve the problem. But, this is not to fundamentally
> > disagree with your assessment.
> >
> > > To my mind to get around this problem the best way forward is to have
> > > nodes keep track of resources internally; when nodes change state they can
> > > check to see if a resource is - or could potentially be - owned by
> > > any other node on the network. Without this, you have to assume
> > > that a node being accessible means its resources are accessible.
> > > Especially in the case where there is no master for a resource, there
> > > is no way to make such assumptions without the possibility of situations
> > > where either resources are duplicated or disappear off the network.
> >
> > With drbd (for example), you MUST NEVER have both sides have the mirror
> > mounted read-write simultaneously, so your solution is insufficient for
> > this. The only way I know to handle this is to follow Stephen's
> > suggestion of having a pseudo-quorum resource that you have to "own" in
> > order to own the master side of the mirror. It should work like this:
> > If you can reach the hub and you can't reach the master, then you may
> > take over the drbd resource.
> > If you can't reach the hub, then you should probably shut down, and
> > await its becoming available again.
> >
> > This will fail to work in the following very unlikely situation:
> > Both sides can reach the hub/switch/router,
> > Neither side can talk to the other (including via alternate paths)
> Then both sit_and_cry().
No. Because each can reach the pseudo-quorum device, neither sits and
cries. This is a very unlikely failure, but not impossible. It would
require a particular type of failure inside the pseudo-quorum
hub/router, and also a simultaneous failure of the redundant heartbeat
link.
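The takeover rule I'm describing boils down to a three-way decision. A minimal sketch follows, with reachability faked by a host list so it stays self-contained (a real script would use a ping with a short timeout):

```shell
#!/bin/sh
# Sketch of the pseudo-quorum takeover rule. Reachability is faked with
# a host list so the sketch is self-contained; a real script would use
# something like: ping -c 1 -w 2 "$1" >/dev/null 2>&1
UP_HOSTS="hub"                  # hosts this node can currently reach

can_reach() {
    case " $UP_HOSTS " in *" $1 "*) return 0 ;; *) return 1 ;; esac
}

# decide HUB MASTER -> prints the action this node should take
decide() {
    if ! can_reach "$1"; then
        echo shutdown           # lost the pseudo-quorum device: give up and wait
    elif can_reach "$2"; then
        echo standby            # master still reachable: leave resources alone
    else
        echo takeover           # quorum held and master gone: take over drbd
    fi
}

decide hub master               # master unreachable here, so: takeover
```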
> > [This is at least a double failure]
> >
> > This also solves another important problem:
> > A side staying up when it can't serve its customers.
> >
> > Pardon me if I've forgotten, but does this solve the same problems as
> > you're trying to solve?
> Horms?
Let me make a proposal, and see if anyone is interested in implementing
pieces of it:
Someone (maybe me) should implement the code for detecting cluster
merge. We should activate an external script when it is
discovered. Name and arguments to be determined.
We ought to implement pseudo-quorum. I'm open as to the details.
Thoughts include:

  1. A new resource type, ping-quorum::135.9.214.51, and making
     the takeover scripts actually check the return code of one
     resource before taking over others :-)  In the current
     implementation, you'd have to list it last on a line.

  2. Including the ping resource as a pseudo-host, and having it
     execute the status script whenever it comes and goes.  This
     would be a bit of a kludge, but not so bad.  You'd mangle
     ping responses to make them look like received heartbeat
     packets.  Not really *that* bad...  Something special would
     have to be done when it came and went... I suppose giving up
     all resources when it disappears, and getting them back when
     it comes back.

  3. Doing it externally with a cron job that stops and starts
     heartbeat.
Are these complete?
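For the first idea, a ping-quorum resource script might look roughly like this - the name, arguments, and the start/stop/status convention are assumptions about how it would plug in, not existing code:

```shell
#!/bin/sh
# Sketch of a ping-quorum resource script in the familiar
# <args> {start|stop|status} style. The name and interface details
# are assumptions, not part of heartbeat today.

quorum_ok() {
    # The resource is "held" iff the quorum device answers a ping.
    ping -c 1 "$1" >/dev/null 2>&1
}

if [ $# -gt 0 ]; then
    ADDR=$1
    case "$2" in
    start|status)
        # Takeover scripts would check this exit code before taking
        # over anything listed after ping-quorum on the same line.
        quorum_ok "$ADDR"
        ;;
    stop)
        :           # nothing to release
        ;;
    *)
        echo "usage: $0 <ip-address> {start|stop|status}" >&2
        exit 1
        ;;
    esac
fi
```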
Now, in this context, it seems to me that the nice_failback still has to
worry about whether the other side has any of the resources (which may
be where we started this conversation). You could always add a message
type and ask... The only difference between nice_failback and normal
behavior is whether you ask if the other side has the resources, or
just take them over anyway.
If you don't want to add any new message types, you could always
implement nice failback as a case where the side coming back up (the
"natural" master) gets a "no" response when it asks for the resources
from the other side. You could then even make nice_failback a special
resource, so that the nice-failback property is then a property of the
group, not the whole configuration. When asked to give up any group
with the nice-failback resource in it, the far end machine always says
"no".
Sorry to send things so far afield... But this does keep nice_failback
and resource handling in general outside the core code...
-- Alan Robertson
alanr@suse.com