Apr 17, 2000, 11:23 AM
Post #6 of 8
"Luis Claudio R. Goncalves" wrote:
>
> On Mon, 17 Apr 2000, Alan Robertson wrote:
>
> > > There are two points here... :)
> > > The first one is that this kind of situation wouldn't occur, at
> > > least this way, in a multinode setup. If you have N nodes you
> > > surely will have load balancing, or at least the service active
> > > on all nodes.
> >
> > Not necessarily. And, if your code demands that you only have two
> > nodes, then it won't ever happen. It may be the case that heartbeat
> > isn't managing resources for larger clusters, just acting as heartbeat.
>
> It is possible that you have the nice_failback on and use N
> hosts.
Not with the code you defined. It was strictly "me and the other guy".
> When you start up a host, it will look for someone alive and
> holding resources. On finding this guy, it simply doesn't take the
> resources. Otherwise, it will take the resources it is configured to hold.
> I think this is the main problem here: I don't want (or someone
> may not want) the host that has the resources defined in haresources
> to take them back every time it starts up - if someone already holds
> the resources. It means, among other things, that when this host
> (which I would call Master/Primary, using the terminology I adopted
> for the messages) starts heartbeat, and the resources were held by
> someone else, all the negotiations in progress may stop. That is not
> good for what we're looking for... this is why all this
> nice_failback(tm) stuff came to life.
Yes. I think negotiation is the way to go. Perhaps simply have the
far-end guy say "no" when he is asked to give up the resources. It'll
be a little complicated to distinguish this case from the "no
response" case given the current scripts, but not that complicated.
>
> > > a two-node setup you may have such a situation in which both
> > > hosts are up and no one has the resources... and, of course, if
> > > it could occur in a multinode scene, it'd be *more* painful to solve.
> >
> > Yes, but the proposed implementation puts these assumptions in the main
> > part of the heartbeat code. It makes the current code less functional
> > in some respects.
>
> Maybe in the conceptual world. :) But I can't see this loss of
> functionality, mainly because you can easily turn it off. It seems,
> to me, like special-case treatment.
> Well, as it is a need for me, I'll start writing some scripts.
> (Where's my bash wizards book???)
Write it in C if you like, just put it in a separate binary.
> > But it doesn't look like the proposed design will function at all in the
> > context of more than 2 nodes. It's hard wired for two -- period.
>
> It's sufficient that one host is up and holding resources to
> satisfy the nice_failback requests. It doesn't matter which
> resources or how many. And it may be used in an N-host environment
> only if you use heartbeat as a heartbeat, not as a cluster manager.
> If heartbeat is your cluster manager, which makes more sense for
> N-host environments, turn off nice_failback.
> That's my humble opinion. Anyway, I don't mind rewriting this
> stuff in the scripts. I'd only ask you to pray for me... :)
And for Heartbeat ;-) But, feel free to write a little "C" program if
you'd rather. It would be simpler in some ways...
> > Right
> > now, the initial takeover of resources is handled by external programs
> > (in particular, shell scripts). I could easily see putting the logic
> > being discussed into an external program (C or script), and doing
> > whatever you want there.
>
> Going to the meta-world, or the conceptual world, it makes the
> main code more beautiful and clean... but it pollutes everything
> around it. IMHO heartbeat should handle more than two states. ON or
> OFF in the core code, plus one or two more abstractions done by the
> scripts, isn't a good architectural view.
I'm afraid I didn't follow this. I claim that heartbeat simply tells
you when nodes come and go. Everything else is Somebody Else's
Problem(tm). This is very simple architecturally. It might mean
rewriting the takeover script, but it's not very complicated. It
doesn't have to be a bash script.
> > > Another cool detail is that when you have two nodes and only one of
> > > them is up, it surely has all the resources.
> >
> > It surely *should* have all the resource groups, and you should
> > guarantee that no race conditions occur with regard to startup.
>
> Ooooops...
> How/Where/Why me? :)
The possibility seems to exist. Let's see if I can give an idea...
When one side starts up, it asks the other to give up its resources.
The other side is coming up too, and it asks you to give up yours.
Since neither side has any, both think they can start them up. Now,
because of the way the scripts are written, and the fact that we have
a "natural" master, this may not happen, but it should be walked
through to make sure that it can't happen for either the normal or
"nice failback" case.
> > You can have an arbitrary number of resource groups per machine. It's
> > clearly not limited to just one group. That isn't required by the
> > current code in any way. I'm sure there are people out there that have
> > more than one resource group per machine.
>
> But for the starting stuff, if someone has at least one of the
> resources it may have... that's good for us. The resources_held
> structure (which I used only to count how many resources I hold, and
> to list each of them to help Horms) actually lists all the resource
> groups a node handles. :)
>
> > I don't mind the approach in general, but not if it's part of the main
> > heartbeat code. I'll look over your patch again. Again, I'm sorry I
> > didn't get back to you sooner.
>
> I'm feeling more comfortable putting this stuff in the scripts
> now. But I still think that two states in the core are too few.
Still don't understand this. The core code doesn't know or care about
the resources. It just tracks nodes... *That's* the key distinction.
> Anyway, let's code. :)
>
> > I understand and appreciate the need you're trying to address. With a
> > little more effort, I'm sure that you'll come up with a design that
> > doesn't limit the use of heartbeat in other contexts.
>
> The main problem I had is that I'm looking for a two-host
> solution. There are things that are easy to solve in a two-node
> fashion.
Understood. I just don't want to break it for multi-nodes when you're
just tracking nodes. This makes heartbeat useful in other contexts -
like possibly in LinuxFailSafe :-)
-- Alan Robertson
alanr@suse.com