Re: Comments on heartbeat package
Birger Toedtmann wrote:
>
> Hello,
>
> I recently built a 2-node Linux-HA cluster and installed heartbeat for this
> purpose. It works fine; I just had some minor inconveniences I want to report:
>
> 1. It seems as if heartbeat starts the scripts within /etc/ha.d/haresources
> in reverse order, bringing up the service scripts first and then doing
> the IP takeover, but I said "xxx.xxx.xxx.xxx myfirstscript mysecond...".
>
> Wouldn't it be better to bring up the IP first and then start the
> server scripts?

You understand correctly what it does. I have had difficulty seeing the
best answer clearly, and I suspect it will become clearer over time. Some
services want the IP address to be up first; for others you might want the
service to be running before packets start arriving. The current order may
be backwards compared to normal usage, but it's hard for me to say
universally. However, you can always put them on the line in the reverse
order if you want.
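
To make the ordering concrete, here is a minimal sketch that prints the order
in which the entries of one resource line would be started, assuming (as
described in this thread) that they are started right to left. The function
name start_order, the address 192.0.2.10 and the script names are made up for
illustration; this is not heartbeat's own code.

```sh
#!/bin/sh
# Print the start order for one resource line, assuming the entries
# are started right to left as described in this thread.
# start_order is a hypothetical helper, not part of heartbeat.
start_order() {
    order=""
    for res in "$@"; do
        # Prepend each entry so that the last one on the line comes first.
        order="$res${order:+ $order}"
    done
    printf '%s\n' "$order"
}
```

Calling start_order 192.0.2.10 myfirstscript mysecondscript prints the
scripts first and the address last, which is the behavior Birger observed.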

> 2. There are some services that run even in "down" mode, esp. named.
> Big problems arose because it did not bind to the new IP brought up
> by heartbeat, so I had to hack the "IPaddr" script for a "killall -HUP"
> on named.

However, having pseudo-resources as the first resource on the line is a
problem. There's also a script, /etc/ha.d/rc.d/local_takeip, that's always
invoked, if it's present, when IPs are taken over. A corresponding one,
/etc/ha.d/rc.d/local_giveip, is invoked when giving one up. Maybe those could
do what you want?
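
As a rough sketch of that idea, a local_takeip hook could send the same
"killall -HUP" that Birger added to IPaddr. The layout below is an assumption,
not a tested heartbeat hook: the thread does not say which arguments the hook
receives, so none are used, and the SIGNAL_CMD variable and the hup_daemon
function are invented here so that the script can be tried without signalling
anything.

```sh
#!/bin/sh
# Sketch of a hypothetical /etc/ha.d/rc.d/local_takeip.
# SIGNAL_CMD is an assumed override hook for dry runs; it defaults to killall.
SIGNAL_CMD=${SIGNAL_CMD:-killall}

# Send SIGHUP to every process with the given name.
hup_daemon() {
    "$SIGNAL_CMD" -HUP "$1"
}

# Act only when this file is executed under the hook's own name
# (assumes the hook is run as a program, not sourced).
case "$0" in
    *local_takeip) hup_daemon named ;;
esac
```

Setting SIGNAL_CMD=echo before running it shows the command that would be
issued instead of sending the signal.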

> It would be great to include this as an extra option like
>
> sendhup::<proggietoHUP>

Patches are being accepted :-). Unfortunately, such a resource is a problem
when it's the first one on the line. It would have to answer the status
query *correctly*, which would be hard (I only ask the first resource on the
line and assume its answer holds for the entire line). Maybe I could
recognize an answer that says "pseudo-resource", and then it would cycle to
the next on the line and get status from it automatically. This sounds like
it could be a useful feature.
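
The cycling idea might look roughly like the sketch below. It only illustrates
the proposal and is not existing heartbeat code: resource_status is a stand-in
stub, the answers "running" and "unknown" are assumed values, and the
sendhup:: entry is the option suggested above, which does not exist yet.

```sh
#!/bin/sh
# Stand-in for asking one resource for its status (hypothetical stub).
# Entries of the proposed sendhup:: kind answer "pseudo-resource".
resource_status() {
    case "$1" in
        sendhup::*) printf '%s\n' "pseudo-resource" ;;
        *)          printf '%s\n' "running" ;;
    esac
}

# Return the status of a whole resource line: ask each entry in turn
# and use the first answer that is not "pseudo-resource".
line_status() {
    for res in "$@"; do
        answer=$(resource_status "$res")
        if [ "$answer" != "pseudo-resource" ]; then
            printf '%s\n' "$answer"
            return 0
        fi
    done
    # Every entry was a pseudo-resource, so there is no real answer.
    printf '%s\n' "unknown"
    return 1
}
```

With this, a line that begins with sendhup::named would take its status from
the next real resource rather than from the pseudo-resource.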

> 3. The beat over my serial cable was quite good, but heartbeat won't
> stop properly; I had to do a "kill -9" on it. Device problems?

Haven't seen that one, except for when things hung due to no CTS from the
other side (like when it's down). Only one process hung that way? What does
ps axwwl say about it?

> - and one last question: how do you sync your machines? For now I use
> "rsync" on /var and /etc, which is quite fast at the moment, but maybe
> there are better progs out there?

There are better ways under consideration, but rsync seems to be the tool of
choice for now. Rudy Pawul is writing a howto on using rsync. I've CCed him.
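
Until that howto exists, a one-way mirror of the two directories Birger
mentions could be sketched as follows. The peer name node2, the RSYNC override
and the sync_dir function are assumptions for illustration. The -a (archive)
and --delete options are standard rsync options; --delete removes files on the
peer that no longer exist locally, so adding rsync's -n (dry run) first is a
sensible precaution.

```sh
#!/bin/sh
# Hypothetical peer host name; replace with the other cluster node.
PEER=${PEER:-node2}
# Assumed override so the command can be swapped out when testing.
RSYNC=${RSYNC:-rsync}

# Mirror one directory to the same path on the peer.
# The trailing slashes make rsync copy the directory's contents.
sync_dir() {
    "$RSYNC" -a --delete "$1/" "$PEER:$1/"
}
```

Running sync_dir /etc followed by sync_dir /var would then push both trees to
the peer.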

Your comments about named (DNS) are good, since I'd like to give specific
instructions on how to support DNS in this environment.

-- Alan Robertson
alanr@bell-labs.com