Mailing List Archive

Heartbeat 0.4.5c prerelease available for download
Hi,

I've uploaded heartbeat 0.4.5c to the download directory of the web site, but
there are no pointers to it on the web site. The URLs are what you'd expect:

http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c.tar.gz
http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c-1.src.rpm
http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c-1.i386.rpm

Give it a shot, and see what you think.

It has all the fixes I've been given so far, has more progress on logging
cleanups, and now doesn't rely on its startup script to create the FIFO, or to
acquire its resources for it. By moving initial resource acquisition inside
heartbeat, I am also (usually) able to acquire resources on reboot several
seconds sooner than before.

Real Soon Now, I should also be able to set the log options for the various
shell scripts, since all of them that log important stuff will now be child
processes of heartbeat, and can use the environment variables it sets for them.

I also updated the online web pages to point at Rudy's Getting Started
document, so that people can see the docs before deciding to download heartbeat.

-- Alan Robertson
alanr@bell-labs.com
Heartbeat 0.4.5c prerelease available for download [ In reply to ]
Hi,
On Sat, Oct 02, 1999 at 10:04:46PM -0600, Alan Robertson wrote:
> Hi,
>
> I've uploaded heartbeat 0.4.5c to the download directory of the web site, but
> there are no pointers to it on the web site. The URLs are what you'd expect:
>
> http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c.tar.gz
> http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c-1.src.rpm
> http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c-1.i386.rpm
>
> Give it a shot, and see what you think.

Question: how does the auth work? I have placed the same authkey file
(from the doc directory) on both nodes and removed the comments
in front of the auth directive, and now I get the following in the
log:
heartbeat: 1999/10/03_13:04:32 debug: master_status_process: node [skywalker.chaos.han.de] failed authentication

So must the keys in the authkey file be the same or not?

Next point is the /proc/ha interface: if I set USE_MODULES=1
in the heartbeat script, the module is not loaded, and I get this in the log file:
./heartbeat: [: too many arguments

Running the script with -vx shows the following problem:
grep -v '^#' $CONFIG | grep watchdog |
sed s'%^[ ]*watchdog[ ]*%%'
++ grep -v '^#' /etc/ha.d/ha.cf
++ grep watchdog
++ sed 's%^[ ]*watchdog[ ]*%%'
+ WATCHDEV=
echo $WATCHDEV
++ echo
+ WATCHDEV=
+ '[' X '!=' X ']'
+ : No watchdog device specified in /etc/ha.d/ha.cf file.
+ '[' '!' -c -a no = yes ']'
heartbeat: [: too many arguments


That's all for the moment ;-)

Thomas
--
-----------------------------------------------
| Thomas Hepper th@ant.han.de |
| ( If the above address fail try ) |
| ( thomas.hepper@planet-interkom.de) |
-----------------------------------------------
Heartbeat 0.4.5c prerelease available for download [ In reply to ]
Thomas Hepper wrote:
>
> Hi,
> On Sat, Oct 02, 1999 at 10:04:46PM -0600, Alan Robertson wrote:
> > Hi,
> >
> > I've uploaded heartbeat 0.4.5c to the download directory of the web site, but
> > there are no pointers to it on the web site. The URLs are what you'd expect:
> >
> > http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c.tar.gz
> > http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c-1.src.rpm
> > http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c-1.i386.rpm
> >
> > Give it a shot, and see what you think.
>
> Question, who does the auth work. I have placed on both nodes the
> same authkey file (from the doc directory), removed the comments
> in front of the auth directive and now i get the following in the
> log:
> heartbeat: 1999/10/03_13:04:32 debug: master_status_process: node [skywalker.chaos.han.de] failed authentication

That's not good... Authentication is a "flash-cut" feature: you have to
run the new code on all nodes in your cluster. :-(

> So must the keys in the authkey file be the same or not.

The auth methods/keys listed must be identical. The parsing isn't too
sophisticated, so you might run a crc or sum or something on them to
double-check that they're identical. Under normal circumstances, both files
should be byte-for-byte identical.
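
The checksum comparison suggested above could look roughly like the following. This is a minimal sketch, not part of heartbeat: the helper name same_authkeys is hypothetical, and it assumes you have already copied the other node's file to the local machine.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both files produce the same
# cksum output (CRC plus byte count).
same_authkeys() {
    # Read from stdin so that cksum does not print the file name,
    # which would make the two results differ for identical content.
    sum_a=$(cksum < "$1") || return 2
    sum_b=$(cksum < "$2") || return 2
    [ "$sum_a" = "$sum_b" ]
}
```

For example, `same_authkeys local-authkeys copy-from-other-node && echo identical`, where both file names are placeholders. Even a stray trailing space on one line changes the checksum, which is the point given the unsophisticated parsing.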

I'm going to try and explain this, to make sure everything's clear...
DOCUMENTATION GUYS LISTEN UP:

The "auth #" directive at the top of the file must give a number which is
found lower in the file. So, if you say auth 2, then there must be a line
labelled 2 below. The "auth #" line determines which authentication method we
use to authenticate the packets we send out.

So, if it says auth 3, then we use method/key #3 in the file to authenticate our
outgoing packets.

If it says "auth 2", then we use method/key #2 in the file to authenticate our
outgoing packets.

To authenticate incoming packets, each packet has an "auth # ..." line in it
which indicates which method this packet is using for authentication, along
with the result. The corresponding numbered line in the local authkeys file is
used to compute the authentication for the packet, and that result is compared
to the auth string in the packet. You get the message you got when those two
don't match.

So, both files don't have to be exactly identical, but every key being sent by
every node has to be in the authkeys file of each node. This means that you can
have unused keys in there, and different "auth #" strings at the front, but you
mainly do that to help you smoothly change keys.
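
The lookup rules described above can be sketched in shell. This is an illustration only, not heartbeat's actual parser: the function names are hypothetical, and it assumes an authkeys layout of one "auth N" line followed by numbered "N method [key]" lines, with "#" starting a comment.

```shell
#!/bin/sh
# Print the numbered line that the "auth N" directive selects for
# outgoing packets; fail if that number is not defined in the file.
outgoing_auth_line() {
    awk '
        $1 == "auth" { want = $2; next }   # the "auth #" directive
        /^[ \t]*#/ || NF == 0 { next }     # skip comments and blank lines
        { line[$1] = $0 }                  # remember each numbered line
        END {
            if (want == "" || !(want in line)) exit 1
            print line[want]
        }
    ' "$1"
}

# Succeed if the file has a line numbered $2, meaning an incoming
# packet that names that number can be checked against this file.
can_verify() {
    awk -v n="$2" '$1 == n { found = 1 } END { exit !found }' "$1"
}
```

With a file containing "auth 2", "1 crc" and "2 md5 example-key" (the md5 method name and the key are placeholders), `outgoing_auth_line` prints the "2 ..." line. A second node whose file starts with "auth 1" but lists the same two numbered lines still passes `can_verify` for key 2, which is the key-rotation case described above.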

This works here... I don't know why it's not working at your end... You might
try switching the auth method to crc to make sure that the links really aren't
corrupting packets. Crc authentication doesn't use a key...

If you don't figure it out from this description, then you might try putting a
call to dump the packet being authenticated into the function "isauthentic"
at around line 512 in ha_msg.c. Is it rejecting all packets from the other
machine, and accepting packets from itself? [Authentication applies to all
packets, those from elsewhere and those from itself.] If this is the case,
then I suspect a difference in the authkeys file. Let me know.

> Next point is the /proc/ha interface, if i set the USE_MODULES=1
> in the heartbeat script the module is not loaded, and i get in the log file:
> ./heartbeat: [: too many arguments
>
> running the script with -vx shows the following problem;
> grep -v '^#' $CONFIG | grep watchdog |
> sed s'%^[ ]*watchdog[ ]*%%'
> ++ grep -v '^#' /etc/ha.d/ha.cf
> ++ grep watchdog
> ++ sed 's%^[ ]*watchdog[ ]*%%'
> + WATCHDEV=
> echo $WATCHDEV
> ++ echo
> + WATCHDEV=
> + '[' X '!=' X ']'
> + : No watchdog device specified in /etc/ha.d/ha.cf file.
> + '[' '!' -c -a no = yes ']'
> heartbeat: [: too many arguments


OOPS!

I changed some things to make it compile objects for Red Hat 6.1, and think I
didn't quite finish the job. Or, this could just be a bug (or both). Sorry...

-- Alan Robertson
alanr@bell-labs.com
Heartbeat 0.4.5c prerelease available for download [ In reply to ]
Thomas Hepper wrote:
>
> Hi,
> On Sat, Oct 02, 1999 at 10:04:46PM -0600, Alan Robertson wrote:
> > Hi,
> >
> > I've uploaded heartbeat 0.4.5c to the download directory of the web site, but
> > there are no pointers to it on the web site. The URLs are what you'd expect:
> >
> > http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c.tar.gz
> > http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c-1.src.rpm
> > http://www.henge.com/~alanr/ha/download/heartbeat-0.4.5c-1.i386.rpm
> >
> > Give it a shot, and see what you think.
>
> Question, who does the auth work. I have placed on both nodes the
> same authkey file (from the doc directory), removed the comments
> in front of the auth directive and now i get the following in the
> log:
> heartbeat: 1999/10/03_13:04:32 debug: master_status_process: node [skywalker.chaos.han.de] failed authentication
>
> So must the keys in the authkey file be the same or not.
>
> Next point is the /proc/ha interface, if i set the USE_MODULES=1
> in the heartbeat script the module is not loaded, and i get in the log file:
> ./heartbeat: [: too many arguments
>
> running the script with -vx shows the following problem;
> grep -v '^#' $CONFIG | grep watchdog |
> sed s'%^[ ]*watchdog[ ]*%%'
> ++ grep -v '^#' /etc/ha.d/ha.cf
> ++ grep watchdog
> ++ sed 's%^[ ]*watchdog[ ]*%%'
> + WATCHDEV=
> echo $WATCHDEV
> ++ echo
> + WATCHDEV=
> + '[' X '!=' X ']'
> + : No watchdog device specified in /etc/ha.d/ha.cf file.
> + '[' '!' -c -a no = yes ']'
> heartbeat: [: too many arguments

This is actually unrelated to the module building stuff I did for RH 6.1. I
think I fixed it. Try changing the if statement near line 108 in heartbeat.sh.
Please take careful note of the quoting, since I failed to do that before :-)

if
  [ "X$WATCHDEV" != X -a ! -c "$WATCHDEV" -a $insmod = yes ]
then
  minor=`cat /proc/misc | grep watchdog | cut -c1-4`
  mknod -m 600 $WATCHDEV c $MISCDEV $minor
fi
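
As a side note on the quoting issue, the same condition can be written as separate tests joined with &&, so that an empty WATCHDEV cannot change the number of arguments each test sees. This is a sketch of an alternative, not the fix shipped in heartbeat.sh; the function name needs_mknod and its two parameters are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not from heartbeat.sh: succeed only when a
# watchdog device is configured, its device node is missing, and
# module loading is enabled.
needs_mknod() {
    watchdev=$1
    insmod=$2
    # Each test has a fixed number of quoted arguments, so an empty
    # value cannot collapse the expression into the form seen in the
    # trace: '[' '!' -c -a no = yes ']'
    [ -n "$watchdev" ] && [ ! -c "$watchdev" ] && [ "$insmod" = yes ]
}
```

Called as `needs_mknod "$WATCHDEV" "$insmod"`, it returns failure quietly when no watchdog line is present in ha.cf instead of printing "too many arguments".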

I guess I still should figure out the module loading for RH 6.1, though...

-- Alan Robertson
alanr@bell-labs.com