Mailing List Archive

Serial and ppp-udp connections broken in heartbeat 0.4.5
Joerg Henne wrote:
>
> Alan,
>
> after some fiddling with heartbeat 0.4.5 I've come to the conclusion that it
> contains a bug. But then again, as I'm new to the HA stuff, I might have
> just shot myself in the foot somewhere. Here's what I've found:
>
> My setup consists of two nodes (for now) connected via a null-modem serial
> cable. I've configured "authentication" via crc, but have tried the other
> options, too. Every time I started up heartbeat on both nodes, they started
> logging "master_status_process: node [X] failed authentication". I added some
> debugging stuff and found that on the transmitting node the CRC is calculated
> with "ttl=3", on the receiving one with "ttl=2" - authentication fails, of
> course. One solution might be to skip the TTL field of messages in
> add_msg_auth() and isauthentic().
> This problem might apply only to nodes connected via the serial line (I have
> not yet tried UDP and don't have access to the machines at the moment).

Yep. It's a bug all right. I need to re-authenticate the packets (duhhh), or
skip the TTL field. This authentication stuff is new with this release. I
hadn't had a chance to test it on a serial setup because my cluster that has
serial connections died. Guess I'd better hook up my other cluster with
serial ports! As I recall, the same logic is in the ppp-udp connection
also.

It will never work as it stands. I'll tell the list.
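
To illustrate the failure and the "skip the TTL field" fix, here is a minimal
sketch in Python. It is not heartbeat's code: the message layout (a dict of
string fields), the CRC-32 from zlib, and the names sign(), is_authentic() and
forward() are all assumptions, standing in loosely for add_msg_auth() and
isauthentic(). It also assumes the TTL is decremented somewhere between
signing and verification, which is what the ttl=3 versus ttl=2 observation
suggests.

```python
import zlib

# Hypothetical message layout: a dict of string fields.
# This is NOT heartbeat's real wire format.

def crc_over(msg, skip=()):
    """CRC-32 over the fields in sorted order, leaving out any named in `skip`."""
    data = "\n".join(f"{k}={v}" for k, v in sorted(msg.items()) if k not in skip)
    return zlib.crc32(data.encode("ascii"))

def sign(msg, skip=()):
    """Return a copy of `msg` with an 'auth' field (stand-in for add_msg_auth())."""
    signed = dict(msg)
    signed["auth"] = crc_over(msg, skip)
    return signed

def is_authentic(signed, skip=()):
    """Recompute the CRC and compare it (stand-in for isauthentic())."""
    body = {k: v for k, v in signed.items() if k != "auth"}
    return signed.get("auth") == crc_over(body, skip)

def forward(signed):
    """Simulate a hop that decrements the TTL after the message was signed."""
    hopped = dict(signed)
    hopped["ttl"] = str(int(hopped["ttl"]) - 1)
    return hopped

msg = {"t": "status", "src": "node1", "ttl": "3"}

# CRC covers the TTL: the decrement invalidates the checksum.
broken = forward(sign(msg))

# CRC skips the TTL: the decrement no longer matters.
fixed = forward(sign(msg, skip=("ttl",)))
```

With these definitions, is_authentic(broken) is false, while
is_authentic(fixed, skip=("ttl",)) is true. The other option mentioned above,
re-authenticating the packet after the TTL changes, would amount to calling
sign() again on the forwarded message.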

> Apart from that: I wondered why you chose to implement heartbeat using a
> number of processes instead of using I/O multiplexing (e.g. select()).

The serial versus packet interfaces are so wildly different that I didn't
think it made sense to try and combine them at that level (character at a
time, framing considerations, etc. versus packet-at-a-time). So, I combine
them at a different level. Another non-trivial consideration for me was that
I have more experience with this kind of structure. It's not that much more
memory to make it work. The code I have is quite simple, and easy to follow
(once you understand the basic structure).
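
The structure described above can be sketched roughly as follows. This is an
illustration only, not heartbeat's code: it assumes a Unix system with
os.fork(), newline framing on the serial side, and a single shared pipe to
the master, and every function name is hypothetical. Each medium gets its own
reader process that deals with its own framing, and the master only ever sees
whole messages.

```python
import os

def frame_serial(byte_stream):
    """Character-at-a-time input: collect bytes until a newline ends the frame."""
    buf = bytearray()
    for b in byte_stream:          # iterating over bytes yields one int per byte
        if b == 0x0A:              # newline marks the end of a message
            yield bytes(buf)
            buf.clear()
        else:
            buf.append(b)

def frame_packets(datagrams):
    """Packet-at-a-time input: each datagram is already one whole message."""
    for d in datagrams:
        yield d

def spawn_reader(framer, source, write_fd, read_fd):
    """Fork a child that frames `source` and writes whole messages to the pipe."""
    pid = os.fork()
    if pid == 0:                   # child process
        try:
            os.close(read_fd)
            for m in framer(source):
                # Short writes to a pipe are atomic, so messages do not interleave.
                os.write(write_fd, m + b"\n")
        finally:
            os._exit(0)            # never fall through into the parent's code
    return pid

def run_master():
    """Start one reader per medium and collect their messages in one place."""
    r, w = os.pipe()
    pids = [
        spawn_reader(frame_serial, b"hello from serial\nping\n", w, r),
        spawn_reader(frame_packets, [b"hello from udp", b"pong"], w, r),
    ]
    os.close(w)                    # master keeps only the read end
    with os.fdopen(r, "rb") as pipe:
        messages = [line.rstrip(b"\n") for line in pipe]
    for pid in pids:
        os.waitpid(pid, 0)
    return messages
```

Calling run_master() returns the four messages from both readers; their
relative order depends on scheduling. A select()-based design would instead
keep both descriptors in one process and do the serial framing inside the
event loop.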


Thanks a lot for the great bug report!

-- Alan Robertson
alanr@bell-labs.com