Mailing List Archive

Heartbeat API
Hi,

After hearing from Luis Claudio Goncalves and Horms and Ken Beck and
others, I think the time has (finally) come to make an API for
heartbeat.

The general idea is to make an API which would allow programs to do
things like these:

Receive notification of asynchronous events:

    like a machine becoming accessible, or becoming inaccessible,
    interfaces going quiet (failing?), or becoming active again

Get the current list of nodes in the cluster

Get the information about any given node in the cluster
    status (up/down)
    load average (/proc/loadavg info)
    other info?

Send a message to the whole cluster

Send a message to a node in the cluster

Receive a message from the cluster

This would allow lots of different uses for heartbeat, in addition to
what it does now. It would allow you to write the management code in
"C", and not be restricted to shell or some kludgy combination of C and
shell.

Next step: Defining these APIs.

Comments? Suggestions?

Thanks!

-- Alan Robertson
alanr@suse.com
Re: Heartbeat API
On 2000-04-21T11:29:24,
Alan Robertson <alanr@suse.com> said:

> The general idea is to make an API which would allow programs to do
> things like these:
>
> Receive notification of asynchronous events:
>
> like a machine becoming accessible, or becoming inaccessible,
> interfaces going quiet (failing?), or becoming active again

Ok.

>
> Get the current list of nodes in the cluster

Ok.

>
> Get the information about any given node in the cluster
> status (up/down)
> load average (/proc/loadavg info)
> other info?

I would remove the "get information about node" except for the
up/down/undefined part - supplying additional information should be part of a
module on top of this infrastructure.

> Send a message to the whole cluster
> Send a message to a node in the cluster

Maybe this could be generalised to "send a message to nodes with a specific
attribute". This may be "nodename == foo" or "attributes includes
CONNECTED_TO_SAN_1"...
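
A minimal sketch of what such attribute-based addressing might look like, assuming each node carries a name plus a small set of attribute tags. The `node_info` and `node_selector` types and the `selector_matches` function are hypothetical names used only for illustration; nothing like them exists in heartbeat today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 8

/* Hypothetical node description: a name plus a set of attribute tags. */
struct node_info {
    const char *name;
    const char *attrs[MAX_ATTRS];
    size_t      nattrs;
};

/* Hypothetical selector: a NULL field means "don't care". */
struct node_selector {
    const char *name;   /* match this node name, or NULL for any node */
    const char *attr;   /* require this attribute, or NULL for none */
};

/* Return 1 if the node carries the given attribute tag, else 0. */
static int node_has_attr(const struct node_info *n, const char *attr)
{
    size_t i;

    for (i = 0; i < n->nattrs; i++) {
        if (strcmp(n->attrs[i], attr) == 0)
            return 1;
    }
    return 0;
}

/* Return 1 if a message addressed with `sel` applies to node `n`. */
int selector_matches(const struct node_selector *sel,
                     const struct node_info *n)
{
    assert(sel != NULL && n != NULL);
    if (sel->name != NULL && strcmp(sel->name, n->name) != 0)
        return 0;
    if (sel->attr != NULL && !node_has_attr(n, sel->attr))
        return 0;
    return 1;
}
```

With both selector fields NULL this degenerates to "send to the whole cluster", and with only `name` set it is "send to one node", so the two proposed calls become special cases of one.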

> Receive a message from the cluster

> This would allow lots of different uses for heartbeat, in addition to
> what it does now. It would allow you to write the management code in
> "C", and not be restricted to shell or some kludgy combination of C and
> shell.
>
> Next step: Defining these APIs.
>
> Comments? Suggestions?

In essence, you are replacing the Cluster Membership Services and even parts
of the Group Messaging Services in FailSafe with heartbeat by this, right?

I also think there should be a defined API to configure heartbeat: add/remove
nodes/links.

Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
Development HA

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl
Re: Heartbeat API
Lars Marowsky-Bree wrote:
>
> On 2000-04-21T11:29:24,
> Alan Robertson <alanr@suse.com> said:
>
> > The general idea is to make an API which would allow programs to do
> > things like these:
> >
> > Receive notification of asynchronous events:
> >
> > like a machine becoming accessible, or becoming inaccessible,
> > interfaces going quiet (failing?), or becoming active again
>
> Ok.
>
> >
> > Get the current list of nodes in the cluster
>
> Ok.
>
> >
> > Get the information about any given node in the cluster
> > status (up/down)
> > load average (/proc/loadavg info)
> > other info?
>
> I would remove the "get information about node" except for the
> up/down/undefined part - supplying additional information should be part of a
> module on top of this infrastructure.

Actually, the heartbeat code carries a certain number of attributes with
every heartbeat. Load average is the only one currently defined. I'm
intending this "other info" to be one that's sent with every heartbeat -
so it's readily accessible. One could define it as the name of an
arbitrary field in a heartbeat message - like time of last send, or time
of last receive, or something like this. In any case, Wensong said he'd
like to see load average - which I can perfectly understand...

> > Send a message to the whole cluster
> > Send a message to a node in the cluster
>
> Maybe this could be generalised to "send a message to nodes with a specific
> attribute". This may be "nodename == foo" or "attributes includes
> CONNECTED_TO_SAN_1"...

Right now the only attribute that heartbeat knows about is the node's
name. Even now, not all messages apply to the whole cluster, and which
nodes a message applies to is often left implicit.

For example, when the current code sends out an ip-addr-request message,
it only applies to nodes which own that ip-address (really now a
resource group), but heartbeat (at this level) has no idea who has this
resource group, so it gets *routed* to all nodes, and they ignore it if
it isn't applicable.

One *could* use the node name to do routing. It's a common special
case (for replies), which is why I treat it specially and *do* use
it to ignore packets that aren't meant for me (on reception).
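
A minimal sketch of that receive-side check, assuming a simplified message with an optional destination field. The `demo_msg` struct and `msg_is_for_me` are illustrative stand-ins; the real `struct ha_msg` layout is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified message. A NULL dest means the message is
 * addressed to the whole cluster.
 */
struct demo_msg {
    const char *type;
    const char *dest;
};

/* Return 1 if this node should process the message, 0 to drop it. */
int msg_is_for_me(const struct demo_msg *msg, const char *my_name)
{
    assert(msg != NULL && my_name != NULL);
    if (msg->dest == NULL)
        return 1;       /* cluster-wide: every node looks at it */
    return strcmp(msg->dest, my_name) == 0;
}
```

Messages that pass this check can still be ignored higher up, as in the ip-addr-request case, where only the node owning the resource group acts on them.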

> > Receive a message from the cluster
>
> > This would allow lots of different uses for heartbeat, in addition to
> > what it does now. It would allow you to write the management code in
> > "C", and not be restricted to shell or some kludgy combination of C and
> > shell.
> >
> > Next step: Defining these APIs.
> >
> > Comments? Suggestions?
>
> In essence, you are replacing the Cluster Membership Services and even parts
> of the Group Messaging Services in FailSafe with heartbeat by this, right?

I'm just making an API for what heartbeat already does. I'm making sure
that it has sufficient power to replace the existing heartbeat scripts
with a new layer that does more interesting things.

As of today, I'm not intending to replace anything that FailSafe does
with it. This is what's necessary for Luis and Horms to do what they've
been wanting to do with it.

However, I suspect that if one wanted to, one could probably do
pretty much what you've described with it. I have not made the decision
that I want to do that with heartbeat.

You could also see this as a replacement for some of the things in
Stephen's proposal - that doesn't mean that I'm going to propose it to
Stephen ;-)

Adding/removing nodes sounds like a possibly good idea. Links a little
less so, since it's potentially a reasonably traumatic experience for
heartbeat to carry out, and you may have to provide some more syntax in
the message. Not to say that it's a bad idea, just that it's probably
not the place to start. Other people have argued for allowing any node
that knows the secret handshake to join from the network side, and that
might make more sense. I haven't decided in either case.

One can always add to an API to do new things with it (especially when
they're orthogonal - like nodes vs links, etc). I'm going to start out
without it, and then see if anyone wants it, and what they're trying to
accomplish with it.

It also seems like we're going to eventually want guaranteed packet
delivery order as well. It shouldn't be too hard, actually. The
protocol on the wire doesn't have to change.

-- Alan Robertson
alanr@suse.com
Heartbeat API
> Comments? Suggestions?

yes, several :) but a first general one: in design i always try to
separate the features into cleanly defined areas to avoid ending up
with a huge unmaintainable heap of code.

in this email, i use the word lifemonitor and not heartbeat to avoid
confusion with the program called heartbeat.

On Fri, Apr 21, 2000 at 11:29:24AM -0600, Alan Robertson wrote:
> Get the current list of nodes in the cluster
> like a machine becoming accessible, or becoming inaccessible,

these functions are related to the life monitor. heartbeat may implement
them via udp.
1. for request/reply (e.g. the list of nodes), the client sends a request
and heartbeat replies.
2. for asynchronous notification (e.g. a node leaving/entering the cluster),
the client registers itself when it starts, and heartbeat sends the
notification when necessary.
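
A minimal in-process sketch of the register-then-notify pattern in point 2. It deliberately leaves out the UDP transport, and all names (`register_client`, `notify_clients`, `count_event`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLIENTS 16

/* Hypothetical callback type: invoked with the node name and new status. */
typedef void (*node_event_cb)(const char *node, const char *status,
                              void *private_data);

struct client {
    node_event_cb cb;
    void         *private_data;
};

static struct client clients[MAX_CLIENTS];
static size_t nclients;

/* Register a client for node events; returns 0 on success, -1 if full. */
int register_client(node_event_cb cb, void *private_data)
{
    assert(cb != NULL);
    if (nclients == MAX_CLIENTS)
        return -1;
    clients[nclients].cb = cb;
    clients[nclients].private_data = private_data;
    nclients++;
    return 0;
}

/* Notify every registered client; returns how many were notified. */
size_t notify_clients(const char *node, const char *status)
{
    size_t i;

    for (i = 0; i < nclients; i++)
        clients[i].cb(node, status, clients[i].private_data);
    return nclients;
}

/* Example callback: counts events in the int that private_data points to. */
void count_event(const char *node, const char *status, void *private_data)
{
    (void)node;
    (void)status;
    ++*(int *)private_data;
}
```

A client would call `register_client()` once at startup; the monitor then calls `notify_clients()` whenever a node enters or leaves the cluster.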

> interfaces going quiet (failing?), or becoming active again

why would a program need to know which interface is up/down?

> Send a message to the whole cluster
> Send a message to a node in the cluster
> Receive a message from the cluster

the communications with other nodes shouldn't be included in the
heartbeat program because it isn't a life monitor problem.
doing so implies using a userspace proprietary network stack. IP
already exists in the kernel and has been designed/implemented by
experienced people over the years, so why not use it?
Heartbeat API
Jerome Etienne wrote:
>
> > Comments? Suggestions?
>
> yes, several :) but a first general one: in design i always try to
> separate the features into cleanly defined areas to avoid ending up
> with a huge unmaintainable heap of code.

Me too. Good thing heartbeat isn't a huge unmaintainable heap of code
;-)

> in this email, i use the word lifemonitor and not heartbeat to avoid
> confusion with the program called heartbeat.
>
> On Fri, Apr 21, 2000 at 11:29:24AM -0600, Alan Robertson wrote:
> > Get the current list of nodes in the cluster
> > like a machine becoming accessible, or becoming inaccessible,
>
> these functions are related to the life monitor. heartbeat may implement
> them via udp.

Yes. It may - or it may not.

> 1. for request/reply (e.g. the list of nodes), the client sends a request
> and heartbeat replies.
> 2. for asynchronous notification (e.g. a node leaving/entering the cluster),
> the client registers itself when it starts, and heartbeat sends the
> notification when necessary.
>
> > interfaces going quiet (failing?), or becoming active again
>
> why would a program need to know which interface is up/down?

If the "backup" link has failed, then you need to schedule downtime, so
that it is properly working when the "primary" link fails.

This way it can tell the administrator that something has failed - and
keep redundancy working. No high-availability system can work in
practice without testing the redundant hardware components to ensure
that they are still working. That way, when the time comes and you
need the redundancy which you've architected into your system, it
will work. The experience of highly available systems like telephony
systems has demonstrated the necessity of this approach over many years.
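
A small sketch of the kind of check an upper layer could run on interface status, assuming it keeps one up/down flag per link to a peer. The `check_redundancy` function and the `redundancy` enum are hypothetical and not part of heartbeat.

```c
#include <assert.h>
#include <stddef.h>

enum redundancy { LINKS_DOWN, LINKS_DEGRADED, LINKS_REDUNDANT };

/* link_up[i] is nonzero when link i to the peer is currently working. */
enum redundancy check_redundancy(const int *link_up, size_t nlinks)
{
    size_t i, alive = 0;

    assert(link_up != NULL || nlinks == 0);
    for (i = 0; i < nlinks; i++) {
        if (link_up[i])
            alive++;
    }
    if (alive == 0)
        return LINKS_DOWN;
    /* One surviving link still works, but there is no spare left. */
    return alive == 1 ? LINKS_DEGRADED : LINKS_REDUNDANT;
}
```

`LINKS_DEGRADED` is the case described above: the cluster is still up, but the administrator needs to hear about it before the remaining link fails too.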

> > Send a message to the whole cluster
> > Send a message to a node in the cluster
> > Receive a message from the cluster
>
> the communications with other nodes shouldn't be included in the
> heartbeat program because it isn't a life monitor problem.
> doing so implies using a userspace proprietary network stack. IP
> already exists in the kernel and has been designed/implemented by
> experienced people over the years, so why not use it?

By the way, "proprietary open source software" is an oxymoron. Custom
is probably a better choice of terms.

heartbeat does use IP - for ethernet.

This API hides the details of how this is accomplished. It doesn't say
that it does or doesn't use IP. Quite frankly - this is none of the
upper layer's business. [Though discussing it in this forum is perfectly
appropriate].

These primitives are absolutely necessary for any HA system. You have
to be able to send messages to the entire collection of nodes in the
cluster. You have to be able to send messages to individual nodes in
the cluster.

It could be implemented with an O(N^2) collection of TCP connections, or
UDP connections like Stephen does. It could be implemented via
multicast, or serial rings. It could be implemented via SAN (a likely
option in a cluster), or X.25 or IrDA. It could use shared memory in
Larry McVoy's model of a cluster. It could be implemented by tin cans
and string or smoke signals. It could use all these techniques. It
doesn't matter. Like any good API, it hides unnecessary details of the
implementation(s).

It does whatever needs to be done, and that's its job. That includes
managing and testing redundant communication paths, the set of machines
in the cluster, authentication details, etc., to keep every user of
cluster communications from having to reimplement them themselves.

The code named "heartbeat" is *primarily* a modular, redundant,
intracluster communications layer - on which a simple heartbeat
mechanism rides.


-- Alan Robertson
alanr@suse.com
Heartbeat API
Alan Robertson wrote:
>
> Hi,
>
> After hearing from Luis Claudio Goncalves and Horms and Ken Beck and
> others, I think the time has (finally) come to make an API for
> heartbeat.

[Perhaps these might better be called "low-level cluster APIs"]

<snip>

> Next step: Defining these APIs.


I've attached my first draft of them. They're somewhat object oriented.

It's an attachment, so that my mailer doesn't mangle the line
boundaries, etc.

-- Alan Robertson
alanr@suse.com

[Attachment: interfaces.c]

#include <ha_msg.h>

typedef void (*llc_msg_callback_t) (const struct ha_msg* msg
,   void* private_data);

typedef void (*llc_nstatus_callback_t) (const char *node, const char * status
,   void* private_data);

typedef void (*llc_ifstatus_callback_t) (const char *node
,   const char * interface, const char * status
,   void* private_data);

struct ll_cluster {
    void * ll_cluster_private;

    /*
     * set_msg_callback: Define callback for the given message type
     *
     * msgtype: Type of message being handled. NULL for default case.
     *      Note that the default case is not reached for node
     *      status messages handled by nstatus_callback,
     *      or ifstatus messages handled by ifstatus_callback,
     *      as well as for those explicitly handled by
     *      "msg_handler" cases.
     *
     * callback: callback function.
     *
     * p: private data - later passed to callback.
     */
    int (*set_msg_callback) (const char * msgtype
    ,   llc_msg_callback_t callback, void * p);

    /*
     * set_nstatus_callback: Define callback for node status messages
     *      This is a message of type "st"
     *
     * cbf: callback function.
     *
     * p: private data - later passed to callback.
     */
    int (*set_nstatus_callback) (llc_nstatus_callback_t cbf
    ,   void * p);

    /*
     * set_ifstatus_callback: Define callback for interface status messages
     *      This is a message of type "???"
     *      These messages are issued whenever an interface goes
     *      dead or becomes active again.
     *
     * cbf: callback function.
     *
     * p: private data - later passed to callback.
     */
    int (*set_ifstatus_callback) (llc_ifstatus_callback_t cbf
    ,   void * p);

    /*
     * init_nodewalk: Initialize walk through the list of known nodes
     */
    int (*init_nodewalk)(void);
    /*
     * nextnode: Return next node in the list of known nodes
     */
    const char * (*nextnode)(void);
    /*
     * end_nodewalk: End walk through the list of known nodes
     */
    int (*end_nodewalk)(void);
    /*
     * node_status: Return most recent heartbeat status of the given node
     */
    int (*node_status)(const char * nodename);

    /*
     * sendclustermsg: Send the given message to all cluster members
     */
    int (*sendclustermsg)(const struct ha_msg* msg);
    /*
     * sendnodemsg: Send the given message to the given node in cluster.
     */
    int (*sendnodemsg)(const struct ha_msg* msg
    ,   const char * nodename);

    /*
     * inputfd: Return fd which can be given to select(2) or poll(2)
     *      for determining when messages are ready to be read.
     *      Only to be used in select() or poll(), please...
     */
    int (*inputfd)(void);
    /*
     * msgready: Returns TRUE (1) when a message is ready to be read.
     */
    int (*msgready)(void);
    /*
     * setmsgsignal: Associates the given signal with the "message waiting"
     *      condition.
     */
    int (*setmsgsignal)(int signo);
    /*
     * rcvmsg: Cause the next message to be read - activating callbacks for
     *      processing the message.
     */
    int (*rcvmsg)(int blocking);
};

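A rough sketch of how a client might drive the `inputfd`/`msgready`/`rcvmsg` members from a select(2) loop. The struct here is a trimmed copy holding only the members the sketch uses, the `fake_*` functions are a pipe-backed stand-in so the loop can be tried without heartbeat, and treating a negative `rcvmsg` return as failure is an assumption, since the draft does not define return values.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Trimmed copy of the draft struct: only the members this sketch uses. */
struct ll_cluster {
    void *ll_cluster_private;
    int (*inputfd)(void);
    int (*msgready)(void);
    int (*rcvmsg)(int blocking);
};

/*
 * Wait up to timeout_ms for input, then drain all pending messages.
 * Returns the number of messages processed, or -1 on error.
 */
int pump_messages(const struct ll_cluster *c, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int fd, n = 0;

    assert(c != NULL);
    fd = c->inputfd();
    if (fd < 0)
        return -1;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;
    if (!FD_ISSET(fd, &rfds))
        return 0;               /* timed out: nothing to read */

    /* rcvmsg() would dispatch to whatever callbacks were registered. */
    while (c->msgready()) {
        if (c->rcvmsg(0) < 0)   /* assumption: negative means failure */
            break;
        n++;
    }
    return n;
}

/* Stand-in implementation backed by a pipe: one byte per "message". */
int fake_fds[2];
int fake_pending;

int fake_inputfd(void) { return fake_fds[0]; }
int fake_msgready(void) { return fake_pending > 0; }
int fake_rcvmsg(int blocking)
{
    char byte;

    (void)blocking;
    if (read(fake_fds[0], &byte, 1) != 1)
        return -1;
    fake_pending--;
    return 0;
}
```

The fd is only ever handed to select(), as the `inputfd` comment asks; all actual reading goes through `rcvmsg`.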
Heartbeat API
Hi,


/*
 * set_ifstatus_callback: Define callback for interface status messages
 *      This is a message of type "???"
 *      These messages are issued whenever an interface goes
 *      dead or becomes active again.
 *
 * cbf: callback function.
 *
 * p: private data - later passed to callback.
 */

int (*set_ifstatus_callback) (llc_ifstatus_callback_t cbf, void * p);

You could pass the interface/host names, something like this:

int (*set_ifstatus_callback) (llc_ifstatus_callback_t cbf,
const char * name, const char * iface, void * p);

If "name" and "iface" are not NULL, the callback would be called for
interface changes of "iface" against node "name."
NULL passed as "name" would call the callback for state changes on the
interface "iface" against all hosts in the cluster.
NULL passed as "iface" would call the callback for state changes of the
node "name" in any interface.
NULL passed as "name" and "iface" would call the callback on any
interface status change against any node in the cluster.
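
The four cases above reduce to one rule: a NULL field matches anything. A minimal sketch of that matching logic, with the hypothetical names `ifstatus_filter` and `ifstatus_filter_matches`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A registered filter; NULL in either field acts as a wildcard. */
struct ifstatus_filter {
    const char *name;   /* node name, or NULL for all nodes */
    const char *iface;  /* interface name, or NULL for all interfaces */
};

/* Return 1 if field is a wildcard (NULL) or equals value. */
static int field_matches(const char *field, const char *value)
{
    return field == NULL || strcmp(field, value) == 0;
}

/* Return 1 if an event for (node, iface) should trigger the callback. */
int ifstatus_filter_matches(const struct ifstatus_filter *f,
                            const char *node, const char *iface)
{
    assert(f != NULL && node != NULL && iface != NULL);
    return field_matches(f->name, node) && field_matches(f->iface, iface);
}
```

The dispatcher would store one such filter per registered callback and run this check on every interface status change before calling it.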

Comments?

On Sat, 22 Apr 2000, Alan Robertson wrote:

> Alan Robertson wrote:
> >
> > Hi,
> >
> > After hearing from Luis Claudio Goncalves and Horms and Ken Beck and
> > others, I think the time has (finally) come to make an API for
> > heartbeat.
>
> [Perhaps these might better be called "low-level cluster APIs"]
>
> <snip>
>
> > Next step: Defining these APIs.
>
>
> I've attached my first draft of them. They're somewhat object oriented.
>
> It's an attachment, so that my mailer doesn't mangle the line
> boundaries, etc.
>
> -- Alan Robertson
> alanr@suse.com
Heartbeat API
Marcelo Tosatti wrote:
>
> Hi,
>
> /*
>  * set_ifstatus_callback: Define callback for interface status messages
>  *      This is a message of type "???"
>  *      These messages are issued whenever an interface goes
>  *      dead or becomes active again.
>  *
>  * cbf: callback function.
>  *
>  * p: private data - later passed to callback.
>  */
>
> int (*set_ifstatus_callback) (llc_ifstatus_callback_t cbf, void * p);
>
> You could pass the interface/host names, something like this:
>
> int (*set_ifstatus_callback) (llc_ifstatus_callback_t cbf,
> const char * name, const char * iface, void * p);
>
> If "name" and "iface" are not NULL, the callback would be called for
> interface changes of "iface" against node "name."
> NULL passed as "name" would call the callback for state changes on the
> interface "iface" against all hosts in the cluster.
> NULL passed as "iface" would call the callback for state changes of the
> node "name" in any interface.
> NULL passed as "name" and "iface" would call the callback on any
> interface status change against any node in the cluster.
>
> Comments?


I guess I wasn't thinking about these calls as applying across all nodes
in the cluster, but primarily to local interfaces... Hmmm... I guess
having it apply across the cluster seems desirable...

I also realized that I've shortchanged the interface side of the world
in another way...

I didn't create any way to get the list of interfaces, nor to get the
current status of any interfaces...

This seems desirable, perhaps necessary. This part may be a little
harder to implement across the cluster than it would be strictly
locally... :-(

-- Alan Robertson
alanr@suse.com
Heartbeat API
Is there any strategy yet for aligning the heartbeat API with
the Failsafe api for places where there is conceivable overlap?
Or at least aligning some of the concepts?

curiously,
-dB
Heartbeat API
David Brower wrote:
>
> Is there any strategy yet for aligning the heartbeat API with
> the Failsafe api for places where there is conceivable overlap?
> Or at least aligning some of the concepts?

Of course, heartbeat is much lower level than FailSafe. I would view it
as an API that FailSafe might eventually take advantage of.

The closest analogy is that FailSafe has an API for its admin tools
that might provide somewhat similar functions.

This layer of the cluster (comm and membership) doesn't have user-level
APIs that I'm aware of. I think there is a whole separate layer for
the user-level APIs. But, maybe I'm wrong...

I've cross-posted this to the FailSafe mailing list in case someone has
comments on it.

-- Alan Robertson
alanr@suse.com