Mailing List Archive

A Stonith API

Hi,

I've been thinking about STONITH things for a while, and have come up
with an API which seems reasonable to me. I've attached it for your
reading pleasure.

I have implemented it for one particular type of STONITH device, and it
seems to work pretty well.

As always, I look forward to your comments.

Thanks!

-- Alan Robertson
alanr@suse.com
[Attachment: stonith.h]

/*
 *  S hoot
 *  T he
 *  O ther
 *  N ode
 *  I n
 *  T he
 *  H ead
 *
 *  Cause the other machine to reboot or die - now.
 *
 *  We guarantee that when we report that the machine has been
 *  rebooted, then it has been (barring misconfiguration or hardware
 *  errors).
 *
 *  A machine which we have STONITHed won't do anything more to its
 *  peripherals etc. until it goes through the reboot cycle.
 */

/*
 *  Return codes from "Stonith" member functions.
 */

#define S_OK          0   /* Machine correctly reset       */
#define S_BADCONFIG   1   /* Bad config info given         */
#define S_NOACCESS    2   /* Can't access STONITH device   */
                          /* (login/passwd problem?)       */
#define S_BADHOST     3   /* Bad/illegal host/node name    */
#define S_RESETFAIL   4   /* Reset failed                  */
#define S_OOPS        5   /* Something strange happened    */

typedef struct stonith {
    struct stonith_ops *    s_ops;
    void *                  pinfo;
} Stonith;

/*
 *  These functions all use syslog(3) for error messages.  Consequently
 *  they assume you've done an openlog() to initialize it for them.
 */
struct stonith_ops {
    void    (*delete)          (Stonith * s);   /* Stonith Destructor */

    int     (*set_config_file) (Stonith * s, const char * filename);

    int     (*set_config_info) (Stonith * s, const char * configstring);

    /*
     * Must call set_config_info or set_config_file before calling any of
     * these.
     */

    int     (*reset_host)      (Stonith * s, const char * hostname);

    /* Returns list of hosts it supports */
    char ** (*hostlist)        (Stonith * s);
};

extern Stonith * stonith_new(const char * type);

A Stonith API [ In reply to ]
[snip]
> #define S_OK 0 /* Machine correctly reset */
> #define S_BADCONFIG 1 /* Bad config info given */
> #define S_NOACCESS 2 /* Can't access STONITH device */
> /* (login/passwd problem?) */
I'm unclear on how a login/authentication scheme fits into the APIs
shown below. Please clarify.

> #define S_BADHOST 3 /* Bad/illegal host/node name */
> #define S_RESETFAIL 4 /* Reset failed */
> #define S_OOPS 5 /* Something strange happened */

I suggest an S_TIMEOUT. Kimberlite power switch drivers are serial-port
based. Often the only indication you have that nothing is connected (or
misconfigured) is that commands time out.

Along the lines of serial-port-connected switches... what do you think
about allowing specification of serial port parameters?

>
> typedef struct stonith {
> struct stonith_ops * s_ops;
> void * pinfo;
> }Stonith;
>
> /*
> * These functions all use syslog(3) for error messages.
> * Consequently they assume you've done an openlog() to initialize it for them.
> */
> struct stonith_ops {
> void (*delete) (Stonith* s); /* Stonith Destructor */
>
> int (*set_config_file) (Stonith * s, const char * filename);
>
> int (*set_config_info) (Stonith * s, const char * configstring);
Could you please provide more description of what you picture to be
specified in the config file?
>
> /*
> * Must call set_config_info or set_config_file before calling any of
> * these.
> */
>
> int (*reset_host) (Stonith * s, const char * hostname);
I've wondered if it would be useful to have a call to say power down and
leave it off, as opposed to power cycle. We don't have such a call, but
conceivably, if things got completely out of hand and you had
sophisticated historical monitoring, it may be appropriate.
>
> char**(*hostlist) (Stonith* s);
> /* Returns list of hosts it supports */

Kimberlite also makes use of a form of "power switch status" command.
This is useful for policy decisions as well as for displaying in the
GUI, etc. We periodically poll the power switch status and log error
messages as a means of tipping off the administrator that a problem
exists. This is preferable to finding out that your connectivity to the
switch has gone away only when you actually need to shoot another node!
> };
>
> extern Stonith * stonith_new(const char * type);

--
Tim Burke Tel. 978-446-9166 x232
Mission Critical Linux, Inc burke@mclinux.com
http://www.missioncriticallinux.com
A Stonith API [ In reply to ]
Hi Tim,

Thanks for the reply.

Tim Burke wrote:
>
> [snip]
> > #define S_OK 0 /* Machine correctly reset */
> > #define S_BADCONFIG 1 /* Bad config info given */
> > #define S_NOACCESS 2 /* Can't access STONITH device */
> > /* (login/passwd problem?) */
> I'm unclear on how a login/authentication scheme fits into the APIs
> shown below. Please clarify.

It's assumed to be part of the configuration information which is given
below.

> > #define S_BADHOST 3 /* Bad/illegal host/node name */
> > #define S_RESETFAIL 4 /* Reset failed */
> > #define S_OOPS 5 /* Something strange happened */
>
> I suggest an S_TIMEOUT. Kimberlite power switch drivers are serial-port
> based. Often the only indication you have that nothing is connected (or
> misconfigured) is that commands time out.

In my current implementation this would generally result in an S_OOPS.
I could add S_TIMEOUT or whatever, if you think that distinction is
really necessary. I seem to recall that you're using the BayTech RPC
devices. So am I (currently). They are coming out with a version that
uses SSH for authentication. I didn't use the Kimberlite code because I
was half-way done when you announced your code, and the use of the regex
library seemed a bit too heavyweight to me. There's a nice,
simple-to-use expect routine which Stonith modules can use if they so
desire...
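
(The expect routine itself isn't shown in this thread. Purely as an
illustration of the idea, a minimal sketch might look like the following
-- the name expect_string, the per-read select() timeout, and the naive
restart-on-mismatch matching are all assumptions, not the actual
Linux-HA helper:)

#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until "pattern" shows up on fd, or until "timeout" seconds pass
 * with no input.  Returns 0 on a match, -1 on timeout or I/O error.
 * The matching is naive (it restarts on a mismatch), which is fine for
 * fixed prompts like "login: " or "password: ".
 */
static int
expect_string(int fd, const char * pattern, int timeout)
{
    size_t  plen = strlen(pattern);
    size_t  matched = 0;    /* how much of "pattern" we've seen so far */
    char    c;

    while (matched < plen) {
        fd_set          rfds;
        struct timeval  tv;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout;
        tv.tv_usec = 0;

        if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0) {
            return -1;      /* timed out, or select() failed */
        }
        if (read(fd, &c, 1) != 1) {
            return -1;      /* EOF or read error */
        }
        if (c == pattern[matched]) {
            ++matched;
        } else {
            matched = (c == pattern[0]) ? 1 : 0;
        }
    }
    return 0;
}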

> Along the lines of serial-port-connected switches... what do you think
> about allowing specification of serial port parameters?

My assumption is that all the information necessary for configuring the
device is given to the set_config_info() or set_config_file() member
functions. The format of the string is necessarily dependent on the
type of Stonith device under consideration.


> > struct stonith_ops {
> > void (*delete) (Stonith* s); /* Stonith Destructor */
> >
> > int (*set_config_file) (Stonith * s, const char * filename);
> >
> > int (*set_config_info) (Stonith * s, const char * configstring);
> Could you please provide more description of what you picture to be
> specified in the config file?

Whatever the particular device needs to operate. For the code I've
written it's: IP address, login and password. Different devices have
different needs. I'll probably also allow /dev/ttyxxx instead of IP
address for this type of device. If the device supports multiple hosts,
but not host names, this would have to include mapping information.

Here's what's in my particular config file on my test machine:

# IP address/name login password
#
# We let the switch store the outlet <=> host mappings.
# Human beings need it to be in the switch (so they can telnet there)
# So, there didn't seem to be much point in duplicating it here.
#
10.10.10.127 admin admin
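
(For illustration only: assuming the module parses the same
whitespace-separated fields from a string as from the file, a
hypothetical helper using the set_config_info() path instead might look
like this:)

#include <syslog.h>
#include "stonith.h"

/* Hypothetical: configure a BayTech object from a string rather than a
 * file.  The string carries the same fields as the config file above:
 * address (or tty), login, password. */
static Stonith *
baytech_from_string(const char * info)
{
    Stonith *   s = stonith_new("baytech");

    if (s == NULL) {
        return NULL;
    }
    if (s->s_ops->set_config_info(s, info) != S_OK) {
        syslog(LOG_ERR, "Cannot configure BayTech Stonith object");
        s->s_ops->delete(s);
        return NULL;
    }
    return s;
}

/* e.g.:   s = baytech_from_string("10.10.10.127 admin admin");   */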

> > /*
> > * Must call set_config_info or set_config_file before calling any of
> > * these.
> > */
> >
> > int (*reset_host) (Stonith * s, const char * hostname);
> I've wondered if it would be useful to have a call to say power down and
> leave it off, as opposed to power cycle. We don't have such a call, but
> conceivably, if things got completely out of hand and you had
> sophisticated historical monitoring, it may be appropriate.

This is really a starting place, not the ending place ;-).

> > char**(*hostlist) (Stonith* s);
> > /* Returns list of hosts it supports */
>
> Kimberlite also makes use of a form of "power switch status" command.
> This is useful for policy decisions as well as for displaying in the
> GUI, etc. We periodically poll the power switch status and log error
> messages as a means of tipping off the administrator that a problem
> exists. This is preferable to finding out that your connectivity to the
> switch has gone away only when you actually need to shoot another node!
> > };

I assume you mean the status of the Stonith device itself?

If so, that makes lots of sense. For the particular device I'm using,
you can infer that from getting the host list from the device. But,
this is a good idea, and there's no requirement that you ask the
device to get that info.

I added it to my copy of the API, and my implementation...

int (*status) (Stonith * s);

OK?

Here's my "Hello, world" main.c, complete with a call to the new status()
member function:

#include <stdlib.h>
#include <string.h>
#include <syslog.h>
#include "stonith.h"    /* the header above, plus the new status() member */

int
main(int argc, char ** argv)
{
    int         rc;
    char *      cmdname;
    Stonith *   s;

    /* Strip the directory part off of our command name */
    if ((cmdname = strrchr(argv[0], '/')) == NULL) {
        cmdname = argv[0];
    }else{
        ++cmdname;
    }

    openlog(cmdname, LOG_CONS|LOG_PERROR, LOG_DAEMON);

    if (argc != 2) {
        syslog(LOG_ERR, "usage: %s hostname", cmdname);
        exit(S_OOPS);
    }

    s = stonith_new("baytech");

    if (s == NULL) {
        syslog(LOG_ERR, "Cannot create BayTech Stonith object");
        exit(S_OOPS);
    }
    if ((rc = s->s_ops->set_config_file(s, "/etc/ha.d/rpc.cfg")) != S_OK) {
        syslog(LOG_ERR, "Cannot set config file for stonith object");
        exit(rc);
    }

    if ((rc = s->s_ops->status(s)) != S_OK) {
        /* Uh-Oh */
        syslog(LOG_ERR, "baytech device not accessible");
    }

    rc = s->s_ops->reset_host(s, argv[1]);
    s->s_ops->delete(s);
    return(rc);
}


-- Alan Robertson
alanr@suse.com
A Stonith API [ In reply to ]
[snip]

> >
> > int (*reset_host) (Stonith * s, const char * hostname);
> I've wondered if it would be useful to have a call to say power down and
> leave it off, as opposed to power cycle. We don't have such a call, but
> conceivably, if things got completely out of hand and you had
> sophisticated historical monitoring, it may be appropriate.

i think it is useful. in the ha projects i've worked on in the past we
had a distinction between power cycle and power off. as one might
expect, power cycle was the second to last escalation point during fault
recovery, followed by power off as the last point of escalation. seems
pretty easy to add to the api...
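
(purely as a sketch of that escalation idea, against alan's posted ops
table -- the poweroff_host/poweron_host names are hypothetical, not in
the posted header:)

/* Hypothetical extension of struct stonith_ops from the posted header;
 * the two power calls are illustrative names only. */
struct stonith_ops {
    void    (*delete)          (Stonith * s);
    int     (*set_config_file) (Stonith * s, const char * filename);
    int     (*set_config_info) (Stonith * s, const char * configstring);
    int     (*reset_host)      (Stonith * s, const char * hostname);  /* existing */
    int     (*poweroff_host)   (Stonith * s, const char * hostname);  /* last escalation */
    int     (*poweron_host)    (Stonith * s, const char * hostname);  /* undo a power off */
    char ** (*hostlist)        (Stonith * s);
};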

-chris
A Stonith API [ In reply to ]
Chris Wright wrote:

> Tim Burke wrote:

> > I've wondered if it would be useful to have a call to say power down and
> > leave it off, as opposed to power cycle. We don't have such a call, but
> > conceivably, if things got completely out of hand and you had
> > sophisticated historical monitoring, it may be appropriate.
>
> i think it is useful. in the ha projects i've worked on in the past we
> had a distinction between power cycle and power off. as one might
> expect, power cycle was the second to last escalation point during fault
> recovery, followed by power off as the last point of escalation. seems
> pretty easy to add to the api...

But, it also seems to me that some forms of Stonith hardware might not
support this particular use. For example, if your Stonith device just
yanks down the reset lead, then it probably can't do this.

This probably isn't a huge deal, and, as you said, it's easy to add. If
you add it, then you probably also need to add a host_status() call
which would tell you whether it's currently powered up or not. My
particular device (the BayTech RPC5) has independent power inputs for
each machine, so it can also tell you whether the power input for any
particular machine is live or not. The switch itself is operational as
long as any one of the power inputs is live. Curiously enough, a given
machine can be powered on, yet have no power. For this device, I could
imagine wanting an input_status() API call...
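
(Sketched concretely, with hypothetical names and an assumed
output-parameter convention -- nothing below is in the posted header:)

/* Hypothetical monitoring additions to struct stonith_ops.  Both reuse
 * the S_* return codes for errors; the state itself comes back through
 * an output parameter. */
int     (*host_status)  (Stonith * s, const char * hostname, int * poweredon);
int     (*input_status) (Stonith * s, int input, int * live);  /* per power input */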

So, there's a whole world of things which one could add to the API.

Should we add an S_UNSUPPORTED return code for unimplemented opcodes?

I have a goal of putting this on my CVS repository Real Soon Now, so I'm
not going to add lots more stuff before I do.


-- Alan Robertson
alanr@suse.com
A Stonith API [ In reply to ]
1. Generalization for host i/o fencing.

The changes would be for 'reset' to change its name,
address the reboot issue, and adopt the harder fence
as suggested for power off. This would result in operations
such as:

int (*soft_fence)( char *hostname ); /* may be unfenced by node */
int (*hard_fence)( char *hostname );
int (*unfence_me)( boolean fromhard ); /* used at boot to unfence self */

An i/o fence at the host level may only be reset following a reboot.
For STONITH, this practically comes for free, but not if a real i/o
fence was used instead. To reset that, at boot time a host must
call unfence_me somewhere early on to be sure it can access the
non-boot storage devices. In a true stonith implementation,
this would be a no-op. For one that programs a fibre-channel switch,
unfence_me might do quite a lot.

2. Better default behaviour.

I don't think it should be necessary to specify a config file
before doing something. There should be a default for each
implementation.

3. Config file format

I don't think the config file format should be specified as part
of the API to allow necessary implementation freedom. Different
mechanisms might need different information. This might include
security credentials or serial port parameters. The API shouldn't
have to deal with this.

4. Security.

It should be the case that only programs with some privilege are
allowed to execute the API. For consistency w/errno, should
S_NOACCESS be S_ACCESS? Should there be a separate S_PERM, akin
to the EACCES/EPERM distinction? (Implementations may need
different kinds of credentials to work; see #3.)

5. Implementation support for multiple mechanisms.

Say you have 4 hosts, two of which are on one kind of stonith widget,
and two on another. Your client program should be isolated from the
existence of both of these, and only need to call one API that will
do the right thing.

A high quality implementation might provide a configurable way
of listing the mechanisms, loading configured DLLs, and calling
the right mechanisms on demand for the specified hosts.

A minimal implementation might have no config file and hard-wire
stuff in the library.

6. Asynchronous calls in the API.

I know it's a pain, but it would be useful if the calls had
an async flavor and a status poll mechanism. This would let
you issue several fences at the same time and then wait for
them all to complete. As it is, you have a totally synchronous
interface that serializes all operations. This can increase
latency.

example:

typedef struct stonithstat stonithstat;
#define S_INPROGRESS 6
stonithstat * (*soft_fence)( char *hostname );
int (*query_stat)( stonithstat * stat );

(The shabby implementation does it synchronously anyway.)
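
(To make the intended use concrete -- a hypothetical caller of the async
flavor above, assuming soft_fence() hands back a pollable handle and
query_stat() returns S_INPROGRESS until the fence finishes; "ops" stands
for whatever vector ends up holding these:)

/* Start two fences in parallel, then poll until neither is still
 * in progress.  A real caller would block in select()/poll() rather
 * than sleep. */
stonithstat *   st_a = ops->soft_fence("node-a");
stonithstat *   st_b = ops->soft_fence("node-b");

while (ops->query_stat(st_a) == S_INPROGRESS
||     ops->query_stat(st_b) == S_INPROGRESS) {
    sleep(1);
}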

7. Multi-host operations.

Similarly, it might be useful for the fence operations to
take a vector of hostnames instead of one at a time. At
least then it could have internal parallelism, even if
it doesn't support async ops.

example:

int (*soft_fence)( int hostc, char **hostv );

-dB
A Stonith API [ In reply to ]
Alan Robertson wrote:
[snip]
> Should we add an S_UNSUPPORTED return code for unimplemented opcodes?

Absolutely.

What about a call to return the power switch type? If nothing else,
useful trivia to display in a management GUI.
A Stonith API [ In reply to ]
Alan Robertson wrote:
>
> Chris Wright wrote:
>

>
> This probably isn't a huge deal, and, as you said, it's easy to add. If
> you add it, then you probably also need to add a host_status() call
> which would tell you whether it's currently powered up or not. My
> particular device (the BayTech RPC5) has independent power inputs for
> each machine, so it can also tell you whether the power input for any
> particular machine is live or not. The switch itself is operational as
> long as any one of the power inputs is live. Curiously enough, a given
> machine can be powered on, yet have no power. For this device, I could
> imagine wanting an input_status() API call...
>
> So, there's a whole world of things which one could add to the API.
>
> Should we add an S_UNSUPPORTED return code for unimplemented opcodes?
>

Opinions Galore:

I think exposing host status is a Bad Idea. That is what the monitoring
code is supposed to be doing.

I think UNSUPPORTED operations are a Bad Thing, because there is
a tremendous temptation to code to the LCD, not the full set.
What operations are required, and which optional? It's madness
to go that way.

-dB
A Stonith API [ In reply to ]
David Brower wrote:
>
> Alan Robertson wrote:
> >
> > Chris Wright wrote:
> >
>
> >
> > This probably isn't a huge deal, and, as you said, it's easy to add. If
> > you add it, then you probably also need to add a host_status() call
> > which would tell you whether it's currently powered up or not. My
> > particular device (the BayTech RPC5) has independent power inputs for
> > each machine, so it can also tell you whether the power input for any
> > particular machine is live or not. The switch itself is operational as
> > long as any one of the power inputs is live. Curiously enough, a given
> > machine can be powered on, yet have no power. For this device, I could
> > imagine wanting an input_status() API call...
> >
> > So, there's a whole world of things which one could add to the API.
> >
> > Should we add an S_UNSUPPORTED return code for unimplemented opcodes?
> >
>
> Opinions Galore:
>
> I think exposing host status is a Bad Idea. That is what the monitoring
> code is supposed to be doing.

But, there's no way to tell if power is off or on without it. It is an
important complement to power on and off. You don't want one without
the other.

> I think UNSUPPORTED operations are a Bad Thing, because there is
> a tremendous temptation to code to the LCD, not the full set.
> What operations are required, and which optional? It's madness
> to go that way.

I can appreciate this. It depends on what your goal is:

    Restricting your options to the LCD (which is what not adding
    other options would do), and not providing useful information
    which doesn't make sense in some configurations.

    Restricting your hardware to the GCD.

-- Alan Robertson
alanr@suse.com
A Stonith API [ In reply to ]
[snip]
> > I suggest an S_TIMEOUT. Kimberlite power switch drivers are serial-port
> > based. Often the only indication you have that nothing is connected (or
> > misconfigured) is that commands time out.
>
> In my current implementation this would generally result in an S_OOPS.
> I could add S_TIMEOUT or whatever, if you think that distinction is
> really necessary.
Where possible, I prefer specific error messages to general catch-alls.
This allows one to implement better policy by differentiating the
outcomes.

> I seem to recall that you're using the BayTech RPC
> devices. So am I (currently). They are coming out with a version that
> uses SSH for authentication.
No, we're using RPS-10's from www.wti.com, the reason being that our
current implementation is for 2 nodes. Given that, it seemed much more
cost-effective to use simple power switches which attach via serial port
and control a single plug. Since your target is >= 2 nodes, you need a
more sophisticated switch.
A Stonith API [ In reply to ]
> > > This probably isn't a huge deal, and, as you said, it's easy to add. If
> > > you add it, then you probably also need to add a host_status() call
> > > which would tell you whether it's currently powered up or not. My
> > > particular device (the BayTech RPC5) has independent power inputs for
> > > each machine, so it can also tell you whether the power input for any
> > > particular machine is live or not. The switch itself is operational as
> > > long as any one of the power inputs is live. Curiously enough, a given
> > > machine can be powered on, yet have no power. For this device, I could
> > > imagine wanting an input_status() API call...
> > >
> > > So, there's a whole world of things which one could add to the API.
> > >
> > > Should we add an S_UNSUPPORTED return code for unimplemented opcodes?
> >
> > Opinions Galore:
> >
> > I think exposing host status is a Bad Idea. That is what the monitoring
> > code is supposed to be doing.
>
> But, there's no way to tell if power is off or on without it. It is an
> important complement to power on and off. You don't want one without
> the other.

Yes, there is: the call returned that it completed successfully.
It works, or it doesn't.

That doesn't mean you need a -separate- api on which to toss
all sorts of status info that might be interesting in some cases.

I guess there are two things going on: (1) operational actions that
actually do something. And (2) status operations, which IMO belong
in a management framework, not in the operational API. You don't
have details of your NIC in the socket APIs. They show up in /proc
and/or SNMP stuff, where the device specific stuff belongs.

> > I think UNSUPPORTED operations are a Bad Thing, because there is
> > a tremendous temptation to code to the LCD, not the full set.
> > What operations are required, and which optional? It's madness
> > to go that way.
>
> I can appreciate this. It depends on what your goal is:
>
>     Restricting your options to the LCD (which is what not adding
>     other options would do), and not providing useful information
>     which doesn't make sense in some configurations.
>
>     Restricting your hardware to the GCD.


I can't think of any operations proposed that I'd want
to allow to be UNSUPPORTED, once you strip out monitoring.
We have to have fence/reset -- it can't be UNSUPPORTED, or
you don't have anything.

-dB
A Stonith API [ In reply to ]
David Brower wrote:
>
> 1. Generalization for host i/o fencing.
>
> The changes would be for 'reset' to change its name,
> address the reboot issue, and adopt the harder fence
> as suggested for power off. This would result in operations
> such as:
>
> int (*soft_fence)( char *hostname ); /* may be unfenced by node */
> int (*hard_fence)( char *hostname );
> int (*unfence_me)( boolean fromhard ); /* used at boot to unfence self */

Of course, this is meaningless in the STONITH case. I/O
fencing is VASTLY more complex in what it needs to do, and what it can
do. For example, you may potentially need to unfence some resources
just to boot. Other resources may be hard fenced. It seems to me that
you need to specify resources, not just machines, for I/O fencing. I see
real I/O fencing as being an order of magnitude more complex -- at
least.

At this point in time, I'm not trying to define general I/O fencing.

> An i/o fence at the host level may only be reset following a reboot.
> For STONITH, this practically comes for free, but not if a real i/o
> fence was used instead. To reset that, at boot time a host must
> call unfence_me somewhere early on to be sure it can access the
> non-boot storage devices. In a true stonith implementation,
> this would be a no-op. For one that programs a fibre-channel switch,
> unfence_me might do quite a lot.
>
> 2. Better default behaviour.
>
> I don't think it should be necessary to specify a config file
> before doing something. There should be a default for each
> implementation.

This would tend to tie it to being useful for only one particular HA
system, since where to put config files is different from HA
implementation to implementation. For example, I could put them in
/etc/ha.d, but that would only work well with heartbeat. The reason for
giving a string as an option is that then it could be retrieved from a
database, or from a global config file.

> 3. Config file format
>
> I don't think the config file format should be specified as part
> of the API to allow necessary implementation freedom. Different
> mechanisms might need different information. This might include
> security credentials or serial port parameters.

I think this is what I said (?). The API has to deal with it to the
extent that it has to be able to tell the implementation where to go get
it, or give it to it as a string. How it is implemented is up to the
surrounding environment (the caller).

> The API shouldn't
> have to deal with this.

Except to point the code at where it thinks the config info is to be
found, I agree.

> 4. Security.
>
> It should be the case that only programs with some privilege are
> allowed to execute the API. For consistency w/errno, should
> S_NOACCESS be S_ACCESS? Should there be a separate S_PERM, akin
> to the EACCES/EPERM distinction? (Implementations may need
> different kinds of credentials to work; see #3.)

In my view, this is a UNIX security issue, not an API security issue.
If it can't open the file that has the password, then it won't work.
This code runs with NO privilege itself. If the caller can't give you
the info you need to reset the switch, then you can't do it. If they
can, AND you can access the device, or whatever, then you can do it.
It's up to the caller and OS to enforce this. Keep in mind that I can't
enforce anything better than UNIX security does anyway.

If I had the fancier scheme you talked about below, I'd need to worry a
lot about network security, access tokens, authentication, etc. I think
I can ignore it for this API.

> 5. Implementation support for multiple mechanisms.
>
> Say you have 4 hosts, two of which are on one kind of stonith widget,
> and two on another. Your client program should be isolated from the
> existence of both of these, and only need to call one API that will
> do the right thing.

This is a good thought, but I'm not trying for a grand system that hides
everything from everyone. You could build it on top of this system if
you want. If you know the set of all widgets available on a given
machine, then you can instantiate them all individually, and ask them
which hosts they support.
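
(A sketch of that approach, under the posted API -- hypothetical code,
assuming hostlist() returns a NULL-terminated vector and glossing over
who frees it:)

#include <string.h>
#include "stonith.h"

/* Given every Stonith object configured on this machine, find the one
 * that claims the target host and reset through it. */
static int
reset_via_any(Stonith ** devs, int ndevs, const char * host)
{
    int     i;

    for (i = 0; i < ndevs; ++i) {
        char ** hl = devs[i]->s_ops->hostlist(devs[i]);
        char ** p;

        if (hl == NULL) {
            continue;
        }
        for (p = hl; *p != NULL; ++p) {
            if (strcmp(*p, host) == 0) {
                return devs[i]->s_ops->reset_host(devs[i], host);
            }
        }
    }
    return S_BADHOST;   /* no device claims this host */
}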

> A high quality implementation might provide a configurable way
> of listing the mechanisms, loading configured DLLs, and calling
> the right mechanisms on demand for the specified hosts.
>
> A minimal implementation might have no config file and hard-wire
> stuff in the library.
>
> 6. Asynchronous calls in the API.
>
> I know it's a pain, but it would be useful if the calls had
> an async flavor and a status poll mechanism. This would let
> you issue several fences at the same time and then wait for
> them all to complete. As it is, you have a totally synchronous
> interface that serializes all operations. This can increase
> latency.
>
> example:
>
> typedef struct stonithstat stonithstat;
> #define S_INPROGRESS 6
> stonithstat * (*soft_fence)( char *hostname );
> int (*query_stat)( stonithstat * stat );
>
> (The shabby implementation does it synchronously anyway.)

This seems a more dangerous option than using the built-in mechanism for
reboot. With STONITH, you want to *reset* the machine, NOT power it off
and back on. Otherwise you wind up with deadlock situations. Reset
operations are synchronous - you have to wait for them to complete (at
least on my hardware).

The considerations for STONITH are a little different than the
considerations for real I/O fencing (or at least it seems to me).

> 7. Multi-host operations.
>
> Similarly, it might be useful for the fence operations to
> take a vector of hostnames instead of one at a time. At
> least then it could have internal parallelism, even if
> it doesn't support async ops.
>
> example:
>
> int (*soft_fence)( int hostc, char **hostv );

Telling which machine failed, and which succeeded, might be problematic.
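
(One hypothetical way around that, purely as a sketch: return per-host
results through a parallel vector, so the caller can see exactly which
resets failed:)

/* Hypothetical variant: resultv[i] receives the S_* code for hostv[i];
 * the return value is S_OK only if every host was successfully fenced. */
int (*soft_fence)( int hostc, char **hostv, int *resultv );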

I guess you can tell I'm a fan of simplicity. You can always define new
versions of the API, once you have experience teaching you what
additions are essential. My reaction at the moment is that trying to
support STONITH and general I/O fencing out of the same API is going to
greatly complicate the STONITH API.

IBM says that they have had serious problems with their HA system wrt
code bloat. I have no trouble believing that. My code is already
larger than I want it to be.


-- Alan Robertson
alanr@suse.com
A Stonith API [ In reply to ]
David Brower wrote:
>
> > > > This probably isn't a huge deal, and, as you said, it's easy to add. If
> > > > you add it, then you probably also need to add a host_status() call
> > > > which would tell you whether it's currently powered up or not. My
> > > > particular device (the BayTech RPC5) has independent power inputs for
> > > > each machine, so it can also tell you whether the power input for any
> > > > particular machine is live or not. The switch itself is operational as
> > > > long as any one of the power inputs is live. Curiously enough, a given
> > > > machine can be powered on, yet have no power. For this device, I could
> > > > imagine wanting an input_status() API call...
> > > >
> > > > So, there's a whole world of things which one could add to the API.
> > > >
> > > > Should we add an S_UNSUPPORTED return code for unimplemented opcodes?
> > >
> > > Opinions Galore:
> > >
> > > I think exposing host status is a Bad Idea. That is what the monitoring
> > > code is supposed to be doing.
> >
> > But, there's no way to tell if power is off or on without it. It is an
> > important complement to power on and off. You don't want one without
> > the other.
>
> Yes, there is: the call returned that it completed successfully.
> It works, or it doesn't.

But, someone can come in manually and tell the device to shut off
power. You don't in general know the state of power in the presence of
all the factors present (like human beings, software bugs, gremlins,
etc.)

> That doesn't mean you need a -separate- api on which to toss
> all sorts of status info that might be interesting in some cases.
>
> I guess there are two things going on: (1) operational actions that
> actually do something. And (2) status operations, which IMO belong
> in a management framework, not in the operational API. You don't
> have details of your NIC in the socket APIs. They show up in /proc
> and/or SNMP stuff, where the device specific stuff belongs.

OK. I'm not sure I want to invent another API to do something whose
code is 90% common with the other API. Maybe documenting the options as
being of a different class, and stating specifically that only
monitoring calls are optional? Or, maybe even have an s_ops vector, and
an s_status vector, and if you implement one of them, you implement all
the operations, or none?
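
(Concretely, that split might look something like this -- hypothetical
structure and member names, not implemented anywhere yet:)

/* Hypothetical split: the mandatory operations stay in s_ops, and the
 * optional monitoring calls live in their own all-or-nothing vector,
 * left NULL by devices that can't support monitoring at all. */
struct stonith_status_ops {
    int     (*status)       (Stonith * s);  /* is the device reachable? */
    int     (*host_status)  (Stonith * s, const char * hostname);
};

typedef struct stonith {
    struct stonith_ops *        s_ops;
    struct stonith_status_ops * s_status;   /* NULL => no monitoring */
    void *                      pinfo;
} Stonith;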

> > > I think UNSUPPORTED operations are a Bad Thing, because there is
> > > a tremendous temptation to code to the LCD, not the full set.
> > > What operations are required, and which optional? It's madness
> > > to go that way.
> >
> > I can appreciate this. It depends on what your goal is:
> >
> >     Restricting your options to the LCD (which is what not adding
> >     other options would do), and not providing useful information
> >     which doesn't make sense in some configurations.
> >
> >     Restricting your hardware to the GCD.
>
> I can't think of any operations proposed that I'd want
> to allow to be UNSUPPORTED, once you strip out monitoring.
> We have to have fence/reset -- it can't be UNSUPPORTED, or
> you don't have anything.

I don't think we *need* power off. So, it *could* be optional. This is
why I didn't include it in the first place. Some types of STONITH
hardware can't support it. I'm certainly OK with a minimal view of how
this should work.

I suppose hardware that presses the reset button could simply do that
when asked to shut the machine off. Or, would it be better to not put
in support for power off, or would it be better to say power off is
mandatory if the hardware supports it, but optional if it doesn't? Or
is it just mandatory - period - don't use hardware that doesn't support
it?

-- Alan Robertson
alanr@suse.com
Re: A Stonith API [ In reply to ]
Hi!

On Wed, 28 Jun 2000, Alan Robertson wrote:

> Hi,
>
> I've been thinking about STONITH things for a while, and have come up
> with an API which seems reasonable to me. I've attached it for your
> reading pleasure.
>
> I have implemented it for one particular type of STONITH device, and it
> seems to work pretty well.
>
> As always, I look forward to your comments.

To have an effective STONITH we need to be sure of which machine has
to be shot. This is a quorum issue, *simple* if you have only two
machines. We've talked about it some time ago.
We need a __reference host__ (a third machine, a router or
something that happens to have an IP address) to define who's out of
communication. Why??? Using the current scheme, if host A can't see
host B, it thinks host B is dead. The same will occur to host B. So
both will try to STONITH each other. It can be very dangerous: both
machines can STONITH each other and you aren't sure which host has to die...
Using a __reference host__ you can have something like:

               +-----------+
               | Ref. Host |
               +-----------+
                     |
      +--------------+--------------+
      |                             |
      |eth0                         |eth0
+-----------+   Serial/eth1   +-----------+
|  Host A   |-----------------|  Host B   |
+-----------+                 +-----------+

i.e.:

* If host A can see host B (on any interface) and can see the
reference host, it's OK.
* If host B can see host A via eth1/serial but can't see the
reference host, its eth0 may be in trouble. If eth0 is the iface for
services, it's safe to STONITH host B. In this case host B can shoot
its own head.

Using the third host it is simple to find out who has a link
down. It would be very useful when setting up an HA firewall... if one
of the mandatory interfaces goes down, let's STONITH...
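
(As a sketch of the decision rule being described -- hypothetical helper
names; can_reach() stands in for whatever ICMP ping or serial probe is
used:)

/* Hypothetical: decide what to do when heartbeats from the peer stop,
 * based on which parties we can still reach. */
enum action { DO_NOTHING, STONITH_PEER, EXPECT_TO_BE_SHOT };

static enum action
partner_lost(int can_reach_peer, int can_reach_ref)
{
    if (can_reach_peer) {
        return DO_NOTHING;          /* some link is still up */
    }
    if (can_reach_ref) {
        return STONITH_PEER;        /* we hold quorum with the reference host */
    }
    return EXPECT_TO_BE_SHOT;       /* we're the isolated one */
}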

Just thoughts.

Luis

[ Luis Claudio R. Goncalves lclaudio@conectiva.com.br ]
[. MSc coming soon -- Conectiva HA Team -- Gospel User -- Linuxer -- :) ]
[. Fault Tolerance - Real-Time - Distributed Systems - IECLB - IS 40:31 ]
[. LateNite Programmer -- Jesus Is The Solid Rock On Which I Stand -- ]
Re: A Stonith API [ In reply to ]
"Luis Claudio R. Goncalves" wrote:
[snip]
>
> To have an effective STONITH we need to be sure of which machine has
> to be shot. This is a quorum issue, *simple* if you have only two
> machines. We've talked about it some time ago.
> We need a __reference host__ (a third machine, a router or
> something that happens to have an IP address) to define who's out of
> communication. Why??? Using the current scheme, if host A can't see
> host B, it thinks host B is dead. The same will occur to host B. So
> both will try to STONITH each other. It can be very dangerous: both
> machines can STONITH each other and you aren't sure which host has to die...
[snip]

Host membership decisions and policy regarding when the STONITH should
be called are at an inherently higher level. This should be separate
from the low-level STONITH calls.

Actually, even for a 2 node case, quorum isn't *simple* if you care
about addressing network partitions and hung nodes which may become
unhung.
Re: A Stonith API [ In reply to ]
"Luis Claudio R. Goncalves" wrote:
>
> Hi!
>
> On Wed, 28 Jun 2000, Alan Robertson wrote:
>
> > Hi,
> >
> > I've been thinking about STONITH things for a while, and have come up
> > with an API which seems reasonable to me. I've attached it for your
> > reading pleasure.
> >
> > I have implemented it for one particular type of STONITH device, and it
> > seems to work pretty well.
> >
> > As always, I look forward to your comments.
>
> To have an effective STONITH we need to be sure of which machine has
> to be shot. This is a quorum issue, *simple* if you have only two
> machines. We've talked about it some time ago.
> We need a __reference host__ (a third machine, a router or
> something that happens to have an IP address) to define who's out of
> communication. Why??? Using the current scheme, if host A can't see
> host B, it thinks host B is dead. The same will occur to host B. So
> both will try to STONITH each other. It can be very dangerous: both
> machines can STONITH each other and you aren't sure which host has to die...
> Using a __reference host__ you can have something like:
>
>                +-----------+
>                | Ref. Host |
>                +-----------+
>                      |
>       +--------------+--------------+
>       |                             |
>       |eth0                         |eth0
> +-----------+   Serial/eth1   +-----------+
> |  Host A   |-----------------|  Host B   |
> +-----------+                 +-----------+
>
> i.e.:
>
> * If host A can see host B (on any interface) and can see the
> reference host, it's OK.
> * If host B can see host A via eth1/serial but can't see the
> reference host, its eth0 may be in trouble. If eth0 is the iface for
> services, it's safe to STONITH host B. In this case host B can shoot
> its own head.
>
> Using the third host it is simple to find out who has a link
> down. It would be very useful when setting up an HA firewall... if one
> of the mandatory interfaces goes down, let's STONITH...

Of course, if your STONITH device is network-attached, IT can become the
third party. This is one of the reasons I like a network-attached
STONITH device :-)

I'm actually thinking about writing a module in heartbeat that would
create a class of membership called "ping" membership. In this type of
heartbeat, you would ping a machine, then if it can't be reached, it
would be called down. You would access its information using the usual
API. Then, you have a nice quorum setup.

-- Alan Robertson
alanr@suse.com
Re: A Stonith API [ In reply to ]
Tim Burke wrote:
>
> "Luis Claudio R. Goncalves" wrote:
> [snip]
> >
> > To have an effective STONITH we need to be sure of which machine has
> > to be shot. This is a quorum issue, *simple* if you have only two
> > machines. We've talked about it some time ago.
> > We need a __reference host__ (a third machine, a router or
> > something that happens to have an IP address) to define who's out of
> > communication. Why??? Using the current scheme, if host A can't see
> > host B, it thinks host B is dead. The same will occur to host B. So
> > both will try to STONITH each other. It can be very dangerous: both
> > machines can STONITH each other and you aren't sure which host has to die...
> [snip]
>
> Host membership decisions and policy regarding when the STONITH should
> be called are at an inherently higher level. This should be separate
> from the low-level STONITH calls.

Agreed. I also think that cluster-wide STONITH which hides where the
devices are is a cluster service, not a low-level service. It depends
too much on cluster communications, etc. Stonith is a nice thing to
have even if you invoke it manually. I wasn't intending to define the
cluster-level service, but the low-level API that a cluster-level
service could use.

> Actually, even for a 2 node case, quorum isn't *simple* if you care
> about addressing network partitions and hung nodes which may become
> unhung.

Nope. That's why you want STONITH when you have a physically shared
device (as opposed to a replicated device like DRBD). You may be able
to get by without STONITH or I/O fencing if you're using DRBD. You
still probably need quorum, though.

-- Alan Robertson
alanr@suse.com
Re: A Stonith API [ In reply to ]
Alan Robertson wrote:
>
> Tim Burke wrote:

> > Actually, even for a 2 node case, quorum isn't *simple* if you care
> > about addressing network partitions and hung nodes which may become
> > unhung.
>
> Nope.

I meant "Nope" in the sense of "Nope, it isn't simple", or "Yes, I agree
with you". Sorry.

-- Alan Robertson
alanr@suse.com
Re: A Stonith API [ In reply to ]
No Problemo!

Alan Robertson wrote:
>
> Alan Robertson wrote:
> >
> > Tim Burke wrote:
>
> > > Actually, even for a 2 node case, quorum isn't *simple* if you care
> > > about addressing network partitions and hung nodes which may become
> > > unhung.
> >
> > Nope.
>
> I meant "Nope" in the sense of "Nope, it isn't simple", or "Yes, I agree
> with you". Sorry.
>
> -- Alan Robertson
> alanr@suse.com

--
Tim Burke Tel. 978-446-9166 x232
Mission Critical Linux, Inc burke@mclinux.com
http://www.missioncriticallinux.com
Re: A Stonith API [ In reply to ]
Hi!

> > > To have an effective STONITH we need to be sure of which machine has
> > > to be shot. This is a quorum issue, *simple* if you have only two
> > > machines. We've talked about it some time ago.

*simple* != soooooo simple :)

> > Host membership decisions and policy regarding when the STONITH should
> > be called are at an inherently higher level. This should be separate
> > from the low-level STONITH calls.

I agree with you. But right now we have the STONITH API and don't
have the infamous Cluster Manager... this may be a quick hack to use
the API and be (at least a little bit) safer.

> > Actually, even for a 2 node case, quorum isn't *simple* if you care
> > about addressing network partitions and hung nodes which may become
> > unhung.

That's why I said that a node can perceive if it is out of
touch. If a node can't reach the other one and can't reach the
reference host, it's in a partition or even has links down.
I know that there are many issues involved with quorum that depend
heavily on which solution you're using. As I'm much more into the
heartbeat code, I'm trying to see how heartbeat could be improved with
these ideas ('till the API is ready and we have a Cluster Manager
;). But I think some of the ideas are universal.

> Nope. That's why you want STONITH when you have a physically shared
> device (as opposed to a replicated device like DRBD). You may be able
> to get by without STONITH or I/O fencing if you're using DRBD. You
> still probably need quorum, though.

Not that bad, as Kimberlite uses some disk area to store status and
uses disk locks... this could be used to control the return of a host.
(I haven't read all the docs yet, but it sounds reasonable to me.)
Anyway, quorum, time sync and other issues are still problems, but that
doesn't mean you can't use a simplified version of them... :)

Hugs!

Luis

[ Luis Claudio R. Goncalves lclaudio@conectiva.com.br ]
[. MSc coming soon -- Conectiva HA Team -- Gospel User -- Linuxer -- :) ]
[. Fault Tolerance - Real-Time - Distributed Systems - IECLB - IS 40:31 ]
[. LateNite Programmer -- Jesus Is The Solid Rock On Which I Stand -- ]