
A possible approach for i/o fencing
We're proposing this as an outline for how to organize
cluster i/o fencing. My immediate personal goal is to
prototype a NATALIE NFS agent. We've left quite
a number of open issues, for which I hope to see some
discussion. All comments are appreciated.

thanks!

-dB


I/O Fencing for Clusters

Generic Resource Intervention Tool Service (GRITS)
and
NFS Admin Tool And Location Intervention Extension (NATALIE)

----------------

David Brower (mailto:dbrower@us.oracle.com)
John Leys (mailto:jleys@us.oracle.com)
Gary Young (mailto:gdyoung@us.oracle.com)

History

Public version 0.1 1-Mar-00

Abstract

Cluster systems with shared resources, such as disk, need
"fencing" of those resources during and after membership
reconfigurations. There are no general solutions to providing a
mechanism for fencing in the Open Standards world, with existing
solutions tied tightly to particular membership services and i/o
systems. This note outlines the architecture of a generic service
(GRITS) for organizing fencing interactions between membership
services and resources, and a mechanism (NATALIE) by which NFS
services may be extended to become a safely fenceable resource
under GRITS. Other resources, such as shared-scsi disk drivers,
SAN switches and the like, should be equally capable of becoming
GRITS-able partners. Because the solution is openly released, it
is hoped that system providers, SAN vendors, and the purveyors of
storage systems will incorporate appropriate agents, allowing for
reliable clusters with shared, fence-able resources.

GRITS Architecture

A GRITS cluster consists of:

Some number of nodes;

Some number of membership group services, each maintaining
quorum for its group. Groups have generation numbers of
at least 16 bits; wraps are allowed, and handled.

Some number of resources used by groups.

Nodes are identified by either IP address or resolvable name.
Resources are identified the same way, and are assumed to have at
least an IP capable proxy -- something that will respond to IP,
even if it needs to take some other path to an actual resource.

Each GRITS membership group has a configuration identifying the
resources that may be used by the group, including the destination
for GRITS control messages to the resource. The group service also
provides multiple access points for resources to query the group
when that is necessary.

Each GRITS resource has a configured list of groups and hosts
that may possibly access it. The configuration identifies the
destinations for queries by the resource of the group. The
resource itself has at least one identified access point for
control messages to it from the group services.

(The configurations of groups and resources are expected to be
slowly changing, and their control is not defined by GRITS.)

GRITS controls access to resources by nodes depending on the group
membership. Access is either permitted or denied to whole nodes,
with no finer granularity. (While more control might be
desirable, it is hard to achieve, and is not addressed so as to
provide the simplest possible solution.)

Resources that must remain writable to all during cluster
transition, perhaps because they are used as part of the
membership quorum resolution, should not be under GRITS control.

At resource boot time, the resource examines the configuration,
and adopts an access posture towards the potential members of all
the groups. It does this first by consulting the configured boot
policy associated with each group member. Then it may use GRITS
defined messaging to communicate with the membership service to
set the correct current access rights. Plausible initial
policies are "read only" and "no access"; some resources
may only be able to enforce "no access". Obviously, a "writable"
boot policy would be insane.

Once an initial posture is established by a resource, membership
change events in the group drive GRITS control messages to all the
resources configured for the group. This will either deny access
to departing members, or allow access to continuing or joining
members.

At group boot time, the membership service must establish a quorum
without the use of any GRITS controlled resources. Then it allows
access to the members to all the resources using GRITS. The group
cannot proceed out of its reconfiguration until the unfencing of
all resources has been accomplished.

It is intended that "gritty" agents can be written and put in
place for:

- directly attached disks on shared SCSI. The agent would
communicate with some kernel-level code to manipulate the
SCSI reset and SCSI reserve to arbitrate access to the resource;
GRITS would talk to both sides to force things to a known state.

- SAN attached storage, where the agent could program tokens or
domains of the fabric to control access;

- NFS attached storage, where an agent could use NATALIE
capabilities to narrow access below that of the basic exports;

- SMB attached storage, where an agent could communicate to the
software doing the "sharing" to control access.

- General network attached storage, where control may be achieved
by filtering in a router or proxy between the nodes and the
resources.

- Worst Case group members, who will have a third party wired
to their reset buttons to force them to be "fenced." This
is an always correct final solution.

Mixtures of these agencies may be needed, depending on the needs
and topology of the cluster in question. The resource providers
may or may not be on hosts that are part of any group in question.

Protocols

At the architectural level, GRITS is agnostic about the control
protocols. The service could be provided using a variety of
communication mechanisms. The messages are defined in terms of
verbs that may be bound to different techniques. In practice,
there will need to be some commonality of protocol. It will not
do to have a resource attempt to query a group using a protocol
the group does not support, nor can a group meaningfully send
a membership change to a resource without common ground.

Potential protocols include:

ONC RPC
CORBA
HTTP
HTTPS
COM/DCOM
SMB extensions

Exact bindings and support are an area for open discussion.

Security

Only "authorized" parties may be allowed to invoke the verbs.
This is handled, pitifully, by a "cookie", a shared secret between
resources and group services. A secure protocol will protect the
contents of the cookie, but is not an essential part of the
architecture. As is traditional in cluster discussions, we
presume for the moment that traffic between nodes and resources is
on a secure network.

It is also the case that only current quorum holding membership
services should be invoking commands. The approach taken is to
have GRITS have some knowledge about cluster epochs. (A cluster
epoch is an ever-increasing number bumped at the time the
membership is changed, sometimes called the cluster generation, or
cluster id.) Only messages from the latest epoch should be
obeyed. To do this, GRITS needs to be able to establish an
ordering on epochs. The protocol must also handle wraps of the
epoch, from a high value back to zero. This is not worked out.

There are also issues regarding the need for stable storage for
the epoch in resource agents. What epoch should they obey at
resource boot?

Resource Settings

A resourceSetting is a combination of

{ resource, node, allow|deny }

ResourceSettings are a list or array of resourceSettings to
cover a set of resource/node bindings.
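
In C, these types might be rendered as follows. This is a minimal
sketch; the field names and fixed sizes are illustrative, not part
of the proposal.

enum grits_access { GRITS_DENY = 0, GRITS_ALLOW = 1 };

/* One resource/node access binding: { resource, node, allow|deny }. */
struct resourceSetting {
    char              resource[64];   /* resource name or IP address */
    char              node[64];       /* node name or IP address     */
    enum grits_access access;         /* allow or deny               */
};

/* A batch of settings covering a set of resource/node bindings. */
struct resourceSettings {
    unsigned               count;
    struct resourceSetting entry[64]; /* fixed cap, for the sketch   */
};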

Verbs

Resource to GroupService

SetMeSeymour( in resourceName, out resourceSettings );

GroupService to Resource

Set( in cookie, in groupGeneration, in resourceName,
in nodeName, in boolean allow );

SetAll( in cookie, in groupGeneration, in resourceName,
in boolean allow );

GetSettings( in resourceName, out resourceSettings );

The GRITS agent must remember the highest generation used, and
refuse operations from older generations. If the cookie provided
does not match the configured cookie, the set request will be
rejected. FIXME -- deal with generation wrap.
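
As a sketch of the acceptance test an agent might apply -- the
shared-secret value is a placeholder, and the simple ordering shown
does not yet handle the wrap noted in the FIXME:

#include <string.h>

static unsigned    last_generation;   /* highest generation obeyed  */
static const char *configured_cookie = "shared-secret"; /* placeholder */

/* Returns nonzero if a Set request should be honored. */
static int grits_accept_set(const char *cookie, unsigned generation)
{
    if (strcmp(cookie, configured_cookie) != 0)
        return 0;                     /* cookie mismatch: reject    */
    if (generation < last_generation) /* FIXME: not yet wrap-safe   */
        return 0;                     /* stale generation: reject   */
    last_generation = generation;
    return 1;
}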

When a Set operation completes, denying i/o from a host, it must
be guaranteed that all i/o from that host is over, and absolutely
no more will be done -- to the degree possible at the level of the
agent invoked. For example, an agent manipulating the disk driver
of an operating system kernel can rarely stop i/o that is queued in
the drive electronics.


Expected use

When the group service detects the death of a member, it uses
GRITS to fence it off, by calling Set for all resources in use by
the group, denying access by the deceased member.

When the member comes back to life, and is granted access back
into the group, the group uses another GRITS Set to re-enable
access to the resources.

It is up to the agent associated with the group to determine the
exact access needed to particular resources. It may be necessary
to leave write access available to some resource that is used as
part of group membership establishment, and/or quorum
determination.

NATALIE

The NATALIE extensions to NFS are additional RPC interfaces to
perform manipulation of the live state of the NFS servers. In
particular, NATALIE supports forcible eviction of mounting
clients. This is principally useful for cluster "fence off", but
is administratively useful on its own merits in non-cluster
environments.

The main verbs are almost those used with the GRITS GroupService
to Resource, with the following exceptions: (1) the generation is
not needed, as NATALIE is not specific to group membership, and
(2) the mode is not allow or deny, but an NFS export access right,
such as "rw". The GRITS agent translating to NATALIE must do the
appropriate mapping.

Set( in cookie, in resourceName, in nodeName, in mode );

SetAll( in cookie, in resourceName, in mode );

GetSettings( in resourceName, out resourceSettings );
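
The mapping mentioned above can be very small. A sketch, assuming
"rw" and "none" are among the modes NATALIE accepts:

/* Translate a GRITS allow/deny into an NFS export access mode for
 * the NATALIE Set verb.  "none" as the deny spelling is an
 * assumption; a read-only group policy would map allow to "ro". */
static const char *grits_to_natalie_mode(int allow)
{
    return allow ? "rw" : "none";
}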

Only nodes covered by entries in the NFS exports will be allowed
to mount at all. It is not specified whether NATALIE can grant
more permission than is present in the exports configuration,
as this would constrain the NATALIE implementation.

When there are multiple and dynamic NFS servers on a machine, the
NATALIE settings need to be coherently replicated across them all.
That is, when a Set operation completes disallowing i/o from a
host, it must be guaranteed that all i/o from that host is over,
and no more will be done.

Three plausible implementations of NATALIE follow:

1. At the time of a Set that changes current values, all nfsds
and mountds are killed. The exports file is updated to
reflect the permissions being set, and the daemons are
restarted.

The error propagation of this is questionable; survivors
may receive errors that they ought not see.

2. Add control hooks into the nfsds to reject operations
from fenced hosts (see the sketch after this list). This
requires memory of the nodes being fenced. Hooks into
mountd may also be wanted.

3. Create an NFS proxy, which implements the NATALIE filtering
but otherwise forwards the request on to an unmodified
NFS server. This is inefficient, but totally generic.
The NATALIE service need not even reside on the same
node as the NFS service. It should not, however,
reside on any of the nodes being fenced!
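
For implementation 2, the per-request check might amount to a
lookup in a table of fenced client addresses, maintained by the
NATALIE Set verbs. A sketch only; a real nfsd would key on its own
notion of client identity:

#include <string.h>

#define MAX_FENCED 64

static char fenced[MAX_FENCED][16];   /* dotted-quad client addresses */
static int  nfenced;

/* Called by the NATALIE Set verb when a host is denied. */
static void natalie_fence(const char *addr)
{
    if (nfenced < MAX_FENCED)
        strcpy(fenced[nfenced++], addr);
}

/* Called at the top of each NFS request; nonzero means reject. */
static int natalie_is_fenced(const char *addr)
{
    int i;
    for (i = 0; i < nfenced; i++)
        if (strcmp(fenced[i], addr) == 0)
            return 1;
    return 0;
}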

If a NATALIE service of type 2 or 3 comes up as configured by
exports, and does not make the GRITS query, then a retried write
from an evicted/frozen node might be allowed. This would be
unacceptable. One solution is to put the NATALIE state in
persistent storage on the server. Another is to have NATALIE
started by the GRITS agent after it queries the group about the
boot posture.

Examples

These examples use ONC RPC as the communication mechanism for
illustration; others may do as well.

NFS/NATALIE Cluster
-------------------

A three node cluster A, B, C uses shared storage provided by
service N, mounted from export point /path. Together, the quorum
of A, B, C form a virtual IP host V.

Whenever there is a membership change in V, an agent receives
the change and issues GRITS Set commands to all the resources.


On each of the nodes A, B or C, there is a configuration of GRITS
resources used, perhaps in the following form:

% cat /etc/grits.groups

# GRITS resources used by group V
V N onc://N/grits/resource/path

This tells the group-side code that to issue commands to the
resource for N, one uses onc to that location. It also
says that at boot, only read access is required.

On the resource providing N, the GRITS agent is configured:

% cat /etc/grits.resources

# GRITS groups using resource N

N /path onc://V/grits/group/V(none) \
onc://V/grits/group/A(r) \
onc://V/grits/group/B(r) \
onc://V/grits/group/C(r)

This tells the resource side of GRITS that on booting, it may
contact any of V, A, B, or C for current state, and that until it
makes successful contact, it should allow only r access to the
actual nodes. There should never be requests from the virtual
node V, so it is given no access.

Shared SCSI Cluster
-------------------

Two nodes A and B have a shared SCSI bus arbitrating disk D,
forming group G. Each node runs both the membership and the scsi
grits agent; there is no shared IP.

% cat /etc/grits.groups

# GRITS resources used by group G
G D onc://A/grits/resource/path
G D onc://B/grits/resource/path

% cat /etc/grits.resources

# GRITS groups using resources

D /path onc://A/grits/group/A(r) \
onc://B/grits/group/B(r)
A possible approach for i/o fencing [ In reply to ]
Hi,

On Wed, 01 Mar 2000 18:07:12 -0800, David Brower <dbrower@us.oracle.com>
said:

> We're proposing this as an outline for how to organize
> cluster i/o fencing.

OK, nice.

> ordering on epochs. The protocol must also handle wraps of the
> epoch, from a high value back to zero. This is not worked out.

It's easy enough. Just define an ordering between generation numbers
such as:

static inline int x_before_y (unsigned x, unsigned y)
{
        return ((signed) (x - y)) < 0;
}

and you are safe on wrap.

By the way, you really want a quorum generation number, not just a
membership generation, to deal with cluster partitioning (I'll mention
more below).

> There are also issues regarding the need for stable storage for
> the epoch in resource agents. What epoch should they obey at
> resource boot?

Assuming that these commands only ever get sent during cluster recovery,
it should be safe enough to simply adopt the first generation seen as
the base.

> Resource Settings

> Verbs

> Resource to GroupService
> SetMeSeymour( in resourceName, out resourceSettings );

What is this for?

> When a Set operation completes, denying i/o from a host, it must
> be guaranteed that all i/o from that host is over, and absolutely
> no more will be done -- to the degree possible at the level of the
> agent invoked. For example, an agent manipulating the disk driver
> of an operating system kernel can rarely stop i/o that is queued in
> the drive electronics.

No, but it _can_ wait for acknowledgement for all previously submitted
IOs. Guaranteeing quiescence before returning the ack should definitely
be assumed.

> Expected use

> When the group service detects the death of a member, it uses
> GRITS to fence it off, by calling Set for all resources in use by
> the group, denying access by the deceased member.

What about when quorum is lost in the local group? You may just have
been partitioned from the other side, but the other partition might
actually be dead --- you don't know, all you know is that you want to
fence off *yourself*. But you can't, because you don't have a new
generation number on your quorum.

Remember, not all clusters will suddenly kill themselves completely on
loss of quorum. If you lose quorum, you will still get a new membership
generation, but that shouldn't allow you to fence off other nodes and
give yourself access! It should let you fence yourself off, however.

Proposal: Allow nodes to deny access to themselves or to other nodes
in their membership group, even without the current generation
number.


Finally, you allude to but don't really address one nasty problem:
shared resources may, but won't always, have a controlling node, and the
controlling node may move from place to place on failover.

Fencing for shared scsi, for example, has always got to be done on the
local node. Fibrechannel fabrics may be able to fence only via the
command port on a remote node. For HA SMB services, the server may be
able to migrate freely. The node to which the fencing command needs to
be sent is not always going to be predictable in advance.

So, it seems as if fencing should always be considered to be a local
action, which gets forwarded by the fencing agent if appropriate. In
the case of SMB, the failover tracking will determine where to send the
fencing (and whether to send the fence to backup servers too, just to be
safe).

But in the case of SCSI, we need to fence locally. How can we do this
and still honour generation numbers? I'd guess we would need a
reservation superblock on disk to achieve that, otherwise there is no
way of encoding the "current" generation into the protocol.

Either way, talk of which protocol to use for GRITS arbitration seems to
be missing the point --- in the first instance, all GRITS commands
probably have got to go through a local agent of some description.

--Stephen
A possible approach for i/o fencing [ In reply to ]
"Stephen C. Tweedie" wrote:

> On Wed, 01 Mar 2000 18:07:12 -0800, David Brower <dbrower@us.oracle.com>
> said:
>
> By the way, you really want a quorum generation number, not just a
> membership generation, to deal with cluster partitioning (I'll mention
> more below).

I suppose I run the two together -- a generation w/o quorum "doesn't
exist", which is to say, it ought not be seen by the fenceable
resources. Members in that generation know they don't have quorum,
and ought not be sending commands.

> > There are also issues regarding the need for stable storage for
> > the epoch in resource agents. What epoch should they obey at
> > resource boot?
>
> Assuming that these commands only ever get sent during cluster recovery,
> it should be safe enough to simply adopt the first generation seen as
> the base.

I *think* this is the case, but I'm not yet convinced I can prove
it is totally correct. We come around this again later.

> > Verbs
>
> > Resource to GroupService
> > SetMeSeymour( in resourceName, out resourceSettings );

> What is this for?

I'm not assuming that the resource provider is part of the membership
group itself. (Hard for a drive or switch to run the s/w). So, the
intent is to allow them to sync to what ought to be their current
state when they boot or failover without correct state. To do so,
they need to be able to query the group for the current state.
(The name is flippant, and will likely be changed!)

> > When a Set operation completes, denying i/o from a host, it must
> > be guaranteed that all i/o from that host is over, and absolutely
> > no more will be done -- to the degree possible at the level of the
> > agent invoked. For example, an agent manipulating the disk driver
> > of a operating system kernel can rarely stop i/o that is queued in
> > the drive electronics.
>
> No, but it _can_ wait for acknowledgement for all previously submitted
> IOs. Guaranteeing quiescence before returning the ack should definitely
> be assumed.

I am absolutely certain that this will be an area of buggy
implementation!

It is probably the case that the system here needs to be able to
return errors in some way. If the resource gets hung in such a way
that it doesn't get its i/o complete, it may be useful and correct
to declare an error and rework the membership, perhaps by killing
the host involved. This raises the possibility of needing explicit
hierarchies of fencing based on the topology, allowing the use of
the finest correct granularity. You could then avoid escalating to
the "reset" hammer except when necessary.
>
> > Expected use
>
> > When the group service detects the death of a member, it uses
> > GRITS to fence it off, by calling Set for all resources in use by
> > the group, denying access by the deceased member.
>
> What about when quorum is lost in the local group? You may just have
> been partitioned from the other side, but the other partition might
> actually be dead --- you don't know, all you know is that you want to
> fence off *yourself*. But you can't, because you don't have a new
> generation number on your quorum.
>
> Remember, not all clusters will suddenly kill themselves completely on
> loss of quorum. If you lose quorum, you will still get a new membership
> generation, but that shouldn't allow you to fence off other nodes and
> give yourself access! It should let you fence yourself off, however.
>
> Proposal: Allow nodes to deny access to themselves or to other nodes
> in their membership group, even without the current generation
> number.

My initial reaction is that if you know enough to die, it seems like
you know enough to stop your own i/o, and are functioning well enough
to do that. But it certainly seems reasonable to allow the self-fence.

>
> Finally, you allude to but don't really address one nasty problem:
> shared resources may, but won't always, have a controlling node, and the
> controlling node may move from place to place on failover.
>
> Fencing for shared scsi, for example, has always got to be done on the
> local node. Fibrechannel fabrics may be able to fence only via the
> command port on a remote node. For HA SMB services, the server may be
> able to migrate freely. The node to which the fencing command needs to
> be sent is not always going to be predictable in advance.
>
> So, it seems as if fencing should always be considered to be a local
> action, which gets forwarded by the fencing agent if appropriate. In
> the case of SMB, the failover tracking will determine where to send the
> fencing (and whether to send the fence to backup servers too, just to be
> safe).
>
> But in the case of SCSI, we need to fence locally. How can we do this
> and still honour generation numbers? I'd guess we would need a
> reservation superblock on disk to achieve that, otherwise there is no
> way of encoding the "current" generation into the protocol.

This goes back to my wondering about how you detect the current
generation at resource boot. If you thought you could accept the
first generation that commanded you, you would not need the
persistent store. If you think you cannot, then you need to
provide reliable persistent store for each resource.

My expectation is that failover paths to the resource control (like a
virtual ip) can just be in the resource id when possible. For things
that don't have the nice semantic, you'll probably need to have the
true address of all the possible failover candidates. In the case
of shared scsi, you need to (try to) talk to the hosts at both ends
of the cable, and what you tell them will determine what scsi
reservations and resets they assert, and what errors you will tolerate
before returning error. If you are doing a fence of a bad node, you
tell it to fence itself off, and the other one to fence the other off;
if the command to the survivor works, you don't care about the results
from the first, because you believe the reservation has gone into
effect successfully.

> Either way, talk of which protocol to use for GRITS arbitration seems to
> be missing the point --- in the first instance, all GRITS commands
> probably have got to go through a local agent of some description.


I think that for writing interoperable apps and resources, the protocol
is the point of truth, actually, and can't be avoided for too long.
While it probably ought to be designed protocol-independently (and I'm
trying to structure it that way), at some point there will be some small
number of implementations of a small number of protocols. I am eagerly
waiting for suggestions what the first few should be. I think ONC RPC
is one, only because I'm very interested in fencing NFS services. I can
probably be reasoned out of that view, but I don't know what the best
alternate would be. CORBA? HTTP(S)? DCOM?

I do not see why we need to assume the presence of an agent local to
the instances with the group service. If, for example, we have NAS
storage through NFS or SMB, then the agent can run exclusively on the
storage providing system, and need not be on the client nodes at all.

thanks!

-dB
A possible approach for i/o fencing [ In reply to ]
Hi,

On Sun, 05 Mar 2000 07:13:00 -0800, David Brower <dbrower@us.oracle.com>
said:

> "Stephen C. Tweedie" wrote:
>> On Wed, 01 Mar 2000 18:07:12 -0800, David Brower <dbrower@us.oracle.com>
>> said:
>>
>> By the way, you really want a quorum generation number, not just a
>> membership generation, to deal with cluster partitioning (I'll mention
>> more below).

> I suppose I run the two together -- a generation w/o quorum "doesn't
> exist", which is to say, it ought not be seen by the fenceable
> resources. Members in that generation know they don't have quorum,
> and ought not be sending commands.

First, if we could rely 100% on nodes outside the quorate cluster to
enforce this, there wouldn't be any need for fencing.

Secondly, the cluster model I outlined at the January cluster meeting
explicitly separates local membership from quorum: quorum is just
another resource that the cluster manages. This is really important,
for a number of reasons, the most important of which is that it
abstracts out an important but quite specific part of the cluster
management. Modularity is a Good Thing. Generally speaking, you don't
necessarily want all activity on a node to die just because it loses
quorum (think about a cluster including individuals' workstations: the
VMS-style behaviour of freezing a workstation which loses its connection
to the network isn't necessarily what we want in all cases!)

>> Assuming that these commands only ever get sent during cluster recovery,
>> it should be safe enough to simply adopt the first generation seen as
>> the base.

> I *think* this is the case, but I'm not yet convinced I can proove
> it is totally correct. We come around this again later.

Cluster recovery is one part of the system which we have to assume gets
quorum right, or all bets are off anyway!

>> > Resource to GroupService
>> > SetMeSeymour( in resourceName, out resourceSettings );
>>> What is this for?

> I'm not assuming that the resource provider is part of the membership
> group itself. (Hard for a drive or switch to run the s/w). So, the
> intent is to allow them to sync to what ought to be their current
> state when they boot or failover without correct state. To do so,
> they need to be able to query the group for the current state.

OK, got you.

>> No, but it _can_ wait for acknowledgement for all previously submitted
>> IOs. Guaranteeing quiescence before returning the ack should definitely
>> be assumed.

> I am absolutely certain that this will be an area of buggy
> implementation!

Perhaps, but there is a whole slew of software which already assumes
that this works correctly. In particular, if write acks aren't
synchronised correctly with oxide, then all forms of journaling become
unreliable. It feels like a reasonable assumption to make that we know
which IOs have definitely completed at any point in time.

>> Remember, not all clusters will suddenly kill themselves completely on
>> loss of quorum. If you lose quorum, you will still get a new membership
>> generation, but that shouldn't allow you to fence off other nodes and
>> give yourself access! It should let you fence yourself off, however.
>>
>> Proposal: Allow nodes to deny access to themselves or to other nodes
>> in their membership group, even without the current generation
>> number.

> My initial reaction is that if you know enough to die, it seems like
> you know enough to stop your own i/o, and are functioning well enough
> to do that. But it certainly seems reasonable to allow the
> self-fence.

Point taken, but I guess that a working self-fence could be used as an
effective way of aborting outstanding IOs as a natural part of the
cluster recovery if quorum is lost.

>> But in the case of SCSI, we need to fence locally. How can we do this
>> and still honour generation numbers? I'd guess we would need a
>> reservation superblock on disk to achieve that, otherwise there is no
>> way of encoding the "current" generation into the protocol.

> This goes back to my wondering about how you detect the current
> generation at resource boot. If you thought you could accept the first
> generation that commanded you, you would not need the persistent
> store. If you think you cannot, then you need to provide reliable
> persistent store for each resource.

Agreed, but that raises a different question: do you need generation
numbers at all, then? If cluster software (a) only ever performs
fencing during cluster recovery, and then (b) only if it holds quorum,
then the only risk (other than of buggy cluster software) is if cluster
recovery takes so long on one faulty node that its own fencing request
gets overtaken by another cluster transition elsewhere, and it gets
evicted before it notices.

Nasty. If this happens, it's not clear that you _can_ do the right
thing, unless you can rely on persistent generation numbers in the
fenced resource. If the resource is (say) a network switch with no
persistent state, the only alternative is for it to broadcast for
generation numbers on startup and take the highest one offered, before
accepting any fencing instructions.

> In the case of shared scsi, you need to (try to) talk to the hosts at
> both ends of the cable, and what you tell them will determine what
> scsi reservations and resets they assert, and what errors you will
> tolerate before returning error.

Actually, shared scsi has the rather nice property that you can use a
combination of reservation and on-disk storage to do a lot of this
_without_ direct negotiation between hosts. If you can both write your
ownership of the disk and your new generation count while holding
reservation, then at least the drivers on each end of the wire can do
some form of cooperative fencing without having to talk to each other
directly.

> I think that for writing interoperable apps and resources, the protocol
> is the point of truth, actually, and can't be avoided for too long.
> While it probably ought to be designed protocol-independently (and I'm
> trying to structure it that way), at some point there will be some small
> number of implementations of a small number of protocols. I am eagerly
> waiting for suggestions what the first few should be. I think ONC RPC
> is one, only because I'm very interested in fencing NFS services. I can
> probably be reasoned out of that view, but I don't know what the best
> alternate would be. CORBA? HTTP(S)? DCOM?

RPC has the advantage that it doesn't try to do any naming itself: it
leaves it up to the carrying protocol to do that, so you can do rpc over
SCSI as easily as over IP. I don't want to have to have a corba name
service running to make this stuff work. :)

--Stephen
A possible approach for i/o fencing [ In reply to ]
"Stephen C. Tweedie" wrote:
> >> By the way, you really want a quorum generation number, not just a
> >> membership generation, to deal with cluster partitioning (I'll mention
> >> more below).
>
> > I suppose I run the two together -- a generation w/o quorum "doesn't
> > exist", which is to say, it ought not be seen by the fenceable
> > resources. Members in that generation know they don't have quorum,
> > and ought not be sending commands.
>
> First, if we could rely 100% on nodes outside the quorate cluster to
> enforce this, there wouldn't be any need for fencing.

I'm not sure I go along with this. The only parties that can be issuing
fence commands are those that are already in cluster transition modes
of operation. They are already admitting that something is going on,
and fencing may need to be done, and that someone is going to issue
fences. What we are trying to protect ourselves from is the insane
nodes that have not entered a transition mode, and may be doing
writes, or sleeping with writes queueing. They will not ever be
issuing fencing commands, because they aren't in the transition code.
Thus, I don't believe we need to particularly protect ourselves from
grossly incorrect fencing by non quorum members, if we keep the
quorum generations straight (this remains fuzzy itself though).

> Secondly, the cluster model I outlined at the January cluster meeting
> explicitly separates local membership from quorum: quorum is just
> another resource that the cluster manages. This is really important,
> for a number of reasons, the most important of which is that it
> abstracts out an important but quite specific part of the cluster
> management. Modularity is a Good Thing. Generally speaking, you don't
> necessarily want all activity on a node to die just because it loses
> quorum (think about a cluster including individuals' workstations: the
> VMS-style behaviour of freezing a workstation which loses its connection
> to the network isn't necessarily what we want in all cases!)

I think I've tried to support that model, but not called it out quite so
clearly. This is why GRITS is agnostic about the number of groups, and
the resources controlled by the group. There is the hidden assumption
(that ought to be explicit) that the group and gritty resources are
always a quorum group and the instantiation of the access policy for
members and non-members to those resources. You are correct that there
are groups that are non-quorum groups accessing non-controlled resources;
these are certainly the groups that need to form to determine quorum in
the first place.

> >> Assuming that these commands only ever get sent during cluster recovery,
> >> it should be safe enough to simply adopt the first generation seen as
> >> the base.
>
> > I *think* this is the case, but I'm not yet convinced I can proove
> > it is totally correct. We come around this again later.
>
> Cluster recovery is one part of the system which we have to assume gets
> quorum right, or all bets are off anyway!

Yup.

> >> No, but it _can_ wait for acknowledgement for all previously submitted
> >> IOs. Guaranteeing quiescence before returning the ack should definitely
> >> be assumed.
>
> > I am absolutely certain that this will be an area of buggy
> > implementation!
>
> Perhaps, but there is a whole slew of software which already assumes
> that this works correctly. In particular, if write acks aren't
> synchronised correctly with oxide, then all forms of journaling become
> unreliable. It feels like a reasonable assumption to make that we know
> which IOs have definitely completed at any point in time.

OK, we will need to work the language to clearly define the semantics.

> >> But in the case of SCSI, we need to fence locally. How can we do this
> >> and still honour generation numbers? I'd guess we would need a
> >> reservation superblock on disk to achieve that, otherwise there is no
> >> way of encoding the "current" generation into the protocol.
>
> > This goes back to my wondering about how you detect the current
> > generation at resource boot. If you thought you could accept the first
> > generation that commanded you, you would not need the persistent
> > store. If you think you cannot, then you need to provide reliable
> > persistent store for each resource.
>
> Agreed, but that raises a different question: do you need generation
> numbers at all, then? If cluster software (a) only ever performs
> fencing during cluster recovery, and then (b) only if it holds quorum,
> then the only risk (other than of buggy cluster software) is if cluster
> recovery takes so long on one faulty node that its own fencing request
> gets overtaken by another cluster transition elsewhere, and it gets
> evicted before it notices.
>
> Nasty. If this happens, it's not clear that you _can_ do the right
> thing, unless you can rely on persistent generation numbers in the
> fenced resource. If the resource is (say) a network switch with no
> persistent state, the only alternative is for it to broadcast for
> generation numbers on startup and take the highest one offered, before
> accepting any fencing instructions.

Yes; this is where I am unconvinced that non-persistent store is
adequate. We need to keep thinking about this.

> > In the case of shared scsi, you need to (try to) talk to the hosts at
> > both ends of the cable, and what you tell them will determine what
> > scsi reservations and resets they assert, and what errors you will
> > tolerate before returning error.
>
> Actually, shared scsi has the rather nice property that you can use a
> combination of reservation and on-disk storage to do a lot of this
> _without_ direct negotiation between hosts. If you can both write your
> ownership of the disk and your new generation count while holding
> reservation, then at least the drivers on each end of the wire can do
> some form of cooperative fencing without having to talk to each other
> directly.

Yes this is true. I have been hoping, however, to keep dancing until
we decide that the persistent storage is an essential requirement. That
is, I'd like not to entertain use of the disk even in the shared scsi
environment until we are certain that we need to do it for all resources.
Even if we avoid it as a requirement, it might still be a convenient
way for some shared scsi agent to implement the fence semantics, but
that is not the same as saying it -must- do it that way.

> > I think that for writing interoperable apps and resources, the protocol
> > is the point of truth, actually, and can't be avoided for too long.
> > While it probably ought to be designed protocol-independently (and I'm
> > trying to structure it that way), at some point there will be some small
> > number of implementations of a small number of protocols. I am eagerly
> > waiting for suggestions what the first few should be. I think ONC RPC
> > is one, only because I'm very interested in fencing NFS services. I can
> > probably be reasoned out of that view, but I don't know what the best
> > alternate would be. CORBA? HTTP(S)? DCOM?
>
> RPC has the advantage that it doesn't try to do any naming itself: it
> leaves it up to the carrying protocol to do that, so you can do rpc over
> SCSI as easily as over IP. I don't want to have to have a corba name
> service running to make this stuff work. :)

I'm cool with that, mostly. It is certainly possible to do CORBA w/o the
name service. A CORBA resource object reference in the config could be bound
to a real or virtual host id, either by name or numeric ip address, at a
fixed port. Almost any mechanism may prefer to have a functional DNS, and
ONC/RPC may require a working portmapper. At some level, they all seem about
the same to me. I suppose it depends mostly on what someone is comfortable
working with. I'm personally more familiar with corba than onc/rpc, but I'm
not going to push it.

-dB
A possible approach for i/o fencing [ In reply to ]
Hi,

On Mon, 06 Mar 2000 11:43:48 -0800, David Brower <dbrower@us.oracle.com>
said:

>> First, if we could rely 100% on nodes outside the quorate cluster to
>> enforce this, there wouldn't be any need for fencing.

> I'm not sure I go along with this. The only parties that can be issuing
> fence commands are those that are already in cluster transition modes
> of operation.

Yes, but that doesn't mean they are all running in the _same_
transition.

> They are already admitting that something is going on, and fencing may
> need to be done, and that someone is going to issue fences. What we
> are trying to protect ourselves from is the insane nodes that have not
> entered a transition mode, and may be doing writes, or sleeping with
> writes queueing.

Yes, the example usually given being a node which goes to sleep for a
while for whatever reason, then comes back suddenly, submitting whatever
it had in its write queue to the disk. And...

> They will not ever be issuing fencing commands, because they aren't in
> the transition code.

...there is no reason why such a node sleep cannot occur during
transition processing. After all, problems tend to come in bunches.
What happens if a node is doing cluster recovery, it dies, the rest of
the cluster recovers in turn from that death (bumping the cluster
incarnation of course), and then the dead node suddenly recovers,
sending forth bogus fence commands?

> Thus, I don't believe we need to particularly protect ourselves from
> grossly incorrect fencing by non quorum members,

I think we do, since we simply cannot make guarantees about faulty
nodes.

>> Secondly, the cluster model I outlined at the January cluster meeting
>> explicitly separates local membership from quorum: quorum is just
>> another resource that the cluster manages.
...

> I think I've tried to support that model, but not called it out quite
> so clearly. This is why GRITS is agnostic about the number of groups,
> and the resources controlled by the group. There is the hidden
> assumption (that ought to be explicit) that the group and gritty
> resources are always a quorum group and the instantiation of the
> access policy for members and non-members to those resources.

I'm more concerned about the implicit assumption that quorum group ==
membership group, and quorum incarnation == membership incarnation. We
can make a guarantee that quorum incarnations increase monotonically,
but group membership incarnation identifiers have to behave rather
differently if you allow non-quorate partitions to have incarnation
numbers.

>> Agreed, but that raises a different question: do you need generation
>> numbers at all, then? If cluster software (a) only ever performs
>> fencing during cluster recovery, and then (b) only if it holds quorum,
>> then the only risk (other than of buggy cluster software) is if cluster
>> recovery takes so long on one faulty node that its own fencing request
>> gets overtaken by another cluster transition elsewhere, and it gets
>> evicted before it notices.

See my argument above: we have to be prepared for fence commands coming
from non-quorate nodes in certain failure modes. :(

>> Nasty. If this happens, it's not clear that you _can_ do the right
>> thing, unless you can rely on persistent generation numbers in the
>> fenced resource. If the resource is (say) a network switch with no
>> persistent state, the only alternative is for it to broadcast for
>> generation numbers on startup and take the highest one offered, before
>> accepting any fencing instructions.

> Yes; this is where I am unconvinced that non-persistent store is
> adequate. We need to keep thinking about this.

Indeed. However, if our switch does the broadcast for the active
quorate generation number and gets a number of replies representing
enough votes for quorum, then it has a pretty good idea that things are
OK! That's not hard to do: the "what's my generation number?" message
just has to have a reply like "it's xyzzy, and I have N votes, and
quorum is M." That's enough to bootstrap the generation numbers
reliably.
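
That reply might be no more than the following record (field names
invented for illustration):

/* Reply to a "what's my generation number?" broadcast.  A booting
 * resource sums the votes of distinct responders that agree on a
 * generation, and adopts that generation once the sum reaches
 * quorum. */
struct generation_reply {
    unsigned generation;   /* responder's current quorum generation */
    unsigned votes;        /* votes this responder holds            */
    unsigned quorum;       /* votes needed for quorum               */
};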

Of course, we'd have to have a similar kind of behaviour even when
setting fencing during a cluster transition, since the first quorate
transition after a cluster power cycle may find the grits resource
without a prior generation number, and so it will have to "authenticate"
the new generation number with the same sort of quorum evidence. Once
the resource has a generation number, the monotonic advance assumption
is enough to authenticate future quorate transitions.

>> Actually, shared scsi has the rather nice property that you can use a
>> combination of reservation and on-disk storage to do a lot of this
>> _without_ direct negotiation between hosts. ...

> Yes this is true. I have been hoping, however, to keep dancing until
> we decide that the persistent storage is an essential requirement.

Either that or presenting evidence of quorum as above, I guess. It
starts to feel unnecessarily complex, but I can't see a simpler way of
making the guarantees solid.

--Stephen
A possible approach for i/o fencing [ In reply to ]
By the way, is anybody on these lists getting anything
out of these fencing discussions, or are Stephen and I
talking to ourselves?

thanks,
-dB
A possible approach for i/o fencing [ In reply to ]
Here is an updated proposal, fixing wording that was confusing, and
clearly identifying (I hope) open issues to discuss. I've gotten
some feedback saying to keep the discussion on the list rather than
take it offline. All you lurkers should chip in your comments too!

thanks,
-dB


I/O Fencing for Clusters

Generic Resource Intervention Tool Service (GRITS)
and
NFS Admin Tool And Location Intervention Extension (NATALIE)

----------------

David Brower (mailto:dbrower@us.oracle.com)
John Leys (mailto:jleys@us.oracle.com)
Gary Young (mailto:gdyoung@us.oracle.com)

History

Public version 0.2 8-Mar-00

Abstract

Cluster systems with shared resources, such as disk, need
"fencing" of those resources during and after membership
reconfigurations. There are no general solutions to providing a
mechanism for fencing in the Open Standards world, with existing
solutions tied tightly to particular membership services and i/o
systems. This note outlines the architecture of a generic service
(GRITS) for organizing fencing interactions between membership
services and resources. It also describes a mechanism (NATALIE)
by which NFS services may be extended to become a safely fenceable
resource under GRITS. Other resources, such as shared-scsi disk
drivers, SAN switches and the like, should be equally capable of
becoming GRITS-able partners. Because the solution is openly
released, it is hoped that system providers, SAN vendors, and the
purveyors of storage systems will incorporate appropriate agents,
allowing for reliable clusters with shared, fence-able resources.

GRITS Architecture

A GRITS cluster consists of:

Some number of nodes;

Some number of membership quorum group services.
Groups have quorum generation numbers of
at least 16 bits; wraps are allowed, and handled.

Some number of resources used by quorum groups.

Nodes are identified by either IP address or resolvable name.
Resources are identified the same way, and are assumed to have at
least an IP capable proxy -- something that will respond to IP,
even if it needs to take some other path to an actual resource.

Each GRITS membership quorum group has a configuration identifying
the resources that may be used by the group, including the
destination for GRITS control messages to the resource. The
quorum group service also provides multiple access points for
resources to query the group when that is necessary. Each GRITS
group issuing commands and responding to queries is required to
have established quorum. Each has a generation number, which is
only seen outside the membership service once quorum has been
established.

Each GRITS resource has a configured list of quorum groups and
hosts that may possibly access it. The configuration identifies
the destinations for queries by the resource of the group. The
resource itself has at least one identified access point for
control messages to it from the group services.

(The configurations of groups and resources are expected to be
slowly changing, and their control is not defined by GRITS.)

GRITS controls access to resources by nodes depending on the
quorum group membership. Access is either permitted or denied to
whole nodes, with no finer granularity.

[Finer granularity is very desirable, but very hard to achieve.
It would seem to be necessary to associate groups with processes,
and make the groups, or a group cookie or key get carried along
with requests on behalf of processes. For instance, the key
associated with a fibre-channel persistent reservation might be an
excellent way to allow/disallow members. It may be very
difficult to arrange for the key sent by the driver for an i/o on
behalf of one process to be different than the key used for i/o by
another process.]

Resources that must remain writable to all during cluster
transition, perhaps because they are used as part of the
membership quorum resolution, should not be under GRITS control.

--> It has been suggested that fencing can be used as part of quorum
resolution. If so, then it should be done for a separate "quorum"
group, arbitrating access to the quorum resource. This hasn't
been worked out.

At resource boot time, the resource examines the configuration,
and adopts an access posture towards the potential members of all
the groups. First, it consults the configured boot policy associated
with each group member. Then it may also use GRITS defined
messaging to communicate with the configured membership groups to
set the correct current access rights. At a true cold boot, there
may be no groups to respond, so the configured boot posture
remains in effect until a quorum group is formed and issues
commands to the resources. The plausible initial policies are
"read only" and "no access"; some resources may only be able to
enforce "no access". A "writable" boot policy would be defeat
the purpose of the fence.
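
A resource agent's boot sequence might then reduce to the following
shape. This is a sketch; configured_boot_policy(), query_group() and
apply_policy() stand in for the real configuration and SetMeSeymour
machinery:

enum posture { NO_ACCESS, READ_ONLY, READ_WRITE };

/* Stand-ins for the configuration and protocol machinery. */
extern enum posture configured_boot_policy(const char *member);
extern int  query_group(const char *member, enum posture *current);
extern void apply_policy(const char *member, enum posture p);

static void resource_boot(const char **members, int n)
{
    int i;

    /* First, enforce the conservative configured boot posture. */
    for (i = 0; i < n; i++)
        apply_policy(members[i], configured_boot_policy(members[i]));

    /* Then try to learn the true current state from any quorate
     * group; until one answers, the boot posture stays in force. */
    for (i = 0; i < n; i++) {
        enum posture cur;
        if (query_group(members[i], &cur) == 0)
            apply_policy(members[i], cur);
    }
}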

Once an initial posture is established by a resource, membership
change events in the quorum group drive GRITS control messages to
all the resources configured for the group. These will deny
access to departing members and allow access to continuing or
joining members. The quorum group cannot proceed out of its
reconfiguration stage until the correct fencing of all resources
has been accomplished.

It is intended that "gritty" agents can be written and put in
place for:

- directly attached disks on shared SCSI. The agent would
communicate with some kernel-level code to manipulate the
SCSI reset and SCSI reserve to arbitrate access to the resource;
GRITS would talk to both sides to force things to a known state.

- SAN attached storage, where the agent could program tokens or
domains of the fabric to control access;

- NFS attached storage, where an agent could use NATALIE
capabilities to narrow access below that of the basic exports;

- SMB attached storage, where an agent could communicate to the
software doing the "sharing" to control access.

- General network attached storage, where control may be achieved
by filtering in a router or proxy between the nodes and the
resources.

- Worst Case group members, who will have a third party wired
to their reset buttons to force them to be "fenced." This
is an always correct final solution. The "X10" system can
be used to turn off the power to particularly non-cooperative
entities.

Mixtures of these agencies may be needed, depending on the needs
and topology of the cluster in question. The resource providers
may or may not be on hosts that are part of any group in question.

OPEN: The current proposal does not address layers of fencing or
escalation and isolation. It might be useful to identify levels
at which fencing may be stopped without doing higher levels. For
instance, if all disk i/o may be stopped by frobbing the
fibrechannel switch, then turning off the power may not be
necessary.

Protocols

At the architectural level, GRITS is agnostic about the control
protocols. The service could be provided using a variety of
communication mechanisms. The messages are defined in terms of
verbs that may be bound to different techniques. In practice,
there will need to be some commonality of protocol. It will not
do to have a resource attempt to query a group using a protocol
the group does not support, nor can a group meaningfully send
a membership change to a resource without common ground.

Potential protocols include:

ONC RPC
CORBA
HTTP
HTTPS
COM/DCOM
SMB extensions

Exact bindings and support are an area for open discussion.

Security

Only "authorized" parties may be allowed to invoke the verbs.
This is handled, pitifully, by a "cookie", a shared secret between
resources and group services. A secure protocol would protect the
contents of the cookie, but is not an essential part of the
architecture. As is traditional in cluster discussions, we may
presume for the moment that traffic between nodes and resources is
on a secure network.

Only current quorum holding membership services may be invoking
commands, except that a member may always fence itself from
resources. (It may not unfence without obtaining quorum.)

To enforce this, GRITS has some knowledge about quorum
generations. A quorum generation is an ever increasing number
bumped at the time the membership is changed and confirmed. This
is distinct from a raw cluster generation, which may exist without the
presence of quorum. For purposes of GRITS, only quorum
generations exist, and cluster generations are never seen. For
example, a cluster with a quorum generation of 10 experiences a
partition, which drives reconfiguration. Several partitions may
each decide to have generation 11 as they seek quorum. All but
one of these will lose the quorum determination, and their
existence at generation 11 will never be seen by GRITS. Only the
surviving quorum holder at generation 11 may issue GRITS commands.
Therefore, GRITS communication need only consider commands from
the latest cluster generation as valid, and must discard or return
error to late-arriving commands from earlier generations.

To establish ordering of quorum generations, GRITS must consider
the possibility of wraparound. It is suggested that something like

static inline int x_before_y (unsigned x, unsigned y)
{
        return ((signed) (x - y)) < 0;
}

will suffice.
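
For concreteness, a standalone check of the wrap behaviour (the cast
assumes the usual two's-complement representation):

#include <assert.h>
#include <limits.h>

static inline int x_before_y(unsigned x, unsigned y)
{
        return ((signed) (x - y)) < 0;
}

int main(void)
{
        assert(x_before_y(5, 9));             /* ordinary ordering    */
        assert(x_before_y(UINT_MAX - 1, 2));  /* ordering across wrap */
        assert(!x_before_y(2, UINT_MAX - 1));
        return 0;
}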

FIXME - For this to work, we need to fix the width of the
generation, to 16, 32, or 64 bits. My inclination is to make it
64 bits.

FIXME - There are also issues regarding the need for stable
storage for the epoch in resource agents. What epoch should they
obey at resource boot?

Resource Settings

A resourceSetting is a combination of

{ resource, node, allow|deny }

ResourceSettings are a list or array of resourceSettings to
cover a set of resource/node bindings.

Verbs

Resource to GroupService

SetMeSeymour( in resourceName, out resourceSettings );

GroupService to Resource

Set( in cookie, in groupGeneration, in resourceSettings );

Get( in resourceName, out resourceSettings );

[OPEN: the previous proposal did not use resourceSettings for Set,
which would be a performance/scalability problem if Set is
lengthy. It might be necessary to do Sets in parallel for
multiple resource providers. Should the interface be async, or
should we say that the agent will be multi-threaded to do
parallel operations?]
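
One answer to the OPEN question above is to keep the verb synchronous
on the wire but have the agent fan the Sets out on threads and join
before the reconfiguration proceeds. A pthreads sketch, with
send_set() standing in for the actual protocol call:

#include <pthread.h>

#define MAX_JOBS 32

struct set_job {
    const char *resource;   /* destination resource provider */
    int         status;     /* result of the Set             */
};

/* Stand-in for the real protocol binding; 0 means acknowledged. */
extern int send_set(const char *resource);

static void *set_worker(void *arg)
{
    struct set_job *job = arg;
    job->status = send_set(job->resource);
    return NULL;
}

/* Issue Sets to up to MAX_JOBS resources in parallel; returns 0
 * only if every resource acknowledged the fence. */
static int set_all(struct set_job *jobs, int n)
{
    pthread_t tid[MAX_JOBS];
    int i, rc = 0;

    for (i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, set_worker, &jobs[i]);
    for (i = 0; i < n; i++) {
        pthread_join(tid[i], NULL);
        if (jobs[i].status != 0)
            rc = -1;
    }
    return rc;
}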

The GRITS agent must remember the highest generation used, and
refuse operations from older generations. If the cookie provided
does not match the configured cookie, the set request will be
rejected. FIXME -- deal with generation wrap.

When a Set operation completes, denying i/o from a host, it must
be guaranteed that all i/o from that host is over, and absolutely
no more will be done. This may mean waiting for the completion of
all outstanding i/o requests if true cancellation is not possible.
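
Waiting out the in-flight requests can be as simple as a counter
drained under a condition variable. A sketch of the quiescence step;
the agent's completion path must decrement reliably for this to
amount to the guarantee demanded here:

#include <pthread.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  io_idle = PTHREAD_COND_INITIALIZER;
static int io_inflight;              /* i/os submitted but not acked */

/* Called from the i/o completion path. */
void io_done(void)
{
    pthread_mutex_lock(&io_lock);
    if (--io_inflight == 0)
        pthread_cond_broadcast(&io_idle);
    pthread_mutex_unlock(&io_lock);
}

/* Called by the Set handler after new i/o has been blocked;
 * returns only when every outstanding i/o has completed. */
void quiesce(void)
{
    pthread_mutex_lock(&io_lock);
    while (io_inflight > 0)
        pthread_cond_wait(&io_idle, &io_lock);
    pthread_mutex_unlock(&io_lock);
}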

OPEN: the current proposal does not address errors or timeouts
that could be returned from Set operations.

Expected use

When the quorum group detects the death of a member, it uses GRITS
to fence it off, by calling Set for all resources in use by the
group, denying access by the deceased member.

When the member comes back to life, and is granted access back
into the group, the group uses another GRITS Set to re-enable
access to the resources.

It is up to the agent associated with the group to determine the
exact access needed to particular resources. It may be necessary
to leave write access available to some resource that is used as
part of group membership establishment, and/or quorum
determination.

NATALIE

The NATALIE extensions to NFS are additional RPC interfaces to
perform manipulation of the live state of the NFS servers. In
particular, NATALIE supports forcible eviction of mounting
clients. This is principally useful for cluster "fence off", but
is administratively useful on its own merits in non-cluster
environments.

The main verbs are almost those used with the GRITS GroupService
to Resource, with the following exceptions: (1) the generation is
not needed, as NATALIE is not specific to group membership, and
(2) the mode is not allow or deny, but an NFS export access right,
such as "rw". The GRITS agent translating to NATALIE must do the
appropriate mapping.

Set( in cookie, in resourceName, in nodeName, in mode );

SetAll( in cookie, in resourceName, in mode );

GetSettings( in resourceName, out resourceSettings );

Only nodes covered by entries in the NFS exports will be allowed
to mount at all. It is not specified whether NATALIE can grant
more permission than is present in the exports configuration,
as this would constrain the NATALIE implementation.

When there are multiple and dynamic NFS servers on a machine, the
NATALIE settings need to be coherently replicated across them all.
That is, when a Set operation completes disallowing i/o from a
host, it must be guaranteed that all i/o from that host is over,
and no more will be done.

Three plausible implementations of NATALIE follow:

1. At the time of a Set that changes current values, all nfsds
and mountds are killed. The exports file is updated to
reflect the permissions being set, and the daemons are
restarted.

The error propagation of this is questionable; survivors
may receive errors that they ought not see.

2. Add control hooks into the nfsds to reject operations
from fenced hosts. This requires memory of the nodes
being fenced. Hooks into mountd may also be wanted.

3. Create an NFS proxy, which implements the NATALIE filtering
but otherwise forwards the request on to an unmodified
NFS server. This is inefficient, but totally generic.
The NATALIE service need not even reside on the same
node as the NFS service. It should not, however,
reside on any of the nodes being fenced!

If a NATALIE service of type 2 or 3 comes up as configured by
exports, and does not make the GRITS query, then a retried write
from an evicted/frozen node might be allowed. This would be bad. One
solution is to put the NATALIE state in persistent storage on the
server. Another is to have NATALIE started by the GRITS
agent after it queries the group about the boot posture.

Examples

These examples use ONC RPC as the communication mechanism for
illustration; others may do as well.

NFS/NATALIE Cluster
-------------------

A three node cluster A, B, C uses shared storage provided by
service N, mounted from export point /path. Together, the quorum
of A, B, C form a virtual IP host V.

Whenever there is a membership change in V, an agent receives
the change and issues GRITS Set commands to all the resources.


On each of the nodes A, B or C, there is a configuration of GRITS
resources used, perhaps in the following form:

% cat /etc/grits.groups

# GRITS resources used by group V
V N onc://N/grits/resource/path

This tells the group-side code that to issue commands to the
resource for N, one uses onc to that location. It also
says that at boot, only read access is required.

On the resource providing N, the GRITS agent is configured:

% cat /etc/grits.resources

# GRITS groups using resource N

N /path onc://V/grits/group/V(none) \
onc://V/grits/group/A(r) \
onc://V/grits/group/B(r) \
onc://V/grits/group/C(r)

This tells the resource side of GRITS that on booting, it may
contact any of V, A, B, or C for current state, and that until it
makes successful contact, it should allow only r access to the
actual nodes. There should never be requests from the virtual
node V, so it is given no access.

Shared SCSI Cluster
-------------------

Two nodes A and B have a shared SCSI bus arbitrating disk D,
forming group G. Each node runs both the membership and the SCSI
GRITS agent; there is no shared IP.

% cat /etc/grits.groups

# GRITS resources used by group G
G D onc://A/grits/resource/path
G D onc://B/grits/resource/path

% cat /etc/grits.resources

# GRITS groups using resources

D /path onc://A/grits/group/A(r) \
onc://B/grits/group/B(r)


Summary of open areas and problems
----------------------------------

1. Can we use fencing to resolve quorum? How would that
actually work?

2. Do we need persistence in the resource agent to determine
the correct cluster generation to listen to? This gets
particularly complicated during nested reconfigs, with
some delayed member believing it has quorum, when the
actual cluster has moved on beyond.

3. It may be desirable to support configuration of explicit
hierarchies of fencing points, stopping at the lowest one that
will work rather than going all the way up to "shoot the node"

4. Error reporting and propagation have not been addressed. If
an attempt to fence fails, what do we do? This leads to the
hierarchies above.

5. Finer granularity than node may be extremely desirable.
Doing so is difficult, seemingly requiring kernel level
hooks to attach "group" attributes to processes to be
attached to requests for resources; this involves getting
into device drivers, and gets very messy.

6. Performance reasons forced a change to resourceSettings
in the Set command, so we can batch a bunch of requests
in one shot. But we may still need to do Sets in parallel
if there are a lot of them to talk to, and we don't want
to serialize on their potentially lengthy responses.

Acknowledgements
----------------

We'd like to thank the readers of the gfs-devel@borg.umn.edu and
linux-ha-dev@lists.tummy.com lists for their indulgence
in hosting discussions of this proposal. We'd also like to thank
the following individuals who have provided significant feedback:

Stephen C. Tweedie (sct@redhat.com)
A possible approach for i/o fencing [ In reply to ]
On Mon, 6 Mar 2000, David Brower wrote:

> By the way, is anybody on these lists getting anything
> out of these fencing discussions, or are Stephen and I
> talking to ourselves?
>
David, if you don't mind that I (or we?) are more in "learning" than in
"expert" mode, then please continue. I follow it very closely, and if I
feel like being able to contribute, I will not hesitate. I think this is
a valuable discussion, and even if I do not understand it fully right now,
I will certainly be happy if I can later look it up in the archive.

> thanks,
> -dB
>
Thanks
Volker

--
Volker Wiegand Phone: +49 (0) 6196 / 50951-24
SuSE Linux AG Fax: +49 (0) 6196 / 40 96 07
Mergenthalerallee 45-47 Mobile: +49 (0) 179 / 292 66 76
D-65760 Eschborn E-Mail: Volker.Wiegand@suse.de
++ Only users lose drugs. Or was it the other way round? ++