Mailing List Archive

Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Could you please give a hint: how to use fencing in case the nodes are all
in different geo-distributed datacenters? How do people do that? Because
there could be a network disconnection between datacenters, and we have no
chance to send a stonith signal anywhere.

On Wednesday, February 4, 2015, Andrea <a.bacchi@codices.com> wrote:

> Digimer <lists@...> writes:
>
> >
> > That the fence failed until the network came back makes your fence method
> > less than ideal. Will it eventually fence with the network still failed?
> >
> > Most importantly though: were cluster resources blocked while the fence was
> > pending? If so, then your cluster is safe, and that is the most
> > important part.
> >
> Hi Digimer
>
> For fencing I'm using a remote NAS, attached as an iSCSI target.
> During a network failure, for example on node2, each node tries to fence the
> other node.
> The fencing action on node1 succeeds, but on node2 it fails, because it can't
> see the iSCSI target (the network is down!).
> I think that's why node2 doesn't reboot now: it can't operate on the key
> reservation, so the watchdog can't check it.
> When the network comes back, the watchdog can check the key registration and
> reboot node2.
>
> For the clustered filesystem I planned to use a ping resource with a location
> constraint, as described here:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch09s03s03s02.html
> If the node can't see the iSCSI target, then stop AppServer, Filesystem, etc.
>
> But it doesn't work. On the node with the network failure I see in the log
> that pingd is set to 0, but the Filesystem resource doesn't stop.
>
> I will continue testing...
>
> Thanks
> Andrea
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
That is the problem that makes geo-clustering very hard to nearly
impossible. You can look at the Booth option for pacemaker, but that
requires two (or more) full clusters, plus an arbitrator at a 3rd location.
Outside of this, though, there really is no way to have geo/stretch
clustering with automatic failover.

digimer

On 05/02/15 03:38 AM, Dmitry Koterov wrote:
> Could you please give a hint: how to use fencing in case the nodes are
> all in different geo-distributed datacenters? How do people do that?
> Because there could be a network disconnection between datacenters, and
> we have no chance to send a stonith signal anywhere.
>
> On Wednesday, February 4, 2015, Andrea <a.bacchi@codices.com
> <mailto:a.bacchi@codices.com>> wrote:
>
> Digimer <lists@...> writes:
>
> >
> > That the fence failed until the network came back makes your fence
> > method less than ideal. Will it eventually fence with the network
> > still failed?
> >
> > Most importantly though: were cluster resources blocked while the
> > fence was pending? If so, then your cluster is safe, and that is
> > the most important part.
> >
> Hi Digimer
>
> For fencing I'm using a remote NAS, attached as an iSCSI target.
> During a network failure, for example on node2, each node tries to fence
> the other node.
> The fencing action on node1 succeeds, but on node2 it fails, because it
> can't see the iSCSI target (the network is down!).
> I think that's why node2 doesn't reboot now: it can't operate on the key
> reservation, so the watchdog can't check it.
> When the network comes back, the watchdog can check the key registration
> and reboot node2.
>
> For the clustered filesystem I planned to use a ping resource with a
> location constraint, as described here:
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch09s03s03s02.html
> If the node can't see the iSCSI target, then stop AppServer, Filesystem, etc.
>
> But it doesn't work. On the node with the network failure I see in the
> log that pingd is set to 0, but the Filesystem resource doesn't stop.
>
> I will continue testing...
>
> Thanks
> Andrea


--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
> > On Wednesday, February 4, 2015, Andrea <a.bacchi@...
> > <mailto:a.bacchi@...>> wrote:
> >
> > Digimer <lists <at> ...> writes:
> >
> > >
> > > That the fence failed until the network came back makes your fence
> > > method less than ideal. Will it eventually fence with the network
> > > still failed?
> > >
> > > Most importantly though: were cluster resources blocked while the
> > > fence was pending? If so, then your cluster is safe, and that is
> > > the most important part.
> > >
> > Hi Digimer
> >
> > For fencing I'm using a remote NAS, attached as an iSCSI target.
> > During a network failure, for example on node2, each node tries to fence
> > the other node.
> > The fencing action on node1 succeeds, but on node2 it fails, because it
> > can't see the iSCSI target (the network is down!).
> > I think that's why node2 doesn't reboot now: it can't operate on the key
> > reservation, so the watchdog can't check it.
> > When the network comes back, the watchdog can check the key registration
> > and reboot node2.
> >
> > For the clustered filesystem I planned to use a ping resource with a
> > location constraint, as described here:
> > http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch09s03s03s02.html
> > If the node can't see the iSCSI target, then stop AppServer, Filesystem, etc.
> >
> > But it doesn't work. On the node with the network failure I see in the
> > log that pingd is set to 0, but the Filesystem resource doesn't stop.
> >
> > I will continue testing...
> >
> > Thanks
> > Andrea
> >
> >
> >

Hi

I tested a location constraint with a ping resource to stop resources on the
disconnected node, but with stonith active it doesn't work.

I used a dummy resource for the test:
[ONE] pcs resource create mydummy ocf:pacemaker:Dummy op monitor
interval=120s --clone

Ping resource:
[ONE] pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000
host_list=pingnode --clone

Location constraint:
[ONE] pcs constraint location mydummy rule score=-INFINITY pingd lt 1 or
not_defined pingd


If the pingnode becomes not visible on node2, I see the pingd attribute on
node2 set to 0 and the dummy resources stop on node2.
If I cut off the entire network on node2, I see the pingd attribute on node2
set to 0 but the dummy resource never stops.
During the network failure the stonith agent is active and tries to fence
node1, without success.
Why? Is it the failed fence action that blocks the location constraint?
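
(Side note: a quick way to confirm what value the cluster is actually holding
for pingd on each node is a one-shot monitor with node attributes shown, run
on any node; this is only an illustrative check, not part of the
configuration above:

crm_mon -1 -A

The "Node Attributes" section should show pingd per node, so you can verify
it really drops to 0 on the cut-off node.)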


Andrea





_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
On 2015-02-05 17:08, Andrea wrote:
> Hi
>
> I tested a location constraint with a ping resource to stop resources on
> the disconnected node, but with stonith active it doesn't work.
>
> I used a dummy resource for the test:
> [ONE] pcs resource create mydummy ocf:pacemaker:Dummy op monitor
> interval=120s --clone
>
> Ping resource:
> [ONE] pcs resource create ping ocf:pacemaker:ping dampen=5s
> multiplier=1000
> host_list=pingnode --clone
>
> Location constraint:
> [ONE] pcs constraint location mydummy rule score=-INFINITY pingd lt 1 or
> not_defined pingd
>
>
> If the pingnode becomes not visible on node2, I see the pingd attribute
> on node2 set to 0 and the dummy resources stop on node2.
> If I cut off the entire network on node2, I see the pingd attribute on
> node2 set to 0 but the dummy resource never stops.
> During the network failure the stonith agent is active and tries to fence
> node1, without success.
> Why? Is it the failed fence action that blocks the location constraint?
>
>
> Andrea

When you disable stonith, pacemaker just assumes that "no contact" ==
"peer dead", so recovery happens. This is a very false sense of security
though, because most people test by actually crashing a node, so there
is no risk of a split-brain. The problem is, in the real world, this
cannot be assured. A node can be running just fine, but the connection
fails. If you disable stonith, you get a split-brain.

So when you enable stonith, and you really must, then pacemaker will
never make an assumption about the state of the peer. So when the peer
stops responding, pacemaker blocks and calls a fence. It will then sit
there and wait for the fence to succeed. If the fence *doesn't* succeed,
it ends up staying blocked. This is the proper behaviour!

Now, if you enable stonith *and* it is configured properly, then you
will see that recovery proceeds as expected *after* the fence action
completes successfully. So, set up stonith! :)
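
(For illustration only, a minimal sketch of the kind of fence_scsi setup
discussed in this thread; the device path and node names are placeholders,
and "meta provides=unfencing" marks the device as requiring unfencing, so
nodes re-register their keys when they rejoin:

pcs stonith create scsi-fence fence_scsi \
    pcmk_host_list="node1 node2" \
    devices="/dev/disk/by-id/REPLACE-WITH-SHARED-LUN" \
    meta provides=unfencing
pcs property set stonith-enabled=true

Whether fence_scsi is the right agent still depends on both nodes being able
to reach the shared device, which is exactly the weak point discussed in the
rest of this thread.)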


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
<lists@...> writes:

> > If the pingnode becomes not visible on node2, I see the pingd attribute
> > on node2 set to 0 and the dummy resources stop on node2.
> > If I cut off the entire network on node2, I see the pingd attribute on
> > node2 set to 0 but the dummy resource never stops.
> > During the network failure the stonith agent is active and tries to fence
> > node1, without success.
> > Why? Is it the failed fence action that blocks the location constraint?
> >
> >
> > Andrea
>
> When you disable stonith, pacemaker just assumes that "no contact" ==
> "peer dead", so recovery happens. This is a very false sense of security
> though, because most people test by actually crashing a node, so there
> is no risk of a split-brain. The problem is, in the real world, this
> cannot be assured. A node can be running just fine, but the connection
> fails. If you disable stonith, you get a split-brain.
>
> So when you enable stonith, and you really must, then pacemaker will
> never make an assumption about the state of the peer. So when the peer
> stops responding, pacemaker blocks and calls a fence. It will then sit
> there and wait for the fence to succeed. If the fence *doesn't* succeed,
> it ends up staying blocked. This is the proper behaviour!
>
> Now, if you enable stonith *and* it is configured properly, then you
> will see that recovery proceeds as expected *after* the fence action
> completes successfully. So, set up stonith! :)
>

Hi

I don't want to disable stonith; I have stonith enabled, and I use it.
The problem is that during a network failure on node2, the fence action is
activated on this node, but it fails. It fails because it is the disconnected
node. And it doesn't reboot either, because the watchdog can't check the key
registration.
Is there a method to stop resources on this node?
Maybe fence_scsi on a remote iSCSI target isn't a good solution?
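
(For what it's worth, the self-reboot side of fence_scsi is normally handled
by running the watchdog daemon with the fence_scsi_check script, so a node
reboots itself once its reservation key has been revoked. A rough sketch,
assuming RHEL/CentOS 7 package paths:

yum install -y watchdog
cp /usr/share/cluster/fence_scsi_check /etc/watchdog.d/
systemctl enable watchdog
systemctl start watchdog

This still requires the node to be able to reach the iSCSI target to read its
keys, which is the limitation described above.)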

Andrea




_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Hi,

On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote:
> That is the problem that makes geo-clustering very hard to nearly
> impossible. You can look at the Booth option for pacemaker, but that
> requires two (or more) full clusters, plus an arbitrator at a 3rd

A full cluster can consist of one node only. Hence, it is
possible to have a kind of stretch two-node [multi-site] cluster
based on tickets and managed by booth.
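
(A rough illustration of what that looks like on the booth side, with made-up
addresses and ticket name: each one-node "cluster" is a site, and the third
location runs only the arbitrator.

# /etc/booth/booth.conf on both sites and on the arbitrator
transport = UDP
port = 9929
site = 192.0.2.10
site = 198.51.100.10
arbitrator = 203.0.113.10
ticket = "ticket-A"

Resources at each site are then tied to the ticket with a ticket constraint.)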

Thanks,

Dejan

> location. Outside of this though, there really is no way to have
> geo/stretch clustering with automatic failover.
>
> digimer
>
> On 05/02/15 03:38 AM, Dmitry Koterov wrote:
> >Could you please give a hint: how to use fencing in case the nodes are
> >all in different geo-distributed datacenters? How do people do that?
> >Because there could be a network disconnection between datacenters, and
> >we have no chance to send a stonith signal anywhere.
> >
> >On Wednesday, February 4, 2015, Andrea <a.bacchi@codices.com
> ><mailto:a.bacchi@codices.com>> wrote:
> >
> > Digimer <lists@...> writes:
> >
> > >
> > > That the fence failed until the network came back makes your fence
> > > method less than ideal. Will it eventually fence with the network
> > > still failed?
> > >
> > > Most importantly though: were cluster resources blocked while the
> > > fence was pending? If so, then your cluster is safe, and that is
> > > the most important part.
> > >
> > Hi Digimer
> >
> > For fencing I'm using a remote NAS, attached as an iSCSI target.
> > During a network failure, for example on node2, each node tries to fence
> > the other node.
> > The fencing action on node1 succeeds, but on node2 it fails, because it
> > can't see the iSCSI target (the network is down!).
> > I think that's why node2 doesn't reboot now: it can't operate on the key
> > reservation, so the watchdog can't check it.
> > When the network comes back, the watchdog can check the key registration
> > and reboot node2.
> >
> > For the clustered filesystem I planned to use a ping resource with a
> > location constraint, as described here:
> > http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch09s03s03s02.html
> > If the node can't see the iSCSI target, then stop AppServer, Filesystem, etc.
> >
> > But it doesn't work. On the node with the network failure I see in the
> > log that pingd is set to 0, but the Filesystem resource doesn't stop.
> >
> > I will continue testing...
> >
> > Thanks
> > Andrea
>
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person
> without access to education?
>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
On Fri, Feb 06, 2015 at 04:15:44PM +0100, Dejan Muhamedagic wrote:
> Hi,
>
> On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote:
> > That is the problem that makes geo-clustering very hard to nearly
> > impossible. You can look at the Booth option for pacemaker, but that
> > requires two (or more) full clusters, plus an arbitrator 3rd
>
> A full cluster can consist of one node only. Hence, it is
> possible to have a kind of stretch two-node [multi-site] cluster
> based on tickets and managed by booth.

In theory.

In practice, we rely on "proper behaviour" of "the other site",
in case a ticket is revoked, or cannot be renewed.

Relying on a single node for "proper behaviour" does not inspire
as much confidence as relying on a multi-node HA-cluster at each site,
which we can expect to ensure internal fencing.

With reliable hardware watchdogs, it still should be ok to do
"stretched two node HA clusters" in a reliable way.

Be generous with timeouts.

And document which failure modes you expect to handle,
and how to deal with the worst-case scenarios if you end up with some
failure case that you are not equipped to handle properly.

There are deployments which favor
"rather online with _potential_ split brain" over
"rather offline just in case".

Document this, print it out on paper,

"I am aware that this may lead to lost transactions,
data divergence, data corruption, or data loss.
I am personally willing to take the blame,
and live with the consequences."

Have some "boss" sign that ^^^
in the real world using a real pen.

Lars

--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
On Mon, Feb 09, 2015 at 04:41:19PM +0100, Lars Ellenberg wrote:
> On Fri, Feb 06, 2015 at 04:15:44PM +0100, Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote:
> > > That is the problem that makes geo-clustering very hard to nearly
> > > impossible. You can look at the Booth option for pacemaker, but that
> > > requires two (or more) full clusters, plus an arbitrator 3rd
> >
> > A full cluster can consist of one node only. Hence, it is
> > possible to have a kind of stretch two-node [multi-site] cluster
> > based on tickets and managed by booth.
>
> In theory.
>
> In practice, we rely on "proper behaviour" of "the other site",
> in case a ticket is revoked, or cannot be renewed.
>
> Relying on a single node for "proper behaviour" does not inspire
> as much confidence as relying on a multi-node HA-cluster at each site,
> which we can expect to ensure internal fencing.
>
> With reliable hardware watchdogs, it still should be ok to do
> "stretched two node HA clusters" in a reliable way.
>
> Be generous with timeouts.

As always.

> And document which failure modes you expect to handle,
> and how to deal with the worst-case scenarios if you end up with some
> failure case that you are not equipped to handle properly.
>
> There are deployments which favor
> "rather online with _potential_ split brain" over
> "rather offline just in case".

There's an arbitrator which should help in case of split brain.

> Document this, print it out on paper,
>
> "I am aware that this may lead to lost transactions,
> data divergence, data corruption, or data loss.
> I am personally willing to take the blame,
> and live with the consequences."
>
> Have some "boss" sign that ^^^
> in the real world using a real pen.

Well, of course running such a "stretch" cluster would be
rather different from a "normal" one.

The essential thing is that there's no fencing, unless configured
as a dead-man switch for the ticket. Given that booth has a
"sanity" program hook, maybe that could be utilized to verify if
this side of the cluster is healthy enough.
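
(Illustrative only: the hook referred to here is booth's per-ticket
before-acquire-handler, which booth runs before acquiring or renewing a
ticket; if it fails, the ticket is not taken or renewed and the configured
loss policy eventually applies. In booth.conf, roughly:

ticket = "ticket-A"
    before-acquire-handler = /usr/local/bin/check-site-health

The script name above is made up; anything that reports the local site's
health via its exit code would do.)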

Thanks,

Dejan

> Lars
>
> --
> : Lars Ellenberg
> : http://www.LINBIT.com | Your Way to High Availability
> : DRBD, Linux-HA and Pacemaker support and consulting
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
On Tue, 10 Feb 2015 15:58:57 +0100,
Dejan Muhamedagic <dejanmm@fastmail.fm> writes:

> On Mon, Feb 09, 2015 at 04:41:19PM +0100, Lars Ellenberg wrote:
> > On Fri, Feb 06, 2015 at 04:15:44PM +0100, Dejan Muhamedagic wrote:
> > > Hi,
> > >
> > > On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote:
> > > > That is the problem that makes geo-clustering very hard to nearly
> > > > impossible. You can look at the Booth option for pacemaker, but that
> > > > requires two (or more) full clusters, plus an arbitrator 3rd
> > >
> > > A full cluster can consist of one node only. Hence, it is
> > > possible to have a kind of stretch two-node [multi-site] cluster
> > > based on tickets and managed by booth.
> >
> > In theory.
> >
> > In practice, we rely on "proper behaviour" of "the other site",
> > in case a ticket is revoked, or cannot be renewed.
> >
> > Relying on a single node for "proper behaviour" does not inspire
> > as much confidence as relying on a multi-node HA-cluster at each site,
> > which we can expect to ensure internal fencing.
> >
> > With reliable hardware watchdogs, it still should be ok to do
> > "stretched two node HA clusters" in a reliable way.
> >
> > Be generous with timeouts.
>
> As always.
>
> > And document which failure modes you expect to handle,
> > and how to deal with the worst-case scenarios if you end up with some
> > failure case that you are not equipped to handle properly.
> >
> > There are deployments which favor
> > "rather online with _potential_ split brain" over
> > "rather offline just in case".
>
> There's an arbitrator which should help in case of split brain.
>

You can never really differentiate between a site being down and a site
being cut off due to a (network) infrastructure outage. An arbitrator can
mitigate split brain only to the extent that you trust your network. You
still have to decide what you value more: data availability or data
consistency.

Long-distance clusters are really for disaster recovery. It is
convenient to have a single button that starts up all resources in a
controlled manner, but someone really needs to decide to push that
button.

> > Document this, print it out on paper,
> >
> > "I am aware that this may lead to lost transactions,
> > data divergence, data corruption, or data loss.
> > I am personally willing to take the blame,
> > and live with the consequences."
> >
> > Have some "boss" sign that ^^^
> > in the real world using a real pen.
>
> Well, of course running such a "stretch" cluster would be
> rather different from a "normal" one.
>
> The essential thing is that there's no fencing, unless configured
> as a dead-man switch for the ticket. Given that booth has a
> "sanity" program hook, maybe that could be utilized to verify if
> this side of the cluster is healthy enough.
>
> Thanks,
>
> Dejan
>
> > Lars
> >
> > --
> > : Lars Ellenberg
> > : http://www.LINBIT.com | Your Way to High Availability
> > : DRBD, Linux-HA and Pacemaker support and consulting
> >
> > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> >


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
On Wed, Feb 11, 2015 at 09:10:45AM +0300, Andrei Borzenkov wrote:
> On Tue, 10 Feb 2015 15:58:57 +0100,
> Dejan Muhamedagic <dejanmm@fastmail.fm> writes:
>
> > On Mon, Feb 09, 2015 at 04:41:19PM +0100, Lars Ellenberg wrote:
> > > On Fri, Feb 06, 2015 at 04:15:44PM +0100, Dejan Muhamedagic wrote:
> > > > Hi,
> > > >
> > > > On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote:
> > > > > That is the problem that makes geo-clustering very hard to nearly
> > > > > impossible. You can look at the Booth option for pacemaker, but that
> > > > > requires two (or more) full clusters, plus an arbitrator 3rd
> > > >
> > > > A full cluster can consist of one node only. Hence, it is
> > > > possible to have a kind of stretch two-node [multi-site] cluster
> > > > based on tickets and managed by booth.
> > >
> > > In theory.
> > >
> > > In practice, we rely on "proper behaviour" of "the other site",
> > > in case a ticket is revoked, or cannot be renewed.
> > >
> > > Relying on a single node for "proper behaviour" does not inspire
> > > as much confidence as relying on a multi-node HA-cluster at each site,
> > > which we can expect to ensure internal fencing.
> > >
> > > With reliable hardware watchdogs, it still should be ok to do
> > > "stretched two node HA clusters" in a reliable way.
> > >
> > > Be generous with timeouts.
> >
> > As always.
> >
> > > And document which failure modes you expect to handle,
> > > and how to deal with the worst-case scenarios if you end up with some
> > > failure case that you are not equipped to handle properly.
> > >
> > > There are deployments which favor
> > > "rather online with _potential_ split brain" over
> > > "rather offline just in case".
> >
> > There's an arbitrator which should help in case of split brain.
> >
>
> You can never really differentiate between a site being down and a site
> being cut off due to a (network) infrastructure outage. An arbitrator can
> mitigate split brain only to the extent that you trust your network. You
> still have to decide what you value more: data availability or data
> consistency.

Right, that's why I mentioned the ticket loss policy. If booth drops
the ticket, pacemaker would fence the node (if loss-policy=fence).
Booth guarantees that no two sites will hold the ticket at the
same time. Of course, you have to trust booth to function
properly, but I guess that's a different story.
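
(For example, with crmsh the dead-man wiring is a ticket constraint along
these lines, with illustrative resource and ticket names:

crm configure rsc_ticket ticket-A-reqs ticket-A: my-app loss-policy=fence

With loss-policy=fence, losing or revoking ticket-A causes the nodes running
the dependent resources to be fenced, rather than the resources merely being
stopped.)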

Thanks,

Dejan

> Long-distance clusters are really for disaster recovery. It is
> convenient to have a single button that starts up all resources in a
> controlled manner, but someone really needs to decide to push that
> button.
>
> > > Document this, print it out on paper,
> > >
> > > "I am aware that this may lead to lost transactions,
> > > data divergence, data corruption, or data loss.
> > > I am personally willing to take the blame,
> > > and live with the consequences."
> > >
> > > Have some "boss" sign that ^^^
> > > in the real world using a real pen.
> >
> > Well, of course running such a "stretch" cluster would be
> > rather different from a "normal" one.
> >
> > The essential thing is that there's no fencing, unless configured
> > as a dead-man switch for the ticket. Given that booth has a
> > "sanity" program hook, maybe that could be utilized to verify if
> > this side of the cluster is healthy enough.
> >
> > Thanks,
> >
> > Dejan
> >
> > > Lars
> > >
> > > --
> > > : Lars Ellenberg
> > > : http://www.LINBIT.com | Your Way to High Availability
> > > : DRBD, Linux-HA and Pacemaker support and consulting
> > >
> > > DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> > >
>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
