Mailing List Archive

Two node cluster and no hardware device for stonith.
Hi All,

I have a question about stonith.
In my scenario, I have to create a 2-node cluster, but I don't have any
hardware device for stonith: no APC, no IPMI, etc., none of the devices listed
by "pcs stonith list".
So, is there an option to do something?
This is my scenario:
- 2 nodes cluster
serverHA1
serverHA2

- Software
CentOS 6.6
pacemaker.x86_64 1.1.12-4.el6
cman.x86_64 3.0.12.1-68.el6
corosync.x86_64 1.4.7-1.el6

-NO hardware device for stonith!

- Cluster creation ([ALL] operation done on all nodes, [ONE] operation done
on only one node)
[ALL] systemctl start pcsd.service
[ALL] systemctl enable pcsd.service
[ONE] pcs cluster auth serverHA1 serverHA2
[ALL] echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/sysconfig/cman
[ONE] pcs cluster setup --name MyCluHA serverHA1 serverHA2
[ONE] pcs property set stonith-enabled=false
[ONE] pcs property set no-quorum-policy=ignore
[ONE] pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000
host_list=192.168.56.1 --clone
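A note on the ping clone: ocf:pacemaker:ping only records connectivity in the
"pingd" node attribute, so a location rule on that attribute is normally what
makes resources move away from a node that loses connectivity. A minimal
sketch, where "my_resource" is only a placeholder resource name:
[ONE] pcs constraint location my_resource rule score=-INFINITY pingd lt 1 or not_defined pingd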


In my test, when I simulate a network failure, split brain occurs, and when
the network comes back, one node kills the other node.
-log on node 1:
Jan 21 11:45:28 corosync [CMAN ] memb: Sending KILL to node 2

-log on node 2:
Jan 21 11:45:28 corosync [CMAN ] memb: got KILL for node 2


Is there a method to restart pacemaker when the network comes back, instead of
killing the node?

Thanks
Andrea


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
On 21/01/15 08:13 AM, Andrea wrote:
> Hi All,
>
> I have a question about stonith
> In my scenarion , I have to create 2 node cluster, but I don't have any
> hardware device for stonith. No APC no IPMI ecc, no one of the list returned
> by "pcs stonith list"
> So, there is an option to do something?
> This is my scenario:
> - 2 nodes cluster
> serverHA1
> serverHA2
>
> - Software
> Centos 6.6
> pacemaker.x86_64 1.1.12-4.el6
> cman.x86_64 3.0.12.1-68.el6
> corosync.x86_64 1.4.7-1.el6
>
> -NO hardware device for stonith!
>
> - Cluster creation ([ALL] operation done on all nodes, [ONE] operation done
> on only one node)
> [ALL] systemctl start pcsd.service
> [ALL] systemctl enable pcsd.service
> [ONE] pcs cluster auth serverHA1 serverHA2
> [ALL] echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/sysconfig/cman
> [ONE] pcs cluster setup --name MyCluHA serverHA1 serverHA2
> [ONE] pcs property set stonith-enabled=false
> [ONE] pcs property set no-quorum-policy=ignore
> [ONE] pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000
> host_list=192.168.56.1 --clone
>
>
> In my test, when I simulate network failure, split brain occurs, and when
> network come back, One node kill the other node
> -log on node 1:
> Jan 21 11:45:28 corosync [CMAN ] memb: Sending KILL to node 2
>
> -log on node 2:
> Jan 21 11:45:28 corosync [CMAN ] memb: got KILL for node 2
>
>
> There is a method to restart pacemaker when network come back instead of
> kill it?
>
> Thanks
> Andrea

You really need a fence device; there isn't a way around it. By
definition, when a node needs to be fenced, it is in an unknown state
and it cannot be expected to operate predictably.

If you're using real hardware, then you can use a switched PDU
(network-connected power bar with individual outlet control) to do
fencing. I use the APC AP7900 in all my clusters and it works perfectly.
I know that some other brands work, too.

If your machines are virtual machines, then you can do fencing by talking
to the hypervisor. In this case, one node calls the host of the other
node and asks for it to be terminated (fence_virsh and fence_xvm for KVM/Xen
systems, fence_vmware for VMware, etc.).
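
For example, a rough sketch of a fence_virsh device for one node; the
hypervisor address, credentials and VM name below are placeholders, and the
exact parameters depend on your fence-agents version:

pcs stonith create fence_serverHA2 fence_virsh ipaddr=kvm-host.example.com \
    login=root passwd=secret secure=1 port=serverHA2-vm \
    pcmk_host_list=serverHA2 op monitor interval=60s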

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

Re: Two node cluster and no hardware device for stonith. [ In reply to ]
On 21.01.2015 11:18 Digimer wrote:
> On 21/01/15 08:13 AM, Andrea wrote:
>> > Hi All,
>> >
>> > I have a question about stonith
>> > In my scenarion , I have to create 2 node cluster, but I don't have any
>> > hardware device for stonith. No APC no IPMI ecc, no one of the list returned
>> > by "pcs stonith list"
>> > So, there is an option to do something?
>> > This is my scenario:
>> > - 2 nodes cluster
>> > serverHA1
>> > serverHA2
>> >
>> > - Software
>> > Centos 6.6
>> > pacemaker.x86_64 1.1.12-4.el6
>> > cman.x86_64 3.0.12.1-68.el6
>> > corosync.x86_64 1.4.7-1.el6
>> >
>> > -NO hardware device for stonith!
>> >
>> > - Cluster creation ([ALL] operation done on all nodes, [ONE] operation done
>> > on only one node)
>> > [ALL] systemctl start pcsd.service
>> > [ALL] systemctl enable pcsd.service
>> > [ONE] pcs cluster auth serverHA1 serverHA2
>> > [ALL] echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/sysconfig/cman
>> > [ONE] pcs cluster setup --name MyCluHA serverHA1 serverHA2
>> > [ONE] pcs property set stonith-enabled=false
>> > [ONE] pcs property set no-quorum-policy=ignore
>> > [ONE] pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000
>> > host_list=192.168.56.1 --clone
>> >
>> >
>> > In my test, when I simulate network failure, split brain occurs, and when
>> > network come back, One node kill the other node
>> > -log on node 1:
>> > Jan 21 11:45:28 corosync [CMAN ] memb: Sending KILL to node 2
>> >
>> > -log on node 2:
>> > Jan 21 11:45:28 corosync [CMAN ] memb: got KILL for node 2
>> >
>> >
>> > There is a method to restart pacemaker when network come back instead of
>> > kill it?
>> >
>> > Thanks
>> > Andrea
> You really need a fence device, there isn't a way around it. By
> definition, when a node needs to be fenced, it is in an unknown state
> and it can not be predicted to operate predictably.
>
> If you're using real hardware, then you can use a switched PDU
> (network-connected power bar with individual outlet control) to do
> fencing. I use the APC AP7900 in all my clusters and it works perfectly.
> I know that some other brands work, too.
>
> If you machines are virtual machines, then you can do fencing by talking
> to the hypervisor. In this case, one node calls the host of the other
> node and asks it to be terminated (fence_virsh and fence_xvm for KVM/Xen
> systems, fence_vmware for VMWare, etc).
>
> -- Digimer
If you want to save money and you can solder a bit, I can recommend rcd_serial.
The required device is described in cluster-glue/stonith/README.rcd_serial.
It is very simple, but it has worked reliably for us for more than four years!

Eberhard




Re: Two node cluster and no hardware device for stonith. [ In reply to ]
On Thursday, 22 January 2015 at 10:03:38, E. Kuemmerle wrote:
> On 21.01.2015 11:18 Digimer wrote:
> > On 21/01/15 08:13 AM, Andrea wrote:
> >> > Hi All,
> >> >
> >> > I have a question about stonith
> >> > In my scenarion , I have to create 2 node cluster, but I don't have any
> >> > hardware device for stonith. No APC no IPMI ecc, no one of the list
> >> > returned by "pcs stonith list"
> >> > So, there is an option to do something?
> >> > This is my scenario:
> >> > - 2 nodes cluster
> >> > serverHA1
> >> > serverHA2
> >> >
> >> > - Software
> >> > Centos 6.6
> >> > pacemaker.x86_64 1.1.12-4.el6
> >> > cman.x86_64 3.0.12.1-68.el6
> >> > corosync.x86_64 1.4.7-1.el6
> >> >
> >> > -NO hardware device for stonith!

Are you sure that you do not have fencing hardware? Perhaps you just did not
configure it? Please read the manual of your BIOS and check your system board
to see if you have an IPMI interface.
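
If the board does have a BMC, it can be checked from the OS before wiring it
into the cluster, for example (IP address and credentials are placeholders):

ipmitool -I lanplus -H 192.168.56.200 -U admin -P secret chassis power status
fence_ipmilan -a 192.168.56.200 -l admin -p secret -o status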


> >> > - Cluster creation ([ALL] operation done on all nodes, [ONE] operation
> >> > done
> >> > on only one node)
> >> > [ALL] systemctl start pcsd.service
> >> > [ALL] systemctl enable pcsd.service
> >> > [ONE] pcs cluster auth serverHA1 serverHA2
> >> > [ALL] echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/sysconfig/cman
> >> > [ONE] pcs cluster setup --name MyCluHA serverHA1 serverHA2
> >> > [ONE] pcs property set stonith-enabled=false
> >> > [ONE] pcs property set no-quorum-policy=ignore
> >> > [ONE] pcs resource create ping ocf:pacemaker:ping dampen=5s
> >> > multiplier=1000
> >> > host_list=192.168.56.1 --clone
> >> >
> >> >
> >> > In my test, when I simulate network failure, split brain occurs, and
> >> > when
> >> > network come back, One node kill the other node
> >> > -log on node 1:
> >> > Jan 21 11:45:28 corosync [CMAN ] memb: Sending KILL to node 2
> >> >
> >> > -log on node 2:
> >> > Jan 21 11:45:28 corosync [CMAN ] memb: got KILL for node 2

That is how fencing works.


With kind regards,

Michael Schwartzkopff

--
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Michael Schwartzkopff <ms@...> writes:

>
> Am Donnerstag, 22. Januar 2015, 10:03:38 schrieb E. Kuemmerle:
> > On 21.01.2015 11:18 Digimer wrote:
> > > On 21/01/15 08:13 AM, Andrea wrote:
> > >> > Hi All,
> > >> >
> > >> > I have a question about stonith
> > >> > In my scenarion , I have to create 2 node cluster, but I don't
>
> Are you sure that you do not have fencing hardware? Perhaps you just did nit
> configure it? Please read the manual of you BIOS and check your system
board if
> you have a IPMI interface.
>

> > >> > In my test, when I simulate network failure, split brain occurs, and
> > >> > when
> > >> > network come back, One node kill the other node
> > >> > -log on node 1:
> > >> > Jan 21 11:45:28 corosync [CMAN ] memb: Sending KILL to node 2
> > >> >
> > >> > -log on node 2:
> > >> > Jan 21 11:45:28 corosync [CMAN ] memb: got KILL for node 2
>
> That is how fencing works.
>
> Mit freundlichen Grüßen,
>
> Michael Schwartzkopff
>


Hi All

Many thanks for your replies.
I will update my scenario and ask about adding some devices for stonith.
- Option 1
I will ask for 2 VMware virtual machines, so I can try fence_vmware (see the
sketch below).
- Option 2
The project may need shared storage. In this case, the shared storage will be
a NAS that I can add to my nodes via iSCSI, so I can try fence_scsi.
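
As a rough sketch for Option 1, the fence_vmware_soap agent (if available in
your fence-agents version) can first be tested by hand against the
vCenter/ESX host; the address, credentials and VM name below are placeholders:

fence_vmware_soap -a vcenter.example.com -l fenceuser -p secret -z -o list
pcs stonith create fence_vmware_HA2 fence_vmware_soap ipaddr=vcenter.example.com \
    login=fenceuser passwd=secret ssl=1 port=serverHA2-vm pcmk_host_list=serverHA2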

I will post updates here.

Many thanks to all for the support.
Andrea




Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Andrea <a.bacchi@...> writes:

>
> Michael Schwartzkopff <ms <at> ...> writes:
>
> >
> > Am Donnerstag, 22. Januar 2015, 10:03:38 schrieb E. Kuemmerle:
> > > On 21.01.2015 11:18 Digimer wrote:
> > > > On 21/01/15 08:13 AM, Andrea wrote:
> > > >> > Hi All,
> > > >> >
> > > >> > I have a question about stonith
> > > >> > In my scenarion , I have to create 2 node cluster, but I don't
> >
> > Are you sure that you do not have fencing hardware? Perhaps you just did
nit
> > configure it? Please read the manual of you BIOS and check your system
> board if
> > you have a IPMI interface.
> >
>
> > > >> > In my test, when I simulate network failure, split brain occurs, and
> > > >> > when
> > > >> > network come back, One node kill the other node
> > > >> > -log on node 1:
> > > >> > Jan 21 11:45:28 corosync [CMAN ] memb: Sending KILL to node 2
> > > >> >
> > > >> > -log on node 2:
> > > >> > Jan 21 11:45:28 corosync [CMAN ] memb: got KILL for node 2
> >
> > That is how fencing works.
> >
> > Mit freundlichen Grüßen,
> >
> > Michael Schwartzkopff
> >
>
> Hi All
>
> many thanks for your replies.
> I will update my scenario to ask about adding some devices for stonith
> - Option 1
> I will ask for having 2 vmware virtual machine, so i can try fance_vmware
> -Option 2
> In the project, maybe will need a shared storage. In this case, the shared
> storage will be a NAS that a can add to my nodes via iscsi. In this case I
> can try fence_scsi
>
> I will write here about news
>
> Many thanks to all for support
> Andrea
>



some news

- Option 2
In the customer environment I configured an iSCSI target that our project
will use as a cluster filesystem.

[ONE]pvcreate /dev/sdb
[ONE]vgcreate -Ay -cy cluster_vg /dev/sdb
[ONE]lvcreate -L*G -n cluster_lv cluster_vg
[ONE]mkfs.gfs2 -j2 -p lock_dlm -t ProjectHA:ArchiveFS /dev/cluster_vg/cluster_lv

Now I can add a Filesystem resource:

[ONE]pcs resource create clusterfs Filesystem
device="/dev/cluster_vg/cluster_lv" directory="/var/mountpoint"
fstype="gfs2" "options=noatime" op monitor interval=10s clone interleave=true

and I can read and write from both nodes.


Now I'd like to use this device with fence_scsi.
Is that OK? Because I see this in the man page:
"The fence_scsi agent works by having each node in the cluster register a
unique key with the SCSI device(s). Once registered, a single node will
become the reservation holder by creating a "write exclusive,
registrants only" reservation on the device(s). The result is that only
registered nodes may write to the device(s)."
That's no good for me: I need both nodes to be able to write to the device.
So, do I need another device to use with fence_scsi? In that case I will try to
create two partitions, sdb1 and sdb2, on this device and use sdb1 as
clusterfs and sdb2 for fencing.


If I test this manually, this is what I get before any operation:
[ONE]sg_persist -n --read-keys
--device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
PR generation=0x27, 1 registered reservation key follows:
0x98343e580002734d


Then I try to set the serverHA1 key:
[serverHA1]fence_scsi -d
/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 -f /tmp/miolog.txt -n
serverHA1 -o on

But nothing has changed
[ONE]sg_persist -n --read-keys
--device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
PR generation=0x27, 1 registered reservation key follows:
0x98343e580002734d


and in the log:
gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore
(node_key=4d5a0001, dev=/dev/sde)
gen 26 17:53:27 fence_scsi: [debug] main::do_reset (dev=/dev/sde, status=6)
gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore (err=0)

The same happens when I try on serverHA2.
Is this normal?


In any case, I try to create a stonith device:
[ONE]pcs stonith create iscsi-stonith-device fence_scsi
pcmk_host_list="serverHA1 serverHA2"
devices=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 meta
provides=unfencing

and the cluster status is OK:
[ONE] pcs status
Cluster name: MyCluHA
Last updated: Tue Jan 27 11:21:48 2015
Last change: Tue Jan 27 10:46:57 2015
Stack: cman
Current DC: serverHA1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
5 Resources configured


Online: [ serverHA1 serverHA2 ]

Full list of resources:

Clone Set: ping-clone [ping]
Started: [ serverHA1 serverHA2 ]
Clone Set: clusterfs-clone [clusterfs]
Started: [ serverHA1 serverHA2 ]
iscsi-stonith-device (stonith:fence_scsi): Started serverHA1



How can I test this from a remote connection?


Andrea

Re: Two node cluster and no hardware device for stonith. [ In reply to ]
In a normal situation every node can write to your file system; fence_scsi is
used when your cluster is in split-brain, when a node doesn't
communicate with the other node. I don't think that is a good idea.
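
For example, with a "registrants only" reservation the reservation type and
current holder can be checked directly (device path taken from the message
above); only a node whose registration key has been removed by fencing loses
write access:

sg_persist -n --read-reservation --device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4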


2015-01-27 11:35 GMT+01:00 Andrea <a.bacchi@codices.com>:
> Andrea <a.bacchi@...> writes:
>
>>
>> Michael Schwartzkopff <ms <at> ...> writes:
>>
>> >
>> > Am Donnerstag, 22. Januar 2015, 10:03:38 schrieb E. Kuemmerle:
>> > > On 21.01.2015 11:18 Digimer wrote:
>> > > > On 21/01/15 08:13 AM, Andrea wrote:
>> > > >> > Hi All,
>> > > >> >
>> > > >> > I have a question about stonith
>> > > >> > In my scenarion , I have to create 2 node cluster, but I don't
>> >
>> > Are you sure that you do not have fencing hardware? Perhaps you just did
> nit
>> > configure it? Please read the manual of you BIOS and check your system
>> board if
>> > you have a IPMI interface.
>> >
>>
>> > > >> > In my test, when I simulate network failure, split brain occurs, and
>> > > >> > when
>> > > >> > network come back, One node kill the other node
>> > > >> > -log on node 1:
>> > > >> > Jan 21 11:45:28 corosync [CMAN ] memb: Sending KILL to node 2
>> > > >> >
>> > > >> > -log on node 2:
>> > > >> > Jan 21 11:45:28 corosync [CMAN ] memb: got KILL for node 2
>> >
>> > That is how fencing works.
>> >
>> > Mit freundlichen Grüßen,
>> >
>> > Michael Schwartzkopff
>> >
>>
>> Hi All
>>
>> many thanks for your replies.
>> I will update my scenario to ask about adding some devices for stonith
>> - Option 1
>> I will ask for having 2 vmware virtual machine, so i can try fance_vmware
>> -Option 2
>> In the project, maybe will need a shared storage. In this case, the shared
>> storage will be a NAS that a can add to my nodes via iscsi. In this case I
>> can try fence_scsi
>>
>> I will write here about news
>>
>> Many thanks to all for support
>> Andrea
>>
>
>
>
> some news
>
> - Option 2
> In the customer environment I configured a iscsi target that our project
> will use as cluster filesystem
>
> [ONE]pvcreate /dev/sdb
> [ONE]vgcreate -Ay -cy cluster_vg /dev/sdb
> [ONE]lvcreate -L*G -n cluster_lv cluster_vg
> [ONE]mkfs.gfs2 -j2 -p lock_dlm -t ProjectHA:ArchiveFS /dev/cluster_vg/cluster_lv
>
> now I can add a Filesystem resource
>
> [ONE]pcs resource create clusterfs Filesystem
> device="/dev/cluster_vg/cluster_lv" directory="/var/mountpoint"
> fstype="gfs2" "options=noatime" op monitor interval=10s clone interleave=true
>
> and I can read and write from both node.
>
>
> Now I'd like to use this device with fence_scsi.
> It is ok? because I see in the man page this:
> "The fence_scsi agent works by having each node in the cluster register a
> unique key with the SCSI devive(s). Once registered, a single node will
> become the reservation holder by creating a "write exclu-sive,
> registrants only" reservation on the device(s). The result is that only
> registered nodes may write to the device(s)"
> It's no good for me, I need both node can write on the device.
> So, I need another device to use with fence_scsi? In this case I will try to
> create two partition, sdb1 and sdb2, on this device and use sdb1 as
> clusterfs and sdb2 for fencing.
>
>
> If i try to manually test this, I obtain before any operation
> [ONE]sg_persist -n --read-keys
> --device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
> PR generation=0x27, 1 registered reservation key follows:
> 0x98343e580002734d
>
>
> Then, I try to set serverHA1 key
> [serverHA1]fence_scsi -d
> /dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 -f /tmp/miolog.txt -n
> serverHA1 -o on
>
> But nothing has changed
> [ONE]sg_persist -n --read-keys
> --device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
> PR generation=0x27, 1 registered reservation key follows:
> 0x98343e580002734d
>
>
> and in the log:
> gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore
> (node_key=4d5a0001, dev=/dev/sde)
> gen 26 17:53:27 fence_scsi: [debug] main::do_reset (dev=/dev/sde, status=6)
> gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore (err=0)
>
> The same when i try on serverHA2
> It is normal?
>
>
> In any case, i try to create a stonith device
> [ONE]pcs stonith create iscsi-stonith-device fence_scsi
> pcmk_host_list="serverHA1 serverHA2"
> devices=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 meta
> provides=unfencing
>
> and the cluster status is ok
> [ONE] pcs status
> Cluster name: MyCluHA
> Last updated: Tue Jan 27 11:21:48 2015
> Last change: Tue Jan 27 10:46:57 2015
> Stack: cman
> Current DC: serverHA1 - partition with quorum
> Version: 1.1.11-97629de
> 2 Nodes configured
> 5 Resources configured
>
>
> Online: [ serverHA1 serverHA2 ]
>
> Full list of resources:
>
> Clone Set: ping-clone [ping]
> Started: [ serverHA1 serverHA2 ]
> Clone Set: clusterfs-clone [clusterfs]
> Started: [ serverHA1 serverHA2 ]
> iscsi-stonith-device (stonith:fence_scsi): Started serverHA1
>
>
>
> How I can try this from remote connection?
>
>
> Andrea
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



--
this is my life and I live it as long as God wills

Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Sorry, but I forgot to tell you: you need to know that fence_scsi
doesn't reboot the evicted node, so you can combine fence_vmware with
fence_scsi as a second option.
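
A sketch of how the two could be combined with pcs fencing levels; the
fence_vmware resource name here is hypothetical, iscsi-stonith-device is the
device created earlier in this thread, and level 1 is tried before level 2:

pcs stonith level add 1 serverHA2 fence_vmware_HA2
pcs stonith level add 2 serverHA2 iscsi-stonith-device

and similarly for serverHA1.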

2015-01-27 11:44 GMT+01:00 emmanuel segura <emi2fast@gmail.com>:
> In normal situation every node can in your file system, fence_scsi is
> used when your cluster is in split-braint, when your a node doesn't
> comunicate with the other node, i don't is good idea.
>
>
> 2015-01-27 11:35 GMT+01:00 Andrea <a.bacchi@codices.com>:
>> Andrea <a.bacchi@...> writes:
>>
>>>
>>> Michael Schwartzkopff <ms <at> ...> writes:
>>>
>>> >
>>> > Am Donnerstag, 22. Januar 2015, 10:03:38 schrieb E. Kuemmerle:
>>> > > On 21.01.2015 11:18 Digimer wrote:
>>> > > > On 21/01/15 08:13 AM, Andrea wrote:
>>> > > >> > Hi All,
>>> > > >> >
>>> > > >> > I have a question about stonith
>>> > > >> > In my scenarion , I have to create 2 node cluster, but I don't
>>> >
>>> > Are you sure that you do not have fencing hardware? Perhaps you just did
>> nit
>>> > configure it? Please read the manual of you BIOS and check your system
>>> board if
>>> > you have a IPMI interface.
>>> >
>>>
>>> > > >> > In my test, when I simulate network failure, split brain occurs, and
>>> > > >> > when
>>> > > >> > network come back, One node kill the other node
>>> > > >> > -log on node 1:
>>> > > >> > Jan 21 11:45:28 corosync [CMAN ] memb: Sending KILL to node 2
>>> > > >> >
>>> > > >> > -log on node 2:
>>> > > >> > Jan 21 11:45:28 corosync [CMAN ] memb: got KILL for node 2
>>> >
>>> > That is how fencing works.
>>> >
>>> > Mit freundlichen Grüßen,
>>> >
>>> > Michael Schwartzkopff
>>> >
>>>
>>> Hi All
>>>
>>> many thanks for your replies.
>>> I will update my scenario to ask about adding some devices for stonith
>>> - Option 1
>>> I will ask for having 2 vmware virtual machine, so i can try fance_vmware
>>> -Option 2
>>> In the project, maybe will need a shared storage. In this case, the shared
>>> storage will be a NAS that a can add to my nodes via iscsi. In this case I
>>> can try fence_scsi
>>>
>>> I will write here about news
>>>
>>> Many thanks to all for support
>>> Andrea
>>>
>>
>>
>>
>> some news
>>
>> - Option 2
>> In the customer environment I configured a iscsi target that our project
>> will use as cluster filesystem
>>
>> [ONE]pvcreate /dev/sdb
>> [ONE]vgcreate -Ay -cy cluster_vg /dev/sdb
>> [ONE]lvcreate -L*G -n cluster_lv cluster_vg
>> [ONE]mkfs.gfs2 -j2 -p lock_dlm -t ProjectHA:ArchiveFS /dev/cluster_vg/cluster_lv
>>
>> now I can add a Filesystem resource
>>
>> [ONE]pcs resource create clusterfs Filesystem
>> device="/dev/cluster_vg/cluster_lv" directory="/var/mountpoint"
>> fstype="gfs2" "options=noatime" op monitor interval=10s clone interleave=true
>>
>> and I can read and write from both node.
>>
>>
>> Now I'd like to use this device with fence_scsi.
>> It is ok? because I see in the man page this:
>> "The fence_scsi agent works by having each node in the cluster register a
>> unique key with the SCSI devive(s). Once registered, a single node will
>> become the reservation holder by creating a "write exclu-sive,
>> registrants only" reservation on the device(s). The result is that only
>> registered nodes may write to the device(s)"
>> It's no good for me, I need both node can write on the device.
>> So, I need another device to use with fence_scsi? In this case I will try to
>> create two partition, sdb1 and sdb2, on this device and use sdb1 as
>> clusterfs and sdb2 for fencing.
>>
>>
>> If i try to manually test this, I obtain before any operation
>> [ONE]sg_persist -n --read-keys
>> --device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
>> PR generation=0x27, 1 registered reservation key follows:
>> 0x98343e580002734d
>>
>>
>> Then, I try to set serverHA1 key
>> [serverHA1]fence_scsi -d
>> /dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 -f /tmp/miolog.txt -n
>> serverHA1 -o on
>>
>> But nothing has changed
>> [ONE]sg_persist -n --read-keys
>> --device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
>> PR generation=0x27, 1 registered reservation key follows:
>> 0x98343e580002734d
>>
>>
>> and in the log:
>> gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore
>> (node_key=4d5a0001, dev=/dev/sde)
>> gen 26 17:53:27 fence_scsi: [debug] main::do_reset (dev=/dev/sde, status=6)
>> gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore (err=0)
>>
>> The same when i try on serverHA2
>> It is normal?
>>
>>
>> In any case, i try to create a stonith device
>> [ONE]pcs stonith create iscsi-stonith-device fence_scsi
>> pcmk_host_list="serverHA1 serverHA2"
>> devices=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 meta
>> provides=unfencing
>>
>> and the cluster status is ok
>> [ONE] pcs status
>> Cluster name: MyCluHA
>> Last updated: Tue Jan 27 11:21:48 2015
>> Last change: Tue Jan 27 10:46:57 2015
>> Stack: cman
>> Current DC: serverHA1 - partition with quorum
>> Version: 1.1.11-97629de
>> 2 Nodes configured
>> 5 Resources configured
>>
>>
>> Online: [ serverHA1 serverHA2 ]
>>
>> Full list of resources:
>>
>> Clone Set: ping-clone [ping]
>> Started: [ serverHA1 serverHA2 ]
>> Clone Set: clusterfs-clone [clusterfs]
>> Started: [ serverHA1 serverHA2 ]
>> iscsi-stonith-device (stonith:fence_scsi): Started serverHA1
>>
>>
>>
>> How I can try this from remote connection?
>>
>>
>> Andrea
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>
>
> --
> esta es mi vida e me la vivo hasta que dios quiera



--
this is my life and I live it as long as God wills

Re: Two node cluster and no hardware device for stonith. [ In reply to ]
emmanuel segura <emi2fast@...> writes:

>
> sorry, but i forgot to tell you, you need to know the fence_scsi
> doesn't reboot the evicted node, so you can combine fence_vmware with
> fence_scsi as the second option.
>
For this, I'm trying to use a watchdog script:
https://access.redhat.com/solutions/65187

But when I start the watchdog daemon, all nodes reboot.
I'll continue testing...
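
For reference, a minimal sketch of /etc/watchdog.conf along the lines of that
article (the check-script path follows the article and may differ on your
system):

watchdog-device = /dev/watchdog
test-binary = /usr/share/cluster/fence_scsi_check.pl
interval = 10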



> 2015-01-27 11:44 GMT+01:00 emmanuel segura <emi2fast <at> gmail.com>:
> > In normal situation every node can in your file system, fence_scsi is
> > used when your cluster is in split-braint, when your a node doesn't
> > comunicate with the other node, i don't is good idea.
> >

So, will I see key registration only when the nodes lose communication?





> >
> > 2015-01-27 11:35 GMT+01:00 Andrea <a.bacchi <at> codices.com>:
> >> Andrea <a.bacchi <at> ...> writes:
>





Re: Two node cluster and no hardware device for stonith. [ In reply to ]
When a node is dead the registration key is removed.

2015-01-27 13:29 GMT+01:00 Andrea <a.bacchi@codices.com>:
> emmanuel segura <emi2fast@...> writes:
>
>>
>> sorry, but i forgot to tell you, you need to know the fence_scsi
>> doesn't reboot the evicted node, so you can combine fence_vmware with
>> fence_scsi as the second option.
>>
> for this, i'm trying to use a watchdog script
> https://access.redhat.com/solutions/65187
>
> But when I start wachdog daemon, all node reboot.
> I continue testing...
>
>
>
>> 2015-01-27 11:44 GMT+01:00 emmanuel segura <emi2fast <at> gmail.com>:
>> > In normal situation every node can in your file system, fence_scsi is
>> > used when your cluster is in split-braint, when your a node doesn't
>> > comunicate with the other node, i don't is good idea.
>> >
>
> So, i will see key registration only when nodes loose comunication?
>
>
>
>
>
>> >
>> > 2015-01-27 11:35 GMT+01:00 Andrea <a.bacchi <at> codices.com>:
>> >> Andrea <a.bacchi <at> ...> writes:
>>
>
>
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



--
this is my life and I live it as long as God wills

Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Is stonith enabled in the crm config?

On Tue, Jan 27, 2015 at 6:21 PM, emmanuel segura <emi2fast@gmail.com> wrote:

> When a node is dead the registration key is removed.
>
> 2015-01-27 13:29 GMT+01:00 Andrea <a.bacchi@codices.com>:
> > emmanuel segura <emi2fast@...> writes:
> >
> >>
> >> sorry, but i forgot to tell you, you need to know the fence_scsi
> >> doesn't reboot the evicted node, so you can combine fence_vmware with
> >> fence_scsi as the second option.
> >>
> > for this, i'm trying to use a watchdog script
> > https://access.redhat.com/solutions/65187
> >
> > But when I start wachdog daemon, all node reboot.
> > I continue testing...
> >
> >
> >
> >> 2015-01-27 11:44 GMT+01:00 emmanuel segura <emi2fast <at> gmail.com>:
> >> > In normal situation every node can in your file system, fence_scsi is
> >> > used when your cluster is in split-braint, when your a node doesn't
> >> > comunicate with the other node, i don't is good idea.
> >> >
> >
> > So, i will see key registration only when nodes loose comunication?
> >
> >
> >
> >
> >
> >> >
> >> > 2015-01-27 11:35 GMT+01:00 Andrea <a.bacchi <at> codices.com>:
> >> >> Andrea <a.bacchi <at> ...> writes:
> >>
> >
> >
> >
> >
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>
>
>
> --
> esta es mi vida e me la vivo hasta que dios quiera
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



--
OSS BSS Developer
Hand Phone: 9860788344
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
If you are using cman+pacemaker, you need to enable stonith and
configure it in your crm config.
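
With pcs that is, for example:

pcs property set stonith-enabled=true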

2015-01-27 14:05 GMT+01:00 Vinod Prabhu <pvinod.mit@gmail.com>:

> is stonith enabled in crm conf?
>
> On Tue, Jan 27, 2015 at 6:21 PM, emmanuel segura <emi2fast@gmail.com>
> wrote:
>
>> When a node is dead the registration key is removed.
>>
>> 2015-01-27 13:29 GMT+01:00 Andrea <a.bacchi@codices.com>:
>> > emmanuel segura <emi2fast@...> writes:
>> >
>> >>
>> >> sorry, but i forgot to tell you, you need to know the fence_scsi
>> >> doesn't reboot the evicted node, so you can combine fence_vmware with
>> >> fence_scsi as the second option.
>> >>
>> > for this, i'm trying to use a watchdog script
>> > https://access.redhat.com/solutions/65187
>> >
>> > But when I start wachdog daemon, all node reboot.
>> > I continue testing...
>> >
>> >
>> >
>> >> 2015-01-27 11:44 GMT+01:00 emmanuel segura <emi2fast <at> gmail.com>:
>> >> > In normal situation every node can in your file system, fence_scsi is
>> >> > used when your cluster is in split-braint, when your a node doesn't
>> >> > comunicate with the other node, i don't is good idea.
>> >> >
>> >
>> > So, i will see key registration only when nodes loose comunication?
>> >
>> >
>> >
>> >
>> >
>> >> >
>> >> > 2015-01-27 11:35 GMT+01:00 Andrea <a.bacchi <at> codices.com>:
>> >> >> Andrea <a.bacchi <at> ...> writes:
>> >>
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>> --
>> esta es mi vida e me la vivo hasta que dios quiera
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
>
>
> --
> OSS BSS Developer
> Hand Phone: 9860788344
> [image: Picture]
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>


--
this is my life and I live it as long as God wills
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
emmanuel segura <emi2fast@...> writes:

>
>
> if you are using cman+pacemaker you need to enabled the stonith and
configuring that in you crm config
>
>
> 2015-01-27 14:05 GMT+01:00 Vinod Prabhu
<pvinod.mit@gmail.com>:
> is stonith enabled in crm conf?
>

yes, stonith is enabled

[ONE]pcs property
Cluster Properties:
cluster-infrastructure: cman
dc-version: 1.1.11-97629de
last-lrm-refresh: 1422285715
no-quorum-policy: ignore
stonith-enabled: true


If I disable it, the stonith device doesn't start.


> On Tue, Jan 27, 2015 at 6:21 PM, emmanuel segura <emi2fast@gmail.com> wrote:
> When a node is dead the registration key is removed.

So should I see 2 keys registered when I add the fence_scsi device?
But I don't see 2 keys registered...



Andrea






Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Please show your configuration and your logs.

2015-01-27 14:24 GMT+01:00 Andrea <a.bacchi@codices.com>:
> emmanuel segura <emi2fast@...> writes:
>
>>
>>
>> if you are using cman+pacemaker you need to enabled the stonith and
> configuring that in you crm config
>>
>>
>> 2015-01-27 14:05 GMT+01:00 Vinod Prabhu
> <pvinod.mit@gmail.com>:
>> is stonith enabled in crm conf?
>>
>
> yes, stonith is enabled
>
> [ONE]pcs property
> Cluster Properties:
> cluster-infrastructure: cman
> dc-version: 1.1.11-97629de
> last-lrm-refresh: 1422285715
> no-quorum-policy: ignore
> stonith-enabled: true
>
>
> If I disable it, stonith device don't start
>
>
>>
>> On Tue, Jan 27, 2015 at 6:21 PM, emmanuel segura
> <emi2fast@gmail.com> wrote:When a node is dead
> the registration key is removed.
>
> So I must see 2 key registered when I add fence_scsi device?
> But I don't see 2 key registered...
>
>
>
> Andrea
>
>
>
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



--
this is my life and I live it as long as God wills

Re: Two node cluster and no hardware device for stonith. [ In reply to ]
emmanuel segura <emi2fast@...> writes:

>
> please show your configuration and your logs.
>
> 2015-01-27 14:24 GMT+01:00 Andrea <a.bacchi@...>:
> > emmanuel segura <emi2fast <at> ...> writes:
> >
> >>
> >>
> >> if you are using cman+pacemaker you need to enabled the stonith and
> > configuring that in you crm config
> >>
> >>
> >> 2015-01-27 14:05 GMT+01:00 Vinod Prabhu
> > <pvinod.mit@...>:
> >> is stonith enabled in crm conf?
> >>
> >
> > yes, stonith is enabled
> >
> > [ONE]pcs property
> > Cluster Properties:
> > cluster-infrastructure: cman
> > dc-version: 1.1.11-97629de
> > last-lrm-refresh: 1422285715
> > no-quorum-policy: ignore
> > stonith-enabled: true
> >
> >
> > If I disable it, stonith device don't start
> >
> >
> >>
> >> On Tue, Jan 27, 2015 at 6:21 PM, emmanuel segura
> > <emi2fast@...> wrote:When a node is dead
> > the registration key is removed.
> >
> > So I must see 2 key registered when I add fence_scsi device?
> > But I don't see 2 key registered...
> >
> >

Sorry, I used the wrong device id.
Now, with the correct device id, I see 2 keys registered:


[ONE] sg_persist -n --read-keys
--device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8
PR generation=0x4, 2 registered reservation keys follow:
0x4d5a0001
0x4d5a0002
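As far as I understand, fence_scsi builds each key from the cman cluster id (high 16 bits) and the node id (low 16 bits), so the two keys above should correspond to node ids 1 and 2. A rough cross-check (a sketch, not taken from the thread):

cman_tool status | grep "Cluster Id"    # 0x4d5a is 19802 in decimal
cman_tool nodes                         # node ids 1 and 2 match the ...0001 and ...0002 suffixes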


Tomorrow I will do some fencing tests...

thanks
Andrea





_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Andrea <a.bacchi@...> writes:

>
> Sorry, I used wrong device id.
> Now, with the correct device id, I see 2 key reserved
>
> [ONE] sg_persist -n --read-keys
> --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8
> PR generation=0x4, 2 registered reservation keys follow:
> 0x4d5a0001
> 0x4d5a0002
>
> Tomorrow i will do some test for fencing...
>

Some news:


When I try to fence serverHA2 with this command:
[ONE]pcs stonith fence serverHA2

everything appears to succeed, but serverHA2 freezes.
Below are the logs from each node (serverHA2 freezes right after logging these lines).

The servers are 2 VMware virtual machines (I have asked for an account on the ESX server to test fence_vmware and I'm waiting for a response).
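For when the ESX account arrives, a fence_vmware_soap device could be sketched roughly as follows; the address, credentials and VM names are placeholders, and the exact parameter names should be checked with "pcs stonith describe fence_vmware_soap":

[ONE] pcs stonith create vmware-fence fence_vmware_soap \
      ipaddr=vcenter.example.com login=fenceuser passwd=secret ssl=1 \
      pcmk_host_map="serverHA1:serverHA1-vm;serverHA2:serverHA2-vm" \
      op monitor interval=60s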


log serverHA1


Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice: handle_request: Client
stonith_admin.1907.b13e0290 wants to fence (reboot) 'serverHA2' with device
'(any)'
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice:
initiate_remote_stonith_op: Initiating remote operation reboot for
serverHA2: 70b75107-8919-4510-9c6c-7cc65e6a00a6 (0)
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice:
can_fence_host_with_device: iscsi-stonith-device can fence (reboot)
serverHA2: static-list
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info:
process_remote_stonith_query: Query result 1 of 2 from serverHA1 for
serverHA2/reboot (1 devices) 70b75107-8919-4510-9c6c-7cc65e6a00a6
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: call_remote_stonith:
Total remote op timeout set to 120 for fencing of node serverHA2 for
stonith_admin.1907.70b75107
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: call_remote_stonith:
Requesting that serverHA1 perform op reboot serverHA2 for stonith_admin.1907
(144s)
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice:
can_fence_host_with_device: iscsi-stonith-device can fence (reboot)
serverHA2: static-list
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info:
stonith_fence_get_devices_cb: Found 1 matching devices for 'serverHA2'
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: warning: stonith_device_execute:
Agent 'fence_scsi' does not advertise support for 'reboot', performing 'off'
action instead
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info:
process_remote_stonith_query: Query result 2 of 2 from serverHA2 for
serverHA2/reboot (1 devices) 70b75107-8919-4510-9c6c-7cc65e6a00a6
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: log_operation:
Operation 'reboot' [1908] (call 2 from stonith_admin.1907) for host 'serverHA2'
with device 'iscsi-stonith-device' returned: 0 (OK)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: warning: get_xpath_object:
No match for //@st_delegate in /st-reply
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: remote_op_done:
Operation reboot of serverHA2 by serverHA1 for
stonith_admin.1907@serverHA1.70b75107: OK
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify:
Peer serverHA2 was terminated (reboot) by serverHA1 for serverHA1: OK
(ref=70b75107-8919-4510-9c6c-7cc65e6a00a6) by client stonith_admin.1907
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify:
Notified CMAN that 'serverHA2' is now fenced
Jan 30 12:13:03 [2514] serverHA1 crmd: info: crm_update_peer_join:
crmd_peer_down: Node serverHA2[2] - join-2 phase 4 -> 0
Jan 30 12:13:03 [2514] serverHA1 crmd: info:
crm_update_peer_expected: crmd_peer_down: Node serverHA2[2] - expected
state is now down (was member)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='serverHA2']/lrm
Jan 30 12:13:03 [2514] serverHA1 crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='serverHA2']/transient_attributes
Jan 30 12:13:03 [2514] serverHA1 crmd: info: tengine_stonith_notify:
External fencing operation from stonith_admin.1907 fenced serverHA2
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph:
Transition aborted: External Fencing Operation
(source=tengine_stonith_notify:248, 1)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 30 12:13:03 [2514] serverHA1 crmd: warning: do_state_transition:
Only 1 of 2 cluster nodes are eligible to run resources - continue 0
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Forwarding cib_modify operation for section status to master
(origin=local/crmd/333)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Forwarding cib_delete operation for section
//node_state[@uname='serverHA2']/lrm to master (origin=local/crmd/334)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Forwarding cib_delete operation for section
//node_state[@uname='serverHA2']/transient_attributes to master
(origin=local/crmd/335)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
--- 0.51.86 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
+++ 0.51.87 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: +
/cib: @num_updates=87
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: +
/cib/status/node_state[@id='serverHA2']:
@crm-debug-origin=send_stonith_update, @join=down, @expected=down
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Completed cib_modify operation for section status: OK (rc=0,
origin=serverHA1/crmd/333, version=0.51.87)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
--- 0.51.87 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
+++ 0.51.88 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: --
/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2']
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: +
/cib: @num_updates=88
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='serverHA2']/lrm: OK (rc=0, origin=serverHA1/crmd/334,
version=0.51.88)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
--- 0.51.88 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff:
+++ 0.51.89 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: --
/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2']
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: +
/cib: @num_updates=89
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='serverHA2']/transient_attributes: OK (rc=0,
origin=serverHA1/crmd/335, version=0.51.89)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: cib_fencing_updated:
Fencing update 333 for serverHA2: complete
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph:
Transition aborted by deletion of lrm[@id='serverHA2']: Resource state removal
(cib=0.51.88, source=te_update_diff:429,
path=/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2'], 1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph:
Transition aborted by deletion of transient_attributes[@id='serverHA2']:
Transient attribute change (cib=0.51.89, source=te_update_diff:391,
path=/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2
'], 1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: process_pe_message:
Input has not changed since last time, not saving to disk
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 30 12:13:03 [2513] serverHA1 pengine: info:
determine_online_status_fencing: Node serverHA2 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status:
Node serverHA2 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info:
determine_online_status_fencing: Node serverHA1 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status:
Node serverHA1 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone
Set: ping-clone [ping]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Started: [ serverHA1 serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone
Set: clusterfs-clone [clusterfs]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Started: [ serverHA1 serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_print:
iscsi-stonith-device (stonith:fence_scsi): Started serverHA1
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
ping:0 (Started serverHA2)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
ping:1 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
clusterfs:0 (Started serverHA2)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
clusterfs:1 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
iscsi-stonith-device (Started serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: handle_response:
pe_calc calculation pe_calc-dc-1422616383-286 is obsolete
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: process_pe_message:
Calculated Transition 189: /var/lib/pacemaker/pengine/pe-input-145.bz2
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: unpack_config: On loss
of CCM Quorum: Ignore
Jan 30 12:13:03 [2513] serverHA1 pengine: info:
determine_online_status_fencing: - Node serverHA2 is not ready to run
resources
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status:
Node serverHA2 is pending
Jan 30 12:13:03 [2513] serverHA1 pengine: info:
determine_online_status_fencing: Node serverHA1 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status:
Node serverHA1 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone
Set: ping-clone [ping]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Started: [ serverHA1 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Stopped: [ serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone
Set: clusterfs-clone [clusterfs]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Started: [ serverHA1 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print:
Stopped: [ serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_print:
iscsi-stonith-device (stonith:fence_scsi): Started serverHA1
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_color:
Resource ping:1 cannot run anywhere
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_color:
Resource clusterfs:1 cannot run anywhere
Jan 30 12:13:03 [2513] serverHA1 pengine: info: probe_resources:
Action probe_complete-serverHA2 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action
ping:0_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action
clusterfs:0_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action
iscsi-stonith-device_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: trigger_unfencing:
Unfencing serverHA2: node discovery
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
ping:0 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
ping:1 (Stopped)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
clusterfs:0 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
clusterfs:1 (Stopped)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave
iscsi-stonith-device (Started serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_te_invoke:
Processing graph 190 (ref=pe_calc-dc-1422616383-287) derived from
/var/lib/pacemaker/pengine/pe-input-146.bz2
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: process_pe_message:
Calculated Transition 190: /var/lib/pacemaker/pengine/pe-input-146.bz2
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: te_fence_node:
Executing on fencing operation (5) on serverHA2 (timeout=60000)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: handle_request: Client
crmd.2514.b5961dc1 wants to fence (on) 'serverHA2' with device '(any)'
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice:
initiate_remote_stonith_op: Initiating remote operation on for serverHA2:
e19629dc-bec3-4e63-baf6-a7ecd5ed44bb (0)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info:
process_remote_stonith_query: Query result 2 of 2 from serverHA2 for
serverHA2/on (1 devices) e19629dc-bec3-4e63-baf6-a7ecd5ed44bb
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info:
process_remote_stonith_query: All queries have arrived, continuing (2, 2, 2)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: call_remote_stonith:
Total remote op timeout set to 60 for fencing of node serverHA2 for
crmd.2514.e19629dc
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: call_remote_stonith:
Requesting that serverHA2 perform op on serverHA2 for crmd.2514 (72s)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: warning: get_xpath_object:
No match for //@st_delegate in /st-reply
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: remote_op_done:
Operation on of serverHA2 by serverHA2 for crmd.2514@serverHA1.e19629dc: OK
Jan 30 12:13:03 [2514] serverHA1 crmd: notice:
tengine_stonith_callback: Stonith operation
9/5:190:0:4e500b84-bb92-4406-8f9c-f4140dd40ec7: OK (0)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify:
serverHA2 was successfully unfenced by serverHA2 (at the request of serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: run_graph:
Transition 190 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-146.bz2): Complete
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_log: FSA: Input
I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]





log serverHA2



Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice:
can_fence_host_with_device: iscsi-stonith-device can fence (reboot)
serverHA2: static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: remote_op_done:
Operation reboot of serverHA2 by serverHA1 for
stonith_admin.1907@serverHA1.70b75107: OK
Jan 30 12:13:11 [2631] serverHA2 crmd: crit: tengine_stonith_notify:
We were alegedly just fenced by serverHA1 for serverHA1!
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
--- 0.51.86 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
+++ 0.51.87 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: +
/cib: @num_updates=87
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: +
/cib/status/node_state[@id='serverHA2']:
@crm-debug-origin=send_stonith_update, @join=down, @expected=down
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request:
Completed cib_modify operation for section status: OK (rc=0,
origin=serverHA1/crmd/333, version=0.51.87)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
--- 0.51.87 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
+++ 0.51.88 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: --
/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2']
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: +
/cib: @num_updates=88
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='serverHA2']/lrm: OK (rc=0, origin=serverHA1/crmd/334,
version=0.51.88)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
--- 0.51.88 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff:
+++ 0.51.89 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: --
/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2']
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: +
/cib: @num_updates=89
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request:
Completed cib_delete operation for section
//node_state[@uname='serverHA2']/transient_attributes: OK (rc=0,
origin=serverHA1/crmd/335, version=0.51.89)
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice:
can_fence_host_with_device: iscsi-stonith-device can fence (on) serverHA2:
static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice:
can_fence_host_with_device: iscsi-stonith-device can fence (on) serverHA2:
static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: info:
stonith_fence_get_devices_cb: Found 1 matching devices for 'serverHA2'
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: log_operation:
Operation 'on' [3037] (call 9 from crmd.2514) for host 'serverHA2' with device
'iscsi-stonith-device' returned: 0 (OK)
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: remote_op_done:
Operation on of serverHA2 by serverHA2 for crmd.2514@serverHA1.e19629dc: OK



I will continue testing....


Andrea


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Hi,

I tried a network failure and it works.
During the failure, each node tries to fence the other node.
When the network comes back, the node that had the network problem is fenced and reboots.
Moreover, cman on one node kills cman on the other, typically node1 kills cman on node2, so I have 2 situations:

1) Network failure on node2
When the network comes back, node2 is fenced and cman is killed on node2.
The watchdog script checks for the key registration and reboots node2 (a sketch of this check follows the list).
After the reboot the cluster comes back with 2 nodes up.

2) Network failure on node1
When the network comes back, node1 is fenced, and cman is killed on node2 (the cluster is down!).
The watchdog script checks for the key registration and reboots node1.
During the reboot the cluster is offline, because node1 is rebooting and cman on node2 was killed.
After the reboot, node1 is up and fences node2. Now the watchdog reboots node2.
After the reboot, the cluster comes back with 2 nodes up.
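A minimal sketch of the kind of key check such a watchdog script performs, assuming the shared LUN path and the local key are filled in for the real environment (the script described in this thread apparently skips the reboot when the device itself is unreachable; the sketch below treats an unreadable device the same as a missing key):

#!/bin/bash
# Reboot if this node's fence_scsi registration key is no longer present on the shared LUN.
DEV=/dev/disk/by-id/<shared-lun>   # placeholder for the real shared device
MYKEY=4d5a0001                     # placeholder: this node's registration key
if ! sg_persist -n --read-keys --device="$DEV" | grep -qi "$MYKEY"; then
    echo "fence_scsi key $MYKEY missing (or device unreadable), rebooting" | logger -t scsi-watchdog
    reboot -f
fi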


The only "problem" is downtime in situation 2, but it is acceptable for my
context.
I created my fence device with this command:
[ONE]pcs stonith create scsi fence_scsi pcmk_host_list="serverHA1 serverHA2"
pcmk_reboot_action="off" meta provides="unfencing" --force
as described here
https://access.redhat.com/articles/530533


If possible, I will test fence_vmware (without the watchdog script) and I will post my results here.

Thanks to all
Andrea




_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
That the fence failed until the network came back makes your fence method
less than ideal. Will it eventually fence with the network still failed?

Most importantly though: did cluster resources stay blocked while the fence
was pending? If so, then your cluster is safe, and that is the most
important part.

On 02/02/15 06:22 AM, Andrea wrote:
> Hi,
>
> I tryed a network failure and it works.
> During failure, each node try to fence other node.
> When network come back, the node with network problem is fenced and reboot.
> Moreover, the cman kill(cman) on one node, tipically node1 kill(cman) on
> node2, so, I have 2 situations:
>
> 1) Network failure on node2
> When network come back, node2 is fenced and cman kill (cman) on node2 .
> Watchdog script check for key registration, and reboot node2.
> After reboot cluster come back with 2 nodes up.
>
> 2) Network failure on node1
> When network come back, node1 is fenced, and cman kill(cman) on
> node2.(cluster is down!)
> Watchdog script check for key registration, and reboot node1.
> During reboot cluster is offline because node1 is rebooting and cman on node
> 2 was killed.
> After reboot, node1 is up and fence node2. Now, watchdog reboot node2.
> After reboot, cluster come back with 2 nodes up.
>
>
> The only "problem" is downtime in situation 2, but it is acceptable for my
> context.
> I created my fence device with this command:
> [ONE]pcs stonith create scsi fence_scsi pcmk_host_list="serverHA1 serverHA2"
> pcmk_reboot_action="off" meta provides="unfencing" --force
> as described here
> https://access.redhat.com/articles/530533
>
>
> If possible, I will test the fence_vmware (without Wachdog script) and i
> will post here my result
>
> thansk to all
> Andrea
>
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>


--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Digimer <lists@...> writes:

>
> That fence failed until the network came back makes your fence method
> less than ideal. Will it eventually fence with the network still failed?
>
> Most importantly though; Cluster resources blocked while the fence was
> pending? If so, then your cluster is safe, and that is the most
> important part.
>
Hi Digimer

For fencing I'm using a remote NAS, attached as an iSCSI target.
During a network failure, for example on node2, each node tries to fence the other node.
The fencing action on node1 succeeds, but on node2 it fails, because node2 can't see the iSCSI target (its network is down!).
I think that's why node2 doesn't reboot right away: it can't operate on the key reservation, so the watchdog can't check it.
When the network comes back, the watchdog can check the key registration again and reboots node2.

For the clustered filesystem I planned to use a ping resource with a location constraint, as described here:
http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch09s03s03s02.html
If the node can't see the iSCSI target, then stop AppServer, Filesystem, etc.

But it doesn't work. On the node with the network failure I see in the log that pingd is set to 0, but the Filesystem resource doesn't stop.
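For reference, the kind of constraint that document describes can be expressed with pcs roughly as follows (a sketch, assuming the ping resource keeps its default attribute name "pingd" and the filesystem clone is the clusterfs-clone seen in the logs):

[ONE] pcs constraint location clusterfs-clone rule score=-INFINITY pingd lt 1 or not_defined pingd

With a rule like this, the clone instance should be stopped on a node whose pingd attribute drops to 0 or disappears.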

I will continue testing...

Thanks
Andrea






_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Two node cluster and no hardware device for stonith. [ In reply to ]
Could you please give a hint: how do you use fencing when the nodes are all
in different geo-distributed datacenters? How do people do that? There could
be a network disconnection between the datacenters, and then we have no way
to send a stonith signal anywhere.

On Wednesday, February 4, 2015, Andrea <a.bacchi@codices.com> wrote:

> Digimer <lists@...> writes:
>
> >
> > That fence failed until the network came back makes your fence method
> > less than ideal. Will it eventually fence with the network still failed?
> >
> > Most importantly though; Cluster resources blocked while the fence was
> > pending? If so, then your cluster is safe, and that is the most
> > important part.
> >
> Hi Digimer
>
> I'm using for fencing a remote NAS, attached via iscsi target.
> During network failure, for example on node2, each node try to fence other
> node.
> Fencing action on node1 get success, but on node2 fail, because it can't
> see
> iscsi target(network is down!) .
> I thinks it's the reason why node2 doesn't reboot now, because it can't
> make
> operation on key reservation and watchdog can't check for this.
> When network come back, watchdog can check for key registration and reboot
> node2.
>
> For clustered filesystem I planned to use ping resource with location
> constraint as described here
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch09s03s03s02.html
> If the node can't see iscsi target..then..stop AppServer, Filesystem ecc
>
> But it doesn't works. In the node with network failure i see in the log
> that
> pingd is set to 0 but Filesystem resource doesn't stop.
>
> I will continue testing...
>
> Thanks
> Andrea
>
>
>
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
