Mailing List Archive

Split brain and STONITH behavior (VMware fencing)
Hello,

I'm trying to understand how STONITH works.

I have 2 VMware VMs (moon1a, moon1b) on two different hosts. Each has 2 NICs
assigned: eth0 for heartbeat and eth1 for everything else.

This is my testing configuration:

    node $id="168428034" moon1a
    node $id="168428035" moon1b
    primitive Foo ocf:heartbeat:Dummy
    primitive stonith_moon1a stonith:fence_vmware_soap \
        params ipaddr="192.168.1.134" login="foo" \
        uuid="42053b22-d3fd-25fe-6fb3-7cb2c7cd2c63" \
        action="off" verbose="true" passwd="bar" \
        ssl="true" \
        op monitor interval="60s"
    primitive stonith_moon1b stonith:fence_vmware_soap \
        params ipaddr="192.168.1.134" login="foo" \
        uuid="4205b986-4426-5de4-1069-b10a77123bc4" \
        action="off" verbose="true" passwd="bar" \
        ssl="true" \
        op monitor interval="60s"
    clone FooClones Foo
    location loc_stonith_moon1a stonith_moon1a -inf: moon1a
    location loc_stonith_moon1b stonith_moon1b -inf: moon1b
    property $id="cib-bootstrap-options" \
        dc-version="1.1.10-42f2063" \
        cluster-infrastructure="corosync" \
        stonith-enabled="true" \
        last-lrm-refresh="1414565715"
    rsc_defaults $id="rsc-options" \
        resource-stickiness="200"


The vCenter is at 192.168.1.134 and the UUIDs were taken from a list generated by
fence_vmware_soap.

When I do fencing manually using:

    # fence_vmware_soap -z -a 192.168.1.134 \
        -l foo -p bar \
        -U 4205b986-4426-5de4-1069-b10a77123bc4 \
        -o off

from moon1a, the moon1b VM (4205b986-4426-5de4-1069-b10a77123bc4) dies as
expected, so the configuration should be right, I think.

But so far I can't emulate split brain by killing corosync like this:

    # killall -9 corosync


My questions:

1. Is my configuration correct?
2. How does one cause a split-brain to trigger the expected STONITH
   behavior?



Thank you,
Ariel


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Split brain and STONITH behavior (VMware fencing)
You can try to force a split brain by shutting down the heartbeat NICs while keeping corosync running on both nodes.

Regards Sven

Re: Split brain and STONITH behavior (VMware fencing)
On Wed, Oct 29, 2014 at 10:46 AM, Ariel S <ariel_bis2030@yahoo.co.id> wrote:
> But so far I can't emulate split brain by killing corosync like this:
>
> # killall -9 corosync
>

Killing corosync is not, strictly speaking, split-brain; it emulates a
(partial) node failure.

>
> My questions:
>
> 1. Is my configuration correct?

On a two-node cluster you also need no-quorum-policy=ignore; otherwise the
remaining node won't initiate fencing.
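
The reasoning can be sketched with the usual majority rule, under which a partition has quorum only when it holds more than half of the cluster's nodes. The `has_quorum` and `report` helpers below are hypothetical names used purely to illustrate the arithmetic; they are not Pacemaker commands. With two nodes, a lone survivor holds exactly half, so it never has quorum on its own.

```shell
#!/bin/sh
# Illustration only: has_quorum and report are hypothetical helpers,
# not Pacemaker tools.
# Majority rule: a partition has quorum when it holds more than half of
# the nodes, i.e. total < 2 * active.
has_quorum() {
    total=$1
    active=$2
    [ "$total" -lt $((2 * active)) ]
}

# Print the quorum state of a partition with $2 of $1 nodes.
report() {
    if has_quorum "$1" "$2"; then
        echo "$2 of $1 nodes: quorum"
    else
        echo "$2 of $1 nodes: no quorum"
    fi
}

report 2 2   # both nodes up
report 2 1   # one node of a two-node cluster left
report 3 2   # two nodes of a three-node cluster left
```

A three-node cluster keeps quorum after losing one node, which is why the extra property matters specifically for the two-node case.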

Re: Split brain and STONITH behavior (VMware fencing)
Thank you for the replies. I tried severing the heartbeat NIC and set
`no-quorum-policy` to `ignore` (I forgot that one) like this:

# crm configure property no-quorum-policy=ignore

and it works nicely.
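
One way to double-check that both properties discussed in this thread are present is to scan the configuration text. This is a minimal sketch: `check_two_node_props` is a hypothetical helper name, and it assumes the properties appear as `name=value` or `name="value"`, as in the configuration posted above.

```shell
#!/bin/sh
# Sketch: read a crm configuration from stdin and report whether
# stonith-enabled=true and no-quorum-policy=ignore are both set.
# check_two_node_props is a hypothetical helper, not part of crmsh.
# Usage on a cluster node: crm configure show | check_two_node_props
check_two_node_props() {
    cfg=$(cat)
    status=0
    for pair in stonith-enabled:true no-quorum-policy:ignore; do
        name=${pair%%:*}
        value=${pair#*:}
        # Accept both quoted and unquoted values.
        if printf '%s\n' "$cfg" | grep -Eq -- "$name=\"?$value\"?"; then
            echo "ok: $name=$value"
        else
            echo "missing: $name=$value"
            status=1
        fi
    done
    return "$status"
}
```

The function returns a non-zero status when either property is missing, so it can also be used in a script before running a fencing test.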


Thank you,
Ariel


On 10/29/2014 03:27 PM, Sven Moeller wrote:
> You can try to force a split brain by shutting down the heartbeat NICs and keep corosync running on both nodes.
>
> Regards Sven

Re: Split brain and STONITH behavior (VMware fencing)
> On 29 Oct 2014, at 7:48 pm, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>
> On Wed, Oct 29, 2014 at 10:46 AM, Ariel S <ariel_bis2030@yahoo.co.id> wrote:
>> But so far I can't emulate split brain by killing corosync like this:
>>
>> # killall -9 corosync
>>
>
> Killing corosync is not, strictly speaking, split-brain; it emulates a
> (partial) node failure.

It's almost the opposite of split-brain.
For split-brain, both sides need to believe they are fully functional and that it is the other side that has a problem.

>> 2. How does one cause a split-brain to trigger the expected stonith
>> behavior?

Use a firewall to block corosync's ports on both hosts.
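
A minimal sketch of that approach follows. It assumes iptables is the firewall in use and that corosync runs on its default UDP ports 5404 and 5405 (check `mcastport` in corosync.conf). The helper names `run`, `block_corosync` and `unblock_corosync` are hypothetical. By default the script only prints the commands; nothing is changed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: drop corosync traffic so that each node believes the other
# one has failed while both keep running.
# Assumptions: default corosync UDP ports 5404-5405, iptables firewall.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# and run as root on both nodes to apply them.
COROSYNC_PORTS="${COROSYNC_PORTS:-5404:5405}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Insert DROP rules for corosync traffic in both directions.
block_corosync() {
    run iptables -I INPUT -p udp --dport "$COROSYNC_PORTS" -j DROP
    run iptables -I OUTPUT -p udp --dport "$COROSYNC_PORTS" -j DROP
}

# Remove the same rules again after the test.
unblock_corosync() {
    run iptables -D INPUT -p udp --dport "$COROSYNC_PORTS" -j DROP
    run iptables -D OUTPUT -p udp --dport "$COROSYNC_PORTS" -j DROP
}

block_corosync
```

If fencing works, one of the nodes should be powered off shortly after the rules are applied; `unblock_corosync` then needs to be run on the surviving node before the other one rejoins.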
