Mailing List Archive

sbd fencing race
Hi list,

Last night I had a cluster in a fencing race using sbd as the stonith
device. I would like to know what effect start-delay has on my stonith
resource when used this way:

primitive stonith-sbd stonith:external/sbd \
params sbd_device="/dev/mapper/SBD" \
op start interval="0" start-delay="5"

Thanks

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: sbd fencing race
Hi,

On Tue, Nov 25, 2014 at 04:20:32PM +0100, emmanuel segura wrote:
> Hi list,
>
> Last night I had a cluster in a fencing race using sbd as the stonith
> device.

Can you give a bit more detail?

> I would like to know what effect start-delay has on my stonith
> resource when used this way:
>
> primitive stonith-sbd stonith:external/sbd \
> params sbd_device="/dev/mapper/SBD" \
> op start interval="0" start-delay="5"

Yes, that could help with a stonith deathmatch. Normally, you
have a stonith resource running on one node. On split brain, the
other node also starts the resource in order to shoot the first
node. That's where start-delay comes into play.

Ultimate resource for the issue: http://ourobengr.com/ha/

Cheers,

Dejan
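
The reasoning in the reply above can be sketched as a toy model. This is
only an illustration, not Pacemaker's actual scheduling logic: the Node
class and the time_to_fire and survivor functions are hypothetical names,
and the model assumes, as the reply describes, that the node already
running the stonith resource can shoot at once while the other node must
first start the resource and therefore waits for start-delay.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    runs_stonith: bool  # True if stonith-sbd is already started on this node


def time_to_fire(node: Node, start_delay: float) -> float:
    """Seconds until this node can issue its fencing request.

    Toy assumption: a node already running the stonith resource fires
    immediately; the other node has to start it first and so waits for
    start-delay.
    """
    return 0.0 if node.runs_stonith else float(start_delay)


def survivor(a: Node, b: Node, start_delay: float) -> str:
    """Return the name of the node that shoots first, or report a tie."""
    ta = time_to_fire(a, start_delay)
    tb = time_to_fire(b, start_delay)
    if ta == tb:
        return "both fenced (deathmatch)"
    return a.name if ta < tb else b.name


node01 = Node("node01", runs_stonith=True)
node02 = Node("node02", runs_stonith=False)

print(survivor(node01, node02, start_delay=0))  # both fenced (deathmatch)
print(survivor(node01, node02, start_delay=5))  # node01
```

In this model the delay only helps if the second node really has to start
the resource before it can fence; if it can fence without a start, both
times are zero and the tie remains.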

Re: sbd fencing race
But I would like to know whether Pacemaker needs to start sbd on the
node where the sbd resource isn't running in order to fence the other
nodes, because I don't see any start action on the second node:

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69794]: WARN: CIB: We do NOT have quorum!
message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69791]: WARN: Pacemaker health check: UNHEALTHY
message_2cd.txt:Nov 23 11:43:28 node01 pengine: [69823]: notice: LogActions: Leave stonith-sbd (Started node01)
message_2ch.txt:Nov 23 11:43:28 s02srv002ch sbd: [97640]: WARN: CIB: We do NOT have quorum!

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97640]: WARN: CIB: We do NOT have quorum!
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97637]: WARN: Pacemaker health check: UNHEALTHY
message_2ch.txt:Nov 23 11:43:28 node02 pengine: [97679]: WARN: custom_action: Action stonith-sbd_stop_0 on node01 is unrunnable (offline)
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Delivery process handling /dev/mapper/SBD01B0298700230
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Writing reset to node slot node01
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Messaging delay: 40

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Thanks

2014-11-26 10:26 GMT+01:00 Dejan Muhamedagic <dejanmm@fastmail.fm>:
> Yes, that could help with a stonith deathmatch. Normally, you
> have a stonith resource running on one node. On split brain, the
> other node also starts the resource in order to shoot the first
> node. That's where start-delay comes into play.

--
this is my life and I live it for as long as God wills

Re: sbd fencing race
On Wed, Nov 26, 2014 at 11:13:41AM +0100, emmanuel segura wrote:
> But I would like to know whether Pacemaker needs to start sbd on the
> node where the sbd resource isn't running in order to fence the other
> nodes, because I don't see any start action on the second node:

That's strange. I'd expect that a stonith resource needs to be
started (enabled) first. Perhaps that has changed, as seems to be
the case judging by your logs. I cannot offer any more
advice here, but I would still like to know the circumstances and
how it happened that the nodes shot each other.

Thanks,

Dejan


Re: sbd fencing race
I think Pacemaker doesn't care about the status of the sbd resource
when it needs to make a fencing call. That's my guess, but I hope
someone can give me more information.

Thanks


2014-11-26 15:11 GMT+01:00 Dejan Muhamedagic <dejanmm@fastmail.fm>:
> On Wed, Nov 26, 2014 at 11:13:41AM +0100, emmanuel segura wrote:
>> But I would like to know whether Pacemaker needs to start sbd on the
>> node where the sbd resource isn't running in order to fence the other
>> nodes, because I don't see any start action on the second node:
>
> That's strange. I'd expect that a stonith resource needs to be
> started (enabled) first. Perhaps that has changed, as seems to be
> the case judging by your logs. I cannot offer any more
> advice here, but I would still like to know the circumstances and
> how it happened that the nodes shot each other.