Mailing List Archive: resource-stickiness not working?

Nov 13, 2014, 10:52 AM

Post #1 of 5 (2336 views)

Here is a simple Active/Passive configuration with a single Dummy resource (see end of message). The resource-stickiness default is set to 100. I was assuming that this would be enough to keep the Dummy resource on the active node as long as the active node stays healthy. However, stickiness is not working as I expected in the following scenario:

1) The node testnode1, which is running the Dummy resource, reboots or crashes
2) Dummy resource fails over to node testnode2
3) testnode1 comes back up after reboot or crash
4) Dummy resource fails back to testnode1

I don't want the resource to fail back to the original node in step 4; that is why resource-stickiness is set to 100. The only way I can keep the resource from failing back is to set resource-stickiness to INFINITY. Is this the correct behavior of resource-stickiness? What am I missing? This is not what I understand from the documentation on clusterlabs.org. BTW, after reading various postings on failback issues, I tried setting on-fail to standby, but that doesn't seem to help either. Any help is appreciated!

Scott

node testnode1
node testnode2
primitive dummy ocf:heartbeat:Dummy \
        op start timeout="180s" interval="0" \
        op stop timeout="180s" interval="0" \
        op monitor interval="60s" timeout="60s" migration-threshold="5"
xml <rsc_location id="cli-prefer-dummy" rsc="dummy" role="Started" node="testnode2" score="INFINITY"/>
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        stonith-action="reboot" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1413378119"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100" \
        migration-threshold="5"

Re: resource-stickiness not working?

dvossel at redhat

Nov 14, 2014, 6:31 AM

Post #2 of 5 (2303 views)

I agree, this is curious.

Can you attach a crm_report? Then we can walk through the transitions to
figure out why this is happening.

-- Vossel

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Re: resource-stickiness not working?

dejanmm at fastmail

Nov 14, 2014, 7:28 AM

Post #3 of 5 (2302 views)

Hi,

On Thu, Nov 13, 2014 at 06:52:29PM +0000, Scott Donoho wrote:
> Is this the correct behavior of resource-stickiness? What am I missing?

You can try crm resource scores to see the allocation scores. But note
that the configuration below contains a location preference with a
score of INFINITY, so stickiness has to match that score.
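
As a rough illustration of why a stickiness of 100 cannot outweigh an INFINITY location preference, here is a small Python model of the score arithmetic. It is a sketch, not Pacemaker's allocation code: the function names, the node names "rejoined" and "current", and the rule that a tie keeps the resource where it is are assumptions made for this example. The saturation rules (INFINITY is 1,000,000, and adding anything to INFINITY gives INFINITY) follow the Pacemaker documentation.

```python
# Simplified model of Pacemaker score arithmetic; not Pacemaker's actual code.
INFINITY = 1_000_000  # scores at or beyond this value are treated as INFINITY


def add_scores(a, b):
    """Add two scores with saturation; -INFINITY wins over +INFINITY."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def choose_node(location_scores, current_node, stickiness):
    """Return the node with the highest total score.

    Assumption of this sketch: on a tie the resource stays where it is.
    """
    totals = {}
    for node, score in location_scores.items():
        # Stickiness only counts for the node the resource is running on.
        bonus = stickiness if node == current_node else 0
        totals[node] = add_scores(score, bonus)
    best = max(totals.values())
    if totals.get(current_node) == best:
        return current_node
    return max(totals, key=totals.get)


# Hypothetical situation: an INFINITY preference for the node that just rejoined.
prefs = {"rejoined": INFINITY, "current": 0}
print(choose_node(prefs, "current", 100))       # rejoined
print(choose_node(prefs, "current", INFINITY))  # current
```

With no location preference at all (both scores 0), a stickiness of 100 is already enough to keep the resource on its current node in this model.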

> Scott
>
> node testnode1
> node testnode2
> primitive dummy ocf:heartbeat:Dummy \
> op start timeout="180s" interval="0" \
> op stop timeout="180s" interval="0" \
> op monitor interval="60s" timeout="60s" migration-threshold="5"
> xml <rsc_location id="cli-prefer-dummy" rsc="dummy" role="Started" node="testnode2" score="INFINITY"/>

It looks like crmsh got confused here by the role being set to
Started, since it shows the constraint as raw XML.
Which crmsh version do you run?
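
Constraints whose id starts with cli-prefer- are the kind that a manual move or migrate of a resource leaves behind, so it can be worth checking for them whenever failback looks wrong. The following Python sketch scans a constraints section for such entries. The function name and the embedded XML sample are assumptions for illustration; the sample just wraps the rsc_location line from the configuration above in a constraints element, which is roughly what cibadmin -Q -o constraints would return.

```python
import xml.etree.ElementTree as ET

# Sample input: the constraint from the posted configuration, wrapped in a
# <constraints> element (an assumption about the surrounding CIB section).
CONSTRAINTS_XML = """
<constraints>
  <rsc_location id="cli-prefer-dummy" rsc="dummy" role="Started"
                node="testnode2" score="INFINITY"/>
</constraints>
"""


def find_cli_prefer_constraints(xml_text):
    """Return (id, resource, node, score) for each cli-prefer-* location constraint."""
    root = ET.fromstring(xml_text)
    found = []
    for loc in root.iter("rsc_location"):
        constraint_id = loc.get("id", "")
        if constraint_id.startswith("cli-prefer-"):
            found.append(
                (constraint_id, loc.get("rsc"), loc.get("node"), loc.get("score"))
            )
    return found


print(find_cli_prefer_constraints(CONSTRAINTS_XML))
# [('cli-prefer-dummy', 'dummy', 'testnode2', 'INFINITY')]
```

If such a constraint is no longer wanted, removing it (for example with crm resource unmigrate dummy) lets stickiness decide placement again.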

Thanks,

Dejan


Re: resource-stickiness not working?

sdonoho at cray

Nov 14, 2014, 11:44 AM

Post #4 of 5 (2291 views)

We are running the following versions:

crmsh 1.2.6
pacemaker 1.1.10
corosync 1.4.1

On 11/14/14 9:28 AM, "Dejan Muhamedagic" <dejanmm@fastmail.fm> wrote:

>Which crmsh version do you run?

Re: resource-stickiness not working?

andrew at beekhof

Nov 16, 2014, 10:03 PM

Post #5 of 5 (2254 views)

> On 14 Nov 2014, at 5:52 am, Scott Donoho <sdonoho@cray.com> wrote:
>
> 1) The node testnode1, which is running the Dummy resource, reboots or crashes
> 2) Dummy resource fails to node testnode2
> 3) testnode1 comes back up after reboot or crash

When this happens, the cluster will check what state Dummy is in on testnode1.
My guess is that Dummy thinks it is still active (based on a stale lock file), and recovery is initiated quickly enough that it looks like a 'normal' migration.
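
The guess above can be illustrated with a toy model in Python. This is not the Dummy agent's code: the class and function names are made up for the example, and it assumes the resource records its state in a file that survives the reboot. It only shows how a crash, which never runs stop, can leave a marker behind that makes the next probe report the resource as running.

```python
import os
import tempfile


class StateFileResource:
    """Toy resource whose 'running' status is just the presence of a file."""

    def __init__(self, state_file):
        self.state_file = state_file

    def start(self):
        # Create the marker file that monitor() looks for.
        with open(self.state_file, "w"):
            pass

    def stop(self):
        # A clean stop removes the marker file.
        if os.path.exists(self.state_file):
            os.remove(self.state_file)

    def monitor(self):
        return "running" if os.path.exists(self.state_file) else "stopped"


def probe_after_restart(clean_shutdown):
    """Start the resource, end it cleanly or by 'crash', then probe it."""
    with tempfile.TemporaryDirectory() as tmp:
        rsc = StateFileResource(os.path.join(tmp, "dummy.state"))
        rsc.start()
        if clean_shutdown:
            rsc.stop()
        # After a crash nothing ran stop(), so the marker file is still
        # there (assuming its directory survives the reboot).
        return rsc.monitor()


print(probe_after_restart(clean_shutdown=True))   # stopped
print(probe_after_restart(clean_shutdown=False))  # running
```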

> 4) Dummy resource fails back to testnode1