Mailing List Archive

resource-stickiness not working?
Here is a simple Active/Passive configuration with a single Dummy resource (see end of message). The resource-stickiness default is set to 100. I was assuming that this would be enough to keep the Dummy resource on the active node as long as the active node stays healthy. However, stickiness is not working as I expected in the following scenario:

1) The node testnode1, which is running the Dummy resource, reboots or crashes
2) Dummy resource fails over to node testnode2
3) testnode1 comes back up after reboot or crash
4) Dummy resource fails back to testnode1

I don't want the resource to fail back to the original node in step 4. That is why resource-stickiness is set to 100. The only way I can keep the resource from failing back is to set resource-stickiness to INFINITY. Is this the correct behavior of resource-stickiness? What am I missing? This is not what I understand from the documentation on clusterlabs.org. BTW, after reading various postings on failback issues, I played with setting on-fail to standby, but that doesn't seem to help either. Any help is appreciated!

Scott

node testnode1
node testnode2
primitive dummy ocf:heartbeat:Dummy \
        op start timeout="180s" interval="0" \
        op stop timeout="180s" interval="0" \
        op monitor interval="60s" timeout="60s" migration-threshold="5"
xml <rsc_location id="cli-prefer-dummy" rsc="dummy" role="Started" node="testnode2" score="INFINITY"/>
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        stonith-action="reboot" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1413378119"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100" \
        migration-threshold="5"
Re: resource-stickiness not working?

I agree, this is curious.

Can you attach a crm_report? Then we can walk through the transitions to
figure out why this is happening.

-- Vossel

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: resource-stickiness not working?
Hi,

On Thu, Nov 13, 2014 at 06:52:29PM +0000, Scott Donoho wrote:
> I don't want the resource to fail back to the original node in step 4. That is why resource-stickiness is set to 100. The only way I can keep the resource from failing back is to set resource-stickiness to INFINITY. Is this the correct behavior of resource-stickiness? What am I missing? This is not what I understand from the documentation on clusterlabs.org. BTW, after reading various postings on failback issues, I played with setting on-fail to standby, but that doesn't seem to help either. Any help is appreciated!

You can try "crm resource scores" to see the allocation scores. But note
that below you have a location preference of INFINITY, hence stickiness
has to match that score.
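
To illustrate why a stickiness of 100 cannot compete with an INFINITY location preference, here is a minimal sketch of the score arithmetic in Python. It is a simplified model, not the actual policy engine: the function names and the node names node_a and node_b are made up for illustration, and the only Pacemaker facts assumed are that INFINITY is 1000000, that adding anything to INFINITY gives INFINITY, and that -INFINITY wins over +INFINITY.

```python
# Simplified model of Pacemaker score arithmetic; not the real policy engine.
INFINITY = 1_000_000  # scores at or beyond this value are treated as infinite


def add_scores(a: int, b: int) -> int:
    """Add two scores using Pacemaker's infinity rules."""
    # -INFINITY wins over everything, including +INFINITY.
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    # Finite sums are clamped to the infinite range.
    return max(-INFINITY, min(INFINITY, a + b))


def node_scores(location: dict, stickiness: int, current: str) -> dict:
    """Per-node score: location preference plus stickiness on the current node."""
    scores = {}
    for node, pref in location.items():
        bonus = stickiness if node == current else 0
        scores[node] = add_scores(pref, bonus)
    return scores


# Hypothetical situation: the resource runs on "node_a" while a location
# constraint prefers "node_b" with INFINITY (as cli-prefer-dummy does).
location = {"node_a": 0, "node_b": INFINITY}

weak = node_scores(location, stickiness=100, current="node_a")
strong = node_scores(location, stickiness=INFINITY, current="node_a")

print(weak)    # node_b outscores node_a, so the resource would move
print(strong)  # both nodes tie at INFINITY in this model
```

In this model the preferred node always wins unless stickiness is itself INFINITY, which matches the behavior described in the original post.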

> xml <rsc_location id="cli-prefer-dummy" rsc="dummy" role="Started" node="testnode2" score="INFINITY"/>

Looks like here crmsh got confused by the role set to Started.
Which crmsh version do you run?

Thanks,

Dejan

Re: resource-stickiness not working?
We are running the following versions:

crmsh 1.2.6
pacemaker 1.1.10
corosync 1.4.1



On 11/14/14 9:28 AM, "Dejan Muhamedagic" <dejanmm@fastmail.fm> wrote:

>Looks like here crmsh got confused by the role set to Started.
>Which crmsh version do you run?
Re: resource-stickiness not working?
> On 14 Nov 2014, at 5:52 am, Scott Donoho <sdonoho@cray.com> wrote:
>
> 1) The node testnode1, which is running the Dummy resource, reboots or crashes
> 2) Dummy resource fails over to node testnode2
> 3) testnode1 comes back up after reboot or crash

When this happens, the cluster will check what state Dummy is in on testnode1.
My guess is that Dummy thinks it is still active (based on a stale lock file) and recovery is initiated quickly enough that it looks like a 'normal' migration.
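
As a rough illustration of that guess, here is a toy Python model of an agent whose monitor only checks for a state file. It is not the actual ocf:heartbeat:Dummy code: the function names and the file path are hypothetical, and it assumes the file lives somewhere that survives the reboot. The return codes 0 (running) and 7 (not running) are the standard OCF values.

```python
# Toy model of a state-file-based monitor; not the actual ocf:heartbeat:Dummy code.
import os
import tempfile

OCF_SUCCESS = 0      # OCF return code: resource is running
OCF_NOT_RUNNING = 7  # OCF return code: resource is cleanly stopped


def start(state_file: str) -> int:
    """'Start' the resource by creating its state file."""
    with open(state_file, "w"):
        pass
    return OCF_SUCCESS


def stop(state_file: str) -> int:
    """'Stop' the resource by removing its state file."""
    if os.path.exists(state_file):
        os.remove(state_file)
    return OCF_SUCCESS


def monitor(state_file: str) -> int:
    """Report running if and only if the state file exists."""
    return OCF_SUCCESS if os.path.exists(state_file) else OCF_NOT_RUNNING


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical path standing in for the agent's state file location.
    state_file = os.path.join(tmp, "Dummy-dummy.state")

    start(state_file)
    # Simulated crash: the node goes away without stop() ever being called,
    # so the file is still there when the node rejoins and gets probed.
    probe_after_crash = monitor(state_file)

    stop(state_file)
    probe_after_clean_stop = monitor(state_file)

print(probe_after_crash, probe_after_clean_stop)  # prints "0 7"
```

If the probe on the rejoined node reports the resource as running, the cluster sees it as active in two places and has to recover, which would explain a move that stickiness alone does not prevent.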

> 4) Dummy resource fails back to testnode1