Mailing List Archive

Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Thu, Oct 25, 2012 at 01:24:40AM -0700, Takatoshi MATSUO wrote:
> check existence of instance number in replication mode
> because Pacemaker 1.1.8 or higher do not append instance numbers.

I think this is wrong.

It seems this became "necessary" because of

commit 427c7fe6ea94a566aaa714daf8d214290632f837
Author: Andrew Beekhof <andrew@beekhof.net>
Date: Fri Jul 13 13:37:42 2012 +1000

High: PE: Do not append instance numbers to anonymous clones

Benefits:
- they shouldnt have been exposed in the first place, but I didnt know how not to back then
- if admins don't know what they are, they can't be misunderstood or misused
- more reliable failcount and promotion scores (since you dont have to check for all possible permutations)
- smaller status section since there cant be entries for each possible :N suffix
- the name in the config corresponds to the resource in the logs


So if pgsql thinks it needs these instance numbers,
maybe it is not so "anonymous" a clone, after all?

Would the existing resource agent work with globally-unique=true ?

Lars

>
> You can merge this Pull Request by running:
>
> git pull https://github.com/t-matsuo/resource-agents check-instance-number
>
> Or you can view, comment on it, or merge it online at:
>
> https://github.com/ClusterLabs/resource-agents/pull/159
>
> -- Commit Summary --
>
> * Low: pgsql: check existence of instance number in replication mode
>
> -- File Changes --
>
> M heartbeat/pgsql (44)
>
> -- Patch Links --
>
> https://github.com/ClusterLabs/resource-agents/pull/159.patch
> https://github.com/ClusterLabs/resource-agents/pull/159.diff
>
>
> ---
> Reply to this email directly or view it on GitHub:
> https://github.com/ClusterLabs/resource-agents/pull/159

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
Usually, an RA uses the "crm_master" command instead of "crm_attribute" to
change its own master score.
But a PostgreSQL slave can't get its own replication status, so the master
changes the slave's master score, using the instance number on
Pacemaker 1.0.x.
This is probably not ordinary usage.

> So if pgsql thinks it needs these instance numbers,
> maybe it is not so "anonymous" a clone, after all?
>
> Would the existing resource agent work with globally-unique=true ?

No, I use it with false, and it doesn't need true.

--
Takatoshi MATSUO


Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
> Usually, we use "crm_master" command instead of "crm_attribute" to change master score in RA.
> But PostgreSQL's slave can't get own replication status, so Master changes Slave's master-score
> using instance number on Pacemaker 1.0.x .
> This probably is not ordinary usage.
>
> > Would the existing resource agent work with globally-unique=true ?
>
> I don't know whether it works with true.
> I use it with false and it doesn't need true.

I suggested that you actually should use globally-unique clones,
as in that case you still get those instance numbers...

But thinking about it once more, I'm not so sure anymore.

Correct me where I'm wrong.

This is about the master score.
In case the Master instance fails, we preferably want to promote the
slave instance that is as close as possible to the Master.
We only know which *node* was "best" at the last monitoring interval,
which may be "good enough".

We need to then change the master score for *all possible instances*,
for all nodes, accordingly.

Which is what that loop did.
(I think skipping the "current" instance is actually a bug;
if pacemaker relabels things in a "bad way", you may hit it).

Now, with pacemaker 1.1.8, all instances become "equal"
(for anonymous clones, aka globally-unique=false),
and we only need to set the score on the resource-id,
not for all resource-id:instance combinations.

Which is great. After all, the master score in this case is attached to
the node (or, the data set accessible from that node), and not to the
(arbitrary, potentially relabeled "anytime") instance number pacemaker
assigned to the clone instance running on that node.


And that is exactly what your patch does:
* detect if a version of pacemaker is in use that attaches the instance
number to the resource id
* if so, do the loop on all possible instance numbers as before
* if not, only set the master score on the resource-id


Is my understanding correct?
Then I think your patch is good.
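
The three steps listed above can be sketched in shell roughly as follows.
This is a minimal sketch under stated assumptions, not the code from the
pull request: the function names has_instance_number and master_attr_names
are hypothetical, and the number of clone instances is passed in as a plain
argument (an agent would presumably derive it from the clone-max meta
attribute). Unlike the old loop, it does not skip the current instance.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from heartbeat/pgsql.

# Succeed if the resource id ends in a numeric ":N" instance suffix.
has_instance_number() {
    suffix="${1##*:}"
    # No colon at all: the expansion leaves the id unchanged.
    [ "$suffix" != "$1" ] || return 1
    case "$suffix" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric suffix
        *) return 0 ;;
    esac
}

# Print the master-score attribute names to update, one per line.
# $1: resource id as seen by the agent
# $2: number of clone instances (assumed to be known by the caller)
master_attr_names() {
    base="${1%%:*}"
    if has_instance_number "$1"; then
        # Old behaviour: one attribute per possible instance number.
        i=0
        while [ "$i" -lt "$2" ]; do
            echo "master-${base}:${i}"
            i=$((i + 1))
        done
    else
        # New behaviour: a single attribute on the plain resource id.
        echo "master-${base}"
    fi
}
```

The caller would then set the score once per printed name, for each node.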

Still, other resource agents that use master scores (or any other
attributes that reference instance numbers of anonymous clones)
need to be reviewed.

Though this "I'll set scores for other instances, not only myself"
logic is unique to pgsql, so most other resource agents should "just
work" with whatever is present in the environment; they typically treat
$OCF_RESOURCE_INSTANCE as opaque.

Thanks,
Lars

Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
> On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
> > Usually, we use "crm_master" command instead of "crm_attribute" to change master score in RA.
> > But PostgreSQL's slave can't get own replication status, so Master changes Slave's master-score
> > using instance number on Pacemaker 1.0.x .
> > This probably is not ordinary usage.
> >
> > > Would the existing resource agent work with globally-unique=true ?
> >
> > I don't know whether it works with true.
> > I use it with false and it doesn't need true.
>
> I suggested that you actually should use globally-unique clones,
> as in that case you still get those instance numbers...

Does using different clones make sense in pgsql? What is to be
different between them? Or would it be just for the sake of
getting instance numbers? If so, then it somehow looks wrong to
me :)

> But thinking about it once more, I'm not so sure anymore.
>
> Correct me where I'm wrong.
>
> This is about the master score.
> In case the Master instance fails, we preferably want to promote the
> slave instance that is as close as possible to the Master.
> We only know which *node* was "best" at the last monitoring interval,
> which may be "good enough".
>
> We need to then change the master score for *all possible instances*,
> for all nodes, accordingly.
>
> Which is what that loop did.
> (I think skipping the "current" instance is actually a bug;
> if pacemaker relabels things in a "bad way", you may hit it).
>
> Now, with pacemaker 1.1.8, all instances become "equal"
> (for anonymous clones, aka globally-unique=false),
> and we only need to set the score on the resource-id,
> not for all resource-id:instance combinations.

OK.

> Which is great. After all, the master score in this case is attached to
> the node (or, the data set accessible from that node), and not to the
> (arbitrary, potentially relabeled "anytime") instance number pacemaker
> assigned to the clone instance running on that node.
>
>
> And that is exactly what your patch does:
> * detect if a version of pacemaker is in use that attaches the instance
> number to the resource id
> * if so, do the loop on all possible instance numbers as before
> * if not, only set the master score on the resource-id
>
>
> Is my understanding correct?
> Then I think your patch is good.

Yes, the patch seems good then. Though there is quite a bit of
code repetition. The "set attribute part" should be moved to an
extra function.
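
As an illustration of that suggestion, the repeated crm_attribute call
could be wrapped once and reused by both code paths. A minimal sketch,
assuming a hypothetical function name set_master_score and a hypothetical
CRM_ATTRIBUTE override that allows a dry run without a cluster; only the
-l, -N, -n and -v options of crm_attribute are used:

```shell
#!/bin/sh
# Hypothetical override so the sketch can be dry-run without a cluster,
# e.g. CRM_ATTRIBUTE="echo crm_attribute".
: "${CRM_ATTRIBUTE:=crm_attribute}"

# Set a transient (lifetime "reboot") master score on a given node.
# $1: node name, $2: attribute name, $3: score
set_master_score() {
    # $CRM_ATTRIBUTE is deliberately unquoted so that an override such as
    # "echo crm_attribute" is split into a command and its first argument.
    $CRM_ATTRIBUTE -l reboot -N "$1" -n "$2" -v "$3"
}
```

Both the per-instance loop and the single-name path could then call
set_master_score, differing only in the attribute name they pass.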

> Still, other resource agents that use master scores (or any other
> attributes that reference instance numbers of anonymous clones)
> need to be reviewed.
>
> Though this "I'll set scores for other instances, not only myself"
> logic is unique to pgsql, so most other resource agents should "just
> work" with whatever is present in the environment, they typically treat
> the $OCF_RESOURCE_INSTANCE as opaque.

Seems like no other RA uses instance numbers. However, quite a
few use OCF_RESOURCE_INSTANCE which, in case of clone/ms
resources, may potentially lead to unpredictable results on
upgrade to 1.1.8.

> Thanks,
> Lars

Cheers,

Dejan
Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Thu, Oct 25, 2012 at 10:01 PM, Takatoshi MATSUO <matsuo.tak@gmail.com> wrote:
> Usually, we use "crm_master" command instead of "crm_attribute" to
> change own master score in RA.
> But PostgreSQL's Slave can't get own replication status, so Master
> changes Slave's master-score
> using instance number on Pacemaker 1.0.x .
> This probably is not ordinary usage.

Ouch! No, not ordinary (or recommended) at all :-)
What does the crm_attribute command line look like? Maybe the --node
option could help?

Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Fri, Oct 26, 2012 at 12:52 AM, Dejan Muhamedagic <dejan@suse.de> wrote:
> Seems like no other RA uses instance numbers. However, quite a
> few use OCF_RESOURCE_INSTANCE which, in case of clone/ms
> resources, may potentially lead to unpredictable results on
> upgrade to 1.1.8.

No. Otherwise all the regression tests would fail. The PE is smart
enough to find promotion score and failcounts in either case.
Also, OCF_RESOURCE_INSTANCE contains whatever the local lrmd knows the
resource as, not what we call it internally to the PE.

Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
2012/10/26 Andrew Beekhof <andrew@beekhof.net>:
> On Thu, Oct 25, 2012 at 10:01 PM, Takatoshi MATSUO <matsuo.tak@gmail.com> wrote:
>> Usually, we use "crm_master" command instead of "crm_attribute" to
>> change own master score in RA.
>> But PostgreSQL's Slave can't get own replication status, so Master
>> changes Slave's master-score
>> using instance number on Pacemaker 1.0.x .
>> This probably is not ordinary usage.
>
> Ouch! No, not ordinary (or recommended) at all :-)
> What does the crm_attribute command line look like? Maybe the --node
> option could help?

# crm_attribute -l reboot -N pm02 -n "master-pgsql:1" -v "1000"

This line is modeled on what crm_master does.
I would like crm_master to have a parameter for setting the hostname.


But these days crm_master gets the hostname using the "crm_node -n" command,
so I think I should fix the method of getting the hostname for the next version.
It also needs compatibility code for Pacemaker 1.0.x :(

--
Thanks,
Takatoshi MATSUO
Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
2012/10/25 Dejan Muhamedagic <dejan@suse.de>:
> On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
>> On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
>> > Usually, we use "crm_master" command instead of "crm_attribute" to change master score in RA.
>> > But PostgreSQL's slave can't get own replication status, so Master changes Slave's master-score
>> > using instance number on Pacemaker 1.0.x .
>> > This probably is not ordinary usage.
>> >
>> > > Would the existing resource agent work with globally-unique=true ?
>> >
>> > I don't know whether it works with true.
>> > I use it with false and it doesn't need true.
>>
>> I suggested that you actually should use globally-unique clones,
>> as in that case you still get those instance numbers...
>
> Does using different clones make sense in pgsql? What is to be
> different between them? Or would it be just for the sake of
> getting instance numbers? If so, then it somehow looks wrong to
> me :)

It makes no sense to use different clones.
pgsql only uses instance numbers to change the master score on other nodes.
On Pacemaker 1.0.x the master score needs them regardless of globally-unique.

>
>> But thinking about it once more, I'm not so sure anymore.
>>
>> Correct me where I'm wrong.
>>
>> This is about the master score.
>> In case the Master instance fails, we preferably want to promote the
>> slave instance that is as close as possible to the Master.
>> We only know which *node* was "best" at the last monitoring interval,
>> which may be "good enough".
>>
>> We need to then change the master score for *all possible instances*,
>> for all nodes, accordingly.
>>
>> Which is what that loop did.
>> (I think skipping the "current" instance is actually a bug;
>> if pacemaker relabels things in a "bad way", you may hit it).
>>
>> Now, with pacemaker 1.1.8, all instances become "equal"
>> (for anonymous clones, aka globally-unique=false),
>> and we only need to set the score on the resource-id,
>> not for all resource-id:instance combinations.
>
> OK.
>
>> Which is great. After all, the master score in this case is attached to
>> the node (or, the data set accessible from that node), and not to the
>> (arbitrary, potentially relabeled "anytime") instance number pacemaker
>> assigned to the clone instance running on that node.
>>
>>
>> And that is exactly what your patch does:
>> * detect if a version of pacemaker is in use that attaches the instance
>> number to the resource id
>> * if so, do the loop on all possible instance numbers as before
>> * if not, only set the master score on the resource-id
>>
>>
>> Is my understanding correct?
>> Then I think your patch is good.
>
> Yes, the patch seems good then. Though there is quite a bit of
> code repetition. The "set attribute part" should be moved to an
> extra function.

I will improve it.

Thanks,
Takatoshi MATSUO
Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Fri, Oct 26, 2012 at 12:49 PM, Takatoshi MATSUO <matsuo.tak@gmail.com> wrote:
> 2012/10/26 Andrew Beekhof <andrew@beekhof.net>:
>> On Thu, Oct 25, 2012 at 10:01 PM, Takatoshi MATSUO <matsuo.tak@gmail.com> wrote:
>>> Usually, we use "crm_master" command instead of "crm_attribute" to
>>> change own master score in RA.
>>> But PostgreSQL's Slave can't get own replication status, so Master
>>> changes Slave's master-score
>>> using instance number on Pacemaker 1.0.x .
>>> This probably is not ordinary usage.
>>
>> Ouch! No, not ordinary (or recommended) at all :-)
>> What does the crm_attribute command line look like? Maybe the --node
>> option could help?
>
> # crm_attribute -l reboot -N pm02 -n "master-pgsql:1" -v "1000"

That looks fine; just drop the :1 (or use whatever is in OCF_RESOURCE_INSTANCE).
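
In shell terms, dropping the suffix amounts to a parameter expansion on
the resource id. A minimal sketch, assuming a hypothetical helper named
master_attr_name; it only prints the attribute name and does not call
crm_attribute itself:

```shell
#!/bin/sh
# Hypothetical helper: derive the master-score attribute name from a
# resource id, dropping a ":N" instance suffix if one is present.
master_attr_name() {
    echo "master-${1%%:*}"
}

# With Pacemaker 1.0.x style ids:  master_attr_name "pgsql:1"
# With 1.1.8 style ids:            master_attr_name "pgsql"
# Both print the same name, which could then be passed to
# crm_attribute via -n in place of "master-pgsql:1".
```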

>
> This line uses crm_master as a reference.
> I would like crm_master to have a parameter which can set hostname.

Probably not going to happen. crm_master is a convenience function
for the common use case.
It's fine to switch to crm_attribute for advanced usage.

Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Fri, Oct 26, 2012 at 11:36:53AM +1100, Andrew Beekhof wrote:
> On Fri, Oct 26, 2012 at 12:52 AM, Dejan Muhamedagic <dejan@suse.de> wrote:
> > On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
> >> On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
> >> > Usually, we use "crm_master" command instead of "crm_attribute" to change master score in RA.
> >> > But PostgreSQL's slave can't get own replication status, so Master changes Slave's master-score
> >> > using instance number on Pacemaker 1.0.x .
> >> > This probably is not ordinary usage.
> >> >
> >> > > Would the existing resource agent work with globally-unique=true ?
> >> >
> >> > I don't know whether it works with true.
> >> > I use it with false and it doesn't need true.
> >>
> >> I suggested that you actually should use globally-unique clones,
> >> as in that case you still get those instance numbers...
> >
> > Does using different clones make sense in pgsql? What is to be
> > different between them? Or would it be just for the sake of
> > getting instance numbers? If so, then it somehow looks wrong to
> > me :)
> >
> >> But thinking about it once more, I'm not so sure anymore.
> >>
> >> Correct me where I'm wrong.
> >>
> >> This is about the master score.
> >> In case the Master instance fails, we preferably want to promote the
> >> slave instance that is as close as possible to the Master.
> >> We only know which *node* was "best" at the last monitoring interval,
> >> which may be "good enough".
> >>
> >> We need to then change the master score for *all possible instances*,
> >> for all nodes, accordingly.
> >>
> >> Which is what that loop did.
> >> (I think skipping the "current" instance is actually a bug;
> >> if pacemaker relabels things in a "bad way", you may hit it).
> >>
> >> Now, with pacemaker 1.1.8, all instances become "equal"
> >> (for anonymous clones, aka globally-unique=false),
> >> and we only need to set the score on the resource-id,
> >> not for all resource-id:instance combinations.
> >
> > OK.
> >
> >> Which is great. After all, the master score in this case is attached to
> >> the node (or, the data set accessible from that node), and not to the
> >> (arbitrary, potentially relabeled "anytime") instance number pacemaker
> >> assigned to the clone instance running on that node.
> >>
> >>
> >> And that is exactly what your patch does:
> >> * detect if a version of pacemaker is in use that attaches the instance
> >> number to the resource id
> >> * if so, do the loop on all possible instance numbers as before
> >> * if not, only set the master score on the resource-id
> >>
> >>
> >> Is my understanding correct?
> >> Then I think your patch is good.
> >
> > Yes, the patch seems good then. Though there is quite a bit of
> > code repetition. The "set attribute part" should be moved to an
> > extra function.
> >
> >> Still, other resource agents that use master scores (or any other
> >> attributes that reference instance numbers of anonymous clones)
> >> need to be reviewed.
> >>
> >> Though this "I'll set scores for other instances, not only myself"
> >> logic is unique to pgsql, so most other resource agents should "just
> >> work" with whatever is present in the environment, they typically treat
> >> the $OCF_RESOURCE_INSTANCE as opaque.
> >
> > Seems like no other RA uses instance numbers. However, quite a
> > few use OCF_RESOURCE_INSTANCE which, in case of clone/ms
> > resources, may potentially lead to unpredictable results on
> > upgrade to 1.1.8.
>
> No. Otherwise all the regression tests would fail. The PE is smart
> enough to find promotion score and failcounts in either case.

Cool.

> Also, OCF_RESOURCE_INSTANCE contains whatever the local lrmd knows the
> resource as, not what we call it internally to the PE.

What I meant was that some RAs use OCF_RESOURCE_INSTANCE to name
local files which keep some kind of state. If
OCF_RESOURCE_INSTANCE changes on upgrade... Well, I guess that
the worst that can happen is for the probe to fail. But I didn't
take a closer look.
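
To make the concern concrete, here is a minimal sketch of how an RA could
derive a state file name that stays the same whether or not
OCF_RESOURCE_INSTANCE carries a ":N" suffix. The function names and the
STATE_DIR variable are hypothetical and are not taken from any shipped
resource agent; the only thing assumed from this thread is that the instance
number, when present, is appended to the resource id after a colon.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Succeeds if the name ends in ":<digits>", e.g. "pgsql:0".
has_instance_number() {
    suffix="${1##*:}"
    # No colon at all: the expansion returns the name unchanged.
    [ "$suffix" = "$1" ] && return 1
    case "$suffix" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric suffix
    esac
    return 0
}

# Prints the name without a trailing ":<digits>" suffix.
strip_instance_number() {
    if has_instance_number "$1"; then
        printf '%s\n' "${1%:*}"
    else
        printf '%s\n' "$1"
    fi
}

# Prints a state file path that does not depend on the instance number.
# STATE_DIR is an assumed variable; /tmp is only a placeholder default.
state_file_for() {
    printf '%s/%s.state\n' "${STATE_DIR:-/tmp}" "$(strip_instance_number "$1")"
}
```

With these definitions, state_file_for "$OCF_RESOURCE_INSTANCE" would give the
same path for "pgsql:0" and for "pgsql", so a file written before an upgrade
would still be found afterwards. Whether that is desirable for a given RA
(for example with globally-unique clones) would need checking case by case.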

Thanks,

Dejan

> >> Thanks,
> >> Lars
> >
> > Cheers,
> >
> > Dejan
Re: [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Mon, Oct 29, 2012 at 9:51 PM, Dejan Muhamedagic <dejan@suse.de> wrote:
> On Fri, Oct 26, 2012 at 11:36:53AM +1100, Andrew Beekhof wrote:
>> On Fri, Oct 26, 2012 at 12:52 AM, Dejan Muhamedagic <dejan@suse.de> wrote:
>> > On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
>> >> On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
>> >> > Usually, an RA uses the "crm_master" command rather than "crm_attribute" to change its master score.
>> >> > But a PostgreSQL slave can't determine its own replication status, so the Master changes the Slave's master score
>> >> > using the instance number on Pacemaker 1.0.x.
>> >> > This is probably not ordinary usage.
>> >> >
>> >> > > Would the existing resource agent work with globally-unique=true ?
>> >> >
>> >> > I don't know whether it works with true.
>> >> > I use it with false, and it doesn't need true.
>> >>
>> >> I suggested that you actually should use globally-unique clones,
>> >> as in that case you still get those instance numbers...
>> >
>> > Does using different clones make sense in pgsql? What is to be
>> > different between them? Or would it be just for the sake of
>> > getting instance numbers? If so, then it somehow looks wrong to
>> > me :)
>> >
>> >> But thinking about it once more, I'm not so sure anymore.
>> >>
>> >> Correct me where I'm wrong.
>> >>
>> >> This is about the master score.
>> >> In case the Master instance fails, we preferably want to promote the
>> >> slave instance that is as close as possible to the Master.
>> >> We only know which *node* was "best" at the last monitoring interval,
>> >> which may be "good enough".
>> >>
>> >> We need to then change the master score for *all possible instances*,
>> >> for all nodes, accordingly.
>> >>
>> >> Which is what that loop did.
>> >> (I think skipping the "current" instance is actually a bug;
>> >> if pacemaker relabels things in a "bad way", you may hit it).
>> >>
>> >> Now, with pacemaker 1.1.8, all instances become "equal"
>> >> (for anonymous clones, aka globally-unique=false),
>> >> and we only need to set the score on the resource-id,
>> >> not for all resource-id:instance combinations.
>> >
>> > OK.
>> >
>> >> Which is great. After all, the master score in this case is attached to
>> >> the node (or, the data set accessible from that node), and not to the
>> >> (arbitrary, potentially relabeled "anytime") instance number pacemaker
>> >> assigned to the clone instance running on that node.
>> >>
>> >>
>> >> And that is exactly what your patch does:
>> >> * detect if a version of pacemaker is in use that attaches the instance
>> >> number to the resource id
>> >> * if so, do the loop on all possible instance numbers as before
>> >> * if not, only set the master score on the resource-id
>> >>
>> >>
>> >> Is my understanding correct?
>> >> Then I think your patch is good.
>> >
>> > Yes, the patch seems good then. Though there is quite a bit of
>> > code repetition. The "set attribute part" should be moved to an
>> > extra function.
>> >
>> >> Still, other resource agents that use master scores (or any other
>> >> attributes that reference instance numbers of anonymous clones)
>> >> need to be reviewed.
>> >>
>> >> Though this "I'll set scores for other instances, not only myself"
>> >> logic is unique to pgsql, so most other resource agents should "just
>> >> work" with whatever is present in the environment, they typically treat
>> >> the $OCF_RESOURCE_INSTANCE as opaque.
>> >
>> > Seems like no other RA uses instance numbers. However, quite a
>> > few use OCF_RESOURCE_INSTANCE which, in case of clone/ms
>> > resources, may potentially lead to unpredictable results on
>> > upgrade to 1.1.8.
>>
>> No. Otherwise all the regression tests would fail. The PE is smart
>> enough to find promotion score and failcounts in either case.
>
> Cool.
>
>> Also, OCF_RESOURCE_INSTANCE contains whatever the local lrmd knows the
>> resource as, not what we call it internally to the PE.
>
> What I meant was that some RAs use OCF_RESOURCE_INSTANCE to name
> local files which keep some kind of state. If
> OCF_RESOURCE_INSTANCE changes on upgrade... Well, I guess that
> the worst that can happen is for the probe to fail.

Right. But only for attach/reattach.
And people should have maintenance-mode enabled at the point the probe
is run, so there is time to fix things up before the cluster does
anything about it.
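
As a rough illustration of that sequence, assuming the cluster property is
toggled with crm_attribute (other front ends can set the same property), the
steps around a detach/reattach upgrade might look like this:

```shell
# Before detaching: tell the cluster not to act on resource state.
crm_attribute --type crm_config --name maintenance-mode --update true

# ... upgrade, reattach, let the probes run, fix up anything that failed ...

# Once the probe results look sane, hand control back to the cluster.
crm_attribute --type crm_config --name maintenance-mode --update false
```

This is only a sketch of the ordering; the upgrade steps in the middle depend
on the distribution and are not shown.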

> But I didn't
> take a closer look.
>
> Thanks,
>
> Dejan
>
>> >> Thanks,
>> >> Lars
>> >
>> > Cheers,
>> >
>> > Dejan