Mailing List Archive

'stop' operation passes outdated set of instance attributes to RA
Hi,

I believe it is a bug that the 'stop' operation uses the set of instance
attributes from the original 'start' op, not the set that the successful
'reload' had. The corresponding pe-input has the correct set of attributes,
and the pre-stop 'notify' op uses the updated set of attributes too.
This is easily reproducible with resource agents 3.9.6 and trace_ra.

Pacemaker is at c529898.

Should I provide more information?

Best,
Vladislav

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: 'stop' operation passes outdated set of instance attributes to RA
> On 14 Feb 2015, at 1:10 am, Vladislav Bogdanov <bubble@hoster-ok.com> wrote:
>
> Should I provide more information?

Yes please.
I suspect the lrmd needs to update its parameter cache for the reload operation.

David?


Re: 'stop' operation passes outdated set of instance attributes to RA
----- Original Message -----
>
> Yes please.
> I suspect the lrmd needs to update its parameter cache for the reload
> operation.
>
> David?

This falls on the crmd, I believe. I haven't tested it, but I bet something
like this should fix it.

diff --git a/crmd/lrm.c b/crmd/lrm.c
index ead2e05..45641d2 100644
--- a/crmd/lrm.c
+++ b/crmd/lrm.c
@@ -186,6 +186,7 @@ update_history_cache(lrm_state_t * lrm_state, lrmd_rsc_info_t * rsc, lrmd_event_

     if (op->params &&
         (safe_str_eq(CRMD_ACTION_START, op->op_type) ||
+         safe_str_eq("reload", op->op_type) ||
          safe_str_eq(CRMD_ACTION_STATUS, op->op_type))) {

         if (entry->stop_params) {
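
To illustrate what the added condition is meant to change, here is a minimal, self-contained sketch in C. It is not Pacemaker code: struct history_entry, refreshes_stop_params() and record_op() are hypothetical stand-ins, the parameter set is reduced to a single string, and "monitor" is assumed to be the operation name behind CRMD_ACTION_STATUS. It only models the idea that the parameters kept for a later 'stop' are refreshed by certain completed operations, and that 'reload' needs to be one of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the crmd history entry; not Pacemaker's real type. */
struct history_entry {
    char stop_params[128]; /* parameters a later 'stop' would be run with */
};

/* True if a completed op of this type should refresh the cached stop
 * parameters. "start" and "monitor" model the existing condition (assumed
 * names); "reload" models the line added by the proposed patch. */
static bool refreshes_stop_params(const char *op_type)
{
    return strcmp(op_type, "start") == 0 ||
           strcmp(op_type, "reload") == 0 ||
           strcmp(op_type, "monitor") == 0;
}

/* Record a completed operation; ops without parameters, or of a type that
 * does not refresh the cache, leave the cached stop parameters untouched. */
static void record_op(struct history_entry *entry, const char *op_type,
                      const char *params)
{
    assert(entry != NULL && op_type != NULL);
    if (params != NULL && refreshes_stop_params(op_type)) {
        snprintf(entry->stop_params, sizeof(entry->stop_params), "%s", params);
    }
}
```

In this model, recording a "start" with one parameter string and then a "reload" with another leaves the second string in stop_params. If the "reload" comparison is dropped from refreshes_stop_params(), the entry keeps the 'start' value, which matches the symptom reported at the top of the thread.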



Re: 'stop' operation passes outdated set of instance attributes to RA
23.02.2015 05:20, Andrew Beekhof wrote:
>
> Yes please.

I'm not sure what else would be needed to reproduce and fix that.
On the one hand, everything in crm_report (except maybe the digest
hashes) will look fine. On the other hand, the variables are set to
outdated values, and that is visible in the RA traces. Maybe it is
enough just to try to reproduce it with my latest patch to resource
agents (included in 3.9.6)?
Steps are:
* create a clone resource (it is enough to set clone-max=1) with an RA
which supports both reload and notify (it may be simpler to
unconditionally set OCF_RESKEY_trace_ra=1 at the very beginning of the
resource agent, before the OCF framework is imported, to get traces of
all RA executions)
* enable notifications (and trace_ra) for that resource
* start the resource
* change parameters of the resource; that should cause a reload
* stop the resource
* compare the printenv output at the very beginning of the traces of
the start, reload, pre-stop notify and stop actions.

Everything should be clear once that is done, I think.
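
The last step (comparing the environment seen by different actions) could be automated along these lines. This is only a rough sketch in C, assuming each trace starts with printenv-style NAME=value lines; reskey_value() and same_reskey() are hypothetical helpers, not part of resource-agents or Pacemaker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Copy the value of OCF_RESKEY_<name> from a printenv-style dump into out
 * (truncating if needed). Returns false if the variable is not present. */
static bool reskey_value(const char *dump, const char *name,
                         char *out, size_t outlen)
{
    char key[128];
    int keylen = snprintf(key, sizeof(key), "OCF_RESKEY_%s=", name);

    assert(dump != NULL && out != NULL && outlen > 0);
    if (keylen < 0 || (size_t)keylen >= sizeof(key)) {
        return false; /* parameter name too long for this sketch */
    }
    for (const char *line = dump; *line != '\0'; ) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len >= (size_t)keylen && strncmp(line, key, (size_t)keylen) == 0) {
            size_t vlen = len - (size_t)keylen;
            if (vlen >= outlen) {
                vlen = outlen - 1;
            }
            memcpy(out, line + keylen, vlen);
            out[vlen] = '\0';
            return true;
        }
        line = end ? end + 1 : line + len; /* advance to the next line */
    }
    return false;
}

/* True only if both dumps define the parameter and the values are equal. */
static bool same_reskey(const char *a, const char *b, const char *name)
{
    char va[256], vb[256];

    return reskey_value(a, name, va, sizeof(va)) &&
           reskey_value(b, name, vb, sizeof(vb)) &&
           strcmp(va, vb) == 0;
}
```

For example, with the reload and stop dumps loaded into two strings, same_reskey(reload_dump, stop_dump, "port") returning false would mean that 'stop' saw a different value than 'reload' did, or that one of the dumps lacks the variable ("port" is a made-up parameter name).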

Best,
Vladislav




Re: 'stop' operation passes outdated set of instance attributes to RA
> On 24 Feb 2015, at 5:53 am, Vladislav Bogdanov <bubble@hoster-ok.com> wrote:
>
> Everything should be clear just after that is done I think.

General rule of thumb: add one month of turnaround if I need to set up a cluster to reproduce, compared to looking at logs/PE files.
That's not me being mean, I simply don't have the bandwidth. Yesterday I did nothing but answer emails and I barely scratched the surface.

So the easier it is for me to reply, the sooner it's going to happen.

>> I suspect the lrmd needs to update its parameter cache for the reload operation.

Did you try David's fix?
(See, I didn't even find time to hunt down the right place for a one-line change.)



Re: 'stop' operation passes outdated set of instance attributes to RA
> On 24 Feb 2015, at 5:50 am, David Vossel <dvossel@redhat.com> wrote:
>
> This falls on the crmd I believe.

Ah, yes. That rings a bell now. Thanks!



Re: 'stop' operation passes outdated set of instance attributes to RA
23.02.2015 21:50, David Vossel wrote:
>
> This falls on the crmd I believe. I haven't tested it, but something like
> this should fix it I bet.

This definitely fixes the issue,
thank you!


