Mailing List Archive

[Patch] The problem that the cord of the digest cord of crmd becomes mismatched for.
Hi All,

We found a case in which Pacemaker cannot correctly judge the result of an operation reported by lrmd.

With the following crm configuration, a parameter of the start operation is returned to crmd as part of the result of the monitor operation.

(snip)
primitive prmDiskd ocf:pacemaker:Dummy \
        params name="diskcheck_status_internal" device="/dev/vda" interval="30" \
        op start interval="0" timeout="60s" on-fail="restart" prereq="fencing" \
        op monitor interval="30s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block"
(snip)

This happens because lrmd returns the prereq parameter of the start operation as part of the result of the monitor operation.
As a result, crmd sees a mismatch between the parameters it requested for the monitor operation and the parameters that lrmd reports for the monitor operation it carried out.
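
To make the mismatch concrete, here is a minimal sketch in Python. It is not Pacemaker's actual digest routine: the digest() helper and its canonical text format are assumptions made for illustration. It only shows that one extra key (prereq) in the reported parameters is enough to change an MD5-style digest.

```python
import hashlib


def digest(params):
    """Hypothetical helper: MD5 over a key-sorted rendering of the parameters.

    Pacemaker computes its digest over an XML representation; this plain-text
    canonical form is only an illustrative stand-in.
    """
    canonical = " ".join(f'{key}="{value}"' for key, value in sorted(params.items()))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Parameters crmd requested for the monitor operation (taken from the report).
requested = {
    "device": "/dev/vda",
    "name": "diskcheck_status_internal",
    "interval": "30",
    "CRM_meta_timeout": "60000",
}

# Parameters reported back by lrmd: prereq from the start operation leaked in.
reported = dict(requested, prereq="fencing")

mismatch = digest(requested) != digest(reported)
print("digest mismatch:", mismatch)
```

The two digests differ although the administrator changed nothing, which is the situation the CRIT messages below report.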

We can confirm this problem with the following commands in Pacemaker 1.0.12.

Command 1) The crm_verify command reports the digest mismatch.

[root@rh63-heartbeat1 ~]# crm_verify -L
crm_verify[19988]: 2012/10/10_20:29:58 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_30000 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6


Command 2) The ptest command reports the same digest mismatch.

[root@rh63-heartbeat1 ~]# ptest -L -VV
ptest[19992]: 2012/10/10_20:30:19 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
ptest[19992]: 2012/10/10_20:30:19 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_30000 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
[root@rh63-heartbeat1 ~]#

Command 3) After running cibadmin -B, pengine unnecessarily restarts the resource's monitor.

Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_30000 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: RecurringOp: Start recurring monitor (30s) for prmDiskd:0 on rh63-heartbeat1
Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: LogActions: Leave resource prmDiskd:0#011(Started rh63-heartbeat1)
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: unpack_graph: Unpacked transition 2: 1 actions in 1 synapses
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1349868660-20) derived from /var/lib/pengine/pe-input-2.bz2
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: te_rsc_command: Initiating action 1: monitor prmDiskd:0_monitor_30000 on rh63-heartbeat1 (local)
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_lrm_rsc_op: Performing key=1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 op=prmDiskd:0_monitor_30000 )
Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: cancel_op: operation monitor[4] on prmDiskd:0 for client 19839, its parameters: CRM_meta_clone=[0] CRM_meta_prereq=[fencing] device=[/dev/vda] name=[diskcheck_status_internal] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[30] prereq=[fencing] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] CRM_meta_interval=[30000] CRM_meta_timeout=[60000] cancelled
Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: rsc:prmDiskd:0 monitor[5] (pid 20009)
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM operation prmDiskd:0_monitor_30000 (call=4, status=1, cib-update=0, confirmed=true) Cancelled
Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: operation monitor[5] on prmDiskd:0 for client 19839: pid 20009 exited with return code 0
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: append_digest: #### yamauchi ####Calculated digest 7d7c9f601095389fc7cc0c6b29c61a7a for prmDiskd:0_monitor_30000 (0:0;1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6). Source: <parameters device="/dev/vda" name="diskcheck_status_internal" interval="30" prereq="fencing" CRM_meta_timeout="60000"/>
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM operation prmDiskd:0_monitor_30000 (call=5, rc=0, cib-update=53, confirmed=false) ok
Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: match_graph_event: Action prmDiskd:0_monitor_30000 (1) confirmed on rh63-heartbeat1 (rc=0)


The problem is that crmd judges the digest to have changed even though no parameter was changed at all.
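
One way to spot the stray parameter is to compare the Source element printed by append_digest in the log above with the params configured for prmDiskd. The following Python sketch does that; the variable names and the rule of ignoring CRM_meta_* keys are assumptions for illustration, not part of any Pacemaker tool.

```python
import xml.etree.ElementTree as ET

# The <parameters .../> element copied from the append_digest log line.
source = (
    '<parameters device="/dev/vda" name="diskcheck_status_internal" '
    'interval="30" prereq="fencing" CRM_meta_timeout="60000"/>'
)

# Instance parameters configured for prmDiskd in the crm snippet.
configured = {"name", "device", "interval"}

reported = ET.fromstring(source).attrib

# Anything that is neither a configured param nor a CRM_meta_* entry is suspect.
unexpected = sorted(
    key for key in reported
    if key not in configured and not key.startswith("CRM_meta_")
)
print(unexpected)
```

For this log line the only leftover key is prereq, which belongs to the start operation.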

I made a patch.
With the patch, lrmd returns in the result only the parameters that crmd requested, and copies the parameters that are needed only while the RA runs.

My patch may have problems.
Please review the contents of the patch.

Best Regards,
Hideo Yamauchi.
Re: [Patch] The problem that the cord of the digest cord of crmd becomes mismatched for.
Hi Hideo-san,

On Wed, Oct 10, 2012 at 03:22:08PM +0900, renayama19661014@ybb.ne.jp wrote:
> I made a patch.
> With the patch, lrmd returns in the result only the parameters that crmd requested, and copies the parameters that are needed only while the RA runs.
>
> My patch may have problems.
> Please review the contents of the patch.

What the patch does is to prevent lrmd from passing back the
parameters defined with the operation. What's funny is that this
code was there since 2006 (see LF bug 1301).

Well, it makes sense to me. It would be good if Andrew takes a
look too.

And many thanks for the patch.


Cheers,

Dejan



_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Patch] The problem that the cord of the digest cord of crmd becomes mismatched for.
Hi Dejan,

Thank you for your comments.

I will wait for Andrew's comment.
I hope the patch resolves the problem.

Many thanks,
Hideo Yamauchi.

--- On Wed, 2012/10/10, Dejan Muhamedagic <dejan@suse.de> wrote:

> What the patch does is to prevent lrmd from passing back the
> parameters defined with the operation. What's funny is that this
> code was there since 2006 (see LF bug 1301).
>
> Well, it makes sense to me. It would be good if Andrew takes a
> look too.
>
> And many thanks for the patch.
>
>
> Cheers,
>
> Dejan
Re: [Patch] The problem that the cord of the digest cord of crmd becomes mismatched for.
On Wed, Oct 10, 2012 at 11:21 PM, Dejan Muhamedagic <dejan@suse.de> wrote:
> What the patch does is to prevent lrmd from passing back the
> parameters defined with the operation. What's funny is that this
> code was there since 2006 (see LF bug 1301).
>
> Well, it makes sense to me. It would be good if Andrew takes a
> look too.

Makes sense to me.
With the patch, the effective options are create+op rather than
create+op1+op2+op3...
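
The difference between "create+op" and "create+op1+op2+op3..." can be sketched as follows. This is a simplified Python model of the summary above, not the lrmd code: OP_PARAMS, run_accumulating() and run_isolated() are hypothetical names, and only prereq is modelled as a per-operation parameter.

```python
# Parameters given when the resource is created ("create").
CREATE_PARAMS = {
    "name": "diskcheck_status_internal",
    "device": "/dev/vda",
    "interval": "30",
}

# Per-operation parameters ("op"); simplified to the one seen in the logs.
OP_PARAMS = {
    "start": {"prereq": "fencing"},
    "monitor": {},
    "stop": {},
}


def run_accumulating(stored, op_name):
    """Model of create+op1+op2+...: each op's parameters stick to the stored set."""
    stored.update(OP_PARAMS[op_name])
    return dict(stored)


def run_isolated(stored, op_name):
    """Model of create+op: merge into a copy, leave the stored set untouched."""
    return {**stored, **OP_PARAMS[op_name]}


# Run start, then monitor, under both models.
old_store = dict(CREATE_PARAMS)
run_accumulating(old_store, "start")
old_monitor = run_accumulating(old_store, "monitor")

new_store = dict(CREATE_PARAMS)
run_isolated(new_store, "start")
new_monitor = run_isolated(new_store, "monitor")

print("prereq" in old_monitor, "prereq" in new_monitor)
```

In the accumulating model the monitor result carries prereq from the earlier start; in the isolated model it does not, so the parameters reported for monitor match what was requested.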

Re: [Patch] The problem that the cord of the digest cord of crmd becomes mismatched for.
Hi Andrew,
Hi Dejan,

> Makes sense to me.
> With the patch, the effective options are create+op rather than
> create+op1+op2+op3...

Do you mean that the structure of the op-done message should be changed?
Considering the impact on other components, I cannot change the op message.
I think the patch is correct for the op message that lrmd and crmd currently exchange.

If possible, we would like to apply the patch to glue soon.

Best Regards,
Hideo Yamauchi.

--- On Thu, 2012/10/11, Andrew Beekhof <beekhof@gmail.com> wrote:

> Makes sense to me.
> With the patch, the effective options are create+op rather than
> create+op1+op2+op3...
Re: [Patch] The problem that the cord of the digest cord of crmd becomes mismatched for.
Hi,

On Fri, Oct 12, 2012 at 08:31:21AM +0900, renayama19661014@ybb.ne.jp wrote:
> Hi Andrew,
> Hi Dejan,
>
> > Makes sense to me.
> > With the patch, the effective options are create+op rather than
> > create+op1+op2+op3...
>
> Do you mean that the structure of the op-done message should be changed?
> Considering the impact on other components, I cannot change the op message.
> I think the patch is correct for the op message that lrmd and crmd currently exchange.
>
> If possible, we would like to apply the patch to glue soon.

I'll do some testing first.

Cheers,

Dejan

> Best Regards,
> Hideo Yamauchi.
Re: [Patch] The problem that the cord of the digest cord of crmd becomes mismatched for.
Hi Dejan,
Hi Andrew,

I confirmed that the patch has been applied to glue.
* http://hg.linux-ha.org/glue/rev/579e45f957b6

Many Thanks!
Hideo Yamauchi.
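
For readers of the archive, the mismatch discussed in this thread can be illustrated with a small sketch. It is not Pacemaker or glue code: the helper names (digest, merge_all_ops, merge_one_op) are hypothetical, and MD5 over sorted key="value" pairs is only an assumed stand-in for the real digest calculation, so the hash values will not match the ones in the logs above. It only shows why a prereq parameter leaking from the start operation into the monitor result changes the digest, and why returning create+op rather than create+op1+op2+op3 avoids that.

```python
import hashlib


def digest(params):
    # Hypothetical stand-in for the parameter digest:
    # MD5 over the sorted key="value" pairs.
    canonical = " ".join(f'{k}="{v}"' for k, v in sorted(params.items()))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def merge_all_ops(resource, ops):
    # Behaviour described as the problem: resource parameters plus the
    # parameters of every operation (create+op1+op2+op3...).
    merged = dict(resource)
    for op_params in ops.values():
        merged.update(op_params)
    return merged


def merge_one_op(resource, ops, op_name):
    # Behaviour described for the patch: resource parameters plus only the
    # parameters of the operation that actually ran (create+op).
    merged = dict(resource)
    merged.update(ops[op_name])
    return merged


# Values taken from the prmDiskd example in this thread.
resource_params = {
    "name": "diskcheck_status_internal",
    "device": "/dev/vda",
    "interval": "30",
}
ops = {
    "start": {"prereq": "fencing"},
    "monitor": {},
    "stop": {},
}

requested = merge_one_op(resource_params, ops, "monitor")
reported_before = merge_all_ops(resource_params, ops)
reported_after = merge_one_op(resource_params, ops, "monitor")

# prereq leaks into the monitor result, so the digests differ.
print("before patch, digests match:", digest(requested) == digest(reported_before))
# Only the monitor parameters come back, so the digests agree.
print("after patch, digests match: ", digest(requested) == digest(reported_after))
```

In this sketch the only difference between the two results is the prereq="fencing" entry, which matches the extra attribute visible in the append_digest log line quoted earlier in the thread.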

