Mailing List Archive

Centos 70->71 update fails with "Application of an update diff failed (rc=-206)"
Hi,

I'm running a CentOS 7.0 2-node cluster providing iSCSI/SAN features. In order to upgrade to CentOS 7.1, I'm testing the whole process in VMs, and it fails. I've now stripped my config down to a pair of DRBD master/slave resources with IPaddr2 (cluster.cfg attached).

From a running cluster, here are the steps (I'm upgrading node san2; a rough command sketch follows the list):
- put node san2 in standby
- stop/disable pacemaker on san2
- stop/disable corosync on san2
- update san2 to CentOS 7.1 (pacemaker 1.1.10-32.el7_0.1 -> 1.1.12-22.el7_1.1)
- reboot san2
- enable/start corosync on san2 (it looks good, rings are fine in "corosync-cfgtool -s")
- enable/start pacemaker on san2
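
In command form, that is roughly the following (a sketch only; the node name and the pcs standby syntax are my assumptions, adjust to your setup):

    # put the node into standby so resources move off it
    pcs cluster standby san2.local

    # stop and disable the cluster stack on san2 only
    systemctl stop pacemaker
    systemctl disable pacemaker
    systemctl stop corosync
    systemctl disable corosync

    # upgrade to CentOS 7.1 and reboot
    yum update
    reboot

    # after the reboot, bring the stack back one layer at a time
    systemctl enable corosync && systemctl start corosync
    corosync-cfgtool -s          # rings should look healthy
    systemctl enable pacemaker && systemctl start pacemaker
    pcs cluster unstandby san2.local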

I can see the following in the logs:

/var/log/messages (attached, line #57)
=================
Apr 24 16:18:26 san2 crmd[11759]: notice: erase_xpath_callback: Deletion of "//node_state[@uname='san2.local']/transient_attributes": Application of an update diff failed (rc=-206)

/var/log/pacemaker.log (attached, starting from line #292)
======================
Apr 24 16:18:26 [11754] san2.local cib: info: xml_apply_patchset: v1 digest mis-match: expected 428c0eb4cd80a4c1ee19b627f6876abd, calculated ffb5456991bd4ed9e5a7774f49e8259d
Apr 24 16:18:26 [11754] san2.local cib: info: __xml_diff_object: Moved node_state@id (0 -> 6)
Apr 24 16:18:26 [11754] san2.local cib: info: __xml_diff_object: Moved node_state@uname (1 -> 0)
Apr 24 16:18:26 [11759] san2.local crmd: notice: erase_xpath_callback: Deletion of "//node_state[@uname='san2.local']/transient_attributes": Application of an update diff failed (rc=-206)
Apr 24 16:18:26 [11754] san2.local cib: info: send_sync_request: Requesting re-sync from peer
Apr 24 16:18:26 [11754] san2.local cib: notice: cib_server_process_diff: Not applying diff 0.0.14 -> 0.46.15 (sync in progress)
Apr 24 16:18:26 [11754] san2.local cib: notice: cib_server_process_diff: Not applying diff 0.0.15 -> 0.46.16 (sync in progress)
Apr 24 16:18:26 [11754] san2.local cib: notice: cib_server_process_diff: Not applying diff 0.0.16 -> 0.46.17 (sync in progress)

Google doesn't help me in figuring out what might be wrong.

Config was generated with crmsh-2.1-1.4, in case that can have an impact.

Any hint would be highly appreciated.

Cheers, Patrick

NOTE: I have kernel modules (scst/zfs) that require reboots when upgrading, so I cannot upgrade both nodes while in unmanaged state. I really need to upgrade one node after the other.


Re: Centos 70->71 update fails with "Application of an update diff failed (rc=-206)" [ In reply to ]
Are you sure your cluster hostnames are ok?

get_node_name: Could not obtain a node name for corosync nodeid 2
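
A quick way to check what corosync and pacemaker think the names are (just a suggestion, assuming corosync 2.x):

    corosync-cmapctl | grep nodelist    # shows ring0_addr (and name, if set) per node
    crm_node -n                         # node name pacemaker uses locally
    crm_node -l                         # id and name of every known cluster node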

2015-04-24 17:09 GMT+02:00 Patrick Zwahlen <paz@navixia.com>:
> Hi,
>
> I'm running a CentOS 7.0 2-node cluster providing iSCSI/SAN features. In order to upgrade to CentOS 7.1, I'm testing the whole process in VMs, and it fails. I've now stripped my config down to a pair of DRBD master/slave resources with IPaddr2 (cluster.cfg attached).
>
> From a running cluster, here are the steps (I'm upgrading node san2):
> - put node san2 in standby
> - stop/disable pacemaker on san2
> - stop/disable corosync on san2
> - update san2 to CentOS 7.1 (pacemaker 1.1.10-32.el7_0.1 -> 1.1.12-22.el7_1.1)
> - reboot san2
> - enable/start corosync on san2 (it looks good, rings are fine in "corosync-cfgtool -s")
> - enable/start pacemaker on san2
>
> I can see the following in the logs:
>
> /var/log/messages (attached, line #57)
> =================
> Apr 24 16:18:26 san2 crmd[11759]: notice: erase_xpath_callback: Deletion of "//node_state[@uname='san2.local']/transient_attributes": Application of an update diff failed (rc=-206)
>
> /var/log/pacemaker.log (attached, starting from line #292)
> ======================
> Apr 24 16:18:26 [11754] san2.local cib: info: xml_apply_patchset: v1 digest mis-match: expected 428c0eb4cd80a4c1ee19b627f6876abd, calculated ffb5456991bd4ed9e5a7774f49e8259d
> Apr 24 16:18:26 [11754] san2.local cib: info: __xml_diff_object: Moved node_state@id (0 -> 6)
> Apr 24 16:18:26 [11754] san2.local cib: info: __xml_diff_object: Moved node_state@uname (1 -> 0)
> Apr 24 16:18:26 [11759] san2.local crmd: notice: erase_xpath_callback: Deletion of "//node_state[@uname='san2.local']/transient_attributes": Application of an update diff failed (rc=-206)
> Apr 24 16:18:26 [11754] san2.local cib: info: send_sync_request: Requesting re-sync from peer
> Apr 24 16:18:26 [11754] san2.local cib: notice: cib_server_process_diff: Not applying diff 0.0.14 -> 0.46.15 (sync in progress)
> Apr 24 16:18:26 [11754] san2.local cib: notice: cib_server_process_diff: Not applying diff 0.0.15 -> 0.46.16 (sync in progress)
> Apr 24 16:18:26 [11754] san2.local cib: notice: cib_server_process_diff: Not applying diff 0.0.16 -> 0.46.17 (sync in progress)
>
> Google doesn't help me in figuring out what might be wrong.
>
> Config was generated with crmsh-2.1-1.4, in case that can have an impact.
>
> Any hint would be highly appreciated.
>
> Cheers, Patrick
>
> NOTE: I have kernel modules (scst/zfs) that require reboots when upgrading, so I cannot upgrade both nodes while in unmanaged state. I really need to upgrade one node after the other.
>
>



--
this is my life and I live it as long as God wills

Re: Centos 70->71 update fails with "Application of an update diff failed (rc=-206)" [ In reply to ]
> Are you sure your cluster hostnames are ok?
>
> get_node_name: Could not obtain a node name for corosync nodeid 2

(I confused the pacemaker and clusterlabs mailing lists. Sorry for the
double post.)

The cluster works perfectly on CentOS 7.0, even though I see these logs there
as well. It might be due to corosync.conf (attached) containing only IP
addresses (it was generated by pcs).

Regards, Patrick
Re: Centos 70->71 update fails with "Application of an update diff failed (rc=-206)" [ In reply to ]
Map your cluster IPs to hostnames using /etc/hosts and try to follow an
example like this one:
http://clusterlabs.org/doc/fr/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_sample_corosync_configuration.html
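
Something along these lines (the addresses and names below are only placeholders, adjust them to your network):

    # /etc/hosts on both nodes
    192.168.1.1   san1.local san1
    192.168.1.2   san2.local san2

    # corosync.conf nodelist using the names instead of raw IPs
    nodelist {
        node {
            ring0_addr: san1.local
            nodeid: 1
        }
        node {
            ring0_addr: san2.local
            nodeid: 2
        }
    }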

2015-04-25 12:19 GMT+02:00 Patrick Zwahlen <paz@navixia.com>:
>> Are you sure your cluster hostnames are ok?
>>
>> get_node_name: Could not obtain a node name for corosync nodeid 2
>
> (I confused the pacemaker and clusterlabs mailing lists. Sorry for the
> double post.)
>
> The cluster works perfectly on CentOS 7.0, even though I see these logs there
> as well. It might be due to corosync.conf (attached) containing only IP
> addresses (it was generated by pcs).
>
> Regards, Patrick
>
>



--
this is my life and I live it as long as God wills

Re: Centos 70->71 update fails with "Application of an update diff failed (rc=-206)" [ In reply to ]
> Map your cluster IPs to hostnames using /etc/hosts and try to follow an
> example like this one:
> http://clusterlabs.org/doc/fr/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_sample_corosync_configuration.html

I've added "name: fqdn" in my corosync.conf and I don't have those hostname
logs anymore.
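
For reference, the kind of nodelist entry I mean looks roughly like this (the address below is a placeholder):

    nodelist {
        node {
            ring0_addr: 192.168.1.2
            name: san2.local
            nodeid: 2
        }
    }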

This being said, I think it's unrelated to my original problem (the 1.1.10 ->
1.1.12 upgrade issue). I have tried my upgrade once more and it keeps showing
that "diff failed" log.

But thanks for helping. Patrick
Re: Centos 70->71 update fails with "Application of an update diff failed (rc=-206)" [ In reply to ]
> On 26 Apr 2015, at 7:27 pm, Patrick Zwahlen <paz@navixia.com> wrote:
>
>> Map your cluster IPs to hostnames using /etc/hosts and try to follow an
>> example like this one:
>> http://clusterlabs.org/doc/fr/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_sample_corosync_configuration.html
>
> I've added "name: fqdn" in my corosync.conf and I don't have those hostname
> logs anymore.

Excellent.

>
> This being said, I think it's unrelated to my original problem (the 1.1.10 ->
> 1.1.12 upgrade issue). I have tried my upgrade once more and it keeps showing
> that "diff failed" log.

Apart from those scary logs, does anything actually break?
What you're seeing is probably just ignorable noise from the older version - I would expect the underlying CIB to resolve things correctly.
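
If you want to double-check that the CIBs converged after the resync, something like this on both nodes is a quick sanity check (the exact output will differ):

    cibadmin -Q | head -n 1    # the epoch/admin_epoch attributes should match on both nodes
    crm_mon -1                 # one-shot status view, should agree on both nodes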

>
> But thanks for helping. Patrick


Re: Centos 70->71 update fails with "Application of an update diff failed (rc=-206)" [ In reply to ]
> Apart from those scary logs, does anything actually break?
> What you're seeing is probably just ignorable noise from the older version
> - I would expect the underlying CIB to resolve things correctly.

Thanks Andrew for the response.

After starting the new 1.1.12 and trying to migrate my resources, I ended up
with groups stuck "halfway" with some resources stopped on the old node and
no migration (apparently without errors from my RA).

This weekend I tried another route, as I finally found out how to upgrade *just*
corosync/pacemaker (without the whole OS). These were the steps (rough command
sketch after the list):

- enter maintenance
- "pcs cluster stop --all"
- "yum update corosync pacemaker libqb resource-agents pcs"
- "pcs cluster start --all"
- exit maintenance
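
Roughly, as commands (maintenance mode set via the cluster property; the exact package list may differ on your systems):

    pcs property set maintenance-mode=true
    pcs cluster stop --all
    yum update corosync pacemaker libqb resource-agents pcs
    pcs cluster start --all
    pcs property set maintenance-mode=false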

I initially just did a "yum update corosync pacemaker" and then pacemaker
didn't start. I was missing libqb, but I also think there is a dependency
missing somewhere in the RPMs, as libqb should get updated as well.

Anyway, I have been able to migrate from CentOS 7.0 to 7.1 in my lab without
losing anything.

Cheers, Patrick
Re: Centos 70->71 update fails with "Application of an update diff failed (rc=-206)" [ In reply to ]
> On 27 Apr 2015, at 6:35 pm, Patrick Zwahlen <paz@navixia.com> wrote:
>
>> Apart from those scary logs, does anything actually break?
>> What you're seeing is probably just ignorable noise from the older version
>> - I would expect the underlying CIB to resolve things correctly.
>
> Thanks Andrew for the response.
>
> After starting the new 1.1.12 and trying to migrate my resources, I ended up
> with groups stuck "halfway" with some resources stopped on the old node and
> no migration (apparently without errors from my RA).

If you’d like to send a crm_report I’d be interested to have a look.
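
Something along these lines should capture the relevant window (the times and destination are only an example):

    crm_report -f "2015-04-27 08:00:00" -t "2015-04-27 12:00:00" /tmp/stuck-groups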

>
> This weekend I tried another route, as I finally found out how to upgrade *just*
> corosync/pacemaker (without the whole OS).
>
> - enter maintenance
> - "pcs cluster stop --all"
> - "yum update corosync pacemaker libqb resource-agents pcs"
> - "pcs cluster start --all"
> - exit maintenance
>
> I initially just did a "yum update corosync pacemaker" and then pacemaker
> didn't start. I was missing libqb, but I also think there is a dependency
> missing somewhere in the RPMs, as libqb should get updated as well.

Nod. We’re adding that in.
Both sides keep maintaining backwards compatibility - pacemaker just wants to use the version it was built against but rpm isn’t smart enough to do that automagically :-(

>
> Anyway, I have been able to migrate from CentOS 7.0 to 7.1 in my lab without
> losing anything.

Excellent. Sounds like it might have been something to do with the resources themselves then :-/

>
> Cheers, Patrick
>


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org