
Problem with updating the cluster stack (SLES11 SP3) (long)
Hello!

I have seen this problem at least twice now (also reported to Novell support; still pending): when updating the cluster stack via online update, the updated node fails to re-join the cluster. Furthermore, "rcopenais restart" hangs while stopping, so I needed a reboot.
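
(For context, the manual equivalent of that online update is roughly the sequence below -- only a sketch of what effectively happens on the node, not the exact tool invocation:)

  crm node standby h01   # optionally move resources off the node first
  zypper patch           # pull in the updated cluster stack packages
  rcopenais restart      # restart corosync/pacemaker with the new binaries (the update triggers this restart itself)
  crm node online h01    # let the node take resources again

The logs look like this: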

(Update restarts services)
Dec 30 11:34:01 h01 corosync[20684]: [SERV ] Unloading all Corosync service engines.
Dec 30 11:34:01 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: Shuting down Pacemaker
Dec 30 11:34:01 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to mgmtd: [20727]
Dec 30 11:34:01 h01 mgmtd: [20727]: info: mgmtd is shutting down
Dec 30 11:34:01 h01 mgmtd: [20727]: debug: [mgmtd] stopped
Dec 30 11:34:01 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: mgmtd confirmed stopped
Dec 30 11:34:01 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to crmd: [20726]
Dec 30 11:34:01 h01 crmd[20726]: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
Dec 30 11:34:01 h01 attrd[20724]: notice: attrd_cs_dispatch: Update relayed from h05
Dec 30 11:34:01 h01 attrd[20724]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1388399641)
Dec 30 11:34:01 h01 attrd[20724]: notice: attrd_perform_update: Sent update 10: shutdown=1388399641
Dec 30 11:34:01 h01 Xen(prm_xen_v05)[5393]: INFO: v05: Starting xm migrate to h05
(Cluster triggers resource migration...)
[...]
Dec 30 11:39:00 h01 Xen(prm_xen_v06)[8525]: INFO: v06: xm migrate to h05 succeeded.
Dec 30 11:39:00 h01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/2/51712
Dec 30 11:39:00 h01 crmd[20726]: notice: process_lrm_event: LRM operation prm_xen_v06_migrate_to_0 (call=171, rc=0, cib-update=50, confirmed=true) ok
[...]
(Cluster is stopping dependent resources)
Dec 30 11:39:00 h01 crmd[20726]: notice: process_lrm_event: LRM operation prm_xen_v06_stop_0 (call=176, rc=0, cib-update=51, confirmed=true) ok
Dec 30 11:39:00 h01 Filesystem(prm_CFS_VMs_fs)[10396]: INFO: Running stop for /dev/CFS_VMs/xen on /cfs/xen
Dec 30 11:39:00 h01 Filesystem(prm_CFS_VMs_fs)[10396]: INFO: Trying to unmount /cfs/xen
[...]
Dec 30 11:39:01 h01 kernel: [266563.583622] ocfs2: Unmounting device (253,48) on (node 1783108)
Dec 30 11:39:01 h01 Filesystem(prm_CFS_VMs_fs)[10396]: INFO: unmounted /cfs/xen successfully
Dec 30 11:39:01 h01 crmd[20726]: notice: process_lrm_event: LRM operation prm_CFS_VMs_fs_stop_0 (call=180, rc=0, cib-update=52, confirmed=true) ok
Dec 30 11:39:01 h01 lvm[21964]: Got new connection on fd 5
(cLVM is telling the story of its life, the universe, and everything...)
[...]
Dec 30 11:39:02 h01 lvm[21964]: process_work_item: free fd -1
Dec 30 11:39:02 h01 lvm[21964]: LVM thread waiting for work
Dec 30 11:39:02 h01 LVM(prm_LVM_CFS_VMs)[10671]: INFO: 0 logical volume(s) in volume group "CFS_VMs" now active
Dec 30 11:39:02 h01 LVM(prm_LVM_CFS_VMs)[10671]: INFO: LVM Volume CFS_VMs is not available (stopped)
Dec 30 11:39:02 h01 crmd[20726]: notice: process_lrm_event: LRM operation prm_LVM_CFS_VMs_stop_0 (call=185, rc=0, cib-update=53, confirmed=true) ok
Dec 30 11:39:02 h01 clvmd(prm_cLVMd)[10734]: INFO: Stopping prm_cLVMd
[...]
(Cluster Stack eventually stops)
Dec 30 11:39:04 h01 kernel: [266566.993795] dlm: closing connection to node 84939948
Dec 30 11:39:04 h01 kernel: [266566.993828] dlm: closing connection to node 17831084
Dec 30 11:39:06 h01 crmd[20726]: notice: process_lrm_event: LRM operation prm_DLM_stop_0 (call=200, rc=0, cib-update=56, confirmed=true) ok
Dec 30 11:39:06 h01 crmd[20726]: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
Dec 30 11:39:06 h01 crmd[20726]: notice: do_lrm_control: Disconnected from the LRM
Dec 30 11:39:06 h01 crmd[20726]: notice: terminate_cs_connection: Disconnecting from Corosync
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x726a20, async-conn=0x726a20) left
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: crmd confirmed stopped
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to pengine: [20725]
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: pengine confirmed stopped
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to attrd: [20724]
Dec 30 11:39:06 h01 attrd[20724]: notice: main: Exiting...
Dec 30 11:39:06 h01 attrd[20724]: notice: main: Disconnecting client 0x6106a0, pid=20726...
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: attrd confirmed stopped
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to lrmd: [20723]
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: lrmd confirmed stopped
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to stonith-ng: [20722]
Dec 30 11:39:06 h01 cib[20721]: warning: qb_ipcs_event_sendv: new_event_notification (20721-20722-10): Broken pipe (32)
Dec 30 11:39:06 h01 cib[20721]: warning: cib_notify_send_one: Notification of client crmd/76346944-ec4a-498e-9f5b-7cfa12b7d5cf failed
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x719fa0, async-conn=0x719fa0) left
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: stonith-ng confirmed stopped
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to cib: [20721]
Dec 30 11:39:06 h01 cib[20721]: notice: terminate_cs_connection: Disconnecting from Corosync
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x7226a0, async-conn=0x7226a0) left
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: cib confirmed stopped
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000000002 (2)
Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: Shutdown complete
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: Pacemaker Cluster Manager 1.1.10
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync configuration service
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync profile loading service
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais cluster membership service B.01.01
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais event service B.01.01
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais distributed locking service B.03.01
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais message service B.03.01
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais timer service A.01.01
Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais availability management framework B.01.01
Dec 30 11:39:06 h01 corosync[20684]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1940.

(The restart of the cluster stack is now observed)
Dec 30 11:39:08 h01 sbd: [10827]: info: Watchdog enabled.
Dec 30 11:39:08 h01 sbd: [10827]: info: Setting latency warning to 15
Dec 30 11:39:08 h01 sbd: [10828]: info: Delivery process handling /dev/disk/by-id/dm-name-SBD_1-E3
Dec 30 11:39:08 h01 sbd: [10829]: info: Delivery process handling /dev/disk/by-id/dm-name-SBD_1-E4
Dec 30 11:39:08 h01 sbd: [10828]: info: Writing exit to node slot h01
Dec 30 11:39:08 h01 sbd: [10829]: info: Writing exit to node slot h01
Dec 30 11:39:08 h01 sbd: [10828]: info: exit successfully delivered to h01
Dec 30 11:39:08 h01 sbd: [10829]: info: exit successfully delivered to h01
Dec 30 11:39:08 h01 sbd: [10827]: info: Message successfully delivered.
Dec 30 11:39:09 h01 sbd: [20486]: info: Received command exit from h01 on disk /dev/disk/by-id/dm-name-SBD_1-E4
Dec 30 11:39:09 h01 sbd: [20483]: WARN: Servant for /dev/disk/by-id/dm-name-SBD_1-E3 (pid: 20485) has terminated
Dec 30 11:39:09 h01 sbd: [20483]: WARN: Servant for /dev/disk/by-id/dm-name-SBD_1-E4 (pid: 20486) has terminated
Dec 30 11:39:10 h01 sbd: [10894]: info: Watchdog enabled.
Dec 30 11:39:10 h01 sbd: [10894]: info: Setting latency warning to 15
Dec 30 11:39:10 h01 sbd: [10898]: info: Servant starting for device /dev/disk/by-id/dm-name-SBD_1-E3
Dec 30 11:39:10 h01 sbd: [10899]: info: Servant starting for device /dev/disk/by-id/dm-name-SBD_1-E4
Dec 30 11:39:17 h01 sbd: [10895]: notice: Using watchdog device: /dev/watchdog
Dec 30 11:39:17 h01 sbd: [10895]: info: Set watchdog timeout to 90 seconds.
Dec 30 11:39:17 h01 kernel: [266579.322915] hpwdt: New timer passed in is 90 seconds.
Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Corosync Cluster Engine ('1.4.6'): started and ready to provide service.
Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Corosync built-in features: nss
Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Successfully configured openais services to load
Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] The network interface [172.20.16.1] is now up.
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais cluster membership service B.01.01
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais event service B.01.01
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais availability management framework B.01.01
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais message service B.03.01
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais distributed locking service B.03.01
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais timer service A.01.01
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: process_ais_conf: Reading configure
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_init: Local handle: 8054506479773810690 for logging
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional logging options...
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found 'no' for option: to_logfile
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found 'yes' for option: to_syslog
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found 'daemon' for option: syslog_facility
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_init: Local handle: 7664968412203843587 for quorum
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: No additional configuration supplied for: quorum
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: No default for option: provider
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_init: Local handle: 9033685355982749700 for service
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found '0' for option: ver
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Defaulting to 'pcmk' for option: clustername
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found 'yes' for option: use_mgmtd
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_startup: CRM: Initialized
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] Logging: Initialized pcmk_startup
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_startup: Service: 9
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_startup: Local hostname: h01
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_update_nodeid: Local node id: 17831084
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Creating entry for node 17831084 born on 0
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: 0x704810 Node 17831084 now known as h01 (was: (null))
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has 1 quorum votes (was 0)
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node 17831084/h01 is now: member
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11001 for process cib
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11002 for process stonith-ng
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11003 for process lrmd
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process attrd
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11004 for process attrd
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process pengine
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11005 for process pengine
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process crmd
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11006 for process crmd
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11007 for process mgmtd
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.1.10
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync configuration service
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync profile loading service
Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] The network interface [10.2.2.1] is now up.
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] New Configuration:
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Left:
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Joined:
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1904: memb=0, new=0, lost=0
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] New Configuration:
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Left:
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Joined:
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1904: memb=1, new=1, lost=0
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: NEW: h01 17831084
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 30 11:39:17 h01 corosync[10962]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:0 left:0)
Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Completed service synchronization, ready to provide service.
(so far, everything looks good (IMHO))
Dec 30 11:39:17 h01 mgmtd: [11007]: info: Pacemaker-mgmt Git Version:
Dec 30 11:39:17 h01 crmd[11006]: notice: main: CRM Git Version: 9d65618
Dec 30 11:39:17 h01 attrd[11004]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:39:17 h01 cib[11001]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
Dec 30 11:39:17 h01 stonith-ng[11002]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:39:17 h01 mgmtd: [11007]: WARN: Core dumps could be lost if multiple dumps occur.
Dec 30 11:39:17 h01 mgmtd: [11007]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Dec 30 11:39:17 h01 mgmtd: [11007]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Dec 30 11:39:17 h01 attrd[11004]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x716840 for attrd/11004
Dec 30 11:39:17 h01 stonith-ng[11002]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x71abc0 for stonith-ng/11002
Dec 30 11:39:17 h01 attrd[11004]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:17 h01 attrd[11004]: notice: main: Starting mainloop...
Dec 30 11:39:17 h01 stonith-ng[11002]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:17 h01 cib[11001]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:39:17 h01 cib[11001]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x721790 for cib/11001
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Sending membership update 1904 to cib
Dec 30 11:39:17 h01 cib[11001]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:17 h01 cib[11001]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] New Configuration:
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Left:
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Joined:
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1908: memb=1, new=0, lost=0
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: memb: h01 17831084
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] New Configuration:
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Left:
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Joined:
Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1908: memb=2, new=1, lost=0
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Creating entry for node 84939948 born on 1908
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node 84939948/unknown is now: member
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: NEW: .pending. 84939948
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 84939948
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 1 children
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: 0x704810 Node 17831084 ((null)) born on: 1908
(the node should now be integrated into the cluster)
Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 30 11:39:17 h01 cib[11001]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[84939948] - state is now member (was (null))
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: 0x71f610 Node 84939948 (h05) born on: 1876
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: 0x71f610 Node 84939948 now known as h05 (was: (null))
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node h05 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node h05 now has 1 quorum votes (was 0)
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 1 children
Dec 30 11:39:17 h01 cib[11001]: notice: plugin_handle_membership: Membership 1908: quorum acquired
(everything looks perfect, doesn't it? But now...)
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:17 h01 corosync[10962]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:1 left:0)
Dec 30 11:39:17 h01 cib[11001]: notice: cib_server_process_diff: Not applying diff 0.856.46 -> 0.856.47 (sync in progress)
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:17 h01 cib[11001]: notice: crm_ipc_prepare: Message exceeds the configured ipc limit (51200 bytes), consider configuring PCMK_ipc_buffer to 161300 or higher to avoid compression overheads
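(As an aside, that PCMK_ipc_buffer hint would be addressed by something like the following on every node, followed by a restart of the cluster stack -- a sketch only; on SLES the variable normally lives in /etc/sysconfig/pacemaker:)

  # /etc/sysconfig/pacemaker (all nodes), then restart the cluster stack
  PCMK_ipc_buffer=161300   # value taken from the message above; anything at or above it should do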
Dec 30 11:39:18 h01 mgmtd: [11007]: info: Started.
Dec 30 11:39:18 h01 crmd[11006]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:39:18 h01 stonith-ng[11002]: notice: setup_cib: Watching for stonith topology changes
Dec 30 11:39:18 h01 crmd[11006]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:18 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x72c090 for crmd/11006
Dec 30 11:39:18 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Sending membership update 1908 to crmd
Dec 30 11:39:18 h01 crmd[11006]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:18 h01 stonith-ng[11002]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 30 11:39:18 h01 crmd[11006]: notice: plugin_handle_membership: Membership 1908: quorum acquired
Dec 30 11:39:18 h01 crmd[11006]: notice: crm_update_peer_state: plugin_handle_membership: Node h05[84939948] - state is now member (was (null))
Dec 30 11:39:18 h01 crmd[11006]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
Dec 30 11:39:18 h01 crmd[11006]: notice: do_started: The local CRM is operational
Dec 30 11:39:18 h01 crmd[11006]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Dec 30 11:39:20 h01 logd: [11079]: debug: Stopping ha_logd with pid 16800
Dec 30 11:39:20 h01 logd: [11079]: info: Waiting for pid=16800 to exit
Dec 30 11:39:20 h01 logd: [16800]: debug: logd_term_action: received SIGTERM
Dec 30 11:39:20 h01 logd: [16800]: debug: logd_term_action: waiting for 0 messages to be read by write process
Dec 30 11:39:20 h01 logd: [16800]: debug: logd_term_action: sending SIGTERM to write process
Dec 30 11:39:20 h01 logd: [16804]: info: logd_term_write_action: received SIGTERM
Dec 30 11:39:20 h01 logd: [16804]: debug: Writing out 0 messages then quitting
Dec 30 11:39:20 h01 logd: [16804]: info: Exiting write process
Dec 30 11:39:20 h01 crmd[11006]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Dec 30 11:39:20 h01 attrd[11004]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Dec 30 11:39:20 h01 Filesystem(prm_CFS_VMs_fs)[11089]: WARNING: Couldn't find device [/dev/CFS_VMs/xen]. Expected /dev/??? to exist
Dec 30 11:39:20 h01 LVM(prm_LVM_CFS_VMs)[11083]: WARNING: LVM Volume CFS_VMs is not available (stopped)
Dec 30 11:39:20 h01 o2cb(prm_O2CB)[11085]: INFO: Stack glue driver not loaded
Dec 30 11:39:21 h01 crmd[11006]: notice: process_lrm_event: LRM operation prm_cLVMd_monitor_0 (call=16, rc=7, cib-update=9, confirmed=true) not running
[...]
(resources start up)
Dec 30 11:39:22 h01 kernel: [266584.261481] dlm: Using SCTP for communications
Dec 30 11:39:22 h01 cib[11001]: error: crm_abort: cib_common_callback: Triggered fatal assert at callbacks.c:249 : flags & crm_ipc_client_response
Dec 30 11:39:22 h01 kernel: [266584.267797] ocfs2: Registered cluster interface user
Dec 30 11:39:22 h01 stonith-ng[11002]: error: crm_ipc_read: Connection to cib_rw failed
Dec 30 11:39:22 h01 stonith-ng[11002]: error: mainloop_gio_callback: Connection to cib_rw[0x61e680] closed (I/O condition=17)
Dec 30 11:39:22 h01 crmd[11006]: error: crm_ipc_read: Connection to cib_shm failed
Dec 30 11:39:22 h01 crmd[11006]: error: mainloop_gio_callback: Connection to cib_shm[0x757ec0] closed (I/O condition=17)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x721790, async-conn=0x721790) left
Dec 30 11:39:22 h01 stonith-ng[11002]: notice: cib_connection_destroy: Connection to the CIB terminated. Shutting down.
Dec 30 11:39:22 h01 crmd[11006]: error: crmd_cib_connection_destroy: Connection to the CIB terminated...
Dec 30 11:39:22 h01 crmd[11006]: error: do_log: FSA: Input I_ERROR from crmd_cib_connection_destroy() received in state S_NOT_DC
Dec 30 11:39:22 h01 crmd[11006]: notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=crmd_cib_connection_destroy ]
Dec 30 11:39:22 h01 crmd[11006]: warning: do_recover: Fast-tracking shutdown in response to errors
Dec 30 11:39:22 h01 crmd[11006]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Dec 30 11:39:22 h01 lrmd[11003]: error: crm_ipc_read: Connection to stonith-ng failed
Dec 30 11:39:22 h01 lrmd[11003]: error: mainloop_gio_callback: Connection to stonith-ng[0x62dc80] closed (I/O condition=17)
Dec 30 11:39:22 h01 lrmd[11003]: error: stonith_connection_destroy_cb: LRMD lost STONITH connection
Dec 30 11:39:22 h01 crmd[11006]: error: lrm_state_verify_stopped: 2 pending LRM operations at shutdown(null)
Dec 30 11:39:22 h01 crmd[11006]: error: ghash_print_pending: Pending action: prm_cLVMd:79 (prm_cLVMd_start_0)
Dec 30 11:39:22 h01 crmd[11006]: error: ghash_print_pending: Pending action: prm_O2CB:81 (prm_O2CB_start_0)
Dec 30 11:39:22 h01 crmd[11006]: notice: do_lrm_control: Disconnected from the LRM
Dec 30 11:39:22 h01 crmd[11006]: notice: terminate_cs_connection: Disconnecting from Corosync
Dec 30 11:39:22 h01 logd: [11537]: info: logd started with default configuration.
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x71abc0, async-conn=0x71abc0) left
Dec 30 11:39:22 h01 logd: [11537]: WARN: Core dumps could be lost if multiple dumps occur.
Dec 30 11:39:22 h01 logd: [11537]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Dec 30 11:39:22 h01 logd: [11537]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Dec 30 11:39:22 h01 crmd[11006]: error: crmd_fast_exit: Could not recover from internal error
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x72c090, async-conn=0x72c090) left
Dec 30 11:39:22 h01 kernel: [266584.277833] OCFS2 Node Manager 1.5.0
Dec 30 11:39:22 h01 kernel: [266584.291675] OCFS2 1.5.0
Dec 30 11:39:22 h01 ocfs2_controld.pcmk: Core dumps enabled: /var/lib/openais
Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_notify: Enabling node notifications for child 11546 (0x71abc0)
Dec 30 11:39:22 h01 ocfs2_controld[11546]: Cluster connection established. Local node id: 17831084
Dec 30 11:39:22 h01 ocfs2_controld[11546]: Added Pacemaker as client 1 with fd 8
Dec 30 11:39:22 h01 ocfs2_controld[11546]: Initializing CKPT service (try 1)
Dec 30 11:39:22 h01 ocfs2_controld[11546]: Connected to CKPT service with handle 0x440db64900000000
Dec 30 11:39:22 h01 ocfs2_controld[11546]: Opening checkpoint "ocfs2:controld:011014ac" (try 1)
[...]
Dec 30 11:39:22 h01 ocfs2_controld[11546]: cpg_join succeeded
Dec 30 11:39:22 h01 ocfs2_controld[11546]: setup done
Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: plugin_handle_membership: Membership 1908: quorum acquired
Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: crm_update_peer_state: plugin_handle_membership: Node h05[84939948] - state is now member (was (null))
Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
Dec 30 11:39:22 h01 ocfs2_controld[11546]: confchg called
Dec 30 11:39:22 h01 ocfs2_controld[11546]: ocfs2_controld (group "ocfs2:controld") confchg: members 2, left 0, joined 1
Dec 30 11:39:22 h01 ocfs2_controld[11546]: CPG is live, we are not the first daemon
[...]
Dec 30 11:39:22 h01 ocfs2_controld[11546]: Daemon protocol is 1.0
Dec 30 11:39:22 h01 ocfs2_controld[11546]: fs protocol is 1.0
Dec 30 11:39:22 h01 ocfs2_controld[11546]: Connecting to dlm_controld
Dec 30 11:39:22 h01 ocfs2_controld[11546]: Opening control device
Dec 30 11:39:22 h01 ocfs2_controld[11546]: Starting to listen for mounters
Dec 30 11:39:22 h01 ocfs2_controld[11546]: new listening connection 4
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=11001, core=false)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11553 for process cib
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=11006, rc=201)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151112 (1380626)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: crmd
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process crmd
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11554 for process crmd
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Child process stonith-ng exited (pid=11002, rc=0)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: stonith-ng
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11555 for process stonith-ng
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
(this does not look good!)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151112 (1380626)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
Dec 30 11:39:22 h01 stonith-ng[11555]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:39:22 h01 cib[11553]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
Dec 30 11:39:22 h01 crmd[11554]: notice: main: CRM Git Version: 9d65618
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Dec 30 11:39:22 h01 stonith-ng[11555]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x74e800 for stonith-ng/11555
Dec 30 11:39:22 h01 stonith-ng[11555]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:22 h01 cib[11553]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:39:22 h01 cib[11553]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x752b80 for cib/11553
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Sending membership update 1908 to cib
Dec 30 11:39:22 h01 cib[11553]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:22 h01 cib[11553]: notice: plugin_handle_membership: Membership 1908: quorum acquired
Dec 30 11:39:22 h01 cib[11553]: notice: crm_update_peer_state: plugin_handle_membership: Node h05[84939948] - state is now member (was (null))
Dec 30 11:39:22 h01 cib[11553]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:22 h01 cib[11553]: notice: cib_server_process_diff: Not applying diff 0.856.88 -> 0.856.89 (sync in progress)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:22 h01 cib[11553]: error: crm_compress_string: Compression of 1969 bytes failed: output data will not fit into the buffer provided (-8)
Dec 30 11:39:22 h01 cib[11553]: error: crm_ipc_prepare: Could not compress the message into less than the configured ipc limit (48 bytes).Set PCMK_ipc_buffer to a higher value (3938 bytes suggested)
Dec 30 11:39:22 h01 cib[11553]: notice: cib_notify_send: Notification failed: Message too long (-90)
Dec 30 11:39:22 h01 cib[11553]: error: crm_compress_string: Compression of 1939 bytes failed: output data will not fit into the buffer provided (-8)
Dec 30 11:39:22 h01 cib[11553]: error: crm_ipc_prepare: Could not compress the message into less than the configured ipc limit (48 bytes).Set PCMK_ipc_buffer to a higher value (7876 bytes suggested)
Dec 30 11:39:22 h01 cib[11553]: notice: cib_notify_send: Notification failed: Message too long (-90)
Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:23 h01 cib[11553]: error: crm_abort: cib_common_callback: Triggered fatal assert at callbacks.c:249 : flags & crm_ipc_client_response
Dec 30 11:39:23 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x752b80, async-conn=0x752b80) left
Dec 30 11:39:24 h01 cluster-dlm[11460]: notice: crm_ipc_send: Connection to cib_rw closed: Transport endpoint is not connected (-107)
Dec 30 11:39:24 h01 cluster-dlm[11460]: cib_connect: Signon to cib failed: Communication error on send
Dec 30 11:39:24 h01 cluster-dlm[11460]: dlm_join_lockspace: dlm_join_lockspace no fence domain
Dec 30 11:39:24 h01 cluster-dlm[11460]: do_sysfs: write "-1" to "/sys/kernel/dlm/clvmd/event_done"
Dec 30 11:39:24 h01 cluster-dlm[11460]: process_uevent: process_uevent online@ error -1 errno 0
Dec 30 11:39:24 h01 cluster-dlm[11460]: process_listener: client connection 5 fd 18
Dec 30 11:39:24 h01 cluster-dlm[11460]: process_uevent: uevent: remove@/kernel/dlm/clvmd
Dec 30 11:39:24 h01 cluster-dlm[11460]: process_uevent: kernel: remove@ clvmd
Dec 30 11:39:24 h01 clvmd[11517]: Unable to create DLM lockspace for CLVM: Operation not permitted
Dec 30 11:39:24 h01 clvmd[11517]: Can't initialise cluster interface
Dec 30 11:39:24 h01 kernel: [266586.266327] dlm: clvmd: group join failed -1 -1
Dec 30 11:39:24 h01 clvmd(prm_cLVMd)[11485]: ERROR: Could not start /usr/sbin/clvmd
Dec 30 11:39:24 h01 lrmd[11003]: notice: operation_finished: prm_cLVMd_start_0:11485:stderr [ local socket: connect failed: No such file or directory ]
Dec 30 11:39:24 h01 lrmd[11003]: notice: operation_finished: prm_cLVMd_start_0:11485:stderr [ clvmd could not connect to cluster manager ]
Dec 30 11:39:24 h01 lrmd[11003]: notice: operation_finished: prm_cLVMd_start_0:11485:stderr [ Consult syslog for more information ]
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=11553, core=false)
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11629 for process cib
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:24 h01 cib[11629]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
Dec 30 11:39:24 h01 cib[11629]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:39:24 h01 cib[11629]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x739ed0 for cib/11629
Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Sending membership update 1908 to cib
Dec 30 11:39:24 h01 cib[11629]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:24 h01 cib[11629]: notice: plugin_handle_membership: Membership 1908: quorum acquired
Dec 30 11:39:24 h01 cib[11629]: notice: crm_update_peer_state: plugin_handle_membership: Node h05[84939948] - state is now member (was (null))
Dec 30 11:39:24 h01 cib[11629]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
Dec 30 11:39:25 h01 crmd[11554]: notice: crm_ipc_send: Connection to cib_shm closed: Transport endpoint is not connected (-107)
Dec 30 11:39:25 h01 crmd[11554]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Dec 30 11:39:25 h01 stonith-ng[11555]: notice: setup_cib: Watching for stonith topology changes
Dec 30 11:39:25 h01 stonith-ng[11555]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 30 11:39:27 h01 cib[11629]: error: crm_abort: cib_common_callback: Triggered fatal assert at callbacks.c:247 : flags & crm_ipc_client_response
Dec 30 11:39:27 h01 stonith-ng[11555]: error: crm_ipc_read: Connection to cib_rw failed
Dec 30 11:39:27 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x739ed0, async-conn=0x739ed0) left
Dec 30 11:39:27 h01 stonith-ng[11555]: error: mainloop_gio_callback: Connection to cib_rw[0x61e760] closed (I/O condition=17)
Dec 30 11:39:27 h01 stonith-ng[11555]: notice: cib_connection_destroy: Connection to the CIB terminated. Shutting down.
Dec 30 11:39:27 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x74e800, async-conn=0x74e800) left
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=11629, core=false)
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11741 for process cib
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Child process stonith-ng exited (pid=11555, rc=0)
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: stonith-ng
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11742 for process stonith-ng
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:28 h01 stonith-ng[11742]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:39:28 h01 stonith-ng[11742]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x742b30 for stonith-ng/11742
Dec 30 11:39:28 h01 cib[11741]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Dec 30 11:39:28 h01 stonith-ng[11742]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:28 h01 cib[11741]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:39:28 h01 cib[11741]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
[...]
Dec 30 11:39:38 h01 cib[11884]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:39:38 h01 cib[11884]: notice: plugin_handle_membership: Membership 1908: quorum acquired
Dec 30 11:39:38 h01 cib[11884]: notice: crm_update_peer_state: plugin_handle_membership: Node h05[84939948] - state is now member (was (null))
Dec 30 11:39:38 h01 cib[11884]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
Dec 30 11:39:39 h01 attrd[11004]: notice: crm_ipc_send: Connection to cib_rw closed: Transport endpoint is not connected (-107)
Dec 30 11:39:39 h01 stonith-ng[11885]: notice: setup_cib: Watching for stonith topology changes
Dec 30 11:39:39 h01 stonith-ng[11885]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 30 11:39:42 h01 cib[11884]: error: crm_abort: cib_common_callback: Triggered fatal assert at callbacks.c:247 : flags & crm_ipc_client_response
Dec 30 11:39:42 h01 stonith-ng[11885]: error: crm_ipc_read: Connection to cib_rw failed
Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x7fc5ec0076f0, async-conn=0x7fc5ec0076f0) left
Dec 30 11:39:42 h01 stonith-ng[11885]: error: mainloop_gio_callback: Connection to cib_rw[0x61e760] closed (I/O condition=17)
Dec 30 11:39:42 h01 stonith-ng[11885]: notice: cib_connection_destroy: Connection to the CIB terminated. Shutting down.
Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=11884, core=false)
Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11930 for process cib
Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Child process stonith-ng exited (pid=11885, rc=0)
Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: stonith-ng
[...]
Dec 30 11:41:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:42:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:42:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:43:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:43:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:44:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:44:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:45:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:45:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:46:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:46:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:47:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:47:38 h01 shutdown[16154]: shutting down for system reboot
Dec 30 11:47:38 h01 init: Switching to runlevel: 6
Dec 30 11:47:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:48:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:48:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:49:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:49:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
Dec 30 11:50:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...

After the reboot, the situation looks like this (no settings were changed!):
Dec 30 11:56:55 h01 sbd: [15941]: notice: Using watchdog device: /dev/watchdog
Dec 30 11:56:55 h01 sbd: [15941]: info: Set watchdog timeout to 90 seconds.
Dec 30 11:56:55 h01 kernel: [ 82.778538] hpwdt: New timer passed in is 90 seconds.
Dec 30 11:56:55 h01 corosync[16143]: [MAIN ] Corosync Cluster Engine ('1.4.6'): started and ready to provide service.
Dec 30 11:56:55 h01 corosync[16143]: [MAIN ] Corosync built-in features: nss
Dec 30 11:56:55 h01 corosync[16143]: [MAIN ] Successfully configured openais services to load
Dec 30 11:56:55 h01 corosync[16143]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
[...]
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_startup: Local hostname: h01
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_update_nodeid: Local node id: 17831084
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: update_member: Creating entry for node 17831084 born on 0
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: update_member: 0x6451c0 Node 17831084 now known as h01 (was: (null))
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: update_member: Node h01 now has 1 quorum votes (was 0)
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: update_member: Node 17831084/h01 is now: member
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16180 for process cib
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16181 for process stonith-ng
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16182 for process lrmd
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process attrd
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16183 for process attrd
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process pengine
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16184 for process pengine
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process crmd
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16185 for process crmd
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16186 for process mgmtd
Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.1.10
Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync configuration service
Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync profile loading service
Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Dec 30 11:56:56 h01 corosync[16143]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Dec 30 11:56:56 h01 corosync[16143]: [TOTEM ] The network interface [10.2.2.1] is now up.
Dec 30 11:56:56 h01 corosync[16143]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:56:56 h01 corosync[16143]: [CLM ] New Configuration:
Dec 30 11:56:56 h01 corosync[16143]: [CLM ] Members Left:
Dec 30 11:56:56 h01 corosync[16143]: [CLM ] Members Joined:
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1912: memb=0, new=0, lost=0
Dec 30 11:56:56 h01 corosync[16143]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:56:56 h01 corosync[16143]: [CLM ] New Configuration:
Dec 30 11:56:56 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:56:56 h01 corosync[16143]: [CLM ] Members Left:
Dec 30 11:56:56 h01 corosync[16143]: [CLM ] Members Joined:
Dec 30 11:56:56 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1912: memb=1, new=1, lost=0
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: NEW: h01 17831084
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:56:56 h01 corosync[16143]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 30 11:56:56 h01 corosync[16143]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:0 left:0)
Dec 30 11:56:56 h01 corosync[16143]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 30 11:56:56 h01 stonith-ng[16181]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:56:56 h01 stonith-ng[16181]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Recorded connection 0x6534e0 for stonith-ng/16181
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Sending membership update 1912 to stonith-ng
Dec 30 11:56:56 h01 stonith-ng[16181]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:56:56 h01 attrd[16183]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:56:56 h01 attrd[16183]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Recorded connection 0x659650 for attrd/16183
Dec 30 11:56:56 h01 attrd[16183]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:56:56 h01 attrd[16183]: notice: main: Starting mainloop...
Dec 30 11:56:56 h01 cib[16180]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
Dec 30 11:56:56 h01 mgmtd: [16186]: info: Pacemaker-mgmt Git Version: dcb09c5
Dec 30 11:56:56 h01 mgmtd: [16186]: WARN: Core dumps could be lost if multiple dumps occur.
Dec 30 11:56:56 h01 mgmtd: [16186]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Dec 30 11:56:56 h01 mgmtd: [16186]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Dec 30 11:56:56 h01 mgmtd: [16186]: notice: connect to lrmd: 0, rc=-107
Dec 30 11:56:56 h01 cib[16180]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:56:56 h01 cib[16180]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Recorded connection 0x65d9d0 for cib/16180
Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Sending membership update 1912 to cib
Dec 30 11:56:56 h01 crmd[16185]: notice: main: CRM Git Version: 65bb87e
Dec 30 11:56:56 h01 cib[16180]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:56:56 h01 cib[16180]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
Dec 30 11:56:57 h01 stonith-ng[16181]: notice: setup_cib: Watching for stonith topology changes
Dec 30 11:56:57 h01 stonith-ng[16181]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
Dec 30 11:56:57 h01 stonith-ng[16181]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 30 11:56:57 h01 crmd[16185]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Dec 30 11:56:57 h01 crmd[16185]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:56:57 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Recorded connection 0x662670 for crmd/16185
Dec 30 11:56:57 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Sending membership update 1912 to crmd
Dec 30 11:56:57 h01 crmd[16185]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Dec 30 11:56:57 h01 crmd[16185]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
Dec 30 11:56:57 h01 crmd[16185]: warning: cluster_option: Using deprecated name 'migration-limit' for cluster option 'node-action-limit'
Dec 30 11:56:57 h01 crmd[16185]: notice: do_started: The local CRM is operational
Dec 30 11:56:57 h01 crmd[16185]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Dec 30 11:56:57 h01 mgmtd: [16186]: info: Started.
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] New Configuration:
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] Members Left:
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] Members Joined:
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1916: memb=1, new=0, lost=0
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: memb: h01 17831084
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] New Configuration:
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] Members Left:
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] Members Joined:
Dec 30 11:57:00 h01 stonith-ng[16181]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 84939948
Dec 30 11:57:00 h01 stonith-ng[16181]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[84939948] - state is now member (was (null))
Dec 30 11:57:00 h01 cib[16180]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 84939948
Dec 30 11:57:00 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:57:00 h01 stonith-ng[16181]: notice: plugin_handle_membership: Membership 1916: quorum acquired
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1916: memb=2, new=1, lost=0
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: Creating entry for node 84939948 born on 1916
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: Node 84939948/unknown is now: member
Dec 30 11:57:00 h01 crmd[16185]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 84939948
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: NEW: .pending. 84939948
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 84939948
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: send_member_notification: Sending membership update 1916 to 3 children
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: 0x6451c0 Node 17831084 ((null)) born on: 1916
Dec 30 11:57:00 h01 corosync[16143]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: 0x667590 Node 84939948 (h05) born on: 1876
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: 0x667590 Node 84939948 now known as h05 (was: (null))
Dec 30 11:57:00 h01 crmd[16185]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[84939948] - state is now member (was (null))
Dec 30 11:57:00 h01 cib[16180]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[84939948] - state is now member (was (null))
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: Node h05 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:57:00 h01 cib[16180]: notice: plugin_handle_membership: Membership 1916: quorum acquired
Dec 30 11:57:00 h01 crmd[16185]: notice: plugin_handle_membership: Membership 1916: quorum acquired
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: Node h05 now has 1 quorum votes (was 0)
Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: send_member_notification: Sending membership update 1916 to 3 children
Dec 30 11:57:00 h01 cib[16180]: notice: cib_server_process_diff: Not applying diff 0.856.100 -> 0.856.101 (sync in progress)
Dec 30 11:57:00 h01 corosync[16143]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:1 left:0)
Dec 30 11:57:00 h01 corosync[16143]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 30 11:57:00 h01 crmd[16185]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Dec 30 11:57:00 h01 attrd[16183]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Dec 30 11:57:02 h01 o2cb(prm_O2CB)[16588]: INFO: configfs not loaded
Dec 30 11:57:02 h01 Filesystem(prm_CFS_VMs_fs)[16589]: WARNING: Couldn't find device [/dev/CFS_VMs/xen]. Expected /dev/??? to exist
Dec 30 11:57:02 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_O2CB_monitor_0 (call=26, rc=7, cib-update=9, confirmed=true) not running
Dec 30 11:57:02 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_cLVMd_monitor_0 (call=16, rc=7, cib-update=10, confirmed=true) not running
Dec 30 11:57:02 h01 LVM(prm_LVM_CFS_VMs)[16587]: WARNING: LVM Volume CFS_VMs is not available (stopped)
Dec 30 11:57:02 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_DLM_monitor_0 (call=11, rc=7, cib-update=11, confirmed=true) not running
[... (more probes)]
Dec 30 11:57:03 h01 ocfs2_controld[17046]: new listening connection 4
Dec 30 11:57:03 h01 cluster-dlm[16969]: process_listener: client connection 7 fd 21
Dec 30 11:57:04 h01 clvmd[17023]: Created DLM lockspace for CLVMD.
Dec 30 11:57:04 h01 clvmd[17023]: DLM initialisation complete
Dec 30 11:57:04 h01 clvmd[17023]: Our local node id is 17831084
Dec 30 11:57:04 h01 clvmd[17023]: Connected to Corosync
Dec 30 11:57:04 h01 clvmd[17023]: Cluster LVM daemon started - connected to Corosync
Dec 30 11:57:04 h01 clvmd[17023]: Cluster ready, doing some more initialisation
Dec 30 11:57:04 h01 clvmd[17023]: starting LVM thread
Dec 30 11:57:04 h01 clvmd[17023]: LVM thread function started
Dec 30 11:57:05 h01 lvm[17023]: Sub thread ready for work.
Dec 30 11:57:05 h01 lvm[17023]: LVM thread waiting for work
Dec 30 11:57:05 h01 lvm[17023]: clvmd ready for work
Dec 30 11:57:05 h01 lvm[17023]: Using timeout of 60 seconds
Dec 30 11:57:05 h01 lvm[17023]: confchg callback. 1 joined, 0 left, 2 members
Dec 30 11:57:05 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_O2CB_start_0 (call=63, rc=0, cib-update=23, confirmed=true) ok
Dec 30 11:57:05 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_O2CB_monitor_60000 (call=64, rc=0, cib-update=24, confirmed=false) ok
Dec 30 11:57:07 h01 cmirrord[17078]: Starting cmirrord:
Dec 30 11:57:07 h01 cmirrord[17078]: Built: May 29 2013 15:04:35
Dec 30 11:57:09 h01 lrmd[16182]: notice: operation_finished: prm_cLVMd_start_0:16979:stderr [ local socket: connect failed: Connection refused ]
Dec 30 11:57:09 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_cLVMd_start_0 (call=62, rc=0, cib-update=25, confirmed=true) ok
Dec 30 11:57:09 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_cLVMd_monitor_300000 (call=65, rc=0, cib-update=26, confirmed=false) ok
Dec 30 11:57:09 h01 lvm[17023]: Got new connection on fd 5
[...]
Dec 30 11:57:13 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_CFS_VMs_fs_start_0 (call=68, rc=0, cib-update=29, confirmed=true) ok
Dec 30 11:57:13 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_CFS_VMs_fs_monitor_300000 (call=69, rc=0, cib-update=30, confirmed=false) ok
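Since nothing in the configuration was touched, that can be verified directly. A minimal sketch, assuming crmsh and cibadmin are available as on a standard SLES11 SP3 HAE installation; the file names are only examples:

  # snapshot of the resource/constraint configuration, taken once before the
  # update and once after the reboot, then compared
  crm configure show > /tmp/crm-config-before.txt
  crm configure show > /tmp/crm-config-after.txt
  diff -u /tmp/crm-config-before.txt /tmp/crm-config-after.txt
  # the raw CIB XML can be captured the same way if needed
  cibadmin -Q > /tmp/cib-after.xml

An empty diff confirms that the resource configuration itself is unchanged and points the rejoin problem at the membership/communication layer rather than at the CIB.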

During the communication problems the other node (DC) saw these messages:
Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000000002 (2)
Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1900 to 4 children
Dec 30 11:39:06 h05 crmd[20787]: notice: peer_update_callback: do_shutdown of h01 (op 80) is complete
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] New Configuration:
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] Members Left:
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] Members Joined:
Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1904: memb=1, new=0, lost=1
Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: memb: h05 84939948
Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: lost: h01 17831084
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] New Configuration:
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] Members Left:
Dec 30 11:39:06 h05 ocfs2_controld[15794]: notice: plugin_handle_membership: Membership 1904: quorum lost
Dec 30 11:39:06 h05 corosync[20745]: [CLM ] Members Joined:
Dec 30 11:39:06 h05 ocfs2_controld[15794]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
Dec 30 11:39:06 h05 cluster-dlm[15666]: notice: plugin_handle_membership: Membership 1904: quorum lost
Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1904: memb=1, new=0, lost=0
Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h05 84939948
Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: ais_mark_unseen_peer_dead: Node h01 was not seen in the previous transition
Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: update_member: Node 17831084/h01 is now: lost
Dec 30 11:39:06 h05 cluster-dlm[15666]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1904 to 4 children
Dec 30 11:39:06 h05 corosync[20745]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 30 11:39:06 h05 cluster-dlm[15666]: update_cluster: Processing membership 1904
Dec 30 11:39:06 h05 cluster-dlm[15666]: dlm_process_node: Skipped active node 84939948: born-on=1876, last-seen=1904, this-event=1904, last-event=1900
Dec 30 11:39:06 h05 cluster-dlm[15666]: del_configfs_node: del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/17831084"
Dec 30 11:39:06 h05 cluster-dlm[15666]: dlm_process_node: Removed inactive node 17831084: born-on=1900, last-seen=1900, this-event=1904, last-event=1900
Dec 30 11:39:06 h05 corosync[20745]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.5) r(1) ip(10.2.2.5) ; members(old:2 left:1)
Dec 30 11:39:06 h05 corosync[20745]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 30 11:39:06 h05 kernel: [2678855.695469] dlm: closing connection to node 17831084
Dec 30 11:39:06 h05 cib[20782]: notice: plugin_handle_membership: Membership 1904: quorum lost
Dec 30 11:39:06 h05 cib[20782]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
Dec 30 11:39:06 h05 crmd[20787]: notice: plugin_handle_membership: Membership 1904: quorum lost
Dec 30 11:39:06 h05 crmd[20787]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
Dec 30 11:39:06 h05 crmd[20787]: notice: peer_update_callback: do_shutdown of h01 (op 80) is complete
Dec 30 11:39:06 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to false (call=1874)
Dec 30 11:39:06 h05 crmd[20787]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 30 11:39:06 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to false (call=1879)
Dec 30 11:39:06 h05 attrd[20785]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Dec 30 11:39:06 h05 attrd[20785]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Dec 30 11:39:06 h05 pengine[20786]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 30 11:39:06 h05 pengine[20786]: notice: find_anonymous_clone: /Anonymous/ clone cln_CFS_VMs_fs is already running on h05
Dec 30 11:39:06 h05 pengine[20786]: notice: process_pe_message: Calculated Transition 318: /var/lib/pacemaker/pengine/pe-input-32.bz2
Dec 30 11:39:06 h05 crmd[20787]: notice: do_te_invoke: Processing graph 318 (ref=pe_calc-dc-1388399946-774) derived from /var/lib/pacemaker/pengine/pe-input-32.bz2
Dec 30 11:39:06 h05 crmd[20787]: notice: run_graph: Transition 318 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-32.bz2): Complete
Dec 30 11:39:06 h05 crmd[20787]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] New Configuration:
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] Members Left:
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] Members Joined:
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1908: memb=1, new=0, lost=0
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: memb: h05 84939948
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] New Configuration:
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] Members Left:
Dec 30 11:39:17 h05 cluster-dlm[15666]: notice: plugin_handle_membership: Membership 1908: quorum acquired
Dec 30 11:39:17 h05 ocfs2_controld[15794]: notice: plugin_handle_membership: Membership 1908: quorum acquired
Dec 30 11:39:17 h05 ocfs2_controld[15794]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
Dec 30 11:39:17 h05 cluster-dlm[15666]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] Members Joined:
Dec 30 11:39:17 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1908: memb=2, new=1, lost=0
Dec 30 11:39:17 h05 cluster-dlm[15666]: update_cluster: Processing membership 1908
Dec 30 11:39:17 h05 crmd[20787]: notice: plugin_handle_membership: Membership 1908: quorum acquired
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: update_member: Node 17831084/h01 is now: member
Dec 30 11:39:17 h05 cib[20782]: notice: plugin_handle_membership: Membership 1908: quorum acquired
Dec 30 11:39:17 h05 cib[20782]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
Dec 30 11:39:17 h05 crmd[20787]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
Dec 30 11:39:17 h05 cluster-dlm[15666]: dlm_process_node: Skipped active node 84939948: born-on=1876, last-seen=1908, this-event=1908, last-event=1904
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: NEW: h01 17831084
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h05 84939948
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:17 h05 cluster-dlm[15666]: dlm_process_node: Adding address ip(172.20.16.1) to configfs for node 17831084
Dec 30 11:39:17 h05 corosync[20745]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 30 11:39:17 h05 cluster-dlm[15666]: add_configfs_node: set_configfs_node 17831084 172.20.16.1 local 0
Dec 30 11:39:17 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to true (call=1884)
Dec 30 11:39:17 h05 cluster-dlm[15666]: dlm_process_node: Adding address ip(10.2.2.1) to configfs for node 17831084
Dec 30 11:39:17 h05 cluster-dlm[15666]: add_configfs_node: set_configfs_node 17831084 10.2.2.1 local 0
Dec 30 11:39:17 h05 cluster-dlm[15666]: create_path: /sys/kernel/config/dlm/cluster/comms/17831084: mkdir failed: 17
Dec 30 11:39:17 h05 cluster-dlm[15666]: dlm_process_node: Added active node 17831084: born-on=1900, last-seen=1908, this-event=1908, last-event=1904
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: update_member: 0x663780 Node 17831084 (h01) born on: 1908
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:17 h05 corosync[20745]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:1 left:0)
Dec 30 11:39:17 h05 corosync[20745]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 30 11:39:17 h05 crmd[20787]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
Dec 30 11:39:20 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to true (call=1902)
Dec 30 11:39:20 h05 attrd[20785]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Dec 30 11:39:20 h05 attrd[20785]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Dec 30 11:39:20 h05 pengine[20786]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 30 11:39:20 h05 pengine[20786]: notice: find_anonymous_clone: /Anonymous/ clone cln_CFS_VMs_fs is already running on h05
Dec 30 11:39:20 h05 pengine[20786]: notice: LogActions: Start prm_DLM:1 (h01)
Dec 30 11:39:20 h05 pengine[20786]: notice: LogActions: Start prm_cLVMd:1 (h01)
Dec 30 11:39:20 h05 pengine[20786]: notice: LogActions: Start prm_LVM_CFS_VMs:1 (h01)
Dec 30 11:39:20 h05 pengine[20786]: notice: LogActions: Start prm_O2CB:1 (h01)
Dec 30 11:39:20 h05 pengine[20786]: notice: LogActions: Start prm_CFS_VMs_fs:1 (h01)
Dec 30 11:39:20 h05 pengine[20786]: notice: process_pe_message: Calculated Transition 319: /var/lib/pacemaker/pengine/pe-input-33.bz2
Dec 30 11:39:20 h05 crmd[20787]: notice: do_te_invoke: Processing graph 319 (ref=pe_calc-dc-1388399960-784) derived from /var/lib/pacemaker/pengine/pe-input-33.bz2
Dec 30 11:39:20 h05 crmd[20787]: notice: te_rsc_command: Initiating action 19: monitor prm_stonith_sbd_monitor_0 on h01
[...]
Dec 30 11:39:22 h05 ocfs2_controld[15794]: confchg called
Dec 30 11:39:22 h05 ocfs2_controld[15794]: ocfs2_controld (group "ocfs2:controld") confchg: members 2, left 0, joined 1
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151112 (1380626)
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:22 h05 crmd[20787]: warning: match_down_event: No match for shutdown action on h01
Dec 30 11:39:22 h05 crmd[20787]: notice: peer_update_callback: Stonith/shutdown of h01 not matched
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 46 (46) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 45 (45) was pending on h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 54 (54) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 53 (53) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 62 (62) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 61 (61) was pending on h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 70 (70) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 69 (69) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: warning: fail_incompletable_actions: Node h01 shutdown resulted in un-runnable actions
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 46 (46) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 45 (45) was pending on h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 54 (54) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 53 (53) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 62 (62) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 61 (61) was pending on h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 70 (70) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 69 (69) is scheduled for h01 (offline)
Dec 30 11:39:22 h05 crmd[20787]: warning: fail_incompletable_actions: Node h01 shutdown resulted in un-runnable actions
Dec 30 11:39:22 h05 crmd[20787]: notice: run_graph: Transition 319 (Complete=25, Pending=0, Fired=0, Skipped=10, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-33.bz2): Stopped
Dec 30 11:39:24 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:24 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:24 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:24 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
(and so on...)

(some time later)
Dec 30 11:42:22 h05 crmd[20787]: error: crm_timer_popped: Integration Timer (I_INTEGRATED) just popped in state S_INTEGRATION! (180000ms)
Dec 30 11:42:22 h05 crmd[20787]: warning: do_state_transition: Progressed to state S_FINALIZE_JOIN after C_TIMER_POPPED
Dec 30 11:42:22 h05 crmd[20787]: warning: do_state_transition: 1 cluster nodes failed to respond to the join offer.
Dec 30 11:42:22 h05 crmd[20787]: notice: crmd_join_phase_log: join-1: h01=welcomed
Dec 30 11:42:22 h05 crmd[20787]: notice: crmd_join_phase_log: join-1: h05=confirmed
Dec 30 11:50:01 h05 /usr/sbin/cron[13942]: (root) CMD ([ -x /usr/lib64/sa/sa1 ] && exec /usr/lib64/sa/sa1 -S ALL 1 1)
Dec 30 11:51:15 h05 cluster-dlm[15666]: log_config: dlm:controld conf 1 0 1 memb 84939948 join left 17831084
Dec 30 11:51:15 h05 ocfs2_controld[15794]: confchg called
Dec 30 11:51:15 h05 ocfs2_controld[15794]: ocfs2_controld (group "ocfs2:controld") confchg: members 1, left 1, joined 0
Dec 30 11:51:15 h05 ocfs2_controld[15794]: node daemon left 17831084
Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000000002 (2)
Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
Dec 30 11:51:16 h05 crmd[20787]: warning: match_down_event: No match for shutdown action on h01
Dec 30 11:51:16 h05 crmd[20787]: notice: peer_update_callback: Stonith/shutdown of h01 not matched
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] New Configuration:
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] Members Left:
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] Members Joined:
Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1912: memb=1, new=0, lost=1
Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: memb: h05 84939948
Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: lost: h01 17831084
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] New Configuration:
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] Members Left:
Dec 30 11:51:16 h05 ocfs2_controld[15794]: notice: plugin_handle_membership: Membership 1912: quorum lost
Dec 30 11:51:16 h05 ocfs2_controld[15794]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
Dec 30 11:51:16 h05 corosync[20745]: [CLM ] Members Joined:
Dec 30 11:51:16 h05 cluster-dlm[15666]: notice: plugin_handle_membership: Membership 1912: quorum lost
Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1912: memb=1, new=0, lost=0
Dec 30 11:51:16 h05 cluster-dlm[15666]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h05 84939948
Dec 30 11:51:16 h05 cluster-dlm[15666]: update_cluster: Processing membership 1912
Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: ais_mark_unseen_peer_dead: Node h01 was not seen in the previous transition
Dec 30 11:51:16 h05 cluster-dlm[15666]: dlm_process_node: Skipped active node 84939948: born-on=1876, last-seen=1912, this-event=1912, last-event=1908
Dec 30 11:51:16 h05 cluster-dlm[15666]: del_configfs_node: del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/17831084"
Dec 30 11:51:16 h05 cluster-dlm[15666]: dlm_process_node: Removed inactive node 17831084: born-on=1908, last-seen=1908, this-event=1912, last-event=1908
Dec 30 11:51:16 h05 crmd[20787]: notice: plugin_handle_membership: Membership 1912: quorum lost
Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: update_member: Node 17831084/h01 is now: lost
Dec 30 11:51:16 h05 crmd[20787]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
Dec 30 11:51:16 h05 crmd[20787]: warning: match_down_event: No match for shutdown action on h01
Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1912 to 4 children
Dec 30 11:51:16 h05 crmd[20787]: notice: peer_update_callback: Stonith/shutdown of h01 not matched
Dec 30 11:51:16 h05 cib[20782]: notice: plugin_handle_membership: Membership 1912: quorum lost
Dec 30 11:51:16 h05 corosync[20745]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 30 11:51:16 h05 cib[20782]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
Dec 30 11:51:16 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to false (call=2141)
Dec 30 11:51:16 h05 corosync[20745]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.5) r(1) ip(10.2.2.5) ; members(old:2 left:1)
Dec 30 11:51:16 h05 corosync[20745]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 30 11:51:16 h05 kernel: [2679586.289486] dlm: closing connection to node 17831084
Dec 30 11:51:16 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to false (call=2146)
Dec 30 11:51:16 h05 attrd[20785]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Dec 30 11:51:16 h05 attrd[20785]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Dec 30 11:51:16 h05 pengine[20786]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 30 11:51:16 h05 pengine[20786]: warning: pe_fence_node: Node h01 will be fenced because the node is no longer part of the cluster
Dec 30 11:51:16 h05 pengine[20786]: warning: determine_online_status: Node h01 is unclean
Dec 30 11:51:16 h05 pengine[20786]: notice: find_anonymous_clone: /Anonymous/ clone cln_CFS_VMs_fs is already running on h05
Dec 30 11:51:16 h05 pengine[20786]: warning: custom_action: Action prm_DLM:0_stop_0 on h01 is unrunnable (offline)
Dec 30 11:51:16 h05 pengine[20786]: warning: custom_action: Action prm_DLM:0_stop_0 on h01 is unrunnable (offline)
Dec 30 11:51:16 h05 pengine[20786]: warning: custom_action: Action prm_cLVMd:0_stop_0 on h01 is unrunnable (offline)
Dec 30 11:51:16 h05 pengine[20786]: warning: custom_action: Action prm_cLVMd:0_stop_0 on h01 is unrunnable (offline)
[...]
Dec 30 11:54:17 h05 sbd: [14621]: info: reset successfully delivered to h01
Dec 30 11:54:17 h05 sbd: [14619]: info: Message successfully delivered.
Dec 30 11:54:18 h05 stonith-ng[20783]: notice: log_operation: Operation 'reboot' [14612] (call 2 from crmd.20787) for host 'h01' with device 'prm_stonith_sbd' returned: 0 (OK)
Dec 30 11:54:18 h05 stonith-ng[20783]: notice: remote_op_done: Operation reboot of h01 by h05 for crmd.20787@h05.515279c7: OK
Dec 30 11:54:18 h05 crmd[20787]: notice: tengine_stonith_callback: Stonith operation 2/71:320:0:d2d0a70f-b17d-4e6f-b9bb-7f71b7501e0d: OK (0)
Dec 30 11:54:18 h05 crmd[20787]: notice: tengine_stonith_notify: Peer h01 was terminated (reboot) by h05 for h05: OK (ref=515279c7-51cc-47e3-930a-46d8fd5786ca) by client crmd.20787
Dec 30 11:54:18 h05 crmd[20787]: notice: run_graph: Transition 320 (Complete=18, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-0.bz2): Complete
Dec 30 11:54:18 h05 crmd[20787]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 30 11:54:18 h05 crmd[20787]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Dec 30 11:54:18 h05 pengine[20786]: notice: unpack_config: On loss of CCM Quorum: Ignore
Dec 30 11:54:18 h05 pengine[20786]: notice: find_anonymous_clone: /Anonymous/ clone cln_CFS_VMs_fs is already running on h05
Dec 30 11:54:18 h05 pengine[20786]: notice: process_pe_message: Calculated Transition 321: /var/lib/pacemaker/pengine/pe-input-34.bz2
Dec 30 11:54:18 h05 crmd[20787]: notice: do_te_invoke: Processing graph 321 (ref=pe_calc-dc-1388400858-859) derived from /var/lib/pacemaker/pengine/pe-input-34.bz2
Dec 30 11:54:18 h05 crmd[20787]: notice: run_graph: Transition 321 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-34.bz2): Complete
Dec 30 11:54:18 h05 crmd[20787]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 30 11:55:01 h05 /usr/sbin/cron[16614]: (root) CMD ([ -x /usr/lib64/sa/sa2 ] && exec /usr/lib64/sa/sa2 -A)
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] New Configuration:
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] Members Left:
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] Members Joined:
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1916: memb=1, new=0, lost=0
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: memb: h05 84939948
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] New Configuration:
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] Members Left:
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] Members Joined:
Dec 30 11:57:00 h05 cib[20782]: notice: plugin_handle_membership: Membership 1916: quorum acquired
Dec 30 11:57:00 h05 cib[20782]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
Dec 30 11:57:00 h05 ocfs2_controld[15794]: notice: plugin_handle_membership: Membership 1916: quorum acquired
Dec 30 11:57:00 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
Dec 30 11:57:00 h05 crmd[20787]: notice: plugin_handle_membership: Membership 1916: quorum acquired
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1916: memb=2, new=1, lost=0
Dec 30 11:57:00 h05 cluster-dlm[15666]: notice: plugin_handle_membership: Membership 1916: quorum acquired
Dec 30 11:57:00 h05 ocfs2_controld[15794]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
Dec 30 11:57:00 h05 crmd[20787]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: update_member: Node 17831084/h01 is now: member
Dec 30 11:57:00 h05 cluster-dlm[15666]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: NEW: h01 17831084
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
Dec 30 11:57:00 h05 cluster-dlm[15666]: update_cluster: Processing membership 1916
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h05 84939948
Dec 30 11:57:00 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to true (call=2160)
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1916 to 4 children
Dec 30 11:57:00 h05 cluster-dlm[15666]: dlm_process_node: Skipped active node 84939948: born-on=1876, last-seen=1916, this-event=1916, last-event=1912
Dec 30 11:57:00 h05 corosync[20745]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: update_member: 0x663780 Node 17831084 (h01) born on: 1916
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
Dec 30 11:57:00 h05 cluster-dlm[15666]: dlm_process_node: Adding address ip(172.20.16.1) to configfs for node 17831084
Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1916 to 4 children
Dec 30 11:57:00 h05 cluster-dlm[15666]: add_configfs_node: set_configfs_node 17831084 172.20.16.1 local 0
Dec 30 11:57:00 h05 cluster-dlm[15666]: dlm_process_node: Adding address ip(10.2.2.1) to configfs for node 17831084
Dec 30 11:57:00 h05 cluster-dlm[15666]: add_configfs_node: set_configfs_node 17831084 10.2.2.1 local 0
Dec 30 11:57:00 h05 cluster-dlm[15666]: create_path: /sys/kernel/config/dlm/cluster/comms/17831084: mkdir failed: 17
Dec 30 11:57:00 h05 cluster-dlm[15666]: dlm_process_node: Added active node 17831084: born-on=1908, last-seen=1916, this-event=1916, last-event=1912
Dec 30 11:57:00 h05 corosync[20745]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:1 left:0)
Dec 30 11:57:00 h05 corosync[20745]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 30 11:57:00 h05 crmd[20787]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
Dec 30 11:57:00 h05 crmd[20787]: notice: do_dc_join_finalize: join-1: Syncing the CIB from h01 to the rest of the cluster
Dec 30 11:57:00 h05 crmd[20787]: notice: do_dc_join_finalize: Requested version <generation_tuple epoch="856" num_updates="102" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" have-quorum="1" cib-last-written="Mon Dec 30 08:18:06 2013" update-origin="h01" update-client="cibadmin" update-user="root" dc-uuid="h05"/>
Dec 30 11:57:00 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to true (call=2174)
Dec 30 11:57:00 h05 attrd[20785]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Dec 30 11:57:00 h05 attrd[20785]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
[...]
Dec 30 11:57:13 h05 crmd[20787]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

Oh, well. The software on h01 is the latest for SLES11 SP3 (a query sketch for reproducing this listing follows the list):
python-lxml-2.3.6-0.11.1 Mon Dec 30 11:39:23 2013
pacemaker-mgmt-client-2.1.2-0.11.4 Mon Dec 30 11:39:23 2013
crmsh-1.2.6-0.25.4 Mon Dec 30 11:39:23 2013
resource-agents-3.9.5-0.32.22 Mon Dec 30 11:39:22 2013
pacemaker-mgmt-2.1.2-0.11.4 Mon Dec 30 11:39:22 2013
pacemaker-1.1.10-0.9.28 Mon Dec 30 11:39:22 2013
cluster-glue-1.0.11-0.19.4 Mon Dec 30 11:39:20 2013
sbd-1.2.1-0.7.22 Mon Dec 30 11:39:19 2013
libpacemaker3-1.1.10-0.9.28 Mon Dec 30 11:39:19 2013
openais-1.1.4-5.17.5 Mon Dec 30 11:34:01 2013
libqb0-0.16.0-0.7.4 Mon Dec 30 11:34:00 2013
libopenais3-1.1.4-5.17.5 Mon Dec 30 11:34:00 2013
libglue2-1.0.11-0.19.4 Mon Dec 30 11:34:00 2013
ldirectord-3.9.5-0.32.22 Mon Dec 30 11:34:00 2013
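For reference, a listing of package install times like the one above can usually be obtained from the RPM database; a minimal sketch (the grep filter is only an example and may not match the exact set of packages queried here):

  # installed packages, most recently installed first, with install time
  rpm -qa --last | head -n 20
  # restrict to the cluster stack packages
  rpm -qa --last | grep -E 'pacemaker|openais|corosync|sbd|cluster-glue|crmsh|resource-agents|libqb'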

Regards,
Ulrich

Re: Problem with updating the cluster stack (SLES11 SP3) (long)
Hi Ulrich

Are you using the update that was pushed on Friday? The previous
versions are quite badly broken in certain scenarios, and I have been
working with the SLES engineers for over a month now to get to the
bottom of them. The Friday update should fix everything we found; I am
now running it on two of my clusters, and both upgraded cleanly.
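A quick way to check whether that maintenance update is actually installed or still pending, sketched under the assumption that the node uses zypper as on SLES 11 (the package names are taken from the list above):

  # maintenance patches still pending from the configured update repositories
  zypper list-patches
  # currently installed versions of the relevant cluster-stack packages
  rpm -q pacemaker libpacemaker3 openais libopenais3 sbd cluster-glue crmsh

Comparing those versions against the ones shipped with the Friday update shows whether the node is really on the fixed level.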

Tom

On 12/30/2013 07:37 AM, Ulrich Windl wrote:
> Hello!
>
> I had seens this problem at least twice (reported to Novell support also (pending)): When updating the cluster stack via online update, the updated node fails to re-join the cluster. Furthermore "rcopenais restart" hangs on stopping, so I needed a reboot. Logs look like this:
>
> (Update restarts services)
> Dec 30 11:34:01 h01 corosync[20684]: [SERV ] Unloading all Corosync service engines.
> Dec 30 11:34:01 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: Shuting down Pacemaker
> Dec 30 11:34:01 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to mgmtd: [20727]
> Dec 30 11:34:01 h01 mgmtd: [20727]: info: mgmtd is shutting down
> Dec 30 11:34:01 h01 mgmtd: [20727]: debug: [mgmtd] stopped
> Dec 30 11:34:01 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: mgmtd confirmed stopped
> Dec 30 11:34:01 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to crmd: [20726]
> Dec 30 11:34:01 h01 crmd[20726]: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
> Dec 30 11:34:01 h01 attrd[20724]: notice: attrd_cs_dispatch: Update relayed from h05
> Dec 30 11:34:01 h01 attrd[20724]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1388399641)
> Dec 30 11:34:01 h01 attrd[20724]: notice: attrd_perform_update: Sent update 10: shutdown=1388399641
> Dec 30 11:34:01 h01 Xen(prm_xen_v05)[5393]: INFO: v05: Starting xm migrate to h05
> (Cluster triggers resource migration...)
> [...]
> Dec 30 11:39:00 h01 Xen(prm_xen_v06)[8525]: INFO: v06: xm migrate to h05 succeeded.
> Dec 30 11:39:00 h01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/2/51712
> Dec 30 11:39:00 h01 crmd[20726]: notice: process_lrm_event: LRM operation prm_xen_v06_migrate_to_0 (call=171, rc=0, cib-update=50, confirmed=true) ok
> [...]
> (Cluster is stopping dependent resources)
> Dec 30 11:39:00 h01 crmd[20726]: notice: process_lrm_event: LRM operation prm_xen_v06_stop_0 (call=176, rc=0, cib-update=51, confirmed=true) ok
> Dec 30 11:39:00 h01 Filesystem(prm_CFS_VMs_fs)[10396]: INFO: Running stop for /dev/CFS_VMs/xen on /cfs/xen
> Dec 30 11:39:00 h01 Filesystem(prm_CFS_VMs_fs)[10396]: INFO: Trying to unmount /cfs/xen
> [...]
> Dec 30 11:39:01 h01 kernel: [266563.583622] ocfs2: Unmounting device (253,48) on (node 1783108)
> Dec 30 11:39:01 h01 Filesystem(prm_CFS_VMs_fs)[10396]: INFO: unmounted /cfs/xen successfully
> Dec 30 11:39:01 h01 crmd[20726]: notice: process_lrm_event: LRM operation prm_CFS_VMs_fs_stop_0 (call=180, rc=0, cib-update=52, confirmed=true) ok
> Dec 30 11:39:01 h01 lvm[21964]: Got new connection on fd 5
> (cLVM is telling the story of its life, the universe, and everything...)
> [...]
> Dec 30 11:39:02 h01 lvm[21964]: process_work_item: free fd -1
> Dec 30 11:39:02 h01 lvm[21964]: LVM thread waiting for work
> Dec 30 11:39:02 h01 LVM(prm_LVM_CFS_VMs)[10671]: INFO: 0 logical volume(s) in volume group "CFS_VMs" now active
> Dec 30 11:39:02 h01 LVM(prm_LVM_CFS_VMs)[10671]: INFO: LVM Volume CFS_VMs is not available (stopped)
> Dec 30 11:39:02 h01 crmd[20726]: notice: process_lrm_event: LRM operation prm_LVM_CFS_VMs_stop_0 (call=185, rc=0, cib-update=53, confirmed=true) ok
> Dec 30 11:39:02 h01 clvmd(prm_cLVMd)[10734]: INFO: Stopping prm_cLVMd
> [...]
> (Cluster Stack eventually stops)
> Dec 30 11:39:04 h01 kernel: [266566.993795] dlm: closing connection to node 84939948
> Dec 30 11:39:04 h01 kernel: [266566.993828] dlm: closing connection to node 17831084
> Dec 30 11:39:06 h01 crmd[20726]: notice: process_lrm_event: LRM operation prm_DLM_stop_0 (call=200, rc=0, cib-update=56, confirmed=true) ok
> Dec 30 11:39:06 h01 crmd[20726]: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
> Dec 30 11:39:06 h01 crmd[20726]: notice: do_lrm_control: Disconnected from the LRM
> Dec 30 11:39:06 h01 crmd[20726]: notice: terminate_cs_connection: Disconnecting from Corosync
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x726a20, async-conn=0x726a20) left
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: crmd confirmed stopped
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to pengine: [20725]
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: pengine confirmed stopped
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to attrd: [20724]
> Dec 30 11:39:06 h01 attrd[20724]: notice: main: Exiting...
> Dec 30 11:39:06 h01 attrd[20724]: notice: main: Disconnecting client 0x6106a0, pid=20726...
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: attrd confirmed stopped
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to pengine: [20725]
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: pengine confirmed stopped
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to attrd: [20724]
> Dec 30 11:39:06 h01 attrd[20724]: notice: main: Exiting...
> Dec 30 11:39:06 h01 attrd[20724]: notice: main: Disconnecting client 0x6106a0, pid=20726...
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: attrd confirmed stopped
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to lrmd: [20723]
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: lrmd confirmed stopped
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to stonith-ng: [20722]
> Dec 30 11:39:06 h01 cib[20721]: warning: qb_ipcs_event_sendv: new_event_notification (20721-20722-10): Broken pipe (32)
> Dec 30 11:39:06 h01 cib[20721]: warning: cib_notify_send_one: Notification of client crmd/76346944-ec4a-498e-9f5b-7cfa12b7d5cf failed
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x719fa0, async-conn=0x719fa0) left
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: stonith-ng confirmed stopped
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: stop_child: Sent -15 to cib: [20721]
> Dec 30 11:39:06 h01 cib[20721]: notice: terminate_cs_connection: Disconnecting from Corosync
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x7226a0, async-conn=0x7226a0) left
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: cib confirmed stopped
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000000002 (2)
> Dec 30 11:39:06 h01 corosync[20684]: [pcmk ] notice: pcmk_shutdown: Shutdown complete
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: Pacemaker Cluster Manager 1.1.10
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync configuration service
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync profile loading service
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais cluster membership service B.01.01
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais event service B.01.01
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais distributed locking service B.03.01
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais message service B.03.01
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais timer service A.01.01
> Dec 30 11:39:06 h01 corosync[20684]: [SERV ] Service engine unloaded: openais availability management framework B.01.01
> Dec 30 11:39:06 h01 corosync[20684]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1940.
>
> (Restart of the cluster stack is now observed)
> Dec 30 11:39:08 h01 sbd: [10827]: info: Watchdog enabled.
> Dec 30 11:39:08 h01 sbd: [10827]: info: Setting latency warning to 15
> Dec 30 11:39:08 h01 sbd: [10828]: info: Delivery process handling /dev/disk/by-id/dm-name-SBD_1-E3
> Dec 30 11:39:08 h01 sbd: [10829]: info: Delivery process handling /dev/disk/by-id/dm-name-SBD_1-E4
> Dec 30 11:39:08 h01 sbd: [10828]: info: Writing exit to node slot h01
> Dec 30 11:39:08 h01 sbd: [10829]: info: Writing exit to node slot h01
> Dec 30 11:39:08 h01 sbd: [10828]: info: exit successfully delivered to h01
> Dec 30 11:39:08 h01 sbd: [10829]: info: exit successfully delivered to h01
> Dec 30 11:39:08 h01 sbd: [10827]: info: Message successfully delivered.
> Dec 30 11:39:09 h01 sbd: [20486]: info: Received command exit from h01 on disk /dev/disk/by-id/dm-name-SBD_1-E4
> Dec 30 11:39:09 h01 sbd: [20483]: WARN: Servant for /dev/disk/by-id/dm-name-SBD_1-E3 (pid: 20485) has terminated
> Dec 30 11:39:09 h01 sbd: [20483]: WARN: Servant for /dev/disk/by-id/dm-name-SBD_1-E4 (pid: 20486) has terminated
> Dec 30 11:39:10 h01 sbd: [10894]: info: Watchdog enabled.
> Dec 30 11:39:10 h01 sbd: [10894]: info: Setting latency warning to 15
> Dec 30 11:39:10 h01 sbd: [10898]: info: Servant starting for device /dev/disk/by-id/dm-name-SBD_1-E3
> Dec 30 11:39:10 h01 sbd: [10899]: info: Servant starting for device /dev/disk/by-id/dm-name-SBD_1-E4
> Dec 30 11:39:17 h01 sbd: [10895]: notice: Using watchdog device: /dev/watchdog
> Dec 30 11:39:17 h01 sbd: [10895]: info: Set watchdog timeout to 90 seconds.
> Dec 30 11:39:17 h01 kernel: [266579.322915] hpwdt: New timer passed in is 90 seconds.
> Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Corosync Cluster Engine ('1.4.6'): started and ready to provide service.
> Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Corosync built-in features: nss
> Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Successfully configured openais services to load
> Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] Initializing transport (UDP/IP Multicast).
> Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] Initializing transport (UDP/IP Multicast).
> Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] The network interface [172.20.16.1] is now up.
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais cluster membership service B.01.01
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais event service B.01.01
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais availability management framework B.01.01
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais message service B.03.01
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais distributed locking service B.03.01
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: openais timer service A.01.01
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: process_ais_conf: Reading configure
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_init: Local handle: 8054506479773810690 for logging
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional logging options...
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found 'no' for option: to_logfile
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found 'yes' for option: to_syslog
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found 'daemon' for option: syslog_facility
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_init: Local handle: 7664968412203843587 for quorum
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: No additional configuration supplied for: quorum
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: No default for option: provider
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_init: Local handle: 9033685355982749700 for service
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: config_find_next: Processing additional service options...
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found '0' for option: ver
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Defaulting to 'pcmk' for option: clustername
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: get_config_opt: Found 'yes' for option: use_mgmtd
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_startup: CRM: Initialized
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] Logging: Initialized pcmk_startup
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_startup: Service: 9
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_startup: Local hostname: h01
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_update_nodeid: Local node id: 17831084
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Creating entry for node 17831084 born on 0
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: 0x704810 Node 17831084 now known as h01 (was: (null))
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has 1 quorum votes (was 0)
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node 17831084/h01 is now: member
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11001 for process cib
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11002 for process stonith-ng
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11003 for process lrmd
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process attrd
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11004 for process attrd
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process pengine
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11005 for process pengine
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process crmd
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11006 for process crmd
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11007 for process mgmtd
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.1.10
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync configuration service
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync profile loading service
> Dec 30 11:39:17 h01 corosync[10962]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] The network interface [10.2.2.1] is now up.
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] New Configuration:
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Left:
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Joined:
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1904: memb=0, new=0, lost=0
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] New Configuration:
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Left:
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Joined:
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1904: memb=1, new=1, lost=0
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: NEW: h01 17831084
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Dec 30 11:39:17 h01 corosync[10962]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:0 left:0)
> Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Completed service synchronization, ready to provide service.
> (so far, everything looks good (IMHO))
> Dec 30 11:39:17 h01 mgmtd: [11007]: info: Pacemaker-mgmt Git Version:
> Dec 30 11:39:17 h01 crmd[11006]: notice: main: CRM Git Version: 9d65618
> Dec 30 11:39:17 h01 attrd[11004]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:39:17 h01 cib[11001]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
> Dec 30 11:39:17 h01 stonith-ng[11002]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:39:17 h01 mgmtd: [11007]: WARN: Core dumps could be lost if multiple dumps occur.
> Dec 30 11:39:17 h01 mgmtd: [11007]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
> Dec 30 11:39:17 h01 mgmtd: [11007]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
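> (Side note on the three mgmtd warnings above: they are supportability hints, not part of
> the failure itself. A minimal sketch of what they ask for; the core_pattern value is my
> own example and not taken from this system:
>
>   sysctl -w kernel.core_uses_pid=1
>   sysctl -w kernel.core_pattern=/var/crash/core-%e-%p   # example path/pattern only
>
> Persisting both in /etc/sysctl.conf would make them survive a reboot.)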
> Dec 30 11:39:17 h01 attrd[11004]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x716840 for attrd/11004
> Dec 30 11:39:17 h01 stonith-ng[11002]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x71abc0 for stonith-ng/11002
> Dec 30 11:39:17 h01 attrd[11004]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:17 h01 attrd[11004]: notice: main: Starting mainloop...
> Dec 30 11:39:17 h01 stonith-ng[11002]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:17 h01 cib[11001]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:39:17 h01 cib[11001]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x721790 for cib/11001
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Sending membership update 1904 to cib
> Dec 30 11:39:17 h01 cib[11001]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:17 h01 cib[11001]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] New Configuration:
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Left:
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Joined:
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1908: memb=1, new=0, lost=0
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: memb: h01 17831084
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] New Configuration:
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Left:
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] Members Joined:
> Dec 30 11:39:17 h01 corosync[10962]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1908: memb=2, new=1, lost=0
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Creating entry for node 84939948 born on 1908
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node 84939948/unknown is now: member
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: NEW: .pending. 84939948
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 84939948
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 1 children
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: 0x704810 Node 17831084 ((null)) born on: 1908
> (the node should now be integrated into the cluster)
> Dec 30 11:39:17 h01 corosync[10962]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Dec 30 11:39:17 h01 cib[11001]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[84939948] - state is now member (was (null))
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: 0x71f610 Node 84939948 (h05) born on: 1876
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: 0x71f610 Node 84939948 now known as h05 (was: (null))
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node h05 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: update_member: Node h05 now has 1 quorum votes (was 0)
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 1 children
> Dec 30 11:39:17 h01 cib[11001]: notice: plugin_handle_membership: Membership 1908: quorum acquired
> (everything looks perfect, doesn't it? But now...)
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:17 h01 corosync[10962]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:1 left:0)
> Dec 30 11:39:17 h01 cib[11001]: notice: cib_server_process_diff: Not applying diff 0.856.46 -> 0.856.47 (sync in progress)
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:17 h01 corosync[10962]: [MAIN ] Completed service synchronization, ready to provide service.
> Dec 30 11:39:17 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:17 h01 cib[11001]: notice: crm_ipc_prepare: Message exceeds the configured ipc limit (51200 bytes), consider configuring PCMK_ipc_buffer to 161300 or higher to avoid compression overheads
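> (Note: the message above names its own remedy, a larger PCMK_ipc_buffer. A minimal sketch,
> assuming the init script on this platform reads /etc/sysconfig/pacemaker (that path is my
> assumption) and picking a round value above the suggested 161300 bytes:
>
>   echo 'PCMK_ipc_buffer=262144' >> /etc/sysconfig/pacemaker   # path and value are examples
>   rcopenais restart                                           # restart so the daemons see it
>
> Whether this buffer warning is connected to the cib asserts further down, I cannot say.)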
> Dec 30 11:39:18 h01 mgmtd: [11007]: info: Started.
> Dec 30 11:39:18 h01 crmd[11006]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:39:18 h01 stonith-ng[11002]: notice: setup_cib: Watching for stonith topology changes
> Dec 30 11:39:18 h01 crmd[11006]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:18 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x72c090 for crmd/11006
> Dec 30 11:39:18 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Sending membership update 1908 to crmd
> Dec 30 11:39:18 h01 crmd[11006]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:18 h01 stonith-ng[11002]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 30 11:39:18 h01 crmd[11006]: notice: plugin_handle_membership: Membership 1908: quorum acquired
> Dec 30 11:39:18 h01 crmd[11006]: notice: crm_update_peer_state: plugin_handle_membership: Node h05[84939948] - state is now member (was (null))
> Dec 30 11:39:18 h01 crmd[11006]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
> Dec 30 11:39:18 h01 crmd[11006]: notice: do_started: The local CRM is operational
> Dec 30 11:39:18 h01 crmd[11006]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Dec 30 11:39:20 h01 logd: [11079]: debug: Stopping ha_logd with pid 16800
> Dec 30 11:39:20 h01 logd: [11079]: info: Waiting for pid=16800 to exit
> Dec 30 11:39:20 h01 logd: [16800]: debug: logd_term_action: received SIGTERM
> Dec 30 11:39:20 h01 logd: [16800]: debug: logd_term_action: waiting for 0 messages to be read by write process
> Dec 30 11:39:20 h01 logd: [16800]: debug: logd_term_action: sending SIGTERM to write process
> Dec 30 11:39:20 h01 logd: [16804]: info: logd_term_write_action: received SIGTERM
> Dec 30 11:39:20 h01 logd: [16804]: debug: Writing out 0 messages then quitting
> Dec 30 11:39:20 h01 logd: [16804]: info: Exiting write process
> Dec 30 11:39:20 h01 crmd[11006]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
> Dec 30 11:39:20 h01 attrd[11004]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Dec 30 11:39:20 h01 Filesystem(prm_CFS_VMs_fs)[11089]: WARNING: Couldn't find device [/dev/CFS_VMs/xen]. Expected /dev/??? to exist
> Dec 30 11:39:20 h01 LVM(prm_LVM_CFS_VMs)[11083]: WARNING: LVM Volume CFS_VMs is not available (stopped)
> Dec 30 11:39:20 h01 o2cb(prm_O2CB)[11085]: INFO: Stack glue driver not loaded
> Dec 30 11:39:21 h01 crmd[11006]: notice: process_lrm_event: LRM operation prm_cLVMd_monitor_0 (call=16, rc=7, cib-update=9, confirmed=true) not running
> [...]
> (resources start up)
> Dec 30 11:39:22 h01 kernel: [266584.261481] dlm: Using SCTP for communications
> Dec 30 11:39:22 h01 cib[11001]: error: crm_abort: cib_common_callback: Triggered fatal assert at callbacks.c:249 : flags & crm_ipc_client_response
> Dec 30 11:39:22 h01 kernel: [266584.267797] ocfs2: Registered cluster interface user
> Dec 30 11:39:22 h01 stonith-ng[11002]: error: crm_ipc_read: Connection to cib_rw failed
> Dec 30 11:39:22 h01 stonith-ng[11002]: error: mainloop_gio_callback: Connection to cib_rw[0x61e680] closed (I/O condition=17)
> Dec 30 11:39:22 h01 crmd[11006]: error: crm_ipc_read: Connection to cib_shm failed
> Dec 30 11:39:22 h01 crmd[11006]: error: mainloop_gio_callback: Connection to cib_shm[0x757ec0] closed (I/O condition=17)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x721790, async-conn=0x721790) left
> Dec 30 11:39:22 h01 stonith-ng[11002]: notice: cib_connection_destroy: Connection to the CIB terminated. Shutting down.
> Dec 30 11:39:22 h01 crmd[11006]: error: crmd_cib_connection_destroy: Connection to the CIB terminated...
> Dec 30 11:39:22 h01 crmd[11006]: error: do_log: FSA: Input I_ERROR from crmd_cib_connection_destroy() received in state S_NOT_DC
> Dec 30 11:39:22 h01 crmd[11006]: notice: do_state_transition: State transition S_NOT_DC -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=crmd_cib_connection_destroy ]
> Dec 30 11:39:22 h01 crmd[11006]: warning: do_recover: Fast-tracking shutdown in response to errors
> Dec 30 11:39:22 h01 crmd[11006]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> Dec 30 11:39:22 h01 lrmd[11003]: error: crm_ipc_read: Connection to stonith-ng failed
> Dec 30 11:39:22 h01 lrmd[11003]: error: mainloop_gio_callback: Connection to stonith-ng[0x62dc80] closed (I/O condition=17)
> Dec 30 11:39:22 h01 lrmd[11003]: error: stonith_connection_destroy_cb: LRMD lost STONITH connection
> Dec 30 11:39:22 h01 crmd[11006]: error: lrm_state_verify_stopped: 2 pending LRM operations at shutdown(null)
> Dec 30 11:39:22 h01 crmd[11006]: error: ghash_print_pending: Pending action: prm_cLVMd:79 (prm_cLVMd_start_0)
> Dec 30 11:39:22 h01 crmd[11006]: error: ghash_print_pending: Pending action: prm_O2CB:81 (prm_O2CB_start_0)
> Dec 30 11:39:22 h01 crmd[11006]: notice: do_lrm_control: Disconnected from the LRM
> Dec 30 11:39:22 h01 crmd[11006]: notice: terminate_cs_connection: Disconnecting from Corosync
> Dec 30 11:39:22 h01 logd: [11537]: info: logd started with default configuration.
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x71abc0, async-conn=0x71abc0) left
> Dec 30 11:39:22 h01 logd: [11537]: WARN: Core dumps could be lost if multiple dumps occur.
> Dec 30 11:39:22 h01 logd: [11537]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
> Dec 30 11:39:22 h01 logd: [11537]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
> Dec 30 11:39:22 h01 crmd[11006]: error: crmd_fast_exit: Could not recover from internal error
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x72c090, async-conn=0x72c090) left
> Dec 30 11:39:22 h01 kernel: [266584.277833] OCFS2 Node Manager 1.5.0
> Dec 30 11:39:22 h01 kernel: [266584.291675] OCFS2 1.5.0
> Dec 30 11:39:22 h01 ocfs2_controld.pcmk: Core dumps enabled: /var/lib/openais
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_notify: Enabling node notifications for child 11546 (0x71abc0)
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: Cluster connection established. Local node id: 17831084
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: Added Pacemaker as client 1 with fd 8
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: Initializing CKPT service (try 1)
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: Connected to CKPT service with handle 0x440db64900000000
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: Opening checkpoint "ocfs2:controld:011014ac" (try 1)
> [...]
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: cpg_join succeeded
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: setup done
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: plugin_handle_membership: Membership 1908: quorum acquired
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: crm_update_peer_state: plugin_handle_membership: Node h05[84939948] - state is now member (was (null))
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: confchg called
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: ocfs2_controld (group "ocfs2:controld") confchg: members 2, left 0, joined 1
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: CPG is live, we are not the first daemon
> [...]
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: Daemon protocol is 1.0
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: fs protocol is 1.0
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: Connecting to dlm_controld
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: Opening control device
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: Starting to listen for mounters
> Dec 30 11:39:22 h01 ocfs2_controld[11546]: new listening connection 4
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=11001, core=false)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11553 for process cib
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=11006, rc=201)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151112 (1380626)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: crmd
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process crmd
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11554 for process crmd
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Child process stonith-ng exited (pid=11002, rc=0)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: stonith-ng
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11555 for process stonith-ng
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> (this does not look good!)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151112 (1380626)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
> Dec 30 11:39:22 h01 stonith-ng[11555]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:39:22 h01 cib[11553]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
> Dec 30 11:39:22 h01 crmd[11554]: notice: main: CRM Git Version: 9d65618
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:22 h01 stonith-ng[11555]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x74e800 for stonith-ng/11555
> Dec 30 11:39:22 h01 stonith-ng[11555]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:22 h01 cib[11553]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:39:22 h01 cib[11553]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x752b80 for cib/11553
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Sending membership update 1908 to cib
> Dec 30 11:39:22 h01 cib[11553]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:22 h01 cib[11553]: notice: plugin_handle_membership: Membership 1908: quorum acquired
> Dec 30 11:39:22 h01 cib[11553]: notice: crm_update_peer_state: plugin_handle_membership: Node h05[84939948] - state is now member (was (null))
> Dec 30 11:39:22 h01 cib[11553]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:22 h01 cib[11553]: notice: cib_server_process_diff: Not applying diff 0.856.88 -> 0.856.89 (sync in progress)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:22 h01 cib[11553]: error: crm_compress_string: Compression of 1969 bytes failed: output data will not fit into the buffer provided (-8)
> Dec 30 11:39:22 h01 cib[11553]: error: crm_ipc_prepare: Could not compress the message into less than the configured ipc limit (48 bytes).Set PCMK_ipc_buffer to a higher value (3938 bytes suggested)
> Dec 30 11:39:22 h01 cib[11553]: notice: cib_notify_send: Notification failed: Message too long (-90)
> Dec 30 11:39:22 h01 cib[11553]: error: crm_compress_string: Compression of 1939 bytes failed: output data will not fit into the buffer provided (-8)
> Dec 30 11:39:22 h01 cib[11553]: error: crm_ipc_prepare: Could not compress the message into less than the configured ipc limit (48 bytes).Set PCMK_ipc_buffer to a higher value (7876 bytes suggested)
> Dec 30 11:39:22 h01 cib[11553]: notice: cib_notify_send: Notification failed: Message too long (-90)
> Dec 30 11:39:22 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:23 h01 cib[11553]: error: crm_abort: cib_common_callback: Triggered fatal assert at callbacks.c:249 : flags & crm_ipc_client_response
> Dec 30 11:39:23 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x752b80, async-conn=0x752b80) left
> Dec 30 11:39:24 h01 cluster-dlm[11460]: notice: crm_ipc_send: Connection to cib_rw closed: Transport endpoint is not connected (-107)
> Dec 30 11:39:24 h01 cluster-dlm[11460]: cib_connect: Signon to cib failed: Communication error on send
> Dec 30 11:39:24 h01 cluster-dlm[11460]: dlm_join_lockspace: dlm_join_lockspace no fence domain
> Dec 30 11:39:24 h01 cluster-dlm[11460]: do_sysfs: write "-1" to "/sys/kernel/dlm/clvmd/event_done"
> Dec 30 11:39:24 h01 cluster-dlm[11460]: process_uevent: process_uevent online@ error -1 errno 0
> Dec 30 11:39:24 h01 cluster-dlm[11460]: process_listener: client connection 5 fd 18
> Dec 30 11:39:24 h01 cluster-dlm[11460]: process_uevent: uevent: remove@/kernel/dlm/clvmd
> Dec 30 11:39:24 h01 cluster-dlm[11460]: process_uevent: kernel: remove@ clvmd
> Dec 30 11:39:24 h01 clvmd[11517]: Unable to create DLM lockspace for CLVM: Operation not permitted
> Dec 30 11:39:24 h01 clvmd[11517]: Can't initialise cluster interface
> Dec 30 11:39:24 h01 kernel: [266586.266327] dlm: clvmd: group join failed -1 -1
> Dec 30 11:39:24 h01 clvmd(prm_cLVMd)[11485]: ERROR: Could not start /usr/sbin/clvmd
> Dec 30 11:39:24 h01 lrmd[11003]: notice: operation_finished: prm_cLVMd_start_0:11485:stderr [ local socket: connect failed: No such file or directory ]
> Dec 30 11:39:24 h01 lrmd[11003]: notice: operation_finished: prm_cLVMd_start_0:11485:stderr [ clvmd could not connect to cluster manager ]
> Dec 30 11:39:24 h01 lrmd[11003]: notice: operation_finished: prm_cLVMd_start_0:11485:stderr [ Consult syslog for more information ]
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=11553, core=false)
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11629 for process cib
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:24 h01 cib[11629]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
> Dec 30 11:39:24 h01 cib[11629]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:39:24 h01 cib[11629]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x739ed0 for cib/11629
> Dec 30 11:39:24 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Sending membership update 1908 to cib
> Dec 30 11:39:24 h01 cib[11629]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:24 h01 cib[11629]: notice: plugin_handle_membership: Membership 1908: quorum acquired
> Dec 30 11:39:24 h01 cib[11629]: notice: crm_update_peer_state: plugin_handle_membership: Node h05[84939948] - state is now member (was (null))
> Dec 30 11:39:24 h01 cib[11629]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
> Dec 30 11:39:25 h01 crmd[11554]: notice: crm_ipc_send: Connection to cib_shm closed: Transport endpoint is not connected (-107)
> Dec 30 11:39:25 h01 crmd[11554]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> Dec 30 11:39:25 h01 stonith-ng[11555]: notice: setup_cib: Watching for stonith topology changes
> Dec 30 11:39:25 h01 stonith-ng[11555]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 30 11:39:27 h01 cib[11629]: error: crm_abort: cib_common_callback: Triggered fatal assert at callbacks.c:247 : flags & crm_ipc_client_response
> Dec 30 11:39:27 h01 stonith-ng[11555]: error: crm_ipc_read: Connection to cib_rw failed
> Dec 30 11:39:27 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x739ed0, async-conn=0x739ed0) left
> Dec 30 11:39:27 h01 stonith-ng[11555]: error: mainloop_gio_callback: Connection to cib_rw[0x61e760] closed (I/O condition=17)
> Dec 30 11:39:27 h01 stonith-ng[11555]: notice: cib_connection_destroy: Connection to the CIB terminated. Shutting down.
> Dec 30 11:39:27 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x74e800, async-conn=0x74e800) left
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=11629, core=false)
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11741 for process cib
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Child process stonith-ng exited (pid=11555, rc=0)
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: stonith-ng
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11742 for process stonith-ng
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 2 children
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:28 h01 stonith-ng[11742]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:39:28 h01 stonith-ng[11742]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] info: pcmk_ipc: Recorded connection 0x742b30 for stonith-ng/11742
> Dec 30 11:39:28 h01 cib[11741]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
> Dec 30 11:39:28 h01 corosync[10962]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Dec 30 11:39:28 h01 stonith-ng[11742]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:28 h01 cib[11741]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:39:28 h01 cib[11741]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> [...]
> Dec 30 11:39:38 h01 cib[11884]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:39:38 h01 cib[11884]: notice: plugin_handle_membership: Membership 1908: quorum acquired
> Dec 30 11:39:38 h01 cib[11884]: notice: crm_update_peer_state: plugin_handle_membership: Node h05[84939948] - state is now member (was (null))
> Dec 30 11:39:38 h01 cib[11884]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
> Dec 30 11:39:39 h01 attrd[11004]: notice: crm_ipc_send: Connection to cib_rw closed: Transport endpoint is not connected (-107)
> Dec 30 11:39:39 h01 stonith-ng[11885]: notice: setup_cib: Watching for stonith topology changes
> Dec 30 11:39:39 h01 stonith-ng[11885]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 30 11:39:42 h01 cib[11884]: error: crm_abort: cib_common_callback: Triggered fatal assert at callbacks.c:247 : flags & crm_ipc_client_response
> Dec 30 11:39:42 h01 stonith-ng[11885]: error: crm_ipc_read: Connection to cib_rw failed
> Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x7fc5ec0076f0, async-conn=0x7fc5ec0076f0) left
> Dec 30 11:39:42 h01 stonith-ng[11885]: error: mainloop_gio_callback: Connection to cib_rw[0x61e760] closed (I/O condition=17)
> Dec 30 11:39:42 h01 stonith-ng[11885]: notice: cib_connection_destroy: Connection to the CIB terminated. Shutting down.
> Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=11884, core=false)
> Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
> Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: spawn_child: Forked child 11930 for process cib
> Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Child process stonith-ng exited (pid=11885, rc=0)
> Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
> Dec 30 11:39:42 h01 corosync[10962]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: stonith-ng
> [...]
> Dec 30 11:41:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:42:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:42:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:43:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:43:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:44:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:44:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:45:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:45:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:46:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:46:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:47:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:47:38 h01 shutdown[16154]: shutting down for system reboot
> Dec 30 11:47:38 h01 init: Switching to runlevel: 6
> Dec 30 11:47:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:48:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:48:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:49:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:49:47 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
> Dec 30 11:50:17 h01 corosync[10962]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=11554, seq=6) to terminate...
>
> After the reboot, the situation looks like this (no settings were changed!):
> Dec 30 11:56:55 h01 sbd: [15941]: notice: Using watchdog device: /dev/watchdog
> Dec 30 11:56:55 h01 sbd: [15941]: info: Set watchdog timeout to 90 seconds.
> Dec 30 11:56:55 h01 kernel: [ 82.778538] hpwdt: New timer passed in is 90 seconds.
> Dec 30 11:56:55 h01 corosync[16143]: [MAIN ] Corosync Cluster Engine ('1.4.6'): started and ready to provide service.
> Dec 30 11:56:55 h01 corosync[16143]: [MAIN ] Corosync built-in features: nss
> Dec 30 11:56:55 h01 corosync[16143]: [MAIN ] Successfully configured openais services to load
> Dec 30 11:56:55 h01 corosync[16143]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> [...]
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_startup: Local hostname: h01
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_update_nodeid: Local node id: 17831084
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: update_member: Creating entry for node 17831084 born on 0
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: update_member: 0x6451c0 Node 17831084 now known as h01 (was: (null))
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: update_member: Node h01 now has 1 quorum votes (was 0)
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: update_member: Node 17831084/h01 is now: member
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16180 for process cib
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16181 for process stonith-ng
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16182 for process lrmd
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process attrd
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16183 for process attrd
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process pengine
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16184 for process pengine
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process crmd
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16185 for process crmd
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: spawn_child: Forked child 16186 for process mgmtd
> Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.1.10
> Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
> Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync configuration service
> Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
> Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync profile loading service
> Dec 30 11:56:56 h01 corosync[16143]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> Dec 30 11:56:56 h01 corosync[16143]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Dec 30 11:56:56 h01 corosync[16143]: [TOTEM ] The network interface [10.2.2.1] is now up.
> Dec 30 11:56:56 h01 corosync[16143]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:56:56 h01 corosync[16143]: [CLM ] New Configuration:
> Dec 30 11:56:56 h01 corosync[16143]: [CLM ] Members Left:
> Dec 30 11:56:56 h01 corosync[16143]: [CLM ] Members Joined:
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1912: memb=0, new=0, lost=0
> Dec 30 11:56:56 h01 corosync[16143]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:56:56 h01 corosync[16143]: [CLM ] New Configuration:
> Dec 30 11:56:56 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:56:56 h01 corosync[16143]: [CLM ] Members Left:
> Dec 30 11:56:56 h01 corosync[16143]: [CLM ] Members Joined:
> Dec 30 11:56:56 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1912: memb=1, new=1, lost=0
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: NEW: h01 17831084
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:56:56 h01 corosync[16143]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Dec 30 11:56:56 h01 corosync[16143]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:0 left:0)
> Dec 30 11:56:56 h01 corosync[16143]: [MAIN ] Completed service synchronization, ready to provide service.
> Dec 30 11:56:56 h01 stonith-ng[16181]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:56:56 h01 stonith-ng[16181]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Recorded connection 0x6534e0 for stonith-ng/16181
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Sending membership update 1912 to stonith-ng
> Dec 30 11:56:56 h01 stonith-ng[16181]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:56:56 h01 attrd[16183]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:56:56 h01 attrd[16183]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Recorded connection 0x659650 for attrd/16183
> Dec 30 11:56:56 h01 attrd[16183]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:56:56 h01 attrd[16183]: notice: main: Starting mainloop...
> Dec 30 11:56:56 h01 cib[16180]: notice: main: Using legacy config location: /var/lib/heartbeat/crm
> Dec 30 11:56:56 h01 mgmtd: [16186]: info: Pacemaker-mgmt Git Version: dcb09c5
> Dec 30 11:56:56 h01 mgmtd: [16186]: WARN: Core dumps could be lost if multiple dumps occur.
> Dec 30 11:56:56 h01 mgmtd: [16186]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
> Dec 30 11:56:56 h01 mgmtd: [16186]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
> Dec 30 11:56:56 h01 mgmtd: [16186]: notice: connect to lrmd: 0, rc=-107
> Dec 30 11:56:56 h01 cib[16180]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:56:56 h01 cib[16180]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Recorded connection 0x65d9d0 for cib/16180
> Dec 30 11:56:56 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Sending membership update 1912 to cib
> Dec 30 11:56:56 h01 crmd[16185]: notice: main: CRM Git Version: 65bb87e
> Dec 30 11:56:56 h01 cib[16180]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:56:56 h01 cib[16180]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
> Dec 30 11:56:57 h01 stonith-ng[16181]: notice: setup_cib: Watching for stonith topology changes
> Dec 30 11:56:57 h01 stonith-ng[16181]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
> Dec 30 11:56:57 h01 stonith-ng[16181]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 30 11:56:57 h01 crmd[16185]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Dec 30 11:56:57 h01 crmd[16185]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:56:57 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Recorded connection 0x662670 for crmd/16185
> Dec 30 11:56:57 h01 corosync[16143]: [pcmk ] info: pcmk_ipc: Sending membership update 1912 to crmd
> Dec 30 11:56:57 h01 crmd[16185]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
> Dec 30 11:56:57 h01 crmd[16185]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was (null))
> Dec 30 11:56:57 h01 crmd[16185]: warning: cluster_option: Using deprecated name 'migration-limit' for cluster option 'node-action-limit'
> Dec 30 11:56:57 h01 crmd[16185]: notice: do_started: The local CRM is operational
> Dec 30 11:56:57 h01 crmd[16185]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Dec 30 11:56:57 h01 mgmtd: [16186]: info: Started.
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] New Configuration:
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] Members Left:
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] Members Joined:
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1916: memb=1, new=0, lost=0
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: memb: h01 17831084
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] New Configuration:
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] Members Left:
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] Members Joined:
> Dec 30 11:57:00 h01 stonith-ng[16181]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 84939948
> Dec 30 11:57:00 h01 stonith-ng[16181]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[84939948] - state is now member (was (null))
> Dec 30 11:57:00 h01 cib[16180]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 84939948
> Dec 30 11:57:00 h01 corosync[16143]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:57:00 h01 stonith-ng[16181]: notice: plugin_handle_membership: Membership 1916: quorum acquired
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1916: memb=2, new=1, lost=0
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: Creating entry for node 84939948 born on 1916
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: Node 84939948/unknown is now: member
> Dec 30 11:57:00 h01 crmd[16185]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 84939948
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: NEW: .pending. 84939948
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 84939948
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: send_member_notification: Sending membership update 1916 to 3 children
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: 0x6451c0 Node 17831084 ((null)) born on: 1916
> Dec 30 11:57:00 h01 corosync[16143]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: 0x667590 Node 84939948 (h05) born on: 1876
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: 0x667590 Node 84939948 now known as h05 (was: (null))
> Dec 30 11:57:00 h01 crmd[16185]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[84939948] - state is now member (was (null))
> Dec 30 11:57:00 h01 cib[16180]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[84939948] - state is now member (was (null))
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: Node h05 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:57:00 h01 cib[16180]: notice: plugin_handle_membership: Membership 1916: quorum acquired
> Dec 30 11:57:00 h01 crmd[16185]: notice: plugin_handle_membership: Membership 1916: quorum acquired
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: update_member: Node h05 now has 1 quorum votes (was 0)
> Dec 30 11:57:00 h01 corosync[16143]: [pcmk ] info: send_member_notification: Sending membership update 1916 to 3 children
> Dec 30 11:57:00 h01 cib[16180]: notice: cib_server_process_diff: Not applying diff 0.856.100 -> 0.856.101 (sync in progress)
> Dec 30 11:57:00 h01 corosync[16143]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:1 left:0)
> Dec 30 11:57:00 h01 corosync[16143]: [MAIN ] Completed service synchronization, ready to provide service.
> Dec 30 11:57:00 h01 crmd[16185]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
> Dec 30 11:57:00 h01 attrd[16183]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Dec 30 11:57:02 h01 o2cb(prm_O2CB)[16588]: INFO: configfs not loaded
> Dec 30 11:57:02 h01 Filesystem(prm_CFS_VMs_fs)[16589]: WARNING: Couldn't find device [/dev/CFS_VMs/xen]. Expected /dev/??? to exist
> Dec 30 11:57:02 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_O2CB_monitor_0 (call=26, rc=7, cib-update=9, confirmed=true) not running
> Dec 30 11:57:02 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_cLVMd_monitor_0 (call=16, rc=7, cib-update=10, confirmed=true) not running
> Dec 30 11:57:02 h01 LVM(prm_LVM_CFS_VMs)[16587]: WARNING: LVM Volume CFS_VMs is not available (stopped)
> Dec 30 11:57:02 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_DLM_monitor_0 (call=11, rc=7, cib-update=11, confirmed=true) not running
> [... (more probes)]
> Dec 30 11:57:03 h01 ocfs2_controld[17046]: new listening connection 4
> Dec 30 11:57:03 h01 cluster-dlm[16969]: process_listener: client connection 7 fd 21
> Dec 30 11:57:04 h01 clvmd[17023]: Created DLM lockspace for CLVMD.
> Dec 30 11:57:04 h01 clvmd[17023]: DLM initialisation complete
> Dec 30 11:57:04 h01 clvmd[17023]: Our local node id is 17831084
> Dec 30 11:57:04 h01 clvmd[17023]: Connected to Corosync
> Dec 30 11:57:04 h01 clvmd[17023]: Cluster LVM daemon started - connected to Corosync
> Dec 30 11:57:04 h01 clvmd[17023]: Cluster ready, doing some more initialisation
> Dec 30 11:57:04 h01 clvmd[17023]: starting LVM thread
> Dec 30 11:57:04 h01 clvmd[17023]: LVM thread function started
> Dec 30 11:57:05 h01 lvm[17023]: Sub thread ready for work.
> Dec 30 11:57:05 h01 lvm[17023]: LVM thread waiting for work
> Dec 30 11:57:05 h01 lvm[17023]: clvmd ready for work
> Dec 30 11:57:05 h01 lvm[17023]: Using timeout of 60 seconds
> Dec 30 11:57:05 h01 lvm[17023]: confchg callback. 1 joined, 0 left, 2 members
> Dec 30 11:57:05 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_O2CB_start_0 (call=63, rc=0, cib-update=23, confirmed=true) ok
> Dec 30 11:57:05 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_O2CB_monitor_60000 (call=64, rc=0, cib-update=24, confirmed=false) ok
> Dec 30 11:57:07 h01 cmirrord[17078]: Starting cmirrord:
> Dec 30 11:57:07 h01 cmirrord[17078]: Built: May 29 2013 15:04:35
> Dec 30 11:57:09 h01 lrmd[16182]: notice: operation_finished: prm_cLVMd_start_0:16979:stderr [ local socket: connect failed: Connection refused ]
> Dec 30 11:57:09 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_cLVMd_start_0 (call=62, rc=0, cib-update=25, confirmed=true) ok
> Dec 30 11:57:09 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_cLVMd_monitor_300000 (call=65, rc=0, cib-update=26, confirmed=false) ok
> Dec 30 11:57:09 h01 lvm[17023]: Got new connection on fd 5
> [...]
> Dec 30 11:57:13 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_CFS_VMs_fs_start_0 (call=68, rc=0, cib-update=29, confirmed=true) ok
> Dec 30 11:57:13 h01 crmd[16185]: notice: process_lrm_event: LRM operation prm_CFS_VMs_fs_monitor_300000 (call=69, rc=0, cib-update=30, confirmed=false) ok
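Once the node is back like this, membership and both redundant rings can be sanity-checked with something along these lines (just a sketch, default options assumed):

  corosync-cfgtool -s   # ring 0 / ring 1 status on the local node
  crm_mon -1            # one-shot view of membership and resource state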
>
> During the communication problems, the other node (the DC) saw these messages:
> Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000000002 (2)
> Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1900 to 4 children
> Dec 30 11:39:06 h05 crmd[20787]: notice: peer_update_callback: do_shutdown of h01 (op 80) is complete
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] New Configuration:
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] Members Left:
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] Members Joined:
> Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1904: memb=1, new=0, lost=1
> Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: memb: h05 84939948
> Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: lost: h01 17831084
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] New Configuration:
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] Members Left:
> Dec 30 11:39:06 h05 ocfs2_controld[15794]: notice: plugin_handle_membership: Membership 1904: quorum lost
> Dec 30 11:39:06 h05 corosync[20745]: [CLM ] Members Joined:
> Dec 30 11:39:06 h05 ocfs2_controld[15794]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
> Dec 30 11:39:06 h05 cluster-dlm[15666]: notice: plugin_handle_membership: Membership 1904: quorum lost
> Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1904: memb=1, new=0, lost=0
> Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h05 84939948
> Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: ais_mark_unseen_peer_dead: Node h01 was not seen in the previous transition
> Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: update_member: Node 17831084/h01 is now: lost
> Dec 30 11:39:06 h05 cluster-dlm[15666]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
> Dec 30 11:39:06 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1904 to 4 children
> Dec 30 11:39:06 h05 corosync[20745]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Dec 30 11:39:06 h05 cluster-dlm[15666]: update_cluster: Processing membership 1904
> Dec 30 11:39:06 h05 cluster-dlm[15666]: dlm_process_node: Skipped active node 84939948: born-on=1876, last-seen=1904, this-event=1904, last-event=1900
> Dec 30 11:39:06 h05 cluster-dlm[15666]: del_configfs_node: del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/17831084"
> Dec 30 11:39:06 h05 cluster-dlm[15666]: dlm_process_node: Removed inactive node 17831084: born-on=1900, last-seen=1900, this-event=1904, last-event=1900
> Dec 30 11:39:06 h05 corosync[20745]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.5) r(1) ip(10.2.2.5) ; members(old:2 left:1)
> Dec 30 11:39:06 h05 corosync[20745]: [MAIN ] Completed service synchronization, ready to provide service.
> Dec 30 11:39:06 h05 kernel: [2678855.695469] dlm: closing connection to node 17831084
> Dec 30 11:39:06 h05 cib[20782]: notice: plugin_handle_membership: Membership 1904: quorum lost
> Dec 30 11:39:06 h05 cib[20782]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
> Dec 30 11:39:06 h05 crmd[20787]: notice: plugin_handle_membership: Membership 1904: quorum lost
> Dec 30 11:39:06 h05 crmd[20787]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
> Dec 30 11:39:06 h05 crmd[20787]: notice: peer_update_callback: do_shutdown of h01 (op 80) is complete
> Dec 30 11:39:06 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to false (call=1874)
> Dec 30 11:39:06 h05 crmd[20787]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
> Dec 30 11:39:06 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to false (call=1879)
> Dec 30 11:39:06 h05 attrd[20785]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Dec 30 11:39:06 h05 attrd[20785]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Dec 30 11:39:06 h05 pengine[20786]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 30 11:39:06 h05 pengine[20786]: notice: find_anonymous_clone: /Anonymous/ clone cln_CFS_VMs_fs is already running on h05
> Dec 30 11:39:06 h05 pengine[20786]: notice: process_pe_message: Calculated Transition 318: /var/lib/pacemaker/pengine/pe-input-32.bz2
> Dec 30 11:39:06 h05 crmd[20787]: notice: do_te_invoke: Processing graph 318 (ref=pe_calc-dc-1388399946-774) derived from /var/lib/pacemaker/pengine/pe-input-32.bz2
> Dec 30 11:39:06 h05 crmd[20787]: notice: run_graph: Transition 318 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-32.bz2): Complete
> Dec 30 11:39:06 h05 crmd[20787]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] New Configuration:
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] Members Left:
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] Members Joined:
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1908: memb=1, new=0, lost=0
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: memb: h05 84939948
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] New Configuration:
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] Members Left:
> Dec 30 11:39:17 h05 cluster-dlm[15666]: notice: plugin_handle_membership: Membership 1908: quorum acquired
> Dec 30 11:39:17 h05 ocfs2_controld[15794]: notice: plugin_handle_membership: Membership 1908: quorum acquired
> Dec 30 11:39:17 h05 ocfs2_controld[15794]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
> Dec 30 11:39:17 h05 cluster-dlm[15666]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] Members Joined:
> Dec 30 11:39:17 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1908: memb=2, new=1, lost=0
> Dec 30 11:39:17 h05 cluster-dlm[15666]: update_cluster: Processing membership 1908
> Dec 30 11:39:17 h05 crmd[20787]: notice: plugin_handle_membership: Membership 1908: quorum acquired
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: update_member: Node 17831084/h01 is now: member
> Dec 30 11:39:17 h05 cib[20782]: notice: plugin_handle_membership: Membership 1908: quorum acquired
> Dec 30 11:39:17 h05 cib[20782]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
> Dec 30 11:39:17 h05 crmd[20787]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
> Dec 30 11:39:17 h05 cluster-dlm[15666]: dlm_process_node: Skipped active node 84939948: born-on=1876, last-seen=1908, this-event=1908, last-event=1904
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: NEW: h01 17831084
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h05 84939948
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:17 h05 cluster-dlm[15666]: dlm_process_node: Adding address ip(172.20.16.1) to configfs for node 17831084
> Dec 30 11:39:17 h05 corosync[20745]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Dec 30 11:39:17 h05 cluster-dlm[15666]: add_configfs_node: set_configfs_node 17831084 172.20.16.1 local 0
> Dec 30 11:39:17 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to true (call=1884)
> Dec 30 11:39:17 h05 cluster-dlm[15666]: dlm_process_node: Adding address ip(10.2.2.1) to configfs for node 17831084
> Dec 30 11:39:17 h05 cluster-dlm[15666]: add_configfs_node: set_configfs_node 17831084 10.2.2.1 local 0
> Dec 30 11:39:17 h05 cluster-dlm[15666]: create_path: /sys/kernel/config/dlm/cluster/comms/17831084: mkdir failed: 17
> Dec 30 11:39:17 h05 cluster-dlm[15666]: dlm_process_node: Added active node 17831084: born-on=1900, last-seen=1908, this-event=1908, last-event=1904
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: update_member: 0x663780 Node 17831084 (h01) born on: 1908
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:17 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:17 h05 corosync[20745]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:1 left:0)
> Dec 30 11:39:17 h05 corosync[20745]: [MAIN ] Completed service synchronization, ready to provide service.
> Dec 30 11:39:17 h05 crmd[20787]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
> Dec 30 11:39:20 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to true (call=1902)
> Dec 30 11:39:20 h05 attrd[20785]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Dec 30 11:39:20 h05 attrd[20785]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Dec 30 11:39:20 h05 pengine[20786]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 30 11:39:20 h05 pengine[20786]: notice: find_anonymous_clone: /Anonymous/ clone cln_CFS_VMs_fs is already running on h05
> Dec 30 11:39:20 h05 pengine[20786]: notice: LogActions: Start prm_DLM:1 (h01)
> Dec 30 11:39:20 h05 pengine[20786]: notice: LogActions: Start prm_cLVMd:1 (h01)
> Dec 30 11:39:20 h05 pengine[20786]: notice: LogActions: Start prm_LVM_CFS_VMs:1 (h01)
> Dec 30 11:39:20 h05 pengine[20786]: notice: LogActions: Start prm_O2CB:1 (h01)
> Dec 30 11:39:20 h05 pengine[20786]: notice: LogActions: Start prm_CFS_VMs_fs:1 (h01)
> Dec 30 11:39:20 h05 pengine[20786]: notice: process_pe_message: Calculated Transition 319: /var/lib/pacemaker/pengine/pe-input-33.bz2
> Dec 30 11:39:20 h05 crmd[20787]: notice: do_te_invoke: Processing graph 319 (ref=pe_calc-dc-1388399960-784) derived from /var/lib/pacemaker/pengine/pe-input-33.bz2
> Dec 30 11:39:20 h05 crmd[20787]: notice: te_rsc_command: Initiating action 19: monitor prm_stonith_sbd_monitor_0 on h01
> [...]
> Dec 30 11:39:22 h05 ocfs2_controld[15794]: confchg called
> Dec 30 11:39:22 h05 ocfs2_controld[15794]: ocfs2_controld (group "ocfs2:controld") confchg: members 2, left 0, joined 1
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151112 (1380626)
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:22 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:22 h05 crmd[20787]: warning: match_down_event: No match for shutdown action on h01
> Dec 30 11:39:22 h05 crmd[20787]: notice: peer_update_callback: Stonith/shutdown of h01 not matched
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 46 (46) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 45 (45) was pending on h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 54 (54) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 53 (53) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 62 (62) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 61 (61) was pending on h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 70 (70) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 69 (69) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: warning: fail_incompletable_actions: Node h01 shutdown resulted in un-runnable actions
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 46 (46) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 45 (45) was pending on h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 54 (54) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 53 (53) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 62 (62) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 61 (61) was pending on h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 70 (70) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: notice: fail_incompletable_actions: Action 69 (69) is scheduled for h01 (offline)
> Dec 30 11:39:22 h05 crmd[20787]: warning: fail_incompletable_actions: Node h01 shutdown resulted in un-runnable actions
> Dec 30 11:39:22 h05 crmd[20787]: notice: run_graph: Transition 319 (Complete=25, Pending=0, Fired=0, Skipped=10, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-33.bz2): Stopped
> Dec 30 11:39:24 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:24 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:24 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:24 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
> Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:28 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
> Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:33 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151212 (1380882)
> Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000051312 (332562)
> Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:39:38 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> (and so on...)
>
> (some time later)
> Dec 30 11:42:22 h05 crmd[20787]: error: crm_timer_popped: Integration Timer (I_INTEGRATED) just popped in state S_INTEGRATION! (180000ms)
> Dec 30 11:42:22 h05 crmd[20787]: warning: do_state_transition: Progressed to state S_FINALIZE_JOIN after C_TIMER_POPPED
> Dec 30 11:42:22 h05 crmd[20787]: warning: do_state_transition: 1 cluster nodes failed to respond to the join offer.
> Dec 30 11:42:22 h05 crmd[20787]: notice: crmd_join_phase_log: join-1: h01=welcomed
> Dec 30 11:42:22 h05 crmd[20787]: notice: crmd_join_phase_log: join-1: h05=confirmed
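While the DC sits in S_INTEGRATION waiting for h01's join reply, the join/DC view can be queried directly, e.g. (sketch only, host name as in the logs):

  crmadmin -D        # which node the cluster currently considers DC
  crmadmin -S h01    # ask the crmd on h01 for its state; times out if crmd is wedged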
> Dec 30 11:50:01 h05 /usr/sbin/cron[13942]: (root) CMD ([ -x /usr/lib64/sa/sa1 ] && exec /usr/lib64/sa/sa1 -S ALL 1 1)
> Dec 30 11:51:15 h05 cluster-dlm[15666]: log_config: dlm:controld conf 1 0 1 memb 84939948 join left 17831084
> Dec 30 11:51:15 h05 ocfs2_controld[15794]: confchg called
> Dec 30 11:51:15 h05 ocfs2_controld[15794]: ocfs2_controld (group "ocfs2:controld") confchg: members 1, left 1, joined 0
> Dec 30 11:51:15 h05 ocfs2_controld[15794]: node daemon left 17831084
> Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000000002 (2)
> Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1908 to 4 children
> Dec 30 11:51:16 h05 crmd[20787]: warning: match_down_event: No match for shutdown action on h01
> Dec 30 11:51:16 h05 crmd[20787]: notice: peer_update_callback: Stonith/shutdown of h01 not matched
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] New Configuration:
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] Members Left:
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] Members Joined:
> Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1912: memb=1, new=0, lost=1
> Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: memb: h05 84939948
> Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: lost: h01 17831084
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] New Configuration:
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] Members Left:
> Dec 30 11:51:16 h05 ocfs2_controld[15794]: notice: plugin_handle_membership: Membership 1912: quorum lost
> Dec 30 11:51:16 h05 ocfs2_controld[15794]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
> Dec 30 11:51:16 h05 corosync[20745]: [CLM ] Members Joined:
> Dec 30 11:51:16 h05 cluster-dlm[15666]: notice: plugin_handle_membership: Membership 1912: quorum lost
> Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1912: memb=1, new=0, lost=0
> Dec 30 11:51:16 h05 cluster-dlm[15666]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
> Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h05 84939948
> Dec 30 11:51:16 h05 cluster-dlm[15666]: update_cluster: Processing membership 1912
> Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: ais_mark_unseen_peer_dead: Node h01 was not seen in the previous transition
> Dec 30 11:51:16 h05 cluster-dlm[15666]: dlm_process_node: Skipped active node 84939948: born-on=1876, last-seen=1912, this-event=1912, last-event=1908
> Dec 30 11:51:16 h05 cluster-dlm[15666]: del_configfs_node: del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/17831084"
> Dec 30 11:51:16 h05 cluster-dlm[15666]: dlm_process_node: Removed inactive node 17831084: born-on=1908, last-seen=1908, this-event=1912, last-event=1908
> Dec 30 11:51:16 h05 crmd[20787]: notice: plugin_handle_membership: Membership 1912: quorum lost
> Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: update_member: Node 17831084/h01 is now: lost
> Dec 30 11:51:16 h05 crmd[20787]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
> Dec 30 11:51:16 h05 crmd[20787]: warning: match_down_event: No match for shutdown action on h01
> Dec 30 11:51:16 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1912 to 4 children
> Dec 30 11:51:16 h05 crmd[20787]: notice: peer_update_callback: Stonith/shutdown of h01 not matched
> Dec 30 11:51:16 h05 cib[20782]: notice: plugin_handle_membership: Membership 1912: quorum lost
> Dec 30 11:51:16 h05 corosync[20745]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Dec 30 11:51:16 h05 cib[20782]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now lost (was member)
> Dec 30 11:51:16 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to false (call=2141)
> Dec 30 11:51:16 h05 corosync[20745]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.5) r(1) ip(10.2.2.5) ; members(old:2 left:1)
> Dec 30 11:51:16 h05 corosync[20745]: [MAIN ] Completed service synchronization, ready to provide service.
> Dec 30 11:51:16 h05 kernel: [2679586.289486] dlm: closing connection to node 17831084
> Dec 30 11:51:16 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to false (call=2146)
> Dec 30 11:51:16 h05 attrd[20785]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Dec 30 11:51:16 h05 attrd[20785]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Dec 30 11:51:16 h05 pengine[20786]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 30 11:51:16 h05 pengine[20786]: warning: pe_fence_node: Node h01 will be fenced because the node is no longer part of the cluster
> Dec 30 11:51:16 h05 pengine[20786]: warning: determine_online_status: Node h01 is unclean
> Dec 30 11:51:16 h05 pengine[20786]: notice: find_anonymous_clone: /Anonymous/ clone cln_CFS_VMs_fs is already running on h05
> Dec 30 11:51:16 h05 pengine[20786]: warning: custom_action: Action prm_DLM:0_stop_0 on h01 is unrunnable (offline)
> Dec 30 11:51:16 h05 pengine[20786]: warning: custom_action: Action prm_DLM:0_stop_0 on h01 is unrunnable (offline)
> Dec 30 11:51:16 h05 pengine[20786]: warning: custom_action: Action prm_cLVMd:0_stop_0 on h01 is unrunnable (offline)
> Dec 30 11:51:16 h05 pengine[20786]: warning: custom_action: Action prm_cLVMd:0_stop_0 on h01 is unrunnable (offline)
> [...]
> Dec 30 11:54:17 h05 sbd: [14621]: info: reset successfully delivered to h01
> Dec 30 11:54:17 h05 sbd: [14619]: info: Message successfully delivered.
> Dec 30 11:54:18 h05 stonith-ng[20783]: notice: log_operation: Operation 'reboot' [14612] (call 2 from crmd.20787) for host 'h01' with device 'prm_stonith_sbd' returned: 0 (OK)
> Dec 30 11:54:18 h05 stonith-ng[20783]: notice: remote_op_done: Operation reboot of h01 by h05 for crmd.20787@h05.515279c7: OK
> Dec 30 11:54:18 h05 crmd[20787]: notice: tengine_stonith_callback: Stonith operation 2/71:320:0:d2d0a70f-b17d-4e6f-b9bb-7f71b7501e0d: OK (0)
> Dec 30 11:54:18 h05 crmd[20787]: notice: tengine_stonith_notify: Peer h01 was terminated (reboot) by h05 for h05: OK (ref=515279c7-51cc-47e3-930a-46d8fd5786ca) by client crmd.20787
> Dec 30 11:54:18 h05 crmd[20787]: notice: run_graph: Transition 320 (Complete=18, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-0.bz2): Complete
> Dec 30 11:54:18 h05 crmd[20787]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
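The sbd reset shown above can also be checked by hand; a rough sketch (the device path is a placeholder for the real SBD disk):

  sbd -d /dev/disk/by-id/<sbd-device> list              # show message slots per node
  sbd -d /dev/disk/by-id/<sbd-device> message h01 test  # non-destructive delivery test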
> Dec 30 11:54:18 h05 crmd[20787]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Dec 30 11:54:18 h05 pengine[20786]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 30 11:54:18 h05 pengine[20786]: notice: find_anonymous_clone: /Anonymous/ clone cln_CFS_VMs_fs is already running on h05
> Dec 30 11:54:18 h05 pengine[20786]: notice: process_pe_message: Calculated Transition 321: /var/lib/pacemaker/pengine/pe-input-34.bz2
> Dec 30 11:54:18 h05 crmd[20787]: notice: do_te_invoke: Processing graph 321 (ref=pe_calc-dc-1388400858-859) derived from /var/lib/pacemaker/pengine/pe-input-34.bz2
> Dec 30 11:54:18 h05 crmd[20787]: notice: run_graph: Transition 321 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-34.bz2): Complete
> Dec 30 11:54:18 h05 crmd[20787]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Dec 30 11:55:01 h05 /usr/sbin/cron[16614]: (root) CMD ([ -x /usr/lib64/sa/sa2 ] && exec /usr/lib64/sa/sa2 -A)
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] New Configuration:
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] Members Left:
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] Members Joined:
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 1916: memb=1, new=0, lost=0
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: memb: h05 84939948
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] New Configuration:
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.5) r(1) ip(10.2.2.5)
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] Members Left:
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] Members Joined:
> Dec 30 11:57:00 h05 cib[20782]: notice: plugin_handle_membership: Membership 1916: quorum acquired
> Dec 30 11:57:00 h05 cib[20782]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
> Dec 30 11:57:00 h05 ocfs2_controld[15794]: notice: plugin_handle_membership: Membership 1916: quorum acquired
> Dec 30 11:57:00 h05 corosync[20745]: [CLM ] r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)
> Dec 30 11:57:00 h05 crmd[20787]: notice: plugin_handle_membership: Membership 1916: quorum acquired
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 1916: memb=2, new=1, lost=0
> Dec 30 11:57:00 h05 cluster-dlm[15666]: notice: plugin_handle_membership: Membership 1916: quorum acquired
> Dec 30 11:57:00 h05 ocfs2_controld[15794]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
> Dec 30 11:57:00 h05 crmd[20787]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: update_member: Node 17831084/h01 is now: member
> Dec 30 11:57:00 h05 cluster-dlm[15666]: notice: crm_update_peer_state: plugin_handle_membership: Node h01[17831084] - state is now member (was lost)
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: NEW: h01 17831084
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h01 17831084
> Dec 30 11:57:00 h05 cluster-dlm[15666]: update_cluster: Processing membership 1916
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: pcmk_peer_update: MEMB: h05 84939948
> Dec 30 11:57:00 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to true (call=2160)
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1916 to 4 children
> Dec 30 11:57:00 h05 cluster-dlm[15666]: dlm_process_node: Skipped active node 84939948: born-on=1876, last-seen=1916, this-event=1916, last-event=1912
> Dec 30 11:57:00 h05 corosync[20745]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: update_member: 0x663780 Node 17831084 (h01) born on: 1916
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: update_member: Node h01 now has process list: 00000000000000000000000000151312 (1381138)
> Dec 30 11:57:00 h05 cluster-dlm[15666]: dlm_process_node: Adding address ip(172.20.16.1) to configfs for node 17831084
> Dec 30 11:57:00 h05 corosync[20745]: [pcmk ] info: send_member_notification: Sending membership update 1916 to 4 children
> Dec 30 11:57:00 h05 cluster-dlm[15666]: add_configfs_node: set_configfs_node 17831084 172.20.16.1 local 0
> Dec 30 11:57:00 h05 cluster-dlm[15666]: dlm_process_node: Adding address ip(10.2.2.1) to configfs for node 17831084
> Dec 30 11:57:00 h05 cluster-dlm[15666]: add_configfs_node: set_configfs_node 17831084 10.2.2.1 local 0
> Dec 30 11:57:00 h05 cluster-dlm[15666]: create_path: /sys/kernel/config/dlm/cluster/comms/17831084: mkdir failed: 17
> Dec 30 11:57:00 h05 cluster-dlm[15666]: dlm_process_node: Added active node 17831084: born-on=1908, last-seen=1916, this-event=1916, last-event=1912
> Dec 30 11:57:00 h05 corosync[20745]: [CPG ] chosen downlist: sender r(0) ip(172.20.16.1) r(1) ip(10.2.2.1) ; members(old:1 left:0)
> Dec 30 11:57:00 h05 corosync[20745]: [MAIN ] Completed service synchronization, ready to provide service.
> Dec 30 11:57:00 h05 crmd[20787]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
> Dec 30 11:57:00 h05 crmd[20787]: notice: do_dc_join_finalize: join-1: Syncing the CIB from h01 to the rest of the cluster
> Dec 30 11:57:00 h05 crmd[20787]: notice: do_dc_join_finalize: Requested version <generation_tuple epoch="856" num_updates="102" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" have-quorum="1" cib-last-written="Mon Dec 30 08:18:06 2013" update-origin="h01" update-client="cibadmin" update-user="root" dc-uuid="h05"/>
> Dec 30 11:57:00 h05 crmd[20787]: notice: crm_update_quorum: Updating quorum status to true (call=2174)
> Dec 30 11:57:00 h05 attrd[20785]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Dec 30 11:57:00 h05 attrd[20785]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> [...]
> Dec 30 11:57:13 h05 crmd[20787]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> Oh, well. The software on h01 is the latest for SLES11 SP3:
> python-lxml-2.3.6-0.11.1 Mon Dec 30 11:39:23 2013
> pacemaker-mgmt-client-2.1.2-0.11.4 Mon Dec 30 11:39:23 2013
> crmsh-1.2.6-0.25.4 Mon Dec 30 11:39:23 2013
> resource-agents-3.9.5-0.32.22 Mon Dec 30 11:39:22 2013
> pacemaker-mgmt-2.1.2-0.11.4 Mon Dec 30 11:39:22 2013
> pacemaker-1.1.10-0.9.28 Mon Dec 30 11:39:22 2013
> cluster-glue-1.0.11-0.19.4 Mon Dec 30 11:39:20 2013
> sbd-1.2.1-0.7.22 Mon Dec 30 11:39:19 2013
> libpacemaker3-1.1.10-0.9.28 Mon Dec 30 11:39:19 2013
> openais-1.1.4-5.17.5 Mon Dec 30 11:34:01 2013
> libqb0-0.16.0-0.7.4 Mon Dec 30 11:34:00 2013
> libopenais3-1.1.4-5.17.5 Mon Dec 30 11:34:00 2013
> libglue2-1.0.11-0.19.4 Mon Dec 30 11:34:00 2013
> ldirectord-3.9.5-0.32.22 Mon Dec 30 11:34:00 2013
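Such a listing can be reproduced on the node from rpm's install-time output, roughly:

  rpm -qa --last | head -n 20   # most recently (re)installed packages with timestamps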
>
> Regards,
> Ulrich
>

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems