Mailing List Archive

Regarding the cause of a DRBD split-brain
Hello, my name is Akiyama.

We are running the following environment.

Environment:
Red Hat Enterprise Linux Server 6.1 (for x86)
kernel-2.6.32-431.23.3.el6.i686
corosync-1.4.1-17.el6_5.1.i686
pacemaker-1.1.10-14.el6_5.3.i686
drbd-8.3.16-1.el6.i386

Question:
A DRBD split-brain suddenly occurred during operation. I recovered from it manually,
but the cause is unknown and it is troubling us.
If you know the cause and a way to avoid it, I would appreciate your advice.

What we have found so far:
- Ping responses on the DRBD network show no problems
- No link-down output in the messages log
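
For reference, a manual recovery from a DRBD 8.3 split-brain, such as the one mentioned above, typically follows the sketch below (the resource name r0 is assumed from the res_drbd_r0 Pacemaker resource in the logs; which node's changes to discard depends on the situation):

# on both nodes: confirm the connection state (typically StandAlone / WFConnection after a split-brain)
cat /proc/drbd
drbdadm cstate r0

# on the node whose changes are to be discarded (the split-brain "victim")
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# on the surviving node, if it has fallen back to StandAlone
drbdadm connect r0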

[messages log on the primary]
Feb 16 18:00:05 server1 kernel: block drbd0: sock_recvmsg returned -110
Feb 16 18:00:05 server1 kernel: block drbd0: peer( Secondary -> Unknown )
conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Feb 16 18:00:05 server1 kernel: block drbd0: short read expecting header on
sock: r=-110
Feb 16 18:00:05 server1 kernel: block drbd0: new current UUID
626E92ABB7AC7487:1CC1ED4094EA9EB7:9A0088783FE79A75:99FF88783FE79A75
Feb 16 18:00:05 server1 kernel: block drbd0: asender terminated
Feb 16 18:00:05 server1 kernel: block drbd0: Terminating drbd0_asender
Feb 16 18:00:05 server1 kernel: block drbd0: Connection closed
Feb 16 18:00:05 server1 kernel: block drbd0: conn( BrokenPipe ->
Unconnected )
Feb 16 18:00:05 server1 kernel: block drbd0: receiver terminated
Feb 16 18:00:05 server1 kernel: block drbd0: Restarting drbd0_receiver
Feb 16 18:00:05 server1 kernel: block drbd0: receiver (re)started
Feb 16 18:00:05 server1 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Feb 16 18:00:06 server1 xinetd[1940]: START: shell pid=1056
from=::ffff:192.171.1.102
Feb 16 18:00:06 server1 rshd[1057]: root@mdn-cluster2 as root: cmd='echo
$(date +%Y/%m/%d\ %H:%M:%S)'
Feb 16 18:00:06 server1 xinetd[1940]: EXIT: shell status=0 pid=1056
duration=0(sec)
Feb 16 18:00:28 server1 kernel: block drbd0: Handshake successful: Agreed
network protocol version 97
Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Feb 16 18:00:28 server1 kernel: block drbd0: Starting asender thread (from
drbd0_receiver [3117])
Feb 16 18:00:28 server1 kernel: block drbd0: data-integrity-alg: <not-used>
Feb 16 18:00:28 server1 kernel: block drbd0: meta connection shut down by
peer.
Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFReportParams ->
NetworkFailure )
Feb 16 18:00:28 server1 kernel: block drbd0: asender terminated
Feb 16 18:00:28 server1 kernel: block drbd0: Terminating drbd0_asender
Feb 16 18:00:30 server1 kernel: block drbd0: Connection closed
Feb 16 18:00:30 server1 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Feb 16 18:00:30 server1 kernel: block drbd0: receiver terminated
Feb 16 18:00:30 server1 kernel: block drbd0: Restarting drbd0_receiver
Feb 16 18:00:30 server1 kernel: block drbd0: receiver (re)started
Feb 16 18:00:30 server1 kernel: block drbd0: conn( Unconnected ->
WFConnection )

[corosync log on the primary]
Feb 16 18:00:27 server1 corosync[1756]: [MAIN ] main.c:1341 Corosync
main process was not scheduled for 3144.5435 ms (threshold is 800.0000 ms).
Consider token timeout increase.
Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Transitional membership event on ring 16: memb=1, new=0,
lost=1
Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:773 info:
pcmk_peer_update: memb: server1 1694608064
Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:779 info:
pcmk_peer_update: lost: server2 1711385280
Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1, lost=0
Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:791 info:
pcmk_peer_update: NEW: server2 1711385280
Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:808 info:
pcmk_peer_update: MEMB: server1 1694608064
Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:808 info:
pcmk_peer_update: MEMB: server2 1711385280
Feb 16 18:00:27 server1 corosync[1756]: [TOTEM ] totemsrp.c:1892 A
processor joined or left the membership and a new membership was formed.
Feb 16 18:00:27 server1 corosync[1756]: [CPG ] cpg.c:783 chosen
downlist: sender r(0) ip(192.170.1.101) ; members(old:2 left:1)
Feb 16 18:00:28 server1 corosync[1756]: [MAIN ] main.c:1341 Corosync
main process was not scheduled for 826.4143 ms (threshold is 800.0000 ms).
Consider token timeout increase.
Feb 16 18:00:28 server1 corosync[1756]: [MAIN ] main.c:278 Completed
service synchronization, ready to provide service.
Feb 16 18:00:28 server1 crmd[1766]: warning: crmd_ha_msg_filter: Another
DC detected: server2 (op=noop)
Feb 16 18:00:28 server1 crmd[1766]: notice: do_state_transition: State
transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
origin=crmd_ha_msg_filter ]
Feb 16 18:00:30 server1 cib[1761]: warning: cib_server_process_diff: Not
requesting full refresh in R/W mode
Feb 16 18:00:30 server1 cib[1761]: warning: cib_server_process_diff: Not
requesting full refresh in R/W mode
Feb 16 18:00:30 server1 cib[1761]: warning: cib_server_process_diff: Not
requesting full refresh in R/W mode
Feb 16 18:00:30 server1 corosync[1756]: [MAIN ] main.c:1341 Corosync
main process was not scheduled for 2248.0801 ms (threshold is 800.0000 ms).
Consider token timeout increase.
Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Transitional membership event on ring 24: memb=1, new=0,
lost=1
Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:773 info:
pcmk_peer_update: memb: server1 1694608064
Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:779 info:
pcmk_peer_update: lost: server2 1711385280
Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, lost=0
Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:791 info:
pcmk_peer_update: NEW: server2 1711385280
Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:808 info:
pcmk_peer_update: MEMB: server1 1694608064
Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:808 info:
pcmk_peer_update: MEMB: server2 1711385280
Feb 16 18:00:31 server1 corosync[1756]: [TOTEM ] totemsrp.c:1892 A
processor joined or left the membership and a new membership was formed.
Feb 16 18:00:32 server1 corosync[1756]: [CPG ] cpg.c:783 chosen
downlist: sender r(0) ip(192.170.1.101) ; members(old:2 left:1)
Feb 16 18:00:32 server1 corosync[1756]: [MAIN ] main.c:278 Completed
service synchronization, ready to provide service.
Feb 16 18:00:32 server1 cib[1761]: warning: cib_server_process_diff: Not
requesting full refresh in R/W mode
Feb 16 18:00:32 server1 cib[1761]: warning: cib_server_process_diff: Not
requesting full refresh in R/W mode
Feb 16 18:00:32 server1 cib[1761]: warning: cib_server_process_diff: Not
requesting full refresh in R/W mode
Feb 16 18:02:28 server1 crmd[1766]: error: crm_timer_popped: Election
Timeout (I_ELECTION_DC) just popped in state S_ELECTION! (120000ms)
Feb 16 18:02:28 server1 crmd[1766]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Feb 16 18:02:28 server1 crmd[1766]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Feb 16 18:02:28 server1 crmd[1766]: notice: do_dc_join_finalize: join-4:
Syncing the CIB from server2 to the rest of the cluster
Feb 16 18:02:28 server1 crmd[1766]: notice: do_dc_join_finalize:
Requested version <generation_tuple epoch="7" num_updates="87"
admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Wed Jan 21
00:17:56 2015" update-origin="server1" update-client="crmd"
crm_feature_set="3.0.7" have-quorum="1" dc-uuid="server2"/>
Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: Diff: --- 0.7.69
Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: Diff: +++ 0.7.87
1afd9af32d3a574358543e1d4751aaf1
Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: -- <node_state
crm-debug-origin="do_state_transition" join="member" id="server1"/>
Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: --
<lrm_rsc_op operation_key="res_drbd_r0_start_0" operation="start"
transition-key="46:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
transition-magic="0:0;46:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
call-id="44" last-run="1421767134" last-rc-change="1421767134"
exec-time="430" id="res_drbd_r0_last_0"/>
Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: --
<lrm_rsc_op id="res_drbd_r0_monitor_30000"
operation_key="res_drbd_r0_monitor_30000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.7"
transition-key="47:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
transition-magic="0:0;47:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
call-id="50" rc-code="0" op-status="0" interval="30000"
last-rc-change="1421767135" exec-time="60" queue-time="0"
op-digest="6f5adcd7f1211cdfc17850827b8582c5"/>
Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: -- <nvpair
value="1000" id="status-server2-master-res_drbd_r0"/>
Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: ++ <node_state
id="server1" uname="server1" in_ccm="true" crmd="online"
crm-debug-origin="post_cache_update" join="down" expected="member"/>
Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: ++ <nvpair
id="status-server2-master-res_drbd_r0" name="master-res_drbd_r0"
value="10000"/>
Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: ++
<lrm_rsc_op id="res_drbd_r0_last_0" operation_key="res_drbd_r0_promote_0"
operation="promote" crm-debug-origin="do_update_resource"
crm_feature_set="3.0.7"
transition-key="14:1:0:2172d7d8-fe4a-427d-9311-27a13078a0fb"
transition-magic="0:0;14:1:0:2172d7d8-fe4a-427d-9311-27a13078a0fb"
call-id="62" rc-code="0" op-status="0" interval="0" last-run="1424077226"
last-rc-change="1424077226" exec-time="120" queue-time="0"
op-digest="c0e018b73fdf522b6cdd355e125af15e"/>
Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: ++
<lrm_rsc_op id="res_drbd_r0_monitor_10000"
operation_key="res_drbd_r0_monitor_10000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.7"
transition-key="15:1:8:2172d7d8-fe4a-427d-9311-27a13078a0fb"
transition-magic="0:8;15:1:8:2172d7d8-fe4a-427d-9311-27a13078a0fb"
call-id="68" rc-code="8" op-status="0" interval="10000"
last-rc-change="1424077226" exec-time="53" queue-time="0"
op-digest="6f5adcd7f1211cdfc17850827b8582c5"/>
Feb 16 18:02:28 server1 stonith-ng[1762]: warning: cib_process_diff: Diff
0.7.69 -> 0.7.87 from local not applied to 0.7.69: Failed application of an
update diff
Feb 16 18:02:28 server1 stonith-ng[1762]: notice: update_cib_cache_cb:
[cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Feb 16 18:02:28 server1 attrd[1764]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Feb 16 18:02:28 server1 attrd[1764]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-res_drbd_r0 (10000)
Feb 16 18:02:28 server1 attrd[1764]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Feb 16 18:02:28 server1 pengine[1765]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Feb 16 18:02:28 server1 pengine[1765]: notice: LogActions: Demote
res_drbd_r0:1#011(Master -> Slave server2)
Feb 16 18:02:28 server1 pengine[1765]: notice: process_pe_message:
Calculated Transition 2582: /var/lib/pacemaker/pengine/pe-input-24.bz2
Feb 16 18:02:28 server1 crmd[1766]: notice: te_rsc_command: Initiating
action 8: cancel res_drbd_r0_cancel_10000 on server2
Feb 16 18:02:28 server1 crmd[1766]: notice: te_rsc_command: Initiating
action 81: notify res_drbd_r0_pre_notify_demote_0 on server1 (local)
Feb 16 18:02:28 server1 crmd[1766]: notice: te_rsc_command: Initiating
action 83: notify res_drbd_r0_pre_notify_demote_0 on server2
Feb 16 18:02:28 server1 crmd[1766]: notice: process_lrm_event: LRM
operation res_drbd_r0_notify_0 (call=113, rc=0, cib-update=0,
confirmed=true) ok
Feb 16 18:02:28 server1 crmd[1766]: notice: run_graph: Transition 2582
(Complete=4, Pending=0, Fired=0, Skipped=4, Incomplete=5,
Source=/var/lib/pacemaker/pengine/pe-input-24.bz2): Stopped
Feb 16 18:02:28 server1 pengine[1765]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Feb 16 18:02:28 server1 pengine[1765]: notice: LogActions: Demote
res_drbd_r0:1#011(Master -> Slave server2)
Feb 16 18:02:28 server1 pengine[1765]: notice: process_pe_message:
Calculated Transition 2583: /var/lib/pacemaker/pengine/pe-input-25.bz2
Feb 16 18:02:28 server1 crmd[1766]: notice: te_rsc_command: Initiating
action 80: notify res_drbd_r0_pre_notify_demote_0 on server1 (local)
Feb 16 18:02:28 server1 crmd[1766]: notice: te_rsc_command: Initiating
action 82: notify res_drbd_r0_pre_notify_demote_0 on server2
Feb 16 18:02:29 server1 crmd[1766]: notice: process_lrm_event: LRM
operation res_drbd_r0_notify_0 (call=116, rc=0, cib-update=0,
confirmed=true) ok
Feb 16 18:02:29 server1 crmd[1766]: notice: te_rsc_command: Initiating
action 39: demote res_drbd_r0_demote_0 on server2
Feb 16 18:02:29 server1 crmd[1766]: notice: te_rsc_command: Initiating
action 81: notify res_drbd_r0_post_notify_demote_0 on server1 (local)
Feb 16 18:02:29 server1 crmd[1766]: notice: te_rsc_command: Initiating
action 83: notify res_drbd_r0_post_notify_demote_0 on server2
Feb 16 18:02:29 server1 crmd[1766]: notice: process_lrm_event: LRM
operation res_drbd_r0_notify_0 (call=119, rc=0, cib-update=0,
confirmed=true) ok
Feb 16 18:02:29 server1 crmd[1766]: notice: run_graph: Transition 2583
(Complete=11, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-25.bz2): Stopped
Feb 16 18:02:29 server1 pengine[1765]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Feb 16 18:02:29 server1 pengine[1765]: notice: process_pe_message:
Calculated Transition 2584: /var/lib/pacemaker/pengine/pe-input-26.bz2
Feb 16 18:02:29 server1 crmd[1766]: notice: te_rsc_command: Initiating
action 40: monitor res_drbd_r0_monitor_30000 on server2
Feb 16 18:02:29 server1 crmd[1766]: notice: run_graph: Transition 2584
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-26.bz2): Complete
Feb 16 18:02:29 server1 crmd[1766]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 16 18:17:29 server1 crmd[1766]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped ]
Feb 16 18:17:29 server1 pengine[1765]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Feb 16 18:17:29 server1 pengine[1765]: notice: process_pe_message:
Calculated Transition 2585: /var/lib/pacemaker/pengine/pe-input-27.bz2
Feb 16 18:17:29 server1 crmd[1766]: notice: run_graph: Transition 2585
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-27.bz2): Complete
Feb 16 18:17:29 server1 crmd[1766]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]

[messages log on the secondary]
Feb 16 18:00:05 server1 kernel: block drbd0: sock_recvmsg returned -110
Feb 16 18:00:05 server1 kernel: block drbd0: peer( Secondary -> Unknown )
conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Feb 16 18:00:05 server1 kernel: block drbd0: short read expecting header on
sock: r=-110
Feb 16 18:00:05 server1 kernel: block drbd0: new current UUID
626E92ABB7AC7487:1CC1ED4094EA9EB7:9A0088783FE79A75:99FF88783FE79A75
Feb 16 18:00:05 server1 kernel: block drbd0: asender terminated
Feb 16 18:00:05 server1 kernel: block drbd0: Terminating drbd0_asender
Feb 16 18:00:05 server1 kernel: block drbd0: Connection closed
Feb 16 18:00:05 server1 kernel: block drbd0: conn( BrokenPipe ->
Unconnected )
Feb 16 18:00:05 server1 kernel: block drbd0: receiver terminated
Feb 16 18:00:05 server1 kernel: block drbd0: Restarting drbd0_receiver
Feb 16 18:00:05 server1 kernel: block drbd0: receiver (re)started
Feb 16 18:00:05 server1 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Feb 16 18:00:06 server1 xinetd[1940]: START: shell pid=1056
from=::ffff:192.171.1.102
Feb 16 18:00:06 server1 rshd[1057]: root@mdn-cluster2 as root: cmd='echo
$(date +%Y/%m/%d\ %H:%M:%S)'
Feb 16 18:00:06 server1 xinetd[1940]: EXIT: shell status=0 pid=1056
duration=0(sec)
Feb 16 18:00:28 server1 kernel: block drbd0: Handshake successful: Agreed
network protocol version 97
Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Feb 16 18:00:28 server1 kernel: block drbd0: Starting asender thread (from
drbd0_receiver [3117])
Feb 16 18:00:28 server1 kernel: block drbd0: data-integrity-alg: <not-used>
Feb 16 18:00:28 server1 kernel: block drbd0: meta connection shut down by
peer.
Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFReportParams ->
NetworkFailure )
Feb 16 18:00:28 server1 kernel: block drbd0: asender terminated
Feb 16 18:00:28 server1 kernel: block drbd0: Terminating drbd0_asender
Feb 16 18:00:30 server1 kernel: block drbd0: Connection closed
Feb 16 18:00:30 server1 kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Feb 16 18:00:30 server1 kernel: block drbd0: receiver terminated
Feb 16 18:00:30 server1 kernel: block drbd0: Restarting drbd0_receiver
Feb 16 18:00:30 server1 kernel: block drbd0: receiver (re)started
Feb 16 18:00:30 server1 kernel: block drbd0: conn( Unconnected ->
WFConnection )

[corosync log on the secondary]
Feb 16 18:00:23 server2 attrd[1557]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-res_drbd_r0 (1000)
Feb 16 18:00:23 server2 attrd[1557]: notice: attrd_perform_update: Sent
update 45: master-res_drbd_r0=1000
Feb 16 18:00:24 server2 corosync[1549]: [TOTEM ] totemsrp.c:1595 A
processor failed, forming new configuration.
Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Transitional membership event on ring 12: memb=1, new=0,
lost=1
Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:773 info:
pcmk_peer_update: memb: server2 1711385280
Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:779 info:
pcmk_peer_update: lost: server1 1694608064
Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Stable membership event on ring 12: memb=1, new=0, lost=0
Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:808 info:
pcmk_peer_update: MEMB: server2 1711385280
Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:731 info:
ais_mark_unseen_peer_dead: Node server1 was not seen in the previous
transition
Feb 16 18:00:25 server2 cib[1554]: notice: plugin_handle_membership:
Membership 12: quorum lost
Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] utils.c:358 info:
update_member: Node 1694608064/server1 is now: lost
Feb 16 18:00:25 server2 crmd[1559]: notice: plugin_handle_membership:
Membership 12: quorum lost
Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:1368 info:
send_member_notification: Sending membership update 12 to 2 children
Feb 16 18:00:25 server2 cib[1554]: notice: crm_update_peer_state:
plugin_handle_membership: Node server1[1694608064] - state is now lost (was
member)
Feb 16 18:00:25 server2 corosync[1549]: [TOTEM ] totemsrp.c:1892 A
processor joined or left the membership and a new membership was formed.
Feb 16 18:00:25 server2 crmd[1559]: notice: crm_update_peer_state:
plugin_handle_membership: Node server1[1694608064] - state is now lost (was
member)
Feb 16 18:00:25 server2 crmd[1559]: warning: reap_dead_nodes: Our DC node
(server1) left the cluster
Feb 16 18:00:25 server2 crmd[1559]: notice: do_state_transition: State
transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
origin=reap_dead_nodes ]
Feb 16 18:00:25 server2 corosync[1549]: [CPG ] cpg.c:783 chosen
downlist: sender r(0) ip(192.170.1.102) ; members(old:2 left:1)
Feb 16 18:00:25 server2 corosync[1549]: [MAIN ] main.c:278 Completed
service synchronization, ready to provide service.
Feb 16 18:00:25 server2 crmd[1559]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Feb 16 18:00:25 server2 attrd[1557]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Feb 16 18:00:25 server2 attrd[1557]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-res_drbd_r0 (1000)
Feb 16 18:00:25 server2 attrd[1557]: notice: attrd_perform_update: Sent
update 47: master-res_drbd_r0=1000
Feb 16 18:00:25 server2 attrd[1557]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Feb 16 18:00:26 server2 pengine[1558]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Feb 16 18:00:26 server2 pengine[1558]: notice: LogActions: Promote
res_drbd_r0:0#011(Slave -> Master server2)
Feb 16 18:00:26 server2 pengine[1558]: notice: process_pe_message:
Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-0.bz2
Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: Initiating
action 1: cancel res_drbd_r0_cancel_30000 on server2 (local)
Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: Initiating
action 56: notify res_drbd_r0_pre_notify_promote_0 on server2 (local)
Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM
operation res_drbd_r0_notify_0 (call=56, rc=0, cib-update=0,
confirmed=true) ok
Feb 16 18:00:26 server2 crmd[1559]: notice: run_graph: Transition 0
(Complete=3, Pending=0, Fired=0, Skipped=5, Incomplete=4,
Source=/var/lib/pacemaker/pengine/pe-input-0.bz2): Stopped
Feb 16 18:00:26 server2 pengine[1558]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Feb 16 18:00:26 server2 pengine[1558]: notice: LogActions: Promote
res_drbd_r0:0#011(Slave -> Master server2)
Feb 16 18:00:26 server2 pengine[1558]: notice: process_pe_message:
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-1.bz2
Feb 16 18:00:26 server2 crmd[1559]: warning: destroy_action: Cancelling
timer for action 1 (src=45)
Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: Initiating
action 55: notify res_drbd_r0_pre_notify_promote_0 on server2 (local)
Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Transitional membership event on ring 16: memb=1, new=0,
lost=0
Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:773 info:
pcmk_peer_update: memb: server2 1711385280
Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1, lost=0
Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] utils.c:358 info:
update_member: Node 1694608064/server1 is now: member
Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:791 info:
pcmk_peer_update: NEW: server1 1694608064
Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:808 info:
pcmk_peer_update: MEMB: server1 1694608064
Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:808 info:
pcmk_peer_update: MEMB: server2 1711385280
Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:1368 info:
send_member_notification: Sending membership update 16 to 2 children
Feb 16 18:00:26 server2 corosync[1549]: [TOTEM ] totemsrp.c:1892 A
processor joined or left the membership and a new membership was formed.
Feb 16 18:00:26 server2 crmd[1559]: notice: plugin_handle_membership:
Membership 16: quorum acquired
Feb 16 18:00:26 server2 crmd[1559]: notice: crm_update_peer_state:
plugin_handle_membership: Node server1[1694608064] - state is now member
(was lost)
Feb 16 18:00:26 server2 cib[1554]: notice: plugin_handle_membership:
Membership 16: quorum acquired
Feb 16 18:00:26 server2 cib[1554]: notice: crm_update_peer_state:
plugin_handle_membership: Node server1[1694608064] - state is now member
(was lost)
Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM
operation res_drbd_r0_notify_0 (call=59, rc=0, cib-update=0,
confirmed=true) ok
Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: Initiating
action 14: promote res_drbd_r0_promote_0 on server2 (local)
Feb 16 18:00:26 server2 corosync[1549]: [CPG ] cpg.c:783 chosen
downlist: sender r(0) ip(192.170.1.102) ; members(old:1 left:0)
Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM
operation res_drbd_r0_promote_0 (call=62, rc=0, cib-update=49,
confirmed=true) ok
Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: Initiating
action 56: notify res_drbd_r0_post_notify_promote_0 on server2 (local)
Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-res_drbd_r0 (10000)
Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_perform_update: Sent
update 51: master-res_drbd_r0=10000
Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM
operation res_drbd_r0_notify_0 (call=65, rc=0, cib-update=0,
confirmed=true) ok
Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: Initiating
action 15: monitor res_drbd_r0_monitor_10000 on server2 (local)
Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-res_drbd_r0 (10000)
Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_perform_update: Sent
update 53: master-res_drbd_r0=10000
Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM
operation res_drbd_r0_monitor_10000 (call=68, rc=8, cib-update=50,
confirmed=false) master
Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event:
server2-res_drbd_r0_monitor_10000:68 [ \n ]
Feb 16 18:00:26 server2 crmd[1559]: notice: run_graph: Transition 1
(Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-1.bz2): Complete
Feb 16 18:00:27 server2 corosync[1549]: [MAIN ] main.c:278 Completed
service synchronization, ready to provide service.
Feb 16 18:00:28 server2 corosync[1549]: [TOTEM ] totemsrp.c:1595 A
processor failed, forming new configuration.
Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Transitional membership event on ring 20: memb=1, new=0,
lost=1
Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:773 info:
pcmk_peer_update: memb: server2 1711385280
Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:779 info:
pcmk_peer_update: lost: server1 1694608064
Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0, lost=0
Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:808 info:
pcmk_peer_update: MEMB: server2 1711385280
Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:731 info:
ais_mark_unseen_peer_dead: Node server1 was not seen in the previous
transition
Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] utils.c:358 info:
update_member: Node 1694608064/server1 is now: lost
Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:1368 info:
send_member_notification: Sending membership update 20 to 2 children
Feb 16 18:00:29 server2 corosync[1549]: [TOTEM ] totemsrp.c:1892 A
processor joined or left the membership and a new membership was formed.
Feb 16 18:00:29 server2 cib[1554]: notice: plugin_handle_membership:
Membership 20: quorum lost
Feb 16 18:00:29 server2 crmd[1559]: notice: plugin_handle_membership:
Membership 20: quorum lost
Feb 16 18:00:29 server2 cib[1554]: notice: crm_update_peer_state:
plugin_handle_membership: Node server1[1694608064] - state is now lost (was
member)
Feb 16 18:00:29 server2 crmd[1559]: warning: match_down_event: No match
for shutdown action on server1
Feb 16 18:00:29 server2 crmd[1559]: notice: peer_update_callback:
Stonith/shutdown of server1 not matched
Feb 16 18:00:29 server2 crmd[1559]: notice: crm_update_peer_state:
plugin_handle_membership: Node server1[1694608064] - state is now lost (was
member)
Feb 16 18:00:29 server2 crmd[1559]: warning: match_down_event: No match
for shutdown action on server1
Feb 16 18:00:29 server2 crmd[1559]: notice: peer_update_callback:
Stonith/shutdown of server1 not matched
Feb 16 18:00:29 server2 corosync[1549]: [CPG ] cpg.c:783 chosen
downlist: sender r(0) ip(192.170.1.102) ; members(old:2 left:1)
Feb 16 18:00:29 server2 corosync[1549]: [MAIN ] main.c:278 Completed
service synchronization, ready to provide service.
Feb 16 18:00:29 server2 crmd[1559]: notice: do_election_count_vote:
Election 4 (current: 2, owner: server1): Processed vote from server1 (Peer
is not part of our cluster)
Feb 16 18:00:29 server2 attrd[1557]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Feb 16 18:00:29 server2 attrd[1557]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-res_drbd_r0 (10000)
Feb 16 18:00:29 server2 attrd[1557]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Feb 16 18:00:29 server2 pengine[1558]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Feb 16 18:00:29 server2 pengine[1558]: notice: process_pe_message:
Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-2.bz2
Feb 16 18:00:29 server2 crmd[1559]: notice: run_graph: Transition 2
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-2.bz2): Complete
Feb 16 18:00:29 server2 crmd[1559]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Transitional membership event on ring 24: memb=1, new=0,
lost=0
Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:773 info:
pcmk_peer_update: memb: server2 1711385280
Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, lost=0
Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] utils.c:358 info:
update_member: Node 1694608064/server1 is now: member
Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:791 info:
pcmk_peer_update: NEW: server1 1694608064
Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:808 info:
pcmk_peer_update: MEMB: server1 1694608064
Feb 16 18:00:30 server2 crmd[1559]: notice: plugin_handle_membership:
Membership 24: quorum acquired
Feb 16 18:00:30 server2 cib[1554]: notice: plugin_handle_membership:
Membership 24: quorum acquired
Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:808 info:
pcmk_peer_update: MEMB: server2 1711385280
Feb 16 18:00:30 server2 cib[1554]: notice: crm_update_peer_state:
plugin_handle_membership: Node server1[1694608064] - state is now member
(was lost)
Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:1368 info:
send_member_notification: Sending membership update 24 to 2 children
Feb 16 18:00:30 server2 corosync[1549]: [TOTEM ] totemsrp.c:1892 A
processor joined or left the membership and a new membership was formed.
Feb 16 18:00:30 server2 crmd[1559]: notice: crm_update_peer_state:
plugin_handle_membership: Node server1[1694608064] - state is now member
(was lost)
Feb 16 18:00:30 server2 corosync[1549]: [CPG ] cpg.c:783 chosen
downlist: sender r(0) ip(192.170.1.102) ; members(old:1 left:0)
Feb 16 18:00:31 server2 corosync[1549]: [MAIN ] main.c:278 Completed
service synchronization, ready to provide service.
Feb 16 18:00:31 server2 attrd[1557]: warning: attrd_cib_callback: Update
45 for master-res_drbd_r0=1000 failed: Application of an update diff failed
Feb 16 18:00:56 server2 crmd[1559]: warning: cib_rsc_callback: Resource
update 49 failed: (rc=-62) Timer expired
Feb 16 18:00:56 server2 crmd[1559]: warning: cib_rsc_callback: Resource
update 50 failed: (rc=-62) Timer expired
Feb 16 18:02:26 server2 attrd[1557]: warning: attrd_cib_callback: Update
51 for master-res_drbd_r0=10000 failed: Timer expired
Feb 16 18:02:27 server2 crmd[1559]: warning: crmd_ha_msg_filter: Another
DC detected: server1 (op=join_offer)
Feb 16 18:02:27 server2 crmd[1559]: notice: do_state_transition: State
transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
origin=crmd_ha_msg_filter ]
Feb 16 18:02:27 server2 crmd[1559]: notice: do_state_transition: State
transition S_ELECTION -> S_RELEASE_DC [ input=I_RELEASE_DC
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Feb 16 18:02:27 server2 crmd[1559]: notice: do_state_transition: State
transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
origin=do_cl_join_finalize_respond ]
Feb 16 18:02:27 server2 attrd[1557]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Feb 16 18:02:27 server2 attrd[1557]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-res_drbd_r0 (10000)
Feb 16 18:02:27 server2 attrd[1557]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Feb 16 18:02:27 server2 crmd[1559]: notice: process_lrm_event:
server2-res_drbd_r0_monitor_10000:68 [ \n ]
Feb 16 18:02:27 server2 crmd[1559]: notice: process_lrm_event: LRM
operation res_drbd_r0_notify_0 (call=74, rc=0, cib-update=0,
confirmed=true) ok
Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM
operation res_drbd_r0_notify_0 (call=77, rc=0, cib-update=0,
confirmed=true) ok
Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM
operation res_drbd_r0_demote_0 (call=80, rc=0, cib-update=74,
confirmed=true) ok
Feb 16 18:02:28 server2 attrd[1557]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-res_drbd_r0 (1000)
Feb 16 18:02:28 server2 attrd[1557]: notice: attrd_perform_update: Sent
update 67: master-res_drbd_r0=1000
Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM
operation res_drbd_r0_notify_0 (call=83, rc=0, cib-update=0,
confirmed=true) ok
Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM
operation res_drbd_r0_monitor_30000 (call=86, rc=0, cib-update=75,
confirmed=false) ok

[global_common.conf]
global {
        usage-count no;
}

common {
        protocol C;
        handlers {
        }
        startup {
        }
        disk {
                no-disk-barrier;
                no-disk-flushes;
        }
        net {
                max-buffers 8000;
                max-epoch-size 8000;
                after-sb-0pri discard-younger-primary;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        syncer {
                rate 30M;
                verify-alg sha1;
        }
}
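
A note on the configuration above: the after-sb-*pri settings control how DRBD tries to resolve a split-brain automatically once the nodes reconnect, and with the handlers section left empty there is no notification when a split-brain is detected. A minimal sketch of adding the notification handler shipped with the drbd 8.3 packages (the script path may differ by distribution):

common {
        handlers {
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        }
}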
Re: Regarding the cause of a DRBD split-brain
Akiyama-san,

Hello, this is Yamauchi.

As far as the corosync log on the primary shows, corosync-layer communication with the secondary was lost and the cluster membership was re-formed once.

What happens if you try increasing the token parameter in corosync.conf?

>Feb 16 18:00:27 server1 corosync[1756]:   [MAIN  ] main.c:1341 Corosync main process was not scheduled for 3144.5435 ms (threshold is 800.0000 ms). Consider token timeout increase.


Since the log line above was output, corosync is reporting that its normal scheduling fell behind by roughly three seconds.
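
For reference, in corosync 1.x the token timeout is set in the totem section of corosync.conf; the 800 ms threshold in the log looks like 80% of the token timeout, which would be consistent with the default token of 1000 ms. A sketch of raising it (10000 ms is only an example value; the rest of the existing totem settings stay as they are):

totem {
        # existing settings (version, interface, ...) unchanged
        token: 10000
}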

Best regards.


----- Original Message -----
>From: Akiyama Jiro <mayama0130@gmail.com>
>To: linux-ha-japan@lists.sourceforge.jp
>Date: 2015/2/17, Tue 11:38
>Subject: [Linux-ha-jp] Regarding the cause of a DRBD split-brain
>Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: server2-res_drbd_r0_monitor_10000:68 [ \n ]
>Feb 16 18:00:26 server2 crmd[1559]:   notice: run_graph: Transition 1 (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1.bz2): Complete
>Feb 16 18:00:27 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed service synchronization, ready to provide service.
>Feb 16 18:00:28 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1595 A processor failed, forming new configuration.
>Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice: pcmk_peer_update: Transitional membership event on ring 20: memb=1, new=0, lost=1
>Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:773 info: pcmk_peer_update: memb: server2 1711385280
>Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:779 info: pcmk_peer_update: lost: server1 1694608064
>Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice: pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0, lost=0
>Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info: pcmk_peer_update: MEMB: server2 1711385280
>Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:731 info: ais_mark_unseen_peer_dead: Node server1 was not seen in the previous transition
>Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] utils.c:358 info: update_member: Node 1694608064/server1 is now: lost
>Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:1368 info: send_member_notification: Sending membership update 20 to 2 children
>Feb 16 18:00:29 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1892 A processor joined or left the membership and a new membership was formed.
>Feb 16 18:00:29 server2 cib[1554]:   notice: plugin_handle_membership: Membership 20: quorum lost
>Feb 16 18:00:29 server2 crmd[1559]:   notice: plugin_handle_membership: Membership 20: quorum lost
>Feb 16 18:00:29 server2 cib[1554]:   notice: crm_update_peer_state: plugin_handle_membership: Node server1[1694608064] - state is now lost (was member)
>Feb 16 18:00:29 server2 crmd[1559]:  warning: match_down_event: No match for shutdown action on server1
>Feb 16 18:00:29 server2 crmd[1559]:   notice: peer_update_callback: Stonith/shutdown of server1 not matched
>Feb 16 18:00:29 server2 crmd[1559]:   notice: crm_update_peer_state: plugin_handle_membership: Node server1[1694608064] - state is now lost (was member)
>Feb 16 18:00:29 server2 crmd[1559]:  warning: match_down_event: No match for shutdown action on server1
>Feb 16 18:00:29 server2 crmd[1559]:   notice: peer_update_callback: Stonith/shutdown of server1 not matched
>Feb 16 18:00:29 server2 corosync[1549]:   [CPG   ] cpg.c:783 chosen downlist: sender r(0) ip(192.170.1.102) ; members(old:2 left:1)
>Feb 16 18:00:29 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed service synchronization, ready to provide service.
>Feb 16 18:00:29 server2 crmd[1559]:   notice: do_election_count_vote: Election 4 (current: 2, owner: server1): Processed vote from server1 (Peer is not part of our cluster)
>Feb 16 18:00:29 server2 attrd[1557]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
>Feb 16 18:00:29 server2 attrd[1557]:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-res_drbd_r0 (10000)
>Feb 16 18:00:29 server2 attrd[1557]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>Feb 16 18:00:29 server2 pengine[1558]:   notice: unpack_config: On loss of CCM Quorum: Ignore
>Feb 16 18:00:29 server2 pengine[1558]:   notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-2.bz2
>Feb 16 18:00:29 server2 crmd[1559]:   notice: run_graph: Transition 2 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2.bz2): Complete
>Feb 16 18:00:29 server2 crmd[1559]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice: pcmk_peer_update: Transitional membership event on ring 24: memb=1, new=0, lost=0
>Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:773 info: pcmk_peer_update: memb: server2 1711385280
>Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice: pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, lost=0
>Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] utils.c:358 info: update_member: Node 1694608064/server1 is now: member
>Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:791 info: pcmk_peer_update: NEW:  server1 1694608064
>Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info: pcmk_peer_update: MEMB: server1 1694608064
>Feb 16 18:00:30 server2 crmd[1559]:   notice: plugin_handle_membership: Membership 24: quorum acquired
>Feb 16 18:00:30 server2 cib[1554]:   notice: plugin_handle_membership: Membership 24: quorum acquired
>Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info: pcmk_peer_update: MEMB: server2 1711385280
>Feb 16 18:00:30 server2 cib[1554]:   notice: crm_update_peer_state: plugin_handle_membership: Node server1[1694608064] - state is now member (was lost)
>Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:1368 info: send_member_notification: Sending membership update 24 to 2 children
>Feb 16 18:00:30 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1892 A processor joined or left the membership and a new membership was formed.
>Feb 16 18:00:30 server2 crmd[1559]:   notice: crm_update_peer_state: plugin_handle_membership: Node server1[1694608064] - state is now member (was lost)
>Feb 16 18:00:30 server2 corosync[1549]:   [CPG   ] cpg.c:783 chosen downlist: sender r(0) ip(192.170.1.102) ; members(old:1 left:0)
>Feb 16 18:00:31 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed service synchronization, ready to provide service.
>Feb 16 18:00:31 server2 attrd[1557]:  warning: attrd_cib_callback: Update 45 for master-res_drbd_r0=1000 failed: Application of an update diff failed
>Feb 16 18:00:56 server2 crmd[1559]:  warning: cib_rsc_callback: Resource update 49 failed: (rc=-62) Timer expired
>Feb 16 18:00:56 server2 crmd[1559]:  warning: cib_rsc_callback: Resource update 50 failed: (rc=-62) Timer expired
>Feb 16 18:02:26 server2 attrd[1557]:  warning: attrd_cib_callback: Update 51 for master-res_drbd_r0=10000 failed: Timer expired
>Feb 16 18:02:27 server2 crmd[1559]:  warning: crmd_ha_msg_filter: Another DC detected: server1 (op=join_offer)
>Feb 16 18:02:27 server2 crmd[1559]:   notice: do_state_transition: State transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
>Feb 16 18:02:27 server2 crmd[1559]:   notice: do_state_transition: State transition S_ELECTION -> S_RELEASE_DC [ input=I_RELEASE_DC cause=C_FSA_INTERNAL origin=do_election_count_vote ]
>Feb 16 18:02:27 server2 crmd[1559]:   notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
>Feb 16 18:02:27 server2 attrd[1557]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
>Feb 16 18:02:27 server2 attrd[1557]:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-res_drbd_r0 (10000)
>Feb 16 18:02:27 server2 attrd[1557]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>Feb 16 18:02:27 server2 crmd[1559]:   notice: process_lrm_event: server2-res_drbd_r0_monitor_10000:68 [ \n ]
>Feb 16 18:02:27 server2 crmd[1559]:   notice: process_lrm_event: LRM operation res_drbd_r0_notify_0 (call=74, rc=0, cib-update=0, confirmed=true) ok
>Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM operation res_drbd_r0_notify_0 (call=77, rc=0, cib-update=0, confirmed=true) ok
>Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM operation res_drbd_r0_demote_0 (call=80, rc=0, cib-update=74, confirmed=true) ok
>Feb 16 18:02:28 server2 attrd[1557]:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-res_drbd_r0 (1000)
>Feb 16 18:02:28 server2 attrd[1557]:   notice: attrd_perform_update: Sent update 67: master-res_drbd_r0=1000
>Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM operation res_drbd_r0_notify_0 (call=83, rc=0, cib-update=0, confirmed=true) ok
>Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM operation res_drbd_r0_monitor_30000 (call=86, rc=0, cib-update=75, confirmed=false) ok
>
>【global_common.conf】
>global {
>        usage-count no;
>}
>
>common {
>        protocol C;
>        handlers {
>        }
>        startup {
>        }
>        disk {
>                no-disk-barrier;
>                no-disk-flushes;
>        }
>        net {
>                max-buffers 8000;
>                max-epoch-size 8000;
>                after-sb-0pri discard-younger-primary;
>                after-sb-1pri discard-secondary;
>                after-sb-2pri disconnect;
>        }
>        syncer {
>                rate 30M;
>                verify-alg sha1;
>        }
>}
>
>
>
>

_______________________________________________
Linux-ha-japan mailing list
Linux-ha-japan@lists.sourceforge.jp
http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
Re: Regarding the cause of the DRBD split-brain [ In reply to ]
Akiyama-san,

This is Yamauchi.

Incidentally, it is also possible that the load on the primary and the secondary was high,
so it may be worth checking the CPU state with sar and vmstat as well.

# The load may have gone up at some point because of a cron job or something similar.
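
For reference, a minimal sketch of what that check could look like (this assumes the
sysstat package is installed and that the daily file for Feb 16 is still kept as
/var/log/sa/sa16 -- the path and the time window are only examples):

# CPU utilization recorded by sysstat around the time of the incident
sar -u -f /var/log/sa/sa16 -s 17:55:00 -e 18:05:00
# run-queue length and load average over the same window
sar -q -f /var/log/sa/sa16 -s 17:55:00 -e 18:05:00
# live sampling: 1-second interval, 10 samples
vmstat 1 10

If %iowait or the run queue jumps around 18:00, a cron job or a backup running at that
time would be the first thing to suspect.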

That is all.


----- Original Message -----
> From: "renayama19661014@ybb.ne.jp" <renayama19661014@ybb.ne.jp>
> To: "linux-ha-japan@lists.sourceforge.jp" <linux-ha-japan@lists.sourceforge.jp>
> Cc:
> Date: 2015/2/17, Tue 12:31
> Subject: Re: [Linux-ha-jp] Regarding the cause of the DRBD split-brain
>
> Akiyama-san
>
> Hello, this is Yamauchi.
>
> As far as the corosync log on the primary shows, the corosync-layer communication with the secondary was lost and the cluster membership was re-formed once.
>
> What happens if you lengthen the token parameter in corosync.conf?
>
>> Feb 16 18:00:27 server1 corosync[1756]:   [MAIN  ] main.c:1341 Corosync main process was not scheduled for 3144.5435 ms (threshold is 800.0000 ms). Consider token timeout increase.
>
>
> Since the log above is being output, it indicates that corosync's scheduling ran a little over two seconds later than it should have.
>
> Best regards.
>
>
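
Regarding the token suggestion quoted above, a minimal corosync.conf sketch might look
like the following (the 10000 ms value is only an illustration, not a tested
recommendation; the same value has to be set on both nodes and corosync restarted, and
the existing interface and other totem settings should be kept as they are):

totem {
        version: 2
        # token timeout in milliseconds; the corosync 1.4 default is 1000 ms,
        # so a scheduling stall of 2-3 seconds is enough to break membership
        token: 10000
}

Note that a larger token also means a real node failure takes correspondingly longer to
detect, so the value is a trade-off against failover time.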

_______________________________________________
Linux-ha-japan mailing list
Linux-ha-japan@lists.sourceforge.jp
http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
> crm_feature_set="3.0.7" have-quorum="1"
> dc-uuid="server2"/>
>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: Diff: --- 0.7.69
>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: Diff: +++ 0.7.87
> 1afd9af32d3a574358543e1d4751aaf1
>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: --     <node_state
> crm-debug-origin="do_state_transition" join="member"
> id="server1"/>
>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: --            
> <lrm_rsc_op operation_key="res_drbd_r0_start_0"
> operation="start"
> transition-key="46:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
> transition-magic="0:0;46:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
> call-id="44" last-run="1421767134"
> last-rc-change="1421767134" exec-time="430"
> id="res_drbd_r0_last_0"/>
>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: --            
> <lrm_rsc_op id="res_drbd_r0_monitor_30000"
> operation_key="res_drbd_r0_monitor_30000"
> operation="monitor" crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.7"
> transition-key="47:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
> transition-magic="0:0;47:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
> call-id="50" rc-code="0" op-status="0"
> interval="30000" last-rc-change="1421767135"
> exec-time="60" queue-time="0"
> op-digest="6f5adcd7f1211cdfc17850827b8582c5"/>
>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: --          
> <nvpair value="1000"
> id="status-server2-master-res_drbd_r0"/>
>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: ++     <node_state
> id="server1" uname="server1" in_ccm="true"
> crmd="online" crm-debug-origin="post_cache_update"
> join="down" expected="member"/>
>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: ++          
> <nvpair id="status-server2-master-res_drbd_r0"
> name="master-res_drbd_r0" value="10000"/>
>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: ++            
> <lrm_rsc_op id="res_drbd_r0_last_0"
> operation_key="res_drbd_r0_promote_0" operation="promote"
> crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.7"
> transition-key="14:1:0:2172d7d8-fe4a-427d-9311-27a13078a0fb"
> transition-magic="0:0;14:1:0:2172d7d8-fe4a-427d-9311-27a13078a0fb"
> call-id="62" rc-code="0" op-status="0"
> interval="0" last-run="1424077226"
> last-rc-change="1424077226" exec-time="120"
> queue-time="0"
> op-digest="c0e018b73fdf522b6cdd355e125af15e"/>
>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: ++            
> <lrm_rsc_op id="res_drbd_r0_monitor_10000"
> operation_key="res_drbd_r0_monitor_10000"
> operation="monitor" crm-debug-origin="do_update_resource"
> crm_feature_set="3.0.7"
> transition-key="15:1:8:2172d7d8-fe4a-427d-9311-27a13078a0fb"
> transition-magic="0:8;15:1:8:2172d7d8-fe4a-427d-9311-27a13078a0fb"
> call-id="68" rc-code="8" op-status="0"
> interval="10000" last-rc-change="1424077226"
> exec-time="53" queue-time="0"
> op-digest="6f5adcd7f1211cdfc17850827b8582c5"/>
>> Feb 16 18:02:28 server1 stonith-ng[1762]:  warning: cib_process_diff: Diff
> 0.7.69 -> 0.7.87 from local not applied to 0.7.69: Failed application of an
> update diff
>> Feb 16 18:02:28 server1 stonith-ng[1762]:   notice: update_cib_cache_cb:
> [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
>> Feb 16 18:02:28 server1 attrd[1764]:   notice: attrd_local_callback: Sending
> full refresh (origin=crmd)
>> Feb 16 18:02:28 server1 attrd[1764]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: master-res_drbd_r0 (10000)
>> Feb 16 18:02:28 server1 attrd[1764]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: probe_complete (true)
>> Feb 16 18:02:28 server1 pengine[1765]:   notice: unpack_config: On loss of
> CCM Quorum: Ignore
>> Feb 16 18:02:28 server1 pengine[1765]:   notice: LogActions: Demote 
> res_drbd_r0:1#011(Master -> Slave server2)
>> Feb 16 18:02:28 server1 pengine[1765]:   notice: process_pe_message:
> Calculated Transition 2582: /var/lib/pacemaker/pengine/pe-input-24.bz2
>> Feb 16 18:02:28 server1 crmd[1766]:   notice: te_rsc_command: Initiating
> action 8: cancel res_drbd_r0_cancel_10000 on server2
>> Feb 16 18:02:28 server1 crmd[1766]:   notice: te_rsc_command: Initiating
> action 81: notify res_drbd_r0_pre_notify_demote_0 on server1 (local)
>> Feb 16 18:02:28 server1 crmd[1766]:   notice: te_rsc_command: Initiating
> action 83: notify res_drbd_r0_pre_notify_demote_0 on server2
>> Feb 16 18:02:28 server1 crmd[1766]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_notify_0 (call=113, rc=0, cib-update=0, confirmed=true) ok
>> Feb 16 18:02:28 server1 crmd[1766]:   notice: run_graph: Transition 2582
> (Complete=4, Pending=0, Fired=0, Skipped=4, Incomplete=5,
> Source=/var/lib/pacemaker/pengine/pe-input-24.bz2): Stopped
>> Feb 16 18:02:28 server1 pengine[1765]:   notice: unpack_config: On loss of
> CCM Quorum: Ignore
>> Feb 16 18:02:28 server1 pengine[1765]:   notice: LogActions: Demote 
> res_drbd_r0:1#011(Master -> Slave server2)
>> Feb 16 18:02:28 server1 pengine[1765]:   notice: process_pe_message:
> Calculated Transition 2583: /var/lib/pacemaker/pengine/pe-input-25.bz2
>> Feb 16 18:02:28 server1 crmd[1766]:   notice: te_rsc_command: Initiating
> action 80: notify res_drbd_r0_pre_notify_demote_0 on server1 (local)
>> Feb 16 18:02:28 server1 crmd[1766]:   notice: te_rsc_command: Initiating
> action 82: notify res_drbd_r0_pre_notify_demote_0 on server2
>> Feb 16 18:02:29 server1 crmd[1766]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_notify_0 (call=116, rc=0, cib-update=0, confirmed=true) ok
>> Feb 16 18:02:29 server1 crmd[1766]:   notice: te_rsc_command: Initiating
> action 39: demote res_drbd_r0_demote_0 on server2
>> Feb 16 18:02:29 server1 crmd[1766]:   notice: te_rsc_command: Initiating
> action 81: notify res_drbd_r0_post_notify_demote_0 on server1 (local)
>> Feb 16 18:02:29 server1 crmd[1766]:   notice: te_rsc_command: Initiating
> action 83: notify res_drbd_r0_post_notify_demote_0 on server2
>> Feb 16 18:02:29 server1 crmd[1766]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_notify_0 (call=119, rc=0, cib-update=0, confirmed=true) ok
>> Feb 16 18:02:29 server1 crmd[1766]:   notice: run_graph: Transition 2583
> (Complete=11, Pending=0, Fired=0, Skipped=1, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-25.bz2): Stopped
>> Feb 16 18:02:29 server1 pengine[1765]:   notice: unpack_config: On loss of
> CCM Quorum: Ignore
>> Feb 16 18:02:29 server1 pengine[1765]:   notice: process_pe_message:
> Calculated Transition 2584: /var/lib/pacemaker/pengine/pe-input-26.bz2
>> Feb 16 18:02:29 server1 crmd[1766]:   notice: te_rsc_command: Initiating
> action 40: monitor res_drbd_r0_monitor_30000 on server2
>> Feb 16 18:02:29 server1 crmd[1766]:   notice: run_graph: Transition 2584
> (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-26.bz2): Complete
>> Feb 16 18:02:29 server1 crmd[1766]:   notice: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
>> Feb 16 18:17:29 server1 crmd[1766]:   notice: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
> origin=crm_timer_popped ]
>> Feb 16 18:17:29 server1 pengine[1765]:   notice: unpack_config: On loss of
> CCM Quorum: Ignore
>> Feb 16 18:17:29 server1 pengine[1765]:   notice: process_pe_message:
> Calculated Transition 2585: /var/lib/pacemaker/pengine/pe-input-27.bz2
>> Feb 16 18:17:29 server1 crmd[1766]:   notice: run_graph: Transition 2585
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-27.bz2): Complete
>> Feb 16 18:17:29 server1 crmd[1766]:   notice: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
>>
>> [messages log on the secondary side]
>> Feb 16 18:00:05 server1 kernel: block drbd0: sock_recvmsg returned -110
>> Feb 16 18:00:05 server1 kernel: block drbd0: peer( Secondary -> Unknown )
> conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>> Feb 16 18:00:05 server1 kernel: block drbd0: short read expecting header on
> sock: r=-110
>> Feb 16 18:00:05 server1 kernel: block drbd0: new current UUID
> 626E92ABB7AC7487:1CC1ED4094EA9EB7:9A0088783FE79A75:99FF88783FE79A75
>> Feb 16 18:00:05 server1 kernel: block drbd0: asender terminated
>> Feb 16 18:00:05 server1 kernel: block drbd0: Terminating drbd0_asender
>> Feb 16 18:00:05 server1 kernel: block drbd0: Connection closed
>> Feb 16 18:00:05 server1 kernel: block drbd0: conn( BrokenPipe ->
> Unconnected )
>> Feb 16 18:00:05 server1 kernel: block drbd0: receiver terminated
>> Feb 16 18:00:05 server1 kernel: block drbd0: Restarting drbd0_receiver
>> Feb 16 18:00:05 server1 kernel: block drbd0: receiver (re)started
>> Feb 16 18:00:05 server1 kernel: block drbd0: conn( Unconnected ->
> WFConnection )
>> Feb 16 18:00:06 server1 xinetd[1940]: START: shell pid=1056
> from=::ffff:192.171.1.102
>> Feb 16 18:00:06 server1 rshd[1057]: root@mdn-cluster2 as root: cmd='echo
> $(date +%Y/%m/%d\ %H:%M:%S)'
>> Feb 16 18:00:06 server1 xinetd[1940]: EXIT: shell status=0 pid=1056
> duration=0(sec)
>> Feb 16 18:00:28 server1 kernel: block drbd0: Handshake successful: Agreed
> network protocol version 97
>> Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFConnection ->
> WFReportParams )
>> Feb 16 18:00:28 server1 kernel: block drbd0: Starting asender thread (from
> drbd0_receiver [3117])
>> Feb 16 18:00:28 server1 kernel: block drbd0: data-integrity-alg:
> <not-used>
>> Feb 16 18:00:28 server1 kernel: block drbd0: meta connection shut down by
> peer.
>> Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFReportParams ->
> NetworkFailure )
>> Feb 16 18:00:28 server1 kernel: block drbd0: asender terminated
>> Feb 16 18:00:28 server1 kernel: block drbd0: Terminating drbd0_asender
>> Feb 16 18:00:30 server1 kernel: block drbd0: Connection closed
>> Feb 16 18:00:30 server1 kernel: block drbd0: conn( NetworkFailure ->
> Unconnected )
>> Feb 16 18:00:30 server1 kernel: block drbd0: receiver terminated
>> Feb 16 18:00:30 server1 kernel: block drbd0: Restarting drbd0_receiver
>> Feb 16 18:00:30 server1 kernel: block drbd0: receiver (re)started
>> Feb 16 18:00:30 server1 kernel: block drbd0: conn( Unconnected ->
> WFConnection )
>>
>> [corosync log on the secondary side]
>> Feb 16 18:00:23 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: master-res_drbd_r0 (1000)
>> Feb 16 18:00:23 server2 attrd[1557]:   notice: attrd_perform_update: Sent
> update 45: master-res_drbd_r0=1000
>> Feb 16 18:00:24 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1595 A
> processor failed, forming new configuration.
>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
> pcmk_peer_update: Transitional membership event on ring 12: memb=1, new=0,
> lost=1
>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:773 info:
> pcmk_peer_update: memb: server2 1711385280
>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:779 info:
> pcmk_peer_update: lost: server1 1694608064
>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
> pcmk_peer_update: Stable membership event on ring 12: memb=1, new=0, lost=0
>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
> pcmk_peer_update: MEMB: server2 1711385280
>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:731 info:
> ais_mark_unseen_peer_dead: Node server1 was not seen in the previous transition
>> Feb 16 18:00:25 server2 cib[1554]:   notice: plugin_handle_membership:
> Membership 12: quorum lost
>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] utils.c:358 info:
> update_member: Node 1694608064/server1 is now: lost
>> Feb 16 18:00:25 server2 crmd[1559]:   notice: plugin_handle_membership:
> Membership 12: quorum lost
>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:1368 info:
> send_member_notification: Sending membership update 12 to 2 children
>> Feb 16 18:00:25 server2 cib[1554]:   notice: crm_update_peer_state:
> plugin_handle_membership: Node server1[1694608064] - state is now lost (was
> member)
>> Feb 16 18:00:25 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1892 A
> processor joined or left the membership and a new membership was formed.
>> Feb 16 18:00:25 server2 crmd[1559]:   notice: crm_update_peer_state:
> plugin_handle_membership: Node server1[1694608064] - state is now lost (was
> member)
>> Feb 16 18:00:25 server2 crmd[1559]:  warning: reap_dead_nodes: Our DC node
> (server1) left the cluster
>> Feb 16 18:00:25 server2 crmd[1559]:   notice: do_state_transition: State
> transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
> origin=reap_dead_nodes ]
>> Feb 16 18:00:25 server2 corosync[1549]:   [CPG   ] cpg.c:783 chosen
> downlist: sender r(0) ip(192.170.1.102) ; members(old:2 left:1)
>> Feb 16 18:00:25 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed
> service synchronization, ready to provide service.
>> Feb 16 18:00:25 server2 crmd[1559]:   notice: do_state_transition: State
> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=do_election_check ]
>> Feb 16 18:00:25 server2 attrd[1557]:   notice: attrd_local_callback: Sending
> full refresh (origin=crmd)
>> Feb 16 18:00:25 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: master-res_drbd_r0 (1000)
>> Feb 16 18:00:25 server2 attrd[1557]:   notice: attrd_perform_update: Sent
> update 47: master-res_drbd_r0=1000
>> Feb 16 18:00:25 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: probe_complete (true)
>> Feb 16 18:00:26 server2 pengine[1558]:   notice: unpack_config: On loss of
> CCM Quorum: Ignore
>> Feb 16 18:00:26 server2 pengine[1558]:   notice: LogActions: Promote
> res_drbd_r0:0#011(Slave -> Master server2)
>> Feb 16 18:00:26 server2 pengine[1558]:   notice: process_pe_message:
> Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-0.bz2
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
> action 1: cancel res_drbd_r0_cancel_30000 on server2 (local)
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
> action 56: notify res_drbd_r0_pre_notify_promote_0 on server2 (local)
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_notify_0 (call=56, rc=0, cib-update=0, confirmed=true) ok
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: run_graph: Transition 0
> (Complete=3, Pending=0, Fired=0, Skipped=5, Incomplete=4,
> Source=/var/lib/pacemaker/pengine/pe-input-0.bz2): Stopped
>> Feb 16 18:00:26 server2 pengine[1558]:   notice: unpack_config: On loss of
> CCM Quorum: Ignore
>> Feb 16 18:00:26 server2 pengine[1558]:   notice: LogActions: Promote
> res_drbd_r0:0#011(Slave -> Master server2)
>> Feb 16 18:00:26 server2 pengine[1558]:   notice: process_pe_message:
> Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-1.bz2
>> Feb 16 18:00:26 server2 crmd[1559]:  warning: destroy_action: Cancelling
> timer for action 1 (src=45)
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
> action 55: notify res_drbd_r0_pre_notify_promote_0 on server2 (local)
>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
> pcmk_peer_update: Transitional membership event on ring 16: memb=1, new=0,
> lost=0
>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:773 info:
> pcmk_peer_update: memb: server2 1711385280
>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
> pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1, lost=0
>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] utils.c:358 info:
> update_member: Node 1694608064/server1 is now: member
>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:791 info:
> pcmk_peer_update: NEW:  server1 1694608064
>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
> pcmk_peer_update: MEMB: server1 1694608064
>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
> pcmk_peer_update: MEMB: server2 1711385280
>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:1368 info:
> send_member_notification: Sending membership update 16 to 2 children
>> Feb 16 18:00:26 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1892 A
> processor joined or left the membership and a new membership was formed.
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: plugin_handle_membership:
> Membership 16: quorum acquired
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: crm_update_peer_state:
> plugin_handle_membership: Node server1[1694608064] - state is now member (was
> lost)
>> Feb 16 18:00:26 server2 cib[1554]:   notice: plugin_handle_membership:
> Membership 16: quorum acquired
>> Feb 16 18:00:26 server2 cib[1554]:   notice: crm_update_peer_state:
> plugin_handle_membership: Node server1[1694608064] - state is now member (was
> lost)
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_notify_0 (call=59, rc=0, cib-update=0, confirmed=true) ok
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
> action 14: promote res_drbd_r0_promote_0 on server2 (local)
>> Feb 16 18:00:26 server2 corosync[1549]:   [CPG   ] cpg.c:783 chosen
> downlist: sender r(0) ip(192.170.1.102) ; members(old:1 left:0)
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_promote_0 (call=62, rc=0, cib-update=49, confirmed=true)
> ok
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
> action 56: notify res_drbd_r0_post_notify_promote_0 on server2 (local)
>> Feb 16 18:00:26 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: master-res_drbd_r0 (10000)
>> Feb 16 18:00:26 server2 attrd[1557]:   notice: attrd_perform_update: Sent
> update 51: master-res_drbd_r0=10000
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_notify_0 (call=65, rc=0, cib-update=0, confirmed=true) ok
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
> action 15: monitor res_drbd_r0_monitor_10000 on server2 (local)
>> Feb 16 18:00:26 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: master-res_drbd_r0 (10000)
>> Feb 16 18:00:26 server2 attrd[1557]:   notice: attrd_perform_update: Sent
> update 53: master-res_drbd_r0=10000
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_monitor_10000 (call=68, rc=8, cib-update=50,
> confirmed=false) master
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event:
> server2-res_drbd_r0_monitor_10000:68 [ \n ]
>> Feb 16 18:00:26 server2 crmd[1559]:   notice: run_graph: Transition 1
> (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-1.bz2): Complete
>> Feb 16 18:00:27 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed
> service synchronization, ready to provide service.
>> Feb 16 18:00:28 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1595 A
> processor failed, forming new configuration.
>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
> pcmk_peer_update: Transitional membership event on ring 20: memb=1, new=0,
> lost=1
>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:773 info:
> pcmk_peer_update: memb: server2 1711385280
>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:779 info:
> pcmk_peer_update: lost: server1 1694608064
>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
> pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0, lost=0
>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
> pcmk_peer_update: MEMB: server2 1711385280
>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:731 info:
> ais_mark_unseen_peer_dead: Node server1 was not seen in the previous transition
>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] utils.c:358 info:
> update_member: Node 1694608064/server1 is now: lost
>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:1368 info:
> send_member_notification: Sending membership update 20 to 2 children
>> Feb 16 18:00:29 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1892 A
> processor joined or left the membership and a new membership was formed.
>> Feb 16 18:00:29 server2 cib[1554]:   notice: plugin_handle_membership:
> Membership 20: quorum lost
>> Feb 16 18:00:29 server2 crmd[1559]:   notice: plugin_handle_membership:
> Membership 20: quorum lost
>> Feb 16 18:00:29 server2 cib[1554]:   notice: crm_update_peer_state:
> plugin_handle_membership: Node server1[1694608064] - state is now lost (was
> member)
>> Feb 16 18:00:29 server2 crmd[1559]:  warning: match_down_event: No match for
> shutdown action on server1
>> Feb 16 18:00:29 server2 crmd[1559]:   notice: peer_update_callback:
> Stonith/shutdown of server1 not matched
>> Feb 16 18:00:29 server2 crmd[1559]:   notice: crm_update_peer_state:
> plugin_handle_membership: Node server1[1694608064] - state is now lost (was
> member)
>> Feb 16 18:00:29 server2 crmd[1559]:  warning: match_down_event: No match for
> shutdown action on server1
>> Feb 16 18:00:29 server2 crmd[1559]:   notice: peer_update_callback:
> Stonith/shutdown of server1 not matched
>> Feb 16 18:00:29 server2 corosync[1549]:   [CPG   ] cpg.c:783 chosen
> downlist: sender r(0) ip(192.170.1.102) ; members(old:2 left:1)
>> Feb 16 18:00:29 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed
> service synchronization, ready to provide service.
>> Feb 16 18:00:29 server2 crmd[1559]:   notice: do_election_count_vote:
> Election 4 (current: 2, owner: server1): Processed vote from server1 (Peer is
> not part of our cluster)
>> Feb 16 18:00:29 server2 attrd[1557]:   notice: attrd_local_callback: Sending
> full refresh (origin=crmd)
>> Feb 16 18:00:29 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: master-res_drbd_r0 (10000)
>> Feb 16 18:00:29 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: probe_complete (true)
>> Feb 16 18:00:29 server2 pengine[1558]:   notice: unpack_config: On loss of
> CCM Quorum: Ignore
>> Feb 16 18:00:29 server2 pengine[1558]:   notice: process_pe_message:
> Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-2.bz2
>> Feb 16 18:00:29 server2 crmd[1559]:   notice: run_graph: Transition 2
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-2.bz2): Complete
>> Feb 16 18:00:29 server2 crmd[1559]:   notice: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
> pcmk_peer_update: Transitional membership event on ring 24: memb=1, new=0,
> lost=0
>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:773 info:
> pcmk_peer_update: memb: server2 1711385280
>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
> pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, lost=0
>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] utils.c:358 info:
> update_member: Node 1694608064/server1 is now: member
>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:791 info:
> pcmk_peer_update: NEW:  server1 1694608064
>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
> pcmk_peer_update: MEMB: server1 1694608064
>> Feb 16 18:00:30 server2 crmd[1559]:   notice: plugin_handle_membership:
> Membership 24: quorum acquired
>> Feb 16 18:00:30 server2 cib[1554]:   notice: plugin_handle_membership:
> Membership 24: quorum acquired
>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
> pcmk_peer_update: MEMB: server2 1711385280
>> Feb 16 18:00:30 server2 cib[1554]:   notice: crm_update_peer_state:
> plugin_handle_membership: Node server1[1694608064] - state is now member (was
> lost)
>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:1368 info:
> send_member_notification: Sending membership update 24 to 2 children
>> Feb 16 18:00:30 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1892 A
> processor joined or left the membership and a new membership was formed.
>> Feb 16 18:00:30 server2 crmd[1559]:   notice: crm_update_peer_state:
> plugin_handle_membership: Node server1[1694608064] - state is now member (was
> lost)
>> Feb 16 18:00:30 server2 corosync[1549]:   [CPG   ] cpg.c:783 chosen
> downlist: sender r(0) ip(192.170.1.102) ; members(old:1 left:0)
>> Feb 16 18:00:31 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed
> service synchronization, ready to provide service.
>> Feb 16 18:00:31 server2 attrd[1557]:  warning: attrd_cib_callback: Update 45
> for master-res_drbd_r0=1000 failed: Application of an update diff failed
>> Feb 16 18:00:56 server2 crmd[1559]:  warning: cib_rsc_callback: Resource
> update 49 failed: (rc=-62) Timer expired
>> Feb 16 18:00:56 server2 crmd[1559]:  warning: cib_rsc_callback: Resource
> update 50 failed: (rc=-62) Timer expired
>> Feb 16 18:02:26 server2 attrd[1557]:  warning: attrd_cib_callback: Update 51
> for master-res_drbd_r0=10000 failed: Timer expired
>> Feb 16 18:02:27 server2 crmd[1559]:  warning: crmd_ha_msg_filter: Another DC
> detected: server1 (op=join_offer)
>> Feb 16 18:02:27 server2 crmd[1559]:   notice: do_state_transition: State
> transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
> origin=crmd_ha_msg_filter ]
>> Feb 16 18:02:27 server2 crmd[1559]:   notice: do_state_transition: State
> transition S_ELECTION -> S_RELEASE_DC [ input=I_RELEASE_DC
> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
>> Feb 16 18:02:27 server2 crmd[1559]:   notice: do_state_transition: State
> transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
> origin=do_cl_join_finalize_respond ]
>> Feb 16 18:02:27 server2 attrd[1557]:   notice: attrd_local_callback: Sending
> full refresh (origin=crmd)
>> Feb 16 18:02:27 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: master-res_drbd_r0 (10000)
>> Feb 16 18:02:27 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: probe_complete (true)
>> Feb 16 18:02:27 server2 crmd[1559]:   notice: process_lrm_event:
> server2-res_drbd_r0_monitor_10000:68 [ \n ]
>> Feb 16 18:02:27 server2 crmd[1559]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_notify_0 (call=74, rc=0, cib-update=0, confirmed=true) ok
>> Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_notify_0 (call=77, rc=0, cib-update=0, confirmed=true) ok
>> Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_demote_0 (call=80, rc=0, cib-update=74, confirmed=true) ok
>> Feb 16 18:02:28 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
> flush op to all hosts for: master-res_drbd_r0 (1000)
>> Feb 16 18:02:28 server2 attrd[1557]:   notice: attrd_perform_update: Sent
> update 67: master-res_drbd_r0=1000
>> Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_notify_0 (call=83, rc=0, cib-update=0, confirmed=true) ok
>> Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM
> operation res_drbd_r0_monitor_30000 (call=86, rc=0, cib-update=75,
> confirmed=false) ok
>>
>> [global_common.conf]
>> global {
>>         usage-count no;
>> }
>>
>> common {
>>         protocol C;
>>         handlers {
>>         }
>>         startup {
>>         }
>>         disk {
>>                 no-disk-barrier;
>>                 no-disk-flushes;
>>         }
>>         net {
>>                 max-buffers 8000;
>>                 max-epoch-size 8000;
>>                 after-sb-0pri discard-younger-primary;
>>                 after-sb-1pri discard-secondary;
>>                 after-sb-2pri disconnect;
>>         }
>>         syncer {
>>                 rate 30M;
>>                 verify-alg sha1;
>>         }
>> }

_______________________________________________
Linux-ha-japan mailing list
Linux-ha-japan@lists.sourceforge.jp
http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
Re: On the cause of the DRBD split-brain [ In reply to ]
Yamauchi-san,

This is Akiyama.

Thank you for your reply.

> Incidentally, the load on the primary or the secondary may have been high,
> so it might be worth checking the CPU state with sar or vmstat as well.
> # The load may have spiked at some point due to a cron job or the like.
During the time in question, processing by other applications was under a
heavier load than in other time periods, so I would like to check the system
state with that possibility in mind as well.
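
For reference, this is roughly what I intend to run to pull the figures for
that window out of the sysstat history (assuming sysstat's default daily
files under /var/log/sa/; the sa16 file name and the 17:55-18:05 window are
just placeholders for the incident time):

  # CPU utilization around the incident, from the daily sa file for Feb 16
  sar -u -f /var/log/sa/sa16 -s 17:55:00 -e 18:05:00
  # run-queue length and load averages for the same window
  sar -q -f /var/log/sa/sa16 -s 17:55:00 -e 18:05:00
  # live sampling going forward: run queue, memory and swap every 5 seconds
  vmstat 5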

> What happens if you try increasing the token parameter in corosync.conf?
The token value has been left at its default (1000 ms).
If I do change it, is there a recommended value?
Or, if there is a way to calculate an appropriate value, I would be grateful
for your guidance.
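
For what it is worth, my understanding is that the parameter in question
lives in the totem block of corosync.conf; a minimal sketch of the kind of
change I have in mind (the 10000 ms figure is only a placeholder I have not
validated, not a recommendation):

  totem {
          version: 2
          # default is 1000 ms; a larger value gives the corosync scheduler
          # more headroom before a membership reconfiguration is triggered
          token: 10000
          # (interface/ring settings left unchanged)
  }

My understanding is also that corosync would need to be restarted on both
nodes for the change to take effect.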

2015-02-17 12:58 GMT+09:00 <renayama19661014@ybb.ne.jp>:

> Akiyama-san
>
> This is Yamauchi.
>
> Incidentally, the load on the primary or the secondary may have been high,
> so it might be worth checking the CPU state with sar or vmstat as well.
>
> # The load may have spiked at some point due to a cron job or the like.
>
> That's all.
>
>
> ----- Original Message -----
> > From: "renayama19661014@ybb.ne.jp" <renayama19661014@ybb.ne.jp>
> > To: "linux-ha-japan@lists.sourceforge.jp" <
> linux-ha-japan@lists.sourceforge.jp>
> > Cc:
> > Date: 2015/2/17, Tue 12:31
> > Subject: Re: [Linux-ha-jp] On the cause of the DRBD split-brain
> >
> > Akiyama-san
> >
> > Hello, this is Yamauchi.
> >
> > As far as the primary-side corosync log shows, corosync-layer
> > communication with the secondary was lost and the cluster membership was
> > reconfigured once.
> >
> > What happens if you try increasing the token parameter in corosync.conf?
> >
> >> Feb 16 18:00:27 server1 corosync[1756]: [MAIN ] main.c:1341 Corosync main
> >> process was not scheduled for 3144.5435 ms (threshold is 800.0000 ms).
> >> Consider token timeout increase.
> >
> >
> > Since the log above is present, corosync is reporting that its scheduling
> > was running a little over two seconds behind.
> >
> > Best regards.
> >
> >
> > ----- Original Message -----
> >> From: あきやまじろう <mayama0130@gmail.com>
> >> To: linux-ha-japan@lists.sourceforge.jp
> >> Date: 2015/2/17, Tue 11:38
> >> Subject: [Linux-ha-jp] On the cause of the DRBD split-brain
> >>
> >> (snip)
> confirmed=true)
> > ok
> >> Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: Initiating
> > action 56: notify res_drbd_r0_post_notify_promote_0 on server2 (local)
> >> Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_trigger_update:
> Sending
> > flush op to all hosts for: master-res_drbd_r0 (10000)
> >> Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_perform_update:
> Sent
> > update 51: master-res_drbd_r0=10000
> >> Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM
> > operation res_drbd_r0_notify_0 (call=65, rc=0, cib-update=0,
> confirmed=true) ok
> >> Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: Initiating
> > action 15: monitor res_drbd_r0_monitor_10000 on server2 (local)
> >> Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_trigger_update:
> Sending
> > flush op to all hosts for: master-res_drbd_r0 (10000)
> >> Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_perform_update:
> Sent
> > update 53: master-res_drbd_r0=10000
> >> Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM
> > operation res_drbd_r0_monitor_10000 (call=68, rc=8, cib-update=50,
> > confirmed=false) master
> >> Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event:
> > server2-res_drbd_r0_monitor_10000:68 [ \n ]
> >> Feb 16 18:00:26 server2 crmd[1559]: notice: run_graph: Transition 1
> > (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> > Source=/var/lib/pacemaker/pengine/pe-input-1.bz2): Complete
> >> Feb 16 18:00:27 server2 corosync[1549]: [MAIN ] main.c:278 Completed
> > service synchronization, ready to provide service.
> >> Feb 16 18:00:28 server2 corosync[1549]: [TOTEM ] totemsrp.c:1595 A
> > processor failed, forming new configuration.
> >> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
> > pcmk_peer_update: Transitional membership event on ring 20: memb=1,
> new=0,
> > lost=1
> >> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:773 info:
> > pcmk_peer_update: memb: server2 1711385280
> >> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:779 info:
> > pcmk_peer_update: lost: server1 1694608064
> >> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
> > pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0,
> lost=0
> >> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:808 info:
> > pcmk_peer_update: MEMB: server2 1711385280
> >> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:731 info:
> > ais_mark_unseen_peer_dead: Node server1 was not seen in the previous
> transition
> >> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] utils.c:358 info:
> > update_member: Node 1694608064/server1 is now: lost
> >> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:1368 info:
> > send_member_notification: Sending membership update 20 to 2 children
> >> Feb 16 18:00:29 server2 corosync[1549]: [TOTEM ] totemsrp.c:1892 A
> > processor joined or left the membership and a new membership was formed.
> >> Feb 16 18:00:29 server2 cib[1554]: notice: plugin_handle_membership:
> > Membership 20: quorum lost
> >> Feb 16 18:00:29 server2 crmd[1559]: notice: plugin_handle_membership:
> > Membership 20: quorum lost
> >> Feb 16 18:00:29 server2 cib[1554]: notice: crm_update_peer_state:
> > plugin_handle_membership: Node server1[1694608064] - state is now lost
> (was
> > member)
> >> Feb 16 18:00:29 server2 crmd[1559]: warning: match_down_event: No
> match for
> > shutdown action on server1
> >> Feb 16 18:00:29 server2 crmd[1559]: notice: peer_update_callback:
> > Stonith/shutdown of server1 not matched
> >> Feb 16 18:00:29 server2 crmd[1559]: notice: crm_update_peer_state:
> > plugin_handle_membership: Node server1[1694608064] - state is now lost
> (was
> > member)
> >> Feb 16 18:00:29 server2 crmd[1559]: warning: match_down_event: No
> match for
> > shutdown action on server1
> >> Feb 16 18:00:29 server2 crmd[1559]: notice: peer_update_callback:
> > Stonith/shutdown of server1 not matched
> >> Feb 16 18:00:29 server2 corosync[1549]: [CPG ] cpg.c:783 chosen
> > downlist: sender r(0) ip(192.170.1.102) ; members(old:2 left:1)
> >> Feb 16 18:00:29 server2 corosync[1549]: [MAIN ] main.c:278 Completed
> > service synchronization, ready to provide service.
> >> Feb 16 18:00:29 server2 crmd[1559]: notice: do_election_count_vote:
> > Election 4 (current: 2, owner: server1): Processed vote from server1
> (Peer is
> > not part of our cluster)
> >> Feb 16 18:00:29 server2 attrd[1557]: notice: attrd_local_callback:
> Sending
> > full refresh (origin=crmd)
> >> Feb 16 18:00:29 server2 attrd[1557]: notice: attrd_trigger_update:
> Sending
> > flush op to all hosts for: master-res_drbd_r0 (10000)
> >> Feb 16 18:00:29 server2 attrd[1557]: notice: attrd_trigger_update:
> Sending
> > flush op to all hosts for: probe_complete (true)
> >> Feb 16 18:00:29 server2 pengine[1558]: notice: unpack_config: On loss
> of
> > CCM Quorum: Ignore
> >> Feb 16 18:00:29 server2 pengine[1558]: notice: process_pe_message:
> > Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-2.bz2
> >> Feb 16 18:00:29 server2 crmd[1559]: notice: run_graph: Transition 2
> > (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> > Source=/var/lib/pacemaker/pengine/pe-input-2.bz2): Complete
> >> Feb 16 18:00:29 server2 crmd[1559]: notice: do_state_transition: State
> > transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> > cause=C_FSA_INTERNAL origin=notify_crmd ]
> >> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
> > pcmk_peer_update: Transitional membership event on ring 24: memb=1,
> new=0,
> > lost=0
> >> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:773 info:
> > pcmk_peer_update: memb: server2 1711385280
> >> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:760 notice:
> > pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1,
> lost=0
> >> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] utils.c:358 info:
> > update_member: Node 1694608064/server1 is now: member
> >> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:791 info:
> > pcmk_peer_update: NEW: server1 1694608064
> >> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:808 info:
> > pcmk_peer_update: MEMB: server1 1694608064
> >> Feb 16 18:00:30 server2 crmd[1559]: notice: plugin_handle_membership:
> > Membership 24: quorum acquired
> >> Feb 16 18:00:30 server2 cib[1554]: notice: plugin_handle_membership:
> > Membership 24: quorum acquired
> >> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:808 info:
> > pcmk_peer_update: MEMB: server2 1711385280
> >> Feb 16 18:00:30 server2 cib[1554]: notice: crm_update_peer_state:
> > plugin_handle_membership: Node server1[1694608064] - state is now member
> (was
> > lost)
> >> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:1368 info:
> > send_member_notification: Sending membership update 24 to 2 children
> >> Feb 16 18:00:30 server2 corosync[1549]: [TOTEM ] totemsrp.c:1892 A
> > processor joined or left the membership and a new membership was formed.
> >> Feb 16 18:00:30 server2 crmd[1559]: notice: crm_update_peer_state:
> > plugin_handle_membership: Node server1[1694608064] - state is now member
> (was
> > lost)
> >> Feb 16 18:00:30 server2 corosync[1549]: [CPG ] cpg.c:783 chosen
> > downlist: sender r(0) ip(192.170.1.102) ; members(old:1 left:0)
> >> Feb 16 18:00:31 server2 corosync[1549]: [MAIN ] main.c:278 Completed
> > service synchronization, ready to provide service.
> >> Feb 16 18:00:31 server2 attrd[1557]: warning: attrd_cib_callback:
> Update 45
> > for master-res_drbd_r0=1000 failed: Application of an update diff failed
> >> Feb 16 18:00:56 server2 crmd[1559]: warning: cib_rsc_callback: Resource
> > update 49 failed: (rc=-62) Timer expired
> >> Feb 16 18:00:56 server2 crmd[1559]: warning: cib_rsc_callback: Resource
> > update 50 failed: (rc=-62) Timer expired
> >> Feb 16 18:02:26 server2 attrd[1557]: warning: attrd_cib_callback:
> Update 51
> > for master-res_drbd_r0=10000 failed: Timer expired
> >> Feb 16 18:02:27 server2 crmd[1559]: warning: crmd_ha_msg_filter:
> Another DC
> > detected: server1 (op=join_offer)
> >> Feb 16 18:02:27 server2 crmd[1559]: notice: do_state_transition: State
> > transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
> > origin=crmd_ha_msg_filter ]
> >> Feb 16 18:02:27 server2 crmd[1559]: notice: do_state_transition: State
> > transition S_ELECTION -> S_RELEASE_DC [ input=I_RELEASE_DC
> > cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> >> Feb 16 18:02:27 server2 crmd[1559]: notice: do_state_transition: State
> > transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
> > origin=do_cl_join_finalize_respond ]
> >> Feb 16 18:02:27 server2 attrd[1557]: notice: attrd_local_callback:
> Sending
> > full refresh (origin=crmd)
> >> Feb 16 18:02:27 server2 attrd[1557]: notice: attrd_trigger_update:
> Sending
> > flush op to all hosts for: master-res_drbd_r0 (10000)
> >> Feb 16 18:02:27 server2 attrd[1557]: notice: attrd_trigger_update:
> Sending
> > flush op to all hosts for: probe_complete (true)
> >> Feb 16 18:02:27 server2 crmd[1559]: notice: process_lrm_event:
> > server2-res_drbd_r0_monitor_10000:68 [ \n ]
> >> Feb 16 18:02:27 server2 crmd[1559]: notice: process_lrm_event: LRM
> > operation res_drbd_r0_notify_0 (call=74, rc=0, cib-update=0,
> confirmed=true) ok
> >> Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM
> > operation res_drbd_r0_notify_0 (call=77, rc=0, cib-update=0,
> confirmed=true) ok
> >> Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM
> > operation res_drbd_r0_demote_0 (call=80, rc=0, cib-update=74,
> confirmed=true) ok
> >> Feb 16 18:02:28 server2 attrd[1557]: notice: attrd_trigger_update:
> Sending
> > flush op to all hosts for: master-res_drbd_r0 (1000)
> >> Feb 16 18:02:28 server2 attrd[1557]: notice: attrd_perform_update:
> Sent
> > update 67: master-res_drbd_r0=1000
> >> Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM
> > operation res_drbd_r0_notify_0 (call=83, rc=0, cib-update=0,
> confirmed=true) ok
> >> Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM
> > operation res_drbd_r0_monitor_30000 (call=86, rc=0, cib-update=75,
> > confirmed=false) ok
> >>
> >> 【global_common.conf】
> >> global {
> >> usage-count no;
> >> }
> >>
> >> common {
> >> protocol C;
> >> handlers {
> >> }
> >> startup {
> >> }
> >> disk {
> >> no-disk-barrier;
> >> no-disk-flushes;
> >> }
> >> net {
> >> max-buffers 8000;
> >> max-epoch-size 8000;
> >> after-sb-0pri discard-younger-primary;
> >> after-sb-1pri discard-secondary;
> >> after-sb-2pri disconnect;
> >> }
> >> syncer {
> >> rate 30M;
> >> verify-alg sha1;
> >> }
> >> }
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Linux-ha-japan mailing list
> >> Linux-ha-japan@lists.sourceforge.jp
> >> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
> >>
> >>
> >>
> >
> > _______________________________________________
> > Linux-ha-japan mailing list
> > Linux-ha-japan@lists.sourceforge.jp
> > http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
> >
>
> _______________________________________________
> Linux-ha-japan mailing list
> Linux-ha-japan@lists.sourceforge.jp
> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
>
Re: DRBDスプリットブレインの発生原因に関して [ In reply to ]
Akiyama-san,

Hello, this is Yamauchi.

>The token value was still at the default (1000 ms).
>If we do change it, is there a recommended value?
>Or, if there is a calculation method or the like, your guidance would be appreciated.

As far as I recall, the recommended value for corosync + Pacemaker is the default of 1000 ms.
The right value when changing it depends on your configuration (load conditions and so on), so how about starting by raising it to roughly double?

The drawback of lengthening the token is that failure detection (noticing that the peer is down) takes longer.
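(For reference, the token is set in the totem directive of corosync.conf. The fragment below is only a rough sketch of what "roughly doubling" the default might look like; the 2000 ms figure is an illustrative value, not a verified recommendation for this particular environment, and it assumes the rest of the existing totem settings stay as they are.)

totem {
        # token timeout in milliseconds; corosync's default here is 1000,
        # and this simply doubles it as a first experiment
        token: 2000
        # (other existing totem settings left unchanged)
}

(Both nodes would normally need the same corosync.conf, and corosync would have to be restarted before the new timeout takes effect.)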

Best regards.



----- Original Message -----
>From: あきやまじろう <mayama0130@gmail.com>
>To: linux-ha-japan@lists.sourceforge.jp
>Date: 2015/2/17, Tue 14:08
>Subject: Re: [Linux-ha-jp] DRBDスプリットブレインの発生原因に関して
>
>
>Yamauchi-sama,
>
>This is Akiyama.
>
>Thank you for your reply.
>
>> Incidentally, since the load on the primary or the secondary may also have been high,
>> it might be a good idea to also look at the CPU state with sar or vmstat.
>> #The load may have gone up at some point because of a cron job or the like.
>At that time the processing of other applications was under heavier load than in other time slots,
>so I would like to check the state, including that possibility.
>
>> What happens if you lengthen the token parameter in corosync.conf?
>The token value was still at the default (1000 ms).
>If we do change it, is there a recommended value?
>Or, if there is a calculation method or the like, your guidance would be appreciated.
>
>
>
>2015-02-17 12:58 GMT+09:00 <renayama19661014@ybb.ne.jp>:
>
>Akiyama-san,
>>
>>This is Yamauchi.
>>
>>Incidentally, since the load on the primary or the secondary may also have been high,
>>it might be a good idea to also look at the CPU state with sar or vmstat.
>>
>>#The load may have gone up at some point because of a cron job or the like.
>>
>>That is all.
>>
>>
>>
>>----- Original Message -----
>>> From: "renayama19661014@ybb.ne.jp" <renayama19661014@ybb.ne.jp>
>>> To: "linux-ha-japan@lists.sourceforge.jp" <linux-ha-japan@lists.sourceforge.jp>
>>> Cc:
>>> Date: 2015/2/17, Tue 12:31
>>> Subject: Re: [Linux-ha-jp] DRBDスプリットブレインの発生原因に関して
>>>
>>> Akiyama-san,
>>>
>>> Hello, this is Yamauchi.
>>>
>>> As far as I can tell from the primary's corosync log, the corosync-layer communication with the secondary was cut and the membership was re-formed once.
>>>
>>> What happens if you lengthen the token parameter in corosync.conf?
>>>
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [MAIN  ] main.c:1341 Corosync main
>>> process was not scheduled for 3144.5435 ms (threshold is 800.0000 ms). Consider
>>> token timeout increase.
>>>
>>>
>>> Since the log above was output, it is reporting that corosync's scheduling was delayed by a little over 2 seconds from when it should have run.
>>>
>>> Best regards.
>>>
>>>
>>> ----- Original Message -----
>>>> From: あきやまじろう <mayama0130@gmail.com>
>>>> To: linux-ha-japan@lists.sourceforge.jp
>>>> Date: 2015/2/17, Tue 11:38
>>>> Subject: [Linux-ha-jp] DRBDスプリットブレインの発生原因に関して
>>>>
>>>>
>>>> お世話になります。あきやまと申します。
>>>>
>>>> 下記の環境で運用をしています。
>>>>
>>>> 環境:
>>>> Red Hat Enterprise Linux Server 6.1 (for x86)
>>>> kernel-2.6.32-431.23.3.el6.i686
>>>> corosync-1.4.1-17.el6_5.1.i686
>>>> pacemaker-1.1.10-14.el6_5.3.i686
>>>> drbd-8.3.16-1.el6.i386
>>>>
>>>> 質問内容:
>>>> 運用中に突然DRBDスプリットブレインが発生し、手動で復旧させましたが、
>>>> 発生原因が不明で困っています。
>>>> 発生原因と回避策が分かれば、ご教授願えますでしょうか。
>>>>
>>>> 現在までに判明している事:
>>>> ・DRBDネットワークのping応答は問題なし
>>>> ・messagesログにリンクダウンの出力なし
>>>>
>>>> 【プライマリ側のmessagesログ】
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: sock_recvmsg returned -110
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: peer( Secondary -> Unknown )
>>> conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: short read expecting header on
>>> sock: r=-110
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: new current UUID
>>> 626E92ABB7AC7487:1CC1ED4094EA9EB7:9A0088783FE79A75:99FF88783FE79A75
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: asender terminated
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: Terminating drbd0_asender
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: Connection closed
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: conn( BrokenPipe ->
>>> Unconnected )
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: receiver terminated
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: Restarting drbd0_receiver
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: receiver (re)started
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: conn( Unconnected ->
>>> WFConnection )
>>>> Feb 16 18:00:06 server1 xinetd[1940]: START: shell pid=1056
>>> from=::ffff:192.171.1.102
>>>> Feb 16 18:00:06 server1 rshd[1057]: root@mdn-cluster2 as root: cmd='echo
>>> $(date +%Y/%m/%d\ %H:%M:%S)'
>>>> Feb 16 18:00:06 server1 xinetd[1940]: EXIT: shell status=0 pid=1056
>>> duration=0(sec)
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: Handshake successful: Agreed
>>> network protocol version 97
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFConnection ->
>>> WFReportParams )
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: Starting asender thread (from
>>> drbd0_receiver [3117])
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: data-integrity-alg:
>>> <not-used>
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: meta connection shut down by
>>> peer.
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFReportParams ->
>>> NetworkFailure )
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: asender terminated
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: Terminating drbd0_asender
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: Connection closed
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: conn( NetworkFailure ->
>>> Unconnected )
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: receiver terminated
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: Restarting drbd0_receiver
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: receiver (re)started
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: conn( Unconnected ->
>>> WFConnection )
>>>>
>>>> 【プライマリ側のcorosyncログ】
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [MAIN  ] main.c:1341 Corosync main
>>> process was not scheduled for 3144.5435 ms (threshold is 800.0000 ms). Consider
>>> token timeout increase.
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Transitional membership event on ring 16: memb=1, new=0,
>>> lost=1
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [pcmk  ] plugin.c:773 info:
>>> pcmk_peer_update: memb: server1 1694608064
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [pcmk  ] plugin.c:779 info:
>>> pcmk_peer_update: lost: server2 1711385280
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1, lost=0
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [pcmk  ] plugin.c:791 info:
>>> pcmk_peer_update: NEW:  server2 1711385280
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [pcmk  ] plugin.c:808 info:
>>> pcmk_peer_update: MEMB: server1 1694608064
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [pcmk  ] plugin.c:808 info:
>>> pcmk_peer_update: MEMB: server2 1711385280
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [TOTEM ] totemsrp.c:1892 A
>>> processor joined or left the membership and a new membership was formed.
>>>> Feb 16 18:00:27 server1 corosync[1756]:   [CPG   ] cpg.c:783 chosen
>>> downlist: sender r(0) ip(192.170.1.101) ; members(old:2 left:1)
>>>> Feb 16 18:00:28 server1 corosync[1756]:   [MAIN  ] main.c:1341 Corosync main
>>> process was not scheduled for 826.4143 ms (threshold is 800.0000 ms). Consider
>>> token timeout increase.
>>>> Feb 16 18:00:28 server1 corosync[1756]:   [MAIN  ] main.c:278 Completed
>>> service synchronization, ready to provide service.
>>>> Feb 16 18:00:28 server1 crmd[1766]:  warning: crmd_ha_msg_filter: Another DC
>>> detected: server2 (op=noop)
>>>> Feb 16 18:00:28 server1 crmd[1766]:   notice: do_state_transition: State
>>> transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
>>> origin=crmd_ha_msg_filter ]
>>>> Feb 16 18:00:30 server1 cib[1761]:  warning: cib_server_process_diff: Not
>>> requesting full refresh in R/W mode
>>>> Feb 16 18:00:30 server1 cib[1761]:  warning: cib_server_process_diff: Not
>>> requesting full refresh in R/W mode
>>>> Feb 16 18:00:30 server1 cib[1761]:  warning: cib_server_process_diff: Not
>>> requesting full refresh in R/W mode
>>>> Feb 16 18:00:30 server1 corosync[1756]:   [MAIN  ] main.c:1341 Corosync main
>>> process was not scheduled for 2248.0801 ms (threshold is 800.0000 ms). Consider
>>> token timeout increase.
>>>> Feb 16 18:00:31 server1 corosync[1756]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Transitional membership event on ring 24: memb=1, new=0,
>>> lost=1
>>>> Feb 16 18:00:31 server1 corosync[1756]:   [pcmk  ] plugin.c:773 info:
>>> pcmk_peer_update: memb: server1 1694608064
>>>> Feb 16 18:00:31 server1 corosync[1756]:   [pcmk  ] plugin.c:779 info:
>>> pcmk_peer_update: lost: server2 1711385280
>>>> Feb 16 18:00:31 server1 corosync[1756]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, lost=0
>>>> Feb 16 18:00:31 server1 corosync[1756]:   [pcmk  ] plugin.c:791 info:
>>> pcmk_peer_update: NEW:  server2 1711385280
>>>> Feb 16 18:00:31 server1 corosync[1756]:   [pcmk  ] plugin.c:808 info:
>>> pcmk_peer_update: MEMB: server1 1694608064
>>>> Feb 16 18:00:31 server1 corosync[1756]:   [pcmk  ] plugin.c:808 info:
>>> pcmk_peer_update: MEMB: server2 1711385280
>>>> Feb 16 18:00:31 server1 corosync[1756]:   [TOTEM ] totemsrp.c:1892 A
>>> processor joined or left the membership and a new membership was formed.
>>>> Feb 16 18:00:32 server1 corosync[1756]:   [CPG   ] cpg.c:783 chosen
>>> downlist: sender r(0) ip(192.170.1.101) ; members(old:2 left:1)
>>>> Feb 16 18:00:32 server1 corosync[1756]:   [MAIN  ] main.c:278 Completed
>>> service synchronization, ready to provide service.
>>>> Feb 16 18:00:32 server1 cib[1761]:  warning: cib_server_process_diff: Not
>>> requesting full refresh in R/W mode
>>>> Feb 16 18:00:32 server1 cib[1761]:  warning: cib_server_process_diff: Not
>>> requesting full refresh in R/W mode
>>>> Feb 16 18:00:32 server1 cib[1761]:  warning: cib_server_process_diff: Not
>>> requesting full refresh in R/W mode
>>>> Feb 16 18:02:28 server1 crmd[1766]:    error: crm_timer_popped: Election
>>> Timeout (I_ELECTION_DC) just popped in state S_ELECTION! (120000ms)
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: do_state_transition: State
>>> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
>>> cause=C_TIMER_POPPED origin=crm_timer_popped ]
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: do_state_transition: State
>>> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
>>> cause=C_FSA_INTERNAL origin=do_election_check ]
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: do_dc_join_finalize: join-4:
>>> Syncing the CIB from server2 to the rest of the cluster
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: do_dc_join_finalize: Requested
>>> version   <generation_tuple epoch="7" num_updates="87"
>>> admin_epoch="0" validate-with="pacemaker-1.2"
>>> cib-last-written="Wed Jan 21 00:17:56 2015"
>>> update-origin="server1" update-client="crmd"
>>> crm_feature_set="3.0.7" have-quorum="1"
>>> dc-uuid="server2"/>
>>>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: Diff: --- 0.7.69
>>>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: Diff: +++ 0.7.87
>>> 1afd9af32d3a574358543e1d4751aaf1
>>>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: --     <node_state
>>> crm-debug-origin="do_state_transition" join="member"
>>> id="server1"/>
>>>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: --            
>>> <lrm_rsc_op operation_key="res_drbd_r0_start_0"
>>> operation="start"
>>> transition-key="46:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
>>> transition-magic="0:0;46:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
>>> call-id="44" last-run="1421767134"
>>> last-rc-change="1421767134" exec-time="430"
>>> id="res_drbd_r0_last_0"/>
>>>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: --            
>>> <lrm_rsc_op id="res_drbd_r0_monitor_30000"
>>> operation_key="res_drbd_r0_monitor_30000"
>>> operation="monitor" crm-debug-origin="do_update_resource"
>>> crm_feature_set="3.0.7"
>>> transition-key="47:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
>>> transition-magic="0:0;47:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df"
>>> call-id="50" rc-code="0" op-status="0"
>>> interval="30000" last-rc-change="1421767135"
>>> exec-time="60" queue-time="0"
>>> op-digest="6f5adcd7f1211cdfc17850827b8582c5"/>
>>>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: --          
>>> <nvpair value="1000"
>>> id="status-server2-master-res_drbd_r0"/>
>>>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: ++     <node_state
>>> id="server1" uname="server1" in_ccm="true"
>>> crmd="online" crm-debug-origin="post_cache_update"
>>> join="down" expected="member"/>
>>>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: ++          
>>> <nvpair id="status-server2-master-res_drbd_r0"
>>> name="master-res_drbd_r0" value="10000"/>
>>>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: ++            
>>> <lrm_rsc_op id="res_drbd_r0_last_0"
>>> operation_key="res_drbd_r0_promote_0" operation="promote"
>>> crm-debug-origin="do_update_resource"
>>> crm_feature_set="3.0.7"
>>> transition-key="14:1:0:2172d7d8-fe4a-427d-9311-27a13078a0fb"
>>> transition-magic="0:0;14:1:0:2172d7d8-fe4a-427d-9311-27a13078a0fb"
>>> call-id="62" rc-code="0" op-status="0"
>>> interval="0" last-run="1424077226"
>>> last-rc-change="1424077226" exec-time="120"
>>> queue-time="0"
>>> op-digest="c0e018b73fdf522b6cdd355e125af15e"/>
>>>> Feb 16 18:02:28 server1 cib[1761]:   notice: cib:diff: ++            
>>> <lrm_rsc_op id="res_drbd_r0_monitor_10000"
>>> operation_key="res_drbd_r0_monitor_10000"
>>> operation="monitor" crm-debug-origin="do_update_resource"
>>> crm_feature_set="3.0.7"
>>> transition-key="15:1:8:2172d7d8-fe4a-427d-9311-27a13078a0fb"
>>> transition-magic="0:8;15:1:8:2172d7d8-fe4a-427d-9311-27a13078a0fb"
>>> call-id="68" rc-code="8" op-status="0"
>>> interval="10000" last-rc-change="1424077226"
>>> exec-time="53" queue-time="0"
>>> op-digest="6f5adcd7f1211cdfc17850827b8582c5"/>
>>>> Feb 16 18:02:28 server1 stonith-ng[1762]:  warning: cib_process_diff: Diff
>>> 0.7.69 -> 0.7.87 from local not applied to 0.7.69: Failed application of an
>>> update diff
>>>> Feb 16 18:02:28 server1 stonith-ng[1762]:   notice: update_cib_cache_cb:
>>> [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
>>>> Feb 16 18:02:28 server1 attrd[1764]:   notice: attrd_local_callback: Sending
>>> full refresh (origin=crmd)
>>>> Feb 16 18:02:28 server1 attrd[1764]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: master-res_drbd_r0 (10000)
>>>> Feb 16 18:02:28 server1 attrd[1764]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: probe_complete (true)
>>>> Feb 16 18:02:28 server1 pengine[1765]:   notice: unpack_config: On loss of
>>> CCM Quorum: Ignore
>>>> Feb 16 18:02:28 server1 pengine[1765]:   notice: LogActions: Demote 
>>> res_drbd_r0:1#011(Master -> Slave server2)
>>>> Feb 16 18:02:28 server1 pengine[1765]:   notice: process_pe_message:
>>> Calculated Transition 2582: /var/lib/pacemaker/pengine/pe-input-24.bz2
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: te_rsc_command: Initiating
>>> action 8: cancel res_drbd_r0_cancel_10000 on server2
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: te_rsc_command: Initiating
>>> action 81: notify res_drbd_r0_pre_notify_demote_0 on server1 (local)
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: te_rsc_command: Initiating
>>> action 83: notify res_drbd_r0_pre_notify_demote_0 on server2
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_notify_0 (call=113, rc=0, cib-update=0, confirmed=true) ok
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: run_graph: Transition 2582
>>> (Complete=4, Pending=0, Fired=0, Skipped=4, Incomplete=5,
>>> Source=/var/lib/pacemaker/pengine/pe-input-24.bz2): Stopped
>>>> Feb 16 18:02:28 server1 pengine[1765]:   notice: unpack_config: On loss of
>>> CCM Quorum: Ignore
>>>> Feb 16 18:02:28 server1 pengine[1765]:   notice: LogActions: Demote 
>>> res_drbd_r0:1#011(Master -> Slave server2)
>>>> Feb 16 18:02:28 server1 pengine[1765]:   notice: process_pe_message:
>>> Calculated Transition 2583: /var/lib/pacemaker/pengine/pe-input-25.bz2
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: te_rsc_command: Initiating
>>> action 80: notify res_drbd_r0_pre_notify_demote_0 on server1 (local)
>>>> Feb 16 18:02:28 server1 crmd[1766]:   notice: te_rsc_command: Initiating
>>> action 82: notify res_drbd_r0_pre_notify_demote_0 on server2
>>>> Feb 16 18:02:29 server1 crmd[1766]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_notify_0 (call=116, rc=0, cib-update=0, confirmed=true) ok
>>>> Feb 16 18:02:29 server1 crmd[1766]:   notice: te_rsc_command: Initiating
>>> action 39: demote res_drbd_r0_demote_0 on server2
>>>> Feb 16 18:02:29 server1 crmd[1766]:   notice: te_rsc_command: Initiating
>>> action 81: notify res_drbd_r0_post_notify_demote_0 on server1 (local)
>>>> Feb 16 18:02:29 server1 crmd[1766]:   notice: te_rsc_command: Initiating
>>> action 83: notify res_drbd_r0_post_notify_demote_0 on server2
>>>> Feb 16 18:02:29 server1 crmd[1766]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_notify_0 (call=119, rc=0, cib-update=0, confirmed=true) ok
>>>> Feb 16 18:02:29 server1 crmd[1766]:   notice: run_graph: Transition 2583
>>> (Complete=11, Pending=0, Fired=0, Skipped=1, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-25.bz2): Stopped
>>>> Feb 16 18:02:29 server1 pengine[1765]:   notice: unpack_config: On loss of
>>> CCM Quorum: Ignore
>>>> Feb 16 18:02:29 server1 pengine[1765]:   notice: process_pe_message:
>>> Calculated Transition 2584: /var/lib/pacemaker/pengine/pe-input-26.bz2
>>>> Feb 16 18:02:29 server1 crmd[1766]:   notice: te_rsc_command: Initiating
>>> action 40: monitor res_drbd_r0_monitor_30000 on server2
>>>> Feb 16 18:02:29 server1 crmd[1766]:   notice: run_graph: Transition 2584
>>> (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-26.bz2): Complete
>>>> Feb 16 18:02:29 server1 crmd[1766]:   notice: do_state_transition: State
>>> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>> Feb 16 18:17:29 server1 crmd[1766]:   notice: do_state_transition: State
>>> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
>>> origin=crm_timer_popped ]
>>>> Feb 16 18:17:29 server1 pengine[1765]:   notice: unpack_config: On loss of
>>> CCM Quorum: Ignore
>>>> Feb 16 18:17:29 server1 pengine[1765]:   notice: process_pe_message:
>>> Calculated Transition 2585: /var/lib/pacemaker/pengine/pe-input-27.bz2
>>>> Feb 16 18:17:29 server1 crmd[1766]:   notice: run_graph: Transition 2585
>>> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-27.bz2): Complete
>>>> Feb 16 18:17:29 server1 crmd[1766]:   notice: do_state_transition: State
>>> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>>
>>>> 【セカンダリ側のmessagesログ】
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: sock_recvmsg returned -110
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: peer( Secondary -> Unknown )
>>> conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: short read expecting header on
>>> sock: r=-110
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: new current UUID
>>> 626E92ABB7AC7487:1CC1ED4094EA9EB7:9A0088783FE79A75:99FF88783FE79A75
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: asender terminated
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: Terminating drbd0_asender
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: Connection closed
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: conn( BrokenPipe ->
>>> Unconnected )
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: receiver terminated
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: Restarting drbd0_receiver
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: receiver (re)started
>>>> Feb 16 18:00:05 server1 kernel: block drbd0: conn( Unconnected ->
>>> WFConnection )
>>>> Feb 16 18:00:06 server1 xinetd[1940]: START: shell pid=1056
>>> from=::ffff:192.171.1.102
>>>> Feb 16 18:00:06 server1 rshd[1057]: root@mdn-cluster2 as root: cmd='echo
>>> $(date +%Y/%m/%d\ %H:%M:%S)'
>>>> Feb 16 18:00:06 server1 xinetd[1940]: EXIT: shell status=0 pid=1056
>>> duration=0(sec)
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: Handshake successful: Agreed
>>> network protocol version 97
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFConnection ->
>>> WFReportParams )
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: Starting asender thread (from
>>> drbd0_receiver [3117])
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: data-integrity-alg:
>>> <not-used>
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: meta connection shut down by
>>> peer.
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFReportParams ->
>>> NetworkFailure )
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: asender terminated
>>>> Feb 16 18:00:28 server1 kernel: block drbd0: Terminating drbd0_asender
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: Connection closed
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: conn( NetworkFailure ->
>>> Unconnected )
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: receiver terminated
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: Restarting drbd0_receiver
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: receiver (re)started
>>>> Feb 16 18:00:30 server1 kernel: block drbd0: conn( Unconnected ->
>>> WFConnection )
>>>>
>>>> 【セカンダリ側のcorosyncログ】
>>>> Feb 16 18:00:23 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: master-res_drbd_r0 (1000)
>>>> Feb 16 18:00:23 server2 attrd[1557]:   notice: attrd_perform_update: Sent
>>> update 45: master-res_drbd_r0=1000
>>>> Feb 16 18:00:24 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1595 A
>>> processor failed, forming new configuration.
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Transitional membership event on ring 12: memb=1, new=0,
>>> lost=1
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:773 info:
>>> pcmk_peer_update: memb: server2 1711385280
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:779 info:
>>> pcmk_peer_update: lost: server1 1694608064
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Stable membership event on ring 12: memb=1, new=0, lost=0
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
>>> pcmk_peer_update: MEMB: server2 1711385280
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:731 info:
>>> ais_mark_unseen_peer_dead: Node server1 was not seen in the previous transition
>>>> Feb 16 18:00:25 server2 cib[1554]:   notice: plugin_handle_membership:
>>> Membership 12: quorum lost
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] utils.c:358 info:
>>> update_member: Node 1694608064/server1 is now: lost
>>>> Feb 16 18:00:25 server2 crmd[1559]:   notice: plugin_handle_membership:
>>> Membership 12: quorum lost
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [pcmk  ] plugin.c:1368 info:
>>> send_member_notification: Sending membership update 12 to 2 children
>>>> Feb 16 18:00:25 server2 cib[1554]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node server1[1694608064] - state is now lost (was
>>> member)
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1892 A
>>> processor joined or left the membership and a new membership was formed.
>>>> Feb 16 18:00:25 server2 crmd[1559]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node server1[1694608064] - state is now lost (was
>>> member)
>>>> Feb 16 18:00:25 server2 crmd[1559]:  warning: reap_dead_nodes: Our DC node
>>> (server1) left the cluster
>>>> Feb 16 18:00:25 server2 crmd[1559]:   notice: do_state_transition: State
>>> transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
>>> origin=reap_dead_nodes ]
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [CPG   ] cpg.c:783 chosen
>>> downlist: sender r(0) ip(192.170.1.102) ; members(old:2 left:1)
>>>> Feb 16 18:00:25 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed
>>> service synchronization, ready to provide service.
>>>> Feb 16 18:00:25 server2 crmd[1559]:   notice: do_state_transition: State
>>> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
>>> cause=C_FSA_INTERNAL origin=do_election_check ]
>>>> Feb 16 18:00:25 server2 attrd[1557]:   notice: attrd_local_callback: Sending
>>> full refresh (origin=crmd)
>>>> Feb 16 18:00:25 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: master-res_drbd_r0 (1000)
>>>> Feb 16 18:00:25 server2 attrd[1557]:   notice: attrd_perform_update: Sent
>>> update 47: master-res_drbd_r0=1000
>>>> Feb 16 18:00:25 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: probe_complete (true)
>>>> Feb 16 18:00:26 server2 pengine[1558]:   notice: unpack_config: On loss of
>>> CCM Quorum: Ignore
>>>> Feb 16 18:00:26 server2 pengine[1558]:   notice: LogActions: Promote
>>> res_drbd_r0:0#011(Slave -> Master server2)
>>>> Feb 16 18:00:26 server2 pengine[1558]:   notice: process_pe_message:
>>> Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-0.bz2
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
>>> action 1: cancel res_drbd_r0_cancel_30000 on server2 (local)
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
>>> action 56: notify res_drbd_r0_pre_notify_promote_0 on server2 (local)
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_notify_0 (call=56, rc=0, cib-update=0, confirmed=true) ok
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: run_graph: Transition 0
>>> (Complete=3, Pending=0, Fired=0, Skipped=5, Incomplete=4,
>>> Source=/var/lib/pacemaker/pengine/pe-input-0.bz2): Stopped
>>>> Feb 16 18:00:26 server2 pengine[1558]:   notice: unpack_config: On loss of
>>> CCM Quorum: Ignore
>>>> Feb 16 18:00:26 server2 pengine[1558]:   notice: LogActions: Promote
>>> res_drbd_r0:0#011(Slave -> Master server2)
>>>> Feb 16 18:00:26 server2 pengine[1558]:   notice: process_pe_message:
>>> Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-1.bz2
>>>> Feb 16 18:00:26 server2 crmd[1559]:  warning: destroy_action: Cancelling
>>> timer for action 1 (src=45)
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
>>> action 55: notify res_drbd_r0_pre_notify_promote_0 on server2 (local)
>>>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Transitional membership event on ring 16: memb=1, new=0,
>>> lost=0
>>>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:773 info:
>>> pcmk_peer_update: memb: server2 1711385280
>>>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1, lost=0
>>>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] utils.c:358 info:
>>> update_member: Node 1694608064/server1 is now: member
>>>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:791 info:
>>> pcmk_peer_update: NEW:  server1 1694608064
>>>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
>>> pcmk_peer_update: MEMB: server1 1694608064
>>>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
>>> pcmk_peer_update: MEMB: server2 1711385280
>>>> Feb 16 18:00:26 server2 corosync[1549]:   [pcmk  ] plugin.c:1368 info:
>>> send_member_notification: Sending membership update 16 to 2 children
>>>> Feb 16 18:00:26 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1892 A
>>> processor joined or left the membership and a new membership was formed.
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: plugin_handle_membership:
>>> Membership 16: quorum acquired
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node server1[1694608064] - state is now member (was
>>> lost)
>>>> Feb 16 18:00:26 server2 cib[1554]:   notice: plugin_handle_membership:
>>> Membership 16: quorum acquired
>>>> Feb 16 18:00:26 server2 cib[1554]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node server1[1694608064] - state is now member (was
>>> lost)
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_notify_0 (call=59, rc=0, cib-update=0, confirmed=true) ok
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
>>> action 14: promote res_drbd_r0_promote_0 on server2 (local)
>>>> Feb 16 18:00:26 server2 corosync[1549]:   [CPG   ] cpg.c:783 chosen
>>> downlist: sender r(0) ip(192.170.1.102) ; members(old:1 left:0)
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_promote_0 (call=62, rc=0, cib-update=49, confirmed=true)
>>> ok
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
>>> action 56: notify res_drbd_r0_post_notify_promote_0 on server2 (local)
>>>> Feb 16 18:00:26 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: master-res_drbd_r0 (10000)
>>>> Feb 16 18:00:26 server2 attrd[1557]:   notice: attrd_perform_update: Sent
>>> update 51: master-res_drbd_r0=10000
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_notify_0 (call=65, rc=0, cib-update=0, confirmed=true) ok
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: te_rsc_command: Initiating
>>> action 15: monitor res_drbd_r0_monitor_10000 on server2 (local)
>>>> Feb 16 18:00:26 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: master-res_drbd_r0 (10000)
>>>> Feb 16 18:00:26 server2 attrd[1557]:   notice: attrd_perform_update: Sent
>>> update 53: master-res_drbd_r0=10000
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_monitor_10000 (call=68, rc=8, cib-update=50,
>>> confirmed=false) master
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: process_lrm_event:
>>> server2-res_drbd_r0_monitor_10000:68 [ \n ]
>>>> Feb 16 18:00:26 server2 crmd[1559]:   notice: run_graph: Transition 1
>>> (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-1.bz2): Complete
>>>> Feb 16 18:00:27 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed
>>> service synchronization, ready to provide service.
>>>> Feb 16 18:00:28 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1595 A
>>> processor failed, forming new configuration.
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Transitional membership event on ring 20: memb=1, new=0,
>>> lost=1
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:773 info:
>>> pcmk_peer_update: memb: server2 1711385280
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:779 info:
>>> pcmk_peer_update: lost: server1 1694608064
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0, lost=0
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
>>> pcmk_peer_update: MEMB: server2 1711385280
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:731 info:
>>> ais_mark_unseen_peer_dead: Node server1 was not seen in the previous transition
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] utils.c:358 info:
>>> update_member: Node 1694608064/server1 is now: lost
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [pcmk  ] plugin.c:1368 info:
>>> send_member_notification: Sending membership update 20 to 2 children
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1892 A
>>> processor joined or left the membership and a new membership was formed.
>>>> Feb 16 18:00:29 server2 cib[1554]:   notice: plugin_handle_membership:
>>> Membership 20: quorum lost
>>>> Feb 16 18:00:29 server2 crmd[1559]:   notice: plugin_handle_membership:
>>> Membership 20: quorum lost
>>>> Feb 16 18:00:29 server2 cib[1554]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node server1[1694608064] - state is now lost (was
>>> member)
>>>> Feb 16 18:00:29 server2 crmd[1559]:  warning: match_down_event: No match for
>>> shutdown action on server1
>>>> Feb 16 18:00:29 server2 crmd[1559]:   notice: peer_update_callback:
>>> Stonith/shutdown of server1 not matched
>>>> Feb 16 18:00:29 server2 crmd[1559]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node server1[1694608064] - state is now lost (was
>>> member)
>>>> Feb 16 18:00:29 server2 crmd[1559]:  warning: match_down_event: No match for
>>> shutdown action on server1
>>>> Feb 16 18:00:29 server2 crmd[1559]:   notice: peer_update_callback:
>>> Stonith/shutdown of server1 not matched
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [CPG   ] cpg.c:783 chosen
>>> downlist: sender r(0) ip(192.170.1.102) ; members(old:2 left:1)
>>>> Feb 16 18:00:29 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed
>>> service synchronization, ready to provide service.
>>>> Feb 16 18:00:29 server2 crmd[1559]:   notice: do_election_count_vote:
>>> Election 4 (current: 2, owner: server1): Processed vote from server1 (Peer is
>>> not part of our cluster)
>>>> Feb 16 18:00:29 server2 attrd[1557]:   notice: attrd_local_callback: Sending
>>> full refresh (origin=crmd)
>>>> Feb 16 18:00:29 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: master-res_drbd_r0 (10000)
>>>> Feb 16 18:00:29 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: probe_complete (true)
>>>> Feb 16 18:00:29 server2 pengine[1558]:   notice: unpack_config: On loss of
>>> CCM Quorum: Ignore
>>>> Feb 16 18:00:29 server2 pengine[1558]:   notice: process_pe_message:
>>> Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-2.bz2
>>>> Feb 16 18:00:29 server2 crmd[1559]:   notice: run_graph: Transition 2
>>> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-2.bz2): Complete
>>>> Feb 16 18:00:29 server2 crmd[1559]:   notice: do_state_transition: State
>>> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Transitional membership event on ring 24: memb=1, new=0,
>>> lost=0
>>>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:773 info:
>>> pcmk_peer_update: memb: server2 1711385280
>>>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:760 notice:
>>> pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, lost=0
>>>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] utils.c:358 info:
>>> update_member: Node 1694608064/server1 is now: member
>>>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:791 info:
>>> pcmk_peer_update: NEW:  server1 1694608064
>>>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
>>> pcmk_peer_update: MEMB: server1 1694608064
>>>> Feb 16 18:00:30 server2 crmd[1559]:   notice: plugin_handle_membership:
>>> Membership 24: quorum acquired
>>>> Feb 16 18:00:30 server2 cib[1554]:   notice: plugin_handle_membership:
>>> Membership 24: quorum acquired
>>>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:808 info:
>>> pcmk_peer_update: MEMB: server2 1711385280
>>>> Feb 16 18:00:30 server2 cib[1554]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node server1[1694608064] - state is now member (was
>>> lost)
>>>> Feb 16 18:00:30 server2 corosync[1549]:   [pcmk  ] plugin.c:1368 info:
>>> send_member_notification: Sending membership update 24 to 2 children
>>>> Feb 16 18:00:30 server2 corosync[1549]:   [TOTEM ] totemsrp.c:1892 A
>>> processor joined or left the membership and a new membership was formed.
>>>> Feb 16 18:00:30 server2 crmd[1559]:   notice: crm_update_peer_state:
>>> plugin_handle_membership: Node server1[1694608064] - state is now member (was
>>> lost)
>>>> Feb 16 18:00:30 server2 corosync[1549]:   [CPG   ] cpg.c:783 chosen
>>> downlist: sender r(0) ip(192.170.1.102) ; members(old:1 left:0)
>>>> Feb 16 18:00:31 server2 corosync[1549]:   [MAIN  ] main.c:278 Completed
>>> service synchronization, ready to provide service.
>>>> Feb 16 18:00:31 server2 attrd[1557]:  warning: attrd_cib_callback: Update 45
>>> for master-res_drbd_r0=1000 failed: Application of an update diff failed
>>>> Feb 16 18:00:56 server2 crmd[1559]:  warning: cib_rsc_callback: Resource
>>> update 49 failed: (rc=-62) Timer expired
>>>> Feb 16 18:00:56 server2 crmd[1559]:  warning: cib_rsc_callback: Resource
>>> update 50 failed: (rc=-62) Timer expired
>>>> Feb 16 18:02:26 server2 attrd[1557]:  warning: attrd_cib_callback: Update 51
>>> for master-res_drbd_r0=10000 failed: Timer expired
>>>> Feb 16 18:02:27 server2 crmd[1559]:  warning: crmd_ha_msg_filter: Another DC
>>> detected: server1 (op=join_offer)
>>>> Feb 16 18:02:27 server2 crmd[1559]:   notice: do_state_transition: State
>>> transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
>>> origin=crmd_ha_msg_filter ]
>>>> Feb 16 18:02:27 server2 crmd[1559]:   notice: do_state_transition: State
>>> transition S_ELECTION -> S_RELEASE_DC [ input=I_RELEASE_DC
>>> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
>>>> Feb 16 18:02:27 server2 crmd[1559]:   notice: do_state_transition: State
>>> transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
>>> origin=do_cl_join_finalize_respond ]
>>>> Feb 16 18:02:27 server2 attrd[1557]:   notice: attrd_local_callback: Sending
>>> full refresh (origin=crmd)
>>>> Feb 16 18:02:27 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: master-res_drbd_r0 (10000)
>>>> Feb 16 18:02:27 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: probe_complete (true)
>>>> Feb 16 18:02:27 server2 crmd[1559]:   notice: process_lrm_event:
>>> server2-res_drbd_r0_monitor_10000:68 [ \n ]
>>>> Feb 16 18:02:27 server2 crmd[1559]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_notify_0 (call=74, rc=0, cib-update=0, confirmed=true) ok
>>>> Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_notify_0 (call=77, rc=0, cib-update=0, confirmed=true) ok
>>>> Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_demote_0 (call=80, rc=0, cib-update=74, confirmed=true) ok
>>>> Feb 16 18:02:28 server2 attrd[1557]:   notice: attrd_trigger_update: Sending
>>> flush op to all hosts for: master-res_drbd_r0 (1000)
>>>> Feb 16 18:02:28 server2 attrd[1557]:   notice: attrd_perform_update: Sent
>>> update 67: master-res_drbd_r0=1000
>>>> Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_notify_0 (call=83, rc=0, cib-update=0, confirmed=true) ok
>>>> Feb 16 18:02:28 server2 crmd[1559]:   notice: process_lrm_event: LRM
>>> operation res_drbd_r0_monitor_30000 (call=86, rc=0, cib-update=75,
>>> confirmed=false) ok
>>>>
>>>> 【global_common.conf】
>>>> global {
>>>>         usage-count no;
>>>> }
>>>>
>>>> common {
>>>>         protocol C;
>>>>         handlers {
>>>>         }
>>>>         startup {
>>>>         }
>>>>         disk {
>>>>                 no-disk-barrier;
>>>>                 no-disk-flushes;
>>>>         }
>>>>         net {
>>>>                 max-buffers 8000;
>>>>                 max-epoch-size 8000;
>>>>                 after-sb-0pri discard-younger-primary;
>>>>                 after-sb-1pri discard-secondary;
>>>>                 after-sb-2pri disconnect;
>>>>         }
>>>>         syncer {
>>>>                 rate 30M;
>>>>                 verify-alg sha1;
>>>>         }
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Linux-ha-japan mailing list
>>>> Linux-ha-japan@lists.sourceforge.jp
>>>> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Linux-ha-japan mailing list
>>> Linux-ha-japan@lists.sourceforge.jp
>>> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
>>>
>>
>>_______________________________________________
>>Linux-ha-japan mailing list
>>Linux-ha-japan@lists.sourceforge.jp
>>http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
>>
>
>_______________________________________________
>Linux-ha-japan mailing list
>Linux-ha-japan@lists.sourceforge.jp
>http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
>
>
>

_______________________________________________
Linux-ha-japan mailing list
Linux-ha-japan@lists.sourceforge.jp
http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
Re: DRBDスプリットブレインの発生原因に関して [ In reply to ]
Yamauchi-sama,

This is Akiyama.

Thank you for your answer.

> As far as I recall, the recommended value for corosync + Pacemaker is the default of 1000 ms.
> The right value when changing it depends on your configuration (load conditions and so on), so how about starting by raising it to roughly double?
>
> The drawback of lengthening the token is that failure detection (noticing that the peer is down) takes longer.
Based on the above, I would like to also look into changing the token value.
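(For reference, regarding the sar/vmstat load check suggested earlier in the thread, something along the following lines could be used to look back at the CPU state around 18:00 on Feb 16. The sa16 file name and the time window are only assumptions based on the default sysstat layout and the timestamps in the logs above.)

# CPU utilization recorded by sysstat around the time of the disconnection
sar -u -f /var/log/sa/sa16 -s 17:55:00 -e 18:05:00

# live one-second samples of run queue, memory, swap and CPU
vmstat 1 10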




2015-02-17 14:35 GMT+09:00 <renayama19661014@ybb.ne.jp>:

>
> Akiyama-san,
>
> Hello, this is Yamauchi.
>
> >The token value was still at the default (1000 ms).
> >If we do change it, is there a recommended value?
> >Or, if there is a calculation method or the like, your guidance would be appreciated.
>
> As far as I recall, the recommended value for corosync + Pacemaker is the default of 1000 ms.
> The right value when changing it depends on your configuration (load conditions and so on), so how about starting by raising it to roughly double?
>
> The drawback of lengthening the token is that failure detection (noticing that the peer is down) takes longer.
>
> Best regards.
>
>
>
> ----- Original Message -----
> >From: あきやまじろう <mayama0130@gmail.com>
> >To: linux-ha-japan@lists.sourceforge.jp
> >Date: 2015/2/17, Tue 14:08
> >Subject: Re: [Linux-ha-jp] DRBDスプリットブレインの発生原因に関して
> >
> >
> >山内様
> >
> >あきやまです。
> >
> >ご返信ありがとうございます。
> >
> >> ちなみに、プライマリ、セカンダリの負荷が高い可能性もありますので、
> >> sarやvmstatでcpuの状態も見てみた方がよいかも知れません。
> >> #ある時に、何かのcronジョブなどで負荷があがったかも知れません。
> >当該時間他のアプリケーションの処理が他の時間帯に比べて負荷が高かったので、
> >その可能性も含め、状態の確認をしてみたいと思います。
> >
> >> corosync.confのtokenパラメータを伸ばしてみるとどうなりますでしょうか?
> >token値はデフォルト値(1000ms)のままでした。
> >変更する場合ですが、推奨値はあるのでしょうか?
> >もしくは算出方法等あれば、ご教授頂けると助かります。
> >
> >
> >
> >2015-02-17 12:58 GMT+09:00 <renayama19661014@ybb.ne.jp>:
> >
> >Akiyama-san,
> >>
> >>This is Yamauchi.
> >>
> >>Incidentally, since the load on the primary or the secondary may have been high,
> >>it might be a good idea to look at the CPU state with sar or vmstat as well.
> >>
> >># Some cron job or the like may have raised the load at that moment.
> >>
> >>That is all.
> >>
> >>
> >>
> >>----- Original Message -----
> >>> From: "renayama19661014@ybb.ne.jp" <renayama19661014@ybb.ne.jp>
> >>> To: "linux-ha-japan@lists.sourceforge.jp" <
> linux-ha-japan@lists.sourceforge.jp>
> >>> Cc:
> >>> Date: 2015/2/17, Tue 12:31
> >>> Subject: Re: [Linux-ha-jp] Regarding the cause of the DRBD split-brain occurrence
> >>>
> >>> Akiyama-san,
> >>>
> >>> Hello, this is Yamauchi.
> >>>
> >>> As far as I can tell from the corosync log on the primary, the corosync-layer
> >>> communication with the secondary was lost and the membership was reformed once.
> >>>
> >>> What happens if you try lengthening the token parameter in corosync.conf?
> >>>
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [MAIN ] main.c:1341 Corosync main process was not scheduled for 3144.5435 ms (threshold is 800.0000 ms). Consider token timeout increase.
> >>>
> >>> Since the above log message was output, corosync's main process is reported as roughly
> >>> three seconds late in being scheduled, well beyond the default 1000 ms token timeout,
> >>> which would explain why the other node formed a new membership even though the link
> >>> itself never went down.
> >>>
> >>> Best regards.
>
> _______________________________________________
> Linux-ha-japan mailing list
> Linux-ha-japan@lists.sourceforge.jp
> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
>