Hi,
I have a problem with a 2-node cluster running Heartbeat 3.0.3 on
Debian 6.0.3. I set up both nodes as VMware virtual machines. To test
failover behaviour I disable eth0 and eth1. Both NICs are configured in
Heartbeat: eth0 with multicast and eth1 with broadcast. In most cases,
when I reactivate the NICs, crm_mon shows both nodes running again after
a few seconds, but sometimes it does not. In that case the ha-log shows
the following messages:
Nov 14 12:05:21 debian60-clnode2 heartbeat: [32565]: info: Link
debian60-clnode1:eth1 up.
Nov 14 12:05:21 debian60-clnode2 heartbeat: [32565]: CRIT: Cluster node
debian60-clnode1 returning after partition.
Nov 14 12:05:21 debian60-clnode2 heartbeat: [32565]: info: For
information on cluster partitions, See URL:
http://linux-ha.org/wiki/Split_Brain
Nov 14 12:05:21 debian60-clnode2 heartbeat: [32565]: WARN: Deadtime
value may be too small.
Nov 14 12:05:21 debian60-clnode2 heartbeat: [32565]: info: See FAQ for
information on tuning deadtime.
Nov 14 12:05:21 debian60-clnode2 heartbeat: [32565]: info: URL:
http://linux-ha.org/wiki/FAQ#Heavy_Load
Nov 14 12:05:21 debian60-clnode2 heartbeat: [32565]: WARN: Late
heartbeat: Node debian60-clnode1: interval 234500 ms
Nov 14 12:05:21 debian60-clnode2 heartbeat: [32565]: info: Status update
for node debian60-clnode1: status active
Nov 14 12:05:21 debian60-clnode2 crmd: [32584]: notice:
crmd_ha_status_callback: Status update: Node debian60-clnode1 now has
status [active] (DC=true)
Nov 14 12:05:21 debian60-clnode2 crmd: [32584]: info:
crm_update_peer_proc: debian60-clnode1.ais is now online
Nov 14 12:05:21 debian60-clnode2 cib: [32580]: WARN: cib_peer_callback:
Discarding cib_apply_diff message (1518) from debian60-clnode1: not in
our membership
Nov 14 12:05:22 debian60-clnode2 ccm: [32579]: info: Break tie for 2
nodes cluster
Nov 14 12:05:22 debian60-clnode2 crmd: [32584]: info: mem_handle_event:
Got an event OC_EV_MS_INVALID from ccm
Nov 14 12:05:22 debian60-clnode2 crmd: [32584]: info: mem_handle_event:
no mbr_track info
Nov 14 12:05:22 debian60-clnode2 crmd: [32584]: info: mem_handle_event:
Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Nov 14 12:05:22 debian60-clnode2 crmd: [32584]: info: mem_handle_event:
instance=514, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
Nov 14 12:05:22 debian60-clnode2 crmd: [32584]: info:
crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP
(id=514)
Nov 14 12:05:22 debian60-clnode2 crmd: [32584]: info: ccm_event_detail:
NEW MEMBERSHIP: trans=514, nodes=1, new=0, lost=0 n_idx=0, new_idx=1,
old_idx=3
Nov 14 12:05:22 debian60-clnode2 crmd: [32584]: info:
ccm_event_detail: CURRENT: debian60-clnode2 [nodeid=1, born=514]
Nov 14 12:05:22 debian60-clnode2 crmd: [32584]: info:
populate_cib_nodes_ha: Requesting the list of configured nodes
Nov 14 12:05:22 debian60-clnode2 cib: [32580]: info: mem_handle_event:
Got an event OC_EV_MS_INVALID from ccm
Nov 14 12:05:22 debian60-clnode2 cib: [32580]: info: mem_handle_event:
no mbr_track info
Nov 14 12:05:22 debian60-clnode2 cib: [32580]: info: mem_handle_event:
Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Nov 14 12:05:22 debian60-clnode2 cib: [32580]: info: mem_handle_event:
instance=514, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
Nov 14 12:05:22 debian60-clnode2 cib: [32580]: info:
cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=514)
Nov 14 12:05:22 debian60-clnode2 ccm: [32579]: WARN: ccm_state_joined:
dropping message of type CCM_TYPE_PROTOVERSION_RESP. Is this a
Byzantine failure?
Nov 14 12:05:22 debian60-clnode2 cib: [32580]: info:
cib_process_request: Operation complete: op cib_modify for section nodes
(origin=local/crmd/1497, version=0.1131.11): ok (rc=0)
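A minimal sketch for spotting this state in a log, assuming the ha-log format shown above. The names `summarize` and `expected_nodes` are made up for this illustration and are not part of Heartbeat. It reports late heartbeats and any membership event that lists fewer nodes than expected (here `nodes=1` in a 2-node cluster):

```python
import re

# Matches the membership summary that crmd logs in ccm_event_detail.
MEMBERSHIP_RE = re.compile(
    r"ccm_event_detail:\s+NEW MEMBERSHIP:\s+trans=(\d+),\s+nodes=(\d+)"
)
# Matches heartbeat's late-heartbeat warning; captures node name and gap in ms.
LATE_RE = re.compile(
    r"Late\s+heartbeat:\s+Node\s+(\S+):\s+interval\s+(\d+)\s+ms"
)


def summarize(log_text, expected_nodes=2):
    """Return (late heartbeats, memberships with fewer nodes than expected)."""
    # Collapse line wraps so entries split across lines still match.
    flat = " ".join(log_text.split())
    late = [(node, int(ms)) for node, ms in LATE_RE.findall(flat)]
    partial = [
        (int(trans), int(nodes))
        for trans, nodes in MEMBERSHIP_RE.findall(flat)
        if int(nodes) < expected_nodes
    ]
    return late, partial


if __name__ == "__main__":
    sample = """
    Nov 14 12:05:21 debian60-clnode2 heartbeat: [32565]: WARN: Late
    heartbeat: Node debian60-clnode1: interval 234500 ms
    Nov 14 12:05:22 debian60-clnode2 crmd: [32584]: info: ccm_event_detail:
    NEW MEMBERSHIP: trans=514, nodes=1, new=0, lost=0 n_idx=0, new_idx=1,
    old_idx=3
    """
    late, partial = summarize(sample)
    print("late heartbeats:", late)
    print("partial memberships:", partial)
```

Run against the excerpt above, this would flag the 234500 ms gap for debian60-clnode1 and membership 514, which contains only debian60-clnode2 even though the link to the peer is up again.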
The only way to get the cluster back into a consistent state is to
restart heartbeat on one node.
I tested different downtimes, and the behaviour does not seem to depend
on the length of the downtime.
Is this the expected behaviour of Heartbeat?
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/