Mailing List Archive: 2 Nodes split brain, distant sites

2 Nodes split brain, distant sites

Feb 27, 2014, 6:42 AM

Post #1 of 10 (4852 views)

Hello,

Before starting, my first language is French so I'll try to do my best to explain my problem in English.

1) The situation :

I have 2 servers on 2 distant site.

I need to run openvpn with the same configuration on the 2 servers.
But it must run only on one server at a time.

I want that it start on the second server when the connection with internet is lost on the first node.

I use debian with corosync and pacemaker.

Here is the config :

A) Corosync.conf :
compatibility: whitetank
totem {
version: 2
token: 3000
token_retransmits_before_loss_const: 10
join: 240
consensus: 3600
vsftype: none
max_messages: 20
clear_node_high_bit: yes
secauth: off
threads: 0
nodeid: 1111
rrp_mode: none
interface {
member {
memberaddr: 172.16.135.9
}
member {
memberaddr: 172.16.64.248
}
ringnumber: 0
bindnetaddr: 172.16.135.9
mcastport: 5405
}
transport: udpu
}
amf {
mode: disabled
}
service {
ver: 0
name: pacemaker
}
aisexec {
user: root
group: root
}
logging {
fileline: off
to_stderr: yes
to_logfile: yes
logfile: /var/log/corosync/corosync.log
to_syslog: yes
syslog_facility: daemon
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
tags: enter|leave|trace1|trace2|trace3|trace4|trace6
}
}

B) Pacemaker :
node controle-col
node vpn-air
primitive ClusterMon ocf:pacemaker:ClusterMon \
params user="root" update="30" extra_options="-E /root/PacemakerMailScript.sh -h /tmp/ClusterMon.html" \
op monitor on-fail="restart" interval="60"
primitive openvpn lsb:openvpn \
op monitor interval="30s"
primitive p_ping ocf:pacemaker:ping \
params host_list="8.8.8.8 4.2.2.2" multiplier="100" dampen="5s" \
op monitor interval="60" timeout="60" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
clone ClusterMon-clone ClusterMon
clone c_ping p_ping
location OpenVpnCluster openvpn \
rule $id="OpenVpnCluster-rule" -inf: not_defined pingd or pingd lte 0
location PrefVpnAir openvpn \
rule $id="PrefVpnAir-rule" 50: #uname eq vpn-air
property $id="cib-bootstrap-options" \
dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"

C) Running good crm_mon
============
Last updated: Thu Feb 27 14:54:31 2014
Last change: Wed Jan 15 12:51:35 2014 via crmd on controle-col
Stack: openais
Current DC: controle-col - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
5 Resources configured.
============

Online: [ vpn-air controle-col ]

Clone Set: c_ping [p_ping]
Started: [ controle-col vpn-air ]
openvpn (lsb:openvpn): Started vpn-air
Clone Set: ClusterMon-clone [ClusterMon]
Started: [ controle-col vpn-air ]

2) My problem :

When there is a network problem :

Ex :
a) first-node site lost internet connection ( and communication with second-node at same time due to vpn on internet connection )
b) cluster stop openvpn on first node and launch it on second due to primitive p_ping in config.
c) connection come back on first-node site
d) Problem : first-node and second-node don't bring back cluster, the don't see each other and create a cluster on each node -> split brain I think.
e) Each node has openvpn running which shouldn't happen

I don't have stonith running because I think without quorum it will be problematic
Is there a way to say to corosync to recreate a ring ?

Or have someone another solution ?

Thanks

Tribolet Thomas
ISSeP (Institut Scientifique de Service Public)
th.tribolet@issep.be<mailto:th.tribolet@issep.be>
+32 (0) 4229 83 46

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems