Mailing List Archive

drbd1: Connection lost. (not coming back)
Hi everybody,

I'm running drbd 0.6.1-pre7 with 3 drbd devices syncing over one crossover
ethernet interface and I have experienced one rather strange problem.

The whole cluster works just fine, SyncAll, SyncQuick, everything.

Just _sometimes_ one device (drbd1) loses its connection. As far as I
understand this should be no problem and drbd should reconnect as
specified by "connect-int" in the config file.

When starting heartbeat, I sometimes get:

Nov 27 11:00:36 server-194 heartbeat: info: Running /etc/ha.d/resource.d/datadisk drbd1 start
Nov 27 11:00:36 server-194 kernel: drbd1: Connection lost.(pc=0,uc=0)

Now the state of drbd1 on the first node is primary/unknown and the state on
the second node is "Standalone/unknown".

My drbd.conf values are like this:

net {
sync-rate=5000
#skip-sync
tl-size=256
timeout=60
connect-int=10
ping-int=10
}

and that's the same for all 3 drbd devices.

Anything I can tune in the config file to make the reconnect work?

Cheers,
Kai Groshert
RE: drbd1: Connection lost. (not coming back)
>Just _sometimes_ one device (drbd1) loses its connection. As far as I
>understand this should be no problem and drbd should reconnect as
>specified by "connect-int" in the config file.
>When starting heartbeat, I sometimes get:
>Nov 27 11:00:36 server-194 heartbeat: info: Running /etc/ha.d/resource.d/datadisk drbd1 start
>Nov 27 11:00:36 server-194 kernel: drbd1: Connection lost.(pc=0,uc=0)
>Now the state of drbd1 on the first node is primary/unknown and the state on
>the second node is "Standalone/unknown".

I see the same thing. If you then run drbdsetup /dev/nbX show, you'll see
that all network parameters for the device are lost.
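
To check this by hand, you can look for the device's port number in the show
output, which is the same test the script further down relies on. This is only
a minimal sketch: has_net_config is a hypothetical helper name, not part of
drbd, and it assumes the port is printed only while the network settings are
still configured.

```shell
#!/bin/sh
# Hypothetical helper (not part of drbd): reads `drbdsetup <device> show`
# output on stdin and succeeds if the given port number appears in it.
# Assumption: the port shows up only while the net settings are present.
has_net_config() {
    grep -q -- "$1"
}

# Example call, left commented out because it needs a configured device:
#   drbdsetup /dev/nb1 show | has_net_config 7789 || echo "nb1: net config lost"
```

The helper only inspects text, so it can be tried on saved show output before
pointing it at a live device.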

So I use the following workaround:

I placed an entry in /etc/inittab (SuSE) for a script that polls
drbdsetup's show output and, if the network parameters are no longer set,
reconfigures them.

Example:

/etc/inittab
------------
dr:23:respawn:sh -c /root/drbdnet
------------

(start in runlevels 2 and 3, and restart the script (respawn) if it ends,
which it never should do :O)

script
------------
#!/bin/sh

while true ; do

  # only act while the drbd module is loaded
  if [ $(lsmod | grep drbd | wc -l) != "0" ] ; then

    # nb0: if its port is gone from the show output, set the net parameters again
    if [ $(drbdsetup /dev/nb0 show | grep 7788 | wc -l) = "0" ] ; then
      drbdsetup /dev/nb0 net 192.168.1.1:7788 192.168.1.2:7788 \
        B --sync-rate 4M -t 200 -c 25 -i 2
      logger netconfig drbd0 started 2 reconfigure network parameters
    fi

    # nb1: same check for the second device
    if [ $(drbdsetup /dev/nb1 show | grep 7789 | wc -l) = "0" ] ; then
      drbdsetup /dev/nb1 net 192.168.2.1:7789 192.168.2.2:7789 \
        B --sync-rate 4M -t 200 -c 25 -i 2
      logger netconfig drbd1 started 2 reconfigure network parameters
    fi

  fi

sleep 60

done

exit 0
------------

chmod 700 /root/drbdnet
chown root.root /root/drbdnet
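
With more than two devices, the per-device blocks can be folded into one loop.
The following is only a sketch of that idea, not a tested replacement: the
addresses, ports and drbdsetup options are copied from the script above, while
the DEVICES list, the check_devices function and the DRBDSETUP override are
hypothetical names introduced here for illustration. The lsmod check and the
surrounding "while true / sleep 60" loop are left out and would still be
needed.

```shell
#!/bin/sh
# Sketch only. DRBDSETUP is a hypothetical override so the logic can be
# tried with a stub command instead of the real drbdsetup.
DRBDSETUP=${DRBDSETUP:-drbdsetup}

# One line per device: device, local addr:port, peer addr:port, port to look for
DEVICES='/dev/nb0 192.168.1.1:7788 192.168.1.2:7788 7788
/dev/nb1 192.168.2.1:7789 192.168.2.2:7789 7789'

check_devices() {
    echo "$DEVICES" | while read -r dev laddr raddr port; do
        # If the port is missing from the show output, assume the net
        # settings were lost and apply them again.
        if ! "$DRBDSETUP" "$dev" show </dev/null | grep -q -- "$port"; then
            "$DRBDSETUP" "$dev" net "$laddr" "$raddr" \
                B --sync-rate 4M -t 200 -c 25 -i 2 </dev/null
            logger "netconfig: reconfigured network parameters for $dev"
        fi
    done
}
```

Calling check_devices once per pass of the polling loop would then cover every
device in the list, and adding a third device means adding one line to DEVICES.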

Michael Appeldorn