Mailing List Archive

strange drbd problems
Hi all,

I am having some strange problems with drbd. I have two PII 600+ boxes
running Debian with Linux kernel 2.2.19 and drbd 0.5.8, connected via a
dedicated 100 MBit link.
The initial setup of my five drbd devices together with heartbeat was no
problem. Syncing after boot works fine too. But after the system has been
running for a while, perhaps after one or two failover tests, the
problems start. Suddenly one or more of the devices lose the
connection.

debug.log:
drbd4: Connection lost (pc=0,uc=0)
drbd4: receiver exiting

(nothing else)
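
For anyone who wants to tally these events per device, here is a minimal
sketch in Python. It assumes log lines shaped exactly like the excerpt
above; the function name count_lost_connections is only illustrative, and
the pc/uc values are matched but not interpreted.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt: "drbd4: Connection lost (pc=0,uc=0)"
LOST = re.compile(r"^(drbd\d+): Connection lost \(pc=(\d+),uc=(\d+)\)")

def count_lost_connections(log_text):
    """Return a Counter mapping device name to its number of 'Connection lost' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOST.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

sample = """drbd4: Connection lost (pc=0,uc=0)
drbd4: receiver exiting
"""
print(count_lost_connections(sample))
```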

If I try to reestablish the connection, this sometimes works, but more
often some of the other devices lose the connection too.
After this happens, I mostly get "Timeout" messages in /proc/drbd. It is
then not possible to restart drbd. I can't umount the devices (the umount
command freezes), and sometimes even rebooting the box is not possible.

I thought there could be a problem with the network link between
the two boxes, so I tried two brand-new 3Com NICs, unfortunately without
luck.

Any ideas?
Thanks very much!!
David


my drbd.conf file...

# Comment lines.
#


resource drbd0 {

  protocol=B
  fsckcmd=fsck -p -y

  disk {
    do-panic
    disk-size=1028097
  }

  net {
    tl-size=256
    sync-rate=2000
    timeout=60
    connect-int=10
    ping-int=10
  }

  on ha-node01 {
    device=/dev/nb0
    disk=/dev/hdb5
    address=10.10.10.5
    port=7788
  }

  on ha-node02 {
    device=/dev/nb0
    disk=/dev/sdb5
    address=10.10.10.10
    port=7788
  }
}

resource drbd1 {

  protocol=B
  fsckcmd=fsck -p -y

  disk {
    do-panic
    disk-size=1542208
  }

  net {
    tl-size=256
    sync-rate=2000
    timeout=60
    connect-int=10
    ping-int=10
  }

  on ha-node01 {
    device=/dev/nb1
    disk=/dev/hdb6
    address=10.10.10.5
    port=7789
  }

  on ha-node02 {
    device=/dev/nb1
    disk=/dev/sdb6
    address=10.10.10.10
    port=7789
  }
}

resource drbd2 {

  protocol=B
  fsckcmd=fsck -p -y

  disk {
    do-panic
    disk-size=514048
  }

  net {
    tl-size=256
    sync-rate=2000
    timeout=60
    connect-int=10
    ping-int=10
  }

  on ha-node01 {
    device=/dev/nb2
    disk=/dev/hdb7
    address=10.10.10.5
    port=7790
  }

  on ha-node02 {
    device=/dev/nb2
    disk=/dev/sdb7
    address=10.10.10.10
    port=7790
  }
}

resource drbd3 {

  protocol=B
  fsckcmd=fsck -p -y

  disk {
    do-panic
    disk-size=208813
  }

  net {
    tl-size=256
    sync-rate=2000
    timeout=60
    connect-int=10
    ping-int=10
  }

  on ha-node01 {
    device=/dev/nb3
    disk=/dev/hdb8
    address=10.10.10.5
    port=7791
  }

  on ha-node02 {
    device=/dev/nb3
    disk=/dev/sdb8
    address=10.10.10.10
    port=7791
  }
}

resource drbd4 {

  protocol=B
  fsckcmd=/bin/true

  disk {
    do-panic
    disk-size=14337981
  }

  net {
    tl-size=256
    sync-rate=2000
    timeout=60
    connect-int=10
    ping-int=10
  }

  on ha-node01 {
    device=/dev/nb4
    disk=/dev/hdb9
    address=10.10.10.5
    port=7792
  }

  on ha-node02 {
    device=/dev/nb4
    disk=/dev/sdb9
    address=10.10.10.10
    port=7792
  }
}
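
With five near-identical resource blocks, a copy-and-paste slip such as
two resources sharing a port is easy to miss. The following is a rough
sketch in Python of a sanity check for that, assuming the key=value layout
shown in the file above. It is not a real drbd.conf parser, and the
helper names ports_by_resource and find_port_conflicts are made up for
illustration.

```python
import re

def ports_by_resource(conf_text):
    """Map each resource name to the set of port values found in its block."""
    ports = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        header = re.match(r"resource\s+(\S+)\s*\{", line)
        if header:
            current = header.group(1)
            ports[current] = set()
            continue
        setting = re.match(r"port\s*=\s*(\d+)$", line)
        if setting and current is not None:
            ports[current].add(int(setting.group(1)))
    return ports

def find_port_conflicts(ports):
    """Return the ports that are used by more than one resource."""
    owners = {}
    for resource, used in ports.items():
        for port in used:
            owners.setdefault(port, []).append(resource)
    return {port: names for port, names in owners.items() if len(names) > 1}

# Abbreviated excerpt of the configuration above, for demonstration only.
sample_conf = """
resource drbd0 {
  on ha-node01 {
    port=7788
  }
  on ha-node02 {
    port=7788
  }
}
resource drbd1 {
  on ha-node01 {
    port=7789
  }
  on ha-node02 {
    port=7789
  }
}
"""
print(find_port_conflicts(ports_by_resource(sample_conf)))
```

For the configuration as posted, each resource uses its own port (7788
through 7792), so a check like this would report no conflicts.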

Re: strange drbd problems
1) If you get timeouts in normal operation, you should consider increasing
the timeout value. Try 10 or 15 seconds, e.g. timeout=150 (the value is
given in tenths of a second).

2) The whole box becoming unstable after some timeouts is something
I have experienced myself once, but I have no idea where to look
for this bug.
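
Since the same timeout line appears in all five net sections, point 1
can be applied in one pass. Below is a minimal sketch in Python. It
assumes the timeout value is expressed in tenths of a second (inferred
from 15 seconds corresponding to timeout=150) and the helper name
set_timeout is hypothetical.

```python
import re

# Matches a whole "timeout=<number>" line, keeping its indentation and spacing.
TIMEOUT_LINE = re.compile(r"(?m)^([ \t]*timeout[ \t]*=[ \t]*)\d+[ \t]*$")

def set_timeout(conf_text, seconds):
    """Rewrite every timeout= line; assumes the unit is tenths of a second."""
    value = int(round(seconds * 10))
    return TIMEOUT_LINE.sub(lambda m: m.group(1) + str(value), conf_text)

# Abbreviated net section from the posted configuration.
net_section = "net {\ntl-size=256\nsync-rate=2000\ntimeout=60\nconnect-int=10\nping-int=10\n}\n"
print(set_timeout(net_section, 15))
```

Only lines that begin with timeout are rewritten, so connect-int and
ping-int are left as they are.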

* David Hoeckel <dhoeckel@example.com> [010814 12:19]:
> Hi all,
>
> I am having some strange problems with drbd. I have two PII 600+ boxes
> running Debian with Linux kernel 2.2.19 and drbd 0.5.8, connected via a
> dedicated 100 MBit link.
> The initial setup of my five drbd devices together with heartbeat was no
> problem. Syncing after boot works fine too. But after the system has been
> running for a while, perhaps after one or two failover tests, the
> problems start. Suddenly one or more of the devices lose the
> connection.
>
> debug.log:
> drbd4: Connection lost (pc=0,uc=0)
> drbd4: receiver exiting
>
> (nothing else)
>
> If I try to reestablish the connection, this sometimes works, but more
> often some of the other devices lose the connection too.
> After this happens, I mostly get "Timeout" messages in /proc/drbd. It is
> then not possible to restart drbd. I can't umount the devices (the umount
> command freezes), and sometimes even rebooting the box is not possible.
>
> I thought there could be a problem with the network link between
> the two boxes, so I tried two brand-new 3Com NICs, unfortunately without
> luck.
>
> Any ideas?
> Thanks very much!!
> David