Hi all,
I'm having some strange problems with drbd. I have two PII 600+ boxes
running Debian with Linux kernel 2.2.19 and drbd 0.5.8, connected via a
dedicated 100 MBit link.
Setting up my five drbd devices together with heartbeat was no problem,
and syncing after boot works fine too. But after the system has been
running for a while, perhaps after one or two failover tests, the
problems start: suddenly one or more of the devices lose the
connection.
debug.log:
drbd4: Connection lost (pc=0,uc=0)
drbd4: receiver exiting
(nothing else)
If I try to reestablish the connection, it sometimes works, but more
often some of the other devices lose their connection too.
After this happens, I mostly get "Timeout" messages in /proc/drbd. It is
not possible to restart drbd. I can't umount the devices (the umount
command freezes), and sometimes even rebooting the box is not possible.
I thought there might be a problem with the network link between the
two boxes, so I tried two brand-new 3Com NICs. Unfortunately, without
luck.
Any ideas?
Thanks very much!!
David
My drbd.conf file:
# Comment lines.
#
resource drbd0 {
  protocol=B
  fsckcmd=fsck -p -y
  disk {
    do-panic
    disk-size=1028097
  }
  net {
    tl-size=256
    sync-rate=2000
    timeout=60
    connect-int=10
    ping-int=10
  }
  on ha-node01 {
    device=/dev/nb0
    disk=/dev/hdb5
    address=10.10.10.5
    port=7788
  }
  on ha-node02 {
    device=/dev/nb0
    disk=/dev/sdb5
    address=10.10.10.10
    port=7788
  }
}
resource drbd1 {
  protocol=B
  fsckcmd=fsck -p -y
  disk {
    do-panic
    disk-size=1542208
  }
  net {
    tl-size=256
    sync-rate=2000
    timeout=60
    connect-int=10
    ping-int=10
  }
  on ha-node01 {
    device=/dev/nb1
    disk=/dev/hdb6
    address=10.10.10.5
    port=7789
  }
  on ha-node02 {
    device=/dev/nb1
    disk=/dev/sdb6
    address=10.10.10.10
    port=7789
  }
}
resource drbd2 {
  protocol=B
  fsckcmd=fsck -p -y
  disk {
    do-panic
    disk-size=514048
  }
  net {
    tl-size=256
    sync-rate=2000
    timeout=60
    connect-int=10
    ping-int=10
  }
  on ha-node01 {
    device=/dev/nb2
    disk=/dev/hdb7
    address=10.10.10.5
    port=7790
  }
  on ha-node02 {
    device=/dev/nb2
    disk=/dev/sdb7
    address=10.10.10.10
    port=7790
  }
}
resource drbd3 {
  protocol=B
  fsckcmd=fsck -p -y
  disk {
    do-panic
    disk-size=208813
  }
  net {
    tl-size=256
    sync-rate=2000
    timeout=60
    connect-int=10
    ping-int=10
  }
  on ha-node01 {
    device=/dev/nb3
    disk=/dev/hdb8
    address=10.10.10.5
    port=7791
  }
  on ha-node02 {
    device=/dev/nb3
    disk=/dev/sdb8
    address=10.10.10.10
    port=7791
  }
}
resource drbd4 {
  protocol=B
  fsckcmd=/bin/true
  disk {
    do-panic
    disk-size=14337981
  }
  net {
    tl-size=256
    sync-rate=2000
    timeout=60
    connect-int=10
    ping-int=10
  }
  on ha-node01 {
    device=/dev/nb4
    disk=/dev/hdb9
    address=10.10.10.5
    port=7792
  }
  on ha-node02 {
    device=/dev/nb4
    disk=/dev/sdb9
    address=10.10.10.10
    port=7792
  }
}
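
As a sanity check on a config like the one above, a small script along these lines could confirm that no two resources share a device, disk, or port on the same host. This is only a rough sketch: the helper names (parse_resources, find_duplicates) are made up for illustration, nothing here is part of drbd, and the parser only understands the simple key=value layout shown in this post. The SAMPLE text is a shortened copy of the first two resources.

```python
import re
from collections import Counter

# Shortened copy of two resources from the config above (assumption:
# only the "on <host> { ... }" sections matter for this check).
SAMPLE = """
resource drbd0 {
  protocol=B
  on ha-node01 {
    device=/dev/nb0
    disk=/dev/hdb5
    address=10.10.10.5
    port=7788
  }
  on ha-node02 {
    device=/dev/nb0
    disk=/dev/sdb5
    address=10.10.10.10
    port=7788
  }
}
resource drbd1 {
  protocol=B
  on ha-node01 {
    device=/dev/nb1
    disk=/dev/hdb6
    address=10.10.10.5
    port=7789
  }
  on ha-node02 {
    device=/dev/nb1
    disk=/dev/sdb6
    address=10.10.10.10
    port=7789
  }
}
"""


def parse_resources(text):
    """Return {resource: {host: {key: value}}} for the per-host sections."""
    resources = {}
    resource = host = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = re.match(r"resource\s+(\S+)\s*\{$", line)
        if m:
            resource = m.group(1)
            resources[resource] = {}
            continue
        m = re.match(r"on\s+(\S+)\s*\{$", line)
        if m and resource is not None:
            host = m.group(1)
            resources[resource][host] = {}
            continue
        if line == "}":
            host = None  # any closing brace ends the current host section
            continue
        if host is not None and "=" in line:
            key, value = line.split("=", 1)
            resources[resource][host][key.strip()] = value.strip()
    return resources


def find_duplicates(resources, key):
    """List (host, value) pairs where `key` is reused by several resources."""
    counts = Counter(
        (host, settings[key])
        for hosts in resources.values()
        for host, settings in hosts.items()
        if key in settings
    )
    return sorted(pair for pair, n in counts.items() if n > 1)


parsed = parse_resources(SAMPLE)
for key in ("device", "disk", "port"):
    print(key, find_duplicates(parsed, key))
```

An empty list for each key means every resource has its own device, backing disk, and TCP port on each node. That rules out one class of configuration mistakes, but says nothing about why the connections drop.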