Hi *!
We're currently testing drbd in Hamburg for a fail-over
configuration of an IMAP server with two drbd mirrors,
one for the IMAP data and a second for LDAP.
Later there will probably also be a third partition for
the mail spool.
We have two nodes with identical hardware: ordinary test
PCs, each with an AMD 450 MHz CPU, 128 MB RAM, an IDE
hard disk, and two network cards.
(Later we will use two IBM Netfinities, each with a ~70 GB
RAID 5 array for the data and a RAID 1 array for the
system. All drbd volumes will be on the RAID 5 array.)
The config is:
MASTER_NODE="imas1"
SLAVE_NODE="imas2"
MASTER_IF="192.168.1.1:7701"
SLAVE_IF="192.168.1.2:7701"
OPTIONS="-r 2000 -p"
PROTOCOL="B"
MASTER_DEVICE="/dev/nb1"
SLAVE_DEVICE="/dev/nb1"
MASTER_PARTITION="/dev/hda6"
SLAVE_PARTITION="/dev/hda6"
MASTER_FSCK="fsck -p -y"
SLAVE_FSCK="fsck -p -y"
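As an aside, the host:port pairs in MASTER_IF/SLAVE_IF can be split
with plain POSIX parameter expansion, which is handy for small wrapper
scripts around the runlevel script. A hypothetical sanity-check helper
(not part of drbd itself):

```shell
#!/bin/sh
# Hypothetical helper around the config above; not part of drbd.
MASTER_IF="192.168.1.1:7701"

# Split the "ip:port" pair with POSIX parameter expansion.
HOST=${MASTER_IF%:*}    # everything before the last ':'
PORT=${MASTER_IF##*:}   # everything after the last ':'
echo "peer host: $HOST, port: $PORT"

# Before benchmarking, one could then verify the replication link, e.g.:
#   ping -c 3 "$HOST"
#   cat /proc/drbd
```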
imas1:~ # df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda3 1.9G 464M 1.4G 25% /
/dev/hda1 22M 2.1M 19M 10% /boot
/dev/nb1 6.5G 20k 6.1G 0% /imap
imas1:~ # mount
/dev/hda3 on / type ext2 (rw)
proc on /proc type proc (rw)
/dev/hda1 on /boot type ext2 (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=0620)
/dev/nb1 on /imap type ext2 (rw,sync)
and similarly for /dev/nb0 (/dev/hda5). I've tried to
stress it a little with the bonnie benchmark.
If I start bonnie on only one partition (/imap here),
it works very well:
imas1:~ # sync
imas1:~ # bonnie -d /imap -s 500 -m imas1-imap -y
File '/imap/Bonnie.429', size: 524288000, volumes: 1
Writing with putc()... done: 354 kB/s 15.4 %CPU
Rewriting... done: 2500 kB/s 41.2 %CPU
Writing intelligently...done: 355 kB/s 6.8 %CPU
Reading with getc()... done: 2836 kB/s 98.6 %CPU
Reading intelligently...done: 7721 kB/s 98.1 %CPU
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
              ---Sequential Output (sync)----- ---Sequential Input-- --Rnd Seek-
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
imas1- 1* 500   354 15.4   355  6.8  2500 41.2  2836 98.6  7721 98.1  99.2  6.7
imas1:~ #
imas1:~ # bonnie -d /imap -s 1500 -m imas1-imap -y
File '/imap/Bonnie.531', size: 1572864000, volumes: 1
Writing with putc()... done: 352 kB/s 14.3 %CPU
Rewriting... done: 2495 kB/s 40.7 %CPU
Writing intelligently...done: 355 kB/s 5.3 %CPU
Reading with getc()... done: 2833 kB/s 98.2 %CPU
Reading intelligently...done: 7664 kB/s 97.4 %CPU
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
              ---Sequential Output (sync)----- ---Sequential Input-- --Rnd Seek-
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
imas1- 1*1500   352 14.3   355  5.3  2495 40.7  2833 98.2  7664 97.4  78.5  4.5
imas1:~ # cat /proc/drbd
version : 56
0: cs:Connected st:Secondary/Primary ns:0 nr:17 dw:17 dr:0 of:0
1: cs:Connected st:Primary/Secondary ns:2700153 nr:0 dw:2702340 dr:1596318 of:0
But if I also start bonnie on the second node on the /data
partition (/dev/nb0) at the same time, I get a complete
crash. The IDE and network LEDs are all on, but the PC is
really dead (with the sync option one node was dead):
without sync options (mount,hdparm):
imas1:~ # mount
/dev/hda3 on / type ext2 (rw)
proc on /proc type proc (rw)
/dev/hda1 on /boot type ext2 (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=0620)
/dev/nb1 on /imap type ext2 (rw)
imas1:~ # cd /imap/
imas1:/imap # bonnie -d . -html -m imas1-imap -s 1000 -y
File './Bonnie.343', size: 1048576000, volumes: 1
Writing with putc()... done: 1618 kB/s 50.9 %CPU
Rewriting...Read from remote host test01.hh.suse.de: No route to host
Connection to test01.hh.suse.de closed.
imas2:~ # mount
/dev/hda3 on / type ext2 (rw)
proc on /proc type proc (rw)
/dev/hda1 on /boot type ext2 (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=0620)
/dev/nb0 on /data type ext2 (rw)
imas2:~ # cd /data/
imas2:/data # bonnie -d . -html -m imas2-data -s 1000 -y
File './Bonnie.344', size: 1048576000, volumes: 1
Writing with putc()... done: 1680 kB/s 51.4 %CPU
Rewriting...Read from remote host test02.hh.suse.de: No route to host
Connection to test02.hh.suse.de closed.
and with sync options (mount,hdparm):
imas1:/ # mount
/dev/hda3 on / type ext2 (rw,sync)
proc on /proc type proc (rw)
/dev/hda1 on /boot type ext2 (rw,sync)
devpts on /dev/pts type devpts (rw,gid=5,mode=0620)
/dev/nb1 on /imap type ext2 (rw,sync)
imas1:/ # bonnie -d /imap -html -m imas1-imap -y
Bonnie: Warning: You have 127MB RAM, but you test with only 100MB
datasize!
Bonnie: This might yield unrealistically good results,
Bonnie: for reading and seeking.
File '/imap/Bonnie.612', size: 104857600, volumes: 1
Writing with putc()... done: 115 kB/s 3.6 %CPU
Rewriting...Read from remote host test01.hh.suse.de: Connection timed out
Connection to test01.hh.suse.de closed.
imas2:~ # mount
/dev/hda3 on / type ext2 (rw,sync)
proc on /proc type proc (rw)
/dev/hda1 on /boot type ext2 (rw,sync)
devpts on /dev/pts type devpts (rw,gid=5,mode=0620)
/dev/nb0 on /data type ext2 (rw,sync)
imas2:~ # bonnie -d /data -html -m imas2-data -y
Bonnie: Warning: You have 127MB RAM, but you test with only 100MB
datasize!
Bonnie: This might yield unrealistically good results,
Bonnie: for reading and seeking.
File '/data/Bonnie.702', size: 104857600, volumes: 1
Writing with putc()... done: 114 kB/s 3.6 %CPU
Rewriting... done: 2185 kB/s 35.1 %CPU
Writing intelligently...done: 385 kB/s 1.7 %CPU
Reading with getc()... done: 3258 kB/s 98.3 %CPU
Reading intelligently...done: 15224 kB/s 95.2 %CPU
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
<TR><TD>imas2-data</TD><TD>100 * 1</TD><TD>114</TD><TD>
3.6</TD><TD>385</TD><TD>
1.7</TD><TD>2185</TD><TD>35.1</TD><TD>3258</TD><TD>98.3</TD>
<TD>15224</TD><TD>95.2</TD><TD>1055.8</TD><TD>24.3</TD></TR>
Aug 29 01:17:47 imas2 kernel: drbd0: ack timeout detected!
Aug 29 01:17:47 imas2 kernel: drbd : timeout detected! (pid=2)
Aug 29 01:17:47 imas2 kernel: drbd0: send timed out!! (pid=2)
Aug 29 01:17:47 imas2 kernel: drbd1: sock_sendmsg returned -32
Aug 29 01:17:47 imas2 kernel: drbd1: sock_recvmsg returned -104
Aug 29 01:18:49 imas2 kernel: drbd0: ack timeout detected!
Aug 29 01:19:15 imas2 kernel: drbd0: ack timeout detected!
Aug 29 01:20:06 imas2 last message repeated 2 times
Aug 29 01:20:32 imas2 kernel: drbd0: ack timeout detected!
My first guess is that this happens because the operations
on the IDE disks need too much CPU time, so drbd can't keep
the volume in sync... is that right?
Or does it happen because of a "misconfiguration"?
Does anybody have an idea what I can do / "tune" here?
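To check the CPU-starvation theory, one could sample the load average
and the drbd connection state while bonnie runs on both nodes. A
minimal sketch, assuming a Linux /proc; the 1-second interval and
three samples are arbitrary choices:

```shell
#!/bin/sh
# Sketch: sample system load and drbd connection state while a benchmark
# runs, to see whether the node is starved before the ack timeouts appear.
i=0
while [ "$i" -lt 3 ]; do
    cut -d' ' -f1-3 /proc/loadavg        # 1/5/15-minute load averages
    if [ -r /proc/drbd ]; then
        grep 'cs:' /proc/drbd            # connection state per device
    fi
    sleep 1
    i=$((i + 1))
done
```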
BTW: At http://www.suse.de/~mt/drbd/ you'll find my
first SuSE-aware RPMs (with a patched runlevel script).
I'll check them into our distribution build system later,
if I get feedback and this is desired :-)
Kind regards,
Marius Tomaschewski <mt@example.com>
--
SuSE GmbH, Hamburg --- SuSE Labs, Product Development
PGP public key available: http://www.suse.de/~mt/mt.pgp
Fprint: EA 1F 92 75 1A F9 82 07 A1 28 DE 7A 32 E8 97 18