I've done some more testing of the meta-data code and think I've
uncovered a bug. It's reproducible. I can cause it to happen with
heartbeat controlling DRBD or by controlling DRBD by hand. (The
example below is without HB.)
We can get into a situation where a file created on one node is not
quick-synced to its partner. DRBD always selects the correct node to
be primary, but it seems it doesn't do the quick-sync. My two systems
are called "cuda1" and "cuda2":
1) Cuda1: start DRBD, make it primary, mount fs.
2) Full sync from Cuda1 to Cuda2.
-- [root@cuda1 /root]# cat /proc/drbd ; dmeta
-- version: 0.6.1-pre3 (api:58/proto:58)
--
-- 0: cs:Connected st:Primary/Secondary ns:4097800 nr:0 dw:164 dr:4097460 pe:0 ua:0
-- device | Consistent | HumanCnt | ConnectedCnt | ArbitraryCnt | lastState
-- drbd0 1 1 27 11 primary
3) Cuda2: stop DRBD.
-- [root@cuda2 /root]# /etc/rc.d/init.d/drbd stop
On Cuda1 we have:
-- [root@cuda1 /root]# cat /proc/drbd ; dmeta
-- version: 0.6.1-pre3 (api:58/proto:58)
--
-- 0: cs:WFConnection st:Primary/Unknown ns:4097800 nr:0 dw:164 dr:4097460 pe:0 ua:0
-- device | Consistent | HumanCnt | ConnectedCnt | ArbitraryCnt | lastState
-- drbd0 1 1 28 11 primary
4) Cuda1: Create file in shared partition.
-- [root@cuda1 /root]# echo "testing" > /bas/data/testfile
-- [root@cuda1 /root]# ls -l /bas/data/testfile
-- -rw-r--r-- 1 root root 8 Oct 15 15:20 /bas/data/testfile
5) Cuda1: umount fs, make DRBD secondary, stop DRBD.
-- [root@cuda1 /root]# umount /bas/data
-- [root@cuda1 /root]# drbdsetup /dev/nb0 secondary
-- [root@cuda1 /root]# /etc/rc.d/init.d/drbd stop
-- [root@cuda1 /root]#
-- [root@cuda1 /root]# cat /proc/drbd ; dmeta
-- cat: /proc/drbd: No such file or directory
-- device | Consistent | HumanCnt | ConnectedCnt | ArbitraryCnt | lastState
-- drbd0 1 1 28 11 secondary
-- [root@cuda1 /root]#
6) Start DRBD on both nodes. (Note: DRBD always selects the correct
node to be primary)
-- [root@cuda1 /root]# /etc/rc.d/init.d/drbd start
-- Setting up drbd0...[ OK ]
-- Do you want to abort waiting for other server and make this one primary?
-- /dev/nb0 is Connected,Primaryno
-- [root@cuda1 /root]# cat /proc/drbd ; dmeta
-- version: 0.6.1-pre3 (api:58/proto:58)
--
-- 0: cs:Connected st:Primary/Secondary ns:0 nr:0 dw:0 dr:0 pe:0 ua:0
-- device | Consistent | HumanCnt | ConnectedCnt | ArbitraryCnt | lastState
-- drbd0 1 1 28 12 primary
7) Cuda1: DRBD is already primary, mount the fs. Observe the file
created in step 4 is present.
-- [root@cuda1 /root]# mount /bas/data
-- [root@cuda1 /root]# ls -l /bas/data/testfile
-- -rw-r--r-- 1 root root 8 Oct 15 15:20 /bas/data/testfile
8) Cuda1: umount fs, make DRBD secondary.
-- [root@cuda1 /root]# umount /bas/data
-- [root@cuda1 /root]# drbdsetup /dev/nb0 secondary
-- [root@cuda1 /root]# cat /proc/drbd ; dmeta
-- version: 0.6.1-pre3 (api:58/proto:58)
--
-- 0: cs:Connected st:Secondary/Secondary ns:176 nr:0 dw:44 dr:4 pe:0 ua:0
-- device | Consistent | HumanCnt | ConnectedCnt | ArbitraryCnt | lastState
-- drbd0 1 1 28 12 secondary
9) Cuda2: make DRBD primary, mount fs. Observe the file created in
step 4 is missing.
-- [root@cuda2 /root]# drbdsetup /dev/nb0 primary
-- [root@cuda2 /root]# mount /bas/data
-- [root@cuda2 /root]# ls -l /bas/data/testfile
-- ls: /bas/data/testfile: No such file or directory
-- [root@cuda2 /root]# cat /proc/drbd ; dmeta
-- version: 0.6.1-pre3 (api:58/proto:58)
--
-- 0: cs:Connected st:Primary/Secondary ns:16 nr:44 dw:48 dr:80 pe:0 ua:0
-- device | Consistent | HumanCnt | ConnectedCnt | ArbitraryCnt | lastState
-- drbd0 1 1 29 12 primary
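For anyone who wants to check the same thing on their own pair, here is a minimal Python sketch that pulls the counters out of a /proc/drbd device line. It assumes the line format shown in the output above and assumes ns/nr are the network send/receive counters; the function names are my own and not part of DRBD.

```python
import re

def parse_drbd_status(line):
    """Parse one device line from /proc/drbd into a dict (hypothetical helper)."""
    # Collect every key:value pair; the leading "0:" device index has no
    # value attached to it, so it is skipped.
    fields = dict(re.findall(r"(\w+):(\S+)", line))
    # Numeric counters become ints; cs and st stay as strings.
    return {k: int(v) if v.isdigit() else v for k, v in fields.items()}

def saw_network_traffic(status):
    """True if anything was sent or received since the module was loaded."""
    return status["ns"] > 0 or status["nr"] > 0

# Device line from step 6 above, right after both nodes reconnected.
step6 = parse_drbd_status(
    "0: cs:Connected st:Primary/Secondary ns:0 nr:0 dw:0 dr:0 pe:0 ua:0"
)
print(step6["cs"], step6["st"], saw_network_traffic(step6))
```

If that reading of the counters is right, the step 6 line shows the nodes connected with nothing sent from cuda1, which would be consistent with the quick-sync never running.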
Has anyone else seen this problem?
--------------------------------------------------------------------------
System configuration follows:
Two RH 6.1 systems.
DRBD: version: 0.6.1-pre3 (api:58/proto:58).
reiserfs over DRBD.
Heartbeat: version 0.4.9.0d.
drbd.conf:
resource drbd0 {
  protocol=B
  fsckcmd=fsck -p -y
  disk {
    do-panic
    disk-size=2048728
  }
  net {
    sync-rate=5000
    skip-sync
    tl-size=256
    timeout=60
    connect-int=10
    ping-int=10
  }
  on cuda1 {
    device=/dev/nb0
    disk=/dev/hda5
    address=192.0.2.1
    port=7788
  }
  on cuda2 {
    device=/dev/nb0
    disk=/dev/hda5
    address=192.0.2.2
    port=7788
  }
}
--
Tony Willoughby tonyw@example.com
"That's where Japanzees live."
-My four year old describing Japan.