
DRBD 9: UpToDate, but verify reports the resource to be almost completely out of sync; however, no resynchronisation on disconnect/connect
Hi all,

In our three-node, single-primary DRBD cluster we experienced the following situation.



Check state:

root@host101010:/home/admins# drbdadm status vmXXXX
vmXXXX role:Primary
  disk:UpToDate
  host101011 role:Secondary
    peer-disk:UpToDate
  host101020 role:Secondary
    peer-disk:UpToDate

Everything looks nice.



Let's perform a verification:

root@host101010:/home/admins# drbdadm verify vmXXXX

While it is running:

root@host101010:/home/admins# drbdadm status vmXXXX
vmXXXX role:Primary
  disk:UpToDate
  host101011 role:Secondary
    replication:VerifyS peer-disk:UpToDate done:0.09
  host101020 role:Secondary
    replication:VerifyS peer-disk:UpToDate done:0.11
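
For reference, the done: percentage can simply be polled while the verify runs, e.g. with a plain watch loop:

watch -n 10 'drbdadm status vmXXXX'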



The results are remarkable:

[Wed May 5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: Online verify done (total 7112 sec; paused 0 sec; 29484 K/sec)
[Wed May 5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: Online verify found 1 4k blocks out of sync!
[Wed May 5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: repl( VerifyS -> Established )
[Wed May 5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: helper command: /sbin/drbdadm out-of-sync
[Wed May 5 11:07:55 2021] drbd vmXXXX/0 drbdXXXX0 host101011: helper command: /sbin/drbdadm out-of-sync exit code 0
[Wed May 5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: Online verify done (total 7302 sec; paused 0 sec; 28720 K/sec)
[Wed May 5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: Online verify found 41790818 4k blocks out of sync!
[Wed May 5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: repl( VerifyS -> Established )
[Wed May 5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: helper command: /sbin/drbdadm out-of-sync
[Wed May 5 11:11:05 2021] drbd vmXXXX/0 drbdXXXX0 host101020: helper command: /sbin/drbdadm out-of-sync exit code 0

Verify claims almost the whole 150G device to be out of sync towards host101020 (41790818 4k blocks), while only a single 4k block is flagged towards host101011.
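
For completeness, the counter that verify leaves behind can also be read back from the statistics output instead of the kernel log; a sketch (the --verbose --statistics flags and the per-peer out-of-sync: field are from drbdsetup's status output, value in KiB if I am not mistaken, i.e. 41790818 blocks x 4 KiB here):

drbdsetup status vmXXXX --verbose --statistics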


Let's perform a disconnect/connect.
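In drbdadm terms this is just the usual cycle for the whole resource (a sketch, nothing exotic):

drbdadm disconnect vmXXXX
drbdadm connect vmXXXX

The kernel log during the cycle: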

[Wed May 5 11:44:30 2021] drbd vmXXXX: Preparing cluster-wide state change 1977682193 (0->1 496/16)
[Wed May 5 11:44:30 2021] drbd vmXXXX: State change 1977682193: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFA
[Wed May 5 11:44:30 2021] drbd vmXXXX: Committing cluster-wide state change 1977682193 (0ms)
[Wed May 5 11:44:30 2021] drbd vmXXXX host101011: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed May 5 11:44:30 2021] drbd vmXXXX/0 drbdXXXX0 host101011: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed May 5 11:44:30 2021] drbd vmXXXX host101011: ack_receiver terminated
[Wed May 5 11:44:30 2021] drbd vmXXXX host101011: Terminating ack_recv thread
[Wed May 5 11:44:30 2021] drbd vmXXXX host101011: Restarting sender thread
[Wed May 5 11:44:30 2021] drbd vmXXXX host101011: Connection closed
[Wed May 5 11:44:30 2021] drbd vmXXXX host101011: helper command: /sbin/drbdadm disconnected
[Wed May 5 11:44:30 2021] drbd vmXXXX host101011: helper command: /sbin/drbdadm disconnected exit code 0
[Wed May 5 11:44:30 2021] drbd vmXXXX host101011: conn( Disconnecting -> StandAlone )
[Wed May 5 11:44:30 2021] drbd vmXXXX host101011: Terminating receiver thread
[Wed May 5 11:44:30 2021] drbd vmXXXX: Preparing cluster-wide state change 2927240878 (0->2 496/16)
[Wed May 5 11:44:30 2021] drbd vmXXXX: State change 2927240878: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFE
[Wed May 5 11:44:30 2021] drbd vmXXXX host101020: Cluster is now split
[Wed May 5 11:44:30 2021] drbd vmXXXX: Committing cluster-wide state change 2927240878 (0ms)
[Wed May 5 11:44:30 2021] drbd vmXXXX host101020: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed May 5 11:44:30 2021] drbd vmXXXX/0 drbdXXXX0 host101020: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed May 5 11:44:30 2021] drbd vmXXXX host101020: ack_receiver terminated
[Wed May 5 11:44:30 2021] drbd vmXXXX host101020: Terminating ack_recv thread
[Wed May 5 11:44:30 2021] drbd vmXXXX host101020: Restarting sender thread
[Wed May 5 11:44:30 2021] drbd vmXXXX host101020: Connection closed
[Wed May 5 11:44:30 2021] drbd vmXXXX host101020: helper command: /sbin/drbdadm disconnected
[Wed May 5 11:44:30 2021] drbd vmXXXX host101020: helper command: /sbin/drbdadm disconnected exit code 0
[Wed May 5 11:44:30 2021] drbd vmXXXX host101020: conn( Disconnecting -> StandAlone )
[Wed May 5 11:44:30 2021] drbd vmXXXX host101020: Terminating receiver thread
[Wed May 5 11:44:33 2021] drbd vmXXXX host101011: conn( StandAlone -> Unconnected )
[Wed May 5 11:44:33 2021] drbd vmXXXX host101011: Starting receiver thread (from drbd_w_vmXXXX [38063])
[Wed May 5 11:44:33 2021] drbd vmXXXX host101011: conn( Unconnected -> Connecting )
[Wed May 5 11:44:33 2021] drbd vmXXXX host101020: conn( StandAlone -> Unconnected )
[Wed May 5 11:44:33 2021] drbd vmXXXX host101020: Starting receiver thread (from drbd_w_vmXXXX [38063])
[Wed May 5 11:44:33 2021] drbd vmXXXX host101020: conn( Unconnected -> Connecting )
[Wed May 5 11:44:33 2021] drbd vmXXXX host101020: Handshake to peer 2 successful: Agreed network protocol version 119
[Wed May 5 11:44:33 2021] drbd vmXXXX host101020: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[Wed May 5 11:44:33 2021] drbd vmXXXX host101020: Peer authenticated using 20 bytes HMAC
[Wed May 5 11:44:33 2021] drbd vmXXXX host101020: Starting ack_recv thread (from drbd_r_vmXXXX [16989])
[Wed May 5 11:44:33 2021] drbd vmXXXX: Preparing cluster-wide state change 2251651464 (0->2 499/145)
[Wed May 5 11:44:33 2021] drbd vmXXXX host101011: Handshake to peer 1 successful: Agreed network protocol version 119
[Wed May 5 11:44:33 2021] drbd vmXXXX host101011: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[Wed May 5 11:44:33 2021] drbd vmXXXX host101011: Peer authenticated using 20 bytes HMAC
[Wed May 5 11:44:33 2021] drbd vmXXXX host101011: Starting ack_recv thread (from drbd_r_vmXXXX [16987])
[Wed May 5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: drbd_sync_handshake:
[Wed May 5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: self AE3DDC8114ED4F29:0000000000000000:BAD783DF14F75BB4:CD4C13862D04D7CC bits:41790818 flags:120
[Wed May 5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: peer AE3DDC8114ED4F28:0000000000000000:9D5D2097F94E7D30:F96063637CE0E650 bits:41790624 flags:120
[Wed May 5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: uuid_compare()=no-sync by rule 38
[Wed May 5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: bitmap content (41790818 bits)
[Wed May 5 11:44:33 2021] drbd vmXXXX: State change 2251651464: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFA
[Wed May 5 11:44:33 2021] drbd vmXXXX: Committing cluster-wide state change 2251651464 (88ms)
[Wed May 5 11:44:33 2021] drbd vmXXXX host101020: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[Wed May 5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
[Wed May 5 11:44:33 2021] drbd vmXXXX/0 drbdXXXX0 host101020: pdsk( Outdated -> UpToDate )
[Wed May 5 11:44:34 2021] drbd vmXXXX: Preparing cluster-wide state change 1922267911 (0->1 499/145)
[Wed May 5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: drbd_sync_handshake:
[Wed May 5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: self AE3DDC8114ED4F29:0000000000000000:BAD783DF14F75BB4:CD4C13862D04D7CC bits:1 flags:120
[Wed May 5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: peer AE3DDC8114ED4F28:0000000000000000:E7DC3C8D8FA65BA8:9D5D2097F94E7D30 bits:1 flags:120
[Wed May 5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: uuid_compare()=no-sync by rule 38
[Wed May 5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: bitmap content (1 bits)
[Wed May 5 11:44:34 2021] drbd vmXXXX: State change 1922267911: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF8
[Wed May 5 11:44:34 2021] drbd vmXXXX: Committing cluster-wide state change 1922267911 (48ms)
[Wed May 5 11:44:34 2021] drbd vmXXXX host101011: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[Wed May 5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
[Wed May 5 11:44:34 2021] drbd vmXXXX/0 drbdXXXX0 host101011: pdsk( Outdated -> UpToDate )


However, after the disconnect/reconnect, drbdadm status vmXXXX did not show any synchronisation
activity. Instead, the state was UpToDate/UpToDate/UpToDate immediately.


How can this be explained?



Debian GNU/Linux 10
drbd 9.0.28-1
drbd-utils 9.15 (from Buster Backports)
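
For reference, those versions can be cross-checked on each node with, for example:

cat /proc/drbd      # kernel module version (api/proto levels)
drbdadm --version   # drbd-utils / drbdadm version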

global {
    usage-count no;
    udev-always-use-vnr;
}

common {
    handlers {
    }
    startup {
    }
    options {
        auto-promote no;
    }
    disk {
        al-extents 250;
        c-plan-ahead 0;
    }
    net {
        protocol C;
        max-buffers 8000;
        max-epoch-size 8000;
        sndbuf-size 8M;
        cram-hmac-alg "sha1";
        verify-alg "sha1";
    }
}

resource "vmXXXX" {
    volume 0 {
        device /dev/drbdXXXX0;
        disk /dev/zvol/disk/data/vmXXXX-0;
        meta-disk /dev/zvol/disk/meta/vmXXXX-0;
    }
    on host101011 {
        address 10.200.100.11:1XXXX;
        node-id 0;
    }
    on host101010 {
        address 10.200.100.10:1XXXX;
        node-id 1;
    }
    on host101020 {
        address 10.200.100.20:1XXXX;
        node-id 2;
    }
    connection-mesh {
        hosts host101011 host101010 host101020;
    }
    net {
        shared-secret "...";
    }
}
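
Should the parsed view be of interest, the configuration as DRBD actually applies it can be dumped with something like:

drbdadm dump vmXXXX     # configuration after parsing /etc/drbd.d
drbdsetup show vmXXXX   # settings currently active in the kernel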


Thanks in advance,

Michael





