
DRBD 9.x sync corruption when using resync-after
Hello,

We are seeing data corruption on the backing volume of a 2-node DRBD resource when the "resync-after" configuration option is used.

The problem occurs when two resources sync changes on reconnection after a period of disconnection. We have configured the second resource to resync after the first resource. On completion of the sync, and after the second resource reports UpToDate, a checksum of the data on the second resource's backing disk shows a mismatch.

Issuing a drbdadm verify against the second resource reports many out-of-sync blocks in dmesg with the final message: "drbd volume-2/0 drbd1001 node-2: Online verify found 33506 4k blocks out of sync!"
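
For reference, the checks amount to something like this (sha256sum is only an example; any checksum over the backing device works, taken while the device is quiescent):

# On each node, checksum volume-2's backing device and compare the results
sha256sum /dev/zvol/ztank/volume-2

# Start an online verify, then check the kernel log once it has completed
drbdadm verify volume-2
dmesg | grep "out of sync"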

So far, we have only managed to replicate this issue with a specific workload against the volume. That workload is a Windows 10 virtual machine booting up, logging in via RDP and doing some background Windows Update checks. We have been able to capture the IO using blktrace and reproduce the issue by replaying it with fio.
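
The capture and replay amount to roughly the following (device path and file names are only examples):

# Capture the workload's IO against the DRBD device while the VM workload runs
blktrace -d /dev/drbd1001 -o trace

# Merge the per-CPU trace files into a single binary dump that fio can replay
blkparse -i trace -d trace.bin

# Replay the captured IO pattern against the DRBD device
fio --name=replay --ioengine=libaio --direct=1 \
    --read_iolog=trace.bin --replay_redirect=/dev/drbd1001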

The issue is reproducible on DRBD 9.0.30, 9.0.32, 9.1.8 RC1 and 9.2.0 RC6 on Ubuntu 20.04 (Linux 5.4.0-121-generic). The issue is present when the backing volume is ZFS, LVM or direct to a physical disk.

We have not been able to replicate the issue using the DRBD 8.4 module included with the kernel.

I have uploaded the reproducer shell script to: https://gist.github.com/ShMaunder/25c6483a8cf29312d5ca409f99466090 -- however this does not include the binary blktrace files. I am happy to send the blktrace files to anybody that requests them.



Global config:

global {
    usage-count no;
    udev-always-use-vnr;
    disable-ip-verification;
}

common {
    protocol C;

    startup {
        wfc-timeout 1;
    }

    disk {
        on-io-error detach;
        al-extents 1297;
    }

    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
        rr-conflict call-pri-lost;
        max-buffers 8000;
        max-epoch-size 8000;
        sndbuf-size 0;
        csums-alg md5;
        timeout 60;
        ko-count 7;
        verify-alg sha256;
    }
}



Resource definitions:

resource volume-1 {
    device /dev/drbd1000;
    disk /dev/zvol/ztank/volume-1;
    meta-disk /dev/zvol/ztank/volume-1-meta;

    handlers {
        split-brain "/usr/local/bin/split-brain-handler.sh volume-1";
    }

    disk {
    }

    on node-1 {
        address ipv4 192.168.56.1:7790;
    }
    on node-2 {
        address ipv4 192.168.56.2:7790;
    }
}

resource volume-2 {
    device /dev/drbd1001;
    disk /dev/zvol/ztank/volume-2;
    meta-disk /dev/zvol/ztank/volume-2-meta;

    handlers {
        split-brain "/usr/local/bin/split-brain-handler.sh volume-2";
    }

    disk {
        resync-after volume-1;
    }

    on node-1 {
        address ipv4 192.168.56.1:7791;
    }
    on node-2 {
        address ipv4 192.168.56.2:7791;
    }
}


Does anybody have any ideas as to the cause, or how to troubleshoot this further?

Many thanks,
Shaun

Re: DRBD 9.x sync corruption when using resync-after
We have narrowed the problem down to writes being issued into the DRBD device whilst the peer is in
the Outdated state.

There is an updated reproducer script which removes the blktrace/fio replay requirements.

Script link: https://gist.github.com/ShMaunder/25c6483a8cf29312d5ca409f99466090

The script performs the following steps (a condensed command sketch follows the list):

1. Down the volume-1 and volume-2 DRBD resources.

2. Recreate backing & metadata volumes for volume-1 and volume-2.
Create new metadata, bring up the resources and make sure they connect.
Set a new UUID to skip a full resync and primary the resources on one of the nodes.

3. Disconnect the resources.
Inject ~250MB of random data into volume-1. ~250MB prevents an instant resync of volume-1.
Inject a sector of random data into volume-2. Only a single sector to trigger the resync.

4. Connect the volume-1 and volume-2 DRBD resources.
Explicitly pause sync on volume-2.
Volume-2 will stay in the Outdated state even after volume-1 has resync'd.

5. Inject another sector of random data into volume-2.
This sector won't resync, and neither will any other changed sectors not already marked for resync.

6. The secondary side waits for the primary side to reconnect its resources.

7. Resume the sync of volume-2 and wait until it has completed.
Check for DRBD's resync warning message in the journal.

8. Verify whether volume-1 and volume-2 backing devices are in sync via checksum & drbdadm verify.
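
Condensed, steps 3 to 7 on the primary node look roughly like this (write sizes and offsets are only examples):

# Step 3: disconnect both resources, then change data while disconnected
drbdadm disconnect volume-1
drbdadm disconnect volume-2
dd if=/dev/urandom of=/dev/drbd1000 bs=1M count=250 oflag=direct
dd if=/dev/urandom of=/dev/drbd1001 bs=512 count=1 conv=fsync

# Step 4: reconnect, but keep volume-2's resync paused so its peer stays Outdated
drbdadm connect volume-1
drbdadm connect volume-2
drbdadm pause-sync volume-2

# Step 5: write another sector while volume-2's peer is still Outdated
dd if=/dev/urandom of=/dev/drbd1001 bs=512 count=1 seek=2048 conv=fsync

# Step 7: resume the sync and wait for it to finish
drbdadm resume-sync volume-2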


Some analysis shows metric inconsistencies between the nodes while volume-2 is in the Outdated
state. For example, the rs_total counter seems to be misaligned between the nodes and increments
only on the primary side as more data is changed.
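
The statistics can be compared by running the following on each node while the sync of volume-2 is paused:

# Compare the resync/out-of-sync figures reported on each node for volume-2
drbdsetup status volume-2 --verbose --statistics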

I have found that the data changes from step 5 which were not sync'd can be quickly sync'd as follows (a command sketch follows the list):

a. Disconnect volume-2.

b. Inject another random sector into volume-2.

c. Connect volume-2.

d. Both the random sector and all the changes from step 5 are quickly sync'd.
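
In commands, that is roughly (the write offset is only an example):

# a. Disconnect volume-2
drbdadm disconnect volume-2

# b. Inject another random sector into volume-2
dd if=/dev/urandom of=/dev/drbd1001 bs=512 count=1 seek=4096 conv=fsync

# c. Reconnect; the new sector and the changes from step 5 then resync quickly
drbdadm connect volume-2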