
DRBD_MAX_BIO_SIZE problem in a three node cluster (9.x)
Dear DRBD-users,

After many years of successfully using the in-kernel module for two-node
setups, we have decided to move to version 9.x in order to build clusters
with more nodes, but we are facing a serious problem.

Our setup is pretty simple: a three-node cluster (HV01, HV02 and HV03, each
with a 4-port Ethernet controller cross-connecting every node over a
dedicated port) and 2x2TB SAS HDDs per node (RAID-1, /dev/sda). We are
using an extended partition (/dev/sda4) to create the partitions backing
the DRBD devices, as follows:

/dev/sda4 210126848 3905945599 3695818752 1.7T 5 Extended
/dev/sda5 210128896 566644735 356515840 170G 83 Linux
/dev/sda6 566646784 1405507583 838860800 400G 83 Linux
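
For reference, the listing above is the relevant part of the partition
table on each node, as printed e.g. by:

# fdisk -l /dev/sda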

The OS is the latest Gentoo Linux with a vanilla 6.0.9 kernel (compiled
from the original source code, no patches applied), and we are using the
DRBD kernel module version 9.2.1 (api:2/proto:86-121), also compiled from
source without any patch or change. The three nodes are identical in
hardware and software configuration (the whole / partition is cloned; only
the network parameters are changed).

We create a resource as follows (no other configs are present, global.conf
is empty):

resource "r01"
{
protocol C;
volume 0 {
device minor 1;
disk "/dev/sda5";
meta-disk internal;
}

volume 1 {
device minor 2;
disk "/dev/sda6";
meta-disk internal;
}

on HV01 {
node-id 0;
}

on HV02 {
node-id 1;
}

on HV03 {
node-id 2;
}

connection {
host HV01 address 172.31.255.1:7012;
host HV02 address 172.31.255.2:7021;
}

connection {
host HV01 address 172.30.255.1:7013;
host HV03 address 172.30.255.3:7031;
}

connection {
host HV02 address 172.29.255.2:7023;
host HV03 address 172.29.255.3:7032;
}
}
=====================================================
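
The same configuration file is used on all three nodes; in case it helps,
the configuration as parsed by drbdadm can be checked on each node with:

# drbdadm dump r01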

# drbdadm create-md --force r01 (on each node)
# drbdadm up r01 (on each node)
# drbdadm new-current-uuid --clear-bitmap r01/0 (on the primary node HV01)
# drbdadm new-current-uuid --clear-bitmap r01/1 (on the primary node HV01)
# drbdadm primary r01 (on the primary node)
# mkfs.ext4 /dev/drbd1 (on the primary node)
# mount /dev/drbd1 /mnt/r01 (on the primary node)
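
For completeness, the effective runtime options on each node can be
inspected with:

# drbdsetup show r01

(we have not set anything beyond what is in the config above).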



Everything is fine up to this point. The status is the following:



HV01 ~ # drbdsetup status --verbose --statistics
r01 node-id:0 role:Primary suspended:no force-io-failures:no
write-ordering:flush
volume:0 minor:1 disk:UpToDate backing_dev:/dev/sda5 quorum:yes
size:178247004 read:0 written:0 al-writes:0 bm-writes:0
upper-pending:0 lower-pending:0 al-suspended:no blocked:no
volume:1 minor:2 disk:UpToDate backing_dev:/dev/sda6 quorum:yes
size:419404764 read:0 written:0 al-writes:0 bm-writes:0
upper-pending:0 lower-pending:0 al-suspended:no blocked:no
HV02 node-id:1 connection:Connected role:Secondary congested:no
ap-in-flight:0 rs-in-flight:0
volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0
volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0
HV03 node-id:2 connection:Connected role:Secondary congested:no
ap-in-flight:0 rs-in-flight:0
volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0
volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0

HV02 ~ # drbdsetup status --verbose --statistics
r01 node-id:1 role:Secondary suspended:no force-io-failures:no
write-ordering:flush
volume:0 minor:1 disk:UpToDate backing_dev:/dev/sda5 quorum:yes
size:178247004 read:0 written:0 al-writes:0 bm-writes:0
upper-pending:0 lower-pending:0 al-suspended:no blocked:no
volume:1 minor:2 disk:UpToDate backing_dev:/dev/sda6 quorum:yes
size:419404764 read:0 written:0 al-writes:0 bm-writes:0
upper-pending:0 lower-pending:0 al-suspended:no blocked:no
HV01 node-id:0 connection:Connected role:Primary congested:no
ap-in-flight:0 rs-in-flight:0
volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0
volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0
HV03 node-id:2 connection:Connected role:Secondary congested:no
ap-in-flight:0 rs-in-flight:0
volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0
volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0

HV03 ~ # drbdsetup status --verbose --statistics
r01 node-id:2 role:Secondary suspended:no force-io-failures:no
write-ordering:flush
volume:0 minor:1 disk:UpToDate backing_dev:/dev/sda5 quorum:yes
size:178247004 read:0 written:0 al-writes:0 bm-writes:0
upper-pending:0 lower-pending:0 al-suspended:no blocked:no
volume:1 minor:2 disk:UpToDate backing_dev:/dev/sda6 quorum:yes
size:419404764 read:0 written:0 al-writes:0 bm-writes:0
upper-pending:0 lower-pending:0 al-suspended:no blocked:no
HV01 node-id:0 connection:Connected role:Primary congested:no
ap-in-flight:0 rs-in-flight:0
volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0
volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0
HV02 node-id:1 connection:Connected role:Secondary congested:no
ap-in-flight:0 rs-in-flight:0
volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0
volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
received:0 sent:0 out-of-sync:0 pending:0 unacked:0



But when we start copying a 10 GB file, for example, this error comes up
immediately on both secondary nodes:


[Tue Dec 6 16:18:12 2022] drbd r01/0 drbd1 HV01: ASSERTION d->bi_size <= DRBD_MAX_BIO_SIZE FAILED in read_in_block
[Tue Dec 6 16:18:12 2022] drbd r01 HV01: error receiving P_DATA, e: -5 l: 2093056!
[Tue Dec 6 16:18:12 2022] drbd r01 HV01: conn( Connected -> ProtocolError
) peer( Primary -> Unknown )
[Tue Dec 6 16:18:12 2022] drbd r01/0 drbd1: disk( UpToDate -> Consistent )
[Tue Dec 6 16:18:12 2022] drbd r01/0 drbd1 HV01: pdsk( UpToDate ->
DUnknown ) repl( Established -> Off )
[Tue Dec 6 16:18:12 2022] drbd r01/1 drbd2: disk( UpToDate -> Consistent )
[Tue Dec 6 16:18:12 2022] drbd r01/1 drbd2 HV01: pdsk( UpToDate ->
DUnknown ) repl( Established -> Off )
[Tue Dec 6 16:18:12 2022] drbd r01 HV01: Terminating sender thread
[Tue Dec 6 16:18:12 2022] drbd r01 HV01: Starting sender thread (from
drbd_r_r01 [13658])
[Tue Dec 6 16:18:12 2022] drbd r01/1 drbd2 HV03: pdsk( UpToDate ->
Consistent )
[Tue Dec 6 16:18:12 2022] drbd r01: Preparing cluster-wide state change
120227718 (1->-1 0/0)
[Tue Dec 6 16:18:12 2022] drbd r01 HV01: Connection closed
[Tue Dec 6 16:18:12 2022] drbd r01 HV01: helper command: /sbin/drbdadm
disconnected
[Tue Dec 6 16:18:12 2022] drbd r01 HV01: helper command: /sbin/drbdadm
disconnected exit code 0
[Tue Dec 6 16:18:12 2022] drbd r01 HV01: conn( ProtocolError ->
Unconnected )
[Tue Dec 6 16:18:12 2022] drbd r01 HV01: Restarting receiver thread
[...]



[Tue Dec 6 16:18:13 2022] drbd r01/0 drbd1 HV01: ASSERTION d->bi_size <= DRBD_MAX_BIO_SIZE FAILED in read_in_block
[Tue Dec 6 16:18:13 2022] drbd r01 HV01: error receiving P_DATA, e: -5 l: 2093056!
[Tue Dec 6 16:18:13 2022] drbd r01 HV01: conn( Connected -> ProtocolError
) peer( Primary -> Unknown )
[Tue Dec 6 16:18:13 2022] drbd r01/0 drbd1: disk( UpToDate -> Consistent )
[Tue Dec 6 16:18:13 2022] drbd r01/0 drbd1 HV01: pdsk( UpToDate ->
DUnknown ) repl( Established -> Off )
[Tue Dec 6 16:18:13 2022] drbd r01/1 drbd2: disk( UpToDate -> Consistent )
[Tue Dec 6 16:18:13 2022] drbd r01/1 drbd2 HV01: pdsk( UpToDate ->
DUnknown ) repl( Established -> Off )
[Tue Dec 6 16:18:13 2022] drbd r01 HV01: Terminating sender thread
[Tue Dec 6 16:18:13 2022] drbd r01 HV01: Starting sender thread (from
drbd_r_r01 [13627])
[Tue Dec 6 16:18:13 2022] drbd r01: Preparing cluster-wide state change
1546196417 (2->-1 0/0)
[Tue Dec 6 16:18:13 2022] drbd r01/0 drbd1 HV02: pdsk( UpToDate ->
Consistent )
[Tue Dec 6 16:18:13 2022] drbd r01 HV01: Connection closed
[Tue Dec 6 16:18:13 2022] drbd r01 HV01: helper command: /sbin/drbdadm
disconnected
[Tue Dec 6 16:18:13 2022] drbd r01 HV01: helper command: /sbin/drbdadm
disconnected exit code 0
[Tue Dec 6 16:18:13 2022] drbd r01 HV01: conn( ProtocolError ->
Unconnected )
[Tue Dec 6 16:18:13 2022] drbd r01 HV01: Restarting receiver thread
[...]


The problem is 100% reproducible; the only thing that changes from one
occurrence to the next is the d->bi_size value (it also varies in the other
error messages logged during the copy). DRBD_MAX_BIO_SIZE is 1048576, i.e.
1 MiB, while the length reported above (l: 2093056) is just under 2 MiB,
roughly twice that limit.
Copying the same file over the same network interface / partition with scp
or other tools works perfectly, at maximum hardware speed.
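
I have not yet verified whether a plain dd reproduces it, but I would
expect any large sequential write to the mounted filesystem to hit the same
path, e.g. something like:

# dd if=/dev/zero of=/mnt/r01/testfile bs=2M count=512 oflag=direct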

I've tried:
- changing the --peer-max-bio-size
- changing the primary node
- connecting only the primary node with a single secondary node
- checking that max_sectors_kb, max_hw_sectors_kb and nr_requests are equal
for both the physical disk and the DRBD device (see the commands below)
- changing the MTU of the network interfaces
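
The queue-parameter check mentioned above was done along these lines (same
commands on HV02/HV03 and for /dev/drbd2):

# grep . /sys/block/sda/queue/{max_sectors_kb,max_hw_sectors_kb,nr_requests}
# grep . /sys/block/drbd1/queue/{max_sectors_kb,max_hw_sectors_kb,nr_requests}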

Nothing seems to have any effect. Every time an intensive write operation
starts, this error immediately shows up on each secondary node.
Could someone give us a hint? We are totally stuck at the moment.

Thank you in advance and best regards,
Andrea