Mailing List Archive

DRBD Behavior on Device Failure, and Recovery Procedure
Our servers have a large number of resources on a 6-drive volume group. When Linstor provisioned the resources, it apparently placed each one entirely on a single device. Here's a snippet of the approximately 200 resources on the servers; none of them shows more than one device in the "Devices" column.

[root@ha51b ~]# lvs -o+lv_layout,stripes,devices
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Layout #Str Devices
site002_00000 vg0 -wi-ao---- 104.02g linear 1 /dev/nvme3n1(371535)
site003_00000 vg0 -wi-ao---- <63.02g linear 1 /dev/nvme0n1(498558)
site017_00000 vg0 -wi-ao---- <149.04g linear 1 /dev/nvme3n1(724396)
site019_00000 vg0 -wi-ao---- <19.01g linear 1 /dev/nvme4n1(0)
site021_00000 vg0 -wi-ao---- <23.01g linear 1 /dev/nvme2n1(698275)
site030_00000 vg0 -wi-ao---- 39.01g linear 1 /dev/nvme2n1(704165)
site034_00000 vg0 -wi-ao---- <23.01g linear 1 /dev/nvme3n1(713896)
site035_00000 vg0 -wi-ao---- 39.01g linear 1 /dev/nvme0n1(254527)
site036_00000 vg0 -wi-ao---- <88.02g linear 1 /dev/nvme2n1(714152)
site037_00000 vg0 -wi-ao---- <28.01g linear 1 /dev/nvme0n1(530822)
site039_00000 vg0 -wi-ao---- <59.02g linear 1 /dev/nvme1n1(180777)
site041_00000 vg0 -wi-ao---- <21.01g linear 1 /dev/nvme3n1(181290)
site043_00000 vg0 -wi-ao---- 50.01g linear 1 /dev/nvme3n1(398165)
site045_00000 vg0 -wi-ao---- 52.01g linear 1 /dev/nvme1n1(203567)
site047_00000 vg0 -wi-ao---- 54.01g linear 1 /dev/nvme0n1(264514)
site049_00000 vg0 -wi-ao---- <81.02g linear 1 /dev/nvme3n1(410968)
site058_00000 vg0 -wi-ao---- <30.01g linear 1 /dev/nvme0n1(564622)
site062_00000 vg0 -wi-ao---- 17.00g linear 1 /dev/nvme3n1(197679)
site065_00000 vg0 -wi-ao---- <23.01g linear 1 /dev/nvme1n1(387935)
site068_00000 vg0 -wi-ao---- <32.01g linear 1 /dev/nvme0n1(616090)
</snip>

With this layout (all LVs are linear), when a drive fails, I assume only the resources on that physical drive would go diskless, and all the other resources would continue operating normally. Is that correct?
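
(For what it's worth, it's easy to list which resources sit on a given drive, e.g. for nvme3n1, picking one device name from the output above:

  lvs -o lv_name,devices vg0 | grep nvme3n1

so at least the resources that would go diskless are easy to identify ahead of time.)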

In such an event, what would be the recovery procedure? Swap the failed drive, use vgcfgrestore to restore the LVM data to the new PV, then do a DRBD resync?

-Eric

Re: DRBD Behavior on Device Failure, and Recovery Procedure
Hello,

As mentioned in the other drbd-user thread (Adding a Physical Disk to a
Storage Pool), this behaviour is controlled by LVM, not by Linstor. With
the `linstor sp c ...` command you only gave Linstor the name of the
volume group, and that name is all Linstor passes on to its
`lvcreate ...` calls. If you want to control how LVM distributes the
volumes within that VG, you have to configure the VG accordingly, or use
Linstor's "StorDriver/LvcreateOptions" property [1] to configure RAID
levels per resource (you can also set that property at the controller
level, which will affect all new resources).
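
For example, something along these lines should work (the resource name
"site100" and the lvcreate options are only illustrations, so check the
guide for the exact options that fit your setup):

  # per resource definition, set before the resource is created:
  linstor resource-definition set-property site100 StorDriver/LvcreateOptions "--stripes 2"

  # or once on the controller, affecting all new resources:
  linstor controller set-property StorDriver/LvcreateOptions "--stripes 2"

Whatever you put into that property is handed on to the `lvcreate ...`
calls, so any allocation-related lvcreate option (striping, RAID type,
and so on) can be used there.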

> In such an event, what would be the recovery procedure? Swap the failed
> drive, use vgcfgrestore to restore the LVM data to the new PV, then do a
> DRBD resync?

Sounds correct to me.
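
Roughly, the recovery could look like this (the device name, the metadata
archive file, and the resource name are only placeholders, and the exact
steps depend on your setup):

  # recreate the PV on the replacement drive with the old PV's UUID,
  # using the most recent LVM metadata archive for vg0:
  pvcreate --uuid <old-pv-uuid> --restorefile /etc/lvm/archive/vg0_....vg /dev/nvme3n1
  vgcfgrestore vg0
  vgchange -ay vg0

  # the data on the restored LVs is gone, so each affected DRBD resource
  # needs fresh metadata and a full sync from its peer:
  drbdadm create-md site002
  drbdadm adjust site002

The replacement LVs start out empty, so expect a full resync from the
surviving peer for every resource that was on the failed drive.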

[1]
https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-storage-providers
--
Best regards,
Gabor Hernadi