Mailing List Archive

2 node cluster split-brain on linstor_db
I'm trying to build a 2 node cluster with proxmox and drbd, plus an extra
qdevice to have 3 votes.

node 1: 1GbE NIC 192.168.1.245   2.5GbE NIC 192.168.3.1
node 2: 1GbE NIC 192.168.1.246   2.5GbE NIC 192.168.3.2
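
(For reference, a rough sketch of the usual Proxmox qdevice wiring, assuming a
separate host runs corosync-qnetd; the 192.168.1.247 address below is only a
placeholder, not taken from this setup:)

    # on the qdevice host (e.g. a raspi)
    apt install corosync-qnetd

    # on both proxmox nodes
    apt install corosync-qdevice

    # on one proxmox node, register the extra vote
    pvecm qdevice setup 192.168.1.247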


After installing Proxmox 6.4 I installed drbd9/linstor.


#apt install linstor-controller linstor-satellite linstor-client
#systemctl start linstor-satellite
#systemctl enable linstor-satellite

#systemctl start linstor-controller
#systemctl enable linstor-controller

#linstor node create proxmoxn1 192.168.3.1 --node-type Combined
#linstor node create proxmoxn2 192.168.3.2 --node-type Combined

/etc/linstor/linstor-client.conf
    [global]
    controllers=proxmoxn1,proxmoxn2

#create a partition with fdisk /dev/nvme0n1
#vgcreate vg_ssd /dev/nvme0n1p4

On the first node
#linstor storage-pool create lvm proxmoxn1 pool_ssd vg_ssd
#linstor storage-pool create lvm proxmoxn2 pool_ssd vg_ssd

#linstor resource-group create adcgrp --storage-pool pool_ssd --place-count 2
#linstor vg create adcgrp

On both nodes
#apt install linstor-proxmox

/etc/pve/storage.cfg
drbd: drbdstorage
    content images,rootdir
    controller 192.168.3.1,192.168.3.2
    resourcegroup adcgrp

#systemctl restart pvedaemon

Making the LINSTOR controller HA
#linstor resource-definition create linstor_db
#linstor resource-definition set-property linstor_db DrbdOptions/Resource/on-no-quorum io-error
#linstor volume-definition create linstor_db 200M
#linstor resource create linstor_db -s pool_ssd --auto-place 2

On both nodes
#systemctl disable --now linstor-controller

#cat << EOF > /etc/systemd/system/var-lib-linstor.mount
    [Unit]
    Description=Filesystem for the LINSTOR controller

    [Mount]
    # you can use the minor like /dev/drbdX or the udev symlink
    What=/dev/drbd/by-res/linstor_db/0
    Where=/var/lib/linstor
    EOF

#mv /var/lib/linstor{,.orig}
#mkfs.ext4 /dev/drbd/by-res/linstor_db/0
#systemctl start var-lib-linstor.mount

#cp -r /var/lib/linstor.orig/* /var/lib/linstor
#systemctl start linstor-controller
#scp /etc/systemd/system/var-lib-linstor.mount root@192.168.1.246:/etc/systemd/system/var-lib-linstor.mount

#systemctl start linstor-controller

#apt install  drbd-reactor
#mkdir /etc/drbd-reactor.d
/etc/drbd-reactor.d/linstor.toml
    [[promoter]]
    [promoter.resources.linstor_db]
    start = ["var-lib-linstor.mount", "linstor-controller.service"]

#systemctl restart drbd-reactor
#systemctl enable drbd-reactor

#systemctl edit linstor-satellite
    [Service]
    Environment=LS_KEEP_RES=linstor_db
    [Unit]
    After=drbd-reactor.service


#systemctl restart linstor-satellite

I can create VMs and all seems to be OK.

After rebooting both nodes, linstor/drbdadm shows the behaviour below and the
VM is now very slow (10 times slower than in proxmox LVM-Thin).

dmesg shows split-brain only for linstor_db:
[   17.632010] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm initial-split-brain
[   17.632621] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm initial-split-brain exit code 0
[   17.632627] drbd linstor_db/0 drbd1001: Split-Brain detected but unresolved, dropping connection!
[   17.632646] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm split-brain
[   17.633208] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm split-brain exit code 0

Even manually fixing the split-brain doesn't work.

Output of the nodes

First node

root@proxmoxn1:~# linstor r l
+---------------+-----------+------+--------+-----------------------+----------+---------------------+
| ResourceName  | Node      | Port | Usage  | Conns                 | State    | CreatedOn           |
+---------------+-----------+------+--------+-----------------------+----------+---------------------+
| linstor_db    | proxmoxn1 | 7001 | InUse  | StandAlone(proxmoxn2) | UpToDate | 2021-06-01 21:34:35 |
| linstor_db    | proxmoxn2 | 7001 | InUse  | Connecting(proxmoxn1) | UpToDate | 2021-06-01 21:34:35 |
| vm-100-disk-1 | proxmoxn1 | 7000 | Unused | Ok                    | UpToDate | 2021-05-29 12:30:08 |
| vm-100-disk-1 | proxmoxn2 | 7000 | Unused | Ok                    | UpToDate | 2021-05-29 12:30:07 |
| vm-108-disk-1 | proxmoxn1 | 7002 | InUse  | StandAlone(proxmoxn2) | UpToDate | 2021-06-06 21:01:10 |
| vm-108-disk-1 | proxmoxn2 | 7002 | Unused | Connecting(proxmoxn1) | UpToDate | 2021-06-06 21:01:10 |
+---------------+-----------+------+--------+-----------------------+----------+---------------------+
root@proxmoxn1:~# drbdadm status
linstor_db role:Primary
  disk:UpToDate
  proxmoxn2 connection:StandAlone

vm-100-disk-1 role:Secondary
  disk:UpToDate
  proxmoxn2 role:Secondary
    peer-disk:UpToDate

vm-108-disk-1 role:Primary
  disk:UpToDate
  proxmoxn2 connection:StandAlone

root@proxmoxn1:~#

Second node

root@proxmoxn2:~# drbdadm status
linstor_db role:Primary
  disk:UpToDate
  proxmoxn1 connection:Connecting

vm-100-disk-1 role:Secondary
  disk:UpToDate
  proxmoxn1 role:Secondary
    peer-disk:UpToDate

vm-108-disk-1 role:Secondary
  disk:UpToDate
  proxmoxn1 connection:Connecting



I've read the docs again and again but no luck.
Can anybody help?

Martin

Re: 2 node cluster split-brain on linstor_db [ In reply to ]
On Tue, Jun 08, 2021 at 10:44:57PM +0200, Martin wrote:
> I've read the docs again and again but no luck.

You only have 2 nodes, but drbd-reactor uses DRBD quorum, so you should
have at least 3 nodes. The 3rd one can be one with a diskless assignment.

> Even manually fixing the split-brain doesn't work.

How did you do it exactly? A quick way out is to decide which is the
losing node and to create fresh meta data there. The linstor_db resource
is small enough to resync quickly.
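
For example, a rough sketch of that "fresh meta data" route, assuming
proxmoxn2 is the node you decide to throw away (decide that carefully first,
and make sure nothing on that node uses the device, i.e. drbd-reactor and the
controller are stopped and /var/lib/linstor is not mounted there):

    # on the losing node (assumed here: proxmoxn2)
    drbdadm disconnect linstor_db
    drbdadm down linstor_db
    drbdadm create-md linstor_db    # confirm overwriting the existing metadata
    drbdadm up linstor_db

    # on the surviving node, if its connection is still StandAlone
    drbdadm connect linstor_db

After that the losing side resyncs the 200M volume from the survivor.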

I would deactivate drbd-reactor until you have a 3rd node. It is easy
enough to start the 2 services (mount unit/controller service) on the
node you want to be the controller for now.
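
Something like this, with the unit names used earlier in this thread:

    # on both nodes
    systemctl disable --now drbd-reactor

    # on the one node that should act as controller for now
    systemctl start var-lib-linstor.mount linstor-controller.service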

And after you have added a 3rd node at the LINSTOR/DRBD level, I would then
wait for drbd-reactor v0.4.0. It has substantial improvements for the
promoter plugin, but we also had v0.3.0 in production and node failovers
worked reliably.

Regards, rck
Re: 2 node cluster split-brain on linstor_db [ In reply to ]
On Thu, Jun 10, 2021 at 03:20:30PM +0200, Martin wrote:
> thank you for the explanation,
>
> i have configured proxmox with 2 identical servers with nvme ssd's and
> a third qdevice on a raspi4.
> I assumed (maybe wrong) that this would be enough. I will try again
> with a small third diskless device.
> Maybe the raspi is enough for that?

It should be. It just was not active; at least the status output from
your last mail only showed 2 nodes for the linstor_db resource, but not a
3rd one. You can manually 'linstor resource create' it on the third
node.
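
Roughly like this, assuming the raspi is registered as a satellite; the node
name and address below are placeholders:

    # register the third node, if not done yet
    linstor node create raspi4 192.168.3.3 --node-type Satellite

    # diskless assignment of the controller DB resource on it
    # (older clients use --diskless instead of --drbd-diskless)
    linstor resource create raspi4 linstor_db --drbd-diskless

A diskless peer holds no data but still provides the third vote for quorum.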

> After rebooting both nodes (with drbd-reactor and linstor services active) it ends
> up with the situation I described.

If you boot both at the same time, yes, maybe you really end up in that
situation: the resource is not connected yet, and both will be allowed
to become Primary in that case, as they have "quorum", are not connected
yet, and don't know better. But if you had used proper quorum as
described in the documentation it should not have tried to promote, as 1
out of 3 is not good enough.
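
For reference, the "proper quorum" bits on the LINSTOR side would look roughly
like this once the third peer exists (the on-no-quorum property was already
set above; quorum only has a majority to count with 3 peers per resource):

    linstor resource-definition set-property linstor_db DrbdOptions/Resource/quorum majority
    linstor resource-definition set-property linstor_db DrbdOptions/Resource/on-no-quorum io-error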

> Years ago I built a 2 node cluster with drbd and proxmox. It was relatively
> easy to install drbd (apt install .., edit the config files). It worked very well
> without a third node or qdevice.

Feel free to still do that if it better fits your scenario.

> But I use fencing via a power switch.

Coolio, do that then.

> Because of the very good experience with drbd in the past I am trying to get it
> running instead of using CEPH.

Thanks a lot, I guess we owe you! *upside down emoji*

Best, rck
_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user