Mailing List Archive

drbd / libvirt / Pacemaker Cluster?
Hello emmanuel,

> export TERM=linux and resend your config

Sorry, now the readable config:


node $id="1084777473" master \
attributes standby="off" maintenance="off"
node $id="1084777474" slave \
attributes maintenance="off" standby="off"
primitive libvirt upstart:libvirt-bin \
op start timeout="120s" interval="0" \
op stop timeout="120s" interval="0" \
op monitor interval="30s" \
meta target-role="Started"
primitive st-null stonith:null \
params hostlist="master slave"
primitive vmdata ocf:linbit:drbd \
params drbd_resource="vmdata" \
op monitor interval="29s" role="Master" timeout="20" \
op monitor interval="31s" role="Slave" timeout="20" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100"
primitive vmdata_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/vmdata" fstype="ext4" \
meta target-role="Started" \
op monitor interval="20" timeout="40" \
op start timeout="30" interval="0" \
op stop timeout="30" interval="0"
ms drbd_master_slave vmdata \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
clone fencing st-null
location PrimaryNode-libvirt libvirt 200: master
location PrimaryNode-vmdata_fs vmdata_fs 200: master
location SecondaryNode-libvirt libvirt 10: slave
location SecondaryNode-vmdata_fs vmdata_fs 10: slave
location drbd-fence-by-handler-vmdata-drbd_master_slave drbd_master_slave \
rule $id="drbd-fence-by-handler-vmdata-rule-drbd_master_slave" $role="Master" -inf: #uname ne master
colocation libvirt-with-fs inf: libvirt vmdata_fs
colocation services_colo inf: vmdata_fs drbd_master_slave:Master
order fs_after_drbd inf: drbd_master_slave:promote vmdata_fs:start libvirt:start
property $id="cib-bootstrap-options" \
dc-version="1.1.10-42f2063" \
cluster-infrastructure="corosync" \
stonith-enabled="true" \
no-quorum-policy="ignore" \
last-lrm-refresh="1416390260"


I need a simple failover cluster. If the drbd/fs_mount is OK and
libvirt is started after it, in all cases I don't have problems.

But from time to time the slave doesn't see that the master is gone
when I pull the power on the active/master node.

And also from time to time (when I test power loss/reboot etc.)
I have to start drbd / libvirt manually - in all cases it can be
"repaired" - but I need to automate it as well as possible.

hm

> OK, I have configured it in Pacemaker / crm
>
> Since the config has stonith/fencing it has many problems:
> after a reboot the nodes are unclean and so on. I need an
> automatic hot standby...
>
> When I power off the master box, the slave resources don't come up;
> the slave then always says the master is "online" - but the machine
> is powered off...
>
> ---
>
> Logs that may be interesting:
> master corosync[1350]: [QUORUM] This node is within the non-primary
> component and will NOT provide any services.
> master warning: do_state_transition: Only 1 of 2 cluster nodes are
> eligible to run resources - continue 0
> notice: pcmk_quorum_notification: Membership 900: quorum lost (1)
> notice: crm_update_peer_state: pcmk_quorum_notification: Node
> slave[1084777474] - state is now lost (was member)
> notice: stonith_device_register: Added 'st-null:0' to the device list (2
> active devices)
>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: drbd / libvirt / Pacemaker Cluster?
On Tuesday, 2 December 2014, 11:26:12, Heiner Meier wrote:
> Hello emmanuel,
>
> > export TERM=linux and resend your config
>
> Sorry, now the readable config:
>
>
> node $id="1084777473" master \
> attributes standby="off" maintenance="off"
> node $id="1084777474" slave \
> attributes maintenance="off" standby="off"
> primitive libvirt upstart:libvirt-bin \
> op start timeout="120s" interval="0" \
> op stop timeout="120s" interval="0" \
> op monitor interval="30s" \
> meta target-role="Started"
> primitive st-null stonith:null \
> params hostlist="master slave"
> primitive vmdata ocf:linbit:drbd \
> params drbd_resource="vmdata" \
> op monitor interval="29s" role="Master" timeout="20" \
> op monitor interval="31s" role="Slave" timeout="20" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100"
> primitive vmdata_fs ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/vmdata" fstype="ext4" \
> meta target-role="Started" \
> op monitor interval="20" timeout="40" \
> op start timeout="30" interval="0" \
> op stop timeout="30" interval="0"
> ms drbd_master_slave vmdata \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> clone fencing st-null
> location PrimaryNode-libvirt libvirt 200: master
> location PrimaryNode-vmdata_fs vmdata_fs 200: master
> location SecondaryNode-libvirt libvirt 10: slave
> location SecondaryNode-vmdata_fs vmdata_fs 10: slave
> location drbd-fence-by-handler-vmdata-drbd_master_slave drbd_master_slave \
> rule $id="drbd-fence-by-handler-vmdata-rule-drbd_master_slave" $role="Master" -inf: #uname ne master
> colocation libvirt-with-fs inf: libvirt vmdata_fs
> colocation services_colo inf: vmdata_fs drbd_master_slave:Master
> order fs_after_drbd inf: drbd_master_slave:promote vmdata_fs:start libvirt:start
> property $id="cib-bootstrap-options" \
> dc-version="1.1.10-42f2063" \
> cluster-infrastructure="corosync" \
> stonith-enabled="true" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1416390260"
>
>
> I need a simple failover cluster. If the drbd/fs_mount is OK and
> libvirt is started after it, in all cases I don't have problems.

Yes. Simple, and many problems anyway.

> But from time to time the slave doesn't see that the master is gone
> when I pull the power on the active/master node.

Did you think about who entered the "location drbd-fence-by-handler-..."
constraint and what the result of that constraint is?

Please delete it in a running (!) setup.
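
Roughly like this with the crm shell (only a sketch; the constraint name is
taken from your config above, adjust it to whatever is actually in your CIB):

# Check what the DRBD fence-peer handler has put into the CIB ...
crm configure show | grep drbd-fence-by-handler
# ... and remove that constraint from the live configuration.
crm configure delete drbd-fence-by-handler-vmdata-drbd_master_slave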

> And also from time to time (when I test power loss/reboot etc.)
> I have to start drbd / libvirt manually - in all cases it can be
> "repaired" - but I need to automate it as well as possible.

No. A cluster is for failover in case of an error. A cluster still needs some
administration, like any computer system. Perhaps a cluster even needs some more
care, since it is supposed to provide high availability.

Warning you about errors is the job of a monitoring system. Did you set one up?

Now to the problems in the config above:

1) Don't use stonith:null. It is just for testing, and in a setup with shared
data (DRBD!) you will destroy data.
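
For example, something in this direction (only a sketch; external/ipmi is just
one possible plugin, and the addresses and credentials are placeholders):

# Sketch: one real fencing device per node instead of stonith:null.
primitive st-master stonith:external/ipmi \
    params hostname="master" ipaddr="192.0.2.10" userid="admin" passwd="secret" interface="lan" \
    op monitor interval="60s"
primitive st-slave stonith:external/ipmi \
    params hostname="slave" ipaddr="192.0.2.11" userid="admin" passwd="secret" interface="lan" \
    op monitor interval="60s"
# A node must not run the device that fences itself.
location l-st-master st-master -inf: master
location l-st-slave st-slave -inf: slave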

2) Why do you control your virtual machines with upstart:libvirt-bin and not with
ocf:heartbeat:VirtualDomain? Any specific reason? I could write more about this,
but there is no space left here.
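
A rough sketch of how one guest could look with VirtualDomain (the resource
name vm1, the XML path and the timeouts are made up for the example):

# Sketch: one guest managed directly by the cluster.
primitive vm1 ocf:heartbeat:VirtualDomain \
    params config="/vmdata/vm1.xml" hypervisor="qemu:///system" \
    op start timeout="120s" interval="0" \
    op stop timeout="120s" interval="0" \
    op monitor interval="30s" timeout="30s"
# The guest needs the filesystem that holds its image and XML.
order vm1_after_fs inf: vmdata_fs:start vm1:start
colocation vm1_with_fs inf: vm1 vmdata_fs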

3) Why is the target-role of your ms resource "Started"? Please delete the
target-role here.
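
That is, the ms definition from the config above without the target-role:

ms drbd_master_slave vmdata \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"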

4) Delete all location constraints for the secondary node. They are nonsense.
Resources will run on the master if you give it 200 points, and will run on the
secondary if the master is not available.
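
What remains for placement are just the two 200-point constraints already in
your config:

location PrimaryNode-libvirt libvirt 200: master
location PrimaryNode-vmdata_fs vmdata_fs 200: master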

5) Delete the location constraint drbd-fence-by-handler-... (see above).

Kind regards,

Michael Schwartzkopff

--
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: München, Amtsgericht München: HRB 199263
Executive board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein