Mailing List Archive

Avoid one node from being a target for resources migration
Hello.

I have a 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2
are DRBD master-slave; they also have a number of other services installed
(postgresql, nginx, ...). Node3 is just a corosync node (for quorum): no
DRBD/postgresql/... is installed on it, only corosync+pacemaker.

But when I add resources to the cluster, some of them are somehow moved
to node3 and then fail. Note that I have a "colocation" directive to
place these resources on the DRBD master only and a "location" constraint
with -inf for node3, but this does not help - why? How can I make pacemaker
not run anything on node3?

All the resources are added in a single transaction: "cat config.txt | crm
-w -f- configure", where config.txt contains the directives and a "commit"
statement at the end.

Below are "crm status" (error messages) and "crm configure show" outputs.


root@node3:~# crm status
Current DC: node2 (1017525950) - partition with quorum
3 Nodes configured
6 Resources configured
Online: [ node1 node2 node3 ]
Master/Slave Set: ms_drbd [drbd]
     Masters: [ node1 ]
     Slaves: [ node2 ]
Resource Group: server
     fs (ocf::heartbeat:Filesystem): Started node1
     postgresql (lsb:postgresql): Started node3 FAILED
     bind9 (lsb:bind9): Started node3 FAILED
     nginx (lsb:nginx): Started node3 (unmanaged) FAILED
Failed actions:
     drbd_monitor_0 (node=node3, call=744, rc=5, status=complete, last-rc-change=Mon Jan 12 11:16:43 2015, queued=2ms, exec=0ms): not installed
     postgresql_monitor_0 (node=node3, call=753, rc=1, status=complete, last-rc-change=Mon Jan 12 11:16:43 2015, queued=8ms, exec=0ms): unknown error
     bind9_monitor_0 (node=node3, call=757, rc=1, status=complete, last-rc-change=Mon Jan 12 11:16:43 2015, queued=11ms, exec=0ms): unknown error
     nginx_stop_0 (node=node3, call=767, rc=5, status=complete, last-rc-change=Mon Jan 12 11:16:44 2015, queued=1ms, exec=0ms): not installed


root@node3:~# crm configure show | cat
node $id="1017525950" node2
node $id="13071578" node3
node $id="1760315215" node1
primitive drbd ocf:linbit:drbd \
params drbd_resource="vlv" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="120"
primitive fs ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/var/lib/vlv.drbd/root"
options="noatime,nodiratime" fstype="xfs" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300"
primitive postgresql lsb:postgresql \
op monitor interval="10" timeout="60" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
primitive bind9 lsb:bind9 \
op monitor interval="10" timeout="60" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
primitive nginx lsb:nginx \
op monitor interval="10" timeout="60" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
group server fs postgresql bind9 nginx
ms ms_drbd drbd meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true"
location loc_server server rule $id="loc_server-rule" -inf: #uname eq node3
colocation col_server inf: server ms_drbd:Master
order ord_server inf: ms_drbd:promote server:start
property $id="cib-bootstrap-options" \
stonith-enabled="false" \
last-lrm-refresh="1421079189" \
maintenance-mode="false"
Re: Avoid one node from being a target for resources migration [ In reply to ]
>
> 1. install the resource related packages on node3 even though you never
> want them to run there. This will allow the resource-agents to verify the
> resource is in fact inactive.


Thanks, your advice helped: I installed all the services on node3 as well
(including DRBD, but without its configs) and stopped+disabled them. Then I
added the following line to my configuration:

location loc_drbd drbd rule -inf: #uname eq node3

So node3 is never a target for DRBD, and this helped: "crm node standby
node1" doesn't try to use node3 anymore.

But I have another (related) issue. If some node (e.g. node1) becomes
isolated from the other 2 nodes, how can I force it to shut down its
services? I cannot use IPMI-based fencing/stonith, because there are no
reliable connections between the nodes at all (the nodes are in
geo-distributed datacenters), so an IPMI call to shut down one node from
another is impossible.

E.g. initially I have the following:

# crm status
Online: [ node1 node2 node3 ]
Master/Slave Set: ms_drbd [drbd]
     Masters: [ node2 ]
     Slaves: [ node1 ]
Resource Group: server
     fs (ocf::heartbeat:Filesystem): Started node2
     postgresql (lsb:postgresql): Started node2
     bind9 (lsb:bind9): Started node2
     nginx (lsb:nginx): Started node2

Then I turn on a firewall on node2 to isolate it from the other nodes (only
SSH and loopback traffic remain allowed):

root@node2:~# iptables -A INPUT -p tcp --dport 22 -j ACCEPT
root@node2:~# iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT
root@node2:~# iptables -A INPUT -i lo -j ACCEPT
root@node2:~# iptables -A OUTPUT -o lo -j ACCEPT
root@node2:~# iptables -P INPUT DROP; iptables -P OUTPUT DROP
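
(To undo the isolation after the test, something like the following should
work - a sketch that flushes the test rules and restores the default ACCEPT
policies:

root@node2:~# iptables -P INPUT ACCEPT; iptables -P OUTPUT ACCEPT
root@node2:~# iptables -F
)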

Then I see that, although node2 clearly knows it's isolated (it doesn't see
other 2 nodes and does not have quorum), it does not stop its services:

root@node2:~# crm status
Online: [ node2 ]
OFFLINE: [ node1 node3 ]
Master/Slave Set: ms_drbd [drbd]
     Masters: [ node2 ]
     Stopped: [ node1 node3 ]
Resource Group: server
     fs (ocf::heartbeat:Filesystem): Started node2
     postgresql (lsb:postgresql): Started node2
     bind9 (lsb:bind9): Started node2
     nginx (lsb:nginx): Started node2

So is there a way to tell pacemaker to shut down a node's services when it
becomes isolated?



On Mon, Jan 12, 2015 at 8:25 PM, David Vossel <dvossel@redhat.com> wrote:

> Here's what is going on. Even when you say "never run this resource on
> node3", pacemaker is still going to probe for the resource on node3, just
> to verify the resource isn't running.
>
> The failures you are seeing ("monitor_0 failed") indicate that pacemaker
> was unable to verify whether the resources are running on node3, because
> the related packages for the resources are not installed. Given pacemaker's
> default behavior I'd expect this.
>
> You have two options.
>
> 1. install the resource related packages on node3 even though you never
> want them to run there. This will allow the resource-agents to verify the
> resource is in fact inactive.
>
> 2. If you are using the current master branch of pacemaker, there's a new
> location constraint option called 'resource-discovery=always|never|exclusive'.
> If you add the 'resource-discovery=never' option to your location constraint
> that attempts to keep resources from node3, you'll avoid having pacemaker
> perform the 'monitor_0' actions on node3 as well.
>
> -- Vossel
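
For reference, a sketch of what such a constraint could look like, reusing
the loc_server constraint from the configuration above. The first form is
the XML-level rsc_location constraint, which is where the resource-discovery
attribute lives; the crm-shell one-liner below it is an assumption about
newer crmsh releases and may need adjusting for your version:

<rsc_location id="loc_server" rsc="server" node="node3" score="-INFINITY"
              resource-discovery="never"/>

location loc_server server resource-discovery=never -inf: node3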
Re: Avoid one node from being a target for resources migration [ In reply to ]
> On 13 Jan 2015, at 7:56 am, Dmitry Koterov <dmitry.koterov@gmail.com> wrote:
>
> Then I see that, although node2 clearly knows it's isolated (it doesn't see other 2 nodes and does not have quorum)

we don't know that - there are several algorithms for calculating quorum and the information isn't included in your output.
are you using cman, or corosync underneath pacemaker? corosync version? pacemaker version? have you set no-quorum-policy?

Re: Avoid one node from being a target for resources migration [ In reply to ]
Hi,

On Mon, Jan 12, 2015 at 07:42:10PM +0300, Dmitry Koterov wrote:
> But when I add resources to the cluster, some of them are somehow moved
> to node3 and then fail. Note that I have a "colocation" directive to
> place these resources on the DRBD master only and a "location" constraint
> with -inf for node3, but this does not help - why? How can I make pacemaker
> not run anything on node3?

You could also put the quorum node into standby.

Thanks,

Dejan
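
For example, putting the quorum node into standby with the crm shell could
look like this (a sketch; a standby node keeps voting for quorum but never
runs resources):

root@node3:~# crm node standby node3
root@node3:~# crm node online node3    # to take it out of standby again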

Re: Avoid one node from being a target for resources migration [ In reply to ]
> > Then I see that, although node2 clearly knows it's isolated (it doesn't
> > see other 2 nodes and does not have quorum)
>
> we don't know that - there are several algorithms for calculating quorum
> and the information isn't included in your output.
> are you using cman, or corosync underneath pacemaker? corosync version?
> pacemaker version? have you set no-quorum-policy?


no-quorum-policy is not set, so, according to
http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-cluster-options.html,
it defaults to "stop - stop all resources in the affected cluster partition".
I suppose this is the right option, but why are the resources not stopped on
the node when this one node of three becomes isolated and clearly sees the
other nodes as offline (so it knows it's isolated)? What should I configure
in addition?

I'm using corosync+pacemaker, no cman. The "crm configure show" output is
the same as in my first message. Versions are from Ubuntu 14.04, so fairly
recent.


Re: Avoid one node from being a target for resources migration [ In reply to ]
> On 14 Jan 2015, at 12:06 am, Dmitry Koterov <dmitry.koterov@gmail.com> wrote:
>
> I'm using corosync+pacemaker, no cman. Versions are from Ubuntu 14.04, so fairly recent.

I don't have Ubuntu installed. You'll have to be more specific as to what package versions you have.

Re: Avoid one node from being a target for resources migration [ In reply to ]
Sorry!

Pacemaker 1.1.10
Corosync 2.3.30

BTW I removed quorum.two_node:1 from corosync.conf, and it helped! Now the
isolated node stops its services in the 3-node cluster. Was it the right
solution?

On Wednesday, January 14, 2015, Andrew Beekhof <andrew@beekhof.net> wrote:

> I don't have Ubuntu installed. You'll have to be more specific as to what
> package versions you have.
Re: Avoid one node from being a target for resources migration [ In reply to ]
> On 15 Jan 2015, at 12:43 am, Dmitry Koterov <dmitry.koterov@gmail.com> wrote:
>
> Sorry!
>
> Pacemaker 1.1.10
> Corosync 2.3.30
>
> BTW I removed quorum.two_node:1 from corosync.conf, and it helped! Now the isolated node stops its services in the 3-node cluster. Was it the right solution?

Yes. 'quorum.two_node:1' is only sane for a 2-node cluster.
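
For reference, a minimal sketch of the resulting quorum section in
corosync.conf for this 3-node cluster (assuming corosync 2.x votequorum; the
rest of the file is unchanged, and two_node is simply left out):

quorum {
    provider: corosync_votequorum
    # expected_votes: 3   (only needed when no nodelist is configured)
}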

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org