Mailing List Archive

Booth ticket renewal timeout
Hi all,

I'm performing a lab test where I have a geo cluster and an arbitrator, in a
configuration for disaster recovery with failover. There are two main
sites (primary and disaster recovery) and a third site for the arbitrator.

I have defined a ticket named "primary", which determines which site is the
primary and which is the recovery site.
In my first configuration I had a value of 60 for the ticket renewal in
booth.conf. After I assigned the ticket to the primary site, when the
renewal time was reached, the ticket was not renewed and it ended up not
assigned to either site.

So, I increased the value to 120 and now the ticket gets correctly renewed.

I would like to know whether there are any constraints on the minimum
value for the ticket renewal. Is there any design aspect that would
recommend higher values? And in a production environment, where network
latency may be larger, could such a situation occur? What would be a
typical set of timeout values (please note the CIB timeout values)?

My configuration is as follows.

Thanks in advance,
Jorge


/etc/booth/booth.conf:

transport="UDP"
port="6666"
site="192.168.180.211"
site="192.168.190.211"
arbitrator="192.168.200.211"
ticket="primary;120"


crm configure show:
node $id="1084798152" cluster1-node1
primitive booth ocf:pacemaker:booth-site \
meta resource-stickiness="INFINITY" \
op monitor interval="10s" timeout="20s"
primitive booth-ip ocf:heartbeat:IPaddr2 \
params ip="192.168.180.211"
primitive dummy-pgsql ocf:pacemaker:Stateful \
op monitor interval="15" role="Slave" timeout="60s" \
op monitor interval="30" role="Master" timeout="60s"
primitive oversee-ip ocf:heartbeat:IPaddr2 \
params ip="192.168.180.210"
group g-booth booth-ip booth
ms ms_dummy_pqsql dummy-pgsql \
meta target-role="Master" clone-max="1"
order order-booth-oversee-ip inf: g-booth oversee-ip
rsc_ticket ms_dummy_pgsql_primary primary: ms_dummy_pqsql:Master loss-policy=demote
rsc_ticket oversee-ip-req-primary primary: oversee-ip loss-policy=stop
property $id="cib-bootstrap-options" \
dc-version="1.1.10-42f2063" \
cluster-infrastructure="corosync" \
stonith-enabled="false"
Re: Booth ticket renewal timeout
Hi,

On Sun, Feb 08, 2015 at 07:06:13PM +0000, Jorge Lopes wrote:
> Hi all,
>
> I'm performing a lab test where I have a geo cluster and an arbitrator, in a
> configuration for disaster recovery with failover. There are two main
> sites (primary and disaster recovery) and a third site for the arbitrator.
>
> I have defined a ticket named "primary", which determines which site is the
> primary and which is the recovery site.
> In my first configuration I had a value of 60 for the ticket renewal in
> booth.conf. After I assigned the ticket to the primary site, when the
> renewal time was reached, the ticket was not renewed and it ended up not
> assigned to either site.
>
> So, I increased the value to 120 and now the ticket gets correctly renewed.
>
> I would like to know whether there are any constraints on the minimum
> value for the ticket renewal. Is there any design aspect that would
> recommend higher values? And in a production environment, where network
> latency may be larger, could such a situation occur? What would be a
> typical set of timeout values (please note the CIB timeout values)?
>
> My configuration is as follows.

It seems like you're running the older version of booth, which
has been deprecated and is effectively unmaintained. The newer
version is available at
https://github.com/ClusterLabs/booth/releases/tag/v0.2.0

Thanks,

Dejan



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Booth ticket renewal timeout
Hi Dejan,
Thanks for the tip.

Concerning the timeout values, what would be typical ticket renewal
values for a production environment?

Thanks,
Jorge

>It seems like you're running the older version of booth, which
>has been deprecated and is effectively unmaintained. The newer
>version is available at
>https://github.com/ClusterLabs/booth/releases/tag/v0.2.0
>
>Thanks,
>
>Dejan
Re: Booth ticket renewal timeout
On Mon, Feb 09, 2015 at 10:53:00AM +0000, Jorge Lopes wrote:
> Hi Dejan,
> Thanks for the tip.
>
> Concerning the timeout values, what would be typical ticket renewal
> values for a production environment?

We have two parameters: expire and renewal. The latter used to be
set to half of the former; following a user request, it is now
configurable. If not configured, the expire time defaults to 10
minutes, which yields a renewal time of 5 minutes. Those are the
defaults; ultimately, the right values depend on your business needs
and your site failover/disaster recovery procedures, as well as on
connection stability and packet loss rates. I doubt that an expiry
time of less than 1 minute is practical, though testing could be
done with times of less than 10 seconds. The README contains a
description of booth operation which you may find useful:

https://github.com/ClusterLabs/booth/blob/master/README
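[Editor's note: the expire/renewal relationship described above can be sketched as a small model. This is illustrative Python, not booth code; the helper names and the worst-case-delay check are assumptions made for the example, not part of booth itself.]

```python
# Illustrative model of the expire/renewal relationship described above:
# if no renewal interval is configured, it is derived as half of the
# expire time (default expire: 10 minutes = 600 s).

def effective_renewal(expire_s, renewal_s=None):
    """Return the renewal interval in seconds, defaulting to expire/2."""
    return renewal_s if renewal_s is not None else expire_s // 2

def renewal_has_headroom(expire_s, renewal_s=None, worst_case_delay_s=0):
    """Rough sanity check: a renewal attempt plus the worst-case network
    delay must complete before the ticket expires, otherwise the ticket
    is lost (as in the 60 s lab test earlier in this thread)."""
    return effective_renewal(expire_s, renewal_s) + worst_case_delay_s < expire_s

# Defaults: expire = 600 s -> renewal = 300 s
print(effective_renewal(600))                            # 300
# A very tight expiry leaves little headroom for WAN latency or retries:
print(renewal_has_headroom(60, worst_case_delay_s=40))   # False
print(renewal_has_headroom(600, worst_case_delay_s=40))  # True
```

With the default 600 s expiry there is ample slack for renewal retries over a lossy WAN link, while a 60 s expiry (30 s renewal) can be consumed by delays alone, which matches the behaviour reported above.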

Thanks,

Dejan



