Mailing List Archive

Failed-over incomplete
Dear List,

We are using Pacemaker and Corosync with CMAN as our HA software, in
the versions listed below.

OS: CentOS release 6.5 (Final) 64-bit
Pacemaker: pacemaker.x86_64 1.1.10-14.el6_5.3
Corosync: corosync.x86_64 1.4.1-17.el6_5.1
CMAN: cman.x86_64 3.0.12.1-59.el6_5.2
Resource-Agent: resource-agents.x86_64 3.9.5-3.12

Topology: 2 nodes in an Active/Standby model. (MySQL is
Active/Active via a clone.)

All packages were installed from the official CentOS repository, except
for the resource agents, which were installed from the openSUSE repository
(http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/).

The system worked normally for a few months until yesterday morning,
around 03:35 UTC+0700, when we found that one of the resources had gone
into the UNMANAGED state without any configuration change. After another
resource failed, Pacemaker tried to fail the resources over to the other
node, but the failover did not complete once it reached this resource.

The configuration of some of the resources is below, and the log from the
event is in the attached file.

primitive res.vBKN6 IPv6addr \
    params ipv6addr="2001:db8:0:f::61a" cidr_netmask=64 nic=eth0 \
    op monitor interval=10s

primitive res.vDMZ6 IPv6addr \
    params ipv6addr="2001:db8:0:9::61a" cidr_netmask=64 nic=eth1 \
    op monitor interval=10s

group gr.mainService res.vDMZ4 res.vDMZ6 res.vBKN4 res.vBKN6 res.http res.ftp

rsc_defaults rsc_defaults-options: \
    migration-threshold=1

Please help me solve this problem.

--teenigma
Re: Failed-over incomplete
On Thu, Dec 4, 2014 at 4:56 AM, Teerapatr Kittiratanachai
<maillist.tk@gmail.com> wrote:
> The configuration of some of the resources is below, and the log from the
> event is in the attached file.
>

The log covers only the resource monitor failure and the stopping of
resources. It does not contain any events related to starting resources
on the other node.

You would need to collect a crm_report with a start time before the
resource failed and an end time after the resources were started on the
other node.
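
As a rough sketch of that collection step: crm_report takes the start of the period with -f and the end with -t, followed by a destination name. The time window and the destination below are assumptions chosen to bracket the 03:35 failure on Dec 3; adjust them to the actual incident.

```shell
# Hypothetical window around the 03:35 failure; widen it if the start
# attempts on the other node happened later.
from_time="2014-12-03 03:00:00"
to_time="2014-12-03 04:30:00"
dest="/tmp/failover-report"   # hypothetical destination name

if command -v crm_report >/dev/null 2>&1; then
    # -f = start of the period to collect, -t = end of the period
    crm_report -f "$from_time" -t "$to_time" "$dest"
else
    # Not on a cluster node: only show the command that would be run.
    echo "would run: crm_report -f \"$from_time\" -t \"$to_time\" $dest"
fi
```

The resulting archive can then be attached to a follow-up message in place of a hand-cut log excerpt.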

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Failed-over incomplete
Dear Andrei,

Because the failover did not complete, the resources were not failed
over to the other node.

I think this happened because res.vBKN went into the unmanaged state.

But why, given that no configuration was changed?

--teenigma

On Thu, Dec 4, 2014 at 1:41 PM, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> The log covers only the resource monitor failure and the stopping of
> resources. It does not contain any events related to starting resources
> on the other node.
>
> You would need to collect a crm_report with a start time before the
> resource failed and an end time after the resources were started on the
> other node.
Re: Failed-over incomplete
On Thu, Dec 4, 2014 at 9:52 AM, Teerapatr Kittiratanachai
<maillist.tk@gmail.com> wrote:
> Dear Andrei,
>
> Because the failover did not complete, the resources were not failed
> over to the other node.
>
> I think this happened because res.vBKN went into the unmanaged state.
>

There is no resource res.vBKN in the logs or the configuration snippet
you have shown.

> But why, given that no configuration was changed?
>
> --teenigma
Re: Failed-over incomplete
Sorry for my typo,
it's res.vBKN6.

--teenigma

On Thu, Dec 4, 2014 at 4:23 PM, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> There is no resource res.vBKN in the logs or the configuration snippet
> you have shown.
Re: Failed-over incomplete
On Thu, 4 Dec 2014 17:41:39 +0700
Teerapatr Kittiratanachai <maillist.tk@gmail.com> wrote:

> Sorry for my typo,
> it's res.vBKN6.
>

Pacemaker tried to stop res.vBKN6, but the resource agent failed to do so:

Dec 03 03:35:57 [2027] node0.ntt.co.th crmd: notice: process_lrm_event: LRM operation res.vBKN6_stop_0 (call=97, rc=1, cib-update=34, confirmed=true) unknown error

This means Pacemaker cannot start res.vBKN6 anywhere else, at least
not without fencing the node (STONITH) first.
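
A minimal sketch for spotting such failures in a log, assuming its lines follow the format of the one quoted above. The failed_ops helper name is hypothetical, and the sample line is the one from this log; on a real node the input would be the cluster log file instead.

```shell
# Sample line copied from the log excerpt in this thread.
log_line='Dec 03 03:35:57 [2027] node0.ntt.co.th crmd: notice: process_lrm_event: LRM operation res.vBKN6_stop_0 (call=97, rc=1, cib-update=34, confirmed=true) unknown error'

# Hypothetical helper: read log lines on stdin and print
# "<operation> rc=<code>" for every LRM operation whose rc is not 0.
failed_ops() {
    sed -n 's/.*LRM operation \([^ ]*\) (.*rc=\([0-9][0-9]*\),.*/\1 rc=\2/p' |
        grep -v ' rc=0$'
}

result=$(printf '%s\n' "$log_line" | failed_ops)
echo "$result"
```

Failed monitor operations also return a non-zero code, so entries whose name ends in _stop_0 are the ones that correspond to the situation described here.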
