Reinstall Pacemaker/Corosync.

Nov 24, 2015, 2:18 AM

Post #1 of 3 (1549 views)

Hi all,

I searched online but couldn't find a detailed answer. The OS is RHEL 6.5.

Problem:
I have 2 servers that were set up and working fine (a MySQL cluster runs
on them, with DRBD for the data disk on local disk). These 2 servers had to
be migrated to another location. As part of the migration, DRBD had to
change from local disk to a SAN LUN. That change went OK, but the cluster
began behaving strangely. When the 2 nodes are shut down and booted
together, each server sees the other as online via "crm_mon -1". But when
the pacemaker process on one node is restarted, the other node keeps
showing that node as offline/stopped. Even if I reboot that node, it
doesn't rejoin the cluster.

Other observation: if these 2 servers boot up together, both show as online
as described above. When I stop the pacemaker process on the active node,
the other node takes over the resources, which is good. But even after I
start the pacemaker process again on the node where I stopped it, that node
is not able to take the resources back. It's as if only one failover can
happen, with no failback.

What I did:
Removed Pacemaker and Corosync via yum
Rebooted the OS
Verified that no Pacemaker/Corosync packages remained
Reinstalled Pacemaker and Corosync via yum
When I ran "crm_mon -1", I was surprised to see that the configuration was
still there.

After the reinstallation, I still see the same behavior, and I noticed
that DRBD reports its disk as Failed. Only a reboot of the node brings it
back to UpToDate.

Please advise on the correct procedure to wipe out the configuration and
reinstall.

I will share the logs shortly.

Thanks,
Jef

Re: Reinstall Pacemaker/Corosync. [ In reply to ]

emi2fast at gmail

Nov 24, 2015, 2:28 AM

Post #2 of 3 (1533 views)

I don't remember well, but I think on Red Hat 6.5 you need to use
cman+pacemaker. Please share your config, and make sure you have
fencing configured.


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Re: Reinstall Pacemaker/Corosync. [ In reply to ]

kgaillot at redhat

Nov 30, 2015, 2:30 PM

Post #3 of 3 (1488 views)

On 11/24/2015 04:28 AM, emmanuel segura wrote:
> I don't remember well, But I think in Redhat 6.5 you need to use
> cman+pacemaker and please your config and you need to be sure you have
> fencing configured.

Yes, the versions in 6.5 are quite old; 6.7 has recent versions, so if
you can upgrade, that would help. Even 6.6 is significantly newer and
has important bugfixes.

RHEL 6 does use corosync 1, but via CMAN rather than directly.

You can use the pcs command to configure and deconfigure the cluster
(pcs cluster node add/remove for one node, or pcs cluster setup/destroy
for the entire cluster).
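
For reference, here is a minimal sketch of the pcs subcommands mentioned
above, written as a dry run that only prints the commands by default. The
cluster name "mycluster" and the node names "node1"/"node2" are
placeholders, and the --name and --all options are assumed from pcs 0.9.x.
Option syntax differs between pcs versions, so check "pcs cluster --help"
on your system before running anything. Removing the packages does not
necessarily delete the stored cluster configuration, which would explain
why crm_mon still showed it after the reinstall.

```shell
#!/bin/sh
# Dry-run sketch: prints the pcs commands unless DRY_RUN=0 is set.
# Cluster and node names passed in are placeholders, not from this thread.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Run on EVERY node: removes the local cluster configuration.
destroy_local() {
    run pcs cluster destroy
}

# Run on ONE node, after the old configuration is gone everywhere.
# $1 = cluster name, remaining arguments = node names.
# --name and --all are assumed from pcs 0.9.x.
setup_cluster() {
    name="$1"
    shift
    run pcs cluster setup --name "$name" "$@"
    run pcs cluster start --all
}

# Single-node alternative: remove one node and add it back.
# Run from a node that stays in the cluster. $1 = node name.
readd_node() {
    run pcs cluster node remove "$1"
    run pcs cluster node add "$1"
}
```

Sourcing this file and calling "setup_cluster mycluster node1 node2" prints
the commands it would run. Setting DRY_RUN=0 executes them instead, so
review the printed commands first. Resources and fencing still have to be
configured again after a full destroy/setup.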
