Mailing List Archive: Re: resources don't migrate on failure of one node (in a two node cluster)

I was led to believe that for a hot cluster without STONITH (using DRBD-backed NFS resources), the best way to ensure failover is to establish quorum with a third server, whether or not that server hosts resources. I had trouble getting robust failover myself and am moving on to a 3-node, DRBD9-backed cluster.
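
To make the quorum point concrete, here is a minimal sketch of majority-quorum arithmetic. It assumes one vote per node and a strict-majority rule; the function names quorum_votes and has_quorum are made up for illustration and are not part of any cluster tool.

```shell
#!/bin/sh
# Votes needed for a strict majority, assuming one vote per node.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Prints "yes" if the surviving nodes still form a majority, "no" otherwise.
# $1 = total nodes in the cluster, $2 = nodes still alive
has_quorum() {
    total=$1
    alive=$2
    if [ "$alive" -ge "$(quorum_votes "$total")" ]; then
        echo yes
    else
        echo no
    fi
}

echo "2 nodes, 1 alive: $(has_quorum 2 1)"
echo "3 nodes, 2 alive: $(has_quorum 3 2)"
```

Under these assumptions a 2-node cluster that loses one node is left with 1 of the 2 votes it needs, while a 3-node cluster that loses one node still holds 2 of 2. That is the reasoning behind adding a third server.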

JR <botemout@gmail.com> wrote:

>Greetings,
>
>I have a 2 node test cluster. It exposes a single resource, an NFS
>server which exports a single directory. I'm able to do:
>
>crm resource move <resource_name>
>
>and that works but if I do:
>
>pkill -9 'corosync|pacemaker'
>
>the resource doesn't migrate.
>
>I've been told by folks on the linux-ha IRC that fencing is my answer
>and I've put in place the null fence client. I understand that this is
>not what I'd want in production, but for my testing it seems to be the
>correct way to test a cluster. I've confirmed in the good server's logs
>that it believes it has successfully fenced its partner:
>
>notice: log_operation: Operation 'reboot' [24621] (call 0 from
>crmd.22546) for host 'nebula04' with device 'st-null' returned: 0 (OK)
>
>Am I mistaken that the stonith:null resource agent should allow the
>system to believe that the "failed" server has been fenced and that it
>is, therefore, safe to migrate the resources? Note that the script that
>issues the pkill also stops the resources (so there aren't 2 VIPs, etc.).
>
>Thanks much for any insight.
>
>JR
>_______________________________________________
>Linux-HA mailing list
>Linux-HA@lists.linux-ha.org
>http://lists.linux-ha.org/mailman/listinfo/linux-ha
>See also: http://linux-ha.org/ReportingProblems
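
For reference, the null-fencing test setup described in the quoted message might look roughly like the following in crm configure syntax. This is a sketch under assumptions, not the poster's actual configuration: st-null and nebula04 come from the quoted log line, nebula03 is a hypothetical name for the other node, and the no-quorum-policy line reflects a commonly cited cause of this symptom in two-node clusters (the survivor loses quorum and does not start resources) that has not been confirmed in this thread. The script only prints the configuration; it changes nothing.

```shell
#!/bin/sh
# Print a candidate test-only configuration in crm configure syntax.
# Nothing is applied to a cluster; review the output before using it.
print_test_fencing_config() {
    cat <<'EOF'
primitive st-null stonith:null params hostlist="nebula03 nebula04"
property stonith-enabled=true
property no-quorum-policy=ignore
EOF
}
# Assumptions: nebula03 is a hypothetical peer name; no-quorum-policy=ignore
# is an unconfirmed guess at the cause and is for test clusters only.

print_test_fencing_config
```

The printed lines can be reviewed and then entered at the crm configure prompt on a disposable test cluster. As the original message already notes, stonith:null does not actually fence anything, so none of this belongs in production.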