
resources don't migrate on failure of one node (in a two node cluster)
Greetings,

I have a 2 node test cluster. It exposes a single resource, an NFS
server which exports a single directory. I'm able to do:

crm resource move <resource_name>

and that works, but if I do:

pkill -9 'corosync|pacemaker'

the resource doesn't migrate.
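For anyone reproducing this, one way to watch what the surviving node thinks is happening during the kill test is with the status tools that ship with pacemaker (the resource name below is hypothetical):

```shell
# On the surviving node, take a one-shot snapshot of cluster state
# while corosync/pacemaker are killed on the partner:
crm_mon -1

# Or check a single resource (name is a placeholder for your NFS resource):
crm resource status p_nfsserver

# If fencing succeeded, the killed node should show as OFFLINE and the
# resource should (eventually) be started on the survivor.
```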

I've been told by folks on the linux-ha IRC channel that fencing is the
answer, so I've put the null fence client in place. I understand this is
not what I'd want in production, but for testing it seems to be the
correct way to exercise the cluster. I've confirmed in the surviving
server's logs that it believes it has successfully fenced its partner:

notice: log_operation: Operation 'reboot' [24621] (call 0 from
crmd.22546) for host 'nebula04' with device 'st-null' returned: 0 (OK)

Am I mistaken in thinking that the stonith:null resource agent should let
the system believe the "failed" server has been fenced and that it is
therefore safe to migrate the resources? Note that the script that
issues the pkill also stops the resources (so there aren't 2 VIPs, etc.).
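For context, a null fencing device like the one in the log above is typically configured along these lines in crmsh (the partner hostname here is a guess; only 'nebula04' appears in the original log, and exact syntax may vary by crmsh version):

```shell
# Hypothetical crmsh setup for the stonith:null test device.
# stonith:null claims it can fence the hosts in 'hostlist' and always
# reports success -- useful only for testing, never for production.
crm configure primitive st-null stonith:null \
    params hostlist="nebula03 nebula04" \
    op monitor interval=60s

# Fencing must also be enabled cluster-wide for it to be used:
crm configure property stonith-enabled=true
```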

Thanks much for any insight.

JR
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: resources don't migrate on failure of one node (in a two node cluster)
On 2014-02-22T13:49:40, JR <botemout@gmail.com> wrote:

> I've been told by folks on the linux-ha IRC that fencing is my answer
> and I've put in place the null fence client. I understand that this is
> not what I'd want in production, but for my testing it seems to be the
> correct way to test a cluster. I've confirmed in the good server's logs
> that it believes it has successfully fenced its partner

Well, as long as you never do this for production, yes, I guess this
should work.

But I doubt anyone has tested whether the "null" stonith agent really
works. Perhaps it is, in fact, too fast and you're hitting a strange
race. Or perhaps something else is going wrong, such as no-quorum-policy
not being set properly. It's impossible to tell without your
configuration or logs, which always hold the answers ;-)
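(Editorial note on that last point: in a two-node cluster, the survivor loses quorum the moment its partner dies, so by default it will refuse to run resources even after a successful fence. A common sketch for test setups, using crmsh syntax, is:)

```shell
# Tell pacemaker to keep running resources even without quorum.
# This is the usual setting for legacy two-node test clusters; newer
# corosync versions offer the votequorum 'two_node' option instead.
crm configure property no-quorum-policy=ignore
```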


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
