Mailing List Archive

Managed Failovers w/ NFS HA Cluster
I feel like this is something that must have been covered extensively already, but I've done a lot of googling and looked at a lot of cluster configs and have not found a solution.

I have an HA NFS cluster (corosync+pacemaker). The relevant rpms are listed below, but I'm not sure they matter much to the question, which is this...

When performing a managed failover of the NFS-exported file system resource from one node to the other (crm resource move), any active NFS clients experience an I/O error when the file system is unexported. The file system has to be unexported before it can be unmounted, and as soon as it is unexported, clients can no longer write to it and get an I/O error (rather than just blocking).

In a failure scenario this is not a problem because the file system is never unexported on the primary server. Rather, the server simply goes down, the secondary takes over the resources, and client I/O blocks until the takeover is complete and then goes about its business. We would like this same behavior for a *managed* failover but have not found a mount or export option/scenario that provides it. Is it possible? What am I missing?
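
For concreteness, by a managed failover I mean nothing more exotic than moving the whole group and then clearing the constraint, something along these lines:

# Move the export group to the standby node, then clear the location
# constraint so it is free to move back later.
crm resource move grp_b3v0 biostor4.ufhpc
crm resource unmove grp_b3v0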

I realize this is more of an nfs/exportfs question but I would think that those implementing NFS HA clusters would be familiar with the scenario I'm describing.

Regards,

Charlie Taylor

pacemaker-cluster-libs-1.1.7-6.el6.x86_64
pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
resource-agents-3.9.2-40.el6.x86_64
fence-agents-3.1.5-35.el6.x86_64

Red Hat Enterprise Linux Server release 6.3 (Santiago)

Linux biostor3.ufhpc 2.6.32-279.19.1.el6.x86_64 #1 SMP Sat Nov 24 14:35:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@biostor4 bs34]# crm status
============
Last updated: Thu Jul 17 10:55:04 2014
Last change: Thu Jul 17 07:59:47 2014 via crmd on biostor3.ufhpc
Stack: openais
Current DC: biostor3.ufhpc - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
20 Resources configured.
============

Online: [ biostor3.ufhpc biostor4.ufhpc ]

Resource Group: grp_b3v0
    vg_b3v0   (ocf::heartbeat:LVM):        Started biostor3.ufhpc
    fs_b3v0   (ocf::heartbeat:Filesystem): Started biostor3.ufhpc
    ip_vbio3  (ocf::heartbeat:IPaddr2):    Started biostor3.ufhpc
    ex_b3v0_1 (ocf::heartbeat:exportfs):   Started biostor3.ufhpc
    ex_b3v0_2 (ocf::heartbeat:exportfs):   Started biostor3.ufhpc
    ex_b3v0_3 (ocf::heartbeat:exportfs):   Started biostor3.ufhpc
    ex_b3v0_4 (ocf::heartbeat:exportfs):   Started biostor3.ufhpc
    ex_b3v0_5 (ocf::heartbeat:exportfs):   Started biostor3.ufhpc
Resource Group: grp_b4v0
    vg_b4v0   (ocf::heartbeat:LVM):        Started biostor4.ufhpc
    fs_b4v0   (ocf::heartbeat:Filesystem): Started biostor4.ufhpc
    ip_vbio4  (ocf::heartbeat:IPaddr2):    Started biostor4.ufhpc
    ex_b4v0_1 (ocf::heartbeat:exportfs):   Started biostor4.ufhpc
    ex_b4v0_2 (ocf::heartbeat:exportfs):   Started biostor4.ufhpc
    ex_b4v0_3 (ocf::heartbeat:exportfs):   Started biostor4.ufhpc
    ex_b4v0_4 (ocf::heartbeat:exportfs):   Started biostor4.ufhpc
    ex_b4v0_5 (ocf::heartbeat:exportfs):   Started biostor4.ufhpc
st_bio3 (stonith:fence_ipmilan): Started biostor4.ufhpc
st_bio4 (stonith:fence_ipmilan): Started biostor3.ufhpc
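
For completeness, the configuration behind grp_b3v0 is essentially the following (grp_b4v0 is analogous); the device, directory, address, and export values are sanitized placeholders rather than the literal CIB:

# crm configure sketch -- all parameter values below are placeholders
primitive vg_b3v0 ocf:heartbeat:LVM \
        params volgrpname="vg_b3v0"
primitive fs_b3v0 ocf:heartbeat:Filesystem \
        params device="/dev/vg_b3v0/lv0" directory="/b3v0" fstype="ext4"
primitive ip_vbio3 ocf:heartbeat:IPaddr2 \
        params ip="10.13.16.30" cidr_netmask="24"
primitive ex_b3v0_1 ocf:heartbeat:exportfs \
        params directory="/b3v0/sub1" clientspec="10.13.0.0/16" \
        options="rw,no_root_squash" fsid="31"
# ex_b3v0_2 through ex_b3v0_5 are defined the same way for the other exports
group grp_b3v0 vg_b3v0 fs_b3v0 ip_vbio3 \
        ex_b3v0_1 ex_b3v0_2 ex_b3v0_3 ex_b3v0_4 ex_b3v0_5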



Re: Managed Failovers w/ NFS HA Cluster
On 7/17/2014 10:24 AM, Charles Taylor wrote:
> When performing a managed failover of the NFS-exported file system
> resource from one node to the other (crm resource move), any active NFS
> clients experience an I/O error when the file system is unexported. The
> file system has to be unexported before it can be unmounted, and as soon
> as it is unexported, clients can no longer write to it and get an I/O
> error (rather than just blocking).

FWIW that doesn't happen here w/ nfs v3, /var/lib/nfs on drbd, heartbeat
in r1 mode on the servers, and automounter on the clients.
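
In rough terms it is a single haresources line, with /var/lib/nfs symlinked onto the DRBD-backed file system so lock/state data follows the export. Something like this, where the node name, device, paths, and address are placeholders:

# /etc/ha.d/haresources (heartbeat R1 mode) -- placeholder names/paths/address
# nfslock/nfs are the EL init scripts; Debian-ish distros use nfs-kernel-server
nfs1 drbddisk::r0 Filesystem::/dev/drbd0::/srv/nfs::ext3 IPaddr::192.168.1.10/24/eth0 nfslock nfs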

Dima

Re: Managed Failovers w/ NFS HA Cluster
----- Original Message -----
> I feel like this is something that must have been covered extensively
> already, but I've done a lot of googling and looked at a lot of cluster
> configs and have not found a solution.
>
> I have an HA NFS cluster (corosync+pacemaker). The relevant rpms are
> listed below, but I'm not sure they matter much to the question, which
> is this...
>
> When performing a managed failover of the NFS-exported file system
> resource from one node to the other (crm resource move), any active NFS
> clients experience an I/O error when the file system is unexported. The
> file system has to be unexported before it can be unmounted, and as soon
> as it is unexported, clients can no longer write to it and get an I/O
> error (rather than just blocking).
>
> In a failure scenario this is not a problem because the file system is
> never unexported on the primary server. Rather, the server simply goes
> down, the secondary takes over the resources, and client I/O blocks
> until the takeover is complete and then goes about its business. We
> would like this same behavior for a *managed* failover but have not
> found a mount or export option/scenario that provides it. Is it
> possible? What am I missing?
>
> I realize this is more of an nfs/exportfs question but I would think
> that those implementing NFS HA clusters would be familiar with the
> scenario I'm describing.

Read these.

NFS Active/Passive
https://github.com/davidvossel/phd/blob/master/doc/presentations/nfs-ap-overview.pdf?raw=true

NFS Active/Active
https://github.com/davidvossel/phd/blob/master/doc/presentations/nfs-aa-overview.pdf?raw=true

Note that the nfsnotify and nfsserver agents have had a lot of work done
to them upstream in the last month or two. Depending on what distro you
are using, you may benefit from using the latest upstream agents (if
RHEL-based, definitely use the upstream agents).
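
To give a feel for where those two agents fit, the active/passive layout in the first overview boils down to roughly this in crm syntax (reusing the resource names from your status output; the state directory and source hostname are placeholders, and the PDF has the full ordering and options):

# nfsserver runs the NFS daemons and keeps their state dir on the shared fs
primitive nfs_daemon ocf:heartbeat:nfsserver \
        params nfs_shared_infodir="/b3v0/nfsinfo"
# nfsnotify sends SM_NOTIFY reboot notifications from the floating name/IP
# after a takeover so NFSv3 clients reclaim their locks
primitive nfs_notify ocf:heartbeat:nfsnotify \
        params source_host="vbio3"
# NFS server before the exports; floating IP and the notification last
group grp_b3v0 vg_b3v0 fs_b3v0 nfs_daemon \
        ex_b3v0_1 ex_b3v0_2 ex_b3v0_3 ex_b3v0_4 ex_b3v0_5 \
        ip_vbio3 nfs_notify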


-- Vossel


_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems