Mailing List Archive

Antw: Re: file system resource becomes inaccesible when any of the node goes down
>>> Muhammad Sharfuddin <M.Sharfuddin@nds.com.pk> wrote on 06.07.2015 at 12:14 in
message <559A550A.8010906@nds.com.pk>:
[...]
> OK, so would reducing the sbd timeout (or msgwait) provide
> uninterrupted access to the ocfs2 file system on the surviving/online node?
> Or would it just minimize the downtime?

It will reduce the time between "writing the reset message for a node" and "the cluster believes the node is down". So you can guess what happens if you set it to some very short time like 1 second...
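
As a rough illustration of this timing, the following Python sketch models the window during which the shared file system is blocked after a node failure. Only msgwait and the figures reported later in this thread (10 s msgwait, about 15 s of inaccessibility) come from the discussion. The names FencingTimeline, detect_s, recovery_s and victim_reaction_s, and the 3 s / 2 s split used in the example, are assumptions made up for illustration; they are not SBD parameters or measured values.

```python
from dataclasses import dataclass


@dataclass
class FencingTimeline:
    """Toy model of one node failure; all fields are in seconds."""
    detect_s: float    # assumption: time until the cluster notices the failure
    msgwait_s: float   # time from writing the reset message to "node is down"
    recovery_s: float  # assumption: file system recovery after fencing

    def blocked_window(self) -> float:
        """Total time during which I/O on the surviving node is paused."""
        return self.detect_s + self.msgwait_s + self.recovery_s


def msgwait_is_safe(msgwait_s: float, victim_reaction_s: float) -> bool:
    """True if the victim has time to reset before it is declared down.

    victim_reaction_s is a hypothetical figure: how long the fenced node
    needs to read the reset message and actually stop.
    """
    return msgwait_s > victim_reaction_s


if __name__ == "__main__":
    # Hypothetical split that happens to add up to the reported 15 seconds.
    timeline = FencingTimeline(detect_s=3.0, msgwait_s=10.0, recovery_s=2.0)
    print(timeline.blocked_window())
    # A 1-second msgwait against a node that needs 5 seconds to stop.
    print(msgwait_is_safe(1.0, victim_reaction_s=5.0))
```

In this model, lowering msgwait shortens the blocked window but never removes it, and a value below the time the fenced node needs to stop would let the cluster treat a still-running node as down, which is presumably the risk hinted at above.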

Regards,
Ulrich



_______________________________________________
Linux-HA mailing list is closing down.
Please subscribe to users@clusterlabs.org instead.
http://clusterlabs.org/mailman/listinfo/users
_______________________________________________
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
Re: Antw: Re: file system resource becomes inaccesible when any of the node goes down
On 07/07/2015 12:14 PM, Ulrich Windl wrote:
>>>> Muhammad Sharfuddin <M.Sharfuddin@nds.com.pk> wrote on 06.07.2015 at 12:14 in
> message <559A550A.8010906@nds.com.pk>:
> [...]
>> OK, so would reducing the sbd timeout (or msgwait) provide
>> uninterrupted access to the ocfs2 file system on the surviving/online node?
>> Or would it just minimize the downtime?
> It will reduce the time between "writing the reset message for a node" and "the cluster believes the node is down". So you can guess what happens if you set it to some very short time like 1 second...
>
> Regards,
> Ulrich
>
The msgwait timeout is now set to 10s, and a delay/inaccessibility of 15
seconds was observed. If a service (app, DB, file server) is installed
on and running from the ocfs2 file system on the surviving/online node,
wouldn't that service crash or go offline because the file system is
inaccessible (even though it's ocfs2) while a member node goes down?


If the cluster is configured to run two independent services and starts
one on node1 and the other on node2, while both services share the same
file system, /sharedata (ocfs2), then when one node fails, the
other/online node won't be able to keep running its service, because the
file system holding the binaries/configuration/service is unavailable
for at least around 15 seconds.

I don't understand the advantage of the OCFS2 file system in such a setup.


--
Regards,

Muhammad Sharfuddin
<http://www.nds.com.pk>
Re: Antw: Re: file system resource becomes inaccesible when any of the node goes down
On 2015-07-07T14:15:14, Muhammad Sharfuddin <M.Sharfuddin@nds.com.pk> wrote:

> The msgwait timeout is now set to 10s, and a delay/inaccessibility of 15
> seconds was observed. If a service (app, DB, file server) is installed
> on and running from the ocfs2 file system on the surviving/online node,
> wouldn't that service crash or go offline because the file system is
> inaccessible (even though it's ocfs2) while a member node goes down?

You're seeing a trade-off of using OCFS2. The semantics of the file
system require all accessing nodes to be very closely synchronized (that
is not optional), and that requires the access to the fs to be paused
during recovery. (See the CAP theorem.)

The apps don't crash, they are simply blocked. (To them it looks like
slow IO.)
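
One way to observe this from the application side is to time a small synchronous write. The sketch below is only an illustration: timed_write is a hypothetical helper, and the path is a placeholder that would need to point at a file on the shared mount. By default it uses a temporary file so that it runs anywhere. During recovery one would expect the call to take much longer rather than raise an error.

```python
import os
import tempfile
import time


def timed_write(path: str, payload: bytes = b"ping\n") -> float:
    """Write payload to path, force it to disk, and return elapsed seconds."""
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # blocks until the data has been flushed to storage
    finally:
        os.close(fd)
    return time.monotonic() - start


if __name__ == "__main__":
    # Placeholder path; replace with a file on the shared mount to probe it.
    probe = os.path.join(tempfile.gettempdir(), "io_probe.tmp")
    print(f"write took {timed_write(probe):.3f}s")
```

Run in a loop while a node is taken down, a probe like this would show the pause as a single slow call rather than a failure.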

The same is true for DRBD in active/active mode; the block device is
tightly synchronized, and this requires both nodes to be up, or cleanly
reported as down.

> If the cluster is configured to run two independent services and starts
> one on node1 and the other on node2, while both services share the same
> file system, /sharedata (ocfs2), then when one node fails, the
> other/online node won't be able to keep running its service, because the
> file system holding the binaries/configuration/service is unavailable
> for at least around 15 seconds.
>
> I don't understand the advantage of the OCFS2 file system in such a setup.

If that's your setup, indeed, you're not getting any advantages. OCFS2
makes sense if you have services that indeed need access to the same
file system and directory structure.

If you have two independent services, or even services that are
essentially node local, you're much better off using independent,
separate file system mounts with XFS or extX.



Regards,
Lars

--
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
