Mailing List Archive

Why does o2cb RA remove module ocfs2?
Hi!

I had a problem where "O2CB stop" fenced the node that was shut down:
I had updated the kernel, and then rebooted. As part of shutdown, the cluster stack was stopped. In turn, the "O2CB" resource was stopped.
Unfortunately this caused an error like (SLES11 SP3):

---
modprobe: FATAL: Could not load /lib/modules/3.0.101-0.8-xen/modules.dep: No such file or directory
o2cb(prm_O2CB)[19908]: ERROR: Unable to unload module: ocfs2
---

This in turn caused a node fence, which ruined the clean reboot.

So why is the RA messing with the kernel module on stop?

Regards,
Ulrich


_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: Why does o2cb RA remove module ocfs2?
On 2014-02-05T12:24:00, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:

> I had a problem where "O2CB stop" fenced the node that was shut down:
> I had updated the kernel, and then rebooted. As part of shutdown, the cluster stack was stopped. In turn, the "O2CB" resource was stopped.
> Unfortunately this caused an error like (SLES11 SP3):
>
> ---
> modprobe: FATAL: Could not load /lib/modules/3.0.101-0.8-xen/modules.dep: No such file or directory
> o2cb(prm_O2CB)[19908]: ERROR: Unable to unload module: ocfs2
> ---
>
> This in turn caused a node fence, which ruined the clean reboot.
>
> So why is the RA messing with the kernel module on stop?

Because customers complained that the new module was not picked up when
they upgraded ocfs2-kmp and restarted the cluster stack on a node. It's
incredibly hard to please everyone, alas ...

In any case, the right way to update a cluster node is this:

1. Stop the cluster stack
2. Update/upgrade/reboot as needed
3. Restart the cluster stack

This would avoid the error too, as would keeping multiple kernel
versions installed in parallel (which also helps if a kernel update no
longer boots for some reason). Removing the running kernel package is
usually not a great idea; I prefer to remove old kernels only after a
successful reboot, because you *never* know if you may have to reload
a module.
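
A quick pre-shutdown check along these lines might catch the situation
before the RA runs into it. This is only a sketch: it assumes modprobe
needs /lib/modules/<release>/modules.dep for the running kernel, as the
error message above suggests, and the function names are made up for
illustration, not part of any cluster tool.

```python
import os
import platform


def modules_dep_path(release, root="/"):
    """Build the modules.dep path for a kernel release (hypothetical helper)."""
    return os.path.join(root, "lib", "modules", release, "modules.dep")


def running_kernel_modules_present(release=None, root="/"):
    """Return True if modules.dep for the given (or running) kernel exists."""
    if release is None:
        # platform.release() returns the same string as `uname -r`.
        release = platform.release()
    return os.path.isfile(modules_dep_path(release, root))


if __name__ == "__main__":
    release = platform.release()
    path = modules_dep_path(release)
    if running_kernel_modules_present(release):
        print("ok: %s exists" % path)
    else:
        # Assumption: without this file, modprobe cannot load or unload
        # modules for the running kernel, so stopping O2CB may fail.
        print("warning: %s is missing" % path)
```

If the check warns, the module directory of the running kernel is gone,
and it is probably safer to restore it before stopping the cluster
stack on that node.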


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
