Mailing List Archive

Antw: Re: Why does o2cb RA remove module ocfs2?
>>> Lars Marowsky-Bree <lmb@suse.com> wrote on 05.02.2014 at 12:36 in message
<20140205113649.GN13514@suse.de>:
> On 2014-02-05T12:24:00, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
>
>> I had a problem where "O2CB stop" fenced the node that was shut down:
>> I had updated the kernel, and then rebooted. As part of shutdown, the
>> cluster stack was stopped. In turn, the "O2CB" resource was stopped.
>> Unfortunately this caused an error like (SLES11 SP3):
>>
>> ---
>> modprobe: FATAL: Could not load /lib/modules/3.0.101-0.8-xen/modules.dep: No such file or directory
>> o2cb(prm_O2CB)[19908]: ERROR: Unable to unload module: ocfs2
>> ---
>>
>> This in turn caused a node fence, which ruined the clean reboot.
>>
>> So why is the RA messing with the kernel module on stop?
>
> Because customers complained about the new module not being picked up if
> they upgrade ocfs2-kmp and restarted the cluster stack on a node. It's
> incredibly hard to please everyone, alas ...

I think the proper way would be this:
Stop your OCFS2 resources, rmmod the module, [modprobe the module to re-insert
the new version], start your OCFS2 resources.
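
A minimal sketch of that manual sequence, assuming the OCFS2 file systems are
managed through crmsh and that a single resource (or clone) name covers them.
The resource name is a placeholder and the helper names are made up for
illustration. By default the commands are only printed, because "crm resource
stop" may return before the resource has actually stopped, and rmmod fails
while the module is still in use.

```shell
# Sketch only: prints the commands by default (DRY_RUN=1); set DRY_RUN=0
# to execute them. The resource name passed in is a placeholder.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reload_ocfs2_module() {
    res=$1
    [ -n "$res" ] || { echo "usage: reload_ocfs2_module RESOURCE" >&2; return 2; }
    run crm resource stop "$res" &&   # stop whatever mounts OCFS2 first
    run rmmod ocfs2 &&                # fails while the module is still in use
    run modprobe ocfs2 &&             # loads the currently installed module
    run crm resource start "$res"
}
```

Calling "reload_ocfs2_module my_ocfs2_clone" prints the four steps in order;
with DRY_RUN=0 the chain stops at the first command that fails.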

I guess the kernel update is more common than the "just the ocfs2-kmp update".

>
> The right way to update a cluster node is anyway this one:
>
> 1. Stop the cluster stack
> 2. Update/upgrade/reboot as needed
> 3. Restart the cluster stack
>
> This would avoid this error too. Or keeping multiple kernel versions in
> parallel (which also helps if a kernel update no longer boots for some
> reason). Removing the running kernel package is usually not a great
> idea; I prefer to remove them after having successfully rebooted only,
> because you *never* know if you may have to reload a module.
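
The modprobe failure quoted above came from a missing modules.dep for the
running kernel. A small sketch of a pre-shutdown check for that condition
follows. The function name and the optional second argument (an alternative
module root, only there so the check can be tried against a scratch directory)
are assumptions for illustration, not part of any resource agent.

```shell
# Return 0 if the module index for the given kernel release still exists.
# Defaults: the running kernel (uname -r) and the standard /lib/modules root.
running_kernel_modules_present() {
    release=${1:-$(uname -r)}
    modroot=${2:-/lib/modules}
    [ -f "$modroot/$release/modules.dep" ]
}
```

Run without arguments before stopping the cluster stack; a non-zero exit status
means modprobe (including "modprobe -r") cannot resolve modules for the running
kernel any more.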

There's another way: (Like HP-UX learned to do it): Defer changes to the
running kernel until shutdown/reboot.

>
>
> Regards,
> Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems


Re: Antw: Re: Why does o2cb RA remove module ocfs2?
On 2014-02-05T15:06:47, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:

> I guess the kernel update is more common than the "just the ocfs2-kmp update"

Well, some customers do apply updates in the recommended way, and thus
don't encounter this ;-) In any case, since at this time the cluster
services are already stopped, at least the service impact is minimal.

> > This would avoid this error too. Or keeping multiple kernel versions in
> > parallel (which also helps if a kernel update no longer boots for some
> > reason). Removing the running kernel package is usually not a great
> > idea; I prefer to remove them after having successfully rebooted only,
> > because you *never* know if you may have to reload a module.
>
> There's another way: (Like HP-UX learned to do it): Defer changes to the
> running kernel until shutdown/reboot.

True. Hence: activate multi-versions for the kernel in
/etc/zypp/zypp.conf and only remove the old kernel after the reboot. I
do that manually, but I do think we even have a script for that
somewhere. I honestly don't remember where though; I like to keep
several kernels around for testing anyway.

I think this is the default going forward, but as always: zypper gained
this ability during the SLE 11 cycle, and we couldn't just change
existing behaviour in a simple update, it has to be manually
activated.
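
A sketch of how one might check whether that setting is active, assuming the
usual zypp.conf syntax in which the option ships commented out as
"# multiversion = provides:multiversion(kernel)". The function name and the
optional file argument are illustrative only.

```shell
# Return 0 if the given zypp.conf-style file has an active (uncommented)
# "multiversion" line that covers kernel packages.
kernel_multiversion_enabled() {
    conf=${1:-/etc/zypp/zypp.conf}
    grep -Eq '^[[:space:]]*multiversion[[:space:]]*=.*provides:multiversion\(kernel\)' "$conf"
}
```

A commented-out line does not count, so the check only succeeds once the
option has been enabled by hand.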


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

Re: Antw: Re: Why does o2cb RA remove module ocfs2?
>>> Lars Marowsky-Bree <lmb@suse.com> wrote on 05.02.2014 at 15:11 in message
<20140205141140.GU13514@suse.de>:
> On 2014-02-05T15:06:47, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
>
>> I guess the kernel update is more common than the "just the ocfs2-kmp update"
>
> Well, some customers do apply updates in the recommended way, and thus
> don't encounter this ;-) In any case, since at this time the cluster
> services are already stopped, at least the service impact is minimal.
>
>> > This would avoid this error too. Or keeping multiple kernel versions in
>> > parallel (which also helps if a kernel update no longer boots for some
>> > reason). Removing the running kernel package is usually not a great
>> > idea; I prefer to remove them after having successfully rebooted only,
>> > because you *never* know if you may have to reload a module.
>>
>> There's another way: (Like HP-UX learned to do it): Defer changes to the
>> running kernel until shutdown/reboot.
>
> True. Hence: activate multi-versions for the kernel in
> /etc/zypp/zypp.conf and only remove the old kernel after the reboot. I
> do that manually, but I do think we even have a script for that
> somewhere. I honestly don't remember where though; I like to keep
> several kernels around for testing anyway.
>
> I think this is the default going forward, but as always: zypper gained
> this ability during the SLE 11 cycle, and we couldn't just change
> existing behaviour in a simple update, it has to be manually
> activated.

I did a quick check: It seems only "ocf:ocfs2:o2cb" does such (IMHO) nonsense
like removing a module on stop (I can guess it's a leftover from o2cb module
hacking when the developer was too lazy to remove the module by hand when
wanting to try a newer version):
--
# egrep 'modprobe|rmmod' /usr/lib/ocf/resource.d/*/*
/usr/lib/ocf/resource.d/heartbeat/drbd: do_cmd modprobe -s drbd `$DRBDADM sh-mod-parms` || {
/usr/lib/ocf/resource.d/heartbeat/iface-vlan: error="$(modprobe 8021q 2>&1)"
/usr/lib/ocf/resource.d/linbit/drbd: do_cmd modprobe -s drbd `$DRBDADM sh-mod-parms` || {
/usr/lib/ocf/resource.d/ocfs2/o2cb: modprobe -rs "$FSNAME"
/usr/lib/ocf/resource.d/ocfs2/o2cb: modprobe -rs "$MODNAME"
/usr/lib/ocf/resource.d/ocfs2/o2cb: modprobe -s ocfs2_stackglue
/usr/lib/ocf/resource.d/ocfs2/o2cb: modprobe -s ocfs2_stack_user
/usr/lib/ocf/resource.d/ocfs2/o2cb: modprobe -s ocfs2
/usr/lib/ocf/resource.d/pacemaker/controld: modprobe configfs
/usr/lib/ocf/resource.d/pacemaker/controld: modprobe dlm
--
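
The listing above mixes module loads and removals. A sketch of a narrower
search that reports only removals (rmmod, or modprobe with -r/--remove)
follows. The function name is made up, the pattern is a heuristic, and
"grep -r" is assumed to be available.

```shell
# List lines in resource agents that unload kernel modules.
# Matches "rmmod", "modprobe --remove", and short-option clusters that
# include r (for example "modprobe -rs"); plain "modprobe -s" is not matched.
list_module_removals() {
    dir=${1:-/usr/lib/ocf/resource.d}
    grep -rEn 'rmmod|modprobe[[:space:]]+(-[[:alpha:]]*r|--remove)' "$dir"
}
```

Each hit is printed as path:line:text; the function returns non-zero when no
agent under the directory removes a module.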

Regards,
Ulrich




Re: Antw: Re: Why does o2cb RA remove module ocfs2?
On 11 Feb 2014, at 10:38 pm, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:

>>>> Lars Marowsky-Bree <lmb@suse.com> wrote on 05.02.2014 at 15:11 in message
> <20140205141140.GU13514@suse.de>:
>> On 2014-02-05T15:06:47, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
>>
>>> I guess the kernel update is more common than the "just the ocfs2-kmp update"
>>
>> Well, some customers do apply updates in the recommended way, and thus
>> don't encounter this ;-) In any case, since at this time the cluster
>> services are already stopped, at least the service impact is minimal.
>>
>>>> This would avoid this error too. Or keeping multiple kernel versions in
>>>> parallel (which also helps if a kernel update no longer boots for some
>>>> reason). Removing the running kernel package is usually not a great
>>>> idea; I prefer to remove them after having successfully rebooted only,
>>>> because you *never* know if you may have to reload a module.
>>>
>>> There's another way: (Like HP-UX learned to do it): Defer changes to the
>>> running kernel until shutdown/reboot.
>>
>> True. Hence: activate multi-versions for the kernel in
>> /etc/zypp/zypp.conf and only remove the old kernel after the reboot. I
>> do that manually, but I do think we even have a script for that
>> somewhere. I honestly don't remember where though; I like to keep
>> several kernels around for testing anyway.
>>
>> I think this is the default going forward, but as always: zypper gained
>> this ability during the SLE 11 cycle, and we couldn't just change
>> existing behaviour in a simple update, it has to be manually
>> activated.
>
> I did a quick check: It seems only "ocf:ocfs2:o2cb" does such (IMHO) nonsense
> like removing a module on stop (I can guess it's a leftover from o2cb module
> hacking when the developer was too lazy to remove the module by hand when
> wanting to try a newer version):

seems pretty reasonable to me.
stop == remove all trace of the active resource.

> --
> # egrep 'modprobe|rmmod' /usr/lib/ocf/resource.d/*/*
> /usr/lib/ocf/resource.d/heartbeat/drbd: do_cmd modprobe -s drbd `$DRBDADM sh-mod-parms` || {
> /usr/lib/ocf/resource.d/heartbeat/iface-vlan: error="$(modprobe 8021q 2>&1)"
> /usr/lib/ocf/resource.d/linbit/drbd: do_cmd modprobe -s drbd `$DRBDADM sh-mod-parms` || {
> /usr/lib/ocf/resource.d/ocfs2/o2cb: modprobe -rs "$FSNAME"
> /usr/lib/ocf/resource.d/ocfs2/o2cb: modprobe -rs "$MODNAME"
> /usr/lib/ocf/resource.d/ocfs2/o2cb: modprobe -s ocfs2_stackglue
> /usr/lib/ocf/resource.d/ocfs2/o2cb: modprobe -s ocfs2_stack_user
> /usr/lib/ocf/resource.d/ocfs2/o2cb: modprobe -s ocfs2
> /usr/lib/ocf/resource.d/pacemaker/controld: modprobe configfs
> /usr/lib/ocf/resource.d/pacemaker/controld: modprobe dlm
> --
>
> Regards,
> Ulrich
>
>
Re: Antw: Re: Why does o2cb RA remove module ocfs2?
>>> Andrew Beekhof <andrew@beekhof.net> wrote on 17.02.2014 at 02:33 in message
<7619A7E9-F006-4098-90F9-5C5B8BC84FB0@beekhof.net>:

> On 11 Feb 2014, at 10:38 pm, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:

[...]
>> I did a quick check: It seems only "ocf:ocfs2:o2cb" does such (IMHO) nonsense
>> like removing a module on stop (I can guess it's a leftover from o2cb module
>> hacking when the developer was too lazy to remove the module by hand when
>> wanting to try a newer version):
>
> seems pretty reasonable to me.
> stop == remove all trace of the active resource.
>
[...]

But why doesn't the LVM RA try to remove the lvm module, and why doesn't the NFS RA try to remove the nfs module, etc. then?

Regards,
Ulrich


Re: Antw: Re: Why does o2cb RA remove module ocfs2?
On 17 Feb 2014, at 6:39 pm, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:

>>>> Andrew Beekhof <andrew@beekhof.net> wrote on 17.02.2014 at 02:33 in message
> <7619A7E9-F006-4098-90F9-5C5B8BC84FB0@beekhof.net>:
>
>> On 11 Feb 2014, at 10:38 pm, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
>
> [...]
>>> I did a quick check: It seems only "ocf:ocfs2:o2cb" does such (IMHO) nonsense
>>> like removing a module on stop (I can guess it's a leftover from o2cb module
>>> hacking when the developer was too lazy to remove the module by hand when
>>> wanting to try a newer version):
>>
>> seems pretty reasonable to me.
>> stop == remove all trace of the active resource.
>>
> [...]
>
> But why doesn't the LVM RA try to remove the lvm module, and why doesn't the NFS RA try to remove the nfs module, etc. then?

No idea, I didn't write those either. Perhaps they should.