Mailing List Archive

How to tell pacemaker to process a new event during a long-running resource operation
Hi all,

I have a resource whose start operation can, in special cases, run for
a very long time.

If a new event arrives (like switching a standby node back to online)
while a transition is already running (the cluster is still in
S_TRANSITION_ENGINE), I would like the cluster to process it as soon
as possible, and not only after the other resource has come up.

Is that possible? I already tried batch-limit, but I guess that only
makes actions run in parallel within a combined transition, right?

Thanks in advance
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: How to tell pacemaker to process a new event during a long-running resource operation
----- Original Message -----
> From: "Maloja01" <maloja01@arcor.de>
> To: "Linux-HA" <linux-ha@lists.linux-ha.org>
> Sent: Friday, March 14, 2014 5:32:34 AM
> Subject: [Linux-HA] How to tell pacemaker to process a new event during a long-running resource operation
>
> Hi all,
>
> I have a resource which could in special cases have a very long-running
> start operation.

In-flight operations always have to complete before we can process a new transition. The only way to transition earlier is to kill the in-flight process, which results in failure recovery and possibly fencing, depending on which operation it is.

There's really nothing that can be done to speed this up except work on lowering the startup time of that resource.
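
To make the behaviour above concrete, here is a minimal toy model in Python. It is not Pacemaker code: the class `ToyCluster`, its methods, and the event and action names are all hypothetical, chosen only to show that a new event is recorded immediately but acted on only once every in-flight operation has finished.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model only; names and structure are assumptions, not Pacemaker internals."""
    in_flight: dict = field(default_factory=dict)      # action name -> seconds remaining
    pending_events: list = field(default_factory=list)
    log: list = field(default_factory=list)            # (time, message) tuples
    clock: int = 0

    def start_action(self, name, duration):
        # Begin a long-running operation as part of the current transition.
        self.in_flight[name] = duration
        self.log.append((self.clock, f"begin {name}"))

    def new_event(self, event):
        # The event is queued right away, but only processed if nothing is in flight.
        self.pending_events.append(event)
        self.log.append((self.clock, f"queued {event}"))
        self._maybe_process()

    def tick(self, seconds=1):
        # Advance time and retire any operation that has run to completion.
        self.clock += seconds
        for name in list(self.in_flight):
            self.in_flight[name] -= seconds
            if self.in_flight[name] <= 0:
                del self.in_flight[name]
                self.log.append((self.clock, f"done {name}"))
        self._maybe_process()

    def _maybe_process(self):
        # A new transition is computed only when no operation is still running.
        if self.in_flight:
            return
        while self.pending_events:
            event = self.pending_events.pop(0)
            self.log.append((self.clock, f"processed {event}"))


cluster = ToyCluster()
cluster.start_action("slow_start", 600)   # hypothetical 600-second start
cluster.tick(10)
cluster.new_event("node2 online")         # arrives 10 seconds in, stays queued
cluster.tick(590)                         # start finishes, event is processed
```

In this sketch the "node2 online" event sits in `pending_events` from second 10 until the slow start completes at second 600, which is the waiting time the original question is about.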

-- Vossel

Re: How to tell pacemaker to process a new event during a long-running resource operation
On 03/14/2014 08:50 PM, David Vossel wrote:
> in-flight operations always have to complete before we can process a new transition. The only way we can transition earlier is by killing the in-flight process, which results in failure recovery and possibly fencing depending on what operation it is.
>
> There's really nothing that can be done to speed this up except work on lowering the startup time of that resource.

Thanks for this clear statement.


Re: How to tell pacemaker to process a new event during a long-running resource operation
On 2014-03-14T15:50:18, David Vossel <dvossel@redhat.com> wrote:

> in-flight operations always have to complete before we can process a new transition. The only way we can transition earlier is by killing the in-flight process, which results in failure recovery and possibly fencing depending on what operation it is.
>
> There's really nothing that can be done to speed this up except work on lowering the startup time of that resource.

We keep getting similar requests though - some services take a long
time, and during that interval, the cluster is essentially stuck. As the
density of resources in the cluster increases and the number of nodes
goes up, this becomes more of an issue.

It *would* be possible with changes to the TE/PE (assume in-flight
operations will complete as planned, so that any further changes to
in-flight resources are ordered after them, and add the ability to
accept actions completing from previous transitions), but it's also
non-trivial.
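
A rough sketch of that idea, purely as a toy model in Python and not as a design for the actual TE/PE: every name here (`PlannedAction`, `plan_without_waiting`, `runnable_now`, `accept_completion`, and the resource names) is hypothetical. It assumes the planner treats in-flight operations as if they will succeed, orders new actions on those resources after them, lets everything else run immediately, and accepts results for operations started by an earlier transition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlannedAction:
    resource: str
    operation: str
    after: Optional[str]  # in-flight operation this action must wait for, if any


def plan_without_waiting(requested, in_flight):
    """Plan a new transition while older operations are still running.

    requested: list of (resource, operation) pairs wanted by the new transition.
    in_flight: dict mapping resource -> operation started by an earlier transition.
    """
    plan = []
    for resource, operation in requested:
        running = in_flight.get(resource)
        # Assume the in-flight operation completes as planned and order after it.
        after = f"{resource}:{running}" if running is not None else None
        plan.append(PlannedAction(resource, operation, after))
    return plan


def runnable_now(plan):
    # Actions on resources with nothing in flight do not have to wait.
    return [action for action in plan if action.after is None]


def accept_completion(in_flight, resource, operation):
    # Accept a result that belongs to an operation from a previous transition.
    if in_flight.get(resource) == operation:
        del in_flight[resource]
        return True
    return False


in_flight = {"slow_db": "start"}                       # hypothetical long-running start
plan = plan_without_waiting(
    [("web", "start"), ("slow_db", "monitor")], in_flight
)
ready = runnable_now(plan)                             # only the "web" start
```

Here the start of "web" can proceed at once, while the monitor on "slow_db" is ordered after the start that is still running. What the sketch leaves out is the hard part: what to do when the assumed-successful operation fails after the new plan has already been acted on.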

Pacemaker 2.0 material? ;-)


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

Re: How to tell pacemaker to process a new event during a long-running resource operation
On 17 Mar 2014, at 6:49 pm, Lars Marowsky-Bree <lmb@suse.com> wrote:

> On 2014-03-14T15:50:18, David Vossel <dvossel@redhat.com> wrote:
>
>> in-flight operations always have to complete before we can process a new transition. The only way we can transition earlier is by killing the in-flight process, which results in failure recovery and possibly fencing depending on what operation it is.
>>
>> There's really nothing that can be done to speed this up except work on lowering the startup time of that resource.
>
> We keep getting similar requests though - some services take a long
> time, and during that interval, the cluster is essentially stuck. As the
> density of resources in the cluster increases and the number of nodes
> goes up, this becomes more of an issue.
>
> It *would* be possible with changes to the TE/PE - assume in-flight
> operations will complete as planned, so that any further changes to
> in-flight resources would be ordered after them, the ability to accept
> actions completing from previous transitions -, but it's also
> non-trivial.

That's probably the most lucid thinking anyone has done on the subject... you should put that into a bugzilla so we don't collectively forget.
I agree that this will probably become more important over time, so we'll likely have to address it sooner or later (just not right now).
