Mailing List Archive

Re: RA spec: explicit "probe" operation?
On 2011-07-09T17:23:04, Andrew Beekhof <andrew@beekhof.net> wrote:

> > As a somewhat silly example, consider validate-all checking that
> > "tmpdir" is non-empty and a valid directory. Consider that start removes
> > it.  Consider what will happen if something goes wrong with the
> > parameter, or one of our users manages to call "start" manually. (You
> > know they will.)
> >
> > The operation and the validation for the parameters that operation takes
> > belong in _one_ execution context, not split into two.
>
> So you're arguing that validate-all should go away?

Uh? No. validate-all makes perfect sense for a UI or similar tool that wants to
validate the resource agent's parameters without actually starting or
stopping the resource.

> > It is not. But if parameters are incorrect, start will fail,
> Without doubt. However not all failures (and error codes) are created equal.

Incorrect. They are at least _created_ equal. ;-) But indeed, they might
not always check all the same things.

> A "somewhat more detailed failure" is the part I care about.
> How many "it doesn't start" emails/reports do we get? Far too many to count.

That's a systemic problem, though, and it won't be solved by calling two
actions. They still won't get it: if the cluster calls two operations
instead of one, what are the chances that they'll read the messages from
"validate-all" when they currently don't read the ones from "start"? ;-)

> I think having the cluster abort before that point would make all our
> lives easier as it would be more obvious that the error is related to
> the config or install.

Right, the UI should actually make use of validate-all, because then
the errors will be reported _when the user actually configures it_. That
makes sense, and hopefully they'll get it then.

(Sure, that doesn't always make sense when they're using a huge shadow
CIB commit, but there's always something.)

How about a phrasing like this in the validate-all description:

"User interfaces should make appropriate use of this operation to
validate the resource's settings prior to adding it to the cluster
configuration."?
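A UI following that wording might do something like the minimal sketch below. It relies only on the OCF calling conventions as we understand them (the action is the first argument, parameters arrive as OCF_RESKEY_<name> environment variables); the ui_validate name and its name=value argument style are hypothetical, and real agents generally also expect OCF_ROOT to be set.

```shell
# Hypothetical UI helper: run an agent's validate-all with the given
# parameters before the resource is added to the cluster configuration.
# The function body is a subshell, so the exported OCF_RESKEY_* variables
# do not leak into the caller's environment.
# Usage: ui_validate AGENT_PATH name=value...
# Returns the agent's exit code (0 = settings look valid), or 1 if an
# argument is not of the form name=value.
ui_validate() (
    agent=$1
    shift
    for kv in "$@"; do
        case $kv in
            *=*) export "OCF_RESKEY_${kv%%=*}=${kv#*=}" ;;
            *)   echo "ui_validate: expected name=value, got '$kv'" >&2
                 exit 1 ;;
        esac
    done
    "$agent" validate-all
)
```

A UI could then wrap its commit step, e.g. `if ui_validate "$agent" tmpdir=/srv/app/tmp; then ...commit...; else ...show the agent's messages...; fi`, so errors surface while the user is still configuring the resource.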


Regards,
Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
ha-wg-technical mailing list
ha-wg-technical@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/ha-wg-technical
Re: RA spec: explicit "probe" operation?
On Mon, Jul 11, 2011 at 8:21 PM, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2011-07-09T17:23:04, Andrew Beekhof <andrew@beekhof.net> wrote:
>
>> > As a somewhat silly example, consider validate-all checking that
>> > "tmpdir" is non-empty and a valid directory. Consider that start removes
>> > it.  Consider what will happen if something goes wrong with the
>> > parameter, or one of our users manages to call "start" manually. (You
>> > know they will.)
>> >
>> > The operation and the validation for the parameters that operation takes
>> > belong in _one_ execution context, not split into two.
>>
>> So you're arguing that validate-all should go away?
>
> Uh? No. validate-all makes perfect sense for a UI or similar tool that wants to
> validate the resource agent's parameters without actually starting or
> stopping the resource.

Okay. But you don't want "start calling validate-all" to be "best
practice", and code duplication is a bad thing... so what do you
propose?

>
>> > It is not. But if parameters are incorrect, start will fail,
>> Without doubt.  However not all failures (and error codes) are created equal.
>
> Incorrect. They are at least _created_ equal. ;-) But indeed, they might
> not always check all the same things.

Which is important.
It's important that validate-all and start return appropriate error
messages instead of throwing up their hands and chucking out
OCF_ERR_GENERIC.
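To make "appropriate error codes" concrete, here is a minimal sketch of a validation function for the "tmpdir" example from earlier in the thread. The agent, its OCF_RESKEY_tmpdir parameter and the helper binary named by MYAPP_BIN are hypothetical; mapping the failures to OCF_ERR_CONFIGURED, OCF_ERR_INSTALLED and OCF_ERR_ARGS follows our reading of the Pacemaker documentation, and the constants and log helper stand in for what ocf-shellfuncs would normally provide.

```shell
# Stand-ins for what ocf-shellfuncs normally provides.
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6
log_err() { echo "ERROR: $*" >&2; }   # a real agent would use: ocf_log err ...

# Report *why* validation failed via a specific exit code instead of a
# blanket OCF_ERR_GENERIC. MYAPP_BIN is a hypothetical helper binary.
myapp_validate() {
    # Required parameter missing: the configuration itself is invalid.
    if [ -z "${OCF_RESKEY_tmpdir:-}" ]; then
        log_err "parameter 'tmpdir' is required"
        return $OCF_ERR_CONFIGURED
    fi
    # Required software is not installed on this node.
    if ! command -v "${MYAPP_BIN:-myapp}" >/dev/null 2>&1; then
        log_err "'${MYAPP_BIN:-myapp}' not found in PATH"
        return $OCF_ERR_INSTALLED
    fi
    # The configuration refers to something missing on this node.
    if [ ! -d "$OCF_RESKEY_tmpdir" ]; then
        log_err "tmpdir '$OCF_RESKEY_tmpdir' is not a directory on this node"
        return $OCF_ERR_ARGS
    fi
    return $OCF_SUCCESS
}
```

The order of the checks matters: a missing parameter is reported as a configuration error everywhere, while a missing directory is only a problem on the node where it is missing.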

>
>> A "somewhat more detailed failure" is the part I care about.
>> How many "it doesn't start" emails/reports do we get?  Far too many to count.
>
> That's a systemic problem, though, and it won't be solved by calling two
> actions. They still won't get it: if the cluster calls two operations
> instead of one, what are the chances that they'll read the messages from
> "validate-all" when they currently don't read the ones from "start"? ;-)
>
>> I think having the cluster abort before that point would make all our
>> lives easier as it would be more obvious that the error is related to
>> the config or install.
>
> Right, the UI should actually make use of validate-all, because then
> the errors will be reported _when the user actually configures it_. That
> makes sense, and hopefully they'll get it then.
>
> (Sure, that doesn't always make sense when they're using a huge shadow
> CIB commit, but there's always something.)
>
> How about a phrasing like this in the validate-all description:
>
> "User interfaces should make appropriate use of this operation to
> validate the resource's settings prior to adding it to the cluster
> configuration."?
Re: RA spec: explicit "probe" operation?
On 2011-07-11T21:10:00, Andrew Beekhof <andrew@beekhof.net> wrote:

> > Uh? No. validate-all makes perfect sense for a UI or similar tool that wants to
> > validate the resource agent's parameters without actually starting or
> > stopping the resource.
> Okay. But you don't want "start calling validate-all" to be "best
> practice", and code duplication is a bad thing... so what do you
> propose?

Well, as far as "validate-all" is concerned, I wouldn't actually change that
call at all, and I wouldn't make it mandatory either. But it would be awesome
if the UIs actually called it; that, however, is beyond the specification, and I
see no need for it to be mandatory.

But, as I proposed further down in my previous mail, we can increase the
pressure a bit on the UIs to actually make use of validate-all.

> > Incorrect. They are at least _created_ equal. ;-) But indeed, they might
> > not always check all the same things.
> Which is important.
> It's important that validate-all and start return appropriate error
> messages instead of throwing up their hands and chucking out
> OCF_ERR_GENERIC.

Right. But that's already required, and I doubt we're going to get it by
making another call mandatory. People should already be doing that, and
users should be reading the reports - I don't think the spec can help
that here.

For the most part, it's a reporting issue - they see the PE warning or
the crm_mon summary line and don't grok what they need to do to fix it.
That the error is coming from validate-all instead of start won't make
it easier for them; what they need is UI assistance. (i.e., better log
file querying to get a message as to what is actually wrong.)

Again, that's an implementation detail, not a specification one.


If we were talking about the spec, I'd rather add some way to return a
descriptive _string_ from the RA to the UI in some standard format so
that the UIs can display a meaningful one-line summary, instead of
rendering our machine-parseable rc.

(i.e., have the RA call an "ocf_describe" binary to set that before the
shell script returns, never mind how that is internally implemented for
now.)


Regards,
Lars

Re: RA spec: explicit "probe" operation?
On Mon, Jul 11, 2011 at 9:19 PM, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2011-07-11T21:10:00, Andrew Beekhof <andrew@beekhof.net> wrote:
>
>> > Uh? No. validate-all makes perfect sense for a UI or similar tool that wants to
>> > validate the resource agent's parameters without actually starting or
>> > stopping the resource.
>> Okay. But you don't want "start calling validate-all" to be "best
>> practice", and code duplication is a bad thing... so what do you
>> propose?
>
> Well, as far as "validate-all" is concerned, I wouldn't actually change that
> call at all, and I wouldn't make it mandatory either.

best practice != mandatory.

I've no problem with it being in the dev guide rather than the spec.
It's the same way we recommend that many/most RA authors put a monitor
loop at the end of the start action.
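A minimal sketch of that monitor-loop pattern, assuming a hypothetical service whose "up" state is represented by a state file (SVC_STATE_FILE) that its launcher creates asynchronously. start returns only once monitor reports success; the loop has no timeout of its own and instead relies on the cluster manager's start-operation timeout.

```shell
# OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical service: it counts as "up" once $SVC_STATE_FILE exists.
svc_monitor() {
    if [ -e "$SVC_STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Hypothetical launcher: returns at once; the service is up ~1s later.
svc_launch() {
    ( sleep 1; touch "$SVC_STATE_FILE" ) &
}

svc_start() {
    # Starting an already running service must succeed (idempotence).
    svc_monitor && return $OCF_SUCCESS
    svc_launch || return $OCF_ERR_GENERIC
    # Do not return until monitor agrees the service is up. There is no
    # local timeout: the cluster's start timeout is assumed to bound this.
    until svc_monitor; do
        sleep 1
    done
    return $OCF_SUCCESS
}
```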

> But it would be awesome
> if the UIs actually called it; that, however, is beyond the specification, and I
> see no need for it to be mandatory.

So the shell will start calling validate-all too?

>
> But, as I proposed further down in my previous mail, we can increase the
> pressure a bit on the UIs to actually make use of validate-all.
>
>> > Incorrect. They are at least _created_ equal. ;-) But indeed, they might
>> > not always check all the same things.
>> Which is important.
>> It's important that validate-all and start return appropriate error
>> messages instead of throwing up their hands and chucking out
>> OCF_ERR_GENERIC.
>
> Right. But that's already required,

10 head
20 desk
30 add
40 goto 10

Yes, but you only get it if start calls validate-all or the checks are
duplicated.
You don't want the first, and the second is error-prone.

> and I doubt we're going to get it by
> making another call mandatory. People should already be doing that, and
> users should be reading the reports - I don't think the spec can help
> that here.
>
> For the most part, it's a reporting issue - they see the PE warning or
> the crm_mon summary line and don't grok what they need to do to fix it.
> That the error is coming from validate-all instead of start won't make
> it easier for them;

In my mind it will help split the reports into two piles - user error
vs. app error.
Knowing that is a good first step.

> what they need is UI assistance. (i.e., better log
> file querying to get a message as to what is actually wrong.)
>
> Again, that's an implementation detail, not a specification one.
>
>
> If we were talking about the spec, I'd rather add some way to return a
> descriptive _string_ from the RA to the UI in some standard format so
> that the UIs can display a meaningful one-line summary, instead of
> rendering our machine-parseable rc.
>
> (i.e., have the RA call an "ocf_describe" binary to set that before the
> shell script returns, never mind how that is internally implemented for
> now.)

Interesting idea.

Re: RA spec: explicit "probe" operation?
On 2011-07-11T21:38:11, Andrew Beekhof <andrew@beekhof.net> wrote:

> > Well, as far as "validate-all" is concerned, I wouldn't actually change that
> > call at all, and I wouldn't make it mandatory either.
> best practice != mandatory.

Right, if you want to call it "BCP", sure, go ahead.

> I've no problem with it being in the dev guide rather than the spec.
> It's the same way we recommend that many/most RA authors put a monitor
> loop at the end of the start action.

Right; we insist that "start" must only return once the service is fully
up and operational, and for some services the loop is the easiest way
to achieve that.

> > But it would be awesome if the UIs actually called it; that, however, is
> > beyond the specification, and I see no need for it to be mandatory.
> So the shell will start calling validate-all too?

Uhm, how should I know? It probably should, yes, but that's something to
discuss with Dejan, and not exactly relevant to the specification of
what "validate-all" does and what implementors must provide and can rely
on.

> >> It's important that validate-all and start return appropriate error
> >> messages instead of throwing up their hands and chucking out
> >> OCF_ERR_GENERIC.
> > Right. But that's already required,
>
> 10 head
> 20 desk
> 30 add
> 40 goto 10

Ah, such constructive communication styles abound here! ;-)

> Yes, but you only get it if start calls validate-all or the checks are
> duplicated.
> You don't want the first, and the second is error-prone.

I didn't say I don't want start to call validate-all *internally* to
the RA; clearly, start needs to verify its requirements. I just don't
want it called as a separate action before a start, because that is
just stupid.

> In my mind it will help split the reports into two piles - user error
> vs. app error.

But that is already possible; nothing in the spec prevents that. If you
see "start" fail with OCF_ERR_CONFIGURED, OCF_ERR_ARGS,
OCF_ERR_INSTALLED those are quite likely "user errors".

An RA that doesn't return them but sends back "ERR_GENERIC" for
everything is an *implementation*, not a *specification* problem.
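The split into "user error" and "app error" piles can indeed be done from the exit code alone, with no spec change. A minimal sketch; the function name and labels are hypothetical, and the grouping simply follows the three codes named above.

```shell
# OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6
OCF_NOT_RUNNING=7

# Sort an operation's exit code into the two "piles" from the thread.
# The grouping is an illustrative assumption, not part of the spec.
classify_ocf_rc() {
    case $1 in
        "$OCF_SUCCESS")     echo "success" ;;
        "$OCF_NOT_RUNNING") echo "not running" ;;
        "$OCF_ERR_ARGS"|"$OCF_ERR_INSTALLED"|"$OCF_ERR_CONFIGURED")
                            echo "likely user error" ;;
        *)                  echo "agent/application error" ;;
    esac
}
```

Note that this only works as well as the agents' discipline in returning specific codes; an agent that returns OCF_ERR_GENERIC for everything lands every failure in the second pile.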

> > If we were talking about the spec, I'd rather add some way to return a
> > descriptive _string_ from the RA to the UI in some standard format so
> > that the UIs can display a meaningful one-line summary, instead of
> > rendering our machine-parseable rc.
> >
> > (i.e., have the RA call an "ocf_describe" binary to set that before the
> > shell script returns, never mind how that is internally implemented for
> > now.)
>
> Interesting idea.

Actually it might be very valuable, if the CIB status section can take
~80 chars per failure. Perhaps that's the way out of this discussion?
;-)
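Purely to illustrate the proposal, here is one way such a helper might look. Nothing like it exists in the spec: the ocf_describe function (a shell function here rather than a binary) and the OCF_DESCRIBE_FILE variable through which the cluster would pick up the string are both hypothetical, and the message is capped at the ~80 characters mentioned above.

```shell
# HYPOTHETICAL: neither ocf_describe nor OCF_DESCRIBE_FILE is defined by
# the OCF spec. Records a one-line, human-readable summary that a UI could
# show next to the machine-parseable exit code.
OCF_DESCRIBE_MAX=80

ocf_describe() {
    # Nobody asked for a description: silently do nothing.
    [ -n "${OCF_DESCRIBE_FILE:-}" ] || return 0
    # Join all lines into one, then cap the length (note: cut -c may
    # count bytes rather than characters for non-ASCII text).
    _ocf_describe_line=$(printf '%s' "$*" | tr '\n' ' ')
    printf '%s\n' "$_ocf_describe_line" |
        cut -c "1-$OCF_DESCRIBE_MAX" > "$OCF_DESCRIBE_FILE"
}
```

An agent could then call, for example, `ocf_describe "tmpdir '$OCF_RESKEY_tmpdir' does not exist on this node"` just before returning OCF_ERR_ARGS.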


Regards,
Lars

Re: RA spec: explicit "probe" operation?
On Tue, Jul 12, 2011 at 9:27 PM, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2011-07-11T21:38:11, Andrew Beekhof <andrew@beekhof.net> wrote:
>
>> > Well, as far as "validate-all" is concerned, I wouldn't actually change that
>> > call at all, and I wouldn't make it mandatory either.
>> best practice != mandatory.
>
> Right, if you want to call it "BCP", sure, go ahead.
>
>> I've no problem with it being in the dev guide rather than the spec.
>> It's the same way we recommend that many/most RA authors put a monitor
>> loop at the end of the start action.
>
> Right; we insist that "start" must only return once the service is fully
> up and operational, and for some services the loop is the easiest way
> to achieve that.
>
>> > But it would be awesome if the UIs actually called it; that, however, is
>> > beyond the specification, and I see no need for it to be mandatory.
>> So the shell will start calling validate-all too?
>
> Uhm, how should I know? It probably should, yes, but that's something to
> discuss with Dejan, and not exactly relevant to the specification of
> what "validate-all" does and what implementors must provide and can rely
> on.

Well, it is.
Since your position was "let the management tools handle it", it's
reasonable to ask how the most commonly used tool would do so ;-)

It's also not clear to me how pushing it into the GUIs solves the
problem that validate-all will only work when all resource
dependencies are available.
This is why I was pushing for the cluster (either directly or
indirectly as part of the RA's start action) to be the one to make the
call.
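One way an agent can cope with missing dependencies itself is to treat a probe differently from a real start. A minimal sketch, assuming a hypothetical agent whose config file (OCF_RESKEY_config) may live on storage that is only present where the resource runs. ACTION is a hypothetical variable holding the requested action, and is_probe is a local stand-in for the ocf_is_probe helper in ocf-shellfuncs, using Pacemaker's convention that a probe is a monitor with interval 0. Returning OCF_ERR_INSTALLED for a missing config file is one plausible choice, not a rule.

```shell
# OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# Stand-in for ocf_is_probe: a one-off monitor (interval 0) is a probe.
is_probe() {
    [ "$ACTION" = monitor ] && [ "${OCF_RESKEY_CRM_meta_interval:-0}" = 0 ]
}

# Checks that need things which may be absent on nodes where the service
# is not running (e.g. a config file on shared storage).
myapp_validate_deps() {
    if [ ! -r "${OCF_RESKEY_config:-}" ]; then
        if is_probe; then
            # The service cannot be running here; that is not an error.
            return $OCF_NOT_RUNNING
        fi
        echo "ERROR: config '${OCF_RESKEY_config:-}' not readable" >&2
        return $OCF_ERR_INSTALLED
    fi
    return $OCF_SUCCESS
}
```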

>
>> >> It's important that validate-all and start return appropriate error
>> >> messages instead of throwing up their hands and chucking out
>> >> OCF_ERR_GENERIC.
>> > Right. But that's already required,
>>
>> 10 head
>> 20 desk
>> 30 add
>> 40 goto 10
>
> Ah, such constructive communication styles abound here! ;-)
>
>> Yes, but you only get it if start calls validate-all or the checks are
>> duplicated.
>> You don't want the first, and the second is error-prone.
>
> I didn't say I don't want start to call validate-all *internally* to
> the RA;

Well, you did actually. That was the only reason I mentioned doing it
automatically.
But if you're happy with it now, then I think we have a path forward :-)

Florian: could you amend the dev guide to recommend that start
internally call validate-all first thing?
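The recommendation can be sketched in shell, reusing the "tmpdir" example from the start of the thread. One hypothetical function, tmpdir_validate, backs both the validate-all action and the start action, so the checks are not duplicated and start validates its parameter in the same execution context in which it uses it. The agent, its single parameter and the "clean out tmpdir on start" behaviour are assumptions for illustration.

```shell
# OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_ERR_UNIMPLEMENTED=3
OCF_ERR_CONFIGURED=6

# Single source of truth for the parameter checks.
tmpdir_validate() {
    if [ -z "${OCF_RESKEY_tmpdir:-}" ]; then
        echo "ERROR: 'tmpdir' must be set" >&2
        return $OCF_ERR_CONFIGURED
    fi
    if [ ! -d "$OCF_RESKEY_tmpdir" ]; then
        echo "ERROR: '$OCF_RESKEY_tmpdir' is not a directory" >&2
        return $OCF_ERR_ARGS
    fi
    return $OCF_SUCCESS
}

myapp_start() {
    # Validate in this same process, right before the parameter is used:
    # an empty tmpdir must never reach the rm below (it would be "/*").
    tmpdir_validate || return $?
    rm -rf -- "$OCF_RESKEY_tmpdir"/*
    # ... launch the service and wait for monitor to succeed ...
    return $OCF_SUCCESS
}

# Action dispatch: validate-all and start share tmpdir_validate.
myapp_dispatch() {
    case $1 in
        start)        myapp_start ;;
        validate-all) tmpdir_validate ;;
        *)            return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

In a real agent the script would end with something like `myapp_dispatch "$1"; exit $?`, and the return codes would come from ocf-shellfuncs rather than being defined locally.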

> clearly, start needs to verify its requirements. I just don't
> want it called as a separate action before a start, because that is
> just stupid.

ok.
I don't think it's that big of a deal, but whatever :-)

>> In my mind it will help split the reports into two piles - user error
>> vs. app error.
>
> But that is already possible; nothing in the spec prevents that. If you
> see "start" fail with OCF_ERR_CONFIGURED, OCF_ERR_ARGS,
> OCF_ERR_INSTALLED those are quite likely "user errors".
>
> An RA that doesn't return them but sends back "ERR_GENERIC" for
> everything is an *implementation*, not a *specification* problem.
>
>> > If we were talking about the spec, I'd rather add some way to return a
>> > descriptive _string_ from the RA to the UI in some standard format so
>> > that the UIs can display a meaningful one-line summary, instead of
>> > rendering our machine-parseable rc.
>> >
>> > (i.e., have the RA call an "ocf_describe" binary to set that before the
>> > shell script returns, never mind how that is internally implemented for
>> > now.)
>>
>> Interesting idea.
>
> Actually it might be very valuable, if the CIB status section can take
> ~80 chars per failure. Perhaps that's the way out of this discussion?
> ;-)