Mailing List Archive

Slight bending of OCF specs: Re: Issues found in Apache resource agent
Hi Dejan,

If the resource agent is not running correctly it needs to be
restarted. My memory says that OCF_ERR_GENERIC will not cause that
behavior. I believe the spec says you should exit with not running if
it is not functioning correctly. (but I didn't check it, and my memory
isn't that clear in this case).

I will likely write a monitor-only resource agent for web servers. What
would you think about calling it from the other web resource agents?

This resource agent will not look at any config files, and will require
everything explicitly in parameters, and will not know how to start or
stop anything. This would be for my new monitoring project, of course
;-). But it could then be called by all the HTTP resource agents - or
used directly - for example by the Assimilation project.

This would be a slight but useful bending of OCF resource agent APIs.
We could create some new metadata to document it, and also not put start
and stop into the actions in the operations section. Or just the latter.
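
A minimal sketch of what the metadata for such a monitor-only agent might look like, assuming the usual OCF resource-agent XML layout. The agent name "http-monitor" and the "url" parameter are hypothetical; the point is only that the actions section lists no start or stop.

```shell
#!/bin/sh
# Sketch only: the agent name "http-monitor" and the "url" parameter are
# hypothetical. The actions section deliberately omits start and stop.
monitor_only_meta_data() {
    cat <<'END'
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="http-monitor">
<version>0.1</version>
<longdesc lang="en">Monitor-only check for a web server. Takes everything
from explicit parameters and reads no configuration files.</longdesc>
<shortdesc lang="en">Monitor-only web server check</shortdesc>
<parameters>
<parameter name="url" unique="0" required="1">
<longdesc lang="en">URL to fetch when monitoring.</longdesc>
<shortdesc lang="en">URL to test</shortdesc>
<content type="string" />
</parameter>
</parameters>
<actions>
<action name="monitor" timeout="20" interval="10" />
<action name="validate-all" timeout="5" />
<action name="meta-data" timeout="5" />
</actions>
</resource-agent>
END
}
```

An agent script could call monitor_only_meta_data when invoked with the meta-data action.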

What do you think?



On 08/29/2012 05:31 AM, Dejan Muhamedagic wrote:
> Hi Alan,
>
> On Mon, Aug 27, 2012 at 10:51:15AM -0600, Alan Robertson wrote:
>> Hi,
>>
>> I was recently using the Apache resource agent, and discovered a few
>> problems:
>>
>> The exit code from grep was used directly as an OCF exit code.
>> It is NOT an OCF exit code, and should not be directly used
>> in this way.
> I guess you mean the greps in monitor_apache_extended and
> monitor_apache_basic? These lines:
>
> 267 $whattorun "$test_url" | grep -Ei "$test_regex" > /dev/null
> 277 ${ourhttpclient}_func "$STATUSURL" | grep -Ei "$TESTREGEX" > /dev/null
>
>> This caused a "not running" error to become a generic error.
> These lines are invoked _only_ in case it was previously
> established that the apache server is running. So, they should
> return OCF_ERR_GENERIC if the test fails. grep exits with code 1
> which matches OCF_ERR_GENERIC. But indeed the OCF error code
> should be returned explicitly.
>
>> Pacemaker reacts very differently to the two kinds of errors.
>>
>> This code occurred in two places.
>>
>> The resource agent used OCF_CHECK_LEVEL improperly.
>>
>> The specification says that if you receive an OCF_CHECK_LEVEL which you
>> do not support, you are required to interpret it as the next lower
>> supported value for OCF_CHECK_LEVEL.
>>
>> In effect, there are no invalid OCF_CHECK_LEVEL values. The Apache
>> agent declared all values but one to be errors. This is not the correct
>> behavior.
> OK. That somehow slipped while I had been reading the OCF standard.
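
The fallback rule described above (an unsupported OCF_CHECK_LEVEL is treated as the next lower supported one) might be sketched as follows. This assumes a hypothetical agent that implements only levels 0 and 10; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes a hypothetical agent that implements check levels
# 0 and 10. Any other requested level is mapped to the next lower
# supported level instead of being rejected as an error.
effective_check_level() {
    requested=${1:-0}
    # Treat anything that is not a non-negative integer as level 0.
    case "$requested" in
        ''|*[!0-9]*) requested=0 ;;
    esac
    if [ "$requested" -ge 10 ]; then
        echo 10
    else
        echo 0
    fi
}
```

A monitor function could then branch on `level=$(effective_check_level "$OCF_CHECK_LEVEL")` without ever treating a level as invalid.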
>
> BTW, it'd be great if nginx shared some code with apache. The
> latter has already been split into three scripts.
>
> Cheers,
>
> Dejan
>
>> --
>> Alan Robertson <alanr@unix.sh> - @OSSAlanR
>>
>> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
>> _______________________________________________________
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/


--
Alan Robertson <alanr@unix.sh> - @OSSAlanR

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
Hi,

On Tue, Sep 04, 2012 at 07:20:23PM -0600, Alan Robertson wrote:
> Hi Dejan,
>
> If the resource agent is not running correctly it needs to be
> restarted. My memory says that OCF_ERR_GENERIC will not cause that
> behavior. I believe the spec says you should exit with not running if
> it is not functioning correctly. (but I didn't check it, and my memory
> isn't that clear in this case).

From the OCF standard:

    1   generic or unspecified error (current practice)
        The "monitor" operation shall return this for a crashed, hung
        or otherwise non-functional resource.
    ...
    7   program is not running
        Note: This is not the error code to be returned by a
        successful "stop" operation. A successful "stop" operation
        shall return 0.
        The "monitor" action shall return this value only for a
        _cleanly_ stopped resource. If in doubt, it should return 1.

It also sounds OK to me.
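
As a rough illustration of the excerpt above, here is a sketch of a monitor that returns OCF codes explicitly rather than passing grep's exit status through. The function names and the "yes"/"no" running flag are hypothetical; the numeric values are the ones quoted from the standard.

```shell
#!/bin/sh
# Sketch only. The numeric values are those in the excerpt above; real
# agents normally get these variables from the shared OCF shell library.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# page_matches REGEX: reads a fetched page on stdin. grep's own exit
# status (0, 1, or 2 on error) is never passed through; it is translated
# into an OCF code explicitly.
page_matches() {
    if grep -Ei -- "$1" >/dev/null; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_GENERIC"
}

# monitor_result IS_RUNNING REGEX: IS_RUNNING is "yes" when an earlier
# (hypothetical) process check found the server; the page comes on stdin.
monitor_result() {
    if [ "$1" != yes ]; then
        # Assumed here to mean a cleanly stopped server, the only case
        # that reports "not running".
        return "$OCF_NOT_RUNNING"
    fi
    page_matches "$2"
}
```

With this shape, a server that is up but serving the wrong content yields 1, and only a stopped server yields 7.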

> I will likely write a monitor-only resource agent for web servers. What
> would you think about calling it from the other web resource agents?

There's a bit of code in http-mon.sh, extracted from apache. It
offers some extended testing of web servers. The features are
described in README.webapps.

> This resource agent will not look at any config files, and will require
> everything explicitly in parameters, and will not know how to start or
> stop anything.

Somebody wanted to do a ping-like RA, i.e. setting an attribute
based on the HTTP results. Unfortunately, one contributor gave up
and another wanted to do everything from scratch, thus duplicating
parts of the code. What I'd like to see is a script handling
just CRM attributes. Then it would be easy to put together a
Dummy-like RA to make use of that one and, say, http-mon.sh.

> This would be for my new monitoring project, of course
> ;-). But it could then be called by all the HTTP resource agents - or
> used directly - for example by the Assimilation project.
>
> This would be a slight but useful bending of OCF resource agent APIs.
> We could create some new metadata to document it, and also not put start
> and stop into the actions in the operations section. Or just the latter.
>
> What do you think?

Right now, there's a bunch of resource agents faking the state
(e.g. ping), that is pretending to be able to start and stop.
If we could somehow do without it, that would obviously be
beneficial. Not sure if/how the pacemaker could deal with such
agents.

Cheers,

Dejan

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-04T19:20:23, Alan Robertson <alanr@unix.sh> wrote:

> I will likely write a monitor-only resource agent for web servers. What
> would you think about calling it from the other web resource agents?

Sharing code - in this case, the monitor-via-network of the http agents
- seems to make sense, yes.

> This resource agent will not look at any config files, and will require
> everything explicitly in parameters, and will not know how to start or
> stop anything. This would be for my new monitoring project, of course
> ;-). But it could then be called by all the HTTP resource agents - or
> used directly - for example by the Assimilation project.
>
> This would be a slight but useful bending of OCF resource agent APIs.

I am not sure I would make this an OCF RA.

We've - for other reasons, like monitoring the services within a VM -
started to look at wrapping up the icinga/nagios probes so that they can
be configured and called by the cluster. My current thinking is that
they might be best handled via a new resource agent class.

(Pseudo-configuration:

    primitive vm1 ocf:heartbeat:VirtualDomain
    primitive vm1-httpd icinga:httpd \
        params ip="192.168.2.1" port="80"
    group vm1-service vm1 vm1-httpd

With some special code in the PE to make it understand that it can't
just restart vm1-httpd, but would need to tackle the whole group
atomically, etc.)

I'm curious, have you looked into re-using those probes already? I admit
we're still at the evaluation stage so we might have missed problems
with the approach.
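
One piece such a wrapper would need is a translation from probe results to OCF codes. The sketch below shows one possible policy, not an agreed design: it assumes the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and treats WARNING as "still up". The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: one possible policy, not an agreed design.
# Plugin states by the usual Nagios plugin convention:
# 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
probe_status_to_ocf() {
    case "$1" in
        0|1) echo 0 ;;   # OK or WARNING: service is up (OCF_SUCCESS)
        *)   echo 1 ;;   # CRITICAL, UNKNOWN, anything else: OCF_ERR_GENERIC
    esac
}

# run_probe COMMAND [ARGS...]: run a plugin, discard its output, and
# return the mapped OCF code.
run_probe() {
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    return "$(probe_status_to_ocf "$rc")"
}
```

For example, `run_probe check_http -H 192.168.2.1 -p 80` would return 0 or 1, assuming the check_http plugin is installed and on the PATH.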


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
Hi Lars,

On Wed, Sep 05, 2012 at 11:41:17AM +0200, Lars Marowsky-Bree wrote:
> On 2012-09-04T19:20:23, Alan Robertson <alanr@unix.sh> wrote:
>
> > I will likely write a monitor-only resource agent for web servers. What
> > would you think about calling it from the other web resource agents?
>
> Sharing code - in this case, the monitor-via-network of the http agents
> - seems to make sense, yes.
>
> > This resource agent will not look at any config files, and will require
> > everything explicitly in parameters, and will not know how to start or
> > stop anything. This would be for my new monitoring project, of course
> > ;-). But it could then be called by all the HTTP resource agents - or
> > used directly - for example by the Assimilation project.
> >
> > This would be a slight but useful bending of OCF resource agent APIs.
>
> I am not sure I would make this an OCF RA.
>
> We've - for other reasons, like monitoring the services within a VM -
> started to look at wrapping up the icinga/nagios probes so that they can
> be configured and called by the cluster. My current thinking is that
> they might be best handled via a new resource agent class.
>
> (Pseudo-configuration:
>
>     primitive vm1 ocf:heartbeat:VirtualDomain
>     primitive vm1-httpd icinga:httpd \
>         params ip="192.168.2.1" port="80"
>     group vm1-service vm1 vm1-httpd
>
> With some special code in the PE to make it understand that it can't
> just restart vm1-httpd, but would need to tackle the whole group
> atomically, etc.)

How about a new element? Something like:

    primitive vm1 ocf:heartbeat:VirtualDomain
    require vm1 web-test dns-test
    primitive web-test monocf:heartbeat:http-mon \
        params ip="192.168.2.1" port="80"
    primitive dns-test monocf:heartbeat:named ...

The "require" would imply that the resource vm1 requires
monitors of web-test and dns-test to succeed, in addition to its
monitor (if defined). Monitor ops of web-test and dns-test will
run only on the node where vm1 is started. They could also
get the environment (parameters) of vm1.

monocf may be just like ocf, sans start and stop operations.
That would make all ocf RAs eligible for this use.

We could derive more classes from monocf, i.e. wrappers for
various monitor solutions.

I suppose that this would be relatively straightforward to
implement.

Thanks,

Dejan

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-05T15:25:44, Dejan Muhamedagic <dejan@suse.de> wrote:

> How about a new element. Something like
>
> primitive vm1 ocf:heartbeat:VirtualDomain
> require vm1 web-test dns-test

How we map this into Pacemaker's dependency scheme is obviously open to
discussion.

> The "require" would imply that the resource vm1 requires
> monitors of web-test and dns-test to succeed, in addition to its
> monitor (if defined).

Perhaps. But an "as-a-whole" attribute for restart handling of groups
might already be enough, since we would want the system to eventually
stabilize at the same state it reaches today (that is, with the group
brought up to the last non-failing resource; otherwise, admins
couldn't log in to the VM to fix the problem).

> Monitor ops of web-test and dns-test will run only on the node where
> vm1 is started. They could also get the environment (parameters) of
> vm1.

That's implicit in the group.

Internally, this could indeed map to a "symmetric" or whatever aspect of
the order dependency, yes, that could be set for the whole group.

> monocf may be just like ocf, sans start and stop operations.
> That would make all ocf RAs eligible for this use.

None of the current resource agents would be able to cope with the use
case I suggested, because they expect to run in the OS image where the
service is provided - the idea of using the icinga/nagios plugins is
exactly that they don't have this requirement, and thus can monitor the
VM externally.

For OCF agents, this sort-of already exists: meta is-managed=false.


Regards,
Lars

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 06/09/2012, at 12:30 AM, Lars Marowsky-Bree <lmb@suse.com> wrote:

> On 2012-09-05T15:25:44, Dejan Muhamedagic <dejan@suse.de> wrote:
>
>> How about a new element. Something like
>>
>> primitive vm1 ocf:heartbeat:VirtualDomain
>> require vm1 web-test dns-test
>
> How we map this into Pacemaker's dependency scheme is obviously open to
> discussion.
>
>> The "require" would imply that the resource vm1 requires
>> monitors of web-test and dns-test to succeed, in addition to its
>> monitor (if defined).
>
> Perhaps. But an "as-a-whole" attribute for groups to restart handling
> might already be enough, since we would want the system to eventually
> stabilize at the same state it runs to today (that is, with the group
> brought up to the last non-failing resource; otherwise, admins
> couldn't log in to the VM to fix the problem).

Those two requirements seem at odds with each other. I doubt it would end well.
I suspect you really want the "restart everything" trigger to be attached to the "monitor only" resource (at the end).

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-05T15:25:44, Dejan Muhamedagic <dejan@suse.de> wrote:

BTW, FWIW -

> monocf may be just like ocf, sans start and stop operations.
> That would make all ocf RAs eligible for this use.

Thinking about this, not entirely. We'd have to fake the start/stop at
least. (In particular the start.)

For probes, start has to do at least

    while ! monitor ; do sleep 1 ; done

to wait until the service is up, before going on. Otherwise, we'd
possibly immediately report a failure to Pacemaker and trigger
recovery.

(Unless we want to mess with start-delay, which I dislike and also
doesn't provide such nice reporting.)
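
A slightly fuller sketch of such a faked start, with a bound on the wait so a probe that never succeeds reports a failed start instead of looping forever. The function name, the attempt limit, and the delay are hypothetical knobs; in a real agent the cluster's operation timeout would normally bound the wait.

```shell
#!/bin/sh
# Sketch only: wait_until_up, the attempt limit and the delay are
# hypothetical. MONITOR_CMD is any command or function that succeeds
# once the service is up.
wait_until_up() {
    monitor_cmd=$1
    max_tries=${2:-60}
    delay=${3:-1}
    tries=0
    while ! $monitor_cmd; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1    # still not up: report a failed start (OCF_ERR_GENERIC)
        fi
        sleep "$delay"
    done
    return 0            # monitor succeeded: start is complete (OCF_SUCCESS)
}
```

A faked start action could then simply be `wait_until_up monitor`.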

"stop" is tricky from the PE perspective - we don't actually want to
stop the probes, but only the VM (which implies the stop of the services
it provides).

And if we can, we'd love to keep showing the probes' state while the VM
shuts down, to show the admin what's going on. But, of course, not
trigger a recovery.

So, we'd want start-up to be:

VM -> (probes) - that's easy, as a group or as resource set

Shutdown would be the same, though:

VM -> (probes) - and not the inverse of the above.

I wonder how much this would suck or if we should just suck it up and
destroy the probes and then stop the VM (giving up this added
transparency)?


Regards,
Lars

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/05/2012 03:32 AM, Dejan Muhamedagic wrote:
>> This would be for my new monitoring project, of course
>> ;-). But it could then be called by all the HTTP resource agents - or
>> used directly - for example by the Assimilation project.
>>
>> This would be a slight but useful bending of OCF resource agent APIs.
>> We could create some new metadata to document it, and also not put start
>> and stop into the actions in the operations section. Or just the latter.
>>
>> What do you think?
>
> Right now, there's a bunch of resource agents faking the state
> (e.g. ping), that is pretending to be able to start and stop.
> If we could somehow do without it, that would obviously be
> beneficial. Not sure if/how the pacemaker could deal with such
> agents.

Well, I presume that one would not tell pacemaker about such agents, as
they would not be useful to pacemaker. From the point of view of the
crm command, you wouldn't consider them as "valid" resource agents to
put in a configuration for pacemaker.

People would instead use the nginx or apache agents that _do_ know how
to start and stop things.

--
Alan Robertson <alanr@unix.sh> - @OSSAlanR

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-07T13:46:27, Alan Robertson <alanr@unix.sh> wrote:

> Well, I presume that one would not tell pacemaker about such agents, as
> they would not be useful to pacemaker. From the point of view of the
> crm command, you wouldn't consider them as "valid" resource agents to
> put in a configuration for pacemaker.

Depends. Pacemaker may still care about the status of these agents.


Regards,
Lars

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/08/2012 02:53 PM, Lars Marowsky-Bree wrote:
> On 2012-09-07T13:46:27, Alan Robertson <alanr@unix.sh> wrote:
>
>> Well, I presume that one would not tell pacemaker about such agents, as
>> they would not be useful to pacemaker. From the point of view of the
>> crm command, you wouldn't consider them as "valid" resource agents to
>> put in a configuration for pacemaker.
> Depends. Pacemaker may still care about the status of these agents.

If it can't start or stop them, what can it do with them? And
presuming it can't do anything with them, then it doesn't make sense to
include them in a configuration.

Am I missing something here?

--
Alan Robertson <alanr@unix.sh> - @OSSAlanR

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-11T15:04:55, Alan Robertson <alanr@unix.sh> wrote:

> > Depends. Pacemaker may still care about the status of these agents.
> If it can't start or stop them, what can it do with them?

The status from these agents may feed into operations on other
resources that are fully managed.


Regards,
Lars

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/12/2012 05:14 AM, Lars Marowsky-Bree wrote:
> On 2012-09-11T15:04:55, Alan Robertson <alanr@unix.sh> wrote:
>
>>> Depends. Pacemaker may still care about the status of these agents.
>> If it can't start or stop them, what can it do with them?
> The status from these agents may feed into operations on other
> resources that are fully managed.

Understood.

I believe it will care about those other agents - not these. It
shouldn't know about these, AFAIK.

The fact that the other agents might call these is an implementation
detail - not something it should care about directly. Just as the
resource agents should only rely on things that the OCF RA spec says are
provided, consumers of those agents (like pacemaker) shouldn't go past
the spec in terms of expectations from or observations of resource
agents beyond the spec. Or at least that's how it seems to me.

It's still my intent to have the exit codes, argument passing, etc. be
fully compliant with the OCF RA specification. The only exception I
plan on is no start or stop (or reload, etc) actions. They will
implement the meta-data and monitor and validate-all actions. I'm not
sure whether validate-all makes sense for them or not(?). I'll think
about that...
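
A minimal sketch of how such an agent might dispatch its actions, assuming the OCF numbering (0 for success, 3 for an unimplemented action). All function names and the placeholder bodies are hypothetical.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical and the bodies are
# placeholders. Exit codes follow the OCF numbering
# (0 success, 3 unimplemented).
agent_meta_data()    { echo '<resource-agent name="http-monitor"/>'; }  # placeholder output
agent_monitor()      { return 0; }   # placeholder: a real check goes here
agent_validate_all() { return 0; }   # placeholder: parameter checks go here

dispatch() {
    case "$1" in
        meta-data)    agent_meta_data ;;
        monitor)      agent_monitor ;;
        validate-all) agent_validate_all ;;
        *)            return 3 ;;    # start, stop, reload, ...: OCF_ERR_UNIMPLEMENTED
    esac
}
```

The script itself would end with `dispatch "$1"; exit $?`, so a caller asking for start or stop gets a clear "unimplemented" answer rather than a faked success.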



--
Alan Robertson <alanr@unix.sh> - @OSSAlanR

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 2012-09-12T09:01:05, Alan Robertson <alanr@unix.sh> wrote:

> > The status from these agents may feed into operations on other
> > resources that are fully managed.
>
> Understood.
>
> I believe it will care about those other agents - not these. It
> shouldn't know about these, AFAIK.

I guess then you're talking about a different effort from what
Dejan, Yan, and I are investigating. (Since we need that status so that
Pacemaker can restart the VM, if needed, for example.)

(Our goal is also to reuse existing probes from other monitoring
frameworks, not rewrite them.)



Regards,
Lars

Re: Slight bending of OCF specs: Re: Issues found in Apache resource agent
On 09/12/2012 09:11 AM, Lars Marowsky-Bree wrote:
> On 2012-09-12T09:01:05, Alan Robertson <alanr@unix.sh> wrote:
>
>>> The status from these agents may feed into operations on other
>>> resources that are fully managed.
>> Understood.
>>
>> I believe it will care about those other agents - not these. It
>> shouldn't know about these, AFAIK.
> I guess then you're talking about a different effort from what
> Dejan, Yan, and I are investigating. (Since we need that status so that
> Pacemaker can restart the VM, if needed, for example.)
>
> (Our goal is also to reuse existing probes from other monitoring
> frameworks, not rewrite them.)

Well... Most monitors use software from somewhere else, but I didn't
know about your effort - so no, I wasn't talking about that effort -
although there is some similarity.

What I've heard from other folks using the other monitoring frameworks,
is that one of the biggest issues with Nagios for example is that the
monitoring agents aren't very reliable.

In spite of that, I've certainly given some thought to writing a Nagios
plugin for the LRM for my purposes.

--
Alan Robertson <alanr@unix.sh> - @OSSAlanR