ManageVE prints bogus errors to the syslog
Hi,

When stopping a node of our cluster managing a bunch of OpenVZ CTs, I
get a lot of such messages in the syslog:

Mar 20 17:20:44 localhost ManageVE[2586]: ERROR: vzctl status 10002 returned: 10002 does not exist.
Mar 20 17:20:44 localhost lrmd: [2547]: info: operation monitor[6] on opensim for client 2550: pid 2586 exited with return code 7

It looks to me as if lrmd is making sure the CT is not running anymore.
However, this triggers ManageVE to print an error.

Since the result in this case is expected, shouldn't ManageVE avoid
printing an error? It looks as if something went wrong, and our log
monitor catches it every time, although nothing is actually wrong.

Roman


_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: ManageVE prints bogus errors to the syslog
Hi,

On Fri, Mar 22, 2013 at 08:41:30AM +0100, Roman Haefeli wrote:
> Hi,
>
> When stopping a node of our cluster managing a bunch of OpenVZ CTs, I
> get a lot of such messages in the syslog:
>
> Mar 20 17:20:44 localhost ManageVE[2586]: ERROR: vzctl status 10002 returned: 10002 does not exist.
> Mar 20 17:20:44 localhost lrmd: [2547]: info: operation monitor[6] on opensim for client 2550: pid 2586 exited with return code 7
>
> It looks to me as if lrmd is making sure the CT is not running anymore.
> However, this triggers ManageVE to print an error.

Could be. Looking at the RA, there's a bunch of places where the
status is invoked and where this message could get logged. It
could be improved. The following patch should help:

https://github.com/ClusterLabs/resource-agents/commit/ca987afd35226145f48fb31bef911aa3ed3b6015
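
As an illustration of the idea (a minimal sketch, not the content of the
linked commit): a monitor can treat a stopped or missing container as the
expected "not running" result, return 7 (OCF_NOT_RUNNING), and log it below
error level. The names fake_vzctl_status, log_msg and monitor_ve are
hypothetical, and the status line format is an assumption; the real agent
would use ocf_log and the OCF_* codes from the resource-agents shell library.

```bash
#!/bin/bash
# Stand-ins for the OCF return codes normally provided by the shell library.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical logger: collects "<level>: <message>" lines instead of calling ocf_log.
LOG=""
log_msg() {
  LOG="${LOG}$1: $2
"
}

# Hypothetical stub for `vzctl status <id>`; the output format is an assumption.
fake_vzctl_status() {
  case "$1" in
    10001) echo "CTID 10001 exist mounted running" ;;
    *)     echo "CTID $1 deleted unmounted down" ;;
  esac
}

monitor_ve() {
  local veid=$1 out
  if ! out=$(fake_vzctl_status "$veid"); then
    # The status command itself failed: this is a real error.
    log_msg err "vzctl status $veid failed"
    return $OCF_ERR_GENERIC
  fi
  set -- $out   # split the status line into $1..$5
  if [ "$3" = "exist" ] && [ "$5" = "running" ]; then
    return $OCF_SUCCESS
  fi
  # Expected outcome for a stopped or missing container: not an error.
  log_msg debug "CT $veid is not running ($3, $5)"
  return $OCF_NOT_RUNNING
}
```

With these stubs, monitor_ve 10002 returns 7 and leaves only a debug line in
LOG, so a log monitor watching for ERROR would stay quiet.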

Cheers,

Dejan

> Since the result in this case is expected, shouldn't ManageVE avoid
> printing an error? It looks as if something went wrong, and our log
> monitor catches it every time, although nothing is actually wrong.
>
> Roman
Re: ManageVE prints bogus errors to the syslog
On Wed, Apr 03, 2013 at 06:25:58PM +0200, Dejan Muhamedagic wrote:
> Could be. Looking at the RA, there's a bunch of places where the
> status is invoked and where this message could get logged. It
> could be improved. The following patch should help:
>
> https://github.com/ClusterLabs/resource-agents/commit/ca987afd35226145f48fb31bef911aa3ed3b6015

BTW, why call `vzctl | awk` *twice*,
just to get two items out of the vzctl output?

How about losing the awk and the second invocation?
Something like this:
(Should veexists and vestatus be local as well?)

diff --git a/heartbeat/ManageVE b/heartbeat/ManageVE
index 56a3d03..53f9bab 100755
--- a/heartbeat/ManageVE
+++ b/heartbeat/ManageVE
@@ -182,10 +182,12 @@ migrate_from_ve()
 status_ve()
 {
   declare -i retcode
-
-  veexists=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $3}'`
-  vestatus=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $5}'`
+  local vzstatus
+  vzstatus=`$VZCTL status $VEID 2>/dev/null`
   retcode=$?
+  set -- $vzstatus
+  veexists=$3
+  vestatus=$5

   if [[ $retcode != 0 ]]; then
     ocf_log err "vzctl status $VEID returned: $retcode"
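
The effect of the patch can be tried without an OpenVZ host by stubbing the
command. The sketch below is a stand-in built on assumptions: fake_vzctl, its
output format and its exit code 3 are made up for illustration and say nothing
about what real vzctl returns. It shows two things: the shell can split the
status line itself via set --, and $? after a pipeline is the exit status of
awk, not of the command feeding it.

```bash
#!/bin/bash
# Hypothetical stub for `vzctl status <id>`. Output format and the
# non-zero exit for unknown IDs are assumptions for this demo only.
fake_vzctl() {
  # $1 = "status", $2 = container ID
  if [ "$2" = "10001" ]; then
    echo "CTID $2 exist mounted running"
  else
    echo "CTID $2 deleted unmounted down"
    return 3
  fi
}

# Old shape: two pipelines. $? is the status of awk (the last command
# in the pipeline), so a failure of the status command is masked.
status_two_calls() {
  veexists=$(fake_vzctl status "$1" 2>/dev/null | awk '{print $3}')
  vestatus=$(fake_vzctl status "$1" 2>/dev/null | awk '{print $5}')
  retcode=$?
}

# Patched shape: one call, fields split by the shell, and $? is the
# status of the command itself.
status_one_call() {
  local vzstatus
  vzstatus=$(fake_vzctl status "$1" 2>/dev/null)
  retcode=$?
  set -- $vzstatus   # unquoted on purpose: word splitting yields $1..$5
  veexists=$3
  vestatus=$5
}
```

Declaring vzstatus on its own line matters: writing local vzstatus=$(...)
would leave $? set to the status of local rather than of the command.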



[ BTW, what's all the "declare -i" doing in there?
"local" would have done nicely. ]

Re: ManageVE prints bogus errors to the syslog
Hi Lars,

On Thu, Apr 04, 2013 at 09:28:00PM +0200, Lars Ellenberg wrote:
> BTW, why call `vzctl | awk` *twice*,
> just to get two items out of the vzctl output?
>
> How about losing the awk and the second invocation?
> Something like this:
> (Should veexists and vestatus be local as well?)
>
> diff --git a/heartbeat/ManageVE b/heartbeat/ManageVE
> index 56a3d03..53f9bab 100755
> --- a/heartbeat/ManageVE
> +++ b/heartbeat/ManageVE
> @@ -182,10 +182,12 @@ migrate_from_ve()
> status_ve()
> {
> declare -i retcode
> -
> - veexists=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $3}'`
> - vestatus=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $5}'`
> + local vzstatus
> + vzstatus=`$VZCTL status $VEID 2>/dev/null`
> retcode=$?
> + set -- $vzstatus
> + veexists=$3
> + vestatus=$5
>
> if [[ $retcode != 0 ]]; then
> ocf_log err "vzctl status $VEID returned: $retcode"

Well, you do have commit rights, don't you? :)

> [ BTW, what's all the "declare -i" doing in there?
> "local" would have done nicely. ]

No idea. But since the RA runs under /bin/bash, I guess it doesn't
matter.
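
For reference, a small bash sketch of the difference (function and variable
names are hypothetical): inside a function, declare makes the variable local
just as local does, and -i additionally gives it the integer attribute, so
assigned strings are evaluated arithmetically.

```bash
#!/bin/bash
# declare -i inside a function: local scope plus integer attribute.
with_declare() {
  declare -i retcode
  retcode="3+4"            # evaluated as arithmetic because of -i
  result_declare=$retcode
}

# local: local scope, plain string semantics.
with_local() {
  local retcode
  retcode="3+4"            # stored as the literal string
  result_local=$retcode
}

retcode="outer"            # global value, must survive both calls
with_declare
with_local
echo "declare -i: $result_declare, local: $result_local, outer: $retcode"
# prints: declare -i: 7, local: 3+4, outer: outer
```

For a variable that only ever holds $?, both forms behave the same. declare
is not available in plain POSIX sh, so it is only safe here because the agent
runs under bash.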

Cheers,

Dejan

Re: ManageVE prints bogus errors to the syslog
On Fri, Apr 05, 2013 at 12:39:46PM +0200, Dejan Muhamedagic wrote:
> Well, you do have commit rights, don't you? :)

Sure, but I don't have a vz handy to test even "obviously correct"
patches with, before I commit...


Lars
Re: ManageVE prints bogus errors to the syslog
On Wed, Apr 10, 2013 at 12:23:38AM +0200, Lars Ellenberg wrote:
> Sure, but I don't have a vz handy to test even "obviously correct"
> patches with, before I commit...

Looked correct to me too, but then it wouldn't have been the
first time I got something wrong :D

Maybe the reporter can help with testing. Roman?

Cheers,

Dejan

>
> Lars