Patches for VirtualDomain RA
Hi,

I hope I found the correct list. Playing with the VirtualDomain RA I found two
problems. Please find the description and patches below.


1) During stop operation libvirt occasionally returns an error because the
state cannot be determined just the moment the machine is shut down. This
patch makes the RA try to get the state again one time. If the machine is down
then everything is OK.

--- /root/VirtualDomain 2011-07-29 08:39:30.652675972 +0200
+++ /usr/lib/ocf/resource.d/heartbeat/VirtualDomain 2011-07-29 10:08:24.712790703 +0200
@@ -149,6 +149,7 @@
VirtualDomain_Status() {
rc=$OCF_ERR_GENERIC
status="no state"
+ bail_wait="yes";
while [ "$status" = "no state" ]; do
status="`virsh $VIRSH_OPTIONS domstate $DOMAIN_NAME`"
case "$status" in
@@ -177,8 +178,13 @@
# During the stop operation, we want to bail out
# quickly, so as to be able to force-stop (destroy)
# the domain if necessary.
- ocf_log error "Virtual domain $DOMAIN_NAME has no state during stop operation, bailing out."
- return $OCF_ERR_GENERIC;
+ ocf_log info "Virtual domain $DOMAIN_NAME has no state during stop operation."
+ if [ "$bail_wait" = "no" ]; then
+ ocf_log error "Virtual domain $DOMAIN_NAME has no state during stop operation, bailing out."
+ return $OCF_ERR_GENERIC;
+ fi
+ bail_wait="no"
+ sleep 1
else
# During all other actions, we just wait and try
# again, relying on the CRM/LRM to time us out if
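
For readers who want to try the retry-once idea outside the RA, here is a
minimal sketch. It is not the RA code: get_state is a hypothetical stand-in
for "virsh $VIRSH_OPTIONS domstate $DOMAIN_NAME" that returns "no state" on
the first call and "shut off" afterwards, and status_with_one_retry is an
illustrative name.

```shell
#!/bin/sh
# Sketch only: get_state is a hypothetical stand-in for
# "virsh $VIRSH_OPTIONS domstate $DOMAIN_NAME", so this runs without libvirt.
# It reports "no state" on the first call and "shut off" afterwards.
CALLS=0
get_state() {
    CALLS=$((CALLS + 1))
    if [ "$CALLS" -eq 1 ]; then
        STATE="no state"
    else
        STATE="shut off"
    fi
}

# Returns 0 once a definite state is seen, 1 if "no state" is seen twice.
status_with_one_retry() {
    bail_wait="yes"
    while :; do
        get_state
        if [ "$STATE" != "no state" ]; then
            return 0
        fi
        if [ "$bail_wait" = "no" ]; then
            echo "no state during stop operation, bailing out" >&2
            return 1
        fi
        # First "no state": wait a second and ask exactly one more time.
        bail_wait="no"
        sleep 1
    done
}

status_with_one_retry
echo "rc=$? state=$STATE calls=$CALLS"
```

With the stub above, the second query sees a definite state, so the function
returns success after two calls instead of bailing out on the first one.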

2) The next problem is that a graceful shutdown sometimes does not work when
the machine just booted. This patch makes the RA send a shutdown command every
10 seconds while shutting down the machine. This catches the boot problem.

@@ -234,6 +240,9 @@
shutdown_timeout=$((($OCF_RESKEY_CRM_meta_timeout/1000)-5))
# Loop on status for $shutdown_timeout seconds
for i in `seq $shutdown_timeout`; do
+ if [ $((i%10)) -eq 0 ]; then
+ virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}
+ fi
VirtualDomain_Status
status=$?
case $status in
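
To make the timing of this hunk concrete, the following sketch replays only
the arithmetic, without calling virsh. The 60000 ms value is an assumption
chosen for illustration, and the sketch assumes the loop runs roughly once
per second, as the "Loop on status for $shutdown_timeout seconds" comment
suggests.

```shell
#!/bin/sh
# Assumed value for illustration: a 60 s stop timeout in milliseconds,
# as it would arrive in OCF_RESKEY_CRM_meta_timeout.
OCF_RESKEY_CRM_meta_timeout=60000

# Same arithmetic as in the hunk: convert to seconds, keep a 5 s margin.
shutdown_timeout=$(( (OCF_RESKEY_CRM_meta_timeout / 1000) - 5 ))

resend_at=""
i=1
while [ "$i" -le "$shutdown_timeout" ]; do
    # The patch re-sends the shutdown request on every 10th iteration.
    if [ $((i % 10)) -eq 0 ]; then
        resend_at="$resend_at $i"
    fi
    i=$((i + 1))
done

echo "shutdown_timeout=$shutdown_timeout"
echo "resend at iterations:$resend_at"
```

Under these assumptions the request is repeated at iterations 10, 20, 30, 40
and 50, so a guest that missed the initial request waits about ten seconds
before it gets another one.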

Greetings,
--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98
Fax: (089) 620 304 13
Re: Patches for VirtualDomain RA
Hi,

On Fri, Jul 29, 2011 at 10:22:40AM +0200, Michael Schwartzkopff wrote:
> Hi,
>
> I hope I found the correct list. Playing with the VirtualDomain RA I found two
> problems. Please find the description and patches below.

It's the right ML, but we're temporarily out of free cycles.
I hope that Florian will take care of this once he's back.

Cheers,

Dejan


_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: Patches for VirtualDomain RA
On 2011-07-29 10:22, Michael Schwartzkopff wrote:
> Hi,
>
> I hope I found the correct list. Playing with the VirtualDomain RA I found two
> problems. Please find the description and patches below.

Sorry for not tending to this for a while, and thanks to Dejan for the
reminder.

> 1) During stop operation libvirt occasionally returns an error because the
> state cannot be determined just the moment the machine is shut down. This
> patch makes the RA try to get the state again one time. If the machine is down
> then everything is OK.
>
> --- /root/VirtualDomain 2011-07-29 08:39:30.652675972 +0200
> +++ /usr/lib/ocf/resource.d/heartbeat/VirtualDomain 2011-07-29 10:08:24.712790703 +0200
> @@ -149,6 +149,7 @@
> VirtualDomain_Status() {
> rc=$OCF_ERR_GENERIC
> status="no state"
> + bail_wait="yes";
> while [ "$status" = "no state" ]; do
> status="`virsh $VIRSH_OPTIONS domstate $DOMAIN_NAME`"
> case "$status" in
> @@ -177,8 +178,13 @@
> # During the stop operation, we want to bail out
> # quickly, so as to be able to force-stop (destroy)
> # the domain if necessary.
> - ocf_log error "Virtual domain $DOMAIN_NAME has no state during stop operation, bailing out."
> - return $OCF_ERR_GENERIC;
> + ocf_log info "Virtual domain $DOMAIN_NAME has no state during stop operation."
> + if [ "$bail_wait" = "no" ]; then
> + ocf_log error "Virtual domain $DOMAIN_NAME has no state during stop operation, bailing out."
> + return $OCF_ERR_GENERIC;
> + fi
> + bail_wait="no"
> + sleep 1
> else
> # During all other actions, we just wait and try
> # again, relying on the CRM/LRM to time us out if

Can you please configure your mail agent to not insert line breaks when
you send patches? Better still, use git send-email.

At any rate, I consider the patch obsolete (and actually, it was already
when it was submitted), as Lars Ellenberg implemented a "try this three
times" logic in commit ffc83235, on July 1, 2010:

https://github.com/ClusterLabs/resource-agents/commit/ffc8323515c19bc51fe0801fc3d2610878699ce3

> 2) The next problem is that a graceful shutdown sometimes does not work when
> the machine just booted. This patch makes the RA send a shutdown command every
> 10 seconds while shutting down the machine. This catches the boot problem.
>
> @@ -234,6 +240,9 @@
> shutdown_timeout=$((($OCF_RESKEY_CRM_meta_timeout/1000)-5))
> # Loop on status for $shutdown_timeout seconds
> for i in `seq $shutdown_timeout`; do
> + if [ $((i%10)) -eq 0 ]; then
> + virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}
> + fi
> VirtualDomain_Status
> status=$?
> case $status in

I see the point -- if you're issuing a KVM shutdown while the machine is
still booting and the guest's acpid is not started, then the shutdown
effectively doesn't happen. And issuing a shutdown request for a domain
that's already got one should do no harm.

Question is, why only do this every 10 seconds then? Might as well do it
on every iteration. So we could just roll the invocation of "virsh
$VIRSH_OPTIONS shutdown ${DOMAIN_NAME}" into the existing "while [ $NOW
-lt $shutdown_timeout ]; do" loop.
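
A rough sketch of that every-iteration variant, under stated assumptions:
request_shutdown and domain_is_running are hypothetical stand-ins for
"virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}" and VirtualDomain_Status, the
stub guest only reacts to the third request, and the 10 second budget is
picked for the example. It is not the loop as it exists in the RA.

```shell
#!/bin/sh
# Sketch only: these two functions are hypothetical stand-ins for
# "virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}" and VirtualDomain_Status,
# so the loop can run without libvirt.
REQUESTS=0
request_shutdown() {
    REQUESTS=$((REQUESTS + 1))
}
# Pretend the guest ignores the first two requests (acpid not up yet).
domain_is_running() {
    [ "$REQUESTS" -lt 3 ]
}

NOW=$(date +%s)
shutdown_timeout=$((NOW + 10))   # assumed 10 s budget for the example
RESULT="timed out"

while [ "$NOW" -lt "$shutdown_timeout" ]; do
    request_shutdown             # re-sent on every pass, not every 10th
    if ! domain_is_running; then
        RESULT="stopped"
        break
    fi
    sleep 1
    NOW=$(date +%s)
done

echo "result=$RESULT requests=$REQUESTS"
```

With this stub the loop ends after the third request, well inside the budget.
If the guest never reacts, RESULT stays "timed out" and the caller can fall
back to a forced stop.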

What do others think?

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now
Re: Patches for VirtualDomain RA
> > 2) The next problem is that a graceful shutdown sometimes does not work
> > when the machine just booted. This patch makes the RA send a shutdown
> > command every 10 seconds while shutting down the machine. This catches
> > the boot problem.
> >
> > @@ -234,6 +240,9 @@
> > shutdown_timeout=$((($OCF_RESKEY_CRM_meta_timeout/1000)-5))
> > # Loop on status for $shutdown_timeout seconds
> > for i in `seq $shutdown_timeout`; do
> > + if [ $((i%10)) -eq 0 ]; then
> > + virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}
> > + fi
> > VirtualDomain_Status
> > status=$?
> > case $status in
>
> I see the point -- if you're issuing a KVM shutdown while the machine is
> still booting and the guest's acpid is not started, then the shutdown
> effectively doesn't happen. And issuing a shutdown request for a domain
> that's already got one should do no harm.
>
> Question is, why only do this every 10 seconds then? Might as well do it
> on every iteration. So we could just roll the invocation of "virsh
> $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}" into the existing "while [ $NOW
> -lt $shutdown_timeout ]; do" loop.
>
> What do others think?

Perhaps the shutdown might cause a considerable load on the system. If the
system is slow, it might not be a good idea to fire a shutdown every second. On
the other hand, if acpid is not started, the shutdown will do no harm.

I am not against firing every second.

Greetings,

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98
Re: Patches for VirtualDomain RA
On 2011-11-11 11:42, Michael Schwartzkopff wrote:
>>> 2) The next problem is that a graceful shutdown sometimes does not work
>>> when the machine just booted. This patch makes the RA send a shutdown
>>> command every 10 seconds while shutting down the machine. This catches
>>> the boot problem.
>>>
>>> @@ -234,6 +240,9 @@
>>> shutdown_timeout=$((($OCF_RESKEY_CRM_meta_timeout/1000)-5))
>>> # Loop on status for $shutdown_timeout seconds
>>> for i in `seq $shutdown_timeout`; do
>>> + if [ $((i%10)) -eq 0 ]; then
>>> + virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}
>>> + fi
>>> VirtualDomain_Status
>>> status=$?
>>> case $status in
>>
>> I see the point -- if you're issuing a KVM shutdown while the machine is
>> still booting and the guest's acpid is not started, then the shutdown
>> effectively doesn't happen. And issuing a shutdown request for a domain
>> that's already got one should do no harm.
>>
>> Question is, why only do this every 10 seconds then? Might as well do it
>> on every iteration. So we could just roll the invocation of "virsh
>> $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}" into the existing "while [ $NOW
>> -lt $shutdown_timeout ]; do" loop.
>>
>> What do others think?
>
> Perhaps the shutdown might cause a considerable load on the system.

Why?

Florian