Mailing List Archive

VirtualDomain issue
Hi

code snippet from
http://hg.linux-ha.org/agents/raw-file/7a11934b142d/heartbeat/VirtualDomain
(which I believe is the current version)

VirtualDomain_Validate_All() {
<snip>
    if [ ! -r $OCF_RESKEY_config ]; then
        if ocf_is_probe; then
            ocf_log info "Configuration file $OCF_RESKEY_config not readable during probe."
        else
            ocf_log error "Configuration file $OCF_RESKEY_config does not exist or is not readable."
            return $OCF_ERR_INSTALLED
        fi
    fi
}
<snip>
VirtualDomain_Validate_All || exit $?
<snip>
if ocf_is_probe && [ ! -r $OCF_RESKEY_config ]; then
    exit $OCF_NOT_RUNNING
fi

So, say one node does not have the config, but the cluster decides to
run the VM on that node. The probe returns NOT_RUNNING, so the cluster
tries to start the VM. That start returns ERR_INSTALLED, so the cluster
has to recover from the start failure by stopping the resource. But that
stop op returns ERR_INSTALLED as well, so the node has to be stonith'd.
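
To make the sequence concrete, here is a minimal sketch that mimics only
the return codes of the logic quoted above. It is not the agent itself:
agent_rc is a made-up stand-in, "probe" is just a label for a monitor
call where ocf_is_probe is true, and /nonexistent/domain.xml is a
placeholder path. The numeric values are the standard OCF ones.

```shell
#!/bin/sh
# OCF return codes, with their standard numeric values.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# Hypothetical stand-in for the agent's entry point (not the real agent).
# $1 is the action ("probe" here means a monitor for which ocf_is_probe
# is true), $2 is the path the config parameter points to.
agent_rc() {
    action=$1
    config=$2
    if [ ! -r "$config" ]; then
        if [ "$action" = probe ]; then
            # Validate_All only logs; the later probe check exits NOT_RUNNING.
            return $OCF_NOT_RUNNING
        fi
        # Any other action fails in Validate_All.
        return $OCF_ERR_INSTALLED
    fi
    # Simplification: the real agent would go on to do the actual work.
    return $OCF_SUCCESS
}

# Walk through the actions the cluster would run on a node without the config.
for action in probe start stop; do
    rc=0
    agent_rc "$action" /nonexistent/domain.xml || rc=$?
    echo "$action -> rc=$rc"
done
```

Run on a node where the file is missing, this prints rc=7 for the probe
and rc=5 for both start and stop, which is the chain described above.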

I think this is wrong behaviour. I read the comments about
configurations being on shared storage which might not be available at
certain points in time, and I see the point. But the way this is
implemented clearly does not work for everybody. I vote for making this
configurable. Unfortunately, for several reasons, I am not able to
contribute this patch myself at the moment.
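
One possible reading of "configurable", as a rough sketch only: a
boolean resource parameter that decides whether an unreadable config
during a probe is tolerated (the current behaviour, meant for configs on
shared storage) or reported as ERR_INSTALLED. The parameter name
tolerate_missing_config and the helper probe_rc_without_config are
invented for illustration and do not exist in the agent.

```shell
#!/bin/sh
# OCF return codes, with their standard numeric values.
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# Hypothetical parameter, not part of the agent. Resource parameters reach
# an agent as OCF_RESKEY_<name> environment variables. Default: keep the
# current behaviour.
: "${OCF_RESKEY_tolerate_missing_config:=true}"

# Hypothetical helper: return code for a probe that finds no readable config.
probe_rc_without_config() {
    case "$OCF_RESKEY_tolerate_missing_config" in
        true|yes|on|1)
            # Config may live on shared storage that is not mounted yet.
            return $OCF_NOT_RUNNING ;;
        *)
            # Config is expected locally; report the node as unusable.
            return $OCF_ERR_INSTALLED ;;
    esac
}
```

With the parameter set to false, the probe would report the problem up
front instead of letting the cluster find out at start time.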

Regards
Dominik
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: VirtualDomain issue
Hi,

On Thu, Jun 23, 2011 at 07:51:48AM +0200, Dominik Klein wrote:
> So, say one node does not have the config, but the cluster decides to
> run the VM on that node. The probe returns NOT_RUNNING, so the cluster
> tries to start the VM. That start returns ERR_INSTALLED, so the cluster
> has to recover from the start failure by stopping the resource. But that
> stop op returns ERR_INSTALLED as well, so the node has to be stonith'd.
>
> I think this is wrong behaviour.

On stop, it should return OCF_SUCCESS. I wonder if it would be
safe for the CRM to interpret ERR_INSTALLED on stop as "resource
stopped."
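
On the agent side, the first half (returning OCF_SUCCESS on stop) could
look roughly like the sketch below. The helper rc_for_action is made up
and the action is passed in by hand. It also assumes that nothing can be
running on a node where the config is unreadable, which is the very
assumption that needs checking.

```shell
#!/bin/sh
# OCF return codes, with their standard numeric values.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# Hypothetical helper, not part of the agent: pick the return code for
# action $1 given the config path $2.
rc_for_action() {
    action=$1
    config=$2
    # Readable config: nothing special to do in this sketch.
    [ -r "$config" ] && return $OCF_SUCCESS
    case "$action" in
        probe) return $OCF_NOT_RUNNING ;;    # as in the current agent
        stop)  return $OCF_SUCCESS ;;        # assumed: nothing to stop
        *)     return $OCF_ERR_INSTALLED ;;  # start and the rest still fail
    esac
}
```

With this, the failed start would be followed by a successful stop, and
the node would not need fencing.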

Opinions?

Cheers,

Dejan

P.S. Very sorry for such a delay!

Re: VirtualDomain issue
On Mon, Nov 14, 2011 at 9:58 PM, Dejan Muhamedagic <dejan@suse.de> wrote:
> On stop, it should return OCF_SUCCESS. I wonder if it would be
> safe for the CRM to interpret ERR_INSTALLED on stop as "resource
> stopped."
>
> Opinions?

Feels dangerous.
Even if the binaries are missing, the RA should arguably look for and
kill any relevant processes before returning OK.
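
Roughly what I mean, as a simulation rather than real agent code. The
helpers list_domain_pids and kill_domain_pid are placeholders that work
on a fake PID list so the control flow can be followed. A real agent
would have to find out which processes belong to the domain, and how to
do that without the config is left open here.

```shell
#!/bin/sh
# OCF return codes, with their standard numeric values.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Placeholder state and helpers (hypothetical): a fake list of PIDs that
# stands in for "processes belonging to this domain".
FAKE_PIDS=""
list_domain_pids() { echo $FAKE_PIDS; }
kill_domain_pid() {
    remaining=""
    for p in $FAKE_PIDS; do
        [ "$p" = "$1" ] || remaining="$remaining $p"
    done
    FAKE_PIDS=$remaining
}

# Stop path for the case where the config (or binaries) cannot be used:
# only report success once no relevant process is left.
stop_without_config() {
    pids=$(list_domain_pids)
    [ -z "$pids" ] && return $OCF_SUCCESS
    for pid in $pids; do
        kill_domain_pid "$pid"
    done
    # Re-check instead of trusting the kill.
    [ -z "$(list_domain_pids)" ] && return $OCF_SUCCESS
    return $OCF_ERR_GENERIC
}
```

If anything survives, the stop still fails and the cluster escalates as
it does today. Only a verified clean state is reported as stopped.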

Re: VirtualDomain issue
Hi,

On Mon, Nov 14, 2011 at 11:58:06AM +0100, Dejan Muhamedagic wrote:
> On stop, it should return OCF_SUCCESS. I wonder if it would be
> safe for the CRM to interpret ERR_INSTALLED on stop as "resource
> stopped."
>
> Opinions?

Florian, can you please ack/nack this patch?

Cheers,

Dejan
