Mailing List Archive

Re: [Linux-HA] Bug in iSCSILogicalUnit
Hi Vadym,

moving the discussion to the -dev list, which is the more appropriate
forum for this. Please reply to -dev; more comments inline.

On Sun, May 20, 2012 at 9:52 PM, Vadym Chepkov <vchepkov@gmail.com> wrote:
> Hi,
>
>
> The monitor operation of  iSCSILogicalUnit  is not specific enough in the regular expression and I got very "nice" fencing going because it was falsely reporting "failed to stop" resource.
>
> I happened to add primitive lun-build10, while already having lun-build1.

It would be bad if _that_ caused problems, but I'm unsure how that
would be related to your patch.

> For myself I have narrowed it down to the following fix, but probably a more appropriate regex has to be applied to these and other commands serving the same purpose for other iSCSI implementations.
>
> diff --git a/heartbeat/iSCSILogicalUnit b/heartbeat/iSCSILogicalUnit
> index 25ee32e..2cee970 100755
> --- a/heartbeat/iSCSILogicalUnit
> +++ b/heartbeat/iSCSILogicalUnit
> @@ -328,7 +328,7 @@ iSCSILogicalUnit_monitor() {
>        tgt)
>            # Figure out and set the target ID
>            TID=`tgtadm --lld iscsi --op show --mode target \
> -               | sed -ne "s/^Target \([[:digit:]]\+\): ${OCF_RESKEY_target_iqn}/\1/p"`
> +               | sed -ne "s/^Target \([[:digit:]]\+\): ${OCF_RESKEY_target_iqn}$/\1/p"`

Adding the end-of-line anchor there does make good sense, but it would
only fix the case of there being two _targets_ sharing part of their
IQN, not two LUs with primitives similarly named. Do you have more
than one target, where the full IQN of one target is a substring of
the IQN of another?

>            if [ -z "$TID" ]; then
>                # Our target is not configured, thus we're not
>                # running.
> @@ -337,7 +337,7 @@ iSCSILogicalUnit_monitor() {
>            # This only looks for the backing store, but does not test
>            # for the correct target ID and LUN.
>            tgtadm --lld iscsi --op show --mode target \
> -               | grep -E -q "[[:space:]]+Backing store.*: ${OCF_RESKEY_path}" && return $OCF_SUCCESS
> +               | grep -E -q "[[:space:]]+Backing store.*: ${OCF_RESKEY_path}$" && return $OCF_SUCCESS
>            ;;
>        lio)

Here the "$" looks OK too, but here it would apply to two backing
devices with overlapping paths. I presume you named your LVs
"lun-build1" and "lun-build10" also, and they're in the same VG?

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-HA] Bug in iSCSILogicalUnit [ In reply to ]
Hi,

I am not subscribed to the devel list, so don't think my e-mail will go there.

Yes, I have the same pattern in iscsi target and in the store path

tgtadm --lld iscsi --op show --mode target | grep ^Target

Target 101: iqn.1999-04.net.videonext:vbuild1
...
Target 110: iqn.1999-04.net.videonext:vbuild10


tgtadm --lld iscsi --op show --mode target | grep Backing|grep by-res
Backing store path: /dev/drbd/by-res/vbuild1
...
Backing store path: /dev/drbd/by-res/vbuild10


vbuild[234] have migrated to other node, see what would have happened if I ran the same command there and had a target called just 'vbuild':

# tgtadm --lld iscsi --op show --mode target | sed -ne "s/^Target \([[:digit:]]\+\): iqn.1999-04.net.videonext:vbuild/\1/p"
1026
1032
1043


On May 21, 2012, at 2:15 PM, Florian Haas wrote:

> Hi Vadym,
>
> moving the discussion to the -dev list, which is the more appropriate
> forum for this. Please reply to -dev; more comments inline.
>
> On Sun, May 20, 2012 at 9:52 PM, Vadym Chepkov <vchepkov@gmail.com> wrote:
>> Hi,
>>
>>
>> The monitor operation of iSCSILogicalUnit is not specific enough in the regular expression and I got very "nice" fencing going because it was falsely reporting "failed to stop" resource.
>>
>> I happened to add primitive lun-build10, while already having lun-build1.
>
> It would be bad if _that_ caused problems, but I'm unsure how that
> would be related to your patch.
>
>> For myself I have narrowed it down to the following fix, but probably a more appropriate regex has to be applied to these and other commands serving the same purpose for other iSCSI implementations.
>>
>> diff --git a/heartbeat/iSCSILogicalUnit b/heartbeat/iSCSILogicalUnit
>> index 25ee32e..2cee970 100755
>> --- a/heartbeat/iSCSILogicalUnit
>> +++ b/heartbeat/iSCSILogicalUnit
>> @@ -328,7 +328,7 @@ iSCSILogicalUnit_monitor() {
>> tgt)
>> # Figure out and set the target ID
>> TID=`tgtadm --lld iscsi --op show --mode target \
>> - | sed -ne "s/^Target \([[:digit:]]\+\): ${OCF_RESKEY_target_iqn}/\1/p"`
>> + | sed -ne "s/^Target \([[:digit:]]\+\): ${OCF_RESKEY_target_iqn}$/\1/p"`
>
> Adding the end-of-line anchor there does make good sense, but it would
> only fix the case of there being two _targets_ sharing part of their
> IQN, not two LUs with primitives similarly named. Do you have more
> than one target, where the full IQN of one target is a substring of
> the IQN of another?
>
>> if [ -z "$TID" ]; then
>> # Our target is not configured, thus we're not
>> # running.
>> @@ -337,7 +337,7 @@ iSCSILogicalUnit_monitor() {
>> # This only looks for the backing store, but does not test
>> # for the correct target ID and LUN.
>> tgtadm --lld iscsi --op show --mode target \
>> - | grep -E -q "[[:space:]]+Backing store.*: ${OCF_RESKEY_path}" && return $OCF_SUCCESS
>> + | grep -E -q "[[:space:]]+Backing store.*: ${OCF_RESKEY_path}$" && return $OCF_SUCCESS
>> ;;
>> lio)
>
> Here the "$" looks OK too, but here it would apply to two backing
> devices with overlapping paths. I presume you named your LVs
> "lun-build1" and "lun-build10" also, and they're in the same VG?
>
> Cheers,
> Florian
>
> --
> Need help with High Availability?
> http://www.hastexo.com/now
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-HA] Bug in iSCSILogicalUnit [ In reply to ]
Hi,

On Tue, May 22, 2012 at 12:02:49PM +0200, Florian Haas wrote:
> On Tue, May 22, 2012 at 11:48 AM, Florian Haas <florian@hastexo.com> wrote:
> > Your patch looks fine to me for STGT; for IET it should be equally
> > trivial but sadly I don't have the capacity to test that at the
> > moment. Anyone able to give IET with an updated agent a spin?
>
> Fixed RA is in the "iscsi" branch in my fork of resource-agents:
>
> https://github.com/fghaas/resource-agents/tree/iscsi
> https://github.com/fghaas/resource-agents/commits/iscsi/heartbeat/iSCSILogicalUnit

Pulled into upstream master. Many thanks to both of you!

Cheers,

Dejan

> Cheers,
> Florian
>
> --
> Need help with High Availability?
> http://www.hastexo.com/now
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/