Mailing List Archive

Netapp: DFM refresh lun list - possible race issue
Hi,

I stumbled upon an issue with the DFM LUN list and how it is reflected in
the self.discovered_luns list. Before filing a ticket, I would like to ask
whether anybody has seen this problem.

Background:
When creating a volume from a snapshot, the NetApp driver creates a
new LUN (OpenStack volume) by cloning an existing LUN (OpenStack
snapshot).

def create_volume_from_snapshot(self, volume, snapshot):
    ...
    self._clone_lun(lun.HostId, src_path, dest_path, False)
    self._refresh_dfm_luns(lun.HostId)
    self._discover_dataset_luns(dataset, clone_name)

1) _clone_lun - creates the new LUN on the filer
2) _refresh_dfm_luns - asks DFM to refresh its LUN list by querying
filer 'HostId'. This call blocks until the DFM refresh finishes.
3) _discover_dataset_luns - reads the list of LUNs from DFM and updates
the internal self.discovered_luns

Problem:
After "_refresh_dfm_luns" finishes, DFM still reports the LUN
list _without_ the LUN that was just created. This happens
sporadically; in my case, about 10-15% of the time. When the new LUN is
missing from "self.discovered_luns", the subsequent "create_export"
fails with "Error: No entry in LUN table for volume ..".

Notes:
I have also tested this by creating LUNs manually and running code
similar to _refresh_dfm_luns / _discover_dataset_luns, with the same
results.

The driver's code looks correct to me. It seems that DFM
cannot guarantee that its LUN list is up to date. I suspect
that explicit refresh jobs (from the driver) may be interfering with
DFM's internal (scheduled) refresh jobs.

Workaround:
I'm thinking about wrapping steps 2) and 3) in a loop that tests whether
the cloned LUN is in the discovered_luns list. Even if the first
refresh/discover returns out-of-date data, the second seems to be
fine.
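
A minimal sketch of such a loop, not the driver's actual code. The helper
name refresh_until_visible, its three callables, and the attempts/delay
parameters are all hypothetical; in the driver they would wrap
_refresh_dfm_luns, _discover_dataset_luns and a check against
self.discovered_luns.

```python
import time


def refresh_until_visible(refresh, discover, is_visible, attempts=3, delay=1.0):
    """Repeat steps 2) and 3) until the cloned LUN shows up.

    All three callables are hypothetical stand-ins for the driver calls.
    Returns the number of attempts used; raises RuntimeError if the LUN
    is still missing after the last attempt.
    """
    for attempt in range(1, attempts + 1):
        refresh()    # step 2: ask DFM to refresh its LUN list (blocking)
        discover()   # step 3: re-read the LUN list from DFM
        if is_visible():
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # brief pause before retrying a stale list
    raise RuntimeError("LUN still missing after %d refresh attempts" % attempts)
```

When DFM returns a fresh list on the first try, the loop exits after a
single pass, so the only extra cost is the visibility check.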

Has anybody seen this? Or does anyone have a better idea for how to work around it?

Regards,

Brano Zarnovican

PS: The NetApp driver (7-mode) is the latest from the Folsom branch, DFM
version is 5.1, and the filer runs ONTAP 7.3.6P5.

--
Mailing list: https://launchpad.net/~openstack-volume
Post to : openstack-volume@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack-volume
More help : https://help.launchpad.net/ListHelp
Re: Netapp: DFM refresh lun list - possible race issue
On Wed, Dec 5, 2012 at 10:41 PM, Swartzlander, Ben
<Ben.Swartzlander@netapp.com> wrote:
> Brano, I think you may be right about DFM failing to ensure the list of LUNs is up to date. Your workaround will probably solve the issue you're seeing.

Hi Ben (and Rushi),

Sorry for the much-delayed response.

> My question is whether it would slow down operations at all in the 85-90% of the time when DFM does the right thing.

There is no slowdown in the good case. It only invokes an extra
'_lookup_lun_for_volume', which is cheap.

> If not, then I would like to take the change and put it in the official driver.

I would still like to know whether DFM is supposed to return an
up-to-date list or not. Maybe there is a nicer workaround, such as
tweaking some config variables on DFM (or ONTAP?).

> Also please do file a bug and we'll make sure it gets assigned to someone at NetApp.

I have just submitted an OpenStack bug report with all the details:
https://bugs.launchpad.net/nova/+bug/1095633

Regards,

Brano Zarnovican
