Mailing List Archive

[PATCH 0/3] xen-blkback: refactor vbd remove/disconnect.
This patchset is a backport; the original patch was written by Daniel Stodden:
http://xenbits.xen.org/hg/XCP/linux-2.6.32.pq.hg/file/tip/CA-7672-blkback-shutdown.patch

Initial issue:
When we run a block device attach/detach test with the steps below, umount
hangs in the guest and the guest is unable to shut down:

1. Start a guest with the latest kernel.
2. Attach a new block device with xm block-attach in Dom0.
3. Mount the new disk in the guest.
4. Run xm block-detach in Dom0 to detach the block device and wait until it
times out.
5. Try to unmount the disk in the guest; umount hangs. From this point on, any
I/O to the device hangs.

Root cause:
This is caused by 'xm block-detach' in Dom0 setting the backend device's state
to 'XenbusStateClosing'. The frontend receives the notification and
blkfront_closing() is called. At that moment the disk is still in use by the
guest, so the frontend refuses to close. In blkfront_closing(), the frontend
nevertheless notifies the backend that its own state has switched to
'Closing'. When the backend gets that event, it disconnects from the real
device. From then on any I/O request is stuck, even an attempt to release the
disk with umount.
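
To make the sequence above concrete, here is a minimal Python sketch of the
state handshake. It is a toy model, not kernel code: the ToyVbd class and its
method names are hypothetical, and only the XenbusState values (as defined in
Xen's public io/xenbus.h) and the blkfront_closing() name refer to real code.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    # Values mirror enum xenbus_state in Xen's public io/xenbus.h.
    UNKNOWN = 0
    INITIALISING = 1
    INIT_WAIT = 2
    INITIALISED = 3
    CONNECTED = 4
    CLOSING = 5
    CLOSED = 6


class ToyVbd:
    """Hypothetical model of one blkfront/blkback pair (illustration only)."""

    def __init__(self):
        self.backend = XenbusState.CONNECTED
        self.frontend = XenbusState.CONNECTED
        self.mounted = True            # the guest still has the disk mounted
        self.backend_attached = True   # backend still holds the real device

    def block_detach(self):
        # Dom0 toolstack: 'xm block-detach' moves the backend to Closing.
        self.backend = XenbusState.CLOSING
        self._frontend_sees_backend_closing()

    def _frontend_sees_backend_closing(self):
        # Stands in for blkfront_closing(): the disk is in use, so the
        # frontend refuses to close but still reports Closing.
        if self.mounted:
            self.frontend = XenbusState.CLOSING
        else:
            self.frontend = XenbusState.CLOSED
        self._backend_sees_frontend_change()

    def _backend_sees_frontend_change(self):
        # Behaviour described in the root cause: the backend disconnects
        # from the real device as soon as the frontend reports Closing.
        if self.frontend in (XenbusState.CLOSING, XenbusState.CLOSED):
            self.backend_attached = False

    def submit_io(self):
        # Any request issued after the disconnect never completes.
        return "completed" if self.backend_attached else "stuck"
```

Calling block_detach() on a ToyVbd whose disk is still mounted leaves both
ends at Closing with the backend already disconnected, so submit_io() reports
"stuck" from then on, which is the hang seen in step 5.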

So this could be fixed in either the frontend or the backend. I have sent a
fix for the frontend:
https://lkml.org/lkml/2011/7/8/159
Ian thinks we should fix it in the backend, and he pointed out that Daniel
Stodden has submitted a patch (see the link above) for xen-blkback. I tried it
and it works well.

 drivers/block/xen-blkback/blkback.c |  10 +--
 drivers/block/xen-blkback/common.h  |   5 +
 drivers/block/xen-blkback/xenbus.c  | 202 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 191 insertions(+), 26 deletions(-)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH 0/3] xen-blkback: refactor vbd remove/disconnect.
>>> Joe Jin 08/03/11 4:10 AM >>>
>1. Start a guest with the latest kernel.
>2. Attach a new block device with xm block-attach in Dom0.
>3. Mount the new disk in the guest.
>4. Run xm block-detach in Dom0 to detach the block device and wait until it
>times out.
>5. Try to unmount the disk in the guest; umount hangs. From this point on, any
>I/O to the device hangs.

A bogus sequence of operations - an operator in Dom0 shouldn't remove
a device that is still in use in a guest, except as an exceptional measure
(and then other bad behavior is to be expected). What's the use case?

Jan

Re: [PATCH 0/3] xen-blkback: refactor vbd remove/disconnect.
On Wed, 2011-08-03 at 10:53 -0400, Jan Beulich wrote:
> >>> Joe Jin 08/03/11 4:10 AM >>>
> >1. Start a guest with the latest kernel.
> >2. Attach a new block device with xm block-attach in Dom0.
> >3. Mount the new disk in the guest.
> >4. Run xm block-detach in Dom0 to detach the block device and wait until it
> >times out.

Doesn't this fail immediately in userland right now? The disconnect
attempt (blkback switching to Closing) will be acknowledged with an
error, so you'd watch backend and frontend state simultaneously.

From there on, there might be options. Could switch back to Connected
and call it a day, but that's currently nowhere exercised, so not sure
if it's fully stable.

But the behavior established in XCP is to bail out, but leave the
backend at Closing. That means from the backend perspective, the VBD
will unplug and get cleaned up at some unspecified later point in time,
but stays operational until then.

Note that the latter takes udev work, to finally clean up attached disk
images after completion.
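
A rough sketch of that deferred-detach behavior, as I read the description
above, might look like the following. It is a toy Python model under stated
assumptions: DeferredDetachVbd and all of its attributes are hypothetical
names, states are plain strings, and the cleaned_up flag merely stands in for
the udev-driven cleanup of the disk image.

```python
class DeferredDetachVbd:
    """Hypothetical model: bail out of the detach, leave backend at Closing."""

    def __init__(self, opened_in_guest=True):
        self.backend_state = "Connected"
        self.frontend_state = "Connected"
        self.opened_in_guest = opened_in_guest
        self.serving_io = True
        self.cleaned_up = False

    def request_detach(self):
        """Toolstack detach attempt; True only if it completed right away."""
        self.backend_state = "Closing"
        if self.opened_in_guest:
            # The frontend answers with an error because the disk is busy.
            # The tool bails out, but the backend is left at Closing and
            # keeps serving I/O.
            self.frontend_state = "Closing"
            return False
        self._finish()
        return True

    def guest_umount(self):
        # The guest finally releases the disk; a pending detach completes.
        self.opened_in_guest = False
        if self.backend_state == "Closing":
            self._finish()

    def _finish(self):
        self.frontend_state = "Closed"
        self.backend_state = "Closed"
        self.serving_io = False
        self.cleaned_up = True   # stand-in for the udev cleanup step
```

In this model request_detach() returns False for a busy disk while serving_io
stays True, and the actual teardown only happens once guest_umount() runs.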

> >5. Try to unmount the disk in the guest; umount hangs. From this point on, any
> >I/O to the device hangs.

Yeah, not so good. That's what --force would imply.

> A bogus sequence of operations - an operator in Dom0 shouldn't remove
> a device that is still in use in a guest, except as an exceptional measure
> (and then other bad behavior is to be expected). What's the use case?

The administrative perspective is right, but the use case is still a valid
one. And even if you're coordinating dom0 unplug with guest umount -- there
needs to be a way to handle buggy coordination.

We don't have guest usage indication in the frontend record, so backends
resort to try/error instead.

I think try/error is the way to deal with it. Transaction stuff
necessary to synch a front/back-coordinated disconnect seems over the
top, and we'd probably want compat behavior anyway.

Not sure what xm/xl can or want to do about delayed detaches. If that
sounds undesirable, we probably want to consider a patch to blkback to
abort the shutdown request and flip back to Connected.

So a controller would try in sync, fail in sync, and roll back. So it
won't have to deal with backend detach later. Sounds simpler. I don't
think there's really a particular reason XCP chose to behave the way it
does, except low hanging fruit.
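
For comparison, the try/fail/roll-back variant could be sketched like this.
Again a toy Python model, not a proposal for the actual blkback code:
RollbackVbd and detach_or_report are hypothetical names, and states are plain
strings.

```python
class RollbackVbd:
    """Hypothetical model: abort the shutdown request, flip back to Connected."""

    def __init__(self, opened_in_guest):
        self.backend_state = "Connected"
        self.opened_in_guest = opened_in_guest

    def try_detach(self):
        """Synchronous detach attempt; True on success, False after rollback."""
        self.backend_state = "Closing"
        if self.opened_in_guest:
            # The frontend refused to close: abort and return to Connected,
            # so nothing is left pending for later.
            self.backend_state = "Connected"
            return False
        self.backend_state = "Closed"
        return True


def detach_or_report(vbd):
    """What a controller would see: one call, one definite answer."""
    if vbd.try_detach():
        return "detached"
    return "device busy, still connected"
```

The point of the design is that the caller never has to track a half-finished
detach: after try_detach() the backend is either Closed or back at Connected.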

Not convinced it doesn't need transactions either, because you could see
the tool/blkback interaction race blkback/blkfront driven by guest
umount. But it may yield a simpler UI.

Please correct me where my lack of OSS toolstack understanding
struck. :)

Daniel