Mailing List Archive

[PATCH 2/2] libxl: do not automatically force detach of block devices
From: Paul Durrant <pdurrant@amazon.com>

The manpage for 'xl' documents that guest co-operation is required for a (non-
forced) block-detach operation and that it may consequently fail. Currently,
however, the implementation of generic device removal means that a time-out
of a block-detach is being automatically re-tried with the force flag set
rather than failing. This patch stops such behaviour.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Wei Liu <wl@xen.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
---
tools/libxl/libxl_device.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 0381c5d509..d17ca78848 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -1092,7 +1092,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds,

if (rc == ERROR_TIMEDOUT &&
aodev->action == LIBXL__DEVICE_ACTION_REMOVE &&
- !aodev->force) {
+ !aodev->force &&
+ aodev->dev->kind != LIBXL__DEVICE_KIND_VBD) {
LOGD(DEBUG, aodev->dev->domid, "Timeout reached, initiating forced remove");
aodev->force = 1;
libxl__initiate_device_generic_remove(egc, aodev);
@@ -1103,7 +1104,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds,
LOGD(ERROR, aodev->dev->domid, "unable to %s device with path %s",
libxl__device_action_to_string(aodev->action),
libxl__device_backend_path(gc, aodev->dev));
- goto out;
+ if (!aodev->force)
+ goto out;
}

device_hotplug(egc, aodev);
@@ -1319,7 +1321,8 @@ static void device_hotplug_done(libxl__egc *egc, libxl__ao_device *aodev)
device_hotplug_clean(gc, aodev);

/* Clean xenstore if it's a disconnection */
- if (aodev->action == LIBXL__DEVICE_ACTION_REMOVE) {
+ if (aodev->action == LIBXL__DEVICE_ACTION_REMOVE &&
+ (aodev->force || !aodev->rc)) {
rc = libxl__device_destroy(gc, aodev->dev);
if (!aodev->rc)
aodev->rc = rc;
--
2.20.1
Re: [PATCH 2/2] libxl: do not automatically force detach of block devices [ In reply to ]
On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> The manpage for 'xl' documents that guest co-operation is required for a (non-
> forced) block-detach operation and that it may consequently fail. Currently,
> however, the implementation of generic device removal means that a time-out
> of a block-detach is being automatically re-tried with the force flag set
> rather than failing. This patch stops such behaviour.

Won't this break cleanup on domain shutdown if the guest doesn't close
the devices itself?

I think we need some special-casing on shutdown that keeps the current
behavior on that case.

>
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Ian Jackson <iwj@xenproject.org>
> Cc: Wei Liu <wl@xen.org>
> Cc: Anthony PERARD <anthony.perard@citrix.com>
> ---
> tools/libxl/libxl_device.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
> index 0381c5d509..d17ca78848 100644
> --- a/tools/libxl/libxl_device.c
> +++ b/tools/libxl/libxl_device.c
> @@ -1092,7 +1092,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds,
>
> if (rc == ERROR_TIMEDOUT &&
> aodev->action == LIBXL__DEVICE_ACTION_REMOVE &&
> - !aodev->force) {
> + !aodev->force &&
> + aodev->dev->kind != LIBXL__DEVICE_KIND_VBD) {

Doing this differentiation for block only seems weird, we should treat
all devices equally.

Thanks, Roger.
RE: [PATCH 2/2] libxl: do not automatically force detach of block devices [ In reply to ]
> -----Original Message-----
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: 04 September 2020 09:53
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Ian Jackson
> <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Anthony PERARD <anthony.perard@citrix.com>
> Subject: Re: [PATCH 2/2] libxl: do not automatically force detach of block devices
>
> On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > The manpage for 'xl' documents that guest co-operation is required for a (non-
> > forced) block-detach operation and that it may consequently fail. Currently,
> > however, the implementation of generic device removal means that a time-out
> > of a block-detach is being automatically re-tried with the force flag set
> > rather than failing. This patch stops such behaviour.
>
> Won't this break cleanup on domain shutdown if the guest doesn't close
> the devices itself?
>

I don't think so... AFAIK that's a destroy i.e. it's always forced (since there's no way the guest can co-operate at the point).

> I think we need some special-casing on shutdown that keeps the current
> behavior on that case.
>
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > ---
> > Cc: Ian Jackson <iwj@xenproject.org>
> > Cc: Wei Liu <wl@xen.org>
> > Cc: Anthony PERARD <anthony.perard@citrix.com>
> > ---
> > tools/libxl/libxl_device.c | 9 ++++++---
> > 1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
> > index 0381c5d509..d17ca78848 100644
> > --- a/tools/libxl/libxl_device.c
> > +++ b/tools/libxl/libxl_device.c
> > @@ -1092,7 +1092,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds,
> >
> > if (rc == ERROR_TIMEDOUT &&
> > aodev->action == LIBXL__DEVICE_ACTION_REMOVE &&
> > - !aodev->force) {
> > + !aodev->force &&
> > + aodev->dev->kind != LIBXL__DEVICE_KIND_VBD) {
>
> Doing this differentiation for block only seems weird, we should treat
> all devices equally.
>

That is not how things are documented in the xl manpage though; block-detach is the only one to have a force option. I assume that was deliberate.

Paul

> Thanks, Roger.
Re: [PATCH 2/2] libxl: do not automatically force detach of block devices [ In reply to ]
On Fri, Sep 04, 2020 at 10:47:37AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monné <roger.pau@citrix.com>
> > Sent: 04 September 2020 09:53
> > To: Paul Durrant <paul@xen.org>
> > Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Ian Jackson
> > <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Anthony PERARD <anthony.perard@citrix.com>
> > Subject: Re: [PATCH 2/2] libxl: do not automatically force detach of block devices
> >
> > On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote:
> > > From: Paul Durrant <pdurrant@amazon.com>
> > >
> > > The manpage for 'xl' documents that guest co-operation is required for a (non-
> > > forced) block-detach operation and that it may consequently fail. Currently,
> > > however, the implementation of generic device removal means that a time-out
> > > of a block-detach is being automatically re-tried with the force flag set
> > > rather than failing. This patch stops such behaviour.
> >
> > Won't this break cleanup on domain shutdown if the guest doesn't close
> > the devices itself?
> >
>
> I don't think so... AFAIK that's a destroy i.e. it's always forced (since there's no way the guest can co-operate at the point).

Urgh, yes, sorry.

> > I think we need some special-casing on shutdown that keeps the current
> > behavior on that case.
> >
> > >
> > > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > > ---
> > > Cc: Ian Jackson <iwj@xenproject.org>
> > > Cc: Wei Liu <wl@xen.org>
> > > Cc: Anthony PERARD <anthony.perard@citrix.com>
> > > ---
> > > tools/libxl/libxl_device.c | 9 ++++++---
> > > 1 file changed, 6 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
> > > index 0381c5d509..d17ca78848 100644
> > > --- a/tools/libxl/libxl_device.c
> > > +++ b/tools/libxl/libxl_device.c
> > > @@ -1092,7 +1092,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds,
> > >
> > > if (rc == ERROR_TIMEDOUT &&
> > > aodev->action == LIBXL__DEVICE_ACTION_REMOVE &&
> > > - !aodev->force) {
> > > + !aodev->force &&
> > > + aodev->dev->kind != LIBXL__DEVICE_KIND_VBD) {
> >
> > Doing this differentiation for block only seems weird, we should treat
> > all devices equally.
> >
>
> That is not how things are documented in the xl manpage though; block-detach is the only one to have a force option. I assume that was deliberate.

Oh right, that's weird. I guess removing a block device without guest
cooperation will likely result in a guest crash, while the same it's
not true for other devices.

Thanks, Roger.
Re: [PATCH 2/2] libxl: do not automatically force detach of block devices [ In reply to ]
On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> The manpage for 'xl' documents that guest co-operation is required for a (non-
> forced) block-detach operation and that it may consequently fail. Currently,
> however, the implementation of generic device removal means that a time-out
> of a block-detach is being automatically re-tried with the force flag set
> rather than failing. This patch stops such behaviour.
>
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

I'm two-minded here.

On the one hand, special-casing VBD in libxl to conform to xl's
behaviour seems wrong to me.

On the other hand, if we want to treat all devices the same in libxl,
libxl should drop its force-removal behaviour all together, and the
retry behaviour would need to be implemented in xl for all devices
excepts for VBD. This requires a bit of code churn and, more
importantly, changes how those device removal APIs behave. In the end
this effort may not worth it.

If we go with the patch here, we should document this special case on
libxl level somehow.

Anthony and Ian, do you have any opinion on this?

Wei.

> ---
> Cc: Ian Jackson <iwj@xenproject.org>
> Cc: Wei Liu <wl@xen.org>
> Cc: Anthony PERARD <anthony.perard@citrix.com>
> ---
> tools/libxl/libxl_device.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
> index 0381c5d509..d17ca78848 100644
> --- a/tools/libxl/libxl_device.c
> +++ b/tools/libxl/libxl_device.c
> @@ -1092,7 +1092,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds,
>
> if (rc == ERROR_TIMEDOUT &&
> aodev->action == LIBXL__DEVICE_ACTION_REMOVE &&
> - !aodev->force) {
> + !aodev->force &&
> + aodev->dev->kind != LIBXL__DEVICE_KIND_VBD) {
> LOGD(DEBUG, aodev->dev->domid, "Timeout reached, initiating forced remove");
> aodev->force = 1;
> libxl__initiate_device_generic_remove(egc, aodev);
> @@ -1103,7 +1104,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds,
> LOGD(ERROR, aodev->dev->domid, "unable to %s device with path %s",
> libxl__device_action_to_string(aodev->action),
> libxl__device_backend_path(gc, aodev->dev));
> - goto out;
> + if (!aodev->force)
> + goto out;
> }
>
> device_hotplug(egc, aodev);
> @@ -1319,7 +1321,8 @@ static void device_hotplug_done(libxl__egc *egc, libxl__ao_device *aodev)
> device_hotplug_clean(gc, aodev);
>
> /* Clean xenstore if it's a disconnection */
> - if (aodev->action == LIBXL__DEVICE_ACTION_REMOVE) {
> + if (aodev->action == LIBXL__DEVICE_ACTION_REMOVE &&
> + (aodev->force || !aodev->rc)) {
> rc = libxl__device_destroy(gc, aodev->dev);
> if (!aodev->rc)
> aodev->rc = rc;
> --
> 2.20.1
>
Re: [PATCH 2/2] libxl: do not automatically force detach of block devices [ In reply to ]
On Tue, Sep 08, 2020 at 02:19:03PM +0000, Wei Liu wrote:
> On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > The manpage for 'xl' documents that guest co-operation is required for a (non-
> > forced) block-detach operation and that it may consequently fail. Currently,
> > however, the implementation of generic device removal means that a time-out
> > of a block-detach is being automatically re-tried with the force flag set
> > rather than failing. This patch stops such behaviour.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>
> I'm two-minded here.
>
> On the one hand, special-casing VBD in libxl to conform to xl's
> behaviour seems wrong to me.
>
> On the other hand, if we want to treat all devices the same in libxl,
> libxl should drop its force-removal behaviour all together, and the
> retry behaviour would need to be implemented in xl for all devices
> excepts for VBD. This requires a bit of code churn and, more
> importantly, changes how those device removal APIs behave. In the end
> this effort may not worth it.

I would be worried about those changes, as we would likely have to
also change libvirt or any other downstreams?

> If we go with the patch here, we should document this special case on
> libxl level somehow.
>
> Anthony and Ian, do you have any opinion on this?

Maybe a new function should be introduced instead, that attempts to
remove a device gracefully and fail otherwise?

Then none of the current APIs would change, and xl could use this new
function to handle VBD removal?

Roger.
Re: [PATCH 2/2] libxl: do not automatically force detach of block devices [ In reply to ]
On Mon, Sep 14, 2020 at 12:41:09PM +0200, Roger Pau Monn? wrote:
> On Tue, Sep 08, 2020 at 02:19:03PM +0000, Wei Liu wrote:
> > On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote:
> > > From: Paul Durrant <pdurrant@amazon.com>
> > >
> > > The manpage for 'xl' documents that guest co-operation is required for a (non-
> > > forced) block-detach operation and that it may consequently fail. Currently,
> > > however, the implementation of generic device removal means that a time-out
> > > of a block-detach is being automatically re-tried with the force flag set
> > > rather than failing. This patch stops such behaviour.
> > >
> > > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> >
> > I'm two-minded here.
> >
> > On the one hand, special-casing VBD in libxl to conform to xl's
> > behaviour seems wrong to me.
> >
> > On the other hand, if we want to treat all devices the same in libxl,
> > libxl should drop its force-removal behaviour all together, and the
> > retry behaviour would need to be implemented in xl for all devices
> > excepts for VBD. This requires a bit of code churn and, more
> > importantly, changes how those device removal APIs behave. In the end
> > this effort may not worth it.
>
> I would be worried about those changes, as we would likely have to
> also change libvirt or any other downstreams?
>
> > If we go with the patch here, we should document this special case on
> > libxl level somehow.
> >
> > Anthony and Ian, do you have any opinion on this?
>
> Maybe a new function should be introduced instead, that attempts to
> remove a device gracefully and fail otherwise?
>
> Then none of the current APIs would change, and xl could use this new
> function to handle VBD removal?

This sounds fine to me.

Wei.

>
> Roger.
Re: [PATCH 2/2] libxl: do not automatically force detach of block devices [ In reply to ]
Wei Liu writes ("Re: [PATCH 2/2] libxl: do not automatically force detach of block devices"):
> On Mon, Sep 14, 2020 at 12:41:09PM +0200, Roger Pau Monn? wrote:
> > Maybe a new function should be introduced instead, that attempts to
> > remove a device gracefully and fail otherwise?
> >
> > Then none of the current APIs would change, and xl could use this new
> > function to handle VBD removal?
>
> This sounds fine to me.

I agree.

If there is going to be different default policy for different devices
it ought to be in xl, not libxl, but frankly I think this is an
anomaly.

I suggest we have a new entrypoint that specifies the fallback
behaviour explicitly. ISTM that there are three possible behaviours:
- fail if the guest does not cooperate
- fall back to force remove
- rip the device out immediately

The last of these would be useful only in rare situations. IDK if the
length of the timeout (for the first two cases) ought to be a
parameter too.

Ian.
RE: [PATCH 2/2] libxl: do not automatically force detach of block devices [ In reply to ]
> -----Original Message-----
> From: Ian Jackson <iwj@xenproject.org>
> Sent: 28 September 2020 14:44
> To: Wei Liu <wl@xen.org>
> Cc: Roger Pau Monn? <roger.pau@citrix.com>; Paul Durrant <paul@xen.org>; xen-
> devel@lists.xenproject.org; Durrant, Paul <pdurrant@amazon.co.uk>; Anthony PERARD
> <anthony.perard@citrix.com>
> Subject: RE: [EXTERNAL] [PATCH 2/2] libxl: do not automatically force detach of block devices
>
> CAUTION: This email originated from outside of the organization. Do not click links or open
> attachments unless you can confirm the sender and know the content is safe.
>
>
>
> Wei Liu writes ("Re: [PATCH 2/2] libxl: do not automatically force detach of block devices"):
> > On Mon, Sep 14, 2020 at 12:41:09PM +0200, Roger Pau Monn? wrote:
> > > Maybe a new function should be introduced instead, that attempts to
> > > remove a device gracefully and fail otherwise?
> > >
> > > Then none of the current APIs would change, and xl could use this new
> > > function to handle VBD removal?
> >
> > This sounds fine to me.
>
> I agree.
>
> If there is going to be different default policy for different devices
> it ought to be in xl, not libxl, but frankly I think this is an
> anomaly.
>
> I suggest we have a new entrypoint that specifies the fallback
> behaviour explicitly.

Indeed. See v2 of my series, posted a couple of weeks ago, specifically:

https://lists.xenproject.org/archives/html/xen-devel/2020-09/msg01029.html

> ISTM that there are three possible behaviours:
> - fail if the guest does not cooperate

That is the newly introduced 'safe_remove'

> - fall back to force remove

That is the existing 'remove'

> - rip the device out immediately

That is the existing 'destroy'

> The last of these would be useful only in rare situations. IDK if the
> length of the timeout (for the first two cases) ought to be a
> parameter too.
>

I think that would be a worthy enhancement but above and beyond the aim of this series.

Paul