libxl: error when destroying domain on NetBSD
Hello,

When destroying a guest (xl destroy <domid>) on NetBSD I get the
following errors in the log file:

Waiting for domain test (domid 1) to die [pid 11675]
Domain 1 is dead
Unknown shutdown reason code 255. Destroying domain.
Action for shutdown reason code 255 is destroy
Domain 1 needs to be cleaned up: destroying the domain
do_domctl failed: errno 3
libxl: error: libxl.c:762:libxl_domain_destroy: xc_domain_pause failed for 1
libxl: error: libxl_dom.c:658:userdata_path: unable to find domain info for domain 1: No such file or directory
do_domctl failed: errno 3
libxl: error: libxl.c:787:libxl_domain_destroy: xc_domain_destroy failed for 1
Done. Exiting now
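
For reference, the "errno 3" in the do_domctl lines can be decoded with a
few lines of Python. This is only a lookup sketch, assuming errno 3 has
its usual POSIX meaning on the dom0; reading it as "the domain no longer
exists" is a guess that fits the earlier "Domain 1 is dead" line, not
something the log confirms.

```python
import errno
import os

# The log shows "do_domctl failed: errno 3"; look up what that number means.
code = 3
name = errno.errorcode[code]   # symbolic name for the number
message = os.strerror(code)    # the platform's human-readable text

print(f"errno {code} = {name}: {message}")
```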

The domain is destroyed, but xenstore is not cleaned up properly, and
the hotplug scripts are not executed because the devices don't reach
state 6 until xl exits. From reading the libxl code, I guess the
following procedure is used to destroy the domain:

* Destroy PCI devices
* Pause domain
* Destroy device model (not applicable here, the guest is a PV with no dm)
* Destroy devices (here xl waits for the devices to reach state '6', but
they never get there; the state only changes to 6 once xl exits)
* Clean xenstore
* Destroy the domain
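
The order guessed above can be written down as a small model. This is a
sketch only: the function, the step strings and the device names are
made up for illustration and are not libxl code. It shows where a
device that never reaches state 6 holds things up.

```python
# Hypothetical model of the teardown order guessed above.
# None of these names are libxl functions.
CLOSED = 6  # xenbus "Closed" state, the '6' mentioned in the list


def destroy_sequence(has_device_model, device_states):
    """Return the steps in order, plus the devices that never closed."""
    steps = ["destroy PCI devices", "pause domain"]
    if has_device_model:
        steps.append("destroy device model")
    steps.append("destroy devices")
    # Devices still in any other state are the ones xl would wait on.
    stuck = [dev for dev, state in device_states.items() if state != CLOSED]
    steps += ["clean xenstore", "destroy domain"]
    return steps, stuck


# A PV guest with no device model; the device names are invented.
steps, stuck = destroy_sequence(False, {"vbd-a": 4, "vif-a": 6})
print(steps)
print("never reached state 6:", stuck)
```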

The 'pause' ctl call fails here, but I've tried to pause a domain
using 'xl pause <domid>' and it works fine. BTW, I'm using
xen-unstable, and the guest is a Debian 6.0.3 PV. Any help on what
might be happening here is welcome.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Re: libxl: error when destroying domain on NetBSD
Also, libxl_domain_destroy is called twice. The first call comes from
destroy_domain (xl_cmdimpl.c) and the second from handle_domain_death.
The errors shown in the log above are from the second call. I don't
know if this is the normal behavior, but it seems quite strange that
libxl_domain_destroy is called twice.

Re: libxl: error when destroying domain on NetBSD
On Fri, 2011-12-02 at 09:45 +0000, Roger Pau Monné wrote:
> Also, libxl_domain_destroy is called twice, the first time it is
> called from destroy_domain (xl_cmdimpl.c), and the second time it is
> called from handle_domain_death, the errors shown on the log above are
> from the second call, the one that comes from handle_domain_death.
> Don't know if this is the normal behavior, but it seems quite strange
> that libxl_domain_destroy is called twice.

It's normal I think. handle_domain_death() is there to deal with
graceful shutdown and reboot scenarios e.g. to clear up after the domain
and restart as necessary.

xl destroy is explicitly the command which shoots the domain in the head
and so it also calls destroy. If you want graceful you should use "xl
shutdown".

Although handle_domain_death also picks up on the destroy it shouldn't
be doing too much in that case since the interesting work has already
happened. The interesting logs will be the xl -vvv destroy ones I think.

Ian.



Re: libxl: error when destroying domain on NetBSD
2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
> It's normal I think. handle_domain_death() is there to deal with
> graceful shutdown and reboot scenarios e.g. to clear up after the domain
> and restart as necessary.
>
> xl destroy is explicitly the command which shoots the domain in the head
> and so it also calls destroy. If you want graceful you should use "xl
> shutdown".
>
> Although handle_domain_death also picks up on the destroy it shouldn't
> be doing too much in that case since the interesting work has already
> happened. The interesting logs will be the xl -vvv destroy ones I think.

Devices don't get disconnected, or at least they don't get to state
'6' until xl exits. Maybe I should modify libxl_domain_destroy to
search xenstore and manually execute the hotplug scripts for
devices that are still connected after calling
'libxl__devices_destroy'. Using 'xl -vvv destroy <domid>' prints no
debug messages other than:

xc: debug: hypercall buffer: total allocations:131 total releases:131
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:128 misses:2 toobig:1

Thanks, Roger.

Re: libxl: error when destroying domain on NetBSD
On Fri, 2011-12-02 at 10:10 +0000, Roger Pau Monné wrote:
> 2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
> > It's normal I think. handle_domain_death() is there to deal with
> > graceful shutdown and reboot scenarios e.g. to clear up after the domain
> > and restart as necessary.
> >
> > xl destroy is explicitly the command which shoots the domain in the head
> > and so it also calls destroy. If you want graceful you should use "xl
> > shutdown".
> >
> > Although handle_domain_death also picks up on the destroy it shouldn't
> > be doing too much in that case since the interesting work has already
> > happened. The interesting logs will be the xl -vvv destroy ones I think.
>
> Devices don't get disconnected, or at least they don't get to state
> '6' until xl exits, maybe I should modify libxl_domain_destroy to
> search xenstore and try to manually execute hotplug scripts for
> devices that are still connected after calling
> 'libxl__devices_destroy'.

libxl_domain_destroy should be called with force = 1 in the main_destroy
case, I suspect. Does that cause the scripts to be run?

> Using 'xl -vvv destroy <domid>' doesn't
> print any debug message, only:
>
> xc: debug: hypercall buffer: total allocations:131 total releases:131
> xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
> xc: debug: hypercall buffer: cache current size:2
> xc: debug: hypercall buffer: cache hits:128 misses:2 toobig:1
>
> Thanks, Roger.



Re: libxl: error when destroying domain on NetBSD
2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
> libxl_domain_destroy should be called with force = 1 in the main_destroy
> case, I suspect. Does that cause the scripts to be run?

Well, with force = 1 the hotplug scripts are executed, but the devices
are still busy and cannot be disconnected (mainly vnd). It also crashed
the server, but that's a problem with NetBSD's buggy vnd driver. Looking
at the execution order in libxl_domain_destroy, shouldn't we first
destroy the domain (xc_domain_destroy) and then remove the devices?

Re: libxl: error when destroying domain on NetBSD
On Fri, 2011-12-02 at 10:29 +0000, Roger Pau Monné wrote:
> 2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
> > libxl_domain_destroy should be called with force = 1 in the main_destroy
> > case, I suspect. Does that cause the scripts to be run?
>
> Well, with force = 1 hotplug scripts are executed, but devices are
> still busy and they cannot be disconnected (mainly vnd). Also crashed
> the server, but that's NetBSD buggy vnd driver problem. Seeing the
> execution order in libxl_domain_destroy, shouldn't we first destroy
> the domain (xc_domain_destroy) and then remove the devices?

In the force case, yes, I expect so.

In the non-force case you want to let the guest shut down its devices
gracefully so you would do devices first.

However I'm not entirely sure that a non-forced libxl_domain_destroy
makes much sense. The callsites are:

     * handle_domain_death: The guest has already shut down at this
       point. Nothing graceful can happen.
     * create_domain: We have failed to start the guest, no chance of
       graceful shutdown.
     * destroy_domain: Semantics are explicitly the force case.
     * save_domain: Domain has already suspended. There's nothing which
       can be done gracefully.
     * migrate_domain: Already forced, domain is gone already, no
       chance of a graceful shutdown.
     * migrate_receive: Already forced, we have failed to receive the
       domain, no possibility of a graceful shutdown.
     * libxl_domain_create, on the failure path so no need for a
       graceful option.
     * libxl__destroy_device_model. Maybe this should be doing a
       graceful shutdown but in that case it should either be calling
       libxl_domain_shutdown or writing the qemu-dm control node and
       waiting, at which point after some timeout perhaps a forced
       shutdown would be appropriate.

So it seems to me that the non-forced option in libxl_domain_destroy can
be removed and we should just shoot the domain and then forcibly
tear down the backends, running scripts as necessary.

The only wrinkle is the stub device-model case but really that's already
a special domain and should be treated as such.
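
The callsite argument above can be tabulated to make the conclusion easy
to check. This is a sketch, not libxl code: the table just restates the
list, and teardown_order and its step strings are made-up names that
summarise the two orderings discussed in this thread.

```python
# Callsites of libxl_domain_destroy as listed above. False means a
# graceful (non-forced) teardown can no longer achieve anything; None
# marks the one case the list leaves open.
CALLSITES = {
    "handle_domain_death": False,
    "create_domain": False,
    "destroy_domain": False,
    "save_domain": False,
    "migrate_domain": False,
    "migrate_receive": False,
    "libxl_domain_create": False,
    "libxl__destroy_device_model": None,
}


def teardown_order(force):
    """Hypothetical summary of the two orderings discussed in the thread."""
    if force:
        return ["destroy domain", "tear down backends", "run hotplug scripts"]
    return ["let guest close devices", "tear down backends", "destroy domain"]


needs_graceful = [name for name, ok in CALLSITES.items() if ok]
open_cases = [name for name, ok in CALLSITES.items() if ok is None]

print("callsites that need the non-forced path:", needs_graceful)
print("callsites still open to debate:", open_cases)
print("forced order:", teardown_order(True))
```

With no callsite clearly needing the non-forced path, only the
device-model case is left to decide separately.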

Does that make sense to anyone else?

Ian.


Re: libxl: error when destroying domain on NetBSD
2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
> On Fri, 2011-12-02 at 10:29 +0000, Roger Pau Monné wrote:
>> 2011/12/2 Ian Campbell <Ian.Campbell@citrix.com>:
>> > libxl_domain_destroy should be called with force = 1 in the main_destroy
>> > case, I suspect. Does that cause the scripts to be run?
>>
>> Well, with force = 1 hotplug scripts are executed, but devices are
>> still busy and they cannot be disconnected (mainly vnd). Also crashed
>> the server, but that's NetBSD buggy vnd driver problem. Seeing the
>> execution order in libxl_domain_destroy, shouldn't we first destroy
>> the domain (xc_domain_destroy) and then remove the devices?
>
> In the force case, yes, I expect so.
>
> In the non-force case you want to let the guest shutdown its devices
> gracefully so you would do devices first.
>
> However I'm not entirely sure that a non-forced libxl_domain_destroy
> makes much sense. The callsites are:
>
>      * handle_domain_death: The guest has already shutdown at this
>        point. Nothing graceful can happen.
>      * create_domain: We have failed to start the guest, no chance of
>        graceful shutdown.
>      * destroy_domain: Semantics are explicitly the force case.
>      * save_domain: Domain has already suspended. There's nothing which
>        can be done gracefully.
>      * migrate_domain: Already forced, domain is gone already, no
>        chance of a graceful shutdown.
>      * migrate_receive: Already forced, we have failed to receive the
>        domain, no possibility of a graceful shutdown.
>      * libxl_domain_create, on the failure path so no need for a
>        graceful option.
>      * libxl__destroy_device_model. Maybe this should be doing a
>        graceful shutdown but in that case it should either be calling
>        libxl_domain_shutdown or writing the qemu-dm control node and
>        waiting, at which point after some timeout perhaps a forced
>        shutdown would be appropriate.
>
> So it seems to me that the non-forced option in libxl_domain_destroy can
> be removed and we should just shoot the domain and then forcibly
> teardown the backends, running script as necessary.
>
> The only wrinkle is the stub device-model case but really that's already
> a special domain and should be treated as such.
>
> Does that make sense to anyone else?
>
> Ian.
>

Well, I think I found the underlying problem that was preventing
NetBSD from correctly detaching vnd devices when destroying a domain:
the frontend state needs to be manually set to 6
(/local/domain/<domid>/device/vbd/<devid>/state, and the same for vif
to be more "correct") so that the kernel closes the device and it can
then be correctly unmounted.
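
As a rough illustration of which nodes are involved, the sketch below
only builds the xenstore paths and the value to write; it does not touch
xenstore. The helper names and the example device ids are made up, and
the real change would be done inside libxl. State 6 is the xenbus
"Closed" state.

```python
XENBUS_STATE_CLOSED = 6  # "Closed" in the xenbus state machine


def frontend_state_path(domid, dev_type, devid):
    """Path of a frontend state node, following the layout quoted above."""
    return f"/local/domain/{domid}/device/{dev_type}/{devid}/state"


def closing_writes(domid, devices):
    """Build (path, value) pairs; devices is a list of (type, devid)."""
    return [
        (frontend_state_path(domid, dev_type, devid), str(XENBUS_STATE_CLOSED))
        for dev_type, devid in devices
    ]


# The device ids here are invented for the example.
writes = closing_writes(1, [("vbd", 51712), ("vif", 0)])
for path, value in writes:
    print(path, "=", value)
```

Each pair could then be written by hand with the xenstore-write tool
(xenstore-write <path> <value>) to test the theory before patching.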


I will prepare a patch (or a series) to address this, and change
libxl_domain_destroy to always use force when called.

Thanks, Roger.
