Mailing List Archive

Xen 4.2.1 live migration with qemu device model
In this email message:
http://www.gossamer-threads.com/lists/xen/devel/256737
a patch to libxl_domain_suspend is presented to enable live migration
with the qemu device model in xen-unstable. This patch has *NOT*
been taken into the 4.2.1 tree (as far as I can see).

In 4.2.0, applying this patch and attempting a live migration using
the qemu device model (using xl) produces a segfault due to
uninitialised variables.

Should I expect live migration of qemu-dm VMs to work under 4.2.1?
If so, should the patch (or a modification thereof) to remove
the check from libxl_domain_suspend be applied to 4.2.1-testing
or is there more to do?

I am very happy to commit test resources to this.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeffff700 (LWP 5995)]
0x00007ffff5a0862a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff5a0862a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff6d4e970 in libxl__domain_suspend_common_switch_qemu_logdirty (domid=<optimized out>, enable=<optimized out>, user=0x7ffff00024e8) at libxl_dom.c:728
#2 0x00007ffff6d5c1ae in libxl__srm_callout_received_save (msg=0x7fffefffe41a " error", len=<optimized out>, user=0x7ffff00024e8) at _libxl_save_msgs_callout.c:162
#3 0x00007ffff6d5b736 in helper_stdout_readable (egc=0x7fffefffe5a0, ev=0x7ffff0002560, fd=38, events=<optimized out>, revents=<optimized out>) at libxl_save_callout.c:283
#4 0x00007ffff6d601f1 in afterpoll_internal (egc=0x7fffefffe5a0, poller=0x7ffff00028c0, nfds=4, fds=0x7ffff00048b0, now=...) at libxl_event.c:948
#5 0x00007ffff6d604db in eventloop_iteration (egc=0x7fffefffe5a0, poller=0x7ffff00028c0) at libxl_event.c:1368
#6 0x00007ffff6d616b3 in libxl__ao_inprogress (ao=0x7ffff0001d40, file=<optimized out>, line=<optimized out>, func=<optimized out>) at libxl_event.c:1614
#7 0x00007ffff6d3ab75 in libxl_domain_suspend (ctx=<optimized out>, domid=1, fd=10, flags=<optimized out>, ao_how=<optimized out>) at libxl.c:796
#8 0x000000000043677e in migrate_domain_send (ctx=0x7ffff0008860, domid=1, fd=10) at hypervisor/xen_libxl.c:587
#9 0x000000000043698a in live_migrate_send (hyperconn=0x7ffff0001c70, server=0x7ffff0001cb0, node_ip=0x7ffff00041e0 "10.157.128.20", fd=10) at hypervisor/xen_libxl.c:647
#10 0x0000000000422a70 in migrate_server_action (request=0x7ffff0002980) at action/node_action.c:1287
#11 0x00000000004240c1 in runAction (socket_fd=8) at action/handleaction.c:138
#12 0x00000000004179bd in runcomm (socket=0x8) at xvpagent.c:253
#13 0x0000000000427502 in trackedthread_run (arg=0x66df20) at util/util.c:179
#14 0x00007ffff5c9ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007ffff59ca4bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()


--
Alex Bligh

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: Xen 4.2.1 live migration with qemu device model
This particular segv seems to be because, in
libxl__domain_suspend_common_switch_qemu_logdirty
in libxl_dom.c, the variables 'got' and 'got_ret' do not appear to be initialised
to NULL ('got' certainly should be; 'got_ret' should be if there is any possibility
of libxl__xs_read_checked not writing to it, which it doesn't seem to be).
Code inspection suggests this issue is still present in 4.2.1, hence my wondering
whether other changes need bringing in from unstable.


--On 11 December 2012 11:45:42 +0000 Alex Bligh <alex@alex.org.uk> wrote:

> [backtrace snipped; identical to the one above]



--
Alex Bligh

Re: Xen 4.2.1 live migration with qemu device model
On Tue, 2012-12-11 at 11:45 +0000, Alex Bligh wrote:
> In this email message:
> http://www.gossamer-threads.com/lists/xen/devel/256737
> a patch to libxl_domain_suspend is presented to enable live migrate
> with the qemu device model in xen-unstable. This patch has *NOT*
> been taken into the 4.2.1 tree (as far as I can see).
>
> In 4.2.0, adding this patch and attempting a live migrate using
> the qemu device model (using xl) produces a seg fault due to
> uninitialised variables.

Really using xl? The stack trace suggests otherwise.

> Should I expect live-migrate of qemu-dm vms to work under 4.2.1?

Do you mean VMs using the upstream "qemu-xen" device model, as opposed
to the default "qemu-xen-traditional" model?

Migration of HVM guests in 4.2.x is only supported with the
qemu-xen-traditional device model and AFAIK there is no plan to backport
this support to 4.2.

It still shouldn't crash though. I'm not sure how it got this far since
libxl on 4.2 explicitly checks the DM version before attempting to
migrate and will refuse to even try with a qemu-xen DM.
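To make the guard described above concrete, here is a minimal sketch of a save-time device-model check. The enum, struct and function names are hypothetical stand-ins, not the libxl API; the sketch only shows the shape of a check that refuses HVM migration up front unless the device model is qemu-xen-traditional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two device models discussed in this thread. */
typedef enum {
    DM_QEMU_XEN_TRADITIONAL, /* HVM migration supported in 4.2.x */
    DM_QEMU_XEN              /* upstream qemu: HVM migration not supported in 4.2.x */
} dm_version;

/* Hypothetical summary of the domain being suspended. */
typedef struct {
    bool is_hvm;
    dm_version dm;
} domain_info;

/* Returns 0 if suspend/migrate may proceed, -1 if it must be refused.
 * Only HVM guests are gated, matching the restriction stated above. */
static int check_dm_supports_migration(const domain_info *info)
{
    assert(info != NULL);
    if (info->is_hvm && info->dm != DM_QEMU_XEN_TRADITIONAL)
        return -1; /* refuse up front rather than fail mid-save */
    return 0;
}
```

Removing a guard like this does not add migration support; it only lets the save path run against a device model it was not written for.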

Ian.

> If so, should the patch (or a modification thereof) to remove
> the check from libxl_domain_suspend be applied to 4.2.1-testing
> or is there more to do?
>
> I am very happy to commit test resources to this.
>
> [backtrace snipped; identical to the one above]



Re: Xen 4.2.1 live migration with qemu device model
On Tue, 2012-12-11 at 12:05 +0000, Alex Bligh wrote:
> This particular segv seems to be because in
> libxl__domain_suspend_common_switch_qemu_logdirty
> in libxl_dom.c the variables 'got', and 'got_ret' do not appear to be initialised
> to NULL (got certainly should be, got_ret should be if there is any possibility
> of libxl__xs_read_checked not writing to got_ret which it doesn't seem to be).
> Code inspection suggests this issue is still there in 4.2.1, hence my wondering
> whether other stuff needs bringing in from unstable.

libxl__xs_read_checked will always either initialise the variable
(perhaps to NULL) or return an error. On both callsites we check for
error and "goto out".

I think the crash is because the code uses got_ret without checking if
it was NULL, which can happen if the path is not present. Ian (J) does
that make sense as something which is allowed to happen?
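The contract described above can be sketched with a toy in-memory store standing in for xenstore. `fake_node`, `fake_store` and `store_read_checked` are hypothetical names, not libxl's API. The point is that a missing path comes back as success with the output set to NULL, so checking the return code alone, as both callsites do, does not rule out a NULL string reaching `strcmp`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* One node in a toy stand-in for xenstore (hypothetical, not libxl code). */
typedef struct {
    const char *path;
    const char *value;
    int read_error; /* nonzero: reading this node fails, e.g. EACCES */
} fake_node;

typedef struct {
    const fake_node *nodes;
    size_t count;
} fake_store;

/* Sketch of the "checked read" contract:
 *   path present         -> return 0,  *out = value
 *   path absent (ENOENT) -> return 0,  *out = NULL
 *   any other failure    -> return -1, *out left untouched */
static int store_read_checked(const fake_store *st, const char *path,
                              const char **out)
{
    assert(st != NULL && path != NULL && out != NULL);
    for (size_t i = 0; i < st->count; i++) {
        const fake_node *n = &st->nodes[i];
        if (strcmp(n->path, path) != 0)
            continue;
        if (n->read_error != 0)
            return -1;  /* caller checks rc and does "goto out" */
        *out = n->value;
        return 0;
    }
    *out = NULL;        /* a missing path is not an error... */
    return 0;           /* ...so the caller must still test *out */
}
```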

As I said in my earlier mail, I'm not sure why you are getting here at all,
though.

Ian.


>
> --On 11 December 2012 11:45:42 +0000 Alex Bligh <alex@alex.org.uk> wrote:
>
> > [backtrace snipped; identical to the one above]



Re: Xen 4.2.1 live migration with qemu device model
Ian,

(two messages in one)

>> In 4.2.0, adding this patch and attempting a live migrate using
>> the qemu device model (using xl) produces a seg fault due to
>> uninitialised variables.
>
> Really using xl? The stack trace suggests otherwise.

Sorry, libxl.

>> Should I expect live-migrate of qemu-dm vms to work under 4.2.1?
>
> Do you mean VMs using the upstream "qemu-xen" device model, as opposed
> to the default "qemu-xen-traditional" model?

Yes, using upstream qemu-xen.

> Migration of HVM guests in 4.2.x is only supported with the
> qemu-xen-traditional device model and AFAIK there is no plan to backport
> this support to 4.2.

Ah. What would be involved in a backport? We use HVM guests under 4.2 and
need qemu-xen for various reasons.

> It still shouldn't crash though. I'm not sure how it got this far since
> libxl on 4.2 explicitly checks the DM version before attempting to
> migrate and will refuse to even try with a qemu-xen DM.

We had removed the check (per the pointer to the mail message I sent)
for the qemu-xen model.

> libxl__xs_read_checked will always either initialise the variable
> (perhaps to NULL) or return an error. On both callsites we check for
> error and "goto out".

It returns NULL if the error is not ENOENT, I think.

> I think the crash is because the code uses got_ret without checking if
> it was NULL, which can happen if the path is not present. Ian (J) does
> that make sense as something which is allowed to happen?

gdb showed one pointer was NULL and the other pointed to some rubbish
(the latter is confusing).

Alex



>
> Ian.
>
>> If so, should the patch (or a modification thereof) to remove
>> the check from libxl_domain_suspend be applied to 4.2.1-testing
>> or is there more to do?
>>
>> I am very happy to commit test resources to this.
>>
>> [backtrace snipped; identical to the one above]



--
Alex Bligh

Re: Xen 4.2.1 live migration with qemu device model
On Tue, 2012-12-11 at 15:07 +0000, Alex Bligh wrote:
> > Migration of HVM guests in 4.2.x is only supported with the
> > qemu-xen-traditional device model and AFAIK there is no plan to backport
> > this support to 4.2.
>
> Ah. What would be involved in a backport? We use HVM guests under 4.2 and
> need qemu-xen for various reasons.

AFAIK it's a pretty big qemu-side backport, plus you need the whole libxl
series, not just the final patch that you linked to.

I don't think this is a candidate for 4.2.x. You could try doing it
locally though I guess.

> > It still shouldn't crash though. I'm not sure how it got this far since
> > libxl on 4.2 explicitly checks the DM version before attempting to
> > migrate and will refuse to even try with a qemu-xen DM.
>
> We had removed the check (per the pointer to the mail message I sent)
> for the qemu-xen model.

Oh well, then all bets are off really.

Ian.


Re: Xen 4.2.1 live migration with qemu device model
On Tue, 11 Dec 2012, Ian Campbell wrote:
> On Tue, 2012-12-11 at 15:07 +0000, Alex Bligh wrote:
> > > Migration of HVM guests in 4.2.x is only supported with the
> > > qemu-xen-traditional device model and AFAIK there is no plan to backport
> > > this support to 4.2.
> >
> > Ah. What would be involved in a backport? We use HVM guests under 4.2 and
> > need qemu-xen for various reasons.
>
> AFAIK it's a pretty big qemu-side backport, plus you need the whole libxl
> series not just the final patch that you linked to.
>
> I don't think this is a candidate for 4.2.x. You could try doing it
> locally though I guess.

This is the patch series that needs to be backported in QEMU:
http://marc.info/?l=qemu-devel&m=134920288412400&w=2

And this is the libxl counterpart:
http://marc.info/?l=xen-devel&m=134944750724252

I would be OK with backporting the QEMU side, but I'll leave the
decision on the libxl side up to you.

Re: Xen 4.2.1 live migration with qemu device model
Stefano, Ian,

--On 11 December 2012 15:26:25 +0000 Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:

> This is the patch series that needs to be backported in QEMU:
> http://marc.info/?l=qemu-devel&m=134920288412400&w=2
>
> And this is the libxl counterpart:
> http://marc.info/?l=xen-devel&m=134944750724252
>
> I would be OK with backporting the QEMU side,

That would be great and very useful (even if it doesn't ultimately make 4.2.x).

> but I'll leave the
> decision on the libxl side up to you.

I'd already planned to try backporting this patchset. If it goes in reasonably cleanly,
it's probably within my competence, and I'm happy to test etc.

We need qemu-xen for the better snapshotting capability on qcow2 (rebase
in particular), and not having live-migrate on HVM is a bit of a PITA.

--
Alex Bligh

Re: Xen 4.2.1 live migration with qemu device model
On Tue, 11 Dec 2012, Alex Bligh wrote:
> Stefano, Ian,
>
> --On 11 December 2012 15:26:25 +0000 Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>
> > This is the patch series that needs to be backported in QEMU:
> > http://marc.info/?l=qemu-devel&m=134920288412400&w=2
> >
> > And this is the libxl counterpart:
> > http://marc.info/?l=xen-devel&m=134944750724252
> >
> > I would be OK with backporting the QEMU side,
>
> That would be great and very useful (even if it doesn't ultimately make 4.2.x)
>
> > but I'll leave the
> > decision on the libxl side up to you.
>
> I'd already planned to try backporting this patchset. If it goes in reasonably cleanly,
> it's probably within my competence, and I'm happy to test etc.
>
> We need qemu-xen for the better snapshotting capability on qcow2 (rebase
> in particular), and not having live-migrate on HVM is a bit of a PITA.

It would be great if you could implement QEMU qcow2 snapshotting support
in libxl, so that everything can happen seamlessly using a variation of xl
save/restore.
We have wanted that feature for a while but never got around to it.

Re: Xen 4.2.1 live migration with qemu device model
--On 11 December 2012 19:24:54 +0000 Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:

> It would be great if you could implement QEMU qcow2 snapshotting support
> in libxl, so that everything can happen seamlessly using a variation of xl
> save/restore.

I'd guess that isn't hard. We already have QEMU qcow2 snapshotting support
on qemu-xen working via direct calls to QEMU - I think it 'worked just like
kvm'. I seem to remember there is some fiddling to do with an open fd number
in qemu for the live rebase, and some limitations (for instance I couldn't
see how to do a consistent snapshot of the same device presented both pv
and emulated under HVM - which is theoretically an issue if you have
one partition mounted one way and one the other).

--
Alex Bligh

Re: Xen 4.2.1 live migration with qemu device model
Ian Campbell writes ("Re: [Xen-devel] Xen 4.2.1 live migration with qemu device model"):
> libxl__xs_read_checked will always either initialise the variable
> (perhaps to NULL) or return an error. On both callsites we check for
> error and "goto out".

Yes. The error-handling branch after the strcmp does handle
got_ret==NULL, but the test fails to guard against this before calling
strcmp.

> I think the crash is because the code uses got_ret without checking if
> it was NULL, which can happen if the path is not present. Ian (J) does
> that make sense as something which is allowed to happen?

Yes. I think it indicates an error somewhere, but I don't understand how
this is happening. This situation might occur if qemu crashed and two
separate attempts were made to send a logdirty command, I guess.
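The scenario Ian suggests can be modelled as a two-step handshake: the toolstack writes a command, and qemu acknowledges by writing the same string to a result path (the patch's log message compares exactly these two). The struct and functions below are a hypothetical simplification for illustration only; they show how a first attempt whose qemu dies before answering leaves the command present but the result absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of one logdirty exchange (hypothetical, not libxl code).
 * Each field stands for a xenstore path; NULL means the path is absent. */
typedef struct {
    const char *cmd; /* written by the toolstack */
    const char *ret; /* written by qemu once it has acted on cmd */
} logdirty_exchange;

/* Toolstack side: post a command; in this model no result exists yet. */
static void send_command(logdirty_exchange *x, const char *cmd)
{
    x->cmd = cmd;
    x->ret = NULL;
}

/* qemu side: acknowledge by echoing the command, unless qemu has died. */
static void qemu_respond(logdirty_exchange *x, bool qemu_alive)
{
    if (qemu_alive && x->cmd != NULL)
        x->ret = x->cmd;
}
```

A second attempt that reads the crashed state sees `got` set to the pending command and `got_ret` set to NULL, which is the combination the unpatched `strcmp(got, got_ret)` could not handle.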

> As I said in my earlier mail I'm not sure why you are getting here at all
> though.

Right.

I think the patch below fixes the segfault but it doesn't fix the
underlying cause.

Ian.

Subject: libxl: qemu trad logdirty: Tolerate ENOENT on ret path

It can happen in error conditions that lds->ret_path doesn't exist,
and libxl__xs_read_checked signals this by setting got_ret=NULL. If
this happens, fail without crashing.

Reported-by: Alex Bligh <alex@alex.org.uk>,
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 95da18e..7586a6c 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -725,7 +725,7 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
rc = libxl__xs_read_checked(gc, t, lds->ret_path, &got_ret);
if (rc) goto out;

- if (strcmp(got, got_ret)) {
+ if (!got_ret || strcmp(got, got_ret)) {
LOG(ERROR,"controlling logdirty: qemu was already sent"
" command `%s' (xenstore path `%s') but result is `%s'",
got, lds->cmd_path, got_ret ? got_ret : "<none>");
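
The fix relies on `||` short-circuiting: when `got_ret` is NULL the `strcmp` is never evaluated, so the error branch is taken instead of passing NULL to `strcmp`, which is undefined behaviour. A standalone sketch of the same test follows; `result_matches` and `report_mismatch` are hypothetical helper names, not functions in libxl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True only when qemu's result exists and echoes the command.
 * This is the negation of the patched error condition
 * (!got_ret || strcmp(got, got_ret)). */
static bool result_matches(const char *got, const char *got_ret)
{
    assert(got != NULL); /* in this sketch, only called once a command is known */
    return got_ret != NULL && strcmp(got, got_ret) == 0;
}

/* Report a mismatch the way the patched LOG call does, printing "<none>"
 * when the result path was missing. */
static void report_mismatch(FILE *log, const char *got, const char *cmd_path,
                            const char *got_ret)
{
    fprintf(log, "controlling logdirty: qemu was already sent"
                 " command `%s' (xenstore path `%s') but result is `%s'\n",
            got, cmd_path, got_ret ? got_ret : "<none>");
}
```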

Re: Xen 4.2.1 live migration with qemu device model
> Subject: libxl: qemu trad logdirty: Tolerate ENOENT on ret path
>
> It can happen in error conditions that lds->ret_path doesn't exist,
> and libxl__xs_read_checked signals this by setting got_ret=NULL. If
> this happens, fail without crashing.
>
> Reported-by: Alex Bligh <alex@alex.org.uk>,
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Acked + applied, thanks.


