Mailing List Archive

[RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain.
General idea is to allow freely set device_model_version and
device_model_stubdomain_override and choose the right options based on this choice.
Also, allow to specific path to stubdomain kernel/ramdisk, for greater flexibility.
Right now when qemu-xen in stubdomain is selected, it is assumed it's
Linux-based and few decisions are based on it, specifically:
- qemu args encoding (\x1b as separator, to allow spaces in arguments)
- save/restore stream access (specific FDs passed to qemu by a wrapper script)
- QMP access, if any

It may be a good idea to document "stubdomain API" somewhere. Is there such
document for MiniOS one? Here is an initial version for Linux one:

Assumptions about Linux-based stubdomain for qemu-xen:
* qemu command line is stored by toolstack in
/vm/<target-uuid>/image/dmargs xenstore entry, arguments are separated
with \x1b character
* qemu can access saved state (if any) at its FD 3
* qemu can write its state (for save/migration) to its FD 4
* qemu expects backend for serial console at /dev/hvc3
* disks configured for the target are available as /dev/xvd*, in
configuration order
* qemu can call /etc/qemu-ifup and /etc/qemu-ifdown to connect/disconnect
network interface to appropriate network

Initial version has no QMP support - in initial patches it is completely
disabled, which means no suspend/restore. QMP normally would be used for PCI
passthrough setup, but it is worked around with MiniOS-like control protocol
over xenstore, which then is translated to QMP on stubdomain side.
Some option is to use separate console for that, but that require
xenstoled supporting multiple consoles per domain (the goal is to not have qemu
in dom0 at all). Also, it would be preferable to evaluate how libxl
handle potentially malicious QMP responses.
Another option is to use xenstore - either translate everything needed to
MiniOS-like thing, or write already json-formatted requests to xenstore and
watch there for responses. When using separate xenstore dir for that, with
per-command entries (command id as a key name?) that would solve concurrency
problem.

QMP support over separate console: patch "libxl: access QMP socket via console
for qemu-in-stubdomain" implements some early PoC of this.
Major limitation: only one connection at a time and no means of out of
band reset (and re-negotiate). I've tried adding very simple `qmp_reset`
command for in-band connection reset, but it is unreliable because of the
first limitation - xl process running in background hold this connection open
and every other xl instance interfere with it. On the other hand, for libvirt
use case one connection could be enough (leaving alone libvirtd restart).

Xenconsoled patches add support for multiple consoles in xenconsoled, to avoid the need
for qemu in dom0 for this to work. Multiple consoles for a stubdomain are used for:
- logs (console 0)
- save + restore (console 1, 2)
- qmp (console 3)
- serial terminal to target domU (console 4)
Xenconsoled patches are in fact a separate series, but I'm sending them here to
ease dependencies handling (latter libxl patches use that).

What qmp-libxenstat socket is for?

PCI passthrough support require some more love. Right now, libxl tries to setup
pcifront for both target HVM and stubdomain and the former fails (race
condition):
xen-pciback pci-259-0: 22 Couldn't locate PCI device (0000:00:1b.0)! perhaps already in-use?
Fortunately it isn't needed. There is also a PCI related problem on
domain shutdown - it looks like first stubdomain is paused and then libxl waits
for pcifront there.
Also, MSI doesn't work (qemu output):

[00:05.0] xen_pt_msgctrl_reg_write: setup MSI (register: 81).
[00:05.0] msi_msix_setup: requested pirq 55 for MSI (vec: 0, entry: 0)
[00:05.0] msi_msix_setup: Error: Mapping of MSI (err: 1, vec: 0, entry 0)
[00:05.0] xen_pt_msgctrl_reg_write: Warning: Can not map MSI (register: 80)!
[00:05.0] Write-back to unknown field 0x44 (partially) inhibited (0x00)
[00:05.0] If the device doesn't work, try enabling permissive mode
[00:05.0] (unsafe) and if it helps report the problem to xen-devel

The actual stubdomain implementation is here:

https://github.com/marmarek/qubes-vmm-xen-stubdom-linux (branch for-upstream)

See readme there for build instructions.

Remaining parts for eliminating dom0's instance of qemu:
- do not force QDISK backend for CDROM

This patch series is third (fourth?) attempt to get rid of limitation
"if you want to use stubdomain, you're stuck with qemu-traditional", done over
years, by many people.
I think it should be acceptable plan to gradually add features to
qemu-xen+stubdomain configuration - not necessary waiting with committing any
of those patches until full feature set of qemu-xen is supported. After all,
right now "feature set supported by qemu-xen+stubdom" is empty, so wouldn't be
worse.

Changes in v2:
- apply review comments by Jason Andryuk

Cc: Simon Gaiser <simon@invisiblethingslab.com>
Cc: Eric Shelton <eshelton@pobox.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

Eric Shelton (3):
libxl: Add "stubdomain_version" to domain_build_info.
libxl: Handle Linux stubdomain specific QEMU options.
libxl: Build the domain with a Linux based stubdomain

Marek Marczykowski-Górecki (13):
libxl: fix qemu-trad cmdline for no sdl/vnc case
libxl: create vkb device only for guests with graphics output
libxl: add save/restore support for qemu-xen in stubdomain
xl: add stubdomain related options to xl config parser
libxl: use \x1b to separate qemu arguments for linux stubdomain
xenconsoled: install xenstore watch for all supported consoles
xenconsoled: add support for consoles using 'state' xenstore entry
xenconsoled: make console_type->use_gnttab less confusing
xenconsoled: add support for up to 3 secondary consoles
xenconsoled: deduplicate error handling
xenconsoled: add support for non-pty output
libxl: access QMP socket via console for qemu-in-stubdomain
libxl: use xenconsoled even for multiple stubdomain's consoles

Simon Gaiser (1):
libxl: use xenstore for pci hotplug qemu-in-linux-stubdom commands

docs/man/xl.cfg.pod.5.in | 23 ++-
tools/console/daemon/io.c | 222 ++++++++++++++++++++++++++++-----
tools/libxl/libxl_create.c | 83 ++++++++++--
tools/libxl/libxl_dm.c | 190 +++++++++++++++++++---------
tools/libxl/libxl_dom_suspend.c | 10 +-
tools/libxl/libxl_internal.c | 22 +++-
tools/libxl/libxl_internal.h | 9 +-
tools/libxl/libxl_mem.c | 6 +-
tools/libxl/libxl_pci.c | 22 ++-
tools/libxl/libxl_qmp.c | 52 +++++++-
tools/libxl/libxl_types.idl | 10 +-
tools/xl/xl_parse.c | 7 +-
12 files changed, 546 insertions(+), 110 deletions(-)

base-commit: e752f28409678ce3ade49986b39309178fb2a6d6
--
git-series 0.9.1

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
Marek Marczykowski-Górecki writes ("[RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> [stuff]

Thanks for this. I'm very keen on this approach and keen on getting
it upstream. Sorry for the delay reviewing it! I have been away a
lot.

> General idea is to allow freely set device_model_version and
> device_model_stubdomain_override and choose the right options based on this choice.
> Also, allow to specific path to stubdomain kernel/ramdisk, for greater flexibility.
> Right now when qemu-xen in stubdomain is selected, it is assumed it's
> Linux-based and few decisions are based on it, specifically:

That sounds sensible.

> - qemu args encoding (\x1b as separator, to allow spaces in arguments)

What if the arguments contain \x1b ? I think you may need a quoting
scheme, or to use nul.

> It may be a good idea to document "stubdomain API" somewhere. Is there such
> document for MiniOS one?

No.

> Here is an initial version for Linux one:
>
> Assumptions about Linux-based stubdomain for qemu-xen:
> * qemu command line is stored by toolstack in
> /vm/<target-uuid>/image/dmargs xenstore entry, arguments are separated
> with \x1b character

Oh, xenstore. From docs/misc/xenstore.txt:

| xenstore values should normally be 7-bit ASCII text strings
| containing bytes 0x20..0x7f only

I think you could have separate xenstore keys for the seperate
arguments, or you could encode it somehow.

> * qemu can access saved state (if any) at its FD 3
> * qemu can write its state (for save/migration) to its FD 4

That's a description of the protocol to *qemu*. Is the toolstack then
responsible for the protocol across the domain boundary ? I think it
would be nice to specify that here too.

> * qemu expects backend for serial console at /dev/hvc3
> * disks configured for the target are available as /dev/xvd*, in
> configuration order
> * qemu can call /etc/qemu-ifup and /etc/qemu-ifdown to connect/disconnect
> network interface to appropriate network

Please can we put this in-tree and not in a 00/ cover letter :-).

> Initial version has no QMP support - in initial patches it is completely
> disabled, which means no suspend/restore. QMP normally would be used for PCI
> passthrough setup, but it is worked around with MiniOS-like control protocol
> over xenstore, which then is translated to QMP on stubdomain side.

How unpleasant. But better than nothing.

> Some option is to use separate console for that, but that require
> xenstoled supporting multiple consoles per domain (the goal is to not have qemu
> in dom0 at all). Also, it would be preferable to evaluate how libxl
> handle potentially malicious QMP responses.

We are working on that latter point anyway.

> Another option is to use xenstore - either translate everything needed to
> MiniOS-like thing, or write already json-formatted requests to xenstore and
> watch there for responses. When using separate xenstore dir for that, with
> per-command entries (command id as a key name?) that would solve concurrency
> problem.

That would be fine too.

> QMP support over separate console: patch "libxl: access QMP socket via console
> for qemu-in-stubdomain" implements some early PoC of this.
> Major limitation: only one connection at a time and no means of out of
> band reset (and re-negotiate). I've tried adding very simple `qmp_reset`
> command for in-band connection reset, but it is unreliable because of the
> first limitation - xl process running in background hold this connection open
> and every other xl instance interfere with it. On the other hand, for libvirt
> use case one connection could be enough (leaving alone libvirtd restart).

This is awkward, isn't it. The Xen PV console protocol has no reset
mechanism. Can we use libvchan here and provide qmp vchans ?

> Xenconsoled patches add support for multiple consoles in xenconsoled, to avoid the need
> for qemu in dom0 for this to work. Multiple consoles for a stubdomain are used for:
> - logs (console 0)
> - save + restore (console 1, 2)
> - qmp (console 3)
> - serial terminal to target domU (console 4)
> Xenconsoled patches are in fact a separate series, but I'm sending them here to
> ease dependencies handling (latter libxl patches use that).

That sounds reasonable.

> What qmp-libxenstat socket is for?
>
> PCI passthrough support require some more love. Right now, libxl tries to setup
> pcifront for both target HVM and stubdomain and the former fails (race
> condition):
> xen-pciback pci-259-0: 22 Couldn't locate PCI device (0000:00:1b.0)! perhaps already in-use?
> Fortunately it isn't needed. There is also a PCI related problem on
> domain shutdown - it looks like first stubdomain is paused and then libxl waits
> for pcifront there.
> Also, MSI doesn't work (qemu output):
>
> [00:05.0] xen_pt_msgctrl_reg_write: setup MSI (register: 81).
> [00:05.0] msi_msix_setup: requested pirq 55 for MSI (vec: 0, entry: 0)
> [00:05.0] msi_msix_setup: Error: Mapping of MSI (err: 1, vec: 0, entry 0)
> [00:05.0] xen_pt_msgctrl_reg_write: Warning: Can not map MSI (register: 80)!
> [00:05.0] Write-back to unknown field 0x44 (partially) inhibited (0x00)
> [00:05.0] If the device doesn't work, try enabling permissive mode
> [00:05.0] (unsafe) and if it helps report the problem to xen-devel

I confess I don't understand PCI passthrough but it would be nice for
this to work. I think a starting point might be to write down who is
supposed to do what...

> This patch series is third (fourth?) attempt to get rid of limitation
> "if you want to use stubdomain, you're stuck with qemu-traditional", done over
> years, by many people.
> I think it should be acceptable plan to gradually add features to
> qemu-xen+stubdomain configuration - not necessary waiting with committing any
> of those patches until full feature set of qemu-xen is supported. After all,
> right now "feature set supported by qemu-xen+stubdom" is empty, so wouldn't be
> worse.

Absolutely.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
On Tue, Oct 16, 2018 at 05:53:02PM +0100, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("[RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> > [stuff]
>
> Thanks for this. I'm very keen on this approach and keen on getting
> it upstream. Sorry for the delay reviewing it! I have been away a
> lot.
>
> > General idea is to allow freely set device_model_version and
> > device_model_stubdomain_override and choose the right options based on this choice.
> > Also, allow to specific path to stubdomain kernel/ramdisk, for greater flexibility.
> > Right now when qemu-xen in stubdomain is selected, it is assumed it's
> > Linux-based and few decisions are based on it, specifically:
>
> That sounds sensible.
>
> > - qemu args encoding (\x1b as separator, to allow spaces in arguments)
>
> What if the arguments contain \x1b ?

Well, here I've changed "What if the arguments contain a space" to "What
if the arguments contain \x1b" ;)

> I think you may need a quoting
> scheme, or to use nul.

The main reason for this over alternatives (including nul) is using
shell script on the stubdomain side to handle this. Handling nul char in
shell is major PITA.

> > It may be a good idea to document "stubdomain API" somewhere. Is there such
> > document for MiniOS one?
>
> No.
>
> > Here is an initial version for Linux one:
> >
> > Assumptions about Linux-based stubdomain for qemu-xen:
> > * qemu command line is stored by toolstack in
> > /vm/<target-uuid>/image/dmargs xenstore entry, arguments are separated
> > with \x1b character
>
> Oh, xenstore. From docs/misc/xenstore.txt:
>
> | xenstore values should normally be 7-bit ASCII text strings
> | containing bytes 0x20..0x7f only
>
> I think you could have separate xenstore keys for the seperate
> arguments, or you could encode it somehow.

What "normally" means in this context? For me it doesn't mean other
characters are prohibited.

> > * qemu can access saved state (if any) at its FD 3
> > * qemu can write its state (for save/migration) to its FD 4
>
> That's a description of the protocol to *qemu*. Is the toolstack then
> responsible for the protocol across the domain boundary ? I think it
> would be nice to specify that here too.

Well, toolstack is responsible for qemu command line (and QMP), so it is
part of the stubdomain protocol...

>
> > * qemu expects backend for serial console at /dev/hvc3
> > * disks configured for the target are available as /dev/xvd*, in
> > configuration order
> > * qemu can call /etc/qemu-ifup and /etc/qemu-ifdown to connect/disconnect
> > network interface to appropriate network

Actually, after recent changes in our linux stubdomain, this last point
wouldn't be needed. It's handled internally by observing qmp events.

> Please can we put this in-tree and not in a 00/ cover letter :-).

Sure.

> > Initial version has no QMP support - in initial patches it is completely
> > disabled, which means no suspend/restore. QMP normally would be used for PCI
> > passthrough setup, but it is worked around with MiniOS-like control protocol
> > over xenstore, which then is translated to QMP on stubdomain side.
>
> How unpleasant. But better than nothing.
>
> > Some option is to use separate console for that, but that require
> > xenstoled supporting multiple consoles per domain (the goal is to not have qemu
> > in dom0 at all). Also, it would be preferable to evaluate how libxl
> > handle potentially malicious QMP responses.
>
> We are working on that latter point anyway.
>
> > Another option is to use xenstore - either translate everything needed to
> > MiniOS-like thing, or write already json-formatted requests to xenstore and
> > watch there for responses. When using separate xenstore dir for that, with
> > per-command entries (command id as a key name?) that would solve concurrency
> > problem.
>
> That would be fine too.
>
> > QMP support over separate console: patch "libxl: access QMP socket via console
> > for qemu-in-stubdomain" implements some early PoC of this.
> > Major limitation: only one connection at a time and no means of out of
> > band reset (and re-negotiate). I've tried adding very simple `qmp_reset`
> > command for in-band connection reset, but it is unreliable because of the
> > first limitation - xl process running in background hold this connection open
> > and every other xl instance interfere with it. On the other hand, for libvirt
> > use case one connection could be enough (leaving alone libvirtd restart).
>
> This is awkward, isn't it. The Xen PV console protocol has no reset
> mechanism. Can we use libvchan here and provide qmp vchans ?

That would be an option too. Require (trivial) vchan->unix socket proxy.

> > Xenconsoled patches add support for multiple consoles in xenconsoled, to avoid the need
> > for qemu in dom0 for this to work. Multiple consoles for a stubdomain are used for:
> > - logs (console 0)
> > - save + restore (console 1, 2)
> > - qmp (console 3)
> > - serial terminal to target domU (console 4)
> > Xenconsoled patches are in fact a separate series, but I'm sending them here to
> > ease dependencies handling (latter libxl patches use that).
>
> That sounds reasonable.
>
> > What qmp-libxenstat socket is for?
> >
> > PCI passthrough support require some more love. Right now, libxl tries to setup
> > pcifront for both target HVM and stubdomain and the former fails (race
> > condition):
> > xen-pciback pci-259-0: 22 Couldn't locate PCI device (0000:00:1b.0)! perhaps already in-use?
> > Fortunately it isn't needed. There is also a PCI related problem on
> > domain shutdown - it looks like first stubdomain is paused and then libxl waits
> > for pcifront there.
> > Also, MSI doesn't work (qemu output):
> >
> > [00:05.0] xen_pt_msgctrl_reg_write: setup MSI (register: 81).
> > [00:05.0] msi_msix_setup: requested pirq 55 for MSI (vec: 0, entry: 0)
> > [00:05.0] msi_msix_setup: Error: Mapping of MSI (err: 1, vec: 0, entry 0)
> > [00:05.0] xen_pt_msgctrl_reg_write: Warning: Can not map MSI (register: 80)!
> > [00:05.0] Write-back to unknown field 0x44 (partially) inhibited (0x00)
> > [00:05.0] If the device doesn't work, try enabling permissive mode
> > [00:05.0] (unsafe) and if it helps report the problem to xen-devel
>
> I confess I don't understand PCI passthrough but it would be nice for
> this to work. I think a starting point might be to write down who is
> supposed to do what...
>
> > This patch series is third (fourth?) attempt to get rid of limitation
> > "if you want to use stubdomain, you're stuck with qemu-traditional", done over
> > years, by many people.
> > I think it should be acceptable plan to gradually add features to
> > qemu-xen+stubdomain configuration - not necessary waiting with committing any
> > of those patches until full feature set of qemu-xen is supported. After all,
> > right now "feature set supported by qemu-xen+stubdom" is empty, so wouldn't be
> > worse.
>
> Absolutely.
>
> Ian.
>

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> On Tue, Oct 16, 2018 at 05:53:02PM +0100, Ian Jackson wrote:
> > I think you may need a quoting
> > scheme, or to use nul.
>
> The main reason for this over alternatives (including nul) is using
> shell script on the stubdomain side to handle this. Handling nul char in
> shell is major PITA.

Ah. Yes. You could handle multiple arguments in multiple xenstore
keys easily enough though, I think ?

Or you could pass a shell string. That is, you could shell escape the
arguments in libxl. Shell escaping is a bit fiddly but not too
hard.[1] Then in the stubdom you can just say sh -c.

[1] One algorithm would be
1. replace all \ in each argument with '\\'
2. replace all ' in each argument with '\''
3. put ' ' around each argument
4. concatenate with spaces in between

> > | xenstore values should normally be 7-bit ASCII text strings
> > | containing bytes 0x20..0x7f only
> >
> > I think you could have separate xenstore keys for the seperate
> > arguments, or you could encode it somehow.
>
> What "normally" means in this context? For me it doesn't mean other
> characters are prohibited.

The reasoning is explained in the surrounding text, Basically, it
makes debugging (listing xenstore by hand, etc) very annoying.

> > > * qemu can access saved state (if any) at its FD 3
> > > * qemu can write its state (for save/migration) to its FD 4
> >
> > That's a description of the protocol to *qemu*. Is the toolstack then
> > responsible for the protocol across the domain boundary ? I think it
> > would be nice to specify that here too.
>
> Well, toolstack is responsible for qemu command line (and QMP), so it is
> part of the stubdomain protocol...

Err, I mean, you are specifying a protocol at the qemu boundary. It
is good to write that down. But it would be nice to also write down
the protocol at the stubdom boundary, even though both halves of it
are actually implemented by part of the toolstack (the stubdom side
being scripts passed in by the toolstack).

> > This is awkward, isn't it. The Xen PV console protocol has no reset
> > mechanism. Can we use libvchan here and provide qmp vchans ?
>
> That would be an option too. Require (trivial) vchan->unix socket proxy.

Yes. Or teaching qemu about libvchan.
Hey, we should teach socat about libvchan :-).

Regards,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
On Tue, Oct 16, 2018 at 06:52:36PM +0100, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> > On Tue, Oct 16, 2018 at 05:53:02PM +0100, Ian Jackson wrote:
> > > I think you may need a quoting
> > > scheme, or to use nul.
> >
> > The main reason for this over alternatives (including nul) is using
> > shell script on the stubdomain side to handle this. Handling nul char in
> > shell is major PITA.
>
> Ah. Yes. You could handle multiple arguments in multiple xenstore
> keys easily enough though, I think ?
>
> Or you could pass a shell string.

From those two options I'd prefer multiple xenstore keys as much
cleaner. We do have them as an array in libxl already.

So, let image/dmargs be actually a directory with entries like:
- image/dmargs/000 = -xen-domid
- image/dmargs/001 = 123
- image/dmargs/002 = -nodefaults
...

I wonder if there needs to be arguments count, or iterating until ENOENT
is enough?

> That is, you could shell escape the
> arguments in libxl. Shell escaping is a bit fiddly but not too
> hard.[1] Then in the stubdom you can just say sh -c.
>
> [1] One algorithm would be
> 1. replace all \ in each argument with '\\'
> 2. replace all ' in each argument with '\''
> 3. put ' ' around each argument
> 4. concatenate with spaces in between
>
> > > | xenstore values should normally be 7-bit ASCII text strings
> > > | containing bytes 0x20..0x7f only
> > >
> > > I think you could have separate xenstore keys for the seperate
> > > arguments, or you could encode it somehow.
> >
> > What "normally" means in this context? For me it doesn't mean other
> > characters are prohibited.
>
> The reasoning is explained in the surrounding text, Basically, it
> makes debugging (listing xenstore by hand, etc) very annoying.
>
> > > > * qemu can access saved state (if any) at its FD 3
> > > > * qemu can write its state (for save/migration) to its FD 4
> > >
> > > That's a description of the protocol to *qemu*. Is the toolstack then
> > > responsible for the protocol across the domain boundary ? I think it
> > > would be nice to specify that here too.
> >
> > Well, toolstack is responsible for qemu command line (and QMP), so it is
> > part of the stubdomain protocol...
>
> Err, I mean, you are specifying a protocol at the qemu boundary. It
> is good to write that down. But it would be nice to also write down
> the protocol at the stubdom boundary, even though both halves of it
> are actually implemented by part of the toolstack (the stubdom side
> being scripts passed in by the toolstack).
>
> > > This is awkward, isn't it. The Xen PV console protocol has no reset
> > > mechanism. Can we use libvchan here and provide qmp vchans ?
> >
> > That would be an option too. Require (trivial) vchan->unix socket proxy.
>
> Yes. Or teaching qemu about libvchan.

That's also an option. I'll see how hard it would be to add this to
qemu.

> Hey, we should teach socat about libvchan :-).

:)

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> From those two options I'd prefer multiple xenstore keys as much
> cleaner. We do have them as an array in libxl already.
>
> So, let image/dmargs be actually a directory with entries like:
> - image/dmargs/000 = -xen-domid
> - image/dmargs/001 = 123
> - image/dmargs/002 = -nodefaults
>
> I wonder if there needs to be arguments count, or iterating until ENOENT
> is enough?

xenstore-read doesn't seem to provide an easy way for a shell script
to tell ENOENT apart from "everything is doomed". Here is its
(frankly, very poor) error handling:

char *val = xs_read(xsh, xth, argv[optind], &len);
if (val == NULL) {
warnx("couldn't read path %s", argv[optind]);
return 1;
}

It doesn't even print the errno value, let alone change the exit
status or provide a way to tolerate ENOENT without bulrbing to stderr.

If I were you I'd send a shell string but it's entirely up to you.

> > Yes. Or teaching qemu about libvchan.
>
> That's also an option. I'll see how hard it would be to add this to
> qemu.

Good luck.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> [stuff]

So, I only got a little way through this series, but it was a while
ago and you say things are done differently now. I'm not sure if it
is useful for me to review the rest of it.

Would it be better for me to await a repost ?

Regards,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
On Wed, Oct 17, 2018 at 04:16:03PM +0100, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> > [stuff]
>
> So, I only got a little way through this series, but it was a while
> ago and you say things are done differently now. I'm not sure if it
> is useful for me to review the rest of it.
>
> Would it be better for me to await a repost ?

All the xenconsoled stuff is unchanged (and unlikely to change), so it
should be safe to review it. Patches 06 and 07 also shouldn't change.

The thing that will change is qemu cmdline and qmp handling. TBH I'm not sure
if its better to touch qmp now, or after reworked qmp handling by
Anthony will be merged. There will definitely be some conflicts and it
may even affects what underlying mechanism is chosen for qmp transport.
Based on discussion here, and in libxl__ev_qmp_* thread, I see two
viable options:

1. libvchan
pros:
- out of band reset support, so qmp capabilities negotiation can be
handled gracefully
cons:
- more work, require patching qemu (or adding vchan->socket proxy),
adds dependency on libvchan to libxl (probably not a problem)
- possibly more complex libxl__ev_qmp_* handling, or at least needs
separate handling of send/receive for stubdomain case
2. pv console
pros:
- no qemu modifications
- same read()/write() on libxl side
cons:
- no out of band reset, needs libxl handling for that (skipping
negotiation)
- possibly other problems from that (events filling up some buffers
when no one is listening?)

In both cases, there is only one simultaneous connection to qemu, so
some locking will be needed at libxl side.
BTW Does libxl listed for qmp events?

There is also third option: xenstore, but that would probably require
totally separate libxl__ev_qmp_* implementation, so I'd rule it out.

If problems with pv console could be solved, I'd go this way. But
otherwise libvchan seems like a good alternative.

Adding Anthony.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> All the xenconsoled stuff is unchanged (and unlikely to change), so it
> should be safe to review it. Patches 06 and 07 also shouldn't change.

Sorry, I missed this reply. I will go on to review those.

> The thing that will change is qemu cmdline and qmp handling. TBH I'm not sure
> if its better to touch qmp now, or after reworked qmp handling by
> Anthony will be merged. There will definitely be some conflicts and it
> may even affects what underlying mechanism is chosen for qmp transport.
> Based on discussion here, and in libxl__ev_qmp_* thread, I see two
> viable options:
>
> 1. libvchan
> pros:
> - out of band reset support, so qmp capabilities negotiation can be
> handled gracefully
> cons:
> - more work, require patching qemu (or adding vchan->socket proxy),
> adds dependency on libvchan to libxl (probably not a problem)
> - possibly more complex libxl__ev_qmp_* handling, or at least needs
> separate handling of send/receive for stubdomain case

I think that the changes to libxl__ev_qmp_* will be relatively
self-contained, particularly after Anthony's async rework. There's
one place where the communication occurs.

Does libvchan offer asynchronous connection (ie, connect/disconnect
calls which cannot be stalled by the peer, but which instead allow
poll/select-based async) ? I think it may not, in which case we need
a vchan to socket proxy anyway.

Aside from that the libxl dependency is untroublesome.

> 2. pv console
> pros:
> - no qemu modifications
> - same read()/write() on libxl side
> cons:
> - no out of band reset, needs libxl handling for that (skipping
> negotiation)

Doesn't this potentially mean that the qmp console connection can
become irrecoverably desynchronised ? I don't know how you would
recover from the situation where another libxl process had got halfway
through some qmp stuff and been terminated (for whatever reason; maybe
the calling toolstack crashed).

> - possibly other problems from that (events filling up some buffers
> when no one is listening?)

xenconsole drops things in this situation. I think that may result in
desynchronisation.

> BTW Does libxl listed for qmp events?

Not right now. It may want to in future. Anthony's qmp series
discards events.

> There is also third option: xenstore, but that would probably require
> totally separate libxl__ev_qmp_* implementation, so I'd rule it out.

That's not a terrible idea but I can't see it being popular with qemu
upstream, so you'd end up writing a kind of bidirectional
qmp<->xenstore proxy. Urgh.

> If problems with pv console could be solved, I'd go this way. But
> otherwise libvchan seems like a good alternative.

Yes.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
On Thu, Nov 01, 2018 at 04:57:18PM +0000, Ian Jackson wrote:
> Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> > All the xenconsoled stuff is unchanged (and unlikely to change), so it
> > should be safe to review it. Patches 06 and 07 also shouldn't change.
>
> Sorry, I missed this reply. I will go on to review those.
>
> > The thing that will change is qemu cmdline and qmp handling. TBH I'm not sure
> > if its better to touch qmp now, or after reworked qmp handling by
> > Anthony will be merged. There will definitely be some conflicts and it
> > may even affects what underlying mechanism is chosen for qmp transport.
> > Based on discussion here, and in libxl__ev_qmp_* thread, I see two
> > viable options:
> >
> > 1. libvchan
> > pros:
> > - out of band reset support, so qmp capabilities negotiation can be
> > handled gracefully
> > cons:
> > - more work, require patching qemu (or adding vchan->socket proxy),
> > adds dependency on libvchan to libxl (probably not a problem)
> > - possibly more complex libxl__ev_qmp_* handling, or at least needs
> > separate handling of send/receive for stubdomain case
>
> I think that the changes to libxl__ev_qmp_* will be relatively
> self-contained, particularly after Anthony's async rework. There's
> one place where the communication occurs.
>
> Does libvchan offer asynchronous connection (ie, connect/disconnect
> calls which cannot be stalled by the peer, but which instead allow
> poll/select-based async) ? I think it may not, in which case we need
> a vchan to socket proxy anyway.

libxenvchan_server_init is asynchronous. libxenvchan_client_init is too,
but it fails if called before server is ready. I have a
wrapper[1] around libxenvchan_client_init in Qubes code which solve this
problem with xs_watch. "libvchan: create xenstore entries in one
transaction" patch is related to that wrapper.

Maybe it should be also added to libxenvchan? Right now it only waits
(synchronously) for server to appear (using while(...) xs_read_watch()).
This is slightly more complex, as it also handle remote domain death
before establishing connection as well as save+restore local domain.
But it can be provided as a separate function like
libxenvchan_client_wait_for_server or such.

Providing a function that could be used in libxl would be more complex,
as it needs to integrate with libxl async API. Maybe it could use
good old trick with separate thread + pipe() for signaling readiness?
This way, the libxenvchan_client_wait_for_server would start separate
thread (using own xenstore connection) and return fd that libxl can wait
on. No need to convert libxenvchan_client_wait_for_server into callback
hell...

[1] https://github.com/QubesOS/qubes-core-vchan-xen/blob/master/vchan/init.c#L58-L168

> Aside from that the libxl dependency is untroublesome.
>
> > 2. pv console
> > pros:
> > - no qemu modifications
> > - same read()/write() on libxl side
> > cons:
> > - no out of band reset, needs libxl handling for that (skipping
> > negotiation)
>
> Doesn't this potentially mean that the qmp console connection can
> become irrecoverably desynchronised ? I don't know how you would
> recover from the situation where another libxl process had got halfway
> through some qmp stuff and been terminated (for whatever reason; maybe
> the calling toolstack crashed).

That's right, it could result in irrecoverably desynchronised
connection. So, we need out of band reset.

> > - possibly other problems from that (events filling up some buffers
> > when no one is listening?)
>
> xenconsole drops things in this situation. I think that may result in
> desynchronisation.
>
> > BTW Does libxl listed for qmp events?
>
> Not right now. It may want to in future. Anthony's qmp series
> discards events.
>
> > There is also third option: xenstore, but that would probably require
> > totally separate libxl__ev_qmp_* implementation, so I'd rule it out.
>
> That's not a terrible idea but I can't see it being popular with qemu
> upstream, so you'd end up writing a kind of bidirectional
> qmp<->xenstore proxy. Urgh.

Well, I do that already (for pci-ins). In bash.

> > If problems with pv console could be solved, I'd go this way. But
> > otherwise libvchan seems like a good alternative.
>
> Yes.
>
> Ian.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> libxenvchan_server_init is asynchronous.

I went to look and I think what you mean is that it is fast. Ie, it
does not need to wait for anything.

> libxenvchan_client_init is too,
> but it fails if called before server is ready.

That is less good.

> I have a
> wrapper[1] around libxenvchan_client_init in Qubes code which solve this
> problem with xs_watch. "libvchan: create xenstore entries in one
> transaction" patch is related to that wrapper.

Ah, yes, I saw that, I think it should go in soon if it hasn't
already.

> Maybe it should be also added to libxenvchan? Right now it only waits
> (synchronously) for server to appear (using while(...) xs_read_watch()).

That's rather poor.

> This is slightly more complex, as it also handle remote domain death
> before establishing connection as well qas save+restore local domain.
> But it can be provided as a separate function like
> libxenvchan_client_wait_for_server or such.

AIUI you would want to call such a function in libxl, because your
qemu stubdom is the server ? In which case a synchronous call is no
good, because it would allow a rogue linux stubdom to block the entire
libxl process.

> Providing a function that could be used in libxl would be more complex,
> as it needs to integrate with libxl async API.

Oh, you're ahead of me.

> Maybe it could use
> good old trick with separate thread + pipe() for signaling readiness?

libxl has code for waiting for xs watches and domain death and so on
already.

How about this: provide a new variant of libxenvchan_client_init which
can give a return indication of the form `this server does not appear
to be set up; please watch on the following xenstore key and then call
me again when the watch fires'. That would be simple, and not involve
further event loop entanglement, and would fit nicely into libxl.

> This way, the libxenvchan_client_wait_for_server would start separate
> thread (using own xenstore connection) and return fd that libxl can wait
> on. No need to convert libxenvchan_client_wait_for_server into callback
> hell...

That may be overkill.

> > Doesn't this potentially mean that the qmp console connection can
> > become irrecoverably desynchronised ? I don't know how you would
> > recover from the situation where another libxl process had got halfway
> > through some qmp stuff and been terminated (for whatever reason; maybe
> > the calling toolstack crashed).
>
> That's right, it could result in irrecoverably desynchronised
> connection. So, we need out of band reset.

Sounds complicated. I think that is what your console state stuff is
about...

> > That's not a terrible idea but I can't see it being popular with qemu
> > upstream, so you'd end up writing a kind of bidirectional
> > qmp<->xenstore proxy. Urgh.
>
> Well, I do that already (for pci-ins). In bash.

hahahaha. Well, I think vchan may be easier.

Regards,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
On Thu, Nov 01, 2018 at 05:48:37PM +0000, Ian Jackson wrote:
> libxl has code for waiting for xs watches and domain death and so on
> already.
>
> How about this: provide a new variant of libxenvchan_client_init which
> can give a return indication of the form `this server does not appear
> to be set up; please watch on the following xenstore key and then call
> me again when the watch fires'. That would be simple, and not involve
> further event loop entanglement, and would fit nicely into libxl.

Doing it in this order would be susceptible to a race condition -
server appearing after libxenvchan_client_init check, but before libxl
register the watch. Also, right now libxenvchan_client_init have only
one possible error code: NULL (instead of struct libxenvchan *). Adding
more elaborate error reporting would require API change.

As the xs path is provided by libxenvchan_client_init caller anyway,
libxl can register watch before calling libxenvchan_client_init and wait
on it if libxenvchan_client_init fails. In _this particular case_ other
conditions don't need to be handled specifically, as in the worst case it
will hit startup/qmp timeout.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> Doing it in this order would be susceptible to a race condition -
> server appearing after libxenvchan_client_init check, but before libxl
> register the watch.

I see the race you mean but happily xenstore already has a general
mechanism for avoiding it: after setting up a watch, it always fires
once immediately. (Obviously we could have done the equivalent thing
open-coded in the caller of my proposed new init function variant.)

> Also, right now libxenvchan_client_init have only
> one possible error code: NULL (instead of struct libxenvchan *). Adding
> more elaborate error reporting would require API change.

I think libxenvchan_client_init sets errno. All of the functions it
calls do so, so the errno value is passed through. So we would just
need to reserve a specific errno value for this.

> As the xs path is provided by libxenvchan_client_init caller anyway,
> libxl can register watch before calling libxenvchan_client_init and wait
> on it

Yes. (Sorry I didn't see that parameter yesterday. I was really
being quite dim.)

Although because of the xswatch behaviour I describe above, libxl can
simply set up the watch unconditionally, and call
libxenvchan_client_init in the xswatch event handler function.

> if libxenvchan_client_init fails.

I'm not a fan of this. I tend to be quite picky about error
handling. I think we should define a specific errno value for `server
not set up'. ENOENT is what it currently returns, so if we use that
we won't break existing clients.

As belt and braces we should probably have libxenvchan_client_init
turn any ENOENT other than from the xs_read of ring-ref into EIO with
a log error message.

I worked up a patch to do that, which I will post in a moment. It
turned into a series...

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
On Thu, Nov 01, 2018 at 06:32:07PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Nov 01, 2018 at 04:57:18PM +0000, Ian Jackson wrote:
> > Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> > > 2. pv console
> > > pros:
> > > - no qemu modifications
> > > - same read()/write() on libxl side
> > > cons:
> > > - no out of band reset, needs libxl handling for that (skipping
> > > negotiation)
> >
> > Doesn't this potentially mean that the qmp console connection can
> > become irrecoverably desynchronised ? I don't know how you would
> > recover from the situation where another libxl process had got halfway
> > through some qmp stuff and been terminated (for whatever reason; maybe
> > the calling toolstack crashed).
>
> That's right, it could result in irrecoverably desynchronised
> connection. So, we need out of band reset.

Actually, it looks like we can recover that situation without out of
band reset. It's even in the spec[1]:

2.7 QGA Synchronization
-----------------------

When a client connects to QGA over a transport lacking proper
connection semantics such as virtio-serial, QGA may have read partial
input from a previous client. The client needs to force QGA's parser
into known-good state using the previous section's technique.
Moreover, the client may receive output a previous client didn't read.
To help with skipping that output, QGA provides the
'guest-sync-delimited' command. Refer to its documentation for
details.

previous section:
2.6 Forcing the JSON parser into known-good state
-------------------------------------------------

Incomplete or invalid input can leave the server's JSON parser in a
state where it can't parse additional commands. To get it back into
known-good state, the client should provoke a lexical error.

The cleanest way to do that is sending an ASCII control character
other than '\t' (horizontal tab), '\r' (carriage return), or '\n' (new
line).

Sadly, older versions of QEMU can fail to flag this as an error. If a
client needs to deal with them, it should send a 0xFF byte.

[1] https://git.qemu.org/?p=qemu.git;a=blob;f=docs/interop/qmp-spec.txt;h=8f7da0245d51447be7df2b3d4b105bad9fbec0b3;hb=HEAD#l231

--
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
On Thu, Nov 15, 2018 at 05:41:44PM +0000, Anthony PERARD wrote:
> On Thu, Nov 01, 2018 at 06:32:07PM +0100, Marek Marczykowski-Górecki wrote:
> > On Thu, Nov 01, 2018 at 04:57:18PM +0000, Ian Jackson wrote:
> > > Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> > > > 2. pv console
> > > > pros:
> > > > - no qemu modifications
> > > > - same read()/write() on libxl side
> > > > cons:
> > > > - no out of band reset, needs libxl handling for that (skipping
> > > > negotiation)
> > >
> > > Doesn't this potentially mean that the qmp console connection can
> > > become irrecoverably desynchronised ? I don't know how you would
> > > recover from the situation where another libxl process had got halfway
> > > through some qmp stuff and been terminated (for whatever reason; maybe
> > > the calling toolstack crashed).
> >
> > That's right, it could result in irrecoverably desynchronised
> > connection. So, we need out of band reset.
>
> Actually, it looks like we can recover that situation without out of
> band reset. It's even in the spec[1]:

That's interesting. And it mention serial console explicitly as the use
case for this. Does it apply to monitor socket too, or guest agent only?
I'd much prefer to use console, as the code would be much simpler (the
same handling for local and stubdomain qemu).

Also, this doesn't cover capabilities (re-)negotiation. While actual
capabilities are probably not a problem, libxl do store qemu version
from the server greeting (is it used anywhere?).

(...)

> previous section:
> 2.6 Forcing the JSON parser into known-good state
> -------------------------------------------------
>
> Incomplete or invalid input can leave the server's JSON parser in a
> state where it can't parse additional commands. To get it back into
> known-good state, the client should provoke a lexical error.
>
> The cleanest way to do that is sending an ASCII control character
> other than '\t' (horizontal tab), '\r' (carriage return), or '\n' (new
> line).
>
> Sadly, older versions of QEMU can fail to flag this as an error. If a
> client needs to deal with them, it should send a 0xFF byte.

That's weird as guest-sync-delimiter documentation (still?) mention
0xFF. In both directions. Does it mean the new recommendation is to use
ASCII control character in one direction, but expect 0xFF in the other?

> [1] https://git.qemu.org/?p=qemu.git;a=blob;f=docs/interop/qmp-spec.txt;h=8f7da0245d51447be7df2b3d4b105bad9fbec0b3;hb=HEAD#l231

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
On Thu, Nov 15, 2018 at 07:57:08PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Nov 15, 2018 at 05:41:44PM +0000, Anthony PERARD wrote:
> > On Thu, Nov 01, 2018 at 06:32:07PM +0100, Marek Marczykowski-Górecki wrote:
> > > On Thu, Nov 01, 2018 at 04:57:18PM +0000, Ian Jackson wrote:
> > > > Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> > > > > 2. pv console
> > > > > pros:
> > > > > - no qemu modifications
> > > > > - same read()/write() on libxl side
> > > > > cons:
> > > > > - no out of band reset, needs libxl handling for that (skipping
> > > > > negotiation)
> > > >
> > > > Doesn't this potentially mean that the qmp console connection can
> > > > become irrecoverably desynchronised ? I don't know how you would
> > > > recover from the situation where another libxl process had got halfway
> > > > through some qmp stuff and been terminated (for whatever reason; maybe
> > > > the calling toolstack crashed).
> > >
> > > That's right, it could result in irrecoverably desynchronised
> > > connection. So, we need out of band reset.
> >
> > Actually, it looks like we can recover that situation without out of
> > band reset. It's even in the spec[1]:
>
> That's interesting. And it mention serial console explicitly as the use
> case for this. Does it apply to monitor socket too, or guest agent only?
> I'd much prefer to use console, as the code would be much simpler (the
> same handling for local and stubdomain qemu).

The 'guest-sync-delimited' command doesn't seems to be available on the
monitor socket. I should have checked that ... but that would just mean
that libxl would need to tolerate the first read to be an incompleted
json-object. Then we can use the 'id' that every response have to figure
out if it was a reply sent to a previous libxl run. We can maybe encode
the pid into the id.

> Also, this doesn't cover capabilities (re-)negotiation. While actual
> capabilities are probably not a problem, libxl do store qemu version
> from the server greeting (is it used anywhere?).

The QEMU version is still available after the capabilities negociation,
one simply need to execute 'query-version'.

And yes, the version is used. So far there is one command that changes
with newer version of QEMU, 'xen-save-devices-state'.

> > previous section:
> > 2.6 Forcing the JSON parser into known-good state
> > -------------------------------------------------
> >
> > Incomplete or invalid input can leave the server's JSON parser in a
> > state where it can't parse additional commands. To get it back into
> > known-good state, the client should provoke a lexical error.
> >
> > The cleanest way to do that is sending an ASCII control character
> > other than '\t' (horizontal tab), '\r' (carriage return), or '\n' (new
> > line).
> >
> > Sadly, older versions of QEMU can fail to flag this as an error. If a
> > client needs to deal with them, it should send a 0xFF byte.
>
> That's weird as guest-sync-delimiter documentation (still?) mention
> 0xFF. In both directions. Does it mean the new recommendation is to use
> ASCII control character in one direction, but expect 0xFF in the other?

Looks like it. But there is a different between the server and the
client, the server still parse JSON, but the client will actively look
for the delimiter once the command have been sent.

--
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
On Fri, Nov 16, 2018 at 10:39:07AM +0000, Anthony PERARD wrote:
> On Thu, Nov 15, 2018 at 07:57:08PM +0100, Marek Marczykowski-Górecki wrote:
> > On Thu, Nov 15, 2018 at 05:41:44PM +0000, Anthony PERARD wrote:
> > > On Thu, Nov 01, 2018 at 06:32:07PM +0100, Marek Marczykowski-Górecki wrote:
> > > > On Thu, Nov 01, 2018 at 04:57:18PM +0000, Ian Jackson wrote:
> > > > > Marek Marczykowski-Górecki writes ("Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain."):
> > > > > > 2. pv console
> > > > > > pros:
> > > > > > - no qemu modifications
> > > > > > - same read()/write() on libxl side
> > > > > > cons:
> > > > > > - no out of band reset, needs libxl handling for that (skipping
> > > > > > negotiation)
> > > > >
> > > > > Doesn't this potentially mean that the qmp console connection can
> > > > > become irrecoverably desynchronised ? I don't know how you would
> > > > > recover from the situation where another libxl process had got halfway
> > > > > through some qmp stuff and been terminated (for whatever reason; maybe
> > > > > the calling toolstack crashed).
> > > >
> > > > That's right, it could result in irrecoverably desynchronised
> > > > connection. So, we need out of band reset.
> > >
> > > Actually, it looks like we can recover that situation without out of
> > > band reset. It's even in the spec[1]:
> >
> > That's interesting. And it mention serial console explicitly as the use
> > case for this. Does it apply to monitor socket too, or guest agent only?
> > I'd much prefer to use console, as the code would be much simpler (the
> > same handling for local and stubdomain qemu).
>
> The 'guest-sync-delimited' command doesn't seems to be available on the
> monitor socket. I should have checked that ... but that would just mean
> that libxl would need to tolerate the first read to be an incompleted
> json-object. Then we can use the 'id' that every response have to figure
> out if it was a reply sent to a previous libxl run. We can maybe encode
> the pid into the id.

It may be tricky to figure out where is the end of such incomplete json
object... Suppose you read:

{ "x": { "y": 1 } } }

If you read this from the beginning looking for json, you'll get valid
json object unless you encounter the last "}" (which you may receive in
separate read() call, if you're unlucky). I'm afraid the logic for
skipping initial (possibly incomplete) object(s) may be quite complex.
Maybe better propose upstream to include 'guest-sync-delimited' also on
monitor socket too? In that case, the command naming will be awkward,
but still, similar command would be useful in that context.

> > Also, this doesn't cover capabilities (re-)negotiation. While actual
> > capabilities are probably not a problem, libxl do store qemu version
> > from the server greeting (is it used anywhere?).
>
> The QEMU version is still available after the capabilities negociation,
> one simply need to execute 'query-version'.

That's good.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: [RFC PATCH v2 00/17] Add support for qemu-xen runnning in a Linux-based stubdomain. [ In reply to ]
On Sun, Nov 18, 2018 at 06:25:53PM +0100, Marek Marczykowski-Górecki wrote:
> On Fri, Nov 16, 2018 at 10:39:07AM +0000, Anthony PERARD wrote:
> > The 'guest-sync-delimited' command doesn't seems to be available on the
> > monitor socket. I should have checked that ... but that would just mean
> > that libxl would need to tolerate the first read to be an incompleted
> > json-object. Then we can use the 'id' that every response have to figure
> > out if it was a reply sent to a previous libxl run. We can maybe encode
> > the pid into the id.
>
> It may be tricky to figure out where is the end of such incomplete json
> object... Suppose you read:
>
> { "x": { "y": 1 } } }
>
> If you read this from the beginning looking for json, you'll get valid
> json object unless you encounter the last "}" (which you may receive in
> separate read() call, if you're unlucky). I'm afraid the logic for
> skipping initial (possibly incomplete) object(s) may be quite complex.

It's not that complex, all messages sent by QEMU are terminated by CRLF,
that part of the protocol. So I think that libxl already return an error
if it get something like: '{ "z": 2 } }\r\n', because of that extra }
that should not be there.

> Maybe better propose upstream to include 'guest-sync-delimited' also on
> monitor socket too? In that case, the command naming will be awkward,
> but still, similar command would be useful in that context.

It might be usefull to have.

--
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel