Mailing List Archive

Comments on Xen API document
I've only recently joined this mailing list, so I'm sure some of my questions
here are old hat - in which case apologies. Same goes for ones that are a
result of my limited knowledge about XML RPC.

These comments are based on version 0.4 from the wiki page.

Generally - there's no clear versioning. In particular, there's no description
of how to find out what version of this API the host expects, and what it will
return.
Ideally it would degrade to the version that the client reports as being able
to handle. The API docs should in the future clearly label what revision things
were added in. One thing that might be useful is clearly specifying stability
of each item. This nicely allows for experimental features, whilst labelling
truly stable entries clearly.

I think you mention it, but all calls need to clearly define the failure modes
and what gets returned on failure. It also wasn't clear to me what happens if I
try to set several parameters in a single XML RPC call (presuming that's
possible?). If the second one fails, what do I get back in terms of the error
struct?

1.1

The add_to_* primitives for sets should probably return the equivalent of
EEXIST, not just silently succeed: it's always better to provide more
information to a client. The operation may be a logic error.

1.3.1

ErrorDescription isn't specified well enough to program against.

1.4.2

What happens to the session if a connection is terminated or idles out?

1.4.3

The persistent tasks-per-user behaviour should be called out here explicitly,
rather than in the comment later on in the document.

The specification implies that a failed task must have its Progress set to
"100". This seems odd. More widely, has there been consideration of indicating
/phases/ of operation? Often this is more useful than a generic slider value.

1.5

I don't understand the expanded XML for do_login_with_password() - the
parameters are not name/value pairs. If this is a general design, how are
optional parameters to be handled?

"getname_label" is a result, presumably, of the odd "name/label" naming of the
parameter. This mapping is not defined in the document, and it's a little odd
that we have "getname_label" but "get_uuid".


Probably an XML RPC thing, but there's no discussion of transactions in the
document. Can I change a batch of things and assure it's done atomically?

1.7

Do zombie domains need to be explicitly called out in the lifecycle?

2.3.3

These enums would be easier to read if placed by the calls that used them.

enum vm_power_state - "power" is a very odd notion for a VM. Why not just
"vm_state" ?

enum on_crash_behaviour - it's likely useful to have a host-wide setting for
whether domains should core dump. How would that interact?

enum boot_type - it's not clear to me how a client would make use of knowing
that it's "kernel_internal"

2.6.1

is_a_template - "is_template" seems more natural

memory/* needs units

VCPUs/params needs a detailed description of what it can/does contain

VCPUS/utilisation - a percentage presumably?

bios/cdrom - I'm not sure what this does...

kernel/kernel et al - depending on bootloader, is this the copied kernel path
on dom0 filesystem, or the path inside the domU filesystem? e.g. pygrub,
domuloader

tools_version - needs defining explicitly?

There's no mapping from a VM's VCPUs to the host_cpus they're pinned on.

2.6.2

What's the interaction with the on shutdown behaviour for the shutdown RPCs?

suspend/resume - I'm a little confused about where the domain save file
actually goes, and how 'xm' will specify it

2.7.1

What is name/label intended to be for a host? It's not actually entirely clear
to me what a host /is/. What does "list" do? Note that early in the document
you refer to a non-existent ListAllVMs RPC.

'disable' and 'enable' seem perhaps a bit too generic name-wise.

2.11

I'm not clear on what an SR actually is and its relationship to its contained
VDIs. Could the document clarify this somewhat?

type, location - what are their possible values?

2.12.1

"shareable" just wins in a googlefight against "sharable"

2.12.2

Why is a snapshot defined as living in the same SR?

2.13.1

I don't think vbd_mode or driver_type are specified.

thanks,
john

_______________________________________________
xen-api mailing list
xen-api@lists.xensource.com
http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-api
Re: Comments on Xen API document
On Tue, Aug 15, 2006 at 03:21:45AM +0100, John Levon wrote:

>
> I've only recently joined this mailing list, so I'm sure some of my questions
> here are old hat - in which case apologies. Same goes for ones that are a
> result of my limited knowledge about XML RPC.
>
> These comments are based on version 0.4 from the wiki page.
>
> Generally - there's no clear versioning. In particular a description of how
> to find out what version of this API the host expects, and what it will
> return. Ideally it would degrade to the version that the client reports as
> being able to handle. The API docs should in the future clearly label what
> revision things were added in. One thing that might be useful is clearly
> specifying stability of each item. This nicely allows for experimental
> features, whilst labelling truly stable entries clearly.
>
> I think you mention it, but all calls need to clearly define the failure
> modes and what gets returned on failure.

Yes, both versioning and failure modes and codes are on the todo list -- we
definitely intend to specify this stuff properly before release.

> It also wasn't clear to me what happens if I try to set several parameters
> in a single XML RPC call (presuming that's possible?). If the second one
> fails, what do I get back in terms of the error struct?

No, you can't do this. In general, we want to keep the calls "atomic", in
that the call either succeeds or it fails and changes no state. The policy on
handling failures in the middle of a set of operations is something that we
want to push to the client.

> 1.1
>
> The add_to_* primitives for sets should probably return the equivalent of
> EEXIST, not just silently succeed: it's always better to provide more
> information to a client. The operation may be a logic error.

OK, that's reasonable. I'll put that in.

> 1.3.1
>
> ErrorDescription isn't specified well enough to program against.

What do you mean by this? We still haven't specified the actual error codes,
their meaning, and the parameters that the code implies, but other than that I
thought it was well enough specified. Could you be more specific?

> 1.4.2
>
> What happens to the session if a connection is terminated or idles out?

This isn't in the document, but I think the behaviour we'll go for is that
sessions can time out, and any call could potentially return a "NoSuchSession"
error, and so the client must be prepared to authenticate again.

Note that session objects are independent of the underlying transport, so
there's no reason why a TCP timeout should necessarily cause a session
timeout.
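
Concretely, client code would end up looking something like this -- just a
sketch, mind: the URL, the call names, the result-struct fields and the
"NoSuchSession" code are all still provisional:

    import xmlrpclib

    server = xmlrpclib.ServerProxy("http://localhost:9363/")

    def login():
        # Assumed call name and result wrapping; the draft may differ.
        return server.session.login_with_password("root", "password")["Value"]

    def call(session, method, *args):
        # Retry once with a fresh session if the old one has timed out.
        result = method(session, *args)
        if result.get("Status") == "Failure" and \
           "NoSuchSession" in result.get("ErrorDescription", []):
            session = login()
            result = method(session, *args)
        return result, session

    # e.g.  result, session = call(session, server.VM.get_name_label, vm)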

> 1.4.3
>
> The persistent tasks per user behaviour should be called out here explicitly,
> rather than the comment later on in the document.

OK. The task stuff is still a bit up in the air, and could do with more work,
certainly.

> The specification implies that a failed task must have its Progress set to
> "100". This seems odd.

Odd maybe, but not harmful, I think. The intention is that you continue to
poll until you see 100, and then parse the result for success or failure codes
etc -- it's not intended that 100 indicates success.
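
So client-side polling would look roughly like the below. (The call and field
names -- Task.get_record, Progress, Result, ErrorDescription -- are
assumptions on my part; treat it as a sketch.)

    import time

    def wait_for_task(server, session, task_ref):
        # Poll until Progress hits 100, then inspect the result fields to
        # decide whether the task actually succeeded.
        while True:
            task = server.Task.get_record(session, task_ref)["Value"]
            if int(task["Progress"]) >= 100:
                break
            time.sleep(1)
        if task.get("ErrorDescription"):
            raise RuntimeError("task failed: %s" % task["ErrorDescription"])
        return task.get("Result")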

> More widely, has there been consideration of indicating
> /phases/ of operation? Often this is more useful than a generic slider value.

No, no-one's considered that, as far as I know. What would you expect here?
Would we end up with a whole new set of codes to indicate task progress, on
top of the error codes?

> 1.5
>
> I don't understand the expanded XML for the do_login_with_password() - the
> parameters are not name/value pairs. If this is a general design, how are
> optional parameters to be handled?

Optional parameters aren't handled at all, deliberately, as they add
complexity that I don't think we need, and would make life difficult for
bindings to languages that don't have them (like C).

Any call that takes a struct could in the future have optional entries, since
those are defined using name-value pairs, but again, we've not allowed that in
the specification yet.
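
To make the wire format concrete: the arguments go across as positional
<param> elements, so from Python the login is simply the below (the URL and
the "session." prefix are illustrative):

    import xmlrpclib

    server = xmlrpclib.ServerProxy("http://localhost:9363/")
    # Two positional parameters, marshalled in order -- there is no
    # name/value pairing, which is why optional parameters don't fit.
    session = server.session.do_login_with_password("root", "secret")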

> "getname_label" is a result, presumably, of the odd "name/label" naming of
> the parameter. This mapping is not defined in the document, and it's a
> little odd that we have "getname_label" but "get_uuid".

Yes, that's a typo -- it ought to read get_name_label -- and we certainly need
to define that mapping properly.

> Probably an XML RPC thing, but there's no discussion of transactions in the
> document. Can I change a batch of things and assure it's done atomically?

No -- we want the client to make rollback / error handling decisions -- Xend
simply isn't the right place for that kind of policy. Atomicity is impossible
to guarantee for a lot of the parameters in any case, because Xen doesn't
offer a transactional interface either.
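
So a client that wants rollback has to implement it itself, along these lines
(a sketch only -- the getter/setter names assume the name mapping we still
need to pin down):

    def set_kernel_and_initrd(server, session, vm, kernel, initrd):
        # Remember the old value so we can undo by hand if the second
        # call fails; Xend won't roll back for us.
        old_kernel = server.VM.get_kernel_kernel(session, vm)["Value"]
        server.VM.set_kernel_kernel(session, vm, kernel)
        result = server.VM.set_kernel_initrd(session, vm, initrd)
        if result.get("Status") == "Failure":
            server.VM.set_kernel_kernel(session, vm, old_kernel)
        return result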

> 1.7
>
> Do zombie domains need to be explicitly called out in the lifecycle?

Hmmm, maybe. I'll have a think. Zombies are more related to the _domain_
rather than the _VM_ -- you could restart the VM even with a zombie lying
around, there are just some additional resources being consumed -- so I don't
think that they really belong in that lifecycle.

> 2.3.3
>
> These enums would be easier to read if placed by the calls that used them.

OK, I'll see if I can rejig it. There might be places where enums are used by
more than one class.

> enum vm_power_state - "power" is a very odd notion for a VM. Why not just
> "vm_state" ?

I'm not wedded to the name -- I just wanted to avoid having a completely
generic name like "state". Perhaps another term would be better?

> enum on_crash_behaviour - it's likely useful to have a host-wide setting for
> whether domains should core dump. How would that interact?

We could core dump if either setting said so, I guess, but there's no per-host
setting for this in the API at the moment. Isn't this something that we could
punt to the higher-level tools?

> enum boot_type - it's not clear to me how a client would make use of knowing
> that it's "kernel_internal"

Well the user needs to know, because they need to make sure that they don't
delete the kernel from the guest's rootfs in that case!

> 2.6.1
>
> is_a_template - "is_template" seems more natural

OK.

> memory/* needs units

Yes, they do, certainly.

> VCPUs/params needs a detailed description of what it can/does contain

It's meant to be a free-form string, interpreted by whatever scheduler is
running.

> VCPUS/utilisation - a percentage presumably?

Um, load average I think, like top, but I'll check with our scheduler expert
and see if we can get more interesting stats out.

> bios/cdrom - I'm not sure what this does...

Yes, we've dropped that one already.

> kernel/kernel et al - depending on bootloader, is this the copied kernel path
> on dom0 filesystem, or the path inside the domU filesystem? e.g. pygrub,
> domuloader

Yes, that's right.

> tools_version - needs defining explicitly?

We'll certainly need to define some permanent entries, but we wanted this to
be open for expansion later too. Obviously we'll guarantee that there will be
a "Xend" entry, but we wanted other version numbers to be able to slot in
here too, such as the version of PV drivers inside guests.
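
So a typical value might look something like this (the entries and version
numbers are illustrative only):

    tools_version = {
        "Xend":       "3.0.3",   # guaranteed entry
        "pv_drivers": "1.0.2",   # e.g. PV drivers inside the guest
    }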

> There's no mapping from VM vcpus to the host_cpu's it's pinned on.

Pinning is something that would get encoded into VCPUs/params, when the
scheduler supports such functionality.

> 2.6.2
>
> What's the interaction with the on shutdown behaviour for the shutdown RPCs?

The specified after_shutdown action takes place once the shutdown has
completed, regardless of whether you use clean_shutdown or hard_shutdown.

> suspend/resume - I'm a little confused about where the domain save file
> actually goes, and how 'xm' will specify it

It doesn't "go" anywhere defined -- that's an internal issue for Xend to deal
with. We're moving away from the model where you can save and restore to
files and move them around willy-nilly -- people have a tendency to corrupt
their filesystems by starting the same save file twice, for example.

If you actually want the save file, you need some kind of "export" call, which
we haven't yet defined.

> 2.7.1
>
> What is name/label intended to be for a host?

The tools could initially set it from something obvious like the machine's DNS
name, but it's meant to be human-readable and settable -- it's for the admin
to specify, ultimately.

> It's not actually entirely clear to me what a host /is/.

One host == one Xen == one Xend. Does that clear it up?

> What does "list" do? Note that early in the document you refer to a
> non-existent ListAllVMs RPC.

Yes, the doc needs fixing so that it's consistent. I think we've dropped
listAllVMs in favour of Host.get_resident_vms, which returns a set of
references to the VMs available on that host (i.e. a list of VM UUIDs).
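
From the client side, that would look roughly like this (a sketch; the
result wrapping is assumed):

    def list_resident_vms(server, session, host_ref):
        # Print the UUID and name label of each VM resident on the host.
        vms = server.Host.get_resident_vms(session, host_ref)["Value"]
        for vm in vms:
            name = server.VM.get_name_label(session, vm)["Value"]
            print("%s  %s" % (vm, name))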

> 'disable' and 'enable' seem perhaps a bit too generic name-wise.

What would you prefer?

> 2.11
>
> I'm not clear on what an SR actually is and its relationship to its contained
> VDIs. Could the document clarify this somewhat?

Sure, we could clarify this. An SR is a storage segment, like a partition, or
an LVM volume, or a LUN on a SAN. I'll get our storage guys to specify this a
little better than I can manage! With blocktap or loopback files, you can
have multiple VDIs within an SR, one backing file per VDI, hence the
containment relationship.

> type,location - what are its possible values?

That's still up in the air, I think. That's something else that needs to go
on the todo.

> 2.12.1
>
> "shareable" just wins in a googlefight against "sharable"

Ah, googlefight, the OED for the 21st Century! We Brits (the pedantic ones,
at least) tend not to pay much attention to googlefights, but in this case I
prefer it with the 'e' anyway ;-) I'll change that.

> 2.12.2
>
> Why is a snapshot defined as living in the same SR?

So that copy-on-write implementations of snapshotting can work. If you want a
snapshot in a separate SR, you need to explicitly copy it, using the call that
I've forgotten to put in ;-)

> 2.13.1
>
> I don't think vbd_mode or driver_type are specified.

They're enums -- perhaps they dropped off your copy, but in my more recent one
they're definitely there. vbd_mode is RO or RW, and driver_type is ioemu or
paravirtualised.

Thanks for your feedback, it's been very useful,

Ewan.

Re: Comments on Xen API document
On Thu, Aug 17, 2006 at 12:13:24PM +0100, Ewan Mellor wrote:

> > ErrorDescription isn't specified well enough to program against.
>
> What do you mean by this? We still haven't specified the actual error codes,
> their meaning, and the parameters that the code implies, but other than that I
> thought it was well enough specified. Could you be more specific?

I think I was actually just repeating myself, sorry about that.

> > What happens to the session if a connection is terminated or idles out?
>
> This isn't in the document, but I think the behaviour we'll go for is that
> sessions can time out, and any call could potentially return a "NoSuchSession"
> error, and so the client must be prepared to authenticate again.

Sounds reasonable.

> Note that session objects are independent of the underlying transport, so
> there's no reason why a TCP timeout should necessarily cause a session
> timeout.

Though there's an obvious "accidental DoS" opportunity if session objects
persist forever.

> > The specification implies that a failed task must have its Progress set to
> > "100". This seems odd.
>
> Odd maybe, but not harmful, I think. The intention is that you continue to
> poll until you see 100, and then parse the result for success or failure codes
> etc -- it's not intended that 100 indicates success.

I suppose that's OK, but the spec should call this out explicitly I think.

> > More widely, has there been consideration of indicating
> > /phases/ of operation? Often this is more useful than a generic slider value.
>
> No, no-one's considered that, as far as I know. What would you expect here?
> Would we end up with a whole new set of codes to indicate task progress, on
> top of the error codes?

It might be useful to label the stages of a live migration for example. "Hmm,
it hung during the suspend call in the domU" rather than "it stopped at x
percent".

> > I don't understand the expanded XML for the do_login_with_password() - the
> > parameters are not name/value pairs. If this is a general design, how are
> > optional parameters to be handled?
>
> Optional parameters aren't handled at all, deliberately, as they add
> complexity that I don't think we need, and would make life difficult for
> bindings to languages that don't have them (like C).
>
> Any call that takes a struct could in the future have optional entries, since
> those are defined using name-value pairs, but again, we've not allowed that in
> the specification yet.

This might become a problem in the future for "unstable" data items.

> > Probably an XML RPC thing, but there's no discussion of transactions in the
> > document. Can I change a batch of things and assure it's done atomically?
>
> No -- we want the client to make rollback / error handling decisions -- Xend
> simply isn't the right place for that kind of policy. Atomicity is impossible
> to guarantee for a lot of the parameters in any case, because Xen doesn't
> offer a transactional interface either.

Hmm. This leaves clients with an unfortunate problem. Consider a client that
wants to upgrade a domU's kernel with an external method. It can't change both
kernel and initrd without a window of opportunity for another client to attempt
booting the domU with an invalid configuration. Perhaps some thought should be
given to making xend itself transactional somehow? It could be as simple as
letting clients "lock" domUs...

> > Do zombie domains need to be explicitly called out in the lifecycle?
>
> Hmmm, maybe. I'll have a think. Zombies are more related to the _domain_
> rather than the _VM_ -- you could restart the VM even with a zombie lying
> around, there are just some additional resources being consumed -- so I don't
> think that they really belong in that lifecycle.

Fair point.

> > enum vm_power_state - "power" is a very odd notion for a VM. Why not just
> > "vm_state" ?
>
> I'm not wedded to the name -- I just wanted to avoid having a completely
> generic name like "state". Perhaps another term would be better?

I understand the concern, but in this case I think 'state' is pretty
descriptive when attached to a domU. Or, perhaps, "runstate" ?

> > enum on_crash_behaviour - it's likely useful to have a host-wide setting for
> > whether domains should core dump. How would that interact?
>
> We could core dump if either setting said so, I guess, but there's no per-host
> setting for this in the API at the moment. Isn't this something that we could
> punt to the higher-level tools?

Yes, probably.

> > VCPUs/params needs a detailed description of what it can/does contain
>
> It's meant to be a free-form string, interpreted by whatever scheduler is
> running.

So it's explicit that an XML RPC client can't configure the scheduler
parameters, right?

> > kernel/kernel et al - depending on bootloader, is this the copied kernel path
> > on dom0 filesystem, or the path inside the domU filesystem? e.g. pygrub,
> > domuloader
>
> Yes, that's right.

I think the changing meaning of the parameters depending upon the boot method
should be described explicitly in the document.

> > tools_version - needs defining explicitly?
>
> We'll certainly need to define some permanent entries, but we wanted this to
> be open for expansion later too. Obviously we'll guarantee that there will be
> a "Xend" entry, but we wanted for other version numbers to be able to slot in
> here too, such as the version of PV drivers inside guests.

OK.

> > There's no mapping from VM vcpus to the host_cpu's it's pinned on.
>
> Pinning is something that would get encoded into VCPUs/params, when the
> scheduler supports such functionality.

More generally, though, there's no way to examine load balancing across the
physical CPUs that I can see? This seems like pretty critical information.

> > suspend/resume - I'm a little confused about where the domain save file
> > actually goes, and how 'xm' will specify it
>
> It doesn't "go" anywhere defined -- that's an internal issue for Xend to deal
> with. We're moving away from the model where you can save and restore to
> files and move them around willy-nilly -- people have a tendency to corrupt
> their filesystems by starting the same save file twice, for example.
>
> If you actually want the save file, you need some kind of "export" call, which
> we haven't yet defined.

OK.

> > 'disable' and 'enable' seem perhaps a bit too generic name-wise.
>
> What would you prefer?

Good question. I'll think...

> > I'm not clear on what an SR actually is and its relationship to its contained
> > VDIs. Could the document clarify this somewhat?
>
> Sure, we could clarify this. An SR is a storage segment, like a partition, or
> an LVM volume, or a LUN on a SAN. I'll get our storage guys to specify this a
> little better than I can manage! With blocktap or loopback files, you can
> have multiple VDIs within an SR, one backing file per VDI, hence the
> containment relationship.

I need to think about this in respect to ZFS somewhat...

> Thanks for your feedback, it's been very useful,

Thanks for the replies!

regards
john

Re: Comments on Xen API document
>> > VCPUs/params needs a detailed description of what it can/does contain
>>
>> It's meant to be a free-form string, interpreted by whatever scheduler is
>> running.
>
>So it's explicit that an XML RPC client can't configure the scheduler
>parameters, right?

If the only real way to do any sort of priority-based 'weighting' of CPU
allocation to guest VMs under Xen is via adjusting scheduler params, then
personally I would like to see XML RPC interfaces for that. Otherwise the
only level of host CPU resource allocation adjustment available to a
system admin via CIM is the rather coarse pinning/unpinning of specific
physical CPUs to specific DomUs.

Or am I missing something here?

- Gareth
Re: Comments on Xen API document
On Mon, Aug 21, 2006 at 05:27:04PM +0100, John Levon wrote:

> On Thu, Aug 17, 2006 at 12:13:24PM +0100, Ewan Mellor wrote:
>
> > > VCPUs/params needs a detailed description of what it can/does contain
> >
> > It's meant to be a free-form string, interpreted by whatever scheduler is
> > running.
>
> So it's explicit that an XML RPC client can't configure the scheduler
> parameters, right?

That's not what I meant, no. What I meant to say was that the client could
set scheduler parameters, but that it would need to understand whichever
scheduler is in use, and know how to encode the parameters for that. In other
words, we need to state the meaning and format of the VCPUs/params field,
depending upon the scheduler in use.

In practice, for now at least, I think that just means defining the meaning of
this field for the credit scheduler, because that will be the default for
Xen 3.0.3 and 3.0.4. The intention for the longer term is to support the use
of other schedulers, but that's some way off yet.
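
For example, the credit scheduler's weight and cap might be carried like
this -- though I must stress the encoding below is made up; defining the real
format is exactly the work that needs doing:

    # Hypothetical encoding of credit-scheduler parameters in VCPUs/params.
    def set_credit_params(server, session, vm, weight=256, cap=0):
        params = "weight=%d cap=%d" % (weight, cap)
        return server.VM.set_VCPUs_params(session, vm, params)  # assumed setter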

> > > There's no mapping from VM vcpus to the host_cpu's it's pinned on.
> >
> > Pinning is something that would get encoded into VCPUs/params, when the
> > scheduler supports such functionality.
>
> More generally, though, there's no way to examine load balancing across the
> physical CPUs that I can see? This seems like pretty critical information.

That's a good point! Thanks, I'll put that on the todo list. I'll also put
together a fuller description of the CPU scheduler side of the API in general,
which certainly seems to be something that would be useful.

Cheers,

Ewan.

Re: Comments on Xen API document
On Wed, Aug 23, 2006 at 09:45:14PM +0100, Ewan Mellor wrote:

> > So it's explicit that an XML RPC client can't configure the scheduler
> > parameters, right?
>
> In practice, for now at least, I think that just means defining the meaning of
> this field for the credit scheduler

Sounds fine then.

> That's a good point! Thanks, I'll put that on the todo list. I'll also put
> together a fuller description of the CPU scheduler side of the API in general,
> which certainly seems to be something that would be useful.

That'd be great, thanks.

regards
john

Re: Comments on Xen API document
On Mon, Aug 21, 2006 at 05:27:04PM +0100, John Levon wrote:

> > > enum vm_power_state - "power" is a very odd notion for a VM. Why not just
> > > "vm_state" ?
> >
> > I'm not wedded to the name -- I just wanted to avoid having a completely
> > generic name like "state". Perhaps another term would be better?
>
> I understand the concern, but in this case I think 'state' is pretty
> descriptive when attached to a domU. Or, perhaps, "runstate" ?

Can I ask again for a more meaningful name than "power_state"?

regards
john
