Mailing List Archive

[nova][cinder] Disabling nova volume-update (aka swap volume; aka cinder live migration)
For those who aren't familiar with it, nova's volume-update (also
called swap volume by nova devs) is the nova part of the
implementation of cinder's live migration (also called retype).
Volume-update is essentially an internal cinder<->nova api, but as
there's no such thing as an internal-only api it's unfortunately also
exposed to users. Some users have found it and are using it, but
because it's essentially an internal cinder<->nova api it breaks
pretty easily if you don't treat it like a special snowflake. It looks
like we've finally found a way it's broken for non-cinder callers that
we can't fix, even with a dirty hack.

volume-update <server> <old> <new> essentially does a live copy of the
data on <old> volume to <new> volume, then seamlessly swaps the
attachment to <server> from <old> to <new>. The guest OS on <server>
will not notice anything at all as the hypervisor swaps the storage
backing an attached volume underneath it.

When called by cinder, as intended, cinder does some post-operation
cleanup such that <old> is deleted and <new> inherits the same
volume_id; that is <old> effectively becomes <new>. When called any
other way, however, this cleanup doesn't happen, which breaks a bunch
of assumptions. One of these is that a disk's serial number is the
same as the attached volume_id. Disk serial number, in KVM at least,
is immutable, so can't be updated during volume-update. This is fine
if we were called via cinder, because the cinder cleanup means the
volume_id stays the same. If called any other way, however, the serial
and the volume_id no longer match, at least until a hard reboot, when
the serial is reset to the new volume_id. It turns out this breaks
live migration, and probably other things too. We can't think of a
workaround.
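
To make the mismatch concrete, here is a toy Python model of the
bookkeeping described above. It is only a sketch: AttachedDisk,
swap_volume, hard_reboot and consistent are hypothetical names invented
for illustration, not nova, cinder or libvirt code, and the volume ids
are made up.

```python
from dataclasses import dataclass


@dataclass
class AttachedDisk:
    """Toy stand-in for a guest disk; not a nova or libvirt object."""
    serial: str     # fixed while the guest runs, as described above
    volume_id: str  # what the attachment record currently points at


def swap_volume(disk: AttachedDisk, new_id: str, via_cinder: bool) -> None:
    """Model the bookkeeping after data is copied from <old> to <new>."""
    if via_cinder:
        # Cinder's cleanup deletes <old> and lets <new> take over its
        # volume_id, so the attachment record does not change.
        return
    # Direct call: no cleanup, so the attachment now names <new>,
    # while the running guest still sees the old serial.
    disk.volume_id = new_id


def hard_reboot(disk: AttachedDisk) -> None:
    """A hard reboot resets the serial to the current volume_id."""
    disk.serial = disk.volume_id


def consistent(disk: AttachedDisk) -> bool:
    """The assumption at stake: serial equals the attached volume_id."""
    return disk.serial == disk.volume_id


by_cinder = AttachedDisk(serial="vol-old", volume_id="vol-old")
swap_volume(by_cinder, "vol-new", via_cinder=True)

direct = AttachedDisk(serial="vol-old", volume_id="vol-old")
swap_volume(direct, "vol-new", via_cinder=False)

print(consistent(by_cinder))  # True
print(consistent(direct))     # False
hard_reboot(direct)
print(consistent(direct))     # True
```

The direct call is the only path where the two values diverge, and in
this model nothing short of the hard reboot brings them back together.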

I wondered why users would want to do this anyway. It turns out that
sometimes cinder won't let you migrate a volume, but nova
volume-update doesn't do those checks (as they're specific to cinder
internals, none of nova's business, and duplicating them would be
fragile, so we're not adding them!). Specifically, we know that cinder
won't let you migrate a volume with snapshots. There may be other
reasons. If cinder won't let you migrate your volume, you can still
move your data by using nova's volume-update, even though you'll end
up with a new volume on the destination, and a slightly broken
instance. Apparently the former is a trade-off worth making, but the
latter has been reported as a bug.

I'd like to make it very clear that nova's volume-update isn't
expected to work correctly except when called by cinder. Specifically
there was a proposal that we disable volume-update from non-cinder
callers in some way, possibly by asserting volume state that can only
be set by cinder. However, I'm also very aware that users are calling
volume-update because it fills a need, and we don't want to trap data
that wasn't previously trapped.
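
As a rough sketch of what "asserting volume state that can only be set
by cinder" might look like, the check below rejects a swap unless the
volume carries a marker of an in-progress cinder operation. The field
names and values (migration_status "migrating", status "retyping") are
my assumption about what such state could be, and SwapNotAllowed and
assert_cinder_initiated are hypothetical names; none of this is
existing nova code.

```python
# Assumed markers of a cinder-driven operation. Both values are
# assumptions that would need checking against cinder itself.
ASSUMED_MIGRATION_STATUSES = {"migrating"}
ASSUMED_VOLUME_STATUSES = {"retyping"}


class SwapNotAllowed(Exception):
    """Raised when a swap does not look like it was started by cinder."""


def assert_cinder_initiated(volume: dict) -> None:
    """Refuse the swap unless the volume shows cinder-only state.

    `volume` is a plain dict standing in for the volume record.
    """
    if volume.get("migration_status") in ASSUMED_MIGRATION_STATUSES:
        return
    if volume.get("status") in ASSUMED_VOLUME_STATUSES:
        return
    raise SwapNotAllowed(
        "volume %s is not being migrated or retyped by cinder"
        % volume.get("id")
    )


# A volume cinder is migrating passes the check.
assert_cinder_initiated({"id": "vol-1", "migration_status": "migrating"})

# A plain in-use volume, as a user would pass it, is rejected.
try:
    assert_cinder_initiated({"id": "vol-2", "status": "in-use"})
except SwapNotAllowed as exc:
    print("rejected:", exc)
```

A check like this only helps if users really can't set that state
themselves, which is exactly the part that would need confirming.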

Firstly, is anybody aware of any other reasons to use nova's
volume-update directly?

Secondly, is there any reason why we shouldn't just document that you
have to delete snapshots before doing a volume migration? Hopefully
some cinder folks or operators can chime in to let me know how to back
them up or somehow make them independent before doing this, at which
point the volume itself should be migratable.

If we can establish that there's an acceptable alternative to calling
volume-update directly for all use-cases we're aware of, I'm going to
propose heading off this class of bug by disabling it for non-cinder
callers.

Matt
--
Matthew Booth
Red Hat OpenStack Engineer, Compute DFG

Phone: +442070094448 (UK)

Re: [nova][cinder] Disabling nova volume-update (aka swap volume; aka cinder live migration)
On 20-08-18 16:29:52, Matthew Booth wrote:
> [...]
>
> If we can establish that there's an acceptable alternative to calling
> volume-update directly for all use-cases we're aware of, I'm going to
> propose heading off this class of bug by disabling it for non-cinder
> callers.

I'm definitely in favor of hiding this from users eventually but
wouldn't this require some form of deprecation cycle?

Warnings within the API documentation would also be useful and even
something we could backport to stable to highlight just how fragile this
API is ahead of any policy change.

Cheers,

--
Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76