Mailing List Archive

[nova][cinder][neutron] Cross-cell cold migration
Hi everyone,

I have started an etherpad for cells topics at the Stein PTG [1]. The
main issue in there right now is dealing with cross-cell cold migration
in nova.

At a high level, I am going off these requirements:

* Cells can shard across flavors (and hardware type) so operators would
like to move users off the old flavors/hardware (old cell) to new
flavors in a new cell.

* There is network isolation between compute hosts in different cells,
so no ssh'ing the disk around like we do today. But the image service is
global to all cells.

Based on this, for the initial support for cross-cell cold migration, I
am proposing that we leverage something like shelve offload/unshelve
masquerading as resize. We shelve offload from the source cell and
unshelve in the target cell. This should work for both volume-backed and
non-volume-backed servers (we use snapshots for shelved offloaded
non-volume-backed servers).
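
To make that a bit more concrete, here is a rough external-view sketch of
the sequence using python-novaclient. This is purely illustrative (the
endpoint, credentials and instance UUID are placeholders), and it ignores
the "masquerading as resize" plumbing that would actually retarget the
scheduler at the new cell:

  # Illustrative sketch only -- not the in-tree implementation.
  from keystoneauth1 import session
  from keystoneauth1.identity import v3
  from novaclient import client as nova_client

  sess = session.Session(auth=v3.Password(
      auth_url='https://keystone.example.org/v3', username='admin',
      password='secret', project_name='admin',
      user_domain_name='Default', project_domain_name='Default'))
  nova = nova_client.Client('2.1', session=sess)

  server = nova.servers.get('INSTANCE_UUID')
  nova.servers.shelve(server)          # snapshot to glance (if not volume-backed), power off
  nova.servers.shelve_offload(server)  # free the source-cell compute host
  nova.servers.unshelve(server)        # rebuild the guest; the resize-like flow
                                       # would point the scheduler at the target cell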

There are, of course, some complications. The main ones that I need help
with right now are what happens with volumes and ports attached to the
server. Today we detach from the source and attach at the target, but
that's assuming the storage backend and network are available to both
hosts involved in the move of the server. Will that be the case across
cells? I am assuming that depends on the network topology (are routed
networks being used?) and storage backend (routed storage?). If the
network and/or storage backend are not available across cells, how do we
migrate volumes and ports? Cinder has a volume migrate API for admins
but I do not know how nova would know the proper affinity per-cell to
migrate the volume to the proper host (cinder does not have a routed
storage concept like routed provider networks in neutron, correct?). And
as far as I know, there is no such thing as port migration in Neutron.
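
For reference, the admin-only Cinder call I mean is roughly the following
with python-cinderclient. The volume UUID and the 'host@backend#pool'
string are made up; picking that host string correctly per cell is exactly
the affinity problem described above, since nova has no way to derive it:

  # Sketch of cinder's admin volume migrate call; all values are placeholders.
  from keystoneauth1 import session
  from keystoneauth1.identity import v3
  from cinderclient import client as cinder_client

  sess = session.Session(auth=v3.Password(
      auth_url='https://keystone.example.org/v3', username='admin',
      password='secret', project_name='admin',
      user_domain_name='Default', project_domain_name='Default'))
  cinder = cinder_client.Client('3', session=sess)

  vol = cinder.volumes.get('VOLUME_UUID')
  cinder.volumes.migrate_volume(vol, 'cell2-cinder@ceph#rbd',
                                force_host_copy=False, lock_volume=True)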

Could Placement help with the volume/port migration stuff? Neutron
routed provider networks rely on placement aggregates to schedule the VM
to a compute host in the same network segment as the port used to create
the VM, however, if that segment does not span cells we are kind of
stuck, correct?
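
For anyone wanting to poke at this, the aggregate relationships are at
least visible in the placement API with plain REST calls, something like
the sketch below (endpoint, credentials and hostname are invented; this is
only for inspection, not something nova would do in this form):

  # List the placement aggregates a target-cell compute node is a member of.
  from keystoneauth1 import session
  from keystoneauth1.identity import v3

  sess = session.Session(auth=v3.Password(
      auth_url='https://keystone.example.org/v3', username='admin',
      password='secret', project_name='admin',
      user_domain_name='Default', project_domain_name='Default'))

  headers = {'OpenStack-API-Version': 'placement 1.19'}
  placement = {'service_type': 'placement', 'interface': 'public'}

  rps = sess.get('/resource_providers?name=target-compute-01',
                 headers=headers,
                 endpoint_filter=placement).json()['resource_providers']
  aggs = sess.get('/resource_providers/%s/aggregates' % rps[0]['uuid'],
                  headers=headers,
                  endpoint_filter=placement).json()['aggregates']
  # For routed networks, the port's segment maps to one of these aggregates;
  # if it is not in the list, that host (and by extension that cell) cannot
  # honor the port.
  print(aggs)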

To summarize the issues as I see them (today):

* How to deal with the targeted cell during scheduling? This is so we
can even get out of the source cell in nova.

* How does the API deal with the same instance being in two DBs at the
same time during the move?

* How to handle revert resize?

* How are volumes and ports handled?

I can get feedback from my company's operators based on what their
deployment will look like for this, but that does not mean it will work
for others, so I need as much feedback from operators, especially those
running with multiple cells today, as possible. Thanks in advance.

[1] https://etherpad.openstack.org/p/nova-ptg-stein-cells

--

Thanks,

Matt

Re: [nova][cinder][neutron] Cross-cell cold migration
I think in our case we'd only migrate between cells if we know the network and storage are accessible, and would never do it if not.
We're thinking of moving from old to new hardware at a cell level.

If storage and network isn’t available ideally it would fail at the api request.

There are also Ceph-backed instances to take into account, which nova would be responsible for.

I’ll be in Denver so we can discuss more there too.

Cheers,
Sam





Re: [openstack-dev] [nova][cinder][neutron] Cross-cell cold migration
On Wed, Aug 22, 2018 at 08:23:41PM -0500, Matt Riedemann wrote:
> Hi everyone,
>
> I have started an etherpad for cells topics at the Stein PTG [1]. The main
> issue in there right now is dealing with cross-cell cold migration in nova.
>
> At a high level, I am going off these requirements:
>
> * Cells can shard across flavors (and hardware type) so operators would like
> to move users off the old flavors/hardware (old cell) to new flavors in a
> new cell.
>
> * There is network isolation between compute hosts in different cells, so no
> ssh'ing the disk around like we do today. But the image service is global to
> all cells.
>
> Based on this, for the initial support for cross-cell cold migration, I am
> proposing that we leverage something like shelve offload/unshelve
> masquerading as resize. We shelve offload from the source cell and unshelve
> in the target cell. This should work for both volume-backed and
> non-volume-backed servers (we use snapshots for shelved offloaded
> non-volume-backed servers).
>
> There are, of course, some complications. The main ones that I need help
> with right now are what happens with volumes and ports attached to the
> server. Today we detach from the source and attach at the target, but that's
> assuming the storage backend and network are available to both hosts
> involved in the move of the server. Will that be the case across cells? I am
> assuming that depends on the network topology (are routed networks being
> used?) and storage backend (routed storage?). If the network and/or storage
> backend are not available across cells, how do we migrate volumes and ports?
> Cinder has a volume migrate API for admins but I do not know how nova would
> know the proper affinity per-cell to migrate the volume to the proper host
> (cinder does not have a routed storage concept like routed provider networks
> in neutron, correct?). And as far as I know, there is no such thing as port
> migration in Neutron.
>

Just speaking to iSCSI storage, I know some deployments do not route their
storage traffic. If this is the case, then both cells would need to have access
to the same subnet to still access the volume.

I'm also referring to the case where the migration is from one compute host to
another compute host, and not from one storage backend to another storage
backend.

I haven't gone through the workflow, but I thought shelve/unshelve could detach
the volume on shelving and reattach it on unshelve. In that workflow, assuming
the networking is in place to provide the connectivity, the nova compute host
would be connecting to the volume just like any other attach and should work
fine. The unknown or tricky part is making sure that there is the network
connectivity or routing in place for the compute host to be able to log in to
the storage target.
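
As a trivial illustration of that last point, the kind of check I mean is
really just "can the target compute host reach the storage portal at all",
e.g. (not an OpenStack API, just a plain socket test; 3260 is the default
iSCSI port and the portal address is a placeholder):

  import socket

  def portal_reachable(portal_ip, portal_port=3260, timeout=5):
      # True if the host can open a TCP connection to the iSCSI portal;
      # the actual target login is handled later by os-brick.
      try:
          with socket.create_connection((portal_ip, portal_port),
                                        timeout=timeout):
              return True
      except OSError:
          return False

  print(portal_reachable('192.0.2.10'))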

If it's the other scenario mentioned where the volume needs to be migrated from
one storage backend to another storage backend, then that may require a little
more work. The volume would need to be retyped or migrated (storage migration)
from the original backend to the new backend.

Again, in this scenario at some point there needs to be network connectivity
between cells to copy over that data.

There is no storage-offloaded migration in this situation, so Cinder can't
currently optimize how that data gets from the original volume backend to the
new one. It would require a host copy of all the data on the volume (an often
slow and expensive operation) and it would require that the host doing the data
copy has access to both the original backend and the new backend.
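
For completeness, the retype path looks something like this with
python-cinderclient (UUID, type name and credentials are placeholders).
With a migration policy of 'on-demand', Cinder falls back to exactly that
host-copy data migration if the new type lives on a different backend:

  from keystoneauth1 import session
  from keystoneauth1.identity import v3
  from cinderclient import client as cinder_client

  sess = session.Session(auth=v3.Password(
      auth_url='https://keystone.example.org/v3', username='admin',
      password='secret', project_name='admin',
      user_domain_name='Default', project_domain_name='Default'))
  cinder = cinder_client.Client('3', session=sess)

  vol = cinder.volumes.get('VOLUME_UUID')
  # 'on-demand' allows the (slow, host-copy) migration between backends;
  # 'never' would fail the retype instead.
  cinder.volumes.retype(vol, 'cell2-backend-type', 'on-demand')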

Re: [nova][cinder][neutron] Cross-cell cold migration
Sorry for delayed response. Was on PTO when this came out. Comments
inline...

On 08/22/2018 09:23 PM, Matt Riedemann wrote:
> Hi everyone,
>
> I have started an etherpad for cells topics at the Stein PTG [1]. The
> main issue in there right now is dealing with cross-cell cold migration
> in nova.
>
> At a high level, I am going off these requirements:
>
> * Cells can shard across flavors (and hardware type) so operators would
> like to move users off the old flavors/hardware (old cell) to new
> flavors in a new cell.

So cell migrations are kind of the new release upgrade dance. Got it.

> * There is network isolation between compute hosts in different cells,
> so no ssh'ing the disk around like we do today. But the image service is
> global to all cells.
>
> Based on this, for the initial support for cross-cell cold migration, I
> am proposing that we leverage something like shelve offload/unshelve
> masquerading as resize. We shelve offload from the source cell and
> unshelve in the target cell. This should work for both volume-backed and
> non-volume-backed servers (we use snapshots for shelved offloaded
> non-volume-backed servers).

shelve was and continues to be a hack in order for users to keep an IPv4
address while not consuming compute resources for some amount of time. [1]

If cross-cell cold migration is similarly just about the user being able
to keep their instance's IPv4 address while allowing an admin to move an
instance's storage to another physical location, then my firm belief is
that this kind of activity needs to be coordinated *externally to Nova*.

Each deployment is going to be different, and in all cases of cross-cell
migration, the admins doing these move operations are going to need to
understand various network, storage and failure domains that are
particular to that deployment (and not something we have the ability to
discover in any automated fashion).

Since we're not talking about live migration (thank all that is holy), I
believe the safest and most effective way to perform such a cross-cell
"migration" would be the following basic steps:

0. ensure that each compute node is associated with at least one nova
host aggregate that is *only* in a single cell
1. shut down the instance (optionally snapshotting required local disk
changes if the user is unfortunately using their root disk for
application data)
2. "save" the instance's IP address by manually creating a port in
Neutron and assigning the IP address manually to that port. this of
course will be deployment-dependent since you will need to hope the
saved IP address for the migrating instance is in a subnet range that is
available in the target cell
3. migrate the volume manually. this will be entirely deployment and
backend-dependent as smcginnis alluded to in a response to this thread
4. have the admin boot the instance in a host aggregate that is known to
be in the target cell, passing --network port_id=$SAVED_PORT_WITH_IP and
--volume $MIGRATED_VOLUME_UUID arguments as needed. the admin would need
to do this because users don't know about host aggregates and, frankly,
the user shouldn't know about host aggregates, cells, or any of this.
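
A rough sketch of steps 2 and 4 with openstacksdk, just to show what the
manual flow looks like (all UUIDs, names and the clouds.yaml entry are
invented, and the exact kwargs are worth double-checking against the SDK
version in use):

  import openstack

  conn = openstack.connect(cloud='mycloud')

  old_port = conn.network.find_port('ORIGINAL_PORT_UUID')
  ip = old_port.fixed_ips[0]['ip_address']

  # step 2: re-create a port holding the same IP address (only possible if
  # a subnet reachable from the target cell actually contains that address)
  saved_port = conn.network.create_port(
      network_id=old_port.network_id,
      fixed_ips=[{'ip_address': ip}],
      name='saved-ip-for-migration')

  # step 4: the admin boots the replacement instance from the migrated
  # volume using the saved port; targeting the right aggregate/cell is left
  # to whatever mechanism the deployment uses (AZs, flavors, etc.)
  flavor = conn.compute.find_flavor('m1.large')
  server = conn.compute.create_server(
      name='migrated-instance',
      flavor_id=flavor.id,
      networks=[{'port': saved_port.id}],
      block_device_mapping=[{'uuid': 'MIGRATED_VOLUME_UUID',
                             'source_type': 'volume',
                             'destination_type': 'volume',
                             'boot_index': 0,
                             'delete_on_termination': False}])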

Best,
-jay

[1] ok, shelve also lets a user keep their instance ID. I don't care
much about that.

Re: [nova][cinder][neutron] Cross-cell cold migration
>> * Cells can shard across flavors (and hardware type) so operators
>> would like to move users off the old flavors/hardware (old cell) to
>> new flavors in a new cell.
>
> So cell migrations are kind of the new release upgrade dance. Got it.

No, cell migrations are about moving instances between cells for
whatever reason. If you have small cells for organization, then it's
about not building arbitrary barriers between aisles. If you use it for
hardware refresh, then it might be related to long term lifecycle. I'm
not sure what any of this has to do with release upgrades or dancing.

> shelve was and continues to be a hack in order for users to keep an
> IPv4 address while not consuming compute resources for some amount of
> time. [1]

As we discussed in YVR most recently, it also may become an important
thing for operators and users where expensive accelerators are committed
to instances with part-time usage patterns. It has also come up more
than once in the realm of "but I need to detach my root volume"
scenarios. I love to hate on shelve as well, but recently a few more
legit (than merely keeping an IPv4 address) use-cases have come out for
it, and I don't think Matt is wrong that cross-cell migration *might* be
easier as a shelve operation under the covers.

> If cross-cell cold migration is similarly just about the user being
> able to keep their instance's IPv4 address while allowing an admin to
> move an instance's storage to another physical location, then my firm
> belief is that this kind of activity needs to be coordinated
> *externally to Nova*.

I'm not sure how you could make that jump, but no, I don't think that's
the case. In any sort of large cloud that uses cells to solve problems
of scale, I think it's quite likely to expect that your IPv4 address
physically can't be honored in the target cell, and/or requires some
less-than-ideal temporary tunneling for bridging the gap.

> Since we're not talking about live migration (thank all that is holy),

Oh it's coming. Don't think it's not.

> I believe the safest and most effective way to perform such a
> cross-cell "migration" would be the following basic steps:
>
> 0. ensure that each compute node is associated with at least one nova
> host aggregate that is *only* in a single cell
> 1. shut down the instance (optionally snapshotting required local disk
> changes if the user is unfortunately using their root disk for
> application data)
> 2. "save" the instance's IP address by manually creating a port in
> Neutron and assigning the IP address manually to that port. this of
> course will be deployment-dependent since you will need to hope the
> saved IP address for the migrating instance is in a subnet range that
> is available in the target cell
> 3. migrate the volume manually. this will be entirely deployment and
> backend-dependent as smcginnis alluded to in a response to this thread
> 4. have the admin boot the instance in a host aggregate that is known
> to be in the target cell, passing --network
> port_id=$SAVED_PORT_WITH_IP and --volume $MIGRATED_VOLUME_UUID
> arguments as needed. the admin would need to do this because users
> don't know about host aggregates and, frankly, the user shouldn't know
> about host aggregates, cells, or any of this.

What you just described here is largely shelve, ignoring the volume
migration part and the fact that such a manual process means the user
loses the instance's uuid and various other elements about it (such as
create time, action/event history, etc). Oh, and ignoring the fact that
the user no longer owns their instance (the admin does) :)

Especially given that migrating across a cell may mean "one aisle over,
same storage provider and network" to a lot of people, the above being a
completely manual process seems a little crazy to me.

--Dan

Re: [nova][cinder][neutron] Cross-cell cold migration
I respect your opinion but respectfully disagree that this is something
we need to spend our time on. Comments inline.

On 08/29/2018 10:47 AM, Dan Smith wrote:
>>> * Cells can shard across flavors (and hardware type) so operators
>>> would like to move users off the old flavors/hardware (old cell) to
>>> new flavors in a new cell.
>>
>> So cell migrations are kind of the new release upgrade dance. Got it.
>
> No, cell migrations are about moving instances between cells for
> whatever reason. If you have small cells for organization, then it's
> about not building arbitrary barriers between aisles. If you use it for
> hardware refresh, then it might be related to long term lifecycle. I'm
> not sure what any of this has to do with release upgrades or dancing.

A release upgrade dance involves coordination of multiple moving parts.
It's about as similar to this scenario as I can imagine. And there's a
reason release upgrades are not done entirely within Nova; clearly an
external upgrade tool or script is needed to orchestrate the many steps
and components involved in the upgrade process.

The similar dance for cross-cell migration is the coordination that
needs to happen between Nova, Neutron and Cinder. It's called
orchestration for a reason and is not what Nova is good at (as we've
repeatedly seen).

The thing that makes *this* particular scenario problematic is that
cells aren't user-visible things. User-visible things could much more
easily be orchestrated via external actors, as I still firmly believe
this kind of thing should be done.

>> shelve was and continues to be a hack in order for users to keep an
>> IPv4 address while not consuming compute resources for some amount of
>> time. [1]
>
> As we discussed in YVR most recently, it also may become an important
> thing for operators and users where expensive accelerators are committed
> to instances with part-time usage patterns.

I don't think that's a valid use case in respect to this scenario of
cross-cell migration. If the target cell compute doesn't have the same
expensive accelerators on it, nobody would want or permit a move to
that target cell anyway.

Also, I'd love to hear from anyone in the real world who has
successfully migrated (live or otherwise) an instance that "owns"
expensive hardware (accelerators, SR-IOV PFs, GPUs or otherwise).

The patterns that I have seen are one of the following:

* Applications don't move. They are pets that stay on one or more VMs or
baremetal nodes and they grow roots.

* Applications are designed to *utilize* the expensive hardware. They
don't "own" the hardware itself.

In this latter case, the application is properly designed and stores its
persistent data in a volume and doesn't keep state outside of the
application volume. In these cases, the process of "migrating" an
instance simply goes away. You just detach the application persistent
volume, shut down the instance, start up a new one elsewhere (allowing
the scheduler to select one that meets the resource constraints in the
flavor/image), attach the volume again and off you go. No messing around
with shelving, offloading, migrating, or any of that nonsense in Nova.

We should not pretend that what we're discussing here is anything other
than hacking orchestration workarounds into Nova to handle
poorly-designed applications that have grown roots on some hardware and
think they "own" hardware resources in a Nova deployment.

> It has also come up more than once in the realm of "but I need to
> detach my root volume" scenarios. I love to hate on shelve as well,
> but recently a few more legit (than merely keeping an IPv4 address)
> use-cases have come out for it, and I don't think Matt is wrong that
> cross-cell migration *might* be easier as a shelve operation under
> the covers.

Matt may indeed be right, but I'm certainly allowed to express my
opinion that I think shelve is a monstrosity that should be avoided at
all costs and building additional orchestration functionality into Nova
on top of an already-shaky foundation (shelve) isn't something I think
is a long-term maintainable solution.

>> If cross-cell cold migration is similarly just about the user being
>> able to keep their instance's IPv4 address while allowing an admin to
>> move an instance's storage to another physical location, then my firm
>> belief is that this kind of activity needs to be coordinated
>> *externally to Nova*.
>
> I'm not sure how you could make that jump, but no, I don't think that's
> the case. In any sort of large cloud that uses cells to solve problems
> of scale, I think it's quite likely to expect that your IPv4 address
> physically can't be honored in the target cell, and/or requires some
> less-than-ideal temporary tunneling for bridging the gap.

If that's the case, why are we discussing shelve at all? Just stop the
instance, copy/migrate the volume data (if needed, again it completely
depends on the deployment, network topology and block storage backend),
to a new location (new cell, new AZ, new host agg, does it really
matter?) and start a new instance, attaching the volume after the
instance starts or supplying the volume in the boot/create command.

>> Since we're not talking about live migration (thank all that is holy),
>
> Oh it's coming. Don't think it's not.
>
>> I believe the safest and most effective way to perform such a
>> cross-cell "migration" would be the following basic steps:
>>
>> 0. ensure that each compute node is associated with at least one nova
>> host aggregate that is *only* in a single cell
>> 1. shut down the instance (optionally snapshotting required local disk
>> changes if the user is unfortunately using their root disk for
>> application data)
>> 2. "save" the instance's IP address by manually creating a port in
>> Neutron and assigning the IP address manually to that port. this of
>> course will be deployment-dependent since you will need to hope the
>> saved IP address for the migrating instance is in a subnet range that
>> is available in the target cell
>> 3. migrate the volume manually. this will be entirely deployment and
>> backend-dependent as smcginnis alluded to in a response to this thread
>> 4. have the admin boot the instance in a host aggregate that is known
>> to be in the target cell, passing --network
>> port_id=$SAVED_PORT_WITH_IP and --volume $MIGRATED_VOLUME_UUID
>> arguments as needed. the admin would need to do this because users
>> don't know about host aggregates and, frankly, the user shouldn't know
>> about host aggregates, cells, or any of this.
>
> What you just described here is largely shelve, ignoring the volume
> migration part and the fact that such a manual process means the user
> loses the instance's uuid and various other elements about it (such as
> create time, action/event history, etc). Oh, and ignoring the fact that
> the user no longer owns their instance (the admin does) :)

The admin only "owns" the instance because we have no ability to
transfer ownership of the instance and a cell isn't a user-visible
thing. An external script that accomplishes this kind of orchestrated
move from one cell to another could easily update the ownership of said
instance in the DB.

My point is that Nova isn't an orchestrator, and building functionality
into Nova to do this type of cross-cell migration IMHO just will lead to
even more unmaintainable code paths that few, if any, deployers will
ever end up using because they will end up doing it externally anyway
due to the need to integrate with backend inventory management systems
and other things.

Best,
-jay

> Especially given that migrating across a cell may mean "one aisle over,
> same storage provider and network" to a lot of people, the above being a
> completely manual process seems a little crazy to me.
>
> --Dan
>

Re: [nova][cinder][neutron] Cross-cell cold migration
> A release upgrade dance involves coordination of multiple moving
> parts. It's about as similar to this scenario as I can imagine. And
> there's a reason release upgrades are not done entirely within Nova;
> clearly an external upgrade tool or script is needed to orchestrate
> the many steps and components involved in the upgrade process.

I'm lost here, and assume we must be confusing terminology or something.

> The similar dance for cross-cell migration is the coordination that
> needs to happen between Nova, Neutron and Cinder. It's called
> orchestration for a reason and is not what Nova is good at (as we've
> repeatedly seen)

Most other operations in Nova meet this criteria. Boot requires
coordination between Nova, Cinder, and Neutron. As do migrate, start,
stop, evacuate. We might decide that (for now) the volume migration
thing is beyond the line we're willing to cross, and that's cool, but I
think it's an arbitrary limitation we shouldn't assume is
impossible. Moving instances around *is* what nova is (supposed to be)
good at.

> The thing that makes *this* particular scenario problematic is that
> cells aren't user-visible things. User-visible things could much more
> easily be orchestrated via external actors, as I still firmly believe
> this kind of thing should be done.

I'm having a hard time reconciling these:

1. Cells aren't user-visible, and shouldn't be (your words and mine).
2. Cross-cell migration should be done by an external service (your
words).
3. External services work best when things are user-visible (your words).

You say the user-invisible-ness makes orchestrating this externally
difficult and I agree, but...is your argument here just that it
shouldn't be done at all?

>> As we discussed in YVR most recently, it also may become an important
>> thing for operators and users where expensive accelerators are committed
>> to instances with part-time usage patterns.
>
> I don't think that's a valid use case in respect to this scenario of
> cross-cell migration.

You're right, it has nothing to do with cross-cell migration at all. I
was pointing to *other* legitimate use cases for shelve.

> Also, I'd love to hear from anyone in the real world who has
> successfully migrated (live or otherwise) an instance that "owns"
> expensive hardware (accelerators, SR-IOV PFs, GPUs or otherwise).

Again, the accelerator case has nothing to do with migrating across
cells, but merely demonstrates another example of where shelve may be
the thing operators actually desire. Maybe I shouldn't have confused the
discussion by bringing it up.

> The patterns that I have seen are one of the following:
>
> * Applications don't move. They are pets that stay on one or more VMs
> or baremetal nodes and they grow roots.
>
> * Applications are designed to *utilize* the expensive hardware. They
> don't "own" the hardware itself.
>
> In this latter case, the application is properly designed and stores
> its persistent data in a volume and doesn't keep state outside of the
> application volume. In these cases, the process of "migrating" an
> instance simply goes away. You just detach the application persistent
> volume, shut down the instance, start up a new one elsewhere (allowing
> the scheduler to select one that meets the resource constraints in the
> flavor/image), attach the volume again and off you go. No messing
> around with shelving, offloading, migrating, or any of that nonsense
> in Nova.

Jay, you know I sympathize with the fully-ephemeral application case,
right? Can we agree that pets are a thing and that migrations are not
going to be leaving Nova's scope any time soon? If so, I think we can
get back to the real discussion, and if not, I think we probably, er,
can't :)

> We should not pretend that what we're discussing here is anything
> other than hacking orchestration workarounds into Nova to handle
> poorly-designed applications that have grown roots on some hardware
> and think they "own" hardware resources in a Nova deployment.

I have no idea how we got to "own hardware resources" here. The point of
this discussion is to make our instance-moving operations work across
cells. We designed cellsv2 to be invisible and baked into the core of
Nova. We intended for it to not fall into the trap laid by cellsv1,
where the presence of multiple cells meant that a bunch of regular
operations don't work like they would otherwise.

If we're going to discuss removing move operations from Nova, we should
do that in another thread. This one is about making existing operations
work :)

> If that's the case, why are we discussing shelve at all? Just stop the
> instance, copy/migrate the volume data (if needed, again it completely
> depends on the deployment, network topology and block storage
> backend), to a new location (new cell, new AZ, new host agg, does it
> really matter?) and start a new instance, attaching the volume after
> the instance starts or supplying the volume in the boot/create
> command.

Because shelve potentially makes it less dependent on the answers to
those questions and Matt suggested it as a first step to being able to
move things around at all. It means that "copy the data" becomes "talk
to glance" which compute nodes can already do. Requiring compute nodes
across cells to talk to each other (which could be in different
buildings, sites, or security domains) is a whole extra layer of
complexity. I do think we'll go there (via resize/migrate) at some point,
but shelve going through glance for data and through a homeless phase in
Nova does simplify a whole set of things.
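
To illustrate what "talk to glance" buys us: the shelved snapshot is just a
regular image, so anything with a glance endpoint can pull it -- no
compute-to-compute path required. Roughly (image UUID and credentials are
placeholders; this is not how nova's unshelve fetches the image
internally):

  from keystoneauth1 import session
  from keystoneauth1.identity import v3
  from glanceclient import Client

  sess = session.Session(auth=v3.Password(
      auth_url='https://keystone.example.org/v3', username='admin',
      password='secret', project_name='admin',
      user_domain_name='Default', project_domain_name='Default'))
  glance = Client('2', session=sess)

  # Stream the shelved snapshot's bytes via the image API -- the only data
  # path the target cell needs.
  with open('/tmp/shelved-snapshot', 'wb') as f:
      for chunk in glance.images.data('SHELVED_SNAPSHOT_IMAGE_UUID'):
          f.write(chunk)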

> The admin only "owns" the instance because we have no ability to
> transfer ownership of the instance and a cell isn't a user-visible
> thing. An external script that accomplishes this kind of orchestrated
> move from one cell to another could easily update the ownership of
> said instance in the DB.

So step 5 was "do surgery on the database"? :)

> My point is that Nova isn't an orchestrator, and building
> functionality into Nova to do this type of cross-cell migration IMHO
> just will lead to even more unmaintainable code paths that few, if
> any, deployers will ever end up using because they will end up doing
> it externally anyway due to the need to integrate with backend
> inventory management systems and other things.

On the contrary, per the original goal of cellsv2, I want to make the
*existing* code paths in Nova work properly when multiple cells are
present. Just like we had to make boot and list work properly with
multiple cells, I think we need to do the same with migrate, shelve,
etc.

--Dan

Re: [nova][cinder][neutron] Cross-cell cold migration
On 08/29/2018 12:39 PM, Dan Smith wrote:
> If we're going to discuss removing move operations from Nova, we should
> do that in another thread. This one is about making existing operations
> work :)

OK, understood. :)

>> The admin only "owns" the instance because we have no ability to
>> transfer ownership of the instance and a cell isn't a user-visible
>> thing. An external script that accomplishes this kind of orchestrated
>> move from one cell to another could easily update the ownership of
>> said instance in the DB.
>
> So step 5 was "do surgery on the database"? :)

Yep. You'd be surprised how often that ends up being the case.

I'm currently sitting here looking at various integration tooling for
doing just this kind of thing for our deployments of >150K baremetal
compute nodes. The number of specific-to-an-environment variables that
need to be considered and worked into the overall migration plan is
breathtaking. And trying to do all of that inside of Nova just isn't
feasible for the scale at which we run.

At least, that's my point of view. I won't drag this conversation out
any further on tangents.

Best,
-jay

Re: [nova][cinder][neutron] Cross-cell cold migration
On 08/29/2018 10:02 AM, Jay Pipes wrote:

> Also, I'd love to hear from anyone in the real world who has successfully
> migrated (live or otherwise) an instance that "owns" expensive hardware
> (accelerators, SR-IOV PFs, GPUs or otherwise).

I thought cold migration of instances with such devices was supported upstream?

Chris

Re: [nova][cinder][neutron] Cross-cell cold migration
On 08/29/2018 02:26 PM, Chris Friesen wrote:
> On 08/29/2018 10:02 AM, Jay Pipes wrote:
>
>> Also, I'd love to hear from anyone in the real world who has successfully
>> migrated (live or otherwise) an instance that "owns" expensive hardware
>> (accelerators, SR-IOV PFs, GPUs or otherwise).
>
> I thought cold migration of instances with such devices was supported
> upstream?

That's not what I asked. :)

-jay

Re: [nova][cinder][neutron] Cross-cell cold migration
I've not followed all the arguments here regarding internals, but CERN's background usage of Cells v2 (and thoughts on the impact of cross-cell migration) is below. Some background at https://www.openstack.org/videos/vancouver-2018/moving-from-cellsv1-to-cellsv2-at-cern. Some rough parameters follow, with the team providing more concrete numbers if needed.

- The VMs to be migrated are generally not expensive configurations, just hardware lifecycles where boxes go out of warranty or computer centre rack/cooling needs re-organising. For CERN, this is a 6-12 month frequency of ~10,000 VMs per year (with a ~30% pet share)
- We make a cell from identical hardware at a single location, this greatly simplifies working out hardware issues, provisioning and management
- Some cases can be handled with the 'please delete and re-create'. Many other cases need much user support/downtime (and require significant effort or risk delaying retirements to get agreement)
- When a new hardware delivery is made, we would hope to define a new cell (as it is a different configuration)
- Depending on the facilities retirement plans, we would work out what needed to be moved to new resources
- There are many different scenarios for migration (either live or cold)
-- All instances in the old cell would be migrated to the new hardware which would have sufficient capacity
-- All instances in a single cell would be migrated to several different cells, e.g. where the new cells are smaller
-- Some instances would be migrated because those racks need to be retired but other servers in the cell would remain for a further year or two until retirement was mandatory

With many cells and multiple locations, spreading the hypervisors across the cells in anticipation of potential migrations is unattractive.

From my understanding, these models were feasible with Cells V1.

We can discuss further, at the PTG or Summit, on the operational flexibility which we have taken advantage of so far and alternative models.

Tim

Re: [nova][cinder][neutron] Cross-cell cold migration
> - The VMs to be migrated are generally not expensive
> configurations, just hardware lifecycles where boxes go out of
> warranty or computer centre rack/cooling needs re-organising. For
> CERN, this is a 6-12 month frequency of ~10,000 VMs per year (with a
> ~30% pet share)
> - We make a cell from identical hardware at a single location, this
> greatly simplifies working out hardware issues, provisioning and
> management
> - Some cases can be handled with the 'please delete and
> re-create'. Many other cases need much user support/downtime (and
> require significant effort or risk delaying retirements to get
> agreement)

Yep, this is the "organizational use case" of cells I refer to. I assume
that if one aisle (cell) is being replaced, it makes sense to stand up
the new one as its own cell, migrate the pets from one to the other and
then decommission the old one. Being only an aisle away, it's reasonable
to think that *this* situation might not suffer from the complexity of
needing to worry about heavyweight network and storage migration.

> From my understanding, these models were feasible with Cells V1.

I don't think cellsv1 supported any notion of moving things between
cells at all, unless you had some sort of external hack for doing
it. Being able to migrate between cells at all was always one of the
things we touted as a "future feature" for cellsv2.

Unless of course you mean migration in terms of
snapshot-to-glance-and-redeploy?

--Dan

Re: [nova][cinder][neutron] Cross-cell cold migration
On 08/29/2018 04:04 PM, Dan Smith wrote:
>> - The VMs to be migrated are generally not expensive
>> configurations, just hardware lifecycles where boxes go out of
>> warranty or computer centre rack/cooling needs re-organising. For
>> CERN, this is a 6-12 month frequency of ~10,000 VMs per year (with a
>> ~30% pet share)
>> - We make a cell from identical hardware at a single location, this
>> greatly simplifies working out hardware issues, provisioning and
>> management
>> - Some cases can be handled with the 'please delete and
>> re-create'. Many other cases need much user support/downtime (and
>> require significant effort or risk delaying retirements to get
>> agreement)
>
> Yep, this is the "organizational use case" of cells I refer to. I assume
> that if one aisle (cell) is being replaced, it makes sense to stand up
> the new one as its own cell, migrate the pets from one to the other and
> then decommission the old one. Being only an aisle away, it's reasonable
> to think that *this* situation might not suffer from the complexity of
> needing to worry about heavyweight network and storage migration.

For this use case, why not just add the new hardware directly into the
existing cell and migrate the workloads onto the new hardware, then
disable the old hardware and retire it?

I mean, there might be a short period of time where the cell's DB and MQ
would be congested due to lots of migration operations, but it seems a
lot simpler to me than trying to do cross-cell migrations when cells
have been designed pretty much from the beginning of cellsv2 to not talk
to each other or allow any upcalls.
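
And for what it's worth, that flow is all existing API today, roughly
(admin credentials and the hostname are placeholders; polling and error
handling omitted):

  from keystoneauth1 import session
  from keystoneauth1.identity import v3
  from novaclient import client as nova_client

  sess = session.Session(auth=v3.Password(
      auth_url='https://keystone.example.org/v3', username='admin',
      password='secret', project_name='admin',
      user_domain_name='Default', project_domain_name='Default'))
  nova = nova_client.Client('2.1', session=sess)

  # Drain an old compute host with plain within-cell cold migration.
  for server in nova.servers.list(search_opts={'host': 'old-compute-01',
                                               'all_tenants': 1}):
      nova.servers.migrate(server)        # cold migrate; scheduler picks new hardware
      # ... wait for the server to reach VERIFY_RESIZE ...
      nova.servers.confirm_resize(server)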

Thoughts?
-jay

Re: [nova][cinder][neutron] Cross-cell cold migration
Given the partial retirement scenario (i.e. only racks A-C retired due to cooling constraints, racks D-F still active with the same old hardware but still useful for years), adding new hardware to old cells would not be optimal. I'm ignoring the long list of other things to worry about, such as preserving IP addresses etc.

Sounds like a good topic for PTG/Forum?

Tim

Re: [nova][cinder][neutron] Cross-cell cold migration
On 8/29/2018 3:21 PM, Tim Bell wrote:
> Sounds like a good topic for PTG/Forum?

Yeah, it's already on the PTG agenda [1][2]. I started the thread because
I wanted to get the ball rolling as early as possible, and to let people
that won't attend the PTG and/or the Forum weigh in on not only the known
issues with cross-cell migration but also the things I'm not thinking
about.

[1] https://etherpad.openstack.org/p/nova-ptg-stein
[2] https://etherpad.openstack.org/p/nova-ptg-stein-cells

--

Thanks,

Matt

Re: [nova][cinder][neutron] Cross-cell cold migration
After hacking on the PoC for a while [1] I have finally pushed up a spec
[2]. Behold it in all its dark glory!

[1] https://review.openstack.org/#/c/603930/
[2] https://review.openstack.org/#/c/616037/


--

Thanks,

Matt

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators