Mailing List Archive

Live-migration experiences?
Hello! At GoDaddy, we're about to start experimenting with live
migration. While setting it up, we've found a number of options that
seem attractive/useful, but we're wondering if anyone has data/anecdotes
about specific live-migration configurations. Thanks in advance for
taking the time to read this!

First a few facts about our installation:

* We're using kolla-ansible and basically leaving most nova settings at
the default, meaning libvirt+kvm
* We will be using block migration, as we have no shared storage of any
kind.
* We use routed networks to set up L2 segments per-rack. Each rack is
basically an island unto itself. The VMs on one rack cannot be migrated
to another rack because of this.
* Our main resource limitation is disk, followed closely by RAM. As
such, our main motivation for wanting to do live migration is to be able
to move VMs off of machines where over-subscribed disk users start to
threaten the free space of the others.
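
As a rough illustration of the over-subscription concern above, here is a
minimal Python sketch that estimates how much a set of thin-provisioned
disks could still grow compared with the free space left on a host. It
assumes the JSON printed by `qemu-img info --output=json` (its
"virtual-size" and "actual-size" fields); the function names, the sample
numbers, and the worst-case-growth heuristic are illustrative assumptions,
not anything taken from this deployment.

```python
import json
import subprocess


def image_info(path):
    """Run qemu-img and return its JSON description of one disk image.

    -U (--force-share) lets qemu-img read an image a running VM has open.
    """
    result = subprocess.run(
        ["qemu-img", "info", "-U", "--output=json", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def potential_growth(info):
    """Bytes this disk could still consume: virtual size minus bytes on disk."""
    return max(info["virtual-size"] - info["actual-size"], 0)


def free_space_shortfall(infos, free_bytes):
    """How far worst-case growth of all disks exceeds the host's free space.

    Returns 0 when the host could absorb every disk growing to full size.
    """
    worst_case = sum(potential_growth(info) for info in infos)
    return max(worst_case - free_bytes, 0)


# Hypothetical numbers: two 100 GiB disks currently occupying 10 and 60 GiB.
GIB = 1024 ** 3
sample = [
    {"virtual-size": 100 * GIB, "actual-size": 10 * GIB},
    {"virtual-size": 100 * GIB, "actual-size": 60 * GIB},
]
shortfall_gib = free_space_shortfall(sample, free_bytes=100 * GIB) // GIB
```

A host whose shortfall is above zero is a candidate for having VMs moved
off it. For a qcow2 overlay with a backing file, "actual-size" covers only
the overlay, so the shared base image would need to be counted separately.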

Now, some things we'd love your help with:

* TLS for libvirt - We do not want to transfer the contents of VMs' RAM
over unencrypted sockets. We want to set up TLS with an internal CA and
tls_allowed_dn_list controlling access. Has anyone reading this used
this setup? Do you have suggestions, reservations, or encouragement for
us wanting to do it this way?

* Raw backed qcow2 files - Our instances use qcow2, and our images are
uploaded as a raw-backed qcow2. As a result we get maximum disk savings
with excellent read performance. When live migrating these around, have
you found that they continue to use the same space on the target node as
they did on the source? If not, did you find a workaround?

* Do people have feedback on live_migration_permit_auto_converge? It
seems like a reasonable trade-off, but since it is defaulted to false, I
wonder if there are some hidden gotchas there.

* General pointers to excellent guides, white papers, etc, that might
help us avoid doing all of our learning via trial/error.
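
To illustrate the idea behind tls_allowed_dn_list from the TLS item above,
the following Python sketch checks a client certificate's Distinguished
Name against an allow-list of wildcard patterns. It only approximates the
matching with Python's fnmatch; libvirt's own matcher may differ in detail,
and the DN strings, the per-rack naming scheme, and the function name are
all made up for the example.

```python
from fnmatch import fnmatchcase


def dn_allowed(client_dn, allowed_dn_list):
    """Return True if client_dn matches any wildcard pattern in the list."""
    return any(fnmatchcase(client_dn, pattern) for pattern in allowed_dn_list)


# Hypothetical allow-list: only compute nodes in rack1 may connect.
# Field order matters because the whole DN is compared as one string.
rack1_allowed = [
    "C=US,O=Example Corp,OU=compute,CN=*.rack1.example.internal",
]

same_rack = dn_allowed(
    "C=US,O=Example Corp,OU=compute,CN=node7.rack1.example.internal",
    rack1_allowed,
)
other_rack = dn_allowed(
    "C=US,O=Example Corp,OU=compute,CN=node3.rack2.example.internal",
    rack1_allowed,
)
```

Since migrations here never cross racks, a per-rack pattern like this is
one possible way to keep the allow-list tight; whether that is worth the
certificate management overhead is a separate question.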

Thanks very much for your time!

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: Live-migration experiences?
On 8/6/2018 8:12 AM, Clint Byrum wrote:
> First a few facts about our installation:
>
> * We're using kolla-ansible and basically leaving most nova settings at
> the default, meaning libvirt+kvm
> * We will be using block migration, as we have no shared storage of any
> kind.
> * We use routed networks to set up L2 segments per-rack. Each rack is
> basically an island unto itself. The VMs on one rack cannot be migrated
> to another rack because of this.
> * Our main resource limitation is disk, followed closely by RAM. As
> such, our main motivation for wanting to do live migration is to be able
> to move VMs off of machines where over-subscribed disk users start to
> threaten the free space of the others.

What release are you on?

>
> * Do people have feedback on live_migration_permit_auto_converge? It
> seems like a reasonable trade-off, but since it is defaulted to false, I
> wonder if there are some hidden gotchas there.

You might want to read through [1] and [2]. Those were written by the
OSIC dev team when that still existed. But there are some (somewhat
mysterious) mentions of caveats with post-copy you should be aware of.
At this point, John Garbutt is probably the best person to talk to about
those since all of the other OSIC devs that worked on this spec are long
gone.

>
> * General pointers to excellent guides, white papers, etc, that might
> help us avoid doing all of our learning via trial/error.

Check out [3]. I've specifically been meaning to watch the one from
Boston that John was in.

[1]
https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/live-migration-force-after-timeout.html
[2]
https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/live-migration-per-instance-timeout.html
[3] https://www.openstack.org/videos/search?search=live%20migration

--

Thanks,

Matt

Re: Live-migration experiences?
Hi Clint, Matt.

Note that post-copy and auto-convergence are mutually exclusive.

The drawback we have experienced is that live migration, whether with
post-copy or auto-convergence, will likely fail for applications that
cannot handle throttling. That said, post-copy is guaranteed to work for
most of your migrations.
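
A small Python sketch of what "mutually exclusive" means in practice, as I
understand nova's libvirt driver: when both options are permitted and
post-copy is available on the hosts, post-copy is used and auto-converge
is left off. This is a simplified model, not nova's code; the class, the
field names, and the strategy labels are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class MigrationPolicy:
    permit_post_copy: bool
    permit_auto_converge: bool
    # Whether the hosts actually support post-copy (assumed True here).
    post_copy_available: bool = True


def pick_strategy(policy):
    """Return the single convergence strategy that would be in effect."""
    if policy.permit_post_copy and policy.post_copy_available:
        # Post-copy takes precedence, so auto-converge is not enabled.
        return "post-copy"
    if policy.permit_auto_converge:
        return "auto-converge"
    return "plain pre-copy"


both_on = pick_strategy(MigrationPolicy(True, True))
only_converge = pick_strategy(MigrationPolicy(False, True))
unsupported = pick_strategy(MigrationPolicy(True, True, post_copy_available=False))
```

So enabling both is not an error, but only one of them ends up being used
for a given migration.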

On the design side of your solution: for a while we had the same design,
with each rack being its own segment, but we gave up on that choice as it
was a PITA, especially for live migration.

We’re currently migrating our network design to a much simpler all-L3
layout, with a BGP-driven underlay network and all overlay networks
managed by OpenStack.

Let me know if you need more details or information.

Kind regards,
G.



On Tue, Aug 7, 2018 at 01:44, Matt Riedemann <mriedemos@gmail.com> wrote:

> On 8/6/2018 8:12 AM, Clint Byrum wrote:
> > First a few facts about our installation:
> >
> > * We're using kolla-ansible and basically leaving most nova settings at
> > the default, meaning libvirt+kvm
> > * We will be using block migration, as we have no shared storage of any
> > kind.
> > * We use routed networks to set up L2 segments per-rack. Each rack is
> > basically an island unto itself. The VMs on one rack cannot be migrated
> > to another rack because of this.
> > * Our main resource limitation is disk, followed closely by RAM. As
> > such, our main motivation for wanting to do live migration is to be able
> > to move VMs off of machines where over-subscribed disk users start to
> > threaten the free space of the others.
>
> What release are you on?
>
> >
> > * Do people have feedback on live_migration_permit_auto_converge? It
> > seems like a reasonable trade-off, but since it is defaulted to false, I
> > wonder if there are some hidden gotchas there.
>
> You might want to read through [1] and [2]. Those were written by the
> OSIC dev team when that still existed. But there are some (somewhat
> mysterious) mentions of caveats with post-copy you should be aware of.
> At this point, John Garbutt is probably the best person to talk to about
> those since all of the other OSIC devs that worked on this spec are long
> gone.
>
> >
> > * General pointers to excellent guides, white papers, etc, that might
> > help us avoid doing all of our learning via trial/error.
>
> Check out [3]. I've specifically been meaning to watch the one from
> Boston that John was in.
>
> [1]
>
> https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/live-migration-force-after-timeout.html
> [2]
>
> https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/live-migration-per-instance-timeout.html
> [3] https://www.openstack.org/videos/search?search=live%20migration
>
> --
>
> Thanks,
>
> Matt