Mailing List Archive

Lose 30+ seconds of packets to instance during Live-Migration
Version: Pike
OVS version: 2.9

VM-A (On Compute A) ----- (On Compute B) VM-B

What is it in Neutron that might delay vxlan tunnel construction on the
destination compute node during live-migration? As the VM is live-migrated,
I'm watching the flows and the vxlan tunnel interfaces on br-tun on the
compute node the VM is moving to, and they don't appear until 30+
seconds into the migration. I'm wondering if this is the cause of the
roughly 35 seconds of packet loss during the migration.
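
For anyone wanting to reproduce the timing, here is a rough sketch of how
the appearance of tunnel ports on br-tun could be timestamped on the
destination compute node. It only shells out to "ovs-vsctl list-ports",
so it needs root and Open vSwitch installed. The "vxlan-" port-name prefix
and the helper names (list_ports, new_ports, watch) are assumptions made
for illustration, not anything taken from Neutron itself.

```python
import subprocess
import time


def list_ports(bridge="br-tun"):
    """Return the set of port names currently attached to the bridge."""
    out = subprocess.run(
        ["ovs-vsctl", "list-ports", bridge],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


def new_ports(before, after, prefix="vxlan-"):
    """Ports in `after` but not in `before` whose name starts with `prefix`.

    The "vxlan-" prefix is an assumption about how tunnel ports are named.
    """
    return sorted(p for p in after - before if p.startswith(prefix))


def watch(bridge="br-tun", interval=0.5, duration=60.0):
    """Poll the bridge and print how long after start each new port appears."""
    start = time.monotonic()
    seen = list_ports(bridge)
    while time.monotonic() - start < duration:
        time.sleep(interval)
        current = list_ports(bridge)
        for port in new_ports(seen, current):
            print(f"{time.monotonic() - start:6.1f}s  {port} appeared")
        seen = current
```

Calling watch() just before starting the live-migration prints one line per
new tunnel port, with the delay measured from the moment the script started.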

The strange thing is, if I start a continuous ping from VM B on compute B
to VM A on compute A and then initiate a live-migration of VM A to move to
Compute B, I only lose ~1 second of traffic, which leads me to suspect this
issue is related to said tunnels or flows on br-tun...
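
To put a number on the loss in each case, the outage can be estimated from
the gaps in the ping sequence numbers. The sketch below assumes Linux
iputils-style output ("... bytes from ...: icmp_seq=N ...") and a fixed
send interval. The function name longest_outage is made up for this example.

```python
import re

# Only count real echo replies, not "Destination Host Unreachable" lines.
REPLY_RE = re.compile(r"bytes from .*icmp_seq=(\d+)")


def longest_outage(ping_output, interval=1.0):
    """Return the longest run of missing replies, converted to seconds.

    `interval` is the time between echo requests (ping's default is 1 s).
    """
    seqs = sorted({int(m.group(1)) for m in REPLY_RE.finditer(ping_output)})
    longest = 0
    for prev, cur in zip(seqs, seqs[1:]):
        # cur - prev - 1 sequence numbers went unanswered between two replies.
        longest = max(longest, cur - prev - 1)
    return longest * interval
```

Feeding it the saved output of a ping that ran across the migration gives
the longest continuous gap, which is easier to compare between runs than
the overall loss percentage.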

Any help would be greatly appreciated!

Thanks!

Steve
Re: Lose 30+ seconds of packets to instance during Live-Migration
After turning off L2 population on the compute and network nodes, the
packet loss during live migration diminished from 30+ to about 3 seconds...

Does anyone have an explanation for this? I'd really like to be able to use
L2 pop and ARP responder if I can, but not at the cost of such a large
hit when I live migrate.

Thanks in advance!

Steve

On Wed, Aug 22, 2018 at 11:56 AM Sterdnot Shaken <sterdnotshaken@gmail.com>
wrote:

> Version: Pike
> OVS version: 2.9
>
> VM-A (On Compute A) ----- (On Compute B) VM-B
>
> What is it in Neutron that might delay vxlan tunnel construction on the
> destination compute node during live-migration? As the VM is live-migrated,
> I'm watching the flows and the vxlan tunnel interfaces on br-tun on the
> compute node the VM is moving to, and they don't appear until 30+
> seconds into the migration. I'm wondering if this is the cause of the
> roughly 35 seconds of packet loss during the migration.
>
> The strange thing is, if I start a continuous ping from VM B on compute B
> to VM A on compute A and then initiate a live-migration of VM A to move to
> Compute B, I only lose ~1 second of traffic, which leads me to suspect this
> issue is related to said tunnels or flows on br-tun...
>
> Any help would be greatly appreciated!
>
> Thanks!
>
> Steve
>