Mailing List Archive

Directional network performance issues with Neutron + OpenvSwitch
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Folks

I'm seeing an odd directional performance issue with my Havana test rig
which I'm struggling to debug; details:

Ubuntu 12.04 with Linux 3.8 backports kernel, Havana Cloud Archive
(currently Havana b3, OpenvSwitch 1.10.2), OpenvSwitch plugin with GRE
overlay networks.

I've configured the MTUs on all of the physical host network
interfaces to 1546 to add headroom for the GRE encapsulation headers.
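
For reference, the change was along these lines on each physical host
(eth1 is just a placeholder for the actual interface name; an equivalent
post-up line in /etc/network/interfaces makes it persistent):

sudo ip link set dev eth1 mtu 1546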

Performance between instances within a single tenant network on
different physical hosts is as I would expect (near 1Gbit/s), but I see
issues when data transits the Neutron L3 gateway - in the example
below, churel is a physical host on the same network as the layer 3
gateway:

ubuntu@churel:~$ scp hardware.dump 10.98.191.103:
hardware.dump
100% 67MB 4.8MB/s
00:14

ubuntu@churel:~$ scp 10.98.191.103:hardware.dump .
hardware.dump
100% 67MB
66.8MB/s 00:01

As you can see, pushing data to the instance (via a floating IP,
10.98.191.103) is painfully slow, whereas pulling the same data is
10x+ faster (and closer to what I would expect).

iperf confirms the same:

ubuntu@churel:~$ iperf -c 10.98.191.103 -m
- ------------------------------------------------------------
Client connecting to 10.98.191.103, TCP port 5001
TCP window size: 22.9 KByte (default)
- ------------------------------------------------------------
[ 3] local 10.98.191.11 port 55330 connected with 10.98.191.103 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 60.8 MBytes 50.8 Mbits/sec
[ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

ubuntu@james-page-bastion:~$ iperf -c 10.98.191.11 -m


- ------------------------------------------------------------
Client connecting to 10.98.191.11, TCP port 5001
TCP window size: 23.3 KByte (default)
- ------------------------------------------------------------
[ 3] local 10.5.0.2 port 52190 connected with 10.98.191.11 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.07 GBytes 918 Mbits/sec
[ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)


918 Mbit/s vs 50 Mbit/s.

I tcpdump'ed the traffic and I see a lot of duplicate ACKs, which makes
me suspect some sort of packet fragmentation, but it's got me puzzled.

Anyone have any ideas about how to debug this further? Or has anyone
seen anything like this before?

Cheers

James


- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSS+QSAAoJEL/srsug59jD8ZcQAKbZDVU8KKa7hsic7+ulqWQQ
EFbq8Im5x4mQY7htIvIOM26BR0ktAO5luE7zMBXsA4AwPud1BQSGhw89/NvNhADT
TLcGdQADsomeiBpJebzwUmvL/tYUoMDRA3O96mUn2pi0fySWbEuEgMDjDJ/ow23D
Y7nEv0mItaZ4MBSI9RZcqsDUl7UbbdlGejSWhJcwp/127HMU9nYwWNz5UHJjsGZ1
eITyv1WZH/dYPQ1SES41qD1WvkTBugopGJvptEyrcO62A+akGOvnqpsHgPECbLb+
b/8rk8nB1HB74Wh+tQP4WRQCZYso15nB6ukIyIU24Qti2tXtXDdKwszEoblCwCT3
YZJTERNOENURlUEFwgi6FNL+nZomSG0UJU6qqDGiUJkbSF7SwJm4y8/XRlJM2Ihn
wyxFB0qe3YdMqgDLZn11GwCDqn3g11hYaocHNUyRaj/tgxhGKbOFvix5kz3I4V7T
gd+sqUySMVd9wCRXBzDDhCuG9xf/QY2ZQxXzyfPJWd9svPh/O6osTSQzaI1eZl9/
jVRejMAFr6Rl11GPKd3DYi32GXa896QELjBmJ9Kof0NDlCcDuUKpVeifIhcbQZZV
sWyQmbb6Z/ypFV9xXiLRfH2fW2bAQQHgiQGvy9apoE78BWYdnsD8Q3Ekwag6lFqp
yUwt/RcRXS1PbLG4EGFW
=HTvW
-----END PGP SIGNATURE-----

Re: Directional network performance issues with Neutron + OpenvSwitch
On 10/02/2013 02:14 AM, James Page wrote:

>
> I tcpdump'ed the traffic and I see alot of duplicate acks which makes
> me suspect some sort of packet fragmentation but its got me puzzled.
>
> Anyone have any ideas about how to debug this further? or has anyone
> seen anything like this before?

Duplicate ACKs can be triggered by missing or out-of-order TCP segments.
Presumably that would show up in the tcpdump trace, though it might be
easier to see if you run the .pcap file through tcptrace -G.
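
Something like the following (interface name and filenames are just
placeholders) would give you time-sequence graphs to look at:

sudo tcpdump -i eth1 -s 96 -w slow-direction.pcap host 10.98.191.103 and port 5001
tcptrace -G slow-direction.pcap
xplot *_tsg.xpl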

Iperf may have a similar option, but if there are actual TCP
retransmissions during the run, netperf can be told to tell you about
them (when running under Linux):

netperf -H <remote> -t TCP_STREAM -- -o
throughput,local_transport_retrans,remote_transport_retrans

will measure the direction towards <remote>

and

netperf -H <remote> -t TCP_MAERTS -- -o
throughput,local_transport_retrans,remote_transport_retrans

will measure the direction from <remote>. Or you can take snapshots of
netstat -s output from before and after your iperf run(s) and do the math
by hand.
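
For example, a rough sketch of the netstat approach (the exact counter
names vary a little between kernels):

netstat -s | grep -i retrans > before.txt
iperf -c 10.98.191.103 -t 10
netstat -s | grep -i retrans > after.txt
diff before.txt after.txt   # the increase is the retransmissions during the run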

rick jones
If the netperf in multiverse isn't new enough to grok the -o option, you
can grab the top-of-trunk from http://www.netperf.org/svn/netperf2/trunk
via svn.

Re: Directional network performance issues with Neutron + OpenvSwitch
Hi James, have you tried setting the MTU to a lower number of bytes,
instead of a higher-than-1500 setting? Say... 1454 instead of 1546?

Curious to see if that resolves the issue. If it does, then perhaps
there is a path somewhere that had a <1546 PMTU?
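
(A quick, non-invasive way to test that - sketched here, the file paths and
the 1454 value are only illustrative - is to have the DHCP agent's dnsmasq
hand the smaller MTU to the instances:

# /etc/neutron/dhcp_agent.ini
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf
dhcp-option-force=26,1454

then restart neutron-dhcp-agent and renew the instances' DHCP leases.)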

-jay

On 10/02/2013 05:14 AM, James Page wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Hi Folks
>
> I'm seeing an odd direction performance issue with my Havana test rig
> which I'm struggling to debug; details:
>
> Ubuntu 12.04 with Linux 3.8 backports kernel, Havana Cloud Archive
> (currently Havana b3, OpenvSwitch 1.10.2), OpenvSwitch plugin with GRE
> overlay networks.
>
> I've configured the MTU's on all of the physical host network
> interfaces to 1546 to add capacity for the GRE network headers.
>
> Performance between instances within a single tenant network on
> different physical hosts is as I would expect (near 1GBps), but I see
> issues when data transits the Neutron L3 gateway - in the example
> below churel is a physical host on the same network as the layer 3
> gateway:
>
> ubuntu@churel:~$ scp hardware.dump 10.98.191.103:
> hardware.dump
> 100% 67MB 4.8MB/s
> 00:14
>
> ubuntu@churel:~$ scp 10.98.191.103:hardware.dump .
> hardware.dump
> 100% 67MB
> 66.8MB/s 00:01
>
> As you can see, pushing data to the instance (via a floating ip
> 10.98.191.103) is painfully slow, whereas pulling the same data is
> x10+ faster (and closer to what I would expect).
>
> iperf confirms the same:
>
> ubuntu@churel:~$ iperf -c 10.98.191.103 -m
> - ------------------------------------------------------------
> Client connecting to 10.98.191.103, TCP port 5001
> TCP window size: 22.9 KByte (default)
> - ------------------------------------------------------------
> [ 3] local 10.98.191.11 port 55330 connected with 10.98.191.103 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 60.8 MBytes 50.8 Mbits/sec
> [ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>
> ubuntu@james-page-bastion:~$ iperf -c 10.98.191.11 -m
>
>
> - ------------------------------------------------------------
> Client connecting to 10.98.191.11, TCP port 5001
> TCP window size: 23.3 KByte (default)
> - ------------------------------------------------------------
> [ 3] local 10.5.0.2 port 52190 connected with 10.98.191.11 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 1.07 GBytes 918 Mbits/sec
> [ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>
>
> 918Mbit vs 50Mbits.
>
> I tcpdump'ed the traffic and I see alot of duplicate acks which makes
> me suspect some sort of packet fragmentation but its got me puzzled.
>
> Anyone have any ideas about how to debug this further? or has anyone
> seen anything like this before?
>
> Cheers
>
> James
>
>
> - --
> James Page
> Ubuntu and Debian Developer
> james.page@ubuntu.com
> jamespage@debian.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIcBAEBCAAGBQJSS+QSAAoJEL/srsug59jD8ZcQAKbZDVU8KKa7hsic7+ulqWQQ
> EFbq8Im5x4mQY7htIvIOM26BR0ktAO5luE7zMBXsA4AwPud1BQSGhw89/NvNhADT
> TLcGdQADsomeiBpJebzwUmvL/tYUoMDRA3O96mUn2pi0fySWbEuEgMDjDJ/ow23D
> Y7nEv0mItaZ4MBSI9RZcqsDUl7UbbdlGejSWhJcwp/127HMU9nYwWNz5UHJjsGZ1
> eITyv1WZH/dYPQ1SES41qD1WvkTBugopGJvptEyrcO62A+akGOvnqpsHgPECbLb+
> b/8rk8nB1HB74Wh+tQP4WRQCZYso15nB6ukIyIU24Qti2tXtXDdKwszEoblCwCT3
> YZJTERNOENURlUEFwgi6FNL+nZomSG0UJU6qqDGiUJkbSF7SwJm4y8/XRlJM2Ihn
> wyxFB0qe3YdMqgDLZn11GwCDqn3g11hYaocHNUyRaj/tgxhGKbOFvix5kz3I4V7T
> gd+sqUySMVd9wCRXBzDDhCuG9xf/QY2ZQxXzyfPJWd9svPh/O6osTSQzaI1eZl9/
> jVRejMAFr6Rl11GPKd3DYi32GXa896QELjBmJ9Kof0NDlCcDuUKpVeifIhcbQZZV
> sWyQmbb6Z/ypFV9xXiLRfH2fW2bAQQHgiQGvy9apoE78BWYdnsD8Q3Ekwag6lFqp
> yUwt/RcRXS1PbLG4EGFW
> =HTvW
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>


Re: Directional network performance issues with Neutron + OpenvSwitch
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Jay

On 02/10/13 16:37, Jay Pipes wrote:
> Hi James, have you tried setting the MTU to a lower number of
> bytes, instead of a higher-than-1500 setting? Say... 1454 instead
> of 1546?
>
> Curious to see if that resolves the issue. If it does, then
> perhaps there is a path somewhere that had a <1546 PMTU?

Do you mean in instances, or on the physical servers?

For context, I hit this problem prior to tweaking MTUs (defaults of
1500 everywhere).


- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSTEcPAAoJEL/srsug59jDJg0P/3Qa2IBgbbRDJ0qyoRKJIasY
apeI1ocxBSQiMu/T+8lcjaJj5ucz3l8LZrZGBe+0mHtEIzwOma7Lyie47GIPHopJ
I7oalNHY20sipbtljraA4HFfondDjXL9DCSQNAotiY2sD+QqHWG4oZ55IH/lh/fs
HfUb7WrOclQVK16WKn8fRmot3qx1tR2TNDhbp1WGGDqZxRPJa1xBXiw6FhSandcs
/uruaEIw8lZFvtLOiOhLLH5JErPZOAE4SZHTUuF56AtEthfMZLIzFrMrwV+cqcS5
8z/y6gsjMvDl4uKFwbuw/8DnVfzdQVI2/IRQPOrhj0Ve73YtAspEa5FmHgcGtm9c
8AL8emOLLs3jVFBVLDcCD3PezeItqaDoj8oAI1RUU3Ks1Pk2OsgKH2PLG/A2q97J
MSHv81Sm2m6xbSdAxLsxz+MCWV3Wkhvm0F6Q9k8xUowsIsgql2pbOs4QmAsIwucJ
tQdQ0R+yBCV+9lxWODieXTT/N0h7di5GVztip08T5kMxISLUo/Qhswi8jE9GU6ds
M6YC/GkSfoV0mOVsbLso8s6IEBlaCJajZduG4RkT1X+gt8nLMtcrx8eR49h0CIMe
+cT7Ck174IUL3oOfDJSjWRZkixIqhvmId5gtnjX0sg1mXnvGMMYG2d/0YeF1WZDE
of5cDqBMmh9Lm2ZMZvCh
=2SYk
-----END PGP SIGNATURE-----

Re: Directional network performance issues with Neutron + OpenvSwitch
http://techbackground.blogspot.co.uk/2013/06/path-mtu-discovery-and-gre.html


-----Original Message-----
From: James Page [mailto:james.page@ubuntu.com]
Sent: Wednesday, October 02, 2013 9:17 AM
To: openstack@lists.openstack.org
Subject: Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Jay

On 02/10/13 16:37, Jay Pipes wrote:
> Hi James, have you tried setting the MTU to a lower number of bytes,
> instead of a higher-than-1500 setting? Say... 1454 instead of 1546?
>
> Curious to see if that resolves the issue. If it does, then perhaps
> there is a path somewhere that had a <1546 PMTU?

Do you mean in instances, or on the physical servers?

For context I hit this problem prior to tweaking MTU's (defaults of
1500 everywhere).


- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSTEcPAAoJEL/srsug59jDJg0P/3Qa2IBgbbRDJ0qyoRKJIasY
apeI1ocxBSQiMu/T+8lcjaJj5ucz3l8LZrZGBe+0mHtEIzwOma7Lyie47GIPHopJ
I7oalNHY20sipbtljraA4HFfondDjXL9DCSQNAotiY2sD+QqHWG4oZ55IH/lh/fs
HfUb7WrOclQVK16WKn8fRmot3qx1tR2TNDhbp1WGGDqZxRPJa1xBXiw6FhSandcs
/uruaEIw8lZFvtLOiOhLLH5JErPZOAE4SZHTUuF56AtEthfMZLIzFrMrwV+cqcS5
8z/y6gsjMvDl4uKFwbuw/8DnVfzdQVI2/IRQPOrhj0Ve73YtAspEa5FmHgcGtm9c
8AL8emOLLs3jVFBVLDcCD3PezeItqaDoj8oAI1RUU3Ks1Pk2OsgKH2PLG/A2q97J
MSHv81Sm2m6xbSdAxLsxz+MCWV3Wkhvm0F6Q9k8xUowsIsgql2pbOs4QmAsIwucJ
tQdQ0R+yBCV+9lxWODieXTT/N0h7di5GVztip08T5kMxISLUo/Qhswi8jE9GU6ds
M6YC/GkSfoV0mOVsbLso8s6IEBlaCJajZduG4RkT1X+gt8nLMtcrx8eR49h0CIMe
+cT7Ck174IUL3oOfDJSjWRZkixIqhvmId5gtnjX0sg1mXnvGMMYG2d/0YeF1WZDE
of5cDqBMmh9Lm2ZMZvCh
=2SYk
-----END PGP SIGNATURE-----

Re: Directional network performance issues with Neutron + OpenvSwitch
On 10/02/2013 12:17 PM, James Page wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Hi Jay
>
> On 02/10/13 16:37, Jay Pipes wrote:
>> Hi James, have you tried setting the MTU to a lower number of
>> bytes, instead of a higher-than-1500 setting? Say... 1454 instead
>> of 1546?
>>
>> Curious to see if that resolves the issue. If it does, then
>> perhaps there is a path somewhere that had a <1546 PMTU?
>
> Do you mean in instances, or on the physical servers?

I mean on the instance vNICs.

> For context I hit this problem prior to tweaking MTU's (defaults of
> 1500 everywhere).

Right, I'm just curious :)

-jay



Re: Directional network performance issues with Neutron + OpenvSwitch
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Gangur

On 02/10/13 17:24, Gangur, Hrushikesh (R & D HP Cloud) wrote:
> http://techbackground.blogspot.co.uk/2013/06/path-mtu-discovery-and-gre.html

Yeah - I read that already:

sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
traceroute -n 10.5.0.2 -p 44444 --mtu
traceroute to 10.5.0.2 (10.5.0.2), 30 hops max, 65000 byte packets
1 10.5.0.2 0.950 ms F=1500 0.598 ms 0.566 ms

The PMTU from the l3 gateway to the instance looks OK to me.

> On 02/10/13 16:37, Jay Pipes wrote:
>> Hi James, have you tried setting the MTU to a lower number of
>> bytes, instead of a higher-than-1500 setting? Say... 1454 instead
>> of 1546?
>
>> Curious to see if that resolves the issue. If it does, then
>> perhaps there is a path somewhere that had a <1546 PMTU?
>
> Do you mean in instances, or on the physical servers?
>
> For context I hit this problem prior to tweaking MTU's (defaults
> of 1500 everywhere).

- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSTErmAAoJEL/srsug59jDLzEQALqIhfbVeWwUCe/s+P/CLN3k
EIH5koGJ69RiQDFhcIBSRzQw7FbwWznBAHHeemVn5OW/LcCKJQo9wLNX1K742pjz
G2pDwVeJnwX/QVK95chyJ/4zZENpSiT/2fzlNje7H95eiKdRd6mvDSPsIjoEQ5Ci
Cz4R1nvOoJj9cWOt5xCHtsmb5PX7O2D9zpCj/Al6ELH95zNfe7eyFSUcwZ/MEo9t
e8VxAaKlg+AQ6bdYokssIrHU6osdHDGXY1/9z6ffbcrVXJnlDkzHx0DmN81qIPXV
ros8OPZA51cVqVpEw2TvFbl5DZHukjOLGePsTKN6IcQ/2TtMdqqgbGdWAxO9iVFR
SAQdVp9yM6J7XM4kZ//gj4Oc3g/jN9EHr8rP0tEFWlypomiBjG8sQeEuHlp6DFxQ
IYacqOfWCozTDuQroj77Q9QUf4VV+ykVvTPFBHG7FiLAZyXRV5ueOlwHgAdysiyO
rIYcxXYrU6RAAmuqXXnyu5awFd/s2qisuAXTjhQpN9mUuVB9ge/BRGLa1di4S/Wz
sHAhT18h/JAxvyzARq9Qa0X8go87mM3Xoe5fivnvQrTNPQsoOxgaK6JVbTNG0pP2
bJbnRTBEjudSNlRo1WEfopsiz1HxYsN5tlpG0BabnkAsUqVjKP36tUQphe3e7S9R
dFBngsPowBFLcBuBY7tp
=FDK3
-----END PGP SIGNATURE-----

Re: Directional network performance issues with Neutron + OpenvSwitch
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 02/10/13 17:28, Jay Pipes wrote:
>> On 02/10/13 16:37, Jay Pipes wrote:
>>> Hi James, have you tried setting the MTU to a lower number of
>>> bytes, instead of a higher-than-1500 setting? Say... 1454
>>> instead of 1546?
>>>
>>> Curious to see if that resolves the issue. If it does, then
>>> perhaps there is a path somewhere that had a <1546 PMTU?
>>
>> Do you mean in instances, or on the physical servers?
>
> I mean on the instance vNICs.

Yeah - that's what I thought - that makes no difference either.

- --
James Page
Technical Lead
Ubuntu Server Team
james.page@canonical.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSTE7zAAoJEL/srsug59jDwswP/AwQarblKhDnAe+aGYVn1hKs
g/BPiyqovNtBNNXKj5FLIaDDQnpLueIDxoX0lHPZkKLpDJybrsQBtqwnol2qcBa3
rBfb/yt92vL8wDlRBEsbh1qr/2EmErksFjcIMIltqBNXP5gGR3ADS9DIJ65GUIFY
Aipsk03bu3pn2FiCJo/cbbKBT96bbQg9vNgbUi8Eu8vWW7wpEq90njlDrVh02u/o
ioME0Ja8DnFrPNmIx8kaaOdXSY9e3YmWfjImQbi/O7lVwUHV7ZA+4szSrQiCmPn3
eHUGTblLP2yEmETu3rF7hxB1bn2H3bxZ+C1vg7k3ABNlTMrDPHTQv+iRSCA9WDcf
yMNjCD5dTI10gx+OTDjEIg+z2yEA4fqmYqHgHsuPyCBdRs6CX1qIJPywFZlFDglC
AC1R6PMtpVTlcUXlLX/3QJc63/n+3nX6R56iOmAxgDIaVLy5+Hh52g+5vY1T5Nl8
B0aqM60Duxvpf6/9wkgSHcjp7MHBp1IEoT8b+aD5xwSZjG+gqW2wClCGx6ktOfnN
vwxmaTT+rY2vqLNXd51PF2Tfl5+cfK2Sws3lnmJwh5PxZtcwfY42wiBAJWbuJMDT
EIurmHqSPhBkylZlONWto7oNyDSaiqYczbTXGM3eYw/ZqTpgN/X9JuCpMAxt51oI
ALR0na+J0AIQcRUS0P4M
=CQbq
-----END PGP SIGNATURE-----

Re: Directional network performance issues with Neutron + OpenvSwitch
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 02/10/13 17:33, James Page wrote:
> On 02/10/13 17:24, Gangur, Hrushikesh (R & D HP Cloud) wrote:
>> http://techbackground.blogspot.co.uk/2013/06/path-mtu-discovery-and-gre.html
>
> Yeah - I read that already:
>
> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
> traceroute -n 10.5.0.2 -p 44444 --mtu traceroute to 10.5.0.2
> (10.5.0.2), 30 hops max, 65000 byte packets 1 10.5.0.2 0.950 ms
> F=1500 0.598 ms 0.566 ms
>
> The PMTU from the l3 gateway to the instance looks OK to me.

I spent a bit more time debugging this; performance from within the
router netns on the L3 gateway node looks good in both directions when
accessing via the tenant network (10.5.0.2) over the qr-XXXXX
interface, but when accessing through the external network from within
the netns I see the same performance choke upstream into the tenant
network.

Which would indicate that my problem lies somewhere around the
qg-XXXXX interface in the router netns - just trying to figure out
exactly what - maybe iptables is doing something wonky?
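
For reference, the netns checks were roughly of this shape (router UUID as
above; the commands are indicative rather than a transcript) - iperf
exercises the qr- side, and the iptables listings show the NAT rules and
their packet counters:

sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221 iperf -c 10.5.0.2
sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221 iptables -t nat -S
sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221 iptables -t nat -L -n -v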

- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSTJTzAAoJEL/srsug59jDXIoQAIqd5Msoyubvs0Y270PeYHwJ
vsmjw0Fzyf+428KTo2RcfWKGarkBmn/3kbygzPJH2aVHZx/+s2dHY1YJu1gH7B4i
0yCIQZWhur+CdXN7QplqhJLgq+ZVyC4/GV4RA/C2NpHzGZg/avx5BPMhzfnSnRtB
Xy49umZkG90622WhW2hlXW5J06YIEsO1EuwonXxIXzXu2CYsvLKk2GguU7tejC7Q
DfW36gkCVv2z/71vVXgpjNt76MNsA8IVmaB4vv08Ai4yyUMNpvUc/SWu5DwzuoZx
vGxkCFv419rzO64L6EbYcmnUBXa+wFnSTp8hCNfl8fsDMJb6kynwLAWqCiIKKS8/
ozZfZ7eQ4CmyctckXjxBchmybh0aMRrzYANvE/9vkub3aAF7fpeCus+Nw59TLe62
tlfAZKPhmLikGbbIia6SX6j9PS9x2mSagfinjQs0BHDV0Pyww5qotWbWLbCFD7Cz
yhLjAGAhOnB5CQlEqX9XdM2/YGvhTIzLMMkPeQVicNlUXx/TXqJ2cvcIjdoBASFC
i6lfhhwXU9n9zi0THOxHQozksaMKc/diWULkcewqdbqYgLbZ5x8+SADf2Zd7WFzZ
MKe54y7fmhKWnL+zTN9tLwG8qnLWpIWJ5M4V99a8HL6zgTyeRJ/9bgMsl/2ghTra
EGO8vL6+zj8cAYTFB3oF
=Fp5N
-----END PGP SIGNATURE-----

Re: Directional network performance issues with Neutron + OpenvSwitch
Hi James,

Let me ask you something...

Are you using the package `openvswitch-datapath-dkms' from Havana Ubuntu
Cloud Archive with Linux 3.8?

I am unable to compile that module on top of Ubuntu 12.04.3 (with Linux
3.8) and I'm wondering if it is still required or not...

Thanks!
Thiago


On 2 October 2013 06:14, James Page <james.page@ubuntu.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Hi Folks
>
> I'm seeing an odd direction performance issue with my Havana test rig
> which I'm struggling to debug; details:
>
> Ubuntu 12.04 with Linux 3.8 backports kernel, Havana Cloud Archive
> (currently Havana b3, OpenvSwitch 1.10.2), OpenvSwitch plugin with GRE
> overlay networks.
>
> I've configured the MTU's on all of the physical host network
> interfaces to 1546 to add capacity for the GRE network headers.
>
> Performance between instances within a single tenant network on
> different physical hosts is as I would expect (near 1GBps), but I see
> issues when data transits the Neutron L3 gateway - in the example
> below churel is a physical host on the same network as the layer 3
> gateway:
>
> ubuntu@churel:~$ scp hardware.dump 10.98.191.103:
> hardware.dump
> 100% 67MB 4.8MB/s
> 00:14
>
> ubuntu@churel:~$ scp 10.98.191.103:hardware.dump .
> hardware.dump
> 100% 67MB
> 66.8MB/s 00:01
>
> As you can see, pushing data to the instance (via a floating ip
> 10.98.191.103) is painfully slow, whereas pulling the same data is
> x10+ faster (and closer to what I would expect).
>
> iperf confirms the same:
>
> ubuntu@churel:~$ iperf -c 10.98.191.103 -m
> - ------------------------------------------------------------
> Client connecting to 10.98.191.103, TCP port 5001
> TCP window size: 22.9 KByte (default)
> - ------------------------------------------------------------
> [ 3] local 10.98.191.11 port 55330 connected with 10.98.191.103 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 60.8 MBytes 50.8 Mbits/sec
> [ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>
> ubuntu@james-page-bastion:~$ iperf -c 10.98.191.11 -m
>
>
> - ------------------------------------------------------------
> Client connecting to 10.98.191.11, TCP port 5001
> TCP window size: 23.3 KByte (default)
> - ------------------------------------------------------------
> [ 3] local 10.5.0.2 port 52190 connected with 10.98.191.11 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 1.07 GBytes 918 Mbits/sec
> [ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>
>
> 918Mbit vs 50Mbits.
>
> I tcpdump'ed the traffic and I see alot of duplicate acks which makes
> me suspect some sort of packet fragmentation but its got me puzzled.
>
> Anyone have any ideas about how to debug this further? or has anyone
> seen anything like this before?
>
> Cheers
>
> James
>
>
> - --
> James Page
> Ubuntu and Debian Developer
> james.page@ubuntu.com
> jamespage@debian.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIcBAEBCAAGBQJSS+QSAAoJEL/srsug59jD8ZcQAKbZDVU8KKa7hsic7+ulqWQQ
> EFbq8Im5x4mQY7htIvIOM26BR0ktAO5luE7zMBXsA4AwPud1BQSGhw89/NvNhADT
> TLcGdQADsomeiBpJebzwUmvL/tYUoMDRA3O96mUn2pi0fySWbEuEgMDjDJ/ow23D
> Y7nEv0mItaZ4MBSI9RZcqsDUl7UbbdlGejSWhJcwp/127HMU9nYwWNz5UHJjsGZ1
> eITyv1WZH/dYPQ1SES41qD1WvkTBugopGJvptEyrcO62A+akGOvnqpsHgPECbLb+
> b/8rk8nB1HB74Wh+tQP4WRQCZYso15nB6ukIyIU24Qti2tXtXDdKwszEoblCwCT3
> YZJTERNOENURlUEFwgi6FNL+nZomSG0UJU6qqDGiUJkbSF7SwJm4y8/XRlJM2Ihn
> wyxFB0qe3YdMqgDLZn11GwCDqn3g11hYaocHNUyRaj/tgxhGKbOFvix5kz3I4V7T
> gd+sqUySMVd9wCRXBzDDhCuG9xf/QY2ZQxXzyfPJWd9svPh/O6osTSQzaI1eZl9/
> jVRejMAFr6Rl11GPKd3DYi32GXa896QELjBmJ9Kof0NDlCcDuUKpVeifIhcbQZZV
> sWyQmbb6Z/ypFV9xXiLRfH2fW2bAQQHgiQGvy9apoE78BWYdnsD8Q3Ekwag6lFqp
> yUwt/RcRXS1PbLG4EGFW
> =HTvW
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
Re: Directional network performance issues with Neutron + OpenvSwitch
I believe it's still needed: the upstream kernel has pushed back against
the modules it provides, but Neutron needs them to deliver the GRE
tunnels.

-Rob

On 3 October 2013 13:15, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:
> Hi James,
>
> Let me ask you something...
>
> Are you using the package `openvswitch-datapath-dkms' from Havana Ubuntu
> Cloud Archive with Linux 3.8?
>
> I am unable to compile that module on top of Ubuntu 12.04.3 (with Linux 3.8)
> and I'm wondering if it is still required or not...
>
> Thanks!
> Thiago
>
>
> On 2 October 2013 06:14, James Page <james.page@ubuntu.com> wrote:
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> Hi Folks
>>
>> I'm seeing an odd direction performance issue with my Havana test rig
>> which I'm struggling to debug; details:
>>
>> Ubuntu 12.04 with Linux 3.8 backports kernel, Havana Cloud Archive
>> (currently Havana b3, OpenvSwitch 1.10.2), OpenvSwitch plugin with GRE
>> overlay networks.
>>
>> I've configured the MTU's on all of the physical host network
>> interfaces to 1546 to add capacity for the GRE network headers.
>>
>> Performance between instances within a single tenant network on
>> different physical hosts is as I would expect (near 1GBps), but I see
>> issues when data transits the Neutron L3 gateway - in the example
>> below churel is a physical host on the same network as the layer 3
>> gateway:
>>
>> ubuntu@churel:~$ scp hardware.dump 10.98.191.103:
>> hardware.dump
>> 100% 67MB 4.8MB/s
>> 00:14
>>
>> ubuntu@churel:~$ scp 10.98.191.103:hardware.dump .
>> hardware.dump
>> 100% 67MB
>> 66.8MB/s 00:01
>>
>> As you can see, pushing data to the instance (via a floating ip
>> 10.98.191.103) is painfully slow, whereas pulling the same data is
>> x10+ faster (and closer to what I would expect).
>>
>> iperf confirms the same:
>>
>> ubuntu@churel:~$ iperf -c 10.98.191.103 -m
>> - ------------------------------------------------------------
>> Client connecting to 10.98.191.103, TCP port 5001
>> TCP window size: 22.9 KByte (default)
>> - ------------------------------------------------------------
>> [ 3] local 10.98.191.11 port 55330 connected with 10.98.191.103 port 5001
>> [ ID] Interval Transfer Bandwidth
>> [ 3] 0.0-10.0 sec 60.8 MBytes 50.8 Mbits/sec
>> [ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>>
>> ubuntu@james-page-bastion:~$ iperf -c 10.98.191.11 -m
>>
>>
>> - ------------------------------------------------------------
>> Client connecting to 10.98.191.11, TCP port 5001
>> TCP window size: 23.3 KByte (default)
>> - ------------------------------------------------------------
>> [ 3] local 10.5.0.2 port 52190 connected with 10.98.191.11 port 5001
>> [ ID] Interval Transfer Bandwidth
>> [ 3] 0.0-10.0 sec 1.07 GBytes 918 Mbits/sec
>> [ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>>
>>
>> 918Mbit vs 50Mbits.
>>
>> I tcpdump'ed the traffic and I see alot of duplicate acks which makes
>> me suspect some sort of packet fragmentation but its got me puzzled.
>>
>> Anyone have any ideas about how to debug this further? or has anyone
>> seen anything like this before?
>>
>> Cheers
>>
>> James
>>
>>
>> - --
>> James Page
>> Ubuntu and Debian Developer
>> james.page@ubuntu.com
>> jamespage@debian.org
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.14 (GNU/Linux)
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>>
>> iQIcBAEBCAAGBQJSS+QSAAoJEL/srsug59jD8ZcQAKbZDVU8KKa7hsic7+ulqWQQ
>> EFbq8Im5x4mQY7htIvIOM26BR0ktAO5luE7zMBXsA4AwPud1BQSGhw89/NvNhADT
>> TLcGdQADsomeiBpJebzwUmvL/tYUoMDRA3O96mUn2pi0fySWbEuEgMDjDJ/ow23D
>> Y7nEv0mItaZ4MBSI9RZcqsDUl7UbbdlGejSWhJcwp/127HMU9nYwWNz5UHJjsGZ1
>> eITyv1WZH/dYPQ1SES41qD1WvkTBugopGJvptEyrcO62A+akGOvnqpsHgPECbLb+
>> b/8rk8nB1HB74Wh+tQP4WRQCZYso15nB6ukIyIU24Qti2tXtXDdKwszEoblCwCT3
>> YZJTERNOENURlUEFwgi6FNL+nZomSG0UJU6qqDGiUJkbSF7SwJm4y8/XRlJM2Ihn
>> wyxFB0qe3YdMqgDLZn11GwCDqn3g11hYaocHNUyRaj/tgxhGKbOFvix5kz3I4V7T
>> gd+sqUySMVd9wCRXBzDDhCuG9xf/QY2ZQxXzyfPJWd9svPh/O6osTSQzaI1eZl9/
>> jVRejMAFr6Rl11GPKd3DYi32GXa896QELjBmJ9Kof0NDlCcDuUKpVeifIhcbQZZV
>> sWyQmbb6Z/ypFV9xXiLRfH2fW2bAQQHgiQGvy9apoE78BWYdnsD8Q3Ekwag6lFqp
>> yUwt/RcRXS1PbLG4EGFW
>> =HTvW
>> -----END PGP SIGNATURE-----
>>
>> _______________________________________________
>> Mailing list:
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to : openstack@lists.openstack.org
>> Unsubscribe :
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
>
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>



--
Robert Collins <rbtcollins@hp.com>
Distinguished Technologist
HP Converged Cloud

Re: Directional network performance issues with Neutron + OpenvSwitch
Mmm... I am unable to compile openvswitch-datapath-dkms from Havana Ubuntu
Cloud Archive (on top of a fresh install of Ubuntu 12.04.3), look:

------
root@havabuntu-1:~# uname -a
Linux havabuntu-1 3.8.0-31-generic #46~precise1-Ubuntu SMP Wed Sep 11
18:21:16 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

root@havabuntu-1:~# dpkg -l | grep openvswitch-datapath-dkms
ii openvswitch-datapath-dkms 1.10.2-0ubuntu1~cloud0 Open
vSwitch datapath module source - DKMS version

root@havabuntu-1:~# dpkg-reconfigure openvswitch-datapath-dkms

------------------------------
Deleting module version: 1.10.2
completely from the DKMS tree.
------------------------------
Done.

Creating symlink /var/lib/dkms/openvswitch/1.10.2/source ->
/usr/src/openvswitch-1.10.2

DKMS: add completed.

Kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area....(bad exit status: 2)
./configure --with-linux='/lib/modules/3.8.0-31-generic/build' && make -C
datapath/linux............(bad exit status: 2)
Error! Bad return status for module build on kernel: 3.8.0-31-generic
(x86_64)
Consult /var/lib/dkms/openvswitch/1.10.2/build/make.log for more
information.
------

Contents of /var/lib/dkms/openvswitch/1.10.2/build/make.log:

http://paste.openstack.org/show/47888/

I also have the packages: build-essential, linux-headers, etc, installed...

So, James, do you have this module compiled in your test environment? I
mean, does the command "dpkg-reconfigure openvswitch-datapath-dkms" work
for you?!

NOTE: It also doesn't compile with Linux 3.2 (Ubuntu 12.04.1).

Thanks,
Thiago


On 2 October 2013 22:28, Robert Collins <robertc@robertcollins.net> wrote:

> I believe it's still needed: upstream kernel have pushed back against
> the modules it provides, but neutron needs them to deliver the gre
> tunnels.
>
> -Rob
>
> On 3 October 2013 13:15, Martinx - ジェームズ <thiagocmartinsc@gmail.com>
> wrote:
> > Hi James,
> >
> > Let me ask you something...
> >
> > Are you using the package `openvswitch-datapath-dkms' from Havana Ubuntu
> > Cloud Archive with Linux 3.8?
> >
> > I am unable to compile that module on top of Ubuntu 12.04.3 (with Linux
> 3.8)
> > and I'm wondering if it is still required or not...
> >
> > Thanks!
> > Thiago
> >
> >
> > On 2 October 2013 06:14, James Page <james.page@ubuntu.com> wrote:
> >>
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA256
> >>
> >> Hi Folks
> >>
> >> I'm seeing an odd direction performance issue with my Havana test rig
> >> which I'm struggling to debug; details:
> >>
> >> Ubuntu 12.04 with Linux 3.8 backports kernel, Havana Cloud Archive
> >> (currently Havana b3, OpenvSwitch 1.10.2), OpenvSwitch plugin with GRE
> >> overlay networks.
> >>
> >> I've configured the MTU's on all of the physical host network
> >> interfaces to 1546 to add capacity for the GRE network headers.
> >>
> >> Performance between instances within a single tenant network on
> >> different physical hosts is as I would expect (near 1GBps), but I see
> >> issues when data transits the Neutron L3 gateway - in the example
> >> below churel is a physical host on the same network as the layer 3
> >> gateway:
> >>
> >> ubuntu@churel:~$ scp hardware.dump 10.98.191.103:
> >> hardware.dump
> >> 100% 67MB 4.8MB/s
> >> 00:14
> >>
> >> ubuntu@churel:~$ scp 10.98.191.103:hardware.dump .
> >> hardware.dump
> >> 100% 67MB
> >> 66.8MB/s 00:01
> >>
> >> As you can see, pushing data to the instance (via a floating ip
> >> 10.98.191.103) is painfully slow, whereas pulling the same data is
> >> x10+ faster (and closer to what I would expect).
> >>
> >> iperf confirms the same:
> >>
> >> ubuntu@churel:~$ iperf -c 10.98.191.103 -m
> >> - ------------------------------------------------------------
> >> Client connecting to 10.98.191.103, TCP port 5001
> >> TCP window size: 22.9 KByte (default)
> >> - ------------------------------------------------------------
> >> [ 3] local 10.98.191.11 port 55330 connected with 10.98.191.103 port
> 5001
> >> [ ID] Interval Transfer Bandwidth
> >> [ 3] 0.0-10.0 sec 60.8 MBytes 50.8 Mbits/sec
> >> [ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
> >>
> >> ubuntu@james-page-bastion:~$ iperf -c 10.98.191.11 -m
> >>
> >>
> >> - ------------------------------------------------------------
> >> Client connecting to 10.98.191.11, TCP port 5001
> >> TCP window size: 23.3 KByte (default)
> >> - ------------------------------------------------------------
> >> [ 3] local 10.5.0.2 port 52190 connected with 10.98.191.11 port 5001
> >> [ ID] Interval Transfer Bandwidth
> >> [ 3] 0.0-10.0 sec 1.07 GBytes 918 Mbits/sec
> >> [ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
> >>
> >>
> >> 918Mbit vs 50Mbits.
> >>
> >> I tcpdump'ed the traffic and I see alot of duplicate acks which makes
> >> me suspect some sort of packet fragmentation but its got me puzzled.
> >>
> >> Anyone have any ideas about how to debug this further? or has anyone
> >> seen anything like this before?
> >>
> >> Cheers
> >>
> >> James
> >>
> >>
> >> - --
> >> James Page
> >> Ubuntu and Debian Developer
> >> james.page@ubuntu.com
> >> jamespage@debian.org
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG v1.4.14 (GNU/Linux)
> >> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> >>
> >> iQIcBAEBCAAGBQJSS+QSAAoJEL/srsug59jD8ZcQAKbZDVU8KKa7hsic7+ulqWQQ
> >> EFbq8Im5x4mQY7htIvIOM26BR0ktAO5luE7zMBXsA4AwPud1BQSGhw89/NvNhADT
> >> TLcGdQADsomeiBpJebzwUmvL/tYUoMDRA3O96mUn2pi0fySWbEuEgMDjDJ/ow23D
> >> Y7nEv0mItaZ4MBSI9RZcqsDUl7UbbdlGejSWhJcwp/127HMU9nYwWNz5UHJjsGZ1
> >> eITyv1WZH/dYPQ1SES41qD1WvkTBugopGJvptEyrcO62A+akGOvnqpsHgPECbLb+
> >> b/8rk8nB1HB74Wh+tQP4WRQCZYso15nB6ukIyIU24Qti2tXtXDdKwszEoblCwCT3
> >> YZJTERNOENURlUEFwgi6FNL+nZomSG0UJU6qqDGiUJkbSF7SwJm4y8/XRlJM2Ihn
> >> wyxFB0qe3YdMqgDLZn11GwCDqn3g11hYaocHNUyRaj/tgxhGKbOFvix5kz3I4V7T
> >> gd+sqUySMVd9wCRXBzDDhCuG9xf/QY2ZQxXzyfPJWd9svPh/O6osTSQzaI1eZl9/
> >> jVRejMAFr6Rl11GPKd3DYi32GXa896QELjBmJ9Kof0NDlCcDuUKpVeifIhcbQZZV
> >> sWyQmbb6Z/ypFV9xXiLRfH2fW2bAQQHgiQGvy9apoE78BWYdnsD8Q3Ekwag6lFqp
> >> yUwt/RcRXS1PbLG4EGFW
> >> =HTvW
> >> -----END PGP SIGNATURE-----
> >>
> >> _______________________________________________
> >> Mailing list:
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> >> Post to : openstack@lists.openstack.org
> >> Unsubscribe :
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> >
> >
> >
> > _______________________________________________
> > Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> > Post to : openstack@lists.openstack.org
> > Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> >
>
>
>
> --
> Robert Collins <rbtcollins@hp.com>
> Distinguished Technologist
> HP Converged Cloud
>
Re: Directional network performance issues with Neutron + OpenvSwitch
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 03/10/13 04:43, Martinx - ジェームズ wrote:
> Mmm... I am unable to compile openvswitch-datapath-dkms from
> Havana Ubuntu Cloud Archive (on top of a fresh install of Ubuntu
> 12.04.3), look:

There is a bug in that version; I'm deploying from
ppa:ubuntu-cloud-archive/havana-staging, which has a version that does
work - we are testing everything prior to pushing it through to -proposed
and -updates for rc1 (i.e. this week).


- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSTQh2AAoJEL/srsug59jDYvgQAIFpc/NTKGHBUSCRX3JiRVru
iBK2EuPZeNhh9Y4oXO14/zhDNp4/vnDQcJMNAZskUxuA5HcAnLp9oZbleKqG/r7W
w0s9fpkPzzYabaKR431QzJhm+3NIuMqtSgNy0ZX7zO9om3vkSAtLLTUlyYIHxTj3
owPpndN527XUuYalwFF7ffdZK0oIOX65XEUehmX1SPEeOGNhrWjnLH8rcr5XcCbL
VaGPMcqkJLjW+aKTjr4Xi0R6geQ+BjM7g+FNtu7BR4V+laxLyKz9f+WPdrdfcFQP
PLt6gBG6/OVzmZD8Fxs2iD0ox/KaC7gfhxF7ffF1aFwZIhzMZhUYtmCxNSPx80lG
FXOG9R54kDzvPzPNdZLS+dYUcuSBjFLw3Wjrplxzlok+cLjlqjfoABHXlhFjfcuM
Qr5QeUnJc9at+2p8JBjBRK1uxLgV2G+R7umIcjS9SIiD0kK9mKHGDbdKHJ4pvto8
sMAtIDAYMT+hEPWZ7i7x3lqbd/G2ipwKi2exgKy2VVfxB11qTY07boqNztd905NG
iOpusyvFqouHZZJ4SC5OziTTa3rcy2nhta2uYT946aS22z3BxESePlzi/PCJ5faU
h6HA7qIZyr4aUH75I/FBBmDasFrSKA7xJUYXPHa5wV1pnBvSs6QA14P0q43OsmwX
OQyC1OFfgRfE49kX14QZ
=TjDN
-----END PGP SIGNATURE-----

Re: Directional network performance issues with Neutron + OpenvSwitch
Cool! The `ppa:ubuntu-cloud-archive/havana-staging' is the repository I was
looking for. It works now... Thanks!

On 3 October 2013 03:02, James Page <james.page@ubuntu.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> On 03/10/13 04:43, Martinx - ジェームズ wrote:
> > Mmm... I am unable to compile openvswitch-datapath-dkms from
> > Havana Ubuntu Cloud Archive (on top of a fresh install of Ubuntu
> > 12.04.3), look:
>
> There is a bug in that version; I'm deploying from
> ppa:ubuntu-cloud-archive/havana-staging which has a version that does
> work - we are testing everything prior to push through to proposed and
> updates for rc1 (i.e. this week).
>
>
> - --
> James Page
> Ubuntu and Debian Developer
> james.page@ubuntu.com
> jamespage@debian.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIcBAEBCAAGBQJSTQh2AAoJEL/srsug59jDYvgQAIFpc/NTKGHBUSCRX3JiRVru
> iBK2EuPZeNhh9Y4oXO14/zhDNp4/vnDQcJMNAZskUxuA5HcAnLp9oZbleKqG/r7W
> w0s9fpkPzzYabaKR431QzJhm+3NIuMqtSgNy0ZX7zO9om3vkSAtLLTUlyYIHxTj3
> owPpndN527XUuYalwFF7ffdZK0oIOX65XEUehmX1SPEeOGNhrWjnLH8rcr5XcCbL
> VaGPMcqkJLjW+aKTjr4Xi0R6geQ+BjM7g+FNtu7BR4V+laxLyKz9f+WPdrdfcFQP
> PLt6gBG6/OVzmZD8Fxs2iD0ox/KaC7gfhxF7ffF1aFwZIhzMZhUYtmCxNSPx80lG
> FXOG9R54kDzvPzPNdZLS+dYUcuSBjFLw3Wjrplxzlok+cLjlqjfoABHXlhFjfcuM
> Qr5QeUnJc9at+2p8JBjBRK1uxLgV2G+R7umIcjS9SIiD0kK9mKHGDbdKHJ4pvto8
> sMAtIDAYMT+hEPWZ7i7x3lqbd/G2ipwKi2exgKy2VVfxB11qTY07boqNztd905NG
> iOpusyvFqouHZZJ4SC5OziTTa3rcy2nhta2uYT946aS22z3BxESePlzi/PCJ5faU
> h6HA7qIZyr4aUH75I/FBBmDasFrSKA7xJUYXPHa5wV1pnBvSs6QA14P0q43OsmwX
> OQyC1OFfgRfE49kX14QZ
> =TjDN
> -----END PGP SIGNATURE-----
>
Re: Directional network performance issues with Neutron + OpenvSwitch
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 02/10/13 22:49, James Page wrote:
>> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
>>> traceroute -n 10.5.0.2 -p 44444 --mtu traceroute to 10.5.0.2
>>> (10.5.0.2), 30 hops max, 65000 byte packets 1 10.5.0.2 0.950
>>> ms F=1500 0.598 ms 0.566 ms
>>>
>>> The PMTU from the l3 gateway to the instance looks OK to me.
> I spent a bit more time debugging this; performance from within
> the router netns on the L3 gateway node looks good in both
> directions when accessing via the tenant network (10.5.0.2) over
> the qr-XXXXX interface, but when accessing through the external
> network from within the netns I see the same performance choke
> upstream into the tenant network.
>
> Which would indicate that my problem lies somewhere around the
> qg-XXXXX interface in the router netns - just trying to figure out
> exactly what - maybe iptables is doing something wonky?

OK - I found a fix, but I'm not sure why it makes a difference; neither
my l3-agent nor dhcp-agent configuration had 'ovs_use_veth = True'. I
switched this on, cleared everything down, rebooted, and now I see good
symmetric performance across all Neutron routers.

This would point to some sort of underlying bug when ovs_use_veth = False.
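
For reference, the change itself is just (file paths are the stock Ubuntu
ones):

# /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini
ovs_use_veth = True

followed by restarting neutron-l3-agent and neutron-dhcp-agent (and, in my
case, a reboot of the gateway node).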


- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSTTh6AAoJEL/srsug59jDmpEP/jaB5/yn9+Xm12XrVu0Q3IV5
fLGOuBboUgykVVsfkWccI/oygNlBaXIcDuak/E4jxPcoRhLAdY1zpX8MQ8wSsGKd
CjSeuW8xxnXubdfzmsCKSs3FCIBhDkSYzyiJd/raLvCfflyy8Cl7KN2x22mGHJ6z
qZ9APcYfm9qCVbEssA3BHcUL+st1iqMJ0YhVZBk03+QEXaWu3FFbjpjwx3X1ZvV5
Vbac7enqy7Lr4DSAIJVldeVuRURfv3YE3iJZTIXjaoUCCVTQLm5OmP9TrwBNHLsA
7W+LceQri+Vh0s4dHPKx5MiHsV3RCydcXkSQFYhx7390CXypMQ6WwXEY/a8Egssg
SuxXByHwEcQFa+9sCwPQ+RXCmC0O6kUi8EPmwadjI5Gc1LoKw5Wov/SEen86fDUW
P9pRXonseYyWN9I4MT4aG1ez8Dqq/SiZyWBHtcITxKI2smD92G9CwWGo4L9oGqJJ
UcHRwQaTHgzy3yETPO25hjax8ZWZGNccHBixMCZKegr9p2dhR+7qF8G7mRtRQLxL
0fgOAExn/SX59ZT4RaYi9fI6Gng13RtSyI87CJC/50vfTmqoraUUK1aoSjIY4Dt+
DYEMMLp205uLEj2IyaNTzykR0yh3t6dvfpCCcRA/xPT9slfa0a7P8LafyiWa4/5c
jkJM4Y1BUV+2L5Rrf3sc
=4lO4
-----END PGP SIGNATURE-----

Re: Directional network performance issues with Neutron + OpenvSwitch
James,

I think I'm hitting this problem.

I'm using "Per-Tenant Routers with Private Networks", GRE tunnels and
L3+DHCP Network Node.

The connectivity from behind my Instances is very slow. It takes an
eternity to finish "apt-get update".

If I run "apt-get update" from within tenant's Namespace, it goes fine.

If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working and I and
unable to start new Ubuntu Instances and login into them... Look:

--
cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up 4.01 seconds
2013-10-22 06:01:42,989 - util.py[WARNING]: '
http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]:
url error [[Errno 113] No route to host]
2013-10-22 06:01:45,988 - util.py[WARNING]: '
http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [6/120s]:
url error [[Errno 113] No route to host]
--

Is this problem still around?!

Should I stay away from GRE tunnels with Havana + Ubuntu 12.04.3?

Is it possible to re-enable Metadata when ovs_use_veth = true ?

Thanks!
Thiago


On 3 October 2013 06:27, James Page <james.page@ubuntu.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> On 02/10/13 22:49, James Page wrote:
> >> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
> >>> traceroute -n 10.5.0.2 -p 44444 --mtu traceroute to 10.5.0.2
> >>> (10.5.0.2), 30 hops max, 65000 byte packets 1 10.5.0.2 0.950
> >>> ms F=1500 0.598 ms 0.566 ms
> >>>
> >>> The PMTU from the l3 gateway to the instance looks OK to me.
> > I spent a bit more time debugging this; performance from within
> > the router netns on the L3 gateway node looks good in both
> > directions when accessing via the tenant network (10.5.0.2) over
> > the qr-XXXXX interface, but when accessing through the external
> > network from within the netns I see the same performance choke
> > upstream into the tenant network.
> >
> > Which would indicate that my problem lies somewhere around the
> > qg-XXXXX interface in the router netns - just trying to figure out
> > exactly what - maybe iptables is doing something wonky?
>
> OK - I found a fix but I'm not sure why this makes a difference;
> neither my l3-agent or dhcp-agent configuration had 'ovs_use_veth =
> True'; I switched this on, clearing everything down, rebooted and now
> I seem symmetric good performance across all neutron routers.
>
> This would point to some sort of underlying bug when ovs_use_veth = False.
>
>
> - --
> James Page
> Ubuntu and Debian Developer
> james.page@ubuntu.com
> jamespage@debian.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIcBAEBCAAGBQJSTTh6AAoJEL/srsug59jDmpEP/jaB5/yn9+Xm12XrVu0Q3IV5
> fLGOuBboUgykVVsfkWccI/oygNlBaXIcDuak/E4jxPcoRhLAdY1zpX8MQ8wSsGKd
> CjSeuW8xxnXubdfzmsCKSs3FCIBhDkSYzyiJd/raLvCfflyy8Cl7KN2x22mGHJ6z
> qZ9APcYfm9qCVbEssA3BHcUL+st1iqMJ0YhVZBk03+QEXaWu3FFbjpjwx3X1ZvV5
> Vbac7enqy7Lr4DSAIJVldeVuRURfv3YE3iJZTIXjaoUCCVTQLm5OmP9TrwBNHLsA
> 7W+LceQri+Vh0s4dHPKx5MiHsV3RCydcXkSQFYhx7390CXypMQ6WwXEY/a8Egssg
> SuxXByHwEcQFa+9sCwPQ+RXCmC0O6kUi8EPmwadjI5Gc1LoKw5Wov/SEen86fDUW
> P9pRXonseYyWN9I4MT4aG1ez8Dqq/SiZyWBHtcITxKI2smD92G9CwWGo4L9oGqJJ
> UcHRwQaTHgzy3yETPO25hjax8ZWZGNccHBixMCZKegr9p2dhR+7qF8G7mRtRQLxL
> 0fgOAExn/SX59ZT4RaYi9fI6Gng13RtSyI87CJC/50vfTmqoraUUK1aoSjIY4Dt+
> DYEMMLp205uLEj2IyaNTzykR0yh3t6dvfpCCcRA/xPT9slfa0a7P8LafyiWa4/5c
> jkJM4Y1BUV+2L5Rrf3sc
> =4lO4
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
Re: Directional network performance issues with Neutron + OpenvSwitch
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Martinx

On 21/10/13 23:52, Martinx - ジェームズ wrote:
> I'm using "Per-Tenant Routers with Private Networks", GRE tunnels
> and L3+DHCP Network Node.
>
> The connectivity from behind my Instances is very slow. It takes
> an eternity to finish "apt-get update".
>
> If I run "apt-get update" from within tenant's Namespace, it goes
> fine.
>
> If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working
> and I and unable to start new Ubuntu Instances and login into
> them... Look:
>
> -- cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up
> 4.01 seconds 2013-10-22 06:01:42,989 - util.py[WARNING]:
> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed
> [3/120s]: url error [[Errno 113] No route to host] 2013-10-22
> 06:01:45,988 - util.py[WARNING]:
> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed
> [6/120s]: url error [[Errno 113] No route to host] --
>
> Is this problem still around?!

Definitely sounds similar; I'd ensure that all of the namespaces on
the gateway/data forwarding node are correct by giving it a reboot.

I think this needs a bug; Neutron should be OK without the use of veth
- I'll get to that today.

Cheers

James

- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSZoqMAAoJEL/srsug59jDlYAQAJkzfeVcWElQbB9LWQ4CRwjy
KwiAsFN6UVVnUgh4gZtS6Nb9xUtA4oQN/X8hVbSK9Ng5bSErot1NrjRITnWH0Wjl
70Tg4vh4ofufrYzzvGcUVGJ0FB1V+pf/XDAk5vMNEF6iMs7/XETWsabN15dPPUOv
Hq+YKo+8eeDgASVszelb8Hy14oZ7mJ1uaGIUTCqXH3Zbrkcwqw9Cp0AJ621pQ6K4
W0deiyy89+Br/FF65pi358949o1z7xexo+R74i9mPwUyeEuR27EeZEo9sM2LgLkR
kvk4jhndAZNgnK4ijc6ATqKuiDqgyUbrwJi4MTIbN2iFKtEV9gwftW/LRBwL5ihN
CgTgUw3ocKRudstgqUJ4Y1UjAmeztnrdQ3ZYuj1IXqqnpjvWvBxE87ajmoj6xhEL
miaxEKHkQuiM6XTuSmmoUvVQw5H77ZaRBTUCtTr2yUbaHArrBgjCwdAWsXjv2jp0
OO59k6Und6Mugi1tpUOWgrupgcrqG0Bc0W9XC+Q11WhYVYaoDh6QEjGFY8/5H5Mp
gUfu6jvGA891eDbYDMFclB2XDAKDxKGvMsnJbJ3UbC/tQBmmviemKgbKqRAO3Pt7
692bLGwuTy/t69EbTqs/+USaJGn9G2l2pZk8CgvmmHEU4dqdKqtFsZCfn4X3+w41
sl0NaHdulfF8HRgQN6ES
=kaLf
-----END PGP SIGNATURE-----

Re: Directional network performance issues with Neutron + OpenvSwitch
Hi James!

Any updates about this issue?!

I am unable to provide good connectivity for my tenants. I have already
tried everything I could, without success... I tried installing it again,
from scratch, in parallel (another isolated lab), using new hardware -
same result.

I'm trying to use the "Per-Tenant Routers with Private Networks" topology,
and it is useless... I'm using GRE tunnels, and a Network Node with 3
Ethernet interfaces.

Also, when I enable ovs_use_veth, DHCP/metadata stops working and reboots
don't fix it. So I am unable to check whether ovs_use_veth fixes it.

I'm a bit tired of GRE tunnels, too much trouble... Maybe it is time to try
VXLAN... But it will be a shot in the dark... :-/

Grizzly Reference HowTo, used to guide my Havana deployment:


https://github.com/mseknibilel/OpenStack-Grizzly-Install-Guide/blob/OVS_MultiNode/OpenStack_Grizzly_Install_Guide.rst

Thanks!
Thiago


On 22 October 2013 12:24, James Page <james.page@ubuntu.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Hi Martinx
>
> On 21/10/13 23:52, Martinx - ジェームズ wrote:
> > I'm using "Per-Tenant Routers with Private Networks", GRE tunnels
> > and L3+DHCP Network Node.
> >
> > The connectivity from behind my Instances is very slow. It takes
> > an eternity to finish "apt-get update".
> >
> > If I run "apt-get update" from within tenant's Namespace, it goes
> > fine.
> >
> > If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working
> > and I and unable to start new Ubuntu Instances and login into
> > them... Look:
> >
> > -- cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up
> > 4.01 seconds 2013-10-22 06:01:42,989 - util.py[WARNING]:
> > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed
> > [3/120s]: url error [[Errno 113] No route to host] 2013-10-22
> > 06:01:45,988 - util.py[WARNING]:
> > 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed
> > [6/120s]: url error [[Errno 113] No route to host] --
> >
> > Is this problem still around?!
>
> Definatetly sounds similar; I'd ensure that all of the namespaces on
> the gateways/data forwarding node are correct by giving it a reboot;
>
> I think this needs a bug; Neutron should be OK without the use of veth
> - - I'll get to that today.
>
> Cheers
>
> James
>
> - --
> James Page
> Ubuntu and Debian Developer
> james.page@ubuntu.com
> jamespage@debian.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIcBAEBCAAGBQJSZoqMAAoJEL/srsug59jDlYAQAJkzfeVcWElQbB9LWQ4CRwjy
> KwiAsFN6UVVnUgh4gZtS6Nb9xUtA4oQN/X8hVbSK9Ng5bSErot1NrjRITnWH0Wjl
> 70Tg4vh4ofufrYzzvGcUVGJ0FB1V+pf/XDAk5vMNEF6iMs7/XETWsabN15dPPUOv
> Hq+YKo+8eeDgASVszelb8Hy14oZ7mJ1uaGIUTCqXH3Zbrkcwqw9Cp0AJ621pQ6K4
> W0deiyy89+Br/FF65pi358949o1z7xexo+R74i9mPwUyeEuR27EeZEo9sM2LgLkR
> kvk4jhndAZNgnK4ijc6ATqKuiDqgyUbrwJi4MTIbN2iFKtEV9gwftW/LRBwL5ihN
> CgTgUw3ocKRudstgqUJ4Y1UjAmeztnrdQ3ZYuj1IXqqnpjvWvBxE87ajmoj6xhEL
> miaxEKHkQuiM6XTuSmmoUvVQw5H77ZaRBTUCtTr2yUbaHArrBgjCwdAWsXjv2jp0
> OO59k6Und6Mugi1tpUOWgrupgcrqG0Bc0W9XC+Q11WhYVYaoDh6QEjGFY8/5H5Mp
> gUfu6jvGA891eDbYDMFclB2XDAKDxKGvMsnJbJ3UbC/tQBmmviemKgbKqRAO3Pt7
> 692bLGwuTy/t69EbTqs/+USaJGn9G2l2pZk8CgvmmHEU4dqdKqtFsZCfn4X3+w41
> sl0NaHdulfF8HRgQN6ES
> =kaLf
> -----END PGP SIGNATURE-----
>
Re: Directional network performance issues with Neutron + OpenvSwitch
On Mon, Oct 21, 2013 at 11:52 PM, Martinx - ジェームズ <thiagocmartinsc@gmail.com
> wrote:

> James,
>
> I think I'm hitting this problem.
>
> I'm using "Per-Tenant Routers with Private Networks", GRE tunnels and
> L3+DHCP Network Node.
>
> The connectivity from behind my Instances is very slow. It takes an
> eternity to finish "apt-get update".
>


I'm curious if you can do the following tests to help pinpoint the
bottleneck:

Run iperf or netperf between (example commands are sketched after this list):
- two instances on the same hypervisor - this will determine whether it's a
  virtualization driver issue if the performance is bad;
- two instances on different hypervisors;
- one instance and the namespace of the l3 agent.
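
A rough sketch of those tests (names and IPs are placeholders; start
"iperf -s" on the target side first):

# instance -> instance, same compute node, then repeat across compute nodes
iperf -c <fixed-ip-of-other-instance> -t 30

# instance -> router namespace: start the server inside the netns on the
# network node, then point the instance's client at the qr- interface IP
sudo ip netns exec qrouter-<router-uuid> iperf -s
iperf -c <qr-interface-ip> -t 30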






>
> If I run "apt-get update" from within tenant's Namespace, it goes fine.
>
> If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working and I and
> unable to start new Ubuntu Instances and login into them... Look:
>
> --
> cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up 4.01 seconds
> 2013-10-22 06:01:42,989 - util.py[WARNING]: '
> http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]:
> url error [[Errno 113] No route to host]
> 2013-10-22 06:01:45,988 - util.py[WARNING]: '
> http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [6/120s]:
> url error [[Errno 113] No route to host]
> --
>


Do you see anything interesting in the neutron-metadata-agent log? Or does
it look like your instance doesn't have a route to the default gateway?


>
> Is this problem still around?!
>
> Should I stay away from GRE tunnels when with Havana + Ubuntu 12.04.3?
>
> Is it possible to re-enable Metadata when ovs_use_veth = true ?
>
> Thanks!
> Thiago
>
>
> On 3 October 2013 06:27, James Page <james.page@ubuntu.com> wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> On 02/10/13 22:49, James Page wrote:
>> >> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
>> >>> traceroute -n 10.5.0.2 -p 44444 --mtu traceroute to 10.5.0.2
>> >>> (10.5.0.2), 30 hops max, 65000 byte packets 1 10.5.0.2 0.950
>> >>> ms F=1500 0.598 ms 0.566 ms
>> >>>
>> >>> The PMTU from the l3 gateway to the instance looks OK to me.
>> > I spent a bit more time debugging this; performance from within
>> > the router netns on the L3 gateway node looks good in both
>> > directions when accessing via the tenant network (10.5.0.2) over
>> > the qr-XXXXX interface, but when accessing through the external
>> > network from within the netns I see the same performance choke
>> > upstream into the tenant network.
>> >
>> > Which would indicate that my problem lies somewhere around the
>> > qg-XXXXX interface in the router netns - just trying to figure out
>> > exactly what - maybe iptables is doing something wonky?
>>
>> OK - I found a fix but I'm not sure why this makes a difference;
>> neither my l3-agent or dhcp-agent configuration had 'ovs_use_veth =
>> True'; I switched this on, clearing everything down, rebooted and now
>> I seem symmetric good performance across all neutron routers.
>>
>> This would point to some sort of underlying bug when ovs_use_veth = False.
>>
>>
>> - --
>> James Page
>> Ubuntu and Debian Developer
>> james.page@ubuntu.com
>> jamespage@debian.org
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
On 10/23/2013 05:40 PM, Aaron Rosen wrote:
> I'm curious if you can do the following tests to help pinpoint the
> bottle neck:
>
> Run iperf or netperf between:
> two instances on the same hypervisor - this will determine if it's a
> virtualization driver issue if the performance is bad.
> two instances on different hypervisors.
> one instance to the namespace of the l3 agent.

If you happen to run netperf, I would suggest something like:

netperf -H <otherinstance> -t TCP_STREAM -l 30 -- -m 64K -o
throughput,local_transport_retrans

If you need data flowing the other direction, then I would suggest:

netperf -H <otherinstance> -t TCP_MAERTS -l 30 -- -m ,64K -o
throughput,remote_transport_retrans


You could add ",transport_mss" to those lists after the -o option if you
want.

What you will get is throughput (in 10^6 bits/s) and the number of TCP
retransmissions for the data connection (assuming the OS running in the
instances is Linux). Netperf will present 64KB of data to the transport
in each send call, and will run for 30 seconds. The socket buffer sizes
will be at their defaults - which under Linux means they will autotune.
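If it helps, a tiny wrapper along these lines could drive both directions
against each endpoint of interest (just a sketch; the target list is a
placeholder):

#!/bin/sh
# Run the suggested stream tests in both directions against each target.
for host in <instance-A> <instance-B> <router-namespace-ip>; do
    echo "=== $host: local -> target ==="
    netperf -H $host -t TCP_STREAM -l 30 -- -m 64K -o throughput,local_transport_retrans
    echo "=== $host: target -> local ==="
    netperf -H $host -t TCP_MAERTS -l 30 -- -m ,64K -o throughput,remote_transport_retrans
done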

happy benchmarking,

rick jones

For extra credit :) you can run:

netperf -t TCP_RR -H <otherinstance> -l 30

if you are curious about latency.

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi Aaron,

Thanks for answering! =)

Let's work...

---

TEST #1 - iperf between Network Node and its Uplink router (Data Center's
gateway "Internet") - OVS br-ex / eth2

# Tenant Namespace route table

root@net-node-1:~# ip netns exec
qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 ip route
default via 172.16.0.1 dev qg-50b615b7-c2
172.16.0.0/20 dev qg-50b615b7-c2 proto kernel scope link src 172.16.0.2
192.168.210.0/24 dev qr-a1376f61-05 proto kernel scope link src
192.168.210.1

# there is an "iperf -s" running at 172.16.0.1 ("Internet"), testing it

root@net-node-1:~# ip netns exec
qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -c 172.16.0.1
------------------------------------------------------------
Client connecting to 172.16.0.1, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[ 5] local 172.16.0.2 port 58342 connected with 172.16.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 668 MBytes 559 Mbits/sec
---

---

TEST #2 - iperf on one instance to the Namespace of the L3 agent + uplink
router

# iperf server running within Tenant's Namespace router

root@net-node-1:~# ip netns exec
qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -s

-

# from instance-1

ubuntu@instance-1:~$ ip route
default via 192.168.210.1 dev eth0 metric 100
192.168.210.0/24 dev eth0 proto kernel scope link src 192.168.210.2

# instance-1 performing tests against net-node-1 Namespace above

ubuntu@instance-1:~$ iperf -c 192.168.210.1
------------------------------------------------------------
Client connecting to 192.168.210.1, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2 port 43739 connected with 192.168.210.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 484 MBytes 406 Mbits/sec

# still on instance-1, now against "External IP" of its own Namespace /
Router

ubuntu@instance-1:~$ iperf -c 172.16.0.2
------------------------------------------------------------
Client connecting to 172.16.0.2, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2 port 34703 connected with 172.16.0.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 520 MBytes 436 Mbits/sec

# still on instance-1, now against the Data Center UpLink Router

ubuntu@instance-1:~$ iperf -c 172.16.0.1
------------------------------------------------------------
Client connecting to 172.16.0.1, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.4 port 38401 connected with 172.16.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 324 MBytes 271 Mbits/sec
---

This latest test shows only 271 Mbits/s! I think it should be at least
400~430 Mbits/s... Right?!

---

TEST #3 - Two instances on the same hypervisor

# iperf server

ubuntu@instance-2:~$ ip route
default via 192.168.210.1 dev eth0 metric 100
192.168.210.0/24 dev eth0 proto kernel scope link src 192.168.210.4

ubuntu@instance-2:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 45800
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 4.61 GBytes 3.96 Gbits/sec

# iperf client

ubuntu@instance-1:~$ iperf -c 192.168.210.4
------------------------------------------------------------
Client connecting to 192.168.210.4, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2 port 45800 connected with 192.168.210.4 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 4.61 GBytes 3.96 Gbits/sec
---

---

TEST #4 - Two instances on different hypervisors - over GRE

root@instance-2:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 34640
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 237 MBytes 198 Mbits/sec


root@instance-1:~# iperf -c 192.168.210.4
------------------------------------------------------------
Client connecting to 192.168.210.4, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2 port 34640 connected with 192.168.210.4 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 237 MBytes 198 Mbits/sec
---

I just realized how slow my intra-cloud (VM-to-VM) communication is... :-/

---

TEST #5 - Two hypervisors - "GRE TUNNEL LAN" - OVS local_ip / remote_ip

# Same path of "TEST #4" but, testing the physical GRE path (where GRE
traffic flows)

root@hypervisor-2:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.20.2.57 port 5001 connected with 10.20.2.53 port 51694
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec

root@hypervisor-1:~# iperf -c 10.20.2.57
------------------------------------------------------------
Client connecting to 10.20.2.57, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[ 3] local 10.20.2.53 port 51694 connected with 10.20.2.57 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec
---

About Test #5: I don't know why the GRE traffic (Test #4) doesn't reach
1Gbit/sec (only ~200Mbit/s?), since its physical path is much faster (gigabit
LAN). Plus, Test #3 shows a pretty fast speed when traffic flows only within a
hypervisor (3.96Gbit/sec).
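One quick check that would tell an MTU/fragmentation problem apart from a raw
OVS/GRE throughput problem is to probe the path between the instances with
DF-set pings (a sketch; 1472 assumes a plain 1500-byte path minus IP/ICMP
headers, and the smaller size just leaves generous room for the GRE
encapsulation overhead, so treat both values as approximate):

# from instance-1, towards instance-2 on the other hypervisor
ping -c 3 -M do -s 1472 192.168.210.4   # likely fails if GRE eats the headroom
ping -c 3 -M do -s 1372 192.168.210.4   # should pass if an instance MTU of ~1400 fits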

Tomorrow, I'll redo these tests with netperf.

NOTE: I'm using Open vSwitch 1.11.0, compiled for Ubuntu 12.04.3 via
"dpkg-buildpackage" and installed the Debian/Ubuntu way. If I downgrade to
1.10.2 from the Havana Cloud Archive, I get the same results... I can downgrade
it again, if you guys tell me to do so.

BTW, I'll install another "Region", based on Havana on Ubuntu 13.10, with
exactly the same configuration as my current Havana + Ubuntu 12.04.3, on top of
the same hardware, to see if the problem still persists.

Regards,
Thiago

Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Precisely!

The doc currently says to disable Namespaces when using GRE - I never did this
before - look:

http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html

But in this very same doc, they say to enable it... Who knows?! =P

http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html

I'll stick with Namespaces enabled...

Let me ask you something: when you enable ovs_use_veth, do Metadata and DHCP
still work?!
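For reference, the setting being discussed is just this line in both agent
configs - a sketch of what James flipped on, not a confirmed fix for the
metadata problem:

# /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini
# (paths assume the stock Ubuntu packages)
ovs_use_veth = True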

Cheers!
Thiago


On 24 October 2013 12:22, Speichert,Daniel <djs428@drexel.edu> wrote:

> Hello everyone,
>
> It seems we also ran into the same issue.
>
> We are running Ubuntu Saucy with OpenStack Havana from Ubuntu Cloud
> archives (precise-updates).
>
> The download speed to the VMs increased from 5 Mbps to maximum after
> enabling ovs_use_veth. Upload speed from the VMs is still terrible (max 1
> Mbps, usually 0.04 Mbps).
>
> Here is the iperf between the instance and L3 agent (network node) inside
> namespace.
>
> root@cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a
> iperf -c 10.1.0.24 -r
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> ------------------------------------------------------------
> Client connecting to 10.1.0.24, TCP port 5001
> TCP window size: 585 KByte (default)
> ------------------------------------------------------------
> [ 7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 7] 0.0-10.0 sec 845 MBytes 708 Mbits/sec
> [ 6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006
> [ 6] 0.0-31.4 sec 256 KBytes 66.7 Kbits/sec
>
> We are using Neutron OpenVSwitch with GRE and namespaces.
>
> A side question: the documentation says to disable namespaces with GRE and
> enable them with VLANs. It was always working well for us on Grizzly with
> GRE and namespaces and we could never get it to work without namespaces. Is
> there any specific reason why the documentation is advising to disable it?
>
> Regards,
> Daniel
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
On Thu, Oct 24, 2013 at 10:37 AM, Martinx - ジェームズ <thiagocmartinsc@gmail.com
> wrote:

> Precisely!
>
> The doc currently says to disable Namespace when using GRE, never did this
> before, look:
>
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html
>
> But on this very same doc, they say to enable it... Who knows?! =P
>
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
>
> I stick with Namespace enabled...
>
>
Just a reminder: /trunk/ links are works in progress. Thanks for bringing the
mismatch to our attention - we already have a doc bug filed:

https://bugs.launchpad.net/openstack-manuals/+bug/1241056

Review this patch: https://review.openstack.org/#/c/53380/

Anne




Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
We managed to bring the upload speed on the instances back to maximum by following this guide:
http://docs.openstack.org/trunk/openstack-network/admin/content/openvswitch_plugin.html

Basically, the MTU needs to be lowered for GRE tunnels. This can be done via DHCP, as explained in the new trunk manual.
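For reference, a minimal sketch of the settings involved (the file paths and the 1400 value are the commonly used ones for GRE, not copied verbatim from our deployment):

# /etc/neutron/dhcp_agent.ini (network node)
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf
dhcp-option-force=26,1400    # DHCP option 26 = interface MTU, lowered to fit GRE

# then restart the DHCP agent and renew the leases on the instances
service neutron-dhcp-agent restart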

Regards,
Daniel

From: annegentle@justwriteclick.com [mailto:annegentle@justwriteclick.com] On Behalf Of Anne Gentle
Sent: Thursday, October 24, 2013 12:08 PM
To: Martinx - ジェームズ
Cc: Speichert,Daniel; openstack@lists.openstack.org
Subject: Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch



On Thu, Oct 24, 2013 at 10:37 AM, Martinx - ジェームズ <thiagocmartinsc@gmail.com<mailto:thiagocmartinsc@gmail.com>> wrote:
Precisely!

The doc currently says to disable Namespace when using GRE, never did this before, look:

http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html

But on this very same doc, they say to enable it... Who knows?! =P

http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html

I stick with Namespace enabled...


Just a reminder, /trunk/ links are works in progress, thanks for bringing the mismatch to our attention, and we already have a doc bug filed:

https://bugs.launchpad.net/openstack-manuals/+bug/1241056

Review this patch: https://review.openstack.org/#/c/53380/

Anne



Let me ask you something, when you enable ovs_use_veth, que Metadata and DHCP still works?!

Cheers!
Thiago

On 24 October 2013 12:22, Speichert,Daniel <djs428@drexel.edu<mailto:djs428@drexel.edu>> wrote:
Hello everyone,

It seems we also ran into the same issue.

We are running Ubuntu Saucy with OpenStack Havana from Ubuntu Cloud archives (precise-updates).

The download speed to the VMs increased from 5 Mbps to maximum after enabling ovs_use_veth. Upload speed from the VMs is still terrible (max 1 Mbps, usually 0.04 Mbps).

Here is the iperf between the instance and L3 agent (network node) inside namespace.

root@cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a iperf -c 10.1.0.24 -r
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.1.0.24, TCP port 5001
TCP window size: 585 KByte (default)
------------------------------------------------------------
[ 7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001
[ ID] Interval Transfer Bandwidth
[ 7] 0.0-10.0 sec 845 MBytes 708 Mbits/sec
[ 6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006
[ 6] 0.0-31.4 sec 256 KBytes 66.7 Kbits/sec

We are using Neutron OpenVSwitch with GRE and namespaces.

A side question: the documentation says to disable namespaces with GRE and enable them with VLANs. It was always working well for us on Grizzly with GRE and namespaces and we could never get it to work without namespaces. Is there any specific reason why the documentation is advising to disable it?

Regards,
Daniel

From: Martinx - ジェームズ [mailto:thiagocmartinsc@gmail.com<mailto:thiagocmartinsc@gmail.com>]
Sent: Thursday, October 24, 2013 3:58 AM
To: Aaron Rosen
Cc: openstack@lists.openstack.org<mailto:openstack@lists.openstack.org>

Subject: Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

Hi Aaron,

Thanks for answering! =)

Lets work...

---

TEST #1 - iperf between Network Node and its Uplink router (Data Center's gateway "Internet") - OVS br-ex / eth2

# Tenant Namespace route table

root@net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 ip route
default via 172.16.0.1 dev qg-50b615b7-c2
172.16.0.0/20<http://172.16.0.0/20> dev qg-50b615b7-c2 proto kernel scope link src 172.16.0.2
192.168.210.0/24<http://192.168.210.0/24> dev qr-a1376f61-05 proto kernel scope link src 192.168.210.1<tel:192.168.210.1>

# there is a "iperf -s" running at 172.16.0.1 "Internet", testing it

root@net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -c 172.16.0.1
------------------------------------------------------------
Client connecting to 172.16.0.1, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[ 5] local 172.16.0.2 port 58342 connected with 172.16.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 668 MBytes 559 Mbits/sec
---

---

TEST #2 - iperf on one instance to the Namespace of the L3 agent + uplink router

# iperf server running within Tenant's Namespace router

root@net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -s

-

# from instance-1

ubuntu@instance-1:~$ ip route
default via 192.168.210.1<tel:192.168.210.1> dev eth0 metric 100
192.168.210.0/24<http://192.168.210.0/24> dev eth0 proto kernel scope link src 192.168.210.2<tel:192.168.210.2>

# instance-1 performing tests against net-node-1 Namespace above

ubuntu@instance-1:~$ iperf -c 192.168.210.1<tel:192.168.210.1>
------------------------------------------------------------
Client connecting to 192.168.210.1<tel:192.168.210.1>, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2<tel:192.168.210.2> port 43739 connected with 192.168.210.1<tel:192.168.210.1> port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 484 MBytes 406 Mbits/sec

# still on instance-1, now against "External IP" of its own Namespace / Router

ubuntu@instance-1:~$ iperf -c 172.16.0.2
------------------------------------------------------------
Client connecting to 172.16.0.2, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2<tel:192.168.210.2> port 34703 connected with 172.16.0.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 520 MBytes 436 Mbits/sec

# still on instance-1, now against the Data Center UpLink Router

ubuntu@instance-1:~$ iperf -c 172.16.0.1
------------------------------------------------------------
Client connecting to 172.16.0.1, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.4<tel:192.168.210.4> port 38401 connected with 172.16.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 324 MBytes 271 Mbits/sec
---

This latest test shows only 271 Mbits/s! I think it should be at least, 400~430 MBits/s... Right?!

---

TEST #3 - Two instances on the same hypervisor

# iperf server

ubuntu@instance-2:~$ ip route
default via 192.168.210.1<tel:192.168.210.1> dev eth0 metric 100
192.168.210.0/24<http://192.168.210.0/24> dev eth0 proto kernel scope link src 192.168.210.4<tel:192.168.210.4>

ubuntu@instance-2:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.210.4<tel:192.168.210.4> port 5001 connected with 192.168.210.2<tel:192.168.210.2> port 45800
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 4.61 GBytes 3.96 Gbits/sec

# iperf client

ubuntu@instance-1:~$ iperf -c 192.168.210.4<tel:192.168.210.4>
------------------------------------------------------------
Client connecting to 192.168.210.4<tel:192.168.210.4>, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2<tel:192.168.210.2> port 45800 connected with 192.168.210.4<tel:192.168.210.4> port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 4.61 GBytes 3.96 Gbits/sec
---

---

TEST #4 - Two instances on different hypervisors - over GRE

root@instance-2:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.210.4<tel:192.168.210.4> port 5001 connected with 192.168.210.2<tel:192.168.210.2> port 34640
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 237 MBytes 198 Mbits/sec


root@instance-1:~# iperf -c 192.168.210.4
------------------------------------------------------------
Client connecting to 192.168.210.4, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2 port 34640 connected with 192.168.210.4 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 237 MBytes 198 Mbits/sec
---

I just realized how slow my intra-cloud (VM-to-VM) communication is... :-/
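
Next time I repeat Test #4 I'll also watch the GRE carrier interface on the hypervisors while the transfer runs - a rough sketch, with eth1 standing in for whatever NIC holds the 10.20.2.x tunnel endpoints here:

# on hypervisor-1, during the instance-to-instance iperf
sudo tcpdump -nei eth1 -c 20 'ip proto 47'               # 47 = GRE; check the encapsulated frame sizes
sudo tcpdump -ni eth1 'icmp[icmptype] == icmp-unreach'   # any "fragmentation needed" coming back?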

---

TEST #5 - Two hypervisors - "GRE TUNNEL LAN" - OVS local_ip / remote_ip

# Same path of "TEST #4" but, testing the physical GRE path (where GRE traffic flows)

root@hypervisor-2:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.20.2.57 port 5001 connected with 10.20.2.53 port 51694
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec

root@hypervisor-1:~# iperf -c 10.20.2.57
------------------------------------------------------------
Client connecting to 10.20.2.57, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[ 3] local 10.20.2.53 port 51694 connected with 10.20.2.57 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec
---

About Test #5, I don't know why the GRE traffic (Test #4) doesn't reach 1 Gbit/sec (only ~200 Mbit/s?), since its physical path is much faster (gigabit LAN). Plus, Test #3 shows a pretty fast speed when traffic flows only within a hypervisor (3.96 Gbit/sec).
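
One thing I want to rule out on the Test #4 path is fragmentation of the encapsulated packets: with the physical NICs at MTU 1500, a full-size instance frame plus the outer IP/GRE headers (a few tens of bytes, depending on the GRE options) no longer fits in the underlay. A rough check from inside an instance - the sizes below assume the guest still has a 1500-byte MTU, adjust if DHCP pushed something lower:

# from instance-1 towards instance-2 (different hypervisor, over GRE)
ip link show eth0                        # what MTU does the guest actually have?
ping -c 3 -M do -s 1472 192.168.210.4    # 1472 + 28 = 1500: full-size probe with DF set
ping -c 3 -M do -s 1372 192.168.210.4    # leaves ~100 bytes of headroom for the tunnel

If the big probe stalls or errors while the small one flies, full-size frames are being eaten somewhere on the tunnel path, which would fit the ~200 Mbit/s number.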

Tomorrow, I'll redo these tests with netperf.

NOTE: I'm using Open vSwitch 1.11.0, compiled for Ubuntu 12.04.3 via "dpkg-buildpackage" and installed the "Debian / Ubuntu way". If I downgrade to 1.10.2 from the Havana Cloud Archive, I get the same results... I can downgrade it if you guys tell me to do so.
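
For completeness, this is how I'm confirming which OVS userspace and kernel datapath are actually in use on each hypervisor - plain status commands, nothing exotic (the module name can differ with the packaging):

ovs-vsctl --version              # userspace (1.11.0 here)
sudo ovs-vsctl show              # bridges and the GRE ports on br-tun
sudo ovs-dpctl show              # kernel datapath and attached ports
modinfo openvswitch | head -n 3  # which datapath module is loaded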

BTW, I'll install another "Region", based on Havana on Ubuntu 13.10, with exactly the same configuration as my current Havana + Ubuntu 12.04.3, on top of the same hardware, to see if the problem still persists.

Regards,
Thiago

On 23 October 2013 22:40, Aaron Rosen <arosen@nicira.com> wrote:


On Mon, Oct 21, 2013 at 11:52 PM, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:
James,

I think I'm hitting this problem.

I'm using "Per-Tenant Routers with Private Networks", GRE tunnels and L3+DHCP Network Node.

The connectivity from behind my Instances is very slow. It takes an eternity to finish "apt-get update".


I'm curious if you can do the following tests to help pinpoint the bottleneck:

Run iperf or netperf between:
two instances on the same hypervisor - if the performance is bad here, that points to a virtualization driver issue.
two instances on different hypervisors.
one instance to the namespace of the l3 agent.






If I run "apt-get update" from within tenant's Namespace, it goes fine.

If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working and I and unable to start new Ubuntu Instances and login into them... Look:

--
cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up 4.01 seconds
2013-10-22 06:01:42,989 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: url error [[Errno 113] No route to host]
2013-10-22 06:01:45,988 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [6/120s]: url error [[Errno 113] No route to host]
--


Do you see anything interesting in the neutron-metadata-agent log? Or it looks like your instance doesn't have a route to the default gw?
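
A few quick checks usually narrow that down - the router UUID below is just the one quoted earlier in this thread (substitute your own), and the log file names vary a little between packagings:

# from the instance console: can it reach the metadata IP at all?
curl -sv http://169.254.169.254/2009-04-04/meta-data/instance-id

# on the network node: is the metadata proxy listening inside the router
# namespace (usually on port 9697), and is the NAT rule for 169.254.169.254 there?
sudo ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 netstat -lntp
sudo ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iptables -t nat -S | grep 169.254
grep -i error /var/log/neutron/*.log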


Is this problem still around?!

Should I stay away from GRE tunnels with Havana + Ubuntu 12.04.3?

Is it possible to re-enable Metadata when ovs_use_veth = true ?

Thanks!
Thiago

On 3 October 2013 06:27, James Page <james.page@ubuntu.com> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
On 02/10/13 22:49, James Page wrote:
>> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
>>> traceroute -n 10.5.0.2 -p 44444 --mtu traceroute to 10.5.0.2
>>> (10.5.0.2), 30 hops max, 65000 byte packets 1 10.5.0.2 0.950
>>> ms F=1500 0.598 ms 0.566 ms
>>>
>>> The PMTU from the l3 gateway to the instance looks OK to me.
> I spent a bit more time debugging this; performance from within
> the router netns on the L3 gateway node looks good in both
> directions when accessing via the tenant network (10.5.0.2) over
> the qr-XXXXX interface, but when accessing through the external
> network from within the netns I see the same performance choke
> upstream into the tenant network.
>
> Which would indicate that my problem lies somewhere around the
> qg-XXXXX interface in the router netns - just trying to figure out
> exactly what - maybe iptables is doing something wonky?
OK - I found a fix but I'm not sure why this makes a difference;
neither my l3-agent nor dhcp-agent configuration had 'ovs_use_veth =
True'; I switched this on, cleared everything down, rebooted, and now
I see symmetric, good performance across all neutron routers.

This would point to some sort of underlying bug when ovs_use_veth = False.
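
For anyone wanting to reproduce the change, this is roughly what it amounts to on the network node - file locations and service names below are the stock Ubuntu ones, adjust if yours differ:

# in /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini, [DEFAULT] section:
ovs_use_veth = True

# then restart the agents:
sudo service neutron-l3-agent restart
sudo service neutron-dhcp-agent restart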


- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQIcBAEBCAAGBQJSTTh6AAoJEL/srsug59jDmpEP/jaB5/yn9+Xm12XrVu0Q3IV5
fLGOuBboUgykVVsfkWccI/oygNlBaXIcDuak/E4jxPcoRhLAdY1zpX8MQ8wSsGKd
CjSeuW8xxnXubdfzmsCKSs3FCIBhDkSYzyiJd/raLvCfflyy8Cl7KN2x22mGHJ6z
qZ9APcYfm9qCVbEssA3BHcUL+st1iqMJ0YhVZBk03+QEXaWu3FFbjpjwx3X1ZvV5
Vbac7enqy7Lr4DSAIJVldeVuRURfv3YE3iJZTIXjaoUCCVTQLm5OmP9TrwBNHLsA
7W+LceQri+Vh0s4dHPKx5MiHsV3RCydcXkSQFYhx7390CXypMQ6WwXEY/a8Egssg
SuxXByHwEcQFa+9sCwPQ+RXCmC0O6kUi8EPmwadjI5Gc1LoKw5Wov/SEen86fDUW
P9pRXonseYyWN9I4MT4aG1ez8Dqq/SiZyWBHtcITxKI2smD92G9CwWGo4L9oGqJJ
UcHRwQaTHgzy3yETPO25hjax8ZWZGNccHBixMCZKegr9p2dhR+7qF8G7mRtRQLxL
0fgOAExn/SX59ZT4RaYi9fI6Gng13RtSyI87CJC/50vfTmqoraUUK1aoSjIY4Dt+
DYEMMLp205uLEj2IyaNTzykR0yh3t6dvfpCCcRA/xPT9slfa0a7P8LafyiWa4/5c
jkJM4Y1BUV+2L5Rrf3sc
=4lO4
-----END PGP SIGNATURE-----

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Ok, so that says that PMTUd is failing, probably due to a
bug/limitation in openvswitch. Can we please make sure a bug is filed
- both on Neutron and on the upstream component - as soon as someone
tracks it down: manual MTU lowering is only needed when a network
component is failing to report failed delivery of DF packets
correctly.
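
For reference, a quick way to confirm PMTUD really is broken on a given path - run from the sending side, with 1472 assuming a 1500-byte local MTU:

ping -c 3 -M do -s 1472 <destination>    # full-size probe with DF set
tracepath <destination>                  # reports whatever path MTU it manages to discover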

-Rob

On 25 October 2013 08:38, Speichert,Daniel <djs428@drexel.edu> wrote:
> We managed to bring the upload speed back to maximum on the instances
> through the use of this guide:
>
> http://docs.openstack.org/trunk/openstack-network/admin/content/openvswitch_plugin.html
>
>
>
> Basically, the MTU needs to be lowered for GRE tunnels. It can be done with
> DHCP as explained in the new trunk manual.
>
>
>
> Regards,
>
> Daniel
>
>
>
> From: annegentle@justwriteclick.com [mailto:annegentle@justwriteclick.com]
> On Behalf Of Anne Gentle
> Sent: Thursday, October 24, 2013 12:08 PM
> To: Martinx - ジェームズ
> Cc: Speichert,Daniel; openstack@lists.openstack.org
>
>
> Subject: Re: [Openstack] Directional network performance issues with Neutron
> + OpenvSwitch
>
>
>
>
>
>
>
> On Thu, Oct 24, 2013 at 10:37 AM, Martinx - ジェームズ
> <thiagocmartinsc@gmail.com> wrote:
>
> Precisely!
>
>
>
> The doc currently says to disable Namespace when using GRE - I never did this
> before - look:
>
>
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html
>
>
>
> But on this very same doc, they say to enable it... Who knows?! =P
>
>
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
>
>
>
> I stick with Namespace enabled...
>
>
>
>
>
> Just a reminder, /trunk/ links are works in progress, thanks for bringing
> the mismatch to our attention, and we already have a doc bug filed:
>
>
>
> https://bugs.launchpad.net/openstack-manuals/+bug/1241056
>
>
>
> Review this patch: https://review.openstack.org/#/c/53380/
>
>
>
> Anne
>
>
>
>
>
>
>
> Let me ask you something: when you enable ovs_use_veth, do Metadata and
> DHCP still work?!
>
>
>
> Cheers!
>
> Thiago
>
>
>
> On 24 October 2013 12:22, Speichert,Daniel <djs428@drexel.edu> wrote:
>
> Hello everyone,
>
>
>
> It seems we also ran into the same issue.
>
>
>
> We are running Ubuntu Saucy with OpenStack Havana from Ubuntu Cloud archives
> (precise-updates).
>
>
>
> The download speed to the VMs increased from 5 Mbps to maximum after
> enabling ovs_use_veth. Upload speed from the VMs is still terrible (max 1
> Mbps, usually 0.04 Mbps).
>
>
>
> Here is the iperf between the instance and L3 agent (network node) inside
> namespace.
>
>
>
> root@cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a
> iperf -c 10.1.0.24 -r
>
> ------------------------------------------------------------
>
> Server listening on TCP port 5001
>
> TCP window size: 85.3 KByte (default)
>
> ------------------------------------------------------------
>
> ------------------------------------------------------------
>
> Client connecting to 10.1.0.24, TCP port 5001
>
> TCP window size: 585 KByte (default)
>
> ------------------------------------------------------------
>
> [ 7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001
>
> [ ID] Interval Transfer Bandwidth
>
> [ 7] 0.0-10.0 sec 845 MBytes 708 Mbits/sec
>
> [ 6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006
>
> [ 6] 0.0-31.4 sec 256 KBytes 66.7 Kbits/sec
>
>
>
> We are using Neutron OpenVSwitch with GRE and namespaces.
>
>
> A side question: the documentation says to disable namespaces with GRE and
> enable them with VLANs. It was always working well for us on Grizzly with
> GRE and namespaces and we could never get it to work without namespaces. Is
> there any specific reason why the documentation is advising to disable it?
>
>
>
> Regards,
>
> Daniel
>
>
>



--
Robert Collins <rbtcollins@hp.com>
Distinguished Technologist
HP Converged Cloud

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi Daniel,

I followed that page; my Instances' MTU is lowered by the DHCP Agent but the
result is the same: poor network performance (both between Instances and when
trying to reach the Internet).

Whether I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf" plus
"dhcp-option-force=26,1400" for my Neutron DHCP agent, or not (i.e. MTU =
1500), the result is almost the same.
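
For the record, this is the shape of the config I'm referring to, plus how I check that the option really reaches the guests (the 1400 is just the value I tried, not a recommendation):

# /etc/neutron/dhcp_agent.ini, [DEFAULT] section:
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf:
dhcp-option-force=26,1400

# restart the DHCP agent, renew the lease (or reboot the instance), then on the guest:
sudo service neutron-dhcp-agent restart
ip link show eth0    # should now report mtu 1400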

I'll try VXLAN (or just VLANs) this weekend to see if I can get better
results...

Thanks!
Thiago




Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi Thiago,

you have configured DHCP to push out an MTU of 1400. Can you confirm that the 1400 MTU is actually getting to the instances by running 'ip link' on them?

There is an open problem where the veth used to connect the OVS and Linux bridges causes a performance drop on some kernels - https://bugs.launchpad.net/nova-project/+bug/1223267. If you are using the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to LibvirtOpenVswitchDriver and repeating the iperf test between instances on different compute nodes?
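
If it helps, the switch is a one-line change on each compute node - the option name below is the Havana-era libvirt one and was renamed in later releases, so double-check it against your nova.conf before copying:

# /etc/nova/nova.conf, [DEFAULT] section, on the compute nodes:
libvirt_vif_driver = nova.virt.libvirt.vif.LibvirtOpenVswitchDriver

sudo service nova-compute restart
# then hard-reboot (or re-create) the test instances so their vifs get re-plugged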

What NICs (make + model) are you using? You could try disabling any offload functionality - 'ethtool -k <iface-used-for-gre>' lists what is currently enabled.
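
Something along these lines - eth1 standing in for whatever interface carries the GRE traffic, and not every NIC exposes every feature:

ethtool -k eth1                                  # list the current offload settings
sudo ethtool -K eth1 gro off gso off tso off     # disable the usual suspects, then re-run iperf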

What kernel are you using: 'uname -a'?

Re, Darragh.

> Hi Daniel,

>
> I followed that page, my Instances MTU is lowered by DHCP Agent but, same
> result: poor network performance (internal between Instances and when
> trying to reach the Internet).
>
> No matter if I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf +
> "dhcp-option-force=26,1400"" for my Neutron DHCP agent, or not (i.e. MTU =
> 1500), the result is almost the same.
>
> I'll try VXLAN (or just VLANs) this weekend to see if I can get better
> results...
>
> Thanks!
> Thiago

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Thiago,
It looks like you have a slightly different problem. I didn't have any slowdown in the connection between instances.

You might want to try this: https://ask.openstack.org/en/question/6140/quantum-neutron-gre-slow-performance/?answer=6320#post-id-6320

Regards,
Daniel

From: Martinx - $B%8%'!<%`%:(B [mailto:thiagocmartinsc@gmail.com]
Sent: Thursday, October 24, 2013 11:59 PM
To: Speichert,Daniel
Cc: Anne Gentle; openstack@lists.openstack.org
Subject: Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

Hi Daniel,

I followed that page, my Instances MTU is lowered by DHCP Agent but, same result: poor network performance (internal between Instances and when trying to reach the Internet).

No matter if I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf + "dhcp-option-force=26,1400"" for my Neutron DHCP agent, or not (i.e. MTU = 1500), the result is almost the same.

I'll try VXLAN (or just VLANs) this weekend to see if I can get better results...

Thanks!
Thiago



On 24 October 2013 17:38, Speichert,Daniel <djs428@drexel.edu<mailto:djs428@drexel.edu>> wrote:
We managed to bring the upload speed back to maximum on the instances through the use of this guide:
http://docs.openstack.org/trunk/openstack-network/admin/content/openvswitch_plugin.html

Basically, the MTU needs to be lowered for GRE tunnels. It can be done with DHCP as explained in the new trunk manual.

Regards,
Daniel

From: annegentle@justwriteclick.com<mailto:annegentle@justwriteclick.com> [mailto:annegentle@justwriteclick.com<mailto:annegentle@justwriteclick.com>] On Behalf Of Anne Gentle
Sent: Thursday, October 24, 2013 12:08 PM
To: Martinx - $B%8%'!<%`%:(B
Cc: Speichert,Daniel; openstack@lists.openstack.org<mailto:openstack@lists.openstack.org>

Subject: Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch



On Thu, Oct 24, 2013 at 10:37 AM, Martinx - $B%8%'!<%`%:(B <thiagocmartinsc@gmail.com<mailto:thiagocmartinsc@gmail.com>> wrote:
Precisely!

The doc currently says to disable Namespace when using GRE, never did this before, look:

http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html

But on this very same doc, they say to enable it... Who knows?! =P

http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html

I stick with Namespace enabled...


Just a reminder, /trunk/ links are works in progress, thanks for bringing the mismatch to our attention, and we already have a doc bug filed:

https://bugs.launchpad.net/openstack-manuals/+bug/1241056

Review this patch: https://review.openstack.org/#/c/53380/

Anne



Let me ask you something, when you enable ovs_use_veth, que Metadata and DHCP still works?!

Cheers!
Thiago

On 24 October 2013 12:22, Speichert,Daniel <djs428@drexel.edu<mailto:djs428@drexel.edu>> wrote:
Hello everyone,

It seems we also ran into the same issue.

We are running Ubuntu Saucy with OpenStack Havana from Ubuntu Cloud archives (precise-updates).

The download speed to the VMs increased from 5 Mbps to maximum after enabling ovs_use_veth. Upload speed from the VMs is still terrible (max 1 Mbps, usually 0.04 Mbps).

Here is the iperf between the instance and L3 agent (network node) inside namespace.

root@cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a iperf -c 10.1.0.24 -r
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.1.0.24, TCP port 5001
TCP window size: 585 KByte (default)
------------------------------------------------------------
[ 7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001
[ ID] Interval Transfer Bandwidth
[ 7] 0.0-10.0 sec 845 MBytes 708 Mbits/sec
[ 6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006
[ 6] 0.0-31.4 sec 256 KBytes 66.7 Kbits/sec

We are using Neutron OpenVSwitch with GRE and namespaces.

A side question: the documentation says to disable namespaces with GRE and enable them with VLANs. It was always working well for us on Grizzly with GRE and namespaces and we could never get it to work without namespaces. Is there any specific reason why the documentation is advising to disable it?

Regards,
Daniel

From: Martinx - $B%8%'!<%`%:(B [mailto:thiagocmartinsc@gmail.com<mailto:thiagocmartinsc@gmail.com>]
Sent: Thursday, October 24, 2013 3:58 AM
To: Aaron Rosen
Cc: openstack@lists.openstack.org<mailto:openstack@lists.openstack.org>

Subject: Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

Hi Aaron,

Thanks for answering! =)

Lets work...

---

TEST #1 - iperf between Network Node and its Uplink router (Data Center's gateway "Internet") - OVS br-ex / eth2

# Tenant Namespace route table

root@net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 ip route
default via 172.16.0.1 dev qg-50b615b7-c2
172.16.0.0/20<http://172.16.0.0/20> dev qg-50b615b7-c2 proto kernel scope link src 172.16.0.2
192.168.210.0/24<http://192.168.210.0/24> dev qr-a1376f61-05 proto kernel scope link src 192.168.210.1<tel:192.168.210.1>

# there is a "iperf -s" running at 172.16.0.1 "Internet", testing it

root@net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -c 172.16.0.1
------------------------------------------------------------
Client connecting to 172.16.0.1, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[ 5] local 172.16.0.2 port 58342 connected with 172.16.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 668 MBytes 559 Mbits/sec
---

---

TEST #2 - iperf on one instance to the Namespace of the L3 agent + uplink router

# iperf server running within Tenant's Namespace router

root@net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -s

-

# from instance-1

ubuntu@instance-1:~$ ip route
default via 192.168.210.1<tel:192.168.210.1> dev eth0 metric 100
192.168.210.0/24<http://192.168.210.0/24> dev eth0 proto kernel scope link src 192.168.210.2<tel:192.168.210.2>

# instance-1 performing tests against net-node-1 Namespace above

ubuntu@instance-1:~$ iperf -c 192.168.210.1<tel:192.168.210.1>
------------------------------------------------------------
Client connecting to 192.168.210.1<tel:192.168.210.1>, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2<tel:192.168.210.2> port 43739 connected with 192.168.210.1<tel:192.168.210.1> port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 484 MBytes 406 Mbits/sec

# still on instance-1, now against "External IP" of its own Namespace / Router

ubuntu@instance-1:~$ iperf -c 172.16.0.2
------------------------------------------------------------
Client connecting to 172.16.0.2, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2<tel:192.168.210.2> port 34703 connected with 172.16.0.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 520 MBytes 436 Mbits/sec

# still on instance-1, now against the Data Center UpLink Router

ubuntu@instance-1:~$ iperf -c 172.16.0.1
------------------------------------------------------------
Client connecting to 172.16.0.1, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.4<tel:192.168.210.4> port 38401 connected with 172.16.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 324 MBytes 271 Mbits/sec
---

This latest test shows only 271 Mbits/s! I think it should be at least, 400~430 MBits/s... Right?!

---

TEST #3 - Two instances on the same hypervisor

# iperf server

ubuntu@instance-2:~$ ip route
default via 192.168.210.1<tel:192.168.210.1> dev eth0 metric 100
192.168.210.0/24<http://192.168.210.0/24> dev eth0 proto kernel scope link src 192.168.210.4<tel:192.168.210.4>

ubuntu@instance-2:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.210.4<tel:192.168.210.4> port 5001 connected with 192.168.210.2<tel:192.168.210.2> port 45800
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 4.61 GBytes 3.96 Gbits/sec

# iperf client

ubuntu@instance-1:~$ iperf -c 192.168.210.4<tel:192.168.210.4>
------------------------------------------------------------
Client connecting to 192.168.210.4<tel:192.168.210.4>, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2<tel:192.168.210.2> port 45800 connected with 192.168.210.4<tel:192.168.210.4> port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 4.61 GBytes 3.96 Gbits/sec
---

---

TEST #4 - Two instances on different hypervisors - over GRE

root@instance-2:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.210.4<tel:192.168.210.4> port 5001 connected with 192.168.210.2<tel:192.168.210.2> port 34640
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 237 MBytes 198 Mbits/sec


root@instance-1:~# iperf -c 192.168.210.4<tel:192.168.210.4>
------------------------------------------------------------
Client connecting to 192.168.210.4<tel:192.168.210.4>, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.210.2<tel:192.168.210.2> port 34640 connected with 192.168.210.4<tel:192.168.210.4> port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 237 MBytes 198 Mbits/sec
---

I just realized how slow is my intra-cloud (intra-VM) communication... :-/

---

TEST #5 - Two hypervisors - "GRE TUNNEL LAN" - OVS local_ip / remote_ip

# Same path of "TEST #4" but, testing the physical GRE path (where GRE traffic flows)

root@hypervisor-2:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
n[ 4] local 10.20.2.57 port 5001 connected with 10.20.2.53 port 51694
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec

root@hypervisor-1:~# iperf -c 10.20.2.57
------------------------------------------------------------
Client connecting to 10.20.2.57, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[ 3] local 10.20.2.53 port 51694 connected with 10.20.2.57 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec
---

About Test #5, I don't know why the GRE traffic (Test #4) doesn't reach 1Gbit/sec (only ~200Mbit/s ?), since its physical path is much faster (GIGALan). Plus, Test #3 shows a pretty fast speed when traffic flows only within a hypervisor (3.96Gbit/sec).

Tomorrow, I'll do this tests with netperf.

NOTE: I'm using Open vSwitch 1.11.0, compiled for Ubuntu 12.04.3, via "dpkg-buildpackage" and installed via "Debian / Ubuntu way". If I downgrade to 1.10.2 from Havana Cloud Archive, same results... I can downgrade it, if you guys tell me to do so.

BTW, I'll install another "Region", based on Havana on Ubuntu 13.10, with exactly the same configurations from my current Havana + Ubuntu 12.04.3, on top of the same hardware, to see if the problem still persist.

Regards,
Thiago

On 23 October 2013 22:40, Aaron Rosen <arosen@nicira.com<mailto:arosen@nicira.com>> wrote:


On Mon, Oct 21, 2013 at 11:52 PM, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:
James,

I think I'm hitting this problem.

I'm using "Per-Tenant Routers with Private Networks", GRE tunnels and L3+DHCP Network Node.

The connectivity from behind my Instances is very slow. It takes an eternity to finish "apt-get update".


I'm curious if you can do the following tests to help pinpoint the bottleneck:

Run iperf or netperf between:
two instances on the same hypervisor - if performance is bad here, it points at the virtualization driver.
two instances on different hypervisors.
one instance and the namespace of the l3 agent (rough example commands below).
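Something along these lines should cover the three cases - the router UUID is a placeholder and the IPs are just examples:

---
# on the receiving instance (or inside the router namespace on the network node)
iperf -s
ip netns exec qrouter-<router-uuid> iperf -s

# on the sending instance
iperf -c 192.168.210.4 -t 10
---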






If I run "apt-get update" from within tenant's Namespace, it goes fine.

If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working and I and unable to start new Ubuntu Instances and login into them... Look:

--
cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up 4.01 seconds
2013-10-22 06:01:42,989 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: url error [[Errno 113] No route to host]
2013-10-22 06:01:45,988 - util.py[WARNING]: 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [6/120s]: url error [[Errno 113] No route to host]
--


Do you see anything interesting in the neutron-metadata-agent log? Or does it look like your instance doesn't have a route to the default gw?
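A quick way to check that from inside a booted instance would be something like the following (the expected gateway address is just an example):

---
ip route    # should show a default route via the tenant router, e.g. 192.168.210.1
curl -v http://169.254.169.254/2009-04-04/meta-data/instance-id
---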


Is this problem still around?!

Should I stay away from GRE tunnels when with Havana + Ubuntu 12.04.3?

Is it possible to re-enable Metadata when ovs_use_veth = true ?
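For reference, the flag I'm toggling is the one in the agent configs - a minimal sketch, assuming the stock Ubuntu/Havana file layout:

---
# /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini
[DEFAULT]
ovs_use_veth = True
---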

Thanks!
Thiago

On 3 October 2013 06:27, James Page <james.page@ubuntu.com> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
On 02/10/13 22:49, James Page wrote:
>> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
>>> traceroute -n 10.5.0.2 -p 44444 --mtu traceroute to 10.5.0.2
>>> (10.5.0.2), 30 hops max, 65000 byte packets 1 10.5.0.2 0.950
>>> ms F=1500 0.598 ms 0.566 ms
>>>
>>> The PMTU from the l3 gateway to the instance looks OK to me.
> I spent a bit more time debugging this; performance from within
> the router netns on the L3 gateway node looks good in both
> directions when accessing via the tenant network (10.5.0.2) over
> the qr-XXXXX interface, but when accessing through the external
> network from within the netns I see the same performance choke
> upstream into the tenant network.
>
> Which would indicate that my problem lies somewhere around the
> qg-XXXXX interface in the router netns - just trying to figure out
> exactly what - maybe iptables is doing something wonky?
OK - I found a fix but I'm not sure why this makes a difference;
neither my l3-agent nor dhcp-agent configuration had 'ovs_use_veth =
True'; I switched this on, cleared everything down, rebooted and now
I see symmetric, good performance across all neutron routers.

This would point to some sort of underlying bug when ovs_use_veth = False.


- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQIcBAEBCAAGBQJSTTh6AAoJEL/srsug59jDmpEP/jaB5/yn9+Xm12XrVu0Q3IV5
fLGOuBboUgykVVsfkWccI/oygNlBaXIcDuak/E4jxPcoRhLAdY1zpX8MQ8wSsGKd
CjSeuW8xxnXubdfzmsCKSs3FCIBhDkSYzyiJd/raLvCfflyy8Cl7KN2x22mGHJ6z
qZ9APcYfm9qCVbEssA3BHcUL+st1iqMJ0YhVZBk03+QEXaWu3FFbjpjwx3X1ZvV5
Vbac7enqy7Lr4DSAIJVldeVuRURfv3YE3iJZTIXjaoUCCVTQLm5OmP9TrwBNHLsA
7W+LceQri+Vh0s4dHPKx5MiHsV3RCydcXkSQFYhx7390CXypMQ6WwXEY/a8Egssg
SuxXByHwEcQFa+9sCwPQ+RXCmC0O6kUi8EPmwadjI5Gc1LoKw5Wov/SEen86fDUW
P9pRXonseYyWN9I4MT4aG1ez8Dqq/SiZyWBHtcITxKI2smD92G9CwWGo4L9oGqJJ
UcHRwQaTHgzy3yETPO25hjax8ZWZGNccHBixMCZKegr9p2dhR+7qF8G7mRtRQLxL
0fgOAExn/SX59ZT4RaYi9fI6Gng13RtSyI87CJC/50vfTmqoraUUK1aoSjIY4Dt+
DYEMMLp205uLEj2IyaNTzykR0yh3t6dvfpCCcRA/xPT9slfa0a7P8LafyiWa4/5c
jkJM4Y1BUV+2L5Rrf3sc
=4lO4
-----END PGP SIGNATURE-----

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi Darragh,

Yes, Instances are getting MTU 1400.

I'm using LibvirtHybridOVSBridgeDriver at my Compute Nodes. I'll check bug
1223267 right now!


The LibvirtOpenVswitchDriver doesn't work, look:

http://paste.openstack.org/show/49709/

http://paste.openstack.org/show/49710/


My NICs are "RTL8111/8168/8411 PCI Express Gigabit Ethernet", Hypervisors
motherboard are MSI-890FXA-GD70.

The command "ethtool -K eth1 gro off" did not had any effect on the
communication between instances on different hypervisors, still poor,
around 248Mbit/sec, when its physical path reach 1Gbit/s (where GRE is
built).
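In case it matters, these are the other offloads I could try toggling on the GRE-facing interface (just a sketch; which features the r8169 driver actually exposes is an assumption):

---
# list current offload settings
ethtool -k eth1

# disable segmentation and checksum offloads as well as GRO
ethtool -K eth1 tso off gso off gro off
ethtool -K eth1 rx off tx off
---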

My Linux version is "Linux hypervisor-1 3.8.0-32-generic
#47~precise1-Ubuntu", same kernel on Network Node" and others nodes too
(Ubuntu 12.04.3 installed from scratch for this Havana deployment).

The only difference I can see right now between my two hypervisors is
that the second one is just a spare machine with a slow CPU, but I don't think
it has a negative impact on network throughput, since I have only
1 Instance running on it (plus a qemu-nbd process eating 90% of its CPU).
I'll replace this CPU tomorrow and redo these tests, but I don't think
that this is the source of my problem. The motherboards of the two hypervisors
are identical, with one 3Com (managed) switch connecting the two.

Thanks!
Thiago


On 25 October 2013 07:15, Darragh O'Reilly <dara2002-openstack@yahoo.com> wrote:

> Hi Thiago,
>
> you have configured DHCP to push out a MTU of 1400. Can you confirm that
> the 1400 MTU is actually getting out to the instances by running 'ip link'
> on them?
>
> There is an open problem where the veth used to connect the OVS and Linux
> bridges causes a performance drop on some kernels -
> https://bugs.launchpad.net/nova-project/+bug/1223267 . If you are using
> the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to
> LibvirtOpenVswitchDriver and repeat the iperf test between instances on
> different compute-nodes.
>
> What NICs (maker+model) are you using? You could try disabling any
> off-load functionality - 'ethtool -k <iface-used-for-gre>'.
>
> What kernal are you using: 'uname -a'?
>
> Re, Darragh.
>
> > Hi Daniel,
>
> >
> > I followed that page, my Instances MTU is lowered by DHCP Agent but, same
> > result: poor network performance (internal between Instances and when
> > trying to reach the Internet).
> >
> > No matter if I use
> "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf +
> > "dhcp-option-force=26,1400"" for my Neutron DHCP agent, or not (i.e. MTU
> =
> > 1500), the result is almost the same.
> >
> > I'll try VXLAN (or just VLANs) this weekend to see if I can get better
> > results...
> >
> > Thanks!
> > Thiago
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi Thiago,

for the VIF error: you will need to change qemu.conf as described here:
http://openvswitch.org/openstack/documentation/
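From memory, the relevant part of that page is adding /dev/net/tun to the device ACL in /etc/libvirt/qemu.conf on the compute nodes and restarting libvirt - treat this as a sketch rather than the exact documented text:

---
# /etc/libvirt/qemu.conf
cgroup_device_acl = [
    "/dev/null", "/dev/full", "/dev/zero",
    "/dev/random", "/dev/urandom",
    "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
    "/dev/rtc", "/dev/hpet", "/dev/net/tun"
]
---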

Re, Darragh.




On Friday, 25 October 2013, 15:14, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:

Hi Darragh,
>
>
>Yes, Instances are getting MTU 1400.
>
>
>I'm using LibvirtHybridOVSBridgeDriver at my Compute Nodes. I'll check BG 1223267 right now! 
>
>
>
>
>The LibvirtOpenVswitchDriver doesn't work, look:
>
>
>http://paste.openstack.org/show/49709/
>
>
>
>http://paste.openstack.org/show/49710/
>
>
>
>
>
>My NICs are "RTL8111/8168/8411 PCI Express Gigabit Ethernet", Hypervisors motherboard are MSI-890FXA-GD70.
>
>
>The command "ethtool -K eth1 gro off" did not had any effect on the communication between instances on different hypervisors, still poor, around 248Mbit/sec, when its physical path reach 1Gbit/s (where GRE is built).
>
>
>My Linux version is "Linux hypervisor-1 3.8.0-32-generic #47~precise1-Ubuntu", same kernel on Network Node" and others nodes too (Ubuntu 12.04.3 installed from scratch for this Havana deployment).
>
>
>The only difference I can see right now, between my two hypervisors, is that my second is just a spare machine, with a slow CPU but, I don't think it will have a negative impact at the network throughput, since I have only 1 Instance running into it (plus a qemu-nbd process eating 90% of its CPU). I'll replace this CPU tomorrow, to redo this tests again but, I don't think that this is the source of my problem. The MOBOs of two hypervisors are identical, 1 3Com (manageable) switch connecting the two.
>
>
>Thanks!
>Thiago
>
>
>
>On 25 October 2013 07:15, Darragh O'Reilly <dara2002-openstack@yahoo.com> wrote:
>
>Hi Thiago,
>>
>>you have configured DHCP to push out a MTU of 1400. Can you confirm that the 1400 MTU is actually getting out to the instances by running 'ip link' on them?
>>
>>There is an open problem where the veth used to connect the OVS and Linux bridges causes a performance drop on some kernels - https://bugs.launchpad.net/nova-project/+bug/1223267 .  If you are using the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to LibvirtOpenVswitchDriver and repeat the iperf test between instances on different compute-nodes.
>>
>>What NICs (maker+model) are you using? You could try disabling any off-load functionality - 'ethtool -k <iface-used-for-gre>'.
>>
>>What kernal are you using: 'uname -a'?
>>
>>Re, Darragh.
>>
>>
>>> Hi Daniel,
>>
>>>
>>> I followed that page, my Instances MTU is lowered by DHCP Agent but, same
>>> result: poor network performance (internal between Instances and when
>>> trying to reach the Internet).
>>>
>>> No matter if I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf +
>>> "dhcp-option-force=26,1400"" for my Neutron DHCP agent, or not (i.e. MTU =
>>> 1500), the result is almost the same.
>>>
>>> I'll try VXLAN (or just VLANs) this weekend to see if I can get better
>>> results...
>>>
>>> Thanks!
>>> Thiago
>>
>>
>>_______________________________________________
>>Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>Post to     : openstack@lists.openstack.org
>>Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>
>
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Daniel,

Honestly, I think I have two problems, first one is related to "instances
trying to reach the Internet", that traffic that pass trough Network Node
(L3 + Namespace), which is vey, very slow. It is impossible to run "apt-get
update" from within a Instance, for example, takes an eternity to finish,
no MTU problems detected with tcpdump at the L3, it must be something else.
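The tcpdump I ran was along these lines - the router UUID and the qg- interface name are placeholders for the real ones on my Network Node:

---
ip netns exec qrouter-<router-uuid> tcpdump -n -i qg-XXXXXXXX 'icmp or tcp'
---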

The second problem is related to the communication between two instances
on different hypervisors, which I only realized after doing more tests.

Do you think that those two problems are, in fact, the same (or related)?

Thanks!
Thiago

On 25 October 2013 10:51, Speichert,Daniel <djs428@drexel.edu> wrote:

> Thiago,
>
> It looks like you have a slightly different problem. I didn’t have any
> slowdown in the connection between instances.
>
> You might want to try this:
> https://ask.openstack.org/en/question/6140/quantum-neutron-gre-slow-performance/?answer=6320#post-id-6320
>
> Regards,
>
> Daniel
>
>
> *From:* Martinx - ジェームズ [mailto:thiagocmartinsc@gmail.com]
> *Sent:* Thursday, October 24, 2013 11:59 PM
> *To:* Speichert,Daniel
> *Cc:* Anne Gentle; openstack@lists.openstack.org
> *Subject:* Re: [Openstack] Directional network performance issues with
> Neutron + OpenvSwitch
>
> Hi Daniel,
>
> I followed that page, my Instances MTU is lowered by DHCP Agent but, same
> result: poor network performance (internal between Instances and when
> trying to reach the Internet).
>
> No matter if I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf
> + "dhcp-option-force=26,1400"" for my Neutron DHCP agent, or not (i.e. MTU
> = 1500), the result is almost the same.
>
> I'll try VXLAN (or just VLANs) this weekend to see if I can get better
> results...
>
> Thanks!
>
> Thiago
>
> On 24 October 2013 17:38, Speichert,Daniel <djs428@drexel.edu> wrote:
>
> We managed to bring the upload speed back to maximum on the instances
> through the use of this guide:
>
> http://docs.openstack.org/trunk/openstack-network/admin/content/openvswitch_plugin.html
>
> Basically, the MTU needs to be lowered for GRE tunnels. It can be done
> with DHCP as explained in the new trunk manual.
>
> Regards,
>
> Daniel
>
>
> *From:* annegentle@justwriteclick.com [mailto:
> annegentle@justwriteclick.com] *On Behalf Of *Anne Gentle
> *Sent:* Thursday, October 24, 2013 12:08 PM
> *To:* Martinx - ジェームズ
> *Cc:* Speichert,Daniel; openstack@lists.openstack.org
> *Subject:* Re: [Openstack] Directional network performance issues with
> Neutron + OpenvSwitch
>
> On Thu, Oct 24, 2013 at 10:37 AM, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
>
> Precisely!
>
> The doc currently says to disable Namespace when using GRE, never did this
> before, look:
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html
>
> But on this very same doc, they say to enable it... Who knows?! =P
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
>
> I stick with Namespace enabled...
>
> Just a reminder, /trunk/ links are works in progress, thanks for bringing
> the mismatch to our attention, and we already have a doc bug filed:
>
> https://bugs.launchpad.net/openstack-manuals/+bug/1241056
>
> Review this patch: https://review.openstack.org/#/c/53380/
>
> Anne
>
> Let me ask you something, when you enable ovs_use_veth, do the Metadata and
> DHCP still work?!
>
> Cheers!
>
> Thiago
>
> On 24 October 2013 12:22, Speichert,Daniel <djs428@drexel.edu> wrote:
>
> Hello everyone,
>
> It seems we also ran into the same issue.
>
> We are running Ubuntu Saucy with OpenStack Havana from Ubuntu Cloud
> archives (precise-updates).
>
> The download speed to the VMs increased from 5 Mbps to maximum after
> enabling ovs_use_veth. Upload speed from the VMs is still terrible (max 1
> Mbps, usually 0.04 Mbps).
>
> Here is the iperf between the instance and L3 agent (network node) inside
> namespace.
>
> root@cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a
> iperf -c 10.1.0.24 -r
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> ------------------------------------------------------------
> Client connecting to 10.1.0.24, TCP port 5001
> TCP window size: 585 KByte (default)
> ------------------------------------------------------------
> [ 7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 7] 0.0-10.0 sec 845 MBytes 708 Mbits/sec
> [ 6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006
> [ 6] 0.0-31.4 sec 256 KBytes 66.7 Kbits/sec
>
> We are using Neutron OpenVSwitch with GRE and namespaces.
>
> A side question: the documentation says to disable namespaces with GRE and
> enable them with VLANs. It was always working well for us on Grizzly with
> GRE and namespaces and we could never get it to work without namespaces. Is
> there any specific reason why the documentation is advising to disable it?
>
> Regards,
>
> Daniel
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
I think can say... "YAY!!" :-D

With "LibvirtOpenVswitchDriver" my internal communication is the double
now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to
*400Mbit/s*(with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s (my
physical path
limit) but, more acceptable now.

The command "ethtool -K eth1 gro off" still makes no difference.

So, there is only 1 remaining problem: when traffic passes through the L3 /
Namespace, it is still useless. Even the SSH connection into my Instances,
via their Floating IPs, is slow as hell; sometimes it just stops responding
for a few seconds, and comes back online again "out of nothing"...

I just detect a weird "behavior", when I run "apt-get update" from
instance-1, it is slow as I said plus, its ssh connection (where I'm
running apt-get update), stops responding right after I run "apt-get
update" AND, *all my others ssh connections also stops working too!* For a
few seconds... This means that when I run "apt-get update" from within
instance-1, the SSH session of instance-2 is affected too!! There is
something pretty bad going on at L3 / Namespace.

BTW, do you think that ~400MBit/sec for VM-to-VM communication (GRE tunnel)
on top of 1Gbit ethernet is acceptable?! It is still less than half...

Thank you!
Thiago

On 25 October 2013 12:28, Darragh O'Reilly <dara2002-openstack@yahoo.com> wrote:

> Hi Thiago,
>
> for the VIF error: you will need to change qemu.conf as described here:
> http://openvswitch.org/openstack/documentation/
>
> Re, Darragh.
>
>
> On Friday, 25 October 2013, 15:14, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
>
> Hi Darragh,
>
> Yes, Instances are getting MTU 1400.
>
> I'm using LibvirtHybridOVSBridgeDriver at my Compute Nodes. I'll check BG
> 1223267 right now!
>
>
> The LibvirtOpenVswitchDriver doesn't work, look:
>
> http://paste.openstack.org/show/49709/
>
> http://paste.openstack.org/show/49710/
>
>
> My NICs are "RTL8111/8168/8411 PCI Express Gigabit Ethernet", Hypervisors
> motherboard are MSI-890FXA-GD70.
>
> The command "ethtool -K eth1 gro off" did not had any effect on the
> communication between instances on different hypervisors, still poor,
> around 248Mbit/sec, when its physical path reach 1Gbit/s (where GRE is
> built).
>
> My Linux version is "Linux hypervisor-1 3.8.0-32-generic
> #47~precise1-Ubuntu", same kernel on Network Node" and others nodes too
> (Ubuntu 12.04.3 installed from scratch for this Havana deployment).
>
> The only difference I can see right now, between my two hypervisors, is
> that my second is just a spare machine, with a slow CPU but, I don't think
> it will have a negative impact at the network throughput, since I have only
> 1 Instance running into it (plus a qemu-nbd process eating 90% of its CPU).
> I'll replace this CPU tomorrow, to redo this tests again but, I don't think
> that this is the source of my problem. The MOBOs of two hypervisors
> are identical, 1 3Com (manageable) switch connecting the two.
>
> Thanks!
> Thiago
>
>
> On 25 October 2013 07:15, Darragh O'Reilly <dara2002-openstack@yahoo.com>wrote:
>
> Hi Thiago,
>
> you have configured DHCP to push out a MTU of 1400. Can you confirm that
> the 1400 MTU is actually getting out to the instances by running 'ip link'
> on them?
>
> There is an open problem where the veth used to connect the OVS and Linux
> bridges causes a performance drop on some kernels -
> https://bugs.launchpad.net/nova-project/+bug/1223267 . If you are using
> the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to
> LibvirtOpenVswitchDriver and repeat the iperf test between instances on
> different compute-nodes.
>
> What NICs (maker+model) are you using? You could try disabling any
> off-load functionality - 'ethtool -k <iface-used-for-gre>'.
>
> What kernal are you using: 'uname -a'?
>
> Re, Darragh.
>
> > Hi Daniel,
>
> >
> > I followed that page, my Instances MTU is lowered by DHCP Agent but, same
> > result: poor network performance (internal between Instances and when
> > trying to reach the Internet).
> >
> > No matter if I use
> "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf +
> > "dhcp-option-force=26,1400"" for my Neutron DHCP agent, or not (i.e. MTU
> =
> > 1500), the result is almost the same.
> >
> > I'll try VXLAN (or just VLANs) this weekend to see if I can get better
> > results...
> >
> > Thanks!
> > Thiago
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
>
>
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
On 10/25/2013 08:19 AM, Martinx - ジェームズ wrote:
> I think can say... "YAY!!" :-D
>
> With "LibvirtOpenVswitchDriver" my internal communication is the double
> now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to
> *_400Mbit/s_* (with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s
> (my physical path limit) but, more acceptable now.
>
> The command "ethtool -K eth1 gro off" still makes no difference.

Does GRO happen if there isn't RX CKO on the NIC? Can your NIC
peer-into a GRE tunnel (?) to do CKO on the encapsulated traffic?
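A quick way to check both on the GRE-facing NIC would be something along these lines (assuming eth1 is the interface carrying the tunnel traffic):

---
ethtool -k eth1 | egrep 'rx-checksumming|generic-receive-offload'
---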

> So, there is only 1 remain problem, when traffic pass trough L3 /
> Namespace, it is still useless. Even the SSH connection into my
> Instances, via its Floating IPs, is slow as hell, sometimes it just
> stops responding for a few seconds, and becomes online again
> "out-of-nothing"...
>
> I just detect a weird "behavior", when I run "apt-get update" from
> instance-1, it is slow as I said plus, its ssh connection (where I'm
> running apt-get update), stops responding right after I run "apt-get
> update" AND, _all my others ssh connections also stops working too!_ For
> a few seconds... This means that when I run "apt-get update" from within
> instance-1, the SSH session of instance-2 is affected too!! There is
> something pretty bad going on at L3 / Namespace.
>
> BTW, do you think that a ~400MBit/sec intra-vm-communication (GRE
> tunnel) on top of a 1Gbit ethernet is acceptable?! It is still less than
> a half...

I would suggest checking for individual CPUs maxing-out during the 400
Mbit/s transfers.
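For example, with something like the following running on the hypervisors and the network node during the iperf run (mpstat comes from the sysstat package):

---
mpstat -P ALL 1
---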

rick jones

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
the uneven ssh performance is strange - maybe learning on the tunnel mesh is not stabilizing. It is easy to mess it up by giving a wrong local_ip in the ovs-plugin config file. Check the tunnel ports on br-tun with 'ovs-vsctl show'. Is each one using the correct IPs? br-tun should have N-1 gre-x ports - no more! Maybe you can put 'ovs-vsctl show' from the nodes on paste.openstack if there are not too many?
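For comparison, on a 3-node setup each br-tun would be expected to show two GRE ports roughly like this (the IPs and port names here are only placeholders):

---
    Bridge br-tun
        Port "gre-1"
            Interface "gre-1"
                type: gre
                options: {in_key=flow, local_ip="10.20.2.53", out_key=flow, remote_ip="10.20.2.52"}
        Port "gre-2"
            Interface "gre-2"
                type: gre
                options: {in_key=flow, local_ip="10.20.2.53", out_key=flow, remote_ip="10.20.2.57"}
---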

Re, Darragh.




On Friday, 25 October 2013, 16:20, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:

I think can say... "YAY!!"    :-D
>
>
>With "LibvirtOpenVswitchDriver" my internal communication is the double now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to 400Mbit/s (with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s (my physical path limit) but, more acceptable now.
>
>
>The command "ethtool -K eth1 gro off" still makes no difference.
>
>
>So, there is only 1 remain problem, when traffic pass trough L3 / Namespace, it is still useless. Even the SSH connection into my Instances, via its Floating IPs, is slow as hell, sometimes it just stops responding for a few seconds, and becomes online again "out-of-nothing"...
>
>
>I just detect a weird "behavior", when I run "apt-get update" from instance-1, it is slow as I said plus, its ssh connection (where I'm running apt-get update), stops responding right after I run "apt-get update" AND, all my others ssh connections also stops working too! For a few seconds... This means that when I run "apt-get update" from within instance-1, the SSH session of instance-2 is affected too!! There is something pretty bad going on at L3 / Namespace.
>
>
>BTW, do you think that a ~400MBit/sec intra-vm-communication (GRE tunnel) on top of a 1Gbit ethernet is acceptable?! It is still less than a half...
>
>
>Thank you!
>Thiago
>
>
>On 25 October 2013 12:28, Darragh O'Reilly <dara2002-openstack@yahoo.com> wrote:
>
>Hi Thiago,
>>
>>
>>for the VIF error: you will need to change qemu.conf as described here:
>>http://openvswitch.org/openstack/documentation/
>>
>>
>>Re, Darragh.
>>
>>
>>
>>
>>On Friday, 25 October 2013, 15:14, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:
>>
>>Hi Darragh,
>>>
>>>
>>>Yes, Instances are getting MTU 1400.
>>>
>>>
>>>I'm using LibvirtHybridOVSBridgeDriver at my Compute Nodes. I'll check BG 1223267 right now! 
>>>
>>>
>>>
>>>
>>>The LibvirtOpenVswitchDriver doesn't work, look:
>>>
>>>
>>>http://paste.openstack.org/show/49709/
>>>
>>>
>>>
>>>http://paste.openstack.org/show/49710/
>>>
>>>
>>>
>>>
>>>
>>>My NICs are "RTL8111/8168/8411 PCI Express Gigabit Ethernet", Hypervisors motherboard are MSI-890FXA-GD70.
>>>
>>>
>>>The command "ethtool -K eth1 gro off" did not had any effect on the communication between instances on different hypervisors, still poor, around 248Mbit/sec, when its physical path reach 1Gbit/s (where GRE is built).
>>>
>>>
>>>My Linux version is "Linux hypervisor-1 3.8.0-32-generic #47~precise1-Ubuntu", same kernel on Network Node" and others nodes too (Ubuntu 12.04.3 installed from scratch for this Havana deployment).
>>>
>>>
>>>The only difference I can see right now, between my two hypervisors, is that my second is just a spare machine, with a slow CPU but, I don't think it will have a negative impact at the network throughput, since I have only 1 Instance running into it (plus a qemu-nbd process eating 90% of its CPU). I'll replace this CPU tomorrow, to redo this tests again but, I don't think that this is the source of my problem. The MOBOs of two hypervisors are identical, 1 3Com (manageable) switch connecting the two.
>>>
>>>
>>>Thanks!
>>>Thiago
>>>
>>>
>>>
>>>On 25 October 2013 07:15, Darragh O'Reilly <dara2002-openstack@yahoo.com> wrote:
>>>
>>>Hi Thiago,
>>>>
>>>>you have configured DHCP to push out a MTU of 1400. Can you confirm that the 1400 MTU is actually getting out to the instances by running 'ip link' on them?
>>>>
>>>>There is an open problem where the veth used to connect the OVS and Linux bridges causes a performance drop on some kernels - https://bugs.launchpad.net/nova-project/+bug/1223267 .  If you are using the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to LibvirtOpenVswitchDriver and repeat the iperf test between instances on different compute-nodes.
>>>>
>>>>What NICs (maker+model) are you using? You could try disabling any off-load functionality - 'ethtool -k <iface-used-for-gre>'.
>>>>
>>>>What kernal are you using: 'uname -a'?
>>>>
>>>>Re, Darragh.
>>>>
>>>>
>>>>> Hi Daniel,
>>>>
>>>>>
>>>>> I followed that page, my Instances MTU is lowered by DHCP Agent but, same
>>>>> result: poor network performance (internal between Instances and when
>>>>> trying to reach the Internet).
>>>>>
>>>>> No matter if I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf +
>>>>> "dhcp-option-force=26,1400"" for my Neutron DHCP agent, or not (i.e. MTU =
>>>>> 1500), the result is almost the same.
>>>>>
>>>>> I'll try VXLAN (or just VLANs) this weekend to see if I can get better
>>>>> results...
>>>>>
>>>>> Thanks!
>>>>> Thiago
>>>>
>>>>
>>>>_______________________________________________
>>>>Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>Post to     : openstack@lists.openstack.org
>>>>Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>
>>>
>>>
>>>
>
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Here we go:

---
root@net-node-1:~# grep local_ip
/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
local_ip = 10.20.2.52

root@net-node-1:~# ip r | grep 10.\20
10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.52
---

---
root@hypervisor-1:~# grep local_ip
/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
local_ip = 10.20.2.53

root@hypervisor-1:~# ip r | grep 10.\20
10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.53
---

---
root@hypervisor-2:~# grep local_ip
/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
local_ip = 10.20.2.57

root@hypervisor-2:~# ip r | grep 10.\20
10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.57
---

Each "ovs-vsctl show":

net-node-1: http://paste.openstack.org/show/49727/

hypervisor-1: http://paste.openstack.org/show/49728/

hypervisor-2: http://paste.openstack.org/show/49729/


Best,
Thiago


On 25 October 2013 14:11, Darragh O'Reilly <dara2002-openstack@yahoo.com> wrote:

>
> the uneven ssh performance is strange - maybe learning on the tunnel mesh
> is not stablizing. It is easy to mess it up by giving a wrong local_ip in
> the ovs-plugin config file. Check the tunnels ports on br-tun with
> 'ovs-vsctl show'. Is each one using the correct IPs? Br-tun should have N-1
> gre-x ports - no more! Maybe you can put 'ovs-vsctl show' from the nodes on
> paste.openstack if there are not to many?
>
> Re, Darragh.
>
>
> On Friday, 25 October 2013, 16:20, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
>
> I think can say... "YAY!!" :-D
>
> With "LibvirtOpenVswitchDriver" my internal communication is the double
> now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to *400Mbit/s*(with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s (my physical path
> limit) but, more acceptable now.
>
> The command "ethtool -K eth1 gro off" still makes no difference.
>
> So, there is only 1 remain problem, when traffic pass trough L3 /
> Namespace, it is still useless. Even the SSH connection into my Instances,
> via its Floating IPs, is slow as hell, sometimes it just stops responding
> for a few seconds, and becomes online again "out-of-nothing"...
>
> I just detect a weird "behavior", when I run "apt-get update" from
> instance-1, it is slow as I said plus, its ssh connection (where I'm
> running apt-get update), stops responding right after I run "apt-get
> update" AND, *all my others ssh connections also stops working too!* For
> a few seconds... This means that when I run "apt-get update" from within
> instance-1, the SSH session of instance-2 is affected too!! There is
> something pretty bad going on at L3 / Namespace.
>
> BTW, do you think that a ~400MBit/sec intra-vm-communication (GRE tunnel)
> on top of a 1Gbit ethernet is acceptable?! It is still less than a half...
>
> Thank you!
> Thiago
>
> On 25 October 2013 12:28, Darragh O'Reilly <dara2002-openstack@yahoo.com>wrote:
>
> Hi Thiago,
>
> for the VIF error: you will need to change qemu.conf as described here:
> http://openvswitch.org/openstack/documentation/
>
> Re, Darragh.
>
>
> On Friday, 25 October 2013, 15:14, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
>
> Hi Darragh,
>
> Yes, Instances are getting MTU 1400.
>
> I'm using LibvirtHybridOVSBridgeDriver at my Compute Nodes. I'll check BG
> 1223267 right now!
>
>
> The LibvirtOpenVswitchDriver doesn't work, look:
>
> http://paste.openstack.org/show/49709/
>
> http://paste.openstack.org/show/49710/
>
>
> My NICs are "RTL8111/8168/8411 PCI Express Gigabit Ethernet", Hypervisors
> motherboard are MSI-890FXA-GD70.
>
> The command "ethtool -K eth1 gro off" did not had any effect on the
> communication between instances on different hypervisors, still poor,
> around 248Mbit/sec, when its physical path reach 1Gbit/s (where GRE is
> built).
>
> My Linux version is "Linux hypervisor-1 3.8.0-32-generic
> #47~precise1-Ubuntu", same kernel on Network Node" and others nodes too
> (Ubuntu 12.04.3 installed from scratch for this Havana deployment).
>
> The only difference I can see right now, between my two hypervisors, is
> that my second is just a spare machine, with a slow CPU but, I don't think
> it will have a negative impact at the network throughput, since I have only
> 1 Instance running into it (plus a qemu-nbd process eating 90% of its CPU).
> I'll replace this CPU tomorrow, to redo this tests again but, I don't think
> that this is the source of my problem. The MOBOs of two hypervisors
> are identical, 1 3Com (manageable) switch connecting the two.
>
> Thanks!
> Thiago
>
>
> On 25 October 2013 07:15, Darragh O'Reilly <dara2002-openstack@yahoo.com>wrote:
>
> Hi Thiago,
>
> you have configured DHCP to push out a MTU of 1400. Can you confirm that
> the 1400 MTU is actually getting out to the instances by running 'ip link'
> on them?
>
> There is an open problem where the veth used to connect the OVS and Linux
> bridges causes a performance drop on some kernels -
> https://bugs.launchpad.net/nova-project/+bug/1223267 . If you are using
> the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to
> LibvirtOpenVswitchDriver and repeat the iperf test between instances on
> different compute-nodes.
>
> What NICs (maker+model) are you using? You could try disabling any
> off-load functionality - 'ethtool -k <iface-used-for-gre>'.
>
> What kernal are you using: 'uname -a'?
>
> Re, Darragh.
>
> > Hi Daniel,
>
> >
> > I followed that page, my Instances MTU is lowered by DHCP Agent but, same
> > result: poor network performance (internal between Instances and when
> > trying to reach the Internet).
> >
> > No matter if I use
> "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf +
> > "dhcp-option-force=26,1400"" for my Neutron DHCP agent, or not (i.e. MTU
> =
> > 1500), the result is almost the same.
> >
> > I'll try VXLAN (or just VLANs) this weekend to see if I can get better
> > results...
> >
> > Thanks!
> > Thiago
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
>
>
>
>
>
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
ok, the tunnels look fine. One thing that looks funny on the network node is these untagged tap* devices. I guess you switched to using veths and then switched back to not using them. I don't know if they matter, but you should clean them up by stopping everything, running neutron-ovs-cleanup (check the bridges are empty) and rebooting.

Bridge br-int Port "tapa1376f61-05" Interface "tapa1376f61-05" ...
Port "qr-a1376f61-05"
tag: 1
Interface "qr-a1376f61-05"
type: internal

Re, Darragh.




On Friday, 25 October 2013, 17:28, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:

Here we go:
>
>
>---
>root@net-node-1:~# grep local_ip /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini 
>local_ip = 10.20.2.52
>
>
>root@net-node-1:~# ip r | grep 10.\20
>10.20.2.0/24 dev eth1  proto kernel  scope link  src 10.20.2.52 
>---
>
>
>---
>root@hypervisor-1:~# grep local_ip /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
>local_ip = 10.20.2.53
>
>
>root@hypervisor-1:~# ip r | grep 10.\20
>10.20.2.0/24 dev eth1  proto kernel  scope link  src 10.20.2.53 
>---
>
>
>---
>root@hypervisor-2:~# grep local_ip /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
>local_ip = 10.20.2.57
>
>
>root@hypervisor-2:~# ip r | grep 10.\20
>10.20.2.0/24 dev eth1  proto kernel  scope link  src 10.20.2.57
>---
>
>
>Each "ovs-vsctl show":
>
>
>net-node-1: http://paste.openstack.org/show/49727/
>
>
>hypervisor-1: http://paste.openstack.org/show/49728/
>
>
>hypervisor-2: http://paste.openstack.org/show/49729/
>
>
>
>
>
>Best,
>Thiago
>
>
>
>On 25 October 2013 14:11, Darragh O'Reilly <dara2002-openstack@yahoo.com> wrote:
>
>
>>
>>the uneven ssh performance is strange - maybe learning on the tunnel mesh is not stablizing. It is easy to mess it up by giving a wrong local_ip in the ovs-plugin config file. Check the tunnels ports on br-tun with 'ovs-vsctl show'. Is each one using the correct IPs? Br-tun should have N-1 gre-x ports - no more! Maybe you can put 'ovs-vsctl show' from the nodes on paste.openstack if there are not to many?
>>
>>
>>Re, Darragh.
>>
>>
>>
>>
>>On Friday, 25 October 2013, 16:20, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:
>>
>>I think can say... "YAY!!"    :-D
>>>
>>>
>>>With "LibvirtOpenVswitchDriver" my internal communication is the double now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to 400Mbit/s (with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s (my physical path limit) but, more acceptable now.
>>>
>>>
>>>The command "ethtool -K eth1 gro off" still makes no difference.
>>>
>>>
>>>So, there is only 1 remain problem, when traffic pass trough L3 / Namespace, it is still useless. Even the SSH connection into my Instances, via its Floating IPs, is slow as hell, sometimes it just stops responding for a few seconds, and becomes online again "out-of-nothing"...
>>>
>>>
>>>I just detect a weird "behavior", when I run "apt-get update" from instance-1, it is slow as I said plus, its ssh connection (where I'm running apt-get update), stops responding right after I run "apt-get update" AND, all my others ssh connections also stops working too! For a few seconds... This means that when I run "apt-get update" from within instance-1, the SSH session of instance-2 is affected too!! There is something pretty bad going on at L3 / Namespace.
>>>
>>>
>>>BTW, do you think that a ~400MBit/sec intra-vm-communication (GRE tunnel) on top of a 1Gbit ethernet is acceptable?! It is still less than a half...
>>>
>>>
>>>Thank you!
>>>Thiago
>>>
>>>
>>>On 25 October 2013 12:28, Darragh O'Reilly <dara2002-openstack@yahoo.com> wrote:
>>>
>>>Hi Thiago,
>>>>
>>>>
>>>>for the VIF error: you will need to change qemu.conf as described here:
>>>>http://openvswitch.org/openstack/documentation/
>>>>
>>>>
>>>>Re, Darragh.
>>>>
>>>>
>>>>
>>>>
>>>>On Friday, 25 October 2013, 15:14, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:
>>>>
>>>>Hi Darragh,
>>>>>
>>>>>
>>>>>Yes, Instances are getting MTU 1400.
>>>>>
>>>>>
>>>>>I'm using LibvirtHybridOVSBridgeDriver at my Compute Nodes. I'll check BG 1223267 right now! 
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>The LibvirtOpenVswitchDriver doesn't work, look:
>>>>>
>>>>>
>>>>>http://paste.openstack.org/show/49709/
>>>>>
>>>>>
>>>>>
>>>>>http://paste.openstack.org/show/49710/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>My NICs are "RTL8111/8168/8411 PCI Express Gigabit Ethernet", Hypervisors motherboard are MSI-890FXA-GD70.
>>>>>
>>>>>
>>>>>The command "ethtool -K eth1 gro off" did not had any effect on the communication between instances on different hypervisors, still poor, around 248Mbit/sec, when its physical path reach 1Gbit/s (where GRE is built).
>>>>>
>>>>>
>>>>>My Linux version is "Linux hypervisor-1 3.8.0-32-generic #47~precise1-Ubuntu", same kernel on Network Node" and others nodes too (Ubuntu 12.04.3 installed from scratch for this Havana deployment).
>>>>>
>>>>>
>>>>>The only difference I can see right now, between my two hypervisors, is that my second is just a spare machine, with a slow CPU but, I don't think it will have a negative impact at the network throughput, since I have only 1 Instance running into it (plus a qemu-nbd process eating 90% of its CPU). I'll replace this CPU tomorrow, to redo this tests again but, I don't think that this is the source of my problem. The MOBOs of two hypervisors are identical, 1 3Com (manageable) switch connecting the two.
>>>>>
>>>>>
>>>>>Thanks!
>>>>>Thiago
>>>>>
>>>>>
>>>>>
>>>>>On 25 October 2013 07:15, Darragh O'Reilly <dara2002-openstack@yahoo.com> wrote:
>>>>>
>>>>>Hi Thiago,
>>>>>>
>>>>>>you have configured DHCP to push out a MTU of 1400. Can you confirm that the 1400 MTU is actually getting out to the instances by running 'ip link' on them?
>>>>>>
>>>>>>There is an open problem where the veth used to connect the OVS and Linux bridges causes a performance drop on some kernels - https://bugs.launchpad.net/nova-project/+bug/1223267 .  If you are using the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to LibvirtOpenVswitchDriver and repeat the iperf test between instances on different compute-nodes.
>>>>>>
>>>>>>What NICs (maker+model) are you using? You could try disabling any off-load functionality - 'ethtool -k <iface-used-for-gre>'.
>>>>>>
>>>>>>What kernal are you using: 'uname -a'?
>>>>>>
>>>>>>Re, Darragh.
>>>>>>
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>
>>>>>>>
>>>>>>> I followed that page, my Instances MTU is lowered by DHCP Agent but, same
>>>>>>> result: poor network performance (internal between Instances and when
>>>>>>> trying to reach the Internet).
>>>>>>>
>>>>>>> No matter if I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf +
>>>>>>> "dhcp-option-force=26,1400"" for my Neutron DHCP agent, or not (i.e. MTU =
>>>>>>> 1500), the result is almost the same.
>>>>>>>
>>>>>>> I'll try VXLAN (or just VLANs) this weekend to see if I can get better
>>>>>>> results...
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Thiago
>>>>>>
>>>>>>
>>>>>>_______________________________________________
>>>>>>Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>Post to     : openstack@lists.openstack.org
>>>>>>Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Okay, cool!

tap** removed, neutron-ovs-cleanup ok, bridges empty, all nodes rebooted.

BUT, still poor performance when reaching the "External" network from within an
Instance (plus SSH lags)...

I'll install a new Network Node, on other hardware, to test it more...
Weird thing is, my Grizzly Network Node works perfectly on this very same
hardware (same OpenStack Network topology, of course)...

Hardware of my current "net-node-1":

* Grizzly - Okay
* Havana - Fails... ;-(

Best,
Thiago


On 25 October 2013 15:28, Darragh O'Reilly <dara2002-openstack@yahoo.com>wrote:

>
> ok, the tunnels look fine. One thing that looks funny on the network node
> are these untagged tap* devices. I guess you switched to using veths and
> then switched back to not using them. I don't know if they matter, but you
> should clean them up by stopping everthing, running neutron-ovs-cleanup
> (check bridges empty) and reboot.
>
> Bridge br-int
> Port "tapa1376f61-05"
> Interface "tapa1376f61-05"
> ...
> Port "qr-a1376f61-05"
> tag: 1
> Interface "qr-a1376f61-05"
> type: internal
>
> Re, Darragh.
>
>
>
> On Friday, 25 October 2013, 17:28, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
>
> Here we go:
>
> ---
> root@net-node-1:~# grep local_ip
> /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
> local_ip = 10.20.2.52
>
> root@net-node-1:~# ip r | grep 10.\20
> 10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.52
> ---
>
> ---
> root@hypervisor-1:~# grep local_ip
> /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
> local_ip = 10.20.2.53
>
> root@hypervisor-1:~# ip r | grep 10.\20
> 10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.53
> ---
>
> ---
> root@hypervisor-2:~# grep local_ip
> /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
> local_ip = 10.20.2.57
>
> root@hypervisor-2:~# ip r | grep 10.\20
> 10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.57
> ---
>
> Each "ovs-vsctl show":
>
> net-node-1: http://paste.openstack.org/show/49727/
>
> hypervisor-1: http://paste.openstack.org/show/49728/
>
> hypervisor-2: http://paste.openstack.org/show/49729/
>
>
> Best,
> Thiago
>
>
> On 25 October 2013 14:11, Darragh O'Reilly <dara2002-openstack@yahoo.com>wrote:
>
>
> the uneven ssh performance is strange - maybe learning on the tunnel mesh
> is not stablizing. It is easy to mess it up by giving a wrong local_ip in
> the ovs-plugin config file. Check the tunnels ports on br-tun with
> 'ovs-vsctl show'. Is each one using the correct IPs? Br-tun should have N-1
> gre-x ports - no more! Maybe you can put 'ovs-vsctl show' from the nodes on
> paste.openstack if there are not to many?
>
> Re, Darragh.
>
>
> On Friday, 25 October 2013, 16:20, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
>
> I think can say... "YAY!!" :-D
>
> With "LibvirtOpenVswitchDriver" my internal communication is the double
> now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to *400Mbit/s*(with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s (my physical path
> limit) but, more acceptable now.
>
> The command "ethtool -K eth1 gro off" still makes no difference.
>
> So, there is only 1 remain problem, when traffic pass trough L3 /
> Namespace, it is still useless. Even the SSH connection into my Instances,
> via its Floating IPs, is slow as hell, sometimes it just stops responding
> for a few seconds, and becomes online again "out-of-nothing"...
>
> I just detect a weird "behavior", when I run "apt-get update" from
> instance-1, it is slow as I said plus, its ssh connection (where I'm
> running apt-get update), stops responding right after I run "apt-get
> update" AND, *all my others ssh connections also stops working too!* For
> a few seconds... This means that when I run "apt-get update" from within
> instance-1, the SSH session of instance-2 is affected too!! There is
> something pretty bad going on at L3 / Namespace.
>
> BTW, do you think that a ~400MBit/sec intra-vm-communication (GRE tunnel)
> on top of a 1Gbit ethernet is acceptable?! It is still less than a half...
>
> Thank you!
> Thiago
>
> On 25 October 2013 12:28, Darragh O'Reilly <dara2002-openstack@yahoo.com>wrote:
>
> Hi Thiago,
>
> for the VIF error: you will need to change qemu.conf as described here:
> http://openvswitch.org/openstack/documentation/
>
> Re, Darragh.
>
>
> On Friday, 25 October 2013, 15:14, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
>
> Hi Darragh,
>
> Yes, Instances are getting MTU 1400.
>
> I'm using LibvirtHybridOVSBridgeDriver at my Compute Nodes. I'll check BG
> 1223267 right now!
>
>
> The LibvirtOpenVswitchDriver doesn't work, look:
>
> http://paste.openstack.org/show/49709/
>
> http://paste.openstack.org/show/49710/
>
>
> My NICs are "RTL8111/8168/8411 PCI Express Gigabit Ethernet", Hypervisors
> motherboard are MSI-890FXA-GD70.
>
> The command "ethtool -K eth1 gro off" did not had any effect on the
> communication between instances on different hypervisors, still poor,
> around 248Mbit/sec, when its physical path reach 1Gbit/s (where GRE is
> built).
>
> My Linux version is "Linux hypervisor-1 3.8.0-32-generic
> #47~precise1-Ubuntu", same kernel on Network Node" and others nodes too
> (Ubuntu 12.04.3 installed from scratch for this Havana deployment).
>
> The only difference I can see right now, between my two hypervisors, is
> that my second is just a spare machine, with a slow CPU but, I don't think
> it will have a negative impact at the network throughput, since I have only
> 1 Instance running into it (plus a qemu-nbd process eating 90% of its CPU).
> I'll replace this CPU tomorrow, to redo this tests again but, I don't think
> that this is the source of my problem. The MOBOs of two hypervisors
> are identical, 1 3Com (manageable) switch connecting the two.
>
> Thanks!
> Thiago
>
>
> On 25 October 2013 07:15, Darragh O'Reilly <dara2002-openstack@yahoo.com>wrote:
>
> Hi Thiago,
>
> you have configured DHCP to push out a MTU of 1400. Can you confirm that
> the 1400 MTU is actually getting out to the instances by running 'ip link'
> on them?
>
> There is an open problem where the veth used to connect the OVS and Linux
> bridges causes a performance drop on some kernels -
> https://bugs.launchpad.net/nova-project/+bug/1223267 . If you are using
> the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to
> LibvirtOpenVswitchDriver and repeat the iperf test between instances on
> different compute-nodes.
>
> What NICs (maker+model) are you using? You could try disabling any
> off-load functionality - 'ethtool -k <iface-used-for-gre>'.
>
> What kernal are you using: 'uname -a'?
>
> Re, Darragh.
>
> > Hi Daniel,
>
> >
> > I followed that page, my Instances MTU is lowered by DHCP Agent but, same
> > result: poor network performance (internal between Instances and when
> > trying to reach the Internet).
> >
> > No matter if I use
> "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf +
> > "dhcp-option-force=26,1400"" for my Neutron DHCP agent, or not (i.e. MTU
> =
> > 1500), the result is almost the same.
> >
> > I'll try VXLAN (or just VLANs) this weekend to see if I can get better
> > results...
> >
> > Thanks!
> > Thiago
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
>
>
>
>
>
>
>
>
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi Rick,

On 25 October 2013 13:44, Rick Jones <rick.jones2@hp.com> wrote:

> On 10/25/2013 08:19 AM, Martinx - ジェームズ wrote:
>
>> I think can say... "YAY!!" :-D
>>
>> With "LibvirtOpenVswitchDriver" my internal communication is the double
>> now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to
>> *_400Mbit/s_* (with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s
>>
>> (my physical path limit) but, more acceptable now.
>>
>> The command "ethtool -K eth1 gro off" still makes no difference.
>>
>
> Does GRO happen if there isn't RX CKO on the NIC?



Ouch! I missed that lesson... hehe

No idea, how can I check / test this?

If I "disable RX CKO" (using ethtool?) on the NIC, how can I verify if the
GRO is actually happening or not?

Anyway, I'm googling all this stuff right now. Thanks for pointing it
out!

Refs:

* JLS2009: Generic receive offload - http://lwn.net/Articles/358910/


Can your NIC peer-into a GRE tunnel (?) to do CKO on the encapsulated
> traffic?
>


Again, no idea... No idea... :-/

Listen, maybe this sounds too dumb on my part but, it is the first time
I'm talking about this stuff (like "NIC peer-into GRE"?, or GRO / CKO...).

GRE tunnels sound too damn complex and problematic... I guess it is time
to try VXLAN (or NVP?)...

If you guys say: VXLAN is a completely different beast (i.e. it does not
touch ANY GRE tunnel), and it works smoothly (without GRO / CKO / MTU /
lags / low-speed troubles and issues), I'll move to it right now (are the
VXLAN docs ready?).

NOTE: I don't want to hijack this thread because of other (internal
communication VS "Directional network performance issues with Neutron +
OpenvSwitch" thread subject) problems with my OpenStack environment,
please, let me know if this becomes a problem for you guys.



> So, there is only 1 remain problem, when traffic pass trough L3 /
>> Namespace, it is still useless. Even the SSH connection into my
>> Instances, via its Floating IPs, is slow as hell, sometimes it just
>> stops responding for a few seconds, and becomes online again
>> "out-of-nothing"...
>>
>> I just detect a weird "behavior", when I run "apt-get update" from
>> instance-1, it is slow as I said plus, its ssh connection (where I'm
>> running apt-get update), stops responding right after I run "apt-get
>> update" AND, _all my others ssh connections also stops working too!_ For
>>
>> a few seconds... This means that when I run "apt-get update" from within
>> instance-1, the SSH session of instance-2 is affected too!! There is
>> something pretty bad going on at L3 / Namespace.
>>
>> BTW, do you think that a ~400MBit/sec intra-vm-communication (GRE
>> tunnel) on top of a 1Gbit ethernet is acceptable?! It is still less than
>> a half...
>>
>
> I would suggest checking for individual CPUs maxing-out during the 400
> Mbit/s transfers.


Okay, I'll.


>
>
> rick jones
>

Thiago
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
You can use "ethtool -k eth0" to view the settings and "ethtool -K eth0
gro off" to turn off GRO.
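
A minimal sketch of the checks being suggested here - assuming the GRE traffic leaves via eth1 as elsewhere in this thread; the exact feature names printed vary by driver and ethtool version:

ethtool -k eth1               # list current offload settings (rx-checksumming, tso, gso, gro, ...)
ethtool -K eth1 gro off       # turn generic receive offload off
ethtool -K eth1 rx off        # turn RX checksum offload (CKO) off - GRO is built on top of it
ethtool -K eth1 gro on        # re-enable once the test has been repeated

Repeating the iperf runs after each change shows whether a particular offload is involved.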


On Fri, Oct 25, 2013 at 3:03 PM, Martinx - ジェームズ
<thiagocmartinsc@gmail.com>wrote:

> Hi Rick,
>
> On 25 October 2013 13:44, Rick Jones <rick.jones2@hp.com> wrote:
>
>> On 10/25/2013 08:19 AM, Martinx - ジェームズ wrote:
>>
>>> I think can say... "YAY!!" :-D
>>>
>>> With "LibvirtOpenVswitchDriver" my internal communication is the double
>>> now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to
>>> *_400Mbit/s_* (with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s
>>>
>>> (my physical path limit) but, more acceptable now.
>>>
>>> The command "ethtool -K eth1 gro off" still makes no difference.
>>>
>>
>> Does GRO happen if there isn't RX CKO on the NIC?
>
>
>
> Ouch! I missed that lesson... hehe
>
> No idea, how can I check / test this?
>
> If I "disable RX CKO" (using ethtool?) on the NIC, how can I verify if the
> GRO is actually happening or not?
>
> Anyway, I'm goggling about all this stuff right now. Thanks for pointing
> it out!
>
> Refs:
>
> * JLS2009: Generic receive offload - http://lwn.net/Articles/358910/
>
>
> Can your NIC peer-into a GRE tunnel (?) to do CKO on the encapsulated
>> traffic?
>>
>
>
> Again, no idea... No idea... :-/
>
> Listen, maybe this sounds too dumb from my part but, it is the first time
> I'm talking about this stuff (like "NIC peer-into GRE" ?, or GRO / CKO...
>
> GRE tunnels sounds too damn complex and problematic... I guess it is time
> to try VXLAN (or NVP ?)...
>
> If you guys say: VXLAN is a completely different beast (i.e. it does not
> touch with ANY GRE tunnel), and it works smoothly (without GRO / CKO / MTU
> / lags / low speed troubles and issues), I'll move to it right now (is
> VXLAN docs ready?).
>
> NOTE: I don't want to hijack this thread because of other (internal
> communication VS "Directional network performance issues with Neutron +
> OpenvSwitch" thread subject) problems with my OpenStack environment,
> please, let me know if this becomes a problem for you guys.
>
>
>
>> So, there is only 1 remain problem, when traffic pass trough L3 /
>>> Namespace, it is still useless. Even the SSH connection into my
>>> Instances, via its Floating IPs, is slow as hell, sometimes it just
>>> stops responding for a few seconds, and becomes online again
>>> "out-of-nothing"...
>>>
>>> I just detect a weird "behavior", when I run "apt-get update" from
>>> instance-1, it is slow as I said plus, its ssh connection (where I'm
>>> running apt-get update), stops responding right after I run "apt-get
>>> update" AND, _all my others ssh connections also stops working too!_ For
>>>
>>> a few seconds... This means that when I run "apt-get update" from within
>>> instance-1, the SSH session of instance-2 is affected too!! There is
>>> something pretty bad going on at L3 / Namespace.
>>>
>>> BTW, do you think that a ~400MBit/sec intra-vm-communication (GRE
>>> tunnel) on top of a 1Gbit ethernet is acceptable?! It is still less than
>>> a half...
>>>
>>
>> I would suggest checking for individual CPUs maxing-out during the 400
>> Mbit/s transfers.
>
>
> Okay, I'll.
>
>
>>
>>
>> rick jones
>>
>
> Thiago
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
> Listen, maybe this sounds too dumb from my part but, it is the first
> time I'm talking about this stuff (like "NIC peer-into GRE" ?, or GRO
> / CKO...

No worries.

So, a slightly brief history of stateless offloads in NICs. It may be
too basic, and I may get some details wrong, but it should give the gist.

Go back to the "old days" - 10 Mbit/s Ethernet was "it" (all you Token
Ring fans can keep quiet :). Systems got faster than 10 Mbit/s. By a
fair margin. 100 BT came out and it wasn't all that long before systems
were faster than that, but things like interrupt rates were starting to
get to be an issue for performance, so 100 BT NICs started implementing
interrupt avoidance heuristics. The next bump in network speed to 1000
Mbit/s managed to get well out ahead of the systems. All this time,
while the link speeds were increasing, the IEEE was doing little to
nothing to make sending and receiving Ethernet traffic any easier on the
end stations (eg increasing the MTU). It was taking just as many CPU
cycles to send/receive a frame over 1000BT as it did over 100BT as it
did over 10BT.

<insert segue about how FDDI was doing things to make life easier, as
well as what the FDDI NIC vendors were doing to enable copy-free
networking, here>

So the Ethernet NIC vendors started getting creative and started
borrowing some techniques from FDDI. The base of it all is CKO -
ChecKsum Offload. Offloading the checksum calculation for the TCP and
UDP checksums. In broad handwaving terms, for inbound packets, the NIC
is made either smart enough to recognize an incoming frame as TCP
segment (UDP datagram) or it performs the Internet Checksum across the
entire frame and leaves it to the driver to fixup. For outbound
traffic, the stack, via the driver, tells the NIC a starting value
(perhaps), where to start computing the checksum, how far to go, and
where to stick it...

So, we can save the CPU cycles used calculating/verifying the checksums.
In rough terms, in the presence of copies, that is perhaps 10% or 15%
savings. Systems still needed more. It was just as many trips up and
down the protocol stack in the host to send a MB of data as it was
before - the IEEE hanging-on to the 1500 byte MTU. So, some NIC vendors
came-up with Jumbo Frames - I think the first may have been Alteon and
their AceNICs and switches. A 9000 byte MTU allows one to send bulk
data across the network in ~1/6 the number of trips up and down the
protocol stack. But that has problems - in particular you have to have
support for Jumbo Frames from end to end.

So someone, I don't recall who, had the flash of inspiration - What
If... the NIC could perform the TCP segmentation on behalf of the
stack? When sending a big chunk of data over TCP in one direction, the
only things which change from TCP segment to TCP segment are the
sequence number, and the checksum <insert some handwaving about the IP
datagram ID here>. The NIC already knows how to compute the checksum,
so let's teach it how to very simply increment the TCP sequence number.
Now we can give it A Lot of Data (tm) in one trip down the protocol
stack and save even more CPU cycles than Jumbo Frames. Now the NIC has
to know a little bit more about the traffic - it has to know that it is
TCP so it can know where the TCP sequence number goes. We also tell it
the MSS to use when it is doing the segmentation on our behalf. Thus
was born TCP Segmentation Offload, aka TSO or "Poor Man's Jumbo Frames"

That works pretty well for servers at the time - they tend to send more
data than they receive. The clients receiving the data don't need to be
able to keep up at 1000 Mbit/s and the server can be sending to multiple
clients. However, we get another order of magnitude bump in link
speeds, to 10000 Mbit/s. Now people need/want to receive at the higher
speeds too. So some 10 Gbit/s NIC vendors come up with the mirror image
of TSO and call it LRO - Large Receive Offload. The LRO NIC will
coalesce several consecutive TCP segments into one uber segment and
hand that to the host. There are some "issues" with LRO though - for
example when a system is acting as a router, so in Linux, and perhaps
other stacks, LRO is taken out of the hands of the NIC and given to the
stack in the form of 'GRO" - Generic Receive Offload. GRO operates
above the NIC/driver, but below IP. It detects the consecutive
segments and coalesces them before passing them further up the stack. It
becomes possible to receive data at link-rate over 10 GbE. All is
happiness and joy.

OK, so now we have all these "stateless" offloads that know about the
basic traffic flow. They are all built on the foundation of CKO. They
are all dealing with *un* encapsulated traffic. (They also don't do
anything for small packets.)

Now, toss-in some encapsulation. Take your pick, in the abstract it
doesn't really matter which I suspect, at least for a little longer.
What is arriving at the NIC on inbound is no longer a TCP segment in an
IP datagram in an Ethernet frame, it is all that wrapped-up in the
encapsulation protocol. Unless the NIC knows about the encapsulation
protocol, all the NIC knows it has is some slightly alien packet. It
will probably know it is IP, but it won't know more than that.

It could, perhaps, simply compute an Internet Checksum across the entire
IP datagram and leave it to the driver to fix-up. It could simply punt
and not perform any CKO at all. But CKO is the foundation of the
stateless offloads. So, certainly no LRO and (I think but could be
wrong) no GRO. (At least not until the Linux stack learns how to look
beyond the encapsulation headers.)

Similarly, consider the outbound path. We could change the constants we
tell the NIC for doing CKO perhaps, but unless it knows about the
encapsulation protocol, we cannot ask it to do the TCP segmentation of
TSO - it would have to start replicating not only the TCP and IP
headers, but also the headers of the encapsulation protocol. So, there
goes TSO.

In essence, using an encapsulation protocol takes us all the way back to
the days of 100BT in so far as stateless offloads are concerned.
Perhaps to the early days of 1000BT.

We do have a bit more CPU grunt these days, but for the last several
years that has come primarily in the form of more cores per processor,
not in the form of processors with higher and higher frequencies. In
broad handwaving terms, single-threaded performance is not growing all
that much. If at all.

That is why we have things like multiple queues per NIC port now and
Receive Side Scaling (RSS) or Receive Packet Scaling/Receive Flow
Scaling in Linux (or Inbound Packet Scheduling/Thread Optimized Packet
Scheduling in HP-UX etc etc). RSS works by having the NIC compute a
hash over selected headers of the arriving packet - perhaps the source
and destination MAC addresses, perhaps the source and destination IP
addresses, and perhaps the source and destination TCP ports. But now
the arriving traffic is all wrapped up in this encapsulation protocol
that the NIC might not know about. Over what should the NIC compute the
hash with which to pick the queue that then picks the CPU to interrupt?
It may just punt and send all the traffic up one queue.
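
As a rough illustration - assuming the interface carrying the GRE traffic is eth1, and noting that 'ethtool -l' is not supported by every driver - you can get an idea of how many queues are actually in play with:

grep eth1 /proc/interrupts    # one line per MSI-X vector / queue the driver registered
ethtool -l eth1               # current vs. maximum channel (queue) counts, where supported

If everything lands on a single queue, a single CPU ends up doing all the receive-side work.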

There are similar sorts of hashes being computed at either end of a
bond/aggregate/trunk. And the switches or bonding drivers making those
calculations may not know about the encapsulation protocol, so they may
not be able to spread traffic across multiple links. The information
they used to use is now hidden from them by the encapsulation protocol.

That then is what I was getting at when talking about NICs peering into GRE.

rick jones
All I want for Christmas is a 32 bit VLAN ID and NICs and switches which
understand it... :)

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
WOW!! Thank you for your time Rick! Awesome answer!! =D

I'll do these tests (with ethtool GRO / CKO) tonight but, do you think that
this is the root of the problem?!


I mean, I'm seeing two distinct problems here:

1- Slow connectivity to the External network plus SSH lags all over the
cloud (everything that passes through L3 / Namespace is problematic), and;

2- Communication between two Instances on different hypervisors (i.e. maybe
it is related to this GRO / CKO thing).


So, two different problems, right?!

Thanks!
Thiago


On 25 October 2013 18:56, Rick Jones <rick.jones2@hp.com> wrote:

> > Listen, maybe this sounds too dumb from my part but, it is the first
> > time I'm talking about this stuff (like "NIC peer-into GRE" ?, or GRO
> > / CKO...
>
> No worries.
>
> So, a slightly brief history of stateless offloads in NICs. It may be
> too basic, and I may get some details wrong, but it should give the gist.
>
> Go back to the "old days" - 10 Mbit/s Ethernet was "it" (all you Token
> Ring fans can keep quiet :). Systems got faster than 10 Mbit/s. By a
> fair margin. 100 BT came out and it wasn't all that long before systems
> were faster than that, but things like interrupt rates were starting to
> get to be an issue for performance, so 100 BT NICs started implementing
> interrupt avoidance heuristics. The next bump in network speed to 1000
> Mbit/s managed to get well out ahead of the systems. All this time,
> while the link speeds were increasing, the IEEE was doing little to
> nothing to make sending and receiving Ethernet traffic any easier on the
> end stations (eg increasing the MTU). It was taking just as many CPU
> cycles to send/receive a frame over 1000BT as it did over 100BT as it
> did over 10BT.
>
> <insert segque about how FDDI was doing things to make life easier, as
> well as what the FDDI NIC vendors were doing to enable copy-free
> networking, here>
>
> So the Ethernet NIC vendors started getting creative and started
> borrowing some techniques from FDDI. The base of it all is CKO -
> ChecKsum Offload. Offloading the checksum calculation for the TCP and
> UDP checksums. In broad handwaving terms, for inbound packets, the NIC
> is made either smart enough to recognize an incoming frame as TCP
> segment (UDP datagram) or it performs the Internet Checksum across the
> entire frame and leaves it to the driver to fixup. For outbound
> traffic, the stack, via the driver, tells the NIC a starting value
> (perhaps), where to start computing the checksum, how far to go, and
> where to stick it...
>
> So, we can save the CPU cycles used calculating/verifying the checksums.
> In rough terms, in the presence of copies, that is perhaps 10% or 15%
> savings. Systems still needed more. It was just as many trips up and
> down the protocol stack in the host to send a MB of data as it was
> before - the IEEE hanging-on to the 1500 byte MTU. So, some NIC vendors
> came-up with Jumbo Frames - I think the first may have been Alteon and
> their AceNICs and switches. A 9000 byte MTU allows one to send bulk
> data across the network in ~1/6 the number of trips up and down the
> protocol stack. But that has problems - in particular you have to have
> support for Jumbo Frames from end to end.
>
> So someone, I don't recall who, had the flash of inspiration - What
> If... the NIC could perform the TCP segmentation on behalf of the
> stack? When sending a big chunk of data over TCP in one direction, the
> only things which change from TCP segment to TCP segment are the
> sequence number, and the checksum <insert some handwaving about the IP
> datagram ID here>. The NIC already knows how to compute the checksum,
> so let's teach it how to very simply increment the TCP sequence number.
> Now we can give it A Lot of Data (tm) in one trip down the protocol
> stack and save even more CPU cycles than Jumbo Frames. Now the NIC has
> to know a little bit more about the traffic - it has to know that it is
> TCP so it can know where the TCP sequence number goes. We also tell it
> the MSS to use when it is doing the segmentation on our behalf. Thus
> was born TCP Segmentation Offload, aka TSO or "Poor Man's Jumbo Frames"
>
> That works pretty well for servers at the time - they tend to send more
> data than they receive. The clients receiving the data don't need to be
> able to keep up at 1000 Mbit/s and the server can be sending to multiple
> clients. However, we get another order of magnitude bump in link
> speeds, to 10000 Mbit/s. Now people need/want to receive at the higher
> speeds too. So some 10 Gbit/s NIC vendors come up with the mirror image
> of TSO and call it LRO - Large Receive Offload. The LRO NIC will
> coalesce several, consequtive TCP segments into one uber segment and
> hand that to the host. There are some "issues" with LRO though - for
> example when a system is acting as a router, so in Linux, and perhaps
> other stacks, LRO is taken out of the hands of the NIC and given to the
> stack in the form of 'GRO" - Generic Receive Offload. GRO operates
> above the NIC/driver, but below IP. It detects the consecutive
> segments and coalesces them before passing them further up the stack. It
> becomes possible to receive data at link-rate over 10 GbE. All is
> happiness and joy.
>
> OK, so now we have all these "stateless" offloads that know about the
> basic traffic flow. They are all built on the foundation of CKO. They
> are all dealing with *un* encapsulated traffic. (They also don't to
> anything for small packets.)
>
> Now, toss-in some encapsulation. Take your pick, in the abstract it
> doesn't really matter which I suspect, at least for a little longer.
> What is arriving at the NIC on inbound is no longer a TCP segment in an
> IP datagram in an Ethernet frame, it is all that wrapped-up in the
> encapsulation protocol. Unless the NIC knows about the encapsulation
> protocol, all the NIC knows it has is some slightly alien packet. It
> will probably know it is IP, but it won't know more than that.
>
> It could, perhaps, simply compute an Internet Checksum across the entire
> IP datagram and leave it to the driver to fix-up. It could simply punt
> and not perform any CKO at all. But CKO is the foundation of the
> stateless offloads. So, certainly no LRO and (I think but could be
> wrong) no GRO. (At least not until the Linux stack learns how to look
> beyond the encapsulation headers.)
>
> Similarly, consider the outbound path. We could change the constants we
> tell the NIC for doing CKO perhaps, but unless it knows about the
> encapsulation protocol, we cannot ask it to do the TCP segmentation of
> TSO - it would have to start replicating not only the TCP and IP
> headers, but also the headers of the encapsulation protocol. So, there
> goes TSO.
>
> In essence, using an encapsulation protocol takes us all the way back to
> the days of 100BT in so far as stateless offloads are concerned.
> Perhaps to the early days of 1000BT.
>
> We do have a bit more CPU grunt these days, but for the last several
> years that has come primarily in the form of more cores per processor,
> not in the form of processors with higher and higher frequencies. In
> broad handwaving terms, single-threaded performance is not growing all
> that much. If at all.
>
> That is why we have things like multiple queues per NIC port now and
> Receive Side Scaling (RSS) or Receive Packet Scaling/Receive Flow
> Scaling in Linux (or Inbound Packet Scheduling/Thread Optimized Packet
> Scheduling in HP-UX etc etc). RSS works by having the NIC compute a
> hash over selected headers of the arriving packet - perhaps the source
> and destination MAC addresses, perhaps the source and destination IP
> addresses, and perhaps the source and destination TCP ports. But now
> the arrving traffic is all wrapped up in this encapsulation protocol
> that the NIC might not know about. Over what should the NIC compute the
> hash with which to pick the queue that then picks the CPU to interrupt?
> It may just punt and send all the traffic up one queue.
>
> There are similar sorts of hashes being computed at either end of a
> bond/aggregate/trunk. And the switches or bonding drivers making those
> calculations may not know about the encapsulation protocol, so they may
> not be able to spread traffic across multiple links. The information
> they used to use is now hidden from them by the encapsulation protocol.
>
> That then is what I was getting at when talking about NICs peering into
> GRE.
>
> rick jones
> All I want for Christmas is a 32 bit VLAN ID and NICs and switches which
> understand it... :)
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
> WOW!! Thank you for your time Rick! Awesome answer!! =D
>
> I'll do this tests (with ethtool GRO / CKO) tonight but, do you think
> that this is the main root of the problem?!
>
>
> I mean, I'm seeing two distinct problems here:
>
> 1- Slow connectivity to the External network plus SSH lags all over the
> cloud (everything that pass trough L3 / Namespace is problematic), and;
>
> 2- Communication between two Instances on different hypervisors (i.e.
> maybe it is related to this GRO / CKO thing).
>
>
> So, two different problems, right?!

One or two problems I cannot say. Certainly if one got the benefit of
stateless offloads in one direction and not the other, one could see
different performance limits in each direction.

All I can really say is I liked it better when we were called Quantum,
because then I could refer to it as "Spooky networking at a distance."
Sadly, describing Neutron as "Networking with no inherent charge"
doesn't work as well :)

rick jones


_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron! =P

I'll ignore the problems related to the "performance between two instances
on different hypervisors" for now. My priority is the connectivity issue
with the External networks... At least, internal is slow but it works.

I'm about to remove the L3 Agent / Namespaces entirely from my topology...
It is a shame because it is pretty cool! With Grizzly I had no problems at
all. Plus, I need to put Havana into production ASAP! :-/

Why am I giving it (L3 / NS) up for now? Because I tried:

The option "tenant_network_type" with gre, vxlan and vlan (range
physnet1:206:256 configured at the 3Com switch as tagged).

From the instances, the connection to the External network *is always slow*,
no matter which tenant network type I choose: GRE, VXLAN or VLAN.

For example, right now, I'm using VLAN, same problem.

Don't you guys think that this could be a problem with the bridge "br-ex" and
its internals? Since I swapped the "Tenant Network Type" 3 times, same
result... But I still have not removed the br-ex from the scene.
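
One way to sanity-check br-ex itself - a sketch only, using standard OVS commands:

ovs-vsctl list-ports br-ex    # should show the external NIC plus the qg-* router gateway port
ovs-ofctl dump-flows br-ex    # the default setup is normally a single flow with action NORMAL
ip addr show br-ex            # check the MTU and any address assigned to the bridge itself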

If someone wants to debug it, I can give the root password, no problem, it
is just a lab... =)

Thanks!
Thiago

On 25 October 2013 19:45, Rick Jones <rick.jones2@hp.com> wrote:

> On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
>
>> WOW!! Thank you for your time Rick! Awesome answer!! =D
>>
>> I'll do this tests (with ethtool GRO / CKO) tonight but, do you think
>> that this is the main root of the problem?!
>>
>>
>> I mean, I'm seeing two distinct problems here:
>>
>> 1- Slow connectivity to the External network plus SSH lags all over the
>> cloud (everything that pass trough L3 / Namespace is problematic), and;
>>
>> 2- Communication between two Instances on different hypervisors (i.e.
>> maybe it is related to this GRO / CKO thing).
>>
>>
>> So, two different problems, right?!
>>
>
> One or two problems I cannot say. Certainly if one got the benefit of
> stateless offloads in one direction and not the other, one could see
> different performance limits in each direction.
>
> All I can really say is I liked it better when we were called Quantum,
> because then I could refer to it as "Spooky networking at a distance."
> Sadly, describing Neutron as "Networking with no inherent charge" doesn't
> work as well :)
>
> rick jones
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
I was able to enable "ovs_use_veth" and start Instances (VXLAN / DHCP /
Metadata Okay)... But, same problem when accessing the External network.

BTW, I have valid "Floating IPs" and easy access to the Internet from the
Network Node, if someone wants to debug, just ping a message.
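
For reference, a sketch of where that flag lives - assuming the stock Havana agent config layout on Ubuntu:

# in /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini
[DEFAULT]
ovs_use_veth = True

Then restart neutron-l3-agent and neutron-dhcp-agent so the namespace ports are re-plugged.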


On 26 October 2013 02:25, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:

> LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron! =P
>
> I'll ignore the problems related to the "performance between two instances
> on different hypervisors" for now. My priority is the connectivity issue
> with the External networks... At least, internal is slow but it works.
>
> I'm about to remove the L3 Agent / Namespaces entirely from my topology...
> It is a shame because it is pretty cool! With Grizzly I had no problems at
> all. Plus, I need to put Havana into production ASAP! :-/
>
> Why I'm giving it up (of L3 / NS) for now? Because I tried:
>
> The option "tenant_network_type" with gre, vxlan and vlan (range
> physnet1:206:256 configured at the 3Com switch as tagged).
>
> From the instances, the connection with External network *is always slow*,
> no matter if I choose for Tenants, GRE, VXLAN or VLAN.
>
> For example, right now, I'm using VLAN, same problem.
>
> Don't you guys think that this can be a problem with the bridge "br-ex"
> and its internals ? Since I swapped the "Tenant Network Type" 3 times, same
> result... But I still did not removed the br-ex from the scene.
>
> If someone wants to debug it, I can give the root password, no problem, it
> is just a lab... =)
>
> Thanks!
> Thiago
>
> On 25 October 2013 19:45, Rick Jones <rick.jones2@hp.com> wrote:
>
>> On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
>>
>>> WOW!! Thank you for your time Rick! Awesome answer!! =D
>>>
>>> I'll do this tests (with ethtool GRO / CKO) tonight but, do you think
>>> that this is the main root of the problem?!
>>>
>>>
>>> I mean, I'm seeing two distinct problems here:
>>>
>>> 1- Slow connectivity to the External network plus SSH lags all over the
>>> cloud (everything that pass trough L3 / Namespace is problematic), and;
>>>
>>> 2- Communication between two Instances on different hypervisors (i.e.
>>> maybe it is related to this GRO / CKO thing).
>>>
>>>
>>> So, two different problems, right?!
>>>
>>
>> One or two problems I cannot say. Certainly if one got the benefit of
>> stateless offloads in one direction and not the other, one could see
>> different performance limits in each direction.
>>
>> All I can really say is I liked it better when we were called Quantum,
>> because then I could refer to it as "Spooky networking at a distance."
>> Sadly, describing Neutron as "Networking with no inherent charge" doesn't
>> work as well :)
>>
>> rick jones
>>
>>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi Thiago,

so just to confirm - on the same netnode machine, with the same OS, kernel and OVS versions - Grizzly is ok and Havana is not?

Also, on the network node, are there any errors in the neutron logs, the syslog, or /var/log/openvswitch/* ?
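
For instance, a quick scan - assuming the usual Ubuntu / Cloud Archive log locations:

grep -iE 'error|trace' /var/log/neutron/*.log
grep -i error /var/log/syslog
tail -n 50 /var/log/openvswitch/ovs-vswitchd.log /var/log/openvswitch/ovsdb-server.log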


Re, Darragh.




On Saturday, 26 October 2013, 5:25, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:

LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron!     =P
>
>
>
>I'll ignore the problems related to the "performance between two instances on different hypervisors" for now. My priority is the connectivity issue with the External networks... At least, internal is slow but it works.
>
>
>I'm about to remove the L3 Agent / Namespaces entirely from my topology... It is a shame because it is pretty cool! With Grizzly I had no problems at all. Plus, I need to put Havana into production ASAP!    :-/
>
>
>Why I'm giving it up (of L3 / NS) for now? Because I tried:
>
>
>The option "tenant_network_type" with gre, vxlan and vlan (range physnet1:206:256 configured at the 3Com switch as tagged).
>
>
>From the instances, the connection with External network is always slow, no matter if I choose for Tenants, GRE, VXLAN or VLAN.
>
>
>For example, right now, I'm using VLAN, same problem.
>
>
>Don't you guys think that this can be a problem with the bridge "br-ex" and its internals ? Since I swapped the "Tenant Network Type" 3 times, same result... But I still did not removed the br-ex from the scene.
>
>
>If someone wants to debug it, I can give the root password, no problem, it is just a lab...   =)
>
>
>Thanks!
>Thiago
>
>
>On 25 October 2013 19:45, Rick Jones <rick.jones2@hp.com> wrote:
>
>On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
>>
>>WOW!! Thank you for your time Rick! Awesome answer!!    =D
>>>
>>>I'll do this tests (with ethtool GRO / CKO) tonight but, do you think
>>>that this is the main root of the problem?!
>>>
>>>
>>>I mean, I'm seeing two distinct problems here:
>>>
>>>1- Slow connectivity to the External network plus SSH lags all over the
>>>cloud (everything that pass trough L3 / Namespace is problematic), and;
>>>
>>>2- Communication between two Instances on different hypervisors (i.e.
>>>maybe it is related to this GRO / CKO thing).
>>>
>>>
>>>So, two different problems, right?!
>>>
>>
One or two problems I cannot say.    Certainly if one got the benefit of stateless offloads in one direction and not the other, one could see different performance limits in each direction.
>>
>>All I can really say is I liked it better when we were called Quantum, because then I could refer to it as "Spooky networking at a distance."  Sadly, describing Neutron as "Networking with no inherent charge" doesn't work as well :)
>>
>>rick jones
>>
>>
>
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi Darragh,

Yes, on the same net-node machine, Grizzly works, Havana doesn't... But, for
Grizzly, I have Ubuntu 12.04 with Linux 3.2 and OVS 1.4.0-1ubuntu1.6.

If I replace the Havana net-node hardware entirely, the problem persists
(i.e. it "follows" the Havana net-node), so, I think, it cannot be related to
the hardware.

I tried Havana with both OVS 1.10.2 (from Cloud Archive) and with OVS
1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg).

My logs (including Open vSwitch) right after starting an Instance (nothing
in the OVS logs):

http://paste.openstack.org/show/49870/

I tried everything, including installing the Network Node on top of a KVM
virtual machine or directly on a dedicated server; same result, the problem
follows the Havana node (virtual or physical). The Grizzly Network Node works
both on a KVM VM and on a dedicated server.

Regards,
Thiago


On 26 October 2013 06:28, Darragh OReilly <darragh.oreilly@yahoo.com> wrote:

> Hi Thiago,
>
> so just to confirm - on the same netnode machine, with the same OS, kernal
> and OVS versions - Grizzly is ok and Havana is not?
>
> Also, on the network node, are there any errors in the neutron logs, the
> syslog, or /var/log/openvswitch/* ?
>
> Re, Darragh.
>
>
> On Saturday, 26 October 2013, 5:25, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
>
> LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron! =P
>
> I'll ignore the problems related to the "performance between two instances
> on different hypervisors" for now. My priority is the connectivity issue
> with the External networks... At least, internal is slow but it works.
>
> I'm about to remove the L3 Agent / Namespaces entirely from my topology...
> It is a shame because it is pretty cool! With Grizzly I had no problems at
> all. Plus, I need to put Havana into production ASAP! :-/
>
> Why I'm giving it up (of L3 / NS) for now? Because I tried:
>
> The option "tenant_network_type" with gre, vxlan and vlan (range
> physnet1:206:256 configured at the 3Com switch as tagged).
>
> From the instances, the connection with External network *is always slow*,
> no matter if I choose for Tenants, GRE, VXLAN or VLAN.
>
> For example, right now, I'm using VLAN, same problem.
>
> Don't you guys think that this can be a problem with the bridge "br-ex"
> and its internals ? Since I swapped the "Tenant Network Type" 3 times, same
> result... But I still did not removed the br-ex from the scene.
>
> If someone wants to debug it, I can give the root password, no problem, it
> is just a lab... =)
>
> Thanks!
> Thiago
>
> On 25 October 2013 19:45, Rick Jones <rick.jones2@hp.com> wrote:
>
> On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
>
> WOW!! Thank you for your time Rick! Awesome answer!! =D
>
> I'll do this tests (with ethtool GRO / CKO) tonight but, do you think
> that this is the main root of the problem?!
>
>
> I mean, I'm seeing two distinct problems here:
>
> 1- Slow connectivity to the External network plus SSH lags all over the
> cloud (everything that pass trough L3 / Namespace is problematic), and;
>
> 2- Communication between two Instances on different hypervisors (i.e.
> maybe it is related to this GRO / CKO thing).
>
>
> So, two different problems, right?!
>
>
> One or two problems I cannot say. Certainly if one got the benefit of
> stateless offloads in one direction and not the other, one could see
> different performance limits in each direction.
>
> All I can really say is I liked it better when we were called Quantum,
> because then I could refer to it as "Spooky networking at a distance."
> Sadly, describing Neutron as "Networking with no inherent charge" doesn't
> work as well :)
>
> rick jones
>
>
>
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Stackers,

I have a small report from my latest tests.


Tests:

* Namespace (br-ex) *<->* Internet - OK

* Namespace (vxlan,gre,vlan) *<->* Tenant - OK

* Tenant *<->* Namespace *<->* Internet - *NOT-OK* (Very slow / Unstable /
Intermittent)


Since the connectivity from the Tenant to its Namespace is fine AND from its
Namespace to the Internet is also fine, it came to my mind: Hey, why not
run Squid WITHIN the Tenant Namespace as a workaround?!

And... Voilà! There, I "Fixed" It! =P


New Test:

Tenant *<->* *Namespace with Squid* *<->* Internet - OK!
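
For anyone wanting to reproduce this kind of per-leg test, a minimal sketch - the router UUID below is a placeholder; use whatever 'ip netns list' prints on the Network Node:

ip netns list
ip netns exec qrouter-<uuid> ping -c 3 8.8.8.8
ip netns exec qrouter-<uuid> wget -O /dev/null http://archive.ubuntu.com/ubuntu/dists/precise/Release

Running the same download from inside an Instance and then from inside the qrouter namespace is what separates the Tenant <-> Namespace leg from the Namespace <-> Internet leg.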


*NOTE:* I'm sure that the entire ethernet path (without L3, Namespace, OVS,
VXLANs, GREs, or Linux bridges, just plain Linux + IPs), *from the
hypervisor to the Internet*, *passing through the same Network Node hardware
/ path*, is working smoothly. I mean, I tested the entire path BEFORE
installing OpenStack Havana... So, it cannot be an "infrastructure /
hardware" issue; it must be something else, located at the software layer
running within the Network Node itself.

I'm about to send more info about this problem.

Thanks!
Thiago

On 26 October 2013 13:57, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:

> Hi Darragh,
>
> Yes, on the same net-node machine, Grizzly works, Havana don't... But, for
> Grizzly, I have Ubuntu 12.04 with Linux 3.2 and OVS 1.4.0-1ubuntu1.6.
>
> If I replace the Havana net-node hardware entirely, the problem persist
> (i.e. it "follows" Havana net-node), so, I think, it can not be related to
> the hardware.
>
> I tried Havana with both OVS 1.10.2 (from Cloud Archive) and with OVS
> 1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg).
>
> My logs (including Open vSwitch) right after starting an Instance (nothing
> at OVS logs):
>
> http://paste.openstack.org/show/49870/
>
> I tried everything, including installing the Network Node on top of a KVM
> virtual machine or directly on a dedicated server, same result, the problem
> follows Hanava node (virtual or physical). Grizzly Network Node works both
> on a KVM VM or on a dedicated server.
>
> Regards,
> Thiago
>
>
> On 26 October 2013 06:28, Darragh OReilly <darragh.oreilly@yahoo.com>wrote:
>
>> Hi Thiago,
>>
>> so just to confirm - on the same netnode machine, with the same OS,
>> kernal and OVS versions - Grizzly is ok and Havana is not?
>>
>> Also, on the network node, are there any errors in the neutron logs, the
>> syslog, or /var/log/openvswitch/* ?
>>
>> Re, Darragh.
>>
>>
>> On Saturday, 26 October 2013, 5:25, Martinx - ジェームズ <
>> thiagocmartinsc@gmail.com> wrote:
>>
>> LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron! =P
>>
>> I'll ignore the problems related to the "performance between two
>> instances on different hypervisors" for now. My priority is the
>> connectivity issue with the External networks... At least, internal is slow
>> but it works.
>>
>> I'm about to remove the L3 Agent / Namespaces entirely from my
>> topology... It is a shame because it is pretty cool! With Grizzly I had no
>> problems at all. Plus, I need to put Havana into production ASAP! :-/
>>
>> Why I'm giving it up (of L3 / NS) for now? Because I tried:
>>
>> The option "tenant_network_type" with gre, vxlan and vlan (range
>> physnet1:206:256 configured at the 3Com switch as tagged).
>>
>> From the instances, the connection with External network *is always slow*,
>> no matter if I choose for Tenants, GRE, VXLAN or VLAN.
>>
>> For example, right now, I'm using VLAN, same problem.
>>
>> Don't you guys think that this can be a problem with the bridge "br-ex"
>> and its internals ? Since I swapped the "Tenant Network Type" 3 times, same
>> result... But I still did not removed the br-ex from the scene.
>>
>> If someone wants to debug it, I can give the root password, no problem,
>> it is just a lab... =)
>>
>> Thanks!
>> Thiago
>>
>> On 25 October 2013 19:45, Rick Jones <rick.jones2@hp.com> wrote:
>>
>> On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
>>
>> WOW!! Thank you for your time Rick! Awesome answer!! =D
>>
>> I'll do this tests (with ethtool GRO / CKO) tonight but, do you think
>> that this is the main root of the problem?!
>>
>>
>> I mean, I'm seeing two distinct problems here:
>>
>> 1- Slow connectivity to the External network plus SSH lags all over the
>> cloud (everything that pass trough L3 / Namespace is problematic), and;
>>
>> 2- Communication between two Instances on different hypervisors (i.e.
>> maybe it is related to this GRO / CKO thing).
>>
>>
>> So, two different problems, right?!
>>
>>
>> One or two problems I cannot say. Certainly if one got the benefit of
>> stateless offloads in one direction and not the other, one could see
>> different performance limits in each direction.
>>
>> All I can really say is I liked it better when we were called Quantum,
>> because then I could refer to it as "Spooky networking at a distance."
>> Sadly, describing Neutron as "Networking with no inherent charge" doesn't
>> work as well :)
>>
>> rick jones
>>
>>
>>
>>
>>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Thiago,

some more answers below.

Btw: I saw the problem with a "qemu-nbd -c" process using all the cpu on the compute. It happened just once - must be a bug in it. You can disable libvirt injection if you don't want it by setting "libvirt_inject_partition = -2" in nova.conf.
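
A sketch of that setting - assuming the Havana layout, where it still lives in the DEFAULT section of nova.conf on the compute nodes:

# /etc/nova/nova.conf
[DEFAULT]
libvirt_inject_partition = -2

Restart nova-compute afterwards.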


On Saturday, 26 October 2013, 16:58, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:

Hi Darragh,
>
>
>Yes, on the same net-node machine, Grizzly works, Havana don't... But, for Grizzly, I have Ubuntu 12.04 with Linux 3.2 and >OVS 1.4.0-1ubuntu1.6.


so we don't know if the problem is due to Neutron, the Ubuntu kernel or OVS. I suspect the kernel as it implements the routing/natting, interfaces and namespaces. I don't think Neutron Havana changes how these things are set up too much.

Can you try running Havana on a network node with the Linux 3.2 kernel?
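
A sketch of how that could be checked and switched on stock 12.04 - package names from memory, so please verify before use:

uname -r                               # currently 3.8.0-32-generic from the lts-raring backport stack
apt-get install linux-image-generic    # pulls in the 3.2 series kernel shipped with precise
# reboot and pick the 3.2 kernel from the GRUB menu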


>
>
>If I replace the Havana net-node hardware entirely, the problem persist (i.e. it "follows" Havana net-node), so, I think, it can not be related to the hardware.
>
>
>I tried Havana with both OVS 1.10.2 (from Cloud Archive) and with OVS 1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg).
>
>
>My logs (including Open vSwitch) right after starting an Instance (nothing at OVS logs):
>
>
>http://paste.openstack.org/show/49870/
>
>
>
>I tried everything, including installing the Network Node on top of a KVM virtual machine or directly on a dedicated server, same result, the problem follows Hanava node (virtual or physical). Grizzly Network Node works both on a KVM VM or on a dedicated server.
>
>
>Regards,
>Thiago
>
>
>
>On 26 October 2013 06:28, Darragh OReilly wrote:
>
>Hi Thiago,
>>
>>so just to confirm - on the same netnode machine, with the same OS, kernal and OVS versions - Grizzly is ok and Havana is not?
>>
>>Also, on the network node, are there any errors in the neutron logs, the syslog, or /var/log/openvswitch/* ?
>>
>>
>>
>>Re, Darragh.
>>
>>
>>
>>
>>On Saturday, 26 October 2013, 5:25, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:
>>
>>LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron!     =P
>>>
>>>
>>>
>>>I'll ignore the problems related to the "performance between two instances on different hypervisors" for now. My priority is the connectivity issue with the External networks... At least, internal is slow but it works.
>>>
>>>
>>>I'm about to remove the L3 Agent / Namespaces entirely from my topology... It is a shame because it is pretty cool! With Grizzly I had no problems at all. Plus, I need to put Havana into production ASAP!    :-/
>>>
>>>
>>>Why I'm giving it up (of L3 / NS) for now? Because I tried:
>>>
>>>
>>>The option "tenant_network_type" with gre, vxlan and vlan (range physnet1:206:256 configured at the 3Com switch as tagged).
>>>
>>>
>>>From the instances, the connection with External network is always slow, no matter if I choose for Tenants, GRE, VXLAN or VLAN.
>>>
>>>
>>>For example, right now, I'm using VLAN, same problem.
>>>
>>>
>>>Don't you guys think that this can be a problem with the bridge "br-ex" and its internals ? Since I swapped the "Tenant Network Type" 3 times, same result... But I still did not removed the br-ex from the scene.
>>>
>>>
>>>If someone wants to debug it, I can give the root password, no problem, it is just a lab...   =)
>>>
>>>
>>>Thanks!
>>>Thiago
>>>
>>>
>>>On 25 October 2013 19:45, Rick Jones <rick.jones2@hp.com> wrote:
>>>
>>>On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
>>>>
>>>>WOW!! Thank you for your time Rick! Awesome answer!!    =D
>>>>>
>>>>>I'll do this tests (with ethtool GRO / CKO) tonight but, do you think
>>>>>that this is the main root of the problem?!
>>>>>
>>>>>
>>>>>I mean, I'm seeing two distinct problems here:
>>>>>
>>>>>1- Slow connectivity to the External network plus SSH lags all over the
>>>>>cloud (everything that pass trough L3 / Namespace is problematic), and;
>>>>>
>>>>>2- Communication between two Instances on different hypervisors (i.e.
>>>>>maybe it is related to this GRO / CKO thing).
>>>>>
>>>>>
>>>>>So, two different problems, right?!
>>>>>
>>>>
One or two problems I cannot say.    Certainly if one got the benefit of stateless offloads in one direction and not the other, one could see different performance limits in each direction.
>>>>
>>>>All I can really say is I liked it better when we were called Quantum, because then I could refer to it as "Spooky networking at a distance."  Sadly, describing Neutron as "Networking with no inherent charge" doesn't work as well :)
>>>>
>>>>rick jones
>>>>
>>>>
>>>
>>>
>>>
>
>
>

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hello Stackers!

Sorry to not back on this topic last week, too many things to do...

So, instead of trying this and that, replying here and there... I made a
video about this problem; I hope it helps more than the e-mails that
I'm writing! =P

Honestly, I don't know the source of this problem, if it is with OpenStack
/ Neutron, or with "Linux / Namespace / OVS"... It would be great to test
it alone, Ubuntu Linux + Namespace + OVS (without Neutron), to see if the
problem persists but, I have no idea about how to set everything up just
like Neutron does. Maybe I just need to reproduce the "Namespace and OVS
bridges / ports / VXLAN - as is", without Neutron?! I can try that...

Also, my Grizzly setup is gone, I deleted it... Sorry about that... I know
it worked because this is the first time I'm seeing this problem... I had used
Grizzly for ~5 months with only 1 problem (related to MTU 1400) but, this
problem with Havana is totally different...


Video:

OpenStack Havana L3 Router problem - Ubuntu 12.04.3 LTS:
http://www.youtube.com/watch?v=jVjiphMuuzM


* After 5 minutes, I inserted a new video, showing how I "fixed" it, by
running Squid within the Tenant router. You guys can see that, using the
default Tenant router (10:30), it will take about 1 hour to finish the
"apt-get download" and, with Squid (09:27), it goes down to about 3 minutes
(no, it is still not cached, I clean it for each test).


Sorry about the size of the video, it is about 12 minutes and high-res (to
see the screen details) but, it is a serious problem and I think it is worth
watching...

NOTE: Sorry about my English! It is very hard to "speak" a non-native
language, handling an Android phone and typing the keyboard... :-)

Best!
Thiago



On 28 October 2013 07:00, Darragh O'Reilly <dara2002-openstack@yahoo.com>wrote:

> Thiago,
>
> some more answers below.
>
> Btw: I saw the problem with a "qemu-nbd -c" process using all the cpu on
> the compute. It happened just once - must be a bug in it. You can disable
> libvirt injection if you don't want it by setting "libvirt_inject_partition
> = -2" in nova.conf.
>
>
> On Saturday, 26 October 2013, 16:58, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
>
> Hi Darragh,
> >
> >
> >Yes, on the same net-node machine, Grizzly works, Havana don't... But,
> for Grizzly, I have Ubuntu 12.04 with Linux 3.2 and >OVS 1.4.0-1ubuntu1.6.
>
>
> so we don't know if the problem is due to Neutron, the Ubuntu kernel or
> OVS. I suspect the kernel as it implements the routing/natting, interfaces
> and namespaces. I don't think Neutron Havana changes how these things are
> setup too much.
>
> Can you try running Havana on a network node with the Linux 3.2 kernel?
>
>
> >
> >
> >If I replace the Havana net-node hardware entirely, the problem persist
> (i.e. it "follows" Havana net-node), so, I think, it can not be related to
> the hardware.
> >
> >
> >I tried Havana with both OVS 1.10.2 (from Cloud Archive) and with OVS
> 1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg).
> >
> >
> >My logs (including Open vSwitch) right after starting an Instance
> (nothing at OVS logs):
> >
> >
> >http://paste.openstack.org/show/49870/
> >
> >
> >
> >I tried everything, including installing the Network Node on top of a KVM
> virtual machine or directly on a dedicated server, same result, the problem
> follows Hanava node (virtual or physical). Grizzly Network Node works both
> on a KVM VM or on a dedicated server.
> >
> >
> >Regards,
> >Thiago
> >
> >
> >
> >On 26 October 2013 06:28, Darragh OReilly wrote:
> >
> >Hi Thiago,
> >>
> >>so just to confirm - on the same netnode machine, with the same OS,
> kernal and OVS versions - Grizzly is ok and Havana is not?
> >>
> >>Also, on the network node, are there any errors in the neutron logs, the
> syslog, or /var/log/openvswitch/* ?
> >>
> >>
> >>
> >>Re, Darragh.
> >>
> >>
> >>
> >>
> >>On Saturday, 26 October 2013, 5:25, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
> >>
> >>LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron!
> =P
> >>>
> >>>
> >>>
> >>>I'll ignore the problems related to the "performance between two
> instances on different hypervisors" for now. My priority is the
> connectivity issue with the External networks... At least, internal is slow
> but it works.
> >>>
> >>>
> >>>I'm about to remove the L3 Agent / Namespaces entirely from my
> topology... It is a shame because it is pretty cool! With Grizzly I had no
> problems at all. Plus, I need to put Havana into production ASAP! :-/
> >>>
> >>>
> >>>Why I'm giving it up (of L3 / NS) for now? Because I tried:
> >>>
> >>>
> >>>The option "tenant_network_type" with gre, vxlan and vlan (range
> physnet1:206:256 configured at the 3Com switch as tagged).
> >>>
> >>>
> >>>From the instances, the connection with External network is always
> slow, no matter if I choose for Tenants, GRE, VXLAN or VLAN.
> >>>
> >>>
> >>>For example, right now, I'm using VLAN, same problem.
> >>>
> >>>
> >>>Don't you guys think that this can be a problem with the bridge "br-ex"
> and its internals ? Since I swapped the "Tenant Network Type" 3 times, same
> result... But I still did not removed the br-ex from the scene.
> >>>
> >>>
> >>>If someone wants to debug it, I can give the root password, no problem,
> it is just a lab... =)
> >>>
> >>>
> >>>Thanks!
> >>>Thiago
> >>>
> >>>
> >>>On 25 October 2013 19:45, Rick Jones <rick.jones2@hp.com> wrote:
> >>>
> >>>On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
> >>>>
> >>>>WOW!! Thank you for your time Rick! Awesome answer!! =D
> >>>>>
> >>>>>I'll do this tests (with ethtool GRO / CKO) tonight but, do you think
> >>>>>that this is the main root of the problem?!
> >>>>>
> >>>>>
> >>>>>I mean, I'm seeing two distinct problems here:
> >>>>>
> >>>>>1- Slow connectivity to the External network plus SSH lags all over
> the
> >>>>>cloud (everything that pass trough L3 / Namespace is problematic),
> and;
> >>>>>
> >>>>>2- Communication between two Instances on different hypervisors (i.e.
> >>>>>maybe it is related to this GRO / CKO thing).
> >>>>>
> >>>>>
> >>>>>So, two different problems, right?!
> >>>>>
> >>>>
> One or two problems I cannot say. Certainly if one got the benefit of
> stateless offloads in one direction and not the other, one could see
> different performance limits in each direction.
> >>>>
> >>>>All I can really say is I liked it better when we were called Quantum,
> because then I could refer to it as "Spooky networking at a distance."
> Sadly, describing Neutron as "Networking with no inherent charge" doesn't
> work as well :)
> >>>>
> >>>>rick jones
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >
> >
> >
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Guys,

This problem is kind of a "deal breaker"... I was counting on OpenStack
Havana (with Ubuntu) for the first public cloud that I'm (was) about to
announce / launch, but this problem changed everything.

I cannot put Havana with Ubuntu LTS into production because of this
network issue. This is a very serious problem for me... All sites, and
even ssh connections, that pass through the "Floating IPs" into the
tenants' subnets are very slow, and all the connections freeze for
seconds, every minute.
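One way to narrow it down would be to run the same tests from inside the
qrouter namespace on the network node, so that only the namespace / NAT hop
is in the path (a sketch; the router UUID and instance IP below are only
placeholders):

sudo ip netns list
sudo ip netns exec qrouter-<uuid> iperf -c <instance-fixed-ip> -m
# watch for retransmits / duplicate ACKs on the router's qg-/qr- interfaces
sudo ip netns exec qrouter-<uuid> tcpdump -ni any tcp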

Again, I'm seeing that there is no way to put Havana into production (using
Per-Tenant Routers with Private Networks), *because the Network Node is
broken*. At least with Ubuntu... I'll try it with Debian 7, or CentOS
(which I don't like), just to see if the problem persists, but I have
preferred the Ubuntu distro since Warty Warthog... :-/

So, what is being done to fix it? I already tried everything I could,
without any kind of success...

Also, I followed this doc (to triple * triple re-check my env):
http://docs.openstack.org/havana/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
but it does not work as expected.

BTW, I can give you guys full access to my environment, no problem...
I can build a lab from scratch following your instructions, and I can also
give root access to OpenStack experts... Just let me know... =)

Thanks!
Thiago

On 6 November 2013 09:20, Martinx - ジェームズ <thiagocmartinsc@gmail.com> wrote:

> Hello Stackers!
>
> Sorry to not back on this topic last week, too many things to do...
>
> So, instead of trying this and that, reply this, reply again... I made a
> video about this problem, I hope that helps more than those e-mails that
> I'm writing! =P
>
> Honestly, I don't know the source of this problem, if it is with OpenStack
> / Neutron, or with "Linux / Namespace / OVS"... It would be great to test
> it alone, Ubuntu Linux + Namespace + OVS (without Neutron), to see if the
> problem persist but, I have no idea about how to setup everything, just
> like Neutron does. Maybe, I just need to reproduce the "Namespace and OVS
> bridges / ports / VXLAN - as is", without Neutron?! I can try that...
>
> Also, my Grizzly setup is gone, I deleted it... Sorry about that... I know
> it works because it is the first time I'm seeing this problem... I had used
> Grizzly for ~5 months with only 1 problem (related to MTU 1400) but, this
> problem with Havana is totally different...
>
>
> Video:
>
> OpenStack Havana L3 Router problem - Ubuntu 12.04.3 LTS:
> http://www.youtube.com/watch?v=jVjiphMuuzM
>
>
> * After 5 minutes, I inserted a new video, showing how I "fixed" it, by
> running Squid within the Tenant router. You guys can see that, using the
> default Tenant router (10:30), it will take about 1 hour to finish the
> "apt-get download" and, with Squid (09:27), it goes down to about 3 minutes
> (no, it is still not cached, I clean it for each test).
>
>
> Sorry about the size of the video, it is about 12 minutes and high-res (to
> see the screen details) but, it is a serious problem and I think it worth
> watching it...
>
> NOTE: Sorry about my English! It is very hard to "speak" a non-native
> language, handling an Android phone and typing the keyboard... :-)
>
> Best!
> Thiago
>
>
>
> On 28 October 2013 07:00, Darragh O'Reilly <dara2002-openstack@yahoo.com>wrote:
>
>> Thiago,
>>
>> some more answers below.
>>
>> Btw: I saw the problem with a "qemu-nbd -c" process using all the cpu on
>> the compute. It happened just once - must be a bug in it. You can disable
>> libvirt injection if you don't want it by setting "libvirt_inject_partition
>> = -2" in nova.conf.
>>
>>
>> On Saturday, 26 October 2013, 16:58, Martinx - ジェームズ <
>> thiagocmartinsc@gmail.com> wrote:
>>
>> Hi Darragh,
>> >
>> >
>> >Yes, on the same net-node machine, Grizzly works, Havana don't... But,
>> for Grizzly, I have Ubuntu 12.04 with Linux 3.2 and >OVS 1.4.0-1ubuntu1.6.
>>
>>
>> so we don't know if the problem is due to Neutron, the Ubuntu kernel or
>> OVS. I suspect the kernel as it implements the routing/natting, interfaces
>> and namespaces. I don't think Neutron Havana changes how these things are
>> setup too much.
>>
>> Can you try running Havana on a network node with the Linux 3.2 kernel?
>>
>>
>> >
>> >
>> >If I replace the Havana net-node hardware entirely, the problem persist
>> (i.e. it "follows" Havana net-node), so, I think, it can not be related to
>> the hardware.
>> >
>> >
>> >I tried Havana with both OVS 1.10.2 (from Cloud Archive) and with OVS
>> 1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg).
>> >
>> >
>> >My logs (including Open vSwitch) right after starting an Instance
>> (nothing at OVS logs):
>> >
>> >
>> >http://paste.openstack.org/show/49870/
>> >
>> >
>> >
>> >I tried everything, including installing the Network Node on top of a
>> KVM virtual machine or directly on a dedicated server, same result, the
>> problem follows Hanava node (virtual or physical). Grizzly Network Node
>> works both on a KVM VM or on a dedicated server.
>> >
>> >
>> >Regards,
>> >Thiago
>> >
>> >
>> >
>> >On 26 October 2013 06:28, Darragh OReilly wrote:
>> >
>> >Hi Thiago,
>> >>
>> >>so just to confirm - on the same netnode machine, with the same OS,
>> kernal and OVS versions - Grizzly is ok and Havana is not?
>> >>
>> >>Also, on the network node, are there any errors in the neutron logs,
>> the syslog, or /var/log/openvswitch/* ?
>> >>
>> >>
>> >>
>> >>Re, Darragh.
>> >>
>> >>
>> >>
>> >>
>> >>On Saturday, 26 October 2013, 5:25, Martinx - ジェームズ <
>> thiagocmartinsc@gmail.com> wrote:
>> >>
>> >>LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron!
>> =P
>> >>>
>> >>>
>> >>>
>> >>>I'll ignore the problems related to the "performance between two
>> instances on different hypervisors" for now. My priority is the
>> connectivity issue with the External networks... At least, internal is slow
>> but it works.
>> >>>
>> >>>
>> >>>I'm about to remove the L3 Agent / Namespaces entirely from my
>> topology... It is a shame because it is pretty cool! With Grizzly I had no
>> problems at all. Plus, I need to put Havana into production ASAP! :-/
>> >>>
>> >>>
>> >>>Why I'm giving it up (of L3 / NS) for now? Because I tried:
>> >>>
>> >>>
>> >>>The option "tenant_network_type" with gre, vxlan and vlan (range
>> physnet1:206:256 configured at the 3Com switch as tagged).
>> >>>
>> >>>
>> >>>From the instances, the connection with External network is always
>> slow, no matter if I choose for Tenants, GRE, VXLAN or VLAN.
>> >>>
>> >>>
>> >>>For example, right now, I'm using VLAN, same problem.
>> >>>
>> >>>
>> >>>Don't you guys think that this can be a problem with the bridge
>> "br-ex" and its internals ? Since I swapped the "Tenant Network Type" 3
>> times, same result... But I still did not removed the br-ex from the scene.
>> >>>
>> >>>
>> >>>If someone wants to debug it, I can give the root password, no
>> problem, it is just a lab... =)
>> >>>
>> >>>
>> >>>Thanks!
>> >>>Thiago
>> >>>
>> >>>
>> >>>On 25 October 2013 19:45, Rick Jones <rick.jones2@hp.com> wrote:
>> >>>
>> >>>On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
>> >>>>
>> >>>>WOW!! Thank you for your time Rick! Awesome answer!! =D
>> >>>>>
>> >>>>>I'll do this tests (with ethtool GRO / CKO) tonight but, do you think
>> >>>>>that this is the main root of the problem?!
>> >>>>>
>> >>>>>
>> >>>>>I mean, I'm seeing two distinct problems here:
>> >>>>>
>> >>>>>1- Slow connectivity to the External network plus SSH lags all over
>> the
>> >>>>>cloud (everything that pass trough L3 / Namespace is problematic),
>> and;
>> >>>>>
>> >>>>>2- Communication between two Instances on different hypervisors (i.e.
>> >>>>>maybe it is related to this GRO / CKO thing).
>> >>>>>
>> >>>>>
>> >>>>>So, two different problems, right?!
>> >>>>>
>> >>>>
>> One or two problems I cannot say. Certainly if one got the benefit of
>> stateless offloads in one direction and not the other, one could see
>> different performance limits in each direction.
>> >>>>
>> >>>>All I can really say is I liked it better when we were called
>> Quantum, because then I could refer to it as "Spooky networking at a
>> distance." Sadly, describing Neutron as "Networking with no inherent
>> charge" doesn't work as well :)
>> >>>>
>> >>>>rick jones
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >
>> >
>> >
>>
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 10/11/13 00:09, Martinx - ジェームズ wrote:
> Also, I followed this doc (to triple * triple re-check my env):
> http://docs.openstack.org/havana/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
> but, it does not work as expected.
>
> BTW, I can give full access into my environment for you guys, no
> problem... I can build a lab from scratch, following your
> instructions, I can also give root access to OpenStack experts...
> Just, let me know... =)

Hey

If you can set this up I can spare some time to help you debug
tomorrow (Monday) between 0900 and 1800 UTC.

Cheers

James

- --
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSf1M3AAoJEL/srsug59jDuacP/1hU3tDk6Dk9I+jsxjwlIH9H
JeIs00GoLdB3yg+M/c1aWbPV9Pihgm1mC27aI/mnBXO5gOhQ/U8oss4jjz+46Cpx
qCWuIgsRFD0OpR/DzZ0cbzB64Pa/vzg9Sb3NP5YrQxvcI/WYJVDHhLuc/rvyfBsD
zi/H4ODIOb9ptZ5fbJyQGbmZUHArdUJ9FaN57PYB0Y7KQOejhYE3qjqk/IjIXm7e
mMAVVyHf8EVadcEFy+D+CxpIBXQIgjrzy5Amhrw/3q9DPs3OHoXWAGU8/ApDZiVP
yo01Pm3ZnlnXfFw3csf0PJEMKAkE3wKb/9YzXWBXNHHND0+zRKNyCCB8RE+hDDnu
M72Lj1zrXkFHhAWbPM3gsGHzGY8bsTswYDvOGrB8cTf8KcF54m8ruJb/lzdesHh3
l0cyTUKkwuWkZ4LJ63oI7FIsL4bTGt/bBvjf3FF0iFIK0OFxuGuvKtZpdi9xek8i
ihy/f0r+AlPA5pU1nMkTsOhS1v61GKLF1ygXBK0PLBeHX5wnnnxqchS4yVkjSRup
fwPmb0u2gLD8gbPINXi46sePuCwn8acBFdIvNoz9v4APYGrLgnS7rWinrjrOCHTq
EsuZ6fYs5Lnr48tPlv3WxmpHM9UNknio1zy+Bk3vrNL/43ppjkJYXVVE/JstmcYk
NjrHeUuQkdENzBZvRODx
=CcA2
-----END PGP SIGNATURE-----

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
On 11/09/2013 07:09 PM, Martinx - ジェームズ wrote:
> Guys,
>
> This problem is kind of a "deal breaker"... I was counting on OpenStack
> Havana (and with Ubuntu) for my first public cloud that I'm (was) about
> to announce / launch but, this problem changed everything.
>
> I can not put Havana with Ubuntu LTS into production because of this
> network issue. This is a very serious problem for me... Since all sites,
> or even ssh connections, that pass through the "Floating IPs" entering
> into the tenant's subnets, are very slow and, all the connections
> freezes for seconds, every minute.
>
> Again, I'm seeing that there is no way to put Havana into production
> (using Per-Tenant Routers with Private Networks), _because the Network
> Node is broken_. At least when with Ubuntu... I'll try it with Debian 7,
> or CentOS (I don't like it), just to see if the problem persist but, I
> prefer Ubuntu distro since Warty Warthog... :-/
>
> So, what is being done to fix it? I already tried everything I could,
> without any kind of success...
>
> Also, I followed this doc (to triple * triple re-check my env):
> http://docs.openstack.org/havana/install-guide/install/apt/content/section_networking-routers-with-private-networks.html but,
> it does not work as expected.

I'd just like to point out that it is indeed possible to achieve good
network performance (bi-directional) with Ubuntu 12.04, OVS 1.11, and
OpenStack Grizzly with Neutron and GRE tunnels. We've deployed two zones
with it and after upgrading to OVS 1.11, we are seeing pretty good
performance.

We use the OpenStack Chef cookbooks to configure Neutron:

https://github.com/stackforge/cookbook-openstack-network

You may want to go through the above cookbook and check the default
settings that are in the attributes and written to the configuration
file templates.
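For comparison against such a deployment, something like this shows the
settings a node actually ended up with (a sketch; the paths assume the stock
Ubuntu / Cloud Archive packaging):

# effective OVS plugin / agent settings on the network and compute nodes
grep -Ev '^(#|$)' /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
# and what the agent actually built
sudo ovs-vsctl show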

I don't know of anything that changed between Grizzly and Havana that
would have had an impact on network performance, but perhaps someone
from the Neutron dev community could chime in here and say whether there's
been anything added in the Havana timeframe that may affect network
performance...

Best,
-jay


_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi Jay!

Thank you! I'll definitely take a look at those cookbooks, but I already
tried Havana (Cloud Archive) with OVS 1.11.0, with the same poor results.

Also, my previous region based on Grizzly / Quantum / GRE worked perfectly
for months (except for the MTU = 1400 issue), and Havana is somehow different.
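In case it turns out to be MTU-related again, a quick path-MTU check from
inside an instance would be something like this (the target address is only
an example):

# 1472 = 1500 - 28 bytes of ICMP/IP headers; lower the size until the ping passes
ping -M do -s 1472 8.8.8.8
ping -M do -s 1372 8.8.8.8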

Thanks!
Thiago


On 10 November 2013 15:21, Jay Pipes <jaypipes@gmail.com> wrote:

> On 11/09/2013 07:09 PM, Martinx - ジェームズ wrote:
>
>> Guys,
>>
>> This problem is kind of a "deal breaker"... I was counting on OpenStack
>> Havana (and with Ubuntu) for my first public cloud that I'm (was) about
>> to announce / launch but, this problem changed everything.
>>
>> I can not put Havana with Ubuntu LTS into production because of this
>> network issue. This is a very serious problem for me... Since all sites,
>> or even ssh connections, that pass through the "Floating IPs" entering
>> into the tenant's subnets, are very slow and, all the connections
>> freezes for seconds, every minute.
>>
>> Again, I'm seeing that there is no way to put Havana into production
>> (using Per-Tenant Routers with Private Networks), _because the Network
>> Node is broken_. At least when with Ubuntu... I'll try it with Debian 7,
>>
>> or CentOS (I don't like it), just to see if the problem persist but, I
>> prefer Ubuntu distro since Warty Warthog... :-/
>>
>> So, what is being done to fix it? I already tried everything I could,
>> without any kind of success...
>>
>> Also, I followed this doc (to triple * triple re-check my env):
>> http://docs.openstack.org/havana/install-guide/install/
>> apt/content/section_networking-routers-with-private-networks.html but,
>> it does not work as expected.
>>
>
> I'd just like to point out that it is indeed possible to achieve good
> network performance (bi-directional) with Ubuntu 12.04, OVS 1.11, and
> OpenStack Grizzly with Neutron and GRE tunnels. We've deployed two zones
> with it and after upgrading to OVS 1.11, we are seeing pretty good
> performance.
>
> We use the OpenStack Chef cookbooks to configure Neutron:
>
> https://github.com/stackforge/cookbook-openstack-network
>
> You may want to go through the above cookbook and check the default
> settings that are in the attributes and written to the configuration file
> templates.
>
> I don't know of anything that changed between Grizzly and Havana that
> would have had an impact on network performance, but perhaps someone from
> the Neutron dev community could chime in here and write if there's been
> anything added in the Havana timeframe that may affect network
> performance...
>
> Best,
> -jay
>
>
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/
> openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/
> openstack
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
On 11/10/2013 01:35 PM, Martinx - ジェームズ wrote:
> Hi Jay!
>
> Thank you! I'll definitely take a look at those cookbooks but, I already
> tried Havana (Cloud Archive) with OVS 1.11.0, same poor results.
>
> Also, my previous region based on Grizzly / Quantum / GRE, worked
> perfectly for months (except with MTU = 1400) and, Havana is somehow
> different.

Interesting. Well, we're just beginning the process of our Havana
deployment testing and changes, so we'll certainly be double-checking
performance based on the above feedback.

Best,
-jay


_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Cool! Let me know what you'll need.

I'll make a tenant / project / user for you here at my cloud and I can give
you root access to the network node (or any openstack node).

Let me know if it is enough for you to debug / test it.

Cheers!
Thiago


On 10 November 2013 07:34, James Page <james.page@ubuntu.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> On 10/11/13 00:09, Martinx - ジェームズ wrote:
> > Also, I followed this doc (to triple * triple re-check my env):
> >
> http://docs.openstack.org/havana/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
> > but, it does not work as expected.
> >
> > BTW, I can give full access into my environment for you guys, no
> > problem... I can build a lab from scratch, following your
> > instructions, I can also give root access to OpenStack experts...
> > Just, let me know... =)
>
> Hey
>
> If you can set this up I can spare some time to help you debug
> tomorrow (monday) between 0900 and 1800 utc
>
> Cheers
>
> James
>
> - --
> James Page
> Ubuntu and Debian Developer
> james.page@ubuntu.com
> jamespage@debian.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIcBAEBCAAGBQJSf1M3AAoJEL/srsug59jDuacP/1hU3tDk6Dk9I+jsxjwlIH9H
> JeIs00GoLdB3yg+M/c1aWbPV9Pihgm1mC27aI/mnBXO5gOhQ/U8oss4jjz+46Cpx
> qCWuIgsRFD0OpR/DzZ0cbzB64Pa/vzg9Sb3NP5YrQxvcI/WYJVDHhLuc/rvyfBsD
> zi/H4ODIOb9ptZ5fbJyQGbmZUHArdUJ9FaN57PYB0Y7KQOejhYE3qjqk/IjIXm7e
> mMAVVyHf8EVadcEFy+D+CxpIBXQIgjrzy5Amhrw/3q9DPs3OHoXWAGU8/ApDZiVP
> yo01Pm3ZnlnXfFw3csf0PJEMKAkE3wKb/9YzXWBXNHHND0+zRKNyCCB8RE+hDDnu
> M72Lj1zrXkFHhAWbPM3gsGHzGY8bsTswYDvOGrB8cTf8KcF54m8ruJb/lzdesHh3
> l0cyTUKkwuWkZ4LJ63oI7FIsL4bTGt/bBvjf3FF0iFIK0OFxuGuvKtZpdi9xek8i
> ihy/f0r+AlPA5pU1nMkTsOhS1v61GKLF1ygXBK0PLBeHX5wnnnxqchS4yVkjSRup
> fwPmb0u2gLD8gbPINXi46sePuCwn8acBFdIvNoz9v4APYGrLgnS7rWinrjrOCHTq
> EsuZ6fYs5Lnr48tPlv3WxmpHM9UNknio1zy+Bk3vrNL/43ppjkJYXVVE/JstmcYk
> NjrHeUuQkdENzBZvRODx
> =CcA2
> -----END PGP SIGNATURE-----
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
I suddenly have the identical situation occurring here - of note, I am
using Grizzly, and there have been two changes to the environment that have
seemingly caused this: an upgrade of OVS to 1.11 and an upgrade of quantum-*
from 2013.1.2 to 2013.1.3.

I haven't tried the default 1.04 from 12.04 and I can't, as this is a prod
system.

However, if the OpenStack update is causing it, then here is the place to
start, I suspect: https://launchpad.net/neutron/grizzly/2013.1.3

The performance of 1.04 in my env makes that unusable.
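If it helps anyone isolate which of the two upgrades matters, the installed
versions and the upgrade history can be pulled like this (package names as
in the Ubuntu archives; just a sketch):

dpkg -l | grep -E 'openvswitch|quantum'
apt-cache policy openvswitch-switch openvswitch-datapath-dkms quantum-plugin-openvswitch-agent quantum-l3-agent
# the upgrade history is also in the apt logs
grep -E 'openvswitch|quantum' /var/log/apt/history.log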


--
Geraint Jones




On 11/11/13 2:47 am, "Jay Pipes" <jaypipes@gmail.com> wrote:

>On 11/10/2013 01:35 PM, Martinx - ジェームズ wrote:
>> Hi Jay!
>>
>> Thank you! I'll definitely take a look at those cookbooks but, I
>>already
>> tried Havana (Cloud Archive) with OVS 1.11.0, same poor results.
>>
>> Also, my previous region based on Grizzly / Quantum / GRE, worked
>> perfectly for months (except with MTU = 1400) and, Havana is somehow
>> different.
>
>Interesting. Well, we're just beginning the process of our Havana
>deployment testing and changes, so we'll certainly be double-checking
>performance based on the above feedback.
>
>Best,
>-jay
>
>
>_______________________________________________
>Mailing list:
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>Post to : openstack@lists.openstack.org
>Unsubscribe :
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack



_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
At least one guy from Rackspace is aware of this problem, thanks Anne and
James Denton! ^_^

Hope to talk with James Page on IRC tomorrow, today was too complicated for
me... More experts coming!

I have a good environment for you guys to test and debug this in depth, if
desired.

BTW, hey Ubuntu guys! Please, release the ML2 plugin! ASAP!! I would love
to try it! =D

Best,
Thiago


On 12 November 2013 02:40, Geraint Jones <geraint@koding.com> wrote:

> I suddenly have the identical situation occurring here - of note I am
> using grizzly and there have been two changes to the environment that have
> seemingly caused this : upgrade of OVS to 1.11 and upgrade of quantum-*
> from 2013.1.2 to 2013.1.3
>
> I haven't tried the default 1.04 from 12.04 and I can't as this is a prod
> system.
>
> However if the openstack update is causing it then here is the place to
> start I suspect : https://launchpad.net/neutron/grizzly/2013.1.3
>
> Performance of 1.04 in my env makes that unusable.
>
>
> --
> Geraint Jones
>
>
>
>
> On 11/11/13 2:47 am, "Jay Pipes" <jaypipes@gmail.com> wrote:
>
> >On 11/10/2013 01:35 PM, Martinx - ジェームズ wrote:
> >> Hi Jay!
> >>
> >> Thank you! I'll definitely take a look at those cookbooks but, I
> >>already
> >> tried Havana (Cloud Archive) with OVS 1.11.0, same poor results.
> >>
> >> Also, my previous region based on Grizzly / Quantum / GRE, worked
> >> perfectly for months (except with MTU = 1400) and, Havana is somehow
> >> different.
> >
> >Interesting. Well, we're just beginning the process of our Havana
> >deployment testing and changes, so we'll certainly be double-checking
> >performance based on the above feedback.
> >
> >Best,
> >-jay
> >
> >
> >_______________________________________________
> >Mailing list:
> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> >Post to : openstack@lists.openstack.org
> >Unsubscribe :
> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Guys,

Can I file a BUG about this issue?! If yes, where?! The Neutron Launchpad page?

Tks,
Thiago


On 12 November 2013 04:24, Martinx - ジェームズ <thiagocmartinsc@gmail.com>wrote:

> At least one guy from Rackspace is aware of this problem, thanks Anne and
> James Denton! ^_^
>
> Hope to talk with James Page on IRC tomorrow, today was too complicated
> for me... More experts coming!
>
> I have a good environment for you guys to test and debug this in deep, if
> desired.
>
> BTW, hey Ubuntu guys! Please, release the ML2 plugin! ASAP!! I would love
> to try it! =D
>
> Best,
> Thiago
>
>
> On 12 November 2013 02:40, Geraint Jones <geraint@koding.com> wrote:
>
>> I suddenly have the identical situation occurring here - of note I am
>> using grizzly and there have been two changes to the environment that have
>> seemingly caused this : upgrade of OVS to 1.11 and upgrade of quantum-*
>> from 2013.1.2 to 2013.1.3
>>
>> I haven't tried the default 1.04 from 12.04 and I can't as this is a prod
>> system.
>>
>> However if the openstack update is causing it then here is the place to
>> start I suspect : https://launchpad.net/neutron/grizzly/2013.1.3
>>
>> Performance of 1.04 in my env makes that unusable.
>>
>>
>> --
>> Geraint Jones
>>
>>
>>
>>
>> On 11/11/13 2:47 am, "Jay Pipes" <jaypipes@gmail.com> wrote:
>>
>> >On 11/10/2013 01:35 PM, Martinx - ジェームズ wrote:
>> >> Hi Jay!
>> >>
>> >> Thank you! I'll definitely take a look at those cookbooks but, I
>> >>already
>> >> tried Havana (Cloud Archive) with OVS 1.11.0, same poor results.
>> >>
>> >> Also, my previous region based on Grizzly / Quantum / GRE, worked
>> >> perfectly for months (except with MTU = 1400) and, Havana is somehow
>> >> different.
>> >
>> >Interesting. Well, we're just beginning the process of our Havana
>> >deployment testing and changes, so we'll certainly be double-checking
>> >performance based on the above feedback.
>> >
>> >Best,
>> >-jay
>> >
>> >
>> >_______________________________________________
>> >Mailing list:
>> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> >Post to : openstack@lists.openstack.org
>> >Unsubscribe :
>> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>
>>
>>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Yup :)

On 18 Nov 2013, at 22:09, Martinx - ジェームズ wrote:

> Guys,
>
> Can I fill a BUG about this issue?! If yes, where?! Neutron Launchpad
> page?
>
> Tks,
> Thiago
>
>
> On 12 November 2013 04:24, Martinx - ジェームズ
> <thiagocmartinsc@gmail.com>wrote:
>
>> At least one guy from Rackspace is aware of this problem, thanks Anne
>> and
>> James Denton! ^_^
>>
>> Hope to talk with James Page on IRC tomorrow, today was too
>> complicated
>> for me... More experts coming!
>>
>> I have a good environment for you guys to test and debug this in
>> deep, if
>> desired.
>>
>> BTW, hey Ubuntu guys! Please, release the ML2 plugin! ASAP!! I would
>> love
>> to try it! =D
>>
>> Best,
>> Thiago
>>
>>
>> On 12 November 2013 02:40, Geraint Jones <geraint@koding.com> wrote:
>>
>>> I suddenly have the identical situation occurring here - of note I
>>> am
>>> using grizzly and there have been two changes to the environment
>>> that have
>>> seemingly caused this : upgrade of OVS to 1.11 and upgrade of
>>> quantum-*
>>> from 2013.1.2 to 2013.1.3
>>>
>>> I haven’t tried the default 1.04 from 12.04 and I can’t as this
>>> is a prod
>>> system.
>>>
>>> However if the openstack update is causing it then here is the place
>>> to
>>> start I suspect : https://launchpad.net/neutron/grizzly/2013.1.3
>>>
>>> Performance of 1.04 in my env makes that unusable.
>>>
>>>
>>> --
>>> Geraint Jones
>>>
>>>
>>>
>>>
>>> On 11/11/13 2:47 am, "Jay Pipes" <jaypipes@gmail.com> wrote:
>>>
>>>> On 11/10/2013 01:35 PM, Martinx - ジェームズ wrote:
>>>>> Hi Jay!
>>>>>
>>>>> Thank you! I'll definitely take a look at those cookbooks but, I
>>>>> already
>>>>> tried Havana (Cloud Archive) with OVS 1.11.0, same poor results.
>>>>>
>>>>> Also, my previous region based on Grizzly / Quantum / GRE, worked
>>>>> perfectly for months (except with MTU = 1400) and, Havana is
>>>>> somehow
>>>>> different.
>>>>
>>>> Interesting. Well, we're just beginning the process of our Havana
>>>> deployment testing and changes, so we'll certainly be
>>>> double-checking
>>>> performance based on the above feedback.
>>>>
>>>> Best,
>>>> -jay
>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list:
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>> Post to : openstack@lists.openstack.org
>>>> Unsubscribe :
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>
>>>
>>>
>>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Okay!

BUG filed: https://bugs.launchpad.net/neutron/+bug/1252900

Regards,
Thiago


On 19 November 2013 16:00, Razique Mahroua <razique.mahroua@gmail.com>wrote:

> Yup :)
>
>
> On 18 Nov 2013, at 22:09, Martinx - ジェームズ wrote:
>
> Guys,
>>
>> Can I fill a BUG about this issue?! If yes, where?! Neutron Launchpad
>> page?
>>
>> Tks,
>> Thiago
>>
>>
>> On 12 November 2013 04:24, Martinx - ジェームズ <thiagocmartinsc@gmail.com>
>> wrote:
>>
>> At least one guy from Rackspace is aware of this problem, thanks Anne and
>>> James Denton! ^_^
>>>
>>> Hope to talk with James Page on IRC tomorrow, today was too complicated
>>> for me... More experts coming!
>>>
>>> I have a good environment for you guys to test and debug this in deep, if
>>> desired.
>>>
>>> BTW, hey Ubuntu guys! Please, release the ML2 plugin! ASAP!! I would
>>> love
>>> to try it! =D
>>>
>>> Best,
>>> Thiago
>>>
>>>
>>> On 12 November 2013 02:40, Geraint Jones <geraint@koding.com> wrote:
>>>
>>> I suddenly have the identical situation occurring here - of note I am
>>>> using grizzly and there have been two changes to the environment that
>>>> have
>>>> seemingly caused this : upgrade of OVS to 1.11 and upgrade of quantum-*
>>>> from 2013.1.2 to 2013.1.3
>>>>
>>>> I haven't tried the default 1.04 from 12.04 and I can't as this is a
>>>> prod
>>>> system.
>>>>
>>>> However if the openstack update is causing it then here is the place to
>>>> start I suspect : https://launchpad.net/neutron/grizzly/2013.1.3
>>>>
>>>> Performance of 1.04 in my env makes that unusable.
>>>>
>>>>
>>>> --
>>>> Geraint Jones
>>>>
>>>>
>>>>
>>>>
>>>> On 11/11/13 2:47 am, "Jay Pipes" <jaypipes@gmail.com> wrote:
>>>>
>>>> On 11/10/2013 01:35 PM, Martinx - ジェームズ wrote:
>>>>>
>>>>>> Hi Jay!
>>>>>>
>>>>>> Thank you! I'll definitely take a look at those cookbooks but, I
>>>>>> already
>>>>>> tried Havana (Cloud Archive) with OVS 1.11.0, same poor results.
>>>>>>
>>>>>> Also, my previous region based on Grizzly / Quantum / GRE, worked
>>>>>> perfectly for months (except with MTU = 1400) and, Havana is somehow
>>>>>> different.
>>>>>>
>>>>>
>>>>> Interesting. Well, we're just beginning the process of our Havana
>>>>> deployment testing and changes, so we'll certainly be double-checking
>>>>> performance based on the above feedback.
>>>>>
>>>>> Best,
>>>>> -jay
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list:
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>> Post to : openstack@lists.openstack.org
>>>>> Unsubscribe :
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>
>>>>
>>>>
>>>>
>>>>
>>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/
>> openstack
>> Post to : openstack@lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/
>> openstack
>>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi Thiago,

I updated your bug report with my own tests and I don't experience your
performance issues.

George



On Tue, Nov 19, 2013 at 6:53 PM, Martinx - ジェームズ
<thiagocmartinsc@gmail.com>wrote:

> Okay!
>
> BUG filled: https://bugs.launchpad.net/neutron/+bug/1252900
>
> Regards,
> Thiago
>
>
> On 19 November 2013 16:00, Razique Mahroua <razique.mahroua@gmail.com>wrote:
>
>> Yup :)
>>
>>
>> On 18 Nov 2013, at 22:09, Martinx - ジェームズ wrote:
>>
>> Guys,
>>>
>>> Can I fill a BUG about this issue?! If yes, where?! Neutron Launchpad
>>> page?
>>>
>>> Tks,
>>> Thiago
>>>
>>>
>>> On 12 November 2013 04:24, Martinx - ジェームズ <thiagocmartinsc@gmail.com>
>>> wrote:
>>>
>>> At least one guy from Rackspace is aware of this problem, thanks Anne
>>>> and
>>>> James Denton! ^_^
>>>>
>>>> Hope to talk with James Page on IRC tomorrow, today was too complicated
>>>> for me... More experts coming!
>>>>
>>>> I have a good environment for you guys to test and debug this in deep,
>>>> if
>>>> desired.
>>>>
>>>> BTW, hey Ubuntu guys! Please, release the ML2 plugin! ASAP!! I would
>>>> love
>>>> to try it! =D
>>>>
>>>> Best,
>>>> Thiago
>>>>
>>>>
>>>> On 12 November 2013 02:40, Geraint Jones <geraint@koding.com> wrote:
>>>>
>>>> I suddenly have the identical situation occurring here - of note I am
>>>>> using grizzly and there have been two changes to the environment that
>>>>> have
>>>>> seemingly caused this : upgrade of OVS to 1.11 and upgrade of quantum-*
>>>>> from 2013.1.2 to 2013.1.3
>>>>>
>>>>> I haven't tried the default 1.04 from 12.04 and I can't as this is a
>>>>> prod
>>>>> system.
>>>>>
>>>>> However if the openstack update is causing it then here is the place to
>>>>> start I suspect : https://launchpad.net/neutron/grizzly/2013.1.3
>>>>>
>>>>> Performance of 1.04 in my env makes that unusable.
>>>>>
>>>>>
>>>>> --
>>>>> Geraint Jones
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 11/11/13 2:47 am, "Jay Pipes" <jaypipes@gmail.com> wrote:
>>>>>
>>>>> On 11/10/2013 01:35 PM, Martinx - ジェームズ wrote:
>>>>>>
>>>>>>> Hi Jay!
>>>>>>>
>>>>>>> Thank you! I'll definitely take a look at those cookbooks but, I
>>>>>>> already
>>>>>>> tried Havana (Cloud Archive) with OVS 1.11.0, same poor results.
>>>>>>>
>>>>>>> Also, my previous region based on Grizzly / Quantum / GRE, worked
>>>>>>> perfectly for months (except with MTU = 1400) and, Havana is somehow
>>>>>>> different.
>>>>>>>
>>>>>>
>>>>>> Interesting. Well, we're just beginning the process of our Havana
>>>>>> deployment testing and changes, so we'll certainly be double-checking
>>>>>> performance based on the above feedback.
>>>>>>
>>>>>> Best,
>>>>>> -jay
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Mailing list:
>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>> Post to : openstack@lists.openstack.org
>>>>>> Unsubscribe :
>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/
>>> openstack
>>> Post to : openstack@lists.openstack.org
>>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/
>>> openstack
>>>
>>
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack@lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
>
Re: Directional network performance issues with Neutron + OpenvSwitch [ In reply to ]
Hi George,

I'll double * triple check everything and I'll run more performance tests
using different tools (iperf already used).
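For example, netperf can test both directions from the same side (a sketch;
netserver must be running on the far end, and the floating IP below is only
an example):

netperf -H 10.98.191.103 -t TCP_STREAM   # push towards the instance
netperf -H 10.98.191.103 -t TCP_MAERTS   # pull from the instance
# iperf's trade-off mode tests both directions in one run
iperf -c 10.98.191.103 -r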

Thanks!
Thiago


On 20 November 2013 15:16, Gmi M <gmi68745@gmail.com> wrote:

> Hi Thiago,
>
> I updated your bug report with my own tests and I don't experience your
> performance issues.
>
> George
>
>
>
> On Tue, Nov 19, 2013 at 6:53 PM, Martinx - ジェームズ <
> thiagocmartinsc@gmail.com> wrote:
>
>> Okay!
>>
>> BUG filled: https://bugs.launchpad.net/neutron/+bug/1252900
>>
>> Regards,
>> Thiago
>>
>>
>> On 19 November 2013 16:00, Razique Mahroua <razique.mahroua@gmail.com>wrote:
>>
>>> Yup :)
>>>
>>>
>>> On 18 Nov 2013, at 22:09, Martinx - ジェームズ wrote:
>>>
>>> Guys,
>>>>
>>>> Can I fill a BUG about this issue?! If yes, where?! Neutron Launchpad
>>>> page?
>>>>
>>>> Tks,
>>>> Thiago
>>>>
>>>>
>>>> On 12 November 2013 04:24, Martinx - ジェームズ <thiagocmartinsc@gmail.com>
>>>> wrote:
>>>>
>>>> At least one guy from Rackspace is aware of this problem, thanks Anne
>>>>> and
>>>>> James Denton! ^_^
>>>>>
>>>>> Hope to talk with James Page on IRC tomorrow, today was too complicated
>>>>> for me... More experts coming!
>>>>>
>>>>> I have a good environment for you guys to test and debug this in deep,
>>>>> if
>>>>> desired.
>>>>>
>>>>> BTW, hey Ubuntu guys! Please, release the ML2 plugin! ASAP!! I would
>>>>> love
>>>>> to try it! =D
>>>>>
>>>>> Best,
>>>>> Thiago
>>>>>
>>>>>
>>>>> On 12 November 2013 02:40, Geraint Jones <geraint@koding.com> wrote:
>>>>>
>>>>> I suddenly have the identical situation occurring here - of note I am
>>>>>> using grizzly and there have been two changes to the environment that
>>>>>> have
>>>>>> seemingly caused this : upgrade of OVS to 1.11 and upgrade of
>>>>>> quantum-*
>>>>>> from 2013.1.2 to 2013.1.3
>>>>>>
>>>>>> I haven't tried the default 1.04 from 12.04 and I can't as this is a
>>>>>> prod
>>>>>> system.
>>>>>>
>>>>>> However if the openstack update is causing it then here is the place
>>>>>> to
>>>>>> start I suspect : https://launchpad.net/neutron/grizzly/2013.1.3
>>>>>>
>>>>>> Performance of 1.04 in my env makes that unusable.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Geraint Jones
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/11/13 2:47 am, "Jay Pipes" <jaypipes@gmail.com> wrote:
>>>>>>
>>>>>> On 11/10/2013 01:35 PM, Martinx - ジェームズ wrote:
>>>>>>>
>>>>>>>> Hi Jay!
>>>>>>>>
>>>>>>>> Thank you! I'll definitely take a look at those cookbooks but, I
>>>>>>>> already
>>>>>>>> tried Havana (Cloud Archive) with OVS 1.11.0, same poor results.
>>>>>>>>
>>>>>>>> Also, my previous region based on Grizzly / Quantum / GRE, worked
>>>>>>>> perfectly for months (except with MTU = 1400) and, Havana is somehow
>>>>>>>> different.
>>>>>>>>
>>>>>>>
>>>>>>> Interesting. Well, we're just beginning the process of our Havana
>>>>>>> deployment testing and changes, so we'll certainly be double-checking
>>>>>>> performance based on the above feedback.
>>>>>>>
>>>>>>> Best,
>>>>>>> -jay
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list:
>>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>> Post to : openstack@lists.openstack.org
>>>>>>> Unsubscribe :
>>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/
>>>> openstack
>>>> Post to : openstack@lists.openstack.org
>>>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/
>>>> openstack
>>>>
>>>
>>
>> _______________________________________________
>> Mailing list:
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to : openstack@lists.openstack.org
>> Unsubscribe :
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>
>>
>