Mailing List Archive

strangeness with high BDP tests
Hi everyone,

First, let me establish the baseline here:
All default settings, no modifications to any sysctls. Same amount of
RAM for dom0 and the VM. (Note that TCP BiC is on by default.) Tests
run across a low-latency cluster, with everything on the same gigabit
switch. I'm using Xen 2.0.3, and netperf for all my tests.

Between dom0 and dom0 on two machines in the cluster, I can consistently
get ~930Mbps. Between VM and VM on the same two machines, I get
between 730 and 850 Mbps, but with a lot more variation.

So far so good.

Now, I modify the TCP buffer sizes (both on dom0 and VM) thus:
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 24576 32768 49152
net.core.rmem_default = 112640
net.core.wmem_default = 112640
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.ipv4.tcp_bic_low_window = 14
net.ipv4.tcp_bic_fast_convergence = 1
net.ipv4.tcp_bic = 1
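
(For anyone who wants to reproduce this: I apply the same settings on
both dom0 and the VM with sysctl; the file name below is just an
example for a file holding the lines above.)

# load all of the settings in one go
sysctl -p ./bdp-tuning.conf
# or set individual values by hand, e.g.
sysctl -w net.core.rmem_max=8388608
sysctl -w net.ipv4.tcp_rmem="4096 87380 8388608"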

Now, between dom0 and dom0 on the two machines, I can consistently get
880Mbps. And between VM and VM, I can get around 850Mbps. So far so
good.

But now comes the really interesting part. So far, these machines have
been talking over the switch directly. Now I direct all traffic through
a dummynet router (on the same switch). The pipe connecting the two is
set to 500Mbps with an RTT of 80ms.
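
(For reference, the dummynet setup is along these lines; this is only a
sketch, assuming the 80ms RTT is split as 40ms of one-way delay in each
direction, so the actual rules on the router may differ.)

# on the dummynet router (FreeBSD/ipfw); delay values are in milliseconds
ipfw pipe 1 config bw 500Mbit/s delay 40
ipfw pipe 2 config bw 500Mbit/s delay 40
ipfw add 100 pipe 1 ip from any to any in
ipfw add 200 pipe 2 ip from any to any out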

Here are the results for dom0 to dom0 tests:

== Single flow, 10 seconds ==
[dgupta@sysnet03]$ netperf -H sysnet08
TCP STREAM TEST to sysnet08
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  65536  65536    10.11     158.55

== Single flow, 80 seconds ==
[dgupta@sysnet03]$ netperf -H sysnet08 -l 80
TCP STREAM TEST to sysnet08
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  65536  65536    80.72     344.20

== 50 flows, 80 seconds ==
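(The 50 flows were launched the same way as the VM test further down;
roughly:)
[dgupta@sysnet03]$ for ((i=0;i<50;i++)); do netperf -P 0 -H sysnet08 -l 80 & done | tee dom0-dom0-50.dat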

87380 65536 65536 80.14 4.93
87380 65536 65536 80.18 9.37
87380 65536 65536 80.21 10.13
87380 65536 65536 80.22 9.11
87380 65536 65536 80.19 9.45
87380 65536 65536 80.22 5.06
87380 65536 65536 80.15 9.38
87380 65536 65536 80.20 9.98
87380 65536 65536 80.23 3.70
87380 65536 65536 80.20 9.14
87380 65536 65536 80.18 8.85
87380 65536 65536 80.16 8.96
87380 65536 65536 80.21 9.91
87380 65536 65536 80.18 9.46
87380 65536 65536 80.17 9.38
87380 65536 65536 80.18 9.82
87380 65536 65536 80.15 7.22
87380 65536 65536 80.16 8.64
87380 65536 65536 80.26 10.60
87380 65536 65536 80.22 9.33
87380 65536 65536 80.24 8.88
87380 65536 65536 80.22 9.54
87380 65536 65536 80.19 9.65
87380 65536 65536 80.20 9.70
87380 65536 65536 80.24 9.43
87380 65536 65536 80.19 8.10
87380 65536 65536 80.21 9.31
87380 65536 65536 80.18 9.08
87380 65536 65536 80.19 9.24
87380 65536 65536 80.27 9.91
87380 65536 65536 80.28 9.67
87380 65536 65536 80.24 9.50
87380 65536 65536 80.28 9.70
87380 65536 65536 80.24 10.09
87380 65536 65536 80.31 4.55
87380 65536 65536 80.28 5.93
87380 65536 65536 80.25 9.55
87380 65536 65536 80.32 5.60
87380 65536 65536 80.35 6.29
87380 65536 65536 80.27 4.75
87380 65536 65536 80.40 6.51
87380 65536 65536 80.39 6.38
87380 65536 65536 80.40 10.12
87380 65536 65536 80.53 4.62
87380 65536 65536 80.67 16.53
87380 65536 65536 81.10 4.53
87380 65536 65536 82.21 1.93
87380 65536 65536 80.09 9.43
87380 65536 65536 80.10 9.14
87380 65536 65536 80.13 9.88
[~]
[dgupta@sysnet03]$ awk '{sum+=$5} END {print sum,NR,sum/NR}' dom0-dom0-50.dat
419.96 50 8.3992

This is the aggregate and the average per flow. Now I run the same test from VM to VM:

== Single flow, 10 seconds ==
root@tg3:~# netperf -H 172.19.222.101
TCP STREAM TEST to 172.19.222.101
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  65536  65536    10.15      22.30

== Single flow, 80 seconds ==
root@tg3:~# netperf -H 172.19.222.101 -l 80
TCP STREAM TEST to 172.19.222.101
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  65536  65536    80.17      76.96

== 50 flows, 80 seconds ==
root@tg3:~# for ((i=0;i<50;i++)); do netperf -P 0 -H 172.19.222.101 -l 80 & done | tee vm-vm-50.dat
87380 65536 65536 80.09 8.50
87380 65536 65536 80.08 6.46
87380 65536 65536 80.19 7.33
87380 65536 65536 80.20 7.29
87380 65536 65536 80.20 5.86
87380 65536 65536 80.23 8.40
87380 65536 65536 80.22 8.55
87380 65536 65536 80.22 7.34
87380 65536 65536 80.29 6.28
87380 65536 65536 80.28 7.23
87380 65536 65536 80.23 8.56
87380 65536 65536 80.25 6.60
87380 65536 65536 80.31 6.99
87380 65536 65536 80.27 8.22
87380 65536 65536 80.30 7.41
87380 65536 65536 80.33 8.21
87380 65536 65536 80.27 7.94
87380 65536 65536 80.32 6.54
87380 65536 65536 80.29 8.58
87380 65536 65536 80.35 7.37
87380 65536 65536 80.35 7.09
87380 65536 65536 80.37 7.23
87380 65536 65536 80.38 8.31
87380 65536 65536 80.38 8.18
87380 65536 65536 80.44 9.11
87380 65536 65536 80.43 4.95
87380 65536 65536 80.43 6.48
87380 65536 65536 80.42 8.11
87380 65536 65536 80.44 6.74
87380 65536 65536 80.47 8.76
87380 65536 65536 80.42 7.68
87380 65536 65536 80.45 6.10
87380 65536 65536 80.46 7.47
87380 65536 65536 80.51 7.37
87380 65536 65536 80.52 6.78
87380 65536 65536 80.48 7.31
87380 65536 65536 80.56 7.55
87380 65536 65536 80.57 6.85
87380 65536 65536 80.59 7.53
87380 65536 65536 80.63 7.01
87380 65536 65536 80.64 6.78
87380 65536 65536 80.60 5.76
87380 65536 65536 80.79 6.63
87380 65536 65536 80.79 6.29
87380 65536 65536 80.81 7.54
87380 65536 65536 80.81 7.22
87380 65536 65536 80.94 6.54
87380 65536 65536 80.90 8.02
87380 65536 65536 81.15 4.22

root@tg3:~# awk '{sum+=$5} END {print sum,NR,sum/NR}' vm-vm-50.dat
361.74 50 7.2348

Note the terrible performance with single flows. With 50 flows,
the aggregate improves, but it is still much worse than the dom0-to-dom0
results.

Any ideas why I'm getting such bad performance from the VMs on high
BDP links? I'm willing and interested to help in debugging and fixing
this issue, but I need some leads :)

TIA
--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
RE: strangeness with high BDP tests
> Any ideas why I'm getting such bad performance from the VMs
> on high BDP links? I'm willing and interested to help in
> debugging and fixing this issue, but I need some leads :)

The first thing to do is to look at the CPU usage in dom0 and domU. If
you can run them on different CPUs, or even different hyperthreads, it
might make the experiment simpler to understand. The main thing to find
out is whether you're maxed out on CPU, or whether this is an I/O
blocking issue. Running xm list should show you how much CPU each
domain is burning.
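
(Even something as crude as the following is enough to watch how each
domain's cumulative CPU time grows while netperf runs:)

# refresh the xm list output once a second during a test run
while true; do clear; xm list; sleep 1; done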

Secondly, enable performance counters in a Xen build, then use the user
space tools to read out the context switch rate. How does it compare
without the emulated BDP link?

Also, you might want to play around with the rate limiting function in
netback. If you set it to a few hundred Mb/s you might help promote
batching.

I'm also concerned that dummynet is pretty terrible when operating at
such high speeds, and the whole thing might just be a bad interaction
between Xen's batching and dummynet's. Why not set up a real experiment
across Abilene just to check?

Ian

Re: strangeness with high BDP tests
> > Any ideas why I'm getting such bad performance from the VMs
> > on high BDP links? I'm willing and interested to help in
> > debugging and fixing this issue, but I need some leads :)
>
> The first thing to do is to look at the CPU usage in dom0 and domU. If
> you can run them on different CPUs or even different hyperthreads it
> might make the experiment simpler to understand. The first thing to find
> out is whether you're maxed out on CPU, or whether this is an IO
> blocking issue. Xm list should show you how much CPU each domain is
> burning.

I had caught glimpses on the list of a top-like utility for viewing
CPU usage... is that a reality yet? I haven't followed up on that
thread. The problem is that xm list is fine for very coarse-grained
measurements, but it's a pain to do real-time, fine-granularity
measurements with it. Sure, I could always write my own little
Python script using the xm interface, but it would be great if we had
something like top.

> Also, you might want to play around with the rate limiting function in
> netback. If you set it to a few hundred Mb/s you might help promote
> batching.

Sorry if this is dumb, but what's the rate limiting function in
netback? Is it a run-time parameter or something in the code? What
does it do? If I set it too high, won't it lead to bad performance
with low-bandwidth flows? I guess I should just look at the code :)


> I'm also concerned that dummynet is pretty terrible when operating at
> such high speeds, and the whole thing might just be a bad interaction
> between Xen's batching and dummynet's. Why not set up a real experiment
> across Abilene just to check?

I think that's a separate debate. For now, I just want to get the same
performance levels from a VM as from dom0, in all possible
environments, dummynet being just one of them. Setting up a real
experiment is a good idea though; I'm looking into it. BTW, where can
I learn more about Xen's "batching"?

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker

RE: strangeness with high BDP tests
> > Also, you might want to play around with the rate limiting
> function in
> > netback. If you set it to a few hundred Mb/s you might help promote
> > batching.
>
> Sorry if this is dumb, but whats the rate limiting function
> in netback? Is it a run-time parameter or something in the
> code? What does it do? If I set it too high, won't it lead to
> bad performance with low b/w flows? I guess I should just
> look at the code :)

See: xm vif-limit


Ian

Re: strangeness with high BDP tests
Diwaker Gupta wrote:
>>>Any ideas why I'm getting such bad performance from the VMs
>>>on high BDP links? I'm willing and interested to help in
>>>debugging and fixing this issue, but I need some leads :)
>>
>>The first thing to do is to look at the CPU usage in dom0 and domU. If
>>you can run them on different CPUs or even different hyperthreads it
>>might make the experiment simpler to understand. The first thing to find
>>out is whether you're maxed out on CPU, or whether this is an IO
>>blocking issue. Xm list should show you how much CPU each domain is
>>burning.
>
>
> I had caught glimpses on the list of a top like utility for viewing
> CPU usage.. is that a reality yet? I haven't followed up on that
> thread. The problem is that xm list is fine for very coarse grained
> measurements, but its a pain to do real-time fine granularity
> measurements with that. Sure, I could always write my own little
> Python script using the xm interface, but it'll be great if we had
> something like top.
>
>
>>Also, you might want to play around with the rate limiting function in
>>netback. If you set it to a few hundred Mb/s you might help promote
>>batching.
>
>
> Sorry if this is dumb, but whats the rate limiting function in
> netback? Is it a run-time parameter or something in the code? What
> does it do? If I set it too high, won't it lead to bad performance
> with low b/w flows? I guess I should just look at the code :)

Hi Diwaker! Sorry I'm coming to this thread late; I was out
sick the last couple of days. I just started looking into the
net flow control problem. Ian is speculating that the rate
limiting function will actually help data get pushed out
faster. We're looking into where exactly our latencies are.
If you could run some debug patches for me, I'd really appreciate
it.

Btw, have you tried using the -i and -I options to netperf?
-i 30,10 will at least ensure a minimum of 10 runs for
each measurement, and -I can be used to specify a confidence
interval (99,5). Even if it's consistent, I wouldn't trust the
10-second run time for the test.
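
(i.e., something along the lines of the following, reusing the target
and duration from your runs:)

netperf -H 172.19.222.101 -l 80 -i 30,10 -I 99,5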

Netperf uses setsockopt() to set its own buffer sizes, so
increasing the system sysctl values will not affect your test
in any way (or shouldn't ;)).


>>I'm also concerned that dummynet is pretty terrible when operating at
>>such high speeds, and the whole thing might just be a bad interaction
>>between Xen's batching and dummynet's. Why not set up a real experiment
>>across Abilene just to check?
>
>
> I think thats a separate debate. For now, I just want to get the same
> performance levels from a VM as from dom0, for all possible
> environments, dummynet just being one of them. Setting up a real
> experiment is a good idea though, I'm looking into it. BTW, where can
> I learn more on Xen's "batching"?

The question is how frequently the frontend should kick the
backend, and how frequently the backend should pass packets
along to the real device. Aggregating requests improves the
efficiency of the transfers but impacts latency.

thanks,
Nivedita




Re: strangeness with high BDP tests
On Apr 4, 2005 12:39 AM, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
>
> > > Also, you might want to play around with the rate limiting
> > function in
> > > netback. If you set it to a few hundred Mb/s you might help promote
> > > batching.
> >
> > Sorry if this is dumb, but whats the rate limiting function
> > in netback? Is it a run-time parameter or something in the
> > code? What does it do? If I set it too high, won't it lead to
> > bad performance with low b/w flows? I guess I should just
> > look at the code :)
>
> See: xm vif-limit
>

Maybe I missed something. My xm only has vif-list, no vif-limit. I
also grepped for anything resembling vif-limit inside the tools
directory, but found nothing useful. Is this a new feature? I'm
using Xen 2.0.3.

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker

Re: strangeness with high BDP tests
> Hi Diwaker! Sorry I'm coming to this thread late, I was out
> sick the last couple of days. I just started looking into the
> net flow control problem. Ian is speculating that the rate
> limiting function will actually help improve data get pushed
> faster. We're looking into where exactly our latencies are.
> If you could run some debug patches for me, I'd really appreciate
> it..

I'd be happy to. Just send them over, and let me know if you find
anything interesting.

> Btw, have you tried using the -i and -I options to netperf?
> -i 30, 10, will at least ensure a minimum of 10 runs for
> each measurement, and -I can be used to specify a confidence
> interval (99, 5). Even if it's consistent, I wouldn't trust the 10
> second run time for the test.

I don't trust the 10 second tests either, especially for such a high
RTT. That's why I ran the tests for 80 seconds (that's 1000 RTTs, which
should give TCP enough time to stabilize). I'll get some numbers using
these options in any case.


> Netperf uses setsockopt() to set its own buffer sizes, so
> increasing the system sysctl values will not affect your test
> in anyway (or shouldn't ;)).

Yeah, but in my experience it usually picks up the "default" value as
set by the sysctl. I'll check the code.

>
>
> >>I'm also concerned that dummynet is pretty terrible when operating at
> >>such high speeds, and the whole thing might just be a bad interaction
> >>between Xen's batching and dummynet's. Why not set up a real experiment
> >>across Abilene just to check?
> >
> >
> > I think thats a separate debate. For now, I just want to get the same
> > performance levels from a VM as from dom0, for all possible
> > environments, dummynet just being one of them. Setting up a real
> > experiment is a good idea though, I'm looking into it. BTW, where can
> > I learn more on Xen's "batching"?
>
> The question is how frequently should the frontend kick the
> backend, and how frequently should the backend pass along packets
> to the real device. Aggregating requests improves the efficiency
> of the transfers but impacts latency.

I agree. But I think it's a reasonable goal to expect VM performance
to match dom0 performance across a variety of environments :)

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker

Re: strangeness with high BDP tests
> I don't trust the 10 second tests either, especially for such a high
> RTT. Thats why I ran the tests for 80 seconds (thats 1000 RTTs, and
> should give TCP enough time to stabilize). I'll get some numbers using
> these options in any case.

Cool :). Thanks for offering to test, too.

> Yeah, but in my experience it usually picks up the "default" value as
> set by the sysctl. I'll check the code.

In your netperf output, it's listing the socket size as
the default system 64K. If you invoke netperf with -s 131762 -S 131762,
it should at least use 128K (local and remote). Bumping that
up by 3 times usually gives a good gain on netperf stream-type
tests, but ymmv.
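
(If your netperf treats -s/-S as test-specific options, they go after
the "--" separator, so roughly:)

netperf -H 172.19.222.101 -l 80 -- -s 131762 -S 131762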

> I agree. But I think it's a reasonable goal to expect VM performance
> to match dom0 performance across a variety of environments :)

Yep ;)

thanks,
Nivedita



RE: strangeness with high BDP tests
> Maybe I missed something. My xm only has vif-list, no
> vif-limit. I also grepped for anything resembling vif-limit
> inside the tools directory, but with no useful results. Is
> this a new feature? I'm using Xen 2.0.3

Please upgrade.

Best,
Ian


Re: strangeness with high BDP tests
> > I don't trust the 10 second tests either, especially for such a high
> > RTT. Thats why I ran the tests for 80 seconds (thats 1000 RTTs, and
> > should give TCP enough time to stabilize). I'll get some numbers using
> > these options in any case.
>
> Cool :). Thanks for offering to test, too.

No problemo :) So I tried the -i and -I options... not too much of a
difference. There's a slight improvement in the numbers, but the gap
is still stark.

> > Yeah, but in my experience it usually picks up the "default" value as
> > set by the sysctl. I'll check the code.
>
> In your netperf output, it's listing the socket size as
> the default system 64K. If you invoke netperf with -s 131762 -S 131762
> it should at least use 128K (local and remote). Bumping that
> up by 3 times usually gives good gain on netperf stream type
> tests, but ymmv.

I looked at the netperf source. If the -s/-S values are not specified,
it seems it sticks to the default values. Also, setsockopt only
changes the maximum buffer size; the default is still governed by the
sysctl values. Further, AFAIK, even the max value (in Linux) is just a
hint to the TCP stack -- the actual size of the buffer is determined
by the TCP auto-buffer tuning code. In any case, since both dom0 and
the VM are using the same buffer sizes, I'm not too concerned about
setting the "right" buffer sizes. Right now, I want to figure out the
discrepancy in performance.
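
(FWIW, this is the quick check I run on both dom0 and the VM to confirm
they are seeing identical values:)

sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.core.rmem_max net.core.wmem_max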

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker

Re: strangeness with high BDP tests
On Apr 4, 2005 11:48 AM, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > Maybe I missed something. My xm only has vif-list, no
> > vif-limit. I also grepped for anything resembling vif-limit
> > inside the tools directory, but with no useful results. Is
> > this a new feature? I'm using Xen 2.0.3
>
> Please upgrade.

I did:
Xen version 2.0.5 (root@localdomain) (gcc version 3.3.5 (Debian
1:3.3.5-2)) Mon Apr 4 13:53:08 PDT 2005

But I still don't see any xm vif-limit. Do I need to upgrade to unstable?

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker

Re: strangeness with high BDP tests
Diwaker Gupta wrote:

> On Apr 4, 2005 11:48 AM, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
>
>>>Maybe I missed something. My xm only has vif-list, no
>>>vif-limit. I also grepped for anything resembling vif-limit
>>>inside the tools directory, but with no useful results. Is
>>>this a new feature? I'm using Xen 2.0.3
>>
>>Please upgrade.
>
>
> I did:
> Xen version 2.0.5 (root@localdomain) (gcc version 3.3.5 (Debian
> 1:3.3.5-2)) Mon Apr 4 13:53:08 PDT 2005
>
> But I still don't see any xm vif-limit. Do I need to upgrade to unstable?

Oops, sorry, I assumed you were on unstable. It would be
very useful if you could try xen-unstable. Quite frankly,
it has been relatively stable for me.

thanks,
Nivedita


