Mailing List Archive

policing TCP traffic
While testing TCP performance through a policer in the lab, we noticed
some strange results. It appears that with 15M and 20M policers, we
get significantly less throughput (as a percentage of the policer
setting) than with other settings.

I'm not sure this can be explained by the TCP window closing, as it
seems to happen only at 15M and 20M.

We are using the netperf tool to generate traffic:
netperf -H 192.168.8.11 -l 200 -- -i 192.168.11.11

The policer is configured as follows, under 5.4R1 on an M10:

fe-0/0/3 {
    unit 0 {
        family inet {
            filter {
                input test-filter;
            }
            address 192.168.11.2/24;
        }
    }
}

family inet {
    filter test-filter {
        policer p1 {
            if-exceeding {
                bandwidth-limit 15m;
                burst-size-limit 100m;
            }
            then discard;
        }

Here is a summary of our results at different settings. Yes, I know
that the burst-size is set high, but that doesn't seem to make a
difference in this test.

==> summary_5m_tcp_output.txt <==
Avg. bw per round = 4.8100 Avg. delay per round = 10.0780 ms
==> summary_10m_tcp_output.txt <==
Avg. bw per round = 9.6600 Avg. delay per round = 5.3340 ms
==> summary_15m_tcp_output.txt <==
Avg. bw per round = 11.2200 Avg. delay per round = 3.5190 ms
^^^^^^^
==> summary_20m_tcp_output.txt <==
Avg. bw per round = 11.5500 Avg. delay per round = 3.4150 ms
^^^^^^^
==> summary_25m_tcp_output.txt <==
Avg. bw per round = 24.0700 Avg. delay per round = 6.5240 ms
==> summary_30m_tcp_output.txt <==
Avg. bw per round = 28.9100 Avg. delay per round = 9.3030 ms
==> summary_35m_tcp_output.txt <==
Avg. bw per round = 33.7400 Avg. delay per round = 7.8210 ms
==> summary_40m_tcp_output.txt <==
Avg. bw per round = 38.5700 Avg. delay per round = 8.9690 ms
==> summary_45m_tcp_output.txt <==
Avg. bw per round = 43.3100 Avg. delay per round = 8.5940 ms
==> summary_50m_tcp_output.txt <==
Avg. bw per round = 47.7100 Avg. delay per round = 4.5710 ms
==> summary_55m_tcp_output.txt <==
Avg. bw per round = 52.1700 Avg. delay per round = 8.3280 ms
==> summary_60m_tcp_output.txt <==
Avg. bw per round = 57.2800 Avg. delay per round = 6.2030 ms
==> summary_65m_tcp_output.txt <==
Avg. bw per round = 62.0700 Avg. delay per round = 4.9650 ms
==> summary_70m_tcp_output.txt <==
Avg. bw per round = 67.4900 Avg. delay per round = 4.6020 ms
==> summary_75m_tcp_output.txt <==
Avg. bw per round = 72.3200 Avg. delay per round = 5.4870 ms
==> summary_80m_tcp_output.txt <==
Avg. bw per round = 77.2300 Avg. delay per round = 3.7220 ms
==> summary_85m_tcp_output.txt <==
Avg. bw per round = 82.0500 Avg. delay per round = 6.1610 ms
==> summary_90m_tcp_output.txt <==
Avg. bw per round = 86.8900 Avg. delay per round = 15.3260 ms
==> summary_95m_tcp_output.txt <==
Avg. bw per round = 91.7300 Avg. delay per round = 14.5570 ms
==> summary_100m_tcp_output.txt <==
Avg. bw per round = 94.0500 Avg. delay per round = 14.7480 ms
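
For a quick sanity check, the throughput as a percentage of each policer
setting can be computed from the summaries above. This is just a rough
back-of-the-envelope script (not part of the netperf run itself), with the
numbers copied from the output files:

    # Measured bandwidth as a fraction of the configured policer rate,
    # using the per-rate averages from the summary files above (Mbit/s).
    results = {
        5: 4.81, 10: 9.66, 15: 11.22, 20: 11.55, 25: 24.07,
        30: 28.91, 35: 33.74, 40: 38.57, 45: 43.31, 50: 47.71,
        55: 52.17, 60: 57.28, 65: 62.07, 70: 67.49, 75: 72.32,
        80: 77.23, 85: 82.05, 90: 86.89, 95: 91.73, 100: 94.05,
    }
    for limit, bw in sorted(results.items()):
        print(f"{limit:3d}M policer: {bw:6.2f} Mb/s ({100 * bw / limit:5.1f}% of limit)")
    # Every setting lands in the 94-97% range except 15M (~75%) and 20M (~58%).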

Any pointers would be appreciated.

Thanks.
- Jason Parsons
policing TCP traffic [ In reply to ]
Hi Jason,

The burst-limit defines the maximum amount of time (as well as amount of
traffic) that you will allow any single packet to sit in the burst
buffer.

For example: to keep any single packet from sitting in the burst buffer
for longer than 5 ms, you would use the following generic formula.

---------> Burst size = bandwidth * 5 ms

That is, make your burst size just large enough to allow 5 ms of burst.

Note that while the formula gives a burst size in bits, the
burst-size-limit in the configuration is expressed in bytes.

Another example: let's say I had an OC48, and I wanted to rate-limit a
specific subset of traffic on that interface to 622Mbps. To keep packets
running at that OC12 link rate from waiting more than 1 ms in the burst
buffer, I would use the following:

burst size limit = 1 ms * 622 Mbit/sec = 622 kbits = ~78 kbytes.

So, for a 20Mb rate, I would set a burst size of about 100 kbits (12,500
bytes), and that would ensure that packets are not queued for more than
5 ms. A max delay of ~5 ms is usually adequate for most traffic.
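
To make the arithmetic concrete, here is the same formula in a few lines
of Python (just an illustration; the only assumption is the conversion to
bytes, since the burst-size-limit is configured in bytes):

    # Burst size = bandwidth * allowed burst time, converted to bytes.
    def burst_size_bytes(bandwidth_bps, burst_time_s):
        return bandwidth_bps * burst_time_s / 8

    # OC12-rate subset with 1 ms of allowed burst:
    print(burst_size_bytes(622e6, 0.001))          # 77750.0 bytes (~78 kbytes)

    # The rates under test here, allowing 5 ms of burst:
    for rate_mbps in (15, 20):
        print(rate_mbps, burst_size_bytes(rate_mbps * 1e6, 0.005))
    # 15 Mb/s -> 9375 bytes, 20 Mb/s -> 12500 bytes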

Also, this does not apply to your case, but as a rule of thumb the burst
limit should never be smaller than the MTU of the interface.

If the burst size is too small, you could end up getting something
smaller than the policed rate. If it is too big, as is the case here, you
are extending the burst buffer to way beyond the intended limit.

Think of the burst-limit as a threshold that is monitored and used for
throttling traffic that bursts above the bandwidth-limit many times in a
second.
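
As a mental model only (a simplified generic single-rate token bucket,
not Juniper's actual PFE implementation), the behaviour looks roughly
like this:

    # Simplified single-rate token bucket: tokens refill at the
    # bandwidth-limit, the bucket depth is the burst-size-limit, and a
    # packet that arrives when there are too few tokens is discarded.
    class TokenBucketPolicer:
        def __init__(self, rate_bps, burst_bytes):
            self.rate_bytes_per_s = rate_bps / 8.0
            self.burst_bytes = burst_bytes
            self.tokens = float(burst_bytes)   # start with a full bucket
            self.last_time = 0.0

        def offer(self, packet_bytes, now):
            # Refill for the elapsed time, capped at the burst size.
            elapsed = now - self.last_time
            self.tokens = min(self.burst_bytes,
                              self.tokens + elapsed * self.rate_bytes_per_s)
            self.last_time = now
            if packet_bytes <= self.tokens:
                self.tokens -= packet_bytes
                return "forward"
            return "discard"

    # e.g. TokenBucketPolicer(15_000_000, 9_375) would model a 15 Mb/s
    # policer with roughly 5 ms worth of burst.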

Thanks,

Bob O'Hara

Systems Engineer

Juniper Networks - 'Every Bit IP'

.........................................
. Email: rohara@juniper.net .
. Cell: 603.498.8119 .
. Home Office: 603.382.3894 .
. Westford Office: 978.589.0127 .
.........................................


policing TCP traffic [ In reply to ]
Thanks for the information, Bob.

Do you have any comment about the performance of these filters at the
15M and 20M levels? I'm looking for a plausible explanation of why TCP
throughput is so much lower than the policing level at these two rates.

Thanks again.
- Jason Parsons


> ==> summary_15m_tcp_output.txt <==
> Avg. bw per round = 11.2200 Avg. delay per round = 3.5190 ms
> ^^^^^^^
> ==> summary_20m_tcp_output.txt <==
> Avg. bw per round = 11.5500 Avg. delay per round = 3.4150 ms
> ^^^^^^^
policing TCP traffic [ In reply to ]
Hi Jason,

To some extent I am theorizing, because I am not in front of your test
setup, but... as noted, I think the burst-limit is set too high. The
thing to remember is that the larger you make your burst-limit, the
longer the delay will be for traffic that bursts above the bandwidth
limit. 100Mb is really big for a bandwidth-limit of 15 or 20Mb.

As you burst above your bandwidth limit of 15 or 20 Mb, the buffer you
have defined is way too generous, and that may be impacting TCP end to
end.

This, in turn, is probably causing TCP global synchronization: the end
stations back off, throughput drops off at the sender while the window
gets renegotiated, and traffic then climbs back toward its potential.
Without enough ACKs coming back, the sender cannot take advantage of the
pipe until it renegotiates a bigger window.
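
The window effect is easy to put a number on: a TCP sender can have at
most one window of unacknowledged data in flight per round trip, so
throughput is bounded by window / RTT. The window and RTT below are
made-up illustration values, not measurements from this setup:

    # Upper bound on single-flow TCP throughput: window size / round-trip time.
    def max_tcp_throughput_mbps(window_bytes, rtt_s):
        return window_bytes * 8 / rtt_s / 1e6

    # A 32 KB window over a 10 ms round trip tops out around 26 Mb/s;
    # every time loss shrinks the window, the achievable rate drops with it.
    print(max_tcp_throughput_mbps(32 * 1024, 0.010))   # ~26.2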

It's just a guess, but it wouldn't surprise me if that is what is
happening. 100Mb is a huge burst-limit for a 15 or 20 Mb rate.
Perhaps higher rates can work with that large a burst-limit. In fact, I
am surprised it accepted that as an in-range value, but I am not in
front of a router at the moment, so I am not sure what the acceptable
range is there.

I would send a predefined amount of traffic, look at the traffic stats,
and count the packets on the inbound interface (absolute value) and on
the outbound interface (what made it through). Also, I'd add an extra
counter to your filter for visibility - read on...

When you do 'show firewall' you will see a counter named 'test-filter'.
This counter is associated with the policer and counts the packets that
were dropped/discarded. You can add a 'then count' statement with a named
counter after the 'then discard', and it will show a count of the packets
that make it through the policer.

The main thing to remember is that the policer is an algorithm with
guidelines for use. It's designed to work across all bandwidth rates, and
the burst-limit parameter is meant to be tuned appropriately for each
rate.

Let me know how you progress with this. I am interested in your
results.

Thanks,

Bob O'Hara

Systems Engineer

Juniper Networks - 'Every Bit IP'

.........................................
. Email: rohara@juniper.net .
. Cell: 603.498.8119 .
. Westford Office: 978.589.0127 .
.........................................

policing TCP traffic [ In reply to ]
I will try that, but I'm a little confused about why the performance
would suffer only at the 15 and 20M rates, but not at 5M, 10M, or 25M
(per my original email). Maybe it's some artifact of packet size?

We'll try a policer such as:

firewall {
    policer test-policer {
        if-exceeding {
            bandwidth-limit 15M;
            burst-size-limit 47000;
        }
        then {
            count policer-drop;
            discard;
        }
    }
    family inet {
        filter filter-name {
            policer test-policer {
                if-exceeding {
                    bandwidth-limit 15M;
                    burst-size-limit 47000;
                }
                then {
                    count policer-accepted;
                    accept;
                }
            }
        }
    }
}

This is on a fast ethernet, by the way.
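
On the packet-size question above, one quick check is how many full-size
frames each burst-size-limit actually admits in a single burst. This
assumes a 1500-byte MTU on the fast ethernet and is only a rough
calculation:

    # How many MTU-sized packets fit in a given burst allowance?
    MTU = 1500   # bytes, typical fast ethernet MTU
    for burst_bytes in (15_000, 47_000):
        print(f"burst {burst_bytes} bytes: ~{burst_bytes // MTU} full-size packets")
    # 15,000 bytes -> 10 packets (the 10x MTU rule of thumb)
    # 47,000 bytes -> 31 packets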

Thanks.
- Jason

policing TCP traffic [ In reply to ]
Bob,

Some more detail on my policing concerns; I hope you can offer some more
help.

To make things a little easier, I applied the policer directly to the
interface:

fe-0/0/3 {
    unit 0 {
        family inet {
            policer {
                input policer-1;
            }
            address 192.168.11.2/24;
        }
    }
}

I then tried playing around with the settings of policer-1.

policer policer-1 {
    if-exceeding {
        bandwidth-limit 15m;
        burst-size-limit 15k;
    }
    then discard;
}

Here's a summary of different bandwidth-limits with different
burst-size-limits, and the TCP throughput I was able to achieve with each
configuration:

bandwidth-limit   burst-size-limit   Actual TCP throughput
15M               15k (10xMTU)       440Kb/s
15M               3m                 7.38Mb/s
15M               7m                 7.60Mb/s
15M               15m                7.45Mb/s
                  (15m = bandwidth (15m) x allowable time for burst traffic (1ms))
15M               30m                7.44Mb/s

30M               15k                12.61Mb/s
30M               15m                27.10Mb/s
30M               30m                26.99Mb/s

As the table shows, the best performance I can get out of a 15M filter
is about 51% of the configured limit. However, I can get 90% out of a
30M filter with a similar configuration.

So, what am I missing? I know that my testing setup isn't great, but I
can't find a good explanation for the discrepancy between the 15M and
30M results.

Thanks again.
- Jason Parsons

On Friday, Aug 30, 2002, at 19:33 US/Eastern, Robert O'Hara wrote:

> To some extent, I am theorizing, because I am not in front of your
> test setup, but.... as noted, I think the burst-limit is set too high.