Mailing List Archive

Trident3 vs Jericho2
Once again, which is better: feature-rich shared-buffer switches or fat-buffer switches?
When is it better to deploy a big-buffer switch? When is it better to drop and retransmit instead of queueing?

Thanks.
Dmitry
Re: Trident3 vs Jericho2 [ In reply to ]
What I've observed is that it's better to have a big buffer device when you're mixing port speeds. The more dramatic the port speed differences (and the more of them), the more buffer you need.


If you have all the same port speed, small buffers are fine. If you have 100G and 1G ports, you'll need big buffers wherever the transition to the smaller port speed is located.
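As a rough back-of-the-envelope illustration of why the speed step-down is where the buffer is needed (numbers are assumed, not from the thread): when a burst arrives at 100G and drains at 1G, nearly the whole burst must sit in the queue.

```python
# Rough sketch: buffer needed to absorb a back-to-back burst that arrives
# on a fast port and drains out a slow port. Illustrative numbers only.
def buffer_needed(burst_bytes, in_bps, out_bps):
    arrive_s = burst_bytes * 8 / in_bps   # time the burst occupies the fast wire
    drained = out_bps / 8 * arrive_s      # bytes the slow port sends meanwhile
    return burst_bytes - drained          # bytes left queued

# A 1 MB burst stepping down from 100G to 1G is almost entirely buffered:
need = buffer_needed(1_000_000, 100e9, 1e9)
print(f"{need:.0f} bytes queued")  # ~990000 bytes
```

With equal port speeds the same function returns zero, which is the "small buffers are fine" case above.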




-----
Mike Hammett
Intelligent Computing Solutions

Midwest Internet Exchange

The Brothers WISP

Re: Trident3 vs Jericho2 [ In reply to ]
There is no easy, one-size-fits-all answer to this question. It's a complex
subject, and the answer will often differ depending on the environment and
traffic profile.

Re: Trident3 vs Jericho2 [ In reply to ]
>
> If you have all the same port speed, small buffers are fine. If you have
> 100G and 1G ports, you'll need big buffers wherever the transition to the
> smaller port speed is located.


With a larger buffer there, though, you are likely severely impacting
application throughput.

Re: Trident3 vs Jericho2 [ In reply to ]
I have seen the opposite, where small buffers impacted throughput.

Then again, that was observation only; no research into why beyond the superficial.




-----
Mike Hammett
Intelligent Computing Solutions

Midwest Internet Exchange

The Brothers WISP

Re: Trident3 vs Jericho2 [ In reply to ]
The reason we need larger buffers for some applications is a TCP
implementation detail. When the TCP window grows (it grows
exponentially), the newly added window space is bursted onto the wire at
the sender's speed.

If the sender is a significantly higher speed than the receiver, someone
needs to store those bytes while they are serialised at the receiver's
speed. If we cannot store them, then the window cannot grow to accommodate
the bandwidth*delay product and the receiver cannot reach the ideal TCP
receive rate.
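The bandwidth*delay product mentioned above is easy to put numbers on (the figures here are illustrative assumptions, not from the thread):

```python
# Bandwidth-delay product: the bytes that must be in flight (and
# potentially buffered somewhere) for TCP to fill the path.
def bdp_bytes(bandwidth_bps, rtt_s):
    return bandwidth_bps * rtt_s / 8

# A 1 Gb/s receiver at 50 ms RTT needs a ~6.25 MB window to run at line rate:
print(f"{bdp_bytes(1e9, 0.05) / 1e6:.2f} MB")  # 6.25 MB
```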

If we changed the TCP sender to do bandwidth estimation, and newly added
window space were serialised at the estimated receiver rate, we would need
dramatically smaller buffers. However, such a less aggressive TCP algorithm
would be outcompeted by New Reno, driving its bandwidth estimate toward zero.

Luckily, almost all traffic is handled by a few players; if they agree to
switch to a well-behaved TCP (or QUIC) algorithm, it doesn't matter much
that the long tail runs badly behaved TCP.




--
++ytti
Re: Trident3 vs Jericho2 [ In reply to ]
On 9 April 2021 17:20 +03, Saku Ytti wrote:

> If we changed the TCP sender to do bandwidth estimation, and newly added
> window space were serialised at the estimated receiver rate, we would need
> dramatically smaller buffers. However, such a less aggressive TCP algorithm
> would be outcompeted by New Reno, driving its bandwidth estimate toward zero.
>
> Luckily, almost all traffic is handled by a few players; if they agree to
> switch to a well-behaved TCP (or QUIC) algorithm, it doesn't matter much
> that the long tail runs badly behaved TCP.

I think many of them are now using BBR or BBR v2. It would be
interesting to know how it impacted switch buffering.
--
As flies to wanton boys are we to the gods; they kill us for their sport.
-- Shakespeare, "King Lear"
Re: Trident3 vs Jericho2 [ In reply to ]
On Fri, Apr 9, 2021 at 6:05 AM Mike Hammett <nanog@ics-il.net> wrote:
> What I've observed is that it's better to have a big buffer device
> when you're mixing port speeds. The more dramatic the port
> speed differences (and the more of them), the more buffer you need.
>
> If you have all the same port speed, small buffers are fine. If you have
> 100G and 1G ports, you'll need big buffers wherever the transition to
> the smaller port speed is located.


When a network is behaving well (losing few packets to data
corruption), TCP throughput is impacted by exactly two factors:

1. Packet round trip time
2. The size to which the congestion window has grown when the first
packet is lost

Assuming the sender has data ready, it will (after the initial
negotiation) slam out 10 packets back to back at the local wire speed.
Those 10 packets are the initial congestion window. After sending 10
packets it will wait and wait and wait until it hits a timeout or the
other side responds with an acknowledgement. So those initial packets
start out crammed right at the front of the round trip time with lots
of empty space afterwards.

The receiver gets the packets in a similar burst and sends its acks.
As the sender receives acknowledgement for each of the original
packets, it sends two more. This doubling effect is called "slow
start," and it's slow in the sense that the sender doesn't just throw
the entire data set at the wire and hope. So, having received acks for
10 packets, it sends 20 more. These 20 have spread out a little bit,
more or less based on the worst link speed in the path, but they're
still all crammed up in a bunch at the start of the round trip time.

Next round trip time it doubles to 40 packets. Then 80. Then 160. All
crammed up at the start of the round trip time causing them to hit
that one slowest link in the middle all at once. This doubling
continues until one of the buffers in the middle is too small to hold
the trailing part of the burst of packets while the leading part is
sent. With a full buffer, a packet is dropped. Whatever the congestion
window size is when that first packet is dropped, that number times
the round trip time is more or less the throughput you're going to see
on that TCP connection.
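That doubling-until-overflow process can be sketched as a toy simulation. The buffer depth, RTT, and packet size below are assumptions for illustration; real stacks pace packets and react to partial loss, so this overstates the burstiness:

```python
# Toy model of slow start: the congestion window doubles each round trip
# until a burst no longer fits the bottleneck buffer; throughput is then
# roughly cwnd_at_first_loss / RTT. Parameters are illustrative.
MSS = 1500            # bytes per packet
RTT = 0.05            # round trip time, seconds
BUFFER_PKTS = 100     # bottleneck queue depth, in packets

cwnd = 10             # initial congestion window, in packets
while cwnd <= BUFFER_PKTS:   # crude: assume the whole burst must fit the queue
    cwnd *= 2
throughput_bps = cwnd * MSS * 8 / RTT
print(f"first loss at cwnd={cwnd} pkts, ~{throughput_bps/1e6:.0f} Mb/s")  # cwnd=160, ~38 Mb/s
```

Doubling the buffer depth lets cwnd double once more before the first loss, which is the "more buffer, more throughput" effect described above.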

The various congestion control algorithms for TCP do different things
after they see that first packet drop. Some knock the congestion
window in half right away. Others back down more cautiously. Some
reduce growth all the way down to 1 packet per round trip time. Others
will allow faster growth as the packets spread out over the whole
round trip time and demonstrate that they don't keep getting lost. But
in general, the throughput you're going to see on that TCP connection
has been decided as soon as you lose that first packet.

So, TCP will almost always get better throughput with more buffers.
The flip side is latency: packets sitting in a buffer extend the time
before the receiver gets them. So if you make a buffer that's 500
milliseconds long and then let a TCP connection fill it up, apps which
work poorly in high latency environments (like games and ssh) will
suffer.
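The latency cost of a standing queue is just its depth divided by the drain rate; the buffer size and link speed here are illustrative assumptions:

```python
# Queuing delay added by a full buffer: buffered bytes / drain rate.
def queue_delay_ms(buffer_bytes, link_bps):
    return buffer_bytes * 8 / link_bps * 1000

# A 64 MB buffer draining at 1 Gb/s adds roughly half a second of latency:
print(f"{queue_delay_ms(64e6, 1e9):.0f} ms")  # 512 ms
```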

Regards,
Bill Herrin


--
William Herrin
bill@herrin.us
https://bill.herrin.us/
Re: Trident3 vs Jericho2 [ In reply to ]
You will not get an easy, straight answer; it depends on your environment and applications. If you consider the classical TCP algorithm and ignore latency, large buffers win, but what about microbursts?

LG

Re: Trident3 vs Jericho2 [ In reply to ]
Buffer size has nothing to do with feature richness.
Assuming you are asking about the data center: in a wide-radix, low-oversubscription network, shallow buffers do just fine. Some applications (think map-reduce / ML model training) have many-to-one traffic patterns and suffer from incast as a result; deep buffers might help there. DCI/DC-GW is another case where deep buffers could be justified.
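Incast can be put in rough numbers (all figures below are assumptions for illustration): N synchronized senders each burst a reply toward one receiver, so the aggregate converges on a single egress port that must queue nearly all of it.

```python
# Rough incast sketch: N servers answer a query at the same moment and
# all replies converge on one egress port. With synchronized arrival the
# port must queue roughly (aggregate burst - what it drains meanwhile).
def incast_queue_bytes(n_senders, reply_bytes, in_bps, out_bps):
    burst = n_senders * reply_bytes
    arrive_s = reply_bytes * 8 / in_bps   # senders transmit in parallel
    drained = out_bps / 8 * arrive_s      # bytes the egress sends meanwhile
    return max(0.0, burst - drained)

# 64 servers, 256 kB replies, 25G server links, 100G egress:
q = incast_queue_bytes(64, 256_000, 25e9, 100e9)
print(f"~{q/1e6:.1f} MB queued")  # ~15.4 MB queued
```

A shallow-buffer chip with a few tens of MB of shared buffer would drop part of such a burst, which is why deep buffers get justified for these workloads.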

Regards,
Jeff
