Mailing List Archive

Re: Lossy cogent p2p experiences? [ In reply to ]
It was intended to detect congestion. The obvious response was to pace the sender(s) in some way so that the congestion was alleviated.
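(As an aside, a minimal sketch of checking what a Linux host will do with ECN, assuming the usual /proc path; 0/1/2 are the net.ipv4.tcp_ecn values:)

# Minimal sketch: check whether a Linux host will negotiate ECN.
# net.ipv4.tcp_ecn: 0 = ECN off, 1 = request and accept ECN,
# 2 = accept ECN only when the peer requests it (common default).
from pathlib import Path

def tcp_ecn_mode(proc_path="/proc/sys/net/ipv4/tcp_ecn"):
    try:
        return int(Path(proc_path).read_text().strip())
    except FileNotFoundError:
        return None  # not Linux, or /proc not mounted

if __name__ == "__main__":
    labels = {0: "ECN disabled", 1: "ECN requested and accepted",
              2: "ECN accepted only when the peer asks for it"}
    print(labels.get(tcp_ecn_mode(), "unknown / not available"))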

Sent using a machine that autocorrects in interesting ways...

> On Sep 7, 2023, at 11:19 PM, Mark Tinka <mark@tinka.africa> wrote:
>
>
>
>> On 9/7/23 09:51, Saku Ytti wrote:
>>
>> Perhaps if congestion control used latency or FEC instead of loss, we
>> could tolerate reordering while not underperforming under loss, but
>> I'm sure in decades following that decision we'd learn new ways how we
>> don't understand any of this.
>
> Isn't this partly what ECN was meant for? It's so old I barely remember what it was meant to solve :-).
>
> Mark.
Re: Lossy cogent p2p experiences? [ In reply to ]
i am going to be foolish and comment, as i have not seen this raised

if i am running a lag, i can not resist adding a bit of resilience by
having it spread across line cards.

surprise! line cards from vendor <any> do not have uniform hashing
or rotating algorithms.

randy
Re: Lossy cogent p2p experiences? [ In reply to ]
On 9/9/23 20:44, Randy Bush wrote:

> i am going to be foolish and comment, as i have not seen this raised
>
> if i am running a lag, i can not resist adding a bit of resilience by
> having it spread across line cards.
>
> surprise! line cards from vendor <any> do not have uniform hashing
> or rotating algorithms.

We spread all our LAG's across multiple line cards wherever possible
(wherever possible = chassis-based hardware).

I am not intimately aware of any hashing concerns for LAG's that
traverse multiple line cards in the same chassis.
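For illustration, per-flow hashing works roughly like this (a toy sketch, not any vendor's actual algorithm): the 5-tuple is hashed once, so every packet of a flow lands on the same member regardless of which line card that member sits on.

# Toy sketch of per-flow LAG member selection (not any vendor's real hash).
import hashlib

def lag_member(src_ip, dst_ip, proto, src_port, dst_port, n_members):
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % n_members

# The same flow always maps to the same member, so in-flow packet order is
# preserved; the trade-off is that a single flow can never use more than one
# member's worth of capacity.
print(lag_member("10.0.0.1", "10.0.0.2", 6, 49152, 443, 4))
print(lag_member("10.0.0.1", "10.0.0.2", 6, 49152, 443, 4))  # same member again
print(lag_member("10.0.0.1", "10.0.0.2", 6, 49153, 443, 4))  # may differ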

Mark.
Re: Lossy cogent p2p experiences? [ In reply to ]
At a previous $dayjob at a Tier 1, we would only support LAG for a customer
L2/3 service if the ports were on the same card. The response we gave if
customers pushed back was "we don't consider LAG a form of circuit
protection, so we're not going to consider physical resiliency in the
design", which was true, because we didn't, but it was beside the point.
The real reason was that getting our switching/routing platform to actually
run traffic symmetrically across a LAG, which most end users considered
expected behavior in a LAG, required a reconfiguration of the default hash,
which effectively meant that [switching/routing vendor]'s TAC wouldn't help
when something invariably went wrong. So it wasn't that it wouldn't work
(my recollection at least is that everything ran fine in lab environments)
but we didn't trust the hardware vendor support.
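To make the "reconfigure the default hash" point concrete, here is a toy comparison (my own sketch with made-up fields, not the vendor's actual default): a hash that only sees the MAC pair collapses all traffic between two routers onto one LAG member, while adding L3/L4 fields spreads it.

# Toy sketch of hash-field choice vs. LAG load spread (illustrative only).
import hashlib
import random
from collections import Counter

N_MEMBERS = 4

def pick(key):
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big") % N_MEMBERS

mac_only, five_tuple = Counter(), Counter()
random.seed(1)
for _ in range(10000):
    # One router pair, many TCP flows between them.
    sport = random.randint(1024, 65535)
    dport = random.choice([80, 443, 179])
    mac_only[pick("aa:bb:cc:00:00:01|aa:bb:cc:00:00:02")] += 1
    five_tuple[pick(f"192.0.2.1|198.51.100.1|6|{sport}|{dport}")] += 1

print("MAC-pair hash per member:", dict(mac_only))    # everything on one member
print("5-tuple hash per member: ", dict(five_tuple))  # roughly even spread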

On Sat, Sep 9, 2023 at 3:36 PM Mark Tinka <mark@tinka.africa> wrote:

>
>
> On 9/9/23 20:44, Randy Bush wrote:
>
> > i am going to be foolish and comment, as i have not seen this raised
> >
> > if i am running a lag, i can not resist adding a bit of resilience by
> > having it spread across line cards.
> >
> > surprise! line cards from vendor <any> do not have uniform hashing
> > or rotating algorithms.
>
> We spread all our LAG's across multiple line cards wherever possible
> (wherever possible = chassis-based hardware).
>
> I am not intimately aware of any hashing concerns for LAG's that
> traverse multiple line cards in the same chassis.
>
> Mark.
>


--
- Dave Cohen
craetdave@gmail.com
@dCoSays
www.venicesunlight.com
Re: Lossy cogent p2p experiences? [ In reply to ]
On 9/9/23 22:29, Dave Cohen wrote:

> At a previous $dayjob at a Tier 1, we would only support LAG for a
> customer L2/3 service if the ports were on the same card. The response
> we gave if customers pushed back was "we don't consider LAG a form of
> circuit protection, so we're not going to consider physical resiliency
> in the design", which was true, because we didn't, but it was beside
> the point. The real reason was that getting our switching/routing
> platform to actually run traffic symmetrically across a LAG, which
> most end users considered expected behavior in a LAG, required a
> reconfiguration of the default hash, which effectively meant that
> [switching/routing vendor]'s TAC wouldn't help when something
> invariably went wrong. So it wasn't that it wouldn't work (my
> recollection at least is that everything ran fine in lab environments)
> but we didn't trust the hardware vendor support.

We've had the odd bug here and there with LAG's for things like VRRP,
BFD, etc. But we have not run into that specific issue before on
ASR1000's, ASR9000's, CRS-X's and MX. 98% of our network is Juniper
nowadays, but even when we ran Cisco and had LAG's across multiple line
cards, we didn't see this problem.

The only hashing issue we had with LAG's is when we tried to carry Layer
2 traffic across them in the core. But this was just a limitation of the
CRS-X, and happened also on member links of a LAG that shared the same
line card.

Mark.
Re: Lossy cogent p2p experiences? [ In reply to ]
On Sat, 9 Sept 2023 at 21:36, Benny Lyne Amorsen
<benny+usenet@amorsen.dk> wrote:

> The Linux TCP stack does not immediately start backing off when it
> encounters packet reordering. In the server world, packet-based
> round-robin is a fairly common interface bonding strategy, with the
> accompanying reordering, and generally it performs great.

If you have
Linux - 1RU cat-or-such - Router - Internet

Mostly, round-robin between the Linux box and the 1RU switch is going to
work, because it satisfies the requirements of a) no congestion, b) equal
RTT, and c) a non-distributed device (a single-pipeline ASIC switch that
honors ingress order on egress). But it is quite a special case, and of
course the round-robin is only on one link in one direction.

Between kernels 3.6 and 4.4 all multipath in Linux was broken, and to this
day I still help people with multipath problems, complaining that it doesn't
perform (in a LAN!).

3.6 introduced the FIB to replace the flow cache, and made multipath
essentially random. 4.4 replaced random with a hash.

When I ask them 'do you see reordering', people mostly reply 'no', because
they look at a PCAP and it doesn't look important to a human observer; it is
such an insignificant amount. Invariably the problem goes away with hashing.
(netstat -s is better than intuition on a PCAP.)
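(A quick way to do that check, assuming a Linux box whose netstat -s output includes the usual "reordering" counters:)

# Sketch: pull reordering-related counters out of `netstat -s` rather than
# eyeballing a PCAP. Assumes a Linux netstat whose -s output contains "reorder".
import subprocess

def reordering_lines():
    out = subprocess.run(["netstat", "-s"], capture_output=True, text=True).stdout
    return [line.strip() for line in out.splitlines() if "reorder" in line.lower()]

if __name__ == "__main__":
    hits = reordering_lines()
    print("\n".join(hits) if hits else "no reordering counters reported")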


--
++ytti
Re: Lossy cogent p2p experiences? [ In reply to ]
Some interesting new developments on this, independent of the divergent network equipment discussion.

Cogent had a field engineer at the east coast location where my local loop (10gig wave) meets their equipment, i.e. (me – patch cable to loop provider’s wave equipment – wave – patch cable to Cogent equipment). On the other end, the geographically distant west coast direction, it’s Cogent equipment to my equipment in the same facility with just a patch cable. They connected some model of EXFO’s NetBlazer FTBx 8880-series testing device to a port on their east coast network device, without disconnecting my circuit. Originally, they were planning to have someone physically loop at their equipment at the other end, but I volunteered that my Arista gear supports a provider-facing loop at the transceiver level if they wanted to try that, so my loop, cabling, and transceiver could be part of the testing.

One direction at a time, they interrupted the point-to-point config to create a point-to-point between one direction of my gear, set to loopback mode, and the NetBlazer device. The device was set to use five parallel streams. In the close direction, where the third-party wave is involved, they ran at the full 5 x 2 Gbps for thirty minutes with zero packets lost and no issues. My monitoring confirmed this rate of port input was occurring, although oddly not output, but perhaps Arista doesn’t “see”/count the retransmitted packets in phy loopback mode.

In the distant direction, across their backbone, their equipment at the remote end, and the fiber patch cable to me, they tested at 9.5 Gbps for thirty minutes through my device in loopback mode. The result was that, of 2.6B packets sent, only 334 packets were lost. They configured for a 9.5 Gbps rate of testing, so five 1.9 Gbps streams. Across the five streams, the report has a “frame loss” and out-of-sequence section. Zero out of sequence, but among the five streams, loss seconds / count were 3 / 26, 3 / 48, 1 / 5, 13 / 221, 1 / 34. I’m not familiar with this testing device, but to me that suggests it’s stating how many of the total seconds experienced loss, and the counted packet loss. So really the only one that stands out is the one with thirteen seconds where loss occurred, but the packet counts we’re talking about are minuscule (334 out of 2.6B is roughly one packet in eight million). Again, my monitoring at the interface level showed this 9.5 Gbps of testing occurring for the thirty minutes the report says.

So, now I’m just completely confused. How is this device, traversing the same equipment, ports, and cables, able to achieve far greater average throughput, and almost no loss, across a very long duration? There are times I’ll be able to achieve nearly the same, but never for a test longer than ten seconds, as it just falls off from there. For example, I did a five-parallel-stream TCP test with iperf just now and did achieve a net throughput of 8.16 Gbps with about 1200 retransmits. Running the same five-stream test for half an hour like theirs, I got no better than 2.64 Gbps and 183,000 retransmits.

iperf and UDP allow me to see loss at any rate of transmit exceeding ~140 Mbps, in just seconds, not a half hour. To rule out my gear, I’m also able to perform the same tests from the same systems (both VM and physical) using public addresses and traversing the internet, as these are publicly connected systems. I get far lower loss and much greater throughput on the internet path. For example, a simple ten-second test of a single stream at 400 Mbit UDP: 5 packets lost across the internet, 491 across the P2P. Single-stream TCP across the internet for ten seconds: 3.47 Gbps, 162 retransmits. Across the P2P, this time at least: 637 Mbps, 3633 retransmits.
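For reference, the kind of UDP loss check I'm describing boils down to something like this (a stripped-down sketch of a sequence-numbered probe, not the actual iperf invocation; the port, packet size, and pacing are arbitrary): send numbered datagrams at a fixed rate and count the gaps at the far end.

# Stripped-down sketch of a sequence-numbered UDP loss probe (illustrative,
# not the actual iperf test): run "receiver" on one end, "sender" on the
# other, and count missing sequence numbers at the receiver.
import socket, struct, sys, time

PKT = struct.Struct("!Q")      # 8-byte sequence number
PAYLOAD_PAD = b"\x00" * 1192   # pad to ~1200-byte datagrams

def sender(host, port, mbps=400, seconds=10):
    pkt_len = PKT.size + len(PAYLOAD_PAD)
    pps = int(mbps * 1_000_000 / 8 / pkt_len)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    seq, interval, end = 0, 1.0 / pps, time.time() + seconds
    while time.time() < end:
        sock.sendto(PKT.pack(seq) + PAYLOAD_PAD, (host, port))
        seq += 1
        time.sleep(interval)          # crude pacing; good enough for a sketch
    print(f"sent {seq} packets")

def receiver(port, idle_timeout=5):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    sock.settimeout(idle_timeout)
    received, highest = 0, -1
    try:
        while True:
            data, _ = sock.recvfrom(2048)
            received += 1
            highest = max(highest, PKT.unpack(data[:PKT.size])[0])
    except socket.timeout:
        pass
    expected = highest + 1
    print(f"received {received} of {expected} -> {expected - received} lost")

if __name__ == "__main__":
    if sys.argv[1] == "sender":
        sender(sys.argv[2], 5001)
    else:
        receiver(5001)

Run the receiver on one end and point the sender at it from the other; over a path that is dropping, the lost count climbs within seconds, which is the same behavior I see with iperf.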

David



From: David Hubbard <dhubbard@dino.hostasaurus.com>
Date: Friday, September 1, 2023 at 10:19 AM
To: Nanog@nanog.org <nanog@nanog.org>
Subject: Re: Lossy cogent p2p experiences?
The initial and recurring packet loss occurs on any flow of more than ~140 Mbit. The fact that it’s loss-free under that rate is what reinforces my opinion that it’s config-based somewhere, even though they say it isn’t.
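(For what it's worth, "clean below a rate, lossy above it" is exactly the signature a rate policer leaves; a toy token-bucket sketch, with a made-up 140 Mbit/s committed rate, shows the shape:)

# Toy token-bucket policer sketch (my illustration, with a made-up 140 Mbit/s
# rate): traffic below the rate passes untouched, traffic above it sees a
# steady drop percentage roughly equal to the overshoot.
RATE_BPS = 140_000_000       # hypothetical committed rate
BURST_BYTES = 1_500_000      # hypothetical bucket depth
PKT_BYTES = 1500

def police(offered_bps, seconds=10):
    tokens, dropped, sent = BURST_BYTES, 0, 0
    pkts_per_tick = offered_bps / 8 / PKT_BYTES / 1000     # 1 ms ticks
    for _ in range(seconds * 1000):
        tokens = min(BURST_BYTES, tokens + RATE_BPS / 8 / 1000)
        for _ in range(int(pkts_per_tick)):
            sent += 1
            if tokens >= PKT_BYTES:
                tokens -= PKT_BYTES
            else:
                dropped += 1
    return 100.0 * dropped / sent

for mbps in (100, 140, 200, 400, 1000):
    print(f"{mbps:>5} Mbit/s offered -> {police(mbps * 1_000_000):5.1f}% dropped")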

From: NANOG <nanog-bounces+dhubbard=dino.hostasaurus.com@nanog.org> on behalf of Mark Tinka <mark@tinka.africa>
Date: Friday, September 1, 2023 at 10:13 AM
To: Mike Hammett <nanog@ics-il.net>, Saku Ytti <saku@ytti.fi>
Cc: nanog@nanog.org <nanog@nanog.org>
Subject: Re: Lossy cogent p2p experiences?

On 9/1/23 15:44, Mike Hammett wrote:
> and I would say the OP wasn't even about elephant flows, just about a network that can't deliver anything acceptable.

Unless Cogent are not trying to accept (and by extension, may not be able to guarantee) large Ethernet flows because they can't balance them across their various core links, end-to-end...

Pure conjecture...

Mark.
