Mailing List Archive

Strange connectivity issue Frontier EVPL
We have a strange issue that defies logic. We have an NNI at our POP with
Frontier serving as an aggregation circuit with different customers on
different VLANs. It's working well to several customers.

Bringing up a new customer shows roughly half of the IP addresses
unreachable across the link, as if there's some kind of load-balancing
or hashing function that's mis-directing half of the traffic. It's
consistent: if an address is reachable it's always reachable, and if it's
not reachable, it's never reachable. Everything ARPs fine.
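
To illustrate why this symptom looks like hashing, here is a minimal Python
sketch. It is only a model under stated assumptions, not a description of
what Frontier's network actually does: it assumes a two-member bundle that
picks a member from a hash of the source/destination address pair, with one
member silently dropping traffic. The function names and the documentation
addresses (192.0.2.1, 198.51.100.0/24) are made up for the example.

```python
import ipaddress
import zlib


def member_link(src_ip: str, dst_ip: str, n_links: int = 2) -> int:
    """Pick a member link from a deterministic hash of the address pair.

    CRC32 is used only as a stand-in; real devices use vendor-specific hashes.
    """
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    return zlib.crc32(key) % n_links


def reachable(src_ip: str, dst_ip: str, broken_links=frozenset({1}), n_links: int = 2) -> bool:
    """A destination is reachable only if its flow hashes to a healthy member."""
    return member_link(src_ip, dst_ip, n_links) not in broken_links


if __name__ == "__main__":
    src = "192.0.2.1"  # hypothetical test source
    for i in range(1, 21):
        dst = f"198.51.100.{i}"  # hypothetical destinations
        print(dst, "ok" if reachable(src, dst) else "unreachable")
```

In this model a given address pair always hashes to the same member, so each
destination is either always reachable or never reachable, and roughly half
of them fail.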

The Frontier circuit is layer 2 so shouldn't care about IP addresses.
Frontier tech shows no trouble. They changed the RAD device on-premise.
We've triple-checked configurations, torn down and rebuilt subinterface,
etc. with no joy.

Any suggestions?

--
Jay Hennigan - jay@west.net
Network Engineering - CCIE #7880
503 897-8550 - WB6RDV
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
Could you be running up against a MAC table limit on the circuit?

On 11/6/20 11:59 AM, Jay Hennigan wrote:
> We have a strange issue that defies logic. We have a NNI at our POP with
> Frontier serving as an aggregation circuit with different customers on
> different VLANs. It's working well to several customers.
>
> Bringing up a new customer shows roughly half of the IP addresses
> unreachable across the link, as if there's some kind of load-balancing
> or hashing function that's mis-directing half of the traffic. It's
> consistent, if an address is reachable it's always reachable. If it's
> not reachable, it's never reachable. Everything ARPs fine.
>
> The Frontier circuit is layer 2 so shouldn't care about IP addresses.
> Frontier tech shows no trouble. They changed the RAD device on-premise.
> We've triple-checked configurations, torn down and rebuilt subinterface,
> etc. with no joy.
>
> Any suggestions?
>
RE: Strange connectivity issue Frontier EVPL [ In reply to ]
EVPL (E-Line) should not be learning MACs, so MAC table size should be a non-issue, unless someone somewhere has constructed a 2-part bridge domain (in MEF-speak, an E-Tree or E-LAN of sorts), which would have MAC learning. In that case Matt's question comes into play.

-Aaron

-----Original Message-----
From: NANOG <nanog-bounces+aaron1=gvtc.com@nanog.org> On Behalf Of Matt Hoppes
Sent: Friday, November 6, 2020 11:09 AM
To: Jay Hennigan <jay@west.net>; NANOG list <nanog@nanog.org>
Subject: Re: Strange connectivity issue Frontier EVPL

Could you be running up against a MAC table limit on the circuit?

On 11/6/20 11:59 AM, Jay Hennigan wrote:
> We have a strange issue that defies logic. We have a NNI at our POP
> with Frontier serving as an aggregation circuit with different
> customers on different VLANs. It's working well to several customers.
>
> Bringing up a new customer shows roughly half of the IP addresses
> unreachable across the link, as if there's some kind of load-balancing
> or hashing function that's mis-directing half of the traffic. It's
> consistent, if an address is reachable it's always reachable. If it's
> not reachable, it's never reachable. Everything ARPs fine.
>
> The Frontier circuit is layer 2 so shouldn't care about IP addresses.
> Frontier tech shows no trouble. They changed the RAD device on-premise.
> We've triple-checked configurations, torn down and rebuilt
> subinterface, etc. with no joy.
>
> Any suggestions?
>
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
On 11/6/20 09:08, Matt Hoppes wrote:
> Could you be running up against a MAC table limit on the circuit?

Unlikely. The only MACs that should be in play are our gateway on our PE
router and the customer's router and those are both in the address table
and ARP. At layer 3, customer can consistently reach about 50% of the IP
addresses attempted.

--
Jay Hennigan - jay@west.net
Network Engineering - CCIE #7880
503 897-8550 - WB6RDV
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
Jay, I previously ran the engineering org over there, so sent this to my old team to look at, including the best engineer I know in regard to the RADs. Will pass along anything they come back with.

Thanks,
-Jeff

> On Nov 6, 2020, at 8:59 AM, Jay Hennigan <jay@west.net> wrote:
>
> We have a strange issue that defies logic. We have a NNI at our POP with Frontier serving as an aggregation circuit with different customers on different VLANs. It's working well to several customers.
>
> Bringing up a new customer shows roughly half of the IP addresses unreachable across the link, as if there's some kind of load-balancing or hashing function that's mis-directing half of the traffic. It's consistent, if an address is reachable it's always reachable. If it's not reachable, it's never reachable. Everything ARPs fine.
>
> The Frontier circuit is layer 2 so shouldn't care about IP addresses. Frontier tech shows no trouble. They changed the RAD device on-premise. We've triple-checked configurations, torn down and rebuilt subinterface, etc. with no joy.
>
> Any suggestions?
>
> --
> Jay Hennigan - jay@west.net
> Network Engineering - CCIE #7880
> 503 897-8550 - WB6RDV
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
I have similar Frontier NNIs out of One Wilshire, some 1 gig, some 10.

While I haven't seen the half-IP-reachable issue you describe I have spent
days and days chasing performance issues on them. I finally got gig
line-rate capable iperf3 boxes at both ends and see distinct differences
in single-TCP stream performance vs running 3-4 streams, and the difference
disappears like clockwork at "unbusy hours" (1am-7am) every day.

After running hundreds of tests and adjusting my buffering and RED on both
ends of these circuits I just have come to the conclusion that they have
some LAGs somewhere "in the middle" that get busy during the day, and
they don't care if I have to run 4 TCP streams to max a 1gig circuit.

It makes browser-based speedtests look really bad but otherwise the
circuits are usable. We're trying to replace the worst ones with
wavelength services.

-Will Orton


On Fri, Nov 06, 2020 at 08:59:28AM -0800, Jay Hennigan wrote:
> We have a strange issue that defies logic. We have a NNI at our POP
> with Frontier serving as an aggregation circuit with different
> customers on different VLANs. It's working well to several
> customers.
>
> Bringing up a new customer shows roughly half of the IP addresses
> unreachable across the link, as if there's some kind of
> load-balancing or hashing function that's mis-directing half of the
> traffic. It's consistent, if an address is reachable it's always
> reachable. If it's not reachable, it's never reachable. Everything
> ARPs fine.
>
> The Frontier circuit is layer 2 so shouldn't care about IP
> addresses. Frontier tech shows no trouble. They changed the RAD
> device on-premise. We've triple-checked configurations, torn down
> and rebuilt subinterface, etc. with no joy.
>
> Any suggestions?
>
> --
> Jay Hennigan - jay@west.net
> Network Engineering - CCIE #7880
> 503 897-8550 - WB6RDV
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
What hardware is on each side?

> On Nov 6, 2020, at 10:08, will@loopfree.net wrote:
>
> I have similar Frontier NNI's out of One Wilshire, some 1gig some 10.
>
> While I haven't seen the half-IP-reachable issue you describe I have spent
> days and days chasing performance issues on them. I finally got gig
> line-rate capable iperf3 boxes at both ends and see distinct differences
> in single-TCP stream performance vs running 3-4 streams, and the difference
> disappears like clockwork at "unbusy hours" (1am-7am) every day.
>
> After running hundreds of tests and adjusting my buffering and RED on both
> ends of these circuits I just have come to the conclusion that they have
> some LAGs somewhere "in the middle" that get busy during the day, and
> they don't care if I have to run 4 TCP streams to max a 1gig circuit.
>
> It makes browser-based speedtests look really bad but otherwise the
> circuits are usable. We're trying to replace the worst ones with
> wavelength services.
>
> -Will Orton
>
>
>> On Fri, Nov 06, 2020 at 08:59:28AM -0800, Jay Hennigan wrote:
>> We have a strange issue that defies logic. We have a NNI at our POP
>> with Frontier serving as an aggregation circuit with different
>> customers on different VLANs. It's working well to several
>> customers.
>>
>> Bringing up a new customer shows roughly half of the IP addresses
>> unreachable across the link, as if there's some kind of
>> load-balancing or hashing function that's mis-directing half of the
>> traffic. It's consistent, if an address is reachable it's always
>> reachable. If it's not reachable, it's never reachable. Everything
>> ARPs fine.
>>
>> The Frontier circuit is layer 2 so shouldn't care about IP
>> addresses. Frontier tech shows no trouble. They changed the RAD
>> device on-premise. We've triple-checked configurations, torn down
>> and rebuilt subinterface, etc. with no joy.
>>
>> Any suggestions?
>>
>> --
>> Jay Hennigan - jay@west.net
>> Network Engineering - CCIE #7880
>> 503 897-8550 - WB6RDV
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
On 11/6/20 10:14, Mike Lyon wrote:
> What hardware is on each side?

On our aggregate side an ASR920. Customer has a RAD device as the
Frontier handoff. We've seen the same issue with multiple devices at the
customer side including a laptop direct to the RAD.

--
Jay Hennigan - jay@west.net
Network Engineering - CCIE #7880
503 897-8550 - WB6RDV
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
Recently saw a very similar problem when Wave migrated us off of their antiquated 6500 to a brand new ASR920. EVPL had been working flawlessly for years on the 6500, but then stopped working when migrated to the ASR. Tried multiple ports on the ASR and then even another brand new ASR, same problem. Moved the circuit over to another (different) antiquated 6500 and all was good.

On my side, I was using a Mikrotik. I had the port in a bridge group and was seeing all the MAC addresses across the link, but for some reason they weren’t showing up in the ARP table of the Mikrotik. Tried a couple other Mikrotik devices, same thing. Installed a dumb gigabit switch in the middle, same thing. However, when my laptop was plugged in, that worked.

So yes, seen the same weird behavior. As to how to fix it, no idea :)

-Mike



> On Nov 6, 2020, at 10:32, Jay Hennigan <jay@west.net> wrote:
>
> On 11/6/20 10:14, Mike Lyon wrote:
>> What hardware is on each side?
>
> On our aggregate side an ASR920. Customer has a RAD device as the Frontier handoff. We've seen the same issue with multiple devices at the customer side including a laptop direct to the RAD.
>
> --
> Jay Hennigan - jay@west.net
> Network Engineering - CCIE #7880
> 503 897-8550 - WB6RDV
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
On Friday, 6 November 2020, 10:31:25, Jay Hennigan wrote:
> On 11/6/20 10:14, Mike Lyon wrote:
> > What hardware is on each side?
>
> On our aggregate side an ASR920. Customer has a RAD device as the
> Frontier handoff. We've seen the same issue with multiple devices at the
> customer side including a laptop direct to the RAD.

It sounds a bit like loadbalancing with one broken link...

Have you verified, for example with acl counters at both sides of the link, in which direction the
packets are dropped?

As the customer has changed the devices, does the ASR use a MAC starting with 4 or 6?
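
The question about a MAC starting with 4 or 6 presumably refers to a known
pitfall on MPLS pseudowires that run without a control word: a transit
router that load-balances by peeking past the label stack can mistake an
Ethernet frame for IPv4 or IPv6 when the first nibble of the destination
MAC is 4 or 6. Whether that applies to this circuit is not known. A minimal
Python sketch for checking an address follows; the function names and the
example MACs are made up for illustration.

```python
import string


def first_nibble(mac: str) -> int:
    """Return the first hex digit of a MAC written as aa:bb:..., aa-bb-... or aabb.ccdd.eeff."""
    hex_digits = "".join(c for c in mac if c not in ":-.")
    if len(hex_digits) != 12 or not all(c in string.hexdigits for c in hex_digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return int(hex_digits[0], 16)


def looks_like_ip_header(mac: str) -> bool:
    """True if a frame sent to this MAC could be misread as IPv4 (4) or IPv6 (6)."""
    return first_nibble(mac) in (4, 6)


if __name__ == "__main__":
    for mac in ("4c:00:00:00:00:01", "00:1b:00:00:00:01", "6c71.0d00.0001"):
        print(mac, looks_like_ip_header(mac))
```

Run it against the gateway MAC on the PE router and the MAC of the
customer's router, since each is the destination MAC in one direction.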

My only idea at the moment is to use a UDP traffic generator to put load on the link with
traffic that does not work end to end, and let them check where the traffic dies within their network.
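
A traffic generator for this purpose can be very small. The following is a
minimal Python sketch, assuming a plain UDP stream with a sequence number in
each datagram is enough for the carrier to spot on their counters; the
function name, default rate, and payload layout are made up for illustration
and it is no replacement for a proper tool such as iperf3.

```python
import socket
import struct
import time


def send_udp_stream(dst_host: str, dst_port: int, count: int = 1000,
                    payload_size: int = 200, pps: float = 100.0) -> int:
    """Send `count` UDP datagrams, each starting with a 32-bit sequence number.

    Returns the number of datagrams handed to the local stack. A pps of 0
    disables pacing and sends as fast as possible.
    """
    if payload_size < 4:
        raise ValueError("payload_size must be at least 4 bytes")
    interval = 1.0 / pps if pps > 0 else 0.0
    padding = b"\x00" * (payload_size - 4)
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            # Network byte order sequence number, then zero padding.
            sock.sendto(struct.pack("!I", seq) + padding, (dst_host, dst_port))
            sent += 1
            if interval:
                time.sleep(interval)
    return sent


# Example (hypothetical unreachable address and port):
#   send_udp_stream("198.51.100.37", 5001, count=6000, pps=100)
```

Pointing it at one of the addresses that never answers gives a steady,
recognizable flow to look for hop by hop.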


Karsten
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
This is my biggest complaint about non-wavelength transport. The provider is overselling a port somewhere in the circuit, unless it's a wave.




-----
Mike Hammett
Intelligent Computing Solutions

Midwest Internet Exchange

The Brothers WISP

----- Original Message -----

From: will@loopfree.net
To: nanog@nanog.org
Sent: Friday, November 6, 2020 11:54:53 AM
Subject: Re: Strange connectivity issue Frontier EVPL

I have similar Frontier NNI's out of One Wilshire, some 1gig some 10.

While I haven't seen the half-IP-reachable issue you describe I have spent
days and days chasing performance issues on them. I finally got gig
line-rate capable iperf3 boxes at both ends and see distinct differences
in single-TCP stream performance vs running 3-4 streams, and the difference
disappears like clockwork at "unbusy hours" (1am-7am) every day.

After running hundreds of tests and adjusting my buffering and RED on both
ends of these circuits I just have come to the conclusion that they have
some LAGs somewhere "in the middle" that get busy during the day, and
they don't care if I have to run 4 TCP streams to max a 1gig circuit.

It makes browser-based speedtests look really bad but otherwise the
circuits are usable. We're trying to replace the worst ones with
wavelength services.

-Will Orton


On Fri, Nov 06, 2020 at 08:59:28AM -0800, Jay Hennigan wrote:
> We have a strange issue that defies logic. We have a NNI at our POP
> with Frontier serving as an aggregation circuit with different
> customers on different VLANs. It's working well to several
> customers.
>
> Bringing up a new customer shows roughly half of the IP addresses
> unreachable across the link, as if there's some kind of
> load-balancing or hashing function that's mis-directing half of the
> traffic. It's consistent, if an address is reachable it's always
> reachable. If it's not reachable, it's never reachable. Everything
> ARPs fine.
>
> The Frontier circuit is layer 2 so shouldn't care about IP
> addresses. Frontier tech shows no trouble. They changed the RAD
> device on-premise. We've triple-checked configurations, torn down
> and rebuilt subinterface, etc. with no joy.
>
> Any suggestions?
>
> --
> Jay Hennigan - jay@west.net
> Network Engineering - CCIE #7880
> 503 897-8550 - WB6RDV
RE: Strange connectivity issue Frontier EVPL [ In reply to ]
My coworker is having similar issues with PS Lightwave and Alpheus/Logix
from San Antonio to Houston, where some things work and some things don't

-Aaron
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
I'm amazed you can get *anything* to work with Logix involved. Haven't
heard of many issues with PSLightwave in Houston, however... they seem
to be one of the only halfway decent options here.

On 11/6/20 2:57 PM, aaron1@gvtc.com wrote:
> My coworker is having similar issues with PS Lightwave and Alpheus/Logix
> from San Antonio to Houston whereas some things work and somethings don't
>
> -Aaron
>
>
Re: Strange connectivity issue Frontier EVPL [ In reply to ]
As it happens, I've just recently turned up a peering circuit with PSL in Houston, and their senior engineer is clue++

Naturally, he's on vacation this week, but [Aaron] ping me unicast if I might be able to assist/lend eyeballs/make an introduction of you guys next week.

--Adam

On 11/9/20, 8:05 AM, "NANOG on behalf of Tim Burke" <nanog-bounces+ak=mid.net@nanog.org on behalf of tim@mid.net> wrote:

I'm amazed you can get *anything* to work with Logix involved. Haven't
heard of many issues with PSLightwave in Houston, however... they seem
to be one of the only halfway decent options here.

On 11/6/20 2:57 PM, aaron1@gvtc.com wrote:
> My coworker is having similar issues with PS Lightwave and Alpheus/Logix
> from San Antonio to Houston whereas some things work and somethings don't
>
> -Aaron
>
>