Mailing List Archive: EVPN all-active toward large layer 2?

EVPN all-active toward large layer 2?

rwf at loonybin

Apr 17, 2019, 10:43 PM

Post #1 of 12 (2525 views)

I've been experimenting with EVPN all-active multihoming toward some large
legacy layer 2 domains, and running into some fairly bizarre behavior...

First and foremost, is a topology like this even a valid use case?

EVPN PE <-> switch <-> switch <-> EVPN PE

...where both switches are STP root bridges and have a pile of VLANs and
other switches behind them. All of the documentation seems to hint at
LACP toward a single CE device being the expected config here -- is that
accurate? If so, are there any options to make the above work?

If I turn up EVPN virtual-switch routing instances on both PEs as above
with config on both roughly equivalent to the following:

interfaces {
    xe-0/1/2 {
        flexible-vlan-tagging;
        encapsulation flexible-ethernet-services;
        esi {
            00:11:11:11:11:11:11:11:11:11;
            all-active;
        }
        unit 12 {
            encapsulation vlan-bridge;
            vlan-id 12;
        }
    }
}
routing-instances {
    test {
        instance-type virtual-switch;
        vrf-target target:65000:1;
        protocols {
            evpn {
                extended-vlan-list 12;
            }
        }
        bridge-domains {
            test-vlan12 {
                vlan-id 12;
                interface xe-0/1/2.12;
            }
        }
    }
}

Everything works fine for a few minutes -- exact time varies -- then what
appears to be thousands of packets of unknown unicast traffic starts
flowing between the PEs, and doesn't stop until one or the other is
disabled. Same behavior on this particular segment with or without any
remote PEs connected.

Both PEs are MX204s running 18.1R3-S4, automatic route distinguishers,
full mesh RSVP LSPs between, direct BGP with family evpn allowed, no LDP.

I'm going to try a few more tests with single-active and enabling MAC
accounting to try to nail down what this traffic actually is, but figure
I'd better first ask whether I'm nuts for trying this at all...

-Rob
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp

Re: EVPN all-active toward large layer 2?

kszarkowicz at gmail

Apr 17, 2019, 11:24 PM

Post #2 of 12 (2525 views)

Hi Rob,

RFC 7432, Section 8.5:

If a bridged network is multihomed to more than one PE in an EVPN
network via switches, then the support of All-Active redundancy mode
requires the bridged network to be connected to two or more PEs using
a LAG.

So, do you have MC-LAG (facing the EVPN PEs) configured on your switches?

Thanks,
Krzysztof

Re: EVPN all-active toward large layer 2?

rwf at loonybin

Apr 17, 2019, 11:35 PM

Post #3 of 12 (2525 views)

On Thu, 18 Apr 2019, Krzysztof Szarkowicz wrote:

> Hi Rob,
> RFC 7432, Section 8.5:
>
> If a bridged network is multihomed to more than one PE in an EVPN
> network via switches, then the support of All-Active redundancy mode
> requires the bridged network to be connected to two or more PEs using
> a LAG.
>
>
> So, do you have MC-LAG (facing the EVPN PEs) configured on your switches?

No, hence the question... I'd have expected ESI-LAG to be relevant for
EVPN, and in this case it's not a single "CE" device but rather an entire
layer 2 domain. For a few of those, Juniper-flavored MC-LAG isn't an
option, anyway. In any case, it's not clear what 8.5 means by "must be
connected using a LAG" -- from only one device in said bridged network?

-Rob

Re: EVPN all-active toward large layer 2?

wojciech.janiszewski at gmail

Apr 17, 2019, 11:48 PM

Post #4 of 12 (2525 views)

Hi Rob,

You have effectively created an L2 loop over EVPN, so to cut it you need the
link between the bridged network and EVPN to be a single link. There is no STP
in EVPN.
If you need two physical connections between those networks, then LAG is
the way to go. MC-LAG or virtual chassis can be configured on the legacy
switches to maintain that connection. ESI will handle that on the EVPN side.

HTH,
Wojciech

Re: EVPN all-active toward large layer 2?

kszarkowicz at gmail

Apr 17, 2019, 11:53 PM

Post #5 of 12 (2525 views)

Hi Rob,

As per the RFC, the bridges must appear to the EVPN PEs as a LAG. In essence, if you have multiple switches facing the EVPN PEs, you need to configure MC-LAG on those switches, facing the PEs. The switches don't need to be from Juniper, so the MC-LAG on them doesn't need to be Juniper-flavored. If you have a single switch facing the EVPN PEs, a simple LAG on that switch (with members toward different EVPN PEs) is OK.
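
To illustrate why the LAG matters, here is a minimal Python sketch of how a switch floods a BUM frame with and without its two PE-facing uplinks bundled. The port names, the MAC addresses and the CRC32 flow hash are assumptions made up for the example; real switches use vendor-specific hashing.

```python
import zlib

def pick_member(src_mac, dst_mac, lag):
    """Hash a flow onto exactly one LAG member (CRC32 is a stand-in hash)."""
    key = f"{src_mac}>{dst_mac}".encode()
    return lag[zlib.crc32(key) % len(lag)]

def flood_targets(ingress, ports, lag, src_mac, dst_mac):
    """Ports that receive a copy of a BUM frame arriving on `ingress`."""
    # Never send back out the ingress port, and treat the LAG as one port.
    targets = [p for p in ports if p != ingress and p not in lag]
    if lag and ingress not in lag:
        # From outside the LAG: exactly one member carries the copy.
        targets.append(pick_member(src_mac, dst_mac, lag))
    return targets

ports = ["to-PE1", "to-PE2", "host"]   # hypothetical port names
bcast = "ff:ff:ff:ff:ff:ff"
src = "00:00:5e:00:53:01"              # documentation-range MAC

# Two independent uplinks: a frame from PE1 is reflected toward PE2.
print(flood_targets("to-PE1", ports, [], src, bcast))
# Same uplinks bundled in a LAG: the frame only reaches the host port.
print(flood_targets("to-PE1", ports, ["to-PE1", "to-PE2"], src, bcast))
```

With the uplinks bundled, a frame received from one PE is never sent to the other, and a frame from the LAN goes to only one PE, which is the behavior all-active relies on.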

Thanks,
Krzysztof

Re: EVPN all-active toward large layer 2?

rwf at loonybin

Apr 18, 2019, 12:33 AM

Post #6 of 12 (2525 views)

On Thu, 18 Apr 2019, Wojciech Janiszewski wrote:

> You have effectively created an L2 loop over EVPN, so to cut it you need the
> link between the bridged network and EVPN to be a single link. There is no STP
> in EVPN.
> If you need two physical connections between those networks, then LAG is
> the way to go. MC-LAG or virtual chassis can be configured on the legacy
> switches to maintain that connection. ESI will handle that on the EVPN side.

On Thu, 18 Apr 2019, Krzysztof Szarkowicz wrote:

> As per RFC, bridges must appear to EVPN PEs as a LAG. In essence, you need to configure MC-LAG (facing EVPN PEs) on the switches facing EVPN PEs, if you have multiple switches facing EVPN-PEs. Switches doesn't need to be from Juniper, so MC-LAG on the switches doesn't need to be Juniper-flavored. If you have single switch facing EVPN PEs -> simple LAG (with members towards different EVPN PEs) on that single switch is OK.

Got it. Insufficiently careful reading of the RFC vs. Juniper example
documentation. I really ought to know better by now...

Unfortunately, doing MC-LAG of any flavor toward the PEs from some of
these switches is easier said than done. Assuming incredibly dumb layer 2
only, and re-reading RFC 7432 8.5 more carefully this time... Is
single-active a viable option here? If so, is there any support on the MX
for what the RFC is calling service carving for VLAN-aware bundles for
basic load balancing between the PEs?

Thanks for setting me straight!

-Rob

Re: EVPN all-active toward large layer 2?

kszarkowicz at gmail

Apr 18, 2019, 12:48 AM

Post #7 of 12 (2525 views)

Hi Rob,

Indeed, for single-active no LAG is needed, as only the DF PE will allow traffic, and the other PEs (non-DF) will block all traffic for the given VLAN. So, you can deploy single-active. It is supported on MX (including service carving for VLAN-aware bundles).
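
For reference, the default service carving in RFC 7432 section 8.5 can be sketched in a few lines of Python: the PEs attached to the segment are ordered by IP address, and the PE whose ordinal equals (VLAN mod N) becomes the DF for that VLAN. The addresses below are made-up documentation addresses, and the function name is hypothetical; this is not a statement about what any particular Junos release does internally.

```python
import ipaddress

def elect_df(pe_addresses, vlan_id):
    """Default DF election per RFC 7432 section 8.5 (modulo-based carving)."""
    # Order the PEs on the segment by increasing numeric IP address ...
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    # ... and pick the PE whose ordinal (starting at 0) equals V mod N.
    return ordered[vlan_id % len(ordered)]

pes = ["192.0.2.2", "192.0.2.1"]   # two PEs sharing one Ethernet segment
for vlan in (12, 13, 14, 15):
    print(vlan, elect_df(pes, vlan))
```

With two PEs, even VLAN IDs land on the lower address and odd ones on the higher, which gives the basic per-VLAN load balancing asked about. As I read the RFC, a VLAN-aware bundle uses the numerically lowest VLAN of the bundle on that segment as the input to the modulo.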

Thanks,
Krzysztof

Re: EVPN all-active toward large layer 2?

tarko at lanparty

Apr 18, 2019, 2:13 AM

Post #8 of 12 (2524 views)

hey,

> You have effectively created L2 loop over EVPN, so to cut it you need a
> link between bridged network and EVPN to be a single link. There is no STP
> in EVPN.

To be fair, it's not a full loop; only BUM traffic will loop back to the
other PE.

Single-active is the only way forward if you cannot do something like MC-LAG
from the L2 domain.

--
tarko

Re: EVPN all-active toward large layer 2?

adamv0025 at netconsultings

Apr 19, 2019, 3:58 AM

Post #9 of 12 (2509 views)

> Rob Foehl
> Sent: Thursday, April 18, 2019 6:43 AM
>
> First and foremost, is a topology like this even a valid use case?
>
> EVPN PE <-> switch <-> switch <-> EVPN PE
>
> ...where both switches are STP root bridges and have a pile of VLANs and
> other switches behind them. All of the documentation seems to hint at LACP
> toward a single CE device being the expected config here -- is that accurate?
> If so, are there any options to make the above work?
>
When I first heard of active-active for EVPN I thought: yeah, MAC-level
load-sharing! Perfect, that's just like in the L3 world.
But then I realized that the only available way to do that was to
reduce your topology to:
EVPN PE <-> switch
with some clever tricks like MC-LAG.
Without these, the only level at which one can do load-sharing is the VLAN
level, that is, active-standby with one half of the VLANs active on PE1 and
the other half active on PE2.

I came to the conclusion that the reason true MAC-level active-active is not
possible in EVPN is not a limitation of EVPN itself, but rather a limitation
of the L2 domain: it boils down to the fact that you can't have a single MAC
address associated with two ports at the same time (as is possible with L3
routes), and I don't really know why that is the case.

adam

Re: EVPN all-active toward large layer 2?

adamv0025 at netconsultings

Apr 19, 2019, 4:02 AM

Post #10 of 12 (2509 views)

> Wojciech Janiszewski
> Sent: Thursday, April 18, 2019 7:48 AM
>
> Hi Rob,
>
> You have effectively created L2 loop over EVPN, so to cut it you need a link
> between bridged network and EVPN to be a single link. There is no STP in
> EVPN.
>
So the bridge-domains on the PEs consume BPDUs and do not flood them on all ports (not looking in obviously)?
- If the BDs on the PEs flooded BPDUs, that would solve this problem by allowing the switches to kill the loop (isn't there a knob that can be turned on to allow this?).

adam

Re: EVPN all-active toward large layer 2?

adamv0025 at netconsultings

Apr 19, 2019, 4:04 AM

Post #11 of 12 (2509 views)

> Tarko Tikan
> Sent: Thursday, April 18, 2019 10:14 AM
>
> hey,
>
> > You have effectively created L2 loop over EVPN, so to cut it you need
> > a link between bridged network and EVPN to be a single link. There is
> > no STP in EVPN.
>
> To be fair it's not a full loop but only BUM traffic will loop back to
> other PE.
>
Yes, but there should be an MPLS label associated with that traffic that tells
the other PE: do not send this traffic back to the LAN, because it's the same
site.

adam

Re: EVPN all-active toward large layer 2?

ekoyle+puck.nether.net at gmail

Apr 23, 2019, 4:57 PM

Post #12 of 12 (2497 views)

On Fri, Apr 19, 2019 at 5:06 AM <adamv0025@netconsultings.com> wrote:
>
> > Tarko Tikan
> > Sent: Thursday, April 18, 2019 10:14 AM
> >
> > hey,
> >
> > > You have effectively created L2 loop over EVPN, so to cut it you need
> > > a link between bridged network and EVPN to be a single link. There is
> > > no STP in EVPN.
> >
> > To be fair it's not a full loop but only BUM traffic will loop back to
> > other PE.
> >
> Yes but there should be an MPLS label associated with that traffic that says
> to the other PE -do not send this traffic back to LAN -cause it's the same
> site.

The problem is actually on the other side: the LAN would send BUM
traffic sourced from the router back to the other router port, and BUM
traffic sourced from the LAN to both router ports. Two sites
configured like this in an evpn would cause such traffic to loop
infinitely, since Ethernet has no TTL. Three sites would get you to
the point of exponential packet duplication where a single broadcast
packet could fill your pipes and keep them full until you intervene
(or something dies).
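
The two-site case can be sketched as a toy Python model. It assumes two sites A and B, each a bridged LAN attached to two PEs without a LAG; only the DF forwards core BUM traffic onto the segment; split horizon drops copies that came from a PE on the same segment; and in all-active a non-DF PE still accepts frames from the LAN. All names are hypothetical, and the event cap exists only to stop the simulation.

```python
from collections import deque

SITES = {"A": ("A1", "A2"), "B": ("B1", "B2")}  # first PE listed is the DF
SITE_OF = {pe: site for site, pes in SITES.items() for pe in pes}
ALL_PES = list(SITE_OF)

def is_df(pe):
    return SITES[SITE_OF[pe]][0] == pe

def flood(mode, max_events=1000):
    """Inject one broadcast from a host in site A.

    Returns (events processed, whether copies were still circulating).
    """
    # ("lan", site, ingress_pe): frame flooded inside a site's LAN
    # ("core", dst_pe, src_pe):  copy arriving at dst_pe over the core
    queue = deque([("lan", "A", None)])
    events = 0
    while queue and events < max_events:
        kind, where, src = queue.popleft()
        events += 1
        if kind == "lan":
            for pe in SITES[where]:
                if pe == src:
                    continue  # not flooded back out the ingress port
                if mode == "single-active" and not is_df(pe):
                    continue  # non-DF blocks its segment port entirely
                for remote in ALL_PES:
                    if remote != pe:
                        queue.append(("core", remote, pe))
        else:
            same_segment = SITE_OF[where] == SITE_OF[src]
            # Only the DF forwards onto the segment; split horizon
            # drops copies that originated on the same segment.
            if is_df(where) and not same_segment:
                queue.append(("lan", SITE_OF[where], where))
    return events, bool(queue)

print("all-active   :", flood("all-active"))
print("single-active:", flood("single-active"))
```

In this model the all-active run hits the event cap with copies still in flight, because each LAN hands the frame from the DF to the non-DF PE, which sends it across the core again. The single-active run drains after a handful of events.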

Allowing a MAC to appear on multiple ports would add a _lot_ of
complexity to ethernet (current hardware doesn't support it), and
could often result in traffic taking a suboptimal path (since switches
only know they saw this source MAC on that port -- not how far away it
is). You would need a routing protocol running at layer 2 to solve
these issues.
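
A minimal Python sketch of plain source-MAC learning shows the constraint. The class and port names are made up for illustration; the point is only that the table holds one port per MAC, so a MAC seen on two ports keeps moving rather than being shared.

```python
class LearningBridge:
    """Toy learning table: one port per MAC address, last writer wins."""

    def __init__(self):
        self.table = {}
        self.moves = 0

    def learn(self, src_mac, port):
        previous = self.table.get(src_mac)
        if previous is not None and previous != port:
            self.moves += 1  # the MAC "moved" to another port
        self.table[src_mac] = port

bridge = LearningBridge()
# The same source MAC shows up alternately on two uplinks.
for port in ["to-PE1", "to-PE2"] * 3:
    bridge.learn("00:00:5e:00:53:01", port)

print(bridge.table, bridge.moves)
```

Six frames produce five moves here, and the table never records more than the last port seen.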

Remember that ethernet was initially designed using shared media, and
the MAC address was used to allow your NIC to ignore traffic that was
being sent to other hosts (to save CPU). The fact that they managed
to shoehorn switching in there without re-writing the protocol is
magical, but we are still living with some inherent limitations.

--
Eldon