Mailing List Archive

Re: EVPN/VXLAN experience
Thank you Sebastian for sharing your very valuable experience.

Kind regards,
Andrey

Sebastian Wiesinger wrote on 2019-03-22 04:39:
> * Andrey Kostin <ankost@podolsk.ru> [2019-03-15 20:50]:
>> I'm interested to hear about experiences of running EVPN/VXLAN,
>> particularly with QFX10k as L3 gateway and QFX5k as spine/leaves. As
>> per the docs, it should be immune to any single switch's downtime, so
>> it might be a candidate for a truly redundant design.
>
> All right, here it goes:
>
> I can't speak for QFX10k as spine, but we have QFX5100 leaf/spine
> setups with EVPN/VXLAN running right now. Switch downtime is no
> problem at all: we unplugged a running switch, shut down ports, and
> unplugged cables between leaf & spine or leaf & client, all while
> storage traffic (NFS) was active in the setup. The worst thing that
> happened was that IOPS dropped from 400k/s to 100k/s for 1-3 seconds.
>
> What did bother us is that you are limited (at least on the QFX5100)
> in the number of "VLANs" (VNIs). We were testing with 30 client
> full-trunk ports per leaf, and with that many you can only provision
> around 500 VLANs before you get errors; basically you run out of
> memory for bridge domains on the switch. This seems to be a limitation
> of the chips used in the QFX5100, at least that's what I was told when
> I asked about it.
>
> You can check it, if you know where to look:
>
> root@SW-A:RE:0% ifsmon -Id | grep IFBD
> IFBD : 12884 0
>
> root@SW-A:RE:0% ifsmon -Id | grep Bridge
> Bridge Domain : 3502 0
>
> These numbers combined need to be <= 16382.
>
> And if you go over the limit, these nice errors occur:
>
> dcf_ng_get_vxlan_ifbd_hw_token: Max vxlan ifbd hw token reached 16382
> ifbd_create_node: VXLAN IFBD hw token couldn't be allocated for
> <xe-...>
>
> The workaround is to reduce the number of VLANs or trim the trunk config.
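For illustration, "trimming the trunk config" would mean giving each AE only the VLANs its host actually needs rather than the full list; a minimal sketch (the VLAN names and AE number here are placeholders, not from the thread):

interfaces {
    ae5 {
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    /* list only the VLANs this host really uses, not every VLAN in the fabric */
                    members [ STORAGE1 STORAGE2 ];
                }
            }
        }
    }
}

Presumably each VLAN on each trunk port consumes an IFBD entry, so pruning the member lists is what keeps the combined count under the 16382 ceiling mentioned above.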
>
> Also, you absolutely NEED LACP from the servers to the fabric. 17.4
> has enhancements that will put the client ports into LACP standby when
> the leaf gets separated from all spines.
>
>> As a downside, I see the more complex configuration at least. Adding
>> a VLAN means adding a routing instance, etc. There are also other
>> questions about convergence, scalability, stability, and code
>> maturity.
>
> We have it automated with Ansible. Management access happens over OOB
> (Mgmt) ports and everything is pushed by Ansible playbooks. Ansible
> generates configuration from templates and pushes it to the switches
> via NETCONF. I would never want to do this by hand. It demands a
> certain level of structure from every team (network, people doing the
> cabling, server team), but it works out well for structured setups.
>
> Our switch config looks like this:
>
> --------------------------------------------------------------------------
> user@sw1-spine-pod1> show configuration
> ## Last commit: 2019-03-11 03:13:49 CET by user
> ## Image name: jinstall-host-qfx-5-flex-17.4R2-S2.3-signed.tgz
>
> version 17.4R1-S3.3;
> groups {
>     /* Created by Ansible */
>     evpn-defaults { /* OMITTED */ };
>     /* Created by Ansible */
>     evpn-spine-defaults { /* OMITTED */ };
>     /* Created by Ansible */
>     evpn-spine-1 { /* OMITTED */ };
>     /* Created by Ansible - Empty group for maintenance operations */
>     client-interfaces;
> }
> apply-groups [ evpn-defaults evpn-spine-defaults evpn-spine-1 ];
> --------------------------------------------------------------------------
>
> So everything Ansible does is contained in apply-groups and hidden
> away; you can immediately spot if something has been configured by
> hand.
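As a side note, the expanded view of such a group-based configuration can be inspected with the inheritance display pipe; a sketch of the commands (prompt reused from the example above):

user@sw1-spine-pod1> show configuration | display inheritance
user@sw1-spine-pod1> show configuration | display inheritance no-comments

The first form annotates every inherited statement with the group it came from, so anything without such an annotation was configured directly; the second shows the same expanded configuration without the annotations.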
>
> Code-wise we're currently running the 17.4 train, which works mostly
> fine. We had a few problems with third-party 40G optics, but those
> should be fixed in the newest 17.4 service release.
>
> We also had a problem where new spine/leaf links did not come up, but
> that went away after rebooting/upgrading the spines.
>
> In daily operation it has proven to be quite stable.
>
>
> Best Regards
>
> Sebastian

Re: EVPN/VXLAN experience
One more question just came to mind: what routing protocol do you use
for the underlay, eBGP/iBGP/IGP? Design guides show examples with eBGP,
but it looks like for a deployment that isn't very big, ISIS could do
everything needed. What are the pros and cons of BGP vs. an IGP?

Kind regards,
Andrey

Re: EVPN/VXLAN experience
On 22 March 2019 13:39 -04, Rob Foehl <rwf@loonybin.net> wrote:

> I've got a few really large layer 2 domains that I'm looking to start
> breaking up and stitching back together with EVPN+VXLAN in the middle,
> on the order of a few thousand VLANs apiece. Trying to plan around
> any likely limitations, but specifics have been hard to come by...

You can find a bit more here:

- <https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/interface-num-edit-forwarding-options.html>
- <https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/next-hop-edit-forwarding-options-vxlan-routing.html>
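Judging by the URL paths, these appear to document the vxlan-routing resource knobs under forwarding-options on the QFX5110; a heavily hedged sketch of what tuning them might look like (the statement placement and the values are assumptions based on the link names, not taken from the thread):

forwarding-options {
    vxlan-routing {
        /* reserve more next-hop / interface resources for VXLAN routing; numbers are placeholders */
        next-hop 32768;
        interface-num 8192;
    }
}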
--
I fell asleep reading a dull book, and I dreamt that I was reading on,
so I woke up from sheer boredom.
Re: EVPN/VXLAN experience
Noted, thanks. Raises even more questions, though... Are these really
QFX5110 specific, and if so, are there static limitations on the 5100
chipset?

-Rob
Re: EVPN/VXLAN experience
* Andrey Kostin <ankost@podolsk.ru> [2019-03-22 16:16]:
> One more question just came to mind: what routing protocol do you use
> for the underlay, eBGP/iBGP/IGP? Design guides show examples with
> eBGP, but it looks like for a deployment that isn't very big, ISIS
> could do everything needed. What are the pros and cons of BGP vs. an
> IGP?

We use ISIS. It's easier for people to understand, and I don't expect
any scalability issues with fabrics of our size. We did not encounter
any drawbacks from using ISIS instead of BGP.
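For reference, a minimal point-to-point ISIS underlay on Junos would look roughly like the sketch below; the interface names, addresses, and ISO NET are made-up placeholders rather than anything from Sebastian's setup:

interfaces {
    /* fabric link towards a spine */
    xe-0/0/48 {
        unit 0 {
            family inet {
                address 10.255.1.0/31;
            }
            family iso;
        }
    }
    /* loopback; in an EVPN/VXLAN fabric this typically also serves as the VTEP source */
    lo0 {
        unit 0 {
            family inet {
                address 10.255.0.11/32;
            }
            family iso {
                address 49.0001.0102.5500.0011.00;
            }
        }
    }
}
protocols {
    isis {
        /* single level, point-to-point fabric links, loopback passive */
        level 1 disable;
        interface xe-0/0/48.0 {
            point-to-point;
        }
        interface lo0.0 {
            passive;
        }
    }
}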

Regards

Sebastian

--
GPG Key: 0x58A2D94A93A0B9CE (F4F6 B1A3 866B 26E9 450A 9D82 58A2 D94A 93A0 B9CE)
'Are you Death?' ... IT'S THE SCYTHE, ISN'T IT? PEOPLE ALWAYS NOTICE THE SCYTHE.
-- Terry Pratchett, The Fifth Elephant
Re: EVPN/VXLAN experience
Hello,

For some reason, we had a lot of issues with ISIS with EVPN-VXLAN on the Broadcom chipset.
We had a ticket open with Juniper for months about this: ECMP wasn't working, MAC bouncing, and so on.
We moved to OSPF and ... no more issues since then. The topology was not a basic leaf-spine but a ring (not officially supported by the SI team @juniper).

This was on QFX5100.

Raphael
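For comparison, a minimal point-to-point OSPF underlay in the same spirit (interface names again being placeholders) would be along these lines:

protocols {
    ospf {
        area 0.0.0.0 {
            interface xe-0/0/48.0 {
                interface-type p2p;
            }
            interface lo0.0 {
                passive;
            }
        }
    }
}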


Re: EVPN/VXLAN experience
Hi Sebastian,

Could you please clarify a little bit: does this limit on the number of
bridge domains apply when you have the same 500 VLANs on 30 AEs, or
when each AE has its own unique 500 VNIs?
How is external connectivity implemented, and for how many VNIs?

Kind regards,
Andrey

Sebastian Wiesinger wrote on 2019-03-25 05:58:
> * Rob Foehl <rwf@loonybin.net> [2019-03-22 18:40]:
>> Huh, that's potentially bad... Can you elaborate on the config a bit
>> more?
>> Are you hitting a limit around ~16k bridge domains total?
>
> Well, we're just putting VLANs on LACP trunks like this:
>
> ae0 {
>     mtu 9216;
>     esi {
>         00:00:00:00:00:00:00:01:01:01;
>         all-active;
>     }
>     aggregated-ether-options {
>         lacp {
>             active;
>             system-id 00:00:00:01:01:01;
>             hold-time up 2;
>         }
>     }
>     unit 0 {
>         family ethernet-switching {
>             interface-mode trunk;
>             vlan {
>                 members STORAGE1;
>             }
>         }
>     }
> }
>
> VLANs are configured "as usual":
>
> vlans {
>     STORAGE1 {
>         vlan-id 402;
>         vxlan {
>             vni 402;
>         }
>     }
> }
>
>
> If you have 30 AEs, you will start hitting this when you put around
> 500 VLANs in the vlan members list of all AEs.
>
> What I find irritating are the warnings around the evpn configuration:
>
> evpn {
>     ## Warning: Encapsulation can only be configured for an EVPN instance
>     ## Warning
>     encapsulation vxlan;
>     ## Warning: multicast-mode can only be configured in a virtual switch instance
>     ## Warning: Multicast mode can only be configured if route-distinguisher is configured
>     multicast-mode ingress-replication;
>     ## Warning: Extended VNI list can only be configured in a virtual switch instance
>     extended-vni-list all;
> }
>
> This config works without problems, and it is also what we got from
> Juniper in the beginning. I did not find an explanation for the
> warnings when we initially provisioned this.
>
> Regards
>
> Sebastian

Re: EVPN/VXLAN experience
* Andrey Kostin <ankost@podolsk.ru> [2019-03-28 15:24]:
> Hi Sebastian,
>
> Could you please clarify a little bit: does this limit on the number
> of bridge domains apply when you have the same 500 VLANs on 30 AEs, or
> when each AE has its own unique 500 VNIs?

Hi,

it's the same 500 VLANs on all 30 AEs.

> How is external connectivity implemented and for how many VNIs?

There is none in our case. :) It's "simple" VLANs between all clients
attached to the fabric. The fabric is the "backend" for our setup. We
currently cannot deploy it in the frontend because we would need more
than 500 VLANs there.

Regards

Sebastian

--
GPG Key: 0x58A2D94A93A0B9CE (F4F6 B1A3 866B 26E9 450A 9D82 58A2 D94A 93A0 B9CE)
'Are you Death?' ... IT'S THE SCYTHE, ISN'T IT? PEOPLE ALWAYS NOTICE THE SCYTHE.
-- Terry Pratchett, The Fifth Elephant
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp