Mailing List Archive

Rock-solid JUNOS for QFX5100
Dear List,

I'm curious if anybody can recommend a JUNOS release for QFX5100 that is seriously stable. Right now we're on the previously-recommended version 17.3R3-S1.5. Everything's been fine in testing, and suddenly out of the blue there will be weird issues when I make a change. I suspect maybe they are related to VSTP or LAG, or both.

1. After I added a VLAN to a trunk port, all the access ports on that VLAN completely stopped moving packets. Setting "disable" and then deleting "disable" on each of the broken interfaces restored function. This happened during the day. I opened a JTAC ticket and they'd never heard of an issue like this; of course, we couldn't reproduce it. I no longer recall with confidence, but I think the trunk port may have been a one-member LAG (replacement of a downstream switch).

2. New trunk port (a two-port LACP LAG) not sending VSTP BPDUs for some VLANs. I'm not sure if it was coincidence or always broken, as I had recently begun feeding new VSTP BPDUs (thus the root bridge changed) before I even looked at this. Other trunk ports did not exhibit the same issue. I completely deleted the LAG and rolled back to fix it. This was on a fresh turnup and luckily wasn't in a topology that could form a loop.
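
For reference, the workaround from item 1 (set "disable" on each stuck port, commit, then delete "disable" and commit again) can be sketched as a small Python helper. This is only an illustration: it builds the configuration-mode lines as text and does not talk to a switch. The function name and the interface names are hypothetical.

```python
def bounce_commands(interfaces):
    """Return Junos configuration-mode lines that disable, then re-enable, ports.

    The caller supplies the interface names; none are taken from the thread.
    """
    if not interfaces:
        raise ValueError("no interfaces given")
    # First pass: administratively disable every affected port and commit.
    lines = [f"set interfaces {name} disable" for name in interfaces]
    lines.append("commit")
    # Second pass: remove the disable statement again and commit.
    lines += [f"delete interfaces {name} disable" for name in interfaces]
    lines.append("commit")
    return lines
```

Calling `bounce_commands(["xe-0/0/10", "xe-0/0/11"])` (made-up port names) yields six lines that could be pasted into a configuration session during a window.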

Features I'm using include:

- BGP
- OSPF
- PIM
- VSTP
- LACP
- VRRP
- IGMPv2 and v3
- Routing-instance
- CoS for multicast
- CoS for unicast
- CoS classification by ingress filter
- IPv4-only
- ~7k routes in FIB (total of all tables)
- ~1k multicast groups


There are no automation features, no MPLS, no MC-LAG, no EVPN, VXLAN, etc. These switches are L3 boxes that hand off IP to an MX core. Management is in the default instance/table, everything else is in a routing instance.

These boxes have us scared to touch them outside of a window as seemingly basic changes risk blowing the whole thing up. Is this a case where an ancient version might be a better choice or is this release a lemon? I recall that JTAC used to recommend two releases, one being for if you didn't require "new features". I find myself stuck between the adages of "If it ain't broke, don't fix it" and "Software doesn't age like wine". Given how poorly multicast seems to be understood by JTAC I'm very hesitant to upgrade to significantly newer releases.

If anybody can give advice or suggestions I would appreciate it immensely!

Thanks
Ross

_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: Rock-solid JUNOS for QFX5100
Hi Ross,

We are on 14.1X53 for our prod QFX5100. We don't do BGP, VRRP and PIM on
them, but other features are similar to yours (we tried PIM once a while
ago, but it behaved weirdly and we decided not to use it). The only
problem we saw with them was a few third-party QSFP issues, which we
resolved by manipulating auto-negotiation, iirc.
I'm currently looking at 18.2R3 as a potential candidate for the next
step and testing it on QFX5110 atm; according to the release notes it has
a bunch of fixes for bugs that were discovered in 17.x releases. Also,
17.4R3 is going to be released in August. I'm waiting for it for
subscriber-management routers, but it will have recent fixes for QFXs as
well. In your case, though, it'll be interesting to know JTAC's findings.
If it's a new bug, it may take some time until it is resolved.
I'm also very suspicious when S-releases are shown as "recommended". I
may be mistaken, but in my understanding S-releases don't undergo the
full testing routine and are verified only for the implemented bugfixes.
Please share the results of your investigation with JTAC.

Kind regards,
Andrey Kostin


Re: Rock-solid JUNOS for QFX5100
Hi Ross

We've recently switched our 5100s to 18.1R3-S5. 18.1 is stable with BGP/OSPF/LDP/RSVP/MPLS and LACP LAG in general. We don't use STP of any kind with the QFXs so I can't really help there.

I was hesitant to upgrade to 18.X since the 5100 was still the only QFX not to have an 18 version recommended on KB21476, but recently they updated the KB to include that model, so I'd say it's pretty safe now. They pushed out S6 in July; if I had to redo it now, I'd use that one instead of S5.
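
When comparing release strings like these across a fleet, a small parser helps avoid eyeballing mistakes. The following is a minimal Python sketch, assuming only the standard "R" release format seen in this thread (for example 18.1R3-S5 or 17.3R3-S1.5). Treating the trailing ".5" as one more numeric field is an assumption, and special trains such as 14.1X53 are deliberately rejected rather than guessed at. The function name is hypothetical.

```python
import re

# Matches standard releases such as 17.3R3-S1.5 or 18.1R3-S6.
# Special trains such as 14.1X53 use a different scheme and are not handled.
_RELEASE = re.compile(r"^(\d+)\.(\d+)R(\d+)(?:-S(\d+)(?:\.(\d+))?)?$")


def release_key(version):
    """Turn a release string into a tuple of ints that sorts in release order."""
    match = _RELEASE.match(version)
    if match is None:
        raise ValueError(f"unsupported release format: {version}")
    # Missing service-release fields count as 0, so 18.1R3 sorts before 18.1R3-S1.
    return tuple(int(part) if part else 0 for part in match.groups())
```

With this, `release_key("18.1R3-S6") > release_key("18.1R3-S5")` holds, and `sorted(versions, key=release_key)` orders a list of such strings.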

The kind of problem you're describing sounds like what we lived through with 14.X and VCF when we first started using these. We'd commit a change and some random ports would stop passing traffic; we'd then have to delete the port config and re-provision for traffic to resume. Lots of weird stuff like that kept happening until we got fed up with the architecture and moved to routed MPLS with almost no layer 2 switching.

Good luck.

-phil




Re: Rock-solid JUNOS for QFX5100
Yep, holding at 14.1X53 for production QFX5100 also. These were sold as
east/west datacenter Layer 2 switches. If they can't figure out port
init and connection, I am wondering what purpose these switches are
supposed to serve.

Seeing the same connection issues in EX2300/4300 with 10Gb ports in
JunOS 15/16/17. Just upgraded some flaky switches to 18.1R3-S6. Let's
see if they have it right this time. Maybe when the access switches stop
acting up, I will think about the QFX5100.

Brian Nelson

Re: Rock-solid JUNOS for QFX5100
Just FYI for all: 18.1R3-S6 is specifically needed for EVPN/VXLAN use cases with QFX5110, not so much QFX5100. For 5100 without EVPN/VXLAN, 14.1X53-D[latest] is very stable AFAIK. There are no real added features/functionality for QFX5100 outside of EVPN/VXLAN, so if this is not your use case, 14.1X53 should be equivalent feature/functionality-wise to 18.x. For both EX and QFX I [now] recommend staying away from 16.x or 17.x, as there are no added benefits.

S-Releases are the new Standard 'recommended' for almost all products. For sure all EX and all QFX. Not happy with this, but it is what it is.

Please also note that "TAC Recommended" is the generic BEST (fewest cases, along with some deployment/download numbers) from a purely STABILITY point of view, and does not take into account specific use cases. Your best bet is to discuss any code upgrades with your local Juniper account team. Even if you work only with a specific partner, that partner has a Juniper team with an SE supporting them.

I am also of the firm belief that upgrading for upgrade's sake, or just to stay most current, is not always a great idea - if not broken, why try to fix/change?

Just FYI, Rich

Richard McGovern
Sr Sales Engineer, Juniper Networks
978-618-3342

I’d rather be lucky than good, as I know I am not good
I don’t make the news, I just report it


Re: Rock-solid JUNOS for QFX5100
I'd like to add to that: l2circuit hot standby needs 18+ to work. I had been waiting for some time for 18 to be recommended so that I could use it.

Thanks for the info, much appreciated!

-phil

Re: Rock-solid JUNOS for QFX5100
Hi,

similar problem with

Model: qfx5100-48s-6q
Junos: 17.3R3-S4.2

Creating a VLAN stops traffic forwarding for approx. 3 seconds, probably on trunk ports that allow all VLANs, or something like that. Pretty bad for BFD sessions going through these ifaces.
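
To put rough numbers on why a gap of about 3 seconds hurts BFD: a session is declared down once no packet arrives within the detection time, which is the negotiated interval multiplied by the detect multiplier. A minimal Python sketch of that arithmetic follows; the timer values used with it are hypothetical examples, not the settings on these boxes.

```python
def bfd_detection_time_ms(interval_ms, multiplier):
    """Detection time: the negotiated interval times the detect multiplier."""
    return interval_ms * multiplier


def session_survives(outage_ms, interval_ms, multiplier):
    """True if a forwarding gap is shorter than the BFD detection time."""
    return outage_ms < bfd_detection_time_ms(interval_ms, multiplier)
```

For example, with a 300 ms interval and a multiplier of 3 the detection time is 900 ms, so `session_survives(3000, 300, 3)` is false; the timers would need to add up to more than 3 seconds (say 1500 ms x 3) to ride out the gap.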

Does anyone have a similar issue?

regards, Daniel

Re: Rock-solid JUNOS for QFX5100
On Aug 13, 2019, at 1:50 PM, Dan Římal <dan@danrimal.net> wrote:
>
> Model: qfx5100-48s-6q
> Junos: 17.3R3-S4.2
>
> Creating a VLAN stops traffic forwarding for approximately 3 seconds, probably on trunk ports that allow all VLANs, or something like that. Pretty bad for BFD sessions going through these interfaces.
>
> Does anyone have a similar issue?

We have attempted (twice) to get from 14 to 17 on our QFX5100. The first time we got to 16 and none of the interfaces would come up. TAC couldn't figure it out on the spot, so we reverted. In the post-mortem they discovered that we had an "unsupported" config item (a discard interface) that essentially prevented the box from booting and bringing up any interfaces.

The second time we tried, we got to version 16, but the subsequent move to 17 was a disaster. The switch ate all the DHCP traffic passing through it, rendering the network quite unusable. Again, TAC couldn't figure it out during the maintenance window, so we rolled back.

So we're still on 16, but I haven't wanted to try another update. Sounds like we might need to just jump to 18 and see if we can skip over some of this other stuff.

Jason
Re: Rock-solid JUNOS for QFX5100
Just here to tell you that we've had an issue that sounds close to what you saw:
After a commit, a LAG would stop moving packets. Renaming the LAG (e.g. from ae24 to ae35) would fix the issue. Renaming it back to ae24 would trigger the issue again.

It happened on a device that was only used for switching; no IP addresses were configured apart from the management port. A reboot fixed the issue. This was before we used them in prod, and since then we've upgraded to 17.3R3-S1 and have never seen that issue again. What you experienced sounds scary. If this happens to us outside of a maintenance window we'll have to throw out all of our QFX5100s. :/

Regards
Karl

------------------------------------------------------------------------
*From:* Ross Halliday [mailto:ross.halliday@wtccommunications.ca]
*Sent:* Monday, August 12, 2019, 3:19 PM
*To:* juniper-nsp@puck.nether.net
*Subject:* [j-nsp] Rock-solid JUNOS for QFX5100

> Dear List,
>
> I'm curious if anybody can recommend a JUNOS release for QFX5100 that is seriously stable. Right now we're on the previously-recommended version 17.3R3-S1.5. Everything's been fine in testing, and suddenly out of the blue there will be weird issues when I make a change. I suspect maybe they are related to VSTP or LAG, or both.
>
> 1. Add a VLAN to a trunk port, all the access ports on that VLAN completely stopped moving packets. Disable/delete disable all of the broken interfaces restored function. This happened during the day. I opened a JTAC ticket and they'd never heard of an issue like this, of course we couldn't reproduce it. I no longer recall with confidence, but I think the trunk port may have been a one-member LAG (replacement of a downstream switch).
>
> 2. New trunk port (a two-port LACP LAG) not sending VSTP BPDUs for some VLANs. I'm not sure if it was coincidence or always broken as I had recently begun feeding new VSTP BPDUs (thus the root bridge changed) before I even looked at this. Other trunk ports did not exhibit the same issue. Completely deleted the LAG and rolled back to fix. This was on a fresh turnup and luckily wasn't in a topology that could form a loop.
>
> Features I'm using include:
>
> - BGP
> - OSPF
> - PIM
> - VSTP
> - LACP
> - VRRP
> - IGMPv2 and v3
> - Routing-instance
> - CoS for multicast
> - CoS for unicast
> - CoS classification by ingress filter
> - IPv4-only
> - ~7k routes in FIB (total of all tables)
> - ~1k multicast groups
>
>
> There are no automation features, no MPLS, no MC-LAG, no EVPN, VXLAN, etc. These switches are L3 boxes that hand off IP to an MX core. Management is in the default instance/table, everything else is in a routing instance.
>
> These boxes have us scared to touch them outside of a window as seemingly basic changes risk blowing the whole thing up. Is this a case where an ancient version might be a better choice or is this release a lemon? I recall that JTAC used to recommend two releases, one being for if you didn't require "new features". I find myself stuck between the adages of "If it ain't broke, don't fix it" and "Software doesn't age like wine". Given how poorly multicast seems to be understood by JTAC I'm very hesitant to upgrade to significantly newer releases.
>
> If anybody can give advice or suggestions I would appreciate it immensely!
>
> Thanks
> Ross
