Mailing List Archive

Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
Hi
One thing I omitted to say is that BGP runs over a LACP bundle that currently
has just one 100 Gb/s member interface.

I see that the issue is triggered on the Cisco side when the Ethernet interface
appears to go into the Initializing state:


2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN:
Interface port-channel101 is down (No operational members)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PARENT_DOWN: Interface
port-channel101.2303 is down (Parent interface is down)
2024 Feb 9 16:39:36 NEXUS1 %BGP-5-ADJCHANGE: bgp- [xxx] (xxx) neighbor
172.16.6.17 Down - sent: other configuration change
2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-FOP_CHANGED:
port-channel101: first operational port changed from Ethernet1/44 to none
2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel101:
Ethernet1/44 is down
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface
port-channel101,bandwidth changed to 100000 Kbit
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface
Ethernet1/44 is down (Initializing)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN:
Interface port-channel101 is down (No operational members)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-SPEED: Interface port-channel101,
operational speed changed to 100 Gbps
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DUPLEX: Interface
port-channel101, operational duplex mode changed to Full
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface
port-channel101, operational Receive Flow Control state changed to off
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface
port-channel101, operational Transmit Flow Control state changed to off
2024 Feb 9 16:39:39 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_UP: port-channel101:
Ethernet1/44 is up
2024 Feb 9 16:39:39 NEXUS1 %ETH_PORT_CHANNEL-5-FOP_CHANGED:
port-channel101: first operational port changed from none to Ethernet1/44
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface
port-channel101,bandwidth changed to 100000000 Kbit
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface Ethernet1/44 is up
in Layer3
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface port-channel101 is
up in Layer3
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface
port-channel101.2303 is up in Layer3
2024 Feb 9 16:39:43 NEXUS1 %BGP-5-ADJCHANGE: bgp- [xxx] (xxx) neighbor
172.16.6.17 Up

Cheers
James

On Sun, 11 Feb 2024 at 11:12, Gert Doering <gert@greenie.muc.de> wrote:

> Hi,
>
> On Sun, Feb 11, 2024 at 11:08:29AM +0100, james list via cisco-nsp wrote:
> > we notice BGP flaps
>
> Any particular error message? BGP flaps can happen due to many different
> reasons, and usually $C is fairly good at logging the reason.
>
> Any interface errors, packet errors, ping packets lost?
>
> "BGP flaps" *can* be related to lower layer issues (so: interface counters,
> error counters, extended pings) or to something unrelated, like "MaxPfx
> exceeded"...
>
> gert
> --
> "If was one thing all people took for granted, was conviction that if you
> feed honest figures into a computer, honest figures come out. Never
> doubted
> it myself till I met a computer with a sense of humor."
> Robert A. Heinlein, The Moon is a Harsh
> Mistress
>
> Gert Doering - Munich, Germany
> gert@greenie.muc.de
>
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
Yes, same version.
Currently no traffic is exchanged; only the BGP peering is set up.

On Sun, 11 Feb 2024 at 11:16, Igor Sukhomlinov <dvalinswamp@gmail.com> wrote:

> Hi James,
>
> Do you happen to run the same software on all nexuses and all MXes?
> Do the DC1 and DC2 bgp session exchange the same amount of routing updates
> across the links?
>
>
> On Sun, Feb 11, 2024, 21:09 james list via cisco-nsp <
> cisco-nsp@puck.nether.net> wrote:
>
>> Dear experts
>> we have a couple of BGP peers over a 100 Gbs interconnection between
>> Juniper (MX10003) and Cisco (Nexus N9K-C9364C) in two different
>> datacenters
>> like this:
>>
>> DC1
>> MX1 -- bgp -- NEXUS1
>> MX2 -- bgp -- NEXUS2
>>
>> DC2
>> MX3 -- bgp -- NEXUS3
>> MX4 -- bgp -- NEXUS4
>>
>> The issue is that sporadically (i.e. every 1 to 3 days) we notice BGP
>> flaps only in DC1, on both interconnections (not at the same time). There is
>> still no traffic, because once we noticed the flaps we blocked the production
>> deployment.
>>
>> We've already swapped the SFPs (we moved the ones from DC2 to DC1 and vice
>> versa) and the cables on both interconnections at DC1, without any improvement.
>>
>> SFP we use in both DCs:
>>
>> Juniper - QSFP-100G-SR4-T2
>> Cisco - QSFP-100G-SR4
>>
>> over MPO cable OM4.
>>
>> The distance is 70 m in DC1 and 80 m in DC2, so it is shorter where we see the issue.
>>
>> Any idea or suggestion what to check or to do ?
>>
>> Thanks in advance
>> Cheers
>> James
>> _______________________________________________
>> cisco-nsp mailing list cisco-nsp@puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>
>
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
The DC technicians state that the cables are the same in both DCs and are
direct, with no patch panel.

Cheers

On Sun, 11 Feb 2024 at 11:20, nivalMcNd d <nivalmcnd@gmail.com> wrote:

> Could it be that the DC1 links run through an intermediary patch panel and you
> are seeing fibre disturbance? That can be ruled out if the interfaces on the
> DC1 links do not go down.
>
> On Sun, Feb 11, 2024, 21:16 Igor Sukhomlinov via cisco-nsp <
> cisco-nsp@puck.nether.net> wrote:
>
>> Hi James,
>>
>> Do you happen to run the same software on all nexuses and all MXes?
>> Do the DC1 and DC2 bgp session exchange the same amount of routing updates
>> across the links?
>>
>>
>> On Sun, Feb 11, 2024, 21:09 james list via cisco-nsp <
>> cisco-nsp@puck.nether.net> wrote:
>>
>> > Dear experts
>> > we have a couple of BGP peers over a 100 Gbs interconnection between
>> > Juniper (MX10003) and Cisco (Nexus N9K-C9364C) in two different
>> datacenters
>> > like this:
>> >
>> > DC1
>> > MX1 -- bgp -- NEXUS1
>> > MX2 -- bgp -- NEXUS2
>> >
>> > DC2
>> > MX3 -- bgp -- NEXUS3
>> > MX4 -- bgp -- NEXUS4
>> >
>> > The issue is that sporadically (i.e. every 1 to 3 days) we notice BGP
>> > flaps only in DC1, on both interconnections (not at the same time). There is
>> > still no traffic, because once we noticed the flaps we blocked the production
>> > deployment.
>> >
>> > We've already swapped the SFPs (we moved the ones from DC2 to DC1 and vice
>> > versa) and the cables on both interconnections at DC1, without any improvement.
>> >
>> > SFP we use in both DCs:
>> >
>> > Juniper - QSFP-100G-SR4-T2
>> > Cisco - QSFP-100G-SR4
>> >
>> > over MPO cable OM4.
>> >
>> > The distance is 70 m in DC1 and 80 m in DC2, so it is shorter where we see
>> > the issue.
>> >
>> > Any idea or suggestion what to check or to do ?
>> >
>> > Thanks in advance
>> > Cheers
>> > James
>> > _______________________________________________
>> > cisco-nsp mailing list cisco-nsp@puck.nether.net
>> > https://puck.nether.net/mailman/listinfo/cisco-nsp
>> > archive at http://puck.nether.net/pipermail/cisco-nsp/
>> >
>> _______________________________________________
>> cisco-nsp mailing list cisco-nsp@puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>
>
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
On Sun, 11 Feb 2024 at 13:51, james list via juniper-nsp
<juniper-nsp@puck.nether.net> wrote:

> One think I've omit to say is that BGP is over a LACP with currently just
> one interface 100 Gbs.
>
> I see that the issue is triggered on Cisco when eth interface seems to go
> in Initializing state:

OK, so we can forget BGP entirely and focus on why LACP is going down.

Is the LACP bundle a single port, eth1/44?

When LACP fails, does the Juniper end emit any syslog? Does the Juniper
see the interface facing eth1/44 flapping?

--
++ytti
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
> DC technicians states cable are the same in both DCs and
> direct, no patch panel

Things I would look at:

* Have all the connectors been verified clean via microscope?

* Optical levels relative to threshold values (may relate to the
first).

* Any end seeing any input errors? (May relate to the above
two.) On the Juniper you can see some of this via PCS
("Physical Coding Sublayer") unexpected events independently
of whether you have payload traffic, not sure you can do the
same on the Nexus boxes.

Regards,

- Håvard
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
I don't think any of these matter. You'd see FCS failure on any
link-related issue causing the BGP packet to drop.

If you're not seeing FCS failures, you can ignore all link related
problems in this case.


On Sun, 11 Feb 2024 at 14:13, Havard Eidnes via juniper-nsp
<juniper-nsp@puck.nether.net> wrote:
>
> > DC technicians states cable are the same in both DCs and
> > direct, no patch panel
>
> Things I would look at:
>
> * Has all the connectors been verified clean via microscope?
>
> * Optical levels relative to threshold values (may relate to the
> first).
>
> * Any end seeing any input errors? (May relate to the above
> two.) On the Juniper you can see some of this via PCS
> ("Physical Coding Sublayer") unexpected events independently
> of whether you have payload traffic, not sure you can do the
> same on the Nexus boxes.
>
> Regards,
>
> - Håvard
> _______________________________________________
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



--
++ytti
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
I want to clarify, I meant this in the context of the original question.

That is, if you have a BGP specific problem, and no FCS errors, then
you can't have link problems.

But in this case the problem is not BGP specific. In fact it has
nothing to do with BGP, since the problem begins with an observed link
flap.

On Sun, 11 Feb 2024 at 14:14, Saku Ytti <saku@ytti.fi> wrote:
>
> I don't think any of these matter. You'd see FCS failure on any
> link-related issue causing the BGP packet to drop.
>
> If you're not seeing FCS failures, you can ignore all link related
> problems in this case.
>
>
> On Sun, 11 Feb 2024 at 14:13, Havard Eidnes via juniper-nsp
> <juniper-nsp@puck.nether.net> wrote:
> >
> > > DC technicians states cable are the same in both DCs and
> > > direct, no patch panel
> >
> > Things I would look at:
> >
> > * Has all the connectors been verified clean via microscope?
> >
> > * Optical levels relative to threshold values (may relate to the
> > first).
> >
> > * Any end seeing any input errors? (May relate to the above
> > two.) On the Juniper you can see some of this via PCS
> > ("Physical Coding Sublayer") unexpected events independently
> > of whether you have payload traffic, not sure you can do the
> > same on the Nexus boxes.
> >
> > Regards,
> >
> > - Håvard
> > _______________________________________________
> > juniper-nsp mailing list juniper-nsp@puck.nether.net
> > https://puck.nether.net/mailman/listinfo/juniper-nsp
>
>
>
> --
> ++ytti



--
++ytti
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
Hi
There are no errors on either interface (Cisco or Juniper).

Below are the logs of one event from both sides, plus the config and LACP stats.

LOGS of one event at 16:39:

CISCO
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN:
Interface port-channel101 is down (No operational members)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PARENT_DOWN: Interface
port-channel101.2303 is down (Parent interface is down)
2024 Feb 9 16:39:36 NEXUS1 %BGP-5-ADJCHANGE: bgp- [xxx] (xxx) neighbor
172.16.6.17 Down - sent: other configuration change
2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-FOP_CHANGED:
port-channel101: first operational port changed from Ethernet1/44 to none
2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel101:
Ethernet1/44 is down
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface
port-channel101,bandwidth changed to 100000 Kbit
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface
Ethernet1/44 is down (Initializing)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN:
Interface port-channel101 is down (No operational members)
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-SPEED: Interface port-channel101,
operational speed changed to 100 Gbps
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DUPLEX: Interface
port-channel101, operational duplex mode changed to Full
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface
port-channel101, operational Receive Flow Control state changed to off
2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface
port-channel101, operational Transmit Flow Control state changed to off
2024 Feb 9 16:39:39 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_UP: port-channel101:
Ethernet1/44 is up
2024 Feb 9 16:39:39 NEXUS1 %ETH_PORT_CHANNEL-5-FOP_CHANGED:
port-channel101: first operational port changed from none to Ethernet1/44
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface
port-channel101,bandwidth changed to 100000000 Kbit
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface Ethernet1/44 is up
in Layer3
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface port-channel101 is
up in Layer3
2024 Feb 9 16:39:39 NEXUS1 %ETHPORT-5-IF_UP: Interface
port-channel101.2303 is up in Layer3
2024 Feb 9 16:39:43 NEXUS1 %BGP-5-ADJCHANGE: bgp- [xxx] (xxx) neighbor
172.16.6.17 Up

JUNIPER



Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp
current while timer expired current Receive State: CURRENT
Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACP_INTF_DOWN: ae49: Interface
marked down due to lacp timeout on member et-0/1/5
Feb 9 16:39:35.819 2024 MX1 kernel: lag_bundlestate_ifd_change: bundle
ae49: bundle IFD minimum bandwidth or minimum links not met, Bandwidth
(Current : Required) 0 : 100000000000 Number of links (Current : Required)
0 : 1
Feb 9 16:39:35.815 2024 MX1 lacpd[31632]: LACP_INTF_MUX_STATE_CHANGED:
ae49: et-0/1/5: Lacp state changed from COLLECTING_DISTRIBUTING to
ATTACHED, actor port state : |EXP|-|-|-|IN_SYNC|AGG|SHORT|ACT|, partner
port state : |-|-|DIS|COL|OUT_OF_SYNC|AGG|SHORT|ACT|
Feb 9 16:39:35.869 2024 MX1 rpd[31866]: bgp_ifachange_group:10697:
NOTIFICATION sent to 172.16.6.18 (External AS xxx): code 6 (Cease) subcode
6 (Other Configuration Change), Reason: Interface change for the peer-group
Feb 9 16:39:35.909 2024 MX1 mib2d[31909]: SNMP_TRAP_LINK_DOWN: ifIndex
684, ifAdminStatus up(1), ifOperStatus down(2), ifName ae49
Feb 9 16:39:36.083 2024 MX1 lacpd[31632]: LACP_INTF_MUX_STATE_CHANGED:
ae49: et-0/1/5: Lacp state changed from ATTACHED to
COLLECTING_DISTRIBUTING, actor port state :
|-|-|DIS|COL|IN_SYNC|AGG|SHORT|ACT|, partner port state :
|-|-|DIS|COL|IN_SYNC|AGG|SHORT|ACT|
Feb 9 16:39:36.089 2024 MX1 kernel: lag_bundlestate_ifd_change: bundle
ae49 is now Up. uplinks 1 >= min_links 1
Feb 9 16:39:36.089 2024 MX1 kernel: lag_bundlestate_ifd_change: bundle
ae49: bundle IFD minimum bandwidth or minimum links not met, Bandwidth
(Current : Required) 0 : 100000000000 Number of links (Current : Required)
0 : 1
Feb 9 16:39:36.085 2024 MX1 lacpd[31632]: LACP_INTF_MUX_STATE_CHANGED:
ae49: et-0/1/5: Lacp state changed from COLLECTING_DISTRIBUTING to
ATTACHED, actor port state : |-|-|-|-|IN_SYNC|AGG|SHORT|ACT|, partner port
state : |-|-|-|-|OUT_OF_SYNC|AGG|SHORT|ACT|
Feb 9 16:39:39.095 2024 MX1 lacpd[31632]: LACP_INTF_MUX_STATE_CHANGED:
ae49: et-0/1/5: Lacp state changed from ATTACHED to
COLLECTING_DISTRIBUTING, actor port state :
|-|-|DIS|COL|IN_SYNC|AGG|SHORT|ACT|, partner port state :
|-|-|-|-|IN_SYNC|AGG|SHORT|ACT|
Feb 9 16:39:39.101 2024 MX1 kernel: lag_bundlestate_ifd_change: bundle
ae49 is now Up. uplinks 1 >= min_links 1
Feb 9 16:39:39.109 2024 MX1 mib2d[31909]: SNMP_TRAP_LINK_UP: ifIndex 684,
ifAdminStatus up(1), ifOperStatus up(1), ifName ae49
Feb 9 16:39:41.190 2024 MX1 rpd[31866]: bgp_recv: read from peer
172.16.6.18 (External AS xxx) failed: Unknown error: 48110976


CONFIG:

CISCO

NEXUS1# sh run int port-channel 101

interface port-channel101
description <[To MX1|Et-0/1/5]>
mtu 9216
no ip redirects

NEXUS01# sh run int port-channel 101.2303

interface port-channel101.2303
description <[To MX1|Et-0/1/5]>
mtu 9216
encapsulation dot1q 2303
vrf member SIA
bfd ipv4 interval 250 min_rx 250 multiplier 3
no ip redirects
ip address 172.16.6.18/30
no shutdown

JUNIPER

MX1> show configuration interfaces ae49
description "link to NEXUS01";
flexible-vlan-tagging;
mtu 9192;
encapsulation flexible-ethernet-services;
aggregated-ether-options {
lacp {
active;
periodic fast;
}
}
unit 2303 {
vlan-id 2303;
family inet {
mtu 1500;
address 172.16.6.17/30;
}
}

LACP counters:


CISCO

NEXUS01# sh lacp counters
NOTE: Clear lacp counters to get accurate statistics

------------------------------------------------------------------------------
                         LACPDUs            Markers/Resp       LACPDUs
Port                 Sent       Recv        Recv     Sent      Pkts Err
------------------------------------------------------------------------------
port-channel101
Ethernet1/44         6123011    6118981     0        0         0

NEXUS1# sh lacp interface eth1/44
Interface Ethernet1/44 is up
Channel group is 101 port channel is Po101
PDUs sent: 6123014
PDUs rcvd: 6118984
Markers sent: 0
Markers rcvd: 0
Marker response sent: 0
Marker response rcvd: 0
Unknown packets rcvd: 0
Illegal packets rcvd: 0
Lag Id: [. [(7f, c4-9-b7-64-30-38, 32, 7f, 18), (8000, b0-8b-cf-83-49-5b,
64, 8000, 1ad)] ]
Operational as aggregated link since Fri Feb 9 16:39:39 2024

Local Port: Eth1/44 MAC Address= b0-8b-cf-83-49-5b
System Identifier=0x8000, Port Identifier=0x8000,0x1ad
Operational key=100
LACP_Activity=active
LACP_Timeout=Short Timeout (1s)
Synchronization=IN_SYNC
Collecting=true
Distributing=true
Partner information refresh timeout=Short Timeout (3s)
Actor Admin State=63
Actor Oper State=63
Neighbor: 0x18
MAC Address= c4-9-b7-64-30-38
System Identifier=0x7f, Port Identifier=0x7f,0x18
Operational key=50
LACP_Activity=active
LACP_Timeout=short Timeout (1s)
Synchronization=IN_SYNC
Collecting=true
Distributing=true
Partner Admin State=63
Partner Oper State=63
Aggregate or Individual(True=1)= 1
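
For reference, the "Actor Oper State=63" and "Partner Oper State=63" values above can be decoded bit by bit. This is only a sketch: it assumes NX-OS prints the state octet in decimal, and it uses the bit order of the LACP state octet from IEEE 802.1AX (Activity in bit 0 up to Expired in bit 7). The name decode_lacp_state is illustrative and not part of any vendor tool.

```python
# Bit positions of the LACP actor/partner state octet (IEEE 802.1AX),
# least significant bit first.
LACP_STATE_BITS = (
    "Activity",         # 1 = active, 0 = passive
    "Timeout",          # 1 = short (fast), 0 = long (slow)
    "Aggregation",
    "Synchronization",
    "Collecting",
    "Distributing",
    "Defaulted",
    "Expired",
)


def decode_lacp_state(value: int) -> dict[str, bool]:
    """Return a flag-name -> bool mapping for one LACP state octet."""
    return {
        name: bool((value >> bit) & 1)
        for bit, name in enumerate(LACP_STATE_BITS)
    }


# Assumption: the 63 shown by "sh lacp interface" is a decimal value.
state = decode_lacp_state(63)
for name, is_set in state.items():
    print(f"{name:16} {is_set}")
```

With 63 (binary 00111111) the six low bits are set and Defaulted/Expired are clear, which agrees with the active, short timeout, IN_SYNC, collecting and distributing fields printed in the same output.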

JUNIPER

MX1> show lacp interfaces ae49 extensive
Aggregated interface: ae49
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      et-0/1/5       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      et-0/1/5     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
    LACP protocol:   Receive State   Transmit State   Mux State
      et-0/1/5       Current         Fast periodic    Collecting distributing
    LACP info:        Role     System   System               Port       Port     Port
                               priority identifier           priority   number   key
      et-0/1/5       Actor     127      c4:09:b7:64:30:38    127        24       50
      et-0/1/5     Partner     32768    b0:8b:cf:83:49:5b    32768      429      100

On Sun, 11 Feb 2024 at 13:07, Gert Doering <gert@greenie.muc.de> wrote:

> HI,
>
> On Sun, Feb 11, 2024 at 12:50:32PM +0100, james list wrote:
> > 2024 Feb 9 16:39:36 NEXUS1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN:
> > Interface port-channel101 is down (No operational members)
>
> So there is no *BGP* problem here, but a lower layer issue.
>
> Let me repeat that part about "error counters on the interface"...
>
> gert
> --
> "If was one thing all people took for granted, was conviction that if you
> feed honest figures into a computer, honest figures come out. Never
> doubted
> it myself till I met a computer with a sense of humor."
> Robert A. Heinlein, The Moon is a Harsh
> Mistress
>
> Gert Doering - Munich, Germany
> gert@greenie.muc.de
>
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
Hi

1) The cable has been replaced with a brand new one; they said that checking a
100 Gb/s MPO cable is not that easy.

3) No errors are reported on either side.

2) Here is the output from Cisco and Juniper:

NEXUS1# sh interface eth1/44 transceiver details
Ethernet1/44
transceiver is present
type is QSFP-100G-SR4
name is CISCO-INNOLIGHT
part number is TR-FC85S-NC3
revision is 2C
serial number is INL27050TVT
nominal bitrate is 25500 MBit/sec
Link length supported for 50/125um OM3 fiber is 70 m
cisco id is 17
cisco extended id number is 220
cisco part number is 10-3142-03
cisco product id is QSFP-100G-SR4-S
cisco version id is V03

Lane Number:1 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
              Current            Alarms                   Warnings
              Measurement    High        Low          High        Low
----------------------------------------------------------------------------
Temperature   30.51 C        75.00 C     -5.00 C      70.00 C     0.00 C
Voltage       3.28 V         3.63 V      2.97 V       3.46 V      3.13 V
Current       6.40 mA        12.45 mA    3.25 mA      12.45 mA    3.25 mA
Tx Power      0.98 dBm       5.39 dBm    -12.44 dBm   2.39 dBm    -8.41 dBm
Rx Power      -1.60 dBm      5.39 dBm    -14.31 dBm   2.39 dBm    -10.31 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning

Lane Number:2 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
              Current            Alarms                   Warnings
              Measurement    High        Low          High        Low
----------------------------------------------------------------------------
Temperature   30.51 C        75.00 C     -5.00 C      70.00 C     0.00 C
Voltage       3.28 V         3.63 V      2.97 V       3.46 V      3.13 V
Current       6.40 mA        12.45 mA    3.25 mA      12.45 mA    3.25 mA
Tx Power      0.62 dBm       5.39 dBm    -12.44 dBm   2.39 dBm    -8.41 dBm
Rx Power      -1.18 dBm      5.39 dBm    -14.31 dBm   2.39 dBm    -10.31 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning

Lane Number:3 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
              Current            Alarms                   Warnings
              Measurement    High        Low          High        Low
----------------------------------------------------------------------------
Temperature   30.51 C        75.00 C     -5.00 C      70.00 C     0.00 C
Voltage       3.28 V         3.63 V      2.97 V       3.46 V      3.13 V
Current       6.40 mA        12.45 mA    3.25 mA      12.45 mA    3.25 mA
Tx Power      0.87 dBm       5.39 dBm    -12.44 dBm   2.39 dBm    -8.41 dBm
Rx Power      0.01 dBm       5.39 dBm    -14.31 dBm   2.39 dBm    -10.31 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning

Lane Number:4 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
              Current            Alarms                   Warnings
              Measurement    High        Low          High        Low
----------------------------------------------------------------------------
Temperature   30.51 C        75.00 C     -5.00 C      70.00 C     0.00 C
Voltage       3.28 V         3.63 V      2.97 V       3.46 V      3.13 V
Current       6.40 mA        12.45 mA    3.25 mA      12.45 mA    3.25 mA
Tx Power      0.67 dBm       5.39 dBm    -12.44 dBm   2.39 dBm    -8.41 dBm
Rx Power      0.11 dBm       5.39 dBm    -14.31 dBm   2.39 dBm    -10.31 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning



MX1> show interfaces diagnostics optics et-1/0/5
Physical interface: et-1/0/5
Module temperature : 38 degrees C / 100 degrees F
Module voltage : 3.2740 V
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Module temperature high alarm threshold : 78 degrees C / 172 degrees F
Module temperature low alarm threshold : -5 degrees C / 23 degrees F
Module temperature high warning threshold : 75 degrees C / 167 degrees F
Module temperature low warning threshold : 0 degrees C / 32 degrees F
Module voltage high alarm threshold : 3.6300 V
Module voltage low alarm threshold : 2.9700 V
Module voltage high warning threshold : 3.4640 V
Module voltage low warning threshold : 3.1340 V
Laser bias current high alarm threshold : 104.999 mA
Laser bias current low alarm threshold : 7.999 mA
Laser bias current high warning threshold : 104.999 mA
Laser bias current low warning threshold : 9.999 mA
Laser output power high alarm threshold : 5.3703 mW / 7.30 dBm
Laser output power low alarm threshold : 0.0794 mW / -11.00 dBm
Laser output power high warning threshold : 3.1623 mW / 5.00 dBm
Laser output power low warning threshold : 0.1995 mW / -7.00 dBm
Laser rx power high alarm threshold : 4.4668 mW / 6.50 dBm
Laser rx power low alarm threshold : 0.0251 mW / -16.00 dBm
Laser rx power high warning threshold : 3.5481 mW / 5.50 dBm
Laser rx power low warning threshold : 0.0630 mW / -12.01 dBm
Lane 0
Laser bias current : 41.588 mA
Laser output power : 1.702 mW / 2.31 dBm
Laser receiver power : 1.102 mW / 0.42 dBm
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser receiver power high alarm : Off
Laser receiver power low alarm : Off
Laser receiver power high warning : Off
Laser receiver power low warning : Off
Tx loss of signal functionality alarm : Off
Rx loss of signal alarm : Off
Tx laser disabled alarm : Off
Lane 1
Laser bias current : 42.324 mA
Laser output power : 1.376 mW / 1.39 dBm
Laser receiver power : 2.001 mW / 3.01 dBm
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser receiver power high alarm : Off
Laser receiver power low alarm : Off
Laser receiver power high warning : Off
Laser receiver power low warning : Off
Tx loss of signal functionality alarm : Off
Rx loss of signal alarm : Off
Tx laser disabled alarm : Off
Lane 2
Laser bias current : 41.066 mA
Laser output power : 1.659 mW / 2.20 dBm
Laser receiver power : 1.328 mW / 1.23 dBm
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser receiver power high alarm : Off
Laser receiver power low alarm : Off
Laser receiver power high warning : Off
Laser receiver power low warning : Off
Tx loss of signal functionality alarm : Off
Rx loss of signal alarm : Off
Tx laser disabled alarm : Off
Lane 3
Laser bias current : 37.970 mA
Laser output power : 1.304 mW / 1.15 dBm
Laser receiver power : 1.370 mW / 1.37 dBm
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser receiver power high alarm : Off
Laser receiver power low alarm : Off
Laser receiver power high warning : Off
Laser receiver power low warning : Off
Tx loss of signal functionality alarm : Off
Rx loss of signal alarm : Off
Tx laser disabled alarm : Off
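
One way to read the two outputs above against the "optical levels relative to threshold values" suggestion is to compute, per lane, how far the Rx power sits from the warning thresholds. The sketch below copies the Rx readings and warning thresholds from the outputs above; the helper names rx_margins and all_within are illustrative only and not part of any vendor tooling.

```python
# Rx power per lane and warning thresholds, in dBm, copied from the
# transceiver outputs above.
CISCO_RX = {1: -1.60, 2: -1.18, 3: 0.01, 4: 0.11}
CISCO_WARN = (-10.31, 2.39)      # (low warning, high warning)
JUNIPER_RX = {0: 0.42, 1: 3.01, 2: 1.23, 3: 1.37}
JUNIPER_WARN = (-12.01, 5.50)    # (low warning, high warning)


def rx_margins(rx_dbm, warn):
    """Return {lane: (dB above low warning, dB below high warning)}."""
    low, high = warn
    return {
        lane: (round(power - low, 2), round(high - power, 2))
        for lane, power in rx_dbm.items()
    }


def all_within(margins):
    """True when every lane sits strictly between both warning thresholds."""
    return all(above > 0 and below > 0 for above, below in margins.values())


for name, rx, warn in (
    ("NEXUS1 Eth1/44", CISCO_RX, CISCO_WARN),
    ("MX1 et-1/0/5", JUNIPER_RX, JUNIPER_WARN),
):
    margins = rx_margins(rx, warn)
    print(name, "within warning thresholds:", all_within(margins))
    for lane, (above_low, below_high) in margins.items():
        print(f"  lane {lane}: {above_low:+.2f} dB above low, "
              f"{below_high:+.2f} dB below high")
```

With the readings above, every lane on both ends falls inside its warning window, which is consistent with no alarm or warning flags being raised in either output.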

On Sun, 11 Feb 2024 at 13:12, Havard Eidnes <he@uninett.no> wrote:

> > DC technicians states cable are the same in both DCs and
> > direct, no patch panel
>
> Things I would look at:
>
> * Has all the connectors been verified clean via microscope?
>
> * Optical levels relative to threshold values (may relate to the
> first).
>
> * Any end seeing any input errors? (May relate to the above
> two.) On the Juniper you can see some of this via PCS
> ("Physical Coding Sublayer") unexpected events independently
> of whether you have payload traffic, not sure you can do the
> same on the Nexus boxes.
>
> Regards,
>
> - Håvard
>
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
Hey James,

You shared this off-list; I think it's sufficiently material to share on-list.

2024 Feb 9 16:39:36 NEXUS1
%ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface
port-channel101 is down (No operational members)
2024 Feb 9 16:39:36 NEXUS1 %ETH_PORT_CHANNEL-5-PORT_DOWN:
port-channel101: Ethernet1/44 is down
Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5:
lacp current while timer expired current Receive State: CURRENT
Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACP_INTF_DOWN: ae49:
Interface marked down due to lacp timeout on member et-0/1/5

We can't know the exact order of events here, because subsecond precision
is not enabled on the Cisco end.

But if the failure had started with the interface going down, it would have
taken 3 seconds for the Juniper to detect the LACP failure. We can see that
it happened in less than 1 s, so the interface did not go down first: the
first problem was the Juniper not receiving 3 consecutive LACP PDUs, sent
1 s apart, before any interface-state problem was noticed.
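
The timing argument can be checked with a few lines of Python. This is only a sketch: it assumes the two clocks are synchronised (for example via NTP) to well under 3 s and that the Cisco logs the port-down without significant delay. The timestamps are copied from the logs above; the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Short (fast) LACP timeout: 3 missed LACPDUs sent 1 s apart.
LACP_SHORT_TIMEOUT = timedelta(seconds=3)

# Timestamps copied from the logs above.
juniper_lacp_timeout = datetime.strptime(
    "Feb 9 16:39:35.813 2024", "%b %d %H:%M:%S.%f %Y"
)
cisco_port_down = datetime.strptime("2024 Feb 9 16:39:36", "%Y %b %d %H:%M:%S")

# If a link-down had caused the LACP timeout, the link must have gone
# down no later than this instant.
latest_link_down = juniper_lacp_timeout - LACP_SHORT_TIMEOUT

# The Cisco log has 1 s resolution, so the real event lies somewhere in
# [cisco_port_down, cisco_port_down + 1 s). Even the earliest point of
# that window has to be at or before latest_link_down for the
# "link went down first" hypothesis to hold.
link_down_first_possible = cisco_port_down <= latest_link_down

print("latest possible link-down:", latest_link_down.time())
print("Cisco PORT_DOWN logged at:", cisco_port_down.time())
print("link-down-first hypothesis possible:", link_down_first_possible)
```

Under those assumptions the Cisco port-down is logged more than 3 s after the latest instant at which a link-down could have triggered the Juniper timeout, so the missed LACP PDUs came first.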

Is this always the order of events? Does it always happen with Juniper
noticing problems receiving LACP PDU first?


On Sun, 11 Feb 2024 at 14:55, james list via juniper-nsp
<juniper-nsp@puck.nether.net> wrote:
>
> Hi
>
> 1) cable has been replaced with a brand new one, they said that to check an
> MPO 100 Gbs cable is not that easy
>
> 3) no errors reported on both side
>
> 2) here the output of cisco and juniper
>
> NEXUS1# sh interface eth1/44 transceiver details
> Ethernet1/44
> transceiver is present
> type is QSFP-100G-SR4
> name is CISCO-INNOLIGHT
> part number is TR-FC85S-NC3
> revision is 2C
> serial number is INL27050TVT
> nominal bitrate is 25500 MBit/sec
> Link length supported for 50/125um OM3 fiber is 70 m
> cisco id is 17
> cisco extended id number is 220
> cisco part number is 10-3142-03
> cisco product id is QSFP-100G-SR4-S
> cisco version id is V03
>
> Lane Number:1 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>              Current        Alarms                    Warnings
>              Measurement    High         Low          High         Low
> ----------------------------------------------------------------------------
> Temperature  30.51 C        75.00 C      -5.00 C      70.00 C      0.00 C
> Voltage      3.28 V         3.63 V       2.97 V       3.46 V       3.13 V
> Current      6.40 mA        12.45 mA     3.25 mA      12.45 mA     3.25 mA
> Tx Power     0.98 dBm       5.39 dBm     -12.44 dBm   2.39 dBm     -8.41 dBm
> Rx Power     -1.60 dBm      5.39 dBm     -14.31 dBm   2.39 dBm     -10.31 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:2 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>              Current        Alarms                    Warnings
>              Measurement    High         Low          High         Low
> ----------------------------------------------------------------------------
> Temperature  30.51 C        75.00 C      -5.00 C      70.00 C      0.00 C
> Voltage      3.28 V         3.63 V       2.97 V       3.46 V       3.13 V
> Current      6.40 mA        12.45 mA     3.25 mA      12.45 mA     3.25 mA
> Tx Power     0.62 dBm       5.39 dBm     -12.44 dBm   2.39 dBm     -8.41 dBm
> Rx Power     -1.18 dBm      5.39 dBm     -14.31 dBm   2.39 dBm     -10.31 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:3 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>              Current        Alarms                    Warnings
>              Measurement    High         Low          High         Low
> ----------------------------------------------------------------------------
> Temperature  30.51 C        75.00 C      -5.00 C      70.00 C      0.00 C
> Voltage      3.28 V         3.63 V       2.97 V       3.46 V       3.13 V
> Current      6.40 mA        12.45 mA     3.25 mA      12.45 mA     3.25 mA
> Tx Power     0.87 dBm       5.39 dBm     -12.44 dBm   2.39 dBm     -8.41 dBm
> Rx Power     0.01 dBm       5.39 dBm     -14.31 dBm   2.39 dBm     -10.31 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
> Lane Number:4 Network Lane
> SFP Detail Diagnostics Information (internal calibration)
> ----------------------------------------------------------------------------
>              Current        Alarms                    Warnings
>              Measurement    High         Low          High         Low
> ----------------------------------------------------------------------------
> Temperature  30.51 C        75.00 C      -5.00 C      70.00 C      0.00 C
> Voltage      3.28 V         3.63 V       2.97 V       3.46 V       3.13 V
> Current      6.40 mA        12.45 mA     3.25 mA      12.45 mA     3.25 mA
> Tx Power     0.67 dBm       5.39 dBm     -12.44 dBm   2.39 dBm     -8.41 dBm
> Rx Power     0.11 dBm       5.39 dBm     -14.31 dBm   2.39 dBm     -10.31 dBm
> Transmit Fault Count = 0
> ----------------------------------------------------------------------------
> Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
>
>
>
> MX1> show interfaces diagnostics optics et-1/0/5
> Physical interface: et-1/0/5
> Module temperature : 38 degrees C / 100 degrees F
> Module voltage : 3.2740 V
> Module temperature high alarm : Off
> Module temperature low alarm : Off
> Module temperature high warning : Off
> Module temperature low warning : Off
> Module voltage high alarm : Off
> Module voltage low alarm : Off
> Module voltage high warning : Off
> Module voltage low warning : Off
> Module temperature high alarm threshold : 78 degrees C / 172 degrees F
> Module temperature low alarm threshold : -5 degrees C / 23 degrees F
> Module temperature high warning threshold : 75 degrees C / 167 degrees F
> Module temperature low warning threshold : 0 degrees C / 32 degrees F
> Module voltage high alarm threshold : 3.6300 V
> Module voltage low alarm threshold : 2.9700 V
> Module voltage high warning threshold : 3.4640 V
> Module voltage low warning threshold : 3.1340 V
> Laser bias current high alarm threshold : 104.999 mA
> Laser bias current low alarm threshold : 7.999 mA
> Laser bias current high warning threshold : 104.999 mA
> Laser bias current low warning threshold : 9.999 mA
> Laser output power high alarm threshold : 5.3703 mW / 7.30 dBm
> Laser output power low alarm threshold : 0.0794 mW / -11.00 dBm
> Laser output power high warning threshold : 3.1623 mW / 5.00 dBm
> Laser output power low warning threshold : 0.1995 mW / -7.00 dBm
> Laser rx power high alarm threshold : 4.4668 mW / 6.50 dBm
> Laser rx power low alarm threshold : 0.0251 mW / -16.00 dBm
> Laser rx power high warning threshold : 3.5481 mW / 5.50 dBm
> Laser rx power low warning threshold : 0.0630 mW / -12.01 dBm
> Lane 0
> Laser bias current : 41.588 mA
> Laser output power : 1.702 mW / 2.31 dBm
> Laser receiver power : 1.102 mW / 0.42 dBm
> Laser bias current high alarm : Off
> Laser bias current low alarm : Off
> Laser bias current high warning : Off
> Laser bias current low warning : Off
> Laser receiver power high alarm : Off
> Laser receiver power low alarm : Off
> Laser receiver power high warning : Off
> Laser receiver power low warning : Off
> Tx loss of signal functionality alarm : Off
> Rx loss of signal alarm : Off
> Tx laser disabled alarm : Off
> Lane 1
> Laser bias current : 42.324 mA
> Laser output power : 1.376 mW / 1.39 dBm
> Laser receiver power : 2.001 mW / 3.01 dBm
> Laser bias current high alarm : Off
> Laser bias current low alarm : Off
> Laser bias current high warning : Off
> Laser bias current low warning : Off
> Laser receiver power high alarm : Off
> Laser receiver power low alarm : Off
> Laser receiver power high warning : Off
> Laser receiver power low warning : Off
> Tx loss of signal functionality alarm : Off
> Rx loss of signal alarm : Off
> Tx laser disabled alarm : Off
> Lane 2
> Laser bias current : 41.066 mA
> Laser output power : 1.659 mW / 2.20 dBm
> Laser receiver power : 1.328 mW / 1.23 dBm
> Laser bias current high alarm : Off
> Laser bias current low alarm : Off
> Laser bias current high warning : Off
> Laser bias current low warning : Off
> Laser receiver power high alarm : Off
> Laser receiver power low alarm : Off
> Laser receiver power high warning : Off
> Laser receiver power low warning : Off
> Tx loss of signal functionality alarm : Off
> Rx loss of signal alarm : Off
> Tx laser disabled alarm : Off
> Lane 3
> Laser bias current : 37.970 mA
> Laser output power : 1.304 mW / 1.15 dBm
> Laser receiver power : 1.370 mW / 1.37 dBm
> Laser bias current high alarm : Off
> Laser bias current low alarm : Off
> Laser bias current high warning : Off
> Laser bias current low warning : Off
> Laser receiver power high alarm : Off
> Laser receiver power low alarm : Off
> Laser receiver power high warning : Off
> Laser receiver power low warning : Off
> Tx loss of signal functionality alarm : Off
> Rx loss of signal alarm : Off
> Tx laser disabled alarm : Off
>
> On Sun, 11 Feb 2024 at 13:12, Havard Eidnes <he@uninett.no> wrote:
>
> > > DC technicians states cable are the same in both DCs and
> > > direct, no patch panel
> >
> > Things I would look at:
> >
> > * Has all the connectors been verified clean via microscope?
> >
> > * Optical levels relative to threshold values (may relate to the
> > first).
> >
> > * Any end seeing any input errors? (May relate to the above
> > two.) On the Juniper you can see some of this via PCS
> > ("Physical Coding Sublayer") unexpected events independently
> > of whether you have payload traffic, not sure you can do the
> > same on the Nexus boxes.
> >
> > Regards,
> >
> > - Håvard
> >
> _______________________________________________
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



--
++ytti
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
On Cisco I see the physical interface go down (Initializing); what does that mean?

While on Juniper when the issue happens I always see:

show log messages | last 440 | match LACPD_TIMEOUT
Jan 25 21:32:27.948 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Jan 26 18:41:12.514 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Jan 28 05:07:20.283 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Jan 29 04:06:51.768 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Jan 30 03:09:43.923 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Feb 5 18:13:20.158 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Feb 6 02:17:23.703 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Feb 6 22:00:23.758 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Feb 9 09:29:35.728 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT

Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
On Sun, 11 Feb 2024 at 15:24, james list <jameslist72@gmail.com> wrote:

> While on Juniper when the issue happens I always see:
>
> show log messages | last 440 | match LACPD_TIMEOUT
> Jan 25 21:32:27.948 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT
....
> Feb 9 16:39:35.813 2024 MX1 lacpd[31632]: LACPD_TIMEOUT: et-0/1/5: lacp current while timer expired current Receive State: CURRENT

OK, so the problem always starts with Juniper seeing 3 seconds without
an LACP PDU, i.e. missing 3 consecutive LACP PDUs. It would be good to
ping while this problem is happening, to see whether ping stops 3s
before the syslog lines or at the same time as them.
If ping stops 3s before, it's a link problem from Cisco to Juniper.
If ping stops at syslog time (my guess), it's a software problem.
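
The rule of thumb above can be written down as a small check. This is a
sketch only: it assumes both clocks are synchronised, that LACP runs with
fast timers (3s timeout), and a hypothetical 1s tolerance for ping
interval and clock skew. The ping timestamps in the example are invented.

```python
from datetime import datetime

LACP_FAST_TIMEOUT_S = 3.0  # three missed LACPDUs at 1 s intervals
TOLERANCE_S = 1.0          # hypothetical slack for ping interval and clock skew


def classify(last_ping_reply: datetime, lacp_timeout_log: datetime) -> str:
    """Apply the rule of thumb to one flap event.

    last_ping_reply  -- time of the last echo reply seen before the outage
    lacp_timeout_log -- time of the Juniper LACPD_TIMEOUT syslog line
    """
    gap = (lacp_timeout_log - last_ping_reply).total_seconds()
    if gap >= LACP_FAST_TIMEOUT_S - TOLERANCE_S:
        return "link problem (data plane died before the LACP timeout)"
    return "software problem (data plane was fine until the LACP timeout)"


# Real syslog timestamp from the thread; the ping timestamps are made up.
timeout_log = datetime(2024, 2, 9, 16, 39, 35, 813000)
print(classify(datetime(2024, 2, 9, 16, 39, 32, 700000), timeout_log))
print(classify(datetime(2024, 2, 9, 16, 39, 35, 600000), timeout_log))
```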

There is unfortunately a lot of bug surface here, on both the inject
and the punt path. You could be hitting PR1541056 on the Juniper end.
You could test for this by removing distributed LACP handling with
'set routing-options ppm no-delegate-processing'.
You could also do a packet capture for LACP on both ends, to see
whether an LACP PDU was sent by Cisco and seen by the capture, but
not received by the system.
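
When looking through such a capture, LACPDUs can be picked out by their
fixed header values. The sketch below assumes untagged frames, which is
how LACPDUs are normally sent on a member link. The destination MAC,
EtherType and subtype are the standard Slow Protocols values; the source
MAC and the helper name are invented for the example.

```python
import struct

SLOW_PROTOCOLS_MAC = bytes.fromhex("0180c2000002")  # 01:80:c2:00:00:02
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01

# Equivalent pcap-style capture filter for tools that accept one.
CAPTURE_FILTER = "ether proto 0x8809"


def is_lacpdu(frame: bytes) -> bool:
    """Return True if an untagged Ethernet frame looks like an LACPDU."""
    if len(frame) < 15:
        return False
    dst = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    subtype = frame[14]
    return (
        dst == SLOW_PROTOCOLS_MAC
        and ethertype == SLOW_PROTOCOLS_ETHERTYPE
        and subtype == LACP_SUBTYPE
    )


# Minimal fake frames: header, subtype and version only, no LACP TLVs.
src = bytes.fromhex("020000000001")  # made-up source MAC
sample = SLOW_PROTOCOLS_MAC + src + struct.pack("!H", 0x8809) + bytes([0x01, 0x01])
arp = bytes.fromhex("ffffffffffff") + src + struct.pack("!H", 0x0806) + bytes(2)
print(is_lacpdu(sample), is_lacpdu(arp))
```

Counting matching frames per second on each side of the link, around the
time of a flap, would show whether the PDUs were missing on the wire or
only inside the receiving system.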


--
++ytti
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
Hi
I have a couple of questions about your idea:
- why does the physical interface flap in DC1 if the problem is related to LACP?
- why does the same setup in DC2 not report issues?

NEXUS01# sh logging | in Initia | last 15
2024 Jan 17 22:37:49 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 18 23:54:25 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 19 00:58:13 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 19 07:15:04 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 22 16:03:13 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 25 21:32:29 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 26 18:41:12 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 28 05:07:20 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 29 04:06:52 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Jan 30 03:09:44 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 5 18:13:20 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 6 02:17:25 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 6 22:00:24 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 9 09:29:36 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
2024 Feb 9 16:39:36 NEXUS01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/44 is down (Initializing)
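
The last ten Cisco entries above can be lined up with the ten Juniper
LACPD_TIMEOUT entries posted earlier in the thread. The sketch below is
illustrative only. It assumes both clocks are NTP-synchronised and that
NX-OS truncates (rather than rounds) the seconds field, so each Cisco
event lies somewhere within the logged second. The helper names are made
up.

```python
from datetime import datetime

YEAR = 2024  # the year is supplied here and left out of the strings below

# Juniper LACPD_TIMEOUT timestamps (millisecond precision), from the thread.
JUNIPER = [
    "Jan 25 21:32:27.948", "Jan 26 18:41:12.514", "Jan 28 05:07:20.283",
    "Jan 29 04:06:51.768", "Jan 30 03:09:43.923", "Feb 5 18:13:20.158",
    "Feb 6 02:17:23.703", "Feb 6 22:00:23.758", "Feb 9 09:29:35.728",
    "Feb 9 16:39:35.813",
]
# Cisco IF_DOWN_INITIALIZING timestamps (whole seconds) for the same events.
CISCO = [
    "Jan 25 21:32:29", "Jan 26 18:41:12", "Jan 28 05:07:20",
    "Jan 29 04:06:52", "Jan 30 03:09:44", "Feb 5 18:13:20",
    "Feb 6 02:17:25", "Feb 6 22:00:24", "Feb 9 09:29:36",
    "Feb 9 16:39:36",
]


def parse(stamp: str, fmt: str) -> datetime:
    """Parse a syslog timestamp that lacks a year."""
    return datetime.strptime(f"{YEAR} {stamp}", f"%Y {fmt}")


def cisco_lag_bounds(juniper: str, cisco: str) -> tuple:
    """Earliest and latest possible delay of the Cisco log after Juniper's."""
    j = parse(juniper, "%b %d %H:%M:%S.%f")
    c = parse(cisco, "%b %d %H:%M:%S")
    low = (c - j).total_seconds()
    return low, low + 1.0  # Cisco event is somewhere inside its logged second


bounds = [cisco_lag_bounds(j, c) for j, c in zip(JUNIPER, CISCO)]
for (low, high), stamp in zip(bounds, JUNIPER):
    print(f"{stamp}: Cisco logged between {low:+.3f}s and {high:+.3f}s after Juniper")
```

Under those assumptions, in all ten events the Cisco message falls within
about 2.3s of the Juniper timeout and never a full 3s before it, which is
what you would expect if the interface-down event follows the LACP
timeout rather than causing it.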

Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
On Sun, 11 Feb 2024 at 17:52, james list <jameslist72@gmail.com> wrote:

> - why does the physical interface flap in DC1 if the problem is related to LACP?

16:39:35.813 Juniper reports LACP timeout (so the problem started at
about 16:39:32; was traffic passing at seconds 32, 33 and 34?)
16:39:36.xxx Cisco reports interface down, long after the problem has
already started

Why Cisco reports the physical interface down, I'm not sure. But clearly
the problem was already happening before the interface went down, and
the first log entry is the LACP timeout, which occurs 3s after the
problem starts. Perhaps Juniper asserts RFI for some reason? Perhaps
Cisco resets the physical interface once it is removed from LACP?

> - why does the same setup in DC2 not report issues?

If this is an LACP-related software issue, there could be a difference
between the two setups that has not been identified. You need to gather
more information, such as how ping behaves throughout this event,
particularly before the syslog entries. If ping still works up until
the syslog entries, you almost certainly have a software issue with
LACP inject at Cisco or, more likely, LACP punt at Juniper.

--
++ytti
Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
Hi
I'd like to test with LACP slow, and then see if the physical interface
still flaps...

Thanks for your support

Re: [c-nsp] Stange issue on 100 Gbs interconnection Juniper - Cisco
On Mon, 12 Feb 2024 at 09:44, james list <jameslist72@gmail.com> wrote:

> I'd like to test with LACP slow, and then see if the physical interface still flaps...

I don't think that's a good idea; what would we learn? Would we have
to wait 30 times longer, so one to three months, to hit whatever it is
before we have confidence?

I would suggest:
- turn on debugging, to see Cisco emitting LACP PDUs and Juniper
receiving them
- do a packet capture, if at all reasonable; ideally with a tap, or
with a mirror port in the absence of a tap
- turn off distributed LACP handling on Junos
- ping across the link, ideally at a 0.2-0.5s interval, to record when
ping stops in relation to the first syslog message about LACP going down
- wait for 4 days
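
For the ping step, one way to get timestamps that can be compared with
syslog is a small probe loop like the sketch below. It is only an
illustration: the `ping` flags assume Linux iputils, the helper names are
invented, and the target address in the usage comment is a placeholder.
The probe is passed in as a function so the loop can be tried without
touching the network.

```python
import subprocess
import time
from datetime import datetime
from typing import Callable, List, Optional, Tuple

Sample = Tuple[datetime, bool]


def ping_once(host: str) -> bool:
    """Send one echo request; -c (count) and -W (timeout) as in Linux iputils."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def probe_loop(
    probe: Callable[[], bool],
    count: int,
    interval_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Sample]:
    """Run `probe` `count` times, recording a wall-clock timestamp for each."""
    samples: List[Sample] = []
    for _ in range(count):
        samples.append((datetime.now(), probe()))
        sleep(interval_s)
    return samples


def first_loss(samples: List[Sample]) -> Optional[datetime]:
    """Timestamp of the first failed probe, or None if every probe succeeded."""
    for stamp, ok in samples:
        if not ok:
            return stamp
    return None


# Usage (placeholder address, not run here):
#   samples = probe_loop(lambda: ping_once("192.0.2.1"), count=1000)
#   print(first_loss(samples))
```

Note that a lost probe waits for the 1s timeout, so the spacing stretches
during an outage. The time of the first loss is the value to compare
against the first LACP syslog line.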


--
++ytti