Mailing List Archive

EOBC0/0 ifInErrors
Hi!

While walking the SNMP ifTable on a Cisco 7606/RSP720-3C-GE, I found that the virtual
interface EOBC0/0 (Ethernet out-of-band channel) has an increasing IF-MIB::ifInErrors counter.
There are no visible problems with the box, though.

Should I worry about this ifInErrors growth?

_______________________________________________
cisco-nsp mailing list cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
Re: EOBC0/0 ifInErrors [ In reply to ]
Eugene Grosbein wrote on 29/09/2020 10:08:
> Walking SNMP ifTable for Cisco 7606/RSP720-3C-GE I've found that virtual
> interface EOBC0/0 (Ethernet out-of-band channel) has increasing counter IF-MIB::ifInErrors.
> No visible problems with the box, though.
>
> Should I worry about this ifInErrors growth?

possibly. If the errors are significant, you should try reseating the
rsp720 and possibly some of the cards to see if that helps.

Nick
Re: EOBC0/0 ifInErrors [ In reply to ]
29.09.2020 18:56, Nick Hilliard wrote:

> Eugene Grosbein wrote on 29/09/2020 10:08:
>> Walking SNMP ifTable for Cisco 7606/RSP720-3C-GE I've found that virtual
>> interface EOBC0/0 (Ethernet out-of-band channel) has increasing counter IF-MIB::ifInErrors.
>> No visible problems with the box, though.
>>
>> Should I worry about this ifInErrors growth?
>
> possibly. If the errors are significant, you should try reseating the rsp720 and possibly some of the cards to see if that helps.

Define "significant" :-) This router has uptime over 1 year.

Re: EOBC0/0 ifInErrors [ In reply to ]
sharp increase recently (weekly, daily, hourly). I'd be more concerned
with daily and hourly (or less).
How frequently do you see them and what is the amount?

On Tue, Sep 29, 2020 at 9:41 AM Eugene Grosbein <eugen@grosbein.net> wrote:

> 29.09.2020 18:56, Nick Hilliard wrote:
>
> > Eugene Grosbein wrote on 29/09/2020 10:08:
> >> Walking SNMP ifTable for Cisco 7606/RSP720-3C-GE I've found that virtual
> >> interface EOBC0/0 (Ethernet out-of-band channel) has increasing counter
> IF-MIB::ifInErrors.
> >> No visible problems with the box, though.
> >>
> >> Should I worry about this ifInErrors growth?
> >
> > possibly. If the errors are significant, you should try reseating the
> rsp720 and possibly some of the cards to see if that helps.
>
> Define "significant" :-) This router has uptime over 1 year.
>
Re: EOBC0/0 ifInErrors [ In reply to ]
29.09.2020 20:44, Aaron wrote:

> sharp increase recently (weekly, daily, hourly). I'd be more concerned with daily and hourly (or less).
> How frequently do you see them and what is the amount?

Yesterday I created an MRTG graph for the counter, and it shows a steady rate in the range of 16-32 errors per second.
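
For readers who want to reproduce a per-second figure like this without MRTG, here is a minimal sketch of the arithmetic. It assumes two polls of IF-MIB::ifInErrors (a Counter32) taken a known number of seconds apart. The sample values are hypothetical, and the function name `counter_rate` is made up for illustration.

```python
COUNTER32_MODULUS = 2 ** 32  # IF-MIB::ifInErrors is a 32-bit counter


def counter_rate(prev_value, curr_value, interval_seconds):
    """Return the per-second rate between two Counter32 samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Modular subtraction copes with a single wrap of the 32-bit counter.
    delta = (curr_value - prev_value) % COUNTER32_MODULUS
    return delta / interval_seconds


# Hypothetical samples taken 300 seconds apart (a common MRTG polling interval).
print(counter_rate(1_000_000, 1_007_200, 300))  # 24.0
```

The modular subtraction only handles one wrap between polls, which is enough at rates like the one reported here.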


Re: EOBC0/0 ifInErrors [ In reply to ]
Eugene Grosbein wrote on 30/09/2020 05:14:
> Yesterday I've created mrtg graph for the counter and it shows steady rate in a range of 16-32 per second.

I'd say that is sup + line card reseating territory.

Nick
Re: EOBC0/0 ifInErrors [ In reply to ]
30.09.2020 19:03, Nick Hilliard wrote:

> Eugene Grosbein wrote on 30/09/2020 05:14:
>> Yesterday I've created mrtg graph for the counter and it shows steady rate in a range of 16-32 per second.
>
> I'd say that is sup + line card reseating territory.

What does it mean?


Re: EOBC0/0 ifInErrors [ In reply to ]
You may need to ask TAC. Unfortunately I do not know.
He is suggesting reseating all cards. Starting with the Supervisor.


On Wed, Sep 30, 2020 at 10:14 AM Eugene Grosbein <eugen@grosbein.net> wrote:

> 30.09.2020 19:03, Nick Hilliard wrote:
>
> > Eugene Grosbein wrote on 30/09/2020 05:14:
> >> Yesterday I've created mrtg graph for the counter and it shows steady
> rate in a range of 16-32 per second.
> >
> > I'd say that is sup + line card reseating territory.
>
> What does it mean?
>
>
Re: EOBC0/0 ifInErrors [ In reply to ]
Aaron wrote on 30/09/2020 20:11:
> He is suggesting reseating all cards. Starting with the Supervisor.

correct. power down the box, carefully reseat all cards, power up, see
if that fixes it.

If it doesn't fix it, then open a TAC case. If the unit isn't under
support, then you have a problem because this type of error could be one
of the cards, or the sup, or the backplane and it's really hard to tell
which without swapping units out. If you can check out the EOBC on the
line cards using remote login, that might give useful information, maybe.

Nick
Re: EOBC0/0 ifInErrors [ In reply to ]
You could deduce whether it is a line card by adding one line card at a time.

On Wednesday, September 30, 2020, Nick Hilliard <nick@foobar.org> wrote:

> Aaron wrote on 30/09/2020 20:11:
>
>> He is suggesting reseating all cards. Starting with the Supervisor.
>>
>
> correct. power down the box, carefully reseat all cards, power up, see if
> that fixes it.
>
> If it doesn't fix it, then open a TAC case. If the unit isn't under
> support, then you have a problem because this type of error could be one of
> the cards, or the sup, or the backplane and it's really hard to tell which
> without swapping units out. If you can check out the EOBC on the line
> cards using remote login, that might give useful information, maybe.
>
> Nick
>
Re: EOBC0/0 ifInErrors [ In reply to ]
01.10.2020 3:03, Nick Hilliard wrote:

> Aaron wrote on 30/09/2020 20:11:
>> He is suggesting reseating all cards. Starting with the Supervisor.
>
> correct. power down the box, carefully reseat all cards, power up, see if that fixes it.
>
> If it doesn't fix it, then open a TAC case. If the unit isn't under support,
> then you have a problem because this type of error could be one of the cards,
> or the sup, or the backplane and it's really hard to tell which without swapping units out.

This 7606 is the core router of our network. It has run for many years without visible problems, and its current uptime is over 1 year.
It would be hard to do such things to the core without any evidence other than an increasing counter.

At the moment, all I want to know is what the counter is designed for and what its growth can mean.

> If you can check out the EOBC on the line cards using remote login, that might give useful information, maybe.

How do I do that?

Thank you very much for your patience.

Re: EOBC0/0 ifInErrors [ In reply to ]
Eugene Grosbein wrote on 01/10/2020 03:23:
> This 7606 is the core router of our network, it runs for many years
> without visible problems and current uptime's over 1 year. It would
> be hard doing such things with the core without any evidences other
> than some counter increasing.
>
> At the moment, all I want is to know what is the counter designed
> for, what can it growth mean?

Can you post the output of "show eobc"?

EOBC is the Ethernet out-of-band channel, the main control plane
communication mechanism between the sup and the line cards. If you're
seeing errors on the link, then this probably means that there's a
physical problem somewhere on the backplane.

>> If you can check out the EOBC on the line cards using remote login,
>> that might give useful information, maybe.
>
> How do I do that?

Try: "show platform eobc all"

or:

# remote command module <number> show interface <intname>

I can't remember whether you can examine the EOBC using "remote command"
- it's been a long time since I've logged into a 6500/7600.

Nick
Re: EOBC0/0 ifInErrors [ In reply to ]
01.10.2020 15:36, Nick Hilliard wrote:

> Can you post the output of "show eobc"?
>
> eobc is the ethernet out-of-band connector and is the main control plane communication mechanism between the sup and the line cards. If you're seeing errors on the link, then this probably means that there's a physical problem somewhere on the backplane.
>
>>> If you can check out the EOBC on the line cards using remote login,
>>> that might give useful information, maybe.
>>
>> How do I do that?
>
> Try: "show platform eobc all"

http://www.grosbein.net/cisco/eobc.txt
Re: EOBC0/0 ifInErrors [ In reply to ]
Eugene Grosbein wrote on 01/10/2020 09:58:
> http://www.grosbein.net/cisco/eobc.txt

this looks interesting:

> Counters collected at Idb:
> Input Errors = 3199810
> Output Drops = 0 Giants/Runts = 0/3199810

Runts are packets which are smaller than the minimum packet size, i.e. this
suggests a physical problem.
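
To make the runt definition concrete, here is a small sketch that classifies frames by length. It assumes the standard Ethernet limits of 64 bytes minimum and 1518 bytes maximum for an untagged frame, FCS included. Whether the EOBC driver uses exactly these thresholds is an assumption, and `classify_frame` is a hypothetical helper.

```python
ETH_MIN_FRAME = 64    # bytes, including the 4-byte FCS
ETH_MAX_FRAME = 1518  # bytes, untagged frame including the FCS


def classify_frame(length_bytes):
    """Label a frame as 'runt', 'giant' or 'ok' by its length alone."""
    if length_bytes < ETH_MIN_FRAME:
        return "runt"
    if length_bytes > ETH_MAX_FRAME:
        return "giant"
    return "ok"


for length in (40, 64, 1518, 1600):
    print(length, classify_frame(length))
```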

You have dual supervisors on this box. Can you do a failover to the
secondary and see if that stops the input errors? This will localise
the problem.

Nick
Re: EOBC0/0 ifInErrors [ In reply to ]
01.10.2020 17:49, Nick Hilliard wrote:

> this looks interesting:
>
>> Counters collected at Idb:
>> Input Errors = 3199810 Output Drops = 0 Giants/Runts = 0/3199810
>
> runts are packets which smaller than the minimum packet size, i.e. this suggests a physical problem.
>
> You have dual supervisor on this box. Can you do a failover to the secondary and see if that stops the input errors? This will localise the problem

This is possible, I'll try.

Re: EOBC0/0 ifInErrors [ In reply to ]
01.10.2020 17:49, Nick Hilliard wrote:

> You have dual supervisor on this box.
> Can you do a failover to the secondary and see if that stops the input errors?
> This will localise the problem

Do I need to disable NSF (non-stop forwarding) to localise the problem?
I guess both RSPs exchange data routinely when NSF is enabled.



Re: EOBC0/0 ifInErrors [ In reply to ]
02.10.2020 12:16, Eugene Grosbein wrote:

> 01.10.2020 17:49, Nick Hilliard wrote:
>
>> You have dual supervisor on this box.
>> Can you do a failover to the secondary and see if that stops the input errors?
>> This will localise the problem
>
> Do I need to disable NSF (non-stop forwarding) to localise the problem?
> I guess both RSP should exchange data routinely when NSF is enabled.

Or I could just eject the stand-by RSP from the chassis.


Re: EOBC0/0 ifInErrors [ In reply to ]
02.10.2020 12:24, Eugene Grosbein wrote:

>> 01.10.2020 17:49, Nick Hilliard wrote:
>>
>>> You have dual supervisor on this box.
>>> Can you do a failover to the secondary and see if that stops the input errors?
>>> This will localise the problem
>>
>> Do I need to disable NSF (non-stop forwarding) to localise the problem?
>> I guess both RSP should exchange data routinely when NSF is enabled.
>
> Or I could just eject stand-by RSP out of the chassis.

So I changed the redundancy mode to RPR (Route Processor Redundancy; it was SSO), then physically ejected the stand-by RSP.
I expected that the errors would either stop or not change at all, but neither happened.

Instead, the error rate decreased significantly but did not cease (12:25 was the moment):
http://www.grosbein.net/cisco/eobc0_0-day.png

Also, something strange happened with these stats, which I take to be "fabric utilisation":

CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.1.0 = INTEGER: 25
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.1.1 = INTEGER: 37
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.2.0 = INTEGER: 6
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.2.1 = INTEGER: 5
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.3.0 = INTEGER: 8
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.3.1 = INTEGER: 1
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.4.0 = INTEGER: 4
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.4.1 = INTEGER: 4
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.5.0 = INTEGER: 0
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.6.0 = INTEGER: 0
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.1.0 = INTEGER: 7
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.1.1 = INTEGER: 12
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.2.0 = INTEGER: 13
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.2.1 = INTEGER: 18
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.3.0 = INTEGER: 13
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.3.1 = INTEGER: 13
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.4.0 = INTEGER: 7
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.4.1 = INTEGER: 11
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.5.0 = INTEGER: 0
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.6.0 = INTEGER: 0

Here are the corresponding graphs. Modules 5 and 6 (RSPs) almost always show zeroes;
the graphs for modules 1 and 4 do not show anything unusual, but the graphs for modules 2 and 3 do:

http://www.grosbein.net/cisco/f10-day.png
http://www.grosbein.net/cisco/f11-day.png
http://www.grosbein.net/cisco/f20-day.png
http://www.grosbein.net/cisco/f21-day.png
http://www.grosbein.net/cisco/f30-day.png
http://www.grosbein.net/cisco/f31-day.png
http://www.grosbein.net/cisco/f40-day.png
http://www.grosbein.net/cisco/f41-day.png

I put the RSP and the configuration back, and the graphs returned to their previous shapes.
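
A walk like the one in this message can be tabulated per module and channel with a few lines of Python. The sketch below assumes the textual format printed above and reads the two index components as module and fabric channel, which is an interpretation rather than something the thread confirms. `parse_xbar_util` is a hypothetical helper, and the sample is a subset of the posted values.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(
    r"cc6kxbarStatistics(In|Out)Util\.(\d+)\.(\d+) = INTEGER: (\d+)"
)


def parse_xbar_util(walk_text):
    """Map (module, channel) -> {"In": value, "Out": value}."""
    table = defaultdict(dict)
    for match in LINE_RE.finditer(walk_text):
        direction, module, channel, value = match.groups()
        table[(int(module), int(channel))][direction] = int(value)
    return dict(table)


sample = """\
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.1.0 = INTEGER: 25
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsInUtil.1.1 = INTEGER: 37
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.1.0 = INTEGER: 7
CISCO-CAT6K-CROSSBAR-MIB::cc6kxbarStatisticsOutUtil.1.1 = INTEGER: 12
"""

for (module, channel), util in sorted(parse_xbar_util(sample).items()):
    print(f"module {module} channel {channel}: "
          f"in={util.get('In')} out={util.get('Out')}")
```

Feeding it snapshots taken before and after a change makes it easier to see which module/channel pairs moved.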

Re: EOBC0/0 ifInErrors [ In reply to ]
02.10.2020 15:00, Eugene Grosbein wrote:

>>>> You have dual supervisor on this box.
>>>> Can you do a failover to the secondary and see if that stops the input errors?
>>>> This will localise the problem
>>>
>>> Do I need to disable NSF (non-stop forwarding) to localise the problem?
>>> I guess both RSP should exchange data routinely when NSF is enabled.
>>
>> Or I could just eject stand-by RSP out of the chassis.
>
> So I've changed redundancy mode to to RPR (Route Processor Redundancy, was SSO) then stand-by RSP was ejected physically.
> I expected that errors would stop or would not change at all but neither happened.
>
> Instead, error rate decreased significantly but not ceased (12:25 was the moment):
> http://www.grosbein.net/cisco/eobc0_0-day.png
>

Also, I should note that this router has multiple 2- or 4-port Port-channels built from 1Gbps ports.
Some Port-channels have all their members on a single line card; some have members spread over multiple line cards.
Maybe this is relevant.
Re: EOBC0/0 ifInErrors [ In reply to ]
Hi,

On Fri, Oct 02, 2020 at 03:00:05PM +0700, Eugene Grosbein wrote:
> Instead, error rate decreased significantly but not ceased (12:25 was the moment):
> http://www.grosbein.net/cisco/eobc0_0-day.png

Well, if the now-active RSP is the one with the faulty EOBC link,
this is what would happen - by removing the standby RSP, you removed
a lot of traffic on the EOBC bus (session sync etc).

Some traffic remains (talking to the line cards), so it's not "gone"
but "less"
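
One way to test this reasoning is to normalise the error count by the number of frames received over the same interval. If the ratio stays roughly constant while the absolute rate drops, the errors are following the traffic volume. The sketch below uses purely hypothetical deltas, and `error_ratio` is an illustrative name.

```python
def error_ratio(errors_delta, frames_delta):
    """Fraction of received frames counted as errors over one interval."""
    if frames_delta == 0:
        return 0.0
    return errors_delta / frames_delta


# Hypothetical deltas over equal intervals, before and after removing the standby RSP.
before = error_ratio(errors_delta=7_200, frames_delta=3_600_000)
after = error_ratio(errors_delta=1_800, frames_delta=900_000)
print(f"before: {before:.4%}  after: {after:.4%}")
```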

gert
--
"If was one thing all people took for granted, was conviction that if you
feed honest figures into a computer, honest figures come out. Never doubted
it myself till I met a computer with a sense of humor."
Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany gert@greenie.muc.de
Re: EOBC0/0 ifInErrors [ In reply to ]
Gert Doering wrote on 02/10/2020 09:16:
> Well, if the now-active RSP is the one with the faulty EOBC link,
> this is what would happen - by removing the standby RSP, you removed
> a lot of traffic on the EOBC bus (session sync etc).

i.e. next step: re-insert the standby rsp, fail over to it and see what
happens.

Nick
Re: EOBC0/0 ifInErrors [ In reply to ]
02.10.2020 15:22, Nick Hilliard wrote:

> Gert Doering wrote on 02/10/2020 09:16:
>> Well, if the now-active RSP is the one with the faulty EOBC link,
>> this is what would happen - by removing the standby RSP, you removed
>> a lot of traffic on the EOBC bus (session sync etc).
>
> i.e. next step: re-insert the standby rsp, fail over to it and see what happens.

Thank you all, I'll try next week.

Re: EOBC0/0 ifInErrors [ In reply to ]
02.10.2020 15:00, Eugene Grosbein wrote:

>>> 01.10.2020 17:49, Nick Hilliard wrote:
>>>
>>>> You have dual supervisor on this box.
>>>> Can you do a failover to the secondary and see if that stops the input errors?
>>>> This will localise the problem
>>>
>>> Do I need to disable NSF (non-stop forwarding) to localise the problem?
>>> I guess both RSP should exchange data routinely when NSF is enabled.
>>
>> Or I could just eject stand-by RSP out of the chassis.
>
> So I've changed redundancy mode to to RPR (Route Processor Redundancy, was SSO) then stand-by RSP was ejected physically.
> I expected that errors would stop or would not change at all but neither happened.
>
> Instead, error rate decreased significantly but not ceased (12:25 was the moment):
> http://www.grosbein.net/cisco/eobc0_0-day.png

So I waited until early Saturday morning and performed a switch-over at 7:00; the error rate then increased.
Today at 12:16 the other (now inactive) RSP module was physically ejected, and the error rate decreased again but did not cease:

http://www.grosbein.net/cisco/eobc0_0-day2.png

What does it mean?

Re: EOBC0/0 ifInErrors [ In reply to ]
12.10.2020 15:59, Eugene Grosbein wrote:

>>>>> You have dual supervisor on this box.
>>>>> Can you do a failover to the secondary and see if that stops the input errors?
>>>>> This will localise the problem
>>>>
>>>> Do I need to disable NSF (non-stop forwarding) to localise the problem?
>>>> I guess both RSP should exchange data routinely when NSF is enabled.
>>>
>>> Or I could just eject stand-by RSP out of the chassis.
>>
>> So I've changed redundancy mode to to RPR (Route Processor Redundancy, was SSO) then stand-by RSP was ejected physically.
>> I expected that errors would stop or would not change at all but neither happened.
>>
>> Instead, error rate decreased significantly but not ceased (12:25 was the moment):
>> http://www.grosbein.net/cisco/eobc0_0-day.png
>
> So I waited until Saturday early morning and performed switch-over at 7:00, then error rate increased.

Forgot to show that moment:

http://www.grosbein.net/cisco/eobc0_0-inc.png

Both graphs show the same value: the upper graph shows the maximum over an hour, and the lower (blue) one shows the average over an hour.

> Today at 12:16 another one (now inactive) RSP module was ejected physically and error rate decreases again but not ceased:
>
> http://www.grosbein.net/cisco/eobc0_0-day2.png
>
> What does it mean?

Re: EOBC0/0 ifInErrors [ In reply to ]
Eugene Grosbein wrote on 12/10/2020 10:07:
> Forgot show show that moment:
>
> http://www.grosbein.net/cisco/eobc0_0-inc.png
>
> Both graps show same value, upper graph shows maximum over an hour, lower (blue) is an average over an hour.
>
>> Today at 12:16 another one (now inactive) RSP module was ejected physically and error rate decreases again but not ceased:
>>
>> http://www.grosbein.net/cisco/eobc0_0-day2.png
>>
>> What does it mean?

if errs-in matches runts-in, then this may be a backplane failure. You
need to get an opinion from TAC on this.

Nick
Re: EOBC0/0 ifInErrors [ In reply to ]
12.10.2020 23:12, Nick Hilliard wrote:

> Eugene Grosbein wrote on 12/10/2020 10:07:
>> Forgot show show that moment:
>>
>> http://www.grosbein.net/cisco/eobc0_0-inc.png
>>
>> Both graps show same value, upper graph shows maximum over an hour, lower (blue) is an average over an hour.
>>
>>> Today at 12:16 another one (now inactive) RSP module was ejected physically and error rate decreases again but not ceased:
>>>
>>> http://www.grosbein.net/cisco/eobc0_0-day2.png
>>>
>>> What does it mean?
>
> if errs-in matches runts-in, then this may be a backplane failure. You need to get an opinion from TAC on this.

I'm not sure how and where I can get the errs-in/runts-in counters for EOBC0/0.

Also, I have another similar 7606 box with two RSP modules of exactly the same model, and its SNMP counters for EOBC0/0 show a very similar picture.
I doubt this is a hardware failure, because both routers experience no other visible problems while forming the core of the network and carrying its traffic.

We have no active support contract.

Re: EOBC0/0 ifInErrors [ In reply to ]
Eugene Grosbein wrote on 12/10/2020 20:11:
> I'm not sure how and where I can get counters for errs-in/runts-in of
> EOBC0/0.

"show platform eobc all" - see the "Giants / Runts" counters.

> Also, I have another similar 7606 box with two excactly same RSP
> modules and its SNMP counters for EOBC0/0 show very similar picture.
> I doubt this is hardware failure because both routers experience no
> other visible problems while forming core of the network and carrying
> its traffic.

the EOBC interface is half-duplex which means that collisions can
happen. It may be that these "errors" are simply collisions, in which
case they are harmless.

Nick
Re: EOBC0/0 ifInErrors [ In reply to ]
13.10.2020 2:18, Nick Hilliard wrote:

> Eugene Grosbein wrote on 12/10/2020 20:11:
>> I'm not sure how and where I can get counters for errs-in/runts-in of
>> EOBC0/0.
>
> "show platform eobc all" - see the "Giants / Runts" counters.
>
>> Also, I have another similar 7606 box with two excactly same RSP
>> modules and its SNMP counters for EOBC0/0 show very similar picture.
>> I doubt this is hardware failure because both routers experience no
>> other visible problems while forming core of the network and carrying
>> its traffic.
>
> the EOBC interface is half-duplex which means that collisions can happen. It may be that these "errors" are simply collisions, in which case they are harmless.

It seems so:

#show platform eobc all
Driver Level Counters: (Cumulative, Zeroed only at Reset)
         Frames       Bytes
Rx(0)  83287913   683355380
Tx(0)  76687181  3083105928

Input Drop Frame Count
Rx0 = 0 Rx-replacement0 = 0
Per Queue Receive Errors:
      FRME  OFLW  BUFE  NOENP  DISCRD  DISABLE  BADCOUNT
Rx0      0     0     0      0       0        0         0

Rx Last bit not set = 0 Rx First bit not set = 0
Ring Reset = 0

Tx Errors/State:
One Collision Error = 306656 More Collisions = 1209541
No Encap Error = 0 Deferred Error = 2009836
Loss Carrier Error = 0 Late Collision Error = 0
Excessive Collisions = 1 Buffer Error = 0
Tx Freeze Count = 0 Tx Intrpt Serv timeout= 1
Tx Flow State = FLOW_ON
Tx Flow Off Count = 0 Tx Flow On Count = 0

Counters collected at Idb:
Is input throttled = 0 Throttle Count = 0
Rx Resource Errors = 0 Input Drops = 0
Input Errors = 28517
Output Drops = 0 Giants/Runts = 0/28517
Dma Mem Error = 0 Input Overrun = 0

Hash match table for multicast (in use 1, maximum 64 entries):
Entry 0 MAC Addr = 0100.8300.0000

Dev_Instance=0x1D13A06C 487825516
Receive/Transmit Buffer Descriptors
     RingSz  Shadow    StartBD   LastBD    Head  Tail  Pend
Rx0     512  1D13AA2C  79CF7C80  79CF8C78   361     0     0
Tx0     512  0         79CF8CC0  79CF9CB8   333   333     0

Rx Drop On Busy Disabled
Hash match table for multicast (in use 1, maximum 64 entries):
Entry 0 MAC Addr = 0100.8300.0000

ETSEC IPC/RPC/SCP Counters

IPC Rx Count = 100450718
IPC RxFrag Count = 12547529
IPC RPC Request Rx Count = 11260149
IPC RPC Request RxFrag Count = 9414646
IPC RPC Response Rx Count = 14647969
IPC RPC Response RxFrag Count = 1562930
IPC Unreliable Rx Count = 452764
SCP Rx Count = 28688164

IPC Tx Count = 91846030
IPC TxFrag Count = 104057
IPC RPC Request Tx Count = 13189443
IPC RPC Request TxFrag Count = 3899
IPC RPC Response Tx Count = 5032168
IPC RPC Response TxFrag Count = 68319
IPC Unreliable Tx Count = 452792
SCP Tx Count = 28696645


ETSEC counters in Hardware

64b = 122344316 64b-127b = 138902198 128b-255b = 43473726
256b-511b = 11325757 512b-1023b = 8774625 1024b-1518b = 160146925
1519b-1522b = 0 RxBytes = 3933761960 RxPkt = 420739417
RxFCSErr = 0 RxMCast = 9871009 RxBCast = 17
RxCntrlFr = 0 RxPauseFR = 0 RxUnkop = 0
RxAlignErr = 0 RxLenErr = 0 RxCodeErr = 0
RxSenseErr = 0 RxUnder = 76 RxOver = 0
RxFrag = 12458974 RxJab = 0 RxDrop = 0
RxFilerRej = 0 Txbytes = 3420175598 TxPkt = 76687181
TxMCast = 244991 TxBCast = 2 TxPause = 0
TxDef = 1421065 TxCol = 306656 TxMCol = 321360
TxLateCol = 0 TxExCol = 1 TxTotCol = 1516203
TxDrop = 0 TxJab = 0 TxFCS = 0
TxOver = 0 TxUnder = 0 TxFrg = 0
TxIllegalTxDrops = 0
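
The "errs-in matches runts-in" check suggested earlier in the thread can be automated against saved output. This is a minimal sketch that assumes the "Counters collected at Idb" lines look exactly like the ones pasted above. `parse_eobc_idb` and `errors_are_all_runts` are hypothetical helper names.

```python
import re


def parse_eobc_idb(text):
    """Extract input error, giant and runt counts from 'show platform eobc all' text."""
    errors = re.search(r"Input Errors\s*=\s*(\d+)", text)
    giants_runts = re.search(r"Giants/Runts\s*=\s*(\d+)/(\d+)", text)
    if errors is None or giants_runts is None:
        raise ValueError("expected counters not found in the supplied text")
    return {
        "input_errors": int(errors.group(1)),
        "giants": int(giants_runts.group(1)),
        "runts": int(giants_runts.group(2)),
    }


def errors_are_all_runts(counters):
    """True when every input error is accounted for by the runt counter."""
    return counters["giants"] == 0 and counters["input_errors"] == counters["runts"]


sample = """\
Input Errors = 28517
Output Drops = 0 Giants/Runts = 0/28517
"""

counters = parse_eobc_idb(sample)
print(counters, errors_are_all_runts(counters))
```

Run against the figures posted above, it confirms that the input errors and runts are the same number on this box.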


Re: EOBC0/0 ifInErrors [ In reply to ]
13.10.2020 2:18, Nick Hilliard wrote:

> Eugene Grosbein wrote on 12/10/2020 20:11:
>> I'm not sure how and where I can get counters for errs-in/runts-in of
>> EOBC0/0.
>
> "show platform eobc all" - see the "Giants / Runts" counters.

The other box shows a similar picture:

Counters collected at Idb:
Is input throttled = 0 Throttle Count = 0
Rx Resource Errors = 0 Input Drops = 0
Input Errors = 4306877
Output Drops = 0 Giants/Runts = 0/4306877
Dma Mem Error = 0 Input Overrun = 0

Re: EOBC0/0 ifInErrors [ In reply to ]
Eugene Grosbein wrote on 12/10/2020 20:28:
> It seems so:
[...]
> Tx Errors/State:
> One Collision Error = 306656 More Collisions = 1209541
[...]
> Input Errors = 28517
> Output Drops = 0 Giants/Runts = 0/28517

uh, obviously collisions are a transmit phenomenon, but you're seeing rx
errors, so this isn't collisions. I don't know what the root cause is
here, but runts are not good on ethernet and usually indicate physical
cabling problems. As the EOBC interface is "cabled" via the crossbar at
the rear of the chassis, there are relatively few failure points.

Nick
Re: EOBC0/0 ifInErrors [ In reply to ]
Eugene Grosbein wrote on 12/10/2020 20:28:
> It seems so:
[...]
> Tx Errors/State:
> One Collision Error = 306656 More Collisions = 1209541
[...]
> Input Errors = 28517
> Output Drops = 0 Giants/Runts = 0/28517

> uh, obviously collisions are a transmit phenomenon, but you're seeing rx
> errors, so this isn't collisions. I don't know what the root cause is
> here, but runts are not good on ethernet and usually indicate physical
> cabling problems. As the EOBC interface is "cabled" via the crossbar at
> the rear of the chassis, there are relatively few failure points.

> Nick

One possibility that springs to mind here is a duplex mis-match. If the EOBC0/0 interface that you're seeing runts on is running as full duplex, and the interface that it connects to is running at half duplex, then the half-duplex interface will record a collision and stop transmitting when it receives a frame overlapping with one that it is transmitting. Your full-duplex interface will only see the part of that frame that is transmitted before the collision is detected - which the full-duplex interface may record as a runt.

It would be interesting to know what happens to the runt and collision stats if you can force both ends to full duplex, or both ends to half duplex.
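
The mechanism described above can be illustrated with a toy model. It assumes the half-duplex sender simply stops after a given number of bytes once it detects the collision, and it ignores the jam signal and preamble. The 64-byte minimum is the standard Ethernet value; the function names and sample numbers are hypothetical.

```python
ETH_MIN_FRAME = 64  # bytes, standard Ethernet minimum frame size


def received_length(frame_length, abort_after_bytes):
    """Bytes the full-duplex side sees if the sender aborts after a collision."""
    return min(frame_length, abort_after_bytes)


def is_runt(length_bytes):
    """A fragment shorter than the minimum frame size is counted as a runt."""
    return length_bytes < ETH_MIN_FRAME


# Hypothetical case: a 1000-byte frame aborted after 40 bytes were sent.
fragment = received_length(1000, 40)
print(fragment, is_runt(fragment))  # 40 True
```

In this simplified picture, each collision seen by the half-duplex end would show up as one runt on the full-duplex end.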

Tim
Re: EOBC0/0 ifInErrors [ In reply to ]
Tim Rayner wrote on 13/10/2020 10:48:
> If the EOBC0/0 interface that you're seeing runts on is running as full
> duplex, and the interface that it connects to is running at half duplex,

that would do it, but EOBC is an internal interface which is h/d by
default. I'm not sure if it's even possible to configure the interface
discipline on the CLI, but maybe it is.

Nick