Mailing List Archive

6509 w/SUP720-3BXL and high CPU load
Hello,

We are seeing on one of our 6509 chassis high CPU load (50-90%). We are not
seeing this on our other chassis and they are all optioned the same. The
one difference is that this chassis is sending traffic on one incoming
10gig interface out to another 6509 where that traffic is destined to hit
its gateway and then out to the internet.

Simple diagram is below.

10G serverB - 6509b - 6509a - asr9000 - internet
10G serverA - 6509a - asr9000 - internet

While I know this is not ideal, it is what it is until server B can be
moved to a different VLAN. The issue is that 6509b has a high CPU load of
50-90% while 6509a has a CPU load of 4%.

Traffic from server B is about 4.8G and traffic from server A is about 5G.

I have gone through the troubleshooting high CPU load on sup720 document
here:
https://community.cisco.com/t5/networking-documents/troubleshooting-high-cpu-on-a-6500-with-sup720/ta-p/3126932

and every time I find something that gives me that a-ha moment, I check it
on the other switch and see that ACL usage (or whatever the item is) is the
same or higher there.

So my question is, what is the best way to track down what is causing this
high CPU load?

CPU on 6509b: CPU utilization for five seconds: 62%/22%; one minute: 42%;
five minutes: 42%
CPU on 6509a: CPU utilization for five seconds: 3%/1%; one minute: 14%;
five minutes: 14%

Any help would be greatly appreciated. Pulling my hair out trying to figure
out why.

Thanks,

-Lee
_______________________________________________
cisco-nsp mailing list cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
Re: 6509 w/SUP720-3BXL and high CPU load [ In reply to ]
On Thu, 19 Mar 2020 at 19:33, Lee Starnes <lee.t.starnes@gmail.com> wrote:


> CPU on 6509b: CPU utilization for five seconds: 62%/22%; one minute: 42%;
> five minutes: 42%

The 2nd number is interrupt-level CPU (I/O), so you're software switching
something. What and why may be complex to answer and my 7600 memories seem
to be ethanol soluble.
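
As a rough illustration of how to read that line: in IOS "show processes cpu" output, the first five-second figure is total CPU and the figure after the slash is the part spent at interrupt level, so the difference is process-level load. The sketch below is a minimal Python parser for the two lines quoted in this thread; the function and field names (split_cpu, "process", and so on) are made up for the example and are not part of any Cisco tool.

```python
import re

# Matches the summary line of "show processes cpu" as quoted in this thread.
# \s* also tolerates the line wrap introduced by the mail client.
CPU_LINE = re.compile(
    r"five seconds:\s*(\d+)%/(\d+)%;\s*one minute:\s*(\d+)%;\s*five minutes:\s*(\d+)%"
)

def split_cpu(line):
    """Split the five-second figure into interrupt-level and process-level CPU."""
    m = CPU_LINE.search(line)
    if m is None:
        raise ValueError("not a CPU utilization line")
    total, interrupt, one_min, five_min = map(int, m.groups())
    return {
        "total": total,
        "interrupt": interrupt,          # second number: interrupt level
        "process": total - interrupt,    # remainder: scheduled processes
        "one_min": one_min,
        "five_min": five_min,
    }

busy = split_cpu("CPU utilization for five seconds: 62%/22%; one minute: 42%; "
                 "five minutes: 42%")
idle = split_cpu("CPU utilization for five seconds: 3%/1%; one minute: 14%; "
                 "five minutes: 14%")
print("6509b:", busy)
print("6509a:", idle)
```

For the 6509b line this gives 22 points at interrupt level and 40 at process level, so both punted traffic and individual processes are worth checking.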

First thing I'd try is to capture punted packets.

show plat cap buffer asic pinnacle slot N port 4 direction out priority lo
show plat cap buffer collect for 5
show plat cap buffer data filt
show plat cap buffer data sample X

N == your SUP slot
Port 4 with direction out captures traffic going out from the fabric to the RP.

Then look for something which shouldn't have been punted, and look up
that prefix in mls cef.

--
++ytti
Re: 6509 w/SUP720-3BXL and high CPU load [ In reply to ]
> First thing I'd try is to capture punted packets.

Per the document that you linked, I've found netdr or CPU SPAN to be
helpful in this regard. That community post pretty much mirrors an
official doc on the same topic. I think the last time I saw something like
this, it was some kind of link-local IPv6 stuff. Either way, it would be
nice to know what you find the problem to be.

Thank you,
Nathan
Re: 6509 w/SUP720-3BXL and high CPU load [ In reply to ]
Hi,

On Thu, Mar 19, 2020 at 10:28:58AM -0700, Lee Starnes wrote:
> We are seeing on one of our 6509 chassis high CPU load (50-90%). We are not

As ytti said, you're software switching.

Are you carrying full tables, and have hit MLS CEF limits?

("show mls cef exception status")

If this is showing "TRUE", you've hit "too many prefixes" and need to
reduce the number of routes this box sees, and then reload (no way to
normalize without reboot).

If this is showing all FALSE, something else is causing software
switching.
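
A minimal sketch of automating that check in Python, for anyone collecting the output from several chassis. The sample text is an assumption about what "show mls cef exception status" prints (one "... exception state = TRUE/FALSE" line per protocol); it was not captured from the switches in this thread, and the function name fib_exceptions is made up for the example.

```python
def fib_exceptions(output):
    """Return the names of the FIB tables whose exception state is TRUE."""
    hit = []
    for line in output.splitlines():
        # Only look at lines shaped like "<name> exception state = <TRUE|FALSE>".
        if "exception state" not in line or "=" not in line:
            continue
        name, _, state = line.partition("=")
        if state.strip().upper() == "TRUE":
            hit.append(name.strip())
    return hit

# Illustrative sample only; the exact wording on a given IOS release may differ.
sample = """\
Current IPv4 FIB exception state = TRUE
Current IPv6 FIB exception state = FALSE
Current MPLS FIB exception state = FALSE
"""

hit = fib_exceptions(sample)
if hit:
    print("FIB exception hit:", ", ".join(hit))
else:
    print("all FALSE, look elsewhere for the software switching")
```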

gert
--
"If was one thing all people took for granted, was conviction that if you
feed honest figures into a computer, honest figures come out. Never doubted
it myself till I met a computer with a sense of humor."
Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany gert@greenie.muc.de
Re: 6509 w/SUP720-3BXL and high CPU load [ In reply to ]
Hello Ytti,

Looks like the 6509 does not have the show platform cap. It only has show
platform buffers. But I did find that this was an issue with SNMP. Thanks
for the pointers.

-Lee

On Thu, Mar 19, 2020 at 11:22 AM Saku Ytti <saku@ytti.fi> wrote:

> First thing I'd try is to capture punted packets.
>
> show plat cap buffer asic pinnacle slot N port 4 direction out priority lo
> show plat cap buffer collect for 5
> show plat cap buffer data filt
> show plat cap buffer data sample X
Re: 6509 w/SUP720-3BXL and high CPU load [ In reply to ]
Hello Gert,

Thanks for the reply. All came up false. I did finally track it down to
MRTG hitting it with 25K packets per poll, with some old interfaces in its
config that are not in the chassis anymore. Once I re-created the config,
the issue was resolved.

-Lee


On Thu, Mar 19, 2020 at 12:00 PM Gert Doering <gert@greenie.muc.de> wrote:

> Are you carrying full tables, and have hit MLS CEF limits?
>
> ("show mls cef exception status")
Re: 6509 w/SUP720-3BXL and high CPU load [ In reply to ]
Hello Nathan,

So what I find interesting is that a process that shows 13% CPU is actually
using 60% CPU. Using "show proc cpu sorted 5sec" I was able to see that
SNMP was coming up at 13-15% CPU on the process while this was going on
(all the time), but on the other switches it would only appear once for
about 5 seconds and then go away, leaving a brief spike and then a drop to
normal CPU load. So I started to investigate, and the machine that was
hitting it with 25K packets each time was our machine that runs MRTG. A
little research into that found that the MRTG config for this switch was
old: it had some interfaces that were no longer in the chassis and was
missing some that were new in the chassis. I rebuilt the config and the
issue was resolved. Packets went from 25K to 746, and MRTG completed its
poll of the interfaces within 5-7 seconds.
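
A minimal Python sketch of the kind of check that would catch this earlier: compare the interface indexes an MRTG config polls against the indexes the chassis actually has. It assumes the simple numeric target form "Target[name]: ifIndex:community@host" and ignores MRTG's other target styles; the config text, host names, community string and live ifIndex list below are all invented for the example, and in practice the live list would come from an SNMP walk of the switch.

```python
import re

# Numeric-ifIndex MRTG targets only, e.g. "Target[sw_1]: 1:public@sw".
TARGET = re.compile(
    r"^Target\[(?P<name>[^\]]+)\]:\s*(?P<ifindex>\d+):[^@\s]+@(?P<host>[^:\s]+)",
    re.M,
)

def stale_and_missing(cfg_text, host, live_ifindexes):
    """Return (polled but gone, present but not polled) ifIndexes for one host."""
    polled = {
        int(m["ifindex"])
        for m in TARGET.finditer(cfg_text)
        if m["host"] == host
    }
    live = set(live_ifindexes)
    return sorted(polled - live), sorted(live - polled)

# Hypothetical config: 6509b is polled for ifIndex 1, 2 and 9.
cfg = """\
Target[6509b_1]: 1:public@6509b
Target[6509b_2]: 2:public@6509b
Target[6509b_9]: 9:public@6509b
Target[6509a_1]: 1:public@6509a
"""

# Hypothetical result of walking the interface table on 6509b.
stale, missing = stale_and_missing(cfg, "6509b", [1, 2, 3])
print("stale targets:", stale)
print("unpolled interfaces:", missing)
```

Any non-empty "stale" list means the poller is asking for interfaces that no longer exist, which is the situation described above.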

Thanks for the response.

-Lee


On Thu, Mar 19, 2020 at 11:39 AM Nathan Lannine <nathan.lannine@gmail.com>
wrote:

> Either way, it would be nice to know what you find the problem to be.
Re: 6509 w/SUP720-3BXL and high CPU load [ In reply to ]
On Fri, 20 Mar 2020 at 23:42, Lee Starnes <lee.t.starnes@gmail.com> wrote:

> Looks like the 6509 does not have the show platform cap. It only has show platform buffers. But I did find that this was an issue with SNMP. Thanks for the pointers.

You probably need 'service internal' for those commands.

--
++ytti