JunOS 16.2R2.8 High CPU caused by python
Hi everyone,

Has anyone seen anything like this before? Searching on Google etc. has
revealed nothing.

For a few months now, MX240s running 16.2R2.8 have shown one of the RE
CPUs running at around 90%. It has not affected the MX80s in the same
network, which run the same OS with essentially the same config
(simple dual-stack network, IS-IS/BGP, that's about it).

None of the 240s started doing this at the same time, and there were
no configuration changes around the times the CPU jumped.

Here is an example:

philip@TCR> show system processes extensive
last pid: 33210; load averages: 1.23, 1.15, 1.14 up 343+02:08:01 06:42:04
144 processes: 5 running, 138 sleeping, 1 waiting

Mem: 284M Active, 1474M Inact, 183M Wired, 12M Cache, 91M Buf, 32M Free
Swap: 4096M Total, 319M Used, 3777M Free, 7% Inuse


PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
11473 root 1 52 0 724M 6048K piperd 3408.8 79.05% python
10 root 1 155 ki31 0K 12K RUN 2883.3 2.20% idle
11367 root 2 -26 r26 813M 8904K nanslp 182.5H 1.66% chassisd
11825 root 3 20 0 903M 39936K kqread 36.9H 0.49% rpd
11387 root 1 20 0 784M 29580K select 119.3H 0.39% mib2d
11863 root 1 20 0 738M 10084K select 76.4H 0.29% snmpd


Almost 80% of the CPU is taken by python. I'm not running any automation
(that I've knowingly set up).
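The busiest process can also be picked out of that table programmatically. The following is a minimal, hypothetical Python helper (not a Junos tool). It assumes each process row has the 11 whitespace-separated columns shown above (PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND), with WCPU in the tenth position. It returns the row with the highest WCPU.

```python
# Hypothetical helper: pick the busiest process from the process table
# printed by "show system processes extensive".
# Assumes the column layout shown above:
#   PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
from typing import List, Tuple

SAMPLE = """\
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
11473 root 1 52 0 724M 6048K piperd 3408.8 79.05% python
10 root 1 155 ki31 0K 12K RUN 2883.3 2.20% idle
11367 root 2 -26 r26 813M 8904K nanslp 182.5H 1.66% chassisd
11825 root 3 20 0 903M 39936K kqread 36.9H 0.49% rpd
"""


def parse_processes(text: str) -> List[Tuple[int, float, str]]:
    """Return (pid, wcpu_percent, command) for every process row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header and anything that is not a process row.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        pid = int(fields[0])
        wcpu = float(fields[9].rstrip("%"))  # WCPU column, e.g. "79.05%"
        command = " ".join(fields[10:])      # COMMAND may contain spaces
        rows.append((pid, wcpu, command))
    return rows


def busiest(text: str) -> Tuple[int, float, str]:
    """Return the row with the highest WCPU value."""
    return max(parse_processes(text), key=lambda row: row[1])


print(busiest(SAMPLE))  # (11473, 79.05, 'python')
```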

Jumping into a shell, I see this:

philip@TCR> start shell
% ps ax | grep python
11473 - R 204530:44.49 /usr/bin/python /usr/libexec/icmd/icmd.py
33362 0 S+ 0:00.00 grep python
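As a cross-check, the ps TIME value for PID 11473 (204530:44.49) can be converted to hours. Read as minutes:seconds, it comes out to the 3408.8 shown in the TIME column of the process table above. The minimal Python sketch below assumes the FreeBSD-style ps format of minutes:seconds.hundredths.

```python
# Hypothetical helper: convert the cumulative CPU TIME column printed by
# FreeBSD-style "ps ax" (minutes:seconds.hundredths) into hours, so it can
# be compared with the TIME column of "show system processes extensive".
def ps_time_to_hours(value: str) -> float:
    minutes, seconds = value.split(":")
    return (int(minutes) * 60 + float(seconds)) / 3600


hours = ps_time_to_hours("204530:44.49")
# prints: 3408.8 hours, about 142 days of CPU time
print(f"{hours:.1f} hours, about {hours / 24:.0f} days of CPU time")
```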

According to the docs, ICMD is the "internal communication health monitor
daemon".

Weirdly, I only see ICMD running on the MX240s, not the MX80s, even
though the docs suggest it applies to all MX.
https://www.juniper.net/documentation/en_US/junos/information-products/topic-collections/release-notes/16.2/topic-113675.html

Any clues at all?

Is this just a reboot to make it go away? (Although one has been
rebooted recently, and after about two weeks of uptime, the CPU jumped
up to 85% and has stayed there.)

Thanks!

philip
--


_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: JunOS 16.2R2.8 High CPU caused by python
Hi,

> On Mar 26, 2019, at 8:59 PM, Philip Smith <pfsinoz@gmail.com> wrote:
>
> Is this just a reboot to make it go away?

Not a solution, but an ignorant question - Is there a function to kill (and/or restart) the process in this type of scenario? On IOS-XR, there were specific XR CLI wrappers for restarting a process as a means to fix stuff like processes run amok without having to reboot the box (or RE/RSP/LC/whatever was misbehaving).
Re: JunOS 16.2R2.8 High CPU caused by python
Yes. It was fixed in a later release. Perhaps try 16.2R2-S8 if you don't want to change to a later version.

On Wed, Mar 27, 2019 at 10:59:09AM +1000, Philip Smith wrote:
> Hi everyone,
>
> Has anyone seen anything like this before? Searching on Google etc has
> revealed nothing.
Re: JunOS 16.2R2.8 High CPU caused by python
Ah. Great, thanks Charles! :-)

philip
--

Anderson, Charles R wrote on 27/3/19 11:16 :
> Yes. It was fixed in a later release. Perhaps try 16.2R2-S8 if you don't want to change to a later version.
Re: JunOS 16.2R2.8 High CPU caused by python
On 2019-03-26 21:11 -0400, Jason Lixfeld wrote:

> Not a solution, but an ignorant question - Is there a function to
> kill (and/or restart) the process in this type of scenario? On
> IOS-XR, there were specific XR CLI wrappers for restarting a process
> as a means to fix stuff like processes run amok without having to
> reboot the box (or RE/RSP/LC/whatever was misbehaving).

There is a restart command in Junos, which does exactly that. E.g.:

bellman@Chili4> restart jsd
JET Services Daemon started, pid 62402

However, it can only restart certain processes (on my switches, I see
64 possible daemons in the help when I press "?"), and ICMD does not
seem to be one of them. (But that's on an EX4600 running 18.4R1; and
/usr/libexec/icmd doesn't even exist on it.)

Also, sometimes the name of the process binary does not exactly match
the argument you are supposed to give to the restart command, so
you may need to think a little to figure that out.
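A hypothetical lookup table makes the naming point concrete. The only pairs below are ones that appear in this thread: jsd restarts as "jsd", and the leaking jdhcpd process in a later message was restarted with "restart dhcp-service". Any other mapping is an assumption and should be checked with "restart ?" on the device itself.

```python
# Hypothetical lookup: process binary name (as seen in "ps") -> argument
# for the Junos "restart" CLI command. Only pairs seen in this thread are
# listed; verify anything else with "restart ?" on the device.
from typing import Optional

RESTART_NAMES = {
    "jsd": "jsd",              # binary and CLI names match
    "jdhcpd": "dhcp-service",  # binary and CLI names differ
}


def restart_command(binary: str, gracefully: bool = False) -> Optional[str]:
    """Return the CLI restart command for `binary`, or None if unknown."""
    name = RESTART_NAMES.get(binary)
    if name is None:
        return None  # e.g. icmd: not known to be restartable this way
    return f"restart {name} gracefully" if gracefully else f"restart {name}"


print(restart_command("jdhcpd", gracefully=True))  # restart dhcp-service gracefully
```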

(On 18.3, we had similar problems, but with jsd and ga-nrpc; after a
few weeks, they started using 100% CPU. Restarting them helped, but
after another couple of weeks they ran amok again. Doesn't happen in
18.2 or 18.4, though.)


/Bellman
Re: JunOS 16.2R2.8 High CPU caused by python
On Wed, Mar 27, 2019 at 11:35:54AM +0100, Thomas Bellman wrote:
> On 2019-03-26 21:11 -0400, Jason Lixfeld wrote:
>
> > Not a solution, but an ignorant question - Is there a function to
> > kill (and/or restart) the process in this type of scenario? On
> > IOS-XR, there were specific XR CLI wrappers for restarting a process
> > as a means to fix stuff like processes run amok without having to
> > reboot the box (or RE/RSP/LC/whatever was misbehaving).
>
> There is a restart command in Junos, which does exactly that. E.g:
>
> bellman@Chili4> restart jsd
> JET Services Daemon started, pid 62402
>
> However, it can only restart certain processes (on my switches, I see
> 64 possible daemons in the help when I press "?"), and ICMD does not
> seem to be one of them. (But that's on an EX4600 running 18.4R1; and
> /usr/libexec/icmd doesn't even exist on it.)

There is also a generic way to terminate any process you want:

snar@router> request system process terminate ?
Possible completions:
<process-id> Process ID (1..99999)

(most of them will be restarted automagically).
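Combined with the earlier ps output, this can be sketched as a small hypothetical Python helper. It finds the PID of a process by a substring of its command line and builds the terminate command, enforcing the 1..99999 range from the CLI help above. The thread only says "most" processes are restarted automatically. Whether icmd would come back on its own after termination is not confirmed here.

```python
# Hypothetical helper: locate a process in "ps ax" output by a substring of
# its command line and build the generic Junos terminate command for it.
from typing import Optional

PS_OUTPUT = """\
11473 - R 204530:44.49 /usr/bin/python /usr/libexec/icmd/icmd.py
33362 0 S+ 0:00.00 grep python
"""


def find_pid(ps_output: str, needle: str) -> Optional[int]:
    """Return the PID of the first process whose command contains `needle`."""
    for line in ps_output.splitlines():
        fields = line.split()
        # ps ax columns: PID TT STAT TIME COMMAND...
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        command = " ".join(fields[4:])
        # Ignore the grep that was used to search for the process.
        if needle in command and not command.startswith("grep"):
            return int(fields[0])
    return None


def terminate_command(pid: int) -> str:
    """Build the CLI command; the help text accepts PIDs 1..99999."""
    if not 1 <= pid <= 99999:
        raise ValueError(f"PID out of range: {pid}")
    return f"request system process terminate {pid}"


pid = find_pid(PS_OUTPUT, "icmd.py")
if pid is not None:
    print(terminate_command(pid))  # request system process terminate 11473
```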

>
> Also, sometimes the name of the process binary does not match exactly
> with the argument you are supposed to give to the restart command, so
> you may need to think a little bit to figure that out.
>
> (On 18.3, we had similar problems, but with jsd and ga-nrpc; after a
> few weeks, they started using 100% CPU. Restarting them helped, but
> after another couple of weeks they ran amok again. Doesn't happen in
> 18.2 or 18.4, though.)

These processes can simply be disabled:

snar@router> show configuration system processes
jsd disable;
na-grpc-server disable;
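The same configuration expressed as set commands (for entry in configuration mode, followed by a commit) can be derived mechanically. The minimal hypothetical sketch below handles only flat leaf statements ending in ";" like the two above, not nested { } blocks.

```python
# Hypothetical helper: turn flat statements from "show configuration <path>"
# into the equivalent "set" commands. Handles only simple leaf statements
# ending in ";" (no nested { } blocks).
from typing import List


def to_set_commands(path: str, stanza: str) -> List[str]:
    commands = []
    for line in stanza.splitlines():
        statement = line.strip().rstrip(";").strip()
        if statement:
            commands.append(f"set {path} {statement}")
    return commands


for cmd in to_set_commands("system processes", "jsd disable;\nna-grpc-server disable;"):
    print(cmd)
# set system processes jsd disable
# set system processes na-grpc-server disable
```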

Re: JunOS 16.2R2.8 High CPU caused by python
(From Jason: "Is there a function to kill (and/or restart) the process in
this type of scenario?")

Yes, there is.

For instance, I had an issue with a leaking IP helper (DHCP relay)
process on an ACX5048 running Junos, which was immediately relieved by
restarting that process:


agould@ 5048> show chassis routing-engine | grep memory | refresh 1
Memory utilization 83 percent
---(refreshed at 2018-11-30 13:33:26 CST)---
Memory utilization 83 percent
...
---(refreshed at 2018-11-30 13:33:44 CST)---
Memory utilization 96 percent
---(refreshed at 2018-11-30 13:33:45 CST)---
Memory utilization 96 percent

agould@ 5048> restart dhcp-service gracefully
Junos Dynamic Host Configuration Protocol process started, pid 37106
...
---(refreshed at 2018-11-30 13:34:02 CST)---
Memory utilization 59 percent
---(refreshed at 2018-11-30 13:34:03 CST)---
Memory utilization 59 percent

15.1X54-D61.6 - leaking jdhcpd
17.3R3.10 - permanently fixed
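The effect of the restart can be quantified from the refresh output above. This is a minimal hypothetical Python sketch. It reuses the samples shown (with the elided "..." portions left out), extracts the percentages, and reports the largest drop between consecutive samples.

```python
# Hypothetical helper: pull the "Memory utilization N percent" samples out of
# captured "show chassis routing-engine | grep memory | refresh" output and
# report the largest drop between consecutive samples.
import re
from typing import List

CAPTURE = """\
Memory utilization 83 percent
---(refreshed at 2018-11-30 13:33:26 CST)---
Memory utilization 83 percent
---(refreshed at 2018-11-30 13:33:44 CST)---
Memory utilization 96 percent
---(refreshed at 2018-11-30 13:33:45 CST)---
Memory utilization 96 percent
---(refreshed at 2018-11-30 13:34:02 CST)---
Memory utilization 59 percent
---(refreshed at 2018-11-30 13:34:03 CST)---
Memory utilization 59 percent
"""

SAMPLE_RE = re.compile(r"Memory utilization\s+(\d+)\s+percent")


def samples(text: str) -> List[int]:
    """Return the utilization percentages in the order they appear."""
    return [int(m.group(1)) for m in SAMPLE_RE.finditer(text)]


def largest_drop(values: List[int]) -> int:
    """Largest decrease between consecutive samples (0 if it never drops)."""
    drops = [a - b for a, b in zip(values, values[1:])]
    return max([0] + drops)


values = samples(CAPTURE)
print(values, largest_drop(values))  # [83, 83, 96, 96, 59, 59] 37
```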


- Aaron
