Mailing List Archive

JunOS 9.4R1.8 - Memory Leak?
Hi all.

Anyone notice what looks like an RE memory leak/usage growth
post a JunOS 9.4R1.8 upgrade?

System here is RE-850 in an M7i with 1.5GB of DRAM.

cflowd suspected; we've disabled it and appear to have
arrested further growth in usage.

JTAC are in the loop, still investigating.

Cheers,

Mark.
Re: JunOS 9.4R1.8 - Memory Leak?
On Fri, February 27, 2009 14:12, Mark Tinka wrote:
> Hi all.
>
> Anyone notice what looks like an RE memory leak/usage growth
> post a JunOS 9.4R1.8 upgrade?

we experienced a similar problem on various gear (MX/M/T) with 9.4R1.8.
symptoms are either rpd crashing with malloc failure, or rpd getting stuck
with 100% CPU load.

from the information we got so far, it seems to be related to IS-IS.
if you've got JTAC involved, you might want to have your case
cross-checked with PR428557 and PR424317.

BR,
-jr
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: JunOS 9.4R1.8 - Memory Leak?
On Thursday 05 March 2009 07:38:21 pm Johannes Resch wrote:

> we experienced a similar problem on various gear (MX/M/T)
> with 9.4R1.8.

We've been pulling our hair out trying to
figure out what could be causing the leak :-).

We have a case open with JTAC, but they have us trending
memory usage over a number of days, so it's going slowly.
Refreshing to hear someone else has seen this in the field.

> symptoms are either rpd crashing with
> malloc failure, or rpd getting stuck with 100% CPU load.

We haven't gotten this far, although we've seen an average
increase of CPU usage by at least 3% daily on M7i's running
with 1.5GB of DRAM on RE-850's.

We're quite surprised Juniper could have missed this, if it
is indeed a bug, as we are easily seeing this issue on a
new, fully-configured box not plugged into the network, just
running some burn tests - and memory usage creeps up slowly,
day by day.

Not loving 9.4R1.8 so far...

> from the information we got so far, it seems to be
> related to IS-IS.

Which is our IGP.

> if you've got JTAC involved, you might
> want to have your case cross-checked with PR428557 and
> PR424317.

Trying to search these up isn't turning up anything. Are
they public?

Cheers,

Mark.
Re: JunOS 9.4R1.8 - Memory Leak?
Hi,

* Mark Tinka

> Anyone notice what looks like an RE memory leak/usage growth
> post a JunOS 9.4R1.8 upgrade?
>
> System here is RE-850 in an M7i with 1.5GB of DRAM.
>
> cflowd suspected; we've disabled it and appear to have
> arrested further growth in usage.

I just had a MX 240 with 9.4R1.8 crash hard today, no trace of any
reason in the logs as far as I can see. When I got to the console it
was in some kind of a debugger, and I had to powercycle it to get it
back online ("reset" from the debugger just booted up in a single-user
shell or something like that).

We're using cflowd but not IS-IS. I have two MXes which are set up
almost identically, but the one that crashed is connected to an IX, so it
has in excess of 40 BGP peers while the other one has only 5. Possibly
that caused it to go first?

The one that hasn't crashed (yet?) has very little free memory now:

> show system processes summary
last pid: 11142; load averages: 0.00, 0.04, 0.08 up 21+07:24:22 17:42:18
116 processes: 4 running, 95 sleeping, 17 waiting

Mem: 1282M Active, 256M Inact, 120M Wired, 308M Cache, 69M Buf, 30M Free
Swap: 2048M Total, 2048M Free


  PID USERNAME  THR PRI NICE   SIZE    RES STATE     TIME   WCPU COMMAND
   11 root        1 171   52     0K    12K RUN     499.0H 91.55% idle

While the recently booted one looks much better:

> show system processes summary
last pid: 1571; load averages: 0.00, 0.03, 0.06 up 0+01:05:11 17:43:37
116 processes: 3 running, 95 sleeping, 18 waiting

Mem: 418M Active, 213M Inact, 108M Wired, 211M Cache, 69M Buf, 1046M Free
Swap: 2048M Total, 2048M Free


  PID USERNAME  THR PRI NICE   SIZE    RES STATE     TIME   WCPU COMMAND
   11 root        1 171   52     0K    12K RUN      59:48 96.19% idle

They had almost identical uptimes prior to the crash, and the last boot
was due to the upgrade to 9.4. I just opened a case with my local
support provider, haven't heard back from them yet.
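
The two "Mem:" lines above can be compared field by field. The following is a minimal Python sketch, assuming the line format shown in the output above; the helper name parse_mem_line and the regular expression are illustrative assumptions, not part of any Junos tooling.

```python
import re

# Matches fields such as "1282M Active" in the "Mem:" line of the output.
FIELD = re.compile(r"(\d+)([KMG]) (\w+)")
UNIT_TO_MB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def parse_mem_line(line):
    """Return a dict like {"Active": 1282.0, ...} with values in megabytes."""
    if not line.startswith("Mem:"):
        raise ValueError("expected a line starting with 'Mem:'")
    return {name: int(num) * UNIT_TO_MB[unit] for num, unit, name in FIELD.findall(line)}


# The two lines quoted in the message above.
old = parse_mem_line("Mem: 1282M Active, 256M Inact, 120M Wired, 308M Cache, 69M Buf, 30M Free")
new = parse_mem_line("Mem: 418M Active, 213M Inact, 108M Wired, 211M Cache, 69M Buf, 1046M Free")

# Print each field side by side with the difference (long uptime minus fresh boot).
for name in old:
    diff = old[name] - new[name]
    print(f"{name:>6}: {old[name]:7.0f}M vs {new[name]:7.0f}M  (diff {diff:+.0f}M)")
```

Most of the difference shows up in the Active and Free fields, which matches the description of one box having very little free memory left.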

I'll keep you posted if I learn more, and thanks in advance for doing
the same...

Best regards,
--
Tore Anderson
Redpill Linpro AS - http://www.redpill-linpro.com/
Tel: +47 21 54 41 27
Re: JunOS 9.4R1.8 - Memory Leak?
On Friday 06 March 2009 01:49:37 am Tore Anderson wrote:

> Hi,

Hi Tore.

Great feedback, thanks.

> We're using cflowd but not IS-IS.

I think we might have ruled out cflowd, as we discontinued
sampling but still see the memory leak, sitting at some 75%
today, from a regular 45% about two weeks back.
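
As a rough illustration of what those two figures imply, here is a small Python sketch. It assumes the growth is linear, which the thread does not establish; the function name days_until_full is hypothetical.

```python
def days_until_full(start_pct, end_pct, elapsed_days, limit_pct=100.0):
    """Linearly extrapolate how many days remain until utilisation hits limit_pct."""
    rate = (end_pct - start_pct) / elapsed_days  # percentage points per day
    if rate <= 0:
        return None  # no growth, so no projected exhaustion
    return (limit_pct - end_pct) / rate


# Figures from the message above: about 45% two weeks ago, about 75% today.
rate = (75 - 45) / 14
print(f"growth: {rate:.2f} points/day")
print(f"days until 100%: {days_until_full(45, 75, 14):.1f}")
```

Under that assumption the growth is a little over 2 percentage points per day, which would leave roughly 12 days before utilisation reaches 100%.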

We were happy with 9.3R2.8, but need to support the enhanced
CFEB's, so...

> I'll keep you posted if I learn more...

Same.

Cheers,

Mark.
Re: JunOS 9.4R1.8 - Memory Leak?
Once upon a time, Mark Tinka <mtinka@globaltransit.net> said:
> We were happy with 9.3R2.8, but need to support the enhanced
> CFEB's, so...

Funny you should mention that... I'm seeing SNMP problems on 9.3R2.8,
with both J-series and a couple of M10is.

On the J-series, I've got one in a remote POP with the prescribed USB
modem attached and a dialer interface configured for out-of-band access
(for when the backhaul fiber gets cut, like it did today). If I even
just walk ifInOctets, when it hits the dialer interface, the SNMP daemon
freezes for 10 seconds (no response to any requests). Remove the dialer
interface from the config, and SNMP works again.

My M10i problem is with the ifStackStatus table, which is supposed to
describe the relationship between interfaces (e.g. ct3-1/2/3 ->
t1-1/2/3:1 -> t1-1/2/3:1.0, fe-1/2/1 -> fe-1/2/1.99). My in-house
written management tools use this when deciding which interfaces to
monitor and display for which purposes. However, I'm getting screwed up
stacking, where the base interface (e.g. fe-1/2/1) may be listed as a
child of its own VLAN (fe-1/2/1.99). It is random, appearing on an fe
with a bunch of VLANs on one router and on a SONET interface (unit 0 is
the parent of the base) on another.
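
The inverted stacking described above can be spotted mechanically. The following is a minimal Python sketch, assuming the (parent, child) name pairs have already been built from ifStackStatus and the interface names; the function name inverted_pairs and the sample data are illustrative only, using the interface names from the examples above.

```python
import re

# A logical unit name is its base interface name plus ".<unit number>",
# e.g. "fe-1/2/1.99" is a unit of "fe-1/2/1".
UNIT_SUFFIX = re.compile(r"^(?P<base>.+)\.(?P<unit>\d+)$")


def inverted_pairs(stack):
    """Return (parent, child) pairs where the parent is a unit of the child.

    `stack` is an iterable of (parent_name, child_name) tuples, assumed to
    have been resolved to interface names beforehand.
    """
    bad = []
    for parent, child in stack:
        match = UNIT_SUFFIX.match(parent)
        if match and match.group("base") == child:
            bad.append((parent, child))
    return bad


stack = [
    ("ct3-1/2/3", "t1-1/2/3:1"),     # expected order
    ("t1-1/2/3:1", "t1-1/2/3:1.0"),  # expected order
    ("fe-1/2/1.99", "fe-1/2/1"),     # inverted: the VLAN unit appears as the parent
]
print(inverted_pairs(stack))
```

A check like this only flags the specific inversion described here (a unit listed as the parent of its own base interface); it does not validate the rest of the table.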

I have tickets open on both of these, but so far JTAC hasn't been able
to duplicate; I was just wondering if anyone else was seeing similar (or
other) SNMP "oddness" with 9.3R2.8.

--
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.
Re: JunOS 9.4R1.8 - Memory Leak?
Hey,

> I'll keep you posted if I learn more, and thanks in advance for doing
> the same...

The fine folks from nLogic and JTAC have a theory now, which is that the
leak is related to a new process introduced in 9.4 called "lpdfd". It's
not something I've been using, so disabling it was not a problem for me:

system {
    processes {
        local-policy-decision-function disable;
    }
}

The process was logging to files in the directory "/mfs/var/lpdfd",
which is mounted on a memory-backed block device. This is the actual
leak, it seems. I deleted all the logs, but surprisingly enough it
didn't cause the memory utilisation (as reported by "show chassis
routing-engine") to drop sharply. However, it seems like the router now
has stopped leaking! The memory utilisation is down to 93% - it was at
96% right before "lpdfd" was disabled and had been slowly but steadily
increasing up until that point.
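
Since files on a memory-backed filesystem consume RAM, totalling the file sizes in such a directory gives a rough idea of how much memory the logs hold. The following is a minimal Python sketch under the assumption that Python is available on a host where the directory (or a copy of it) is readable; on the router itself the shell's own tools would be used instead. The function name directory_usage is hypothetical.

```python
import os


def directory_usage(path):
    """Return (total_bytes, [(name, bytes), ...]) for regular files in path, largest first."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            # Count only regular files directly inside the directory.
            if entry.is_file(follow_symlinks=False):
                sizes.append((entry.name, entry.stat(follow_symlinks=False).st_size))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sum(size for _, size in sizes), sizes


# Example run against the current directory; substitute the log directory of interest.
total, files = directory_usage(".")
for name, size in files:
    print(f"{size:>12}  {name}")
print(f"{total:>12}  total bytes")
```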

I had disabled the process on my other MX too, but I have not yet
deleted its log files. So far it looks like the memory usage on that
one has flatlined. I suspect that the memory freed up by deleting the
files is reclaimed only when needed, and that's why I see a (small) drop
in memory utilisation only on the router where the log files were deleted.

I'll have to monitor the memory utilisation on the routers for a few
more days before I can be certain that we've nailed the bug, though, but
I'm feeling optimistic. You'll probably want to try disabling the
process yourself. Let me know how it goes!

Best regards,
--
Tore Anderson
Redpill Linpro AS - http://www.redpill-linpro.com/
Re: JunOS 9.4R1.8 - Memory Leak?
On Friday 13 March 2009 02:22:26 am Tore Anderson wrote:

> Hey,

Hi Tore.

> The fine folks from nLogic and JTAC have a theory now,
> which is that the leak is related to a new process
> introduced in 9.4 called "lpdfd".

Right; the JTAC engineer working on our case came back to us
last night to say the exact same thing - the new LPDFD
process introduced in the 9.4 code base is logging quite
heavily.

It creates a fairly large log file under '/mfs/var/lpdfd/'
every 8hrs, approximately, which, as you say, is where the
leak is occurring.

We've disabled this process too and see RE memory
utilization reduction occurring slowly but surely.

LPDFD provides logging for things like VoIP, etc.

> I'll have to monitor the memory utilisation on the
> routers for a few more days before I can be certain that
> we've nailed the bug, though, but I'm feeling optimistic.
> You'll probably want to try disabling the process
> yourself. Let me know how it goes!

--do--

Meanwhile, I'd say disabling the process is a recommended step for
anyone moving to 9.4R1.8. However, the actual bug appears to be
fixed in 9.4R3, 9.5R1, and 9.6R1 or later.

Cheers,

Mark.
Re: JunOS 9.4R1.8 - Memory Leak?
On Friday 13 March 2009 02:22:26 am Tore Anderson wrote:

> I'll have to monitor the memory utilisation on the
> routers for a few more days before I can be certain that
> we've nailed the bug, though, but I'm feeling optimistic.
> You'll probably want to try disabling the process
> yourself. Let me know how it goes!

So it looks like we've arrested further growth in RE memory
utilization. Clearly, a reboot would be needed to reclaim
what's been wasted, but what's key is that it's not running
amok anymore :-).

The question is whether the reboot will be to downgrade to
9.3R2.8, to upgrade to the next 9.4 release that fixes this
issue, or to stick with this release. We're more inclined toward
the latter two options, as 9.4 fixes the traceroute bug seen in
MPLS-based BGP-free cores.

Thanks to you, JTAC and all the others that helped in
working on this issue.

Cheers,

Mark.
Re: JunOS 9.4R1.8 - Memory Leak?
Mark,

On Thu, Mar 12, 2009 at 5:04 PM, Mark Tinka <mtinka@globaltransit.net> wrote:

>
> It creates a fairly large log file under '/mfs/var/lpdfd/'
> every 8hrs, approximately, which, as you say, is where the
> leak is occurring.
>
> We've disabled this process too and see RE memory
> utilization reduction occurring slowly but surely.
>

Interesting. /mfs/.. is essentially a RAM disk (remember what it is
:). If the file gets big, it will eventually force some of the
inactive memory into swap.

-Pasvorn
Re: JunOS 9.4R1.8 - Memory Leak?
> The question is whether the reboot will be to downgrade to
> 9.3R2.8, upgrade to the next 9.4 release that fixes this
> issue or stick to this release. We're more inclined to the
> latter options as 9.4 fixes the traceroute bug seen in MPLS-
> based BGP-free cores.

Could you expand a bit on the traceroute bug and how this is visible?

Steinar Haug, Nethelp consulting, sthaug@nethelp.no
Re: JunOS 9.4R1.8 - Memory Leak?
On Monday 16 March 2009 11:58:49 pm sthaug@nethelp.no wrote:

> Could you expand a bit on the traceroute bug and how this
> is visible?

http://www.gossamer-threads.com/lists/nsp/juniper/16700

PR/396280

JTAC say this issue is resolved in 9.3R3 for the
9.3 code base. However, as mentioned before, we need 9.4 at
a minimum to support the enhanced CFEB's.

Cheers,

Mark.
Re: JunOS 9.4R1.8 - Memory Leak?
> > Could you expand a bit on the traceroute bug and how this
> > is visible?
>
> http://www.gossamer-threads.com/lists/nsp/juniper/16700
>
> PR/396280
>
> Although JTAC say this issue is resolved in 9.3R3 for the
> 9.3 code base. However, as mentioned before, we need 9.4 at
> a minimum to support the enhanced CFEB's.

Ah yes, I remember that bug now.

We're seeing a different bug where traceroute through an MPLS core
sometimes behaves as if no-propagate-ttl was configured (but it is
not configured).

Steinar Haug, Nethelp consulting, sthaug@nethelp.no