Mailing List Archive

Apparent memory leak
The system load reported by the uptime command, on one of my servers,
periodically spikes to 20-30 and then, shortly thereafter, I see this in
dmesg:

[2887460.393402] Out of memory: Kill process 12533 (/usr/sbin/apach) score
25 or sacrifice child
[2887460.394880] Killed process 12533 (/usr/sbin/apach) total-vm:476432kB,
anon-rss:204480kB, file-rss:0kB

Several gigs of memory then become available and the system load quickly
returns to normal. I'm pretty sure it's a mod_perl process that's doing
this, but I'm not entirely sure how to track down the problem.

How would you guys approach this problem?

--
John Dunlap
CTO | Lariat

Direct:
john@lariat.co

Customer Service:
877.268.6667
support@lariat.co
Re: Apparent memory leak
Hi John,

The key is usually finding out which request caused it. You can add the PID
to your access logging, or write a more complete mod_perl handler that logs
the full request data along with the PID. Then, once you see which process
was killed, you go back and look at what it was handling.
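
For the access-log route, a minimal sketch (the format nickname and the log
path are just examples; %P is Apache's built-in child-PID field):

    # httpd.conf -- tag every access-log entry with the worker PID
    LogFormat "%h %l %u %t \"%r\" %>s %b pid=%P" combined_pid
    CustomLog /var/log/apache2/access_pid.log combined_pid

When dmesg reports a killed PID, grep that log for pid=12533 (or whatever the
number is) and look at the last few requests that worker served.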

- Perrin

Re: Apparent memory leak
My fear with logging the full request data is that it will make the problem
worse for my customers, because this only happens on heavily loaded servers.
I can't reproduce it locally.

--
John Dunlap
Re: Apparent memory leak
If this is a memory leak, won't the last request sent to the mod_perl worker
process just be the last straw, and not necessarily the culprit? What if the
leak is in some library code that's used by every request?

--
John Dunlap
Re: Apparent memory leak
Sorry, I should have explained what I meant better. You would add a handler
BEFORE the request reaches your regular application, so you capture the
details of the request that dies. I misremembered about the access log. I
was actually thinking of a custom C module I once used that did this type of
thing -- it logged the request at an early stage, before handling it. But you
can do the same with mod_perl.
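
A rough sketch of such an early handler for mod_perl 2 (the package name
MyApp::LogRequest is just a placeholder; the .pm file has to be somewhere in
@INC):

    # httpd.conf
    PerlPostReadRequestHandler MyApp::LogRequest

    # MyApp/LogRequest.pm
    package MyApp::LogRequest;

    use strict;
    use warnings;

    use Apache2::RequestRec ();                  # $r->the_request
    use Apache2::Log ();                         # $r->log_error
    use Apache2::Const -compile => qw(DECLINED);

    sub handler {
        my $r = shift;
        # Write the worker PID and the raw request line to the error log
        # before the application runs, so the last entry for a PID that the
        # OOM killer reports shows what that worker was working on.
        $r->log_error(sprintf 'pid=%d request=%s', $$, $r->the_request);
        return Apache2::Const::DECLINED;         # let normal handling continue
    }

    1;

Because the post-read-request phase runs before any of your application's own
handlers, even a request that later blows up the process still gets recorded.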

This does not sound like a memory leak to me. Leaks are slow and can be
handled with something like Apache::SizeLimit. This sounds like some of your
Perl code is loading too much into memory for some reason, like reading a
giant file or database result set all at once. And that points to a specific
type of request, or specific parameters, being the problem.
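
For reference, under mod_perl 2 that safety valve looks roughly like this
(Apache2::SizeLimit; the ~200 MB cap is just an example value):

    # startup.pl
    use Apache2::SizeLimit;
    # Retire a child after its current request finishes if its total
    # process size exceeds this many KB (~200 MB here).
    Apache2::SizeLimit->set_max_process_size(204_800);

    # httpd.conf
    PerlCleanupHandler Apache2::SizeLimit

That caps slow growth between requests, but it won't stop a single request
that tries to pull a huge file or result set into memory all at once.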
