
any ideas on this 0.8.16 problem?
+1 on releasing 1.0.0. It has been running on our server for the
weekend with no problems.

However, I was having problems with 0.8.16, which crashed our server
machine on Wednesday and drove the root partition to an early grave. :(
I am hoping that this is not a general problem, but here it is in case
anyone has seen something similar or can find the cause. Maybe one of
you folks with a serious test setup can reproduce it.

Start: Some broken client accesses a (foolish undergrad) user homepage:

=====access_log==============================================================
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:12 -0800] "GET /~jtrinida HTTP/1.0" 302 -
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:15 -0800] "GET /~jtrinida/ HTTP/1.0" 200 4063
=========
At this point, the client ignores the newly redirected URL and starts
to retrieve the 68 (!!!bloody idiot!!!) in-line images on that page.
Since it resolves the relative image references against the original
/~jtrinida rather than the redirected /~jtrinida/, every one of those
requests lands in the document root and 404s.
=========
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:17 -0800] "GET /embars.gif HTTP/1.0" 404 -
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:17 -0800] "GET /W.GIF HTTP/1.0" 404 -
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:18 -0800] "GET /E.GIF HTTP/1.0" 404 -
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:18 -0800] "GET /L.GIF HTTP/1.0" 404 -
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:19 -0800] "GET /C.GIF HTTP/1.0" 404 -
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:19 -0800] "GET /O.GIF HTTP/1.0" 404 -
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:20 -0800] "GET /M.GIF HTTP/1.0" 404 -
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:20 -0800] "GET /E.GIF HTTP/1.0" 404 -
=========
... until
=========
hcfpath.hnet.uci.edu - - [22/Nov/1995:13:07:56 -0800] "GET /love.gif HTTP/1.0" 404 -
=========
Note that this is 68 "404 Not Found" responses (standard error message,
nothing fancy) in less than 40 seconds. Since I have Multiviews on for
all directories, each failure also invokes the multiview code. My
httpd.conf:

MinSpareServers 2
MaxSpareServers 4
MaxClients 40
MaxRequestsPerChild 60

My current theory is that the high request rate, combined with a severe
memory leak somewhere (multiviews?), is causing a server memory blowout.
Interestingly, the first "Unable to fork new process" occurs immediately
after the 60th request. I am going to reduce MaxRequestsPerChild to 40,
just in case.
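To make the timing concrete: a child exits after serving its
MaxRequestsPerChild'th request and the parent then forks a replacement,
so a per-request leak gets 60 chances to accumulate before that fork is
attempted. A rough sketch of the lifecycle, from memory rather than the
actual 0.8.16 source (accept_and_serve_one_request is a stand-in for
the real request loop):

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_REQUESTS_PER_CHILD 60        /* the value in my httpd.conf */

extern void accept_and_serve_one_request(void);   /* stand-in */

static void child_main(void)
{
    int served;

    for (served = 0; served < MAX_REQUESTS_PER_CHILD; served++)
        accept_and_serve_one_request();  /* a per-request leak repeats 60x */
    _exit(0);                            /* recycle: leaked memory is freed */
}

static void make_child(void)
{
    pid_t pid = fork();

    if (pid == -1) {                     /* the error_log case below */
        fprintf(stderr, "Unable to fork new process\n");
        return;
    }
    if (pid == 0)
        child_main();
}

Dropping the limit to 40 just shrinks how much any one child can leak
before it is recycled.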

It is possible that this is just the straw that broke the camel's back.
I haven't seen the same since we replaced the root hard disk and upgraded
to 1.0.0, but I haven't been able to test it either: I blocked all access
to the ugrad's webspace (they are forbidden to display such stupidity as
68 worthless inlined images in a home page on the department webserver),
and I don't know of any multithreaded clients broken enough to ignore
the redirected URL and cause the huge number of 404 responses.
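For the curious, the trailing slash is the whole game here: a relative
reference resolves against everything up to and including the last
slash of the base URL. A toy illustration (resolve is a made-up helper,
not code from any real client):

#include <stdio.h>
#include <string.h>

/* Join a relative reference onto a base path by replacing
 * everything after the base's last '/'. */
static void resolve(const char *base, const char *rel,
                    char *out, size_t outlen)
{
    const char *slash = strrchr(base, '/');
    int keep = slash ? (int)(slash - base) + 1 : 0;

    snprintf(out, outlen, "%.*s%s", keep, base, rel);
}

int main(void)
{
    char buf[256];

    resolve("/~jtrinida", "embars.gif", buf, sizeof buf);
    printf("%s\n", buf);   /* /embars.gif: the 404s above          */

    resolve("/~jtrinida/", "embars.gif", buf, sizeof buf);
    printf("%s\n", buf);   /* /~jtrinida/embars.gif: the real file */
    return 0;
}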

===================================================================
The error_log:

[Wed Nov 22 13:07:17 1995] access to /dc/ud/www/documentroot/embars.gif failed for hcfpath.hnet.uci.edu, reason: File does not exist
[Wed Nov 22 13:07:17 1995] access to /dc/ud/www/documentroot/W.GIF failed for hcfpath.hnet.uci.edu, reason: File does not exist
[Wed Nov 22 13:07:18 1995] access to /dc/ud/www/documentroot/E.GIF failed for hcfpath.hnet.uci.edu, reason: File does not exist
========
...
========
[Wed Nov 22 13:07:53 1995] access to /dc/ud/www/documentroot/A.GIF failed for hcfpath.hnet.uci.edu, reason: File does not exist
[Wed Nov 22 13:07:53 1995] Unable to fork new process
[Wed Nov 22 13:07:53 1995] access to /dc/ud/www/documentroot/G.GIF failed for hcfpath.hnet.uci.edu, reason: File does not exist
[Wed Nov 22 13:07:54 1995] access to /dc/ud/www/documentroot/E.GIF failed for hcfpath.hnet.uci.edu, reason: File does not exist
Out of memory!
[Wed Nov 22 13:07:54 1995] access to /dc/ud/www/documentroot/happy.gif failed for hcfpath.hnet.uci.edu, reason: File does not exist
[Wed Nov 22 13:07:55 1995] access to /dc/ud/www/documentroot/sail.gif failed for hcfpath.hnet.uci.edu, reason: File does not exist
[Wed Nov 22 13:07:55 1995] Unable to fork new process
========
...
========
[Wed Nov 22 13:08:04 1995] access to /dc/ud/www/documentroot/8_ball.gif failed for hcfpath.hnet.uci.edu, reason: File does not exist
[Wed Nov 22 13:08:04 1995] access to /dc/ud/www/documentroot/island.gif failed for hcfpath.hnet.uci.edu, reason: File does not exist
[Wed Nov 22 13:08:05 1995] Unable to fork new process
[Wed Nov 22 13:08:06 1995] Unable to fork new process
[Wed Nov 22 13:08:07 1995] Unable to fork new process
... until machine dies
===================================================================

Note that the bare "Out of memory!" message is abnormal; unlike every
other entry it carries no timestamp, so it doesn't appear to have come
through the normal error-logging routine.

.....Roy

Re: any ideas on this 0.8.16 problem?
> +1 on releasing 1.0.0. It has been running on our server for the
> weekend with no problems.
>
> However, I was having problems with 0.8.16, which crashed our server
> machine on Wednesday and drove the root partition to an early grave. :(
> I am hoping that this is not a general problem, but here it is in case
> anyone has seen something similar or can find the cause. Maybe one of
> you folks with a serious test setup can reproduce it.

It might be the same problem I was seeing.

I originally dismissed lack of memory, but there doesn't seem to be
anything else that could cause it.

It's possible that when the Cardiff machine runs out of memory, Apache
fails to check for the allocation failure, or (more likely) fails to
react to it correctly, and sends itself into a downward spiral to
certain death: a massive writing spree to error_log.
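My guess at the shape of that spiral, assuming the parent's maintenance
loop retries a failed fork immediately and logs every attempt (this is
speculation, not the actual code; count_idle_children and
serve_requests_and_exit are stand-ins):

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

extern int  count_idle_children(void);       /* stand-in */
extern void serve_requests_and_exit(void);   /* stand-in */

static void maintenance_loop(int min_spare_servers)
{
    pid_t pid;

    for (;;) {
        if (count_idle_children() >= min_spare_servers) {
            sleep(1);
            continue;
        }
        pid = fork();
        if (pid == -1) {
            /* No sleep, no backoff: under sustained memory
             * pressure this one line repeats until the partition
             * holding error_log is full. */
            fprintf(stderr, "Unable to fork new process\n");
            continue;
        }
        if (pid == 0)
            serve_requests_and_exit();
    }
}

If the real loop has no backoff like this, running out of memory turns
into log writes at full speed, which would also fit the way Roy's root
partition died.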

> Since I have Multiviews on for all
> directories, each failure also invokes the multiview code. My httpd.conf:
>
> MinSpareServers 2
> MaxSpareServers 4
> MaxClients 40
> MaxRequestsPerChild 60
>
> My current theory is that the high request rate, combined with a severe
> memory leak somewhere (multiviews?),

Unless it is switched on by default, I don't use it at Cardiff, so that
should eliminate that part of the code.

> is causing a server memory blowout.
> Interestingly, the first "Unable to fork new process" occurs immediately
> after the 60th request. I am going to reduce MaxRequestsPerChild to 40,
> just in case.

I think mine's down to 15.

I still get one of these "blowouts" every day or two.


Are there any cases where an Apache child process decides to continue
after a failed request for memory? If there are, shouldn't the action
be to do an immediate log and die?
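Something like this, assuming every allocation in the child went
through a wrapper (die_malloc is my name for it; a proposal sketch,
not anything in the current tree):

#include <stdio.h>
#include <stdlib.h>

/* One log line, then die. The exiting child hands its (leaked)
 * memory back to the system and the parent can fork a fresh copy,
 * instead of the child limping on and spraying follow-on errors
 * into error_log. */
static void *die_malloc(size_t nbytes)
{
    void *p = malloc(nbytes);

    if (p == NULL) {
        fprintf(stderr, "httpd: child out of memory, exiting\n");
        exit(1);
    }
    return p;
}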



rob