
found it!
Okay, file descriptors leak when:

a) the parent is sent a kill -HUP, and the children respawn: each child
suddenly has another fd pointing to the main access log. I have cron
send a kill -HUP to the parent every 15 minutes to refresh the
children, and over time (like clockwork! :) this was increasing the
number of open fds. Only a kill -9 on the parent frees them.
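A minimal C sketch of one way to reopen a log on SIGHUP without piling
up descriptors. It assumes (the message does not confirm this) that the
leak comes from reopening the access log without closing the previous
descriptor. access_log_fd, reopen_access_log() and install_hup_handler()
are hypothetical names, not the server's actual code. The new file is
dup2()'d onto the existing descriptor number, which closes the old file
in the same step.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical: the descriptor the server writes access-log entries to. */
static int access_log_fd = -1;

/* Set by the SIGHUP handler and checked by the main loop. The handler
 * only sets a flag, which keeps it async-signal-safe. */
static volatile sig_atomic_t reopen_requested = 0;

static void on_sighup(int sig)
{
    (void)sig;
    reopen_requested = 1;
}

/* Open (or reopen) the access log. On a reopen, the fresh descriptor is
 * duplicated onto the old descriptor number (dup2 closes the old file
 * first) and the temporary descriptor is closed, so repeated HUPs never
 * increase the number of open descriptors. */
static int reopen_access_log(const char *path)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (access_log_fd < 0) {
        access_log_fd = fd;   /* first open: keep the new descriptor */
        return 0;
    }
    if (dup2(fd, access_log_fd) < 0) {
        close(fd);            /* don't leak the temporary on failure */
        return -1;
    }
    close(fd);                /* log is now reachable only via access_log_fd */
    return 0;
}

/* Install on_sighup as the SIGHUP handler. */
static int install_hup_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGHUP, &sa, NULL);
}
```

Because the descriptor number never changes, children forked after a
reopen inherit exactly one log descriptor. The main loop would call
reopen_access_log() (and clear reopen_requested) when the flag is set,
rather than doing file I/O inside the handler.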

b) a connection is "lost": in Netscape, this is when one page is being
accessed and you jump to another page before it has finished loading.
Netscape must do a different type of disconnect in this case than when
you hit the stop button, which doesn't cause an fd leak.

The first is the more important one to squash. I'll try looking for
it; hopefully someone who knows the code better will beat me to it :)

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: found it!
>
>
> Okay, file descriptors leak when:
>
> a) the parent is sent a kill -HUP, and the children respawn: each child
> suddenly has another fd pointing to the main access log. I have cron
> send a kill -HUP to the parent every 15 minutes to refresh the
> children, and over time (like clockwork! :) this was increasing the
> number of open fds. Only a kill -9 on the parent frees them.
>
> b) a connection is "lost": in Netscape, this is when one page is being
> accessed and you jump to another page before it has finished loading.
> Netscape must do a different type of disconnect in this case than when
> you hit the stop button, which doesn't cause an fd leak.
>
> The first is the more important one to squash. I'll try looking for
> it; hopefully someone who knows the code better will beat me to it :)

I don't remember the code well, but I think in 0.7.2 a "lost" or
"timed out" connection doesn't kill the child, whereas in 0.7.3 the
child calls exit(0). So the problem *should* go away.
To fix 0.7.2, add an exit(0) call in "send_fd_timed_out".
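A minimal C sketch of that fix, assuming send_fd_timed_out is installed
as a SIGALRM handler (the message does not say how the timeout is
delivered). The message suggests exit(0); inside a signal handler POSIX
only guarantees _exit() to be async-signal-safe, so the sketch uses
_exit(0). arm_send_timeout() and demo_lost_connection() are hypothetical
helpers. Once the child exits, the kernel closes all of its descriptors,
which is why exiting stops the leak.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the server's timeout handler: instead of
 * returning and leaving the child alive with its descriptors open, it
 * terminates the child. _exit() is used because it is safe to call
 * from a signal handler; exit() is not. */
static void send_fd_timed_out(int sig)
{
    (void)sig;
    _exit(0);
}

/* Install the handler and schedule SIGALRM after `seconds`. */
static int arm_send_timeout(unsigned int seconds)
{
    struct sigaction sa;
    sa.sa_handler = send_fd_timed_out;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;
    alarm(seconds);
    return 0;
}

/* Demo: fork a child that blocks forever, standing in for a send to a
 * client that has gone away. The alarm fires, the handler exits, and
 * the parent reaps the child. Returns the child's exit status, or -1. */
static int demo_lost_connection(unsigned int seconds)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (arm_send_timeout(seconds) < 0)
            _exit(2);
        for (;;)
            pause();          /* never returns: the handler exits */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A status of 0 from demo_lost_connection() means the timed-out child
terminated through the handler instead of lingering.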

I'm having great fun (not) fixing bugs on a live server at Cardiff.
The connection to the UK is slow (what's new?), and the only way I can
start a new server is with a reboot command the sysadmin gave me for
emergencies :-)
God knows what the users think is going on... When a big problem occurs,
every connection can die with SIGSEGV, so first they get a series of
empty documents, then the server just disappears for 5 minutes...

I'll be at 0.7.3z by the end of the night :-)

--
http://nqcd.lanl.gov/~hartill/