Mailing List Archive

things to look for in runaway server?
www.links.net is running 0.8.11, is a BSDI 2.0.1 box, and gets about 350K
hits/day with peaks of 350/minute. It's for the most part stable,
with 30-50 children in the pool at peak, but every now and then the
server goes wild and creates as many children as possible, wedging the
machine, slowing down requests (though they still get out). Since I'm
not always on the machine, I can't see what's going on when it gets
wedged, so we're trying to develop a script that runs every two minutes
and takes a snapshot of the system. Given that this should include a
load average, a number of concurrent httpd's, and the output of a vmstat
and iostat, is there anything else it should look for?

The logfiles indicate that reverse DNS stopped working (so the log files
just had IP numbers in them) and every 50 hits had a listing like

206.14.10.76 - - [13/Sep/1995:13:26:23 -0700] "NULL" 200 920

Thoughts?

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: things to look for in runaway server?
> and takes a snapshot of the system. Given that this should include a
> load average, a number of concurrent httpd's, and the output of a vmstat
> and iostat, is there anything else it should look for?

You might find a 'netstat -s' interesting as well.
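
The snapshot described in this thread (load average, number of concurrent httpd's, vmstat, iostat, plus the suggested 'netstat -s') could be collected roughly as follows. This is only a sketch under assumptions that are not in the original messages: the server process is assumed to be named "httpd", `ps -A -o comm=` is assumed to work on the target system, and the helper names `run`, `count_named` and `snapshot` are made up for illustration.

```python
import os
import shutil
import subprocess
import time


def run(cmd):
    """Return a command's output, or a short note if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return "(%s not found)" % cmd[0]
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "(%s failed: %s)" % (cmd[0], exc)
    return done.stdout.strip() or done.stderr.strip()


def count_named(ps_output, name="httpd"):
    """Count lines of `ps -o comm=` output whose command name equals `name`."""
    return sum(
        1
        for line in ps_output.splitlines()
        if os.path.basename(line.strip()) == name
    )


def snapshot():
    """Build one text snapshot of the system state."""
    try:
        load = "%.2f %.2f %.2f" % os.getloadavg()
    except (OSError, AttributeError):
        load = "(unavailable)"
    parts = [
        "=== " + time.strftime("%Y-%m-%d %H:%M:%S") + " ===",
        "load average: " + load,
        "httpd processes: %d" % count_named(run(["ps", "-A", "-o", "comm="])),
    ]
    for cmd in (["vmstat"], ["iostat"], ["netstat", "-s"]):
        parts.append("--- " + " ".join(cmd) + " ---")
        parts.append(run(cmd))
    return "\n".join(parts)


if __name__ == "__main__":
    print(snapshot())
```

Run from cron every two minutes with the output appended to a log file, this would leave a record to inspect after the machine has wedged. Appending rather than overwriting keeps the snapshots from just before the runaway, which are likely the most useful ones.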
Re: things to look for in runaway server?
>
>
> www.links.net is running 0.8.11, is a BSDI 2.0.1 box, and gets about 350K
> hits/day with peaks of 350/minute. It's for the most part stable,
> with 30-50 children in the pool at peak, but every now and then the
> server goes wild and creates as many children as possible, wedging the
> machine, slowing down requests (though they still get out). Since I'm
> not always on the machine, I can't see what's going on when it gets
> wedged, so we're trying to develop a script that runs every two minutes
> and takes a snapshot of the system. Given that this should include a
> load average, a number of concurrent httpd's, and the output of a vmstat
> and iostat, is there anything else it should look for?
>
> The logfiles indicate that reverse DNS stopped working (so the log files
> just had IP numbers in them) and every 50 hits had a listing like
>
> 206.14.10.76 - - [13/Sep/1995:13:26:23 -0700] "NULL" 200 920

I'll wager that setting your MaxClients lower will solve this (at the cost
of locking people out occasionally).


--
Ben Laurie Phone: +44 (181) 994 6435
Freelance Consultant Fax: +44 (181) 994 6472
and Technical Director Email: ben@algroup.co.uk (preferred)
A.L. Digital Ltd, benl@fear.demon.co.uk (backup)
London, England.

[Note for the paranoid: "fear" as in "Fear and Loathing
in Las Vegas", "demon" as in Demon Internet Services, a
commercial Internet access provider.]
Re: things to look for in runaway server?
Hmmm... at the times when it runs away, the nameserver stops reliably
serving up hostnames. If, at the same time, it is *also* taking longer
to produce no answer, then that would explain the runaway forking and
the delays seen by clients (each server that accepts a connection waits
several seconds for the name server --- since connections are coming in
at about the same rate as normal, that means dramatically more requests
are "in process" at any given time, corresponding to the increase in
service time, which in turn accounts for the excess forking).

This is, at best, an educated guess, but there is a way to test it (though
it's not free of potentially noxious side effects) --- if I'm right, then
compiling the server with -DMINIMAL_DNS will eliminate the delays and fork
bombs...

rst
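
The reasoning in the message above (same arrival rate, longer service time, so more requests in process) is the relationship usually called Little's law: average requests in process = arrival rate × average time per request. A small worked sketch follows. Only the 350/minute peak rate comes from the original post; the per-request times are hypothetical values chosen for illustration, not measurements from this server.

```python
def requests_in_flight(arrivals_per_minute, seconds_per_request):
    """Little's law: average concurrency = arrival rate * time in system."""
    return arrivals_per_minute / 60.0 * seconds_per_request


PEAK_PER_MINUTE = 350  # peak hit rate quoted in the original post

# Hypothetical per-request times: one without a stalled DNS lookup,
# one where each request also waits several seconds on the name server.
for label, seconds in [("lookup answers quickly", 1.0),
                       ("lookup stalls", 10.0)]:
    busy = requests_in_flight(PEAK_PER_MINUTE, seconds)
    print("%-24s %5.1f s/request -> about %.0f requests in process"
          % (label, seconds, busy))
```

Under these assumed numbers, a tenfold increase in time per request means roughly ten times as many children are needed to keep up with the same traffic, which is consistent with the forking behaviour described.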
Re: things to look for in runaway server?
[server brings its friends over for a party while you're away and trashes
the house]

> The logfiles indicate that reverse DNS stopped working (so the log files
> just had IP numbers in them) and every 50 hits had a listing like
>
> 206.14.10.76 - - [13/Sep/1995:13:26:23 -0700] "NULL" 200 920
>
> Thoughts?

A fork bombing server is symptomatic of damage to the /tmp/scoreboard file.
In a worst case scenario (already covered in mail last month) we get the
same effect when the 'contents' of the /tmp file are zeroed. If you've got
a root cron running about zapping /tmp files then this could be a contender.


> Brian

Ay.
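
One way to test the scoreboard theory from the message above is to record the state of /tmp/scoreboard alongside each periodic snapshot. The sketch below only distinguishes a missing, empty, all-zero, or non-zero file. It assumes nothing about the scoreboard's internal layout, and the function name `scoreboard_state` is made up for illustration. Whether a healthy scoreboard can legitimately be all zero bytes is not established in this thread, so an "all zero bytes" result should be read as a hint rather than proof of damage.

```python
def scoreboard_state(path="/tmp/scoreboard"):
    """Classify the file at `path` without interpreting its contents."""
    try:
        with open(path, "rb") as fh:
            data = fh.read()
    except FileNotFoundError:
        return "missing"
    except OSError as exc:
        return "unreadable: %s" % exc
    if not data:
        return "empty"
    if not any(data):  # every byte is 0
        return "all zero bytes"
    return "has non-zero data"


if __name__ == "__main__":
    print("scoreboard:", scoreboard_state())
```

If the state changes to "missing" or "all zero bytes" shortly before a runaway, and the timing lines up with a cron job that cleans /tmp, that would support this explanation.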