Mailing List Archive

ANNOUNCE: Apache::Watchdog::RunAway 0.1
The uploaded file

Apache-Watchdog-RunAway-0.01.tar.gz

has entered CPAN as

file: $CPAN/authors/id/S/ST/STAS/Apache-Watchdog-RunAway-0.01.tar.gz
size: 6565 bytes
md5: 6a8175e1c91c44b65e18e72e5fcec0c1

Please allow a few hours for the mirror sites to pick up this new file.

(note: this is a module that I've previously called Apache::SafeHang)

NAME
Apache::Watchdog::RunAway - a monitor for hanging processes

SYNOPSIS
stop_monitor();
start_monitor();
start_detached_monitor();

$Apache::Watchdog::RunAway::TIMEOUT = 0;
$Apache::Watchdog::RunAway::POLLTIME = 60;
$Apache::Watchdog::RunAway::DEBUG = 0;
$Apache::Watchdog::RunAway::LOCK_FILE = "/tmp/safehang.lock";
$Apache::Watchdog::RunAway::LOG_FILE = "/tmp/safehang.log";
$Apache::Watchdog::RunAway::SCOREBOARD_URL = "http://localhost/scoreboard";

DESCRIPTION
This module monitors Apache/mod_perl processes for hangs. You define the
time in seconds after which a process is counted as hanging. You also
control the polling interval between checks.

When a process is considered 'hanging' it is killed and the event is
logged to a log file. The log file is opened in append mode, so you can
point it at the same log file that Apache uses.

You can start this process from startup.pl or through any other method
(e.g. a crontab). Once started it runs indefinitely, until killed.

You cannot start a new monitoring process before you kill the old one.
The lockfile will prevent you from doing that.
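
The lock file behaviour can be sketched roughly as follows. This is not
the module's actual code, only an illustration of one common way to do
it: the helper names `acquire_lock', `locked_pid' and `release_lock' are
hypothetical, and the use of an exclusive create (O_EXCL) is an
assumption about how such a lock could be taken.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Hypothetical helper: create the lock file atomically and store our PID.
# Returns 1 on success, 0 if the lock file already exists.
sub acquire_lock {
    my ($lock_file) = @_;
    # O_EXCL makes the open fail if the file is already there
    sysopen(my $fh, $lock_file, O_WRONLY | O_CREAT | O_EXCL) or return 0;
    print {$fh} "$$\n";
    close $fh or die "Cannot close $lock_file: $!";
    return 1;
}

# Hypothetical helper: read the PID stored in the lock file
# (undef if the file is missing or empty).
sub locked_pid {
    my ($lock_file) = @_;
    open(my $fh, '<', $lock_file) or return;
    my $pid = <$fh>;
    close $fh;
    return unless defined $pid;
    chomp $pid;
    return $pid;
}

# Hypothetical helper: remove the lock file. Returns 1 if it was removed.
sub release_lock {
    my ($lock_file) = @_;
    return unlink($lock_file) ? 1 : 0;
}
```

A second call to `acquire_lock' on the same path returns 0 until
`release_lock' has been called, which is the "cannot start a new monitor
before the old one is stopped" behaviour in miniature.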

Generally you should use the `amprapmon' program that is bundled with
this module's distribution package, but you can write your own code
using the module as well. See the amprapmon manpage for more info about
it.

Methods:

* stop_monitor()
Stops the process based on the PID in the lock file and removes the
lock file.

* start_monitor()
Starts the monitor in the current process and creates the lock file.

* start_detached_monitor()
Starts the monitor in a forked process (used by `amprapmon') and
creates the lock file.

WARNING
This is an alpha version of the module, so use it only after testing it
on a development machine.

The most critical parameter is the value of
*$Apache::Watchdog::RunAway::TIMEOUT* (see CONFIGURATION), because
processes are killed without waiting for them to quit (they are assumed
to be hung).

CONFIGURATION
Install and configure the `Apache::Scoreboard' module:

<Location /scoreboard>
SetHandler perl-script
PerlHandler Apache::Scoreboard::send
order deny,allow
# deny from all
# allow from ...
</Location>

Configure the Apache::Watchdog::RunAway parameters:

$Apache::Watchdog::RunAway::TIMEOUT = 0;

The time in seconds after which the process is considered hanging. 0
means deactivated. The default is 0 (deactivated).

$Apache::Watchdog::RunAway::POLLTIME = 60;

Polling interval in seconds. The default is 60.

$Apache::Watchdog::RunAway::DEBUG = 0;

Debug mode (0 or 1). The default is 0.

$Apache::Watchdog::RunAway::LOCK_FILE = "/tmp/safehang.lock";

The process lock file location. The default is */tmp/safehang.lock*

$Apache::Watchdog::RunAway::LOG_FILE = "/tmp/safehang.log";

The log file location. Since the module flocks the file, you can safely
use the same log file that Apache uses, so you will get the messages
about killed processes in the file you are used to. The default is
*/tmp/safehang.log*
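
A minimal sketch of appending to a log file under an exclusive lock,
for readers who want to see the idea spelled out. It is an illustration
only, not the module's own logging code; the helper name `log_event' and
the timestamp format are assumptions.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);

# Hypothetical helper: append one line to a shared log file while holding
# an exclusive lock, so concurrent writers do not interleave their output.
sub log_event {
    my ($log_file, $message) = @_;
    open(my $fh, '>>', $log_file) or die "Cannot open $log_file: $!";
    flock($fh, LOCK_EX) or die "Cannot lock $log_file: $!";
    seek($fh, 0, SEEK_END);    # someone may have appended while we waited
    print {$fh} '[' . scalar(localtime) . "] $message\n";
    close $fh or die "Cannot close $log_file: $!";    # close drops the lock
    return 1;
}
```

For example, `log_event("/tmp/safehang.log", "killed pid 1234")' adds one
timestamped line to the end of the file.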

$Apache::Watchdog::RunAway::SCOREBOARD_URL = "http://localhost/scoreboard";

The monitor relies on a scoreboard URL, which returns a binary image
containing the status of the server and its children, so you must
specify it. The URL can point to any of your machines, which lets you
run the monitor on one machine while the server runs on another. The
default URI is *http://localhost/scoreboard*.

Start the monitoring process either with:

start_detached_monitor()

that starts the monitor in a forked process or

start_monitor()

that starts the monitor in the current process.

Stop the process with:

stop_monitor()

The distribution comes with the `amprapmon' program, which provides an
rc.d-like (or apachectl-like) interface.

Instead of using a Perl interface you can start it from the command
line:

amprapmon start

or from the *startup.pl* file:

system "amprapmon start";

or

system "amprapmon stop";
system "amprapmon start";

or

system "amprapmon restart";

As mentioned before, once started it shouldn't be killed, so you may
leave only the `system "amprapmon start";' in the *startup.pl*.

You can start the `amprapmon' program from crontab as well.

TUNING
The most important part of the configuration is choosing the right
timeout (the $Apache::Watchdog::RunAway::TIMEOUT parameter). Try the
following code, which hangs on purpose: if the monitor is running, you
should see the process killed after the timeout.

my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
$r->rflush;
my $i = 0;
# never exits: the child stays on this one request until it is killed
while (1) {
    $r->print("\0");
    $r->rflush;
    $i++;
    sleep 1;
}

TROUBLESHOOTING
The module relies on a correctly configured `/scoreboard' location URI.
If it cannot fetch the URI, it quietly assumes that the server is
stopped. So either check manually that the `/scoreboard' location URI
is working, or use the test script above, which hangs, to make sure it
works.
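
One way to do the manual check from Perl is sketched below. It uses
HTTP::Tiny, which is my choice for the example and not something the
module is known to use. The helper name `scoreboard_reachable' is
hypothetical. The check only confirms that the URL answers with a
non-empty body; it does not decode the scoreboard image.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: return 1 if the URL answers with a 2xx status and
# a non-empty body, 0 otherwise (including connection failures).
sub scoreboard_reachable {
    my ($url, $timeout) = @_;
    my $http = HTTP::Tiny->new(timeout => $timeout || 10);
    my $res  = $http->get($url);
    return ($res->{success} && length $res->{content}) ? 1 : 0;
}
```

For example, `scoreboard_reachable("http://localhost/scoreboard")' should
return 1 when the `/scoreboard' location is configured and the server is
up.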

Enable debug mode for more information.

PREREQUISITES
You need to have Apache::Scoreboard installed and configured in
*httpd.conf*.

BUGS
What is this?

SEE ALSO
the Apache manpage, the mod_perl manpage, the Apache::Scoreboard manpage

AUTHORS
Stas Bekman <sbekman@iname.com>

COPYRIGHT
Apache::Watchdog::RunAway is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.


Enjoy!

_______________________________________________________________________
Stas Bekman mailto:sbekman@iname.com http://www.stason.org/stas
Perl,CGI,Apache,Linux,Web,Java,PC http://www.stason.org/stas/TULARC
perl.apache.org modperl.sourcegarden.org perlmonth.com perl.org
single o-> + single o-+ = singlesheaven http://www.singlesheaven.com
Re: ANNOUNCE: Apache::Watchdog::RunAway 0.1
Stas Bekman <sbekman@iname.com> writes:

> DESCRIPTION
> A module that monitors hanging Apache/mod_perl processes. You define the
> time in seconds after which the process to be counted as hanging. You
> also control the polling time between check to check.

Maybe your documentation should actually describe what constitutes "hanging"
and how the watchdog actually detects a hanging process? I'm really curious to
know if this would address some of our problems but I'm not sure exactly which
problems it's designed to solve.

I have tried to look at the source but I suspect most people wouldn't and it
wasn't immediately obvious to me what was going on for that matter.

--
greg
Re: ANNOUNCE: Apache::Watchdog::RunAway 0.1
On 29 Feb 2000, Greg Stark wrote:

>
> Stas Bekman <sbekman@iname.com> writes:
>
> > DESCRIPTION
> > A module that monitors hanging Apache/mod_perl processes. You define the
> > time in seconds after which the process to be counted as hanging. You
> > also control the polling time between check to check.
>
> Maybe your documentation should actually describe what constitutes
> "hanging" and how the watchdog actually detects a hanging process? I'm
> really curious to know if this would address some of our problems but
> I'm not sure exactly which problems it's designed to solve.
>
> I have tried to look at the source but I suspect most people wouldn't
> and it wasn't immediately obvious to me what was going on for that
> matter.

Sure, that's version 0.1, so both the code and the documentation are in
the alpha stage, but it works :)

The process is considered hanging or run-away when it serves a single
request for more than X seconds, where X is the threshold that you define.
I know when it's the same request by looking at the counters of each
server and keeping track of the last counter. The difference from other
modules, which kill processes based on resource consumption limits, is
that this module watches the clock: when it ticks for too long, something
is wrong and the process should be stopped.
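
To make the counter idea concrete, here is a small self-contained
sketch. It is not the module's code: the helper name `find_runaways',
the `%seen' bookkeeping hash and the shape of the input (a hash mapping
each busy child's PID to its request counter) are assumptions made for
illustration.

```perl
use strict;
use warnings;

# Hypothetical bookkeeping: pid => { count => last seen request counter,
#                                    since => time that counter first appeared }
my %seen;

# Given one poll result (pid => request counter of each busy child), the
# current time and the timeout, return the sorted PIDs that have been on
# the same request for longer than the timeout.
sub find_runaways {
    my ($snapshot, $now, $timeout) = @_;
    return () unless $timeout;    # 0 means deactivated
    my @runaways;
    for my $pid (sort { $a <=> $b } keys %$snapshot) {
        my $count = $snapshot->{$pid};
        my $rec   = $seen{$pid};
        if (!$rec || $rec->{count} != $count) {
            # new child or new request: restart the clock
            $seen{$pid} = { count => $count, since => $now };
            next;
        }
        push @runaways, $pid if $now - $rec->{since} > $timeout;
    }
    # forget children that are no longer busy
    delete @seen{ grep { !exists $snapshot->{$_} } keys %seen };
    return @runaways;
}
```

Called once per polling interval, the function reports a child only
after its counter has stayed unchanged for more than the timeout, which
is the "watching the clock" behaviour described above.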

Some processes might hang (run away) without actually using the CPU, so
you cannot detect them with the other modules (Apache::SizeLimit,
Apache::GTopLimit, Apache::Resource). For example, I can run a DoS attack
on your server by slowly sending characters to you, let's say a char per
10 secs. Then I can accept the outgoing stream very slowly too. I can
easily keep all your processes busy and leave all your other users unable
to contact you. This is something that doesn't require a lot of CPU, does
it?

Another example is when you have some process that waits for something
that never happens, or an infinite loop. Take a look at the debug chapter
for the code examples (it's in this module's docs as well).

You can probably think of other examples where a process pretends to be
busy but is actually doing nothing...

BTW, Apache::VMonitor will show you these potentially run-away processes
in a red colour -- this is one of the visual alert features it provides.

Please tell me whether this makes it easier to comprehend the purpose of
this module, and how I should improve the documentation to make it
comprehensible to everybody and not just to the one who wrote it :)


______________________________________________________________________
Stas Bekman | JAmpH -- Just Another mod_perl Hacker
http://stason.org/ | mod_perl Guide http://perl.apache.org/guide/
mailto:stas@stason.org | http://perl.org http://stason.org/TULARC/
http://singlesheaven.com| http://perlmonth.com http://sourcegarden.org
----------------------------------------------------------------------