Mailing List Archive

HARD_SERVER_MAX
Maybe this is a clue..

I can see the use of HARD_SERVER_MAX in the code, but I'm configuring
this much lower.


void reclaim_child_processes ()
{
    int i, status;

    sync_scoreboard_image();
    for (i = 0; i < HARD_SERVER_MAX; ++i)
        waitpid (scoreboard_image[i].pid, &status, 0);
}



Does this loop go too far if my MAX is < HARD_SERVER_MAX?

Could it be my Apache parent was waiting on some unrelated pid?



rob
--
http://nqcd.lanl.gov/~hartill/
Re: HARD_SERVER_MAX [ In reply to ]
Hmmm... that function is only called during restart (or should only be);
in any case, the scoreboard is cleaned out before any pid gets written
in it, so the "dead space" should only contain zeroes, and waitpid on
0 should return ECHILD just about everyplace. (Though, come to think of
it, the check should probably be there). I'll look the code over again
and see what might have changed...
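
For illustration only, here is a minimal sketch of what "the check" could
look like. The struct, the array and the function name below are
hypothetical stand-ins rather than the server's real definitions, and the
sync_scoreboard_image() call is left out so that the fragment stands alone.

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical stand-ins for the server's real definitions. */
#define HARD_SERVER_MAX 150

struct score_slot { pid_t pid; };
static struct score_slot scoreboard_image[HARD_SERVER_MAX];

/* Reap every child recorded in the scoreboard, skipping empty slots.
 * Returns the number of children actually reaped. */
int reclaim_child_processes_checked(void)
{
    int i, status, reaped = 0;

    for (i = 0; i < HARD_SERVER_MAX; ++i) {
        pid_t pid = scoreboard_image[i].pid;

        if (pid <= 0)
            continue;   /* unused slot: never hand 0 or -1 to waitpid() */
        if (waitpid(pid, &status, 0) == pid)
            ++reaped;
    }
    return reaped;
}
```

With the slot test in place, the pid argument can never be 0, so the
result no longer depends on what waitpid() does with that value.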

(BTW, a gcore on the root server process, if it ever gets in this state
again, would be very interesting --- or better yet, just attach to the
process with gdb, an even easier way of figuring out what it's doing).

rst
Re: HARD_SERVER_MAX [ In reply to ]
If the slot isn't being used, then the waitpid() should immediately
return with an error --- admittedly, using the kernel to do the check
isn't efficient, but fast restarts were less on my mind when I wrote
this than just getting the code to work.

rst
Re: HARD_SERVER_MAX [ In reply to ]
for (i = 0; i < HARD_SERVER_MAX; ++i)
waitpid (scoreboard_image[i].pid, &status, 0);


scoreboard_image[i].pid can be equal to the parent pid.
Is it safe to do a waitpid on yourself ?

Other clues that might help..
I have MaxClients 4 on my own workstation, and the first 5
scoreboard pids get set to the parent pid, the rest are 0.

Incidentally, the scoreboard file can take 150 pids, but I only
need 4, so there's some wastage there.


rob
Re: HARD_SERVER_MAX [ In reply to ]
NB all children (both servers and miscellany like the TransferLog procs)
are supposed to get a SIGKILL (the TransferLog children get a SIGHUP and
three seconds' grace first). So, it only hangs on restart if a child
survives that.... which may be possible under certain very obscure
circumstances (i.e., situations where the child is trying to die, but
can't, because some kernel resource can't be freed; this was the SO_LINGER
problem some may remember from way back when).
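
As a rough sketch of the "SIGHUP, a few seconds' grace, then SIGKILL"
sequence described above: stop_child_with_grace() and its return
convention are invented for this example and are not taken from the
server source.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGHUP, poll for up to grace_seconds, then fall back to SIGKILL.
 * Returns 1 if the child exited within the grace period, 0 if it had to
 * be killed, and -1 on error. */
int stop_child_with_grace(pid_t pid, unsigned grace_seconds)
{
    int status;
    unsigned waited;

    if (kill(pid, SIGHUP) == -1)
        return -1;

    for (waited = 0; ; ++waited) {
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return 1;           /* exited on its own */
        if (r == -1)
            return -1;          /* not our child, or some other error */
        if (waited >= grace_seconds)
            break;              /* grace period is over */
        sleep(1);
    }

    if (kill(pid, SIGKILL) == -1)
        return -1;
    /* This blocking wait is where a parent would hang if the child
     * somehow could not die. */
    return waitpid(pid, &status, 0) == pid ? 0 : -1;
}
```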

Still, in the long run this should probably be cleaned up. The only reason
I'm reluctant to do so *now* is that we'd then have to retest the code
everyplace (viz. the MaxClients thing, which *shouldn't* have caused any
problems --- NB I threw a MaxClients into my own (SunOS) configuration here
to see if the problem showed up... nothing yet).

rst
Re: HARD_SERVER_MAX [ In reply to ]
scoreboard_image[i].pid can be equal to the parent pid.
Is it safe to do a waitpid on yourself ?

Yes --- it returns ECHILD. (In any case, waitpid() is only used by
the scoreboard code when it is trying to restart, which doesn't sound
like it was happening in your case --- the rest of the time it is always
an ordinary wait(), via the function wait_or_timeout, which sets a
handler for alarms as well to get out of it).

I have MaxClients 4 on my own workstation, and the first 5
scoreboard pids get set to the parent pid, the rest are 0.

They get set this way *after the children are running*? That would be
bad...

Incidentally, the scoreboard file can take 150 pids, but I only
need 4, so there's some wastage there.

Total size of the file is 1200 bytes... IMHO, not worth paring down.
(If we went to a more complete scoreboard, with information on requests
in progress, etc., things would be different).

rst
Re: HARD_SERVER_MAX [ In reply to ]
> Maybe this is a clue..
>
> I can see the use of HARD_SERVER_MAX in the code, but I'm configuring
> this much lower.
>
>
> void reclaim_child_processes ()
> {
> int i, status;
>
> sync_scoreboard_image();
> for (i = 0; i < HARD_SERVER_MAX; ++i)
> waitpid (scoreboard_image[i].pid, &status, 0);
> }
>
>
>
> Does this loop go too far if my MAX is < HARD_SERVER_MAX?
>
> Could it be my Apache parent was waiting on some unrelated pid?

Uuuh, am I being lame here? Should the code be checking to see
if the scoreboard_image[i] slot is being used?


> rob
> --
> http://nqcd.lanl.gov/~hartill/
>
Re: HARD_SERVER_MAX [ In reply to ]
> for (i = 0; i < HARD_SERVER_MAX; ++i)
> waitpid (scoreboard_image[i].pid, &status, 0);

>
> Does this loop go too far if my MAX is < HARD_SERVER_MAX?
>
> Could it be my Apache parent was waiting on some unrelated pid?

Hate me, but is this better? MIN_SERVER_MAX would be, like, 1 or something
insane.

    m = min( HARD_SERVER_MAX, MAX );
    if ( m < MIN_SERVER_MAX ) m = MIN_SERVER_MAX;  /* Lettem hang themselves */
    for ( i = 0; i < m; ++i )
        etc...

> rob
> --
> http://nqcd.lanl.gov/~hartill/
>

Cheers,
Ay, probably shooting himself in the foot with this.


Andrew Wilson URL: http://www.cm.cf.ac.uk/User/Andrew.Wilson/
Elsevier Science, Oxford Office: +44 01865 843155 Mobile: +44 0589 616144
Re: HARD_SERVER_MAX [ In reply to ]
Oh god, this is a bad hair day. Just ignore those last two posts (the
first of which only got posted because I slipped and clicked on SEND
instead of KILL)
Re: HARD_SERVER_MAX [ In reply to ]
>From: Rob Hartill <hartill@ooo.lanl.gov>
>Date: Wed, 2 Aug 95 10:38:16 MDT
>
>Maybe this is a clue..
>
>I can see the use of HARD_SERVER_MAX in the code, but I'm configuring
>this much lower.
>
>void reclaim_child_processes ()
>{
> int i, status;
>
> sync_scoreboard_image();
> for (i = 0; i < HARD_SERVER_MAX; ++i)
> waitpid (scoreboard_image[i].pid, &status, 0);
>}

>Does this loop go too far if my MAX is < HARD_SERVER_MAX?
>
>Could it be my Apache parent was waiting on some unrelated pid?

>From: rst@ai.mit.edu (Robert S. Thau)
>Date: Wed, 2 Aug 95 12:57:40 EDT
>
>Hmmm... that function is only called during restart (or should only be);
>in any case, the scoreboard is cleaned out before any pid gets written
>in it, so the "dead space" should only contain zeroes, and waitpid on
>0 should return ECHILD just about everyplace. (Though, come to think of
>it, the check should probably be there). I'll look the code over again
>and see what might have changed...

Hmm, I don't want to be too offensive, but if there are any other bits of
code in this hacky unix style then it would be best to know now...

waitpid() with a pid of 0 waits for any child in the calling process's process group.
Thus this will only work if the parent httpd has no children which are
not going to die; otherwise the server will hang.

So:
If any child httpd does not exit on a restart, the server will hang.

If I set TransferLog | program, then the server will hang on the first restart.

David.