Mailing List Archive

weirdness happening now
I restarted (rebooted) the SunOS machine at Cardiff. There have been
no restarts since...

Config :-

MinSpareServers 4
MaxSpareServers 17
MaxClients 50
MaxRequestsPerChild 60


I saw 17 processes alive a couple of times, only to see the number drop back
to below 17. I now have 12, about 5 minutes after having 17.
I didn't think there would ever be fewer than 17, once 17 had been created.

Now there are 14... it might be in a kind of forking mode.

16 now.


-=-=-=-=-=-=
od -h /tmp/htstatus.a00166
0000000 0000 032b 0100 3290 0000 0427 0100 3290
0000020 0000 0487 0100 3290 0000 0459 0100 3290
0000040 0000 0479 0100 3290 0000 0445 0100 3290
0000060 0000 0467 0100 3290 0000 02bc 0100 3290
0000100 0000 047a 0200 3290 0000 0396 0100 3290
0000120 0000 03db 0200 3290 0000 039b 0100 3290
0000140 0000 03b6 0100 3290 0000 03eb 0100 3290
0000160 0000 048a 0100 3290 0000 0499 0200 3290
0000200 0000 00a6 00ff fe90 0000 00a6 00ff fe90
0000220 0000 00a6 00ff fe90 0000 0000 0000 0000
0000240 0000 0000 0000 0000 0000 0000 0000 0000
*
0002260
-=-=-=-=-=-=-


Now there are 13 servers.


od -h /tmp/htstatus.a00166
0000000 0000 050c 0100 3290 0000 0427 0200 3290
0000020 0000 0487 0100 3290 0000 0459 0200 3290
0000040 0000 0479 0200 3290 0000 0445 0200 3290
0000060 0000 0467 0100 3290 0000 052c 0100 3290
0000100 0000 047a 0200 3290 0000 054d 0100 3290
0000120 0000 03db 0100 3290 0000 00a6 00ff fe90
0000140 0000 00a6 00ff fe90 0000 00a6 00ff fe90
0000160 0000 048a 0100 3290 0000 0499 0200 3290
0000200 0000 00a6 00ff fe90 0000 00a6 00ff fe90
0000220 0000 00a6 00ff fe90 0000 0000 0000 0000
0000240 0000 0000 0000 0000 0000 0000 0000 0000
*
0002260
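
One way to read dumps like the ones above is sketched here. The layout is an assumption inferred only from the hex in this thread (8 bytes per slot, pid in bytes 0-3 in big-endian order, status in byte 4), and the status values and the names `slot_pid` and `count_live` are hypothetical, not taken from the server source.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, inferred from the od output in this thread only:
 * 8 bytes per slot, pid in bytes 0-3 (big-endian), status in byte 4,
 * bytes 5-7 not meaningful. The status values are assumptions too. */
enum { SLOT_SIZE = 8, ST_DEAD = 0, ST_READY = 1, ST_BUSY = 2 };

/* Return the pid stored in one 8-byte slot. */
unsigned long slot_pid(const unsigned char *slot)
{
    return ((unsigned long)slot[0] << 24) | ((unsigned long)slot[1] << 16) |
           ((unsigned long)slot[2] << 8)  |  (unsigned long)slot[3];
}

/* Count the slots whose status byte marks a live (ready or busy) server. */
int count_live(const unsigned char *buf, size_t len)
{
    int live = 0;
    size_t off;

    assert(len % SLOT_SIZE == 0);   /* expect whole slots only */
    for (off = 0; off < len; off += SLOT_SIZE) {
        unsigned char status = buf[off + 4];
        if (status == ST_READY || status == ST_BUSY)
            live++;
    }
    return live;
}
```

Read this way, the first dump shows 16 live slots and the second 13, which matches the server counts reported alongside them.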



Is this correct ?


rob
Re: weirdness happening now
> Re: servers dying off when less than MaxSpareServers are active ---
> that's at least as expected; servers which die because they've served
> their MaxRequestsPerChild are not immediately replaced.

In that case, it is behaving, I think.

I was trying to reproduce the problem from earlier today.

> I'm a bit more worried about what's happening on your own workstation
> at this point...

Hmm, did I make a mistake reporting it earlier? The problem is with
the SunOS machine at Cardiff which went into meltdown earlier today.
It went down to MinSpareServers and stayed there.

I haven't noticed any problems with the HPs. Any info I gave on the
HPs was for quick analysis to see if anything wasn't as it should be. One
of the numbers looked odd, but I couldn't see anything that would break
it.


Still waiting to see if the SunOS machine gets stuck when it falls down
to MinSpareServers. Unfortunately, there's a scheduled SIGHUP due in
10 minutes which could spoil things.


rob
Re: weirdness happening now
> The problem on your machine which worries me is the pids in the scoreboard
> being all the pid of the root process --- that's pretty bad, if it's really
> happening, since it means that the root won't be able to free the slot
> when the child dies (and it will therefore still think that a server is
> running).

I thought that was the intention.

Doesn't the parent call update_child_status() for dead children
and have its own pid stamped on the entry?

void update_child_status (int child_num, int status)
{
    short_score new_score_rec;
    new_score_rec.pid = getpid();
    new_score_rec.status = status;
    /* ... rest of the routine not quoted ... */
}


If that's not what's wanted, then make the 2nd line conditional on the
routine being called by a child.
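
The suggestion above could look roughly like this. It is only a sketch: `fill_score`, the `called_by_child` flag and the `short_score` layout shown are hypothetical, and the real routine may tell parent from child in some other way.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record layout, for illustration only. */
typedef struct {
    pid_t pid;
    char status;
} short_score;

/* Update a scoreboard record. The pid is stamped only when the caller
 * says it is the child itself; when the parent records a dead child,
 * the pid already in the record is left untouched. */
void fill_score(short_score *rec, int status, int called_by_child)
{
    assert(rec != NULL);
    if (called_by_child)
        rec->pid = getpid();    /* child stamps its own pid */
    rec->status = (char)status;
}
```

With this shape the parent never overwrites a slot's pid with its own, so a live child's entry cannot end up carrying the root process's pid by this route.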



rob
Re: weirdness happening now
> Ah... having the parent's pid on entries for *dead* children is not a problem.
> (Since those slots should have status of SERVER_DEAD, meaning free for
> reallocation, the pid there doesn't really matter anyway). If that's what
> you were commenting on, then that's OK, and I just misunderstood your
> comments. It's only if a live child doesn't have its own pid in the
> scoreboard that there is (obviously) a real problem.

Okay, so to clarify, the SunOS version at Cardiff has...

od -h /tmp/htstatus.a00166
0000000 0000 0ea7 0100 3290 0000 0d66 0100 3290
0000020 0000 0e78 0200 3290 0000 0f03 0100 3290
0000040 0000 0ed7 0100 3290 0000 00a6 00ff fe90
0000060 0000 0e90 0200 3290 0000 0ed8 0100 3290
0000100 0000 0e64 0200 3290 0000 0d91 0200 3290
0000120 0000 0de7 0100 3290 0000 0dc4 0100 3290
0000140 0000 00a6 00ff fe90 0000 0e68 0200 3290
0000160 0000 0e9c 0100 3290 0000 00a6 00ff fe90
0000200 0000 00a6 00ff fe90 0000 00a6 00ff fe90
*
0000240 0000 0000 0000 0000 0000 0000 0000 0000
*
0002260



Where the parent pid (a6) has 00ff next to it. On my HP I see
0000.

Could the "ff" be tripping it up ?
Also, I'm not sure what the "fe90" and "3290" refer to in the above.
I don't see these numbers under HP.

Can you explain those if they are valid please Rob.



rob.
Re: weirdness happening now
Re: servers dying off when less than MaxSpareServers are active ---
that's at least as expected; servers which die because they've served
their MaxRequestsPerChild are not immediately replaced.
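
A minimal sketch of one policy that would behave this way, assuming the parent looks only at the number of idle servers: it forks when idle servers fall below MinSpareServers and reaps when they exceed MaxSpareServers. This is an illustration of the explanation above, not the server's actual code; `spare_decision` and the enum names are hypothetical.

```c
#include <assert.h>

enum spare_action { SPARE_NONE, SPARE_FORK, SPARE_KILL };

/* Decide what the parent should do on one maintenance pass.
 * idle  = servers currently waiting for a request
 * total = all live servers (idle plus busy) */
enum spare_action spare_decision(int idle, int total,
                                 int min_spare, int max_spare,
                                 int max_clients)
{
    assert(min_spare <= max_spare);
    if (idle < min_spare && total < max_clients)
        return SPARE_FORK;      /* too few spares: start one more */
    if (idle > max_spare)
        return SPARE_KILL;      /* too many spares: retire one */
    return SPARE_NONE;          /* in between: leave things alone */
}
```

Under this policy, with MinSpareServers 4 and MaxSpareServers 17, a server that exits after its MaxRequestsPerChild is not replaced as long as at least 4 servers remain idle, so the total can drift below 17 without anything being wrong.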

I'm a bit more worried about what's happening on your own workstation
at this point...

rst
Re: weirdness happening now
The problem on your machine which worries me is the pids in the scoreboard
being all the pid of the root process --- that's pretty bad, if it's really
happening, since it means that the root won't be able to free the slot
when the child dies (and it will therefore still think that a server is
running).

rst
Re: weirdness happening now
Ah... having the parent's pid on entries for *dead* children is not a problem.
(Since those slots should have status of SERVER_DEAD, meaning free for
reallocation, the pid there doesn't really matter anyway). If that's what
you were commenting on, then that's OK, and I just misunderstood your
comments. It's only if a live child doesn't have its own pid in the
scoreboard that there is (obviously) a real problem.

Frayed nerves...

rst
Re: weirdness happening now
What you have is a three-byte structure which is padded out to eight
bytes. The ff, 3290 and fe90 look to me like crud which was dragged
in off the stack doing structure assignments; it should have no
significance.

(The three bytes are two bytes of pid, one byte of status; unfortunately,
something seems to believe that every struct is at least a doubleword in
size...).
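
The padding effect described above can be illustrated with a small sketch. The struct below follows the "two bytes of pid, one byte of status" description; `demo_score` and `fill_clean_score` are hypothetical names, and the padded size depends on the compiler and ABI, so no particular size is claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Three meaningful bytes; the compiler is free to pad the struct out. */
typedef struct {
    short pid;      /* two bytes of pid (on typical ABIs) */
    char  status;   /* one byte of status */
} demo_score;

/* Number of bytes in the struct beyond the declared members. */
size_t demo_score_padding(void)
{
    return sizeof(demo_score) - (sizeof(short) + sizeof(char));
}

/* Fill a record in place after clearing the whole object, so that
 * whatever was previously in that memory does not show up in a dump.
 * In practice the memset leaves the padding zeroed, although the C
 * standard allows padding to take unspecified values on member stores. */
void fill_clean_score(demo_score *rec, short pid, char status)
{
    assert(rec != NULL);
    memset(rec, 0, sizeof *rec);
    rec->pid = pid;
    rec->status = status;
}
```

A record built on the stack without the `memset` and then copied by structure assignment can carry whatever bytes happened to be in the padding, which would explain values like 3290 and fe90 appearing next to otherwise valid entries.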

rst