Mailing List Archive

update ii
well the suckers are down to 260k each at startup (was 548k), so that's
promising.. trouble is that a SIGHUP (restart) causes the new children
to die when servicing any request :-(


So, how ugly is the following suggestion...

Rather than have the parent process attempt to clean up after
a SIGHUP, it could do the following..

very early in the code (first line maybe),

pid_t pid;

while ((pid = fork()) > 0)    /* while it was you that called fork */
    waitpid(pid, NULL, 0);    /* wait for the child to exit */
/* child (fork() returned 0) falls through */


Then a SIGHUP can be handled by writing "bye bye" to the error log,
followed by an exit(0);
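Something like this, roughly (untested; the handler name and the
pause() placeholder are just illustrative, not real httpd code):

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* child's SIGHUP handler: log a farewell and exit; the wrapper loop
   in the parent forks a replacement. (strictly, write() and _exit()
   are the async-signal-safe calls to use here) */
static void hup_handler(int sig)
{
    (void) sig;
    fprintf(stderr, "bye bye\n");  /* stderr standing in for the error log */
    exit(0);
}

int main(void)
{
    pid_t pid;

    /* wrapper loop: this process does nothing but respawn the real
       server whenever it exits */
    while ((pid = fork()) > 0)
        waitpid(pid, NULL, 0);

    if (pid < 0) {            /* fork() failed */
        perror("fork");
        return 1;
    }

    /* child falls through and becomes the real server */
    signal(SIGHUP, hup_handler);

    for (;;)
        pause();              /* placeholder for the real request loop */
}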

It's a lot easier than using siglongjmp and freeing loads of resources.

I just looked at the man page for siglongjmp; it's storing a pile of
info much like a 2nd process would anyway, but the man page also says

"But, because the register storage class is only
a hint to the C compiler,
variables declared as register variables may not necessarily
be assigned to machine registers, so their values are
unpredictable after a longjmp(). This is especially a prob-
lem for programmers trying to write machine-independent C
routines."


so my alternative might not be so bad after all.
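The pitfall is easy to demonstrate; a variable the compiler keeps in a
register can silently revert after longjmp() unless it's declared
volatile (toy example, nothing to do with httpd itself):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

int main(void)
{
    int plain = 0;             /* may be cached in a register */
    volatile int safe = 0;     /* guaranteed to survive the longjmp() */

    if (setjmp(env) == 0) {
        plain = 1;
        safe = 1;
        longjmp(env, 1);       /* jump back to the setjmp() above */
    }

    /* 'safe' is certainly 1 here, but 'plain' may have reverted to 0
       if it lived in a register across the jump */
    printf("plain=%d safe=%d\n", plain, safe);
    return 0;
}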

It might also be worth ditching the children's longjmp handling and
just letting them exit whenever they would otherwise jump. N.B. they'll
be replaced with a new child almost immediately.


thoughts?

--
Rob Hartill
http://nqcd.lanl.gov/~hartill/
Re: update ii
Last time, Rob Hartill uttered the following other thing:
> [...]
>
> I just looked at the man page for siglongjmp; it's storing a pile of
> info much like a 2nd process would anyway, but the man page also says
>
> "But, because the register storage class is only a hint to the C
> compiler, variables declared as register variables may not necessarily
> be assigned to machine registers, so their values are unpredictable
> after a longjmp(). This is especially a problem for programmers trying
> to write machine-independent C routines."

This is actually a problem with 1.4.1 if you use gcc -O2 on Solaris
and SunOS. It causes a SIGBUS after the setjmp() in the main process
when a SIGHUP comes in.

> so my alternative might not be so bad after all.
>
> It might also be worth ditching the children's longjmp handling and
> just letting them exit whenever they would otherwise jump. N.B. they'll
> be replaced with a new child almost immediately.

We haven't noticed as much of a problem with the children. The problem
is how often die() is called. If more of the die() calls could be handled
instead of actually dying, it would probably be worth it. B18
helped, for instance. You're not going to get around SIGALRM
and SIGPIPE, but the If-Modified-Since case probably happens a little
too often (causing a die(USE_LOCAL_COPY)).
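Handling it inline would amount to something like this (hypothetical
helper; like some early servers, it cheats by comparing HTTP date
strings for exact equality instead of parsing the client's date, and
the real code paths differ):

#include <stdio.h>
#include <string.h>
#include <time.h>

/* compare the client's If-Modified-Since header against the file's
   mtime and answer 304 directly, instead of unwinding the request
   with die(USE_LOCAL_COPY) */
static int maybe_send_not_modified(FILE *client, const char *ims_header,
                                   time_t file_mtime)
{
    char mtime_str[64];

    if (ims_header == NULL)
        return 0;                     /* no header: serve normally */

    strftime(mtime_str, sizeof mtime_str,
             "%a, %d %b %Y %H:%M:%S GMT", gmtime(&file_mtime));

    if (strcmp(ims_header, mtime_str) != 0)
        return 0;                     /* dates differ: serve the file */

    fprintf(client, "HTTP/1.0 304 Not Modified\r\n\r\n");
    return 1;                         /* handled; caller just returns */
}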

hmmm

Brandon
--
Brandon Long (N9WUC) "I think, therefore, I am confused." -- RAW
Computer Engineering Run Linux '95. It's that Easy.
University of Illinois blong@uiuc.edu http://www.uiuc.edu/ph/www/blong
Don't worry, these aren't even my views.
Re: update ii
On Sat, 17 Jun 1995, Chuck Murcko wrote:
> Unfortunately, there are real
> problems with 0.7.2j eating up all the file descriptors under FreeBSD and
> BSDI under heavy load, and I have to look at those first.

Make sure your MAXUSERS is at least as big as the number of children you
expect to run. That was a problem for links.net too until it was fixed.
fstat was showing *lots* of file descriptors, but this machine was
also serving 4 virtual hosts, each of which had at least 3 fds for log
files and such. It did seem like there was some leakage going on as
well, so I won't rule that out....
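For what it's worth, a quick sanity check of the per-process descriptor
limit against (children x fds per child) + (virtual hosts x log fds) is
easy to write (a sketch, using the standard getrlimit() call):

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("fd limit: soft %ld, hard %ld\n",
           (long) rl.rlim_cur, (long) rl.rlim_max);
    return 0;
}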

Other problems we've run into:

1) the server starts returning nothing but 403 errors for every request,
dutifully logging each one, though sometimes the URI logged isn't the one
requested.

2) all the children die, only the parent is left, and no files are being
served. Is there a condition whereby a parent might not detect the death
of a child?

Both of these were seen on systems where a kill -HUP was sent to the
parents via cron every 15 minutes, but the parent was never completely
killed and restarted (it hopefully shouldn't have to be). The kill -HUP
would fix the first problem but not the second.
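One way a parent can miss a death: signals don't queue, so if two
children exit close together, a SIGCHLD handler that reaps only one
child loses track of the other. The usual defensive shape is a WNOHANG
loop (a sketch, not the actual server's handler):

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* reap every exited child, not just one: several deaths can collapse
   into a single SIGCHLD delivery (a production handler would also
   save and restore errno) */
static void chld_handler(int sig)
{
    pid_t pid;
    (void) sig;

    while ((pid = waitpid(-1, NULL, WNOHANG)) > 0) {
        /* note that 'pid' died so the main loop can fork a
           replacement, e.g. decrement a live-child count */
    }
}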

Brian


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: update ii
On Jun 17, 4:47pm, Brian Behlendorf wrote:
} Make sure your MAXUSERS is at least as big as the number of children you
} expect to run. That was a problem for links.net too until it was fixed.
} [...]

Actually I use the following formula for maxusers:

    8 + # of users + # of NFS clients + # of POP clients + # of HTTP children
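For example, a box with 20 logins, 4 NFS clients, 10 POP clients, and
30 httpd children would want maxusers of at least 8 + 20 + 4 + 10 + 30
= 72 (numbers picked out of thin air, of course).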

Better safe than sorry... investing in memory and/or increasing the
buffer cache for large sites can also do wonders. It's amazing to see a
site serve 250K+ hits/day and still have 20-second stretches of no disk
activity while taking 5-10 hits/sec :) Now if we can just keep the
system stable...

[I should note that increasing the buffer cache applies to non-SVR4
systems only. There is no separate buffer cache to tune on Solaris and
SVR4 systems.]


Cliff