Mailing List Archive

Serialising accepts (was Re: apache_0.7.3h comments)
Re: Serialising accepts (was Re: apache_0.7.3h comments)
In reply to Brandon Long, who said:
>
> >
> > From the Sun guy:
> > } On Solaris, you need to use it for single processors also and I suspect in
> > } all library based Socket interface implementations like Unixware etc.
> > ~~~~~~~~~~~~~~~~~~~~
> > } The reason is that accept() is not an atomic operation but just a call in
> > } the ABI and any other call to the same fd at the same time can cause
> > } unpredictable behaviour.
> >
> > In other words, relying on the kernel to serialise accepts may simply not
> > work on several unixes, not just multiprocessor OSes. For example, I would
> > suspect Linux might fall into this case. (Hence some of the problems reported.)
>
> Alan Cox claimed that this was actually one of his tests for the Linux
> networking code (multiple accepts on the same socket), so I find it strange
> that it would be that way.
>

I'm not entirely sure what the above refers to since I haven't
followed the whole thread, but there's a bug in some Berkeley-derived
networking implementations that means queued TCP connections
get passed to the application in LIFO order rather than FIFO order.

This can happen, for instance, when the socket is open but the
application is sleeping. The kernel will complete the TCP open
handshake with the client and queue the connection for when the
application next runs, but it then hands the connections to the
application back to front.
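
A quick way to see the ordering for yourself is to let a few connections
pile up in the listen queue while the process sleeps and then drain them,
something like this throwaway sketch (the port number and sleep time are
arbitrary):

/*
 * Listen, sleep while a few clients connect by hand, then drain the
 * queue and print the peer ports.  On a FIFO implementation they come
 * out in connect order; on the broken stacks described above they come
 * out back to front.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    struct sockaddr_in sin;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(8080);

    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        perror("bind");
        return 1;
    }
    listen(s, 16);      /* kernel completes handshakes into this queue */

    sleep(30);          /* "application is sleeping": connect several
                         * clients by hand during this window */

    for (;;) {          /* drain the queue; blocks once it's empty */
        struct sockaddr_in peer;
        socklen_t len = sizeof(peer);
        int c = accept(s, (struct sockaddr *)&peer, &len);

        if (c < 0)
            break;
        printf("accepted connection from port %u\n",
               (unsigned)ntohs(peer.sin_port));
        close(c);
    }
    return 0;
}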

Since I missed the start of this thread, why is serialising the accepts
important?

--
Paul Richards, Bluebird Computer Systems. FreeBSD core team member.
Internet: paul@FreeBSD.org, http://www.freebsd.org/~paul
Phone: 0370 462071 (Mobile), +44 1222 457651 (home)
Re: Serialising accepts (was Re: apache_0.7.3h comments)
>
> From the Sun guy:
> } On Solaris, you need to use it for single processors also and I suspect in
> } all library based Socket interface implementations like Unixware etc.
> ~~~~~~~~~~~~~~~~~~~~
> } The reason is that accept() is not an atomic operation but just a call in
> } the ABI and any other call to the same fd at the same time can cause
> } unpredictable behaviour.
>
> In other words, relying on the kernel to serialise accepts may simply not
> work on several unixes, not just multiprocessor OSes. For example, I would
> suspect Linux might fall into this case. (Hence some of the problems reported.)

Alan Cox claimed that this was actually one of his tests for the Linux
networking code (multiple accepts on the same socket), so I find it strange
that it would be that way.
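
For reference, the pattern being tested is the one where every child sits
in accept() on the same listening descriptor, roughly like the sketch below
(not the real NCSA or Apache code; the port and child count are made up):

/*
 * Parent listens on one socket, forks N children, and every child
 * blocks in accept() on the shared descriptor.  Whether the kernel
 * serialises those accepts safely is exactly the portability question.
 */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define NUM_CHILDREN 5                         /* arbitrary */

static void child_main(int listen_fd)
{
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);   /* all children block here */

        if (conn < 0)
            continue;
        /* ... read the request, send the response ... */
        close(conn);
    }
}

int main(void)
{
    struct sockaddr_in sin;
    int i, listen_fd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(8080);                /* arbitrary port */
    bind(listen_fd, (struct sockaddr *)&sin, sizeof(sin));
    listen(listen_fd, 128);

    for (i = 0; i < NUM_CHILDREN; i++)
        if (fork() == 0)
            child_main(listen_fd);             /* never returns */

    while (wait(NULL) > 0)                     /* parent just reaps children */
        ;
    return 0;
}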

> I think we should bite the bullet and admit that use of multiple accepts is not
> portable, and that we have to re-work it to explicitly serialise the accepts
> in a portable manner.
>
> The alternative is to have ad hoc serialisation for each OS that needs it,
> possibly in a manner that is OS minor-version-number specific. For these
> OS's (and how will we know if an OS breaks until the users complain?) the
> ad hoc serialisation will probably be no faster than some portable schemes
> we could implement. For example, lockf() is actually implemented via IPC
> to a lockd daemon; IPC to the parent httpd would not be any slower.
>
> So why not have a queue, managed by the parent, of the children waiting
> to accept? Queue management occurs after a child has serviced a request,
> and so does not impact the server response time. (Unlike the NCSA approach,
> where it is done between accept() returning and the child being passed the
> file descriptor.)
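
(For anyone who hasn't seen it, the "child being passed the file descriptor"
step above can be done with sendmsg() and SCM_RIGHTS over a Unix-domain
socket; SVR4 has an I_SENDFD ioctl that does the same job. A rough sketch of
the BSD-style sending side, with a made-up helper name:)

/*
 * Pass an open descriptor to another process over a Unix-domain socket.
 * The receiver gets a brand-new fd referring to the same connection.
 * send_fd() is an invented name, not anything from NCSA or Apache.
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int send_fd(int channel, int fd_to_pass)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    char dummy = 'F';                         /* must send at least one byte */
    union {
        struct cmsghdr align;                 /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } control;

    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    iov.iov_base = &dummy;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;             /* "this message carries an fd" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return (int)sendmsg(channel, &msg, 0);
}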

We tried a similar approach in alpha, and it would blow up. Of course,
with the number of other bugs in the code at the time, it might have been
something else. The queue would grow huge, then drop (as some child came
free and found that all of the users had aborted). This was with sites
like hoohoo and www.acm.uiuc.edu (lots of multi-megabyte transfers), both
of which have really long-lived connections. I don't think it'll be any
worse than your current scheme, but remember that a lot of Un*ces have
relatively low limits on the number of open file descriptors per process
(if you have IPC between parent and child and a queue, you might hit that
limit pretty quickly).
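
To get a feel for how tight that is on a particular box, something like
this will tell you (a throwaway sketch; the one-IPC-fd-per-child layout
and the 8 descriptors reserved for the listener, logs and stdio are just
guesses):

/*
 * Print the per-process descriptor limit and a rough estimate of how
 * many children the parent could keep an IPC descriptor open to.
 */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
        perror("getrlimit");
        return 1;
    }
    printf("soft fd limit %lu, hard fd limit %lu\n",
           (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);
    printf("room for roughly %lu children with one IPC fd each\n",
           (unsigned long)rl.rlim_cur - 8);
    return 0;
}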

> So:
> The parent has a pipe to each child.
> It sends a short message to an idle child saying: 'you accept next'.
> On completing a request a child sends a message to the parent saying
> 'I am idle', and waits for a response message.
> The parent has some algorithm for deciding which idle child should be
> given the next connection. Round-robin would be cache-unfriendly; instead,
> send it to the most recently used child.

We moved to round robin to keep down bugs (at the time). It's also faster
to find the next child that way. Of course, explicitly keeping two separate
lists of children (busy and free) would negate most of that. Also,
damn select() (you have to test the whole list of fds).
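
For concreteness, the parent's side of that scheme might look something like
the sketch below. The single-byte codes are invented, and I've added a
"got one" message after accept() returns so the parent knows it can hand the
next accept to someone else; note that the select() has to rebuild and scan
the whole list of child fds every time round.

/*
 * One pipe/socketpair per child.  A child writes 'I' when it goes idle
 * (including once at startup) and 'G' as soon as its accept() returns;
 * the parent writes 'A' for "you accept next", keeping exactly one
 * child inside accept() at a time.  All names and codes are invented.
 */
#include <sys/select.h>
#include <unistd.h>

struct child {
    int fd;        /* parent's end of the pipe to this child */
    int idle;      /* 1 once the child has reported 'I'      */
};

static void parent_loop(struct child *kids, int nkids)
{
    int acceptor_out = 0;     /* is some child currently in accept()? */

    for (;;) {
        fd_set rfds;
        int i, maxfd = -1;

        FD_ZERO(&rfds);
        for (i = 0; i < nkids; i++) {        /* the whole list, every pass */
            FD_SET(kids[i].fd, &rfds);
            if (kids[i].fd > maxfd)
                maxfd = kids[i].fd;
        }
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
            continue;

        for (i = 0; i < nkids; i++) {
            char c;

            if (!FD_ISSET(kids[i].fd, &rfds))
                continue;
            if (read(kids[i].fd, &c, 1) != 1)
                continue;
            if (c == 'G')                    /* acceptor got a connection  */
                acceptor_out = 0;
            else if (c == 'I')               /* child finished its request */
                kids[i].idle = 1;
        }

        if (!acceptor_out) {
            /* pick an idle child; most-recently-used would be the
             * cache-friendly choice, round-robin the simple one */
            for (i = 0; i < nkids; i++) {
                if (kids[i].idle) {
                    write(kids[i].fd, "A", 1);   /* "you accept next" */
                    kids[i].idle = 0;
                    acceptor_out = 1;
                    break;
                }
            }
        }
    }
}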

Brandon

--
Brandon Long "I think, therefore I am Confused."
NCSA HTTPd Server Team - Robert Anton Wilson
University of Illinois
blong@uiuc.edu                          http://www.uiuc.edu/ph/www/blong
Re: Serialising accepts (was Re: apache_0.7.3h comments)
David writes,

> I think we should bite the bullet and admit that use of multiple accepts is not
> portable, and that we have to re-work it to explicitly serialise the accepts
> in a portable manner.

That's never been denied; however, it was only assumed that MP systems
would be the only ones which need to be mutually excluded from doing
an accept.

> The alternative is to have ad hoc serialisation for each OS that needs it,
> possibly in a manner that is OS minor-version-number specific. For these
> OS's (and how will we know if an OS breaks until the users complain?) the
> ad hoc serialisation will probably be no faster than some portable schemes
> we could implement. For example, lockf() is actually implemented via IPC
> to a lockd daemon; IPC to the parent httpd would not be any slower.

I really dislike the idea of involving the parent in this process. If
the children (on the affected systems) can do the job themselves, then
that must be the best way.

> So:
> The parent has a pipe to each child.
> It sends a short message to an idle child saying: 'you accept next'.
> On completing a request a child sends a message to the parent saying
> 'I am idle', and waits for a response message.

This was the method I used to prototype a pre-forker with Perl. It
works, but I'm sure that OS mechanisms for mutual exclusion will be
much more efficient than parent<->child communication. As a last
resort, maybe.

> The parent has some algorithm for deciding which idle child should be
> given the next connection. Round-robin would be cache-unfriendly; instead,
> send it to the most recently used child.

Systems which are happy to do multiple accepts (SunOS and HP-UX appear to
be) should be allowed to just get on with things, and not have to get
involved in parent scheduling algorithms which will only slow them down.

As a final note, I'd sooner trust an OS based mutual exclusion system
than one involving apache interprocess communication.
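
For what it's worth, the kind of OS-based exclusion I mean is just a lock
held around the accept() in each child, e.g. fcntl() record locking on a
shared lock file. A sketch (the function names and lock-file path are
invented, and lockf(), flock() or a semaphore would do equally well where
available):

/*
 * Every child grabs the lock, does one accept(), and releases it, so
 * only one child is ever inside accept() and the parent isn't involved.
 */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

static int lock_fd;                           /* opened once in each child */

static void accept_mutex_init(const char *path)
{
    lock_fd = open(path, O_CREAT | O_WRONLY, 0600);
}

static void accept_mutex_on(void)
{
    struct flock lk;

    memset(&lk, 0, sizeof(lk));
    lk.l_type = F_WRLCK;                      /* exclusive lock        */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                             /* covers the whole file */
    fcntl(lock_fd, F_SETLKW, &lk);            /* block until we own it */
}

static void accept_mutex_off(void)
{
    struct flock lk;

    memset(&lk, 0, sizeof(lk));
    lk.l_type = F_UNLCK;
    lk.l_whence = SEEK_SET;
    fcntl(lock_fd, F_SETLKW, &lk);
}

static void child_main(int listen_fd)
{
    accept_mutex_init("/tmp/accept.lock");    /* path is arbitrary */
    for (;;) {
        int conn;

        accept_mutex_on();                    /* one child at a time past here */
        conn = accept(listen_fd, NULL, NULL);
        accept_mutex_off();
        if (conn < 0)
            continue;
        /* ... handle the request ... */
        close(conn);
    }
}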


rob
--
http://nqcd.lanl.gov/~hartill/
Re: Serialising accepts (was Re: apache_0.7.3h comments)
/*
* "Re: Serialising accepts (was Re: apache_0.7.3h comments)" by Rob Hartill
* written Tue, 27 Jun 95 12:21:49 MDT
*
* Multiprocessor systems (and now Solaris) can produce unexpected
* results if more than one process is waiting for accept() to tie a
* process to a client.
*
*/

The only systems we've found that need this workaround are
those with SVR4-derived networking. SGI multiprocessors handle more
than one process in accept() just fine.

The systems that do need the workaround are Solaris, Sequent, and
Novell's UnixWare, among others.

--Rob
Re: Serialising accepts (was Re: apache_0.7.3h comments)
> Since I missed the start of this thread, why is serialising the accepts
> important?


Multiprocessor systems (and now Solaris) can produce unexpected
results if more than one process is waiting for accept() to tie
a process to a client.

--
http://nqcd.lanl.gov/~hartill/