Mailing List Archive

Re: SSI handlers
} But, this is how server-side includes *work* in Shambhala (each
} inclusion directive generates the same internal data structure as an
} actual client request, with a few extra notations; see the API docs on
} the sub-request mechanism. This means, in particular, that they are
} *already* subject to content negotiation, if the item being included
} is such that content negotiation would come into play on a direct
} request).

Kind of; I was thinking of more complicated objects than files. I'll go back
and read the API again; perhaps I missed something.

} Performance is important to people. I actually think most of the
} current servers get performance as good as you can expect out of the
} current HTTP spec, and in fact, I'd be happy to stick to a forking
} server, except for one thing, which is HTTP-NG. The performance issue
} which *it* addresses is gaining adequate network performance in the
} face of TCP slow-start, and that issue is very real. However, I don't
} see any easy way of implementing HTTP-NG in a non-threaded server,
} hence my desire not to commit to anything which would close off that
} option.

I'm not sure we will need threading for HTTP-NG. I don't see why one
thread can't handle multiple requests at once to be honest. All you
really need is some sort of async I/O. This can be simulated in an
OS-neutral way, unlike threads. Of course, many of the same concerns and
issues would still apply in the code.
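A minimal sketch of the single-thread multiplexing Cliff has in mind, using select() (available on most Unixes of the day). The pipes stand in for client sockets, and handle_ready() is a hypothetical per-request callback; none of this is Shambhala code.

```c
/* Sketch: one thread servicing several descriptors via select().
 * The "requests" are plain pipes standing in for client sockets. */
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical per-request callback: drain what's ready on fd. */
static int handle_ready(int fd)
{
    char buf[256];
    return (int)read(fd, buf, sizeof(buf));
}

/* Serve nfds descriptors (nfds <= 16) with one thread until each has
 * produced one readable event; returns the number of events handled. */
int serve_once_each(int *fds, int nfds)
{
    int handled = 0;
    int done[16] = {0};
    while (handled < nfds) {
        fd_set rset;
        int i, maxfd = -1;
        FD_ZERO(&rset);
        for (i = 0; i < nfds; i++) {
            if (done[i]) continue;
            FD_SET(fds[i], &rset);
            if (fds[i] > maxfd) maxfd = fds[i];
        }
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) <= 0)
            break;                       /* error: give up */
        for (i = 0; i < nfds; i++) {
            if (!done[i] && FD_ISSET(fds[i], &rset) &&
                handle_ready(fds[i]) > 0) {
                done[i] = 1;
                handled++;
            }
        }
    }
    return handled;
}
```

The catch rst raises below still applies: every handler invoked from such a loop must never block, or the whole process stalls.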

} Ummmm... in such an environment, is there any reason for the server
} not to use the NFS interface as well, which would eliminate the need
} for special code to interface to the database?
}
} (Carrying this further --- if stuff in the database is edited as if it
} were in the normal filesystem, and retrieved as if it were in the
} normal filesystem, what is the database offering which is *not*
} offered by a normal filesystem? Before getting lost in implementation
} detail, it might be wise to step back a bit, and write down someplace
} exactly what you want to achieve by integrating the database, so you
} can be sure that the machinery you come up with achieves those goals).

I used to say the same thing. I could spew for hours on how a file
system is just fine for a server. I mean a pile of documents and a file
system just "work" the same way most of the time. But working at Organic
I've seen a bunch of things that make me want more than the UNIX file
system provides. I guess I could switch to VMS (cough, choke) or use something
that provides:

1) Versioning

2) Backup/recovery

3) Mirroring

4) Performance

5) Multiple front ends to the same data

I really want a nice, stable, logging, easily backed-up pile of
objects :) For now I am off in the direction of a log-based file
system, CVS/RCS, and dump/restore. But I'd prefer a nice integrated
solution. Come to think of it, I can now spew for hours on why
you want a database... Certainly all these issues can be addressed
by various solutions (even 5 can be addressed with NFS), but databases
are designed to do this sort of thing.

As web servers move into more transaction-based applications you will need a
database hooked to your web server anyway. Why not just use it to handle
all the data issues, not just transactions?

I don't expect the Apache group to do this, but I do want to see APIs that
support a database vendor or an interested third party (like Organic)
doing this. This is the future.

Cliff
Re: SSI handlers
On Mon, 10 Jul 1995, Robert S. Thau wrote:
> I've thought fairly long and hard about how to come up with an API
> which generalizes all these things, and I can't. I've then given up on
> 3), decided that anything which was in the DB back-end would need its
> own response handler, and I have a few ideas about how that could work
> (it helps to start distinguishing internal object type from the type
> that will be served to the client, so that you can dispatch on the
> former and do content negotiation on the latter), but it's still quite
> messy (particularly #2 --- what interface do you provide to the
> command stuff)?

I agree with this assessment. Unlike Cliff, I don't see why the file
access layer has to be 100% abstracted to the API. One question: what's
the best way to configure things such that accesses to a particular
sub-path are acted upon internally differently? I.e., if I write a
system where all the URLs sit under

http://host/program/arg1/arg2?arg3

(etc etc), is there a way I can write a module to handle that? I.e.

http://host/music-database/artist/autechre
http://host/music-database/label/warp
http://host/music-database/search-for?Richard%20H%20Kirk

etc..... where "music-database" is a module?

> Sigh...

Rob, we weren't kidding when we said we'd take you out to dinner next
time you visit SF :) (this goes for all you other contributors too!)

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: SSI handlers
Date: Mon, 10 Jul 1995 08:53:54 -0700
From: Cliff Skolnick <cliff@organic.com>
Precedence: bulk
Reply-To: new-httpd@hyperreal.com

This note is hard to respond to, because there are discussions of two
issues here, which have gotten kind of intermingled. These two issues
(which I see as pretty much distinct) are:

1) Using a database instead of a filesystem as a server back-end
2) Embedding code (not just new directives, but actual code) in
server-side include documents, via, e.g., a <TCL> tag.

First, topic #2.

I'm not sure we will need threading for HTTP-NG. I don't see why one
thread can't handle multiple requests at once to be honest. All you
really need is some sort of async I/O. This can be simulated in an
OS-neutral way, unlike threads. Of course, many of the same concerns and
issues would still apply in the code.

To begin with, my problem with your <TCL> tag has less to do
with multithreading per se than it has to do with simply having
multiple requests served by one process, regardless of how that's
accomplished. Consider a "document" with the following embedded code:

<!--#TCL-->
while {1} {open /dev/null r}
<!--#/TCL-->

--- I hope you'll pardon me for using my own suggested syntax.

Whatever process winds up serving this document will quickly run out
of file descriptors (too quickly to be caught by a timeout, for
instance) --- at which point all other requests being served by the
same process would be totally screwed. As I've already pointed out to
you, detecting and recovering from these sorts of situations, in the
general case, is very difficult. This has nothing to do with
multithreading per se --- it has to do with having one process serving
multiple requests, no matter how that is accomplished.

(You can say that no one would write that sort of thing deliberately,
and at Organic, that may even be true. But sooner or later, it's
going to happen by accident --- or worse, you'll have a slow leak
which doesn't get detected until the "document" gets loose on your
primary server and brings *it* down. If you're doing one request per
process, you have exit() as a last-ditch way to escape from these
situations, but if multiple requests are being served by the same
process, and for HTTP-NG they basically have to be, there is no way
out).

As to using asynch I/O, instead of full multithreading --- it is
*possible*. However, the only way I see to make it actually work
would be to write the entire server (or at least, anything which did
potentially blocking I/O, including all the response handlers) in the
same style as the protocol state machines of a kernel device driver.
(Anytime a request could block, you have to save *all* the information
that would be needed to resume it before you can go do something else
--- at enormous cost to the clarity of your code).

I have worked on code that was actually written like this. (It was
control software for some experiments in a biology lab, which ran on a
non-preemptive system and had to deal with real-time constraints, so
this sort of state machine approach was the only option). I don't
want to repeat the experience, which is why I don't regard this as a
productive option.
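The state-machine style rst describes can be made concrete with a small sketch: every potentially blocking step saves enough state in an explicit request record to resume later. The names here (struct request, req_step, the states) are invented for illustration, not Shambhala's.

```c
/* Sketch of driver-style request handling: a bounded step of work,
 * then return to the event loop, resuming from saved state next time. */
#include <assert.h>
#include <string.h>

enum req_state { SENDING_HEADERS, SENDING_BODY, DONE };

struct request {
    enum req_state state;   /* where to resume */
    const char *body;       /* what we were sending */
    size_t off;             /* how far we had gotten */
};

/* One step: emit at most outlen bytes into out, record progress,
 * and return 1 once the request is finished. */
int req_step(struct request *r, char *out, size_t outlen, size_t *written)
{
    *written = 0;
    switch (r->state) {
    case SENDING_HEADERS:
        /* pretend the headers went out in one non-blocking write */
        r->state = SENDING_BODY;
        return 0;
    case SENDING_BODY: {
        size_t left = strlen(r->body) - r->off;
        size_t n = left < outlen ? left : outlen;
        memcpy(out, r->body + r->off, n);
        *written = n;
        r->off += n;
        if (r->off == strlen(r->body))
            r->state = DONE;
        return r->state == DONE;
    }
    case DONE:
        return 1;
    }
    return 1;
}
```

Even this toy shows the cost rst is pointing at: what would be three lines of sequential code becomes a struct, an enum, and a switch, and every response handler in the server would have to be written this way.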

----------------------------------------------------------------

Now, on to topic #1 --- skipping point-by-point replies, the heart of
it is:

I don't expect the Apache group to do this, but I do want to see APIs that
support a database vendor or an interested third party (like Organic)
doing this. This is the future.

Here is a list of ways in which Shambhala is currently wedded to a
filesystem as a back-end:

1) Translation handlers have to be translating into some sort of a
namespace; currently, the filesystem is it.

2) The server core scans the translated pathnames, looking for
.htaccess files to read per-directory permissions out of. A
database-back-ended server would presumably want a similar
mechanism, but coming up with a suitably general interface
is extremely difficult.

3) The response handlers all invariably use fopen() to get at the
filesystem object whose name popped out of the translation
handler.

I've thought fairly long and hard about how to come up with an API
which generalizes all these things, and I can't. I've then given up on
3), decided that anything which was in the DB back-end would need its
own response handler, and I have a few ideas about how that could work
(it helps to start distinguishing internal object type from the type
that will be served to the client, so that you can dispatch on the
former and do content negotiation on the latter), but it's still quite
messy (particularly #2 --- what interface do you provide to the
command stuff)?

It took me a couple of *months* to come up with clean APIs for what
Shambhala does now --- I was effectively AWOL for quite a bit longer
than people seem to have noticed. I expect it would take at least
an equivalent amount of time to come up with a good clean design for
this and make it work, and I'm not sure I have the time for that right
now. Sigh...

rst
Re: SSI handlers
Date: Mon, 10 Jul 1995 18:38:59 -0700 (PDT)
From: Brian Behlendorf <brian@organic.com>

What's the best way to configure things such that accesses to a particular
sub-path are acted upon internally differently? I.e., if I write a
system where all the URLs sit under

http://host/program/arg1/arg2?arg3

(etc etc), is there a way I can write a module to handle that? I.e.

http://host/music-database/artist/autechre
http://host/music-database/label/warp
http://host/music-database/search-for?Richard%20H%20Kirk

etc..... where "music-database" is a module?

Look at how ScriptAlias works, in mod_cgi.c; you could do the same
thing with a "database-access" magic type of your own invention,
instead of the CGI_MAGIC_TYPE. This would cause your own response
handler (presumably in the same module) to be invoked when that phase
rolls around.

It isn't pretty, but it does the job.
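A hedged sketch of that arrangement, reduced to standalone C so it can run by itself. The names (DB_MAGIC_TYPE, translate_music, serve) are invented here; the real mod_cgi.c wires its translation and response handlers into Shambhala's module tables rather than calling them directly.

```c
/* Sketch: a translation step tags URIs under a prefix with a private
 * "magic" content type; dispatch then routes that type to our handler. */
#include <assert.h>
#include <string.h>

#define DB_MAGIC_TYPE "application/x-httpd-music-db"

/* Translation phase: claim everything under /music-database/.
 * Returns the type to dispatch on, or NULL to decline. */
const char *translate_music(const char *uri)
{
    const char *prefix = "/music-database/";
    if (strncmp(uri, prefix, strlen(prefix)) == 0)
        return DB_MAGIC_TYPE;
    return NULL;   /* fall through to ordinary filesystem translation */
}

/* Response phase: invoked only for our magic type; the tail of the
 * URI plays the role of arg1/arg2.  A real handler would query the
 * database here -- this one just echoes the arguments. */
int music_db_handler(const char *uri, char *out, size_t outlen)
{
    const char *args = uri + strlen("/music-database/");
    strncpy(out, args, outlen - 1);
    out[outlen - 1] = '\0';
    return 0;      /* OK */
}

/* What the core would do: translate, then dispatch on the type. */
int serve(const char *uri, char *out, size_t outlen)
{
    const char *type = translate_music(uri);
    if (type && strcmp(type, DB_MAGIC_TYPE) == 0)
        return music_db_handler(uri, out, outlen);
    return -1;     /* declined: would fall through to file serving */
}
```

This mirrors how ScriptAlias turns a path prefix into CGI_MAGIC_TYPE so the CGI response handler fires for that subtree.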

rst
Re: SSI handlers
Both issues are handler issues, but they are distinct. I like pulling them
apart. In fact, maybe two APIs for handlers instead of one: one to
get objects, one to manipulate them.

On Mon, 10 Jul 1995 19:49:47 EDT, rst@ai.mit.edu (Robert S. Thau) wrote:
} From: Cliff Skolnick <cliff@organic.com>
} First, topic #2.
}
} I'm not sure we will need threading for HTTP-NG. I don't see why one
} thread can't handle multiple requests at once to be honest. All you
} really need is some sort of async I/O. This can be simulated in an
} OS-neutral way, unlike threads. Of course, many of the same concerns and
} issues would still apply in the code.
}
} To begin with, my problem with your <TCL> tag has less to do
} with multithreading per se than it has to do with simply having
} multiple requests served by one process, regardless of how that's
} accomplished. Consider a "document" with the following embedded code:
}
} <!--#TCL-->
} while {1} {open /dev/null r}
} <!--#/TCL-->
}
} --- I hope you'll pardon me for using my own suggested syntax.
}
} Whatever process winds up serving this document will quickly run out
} of file descriptors (too quickly to be caught by a timeout, for
} instance) --- at which point all other requests being served by the
} same process would be totally screwed. As I've already pointed out to
} you, detecting and recovering from these sorts of situations, in the
} general case, is very difficult. This has nothing to do with
} multithreading per se --- it has to do with having one process serving
} multiple requests, no matter how that is accomplished.

Fine on the syntax; I like the idea of all non-standard tags sticking
out. Forget I even mentioned <TCL> :). I understand the issue here, and
I don't think we can protect people from doing stupid things. I'd like
to see a very restrictive language instead of Tcl, but I don't want to
write one. So I will use Tcl and keep my fingers crossed. If there is
something better I will jump on it. I'm not arguing to hang people, just
give 'em the rope. I want this ability in the API, but I don't want to see
it distributed with the server. Is that OK?

} (You can say that no one would write that sort of thing deliberately,
} and at Organic, that may even be true. But sooner or later, it's
} going to happen by accident --- or worse, you'll have a slow leak
} which doesn't get detected until the "document" gets loose on your
} primary server and brings *it* down. If you're doing one request per
} process, you have exit() as a last-ditch way to escape from these
} situations, but if multiple requests are being served by the same
} process, and for HTTP-NG they basically have to be, there is no way
} out).

Again, this will be use-at-your-own-risk. In fact, since you can add
handlers, are we protecting against someone adding a bad one with a leak?
I think not. Nothing is making them use your alloc routines.

} Now, on to topic #1 --- skipping point-by-point replies, the heart of
} it is:
}
} I don't expect the Apache group to do this, but I do want to see APIs that
} support a database vendor or an interested third party (like Organic)
} doing this. This is the future.
}
} Here is a list of ways in which Shambhala is currently wedded to a
} filesystem as a back-end:
}
} 1) Translation handlers have to be translating into some sort of a
} namespace; currently, the filesystem is it.

This is actually fine for a database. Most information can be arranged
hierarchically, which maps to file names. Nothing needs to change here.
}
} 2) The server core scans the translated pathnames, looking for
} .htaccess files to read per-directory permissions out of. A
} database-back-ended server would presumably want a similar
} mechanism, but coming up with a suitably general interface
} is extremely difficult.

This may still work: the server could "fetch" an object "{somename}/.htaccess"
which may be returned by the database. I'd rather have an "is this object
access controlled" call, which would be a backing-store-specific function
combined with an access method. I am sure the database vendor would like to
store the user/password database themselves and not have it look like a flat
file. Perhaps the module itself should be given the translated object
(filename) and return whether it needs to be called for further access control.
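One way to picture that "is this object access controlled" call is as a per-backing-store hook. Everything below (the backing_store struct, the demo rule) is hypothetical; a database module would answer from its own metadata instead of hunting for a .htaccess file.

```c
/* Sketch: access-control check as a backing-store-specific hook. */
#include <assert.h>
#include <string.h>

struct backing_store {
    const char *name;
    /* returns 1 if the named object has access control attached */
    int (*is_access_controlled)(const char *object);
};

/* Demo store: pretend anything under /private/ is protected.
 * A real mod_db.c would consult the database's own metadata. */
static int db_is_access_controlled(const char *object)
{
    return strncmp(object, "/private/", 9) == 0;
}

static struct backing_store demo_db = {
    "demo-db", db_is_access_controlled
};

/* What the server core would do after translation: ask the store,
 * and only run the auth machinery when the store says so. */
int needs_auth_check(struct backing_store *s, const char *object)
{
    return s->is_access_controlled(object);
}
```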

}
} 3) The response handlers all invariably use fopen() to get at the
} filesystem object whose name popped out of the translation
} handler.
}
} I've thought fairly long and hard about how to come up with an API
} which generalizes all these things, and I can't. I've then given up on
} 3), decided that anything which was in the DB back-end would need its
} own response handler, and I have a few ideas about how that could work
} (it helps to start distinguishing internal object type from the type
} that will be served to the client, so that you can dispatch on the
} former and do content negotiation on the latter), but it's still quite
} messy (particularly #2 --- what interface do you provide to the
} command stuff)?

I don't know the answer to this, but I could see two things:

1) A module that knows how to
   start access to an object (open)
   read data from an object (read)
   stop access (close)

mod_filesystem.c and mod_db.c would have these routines. I think we
can safely assume sequential access to an object, so this should be OK.

2) Handlers like mod_include.c would get handed an object where data was ready
to be read. The framework would do the start access and stop access
to the object. The framework could also provide glue that would allow
the output of handler A to be fed into handler B.
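Cliff's open/read/close layer could look like one vtable of routines per backing store, which mod_filesystem.c and mod_db.c would each fill in. The in-memory store below is a stand-in invented for this sketch, and slurp_object plays the framework's role of opening the object, handing it over, and closing it.

```c
/* Sketch: per-backing-store object access as a table of routines. */
#include <assert.h>
#include <string.h>

struct object_ops {
    void *(*open)(const char *name);             /* start access */
    int   (*read)(void *h, char *buf, int len);  /* sequential read */
    void  (*close)(void *h);                     /* stop access */
};

/* In-memory stand-in: one fixed object read through a cursor. */
struct mem_handle { const char *data; int pos; };
static struct mem_handle the_handle;

static void *mem_open(const char *name)
{
    (void)name;                      /* single hard-wired object */
    the_handle.data = "object contents";
    the_handle.pos = 0;
    return &the_handle;
}

static int mem_read(void *h, char *buf, int len)
{
    struct mem_handle *m = h;
    int left = (int)strlen(m->data) - m->pos;
    int n = left < len ? left : len;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}

static void mem_close(void *h) { (void)h; }

static struct object_ops mem_store = { mem_open, mem_read, mem_close };

/* Framework side: open the object, read it sequentially for the
 * handler, close it afterward; returns bytes read. */
int slurp_object(struct object_ops *ops, const char *name,
                 char *out, int outlen)
{
    void *h = ops->open(name);
    int total = 0, n;
    while (total < outlen - 1 &&
           (n = ops->read(h, out + total, outlen - 1 - total)) > 0)
        total += n;
    out[total] = '\0';
    ops->close(h);
    return total;
}
```

The same shape would let the framework chain handlers: the output buffer of handler A becomes the object handed to handler B.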

} It took me a couple of *months* to come up with clean APIs for what
} Shambhala does now --- I was effectively AWOL for quite a bit longer
} than people seem to have noticed. I expect it would take at least
} an equivalent amount of time to come up with a good clean design for
} this and make it work, and I'm not sure I have the time for that right
} now. Sigh...

Hey, this stuff is #ifdef FUTURE. Maybe even a 2.0 thing. I think you've
done most excellent work and have thought about these implementation issues
more than me. I just have a very strong feeling that this is where
the web is going, and I want to be there. I was hoping this would be easier,
but you've given many good arguments as to why these are a pain.

Our priority should be to get this code out there and have people look at it
and make suggestions. Let's release with only changes that you (rst) see
as high priority. I'll even shut up about this until we release. Hmm, maybe
I should shut up after release as motivation for the release. :)

Cliff