>1. There isn't a bug - appropriate configuration would solve the reported
>problem. Putting cgi-bin in your document tree seems unwise. I suppose that
>other URLs would also be able to access it
>(e.g. /somedir/../cgi-bin/somescript), but I haven't tried it. Also,
>presumably, SSIs could include them, even with the new restrictions, which
>would present an internal security problem.
>2. Double slashes don't currently mean anything to Apache, don't make any
>great sense to me, and lead to unintuitive defeats of various useful
>mechanisms. It would seem not unreasonable to ban them, pending a defined use
>of them, or to convert them all to single slashes.
>I don't understand the need for support of // in PATH_INFO, though.
Err, in which case, why support the ^ character either? (Example chosen at
random.)
Let me tell a story.
An http URL was defined as
http://host[:port][path]
A path is defined as a concatenation of zero or more path segments, separated
by '/'. A path segment may consist of _zero_ or more characters from a
permitted set.
"" /wibble /wombat/subres /wombat//subres /wibble////
are all valid paths. /wombat/subres and /wombat//subres are not required
to identify the same resource.
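In other words, a null segment is syntactically legal. A quick Python sketch
(illustrative only; `path_segments` is an invented helper, not anything from
httpd) of splitting an absolute path into its segments:

```python
def path_segments(path):
    """Split an absolute http URL path into its '/'-separated segments.
    The leading '/' marks an absolute path; segments follow it."""
    return path.split("/")[1:]

print(path_segments(""))                 # []  (zero segments)
print(path_segments("/wombat/subres"))   # ['wombat', 'subres']
print(path_segments("/wombat//subres"))  # ['wombat', '', 'subres']
print(path_segments("/wibble////"))      # ['wibble', '', '', '', '']
```

Note that /wombat//subres really does contain three segments, one of them
zero-length.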
Problem: the Unix file system does not allow "" as the name of a file.
Conventional behaviour _ignores_ null path segments in pathnames passed to
the system routines; the pathnames /foo/bar and /foo//bar represent the same
file.
So, how do we map the URL semantics to the file system's semantics?
The NCSA (& Apache 0.6.5) solution: remove null path segments from the
entire URL. Thus
http://host/wombat/subres and http://host/wombat//subres access the
same resource.
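That normalisation can be sketched in a couple of lines of Python (an
illustrative helper, not the actual NCSA httpd code):

```python
import re

def collapse_null_segments(path):
    """Collapse each run of slashes to a single '/', as NCSA does
    across the whole URL (a trailing slash survives as '/')."""
    return re.sub(r"/+", "/", path)

print(collapse_null_segments("/wombat//subres"))  # /wombat/subres
print(collapse_null_segments("/wibble////"))      # /wibble/
```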
What is wrong with this?
1. Relative links don't work.
If the document subres contains a link to ../index.html, then when accessed
as http://host/wombat/subres the link refers to http://host/index.html;
whereas for http://host/wombat//subres the link refers to
http://host/wombat/index.html.
2. CGI scripts don't get the data they expect.
If /cgi-bin/fetch is a script, then an access to
http://host/cgi-bin/fetch/some//path calls the script with PATH_INFO set
to /some/path rather than the /some//path that was requested.
3. I don't think that documents should have multiple URLs unless the user
wanted this.
Other solutions would be to _redirect_ the request (redirect
http://host/wombat//subres -> http://host/wombat/subres) so that relative
links 'work', or to treat the request as asking for access to a directory
("") which does not exist, and return 404 Not Found or 403 Forbidden.
This might optionally be not applied to the PATH_INFO that will be handled
by a CGI script.
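That redirect-but-leave-PATH_INFO-alone policy might look something like
this (a hypothetical sketch; `check_null_segments` and its calling
convention are invented for illustration, and in a real server the split
between script path and PATH_INFO happens elsewhere):

```python
import re

def check_null_segments(path, path_info=""):
    """Return ('redirect', cleaned_url) if the script part of the URL
    contains null segments, else ('ok', url).  PATH_INFO is deliberately
    passed through untouched."""
    if "//" in path:
        return ("redirect", re.sub(r"/+", "/", path) + path_info)
    return ("ok", path + path_info)

print(check_null_segments("/wombat//subres"))
# ('redirect', '/wombat/subres')
print(check_null_segments("/cgi-bin/fetch", "/some//path"))
# ('ok', '/cgi-bin/fetch/some//path')
```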
The Apache behaviour:
Apache attempts to emulate the NCSA behaviour, but without removing multiple
slashes from PATH_INFO data. Unfortunately, it gets it _wrong_; although in
the majority of cases it ignores void path segments, it does not always do
so. Here are the bugs:
* Multiple slashes defeat Alias, ScriptAlias and Redirect directives.
DocumentRoot /web/docs
ScriptAlias /cgi-bin /web/cgi-bin
http://host//cgi-bin/c references the file /web/docs/cgi-bin/c
http://host/cgi-bin/c references the script /web/cgi-bin/c
This is not compatible with the NCSA behaviour, which would map both
of these to the CGI script.
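The Alias/ScriptAlias defeat comes down to literal prefix matching against
the un-normalised URL (a toy illustration with the directives above baked
in; this is not the actual Apache translate-handler code):

```python
# ScriptAlias compares its prefix "/cgi-bin" literally against the URL
# path; a doubled leading slash stops the match, so the request falls
# through to DocumentRoot instead of the script directory.
def translate(url_path, document_root="/web/docs",
              script_alias=("/cgi-bin", "/web/cgi-bin")):
    prefix, target = script_alias
    if url_path.startswith(prefix):
        return "script:" + target + url_path[len(prefix):]
    return "file:" + document_root + url_path

print(translate("/cgi-bin/c"))    # script:/web/cgi-bin/c
print(translate("//cgi-bin/c"))   # file:/web/docs//cgi-bin/c
```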
* Multiple slashes defeat the Userdir directive.
http://host//~drtr/dir/
http://host/~drtr//dir/
http://host/~drtr/dir/
all reference the _same_ file under NCSA httpd; Apache treats the first
as a reference to /web/docs/~drtr/dir/ rather than /home/drtr/dir/.
Similarly, AddDescription does not work.
Whether these bugs are significant is a moot point. However, they are bugs,
and they do represent incompatibilities with NCSA, and they could catch out
the unwary. (As has happened; the original poster had assumed that
ScriptAlias /cgi-bin ... would mean that /documentroot/cgi-bin would not
be accessible to the client directly.)
Apache should, at the very least, be consistent in its handling of void
path segments. I think the current NCSA behaviour is poor, and that
such requests should be either redirected or forbidden. The clients that
rst sees making these accesses are probably getting pagefuls of bad links.
David.