Mailing List Archive

CGI specification (was Re: restructuring the server)
Beth wrote:
>... The administration
>and CGI tutorials are another set of meetings suggested. I
>figured those in this group would be most interested in
>the third set of meetings where we listen to gripes, discuss
>requested features and present our grand plans for a
>Dynamic Object Repository (DOR) as a bull's eye for y'all
>to shoot at. In particular we would like help developing
>an Application Program Interface (API) to replace/enhance
>CGI...

I've just tried to write a decent specification for CGI; not very
easy. Have a look at <URL:http://www.ast.cam.ac.uk/&7Edrtr/cgi.html>

I've aimed at a precise definition of all the environment variables;
in particular, PATH_INFO and PATH_TRANSLATED required some thought.
(I started this document to organise my thoughts about these variables
because of a discussion on this list)

I've highlighted problem areas with italicised notes; these still need to
be sorted out.

The main differences with the NCSA docs are:
1. I specify some objects completely, such as the complete syntax for all
the environment variables.
2. I don't specify parameters which are features of the host operating system,
such as how the script accesses `environment variables', or what character
set it must use.
3. I give a rather different interpretation of PATH_INFO, QUERY_STRING and
PATH_TRANSLATED (the latter now optional); I was forced into this
interpretation by the behaviour of httpd/apache.
4. Emphasises that a standard CGI script does not and cannot know the
original URL the client requested. (And hence cannot use it as a base
for relative links in any HTML returned).
5. Some obvious minor improvements, such as requiring that ISINDEX queries use
the "GET" method, or that the authentication information (AUTH_TYPE and
REMOTE_USER) be set only if the _script_ is protected, rather than whenever
the data is sent by the client.

The document is in the nature of a base specification; from it one could
derive CGI specifications for specific systems, such as Unix, MS-Windows etc.

I would welcome comments, especially about what should be done with this
document.

David.
Re: CGI specification (was Re: restructuring the server) [ In reply to ]
Correction number one --- the '&' in Dave's URL should be a '%'.

rst
Re: CGI specification (was Re: restructuring the server) [ In reply to ]
Date: Thu, 27 Apr 95 18:35 BST
From: drtr@ast.cam.ac.uk (David Robinson)

Dave --- a formal CGI spec would be a good idea. I haven't gone over
your actual document yet, so I can't really post a full critique;
however, I do have one comment.

>4. Emphasises that a standard CGI script does not and cannot know the
>original URL the client requested. (And hence cannot use it as a base
>for relative links in any HTML returned).

With the Apache and NCSA 1.4 treatment of PATH_INFO, the script *can*
know the client's URL --- it is guaranteed to be the same as the
concatenation of SCRIPT_URI and PATH_INFO. In fact, I'd prefer to see
this used as the definition of PATH_INFO in your document. (The
language you have now is awfully non-committal).

(The only deviation from this invariant with NCSA 1.3, and possibly
CERN, is in the case where PATH_INFO consists entirely of '/'
characters, and as everyone will remember from the flame wars we had
about *that*, I regard that deviation as a bug).
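
[Editorial sketch] The concatenation claim above can be illustrated with a few lines of Python. This is a minimal sketch that assumes the proposed invariant actually holds on the server in question; the function name and the example environment are hypothetical, and the key SCRIPT_URI simply follows the naming used in this message. It does not attempt to handle the all-'/' PATH_INFO case mentioned above.

```python
def reconstruct_request_path(environ):
    """Rebuild the requested URL path as SCRIPT_URI followed by PATH_INFO.

    Only meaningful if the server guarantees the invariant discussed
    above; the key names mirror the message and are an assumption.
    """
    script_uri = environ.get("SCRIPT_URI", "")
    path_info = environ.get("PATH_INFO", "")
    return script_uri + path_info


# Hypothetical environment for a request of /cgi-bin/script/extra/path
example = {"SCRIPT_URI": "/cgi-bin/script", "PATH_INFO": "/extra/path"}
print(reconstruct_request_path(example))
```

A script could use the result as the base for relative links, but only on servers where the invariant is known to hold.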

Here we go again...

rst
Re: CGI specification (was Re: restructuring the server) [ In reply to ]
Date: Thu, 27 Apr 95 21:30 BST
From: drtr@ast.cam.ac.uk (David Robinson)

>Except I'm trying to write a standard, instead of a description of
>what apache does. I suppose a 'conformant' CGI script is allowed
>to look at SERVER_SOFTWARE and make extra assumptions based on that.

Hmmm... The following describes every CGI implementation that I'm
aware of, except for behavior in pathological cases (e.g., the case in
which 1.3 doesn't even invoke the script) --- in particular, if I read
the code right, it always describes CERN, and it always describes NCSA
1.4 and Apache:

  The server maps *prefixes* of URLs onto actual scripts in a
  server-defined manner. When a script is invoked, SCRIPT_URI is set
  to the prefix which caused the server to select that particular
  script, and the rest of the URL (if any) is available in PATH_INFO.
  The details of this mapping of SCRIPT_URIs to actual scripts are
  outside the scope of the CGI spec. However, regardless of the
  details of the mapping, the following must always be true:

  There is at least one URL which causes the script to be invoked
  with SCRIPT_URI set to that URL, and PATH_INFO set to the null
  string. Appending any string which begins with '/' to any such URL
  will cause the script to be invoked with the appended string as
  PATH_INFO.

  Conversely, when any script is invoked, it shall be the case that
  SCRIPT_URI and PATH_INFO, when concatenated, are the same as the
  URL that caused the script to be invoked, and that submitting
  SCRIPT_URI on its own causes the same script to be invoked with
  null PATH_INFO.

This language makes no reference to the internals of Apache, or NCSA
1.4. What's more important, any server can implement this
specification without constraining the way it finds scripts in any
way, shape or form.

In fact, I can't think of a way to write a CGI script which usefully
uses PATH_INFO without being able to count on the above-specified
behavior (again, barring failure to handle pathological cases).
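
[Editorial sketch] The prefix-mapping rule proposed above can be made concrete with a short Python example. It is a sketch under stated assumptions, not any server's actual code: the function name, the set of script prefixes, and the longest-prefix tie-break are all hypothetical choices, since the proposal deliberately leaves the mapping server-defined.

```python
def split_script_path(url_path, script_prefixes):
    """Split url_path into (script_uri, path_info), or return None.

    script_prefixes is a hypothetical collection of URL prefixes that the
    server maps onto scripts; how it is built is server-defined.
    """
    # Try longer prefixes first so a more specific script wins.
    for prefix in sorted(script_prefixes, key=len, reverse=True):
        if url_path == prefix:
            return prefix, ""  # direct invocation, null PATH_INFO
        if url_path.startswith(prefix + "/"):
            return prefix, url_path[len(prefix):]  # remainder begins with '/'
    return None


prefixes = {"/cgi-bin/script"}
print(split_script_path("/cgi-bin/script", prefixes))
print(split_script_path("/cgi-bin/script/a/b", prefixes))
```

With a split of this shape, concatenating the two returned parts always gives back the path that was passed in, which is the property the proposed wording asks of servers.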

>And yes, I'd rather be less non-committal, but that's the way
>things are.

Are they? What servers do anything different?

rst
Re: CGI specification (was Re: restructuring the server) [ In reply to ]
>With the Apache and NCSA 1.4 treatment of PATH_INFO, the script *can*
>know the client's URL --- it is guaranteed to be the same as the
>concatenation of SCRIPT_URI and PATH_INFO. In fact, I'd prefer to see
>this used as the definition of PATH_INFO in your document. (The
>language you have now is awfully non-committal).

Except I'm trying to write a standard, instead of a description of
what apache does. I suppose a 'conformant' CGI script is allowed
to look at SERVER_SOFTWARE and make extra assumptions based on that.

And yes, I'd rather be less non-committal, but that's the way
things are.

>(The only deviation from this invariant with NCSA 1.3, and possibly
>CERN, is in the case where PATH_INFO consists entirely of '/'
>characters, and as everyone will remember from the flame wars we had
>about *that*, I regard that deviation as a bug).

I've put warnings in about that case, recommending that scripts
not try and cope with it.

David.
Re: CGI specification (was Re: restructuring the server) [ In reply to ]
From: drtr@ast.cam.ac.uk (David Robinson)

>Apache, for one.
>
>As you say, there is at least one URL for which PATH_INFO is well defined.
>However, that does not require this to be the only URL which can invoke the
>script. Thus, given this URL, one can deduce PATH_INFO. But given
>PATH_INFO and SCRIPT_NAME, one cannot deduce what the URL was.
>
>Consider:
>ScriptAlias /cgi-bin/ /htdocs/cgi-bin/
>ScriptAlias /htbin/ /htdocs/cgi-bin/
>
>for a client request of http://host.name/htbin/script/path
>SCRIPT_NAME=/cgi-bin/script and PATH_INFO=/path are valid settings
>(and quite likely with apache or NCSA httpd). Even the NCSA docs
>acknowledge this.

Hmmm... I was very careful *not* to require that the URL which causes
a script to be invoked with null PATH_INFO be unique. However, there
is the possibility, at least with NCSA 1.3-derivative code, that with
multiple mappings like this, unmunge_name may choose the wrong
mapping. But I'd still prefer to see PATH_INFO specified as above,
and to consider the Apache and NCSA 1.3 behavior to be a bug
(particularly since the same behavior, from the same routine, can
cause the URLs logged in case of various errors to differ from what
the client actually presented --- which is clearly and unambiguously
erroneous, even though it's incredibly hard to fix).
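
[Editorial sketch] The ambiguity in the two-alias example can be reproduced with a small Python model. This is not the NCSA or Apache code and makes no claim about how unmunge_name works internally; the alias table, both function names, and the "first matching alias wins" rule are assumptions chosen only to show why mapping a file back to a URL can pick a prefix the client never used.

```python
# Hypothetical alias table modelled on the two ScriptAlias lines quoted
# above: (URL prefix, directory on disk), in configuration order.
ALIASES = [
    ("/cgi-bin/", "/htdocs/cgi-bin/"),
    ("/htbin/", "/htdocs/cgi-bin/"),
]


def url_to_file(url_path):
    """Forward mapping: translate a URL path to a file path, or None."""
    for url_prefix, directory in ALIASES:
        if url_path.startswith(url_prefix):
            return directory + url_path[len(url_prefix):]
    return None


def file_to_url(file_path):
    """Naive reverse mapping: the first alias whose directory matches wins."""
    for url_prefix, directory in ALIASES:
        if file_path.startswith(directory):
            return url_prefix + file_path[len(directory):]
    return None


requested = "/htbin/script"
script_file = url_to_file(requested)
print(script_file)               # file selected for the request
print(file_to_url(script_file))  # reverse-mapped name; may not match the request
```

Because both prefixes point at the same directory, the reverse step has no way to tell which one the client used unless the server remembers the original request.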

>Or consider:
>/doc.shtml is a server-side include document which does
><!--#exec cgi="/cgi-bin/script" -->
>
>If the client requests http://host.name/doc.shtml/path
>then the script is called with SCRIPT_NAME=/cgi-bin/script, PATH_INFO=/path
>
>I do not consider these to be pathological cases. In fact, I was
>told (on this list) that people rely on the behaviour shown by the
>second example. (Much as I deplore it.)

Invocation of scripts from server-side includes is an interesting
thing to think about, but I think the CGI spec proper can be confined
to direct invocation of scripts for the moment.

Still, this is an awkward case... a script which can be invoked either
directly, or through <!--#exec cgi--> can construct, at the very
least, a URL which will cause it to be reinvoked in the same manner,
but it has to jump through an uncomfortable number of hoops to do it.
Sigh...

rst
Re: CGI specification (was Re: restructuring the server) [ In reply to ]
Rst wrote:
>Hmmm... The following describes every CGI implementation that I'm
>aware of, except for behavior in pathological cases (e.g., the case in
>which 1.3 doesn't even invoke the script) --- in particular, if I read
>the code right, it always describes CERN, and it always describes NCSA
>1.4 and Apache:
>
> The server maps *prefixes* of URLs onto actual scripts in a
> server-defined manner. When a script is invoked, SCRIPT_URI is set
> to the prefix which caused the server to select that particular
> script, and the rest of the URL (if any) is available in PATH_INFO....
No (see below).
> ...The details of this mapping of SCRIPT_URIs to actual scripts are
> outside the scope of the CGI spec. However, regardless of the
> details of the mapping, the following must always be true:
>
> There is at least one URL which causes the script to be invoked
> with SCRIPT_URI set to that URL, and PATH_INFO set to the null
> string. Appending any string which begins with '/' to any such URL
> will cause the script to be invoked with the appended string as
> PATH_INFO.

I already have your second paragraph under 'Server requirements'. Though
I might not have said it as clearly.

>This language makes no reference to the internals of Apache, or NCSA
>1.4. What's more important, any server can implement this
>specification without constraining the way it finds scripts in any
>way, shape or form.

I know. I've already considered this.

>In fact, I can't think of a way to write a CGI script which usefully
>uses PATH_INFO without being able to count on the above-specified
>behavior (again, barring failure to handle pathological cases).

Indeed, which is why I require the server to provide this behaviour.

>
> And yes, I'd rather be less non-committal, but that's the way
> things are.
>
>Are they? What servers do anything different?

Apache, for one.

As you say, there is at least one URL for which PATH_INFO is well defined.
However, that does not require this to be the only URL which can invoke the
script. Thus, given this URL, one can deduce PATH_INFO. But given
PATH_INFO and SCRIPT_NAME, one cannot deduce what the URL was.

Consider:
ScriptAlias /cgi-bin/ /htdocs/cgi-bin/
ScriptAlias /htbin/ /htdocs/cgi-bin/

for a client request of http://host.name/htbin/script/path
SCRIPT_NAME=/cgi-bin/script and PATH_INFO=/path are valid settings (and quite
likely with apache or NCSA httpd). Even the NCSA docs acknowledge this.

Or consider:
/doc.shtml is a server-side include document which does
<!--#exec cgi="/cgi-bin/script" -->

If the client requests http://host.name/doc.shtml/path
then the script is called with SCRIPT_NAME=/cgi-bin/script, PATH_INFO=/path

I do not consider these to be pathological cases. In fact, I was told (on this
list) that people rely on the behaviour shown by the second example.
(Much as I deplore it.)

David.
Re: CGI specification (was Re: restructuring the server) [ In reply to ]
It doesn't sound as though rst and I disagree that much...

Basically, I started by writing a spec which matched current code, so that
we could then consider how it could be improved, rather than writing down
the ideal spec and then realising how nothing implemented it.

> Or consider:
> /doc.shtml is a server-side include document which does
> <!--#exec cgi="/cgi-bin/script" -->
>
> If the client requests http://host.name/doc.shtml/path
> then the script is called with SCRIPT_NAME=/cgi-bin/script, PATH_INFO=/path
>
> I do not consider these to be pathological cases. In fact, I was
> told (on this list) that people rely on the behaviour shown by the
> second example. (Much as I deplore it.)
>
> Invocation of scripts from server-side includes is an interesting
> thing to think about, but I think the CGI spec proper can be confined
> to direct invocation of scripts for the moment.

That's how it started out; but I gave up trying to define a `directly invoked
script'.

I don't think we lose too much by making the spec general; I still require
the server to at least provide the `direct invocation'. (Although I don't call
it that; maybe I should.)

> Still, this is an awkward case... a script which can be invoked either
> directly, or through <!--#exec cgi--> can construct, at the very
> least, a URL which will cause it to be reinvoked in the same manner,
> but it has to jump through an uncomfortable number of hoops to do it.
> Sigh...

Yes... I do define this as a 'script URL' (distinct from the client URL), but
I'm not sure it's of much use.

And I was also thinking ahead to AddHandler CGI scripts, i.e. a script
the server calls to output all .xxx files; then a PATH_INFO of the entire
URL path would be consistent with my CGI spec. (Though would you be
able to include .xxx files from ssi-scripts?)

David.