Mailing List Archive

Content-type negotiation: thoughts and code
Generally, it's been my experience that nothing clarifies the issues
involved in doing something like actually writing code to do it. I
decided to take this approach with regard to content-type negotiation;
as a result, I now have two new patches, both in
ftp://ftp.ai.mit.edu/pub/users/rst/httpd-patches:

patch.addtype-bug --- Largely a code cleanup, but also fixes the
long-standing and oft-reported bug that the usual AddType directives
for *.cgi and *.shtml are ineffective in .htaccess files, and
leaves a few hooks for...

patch.content-arb --- Content-type negotiation. (Yes, really ---
but see caveats below --- a user's-manual style writeup on what
this does is towards the bottom of this note, after "HERE'S THE DOCS").
This patch is 650-odd lines, of which about 500 are the contents
of the new file http_mime_db.c, which does most of the work.

This code does a *basic* job of content-type negotiation --- it parses
everything (at least according to my reading of the relevant standards
documents, and the CERN code), but it doesn't actually use the
Accept-content-encoding and Accept-language info yet; however, it does
handle qualities. (It also doesn't yet return the correct HTTP error
codes in all cases --- in particular, it returns 404 Not Found, rather
than 406 None Acceptable, when content-type negotiation does find
alternate views, of which none are acceptable).
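The status-code rule described above can be sketched as follows. This is a minimal illustration in Python rather than the server's C; the function name and its arguments are hypothetical and are not taken from the patch.

```python
def negotiation_status(variants_found: int, acceptable: int) -> int:
    """Pick an HTTP status for a negotiated request.

    variants_found: number of alternate views located for the resource.
    acceptable: how many of those the client's Accept headers allow.
    (Both parameters are illustrative, not names from the patch.)
    """
    if variants_found == 0:
        return 404  # nothing exists under this name at all
    if acceptable == 0:
        return 406  # views exist, but none is acceptable to the client
    return 200
```

The behaviour reported in the note corresponds to returning 404 from the second branch as well.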

With this code in mind, some reflections on the discussions we've had
on this subject over the past week or two.



First off, with regard to the question of whether to support
CERN-style auto-arbitration based on filename extensions, or to have
explicit map files, I'd like to suggest that we can afford to support
both. Directories take very little extra code to support beyond what
you need to handle the contents of the map files themselves --- from
the server's point of view, they're just another form of map file,
which happens to come pre-parsed. However, the gain in convenience
for people who can use the feature is substantial.

(Of course, some people *don't* want to use the feature, so it needs
to be a configuration option. The way I've done it in my code above,
you need "Options MultiViews" enabled in a directory in order for "GET
/.../foo" to be resolved to "/.../foo.gif" or "/.../foo.jpeg". In
directories where MultiViews is off, the server behaves exactly as it
would if the directory-scanning code were never there, so I don't
*think* there's a back-compatibility issue ;-).



A second interesting thing which comes up is what to do with clients
like certain, ahem, colorful browser betas which ship completely bogus
Accept: headers (or HTTP/0.9 browsers, which don't ship any). My code
basically pretends the browser did "Accept: text/html" and "Accept:
text/plain" whether it actually did or not, to ease this difficulty;
is that the right thing?



Then there's security. The issue here is that if some Malevolent
Entity (say, a cracker exploiting a leaky ftp server) can create
type-map files, you don't want the server believing one which names
/etc/passwd as the text/plain view of the composite entity described
by /inoccuous/directory/pretty-bunny.map. My code takes the
thoroughly draconian approach of making all pathnames in map files
relative to the map file itself, and *disallowing* relative paths
containing '/', so a type-map file can *only* name things in the same
directory (although those can be symlinks *if* FollowSymLinks is
enabled). Is that the wrong thing? If so, what's the right thing?
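A minimal sketch of the same-directory rule, in Python for brevity. The function name is hypothetical, and rejecting the bare names "." and ".." is an added assumption; the note itself only mentions disallowing names containing '/'.

```python
def is_same_directory_name(name: str) -> bool:
    """Return True only for a bare filename in the map file's own directory."""
    # Assumption (not stated in the note): empty names, '.' and '..' are rejected.
    if not name or name in (".", ".."):
        return False
    # Any '/' means an absolute path or a path into another directory.
    return "/" not in name
```

Under this rule "/etc/passwd", "../secret" and "sub/foo.gif" are all refused, while "foo.gif" is allowed. Symlink handling is left to the server's existing FollowSymLinks check and is not modelled here.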




As a final point, writing the code raises the question of what the
map files should look like. What I've done probably isn't the right
thing, but a discussion of what's wrong with it might prove
instructive. The map files implemented by my code just look like:

foo.au: audio/basic
foo.gif: image/gif
foo.html: text/html
foo.txt: text/plain

(this being the contents of foo.map). Qualities can be associated
with these ---

foo.gif: image/gif; q = 0.6
foo.jpeg: image/jpeg; q = 1.0
foo.xbm: image/x-xbitmap; q = 0.00001

but that's about it. My reasons for choosing this syntax were crass
and pragmatic --- I got to reuse the Accept:-line parsing code. (That
means that

foo.txt: text/plain, text/setext

works, if both those types apply to the document --- in fact, for
compatibility with the broken past,

foo.txt: text/plain text/setext

works, but Lord knows we don't want to advertise *that*).
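For illustration, the one-line syntax above could be parsed roughly like this. It is a sketch in Python, not the Accept:-line parser the patch reuses; the function name is hypothetical, and the default q of 1.0 for entries without an explicit quality is an assumption.

```python
import re

def parse_map_line(line: str):
    """Parse 'name: type[; q = N][, type ...]' into (name, [(type, q), ...])."""
    name, _, rest = line.partition(":")
    entries = []
    for item in rest.split(","):
        mime, _, params = item.partition(";")
        q = 1.0  # assumed default when no q parameter is given
        match = re.search(r"\bq\s*=\s*([0-9.]+)", params)
        if match:
            q = float(match.group(1))
        # Whitespace-separated types (the legacy form) share the same q.
        for mime_type in mime.split():
            entries.append((mime_type, q))
    return name.strip(), entries
```

With this sketch, "foo.txt: text/plain, text/setext" and the legacy "foo.txt: text/plain text/setext" both yield two entries for foo.txt.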

My question for the group is, what else do we want? Extra MIME header
lines? Some way of discriminating on USER_AGENT? (FWIW, I was
thinking vaguely of a syntax along the lines of:

foo.aiff {
Content-type: audio/aiff; q=1.0
Pass-along-this-mime-hdr: Kilroy was here
}

foo.au {
Content-type: audio/basic; q=0.2
Pass-along-this-mime-hdr: Klrooy wuzzz heerere
}

I have a truly marvelous implementation of this in mind, but my
weekend was too small to contain it ;-).
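As a rough idea of how little parsing the brace syntax above would need, here is a sketch in Python. It is purely illustrative: the function name is hypothetical, header names are lower-cased as a convenience, and nothing here reflects an actual implementation.

```python
def parse_block_map(text: str) -> dict:
    """Parse 'name { Header: value ... }' blocks into {name: {header: value}}."""
    variants = {}
    current = None  # name of the block being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):
            current = line[:-1].strip()
            variants[current] = {}
        elif line == "}":
            current = None
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            variants[current][key.strip().lower()] = value.strip()
    return variants
```

Each block becomes a dictionary of header lines, so extra MIME headers to pass along need no special handling.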




Finally, wrt the disposition of the code --- I think patch.addtype-bug
is a reasonable candidate for Cliff's alpha release; it fixes a long
standing problem, and cleans up some of the script code a little (one
routine that searches for PATH_INFO instead of three; also, about 90
lines of duplicate code from http_{post,put,delete}.c merged into one
routine. These changes are a bit more than I would have liked at this
point, but there were several distinct pieces of code which all had
the bug, and you really can't fix it without bashing them all).

The content-arb code is a little more tenuous --- largely because it's
not quite complete (content-encodings aren't handled correctly yet),
and because we really don't know what the map files ought to look
like.





Oh yes, HERE'S THE DOCS:

This code adds two new features to httpd: special treatment for the
pseudo-mime-type application/x-type-map, and the MultiViews
per-directory Option (which can be set in srm.conf, or in .htaccess
files, as usual). These features are alternate user interfaces to
what amounts to the same piece of code (in the file http_mime_db.c)
which implements (uh, ...most of) the optional content negotiation
portion of the HTTP protocol.

Each of these features allows one of several files to satisfy a
request, based on what the client says it's willing to accept; the
differences are in the way the files are identified:

*) A type map names the files explicitly
*) In a MultiViews directory, the server does an implicit glob
and chooses from among the results

TYPE MAPS:

A type map is a document which is typed by the server (using its
normal suffix-based mechanisms) as application/x-type-map. The syntax
of these files is simple:

filename: mime/type; parm parm parm
filename: mime/type; parm parm parm

so, for instance, you can have

foo.gif: image/gif; q = 0.6
foo.jpeg: image/jpeg; q = 1.0
foo.xbm: image/x-xbitmap; q = 0.00001

The 'q' (for 'quality') parameter specifies preferences among these
images if the client doesn't much care --- in this case, the jpeg is
somewhat preferred to the gif, and the xbm is only shipped if the
client won't take *anything* else.
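One plausible way to combine the map's q values with the client's Accept: qualities is to multiply them and take the highest score. The sketch below assumes that product rule, which the note does not confirm; the function names and data shapes are hypothetical.

```python
def accept_q(accepts: dict, mime: str) -> float:
    """Client quality for mime, honouring 'type/*' and '*/*' wildcards."""
    major = mime.split("/")[0]
    for pattern in (mime, major + "/*", "*/*"):
        if pattern in accepts:
            return accepts[pattern]
    return 0.0  # not acceptable at all

def best_variant(variants, accepts):
    """variants: [(filename, mime, q)]; accepts: {pattern: q}.

    Returns the filename with the highest (map q * client q) score,
    or None if no variant is acceptable.
    """
    best, best_score = None, 0.0
    for filename, mime, q in variants:
        score = q * accept_q(accepts, mime)
        if score > best_score:  # strict '>' keeps the earlier entry on ties
            best, best_score = filename, score
    return best
```

With the three-image map above and a client sending only "Accept: image/*", the jpeg wins; the xbm is chosen only when it is the sole acceptable type.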

Note that the files referenced *must* be in the same directory as the
map file, for security reasons (we wouldn't want someone coming in
through an ftp server to be able to fake up a map file listing
/etc/passwd, and have the server respect it). You get a Server Error
message if they aren't.

Note also that to use this, you've got to have an AddType someplace
which defines a file suffix as application/x-type-map; the easiest
thing may be to stick a

AddType application/x-type-map map

in srm.conf.

MULTIVIEWS:

This is a per-directory option, meaning it can be set with an Options
directive within a <Directory> section in access.conf, or (if
AllowOverride is properly set) in .htaccess files. Note that Options
All does not set MultiViews; you have to ask for it by name. (This is
a one-line change to httpd.h).

The effect of MultiViews is as follows: if the server receives a
request for /some/dir/foo, /some/dir has MultiViews enabled, and
/some/dir/foo does *not* exist, then the server reads the directory
looking for files named foo.*, and effectively fakes up a type map
which names all those files, assigning them the same MIME types it
would have if the client had asked for one of them by name. It then
chooses the best match to the client's Accept: headers, and forwards
it along.
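The directory scan can be pictured as follows. This is a sketch in Python, not the code in http_mime_db.c: the function name is hypothetical, `entries` stands in for the result of reading the directory, and `suffix_types` stands in for the server's suffix-to-MIME-type table.

```python
def synthesize_type_map(entries, base, suffix_types):
    """Fake up a type map for base.* from a directory listing.

    Returns None if a file named exactly `base` exists (it is then served
    as usual); otherwise a list of (filename, mime_type, 1.0) tuples.
    """
    if base in entries:
        return None
    variants = []
    for entry in sorted(entries):
        stem, dot, suffix = entry.partition(".")
        # Only files with a suffix the server knows how to type are listed.
        if stem == base and dot and suffix in suffix_types:
            variants.append((entry, suffix_types[suffix], 1.0))
    return variants
```

Skipping files whose suffix has no known type is a simplification made for the sketch; how the patch treats such files is not stated here.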

If one of the files found by the globbing is a CGI script, it's not
obvious what should happen. My code gives that case special
treatment --- if the request was a POST, or a GET with QUERY_ARGS or
PATH_INFO, the script is given an extremely high quality rating, and
generally invoked; otherwise it is given an extremely low quality
rating, which generally causes one of the other views (if any) to be
retrieved. This is the only jiggering of quality ratings done by the
MultiViews code; aside from that, all Qualities in the synthesized
type maps are 1.0.
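The CGI special case amounts to a small rule like the one below. This is a Python sketch with hypothetical names; the two constants are placeholders, since the text only says "extremely high" and "extremely low".

```python
HIGH_Q = 1000.0  # placeholder for "extremely high"; the actual value is not given
LOW_Q = 0.001    # placeholder for "extremely low"; the actual value is not given

def script_quality(method: str, query_string: str, path_info: str) -> float:
    """Quality assigned to a CGI script found by the MultiViews scan."""
    if method == "POST":
        return HIGH_Q
    if method == "GET" and (query_string or path_info):
        return HIGH_Q
    return LOW_Q
```

A plain "GET /some/dir/foo" therefore prefers any static view, while a form submission to the same URL reaches the script.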

Note that this machinery only comes into play if the file which the
user attempted to retrieve does *not* exist by that name; if it does,
it is simply retrieved as usual. (So, someone who actually asks for
'foo.jpeg', as opposed to 'foo', never gets foo.gif).

That's it.

rst
Re: Content-type negotiation: thoughts and code
Robert, you are amazing. I was thinking about going home tonight to
watch "the State" but now I'm not so sure....

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: Content-type negotiation: thoughts and code
Okay, review time:

On Sun, 12 Mar 1995, Robert S. Thau wrote:
> First off, with regard to the question of whether to support
> CERN-style auto-arbitration based on filename extensions, or to have
> explicit map files, I'd like to suggest that we can afford to support
> both. Directories take very little extra code to support beyond what
> you need to handle the contents of the map files themselves --- from
> the server's point of view, they're just another form of map file,
> which happens to come pre-parsed. However, the gain in convenience
> for people who can use the feature is substantial.
>
> (Of course, some people *don't* want to use the feature, so it needs
> to be a configuration option. The way I've done it in my code above,
> you need "Options MultiViews" enabled in a directory in order for "GET
> /.../foo" to be resolved to "/.../foo.gif" or "/.../foo.jpeg". In
> directories where MultiViews is off, the server behaves exactly as it
> would if the directory-scanning code were never there, so I don't
> *think* there's a back-compatibility issue ;-).

This is all solid, I agree. In testing it doesn't look like the
automatic selection is working exactly right. The .map-based system
apparently works fine (try <URL:http://hyperreal.com:8000/index.map>),
but apparently the q variable isn't being used
properly (or maybe I don't understand it) in the auto-negotiation:

There's an "index2.html" and "index2.html3". I added an AddType to bind
.html3 to text/x-html3. Working directly with the server ("telnet
hyperreal.com 8000"), I try

1) HEAD /index2 HTTP/1.0
Accept: text/x-html3

returns text/x-html3, no problem

2) HEAD /index2 HTTP/1.0
Accept: text/html

returns text/html, no problem

3) HEAD /index2 HTTP/1.0
Accept: text/x-html3
Accept: text/html

returns text/x-html3, no problem

4) HEAD /index2 HTTP/1.0
Accept: text/html; q=0.1
Accept: text/x-html3; q=1.0

returns text/x-html3, no problem

5) HEAD /index2 HTTP/1.0
Accept: text/html; q=0.600
Accept: text/x-html3; q=0.800

returns text/html, which doesn't make sense, unless there's some
internal preference for html over x-html3 (and I'm not using a map
file). These are the q values sent by Arena. Order doesn't seem to
matter.

Any clue?

Also, I can set DirectoryIndex to a .map file. Yayay! Can I set it to
"index2"... nope. How much work is involved in being able to set the
DirectoryIndex to use autonegotiation? Finally, I noticed that I can't
yet have server

> A second interesting thing which comes up is what to do with clients
> like certain, ahem, colorful browser betas which ship completely bogus
> Accept: headers (or HTTP/0.9 browsers, which don't ship any). My code
> basically pretends the browser did "Accept: text/html" and "Accept:
> text/plain" whether it actually did or not, to ease this difficulty;
> is that the right thing?

Sounds fine to me.

> Then there's security. The issue here is that if some Malevolent
> Entity (say, a cracker exploiting a leaky ftp server) can create
> type-map files, you don't want the server believing one which names
> /etc/passwd as the text/plain view of the composite entity described
> by /inoccuous/directory/pretty-bunny.map. My code takes the
> thoroughly draconian approach of making all pathnames in map files
> relative to the map file itself, and *disallowing* relative paths
> containing '/', so a type-map file can *only* name things in the same
> directory (although those can be symlinks *if* FollowSymLinks is
> enabled). Is that the wrong thing? If so, what's the right thing?

Hmm - another directive a la FollowSymLinks seems in order. At least
allowing a map file to go down directories would be good. Actually
"Includes" seems like it covers the same grounds security-wise...

> As a final point, writing the code raises the question of what the
> map files should look like. What I've done probably isn't the right
> thing, but a discussion of what's wrong with it might prove
> instructive. The map files implemented by my code just look like:
>
> foo.au: audio/basic
> foo.gif: image/gif
> foo.html: text/html
> foo.txt: text/plain

Looks great to me.

> My question for the group is, what else do we want? Extra MIME header
> lines? Some way of discriminating on USER_AGENT?

This gets back to the whole meta-information debate - where can we allow
people to add the "Refresh: 10" lines in the absence of a file system that
makes this easy. I think that the purpose for this and content
negotiation is roughly separable - thus, we could have something like a
".meta" file in each directory which acted as a way to store
metainformation about different files in that directory. It would have
the performance penalty of a stat() and a read if it exists, which means
people should use it sparingly and in directories with not too many other
files, but it could be useful. Whether we should allow conditionals in
the format ("if(USER_AGENT) =~ /*Mozilla*/", etc) is a big question - for
ease of implementation I'd argue against it for now at least.

Also, documentation: Randy, since you're in charge of the HTML end of the
apache web pages, would you be willing to handle this? Each patch should
at least have a mention.

Brian

Re: Content-type negotiation: thoughts and code
On Mon, 13 Mar 1995, Brian Behlendorf wrote:
> Also, I can set DirectoryIndex to a .map file. Yayay! Can I set it to
> "index2"... nope. How much work is involved in being able to set the
> DirectoryIndex to use autonegotiation? Finally, I noticed that I can't
> yet have server
.... side includes, though that hopefully isn't too much more effort.

Brian
Re: Content-type negotiation: thoughts and code
On Mon, 13 Mar 1995, Robert S. Thau wrote:
> FWIW, the problem Brian was having here is that there's no way to get
> the server to process includes in a document which it believes to be
> of type text/x-html3. (Brian, did I get that right?)

Yes - though the thought that "text/html; level=3" might work has crossed
my mind. Time to try it, since that's the W3C position on what
HTML 3.0 documents should be typed as.

> Unfortunately, the only ways I can see out of this are somewhat ugly
> --- either generalizing XBITHACK so it will direct the server to
> process world-X files of *any* text/* MIME type, or add yet *another*
> magic MIME type (and also educate XBITHACK about text/x-html3).
>
> (I dislike the former because I have a sneaking distaste for the
> XBITHACK --- if people like it, fine, but I don't like the thought of
> a server which effectively requires it if people are going to make
> maximum use of its features. Unfortunately, that leaves the new magic
> MIME type).
(as I mentioned in private mail)

What about a directive in srm.conf to the effect of:

ServerSideParse text/html
ServerSideParse text/x-html3
ServerSideParse text/html; level=3

By the way, rst has won the monthly Bill Perry Emacs-W3 Award for the
shortest average feature-suggestion-to-working-implementation elapsed
time.

Brian

Re: Content-type negotiation: thoughts and code
Date: Mon, 13 Mar 1995 15:55:29 -0800 (PST)
From: Brian Behlendorf <brian@wired.com>


returns text/html, which doesn't make sense, unless there's some
internal preference for html over x-html3

... which there is --- see note_accepts_for_broken_browsers, which
fakes up an "Accept: text/html" with the implicit q value of 1.0
whether the client has specified a different one or not. This is one
of the reasons why I'm not sure that note_accepts_for_broken_browsers
is such a good idea, although in this particular instance, simply
checking to see if the client had actually submitted an accept line
for text/html would take care of the problem. (I think I saw code for
something like this in the CERN server, but I might have to read it
more carefully).

I'm about 85% sure that this is the cause of the bug. NB, if this is
right, then setting the q for text/x-html3 to 1.0 would solve the
problem, since ties are broken by order, and the faked headers come
last.
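The suspected bug, and the fix of checking whether the client already sent a line for the type, can be modelled in a few lines. This Python sketch only mimics the behaviour described in this thread; it is not the code of note_accepts_for_broken_browsers, and the function names and the simplified exact-match selection are assumptions.

```python
FAKED = [("text/html", 1.0), ("text/plain", 1.0)]

def add_faked_accepts(accepts, only_if_missing):
    """Append the faked entries after the client's own Accept entries.

    With only_if_missing=True, a faked type is skipped when the client
    already listed that type itself.
    """
    listed = {mime for mime, _ in accepts}
    extra = [(m, q) for m, q in FAKED if not (only_if_missing and m in listed)]
    return list(accepts) + extra

def choose(available, accepts):
    """Pick the available type with the highest q; earlier entries win ties."""
    best, best_q = None, 0.0
    for mime, q in accepts:
        if mime in available and q > best_q:
            best, best_q = mime, q
    return best

available = {"text/html", "text/x-html3"}
arena = [("text/html", 0.6), ("text/x-html3", 0.8)]
buggy = choose(available, add_faked_accepts(arena, only_if_missing=False))
fixed = choose(available, add_faked_accepts(arena, only_if_missing=True))
```

In this model `buggy` comes out as text/html, because the faked q of 1.0 beats the client's 0.8 for text/x-html3, and `fixed` comes out as text/x-html3. Raising the client's q for text/x-html3 to 1.0 also gives text/x-html3 even in the buggy variant, since the faked entries come last and lose the tie.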

Also, I can set DirectoryIndex to a .map file. Yayay! Can I set it to
"index2"... nope. How much work is involved in being able to set the
DirectoryIndex to use autonegotiation? Finally, I noticed that I can't
yet have server

It *ought* to be just a few lines of code, but I haven't checked.

> A second interesting thing which comes up is what to do with clients
> like certain, ahem, colorful browser betas which ship completely bogus
> Accept: headers (or HTTP/0.9 browsers, which don't ship any). My code
> basically pretends the browser did "Accept: text/html" and "Accept:
> text/plain" whether it actually did or not, to ease this difficulty;
> is that the right thing?

Sounds fine to me.

But the code to handle it may need a bit of work ;-). [see above].

> Then there's security. The issue here is that if some Malevolent
> Entity (say, a cracker exploiting a leaky ftp server) can create
> type-map files, you don't want the server believing one which names
> /etc/passwd as the text/plain view of the composite entity described
> by /inoccuous/directory/pretty-bunny.map. My code takes the
> thoroughly draconian approach of making all pathnames in map files
> relative to the map file itself, and *disallowing* relative paths
> containing '/', so a type-map file can *only* name things in the same
> directory (although those can be symlinks *if* FollowSymLinks is
> enabled). Is that the wrong thing? If so, what's the right thing?

Hmm - another directive a la FollowSymLinks seems in order. At least
allowing a map file to go down directories would be good. Actually
"Includes" seems like it covers the same grounds security-wise...

That's a good point. (One thing that was bothering me is that
"Includes" does a full security check on included files; I currently
do that as well out of sheer laziness, but things are spec'ed out so
that the only thing you actually *need* to do is a symlink check).

(BTW, another thing that was bugging me is that for the security check
to be effective, you need to check for '..'s and repeated slashes as
well; if the includes code doesn't do those checks, there's a
potential security hole).

This gets back to the whole meta-information debate - where can we allow
people to add the "Refresh: 10" lines in the absence of a file system that
makes this easy. I think that the purpose for this and content
negotiation is roughly separable - thus, we could have something like a
".meta" file in each directory which acted as a way to store
metainformation about different files in that directory. It would have
the performance penalty of a stat() and a read if it exists, which means
people should use it sparingly and in directories with not too many other
files, but it could be useful. Whether we should allow conditionals in
the format ("if(USER_AGENT) =~ /*Mozilla*/", etc) is a big question - for
ease of implementation I'd argue against it for now at least.

One compromise I thought of was having .meta files, but only have them
effective if they're found in a MultiViews scan (where they can be
detected with no overhead). But this obviously fails for people who
don't like MultiViews.

I smell another per-directory option --- which is a bit of a problem
in the code, since with MultiViews, we're up to eight, and that's as
many as can fit in the one-byte bitmasks which are used by the access
control stuff.
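The bitmask limit is easy to see in a sketch. The Python below uses a hypothetical helper and plain bit indices rather than the server's real option constants, which are not quoted in this thread.

```python
def option_mask(bit_index: int, width: int = 8) -> int:
    """Return the mask for option number bit_index (0-based).

    Raises OverflowError when the bit does not fit in a mask of `width` bits,
    which is what happens to a ninth option in a one-byte mask.
    """
    mask = 1 << bit_index
    if mask >= 1 << width:
        raise OverflowError(
            "option %d does not fit in a %d-bit mask" % (bit_index, width)
        )
    return mask
```

Eight options use bits 0 through 7 (masks 1 through 128); the ninth would need bit 8, so another per-directory option means widening the mask type.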

rst
Re: Content-type negotiation: thoughts and code
Date: Mon, 13 Mar 1995 16:28:52 -0800 (PST)
From: Brian Behlendorf <brian@wired.com>

On Mon, 13 Mar 1995, Brian Behlendorf wrote:
> Also, I can set DirectoryIndex to a .map file. Yayay! Can I set it to
> "index2"... nope. How much work is involved in being able to set the
> DirectoryIndex to use autonegotiation? Finally, I noticed that I can't
> yet have server
.... side includes, though that hopefully isn't too much more effort.

Brian

FWIW, the problem Brian was having here is that there's no way to get
the server to process includes in a document which it believes to be
of type text/x-html3. (Brian, did I get that right?)

Unfortunately, the only ways I can see out of this are somewhat ugly
--- either generalizing XBITHACK so it will direct the server to
process world-X files of *any* text/* MIME type, or add yet *another*
magic MIME type (and also educate XBITHACK about text/x-html3).

(I dislike the former because I have a sneaking distaste for the
XBITHACK --- if people like it, fine, but I don't like the thought of
a server which effectively requires it if people are going to make
maximum use of its features. Unfortunately, that leaves the new magic
MIME type).

Any thoughts?

rst
Re: Content-type negotiation: thoughts and code
From: rst@ai.mit.edu (Robert S. Thau)
Date: Mon, 13 Mar 95 20:06:39 EST

FWIW, the problem Brian was having here is that there's no way to get
the server to process includes in a document which it believes to be
of type text/x-html3. (Brian, did I get that right?)

Unfortunately, the only ways I can see out of this are somewhat ugly
--- either generalizing XBITHACK so it will direct the server to
process world-X files of *any* text/* MIME type, or add yet *another*
magic MIME type (and also educate XBITHACK about text/x-html3).

Actually, there is another way --- to have suffixes carry attributes
as well as mime types, and to have server-parsedness be one of the
attributes which is stored, separately from the MIME type itself.
Then we could have stuff in mime.types like:

text/html; level=2.0: html
text/html; level=2.0, server_parsed=1: shtml
text/html; level=3.0: html3
text/html; level=3.0, server_parsed=1: shtml3

and have it all work, and all be configurable. In fact, this would
also transparently allow includes to operate in other forms of
text-type stuff:

text/plain; server_parsed=1: stxt

(Note implicit change to the syntax of mime.types, to take the
presence of parameters on the mime types into account --- we can keep
this back-compatible by special casing the presence of only two tokens
on a line. The same problem comes up with AddType directives, whose
args are also in the wrong order).
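A parser for the extended mime.types lines might look roughly like this. It is a Python sketch with a hypothetical function name; it treats any line without a colon as the old "type suffix ..." format, which is slightly broader than the two-token special case mentioned above.

```python
def parse_mime_types_line(line: str):
    """Return (mime_type, {attribute: value}, [suffixes]) for one line."""
    if ":" not in line:
        # Old format: "type suffix [suffix ...]", kept for back-compatibility.
        tokens = line.split()
        return tokens[0], {}, tokens[1:]
    # New format: "type; attr=value, attr=value: suffix [suffix ...]".
    spec, _, suffixes = line.rpartition(":")
    mime, _, params = spec.partition(";")
    attrs = {}
    for param in params.split(","):
        if "=" in param:
            key, _, value = param.partition("=")
            attrs[key.strip()] = value.strip()
    return mime.strip(), attrs, suffixes.split()
```

Attribute values are kept as strings; deciding that server_parsed=1 means "run the include processor" would be up to the caller.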

IMHO, that's the Right Thing, but it's rather more work than any of
the kludges I suggested. In particular, it involves getting
http_mime.c rather further in bed with my new mime_db stuff, or the
equivalent, than it is right now. (Right now, they're pretty much
separate; the only change my content-arb patch makes to http_mime.c
is to have it invoke functions which enter the client's Accept-foo
headers into the database).

If that doesn't bug anybody severely, I could try to do it next
weekend, along with my currently scheduled (though subject to change)
second pass over the content-arb code.

rst
Re: Content-type negotiation: thoughts and code
rst was saying:

> This code does a *basic* job of content-type negotiation --- it parses
> everything (at least according to my reading of the relevant standards
> documents, and the CERN code), but it doesn't actually use the
> Accept-content-encoding and Accept-language info yet; however, it does

I'd recommend not looking at the CERN code and just use the latest
HTTP/1.0 as a reference. If you find parts that don't make sense in the
spec, please take notes and forward them to me -- you will probably
be the first to implement it right.

There is no "Accept-content-encoding" -- only Accept-Encoding.

> handle qualities. (It also doesn't yet return the correct HTTP error
> codes in all cases --- in particular, it returns 404 Not Found, rather
> than 406 None Acceptable, when content-type negotiation does find
> alternate views, of which none are acceptable).

Well, that's not useful -- that would be a bug. Lack of 406 is the
main reason content negotiation has failed in the past.

>[...]
> A second interesting thing which comes up is what to do with clients
> like certain, ahem, colorful browser betas which ship completely bogus
> Accept: headers (or HTTP/0.9 browsers, which don't ship any). My code
> basically pretends the browser did "Accept: text/html" and "Accept:
> text/plain" whether it actually did or not, to ease this difficulty;
> is that the right thing?

Nope, though that is what the old spec said. The "official" default
is "*/*" -- HTTP/0.9 clients cannot (and, in general, do not) assume
that what they are getting is HTML.

>[...]
> As a final point, writing the code raises the question of what the
> map files should look like. What I've done probably isn't the right
> thing, but a discussion of what's wrong with it might prove
> instructive. The map files implemented by my code just look like:
>
> foo.au: audio/basic
> foo.gif: image/gif
> foo.html: text/html
> foo.txt: text/plain

That syntax is inherently limiting -- one dimension (media type) only.
Since we know that more dimensions are on the way, the syntax should prepare
for them. The suggestions below are better.

> (this being the contents of foo.map). Qualities can be associated
> with these ---
>
> foo.gif: image/gif; q = 0.6
> foo.jpeg: image/jpeg; q = 1.0
> foo.xbm: image/x-xbitmap; q = 0.00001

That should be "qs" instead of "q", to match the description in the spec.

>[...]
> My question for the group is, what else do we want? Extra MIME header
> lines? Some way of discriminating on USER_AGENT? (FWIW, I was
> thinking vaguely of a syntax along the lines of:
>
> foo.aiff {
> Content-type: audio/aiff; q=1.0
> Pass-along-this-mime-hdr: Kilroy was here
> }
>
> foo.au {
> Content-type: audio/basic; q=0.2
> Pass-along-this-mime-hdr: Klrooy wuzzz heerere
> }
>
> I have a truly marvelous implementation of this in mind, but my
> weekend was too small to contain it ;-).

Yeah, that's my problem as well. What you have just reinvented is a
proposed syntax for URCs (for more info on that, see
<http://union.ncsa.uiuc.edu/HyperNews/get/www/URCs.html>).
If you want to see what I consider to be the ideal syntax for a new
media type for carrying URCs ("meta/prdm"), see
<http://www-diglib.stanford.edu/rmr/TR/TR.html#APPENDIXB>.

A simpler syntax is what I call "meta/http", which is just a series of
HTTP headers separated by blank lines:

URI: foo; vary="type,language"

URI: foo.jpeg
Content-length: 3456
Content-type: image/jpeg; qs=1.0
Content-language: en

URI: foo.gif
Content-type: image/gif; qs=0.7
Content-length: 123456
Content-language: en

URI: foo.gif
Content-type: image/gif; qs=0.7
Content-length: 126543
Content-language: fr

URI: http://www.inria.fr/Images/foo.gif
Content-type: image/gif; qs=0.7
Content-length: 126543
Content-language: fr

The key is this: If the client's accept headers were not sufficient
to determine the "best" choice, then the server would send a
"300 Multiple Choices" response and include this "meta/http" as the
message body -- an enhanced client can then do a redirect on this response.
Note that the last one is a mirror entry and would be ignored by
the server's internal content negotiation.
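Reading such a "meta/http" body takes very little code. The Python sketch below is an illustration only: the function names are hypothetical, header names are lower-cased for convenience, and treating a URI value that carries parameters (such as vary=) as the record for the resource itself is an assumption drawn from the example.

```python
def parse_meta_http(text: str):
    """Split text into records at blank lines; each record is a header dict."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")  # split at the first colon only
        current[key.strip().lower()] = value.strip()
    if current:
        records.append(current)
    return records

def local_variants(records):
    """Keep only records the server itself could serve."""
    kept = []
    for record in records:
        uri = record.get("uri", "")
        # Skip the parameterised top-level record and absolute (mirror) URIs.
        if not uri or ";" in uri or "://" in uri:
            continue
        kept.append(record)
    return kept
```

The full record list would go out in a "300 Multiple Choices" body, while only the local variants take part in the server's own negotiation.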

More importantly, we can eventually move from pre-emptive negotiation and
just have the server return a "meta/*" content whenever multiple choices
do exist (a rare event). This is likely to occur soon after some form
of short-term keep-alive is implemented for HTTP/1.1.

The meta/http format is okay for access redirection, but the PRDM format
will be much better for eventual handling of URI->URC->URL redirection
and resource discovery systems.

Mind you, all of these are just ideas -- one of these days I'll get
around to writing a paper.... I would not expect any of this to show
up in an early version of Apache.


......Roy Fielding ICS Grad Student, University of California, Irvine USA
<fielding@ics.uci.edu>
<URL:http://www.ics.uci.edu/dir/grad/Software/fielding>
Re: Content-type negotiation: thoughts and code
> Well, perhaps we should go at this off the main list, but one thing
> which seems to be missing is any notion of how to treat "level", as in
> what is apparently Dan Connolly's semi-official suggestion for
> discriminating levels of HTML conformance --- "Accept: text/html;
> level=3.0". My current draft code refuses to serve up a particular
> variant if it's at a higher level of whatever content-type than the
> client claims to accept (as if it exceeded MaxBytes). This seems
> intuitively the right thing.
>
> Unfortunately, you need a KLUDGE to make this actually work as
> intended (to serve HTML 2.0 variants to older browsers for
> back-compatibility), which is to assume, for HTML only, that the
> client meant "level=2.0" unless some other level was explicitly
> specified.
>
> Now, there's obviously no fully official sanction for this anyplace,
> but some informal guidance would be helpful.

There is a bit of waffling in the spec for this purpose -- where it
talks about "unusual types" in the section on Accept: -- such that
it recommends clients be configurable to send a list of unusual types
along with the */* globbing. But you are right in that there is
currently no clean way to do it. BTW, "level=2" is assumed for text/html
because it is the default....ummm...errr...at least it used to be --
I can't find it in the HTML 2.0 spec now.

> [...]
> Hmmm... I kinda prefer my own proposal (I like to avoid giving blank
> lines semantic significance), but I could warm up to this.

Yep, as do I. However, it has the advantage of having a clear
definition and pre-defined semantics for the fields.

........Roy
Re: Content-type negotiation: thoughts and code
Date: Thu, 16 Mar 1995 07:13:30 -0800
From: "Roy T. Fielding" <fielding@avron.ics.uci.edu>

I'd recommend not looking at the CERN code and just use the latest
HTTP/1.0 as a reference. If you find parts that don't make sense in the
spec, please take notes and forward them to me -- you will probably
be the first to implement it right.

Well, perhaps we should go at this off the main list, but one thing
which seems to be missing is any notion of how to treat "level", as in
what is apparently Dan Connolly's semi-official suggestion for
discriminating levels of HTML conformance --- "Accept: text/html;
level=3.0". My current draft code refuses to serve up a particular
variant if it's at a higher level of whatever content-type than the
client claims to accept (as if it exceeded MaxBytes). This seems
intuitively the right thing.

Unfortunately, you need a KLUDGE to make this actually work as
intended (to serve HTML 2.0 variants to older browsers for
back-compatibility), which is to assume, for HTML only, that the
client meant "level=2.0" unless some other level was explicitly
specified.
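
For concreteness, the level rule and the HTML-only default could be
sketched roughly as below. This is Python for brevity, not the C of the
patch, and every name in it (DEFAULT_HTML_LEVEL, accepted_level,
variant_allowed) is made up for illustration.

```python
# Hypothetical sketch of the "level" rule; none of these names come
# from the actual patch.
DEFAULT_HTML_LEVEL = 2.0  # the kludge: assumed for text/html only


def accepted_level(media_type, params):
    """Return the highest level the client accepts, or None if unbounded.

    params holds the parameters of one Accept entry, e.g. {"level": "3.0"}.
    """
    if "level" in params:
        return float(params["level"])
    if media_type == "text/html":
        return DEFAULT_HTML_LEVEL
    return None


def variant_allowed(variant_level, media_type, params):
    """Refuse a variant whose level exceeds what the client claims to accept."""
    limit = accepted_level(media_type, params)
    return limit is None or variant_level <= limit


# An older browser that sends plain "Accept: text/html" gets the 2.0 variant
# and is refused the 3.0 one.
print(variant_allowed(2.0, "text/html", {}))
print(variant_allowed(3.0, "text/html", {}))
```

The default is applied only when the type is text/html and no level was
given, so other types stay unbounded unless the client says otherwise.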

Now, there's obviously no fully official sanction for this anyplace,
but some informal guidance would be helpful.

There is no "Accept-content-encoding" -- only Accept-Encoding.

My mistake. (Doesn't matter much to the currently released code ---
it ignores this anyway. The next spin will do better).

> handle qualities. (It also doesn't yet return the correct HTTP error
> codes in all cases --- in particular, it returns 404 Not Found, rather
> than 406 None Acceptable, when content-type negotiation does find
> alternate views, of which none are acceptable).

Well, that's not useful -- that would be a bug. Lack of 406 is the
main reason content negotiation has failed in the past.

Is it really? That strikes me as odd --- most browsers that I've seen
don't do much interesting with error codes anyway; they just throw
the message body they got up to the users, for whatever good it'll do
them. Regardless, it pretty clearly is a bug.
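
The distinction at issue could be sketched as follows (Python, purely
illustrative; negotiate and is_acceptable are hypothetical names, and the
sketch takes the first acceptable variant rather than the best-quality
one):

```python
def negotiate(variants, is_acceptable):
    """Return (status, variant) for a negotiated request.

    variants is the list of alternate views found for the resource;
    is_acceptable is a predicate built from the client's Accept headers.
    """
    if not variants:
        return 404, None          # nothing there at all: Not Found
    for variant in variants:
        if is_acceptable(variant):
            return 200, variant
    return 406, None              # views exist, none acceptable


print(negotiate(["foo.gif", "foo.jpeg"], lambda v: v.endswith(".jpeg")))
```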

>[...]
> A second interesting thing which comes up is what to do with clients
> like certain, ahem, colorful browser betas which ship completely bogus
> Accept: headers (or HTTP/0.9 browsers, which don't ship any). My code
> basically pretends the browser did "Accept: text/html" and "Accept:
> text/plain" whether it actually did or not, to ease this difficulty;
> is that the right thing?

Nope, though that is what the old spec said. The "official" default
is "*/*" -- HTTP/0.9 clients cannot (and, in general, do not) assume
that what they are getting is HTML.

Well, then, maybe we won't have to deal with
note_accepts_for_broken_browsers after all...
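
A rough sketch of what the "*/*" default amounts to, in Python rather
than the server's C. The function names are invented for illustration,
and this only covers a missing Accept header, not a bogus one:

```python
def effective_accept(headers):
    """Collect media ranges from Accept headers, defaulting to "*/*".

    headers is a list of (name, value) pairs from the request.
    """
    values = [value for (name, value) in headers if name.lower() == "accept"]
    ranges = [r.strip() for value in values for r in value.split(",") if r.strip()]
    return ranges or ["*/*"]


def range_matches(media_range, content_type):
    """True if a media range such as "image/*" covers the content type."""
    media_range = media_range.split(";", 1)[0].strip().lower()
    content_type = content_type.lower()
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return content_type.startswith(media_range[:-1])
    return media_range == content_type


# An HTTP/0.9 client sends no headers, so any variant is a candidate.
print(effective_accept([]))
```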

That syntax is inherently limiting -- one dimension (media type) only.
Since we know that more dimensions are on the way, the syntax should prepare
for them. The suggestions below are better.

It was a quick hack whose only virtue was that it didn't take a whole
lot of time to write (since the Accept: line parsing code could be
reused in toto).

That should be "qs" instead of "q", to match the description in the spec.

Well, it's easy enough to get it to accept "qs" in that
context... unfortunately, if I do it the quick and obvious way, people
who use "q" instead will get better than they deserve.

Yeah, that's my problem as well. What you have just reinvented is a
proposed syntax for URCs (for more info on that, see
<http://union.ncsa.uiuc.edu/HyperNews/get/www/URCs.html>).
If you want to see what I consider to be the ideal syntax for a new
media type for carrying URCs ("meta/prdm"), see
<http://www-diglib.stanford.edu/rmr/TR/TR.html#APPENDIXB>.

Neat, but probably too elaborate for Apache 1.0.

A simpler syntax is what I call "meta/http", which is just a series of
HTTP headers separated by blank lines:

URI: foo; vary="type,language"

URI: foo.jpeg
Content-length: 3456
Content-type: image/jpeg; qs=1.0
Content-language: en

URI: foo.gif
Content-type: image/gif; qs=0.7
Content-length: 123456
Content-language: en

...

The meta/http format is okay for access redirection, but the PRDM format
will be much better for eventual handling of URI->URC->URL redirection
and resource discovery systems.
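
To show how little parsing the meta/http layout needs, here is a rough
Python sketch (not the server's C). parse_meta_http and split_params are
hypothetical names, and the parameter splitting is naive: it assumes no
";" inside a quoted value.

```python
SAMPLE = """\
URI: foo; vary="type,language"

URI: foo.jpeg
Content-length: 3456
Content-type: image/jpeg; qs=1.0
Content-language: en

URI: foo.gif
Content-type: image/gif; qs=0.7
Content-length: 123456
Content-language: en
"""


def parse_meta_http(text):
    """Split a map into one dict of headers per blank-line-separated block."""
    blocks, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:            # a blank line ends the current block
                blocks.append(current)
                current = {}
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue               # ignore anything that is not a header
        current[name.strip().lower()] = value.strip()
    if current:
        blocks.append(current)
    return blocks


def split_params(value):
    """Split 'image/gif; qs=0.7' into ('image/gif', {'qs': '0.7'})."""
    parts = [p.strip() for p in value.split(";")]
    params = {}
    for part in parts[1:]:
        key, sep, val = part.partition("=")
        if sep:
            params[key.strip().lower()] = val.strip().strip('"')
    return parts[0], params


for block in parse_meta_http(SAMPLE):
    print(block["uri"], split_params(block.get("content-type", "")))
```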

Hmmm... I kinda prefer my own proposal (I like to avoid giving blank
lines semantic significance), but I could warm up to this.

rst
Re: Content-type negotiation: thoughts and code
Date: Thu, 16 Mar 95 16:31 GMT
From: drtr@ast.cam.ac.uk (David Robinson)

Can the mapfile begin with an identification string please?
In the future we might want the content type of a file to be determined
by the first few bytes of the file.

We could, sure... any suggestions?

In any event, I would imagine wanting a *.doit feature for map files;
it would be nice if we didn't have to invent a new extension for it.

Well, the '*.doit' effect (looking for filenames with added suffixes
if a file with the specified base name isn't present) is already
used by MultiViews. I suppose we could say that if MultiViews finds a
map file, it could use it, but that relies on extensions again, and
it's a bit of a kludge. (NB the extensions are configurable; what it
really relies on is a magic MIME type).
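
The suffix lookup described here could be sketched like this (Python,
illustrative only; multiviews_candidates is a made-up name and the real
code works against the directory itself, not a list of names):

```python
def multiviews_candidates(names, base):
    """Return names of the form base.<suffix>, but only if base is absent.

    names is the list of filenames in the directory being searched.
    """
    if base in names:
        return []                  # the file exists; no search needed
    prefix = base + "."
    return sorted(n for n in names if n.startswith(prefix))


print(multiviews_candidates(["foo.gif", "foo.jpeg", "bar.html"], "foo"))
```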

(No, losing MultiViews is not an option).

I would suggest this syntax for a content negotiation map file:
<!--#select
language="en"
type="image/gif; qs=1."
file="pic.gif"
-->

That way, it would be easier to integrate this feature into .shtml parsing.

If you're suggesting that we should have content-type negotiation be
sensitive to some of the contents of the file, I'm sympathetic, but
the *right* way to do that is to handle <META>, not to build on
.shtml. There already is a spec for this, and I'm a little hesitant
to reinvent the wheel without due cause. (In any case, parsing isn't
the hard part).

If you're suggesting that map files, i.e. files whose sole purpose is
to direct the content-negotiation process, and not to be shipped to
clients, should look like .shtml, I'm not sure I see why that's a good
idea. (It doesn't make things any easier to write, BTW; the includes
code is committed to writing whatever it finds down a FILE*, which is
very much the wrong thing when you're still trying to figure out what
to retrieve. Besides, for a variety of reasons, I'm trying to
minimize the impact of my current content-negotiation work on the rest
of the server).

Note that it should also be possible for the selected object to be a directory,
although you'd have to cope with multiple map files in a single path.
(e.g. as in http://host/path/language-map/countries/image-map/canada)

David.

I'm not sure I understand what you're requesting here. Is the
suggestion that "language-map" would map directories like

en
kr
fr

and then "image-map" would direct MIME type discrimination? If so,
it's a neat idea, but *very* hairy to implement, in the context of the
NCSA base code. I'd prefer to get one-level discrimination working
solidly first, before having a go at this. That's messy enough!

rst
Re: Content-type negotiation: thoughts and code
Can the mapfile begin with an identification string please?
In the future we might want the content type of a file to be determined
by the first few bytes of the file.

In any event, I would imagine wanting a *.doit feature for map files;
it would be nice if we didn't have to invent a new extension for it.

I would suggest this syntax for a content negotiation map file:
<!--#select
language="en"
type="image/gif; qs=1."
file="pic.gif"
-->

That way, it would be easier to integrate this feature into .shtml parsing.

Note that it should also be possible for the selected object to be a directory,
although you'd have to cope with multiple map files in a single path.
(e.g. as in http://host/path/language-map/countries/image-map/canada)

David.
Re: Content-type negotiation: thoughts and code
On Thu, 16 Mar 1995, Robert S. Thau wrote:
> Oy. <!--#includes--> of mapped entities was something I hadn't even
> considered. But there is an immediately obvious application for it,
> now that you mention it: documents with tables --- (keep one version
> of most of it, but <!--#include--> the tables, and use mapping to
> choose either real HTML3 tables or a <pre>formatted version).
...
> In any case, there are some *nasty* subtleties here. Say we have a
> variant entity whose variant forms are, say, GIF, JPEG, and text/plain
> (ASCII-art). Right now, when the server retrieves a .shtml file, it
> goes through processing <!--#directives-->, and sending all other
> bytes over the wire. If there is any extraneous whitespace at all in
> the pseudo-.shtml "map" file (let alone, say, comments!) then that
> too will be sent over the wire, corrupting the binary formats.

I think this is one we shouldn't worry about too much - the real solution
is that instead of a server-side include the author can say <EMBED
SRC="table"> and the browser and server negotiate for whether that's
table.html3, or table.gif. It just doesn't feel right to think about a
text/html-level-2 document having a section that's text/html-level-3... to
me at least. Hmm.

Let's skip on this for the first release - as long as we can do
server-side includes in content-negotiated files, I'm happy.

Brian


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: Content-type negotiation: thoughts and code
Date: Thu, 16 Mar 95 22:25 GMT
From: drtr@ast.cam.ac.uk (David Robinson)

It may not make it easier to write, but it's one less thing for the user
to understand.

It's new syntax either way. Besides, Roy's meta/http is at least
pretty straightforward ---

URI: foo.en.gif
Content-type: image/gif; qs = 0.7
Content-language: en

is pretty hard to screw up.

In fact, what I was getting at was this; suppose I want
to do a server-side include of a document subject to content-negotiation.

Oy. <!--#includes--> of mapped entities was something I hadn't even
considered. But there is an immediately obvious application for it,
now that you mention it: documents with tables --- (keep one version
of most of it, but <!--#include--> the tables, and use mapping to
choose either real HTML3 tables or a <pre>formatted version).

This requires heavy work on send_included_file in http_includes.c.

In the near future (when we have some working patches), you would
have to <!--#include--> another document that was a mapfile. What I
was hoping for in the long term was for the shtml file to be able
to do content-negotiated includes directly. So that map files would
be a sub-set of general shtml files.

I'm not sure that this is wise in all cases --- if the same varied
entity is #included multiple times, it sounds like it would be a real
hassle to update the N different maps in the N different files that
include it.

In any case, there are some *nasty* subtleties here. Say we have a
variant entity whose variant forms are, say, GIF, JPEG, and text/plain
(ASCII-art). Right now, when the server retrieves a .shtml file, it
goes through processing <!--#directives-->, and sending all other
bytes over the wire. If there is any extraneous whitespace at all in
the pseudo-.shtml "map" file (let alone, say, comments!) then that
too will be sent over the wire, corrupting the binary formats.
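
The whitespace problem can be demonstrated with a toy model of the
includes pass (Python, illustrative only). It assumes, as described
above, that every byte outside a <!--#directive--> is passed through
untouched; "#select" is the proposed directive, not an existing one, and
passthrough_bytes is a made-up name.

```python
import re

# Matches any <!--#...--> directive, including ones spanning several lines.
DIRECTIVE = re.compile(rb"<!--#.*?-->", re.DOTALL)


def passthrough_bytes(shtml):
    """Return the bytes a naive includes pass would send besides directives."""
    return DIRECTIVE.sub(b"", shtml)


MAP_FILE = b'<!--#select\nlanguage="en"\ntype="image/gif; qs=1."\nfile="pic.gif"\n-->\n'

# The trailing newline survives and would be sent along with the GIF data.
print(passthrough_bytes(MAP_FILE))
```

An ordinary HTML comment in the map would survive the same way, since it
does not start with "<!--#".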

Another problem is how the server is supposed to know that it should
wait to read the <!--#directives--> before deciding what the MIME type
of this thing is, rather than doing what it usually does with *.shtml,
which is deciding that the thing is text/html, sending an appropriate
header, and only then starting to process the contents.

Maybe I'm slow, but I don't see how to deal with either of these
problems without having the server know, somehow, that maps are to be
treated differently from ordinary *.shtml files --- and after that,
I'm not sure that I see any compelling reason to keep the same syntax.

rst
Re: Content-type negotiation: thoughts and code
rst wrote:
> I would suggest this syntax for a content negotiation map file:
> <!--#select
> language="en"
> type="image/gif; qs=1."
> file="pic.gif"
> -->
>
> That way, it would be easier to integrate this feature into .shtml parsing.

>If you're suggesting that map files, i.e. files whose sole purpose is
>to direct the content-negotiation process, and not to be shipped to
>clients, should look like .shtml, I'm not sure I see why that's a good
>idea. (It doesn't make things any easier to write, BTW; the includes
>code is committed to writing whatever it finds down a FILE*, which is
>very much the wrong thing when you're still trying to figure out what
>to retrieve. Besides, for a variety of reasons, I'm trying to
>minimize the impact of my current content-negotiation work on the rest
>of the server).

It may not make it easier to write, but it's one less thing for the user
to understand. In fact, what I was getting at was this; suppose I want
to do a server-side include of a document subject to content-negotiation.

In the near future (when we have some working patches), you would have to
<!--#include--> another document that was a mapfile. What I was hoping for in
the long term was for the shtml file to be able to do content-negotiated
includes directly. So that map files would be a sub-set of general shtml
files.

> Note that it should also be possible for the selected object to be a directory,
> although you'd have to cope with multiple map files in a single path.
> (e.g. as in http://host/path/language-map/countries/image-map/canada)
>
>I'm not sure I understand what you're requesting here. Is the
>suggestion that "language-map" would map directories like
>
> en
> kr
> fr
>
>and then "image-map" would direct MIME type discrimination? If so,
>it's a neat idea, but *very* hairy to implement, in the context of the
>NCSA base code. I'd prefer to get one-level discrimination working
>solidly first, before having a go at this. That's messy enough!

Absolutely! (Yes, the server could return /root/fr/countries/gif/canada.gif)
I was only trying to suggest these 'next generation' features in case
they might affect how you'd implement the features you want right now.
There's nothing like looking ahead...

David.