Mailing List Archive

Content Negotiation
I think we should move content negotiation into the very top of the list
for stuff to-be-worked-on - Netscape 1.1 supports HTML 3.0-isms, and
*hopefully* (I haven't had a chance to download it yet, their servers
must be melting - why don't they post it to a newsgroup???) they stick a
"text/x-html3" or "text/html; version=3.0" into their Accept: headers, or
we're all doomed. :)

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: Content Negotiation
On Sun, 5 Mar 1995, Robert S. Thau wrote:
> We may be doomed; nothing like that shows up in HTTP_ACCEPT from my
> printenv script from SunOS Mozilla 1.1b1. Then again, it's possible
> the server is dropping Accept: headers more complicated than it
> ordinarily understands.

Okay, I got my copy of 1.1 now, and its headers in toto to a new URL are:

GET / HTTP/1.0
User-Agent: Mozilla/1.1b1 (X11; international; IRIX 5.3 IP22)
Accept: */*
Accept: image/gif
Accept: image/x-xbitmap
Accept: image/jpeg

We are doomed. I'm sending flame mail to www-talk.

I prefer the multiple stat() method - the server can be smart about it by
paying attention to the order (or q values) in the Accept: headers.
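
A minimal sketch of what "paying attention to the order (or q values)" might look like, assuming the server keeps a type-to-extension table. The EXTENSIONS table and both function names are hypothetical and not taken from any httpd source. Parameters other than q (such as version=3.0) are ignored here.

```python
import os

# Hypothetical type-to-extension table; a real server would take this
# from its MIME configuration.
EXTENSIONS = {
    "text/html": [".html"],
    "text/x-html3": [".html3"],
    "image/gif": [".gif"],
    "image/jpeg": [".jpg", ".jpeg"],
}


def parse_accept(header_lines):
    """Return (media_type, q) pairs, highest q first, ties in header order."""
    entries = []
    for line in header_lines:
        for item in line.split(","):
            parts = [p.strip() for p in item.split(";")]
            if not parts[0]:
                continue
            q = 1.0  # a missing q parameter is treated as full preference
            for param in parts[1:]:
                name, _, value = param.partition("=")
                if name.strip() == "q":
                    try:
                        q = float(value)
                    except ValueError:
                        q = 0.0
            entries.append((parts[0].lower(), q))
    # sorted() is stable, so equal q values keep their original order.
    return sorted(entries, key=lambda e: -e[1])


def pick_variant(base_path, header_lines):
    """Check candidates in preference order; return the first that exists."""
    for media_type, q in parse_accept(header_lines):
        if q <= 0:
            continue
        for ext in EXTENSIONS.get(media_type, []):
            candidate = base_path + ext
            if os.path.isfile(candidate):  # one stat() per candidate
                return candidate
    return None
```

With this ordering the common case costs one or two stat() calls, but a wildcard such as */* tells the server nothing about which extension to try.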

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: Content Negotiation
> Date: Sun, 5 Mar 1995 15:45:09 -0800 (PST)
> From: Brian Behlendorf <brian@wired.com>
>
> I think we should move content negotiation into the very top of the list
> for stuff to-be-worked-on - Netscape 1.1 supports HTML 3.0-isms, and
> *hopefully* (I haven't had a chance to download it yet, their servers
> must be melting - why don't they post it to a newsgroup???) they stick a
> "text/x-html3" or "text/html; version=3.0" into their Accept: headers, or
> we're all doomed. :)
>
> Brian

We may be doomed; nothing like that shows up in HTTP_ACCEPT from my
printenv script from SunOS Mozilla 1.1b1. Then again, it's possible
the server is dropping Accept: headers more complicated than it
ordinarily understands.

I'm developing a sneaking admiration for the CERN scheme (scan the
directory for files with the given name as a prefix; do type
discrimination among those by suffix as usual) --- the overhead is a
little more than I'd like, but I don't *think* it'll turn out to be
prohibitive. Besides, I still don't have any better ideas.
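
The directory scan described above can be sketched roughly as follows. This is an illustration under stated assumptions, not CERN httpd's actual code: the SUFFIX_TYPES table and the function name find_variants are hypothetical.

```python
import os

# Hypothetical suffix table (an assumption, not CERN httpd's configuration).
SUFFIX_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
}


def find_variants(directory, name):
    """Scan one directory for 'name.<suffix>' files and map each to a type."""
    variants = {}
    for entry in sorted(os.listdir(directory)):
        stem, suffix = os.path.splitext(entry)
        # Only files whose name is exactly the requested prefix plus a
        # known suffix count as variants of the resource.
        if stem == name and suffix in SUFFIX_TYPES:
            variants[SUFFIX_TYPES[suffix]] = os.path.join(directory, entry)
    return variants
```

The cost is one directory read per request, independent of how many types the browser lists in its Accept: headers.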

Then again, I have a bias: if we toss on the rule that the server
should run a CGI script if it finds one[1], that preserves the effect
of my *.doit scripting hacks (assuming an AddType .doit ...cgi... in
the right place, but that's trivial). I'd have to toss the existing
code of course, but getting emotionally attached to one's code is
always a mistake anyway.

rst

[1] Actually, the rule you'd need is *slightly* more complicated:
If all you find is a script, run the script. If you find other
files, and the request is a "simple" GET (*no* PATH_INFO, *no*
QUERY_ARGS), choose one of the files and retrieve it as usual;
otherwise, run the script.

This can save you the expense of actually running the script for
things like cover pages.
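
One way to read the rule in [1] as code, as a sketch only. The function name and return values are hypothetical, the choice among several files is stubbed out as "take the first", and the case where there is no script at all is not covered by the footnote, so serving a file there is an assumption.

```python
def choose_action(has_script, other_files, path_info="", query_args=""):
    """Return ("script", None), ("file", name), or ("not_found", None)."""
    # A "simple" GET carries neither PATH_INFO nor query arguments.
    simple_get = not path_info and not query_args
    # Assumption: with no script available, fall back to a plain file even
    # for a non-simple request (the footnote does not say what to do here).
    if other_files and (simple_get or not has_script):
        return ("file", other_files[0])  # placeholder for real negotiation
    if has_script:
        return ("script", None)
    return ("not_found", None)
```

For example, choose_action(True, ["cover.html"]) serves the static cover page, while the same call with query_args="x=1" runs the script.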
Re: Content Negotiation
> Okay, I got my copy of 1.1 now, and its headers in toto to a new URL are:

me too, also, likewise, an' all (S. Wales saying)

Found a list of bugs as long as my arm.

> GET / HTTP/1.0
> User-Agent: Mozilla/1.1b1 (X11; international; IRIX 5.3 IP22)
> Accept: */*
> Accept: image/gif
> Accept: image/x-xbitmap
> Accept: image/jpeg
>
> We are doomed. I'm sending flame mail to www-talk.

what's the problem. I thought the standard was to write Accept: like
this, and that Mosaic et-al were doing it wrong.

httpd glues all the Accepts together for you.

Am I missing something ?

rob h
Re: Content Negotiation
On Sun, 5 Mar 1995, Rob Hartill wrote:
> > Okay, I got my copy of 1.1 now, and its headers in toto to a new URL are:
>
> me too, also, likewise, an' all (S. Wales saying)
>
> Found a list of bugs as long as my arm.
>
> > GET / HTTP/1.0
> > User-Agent: Mozilla/1.1b1 (X11; international; IRIX 5.3 IP22)
> > Accept: */*
> > Accept: image/gif
> > Accept: image/x-xbitmap
> > Accept: image/jpeg
> >
> > We are doomed. I'm sending flame mail to www-talk.
>
> what's the problem. I thought the standard was to write Accept: like
> this, and that Mosaic et-al were doing it wrong.
>
> httpd glues all the Accepts together for you.
>
> Am I missing something ?

Yes - There should be a fourth line, "Accept: text/x-html3" like Arena
does, or "Accept: text/html; version=3.0" like the MIME folks would like.
That way I can tell when someone hits my home page whether I should send
them the HTML 2.0 version of my home page or the HTML 3.0 version without
keeping around a big table of browser USER_AGENTS and their capabilities.
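
As a rough illustration of that check, here is a sketch that looks for either of the two spellings in the Accept: lines and picks a file accordingly. The function names and the file names index.html3 and index.html are hypothetical.

```python
def wants_html3(accept_lines):
    """True if any Accept: value advertises HTML 3.0 in either spelling."""
    for line in accept_lines:
        for item in line.split(","):
            parts = [p.strip().lower() for p in item.split(";")]
            if parts[0] == "text/x-html3":
                return True
            params = [p.replace(" ", "") for p in parts[1:]]
            if parts[0] == "text/html" and "version=3.0" in params:
                return True
    return False


def home_page_for(accept_lines):
    # Hypothetical file names for the two versions of the page.
    return "index.html3" if wants_html3(accept_lines) else "index.html"
```

Fed the four Accept: lines Mozilla 1.1b1 sends, this falls through to the HTML 2.0 page, which is exactly the problem.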

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: Content Negotiation
>
> Yes - There should be a fourth line, "Accept: text/x-html3" like Arena
> does, or "Accept: text/html; version=3.0" like the MIME folks would like.
> That way I can tell when someone hits my home page whether I should send
> them the HTML 2.0 version of my home page or the HTML 3.0 version without
> keeping around a big table of browser USER_AGENTS and their capabilities.
>

Fairy nuff. Not so much a bug, more of an oversight.

My 1.1b core dumps whenever I use the middle button to open
a new window on a link. hmm.

Some interesting fixes and bugs...

- Expires: now works.
- ALT="bleah" is now displayed when images aren't shown.
- forms mailto: still ignores enc-type - mail is still sent
  URL encoded.
- Their MIME mail attachments look broken. It only attaches part of
  the document you want mailed, and 'cos you can't see it being attached,
  you don't notice it's only half there.
- RELOAD on a cached document fails to RELOAD because "if-modified-since"
  is still being sent. Duh.
- There's no sign of a way to specify a preferred language, so that servers
  can serve applicable content. This is simple to add via a preferences
  option.

Oh well, back to server related topics I guess.

rob h
Re: Content Negotiation
> From: rst@ai.mit.edu (Robert S. Thau)
> Date: Sun, 5 Mar 95 20:01:32 EST
>
> I'm developing a sneaking admiration for the CERN scheme (scan the
> directory for files with the given name as a prefix; do type
> discrimination among those by suffix as usual) --- the overhead is a
> little more than I'd like, but I don't *think* it'll turn out to be
> prohibitive. Besides, I still don't have any better ideas.

Somehow, that didn't come out right. Must be the #$@!$ flu.

FWIW, it's easier to come up with schemes which have lower overhead
than the CERN scheme --- the gopher meta-file scheme, for instance ---
but I haven't yet seen one which didn't require fairly active
maintenance. (The advantage of the CERN scheme is that you can
install a new form of a document simply by moving a file into place;
the server automatically adapts, so you don't have to recompile any
type-map files to make it notice the new version --- or worry about
version skew between the type maps and the actual file system).

One idea I was toying with was to have /some/multi-typed/page be
represented in the filesystem as a *directory* which would contain
*files* named html, html3, txt, or whatever. However, this doesn't
save many cycles over the CERN scheme (the server has to scan smaller
directories, but a lot of the overhead of the scan is in opendir()
itself), and it seems to conflict with automagic directory indexing
--- the latter being a definite stumbling block.

rst
Re: Content Negotiation
On Sun, 5 Mar 1995, Rob Hartill wrote:
> > Yes - There should be a fourth line, "Accept: text/x-html3" like Arena
> > does, or "Accept: text/html; version=3.0" like the MIME folks would like.
> > That way I can tell when someone hits my home page whether I should send
> > them the HTML 2.0 version of my home page or the HTML 3.0 version without
> > keeping around a big table of browser USER_AGENTS and their capabilities.
>
> Fairy nuff. Not so much a bug, more of an oversight.

But... if you thought "we suggest you view this page using Netscape"
on all sorts of pages was bad enough, just wait until you see "You *must*
view this page using netscape", even though some browsers (XMosaic 2.5,
w3-emacs, Arena) can view tables fine. It's a sorta subtle but very
powerfully dangerous step towards balkanization, worse than just the
regular netscape tags. Sorry to sound like Chicken Little, but the sky
really is kinda falling :)

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: Content Negotiation
>
> However, I see that if you had a directory containing .gif and .jpeg versions
> of many images, it would be tedious to have to create a .shtml map file for
> every image. So how about a default wildcard mapping set in the .htaccess
> file in the directory? That way, the server would read a per-directory list
> of extensions to stat for, rather than having to try everything the browser
> accepts.
>

The script and hack I've got for 1.3 does something very similar to
this. A global MIME type can be set so that any file ending with
a particular extension can be checked for negotiation, e.g.
one could set an extension .img which when requested would be
intercepted by httpd and passed to my cgi-script, the script then
looks for a conversion table by that name, if it doesn't find one
it looks for a table in that directory, then in the document root.

When (if) it finds a table it looks to see how the type should be
processed. For gif/jpeg, this could be a local redirect to one or
the other file formats. The table can also specify commands to be
executed, response codes to be returned, and probably a few other
useful alternatives I've forgotten already.
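
The lookup order described above (a table by that name, then one in the same directory, then one in the document root) might be sketched like this. The table file names are not given in this message, so TABLE_SUFFIX and DEFAULT_TABLE below are purely hypothetical placeholders, as is the function name.

```python
import os

TABLE_SUFFIX = ".tbl"          # hypothetical name for a per-resource table
DEFAULT_TABLE = "default.tbl"  # hypothetical name for a shared table


def find_table(request_path, document_root):
    """Return the first conversion table found, or None."""
    directory = os.path.dirname(request_path)
    stem = os.path.splitext(os.path.basename(request_path))[0]
    candidates = [
        os.path.join(directory, stem + TABLE_SUFFIX),  # table by that name
        os.path.join(directory, DEFAULT_TABLE),        # table in that directory
        os.path.join(document_root, DEFAULT_TABLE),    # table in document root
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```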

See http://ooo.lanl.gov/explain.html for some old info.


rob h
Re: Content Negotiation
> Date: Sun, 5 Mar 1995 18:48:02 -0800 (PST)
> From: Brian Behlendorf <brian@wired.com>
>
> I prefer the multiple stat() method - the server can be smart about it by
> paying attention to the order (or q values) in the Accept: headers.

That certainly would help, but there's still a pretty long list of
types to check[1], some of which may come with multiple extensions
(.htm, .html, .shtml; .txt, .text), and no matter how well the browser
orders them[2], there will be cases where the legitimate best choice
is near the bottom of the list. On top of that, there's the issue of
content-encoding to deal with (.ps, .ps.Z, .ps.gz); doing this the
simple way multiplies the number of stats required by three. That's a
lot of stats.
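
To make the arithmetic concrete, here is a small sketch that enumerates the paths a naive multiple-stat() server would have to try. The extensions and encoding suffixes are the ones named above; the table layout and function name are assumptions.

```python
from itertools import product

# Extensions per type, as listed in the paragraph above.
TYPE_EXTENSIONS = {
    "text/html": [".htm", ".html", ".shtml"],
    "text/plain": [".txt", ".text"],
    "application/postscript": [".ps"],
}
# No encoding, compress, and gzip.
ENCODING_SUFFIXES = ["", ".Z", ".gz"]


def candidate_paths(base, accepted_types):
    """List every path to stat() for the accepted types, in order."""
    paths = []
    for media_type in accepted_types:
        extensions = TYPE_EXTENSIONS.get(media_type, [])
        for ext, enc in product(extensions, ENCODING_SUFFIXES):
            paths.append(base + ext + enc)
    return paths
```

For just these three types, candidate_paths("/docs/paper", list(TYPE_EXTENSIONS)) yields (3 + 2 + 1) * 3 = 18 paths, and the list in [1] is several times longer.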

The best thing may be to just implement both and do a benchmark.

rst


[1] FWIW, an accurate list of the MIME types which I've configured
Netscape to handle would include at least text/plain, text/html,
image/gif, image/jpeg, image/x-xbitmap, audio/basic, audio/x-aiff,
audio/x-wav, audio/x-mpeg, video/mpeg, video/quicktime,
application/postscript, and application/x-dvi. This ignores a
list of distinct text/* types which X Mosaic, at least, sends
Accept: headers for (application/x-tex, text/setext,
text/tab-separated-text, etc.).

[2] As it stands right now, we'd have enough trouble just getting the
browser authors to put together an accurate list of accepted types!
Re: Content Negotiation
>> GET / HTTP/1.0
>> User-Agent: Mozilla/1.1b1 (X11; international; IRIX 5.3 IP22)
>> Accept: */*
>> Accept: image/gif
>> Accept: image/x-xbitmap
>> Accept: image/jpeg
>>
>> We are doomed. I'm sending flame mail to www-talk.
>
> what's the problem. I thought the standard was to write Accept: like
> this, and that Mosaic et-al were doing it wrong.

They are both wrong. I'll post to this list when the next revision
of the standard is ready (sometime tonight or tomorrow). It contains
a lot more info on Accept.


......Roy Fielding ICS Grad Student, University of California, Irvine USA
<fielding@ics.uci.edu>
<URL:http://www.ics.uci.edu/dir/grad/Software/fielding>
Re: Content Negotiation
My inclination would be towards having map files. However, I would make the
syntax compatible with server parsed include files, with includes conditional
on the values of the Accept, Accept-Encoding and Accept-Language headers.

So accessing jupiter.shtml could return either jupiter.gif or jupiter.jpg.
If you want the map file not to end in .shtml, then either the x-bit hack or
the .doit approach could be used. Or the server could determine the doc
type from its contents. (I think httpd should be doing this anyway.)

However, I see that if you had a directory containing .gif and .jpeg versions
of many images, it would be tedious to have to create a .shtml map file for
every image. So how about a default wildcard mapping set in the .htaccess
file in the directory? That way, the server would read a per-directory list
of extensions to stat for, rather than having to try everything the browser
accepts.
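
A sketch of how such a per-directory list could bound the work. The directive name VariantExtensions is invented purely for illustration and is not an existing .htaccess directive; both function names are likewise hypothetical.

```python
import os


def read_variant_extensions(htaccess_text):
    """Parse a hypothetical 'VariantExtensions .gif .jpeg' line."""
    for line in htaccess_text.splitlines():
        words = line.split()
        if words and words[0] == "VariantExtensions":
            return words[1:]
    return []


def stat_variants(directory, name, extensions):
    """Check only the listed extensions; return the file names that exist."""
    return [
        name + ext
        for ext in extensions
        if os.path.isfile(os.path.join(directory, name + ext))
    ]
```

The number of stat() calls is then fixed by the directory's own list rather than by whatever the browser happens to send.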

The other way of solving this is to allow map files to be path components.
So when a browser asks for /adir/images/jupiter, in fact /adir/images
is a map file which selects one of the directories /adir/images.gif or
/adir/images.jpg. So the browser would receive /adir/images.gif/jupiter or
/adir/images.jpg/jupiter.

David.
Re: Content Negotiation
I considered another two cases where automatic content negotiation causes
problems (so I withdraw my case for it):

1) Someone who created a page "mother.html" which inlined the image
"mother.gif". Most browsers send "Accept: image/gif" in their headers even
to HTML requests, so clearly we can't have the server assume that mother.gif
is just the gif version of the mother.html page :)

2) URLs, while powerful, can't completely express the exact request since
they don't specify what the other headers say, and those headers can
affect output, particularly for lines like Accept: or Content-Encoding:.
Thus, we need to have a way for content providers to be able to
explicitly link to a content-negotiating version of a resource
separately from the link where a particular content-type is desired.
I.e., sometimes I'll just want to link to a picture of my mother, not
caring whether it's the jpg or gif or tiff version, other times I'll
explicitly want to be able to link to the jpg version.

So, like it or not it looks like content negotiation has to be specified on a
per-document level using 3rd-party files (like gopher menu files) until we
have meta-informational file systems.


On Mon, 6 Mar 1995, Rob Hartill wrote:
> > However, I see that if you had a directory containing .gif and .jpeg versions
> > of many images, it would be tedious to have to create a .shtml map file for
> > every image. So how about a default wildcard mapping set in the .htaccess
> > file in the directory? That way, the server would read a per-directory list
> > of extensions to stat for, rather than having to try everything the browser
> > accepts.
>
> The script and hack I've got for 1.3 does something very similar to
> this. A global MIME type can be set so that any file ending with
> a particular extension can be checked for negotiation, e.g.
> one could set an extension .img which when requested would be
> intercepted by httpd and passed to my cgi-script, the script then
> looks for a conversion table by that name, if it doesn't find one
> it looks for a table in that directory, then in the document root.

Can your system be used for imagemap files as well? For example, NetSite
allows you to use http://host/path/mapfile.map instead of
http://host/cgi-bin/imagemap/path/mapfile.map - as the server does the
imagemap processing internally (again a good hack ripe to be implemented into
1.3 :) That's the way I'd like to do this: let's say I have /path/mother.gif
and /path/mother.jpg. To explicitly allow content negotiation for this
resource I link to a file /path/mother.img - that mother.img would work like
a mapfile in that it describes certain criteria that match certain other
files. I don't know what the format of that file would be like...
Also, this would be greatly enhanced if instead of issuing a Redirect the
server could respond "here's the jpg file, but you (the client) should
be aware the URL this is really known as is http://host/path/mother.jpg".
I didn't see any codes that matched that - Roy?

> See http://ooo.lanl.gov/explain.html for some old info.

Looks like you've already implemented this :):) Now to move chooser.pl
into the server and do away with the need to look at user_agent or
redirects...

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: Content Negotiation
Brian wrote..
>
> Can your system be used for imagemap files as well? For example, NetSite
> allows you to use http://host/path/mapfile.map instead of
> http://host/cgi-bin/imagemap/path/mapfile.map - as the server does the

I haven't tried it, but if it doesn't it'd be easy to make it.

> Also, this would be greatly enhanced if instead of issuing a Redirect the
> server could respond "here's the jpg file, but you (the client) should
> be aware the URL this is really known as is http://host/path/mother.jpg".
> I didn't see any codes that matched that - Roy?

301 is supposed to do that.

http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTRESP.html says..

-=-=
Moved 301

The data requested has been assigned a new URI, the change is
permanent. (N.B. this is an optimisation, which must, pragmatically,
be included in this definition. Browsers with link editing
capability should automatically relink to the new reference, where possible)

The response contains one or more header lines of the form

URI: <url> String CrLf


Which specify alternative addresses for the object in question. The
String is an optional comment field. If the response is to indicate a
set of variants which each correspond to the requested URI,
then the multipart/alternative wrapping may be used to distinguish
different sets
-=-=

>
> > See http://ooo.lanl.gov/explain.html for some old info.
>
> Looks like you've already implemented this :):) Now to move chooser.pl
> into the server and do away with the need to look at user_agent or
> redirects...

'chooser.pl' will probably take a lot of coding to convert it into
C. Before anyone even thinks of doing this, the convention I've
developed to specify the tables needs to be checked to see if there
are any cases I've missed - I bet there are.



rob h
Re: Content Negotiation
Rob wrote:
> Brian wrote..
> > Also, this would be greatly enhanced if instead of issuing a Redirect the
> > server could respond "here's the jpg file, but you (the client) should
> > be aware the URL this is really known as is http://host/path/mother.jpg".
> > I didn't see any codes that matched that - Roy?
>
>301 is supposed to do that.
>
>http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTRESP.html says..
>
>-=-=
>Moved 301
[quote from old standard deleted]

The current standard says basically the same, but the wording is slightly
altered, without significantly changing the meaning.
http://www.w3.org/hypertext/WWW/Protocols/HTTP1.0/HTTP1.0-ID_28.html says:
301 Moved Permanently
The object requested has been assigned a new permanent URI, and any _future_
references to this object must be done using the returned URI.
...
302 Moved Temporarily
The data requested resides temporarily under a different URI. As the
redirection may be altered on occasion, the client should on _future_
requests from the user continue to use the original URI used for this
request and not the URI returned in the URI-header field.
[My emphasis]

The problem is whether the redirect applies to _future_ requests, or the
_current and future_ requests. The standard, efficiency and what we would
desire all prefer the former. However, current browsers and servers (including
httpd) implement the latter. So, I disagree with Rob, I don't think 301
(or 302) status can be used for 'here's the jpg file, but you should be
aware the URL this is really known as is...'.

Also, I don't believe that you would always want to redirect the browser to
the specific resource URL. Sometimes I think you would not want the browser
to know about URLs for the different image types (say), but only use the
generic URL.

I think the best solution would be if the URI header could be returned for
non-redirect requests; then the server, if it wished, could, in reply to
GET /path/mother.img ....

with headers
URI: <http://host/path/mother.jpg> (the actual resource)
URI: <http://host/path/mother.img>; vary="type" (the generic resource)
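
A sketch of how a server might assemble those two header lines for a successful, non-redirect reply. This only formats the headers as written above; the function name is hypothetical and nothing here reflects an existing httpd feature.

```python
def negotiated_uri_headers(generic_url, actual_url, vary="type"):
    """Build the two URI: header lines for a negotiated response."""
    return [
        "URI: <%s>" % actual_url,                       # the actual resource
        'URI: <%s>; vary="%s"' % (generic_url, vary),   # the generic resource
    ]


headers = negotiated_uri_headers(
    "http://host/path/mother.img",
    "http://host/path/mother.jpg",
)
```

The client keeps using the generic URL for future requests but can still learn which specific variant it was handed.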

David.