Mailing List Archive

Patches to handle content-language
What follows is a patch for Apache 0.8.0 (and Shambhala) that enables
consistent handling of content-language with MultiViews.

(The new behaviour is much closer to what you can have using CERN's httpd.)

Previously, if you wanted to handle files in several languages, you
were obliged to have a .var file for each, because mod_mime.c didn't
know what Content-Language was and so didn't type on language.

I added a per-directory directive AddLanguage which is very similar to
AddEncoding : it takes a language and a suffix. For example my srm.conf has

AddLanguage fr .fr
AddLanguage en .en
AddLanguage de .de

Mod_mime.c now recognizes filenames of the form
basename.type.lang.encoding, for example chapter1.html.fr.gz is
correctly typed as text/html with language=fr and encoding=x-gzip. The
Content-Language is stored in a new field in request_rec, exactly like
the Content-Encoding.
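A minimal sketch of the right-to-left suffix scan described above, written as standalone C rather than against the server API. The three lookup tables, `type_filename` and `struct typing` are hypothetical stand-ins for mod_mime's per-directory tables and the request_rec fields. Only the order of the stages (encoding, then language, then type) is taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the per-directory tables that AddType,
 * AddLanguage and AddEncoding would fill in. */
struct suffix_map {
    const char *ext;
    const char *value;
};

static const struct suffix_map types[] = {
    { "html", "text/html" }, { "txt", "text/plain" }, { NULL, NULL }
};
static const struct suffix_map languages[] = {
    { "fr", "fr" }, { "en", "en" }, { "de", "de" }, { NULL, NULL }
};
static const struct suffix_map encodings[] = {
    { "gz", "x-gzip" }, { "Z", "x-compress" }, { NULL, NULL }
};

/* Linear lookup of one suffix; returns NULL when it is not registered. */
static const char *lookup(const struct suffix_map *m, const char *ext)
{
    for (; m->ext != NULL; ++m)
        if (strcmp(m->ext, ext) == 0)
            return m->value;
    return NULL;
}

struct typing {
    const char *type;
    const char *language;
    const char *encoding;
};

/* Peel suffixes off the right-hand end: the last one is tried as an
 * encoding, the (new) last one as a language, the one before that as a
 * type.  A stage that does not match leaves the suffix in place for the
 * next stage, so every part is optional. */
static struct typing type_filename(const char *filename)
{
    struct typing t = { NULL, NULL, NULL };
    char buf[256];
    char *dot;

    strncpy(buf, filename, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    if ((dot = strrchr(buf, '.')) != NULL &&
        (t.encoding = lookup(encodings, dot + 1)) != NULL)
        *dot = '\0';
    if ((dot = strrchr(buf, '.')) != NULL &&
        (t.language = lookup(languages, dot + 1)) != NULL)
        *dot = '\0';
    if ((dot = strrchr(buf, '.')) != NULL)
        t.type = lookup(types, dot + 1);
    return t;
}
```

With these tables, `type_filename("chapter1.html.fr.gz")` yields text/html, fr and x-gzip, matching the example in the text.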

Also : when in MultiViews, if you request somefile.html and both
somefile.html.fr and somefile.html.en are available with the same
quality setting, the previous behaviour was to serve whichever was
smallest in size. This made it impossible to have the server serve French
pages by default when the client didn't send an Accept-Language: header.
I changed this behaviour to serve the pages in the priority order given
in the config file (the first AddLanguage has the highest priority).
I don't think it has any impact on existing applications.
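The tie-breaking rule can be sketched as follows, assuming the variants are already known to be equally acceptable and that the configured order is available as a plain array (first AddLanguage first). `priority_index` and `pick_variant` are illustrative names, not functions from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Position of lang in the configured order, or -1 if it is absent. */
static int priority_index(const char *const *priority, size_t n,
                          const char *lang)
{
    size_t i;

    if (lang == NULL)
        return -1;
    for (i = 0; i < n; ++i)
        if (strcmp(priority[i], lang) == 0)
            return (int)i;
    return -1;
}

/* Among equally acceptable variants, pick the one whose language comes
 * first in the configured order.  Unlisted languages lose to listed ones;
 * remaining ties keep the earliest variant.  Returns -1 if there are no
 * variants. */
static int pick_variant(const char *const *variant_langs, size_t nvariants,
                        const char *const *priority, size_t npriority)
{
    int best = -1, best_rank = -1;
    size_t v;

    for (v = 0; v < nvariants; ++v) {
        int rank = priority_index(priority, npriority, variant_langs[v]);

        if (best < 0 || (rank >= 0 && (best_rank < 0 || rank < best_rank))) {
            best = (int)v;
            best_rank = rank;
        }
    }
    return best;
}
```

With the order fr, en, de, the variants { "en", "fr" } resolve to the French one (index 1), regardless of file size.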

I also fixed a bug in find_lang_index in which a NULL string could be
strncmp'ed.
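For illustration, a simplified version of the guarded lookup, using plain string arrays instead of the server's array_header and accept_rec types (a simplification assumed to keep the sketch self-contained). The early return on a NULL language is the substance of the fix; an Accept-Language entry matches when it is a prefix of the variant's language, as in the original strncmp call.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first accepted language that is a prefix of
 * lang (so "en" matches "en-us"), or -1.  Variants without a language
 * (lang == NULL) are rejected before strncmp can see a NULL pointer. */
static int find_lang_index(const char *const *accept, size_t n,
                           const char *lang)
{
    size_t i;

    if (lang == NULL)
        return -1;
    for (i = 0; i < n; ++i)
        if (strncmp(lang, accept[i], strlen(accept[i])) == 0)
            return (int)i;
    return -1;
}
```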

Regards,

Florent Guillaume


*** ../shambhala.orig/httpd.h Wed Jul 12 19:44:53 1995
--- httpd.h Sun Jul 16 21:12:14 1995
***************
*** 274,279 ****
--- 274,280 ----

char *content_type; /* Break these out --- we dispatch on 'em */
char *content_encoding;
+ char *content_language;

int no_cache;

*** ../shambhala.orig/http_config.h Mon Jun 26 00:42:14 1995
--- http_config.h Sun Jul 16 21:12:57 1995
***************
*** 176,182 ****
* (as a SERVER_ERROR, since the module which was
* supposed to handle this was configured wrong).
* type_checker --- Determine MIME type of the requested entity;
! * sets content_type and _encoding fields.
* logger --- log a transaction. Not supported yet out of sheer
* laziness on my part.
*/
--- 176,182 ----
* (as a SERVER_ERROR, since the module which was
* supposed to handle this was configured wrong).
* type_checker --- Determine MIME type of the requested entity;
! * sets content_type, _encoding and _language fields.
* logger --- log a transaction. Not supported yet out of sheer
* laziness on my part.
*/
*** ../shambhala.orig/http_protocol.c Thu Jul 13 02:28:05 1995
--- http_protocol.c Sun Jul 16 22:44:53 1995
***************
*** 499,504 ****
--- 499,507 ----
if (r->content_encoding)
fprintf (fd, "Content-encoding: %s\015\012", r->content_encoding);

+ if (r->content_language)
+ fprintf (fd, "Content-language: %s\015\012", r->content_language);
+
for (i = 0; i < hdrs_arr->nelts; ++i) {
if (!hdrs[i].key) continue;
fprintf (fd, "%s: %s\015\012", hdrs[i].key, hdrs[i].val);
*** ../shambhala.orig/mod_negotiation.c Sat Jul 1 19:46:05 1995
--- mod_negotiation.c Mon Jul 17 01:32:47 1995
***************
*** 132,138 ****
char *type_name;
char *file_name;
char *content_encoding;
! char *lang;
float level; /* Auxiliary to content-type... */
float qs;
float bytes;
--- 132,138 ----
char *type_name;
char *file_name;
char *content_encoding;
! char *content_language;
float level; /* Auxiliary to content-type... */
float qs;
float bytes;
***************
*** 172,178 ****
mime_info->type_name = "";
mime_info->file_name = "";
mime_info->content_encoding = "";
! mime_info->lang = "";

mime_info->is_pseudo_html = 0.0;
mime_info->level = 0.0;
--- 172,178 ----
mime_info->type_name = "";
mime_info->file_name = "";
mime_info->content_encoding = "";
! mime_info->content_language = "";

mime_info->is_pseudo_html = 0.0;
mime_info->level = 0.0;
***************
*** 560,567 ****
mime_info.bytes = atoi(body);
}
else if (!strncmp (buffer, "content-language:", 17)) {
! mime_info.lang = get_token (neg->pool, &body, 0);
! str_tolower (mime_info.lang);
}
else if (!strncmp (buffer, "content-encoding:", 17)) {
mime_info.content_encoding = get_token (neg->pool, &body, 0);
--- 560,567 ----
mime_info.bytes = atoi(body);
}
else if (!strncmp (buffer, "content-language:", 17)) {
! mime_info.content_language = get_token (neg->pool, &body, 0);
! str_tolower (mime_info.content_language);
}
else if (!strncmp (buffer, "content-encoding:", 17)) {
mime_info.content_encoding = get_token (neg->pool, &body, 0);
***************
*** 589,597 ****
int read_types_multi (negotiation_state *neg)
{
request_rec *r = neg->r;
- char *file_name = pstrdup (r->pool, r->filename);

! char *filp = &file_name[strlen(file_name) - 1];
int prefix_len;
DIR *dirp;
struct DIR_TYPE *dir_entry;
--- 589,596 ----
int read_types_multi (negotiation_state *neg)
{
request_rec *r = neg->r;

! char *filp;
int prefix_len;
DIR *dirp;
struct DIR_TYPE *dir_entry;
***************
*** 648,653 ****
--- 647,653 ----
mime_info.sub_req = sub_req;
mime_info.file_name = dir_entry->d_name;
mime_info.content_encoding = sub_req->content_encoding;
+ mime_info.content_language = sub_req->content_language;

get_entry (neg->pool, &accept_info, sub_req->content_type);
set_mime_fields (&mime_info, &accept_info);
***************
*** 759,767 ****

int find_lang_index (array_header *accept_langs, char *lang)
{
! accept_rec *accs = (accept_rec *)accept_langs->elts;
int i;

for (i = 0; i < accept_langs->nelts; ++i)
if (!strncmp (lang, accs[i].type_name, strlen(accs[i].type_name)))
return i;
--- 759,772 ----

int find_lang_index (array_header *accept_langs, char *lang)
{
! accept_rec *accs;
int i;

+ if (!lang)
+ return -1;
+
+ accs = (accept_rec *)accept_langs->elts;
+
for (i = 0; i < accept_langs->nelts; ++i)
if (!strncmp (lang, accs[i].type_name, strlen(accs[i].type_name)))
return i;
***************
*** 777,793 ****

if (neg->accept_langs->nelts == 0) {

! /* Client doesn't care */

for (i = 0; i < neg->avail_vars->nelts; ++i)
! var_recs[i].lang_index = -1;

return;
}

for (i = 0; i < neg->avail_vars->nelts; ++i)
if (var_recs[i].quality > 0) {
! int index = find_lang_index (neg->accept_langs, var_recs[i].lang);

var_recs[i].lang_index = index;
if (index >= 0) found_any = 1;
--- 782,802 ----

if (neg->accept_langs->nelts == 0) {

! /* Client doesn't care : use order of config file */
!
! extern int mime_get_lang_index (request_rec *r, char *lang);

for (i = 0; i < neg->avail_vars->nelts; ++i)
! var_recs[i].lang_index =
! mime_get_lang_index (neg->r, var_recs[i].content_language);

return;
}

for (i = 0; i < neg->avail_vars->nelts; ++i)
if (var_recs[i].quality > 0) {
! int index = find_lang_index (neg->accept_langs,
! var_recs[i].content_language);

var_recs[i].lang_index = index;
if (index >= 0) found_any = 1;
***************
*** 1031,1036 ****
--- 1040,1046 ----
r->filename = sub_req->filename;
r->content_type = sub_req->content_type;
r->content_encoding = sub_req->content_encoding;
+ r->content_language = sub_req->content_language;
r->finfo = sub_req->finfo;

return OK;
*** ../shambhala.orig/mod_mime.c Fri Jun 30 13:54:26 1995
--- mod_mime.c Mon Jul 17 01:32:25 1995
***************
*** 69,74 ****
--- 69,75 ----
typedef struct {
table *forced_types; /* Additional AddTyped stuff */
table *encoding_types; /* Added with AddEncoding... */
+ table *language_types; /* Added with AddLanguage... */
} mime_dir_config;

module mime_module;
***************
*** 80,85 ****
--- 81,87 ----

new->forced_types = make_table (p, 4);
new->encoding_types = make_table (p, 4);
+ new->language_types = make_table (p, 4);

return new;
}
***************
*** 95,100 ****
--- 97,104 ----
base->forced_types);
new->encoding_types = overlay_tables (p, add->encoding_types,
base->encoding_types);
+ new->language_types = overlay_tables (p, add->language_types,
+ base->language_types);

return new;
}
***************
*** 113,118 ****
--- 117,157 ----
return NULL;
}

+ char *add_language(cmd_parms *cmd, mime_dir_config *m, char *lang, char *ext)
+ {
+ if (*ext == '.') ++ext;
+ table_set (m->language_types, ext, lang);
+ return NULL;
+ }
+
+
+ /* This function is called by the negotiation module to know the index
+ * of a given language in the config files.
+ */
+
+ int mime_get_lang_index (request_rec *r, char *lang)
+ {
+ mime_dir_config *conf;
+ int nelts;
+ table_entry *elts;
+ int i;
+
+ if (!lang)
+ return -1;
+
+ conf = (mime_dir_config *)get_module_config(r->per_dir_config, &mime_module);
+ nelts = conf->language_types->nelts;
+ elts = (table_entry *) conf->language_types->elts;
+
+ for (i = 0; i < nelts; ++i)
+ if (!strcasecmp (elts[i].val, lang))
+ return i;
+
+ return -1;
+ }
+
+
+
/* The sole bit of server configuration that the MIME module has is
* the name of its config file, so...
*/
***************
*** 129,134 ****
--- 168,175 ----
"a mime type followed by a file extension" },
{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2,
"an encoding (e.g., gzip), followed by a file extension" },
+ { "AddLanguage", add_language, NULL, OR_FILEINFO, TAKE2,
+ "a language (e.g., fr), followed by a file extension" },
{ "TypesConfig", set_types_config, NULL, RSRC_CONF, TAKE1,
"the MIME types config file" },
{ NULL }
***************
*** 198,203 ****
--- 239,255 ----
if ((type = table_get (conf->encoding_types, &fn[i])))
{
r->content_encoding = type;
+
+ /* go back to previous extension to try to use it as a language */
+
+ fn[i-1] = '\0';
+ if((i=rind(fn,'.')) < 0) return OK;
+ ++i;
+ }
+
+ if ((type = table_get (conf->language_types, &fn[i])))
+ {
+ r->content_language = type;

/* go back to previous extension to try to use it as a type */


--
Florent.Guillaume@ens.fr
Re: Patches to handle content-language
On Mon, 17 Jul 1995, Florent Guillaume wrote:
> What follows is a patch for Apache 0.8.0 (and Shambhala) that enables
> consistent handling of content-language with MultiViews.
>
> (The new behaviour is much closer to what you can have using CERN's httpd.)
>
> Previously, if you wanted to handle files in several languages, you
> were obliged to have a .var file for each, because mod_mime.c didn't
> know what Content-Language was and so didn't type on language.
>
> I added a per-directory directive AddLanguage which is very similar to
> AddEncoding : it takes a language and a suffix. For example my srm.conf has
>
> AddLanguage fr .fr
> AddLanguage en .en
> AddLanguage de .de
>
> Mod_mime.c now recognizes filenames of the form
> basename.type.lang.encoding, for example chapter1.html.fr.gz is
> correctly typed as text/html with language=fr and encoding=x-gzip. The
> Content-Language is stored in a new field in request_rec, exactly like
> the Content-Encoding.

I really like this, but what resolves name collisions and missing
info between type, lang, and encoding? For example, if I decide to name
all my Framemaker documents .fr, what happens to document.fr?
document.fr.en? document.fr.fr? If type, lang, and encoding shared the
same namespace, *no* problem. In this case, we're using filename
extensions to indicate meta-information other than content-type, which
I'm certainly comfortable with, but the collision issue should be
resolved somehow.

Also, it would be tremendous if I could have the flexibility to negotiate
on file type and language and encoding by specifying only the meta-info I
want in the filename - in other words, let's say I have documents in all
the possible variations of

basename.[html,txt,pdf].[en,fr,jp].[gz,Z,uu]

Right now with content-negotiation, if I have an index.html and an
index.html3, then I can simply point a resource locator to "index" and
negotiation happens, but I can also defeat negotiation by explicitly
linking to "index.html3" if I wanted to make sure someone got the 3.0
version.

Let's say for the above 9 versions of the document I wanted to
be able to specify which variables are mandatory. If I didn't care at
all which document was fetched, I'd create a link to "basename". If I
wanted specifically the gzip'd French PDF, I'd make a link to
"basename.pdf.fr.gz". Now, let's say I want to make a link to all
French variants explicit, yet let the client/server negotiate on their
own as to encoding and content-type preferences. I'd then like to link
to "basename.fr". Or, I specifically want the gzipped PDFs, but I
don't care what language: "basename.pdf.gz".

Thoughts? If we ensure there are no namespace collisions between MIME
type extensions, language extensions and encoding extensions, then this
is easy. If not....

Brian


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: Patches to handle content-language
On Mon, 17 Jul 1995, Robert S. Thau wrote:
> One brief thought on this subject --- managing a site on which the same
> extension (e.g., .fr) could mean multiple things (French, Framemaker)
> would no doubt lead to severe confusion. My temptation would be to
> disallow it on those grounds alone.

Noted - I should have also said that unless you specify that "basename"
can't have a period in it, then you have to worry about the namespace of
parts of the filenames too.

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: Patches to handle content-language
I think we have a small number of choices:

1) forbid namespace collisions between content-type, language and encoding:
a) basename(.[content-type,language,encoding])* where basename does not
have a "."
b) basename.[content-type,language,encoding](-[content-type,language,encoding])*
i.e. basename.html-en-gz

2) express meta-data like this in wholly separate .var files, and ditch
the ability to ask for "all french versions of this resource", etc.

We already have #2. Can we make #1 toggle-able, with a note that it imposes
certain constraints (that the server could flag with warning messages on
startup)?
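If option 1 were made toggle-able, the startup warning could look roughly like this. It is a hypothetical sketch: the function name, the NULL-terminated lists and the message text are assumptions, and a real server would read the suffixes from the AddType, AddLanguage and AddEncoding tables instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical startup check for a "no collisions" mode: report every
 * suffix registered in more than one namespace.  lists[k] is a
 * NULL-terminated array of suffixes and names[k] its namespace label
 * ("type", "language", "encoding").  log may be NULL to suppress the
 * messages.  Returns the number of collisions found. */
static int warn_suffix_collisions(const char *const *lists[],
                                  const char *names[], int nlists,
                                  FILE *log)
{
    int collisions = 0;
    int a, b;

    for (a = 0; a < nlists; ++a)
        for (b = a + 1; b < nlists; ++b) {
            const char *const *x;
            const char *const *y;

            for (x = lists[a]; *x != NULL; ++x)
                for (y = lists[b]; *y != NULL; ++y)
                    if (strcmp(*x, *y) == 0) {
                        if (log != NULL)
                            fprintf(log, "warning: suffix .%s is both a %s and a %s\n",
                                    *x, names[a], names[b]);
                        ++collisions;
                    }
        }
    return collisions;
}
```

With `stderr` as the log and ".fr" registered both for Framemaker and for French, this would print a single warning naming the type and language namespaces.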

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: Patches to handle content-language [ In reply to ]
One brief thought on this subject --- managing a site on which the same
extension (e.g., .fr) could mean multiple things (French, Framemaker)
would no doubt lead to severe confusion. My temptation would be to
disallow it on those grounds alone.

(Incidentally, as I understand it, the CERN server, which is the only
prior art I'm aware of in this area, does put all suffixes in the same
namespace).

rst
Re: Patches to handle content-language
>
> 2) express meta-data like this in wholly separate .var files, and ditch
> the ability to ask for "all french versions of this resource", etc.
>
> We already have #2. Can we make #1 toggle-able, with a note that it imposes
> certain constraints (that the server could flag with warning messages on
> startup)?


Why can't we add the ability to glob extensions/meta-data in the .var files?
Re: Patches to handle content-language
>[rst]
> BTW, I'm running with Florent's language negotiation code, so I think
> it works; this is also a much-requested feature, so it would be nice
> to have it in the release. My only problem with it is that it sets
> the priority for languages by the order of AddLanguage directives in
> the config files --- I'd prefer an explicit LanguagePriority directive
> for two reasons.
>
> First off, [... should be an explicit priority, ordering is clearer ...]
> Secondly, [... not modular ...]
>
> Otherwise, as I said, the code seems to work fine, and though it comes
> in a bit late, if this problem is solved, then I at least would be
> happy to have it.


You're right, the code I gave you was ugly; I was a bit ashamed to call
one module from another. But it was only a first test, anyway.

Okay, here's a new patch (over 0.8.1) that does it cleanly. There's a
new per-directory config directive, LanguagePriority, that lists the
languages (not the suffixes) in decreasing order of preference.
A directive in .htaccess is treated as if it were _before_ the
directory-wide config, because it has to be able to override it.
(My description of AddLanguage in the first version still applies.)
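A sketch of those merge semantics, using a fixed-size array in place of the patch's array_header and pool allocation (both simplifications are assumptions). The subdirectory's list is placed ahead of the parent's, as append_arrays (p, add->language_priority, base->language_priority) does in the patch, so its entries win on lookup. Unlike find_default_index, which uses strcasecmp, this sketch compares case-sensitively to stay within standard C.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LANGS 16

/* Simplified stand-in for neg_dir_config: an ordered LanguagePriority list. */
struct lang_priority {
    const char *langs[MAX_LANGS];
    size_t n;
};

/* Subdirectory (e.g. .htaccess) entries first, then the parent's, so the
 * subdirectory's preferences take precedence.  Entries beyond MAX_LANGS
 * are silently dropped in this sketch. */
static struct lang_priority merge_priority(const struct lang_priority *base,
                                           const struct lang_priority *add)
{
    struct lang_priority out = { { NULL }, 0 };
    size_t i;

    for (i = 0; i < add->n && out.n < MAX_LANGS; ++i)
        out.langs[out.n++] = add->langs[i];
    for (i = 0; i < base->n && out.n < MAX_LANGS; ++i)
        out.langs[out.n++] = base->langs[i];
    return out;
}

/* Rank of lang in the merged list (earliest match wins), or -1 if absent. */
static int priority_rank(const struct lang_priority *p, const char *lang)
{
    size_t i;

    if (lang == NULL)
        return -1;
    for (i = 0; i < p->n; ++i)
        if (strcmp(p->langs[i], lang) == 0)
            return (int)i;
    return -1;
}
```

With a parent list of en, fr and an .htaccess list of fr, the merged order is fr, en, fr, so French ranks first in that subdirectory.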


[I just received Brian's comments; I'll send a separate mail about them.]

Florent


Index: src/http_config.h
*** apache_0.8.1.orig/src/http_config.h Mon Jun 26 00:42:14 1995
--- apache_0.8.1/src/http_config.h Mon Jul 17 22:39:34 1995
***************
*** 176,182 ****
* (as a SERVER_ERROR, since the module which was
* supposed to handle this was configured wrong).
* type_checker --- Determine MIME type of the requested entity;
! * sets content_type and _encoding fields.
* logger --- log a transaction. Not supported yet out of sheer
* laziness on my part.
*/
--- 176,182 ----
* (as a SERVER_ERROR, since the module which was
* supposed to handle this was configured wrong).
* type_checker --- Determine MIME type of the requested entity;
! * sets content_type, _encoding and _language fields.
* logger --- log a transaction. Not supported yet out of sheer
* laziness on my part.
*/
Index: src/http_protocol.c
*** apache_0.8.1.orig/src/http_protocol.c Thu Jul 13 02:28:05 1995
--- apache_0.8.1/src/http_protocol.c Mon Jul 17 22:39:35 1995
***************
*** 499,504 ****
--- 499,507 ----
if (r->content_encoding)
fprintf (fd, "Content-encoding: %s\015\012", r->content_encoding);

+ if (r->content_language)
+ fprintf (fd, "Content-language: %s\015\012", r->content_language);
+
for (i = 0; i < hdrs_arr->nelts; ++i) {
if (!hdrs[i].key) continue;
fprintf (fd, "%s: %s\015\012", hdrs[i].key, hdrs[i].val);
Index: src/httpd.h
*** apache_0.8.1.orig/src/httpd.h Mon Jul 17 15:24:32 1995
--- apache_0.8.1/src/httpd.h Mon Jul 17 22:39:34 1995
***************
*** 282,287 ****
--- 282,288 ----

char *content_type; /* Break these out --- we dispatch on 'em */
char *content_encoding;
+ char *content_language;

int no_cache;

Index: src/mod_mime.c
*** apache_0.8.1.orig/src/mod_mime.c Fri Jun 30 13:54:26 1995
--- apache_0.8.1/src/mod_mime.c Mon Jul 17 23:16:03 1995
***************
*** 69,74 ****
--- 69,75 ----
typedef struct {
table *forced_types; /* Additional AddTyped stuff */
table *encoding_types; /* Added with AddEncoding... */
+ table *language_types; /* Added with AddLanguage... */
} mime_dir_config;

module mime_module;
***************
*** 80,85 ****
--- 81,87 ----

new->forced_types = make_table (p, 4);
new->encoding_types = make_table (p, 4);
+ new->language_types = make_table (p, 4);

return new;
}
***************
*** 95,100 ****
--- 97,104 ----
base->forced_types);
new->encoding_types = overlay_tables (p, add->encoding_types,
base->encoding_types);
+ new->language_types = overlay_tables (p, add->language_types,
+ base->language_types);

return new;
}
***************
*** 113,118 ****
--- 117,129 ----
return NULL;
}

+ char *add_language(cmd_parms *cmd, mime_dir_config *m, char *lang, char *ext)
+ {
+ if (*ext == '.') ++ext;
+ table_set (m->language_types, ext, lang);
+ return NULL;
+ }
+
/* The sole bit of server configuration that the MIME module has is
* the name of its config file, so...
*/
***************
*** 129,134 ****
--- 140,147 ----
"a mime type followed by a file extension" },
{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2,
"an encoding (e.g., gzip), followed by a file extension" },
+ { "AddLanguage", add_language, NULL, OR_FILEINFO, TAKE2,
+ "a language (e.g., fr), followed by a file extension" },
{ "TypesConfig", set_types_config, NULL, RSRC_CONF, TAKE1,
"the MIME types config file" },
{ NULL }
***************
*** 198,203 ****
--- 211,227 ----
if ((type = table_get (conf->encoding_types, &fn[i])))
{
r->content_encoding = type;
+
+ /* go back to previous extension to try to use it as a language */
+
+ fn[i-1] = '\0';
+ if((i=rind(fn,'.')) < 0) return OK;
+ ++i;
+ }
+
+ if ((type = table_get (conf->language_types, &fn[i])))
+ {
+ r->content_language = type;

/* go back to previous extension to try to use it as a type */

Index: src/mod_negotiation.c
*** apache_0.8.1.orig/src/mod_negotiation.c Sat Jul 1 19:46:05 1995
--- apache_0.8.1/src/mod_negotiation.c Tue Jul 18 00:22:28 1995
***************
*** 71,78 ****
--- 71,113 ----
* server basis...
*/

+ typedef struct {
+ array_header *language_priority;
+ } neg_dir_config;
+
module negotiation_module;

+ void *create_neg_dir_config (pool *p, char *dummy)
+ {
+ neg_dir_config *new =
+ (neg_dir_config *) palloc (p, sizeof (neg_dir_config));
+
+ new->language_priority = make_array (p, 4, sizeof (char *));
+ return new;
+ }
+
+ void *merge_neg_dir_configs (pool *p, void *basev, void *addv)
+ {
+ neg_dir_config *base = (neg_dir_config *)basev;
+ neg_dir_config *add = (neg_dir_config *)addv;
+ neg_dir_config *new =
+ (neg_dir_config *) palloc (p, sizeof (neg_dir_config));
+
+ /* give priority to the config in the subdirectory */
+ new->language_priority = append_arrays (p, add->language_priority,
+ base->language_priority);
+ return new;
+ }
+
+ char *set_language_priority (cmd_parms *cmd, void *n, char *lang)
+ {
+ array_header *arr = ((neg_dir_config *) n)->language_priority;
+ char **langp = (char **) push_array (arr);
+
+ *langp = pstrdup (arr->pool, lang);
+ return NULL;
+ }
+
char *cache_negotiated_docs (cmd_parms *cmd, void *dummy, char *dummy2)
{
void *server_conf = cmd->server->module_config;
***************
*** 89,94 ****
--- 124,131 ----
command_rec negotiation_cmds[] = {
{ "CacheNegotiatedDocs", cache_negotiated_docs, NULL, RSRC_CONF, RAW_ARGS,
NULL },
+ { "LanguagePriority", set_language_priority, NULL, OR_FILEINFO, ITERATE,
+ NULL },
{ NULL }
};

***************
*** 132,138 ****
char *type_name;
char *file_name;
char *content_encoding;
! char *lang;
float level; /* Auxiliary to content-type... */
float qs;
float bytes;
--- 169,175 ----
char *type_name;
char *file_name;
char *content_encoding;
! char *content_language;
float level; /* Auxiliary to content-type... */
float qs;
float bytes;
***************
*** 172,178 ****
mime_info->type_name = "";
mime_info->file_name = "";
mime_info->content_encoding = "";
! mime_info->lang = "";

mime_info->is_pseudo_html = 0.0;
mime_info->level = 0.0;
--- 209,215 ----
mime_info->type_name = "";
mime_info->file_name = "";
mime_info->content_encoding = "";
! mime_info->content_language = "";

mime_info->is_pseudo_html = 0.0;
mime_info->level = 0.0;
***************
*** 560,567 ****
mime_info.bytes = atoi(body);
}
else if (!strncmp (buffer, "content-language:", 17)) {
! mime_info.lang = get_token (neg->pool, &body, 0);
! str_tolower (mime_info.lang);
}
else if (!strncmp (buffer, "content-encoding:", 17)) {
mime_info.content_encoding = get_token (neg->pool, &body, 0);
--- 597,604 ----
mime_info.bytes = atoi(body);
}
else if (!strncmp (buffer, "content-language:", 17)) {
! mime_info.content_language = get_token (neg->pool, &body, 0);
! str_tolower (mime_info.content_language);
}
else if (!strncmp (buffer, "content-encoding:", 17)) {
mime_info.content_encoding = get_token (neg->pool, &body, 0);
***************
*** 589,597 ****
int read_types_multi (negotiation_state *neg)
{
request_rec *r = neg->r;
- char *file_name = pstrdup (r->pool, r->filename);

! char *filp = &file_name[strlen(file_name) - 1];
int prefix_len;
DIR *dirp;
struct DIR_TYPE *dir_entry;
--- 626,633 ----
int read_types_multi (negotiation_state *neg)
{
request_rec *r = neg->r;

! char *filp;
int prefix_len;
DIR *dirp;
struct DIR_TYPE *dir_entry;
***************
*** 648,653 ****
--- 684,690 ----
mime_info.sub_req = sub_req;
mime_info.file_name = dir_entry->d_name;
mime_info.content_encoding = sub_req->content_encoding;
+ mime_info.content_language = sub_req->content_language;

get_entry (neg->pool, &accept_info, sub_req->content_type);
set_mime_fields (&mime_info, &accept_info);
***************
*** 759,767 ****

int find_lang_index (array_header *accept_langs, char *lang)
{
! accept_rec *accs = (accept_rec *)accept_langs->elts;
int i;

for (i = 0; i < accept_langs->nelts; ++i)
if (!strncmp (lang, accs[i].type_name, strlen(accs[i].type_name)))
return i;
--- 796,809 ----

int find_lang_index (array_header *accept_langs, char *lang)
{
! accept_rec *accs;
int i;

+ if (!lang)
+ return -1;
+
+ accs = (accept_rec *)accept_langs->elts;
+
for (i = 0; i < accept_langs->nelts; ++i)
if (!strncmp (lang, accs[i].type_name, strlen(accs[i].type_name)))
return i;
***************
*** 769,774 ****
--- 811,842 ----
return -1;
}

+ /* This function returns the priority of a given language
+ * according to LanguagePriority. It is used in case of a tie
+ * between several languages.
+ */
+
+ int find_default_index (neg_dir_config *conf, char *lang)
+ {
+ array_header *arr;
+ int nelts;
+ char **elts;
+ int i;
+
+ if (!lang)
+ return -1;
+
+ arr = conf->language_priority;
+ nelts = arr->nelts;
+ elts = (char **) arr->elts;
+
+ for (i = 0; i < nelts; ++i)
+ if (!strcasecmp (elts[i], lang))
+ return i;
+
+ return -1;
+ }
+
void find_lang_indexes (negotiation_state *neg)
{
var_rec *var_recs = (var_rec*)neg->avail_vars->elts;
***************
*** 776,793 ****
int found_any = 0;

if (neg->accept_langs->nelts == 0) {
-
- /* Client doesn't care */

for (i = 0; i < neg->avail_vars->nelts; ++i)
! var_recs[i].lang_index = -1;

return;
}

for (i = 0; i < neg->avail_vars->nelts; ++i)
if (var_recs[i].quality > 0) {
! int index = find_lang_index (neg->accept_langs, var_recs[i].lang);

var_recs[i].lang_index = index;
if (index >= 0) found_any = 1;
--- 844,866 ----
int found_any = 0;

if (neg->accept_langs->nelts == 0) {

+ /* Client doesn't care : use LanguagePriority order */
+
+ neg_dir_config *conf =
+ (neg_dir_config *) get_module_config (neg->r->per_dir_config,
+ &negotiation_module);
for (i = 0; i < neg->avail_vars->nelts; ++i)
! var_recs[i].lang_index =
! find_default_index (conf, var_recs[i].content_language);

return;
}

for (i = 0; i < neg->avail_vars->nelts; ++i)
if (var_recs[i].quality > 0) {
! int index = find_lang_index (neg->accept_langs,
! var_recs[i].content_language);

var_recs[i].lang_index = index;
if (index >= 0) found_any = 1;
***************
*** 1031,1036 ****
--- 1104,1110 ----
r->filename = sub_req->filename;
r->content_type = sub_req->content_type;
r->content_encoding = sub_req->content_encoding;
+ r->content_language = sub_req->content_language;
r->finfo = sub_req->finfo;

return OK;
***************
*** 1044,1051 ****
module negotiation_module = {
STANDARD_MODULE_STUFF,
NULL, /* initializer */
! NULL, /* dir config creater */
! NULL, /* dir merger --- default is to override */
NULL, /* server config */
NULL, /* merge server config */
negotiation_cmds, /* command table */
--- 1118,1125 ----
module negotiation_module = {
STANDARD_MODULE_STUFF,
NULL, /* initializer */
! create_neg_dir_config, /* dir config creater */
! merge_neg_dir_configs, /* dir merger --- default is to override */
NULL, /* server config */
NULL, /* merge server config */
negotiation_cmds, /* command table */

--
Florent.Guillaume@ens.fr
Re: Patches to handle content-language
>[Brian]
>
> I really like this, but what resolves name collisions and missing
> info between type, lang, and encoding? For example, if I decide to name
> all my Framemaker documents .fr, what happens to document.fr?
> document.fr.en? document.fr.fr? If type, lang, and encoding shared the
> same namespace, *no* problem. In this case, we're using filename
> extensions to indicate meta-information other than content-type, which
> I'm certainly comfortable with, but the collision issue should be
> resolved somehow.
>
> Also, it would be tremendous if I could have the flexibility to negotiate
> on file type and language and encoding by specifying only the meta-info I
> want in the filename - in other words, let's say I have documents in all
> the possible variations of
>
> basename.[html,txt,pdf].[en,fr,jp].[gz,Z,uu]
>
> Right now with content-negotiation, if I have an index.html and an
> index.html3, then I can simply point a resource locator to "index" and
> negotiation happens, but I can also defeat negotiation by explicitly
> linking to "index.html3" if I wanted to make sure someone got the 3.0
> version.
>
> Let's say for the above 9 versions of the document I wanted to
> be able to specify which variables are mandatory. If I didn't care at
> all which document was fetched, I'd create a link to "basename". If I
> wanted specifically the gzip'd French PDF, I'd make a link to
> "basename.pdf.fr.gz". Now, let's say I want to make a link to all
> French variants explicit, yet let the client/server negotiate on their
> own as to encoding and content-type preferences. I'd then like to link
> to "basename.fr". Or, I specifically want the gzipped PDFs, but I
> don't care what language: "basename.pdf.gz".
>
> Thoughts? If we ensure there are no namespace collisions between MIME
> type extensions, language extensions and encoding extensions, then this
> is easy. If not....


Concerning namespace collision : it's certainly a problem.
Currently the behaviour I have is :
document.fr -> french
document.fr.fr -> french, framemaker
document.fr.en -> english, framemaker
This is because the code starts with the last suffix and moves to the
left, looking for an encoding, then a language, then a type.
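The three cases above can be reproduced with a small standalone model of that scan. The tables are hypothetical (the Framemaker MIME string in particular is only a placeholder); the point is that, for each position, whichever table is consulted first claims an ambiguous suffix like "fr".

```c
#include <stddef.h>
#include <string.h>

struct entry {
    const char *ext;
    const char *value;
};

/* Hypothetical tables in which ".fr" is registered both as a content
 * type (Framemaker; the MIME string is only a placeholder) and as a
 * language. */
static const struct entry enc_tab[] = {
    { "gz", "x-gzip" }, { NULL, NULL }
};
static const struct entry lang_tab[] = {
    { "fr", "fr" }, { "en", "en" }, { NULL, NULL }
};
static const struct entry type_tab[] = {
    { "fr", "application/x-framemaker" }, { "html", "text/html" }, { NULL, NULL }
};

static const char *find(const struct entry *t, const char *ext)
{
    for (; t->ext != NULL; ++t)
        if (strcmp(t->ext, ext) == 0)
            return t->value;
    return NULL;
}

/* Right-to-left scan: stage 0 tries the last suffix as an encoding,
 * stage 1 as a language, stage 2 as a type.  A match consumes the
 * suffix; a miss leaves it for the next stage.  out[0], out[1], out[2]
 * receive the encoding, language and type (NULL when absent). */
static void classify(const char *filename, const char *out[3])
{
    const struct entry *const stages[3] = { enc_tab, lang_tab, type_tab };
    char buf[256];
    int s;

    strncpy(buf, filename, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    out[0] = out[1] = out[2] = NULL;

    for (s = 0; s < 3; ++s) {
        char *dot = strrchr(buf, '.');

        if (dot == NULL)
            break;
        out[s] = find(stages[s], dot + 1);
        if (out[s] != NULL)
            *dot = '\0';
    }
}
```

Because the language stage runs before the type stage, "document.fr" comes out as French with no type, while "document.fr.en" is English Framemaker.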

What should be done ? Forbidding all namespace collisions would be going
a bit too far, because (as you showed) it can very well happen that a
content-type suffix is also an abbreviation for a language. So I think the
content-type should be given priority over the content-language,
somehow (more later).


Now on the topic of missing info for the negotiation. What you're
describing is exactly the behaviour of the CERN server :

Supposing you ask for "basename.pdf.gz", CERN first extracts the
basename of the requested file, "basename", and the associated suffixes,
"pdf" and "gz" (the ordering of suffixes isn't important for CERN). Then
it looks in the requested directory for all files with the same
"basename", and for each one analyses the suffixes. It eliminates all
the files in which the requested suffixes are not present, and is left
in our case with "basename.pdf.en.gz", "basename.pdf.fr.gz", and
"basename.pdf.jp.gz". The prioritizing between these three is made by
the usual quality assessment. But before this last quality assessment,
the suffixes had no meaning, so the files "the.dog.eats.cat.txt" and
"the.cat.eats.dog.txt" are confused by the server. It's unclear to me
how we could make the distinction : keep all suffixes unknown to the
server in the "basename" part ?
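The CERN-style selection as described here can be sketched as a filter on directory entries. This illustrates the description only and is not CERN's code; `split_dots`, `cern_matches` and the limit of eight dot-separated parts are assumptions.

```c
#include <string.h>

#define MAX_PARTS 8

/* Split "name.a.b.c" in place at its dots: parts[0] is the basename (up
 * to the first dot), the rest are suffixes.  Once max parts are filled,
 * any remaining dots stay inside the last part.  Returns the part count. */
static int split_dots(char *buf, char *parts[], int max)
{
    int n = 0;
    char *p = buf;

    parts[n++] = p;
    while (n < max && (p = strchr(p, '.')) != NULL) {
        *p++ = '\0';
        parts[n++] = p;
    }
    return n;
}

/* Candidate matches if it has the same basename and every suffix of the
 * request appears somewhere among its suffixes, in any order. */
static int cern_matches(const char *request, const char *candidate)
{
    char rbuf[256], cbuf[256];
    char *rp[MAX_PARTS], *cp[MAX_PARTS];
    int rn, cn, i, j;

    strncpy(rbuf, request, sizeof rbuf - 1);
    rbuf[sizeof rbuf - 1] = '\0';
    strncpy(cbuf, candidate, sizeof cbuf - 1);
    cbuf[sizeof cbuf - 1] = '\0';
    rn = split_dots(rbuf, rp, MAX_PARTS);
    cn = split_dots(cbuf, cp, MAX_PARTS);

    if (strcmp(rp[0], cp[0]) != 0)
        return 0;
    for (i = 1; i < rn; ++i) {
        int found = 0;

        for (j = 1; j < cn && !found; ++j)
            found = strcmp(rp[i], cp[j]) == 0;
        if (!found)
            return 0;
    }
    return 1;
}
```

For example, `cern_matches("basename.pdf.gz", "basename.pdf.fr.gz")` returns 1, but so does `cern_matches("the.dog.eats.cat.txt", "the.cat.eats.dog.txt")`, which is exactly the confusion mentioned above.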

Instead of this, the current behaviour of Apache in MultiViews is to
look for files that have the same beginning as the requested filename
(not simply the basename) with additional suffixes, and to do typing of
suffixes in a fixed order.
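By contrast, the prefix test that MultiViews applies can be sketched in a few lines. `multiviews_candidate` is a hypothetical name, and the sketch covers only the name comparison, not the typing of the suffixes that follows it.

```c
#include <string.h>

/* A directory entry is a candidate variant if it begins with the full
 * requested name immediately followed by a dot. */
static int multiviews_candidate(const char *requested, const char *entry)
{
    size_t len = strlen(requested);

    return strncmp(entry, requested, len) == 0 && entry[len] == '.';
}
```

So a request for "index.html" picks up index.html.fr but not index.fr.html, which is why the order of suffixes matters in Apache and not in CERN.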

Doing things à la CERN is not difficult (and I'll probably write a patch
for this), but it would be a little slower than what we have now.
This may not be a problem if the majority of files are accessed by exact
name, or if the directories have a small number of files. Also do we
keep the fact that the order of suffixes is unimportant
(i.e. file.txt.gz and file.gz.txt both work with CERN) ? I have mixed
feelings : I like both index.fr.html and index.html.fr, but I think the
encoding should come last (this reflects the way gzip and compress
work).


Note that the problem of namespace collision still exists with the CERN
behaviour : suppose the files you have are all 27 (not 9, BTW) variations of

basename.[fr,txt,pdf].[en,fr,jp].[gz,Z,uu]

Then how do we treat a request for "basename.fr.gz" ? If we arrange to
have content-type > content-language, then this is a request for the
Framemaker version, in any language. But then how do we ask for a
gzipped French version of the document in any type available ? It can be
done using an Accept-Language: fr header, and requesting "basename.gz",
but this is cheating, we want something in the URL only.
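The collision can be checked by brute force. The sketch below assumes CERN-style matching, where every requested suffix must appear somewhere among the variant's suffixes; the names are hypothetical. It enumerates the 27 variants and lists those that a request for "basename.fr.gz" would accept.

```c
#include <stdio.h>
#include <string.h>

/* The 27 variants basename.T.L.E from the example; ".fr" as a type
 * stands for FrameMaker. */
static const char *const doc_types[] = { "fr", "txt", "pdf" };
static const char *const doc_langs[] = { "en", "fr", "jp" };
static const char *const doc_encs[]  = { "gz", "Z", "uu" };

static int contains(const char *const set[], int n, const char *s)
{
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], s) == 0)
            return 1;
    return 0;
}

/* Print and count the variants a CERN-style server would accept for the
 * request "basename.<a>.<b>": both requested suffixes must occur among
 * the variant's three suffixes, in any position. */
int count_matches(const char *a, const char *b)
{
    int count = 0;

    for (int t = 0; t < 3; t++)
        for (int l = 0; l < 3; l++)
            for (int e = 0; e < 3; e++) {
                const char *suf[3] = { doc_types[t], doc_langs[l], doc_encs[e] };

                if (contains(suf, 3, a) && contains(suf, 3, b)) {
                    printf("basename.%s.%s.%s\n", suf[0], suf[1], suf[2]);
                    count++;
                }
            }
    return count;
}
```

`count_matches("fr", "gz")` prints five names and returns 5:

- basename.fr.en.gz
- basename.fr.fr.gz
- basename.fr.jp.gz
- basename.txt.fr.gz
- basename.pdf.fr.gz

That is three FrameMaker documents in various languages mixed with two French documents of other types. Nothing in the URL chooses between the two readings.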


Florent

--
Florent.Guillaume@ens.fr
Re: Patches to handle content-language
In reply to Florent Guillaume who said
>
>
> Note that the problem of namespace collision still exists with the CERN
> behaviour : suppose the files you have are all 27 (not 9, BTW) variations of
>
> basename.[fr,txt,pdf].[en,fr,jp].[gz,Z,uu]
>
> Then how do we treat a request for "basename.fr.gz" ? If we arrange to
> have content-type > content-language, then this is a request for the
> Framemaker version, in any language. But then how do we ask for a
> gzipped French version of the document in any type available ? It can be
> done using an Accept-Language: fr header, and requesting "basename.gz",
> but this is cheating, we want something in the URL only.

Why not use the locale naming scheme (there's a standard for this somewhere),
e.g. from FreeBSD's locale directory,

da_DK.ISO8859-1@ fi_FI.ISO8859-1@ lt_LN.ISO8859-1/
de_AT.ISO8859-1@ fr_BE.ISO8859-1@ nl_BE.ISO8859-1@
de_CH.ISO8859-1@ fr_CA.ISO8859-1@ nl_NL.ISO8859-1@
de_DE.ISO8859-1@ fr_CH.ISO8859-1@ no_NO.ISO8859-1@
en_AU.ISO8859-1@ fr_FR.ISO8859-1@ pt_PT.ISO8859-1@
en_CA.ISO8859-1@ is_IS.ISO8859-1@ ru_SU.CP866/
en_GB.ISO8859-1@ it_CH.ISO8859-1@ ru_SU.KOI8-R/
en_US.ISO8859-1@ it_IT.ISO8859-1@ sv_SE.ISO8859-1@
es_ES.ISO8859-1@ ja_JP.EUC/

Obviously you just use the country code as in

basename.fr.fr_CA.gz

which is a French gzipped framemaker document. Don't ask me what these
codes stand for, I know the en_* ones of course, en_GB is British
English, en_US is US English etc.

Just an idea, I doubt you'd get namespace collision using those
codes though.
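Under this scheme, a language suffix is recognizable by its shape alone. This is a minimal sketch, assuming language suffixes are always written as ll_CC (two lowercase letters, an underscore, two uppercase letters) with no charset part, since a '.' would start a new suffix. `is_locale_suffix` and `find_locale` are hypothetical helpers.

```c
#include <ctype.h>
#include <string.h>

/* 1 if s has the locale shape "ll_CC", e.g. "fr_CA" or "en_GB". */
int is_locale_suffix(const char *s)
{
    return strlen(s) == 5
        && islower((unsigned char)s[0]) && islower((unsigned char)s[1])
        && s[2] == '_'
        && isupper((unsigned char)s[3]) && isupper((unsigned char)s[4]);
}

/* Copy the first locale-shaped suffix of `name` into `out` (which needs
 * room for at least 6 bytes). Returns 1 if one was found, 0 otherwise. */
int find_locale(const char *name, char *out, size_t outsize)
{
    const char *p = strchr(name, '.');

    while (p != NULL) {
        const char *start = p + 1;
        const char *next = strchr(start, '.');
        size_t n = next ? (size_t)(next - start) : strlen(start);

        if (n == 5 && n < outsize) {
            memcpy(out, start, n);
            out[n] = '\0';
            if (is_locale_suffix(out))
                return 1;
        }
        p = next;
    }
    return 0;
}
```

For basename.fr.fr_CA.gz, `find_locale` reports fr_CA, so the leading .fr can only be read as a type.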

--
Paul Richards, Bluebird Computer Systems. FreeBSD core team member.
Internet: paul@FreeBSD.org, http://www.freebsd.org/~paul
Phone: 0370 462071 (Mobile), +44 1222 457651 (home)
Re: Patches to handle content-language
From: Florent Guillaume

> Note that the problem of namespace collision still exists with the CERN
> behaviour : suppose the files you have are all 27 (not 9, BTW) variations of
>
> basename.[fr,txt,pdf].[en,fr,jp].[gz,Z,uu]
>
> Then how do we treat a request for "basename.fr.gz" ? If we arrange to
> have content-type > content-language, then this is a request for the
> Framemaker version, in any language. But then how do we ask for a
> gzipped French version of the document in any type available ? It can be
> done using an Accept-Language: fr header, and requesting "basename.gz",
> but this is cheating, we want something in the URL only.

Hmmm... there seems to be a real can of worms here. Still, this is an
oft-requested feature, for which we have code which *appears* to work,
and it would be a shame to leave it out.

Just to be sure that we're debating a real issue here (and that simply
mandating, or assuming, that all suffixes are in the same namespace,
which would solve the issue, really is an inviable option), are people
here actually using *.fr as a suffix for FrameMaker? (The software
itself seems not to care, as far as I can tell).

rst
Re: Patches to handle content-language
On Tue, 18 Jul 1995, Robert S. Thau wrote:
> Just to be sure that we're debating a real issue here (and that simply
> mandating, or assuming, that all suffixes are in the same namespace,
> which would solve the issue, really is an inviable option), are people
> here actually using *.fr as a suffix for FrameMaker? (The software
> itself seems not to care, as far as I can tell).

No, I made that up. I don't know of a solid example, really. But that
doesn't mean it couldn't happen, and would get reported as a bug :)

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: Patches to handle content-language
Brian B wrote,

> On Tue, 18 Jul 1995, Robert S. Thau wrote:
> > Just to be sure that we're debating a real issue here (and that simply
> > mandating, or assuming, that all suffixes are in the same namespace,
> > which would solve the issue, really is an inviable option), are people
> > here actually using *.fr as a suffix for FrameMaker? (The software
> > itself seems not to care, as far as I can tell).
>
> No, I made that up. I don't know of a solid example, really. But that
> doesn't mean it couldn't happen, and would get reported as a bug :)


.au = Australia = audio file
.pl = Poland = perl script
.ph = Philippines = perl header file
.cf = Central African Rep. = config file


I'm sure there are lots more.


rob
--
http://nqcd.lanl.gov/~hartill/
Re: Patches to handle content-language
Oops, I overlooked a small thing : if not all the languages were named
in LanguagePriority, you got incorrect behaviour. Please apply this
patch as well over my previous one.

Florent

*** apache_0.8.1/src/mod_negotiation.c Tue Jul 18 00:22:28 1995
--- apache_0.8.1.new/src/mod_negotiation.c Tue Jul 18 14:41:57 1995
***************
*** 842,865 ****
var_rec *var_recs = (var_rec*)neg->avail_vars->elts;
int i;
int found_any = 0;

! if (neg->accept_langs->nelts == 0) {

- /* Client doesn't care : use LanguagePriority order */
-
- neg_dir_config *conf =
- (neg_dir_config *) get_module_config (neg->r->per_dir_config,
- &negotiation_module);
- for (i = 0; i < neg->avail_vars->nelts; ++i)
- var_recs[i].lang_index =
- find_default_index (conf, var_recs[i].content_language);
-
- return;
- }
-
for (i = 0; i < neg->avail_vars->nelts; ++i)
if (var_recs[i].quality > 0) {
! int index = find_lang_index (neg->accept_langs,
var_recs[i].content_language);

var_recs[i].lang_index = index;
--- 842,862 ----
var_rec *var_recs = (var_rec*)neg->avail_vars->elts;
int i;
int found_any = 0;
+ neg_dir_config *conf;
+ int naccept = neg->accept_langs->nelts;

! if (naccept == 0)
! conf = (neg_dir_config *) get_module_config (neg->r->per_dir_config,
! &negotiation_module);

for (i = 0; i < neg->avail_vars->nelts; ++i)
if (var_recs[i].quality > 0) {
! int index;
! if (naccept == 0) /* Client doesn't care */
! index = find_default_index (conf,
! var_recs[i].content_language);
! else /* Client has Accept-Language */
! index = find_lang_index (neg->accept_langs,
var_recs[i].content_language);

var_recs[i].lang_index = index;
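
Outside the diff context, the corrected loop boils down to the sketch below. It is a simplified, self-contained stand-in, not the patched mod_negotiation.c:

- `struct variant` and `index_in` are hypothetical.
- `index_in` plays the role of both `find_lang_index` and `find_default_index`.
- Returning -1 for an unlisted (or NULL) language is an assumption, since those functions' bodies are not shown here.

```c
#include <string.h>

/* Simplified stand-in for var_rec (hypothetical). */
struct variant {
    const char *content_language;   /* may be NULL */
    double quality;
    int lang_index;
};

/* Position of lang in list (0 = most preferred); -1 if absent or NULL.
 * The NULL check avoids comparing against a missing language. */
static int index_in(const char *const list[], int n, const char *lang)
{
    if (lang == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        if (strcmp(list[i], lang) == 0)
            return i;
    return -1;
}

/* Only variants with quality > 0 are ranked; the ranking list is the
 * LanguagePriority order when the client sent no Accept-Language, and
 * the client's own list otherwise. */
void set_lang_indexes(struct variant *v, int nvars,
                      const char *const accept[], int naccept,
                      const char *const priority[], int npriority)
{
    for (int i = 0; i < nvars; i++)
        if (v[i].quality > 0)
            v[i].lang_index = (naccept == 0)
                ? index_in(priority, npriority, v[i].content_language)
                : index_in(accept, naccept, v[i].content_language);
}
```

The fix shows up in the control flow: there is no early return any more. The no-Accept-Language case now goes through the same `quality > 0` loop as the other case.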
Re: Patches to handle content-language
> a propitious palliative for the perplexities that would prevail were
> a .pl suffix to be presumptively prescribed for the present purpose).

> rst


she sells sea s^H^H^H^H^H^H^H^H^H^H^H

Just be wary of relying on improbabilities to keep things working...
if you give people the tools to break the system, they invariably end
up using them, by which time it's too late to replace the tools with
something foolproof.


rob
Re: Patches to handle content-language
>Why not use the locale naming scheme (there's a standard for this somewhere),
>e.g. from FreeBSD's locale directory,
>
>da_DK.ISO8859-1@ fi_FI.ISO8859-1@ lt_LN.ISO8859-1/
>de_AT.ISO8859-1@ fr_BE.ISO8859-1@ nl_BE.ISO8859-1@
>de_CH.ISO8859-1@ fr_CA.ISO8859-1@ nl_NL.ISO8859-1@
>de_DE.ISO8859-1@ fr_CH.ISO8859-1@ no_NO.ISO8859-1@
>en_AU.ISO8859-1@ fr_FR.ISO8859-1@ pt_PT.ISO8859-1@
>en_CA.ISO8859-1@ is_IS.ISO8859-1@ ru_SU.CP866/
>en_GB.ISO8859-1@ it_CH.ISO8859-1@ ru_SU.KOI8-R/
>en_US.ISO8859-1@ it_IT.ISO8859-1@ sv_SE.ISO8859-1@
>es_ES.ISO8859-1@ ja_JP.EUC/
>
>Obviously you just use the country code as in
>
>basename.fr.fr_CA.gz
>
>which is a French gzipped framemaker document. Don't ask me what these
>codes stand for, I know the en_* ones of course, en_GB is British
>English, en_US is US English etc.

The first two characters are the ISO 639:1988 "Code for the representation
of names of languages". The underscore is a common Unix convention for
locale names. The next two characters are the ISO 3166 alpha-2 (A2)
country code. After the "." is the character encoding (known to HTTP
as charset="iso-8859-1").

Note, however, that the WWW will be using the language tags defined
by RFC 1766 (similar, but not quite the same).
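Going from a locale name like those in the listing to an RFC 1766-style tag mostly means two steps. The underscore becomes a hyphen (as in en-US), and the charset is dropped, since HTTP carries it separately. This is a minimal sketch with a hypothetical function name. It does not check that the codes are valid ISO 639 or ISO 3166 codes.

```c
#include <string.h>

/* Convert a locale name such as "fr_CA.ISO8859-1" into an RFC 1766-style
 * tag such as "fr-CA", dropping everything from the '.' (the charset).
 * Returns 1 on success, 0 if the tag would not fit in `out`. */
int locale_to_language_tag(const char *locale, char *out, size_t outsize)
{
    size_t n = strcspn(locale, ".");    /* length before the charset */

    if (n >= outsize)
        return 0;
    for (size_t i = 0; i < n; i++)
        out[i] = (locale[i] == '_') ? '-' : locale[i];
    out[n] = '\0';
    return 1;
}
```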


....Roy T. Fielding Department of ICS, University of California, Irvine USA
Visiting Scholar, MIT/LCS + World-Wide Web Consortium
(fielding@w3.org) (fielding@ics.uci.edu)
Re: Patches to handle content-language
> There clearly is potential trouble here, and it would be good if our
> docs mentioned it, but there is a way around it (just don't assign
> ambiguous suffixes --- there's nothing that forces you into it),

except a CD full of framemaker documents with .fr extensions ? which
you can't change.

> I say ship it ;-).

as long as nobody has a better solution.


rob
Re: Patches to handle content-language
> except a CD full of framemaker documents with .fr extensions ? which
> you can't change.
>
>
> That's why I asked why *.fr was a genuine problem --- I think Brian said
> he made it up.

Well the .pl Poland/perl problem could exist.



Someone will hit this sooner or later, the question is, do we care ?
I don't, I'm just playing devil's advocate.

rob
Re: Patches to handle content-language
From: Rob Hartill <hartill@ooo.lanl.gov>
Date: Tue, 18 Jul 95 13:41:43 MDT

> .au = Australia = audio file
> .pl = Poland = perl script
> .ph = Philippines = perl header file
> .cf = Central African Rep. = config file
>
> I'm sure there are lots more.

FWIW, those are conflicts between country codes (and therefore
potential HTTP language codes) and common file extensions, but they
don't necessarily pose a problem for Florent's code --- Poles could
do something like

AddLanguage pl .po

to mark text files as Polish while avoiding the ambiguity with Perl
scripts. (I expect that any prospective Polish Perl hacker would find
use of, say, .po instead of .pl to be a propitious palliative for the
perplexities that would prevail were a .pl suffix to be presumptively
prescribed for the present purpose).
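In table form, this workaround keeps the two namespaces apart. The sketch below uses a hypothetical per-directory suffix table. The rules, including the application/x-perl type for .pl, are illustrative only; a real server would build them from AddType, AddLanguage and AddEncoding.

```c
#include <stddef.h>
#include <string.h>

enum suffix_ns { NS_TYPE, NS_LANGUAGE, NS_ENCODING };

struct suffix_rule {
    const char *suffix;     /* without the leading dot */
    enum suffix_ns ns;      /* which namespace the suffix belongs to */
    const char *value;      /* MIME type, language tag or encoding */
};

/* Hypothetical rules: Polish documents use .po, leaving .pl for Perl. */
static const struct suffix_rule rules[] = {
    { "pl",   NS_TYPE,     "application/x-perl" },
    { "po",   NS_LANGUAGE, "pl" },
    { "html", NS_TYPE,     "text/html" },
    { "gz",   NS_ENCODING, "x-gzip" },
};

/* Return the rule for `suffix`, or NULL if the suffix is unknown. */
const struct suffix_rule *lookup_suffix(const char *suffix)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (strcmp(rules[i].suffix, suffix) == 0)
            return &rules[i];
    return NULL;
}
```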

rst
Re: Patches to handle content-language
From: Rob Hartill <hartill@ooo.lanl.gov>
Date: Tue, 18 Jul 95 15:19:19 MDT

> > a propitious palliative for the perplexities that would prevail were
> > a .pl suffix to be presumptively prescribed for the present purpose).
>
> > rst
>
> she sells sea s^H^H^H^H^H^H^H^H^H^H^H
>
> Just be wary of relying on improbabilities to keep things working...
> if you give people the tools to break the system, they invariably end
> up using them, by which time it's too late to replace the tools with
> something foolproof.

Well, if people want a foolproof tool which lets them spell out
everything in detail, they've got it (*.var files). However, there
are plenty of people who want the server to make a best guess based
on suffixes, and a lot of them are using the CERN server to do exactly
that, potential ambiguities and all, and they do want Apache to do the
same thing.

There clearly is potential trouble here, and it would be good if our
docs mentioned it, but there is a way around it (just don't assign
ambiguous suffixes --- there's nothing that forces you into it), and
the potential for screwups here isn't much worse than the potential
for abuse of existing features (conflicts between content-encodings
and content-types in the extension namespace).
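The "don't assign ambiguous suffixes" rule is easy to check mechanically. This sketch assumes the directive/suffix pairs have already been read from the configuration (parsing not shown). `struct mapping` and `report_conflicts` are hypothetical. A suffix mapped twice by the same directive is treated as an override, not a conflict.

```c
#include <stdio.h>
#include <string.h>

struct mapping {
    const char *directive;  /* e.g. "AddType", "AddLanguage", "AddEncoding" */
    const char *suffix;     /* without the leading dot */
};

/* Print every suffix claimed by two different directives (for instance
 * both as a language and as a type) and return how many such pairs
 * were found. */
int report_conflicts(const struct mapping *m, int n)
{
    int conflicts = 0;

    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (strcmp(m[i].suffix, m[j].suffix) == 0
                && strcmp(m[i].directive, m[j].directive) != 0) {
                printf(".%s: %s vs %s\n", m[i].suffix,
                       m[i].directive, m[j].directive);
                conflicts++;
            }
    return conflicts;
}
```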

I say ship it ;-).

rst
Re: Patches to handle content-language
> except a CD full of framemaker documents with .fr extensions ? which
> you can't change.


That's why I asked why *.fr was a genuine problem --- I think Brian said
he made it up.

(I think the suffix I've actually seen for the purpose was .fm, but it's
been long enough that I've been anyplace which seriously used framemaker
that I'm not sure I trust my memory on the issue...).

rst
Re: Patches to handle content-language
Ooops... that's why I asked *if* *.fr was a real problem...

rst
Re: Patches to handle content-language
I think .po is an adequate solution for the Poles --- we just have to
be sure they're aware of it...

rst