Mailing List Archive

YAS
another suggestion....

Have httpd parse ALIWEB index files, and return formatted output.

I've got a perl script which does this, but it might be useful to
build it into Apache.

Someone would hit an ALIWEB MIME type; if there were arguments, then
it would look for those as ALIWEB keywords; if there were no args,
then it would return a form where the keywords could be typed in.

-=-=-=-=

Aliweb looks for "/site.idx", which contains things like:

Template-Type: ORGANIZATION
Organization-Name: Department Of Computing Mathematics, UWCC.
URI: /Places/comma.html
Description: The department for computer science in the University of Wales College of Cardiff. Wales. UK
Keywords: COMMA, UWCC, Cardiff, Wales, computer science, computing mathematics

Template-Type: SERVICE
Name: The rec.arts.movies database
URI: /Movies/index.html
Description: An interface to the rec.arts.movies database of movie facts
Keywords: movies, film, cinema, reviews

Template-Type: DOCUMENT
Name: Centre for High Performance Computing - Cardiff
URI: /Hpc/index.html
Description: Promotion and support of High Performance Computing
Keywords: high performance computing, JISC
Re: YAS [ In reply to ]
From: Rob Hartill <hartill@ooo.lanl.gov>
Date: Mon, 10 Apr 95 10:37:48 MDT

Have httpd parse ALIWEB index files, and return formatted output.

... um... why not just parse the thing once, when it's built, and
serve the output as an ordinary file?

I've got a perl script which does this, but it might be useful to
build it into Apache.

There's a tradeoff here --- putting something like this into the
server itself makes it run faster, but at the same time it also
complicates the server code itself. For, say, imagemap, this may be
worth the tradeoff, but I'm not sure that prettyprinting ALIWEB files
comes up often enough that it can't run as effectively as it needs to
as a script.

(One thing we might want to do about these sorts of "YAS" things that
seem to be coming up is to define a binary interface, along the lines
of Simon's BGI or the NetSite internal APIs, and write these things as
examples of that. Brian's suggestion for trying a case-insensitive
file match as recovery from a 404 could be done, for instance, by
setting ErrorDocument to the effective URL of such an internally loaded
module --- perhaps even to a script, if it doesn't come up much).

rst
Re: indexing suggestion [ In reply to ]
> From: Rob Hartill <hartill@ooo.lanl.gov>
> Date: Mon, 10 Apr 95 10:37:48 MDT
>
> Have httpd parse ALIWEB index files, and return formatted output.
>
> ... um... why not just parse the thing once, when it's built, and
> serve the output as an ordinary file?

I'm suggesting we have the server search the index file when it
is requested with arguments. Without arguments it prompts for some.

So I could hit some site with a link to their index, and be able to
type in a keyword. It would then give me a formatted list of pointers
to what I probably wanted.

Now it may be that ALIWEB doesn't have the ideal syntax for this, but
maybe we can define some kind of local index file that better suits this
idea.

The index files, being of a special MIME type, could be placed in
lots of directories, so that the index will be specific to that
region of the server. A top level index could point directly to
resources or to lower level indices.

A simple approach could be to have a format such as

#comment
URL
keywords
description


e.g.
#let's index my new game
/Robs_junk/new/game.html
game,entertainment,hangman,fun
A www version of the classic hangman game
# i stole the hangman code from Fred's site
http://fred.com/cgi-bin/hangman
game,entertainment,hangman,fun,fred
The original version which I based <A HREF="/Robs_junk/new/game.html">my hangman game on</A>



HTML doesn't give a damn about \r\n, so the syntax could just be one
field per line, with an unlimited line length.
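As a rough illustration, the format above takes only a few lines to parse and search. This is a hypothetical Python sketch of the idea, with invented function names; it is not the actual patch code (the scripts discussed in this thread were Perl).

```python
# Hypothetical sketch of the proposed three-line record format:
# URL / comma-separated keywords / description, "#" lines are comments.

def parse_index(text):
    """Group non-comment, non-blank lines into three-line records."""
    lines = [l for l in text.splitlines()
             if l.strip() and not l.startswith("#")]
    records = []
    for i in range(0, len(lines) - 2, 3):
        records.append({
            "url": lines[i],
            "keywords": [k.strip().lower() for k in lines[i + 1].split(",")],
            "description": lines[i + 2],
        })
    return records

def search(records, query):
    """Simple case-insensitive comparison on the comma-separated keywords."""
    q = query.strip().lower()
    return [r for r in records if q in r["keywords"]]

index = """\
#let's index my new game
/Robs_junk/new/game.html
game,entertainment,hangman,fun
A www version of the classic hangman game
"""

print([r["url"] for r in search(parse_index(index), "Hangman")])
# -> ['/Robs_junk/new/game.html']
```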


Robots could be encouraged to request the source of the index, so that
they do a proper job of indexing the web.

thoughts ?
Re: indexing suggestion [ In reply to ]
From: Rob Hartill <hartill@ooo.lanl.gov>
Date: Tue, 11 Apr 95 17:41:49 MDT

I'm suggesting we have the server search the index file when it
is requested with arguments. Without arguments it prompts for some.

So I could hit some site with a link to their index, and be able to
type in a keyword. It would then give me a formatted list of pointers
to what I probably wanted.

Now it may be that ALIWEB doesn't have the ideal syntax for this, but
maybe we can define some kind of local index file that better suits this
idea.

Last time I checked, Martijn was actually giving out code for a search
engine that worked on ALIWEB template files... it probably would be a good
addition to the cgi-bin of our distribution, but I'm still not sure I
see the point of integrating the code into the server itself.

rst
Re: indexing suggestion [ In reply to ]
> Last time I checked, Martijn was actually giving out code for a search
> engine that worked on ALIWEB template files... it probably would be a good
> addition to the cgi-bin of our distribution, but I'm still not sure I
> see the point of integrating the code into the server itself.

Okay, forget ALIWEB, but think along those lines.

For the cost of a few string compares, we can allow people to set
up index files in any directory - not a plain list of pointers, we're
talking about a database of URLs, keywords and descriptions which are
searched by httpd.

By hitting the index file URL for a directory, I could

1) ask for a form (no arguments given with the URL)
2) query the index (any arguments given)
3) view the index source (special argument given)


Andy suggested WAIS and glimpse. This is something different -
the resource owners decide what goes into the index, and how it
is described (the ALIWEB approach). The index files will typically
be small.

A simple format such as the one I gave earlier will be easy to
parse. Searching will be performed on the keywords only - just
do simple case insensitive string comparisons on comma separated
keywords.

Now, because this is a special MIME type (maybe call it httpd/index)
it's really inexpensive to check for it. If you see it, you jump to
some new code.


Integrating this simple idea into the server will mean that everyone
will be able to index their stuff correctly without any CGI privileges.

The indices will be hierarchical,

e.g.

/foo.indx could index lots of info about the site, as well
as point to other index files deeper in the URL file system.
A keyword of "games" could point you to the dedicated games index.
The webmaster wouldn't have to worry about indexing his users'
resources.


If all of this was expensive, I too would have my doubts, and would
suggest it be CGI'ed. But it's so easy to bolt on to the existing code,
and by being based on simple format and searching principles, it shouldn't
have an impact on server performance.


w.r.t robots, they could ask for the raw index file and use that to
build an ALIWEB style of index - one which is far superior to existing
"grab everything and guess" robot indexing techniques.


Someone told me the other day that it's pointless just saying "it's easy".
Implement it and show people how easy it was. I will have a crack at it
today.


robh
Re: indexing suggestion [ In reply to ]
From: Rob Hartill <hartill@ooo.lanl.gov>
Date: Wed, 12 Apr 95 10:25:54 MDT

> Last time I checked, Martijn was actually giving out code for a search
> engine that worked on ALIWEB template files... it probably would be a good
> addition to the cgi-bin of our distribution, but I'm still not sure I
> see the point of integrating the code into the server itself.

Okay, forget ALIWEB, but think along those lines.

For the cost of a few string compares, we can allow people to set
up index files in any directory - not a plain list of pointers, we're
talking about a database of URLs, keywords and descriptions which are
searched by httpd.

This can still be done perfectly well using a script sitting in
/cgi-bin, which finds the location of the per-directory index file
from PATH_TRANSLATED --- in fact, that's what that CGI variable
is there for in the first place. It works fine for imagemap, and it
would work fine for a search script as well.
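A minimal sketch of that script-based approach, assuming the server has mapped the extra path on the /cgi-bin/idx URL into PATH_TRANSLATED (Python here for illustration with invented names; the real scripts in this thread were Perl, and this is not the actual E68 indexer):

```python
# Hypothetical sketch of the CGI approach: the server translates the
# extra path on /cgi-bin/idx/some/dir/site.idx into a filesystem path
# and passes it in PATH_TRANSLATED; the script just opens that file.
import os

def locate_index(environ):
    """Return the contents of the per-directory index file named by
    PATH_TRANSLATED, or None if it is missing."""
    path = environ.get("PATH_TRANSLATED")
    if not path or not os.path.isfile(path):
        return None
    with open(path) as f:
        return f.read()
```

This mirrors the one-line `$ENV{'PATH_TRANSLATED'}` substitution described later in the thread: the script itself stays generic, and the server's path translation does the per-directory lookup.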

As I think I've said before, my belief is that stuff which can be
reasonably implemented as CGI scripts should be, just to keep creeping
featuritis out of the server itself. I'm afraid you still haven't
made a convincing case for an exception here --- the server would be
doing nothing which a CGI script couldn't do just as well, and I'm not
convinced the efficiency gain is worth the complication.

If you've tried it with a script-based approach, and found something
which simply cannot be done, or cannot be done effectively, without
some kind of hooks into the server, I might feel differently. But as
it is, I'm not at all convinced.

rst
Re: indexing suggestion [ In reply to ]
> Someone told me the other day that it's pointless just saying "it's easy".
> Implement it and show people how easy it was. I will have a crack at it
> today.

This is now Patch E68.

It's in the incoming directory. Will someone move it into position
for me, please?

incoming/E68_simple_indexer.txt.v2

It works, it's unintrusive, quick and it could be popular.

robh
Re: indexing suggestion [ In reply to ]
To see it working...
http://ooo.lanl.gov/try.indx
Re: indexing suggestion [ In reply to ]
> incoming/E68_simple_indexer.txt.v2

v3 fixes a typo in the HTML that was being output, and correctly
logs the number of bytes sent.
Re: indexing suggestion [ In reply to ]
> Andy suggested WAIS and glimpse. This is something different -
> the resource owners decide what goes into the index, and how it
> is described (the ALIWEB approach). The index files will typically
> be small

WAIS (definitely - not sure about glimpse) can index anything. The ideal
solution (if you used WAIS) would be for authors to decide what they wanted
to appear in the robsownformat.idx files in each of their directories. WAIS
would then index the robsownformat.idx files - *not* the entire *.html space.
Authors can add their own keywords etc, etc, etc.

Authors would still get the final say about what was searchable - it would
*NOT* be an indexing of ALL the files that the server (or server admin)
knew about.

> If all of this was expensive, I too would have my doubts, and would
> suggest it be CGI'ed. But it's so easy to bolt on to the existing code,
> and by being based on simple format and searching principles, it shouldn't
> have an impact on server performance.
>
>
> w.r.t robots, they could ask for the raw index file and use that to
> build an ALIWEB style of index - one which is far superior to existing
> "grab everything and guess" robot indexing techniques.
>
>
> Someone told me the other day that it's pointless just saying "it's easy".
> Implement it and show people how easy it was. I will have a crack at it
> today.

It's easy. The server's already doing most of the hard work - directory
hopping, looking for files etc. The point is: do you want Apache to
hardcode a preference for any given .idx format, when a sexy Perl script
and a decent Makefile (yeah, with WAIS or whatever) can do the same thing?

> robh

[I've done the WAIS thing already, Rob, but go for it anyhow]

Ay.
Re: indexing suggestion [ In reply to ]
Last time, Rob Hartill uttered the following other thing:
>
>
>
> To see it working...
> http://ooo.lanl.gov/try.indx

I noticed that you have:
PATH=.:/bin:/usr/local/bin:/usr/bin:/users/hartill/bin:/usr/bin/X11:/etc

You really shouldn't have the . path first in the list. Besides being
bad practice from a security stand point, we found on hoohoo that our
uptime script did a `which uptime` first, and found itself, called itself
in a loop, and did a fair job of crashing the machine (not quite, but
damn hard to find, esp. when we thought it was a bug in the server).

Brandon

--
Brandon Long (N9WUC) "I think, therefore, I am confused." -- RAW
Computer Engineering Run Linux 1.1.xxx It's that Easy.
University of Illinois blong@uiuc.edu http://www.uiuc.edu/ph/www/blong
Don't worry, these aren't even my views.
Re: indexing suggestion [ In reply to ]
Grumble, grumble. I see that indexing in the server isn't likely to
get past the voting stage, even though for other things like imagemaps
and content negotiation, which are equally valid candidates for CGI, it
is considered favorable to have them inside rather than out.

On the assumption that my proposal is heading for a veto, I'd
at least like to see changes to the counter proposal.

> http://www.ai.mit.edu/cgi-bin/idx/site.idx

I'd like to see Apache act on the MIME type mapping for .idx
instead of having "/cgi-bin/idx/" prefixes to URLs.

In the long term, this would make it more flexible, in that
one could immediately change the characteristics of all .idx/indx
URLs without changing or redirecting the well established URLs.

As it stands, all Rob T is proposing is an idea which can already
be implemented in 1.3 (I've had such a system running at Cardiff
for well over a year). That's not to say that old is bad, but
anyone can use this method on top of an Apache-built-in system if
they wanted to anyway.

The advantages of the original proposal haven't gone away, it'll
be faster (no fork ultimately) and it's based on a much simpler
(more restrictive you might argue) syntax.

On the busy Cardiff server we find that under heavy traffic, the
first services to melt are the ones based on perl cgi. And they
have a habit of dragging the rest of the system down with them.

> If it's popular we can think about moving it into the server proper,
> when that is clearly appropriate. Right now, I don't think it is.

I think it'd become more popular if it were in the server from the
outset. The cgi approach has been there for people to use for
over a year, it simply didn't catch on. We can always extend the
syntax to meet changing needs at a later date.

It's open to a vote.

robh
Re: indexing suggestion [ In reply to ]
> I noticed that you have :
> PATH=.:/bin:/usr/local/bin:/usr/bin:/users/hartill/bin:/usr/bin/X11:/etc
>
> You really shouldn't have the . path first in the list. Besides being
> bad practice from a security stand point, we found on hoohoo that our
> uptime script did a `which uptime` first, and found itself, called itself
> in a loop, and did a fair job of crashing the machine (not quite, but
> damn hard to find, esp. when we thought it was a bug in the server).

I noticed that with some NCSA bundled scripts.

It shouldn't be picking up my path for this anyway - will have to
fix that.

I'll move my "." elsewhere.

cheers,
rob
Re: indexing suggestion [ In reply to ]
From: Rob Hartill <hartill@ooo.lanl.gov>
Date: Wed, 12 Apr 95 15:56:36 MDT
> Someone told me the other day that it's pointless just saying "it's easy".
> Implement it and show people how easy it was. I will have a crack at it
> today.

As an argument for my counterproposal of doing it as a CGI script, I
have implemented that --- it took me all of 30 seconds to replace the

$indexfilename = '/index/path/here';

with

$indexfilename = $ENV{'PATH_TRANSLATED'};

at the top of the file. The resulting script is in
E68_simple_indexer.pl in the patches/for_Apache_0.5.1 directory on
hyperreal; you can try it out with

http://www.ai.mit.edu/cgi-bin/idx/site.idx

This searches the index which can be retrieved directly at

http://www.ai.mit.edu/site.idx

This is not the world's greatest site index (it's maintained by a
script which looks for <META> fields in documents, which I regretfully
consider an experiment that failed), but it does demonstrate that the
basic functionality works, assuming that someone has a *.idx file
which has more useful information. Anyone who's capable of managing
the new imagemap script can easily manage this as well.

Rob says of his thing:

It works, it's unintrusive, quick and it could be popular.

Mine also works, it's a great deal less intrusive (wild pointers or
memory leaks in the code can't compromise the integrity of even a
non-forking server), it's quick, it can be easily modified to suit
local conditions by webmasters who don't want to mess with the server
code itself, it doesn't commit the server code to any particular index
format, and it can be replaced at will with any of the more capable
search engines that are widely available.

If it's popular we can think about moving it into the server proper,
when that is clearly appropriate. Right now, I don't think it is.

rst
Re: indexing suggestion [ In reply to ]
From: rst@ai.mit.edu (Robert S. Thau)
Date: Thu, 13 Apr 95 09:54:51 EDT

Mine also works, it's a great deal less intrusive (wild pointers or
memory leaks in the code can't compromise the integrity of even a
non-forking server), it's quick, it can be easily modified to suit
local conditions by webmasters who don't want to mess with the server
code itself, it doesn't commit the server code to any particular index
format, and it can be replaced at will with any of the more capable
search engines that are widely available.

...but wait! There's more! It parses the full Aliweb IANA template
syntax, which means that people who already have Aliweb site indexes
can use them directly! And the format has other advantages... in
addition to trivia like having more mnemonic field names, it supports
multi-line descriptions and keyword lists (using the usual RFC822
continuation syntax)!
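For illustration, parsing blank-line-separated templates with RFC822-style continuation lines might look like the following Python sketch (invented function name, not rst's actual script); field names follow the site.idx excerpt at the top of the thread:

```python
# Hypothetical sketch of parsing Aliweb/IANA-style templates.
# A blank line ends a record; a line starting with whitespace is an
# RFC822-style continuation of the previous field.

def parse_templates(text):
    records, current, field = [], {}, None
    for line in text.splitlines():
        if not line.strip():               # blank line ends a record
            if current:
                records.append(current)
            current, field = {}, None
        elif line[0].isspace() and field:  # continuation line
            current[field] += " " + line.strip()
        elif ":" in line:
            field, value = line.split(":", 1)
            field = field.strip()
            current[field] = value.strip()
    if current:
        records.append(current)
    return records

sample = """Template-Type: SERVICE
Name: The rec.arts.movies database
Keywords: movies, film,
 cinema, reviews
"""
print(parse_templates(sample)[0]["Keywords"])
# -> movies, film, cinema, reviews
```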

It slices! It dices! It chops! And if you order now, you get a free
turnip twaddler!

rst
Re: indexing suggestion [ In reply to ]
Date: Thu, 13 Apr 95 16:07 BST
From: drtr@ast.cam.ac.uk (David Robinson)
Precedence: bulk
Reply-To: new-httpd@hyperreal.com

>I'd like to see Apache act on the MIME type mapping for .idx
>instead of having "/cgi-bin/idx/" prefixes to URLs.

So how about a MIME type for running a CGI script, e.g.
AddType application/x-script-parsed .idx /cgi-bin/idx

The specified script would be called with the document URL as PATH_INFO,
and the document file as PATH_TRANSLATED.

I *really* like this idea... here are a few possible improvements.
Instead of overloading PATH_INFO, give the script the document URL as
DOCUMENT_URI, as is currently done for server-side includes. This is
a bit more consistent with the includes functionality, and it also
lets the script get at the real PATH_INFO, if any was supplied.

Also, instead of having a three-argument AddType directive, it might
be better to have a separate AddHandler directive --- this would allow
users to easily declare handlers for a MIME type with multiple
suffixes, or for their DefaultType, without having to repeat the name
of the handler several times, e.g.

AddHandler text/plain /cgi-bin/format_setext_stuff

Finally, substitute directory indexing routines could be declared as
handlers for an appropriately chosen MIME type, say

AddHandler application/x-unix-directory /cgi-bin/read-4dos-indexes

It's a little late to get this in for this week's vote, but I'll
probably implement it over the weekend, as specified above, if no one
has any strong objections.

rst
Re: indexing suggestion [ In reply to ]
On Thu, 13 Apr 1995, Robert S. Thau wrote:
> It slices! It dices! It chops! And if you order now, you get a free
> turnip twaddler!

I'm not so sure I want my turnips twaddled, thanks.

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: indexing suggestion [ In reply to ]
>I'd like to see Apache act on the MIME type mapping for .idx
>instead of having "/cgi-bin/idx/" prefixes to URLs.

So how about a MIME type for running a CGI script, e.g.
AddType application/x-script-parsed .idx /cgi-bin/idx

The specified script would be called with the document URL as PATH_INFO,
and the document file as PATH_TRANSLATED.

This could be an alternative way of having a script generate an index for
a directory, e.g.
AddType application/x-script-parsed .4dos /cgi-bin/index
DirectoryIndex index.html index.4dos

which would cause the cgi script to be run if there were a 4dos-style
description file in the directory.

David.
Re: indexing suggestion [ In reply to ]
On Thu, 13 Apr 1995, Robert S. Thau wrote:
> Date: Thu, 13 Apr 95 16:07 BST
> From: drtr@ast.cam.ac.uk (David Robinson)
> Precedence: bulk
> Reply-To: new-httpd@hyperreal.com
>
> >I'd like to see Apache act on the MIME type mapping for .idx
> >instead of having "/cgi-bin/idx/" prefixes to URLs.
>
> So how about a MIME type for running a CGI script, e.g.
> AddType application/x-script-parsed .idx /cgi-bin/idx

Actually, this is sorta the same as, for example, putting

#!/www/cgi-bin/idx

at the top of an .idx file and making .idx a recognized CGI script type,
yes? I use this kind of mechanism for imagemaps, where I put a

#!/usr/local/bin/imagemap-new

at the top of map files - imagemap-new is a slightly modified version
which understands being called this way.

However, comparing this to something like

AddType application/x-script-parsed .imap /cgi-bin/imagemap

I think the latter is better (certainly more general).

> Also, instead of having a three-argument AddType directive, it might
> be better to have a separate AddHandler directive --- this would allow
> users to easily declare handlers for a MIME type with multiple
> suffixes, or for their DefaultType, without having to repeat the name
> of the handler several times, e.g.
>
> AddHandler text/plain /cgi-bin/format_setext_stuff
>
> Finally, substitute directory indexing routines could be declared as
> handlers for an appropriately chosen MIME type, say
>
> AddHandler application/x-unix-directory /cgi-bin/read-4dos-indexes

The first is fine - the WN server allows one to specify a particular
"filter" to be applied to URL objects, so this is similar. This also
means we don't have to create a new bogus MIME type, which is the reason
why I don't like the second AddHandler example. Rob McCool, is there a
public specification of the NetSite server API anywhere? It seems
like there must be a more general way of modularizing server capabilities
than defining new bogus MIME types.

Brian
Re: indexing suggestion [ In reply to ]
>
> Re PATH_INFO; if /dir/file.ext is a regular (unix) file, then accessing
> /dir/file.ext/path_info will fail.
>
> Not currently --- the PATH_INFO is simply ignored in this case. I
> personally see no compelling reason to change this, although as we all
> will recall, Rob H. vehemently disagrees. However, I do think that
> PATH_INFO should clearly be allowed anywhere that a CGI script might
> get into the mix.

I can't follow this discussion, perhaps because I just got out
of bed, so I'm lost as to what I vehemently disagreed to here.
Please remind me.


robh
Re: indexing suggestion [ In reply to ]
Date: Sun, 16 Apr 95 16:48 BST
From: drtr@ast.cam.ac.uk (David Robinson)

Yes, probably better, although DOCUMENT_URI isn't part of the CGI spec.
Currently you can only have PATH_INFO for server-side includes or CGI scripts.
(See below.)

Actually, when a script is invoked via <!--#exec cgi--> from a
server-side-includes document, it gets both PATH_INFO *and*
DOCUMENT_URI set --- see

http://www.ai.mit.edu/xperimental/foo.shtml/path/info?query+string

and look at the results of /cgi-bin/printenv which are included at the
bottom. Note also that at least in this case, you actually do need to
set PATH_INFO to something *different* from the DOCUMENT_URI, or lose
useful information about the actual request.

I think this would be amazingly useful. For example, the patchlog database
runs each patch file through an html converter for sending to the user.
So you read a patch with a URL like
http://host/dir/bugread.cgi?id=00001
Currently, this could be slightly better if the id was passed in the path
info as http://host/dir/bugread.cgi/00001

Whereas with your suggestion, I would be able to present the bug files as
http://host/dir/bugs/00001
and httpd would automatically run the cgi script to format the file.

That way, the index produced by http://host/dir/bugs/ would have links which
would return the formatted documents, rather than the plain files.

You'd need per-directory DefaultTypes (another potentially useful
extension, though some care would be needed in implementation in the
non-forking case), or an extension on the names of the buglog files
themselves, but that is the general idea.

Re PATH_INFO; if /dir/file.ext is a regular (unix) file, then accessing
/dir/file.ext/path_info will fail.

Not currently --- the PATH_INFO is simply ignored in this case. I
personally see no compelling reason to change this, although as we all
will recall, Rob H. vehemently disagrees. However, I do think that
PATH_INFO should clearly be allowed anywhere that a CGI script might
get into the mix.

And how about using this mechanism for content-type based translation?
Suppose the request specifies acceptance of image/gif files, but not
image/jpeg, and a jpeg version does not exist. Then we might have
AddTranslator image/jpeg image/gif /cgi-bin/jpeg2gif
Obviously, this could be done with an AddHandler CGI script which
checked the Accept headers.

It could --- I suspect that if anyone started making serious use of
such a feature, they'd want it inside the server for performance
reasons, but AddHandler would at least allow it to be prototyped
more easily.
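As a rough sketch of how such a prototyped handler might decide, from the request's Accept header, whether translation is needed (hypothetical Python with invented names; the conversion step itself is left abstract):

```python
# Hypothetical sketch of the Accept-header check an AddTranslator-style
# CGI handler could make: serve image/jpeg if the client accepts it,
# otherwise fall back to a converted image/gif.

def choose_type(accept_header):
    """Pick image/jpeg when the client accepts it, else image/gif."""
    accepted = [t.split(";")[0].strip().lower()
                for t in accept_header.split(",")]
    if {"image/jpeg", "image/*", "*/*"} & set(accepted):
        return "image/jpeg"
    return "image/gif"

print(choose_type("image/gif, image/x-xbitmap"))  # -> image/gif
print(choose_type("image/gif, image/jpeg, */*"))  # -> image/jpeg
```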

>Finally, substitute directory indexing routines could be declared as
>handlers for an appropriately chosen MIME type, say
>
> AddHandler application/x-unix-directory /cgi-bin/read-4dos-indexes

How would this interact with the multiple DirectoryIndex config?

The Handler would only come into play if *none* of the files named in
the DirectoryIndex directive are found in that directory (including
MultiViews searches, if MultiViews is on).

NB I'm starting to code this now; it doesn't look hard...

rst
Re: indexing suggestion [ In reply to ]
Rst wrote:
> From: drtr@ast.cam.ac.uk (David Robinson)
>
> >I'd like to see Apache act on the MIME type mapping for .idx
> >instead of having "/cgi-bin/idx/" prefixes to URLs.
>
> So how about a MIME type for running a CGI script, e.g.
> AddType application/x-script-parsed .idx /cgi-bin/idx
>
> The specified script would be called with the document URL as PATH_INFO,
> and the document file as PATH_TRANSLATED.

>I *really* like this idea... here are a few possible improvements.
>Instead of overloading PATH_INFO, give the script the document URL as
>DOCUMENT_URI, as is currently done for server-side includes. This is
>a bit more consistent with the includes functionality, and it also
>lets the script get at the real PATH_INFO, if any was supplied.

Yes, probably better, although DOCUMENT_URI isn't part of the CGI spec.
Currently you can only have PATH_INFO for server-side includes or CGI scripts.
(See below.)

>Also, instead of having a three-argument AddType directive, it might
>be better to have a separate AddHandler directive --- this would allow
>users to easily declare handlers for a MIME type with multiple
>suffixes, or for their DefaultType, without having to repeat the name
>of the handler several times, e.g.
>
> AddHandler text/plain /cgi-bin/format_setext_stuff

I think this would be amazingly useful. For example, the patchlog database
runs each patch file through an html converter for sending to the user.
So you read a patch with a URL like
http://host/dir/bugread.cgi?id=00001
Currently, this could be slightly better if the id was passed in the path
info as http://host/dir/bugread.cgi/00001

Whereas with your suggestion, I would be able to present the bug files as
http://host/dir/bugs/00001
and httpd would automatically run the cgi script to format the file.

That way, the index produced by http://host/dir/bugs/ would have links which
would return the formatted documents, rather than the plain files.

Re PATH_INFO; if /dir/file.ext is a regular (unix) file, then accessing
/dir/file.ext/path_info will fail. Should this still apply if file.ext
is subject to a CGI handler? If the use of handlers is as 'output filters',
then httpd should probably reject such a request. However, I imagine there
might be cases where the files don't represent objects to be filtered, so
extra PATH_INFO might be useful.


And how about using this mechanism for content-type based translation?
Suppose the request specifies acceptance of image/gif files, but not
image/jpeg, and a jpeg version does not exist. Then we might have
AddTranslator image/jpeg image/gif /cgi-bin/jpeg2gif
Obviously, this could be done with an AddHandler CGI script which
checked the Accept headers.


>Finally, substitute directory indexing routines could be declared as
>handlers for an appropriately chosen MIME type, say
>
> AddHandler application/x-unix-directory /cgi-bin/read-4dos-indexes

How would this interact with the multiple DirectoryIndex config?

David.
Re: indexing suggestion [ In reply to ]
From: Rob Hartill <hartill@ooo.lanl.gov>
Date: Sun, 16 Apr 95 12:04:32 MDT

>
> Re PATH_INFO; if /dir/file.ext is a regular (unix) file, then accessing
> /dir/file.ext/path_info will fail.
>
> Not currently --- the PATH_INFO is simply ignored in this case. I
> personally see no compelling reason to change this, although as we all
> will recall, Rob H. vehemently disagrees. However, I do think that
> PATH_INFO should clearly be allowed anywhere that a CGI script might
> get into the mix.

I can't follow this discussion, perhaps because I just got out
of bed, so I'm lost as to what I vehemently disagreed to here.
Please remind me.

If I remember right, you were fairly insistent earlier on that
/dir/file.ext/ was "incorrect" and should be bounced with a 404;
current behavior (as I indicated above) is simply to retrieve the file
in this case and ignore the '/' (which is a simple case of the general
/dir/file.ext/path/info).

rst
Re: indexing suggestion (ATTN NCSA: possible 1.4 bug...) [ In reply to ]
Last time, Robert S. Thau uttered the following other thing:
>
> Date: Mon, 17 Apr 95 14:30 BST
> From: drtr@ast.cam.ac.uk (David Robinson)
>
> Rst wrote:
> >...You'd need per-directory DefaultTypes (another potentially useful
> >extension, though some care would be needed in implementation in the
> >non-forking case)...
>
> This is already available; it is a feature of NCSA httpd 1.3.
>
> Hmmm... it's still there in 1.4, and still implemented as a simple
> strcpy() into the default_type variable --- however, I can't see
> anyplace where it saves the srm.conf DefaultType value before
> overwriting it (and hence, it can't restore DefaultType to that value
> before the next transaction).
>
> (That's the subtlety I was alluding to... Rob H., if you've taken over
> the non-forking stuff, I guess this is in your bailiwick).

argh. You know, I'm beginning to understand why Netsite doesn't have
local directory config files, just central ones that define everything.
It makes it a lot easier.

fixed.

> would *clearly* be improper with XBITHACK on --- if the XBIT is set on
> /file.html/, then this is a reference to a server-side-includes file
> with PATH_INFO, and the correct thing is very definitely to process
> the file, and pass along the PATH_INFO to any scripts it happens to
> invoke.
>
> At any rate, this is such a minor issue that I can't see fussing with
> it at all until after beta 1.

1.4 returns the file as html, with PATH_INFO set to /. I'd feel this is the proper
course of action.

Odd, 1.4 returns a 403 on /index.html/a

Brandon

--
Brandon Long (N9WUC) "I think, therefore, I am confused." -- RAW
Computer Engineering Run Linux 1.1.xxx It's that Easy.
University of Illinois blong@uiuc.edu http://www.uiuc.edu/ph/www/blong
Don't worry, these aren't even my views.
Re: indexing suggestion (ATTN NCSA: possible 1.4 bug...) [ In reply to ]
> (That's the subtlety I was alluding to... Rob H., if you've taken over
> the non-forking stuff, I guess this is in your bailiwick).

parse_access_dir() in non-forking code is an unmunge_name()
waiting to happen.


Re: Trailing slash stuff..

> if the XBIT is set on
> /file.html/, then this is a reference to a server-side-includes file
> with PATH_INFO, and the correct thing is very definitely to process
^^^^^^^
> the file, and pass along the PATH_INFO to any scripts it happens to
> invoke.

I'll agree with that, if and only if you can point me to the CGI
documentation which defines this behaviour. If that isn't documented,
I'd agree that index.html/ and index.html/a should return a 404 Not
Found.

Until we get HTTP/1.1, and the ability to add a BASE to the header, I
think it is too dangerous (w.r.t broken relative URLs) to service these
requests. If 1.3 behaves as David described, we won't be breaking
anything by 404'ing them.

rob
