Mailing List Archive

Logging of ErrorDocument responses (was Re: hmmm...)
On Wed, 19 Apr 1995, Rob Hartill wrote:
> Brian responded
> > No, I think for now we should just not log the second response, the "real
> > object" that was returned. For 401 access for example, I know that
> > 401.html will always be returned, so logging it is redundant.
>
> What about the other error/problem redirects, and regular redirects ?
> should we not log those too ?
>
> Is it also redundant to log
> /missing
> when I know that /missing/ is almost sure to follow a second
> later ?

No, because you *aren't* sure. They are two separate processes, two separate
requests. In this situation there is one response and *two* log file
entries, which is completely new and not clearly correct. One request, one
response, one log file entry. We admit that the log file format is way too
sparse, and move towards a general solution for that. We don't fudge it by
making it look like there was a separate request for the internally
redirected object. If this stands it needs to be documented clearly, so
people can modify their current log file tools to account for it.

Whatever, if I'm the only one who thinks this is a problem, I'll shut up,
and I admit it is a little late in the game to bring this up. Can we
have a show of hands? What do those not directly involved in coding this
think?

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: Logging of ErrorDocument responses (was Re: hmmm...) [ In reply to ]
> > Whatever, if I'm the only one who thinks this is a problem, I'll shut up,
> > and I admit it is a little late in the game to bring this up. Can we
> > have a show of hands? What do those not directly involved in coding this
> > think?

The "common" logfile format continues to be a difficult standard to
stick to. It has been suggested that maybe we should create an "Apache"
format with a config parameter to revert to the "common" format.
There are a lot of nicer things that we could do with logging.
The common format is "too terse" and "too huge". IMHO - any change
now should be made with consideration toward doing it how the
group would like to see it work in the future.

.03
Re: Logging of ErrorDocument responses (was Re: hmmm...) [ In reply to ]
Brian says...

> > What about the other error/problem redirects, and regular redirects ?
> > should we not log those too ?
> >
> > Is it also redundant to log
> > /missing
> > when I know that /missing/ is almost sure to follow a second
> > later ?
>
> No, because you *aren't* sure.

You can be 100% sure for the first example I gave.

> They are two separate processes, two separate
> requests. In this situation there is one response and *two* log file
> entries, which is completely new and not clearly correct. One request, one

You say "not clearly correct" which is a long way
from "clearly not correct".

Logging 2 requests (or however many it takes) is not going to break
anything. NCSA 1.4 will do the same I believe.

If someone asks for X, and X tells them (directly or by some error/problem)
to go to Y, then I'd like to see both X and Y logged. Let's see the
complete picture. I might want to know how many times Y is used
in a day, it doesn't have to be redirected to only by X, so I can't
assume that a count of redirected Xs gives me a count of the Ys.

Ignoring Y in the log file is a small price to pay if you dislike them.

> If this stands it needs to be documented clearly, so
> people can modify their current log file tools to account for it.

or not as the case may be. I'll leave mine alone. I see cases where
the redirect amounts to two separate requests, one which may have failed
and one which succeeded.

> Whatever, if I'm the only one who thinks this is a problem, I'll shut up,
> and I admit it is a little late in the game to bring this up. Can we
> have a show of hands? What do those not directly involved in coding this
> think?

my hands are hidden.


robh
Re: Logging of ErrorDocument responses (was Re: hmmm...) [ In reply to ]
On Thu, 20 Apr 1995, Rob Hartill wrote:
> Script X can redirect to URLs A B C D or E
> if someone requests X, do you log X or A-E ?

Unless you have a stateful server the only way X can choose A B C D or E
is from the query string or path info. It could of course choose A or B
based on a random process - but I can't think of any useful applications
of this.

> Internal redirects can be recorded in the logfile with
> a host name of "redirect.internal"

How about a dash or a star in the host name field for redirects?

Mark
Re: Logging of ErrorDocument responses (was Re: hmmm...) [ In reply to ]
Okay, here's another scenario,

Script X can redirect to URLs A B C D or E

if someone requests X, do you log X or A-E ?

X was the actual requested URL, but A-E represents
the important information - what the user wanted.

If you just log X, you lose the most important info,
if you don't log X, you don't even know IF the request
came via X.


How about this as a CLF friendly modification.
Internal redirects can be recorded in the logfile with
a host name of "redirect.internal"

e.g.

ooo.lanl.gov - - [19/Apr/1995:20:09:44 -0600] "GET /re_from HTTP/1.0" 401 -
redirect.internal - - [19/Apr/1995:20:14:44 -0600] "GET /re_to HTTP/1.0" 200 123


And even have a conf setting to toggle them on/off


robh
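
A minimal sketch of how a log analyser might cope with the proposal above,
assuming the server writes internal redirects with a host name of
"redirect.internal" exactly as in the two example lines. The regular
expression and the names parse_clf and external_requests are illustrative
assumptions, not part of any existing tool.

```python
import re

# Common Log Format: host ident user [time] "request" status size
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    m = CLF_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # Assumption: internal redirects carry this literal host name.
    entry["internal"] = entry["host"] == "redirect.internal"
    return entry

def external_requests(lines):
    """Keep only entries that came from a real client request."""
    parsed = (parse_clf(line) for line in lines)
    return [e for e in parsed if e is not None and not e["internal"]]

# The two example lines from the message above.
SAMPLE = [
    'ooo.lanl.gov - - [19/Apr/1995:20:09:44 -0600] "GET /re_from HTTP/1.0" 401 -',
    'redirect.internal - - [19/Apr/1995:20:14:44 -0600] "GET /re_to HTTP/1.0" 200 123',
]
```

A tool that wants one entry per client request would call
external_requests(SAMPLE); one that wants to count how often /re_to is served
would keep the entries where "internal" is true as well.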
Re: Logging of ErrorDocument responses (was Re: hmmm...) [ In reply to ]
> Date: Wed, 19 Apr 1995 18:22:23 -0700 (PDT)
> From: Brian Behlendorf <brian@organic.com>
>
> Whatever, if I'm the only one who thinks this is a problem, I'll shut up,
> and I admit it is a little late in the game to bring this up. Can we
> have a show of hands? What do those not directly involved in coding this
> think?
>
> Brian

Well, since you asked... if I cared more, I would have spoken up
earlier. However, my (mild) preference would be to keep the invariant
that one transaction generates one log-entry.

What Rob's trying to do here, effectively, is to kludge an extra
log-entry field (redirected-URL, or something like that) into the
existing CLF, while still producing something that *looks* like a CLF
produced by one of the current servers. Personally, I think that the
right thing is to just bite the bullet, implement one of the
fully-configurable-logfile proposals which have been flying around
(after fleshing it out a bit), and make "redirected-URL" be one of the
fields which can optionally be logged, for those who care (leaving it
out of the CLF-compatible default).

However, that involves rewriting http_log.c from scratch anyway, so I
don't see that the Apache status quo represents a barrier to adopting
the approach I outlined above at some future time.

rst
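
A rough sketch of the configurable-logfile idea described above: one entry per
transaction, with "redirected-URL" as an optional extra field that is left out
of the CLF-compatible default. The field names, the dict layout and the
function format_entry are all hypothetical; none of this reflects how
http_log.c works.

```python
# Default field list reproduces the Common Log Format.
CLF_FIELDS = ("host", "ident", "user", "time", "request", "status", "size")

# Hypothetical extended format: CLF plus the optional redirected URL.
EXTENDED_FIELDS = CLF_FIELDS + ("redirected_url",)

def format_entry(txn, fields=CLF_FIELDS):
    """Render one transaction as a single log line using the given fields."""
    parts = []
    for name in fields:
        value = txn.get(name)
        text = "-" if value is None else str(value)  # CLF uses "-" for missing
        if name == "time":
            text = "[" + text + "]"
        elif name in ("request", "redirected_url"):
            text = '"' + text + '"'
        parts.append(text)
    return " ".join(parts)

# Illustrative transaction built from the example request in this thread.
TXN = {
    "host": "ooo.lanl.gov",
    "time": "19/Apr/1995:20:09:44 -0600",
    "request": "GET /re_from HTTP/1.0",
    "status": 401,
    "size": None,
    "redirected_url": "/re_to",
}
```

With the default field list, format_entry(TXN) yields a plain CLF line that
existing analysers can read; passing EXTENDED_FIELDS appends the redirected
URL to the same line, so the one-transaction, one-entry invariant is kept.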
Re: Logging of ErrorDocument responses (was Re: hmmm...) [ In reply to ]
> On Thu, 20 Apr 1995, Rob Hartill wrote:
> > Script X can redirect to URLs A B C D or E
> > if someone requests X, do you log X or A-E ?
>
> Unless you have a stateful server the only way X can choose A B C D or E
> is from the query string or path info.

Or POSTed info; you can't see that in the URL.

> It could of course choose A or B
> based on a random process - but I can't think of any useful applications
> of this.

ISMAPs are based on X redirecting to multiple URLs, and URLs which
aren't always called via the map script.

We use scripts to redirect to other URLs at this site and find
it very useful. One script we have has to translate TeX into postscript
on-the-fly, and point to the postscript file(s). The filename isn't
known until the process is complete, sometimes there are multiple
postscript files so the URL might be a pointer to a ps file or to
another script which lists the ps files.

It's important for us to know how the TeX->ps script is used, and
how many accesses the ps gets.

> > Internal redirects can be recorded in the logfile with
> > a host name of "redirect.internal"
>
> How about a dash or a star in the host name field for redirects?

That sounds like it'd screw up CLF. It'd certainly mislead existing
CLF analysers.


robh
Re: Logging of ErrorDocument responses (was Re: hmmm...) [ In reply to ]
[bad log blues...]
> Whatever, if I'm the only one who thinks this is a problem, I'll shut up,
> and I admit it is a little late in the game to bring this up. Can we
> have a show of hands? What do those not directly involved in coding this
> think?

Well I *am* involved in so far as I'd like to see more, but even so:
-1
the patch can *burn*, bwAHAHAHAHAHA.

It breaks stuff and the problems it provokes outweigh the benefits I
believe.

> Brian

Ay.
Re: Logging of ErrorDocument responses (was Re: hmmm...) [ In reply to ]
> From: Rob Hartill <hartill@ooo.lanl.gov>
> Date: Thu, 20 Apr 95 8:24:50 MDT
>
> Okay, here's another scenario,
>
> Script X can redirect to URLs A B C D or E
>
> if someone requests X, do you log X or A-E ?
>
> X was the actual requested URL, but A-E represents
> the important information - what the user wanted.
>
> If you just log X, you lose the most important info,
> if you don't log X, you don't even know IF the request
> came via X.

That's certainly the case if X is, say, an imagemap. However, on an
error redirect, it is X which is the most important info (the URL
which caused the error condition, whatever it was); this is a case in
which it is *vital* to log X, and much less important to log A-E.

> How about this as a CLF friendly modification.
> Internal redirects can be recorded in the logfile with
> a host name of "redirect.internal"
>
> e.g.
>
> ooo.lanl.gov - - [19/Apr/1995:20:09:44 -0600] "GET /re_from HTTP/1.0" 401 -
> redirect.internal - - [19/Apr/1995:20:14:44 -0600] "GET /re_to HTTP/1.0" 200 123

Keeping in mind that this is coming from someone who doesn't care all
that much (so long as the URLs which *caused* errors still get logged):

One problem here is that nothing in the logfile ties the two entries
for a single request together (note that log entries for other
transactions might well intervene, particularly if the redirection is
to a CGI script which does nontrivial work to assemble a custom
message). So, the "redirect.internal" business at least tags a
transaction as being the "second half" of an internal redirect, but
you lose information about what host it was going towards.

Counterproposal:

ooo.lanl.gov - - [19/Apr/1995:20:14:44 -0600] "GET /re_to HTTP/internal" 200 123

since the internally-generated transaction did not originate from a
genuine HTTP/1.0 request. (The tag of HTTP/internal is chosen so that
wwwstat won't throw away these lines --- but then again, maybe it
should).

As for what I *really* think, that continues to be:

1) Ideally, we should have one log-entry per transaction, which logs
*both* initial and final URLs on an internal redirect (in the case
where they're different), via some sort of configurable logging.

2) Unlike, say, the problems we're having with 401 redirects (hmmm...
does NCSA have the same bug? maybe someone ought to check), it's
not worth holding up the beta to get this right.

rst
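
For comparison, a small sketch of how an analyser could recognise entries
under the counterproposal above, assuming the server tags the second half of
an internal redirect with a protocol of "HTTP/internal" and keeps the real
client host. The names origin and hits_by_origin are made up for
illustration.

```python
import re
from collections import Counter

# Pull method, URL and protocol out of the quoted request field.
REQUEST_RE = re.compile(
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"\s]+)"'
)

def origin(line):
    """Classify a log line as 'external', 'internal' or 'unparsed'."""
    m = REQUEST_RE.search(line)
    if m is None:
        return "unparsed"
    # Assumption: internal redirects are tagged HTTP/internal.
    if m.group("protocol") == "HTTP/internal":
        return "internal"
    return "external"

def hits_by_origin(lines):
    """Count log lines per origin class."""
    return Counter(origin(line) for line in lines)

# First line is the original request from this thread; second is the
# counterproposal's entry for the internally redirected object.
SAMPLE = [
    'ooo.lanl.gov - - [19/Apr/1995:20:09:44 -0600] "GET /re_from HTTP/1.0" 401 -',
    'ooo.lanl.gov - - [19/Apr/1995:20:14:44 -0600] "GET /re_to HTTP/internal" 200 123',
]
```

Because the host field still names the client, the internal entry can be
attributed to ooo.lanl.gov, though nothing here ties it to the specific
request that triggered it.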