Mailing List Archive

WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
I think the following can be fixed by discarding IMS when redirecting
to an ErrorDocument...

Ack sent to submitter.

Forwarded message:
> From nobody@hyperreal.com Fri Sep 22 08:53:51 1995
> Message-Id: <199509221553.IAA28800@taz.hyperreal.com>
> From: khera@kciLink.com
> To: apache-bugs%apache.org@organic.com
> Date: Fri Sep 22 8:53:44 1995
> Subject: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI
>
> Submitter: khera@kciLink.com
> Operating system: BSDI, version:
> Extra Modules used: mSQL
> URL exhibiting problem: http://www.govcon.com/information/events.html
>
> Symptoms:
> --
> Given:
>
>   ErrorDocument 403 /members-only.html
>
> if the document members-only.html is newer than the document which
> caused the error, you can no longer retrieve the original document even
> when the error is corrected, if your browser uses the
> "if-modified-since" request, e.g., Netscape.
>
> To see the error in action, visit the URL above while running Netscape
> Navigator. When it asks for a user ID, just hit "cancel" and you will be
> presented with the members-only.html document. However, the client
> still thinks this is the original document name. Now, open that URL
> again. You will not be asked for an ID at all but dropped right into
> the error document.
>
> I believe that any error document shipped out should either have a very
> short Expires header, or should have a no-cache pragma so that the
> browsers don't cache the files.
>
> To see the actual file, use the user ID "guest" and the password
> "guest". Visit the directory URL (without the "events.html" file name)
> to see the whole directory. You will then be asked for the user ID and
> password. From there, scroll down and click on "upcoming events". Even
> though you are authenticated now, you will not get the real document,
> but the error document. There is no way short of purging your entire
> cache to get at the real document.
> --
>
> Backtrace:
> --
>
> --
>
Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
>
>
> I think the following can be fixed by discarding IMS when redirecting
> to an ErrorDocument...

I don't think so. What seems to be happening is that Netscape is caching the
error document as the image for another URL, so when that _other_ URL is
called again, Netscape offers up the cached image. The correct procedure
would be as he suggests: either send a no-cache pragma, or a very short
expiry. Unfortunately, this may not work either, as Netscape seems to be
brain-damaged about both cache expiry and no-cache pragma (at least, some
experiments I have done with client pull have required me to disable Netscape's
cache to see updates).

On second thoughts, a short expiry is also wrong. It needs to be an expiry
that is before the date of the real document. But it might as well be a
no-cache, in that case.

>
> Ack sent to submitter.
>
> Forwarded message:
> > From nobody@hyperreal.com Fri Sep 22 08:53:51 1995
> > Message-Id: <199509221553.IAA28800@taz.hyperreal.com>
> > From: khera@kciLink.com
> > To: apache-bugs%apache.org@organic.com
> > Date: Fri Sep 22 8:53:44 1995
> > Subject: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI
> >
> > Submitter: khera@kciLink.com
> > Operating system: BSDI, version:
> > Extra Modules used: mSQL

BTW - what's this ^^^^ ?


--
Ben Laurie Phone: +44 (181) 994 6435
Freelance Consultant Fax: +44 (181) 994 6472
and Technical Director Email: ben@algroup.co.uk (preferred)
A.L. Digital Ltd, benl@fear.demon.co.uk (backup)
London, England.

[.Note for the paranoid: "fear" as in "Fear and Loathing
in Las Vegas", "demon" as in Demon Internet Services, a
commercial Internet access provider.]
Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
BTW, shouldn't the bug report form have a (mandatory) slot for Apache version
number?

--
Ben Laurie
Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
> Hmm, thinking about it, it seems that to be really safe, all internal
> redirects need to stop caching. I don't know what the best solution is,
> but candidates are:

I don't think *all* internal redirects need to stop caching, but error
redirects clearly should. In fact, I thought they already did (die()
sets r->no_cache before calling internal_redirect, which certainly should
be copying the no_cache state of the original request onto the new one).

rst
Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
> they don't if the bug-report is accurate. I haven't attempted to verify it.

Even if it is, there may be other explanations. For instance, we may be
doing the last-modified check before verifying that we have permission to
read the file. If we are, and someone does 'chmod 000 somefile.html', we
will continue to send USE_LOCAL_COPY, rather than reporting permission
denied, to anyone who sends 'If-modified-since: lastmod-date'.
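
As a rough illustration of the ordering issue described above (not Apache's
actual code), the sketch below models both orderings. The function names are
hypothetical, timestamps are assumed to be plain integer seconds, and 304
stands in for what the thread calls USE_LOCAL_COPY.

```python
from typing import Optional

FORBIDDEN, NOT_MODIFIED, OK = 403, 304, 200


def ims_first(readable: bool, last_modified: int, ims: Optional[int]) -> int:
    """Suspected ordering: the date check runs before the permission check."""
    if ims is not None and last_modified <= ims:
        return NOT_MODIFIED  # answered before anyone looks at permissions
    return OK if readable else FORBIDDEN


def permission_first(readable: bool, last_modified: int, ims: Optional[int]) -> int:
    """Safer ordering: refuse unreadable files before looking at dates."""
    if not readable:
        return FORBIDDEN
    if ims is not None and last_modified <= ims:
        return NOT_MODIFIED
    return OK
```

With an unreadable file and a matching If-Modified-Since date, the first
ordering answers 304 while the second answers 403.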

I have checked the code, BTW --- die() certainly should be setting r->no_cache,
and internal_redirect() certainly should be propagating it to the new
request_rec. There may be some silly typo which keeps that code from
working, but it seems more likely that the guy encountered some combination
of circumstances like I outlined above, if he isn't just confused.

rst
Re: WWW Form Bug Report: "can't retrieve real document if error doc is
> > > Extra Modules used: mSQL
>
> BTW - what's this ^^^^ ?

It's in the contrib directory on hyperreal. Deals with authentication using
the mSQL relational database. cf the DBM authentication functionality.

Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
>
> they don't if the bug-report is accurate. I haven't attempted to verify it.
>
> Even if it is, there may be other explanations. For instance, we may be
> doing the last-modified check before verifying that we have permission to
> read the file. If we are, and someone does 'chmod 000 somefile.html', we
> will continue to send USE_LOCAL_COPY, rather than reporting permission
> denied, to anyone who sends 'If-modified-since: lastmod-date'.
>
> I have checked the code, BTW --- die() certainly should be setting r->no_cache,
> and internal_redirect() certainly should be propagating it to the new
> request_rec. There may be some silly typo which keeps that code from
> working, but it seems more likely that the guy encountered some combination
> of circumstances like I outlined above, if he isn't just confused.

Or it could be a symptom of the Netscape braindamage I mentioned earlier.
As I said, I couldn't get Netscape to update an image (which was being
refreshed every 30 seconds), not with no-cache, not with Expires:. I haven't
fully pinned down exactly what the problem was (but I'm pretty sure I was
sending the headers; at least, it would be nice to confirm that Netscape
received them ... does anyone know of a winsock spy tool?). In fact, I even
saw bizarre behaviour where it fetched a new version, and then on the next
update reverted to an old cached version (!).

You can experiment with this on http://www.algroup.co.uk/main/pubcam/pubcam.htm
if you want to see it failing in real life. Switching off all caching in
Netscape fixes it (I'm talking Windows Netscape here). (You have to click
the picture to get the updating version. Even though it may not change much
at night, you can tell whether it updates by the speed of the refresh.)

Cheers,

Ben.

Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
> > I think the following can be fixed by discarding IMS when redirecting
> > to an ErrorDocument...
>
> I don't think so. What seems to be happening is that Netscape is caching the
> error document as the image for another URL, so when that _other_ URL is
> called again, Netscape offers up the cached image.

Hmm, thinking about it, it seems that to be really safe, all internal
redirects need to stop caching. I don't know what the best solution is,
but candidates are:

- don't send a last-modified
- send a Pragma: no-cache
- both
- .....
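
A minimal sketch of the "both" candidate from the list above, assuming response
headers are held in a plain dictionary. The function name is hypothetical and
this is not how Apache stores headers internally; it only shows the intended
effect on what goes out on the wire.

```python
def headers_for_error_document(headers: dict) -> dict:
    """Return a copy of `headers` with caching hints removed or overridden."""
    # Drop Last-Modified so the client has no date to revalidate against.
    safe = {k: v for k, v in headers.items() if k.lower() != "last-modified"}
    # Ask HTTP/1.0 clients and proxies not to keep the error page.
    safe["Pragma"] = "no-cache"
    return safe
```

Whether a given browser honours these hints is a separate question, as the
Netscape experiments elsewhere in this thread suggest.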
Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
> Hmm, thinking about it, it seems that to be really safe, all internal
> redirects need to stop caching. I don't know what the best solution is,
> but candidates are:
>
> I don't think *all* internal redirects need to stop caching,

in the context of being "really safe" they might.

the problem as I see it is that it's possible to reach a point after
however many internal redirects where a "Last-modified" is created, but
any of the intermediate URLs could have a 1 to many redirect mapping. If
the IMS gets passed down the chain of redirects then there's always a
chance that a "new" response with an older "Last-modified" date is
to be returned. In these cases, Apache won't send the "new" response.

> but error redirects clearly should. In fact, I thought they already did (die()
> sets r->no_cache before calling internal_redirect, which certainly should
> be copying the no_cache state of the original request onto the new one).

they don't if the bug-report is accurate. I haven't attempted to verify it.

rob
Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
>the problem as I see it is that it's possible to reach a point after
>however many internal redirects where a "Last-modified" is created, but
>any of the intermediate URLs could have a 1 to many redirect mapping. If
>the IMS gets passed down the chain of redirects then there's always a
>chance that a "new" response with an older "Last-modified" date is
>to be returned. In these cases, Apache won't send the "new" response.

IMS only applies to what would be sent by Apache, not how many internal
redirects take place. That means the server must determine the correct
response *before* checking IMS.

In any case, caches are not allowed to cache error responses, and IMS
should be checked only if the response would otherwise be 200.

.....Roy
Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
> >the problem as I see it is that it's possible to reach a point after
> >however many internal redirects where a "Last-modified" is created, but
> >any of the intermediate URLs could have a 1 to many redirect mapping. If
> >the IMS gets passed down the chain of redirects then there's always a
> >chance that a "new" response with an older "Last-modified" date is
> >to be returned. In these cases, Apache won't send the "new" response.
>
> IMS only applies to what would be sent by Apache, not how many internal
> redirects take place. That means the server must determine the correct
> response *before* checking IMS.

but that still allows an IMS value to be compared against a different
document than the one the IMS relates to.

So if I hit a script which chooses a document A.html -> Z.html at
random, then redirects to it, Apache will compare the IMS with an
unrelated last-modified time 25 times out of 26.

My point is that if you start redirecting, the IMS value given by the
client should be discarded, because there is a reasonable chance that
it will be used out of context.

I think the safest approach is therefore to toss the IMS value at the
first redirect, that way Apache can't send a 304 under any circumstance.
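
The proposal above could be sketched roughly as follows. This is only a model:
the function names and the redirect counter are hypothetical, and timestamps
are assumed to be integer seconds.

```python
from typing import Optional


def effective_ims(ims: Optional[int], redirect_count: int) -> Optional[int]:
    """Forget the client's If-Modified-Since once any internal redirect happened."""
    return None if redirect_count > 0 else ims


def status(last_modified: int, ims: Optional[int], redirect_count: int) -> int:
    """Return 304 only for a direct (non-redirected) hit on an unchanged document."""
    ims = effective_ims(ims, redirect_count)
    if ims is not None and last_modified <= ims:
        return 304
    return 200
```

The cost is that every redirected request is answered in full, even when the
client's copy would have been fine.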

> In any case, caches are not allowed to cache error responses, and IMS
> should be checked only if the response would otherwise be 200.

the problem is more general than just ErrorDocument and gets worse
when you also hit a Netscape caching bug.. if you request a URL that
first produces HTML, then later produces, say postscript, Netscape seems
to keep the HTML cached with the postscript's last-modified date, so
subsequent reloads keep giving you the HTML. Discarding the IMS
because of the redirect should solve this.

rob
--
http://nqcd.lanl.gov/~hartill/
Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
>but that still allows an IMS value to be compared against a different
>document than the one the IMS relates to.

Not if the redirect is consistent.

>So if I hit a script which chooses a document A.html -> Z.html at
>random, then redirects to it, Apache will compare the IMS with an
>unrelated last-modified time 25 times out of 26.

And if you shoot yourself in the leg, it's quite possible that pain
will be the result. So? Don't shoot yourself in the leg.
When Apache implements an internally random redirect, then the random
redirect gets no-cache set. The same is true of scripts -- if they
want no-cache, they can bloody well ask for it.

>My point is that if you start redirecting, the IMS value given by the
>client should be discarded, because there is a reasonable chance that
>it will be used out of context.

No there isn't. There is a single point in time at which, by some strange
chance in which the old response is newer than the new response, the
cache in question may get a 304 from the server when it shouldn't. So, that
cache has an old copy (which was perfectly valid before) until that cache
is flushed. No big deal.

>I think the safest approach is therefore to toss the IMS value at the
>first redirect, that way Apache can't send a 304 under any circumstance.

No. If the server ignores IMS for resources in which it is capable
of determining a Last-Modified date, then that server is broken.
The safest approach is to simply take the maximum (latest) of the
last-mod times of all the redirects and use that for the comparison.
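
One way to read the "maximum of the last-mod times" suggestion, as a sketch
only: the server is assumed to collect a last-modified time (integer seconds)
for each resource visited along the redirect chain, including the final one.
The function name is hypothetical.

```python
from typing import Optional, Sequence


def status_for_chain(last_modified_chain: Sequence[int], ims: Optional[int]) -> int:
    """Compare If-Modified-Since against the newest resource in the chain."""
    newest = max(last_modified_chain)  # chain always holds at least the final resource
    if ims is not None and newest <= ims:
        return 304
    return 200
```

For example, if a redirecting script was edited at time 300 and now points at a
document dated 100, a client holding a copy dated 200 still gets a full
response, because the newest time in the chain (300) is later than its copy.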

>> In any case, caches are not allowed to cache error responses, and IMS
>> should be checked only if the response would otherwise be 200.
>
>the problem is more general than just ErrorDocument and gets worse
>when you also hit a Netscape caching bug.. if you request a URL that
>first produces HTML, then later produces, say postscript, Netscape seems
>to keep the HTML cached with the postscript's last-modified date, so
>subsequent reloads keep giving you the HTML. Discarding the IMS
>because of the redirect should solve this.

What, replace a once-in-a-billion bug with a once-in-a-thousand bug?
No thanks.

.....Roy
Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
Roy said..

> And if you shoot yourself in the leg, it's quite possible that pain
> will be the result. So? Don't shoot yourself in the leg.
> When Apache implements an internally random redirect, then the random
> redirect gets no-cache set.

but that's not what I'm seeing when I use the thing. Apache redirects
and the IMS is being used on the url I'm redirecting to, and the response
of the redirect doesn't say no-cache or anything else that would give
the client a clue that this is not the same internal URL as before.

> The same is true of scripts -- if they
> want no-cache, they can bloody well ask for it.

scripts can bloody well do what they want, but if the url you
redirect to isn't a script it can't account for the problem, so
the client is bloody well screwed.

If a script does an internal redirect, Apache discards that script's
headers, or at least I can't get "Pragma: no-cache" to be added.

> >My point is that if you start redirecting, the IMS value given by the
> >client should be discarded, because there is a reasonable chance that
> >it will be used out of context.
>
> No there isn't. There is a single point in time at which the server
> may, by some strange chance in which the old response is newer than
> the new response, that the cache in question will get a 304
> when it shouldn't. So, that cache has an old copy (which was perfectly
> valid before) until that cache is flushed. No big deal.

I'd say it's a big deal if the client is told there's "no change"
even though there has been.

> >I think the safest approach is therefore to toss the IMS value at the
> >first redirect, that way Apache can't send a 304 under any circumstance.
>
> No. If the server ignores IMS for resources in which it is capable
> of determining a Last-Modified date, then that server is broken.

Not broken, just inefficient.

The choice seems to be inefficient but accurate or efficient and potentially
inaccurate.

> The safest approach is to simply take the maximum (latest) of the
> last-mod times of all the redirects and use that for the comparison.

I assume you mean a script does this. But script headers which
accompany a redirect go to /dev/null AFAIK.

> >> In any case, caches are not allowed to cache error responses, and IMS
> >> should be checked only if the response would otherwise be 200.
> >
> >the problem is more general than just ErrorDocument and gets worse
> >when you also hit a Netscape caching bug.. if you request a URL that
> >first produces HTML, then later produces, say postscript, Netscape seems
> >to keep the HTML cached with the postscript's last-modified date, so
> >subsequent reloads keep giving you the HTML. Discarding the IMS
> >because of the redirect should solve this.
>
> What, replace a once-in-a-billion bug with a once-in-a-thousand bug?
> No thanks.

I must be very unlucky then, 'cos I've seen both in real applications.

Write a script that redirects to something recent, hit the URL for the
script, now change the script to redirect to something older. If you give
Apache the IMS, you'll always see the newest object.

Now if the script was able to redirect to different objects by itself
(which many scripts already do .. I have examples) the problem has a
much greater than one-in-a-thousand chance of showing itself.
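
The scenario above can be modelled in a few lines. Everything here is
hypothetical (paths, integer timestamps, function name); it only mimics a
server that compares If-Modified-Since against the redirect target alone.

```python
from typing import Optional

# path -> last-modified time in seconds (made-up values)
DOCS = {"/new.html": 200, "/old.html": 100}


def naive_status(target: str, ims: Optional[int]) -> int:
    """Check IMS against the redirect target only, ignoring how we got there."""
    if ims is not None and DOCS[target] <= ims:
        return 304
    return 200


# First visit: the script redirects to /new.html and the client caches it.
first = naive_status("/new.html", None)
cached_date = DOCS["/new.html"]

# The script is changed to redirect to /old.html; the client revalidates.
second = naive_status("/old.html", cached_date)
```

Here `first` is 200 and `second` is 304, so the client keeps displaying its
cached copy of the newer document even though the script now points elsewhere.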


The problem with if-modified-since in an internal redirect situation
is that Apache makes no effort to check *what* is supposed to have been
modified. What seems to be happening is that Apache compares the age
of the object to be sent with the IMS, even though that object can
often be different.

Now if only clients could say "if-foo-modified-since" where foo maps to
something unique, there'd be no problem. It's too late in the day for me
to guess what "foo" needs to be.


rob
Re: WWW Form Bug Report: "can't retrieve real document if error doc is newer" on BSDI (fwd)
On Fri, 22 Sep 1995, Ben Laurie wrote:
> BTW, shouldn't the bug report form have a (mandatory) slot for Apache version
> number?

The bug report form asks that before filling out the form, the user has
verified that this bug exists on the "latest" version of Apache.

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/