On Sun, 2002-05-12 at 17:38, Lars Aronsson wrote:
> Less than two hours ago, I found the page
> http://www.wikipedia.com/wiki/Sweden blank again. So I clicked Edit,
> then Save, to restore its full contents. This has happened before.
> See the page's History.
>
> Is this "the caching bug"?
Not as far as I know...
> How does it work? Wouldn't the obsolete
> cached copy be destroyed the first time I did this?
Sure should have been. Here's how the caching works:
* When a page is loaded, the contents of the cur_cache field are
checked. If it is not empty (!= ""), the cached HTML is dropped into the
output and various parsing/rendering steps are bypassed. Otherwise, the
page is rendered anew, shown, and stored in cur_cache for the next time
the page is shown.
* When a page is saved, the cur_cache field is cleared (set to ""), so
that the next time it's loaded (which should be immediately!) it will be
re-rendered.
* When a new page is created, the "unlinked" table is checked to see if
any other pages were linking to the previously nonexistent page. If so,
those pages have their cur_cache cleared (set to ""), so that when next
viewed, they can show the link as being to an existing page. (This won't
happen currently, as the "unlinked" table is broken.)
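The three rules above can be sketched roughly as follows. This is only a toy Python model, not the actual PHP code: the Wiki class, its dictionaries, and render() are hypothetical stand-ins for the cur table, the "unlinked" table, and the real parser.

```python
class Wiki:
    """Toy model of the cur_cache rules; every name here is hypothetical."""

    def __init__(self):
        self.text = {}      # title -> wikitext (stands in for the page text)
        self.cache = {}     # title -> cached HTML (stands in for cur_cache)
        self.unlinked = {}  # missing title -> set of pages that link to it
        self.renders = 0    # counts full renders, for demonstration only

    def render(self, title):
        # Placeholder for the real parsing/rendering steps.
        self.renders += 1
        return "<p>" + self.text[title] + "</p>"

    def load(self, title):
        # Rule 1: a non-empty cache is used as-is; otherwise render and store.
        cached = self.cache.get(title, "")
        if cached != "":
            return cached
        html = self.render(title)
        self.cache[title] = html
        return html

    def save(self, title, new_text):
        is_new = title not in self.text
        self.text[title] = new_text
        # Rule 2: saving clears the page's own cache.
        self.cache[title] = ""
        # Rule 3: a newly created page clears the caches of pages linking to it.
        if is_new:
            for linker in self.unlinked.pop(title, set()):
                self.cache[linker] = ""
```

With this model, loading a page twice renders it once, and a save forces exactly one more render on the next load.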
"The caching bug" is due to the fact that, if caching is disabled, the
cur_cache field isn't touched at all, not even to clear caches when
existing pages are saved. Once caching is turned back on, the software
has no way of knowing the cache is stale (there's no cache timestamp),
so the old version is displayed as if it were the current version of the
page. Basically, the code assumes that caching will either always be
off, or start off and then stay on forever. Turning it on, then off for
a while, then back on without manually clearing the cache leaves bad
cached pages.
This can easily be solved in the future by performing the cache-clears
whether caching is enabled or not. (I assume this was not done in the
first place so that the code would still work until the cur_cache field
was added to the database!)
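To make the on/off/on sequence concrete, here is a small Python sketch of the bug and of the proposed fix. The function names, the dict-based page, and the always_clear flag are all made up for illustration; the real code is PHP and is not structured this way.

```python
def save_page(page, new_text, caching_enabled, always_clear):
    """Save a page. `always_clear` is a hypothetical switch for the fix."""
    page["text"] = new_text
    # Current behaviour: the cache is only cleared while caching is enabled.
    # Proposed fix: clear it on every save, whatever the caching setting.
    if caching_enabled or always_clear:
        page["cache"] = ""


def load_page(page, caching_enabled):
    """Return the page HTML, using the cache only if caching is enabled."""
    if caching_enabled and page["cache"] != "":
        return page["cache"]
    html = "<p>" + page["text"] + "</p>"  # placeholder for real rendering
    if caching_enabled:
        page["cache"] = html
    return html


def on_off_on(always_clear):
    """Caching on (fill cache), off (edit page), then on again (view page)."""
    page = {"text": "old", "cache": ""}
    load_page(page, caching_enabled=True)
    save_page(page, "new", caching_enabled=False, always_clear=always_clear)
    return load_page(page, caching_enabled=True)
```

Here on_off_on(False) returns the stale "old" rendering, while on_off_on(True) returns the "new" one.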
The solution for our current problem is simply to clear all the cache
fields:
UPDATE cur SET cur_cache="",cur_timestamp=cur_timestamp
which can be done by someone with is_developer access (that would be
nobody, currently!) or by someone with direct access to the db server
(messieurs Wales and Richey). That will get rid of all the old cached
pages, and the _correct_ caches can start to fill up again. (Except that
unlinked links may stick around for newly created pages due to the
separate "unlinked" table problem.)
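For anyone who wants to try that statement on a scratch copy first, here is a rough Python sketch using an in-memory SQLite database as a stand-in for the real MySQL server. The three-column table and the sample rows are assumptions made for the example, and the empty string is written with single quotes for SQLite's sake. The cur_timestamp=cur_timestamp part is presumably there to stop MySQL from bumping its automatically updated timestamp column; SQLite has no such behaviour, so the sketch only demonstrates that the caches are emptied and the timestamps left alone.

```python
import sqlite3


def clear_all_caches(conn):
    """Empty every cur_cache field and return the number of rows touched."""
    cursor = conn.execute(
        "UPDATE cur SET cur_cache='', cur_timestamp=cur_timestamp"
    )
    conn.commit()
    return cursor.rowcount


# Hypothetical, heavily reduced version of the cur table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cur (cur_title TEXT, cur_cache TEXT, cur_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO cur VALUES (?, ?, ?)",
    [
        ("Sweden", "<p>stale</p>", "20020512173800"),
        ("Norway", "", "20020511120000"),
    ],
)

cleared = clear_all_caches(conn)
rows = conn.execute(
    "SELECT cur_title, cur_cache, cur_timestamp FROM cur ORDER BY cur_title"
).fetchall()
```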
> Why does the blank page come back?
I have no idea! I haven't actually seen this problem yet, I've only seen
a couple of notes that said roughly "page was blank, so I saved it again,
fine now."
Does the page _stay_ blank after reloading, i.e. does it consistently come
up blank until you resave it? Or does it come up blank once, but is okay
if you reload? Is it completely blank, or does it contain spaces or
something? Are there any error messages? Can you see anything suspicious
in the HTML source of the page?
If anyone with is_sysop status notices a page doing this, try doing a
direct SQL query before doing anything to the article:
SELECT * FROM cur WHERE cur_title="Title_of_the_page"
What's in the cur_cache field? Does anything else look out of place?
(Hmm, looks like special_asksql.php needs to slip in a couple of
<nowiki></nowiki> tags.)
> Is caching really necessary for performance now?
I wouldn't doubt that it's _helping_.
> When caching was
> activated, I thought several other functions were disabled at the same
> time. Do we know the impact on response times of not using caching?
Alas, yes and no in that order. Since the main limiting factor is still
the multiple/complex/slow database queries, not CPU time in PHP, caching
may be a smaller help than anything else we're doing; but I'd be quite
surprised if it's not helping at all. And what's the sense in throwing
away a performance gain if we've got one?
(Note that caching does cut down a bit on database queries, since links
don't need to be checked for existence.)
-- brion vibber (brion @ pobox.com)