Mailing List Archive

brokenlinks table
Jan writes:

> Ultimately the best solution would be to have a table wanted(title,
> #pages) with an index on #pages (and a unique index on title), then
> MySQL wouldn't need to sort at all. I don't know off the top of my head
> if there are any other queries that depend on 'brokenlinks' but I
> don't believe so and if there are not then I would recommend replacing
> it.

I believe we need the full brokenlinks information in order to
initialize the links table once a new article is written, so that
"What links here" will immediately work for new articles.
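
The step described above could look roughly like the following. This is only a minimal sketch, not the actual code: it uses Python with an in-memory SQLite database in place of PHP and MySQL, and the column names (l_from, l_to, bl_from, bl_to) and function names are assumptions made for illustration.

```python
import sqlite3


def create_schema(conn):
    # Assumed, simplified schema: each row is one link from a page to a title.
    conn.executescript("""
        CREATE TABLE links (l_from TEXT NOT NULL, l_to TEXT NOT NULL);
        CREATE TABLE brokenlinks (bl_from TEXT NOT NULL, bl_to TEXT NOT NULL);
        CREATE INDEX brokenlinks_to ON brokenlinks (bl_to);
    """)


def promote_broken_links(conn, new_title):
    """Move every broken link pointing at new_title into the links table.

    Meant to be called once the article new_title has been created.
    Returns the number of links that were moved.
    """
    with conn:  # one transaction: copy the rows, then delete the originals
        conn.execute(
            "INSERT INTO links (l_from, l_to) "
            "SELECT bl_from, bl_to FROM brokenlinks WHERE bl_to = ?",
            (new_title,),
        )
        cur = conn.execute(
            "DELETE FROM brokenlinks WHERE bl_to = ?", (new_title,)
        )
        return cur.rowcount


def what_links_here(conn, title):
    # Pages that link to an existing article, in alphabetical order.
    rows = conn.execute(
        "SELECT l_from FROM links WHERE l_to = ? ORDER BY l_from", (title,)
    )
    return [row[0] for row in rows]
```

Because brokenlinks keeps the source page of every broken link, "What links here" can be answered right after the move, which is the reason a bare count per title is not enough.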

Nevertheless, I think we should have a wanted table as above in
addition. Space is really no issue, but time is, and Most Wanted is
likely to be one of our more commonly called slow functions.
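
A wanted table along the lines Jan describes might be sketched as follows. Again this is only an illustration under stated assumptions: Python with SQLite stands in for PHP and MySQL, "#pages" is spelled num_pages because "#" is not valid in a column name, and all function names are hypothetical.

```python
import sqlite3


def create_wanted(conn):
    # wanted(title, num_pages) with a unique index on title and a plain
    # index on num_pages, so that the ranking can be read off the index.
    conn.executescript("""
        CREATE TABLE wanted (
            title TEXT NOT NULL,
            num_pages INTEGER NOT NULL
        );
        CREATE UNIQUE INDEX wanted_title ON wanted (title);
        CREATE INDEX wanted_num_pages ON wanted (num_pages);
    """)


def note_broken_link(conn, title):
    """Record one more page linking to the missing article title."""
    with conn:
        cur = conn.execute(
            "UPDATE wanted SET num_pages = num_pages + 1 WHERE title = ?",
            (title,),
        )
        if cur.rowcount == 0:  # first broken link to this title
            conn.execute(
                "INSERT INTO wanted (title, num_pages) VALUES (?, 1)",
                (title,),
            )


def article_written(conn, title):
    """Drop title from wanted once the article exists."""
    with conn:
        conn.execute("DELETE FROM wanted WHERE title = ?", (title,))


def most_wanted(conn, limit=50):
    # Ordering by the indexed column alone lets the database walk the
    # index instead of sorting the whole table.
    rows = conn.execute(
        "SELECT title, num_pages FROM wanted "
        "ORDER BY num_pages DESC LIMIT ?",
        (limit,),
    )
    return list(rows)
```

The price is that every save which adds or removes a broken link has to keep wanted in step with brokenlinks, trading a little space and write time for a cheap Most Wanted query.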

Axel
Re: brokenlinks table
On Wed, Jul 10, 2002 at 02:23:13AM +0200, Axel Boldt wrote:
> Jan writes:
>
> > Ultimately the best solution would be to have a table wanted(title,
> > #pages) with an index on #pages (and a unique index on title), then
> > MySQL wouldn't need to sort at all. I don't know off the top of my head
> > if there are any other queries that depend on 'brokenlinks' but I
> > don't believe so and if there are not then I would recommend replacing
> > it.
>
> I believe we need the full brokenlinks information in order to
> initialize the links table once a new article is written, so that
> "What links here" will immediately work for new articles.

Good grief. How could I miss that? I even remember writing the code that did
that! You are of course completely right. Lee already pointed out another
problem that has to do with the fact that pages are not saved until after
they are rendered, but by that time the information of the old page (esp.
the valid links on that page) is no longer in the internal PHP variables.

> Nevertheless, I think we should have a wanted table as above in
> addition. Space is really no issue, but time is, and Most Wanted is
> likely to be one of our more commonly called slow functions.

In fact I advised the same thing by mail to Lee. However, I also agree with
Lee that we should now go with the quick fix and get Wikipedia running on
the new code base ASAP. Then we will be able to get some real timing data
and be sure that this is one of the bottlenecks. I strongly suspect it will
be, because it will still be eating a lot of space. If I had the time, and Lee
would allow it, I would probably go and implement it right away.

-- Jan Hidders