Mailing List Archive

Slow parts
I identified the "getOtherNamespaces" function in wikiPage.php as a major bad
guy. It is used to link to the appropriate Talk namespace in the sidebar and
the footer. On my test installation, returning no other namespaces (an empty
array) cuts the time to generate a page in half! (Tested with the "Biology"
article using ab -n 20 -c 4)

I don't have time to fix this right now. Could someone please help? :)

Magnus
Re: Slow parts
I just now made the function look like this:

function getOtherNamespaces () {
    $a = array () ;
    # modification by Jimbo
    return $a;
    # ... rest of the original function follows, now unreachable ...

I assume that forces it to return an empty array and not run the rest of the code.

The site didn't die, so I'll assume that's o.k. as an emergency measure.

Also, I turned on page caching. This seems to have helped somewhat.

Re: Slow parts
On Thu, 2002-05-09 at 17:12, Jimmy Wales wrote:
>
> I just now made the function look like this:
>
> function getOtherNamespaces () {
> $a = array () ;
> # modification by Jimbo
> return $a;
>
> I assume that forces it to return an empty array and not run the rest of the code.
>
> The site didn't die, so I'll assume that's o.k. as an emergency measure.

As Chuck has noted, this has the effect of removing all the Talk links!

I suspect the problem may be this query:
SELECT cur_title FROM cur WHERE cur_title LIKE "%:$n"

I put a similar query into the Watchlist code so that watchlisting a
title would list updates to pages with that title in all namespaces
(mainly for talk pages), and it slowed to a completely unusable crawl:
my (more than ample) watchlist eventually times out before giving any
output. (Note that I've already put a workaround for that into CVS to
explicitly look for talk pages.)

It may be useful to split the title into separate namespace and
(sub)title fields so that slow substring matches don't need to be used.
Alternatively, a more efficient way to search the existing field this
way would be welcome.
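
A minimal sketch of the split-field idea, assuming nothing about the real schema: it uses Python's sqlite3 module instead of the MySQL/PHP setup discussed here, and the cur_namespace column, the index name, and the sample rows are all hypothetical. The point is only that "same title, any namespace" becomes an equality lookup on an indexed column instead of a substring match.

```python
import sqlite3


def build_split_table(rows):
    """Create an in-memory table with separate namespace and title columns.

    The column names are assumptions for illustration, not the real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur (cur_namespace TEXT NOT NULL, cur_title TEXT NOT NULL)"
    )
    # The index leads with the title, so finding every namespace that holds
    # a given title is an index lookup rather than a scan.
    conn.execute("CREATE INDEX cur_title_ns ON cur (cur_title, cur_namespace)")
    conn.executemany("INSERT INTO cur VALUES (?, ?)", rows)
    return conn


def other_namespaces(conn, title):
    """Return the non-main namespaces that contain a page with this title."""
    cursor = conn.execute(
        "SELECT cur_namespace FROM cur "
        "WHERE cur_title = ? AND cur_namespace <> '' "
        "ORDER BY cur_namespace",
        (title,),
    )
    return [namespace for (namespace,) in cursor]


# Hypothetical sample data; the empty string stands for the main namespace.
conn = build_split_table([
    ("", "Biology"),
    ("talk", "Biology"),
    ("wikipedia", "Biology"),
    ("talk", "Physics"),
])
print(other_namespaces(conn, "Biology"))
```

With this layout the namespace never has to be parsed back out of the title, which is the slow part of the current query.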

I've attached a patch for a workaround that replaces the query with just
the talk page link; give it a try and see if that's still helping.

-- brion vibber (brion @ pobox.com)
Re: Slow parts
>I suspect the problem may be this query:
> SELECT cur_title FROM cur WHERE cur_title LIKE "%:$n"

Wow! This is a pig. It does a linear search (with pattern matching) through
the whole cur table for every single page served :-)

There should not be any LIKE patterns with a leading wildcard in the code,
since they cannot use the index. The quick and dirty fix would be to
explicitly search for the page in other namespaces, like:

    SELECT cur_title FROM cur
    WHERE cur_title = "wikipedia:$n" OR
          cur_title = "user:$n" OR
          ...

This uses the index on cur_title and is quick. A much
cleaner solution would be to have an additional field "namespace" in
the database. Encoding namespaces in the article title is a
kludge.
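
To make the difference concrete, here is a rough sketch that compares the two query shapes. It is not the wiki's code: it uses Python's sqlite3 module rather than MySQL, the NAMESPACES list and sample rows are hypothetical, and the OR chain is written as the equivalent IN list. SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN, and its exact wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cur (cur_title TEXT NOT NULL)")
conn.execute("CREATE INDEX cur_title_idx ON cur (cur_title)")
conn.executemany(
    "INSERT INTO cur VALUES (?)",
    [("Biology",), ("talk:Biology",), ("wikipedia:Biology",), ("talk:Physics",)],
)

# Hypothetical namespace list; the real set would come from the wiki config.
NAMESPACES = ["talk", "user", "wikipedia"]


def plan(sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


def titles(sql, params):
    """Run the query and return the matching titles in sorted order."""
    return sorted(title for (title,) in conn.execute(sql, params))


n = "Biology"

# Leading-wildcard pattern: every title has to be examined.
like_sql = "SELECT cur_title FROM cur WHERE cur_title LIKE ?"
like_params = ["%:" + n]

# Explicit candidates: one exact index lookup per namespace.
candidates = [namespace + ":" + n for namespace in NAMESPACES]
placeholders = ", ".join("?" * len(candidates))
exact_sql = "SELECT cur_title FROM cur WHERE cur_title IN (%s)" % placeholders

print(plan(like_sql, like_params))
print(plan(exact_sql, candidates))
print(titles(exact_sql, candidates))
```

Both queries return the same titles, but the plan for the LIKE version should report a scan over all rows, while the explicit version reports an index search.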

Axel