Mailing List Archive

Bottleneck: Internal link resolution
Hi,

after some browsing around in the code, I've found a major bottleneck:
the current internal link resolution system.

If you have a local Wikipedia install, try it for yourself: visit a few
very link-heavy pages, like [[List of reference tables]]. You will
notice that the rendering speed depends directly on the number of
internal links (not on the actual text size or formatting tags). The
link cache seems to work, since page rendering gets faster as you load a
page repeatedly, but it remains rather slow for link-heavy pages.

For further proof, edit OutputPage.php and, in the function
replaceInternalLinks(), just add "return $s;" at the top. As a result,
internal links will no longer be resolved. Now browse a page (ideally,
open the Main Page before the change and follow links from there after it)
and notice that the rendering is lightning fast.
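
The cost model described above can be illustrated with a small Python sketch. It is hypothetical and is not the OutputPage.php code: `CountingLookup` stands in for the database, `render_links` is an invented name, and the `[[Target|label]]` pattern is a simplification of real link syntax. It shows the number of lookups following the number of distinct links rather than the text length, and a cache answering repeat renders.

```python
import re

# Simplified, hypothetical link syntax: [[Target]] or [[Target|label]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


class CountingLookup:
    """Stand-in for the database: counts page-existence lookups."""

    def __init__(self, existing_titles):
        self.existing = set(existing_titles)
        self.queries = 0

    def page_exists(self, title):
        self.queries += 1
        return title in self.existing


def render_links(wikitext, lookup, cache):
    """Resolve every internal link, consulting `cache` before `lookup`."""
    resolved = []
    for match in LINK_RE.finditer(wikitext):
        title = match.group(1).strip()
        if title not in cache:
            cache[title] = lookup.page_exists(title)
        resolved.append((title, cache[title]))
    return resolved


if __name__ == "__main__":
    db = CountingLookup({"Paris", "London"})
    cache = {}
    # Lots of plain text, only three distinct links.
    text = "[[Paris]], [[London]] and [[Atlantis|a lost city]]. " + "Filler. " * 1000
    render_links(text, db, cache)
    print(db.queries)  # 3: one lookup per distinct link, independent of text length
    render_links(text, db, cache)
    print(db.queries)  # 3: the repeat render is answered from the cache
```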

The internal link resolution process is quite complex. I notice that
lots of Title objects are created, and there are lots of ID lookups. I
won't speculate too much about the cause until I have had time to
examine them more closely. But it is certain that this is a bottleneck.

I believe we need to make use of the links and brokenlinks tables for the
actual page rendering. Right now they seem to be used only for the
special pages.
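
One reading of this suggestion is to stop resolving links one at a time and instead fetch the status of all of a page's link targets at once (the links and brokenlinks tables could supply that list without re-parsing). The sketch below assumes a hypothetical `page` table in an in-memory SQLite database, not the real Wikipedia schema, and compares a per-link lookup with a batched `IN (...)` lookup. The chunk size of 500 is an arbitrary choice to stay under SQLite's limit on bound parameters.

```python
import sqlite3


def make_db(titles):
    """Throwaway in-memory page table (hypothetical schema, not Wikipedia's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, title TEXT UNIQUE)")
    conn.executemany("INSERT INTO page (title) VALUES (?)", [(t,) for t in titles])
    return conn


def resolve_one_by_one(conn, targets):
    """One query per link: the number of queries grows with the number of links."""
    ids = {}
    for title in targets:
        row = conn.execute("SELECT page_id FROM page WHERE title = ?", (title,)).fetchone()
        ids[title] = row[0] if row else None
    return ids


def resolve_batched(conn, targets, chunk_size=500):
    """Look up all distinct targets with a few IN (...) queries.

    Chunking keeps each statement under SQLite's limit on bound parameters.
    A value of None marks a broken link.
    """
    unique = sorted(set(targets))
    ids = dict.fromkeys(unique)  # start with every target marked as broken
    for start in range(0, len(unique), chunk_size):
        chunk = unique[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        query = f"SELECT title, page_id FROM page WHERE title IN ({placeholders})"
        for title, page_id in conn.execute(query, chunk):
            ids[title] = page_id
    return ids


if __name__ == "__main__":
    conn = make_db(["Paris", "London"])
    targets = ["Paris", "Atlantis", "London", "Paris"]
    print(resolve_batched(conn, targets))
    # {'Atlantis': None, 'London': 2, 'Paris': 1}
```

Both functions return the same mapping; the batched version just issues far fewer statements on a link-heavy page.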

Regards,

Erik

--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
Re: Bottleneck: Internal link resolution
Erik Moeller wrote:

>Hi,
>
>after some browsing around in the code, I've found a major bottleneck:
>the current internal link resolution system.
>
>If you have a local Wikipedia install, try it for yourself: visit a few
>very link-heavy pages, like [[List of reference tables]]. You will
>notice that the rendering speed depends directly on the number of
>internal links (not on the actual text size or formatting tags). The
>link cache seems to work, since page rendering gets faster as you load a
>page repeatedly, but it remains rather slow for link-heavy pages.
>

I address this issue as a non-techie. I have worked on some very
link-heavy pages, such as the ones for the Academy Awards, whose loading
time increases as more data is added.
1. Is it likely that a technical solution will soon be found for the
slow loading?
2. Are there ways in which data could be better organized to
minimize the effects of slow loading?
3. Should articles like the Academy Awards listings be broken down
into more manageable sizes, and if so, how does one determine the optimum size?

Eclecticology
Re: Bottleneck: Internal link resolution
Hi,

> I address this issue as a non-techie. I have worked on some very
> link-heavy pages, such as the ones for the Academy Awards whose loading
> time increases with the addition of more data.
> 1. Is it likely that a technical solution will soon be found for the
> slow loading?

I'll try to look more closely into it in the next few days, but I'd
appreciate some help from the long-term developers. Another thing I'm
currently trying to improve is the speed of the Orphaned Pages special
page, which takes an awful lot of time on my local install (possibly
because of a missing index in the table).
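
If a missing index is the cause, the effect is easy to picture: finding orphans means checking, for every page, whether any link points to it. The sketch below uses SQLite with hypothetical `page` and `links` tables (the column names `l_from` and `l_to` are assumptions, not necessarily the real schema); the index on `links.l_to` is what turns each check from a full scan into a lookup.

```python
import sqlite3

SCHEMA = """
CREATE TABLE page  (page_id INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE links (l_from INTEGER, l_to INTEGER);
-- Without this index, the NOT EXISTS check below has to scan the whole
-- links table once per page; with it, each check is an index lookup.
CREATE INDEX links_l_to ON links (l_to);
"""

ORPHANS = """
SELECT p.title FROM page AS p
WHERE NOT EXISTS (SELECT 1 FROM links AS l WHERE l.l_to = p.page_id)
ORDER BY p.title
"""


def orphaned_pages(conn):
    """Titles of pages that no page links to."""
    return [row[0] for row in conn.execute(ORPHANS)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO page (page_id, title) VALUES (?, ?)",
                     [(1, "Main Page"), (2, "Paris"), (3, "Atlantis")])
    conn.execute("INSERT INTO links (l_from, l_to) VALUES (1, 2)")
    print(orphaned_pages(conn))  # ['Atlantis', 'Main Page']
```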

> 2. Are there ways in which data could be better organized to
> minimize the effects of slow loading?
> 3. Should articles like the Academy Awards listings be broken down
> into more manageable sizes, and if so, how does one determine the optimum size?

With the current software, it's simply a matter of the number of links:
the more links, the slower the page will load. The volume of the actual
text content does not matter. So you could try to split collections of
links into sections on separate pages.
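
To decide where such a split would pay off, one could count the internal links in each section of a page. The helper below is a hypothetical sketch: it assumes `== Heading ==` section syntax and uses a simplified link pattern, so it will miss cases the real parser handles.

```python
import re

# Simplified, hypothetical patterns; the real parser handles many more cases.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")
HEADING_RE = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def links_per_section(wikitext):
    """Return (section title, internal link count) pairs in page order."""
    counts = []
    title, start = "(lead)", 0
    for m in HEADING_RE.finditer(wikitext):
        counts.append((title, len(LINK_RE.findall(wikitext[start:m.start()]))))
        title, start = m.group(2), m.end()
    counts.append((title, len(LINK_RE.findall(wikitext[start:]))))
    return counts


if __name__ == "__main__":
    page = (
        "Winners of the [[Academy Awards]].\n"
        "== 1929 ==\n"
        "[[Wings]], [[Emil Jannings|Jannings]]\n"
        "== 1930 ==\n"
        "[[The Broadway Melody]]\n"
    )
    print(links_per_section(page))
    # [('(lead)', 1), ('1929', 2), ('1930', 1)]
```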

But in general I am against such workarounds. We should try to improve the
software ASAP. Do you think I could convert you into a developer? We need
all the help we can get, and I'd be willing to teach as far as I
understand.

Regards,

Erik
Re: Getting Neophyte Developers Started
erik_moeller@gmx.de wrote:

>
> But in general I am against such workarounds. We should try to improve the
> software ASAP. Do you think I could convert you into a developer? We need
> all the help we can get, and I'd be willing to teach as far as I
> understand.

I would be interested in joining a mailing list aimed at
assisting neophyte developers with getting started
or in establishing private dialogues with friendly mentors
and other neophytes.

I have some obsolete Pentium systems that I could dedicate
and set up for local testing and/or development activities.

I have substantial experience and exposure to a variety
of software and system development projects and some
applicable formal training but very limited actual coding
experience.

If you can assist me (and others) with becoming a hands-on
developer, I would appreciate it. Gaining traction with free-software-based
development has proven more difficult than I expected, much more so (for me)
than my previous, limited experience with Windows-based programming.

I will be traveling on business next week and the week after,
and will have high-bandwidth internet access. If we could
develop a list of software and versions to work with initially
in the next few days, I could download all the required components
to set up a local system very similar or identical to the
recommended initial platform for neophytes.

Regards,
Mike Irwin
Re: Getting Neophyte Developers Started
Hello Michael!

> I would be interested in joining a mailing list aimed at
> assisting neophyte developers with getting started
> or in establishing private dialogues with friendly mentors
> and other neophytes.

Hm, how about a wiki page instead? The knowledge of mailing lists tends
to get lost in their vast archives.

I have created
http://meta.wikipedia.org/wiki/How_to_become_a_Wikipedia_hacker
as a skeleton with some content. Please add to the structure,
particularly the questions you find relevant for yourself.

> I have some obsolete Pentium systems that I could dedicate
> and set up for local testing and/or development activities.

Great!

> I have substantial experience and exposure to a variety
> of software and system development projects and some
> applicable formal training but very limited actual coding
> experience.

The best way to learn coding is to look at code. Again, that's why I
think installing Wikipedia is crucial for anyone, and it doesn't take
much knowledge.

A problem for me, which I should have mentioned, is that I have never
used PHP, MySQL, etc. under Windows; I'm used to Linux. Do we have any
Wikipedia coders using Windows?

Regards,

Erik
Re: Getting Neophyte Developers Started
On Tue, Nov 12, 2002 at 10:41:55AM +0100, Erik Moeller wrote:
> > I have substantial experience and exposure to a variety
> > of software and system development projects and some
> > applicable formal training but very limited actual coding
> > experience.
>
> The best way to learn coding is to look at code. Again, that's why I
> think installing Wikipedia is crucial for anyone, and it doesn't take
> much knowledge.
>
> A problem for me, which I should have mentioned, is that I have never
> used PHP, MySQL, etc. under Windows; I'm used to Linux. Do we have any
> Wikipedia coders using Windows?

Why should we bother with Windoze at all?
Server stuff is coded here.