Mailing List Archive

Optimization
I've been studying performance today, looking for the bottlenecks.
The total volume of traffic on the site is not high enough to make me
suspect that hardware is the problem. So I'm looking for the things
people do on the site that tend to be expensive, so that we can
eliminate features for a while (if we must) or find ways to
optimize or cache them.

1. I notice that many of the 'special' pages are quite popular. I
suspect that these are our best bets for optimization or caching.

2. I propose that we test the site without the "this page has been
accessed N times" feature. Doesn't it seem excessive to have a
database write for every single pageview, when we are trying to
increase performance? A nearly equivalent feature could be provided
with a daily update from the access logs.
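
For example, a nightly job along these lines could tally hits from
the access log and write each page's count once a day instead of once
per hit. This is only a rough sketch in PHP; the log path and the URL
pattern are placeholders, not the real ones, and the final loop just
prints what would become a single counter update per page:

    <?php
    $counts = array();
    $fp = fopen( '/var/log/apache/access_log.yesterday', 'r' );
    while ( ( $line = fgets( $fp, 4096 ) ) !== false ) {
        // Count requests that look like article views.
        if ( preg_match( '!"GET /wiki/([^ ?"]+)!', $line, $m ) ) {
            $title = urldecode( $m[1] );
            if ( !isset( $counts[$title] ) ) {
                $counts[$title] = 0;
            }
            $counts[$title]++;
        }
    }
    fclose( $fp );
    // One update per page per day instead of one write per pageview;
    // here we just print the tallies.
    foreach ( $counts as $title => $n ) {
        printf( "%6d  %s\n", $n, $title );
    }
    ?>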

--Jimbo
Re: Optimization
Jimmy Wales wrote:
> 2. I propose that we test the site without the "this page has been
> accessed N times" feature.

This is my best guess too. However, it is possible to get from
guesses to knowledge by calling getmicrotime() before and after each
database call and logging the elapsed time (to stderr?). Wrap every
database transaction in this, and gather statistics on which calls
are the worst, rather than removing calls by guessing.
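
Something like this sketch, assuming the code is PHP with plain
mysql_query() calls (the wrapper name and the logging target are my
guesses, not how the code actually looks):

    <?php
    function getmicrotime() {
        // microtime() returns "usec sec" as a string in old PHP.
        list( $usec, $sec ) = explode( ' ', microtime() );
        return (float)$usec + (float)$sec;
    }

    function timedQuery( $sql, $conn ) {
        $start = getmicrotime();
        $res = mysql_query( $sql, $conn );
        $elapsed = getmicrotime() - $start;
        // Log wallclock time plus the query text for later statistics;
        // swap in a write to stderr or a separate file if preferred.
        error_log( sprintf( "%.4f sec: %s", $elapsed, $sql ) );
        return $res;
    }
    ?>

Then replace each mysql_query() call site with timedQuery() and sort
the resulting log by the first column to see which queries are the
worst.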

Not knowing more about this specific system, my general experience is
that database locks can eat a lot of wallclock time without consuming
any CPU time. While the threads wait, the memory held by each
execution thread can add up until the machine starts to swap. I have
no idea if this happens in Wikipedia, but you can monitor disk I/O
with simple programs like vmstat (or iostat on some systems). If this
is the case, adding RAM to avoid swapping is *not* a solution; it
would only hide the lock contention that causes the delays in the
first place.


--
Lars Aronsson (lars@aronsson.se)
Aronsson Datateknik
Teknikringen 1e, SE-583 30 Linuxköping, Sweden
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/