Mailing List Archive

Optimizing suggestions
Hi

I'm fairly new to wikipedia and think it's an amazing project. Congrats to
all involved! I've been hit with the speed problems though, so I've been
following the optimization discussions from afar and may be able to
contribute somewhat. I'm hoping to have some more time to help out with
optimizing as I've got quite a bit of experience in the field (at least I've
just written a MySQL book, so people think I do).

I was quite surprised to read that the web and database servers are on the
same machine. Simply separating the two (even onto two equivalent, slower
machines), which it looks like you're going to do, usually makes a
noticeable difference, allowing each machine to do its job optimally
rather than hopping about trying to do both. Definitely put the MySQL server
on the better machine.

Another fairly easy thing you can do to cut out the slow queries is to
activate slow query logging. This only logs 'slow' queries and you can
quickly tell quite a lot from this, and work on these queries as a priority.
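
As a rough illustration of working through such a log, here is a minimal
Python sketch that ranks logged statements by how long they took. It assumes
each entry has a comment line beginning "# Query_time: <seconds>" followed by
the statement itself; the exact layout varies between MySQL versions, so
check it against a real log first. The function name, the sample log and the
table in it are made up for the example and are not taken from the Wikipedia
server.

```python
import re
from typing import List, Tuple

# Assumed header line, e.g. "# Query_time: 12  Lock_time: 0  Rows_sent: 50 ..."
QUERY_TIME = re.compile(r"^# Query_time:\s*([\d.]+)")


def slowest_queries(log_text: str, top: int = 5) -> List[Tuple[float, str]]:
    """Return (seconds, statement) pairs from a slow log, slowest first."""
    entries: List[Tuple[float, str]] = []
    seconds = None
    statement: List[str] = []
    for line in log_text.splitlines():
        match = QUERY_TIME.match(line)
        if match:
            # A new entry starts here: store the previous one, if any.
            if seconds is not None and statement:
                entries.append((seconds, " ".join(statement)))
            seconds = float(match.group(1))
            statement = []
        elif line.startswith("#"):
            continue  # other comment lines (timestamp, user, host)
        elif seconds is not None and line.strip():
            statement.append(line.strip())
    # Store the final entry in the log.
    if seconds is not None and statement:
        entries.append((seconds, " ".join(statement)))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:top]


# Made-up sample in the assumed format (hypothetical table and columns).
SAMPLE_LOG = """\
# Time: 030207 12:00:01
# Query_time: 3  Lock_time: 0  Rows_sent: 10  Rows_examined: 4000
SELECT title FROM articles WHERE namespace = 0 LIMIT 10;
# Time: 030207 12:00:09
# Query_time: 12  Lock_time: 0  Rows_sent: 50  Rows_examined: 120000
SELECT title FROM articles
ORDER BY namespace, title LIMIT 50;
"""

if __name__ == "__main__":
    for seconds, statement in slowest_queries(SAMPLE_LOG):
        print(seconds, statement)
```

The statements at the top of the resulting list are the ones to look at
first.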

MySQL 4 would make sense as an upgrade. It's more than stable enough on any
*nix platform, and has some substantial advantages, as some of you have
pointed out already.

I haven't really had a look at the code, so I'm not sure how relevant all
this is, but mirroring pages makes a lot of sense, and takes unnecessary load
off the database. The front page, and all articles, should be available at
high speed (this at least gives newcomers something to see even if the db is
churning in the background). If the front page needs to do any database
queries, then mirror it as a static page every x minutes, which is much more
efficient than doing the query hundreds of times a minute. You could mirror
individual articles, but I'd need to look at the code and usage to see if
this will help.
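
To make the "mirror it every x minutes" idea concrete, here is a minimal
Python sketch of a time-based page cache. It is not how the wiki software
does it; the class name, the render callback and the five-minute interval
are all assumptions for illustration. The expensive render (the part that
would hit the database) runs at most once per interval, and every other
request gets the stored copy.

```python
import time
from typing import Callable, Optional


class TimedPageCache:
    """Keep one rendered page and rebuild it at most every max_age_seconds."""

    def __init__(self, render: Callable[[], str], max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render          # expensive function (database queries)
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so it can be tested
        self._page: Optional[str] = None
        self._rendered_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Rebuild only if nothing is cached yet or the copy is too old.
        if self._page is None or now - self._rendered_at >= self._max_age:
            self._page = self._render()
            self._rendered_at = now
        return self._page


if __name__ == "__main__":
    def render_front_page() -> str:
        # Hypothetical stand-in for the real page generation.
        return "<html>front page built at %.0f</html>" % time.time()

    cache = TimedPageCache(render_front_page, max_age_seconds=300)
    first = cache.get()
    second = cache.get()   # served from the cache, no second render
    print(first == second)
```

The trade-off is staleness: visitors may see a page up to x minutes old,
which is usually acceptable for a front page.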

Hopefully I'll be able to help out more, but these are just some thoughts
that may be useful to think about for now. Apologies if they're not relevant
because I haven't taken a close enough look at the details. I'll hopefully
be able to dive in soon. I'd also like to take a look at the my.cnf file,
if that's possible, and to know the hardware of the db server.
That's something that can be optimized quite quickly, and may help a bit, if
it's not been done already.

regards,
ian gilfillan
Re: Optimizing suggestions
Brion Vibber wrote:
> Now I've got to get back to my paper; if anyone sees me online in the
> next few hours, please intervene. :)

Brion, go to your room right now!

I, too, will be offline for a few hours. I have to clean up my office
before my mom comes to visit. You'd all be horrified to see how I live.

--Jimbo
Re: Optimizing suggestions
On Fri, 2003-02-07 at 02:49, Ian Gilfillan wrote:
> I'm fairly new to wikipedia and think it's an amazing project. Congrats to
> all involved! I've been hit with the speed problems though, so I've been
> following the optimization discussions from afar and may be able to
> contribute somewhat. I'm hoping to have some more time to help out with
> optimizing as I've got quite a bit of experience in the field (at least I've
> just written a MySQL book, so people think I do).

Wonderful! Any advice you can give would certainly be welcome.

> I was quite surprised to read that the web and database servers are on the
> same machine. Simply separating the two (even onto two equivalent, slower
> machines), which it looks like you're going to do, usually makes a
> noticeable difference, allowing each machine to do its job optimally
> rather than hopping about trying to do both. Definitely put the MySQL server
> on the better machine.

Jimbo, listen to this man: he speaks things that are true! :)

> Another fairly easy thing you can do to cut out the slow queries is to
> activate slow query logging. This only logs 'slow' queries and you can
> quickly tell quite a lot from this, and work on these queries as a priority.

I've had this on a couple of times in the past, but seem to have forgotten
to put the option into the bootup scripts. I'll start it up again and show
off a current slow log at the end of the day.

While we're speaking of slow queries, I'll just mention on the list a
couple of tweaks I've made (not in CVS, as SourceForge CVS is down): I
noticed that some of our slow queries came from the Allpages list, which
sorts on both namespace & title, which we have only as separate indices.
For the time being I've changed it to sort only on title, which saves a
huge sorting step. A composite index should be added later.
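
For anyone curious why the composite index matters, here is a small Python
sketch. It uses SQLite (from the standard library) purely as a stand-in,
since MySQL's optimizer and EXPLAIN output differ, and the table, column and
index names are hypothetical rather than the real schema. It prints the
query plan for a listing sorted on namespace and title, first with two
separate indices and then with a composite one.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's plan for a statement as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the article table.
conn.execute("CREATE TABLE articles (namespace INTEGER, title TEXT, body TEXT)")
conn.execute("CREATE INDEX idx_namespace ON articles (namespace)")
conn.execute("CREATE INDEX idx_title ON articles (title)")

LISTING = "SELECT namespace, title FROM articles ORDER BY namespace, title"

# With only separate indices, the rows cannot be read in final order,
# so the engine has to sort them itself.
plan_separate = query_plan(conn, LISTING)

# A composite index already stores rows in (namespace, title) order.
conn.execute("CREATE INDEX idx_namespace_title ON articles (namespace, title)")
plan_composite = query_plan(conn, LISTING)

print("separate indices:", plan_separate)
print("composite index: ", plan_composite)
```

With separate indices the plan should mention a temporary B-tree used for
the ORDER BY; with the composite index that step should disappear because
the index can be walked in order. The MySQL-side analogue would be checking
EXPLAIN for a filesort before and after adding the index.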

Also, the generator for the RSS feed of the last 10 changes was dog
slow. I've set it to use the Recentchanges table, which has rather sped
it up.

[snip]
> I'd also like to take a look at the my.cnf file,
> if that's possible, and to know the hardware of the db server.
> That's something that can be optimized quite quickly, and may help a bit, if
> it's not been done already.

That would be *great*. The switches are dark voodoo to me.

CPU: 2x Athlon MP 1700+
RAM: 2 GB
Disk: single SCSI hard disk (some IBM model, I forget exactly; 10,000 rpm)
OS: Red Hat Linux 7.2 (kernel 2.4.7)

I've posted our my.cnf to this list before but I can't find it at the
moment. Attached is the current one.

Now I've got to get back to my paper; if anyone sees me online in the
next few hours, please intervene. :)

-- brion vibber (brion @ pobox.com)