Mailing List Archive

Getting rid of filesort?
It's the sorts on some of our queries that are so damn slow. A few
hundred or a couple thousand rows can take *a couple minutes* for it to
sort on the timestamp field.

I've just put in a hacky manual sort (in CVS) for the page history; all
rows are returned without sorting (so in ID order, usually but not
always chronological) and then output in reverse order. It's *immensely*
faster. Try loading up the history on the Village Pump or Votes for
Deletion, and watch it actually come up! Kinda nice. :)

I also took the opportunity to drop in the see next X links, so you
don't have to load all 2500+ items at once on your (well, my) 56k modem.
(These are used pretty generally, and the code really ought to be moved
out of SearchEngine.php and into globalfunctions...)

Of course, it would be *nicer* to do the ORDER BY and LIMIT in mysql.

The old query:
EXPLAIN SELECT old_id,old_namespace,old_title,old_user,
-> old_comment,old_user_text,old_timestamp,old_minor_edit FROM old
-> WHERE old_namespace=4 AND
-> old_title='Village_pump'
-> ORDER BY old_timestamp DESC;
+-------+------+---------------+---------------+---------+-------------+------+----------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+---------------+---------+-------------+------+----------------------------+
| old | ref | old_namespace | old_namespace | 256 | const,const | 1900 | where used; Using filesort |
+-------+------+---------------+---------------+---------+-------------+------+----------------------------+

The new query:
EXPLAIN SELECT old_id,old_namespace,old_title,old_user,
-> old_comment,old_user_text,old_timestamp,old_minor_edit FROM old
-> WHERE old_namespace=4 AND
-> old_title='Village_pump';
+-------+------+---------------+---------------+---------+-------------+------+------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+---------------+---------+-------------+------+------------+
| old | ref | old_namespace | old_namespace | 256 | const,const | 1900 | where used |
+-------+------+---------------+---------------+---------+-------------+------+------------+

Now, in an ideal world, it could do the sorting based on that handy
index it has on old_timestamp. In the topsy-turvy world of MySQL,
however, only the index used for the WHERE clause means anything. Other
ways to speed it up?

http://www.mysql.com/doc/en/ORDER_BY_optimisation.html

Oh, yes, and I changed the index from old_title to a combo on
old_namespace and old_title. This should cut down the number of rows a
little bit where a page and its talk page are both long; we never WHERE
the two columns separately on the old table, only together.

SHOW INDEX FROM old;
+-------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment |
+-------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+---------+
| old | 0 | old_id | 1 | old_id | A | 744949 | NULL | NULL | |
| old | 1 | old_timestamp | 1 | old_timestamp | A | 744949 | NULL | NULL | |
| old | 1 | old_user | 1 | old_user | A | 14 | NULL | NULL | |
| old | 1 | old_user_text | 1 | old_user_text | A | 74494 | NULL | NULL | |
| old | 1 | old_namespace | 1 | old_namespace | A | 14 | NULL | NULL | |
| old | 1 | old_namespace | 2 | old_title | A | 148989 | NULL | NULL | |
+-------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+---------+

And, since I was torturing things with slowness anyway, old is now
InnoDB. Whoop.

-- brion vibber (brion @ pobox.com)
Re: Getting rid of filesort? [ In reply to ]
More ugly hacks... User contribs now checks the recentchanges table
first, then moves on to 'cur' and 'old' if it hasn't hit its quota yet.
(See in CVS.)

This makes it actually possible to look at contribs lists for very busy
users. Yay!

-- brion vibber (brion @ pobox.com)
Re: Getting rid of filesort? [ In reply to ]
On Mon, 2003-02-03 at 11:03, Brion Vibber wrote:
> More ugly hacks... User contribs now checks the recentchanges table
> first, then moves on to 'cur' and 'old' if it hasn't hit its quota yet.
> (See in CVS.)
>
> This makes it actually possible to look at contribs lists for very busy
> users. Yay!

Cool. I think there should be a "My edits" link in the sidebar now that
it's reasonably fast. It's too useful to have it hidden like that. Any
objections?

It would also be neat to use the same paging style as in the history.

Regards,

Erik
--
FOKUS - Fraunhofer Insitute for Open Communication Systems
Project BerliOS - http://www.berlios.de
Re: Getting rid of filesort? [ In reply to ]
On Sun, Feb 02, 2003 at 11:49:05PM -0800, Brion Vibber wrote:
>
> I've just put in a hacky manual sort (in CVS) for the page history; all
> rows are returned without sorting (so in ID order, usually but not
> always chronological) and then output in reverse order. It's *immensely*
> faster. Try loading up the history on the Village Pump or Votes for
> Deletion, and watch it actually come up! Kinda nice. :)
>
> I also took the opportunity to drop in the see next X links, so you
> don't have to load all 2500+ items at once on your (well, my) 56k modem.
> (These are used pretty generally, and the code really ought to be moved
> out of SearchEngine.php and into globalfunctions...)
>
> Of course, it would be *nicer* to do the ORDER BY and LIMIT in mysql.
>
> The old query:
> EXPLAIN SELECT old_id,old_namespace,old_title,old_user,
> -> old_comment,old_user_text,old_timestamp,old_minor_edit FROM old
> -> WHERE old_namespace=4 AND
> -> old_title='Village_pump'
> -> ORDER BY old_timestamp DESC;
> +-------+------+---------------+---------------+---------+-------------+------+----------------------------+
> | table | type | possible_keys | key | key_len | ref | rows | Extra |
> +-------+------+---------------+---------------+---------+-------------+------+----------------------------+
> | old | ref | old_namespace | old_namespace | 256 | const,const | 1900 | where used; Using filesort |
> +-------+------+---------------+---------------+---------+-------------+------+----------------------------+
>
> The new query:
> EXPLAIN SELECT old_id,old_namespace,old_title,old_user,
> -> old_comment,old_user_text,old_timestamp,old_minor_edit FROM old
> -> WHERE old_namespace=4 AND
> -> old_title='Village_pump';
> +-------+------+---------------+---------------+---------+-------------+------+------------+
> | table | type | possible_keys | key | key_len | ref | rows | Extra |
> +-------+------+---------------+---------------+---------+-------------+------+------------+
> | old | ref | old_namespace | old_namespace | 256 | const,const | 1900 | where used |
> +-------+------+---------------+---------------+---------+-------------+------+------------+
>
> Now, in an ideal world, it could do the sorting based on that handy
> index it has on old_timestamp. In the topsy-turvy world of MySQL,
> however, only the index used for the WHERE clause means anything. Other
> ways to speed it up?

Actually, that is not MySQL's fault but just a consequence of how indices
work; the only way in which an index can help with sorting is that it in
some sense presorts the records for you, but that is irrelevant if you also
use other indices (as you do here) because they will order the records in a
different way.

Anyway, there is actually a "trick" that you can use here to get always the
chronological order and that is to define an in index on (old_namespace,
old_title, old_timestamp) and forcing MySQL to use that index as in
(assuming that this indes is named old_namespace_2):

SELECT old_id, old_timestamp
FROM old
USE INDEX (old_namespace_2)
WHERE old_namespace=0 AND old_title='Village_pump'
ORDER BY old_timestamp;

then EXPLAIN will give you:

+-------+------+-------------------------------+-----------------+---------+-------------+------+------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+-------------------------------+-----------------+---------+-------------+------+------------+
| old | ref | old_namespace,old_namespace_2 | old_namespace_2 | 256 | const,const | 29 | where used |
+-------+------+-------------------------------+-----------------+---------+-------------+------+------------+

As you can see there is no sorting and that is because it uses the fact that
the index already presorts the records for you. Because MySQL knows that
this index can also be used as an index on (namespace, title) and even on
(namespace) you can drop the indices on those combinations. I'm not sure if
you can actually omit the USE but I know that if the other index is still
around on (namespace, title) it will usually pick that index, and then it
still has to sort.

You can use this method in most cases where we sort on timestamp, and you
might even consider storing the timestamps negated (unfortunately MySQL
doesn't have descending indices) in an extra redundant column and use that
column for the index. Then you wouldn't even have to reverse the list and
could efficiently do a query that selects that last 100 or so without
reading the full list into memory.

-- Jan Hidders
Re: Getting rid of filesort? [ In reply to ]
On lun, 2003-02-03 at 16:59, Jan Hidders wrote:
> Anyway, there is actually a "trick" that you can use here to get always the
> chronological order and that is to define an in index on (old_namespace,
> old_title, old_timestamp) and forcing MySQL to use that index

Okay, finally got around to trying this (on test wiki). The EXPLAIN
results are looking very promising -- it gets rid of the "using
filesort" and is presumably sorting with the index... *but only if
sorting forwards*.

If it were just the order the results come in, no big deal, but we need
to grab the last X rows and I don't know of another way to do that than
a reverse sort and LIMIT. No one wants to transfer all 15,000 of mav's
edits from the db just to display the last 50! (Well, I suppose you
could do two queries, one just to count, then one forward-sorted with
LIMIT N-X,X. But, that's ugly and increases load on the db.)

MySQL 3.x doesn't support using an index to sort backwards. However, if
I'm reading the docs correctly*, MySQL 4.x _does_. This may be a strong
argument in favor of upgrading.

* http://www.mysql.com/doc/en/ORDER_BY_optimisation.html

-- brion vibber (brion @ pobox.com)
Re: Getting rid of filesort? [ In reply to ]
I can't access the CVS server right now, so I've attached the
experimental index tweaks and sample queries for contribs and history.

-- brion vibber (brion @ pobox.com)