Mailing List Archive

What's up with the lag?
It is worse now than it was during the Slashdotting, especially with Konqueror.
Sometimes all I get for hours are timeouts.

Does this have anything to do with the new table structures Brion installed a
few days ago? Everything was working fine before then.

--mav
Re: What's up with the lag? [ In reply to ]
On Wed, 29 Jan 2003, Daniel Mayer wrote:
> It is worse now than it was during the Slashdotting, especially with Konqueror.
> Sometimes all I get for hours are timeouts.
>
> Does this have anything to do with the new table structures Brion installed a
> few days ago? Everything was working fine before then.

"Everything was working fine before then"? You have a short memory, my
friend. :)

As I've said before, it's going to take some time to get everything
tweaked just right.

-- brion vibber (brion @ pobox.com)
Re: What's up with the lag? [ In reply to ]
On Wed, Jan 29, 2003 at 10:26:31AM -0800, Daniel Mayer wrote:
> It is worse now than it was during the Slashdotting, especially with Konqueror.
> Sometimes all I get for hours are timeouts.
>
> Does this have anything to do with the new table structures Brion
> installed a few days ago? Everything was working fine before then.

Is it correct that we are now using the locking mechanism in MySQL? (I
haven't been following this list for a while.) If so, then this is probably
causing all the delays. Was there a specific reason for starting to use
locking?

-- Jan Hidders
Re: What's up with the lag? [ In reply to ]
> It is worse now than it was during the Slashdotting, especially with
> Konqueror. Sometimes all I get for hours are timeouts.

> Does this have anything to do with the new table structures Brion installed
> a few days ago? Everything was working fine before then.

Yes, it's related to the InnoDB updates. Once we get the problems fixed,
it should be a lot faster than before, though. The problems should
absolutely be browser-independent; please try to verify that.

Regards,

Erik
Re: What's up with the lag? [ In reply to ]
> Is it correct that we are now using the locking mechanism in MySQL? (I
> haven't been following this list for a while.) If so, then this is probably
> causing all the delays. Was there a specific reason for starting to use
> locking?

No. We switched to the InnoDB table type, which uses row-level locking
instead of MyISAM's table locking. So in general, things should get much
faster.
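
The conversion itself is just an ALTER per table, along these lines (a
sketch, not the exact statements we ran; TYPE= is the old MySQL syntax):

ALTER TABLE cur TYPE=InnoDB;
ALTER TABLE recentchanges TYPE=InnoDB;

('old' and 'searchindex' stay MyISAM; the fulltext index on searchindex
requires it.)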

Regards,

Erik
Re: What's up with the lag? [ In reply to ]
On Wed, Jan 29, 2003 at 10:43:00PM +0100, Erik Moeller wrote:
> > Is it correct that we are now using the locking mechanism in MySQL? (I
> > haven't been following this list for a while.) If so, then this is probably
> > causing all the delays. Was there a specific reason for starting to use
> > locking?
>
> No. We switched to the InnoDB table type, which uses row-level locking
> instead of MyISAM's table locking. So in general, things should get much
> faster.

Not necessarily; row-level locking sometimes causes a *lot* more overhead
and can block *more* than table-level locking. I assume you have read the
manual, but for good measure I will quote the relevant part anyway:

> Cons:
>
> * Takes more memory than page level or table locks.
> * Is slower than page level or table locks when used on a big part of
> the table, because one has to do many more locks.
> * Is definitely much worse than other locks if you often do GROUP BY
> on a large part of the data or if one has to often scan the whole
> table.
> * With higher level locks one can also more easily support locks of
> different types to tune the application as the lock overhead is less
> notable as for row level locks.

Especially if you are also going to use transactions that span more than 1
SQL statement (are you?) things may even get considerably worse.

-- Jan Hidders

.---------------------------------------------------------------------.
| Post-doctoral researcher e-mail: jan.hidders@ua.ac.be |
| Dept. Math. & Computer Science tel: (+32) 3 820 2427 |
| University of Antwerp, room J1.06 fax: (+32) 3 820 2421 |
| (UIA) Universiteitsplein 1, B-2610 Wilrijk - Antwerpen (Belgium) |
`---------------------------------------------------------------------'
Re: What's up with the lag? [ In reply to ]
On mer, 2003-01-29 at 14:15, Jan Hidders wrote:
> On Wed, Jan 29, 2003 at 10:43:00PM +0100, Erik Moeller wrote:
> > > Is it correct that we are now using the locking mechanism in MySQL? (I
> > > haven't been following this list for a while.) If so, then this is probably
> > > causing all the delays. Was there a specific reason for starting to use
> > > locking?
> >
> > No. We switched to the InnoDB table type, which uses row-level locking
> > instead of MyISAM's table locking. So in general, things should get much
> > faster.
>
> Not necessarily; row-level locking sometimes causes a *lot* more overhead
> and can block *more* than table-level locking. I assume you have read the
> manual, but for good measure I will quote the relevant part anyway:

If you have concrete suggestions on how to reform our code, I'd *love*
to hear them. I say this in all seriousness; I freely admit that I'm new
at this database stuff, and am unsure of the best course of action.

Our slow locks currently seem to be happening on 'old' and
'searchindex', both MyISAM tables -- so no row-level locks on them. When
one thread is stuck reading the table and other threads are sitting
waiting for it (or rather, waiting for another thread which is waiting to
write the table), killing the stuck thread springs things back into
action.

'old' is usually being queried for the list of previous edits for a
given page, or sometimes user contribs; these queries have no joins, and
are WHERE'd with = on indexed fields. It appears to be properly indexed,
so I'm unsure why it sometimes takes so long. It's rarely written to
except for an INSERT on each page save.
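
A typical history query is shaped roughly like this (a sketch from
memory; the real field list is longer):

SELECT old_id, old_user_text, old_timestamp, old_comment
FROM old
WHERE old_namespace={$ns} AND old_title='{$title}'
ORDER BY old_timestamp DESC;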

'searchindex' has a row REPLACEd into it on every page save, and has
SELECTs run over most of the table quite frequently (fulltext MATCH
AGAINST), with a join on the integer index to the cur table. As an
experiment last night, I tried taking out the join and grabbing the cur
data in a separate query; it didn't improve performance any, so the join
doesn't seem to be to blame.
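
The search query is shaped roughly like this (again a sketch, with the
field list trimmed):

SELECT cur_id, cur_namespace, cur_title
FROM searchindex, cur
WHERE MATCH(si_text) AGAINST('english civil war')
AND cur_id=si_page;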

> > Cons:
> >
> > * Takes more memory than page level or table locks.
> > * Is slower than page level or table locks when used on a big part of
> > the table, because one has to do many more locks.
> > * Is definitely much worse than other locks if you often do GROUP BY
> > on a large part of the data or if one has to often scan the whole
> > table.

This is (or should be) relatively rare; the vast majority of selects are
either on one row (check the existence of an article) or on a specific
subset (grab rows belonging to one or a particular set of articles).

GROUP BY does turn up in a couple of places:
* Recentchangeslinked (where it seems to be redundant, and should only
operate on a small subset of pages)
* Undelete (sysops only, rarely used, on a small table)
* Wantedpages (currently disabled)

> > * With higher level locks one can also more easily support locks of
> > different types to tune the application as the lock overhead is less
> > notable as for row level locks.

Examples?

> Especially if you are also going to use transactions that span more than 1
> SQL statement (are you?) things may even get considerably worse.

If you mean chunks of statements actually wrapped in BEGIN/COMMIT
blocks, then there are only a couple of these at present, both done on
page save. The first should be quite lightweight in terms of rows, doing
a 1-row update to 'cur', a 1-row insert to 'recentchanges', and two more
updates to 'recentchanges', which hit one row and however many rows refer
to the particular article, respectively.
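
In outline, the first block is (a sketch, with the SET lists and values
elided):

BEGIN;
UPDATE cur SET ... WHERE cur_id={$id};
INSERT INTO recentchanges (...) VALUES (...);
UPDATE recentchanges SET ... WHERE ...; -- hits 1 row
UPDATE recentchanges SET ... WHERE ...; -- hits rows referring to this article
COMMIT;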

The other does deletes and insertions on the links and brokenlinks
tables, and may hit a larger number of rows when touching oft-linked
pages. I've not yet seen any evidence that anything is blocking on locks
on the link tables.

-- brion vibber (brion @ pobox.com)
Re: What's up with the lag? [ In reply to ]
On Wed, Jan 29, 2003 at 03:55:16PM -0800, Brion Vibber wrote:
> On mer, 2003-01-29 at 14:15, Jan Hidders wrote:
> >
> > Not necessarily; row-level locking sometimes causes a *lot* more overhead
> > and can block *more* than table-level locking. I assume you have read the
> > manual, but for good measure I will quote the relevant part anyway:
>
> If you have concrete suggestions on how to reform our code, I'd *love*
> to hear them. I say this in all seriousness; I freely admit that I'm new
> at this database stuff, and am unsure of the best course of action.

Ok, sorry for being critical in an unhelpful way. But I would still want to
know if there was a concrete problem that you wanted to solve with row-level
locking, or would going back to full MyISAM be an option? My gut feeling is
that row locking really doesn't help much because our access patterns are a
lot of small reads (fetching pages), a few big reads (recent changes et
cetera) and relatively rare small updates that don't mind if they have to
wait a few seconds.

> Our slow locks currently seem to be happening on 'old' and
> 'searchindex', both MyISAM tables -- so no row-level locks on them.

Hmm, that is very mysterious indeed, and I have to think a little longer
about why that could be. The only thing I can think of now would be to
consider using INSERT DELAYED so that the updates don't wait on the SELECTs.
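
That is, something like this (sketch):

INSERT DELAYED INTO old (old_namespace, old_title, old_text, ...)
VALUES ({$ns}, '{$title}', '{$text}', ...);

The server queues the row and returns immediately, so the writer never
sits behind the readers' table locks. (DELAYED only works with
MyISAM-type tables, which 'old' and 'searchindex' are.)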

> 'old' is usually being queried for the list of previous edits for a
> given page, or sometimes user contribs; these queries have no joins, and
> are WHERE'd with = on indexed fields. It appears to be properly indexed,
> so I'm unsure why it sometimes takes so long.

What does EXPLAIN say? Is it using the indices properly?

> > Especially if you are also going to use transactions that span more than
> > 1 SQL statement (are you?) things may even get considerably worse.
>
> If you mean chunks of statements actually wrapped in BEGIN/COMMIT blocks,
> then there are only a couple of these at present, both done on page save.
> The first should be quite lightweight in terms of rows, doing a 1-row
> update to 'cur', a 1-row insert to 'recentchanges', and two more updates
> to 'recentchanges', which hit one row and however many rows refer to the
> particular article, respectively.

Yes, OK, that doesn't sound too bad, but I was a little bit afraid that you
had also included the MyISAM tables in the transaction.

-- Jan Hidders

.---------------------------------------------------------------------.
| Post-doctoral researcher e-mail: jan.hidders@ua.ac.be |
| Dept. Math. & Computer Science tel: (+32) 3 820 2427 |
| University of Antwerp, room J1.06 fax: (+32) 3 820 2421 |
| (UIA) Universiteitsplein 1, B-2610 Wilrijk - Antwerpen (Belgium) |
`---------------------------------------------------------------------'
Re: What's up with the lag? [ In reply to ]
On mer, 2003-01-29 at 17:25, Jan Hidders wrote:
> On Wed, Jan 29, 2003 at 03:55:16PM -0800, Brion Vibber wrote:
> > On mer, 2003-01-29 at 14:15, Jan Hidders wrote:
> > >
> > > Not necessarily; row-level locking sometimes causes a *lot* more overhead
> > > and can block *more* than table-level locking. I assume you have read the
> > > manual, but for good measure I will quote the relevant part anyway:
> >
> > If you have concrete suggestions on how to reform our code, I'd *love*
> > to hear them. I say this in all seriousness; I freely admit that I'm new
> > at this database stuff, and am unsure of the best course of action.
>
> Ok, sorry for being critical in an unhelpful way. But I would still want to
> know if there was a concrete problem that you wanted to solve with row-level
> locking, or would going back to full MyISAM be an option? My gut feeling is
> that row locking really doesn't help much because our access patterns are a
> lot of small reads (fetching pages), a few big reads (recent changes et
> cetera) and relatively rare small updates that don't mind if they have to
> wait a few seconds.

Well, an update (counter increment) comes with every page view, and we
often have several edits per minute.
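
For the record, the counter bump is a one-row write per view, roughly
(column names from memory, may differ):

UPDATE cur SET cur_counter=cur_counter+1 WHERE cur_id={$id};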

> > Our slow locks currently seem to be happening on 'old' and
> > 'searchindex', both MyISAM tables -- so no row-level locks on them.
>
> Hmm, that is very mysterious indeed, and I have to think a little longer
> about why that could be. The only thing I can think of now would be to
> consider using INSERT DELAYED so that the updates don't wait on the SELECTs.

I made the searchindex updates REPLACE DELAYED last night; it may have
helped a bit.

I would try making the 'old' inserts INSERT DELAYED too, but currently we
need to get the autoincrement id of the new insert in order to reference
it in the recentchanges table (in Article::updateArticle()).
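
That is, the code depends on the connection's insert id:

INSERT INTO old (...) VALUES (...);
-- needed right away to build the recentchanges row:
SELECT LAST_INSERT_ID();

and a DELAYED insert returns before the row is actually written, so
LAST_INSERT_ID() gives nothing usable.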

> > 'old' is usually being queried for the list of previous edits for a
> > given page, or sometimes user contribs; these queries have no joins, and
> > are WHERE'd with = on indexed fields. It appears to be properly indexed,
> > so I'm unsure why it sometimes takes so long.
>
> What does EXPLAIN say? Is it using the indices properly?

They _look_ okay to me.

Let's try the history of [[Talk:Ellipse]]:
+-------+------+---------------+-----------+---------+-------+------+----------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+------+---------------+-----------+---------+-------+------+----------------------------+
| old | ref | old_title | old_title | 20 | const | 31 | where used; Using filesort |
+-------+------+---------------+-----------+---------+-------+------+----------------------------+

Sounds good... Village pump bumps the rows up to 600-something, still a tiny fraction of the table.

Search for "English civil war":
+-------------+----------+----------------------+---------+---------+---------------------+------+------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------------+----------+----------------------+---------+---------+---------------------+------+------------+
| searchindex | fulltext | si_page,si_text | si_text | 0 | | 1 | where used |
| cur | eq_ref | cur_id,cur_namespace | cur_id | 4 | searchindex.si_page | 1 | where used |
+-------------+----------+----------------------+---------+---------+---------------------+------+------------+

No difference in the EXPLAIN results if we reduce to a single search term.
Note that the number of rows listed here is bogus when InnoDB tables are
involved; this may be true of fulltext search as well. (In theory it's
not scanning individual rows, but a big glob 'o' indexy goodness, so I
guess it would be meaningless.)

If we cut out the join to cur:
+-------------+----------+---------------+---------+---------+------+------+------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------------+----------+---------------+---------+---------+------+------+------------+
| searchindex | fulltext | si_text | si_text | 0 | | 1 | where used |
+-------------+----------+---------------+---------+---------+------+------+------------+


> > > Especially if you are also going to use transactions that span more than
> > > 1 SQL statement (are you?) things may even get considerably worse.
> >
> > If you mean chunks of statements actually wrapped in BEGIN/COMMIT blocks,
> > then there are only a couple of these at present, both done on page save.
> > The first should be quite lightweight in terms of rows, doing a 1-row
> > update to 'cur', a 1-row insert to 'recentchanges', and two more updates
> > to 'recentchanges', which hit one row and however many rows refer to
> > the particular article, respectively.
>
> Yes, OK, that doesn't sound too bad, but I was a little bit afraid that
> you had also included the MyISAM tables in the transaction.

Shouldn't be. However, on page save we do have to wait for the INSERT to
the old table to finish before we move on to the part that is wrapped in
a transaction, since we need the insert id.

-- brion vibber (brion @ pobox.com)
Re: What's up with the lag? [ In reply to ]
On Wed, Jan 29, 2003 at 06:46:22PM -0800, Brion Vibber wrote:
> On mer, 2003-01-29 at 17:25, Jan Hidders wrote:
> > On Wed, Jan 29, 2003 at 03:55:16PM -0800, Brion Vibber wrote:
> > > On mer, 2003-01-29 at 14:15, Jan Hidders wrote:
> > > >
> > > > Not necessarily; row-level locking sometimes causes a *lot* more overhead
> > > > and can block *more* than table-level locking. I assume you have read the
> > > > manual, but for good measure I will quote the relevant part anyway:
> > >
> > > If you have concrete suggestions on how to reform our code, I'd *love*
> > > to hear them. I say this in all seriousness; I freely admit that I'm new
> > > at this database stuff, and am unsure of the best course of action.
> >
> > Ok, sorry for being critical in an unhelpful way. But I would still want to
> > know if there was a concrete problem that you wanted to solve with row-level
> > locking, or would going back to full MyISAM be an option? My gut feeling is
> > that row locking really doesn't help much because our access patterns are a
> > lot of small reads (fetching pages), a few big reads (recent changes et
> > cetera) and relatively rare small updates that don't mind if they have to
> > wait a few seconds.
>
> Well, an update (counter increment) comes with every page view, and we
> often have several edits per minute.

True, I had forgotten about those. But these are really tiny updates, and
requesting a row lock is sometimes implemented such that it excludes other lock
requests on the table until the lock is granted or refused. Remember that a
lock request is in some sense also an update operation and one that
necessarily has to use a coarser locking protocol than the one it
implements.

Sorry I can't be of more help right now; I really have to get to my work.
I'll see if I can find some time this evening. I did look at the results
of EXPLAIN and I agree that they look OK.

-- Jan Hidders

.---------------------------------------------------------------------.
| Post-doctoral researcher e-mail: jan.hidders@ua.ac.be |
| Dept. Math. & Computer Science tel: (+32) 3 820 2427 |
| University of Antwerp, room J1.06 fax: (+32) 3 820 2421 |
| (UIA) Universiteitsplein 1, B-2610 Wilrijk - Antwerpen (Belgium) |
`---------------------------------------------------------------------'
Re: What's up with the lag? [ In reply to ]
Brion Vibber wrote:
> If you have concrete suggestions on how to reform our code, I'd *love*
> to hear them. I say this in all seriousness; I freely admit that I'm new
> at this database stuff, and am unsure of the best course of action.

Yeah, Jan really knows this stuff, too. So I'd like to second this
motion.

--Jimbo
Re: What's up with the lag? [ In reply to ]
Brion Vibber wrote:
> Well, an update (counter increment) comes with every page view, and we
> often have several edits per minute.

What's the wikipedia-purpose of a counter increment on every page
view? Just to know which pages are popular? We could get the same
information easily enough from a log analysis program, which I could
offload to a less-busy server. Or we could just do without the
information easily enough. It's not really that useful. Or am I
missing something?

Even if this is thought by Jan and others to be only a small part of
the performance problem, it's probably philosophically good for us to
get in the habit of thinking "unless this feature is free or really
really useful, we should do without it". That is, we already have a
bad case of feature-itis, and taking a hard line on such things might
be very helpful.

--Jimbo
Re: What's up with the lag? [ In reply to ]
On Don, 2003-01-30 at 12:17, Jimmy Wales wrote:
> Brion Vibber wrote:
> > Well, an update (counter increment) comes with every page view, and we
> > often have several edits per minute.
>
> What's the wikipedia-purpose of a counter increment on every page
> view? Just to know which pages are popular? We could get the same
> information easily enough from a log analysis program, which I could
> offload to a less-busy server. Or we could just do without the
> information easily enough. It's not really that useful. Or am I
> missing something?
>
> Even if this is thought by Jan and others to be only a small part of
> the performance problem, it's probably philosophically good for us to
> get in the habit of thinking "unless this feature is free or really
> really useful, we should do without it". That is, we already have a
> bad case of feature-itis, and taking a hard line on such things might
> be very helpful.

Yeah, the counters are wildly inaccurate anyway because they've been
disabled so often. Let's just get rid of them entirely. A most popular
ranking by URL would still be neat though.

Regards,

Erik
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
Re: What's up with the lag? [ In reply to ]
On Thu, Jan 30, 2003 at 12:37:34PM +0100, Erik Moeller wrote:
> On Don, 2003-01-30 at 12:17, Jimmy Wales wrote:
> > Brion Vibber wrote:
> > > Well, an update (counter increment) comes with every page view, and we
> > > often have several edits per minute.
> >
> > What's the wikipedia-purpose of a counter increment on every page
> > view? Just to know which pages are popular? We could get the same
> > information easily enough from a log analysis program, which I could
> > offload to a less-busy server. Or we could just do without the
> > information easily enough. It's not really that useful. Or am I
> > missing something?
> >
> > Even if this is thought by Jan and others to be only a small part of
> > the performance problem, it's probably philosophically good for us to
> > get in the habit of thinking "unless this feature is free or really
> > really useful, we should do without it". That is, we already have a
> > bad case of feature-itis, and taking a hard line on such things might
> > be very helpful.
>
> Yeah, the counters are wildly inaccurate anyway because they've been
> disabled so often. Let's just get rid of them entirely. A most popular
> ranking by URL would still be neat though.

I couldn't agree more. Whatever type of locking we are going to use,
restricting ourselves to read-only operations for page views is better for
concurrency and performance.

-- Jan Hidders

.---------------------------------------------------------------------.
| Post-doctoral researcher e-mail: jan.hidders@ua.ac.be |
| Dept. Math. & Computer Science tel: (+32) 3 820 2427 |
| University of Antwerp, room J1.06 fax: (+32) 3 820 2421 |
| (UIA) Universiteitsplein 1, B-2610 Wilrijk - Antwerpen (Belgium) |
`---------------------------------------------------------------------'
Re: What's up with the lag? [ In reply to ]
On ĵaŭ, 2003-01-30 at 03:17, Jimmy Wales wrote:
> What's the wikipedia-purpose of a counter increment on every page
> view? Just to know which pages are popular? We could get the same
> information easily enough from a log analysis program, which I could
> offload to a less-busy server. Or we could just do without the
> information easily enough. It's not really that useful. Or am I
> missing something?

I wouldn't be sad to see them go. The counters are fairly arbitrary; they
don't indicate how much time has passed since they began. They include
hits from search engine spiders and the same people passing over the
page multiple times by accident.

The only thing they're used for is the count at the bottom of the page
view, and the Popular Pages list. And they're inaccurate for both.

-- brion vibber (brion @ pobox.com)
Re: No more new (major) features until jan 2004 ? [ In reply to ]
Jimmy Wales wrote:

> it's probably philosophically good for us to
> get in the habit of thinking "unless this feature is free or really
> really useful, we should do without it". That is, we already have a
> bad case of feature-itis, and taking a hard line on such things might
> be very helpful.


Very much agree!
No more new features until Jan 2004 (or so). Then allow 4 new ones or so.

Mathematics is difficult enough. Don't create more headaches.
Programming / developing software IS mathematics in the end.

Let's agree on things like: no more than doubling the amount of source code
within 5 years (or so).

Wikipedia content can self-organise very well (Wiki is great!), but
metaWiki should become more 'pre'-structured. Designing democratic
structures, institutions, etc. is also quite 'technical', so this seems
the place for it. (I have no account yet on wikipedia-l, but I'll soon
get one.)

Sorry for bothering,
Pieter Suurmond
Re: Re: No more new (major) features until jan 2004 ? [ In reply to ]
On Fri, Jan 31, 2003 at 05:20:49AM +0100, Pieter Suurmond wrote:
> Jimmy Wales wrote:
> > it's probably philosophically good for us to
> > get in the habit of thinking "unless this feature is free or really
> > really useful, we should do without it". That is, we already have a
> > bad case of feature-itis, and taking a hard line on such things might
> > be very helpful.
>
> Very much agree!
> No more new features until Jan 2004 (or so). Then allow 4 new ones or so.
>
> Mathematics is difficult enough. Don't create more headache.
> Programming / developing software IS mathematics in the end.
>
> Let's agree on things like: no more than doubling the amount of sourcecode
> within 5 years (or so).
>
> Wikipedia-content can be self-organising very well, Wiki is great!,
> but metaWiki should become more 'pre'-structured. Designing democratic
> structures, institutions, etc. is also quite 'technical', this seems
> the place (I have no account yet on wikipedia-l, but I'll soon get one.)
>
> Sorry for bothering,
> Pieter Suurmond

No, it's all completely wrong.

We need lots of new features, like:
support for SVG, conversion to good printable formats (PS and PDF),
exporting Wikipedia to the text-only dict format, better mirroring (that
would just transfer daily diffs or so), speech synthesis support,
including generating pronunciations of words in actual sound files,
support for rendering non-Latin scripts into PNGs (most browsers don't
have CJK, Arabic etc. fonts, and it would be nice if users could see them
anyway), multilingual accounts, recent changes and all that stuff, moving
all Wikipedias to UTF-8 with some support for broken browsers, a spelling
checker, support for an external WYSIWYG editor (a mode for Emacs or
something), a <code> tag that would allow downloading examples (as
separate files or as a gzipped tarball generated on the fly), support for
XHTML and MathML, and a lot more.

And programming is not math at all.
Re: Re: No more new (major) features until jan 2004 ? [ In reply to ]
> No more new features until Jan 2004 (or so). Then allow 4 new ones or so.

Forget it. There's plenty of new stuff we need (Tomasz has touched upon
it), and most of it is not a problem from a performance perspective. What
we do need to do is discuss for every new feature what impact it might
have on performance. That should be a priority.

Regards,

Erik
Re: Re: No more new (major) features until jan 2004 ? [ In reply to ]
Erik Moeller wrote:
> Forget it. There's plenty of new stuff we need (Tomasz has touched
> upon it), and most of it is not a problem from a performance
> perspective. What we do need to do is discuss for every new feature
> what impact it might have on performance. That should be a priority.

Absolutely.
Re: Optimizing and tuning [ In reply to ]
Jan Hidders wrote:
> On Sat, Feb 01, 2003 at 01:14:00PM +0100, Erik Moeller wrote:
> > Jan Hidders wrote:
> > >
> > > I suspect the biggest part of the involved tables is in the cache
> > > anyway,
> >
> > I don't trust automatic caching, especially since our CUR table is huge
> > and it is only a very small subset of it (the titles) which we need in
> > memory.
>
> For an existence query it only needs the index, not the table itself. Did
> you check if we are doing something silly here so that the database has to
> access the table anyway? What does EXPLAIN say here?

Sorry for replying to myself, but I just saw in CVS that the SQL is
something like this:

SELECT HIGH_PRIORITY cur_id
FROM cur
WHERE cur_namespace={$ns} AND cur_title='{$title}'

(with the title run through wfStrencode() in the PHP source)

but if I look in buildTables.inc we have separate indexes for namespace and
title. That should really be a combined index, and that holds everywhere
we do a look-up with namespace + title. Do we have those?
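
If not, adding one would be a one-liner of this shape (sketch; the index
name is arbitrary):

ALTER TABLE cur ADD INDEX name_title (cur_namespace, cur_title);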

And I'm not sure if we really need the resulting cur_id. If we don't, then
indeed only the index is needed and that would really speed up the query.

-- Jan Hidders

.---------------------------------------------------------------------.
| Post-doctoral researcher e-mail: jan.hidders@ua.ac.be |
| Dept. Math. & Computer Science tel: (+32) 3 820 2427 |
| University of Antwerp, room J1.06 fax: (+32) 3 820 2421 |
| (UIA) Universiteitsplein 1, B-2610 Wilrijk - Antwerpen (Belgium) |
`---------------------------------------------------------------------'
Re: Re: Optimizing and tuning [ In reply to ]
On sab, 2003-02-01 at 07:12, Jan Hidders wrote:
> Sorry for replying to myself, but I just saw in CVS that the SQL is
> something like this:
>
> SELECT HIGH_PRIORITY cur_id
> FROM cur
> WHERE cur_namespace={$ns} AND cur_title='{$title}'
>
> but if I look in buildTables.inc we have separate indexes for namespace and
> title. That should really be a combined index, and that holds everywhere
> we do a look-up with namespace + title. Do we have those?

A combined index might make sense. In most cases it won't significantly
cut down the number of rows, but for some pages (the main page, current
events, user:maveric149) it could take out a couple hundred rows by
separating the page from its talk page.

Also, a vaguely related question on the watchlist query. Currently the
return-whole-watchlist query looks like:
SELECT DISTINCT
cur_id,cur_namespace,cur_title,cur_comment,
cur_user,cur_user_text,cur_timestamp,cur_minor_edit,cur_is_new
FROM cur,watchlist
WHERE wl_user={$uid} AND wl_title=cur_title
AND (cur_namespace=wl_namespace OR cur_namespace=wl_namespace+1)
ORDER BY cur_timestamp DESC {$dolimit}

I did an 'explain' using my uid. (I have 1370 titles in my watchlist;
the larger # of estimated returned rows is presumably due to the weird
InnoDB optimizer, which is known to sometimes return wrong row counts.)

+-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+
| watchlist | ref | wl_user | wl_user | 4 | const | 3264 | where used; Using index; Using temporary; Using filesort |
| cur | ref | cur_namespace,cur_title | cur_title | 255 | watchlist.wl_title | 1 | where used |
+-----------+------+-------------------------+-----------+---------+--------------------+------+----------------------------------------------------------+

Is there a better way of doing this? We have a combined namespace/title
index on watchlist; but we want to pick up two variants in the
namespace. In most cases, just using the title as index on cur should
then be sufficient (it's relatively rare for titles to exist in multiple
groups of namespaces, mostly just in one namespace and the X_talk: next
to it).

> And I'm not sure if we really need the resulting cur_id. If we don't, then
> indeed only the index is needed and that would really speed up the query.

99% of the time, we don't; we're just checking for existence. It should
be sufficient to just have a boolean 'exists/doesn't exist' field. When
we're checking existence, is it most efficient to SELECT the indexed
field, or COUNT(*), or what?

-- brion vibber (brion @ pobox.com)
Re: Re: Optimizing and tuning [ In reply to ]
On Sat, Feb 01, 2003 at 11:10:21AM -0800, Brion Vibber wrote:
>
> Also, a vaguely related question on the watchlist query. Currently the
> return-whole-watchlist query looks like:
> SELECT DISTINCT
> cur_id,cur_namespace,cur_title,cur_comment,
> cur_user,cur_user_text,cur_timestamp,cur_minor_edit,cur_is_new
> FROM cur,watchlist
> WHERE wl_user={$uid} AND wl_title=cur_title
> AND (cur_namespace=wl_namespace OR cur_namespace=wl_namespace+1)
> ORDER BY cur_timestamp DESC {$dolimit}

I would suggest the same trick here as in my previous mail, but you would
have to split up the query for each namespace and then do some
postprocessing in PHP, merging the two lists and cutting them off if the
total is beyond the requested limit.
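
Roughly (a sketch, with the field list trimmed):

SELECT cur_id, cur_namespace, cur_title, cur_timestamp
FROM cur, watchlist
WHERE wl_user={$uid}
AND cur_namespace=wl_namespace AND cur_title=wl_title
ORDER BY cur_timestamp DESC {$dolimit}

and then the same again with cur_namespace=wl_namespace+1; merge the two
sorted lists in PHP and cut at the limit.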

> > And I'm not sure if we really need the resulting cur_id. If we don't,
> > then indeed only the index is needed and that would really speed up the
> > query.
>
> 99% of the time, we don't; we're just checking for existence. It should
> be sufficient to just have a boolean 'exists/doesn't exist' field. When
> we're checking existence, is it most efficient to SELECT the indexed
> field, or COUNT(*), or what?

Just a SELECT 1 FROM ... will do. And as I said, in that case only the
index is needed to answer the query, and since the cache hit rate for the
indices is very high, this should really speed up the queries for the
links in the documents.
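
For example (sketch):

SELECT 1 FROM cur
WHERE cur_namespace={$ns} AND cur_title='{$title}'
LIMIT 1;

With a combined namespace+title index this can be answered from the
index alone, without touching the table rows.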

-- Jan Hidders