On Wed, 4 Jun 2003, Tim Starling wrote:
> Taking the SQL query used in Special:Randompage from CVS and modifying
> it very slightly...
[snip]
> It doesn't always return the same articles, but they're always very
> low-numbered. I don't know about you, but I would call that a MySQL
> bug.
Hmm... let's simplify this further:
+---------------------+
| cur_random |
+---------------------+
| 0.00257335324080042 |
| 0.00301321596187839 |
| 0.00409562141084636 |
| 0.00434284564512115 |
| 0.00447388831942704 |
| 0.00527506415292161 |
| 0.00677824021017015 |
| 0.00724384654987962 |
| 0.00791455340377479 |
| 0.00809311867513984 |
| 0.00832060632139501 |
| 0.00845975429607532 |
| 0.00916914828975606 |
| 0.00930567272874124 |
| 0.010219381200354 |
| 0.010613451721718 |
| 0.011154617193299 |
| 0.0122322952488738 |
| 0.0126715852065679 |
| 0.0127805173516092 |
+---------------------+
This quite consistently gives me results in the 0.001-0.017 range.
The results are presumably already sorted by the use of the index, but
it's definitely odd that they seem to so consistently come out so small.
Because, we *don't* see that if we grab a rand() value as a column:
20;+---------------------+-------------------+
| cur_random | rand() |
+---------------------+-------------------+
| 0.00180685892869426 | 0.059753977749268 |
| 0.00333090965014967 | 0.27524542461638 |
| 0.00345821027034727 | 0.083287411446951 |
| 0.00541453902182592 | 0.1797017885183 |
| 0.00616005901820963 | 0.10850227168622 |
| 0.00718451943917621 | 0.44687699754432 |
| 0.00725775678386703 | 0.24804242723439 |
| 0.0073513565653482 | 0.66955343696247 |
| 0.00753892400072787 | 0.58810817505057 |
| 0.00818642974662262 | 0.35786299627075 |
| 0.00856924430333939 | 0.92427461121629 |
| 0.00867950265172823 | 0.58906755278731 |
| 0.00916074124086717 | 0.16777823601642 |
| 0.00939816703032532 | 0.19605916291108 |
| 0.00979022216963603 | 0.10278878091163 |
| 0.0102785686126711 | 0.27433319694766 |
| 0.0103007052189677 | 0.76059719990995 |
| 0.0105801159614512 | 0.25284009636644 |
| 0.0111034736140663 | 0.53778274221139 |
| 0.0113199194998666 | 0.046922132416556 |
+---------------------+-------------------+
Note that the rand() we WHERE with and the rand() we SELECT are separate
invocations of the function, and don't return the same result as each
other.
Hmm, let's look at the docs:
Note that a RAND() in a WHERE clause will be re-evaluated every time the
WHERE is executed. RAND() is not meant to be a perfect random
generator, but instead a fast way to generate ad hoc random numbers
that will be portable between platforms for the same MySQL version.
--
http://www.mysql.com/doc/en/Mathematical_functions.html *WHAM WHAM WHAM*
Now, let's think what that means. We're selecting for cur_random. It uses
the index on cur_random, so it's going to sort starting from the
infintesimally small end, but can't use a constant to index by because the
WHERE clause is a function -- we have to scan. For each row it makes up a
random number, and sees if this row if at least as big. If yes, it puts
the row in the return queue. If no, it goes to the next row and makes up
another random number.
Some portion of those small-numbered rows are going to match, and at some
point we fill up our quota and return the matching rows.
AAAAAAGGGGGHHHHH!!!!! :)
It's not a MySQL _bug_, just a very non-intuitive behavior which leads to
over-biasing to the low-end when we misuse it (and the updates to 'stir
the pot' would thus tend to depopulate the low-end and bias the value
distribution high). Generating one random number ourselves and giving it
to mysql as a constant, as my recent update does, should solve this.
> May I make a suggestion, while we're on the topic? How about changing the query to:
[snip]
> AND cur_random>{$rand} AND cur_user<>3903 AND cur_user<>6120
[snip]
> which will skip anything last edited by Ram-Man or Rambot. Like Daniel Mikkelsen said, the most important function for Special:Randompage is to impress passers-by. We should rig it any way we can to make Wikipedia look better.
That's awfully specific to be hard-coding. :)
-- brion vibber (brion @ pobox.com)