Mailing List Archive

Random Page works a little bit strange
Hi,

I just played a little bit with the "random page" feature on the German
Wikipedia and read an article about "Masatoshi Koshiba". After I hit the
link for the random page again I was directed to "Skins" which seems ok,
but after that I came to "Raymond Davis Jr.
<http://de.wikipedia.org/wiki/Raymond_Davis_Jr.>" who was linked as "see
also" on the page of Koshiba. IMHO the random page is not that random at
all, on the other hand this could really be a random coincidence. Maybe
some of you have an idea if the random mode needs to be modulated...
Regards

Thomas aka Urbanus
Re: Random Page works a little bit strange
On Tue, 3 Jun 2003, Thomas Luft wrote:

> I just played a little bit with the "random page" feature on the German
> Wikipedia and read an article about "Masatoshi Koshiba". After I hit the
> link for the random page again I was directed to "Skins" which seems ok,
> but after that I came to "Raymond Davis Jr.
> <http://de.wikipedia.org/wiki/Raymond_Davis_Jr.>" who was linked as "see
> also" on the page of Koshiba. IMHO the random page is not that random at
> all, on the other hand this could really be a random coincidence. Maybe
> some of you have an idea if the random mode needs to be modulated...
> Regards

The random page on the English one isn't random either. You will regularly get
the same page again and again if you try a few times.

This is a shame, because one of the most impressive introductions to Wikipedia
is to have someone hit the random page link for a while. The repeating pages
kind of burst the bubble. :)

-- Daniel
Re: Random Page works a little bit strange
On Tue, 3 Jun 2003, Hr. Daniel Mikkelsen wrote:
> On Tue, 3 Jun 2003, Thomas Luft wrote:
> The random page on the English one isn't random either. You will regularly get
> the same page again and again if you try a few times.

Grrr.... looks like the random indices are all off again; somehow MySQL's
rand() function is biasing high the way we've been using it, and there are
_very_ few articles set with lower indexes (<0.25), so those few get
pulled up way too often. I've just told it to redo all the random indexes
on the german wiki in a lump; I'll reset them on the English wiki later
tonight when traffic is lower.

I've also gone ahead and replaced the random seed generator in the wiki
and changed Special:Random to use its own random number instead of asking
for one from MySQL. I don't trust MySQL anymore. :) And I took out the
reset-index-on-load, which was probably trouble.

-- brion vibber (brion @ pobox.com)
Re: Random Page works a little bit strange
Brion wrote:
>On Tue, 3 Jun 2003, Hr. Daniel Mikkelsen wrote:
> > On Tue, 3 Jun 2003, Thomas Luft wrote:
> > The random page on the English one isn't random either. You will regularly get
> > the same page again and again if you try a few times.
>
>Grrr.... looks like the random indices are all off again; somehow MySQL's
>rand() function is biasing high the way we've been using it, and there are
>_very_ few articles set with lower indexes (<0.25), so those few get
>pulled up way too often. I've just told it to redo all the random indexes
>on the german wiki in a lump; I'll reset them on the English wiki later
>tonight when traffic is lower.

Unless you've already fixed it, the English cur_random column is still fine.

SELECT cur_random FROM cur WHERE cur_random>0.01 ORDER BY cur_random LIMIT 10

returns...

0.0100032491702617
0.010005122059961
0.010018127048405
0.0100242663268226
0.0100461980421526
0.0100568952132546
0.0100595876668204
0.0100729866047138
0.0100769354124339
0.0100776087586559

Sounds to me like you fixed the English one after I first described the
cause of the problem a month ago, but you didn't fix the other languages. If
in fact the English cur_random was stuffed up again, and you fixed it before
I ran the above query, I want to know about it. I consider this a pet bug of
mine now.

>I've also gone ahead and replaced the random seed generator in the wiki
>and changed Special:Random to use its own random number instead of asking
>for one from MySQL. I don't trust MySQL anymore. :) And I took out the
>reset-index-on-load, which was probably trouble.

Why not just go the whole hog and use a noisy diode? ;)

I was pretty confident I worked out the problem last time around. I even
wrote a little program simulating the behaviour of the previous version of
Special:Randompage. It's attached. Compile it with "g++ drift_test.cc" and
watch all those "random" numbers gravitate towards 1.0 like it's a hot woman
at a party or something.
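
The attachment is not included in this archive. As a rough stand-in, and not Tim's drift_test.cc, the Python sketch below simulates the behaviour as it is described elsewhere in this thread: rows are scanned in ascending cur_random order, a fresh random number is drawn for each row, the first row whose index exceeds it is served, and that row then gets a new random index on load. The function names, table size and hit count are assumptions chosen for illustration.

```python
import random


def pick_and_reset(indices, rng):
    """One simulated hit on the old random-page logic (an assumption
    reconstructed from the thread, not the real wiki code)."""
    # Walk the rows in ascending order of their random index.
    order = sorted(range(len(indices)), key=indices.__getitem__)
    for row in order:
        # A fresh random number is drawn for every row that is examined.
        if indices[row] > rng.random():
            # The "reset index on load" step: the served page is re-randomised.
            indices[row] = rng.random()
            return row
    return None  # no row matched; very unlikely for a table of any size


def simulate(n_pages=500, n_hits=5000, seed=1):
    """Return how many pages sit below 0.25 before and after n_hits hits."""
    rng = random.Random(seed)
    indices = [rng.random() for _ in range(n_pages)]
    before = sum(1 for x in indices if x < 0.25)
    for _ in range(n_hits):
        pick_and_reset(indices, rng)
    after = sum(1 for x in indices if x < 0.25)
    return before, after


if __name__ == "__main__":
    before, after = simulate()
    print("pages with index < 0.25 before:", before)
    print("pages with index < 0.25 after: ", after)
```

In this model the low end of the table is served far more often than it is refilled, so the number of pages with an index below 0.25 should shrink as hits accumulate.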

Hr. Daniel Mikkelsen <daniel@copyleft.no> wrote:
<snip>
>The random page on the English one isn't random either. You will regularly get
>the same page again and again if you try a few times.

This should have been fixed a month ago, when Brion reset the index. Have
you checked since then?

-- Tim Starling.

Re: Random Page works a little bit strange
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 04 June 2003 00:06, Tim Starling wrote:
> Unless you've already fixed it, the English cur_random column is
> still fine.
>
> SELECT cur_random FROM cur WHERE cur_random>0.01 ORDER BY cur_random
> LIMIT 10

Nope, it got broke again. Try restricting the query to the pages
Special:Randompage is concerned with:

SELECT cur_random FROM cur WHERE cur_random>0.01 and cur_namespace=0 and
cur_is_redirect=0 ORDER BY cur_random LIMIT 10

Not only did it take two minutes to return (mmm, partial table scans),
but there are *huge* gaps:
0.010371912280444
0.0106440456683226
0.0130615625652127
0.0138753941411855
0.0403562412041763
0.0404494861517562
0.044624416199163
0.0523233311738105
0.0542095322666779
0.0586327482560954

When I checked this afternoon, there were only 82 article-space
non-redirect pages with cur_random values less than 0.22something. 82
out of a hundred some thousand... so, no wonder that the page with the
particular value was coming up repeatedly in my Random page clicks.

I'm regenerating the index again now...

- -- brion vibber (brion @ pobox.com)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE+3aHsxVlOmwh1xjgRAndLAJ0ZFP/tmSufIlj17ZJ+54G0iS5JcQCggZ19
W43mKGD12iMF8kIEV62mLlE=
=aky3
-----END PGP SIGNATURE-----
Re: Random Page works a little bit strange
On Wed, 4 Jun 2003, Tim Starling wrote:

> Hr. Daniel Mikkelsen <daniel@copyleft.no> wrote:

>> The random page on the English one isn't random either. You will regularly
>> get the same page again and again if you try a few times.

> This should have been fixed a month ago, when Brion reset the index. Have
> you checked since then?

Yes, this happened to me about a week ago.

-- Daniel
Re: Random Page works a little bit strange
Sorry to Pine users...

Taking the SQL query used in Special:Randompage from CVS and modifying it very slightly...

SELECT cur_id,cur_title,cur_random
FROM cur USE INDEX (cur_random)
WHERE cur_namespace=0 AND cur_is_redirect=0
AND cur_random>RAND()
ORDER BY cur_random
LIMIT 20

returns...

cur_id  cur_title                     cur_random
124125  Pierce,_Nebraska              0.0030205277754185
205997  Wagh_el_Birket                0.00385735184313483
120605  Custer_Township,_Minnesota    0.00416424684614339
131375  Lorane,_Pennsylvania          0.00439120363853053
150887  Columbiana,_Ohio              0.00589350611520326
53913   Castle_Rock                   0.00614019670164231
10438   Komyo                         0.00616735406794339
131027  Newberg,_Oregon               0.00645017624502087
120060  Hartland_Township,_Minnesota  0.00903007575220435
126590  Osceola,_New_York             0.00905275718220766


It doesn't always return the same articles, but they're always very low-numbered. I don't know about you, but I would call that a MySQL bug. This is good because it means Brion has already fixed the problem.

May I make a suggestion, while we're on the topic? How about changing the query to:

SELECT cur_id,cur_title,cur_random
FROM cur USE INDEX (cur_random)
WHERE cur_namespace=0 AND cur_is_redirect=0
AND cur_random>{$rand} AND cur_user<>3903 AND cur_user<>6120
ORDER BY cur_random
LIMIT 20


which will skip anything last edited by Ram-Man or Rambot. Like Daniel Mikkelsen said, the most important function for Special:Randompage is to impress passers-by. We should rig it any way we can to make Wikipedia look better.

-- Tim Starling
Re: Re: Random Page works a little bit strange
On Wed, 4 Jun 2003, Tim Starling wrote:
> Taking the SQL query used in Special:Randompage from CVS and modifying
> it very slightly...
[snip]
> It doesn't always return the same articles, but they're always very
> low-numbered. I don't know about you, but I would call that a MySQL
> bug.

Hmm... let's simplify this further:

+---------------------+
| cur_random          |
+---------------------+
| 0.00257335324080042 |
| 0.00301321596187839 |
| 0.00409562141084636 |
| 0.00434284564512115 |
| 0.00447388831942704 |
| 0.00527506415292161 |
| 0.00677824021017015 |
| 0.00724384654987962 |
| 0.00791455340377479 |
| 0.00809311867513984 |
| 0.00832060632139501 |
| 0.00845975429607532 |
| 0.00916914828975606 |
| 0.00930567272874124 |
|   0.010219381200354 |
|   0.010613451721718 |
|   0.011154617193299 |
|  0.0122322952488738 |
|  0.0126715852065679 |
|  0.0127805173516092 |
+---------------------+

This quite consistently gives me results in the 0.001-0.017 range.

The results are presumably already sorted by the use of the index, but
it's definitely odd that they seem to so consistently come out so small.
Because, we *don't* see that if we grab a rand() value as a column:
20;
+---------------------+-------------------+
| cur_random          | rand()            |
+---------------------+-------------------+
| 0.00180685892869426 | 0.059753977749268 |
| 0.00333090965014967 |  0.27524542461638 |
| 0.00345821027034727 | 0.083287411446951 |
| 0.00541453902182592 |   0.1797017885183 |
| 0.00616005901820963 |  0.10850227168622 |
| 0.00718451943917621 |  0.44687699754432 |
| 0.00725775678386703 |  0.24804242723439 |
|  0.0073513565653482 |  0.66955343696247 |
| 0.00753892400072787 |  0.58810817505057 |
| 0.00818642974662262 |  0.35786299627075 |
| 0.00856924430333939 |  0.92427461121629 |
| 0.00867950265172823 |  0.58906755278731 |
| 0.00916074124086717 |  0.16777823601642 |
| 0.00939816703032532 |  0.19605916291108 |
| 0.00979022216963603 |  0.10278878091163 |
|  0.0102785686126711 |  0.27433319694766 |
|  0.0103007052189677 |  0.76059719990995 |
|  0.0105801159614512 |  0.25284009636644 |
|  0.0111034736140663 |  0.53778274221139 |
|  0.0113199194998666 | 0.046922132416556 |
+---------------------+-------------------+

Note that the rand() we WHERE with and the rand() we SELECT are separate
invocations of the function, and don't return the same result as each
other.

Hmm, let's look at the docs:

Note that a RAND() in a WHERE clause will be re-evaluated every time the
WHERE is executed. RAND() is not meant to be a perfect random
generator, but instead a fast way to generate ad hoc random numbers
that will be portable between platforms for the same MySQL version.
-- http://www.mysql.com/doc/en/Mathematical_functions.html

*WHAM WHAM WHAM*

Now, let's think what that means. We're selecting for cur_random. It uses
the index on cur_random, so it's going to sort starting from the
infinitesimally small end, but it can't use a constant to index by because
the WHERE clause is a function -- we have to scan. For each row it makes up
a random number, and sees if this row is at least as big. If yes, it puts
the row in the return queue. If no, it goes to the next row and makes up
another random number.

Some portion of those small-numbered rows are going to match, and at some
point we fill up our quota and return the matching rows.

AAAAAAGGGGGHHHHH!!!!! :)

It's not a MySQL _bug_, just a very non-intuitive behavior which leads to
over-biasing to the low-end when we misuse it (and the updates to 'stir
the pot' would thus tend to depopulate the low-end and bias the value
distribution high). Generating one random number ourselves and giving it
to mysql as a constant, as my recent update does, should solve this.
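
The difference between the two query shapes can be sketched in a few lines of Python. This is a simulation under assumptions, not MySQL itself: pick_per_row stands in for cur_random>RAND() evaluated once per scanned row, and pick_constant for a single number generated up front and handed over as a constant. All names and sizes are made up for illustration.

```python
import bisect
import random


def pick_per_row(sorted_idx, rng):
    """Scan in index order, comparing each row against a fresh random number."""
    for value in sorted_idx:
        if value > rng.random():
            return value
    return None


def pick_constant(sorted_idx, rng):
    """Draw one threshold, then take the first row strictly above it."""
    threshold = rng.random()
    pos = bisect.bisect_right(sorted_idx, threshold)
    return sorted_idx[pos] if pos < len(sorted_idx) else None


def average_pick(picker, n_rows=10000, trials=2000, seed=42):
    """Average index value returned by `picker` over many simulated hits."""
    rng = random.Random(seed)
    sorted_idx = sorted(rng.random() for _ in range(n_rows))
    picks = [picker(sorted_idx, rng) for _ in range(trials)]
    picks = [p for p in picks if p is not None]
    return sum(picks) / len(picks)


if __name__ == "__main__":
    print("per-row RAND():     ", average_pick(pick_per_row))
    print("constant threshold: ", average_pick(pick_constant))
```

The per-row variant should report an average close to zero, while the constant-threshold variant should land near the middle of the range.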


> May I make a suggestion, while we're on the topic? How about changing the query to:
[snip]
> AND cur_random>{$rand} AND cur_user<>3903 AND cur_user<>6120
[snip]
> which will skip anything last edited by Ram-Man or Rambot. Like Daniel Mikkelsen said, the most important function for Special:Randompage is to impress passers-by. We should rig it any way we can to make Wikipedia look better.

That's awfully specific to be hard-coding. :)

-- brion vibber (brion @ pobox.com)
Re: Re: Random Page works a little bit strange
>> AND cur_random>{$rand} AND cur_user<>3903 AND cur_user<>6120

> That's awfully specific to be hard-coding. :)

Add a settings variable that's a partial SQL query. I.e.,
in LocalSettings, put something like:

$wgRestrictRandom = "cur_user<>3903 AND cur_user<>6120";

then in DefaultSettings, put:

$wgRestrictRandom = "TRUE";

then the query becomes:

WHERE ($randomQuery) AND ($wgRestrictRandom)

Any individual wiki-specific hack can be done this way while
keeping all source files identical except for LocalSettings.
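
To show how the pieces would fit together, here is a small sketch in Python rather than the wiki's PHP. DEFAULT_RESTRICT_RANDOM and build_random_query are hypothetical names, and the 12-decimal formatting of the threshold is an assumption.

```python
# Stand-in for the DefaultSettings value: a clause that restricts nothing.
DEFAULT_RESTRICT_RANDOM = "TRUE"


def build_random_query(rand_value, restrict=DEFAULT_RESTRICT_RANDOM):
    """Combine the fixed random-page conditions with a per-wiki restriction."""
    base = ("cur_namespace=0 AND cur_is_redirect=0 "
            "AND cur_random>{:.12f}".format(rand_value))
    # Both halves are parenthesised so an OR inside `restrict`
    # cannot change the meaning of the base conditions.
    return ("SELECT cur_id,cur_title FROM cur USE INDEX (cur_random) "
            "WHERE ({}) AND ({}) "
            "ORDER BY cur_random LIMIT 1".format(base, restrict))


if __name__ == "__main__":
    print(build_random_query(0.5))
    print(build_random_query(0.5, "cur_user<>3903 AND cur_user<>6120"))
```

The restriction string is spliced into the query verbatim, so it only makes sense as trusted site configuration, never as user input.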

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Re: Random Page works a little bit strange
Tim Starling wrote:

> Taking the SQL query used in Special:Randompage from CVS and modifying
> it very slightly...
>
> SELECT cur_id,cur_title,cur_random
> FROM cur USE INDEX (cur_random)
> WHERE cur_namespace=0 AND cur_is_redirect=0
> AND cur_random>RAND()
> ORDER BY cur_random
> LIMIT 20
>

I guess that's why this doesn't work.
MySQL processes the query in the following order:

1. Fetching
2. Ordering
3. Limiting


1. MySQL fetches every row whose cur_random is greater than a freshly
generated RAND() (it generates a new RAND() on every fetch). On average
this should return half of the rows in the table.
2. MySQL orders the fetched rows by cur_random in ascending order.
3. MySQL returns the 20 rows with the lowest cur_random from that subset.

-- WeißNix
Re: Random Page works a little bit strange
I see that you techies are discussing how to tweak Random Page
behaviour, but the Polish Wikipedia is suffering a bit: when Random Page
is pressed, the following message appears. The other main 'pedias do not
report any error. Can you correct that quickly? Thanks in advance.

Regards
Youandme

--------------------------------------------------------

A syntax error has occurred in a database query. This may have been
caused by a badly formed search query (see Searching Wikipedia) or by a
bug in the software. The last, failed query was:

SELECT cur_id,cur_title FROM cur USE INDEX (cur_random) WHERE
cur_namespace=0 AND cur_is_redirect=0 AND cur_random>0,172539821906
ORDER BY cur_random LIMIT 1

sent by the function "wfSpecialRandompage". MySQL reported the error
"1064: You have an error in your SQL syntax. Check the manual that
corresponds to your MySQL server version for the right syntax to use
near '172539821906 ORDER BY
cur_random LIMIT 1' at line 4".
Re: Random Page works a little bit strange
On Fri, 6 Jun 2003, Youandme wrote:
(on Polish wiki)
> SELECT cur_id,cur_title FROM cur USE INDEX (cur_random) WHERE
> cur_namespace=0 AND cur_is_redirect=0 AND cur_random>0,172539821906

Ah, I do so love PHP's transparent support for localization. :P

I'll see if I can work around that...

-- brion vibber (brion @ pobox.com)
Re: Random Page works a little bit strange
On Fri, 6 Jun 2003, Brion Vibber wrote:
> On Fri, 6 Jun 2003, Youandme wrote:
> (on Polish wiki)
> > SELECT cur_id,cur_title FROM cur USE INDEX (cur_random) WHERE
> > cur_namespace=0 AND cur_is_redirect=0 AND cur_random>0,172539821906
>
> Ah, I do so love PHP's transparent support for localization. :P
>
> I'll see if I can work around that...

I've switched from variable interpolation with an implicit float->string
conversion to using the number_format() function, where I can explicitly
set the decimal separator to "." to keep MySQL from choking when set in a
locale with "," as the decimal separator.
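
The failure mode and the fix can be illustrated with a short Python sketch. It is not the PHP change itself: format_threshold and query_parses are hypothetical helpers, the 12 decimal places are an assumption, and SQLite from the Python standard library stands in for MySQL purely to show that a comma in the number makes the statement unparseable.

```python
import sqlite3


def format_threshold(value, decimal_sep="."):
    """Render a float with 12 decimals and an explicit decimal separator."""
    return "{:.12f}".format(value).replace(".", decimal_sep)


def query_parses(threshold_text):
    """Return True if the random-page query accepts this threshold text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE cur (cur_id INTEGER, cur_random REAL)")
        conn.execute(
            "SELECT cur_id FROM cur WHERE cur_random>" + threshold_text +
            " ORDER BY cur_random LIMIT 1")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    for sep in (".", ","):
        text = format_threshold(0.172539821906, sep)
        print(text, "parses" if query_parses(text) else "is rejected")
```

Pinning the separator at the point where the number becomes text keeps the query independent of whatever locale the server process happens to run under.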

Seems happy so far...

-- brion vibber (brion @ pobox.com)
Re: Random Page works a little bit strange
On 6 Jun 2003 at 11:42, Brion Vibber wrote:

> I've switched from variable interpolation with an implicit float->string
> conversion to using the number_format() function, where I can explicitly
> set the decimal separator to "." to keep MySQL from choking when set in a
> locale with "," as the decimal separator.
>
> Seems happy so far...

:)
As happy as me and others
Thank you very much
Youandme