Mailing List Archive

Mystery of most-viewed pages on En.Wikipedia
Does the Wikimedia Foundation's technology team have any insight or
comment on the finding that (other than the Wikipedia Main Page and
the "404 error" page), in September the most popular page on the
English Wikipedia was "Mathematical descriptions of opacity", with
over 5.1 million views? There was no discernible "bump" in interest
in opacity due to outside news events or a book or movie release on
the subject.

The phenomenon is outlined here:
http://www.examiner.com/wiki-edits-in-national/wikipedia-s-top-10-most-viewed-articles-september-2011

Do you think this is some sort of malicious probing activity by a hacker, or is
it perhaps the deliberate testing of a developer employed by the WMF?

Thank you,

Greg

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: Mystery of most-viewed pages on En.Wikipedia
If I remember correctly, stats collection is imperfect, and that results in
some odd numbers.

That is just my memory of why it looks like that.

Tom

On 4 October 2011 14:10, Gregory Kohs <thekohser@gmail.com> wrote:

> Does the Wikimedia Foundation's technology team have any insight or
> comment on the finding that (other than the Wikipedia Main Page and
> the "404 error" page), in September the most popular page on the
> English Wikipedia was "Mathematical descriptions of opacity", with
> over 5.1 million views? There was no discernible "bump" in interest
> in opacity due to outside news events or a book or movie release on
> the subject.
>
> The phenomenon is outlined here:
>
> http://www.examiner.com/wiki-edits-in-national/wikipedia-s-top-10-most-viewed-articles-september-2011
>
> Do you think this is some sort of malicious probing activity by a hacker,
> or is
> it perhaps the deliberate testing of a developer employed by the WMF?
>
> Thank you,
>
> Greg
>
Re: Mystery of most-viewed pages on En.Wikipedia
On Tue, Oct 4, 2011 at 9:10 AM, Gregory Kohs <thekohser@gmail.com> wrote:
> Do you think this is some sort of malicious probing activity by a hacker, or is
> it perhaps the deliberate testing of a developer employed by the WMF?
>

There's no specific testing that I know of. Other than skewing people's
stats, I don't really know what a hacker would gain from skewing the
"most viewed articles" list.

-Chad

Re: Mystery of most-viewed pages on En.Wikipedia
On Tue, Oct 4, 2011 at 3:10 PM, Gregory Kohs <thekohser@gmail.com> wrote:
> Does the Wikimedia Foundation's technology team have any insight or
> comment on the finding that (other than the Wikipedia Main Page and
> the "404 error" page), in September the most popular page on the
> English Wikipedia was "Mathematical descriptions of opacity", with
> over 5.1 million views?  There was no discernible "bump" in interest
> in opacity due to outside news events or a book or movie release on
> the subject.
>
> The phenomenon is outlined here:
> http://www.examiner.com/wiki-edits-in-national/wikipedia-s-top-10-most-viewed-articles-september-2011
>
> Do you think this is some sort of malicious probing activity by a hacker, or is
> it perhaps the deliberate testing of a developer employed by the WMF?
>
There seem to have been a lot of page views concentrated around
September 22-26. This could be something as innocent as someone
running a broken bot that's supposed to fetch lots of different
articles but instead fetches the same URL again and again due to a
typo in the code, or it could be as malicious as someone trying to DoS
us in a very simplistic way. I'll look at the sampled logs for those
days and see what I can find.

Roan

Re: Mystery of most-viewed pages on En.Wikipedia
On Tue, Oct 4, 2011 at 3:48 PM, Roan Kattouw <roan.kattouw@gmail.com> wrote:
> There seem to have been a lot of page views concentrated around
> September 22-26.
Missing link here:
http://stats.grok.se/en/201109/Mathematical%20descriptions%20of%20opacity

Roan

Re: Mystery of most-viewed pages on En.Wikipedia
On Tue, Oct 4, 2011 at 3:48 PM, Roan Kattouw <roan.kattouw@gmail.com> wrote:
> There seem to have been a lot of page views concentrated around
> September 22-26. This could be something as innocent as someone
> running a broken bot that's supposed to fetch lots of different
> articles but instead fetches the same URL again and again due to a
> typo in the code, or it could be as malicious as someone trying to DoS
> us in a very simplistic way. I'll look at the sampled logs for those
> days and see what I can find.
I've grepped the sampled (1:1000) Squid logs for September 23rd, 24th
and 25th, and I do indeed see that a vast, vast majority of requests
for that article come from a single IP. In fact, I got output like
this (IP addresses redacted for privacy reasons):

$ zgrep Mathematical_descriptions_of_opacity sampled-1000.log-20110924.gz | cut -d ' ' -f 5 | sort | uniq -c | sort -rn | head
1548 AA.BB.CC.DD
1 EE.FF.GG.HH
1 JJ.KK.LL.MM

which means that in the sampled log (we don't keep full access logs,
only a 1:1000 sample) for September 24th, of the 1550 logged requests,
1548 came from our guy and 2 came from different, random people. This
doesn't mean there were only 1550 visits to that page that day; due to
the sampling, the real number is roughly 1550*1000 = 1.55 million,
which matches the 1.6M reported by stats.grok.se well enough.
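
The scaling described above can be sketched as a small shell function. This is only an illustration, not a script from the thread: the function name and its arguments are made up here, and it assumes space-separated log lines with the client IP in field 5, as the cut command above implies.

```shell
# Sketch: per-IP request counts for one article in a 1:N sampled log,
# scaled back up to a rough estimate of real traffic.
# Assumptions (not from the thread): the function name, its arguments,
# and that uncompressed log lines arrive on stdin.
estimate_views() {
    # $1 = article title as it appears in the URL
    # $2 = sampling factor (1000 for a 1:1000 sample)
    grep -F -- "$1" |
        awk -v rate="$2" '
            { count[$5]++ }          # field 5 = client IP (assumed layout)
            END {
                for (ip in count)
                    printf "%d %d %s\n", count[ip], count[ip] * rate, ip
            }' |
        sort -rn                     # busiest IP first
}
```

It could be fed with something like `zcat sampled-1000.log-20110924.gz | estimate_views Mathematical_descriptions_of_opacity 1000 | head`. Each output line holds the sampled count, the scaled estimate and the IP. The estimate is only as good as the sample, so small counts (1 or 2 sampled hits) say very little about the real number.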

Also, these requests all list
http://en.wikipedia.org/wiki/Snell%27s_law as their referer:
$ zgrep Mathematical_descriptions_of_opacity sampled-1000.log-20110924.gz | grep Snell | wc -l
1548
but the Snell's law article doesn't show any strange access patterns
on stats.grok.se .

So I guess this was just one IP hitting the same article ~1.5 million
times per day for 3-4 days, for whatever reason. That doesn't really
hurt our servers much (unless the article is also edited heavily in
the meantime and contains complex templates that take a long time to
parse, see Jackson, Michael) but obviously it does skew the traffic
stats.
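
A rough way to spot this kind of skew is to check how much of an article's sampled traffic comes from its single busiest IP. The sketch below is an assumption-laden illustration, not an existing tool: the function name is hypothetical, and it again assumes the client IP is in field 5 of space-separated log lines read from stdin.

```shell
# Sketch: report the busiest client IP for one article and its share
# of that article's sampled requests.
# Assumptions (not from the thread): the function name and the output layout.
top_ip_share() {
    # $1 = article title as it appears in the URL
    grep -F -- "$1" |
        awk '
            BEGIN { max = 0; total = 0 }
            { count[$5]++; total++ }     # field 5 = client IP (assumed layout)
            END {
                if (total == 0) exit 1   # no requests matched the title
                for (ip in count)
                    if (count[ip] > max) { max = count[ip]; top = ip }
                # columns: IP, its requests, all requests, share in percent
                printf "%s %d %d %.1f\n", top, max, total, 100 * max / total
            }'
}
```

With the September 24th numbers above (1548 of 1550 sampled requests from one address) the share would come out at about 99.9 percent. What counts as suspicious is a judgment call; a popular article normally has its requests spread over many addresses.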

Roan

Re: Mystery of most-viewed pages on En.Wikipedia
On 04/10/11 15:48, Roan Kattouw wrote:
> On Tue, Oct 4, 2011 at 3:10 PM, Gregory Kohs <thekohser@gmail.com> wrote:
>> Does the Wikimedia Foundation's technology team have any insight or
>> comment on the finding that (other than the Wikipedia Main Page and
>> the "404 error" page), in September the most popular page on the
>> English Wikipedia was "Mathematical descriptions of opacity", with
>> over 5.1 million views? There was no discernible "bump" in interest
>> in opacity due to outside news events or a book or movie release on
>> the subject.
>>
>> The phenomenon is outlined here:
>> http://www.examiner.com/wiki-edits-in-national/wikipedia-s-top-10-most-viewed-articles-september-2011
>>
> There seem to have been a lot of page views concentrated around
> September 22-26. This could be something as innocent as someone
> running a broken bot that's supposed to fetch lots of different
> articles but instead fetches the same URL again and again due to a
> typo in the code, or it could be as malicious as someone trying to DoS
> us in a very simplistic way. I'll look at the sampled logs for those
> days and see what I can find.

My conspiracy theory: someone knew about the upcoming Examiner article
and bumped his favorite topic :)

Re: Mystery of most-viewed pages on En.Wikipedia
Yo,

> So I guess this was just one IP hitting the same article ~1.5 million
> times per day for 3-4 days, for whatever reason.

OMG, if anyone can influence quality journalism on examiner.com that easily, we definitely have to go and build proper analysis of full logs with all the shiny modern technologies and whatever cluster we can build for that. That would be a bump in program spending \o/

(Though it would be nice to notice crap like that. Web activity, not article, I mean).

Domas