Mailing List Archive

Topology of Wikipedia
The topology of the Wikipedias is very interesting.

The first question is: what is the distribution of the number of hops
needed to reach an article from the Main page?

The attached script gives an approximate answer to this question.
It requires the PHP database and libmysql-ruby.

Data for the Polish Wikipedia (-1 means the page was not reached at all):

Hops  Pages  Share of total
  -1    602  (12.75964392%)
   0      1  (0.02119542179%)
   1    113  (2.395082662%)
   2    886  (18.7791437%)
   3   2367  (50.16956337%)
   4    600  (12.71725307%)
   5    126  (2.670623145%)
   6     16  (0.3391267486%)
   7      5  (0.1059771089%)
   8      2  (0.04239084358%)
Total  4718
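
For readers without the attachment, a measurement of this kind can be sketched as a breadth-first search from the Main page. This is a minimal Python sketch, not the attached Ruby script: the `links` dictionary (page title mapped to the titles it links to) and the names `hop_distribution` and `format_distribution` are assumptions made for illustration, and reading link data out of the database is left out entirely. Pages that are never reached are counted under -1.

```python
from collections import Counter, deque


def hop_distribution(links, start):
    """Count pages by minimum number of hops from `start` (-1 = not reached)."""
    # Every page that appears as a link source or a link target is counted.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages.add(start)

    # Breadth-first search reaches pages in order of increasing hop count,
    # so the first time a page is seen gives its minimum distance.
    hops = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in hops:
                hops[target] = hops[page] + 1
                queue.append(target)

    return Counter(hops.get(page, -1) for page in pages)


def format_distribution(dist):
    """Render the counts as 'hops count (percent)' lines plus a total."""
    total = sum(dist.values())
    lines = [f"{h} {n} ({100 * n / total:.2f}%)" for h, n in sorted(dist.items())]
    lines.append(f"Total {total}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical toy graph: D and E link only to each other.
    toy_links = {
        "Main page": ["A", "B"],
        "A": ["C"],
        "B": ["C"],
        "D": ["E"],
        "E": ["D"],
    }
    print(format_distribution(hop_distribution(toy_links, "Main page")))
```

In the toy graph, D and E end up in the -1 row even though each of them has an incoming link.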


Results:
* It has yet to be determined how much the results are affected by
treating empty pages, redirects, user pages and talk pages like normal pages.
* The number of pages that are not reachable at all is very high.
* If a page is reachable, it is usually reachable in just a few hops.
* Adding more links to pages linked from the Main page seems to be the
best way of improving the results.

I'm especially interested in results from the English (which is the
biggest) and Esperanto (which has a different linking philosophy)
Wikipedias. Information from the Spanish, German and other Wikipedias
would also be interesting, but I don't expect it to differ much from
the Polish results.
Re: Topology of Wikipedia
Tomasz Wegrzanowski wrote:

>The topology of the Wikipedias is very interesting.
>
>The first question is: what is the distribution of the number of hops
>needed to reach an article from the Main page?
>
>The attached script gives an approximate answer to this question.
>It requires the PHP database and libmysql-ruby.
>
>Data for Polish Wikipedia:
>
>-1 602 (12.75964392%)
>0 1 (0.02119542179%)
>1 113 (2.395082662%)
>2 886 (18.7791437%)
>3 2367 (50.16956337%)
>4 600 (12.71725307%)
>5 126 (2.670623145%)
>6 16 (0.3391267486%)
>7 5 (0.1059771089%)
>8 2 (0.04239084358%)
>Total 4718
>
>
>
Interesting. The English-language Wikipedia claims only 313 orphans (<
1%) out of 34457 articles, not counting redirects or non-comma articles.
Maybe there is a 'closure' effect as the encyclopedia gets bigger? Or
maybe 'real' articles are more likely to be linked?

Neil
Re: Topology of Wikipedia
On Mon, Jul 22, 2002 at 11:56:58PM +0100, Neil Harris wrote:
> Interesting. The English-language Wikipedia claims only 313 orphans (<
> 1%) out of 34457 articles, not counting redirects or non-comma articles.
> Maybe there is a 'closure' effect as the encyclopedia gets bigger? Or
> maybe 'real' articles are more likely to be linked?

The orphan count is something different. The orphan count is 175 on the
Polish Wikipedia.

The orphan count doesn't include redirects, empty pages, user pages or
talk pages. That's good.

But if a group of articles link to each other and are not linked
from any article outside the group, the orphan count doesn't
include them. They are still not accessible, so it should.
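
The difference between the two counts can be shown on a toy graph. The following is a minimal Python sketch under stated assumptions: `links` is a hypothetical dictionary mapping each page title to the titles it links to, `orphans` simply means "no page links here", and `unreachable` means "cannot be reached from the start page by following links". Neither function is taken from the wiki software.

```python
from collections import deque


def all_pages(links):
    """Every page that appears as a link source or a link target."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    return pages


def orphans(links):
    """Pages with no incoming link at all."""
    linked = set()
    for targets in links.values():
        linked.update(targets)
    return all_pages(links) - linked


def unreachable(links, start):
    """Pages that cannot be reached from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return all_pages(links) - seen


# Hypothetical toy graph.
toy_links = {
    "Main page": ["A"],
    "A": [],
    "X": ["Y"],  # X and Y link only to each other
    "Y": ["X"],
    "Z": [],     # nothing links to Z
}

# The start page itself has no incoming link here, so it is left out.
print(sorted(orphans(toy_links) - {"Main page"}))
print(sorted(unreachable(toy_links, "Main page")))
```

Here only Z is an orphan, while X, Y and Z are all unreachable: X and Y each have an incoming link, so an orphan count misses them.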
Re: Topology of Wikipedia
Tomasz Wegrzanowski wrote:

>The orphan count is something different. The orphan count is 175 on the
>Polish Wikipedia.
>
>The orphan count doesn't include redirects, empty pages, user pages or
>talk pages. That's good.
>
>But if a group of articles link to each other and are not linked
>from any article outside the group, the orphan count doesn't
>include them. They are still not accessible, so it should.
>

Oh, I see. They're disconnected sub-graphs not reachable from the root.
That's interesting.
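
One way to look at such sub-graphs is to group the unreachable pages into clusters that are linked among themselves. A minimal Python sketch, assuming a hypothetical `links` dictionary (page title mapped to linked titles) and a set `pages` of titles already known to be unreachable; links are followed in either direction when forming a cluster, which is a choice made for illustration only.

```python
from collections import defaultdict, deque


def clusters(links, pages):
    """Group `pages` into clusters connected by links in either direction."""
    # Keep only links whose two ends are both in `pages`, ignoring direction.
    neighbours = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            if source in pages and target in pages:
                neighbours[source].add(target)
                neighbours[target].add(source)

    seen = set()
    result = []
    for page in sorted(pages):
        if page in seen:
            continue
        # Collect everything connected to this page into one cluster.
        seen.add(page)
        group = {page}
        queue = deque([page])
        while queue:
            current = queue.popleft()
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    group.add(other)
                    queue.append(other)
        result.append(sorted(group))
    return result


# Hypothetical toy graph: two separate islands and one isolated page.
toy_links = {
    "Main page": ["A"],
    "X": ["Y"],
    "Y": ["X"],
    "P": ["Q"],
    "Z": [],
}
print(clusters(toy_links, {"X", "Y", "P", "Q", "Z"}))
```

A large number of small clusters would point to many isolated islands, while one big cluster would suggest that a single missing link keeps a whole region out of reach.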

I wonder what the equivalent figures are for the English-language Wikipedia?

Neil