The topology of the Wikipedias is very interesting.
The first question is: what is the distribution of the number of hops
needed to reach an article from the Main page?
The attached script gives an approximate answer to this question.
It requires the MySQL database used by the PHP wiki software, and libmysql-ruby.
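The Ruby script itself is not reproduced here. The hop count it reports is the shortest link distance from the Main page, which is what a breadth-first search computes. Below is a minimal Python sketch. It assumes the link table has already been loaded into a dict mapping each page title to the titles it links to. The function name `hop_distribution` and this data layout are hypothetical and are not taken from the attached script.

```python
from collections import deque

def hop_distribution(links, start):
    """Histogram of shortest link distances from `start` to every page.

    `links` maps each page title to the titles it links to (a hypothetical
    in-memory copy of the wiki's link table). Pages that the breadth-first
    search never reaches are counted under -1.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in dist:          # first visit = shortest distance
                dist[target] = dist[page] + 1
                queue.append(target)
    histogram = {}
    for page in links:                      # count every known page once
        d = dist.get(page, -1)
        histogram[d] = histogram.get(d, 0) + 1
    return histogram

# Toy graph: nothing links to "D", so it is unreachable from the Main Page.
links = {"Main Page": ["A", "B"], "A": ["C"], "B": [], "C": [], "D": ["A"]}
print(hop_distribution(links, "Main Page"))  # {0: 1, 1: 2, 2: 1, -1: 1}
```

Because BFS visits pages in order of distance, the first time a page is seen gives its shortest hop count, so each page is processed once.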
Data for the Polish Wikipedia (hops from the Main page; -1 means the
article cannot be reached at all):
Hops  Articles   Share
  -1       602  12.76%
   0         1   0.02%
   1       113   2.40%
   2       886  18.78%
   3      2367  50.17%
   4       600  12.72%
   5       126   2.67%
   6        16   0.34%
   7         5   0.11%
   8         2   0.04%
Total     4718
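Each share in the table is simply the bucket's count divided by the total number of articles. The sketch below recomputes the shares from the published counts and confirms that the buckets add up to 4718. The helper name `shares` is my own.

```python
# Article counts per hop distance for the Polish Wikipedia, from the table above
counts = {-1: 602, 0: 1, 1: 113, 2: 886, 3: 2367, 4: 600, 5: 126, 6: 16, 7: 5, 8: 2}

def shares(counts):
    """Return the total and each bucket's percentage, rounded to two decimals."""
    total = sum(counts.values())
    return total, {hops: round(100 * n / total, 2) for hops, n in counts.items()}

total, pct = shares(counts)
print(total)      # 4718
print(pct[-1])    # 12.76
print(pct[3])     # 50.17
```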
Results:
* It is yet to be determined how much the results are affected by
treating empty pages, redirects, and user and talk pages as normal pages.
* The number of pages that are not reachable at all is very high
(602 of 4718, about 12.8%).
* If a page is reachable, it is usually reachable in just a few hops;
half of all articles are exactly three hops away.
* Adding more links to the pages linked from the Main page seems to be
the best way of improving the results.
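The second and third observations can be checked by restricting the table to reachable pages only. The sketch below does this. The function name and the 3-hop cutoff are my own choices and are not part of the original script. With these counts, about 81.8% of reachable pages lie within three hops of the Main page, and the mean distance is just under three hops.

```python
# Hop counts for the Polish Wikipedia (distance -1 marks unreachable pages)
counts = {-1: 602, 0: 1, 1: 113, 2: 886, 3: 2367, 4: 600, 5: 126, 6: 16, 7: 5, 8: 2}

def reachable_summary(counts, max_hops=3):
    """Summarize only the pages with a finite distance from the Main page."""
    reachable = {h: n for h, n in counts.items() if h >= 0}
    n_reachable = sum(reachable.values())
    within = sum(n for h, n in reachable.items() if h <= max_hops)
    mean = sum(h * n for h, n in reachable.items()) / n_reachable
    return n_reachable, 100 * within / n_reachable, mean

n, pct_within, mean_hops = reachable_summary(counts)
print(n)                       # 4116
print(round(pct_within, 1))    # 81.8
print(round(mean_hops, 2))     # 2.96
```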
I'm especially interested in results from the English (the biggest) and
Esperanto (which has a different linking philosophy) Wikipedias.
Information from the Spanish, German, and other Wikipedias would also be
interesting, but I don't expect it to differ much from the Polish
results.