Mailing List Archive

Wikimedia Projects Growth Animated
This is a joyful day indeed.
I would first like to send my best wishes to Angela and Tim for getting engaged ☺
And Erik Moeller did an awesome job pitching Wikimedia at the Sloan foundation, congratulations!

I met Erik and GerardM two weeks ago, and we discussed all kinds of interesting things.
One of the topics we briefly touched on was animated web stats; we all knew and admired Hans Rosling's wonderful Gapcasts.

I figured Wikimedia could benefit from similar stats.
Of course Java or SVG would be likely candidates for implementation, but both have their pros and cons.
Also, rather than going back to Hans' (and now Google's) GapMinder, I thought I would do some experimentation myself.

The upcoming HTML5 promises exciting, albeit not yet standardized, new functionality.
The new canvas element was introduced first in Safari, but Firefox 3 takes it a few steps further.

So for now my JavaScript-only 'Wikimedia Projects Growth Animated' runs only on Firefox 3 (beta).
Rest assured, there is also a recorded 8 MB Flash file for project Wikipedia.
The idea is to update these animations automatically from the wikistats job.

http://www.infodisiac.com/Wikipedia/Wikistats/

Erik Zachte








_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Re: Wikimedia Projects Growth Animated
Very cool! The growth of the smallest wikis seems to be greatly
over-exaggerated on that scale. Wikis with less than 100 articles take up
roughly 30% of the screen real estate and wikis with less than 10k articles
take up more than half.

On Tue, Mar 25, 2008 at 4:19 PM, Erik Zachte <erikzachte@infodisiac.com>
wrote:

> [..]
> So for now my javascript-only 'Wikimedia Projects Growth Animated' runs
> only on Firefox 3 (beta).
>
> http://www.infodisiac.com/Wikipedia/Wikistats/

Re: Wikimedia Projects Growth Animated
Great tool; I wonder how one could use those stats to spot the flaws of a
particular language Wikipedia.
Once I tried to make a points system to see which Wikipedia is "better"
than another, because the number of articles is a poor criterion. I
considered the total number of articles, the percentage of articles with at
least 0.5 KB, the total number of Wikipedians and the number of very active
users (and other things, I believe). But I am not very good at algorithms,
so I did not go on.
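
A points system along these lines could be sketched as follows. This is only an illustration: the function name `wiki_score`, the equal weights, the log scaling and the sample figures are all assumptions, not anything wikistats actually computes.

```python
import math

def wiki_score(articles, pct_over_half_kb, wikipedians, very_active,
               weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical composite score for one Wikipedia (higher is 'better').

    Counts are log-scaled so that raw size does not swamp everything else;
    the percentage is mapped to the 0..1 range. The weights are arbitrary.
    """
    parts = (
        math.log10(articles + 1),
        pct_over_half_kb / 100.0,
        math.log10(wikipedians + 1),
        math.log10(very_active + 1),
    )
    return sum(w * p for w, p in zip(weights, parts))

# Made-up figures, for illustration only.
many_stubs = wiki_score(articles=250_000, pct_over_half_kb=50,
                        wikipedians=5_000, very_active=100)
fewer_but_longer = wiki_score(articles=200_000, pct_over_half_kb=90,
                              wikipedians=5_000, very_active=100)
print(fewer_but_longer > many_stubs)
```

With these made-up inputs the wiki with fewer but longer articles scores higher, which is the kind of correction to plain article counts described above. How to choose the weights is exactly the open question.
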
Ziko


2008/3/25, Brian <Brian.Mingus@colorado.edu>:
>
> Very cool! The growth of the smallest wikis seems to be greatly
> over-exaggerated on that scale. Wikis with less than 100 articles take up
> roughly 30% of the screen real estate and wikis with less than 10k
> articles
> take up more than half.



--
Ziko van Dijk
Roomberg 30
NL-7064 BN Silvolde
Re: Wikimedia Projects Growth Animated
>
> So for now my javascript-only 'Wikimedia Projects Growth Animated' runs only on Firefox 3 (beta).
> Be assured there is also a recorded 8 Mb Flash file for project Wikipedia.
> The idea is to update these animations automatically from the wikistats job.
>
> http://www.infodisiac.com/Wikipedia/Wikistats/
>
> Erik Zachte
>
Very nice presentation of wikistats. I think about beer... and Cthulhu ;)

przykuta

Re: Wikimedia Projects Growth Animated
Brian wrote:
> Very cool! The growth of the smallest wikis seems to be greatly
> over-exaggerated on that scale. Wikis with less than 100 articles take up
> roughly 30% of the screen real estate and wikis with less than 10k articles
> take up more than half.

It's a logarithmic scale, which is basically the only way you're going
to see the smaller ones at all -- on a linear scale they'd be totally
dwarfed by the few biggest. :)
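
For a rough feel of the numbers, here is a minimal sketch. It assumes the article-count axis runs from 1 to about 2 million; that ceiling and the helper name `axis_share` are assumptions, not values read off the animation.

```python
import math

def axis_share(value, axis_max, axis_min=1, log=True):
    """Fraction of the axis length that lies at or below `value`."""
    if log:
        return (math.log10(value) - math.log10(axis_min)) / (
            math.log10(axis_max) - math.log10(axis_min))
    return (value - axis_min) / (axis_max - axis_min)

AXIS_MAX = 2_000_000  # assumed top of the article-count axis

for limit in (100, 10_000):
    log_share = axis_share(limit, AXIS_MAX)
    lin_share = axis_share(limit, AXIS_MAX, log=False)
    print(f"<= {limit:>6} articles: log {log_share:.0%}, linear {lin_share:.2%}")
```

Under that assumption the log axis gives wikis below 100 articles about a third of the length and wikis below 10k about 63%, in line with the figures Brian mentions, while on a linear axis both groups would fit in well under 1%.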

-- brion vibber (brion @ wikimedia.org)

Re: Wikimedia Projects Growth Animated
Brion Vibber wrote:
> Brian wrote:
>> Very cool! The growth of the smallest wikis seems to be greatly
>> over-exaggerated on that scale. Wikis with less than 100 articles take up
>> roughly 30% of the screen real estate and wikis with less than 10k articles
>> take up more than half.
>
> It's a logarithmic scale, which is basically the only way you're going
> to see the smaller ones at all -- on a linear scale they'd be totally
> dwarfed by the few biggest. :)

Errr

Why is En 6 years old, whilst fr, de, es are nearly 7?

Ant


Re: Wikimedia Projects Growth Animated
On 26/03/2008, Florence Devouard <Anthere9@yahoo.com> wrote:
> Brion Vibber wrote:
> > Brian wrote:

> >> Very cool! The growth of the smallest wikis seems to be greatly
> >> over-exaggerated on that scale. Wikis with less than 100 articles take up
> >> roughly 30% of the screen real estate and wikis with less than 10k articles
> >> take up more than half.

> > It's a logarithmic scale, which is basically the only way you're going
> > to see the smaller ones at all -- on a linear scale they'd be totally
> > dwarfed by the few biggest. :)

> Errr
> why is En 6 years old, whilst fr, de, es are nearly 7 ?


Because we've only just got a good full history backup of en:wp, and
Erik Zachte hasn't run the stats on it as yet. (I offered to help, but
couldn't secure a spare machine as yet ...) So the graph runs only up
to late 2006.


- d.

Re: Wikimedia Projects Growth Animated
Rightly so, too, I think. They aren't the ones that grew!

On Tue, Mar 25, 2008 at 5:48 PM, Brion Vibber <brion@wikimedia.org> wrote:

> on a linear scale they'd be totally
> dwarfed by the few biggest. :)

Re: Wikimedia Projects Growth Animated
David Gerard wrote:
> Because we've only just got a good full history backup of en:wp, and
> Erik Zachte hasn't run the stats on it as yet. (I offered to help, but
> couldn't secure a spare machine as yet ...) So the graph runs only up
> to late 2006.



Makes sense. I only have Firefox 2, so I cannot run the animation. I only
see the last position :-)

ant


Re: Wikimedia Projects Growth Animated
Still no spare machine! I might have access to an 8GB machine. Let me ask.

On Tue, Mar 25, 2008 at 6:50 PM, Florence Devouard <Anthere9@yahoo.com>
wrote:

> David Gerard wrote:
> > Because we've only just got a good full history backup of en:wp, and
> > Erik Zachte hasn't run the stats on it as yet. (I offered to help, but
> > couldn't secure a spare machine as yet ...) So the graph runs only up
> > to late 2006.
>
> Makes sense. I only have Firefox 2, so can not run the animation. I only
> see the last position :-)

Re: Wikimedia Projects Growth Animated
PS: Why doesn't the Foundation readily give Erik a server that he can use
whenever he needs to? As I remember it, both Jimbo and Sue promised him one
at Wikimanias, and yet he has still never gotten one.

On Tue, Mar 25, 2008 at 6:56 PM, Brian <Brian.Mingus@colorado.edu> wrote:

> Still no spare machine! I might have access to an 8GB machine. Let me ask.

Re: Wikimedia Projects Growth Animated
>
> Makes sense. I only have Firefox 2, so can not run the animation. I only
> see the last position :-)
>
> ant
>
choose the static version ;)

http://www.infodisiac.com/Wikipedia/Wikistats/Animation.html

przykuta

Wikimedia Projects Growth Animated
> [..] I only have Firefox 2, so can not run the animation.
> I only see the last position :-)

Ant, there is a Flash version (8 MB) at
http://www.infodisiac.com/Wikipedia/Wikistats/Animation.html

Erik Zachte




Wikimedia Projects Growth Animated
Brian, before Taipei I did not follow up on offers for a wikistats machine,
as I felt there was no pressing need for it then, and hardware priorities
clearly ought to be elsewhere.
Today, running in 'nice mode' on an average Wikimedia Apache server is a
bottleneck and hampers the addition of new stats. But only the new English
dump made it really tight.
The foundation is aware of this. I am confident that in the near to medium
future a solution will be found.

Erik Zachte




Re: Wikimedia Projects Growth Animated
Florence Devouard wrote:
> Brion Vibber wrote:
>
>> Brian wrote:
>>
>>> Very cool! The growth of the smallest wikis seems to be greatly
>>> over-exaggerated on that scale. Wikis with less than 100 articles take up
>>> roughly 30% of the screen real estate and wikis with less than 10k articles
>>> take up more than half.
>>>
>> It's a logarithmic scale, which is basically the only way you're going
>> to see the smaller ones at all -- on a linear scale they'd be totally
>> dwarfed by the few biggest. :)
>>
>
> Errr
>
> why is En 6 years old, whilst fr, de, es are nearly 7 ?
>
Oh, I thought it was because you and Erik were the real founders of
Wikipedia, Jimmy just came along later. (Presumably there was some
Spanish-speaking person too, but maybe he or she left with the fork.)

--Michael Snow

Re: Wikimedia Projects Growth Animated
On Tue, Mar 25, 2008 at 3:19 PM, Erik Zachte <erikzachte@infodisiac.com> wrote:

> So for now my javascript-only 'Wikimedia Projects Growth Animated' runs only on Firefox 3 (beta).
> Be assured there is also a recorded 8 Mb Flash file for project Wikipedia.
> The idea is to update these animations automatically from the wikistats job.
>
> http://www.infodisiac.com/Wikipedia/Wikistats/

This is amazing, thank you so much for making it. I look forward to
using it in presentations. :-)

(And, as per Erik's other message, yes, we're looking into ways to
support his many great projects. We would be insane not to. :-)

--
Erik Möller
Deputy Director, Wikimedia Foundation

Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate

Re: Wikimedia Projects Growth Animated
Erik Zachte wrote:
>> [..] I only have Firefox 2, so can not run the animation.
>> I only see the last position :-)
>
> Ant, there is a Flash version (8 Mb) at
> http://www.infodisiac.com/Wikipedia/Wikistats/Animation.html
>
> Erik Zachte

wow.

What I find very interesting is to see that most projects follow the
same pattern of growth; so, by far, most bubbles stick together, except
for an occasional free-willed bubble.

Strange as well to see that in the case of an already pretty big .sv,
half of the articles are <0.5 KB.

ant


Re: Wikimedia Projects Growth Animated
Hoi,
The interactive stats require Firefox 3. I have been running the Firefox 3
release candidates for quite some time now. They have been really beneficial
for me. The browser is more stable, uses less memory and runs really well on
my small system. I do recommend that you move to the Firefox release
candidates, and not only because the statistics are awesome: you only see
their true use this way.
Thanks,
Gerard

On Wed, Mar 26, 2008 at 6:32 AM, Erik Moeller <erik@wikimedia.org> wrote:

> This is amazing, thank you so much for making it. I look forward to
> using it in presentations. :-)

Re: Wikimedia Projects Growth Animated
Florence Devouard wrote:

> Strange as well to see that in the case of an already pretty big
> .sv, half of the articles are <0,5 ko.

At the Swedish Wikipedia, we're very well aware of this. It is
"pretty big" only by article count, not so much by word count.
Article count is a traditional measure for encyclopedias, but it
can be inflated if a few large articles are split into many
smaller ones. If stubs are merged into larger articles (e.g.
"villages in Bretagne", instead of having one stub for each), the
amount of text is unchanged, but the article count is lowered.

If we were to count words or something to that effect, the Russian
Wikipedia (the 11th biggest by article count) would already be
bigger than Swedish (10th). The Polish (4th) and Dutch (6th)
Wikipedias would also fall behind Japanese (5th) and Italian
(7th). Overall, the ranking of Wikipedias would become more
similar to the ranking of languages (by number of speakers).
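
The effect of merging stubs can be shown with a toy example. This is a sketch with made-up titles and word counts; the `merge` helper is hypothetical.

```python
# Hypothetical toy wiki: article title -> word count (made-up numbers).
wiki = {
    "Village A": 40,
    "Village B": 35,
    "Village C": 25,
    "History of Bretagne": 900,
}

def merge(articles, titles, new_title):
    """Merge the given stubs into one article; the total text is unchanged."""
    merged = {t: w for t, w in articles.items() if t not in titles}
    merged[new_title] = sum(articles[t] for t in titles)
    return merged

after = merge(wiki, ["Village A", "Village B", "Village C"],
              "Villages in Bretagne")

print(len(wiki), sum(wiki.values()))    # articles and words before the merge
print(len(after), sum(after.values()))  # articles and words after the merge
```

The article count drops from four to two while the word count stays the same, which is why a ranking by words is less sensitive to how a community chooses to split its content.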

There is currently a discussion on how to possibly rearrange the
top ten positions at www.wikipedia.org taking place on
http://meta.wikimedia.org/wiki/Top_Ten_Wikipedias

My congratulations on the donation from the Sloan Foundation!


--
Lars Aronsson (lars@aronsson.se)
Aronsson Datateknik - http://aronsson.se

Re: Wikimedia Projects Growth Animated
Brian wrote:
> PS: Why doesn't the Foundation readily give Erik a server that he can use
> whenever he needs to? From my own memory at Wikimanias he has been promised
> by both Jimbo and Sue, and yet he has still never gotten one.

We don't have a machine dedicated to stats generation yet; core
operations still take priority, and old machines aren't suitable for a
script that apparently needs massive amounts of memory to process its data.

I should note that it would probably be useful to ensure that stats
generation scripts are available in the public source repository.

-- brion vibber (brion @ wikimedia.org)

Wikimedia Projects Growth Animated
> We don't have a machine dedicated to stats generation yet; core
> operations still take priority, and old machines aren't suitable for a
> script that apparently needs massive amounts of memory to process its
> data.

Brion, I assume you can't wait to stop doing whatever you are doing and
verify the 'apparently' yourself ;)
The scripts have been online for years, albeit not (yet) in SVN.

Perl is known for its tendency to spend memory in order to save time (both
in execution and in development).
Hashes are the main culprit, or blessing, depending on which way you look at it.

Disabling edit counts for anons would make a substantial difference, with
limited effect on the output.
But that would be a stopgap solution and shortsighted. There are other very
interesting stats that would fill the space.
To name one example: I would love to generate statistics on how the content
of Wikipedia becomes less geeky, by analysing and visualising trends in
edits/articles/views per category (cluster) per month.
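
The memory trade-off can be illustrated with a toy example, in Python rather than Perl. It is a sketch only: the sample data and the `is_anon` helper are assumptions and are not taken from the wikistats scripts. A hash keyed by editor grows with every distinct editor, and anonymous editors, each recorded under their own IP address, add many keys with few edits each.

```python
import re
from collections import Counter

# Hypothetical edit log: one editor name per edit. Anonymous edits are
# recorded under an IPv4 address (made-up sample data).
edits = ["Alice", "10.0.0.1", "Bob", "10.0.0.2",
         "Alice", "10.0.0.3", "Alice", "10.0.0.4"]

IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def is_anon(editor):
    """Treat any editor name that looks like an IPv4 address as anonymous."""
    return IPV4.match(editor) is not None

# One hash entry per distinct editor, anonymous or not.
all_counts = Counter(edits)
# Same count, but anonymous editors are skipped before they reach the hash.
registered_counts = Counter(e for e in edits if not is_anon(e))

print(len(all_counts), len(registered_counts))
```

In this made-up sample, skipping anonymous editors shrinks the hash from six keys to two while the counts for registered users are untouched.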

There is a saying that instead of spending time on optimizing Perl it is
more efficient to take a job washing cars to save for adequate hardware.
Had I taken that advice I would have saved a lot of time to do more useful
work than over-optimizing a job that has outgrown its current platform.
We are not talking tens of thousands of dollars, just a run-of-the-mill,
reasonably fast machine with above-average memory and an above-average
hard disk, yet both in the commodity range.
Many 15 year olds in the first world spend more on their 3D gaming machine.

Erik Zachte








Re: Wikimedia Projects Growth Animated
Hi!

This may sound like heresy, but for some jobs that are short in timespan
but need lots of CPU capacity, we could try using Amazon's EC2
or any other grid computing service (maybe some university wants to
donate cluster time?).
That would be much cheaper than allocating high-performance, high-bucks
hardware to projects like this.

Really, we have a capable cluster that has extra CPU capacity for
distributed tasks, but anything that needs lots of memory in a single
location simply doesn't scale.
Most of our tasks are scaled out, where lots of smaller machines can
do lots of big work, so this wikistats job is the only one which
cannot be distributed this way.

Eventually we may run Hadoop, Gearman or a similar framework for
statistics job distribution, but really, first of all the actual
tasks have to be broken down into smaller segments, for map/reduce
operation, if needed.
I don't see many problems (except setting the whole grid up)
allocating job execution resources during off peak, on 10, 20 or 100
nodes, as long as it doesn't have exceptional resource needs on a
single node. It would be very nice practice for many other future
jobs too.
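
As a minimal sketch of what such a segmented job might look like, here is a toy map/reduce in plain Python. The data, the chunking and the function names are assumptions for illustration; it uses neither Hadoop nor Gearman, and it is not how wikistats is structured today.

```python
from collections import Counter
from functools import reduce

# Hypothetical input: (month, editor) pairs, already split into chunks
# that can be processed independently of each other.
chunks = [
    [("2008-01", "Alice"), ("2008-01", "Bob"), ("2008-02", "Alice")],
    [("2008-02", "Bob"), ("2008-02", "Carol")],
    [("2008-03", "Alice")],
]

def map_chunk(chunk):
    """Map step: count edits per month within one chunk only."""
    return Counter(month for month, _editor in chunk)

def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

# On a grid, each map_chunk call could run on a different node.
partials = map(map_chunk, chunks)
edits_per_month = reduce(reduce_counts, partials, Counter())
print(dict(edits_per_month))
```

Each map call only needs memory for its own chunk, so no single node has exceptional resource needs. The hard part is restructuring the real job so that its statistics can be merged this way.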

BR,
--
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]



Re: Wikimedia Projects Growth Animated
Sounds like a good idea to me. If we can do it for pennies, I like it even better.


----- Original Message ----
From: Domas Mituzas <midom.lists@gmail.com>
To: Wikimedia Foundation Mailing List <foundation-l@lists.wikimedia.org>
Cc: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Sent: Wednesday, March 26, 2008 11:57:40 AM
Subject: Re: [Foundation-l] Wikimedia Projects Growth Animated

Re: Wikimedia Projects Growth Animated
I do lots of cluster computing, and offered my cluster to Erik, but his job
can't be run that way. I also have jobs that are very difficult to run on a
cluster, such as computing en.WP's pagerank (this can be done with the
Parallel Boost Graph Library, but are you guys really going to install MPI
and PBS/Maui?). I imagine there are lots more.

So before borrowing a 16GB machine from noaa.gov, I tried using Amazon's
elastic compute cloud for this pagerank task. While it may seem cheap, it's
really not that cheap. I think Erik's statistics would only take a few
months to add up to the cost of a new machine. Unpacking that 17GB 7-zip
file alone is going to cost you dearly in S3 storage (and instance time -
how long does it take?). You'll need a Large Instance ($0.40 per
instance hour) to run the stats job, which is four times as expensive as
the small instance.
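
A back-of-the-envelope version of that estimate, as a sketch: the hourly rate is the one quoted above, while the round-the-clock usage and the $1500 machine price are assumptions, and S3 storage and transfer costs are left out entirely.

```python
LARGE_RATE = 0.40            # USD per instance hour, as quoted above
SMALL_RATE = LARGE_RATE / 4  # the small instance costs a quarter of that

HOURS_PER_MONTH = 24 * 30    # assumption: the instance runs around the clock
MACHINE_COST = 1500.0        # assumption: price of a commodity server in USD

monthly_cost = LARGE_RATE * HOURS_PER_MONTH
months_to_break_even = MACHINE_COST / monthly_cost

print(f"Large instance: ${monthly_cost:.2f} per month")
print(f"Break-even after {months_to_break_even:.1f} months")
```

Under these assumptions the instance alone costs about $288 a month, so it passes the price of the assumed machine in a little over five months. A job that only runs for a few days each month would of course come out much cheaper.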

On Wed, Mar 26, 2008 at 12:57 PM, Domas Mituzas <midom.lists@gmail.com>
wrote:

> this may sound as a heresy, but for some jobs, that are short in time-
> span, but need lots of CPU capacity we could try using Amazon's EC2
> or any other grid computing service (maybe some university wants to
> donate cluster time?).
> That would be much cheaper than allocating high-performance-high-
> bucks hardware to projects like this.

Re: Wikimedia Projects Growth Animated
*Unless*, of course, Amazon was willing to donate them :)

On Wed, Mar 26, 2008 at 1:53 PM, Brian <Brian.Mingus@colorado.edu> wrote:

> So before borrowing a 16GB machine from noaa.gov, I tried using Amazon's
> elastic compute cloud for this pagerank task. While it may seem cheap, it's
> really not that cheap. I think Erik's statistics would only take a few
> months to add up to the cost of a new machine. Unpacking that 17GB 7-zip
> file alone is going to cost you dearly in S3 storage (and instance time -
> how long does it take?). You'll have to have a Large Instance ($.40 per
> instance hour) to run the stats job which is four times more expensive than
> the small instance.