Mailing List Archive: Wikimedia Projects Growth Animated

Re: Wikimedia Projects Growth Animated

Mar 26, 2008, 1:46 PM

Post #26 of 32

Hello,

> I do lots of cluster computing, and offered my cluster to Erik, but his
> job can't be run that way. I also have jobs that are very difficult to
> run on a cluster, such as computing en.WP's pagerank (this can be done
> with the Parallel Boost Graph Library, but are you guys really going to
> install MPI and PBS/Maui?). I imagine there are lots more.

Well, I expect that not every parallel computing job needs real-time
clustering libraries; properly queueing jobs and handling them
could be enough.
It's a pity that Erik's task is that resource-hungry.

> how long does it take?). You'll have to have a Large Instance ($.40 per
> instance hour) to run the stats job, which is four times more expensive
> than the small instance.

Well, one can use node-local storage and handle just the uncompressed
stream, instead of placing the complete uncompressed dump on the file
system or even on S3.
Anyway, I do not know the exact costs. Last time I looked at S3 for
Wikipedia, the costs were far too high.
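
A minimal sketch of the streaming idea, assuming a bz2-compressed XML dump
(the dump discussed in this thread is a 7-zip file, which the Python standard
library cannot read, so bz2 stands in here). The function name count_pages
and the "<page>" marker are illustrative assumptions, not part of any
existing wikistats code. The point is only that the uncompressed data is
consumed line by line and never written to disk or S3.

```python
import bz2
import io


def count_pages(compressed_file):
    """Count lines containing "<page>" in a bz2-compressed dump.

    compressed_file may be a path or an open binary file object.
    The data is decompressed incrementally, so the full uncompressed
    dump is never held in memory or stored anywhere.
    """
    pages = 0
    with bz2.open(compressed_file, "rb") as stream:
        for line in stream:  # lines are decompressed on demand
            if b"<page>" in line:
                pages += 1
    return pages


# Tiny in-memory stand-in for a real dump (hypothetical content).
sample = (
    b"<mediawiki>\n"
    b"<page>\n<title>A</title>\n</page>\n"
    b"<page>\n<title>B</title>\n</page>\n"
    b"</mediawiki>\n"
)
print(count_pages(io.BytesIO(bz2.compress(sample))))
```

The same loop could read from a pipe fed by an external decompressor; only
the per-line processing would change for a real statistics job.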

I was simply thinking that we could go in the grid direction, if the
tasks were actually grid-able :)
We just don't seem to have spare cycles for dedicating high-capacity
nodes.

Now, should we buy one? The Foundation hasn't been into scientific
computing and number crunching so far, so that would mean entering
a slightly different domain.

--
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

Re: Wikimedia Projects Growth Animated

removed at example

Mar 26, 2008, 2:09 PM

Post #27 of 32

> On Wed, Mar 26, 2008 at 2:46 PM, Domas Mituzas <midom.lists@gmail.com>
> wrote:
> Now, should we buy one? The Foundation hasn't been into scientific
> computing and number crunching so far, so that would mean entering
> a slightly different domain.

The answer to that question is... YES! :) Being more open and inviting to all
researchers is well within the Foundation's mission and will directly
benefit the projects. As an example, I recently read an NLP paper where the
researchers were able to automatically fill in missing information in
infoboxes based on article content. It's probably not a good idea to do this
fully automatically, but projects have had scripts for some time now that
detect spelling errors and the like and guide editors through the process
of fixing them when it's actually appropriate. The Foundation really ought
to have some sort of outreach position that *proactively* seeks these
researchers out and lets them know that, if they can package up their compute
job and provide an API/interface, there is a place where it can be run and
(potentially) directly benefit Wikipedia. After all, they already did the
hard part of conducting state-of-the-art research and implementing it! The
Toolserver can serve this function in a limited fashion, but it's really
not suited for folks who think in terms of the entire dataset.

Here's that particular paper. I found it on the DBpedia list.

> Fei Wu, Daniel S. Weld, 2008. Automatically Refining the
> Wikipedia Infobox Ontology, in the 17th International World Wide
> Web Conference (WWW-08), Beijing, China, April 2008.
> See http://www.cs.washington.edu/homes/wufei/papers/www08.pdf

And keep in mind that it is just one of many.

http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_in_academic_studies

There is much room for growth here, as in all areas.

Re: Wikimedia Projects Growth Animated

dgerard at gmail

Mar 26, 2008, 2:18 PM

Post #28 of 32

On 26/03/2008, Brian <Brian.Mingus@colorado.edu> wrote:
> On Wed, Mar 26, 2008 at 2:46 PM, Domas Mituzas <midom.lists@gmail.com>
> wrote:

> > Now, should we buy one? The Foundation hasn't been into scientific
> > computing and number crunching so far, so that would mean entering
> > a slightly different domain.

> The answer to that question is... YES! :) Being more open and inviting to all
> researchers is well within the Foundation's mission and will directly
> benefit the projects.

It would probably be cheaper to ask a university to sponsor a compute
server for us. It wouldn't be that hard to talk them into paying for a
box with buckets of RAM and buckets of CPU, in exchange for a nice press
release saying how much we like them and their name on pages served up
from the server.

Or get IBM to spot us an iSeries and five years' service ...

- d.

Re: Wikimedia Projects Growth Animated

wikipedia.kawaii.neko at gmail

Mar 27, 2008, 11:14 AM

Post #29 of 32

You can always ask them...

On Wed, Mar 26, 2008 at 9:59 PM, Brian <Brian.Mingus@colorado.edu> wrote:

> *Unless*, of course, Amazon was willing to donate them :)
>
> On Wed, Mar 26, 2008 at 1:53 PM, Brian <Brian.Mingus@colorado.edu> wrote:
>
> > I do lots of cluster computing, and offered my cluster to Erik, but his
> > job can't be run that way. I also have jobs that are very difficult to
> > run on a cluster, such as computing en.WP's pagerank (this can be done
> > with the Parallel Boost Graph Library, but are you guys really going to
> > install MPI and PBS/Maui?). I imagine there are lots more.
> >
> > So before borrowing a 16GB machine from noaa.gov, I tried using Amazon's
> > Elastic Compute Cloud for this pagerank task. While it may seem cheap,
> > it's really not that cheap. I think Erik's statistics would only take a
> > few months to add up to the cost of a new machine. Unpacking that 17GB
> > 7-zip file alone is going to cost you dearly in S3 storage (and instance
> > time - how long does it take?). You'll have to have a Large Instance
> > ($.40 per instance hour) to run the stats job, which is four times more
> > expensive than the small instance.
> >
> >
> > On Wed, Mar 26, 2008 at 12:57 PM, Domas Mituzas <midom.lists@gmail.com>
> > wrote:
> >
> > > Hi!
> > >
> > > This may sound like heresy, but for some jobs that are short in
> > > time-span but need lots of CPU capacity, we could try using Amazon's
> > > EC2 or any other grid computing service (maybe some university wants
> > > to donate cluster time?).
> > > That would be much cheaper than allocating high-performance,
> > > high-bucks hardware to projects like this.
> > >
> > > Really, we have a capable cluster that has extra CPU capacity for
> > > distributed tasks, but anything that needs lots of memory in a single
> > > location simply doesn't scale.
> > > Most of our tasks are scaled out, where lots of smaller machines can
> > > do lots of big work, so this wikistats job is the only one which
> > > cannot be distributed this way.
> > >
> > > Eventually we may run Hadoop, Gearman, or a similar framework for
> > > statistics job distribution, but really, first of all the actual
> > > tasks have to be broken down into smaller segments for map/reduce
> > > operation, if needed.
> > > I don't see many problems (except setting the whole grid up) with
> > > allocating job execution resources during off-peak hours, on 10, 20,
> > > or 100 nodes, as long as a job doesn't have exceptional resource needs
> > > on a single node. It would be very good practice for many other future
> > > jobs too.
> > >
> > > BR,
> > > --
> > > Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]

Re: Wikimedia Projects Growth Animated

dgerard at gmail

Mar 30, 2008, 9:20 AM

Post #30 of 32

On 26/03/2008, Florence Devouard <Anthere9@yahoo.com> wrote:

> What I find very interesting is to see that most projects follow the
> same pattern of growth; so, by far, most bubbles stick together, except
> for an occasional freely-willed bubble.

You may just be seeing an artefact of the logarithmic vertical scale -
does "a factor of 10" count as sticking together?

- d.

Wikimedia Projects Growth Animated

erikzachte at infodisiac

Mar 30, 2008, 4:15 PM

Post #31 of 32

>> What I find very interesting is to see that most projects follow the
>> same pattern of growth; so, by far, most bubbles stick together, except
>> for an occasional freely-willed bubble.

> You may just be seeing an artefact of the logarithmic vertical scale -
> does "a factor of 10" count as sticking together?

Of course a logarithmic scale tends to mask 'nuances', precisely because it
squeezes widely diverging values.

As Brion already explained:
"A logarithmic scale is basically the only way you're going to see the
smaller ones at all -- on a linear scale they'd be totally dwarfed by the
few biggest."

On second thought, why not have both? You can now switch between linear and
logarithmic scale.
In linear mode you can resize the vertical scale so that smaller
projects can be followed.
For now the new version is available only as an interactive version for
Firefox 3. It has not yet been recorded as Flash.

http://infodisiac.com/Wikipedia/Wikistats/AnimationProjectsGrowthWp.html

Erik Zachte

Re: Wikimedia Projects Growth Animated

thomas.dalton at gmail

Mar 30, 2008, 4:21 PM

Post #32 of 32

> Of course a logarithmic scale tends to mask 'nuances', precisely because it
> squeezes widely diverging values.

Well, it masks nuances for large values and accentuates them for small values.
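
A small sketch of what the two scales do to the same gap, using made-up
project sizes (the numbers and the helper names log_gap and linear_gap are
assumptions for illustration only). On a log10 axis every factor of 10 takes
up exactly one unit, whatever the absolute size; on a linear axis the same
factor of 10 is a thousand times taller for the large pair than for the
small pair.

```python
import math


def log_gap(a, b):
    """Vertical distance between a and b on a log10 axis."""
    return abs(math.log10(b) - math.log10(a))


def linear_gap(a, b):
    """Vertical distance between a and b on a linear axis."""
    return abs(b - a)


small_pair = (1_000, 10_000)            # hypothetical small projects
large_pair = (1_000_000, 10_000_000)    # hypothetical large projects

for low, high in (small_pair, large_pair):
    print(low, high, "log gap:", log_gap(low, high),
          "linear gap:", linear_gap(low, high))
```

So two bubbles that look close together on the logarithmic chart may still
differ by a factor of 10.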

> On second thought, why not have both? You can now switch between linear and
> logarithmic scale.

Problem solved!
