Mailing List Archive

Image dumps, html dumps, and static copies missing
Are there good unofficial sites with mirrors and dumps? Is anyone using a
live feed to generate same?

Here is one of those core project support tasks that only the Foundation can
do at the moment, that never seems to become a priority... but is
fundamental to supporting a broad network of people who are carrying out
their own Wikipedia and related initiatives.

Among the core ways that the projects' work gets out into the world is
through full dumps provided by the foundation in all languages. There
aren't many people with access to the databases to generate those dumps, and
it often requires scheduling machine processor and disk time from inside the
cluster to carry out regular dumps effectively.

Image dumps haven't worked reliably since sometime in 2005. I blogged about
this in mid-2006, at which point I believe there was a bittorrent option but
no other; the bittorrent option hasn't worked for over a year.
http://downloads.wikimedia.org/images/ used to offer a few 2006-era
dump links; those too are now gone.

Static versions of the site have also been available from time to time -- at
the moment, the links from download.wikimedia.org are broken:
http://static.wikipedia.org/

And HTML dumps of the projects have been generated from time to time; I
don't know why these are presented separately from the full dump-lists
(which generate XML dumps in many gratifying varieties), but the process
involved needs some upkeep. At the moment, one can only get Wikipedia
dumps from February for languages aa to eml.
http://static.wikipedia.org/downloads/2008-02/

Cheers,
SJ
_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Re: Image dumps, html dumps, and static copies missing
On Tue, Apr 29, 2008 at 3:53 PM, Samuel Klein <meta.sj@gmail.com> wrote:
[snip]
> Image dumps haven't worked reliably since sometime in 2005. I blogged about
> this in mid-2006, at which point I believe there was a bittorrent option but
> no other; the bittorrent option hasn't worked for over a year.
> http://downloads.wikimedia.org/images/ used to offer a few 2006-era
> dump links; those too are now gone.
[snip]

For several months one of the Wikimedia systems was configured to push
images out to a system of mine with several terabytes set aside, via
the then-development version 3 of rsync. (Prior versions of rsync were
unable to cope with the large number of files.)

This worked fairly well at the time and I handed out snapshots to a
number of other people who requested them.

The feed seems to have stopped in early January. I never bothered
looking into why, since I haven't been doing much with them recently
and no one has asked me for a new snapshot lately.

Bittorrent is pretty much non-viable for maintaining a mirror of
images (and nearly so for even making the initial several-terabyte
transfer).

Re: Image dumps, html dumps, and static copies missing
On Tue, Apr 29, 2008 at 12:53 PM, Samuel Klein <meta.sj@gmail.com> wrote:
> Are there good unofficial sites with mirrors and dumps? Is anyone using a
> live feed to generate same?
>
> Here is one of those core project support tasks that only the Foundation can
> do at the moment, that never seems to become a priority... but is
> fundamental to supporting a broad network of people who are carrying out
> their own Wikipedia and related initiatives.
>
> Among the core ways that the projects' work gets out into the world is
> through full dumps provided by the foundation in all languages. There
> aren't many people with access to the databases to generate those dumps, and
> it often requires scheduling machine processor and disk time from inside the
> cluster to carry out regular dumps effectively.

On the wiki-research list, Sue Gardner recently made a post about
Foundation research priorities:
http://lists.wikimedia.org/pipermail/wiki-research-l/2008-April/000546.html

There's an associated document on Meta:
http://meta.wikimedia.org/wiki/Wikimedia_Foundation_Research_Goals

which lists a lot of the things many of us have been interested in
researching for a long time.

Arguably, however, providing solid dumps is the backbone for getting
most of this research done, since having project data to
manipulate is necessary for many possible studies. So not only are
regular dumps critical for fulfilling our free content
responsibilities and mission, but they are critical for future
research. Which is to say: we all really want to see them happen! And
agreed, the Foundation is the only one that can make it so (even
though it's not an easy task); and this is the sort of infrastructure
task that should be absolutely core.

-- phoebe

Re: Image dumps, html dumps, and static copies missing
Phoebe Ayers writes:

> Arguably, however, providing solid dumps is the backbone for getting
> most of this research done, since having project data to
> manipulate is necessary for many possible studies. So not only are
> regular dumps critical for fulfilling our free content
> responsibilities and mission, but they are critical for future
> research. Which is to say: we all really want to see them happen! And
> agreed, the Foundation is the only one that can make it so (even
> though it's not an easy task); and this is the sort of infrastructure
> task that should be absolutely core.

We at the Foundation want to see this happen too. We regard increasing
the frequency and reliability of the dumps as mission-critical, and
we're working toward that goal.


--Mike




Re: Image dumps, html dumps, and static copies missing
On Tue, Apr 29, 2008 at 8:27 PM, Mike Godwin <mgodwin@wikimedia.org> wrote:
>
> Phoebe Ayers writes:
>
> > Arguably, however, providing solid dumps is the backbone for getting
> > most of this research done, since having project data to
> > manipulate is necessary for many possible studies. So not only are
> > regular dumps critical for fulfilling our free content
> > responsibilities and mission, but they are critical for future
> > research. Which is to say: we all really want to see them happen! And
> > agreed, the Foundation is the only one that can make it so (even
> > though it's not an easy task); and this is the sort of infrastructure
> > task that should be absolutely core.
>
> We at the Foundation want to see this happen too. We regard increasing
> the frequency and reliability of the dumps as mission-critical, and
> we're working toward that goal.

Lovely! Glad to hear it. :)

The small but enthusiastic wiki research community will certainly be
glad too. In anticipation, we are already planning many parties this
year to celebrate! *

-- phoebe

* The first in Alexandria, the second in Porto (www.wikisym.org), not
to mention Wikimedia Conference Netherlands, Konferencja Wikimedia
Polska, etc. etc. ...

Re: Image dumps, html dumps, and static copies missing
On Tue, Apr 29, 2008 at 11:27 PM, Mike Godwin <mgodwin@wikimedia.org> wrote:

>
> Phoebe Ayers writes:
>
> > Arguably, however, providing solid dumps is the backbone for getting
> > most of this research done, since having project data to
> > manipulate is necessary for many possible studies. So not only are
> > regular dumps critical for fulfilling our free content
> > responsibilities and mission, but they are critical for future
> > research. Which is to say: we all really want to see them happen! And
> > agreed, the Foundation is the only one that can make it so (even
> > though it's not an easy task); and this is the sort of infrastructure
> > task that should be absolutely core.
>
> We at the Foundation want to see this happen too. We regard increasing
> the frequency and reliability of the dumps as mission-critical, and
> we're working toward that goal.
>
>
Wonderful, thanks for the feedback. Is there a commentable roadmap of
critical (and not-so-critical) projects for people who have their own crazy
ideas they'd like to contribute? There have been some good brainstorming
sessions on this list on the subject over the years.

Regards,
SJ

@phoebe, thanks for the link to the set of research questions.
I'd like to see an elaboration of the foundation goals...

Re: Image dumps, html dumps, and static copies missing
Samuel writes:

> Wonderful, thanks for the feedback. Is there a commentable roadmap of
> critical (and not-so-critical) projects for people who have their
> own crazy
> ideas they'd like to contribute? There have been some good
> brainstorming
> sessions on this list on the subject over the years.

I have no idea if there's a commentable roadmap, but I do know that
detailed questions regarding our technical maintenance and expansion
plans are best directed to Erik or to Brion (or other members of our
tech team). I do know that a nearer-term goal would be monthly updates
of the dumps.


--Mike


