Mailing List Archive

Thank you for discussing my Top 25 Nonprofit List
over at
http://www.nonprofittechblog.org/philanthropy-and-nonprofit-top-25-list-october-2007

I was wondering if Wikimedia could use the Quantcast JavaScript tag so I
would be able to track Wikimedia's URLs in real time. If that's not possible,
it would be great if the Foundation simply released the web site
statistics for all its URLs.

It would be a great service for the nonprofit sector if this could be done
as many nonprofit marketers are still trying to understand the effects of
Web 2.0 on our sector and would love to have hard data.
_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: http://lists.wikimedia.org/mailman/listinfo/foundation-l
Re: Thank you for discussing my Top 25 Nonprofit List
On 10/3/07, Allan Benamer <abenamer@nonprofittechblog.org> wrote:
> I was wondering if Wikimedia could use the Quantcast Javascript tag so I
> would be able to track Wikimedia's URLs in real-time.

Sorry, probably not.

> If that's not possible,
> it would be great if the Foundation simply just released the web site
> statistics for all its URLs.

We can probably do something like that. What information would be useful?

We are pretty much already releasing all the statistics we generate,
i.e. almost nothing. :)

With the exception of information we need to keep private to protect
our readers and contributors, we are interested in operating in the
most open manner possible. Releasing bulk site statistics is
consistent with that.

Re: Thank you for discussing my Top 25 Nonprofit List
Allan Benamer wrote:
> over at
> http://www.nonprofittechblog.org/philanthropy-and-nonprofit-top-25-list-october-2007
>
> I was wondering if Wikimedia could use the Quantcast Javascript tag so I
> would be able to track Wikimedia's URLs in real-time. If that's not possible,
> it would be great if the Foundation simply just released the web site
> statistics for all its URLs.
>
> It would be a great service for the nonprofit sector if this could be done
> as many nonprofit marketers are still trying to understand the effects of
> Web 2.0 on our sector and would love to have hard data.

Hmm. Add JavaScript to every one of our pages that calls up one of your
servers? I'm not sure that's workable. Wikimedia sites get twenty
thousand hits a second. Could you even handle the traffic?

As far as I am aware, the Foundation doesn't even *have* the web site
statistics for all its URLs. There's so much data, it's just discarded,
although if I recall correctly, a few institutions have been sent
1-in-100 samples of log data or something like that. Perhaps an
arrangement like that would be more feasible.

Also, easy on the buzzwords. Wikipedia's success does not stem from
shiny Ajaxey features. Wikis predate Web 2.0 by some time.

-Gurch

Re: Thank you for discussing my Top 25 Nonprofit List
Hehe.. well, I just want monthly unique visitors. I could even live with
AWStats. I know it's old, but that's how Boing Boing does it too. Statistics
generation is on a whole new level at the size of Wikimedia's sites, but
AWStats would be a start. Do you have a link to the statistics you do
generate? It may well be that's all I need for more accurate numbers. Google
Analytics reports e-mailed out via the custom report generator would be
great too.

I love your stance on transparency. It's mine as well but I'm having a
hellacious time trying to get other nonprofits to agree that transparency of
web site statistics is important.

On 10/3/07, Gregory Maxwell <gmaxwell@gmail.com> wrote:
>
> On 10/3/07, Allan Benamer <abenamer@nonprofittechblog.org> wrote:
> > I was wondering if Wikimedia could use the Quantcast Javascript tag so I
> > would be able to track Wikimedia's URLs in real-time.
>
> Sorry, probably not.
>
> > If that's not possible,
> > it would be great if the Foundation simply just released the web site
> > statistics for all its URLs.
>
> We can probably do something like that. What information would be useful?
>
> We are pretty much already releasing all the statistics we generate,
> i.e. almost nothing. :)
>
> With the exception of information we need to keep private to protect
> our readers and contributors we are interested in operating in the
> most open manner possible. Releasing bulk site statistics is
> consistent with that.
Re: Thank you for discussing my Top 25 Nonprofit List
> Do you have a link to the statistics you do
> generate?

I think most (if not all) of the web stats we have are here:
https://wikitech.leuksman.com/view/Main_Page

However, most of those links seem to be broken, so I'm not sure it
will do you much good.

Re: Thank you for discussing my Top 25 Nonprofit List
You're right -- almost everything is broken :(

I guess we'll have to live with the Quantcast and Compete statistics for the
time being. However, I suspect that web site statistics are an integral part
of Wikimedia's ethos so I hope that someone will focus on this task.

All you folks are doing incredible work!!! Thank you so much for making the
Web and our lives just that much better! Thank you!

On 10/3/07, Thomas Dalton <thomas.dalton@gmail.com> wrote:
>
> > Do you have a link to the statistics you do
> > generate?
>
> I think most (if not all) of the web stats we have are here:
> https://wikitech.leuksman.com/view/Main_Page
>
> However, most of those links seem to be broken, so I'm not sure it
> will do you much good.
Re: Thank you for discussing my Top 25 Nonprofit List
On 03/10/2007, Allan Benamer <abenamer@nonprofittechblog.org> wrote:
> Hehe.. well, I just want monthly unique visitors.

When it was said that almost nothing was collected, that wasn't a joke.

--
geni

Re: Thank you for discussing my Top 25 Nonprofit List
On 10/3/07, Matthew Britton <matthew.britton@btinternet.com> wrote:
> Also, easy on the buzzwords. Wikipedia's success does not stem from
> shiny Ajaxey features. Wikis predate Web 2.0 by some time.

Depending on who you're talking to, Web 2.0 alternatively means
user-exploitative^wcontributed content, or even the fast-and-loose
handling of copyright which so often goes with it.

I think it's terrible to call those things "Web 2.0", while AJAX makes
sense... but people are doing it. ::shrugs::

There is also a very .. um.. interesting video on Wikiversity about
Web2.0 which spends most of its time talking about free content
licenses and doesn't mention AJAX at all ;), all while using a
soundtrack which is most likely a copyright infringement. ;)

Re: Thank you for discussing my Top 25 Nonprofit List
Gregory Maxwell wrote:
> On 10/3/07, Matthew Britton <matthew.britton@btinternet.com> wrote:
>> Also, easy on the buzzwords. Wikipedia's success does not stem from
>> shiny Ajaxey features. Wikis predate Web 2.0 by some time.
>
> Depending on who you're talking to Web 2.0 alternatively means
> user-exploitative^wcontributed content, or even the fast-and-loose
> handling of copyright which so often goes with it.
>
> I think it's terrible to call those things "Web 2.0", while AJAX makes
> sense... but people are doing it. ::shrugs::
>
> There is also a very .. um.. interesting video on Wikiversity about
> Web2.0 which spends most of its time talking about free content
> licenses and doesn't mention AJAX at all ;), all while using a
> soundtrack which is most likely a copyright infringement. ;)

So Web 2.0 is mostly about copyright infringement? Even more reason not
to associate Wikipedia with it. :)

-Gurch

Re: Thank you for discussing my Top 25 Nonprofit List
geni wrote:
> On 03/10/2007, Allan Benamer <abenamer@nonprofittechblog.org> wrote:
>> Hehe.. well, I just want monthly unique visitors.
>
> When it was said almost nothing was collected that wasn't a joke.
>

I'll try to clarify what geni is saying. The Wikimedia Foundation relies
exclusively on donations and has a very tight budget. It can only buy as
much hardware as it can afford, and can only just afford enough to keep
the sites running. (The toolserver had to be donated separately).

The resources just aren't available to completely log all site traffic -
it would require scripts to process the mess of data generated at a fast
enough pace to keep up without using up precious CPU time, and a whole
load of extra disk space to store this data.

It's not possible to just "release all log data", because it doesn't exist.

-Gurch

Re: Thank you for discussing my Top 25 Nonprofit List
Ummm when I use Web 2.0 terms, I basically go off Tim O'Reilly's
understanding of it over at:

http://www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

As far as he's concerned (and yes, I know Wikis predate the term), Wikipedia
is the epitome of Web 2.0. Think of it this way -- art deco as a term for a
particular class of art objects didn't exist until the 60s despite the fact
that the objects in question were made in the 20s and 30s.

Anyway, I'm eagerly waiting for the revival of Wikimedia statistics
gathering!

On 10/3/07, Gregory Maxwell <gmaxwell@gmail.com> wrote:
>
> On 10/3/07, Matthew Britton <matthew.britton@btinternet.com> wrote:
> > Also, easy on the buzzwords. Wikipedia's success does not stem from
> > shiny Ajaxey features. Wikis predate Web 2.0 by some time.
>
> Depending on who you're talking to Web 2.0 alternatively means
> user-exploitative^wcontributed content, or even the fast-and-loose
> handling of copyright which so often goes with it.
>
> I think it's terrible to call those things "Web 2.0", while AJAX makes
> sense... but people are doing it. ::shrugs::
>
> There is also a very .. um.. interesting video on Wikiversity about
> Web2.0 which spends most of its time talking about free content
> licenses and doesn't mention AJAX at all ;), all while using a
> soundtrack which is most likely a copyright infringement. ;)
Re: Thank you for discussing my Top 25 Nonprofit List
Ahhh -- understood. I guess we'll just have to live with the estimates from
Quantcast and Compete in the meantime. I have a question though -- if a
service was willing to handle Wikipedia's load, would you use it? That is,
if Quantcast offered to do statistics gathering for Wikimedia, would
Wikimedia consent to it?

On 10/3/07, Matthew Britton <matthew.britton@btinternet.com> wrote:
>
> geni wrote:
> > On 03/10/2007, Allan Benamer <abenamer@nonprofittechblog.org> wrote:
> >> Hehe.. well, I just want monthly unique visitors.
> >
> > When it was said almost nothing was collected that wasn't a joke.
> >
>
> I'll try to clarify what geni is saying. The Wikimedia Foundation relies
> exclusively on donations and has a very tight budget. It can only buy as
> much hardware as it can afford, and can only just afford enough to keep
> the sites running. (The toolserver had to be donated separately).
>
> The resources just aren't available to completely log all site traffic -
> it would require scripts to process the mess of data generated at a fast
> enough pace to keep up without using up precious CPU time, and a whole
> load of extra disk space to store this data.
>
> It's not possible to just "release all log data", because it doesn't
> exist.
>
> -Gurch
Re: Thank you for discussing my Top 25 Nonprofit List
This has recently been discussed and we probably wouldn't consent to that
because it may give away too much private information.

On 10/3/07, Allan Benamer <abenamer@nonprofittechblog.org> wrote:
>
> Ahhh -- understood. I guess we'll just have to live with the estimates
> from
> Quantcast and Compete in the meantime. I have a question though -- if a
> service was willing to handle Wikipedia's load, would you use it? That is,
> if Quantcast offered to do statistics gathering for Wikimedia, would
> Wikimedia consent to it?



--
Casey Brown
Cbrown1023

---
Note: This e-mail address is used for mailing lists. Personal emails sent
to this address will probably get lost.
Re: Thank you for discussing my Top 25 Nonprofit List
Ok, I will just sit tight then. Thanks for the quick and straightforward
responses everyone! I didn't realize Wikimedia Foundation was so open and
transparent -- it's a heck of an example.

On 10/3/07, Casey Brown <cbrown1023.ml@gmail.com> wrote:
>
> This has recently been discussed and we probably wouldn't consent to that
> because it may give away too much private information.
Re: Thank you for discussing my Top 25 Nonprofit List
On 10/3/07, Matthew Britton <matthew.britton@btinternet.com> wrote:
> I'll try to clarify what geni is saying. The Wikimedia Foundation relies
> exclusively on donations and has a very tight budget. It can only buy as
> much hardware as it can afford, and can only just afford enough to keep
> the sites running. (The toolserver had to be donated separately).

It's a bit misleading to characterize this as a poor-wikimedia issue.
Handling this amount of data is hard for anyone. It's just that while
other sites need a higher degree of data for things like selling
themselves to advertisers, we don't... so efforts have been spent
elsewhere instead.

Historically we've only collected the information that we need for
capacity planning. I linked to that stuff up thread.

> The resources just aren't available to completely log all site traffic -
> it would require scripts to process the mess of data generated at a fast
> enough pace to keep up without using up precious CPU time, and a whole
> load of extra disk space to store this data.

As of ~January, we send records of every access to an analysis system.
Prior to then technical issues prevented us from collecting that kind
of data.

On that system we log (to disk) 1:100 and 1:1000 samples of the
traffic. Logging all accesses to disk would result in, as I said
before, about 0.6 TB of log data per day. We'd run out of disk rather
quickly. ;)
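
As a rough sanity check on these figures, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: it assumes the "twenty thousand hits a second" figure quoted earlier in the thread applies here, takes a TB to be 10^12 bytes, and uses function names made up for the example.

```python
# Back-of-the-envelope log volume, using figures quoted in this thread.
HITS_PER_SECOND = 20_000         # "twenty thousand hits a second" (assumed to apply)
FULL_LOG_BYTES_PER_DAY = 0.6e12  # "about 0.6 TB of log data per day" (decimal TB assumed)
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_record(hits_per_second=HITS_PER_SECOND,
                     bytes_per_day=FULL_LOG_BYTES_PER_DAY):
    """Average log record size implied by the two figures."""
    return bytes_per_day / (hits_per_second * SECONDS_PER_DAY)


def sampled_bytes_per_day(sample_rate, bytes_per_day=FULL_LOG_BYTES_PER_DAY):
    """Daily volume when only 1 in `sample_rate` requests is logged."""
    return bytes_per_day / sample_rate


if __name__ == "__main__":
    print(f"implied record size: {bytes_per_record():.0f} bytes")
    for rate in (100, 1000):
        gigabytes = sampled_bytes_per_day(rate) / 1e9
        print(f"1:{rate} sample: {gigabytes:.1f} GB/day")
```

Under those assumptions a full log works out to roughly 350 bytes per request, and the 1:100 and 1:1000 samples to about 6 GB and 0.6 GB per day respectively.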

We can send the data (at a configurable sample rate) to other hosts,
or to programs for analysis.
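
One simple way to thin a stream to a configurable rate is systematic 1-in-N sampling. The sketch below is illustrative only: the thread does not say how the sampling is actually implemented, and `sample_one_in_n` is a hypothetical name.

```python
from itertools import islice


def sample_one_in_n(records, n):
    """Return an iterator over every n-th record (systematic 1-in-n sampling)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # islice with a step picks items n-1, 2n-1, ... without buffering the stream.
    return islice(records, n - 1, None, n)


if __name__ == "__main__":
    # Stand-in for a stream of access records.
    requests = (f"request-{i}" for i in range(1000))
    kept = list(sample_one_in_n(requests, 100))
    print(len(kept), "of 1000 records kept")
```

Because the sampler is lazy, it could sit between a log source and any downstream consumer without holding the full stream in memory.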

We have at least some resources to run some analysis programs but they
must be very efficient unless they are to be run only on infrequently
sampled data.

We just don't have the analysis programs.

I checked a simple aggregator for pageview stats into SVN last night.
http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/counter/fast_counter.c?view=markup
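
For readers who want a feel for what such an aggregator does, here is a minimal Python sketch. It is not a port of fast_counter.c, whose internals are not described in this thread. It assumes the sampled records have already been reduced to request URLs, groups them by hostname, and multiplies by the sample rate to estimate totals. `estimate_pageviews` is a hypothetical name.

```python
from collections import Counter
from urllib.parse import urlsplit


def estimate_pageviews(sampled_urls, sample_rate):
    """Count sampled requests per host and scale up by the sample rate."""
    counts = Counter()
    for url in sampled_urls:
        host = urlsplit(url).hostname
        if host:  # skip records with no parseable hostname
            counts[host] += 1
    return {host: seen * sample_rate for host, seen in counts.items()}


if __name__ == "__main__":
    # Made-up sample records for illustration.
    sample = [
        "http://en.wikipedia.org/wiki/Main_Page",
        "http://en.wikipedia.org/wiki/Wiki",
        "http://de.wikipedia.org/wiki/Hauptseite",
    ]
    print(estimate_pageviews(sample, sample_rate=100))
```

The scaled counts are estimates: with a 1:100 sample, small sites and rarely viewed pages will have large relative error.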

I've got a unique viewers by project/country one almost done, I'll
probably check it in tonight.

> It's not possible to just "release all log data", because it doesn't exist.

That's not correct anymore.

Complete data is not stored, but it is now collected and can be
transmitted. ... It's not possible to "release all log data" because
we have ethical, legal, and procedural obligations to avoid
endangering the privacy of readers/editors with sloppy disclosures.

Re: Thank you for discussing my Top 25 Nonprofit List
Hi!

> We have at least some resources to run some analysis programs but they
> must be very efficient unless they are to be run only on infrequently
> sampled data.
> We just don't have the analysis programs.

I did once adapt our profiling framework to profile page hits
at full rate, and it managed to work on a simple modern machine.
It is not impossible; handling the collected data is just the tricky
part.

If anyone wants to work on it that way, we could get page hit counters, but still,
that needs work :)

Best regards,
--
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]



Re: Thank you for discussing my Top 25 Nonprofit List
Casey Brown wrote:
> This has recently been discussed and we probably wouldn't consent to that
> because it may give away too much private information.
>
> On 10/3/07, Allan Benamer <abenamer@nonprofittechblog.org> wrote:
>> Ahhh -- understood. I guess we'll just have to live with the estimates
>> from
>> Quantcast and Compete in the meantime. I have a question though -- if a
>> service was willing to handle Wikipedia's load, would you use it? That is,
>> if Quantcast offered to do statistics gathering for Wikimedia, would
>> Wikimedia consent to it?

Specifically the IP addresses of users. Such log data as has been
released has always been anonymized first.
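
The thread does not say how released logs were anonymized. One common approach, shown here purely as an illustration, is to zero the host portion of each address. The prefix lengths (/24 for IPv4, /48 for IPv6) and the name `truncate_ip` are assumptions of this sketch, not Wikimedia's procedure.

```python
import ipaddress


def truncate_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host bits of an IP address, keeping only the network prefix."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us build the enclosing network from a host address.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # Addresses from the documentation-reserved ranges.
    print(truncate_ip("192.0.2.123"))
    print(truncate_ip("2001:db8:1234:5678::1"))
```

Truncation of this kind reduces precision rather than guaranteeing anonymity, so it is a starting point, not a complete answer to the privacy concern.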

-Gurch

Re: Thank you for discussing my Top 25 Nonprofit List
Gregory Maxwell wrote:
> On 10/3/07, Matthew Britton <matthew.britton@btinternet.com> wrote:
>> I'll try to clarify what geni is saying. The Wikimedia Foundation relies
>> exclusively on donations and has a very tight budget. It can only buy as
>> much hardware as it can afford, and can only just afford enough to keep
>> the sites running. (The toolserver had to be donated separately).
>
> It's a bit misleading to characterize this as a poor-wikimedia issue.
> Handling this amount of data is hard for anyone. It's just that while
> other sites need a higher degree of data for things like selling
> themselves to advertisers, we don't... so efforts have been spent
> elsewhere instead.

Sorry about that. It is indeed hard for anyone. Just easier for other
sites with comparable traffic (e.g. ebay.com, microsoft.com) because
they have vastly more resources available. :)

> Historically we've only collected the information that we need for
> capacity planning. I linked to that stuff up thread.
>
>> The resources just aren't available to completely log all site traffic -
>> it would require scripts to process the mess of data generated at a fast
>> enough pace to keep up without using up precious CPU time, and a whole
>> load of extra disk space to store this data.
>
> As of ~January, we send records of every access to an analysis system.
> Prior to then technical issues prevented us from collecting that kind
> of data.
>
> On that system we log (to disk) 1:100 and 1:1000 samples of the
> traffic. Logging all accesses to disk would result in, as I said
> before, about 0.6 TB of log data per day. We'd run out of disk rather
> quickly. ;)
>
> We can send the data (at a configurable sample rate) to other hosts,
> or to programs for analysis.
>
> We have at least some resources to run some analysis programs but they
> must be very efficient unless they are to be run only on infrequently
> sampled data.
>
> We just don't have the analysis programs.

That's more or less what I was trying to say. The Foundation's resources
also limit the number of technical staff it can hire, and keeping the
site going has to be their first priority.

> I checked a simple aggregator for pageview stats into SVN last night.
> http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/counter/fast_counter.c?view=markup
>
> I've got a unique viewers by project/country one almost done, I'll
> probably check it in tonight.
>
>> It's not possible to just "release all log data", because it doesn't exist.
>
> That's not correct anymore.
>
> Complete data is not stored, but it is now collected and can be
> transmitted. ... It's not possible to "release all log data" because
> we have ethical, legal, and procedural obligations to avoid
> endangering the privacy of readers/editors with sloppy disclosures.


