Mailing List Archive

Querylogs and accesslogs
Hi,
Is this information accessible for an academic research paper?
Thank you
_______________________________________________
foundation-l mailing list
foundation-l@wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/foundation-l
Re: Querylogs and accesslogs [ In reply to ]
On 11/24/06, Antonio Gulli <gulli@di.unipi.it> wrote:
> Hi,
> Is this information accessible for an academic research paper?
> Thank you

Sorry, we do not currently store this data because of the performance impact.
We do, however, have plenty of data on editing.

We also have some low quality sampled data on readership for some of
our projects at:

http://tools.wikimedia.de/~leon/stats/wikicharts/
Re: Querylogs and accesslogs [ In reply to ]
Gregory Maxwell wrote:
> On 11/24/06, Antonio Gulli <gulli@di.unipi.it> wrote:
>> Hi,
>> Is this information accessible for an academic research paper?
>> Thank you
>
Is the wiki using the Apache web server or an equivalent server?
I was referring to the access.log file.
Re: Querylogs and accesslogs [ In reply to ]
On 11/24/06, Antonio Gulli <gulli@di.unipi.it> wrote:
> Is the wiki using the Apache web server or an equivalent server?
> I was referring to the access.log file

Although we use Apache, we do not store an access.log.
We also use Squid, but have disabled logging in that as well.

At peak we are serving over 20,000 requests per second. At this
activity level logging would present a non-negligible performance and
administrative overhead.

Let's pretend for a moment that all requests hit Apache:

My local MediaWiki installation on Apache produces log entries of
232.13 bytes per hit on average. I would expect that my log entries
would be shorter than the entries we'd see in production.

Over a day we are receiving about 1,188,345,600 HTTP requests.

This would be 256.9 GiB/day in access logs.

At 7.8 terabytes of log data to simply preserve a month's history,
keeping full access logs would be both unreasonable and wasteful.
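
The arithmetic behind this estimate can be reproduced with a short
script. This is only a rough sketch: the 31-day month and the binary
units (GiB, TiB) are assumptions, chosen because they reproduce the
figures quoted above, and the variable names are purely illustrative.

```python
# Back-of-envelope check of the access-log volume estimate.
BYTES_PER_HIT = 232.13            # average log entry size on the local test install
REQUESTS_PER_DAY = 1_188_345_600  # daily HTTP request count quoted in the message
DAYS_PER_MONTH = 31               # assumption: the message does not give a month length

GIB = 2**30  # bytes per gibibyte
TIB = 2**40  # bytes per tebibyte (assumed meaning of "terabytes" here)

daily_bytes = REQUESTS_PER_DAY * BYTES_PER_HIT
monthly_bytes = daily_bytes * DAYS_PER_MONTH
mean_rps = REQUESTS_PER_DAY / 86_400  # seconds in a day

print(f"Daily log volume:   {daily_bytes / GIB:.1f} GiB")    # 256.9 GiB
print(f"Monthly log volume: {monthly_bytes / TIB:.1f} TiB")  # 7.8 TiB
print(f"Mean request rate:  {mean_rps:,.0f} requests/s")     # 13,754 requests/s
```

The mean rate implied by the daily total (about 13,754 requests per
second) sits below the quoted peak of over 20,000, as one would expect.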

If you have some especially interesting research ideas, and your
research can be done on smaller amounts of data that we might be
collecting (such as the wikicharts data) then I would be glad to
discuss the possibilities. But it would be best to take that
discussion off list...
Re: Querylogs and accesslogs [ In reply to ]
You could also try this link if you want general statistics on Wikipedia:
http://stats.wikimedia.org/EN/Sitemap.htm

-- Hay Kranen / [[User:Husky]]

Re: Querylogs and accesslogs [ In reply to ]
Good morning,
I asked this question 2 years ago. I know that some stats are now
available here:
http://dammit.lt/wikistats/
Is it possible to use them for an academic paper?
Re: Querylogs and accesslogs [ In reply to ]
On Sun, May 11, 2008 at 3:57 PM, Antonio Gulli <gulli@di.unipi.it> wrote:

> Good morning,
> I asked this question 2 years ago. I know that some stats are now
> available here:
> http://dammit.lt/wikistats/
> Is it possible to use them for an academic paper?

I'm not sure anyone would trust you if you cited 'dammit', but the data is
freely available, so you can use it. :)
Re: Querylogs and accesslogs [ In reply to ]
> I'm not sure anyone would trust you if you cited 'dammit', but the
> data is
> freely available, so you can use it. :)

phew, trusting that guy?!!? never!

--
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]



Re: Querylogs and accesslogs [ In reply to ]
Do you think it is possible to access the raw access logs in Squid and Apache?
I would be interested in sessions (queries and clicks).
> I'm not sure anyone would trust you if you cited 'dammit', but the data is
> freely available, so you can use it. :)
Re: Querylogs and accesslogs [ In reply to ]
On Fri, Nov 24, 2006 at 5:36 PM, Gregory Maxwell <gmaxwell@gmail.com> wrote:
>
> Over a day we are receiving about 1,188,345,600 http requests.
>
> This would be 256.9 GiB/day in access logs.

As I recall logging stopped when the rate of incoming log data
outpaced the rate at which the data could be written to disk :)

--
Stephen Bain
stephen.bain@gmail.com
