Dear Wikimedia community,
please have a look at the following collaboration proposal,
which I sent to the Wikimedia board just yesterday.
The board (and I too!) would like to hear your opinion
on it before making a decision.
I would like to point out that this collaboration would
have purely scientific and academic purposes, without any
commercial involvement.
Finally, the analysis software we are developing (and
will apply to Wikipedia data, if the proposal is
accepted) will be distributed to the scientific community as
open source.
Of course, I will be glad to provide any details and
explanations you think necessary.
Thank you for your attention. Best regards,
- Mirco
----------------------------------------------------------
Dear Sirs,
I am writing to you on behalf of the KDD-Lab (Laboratory
on Knowledge Discovery and Delivery:
http://www-kdd.isti.cnr.it), a branch of the ISTI institute
of the Italian National Research Centre (CNR).
Our group is working (among other things) on a project
regarding the analysis of web server logs, and
recently we have been working on analysis techniques that seem
to be best suited to "content-rich sites". Our first
thought obviously went to Wikipedia...
We would like to have the opportunity to apply our
analysis techniques to the web logs of Wikipedia. Looking at
the Wikipedia access statistics, we believe that an optimal
amount of data would be the following: (1) the (raw) web logs
of the English section covering a few days of usage, or (2)
a few weeks for the Italian section.
Do you think it would be possible to start this kind of
collaboration?
Of course, we are willing to provide all the legal
agreements you consider necessary, especially those
regarding privacy. And, obviously, we will properly
acknowledge your contribution in any of our scientific
publications and reports where we use it.
[Addendum: the sensitive information in web logs is
essentially located in the "client IP" field ("who visited
that page"). However, for our research purposes this field
is not strictly needed, as an encrypted version of it would
be enough, thus avoiding most of the privacy issues.]
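To make the addendum concrete, here is a minimal sketch of one
way the "client IP" field could be replaced before logs are
shared. It is only an illustration under stated assumptions: it
uses a keyed hash (HMAC-SHA256) rather than encryption proper,
it assumes the client IP is the first whitespace-separated field
of each log line, and the function names and the secret key are
hypothetical. It is not a description of our software or of the
actual Wikipedia log format.

```python
import hashlib
import hmac


def pseudonymize_ip(ip: str, secret_key: bytes) -> str:
    """Map an IP address to a stable pseudonym using a keyed hash.

    The same IP always yields the same pseudonym (so visits can
    still be grouped by visitor), but the IP cannot be recovered
    without the secret key, which stays with the log owner.
    """
    digest = hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability


def pseudonymize_log_line(line: str, secret_key: bytes) -> str:
    """Replace the first field of a log line with its pseudonym.

    Assumption: the client IP is the first whitespace-separated
    field; the rest of the line is left untouched.
    """
    ip, sep, rest = line.partition(" ")
    return pseudonymize_ip(ip, secret_key) + sep + rest
```

With this kind of substitution the party holding the logs keeps
the key private, so the analysis sees only opaque but consistent
visitor identifiers.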
Thank you for your attention.
Looking forward to receiving your answer and opinion, I send you
my best regards,
- Mirco Nanni
====================================
http://ercolino.isti.cnr.it/mirco
====================================