Mailing List Archive

Wikipedia tracks user behaviour via third party companies
Hi,

Recently the report of the KnowPrivacy [1] study - a research project
by the School of Information at the University of California, Berkeley
- hit the German media [2].

It came to the conclusion that "All of the top 50 websites contained
at least one web bug at some point in a one month time period." [3]
which includes wikipedia.org.

This is very troubling and irritating for some of our (German) users
who are very sensitive to data privacy topics. So I got in touch
with Brian W. Carver (University of California), who connected me
to David Cancel, the maintainer of Ghostery, which was used to
identify the web bugs. David wrote to me today:

> The following web bug trackers were reported to us, on the following subdomains:
> Google Analytics - vls.wikipedia.org
> Doubleclick - hu.wikipedia.org
> Both were seen in yesterday's data so they're recent. We don't receive any page level information so that's as much detail as we have. Hope that helps.

I wasn't able to track down the Doubleclick web bug on the Hungarian
Wikipedia, but the Google Analytics web bug is integrated in every page of
the West Flemish Wikipedia via JavaScript [4].

Our privacy policy [5] states "The Wikimedia Foundation may keep raw
logs of such transactions [IP and other technical information], but
these will not be published or used to track legitimate users." and
"As a general principle, the access to, and retention of, personally
identifiable data in all projects should be minimal and should be used
only internally to serve the well-being of the projects."

I think we should stop the current use of Google Analytics ASAP.

Bye, Tim.

--
http://wikimedia.de

[1] http://knowprivacy.org
[2] http://www.heise.de/newsticker/Studie-Google-fuehrend-bei-Web-Bug-Nutzung--/meldung/139841
[3] http://www.knowprivacy.org/report/KnowPrivacy_Final_Report.pdf, p. 4
[4] http://vls.wikipedia.org/wiki/MediaWiki:Common.js
[5] http://wikimediafoundation.org/wiki/Privacy_policy

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Re: Wikipedia tracks user behaviour via third party companies
Hi,

> I think we should stop the current use of Google Analytics ASAP.

I'm usually a proponent of indefinite bans for people who do this, but
there are others who want milder approaches :-)
Indeed, this is a violation of our privacy policy and should never be
allowed. Thanks for the heads-up.

Do note that hu.wikipedia.org has an external stats aggregator,
'stats.wikipedia.hu', which is hosted on vhost102.sx6.tolna.net - and
all our traffic is sent there ( http://hu.wikipedia.org/w/index.php?title=MediaWiki:Lastmodifiedat&oldid=4493139
- as well as a few other places )

I removed it from both. Thanks again :)

Domas

Re: Wikipedia tracks user behaviour via third party companies
Tim 'avatar' Bartel wrote:
> I think we should stop the current use of Google Analytics ASAP.
>
> Bye, Tim.
>
>
Surely this is something which should be possible to block at the
MediaWiki level, by suppressing the generation of any HTML that loads
any indirect resources (scripts, iframes, images, etc.) whatsoever other
than from a clearly defined whitelist of Wikimedia-Foundation-controlled
domains?

Doing this should completely stop site admins from adding web bugs.
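
A rough sketch of the kind of check meant here, written in Python rather than in MediaWiki itself. The allowlist entries and all function names below are assumptions made for illustration, not an existing MediaWiki feature; the sketch scans a piece of HTML for tags that make the browser fetch a resource and reports those pointing outside the allowlist.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical allowlist: the real set of Foundation-controlled domains is an assumption.
ALLOWED_SUFFIXES = ("wikipedia.org", "wikimedia.org")

# Tags (and the attribute on each) that make a browser fetch a resource automatically.
RESOURCE_ATTRS = {
    "script": "src",
    "iframe": "src",
    "img": "src",
    "link": "href",
}


def is_allowed(url):
    """Return True for relative URLs and for hosts on (or under) an allowed domain."""
    host = urlparse(url).hostname
    if host is None:
        return True  # no host part (e.g. a relative URL): treated as same origin
    return any(host == s or host.endswith("." + s) for s in ALLOWED_SUFFIXES)


class ExternalResourceFinder(HTMLParser):
    """Collects (tag, url) pairs for resources loaded from non-allowed hosts."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        for name, value in attrs:
            if name == attr_name and value and not is_allowed(value):
                self.violations.append((tag, value))


def find_external_resources(html):
    finder = ExternalResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.violations
```

Note that a static scan like this only sees resources named directly in the HTML. A tracker that is injected at run time by a script (as with a site-wide JavaScript page) would need the check to be applied to what the script loads as well, so this is only one part of what such a block would require.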

-- Neil


Re: Wikipedia tracks user behaviour via third party companies
We need tools to track user behavior inside Wikipedia. As it is now, we
know nearly nothing at all about user behavior, and nearly everyone
saying anything about users at Wikipedia makes gross estimates and wild
guesses.

User privacy on Wikipedia is close to a public hoax: pages are
transferred unencrypted and with user names in clear text. Anyone with
access to a public hub is able to intercept and identify users, as are
_all_ websites that are referenced during an edit on
Wikipedia, through correlation of logs.

Compared to this the whole previous discussion about the Iranian steward
is somewhat strange, if not completely ridiculous.

Get real, the whole system and access to it is completely open!

John

Re: Wikipedia tracks user behaviour via third party companies
Domas Mituzas wrote:
> Do note, hu.wikipedia.org has external stats aggregator,
> 'stats.wikipedia.hu', which is hosted on vhost102.sx6.tolna.net - and
> all our traffic is sent there ( http://hu.wikipedia.org/w/index.php?title=MediaWiki:Lastmodifiedat&oldid=4493139
> - as well as few other places )

One way to fight this would be to offer more detailed visitor statistics
to people who need them.

Re: Wikipedia tracks user behaviour via third party companies
I forgot a link to an article that describes privacy on Wikipedia
very well! ;)

http://en.wikipedia.org/wiki/The_Emperor%27s_New_Clothes

Re: Wikipedia tracks user behaviour via third party companies
John at Darkstar wrote:
> We need tools to track user behavior inside Wikipedia. As it is now, we
> know nearly nothing at all about user behavior, and nearly everyone
> saying anything about users at Wikipedia makes gross estimates and wild
> guesses.
>
> User privacy on Wikipedia is close to a public hoax: pages are
> transferred unencrypted and with user names in clear text. Anyone with
> access to a public hub is able to intercept and identify users, as are
> _all_ websites that are referenced during an edit on
> Wikipedia, through correlation of logs.
>
> Compared to this the whole previous discussion about the Iranian steward
> is somewhat strange, if not completely ridiculous.
>
> Get real, the whole system and access to it is completely open!
>
> John
>

As you say, there is no possibility of absolute privacy from anyone with
access to the traffic stream, since the Internet was never engineered to
give this kind of privacy. Wikipedia is as "completely open" as any other
non-https website -- and, even with https, as with any other website
with publicly visible transactions, for anyone with access to the
traffic stream, simple traffic analysis is generally enough to correlate
user identities to IPs. A combination of http and Tor is probably as
good as it gets in attempting to avoid this, but even this has its
limitations.

But it is simply unreasonable to equate this with no privacy at all.
Most possible eavesdroppers do _not_ have access to the entire traffic
stream, and those who do have access to traffic generally only have
access to part of the traffic stream, and even then, most of them can't
be bothered to eavesdrop, or are discouraged from doing so by privacy laws.

Given this, it is quite reasonable to take appropriate technical
measures that attempt to keep as much of that remaining privacy as
secure as possible.

-- Neil



Re: Wikipedia tracks user behaviour via third party companies
On Thu, Jun 4, 2009 at 1:18 AM, Tim 'avatar' Bartel <
wikipedia@computerkultur.org> wrote:

> Hi,
>
> recently the report of the KnowPrivacy [1] study - a research project
> by the School of Information from University of California in Berkeley
> - hit the German media [2].
>

The case of vlswiki is troubling, as a single sysop is stubbornly
re-adding the analytics bug. From the page history of MediaWiki:Common.js:

- 4 Jun 2009 06:54, Midom (74 bytes): "privacy policy violation"

- 25 Apr 2009 15:13, Tbc (363 bytes): "cannot find in the policy this is not allowed"

- 9 Jul 2008 21:13, Drini (74 bytes): "google analytics is not allowed in global scripts"


http://vls.wikipedia.org/w/index.php?title=MediaWiki:Common.js&action=history

What I propose is that re-adding this should result in removal of the
sysop bit for misuse of powers.
Don't we have a committee that checks privacy violations?
Re: Wikipedia tracks user behaviour via third party companies
2009/6/4 Pedro Sanchez <pdsanchez@gmail.com>:

> What I propose is this being re-added would cause a removal of sysop bit due
> to misuse of powers.
> Don't we have a committee that checks privacy violations?


The Foundation would surely have this power.


- d.

Re: Wikipedia tracks user behaviour via third party companies
The interesting thing is "who has an interest in which user's identity".
Let's take an example: some organization sets up a site with a honeypot
and logs all visitors. Then they correlate that with the RC logs from
Wikipedia and check who adds external links back to
themselves. They do not need direct access to Wikipedia logs or the raw
traffic.

There is only one valid reason, as I see it, to avoid certain stat
engines, and that is to block advertising companies from getting
information about the readers. The writers do not have any real
anonymity at all.

John

Re: Wikipedia tracks user behaviour via third party companies
2009/6/4 Tim 'avatar' Bartel <wikipedia@computerkultur.org>:

> I think we should stop the current use of Google Analytics ASAP.

Indeed.

For the record, we've discussed Google Analytics before:

* in July 2007, for pms.wiki - nothing implemented, I think

* in October 2007, for en.wikibooks - implemented but then stopped. At
the same time, en.wikinews had implemented it and then stopped it
again; the Wikimania07 site also ran it for most of the year and then
had it taken out when discovered.

* in December 2007, for fi.wiki - implemented but then stopped

* in July 2008, for th.wiki - discovered and removed. A check then
found it on vls.wiki and th.wikisource; the discussion doesn't record
that these were removed, but checking the sites shows they were.

The vls one is interesting - it was removed by Drini in July, per the
foundation-l discussion, and only added back in at the end of April
2009... and there we get this problem.

So, yeah. Pretty solid consensus that this is something to avoid. If
we have some "explanatory notes" to go with the privacy policy
anywhere, it might be worth explicitly mentioning the use of external
logging services and Why Thou Shalt Not.

--
- Andrew Gray
andrew.gray@dunelm.org.uk

Re: Wikipedia tracks user behaviour via third party companies
Domas Mituzas <midom.lists@...> writes:

> Do note, hu.wikipedia.org has external stats aggregator,
> 'stats.wikipedia.hu', which is hosted on vhost102.sx6.tolna.net - and
> all our traffic is sent there (
> http://hu.wikipedia.org/w/index.php?title=MediaWiki:Lastmodifiedat&oldid=4493139

The stats aggregator for hu.wikipedia.org was set up with community approval,
the public results contain no identifiable per-machine information (you can
check them here: http://stats.wikipedia.hu/cgi-bin/awstats.pl ), and the records
are not used for any other purposes. I think it is well within the lines of the
privacy policy.

As for Doubleclick, that was probably a mistake on KnowPrivacy's part - maybe
they misidentified the aggregator (we use awstats) because Doubleclick uses a
similar method? If not, I would appreciate it if they could provide more
detailed information.


Re: Wikipedia tracks user behaviour via third party companies
The Ombudsman Commission would likely be that group. Although their
focus has traditionally been CheckUser, their purview actually covers
any and all violations of the privacy policy. Here is one such case. At
this moment, I agree: this sysop shouldn't be.

-Mike

Re: Wikipedia tracks user behaviour via third party companies
Hi,

2009/6/4 Tisza Gergő <gtisza@gmail.com>:
> As for Doubleclick, that was probably a mistake on KnowPrivacy's part - maybe
> they misidentified the aggregator (we use awstats) because Doubleclick uses a
> similar method? If not, I would appreciate if they could serve with more
> detailed information.

Sad but true, they don't have further information on that. I'll try to
reproduce it.

Bye, Tim.

--
http://wikimedia.de

Re: Wikipedia tracks user behaviour via third party companies
John at Darkstar wrote:
> The interesting thing is "who has an interest in which user's identity".
> Let's take an example: some organization sets up a site with a honeypot
> and logs all visitors. Then they correlate that with the RC logs from
> Wikipedia and check who adds external links back to
> themselves. They do not need direct access to Wikipedia logs or the raw
> traffic.
>
> There is only one valid reason, as I see it, to avoid certain stat
> engines, and that is to block advertising companies from getting
> information about the readers. The writers do not have any real
> anonymity at all.
>
> John
>

Indeed they could. But even so, they would still have great difficulty
in getting more than a small fraction of Wikipedia's readers to both
visit the honeypot and make an edit that links to it, and the vast
majority of unaffected users will still avoid being bitten by this
attack. And even then, they will still only have obtained a mapping
between the user's current IP and their Wikipedia account, and will
still have to correlate this back to a personal identity, which is often
harder than it might seem to be in theory.

The world is a dangerous place, but the fact that privacy and security
can never be absolute is not a reason not to make good-faith efforts to
preserve as much of both as reasonably possible within the limits of
the time and resources available.

Just because a door can be knocked down with a sledgehammer (or a wall
demolished with a pneumatic hammer) is not a reason not to have a lock
on it, or a door there in the first place.

-- Neil


Re: Wikipedia tracks user behaviour via third party companies
Nikola Smolenski wrote:
> Domas Mituzas wrote:
>
>> Do note, hu.wikipedia.org has external stats aggregator,
>> 'stats.wikipedia.hu', which is hosted on vhost102.sx6.tolna.net - and
>> all our traffic is sent there ( http://hu.wikipedia.org/w/index.php?title=MediaWiki:Lastmodifiedat&oldid=4493139
>> - as well as few other places )
>>
>
> One way to fight this would be to offer more detailed visitor statistics
> to people who need them.
>

And another, possibly even more effective one would be to prevent the
loading of external resources in the software, except possibly via
editors' own custom user javascript pages.

-- Neil


Re: Wikipedia tracks user behaviour via third party companies
Web bugs for statistical data are a legitimate want but potentially a
horrible privacy violation.

So I asked on wikitech-l, and the obvious answer appears to be to do
it internally. Something like http://stats.grok.se/ only more so.

So - if you want web bug data in a way that fits the privacy policy,
please pop over to the wikitech-l thread with technical suggestions
and solutions :-)


- d.

Re: Wikipedia tracks user behaviour via third party companies [ In reply to ]
David Gerard wrote:
> Web bugs for statistical data are a legitimate want but potentially a
> horrible privacy violation.
>
> So I asked on wikitech-l, and the obvious answer appears to be to do
> it internally. Something like http://stats.grok.se/ only more so.
>
> So - if you want web bug data in a way that fits the privacy policy,
> please pop over to the wikitech-l thread with technical suggestions
> and solutions
Precisely. External web bug trackers should be removed without
exception. People who add them innocently, out of an understandable
interest in collecting aggregated information that would not violate the
privacy policy, should be directed to request and help with internal
solutions, kept within appropriate limits to comply with the policy.

--Michael Snow

Re: Wikipedia tracks user behaviour via third party companies [ In reply to ]
> Surely this is something which should be possible to block at the
> MediaWiki level

Maybe the Foundation office could set up Google Analytics accounts for all
projects in the first place, with a super secure password, and then never
use them. Would this work, or would somebody else still be able to set up
analytics?

Go Freedom!
Unionhawk



Re: Wikipedia tracks user behaviour via third party companies [ In reply to ]
Michael Snow wrote:
> External web bug trackers should be removed without
> exception. People who add them innocently, out of an understandable
> interest in collecting aggregated information that would not violate the
> privacy policy, should be directed to request and help with internal
> solutions, kept within appropriate limits to comply with the policy.

So how do you propose we enforce this? I'm thinking we need to prevent this
from happening in the first place. Analytics like this could pretty much
give checkuser powers to anybody!

They have a legitimate purpose, so, if analytics are wanted/needed by the
Foundation, they may be implemented by the Foundation. Otherwise, no
analytics.

Go Freedom!
Unionhawk



Re: Wikipedia tracks user behaviour via third party companies [ In reply to ]
Installing Google Analytics, even for our own purposes, is a bad idea.
For one, it creates a link to google that is not necessarily what we
want; it would be a big target for people to try and hack, and it
presents tempting security risks on Google's end. Not to mention, as
far as I know the program is proprietary.

If we're going to do something like this, it should be open source,
and it should be something that we can internally install and monitor
without external options. That is, again, assuming we do something
like that. That's not a foregone presumption. I'm not convinced that
we need to be tracking user behavior at this point in time, or that
the tradeoffs for doing so are worth any benefits, or that doing so is
in furtherance of our mission.

-Dan

Re: Wikipedia tracks user behaviour via third party companies [ In reply to ]
Dan Rosenthal wrote:
> Installing Google Analytics, even for our own purposes, is a bad idea.
> For one, it creates a link to google that is not necessarily what we
> want; it would be a big target for people to try and hack, and it
> presents tempting security risks on Google's end. Not to mention, as
> far as I know the program is proprietary.
>
> If we're going to do something like this, it should be open source,
> and it should be something that we can internally install and monitor
> without external options. That is, again, assuming we do something
> like that. That's not a foregone presumption. I'm not convinced that
> we need to be tracking user behavior at this point in time, or that
> the tradeoffs for doing so are worth any benefits, or that doing so is
> in furtherance of our mission.
>

The plain pageview stats are already available.
Erik Zachte has been doing some work on other stats.
<http://stats.wikimedia.org/EN/VisitorsSampledLogRequests.htm>

If I were to compile a wishlist of stats things:

1. stats.grok.se data for non-Wikipedia projects
2. A better interface for stats.wikimedia.org - There's a lot of data
there, but it can be hard to find and it's not well publicized. The
only reason I knew about the link above is because someone pointed it
out to me once and I bookmarked it.
3. Pageview stats at <http://dammit.lt/wikistats/> in files based on
projects. It would be a lot easier for people at the West Flemish
Wikipedia to analyze statistics themselves if they didn't have to
download tons of data they don't need.
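
As a rough illustration of item 3, the sketch below splits one dump into per-project groups. It assumes each line has the form "project page_title count bytes", which is an assumption about the dammit.lt file layout rather than something stated in this thread; the function name splitByProject and the sample data are hypothetical.

```javascript
// Assumption: each dump line looks like "project page_title count bytes".
// The sample data below is invented for illustration.
function splitByProject(dumpText) {
  const byProject = new Map(); // project code -> array of lines
  for (const line of dumpText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    const project = trimmed.split(" ")[0];
    if (!byProject.has(project)) byProject.set(project, []);
    byProject.get(project).push(trimmed);
  }
  return byProject;
}

const sampleDump = [
  "en Main_Page 42 123456",
  "vls Voorblad 3 20480",
  "en Influenza 7 65536",
].join("\n");

const perProject = splitByProject(sampleDump);
console.log(perProject.get("vls")); // only the West Flemish lines
```

With files split this way, a small project would only need to fetch its own group instead of the whole dump.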

--
Alex (wikipedia:en:User:Mr.Z-man)

Re: Wikipedia tracks user behaviour via third party companies [ In reply to ]
On Thu, Jun 4, 2009 at 8:35 AM, Dan Rosenthal <swatjester@gmail.com> wrote:
> Installing Google Analytics, even for our own purposes, is a bad idea.
> For one, it creates a link to google that is not necessarily what we
> want; it would be a big target for people to try and hack, and it
> presents tempting security risks on Google's end.  Not to mention, as
> far as I know the program is proprietary.
<snip>

I may be misreading, but I believe Unionhawk's suggestion was to set up
-- but not install -- Google Analytics in the hope that simply
registering the accounts would block anyone else from creating an
Analytics account pointed at Wikipedia. (I don't know if it actually
works that way.)

That strikes me as rather too much work though. Better to block the
relevant URLs from being inserted in the first place. That could be
accomplished in any one of several technical ways.

One idea is the proposal to install the AbuseFilter in a global mode,
i.e. rules loaded at Meta that apply everywhere. If that were done
(and there are some arguments about whether it is a good idea), then
it could be used to block these types of URLs from being installed,
even by admins.

-Robert Rohde

Wikipedia tracks user behaviour via third party companies [ In reply to ]
[repost with proper subscribed mail address]

Alex wrote:

> The plain pageview stats are already available.
> Erik Zachte has been doing some work on other stats.
> <http://stats.wikimedia.org/EN/VisitorsSampledLogRequests.htm>

> If I were to compile a wishlist of stats things:

> 1. stats.grok.se data for non-Wikipedia projects
> 2. A better interface for stats.wikimedia.org - There's a lot of data
> there, but it can be hard to find and it's not well publicized. The
> only reason I knew about the link above is because someone pointed it
> out to me once and I bookmarked it.
> 3. Pageview stats at <http://dammit.lt/wikistats/> in files based on
> projects. It would be a lot easier for people at the West Flemish
> Wikipedia to analyze statistics themselves if they didn't have to
> download tons of data they don't need.

Your enhancement requests:

1 IIRC this is already an (albeit undocumented) feature.
One can manually alter the URL to find e.g. Wiktionary stats.
But I forgot precisely how, and I see nothing on User:Henrik's talk page.

2 Seconded wholeheartedly. In fact I started to reshape the main page (just
eight links) this week :) I just uploaded it a bit earlier than planned:
http://stats.wikimedia.org/

3 That could be a useful extension of the preservation script described
below.

--------------------------------
General response

I would say that quite a lot has happened since the beginning of 2008. A recap:

As has already been said, Domas' (and Tim's) work was a major step forward.

http://dammit.lt/wikistats/

Two very useful aggregators of these on a page by page basis are

http://stats.grok.se/
http://wikistics.falsikon.de/

Based on the same data, at a higher aggregation level, there are visitor
counts for all projects in an easily digestible form

http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm

Also, for the past two months we have known much more about Wikimedia
traffic, based on 8 reports with all kinds of cross sections:

http://infodisiac.com/blog/2009/04/wikimedia-traffic-analyzed/

With regard to the dammit.lt raw data, I helped to preserve these for posterity
in a more compact and slightly filtered state, so that we can query them
for much longer. (The dammit.lt server has space for one or two months.) Actually
Mathias Schindler started this important rescue effort. Each day all files
are downloaded and processed, reduced from 40 Gb per month to 3 Gb (May
2009). I also made a script to query these files, which is much more
efficient than processing the original hourly files. But the runtime is still
considerable, so querying these files without restraints through a public
interface is not advisable. But the toolserver could get a copy of the files,
of course.
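
The kind of compaction described above might look roughly like the sketch below. It is not the actual preservation script. It assumes hourly lines of the form "project page_title count bytes", and the filter shown (dropping pages below a minimum total) is only a guess at what "slightly filtered" could mean; mergeHourly and minTotalViews are hypothetical names.

```javascript
// Assumption: hourly lines look like "project page_title count bytes".
// Merges several hourly files into one list of "project title total" lines,
// dropping pages whose summed count is below minTotalViews (a guessed filter).
function mergeHourly(hourlyFiles, minTotalViews) {
  const totals = new Map(); // "project title" -> summed view count
  for (const text of hourlyFiles) {
    for (const line of text.split("\n")) {
      const parts = line.trim().split(" ");
      if (parts.length < 3) continue; // skip blank or malformed lines
      const count = Number.parseInt(parts[2], 10);
      if (Number.isNaN(count)) continue;
      const key = parts[0] + " " + parts[1];
      totals.set(key, (totals.get(key) || 0) + count);
    }
  }
  const merged = [];
  for (const [key, total] of totals) {
    if (total >= minTotalViews) merged.push(key + " " + total);
  }
  return merged.sort();
}

// Invented sample data for two hours.
const hour1 = "en Influenza 10 1000\nvls Voorblad 1 200";
const hour2 = "en Influenza 15 1500\nde Grippe 3 300";
console.log(mergeHourly([hour1, hour2], 2));
```

Summing per page and dropping the byte column is one plausible source of the size reduction, since each page then appears once per period instead of once per hour.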

http://infodisiac.com/blog/wp-content/uploads/2009/05/influenza1.png

Is this enough? Of course not, there is so much more to learn.

Regarding geo data: for many months a patch for Domas' (and Tim's) code, by
Antonio José Reinoso Peinado, has been lying around that would add country
level geolocation data from Maxmind's public database (ip->geo lookup).
Although I promised to look at it, I haven't found the time yet.

Regarding web bugs: comScore also proposed such a scheme to us.
Apart from the question of how much it would bring us that we don't or can't
figure out ourselves, an overriding concern is privacy.

Erik Zachte
Data Analyst
Wikimedia Foundation, Inc.
E-Mail: ezachte@wikimedia.org



Re: Wikipedia tracks user behaviour via third party companies [ In reply to ]
On Thu, Jun 4, 2009 at 6:01 AM, Neil Harris<usenet@tonal.clara.co.uk> wrote:
> Surely this is something which should be possible to block at the
> MediaWiki level, by suppressing the generation of any HTML  that loads
> any indirect resources (scripts, iframes, images, etc.) whatsoever other
> than from a clearly defined whitelist of Wikimedia-Foundation-controlled
> domains?

Not possible as long as we allow JS to be added. See [[halting problem]].
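
For reference, the whitelist check itself is easy to sketch for URLs that appear literally in generated HTML; the hard part is that script can build URLs at run time, where no such check sees them. The domain list and the function name isAllowedResource below are illustrative assumptions, not an actual MediaWiki setting.

```javascript
// Hypothetical whitelist of Foundation-controlled domains (illustrative only).
const ALLOWED_DOMAINS = ["wikipedia.org", "wikimedia.org"];

// Returns true if the URL's host is a whitelisted domain or a subdomain of one.
function isAllowedResource(resourceUrl) {
  let host;
  try {
    host = new URL(resourceUrl).hostname;
  } catch (err) {
    return false; // unparseable URLs are rejected
  }
  return ALLOWED_DOMAINS.some(
    (domain) => host === domain || host.endsWith("." + domain)
  );
}

console.log(isAllowedResource("http://upload.wikimedia.org/logo.png"));
console.log(isAllowedResource("http://www.google-analytics.com/urchin.js"));
console.log(isAllowedResource("http://stats.wikipedia.hu/counter.js"));
```

Matching on the parsed hostname, rather than searching the URL for a substring, avoids accepting look-alike hosts such as evilwikipedia.org.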

On Thu, Jun 4, 2009 at 6:20 AM, John at Darkstar<vacuum@jeb.no> wrote:
> User privacy on Wikipedia is close to a public hoax, pages are
> transferred unencrypted and with user names in clear text. Anyone with
> access to a public hub is able to intercept and identify users, in
> addition to _all_ websites that are referenced during an edit on
> Wikipedia through correlation of logs.

This only works for getting info on totally random Wikipedia users,
who happen to edit using your router. This isn't a serious compromise
of privacy for practical purposes due to the resources required to get
info on a large number of users, or to target a specific user. Users
who are concerned about this, however, can use secure.wikimedia.org.

Note that if you make edits, it should be pretty easy for a MITM to
figure out your IP address even if you're using SSL: 1) Watch all
traffic going to Wikimedia IP addresses. 2) Guess which traffic
streams correspond to edits by looking at the amount of data the
client is sending. 3) Correlate suspected edits with RecentChanges
over a period of time. Once they know your IP address, if they're a
MITM, they can still figure out what sites you're accessing, just not
the exact pages (or exact domain in the case of virtual hosting).
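
Step 3 can be sketched as a simple time-window match. Everything here is invented for illustration: the timestamps, the user names, the five-second window, and the function name correlateEdits. The input is a list of upload bursts seen from one IP address plus a list of RecentChanges entries.

```javascript
// Counts, per user, how many RecentChanges entries appear shortly after an
// upload burst observed from a single IP address. All data is made up.
function correlateEdits(observedUploads, recentChanges, maxDelaySeconds) {
  const hits = new Map(); // user name -> number of matching edits
  for (const upload of observedUploads) {
    for (const change of recentChanges) {
      const delay = change.time - upload.time;
      if (delay >= 0 && delay <= maxDelaySeconds) {
        hits.set(change.user, (hits.get(change.user) || 0) + 1);
      }
    }
  }
  // Most frequently matching user first.
  return [...hits.entries()].sort((a, b) => b[1] - a[1]);
}

const uploads = [{ time: 100 }, { time: 500 }, { time: 900 }];
const changes = [
  { time: 102, user: "Alice" },
  { time: 103, user: "Bob" },
  { time: 501, user: "Alice" },
  { time: 700, user: "Carol" },
  { time: 903, user: "Alice" },
  { time: 905, user: "Dave" },
];
console.log(correlateEdits(uploads, changes, 5));
```

A single match says little, because other users edit in the same window; it is the repeated matches over a period of time that single out one account.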

So if you want real privacy against MITMs, you still need to use
something like Tor, as usual.

On Thu, Jun 4, 2009 at 12:53 PM, Robert Rohde<rarohde@gmail.com> wrote:
> One idea is the proposal to install the AbuseFilter in a global mode,
> i.e. rules loaded at Meta that apply everywhere. If that were done
> (and there are some arguments about whether it is a good idea), then
> it could be used to block these types of URLs from being installed,
> even by admins.

No, it wouldn't.

document.write('<script' + ' src="' + 'http://www.go' + 'ogle-an' +
'alytics.com/urc' + 'hin.js" type="text/javascript"></script>');

Obviously more complicated obfuscation is possible. JavaScript is
Turing-complete. You can't reliably figure out whether it will output
a specific string.
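
A small demonstration of the point, assuming a filter that only searches the script's source text for blocked host names. The blocklist and function names are hypothetical and do not reflect real AbuseFilter rule syntax.

```javascript
// Hypothetical blocklist; real AbuseFilter rules are written differently.
const BLOCKED_SUBSTRINGS = ["google-analytics.com", "doubleclick.net"];

// Static check: does the given text mention a blocked host literally?
function staticFilterBlocks(text) {
  return BLOCKED_SUBSTRINGS.some((needle) => text.includes(needle));
}

// The URL expression from the snippet above, kept as source text.
const obfuscatedSource =
  "'http://www.go' + 'ogle-an' + 'alytics.com/urc' + 'hin.js'";

// Only evaluating the expression reveals the URL it builds.
function evaluateExpression(sourceText) {
  return new Function("return (" + sourceText + ");")();
}

const builtUrl = evaluateExpression(obfuscatedSource);
console.log(staticFilterBlocks(obfuscatedSource)); // the source text passes
console.log(builtUrl);                             // the URL it builds
console.log(staticFilterBlocks(builtUrl));         // the built URL is caught
```

The filter only ever sees the first form. Catching the second would mean running arbitrary script, which is where the halting problem comes in.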

However, perhaps a default AbuseFilter could be installed telling
admins that installing Analytics is a violation of Foundation policy
and that they'll get desysopped if they continue. That wouldn't stop
them from doing it if they were determined, but it might be able to
trigger an alert to get the appropriate parties to make sure they
didn't try evading it. Maybe the filter could be installed on Meta
and local violations could go to Meta logs so stewards will see? Are
global filters possible right now?

At a bare minimum, such a warning would reduce inadvertent errors.
