Mailing List Archive

Re: [Wikimedia-l] [Wikimedia Announcements] Announcing a new Wikimedia project: Abstract Wikipedia
Sorry I'm coming to this discussion a bit late, but I'd like to underline a
slightly different aspect of the concern that Phoebe raised:

> It concerns me that, at least in the high-level project proposals I've
> seen (I haven't been tracking this closely, and haven't read the academic
> papers) I have not yet seen discussions of ethical data, or how we might
> think about identifying bias, or even how to recruit contributors and the
> impact on existing contributors.
>

Using the terminology of Ibram X. Kendi (and others), I'd put this as:
"it's not enough to not be racist, you must actively be *anti-racist*."

Abstract Wikipedia is a "color blind" project. Indeed it is often
described as advancing WMF goals by improving the amount of content
available for minority languages.

However, it is built on a huge edifice of ML and AI technology which
advantages majority languages and the already-powerful.

As Phoebe mentioned, the subtle biases of ML translation toward majority
views (selecting the "proper" gender pronoun for someone described as a
"doctor" or "professor", say) are well known, and certainly deserve to be
foregrounded from the start, as Denny has pledged to do in his response to
Phoebe.

But the infrastructure of this project is built this way from the ground
up. Language models for European languages are orders of magnitude better
than language models for minority languages (if the latter exist at all).
The same is true for ontologies and every other constructed abstraction,
down to choices of what topics are significant enough to include in an
abstract article---but that ground has been ably covered by Kaldari and
others. So let me concentrate solely on language models in the remainder
(with some parenthetical asides, for which I hope you'll forgive me).

I would like to challenge Abstract Wikipedia not only to be "not racist" or
"color blind", but to be actively *antiracist*. That is, instead of
passively accepting the status quo wrt language models (& etc), to commit
to actively supporting a language model in *at least one* minority
language, treating it as a first-class citizen or (better) the *main*
output of the project. That means not just looking for "a good enough
language model that happens not to be a European language" but *actively
developing the language model* so that the Abstract Wikipedia project *from
inception* has a positive effect on *at least one* community speaking an
underrepresented language with a small Wikipedia. (Again, WLOG this could
apply to general AI/ML support for many many minority groups, but I'm
sticking with "at least one" and "language model" in order to make this as
concrete and actionable as possible.) This of course also means committing
to hire a speaker of that non-European language as part of the core team
(not just an "and translations" afterthought), committing to foregrounding
that language in demonstrations, and doing outreach and community building
to the language group in question. (All the mockups I've seen have been in
German and English, and have been pitched to an English-speaking audience.)

I don't think it is wise in 2020 to pretend that "colorblind" business as
usual will advance the goals of our organization. We need to actively work
to ensure this project has effects that *work against* the significant
pre-existing biases toward highly-educated speakers of European languages.
It is not enough to say that "someday" this "may" have an effect on
minority language groups if "somebody" ever gets around to doing it. We
must make those investments proactively and with clear intention in order
to effect the change we wish to see in the world.
-- C. Scott Ananian
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
Re: [Wikimedia-l] [Wikimedia Announcements] Announcing a new Wikimedia project: Abstract Wikipedia
I applaud this idea. Preferably a language family with a large community of
practice, 'minority' in the sense of coverage and support by modern tools
and scaffolding, not in the sense of limited use.

We used to have a roughly weighted list of major world languages by
(spoken, written; primary, secondary) and how well covered they were by wp
(articles, contributors). Is there something like that still?

//S

On Wed., Aug. 5, 2020, 3:19 p.m. C. Scott Ananian, <cananian@wikimedia.org>
wrote:

> Sorry I'm coming to this discussion a bit late, but I'd like to underline a
> slightly different aspect of the concern that Phoebe raised:
>
> [...]
Re: [Wikimedia-l] [Wikimedia Announcements] Announcing a new Wikimedia project: Abstract Wikipedia
On Wed, Aug 5, 2020 at 2:01 PM Samuel Klein <meta.sj@gmail.com> wrote:

> We used to have a roughly weighted list of major world languages by
> (spoken, written; primary, secondary) and how well covered they were by wp
> (articles, contributors). Is there something like that still?
>

I think you might be referring to the links in the 3rd and 4th lines of
https://meta.wikimedia.org/wiki/Template:Lists_of_Wikipedias ?
Looking more closely, it appears that the "speakers per article" listing is
unfortunately a few years out of date, as the column of "Speakers" was
being manually updated from Ethnologue stats (which are now paywalled).
I've started a tangential discussion on the talkpage there, about using
Wikidata instead.
Additionally, none of those links contain the "primary / secondary
language" statistics, for which I think we'd need to cross-reference with
https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers
(https://www.wikidata.org/wiki/Q1394450). Or perhaps Wikidata can resolve it
again, as at least some languages' items include a split of the statistics
for that, e.g. Q150. Let's discuss further onwiki?

And +1 to the overall recommendation from C. Scott. :)
Re: [Wikimedia-l] [Wikimedia Announcements] Announcing a new Wikimedia project: Abstract Wikipedia
Scott,

It is perfectly legitimate to be "anti-racist," but races are completely
artificial constructs. Racial conflict was interposed during the "tea
party" astroturfing in response to the Occupy movements:

https://www.reddit.com/r/occupywallstreet/comments/hyoogt/is_this_accurate/fze7t5c/

Do you support Wikimedia Foundation AI being programmed to be explicitly
anti-classist?

Best regards,
Jim

On Wed, Aug 5, 2020 at 12:19 PM C. Scott Ananian <cananian@wikimedia.org>
wrote:

> Sorry I'm coming to this discussion a bit late, but I'd like to underline a
> slightly different aspect of the concern that Phoebe raised:
>
> [...]
Re: [Wikimedia-l] [Wikimedia Announcements] Announcing a new Wikimedia project: Abstract Wikipedia
If "anti-classist" is your way of writing "empowering the less-powerful",
then sure. As the rest of my email indicates, I'm choosing to focus on
language and language groups, since that's the most direct relation to the
output and input technologies of Abstract Wikipedia. Obviously there is no
direct mapping from 'race' to language, although those are both social
constructs. I found the academic work of the anti-racism movement helpful
in thinking about efforts to counteract long-standing structural
privileges, but if you prefer to use a different framework feel free. It
seems we are aligned on the actual actions required.
--scott

On Thu, Aug 6, 2020 at 3:42 AM James Salsman <jsalsman@gmail.com> wrote:

> Scott,
>
> It is perfectly legitimate to be "anti-racist," but races are completely
> artificial constructs. Racial conflict was interposed during the "tea
> party" astroturfing in response to the Occupy movements:
>
>
> https://www.reddit.com/r/occupywallstreet/comments/hyoogt/is_this_accurate/fze7t5c/
>
> Do you support Wikimedia Foundation AI being programmed to be explicitly
> anti-classist?
>
> Best regards,
> Jim

--
(http://cscott.net)
Re: [Wikimedia-l] [Wikimedia Announcements] Announcing a new Wikimedia project: Abstract Wikipedia
Scott,

thank you for raising this really important issue, and I whole-heartedly
agree. Since I heard of Ibram X. Kendi's argument to not just be not racist
but rather be actively anti-racist, I thought a lot about it (I have a long
essay trying to sort my thoughts on that, but I am not sure my voice is
helpful in that conversation). But yes, I agree with the sentiment and the
idea.

Another statement that has deeply influenced my thinking in preparation for
this project was the statement "nothing about us without us", and the
implications of that for the Abstract Wikipedia project (and how,
currently, we are not really achieving it).

So, in short, yes, I want to commit to both of these as guidelines for how
the project will unfold.

Having a specific, non-European and underrepresented language as a
first-class development target is a great suggestion, and having someone on
the core team with a native-level grasp of that language is, I think, a
very good suggestion. Whether and when we can actually implement this
depends on a number of factors, such as funding, but yes, ensuring such
representation is very much a high priority for me, and I am very much
(and painfully) aware that we are not fulfilling this promise yet.

For the choice of language I hope to go through a process similar to the one
we used for Wikidata, where we worked with the Wikipedia communities to identify
potential language communities that would be interested and willing to work
together with us. I am planning for us to have a similar process within the
next few months.

One advantage of the current state is that the focus for the first part of
the project will be solely on the wiki of functions, not yet on the part
that generates natural language, and that the current plan calls for
additional hires when this second part starts. So all of these decisions
and preparations are not blockers during the first part of the project, but
will be so for the second - and obviously I want to have them resolved well
before.

Also, one correction - we are fortunately not blocked by the availability
of language models in a given language. Since the natural language
generation, as we plan it, is developed by the communities using functions,
we do not need to have a good language model, or in fact, any language
model at all, for the system to work. So we have that going for us.

Finally, as I answered Phoebe, I want to tackle these issues head-on with
a call for discussing the ethical implications of this project. Your
suggestions are good, and will inform our planning and development, but I
am also aware that, in order to have a fuller picture, we need to hear more
voices and figure out how to have these conversations. This will happen
within the next few months.

Thanks again for raising this important issue! I hope my thoughts on that
make sense, and I am happy to further work on them,
Denny

On Wed, Aug 5, 2020 at 11:19 PM Nick Wilson (Quiddity) <
nwilson@wikimedia.org> wrote:

> On Wed, Aug 5, 2020 at 2:01 PM Samuel Klein <meta.sj@gmail.com> wrote:
>
> [...]