Mailing List Archive

Re: ORES To Lift Wing Migration
Luca writes:

> Managing several hundred models for goodfaith and damaging is not very
> scalable in a modern micro-service architecture like Lift Wing
> (since we have a model for each supported wiki). We (both Research and
> ML) are oriented toward having fewer models that manage more languages at
> the same time,

Is there a reason to think that separate models for each wiki are more
effective than one general model that sees the name of the wiki as part of
its context?
I'd love to read more about the cost of training and updating current
models, how much material they are trained on, and how others w/ their own
GPUs can contribute to updates.

Personally I wouldn't mind a single model that can suggest multiple
properties of an edit, including goodfaith, damaging, and likelihood of
reversion. They are different, if related, concepts: the first deals with
the intent and predicted further editing history of the editor, the second
with article accuracy and quality, and the third with the size,
activity, and norms of the other editors...

SJ




On Fri, Sep 22, 2023 at 5:34 PM Aaron Halfaker <aaron.halfaker@gmail.com>
wrote:

> All fine points. As you can see, I've filed some phab tasks where I saw a
> clear opportunity to do so.
>
> > as mentioned before all the models that currently run on ORES are
> > available in both ores-legacy and Lift Wing.
>
> I thought I read that damaging and goodfaith models are going to be
> replaced. Should I instead read that they are likely to remain available
> for the foreseeable future? When I asked about a community discussion
> about the transition from damaging/goodfaith to revertrisk, I was imagining
> that many people who use those predictions might have an opinion about them
> going away. E.g. people who use the relevant filters in RecentChanges.
> Maybe I missed the discussions about that.
>
> I haven't seen a mention of the article quality or article topic models in
> the docs. Are those also going to remain available? I have some user
> scripts that use these models and are relatively widely used. I didn't
> notice anyone reaching out. ... So I checked and setting a User-Agent on my
> user scripts doesn't actually change the User-Agent. I've read that you
> need to set "Api-User-Agent" instead, but that causes a CORS error when
> querying ORES. I'll file a bug.
>
> On Fri, Sep 22, 2023 at 1:22 PM Luca Toscano <ltoscano@wikimedia.org>
> wrote:
>
>>
>>
>> On Fri, Sep 22, 2023 at 8:59 PM Aaron Halfaker <aaron.halfaker@gmail.com>
>> wrote:
>>
>>> We could definitely file a task. However, it does seem like
>>> highlighting the features that will no longer be available is an
>>> appropriate topic for a discussion about migration in a technical mailing
>>> list.
>>>
>>
>> A specific question related to a functionality is the topic for a task, I
>> don't think that we should discuss every detail that differs from the ORES
>> API (Wikitech-l doesn't seem a good medium for it). We are already
>> following up on Phabricator, let's use tasks if possible to keep the
>> conversation as light and targeted as possible.
>>
>>> Is there a good reference for which features have been excluded from
>>> ores-legacy? It looks like https://wikitech.wikimedia.org/wiki/ORES covers
>>> some of the excluded features/models, but not all of them.
>>>
>>
>> We spent the last few months helping the community migrate from the ORES
>> API to Lift Wing; the remaining traffic comes only from a few low-traffic
>> IPs that we are not able to contact. We didn't add feature injection or
>> threshold optimization to ores-legacy, for example, since there was no
>> indication in our logs that users were relying on them. We have always
>> stated everywhere (including all emails sent to this mailing list) that we
>> are 100% open to adding functionality if it is backed by a valid use case.
>>
>>
>>> I see now that it looks like the RevertRisk model will be replacing the
>>> *damaging* and *goodfaith* models that differentiate intentional damage from
>>> unintentional damage. There's a large body of research on why this is
>>> valuable and important to the social functioning of the wikis. This
>>> literature also discusses why being reverted is not a very good signal for
>>> damage/vandalism and can lead to problems when used as a signal for
>>> patrolling. Was there a community discussion about this deprecation that I
>>> missed? I have some preliminary results (in press) that demonstrate that
>>> the RevertRisk model performs significantly worse than the damaging and
>>> goodfaith models in English Wikipedia for patrolling work. Do you have
>>> documentation for how you evaluated this model and compared it to
>>> damaging/goodfaith?
>>>
>>
>> We have model cards for both Revert Risk models, all of them linked in
>> the API portal docs (more info:
>> https://api.wikimedia.org/wiki/Lift_Wing_API). All the community folks
>> that migrated their bots/tools/etc. to Revert Risk were very happy about
>> the change, and we haven't had any request to switch back since then.
>>
>> The ML team provides all the models deployed on ORES on Lift Wing, so any
>> damaging and goodfaith variant is available in the new API. We chose not to
>> pursue the development of those models for several reasons:
>> - We haven't had any indication/request from the community about those
>> models in almost two years, except a few Phabricator updates that we
>> followed up on.
>> - Managing several hundred models for goodfaith and damaging is not very
>> scalable in a modern micro-service architecture like Lift Wing (since we
>> have a model for each supported wiki). We (both Research and ML) are
>> oriented toward having fewer models that manage more languages at the same
>> time, and this is the direction that we are following at the moment. It may
>> not be the perfect one but so far it seems a good choice. If you want to
>> chime in and provide your input, we are 100% open to hearing
>> suggestions/concerns/doubts/recommendations/etc.; please follow up in any
>> of our channels (IRC, mailing lists, Phabricator for example).
>> - Last but not least, most of the damaging/goodfaith models were trained
>> on data from years ago and never retrained. The effort to keep several
>> hundred models up to date with recent data, versus doing the same for a
>> few models (like Revert Risk), weighs in favor of the latter for a
>> relatively small team of engineers like us.
>>
>>
>>> FWIW, from my reading of these announcement threads, I believed that
>>> generally functionality and models would be preserved in
>>> ores-legacy/LiftWing. This is the first time I've realized the scale of
>>> what will become unavailable.
>>>
>>
>> This is the part that I don't get, since as mentioned before all the
>> models that currently run on ORES are available in both ores-legacy and
>> Lift Wing. What changes is that we no longer expose functionality that the
>> logs clearly show is not used, and that would need to be maintained and
>> improved over time. We are open to improving and adding anything that the
>> community needs; the only thing that we ask is a valid use case to
>> support it.
>>
>> I do think that Lift Wing is a great improvement for the community. We
>> have been working with all the folks that reached out to us, without hiding
>> anything (including deprecation plans and paths forward).
>>
>> Thanks for following up!
>>
>> Luca
>> _______________________________________________
>> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
>> To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
>>
>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>



--
Samuel Klein @metasj w:user:sj +1 617 529 4266
Re: ORES To Lift Wing Migration
On Fri, Sep 22, 2023 at 11:34 PM Aaron Halfaker <aaron.halfaker@gmail.com>
wrote:

> All fine points. As you can see, I've filed some phab tasks where I saw a
> clear opportunity to do so.
>

Thanks a lot! We are going to review them next week and decide the next
steps, but we'd like to proceed anyway with migrating ores to ores-legacy on
Monday (this will allow us to free some old nodes that need to be
decommissioned, etc.). Adding features later on to the models on Lift Wing
should be doable, and our goal is to transition away from ores-legacy in a
few months (to avoid maintaining too many systems). The timeline is not yet
set in stone; we'll update this mailing list when the time comes (and we'll
follow up with the remaining users of ores-legacy as well). To summarize: we
start with ORES -> ores-legacy on Monday, and we'll do ores-legacy -> Lift
Wing in a second step.

> > as mentioned before all the models that currently run on ORES are
> > available in both ores-legacy and Lift Wing.
>
> I thought I read that damaging and goodfaith models are going to be
> replaced. Should I instead read that they are likely to remain available
> for the foreseeable future? When I asked about a community discussion
> about the transition from damaging/goodfaith to revertrisk, I was imagining
> that many people who use those predictions might have an opinion about them
> going away. E.g. people who use the relevant filters in RecentChanges.
> Maybe I missed the discussions about that.
>

This is a good point, I'll clarify the documentation on Wikitech. As long as
models are in use we won't remove them from Lift Wing, but we'll propose
using Revert Risk where it is suited, since it is a model family in which we
decided to invest time and effort. Basic maintenance will be performed on
the goodfaith/damaging/articlequality/etc. models on Lift Wing, but we
don't have (at the moment) any bandwidth to guarantee retraining or more
complex workflows for them. This is why we used the term "deprecated" on
Wikitech, but we need to specify what we mean to avoid confusion. Thanks
for the feedback :)


>
> I haven't seen a mention of the article quality or article topic models in
> the docs. Are those also going to remain available? I have some user
> scripts that use these models and are relatively widely used. I didn't
> notice anyone reaching out. ... So I checked and setting a User-Agent on my
> user scripts doesn't actually change the User-Agent. I've read that you
> need to set "Api-User-Agent" instead, but that causes a CORS error when
> querying ORES. I'll file a bug.
>

Will update the docs as well. As mentioned above, we'll keep the current
ORES models available on Lift Wing. Eventually new models will be proposed
by Research and other teams (like Revert Risk), and at that point we (as the
ML team) will decide what recommendation to give. Nothing will be removed
from Lift Wing if it has active users, but we'll certainly try to reduce the
number of models to maintain (based on common functionality, etc.), so some
changes will be proposed in the future.

Luca
Re: ORES To Lift Wing Migration
Hi folks,

So glad to see the old and new ML teams have an open discussion about this
subject.

I understand that the team might prefer to have several tickets for
different issues, but the discussion about the general approach to the
different models is of interest to many people and is more easily digested
on email. I would suggest continuing to discuss the merits of the current
strategy (and not necessarily of one model or another) on email.

* One model per wiki or overall
This is a tough one. :) As a user, I remember how hard it was for Romanian
speakers to complete the training data for damaging/goodfaith and would
prefer to not have to do it again.

However, I'm also worried that some specificities of larger wikis would
creep into the output, leading to reverts that would normally not happen on
my wiki. For instance, smaller settlements are not accepted on enwp, while
they are accepted on rowp. I don't know how to test it myself, and I
haven't seen anything about it in the research.

Another problem I have is I'm not sure how the revert-risk score should be
matched against custom damaging/goodfaith thresholds. Are there any
guidelines on this other than "test"?
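
One empirical way to approach the threshold question is to score a
hand-labelled sample of edits with the new model and pick the lowest cutoff
that reaches the precision the old threshold gave. The sketch below only
illustrates that idea: the function name, the sample data, and the
{ score, bad } record shape are all hypothetical and not part of any
Wikimedia API, and a real comparison would need a much larger sample.

```javascript
// Hypothetical helper: find the lowest score cutoff whose precision on a
// labelled sample is at least `targetPrecision`.
// `sample` is an array of { score: number in [0, 1], bad: boolean }.
function pickThreshold(sample, targetPrecision) {
  // Candidate cutoffs are the distinct scores, in ascending order.
  const cutoffs = [...new Set(sample.map((s) => s.score))].sort((a, b) => a - b);
  const totalBad = sample.filter((s) => s.bad).length;

  for (const cutoff of cutoffs) {
    // Edits the tool would flag at this cutoff.
    const flagged = sample.filter((s) => s.score >= cutoff);
    const truePositives = flagged.filter((s) => s.bad).length;
    const precision = truePositives / flagged.length;
    if (precision >= targetPrecision) {
      const recall = totalBad === 0 ? 0 : truePositives / totalBad;
      return { cutoff, precision, recall };
    }
  }
  return null; // no cutoff reaches the target on this sample
}

// Made-up labelled sample, for illustration only.
const exampleSample = [
  { score: 0.95, bad: true },
  { score: 0.9, bad: true },
  { score: 0.8, bad: false },
  { score: 0.7, bad: true },
  { score: 0.4, bad: false },
  { score: 0.1, bad: false },
];

console.log(pickThreshold(exampleSample, 0.9));
```

The returned recall shows what is given up at that cutoff, which is the
number a patrolling community would probably want to look at before
switching.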

* Multiple criteria VS a single score
I think the discussion has been very much about reverts, but as Sj said,
each of these scores is a slightly different facet. Is there data
available on the prevalence of other use-cases or is everyone just writing
revert bots?

In the long run, I believe a single, good-enough model can be developed for
revert bots. However, it would be great if there were some clear quality
criteria that the community can verify, and if the old models were maintained
for a wiki until we are sure the new model meets those criteria on that
wiki.

A change in hosting should not be the guiding force in any team's roadmap,
but the needs of its users.

Have a good weekend,
Strainu




Re: ORES To Lift Wing Migration
Hi!

I'll leave the comments related to model architecture and behavior to
others (more expert than me), I'd like to comment on the
process/infrastructure parts :)

On Sat, Sep 23, 2023 at 9:03 PM Strainu <strainu10@gmail.com> wrote:

> Hi folks,
>
> So glad to see the old and new ML teams have an open discussion about this
> subject.
>
> I understand that the team might prefer to have several tickets for
> different issues, but the discussion about the general approach to the
> different models is of interest to many people and is more easily digested
> on email. I would suggest to continue discussing the merits of the current
> strategy (and not necessarily of a model or another) on email.


I proposed Phabricator tasks because I think that they better target
different broad subjects, it is easier to involve specific teams/people and
to define the goal of the conversations. In this big email thread we
started outlining the migration/deprecation of ORES in favor of Lift Wing,
and now we are talking about model architectures and strategies to use for
various use cases in the future. I really like the conversation, but if we
wanted to be strict a new email thread (with a different subject) should be
created, instead of mixing multiple subjects. People interested in the Lift
Wing migration wouldn't be able to add comments, or if they did it would
become difficult to follow all the discussions.

As stated before, I'll clarify the "deprecation" term mentioned in Wikitech
for the various revscoring-based models, but it is not something that is
related to the Lift Wing migration (since all models present on ORES are
also on Lift Wing). It is a long term and wider project that will happen
over the upcoming months/years, and that requires a broader discussion.

This is why I propose to discuss models on Phabricator, rather than
Wikitech-l :)


> In the long run, I believe a single, good-enough model can be developed
> for revert bots. However, it would be great if there were some clear
> quality criteria that the community can verify, and if the old models were
> maintained for a wiki until we are sure the new model meets those criteria
> on that wiki.
>

Definitely. I just want to make it clear that the ML team has no intention
of forcing any choice on the community; we are just trying to optimize our
infrastructure to serve a wide variety of models, and in the process we have
to choose the best strategy to follow. On Lift Wing we require that every
new model has a model card that explains how it works, how it was trained,
its best use cases, etc. For example, these are the API Portal's pages for
the two Revert Risk models (they contain the links to the model cards):
https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_reverted_risk_language_agnostic_prediction
https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_reverted_risk_multilingual_prediction


> A change in hosting should not be the guiding force in any team's roadmap,
> but the needs of its users.
>

I hope that we (as ML) didn't describe our intentions in the wrong way,
since our aim is absolutely not to impose anything, but to improve our
infrastructure to better serve users in the future (WMF internal use cases
and the community). ORES served us well over the years. It was a pioneering
project on a topic, ML, that at the time was only discussed in research
papers and some futuristic libraries. Big players were already working
on it internally, but there was no clear guidance or standards. Over
the years practices like MLOps emerged, and nowadays they are the de facto
standard for operating ML systems. We are trying to follow those best
practices, because we are convinced that they will improve and ease the
process of building and publishing a model at the WMF.
If you are curious, the ML team worked a lot on documentation, see
https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing. We tried as
best as we could to make the transition smooth and to highlight new
features and improvements for every user.

To summarize: we, as the ML team, have created Lift Wing to serve the
community and our internal use cases, and we wouldn't remove support or
dramatically change how the community operates without a gradual migration
path and without proposing new solutions first. During the migration to Lift
Wing we asked folks to test the Revert Risk models instead of the
goodfaith/damaging ones, and the solution seems to have suited a lot of use
cases. Maybe in the future we'll have a mixture of specialized models for
certain wikis, and more "multi-purpose" ones, but finding the right solution
will surely involve community feedback and several tries.

Thanks for the feedback!

Luca
Re: ORES To Lift Wing Migration
Hey SJ!

> Is there a reason to think that separate models for each wiki are more
> effective than one general model that sees the name of the wiki as part of
> its context?

Intuitively one model per wiki has a lot of merit. The training data comes
from the community that is impacted by the model, etc. However, there are
scale and equity issues we wrestled with. One lesson we have learned
training ~300+ models for the Add-A-Link
<https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Structured_tasks/Add_a_link>
project is that if we continued down that path, Lift Wing would eventually
be hosting 3000+ models (i.e. 330 models per new feature) pretty quickly
and overwhelm any ability of our small team to maintain, quality control,
support, and improve them over their lifespan. Regarding equity, even with
a multi-year effort the one model per wiki RevScoring models only covered
~33 out of 330 wikis. The communities we didn't reach didn't get the
benefit of those models. But with language agnostic models we can make that
model available to all communities. For example, the language agnostic
revert risk model will likely be the model selected for the automoderator
project <https://www.mediawiki.org/wiki/Moderator_Tools/Automoderator>,
which means hundreds more wikis get access to the tool compared to a one
model per wiki approach.
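
The scale and coverage figures above can be worked through as a small
calculation. This is only an illustrative sketch: the wiki counts are the
approximate ones quoted in this message, while the number of features is an
assumed value chosen to show how quickly a one-model-per-wiki approach
passes 3000 models.

```javascript
// Approximate figures from the message above: about 330 wikis, of which
// about 33 were covered by the per-wiki RevScoring models.
const WIKI_COUNT = 330;
const WIKIS_WITH_REVSCORING = 33;

// Number of models to host for a given number of features.
function modelsToHost(featureCount, onePerWiki) {
  return onePerWiki ? featureCount * WIKI_COUNT : featureCount;
}

// Share of wikis reached by the per-wiki approach so far.
const perWikiCoverage = WIKIS_WITH_REVSCORING / WIKI_COUNT;

// ASSUMPTION: ten features is an illustrative number, not a roadmap figure.
const assumedFeatures = 10;
console.log(modelsToHost(assumedFeatures, true)); // one model per wiki
console.log(modelsToHost(assumedFeatures, false)); // one shared model per feature
console.log(`${Math.round(perWikiCoverage * 100)}% of wikis covered`);
```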

> I'd love to read more about the cost of training and updating current
> models, how much material they are trained on, and how others w/ their own
> GPUs can contribute to updates.

The training data information should be available in the model cards
<https://meta.wikimedia.org/wiki/Machine_learning_models>. If it isn't, let
me know so we can change it. Regarding GPUs and contributions, we are still
working on what a good training environment will be. Our initial idea was a
Kubeflow-based cluster we called Train Wing, but we've had to put it on
hold for resource reasons (i.e. we couldn't build Lift Wing, deprecate
ORES, and build Train Wing all at the same time). More on that soon, after
the Research-ML offsite where we'll have those conversations.

All this said, one thing we do want to support is hosting community created
models. So, if a community has a model they want to host, we can load it
into Lift Wing and host it for them at scale. We have a lot of details to
work out (e.g. community consensus, human rights review as part of being a
Very Large Online Platform etc.) as to what that would look like, but that
is the goal.

Chris

Re: ORES To Lift Wing Migration [ In reply to ]
It looks like user scripts running on Wikipedia can no longer use ORES.
I'm getting a CORS error. You can test this by running the following in
the JS dev console on a Wikimedia page:
> $.ajax({url: "https://ores.wikimedia.org/v3/scores/"}).done(function(response){console.log(response)})

This is what I see:
> Access to XMLHttpRequest at 'https://ores.wikimedia.org/v3/scores/' from
origin 'https://en.wikipedia.org' has been blocked by CORS policy: No
'Access-Control-Allow-Origin' header is present on the requested resource.
> GET https://ores.wikimedia.org/v3/scores/ net::ERR_FAILED 307
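
For anyone without jQuery on the page, the same check can be sketched with
the browser's built-in fetch. This is only an illustration: probeCrossOrigin
is a hypothetical helper name, not part of ORES, Lift Wing, or MediaWiki, and
the second parameter exists only so the function can be exercised without a
network.

```javascript
// Hypothetical helper (an assumption for illustration, not an ORES or
// Lift Wing API). It resolves to a summary object instead of throwing,
// so the result is easy to inspect in the dev console.
async function probeCrossOrigin(url, fetchImpl = globalThis.fetch) {
  try {
    // mode "cors" is the default for cross-origin requests; it is spelled
    // out here to make the intent explicit.
    const response = await fetchImpl(url, { mode: "cors" });
    return { readable: true, status: response.status, error: null };
  } catch (err) {
    // Browsers reject a fetch that is blocked by CORS with a TypeError.
    // Other network failures are rejected the same way, so the console
    // message is still needed to tell the cases apart.
    return { readable: false, status: null, error: String(err) };
  }
}

// Usage from the dev console of a wiki page:
// probeCrossOrigin("https://ores.wikimedia.org/v3/scores/").then(console.log);
```

If the response is missing the Access-Control-Allow-Origin header, the
expectation is that readable comes back false and the browser console shows
the CORS message quoted above.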

I'll file a bug, but I thought elevating this to the migration thread was a
good idea.

On Mon, Sep 25, 2023 at 7:37 AM Chris Albon <calbon@wikimedia.org> wrote:

> Hey SJ!
>
> > Is there a reason to think that separate models for each wiki are more
> effective than one general model that sees the name of the wiki as part of
> its context?
>
> Intuitively one model per wiki has a lot of merit. The training data comes
> from the community that is impacted by the model, etc. However, there are
> scale and equity issues we wrestled with. One lesson we have learned
> training ~300+ models for the Add-A-Link
> <https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Structured_tasks/Add_a_link>
> project is that if we continued down that path, Lift Wing would eventually
> be hosting 3000+ models (i.e. 330 models per new feature) pretty quickly
> and overwhelm any ability of our small team to maintain, quality control,
> support, and improve them over their lifespan. Regarding equity, even with
> a multi-year effort the one model per wiki RevScoring models only covered
> ~33 out of 330 wikis. The communities we didn't reach didn't get the
> benefit of those models. But with language agnostic models we can make that
> model available to all communities. For example, the language agnostic
> revert risk model will likely be the model selected for the automoderator
> project <https://www.mediawiki.org/wiki/Moderator_Tools/Automoderator>,
> which means hundreds more wikis get access to the tool compared to a one
> model per wiki approach.
>
> > I'd love to read more about the cost of training and updating current
> models, how much material they are trained on, and how others w/ their own
> GPUs can contribute to updates.
>
> The training data information should be available in the model cards
> <https://meta.wikimedia.org/wiki/Machine_learning_models>. If it isn't,
> let me know so we can change it. Regarding GPUs and contributions, we are
> still working on what a good training environment will be. Our initial idea
> was a kubeflow-based cluster we called Train Wing, but we've had to put
> it on hold for resource reasons (i.e. we couldn't build Lift Wing,
> deprecate ORES, and build Train Wing all at the same time). More on that
> soon after the Research-ML offsite when we'll have those conversations.
>
> All this said, one thing we do want to support is hosting community
> created models. So, if a community has a model they want to host, we can
> load it into Lift Wing and host it for them at scale. We have a lot of
> details to work out (e.g. community consensus, human rights review as part
> of being a Very Large Online Platform etc.) as to what that would look
> like, but that is the goal.
>
> Chris
Re: ORES To Lift Wing Migration [ In reply to ]
See https://phabricator.wikimedia.org/T347344

On Mon, Sep 25, 2023 at 12:26 PM Aaron Halfaker <aaron.halfaker@gmail.com>
wrote:

> It looks like user scripts running on Wikipedia can no longer use ORES.
> I'm getting a CORS error. You can test this by running the following in
> the JS dev console on a Wikimedia page:
> > $.ajax({url: "https://ores.wikimedia.org/v3/scores/"}).done(function(response){console.log(response)})
>
> This is what I see:
> > Access to XMLHttpRequest at 'https://ores.wikimedia.org/v3/scores/'
> from origin 'https://en.wikipedia.org' has been blocked by CORS policy:
> No 'Access-Control-Allow-Origin' header is present on the requested
> resource.
> > GET https://ores.wikimedia.org/v3/scores/ net::ERR_FAILED 307
>
> I'll file a bug, but I thought elevating this to the migration thread was
> a good idea.
Re: ORES To Lift Wing Migration [ In reply to ]
On Mon, Sep 25, 2023 at 10:37 AM Chris Albon <calbon@wikimedia.org> wrote:

> Hey SJ!
>
> Intuitively one model per wiki has a lot of merit.
>
> if we continued down that path, Lift Wing would eventually be hosting
> 3000+ models (i.e. 330 models per new feature) pretty quickly
>
> But with language agnostic models we can make that model available to all
> communities.
>

Thanks! Agreed that having a model available to all communities is good
for equity :) In the automoderator case, is it that the multilingual model
incorporates the language-agnostic model, but not vice-versa? Is there a
way to have the inverse: a generalized multilingual model, that may be
fine-tuned for different communities, but does its best with input in
less-known languages or variants? [Perhaps with context cues for users
estimating how far out of distribution the input is.]

I like the idea of a general model that can be tuned, since I can imagine
community groups maintaining datasets for fine-tuning more easily than
maintaining their own entire models.

Warmly, SJ
