Mailing List Archive

[Wikimedia-l] Re: Chat GPT
Just to clarify, my point was not about Getty to begin with. Whether Getty
would win and whether a big corporation should own such a large amount of
visual content are questions outside this particular thread. It would
certainly be interesting to see how things roll.

But AI/ML is way more than just looking. Training large models is a
very sophisticated and technical process. Data annotation, among many other
forms of labour, is done by real people. The article I had linked earlier
tells a lot about the real-world consequences of AI. I'm certain AI/ML,
especially when we're talking about language models like ChatGPT, is far
from innocent looking/reading. For starters, derivative works, except those
based on Public Domain works, must attribute the authors. Any provision for
attribution is deliberately removed from systems like ChatGPT, and that only
gives corporations like OpenAI a free ride sans accountability.

Subhashish


On Sat, Feb 4, 2023, 4:41 PM Todd Allen <toddmallen@gmail.com> wrote:

> I'm not so sure Getty's got a case, though. If the images are on the Web,
> is using them to train an AI something copyright would cover? That to me
> seems more equivalent to just looking at the images, and there's no
> copyright problem in going to Getty's site and just looking at a bunch of
> their pictures.
>
> But it will be interesting to see how that one shakes out.
>
> Todd
>
> On Sat, Feb 4, 2023 at 11:47 AM Subhashish <psubhashish@gmail.com> wrote:
>
>> Not citing sources is probably a conscious design choice, as citing
>> sources would mean sharing the sources used to train the language models.
>> Getty has just sued Stability AI, alleging the use of 12 million
>> photographs without permission or compensation. Imagine if Stability had to
>> purchase from Getty through a legal process. For starters, Getty might not
>> have agreed in the first place. Bulk-scraping publicly visible text for
>> text-based AIs like ChatGPT would mean scraping copyrighted text. But
>> even reusing CC BY-SA content would require attribution. None of the AI
>> platforms attributes their sources because they did not acquire content in
>> legal and ethical ways [1]. Large language models won't be large and
>> releases won't happen fast if they actually start acquiring content
>> gradually from trustworthy sources. It took so many years for hundreds and
>> thousands of Wikimedians to take Wikipedias in different languages to where
>> they are for a reason.
>>
>> 1. https://time.com/6247678/openai-chatgpt-kenya-workers/
>>
>> Subhashish
>>
>>
>> On Sat, Feb 4, 2023 at 1:06 PM Peter Southwood <
>> peter.southwood@telkomsa.net> wrote:
>>
>>> From what I have seen the AIs are not great on citing sources. If they
>>> start citing reliable sources, their contributions can be verified, or not.
>>> If they produce verifiable, adequately sourced, well written information,
>>> are they a problem or a solution?
>>>
>>> Cheers,
>>>
>>> Peter
>>>
>>>
>>>
>>> *From:* Gnangarra [mailto:gnangarra@gmail.com]
>>> *Sent:* 04 February 2023 17:04
>>> *To:* Wikimedia Mailing List
>>> *Subject:* [Wikimedia-l] Re: Chat GPT
>>>
>>>
>>>
>>> I see our biggest challenge is going to be detecting these AI tools
>>> adding content whether it's media or articles, along with identifying when
>>> they are in use by sources. The failing of all new AI is not in its
>>> ability but in its lack of transparency: readers cannot identify when
>>> it has been used. We have seen people impersonating musicians and
>>> writing songs in their style. We have also seen pictures that have been
>>> created by copying someone else's work yet not acknowledging it as being
>>> derivative of any kind.
>>>
>>>
>>>
>>> Our big problems will be in ensuring that copyright is legally
>>> respected, and in not hosting anything that is even remotely dubious.
>>>
>>>
>>>
>>> On Sat, 4 Feb 2023 at 22:24, Adam Sobieski <adamsobieski@hotmail.com>
>>> wrote:
>>>
>>> Brainstorming on how to drive traffic to Wikimedia content from
>>> conversational media, UI/UX designers could provide menu items or buttons
>>> on chatbots' applications or webpage components (e.g., to read more about
>>> the content, to navigate to cited resources, to edit the content, to
>>> discuss the content, to upvote/downvote the content, to share the content
>>> or the recent dialogue history on social media, to request
>>> review/moderation/curation for the content, etc.). Many of these envisioned
>>> menu items or buttons would operate contextually during dialogues, upon the
>>> most recent (or otherwise selected) responses provided by the chatbot or
>>> upon the recent transcripts. Some of these features could also be made
>>> available to end-users via spoken-language commands.
>>>
>>> At any point during hypertext-based dialogues, end-users would be able
>>> to navigate to Wikimedia content. These navigations could utilize either
>>> URL query string arguments or HTTP POST. In either case, bulk usage data,
>>> e.g., those dialogue contexts navigated from, could be useful.
>>>
>>> The capability to perform A/B testing across chatbots’ dialogues, over
>>> large populations of end-users, could also be useful. In this way,
>>> Wikimedia would be better able to: (1) measure end-user engagement and
>>> satisfaction, (2) measure the quality of provided content, (3) perform
>>> personalization, (4) retain readers and editors. A/B testing could be
>>> performed by providing end-users with various feedback buttons (as
>>> described above). A/B testing data could also be obtained through data
>>> mining, analyzing end-users’ behaviors, response times, responses, and
>>> dialogue moves. These data could be provided for the community at special
>>> pages and could be made available per article, possibly by enhancing the
>>> “Page information” system. One can also envision these kinds of analytics
>>> data existing at the granularity of portions of, or selections of,
>>> articles.
>>>
>>>
>>>
>>>
>>>
>>> Best regards,
>>>
>>> Adam
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* Victoria Coleman <vstavridoucoleman@gmail.com>
>>> *Sent:* Saturday, February 4, 2023 8:10 AM
>>> *To:* Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
>>> *Subject:* [Wikimedia-l] Re: Chat GPT
>>>
>>>
>>>
>>> Hi Christophe,
>>>
>>>
>>>
>>> I had not thought about the threat to Wikipedia traffic from Chat GPT
>>> but you have a good point. The success of the projects is always one step
>>> away from the next big disruption. So the WMF as the tech provider for the
>>> mission (because first and foremost in my view that's what the WMF is - as
>>> well as the financial engine of the movement of course) needs to pay
>>> attention and experiment to maintain the long term viability of the
>>> mission. In fact I think the cluster of our projects offers compelling
>>> options. For example to your point below on data sets, we have the amazing
>>> Wikidata as well as the excellent work on Abstract Wikipedia. We have
>>> Wikipedia Enterprise which has built some avenues of collaboration with big
>>> tech. A bold vision is needed to bring all of it together and build an MVP
>>> for the community to experiment with.
>>>
>>> Best regards,
>>>
>>>
>>>
>>> Victoria Coleman
>>>
>>>
>>>
>>> On Feb 4, 2023, at 4:14 AM, Christophe Henner <
>>> christophe.henner@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> On the product side, my biggest concern with NLP-based AI is that it would
>>> drastically decrease traffic to our websites/apps, which means fewer new
>>> editors and fewer donations.
>>>
>>>
>>>
>>> So first from a strictly positioning perspective, we have here a major
>>> change that needs to be managed.
>>>
>>>
>>>
>>> And to be honest, it will come faster than we think. We are
>>> perfectionists, but I can assure you most companies would be happy to
>>> launch a search product with 80% confidence in answer quality.
>>>
>>>
>>>
>>> From a financial perspective, large industrial investments like this are
>>> usually a pool of money you can draw from over x years. You can expect
>>> they have not drawn all of it yet.
>>>
>>>
>>>
>>> Second, GPT 3 and ChatGPT are far from being the most expensive products
>>> they have. On top of people you need:
>>>
>>> * datasets
>>>
>>> * people to tag the dataset
>>>
>>> * people to correct the algo
>>>
>>> * computing power
>>>
>>>
>>>
>>> I simplify here, but we already have the capacity to muster some of
>>> that, which drastically lowers our costs :)
>>>
>>>
>>>
>>> I would not discard the option of the movement doing it so easily. That
>>> being said, it would mean a new project needing substantial
>>> resources.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Feb 4, 2023, at 9:30 AM, Adam Sobieski <adamsobieski@hotmail.com>
>>> wrote:
>>>
>>>
>>> With respect to cloud computing costs, these being a significant
>>> component of the costs to train and operate modern AI systems, as a
>>> non-profit organization, the Wikimedia Foundation might be interested in
>>> the National Research Cloud (NRC) policy proposal:
>>> https://hai.stanford.edu/policy/national-research-cloud .
>>>
>>>
>>>
>>> "Artificial intelligence requires vast amounts of computing power, data,
>>> and expertise to train and deploy the massive machine learning models
>>> behind the most advanced research. But access is increasingly out of reach
>>> for most colleges and universities. A National Research Cloud (NRC) would
>>> provide academic and *non-profit researchers* with the compute power
>>> and government datasets needed for education and research. By democratizing
>>> access and equity for all colleges and universities, an NRC has the
>>> potential not only to unleash a string of advancements in AI, but to help
>>> ensure the U.S. maintains its leadership and competitiveness on the global
>>> stage.
>>>
>>>
>>>
>>> "Throughout 2020, Stanford HAI led efforts with 22 top computer science
>>> universities along with a bipartisan, bicameral group of lawmakers
>>> proposing legislation to bring the NRC to fruition. On January 1, 2021, the
>>> U.S. Congress authorized the National AI Research Resource Task Force Act
>>> as part of the National Defense Authorization Act for Fiscal Year 2021.
>>> This law requires that a federal task force be established to study and
>>> provide an implementation pathway to create world-class computational
>>> resources and robust government datasets for researchers across the country
>>> in the form of a National Research Cloud. The task force will issue a final
>>> report to the President and Congress next year.
>>>
>>>
>>>
>>> "The promise of an NRC is to democratize AI research, education, and
>>> innovation, making it accessible to all colleges and universities across
>>> the country. Without a National Research Cloud, all but the most elite
>>> universities risk losing the ability to conduct meaningful AI research and
>>> to adequately educate the next generation of AI researchers."
>>>
>>>
>>>
>>> See also: [1][2]
>>>
>>>
>>>
>>> [1]
>>> https://www.whitehouse.gov/ostp/news-updates/2023/01/24/national-artificial-intelligence-research-resource-task-force-releases-final-report/
>>>
>>> [2]
>>> https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* Steven Walling <steven.walling@gmail.com>
>>> *Sent:* Saturday, February 4, 2023 1:59 AM
>>> *To:* Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
>>> *Subject:* [Wikimedia-l] Re: Chat GPT
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Feb 3, 2023 at 9:47 PM Gergő Tisza <gtisza@gmail.com> wrote:
>>>
>>> Just to give a sense of scale: OpenAI started with a $1 billion
>>> donation, got another $1B as investment, and is now getting a larger
>>> investment from Microsoft (undisclosed but rumored to be $10B). Assuming
>>> they spent most of their previous funding, which seems likely, their
>>> operational costs are in the ballpark of $300 million per year. The idea
>>> that the WMF could just choose to create conversational software of a
>>> similar quality if it wanted seems detached from reality to me.
>>>
>>>
>>>
>>> Without spending billions on LLM development to aim for a
>>> conversational chatbot trying to pass a Turing test, we could definitely
>>> try to catch up to the state of the art in search results. Our search
>>> currently does a pretty bad job (in terms of recall especially). Today's
>>> featured article in English is the Hot Chip album "Made in the Dark", and
>>> if I enter anything but the exact article title the typeahead results are
>>> woefully incomplete or wrong. If I ask an actual question, good luck.
>>>
>>>
>>>
>>> Google is feeling vulnerable to OpenAI here in part because everyone can
>>> see that their results are often full of low quality junk created for SEO,
>>> while ChatGPT just gives a concise answer right there.
>>>
>>>
>>>
>>> https://en.wikipedia.org/wiki/The_Menu_(2022_film) is one of the top
>>> viewed English articles. If I search "The Menu reviews" the Google results
>>> are noisy and not so great. ChatGPT actually gives you nothing relevant
>>> because it doesn't know anything from 2022. If we could just manage to
>>> display a three-sentence snippet from the critical response section of
>>> our article, it would be awesome. It's too bad that the
>>> whole "knowledge engine" debacle poisoned the well when it comes to a
>>> Wikipedia search engine, because we could definitely learn a lot from
>>> what people like about ChatGPT and apply it to Wikipedia search.
>>>
>>>
>>>
>>> _______________________________________________
>>> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
>>> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>>> https://meta.wikimedia.org/wiki/Wikimedia-l
>>> Public archives at
>>> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/6OBPB7WNHKJQXXIBCK73SDXLE3DMGNMY/
>>> To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Boodarwun
>>> Gnangarra
>>>
>>> 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar
>>> koortaboodjar'
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>