Mailing List Archive: [Wikimedia-l] Re: Chat GPT

Hi.
Thank you for your cooperation.

The Cunctator <cunctator@gmail.com> wrote:

> This is almost definitely the case.
>
> On Mon, Feb 6, 2023, 2:39 AM Ilario Valdelli <valdelli@gmail.com> wrote:
>
>> And this is a problem.
>>
>> If ChatGPT uses open content, there is an infringement of license.
>>
>> Specifically the CC-by-sa if it uses Wikipedia. In this case the
>> attribution must be present.
>>
>> Kind regards
>>
>> On Sun, 5 Feb 2023, 08:12 Peter Southwood, <peter.southwood@telkomsa.net>
>> wrote:
>>
>>> “Not citing sources is probably a conscious design choice, as citing
>>> sources would mean sharing the sources used to train the language models”
>>> This may be a choice that comes back to bite them. Without citing their
>>> sources, they are unreliable as a source for anything one does not know
>>> already. Someone will have a bad consequence from relying on the
>>> information and will sue the publisher. It will be interesting to see how
>>> they plan to weasel their way out of legal responsibility while retaining
>>> any credibility. My guess is there will be a requirement to state that the
>>> information is AI generated and of entirely unknown and untested
>>> reliability. How soon to the first class action, I wonder. Lots of money
>>> for the lawyers. Cheers, Peter.
>>>
>>>
>>>
>>> *From:* Subhashish [mailto:psubhashish@gmail.com]
>>> *Sent:* 05 February 2023 06:37
>>> *To:* Wikimedia Mailing List
>>> *Subject:* [Wikimedia-l] Re: Chat GPT
>>>
>>>
>>>
>>> Just to clarify, my point was not about Getty to begin with. Whether
>>> Getty would win and whether a big corporation should own such a large
>>> amount of visual content are questions outside this particular thread. It
>>> would certainly be interesting to see how things roll.
>>>
>>>
>>>
>>> But AI/ML is way more than just looking. Training with large models is a
>>> very sophisticated and technical process. Data annotation among many other
>>> forms of labour are done by real people. The article I had linked earlier
>>> tells a lot about the real world consequences of AI. I'm certain AI/ML,
>>> especially when we're talking about language models like ChatGPT, are far
>>> from innocent looking/reading. For starters, derivative works, except
>>> Public Domain ones, must attribute the authors. Any provision for
>>> attribution is deliberately removed from systems like ChatGPT and that only
>>> gives corporations like OpenAI a free ride sans accountability.
>>>
>>>
>>>
>>> Subhashish
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Feb 4, 2023, 4:41 PM Todd Allen <toddmallen@gmail.com> wrote:
>>>
>>> I'm not so sure Getty's got a case, though. If the images are on the
>>> Web, is using them to train an AI something copyright would cover? That to
>>> me seems more equivalent to just looking at the images, and there's no
>>> copyright problem in going to Getty's site and just looking at a bunch of
>>> their pictures.
>>>
>>>
>>>
>>> But it will be interesting to see how that one shakes out.
>>>
>>>
>>>
>>> Todd
>>>
>>>
>>>
>>> On Sat, Feb 4, 2023 at 11:47 AM Subhashish <psubhashish@gmail.com>
>>> wrote:
>>>
>>> Not citing sources is probably a conscious design choice, as citing
>>> sources would mean sharing the sources used to train the language models.
>>> Getty has just sued Stability AI, alleging the use of 12 million
>>> photographs without permission or compensation. Imagine if Stability had to
>>> purchase from Getty through a legal process. For starters, Getty might not
>>> have agreed in the first place. Bulk-scraping publicly visible text in
>>> text-based AIs like ChatGPT would mean scraping text with copyright. But
>>> even reusing CC BY-SA content would require attribution. None of the AI
>>> platforms attributes their sources because they did not acquire content in
>>> legal and ethical ways [1]. Large language models won't be large and
>>> releases won't happen fast if they actually start acquiring content
>>> gradually from trustworthy sources. It took so many years for hundreds and
>>> thousands of Wikimedians to take Wikipedias in different languages to where
>>> they are for a reason.
>>>
>>>
>>>
>>> 1. https://time.com/6247678/openai-chatgpt-kenya-workers/
>>>
>>>
>>> Subhashish
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Feb 4, 2023 at 1:06 PM Peter Southwood <
>>> peter.southwood@telkomsa.net> wrote:
>>>
>>> From what I have seen the AIs are not great on citing sources. If they
>>> start citing reliable sources, their contributions can be verified, or not.
>>> If they produce verifiable, adequately sourced, well written information,
>>> are they a problem or a solution?
>>>
>>> Cheers,
>>>
>>> Peter
>>>
>>>
>>>
>>> *From:* Gnangarra [mailto:gnangarra@gmail.com]
>>> *Sent:* 04 February 2023 17:04
>>> *To:* Wikimedia Mailing List
>>> *Subject:* [Wikimedia-l] Re: Chat GPT
>>>
>>>
>>>
>>> I see our biggest challenge is going to be detecting these AI tools
>>> adding content whether it's media or articles, along with identifying when
>>> they are in use by sources. The failing of all new AI is not in its
>>> ability but in its lack of transparency: readers cannot identify when it
>>> has been used. We have seen people impersonating musicians and
>>> writing songs in their style. We have also seen pictures that have been
>>> created by copying someone else's work yet not acknowledging it as being
>>> derivative of any kind.
>>>
>>>
>>>
>>> Our big problems will be in ensuring that copyright is legally
>>> respected, and in not hosting anything that is even remotely dubious.
>>>
>>>
>>>
>>> On Sat, 4 Feb 2023 at 22:24, Adam Sobieski <adamsobieski@hotmail.com>
>>> wrote:
>>>
>>> Brainstorming on how to drive traffic to Wikimedia content from
>>> conversational media, UI/UX designers could provide menu items or buttons
>>> on chatbots' applications or webpage components (e.g., to read more about
>>> the content, to navigate to cited resources, to edit the content, to
>>> discuss the content, to upvote/downvote the content, to share the content
>>> or the recent dialogue history on social media, to request
>>> review/moderation/curation for the content, etc.). Many of these envisioned
>>> menu items or buttons would operate contextually during dialogues, upon the
>>> most recent (or otherwise selected) responses provided by the chatbot or
>>> upon the recent transcripts. Some of these features could also be made
>>> available to end-users via spoken-language commands.
>>>
>>> At any point during hypertext-based dialogues, end-users would be able
>>> to navigate to Wikimedia content. These navigations could utilize either
>>> URL query string arguments or HTTP POST. In either case, bulk usage data,
>>> e.g., those dialogue contexts navigated from, could be useful.
>>>
>>> The capability to perform A/B testing across chatbots’ dialogues, over
>>> large populations of end-users, could also be useful. In this way,
>>> Wikimedia would be better able to: (1) measure end-user engagement and
>>> satisfaction, (2) measure the quality of provided content, (3) perform
>>> personalization, (4) retain readers and editors. A/B testing could be
>>> performed by providing end-users with various feedback buttons (as
>>> described above). A/B testing data could also be obtained through data
>>> mining, analyzing end-users’ behaviors, response times, responses, and
>>> dialogue moves. These data could be provided for the community at special
>>> pages and could be made available per article, possibly by enhancing the
>>> “Page information” system. One can also envision these kinds of analytics
>>> data existing at the granularity of portions of, or selections of,
>>> articles.
>>>
>>>
>>>
>>>
>>>
>>> Best regards,
>>>
>>> Adam
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* Victoria Coleman <vstavridoucoleman@gmail.com>
>>> *Sent:* Saturday, February 4, 2023 8:10 AM
>>> *To:* Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
>>> *Subject:* [Wikimedia-l] Re: Chat GPT
>>>
>>>
>>>
>>> Hi Christophe,
>>>
>>>
>>>
>>> I had not thought about the threat to Wikipedia traffic from Chat GPT
>>> but you have a good point. The success of the projects is always one step
>>> away from the next big disruption. So the WMF as the tech provider for the
>>> mission (because first and foremost in my view that's what the WMF is - as
>>> well as the financial engine of the movement of course) needs to pay
>>> attention and experiment to maintain the long term viability of the
>>> mission. In fact I think the cluster of our projects offers compelling
>>> options. For example to your point below on data sets, we have the amazing
>>> Wikidata as well as the excellent work on abstract Wikipedia. We have
>>> Wikipedia Enterprise which has built some avenues of collaboration with big
>>> tech. A bold vision is needed to bring all of it together and build an MVP
>>> for the community to experiment with.
>>>
>>> Best regards,
>>>
>>>
>>>
>>> Victoria Coleman
>>>
>>>
>>>
>>> On Feb 4, 2023, at 4:14 AM, Christophe Henner <
>>> christophe.henner@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> On the product side, my biggest concern with NLP-based AI is that it would
>>> drastically decrease traffic to our websites/apps, which means fewer new
>>> editors and fewer donations.
>>>
>>>
>>>
>>> So first from a strictly positioning perspective, we have here a major
>>> change that needs to be managed.
>>>
>>>
>>>
>>> And to be honest, it will come faster than we think. We are
>>> perfectionists, I can assure you, most companies would be happy to launch a
>>> search product with an 80% confidence in answer quality.
>>>
>>>
>>>
>>> From a financial perspective, large industrial investments like this are
>>> usually a pool of money you can draw from in x years. You can expect they
>>> did not draw all of it yet.
>>>
>>>
>>>
>>> Second, GPT 3 and ChatGPT are far from being the most expensive products
>>> they have. On top of people you need:
>>>
>>> * datasets
>>>
>>> * people to tag the dataset
>>>
>>> * people to correct the algo
>>>
>>> * computing power
>>>
>>>
>>>
>>> I simplify here, but we already have the capacity to muster some of
>>> that, which drastically lowers our costs :)
>>>
>>>
>>>
>>> I would not discard the option of the movement doing it so easily. That
>>> being said, it would mean a new project with the need of substantial
>>> resources.
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>>
>>>
>>> On Feb 4, 2023, at 9:30 AM, Adam Sobieski <adamsobieski@hotmail.com>
>>> wrote:
>>>
>>>
>>> With respect to cloud computing costs, these being a significant
>>> component of the costs to train and operate modern AI systems, as a
>>> non-profit organization, the Wikimedia Foundation might be interested in
>>> the National Research Cloud (NRC) policy proposal:
>>> https://hai.stanford.edu/policy/national-research-cloud .
>>>
>>>
>>>
>>> "Artificial intelligence requires vast amounts of computing power, data,
>>> and expertise to train and deploy the massive machine learning models
>>> behind the most advanced research. But access is increasingly out of reach
>>> for most colleges and universities. A National Research Cloud (NRC) would
>>> provide academic and *non-profit researchers* with the compute power
>>> and government datasets needed for education and research. By democratizing
>>> access and equity for all colleges and universities, an NRC has the
>>> potential not only to unleash a string of advancements in AI, but to help
>>> ensure the U.S. maintains its leadership and competitiveness on the global
>>> stage.
>>>
>>>
>>>
>>> "Throughout 2020, Stanford HAI led efforts with 22 top computer science
>>> universities along with a bipartisan, bicameral group of lawmakers
>>> proposing legislation to bring the NRC to fruition. On January 1, 2021, the
>>> U.S. Congress authorized the National AI Research Resource Task Force Act
>>> as part of the National Defense Authorization Act for Fiscal Year 2021.
>>> This law requires that a federal task force be established to study and
>>> provide an implementation pathway to create world-class computational
>>> resources and robust government datasets for researchers across the country
>>> in the form of a National Research Cloud. The task force will issue a final
>>> report to the President and Congress next year.
>>>
>>>
>>>
>>> "The promise of an NRC is to democratize AI research, education, and
>>> innovation, making it accessible to all colleges and universities across
>>> the country. Without a National Research Cloud, all but the most elite
>>> universities risk losing the ability to conduct meaningful AI research and
>>> to adequately educate the next generation of AI researchers."
>>>
>>>
>>>
>>> See also: [1][2]
>>>
>>>
>>>
>>> [1]
>>> https://www.whitehouse.gov/ostp/news-updates/2023/01/24/national-artificial-intelligence-research-resource-task-force-releases-final-report/
>>>
>>> [2]
>>> https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* Steven Walling <steven.walling@gmail.com>
>>> *Sent:* Saturday, February 4, 2023 1:59 AM
>>> *To:* Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
>>> *Subject:* [Wikimedia-l] Re: Chat GPT
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Feb 3, 2023 at 9:47 PM Gergő Tisza <gtisza@gmail.com> wrote:
>>>
>>> Just to give a sense of scale: OpenAI started with a $1 billion
>>> donation, got another $1B as investment, and is now getting a larger
>>> investment from Microsoft (undisclosed but rumored to be $10B). Assuming
>>> they spent most of their previous funding, which seems likely, their
>>> operational costs are in the ballpark of $300 million per year. The idea
>>> that the WMF could just choose to create conversational software of a
>>> similar quality if it wanted seems detached from reality to me.
>>>
>>>
>>>
>>> Without spending billions on LLM development to aim for a
>>> conversational chatbot trying to pass a Turing test, we could definitely
>>> try to catch up to the state of the art in search results. Our search
>>> currently does a pretty bad job (in terms of recall especially). Today's
>>> featured article in English is the Hot Chip album "Made in the Dark", and
>>> if I enter anything but the exact article title the typeahead results are
>>> woefully incomplete or wrong. If I ask an actual question, good luck.
>>>
>>>
>>>
>>> Google is feeling vulnerable to OpenAI here in part because everyone can
>>> see that their results are often full of low quality junk created for SEO,
>>> while ChatGPT just gives a concise answer right there.
>>>
>>>
>>>
>>> https://en.wikipedia.org/wiki/The_Menu_(2022_film) is one of the top
>>> viewed English articles. If I search "The Menu reviews" the Google results
>>> are noisy and not so great. ChatGPT actually gives you nothing relevant
>>> because it doesn't know anything from 2022. If we could just manage to
>>> display the three sentence snippet of our article about the critical
>>> response section of the article, it would be awesome. It's too bad that the
>>> whole "knowledge engine" debacle poisoned the well when it comes to a
>>> Wikipedia search engine, because we could definitely do a lot to learn from
>>> what people like about ChatGPT and apply to Wikipedia search.
>>>
>>>
>>>
>>> _______________________________________________
>>> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
>>> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>>> https://meta.wikimedia.org/wiki/Wikimedia-l
>>> Public archives at
>>> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/6OBPB7WNHKJQXXIBCK73SDXLE3DMGNMY/
>>> To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Boodarwun
>>> Gnangarra
>>>
>>> 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar
>>> koortaboodjar'
>>>