Mailing List Archive

[Wikimedia-l] Re: Bing-ChatGPT
Or, maybe just require an open disclosure of where the bot pulled from and
how much, instead of having it be a black box? "Text in this response
derived from: 17% Wikipedia article 'Example', 12% Wikipedia article
'SomeOtherThing', 10%...".

On Sat, Mar 18, 2023 at 10:17 PM Steven Walling <steven.walling@gmail.com>
wrote:

>
>
> On Sat, Mar 18, 2023 at 3:49 PM Erik Moeller <eloquence@gmail.com> wrote:
>
>> On Fri, Mar 17, 2023 at 7:05 PM Steven Walling <steven.walling@gmail.com>
>> wrote:
>>
>> > IANAL of course, but to me this implies that responsibility for the
>> > *egregious* lack of attribution in models that rely substantially on
>> > Wikipedia is violating the Attribution requirements of CC licenses.
>>
>> Morally, I agree that companies like OpenAI would do well to recognize
>> and nurture the sources they rely upon in training their models.
>> Especially as the web becomes polluted with low quality AI-generated
>> content, it would seem in everybody's best interest to sustain the
>> communities and services that make and keep high quality information
>> available. Not just Wikimedia, but also the Internet Archive, open
>> access journals and preprint servers, etc.
>>
>> Legally, it seems a lot murkier. OpenAI in particular does not
>> distribute any of its GPT models. You can feed them prompts by various
>> means, and get responses back. Do those responses plagiarize
>> Wikipedia?
>>
>> With image-generating models like Stable Diffusion, it's been found
>> that the models sometimes generate output nearly indistinguishable
>> from source material [1]. I don't know if similar studies have been
>> undertaken for text-generating models yet. You can certainly ask GPT-4
>> to generate something that looks like a Wikipedia article -- here are
>> example results for generating a random Wikipedia article:
>>
>> Article: https://en.wikipedia.org/wiki/The_Talented_Mr._Ripley_(film)
>> GPT-4 run 1: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/1
>> (cut off at the ChatGPT generation limit)
>> GPT-4 run 2: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/2
>> GPT-4 run 3: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/3
>>
>> It imitates the form of a Wikipedia article & mixes up / makes up
>> assertions, but I don't know that any of its generations would meet
>> the standard of infringing on the Wikipedia article's copyright. IANAL
>> either, and as you say, the legal landscape is evolving rapidly.
>>
>> Warmly,
>> Erik
>
>
> The whole thing is definitely a hot mess. If the remixing/transformation
> by the model is a derivative work, it means OpenAI is potentially violating
> the ShareAlike requirement by not distributing the text output as CC. But
> on the other hand, the nature of the model means they’re combining CC and
> non-free works freely / at random, unless a court would interpret whatever %
> of training data comes from us as the direct degree to which the model
> output is derived from Wikipedia. Either way, it’s going to be up to some
> legal representation of copyright holders to test the boundaries here.
>
>
>> [1]
>> https://arstechnica.com/information-technology/2023/02/researchers-extract-training-images-from-stable-diffusion-but-its-difficult/
>> _______________________________________________
>> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
>> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>> https://meta.wikimedia.org/wiki/Wikimedia-l
>> Public archives at
>> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/CO3IJWXGHTBP3YE7AKUHHKPAL5HA56IC/
>> To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org
>