
[Wikimedia-l] Re: Bing-ChatGPT
On Fri, Mar 17, 2023 at 7:05 PM Steven Walling <steven.walling@gmail.com> wrote:

> IANAL of course, but to me this implies that responsibility for the *egregious* lack
> of attribution in models that rely substantially on Wikipedia is violating the Attribution
> requirements of CC licenses.

Morally, I agree that companies like OpenAI would do well to recognize
and nurture the sources they rely upon in training their models.
Especially as the web becomes polluted with low quality AI-generated
content, it would seem in everybody's best interest to sustain the
communities and services that make and keep high quality information
available. Not just Wikimedia, but also the Internet Archive, open
access journals and preprint servers, etc.

Legally, it seems a lot murkier. OpenAI in particular does not
distribute any of its GPT models. You can feed them prompts by various
means, and get responses back. Do those responses plagiarize
Wikipedia?

With image-generating models like Stable Diffusion, it's been found
that the models sometimes generate output nearly indistinguishable
from source material [1]. I don't know if similar studies have been
undertaken for text-generating models yet. You can certainly ask GPT-4
to generate something that looks like a Wikipedia article -- here are
example results from asking it to generate a random Wikipedia article:

Article: https://en.wikipedia.org/wiki/The_Talented_Mr._Ripley_(film)
GPT-4 run 1: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/1
(cut off at the ChatGPT generation limit)
GPT-4 run 2: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/2
GPT-4 run 3: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/3
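
For concreteness, here is roughly what such a request looks like through the
API -- a minimal sketch assuming the current OpenAI Python client, with the
model name, prompt wording, and token limit as illustrative placeholders
rather than the exact settings used for the runs above:

    # Sketch: ask a GPT-4-class model to write a Wikipedia-style article on a
    # given title via the OpenAI chat completions endpoint. Prompt wording and
    # parameters are assumptions, not the settings used for the linked runs.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    title = "The Talented Mr. Ripley (film)"
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Write a Wikipedia article about {title}, "
                       "with sections and references.",
        }],
        max_tokens=1024,  # long articles get cut off at the generation limit
    )
    print(response.choices[0].message.content)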

It imitates the form of a Wikipedia article, and both mixes up and makes
up assertions, but I don't know that any of its generations would meet
the standard of infringing on the Wikipedia article's copyright. IANAL
either, and as you say, the legal landscape is evolving rapidly.

Warmly,
Erik

[1] https://arstechnica.com/information-technology/2023/02/researchers-extract-training-images-from-stable-diffusion-but-its-difficult/