Mailing List Archive

[Wikimedia-l] Re: Bing-ChatGPT
Or, maybe just require an open disclosure of where the bot pulled from and
how much, instead of having it be a black box? "Text in this response
derived from: 17% Wikipedia article 'Example', 12% Wikipedia article
'SomeOtherThing', 10%...".

On Sat, Mar 18, 2023 at 10:17 PM Steven Walling <> wrote:

> On Sat, Mar 18, 2023 at 3:49 PM Erik Moeller <> wrote:
>> On Fri, Mar 17, 2023 at 7:05 PM Steven Walling <> wrote:
>> > IANAL of course, but to me this implies that responsibility for the
>> > *egregious* lack of attribution in models that rely substantially on
>> > Wikipedia is violating the Attribution requirements of CC licenses.
>> Morally, I agree that companies like OpenAI would do well to recognize
>> and nurture the sources they rely upon in training their models.
>> Especially as the web becomes polluted with low quality AI-generated
>> content, it would seem in everybody's best interest to sustain the
>> communities and services that make and keep high quality information
>> available. Not just Wikimedia, but also the Internet Archive, open
>> access journals and preprint servers, etc.
>> Legally, it seems a lot murkier. OpenAI in particular does not
>> distribute any of its GPT models. You can feed them prompts by various
>> means, and get responses back. Do those responses plagiarize
>> Wikipedia?
>> With image-generating models like Stable Diffusion, it's been found
>> that the models sometimes generate output nearly indistinguishable
>> from source material [1]. I don't know if similar studies have been
>> undertaken for text-generating models yet. You can certainly ask GPT-4
>> to generate something that looks like a Wikipedia article -- here are
>> example results for generating a random Wikipedia article:
>> Article:
>> GPT-4 run 1: (cut off at the ChatGPT generation limit)
>> GPT-4 run 2:
>> GPT-4 run 3:
>> It imitates the form of a Wikipedia article & mixes up / makes up
>> assertions, but I don't know that any of its generations would meet
>> the standard of infringing on the Wikipedia article's copyright. IANAL
>> either, and as you say, the legal landscape is evolving rapidly.
>> Warmly,
>> Erik
> The whole thing is definitely a hot mess. If the remixing/transformation
> by the model is a derivative work, it means OpenAI is potentially violating
> the ShareAlike requirement by not distributing the text output as CC. But
> on the other hand the nature of the model means they’re combining CC and
> non-free works freely / at random, unless a court would interpret whatever % of
> training data comes from us as the direct degree to which the model output
> is derived from Wikipedia. Either way it’s going to be up to some legal
> representation of copyright holders to test the boundaries here.
>> [1]
>> _______________________________________________
>> Wikimedia-l mailing list --, guidelines
>> at: and
>> Public archives at
>> To unsubscribe send an email to