Mailing List Archive: [Wikimedia-l] Re: Chat GPT

mk0705182@gmail.com

On Sat, Feb 4, 2023, 8:24 PM Adam Sobieski <adamsobieski@hotmail.com> wrote:

> Brainstorming on how to drive traffic to Wikimedia content from
> conversational media, UI/UX designers could provide menu items or buttons
> on chatbots' applications or webpage components (e.g., to read more about
> the content, to navigate to cited resources, to edit the content, to
> discuss the content, to upvote/downvote the content, to share the content
> or the recent dialogue history on social media, to request
> review/moderation/curation for the content, etc.). Many of these
> envisioned menu items or buttons would operate contextually during
> dialogues, upon the most recent (or otherwise selected) responses provided
> by the chatbot or upon the recent transcripts. Some of these features
> could also be made available to end-users via spoken-language commands.
>
> At any point during hypertext-based dialogues, end-users would be able to
> navigate to Wikimedia content. These navigations could utilize either URL
> query string arguments or HTTP POST. In either case, bulk usage data (e.g.,
> the dialogue contexts from which users navigated) could be useful.
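Below is a minimal Python sketch of the two navigation options. The `chatbot` and `context` parameter names are hypothetical, and Wikimedia sites do not currently read either one. The sketch builds an article link that carries the dialogue context as query string arguments. It also builds the equivalent form-encoded body for the HTTP POST variant.

```python
from urllib.parse import quote, urlencode

# Hypothetical parameter names: Wikimedia sites do not read "chatbot" or
# "context" today; they stand in for whatever a real scheme would define.

def article_url(lang: str, title: str, chatbot: str, context: str) -> str:
    """Link from a chatbot dialogue to a Wikipedia article, carrying the
    dialogue context as URL query string arguments."""
    path = quote(title.replace(" ", "_"))  # MediaWiki titles use underscores
    query = urlencode({"chatbot": chatbot, "context": context})
    return f"https://{lang}.wikipedia.org/wiki/{path}?{query}"


def post_form_body(chatbot: str, context: str) -> bytes:
    """The same arguments as an application/x-www-form-urlencoded body,
    for the HTTP POST variant."""
    return urlencode({"chatbot": chatbot, "context": context}).encode("ascii")
```

For example, `article_url("en", "Made in the Dark", "example-bot", "albums by Hot Chip")` yields `https://en.wikipedia.org/wiki/Made_in_the_Dark?chatbot=example-bot&context=albums+by+Hot+Chip`. The POST variant keeps long dialogue context out of URLs and referrer headers.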
>
> The capability to perform A/B testing across chatbots’ dialogues, over
> large populations of end-users, could also be useful. In this way,
> Wikimedia would be better able to: (1) measure end-user engagement and
> satisfaction, (2) measure the quality of provided content, (3) perform
> personalization, (4) retain readers and editors. A/B testing could be
> performed by providing end-users with various feedback buttons (as
> described above). A/B testing data could also be obtained through data
> mining, analyzing end-users’ behaviors, response times, responses, and
> dialogue moves. These data could be provided for the community at special
> pages and could be made available per article, possibly by enhancing the
> “Page information” system. One can also envision these kinds of analytics
> data existing at the granularity of portions of, or selections of,
> articles.
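The thread does not specify how users would be split into test groups. One common approach is to hash a stable user identifier into a variant bucket and then compare an engagement metric per variant. The Python sketch below assumes that approach. The experiment name, the variant labels, and the "clicked" signal are all hypothetical.

```python
import hashlib
from collections import defaultdict


def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a user to one variant of an experiment, so the
    same user always sees the same feedback-button layout."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def click_through_rates(events) -> dict[str, float]:
    """events: iterable of (variant, clicked) pairs, e.g. whether a user
    followed a hypothetical 'read more on Wikipedia' button.
    Returns clicks / impressions for each variant."""
    shown: dict[str, int] = defaultdict(int)
    clicked: dict[str, int] = defaultdict(int)
    for variant, did_click in events:
        shown[variant] += 1
        clicked[variant] += int(did_click)
    return {v: clicked[v] / shown[v] for v in shown}
```

Hashing `experiment:user_id` keeps each experiment's split independent of the others. Per-variant rates alone do not show that a difference is real, so an actual analysis would add a significance test.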
>
>
>
> Best regards,
>
> Adam
>
> ------------------------------
> *From:* Victoria Coleman <vstavridoucoleman@gmail.com>
> *Sent:* Saturday, February 4, 2023 8:10 AM
> *To:* Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
> *Subject:* [Wikimedia-l] Re: Chat GPT
>
> Hi Christophe,
>
> I had not thought about the threat to Wikipedia traffic from ChatGPT, but
> you have a good point. The success of the projects is always one step away
> from the next big disruption. So the WMF, as the tech provider for the
> mission (because first and foremost, in my view, that's what the WMF is, as
> well as the financial engine of the movement, of course), needs to pay
> attention and experiment to maintain the long-term viability of the
> mission. In fact, I think the cluster of our projects offers compelling
> options. For example, to your point below on data sets, we have the amazing
> Wikidata as well as the excellent work on Abstract Wikipedia. We have
> Wikipedia Enterprise, which has built some avenues of collaboration with big
> tech. A bold vision is needed to bring all of it together and build an MVP
> for the community to experiment with.
>
> Best regards,
>
> Victoria Coleman
>
> On Feb 4, 2023, at 4:14 AM, Christophe Henner <christophe.henner@gmail.com>
> wrote:
>
> Hi,
>
> On the product side, my biggest concern about NLP-based AI is that it would
> drastically decrease traffic to our websites/apps, which means fewer new
> editors and fewer donations.
>
> So first from a strictly positioning perspective, we have here a major
> change that needs to be managed.
>
> And to be honest, it will come faster than we think. We are
> perfectionists; I can assure you, most companies would be happy to launch a
> search product with 80% confidence in answer quality.
>
> From a financial perspective, large industrial investments like this are
> usually a pool of money you can draw from over x years. You can expect they
> have not drawn all of it yet.
>
> Second, GPT-3 and ChatGPT are far from being the most expensive products
> they have. On top of people, you need:
> * datasets
> * people to tag the dataset
> * people to correct the algo
> * computing power
>
> I simplify here, but we already have the capacity to muster some of that,
> which drastically lowers our costs :)
>
> I would not dismiss the option of the movement doing it so easily. That
> being said, it would mean a new project requiring substantial
> resources.
>
> Sent from my iPhone
>
> On Feb 4, 2023, at 9:30 AM, Adam Sobieski <adamsobieski@hotmail.com>
> wrote:
>
> With respect to cloud computing costs, which are a significant component
> of the cost of training and operating modern AI systems, the Wikimedia
> Foundation, as a non-profit organization, might be interested in the
> National Research Cloud (NRC) policy proposal:
> https://hai.stanford.edu/policy/national-research-cloud .
>
> "Artificial intelligence requires vast amounts of computing power, data,
> and expertise to train and deploy the massive machine learning models
> behind the most advanced research. But access is increasingly out of reach
> for most colleges and universities. A National Research Cloud (NRC) would
> provide academic and *non-profit researchers* with the compute power and
> government datasets needed for education and research. By democratizing
> access and equity for all colleges and universities, an NRC has the
> potential not only to unleash a string of advancements in AI, but to help
> ensure the U.S. maintains its leadership and competitiveness on the global
> stage.
>
> "Throughout 2020, Stanford HAI led efforts with 22 top computer science
> universities along with a bipartisan, bicameral group of lawmakers
> proposing legislation to bring the NRC to fruition. On January 1, 2021, the
> U.S. Congress authorized the National AI Research Resource Task Force Act
> as part of the National Defense Authorization Act for Fiscal Year 2021.
> This law requires that a federal task force be established to study and
> provide an implementation pathway to create world-class computational
> resources and robust government datasets for researchers across the country
> in the form of a National Research Cloud. The task force will issue a final
> report to the President and Congress next year.
>
> "The promise of an NRC is to democratize AI research, education, and
> innovation, making it accessible to all colleges and universities across
> the country. Without a National Research Cloud, all but the most elite
> universities risk losing the ability to conduct meaningful AI research and
> to adequately educate the next generation of AI researchers."
>
> See also: [1][2]
>
> [1]
> https://www.whitehouse.gov/ostp/news-updates/2023/01/24/national-artificial-intelligence-research-resource-task-force-releases-final-report/
> [2]
> https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf
>
> ------------------------------
> *From:* Steven Walling <steven.walling@gmail.com>
> *Sent:* Saturday, February 4, 2023 1:59 AM
> *To:* Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
> *Subject:* [Wikimedia-l] Re: Chat GPT
>
>
>
> On Fri, Feb 3, 2023 at 9:47 PM Gergő Tisza <gtisza@gmail.com> wrote:
>
> Just to give a sense of scale: OpenAI started with a $1 billion donation,
> got another $1B as investment, and is now getting a larger investment from
> Microsoft (undisclosed but rumored to be $10B). Assuming they spent most of
> their previous funding, which seems likely, their operational costs are in
> the ballpark of $300 million per year. The idea that the WMF could just
> choose to create conversational software of a similar quality if it wanted
> seems detached from reality to me.
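The per-year figure comes from a back-of-the-envelope division. This rough check assumes, as the message does, that most of the first $2 billion had been spent. It also assumes that OpenAI, founded in December 2015, had been operating for roughly seven years at the time of this thread.

```latex
\frac{\$1\,\mathrm{B} + \$1\,\mathrm{B}}{\approx 7\ \text{years}}
  \approx \$0.29\,\mathrm{B/year}
  \approx \$300\,\mathrm{M/year}
```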
>
>
> Without spending billions on LLM development to aim for a
> conversational chatbot trying to pass a Turing test, we could definitely
> try to catch up to the state of the art in search results. Our search
> currently does a pretty bad job (in terms of recall especially). Today's
> featured article in English is the Hot Chip album "Made in the Dark", and
> if I enter anything but the exact article title the typeahead results are
> woefully incomplete or wrong. If I ask an actual question, good luck.
>
> Google is feeling vulnerable to OpenAI here in part because everyone can
> see that their results are often full of low quality junk created for SEO,
> while ChatGPT just gives a concise answer right there.
>
> https://en.wikipedia.org/wiki/The_Menu_(2022_film) is one of the top
> viewed English articles. If I search "The Menu reviews" the Google results
> are noisy and not so great. ChatGPT actually gives you nothing relevant
> because it doesn't know anything from 2022. If we could just manage to
> display a three-sentence snippet from the critical response section of our
> article, it would be awesome. It's too bad that the whole "knowledge
> engine" debacle poisoned the well when it comes to a Wikipedia search
> engine, because we could definitely learn a lot from what people like
> about ChatGPT and apply it to Wikipedia search.
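The MediaWiki Action API offers one way to prototype that snippet. A request with `action=parse` and `prop=sections` lists a page's section headings and indexes. Another `action=parse` request with `section=N` and `prop=wikitext` returns the source of one section. The Python sketch below assumes the heading text is known in advance and that markup is stripped before the snippet step (not shown). The function names are illustrative. The code only builds request URLs and works on already-decoded JSON, so it runs without network access.

```python
import re
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def sections_url(title: str) -> str:
    """Action API request listing a page's sections."""
    return API + "?" + urlencode({"action": "parse", "page": title,
                                  "prop": "sections", "format": "json",
                                  "formatversion": "2"})


def section_index(parse_response: dict, heading: str):
    """Return the section index (a string) for a heading in a decoded
    prop=sections response, or None if no heading matches."""
    for sec in parse_response["parse"]["sections"]:
        if sec["line"].lower() == heading.lower():
            return sec["index"]
    return None


def section_wikitext_url(title: str, index: str) -> str:
    """Action API request for the wikitext of a single section."""
    return API + "?" + urlencode({"action": "parse", "page": title,
                                  "section": index, "prop": "wikitext",
                                  "format": "json", "formatversion": "2"})


def first_sentences(plain_text: str, n: int = 3) -> str:
    """Naively keep the first n sentences (split after . ! or ? + space)."""
    sentences = re.split(r"(?<=[.!?])\s+", plain_text.strip())
    return " ".join(sentences[:n])
```

The regex sentence split is deliberately naive and will break on abbreviations such as "Dr.". A production snippet would need a proper sentence segmenter.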
>
> _______________________________________________
> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/6OBPB7WNHKJQXXIBCK73SDXLE3DMGNMY/
> To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org