New York Times Takes on OpenAI, Microsoft

On December 27, the New York Times Company became the latest complainant to accuse OpenAI’s Large Language Model, ChatGPT, as well as Microsoft’s GPT-4-powered Bing Chat, of widespread copyright infringement. The Times alleges that Microsoft and OpenAI reproduce Times content verbatim and also often attribute false information to the Times.

OpenAI has been sued by numerous creators and authors for training its chatbots on content found online, including non-public or copyright-protected content. The Times’ complaint, for instance, includes cases in which prompts asking ChatGPT to reproduce paywalled content returned verbatim excerpts from the article in question.

The Times’ complaint also highlights the phenomenon of AI “hallucinations,” which remain a major risk when it comes to LLMs. In one example provided by the Times, Bing Chat “completely fabricated” a paragraph from a Times article by “including specific quotes attributed to Steve Forbes’s daughter Moira Forbes, that appear nowhere in The Times article in question or anywhere else on the internet.” In another example, Bing Chat generated a list of heart-healthy foods purportedly based on a specific New York Times article, but the article did not mention 12 of the 15 foods on the list. The complaint includes several other examples of hallucinations, among them fake article headlines about COVID-19 with non-working links and a fabricated headline, attributed to the Times, about a link between orange juice and non-Hodgkin’s lymphoma. “Users who ask a search engine what The Times has written on a subject should be provided with neither an unauthorized copy nor an inaccurate forgery of a Times article, but a link to the article itself,” said the complaint.

Some reports have deemed the Times’ suit more likely to succeed than other pending suits because of the market harm the company can likely prove under the fourth fair use factor. The complaint details instances in which paywalled content is reproduced in its entirety for free, potentially diminishing the publication’s subscriber base. Ironically, noted Bloomberg columnist Noah Feldman in a recent op-ed, taking the Times’ business away from it could ultimately backfire on companies like OpenAI and Microsoft:

“If you can get information more cheaply from an LLM than from the New York Times, you might drop your subscription. But if everyone did that, there would be no New York Times at all. Put another way, OpenAI and Microsoft need the New York Times and other news organizations to exist if they are to provide reliable news as part of their service. Rationally and economically, therefore, they ought to be obligated to pay for the information they are using.”

The Times is ultimately seeking to hold the companies “responsible for the billions of dollars in statutory and actual damages that they owe for the unlawful copying and use of The Times’s uniquely valuable works.”

In emailed statements to the press, OpenAI said it respects the rights of content owners and has been in talks with the Times, and that it was therefore “surprised and disappointed” by the lawsuit.

In November, OpenAI announced a program, Copyright Shield, under which it will offer to pay the costs customers incur from copyright lawsuits.

Image Source: Deposit Photos
Author: boggy22
Image ID: 49480745

Join the Discussion

3 comments so far.

  • Anon
    January 3, 2024 05:42 pm

    S,

    How do you figure that?

  • S
    January 3, 2024 05:18 pm

    @Anon, agreed that the hallucinations element is a side-dish to the main argument over whether the AI programs’ scraping is fair use. But I think the indistinguishable INTERMIXING of reproduced material and hallucinations works against a fair use defense.

  • Anon
    January 3, 2024 10:05 am

    The multiple types of “hallucinations” actually works AGAINST the Times’ complaint as these show a LACK of actual violation of copyright (and THAT is not even touching the primary argument against the Times that ANY scraping for training is a Fair Use action).