Battle Between Newspaper Giant and Generative AI Boils Down to Definition of Fair Use

“As is always true with fair use cases, the success of OpenAI’s defense will turn on its particular facts, but it seems if OpenAI can survive the ‘output’ attack, it is highly likely to survive the ‘training’ attack.”

The training of artificial intelligence models using copyrighted material continues to stir debate and prompt litigation. In the latest salvo, the New York Times Company sued Microsoft and OpenAI – the creator of ChatGPT – for infringement under the federal Copyright Act.

As often is the case with claims like these, the merits will center on the fair-use doctrine, a well-recognized legal principle in copyright law that aims to balance the interests of copyright holders with the public benefit of free speech and creative works. Fair use is a defense to a claim of copyright infringement that must be affirmatively invoked by the accused infringer.

Importantly, Section 107 of the Copyright Act makes clear that the fair use doctrine works as an exclusion to copyright infringement (i.e., even where a copyrighted work is used, the exclusion prevents such use from constituting copyright infringement).

The success of a fair use defense is heavily fact dependent, as courts balance a number of factors in considering its application (e.g., the amount of the use, whether the use was transformative, etc.). The fair use defense has been successful in permitting the unauthorized use of protected work for certain purposes like criticism, comment, teaching, scholarship, research, and news reporting.

Getting to the Merits

Whether OpenAI’s anticipated fair-use defense has merit presents an interesting question. It certainly appears to be a colorable defense, but it is important to appreciate that the case is in its early stages and, at the time of this writing, OpenAI has yet to formally lodge a fair-use defense or file an answer in the pending litigation. For its part, Microsoft is alleged to have invested $13 billion in OpenAI and to have incorporated OpenAI’s technology into its search engine, Bing, and its Office 365 products. Microsoft also has yet to file its answer in the lawsuit.

The New York Times appears to focus its attack on two aspects of OpenAI’s conduct: (1) its “training” of AI models using content “scraped” from The New York Times and other sources; and (2) the AI model’s “output,” which allegedly contains “near-verbatim copies of significant portions of Times Works.”

OpenAI’s position against the “training-based” allegation seems to be the stronger of the two, and a court would likely have to work harder to reject a fair-use defense there. The “output” allegation poses the greater danger to OpenAI, but OpenAI has in its quiver the holding in Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015).

There, Google successfully invoked the fair-use defense to protect its Google Books technology, which scanned and digitized huge volumes of copyrighted material to make their contents searchable online.

Big Implications

As is always true with fair use cases, the success of OpenAI’s defense will turn on its particular facts, but it seems if OpenAI can survive the “output” attack, it is highly likely to survive the “training” attack.

Merits aside, the outcome of this case is likely to have important policy implications, as a win by The New York Times arguably undermines the fundamentals of generative AI.


Warning & Disclaimer: The pages, articles and comments on IPWatchdog.com do not constitute legal advice, nor do they create any attorney-client relationship. The articles published express the personal opinion and views of the author as of the time of publication and should not be attributed to the author’s employer, clients or the sponsors of IPWatchdog.com.

Join the Discussion

11 comments so far.

  • Mark Nowotarski
    February 7, 2024 01:30 pm

    @ Anon,

    Thanks for your point of view. It will be interesting to see how these cases develop.

  • Anon
    February 7, 2024 10:04 am

    Well Mark,

    That is a much more European view than a US view, and – as it may be – does not reflect either the traditional US view or the massively “on-line” generations of those born after circa 2002 that have FREELY given all of their ‘private’ sense to social media.

    So, not only are you striving against the law as it is (your view of what copyright entails is false), but you are also striving against the greater bulk of US social structure in any sense of expectation of privacy.

    To answer directly, yes, I see things differently.

The boat of “should own” sailed long ago. The view that copyright “entitles” ANY compensation is simply NOT a right within copyright under US law. Fair Use IS fair, and the technical transformations necessarily present in training make the NYT case a clear loser, aligned far more with Fair Use cases that pivot on the technical processing than on any post-processing business deal.

    You have each of technical fact, principles of law, and common expected culture going against your position.

    Certainly, certain authors and others may wish things were different, but as they say, if wishes were fishes…

  • Mark Nowotarski
    February 7, 2024 08:37 am

    “please consider my tone dialed back a bit.”

    Thanks. I appreciate that.

    “suffice it to say, it is an entirely different technical paradigm.”

    Understood, and I do have some background in the technology.

    I think there is a larger issue, however, that goes beyond a particular technology or even the current state of the law. I think that issue is summed up in the title of Jaron Lanier’s 2013 book “Who Owns the Future”.

    Lanier’s basic premise is that we should not so easily give up our data (creative content, monitored behavior etc.) for the financial benefit of tech companies at the expense of our own benefit (e.g. spam). He proposes a system where all users get micro payments for their data. Their data is something they should own.

    I see the NYT case and similar cases as extensions of that idea. I think that if a company is making a commercial product based on copyrighted materials, then the owners of those copyrights should be entitled to some form of fair and reasonable compensation, even if that data is massively transformed to make the commercial product.

    What “fair and reasonable” is will be a difficult question to answer, but as a larger principle, I think it’s a good idea.

    Do you see things differently?

  • Anon
    February 6, 2024 10:57 am

    Ah, thank you for noting that you are not an attorney – please consider my tone dialed back a bit.

    Whether or not an output can be ‘tricked’ into a near repeat is very different from the whole of the technical processing within AI engines.

    There are plenty of free articles diving into the basis of what AI does in the training (and there are more than one mode of training), but suffice it to say, it is an entirely different technical paradigm.

  • Mark Nowotarski
    February 6, 2024 10:15 am

    “You are an attorney, are you not?”

    I am not.

    “It absolutely does matter.”

    How so? What is the key difference?

  • Anon
    February 6, 2024 08:32 am

    It absolutely does matter.

    You are an attorney, are you not?

  • Mark Nowotarski
    February 5, 2024 01:44 pm

    “Does AI do what a straight up copy of a MP3?”

    If both can reproduce the original material, does it matter?

  • Anon
    February 5, 2024 08:16 am

    Does AI do what a straight up copy of a MP3?

    I am embarrassed (for you) that I even have to ask that question.

    Again – please learn just a little bit about the technology involved.

  • Mark Nowotarski
    February 4, 2024 02:07 pm

    @Anon you make an interesting point “Weights are NOT art”.

    I agree, but I’m not quite sure how that relates to copyright. MP3 files are also not art, yet if I make an unauthorized MP3 of someone’s music, I am violating their copyright am I not?

  • Anon
    February 2, 2024 02:13 pm

    Mark,

    Why would that seem to you?

    Training is eminently Fair Use, not to mention the level of transformation in the AI process puts ALL existing cases to shame.

    Weights are NOT art – nor are they derivative work.

    I would recommend that you inform yourself of the technology prior to this type of misfiring.

  • Mark Nowotarski
    February 2, 2024 10:41 am

    It seems to me that if you want to use a copyrighted work to train an AI model, you need a license from the copyright holder. The weights in the trained model are nothing more than a derivative work made from the copyrighted material.
