Judge Calls Anthropic’s Training of LLMs with Authors’ Works ‘Quintessentially Transformative’ But Gives No Pass on Piracy

“Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.” – Judge William Alsup

On Monday, the U.S. District Court for the Northern District of California issued a mixed order on fair use as it relates to generative AI, in part likening the training of Large Language Models (LLMs) to the process of human learning, in a case brought against generative AI company Anthropic by a group of authors.

The lawsuit was filed by journalists and book authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson in August 2024 against Anthropic on behalf of a class of plaintiffs, alleging widespread copyright infringement of “hundreds of thousands of copyrighted books.” The suit challenged only the inputs of the LLMs, not the outputs.

Anthropic’s core product is the AI chatbot Claude, which the complaint claimed was fed “known pirated versions of Plaintiffs’ works” in order to train the chatbot to generate human-like responses. “An essential component of Anthropic’s business model—and its flagship ‘Claude’ family of large language models (or “LLMs”)—is the largescale theft of copyrighted works,” said the complaint.

Far from compensating the plaintiffs for their works, Anthropic “has taken multiple steps to hide the full extent of its copyright theft,” it continued.

According to reports cited in the complaint, Anthropic “has raised $7.6 billion from tech giants like Amazon and Google” and, as of December 2023, the company was valued in excess of $18 billion. The company has become particularly popular with corporate clients, including Slack, ZoomInfo, Asana, Bridgewater, LexisNexis, and Jane Street Capital, according to the lawsuit.

The infringement allegations stem chiefly from Anthropic’s admission in a December 2021 paper that it created a training dataset relying mostly on “The Pile,” which is “an 800 GB+ open-source dataset created for large language model training,” according to the complaint. One of the architects of the Pile, Shawn Presser, created a dataset called “Books3” in the Pile, which, according to the plaintiffs, is “a trove of pirated books.” Books3 consists of “all of Bibliotik,” according to public posts by Presser, and Bibliotik, according to sources cited in the complaint, is a “notorious pirated collection” of “pirated books.”

The complaint additionally argued that Anthropic purchased millions of copies of print books, some that overlapped with the digital pirated copies it obtained, “tore off the bindings, scanned every page, and stored them in digitized, searchable files” in order to create a “central library” of “all the books in the world” to retain “forever,” according to Monday’s order, which was authored by Judge William Alsup.

In his analysis, Alsup first said, with respect to the copies of the works used to train specific LLMs, that “the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative.” He explained:

“Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use.”

However, with respect to the copies used to build a central library, Alsup drew a distinction. He found that the print copies Anthropic purchased, and disposed of as it scanned them, were bought “fair and square,” and that the mere format change from print to digital was a transformative use. But he rejected Anthropic’s argument that the pirated copies should equally qualify as fair use. Anthropic contended that because it intended eventually to use the pirated copies in the central library to train LLMs, the use should be deemed transformative. The district court dismissed this argument, finding that the actual use was not transformative and that “piracy was the point: To build a central library that one could have paid for, just as Anthropic later did, but without paying for it.”

Alsup also found that the second fair use factor, the nature of the copyrighted works, pointed against fair use for all the copies at issue because the authors’ works were clearly expressive. But the third factor, with respect to the works used to train LLMs, did favor fair use because the amount and substantiality of the portion of each work used was necessary to the transformative use, according to Alsup. For the works purchased for the central library, the analysis on the third factor was the same, but for the pirated copies used for the central library, the third factor pointed against fair use, said the order. It added:

“[Anthropic’s] purpose, it says, was to train LLMs. But its objective conduct was to seek ‘all the books in the world’ and then retain them even after deciding it would not make further copies from them for training — indicating there were other further uses. Against the purpose of acquiring all the books one could on the chance some might prove useful for training LLMs and maybe other stuff too, almost any unauthorized copying would have been too much.”

Finally, with respect to the effect of the use upon the market value of the copyrighted works, the fourth fair use factor, Alsup found that only the use of the pirated works to create a central library weighed against fair use.

Overall, Alsup granted summary judgment for Anthropic that the training use was a fair use and that the print-to-digital format change was a fair use. But he denied summary judgment for Anthropic that the pirated library copies must be treated as training copies and ordered a trial with respect to the pirated copies to determine damages, including potentially for willfulness. “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages,” Alsup noted.

In an analysis of the case for Truth on the Market, Kristian Stout, director of innovation policy at the International Center for Law & Economics (ICLE), said the decision provides a “clear roadmap” for AI companies with respect to inputs, chiefly that “companies should acquire training materials through legitimate channels—purchase, licensing, or authorized access,” but that “output liability emerges as the next frontier.”



Join the Discussion


  • Anon, June 26, 2025 09:41 am

    A second case has also dropped on the topic.

    My main thrust (training being necessarily transformative) appears to be holding firm.

  • Anon, June 25, 2025 08:35 am

    The race is not over, so I do not celebrate.

    That being said, the level of technical transformation necessary should have made this an easy call for all attorneys trained in this space.
