Copyright Office Weighs in on AI and Fair Use Amid Major Leadership Shakeup

“In the Office’s view, training a generative AI foundation model on a large and diverse dataset will often be transformative.” – U.S. Copyright Office


Late last week—one day after the Trump Administration fired Librarian of Congress Carla Hayden and the day before it reportedly fired Register of Copyrights Shira Perlmutter—the U.S. Copyright Office released a pre-publication edition of the third part in the agency’s series of reports exploring issues in copyright law in light of evolving artificial intelligence (AI) technology.

The Office’s third report focuses on generative AI training, including related fair use considerations and possible content licensing schemes. The agency’s legal analysis generally recommends a case-by-case approach, which drew mixed reviews from U.S. copyright system advocates.

After releasing the first part of its series of reports on copyright and AI last July, which advocated for a federal right to digital replicas, the Copyright Office was unable to issue parts two and three of the series by the end of 2024. Last November, Register of Copyrights Shira Perlmutter appeared before the Senate IP Subcommittee and acknowledged the difficulty of meeting the Office’s own ambitious deadlines, given the rapid pace of AI advancement, while still addressing copyright and AI issues thoughtfully. This January, the Office released part two of its series, focusing on the scope of copyright protection in AI-generated works under current human authorship requirements.

Multiple Acts During AI Training and Deployment Could Infringe Copyright

The third report begins by outlining technical processes by which generative AI models are trained on datasets constituting copyrighted human expression. While recognizing that generative AI developers need to acquire high-quality content to use as training data for improving their models, with content being selected based on the AI model’s purpose, the Office noted that the amount of data required to build effective AI models is an open question.

Regardless of the source, data is often filtered or otherwise curated by generative AI developers before its use in iterative training phases, with the Office noting diverging views in public comments as to whether datasets are memorized during training. Further complicating matters are the various forms in which AI models can be deployed, which can involve guardrails to prevent certain outputs, task-specific functions and different degrees of end-user control.

The Office’s report identified multiple acts during the course of creating and deploying generative AI systems that could constitute copyright infringement. Not only do data acquisition and curation clearly implicate the exclusive right of reproduction codified at 17 U.S.C. § 106(1), but reproduction and derivative work rights are also potentially infringed by several aspects of AI training if models are weighted to generate verbatim copies of works, the Office found. Retrieval-augmented generation (RAG), in which user prompts are used to query search engines for content that the AI then uses to complete the prompt, also presents infringement concerns acutely felt by news publishers. Further, public display and public performance rights could be infringed depending on the form of output generated by the AI model, the Office wrote.

Fair Use Analysis Considers Stylistic Imitation as Market Impact Under Factor Four

The majority of the Office’s third report is devoted to analyzing fair use in the AI context, with factors one and four receiving the greatest amount of the agency’s attention. As to factor one’s inquiry into the purpose and character of the copyrighted work’s use, much of the Office’s analysis was guided by the U.S. Supreme Court’s reasoning on transformative use in Andy Warhol Foundation for the Visual Arts v. Goldsmith (2023), in which the Court clarified that a use may not be transformative if it serves a purpose similar enough to the original that the work is instead derivative despite significant alterations.

“In the Office’s view, training a generative AI foundation model on a large and diverse dataset will often be transformative,” the report reads. However, the agency added that the degree to which a particular AI model is transformative will depend on the model’s functionality and how it is deployed, with models trained to output content appealing to a particular audience being modestly transformative at best.

The Office also rejected two arguments on fair use commonly raised in the AI context. First, the agency dismissed the notion that the use of works to train AI models is inherently transformative because the training itself does not serve an expressive purpose. “Where the resulting model is used to generate expressive content… the training use cannot be fairly characterized as ‘non-expressive,’” the Office found. Second, analogies to human learning rest on faulty premises in the agency’s eyes, “as fair use does not excuse all human acts done for the purpose of learning,” such as a student copying texts from a school library rather than borrowing or purchasing those books.

In evaluating fair use factor four, the use’s effect on the potential market for the original work, the Office’s report wades into “uncharted territory” by advancing market dilution theories that remain untested by federal courts. Under such a theory, stylistic imitation, even without substantial similarity, could be considered a market effect under factor four that diminishes the value of the original work used to train the model.

The report was welcomed by stakeholders like The New York Times, which issued a statement supporting the Office’s conclusions on market harms posed to news publishers, especially through RAG deployments. On the other hand, although copyright advocacy coalition Re:Create agreed with the agency’s findings that there was no need for government-enforced compulsory licensing, the coalition argued that key fair use principles were misapplied in the report. For example, the Office’s transformative use analysis was overly focused on the generated output’s audience rather than the nature of the AI model itself, according to Re:Create. Further, the coalition criticized the Office’s advancement of market dilution theory, which “runs counter to the primary goal of copyright: to encourage the creation of new works.”

While both the Copyright Office and Library of Congress websites still listed Perlmutter as Register and Hayden as Librarian as of the time of writing, reports indicate that Deputy Attorney General Todd Blanche of the Trump Administration’s Department of Justice has been appointed Acting Librarian of Congress following Hayden’s firing, while Associate Deputy Attorney General Paul Perkins has been appointed Acting Register of Copyrights.


