Much of the focus on generative artificial intelligence (GenAI) has been on training data ingestion, the moment when AI “steals” from creators. But legally, that’s not where the real fight should be. Decades of legal precedent, from search engines to image scanning to streaming media, already give us a roadmap. No new formulation of copyright law by Congress, as suggested by some academics, is necessary. Once these seven aspects of GenAI systems are considered separately, the copyright analysis is actually easy.
Aspect 1: Training Input
The most common way to obtain training data is by using publicly available sources. There have been cases where private data was accessed without permission, such as data behind a paywall that was not purchased, or where pirated data was used, but those cases have problems beyond copyright infringement, so I will not address them here.
The intent of copyright law is to promote the advancement of knowledge and the arts; the consumption of copyrighted materials by individuals or automated systems aligns with this purpose. Accessing such materials does not violate any of the exclusive rights of copyright holders: reproduction, creation of derivative works, distribution, public performance, or public display. There is no benefit solely from reading or observing content. Thus, training input cannot be copyright infringement.
Aspect 2: Storage
Some GenAI systems store training data for long periods of time, while others store it only briefly, just long enough to map relationships between types of elements, such as how words are generally assembled into sentences or how musical notes follow patterns. Whether storage constitutes copyright infringement therefore depends on whether the system uses short-term or long-term storage.
Aspect 3: Short-Term Storage
The Copyright Act defines copies as “material objects… in which a work is fixed… A work is ‘fixed’ in a tangible medium of expression when its embodiment in a copy… is sufficiently permanent or stable to permit it to be perceived, reproduced, or otherwise communicated for a period of more than transitory duration.” The Second Circuit established a general precedent in Cartoon Network, LP v. CSC Holdings, Inc. (Cablevision) that a copy remaining in memory for 1.2 seconds before being overwritten by subsequent data was transitory. While the court noted that its decision was specific to that case, the 1.2 seconds has become a de facto minimum threshold. Modern high-speed GenAI systems that do not store data long term hold input training data for less than 1.2 seconds, so this storage does not create fixed copies and does not constitute copyright infringement.
Aspect 4: Long-Term Storage
Long-term storage of copyrighted data does seem to meet the criteria of fixed copies, as these can potentially be “perceived, reproduced, or otherwise communicated.” The question to answer, then, is whether this long-term storage constitutes fair use. Let us examine each of the four factors of fair use: 1) purpose and character of the use, 2) nature of the copyrighted work, 3) amount and substantiality of the portion copied, and 4) effect upon the potential market for or value of the copyrighted work. Two particularly informative cases are Authors Guild, Inc. v. HathiTrust and Authors Guild, Inc. v. Google, Inc., both before the Second Circuit. HathiTrust Digital Library (“HDL”) and Google both scanned books to allow for searching and were sued by the Authors Guild for copyright infringement.
In both cases, the court determined that the scanning was fair use because the creation of a full-text searchable database is a “quintessentially transformative use” and thus meets the criteria of the first factor. Similarly, GenAI transforms semantic relationships from copyrighted works into an internal model and thus meets the criteria of the first factor.
The court also determined that the second factor was not dispositive but further stated that the scanning process “provides valuable information about the original, rather than replicating protected expression in a manner that provides a meaningful substitute for the original” and thus met the criteria of the second factor. Similarly, GenAI analyzes the relationships between words, sentences, paragraphs, and concepts in the copyrighted works, thus providing valuable information about the original.
In determining that HDL’s service was fair use by consideration of the third factor, the Second Circuit court stated, “Because it was reasonably necessary for the HDL to make use of the entirety of the works in order to enable the full-text search function, we do not believe the copying was excessive.” The same argument holds for GenAI, which must copy entire works in order to understand and learn from the appropriate semantic relationships.
In determining that HDL’s service was fair use by consideration of the fourth factor, the Second Circuit court stated that because the book scanners did not “allow users to view any portion of the books they are searching… in providing this service, the HDL does not add into circulation any new, human-readable copies of any books.” In determining that Google Books was fair use by consideration of the fourth factor, the court stated that when readers are given snippets of copyrighted books, there can be some loss of sales, but that “some loss of sales does not suffice to make the copy an effectively competing substitute that would tilt the weighty fourth factor in favor of the rights holder in the original. There must be a meaningful or significant effect ‘upon the potential market for or value of the copyrighted work’” for the use to fall outside fair use.
Because GenAI storage does not, by itself, give access to the original materials, it meets the criteria for the fourth factor. The fourth factor is only relevant to the output, which I discuss below.
Aspect 5: Output
Whether the output of a GenAI system constitutes copyright infringement depends on the type of output. I classify GenAI output into two types: repurposing and non-repurposing.
Aspect 6: Repurposing Output
Repurposing GenAI systems learn from one type of data and produce a different type of output. For example, a security system trained on facial images might generate alerts about specific individuals passing a camera. Another might process road sign images to help autonomous vehicles navigate. Copyright infringement requires that an output closely resemble copyrighted training data, either literally or non-literally (e.g., structure, sequence, and organization). Since repurposing GenAI systems produce outputs very different from their inputs, they do not infringe copyrights.
Aspect 7: Non-Repurposing Output
Non-repurposing GenAI systems create outputs that match the type of data they were trained on. For example, a non-repurposing system trained on English texts will produce English documents like research papers, legal briefs, or short stories; a system trained on artwork generates artwork; and one trained on music creates songs. These kinds of systems can involve copyright infringement.
For there to be copyright infringement, the output must be substantially similar to protected training material. Infringement can be literal—exact copies—or nonliteral, where the structure or unique elements are copied without exact words. Nonliteral infringement is somewhat subjective. Typically, any literal content from training data present in GenAI outputs is not substantial and, in my opinion, unlikely to qualify as literal infringement. However, nonliteral infringement—such as imitating another creator’s style—requires further analysis, including whether fair use might apply.
By definition, the output of a non-repurposing GenAI system has the same use as the training input, and so the output would not meet the first factor for fair use, which looks for a purpose and character different from that of the original.
Whether the second factor, the nature of the copyrighted work, applies to a non-repurposing GenAI system depends on the use of the system. Creating a list of factual information, such as the list of highest grossing films or the names of the capitals of each state in the United States, would most likely meet the criteria of this fair use factor. Creating an artistic work would be less likely to pass this fair use factor. Creating a novel in the style of a specific author would be even less likely to pass this fair use factor.
Whether the third factor, amount and substantiality of the portion copied, applies to the system depends on the specific outputs of the specific GenAI system and how much of the copyrighted training inputs actually appears in those outputs.
The fourth factor, effect on the market, is difficult to determine. How much would a novel in the manner of J.K. Rowling or a painting in the manner of Peter Max reduce the market for original works by those creators? In art collecting, widely available prints often boost the value of originals by increasing public exposure and demand. Could GenAI-generated works similarly enhance the market for originals? A high-quality Picasso print costs much less than an identical original because the original was made by the artist himself.
No Need for Change
This review of existing case law ultimately teaches that training input is not infringing. Storage may be short-term (non-infringing) or long-term (fair use). Output involves repurposing (not infringing) and non-repurposing (potentially infringing, assessed case-by-case). Existing court rulings adequately address these concerns, so there is no need to change current IP laws; they effectively balance GenAI innovation with copyright protection.
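The classification summarized above can be sketched as a small decision function. This is only an illustrative sketch of the article's framework, not a legal test: the names Aspect, classify, and TRANSITORY_SECONDS are hypothetical, and treating 1.2 seconds as a hard boundary between short-term and long-term storage is an assumption drawn from the Cablevision discussion above, which the court itself limited to the facts of that case.

```python
from enum import Enum


class Aspect(Enum):
    """The three top-level aspects discussed in the article (hypothetical names)."""
    TRAINING_INPUT = "training input"
    STORAGE = "storage"
    OUTPUT = "output"


# Assumption: the 1.2-second duration from Cablevision is used as the cutoff
# between short-term (transitory) and long-term storage.
TRANSITORY_SECONDS = 1.2


def classify(aspect, storage_seconds=None, repurposing=None):
    """Return the article's conclusion for one aspect of a GenAI system."""
    if aspect is Aspect.TRAINING_INPUT:
        # Aspect 1: reading or observing content touches no exclusive right.
        return "not infringing"

    if aspect is Aspect.STORAGE:
        if storage_seconds is None:
            raise ValueError("storage_seconds is required for STORAGE")
        if storage_seconds <= TRANSITORY_SECONDS:
            # Aspect 3: transitory storage is not a fixed copy.
            return "not infringing (transitory)"
        # Aspect 4: fixed copies, analyzed under the four fair use factors.
        return "fair use (long-term storage)"

    if aspect is Aspect.OUTPUT:
        if repurposing is None:
            raise ValueError("repurposing is required for OUTPUT")
        if repurposing:
            # Aspect 6: output differs in kind from the training input.
            return "not infringing"
        # Aspect 7: same kind of output as input; depends on the specific output.
        return "potentially infringing; assess case by case"

    raise ValueError(f"unknown aspect: {aspect!r}")
```

For example, classify(Aspect.OUTPUT, repurposing=False) returns the case-by-case result, which is the only branch where the article expects a fact-specific infringement and fair use analysis.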
This article is based on the author’s previously published article, “Seven Aspects of Generative AI for Analyzing Copyright Infringement and Fair Use.”

Join the Discussion
Bob Zeidman
January 6, 2026 11:23 am
@Ateara L. Garrison, the CLMA framework sounds interesting. Where can I learn more about it?
Ateara L. Garrison Esq.
January 5, 2026 09:46 pm
I agree with you that AI training should not be treated as infringement and that existing copyright principles are sufficient to discern when use can equal infringement. Where the industry still lacks clarity is not doctrine, but implementation. I created the Compulsory License Modernization Act (CLMA) framework which bridges that gap by allowing AI systems to learn freely while ensuring that monetized hybrid outputs are attributed and compensated through existing copyright and distribution channels without training surveillance or new exclusive rights.