A significant legal decision is poised to shape the future of artificial intelligence development. A US judge has ruled that using copyrighted books to train AI models can qualify as “fair use” under US law. In a crucial distinction, however, the same ruling found that the source of the training data matters significantly, leaving AI firms exposed to liability for using pirated materials.
The ruling stems from a high-profile copyright lawsuit filed last year against AI firm Anthropic by three authors: best-selling mystery writer Andrea Bartz, non-fiction author Charles Graeber, and author Kirk Wallace Johnson. They accused Anthropic of using their books without permission to train its Claude large language model (LLM) and of building a multi-billion-dollar business on the back of that work.
Transformative Use in AI Training Ruled Fair Use
In his decision, US District Judge William Alsup of San Francisco determined that Anthropic’s use of the authors’ books for training its AI was “exceedingly transformative.” This finding is key because transformative use is a core element of the fair use defense under Section 107 of the US Copyright Act.
Judge Alsup reasoned that Anthropic’s LLMs trained on these works “not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.” He compared the AI’s training process to that of a human reader learning to write, arguing that the AI extracts uncopyrightable information to build revolutionary technology that promotes creativity. If making copies within the LLM was reasonably required for this transformative training, the judge found those copies were part of a legitimate fair use.
Notably, the authors did not claim that Anthropic’s Claude generated “infringing knockoffs” or direct replicas of their specific works for users. Judge Alsup indicated that such a claim would present a fundamentally different legal scenario.
The ruling also offered a perspective on market impact, one of the factors in fair use analysis. Judge Alsup suggested that competition resulting from a transformative new work is not the concern of copyright law, whose purpose is to encourage the creation of new works, not to protect authors against competition from those new works. He even likened the authors’ concern to fearing that teaching schoolchildren to write well might lead to an increase in competing books. Anthropic also highlighted the guardrails it has incorporated into Claude to prevent direct plagiarism.
The Catch: Pirated Source Data Faces Trial
Despite backing Anthropic on the transformative nature of AI training, Judge Alsup rejected the company’s request to dismiss the entire case. This pivotal decision hinges on the origin of the training material.
Judge Alsup ruled that Anthropic must still face trial over its use of pirated copies to build its training library. According to the judge, Anthropic holds more than seven million pirated books in a “central library.” He found that by saving and maintaining these illegal copies as part of this database, Anthropic violated the authors’ rights, regardless of the transformative nature of the subsequent training process.
The judge expressed skepticism that downloading from pirate sites could ever be considered reasonably necessary for a subsequent fair use. This means that while training itself might be fair use, the acquisition and storage of illegal training data is a distinct issue that could lead to liability.
Anthropic, a major AI firm backed by tech giants including Amazon and Google’s parent company Alphabet, could face substantial damages, potentially up to $150,000 per copyrighted work found to have been improperly sourced and stored in its library of pirated materials. A separate trial has been ordered specifically to address this aspect of the case.
Broader Implications and Ongoing Legal Battles
This ruling is among the first judicial decisions in the US to directly address the complex question of how LLMs can legitimately learn from existing copyrighted material. It sets a potential precedent, distinguishing between the act of training (potentially fair use) and the legality of the data’s source (a separate, critical issue).
The case is part of a growing wave of legal challenges across the AI industry concerning the use of various forms of content for AI training data. Similar battles are unfolding over:
Image Generation: Disney and Universal recently filed a lawsuit against AI image generator Midjourney, alleging it is a “bottomless pit of plagiarism” that copies characters like Darth Vader, Elsa, and the Minions. The studios argue that generating images that essentially replicate their copyrighted characters infringes their rights, with Disney’s legal chief stating that “piracy is piracy,” whether or not AI is involved.
News Content: The BBC is reportedly threatening legal action against AI firm Perplexity, claiming its chatbot reproduces BBC content “verbatim” and ignores technical measures such as robots.txt (a short illustration of that mechanism follows below). The BBC cites concerns about copyright infringement, damage to its reputation from inaccurate AI summaries, and the threat to the publishing industry.
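For context, robots.txt is a plain-text file published at a site’s root that tells crawlers which paths they may fetch; complying with it is voluntary, which is why ignoring it features in the complaint. The minimal sketch below, using Python’s standard library, shows how a well-behaved crawler would consult that file before fetching a page. The URL and the “ExampleAIBot” user-agent name are illustrative placeholders, not references to any real crawler.

```python
# Minimal sketch of a crawler checking a site's robots.txt before fetching
# a page. The URL and user-agent name are illustrative assumptions.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://www.example.com/robots.txt")
robots.read()  # download and parse the site's crawl policy

page = "https://www.example.com/news/some-article"
if robots.can_fetch("ExampleAIBot", page):
    print("robots.txt permits fetching this page")
else:
    print("robots.txt disallows fetching this page")
```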
In response to these legal pressures, some AI companies are reportedly exploring, and in some cases striking, licensing deals with content creators and publishers to obtain permission to use their material for training.
The Path Ahead
In a statement, Anthropic expressed satisfaction with the judge’s recognition of its transformative use of the works but disagreed with the decision to hold a trial regarding the acquisition and use of some books. The company remains confident in its case and is evaluating its options. A lawyer representing the authors declined to comment on the ruling.
The legal landscape surrounding AI and copyright remains complex and rapidly evolving. While this ruling offers some clarity on the transformative training aspect, particularly regarding text data like books, the critical issue of illegal data acquisition and storage introduces significant new questions and potential liabilities. The possibility of appeals means a definitive resolution on AI and copyright in the US is likely still years away.