In a significant ruling poised to shape the future of artificial intelligence and copyright law, a US federal judge has delivered a mixed verdict in a lawsuit against AI firm Anthropic. The decision offers a partial victory for AI developers by upholding the “fair use” defense for using copyrighted material to train models, but simultaneously mandates a trial over the alleged use of pirated copies.
The case, Bartz v. Anthropic, was brought by three authors, including best-selling mystery writer Andrea Bartz. Alongside non-fiction authors Charles Graeber and Kirk Wallace Johnson, they accused Anthropic of illegally using their books to train its AI chatbot, Claude, and build its multi-billion dollar business.
AI Training: Fair Use Found
US District Judge William Alsup ruled that Anthropic’s use of the authors’ books for the specific purpose of training its AI models was “exceedingly transformative.” This finding is crucial, as transformative use is a key factor supporting the doctrine of fair use under US copyright law, which permits limited use of copyrighted material without permission.
Judge Alsup reasoned that Anthropic’s Large Language Models (LLMs), like aspiring writers, learn from existing works not simply to replicate or replace them, but to process the information and create something “new and different.” He stated that, to the extent the training process required making copies within the LLM, those copies served a transformative purpose. A key point in this determination was the judge’s observation that the authors did not claim the AI generated “infringing knockoffs” or replicas of their original works.
This aspect of the ruling is viewed by many analysts as a landmark judgment and a notable win for US AI companies. It suggests that the process by which AI models synthesize information from vast bodies of text to generate new content can be legally analogous to how humans learn from reading. It is reportedly the first instance in which a federal judge has sided with tech companies over individual creators on this specific question of AI training data.
Pirated Copies: The Trial Continues
However, Judge Alsup drew a critical distinction between using works for training and the method Anthropic used to acquire and store the training material. Despite the fair use finding for the training process itself, the judge rejected Anthropic’s request to dismiss the entire case.
He ruled that Anthropic must still face trial regarding its alleged use of pirated copies of the authors’ books to build its internal library of training material. The judge noted evidence suggesting Anthropic holds millions of pirated books – potentially around seven million – in a “central library.” He explicitly stated that Anthropic had “no entitlement to use pirated copies for its central library,” deeming the storage and use of these illicitly obtained works a potential violation of the authors’ rights that does not fall under fair use.
Evidence revealed during the case reportedly showed that Anthropic’s own researchers had previously raised concerns about the legality of using online libraries of pirated books. While the company may have subsequently shifted its approach or legally purchased some works, the judge noted that this would not absolve it of liability for earlier alleged thefts, though it might influence potential damages.
Anthropic, which is backed by major tech companies like Amazon and Google’s parent company Alphabet, could face substantial damages, potentially up to $150,000 per copyrighted work found to have been infringed through the use of pirated copies.
Broader Implications for the AI Industry
This split decision holds significant implications and is likely to influence dozens of similar copyright lawsuits filed against other prominent AI companies, including OpenAI, Meta Platforms, and Perplexity AI. Since generative AI became widespread in 2023, creators across various fields – including authors, media companies (like the BBC), music labels, and visual artists (leading to lawsuits such as Disney and Universal vs. Midjourney) – have initiated numerous legal challenges and called for regulations against using copyrighted works for AI training without permission.
The ruling highlights the complex challenge of applying existing copyright law to novel AI technologies and reflects the fierce debate about whether AI will truly enhance creativity or primarily lead to cheap imitations.
In response to the growing legal pressure, some AI companies have begun proactively pursuing licensing deals with content creators and publishers to secure data legally. That trend may accelerate following this ruling’s distinction between fair use in training and infringement through illegal acquisition.
Following the decision, Anthropic stated it was pleased with the judge’s recognition that its use of the works for training was transformative, asserting it aligns with copyright’s goal of fostering creativity. However, the company expressed disagreement with the decision to proceed to trial over how some of the books were obtained and used in its library, stating confidence in its case. A lawyer for the authors declined to comment.
The trial concerning the pirated copies is expected to proceed in December, further clarifying the legal boundaries for AI developers acquiring and using data.