A federal judge has issued a significant ruling in the complex legal landscape surrounding artificial intelligence and copyright, determining that training AI models on copyrighted books can constitute “fair use” under U.S. law. This decision, hailed as a landmark moment, offers some clarity to AI companies facing numerous lawsuits over the data used to build their powerful models.
In the U.S. District Court for the Northern District of California, Judge William Alsup presided over a case brought by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson against AI company Anthropic, known for its AI assistant, Claude. The authors alleged Anthropic used millions of copyrighted books without permission to train its large language models (LLMs).
AI Training: A “Spectacularly Transformative” Fair Use?
Judge Alsup’s ruling centered on the nature of the AI training process itself. He found that using legally obtained copyrighted books to train LLMs is, in his words, “quintessentially” and “exceedingly” transformative: the AI does not merely copy the works in order to reproduce them, but uses them to learn patterns, language structures, and information in order to create something new and different.
Judge Alsup likened the process to a human writer learning their craft by reading extensively – the goal is to enable new creation, not replicate the source material. He stated that Anthropic’s models were trained “not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.”
Applying the fair use factors set out in Section 107 of the Copyright Act (including the purpose and character of the use and the effect on the market for the work), the judge determined that using lawfully acquired books for this transformative training purpose did not harm the copyright holders’ primary market for the original books. He reasoned that copying entire works was “especially reasonable” for training because the models do not make those copies available to the public and the use does not displace demand for the originals.
This finding represents a notable victory for AI developers, suggesting that the act of training on legitimately sourced copyrighted material, even entire works, may be permissible.
The Critical Caveat: Piracy is Not Excused
However, the ruling delivered a crucial distinction that is less favorable to Anthropic. While training on lawfully acquired material might be fair use, obtaining that material illegally is not.
Judge Alsup found that Anthropic still faces potential liability for allegedly using pirated versions of copyrighted books to build its central training library. Court documents reportedly indicated internal concerns among Anthropic employees about the legality of sourcing books from pirate sites.
The judge explicitly ruled that downloading copyrighted works from pirate sites, when they could have been purchased or otherwise accessed lawfully, was unlikely to be reasonable or justified, regardless of any subsequent fair use in training. Buying a copy of a book after first stealing it, he stated, does not absolve the company of liability for the original theft.
Therefore, while the fair use argument succeeded for the training process using lawfully obtained copies, the case will proceed to trial specifically on the allegations related to Anthropic’s use and storage of pirated books.
What This Landmark Decision Means
Fair Use for Training (with lawful data): The ruling provides some initial legal backing for the AI industry’s argument that training models on copyrighted data is fair use, particularly due to the highly transformative nature of LLMs.
Strict Stance Against Piracy: It sends a clear message that obtaining copyrighted material through illegal means, like piracy, is not excused by the subsequent use of that material for AI training.
Precedent Set: As one of the first definitive judicial answers on AI training and copyright, this decision is precedent-setting and will likely influence the dozens of other ongoing lawsuits against AI companies.
Uncertainty Remains: Despite this clarity on the distinction between training and acquisition, the legal landscape is still evolving. Appeals are likely, and other cases (such as the New York Times lawsuit against OpenAI and Microsoft) involve different claims, including allegations that AI outputs compete with or mimic the original content. The upcoming trial over Anthropic’s use of pirated books will also be closely watched.
Anthropic, a major player in the AI space valued at $61.5 billion and generating over $1 billion in annual revenue primarily through its Claude model, welcomed the ruling on the transformative nature of AI training. However, the pending trial regarding the pirated materials highlights the ongoing challenges AI companies face in navigating copyright law in the digital age.