Judge Rules Meta AI Training on Books is ‘Fair Use’ in Copyright Win


Meta Platforms has secured a significant legal victory in a closely watched copyright lawsuit concerning its AI training data. U.S. District Judge Vince Chhabria ruled in favor of Meta in Kadrey v. Meta, a case brought by 13 authors, including comedian Sarah Silverman. The authors alleged Meta illegally used their copyrighted books to train its Llama AI models without permission.

Judge Chhabria granted summary judgment on Wednesday, June 25, 2025, resolving the claims without a jury trial, and found that Meta’s use of the books for training, in this specific instance, qualified as “fair use” under U.S. copyright law. The decision is seen as a major win for Meta and the broader AI industry, bolstering their position that using publicly available (even if pirated) data to train generative AI models can be lawful under certain conditions.

Understanding the Judge’s Decision

Judge Chhabria’s ruling hinged primarily on two aspects of the “fair use” doctrine as applied to this case:

  1. Transformative Use: The judge found Meta’s use of the books to be “highly transformative.” This means the AI models created were fundamentally different from the original copyrighted works; they didn’t simply reproduce the books but used them to learn language patterns, structure, and information to generate new outputs. The AI’s output was not a substitute for reading the original books.
  2. Lack of Market Harm: Crucially, the authors failed to convince the judge that Meta’s training process harmed the market for their original books or their ability to license their work. Judge Chhabria stated the plaintiffs “presented no meaningful evidence on market dilution at all.” External reports noted the plaintiffs’ market harm argument was introduced late in the proceedings and lacked sufficient support, failing to withstand scrutiny.
The judge determined that while AI training could potentially harm markets, particularly for lesser-known authors, the plaintiffs in this case did not provide the evidence needed to show that such harm resulted directly from Meta’s models being trained on their works.

Not a Blanket Win: Key Caveats from the Bench

Despite ruling for Meta, Judge Chhabria issued significant caveats, making clear that the decision is narrowly focused on the specific arguments and evidence presented in this case and does not provide a sweeping legal endorsement for all AI training on copyrighted material.

The judge explicitly stated, “This ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful.” He attributed the outcome in large part to the plaintiffs’ litigation strategy, noting they “made the wrong arguments and failed to develop a record in support of the right one.”

Judge Chhabria strongly suggested that future lawsuits with stronger evidence, particularly concerning the market effects of AI training, could yield different results. He commented that in cases “with better-developed records on the market effects of the defendant’s use,” plaintiffs “will often win.” His opinion even contained language suggesting that AI companies like Meta might be “serial copyright infringers” based on their training methods, pushing back against the industry narrative that free use of copyrighted data is essential for innovation. He argued that if such data is crucial for products generating billions or even trillions of dollars, companies “will figure out a way to compensate copyright holders.”

Part of a Larger Legal Battle

This decision follows closely on the heels of a similar ruling from the same federal court, the Northern District of California, that favored AI company Anthropic in a separate copyright lawsuit involving book training data. Like the Meta ruling, the Anthropic decision found that training an AI model on copyrighted books could qualify as “fair use” because of its transformative nature. However, the Anthropic case is proceeding to trial on the separate issue of whether obtaining those books from pirate websites was unlawful.

The Meta and Anthropic rulings come amid a wave of copyright litigation against AI developers. Other notable lawsuits include The New York Times suing OpenAI and Microsoft for allegedly training models on its news articles, and Disney and Universal Studios suing Midjourney over the use of films and TV shows.

Judge Chhabria noted in his decision that fair use defenses are highly dependent on the specific details of each case and the nature of the content. He suggested that markets for certain types of works, like news articles, “might be even more vulnerable to indirect competition from AI outputs,” potentially making fair use arguments more challenging for AI companies in those contexts.

It’s also worth noting that Kadrey v. Meta was brought by only 13 named authors and was not a class action lawsuit. Countless other authors whose works may have been used to train Meta’s models are not bound by this ruling and remain free to file their own separate lawsuits. And while the judge resolved the core training claims in Meta’s favor, a separate, narrower claim concerning Meta’s alleged re-uploading of pirated books while torrenting them remains unresolved and will proceed separately.

The Contentious Source of Data: ‘Shadow Libraries’

A key point of contention in the lawsuit was Meta’s admitted practice of obtaining copyrighted books from “online repositories of pirated works,” often referred to as “shadow libraries,” such as LibGen. The plaintiff authors, including Jacqueline Woodson, Richard Kadrey, Andrew Sean Greer, Rachel Louise Snyder, David Henry Hwang, Ta-Nehisi Coates, Laura Lippman, Matthew Klam, Junot Díaz, Sarah Silverman, Lysa TerKeurst, Christopher Golden, and Christopher Farnsworth, argued that Meta committed “massive copyright infringement” by sourcing data this way. They asserted the company knew the risks, with internal discussions on the matter reportedly escalating to CEO Mark Zuckerberg, and contended Meta “could and should have paid” to license these works.

Meta countered that the source of the books had “no bearing on the nature and purpose of its use” for training, arguing that the transformative nature of AI training made the acquisition method irrelevant to the fair use analysis. The acquisition method did not change the fair use outcome on the training claim in this case (unlike in the Anthropic case, where the sourcing issue is headed to trial), but the use of pirated sources remains a heavily debated aspect of AI training practices. The U.S. Copyright Office has also issued a report endorsing a “market dilution” theory that courts have not yet widely adopted, highlighting the disconnect between regulatory bodies and current court rulings.

In summary, the ruling in Kadrey v. Meta marks a significant, though carefully limited, legal victory for Meta and the AI industry regarding the use of copyrighted books for model training. While finding “fair use” in this instance based on transformative use and the plaintiffs’ failure to demonstrate market harm, Judge Chhabria’s explicit caveats underscore that the legality of training AI on copyrighted material remains highly fact-dependent and far from settled. The judge’s comments effectively serve as an invitation for future lawsuits from copyright holders who can present more compelling arguments and evidence of market impact, particularly as litigation involving different types of creative works continues to shape the evolving legal landscape of AI and copyright.
