Key Takeaways
- Fair use upheld. The court ruled that Anthropic’s use of legally obtained books to train its Claude models qualifies as fair use, calling it “spectacularly” transformative.
- Pirated data remains unresolved. The case over Anthropic’s use of more than 7 million pirated books will go before a jury in December 2025.
- Dual precedent emerging. AI developers now have legal cover for using lawful sources, but the ruling draws a firm line against unauthorized datasets.
- Industry-wide impact. The decision is expected to influence legal strategy, data sourcing practices and licensing negotiations across the generative AI space.
In a landmark decision on June 24, 2025, US District Judge William Alsup ruled that Anthropic’s practice of training its Claude models on legally obtained books qualifies as fair use — a monumental win for AI developers.
The judge described the training as "spectacularly" transformative, comparing Claude's learning process to "a reader aspiring to be a writer." While the ruling clears a major path forward for lawful AI training, it also reserved judgment on Anthropic's use of over 7 million pirated books — setting the stage for a high-stakes December jury trial.
Authors v. AI: How the Lawsuit Against Anthropic Began
The case against Anthropic began in August 2024, when authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson joined a broader wave of lawsuits targeting generative AI firms. Unlike the recent Reddit v. Anthropic dispute over user-generated content, this case centers entirely on published books used in AI model training.
In this instance, the plaintiffs accused Anthropic of using their copyrighted books without permission to train its Claude family of AI models. The legal filing detailed two distinct practices:
- The scanning of legally purchased books
- The more controversial act of downloading over 7 million pirated works from online “shadow libraries,” such as Library Genesis and Z-Library
The lawsuit argued that Anthropic’s training of its models on this vast trove of copyrighted material violated copyright law and devalued authors' works. Anthropic responded by asserting that its use of the content was transformative and thus protected under fair use — a position that would become central to the court’s eventual ruling. Over the following months, the case progressed through motions and discovery, culminating in a partial summary judgment issued in June 2025 by US District Judge William Alsup.
Related Article: Anthropic Accused of Massive Data Theft in Reddit Lawsuit
The Court Declares Fair Use for AI Training
"Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use."
- William Alsup
US District Judge
In the partial summary judgment, Judge Alsup ruled that Anthropic’s use of lawfully acquired books to train its large language models (LLMs) qualifies as fair use under US copyright law. Fair use allows people to use copyrighted material without permission in certain cases — especially when the use is transformative (meaning it adds new meaning, purpose or expression) and doesn't harm the original work's market value.
The court emphasized that Anthropic's Claude models did not reproduce or regurgitate the original texts, but instead created novel outputs based on generalized learnings. Alsup compared the training process to “any reader aspiring to be a writer,” stating that the models were not substituting or competing with the original works, but transforming them into something new.
This decision marks one of the first significant judicial endorsements of fair use in the context of AI training, providing a legal framework that could influence numerous ongoing lawsuits against other AI developers, including OpenAI, Meta and Google. Alsup’s reasoning focused heavily on the “transformative” nature of the use, which he described as “spectacularly so,” and found that the plaintiffs had failed to demonstrate material market harm from the scanned books alone.
Next Stop: Jury Trial Over Shadow Library Data
While Anthropic prevailed on the fair use argument for lawfully obtained books, the court declined to extend that protection to the company’s use of pirated content. The lawsuit alleges that Anthropic downloaded over 7 million books from illicit shadow libraries — an act Judge Alsup described as "legally and ethically distinct" from scanning purchased works. He ruled that this portion of the case must proceed to trial, stating that a jury should determine whether the use of pirated materials constitutes willful infringement.
The trial, scheduled for December 2025, could carry significant financial consequences. Under US copyright law, Anthropic could face statutory damages of up to $150,000 per infringed work if found liable for willful violation. Legal experts say this portion of the case will test the limits of fair use defenses and could set a precedent around the sourcing of training data, particularly when it involves unauthorized or illegal copies.
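To put that exposure in rough perspective, the arithmetic below multiplies the figures cited in this article. It is a back-of-the-envelope sketch only: it assumes all 7 million works remain at issue, and the $750-per-work statutory floor comes from 17 U.S.C. § 504(c), not from the ruling itself. Actual damages would depend on how many works a jury ultimately finds infringed and where within the statutory range it lands.

```python
# Illustrative statutory damages range (assumptions noted above).
WORKS_AT_ISSUE = 7_000_000   # pirated books alleged in the lawsuit
MIN_PER_WORK = 750           # statutory floor per work, 17 U.S.C. § 504(c)
MAX_PER_WORK = 150_000       # statutory cap per work for willful infringement

low = WORKS_AT_ISSUE * MIN_PER_WORK
high = WORKS_AT_ISSUE * MAX_PER_WORK

print(f"Lower bound: ${low:,}")   # Lower bound: $5,250,000,000
print(f"Upper bound: ${high:,}")  # Upper bound: $1,050,000,000,000
```

Even at the statutory floor, aggregate exposure on those assumptions would run into the billions of dollars.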
What This Decision Means for the AI Industry
Judge Alsup’s ruling is already being viewed as a pivotal moment in the legal landscape surrounding generative AI. For developers of LLMs, the decision offers the strongest judicial endorsement yet that using legally acquired materials for model training may fall under fair use. That validation could reduce some of the legal uncertainty plaguing the industry and provide a partial roadmap for how AI firms can approach data sourcing without explicit licensing.
Still, the unresolved piracy issue serves as a cautionary boundary. Companies that scrape or acquire content from illegal sources may find themselves exposed to litigation, even if their downstream use is algorithmic or transformative. The split decision — fair use for legal copies, potential infringement for pirated ones — highlights the importance of data provenance in model training. As AI developers refine their pipelines, this case signals a growing expectation for documented sourcing and ethical handling of copyrighted materials.
For rights holders and advocacy groups, the ruling may feel like a setback. The court's embrace of transformative use could limit future copyright claims, especially if plaintiffs cannot show direct market harm or exact duplication. But the door remains open for more aggressive legal action when training data is drawn from unauthorized sources, as the pending jury trial will further explore.
How Anthropic — and Everyone Else — Is Reacting
"In some instances AI companies should be happy with the decision and in other instances copyright owners should be happy."
- Keith Kupferschmid
CEO, Copyright Alliance
Anthropic welcomed the court’s recognition that its AI model development was fundamentally transformative. A company spokesperson, quoted in NBC News’ coverage of the ruling, stated, “We’re pleased that the Court recognized that using works to train [language models] was transformative — spectacularly so.” The company expressed satisfaction with the court’s framing of its training practices as legally permissible and creatively distinct from the original works.
The ruling drew swift reactions from both sides of the generative AI debate. For Anthropic, the decision marked a legal and reputational win in a growing wave of copyright litigation targeting AI companies. Meanwhile, copyright advocates offered a more measured response, pointing to unresolved concerns around unauthorized datasets and author compensation.
Keith Kupferschmid, CEO of the Copyright Alliance, acknowledged that while AI developers had reason to celebrate, rights holders also earned a partial victory in the court’s rejection of fair use for pirated content. He pointed to the decision’s dual nature — upholding fair use for legally obtained data while denying that protection for pirated books — as the reason both sides could claim partial success.
“In some instances AI companies should be happy with the decision and in other instances copyright owners should be happy,” he said, noting that while developers gained legal cover for using lawfully sourced materials, authors preserved a pathway for enforcement through the upcoming jury trial.
Other legal experts agreed that the court’s endorsement of transformative use sets an important precedent. Chris Mammen, managing partner at Womble Bond Dickinson and a specialist in IP law, remarked that the ruling underscored the broader stakes at play. “The technology at issue was among the most transformative many of us will see in our lifetimes,” he said, adding that the court correctly focused on the creative distance between training inputs and AI outputs — even if the models internalize large volumes of content.
The Anthropic decision also comes amid a flurry of parallel legal activity. Just one day later, a federal judge dismissed a similar lawsuit filed by 13 authors against Meta’s Llama model, stating the plaintiffs had failed to make viable claims under current copyright law. And on the same day as the Anthropic ruling, a group of authors — including "American Prometheus" co-author Kai Bird and New Yorker writer Jia Tolentino — filed suit against Microsoft, alleging the use of more than 180,000 pirated books to train its Megatron AI model.
Together, these developments show a legal environment in flux — where some AI developers are gaining breathing room under fair use, while others face growing scrutiny over how their training data is sourced.
Related Article: Are AI Models Running Out of Training Data?
Countdown to the December Jury Trial
With the court’s partial summary judgment now issued, attention turns to the December 2025 jury trial that will determine Anthropic’s liability for using pirated books. At stake are potentially billions of dollars in statutory damages, depending on how many works are at issue and whether the jury finds willful infringement. Legal experts say this phase of the case could set a precedent around the use of unauthorized content in AI training pipelines, particularly for startups and open-source model developers operating without licensing agreements.
More broadly, the ruling is likely to influence the strategies of other generative AI companies currently facing litigation, including OpenAI, Meta, Google and Stability AI. If upheld, the court’s stance on fair use could encourage developers to build datasets strictly from lawfully acquired sources while avoiding material of questionable origin. It may also accelerate efforts to formalize licensing deals between AI developers and publishers, authors and other rights holders to minimize legal exposure.
For now, the industry sits at a crossroads. The fair use ruling provides a degree of legal clarity, but the upcoming trial will determine just how costly it can be to get that line wrong.