Anthropic to pay $1.5B over pirated books in Claude AI training — payouts of roughly $3,000 per infringed work

(Image: ChatGPT, Claude, and Perplexity logos on an iPhone. Credit: Getty / Robert Way)

Anthropic, the company behind Claude AI, has agreed to pay at least $1.5 billion to settle a class-action lawsuit brought by authors over the use of pirated books in training its large language models. The proposed settlement, filed September 5, comes after months of litigation that could change how AI companies acquire and manage data for model training.

The class action, led by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, accused Anthropic of downloading hundreds of thousands of copyrighted books from pirate sources such as Library Genesis, Pirate Library Mirror, and the Books3 dataset. The plaintiffs claimed those downloads formed the core of Claude’s underlying training data.

This marks the largest publicly disclosed AI copyright settlement to date. OpenAI has also reached agreements with publishers in separate matters, though the specific details of those deals remain confidential. While Anthropic’s settlement is not an admission of wrongdoing, the sheer scale of the payout sets a new benchmark for data liability in generative AI development.

It’s worth noting that this case doesn’t challenge the broader legality of training AI on public or lawfully obtained content; that separate question is still working its way through the courts. It does, however, highlight the legal risk and potential financial cost of using pirated material, even when the intent is research and the content is later purchased.

As Judge Alsup put it in his June ruling, “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft,” adding that it might affect the extent of statutory damages owed to rights holders.

If models trained on pirated data face lawsuits or court-ordered retraining, developers may need to start over with clean, licensed datasets. That means redoing training runs that already consumed millions of GPU hours, giving a huge boost to compute demand. Nvidia’s H100 and upcoming Blackwell GPUs, AMD’s MI300X accelerators, and HBM3e memory suppliers could all benefit as courts force labs to scramble to retrain and revalidate their models.

Luke James
Contributor

Luke James is a freelance writer and journalist. Although his background is in law, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.