Anthropic to pay $1.5B over pirated books in Claude AI training — payouts of roughly $3,000 per infringed work
A landmark settlement highlights the legal risks of using copyrighted materials without permission.

Anthropic, the company behind Claude AI, has agreed to pay at least $1.5 billion to settle a class-action lawsuit brought by authors over the use of pirated books in training its large language models. The proposed settlement, filed September 5, follows months of litigation and could change how AI companies acquire and manage training data.
The class action, led by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, accused Anthropic of downloading hundreds of thousands of copyrighted books from pirate sources such as Library Genesis, Pirate Library Mirror, and Books3. The plaintiffs alleged that these downloads formed part of the library Anthropic used to build Claude's training dataset.
In June, a federal judge allowed the case to proceed on the narrow issue of unauthorized digital copying, setting the stage for a trial in December. Instead, Anthropic has agreed to a non-reversionary settlement fund starting at $1.5 billion, with payouts of approximately $3,000 per infringed work, which works out to roughly 500,000 covered titles. The fund may grow as more titles are identified.
Anthropic will also be required to delete the pirated files it downloaded. However, there is no indication that the agreement will force the company to delete or retrain its models, a remedy known as model disgorgement.
This marks the largest publicly disclosed AI copyright settlement to date. OpenAI has also settled with publishers in separate matters, though the details of those deals remain confidential. And while Anthropic's settlement is not an admission of wrongdoing, the sheer scale of the payout sets a new benchmark for data liability in generative AI development.
It's worth noting that this case doesn't challenge the broader legality of training AI on public or lawfully obtained content, a separate question still working its way through the courts. It does, however, highlight the legal risk and potential financial cost of using pirated material, even when the intent is research and the content is later purchased.
As Judge Alsup put it in his June ruling, “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft,” adding that it might affect the extent of statutory damages owed to rights holders.
If models trained on pirated data face lawsuits or forced retraining, developers may need to start over with clean, licensed datasets. That would mean redoing training runs that already consumed millions of GPU hours, giving a significant boost to compute demand. Nvidia's H100 and Blackwell GPUs, AMD's MI300X accelerators, and HBM3e memory suppliers could all benefit as labs scramble to retrain and revalidate their models.
For now, that is just speculation, but it will be worth watching how courts rule on related cases.

Luke James is a freelance writer and journalist. Although his background is in law, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.