Anthropic accuses DeepSeek, other Chinese AI developers of 'industrial-scale' copying — Claims 'distillation' included 24,000 fraudulent accounts and 16 million exchanges to train smaller models
Anthropic on Monday accused three leading Chinese developers of frontier AI models of using large-scale distillation of its Claude models to improve their own. In total, DeepSeek, Moonshot, and MiniMax made 16 million exchanges through 24,000 fraudulent accounts, the company says.
Distillation is a machine learning technique in which a smaller or less capable model is trained on the outputs of a stronger model rather than on original training data. It can save time, produce cheaper and more specialized models, extract capabilities from competitors, and lower hardware requirements. While distillation is generally a legitimate technique, when performed by a China-based entity subject to heavy restrictions, it violates both U.S. export controls and Anthropic's end-user license agreement.
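To make the idea concrete, here is a minimal toy sketch of distillation: a "student" model is fit purely to the soft outputs of a fixed "teacher," with no ground-truth labels involved. The teacher function, model sizes, and learning setup here are illustrative inventions, not a description of how any real lab trains or extracts frontier models.

```python
import math
import random

# Hypothetical "teacher": a fixed logistic model whose outputs we can query,
# standing in for the stronger model being distilled. Its true parameters
# are w=2.0, b=-1.0.
def teacher(x):
    return 1.0 / (1.0 + math.exp(-(2.0 * x - 1.0)))

# Student: a logistic model of the same form, trained only on the teacher's
# soft outputs (probabilities), never on real labels.
random.seed(0)
data = [random.uniform(-3.0, 3.0) for _ in range(200)]
w, b = 0.0, 0.0
lr = 0.5

for _ in range(2000):
    for x in data:
        t = teacher(x)                              # soft label from teacher
        s = 1.0 / (1.0 + math.exp(-(w * x + b)))    # student prediction
        # Gradient of cross-entropy between teacher and student outputs
        grad = s - t
        w -= lr * grad * x / len(data)
        b -= lr * grad / len(data)

# After training, (w, b) approach the teacher's (2.0, -1.0): the student has
# recovered the teacher's behavior purely from its outputs.
```

Because the student only ever sees the teacher's responses, the same mechanics work whether the "teacher" is a local function, as here, or a commercial API queried at scale.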
"Distillation can be legitimate: AI labs use it to create smaller, cheaper models for their customers," a statement by Anthropic published on X reads. "But foreign labs that illicitly distill American models can remove safeguards, feeding model capabilities into their own military, intelligence, and surveillance systems."
American companies like OpenAI have long accused DeepSeek of using distillation to train some of its frontier models on the outputs of ChatGPT and other services, but unlike Anthropic, they have not presented a detailed explanation.
How Chinese companies use distillation from American AI models
According to Anthropic, the perpetrators followed the same pattern: they used commercial services that resell access to frontier models and built what the company calls 'hydra cluster' networks — large pools of accounts that spread traffic across Anthropic's API and third-party clouds.
In one case, a single proxy setup allegedly controlled more than 20,000 fraudulent accounts at once. To avoid raising flags, it mixed extraction traffic with ordinary use requests. However, its prompt patterns stood out: very high volumes, tightly focused on specific capabilities, and highly repetitive. Such behavior was consistent with model training, but certainly not typical end-user interaction.
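The traffic signature described above (very high volume, narrow focus, heavy repetition) lends itself to simple heuristics. The sketch below flags accounts matching that pattern; the thresholds and scoring are illustrative inventions, not Anthropic's actual detection system.

```python
from collections import Counter

# Toy heuristic: flag an account whose prompt log is high-volume and highly
# repetitive (few distinct prompts relative to total traffic). Both threshold
# values are made up for illustration.
def looks_like_extraction(prompts, min_volume=1000, max_distinct_ratio=0.2):
    if len(prompts) < min_volume:
        return False  # ordinary end users rarely send this much traffic
    distinct_ratio = len(Counter(prompts)) / len(prompts)  # low => repetitive
    return distinct_ratio <= max_distinct_ratio

# A burst of near-identical reasoning prompts trips the heuristic...
bulk = ["Explain step by step: problem %d" % (i % 50) for i in range(5000)]
# ...while varied, low-volume traffic does not.
casual = ["question %d" % i for i in range(30)]
```

Real systems would combine many such signals (topical focus, timing, cross-account coordination) rather than a single repetition ratio, but the basic idea is the same: extraction traffic is statistically unlike end-user traffic.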
DeepSeek alone generated over 150,000 exchanges that targeted reasoning tasks, rubric-based grading suitable for reinforcement learning reward models, and censorship-safe rewrites of politically sensitive queries, according to Anthropic. Anthropic also observed prompts designed to produce step-by-step internal reasoning and therefore reveal chain-of-thought training data.
Moonshot, known for its Kimi models, accounted for more than 3.4 million exchanges, according to Anthropic. Its focus areas included agentic reasoning, tool use, coding, data analysis, computer-use agents, and computer vision. Moonshot allegedly used hundreds of fraudulent accounts spanning multiple access pathways and later tried to extract and reconstruct Claude's reasoning traces.
MiniMax conducted the largest campaign, with over 13 million exchanges targeting agentic coding and orchestration. Anthropic says it detected the operation while it was still ongoing, as MiniMax was training a model slated for future release, which gave the American company a unique view of an extraction campaign's full lifecycle. After Anthropic introduced a new Claude model, MiniMax allegedly redirected nearly half of its traffic within 24 hours to capture capabilities from the latest model.
Anthropic's response
To fight future distillation attempts, Anthropic says it is strengthening defenses to make large-scale distillation harder to carry out and easier to detect. The company has deployed classifiers and behavioral fingerprinting systems to identify extraction patterns in API traffic, including chain-of-thought elicitation and coordinated multi-account activity.

Anthropic is also sharing technical indicators of large-scale distillation operations with other AI labs, cloud providers, and authorities, and tightening verification for the educational, research, and startup accounts often used to create fraudulent access. In parallel, it is developing product-, API-, and model-level safeguards to reduce the usefulness of its outputs for illicit training without harming legitimate users. At the same time, the company admits that countering attacks at this scale requires coordinated industry and policy action.

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.