Microsoft and OpenAI investigate whether DeepSeek illicitly obtained data from ChatGPT
Intellectual property theft?

Microsoft and OpenAI are probing whether a group linked to the Chinese AI startup DeepSeek used OpenAI's application programming interface (API) to access the company's data without authorization, Bloomberg reports, citing sources familiar with the matter. A source at OpenAI told the Financial Times that the company had evidence of data theft by the group. Meanwhile, U.S. officials suspect DeepSeek trained its model on OpenAI's outputs, a method known as distillation.
Microsoft's security team observed a group believed to have ties to DeepSeek extracting a large volume of data from OpenAI's API. The API lets developers integrate OpenAI's proprietary models into their applications for a fee and retrieve limited amounts of data. The excessive data retrieval Microsoft's researchers noticed, however, violates OpenAI's terms and conditions and signals an attempt to bypass OpenAI's restrictions.
The probe comes after DeepSeek launched its R1 AI model. The company claims R1 matches or exceeds leading models in areas like reasoning, math, and general knowledge while consuming considerably fewer resources. Following DeepSeek's announcement, Alphabet, Microsoft, Nvidia, and Oracle lost a combined total of nearly $1 trillion in market value, as investors worried that DeepSeek's advances could threaten the dominance of U.S. firms in the AI sector. However, if it turns out that DeepSeek used data illicitly obtained from others, that would help explain how the company achieved its results without investing billions of dollars.
David Sacks, the U.S. government's AI advisor, stated there was strong evidence that DeepSeek used OpenAI-generated content to train its model through distillation, a technique that allows one AI system to learn from another by analyzing its outputs. Sacks did not, however, provide specific details on the evidence.
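To illustrate the general idea, here is a toy Python sketch of distillation. It is our own illustration, not DeepSeek's or OpenAI's code: a small "student" classifier is trained only on a "teacher" model's output probabilities and never sees the teacher's internals. The teacher weights (TEACHER_W), the query_teacher() stand-in for an API call, the helper names, and all hyperparameters are hypothetical. Distilling a large language model would instead involve training on generated text at vastly greater scale.

```python
import math
import random

# Hypothetical "teacher": a fixed linear scorer over 3 classes for 2-D inputs.
# In a real setting this would be a large model reachable only through an API.
TEACHER_W = [[2.0, -1.0], [-1.5, 2.5], [0.5, 0.5]]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(w, x):
    """Score each class as the dot product of its weight row with the input."""
    return [row[0] * x[0] + row[1] * x[1] for row in w]


def query_teacher(x, temperature):
    """Hypothetical stand-in for an API call: returns only output probabilities."""
    return softmax(linear_logits(TEACHER_W, x), temperature)


def make_inputs(n, seed=0):
    """Generate n random 2-D inputs in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


def distill(inputs, temperature=2.0, lr=0.5, epochs=200):
    """Train a student from zero weights to match the teacher's soft outputs.

    The student never reads TEACHER_W; it only sees query_teacher() results.
    """
    targets = [query_teacher(x, temperature) for x in inputs]
    w = [[0.0, 0.0] for _ in range(3)]
    for _ in range(epochs):
        for x, p_teacher in zip(inputs, targets):
            p_student = softmax(linear_logits(w, x), temperature)
            for k in range(3):
                # Gradient of cross-entropy(p_teacher, softmax(z / T)) w.r.t. z_k
                # is (p_student[k] - p_teacher[k]) / T.
                g = (p_student[k] - p_teacher[k]) / temperature
                w[k][0] -= lr * g * x[0]
                w[k][1] -= lr * g * x[1]
    return w


def agreement(w_student, inputs):
    """Fraction of inputs where student and teacher pick the same top class."""
    def argmax(v):
        return max(range(len(v)), key=v.__getitem__)

    same = sum(
        argmax(linear_logits(w_student, x)) == argmax(linear_logits(TEACHER_W, x))
        for x in inputs
    )
    return same / len(inputs)
```

For example, student = distill(make_inputs(100)) followed by agreement(student, make_inputs(200, seed=1)) should show the student choosing the same answer as the teacher on nearly all fresh inputs, even though it only ever saw the teacher's outputs. The temperature softens those outputs so the student learns the teacher's relative preferences, not just its top answer.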
Neither OpenAI nor Microsoft provided an official statement on the investigation. DeepSeek and High-Flyer, the hedge fund that helped launch the company, did not respond to Bloomberg's requests for comment. However, in a statement published by Bloomberg and the Financial Times, OpenAI said that China-based companies, among others, are constantly trying to distill models from U.S. AI companies, and that it uses countermeasures to protect its models.
"We know PRC based companies — and others — are constantly trying to distill the models of leading US AI companies," a statement by OpenAI reads. "As the leading builder of AI, we engage in countermeasures to protect our IP, including a careful process for which frontier capabilities to include in released models, and believe as we go forward that it is critically important that we are working closely with the U.S. government to best protect the most capable models from efforts by adversaries and competitors to take U.S. technology."
Anton Shilov is a contributing writer at Tom's Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
oofdragon Trying to forge criminal evidence to make some kind of move against it? They didn't sleep well after losing billions overnight lol
das_stig If they did use distillation without permission, it's straightforward IP/data theft and Trump's ban hammer will fall on DeepSeek and High-Flyer, although even if it's only used in China and the other axis-of-evil rogue states, it will be a very useful tool.
jeffy9987 OpenAI had a whistleblower who said they stole from pretty much everyone, and who just so happened to decide to kill himself. I doubt Microsoft hasn't stolen as well, looking at how much data they siphon from Windows.
GenericUsername109 That's rich. How about auditing OpenAI's own training data for proper consent and licensing first? :D
pug_s OpenAI wants to investigate "illicitly obtained data" when they are not even open? Oh, the irony.
There's an old lawyer's saying: When you can't innovate, litigate.
https://www.thetimes.com/business-money/companies/article/microsoft-ceo-copyright-laws-information-ai-pc9hh2c0p#
"The boss of Microsoft has called for a rethink of copyright laws so that tech giants are able to train artificial intelligence models without risk of infringing intellectual property rights."
Apparently obtaining data from others is okay, but someone obtaining data from them is not.
hotaru251 openai: "they might have illegally stolen content of ours"
People: "but didn't you yourself steal as much content as possible to make your product? so what's the difference?"
openai: "...but that's different"
Honestly, it might be a hilarious lawsuit if it happened, as they risk shafting the industry as a whole, and I would love nothing more than to see big corpo get shafted.
SkyBill40 Would anyone honestly be surprised if it were found that DeepSeek did indeed steal IP? And even if it had, what difference is it going to make? There have been loads of IP theft lawsuits filed in the US against Chinese companies, and none of them go anywhere because... well... it's China, and their government doesn't care. For the record, I'm in no way ignoring that companies in the US have done the same thing, but the consequences for getting caught can be significant.
It'll be interesting to see how this all plays out and whether it ends up looking like the three-way Spider-Man pointing meme or not.
ivan_vy
hotaru251 said:
openai: "they might have illegally stolen content of ours"
People: "but didn't you yourself steal as much content as possible to make your product? so whats the difference?"
openai: "...but thats different"
honestly might be a hilarious lawsuit if it would happen as they risk shafting the industry as a whole and i would love nothing more than big corpo to get shafted.
This. Classic MSFT tactics: FUD, legal slowdown, etc.
It does not matter whether it's true, as long as it seeds doubt and stops the progress of rivals.
Quirkz
das_stig said:
If they did use distillation without permission, straight forward IP/data theft
You know how OpenAI and Meta got their data in the first place, right?
phead128 OpenAI unironically stole training data from publishers, artists, newspapers, and everywhere, and is now crying foul about their data getting trained on. Zero self-awareness.