Chinese-made DeepSeek AI model records extensive online user data, stores it in China-based servers

DeepSeek’s newest R1 large language model has already become notorious after its release cratered AI stocks, and revelations about its privacy policy might raise eyebrows even more — the company records extensive data from its online users, including keystrokes, passwords, and data entered in queries like images and text, and then stores it in China-based servers.

Personal information, including dates of birth, email addresses, phone numbers, and passwords, is all fair game, according to DeepSeek. Any content users give to the R1 LLM, from text and audio prompts to uploaded files, may also be collected. And whenever someone contacts DeepSeek, the company says it might keep users’ proof of identity, which presumably means documents like a driver’s license.

But that’s not all. DeepSeek records details about users’ devices and connections: IP addresses, phone models, language settings, and so on. Its collection efforts are so thorough that the policy even lists “keystroke patterns or rhythms.” Cookies, a classic method of tracking users on the Internet, also contribute to user data collection.

Because R1 is 'open source,' it can be downloaded and run on any hardware capable of handling it, which is generally good for privacy: running the model locally on your own hardware will presumably not lead to data collection. However, DeepSeek offers online access to R1 via its website and mobile app, which means the AI company handles and stores online users' data. Thankfully, DeepSeek is very transparent about what data it collects from online users, where that data is stored, and what the company does with it. It lays all of this out on its privacy policy webpage, which reveals that there’s almost nothing the company doesn’t collect.

It’s common practice for companies with lots of user data to sell that data to interested parties such as advertising firms, and DeepSeek says it might do so. The flow also runs the other way: the company admits that “advertisers, measurement, and other partners share information with us about you and the actions you have taken outside of the Service, such as your activities on other websites and apps or in stores, including the products or services you purchased, online or in person.” With all this information at its disposal, it seems that DeepSeek has the potential to know its users very intimately.

As for where all this information is stored, the privacy policy says it’s all kept on servers located in China, a point that has the potential to spark serious controversy. Concerns about the personal details of Americans being in the hands of the Chinese government were a key factor in the Biden administration’s attempt to ban TikTok, raising the possibility that DeepSeek might come under similar scrutiny.

On the other hand, President Trump’s allies include Meta’s Mark Zuckerberg and OpenAI’s Sam Altman, and both of them are probably not very happy to see the R1 LLM run circles around their LLMs. Additionally, it’s hard to imagine that DeepSeek has made a good impression on the Republican President by inadvertently causing the stock prices of many American tech companies to fall significantly.

Developed by Chinese AI company DeepSeek, R1 is an open-source LLM that boasts cutting-edge performance for a fraction of the computing power. With 671 billion parameters, it’s one of the most significant AI models, yet it took only 2.8 million GPU hours to train. Meta’s Llama 3 required 30.8 million GPU hours, or 11 times as many.
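As a quick sanity check on that comparison, the ratio of the two training budgets can be worked out directly. The short sketch below uses only the two GPU-hour figures quoted above; the variable names are illustrative and not taken from either company.

```python
# GPU-hour figures as quoted in the article (assumed accurate as reported).
deepseek_gpu_hours = 2.8e6   # DeepSeek's reported training budget
llama3_gpu_hours = 30.8e6    # Meta's Llama 3 training budget

# How many times larger Llama 3's budget was than DeepSeek's.
ratio = llama3_gpu_hours / deepseek_gpu_hours

# DeepSeek's budget expressed as a share of Llama 3's.
share = deepseek_gpu_hours / llama3_gpu_hours

print(f"Llama 3 used about {ratio:.0f}x the GPU hours")
print(f"DeepSeek used about {share:.0%} of Llama 3's GPU hours")
```

In other words, the quoted DeepSeek figure is roughly one-eleventh of the Llama 3 figure.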

DeepSeek boasted about these accomplishments over a month ago, but R1 launched on January 20, and the implications were fully appreciated by the stock market only yesterday. The market reacted by selling shares in AI companies like Nvidia. While the spotlight on DeepSeek has raised its profile, it has also led many to scrutinize how the company handles user privacy, a particularly thorny issue for anything involving AI and software developed in China.

Matthew Connatser is a freelance writer for Tom's Hardware US. He writes articles about CPUs, GPUs, SSDs, and computers in general.

Comments from the forums

EzzyB

the company records extensive data from its online users, including keystrokes, passwords, and data entered in queries like images and text, and then stores it in China-based servers.

SHOCKING! SHOCKING I SAY! :eek:

Seriously, is anyone at all surprised by this?

hotaru251

90% of people have no info that any gov would care about & that data is already collected (by private companies & their own govs) all time via multiple sources.

if you are "that" paranoid just run it in a sandbox or on a dummy device thats only used for junk stuff (thus they never get anything of value)

Notton

You know what? At this point, I don't care.
altman/elon/zucker/gates have a long history of harvesting and selling off user data.
Do you really think they also don't harvest all your data when using their ai models?

At least DeepSeek is open about the data they collect, and it's open source. Where as grok/openai/copilot/facebook is a big Questionmark. Who knows what they collect about you.

If you really care about privacy, EU's GDPR is a good starting point.

Gaidax

I would definitely not use it for anything work-related, as a software engineer.

I have no illusions about Western AI chatbots and tools, but China is a whole next level low as far as accountability and morals go.

Dementoss

EzzyB said:

Seriously, is anyone at all surprised by this?
They shouldn't be, it's as surprising as night following day...

ederbond

EzzyB said:
SHOCKING! SHOCKING I SAY! :eek:

Seriously, is anyone at all surprised by this?
Nothing different from what Google, MSFT, Apple, Facebook and X has been doing since forever. So what's the point?

pug_s

ederbond said:
Nothing different from what Google, MSFT, Apple, Facebook and X has been doing since forever. So what's the point?
Believe it or not, unlike the US, China has a data privacy law (PIPL) . So your data will be housed in some Chinese server and not sold to some 3rd party.

USAFRet

pug_s said:
Believe it or not, unlike the US, China has a data privacy law (PIPL) . So your data will be housed in some Chinese server and not sold to some 3rd party.
And then used however the govt directs them to.

WhteTrash

Would trust more China than the US at this point.

DalaiLamar

Gaidax said:
I would definitely not use it for anything work-related, as a software engineer.

I have no illusions about Western AI chatbots and tools, but China is a whole next level low as far as accountability and morals go.
https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExZ3BmZ3dzbWlleDJ2aWp4MWxuMnhiYmppaTR6bTIyY3hjM2F3d3BkOSZlcD12MV9naWZzX3NlYXJjaCZjdD1n/5R1FM2PNw3G6AZWBsc/giphy.gif