DeepSeek reportedly urged by Chinese authorities to train new model on Huawei hardware — after multiple failures, R2 training to switch back to Nvidia hardware while Ascend GPUs handle inference

Deepseek logo on an iPhone
(Image credit: Getty / iStock Editorial)

A new report claims that after successfully training its R1 model on Nvidia hardware, DeepSeek was urged by Chinese authorities to switch to Huawei Ascend-based hardware for its next model. However, according to the Financial Times, R2's training runs were plagued by persistent Huawei hardware failures, delaying the model's release. DeepSeek was reportedly forced to switch back to Nvidia chips for training while keeping Huawei's for inference.

Following the success of R1, Chinese authorities allegedly encouraged DeepSeek to rely on Huawei's Ascend-based platforms instead of Nvidia's for training, according to three individuals with knowledge of the matter cited by the FT. DeepSeek followed that advice during R2's development, but the move quickly ran into a series of problems, including unstable performance, slower chip-to-chip connectivity, and limitations of Huawei's CANN software toolkit.

DeepSeek reportedly trained its R1 model on a cluster of 50,000 Hopper-series GPUs (30,000 HGX H20 units, 10,000 H800s, and 10,000 H100s) supplied through its investor, High-Flyer Capital Management. Naturally, R2 will require a substantially more powerful training cluster, so DeepSeek and its backer will have to source that capacity somewhere, which may not be that hard given the number of AI data centers in China.
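As a back-of-the-envelope check on those figures, here is a short Python sketch that sums the reported cluster composition and estimates its aggregate peak compute. The unit counts come from the report above; the per-GPU throughput numbers are approximate public datasheet figures and should be treated as assumptions:

```python
# Rough estimate of the reported R1 training cluster's peak compute.
# Unit counts are from the FT-cited report; per-GPU dense BF16 TFLOPS
# are approximate public datasheet figures (assumptions, not reported).
CLUSTER = {
    # model: (unit count, approx. dense BF16 TFLOPS per GPU)
    "HGX H20": (30_000, 148),  # export-compliant part with heavily cut compute
    "H800":    (10_000, 989),  # H100-class compute, reduced NVLink bandwidth
    "H100":    (10_000, 989),
}

total_gpus = sum(count for count, _ in CLUSTER.values())
total_tflops = sum(count * tflops for count, tflops in CLUSTER.values())

print(f"GPUs: {total_gpus:,}")  # 50,000, matching the reported total
print(f"Peak dense BF16: {total_tflops / 1e6:.1f} EFLOPS")  # ~24.2 EFLOPS
```

By that rough math, the 30,000 compute-cut H20s would contribute under a fifth of the cluster's peak throughput despite making up 60% of its unit count, which underlines why assembling a more powerful R2 cluster is a real procurement problem.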

There might be another issue, though. Reports indicate that DeepSeek's software stack is tuned specifically for Nvidia hardware, which not only leaves the company dependent on the availability of Nvidia GPUs but also ties its clients to the supply of AI accelerators like Nvidia's HGX H20. That makes it crucial for DeepSeek to get R2 inference working on domestic hardware platforms, such as Huawei's Ascend.
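For a sense of what such a port involves at the framework level, below is a minimal, hypothetical sketch of backend selection in PyTorch. It assumes Huawei's torch_npu adapter (the Ascend Extension for PyTorch) is installed; none of this is DeepSeek's actual code:

```python
# Hypothetical backend-selection shim: prefer a Huawei Ascend NPU if the
# torch_npu adapter is present, otherwise fall back to Nvidia CUDA or CPU.
import torch

def pick_device() -> torch.device:
    try:
        import torch_npu  # noqa: F401  -- registers the "npu" backend with PyTorch
        if torch.npu.is_available():
            return torch.device("npu")
    except ImportError:
        pass
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(1, 8, device=device)  # e.g. model.to(device) in real inference code
print(device, x.sum().item())
```

In practice, the hard part is not device selection but the operator coverage and kernel performance behind it, which is exactly where the CANN limitations cited above come in.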

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • Pierce2623
Of course that happened. “NPUs” are for inference.
    Reply
  • ejolson
According to reports, DeepSeek used Nvidia's PTX assembly to make more efficient use of the H20 and hide communication latencies. While it's lucky for Nvidia that a company invested in significant hardware-specific performance engineering, it's somewhat surprising that the Huawei hardware kept crashing, and that, given that level of expertise, those same software engineers couldn't make it perform well either.
    Reply
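For context on the latency hiding ejolson mentions: DeepSeek's reported PTX work sits far below this level, but the same principle can be illustrated at a high level in PyTorch by overlapping a data transfer with compute on a separate CUDA stream. A rough, purely illustrative sketch:

```python
# Illustration of latency hiding (not DeepSeek's PTX-level code): run a
# host-to-device copy on a side CUDA stream while the default stream
# computes a matmul, then synchronize before combining the results.
import torch

assert torch.cuda.is_available()
copy_stream = torch.cuda.Stream()

a = torch.randn(4096, 4096, device="cuda")
host_batch = torch.randn(4096, 4096, pin_memory=True)  # pinned memory enables async copy
dev_batch = torch.empty(4096, 4096, device="cuda")

with torch.cuda.stream(copy_stream):
    dev_batch.copy_(host_batch, non_blocking=True)  # copy overlaps the matmul below

b = a @ a  # compute on the default stream, concurrent with the copy

torch.cuda.current_stream().wait_stream(copy_stream)  # wait before reading dev_batch
c = b + dev_batch
print(c.norm().item())
```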
  • Constellar
Now let's not forget who we're dealing with here--we're dealing with China, the country that is supposedly graduating over seven times (!) as many engineers as the United States. Seven times the number of engineers! That's seven times the speed of DeepSeek versus ChatGPT. Or seven times the compute power of their data centers versus ours. Or seven times the token acceptance rate. Or seven times as fast at bringing the next LLM to market...
But gee, China is not doing things seven times as well as we're doing them. Hmm. Come to think of it, we're kicking China's ass, and we have been now for quite some time. In fact, it appears as though Nvidia's server is running about seven times as efficiently as CloudMatrix, that row of junk computers that Huawei has been bragging about lately. OpenAI has about seven times as many LLMs as DeepSeek has, while together offering over seven times the compute power... Plus, doesn't the United States have over seven times the number of businesses involved in the AI race?
One thing I know is that China has about seven times as many AI patents filed as the USA. Y' ever wanted to know where they file the Chinese patents? They file them in the bin that sits right on top of the paper shredder. I heard that Chinese toilet paper has "patented in China" printed on its wrapper. And, no, that's not a brand name you're reading. bwHAHAhahahah!🤣🤣🤣
    Reply
  • fiyz
Constellar said:
Now let's not forget who we're dealing with here--we're dealing with China, the country that is supposedly graduating over seven times (!) as many engineers as the United States. ...

Better to overestimate our rival than to underestimate them.
    Reply