DeepSeek research suggests Huawei's Ascend 910C delivers 60% of Nvidia H100 inference performance
Aging chip could succeed in reducing China's reliance on Nvidia GPUs.

Huawei's HiSilicon Ascend 910C is a derivative of the company's Ascend 910 processor for AI training, introduced in 2019. By now, the performance of the original Ascend 910 is barely sufficient for cost-efficient training of large AI models. Still, when it comes to inference, the 910C delivers 60% of Nvidia's H100 performance, according to researchers from DeepSeek. While the Ascend 910C is not a performance champion, it can succeed in reducing China's reliance on Nvidia GPUs.
Testing by DeepSeek revealed that the 910C processor exceeded expectations in inference performance, and with manual optimizations of CANN kernels (Huawei's counterpart to CUDA), its efficiency could be improved further. DeepSeek's native support for Ascend processors and its PyTorch repository allow for seamless CUDA-to-CANN conversion with minimal effort, making it easier to integrate Huawei's hardware into AI workflows.
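As a rough illustration of why such a port can be low-effort, consider that PyTorch hides the backend behind a device string. The sketch below assumes Huawei's torch_npu plugin (the company's publicly available Ascend adapter for PyTorch) is installed; the package name and the "npu" device string come from that adapter, and the CANN kernel-level optimizations DeepSeek mentions would happen beneath this API.

import torch

# Huawei's torch_npu plugin (assumed installed) registers the "npu"
# device type with PyTorch; without it, fall back to CUDA or CPU.
try:
    import torch_npu  # noqa: F401
    device = torch.device("npu")
except ImportError:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model: the same code path serves Ascend, Nvidia, or CPU backends,
# which is the sense in which CUDA-targeted code ports with minimal effort.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).to(device).eval()

x = torch.randn(8, 1024, device=device)
with torch.no_grad():
    y = model(x)
print(y.shape, y.device)

Model code written this way rarely references the backend by name, so retargeting it is largely a matter of swapping the device string and the kernels underneath.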
This suggests that the capabilities of Huawei's AI processors are advancing rapidly despite U.S. government sanctions and the lack of access to TSMC's leading-edge process technologies.
While Huawei and SMIC have managed to catch up with TSMC's capabilities of the 2019–2020 era and produce a chip that can be considered competitive with Nvidia's A100 and H100 processors, the Ascend 910C is not the best option for AI training. AI training remains a domain where Nvidia maintains an indisputable lead.
DeepSeek's Yuchen Jin said that long-term training reliability is a critical weakness of Chinese processors. This challenge stems from the deep integration of Nvidia's hardware and software ecosystem, which has been developed over two decades. While inference performance can be optimized, sustained training workloads require further improvements in Huawei's hardware and software stack.
Just like the original Ascend 910, the new Ascend 910C uses chiplet packaging, and its main compute SoC has around 53 billion transistors. While the original compute chiplet of the Ascend 910 was made by TSMC using its N7+ fabrication technology (7nm-class with EUV), the compute chiplet of the Ascend 910C is made by SMIC on its 2nd Generation 7nm-class process technology known as N+2.
Looking ahead, some experts predict that as AI models converge on the Transformer architecture, the importance of Nvidia's software ecosystem may decline. DeepSeek's expertise in hardware and software optimization could also significantly reduce dependency on Nvidia, offering AI companies a more cost-effective alternative, particularly for inference. However, to compete at a global scale, China must overcome the challenge of training stability and further refine its AI computing infrastructure.
Anton Shilov is a contributing writer at Tom's Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
-
bit_user I just want to point out that the H100 burns a lot of die area on stuff that's not relevant to inference. For instance, you don't need quite so much inter-GPU bandwidth and connectivity when just doing inference. Also, the H100 has quite a bit of FP64 horsepower for HPC. If you're building a pure AI processor, you wouldn't need that stuff. I actually expected Nvidia to have separated off their AI and HPC products by now. Maybe in the next generation, they will finally do this.
Finally, I wonder how many people are even using the H100 for inference. It'd be cheaper to distribute your model over a set of L40 cards.
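A minimal sketch of the multi-card inference setup described above, assuming the Hugging Face transformers and accelerate packages are installed; the model name is illustrative only, and device_map="auto" shards the layers across whatever GPUs are visible (several L40s, for example):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; device_map="auto" (provided by the accelerate
# package) splits the model's layers across all visible GPUs so that no
# single card has to hold the whole model.
model_id = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Why run inference on several smaller GPUs instead of one big one?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
-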
das_stig So the 910C is 60% of an H100's performance, but what about per-unit price, power, and workload? At what point does it make more sense to use multiple 910Cs instead of a single H100 locally?
-
nookoool
das_stig said: So the 910C is 60% of an H100's performance, but what about per-unit price, power, and workload? At what point does it make more sense to use multiple 910Cs instead of a single H100 locally?
I think the H100 is export-controlled, so it would be black-market smuggling prices vs. Huawei's pricing.
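A back-of-envelope way to frame that break-even question; the prices and power figures below are hypothetical placeholders, and only the 0.6 throughput ratio comes from the article:

# Break-even sketch: prices and wattages are hypothetical placeholders;
# only the 0.6 inference-throughput ratio comes from the DeepSeek figure.
H100_PERF, A910C_PERF = 1.0, 0.6           # normalized throughput
h100_price, a910c_price = 30_000, 12_000   # USD, hypothetical
h100_power, a910c_power = 700, 550         # watts, hypothetical

cards = H100_PERF / A910C_PERF             # ~1.67 910Cs per H100-equivalent
print(f"910Cs per H100-equivalent: {cards:.2f}")
print(f"Cost:  ${a910c_price * cards:,.0f} vs ${h100_price:,}")
print(f"Power: {a910c_power * cards:,.0f} W vs {h100_power} W")

On those made-up inputs, the 910C side wins on price but loses on power per unit of throughput; the real answer depends entirely on actual street prices, which, as noted above, are distorted by export controls.
-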
artk2219 I wonder how well Moore Threads cards perform with DeepSeek; I haven't heard anything out of them for a while.
-
atomicWAR "Rumored To Come Pretty Close With NVIDIA’s H100"... 60% is pretty close? I am not sure when being one->two generations behind, performance wise, was considered close but I for one find that math suspect as also assumes the purported performance is as described. My guess is it closer to 50% or less in non-cherry picked workloads. This "news" seems to be anything but... Until it is tested rigorously by non-Chinese reviewers I'll take this news with a truck load of salt.Reply -
Pierce2623
bit_user said: I just want to point out that the H100 burns a lot of die area on stuff that's not relevant to inference. For instance, you don't need quite so much inter-GPU bandwidth and connectivity when just doing inference. Also, the H100 has quite a bit of FP64 horsepower for HPC. If you're building a pure AI processor, you wouldn't need that stuff. I actually expected Nvidia to have separated off their AI and HPC products by now. Maybe in the next generation, they will finally do this. Finally, I wonder how many people are even using the H100 for inference. It'd be cheaper to distribute your model over a set of L40 cards.
The Ascend 910 can do more than inference too. It was originally marketed as a high-power training solution. It still has a somewhat respectable amount of FP32 compute too. It's not solely a pure tensor/matrix math accelerator like a Google Coral or whatever it was called.
-
bit_user
Pierce2623 said: The Ascend 910 can do more than inference too. It was originally marketed as a high-power training solution. It still has a somewhat respectable amount of FP32 compute too.
But not HPC, correct? You mostly need FP64 for that.
-
Pierce2623
bit_user said: But not HPC, correct? You mostly need FP64 for that.
Correct. It's dogshit in FP64 from what I've seen, as it has to use two FP32 data paths to run FP64 calculations. That applies to pretty much everything outside of AMD Instinct, though. AMD is literally the only one still serving the FP64 market. An MI300X has more than double the FP64 throughput of anything Nvidia offers, since Nvidia has gone all-in on AI and less precise data formats.
-
zsydeepsky Well... if anyone truly wants to find out the performance of Ascend, I guess they can just rent an Ascend cluster on Huawei Cloud and play with it:
https://www.huaweicloud.com/intl/en-us/product/modelarts.html
Ascend cards can be used for model training. Huawei has a cooperation case report (with another Chinese AI company) on its official website and has introduced its training solutions:
https://e.huawei.com/cn/case-studies/solutions/storage/iflyte
Huawei also has its own models (like the one that serves as the AI assistant in Huawei's HarmonyOS); according to Huawei Developer Conference announcements, all of those models were trained on its Ascend clusters.
Furthermore, all major Chinese companies with AI requirements (such as ByteDance, Alibaba, and Tencent) have placed mass Ascend card orders. The recent numbers I've read estimate that Huawei will ship 1 million 910C units in 2025.
Even if one argues that Nvidia can sell 2x more H100s than Huawei can 910Cs, and that the H100 is roughly 2x more powerful than the 910C (to say nothing of the more powerful B200), the point is that the strategy of preventing China from getting the computing power needed for AI development through embargoes and sanctions doesn't seem to be working.
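Taking the comment's own numbers at face value, the aggregate arithmetic looks like this; the shipment estimate and both "2x" figures are assumptions from the thread, not confirmed data:

# Aggregate-compute sketch built only from the thread's assumptions:
# ~1 million 910C units in 2025, each at ~0.6 of an H100 for inference,
# against a conceded "2x more" H100s shipped on Nvidia's side.
shipments_910c = 1_000_000
h100_equivalents = shipments_910c * 0.6
nvidia_h100_units = 2 * shipments_910c
print(f"Huawei side: ~{h100_equivalents:,.0f} H100-equivalents")
print(f"Nvidia side: ~{nvidia_h100_units:,} H100s "
      f"({h100_equivalents / nvidia_h100_units:.0%} as much aggregate throughput)")

On those assumptions, Huawei would field roughly a third of Nvidia's aggregate inference throughput from a single year's shipments, which is the commenter's point about the sanctions' limited effect.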