Huawei Ascend 910D AI processor designed to take on Nvidia's Blackwell and Rubin GPUs

(Image credit: Huawei)

Huawei's next-generation HiSilicon Ascend 910D AI processor is expected to offer better performance than Nvidia's H100, reports Reuters. The new processor will be slower on a chip-to-chip basis than Nvidia's Blackwell B200 and Blackwell Ultra B300 GPUs, never mind the next-generation Rubin GPUs slated to launch next year. However, Huawei's approach of building pods with hundreds of processors should allow the Ascend 910D to compete against pods based on Nvidia's current Blackwell and upcoming Rubin GPUs.

Huawei is preparing to start tests of its most advanced artificial intelligence processor, the Ascend 910D, with the performance goal of surpassing Nvidia's H100 and offering a domestic alternative amid U.S. export restrictions. According to sources, Huawei has approached several local companies to assess whether the new Ascend 910D chip meets performance and deployment requirements. Initial samples are expected by late May.

Separately, Huawei plans to start large-scale shipments of its dual-chiplet Ascend 910C AI processors to Chinese customers (and probably full systems based on the chips) as early as next month. The majority of these processors were reportedly produced by TSMC for a third-party company. It remains to be seen whether the Ascend 910D will be made by China-based SMIC, or whether, nearly five years after the U.S. government restricted Huawei's access to leading-edge semiconductor production capabilities, Huawei will once again find a way to circumvent U.S. sanctions.

Reaching Nvidia H100 performance levels won't be easy for Huawei. The company's latest dual-chiplet Ascend 910C offers around 780 BF16 TFLOPS of performance, whereas Nvidia's H100 can deliver around 2,000 BF16 TFLOPS. In order to achieve H100 performance levels, Huawei will have to redesign the internal architecture of the Ascend 910D and possibly increase the number of compute chiplets.
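To put that gap in numbers, here is a rough sketch using only the approximate figures above; real-world throughput also depends on memory bandwidth, interconnect, and software, not just peak FLOPS:

```python
import math

# Peak BF16 throughput figures cited in the article (approximate).
ascend_910c_tflops = 780   # dual-chiplet Ascend 910C
h100_tflops = 2000         # Nvidia H100

# Per-chip gap the Ascend 910D would have to close.
gap = h100_tflops / ascend_910c_tflops
print(f"H100 ~= {gap:.1f}x a 910C per chip")          # ~2.6x

# If performance scaled linearly with compute chiplets (a strong
# assumption -- interconnect and memory rarely scale for free),
# matching one H100 would take this many 910C-class chiplets:
per_chiplet = ascend_910c_tflops / 2                  # 910C = two chiplets
print(f"~{math.ceil(h100_tflops / per_chiplet)} chiplets per H100-equivalent")
```

The roughly 2.6x per-chip deficit is why the article expects a redesigned architecture with more compute chiplets rather than a simple respin.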

To stay competitive in the AI industry next year, Huawei will have to achieve performance comparable to that of AI clusters developed in the U.S. This year, the company introduced its CloudMatrix 384 system with 384 Ascend 910C processors. It can reportedly beat Nvidia's GB200 NVL72 in certain workloads, but at the cost of significantly higher power consumption due to dramatically lower performance-per-watt. It also has over five times as many 'AI processors' as an NVL72 rack. Whether the interconnect can scale well to the required number of processors remains to be seen.

Without access to leading-edge process technologies, it will become significantly more difficult for Huawei to maintain a competitive position next year. Nvidia is on track to introduce its next-generation Rubin GPUs for AI and HPC in 2026. Rubin GPUs are set to be made on TSMC's N3 (or a more advanced) fabrication process, and they should offer even higher performance-per-watt than the current-generation Blackwell GPUs.

Rubin GPUs are slated to offer around 8,300 TFLOPS of FP8 training performance, and presumably half that for BF16, or about twice the performance of the B200. Huawei's Ascend 910D and next-generation CloudMatrix systems with 384 such processors could theoretically offer competitive AI performance at the rack level. However, it remains to be seen what performance benefits Huawei's Ascend 910D and Nvidia's Rubin GPUs will offer compared to existing offerings. It should also be noted that Nvidia will barely be able to sell its high-performance Rubin GPUs in China, so in that market Huawei won't really have a direct competitor.
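As a rough rack-level sanity check, assuming (per the article) that Rubin's BF16 throughput is half its FP8 figure, and comparing raw peak FLOPS only, which ignores power draw, interconnect scaling, and software maturity:

```python
# Rack-level raw-FLOPS comparison built from the article's numbers.
# These are press figures, so treat the result as an upper bound on
# achievable performance, not a benchmark.
rubin_fp8_tflops = 8300
rubin_bf16_tflops = rubin_fp8_tflops / 2        # assumed: BF16 = FP8 / 2
rubin_rack_tflops = 72 * rubin_bf16_tflops      # NVL72-style rack, 72 GPUs

cloudmatrix_chips = 384                         # CloudMatrix 384
ascend_910c_tflops = 780
cloudmatrix_tflops = cloudmatrix_chips * ascend_910c_tflops

print(f"Rubin rack:       ~{rubin_rack_tflops / 1000:.0f} PFLOPS BF16")
print(f"CloudMatrix rack: ~{cloudmatrix_tflops / 1000:.0f} PFLOPS BF16")

# Per-chip BF16 a 384-chip system would need to match the Rubin rack:
print(f"910D target: ~{rubin_rack_tflops / cloudmatrix_chips:.0f} TFLOPS per chip")
```

On paper, a 384-chip system of 910C-class parts already lands in the same raw-FLOPS neighborhood as a 72-GPU Rubin rack, which is exactly the scale-out bet described above; performance-per-watt and interconnect scaling are where the comparison gets hard.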

Regardless of performance or efficiency, Huawei's Ascend 910D processors will likely become China's workhorses when it comes to AI training in the coming years. Given the strategic importance of AI, the power consumption of the Ascend 910D (or any other domestic AI processor) will not be a limiting factor, as the number of deployed units could offset the efficiency of Nvidia's (or AMD, Intel, Broadcom, etc.) AI processors. The main limiting factor for China will be its ability to produce enough processors — either domestically, or overseas using proxy companies.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • freedomnow
    This is why politicians should not make decisions on things they know nothing about. The only thing they are doing is making China pollute the earth more. Nvidia parts are more efficient per watt, which allows them to train models with less power. This does not stop China from training models; they can simply build larger, more power-hungry data centers.

    This reminds me of stupid environmentalists who will block the US production of oil to force us to import it from the other side of the world. They think they are making a difference but in reality they are contributing more pollution to the Earth.

    Critical thinking skills have depleted in this world and they just follow the leader like mindless zombies.
    Reply
  • Pierce2623
    freedomnow said:
    This is why politicians should not make decisions on things they know nothing about. The only thing they are doing is making China pollute the earth more. Nvidia parts are more efficient per watt, which allows them to train models with less power. This does not stop China from training models; they can simply build larger, more power-hungry data centers.

    This reminds me of stupid environmentalists who will block the US production of oil to force us to import it from the other side of the world. They think they are making a difference but in reality they are contributing more pollution to the Earth.

    Critical thinking skills have depleted in this world and they just follow the leader like mindless zombies.
    The only problem with what you’re saying is the Huawei chip is so uncompetitive that all the big Chinese companies are hoarding previous generation Nvidia stock. If Blackwell is platinum and Hopper is gold, Huawei Ascend is a cheap copper alloy.
    Reply
  • nookoool
    Pierce2623 said:
    The only problem with what you’re saying is the Huawei chip is so uncompetitive that all the big Chinese companies are hoarding previous generation Nvidia stock. If Blackwell is platinum and Hopper is gold, Huawei Ascend is a cheap copper alloy.

    These Chinese tech companies have a decade of code built on CUDA. Their access to Nvidia can be banned at any moment, and Huawei has limited production capacity, so of course they will hoard while hoarding is still possible.
    Reply
  • Pierce2623
    nookoool said:
    These Chinese tech companies have a decade of code built on CUDA. Their access to Nvidia can be banned at any moment, and Huawei has limited production capacity, so of course they will hoard while hoarding is still possible.
    Oh, Huawei is claiming they can run CUDA code; they’re not affected by any US ruling on the ZLUDA CUDA translation layer. They’re hoarding Nvidia because the purchase cost and operating cost of the Huawei chips make them flat-out uncompetitive even if they can run 4x the number of chips in a single cluster, and that’s just to be competitive with Hopper. Compared to Blackwell, they’ll be miles off. Blackwell is Nvidia’s first architecture ever built specifically for AI performance from the ground up.
    Reply
  • regs01
    The H100 only delivers 990 TFLOPS, just slightly higher than the 910C. 2,000 TFLOPS is the B100.
    Reply
  • Pierce2623
    regs01 said:
    The H100 only delivers 990 TFLOPS, just slightly higher than the 910C. 2,000 TFLOPS is the B100.
    I’m not talking about raw FLOPS numbers. I’m talking about actual use cases and well-benchmarked AI performance. If the 910C was truly competitive with an H100 at half the price (they’re actually well under half the price), then why are Chinese companies hoarding even weaker H20 chips? And why does their new cluster need 4x the number of 910C chips to reach “2x Hopper performance”? You’re also neglecting the fact that running the Huawei chips is over twice as expensive in electricity for a given level of performance. At the power levels new data centers are pushing, that’s a MASSIVE expense.
    Reply