Huawei unveils Atlas 950 SuperCluster, promising 1 ZettaFLOPS of FP4 performance from hundreds of thousands of Ascend 950DT NPUs
At its Huawei Connect 2025 conference on Thursday, Huawei unveiled its next-generation data-center-scale AI system, which it says offers 1 ZettaFLOPS of FP4 performance for AI inference and 524 ExaFLOPS of FP8 performance for AI training. The new Atlas 950 SuperCluster runs hundreds of thousands of the company's Ascend 950DT neural processing units (NPUs) and promises to be one of the most powerful supercomputers for artificial intelligence on the planet. Huawei expects it to compete with Nvidia's Rubin-based systems in late 2026.
Massive performance
Huawei's Atlas 950 SuperCluster will consist of 64 Atlas 950 SuperPoDs, the company's rack-scale AI solutions akin to Nvidia's GB300 NVL72 or the next-generation Vera Rubin NVL144. In total, the Atlas 950 SuperCluster will pack 524,288 Ascend 950DT AI accelerators distributed across 10,240 optically interconnected cabinets.
The supercomputer purportedly offers up to 524 ExaFLOPS of FP8 performance for AI training and up to 1 ZettaFLOPS of FP4 (MXFP4, to be more specific) for AI inference. That puts it behind leading-edge AI supercomputers such as Oracle's OCI Supercluster, introduced last year, which runs 131,072 B200 GPUs and offers peak inference performance of up to 2.4 FP4 ZettaFLOPS. Keep in mind that these are peak performance figures, so it remains to be seen whether they can be achieved in real life.
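For a rough sense of scale, dividing the headline figures by the accelerator count gives the implied per-chip throughput. This is a back-of-the-envelope sketch; the per-chip numbers below are inferred from Huawei's cluster-level claims, not published specifications:

```python
# Back-of-the-envelope: implied per-accelerator throughput for the
# Atlas 950 SuperCluster, inferred from Huawei's headline numbers.
CHIPS = 524_288        # Ascend 950DT accelerators (note: 2**19)
FP8_TRAINING = 524e18  # 524 FP8 ExaFLOPS, cluster total
FP4_INFERENCE = 1e21   # 1 MXFP4 ZettaFLOPS, cluster total

print(f"FP8 per chip: {FP8_TRAINING / CHIPS / 1e15:.2f} PetaFLOPS")   # ~1.00
print(f"FP4 per chip: {FP4_INFERENCE / CHIPS / 1e15:.2f} PetaFLOPS")  # ~1.91
```

That works out to roughly 1 PetaFLOPS of FP8 and just under 2 PetaFLOPS of FP4 per Ascend 950DT, if the headline figures hold.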
This SuperCluster is designed to support both RoCE (Remote Direct Memory Access over Converged Ethernet) and Huawei's proprietary UBoE (UnifiedBus over Ethernet) protocols, though it remains to be seen how fast the latter will be adopted. According to Huawei, UBoE offers lower idle-state latency, higher hardware reliability, and requires fewer switches and optical modules than traditional RoCE setups.
Huawei positions its Atlas 950 SuperCluster to support training and inference workloads for AI models with hundreds of billions to tens of trillions of parameters. Huawei believes this platform is well-suited for the next wave of large-scale dense and sparse models, thanks to its combination of compute throughput, interconnect bandwidth, and system stability. Given its size, though, it is unclear how many companies will be able to accommodate the system.
Massive footprint
Huawei admits that it cannot build processors that challenge Nvidia's GPUs on per-chip performance. To reach 1 ZettaFLOPS with the Atlas 950 SuperCluster, it therefore takes a brute-force approach, deploying hundreds of thousands of AI accelerators to compete against Nvidia's Rubin-based clusters in 2026–2027.
The common building block of Huawei's Atlas 950 SuperCluster is the Atlas 950 SuperPoD, which integrates 8,192 Ascend 950DT chips, a roughly 20-fold increase in processing units over the Atlas 900 A3 SuperPoD (also known as the CloudMatrix 384) and a massive increase in compute performance: 8 FP8 ExaFLOPS and 16 FP4 ExaFLOPS.
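Those per-pod figures are consistent with the cluster-level claims; scaling 64 pods up lands roughly on the headline numbers. A quick consistency check (my arithmetic, not vendor data):

```python
# Consistency check: 64 Atlas 950 SuperPoDs vs. the SuperCluster claims.
PODS = 64
CHIPS_PER_POD = 8_192
POD_FP8 = 8e18   # 8 FP8 ExaFLOPS per SuperPoD
POD_FP4 = 16e18  # 16 FP4 ExaFLOPS per SuperPoD

print(PODS * CHIPS_PER_POD)   # 524288 chips, matching the cluster figure
print(PODS * POD_FP8 / 1e18)  # 512 FP8 ExaFLOPS vs. the quoted 524
print(PODS * POD_FP4 / 1e18)  # 1024 FP4 ExaFLOPS, i.e. ~1 ZettaFLOPS
```

The small FP8 gap (512 vs. 524 ExaFLOPS) suggests the per-pod figure is rounded down: 524,288 chips at roughly 1 PetaFLOPS each lands exactly on 524 ExaFLOPS.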
Performance of the Atlas 950 SuperPoD is truly impressive on paper; it is said to be massively higher than that of Nvidia's Vera Rubin NVL144 (1.2 FP8 ExaFLOPS, 3.6 NVFP4 ExaFLOPS), the product Huawei compares it to. However, that performance comes at a price, namely size. A single Atlas 950 SuperPoD comprises 160 cabinets in total, 128 for compute and 32 for communications, spread across 1,000 square meters, which is about the size of two basketball courts. By contrast, Nvidia's Vera Rubin NVL144 is a rack-scale solution consisting of one compute rack and one cable-and-switch rack that requires just a few square meters of space.
As for the full Atlas 950 SuperCluster, which consists of 64 Atlas 950 SuperPoDs and should measure around 64,000 m², its footprint is comparable to roughly 150 basketball courts, or nine regulation soccer fields. Keep in mind, though, that a real campus would likely need additional space for power rooms, chillers and cooling towers, battery/UPS systems, and support offices, so the total site footprint could be significantly larger than 64,000 m².
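The area comparisons check out roughly as follows. Court and pitch dimensions vary, so the ~436 m² basketball court and ~7,140 m² soccer pitch used here are illustrative assumptions:

```python
# Rough footprint math for the Atlas 950 SuperPoD and SuperCluster.
SUPERPOD_M2 = 1_000                 # one SuperPoD: 160 cabinets over ~1,000 m2
SUPERCLUSTER_M2 = 64 * SUPERPOD_M2  # 64 SuperPoDs -> ~64,000 m2
BASKETBALL_COURT_M2 = 436           # FIBA court, approximate
SOCCER_FIELD_M2 = 7_140             # common regulation pitch, approximate

print(SUPERCLUSTER_M2 / BASKETBALL_COURT_M2)  # ~147 courts ("about 150")
print(SUPERCLUSTER_M2 / SOCCER_FIELD_M2)      # ~9 fields
```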
The road ahead
One thing about selling server hardware is that customers always want to know what comes next, so in addition to having a good product, it is vital to have a roadmap. At Huawei Connect, the company duly disclosed plans to launch the Atlas 960 SuperCluster alongside the Atlas 960 SuperPoD in the fourth quarter of 2027.
This next-generation system will scale up to more than 1 million Ascend 960 NPUs and will provide 2 FP8 ZettaFLOPS and 4 MXFP4 ZettaFLOPS of performance. It will also support both UBoE and RoCE, with the former expected to deliver improved latency and uptime metrics while continuing to rely on Ethernet.
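If the Atlas 960 SuperCluster really pairs 2 FP8 ZettaFLOPS with roughly a million NPUs, the implied per-chip FP8 rate about doubles versus the 950DT. Again, this is an inference from the headline figures, and the exact NPU count below is an assumption:

```python
# Implied per-NPU throughput for the Atlas 960 SuperCluster.
NPUS = 1_048_576  # "more than 1 million"; 2**20 assumed here for the estimate
FP8_TOTAL = 2e21  # 2 FP8 ZettaFLOPS, cluster total

print(f"{FP8_TOTAL / NPUS / 1e15:.2f} PF FP8 per NPU")  # ~1.91, ~2x the 950DT
```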

Anton Shilov is a contributing writer at Tom's Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
Comments
EyadSoftwareEngineer:
It's becoming clear that the era of Nvidia's unchallenged dominance is over. While everyone was watching Blackwell, Huawei was executing a masterclass in vertical integration and innovation under pressure. Developing competitive in-house HBM and outlining a roadmap with specs that not only meet but exceed current market leaders is a staggering achievement, especially considering the constraints they've operated under.
Their ability to deploy massive-scale domestic clusters like the Atlas 950 signals a seismic shift. The conversation is no longer about if there is a competitor, but how decisively the landscape has changed. Nvidia's software lead is significant, but it's no longer an insurmountable moat. Huawei has demonstrated it has the vision, the political backing, and now the hardware to not just compete, but to lead the next chapter of AI compute. The torch is being passed.
-
pug_s:
Pierce2623 said: Yay another product with terrible performance and even worse performance per watt
Its clusters go toe to toe with Nvidia's AI clusters. Even though it takes up more power than a comparable Nvidia cluster, it does not have backdoors which can screw Chinese companies over.
-
bit_user:
pug_s said: it does not have backdoors which can screw Chinese companies over.
Please post evidence of back doors in Nvidia products, if there are any.
-
zsydeepsky:
bit_user said: Please post evidence of back doors in Nvidia products, if there are any.
The safety concerns mostly come from the US government, namely the Chip Security Act (H.R. 3447), which has not passed Congress yet. According to it, any American company would have to embed a location identification feature based on network latency detection within their GPUs (the detailed method is not included in the act itself; I read it somewhere else, mentioned by some US congressman).
Anyone investing billions in long-term projects would naturally want to avoid this kind of risk. For the record, though, Nvidia did oppose this act since it harms their business model; they look more like a victim here.
-
bit_user:
zsydeepsky said: embed a location identification feature based on network latency detection within their GPUs
That enables usage restrictions, while a backdoor is something which grants unauthorized access. So, they're not the same thing. However, thanks for the detailed citation.
Maybe that's what pug_s meant, and yes, I could see why having such a built-in limiter would be problematic.
-
nookoool:
Pierce2623 said: Yay another product with terrible performance and even worse performance per watt
I have seen this tortoise-and-hare race multiple times, from telecom to military tech, over the past 25 years. The tortoise has nearly always caught up, and sometimes surpassed, the hare.
Some parts of the article seem weird, but it appears they are able to compete on raw performance, though it will take a larger amount of space and power. Rumor is that Huawei is building DUV lithography machines, so we will see how that goes in terms of wafer counts.
-
pug_s:
bit_user said: That enables usage restrictions, while a backdoor is something which grants unauthorized access. So, they're not the same thing. However, thanks for the detailed citation. Maybe that's what pug_s meant, and yes I could see why having such a built-in limiter would be problematic.
Usage restrictions are restrictions, and someone can exploit those restrictions.
-
tamalero:
bit_user said: That enables usage restrictions, while a backdoor is something which grants unauthorized access. So, they're not the same thing. However, thanks for the detailed citation. Maybe that's what pug_s meant, and yes I could see why having such a built-in limiter would be problematic.
That sounds literally like a kill switch.
-
bit_user:
tamalero said: That sounds literally like a kill switch.
Effectively, yes. That's the point. But it's not a back door, which was the original claim.
And it hasn't been implemented, as per @zsydeepsky's post. I mean, you can claim anyone has done anything, but there's no evidence it's been implemented.