Huawei Ascend NPU roadmap examined — company targets 4 ZettaFLOPS FP4 performance by 2028, amid manufacturing constraints

Huawei Ascend (Image credit: Huawei)

At the Huawei Connect 2025 event, in addition to announcing its first AI cluster with 1 ZettaFLOPS of FP4 performance, Huawei revealed a detailed roadmap of its upcoming Ascend neural processing units (NPUs), which accelerate AI workloads.

The company does not have access to TSMC’s leading-edge process technologies or to high-end HBM4 and GDDR7 memory from global leaders. To boost the performance of its Ascend processors, it will therefore need to rely on a new architecture and new types of memory, starting with the Ascend 950-series. Huawei expects its new NPUs to enable multi-ZettaFLOPS performance toward the end of the decade.

Huawei Ascend roadmap

| NPU | Targeted Release | Architecture | FP8 Performance | FP4 Performance | Memory | Memory Bandwidth | Interconnect Bandwidth | Supported Formats |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ascend 910C | Q1 2025 | SIMD | - | - | 128 GB | 3.2 TB/s | 784 GB/s | FP32, HF32, FP16, BF16, INT8 |
| Ascend 950PR | Q1 2026 | SIMD + SIMT | 1 PFLOPS | 2 PFLOPS | 128 GB | 1.6 TB/s | 2.0 TB/s | FP32, HF32, FP16, BF16, FP8, MXFP8, HiF8, MXFP4 |
| Ascend 950DT | Q4 2026 | SIMD + SIMT | 1 PFLOPS | 2 PFLOPS | 144 GB | 4.0 TB/s | 2.0 TB/s | FP32, HF32, FP16, BF16, FP8, MXFP8, HiF8, MXFP4 |
| Ascend 960 | Q4 2027 | SIMD + SIMT | 2 PFLOPS | 4 PFLOPS | 288 GB | 9.6 TB/s | 2.2 TB/s | FP32, HF32, FP16, BF16, FP8, MXFP8, HiF8, MXFP4, HiF4 |
| Ascend 970 | Q4 2028 | SIMD + SIMT | 4 PFLOPS | 8 PFLOPS | 288 GB | 14.4 TB/s | 4.0 TB/s | FP32, HF32, FP16, BF16, FP8, MXFP8, HiF8, MXFP4, HiF4 |

Huawei SuperPoD and SuperClusters

| System | NPUs / Chips | Performance | Cabinets / Components | Release Timeframe |
| --- | --- | --- | --- | --- |
| Atlas 950 SuperPoD | 8,192 Ascend 950DT | 8 EFLOPS FP8, 16 EFLOPS FP4 | 160 (128 compute + 32 comm) | Q4 2026 |
| Atlas 950 SuperCluster | ~524,288 Ascend 950DT (64 SuperPoDs) | 524 EFLOPS FP8, 1 ZettaFLOPS FP4 | >10,000 cabinets | Q4 2026 |
| Atlas 960 SuperPoD | 15,488 Ascend 960 | 30 EFLOPS FP8, 60 EFLOPS FP4 | 220 (176 compute + 44 comm) | Q4 2027 |
| Atlas 960 SuperCluster | >1,000,000 Ascend 960 | 2 ZettaFLOPS FP8, 4 ZettaFLOPS FP4 | Multiple SuperPoDs | Q4 2027 |
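
The system-level figures follow from simple multiplication: NPU count times per-chip peak throughput. The Python sketch below reproduces that arithmetic as a rough sanity check. The per-chip numbers and NPU counts are taken from the tables in this article; the names `CHIP_PFLOPS`, `SYSTEMS`, and `aggregate_eflops` are illustrative only, and the Atlas 960 SuperCluster count is treated as exactly 1,000,000 even though Huawei quotes it only as a lower bound. These are peak numbers, not sustained performance.

```python
# Per-chip peak throughput in PFLOPS, as listed in the Ascend roadmap table.
CHIP_PFLOPS = {
    "Ascend 950DT": {"FP8": 1, "FP4": 2},
    "Ascend 960": {"FP8": 2, "FP4": 4},
}

# (chip, NPU count) per system, from the SuperPoD / SuperCluster table.
# Assumption: the Atlas 960 SuperCluster is quoted as ">1,000,000" NPUs,
# so 1,000,000 is used here as a lower bound.
SYSTEMS = {
    "Atlas 950 SuperPoD": ("Ascend 950DT", 8_192),
    "Atlas 950 SuperCluster": ("Ascend 950DT", 64 * 8_192),
    "Atlas 960 SuperPoD": ("Ascend 960", 15_488),
    "Atlas 960 SuperCluster": ("Ascend 960", 1_000_000),
}


def aggregate_eflops(system: str, fmt: str) -> float:
    """Naive peak throughput in EFLOPS: NPU count x per-chip PFLOPS."""
    chip, count = SYSTEMS[system]
    return count * CHIP_PFLOPS[chip][fmt] / 1_000  # 1 EFLOPS = 1,000 PFLOPS


if __name__ == "__main__":
    for name in SYSTEMS:
        fp8 = aggregate_eflops(name, "FP8")
        fp4 = aggregate_eflops(name, "FP4")
        print(f"{name}: {fp8:,.1f} EFLOPS FP8, {fp4:,.1f} EFLOPS FP4")
```

Under these assumptions, the Atlas 950 figures match the quoted numbers closely (about 8.2 and 524 EFLOPS FP8), and the 1 ZettaFLOPS FP4 claim corresponds to roughly 1,049 EFLOPS. The naive product for the Atlas 960 SuperPoD comes out at about 31 EFLOPS FP8 and 62 EFLOPS FP4, slightly above the quoted 30 and 60, which suggests the quoted figures are rounded down.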

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.