China Wants 300 ExaFLOPS of Compute Power by 2025

Nvidia Ada Lovelace and GeForce RTX 40-Series
(Image credit: Shutterstock)

CNBC reports that China aims to boost its aggregate computing power by roughly 50% by 2025, to a total of 300 ExaFLOPS, the country's key ministries announced on Monday. This initiative is part of the nation's strategy to stay competitive in the high-tech sector, especially in artificial intelligence (AI) and high-performance computing (HPC).

The Chinese government, through six of its primary ministries, announced an ambitious plan to dramatically elevate the country's computing prowess in roughly a year. China currently possesses an aggregate computational capacity of 197 ExaFLOPS, and it wants to increase that figure to around 300 ExaFLOPS by 2025, which would be a major accomplishment if achieved.

China's tech giants, such as Alibaba and Tencent, are poised to benefit from this initiative. The government's plan includes a focus on memory storage enhancements, improved data transmission infrastructure, and the establishment of additional data centers. These developments are crucial for cloud computing services, a domain where many AI solutions are currently being marketed. Meanwhile, it is unclear what kind of hardware will be used to build over 100 ExaFLOPS of additional computational capacity in such a short amount of time.

This surge in computational strength is not just for bragging rights. The Chinese government recognizes the crucial role of advanced computing in various sectors, notably finance and education. The underlying idea is that a robust computational backbone can significantly aid in the development and deployment of AI technologies.

Historical data suggests that China's investments in computational infrastructure yield substantial economic returns. For every yuan spent on enhancing computing capabilities, the nation has seen an economic boost of three to four yuan, according to Akshara Bassi, senior research analyst at Counterpoint. This pattern underscores the importance of technology in driving economic growth.

"China has found that traditionally, every 1 yuan invested in computing power has driven 3-4 yuan of economic output," Bassi told CNBC. "The investments echo China's plans to drive economic output through leadership in technology prowess and integrating AI with existing technologies and solutions across all industries and domains. China aims to invest in growing in its computing power especially the AI, as it sees its major cloud providers launching AI solutions en masse for consumers and enterprises."

It is not going to be easy for China and its companies to build up over 100 ExaFLOPS of compute power in about a year. The U.S. has imposed sanctions that have put a strain on China's tech supply chain, particularly in accessing AI and HPC CPUs and GPUs from companies like AMD, Intel, and Nvidia. Although China's No. 1 foundry, SMIC, can build fairly sophisticated application processors for smartphones, it does not have the capability to build something as advanced as Nvidia's H100 or Intel's Ponte Vecchio. As a result, experts believe that U.S. sanctions, especially those affecting access to top-tier AI and HPC chips, will continue to pose significant obstacles for China in the future.

Anton Shilov
Freelance News Writer

Anton Shilov is a Freelance News Writer at Tom’s Hardware US. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • bit_user
    Okay, so a single H100 delivers between 25.6 TFLOPS (PCIe; 350 W) and 33.5 TFLOPS (SXM; 700 W) @ fp64. So, 39 of the PCIe cards or 30 of the SXM cards would get you to 1 PFLOPS. Multiply that by 100k to reach 100 EFLOPS. I don't know what's Nvidia's annual production volume of H100's, but I'm pretty sure it's not more than 3M! Also, you could look at what proportion of the global production volume of HBM or DDR6 it would consume.

    The situation with the A100 is even worse, since it delivers only 5.2 TFLOPS or 9.7 TFLOPS. So, either 192 PCIe cards or 103 SXM cards per PFLOPS.

    Of course, I'm taking a leap by assuming they're talking about fp64 compute, but that's the standard for scientific & technical computing (i.e. HPC).

    Another interesting angle is to consider how much power that would consume. 3.9M * 350W = 1.37 Gigawatts, best case (assuming they mean fp64). If it's using their own processors, made on like 12 nm, then maybe like 10x as much?
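    Here's the same back-of-the-envelope math as a quick Python sketch, using the fp64 throughput and TDP figures quoted above (the specs and the 100 EFLOPS target are assumptions from this thread, not official figures):

        # Rough card count and GPU-only power draw for 100 EFLOPS of fp64
        target_tflops = 100 * 1e6  # 100 EFLOPS expressed in TFLOPS

        cards = {
            "H100 PCIe": (25.6, 350),  # (fp64 TFLOPS, watts per card)
            "H100 SXM":  (33.5, 700),
        }

        for name, (tflops, watts) in cards.items():
            n = target_tflops / tflops
            gw = n * watts / 1e9
            print(f"{name}: ~{n / 1e6:.1f}M cards, ~{gw:.2f} GW for the GPUs alone")
        # H100 PCIe: ~3.9M cards, ~1.37 GW; H100 SXM: ~3.0M cards, ~2.09 GW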
    Reply
  • jp7189
    bit_user said:
    Okay, so a single H100 delivers between 25.6 TFLOPS (PCIe; 350 W) and 33.5 TFLOPS (SXM; 700 W) @ fp64. So, 39 of the PCIe cards or 30 of the SXM cards would get you to 1 PFLOPS. Multiply that by 100k to reach 100 EFLOPS. I don't know what's Nvidia's annual production volume of H100's, but I'm pretty sure it's not more than 3M! Also, you could look at what proportion of the global production volume of HBM or DDR6 it would consume.

    The situation with the A100 is even worse, since it delivers only 5.2 TFLOPS or 9.7 TFLOPS. So, either 192 PCIe cards or 103 SXM cards per PFLOPS.

    Of course, I'm taking a leap by assuming they're talking about fp64 compute, but that's the standard for scientific & technical computing (i.e. HPC).

    Another interesting angle is to consider how much power that would consume. 3.9M * 350W = 1.37 Gigawatts, best case (assuming they mean fp64). If it's using their own processors, made on like 12 nm, then maybe like 10x as much?
    Considering the focus was on AI, I think it unlikely FP64 performance will be the benchmark. H100 is 8x faster @ FP16. 100 FP16 EFLOPS looks a lot more attainable.
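    Rough sketch of the same math at FP16 (taking the ~8x figure above as an assumption, and the ~3.9M-card fp64 estimate from earlier in the thread):

        # If FP16 is the benchmark, the card count shrinks roughly proportionally
        fp64_cards = 3.9e6      # H100 PCIe count from the fp64 estimate above
        fp16_speedup = 8        # assumed FP16-vs-fp64 throughput ratio

        fp16_cards = fp64_cards / fp16_speedup
        print(f"~{fp16_cards / 1e6:.2f}M cards for 100 FP16 EFLOPS")  # ~0.49M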
    Reply
  • ien2222
    bit_user said:
    ...

    Another interesting angle is to consider how much power that would consume. 3.9M * 350W = 1.37 Gigawatts, best case (assuming they mean fp64). If it's using their own processors, made on like 12 nm, then maybe like 10x as much?
    Over 1.21 Gigawatts!?!?!?!?! Great Scott!!

    (sorry, someone had to do it)
    Reply
  • HaninTH
    ien2222 said:
    Over 1.21 Gigawatts!?!?!?!?! Great Scott!!

    (sorry, someone had to do it)
    That's more than a bolt of lightning!
    Reply
  • zsydeepsky
    bit_user said:
    Another interesting angle is to consider how much power that would consume. 3.9M * 350W = 1.37 Gigawatts, best case (assuming they mean fp64). If it's using their own processors, made on like 12 nm, then maybe like 10x as much?
    Just considering the electricity requirements...
    On average, China increases its electricity generation by ~200,000 GWh per year. A 1.3 gigawatt system running 24x365 consumes 11,388 GWh of energy, and China can just add around 17.5 of them per year.
    so it's not a problem at all.
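    A quick sanity check of that arithmetic in Python (the ~200,000 GWh/year addition is the assumed figure above):

        # Annual energy of a 1.3 GW system vs. average yearly generation growth
        system_gw = 1.3
        system_gwh_per_year = system_gw * 24 * 365      # ~11,388 GWh
        added_gwh_per_year = 200_000                    # assumed average addition
        print(system_gwh_per_year, added_gwh_per_year / system_gwh_per_year)
        # ~11,388 GWh per system, ~17.5 such systems per year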
    Reply
  • bit_user
    zsydeepsky said:
    China can just add around 17.5 of them per year.
    so it's not a problem at all.
    You seem to assume China is able to keep ahead of demand for electricity; however, that demand will keep growing as they continue to push for electrification of their transportation infrastructure.

    Also, I just computed the GPU power. I didn't factor in the host machine, cooling, or infrastructure. So, probably multiply my estimate by at least 1.5.

    The other thing is that you took my best-case estimate of using H100, ignoring the part where I floated the notion of a 10x less-efficient accelerator on a node they could domestically mass-produce. Taken together, that would be something like 15x as high as the estimate you used. Do you still think it's not a problem, at all?

    Furthermore, since China's power-generation is predominantly based on fossil fuel, they'd have to keep scaling up their fuel importation/production to match.
    Reply
  • zsydeepsky
    bit_user said:
    You seem to assume China is able to keep ahead of demand for electricity; however, that demand will keep growing as they continue to push for electrification of their transportation infrastructure.

    Also, I just computed the GPU power. I didn't factor in the host machine, cooling, or infrastructure. So, probably multiply my estimate by at least 1.5.

    The other thing is that you took my best-case estimate of using H100, ignoring the part where I floated the notion of a 10x less-efficient accelerator on a node they could domestically mass-produce. Taken together, that would be something like 15x as high as the estimate you used. Do you still think it's not a problem, at all?

    Furthermore, since China's power-generation is predominantly based on fossil fuel, they'd have to keep scaling up their fuel importation/production to match.
    Read carefully: I said China can afford *17.5* of your new system, per year.
    Since we still have 2 years to go, China can afford ~35 of your systems.
    Even if you multiply the energy cost by 10 times, it's still feasible.

    Also, China has its own version of the H100/A100: Huawei's Ascend card, which was reported on Tom's Hardware as well:
    https://www.tomshardware.com/news/huaweis-gpu-reportedly-matches-nvidias-a100-report
    And Huawei has access to at least 7nm tech, which was proven by their new cell phone release. The energy efficiency will be much better than your estimates.

    Besides, about 50% of the new electricity generation China added in 2022 is non-fossil. So just from the newly added renewable energy, China can easily handle at least 35 / 2 (energy efficiency) / 2 (renewables) / 2 (heat management) = 4.4 computation systems.

    so, again, easy task.
    Reply
  • bit_user
    zsydeepsky said:
    Read carefully: I said China can afford *17.5* of your new system, per year.
    Since we still have 2 years to go, China can afford ~35 of your systems.
    Even if you multiply the energy cost by 10 times, it's still feasible.
    Assuming they have either the ability to scale up much faster than their current rate, or that their current rate of construction is creating plenty of spare capacity, sure. I expect neither is exactly true.

    zsydeepsky said:
    Also, China has its own version of the H100/A100: Huawei's Ascend card, which was reported on Tom's Hardware as well:
    https://www.tomshardware.com/news/huaweis-gpu-reportedly-matches-nvidias-a100-report
    It's not "H100/A100". The two are definitely not interchangeable, as should be quite clear from my first post. The article said A100.

    If it's exactly as powerful and efficient as the A100 (which seems unlikely, but let's assume so), then the total would be about 4.8 GigaWatts for just the GPUs. If we add 50% for hosts, infrastructure, and cooling (TBH, I think it's probably more like 70%, since IIRC cooling is typically a 30% multiplier on everything else), then the figure would be more like 7.2 GW.

    Not as extreme as 17.5, but still big enough that you probably can't just assume that much excess capacity will be available - would have to at least be planned for and budgeted.
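    For reference, a rough reconstruction of those numbers in Python (assuming A100 PCIe-class cards at 5.2 fp64 TFLOPS and ~250 W, i.e. the ~192 cards per PFLOPS estimated earlier; the exact card mix is an assumption):

        # GPU power for 100 EFLOPS of fp64 on A100-class cards, plus overhead
        cards_per_pflops = 192      # A100 PCIe estimate from earlier in the thread
        watts_per_card = 250        # assumed A100 PCIe board power
        pflops_needed = 100_000     # 100 EFLOPS

        gpu_gw = cards_per_pflops * pflops_needed * watts_per_card / 1e9
        print(round(gpu_gw, 1), round(gpu_gw * 1.5, 1), round(gpu_gw * 1.7, 1))
        # ~4.8 GW for GPUs only, ~7.2 GW with +50% overhead, ~8.2 GW with +70%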

    zsydeepsky said:
    And Huawei has access to at least 7nm tech, which was proven by their new cell phone release. The energy efficiency will be much better than your estimates.
    The A100 was made on TSMC N7. Do we have good information on how those nodes compare?

    zsydeepsky said:
    Besides, about 50% of the new electricity generation China added in 2022 is non-fossil. So just from the newly added renewable energy, China can easily handle at least 35 / 2 (energy efficiency) / 2 (renewables) / 2 (heat management) = 4.4 computation systems.
    Your original figure seemed to indicate total generation capacity added. I think you're double-counting.
    Reply
  • zsydeepsky
    bit_user said:
    The A100 was made on TSMC N7. Do we have good information on how those nodes compare?

    According to some Chinese analysis, just comparing transistor density, SMIC N+2 (which was used on the Kirin 9000s) is equivalent to TSMC N7P, Intel 7nm, or Samsung 6LPP.

    <Non English Language link removed by moderator>
    bit_user said:
    If it's exactly as powerful and efficient as the A100 (which seems unlikely, but let's assume so), then the total would be about 4.8 GigaWatts for just the GPUs. If we add 50% for hosts, infrastructure, and cooling (TBH, I think it's probably more like 70%, since IIRC cooling is typically a 30% multiplier on everything else), then the figure would be more like 7.2 GW.

    ...

    Your original figure seemed to indicate total generation capacity added. I think you're double-counting.

    I'll just use the 70% multiplier; then the total energy cost for the system to run 24x365 should be:
    4.8 GW × 170% × 24 × 365 = 71,481.6 GWh = 71.5 TWh
    That still doesn't exceed the generation capacity China adds: on *average* it adds 200 TWh of output *per year*, and it still falls within the *renewable* portion of those additions.
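    The same figure as a quick Python check (the 4.8 GW GPU load and 70% overhead are the assumptions discussed above):

        # Annual energy for a 4.8 GW GPU load with 70% overhead,
        # compared against the ~200 TWh/year average capacity addition
        total_gw = 4.8 * 1.7                     # GPUs plus hosts/cooling/infrastructure
        annual_twh = total_gw * 24 * 365 / 1000  # ~71.5 TWh per year
        print(round(annual_twh, 1), round(200 / annual_twh, 1))
        # ~71.5 TWh/year, roughly 2.8x headroom vs. annual additions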

    Besides, according to historical data, in a not so "average" year China can add way more capacity. For example, in 2021 China added 750 TWh of energy generation:

    https://www.statista.com/statistics/302233/china-power-generation-by-source/
    As a comparison, that's equivalent to almost 1/5 of the US's total annual energy generation (4,100 TWh), 127% of Germany's (588 TWh), or 78% of Japan's (967 TWh), so I understand why the energy cost of such a system seems unfeasible to people who are not familiar with the scale of China's industry.
    Reply