Power Consumption of AI Workloads Approaches That of Small Country: Report

(Image credit: Intel)

Demand for AI is immense these days. French firm Schneider Electric estimates that AI workloads will consume around 4.3 GW of power in 2023, slightly less than the nation of Cyprus consumed in 2021 (4.7 GW). The company anticipates that power consumption of AI workloads will grow at a compound annual growth rate (CAGR) of 26% to 36%, which suggests that by 2028 AI workloads will consume 13.5 GW to 20 GW, more than Iceland consumed in 2021.
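
As a quick sanity check, the 2028 range follows from compounding the 2023 figure at the stated growth rates. The short Python sketch below is purely illustrative arithmetic on the numbers quoted in this article:

```python
# Compound Schneider Electric's 2023 AI power figure at the projected CAGR range.
ai_2023_gw = 4.3
years = 2028 - 2023

for cagr in (0.26, 0.36):
    projected_gw = ai_2023_gw * (1 + cagr) ** years
    print(f"At {cagr:.0%} CAGR: ~{projected_gw:.1f} GW of AI workloads in 2028")
```

That works out to roughly 13.7 GW and 20 GW, in line with the 13.5 GW to 20 GW range Schneider cites.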

Massive Power Requirements

In 2023, the total power consumption of all datacenters is estimated at 54 GW, with AI workloads accounting for 4.3 GW of that demand, according to Schneider Electric. Within those AI workloads, training accounts for 20% of the power and inference for 80%. That puts AI workloads at roughly 8% of total datacenter power consumption this year.

Looking ahead to 2028, Schneider projects that total datacenter power consumption will climb to 90 GW, with AI workloads consuming between 13.5 GW and 20 GW of that total. By 2028, then, AI could be responsible for around 15% to 20% of total datacenter power usage, a significant increase in AI's share over the five-year period. The split between training and inference is expected to shift slightly, with training consuming 15% of the power and inference accounting for 85%, according to Schneider Electric's estimates.
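
Those shares follow directly from the headline numbers. A minimal sketch, using only the GW figures quoted above:

```python
# AI workloads as a share of total datacenter power, per Schneider's figures.
datacenter_total_gw = {"2023": 54, "2028": 90}
ai_workloads_gw = {"2023": [4.3], "2028": [13.5, 20.0]}

for year, total in datacenter_total_gw.items():
    shares = ", ".join(f"{ai / total:.0%}" for ai in ai_workloads_gw[year])
    print(f"{year}: AI workloads at roughly {shares} of datacenter power")
```

Note that the upper 2028 bound works out to about 22%, slightly above the roughly 20% cited above.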

AI GPUs Get Hungrier

The escalating power consumption of AI datacenters is primarily attributed to the intensification of AI workloads, advances in AI GPUs and AI processors, and the increasing requirements of other datacenter hardware. For example, while Nvidia's A100 from 2020 consumed up to 400W, its H100 from 2022 consumes up to 700W. In addition to GPUs, AI servers also run power-hungry CPUs and network cards.

AI workloads, especially those associated with training, demand substantial computational resources, including specialized servers equipped with AI GPUs, specialized ASICs, or CPUs. The size of AI clusters, driven by the complexity and scale of AI models, is a major determinant of power consumption: larger AI models require more GPUs, which increases overall energy requirements. For instance, a cluster with 22,000 H100 GPUs occupies about 700 racks. An H100-based rack populated with eight HPE Cray XD670 GPU-accelerated servers has a total rack density of 80 kW. As a result, the whole cluster demands approximately 31 MW of power, excluding the energy required for additional infrastructure such as cooling, Schneider Electric notes.
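
A minimal sketch of where that ~31 MW figure comes from, assuming (for illustration only) that non-GPU server hardware roughly doubles the GPUs' own draw; the multiplier is an assumption made here, not a number from Schneider's paper:

```python
# Rough, illustrative estimate of IT load for a 22,000-GPU H100 cluster.
num_gpus = 22_000
gpu_power_w = 700        # peak power of a single H100, per the article
overhead_factor = 2.0    # assumed: CPUs, NICs, memory, fans roughly double GPU draw

gpu_load_mw = num_gpus * gpu_power_w / 1e6
cluster_load_mw = gpu_load_mw * overhead_factor
print(f"GPUs alone: ~{gpu_load_mw:.1f} MW, whole cluster: ~{cluster_load_mw:.0f} MW")
```

Under that assumption the GPUs alone draw about 15.4 MW and the cluster lands at roughly 31 MW, matching the figure above, before cooling and other facility overhead.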

These clusters and GPUs typically run at close to full capacity throughout training, so average energy usage is nearly the same as peak power consumption. The document specifies that the rack densities in substantial AI clusters vary between 30 kW and 100 kW, contingent on the quantity and model of the GPU.

Network latency also plays a crucial role in the power consumption of AI datacenters. A sophisticated network infrastructure is essential to support the high-speed data communication required by powerful GPUs during distributed training. The need for high-speed network cables and infrastructure capable of supporting speeds of up to 800 Gb/s further escalates overall energy consumption.

Because AI workloads rely on power-hungry ASICs, GPUs, CPUs, network cards, and SSDs, cooling poses a major challenge. Given the high rack densities and the immense heat generated during computation, effective cooling solutions are imperative to maintain optimal performance and prevent hardware malfunctions or failures. Meanwhile, air and liquid cooling methods are themselves 'expensive' in terms of power, which is why cooling contributes heavily to the overall power consumption of datacenters running AI workloads.

Some Recommendations

Schneider Electric does not expect power consumption of AI hardware to decrease anytime soon, and the company fully expects the power consumption of an AI rack to reach 100 kW or higher. As such, Schneider Electric has some recommendations for datacenters specializing in AI workloads.

In particular, Schneider Electric recommends transitioning from the conventional 120/208V distribution to 240/415V to better accommodate the high power densities of AI workloads. For cooling, a shift from air cooling to liquid cooling is advised to enhance processor reliability and energy efficiency, though immersion cooling might produce even better results. Racks should also be more capacious: at least 750 mm wide, with a static weight capacity greater than 1,800 kg.
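
To see why the higher distribution voltage matters, consider the per-rack current draw. A minimal sketch, assuming a 100 kW rack on a three-phase feed at unity power factor (assumptions made here for illustration, not part of Schneider's recommendations):

```python
import math

# Approximate per-rack current at two common distribution voltages,
# assuming a 100 kW rack, a three-phase feed, and unity power factor.
rack_power_w = 100_000

for line_to_line_v in (208, 415):
    amps = rack_power_w / (math.sqrt(3) * line_to_line_v)
    print(f"{line_to_line_v} V three-phase: ~{amps:.0f} A per rack")
```

That is roughly 278 A at 208 V versus 139 A at 415 V, which translates into thinner conductors and fewer circuits per rack.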

Anton Shilov
Freelance News Writer

Anton Shilov is a Freelance News Writer at Tom’s Hardware US. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • Findecanor
    Now this is one reason why I am against AI.

    AI is the new cryptocurrency. Making money at the expense of others and of the environment.
    And scams will be prolific.
  • wbfox
    Yay, stupid search engines that hallucinate and steal all the intellectual property of humanity to "learn" are going to be adding to the stupid impact of great products like crypto and nft. Water needs to be reserved so some idiot can get poorly written code or be told a summation of human authors' work, or be prevented from getting insurance because the AI doesn't think it's a good bet or etc, etc, etc, etc. Time to stop the stupid. We can do these things, but why? So google and amazon and 2000 other companies can steal and sell your info to then sell or ban you from things. This is the darkest, dumbest timeline.
  • TheOtherOne
    But we can always put the blame on third world countries for the major cause of Carbon Pollution and Global Warming!

    Even tho WE "consume" way more electricity (sometimes even 5x/10x more) that is mostly generated by fossil fuel sources.
  • bit_user
    Findecanor said:
    Now this is one reason why I am against AI.

    AI is the new cryptocurrency. Making money at the expense of others and of the environment.
    Yes, but... one should also consider that if AI is used to replace human workers, what's the carbon footprint of those humans. If we're being honest, we really should look at both sides of it.

    Even if it's not replacing humans, the economic benefits of AI imply that it's delivering greater efficiency or else people & businesses wouldn't be willing to pay for it. That typically means less of some sort of resource gets used in the process. So, unless we're talking about AI being used by fossil fuel companies to locate and extract more coal, oil, or gas, there should be some efficiency upside to using AI.

    Then, the real question becomes how to make sure that upside is greater than the negative impact it has. And that brings us back to carbon pricing. Yes, I know it'll probably only ever happen long after renewable energy becomes dominant (i.e. due to the influence of fossil fuel lobbyists), but carbon pricing is ultimately the way to help ensure everything in the economy that still uses carbon is delivering more good than bad.
  • bit_user
    The document specifies that the rack densities in substantial AI clusters vary between 30 kW and 100 kW
    Wow... the notion of dissipating 100 kW in a rack is pretty mind-blowing.

    One thing that's kind of sad is that AI chips are being run far beyond their window of good efficiency. I know it's not exactly analogous, but this article showed you could get about 77% of the performance from a RTX 4090 at 50% of the power:
    https://www.tomshardware.com/news/improving-nvidia-rtx-4090-efficiency-through-power-limiting
  • bit_user
    wbfox said:
    We can do these things, but why? So google and amazon and 2000 other companies can steal and sell your info to then sell or ban you from things.
    There are lots of other things it's good for. Improved medical diagnosis, improving crop yields, and even designing more efficient hardware.

    It's just a tool. Whether it's good or bad depends on how we use it.
  • Findecanor
    bit_user said:
    Yes, but... one should also consider that if AI is used to replace human workers, what's the carbon footprint of those humans. If we're being honest, we really should look at both sides of it.
    Are you seriously suggesting that we should euthanise the unemployed?!
  • bit_user
    Findecanor said:
    Are you seriously suggesting that we should euthanise the unemployed?!
    Not necessarily. There are two ways you could consider the carbon footprint of the human workers. One is to consider their lifetime carbon footprint. The other is merely the difference between their baseline footprint (i.e. just sitting at home and sustaining themselves) and the additional carbon used by them to commute and sit in an office which needs to be climate-controlled and have ample space, lighting, cleaning, plumbing, workstations, etc. for them to do their job.
  • Co BIY
    Equivalent to the electrical use of Cyprus means it's essentially a rounding error globally. That's a small island in a very temperate climate.

    May grow to the size of Iceland's use, again a small country with a total population the size of a smallish city.

    The cooling and energy requirements in the actual building housing the racks are an interesting challenge that I'm sure Schneider Electric can solve for a tidy but not unreasonable sum.

    What is the ideal amount of CO2 in the atmosphere? Not sure we know. It isn't zero.
  • Co BIY
    bit_user said:
    One thing that's kind of sad is that AI chips are being run far beyond their window of good efficiency. I know it's not exactly analogous, but this article showed you could get about 77% of the performance from a RTX 4090 at 50% of the power:

    It's the high price of the processors that results in them being run beyond their electrical efficiency sweet spots. Perhaps when the producers are not as capacity-constrained they can market a bigger chip but run it slower and cooler for the same performance.

    When the processor/accelerator costs $40,000 then the electrical expense is negligible and will be disregarded.

    And ... This is how it should be - We have many more resources to produce electricity than we have to produce semiconductors.

    And ... they are much more widely (and equitably) distributed. If we required increased electrical efficiency out of the processors rather than allow users to run them at the max it would increase the power of the Fab giants at the expense of many others.