AMD MI300X Guzzles Power, Rated for 750 Watts

AMD MI300X
(Image credit: AMD)

We're still on on the heels of AMD's official announcement of its AI datacenter accelerator, the MI300X.  It's certainly a processing force to be reckoned with - one which AMD aims to use as a cudgel to try and dislodge Nvidia from its perch as the dominant player in the AI acceleration world. But increasing performance does sometimes translate into higher power draws, despite each new architecture usually improving power efficiency (consuming lees energy for the same unit of work). And AMD's OAM-based (OCP Accelerator Module) - the MI300X - is certainly a power guzzler: at 750 W, it's actually the product with the highest-rated TDP ever in its form-factor. Don't worry, though: the specifications for OAM solutions go all the way up to 1000 W of deliverable power, so there's still room to scale performance further.

AMD footnotes

Hidden in plain sight in AMD's footnotes is a measure of the MI300X's power consumption. (Image credit: AMD via Videocardz)

While 750 W is an egregious amount of power to be consumed by any individual piece of PC hardware (at least from the perspective of an individual), we do have to keep in mind that those watts are powering hardware that's much faster and more specialized than even AMD's most powerful graphics cards. For that wattage, AMD is offering what it claims to be the most performant accelerator for AI-related workloads (both in generative AI and Large Language Model [LLM] processing).

Considering how AMD managed to cram 12 chiplets built across two fabrication processes (8x 5nm [GPU] and 4x 6nm nodes [I/O die] for a total of 153 billion transistors, that claim may have some backing. Of course, there's also the matter that AMD managed to run a 40-billion parameter LLM model (Falcon 40-B) atop a single MI300X. Now that's impressive, especially considering AMD aims for the MI300X to scale to up to eight accelerators in a single package.

Swipe to scroll horizontally
The AMD MI300X... and the others
Row 0 - Cell 0 AMD MI300XAMD MI300AAMD MI250XAMD RX 7900 XTX
CPU cores03x 8-core CCD (24-cores) [Zen 4]--
GPU cores8x GCD (304 CUs) [CDNA 3]6x GCD (228 CUs) [CDNA 3](220 CUs) [CDNA 2](RDNA 3)
Adressable Memory192 GB (8x 24 GB HBM3)128 GB (8x 16 GB HBM3)128 GB (8x 16 GB HBM2e)24 GB GDDR6
Memory Bandwidth5.2 TB/s5.2 TB/s~ 3.28 TB/s960 GB/s
Infinity Fabric Bandwidth896 GB/s896 GB/s800 GB/s-
Transistor Count153 Billion146 Billion~ 58.2 Billion~ 57 Billion
TDP750 W?560 W355 W

As we see from the table above, AMD's focus on increased power efficiency hasn't been enough to offset the increasing computing requirements for High Performance Computing (HPC) scenarios, which now include the processing of LLM models that seem to be springing left and right. Increased performance requirements mean that even with AMD's latest power-saving technologies, techniques, and the latest fabrication technology from TSMC, there was still the need for a 190 W power envelope increase. 

But that 190 W TDP increase (around 33% higher power draw) does translate into roughly three times the transistors being powered up compared to the MI250X - an impressive showing on efficiency gains, even without considering the MI300X's improved support for sparse algorithms (incredibly important for LLM and AI processing). That's not to say anything about the difference between AMD's compute accelerators and the company's flagship gaming GPU the comparably puny RX 7900 XTX.

Francisco Pires
Freelance News Writer

Francisco Pires is a freelance news writer for Tom's Hardware with a soft side for quantum computing.

  • gg83
    Well 3x the transistor count equates to 3x the wattage. Compared to the 7900xtx.
    Reply
  • Metal Messiah.
    So the power requirements have gone up by 34-50% in a single generation. Doesn't come as a surprise though.

    At the same time, this power increase was to be expected considering the design of the chip itself and the performance it has on offer. The chip delivers an 8x performance boost in AI workloads while being 5 times more efficient.

    A recent Gigabyte Server roadmap also showcased how CPUs, GPUs & APUs are getting close to the 1KW power barrier. AMD certainly has the most power-hungry chip of all but the red team has also invested in a host of chiplet and packing technologies that allow them to significantly reduce power requirements.

    Having chips stacked and packaged together also helps to reduce the relative Bits/Joule cost. So far, advanced packaging has alone provided a 50x reduction in communication power compared to when these chips were all stand-alone and put far to each other across the board.

    Though efficiency is not a linear progression-like performance. Based on CPU & GPU efficiency trends, we are starting to see the progress flattening out so while achieving "Zettascale" performance in the next 10 years or so will be achievable, it will come at a significant efficiency cost.
    https://i.imgur.com/wcZijro.jpg
    Reply
  • bit_user
    I see at least 2 trends driving this:
    The more expensive new process nodes get per mm^2, the more incentive there's going to be to run it at higher clocks.
    Chiplet technology has unlocked a path towards transistor counts increasing faster than process node density increases, which naturally means it's going to take more power to operate them.
    Reply
  • bit_user
    Metal Messiah. said:
    Based on CPU & GPU efficiency trends, we are starting to see the progress flattening out so while achieving "Zettascale" performance in the next 10 years or so will be achievable, it will come at a significant efficiency cost.
    HPC facilities are ultimately somewhat constrained in how much power they can consume and how much heat they can dissipate. Lisa Su was clear about the need to find technological innovations to improve efficiency, since the default trajectory would consume far too much power. IIRC, the difference is like orders of magnitude.
    Reply
  • The Historical Fidelity
    Admin said:
    AMD's latest product for the AI accelerator market, the MI300X, is a tour de force for the company's High Performance Computing (HPC) chops, but with great performance also come great power requirements.

    AMD MI300X Guzzles Power, Rated for 750 Watts : Read more
    Hi just want to point out an error in the article. In the comparison table, the memory type and memory bandwidth for the 7900xtx are wrong. It uses GDDR6 not GDDR5 and bandwidth is close to 1TB/s not 384GB/s
    Reply
  • prtskg
    Compared to previous gen, 250X, TDP has increased 50% while transistors has increased just north of 2.62 times. Let's see how performance increases.
    Reply
  • Kamen Rider Blade
    That 750 Watts is powering 8x GPU Chiplets shiped & 192 GiB of HBM3 on a single large Tile.

    That's not half bad IMO.
    Reply
  • Metal Messiah.
    The Historical Fidelity said:
    Hi just want to point out an error in the article. In the comparison table, the memory type and memory bandwidth for the 7900xtx are wrong. It uses GDDR6 not GDDR5 and bandwidth is close to 1TB/s not 384GB/s

    +1 for this.
    Reply
  • gg83
    bit_user said:
    HPC facilities are ultimately somewhat constrained in how much power they can consume and how much heat they can dissipate. Lisa Su was clear about the need to find technological innovations to improve efficiency, since the default trajectory would consume far too much power. IIRC, the difference is like orders of magnitude.
    I like the idea of turning the heat back to electricity to run the stuff like fans and led's. Who cares about power draw if none of it is wasted right?
    Reply
  • bit_user
    gg83 said:
    I like the idea of turning the heat back to electricity to run the stuff like fans and led's. Who cares about power draw if none of it is wasted right?
    It's pretty difficult to recover energy from waste heat, or it'd be a lot more common. I'd be surprised if they could even achieve 5% efficiency.

    The only way that waste heat isn't actually wasted is if you use it to heat a swimming pool or an eel farm, where you'd have needed some form of heating, regardless.
    https://www.tomshardware.com/news/japanese-data-center-starts-eel-farming-side-hustlehttps://www.tomshardware.com/news/heata-server-hot-water-trial-uk
    Reply