We're still on the heels of AMD's official announcement of its AI datacenter accelerator, the MI300X. It's certainly a processing force to be reckoned with - one AMD aims to use as a cudgel to dislodge Nvidia from its perch as the dominant player in the AI acceleration world. But increased performance does sometimes translate into higher power draw, even though each new architecture usually improves power efficiency (consuming less energy for the same unit of work). And AMD's OAM-based (OCP Accelerator Module) MI300X is certainly a power guzzler: at 750 W, it carries the highest rated TDP of any product in its form factor. Don't worry, though: the OAM specification allows for up to 1,000 W of deliverable power, so there's still room to scale performance further.
While 750 W is an egregious amount of power for any single piece of PC hardware (at least from an individual user's perspective), we do have to keep in mind that those watts are powering hardware that's much faster and more specialized than even AMD's most powerful graphics cards. For that wattage, AMD is offering what it claims to be the most performant accelerator for AI workloads, spanning both generative AI and Large Language Model (LLM) processing.
Considering how AMD managed to cram in 12 chiplets built across two fabrication processes (8x 5nm GPU dies and 4x 6nm I/O dies) for a total of 153 billion transistors, that claim may have some backing. Of course, there's also the matter of AMD running a 40-billion-parameter LLM (Falcon-40B) atop a single MI300X. Now that's impressive, especially considering AMD aims for the MI300X to scale up to eight accelerators in a single platform.
| | AMD MI300X | AMD MI300A | AMD MI250X | AMD RX 7900 XTX |
|---|---|---|---|---|
| CPU cores | 0 | 3x 8-core CCD (24 cores) [Zen 4] | - | - |
| GPU cores | 8x GCD (304 CUs) [CDNA 3] | 6x GCD (228 CUs) [CDNA 3] | (220 CUs) [CDNA 2] | (RDNA 3) |
| Addressable memory | 192 GB (8x 24 GB HBM3) | 128 GB (8x 16 GB HBM3) | 128 GB (8x 16 GB HBM2e) | 24 GB GDDR6 |
| Memory bandwidth | 5.2 TB/s | 5.2 TB/s | ~3.28 TB/s | 960 GB/s |
| Infinity Fabric bandwidth | 896 GB/s | 896 GB/s | 800 GB/s | - |
| Transistor count | 153 billion | 146 billion | ~58.2 billion | ~57 billion |
| TDP | 750 W | ? | 560 W | 355 W |
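As a quick sanity check on the MI300X memory figures in the table (a back-of-the-envelope sketch, not official AMD math), the capacity and per-stack bandwidth fall out of simple arithmetic:

```python
# Back-of-the-envelope check of the MI300X memory figures.
stacks = 8            # HBM3 stacks, per the table
gb_per_stack = 24     # GB per stack

total_gb = stacks * gb_per_stack          # 8 x 24 GB = 192 GB
aggregate_bw_tbs = 5.2                    # TB/s, per the table
per_stack_gbs = aggregate_bw_tbs * 1000 / stacks  # implied ~650 GB/s per stack

print(total_gb, per_stack_gbs)  # 192 650.0
```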
As we can see from the table above, AMD's focus on increased power efficiency hasn't been enough to offset the growing computing requirements of High Performance Computing (HPC) scenarios, which now include the processing of LLMs that seem to be springing up left and right. Increased performance requirements mean that even with AMD's latest power-saving technologies and techniques, and the latest fabrication technology from TSMC, a 190 W increase in the power envelope was still needed.
But that 190 W TDP increase (roughly 34% higher power draw) does translate into around 2.6 times the transistors being powered up compared to the MI250X - an impressive showing on efficiency gains, even before considering the MI300X's improved support for sparse algorithms (incredibly important for LLM and AI processing). And that's to say nothing of the gulf between AMD's compute accelerators and the company's flagship gaming GPU, the comparably puny RX 7900 XTX.
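The comparison above boils down to a couple of ratios; here is a rough sketch of the math, using the TDP and transistor figures from the table (a crude transistors-per-watt proxy, not a measured efficiency benchmark):

```python
# Rough efficiency math behind the MI300X vs. MI250X comparison.
tdp_mi300x, tdp_mi250x = 750, 560              # W, from the table
xtors_mi300x, xtors_mi250x = 153e9, 58.2e9     # transistors, from the table

power_increase = (tdp_mi300x - tdp_mi250x) / tdp_mi250x   # ~0.34 -> ~34% more power
transistor_ratio = xtors_mi300x / xtors_mi250x            # ~2.63x the transistors

# Transistors powered per watt, as a crude generation-over-generation proxy:
per_watt_gain = (xtors_mi300x / tdp_mi300x) / (xtors_mi250x / tdp_mi250x)

print(round(power_increase * 100, 1),  # 33.9
      round(transistor_ratio, 2),      # 2.63
      round(per_watt_gain, 2))         # 1.96
```

In other words, the MI300X packs nearly twice the transistors per watt of its predecessor, before any per-transistor architectural improvements are counted.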