AMD crafts custom EPYC CPU with HBM3 memory for Microsoft Azure – CPU with 96 Zen 4 cores and 450GB of HBM3 may be repurposed MI300C, four chips hit 7 TB/s [Updated]


Update 11/22/2024 6:45am PT: AMD has sent over additional details about the processor:

"The AMD EPYC 9V64H processor is an HPC focused processor that utilizes 96 Zen 4 cores with 128GB of HBM3 on package. It is designed to outperform the competition when it comes to memory bandwidth and memory latency sensitive workloads and was developed in close collaboration with Microsoft Azure. Our chiplet architecture has enabled us to easily build the EPYC 9V64H. The EPYC 9V64H is based on the SH5 socket that is used for the AMD Instinct MI300X and AMD Instinct MI300A accelerators."

Original Article:

Microsoft has announced its latest high-performance computing (HPC) Azure virtual machines, which are powered by a custom AMD CPU that may have once been called MI300C.


The HBv series of Azure VMs are focused on delivering high amounts of memory bandwidth, an important specification for HPC; Microsoft calls it the “biggest HPC bottleneck.” Previously, Microsoft had used Milan-X and Genoa-X server CPUs with AMD’s 3D V-Cache to provide this extra bandwidth, but for the latest HBv5 VMs, Microsoft clearly wanted something even more performant.

The custom AMD CPU used for HBv5 VMs leverages HBM3, usually the memory of choice for the latest data center-class GPUs, such as AMD’s MI300X. With a bandwidth of 6.9TB/s from four of the chips in a single VM, the VMs are almost nine times faster than the Genoa-X CPUs that Microsoft offers in HBv4 VMs, and nearly 20 times faster than Milan-X chips in HBv3 VMs.
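To put those multipliers in perspective, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the function and variable names are made up for this example, and because the article says "almost nine times" and "nearly 20 times," the older-VM figures it produces are approximate lower bounds rather than official specifications.

```python
# Figures quoted in the article.
VM_BANDWIDTH_TBS = 6.9   # aggregate HBM3 bandwidth of one HBv5 VM
CPUS_PER_VM = 4          # custom AMD CPUs per HBv5 VM


def per_cpu_bandwidth(total_tbs, cpus):
    """Even split of the VM's aggregate bandwidth across its CPUs."""
    return total_tbs / cpus


def implied_baseline(total_tbs, speedup):
    """Bandwidth an older VM would have if HBv5 were exactly `speedup` times faster."""
    return total_tbs / speedup


per_cpu = per_cpu_bandwidth(VM_BANDWIDTH_TBS, CPUS_PER_VM)
# "Almost nine times" and "nearly 20 times" mean the real ratios are a bit
# lower, so these are approximate floors, not measured values.
hbv4_floor = implied_baseline(VM_BANDWIDTH_TBS, 9)
hbv3_floor = implied_baseline(VM_BANDWIDTH_TBS, 20)

print(f"Per CPU: {per_cpu:.3f} TB/s")
print(f"HBv4 (Genoa-X) implied: at least ~{hbv4_floor:.2f} TB/s")
print(f"HBv3 (Milan-X) implied: at least ~{hbv3_floor:.2f} TB/s")
```

Assuming the bandwidth is shared evenly, each of the four chips contributes a little over 1.7TB/s.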

When paired with a CPU, the HBM3 fulfills a role similar to that of 3D V-Cache. Instead of expanding the pool of L3 cache, however, it effectively adds a massive L4 cache with even greater bandwidth and presumably much worse latency. That latency penalty matters less in certain types of workloads.

Each HBv5 VM gets four of these custom AMD CPUs, and with all the bells and whistles, a single HBv5 VM offers 450GB of HBM3, 352 Zen 4 cores that clock up to 4GHz, and double the normal Infinity Fabric bandwidth that’s available on regular Epyc CPUs. SMT (hyperthreading) has, however, been disabled. The VMs also have 800Gb/s of Nvidia’s Quantum-2 InfiniBand for network switching.

At 352 cores across four CPUs, that’s 88 cores for each, though it’s likely not every core on the processor is exposed to the VM. A CCD has eight cores if it’s Zen 4 or 16 if it’s Zen 4c, so an 88-core CPU would use either 11 Zen 4 CCDs or six Zen 4c CCDs with eight cores on one CCD disabled. It’s more probable that the CPU has 96 fully functional cores, with eight of them reserved for operating the VM, perhaps in an orchestration or hypervisor role.
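The core-count reasoning above can be checked with a short Python sketch. The helper name `ccd_layout` and the dictionary layout are invented for this illustration; the inputs are the figures quoted in the article plus the 96-core total from AMD's statement, and the 12-CCD figure is simply 96 divided by eight, not something AMD has confirmed.

```python
VM_CORES = 352       # cores exposed to one HBv5 VM
CPUS_PER_VM = 4
CORES_PER_CCD = {"Zen 4": 8, "Zen 4c": 16}
AMD_STATED_CORES = 96  # per AMD's statement on the EPYC 9V64H

cores_per_cpu = VM_CORES // CPUS_PER_VM


def ccd_layout(exposed_cores, cores_per_ccd):
    """Smallest whole number of CCDs that covers `exposed_cores`.

    Returns (ccd_count, physical_cores, unexposed_cores).
    """
    ccds = -(-exposed_cores // cores_per_ccd)  # ceiling division
    physical = ccds * cores_per_ccd
    return ccds, physical, physical - exposed_cores


layouts = {
    name: ccd_layout(cores_per_cpu, per_ccd)
    for name, per_ccd in CORES_PER_CCD.items()
}

# If all 96 stated cores are standard Zen 4, that implies 12 CCDs,
# with the difference held back from the VM.
implied_zen4_ccds = AMD_STATED_CORES // CORES_PER_CCD["Zen 4"]
reserved_cores = AMD_STATED_CORES - cores_per_cpu

print(cores_per_cpu, layouts, implied_zen4_ccds, reserved_cores)
```

Either layout arrives at 88 usable cores; the 96-core reading simply leaves eight cores per CPU outside the VM.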

This “custom” AMD CPU might not be so custom either, as it sounds quite a bit like last year’s rumored MI300C chip. This CPU was expected to essentially be an MI300A APU but equipped exclusively with Zen 4 CCDs instead of CDNA 3 graphics, allowing for a 96-core CPU with HBM3. MI300A’s CPU cores clock up to 3.7GHz, not far off from the CPU used for HBv5, indicating that the custom Azure processor and MI300C may be one and the same.

However, while the HBv5 CPU may not be custom on a technical level, it’s nevertheless Microsoft’s exclusive CPU. “It is only available on Azure,” Microsoft engineer Glenn Lockwood said on Bluesky, responding to a user wondering whether the AMD CPU would ever become available as a regular Epyc CPU.

If the HBv5 processor was formerly MI300C, AMD may have initially wanted to sell it to the general public but had trouble finding a market for it, according to AMD memory engineer Phil Park.

“Why haven’t we seen EPYC+HBM sooner? EPYC has been focused on high volume markets, which is why you don’t see EPYC with more than 2 sockets,” Park posted on Bluesky. “You can’t swap out your DDR5 controllers and add HBM controllers/stacks and call it a day. HBM forces certain design choices (e.g., every HBM3 stack requires sixteen 64-bit channels).

“Flexibility: with HBM, you can’t upgrade capacity or have lower cost versions with fewer channels populated,” he added. “Generally, CPUs don’t require that much bandwidth.”
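Park's point about channel counts can be made concrete with a small Python sketch. Only the sixteen 64-bit channels per stack come from his post; the stack count and per-pin data rate below are hypothetical values chosen for illustration, since the article states neither for the Azure part, and the function names are likewise made up here.

```python
CHANNELS_PER_STACK = 16  # from Park's post
CHANNEL_WIDTH_BITS = 64  # from Park's post


def stack_width_bits(channels=CHANNELS_PER_STACK, width=CHANNEL_WIDTH_BITS):
    """Total data-bus width of one HBM3 stack, in bits."""
    return channels * width


def peak_bandwidth_gbs(stacks, pin_rate_gbps):
    """Theoretical peak in GB/s: total bus width times per-pin rate, over 8 bits per byte."""
    return stacks * stack_width_bits() * pin_rate_gbps / 8


# Hypothetical inputs, NOT taken from the article.
EXAMPLE_STACKS = 8
EXAMPLE_PIN_RATE_GBPS = 5.2

example_peak = peak_bandwidth_gbs(EXAMPLE_STACKS, EXAMPLE_PIN_RATE_GBPS)

print(f"Bus width per stack: {stack_width_bits()} bits")
print(f"Example peak: {example_peak:.1f} GB/s")
```

A 1,024-bit interface per stack is far wider than a DDR5 channel, which helps explain why HBM cannot simply be dropped into an existing Epyc memory layout.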

This explanation lines up with the thus-far short history of HBM-equipped CPUs. Intel has already launched HBM-infused CPUs based on Sapphire Rapids, called Xeon Max, which are used in the Aurora supercomputer and are also generally available.

However, Intel confirmed last year there won’t be a version of Xeon Max based on Emerald Rapids, and it’s unclear if Granite Rapids will get a Xeon Max variant either, which may indicate they’ve not been a huge commercial success. The pragmatic decision for AMD may have been to secure a deal with Microsoft and focus MI300C production towards Azure.

Matthew Connatser

Matthew Connatser is a freelance writer for Tom's Hardware US. He writes articles about CPUs, GPUs, SSDs, and computers in general.

  • bolweval
    But does it run Team Fortress 2?
  • newtechldtech
    Where is our custom desktop Ryzen 9950X with 64GB HBMx please AMD ? Or even better with included GPU ?
  • User of Computers
    newtechldtech said:
    Where is our custom desktop Ryzen 9950X with 64GB HBMx please AMD ? Or even better with included GPU ?
    Strix Halo is the closest we're going to get to that pipe dream, I'm afraid.
  • Makaveli
    newtechldtech said:
    Where is our custom desktop Ryzen 9950X with 64GB HBMx please AMD ? Or even better with included GPU ?
    lol you got 5k to spend that.
  • Stomx
    Would be great to see the benchmarks of this interesting processor like those this site or Phoronix are doing. Just two such chips would bring you pretty powerful workstation delivering 7 million core-hours per year and if your app is memory-bound one then much more than that.
    If you are currently using your University supercomputer you know that this is typically way more than you would get from the most University supercomputer centers which became extremely busy lately. And with 8 such processors you would 9 out of 10 of your CPU time requests just completely forget larger supercomputers
  • LibertyWell
    Microsoft, Apple, and Google are the last companies who should have access to this type of tech and yet they are first in line.

    Humanity is good and well f——-
  • User of Computers
    LibertyWell said:
    Microsoft, Apple, and Google are the last companies who should have access to this type of tech and yet they are first in line.
    Without companies like these, products such as this have no reason to exist.
  • User of Computers
    Stomx said:
    And with 8 such processors you would 9 out of 10 of your CPU time requests just completely forget larger supercomputers
    Depends on what you're doing.
  • redgarl
    newtechldtech said:
    Where is our custom desktop Ryzen 9950X with 64GB HBMx please AMD ? Or even better with included GPU ?
    If you want to shove 5000$ for it, go ahead...

    However, AMD knows it is not going to be a profitable endeavor since nobody from the Client Compute Group is going to buy one of those.
  • newtechldtech
    redgarl said:
    If you want to shove 5000$ for it, go ahead...

    However, AMD knows it is not going to be a profitable endeavor since nobody from the Client Compute Group is going to buy one of those.

    it would be around $1500.

    And AMD does not care about delivering such tech , because what they have are selling like hotcakes , nothing to do about cost.