HPE flaunts El Capitan supercomputer blade with AMD's Instinct MI300A — projected to be world's fastest when finished this year

(Image credit: DOE)

A server blade from the upcoming El Capitan supercomputer was shown off at the ISC High Performance event in Hamburg, Germany. The server blade's front cover was stripped off, revealing all of the internal components — including the extremely potent AMD Instinct MI300A APU.

This is the second time we've seen El Capitan's MI300A compute chips in all their glory. The blade, dubbed the HPE Cray Supercomputing EX255a accelerator blade, uses a single-slot 1U chassis, yet it manages to pack a whopping eight MI300A chips. That's a densely packed setup, featuring copper cooling blocks with copper cooling pipes linking everything together.

Naturally, each blade relies on liquid cooling to deal with the immense amount of heat generated by the eight APUs. Each MI300A APU has a TDP rating of 550W, with a peak power rating of 760W. That means the blade's cooling needs to handle up to 6,080W at peak, or a more manageable 4,400W when all eight APUs run at their TDP.
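The blade-level figures follow directly from the per-APU ratings. Here is a minimal sketch of that arithmetic; the function name `blade_heat_load` is hypothetical, and the calculation counts only the eight APUs, ignoring VRMs, network hardware, and any SSD, which the article does not quantify.

```python
# Per-APU power ratings quoted in the article, in watts.
TDP_W = 550
PEAK_W = 760
APUS_PER_BLADE = 8


def blade_heat_load(per_apu_watts: int, apus: int = APUS_PER_BLADE) -> int:
    """Return the APU-only heat load of one blade in watts.

    Assumption: all electrical power drawn by the APUs ends up as heat,
    and no other blade component is counted.
    """
    return per_apu_watts * apus


tdp_load = blade_heat_load(TDP_W)
peak_load = blade_heat_load(PEAK_W)

print(f"All APUs at TDP:  {tdp_load:,} W")
print(f"All APUs at peak: {peak_load:,} W")
```

The real cooling budget would be somewhat higher once the rest of the board is included, so treat these as lower bounds for each scenario.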

El Capitan Server Blade — (Image credit: ComputerBase)

Each blade packs two 4-socket node cards (boards) with four MI300A APUs per card, for eight APUs in total. Each blade can also include one additional NVMe SSD if needed, and carries four to eight injection ports that connect to El Capitan's HPE Slingshot-11 networking system.

Once it's deployed later this year, El Capitan is poised to become the world's fastest supercomputer — dethroning the AMD-based Frontier supercomputer in the process. It's worth noting that the Intel-based Aurora supercomputer currently only reaches half of its targeted performance (at least in LINPACK).

At the heart of the new machine is AMD's bleeding-edge MI300A APU, one of the most advanced processors in the world right now. Each MI300A packs 24 Zen 4 CPU cores and a beefy CDNA3-based GPU with 228 Compute Units and 14,592 Streaming Processors. The CPU and GPU share eight stacks of HBM3 memory with 128GB of total capacity. The MI300A is the largest chip AMD has ever produced, with nine linked compute dies (three CPU, six GPU) in total. AMD uses TSMC's 5nm process for the CPU and GPU dies, which are 3D-stacked on top of 6nm base I/O dies.

The El Capitan supercomputer is being built by Hewlett Packard Enterprise (HPE) and will be based on its Shasta architecture and Slingshot-11 networking subsystem. A similar combination is used in other supercomputers, including Frontier. El Capitan is expected to be the fastest supercomputer in the world when it debuts, delivering over 2 exaflops of computing power. For reference, Frontier delivers 1.2 exaflops, which would make El Capitan roughly 1.7 times as fast as its older counterpart.
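The comparison with Frontier is a simple ratio. A minimal sketch, assuming El Capitan lands at exactly 2.0 exaflops (the article only says "over 2", so this is a lower bound) and using a hypothetical helper named `speedup`:

```python
# Performance figures quoted in the article, in exaflops.
FRONTIER_EXAFLOPS = 1.2
# Assumption: the article says "over 2 exaflops"; 2.0 is used as a lower bound.
EL_CAPITAN_EXAFLOPS = 2.0


def speedup(new_exaflops: float, old_exaflops: float) -> float:
    """Return how many times faster the new system is than the old one."""
    return new_exaflops / old_exaflops


ratio = speedup(EL_CAPITAN_EXAFLOPS, FRONTIER_EXAFLOPS)
print(f"El Capitan vs. Frontier: at least {ratio:.2f}x")
```

Any result above the 2.0 exaflops floor pushes the ratio higher, so about 1.67x is the minimum implied by these figures.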

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

4 comments from the forums

SDBud1

Blades PRODUCE heat, they don't use it.
DS426

Interesting how blades in the rest of the datacenter world started to kind of disappear, or at least lose favor to full 1U and 2U racks, yet the form factor is resilient and certainly holds appeal, especially in server applications that require high density over almost anything else. Dissipating this amount of heat in this volume of space is impressive, although certainly costly and still an engineering challenge. I also wonder how much phase-change is utilized??
jp7189

Can anyone explain to me the seemingly crazy routing of the copper heatpipes in that pic?
eX_Arkangel

I assume it's for cooling the rest of the MB components (VRMs, other ICs, controllers, etc.).
Try to look at the entire blade as a single gigantic current dGPU with a 3-4 slot cooler, which often has dedicated heatpipes for memory and VRM cooling.
Anyway, that's my "guess".