Argonne National Laboratory and Intel announced on Thursday that installation of the 10,624 blades for the Aurora supercomputer has been completed and that the system will come online later in 2023. The machine uses tens of thousands of Xeon Max 'Sapphire Rapids' processors with HBM2E memory as well as tens of thousands of Data Center GPU Max 'Ponte Vecchio' compute GPUs to achieve performance of over 2 FP64 ExaFLOPS.
The HPE-built Aurora supercomputer consists of 166 racks with 64 blades per rack, for a total of 10,624 blades. Each Aurora blade is based on two Xeon Max CPUs with 64 GB of on-package HBM2E memory as well as six Intel Data Center GPU Max 'Ponte Vecchio' compute GPUs. These CPUs and GPUs are cooled by a custom liquid-cooling system.
In total, the Aurora supercomputer packs 21,248 general-purpose CPUs with over 1.1 million high-performance cores, 19.9 petabytes (PB) of DDR5 memory, 1.36 PB of HBM2E memory attached to the CPUs, and 63,744 compute GPUs designed for massively parallel AI and HPC workloads with 8.16 PB of onboard HBM2E memory. The blades are interconnected using HPE's Slingshot fabric, designed specifically for supercomputers.
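The headline totals follow directly from the per-rack and per-blade counts. A quick back-of-the-envelope check (the 128 GB of HBM2E per Ponte Vecchio GPU is an assumption based on Intel's Max 1550 spec, not a figure stated above):

```python
# Back-of-the-envelope check of Aurora's published totals.
racks, blades_per_rack = 166, 64
cpus_per_blade, gpus_per_blade = 2, 6
cpu_hbm_gb = 64    # 64 GB HBM2E per Xeon Max CPU (stated above)
gpu_hbm_gb = 128   # assumed 128 GB per Ponte Vecchio GPU (not stated above)

blades = racks * blades_per_rack        # 10,624 blades
cpus = blades * cpus_per_blade          # 21,248 CPUs
gpus = blades * gpus_per_blade          # 63,744 GPUs
cpu_hbm_pb = cpus * cpu_hbm_gb / 1e6    # ~1.36 PB of CPU-attached HBM2E
gpu_hbm_pb = gpus * gpu_hbm_gb / 1e6    # ~8.16 PB of GPU HBM2E

print(blades, cpus, gpus, round(cpu_hbm_pb, 2), round(gpu_hbm_pb, 2))
# → 10624 21248 63744 1.36 8.16
```

The CPU- and GPU-memory totals line up with the article's 1.36 PB and 8.16 PB figures, which supports the assumed 128 GB-per-GPU configuration.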
"Aurora is the first deployment of Intel's Max Series GPU, the biggest Xeon Max CPU-based system, and the largest GPU cluster in the world," said Jeff McVeigh, Intel's corporate vice president and general manager of the Super Compute Group. "We are proud to be part of this historic system and excited for the ground-breaking AI, science, and engineering Aurora will enable."
The Aurora supercomputer uses an array of 1,024 storage nodes built entirely from solid-state storage devices, providing 220 PB of capacity and 31 TB/s of aggregate bandwidth. That will be handy for workloads involving massive datasets, such as nuclear fusion research, scientific engineering, physical simulations, the search for cures, weather forecasting, and other tasks.
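Dividing the aggregate figures by the node count gives a rough sense of what each storage node contributes (derived values only; the actual per-node specifications are not stated above and may differ):

```python
# Rough per-node figures for Aurora's storage array,
# derived from the aggregate numbers in the article.
nodes = 1024
capacity_pb = 220        # total solid-state capacity
bandwidth_tb_s = 31      # total aggregate bandwidth

per_node_capacity_tb = capacity_pb * 1000 / nodes        # ~215 TB per node
per_node_bandwidth_gb_s = bandwidth_tb_s * 1000 / nodes  # ~30 GB/s per node

print(round(per_node_capacity_tb, 1), round(per_node_bandwidth_gb_s, 1))
# → 214.8 30.3
```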
While the installation of the Aurora blades has been completed, the supercomputer has yet to pass acceptance testing. When it does and comes online later this year, it promises to reach a theoretical peak performance beyond 2 ExaFLOPS, which would make it the first supercomputer on the Top500 list to achieve that level of performance.
"While we work toward acceptance testing, we are going to be using Aurora to train some large-scale open source generative AI models for science," said Rick Stevens, Argonne National Laboratory associate laboratory director. "Aurora, with over 60,000 Intel Max GPUs, a very fast I/O system, and an all solid-state mass storage system, is the perfect environment to train these models."
While the Aurora supercomputer has yet to pass acceptance tests and ANL has yet to submit its performance results to Top500.org, Intel took the opportunity to share the performance advantages its hardware holds over competing solutions from AMD and Nvidia.
According to Intel, preliminary tests with the Max Series GPUs show they excel in 'real-world science and engineering workloads,' delivering twice the performance of AMD's Instinct MI250X GPUs on OpenMC and scaling nearly perfectly across hundreds of nodes. In addition, Intel says that its Xeon Max Series CPU offers a 40% performance advantage over its rivals in numerous real-world HPC applications, including HPCG, NEMO-GYRE, Anelastic Wave Propagation, BlackScholes, and OpenFOAM.