Beyond Rome: AMD's EPYC and Radeon to Power World's Fastest Exascale Supercomputer

AMD announced today that it has been selected to power Frontier, which is set to be the world's fastest exascale-class supercomputer when it comes online in 2021. The new supercomputer, which will be built with Cray's Shasta supercomputer blades, is being developed by Cray under a $600 million contract for the U.S. Department of Energy (DOE) for the Oak Ridge National Laboratory. All told, the supercomputer is projected to be faster than the top 160 supercomputers in the world combined.

The new Frontier supercomputer is expected to deliver a leading 1.5 exaflops of performance powered by next-generation variants of AMD's EPYC processors and Radeon Instinct GPUs. The announcement comes on the heels of Intel's recent disclosure that its graphics cards featuring the Xe Architecture will power Aurora, which will be the first exascale-class supercomputer, though its projected speed doesn't match the Frontier supercomputer.

The new Frontier supercomputer builds upon AMD's rising presence in the supercomputing market, which we recently covered in our Ryze of EPYC feature. Notably, Frontier marks the second exascale-class system that doesn't wield Nvidia's GPUs, which have long dominated the supercomputer market.

EPYC and Radeon Combine

Credit: AMD

AMD didn't reveal which specific generation of its GPUs and CPUs would power the system, although CEO Lisa Su did announce that both components are customized for the deployment.

AMD optimized its custom EPYC processor with support for new instructions that provide optimal performance in AI and supercomputing workloads. "It is a future version of our Zen architecture. So think of it as beyond... what we put into Rome," said Su.

That could indicate that AMD will use a custom variant of its next-next-gen EPYC Milan processors for the task, but that remains unconfirmed. Those processors have already been tapped for the DOE's Perlmutter supercomputer that will also be built with Cray's Shasta building blocks. We have extensive coverage of that design here.

Credit: AMD

The CPUs will be combined with high-performance Radeon GPU accelerators that Su said have extensive mixed-precision compute capabilities and high bandwidth memory (HBM). Su specified that the GPU will come to market in the future, but didn't elaborate on the future of the custom-designed EPYC processor.

Infinity Fabric Ties it Together

AMD will connect each EPYC CPU to four Radeon Instinct GPUs via a custom high-bandwidth, low-latency coherent Infinity Fabric. This is an evolution of AMD's foundational Infinity Fabric technology, which it currently uses to tie together the CPU and GPU dies inside its processors, but now AMD has extended it to operate over the PCIe bus.

AMD previously announced this new capability for its MI60 7nm Radeon Instinct accelerators. That version of the enhanced protocol provides up to 100 GB/s of CPU-to-GPU bandwidth over the PCIe 4.0 bus, but it isn't clear if the Frontier supercomputer will wield a future generation of the technology.

Cray will employ an enhanced version of AMD's open-source ROCm programming environment for Frontier, marking an important step forward for AMD's suite of programming tools. Nvidia's CUDA has become the de facto programming environment of choice for GPU accelerators. That entrenchment gives Nvidia an advantage in the parallel computing market, so AMD's forward progress on this front will help it in the broader ecosystem.

The Frontier Building Blocks

Frontier will consist of 100 cabinets of Shasta supercomputer blades, with each cabinet drawing up to 300 kW of power. The entire system is projected to consume 40 MW of power and cover 7,300 square feet (approximately two basketball courts). The system will also have over 90 miles of cabling and require 5,900 gallons of water per minute for cooling.

Credit: Tom's Hardware
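As a rough sanity check on the power figures quoted above, the cabinet count and per-cabinet draw can be multiplied out. The sketch below uses only the article's numbers and assumes every cabinet draws its full 300 kW ceiling, which Cray has not stated; the names are ours.

```python
# Figures quoted in the article.
CABINETS = 100
MAX_KW_PER_CABINET = 300   # "up to" 300 kW; assumed here to be fully used
SYSTEM_MW = 40             # projected consumption of the entire system


def cabinet_power_mw(cabinets: int, kw_per_cabinet: float) -> float:
    """Upper bound on the combined cabinet power draw, in megawatts."""
    return cabinets * kw_per_cabinet / 1000


cabinets_mw = cabinet_power_mw(CABINETS, MAX_KW_PER_CABINET)
remainder_mw = SYSTEM_MW - cabinets_mw

print(f"Cabinets at full draw: {cabinets_mw:.0f} MW")
print(f"Remainder of the 40 MW projection: {remainder_mw:.0f} MW")
```

Under that assumption the cabinets account for 30 MW, leaving about 10 MW for whatever else is counted in the system total. The announcement doesn't break that remainder down.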

We recently took a look at Cray's current lineup of Shasta blades at the Supercomputing 2018 conference. Frontier will use Shasta compute blades that currently house up to eight compute sockets and a full complement of memory DIMMs and networking. Cray has specified that the current generation (image above) of its Shasta blades will not be used in Frontier; instead, a new undisclosed variant will be pushed into service. Currently, the Shasta lineup consists of CPU, GPU, and networking blades, but it is unclear if the company will stick to that design philosophy for Frontier.

Credit: Tom's Hardware

As with the current generation of blades, Cray will use its proprietary Slingshot fabric to connect the nodes to integrated top-of-rack switches. This networking fabric uses an enhanced low-latency protocol that includes intelligent routing mechanisms to alleviate congestion. The interconnect supports optical links, but it is primarily designed to support low-cost copper wiring.

Keeping it in Perspective

Frontier builds on Oak Ridge National Laboratory's long heritage of hosting some of the most powerful supercomputers in the world, including previous title-holders Jaguar and Titan, with the latter weighing in at 17.6 petaflops of performance. The lab also hosts Summit, the world's current title-holder with 143.5 petaflops of performance. Frontier is theoretically capable of more than seven times the performance.
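The size of that multiple depends on which Summit figure is used. The quick sketch below takes the article's numbers plus one assumption of ours: a theoretical peak of roughly 200 petaflops for Summit, which the article does not quote. It suggests the "more than seven times" claim compares peak to peak, while dividing by the 143.5-petaflop figure gives a larger multiple. The variable names are illustrative.

```python
# Performance figures in petaflops, as quoted in the article.
FRONTIER_PF = 1500.0   # 1.5 exaflops (projected)
SUMMIT_PF = 143.5
TITAN_PF = 17.6

# ASSUMPTION (not from the article): Summit's theoretical peak, approximate.
SUMMIT_PEAK_PF = 200.0


def speedup(new_pf: float, old_pf: float) -> float:
    """How many times faster the new system is than the old one."""
    return new_pf / old_pf


vs_summit = speedup(FRONTIER_PF, SUMMIT_PF)
vs_summit_peak = speedup(FRONTIER_PF, SUMMIT_PEAK_PF)
vs_titan = speedup(FRONTIER_PF, TITAN_PF)

print(f"Frontier vs. Summit (143.5 PF quoted):  {vs_summit:.1f}x")
print(f"Frontier vs. Summit (~200 PF peak):     {vs_summit_peak:.1f}x")
print(f"Frontier vs. Titan (17.6 PF quoted):    {vs_titan:.1f}x")
```

The three ratios come out to roughly 10.5x, 7.5x, and 85.2x, respectively.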

To put Frontier's performance into perspective, the supercomputer will be able to crunch up to 1.5 quintillion operations per second, or 1.5 billion billion calculations every second. Cray also touts the performance of its networking solution as offering 24,000,000 times the bandwidth of the fastest home internet connection, or the equivalent of downloading 100,000 full HD movies in one second.

The system is projected to come online in 2021, though a firm date hasn't been announced. AMD CEO Lisa Su commented, "we intend to deliver on-time, on-schedule, and on-performance," which is an important commitment in a supercomputer industry that can be plagued with delays. Aurora's timeline, for instance, was repeatedly pushed back, and the system was then completely redesigned to use Intel's Xe Architecture due to issues with the Xeon Phi Knights Hill accelerators. In an odd coincidence, Intel announced the retirement of the last of its Xeon Phi lineup today as it turns its eyes to its Xe Graphics-powered future.

With AMD and Intel both continuing to make headway into the supercomputing realm and Nvidia's presence seemingly receding in newer supercomputers, the next few years could mark a fundamental shift in the established pecking order.
