AMD's EPYC Milan CPU and Nvidia's 'Volta-Next' GPUs To Power Shasta Supercomputer

AMD recently announced its EPYC Rome processors, the first 7nm data center chips on the market, but the company is already moving forward with its next-generation products. Here at Supercomputing 2018, the U.S. Department of Energy (DOE) announced that its Perlmutter supercomputer would come armed with AMD's unreleased EPYC Milan processors. The new supercomputer will also use Nvidia's "Volta-Next" GPUs, with the two combining to make an exascale-class machine that will be one of the fastest supercomputers in the world.

(Image credit: Paul Alcorn/Tom's Hardware)

The Perlmutter supercomputer will be built using Cray's Shasta supercomputer platform, which was also on display here at the show. It will use a mixture of CPU and GPU nodes, with the CPU node pictured above. This watercooled chassis houses eight AMD Milan CPUs. We see four copper waterblocks that cover the Milan processors, while four more processors are mounted inverted on the PCBs between the DIMM slots. This system is designed for the ultimate in performance density, so all the DIMM sticks are also watercooled.

The DOE presented a slide outlining the Milan processors. But, in a case study of how easily slides can be misinterpreted if you aren't there for the presentation, the speaker specifically stated that the "64 cores" listing refers to AMD's Rome processors, and not the Milan chips. For now, the DOE isn't at liberty to disclose the core counts for the Milan CPUs.

The Rome processors AMD recently announced will come with the 7nm process and the Zen 2 microarchitecture, while the Milan CPUs will come with the Zen 3 microarchitecture built on the 7nm+ process. The slide also lists 8 channels of DDR memory with >=256 GiB of memory per node, but the speaker again specified that the memory capacity figure is based on Milan's specifications either matching or exceeding that of Rome.

The supercomputer will also have nodes dedicated to GPU compute. Each node will have four Nvidia "Volta-Next" GPUs installed, but again, the specifications listed in the slide merely indicate the Volta-Next GPUs will exceed the current-gen V100's specifications.

All of that high-powered compute packed into a slim blade server is guaranteed to generate a copious amount of heat, and here we can see the connections at the rear of the blade for the warm-water cooling system.

The CPU and GPU nodes connect to a unique networking attachment, shown here with the connections that mate between the two chassis. You can see the waterblock in the center of the networking chassis. That covers the dedicated networking ASIC, and water also circulates through the metallic blocks over the networking attachments on the rear. That helps deal with the heat generated by optical links. Optical links are supported, but the system is designed with cheaper and more-reliable copper connections in mind. Cray isn't sharing networking speeds and feeds, though we expect it to be 100Gb/s, or faster.

The two nodes, once mated together, connect to the top-of-rack switch (above) via the networking attachment. The switches are then tied together into a Dragonfly topology that Cray designed to reduce the number of hops for data that traverses the nodes.

The new supercomputer will also come with an all-flash storage system, a choice made possible largely by the plummeting cost of flash. That eliminates the need for costly, lower-capacity burst buffers to handle sporadic and intense storage workloads.

AMD CEO Lisa Su is fond of reminding us that customers in the data center buy into long-term roadmaps rather than single product generations, which is accurate given the long qualification cycles for both hardware and software on new platforms. This latest buy-in from the DOE is a sign that more high-profile customers are confident in AMD's capabilities to provide HPC-class hardware. Cray also makes the Shasta platform available for other supercomputing projects, so this surely won't be the last announcement of a Milan-powered supercomputer.

The Perlmutter supercomputer will come online in 2020, and we expect the DOE to release more information as the time nears.

Paul Alcorn is the Editor-in-Chief for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

14 Comments from the forums

LordConrad

But can it... play Solitaire?
johnynavvaro

But can it even turn on?
fevanson

The real question is, can it run Crysis?
boju

Might run The DeLorean. What would you do >:P
derekullo

All hail our new computer overlords
Lucky_SLS

Damn those cash infused departments! Spending top dollar and also being early adopters!
AgentLozen

What do you think each one of those blades costs? $20k? $40k?
shrapnel_indie

What exactly will the DOE do with that much power?

<Poof> a few men jump out of the sub-etha, roughing you up a bit and telling you to never ask that again...
bruceghayes

Will it be obsolete in 5-10 years (Moore's law now says silicon transistors can only keep shrinking for another 5 years), and the DOE have to replace it?
IR240474

WOW.... Not sure if its powerful enough for Solitaire, ya might need just one more DIMM in there.