AMD's EPYC Milan CPU and Nvidia's 'Volta-Next' GPUs To Power Shasta Supercomputer

AMD recently announced its EPYC Rome processors, the first 7nm data center chips on the market, but the company is already moving forward with its next-generation products. Here at Supercomputing 2018, the U.S. Department of Energy (DOE) announced that its Perlmutter supercomputer would come armed with AMD's unreleased EPYC Milan processors. The new supercomputer will also use Nvidia's "Volta-Next" GPUs, with the two combining to make an exascale-class machine that will be one of the fastest supercomputers in the world.

Credit: Paul Alcorn/Tom's Hardware

The Perlmutter supercomputer will be built using Cray's Shasta supercomputer platform, which was also on display here at the show. The system will use a mix of CPU and GPU nodes, with the CPU node pictured above. This water-cooled chassis houses eight AMD Milan CPUs. We see four copper waterblocks that cover the Milan processors, while four more processors are mounted inverted on the PCBs between the DIMM slots. This system is designed for the ultimate in performance density, so all the DIMMs are also water-cooled.

Credit: Paul Alcorn/Tom's Hardware

The DOE presented a slide outlining the Milan processors. But, in a case study of how easily slides can be misinterpreted if you aren't there for the presentation, the speaker specifically stated that the "64 cores" listing refers to AMD's Rome processors, and not the Milan chips. For now, the DOE isn't at liberty to disclose the core counts for the Milan CPUs.

The Rome processors AMD recently announced will come with the 7nm process and the Zen 2 microarchitecture, while the Milan CPUs will come with the Zen 3 microarchitecture built on the 7nm+ process. The slide also lists eight channels of DDR memory with >=256 GiB of memory per node, but the speaker again specified that the memory capacity figure assumes Milan's specifications will match or exceed those of Rome.

Credit: Paul Alcorn/Tom's Hardware

The supercomputer will also have nodes dedicated to GPU compute. Each node will have four Nvidia "Volta-Next" GPUs installed, but again, the specifications listed in the slide merely indicate that the Volta-Next GPUs will exceed the current-gen V100's specifications.

Credit: Paul Alcorn/Tom's Hardware

All of that high-powered compute packed into a slim blade server is guaranteed to generate a copious amount of heat, and here we can see the connections at the rear of the blade for the warm-water cooling system.

Credit: Paul Alcorn/Tom's Hardware

The CPU and GPU nodes connect to a unique networking attachment, shown here with the connections that mate between the two chassis. You can see the waterblock in the center of the networking chassis. It covers the dedicated networking ASIC, and water also circulates through the metallic blocks over the networking attachments on the rear, which helps deal with the heat generated by optical links. Optical links are supported, but the system is designed with cheaper and more-reliable copper connections in mind. Cray isn't sharing networking speeds and feeds, though we expect the links to run at 100Gb/s or faster.

Credit: Paul Alcorn/Tom's Hardware

The two nodes, once mated together, connect through the networking attachment to the top-of-rack switch (above). The switches are then tied together in a Dragonfly topology, which Cray designed to reduce the number of hops for data that traverses the nodes.
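
To get a feel for why a Dragonfly layout keeps hop counts low, here is a minimal Python sketch of the textbook form of the topology. It is not Cray's actual Shasta configuration, which the company has not detailed. The function names and the parameters `a` (routers per group) and `h` (global links per router) are illustrative assumptions. Routers inside a group are fully connected, and every pair of groups shares exactly one global link, so the worst-case route is one local hop, one global hop, and one more local hop.

```python
from collections import deque
from itertools import combinations


def build_dragonfly(a, h):
    """Build a textbook dragonfly as an adjacency map (illustrative only).

    a: routers per group (assumed parameter)
    h: global links per router (assumed parameter)
    The network has a*h + 1 groups, so each pair of groups shares one link.
    """
    groups = a * h + 1
    adj = {(grp, r): set() for grp in range(groups) for r in range(a)}
    for grp in range(groups):
        # Local links: all-to-all between routers in the same group.
        for r1, r2 in combinations(range(a), 2):
            adj[(grp, r1)].add((grp, r2))
            adj[(grp, r2)].add((grp, r1))
        # Global links: offset k reaches group (grp + k) mod groups.
        # Port k-1 on this side pairs with port groups-k-1 on the far side,
        # and each router owns h consecutive ports.
        for k in range(1, groups):
            peer = (grp + k) % groups
            here = (grp, (k - 1) // h)
            there = (peer, (groups - k - 1) // h)
            adj[here].add(there)
            adj[there].add(here)
    return adj


def diameter(adj):
    """Return the largest shortest-path hop count, using BFS from every router."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    net = build_dragonfly(a=4, h=2)  # 9 groups of 4 routers = 36 routers
    print(len(net), "routers, worst case", diameter(net), "hops")
```

In this idealized model the worst-case router-to-router distance stays at three hops even as more groups are added, which is the property that makes the topology attractive for large machines.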

The new supercomputer will also come with an all-flash storage system, made practical largely by the plummeting cost of flash. That eliminates the need for costly, lower-capacity burst buffers to absorb sporadic and intense storage workloads.

AMD CEO Lisa Su is fond of reminding us that customers in the data center buy into long-term roadmaps rather than single product generations, which is accurate given the long qualification cycles for both hardware and software on new platforms. This latest buy-in from the DOE is a sign that more high-profile customers are confident in AMD's capabilities to provide HPC-class hardware. Cray also makes the Shasta platform available for other supercomputing projects, so this surely won't be the last announcement of a Milan-powered supercomputer.

The Perlmutter supercomputer will come online in 2020, and we expect the DOE to release more information as the time nears.

14 comments
  • LordConrad
    But can it... play Solitaire?
  • johnynavvaro
    But can it even turn on?
  • fevanson
    The real question is, can it run Crysis?