The United States’ National Oceanic and Atmospheric Administration (NOAA) has announced plans to purchase and install two new forecasting supercomputers as part of a 10-year, $505.2 million program.
Each of the two supercomputers has 2,560 dual-socket nodes housed in 10 cabinets, powered by second-generation 64-core AMD Epyc 7742 ‘Rome’ processors and connected by Cray’s Slingshot network. That works out to 327,680 Zen 2 cores per cluster.
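As a quick sanity check of those figures, here is a back-of-the-envelope sketch based only on the node, socket, and core counts quoted above:

```python
# Back-of-the-envelope core count for one cluster
nodes = 2_560              # dual-socket nodes per system
sockets_per_node = 2
cores_per_socket = 64      # AMD Epyc 7742 "Rome"

print(nodes * sockets_per_node * cores_per_socket)  # 327,680 Zen 2 cores
```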
Each machine will have 1.3 petabytes of system memory, and Cray’s ClusterStor systems will provide 26 petabytes of storage per site: 614 terabytes of flash, with the remainder split into two 12.5-petabyte HDD file systems.
The peak theoretical performance of each Cray system is 12 petaflops. Combined with NOAA’s other research and operational capacity, the agency’s supercomputers reach a peak theoretical performance of 40 petaflops.
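The storage and performance figures add up similarly. This is a minimal sketch assuming decimal units (1 PB = 1,000 TB); note that the roughly 16 petaflops attributed to NOAA’s existing capacity is inferred from the 40-petaflop total rather than stated in the announcement:

```python
# Per-site ClusterStor capacity: flash plus two HDD file systems (decimal units)
flash_pb = 614 / 1000                # 614 TB of flash
hdd_pb = 2 * 12.5                    # two 12.5 PB HDD file systems
print(round(flash_pb + hdd_pb, 1))   # ~25.6 PB, quoted as 26 PB per site

# Aggregate peak performance across NOAA's fleet
new_systems_pf = 2 * 12              # two 12-petaflop Shasta systems
existing_pf = 40 - new_systems_pf    # implies ~16 PF of existing capacity (inferred)
print(existing_pf)
```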
Cray’s Shasta systems haven’t hit the TOP500 supercomputer list yet, but based on their promised performance they’d rank somewhere around 25th place on the November 2019 listing. However, NOAA’s new Shasta systems won’t be installed until 2022, and chances are they won’t rank as high by then.
NOAA’s scientists argue that leadership could also be measured by how well hurricanes are tracked, how accurately upper-level flow is predicted, how well surface temperature anomalies are identified, and so on.
They also don’t see the supercomputers as putting them in a “fierce competition” with their colleagues at other forecasting centers around the world.
Out With The Old (IBM, Dell), In With The New (Cray)
NOAA will be replacing a mix of eight older systems, most of them IBM and Dell machines, along with two older Cray systems. IBM manages NOAA’s operational computing centers until 2022, after which General Dynamics Information Technology (GDIT) will take over for the following eight years, with a two-year optional renewal period. The contract with GDIT is worth $505.2 million over the 10-year span.
GDIT won the contract by meeting the 99% system-availability requirement and offering the best performance per dollar of all the bidders. GDIT’s choice of AMD Epyc processors may have played a significant role in hitting that performance-per-dollar target, too. We’ve recently seen several other institutions opt for AMD Epyc-based systems due to their strong performance per dollar.