Nvidia Steals AMD's Supercomputer Efficiency World Record
Nvidia's Hopper-based supercomputers make a triumphal debut in the Green500 list.
The first supercomputers based on Nvidia’s H100 compute GPUs are yet to set records in terms of absolute performance, but they already show their might in terms of performance-per-watt.
Flatiron Institute’s Henri supercomputer, based on Intel’s Xeon Platinum 8362 (Ice Lake) and accelerated by Nvidia’s H100 compute GPUs, debuted in the Top500 and Green500 lists this week. It also dethroned the AMD-powered Frontier Test and Development System, which runs AMD’s EPYC and Instinct MI250X hardware, from the top of the Green500 list.
Lenovo built the Henri supercomputer, which currently ranks No. 405 in the Top500 list with an Rmax performance of 2.04 FP64 PFLOPS, hardly impressive by itself. What is remarkable is that the machine consumes only 31 kW of power, demonstrating an energy efficiency of 65.091 GFLOPS/W, a world record. To put the number into context, the Frontier TDS machine hits 62.684 GFLOPS/W, Frontier (the world’s fastest supercomputer) scores 52.227 GFLOPS/W, and the Lumi system achieves 58.021 GFLOPS/W.
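As a rough sanity check on those figures, the efficiency metric is simply Rmax divided by power draw. The short Python sketch below recomputes Henri's result from the rounded numbers quoted above and compares the published efficiencies. The function name and dictionary are illustrative only, and because the inputs are rounded, the recomputed value lands near, not exactly on, the published 65.091 GFLOPS/W.

```python
def gflops_per_watt(rmax_pflops: float, power_kw: float) -> float:
    """Convert Rmax (PFLOPS) and power (kW) into GFLOPS per watt."""
    gflops = rmax_pflops * 1_000_000  # 1 PFLOPS = 1,000,000 GFLOPS
    watts = power_kw * 1_000          # 1 kW = 1,000 W
    return gflops / watts


# Henri, using the rounded Rmax and power figures quoted in the article.
henri_estimate = gflops_per_watt(2.04, 31)
print(f"Henri, from rounded inputs: {henri_estimate:.1f} GFLOPS/W")

# Published Green500 efficiencies quoted in the article (GFLOPS/W).
published = {
    "Henri": 65.091,
    "Frontier TDS": 62.684,
    "Lumi": 58.021,
    "Frontier": 52.227,
}

# Henri's percentage lead over each of the other systems.
lead_pct = {
    name: (published["Henri"] / eff - 1) * 100
    for name, eff in published.items()
    if name != "Henri"
}
for name, pct in lead_pct.items():
    print(f"Henri leads {name} by {pct:.1f}%")
```

On these numbers, Henri's lead over the Frontier TDS is a few percent, while its lead over the full-scale Frontier is considerably larger.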
The Henri machine is a relatively simple supercomputer by today’s standards: it uses Lenovo’s off-the-shelf air-cooled ThinkSystem SR670 V2 servers featuring Intel’s 32-core Xeon Platinum 8362 processors (5,920 cores in total) and 80 of Nvidia’s H100 80GB PCIe cards based on the Hopper architecture. Of course, using air cooling for a relatively small system might also affect its performance-per-watt results. Still, Nvidia’s latest compute GPUs offer impressive performance in general.
“This supercomputer opens up opportunities for doing new kinds of science,” said Ian Fisk, co-director of the Flatiron Institute’s Scientific Computing Core. “This is a workhorse machine, and we’re going to let our researchers try new things and drive discoveries. […] [It offers] very high performance and very efficient without being particularly exotic. It only took a couple of people to load the system in. This kind of efficiency is now accessible to a lot more groups rather than just the largest supercomputing centers.”
Truth be told, Nvidia-based supercomputers (in many cases consisting of standard servers) have been performance-per-watt champions in the Green500 list for some time, so it is logical to expect H100 to continue Nvidia’s winning streak here.
Meanwhile, AMD EPYC and Instinct MI250X-based machines are not outsiders in terms of performance-per-watt metrics, especially if you consider the scale of Frontier, Lumi, AdAstra, Setonix, and Dardel machines powered by AMD’s technology. In addition, six out of the top 10 supercomputers in the Green500 list utilize AMD’s CPUs and GPUs, three are accelerated by Nvidia’s compute GPUs, and one uses Intel’s Xeon Platinum 8260M-based nodes.
Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.
bit_user Truth to be told, Nvidia-based supercomputers (in many cases comprising of standard servers) have been performance-per-watt champions in the Green500 list for some time, so it is logical to expect H100 to continue Nvidia’s winning spree here.
The reason it's surprising is that MI200 doubled CDNA's fp64 rate, by expanding its registers and datapaths to 64-bit. This enabled it to dispatch fp64 operations at full-rate, instead of half the fp32 rate. Relative to the A100, this enabled AMD to take a significant single-device performance lead, which typically translates into better perf/W (so long as the lead wasn't achieved by juicing power to an extreme degree).
From what I'm reading, H100 PCIe cards have fp64 performance of 24 TFLOPS for 350 W (not sure if that's base or boost). By comparison AMD's MI210 PCIe card provides 13.3 or 22.6 fp64 TFLOPS (base vs. boost) at 300 W. If Nvidia's figures are base, then I'd say it checks out. If they're boost, then it sure looks close.
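To make that comparison concrete, here is a quick Python sketch that divides the fp64 TFLOPS figures quoted above by the quoted board power. These are peak spec-sheet numbers as cited in this post, not measured Linpack results, and the helper name is just for illustration.

```python
def peak_gflops_per_watt(fp64_tflops: float, board_power_w: float) -> float:
    """Peak fp64 throughput per watt, from spec-sheet TFLOPS and board power."""
    return fp64_tflops * 1_000 / board_power_w  # 1 TFLOPS = 1,000 GFLOPS


# Figures as quoted in the post above (assumed, not independently verified).
h100_pcie = peak_gflops_per_watt(24.0, 350)    # base or boost is unclear
mi210_base = peak_gflops_per_watt(13.3, 300)
mi210_boost = peak_gflops_per_watt(22.6, 300)

print(f"H100 PCIe:     {h100_pcie:.1f} GFLOPS/W")
print(f"MI210 (base):  {mi210_base:.1f} GFLOPS/W")
print(f"MI210 (boost): {mi210_boost:.1f} GFLOPS/W")
```

So on paper the H100 figure sits well above the MI210's base number and somewhat below its boost number, which is why it matters whether Nvidia's 24 TFLOPS is a base or boost rating.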
Anyway, that's a good comeback from Nvidia, especially when you consider they still have a 2:1 fp32:fp64 ratio. That said, their H100 is made on TSMC N4, whereas AMD's MI200-series uses N6. So, the tables will likely turn once again, when AMD counters with the MI300-series.