Nvidia's Next-Gen Compute GPU Chosen for a Supercomputer

(Image credit: NREL)

The National Renewable Energy Laboratory (NREL) on Wednesday announced its new supercomputer, called Kestrel. The new system will be built by Hewlett Packard Enterprise (HPE) and will be powered by Intel's Xeon Scalable Sapphire Rapids processors as well as a mysterious "Nvidia A100Next Tensor Core compute GPU."

The upcoming Kestrel supercomputer will use HPE's Cray EX architecture (like most heterogeneous HPC machines today) and will offer about 44 FP64 PetaFLOPS of performance, in line with what the world's No. 7 supercomputer provides today, along with 75PB of storage. Starting in early 2023, NREL's Kestrel will accelerate energy efficiency and renewable energy research, the organization said.

The fact that NREL wants higher performance to do better research is hardly surprising, but the choice of hardware is somewhat unconventional. While most supercomputers tend to choose AMD's EPYC processors for their superior core count compared to their rivals from Intel, Kestrel uses Intel's Sapphire Rapids. The system will also use Nvidia's "A100Next Tensor Core GPUs to accelerate AI," and this branding raises more questions than it answers.

Normally, one would expect Nvidia to introduce a brand-new compute GPU architecture by 2023, and the most recent rumors indicate that this architecture is called Hopper and will use one of TSMC's N5 process technologies. Meanwhile, the name A100Next may mean a next-generation compute GPU based on whatever architecture comes next, or it may indicate another incarnation of Nvidia's Ampere architecture. NextPlatform speculates that the A100Next could be "a die shrink to 5nm" with "more compute units, perhaps doubling via a chiplet design." In fact, this description resembles the rumored multi-die Hopper-based design.

Intel's Sapphire Rapids CPU and Nvidia's A100 GPU will work perfectly together via a PCIe bus, but they will not be able to efficiently share memory pools since Intel's processor does not support NVLink, whereas Nvidia's A100 compute modules do not support the CXL protocol. Perhaps there will be some changes with the next-generation Nvidia compute GPU to address this shortcoming, but we will have to wait and see.

At this point we do not know anything concrete about Nvidia's H100 or A100Next designs, though CXL support is very likely. Nvidia will also be happy to win a supercomputer contract even before formally announcing its next-generation compute GPU architecture. We could speculate about Nvidia's plans regarding its next-generation compute GPUs, but the company knows how to keep secrets and how to surprise. Still, NREL's use of the "A100Next" name is a rather odd way to refer to Nvidia's upcoming compute GPU. Perhaps Nvidia just didn't want anyone explicitly stating it's Hopper H100, saving the details for a later date.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • rtoaht
    Admin said:
    While most supercomputers tend to choose AMD's EPYC processors for their superior core count compared to their rivals from Intel, Kestrel uses Intel's Sapphire Rapids.

    Only around 14% of the top 500 supercomputers use EPYC. How is that "most"?