Nvidia Grace Superchip loses to Intel Sapphire Rapids in HPC performance benchmarks, but promises greater efficiency

Grace CPU Superchip (Image credit: Nvidia)

The Barcelona Supercomputing Center and the State University of New York have published benchmarks of Nvidia's brand-new Grace Superchip, which couldn't quite match a pair of Intel's 48-core Sapphire Rapids CPUs. Despite its less than earth-shattering performance, Grace still promises to be a competitive datacenter and HPC processor thanks to its efficiency.

Grace is Nvidia's first in-house server CPU, built on the Arm architecture. A single Grace CPU has 72 cores and up to 480GB of LPDDR5X memory. Grace isn't sold as a standalone CPU: it appears alongside a Hopper GPU in Grace-Hopper processors, and in the Grace Superchip, which combines two Grace CPUs on a single board for a total of 144 cores and up to 960GB of LPDDR5X.

The benchmarks presented at the HPC Asia conference last week are perhaps the most detailed we've seen thus far. The two groups tested differently: the Barcelona researchers focused on Grace's performance relative to Skylake-SP, while the New York researchers compared Grace against a variety of Intel, AMD, and Arm-based CPUs.

The Barcelona researchers tested Grace-Hopper (without the GPU, so effectively a single Grace CPU) and the Grace Superchip against a pair of 24-core Xeon Platinum 8160s. Given that Skylake-SP turns seven years old in 2024, it's no surprise that even in its worst showing the Grace Superchip was still 67% faster than the 48-core Skylake-SP server, and in its best showing it was 4.49x faster. The choice of comparison CPU may seem strange, but it isn't arbitrary: the Barcelona Supercomputing Center is replacing its Intel-powered MareNostrum 4 with Nvidia's Grace.

The New York benchmarks are more interesting given that they include comparisons to Intel Sapphire Rapids and Ice Lake, AMD's Milan, and rival Arm-based CPUs in the form of Amazon's Graviton 3 and Fujitsu's A64FX. The Grace Superchip easily beat the Graviton 3, the A64FX, an 80-core Ice Lake setup, and even a 128-core configuration of Milan in all benchmarks. However, the Sapphire Rapids server with two 48-core Xeon Max 9468s stopped Grace's winning streak.

Grace Overall Performance (GFLOPs)
Benchmark | Grace | Sapphire Rapids HBM | Sapphire Rapids DDR5
Matrix Multiplication | 4,461 | 5,392 | 4,787
LINPACK | 3,120 | 2,862 | 2,211
FFT | 134.2 | 143.1 | 129
HPCG | 106.5 | 197.5 | 83.6
OpenFOAM (lower is better) | 5.46 | 6.87 | 6.89
Gromacs MEM | 171 | 206.1 | 203.64
Gromacs RIB | 12.7 | 13.52 | 13.88
Gromacs PEP | 0.977 | 1.2 | 1.18

Against Sapphire Rapids in HBM mode, Grace won only three of the eight tests, though it outperformed Sapphire Rapids in DDR5 mode in five. That's a surprisingly mixed result for Nvidia, considering that Grace has 50% more cores and is built on TSMC's more advanced 4nm node rather than Intel's aging Intel 7 (formerly 10nm) process. It's not entirely out of left field, though: Sapphire Rapids also beat AMD's Epyc Genoa chips for a spot in an MI300X-powered Azure instance, indicating that, despite its shortcomings, Sapphire Rapids still has plenty of potency for HPC.

On the other hand, Nvidia might score a crushing victory in efficiency. The Grace Superchip is rated for 500 watts, while each Xeon Max 9468 is rated for 350 watts, giving the dual-socket Sapphire Rapids server a combined TDP of 700 watts. The paper doesn't report power consumption for either system, but if we assume each chip was running at its TDP, the comparison becomes very favorable for Nvidia.

Grace Hypothetical Efficiency (performance per TDP watt, relative to Sapphire Rapids DDR5)
Benchmark | Grace | Sapphire Rapids HBM | Sapphire Rapids DDR5
Matrix Multiplication | 130.4% | 112.6% | 100%
LINPACK | 197.6% | 129.4% | 100%
FFT | 145.6% | 110.9% | 100%
HPCG | 178.3% | 236.2% | 100%
Gromacs MEM | 116.2% | 101.2% | 100%
Gromacs RIB | 128.1% | 97.4% | 100%
Gromacs PEP | 115.9% | 101.7% | 100%
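The hypothetical efficiency figures come from dividing each performance result by the system's rated TDP (500 W for the Grace Superchip, 700 W for the pair of Xeon Max 9468s) and normalizing to Sapphire Rapids in DDR5 mode. The Python sketch below reproduces that arithmetic for a few of the higher-is-better rows. Like the table, it assumes each system draws exactly its TDP, which the paper does not measure; the function and constant names are illustrative, not taken from the paper.

```python
# Hypothetical efficiency: performance per watt of rated TDP, normalized so that
# Sapphire Rapids in DDR5 mode = 100%. Assumes each system runs at its TDP,
# which the paper does not actually measure.
GRACE_TDP_W = 500        # Grace Superchip rating
SPR_TDP_W = 2 * 350      # two Xeon Max 9468s at 350 W each

# Higher-is-better results from the performance table:
# (Grace, Sapphire Rapids HBM, Sapphire Rapids DDR5)
RESULTS = {
    "LINPACK": (3120, 2862, 2211),
    "FFT": (134.2, 143.1, 129),
    "HPCG": (106.5, 197.5, 83.6),
    "Gromacs RIB": (12.7, 13.52, 13.88),
    "Gromacs PEP": (0.977, 1.2, 1.18),
}


def relative_efficiency(perf, tdp_w, baseline_perf, baseline_tdp_w):
    """Return perf-per-watt as a percentage of the baseline's perf-per-watt."""
    return 100.0 * (perf / tdp_w) / (baseline_perf / baseline_tdp_w)


def efficiency_row(grace, spr_hbm, spr_ddr5):
    """Efficiency of (Grace, SPR HBM, SPR DDR5), each relative to SPR DDR5."""
    return (
        relative_efficiency(grace, GRACE_TDP_W, spr_ddr5, SPR_TDP_W),
        relative_efficiency(spr_hbm, SPR_TDP_W, spr_ddr5, SPR_TDP_W),
        100.0,  # the baseline is 100% of itself by definition
    )


if __name__ == "__main__":
    for name, (g, hbm, ddr5) in RESULTS.items():
        effs = efficiency_row(g, hbm, ddr5)
        print(f"{name:12s} " + "  ".join(f"{e:6.1f}%" for e in effs))
```

Because both Sapphire Rapids configurations share the same 700 W TDP, the HBM column reduces to a plain performance ratio; only Grace's column is shifted by the 500 W versus 700 W difference. That is why HPCG, where HBM more than doubles the DDR5 score, is the one row where the HBM configuration stays ahead.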

Bearing in mind that this compares TDPs rather than measured power consumption, the data looks very positive for Nvidia. By this measure, the Grace Superchip is less efficient than Sapphire Rapids in HBM mode in only one benchmark, HPCG, and more efficient than Sapphire Rapids in DDR5 mode in every benchmark listed. That certainly changes Grace's outlook, since efficiency matters a great deal in large deployments of server CPUs, where power and cooling costs add up quickly.

Though not an absolute performance champion, Grace is shaping up to be one of the most efficient datacenter CPUs available today. Bear in mind, however, that neither Zen 4-based Epyc CPUs nor Emerald Rapids-based Xeons were included in these benchmarks. Nvidia claims Grace will beat AMD's Genoa in efficiency, but we'll have to wait and see whether that claim holds up.

Matthew Connatser

Matthew Connatser is a freelance writer for Tom's Hardware US. He writes articles about CPUs, GPUs, SSDs, and computers in general.