AVX-512 is a gamechanger on Intel Emerald Rapids CPU — 5th Gen Xeon runs twice as fast on average with slightly higher power consumption

5th Generation Xeon Emerald Rapids CPU (Image credit: Intel)

Linux benchmarking website Phoronix has taken Intel's 5th Generation Xeon Scalable "Emerald Rapids" CPU out for a spin to see how much faster it runs with AVX-512 instructions, and the result was a doubling in average performance. Some workloads even saw performance boosts of more than ten times without substantially increasing power consumption.

Phoronix performed its tests using a server with two of Intel's top-end Xeon Platinum 8592+ 64-core CPUs, 1TB of DDR5 memory, and a 3TB SSD, running on Intel's Eagle Stream platform with the Ubuntu Linux distro. The publication benchmarked various workloads, such as Embree, OpenVKL, and Y-Cruncher, and found that enabling AVX-512 doubled performance on average.

Much of this average rested on performance results in OpenVINO, which Phoronix tested multiple times with different parameters. Most of the OpenVINO runs with AVX-512 showed performance boosts of at least two times, with the fastest result being over ten times faster. This is primarily thanks to OpenVINO supporting AVX-512 VNNI and BF16 instructions, which are especially useful for AI workloads. The difference in clock speeds with AVX-512 enabled and disabled was minimal: with it on, the Xeon Platinum 8592+ ran at 2.95 GHz across all cores, compared to 3.01 GHz when AVX-512 was off. The 64-core Emerald Rapids chip hit its 3.9 GHz boost clock regardless of whether AVX-512 was enabled.
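For readers who want to check whether a given Linux machine exposes these instructions, the kernel lists per-CPU feature flags such as avx512f, avx512_vnni, and avx512_bf16 in /proc/cpuinfo. Below is a minimal sketch in Python (my own illustration, not part of Phoronix's test suite) that prints any AVX-512 flags it finds:

    # Minimal sketch, assuming Linux: list the AVX-512 feature flags the CPU
    # advertises by parsing /proc/cpuinfo. Flag names such as avx512f,
    # avx512_vnni, and avx512_bf16 are the kernel's standard spellings.
    def avx512_flags(path="/proc/cpuinfo"):
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    flags = line.split(":", 1)[1].split()
                    return sorted(x for x in flags if x.startswith("avx512"))
        return []

    if __name__ == "__main__":
        print(avx512_flags() or "No AVX-512 flags reported")

Running it on an Emerald Rapids system should report the full set, including the VNNI and BF16 extensions mentioned above.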

On average, power usage didn't change with AVX-512 enabled, though many individual workloads required up to 10% more power. The maximum power consumption was about 120 watts higher, which is to be expected, as extra performance rarely comes entirely free. That higher power draw also meant slightly higher temperatures. Additionally, turning on AVX-512 slightly decreased clock speeds, which can result from the higher power draw and temperatures.
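For context on how package power figures like these are typically collected on Linux, the sketch below samples the Intel RAPL energy counters exposed under /sys/class/powercap. This is my own illustration of the general approach, not Phoronix's actual tooling, and reading energy_uj may require root privileges on recent kernels:

    # Minimal sketch, assuming Linux with Intel RAPL exposed under /sys/class/powercap:
    # sample the package-0 energy counter twice and report average power in watts.
    # The counter is in microjoules and can wrap around; wrapping is ignored here.
    import time

    def package_power_watts(interval=1.0,
                            domain="/sys/class/powercap/intel-rapl:0"):
        def read_uj():
            with open(f"{domain}/energy_uj") as fh:
                return int(fh.read())
        start = read_uj()
        time.sleep(interval)
        end = read_uj()
        return (end - start) / 1e6 / interval

    if __name__ == "__main__":
        print(f"Package power: {package_power_watts():.1f} W")

Sampling like this while a benchmark runs with AVX-512 on and then off is the simplest way to reproduce the kind of per-workload power deltas reported above.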

Support for a wide range of AVX-512 instructions is a primary selling point of Emerald Rapids. Although the CPU loses out in raw performance to AMD's 96-core 4th Generation EPYC Genoa chip, as seen in our Emerald Rapids review, AVX-512 instructions can change the dynamic between Intel's and AMD's server CPUs, especially for AI. It's one of the probable reasons Microsoft chose last-generation Sapphire Rapids chips over EPYC to pair with AMD's MI300X GPUs.

Matthew Connatser

Matthew Connatser is a freelance writer for Tom's Hardware US. He writes articles about CPUs, GPUs, SSDs, and computers in general.

  • bit_user
    AVX-512 is a gamechanger on Intel Emerald Rapids CPU — 5th Gen Xeon runs twice as fast on average
    Not typically. The GeoMean is skewed by the relatively large number of OpenVINO tests. That's a deep learning framework written by Intel and obviously quite optimized for its own hardware.

    The median speedup for AVX-512, in those tests, is probably closer to 30%. That doesn't make for such an attention-grabbing headline, of course. And don't forget that the test suite was specifically tailored to include things which traditionally benefit from AVX-512. Across all computing workloads, the typical speedup would be in the (probably lower) single digits.

    AVX-512 boosts performance up to 10X higher in some workloads.
    Only in OpenVINO, and only for 2 of the test cases. Those tests were clearly designed to showcase specific AVX-512 (VNNI) instructions. If you didn't have those instructions, you wouldn't use that type of model, because other layer types perform better without VNNI.

    It's one of the probable reasons Microsoft chose last-generation Sapphire Rapids chips over EPYC to pair with AMD's MI300X GPUs.
    And not at all volume, cost, or time-to-market? Microsoft has their own AI accelerators, which could be another reason why they weren't interested in the GPU portion of MI300X.
  • thestryker
    bit_user said:
    And not at all volume, cost, or time-to-market? Microsoft has their own AI accelerators, which could be another reason why they weren't interested in the GPU portion of MI300X.
    MI300X is the OAM accelerator only, the MI300A is the APU version. Article author is just positing that this may be a reason why MS went SPR instead of Genoa (their Azure AI instances are SPR+MI300X).
  • d0x360
    bit_user said:
    Not typically. The GeoMean is skewed by the relatively large number of OpenVINO tests. That's a deep learning framework written by Intel and obviously quite optimized for its own hardware.

    The median speedup for AVX-512, in those tests, is probably closer to 30%. That doesn't make for such an attention-grabbing headline, of course. And don't forget that the test suite was specifically tailored to include things which traditionally benefit from AVX-512. Across all computing workloads, the typical speedup would be in the (probably lower) single digits.

    But bigger numbers in the headlines of a CPU for servers regarding a limited use instruction set brings in way more readers!