Ryzen 9000 CPUs drop 10% frequency executing AVX-512 instructions — Intel CPUs typically suffer from more substantial clock speed drops


AVX-512 instructions can significantly boost performance in many workloads, but earlier CPU implementations caused a substantial frequency drop and a sharp increase in power consumption. The AVX-512 implementation in AMD's Zen 5-based Ryzen 9000-series processors, however, does not cause a considerable clock speed drop or a massive increase in power draw, according to tests by InstLatX64.

As it turns out, AMD's Ryzen 9 9950X drops frequency by about 7% under heavy AVX-512 usage: it reduces clock speed from 5,700 MHz to 5,300 MHz, which is not a substantial drop and is in line with what AMD said in an interview with Tom's Hardware back in July. In contrast, Intel processors that do support AVX-512 (the company is known for disabling AVX-512 on Alder Lake and Raptor Lake CPUs for various reasons) usually drop their clocks dramatically when executing AVX-512 instructions.
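
As a quick check of the clock figures quoted above, the relative drop can be computed directly. This is a minimal sketch: the two frequencies come from the article, while the function and variable names are illustrative.

```python
def percent_drop(base_mhz: float, reduced_mhz: float) -> float:
    """Return the frequency drop as a percentage of the base clock."""
    return (base_mhz - reduced_mhz) / base_mhz * 100.0


# Figures quoted in the article for the Ryzen 9 9950X.
base_mhz = 5700.0
avx512_mhz = 5300.0

drop = percent_drop(base_mhz, avx512_mhz)

# Clock that a full 10% reduction from the base would produce.
ten_percent_target = base_mhz * (1 - 0.10)

print(f"Drop: {base_mhz - avx512_mhz:.0f} MHz ({drop:.2f}%)")
print(f"A full 10% drop would land at {ten_percent_target:.0f} MHz")
```

With these inputs the drop works out to roughly 7% of the base clock, so a 10% figure is a generous rounding.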

To some degree, this happens because Intel's AVX-512-supporting CPUs are made on rather outdated process technologies. On the other hand, wide data paths are power hungry themselves, so it remains to be seen how much power AMD Ryzen 9000-series processors do (which are made on TSMC's N4P, 4nm-class process technology) consume when executing AVX-512 instructions.

AMD's Zen 5-based desktop processors have four full-width 512-bit execution units for AVX-512, which makes execution of such instructions very efficient (whereas some other parts use double-pumped 256-bit units to execute 512-bit instructions), but at the cost of die size.

High-performance desktops, workstations, and servers are often used for various vector workloads from AI and HPC realms, so implementing AVX-512 correctly was crucial for AMD when it designed its Zen 5 implementation for desktops and servers. However, AMD's mobile parts, such as the codenamed Strix Point processors, use double-pumped AVX-256 to execute AVX-512 instructions.

While such an approach will probably confuse software developers and, to some degree, end users, it should be noted that by avoiding the implementation of full-blown 512-bit data paths, AMD makes its cores slightly more compact. This allows it to pack more cores into its processors, and more cores bring higher performance for more users than AVX-512 alone.
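
The difference between a full-width and a double-pumped implementation can be pictured with a small conceptual model. The sketch below is only an illustration in Python, not a description of AMD's actual hardware: it treats a 512-bit vector as 16 lanes of 32 bits each, and all function names are hypothetical.

```python
from typing import List

LANES_512 = 16  # 16 x 32-bit lanes fit in a 512-bit vector
LANES_256 = 8   # 8 x 32-bit lanes fit in a 256-bit vector


def add_256(a: List[int], b: List[int]) -> List[int]:
    """Model one pass through a 256-bit wide adder (8 lanes)."""
    assert len(a) == len(b) == LANES_256
    return [x + y for x, y in zip(a, b)]


def add_512_full_width(a: List[int], b: List[int]) -> List[int]:
    """Model a native 512-bit unit: all 16 lanes in a single pass."""
    assert len(a) == len(b) == LANES_512
    return [x + y for x, y in zip(a, b)]


def add_512_double_pumped(a: List[int], b: List[int]) -> List[int]:
    """Model a double-pumped unit: two 256-bit passes per 512-bit operation."""
    assert len(a) == len(b) == LANES_512
    low = add_256(a[:LANES_256], b[:LANES_256])    # first pass, lower half
    high = add_256(a[LANES_256:], b[LANES_256:])   # second pass, upper half
    return low + high


a = list(range(16))
b = [10] * 16

# Both models produce the same architectural result.
assert add_512_full_width(a, b) == add_512_double_pumped(a, b)
```

In this model, software issuing a 512-bit operation gets the same result either way; the double-pumped version simply needs two passes through narrower hardware.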


Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

Comments from the forums

Pierce2623

I read ChipsandCheese and now I’m gonna show people how smart I am. Parts of this literally read like they’re ripped straight from ChipsandCheese.
Reply
King_V

Ok, maybe this is nitpicking, but, if we're being honest, shouldn't the headline say "drops less than 10%" or "drops 7%"

A 10% drop from 5700 MHz would be 5130 MHz. But, the article states:

it reduces clock speed from 5,700 MHz to 5,300 MHz
That's a 400 MHz drop. 400 is 7% of 5700. Or, 7.018%, depending on how precise you need to be.

And, given that 10 is about 42.9% higher than 7....
I mean, at that point, I might say "Headline overstates AMD clock frequency drop by 43%"
Reply
Jame5

Two statements that feel very contradictory in back to back paragraphs.

1. "AMD's Zen 5-based desktop processors have four full-width 512-bit execution units for AVX-512, which makes execution of such instructions very efficient..."

2. "...it should be noted that by avoiding the implementation of full-blown 512-bit data paths, AMD makes its cores slightly more compact."

So are we praising AMD for having 512-bit execution, or for not having it?
Reply
mitch074

Jame5 said:
Two statements that feel very contradictory in back to back paragraphs.

So are we praising AMD for having 512-bit execution, or for not having it?
Praise - contrary to Intel that disabled AVX-512 on consumer CPUs because E-cores don't support it, AMD managed to cram AVX-512 support one way or another in all their cores.
Reply
bit_user

The article said:
it remains to be seen how much power AMD Ryzen 9000-series processors do (which are made on TSMC's N4P, 4nm-class process technology) consume when executing AVX-512 instructions.
Phoronix tested this, of course. The test suite-wide average went from 152.26 W with AVX-512 off to 148.64 W with it on. Sounds like I mixed those up, doesn't it? No, that's what the data says!

I think it's because their power limit-based clock throttling might have a tendency to overshoot, which would hit the AVX-512 case more, since that should be power-limited more frequently & severely. Just a guess.

The article said:
AMD's mobile parts, such as the codenamed Strix Point processors, use double-pumped AVX-256 to execute AVX-512 instructions.

While such an approach will probably confuse software developers and, to some degree, end users,
Huh? Why would it confuse software developers? Just like Zen 4, software sees them as full-blown AVX-512 implementations, which they are!

With AVX10, Intel is introducing the new requirement where you have to do a runtime check to see how wide the CPU's implementation is. Then, for software written to extract the most performance, precompiled* software would have to dispatch down either a 256-bit or 512-bit code path, because the operand width of the instructions is baked into the instruction stream.

* As opposed to JIT-compiled languages, like JavaScript or WebAssembly, where the JIT compiler can presumably generate autovectorized code suited to the native implementation.
Reply
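
The runtime dispatch described in the comment above can be sketched roughly as follows. This is a conceptual illustration in Python rather than real SIMD code: `native_vector_bits` is a hypothetical stand-in for whatever width the CPU reports, and the two "kernels" only mimic a 256-bit and a 512-bit code path by processing 8 or 16 lanes per step.

```python
from typing import Callable, Dict, Sequence, Tuple

Kernel = Callable[[Sequence[int]], int]


def make_chunked_sum(lanes: int) -> Kernel:
    """Build a kernel that sums the input `lanes` elements at a time."""
    def kernel(values: Sequence[int]) -> int:
        total = 0
        for start in range(0, len(values), lanes):
            total += sum(values[start:start + lanes])
        return total
    return kernel


# One precompiled path per vector width (32-bit lanes assumed).
KERNELS: Dict[int, Kernel] = {
    256: make_chunked_sum(8),
    512: make_chunked_sum(16),
}


def select_kernel(native_vector_bits: int) -> Tuple[int, Kernel]:
    """Pick the widest available code path the reported width supports."""
    width = 512 if native_vector_bits >= 512 else 256
    return width, KERNELS[width]


# Hypothetical reported widths; real code would query the CPU once at startup.
data = list(range(40))
for reported_bits in (256, 512):
    width, kernel = select_kernel(reported_bits)
    print(f"reported {reported_bits}-bit -> using {width}-bit path, sum={kernel(data)}")
```

The design point is that the choice is made once at run time, while each path itself is fixed ahead of time, which is why precompiled software would need to ship both.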
bit_user

Jame5 said:
Two statements that feel very contradictory in back to back paragraphs.
There's no contradiction. He's talking about the desktop & server CPUs implementing at 512-bit, then later explaining why the mobile cores are still limited to a 256-bit "double-pumped" implementation.

It actually aligns with Intel's stated plan of using AVX10/256 for client processors and AVX10/512 for server processors. The chiplet-based Ryzens use the same CCD chiplets as their server CPUs. So, that can be seen as the reason why the current crop of desktop processors have a 512-bit implementation - riding the coat tails of the servers. However, once AMD brings Strix Point to the desktop, now the picture will get more nuanced, because we'll be back to having some current-model AM5 CPUs with the same approach that Zen 4 used.
Reply
bit_user

Pierce2623 said:
I read ChipsandCheese and now I’m gonna show people how smart I am. Parts of this literally read like they’re ripped straight from ChipsandCheese.
FWIW, I don't care if Toms talks and sounds like ChipsAndCheese, as long as they get the details right (and, of course, cite the other publication where & when actually quoting it or using their data).
Reply
mac_angel

Is Intel having troubles getting AVX-512 to work? They've dropped it from the past couple of generations, and from what I've seen, even the new ones aren't going to support it either. Meanwhile there are more and more games and programs that do support it.

edit: been trying to find a list of games on Google, but most of the posts are a couple of years old.
Reply
bit_user

mac_angel said:
Is Intel having troubles getting AVX-512 to work?
No, the problem is that it takes up too much area in their E-cores, so it undermines their hybrid strategy.

AMD's demonstration that you can implement it at (mostly) 256-bit isn't quite the rebuke that it seems, since I'm pretty sure the E-cores actually have a 128-bit implementation of AVX2 (which is natively 256-bit). That would mean having to implement AVX-512 via 4x 128-bit, which might become rather complex, like in the case of "horizontal" operations.
Reply
mitch074

bit_user said:
Phoronix tested this, of course. The test suite-wide average went from 152.26 W with AVX-512 off to 148.64 W with it on. Sounds like I mixed those up, doesn't it? No, that's what the data says!

I think it's because their power limit-based clock throttling might have a tendency to overshoot, which would hit the AVX-512 case more, since that should be power-limited more frequently & severely. Just a guess.
That, or when AVX512 instructions are running the rest of the core's units are powered down - the AVX512 execution unit becomes the hot spot the CPU needs to throttle down for, but it still uses a bit less power than the whole core does.
Reply
