AVX-512 Works Surprisingly Well on Ryzen 7040 Series Phoenix CPUs

Ryzen Pro Mobile
(Image credit: AMD)

Phoronix recently benchmarked AMD's most sophisticated Ryzen mobile architecture, the 7040 mobile series, in AVX-512 workloads to see how it performs against Intel's last two generations of AVX-512-capable mobile CPUs. It turns out AMD's Phoenix series CPUs are highly effective AVX-512 chips, easily beating the competition in both power efficiency and performance.

The CPUs Phoronix tested were a Ryzen 7 7840U and Intel's older Core i7-1165G7 and i7-1065G7, which come from Intel's last two mobile generations to support AVX-512. The AMD chip blew past the older Intel CPUs, outperforming the 1165G7 by 46% and the 1065G7 by a whopping 63%. The Ryzen 7 chip also saw the largest gain from AVX-512 itself, running 54% faster with AVX-512 enabled than with it disabled. The Intel chips weren't even close, with a gain of 35%.

Phoronix AVX-512 Comparison

(Image credit: Phoronix)

AMD's performance gains with AVX-512 are impressive, especially given that Zen 4 (the CPU architecture the 7840U uses) is the very first architecture from team red to adopt the instruction set. Intel, conversely, has had years of experience developing AVX-512-capable architectures but has failed to pull off the same performance margins as AMD. Intel also had to deal with AVX-512 performance and capability oddities in Rocket Lake and Alder Lake that AMD's Zen 4 architecture does not have.

AVX-512 is a relatively new instruction set that was first developed by Intel in the mid-2010s. The instruction set offers more efficient data processing compared to other AVX standards and is capable of boosting highly complex computation workloads, such as scientific simulation, 3D modeling, analytics, data compression, deep learning, and more.

The instruction set was first seen in desktop consumer chips in 2017, starting with Intel's Skylake-X CPU lineup of HEDT processors. Since then, the instruction set has made its way to desktop and mobile consumer chips, including Rocket Lake, Tiger Lake, and Ice Lake.

But, unexpectedly, Intel dropped AVX-512 support altogether when Alder Lake launched, even though the architecture featured improved AVX-512 capabilities over Rocket Lake. The problem was that Intel couldn't get AVX-512 to work in conjunction with its E-cores, which did not support AVX-512 at all. Oddly, though, AVX-512 was actually functional on the P-cores for a little while, as long as you disabled the E-cores in the BIOS.

The ironic part is that AMD was busy integrating AVX-512 into its Zen 4 CPU architecture when Alder Lake dropped, making 2022 one of the worst years to drop AVX-512 support on the consumer side for Intel.

So, not only do AMD's Zen 4 mobile CPUs feature AVX-512 support, but they are also the only players in the space until Intel decides to reintroduce it in its consumer mobile chips. This gives AMD-powered notebooks a large performance edge for users whose workloads can exploit AVX-512's faster processing capabilities.
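
For readers who want to check whether a given Linux machine exposes AVX-512 at all, here is a minimal sketch. It assumes the kernel lists CPU features on the `flags` line of `/proc/cpuinfo`, with AVX-512 features named with an `avx512` prefix (for example `avx512f`). The function name `avx512_flags` is made up for this illustration.

```python
from pathlib import Path


def avx512_flags(cpuinfo_text: str) -> list[str]:
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Keep only feature names that start with the avx512 prefix.
            found.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(found)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux-only; absent on other systems
    text = path.read_text() if path.exists() else ""
    flags = avx512_flags(text)
    print("AVX-512 flags:", " ".join(flags) if flags else "none reported")
```

An empty result only means the operating system does not report the feature. It says nothing about how fast AVX-512 code would run on a chip that does report it.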

Aaron Klotz
Freelance News Writer

Aaron Klotz is a freelance writer for Tom’s Hardware US, covering news topics related to computer hardware such as CPUs and graphics cards.

  • hotaru.hino
    To me the elephant in the room is how much power does it take up? IIRC, that was a major issue with AVX-512 workloads.
    Reply
  • drhoi
    AMD implemented AVX-512 differently than Intel to address the power issues with Intel's AVX-512 implementation. Consequently, there is a negligible power difference when using AVX-512 on AMD. I have an Intel Cascade Lake machine where AVX-512 was a bust because continuous use of AVX-512 produces such extreme thermal throttling that it ran no faster than not using it.

    The Phoronix article has power use results that show no significant power or CPU frequency degradation when using AMD AVX-512, like the graphic below
    Reply
  • Findecanor
    AMD CPUs have multiple AVX execution units. For AVX-512, two units operate in lock-step, each operating on different lanes of the same vector.
    VIA/Centaur has a CPU that does the same thing.

    AVX-512 doesn't just increase the register width to 512 bits, however. It is overall a more complete, more modern instruction set than AVX2, and it can also operate on 128-bit and 256-bit vectors.
    You could probably get modest speed improvements over AVX2 just by using AVX-512 with 256-bit vectors (sometimes called "AVX-256").
    Reply
  • bit_user
    As usual, the devil is in the details.

    First, these tests cover only a hand-picked selection of benchmarks that benefit from AVX-512.

    Second, the GeoMean got badly skewed by a few OpenVINO tests which greatly benefited from some newer AVX-512 instructions in Zen 4 that the older Intel CPUs lack. If you exclude those benchmarks, then the speedup seen by Phoenix should more closely match the two Intel CPUs.

    As for the baseline/absolute performance difference, those tests compare two quad-core Intel CPUs against an 8-core Phoenix. On the other hand, the Tiger Lake is using significantly more power than either of the other two, which is the main reason it's so much faster than Ice Lake (internally, the two have basically the same microarchitecture).
    Reply
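
bit_user's point about the GeoMean can be illustrated with a small sketch. The speedup figures and benchmark names below are invented for the example and are not Phoronix data. It only shows how a couple of very large per-test speedups pull a geometric mean upward, and what happens when those tests are excluded.

```python
from statistics import geometric_mean

# Hypothetical per-benchmark speedups (AVX-512 on vs. off); not Phoronix data.
speedups = {
    "bench_a": 1.20,
    "bench_b": 1.30,
    "bench_c": 1.25,
    "bench_d": 1.35,
    "openvino_x": 4.00,
    "openvino_y": 5.00,
}


def geomean(values):
    """Geometric mean of an iterable of positive speedup ratios."""
    return geometric_mean(list(values))


def geomean_excluding(results, prefix):
    """Geometric mean over results whose benchmark name does not start with prefix."""
    return geometric_mean(
        [v for name, v in results.items() if not name.startswith(prefix)]
    )


all_gm = geomean(speedups.values())
trimmed_gm = geomean_excluding(speedups, "openvino_")
print(f"all tests: {all_gm:.2f}x, without openvino_*: {trimmed_gm:.2f}x")
```

The geometric mean is less sensitive to outliers than the arithmetic mean, but two tests out of six with 4x to 5x gains still move it noticeably.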
  • bit_user
    hotaru.hino said:
    To me the elephant in the room is how much power does it take up? IIRC, that was a major issue with AVX-512 workloads.
    They measured that, too. I'm not sure if this is the same graphic @drhoi tried to post, but:
    URL: https://www.phoronix.com/benchmark/result/ryzen-7-7840u-amd-zen-4-avx-512-analysis/cpu-power-consumption-monitor-ptssm.svgz
    Basically, here are the averages:
    Model            Baseline (AVX/AVX2)   AVX-512
    Core i7-1065G7   14.57 W               15.11 W
    Core i7-1165G7   29.93 W               28.73 W
    Ryzen 7 7840U    16.39 W               15.88 W

    So, no significant difference between AVX-512 and not. Although, with these being laptops, they're probably just bouncing off their configured power limits in each case.
    Reply
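
To put numbers on "no significant difference", the averages bit_user listed can be turned into percentage changes. This is a minimal sketch using only the six wattage figures quoted in that post; the helper name `percent_change` is made up for the example.

```python
# Average power in watts from the post above: (baseline AVX/AVX2, AVX-512).
power_w = {
    "Core i7-1065G7": (14.57, 15.11),
    "Core i7-1165G7": (29.93, 28.73),
    "Ryzen 7 7840U": (16.39, 15.88),
}


def percent_change(baseline, new):
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


changes = {
    model: percent_change(baseline, avx512)
    for model, (baseline, avx512) in power_w.items()
}

for model, change in changes.items():
    print(f"{model}: {change:+.1f}% average power with AVX-512")
```

All three chips land within a few percent of their baseline, which fits the suggestion that each laptop is simply running at its configured power limit.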
  • hotaru.hino
    bit_user said:
    Basically, here are the averages:
    Model            Baseline (AVX/AVX2)   AVX-512
    Core i7-1065G7   14.57 W               15.11 W
    Core i7-1165G7   29.93 W               28.73 W
    Ryzen 7 7840U    16.39 W               15.88 W

    So, no significant difference between AVX-512 and not. Although, with these being laptops, they're probably just bouncing off their configured power limits in each case.
    I'm pretty sure this was just the low power limit being hit, rather than something like this (from https://www.anandtech.com/show/16535/intel-core-i7-11700k-review-blasting-off-with-rocket-lake/2)

    And am too lazy to find out if anyone did an AVX512/no-AVX512 test.
    Reply
  • bit_user
    Findecanor said:
    AMD CPUs have multiple AVX execution units. For AVX-512, two units operate in lock-step, each operating on different lanes of the same vector.
    No, actually the way it works is by splitting most operations into two halves and sending them each down the same 256-bit execution port. This is like how Intel CPUs implemented SSE, prior to the Core microarchitecture.

    Sandy Bridge actually implemented AVX the way you're saying, by fusing two 128-bit ports so that a 256-bit op could be dispatched every cycle.
    Reply
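
The "two halves down the same 256-bit port" description can be modeled with a toy sketch. It is not a model of real Zen 4 scheduling: `add_256` is a stand-in for one pass through a 256-bit unit, and all names are invented for the example. It only shows that the split gives the same result as a full-width add, and that under this simplified model the number of 256-bit passes is the same whether the data is processed as 256-bit or 512-bit vectors.

```python
LANES_256 = 8   # 32-bit lanes in a 256-bit vector
LANES_512 = 16  # 32-bit lanes in a 512-bit vector


def add_256(a, b):
    """Stand-in for one pass through a 256-bit execution unit."""
    assert len(a) == len(b) == LANES_256
    return [x + y for x, y in zip(a, b)]


def add_512_double_pumped(a, b):
    """Perform a 512-bit add as two 256-bit passes; return (result, passes used)."""
    assert len(a) == len(b) == LANES_512
    low = add_256(a[:LANES_256], b[:LANES_256])
    high = add_256(a[LANES_256:], b[LANES_256:])
    return low + high, 2


def passes_needed(n_elements, vector_lanes):
    """256-bit passes needed to add n_elements values with vectors of the given width."""
    vectors = -(-n_elements // vector_lanes)       # ceiling division
    passes_per_vector = vector_lanes // LANES_256  # 1 for 256-bit, 2 for 512-bit
    return vectors * passes_per_vector


a = list(range(LANES_512))
b = [10] * LANES_512
result, passes = add_512_double_pumped(a, b)
print(result, passes)
print(passes_needed(64, LANES_256), passes_needed(64, LANES_512))
```

In this model the 512-bit path halves the instruction count but not the number of execution passes, which is one way to read the claim that Zen 4 does the same amount of work per cycle with either instruction set.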
  • bit_user
    hotaru.hino said:
    I'm pretty sure this was just the low power limit being hit, rather than something like this (from https://www.anandtech.com/show/16535/intel-core-i7-11700k-review-blasting-off-with-rocket-lake/2)

    And am too lazy to find out if anyone did an AVX512/no-AVX512 test.
    Yeah, but also remember that Rocket Lake is the 14 nm backport of Ice Lake, whereas the Intel CPUs in Phoronix' benchmark are made on Intel 10 nm.

    But, you're almost certainly right that the Intel CPUs would use more power on AVX-512, if not clock/power/thermally throttled. The reason Zen 4 might be an exception is that it can do the same amount of work per cycle, either using AVX/AVX2 instructions or AVX-512, given how they implemented AVX-512.

    Edit: I found this benchmark of AVX-512 on Ice Lake SP showing AVX-512 used 23.3% more power for only 34.1% more performance:
    https://openbenchmarking.org/embed.php?i=2301176-NE-SAPPHIRER14&sha=65a87c239a64&p=2
    Source: https://www.phoronix.com/review/intel-sapphirerapids-avx512#page-8
    On Sapphire Rapids, power consumption stayed about the same with AVX-512 as without, but performance jumped by 44.2%. In this case, OpenVINO seemed to play a much smaller role. So, I trust those results a bit more.
    Reply
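
The Ice Lake SP and Sapphire Rapids figures quoted above can be combined into a performance-per-watt change. This is a minimal sketch using only the percentages from the post. Treating "about the same" power on Sapphire Rapids as exactly 0% change is an assumption made for the example, and `perf_per_watt_gain` is a made-up helper name.

```python
def perf_per_watt_gain(perf_gain_pct, power_gain_pct):
    """Percent change in performance per watt, given percent changes in each."""
    perf_ratio = 1.0 + perf_gain_pct / 100.0
    power_ratio = 1.0 + power_gain_pct / 100.0
    return (perf_ratio / power_ratio - 1.0) * 100.0


# Ice Lake SP: 34.1% more performance for 23.3% more power (from the post).
ice_lake_sp = perf_per_watt_gain(34.1, 23.3)

# Sapphire Rapids: 44.2% more performance; power assumed unchanged (0%).
sapphire_rapids = perf_per_watt_gain(44.2, 0.0)

print(f"Ice Lake SP: {ice_lake_sp:+.1f}% perf/W")
print(f"Sapphire Rapids: {sapphire_rapids:+.1f}% perf/W")
```

On these numbers Ice Lake SP gains roughly 9% in performance per watt from AVX-512, while the Sapphire Rapids gain equals its full performance gain because power is assumed flat.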
  • hotaru.hino
    bit_user said:
    Yeah, but also remember that Rocket Lake is the 14 nm backport of Ice Lake, whereas the Intel CPUs in Phoronix' benchmark are made on Intel 10 nm.

    But, you're almost certainly right that the Intel CPUs would use more power on AVX-512, if not clock/power/thermally throttled. The reason Zen 4 might be an exception is that it can do the same amount of work per cycle, either using AVX/AVX2 instructions or AVX-512, given how they implemented AVX-512.
    What I'm trying to understand is if this is something I should actually be impressed by, or simply something that's on track for what's expected and this is just websites sensationalizing something again.

    Although digging around some more, AnandTech did test AVX512 on Alder Lake and found that the power consumption issue went away. So had Intel kept it around, I'd expect this behavior going forward.
    Reply
  • digitalgriffin
    Waiting for Intel fans to call this lies, irrelevant, and how avx512 doesn't matter.
    Reply