AMD claims LLMs run up to 79% faster on its Ryzen mobile CPUs than on Intel’s newest Core Ultra chips

AMD AI
(Image credit: AMD)

AMD reports that its older Ryzen mobile 7040 “Phoenix” and Ryzen mobile 8040 series processors outperform Intel’s Core Ultra “Meteor Lake” CPUs by up to 79% in various large language model (LLM) workloads. The company published a raft of benchmarks pitting its Ryzen 7 7840U against Intel’s Core Ultra 7 155H; both chips sport hardware-based Neural Processing Units (NPUs).

AMD put together several slides comparing the two CPUs in Mistral 7B, Llama v2, and Mistral Instruct 7B. In Llama v2 Chat at a Q4 bit size, the Ryzen chip achieved 14% more tokens per second than the Core Ultra 7 155H; at the same bit size in Mistral Instruct, it was 17% faster. Looking instead at time to first token for a sample prompt in the same LLMs, the Ryzen chip was 79% faster than the Core Ultra 7 in Llama v2 and 41% faster in Mistral Instruct.
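For readers unfamiliar with the two metrics in AMD’s slides, a quick sketch shows how time to first token (prompt-processing latency) and tokens per second (generation throughput) are typically measured. The `generate` function here is a hypothetical stand-in for any streaming LLM API; the sleeps merely simulate latency.

```python
import time
from typing import Iterator

def generate(prompt: str) -> Iterator[str]:
    # Hypothetical stand-in for a streaming LLM: a fixed delay simulates
    # prompt processing, then each token arrives after a decode delay.
    time.sleep(0.05)            # simulated prompt processing
    for tok in prompt.split():
        time.sleep(0.01)        # simulated per-token decode
        yield tok

start = time.perf_counter()
first_token_at = None
count = 0
for _ in generate("the quick brown fox jumps over the lazy dog"):
    if first_token_at is None:
        # Time to first token: delay until the first output appears.
        first_token_at = time.perf_counter() - start
    count += 1
elapsed = time.perf_counter() - start

# Throughput: total tokens divided by total wall-clock time.
print(f"TTFT: {first_token_at * 1000:.0f} ms, "
      f"throughput: {count / elapsed:.1f} tok/s")
```

Time to first token dominates how responsive a chatbot feels, while tokens per second governs how quickly a long answer streams out, which is why AMD reports both.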

AMD showed another chart of Llama 2 7B Chat spanning a range of bit sizes, block sizes, and quality levels. On average, the Ryzen 7 7840U was 55% quicker than its Intel counterpart, and up to 70% faster in the Q8 results. Although Q8 produced the largest lead, AMD recommends 4-bit K M quantization for real-world LLM use, stepping up to 5-bit K M for tasks requiring extra accuracy, such as coding.
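To put those bit sizes in perspective, here is a back-of-the-envelope estimate of weight storage for a 7B-parameter model. The nominal bits-per-weight values are illustrative assumptions only; real formats such as llama.cpp’s K-quants mix block sizes and carry per-block scale metadata, so actual files come out somewhat larger.

```python
PARAMS = 7_000_000_000  # 7B weights

def approx_size_gib(bits_per_weight: float) -> float:
    """Approximate weight storage in GiB at a given nominal bit width."""
    return PARAMS * bits_per_weight / 8 / 2**30

# Lower bit widths shrink the model (and memory traffic) at some
# accuracy cost, which is the trade-off behind AMD's Q4/Q5 advice.
for name, bits in [("FP16", 16), ("Q8", 8), ("Q5", 5), ("Q4", 4)]:
    print(f"{name:>4}: ~{approx_size_gib(bits):.1f} GiB")
```

On a memory-bandwidth-bound laptop NPU, roughly halving the bytes per weight is also why lower-bit quantizations tend to generate tokens faster.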

We are not surprised that AMD is currently winning this round of the AI performance war with Intel. Although its Ryzen 7040 series offers a similar level of performance (in TOPS) to Meteor Lake, we discovered late last year that AMD often outperforms Meteor Lake in AI-based workloads. This appears to be a matter of LLM software optimization rather than a hardware or driver issue: AMD notably wins in AI workloads that don’t take advantage of Intel’s OpenVINO framework, which is optimized exclusively for Intel products. OpenVINO appears to be vital to Intel’s AI performance; Intel’s Arc A770, for instance, gains a tremendous 54% performance improvement purely from OpenVINO optimizations.

Don’t expect this performance gap to last long. We are only at the beginning of NPU development, after all. If more apps don’t embrace OpenVINO, we expect Intel to switch gears and pursue an optimization route that more developers will adopt. Intel is also preparing to launch its next-generation Lunar Lake mobile CPU architecture later this year, which will reportedly deliver three times the AI performance of Meteor Lake (on top of substantial IPC improvements for the CPU cores).

For now, AMD’s slides demonstrate that it currently has the edge in NPU performance, especially with its Ryzen 8040 series CPUs, which have even more NPU performance than the Ryzen 7 7840U. But by the end of this year, the tables could turn depending on how successful Intel is with Lunar Lake and its AI optimization plans.

Aaron Klotz
Freelance News Writer

Aaron Klotz is a freelance writer for Tom’s Hardware US, covering news topics related to computer hardware such as CPUs and graphics cards.

  • newtechldtech
Who cares ... Apple M3 Pro outperforms them both...
  • Eximo
    And it is the GPU doing the heavy lifting anyway. Is anyone surprised?
  • usertests
    Eximo said:
    And it is the GPU doing the heavy lifting anyway. Is anyone surprised?
    The NPU in Strix Point will have around tripled performance, possibly outperforming the iGPU and CPU on its own.
  • Alvar "Miles" Udell
Last I saw the Ryzen 8040 was rated for 39 TOPS, which is less than the 40 TOPS Microsoft requires to be labeled an "AI PC", so does it -really- matter that it's faster (supposedly) than Intel if it's still slower than requirements?
  • Notton
    newtechldtech said:
Who cares ... Apple M3 Pro outperforms them both...
    You mean under-performs both?

    M3 only does 18 TOPS

    Core Ultra is max 34 TOPS
    Ryzen 8040 series is max 38 TOPS
Snapdragon X Elite claims to do between 45 and 75 TOPS.
  • scottslayer
    Wow, I sure am glad AMD gets to self report results again.
  • Amdlova
AMD self claims are the pure truth =] You can believe lol
  • hotaru251
    Amdlova said:
AMD self claims are the pure truth =] You can believe lol
That's how all first-party stuff is.

    They will always use cherry picked results that make theirs look good.
  • usertests
    Notton said:
    You mean under-performs both?

    M3 only does 18 TOPS

    Core Ultra is max 34 TOPS
    Ryzen 8040 series is max 38 TOPS
Snapdragon X Elite claims to do between 45 and 75 TOPS.
    You're comparing the Apple M3 Neural Engine (NPU only) to Intel/AMD using CPU+iGPU+NPU all at the same time, so not exactly fair.

    https://en.wikipedia.org/wiki/Apple_M3#Variants
    Apparently the iPhone 15 Pro's NPU is almost twice as fast as the M3 variants.

    Despite the months of hype, Snapdragon X Elite isn't out yet. AMD will also be doing 45+ TOPS out of the Strix Point NPU by the end of the year.
  • Notton
    usertests said:
    You're comparing the Apple M3 Neural Engine (NPU only) to Intel/AMD using CPU+iGPU+NPU all at the same time, so not exactly fair.

    https://en.wikipedia.org/wiki/Apple_M3#Variants
    Apparently the iPhone 15 Pro's NPU is almost twice as fast as the M3 variants.

    Despite the months of hype, Snapdragon X Elite isn't out yet. AMD will also be doing 45+ TOPS out of the Strix Point NPU by the end of the year.
Well, if Apple doesn't want their M3 to look so bad, maybe they would plaster ads about the total TOPS?
    Seeing as they don't, the M3 has 18 TOPS.

Snapdragon X Elite's launch date is 24'Q2, and it's been well known for some 4+ months now.

    Strix Point, or 8050 series is 45~48 TOPS, yeah.

It will be interesting to see if X Elite will have a paper or hard launch. AMD mobile parts have always been paper launches that show up some 3 to 6 months after being announced.