AMD deep-dives Zen 5 architecture — Ryzen 9000 and AI 300 benchmarks, RDNA 3.5 GPU, XDNA 2, and more
Zen 5's 16% IPC improvement floats all boats.
AMD RDNA 3.5
AMD’s RDNA 3.5 engine powers the Radeon 890M and 880M integrated graphics in the Strix Point processors, but AMD doesn’t use it for the Ryzen 9000 processors — those continue to leverage the RDNA 2 architecture.
AMD’s Mark Papermaster credits the company’s partnership with Samsung, wherein the company licensed its RDNA graphics IP for use in Galaxy smartphones, as a key source of learning about low-power environments. Those same lessons are very useful for other mobile designs, like laptops, that are also limited by battery power.
AMD incorporated those lessons into RDNA 3.5, an incremental update to the RDNA 3 engine with a series of optimizations that improve performance per watt, including targeted changes to the texture and shader engines. The design also improves performance per bit with an optimized memory subsystem. Papermaster says these alterations yielded a "double-digit performance gain" per unit of energy expended.
Other improvements include a doubled texture sampler rate, achieved by doubling the number of texture sample units to introduce more parallelism for game texturing. AMD also doubled the pixel interpolation and comparison rates and added floating-point support to the scalar arithmetic logic unit (SALU). A new SALU instruction identifies single-use VGPR write operations and lets the GPU skip them, boosting performance and efficiency. Papermaster also pointed to an entirely new method of creating smaller sub-batches that reduces accesses to LPDDR5 memory, and to optimized memory compression that reduces data traffic, both of which save power while improving performance.
AMD claims the combined optimizations yield a 32% improvement in power efficiency for a Strix Point processor running at 15W in the 3DMark Time Spy benchmark over a Hawk Point chip also running at 15W. AMD’s tests also showed a 19% improvement in power efficiency in the 3DMark Night Raid benchmark. Remember, however, that these are purely synthetic workloads that aren’t the best proxy for real-world gaming performance or efficiency.
AMD XDNA 2 NPU Architecture
AMD’s Ryzen AI 300 series is the company’s third generation of processors with an in-built neural processing unit (NPU). AMD’s Phoenix chips were the first x86 processors with an in-built NPU, delivering 10 TOPS of performance via the XDNA NPU, and AMD improved that to 16 TOPS with the second-gen Hawk Point models. However, those gains came from increased clock speeds instead of changes to the XDNA architecture.
Strix Point moves forward to 50 TOPS of NPU performance with the second-gen XDNA 2 engine, a technology born of AMD’s Xilinx acquisition. Beyond the speeds and feeds, one of the biggest rationales for native AI acceleration is power savings. AMD says its XDNA 2 engine is up to 35X more power efficient at running an AI model than the CPU, a capability that becomes more critical for long-duration background workloads, the sweet spot for NPUs.
The XDNA 2 engine is a spatial dataflow architecture with a 2D array of compute tiles tied together with a flexible interconnect that can be programmed at run time to create custom compute hierarchies. AMD says all other NPUs have a fixed hierarchy and don’t have the terabytes of east/west bandwidth available in XDNA 2's interconnect fabric. The architecture also employs SRAM buffers placed throughout the array. AMD claims the cache-less design offers very deterministic latency — key for AI workloads — and the programmable interconnect maximizes bandwidth by allowing seamless data multicasting between units to reduce traffic on the fabric.
The design also supports flexible real-time partitioning by providing adjustable compute streams. For instance, a single column of AIE compute tiles can be dedicated to a light workload, while a quad-column array can be assigned to a heavier task. The engine supports up to eight concurrent isolated streams. This technique is designed to optimize power, performance, bandwidth, and latency while running concurrent AI models. Power gating is also essential to save power during idle time, and the engine supports per-column granularity for power gating.
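To make the partitioning idea concrete, here is a minimal sketch of a column allocator. It is not AMD's scheduler: the class, its method names, the workload names, and the eight-column width are all assumptions for illustration (the article only states single-column and quad-column partitions, up to eight isolated streams, and per-column power gating).

```python
class ColumnPartitioner:
    """Toy model of carving an NPU tile array into per-stream column groups."""

    def __init__(self, total_columns=8, max_streams=8):
        # total_columns=8 is an assumption; the article does not give the array shape.
        self.total_columns = total_columns
        self.max_streams = max_streams
        self.owner = [None] * total_columns  # stream name per column, None = idle

    def active_streams(self):
        return {name for name in self.owner if name is not None}

    def allocate(self, stream, width):
        """Reserve `width` contiguous columns for `stream` and return their indices."""
        if width < 1:
            raise ValueError("width must be at least 1")
        if stream in self.active_streams():
            raise ValueError(f"{stream} already has a partition")
        if len(self.active_streams()) >= self.max_streams:
            raise RuntimeError("stream limit reached")
        for start in range(self.total_columns - width + 1):
            span = range(start, start + width)
            if all(self.owner[col] is None for col in span):
                for col in span:
                    self.owner[col] = stream
                return list(span)
        raise RuntimeError("not enough free contiguous columns")

    def release(self, stream):
        """Free every column held by `stream`."""
        self.owner = [None if name == stream else name for name in self.owner]

    def power_gated_columns(self):
        """Idle columns are the candidates for per-column power gating."""
        return [col for col, name in enumerate(self.owner) if name is None]


npu = ColumnPartitioner()
light = npu.allocate("background-blur", 1)  # single column for a light task
heavy = npu.allocate("language-model", 4)   # quad-column group for a heavier task
gated = npu.power_gated_columns()           # whatever is left can be switched off
```

In this toy model the light task lands on column 0, the heavier task on columns 1 through 4, and the three remaining columns are reported as idle and eligible for gating.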
Architectural enhancements have added plenty of processing horsepower per tile, but AMD also had to expand the number of tiles from 20 to 32 to reach the full 50 TOPS target. The company also added 1.6X more on-chip memory and twice the number of MACs (multiply accumulators) per tile.
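A quick back-of-the-envelope check shows how the tile and MAC figures relate to the 50 TOPS number. This sketch assumes throughput scales linearly with tile count and MACs per tile, and that XDNA 2 runs at roughly Hawk Point's NPU clock; the article states neither, so treat the result as a rough estimate only.

```python
HAWK_POINT_TOPS = 16   # from the article: first-gen XDNA at Hawk Point clocks
OLD_TILES = 20         # from the article
NEW_TILES = 32         # from the article
MAC_MULTIPLIER = 2     # from the article: twice the MACs per tile


def scaled_tops(base_tops, old_tiles, new_tiles, mac_multiplier):
    """Linear-scaling estimate (an assumption): more tiles times more MACs per tile."""
    return base_tops * (new_tiles / old_tiles) * mac_multiplier


tiles_only = scaled_tops(HAWK_POINT_TOPS, OLD_TILES, NEW_TILES, 1)
estimate = scaled_tops(HAWK_POINT_TOPS, OLD_TILES, NEW_TILES, MAC_MULTIPLIER)
print(f"tiles only: {tiles_only:.1f} TOPS, tiles plus MACs: {estimate:.1f} TOPS")
```

Under those assumptions, adding tiles alone would give about 25.6 TOPS, while adding tiles and doubling the MACs gives about 51.2 TOPS, which is in line with the 50 TOPS rating.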
The architectural changes with the XDNA 2 engine result in up to five times the compute capacity and twice the power efficiency of the first-gen XDNA engine. XDNA 2 also supports running up to eight concurrent AI models.
NPU performance is typically measured by performance in INT8 workloads, a less precise data type that uses less computing and memory to run a model. However, models must be quantized to the INT8 format first, and they lose some precision in the process.
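The precision loss is easy to see in a small sketch. The code below uses symmetric per-tensor quantization, one common scheme chosen here purely for illustration; the article does not say which scheme any vendor's toolchain uses, and the function names and sample weights are made up.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    peak = max(abs(v) for v in values)
    scale = peak / 127 if peak else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.02, -0.5, 1.27, 0.003]  # made-up sample values
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(quantized)  # [2, -50, 127, 0]
```

The smallest weight, 0.003, falls below half a quantization step and is stored as 0, which is the kind of rounding loss that quantized models have to tolerate or be tuned around.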
AMD’s XDNA 2 NPU supports Block FP16, a new data format that is said to provide the full accuracy of FP16 with many of the same compute and memory characteristics as INT8. AMD says Block FP16 is plug-and-play with its implementation; it doesn’t require quantizing, tuning, or retraining of existing models.
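The general idea behind block floating-point formats can be sketched as follows: a group of values shares one exponent, and each value keeps only a short mantissa. The block size of eight, the 8-bit signed mantissa, and the 8-bit shared exponent below are assumptions for illustration; the article does not describe the actual Block FP16 layout, and this is not AMD's implementation.

```python
import math

BLOCK_SIZE = 8      # assumption: values per shared exponent
MANTISSA_BITS = 8   # assumption: signed mantissa width per value


def encode_block(values):
    """Return (shared_exponent, mantissas) for one block of floats."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return 0, [0] * len(values)
    _, exponent = math.frexp(peak)  # peak = m * 2**exponent with 0.5 <= m < 1
    limit = 2 ** (MANTISSA_BITS - 1) - 1
    step = 2.0 ** (exponent - (MANTISSA_BITS - 1))  # value of one mantissa unit
    mantissas = [max(-limit, min(limit, round(v / step))) for v in values]
    return exponent, mantissas


def decode_block(exponent, mantissas):
    """Rebuild floats from the shared exponent and per-value mantissas."""
    step = 2.0 ** (exponent - (MANTISSA_BITS - 1))
    return [m * step for m in mantissas]


def bits_per_value(block_size=BLOCK_SIZE, mantissa_bits=MANTISSA_BITS, exponent_bits=8):
    """Average storage cost once the shared exponent is spread over the block."""
    return (block_size * mantissa_bits + exponent_bits) / block_size


block = [1.0, 0.5, -0.25, 0.75, 0.0, 0.125, -1.5, 0.0625]  # made-up sample values
exponent, mantissas = encode_block(block)
decoded = decode_block(exponent, mantissas)
print(exponent, mantissas, bits_per_value())
```

With these assumed parameters the sample block round-trips exactly and costs an average of 9 bits per value, which illustrates how a shared exponent can keep a floating-point range at close to INT8 storage cost. Values much smaller than the block's largest element would still lose low-order bits in this sketch.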
AMD claims to have the only NPU on the market supporting Block FP16, but Intel has said Lunar Lake also supports the math format. AMD’s representatives didn’t seem familiar with Intel’s support for the format, and they acknowledged that this would invalidate their Lunar Lake performance projections in the benchmarks above.
Thoughts
The Ryzen 9000 'Granite Ridge' processors arrive on July 31, an opportune time. Intel is struggling with widespread crashes with its flagship enthusiast processors, an unresolved problem, and its competing Arrow Lake processors won't come to market until later in the year. That could potentially leave AMD with the performance crown for several months. Ryzen 9000 is coiled to take on Intel’s coming chips, with the 16% IPC increase floating all boats in terms of performance, while the up to 40% reduction in TDP for the lower-tier models capitalizes on AMD’s power consumption advantages.
However, Arrow Lake will be a stiff competitor — it will be the first to come with Intel’s new 20A process node. This node features Intel's first backside power delivery (PowerVia) and gate-all-around (GAA/RibbonFET) transistors. The chips are also said to come with the new Lion Cove P-cores and Skymont E-cores, again marking another significant step forward that should keep competition heated in the desktop PC market.
Laptops with AMD’s Ryzen AI 300 ‘Strix Point’ processors will be available on shelves this month, and that couldn’t come at a better time: Qualcomm’s Snapdragon X Elite Arm processors have stolen the show with the distinction of powering the only systems that meet Microsoft’s requirements to be branded as Windows Copilot+ PCs.
It appears that AMD’s official Copilot+ certification won’t come until later in the year. Still, the ability to field AI-capable PCs with higher TOPS performance from the NPU, even if only five additional TOPS, is a marketing win that will help keep AMD in the limelight of the AI PC upgrade craze. AMD also has the distinction of beating Intel’s Lunar Lake to market, leaving it some breathing room for now. And AMD doesn't need to worry about x86 emulation or graphics driver woes, something that Qualcomm continues to work on.
The Zen 5 Ryzen 9000 'Granite Ridge' processors arrive on July 31, and Ryzen AI 300 'Strix Point' laptops will also be on shelves by the end of the month. Stay tuned for the real tale of the tape when we post our reviews later this month.
Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.
-
TerryLaze
Admin said: AMD revealed the deep-dive details of its Zen 5 Ryzen 9000 ‘Granite Ridge’ and Ryzen AI 300 series ‘Strix Point’ chips at its Zen 5 Tech Day.
TDP / PBP / MTP
Please don't use intel specific terminology on ryzen, they don't make any sense.
It's TDP and PPT only for ryzen.
-
TerryLaze
Also testing was done with a ~ $200 water cooler....just saying but if you need that to run the 9950x at stock, 230w ppt, then it's gonna be a joke, at least on the intel system you would get like ~350W out of that.
-
jeremyj_83
TerryLaze said: Also testing was done with a ~ $200 water cooler....just saying but if you need that to run the 9950x at stock, 230w ppt, then it's gonna be a joke, at least on the intel system you would get like ~350W out of that.
So if you can get more performance out of the AMD with a lower power draw that is a negative?
-
TerryLaze
jeremyj_83 said: So if you can get more performance out of the AMD with a lower power draw that is a negative?
If you have to pay another $200 on top of the price of the CPU to get the performance that AMD claims then that is a bad thing.
And I don't know how you do math but 230W of the 9950x is not lower than 230W of the 7950x
Being able to use 330-50W with the same cooling that another CPU can only use 230-50w with is a good thing because that means that if you use less power on it you will have much better temps.
Although we don't know, the other article shows the 9950x using 320W so maybe AMD chose to show overclocking numbers for their presentation, I honestly don't know which would be worse.
-
jeremyj_83
TerryLaze said: If you have to pay another $200 on top of the price of the CPU to get the performance that AMD claims then that is a bad thing. And I don't know how you do math but 230W of the 9950x is not lower than 230W of the 7950x. Being able to use 330-50W with the same cooling that another CPU can only use 230-50w with is a good thing because that means that if you use less power on it you will have much better temps. Although we don't know, the other article shows the 9950x using 320W so maybe AMD chose to show overclocking numbers for their presentation, I honestly don't know which would be worse.
Literally none of what you are saying makes sense. On top of that AMD might have used a $200 cooler to make sure that they couldn't be called out for hurting possible i9-14900k performance. Also note that at a 170W TDP the AMD chips have a 230W PPT and that is based on AM5 specifications.
-
evdjj3j
TerryLaze said: Also testing was done with a ~ $200 water cooler....just saying but if you need that to run the 9950x at stock, 230w ppt, then it's gonna be a joke, at least on the intel system you would get like ~350W out of that.
Wow, I lost some IQ points reading that.
-
TheSecondPower
"The chips are also said to come with the new Lion Cove P-cores and Gracemont E-cores." That should say "Skymont E-cores." Gracemont is used in Alder Lake and Raptor Lake.
-
TerryLaze
jeremyj_83 said: Literally none of what you are saying makes sense. On top of that AMD might have used a $200 cooler to make sure that they couldn't be called out for hurting possible i9-14900k performance. Also note that at a 170W TDP the AMD chips have a 230W PPT and that is based on AM5 specifications.
evdjj3j said: Wow, I lost some IQ points reading that.
It's not like it's a secret or in any way controversial that ryzen is very hard to cool.
The same amount of cooling that is required to get the PPT of ryzen at thermal throttle temps is enough to give intel 50% more power draw at 8 degrees lower temp.
https://www.anandtech.com/show/17641/lighter-touch-cpu-power-scaling-13900k-7950x/3