AMD publishes first Zen 6 document detailing ground-up redesign on 2nm process node — brand-new 8-wide CPU core with strong vector capabilities
But there are a few catches.
AMD this week released a document titled "Performance Monitor Counters for AMD Family 1Ah Model 50h-57h Processors" (discovered by InstLatX64) that reveals numerous architectural details of AMD's Zen 6-based CPUs, including the EPYC 'Venice' processor for data centers, through performance monitoring interfaces. As it turns out, Zen 6 is not exactly an evolution of Zen 5, but rather an all-new design with a different ideology.
AMD has been talking about its Zen 6-based CPUs in very general terms for quite a while, revealing that they will feature up to 256 cores and be made using TSMC's 2nm-class process technology. This week's PMC document for software developers indicates that the Zen 6 microarchitecture is no longer an incremental evolution of Zen 4/Zen 5, but a deliberately wide, throughput-oriented design with an eight-slot dispatch engine and simultaneous multi-threading (SMT).
In such a design, two hardware threads dynamically contend for a shared pool of dispatch slots, so, at the same clock speeds, the single-thread performance of Zen 6-based processors may not be as high as that of Apple's 9-wide (or wider) CPUs in all situations. However, in some instances, this type of architecture promises very high performance. Furthermore, the core has dedicated counters for unused dispatch slots, backend stalls, and thread-selection losses, which confirms that wide issue and SMT arbitration are the factors AMD is betting on with Zen 6.
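To make the slot accounting concrete, here is a minimal sketch of how counters like these are typically turned into a utilization breakdown. The field names are hypothetical and are not taken from AMD's document, and the sketch assumes the three loss categories do not overlap, which the document may or may not guarantee.

```python
from dataclasses import dataclass

# Dispatch width as described in the article; everything else here is illustrative.
DISPATCH_WIDTH = 8


@dataclass
class SlotCounts:
    """Hypothetical per-core counter readings (names are NOT AMD's event names)."""
    cycles: int
    unused_slots: int         # slots in which nothing was dispatched
    backend_stall_slots: int  # slots lost because the backend could not accept work
    smt_lost_slots: int       # slots lost to thread selection (the other SMT thread won)


def slot_breakdown(c: SlotCounts) -> dict[str, float]:
    """Return the fraction of dispatch slots in each category.

    Assumes the three loss categories are disjoint (an assumption for this sketch).
    """
    total = DISPATCH_WIDTH * c.cycles
    if total <= 0:
        raise ValueError("cycles must be positive")
    lost = c.unused_slots + c.backend_stall_slots + c.smt_lost_slots
    if lost > total:
        raise ValueError("lost slots exceed total slots; categories may overlap")
    return {
        "used": (total - lost) / total,
        "unused": c.unused_slots / total,
        "backend_stall": c.backend_stall_slots / total,
        "smt_lost": c.smt_lost_slots / total,
    }


if __name__ == "__main__":
    sample = SlotCounts(cycles=1000, unused_slots=1000,
                        backend_stall_slots=2000, smt_lost_slots=1000)
    for name, share in slot_breakdown(sample).items():
        print(f"{name:>14}: {share:.1%}")
```

With eight slots per cycle, 1,000 cycles offer 8,000 dispatch opportunities, so the sample readings above would mean only half of them did useful work for this thread.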
The Zen 6 counters also give substantially more visibility into vector and floating-point execution, underscoring the architecture's emphasis on dense-math workloads. According to the PMC documentation, Zen 6 processors support full-width AVX-512 execution with FP64, FP32, FP16, and BF16 data formats, including FMA/MAC operations and mixed FP-INT vector execution (including VNNI-class, AES, and SHA operations). Furthermore, the core delivers sustained 512-bit throughput high enough to require merged performance counters for accurate measurement. This is hardly proof that Zen 6-based CPUs will be AVX-512 performance champions, but it does show that Zen 6 can retire enough vector work per cycle to overwhelm legacy measurement methods.
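For a sense of scale, the arithmetic behind 512-bit vectors can be sketched as follows. The element widths are standard for these formats; the FLOP figure is an idealization that applies one fused multiply-add to every lane and counts the multiply and the add separately, and it says nothing about how many such instructions Zen 6 can actually execute per cycle.

```python
VECTOR_BITS = 512

# Element widths in bits for the data formats named in the article.
FORMAT_BITS = {"FP64": 64, "FP32": 32, "FP16": 16, "BF16": 16}


def lanes(fmt: str) -> int:
    """Number of elements of the given format that fit in one 512-bit vector."""
    return VECTOR_BITS // FORMAT_BITS[fmt]


def fma_flops_per_vector(fmt: str) -> int:
    """Idealized FLOPs for one FMA across all lanes (multiply + add = 2 per lane)."""
    return 2 * lanes(fmt)


if __name__ == "__main__":
    for fmt in FORMAT_BITS:
        print(f"{fmt}: {lanes(fmt)} lanes, "
              f"{fma_flops_per_vector(fmt)} FLOPs per full-width FMA")
```

Even a single full-width FP32 FMA accounts for 32 floating-point operations under this counting, which gives a feel for how quickly per-cycle totals grow once several such operations retire together.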
In general, Zen 6's performance-focused capabilities suggest it is AMD's first microarchitecture designed from the ground up for data center use cases. It remains to be seen which features will be retained in client offerings and how well these perform. But based on what we can observe today, Zen 6-based CPUs will be number-crunching monsters.

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.
rluker5: Wasn't Zen 4 4-wide and Zen 5 6-wide, with Zen 6 now going to 8-wide? On that basis, with AVX-512 staying basically the same, the uplift could be smaller than from Zen 4 to Zen 5, or it could be larger due to other things not clearly explained here. I'm skeptical of the ground-up redesign claim, though, as I've heard that from AMD a few times before when it wasn't really true.
It is definitely an improvement and the rumored 50% increase in cores per CCD will really help single CCD CPUs in some tasks, but the experience for most will likely be more dependent on clockspeed and IPC gains from the node change as the vast majority of common user tasks aren't significantly benefitted by using over 8 cores.
If the node improvement isn't enough to offset the added heat from the added cores and wideness, then sustained clockspeeds will be lower, as opposed to the rumors of 6.0, 6.5, or 7.0 GHz clocks. Heat comes from compute per second per area at a given efficiency.
AMD is doing what they should with node improvements IMO as they are balancing the responsiveness of high clockspeeds with the inefficiency of high clockspeeds by using the improved efficiency of the node improvement to get more done per clock instead of going too far up the inefficiency curve. It isn't everything at once but getting the best practical outcome by balancing several related variables.
It seems like AMD and Intel both do this and are both limited by node improvements (per core) and silicon expenditures (multicore, cache quantity) when it comes to CPU generational improvements.
And a good example of where architectural design improvements outpaced node improvements is Rocket Lake. And a good example of going too far up the inefficiency curve on clockspeeds is Raptor Lake, especially when voltages were handled in some outdated archaic fashion with the built for Haswell LLC settings that were exacerbated by motherboard manufacturers giving a dirty all core undervolt at the expense of single core overvolt.
I don't think many want to see Zen 6 repeat either of these mistakes and most would prefer an architecture and execution to best balance the opportunities and limitations given by the node and how much AMD is willing to spend on silicon per CPU. Don't hype yourself up too much from rumors and you won't be as disappointed with reality. The I/O die could bring some extra noticeable improvements though.
jp7189: I'd like to hear more about the IOD. There should be some new tricks there to help feed the beast. Dare I hope for quad channel on the desktop? Bad timing with current memory prices. At the least, I expect the platform to be optimized for high RAM clockspeeds.