AMD Unwraps Zen 2 Microarchitecture, X570 Chipset, Ryzen APUs

(Image credit: AMD)

AMD unveiled its new 16-core 32-thread Ryzen 9 3950X here at its Next Horizon Gaming Tech Day, along with deep-dive details of its X570 chipset, information on Ryzen 3000's higher overclocking potential, the expansion of auto-overclocking to the mainstream desktop, and many more details on the Zen 2 microarchitecture. AMD also unveiled its new Ryzen 3 3200G and Ryzen 5 3400G APUs, which drop into Socket AM4 motherboards as well.

Ryzen 9 3950X

AMD's new Ryzen 3000 series has the chance to truly redefine our expectations for the mainstream desktop by offering models with up to 16 cores and 32 threads that drop right into mainstream motherboards. That brings what was previously an HEDT-class (high end desktop) level of performance down to more acceptable mainstream price points, opening up the possibility of a big upset to Intel's dominance in the desktop PC market. The Ryzen 9 3950X comes to market in September for $749. We have more coverage of the Ryzen 9 3950X here.

Ryzen 3000 Series Product Stack

(Image credit: AMD)

The Ryzen 9 3950X is undoubtedly AMD's halo part for the mainstream desktop, but it is just part of the company's full frontal assault on Intel's entire lineup. As you can see in the image above, the 3950X isn't present because there simply isn't a comparable Intel model. But analyzing the competitive landscape of the rest of the series (based on pricing) reveals one clear trend: AMD offers more threads than Intel in its competitive price ranges. Intel retains the advantage of integrated graphics compared to some of AMD's models, but the Ryzen 3 3200G and Ryzen 5 3400G APUs, the lone 12nm chips with the Zen+ architecture on the chart, slot in to attack Intel's low-end products with capable Vega graphics cores.

Ryzen 3000 Series Performance

AMD's per-core performance, which is generally a mixture of IPC and frequency, has improved substantially with the debut of the Zen 2 microarchitecture, and translates to all types of workloads, be they single- or multi-threaded. As measured with a single-threaded Cinebench workload in the chart above, AMD has made tremendous strides in single-threaded performance compared to the first-gen Ryzen models, with up to 21% more per-core performance with the Ryzen 9 3900X.

The benefits of the 7nm process also translate to lower power consumption, with the Ryzen 9 3900X drawing 15W less power at the wall and delivering 58% more performance-per-watt than the Core i9-9900K. That translates to easier cooling for the chip, and the full line of existing socket AM4 coolers is compatible. AMD derived 40% of the single-threaded performance improvement from the faster 7nm process, while 60% of the performance uplift comes from the new Zen 2 microarchitecture.

AMD has given us the full deep-dive details of the Zen 2 microarchitecture, which we'll cover in more depth below. AMD claims the new design delivers a range of performance improvements in 1080p gaming, making the chips far more competitive with Intel's models than first-gen Ryzen was, and, notably, the remaining deltas will shrink even further at higher resolutions. AMD also provided these performance numbers without patching Intel's systems for the wide range of vulnerabilities that have sapped Intel's performance, and it didn't use the new Ryzen-optimized Windows scheduler, which we'll cover below. That means AMD has provided a best-case scenario for Intel's systems in these benchmarks, so the deltas could possibly be wider in AMD's favor.

The 12-core 24-thread Ryzen 9 3900X stands out as a real performer based on its price point: AMD claims it offers comparable gaming performance, 47% more performance in threaded 'creator' workloads, and is 58% more power efficient than Intel's similarly-priced Core i9-9900K, not to mention its advantage of PCIe 4.0 connectivity. Notably, AMD provided these benchmarks, so we'll need to wait for our samples to arrive to confirm. Either way, the claimed performance improvements are impressive.

The X570 Chipset - PCIe 4.0 Goes Mainstream

But performance requires a lot more than "just" heftier core counts. AMD has invested heavily in pushing the PCIe 4.0 interface into the wild by encouraging its ODMs and partners to develop motherboards that will benefit from the doubling of throughput over PCIe 3.0. Typically we would expect the company with the most market share to lead the push to a faster interface, in this case Intel, but AMD has stepped into the role quite nicely. The company infused the new technology into its "Navi" Radeon 5000 series GPUs and worked with storage vendors to assure a supply of speedy new PCIe 4.0 SSDs when the company's X570 platform arrives on 7/7.  
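
To put the "doubling of throughput" in numbers, here is a minimal back-of-the-envelope sketch. It assumes the per-lane signalling rates of 8 GT/s for PCIe 3.0 and 16 GT/s for PCIe 4.0 with 128b/130b encoding, and it ignores packet and protocol overhead, so real devices will land below these figures. The function name is ours, purely for illustration.

```python
# Per-lane raw signalling rates in gigatransfers per second (GT/s).
GT_PER_SEC = {"3.0": 8.0, "4.0": 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gbps(gen: str, lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s.

    Ignores packet/protocol overhead, so this is an upper bound.
    """
    bits_per_sec = GT_PER_SEC[gen] * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_sec / 8 / 1e9


# x4 is the typical M.2 SSD link width; x16 is the typical GPU slot.
for lanes in (4, 16):
    gen3 = pcie_bandwidth_gbps("3.0", lanes)
    gen4 = pcie_bandwidth_gbps("4.0", lanes)
    print(f"x{lanes}: PCIe 3.0 {gen3:.2f} GB/s, PCIe 4.0 {gen4:.2f} GB/s")
```

The x4 row is the one that matters for the new PCIe 4.0 SSDs, while the x16 row applies to graphics cards.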

(Image credit: Tom's Hardware)

PCIe 4.0 provides yet another advantage for performance seekers, particularly in the content creation realm, over Intel's platform, but it will come at the cost of higher pricing for X570-equipped motherboards. Those pricing increases could come down over time as the PCIe 4.0 component ecosystem, like switches and redrivers, benefits from economies of scale, but AMD is wisely encouraging its partners to continue offering the current-gen X470 motherboards that will now serve as a lower tier of motherboards. AMD's new Ryzen 3000 series lineup is fully compatible with existing X470 motherboards and will operate at their full performance, albeit at the loss of PCIe 4.0 connectivity. That shouldn't be too much of a concern for users without PCIe 4.0 devices or large RAID storage arrays.

(Image credit: AMD)

We also learned that the actual X570 chipset is a 14nm variant of the 12nm I/O die inside the Ryzen 3000-series processors, which is a clever reuse of the technology that will ultimately lower costs. AMD uses the smaller 12nm process for the in-package I/O die to leverage the increased frequency potential for the memory controllers, which improves memory data transfer rates. For the chipset die, the company uses the more economical 14nm variant, which has its memory controllers disabled.

Faster Memory and Auto-Overclocking

AMD has improved memory overclocking substantially, partly due to decoupling the Infinity Fabric from the memory clock. AMD also bumped up the base supported memory frequency from DDR4-2933 to DDR4-3200, but the real advantage comes from heightened memory overclocking potential. AMD's first-gen Ryzen processors had plenty of difficulties with memory overclocking when they first launched, but AMD has addressed those concerns with the third-gen products and even demoed an air-cooled Ryzen platform running at DDR4-5100 at the show.

As with previous-gen Ryzen, memory overclocking confers big performance speedups for gaming. To sidestep the Infinity Fabric's maximum frequency of 2,000 MHz, which effectively constrains memory overclocking, AMD will now allow users to separate the memory and Infinity Fabric clock dependencies. The domains remain tied together at a 1:1 ratio up to DDR4-3733, but run at a 2:1 ratio beyond that transfer rate. This setting, which is user-adjustable in the BIOS, improves memory bandwidth but comes with a latency penalty.
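
The relationship between the two clock domains is easy to sketch. The snippet below is our own illustration, not AMD's firmware logic: it assumes the memory clock is half the DDR4 transfer rate (two transfers per clock), and it reads the 2:1 mode as the Infinity Fabric running at half the memory clock. The function name and the `coupled_limit` parameter are hypothetical, and the user-adjustable BIOS override is not modeled.

```python
def clocks_for(ddr4_rate: int, coupled_limit: int = 3733) -> dict:
    """Return memory and Infinity Fabric clocks for a DDR4 transfer rate.

    Assumption: 1:1 up to and including `coupled_limit` (DDR4-3733 per the
    article), 2:1 above it.
    """
    memclk = ddr4_rate / 2  # DDR memory moves two transfers per clock cycle
    ratio = 1 if ddr4_rate <= coupled_limit else 2
    return {
        "memclk_mhz": memclk,
        "fclk_mhz": memclk / ratio,  # fabric clock drops when decoupled
        "ratio": f"{ratio}:1",
    }


for rate in (3200, 3733, 4000, 5100):
    c = clocks_for(rate)
    print(f"DDR4-{rate}: MEMCLK {c['memclk_mhz']:.1f} MHz, "
          f"FCLK {c['fclk_mhz']:.1f} MHz ({c['ratio']})")
```

Under these assumptions, stepping just past the 1:1 limit lowers the fabric clock sharply, which is one way to picture where the latency penalty comes from.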

AMD is also bringing the auto-overclocking Precision Boost Overdrive (PBO) feature from its Threadripper lineup down to the mainstream chips. This feature allows Ryzen processors to communicate with the platform to modulate performance based on what the motherboard's power delivery subsystem can do, which is the key enabler for Precision Boost Overdrive. The processor then analyzes telemetry data on power delivery and thermal overhead and makes adjustments to the core clock and voltages automatically in the background, boosting performance based upon the workload dynamically. This feature is definitely a welcome addition to AMD's mainstream platform.

Ryzen-Specific Windows 10 Scheduler Updates

AMD has worked with Microsoft to deliver on a much needed feature: A Ryzen-aware scheduler. The new scheduler arrives with the Windows 10 May update and will benefit both current-gen and previous-gen multi-die Ryzen models (Threadripper and Ryzen 3000 processors).

The new scheduler pins active threads in cores that have localized data, thus improving performance. AMD also introduced its CPPC2 feature, which is a software feature that manipulates Ryzen 3000's power states from within the operating system. AMD says this will reduce power state transition latency from 30ms to 1ms, which will ultimately save power.

Zen 2 Microarchitecture

Given the limited resources of AMD compared to its larger rival Intel, this is truly a David vs. Goliath story that began unfolding with the debut of the first-gen Ryzen processors and the revolutionary Zen microarchitecture. Zen's modular and scalable design provides AMD with plenty of advantages in terms of cost and time to market, and fine-grained tuning to the architecture has yielded phenomenal results.

AMD has improved IPC by roughly 15% (though that can vary by workload), doubled the L3 cache size to keep data as close to the execution units as possible, and doubled floating point performance by stepping up to two 256-bit floating point units (FPUs) that enable support for AVX2 instructions.

AMD shared deep-dive details of the microarchitecture, but due to time constraints, we'll have to follow up with a deeper analysis. Headline improvements include a doubled micro-op cache and a doubled L3 cache, which came at the expense of a smaller L1 instruction cache that is now an 8-way associative 32K block as opposed to the 64K block with 4-way associativity on first-gen Ryzen. AMD also beefed up the Translation Lookaside Buffer to 2,000 entries.

AMD now has a double-stage branch predictor, with its Perceptron predictor handling the first stage while a new TAGE branch predictor, which features larger lookup tables to improve performance, serves as the second stage. AMD says the improved branch predictor does expend some extra energy on the front end, but the 30% lower misprediction rate ultimately saves more energy on the backend.
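
For readers unfamiliar with perceptron predictors, here is a toy sketch of the general technique as described in the academic literature. It is emphatically not AMD's design: the table size, history length, training threshold, and unbounded integer weights are all our assumptions, and the TAGE second stage is not modeled at all.

```python
class PerceptronPredictor:
    """Toy global-history perceptron branch predictor (illustrative only)."""

    def __init__(self, history_len: int = 8, table_size: int = 64):
        self.table_size = table_size
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history: +1 means taken, -1 means not taken.
        self.history = [-1] * history_len
        # Commonly cited training-threshold heuristic (an assumption here).
        self.threshold = int(1.93 * history_len + 14)

    def _output(self, pc: int) -> int:
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc: int) -> bool:
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc: int, taken: bool) -> None:
        """Train on the real outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when confidence is below threshold.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history, start=1):
                w[i] += t * hi
        self.history = self.history[1:] + [t]


def run_pattern(predictor: PerceptronPredictor, pc: int, outcomes) -> int:
    """Feed a sequence of outcomes; return how many were predicted correctly."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct


# A branch that alternates taken / not taken is learnable from history.
pattern = [i % 2 == 0 for i in range(200)]
p = PerceptronPredictor()
run_pattern(p, 0x40, pattern)          # warm-up / training pass
hits = run_pattern(p, 0x40, pattern[:50])
print(f"correct after training: {hits}/50")
```

The point of the sketch is only that the predictor learns correlations between past outcomes and the next one. Hardware implementations use small saturating weights and hashed indexing.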

The doubled L3 cache means that AMD's four-core complexes are now wrapped around 16MB of cache apiece, and the L3 cache still serves as a victim cache for L2 data. The larger cache does result in a latency penalty that amounts to a few cycles of higher latency than first-gen Ryzen, but AMD feels the increased capacity offsets those losses.
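
As a rough illustration of what "victim cache" means, the toy model below fills the L3 only with lines evicted from the L2 and moves a line back to the L2 when it hits in the L3. Capacities are counted in lines, and both levels use simple LRU replacement. The class name and all of those choices are our simplifications rather than details AMD disclosed.

```python
from collections import OrderedDict


class VictimHierarchy:
    """Toy L2 + victim-L3 model with LRU replacement (illustrative only)."""

    def __init__(self, l2_lines: int, l3_lines: int):
        self.l2 = OrderedDict()  # ordered oldest -> most recently used
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    def _fill_l2(self, addr: int) -> None:
        self.l2[addr] = True
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)  # evict the LRU L2 line
            self.l3[victim] = True                   # L3 only receives L2 victims
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)          # oldest victim leaves L3

    def access(self, addr: int) -> str:
        if addr in self.l2:
            self.l2.move_to_end(addr)
            return "L2 hit"
        if addr in self.l3:
            del self.l3[addr]        # line moves back up to the L2
            self._fill_l2(addr)
            return "L3 hit"
        self._fill_l2(addr)          # a miss fills the L2 directly, not the L3
        return "miss"


h = VictimHierarchy(l2_lines=2, l3_lines=4)
for addr in (1, 2, 3, 1, 3, 2):
    print(addr, h.access(addr))
```

In this model a larger L3 simply means evicted lines survive longer before they fall out of the hierarchy, which is the capacity benefit AMD is trading against the extra latency.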

AMD infused new hardware-based mitigations into its architecture to help deal with the only two vulnerabilities that impact its architecture: Spectre and speculative store bypass.

AMD also added support for a few new instructions, like CWLFLSH to ensure data is flushed from caches to persistent memories, but we're told the company is holding back other new instructions for its Hot Chips event. We're going to follow up with a full breakdown of the microarchitecture in the interim. Stay tuned.

7nm Process

Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • Rdslw
    507 single thread score.
    it means its 5x stronger t1 vs t1 than my old linux box with 3210m, AND 2x strong t1 VS all core on my older laptop.....
    even new one with i7-8750H, 32 GB DDR4 is not even half on both of the benches .... that's some solid horses guys.
  • setx
    "doubled floating point performance by stepping up to two 256-bit floating point units (FPUs) that enable support for AVX2 instructions"
    It doesn't "enable support" for AVX2 as Zen1 already supported it – it doubles the speed as Zen1 was executing them in two 128-bit steps.

    Would be nice to have AVX-512 support even at half the speed, but I guess those 32 zmm's are just too expensive.
  • alextheblue
    I knew they had x4 lanes direct to one M.2 and x4 to the chipset, but I thought all the USB ports were fed by the chipset. Having four 10Gb ports right off the CPU is pretty nice.