Zen, Meet Vega
If you missed our coverage of the Zen design, check out Everything Zen: AMD Presents New Microarchitecture At HotChips. We're not going that deep in today's review. But to better understand how the Raven Ridge die operates, we have to take a quick look at the Zeppelin silicon that made Ryzen famous.
The Zen microarchitecture centers on a four-core CCX building block. AMD complements each CCX with an 8MB L3 cache split into four slices. Two CCXes (we outlined one in green) come together to create an eight-core Zeppelin die. All Ryzen desktop processors, until now, featured the same underlying design, regardless of the number of active cores.
CCXes communicate with each other via AMD’s Infinity Fabric, which is an optimized version of HyperTransport, and share memory controllers over the bus. This is basically two quad-core CPUs talking to each other over an on-die interconnect that also handles northbridge and PCIe traffic.
Raven Ridge essentially replaces the second CCX with a graphics engine. Now, the die is divided up into one CCX, Vega Graphics, and the uncore. The uncore includes an Infinity Controller, the Infinity Fabric, and the I/O and System Hub. Whereas Zeppelin is composed of 4.8 billion transistors across 213mm², the Raven Ridge die below has 4.94 billion transistors and measures 209.8mm².
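To put those two sets of figures side by side, here is a minimal sketch that derives transistor density from the counts and areas quoted above. The density values are our own arithmetic, not numbers AMD published, and the function and dictionary names are purely illustrative.

```python
# Transistor counts and die areas as quoted in the text above.
# The density figures are derived here, not supplied by AMD.
DIES = {
    "Zeppelin": {"transistors": 4.8e9, "area_mm2": 213.0},
    "Raven Ridge": {"transistors": 4.94e9, "area_mm2": 209.8},
}


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return density in millions of transistors per square millimeter."""
    return transistors / area_mm2 / 1e6


if __name__ == "__main__":
    for name, die in DIES.items():
        density = density_mtr_per_mm2(die["transistors"], die["area_mm2"])
        print(f"{name}: {density:.1f} MTr/mm^2")
```

By this arithmetic, Raven Ridge packs slightly more transistors into each square millimeter than Zeppelin, which fits a die that adds a graphics engine while shrinking a little.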
Unlike previous Ryzen products, all four execution cores reside in a single CCX (orange block to the left in the image above). That means an application running on multiple cores does not have to traverse the Infinity Fabric to communicate with other cores and cache. We know from past tests that working across the Infinity Fabric with a set of “remote” cores (and cache) can negatively affect performance in latency-sensitive applications, such as games. Raven Ridge’s single CCX should fare better in those situations.
We outlined the four-core CCXes with green boxes. Similar to what you've seen from AMD's Zeppelin die, the center of a Raven Ridge CCX contains vertical rows of L3 cache. Of course, a Zeppelin CCX has four rows of L3 cache units in the center, which add up to 8MB. The Raven Ridge CCX only sports two rows, giving us 4MB. That means Raven Ridge's L3 capacity isn't an arbitrary restriction or the product of market segmentation. Rather, it was an architectural design choice.
The orange block in the upper-left corner of Raven Ridge contains the interconnect circuitry and control units. That's in the same place on Zeppelin. But the DDR4 memory controllers and platform I/O circuitry around the edges move to different locations. Work definitely went into getting this die's layout just right, and even though the cores themselves appear identical, the CCX design is new.
Raven Ridge processors use Infinity Fabric to connect the CPU cores and on-die Vega CUs (the blue block on the right). But the fabric is merely a protocol. That means it can travel through a number of physical connections, such as interposers, PCB traces, or internal PCIe lanes. One could guess that the protocol operates over an internal PCIe bus, and that the graphics engine consumes some available connectivity, thus trimming Raven Ridge's externally accessible lanes down to eight. It's also possible that the drop from 16 to eight lanes was another design choice, just like the smaller L3 cache.
As we've demonstrated, increasing the system's memory frequency also improves Infinity Fabric throughput, speeding transfers between the execution cores and CUs. And of course, the Vega-based graphics engine stands to benefit greatly from more memory bandwidth, so you'll want to crank DDR4 frequencies up as much as possible for better performance.
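For a rough sense of what higher DDR4 data rates buy, the sketch below computes theoretical peak bandwidth for a dual-channel configuration. It assumes a 64-bit (8-byte) bus per channel, which is standard for DDR4, and the data rates listed are illustrative picks rather than settings we tested. The memory clock helper simply halves the data rate; we treat the idea that the fabric clock tracks that memory clock on this platform as an assumption, not something measured here.

```python
def peak_bandwidth_gbs(data_rate_mts: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a DDR configuration."""
    return data_rate_mts * bus_bytes * channels / 1000


def memory_clock_mhz(data_rate_mts: int) -> float:
    """DDR transfers twice per clock, so the memory clock is half the data rate."""
    return data_rate_mts / 2


if __name__ == "__main__":
    # Illustrative data rates only; real-world throughput lands below these peaks.
    for rate in (2400, 2933, 3200):
        print(
            f"DDR4-{rate}: {peak_bandwidth_gbs(rate):.1f} GB/s peak, "
            f"{memory_clock_mhz(rate):.0f} MHz memory clock"
        )
```

These are ceilings, not measurements. The CPU cores and the graphics engine share whatever bandwidth the memory actually delivers.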
Unfortunately, we can't yet measure Infinity Fabric latency improvements with existing tools, though we're working on ways around that. In the meantime, we ran some benchmarks on the new cache hierarchy. Despite Ryzen 5 2400G's lower cache capacity, we are expecting speed-ups attributable to design tweaks.
| Range | 2KB - 32KB | 32KB - 512KB | 512KB - 8MB | 8MB - 1GB |
From a high level, Ryzen 5 2400G's single-threaded cache throughput remains comparable to that of previous-gen Ryzen processors. But multi-threaded throughput declines significantly because fewer cache regions respond to requests.
As a result of the new single-CCX design and other tweaks, we observe the lowest L2 and L3 cache latency we've seen from a Ryzen CPU. That's a good omen for latency-sensitive applications. The trend holds true for all three types of data access, which we've explained in depth. We also provide zoomed-out versions of the latency measurements that show main memory latency. The 2400G excels in the sequential and full random access tests to main memory.
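To illustrate the difference between the sequential and full random access tests mentioned above, here is a minimal pointer-chasing sketch. It is not the tool used for our measurements, and the function names are hypothetical. Each array element stores the index of the next element to visit, so every load depends on the previous one. The random chain is built with Sattolo's algorithm, which guarantees a single cycle that touches every element. In Python the timing would mostly reflect interpreter overhead, so the code only demonstrates the access patterns; real latency tools run the same idea in native code over buffers sized to each cache level.

```python
import random


def sequential_chain(n: int) -> list[int]:
    """Each element points to its neighbor, wrapping around at the end."""
    return [(i + 1) % n for i in range(n)]


def random_cycle(n: int, seed: int = 0) -> list[int]:
    """Sattolo's algorithm: a random permutation that forms one n-element cycle."""
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain for a number of dependent loads and return the final index."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx


def cycle_length(chain: list[int], start: int = 0) -> int:
    """Count how many hops it takes to return to the starting element."""
    idx = chain[start]
    hops = 1
    while idx != start:
        idx = chain[idx]
        hops += 1
    return hops


if __name__ == "__main__":
    n = 1024
    print("sequential cycle length:", cycle_length(sequential_chain(n)))
    print("random cycle length:", cycle_length(random_cycle(n, seed=1)))
```

Because both chains form a single full cycle, a chase of any length keeps touching the whole buffer. The sequential chain is friendly to hardware prefetchers, while the random one defeats them and exposes the raw latency of whichever level of the hierarchy the buffer fits in.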