AMD Ryzen 7 5800X3D Review: 3D V-Cache Powers a New Gaming Champion

96MB of L3 cache goes Brrrr

Ryzen 7 5800X3D
Editor's Choice
3D V-Cache Technology

The idea behind 3D V-Cache is relatively simple, but the execution is complex. Any on-chip cache exists to keep frequently accessed data as close to the execution cores as possible, avoiding high-latency trips to main memory. As a result, the cores spend less time waiting for data, so they stay busier and deliver more performance. The L3 cache is slower than the other caches (L1 and L2), but its higher capacity means it can store more data, improving the hit rate (the share of memory requests that the cache can serve without going to main memory). There's a reason AMD calls it "Game Cache": L3 cache is very important to performance, and games, in particular, can suffer from high L3 latency or reduced cache capacity and hit rates.

But a big slab of cache is best. As you can see in the above album, AMD stacks an additional SRAM chiplet, connected via TSVs to the lower die, directly in the center of the compute die (CCD) to isolate it from the heat-generating cores on the sides of the chiplet. However, AMD has to use a silicon shim on top of the cores to create an even surface for the heat spreader that sits atop the chiplet. Contrary to popular belief, this is a single shim that wraps around the chiplet on three sides (images in cache testing section). Silicon is an excellent thermal conductor, but the shim and extra SRAM die will inevitably reduce thermal dissipation from the bottom die, thus resulting in less thermal headroom. We show that impact in our boost frequency and thermal load testing. The extra memory also consumes more power.

AMD says overclocking isn't possible because the cache chiplet and the CCD share the same power plane and the effective voltage limit for the SRAM chiplet weighs in at 1.35V. Since the core voltage can't be altered separately, that prevents overclocking the CPU core frequencies. Unfortunately, this also hampers peak chip frequencies during normal operation, so the 3D V-Cache tech does contribute to the 5800X3D’s lower clock speeds. For perspective, the Ryzen 7 5800X has a 1.5V limit, so it can reach higher clock speeds.

AMD's 3D chip stacking tech is based on TSMC's SoIC technology. TSMC's SoIC is a bumpless chip stacking tech, meaning that it doesn't use microbumps or solder to connect the two die. Instead, the two die are milled to such a perfectly flat surface that the TSV channels can mate without any type of bonding material, reducing the distance between the cache and core by 1000X. That reduces heat and power consumption while boosting bandwidth. You can read much more about the hybrid bonding and manufacturing process here. AMD says the technique uses silicon fab-like manufacturing with back-end like TSVs, which means the production flow is similar to that of a regular chip. 

| | 7nm 3D V-Cache Die | 7nm Core Complex Die (CCD) | 12nm I/O Die (IOD) |
| --- | --- | --- | --- |
| Transistor Count | 4.7 Billion | 4.15 Billion | 2.09 Billion |
| Transistor Density (MTr/mm^2) | ~114.6 | ~51.4 | ~16.7 |

As before, the 7nm Core Complex Die (CCD) has 4.15 billion transistors spread out over 80.7mm^2 of silicon. Meanwhile, the new smaller 7nm 3D V-Cache die measures only 41mm^2, yet has 4.7 billion transistors. As you can see in the table, that means it has slightly more than twice the transistor density, which is due to AMD using a density-optimized version of 7nm that's specialized for SRAM. It's also important to remember that a standard compute die includes several types of transistors (libraries, standard cells) for different purposes, so density varies across the die. In contrast, the V-Cache die uses a largely uniform layout.
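The density figures in the table follow directly from the transistor counts and die areas quoted above. As a quick sanity check, here is a minimal Python sketch of that arithmetic. It covers only the two dies whose areas are given in the text; the dictionary layout and function name are our own, chosen purely for illustration.

```python
# Transistor counts and die areas as quoted in the text above.
DIES = {
    "3D V-Cache die": {"transistors": 4.7e9, "area_mm2": 41.0},
    "Core Complex Die (CCD)": {"transistors": 4.15e9, "area_mm2": 80.7},
}


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return density in millions of transistors per square millimeter."""
    return transistors / area_mm2 / 1e6


# Density for each die, keyed by die name.
densities = {name: density_mtr_per_mm2(**die) for name, die in DIES.items()}

# How much denser the V-Cache die is than the compute die.
ratio = densities["3D V-Cache die"] / densities["Core Complex Die (CCD)"]

for name, value in densities.items():
    print(f"{name}: ~{value:.1f} MTr/mm^2")
print(f"V-Cache die vs. CCD density ratio: ~{ratio:.2f}x")
```

The ratio lands a little above 2x, which matches the "slightly more than twice" figure cited above.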

The L3 cache chiplet spans the same amount of area as the L3 cache on the CCD underneath, but it also has twice the capacity. That's due to the optimized process, but also partially because the additional L3 cache slice is somewhat 'dumb' — all the control circuitry resides on the base die, which helps reduce the inevitable latency overhead associated with fetching data from a separate die (more on that in the cache testing section later).

AMD Ryzen 7 5800X3D 3D V-Cache Design and Latency

Several factors influenced AMD's decision to use 3D-stacked SRAM, but key among them is that SRAM density isn't scaling as fast as logic density. As a result, caches now comprise a higher percentage of the die area than before, but without delivering meaningful capacity increases. Furthermore, expanding the cache laterally would incur higher latency due to longer wire lengths and eat into the available die area that AMD could use for cores. Additionally, adding another SRAM chiplet in a 2D layout isn't feasible due to the latency and bandwidth impact.

To address those issues, AMD stacks the additional SRAM directly on top of the center of the compute die where the existing L3 resides. This L3-on-L3 stacking allows the lower die to deliver power and communicate through two rows of TSV connections that extend upwards into the bottom of the L3 cache chiplet. These connections go vertically into the upper die and fan out, which actually reduces the amount of distance data has to travel, thus reducing the number of cycles needed for traversal compared to a standard planar (2D) cache expansion. As a result, the L3 chiplet provides the same 2 TB/s of peak throughput as the on-die L3 cache, but it only comes with a four-cycle latency penalty.

AIDA L3 Cache Latency Measurements

| Tom's Hardware | Ryzen 7 5800X3D | Ryzen 7 5800X |
| --- | --- | --- |
| AIDA - L3 Latency | 13.84 ns | 11.49 ns |
| AIDA - L3 Cycles | 47 clk | 43 clk |

The album above outlines our cache and memory latency benchmarks with the AMD Ryzen 7 5800X3D and the 5800X using the Memory Latency tool from the Chips and Cheese team. These tests measure cache latency with varying sizes of data chunks, and the first slide zooms in on the L3 portion of the cache. Here we can see that the tool measures the Ryzen 7 5800X3D's L3 latency at 12-13ns, whereas the 5800X measures at 10-11ns (the second slide shows the zoomed-out version). We also used AIDA to record the latency measurements, which we listed in the table. Overall, the 3D V-Cache triples the amount of L3 cache but incurs a fairly negligible ~2ns latency impact and a four-cycle penalty.
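For readers who want to check the arithmetic behind the AIDA figures, the short Python sketch below computes the latency and cycle deltas from the table. It also divides cycles by nanoseconds to get an implied clock for each chip. Treat that last number as a rough ratio of the two reported figures, assuming both describe the same access; it is not a measured frequency.

```python
# AIDA L3 figures as listed in the table above.
AIDA_L3 = {
    "Ryzen 7 5800X3D": {"latency_ns": 13.84, "cycles": 47},
    "Ryzen 7 5800X": {"latency_ns": 11.49, "cycles": 43},
}

x3d = AIDA_L3["Ryzen 7 5800X3D"]
base = AIDA_L3["Ryzen 7 5800X"]

# Absolute penalty of the stacked cache in time and in core cycles.
delta_ns = x3d["latency_ns"] - base["latency_ns"]
delta_cycles = x3d["cycles"] - base["cycles"]


def implied_clock_ghz(entry: dict) -> float:
    """Cycles per nanosecond equals GHz. This is an inferred ratio, not a measurement."""
    return entry["cycles"] / entry["latency_ns"]


print(f"L3 latency delta: {delta_ns:.2f} ns, {delta_cycles} cycles")
for name, entry in AIDA_L3.items():
    print(f"{name}: implied clock ~{implied_clock_ghz(entry):.2f} GHz")
```

If the implied clocks are taken at face value, part of the nanosecond gap would come from the 5800X3D's lower frequency during the test rather than from the extra cycles alone, which is consistent with the chip's lower clock speeds noted earlier.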

As mentioned before, the L3 cache chiplet spans the same amount of area as the L3 cache on the CCD underneath, but it has twice the capacity. That's partially because the additional L3 cache slice is somewhat 'dumb' — all the control circuitry resides on the base die, which helps reduce the latency overhead. AMD also uses a density-optimized version of 7nm that's specialized for SRAM. The L3 chiplet is also thinner than the base die (13 metal layers).

AMD produces all of its Zen 3 silicon with TSVs, so all of it can support a 3D V-Cache configuration. However, the TSVs aren't exposed unless they're needed. For 3D V-Cache models, AMD slightly thins the base die, both to expose the TSV connections and to maintain the same overall package thickness (Z-height) as the existing models.

The lack of control circuitry in the L3 chiplet also maximizes capacity and allows AMD to selectively 'light up' only the portions of the cache that are being accessed, thus reducing (and even removing) the power overhead of tripling the L3 cache capacity. In addition, because higher L3 hit rates mean fewer trips to main memory, the additional capacity relieves bandwidth pressure on main memory, which helps reduce latency and improves application performance on more than one front. Fewer trips to main memory also reduce overall power consumption.
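To illustrate why a slightly slower but much larger L3 can still come out ahead, here is a minimal Python sketch of a simplified average-access-time model. The L3 latencies come from our AIDA table. The DRAM penalty and both hit rates are hypothetical placeholders picked only for illustration; they are not measurements from our testing or figures from AMD.

```python
def average_access_ns(l3_latency_ns: float, l3_hit_rate: float,
                      dram_penalty_ns: float) -> float:
    """Simplified model: every access pays the L3 latency,
    and each miss additionally pays the DRAM penalty."""
    return l3_latency_ns + (1.0 - l3_hit_rate) * dram_penalty_ns


def break_even_hit_rate_gain(extra_l3_latency_ns: float,
                             dram_penalty_ns: float) -> float:
    """Hit-rate improvement (as a fraction) needed to offset extra L3 latency."""
    return extra_l3_latency_ns / dram_penalty_ns


L3_5800X_NS = 11.49        # from the AIDA table
L3_5800X3D_NS = 13.84      # from the AIDA table
DRAM_PENALTY_NS = 70.0     # HYPOTHETICAL extra cost of a trip to main memory
HIT_RATE_5800X = 0.60      # HYPOTHETICAL L3 hit rate with the smaller cache
HIT_RATE_5800X3D = 0.75    # HYPOTHETICAL L3 hit rate with the larger cache

gain_needed = break_even_hit_rate_gain(L3_5800X3D_NS - L3_5800X_NS,
                                       DRAM_PENALTY_NS)
small_cache_ns = average_access_ns(L3_5800X_NS, HIT_RATE_5800X,
                                   DRAM_PENALTY_NS)
large_cache_ns = average_access_ns(L3_5800X3D_NS, HIT_RATE_5800X3D,
                                   DRAM_PENALTY_NS)

print(f"Hit-rate gain needed to break even: {gain_needed:.1%}")
print(f"Smaller L3: {small_cache_ns:.2f} ns average")
print(f"Larger L3:  {large_cache_ns:.2f} ns average")
```

Under these assumed numbers, a hit-rate gain of only a few percentage points would be enough to cancel out the extra L3 latency. Real workloads vary widely, so treat this as a way to reason about the trade-off rather than a prediction.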

The L3 cache chiplet consumes significantly less power per square millimeter than the CPU cores. Still, vertical stacking does increase power density, so it's best to isolate it from the heat-generating cores on the sides of the chiplet. However, this would leave a protruding die on top of the CCD, so AMD uses a single silicon shim that wraps around three sides of the L3 chiplet to create an even surface for the heat spreader that sits atop the chiplet. Silicon is an excellent thermal conductor, and the intention is for the shim to allow heat to transfer from the cores up to the heat spreader.

Previous renderings of the design have shown two distinct silicon shims and appeared to show the L3 cache die spanning from one side of the die to the other. However, AMD's materials for the Milan-X launch clearly show one long shim that covers the compute die and a thin portion on the edge of the die that isn't covered by the L3 cache chiplet. This thin expanse of the bottom die includes I/O functions that the chiplet uses to communicate with the I/O die. AMD confirmed that this is the actual layout on all 3D V-Cache processors, including the Ryzen 7 5800X3D, rather than the stylized renders that show two separate shims.

Paul Alcorn
Deputy Managing Editor

Paul Alcorn is the Deputy Managing Editor for Tom's Hardware US. He writes news and reviews on CPUs, storage and enterprise hardware.