AMD GPU Appears to Leave Room for Future 3D V-Cache

AMD 3D V-Cache
(Image credit: AMD)

Semiconductor engineer Tom Wassick has discovered what he believes to be provisions for 3D V-Cache on one of AMD's best GPUs, the RX 7900 XT. Using infrared imaging to peek inside the card's silicon, Wassick found the same type of 3D V-Cache connection points that AMD uses on its Zen 3 and Zen 4 CPUs. He spotted the connections on the MCDs (memory cache dies).

Wassick can't say whether these TSV (through-silicon via) connection points will be used for cache specifically, but AMD has not announced any plans to use its 3D packaging for anything other than vertically stacked cache at this point. That suggests the connection points were designed with some form of stacked cache in mind, to increase gaming and/or compute performance.

The discovery comes after numerous unconfirmed rumors that AMD would add 3D V-Cache tech to its GPUs. 

3D V-Cache has been a great success on AMD's Ryzen and EPYC CPUs so far. The technology uses hybrid bonding to fuse an additional 64MB slab of cache on top of a Ryzen or EPYC compute die, increasing its L3 cache capacity. Currently, this 3D stacking technique has allowed AMD to double the L3 cache of its desktop Ryzen 9 7900X3D and 7950X3D and to triple it on its Ryzen 7 5800X3D and 7800X3D consumer chips and its EPYC Milan-X server processors.
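Why does the same 64MB slab double L3 on some chips and triple it on others? The short sketch below works through the arithmetic. It assumes 32MB of native L3 per CCD and one 64MB slab per stacked CCD: the dual-CCD Ryzen 9 X3D chips stack cache on only one of their two CCDs, while Milan-X (shown here in its 8-CCD configuration) stacks it on every CCD. The function and variable names are illustrative only.

```python
# Sketch: L3 capacity with and without stacked V-Cache.
# Assumptions: 32MB of native L3 per CCD, one 64MB slab per stacked CCD.
NATIVE_L3_PER_CCD_MB = 32
SLAB_MB = 64


def l3_capacity(ccds: int, stacked_ccds: int) -> tuple[int, int, float]:
    """Return (base_mb, x3d_mb, ratio) for a chip with `ccds` CCDs,
    `stacked_ccds` of which carry a V-Cache slab."""
    base = ccds * NATIVE_L3_PER_CCD_MB
    x3d = base + stacked_ccds * SLAB_MB
    return base, x3d, x3d / base


# (total CCDs, CCDs with a slab) for each family mentioned above
PARTS = {
    "Ryzen 7 5800X3D / 7800X3D": (1, 1),  # single CCD, stacked
    "Ryzen 9 7900X3D / 7950X3D": (2, 1),  # two CCDs, only one stacked
    "EPYC Milan-X (8 CCDs)": (8, 8),      # every CCD stacked
}

if __name__ == "__main__":
    for name, (ccds, stacked) in PARTS.items():
        base, x3d, ratio = l3_capacity(ccds, stacked)
        print(f"{name}: {base}MB -> {x3d}MB ({ratio:.0f}x)")
```

Running it shows 32MB going to 96MB (3x) for the single-CCD chips, 64MB to 128MB (2x) for the dual-CCD Ryzen 9 parts, and 256MB to 768MB (3x) for an 8-CCD Milan-X chip.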

The performance benefits have been impressive, with 3D V-Cache chips gaining a full generation's worth of performance in applications that benefit heavily from large caches. A good example is the Ryzen 7 5800X3D, which delivered a 28% uplift in gaming performance over the Ryzen 9 5900X and was 7% faster than the Core i9-12900KS in our testing.

AMD's server parts are even more impressive, with Milan-X benchmarks from AMD and Microsoft showing performance improvements of well over 50% compared to standard Milan parts. However, this technology can't magically boost performance across the board. Only cache-sensitive workloads see gains like these.

We don't yet know how 3D V-Cache would work in a GPU. But in theory, the same principles should apply: more cache capacity would speed up cache-sensitive workloads, because the GPU would have to make fewer trips to its slower GDDR6 memory.

We've already seen a good example of this with the Infinity Cache in AMD's RX 6000 series. It allowed AMD to use slower GDDR6 memory and still match the performance of Nvidia's RTX 30 series GPUs, which use faster, more power-hungry GDDR6X memory, because the Infinity Cache kept the GPU fed with data.
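A simple way to see why more cache means fewer trips to VRAM: if a fraction h of memory requests hit in on-die cache, only the remaining (1 - h) go out to GDDR6. The sketch below assumes that idealized model. The hit rates and request count are hypothetical illustrative values, not measurements of any Radeon GPU; only the 960 GB/s figure (the RX 7900 XTX's GDDR6 bandwidth) is a real spec. The second function gives an upper bound on the traffic the GPU could be served if VRAM were the only bottleneck.

```python
# Idealized model: every request that misses the on-die cache becomes one
# GDDR6 access. Hit rates and request counts here are hypothetical.


def vram_accesses(requests: int, hit_rate: float) -> float:
    """Number of requests that miss the cache and must go out to VRAM."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return requests * (1.0 - hit_rate)


def vram_bound_bandwidth(vram_gb_s: float, hit_rate: float) -> float:
    """Upper bound on total bandwidth served when VRAM is the bottleneck.

    VRAM only carries the (1 - hit_rate) share of traffic, so the total it
    can sustain is vram_gb_s / (1 - hit_rate). Assumes the cache itself is
    never the limit, which real hardware does not guarantee.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return vram_gb_s / (1.0 - hit_rate)


if __name__ == "__main__":
    RX_7900_XTX_GDDR6_GB_S = 960.0  # 384-bit bus at 20 Gbps
    for h in (0.0, 0.5, 0.6):  # hypothetical hit rates
        misses = vram_accesses(1_000_000, h)
        bound = vram_bound_bandwidth(RX_7900_XTX_GDDR6_GB_S, h)
        print(f"hit rate {h:.0%}: {misses:,.0f} VRAM accesses, "
              f"up to {bound:,.0f} GB/s served")
```

In this model, going from a 50% to a 60% hit rate cuts VRAM accesses by a fifth, which is the basic mechanism a large last-level cache relies on. Real GPUs are messier, since hit rates vary with workload and resolution.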

However, we don't know if the same behaviors will apply with 3D V-Cache. This will all depend on how sensitive AMD's GPU architectures are to additional cache capacity, and how many applications will benefit.

Another problem AMD would have to deal with is thermals. We've seen this issue extensively on AMD's Ryzen X3D processors, where the additional slab of cache hinders heat dissipation, resulting in both lower clock speeds and higher temperatures than on comparable non-X3D parts. AMD would likely face the same issues on 3D V-Cache GPUs and be forced to reduce clock speeds to keep temperatures in check.

Nonetheless, it's cool to see AMD possibly exploring the idea of adding 3D V-Cache to its GPUs. We could be looking at AMD's next silver bullet for "magically" increasing gaming performance.

Aaron Klotz
Contributing Writer

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

  • thisisaname
    Could be great, but going by their current form I'm wondering how they will use this "silver bullet" to shoot themselves in both feet.
  • Geef
    GPU has to make fewer trips to its slower GDDR6 memory.

    It is so crazy that we are at the point where GDDR6 memory is considered slower! :p
  • bit_user
    IMO, it's hardly surprising. Why else would they launch their flagship GPU with less cache than the previous gen, unless they had some avenue for later surpassing it?

    Meanwhile, if you look at what Nvidia did, their 4000-series GPUs have 12x as much L2 cache as the corresponding 3000-series, even surpassing AMD's 7900 XTX!! So, this definitely seems like a key point for performance that AMD will want to exploit.

    As for die-stacking vs. thermals, I presume the MCDs should be cooler than the GCD, so probably a non-issue. And it could further justify their decision to move cache off the GCD. I rather expected this is what they had up their sleeves.
  • bit_user
    Geef said:
    It is so crazy that we are at the point where GDDR6 memory is considered slower! :p
    Than cache? It was always slower.

    I found this example from the 2017 Tesla V100:
    "V100 has a peak math rate of 125 FP16 Tensor TFLOPS, an off-chip memory bandwidth of approx. 900 GB/s, and an on-chip L2 bandwidth of 3.1 TB/s"
    Source: https://docs.nvidia.com/deeplearning/performance/dl-performance-gpu-background/index.html

    The V100 was made on 12nm and used HBM2 memory, but this should give you some sense of the ratio of L2 to main memory bandwidth.


    Another datapoint: AMD claimed something like 5.3 TB/s bandwidth to the L3 cache in its RX 7900 XTX, whereas its GDDR6 bandwidth is just 0.96 TB/s.
  • Kamen Rider Blade
    I wouldn't be surprised if they created 3DvCache/Infinity Cache for L2$ & L1$ on the GPU/CPU at some point, that would be pretty awesome.
  • Gillerer
    This option to stack V-Cache on the MCDs was leaked before the GPUs even came out.

    *

    Since AMD has been unable to meet their performance estimates on RDNA3, it's possible even a binned and super-overclocked refresh variant wouldn't be memory bandwidth starved enough to warrant the extra cost. (This would be the mid-gen refresh "RX 7950XTX" or "RX 7970XTX".)

    And even if they manage to pull a rabbit out of a hat and fix the performance with drivers or firmware, there's still the alternative of equipping the card with faster GDDR6 instead, depending on which is cheaper.

    *

    Having said that, AMD tends to create reusable designs. Even if V-Cache isn't required this generation, it could come in handy for the next one. This same MCD can be used for the GDDR6 cards, at least.
  • bit_user
    Kamen Rider Blade said:
    I wouldn't be surprised if they created 3DvCache/Infinity Cache for L2$ & L1$ on the GPU/CPU at some point,
    Definitely not L1 cache. The latency of going to a stacked die would be too high for that.

    L2 cache... again, I'd expect latency would be a major issue for CPUs, but less so for GPUs. The biggest issue for GPUs might turn out to be the thermal impact of stacking anything atop the GCD.
  • tamalero
    Could AMD use these connectors to actually have 2 layers of chiplets for processing there?
    instead of vcache two full processing cores one on top of the other?
  • JarredWaltonGPU
    tamalero said:
    Could AMD use these connectors to actually have 2 layers of chiplets for processing there?
    instead of vcache two full processing cores one on top of the other?
    The earliest leaks of the RDNA 3 architecture suggested AMD had planned for 16MB of L3 cache on each MCD, with the potential to stack another 16MB on top. There was even the possibility of doing a 2-high stack (an extra 32MB per MCD), but the benefits were outweighed by the cost.

    In general, I think the gains from going to 192MB total L3 cache on a 7900 XTX will be relatively small. There are diminishing returns. Maybe best-case, AMD gets an additional 10–15% at 4K, and a future 7950 XTX certainly seems likely. But while going from 96MB to 192MB might get 10% or whatever, my bet is that the move from 192MB to 288MB total only adds another 5% or less. Only time will tell what happens, and perhaps the 1-high and 2-high stacks will be more for professional cards that cost $2000, where spending an extra $200 per chip for stacking isn't out of the question.
  • hotaru251
    thisisaname said:
    I'm wondering how they will use this "silver bullet" to shoot themselves in both feet.
    1st thought is lower performance per dollar in stuff that doesn't benefit from it (same way the 5800X3D was slower than the 5800X in non-cache-heavy stuff).

    as well as how many applications can actually benefit from it (think early-days RTX... no real use for the hardware if barely anything can use it)