As AMD is enabling driver support for its upcoming Radeon RX 7000-series graphics cards based on the RDNA 3 architecture, various hardware enthusiasts and investigators continue to disclose new information about the upcoming GPUs. This week it's been uncovered that AMD's next-generation flagship GPU will apparently feature a 384-bit memory interface.
AMD's Navi 31 graphics processor (called SoC15 in AMD's drivers) supports six 64-bit MCDs (memory controller dies, AMD's new term for its memory controllers), as discovered by VideoCardz in AMD driver patch code published on Freedesktop.
This is the first time AMD has used the term MCD to describe its memory controller; previously the company stuck to the term UMC (unified memory controller) for both GDDR and HBM memory subsystems. While the word "die" certainly has a concrete meaning, we are still not sure that AMD's Navi 31 uses separate dies (i.e., chiplets) for its memory controllers.
PC memory interfaces are typically 64 bits wide, so six 64-bit controllers would give AMD's Navi 31 a 384-bit memory bus, which is 50% wider than that of Navi 21. Keeping in mind that AMD's next-generation GPU should naturally have more compute oomph than the current one, it will also need more memory bandwidth. A wider memory interface is a good way to get it, though more Infinity Cache might also suffice.
To put things into context, a 384-bit 18 GT/s memory subsystem offers a peak bandwidth of 864 GB/s, which is significantly more than the Radeon RX 6900 XT's 512 GB/s and 50% more than the RX 6950 XT's 576 GB/s.
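Those peak-bandwidth figures follow directly from bus width and data rate. A minimal sketch of the arithmetic (the function name is just for illustration; the bus widths and data rates are the published figures for the Radeon cards and the rumored Navi 31 configuration):

```python
def peak_bandwidth_gbps(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Peak memory bandwidth in GB/s: bits per transfer times
    transfers per second (GT/s), divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gtps / 8

# Rumored Navi 31: 384-bit bus with 18 GT/s GDDR6
print(peak_bandwidth_gbps(384, 18))  # 864.0

# Radeon RX 6900 XT: 256-bit bus at 16 GT/s
print(peak_bandwidth_gbps(256, 16))  # 512.0

# Radeon RX 6950 XT: 256-bit bus at 18 GT/s
print(peak_bandwidth_gbps(256, 18))  # 576.0
```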
A wider memory interface is relatively expensive to implement since it takes up a lot of die space. If AMD’s Navi 31 indeed comes with a 384-bit memory bus, that would indirectly mean the company is positioning its upcoming flagship GPU higher than its current-generation Navi 21. We do not know whether it will also mean a higher recommended price for AMD’s next-gen flagship, but that's certainly a possibility.
As always, keep in mind that we are dealing with unconfirmed information from an unofficial source. AMD is not going to comment on an unreleased product, and the source code path has been edited since the initial change. The only things we know for certain are that RDNA 3 will feature some form of GPU chiplets, and that the GPU cores will be made using TSMC's 5nm N5 process.
.... It does work well at 1080p but starts dropping off once you hit 1440p and 4K... There is no way a tiny cache is enough at higher resolutions. You need extra memory bandwidth too!!
Not to mention 5nm is a lot more expensive compared to 7nm. So you don't want to waste the silicon on just cache memory.
They're not abandoning the cache idea - it's both, the cache and the bus width. The current cache size just wouldn't be enough if compute power doubled.
Now, about spending expensive 5 nm silicon budget on cache: the cache is supposed to be on the IO die and, if I'm not mistaken, the IO die is 6 nm. A huge bus width makes the PCB expensive (more layers), so AMD is balancing between expensive cache, expensive GDDR and an expensive PCB.
I'm pretty certain I've seen projected specs showing the Infinity Cache sizes will be going up in the next-gen GPUs, so both are welcome.
I would definitely imagine it costs more, and it might generate a bit more heat and consume a bit more power as well.
The AMD side seems promising; there is nothing there that screams 300W idle... so I have hope that a dual-height GPU won't be red-hot every time I recalculate something.
Fingers crossed, guys. I am running a 1060 right now, and I really need something stronger; I take too many coffee breaks.
It's an interesting design choice, and one I had not expected. When I did my paper-napkin block diagram, I designed out a separate memory controller chip to handle the IO. Cache was on another chiplet and the CUs on another. I think AMD had several such designs on the table, one with the compute clusters broken into two chips. CUs generate a lot of heat as they are focused around SIMD/MIMD matrix FP16 ops, so I thought it logical to break this up and bin them. I was more worried about the CU -> memory interface -> CU (coherency) path than the memory interface to memory, due to that inherent latency. But memory latency is a huge issue due to long access times followed by high bandwidth.
This one seems optimized for memory bandwidth (MCD) over compute. Memory interfaces tend to be cooler and run slower because of trace and access latencies to memory. Having independent memory controllers allows improved efficiency on ops that don't require 192-bit-wide access. For example, fetching 32 bits of data over a 192-bit bus is a waste due to latency, and if the next data you need is non-sequential, you have an even larger stall in the pipe. With a narrow 32-bit bus for each MCD, the likelihood of this happening is considerably smaller. But you have to design for parallel workloads with each doing separate work, and the cache kept for local draw call buffers. The final call where you composite the scene and flip the screen buffer window will be an interesting algorithm. While a section of the screen is being composited mostly from Infinity Cache using a tile approach, the IO controller will have to be lining up optimal access to GDDR in parallel to composite the next section.
Driver optimizations in the past focused on keeping commonly accessed compiled shaders and assets in GDDR memory. This will lead to new scheduling techniques. AI could be used to predict which blocks of GDDR memory will be used before they are even executed. Then you have a shopping-line bucket-sift algorithm, which is NP-complete, to optimize retrieval before the data is needed. Compute execution would be based on which memory is ready first.
Is anyone seriously expecting the new gen flagship to be slower than the current flagship?
Frankly, I'm still VERY skeptical of the "guesswork" showing chiplet MCDs. That makes no sense to me. Each MCD would need to link to the GCD via, what, Infinity Fabric? But having six 64-bit MCDs would mean having six Infinity Fabric links, most likely, which hasn't really simplified anything. The only thing that would accomplish is moving the cache off the GPU and onto a separate chip, which maybe works out okay. Anyway, I'm not saying AMD isn't going this route, but I still think a larger MCD + cache with a varying number of enabled memory links (binning) connecting to multiple GCDs would be more sensible.
Also, the prospect of 3D V-Cache might seem tantalizing, but it basically added $100 to the cost of the 5800X3D. Maybe smaller cache chips only add $25 to $50 per MCD, but that's still up to $300 on the raw bill of materials, which would mean the cards would have to cost basically $500 more.
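That back-of-the-envelope cost math checks out; a quick sketch, assuming six MCDs and the $25-to-$50 per-MCD range suggested above (both figures are speculation, not confirmed costs):

```python
mcd_count = 6                # six 64-bit MCDs per the rumored Navi 31 config
cost_low, cost_high = 25, 50 # assumed added cost per MCD in USD (speculative)

# Total added bill-of-materials cost across all MCDs
print(mcd_count * cost_low, mcd_count * cost_high)  # 150 300
```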