AMD RX 7600 Has Better Cache Latency Than the RX 7900 XTX


Chips and Cheese published an in-depth review of the new AMD Radeon RX 7600, highlighting the chip's strengths and weaknesses at an architectural level. Despite its often mediocre performance, the RX 7600 has some positive traits. Specifically, it has lower cache and memory latency than its much more powerful RX 7900-series counterparts.

The RX 7600's advantage stems from the way AMD builds it. Instead of the chiplet-based design that AMD heavily promoted for the RX 7900 series, its Navi 33 GPU uses a traditional monolithic design: a single chip. That, combined with its much smaller die, gives it lower memory latency than its bigger siblings.

According to tests conducted by Chips and Cheese, the RX 7900 XTX takes up to 58% longer than the RX 7600 to retrieve data from its Infinity Cache. The gap extends to GDDR6 VRAM as well, where the RX 7600's latency is 15% lower than the RX 7900 XTX's.
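"X% longer" and "Y% lower" are not symmetric: if one card takes 58% longer, the other does not take 58% less time. The short Python sketch below converts between the two framings, using only the two percentages quoted above; the helper names are hypothetical and exist only for this illustration.

```python
def pct_shorter_from_pct_longer(pct_longer: float) -> float:
    """If A takes pct_longer % more time than B, return how much less time B takes than A, in %."""
    return (1.0 - 1.0 / (1.0 + pct_longer / 100.0)) * 100.0


def pct_longer_from_pct_shorter(pct_shorter: float) -> float:
    """If B takes pct_shorter % less time than A, return how much more time A takes than B, in %."""
    return (1.0 / (1.0 - pct_shorter / 100.0) - 1.0) * 100.0


# Infinity Cache: RX 7900 XTX takes up to 58% longer than RX 7600.
print(f"{pct_shorter_from_pct_longer(58):.1f}%")  # 36.7%

# VRAM: RX 7600 latency is 15% lower than RX 7900 XTX.
print(f"{pct_longer_from_pct_shorter(15):.1f}%")  # 17.6%
```

By this arithmetic, the RX 7600's Infinity Cache latency is up to roughly 37% lower than the RX 7900 XTX's, and the RX 7900 XTX's VRAM latency is roughly 18% higher than the RX 7600's.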

Chips and Cheese RX 7600 Review: Latency Benchmarks (Image credit: Chips and Cheese)

That's a significant difference, though what ultimately matters is real-world performance. Larger caches mean fewer VRAM accesses, and higher latency can be hidden with other techniques such as data prefetching.
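To make that trade-off concrete, here is a minimal sketch built on two textbook models: average memory access time (AMAT = hit time + miss rate × miss penalty) and Little's law (requests in flight = latency × request rate). All latencies, miss rates, and request rates are hypothetical round numbers chosen for illustration, not Chips and Cheese measurements, and a real GPU memory hierarchy has more levels than this single-cache model.

```python
# Illustrative sketch only: every number below is hypothetical,
# not a measurement of the RX 7600 or RX 7900 XTX.


def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for one cache level backed by VRAM."""
    return hit_ns + miss_rate * miss_penalty_ns


def requests_in_flight(latency_ns: float, requests_per_ns: float) -> float:
    """Little's law: outstanding requests needed to sustain a given request rate."""
    return latency_ns * requests_per_ns


# A smaller, faster cache that misses more often...
small_fast = amat_ns(hit_ns=60.0, miss_rate=0.40, miss_penalty_ns=300.0)
# ...versus a larger, slower cache that misses less often.
large_slow = amat_ns(hit_ns=90.0, miss_rate=0.25, miss_penalty_ns=300.0)

print(f"smaller, faster cache: {small_fast:.0f} ns average")  # 180 ns
print(f"larger, slower cache:  {large_slow:.0f} ns average")  # 165 ns

# Same request rate, 58% more latency: 58% more requests must be in flight.
print(requests_in_flight(100.0, 1.0))  # 100.0
print(requests_in_flight(158.0, 1.0))  # 158.0
```

With these made-up numbers, the larger but slower cache still comes out ahead because it sends fewer requests to VRAM. Little's law shows the other half of the argument: 58% more latency requires 58% more outstanding requests (from more threads in flight or from prefetching) to sustain the same throughput.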

It's still an interesting look at how two GPUs from the same generation stack up at a low level. The RX 7600 shows that AMD accepted higher cache and memory latency in exchange for the chiplet design of its larger RDNA 3 parts. A monolithic die would have been more expensive, though it would have been interesting to see what it would have done for performance.

Some of the RX 7600's latency advantage comes from AMD's cost-optimized design. The Navi 33 die is significantly smaller than the Navi 31 die used in the RX 7900 XT and XTX, so a multi-chiplet approach wouldn't have made sense for it. AMD also kept Navi 33 on TSMC's N6 node rather than moving it to the newer N5 process, and it cut the PCIe interface to x8 instead of the full x16, saving further die area.

None of this changes the fact that the RX 7600 is a rather unexciting GPU, with underwhelming performance for the price. On the bright side, it at least shows us what a more streamlined RDNA 3 GPU looks like, with the advantages a monolithic design offers.

We can't say how much faster a theoretical monolithic RX 7900 XTX would be, especially since some of the RX 7600's latency advantage comes from its smaller die overall. However, with a latency gap as wide as 45%, it would certainly have made some difference.

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

Comments from the forums

InvalidError

If you want to keep stuff as fast as possible, you keep it on-die as close as possible, as much as possible. Shouldn't surprise anybody.

I wonder if anybody will manage to solve the challenge of consistently scalable multi-die GPUs. I wouldn't be surprised if the SM-to-SM bandwidth scaling across the chip-to-chip interface to achieve seamless integration turns out too steep for that to ever work fine without software implementing some variation of explicit multi-GPU to minimize die-to-die traffic, much like how OS schedulers and programs need to be CCD-aware to avoid getting bogged down by CCD-to-CCD latency.
bit_user

the RX 7950 XTX takes up to RX 7600 enjoys a massive 58% longer to retrieve data from its Infinity Cache
Uh... not only does that not parse, but I strongly reject the ironic use of "enjoys". Next thing you know, people are using it in a negative sense, without a hint of irony, and the word "enjoys" loses all connotations of pleasure.

Kind of like how a couple of the authors at WCCFTech use the phrase "sips power", when talking about something that actually guzzles it. With no hint of irony, they're depriving the word "sips" of any sense of rate or quantity.

Please have some pride, as a writer. Use the language, don't abuse it.
bit_user

InvalidError said:
If you want to keep stuff as fast as possible, you keep it on-die as close as possible, as much as possible. Shouldn't surprise anybody.
IIRC, the RX 7600 review talked about L2 cache. I'm glad Chips & Cheese investigated the matter, because I was curious just how much difference it made vs. the RX 7900's L3 cache.

I hope AMD does a version of the RX 7900 with 3D V-cache. That would seem to help justify their use of chiplets, as well as stepping down from the RX 6900's 128 MB of cache to a mere 96 MB.

InvalidError said:
I wonder if anybody will manage to solve the challenge of consistently scalable multi-die GPUs.
Apple's M1 Ultra.
InvalidError

bit_user said:
Apple's M1 Ultra.
From what few comparisons can be made across PC to Mac worlds to gauge the M1's scaling success, the performance ranging anywhere from the RTX3090 being only 15% faster to as much as 160% faster looks like consistency is far from perfect.
L1fe Test

So basically the 7900xtx sucks and let's wait for version 2.0
Kamen Rider Blade

This should surprise no one.

Monolithic vs Chiplet.

Obviously Monolithic will have lower latency when L3$ is off-die.
bit_user

InvalidError said:
From what few comparisons can be made across PC to Mac worlds to gauge the M1's scaling success, the performance ranging anywhere from the RTX3090 being only 15% faster to as much as 160% faster looks like consistency is far from perfect.
I didn't say it was perfect - just that they did deliver a multi-die GPU.
bit_user

L1fe Test said:
So basically the 7900xtx sucks
Nobody said it "sucks". Chips & Cheese specializes in architectural analysis, largely fueled by micro-benchmarking. They estimated the latency of the chiplet approach as worse, but they did not say how much impact it had on final performance!

GPUs are designed to be pretty good at latency-hiding. So, the impact of the additional latency could be fairly minor.
JarredWaltonGPU

bit_user said:
Uh... not only does that not parse, but I strongly reject the ironic use of "enjoys". Next thing you know, people are using it in a negative sense, without a hint of irony, and the word "enjoys" loses all connotations of pleasure.

Kind of like how a couple of the authors at WCCFTech use the phrase "sips power", when talking about something that actually guzzles it. With no hint of irony, they're depriving the word "sips" of any sense of rate or quantity.

Please have some pride, as a writer. Use the language, don't abuse it.
Sorry, it was a bad edit on my part. I removed the "RX 7600 enjoys a massive" bit, because even if latency is theoretically that much worse, you can build around that. Ultimately, the proof is in the eating of the pudding, and RX 7900 XTX is much faster. But I do think it could have been faster still had it not gone the GPU chiplet route.
InvalidError

bit_user said:
I didn't say it was perfect - just that they did deliver a multi-die GPU.
Anyone can make a multi-die GPU. All of the challenge is in making one that scales well. Had AMD's multi-GCDs scaled as well as it hoped it would in gaming workloads, we would likely have seen that in place of the RX7900 we got.