AMD RX 7600 Has Better Cache Latency Than the RX 7900 XTX

Radeon RX 7900 XT(X) power
(Image credit: AMD)

Chips and Cheese published an in-depth review of the new AMD Radeon RX 7600, highlighting the chip's strengths and weaknesses at an architectural level. Despite its often mediocre performance, there are some positive characteristics of the RX 7600. Specifically, it has superior cache and memory latency performance compared to its much more potent RX 7900-series counterparts.

The RX 7600's behavior stems from the way AMD manufactures the chip. Instead of utilizing a chiplet-based design, which AMD heavily promoted for the RX 7900-series, Navi 33 uses a traditional monolithic design: a single chip. This, combined with its much smaller die, gives it lower cache and memory latency than its bigger GPU counterparts.

According to tests conducted by Chips and Cheese, the RX 7900 XTX takes up to 58% longer than the RX 7600 to retrieve data from its Infinity Cache. This behavior extends to the GDDR6 VRAM as well, giving the RX 7600 15% lower VRAM latency than the RX 7900 XTX.
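
Note that the two figures are phrased in opposite directions: one says how much longer the bigger GPU takes, the other how much lower the smaller GPU's latency is. As a rough illustration only, the short Python sketch below converts between the two phrasings. The function names are hypothetical and the inputs are just the percentages quoted above, not additional measurements.

```python
def longer_to_lower(pct_longer: float) -> float:
    """If A takes pct_longer percent more time than B,
    return how many percent less time B takes than A."""
    ratio = 1.0 + pct_longer / 100.0      # A's time divided by B's time
    return (1.0 - 1.0 / ratio) * 100.0


def lower_to_longer(pct_lower: float) -> float:
    """If B takes pct_lower percent less time than A,
    return how many percent more time A takes than B."""
    ratio = 1.0 - pct_lower / 100.0       # B's time divided by A's time
    return (1.0 / ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Percentages quoted in the article, restated in the other direction.
    print(f"58% longer is the same as {longer_to_lower(58):.1f}% lower")
    print(f"15% lower is the same as {lower_to_longer(15):.1f}% longer")
```

In other words, "58% longer" and "58% lower" are not interchangeable, which is worth keeping in mind when comparing the cache and VRAM figures side by side.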

That's a significant difference, though ultimately it all boils down to real-world performance. Larger caches mean fewer VRAM accesses, and it's possible to hide higher latency with other techniques like data pre-fetching.
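
To see why a larger cache can offset slower individual accesses, here is a minimal Python sketch of a simplified average-latency model. Everything in it is an assumption for illustration: the function name, the nanosecond values, and especially the hit rates are made up, and only the 58% and 15% ratios come from the figures above. It also ignores the latency hiding that real GPUs do.

```python
def average_access_time(hit_rate: float,
                        cache_latency_ns: float,
                        vram_latency_ns: float) -> float:
    """Expected latency per access in a simplified model:
    hits are served from the cache, misses pay the full VRAM latency."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_latency_ns + (1.0 - hit_rate) * vram_latency_ns


# Hypothetical baseline for the smaller GPU (made-up numbers, not measurements).
small_cache_ns = 100.0
small_vram_ns = 200.0

# Bigger GPU: cache access takes 58% longer, and the smaller GPU's VRAM
# latency is 15% lower, so the bigger GPU's VRAM latency is small / 0.85.
big_cache_ns = small_cache_ns * 1.58
big_vram_ns = small_vram_ns / 0.85

# Hypothetical hit rates: the bigger cache is assumed to catch far more accesses.
small_avg = average_access_time(0.3, small_cache_ns, small_vram_ns)
big_avg = average_access_time(0.9, big_cache_ns, big_vram_ns)

if __name__ == "__main__":
    print(f"smaller GPU average: {small_avg:.1f} ns")
    print(f"bigger GPU average:  {big_avg:.1f} ns")
```

With these made-up inputs the bigger GPU ends up with the lower average despite its slower cache, but the result depends entirely on the assumed hit rates, which is why the micro-benchmark numbers alone don't settle the real-world question.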

It's still an interesting look at how two GPUs within the same generation stack up at a low level. The RX 7600 shows how AMD opted for higher latency cache and memory access with the chiplet variants of the RDNA 3 architecture. It would have been more expensive to go the monolithic die route, though it would have been interesting to see what that would have done for performance.

Some of the latency advantage for the RX 7600 comes from AMD's cost-optimized design. The RX 7600's Navi 33 die is significantly smaller than the Navi 31 die used in the RX 7900 XT and XTX, so it wouldn't have made sense to use a multi-chiplet approach on Navi 33. To keep costs down, AMD kept Navi 33 on TSMC's N6 node rather than moving to the newer N5 process. It also cut the PCIe interface down to x8 rather than the full x16, which saves further die area.

None of this changes the fact that the RX 7600 is a rather unexciting GPU, with underwhelming performance for the price you have to pay. On the bright side, at least the RX 7600 shows us what a more streamlined RDNA 3 GPU looks like with the advantages offered by a monolithic design.

We can't say how much faster a theoretical RX 7900 XTX would be with a monolithic die, especially since some of the latency improvements can be attributed to the RX 7600's smaller die size overall. However, it would certainly have made some difference with an Infinity Cache latency gap as wide as 58%.

Aaron Klotz
Contributing Writer

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

  • InvalidError
    If you want to keep stuff as fast as possible, you keep it on-die as close as possible, as much as possible. Shouldn't surprise anybody.

    I wonder if anybody will manage to solve the challenge of consistently scalable multi-die GPUs. I wouldn't be surprised if the SM-to-SM bandwidth scaling across the chip-to-chip interface to achieve seamless integration turns out too steep for that to ever work fine without software implementing some variation of explicit multi-GPU to minimize die-to-die traffic, much like how OS schedulers and programs need to be CCD-aware to avoid getting bogged down by CCD-to-CCD latency.
  • bit_user
    the RX 7950 XTX takes up to RX 7600 enjoys a massive 58% longer to retrieve data from its Infinity Cache
    Uh... not only does that not parse, but I strongly reject the ironic use of "enjoys". Next thing you know, people are using it in a negative sense, without a hint of irony, and the word "enjoys" loses all connotations of pleasure.

    Kind of like how a couple of the authors at WCCFTech use the phrase "sips power", when talking about something that actually guzzles it. With no hint of irony, they're depriving the word "sips" of any sense of rate or quantity.

    Please have some pride, as a writer. Use the language, don't abuse it.
  • bit_user
    InvalidError said:
    If you want to keep stuff as fast as possible, you keep it on-die as close as possible, as much as possible. Shouldn't surprise anybody.
    IIRC, the RX 7600 review talked about L2 cache. I'm glad Chips & Cheese investigated the matter, because I was curious just how much difference it made vs. the RX 7900's L3 cache.

    I hope AMD does a version of the RX 7900 with 3D V-cache. That would seem to help justify their use of chiplets, as well as stepping down from the RX 6900's 128 MB of cache to a mere 96 MB.

    InvalidError said:
    I wonder if anybody will manage to solve the challenge of consistently scalable multi-die GPUs.
    Apple's M1 Ultra.
  • InvalidError
    bit_user said:
    Apple's M1 Ultra.
    From what few comparisons can be made across PC to Mac worlds to gauge the M1's scaling success, the performance ranging anywhere from the RTX3090 being only 15% faster to as much as 160% faster looks like consistency is far from perfect.
  • L1fe Test
    So basically the 7900xtx sucks and lets wait for version 2.0
  • Kamen Rider Blade
    This should surprise no one.

    Monolithic vs Chiplet.

    Obviously Monolithic will have lower latency when L3$ is off-die.
  • bit_user
    InvalidError said:
    From what few comparisons can be made across PC to Mac worlds to gauge the M1's scaling success, the performance ranging anywhere from the RTX3090 being only 15% faster to as much as 160% faster looks like consistency is far from perfect.
    I didn't say it was perfect - just that they did deliver a multi-die GPU.
  • bit_user
    L1fe Test said:
    So basically the 7900xtx sucks
    Nobody said it "sucks". Chips & Cheese specializes in architectural analysis, largely fueled by micro-benchmarking. They estimated the latency of the chiplet approach as worse, but they did not say how much impact it had on final performance!

    GPUs are designed to be pretty good at latency-hiding. So, the impact of the additional latency could be fairly minor.
  • JarredWaltonGPU
    bit_user said:
    Uh... not only does that not parse, but I strongly reject the ironic use of "enjoys". Next thing you know, people are using it in a negative sense, without a hint of irony, and the word "enjoys" loses all connotations of pleasure.

    Kind of like how a couple of the authors at WCCFTech use the phrase "sips power", when talking about something that actually guzzles it. With no hint of irony, they're depriving the word "sips" of any sense of rate or quantity.

    Please have some pride, as a writer. Use the language, don't abuse it.
    Sorry, it was a bad edit on my part. I removed the "RX 7600 enjoys a massive" bit, because even if latency is theoretically that much worse, you can build around that. Ultimately, the proof is in the eating of the pudding, and RX 7900 XTX is much faster. But I do think it could have been faster still had it not gone the GPU chiplet route.
  • InvalidError
    bit_user said:
    I didn't say it was perfect - just that they did deliver a multi-die GPU.
    Anyone can make a multi-die GPU. All of the challenge is in making one that scales well. Had AMD's multi-GCDs scaled as well as it hoped it would in gaming workloads, we would likely have seen that in place of the RX7900 we got.