Reports that AMD's RDNA 3 GPUs have broken shader pre-fetch functionality aren't accurate, according to a statement that AMD issued to Tom's Hardware:
"Like previous hardware generations, shader pre-fetching is supported on RDNA 3 as per [gitlab link]. The code in question controls an experimental function which was not targeted for inclusion in these products and will not be enabled in this generation of product. This is a common industry practice to include experimental features to enable exploration and tuning for deployment in a future product generation." - AMD Spokesperson to Tom's Hardware.
AMD's statement comes on the heels of media reports that the recently launched Navi31 silicon in the RDNA 3 graphics cards has 'non-working shader pre-fetch hardware.' The source of the speculation, @Kepler_L2, cited code from the Mesa3D drivers that appeared to indicate that shader pre-fetch doesn't work on some GPUs with the A0 revision of the silicon (CHIP_GFX1100, CHIP_GFX1102, and CHIP_GFX110).
However, AMD's statement says that the code cited by Kepler_L2 pertained to an experimental function that wasn't intended for the final RDNA 3 products, so it is disabled for now. AMD notes that including experimental features in new silicon is a fairly common practice, which is accurate — we have often seen this approach used with other types of processors, like CPUs.
For instance, AMD shipped an entire generation of Ryzen 3000 products with the TSVs needed to enable 3D V-Cache, but didn't use the functionality until the very end of the Ryzen 5000 era. Likewise, Intel often adds features that might not make it into the final product, with its DLVR functionality being a recent example.
Naturally, one would assume that if an 'experimental' feature worked perfectly fine, it would be enabled in the final product, provided it didn't require any additional accommodations (like the additional L3 cache slice needed for 3D V-Cache). That means the line between an 'experimental' feature and a 'nice to have but not critical or needed to hit targets' feature could be a bit blurry. In either case, AMD says that the pre-fetch mechanism works on RDNA 3 as intended.
The other elephant in the room is AMD's use of an A0 stepping of the RDNA 3 silicon, meaning the first version of the chip, with no physical revisions. This has led to claims that AMD is shipping 'unfinished silicon,' but that type of speculation doesn't hold water.
AMD didn't respond to our queries on whether or not it used A0 silicon for the first wave of RDNA 3 GPUs, but industry sources tell us that the company did use A0 silicon for Navi31. In fact, we're told the company launched with A0-revision silicon for almost all of the 6000 series and most of the 5000 series.
This is not indicative of an 'unfinished product.' The goal of all design teams is to nail the design on the first spin with working, shippable silicon. Nvidia, for instance, often ships A0 stepping silicon, too.
Microprocessors can go through several revisions over the span of their life, often to fix bugs or errata, improve performance, or both. Generally, the first revision of the silicon from the fabs is A0, and successive 'minor' respins (changes to the metal layers) are categorized as A1, A2, and so on. More substantial silicon revisions switch the designator to 'B' or a successive letter (B0, C0, and so forth). This continues with newer alphanumeric designators as the chip is refined.
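The naming convention can be sketched in a few lines of Python. This is only an illustration of the letter-plus-number scheme as this article describes it; the function names `parse_stepping` and `classify_respin` and the labels they return are hypothetical, and individual vendors may deviate from the pattern.

```python
import re
from typing import Tuple


def parse_stepping(stepping: str) -> Tuple[str, int]:
    """Split a designator such as 'A0' into its letter and its number."""
    match = re.fullmatch(r"([A-Z])(\d+)", stepping.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized stepping: {stepping!r}")
    return match.group(1), int(match.group(2))


def classify_respin(old: str, new: str) -> str:
    """Label the change between two steppings under the assumed convention."""
    old_letter, old_number = parse_stepping(old)
    new_letter, new_number = parse_stepping(new)
    if new_letter != old_letter:
        # A new letter is assumed to mean a deeper (base) silicon revision.
        return "base respin"
    if new_number != old_number:
        # Same letter, new number: assumed to be a minor metal-layer change.
        return "metal-layer respin"
    return "same stepping"


print(classify_respin("A0", "A1"))  # metal-layer respin
print(classify_respin("A1", "B0"))  # base respin
```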
Nearly all complex chips have both known and unknown errata and bugs that are addressed with firmware, driver, and software workarounds that can reduce or eliminate those issues, and they ship that way — that's the very nature of modern semiconductor design and production. For example, Intel's Skylake generation of processors shipped with 53 known errata, and six months later, Intel listed another 40 errata. This is common because chip design cycles are long, often on the order of years, so there often isn't time to respin the chip to address minor issues. We see similar trends from other types and generations of processors, too.
However, not all errata can be fixed with workarounds, so some issues will be cleaned up in later steppings of the silicon, if deemed necessary. But the goal of any design team remains the same: to deliver silicon on the first spin that can meet the design goals for a shipping product. In that respect, shipping A0 silicon is considered a home run.
There are also many examples of chips that had issues in the design/verification process and required multiple steppings to come to market. For instance, Sapphire Rapids was last known to be on its 12th stepping, and it still hasn't shipped in volume (the A0, A1, B0, C0, C1, C2, D0, E0, E2, E3, E4, and E5 steppings, or technically five base spins). Naturally, that has led to severe production delays and missed launch dates.
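The Sapphire Rapids tally is easy to check. The short Python sketch below takes the stepping list exactly as given in this article and counts the total steppings and the distinct letters, on the assumption that each new letter marks one base spin; the variable names are illustrative only.

```python
# Stepping list as quoted in the article for Sapphire Rapids.
steppings = ["A0", "A1", "B0", "C0", "C1", "C2",
             "D0", "E0", "E2", "E3", "E4", "E5"]

# Assumption: each distinct leading letter corresponds to one base spin.
base_spins = sorted({stepping[0] for stepping in steppings})

print(len(steppings))   # 12
print(len(base_spins))  # 5
print(base_spins)       # ['A', 'B', 'C', 'D', 'E']
```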
Making chips is hard; they are the most sophisticated class of devices ever constructed by humankind, and they're made with almost unimaginably small features. That leads to issues and errata that can require several revisions to stamp out. Pay no mind to those who would claim that an A0 stepping always equates to 'unfinished silicon.' Success is measured by shipping workable silicon that meets targets on the first outing.
We expect some teething pains as AMD develops its first generation of chiplet-based GPUs, but the recent round of speculation is off target. Chip historians will remind you that the progression from the incredibly rough Ryzen 1000 chips to the polished Ryzen 3000 generation completely redefined a multi-billion dollar market and upset an entrenched incumbent. Will chiplet-based GPUs eventually find the same level of success? Time will tell, but as you can see in our AMD Radeon RX 7900 XTX and XT review, we've already found plenty of reasons to be impressed with AMD's new cards.
Thanks for this article, it just shows how some of these Youtubers have ZERO knowledge about semiconductors...
If you are following MLID, RGT, Coreteks, Kopite, momomo, Kepler, Greymon... then you are the problem, propagating the garbage that this article is addressing.
The cache helped not just bandwidth, but latency as well. Like the X3D.
It would be interesting to see a comparison of the three arches and see if 3 acts more like a large 1 than 2. If there are games that favor cache on the GPU side.
These idiots on Reddit and armchair GPU architects have no idea what they are talking about. They really need to start sending letters from their lawyers out to these people to STFU or they will be held liable for hearsay and spreading FUD.
It's not about who you can and can't listen to, it's about uneducated people spreading rumors.
Sky is falling tags need to be applied.
Tell me, how is it underperforming vs a 4080?
I have seen some outrageous videos like gamers meld that make zero literal sense. With dumb "hip" titles to attract dumb ppl like "RIP X PRODUCT" almost in the same caliber as those dumb videos claiming "X company does not want you to know this" "Most well known secret is out!" and so on..