AMD Big Navi and RDNA 2 GPUs: Release Date, Specs, Everything We Know

AMD Big Navi
(Image credit: AMD)

AMD Big Navi, Navi 2x, RDNA 2. Whatever you want to call them, AMD's next-generation GPUs are promising big performance and efficiency gains, along with feature parity with Nvidia in terms of ray tracing support. Will Team Red finally take the pole position in our GPU hierarchy and lay claim to the crown for the best graphics card, or will the Nvidia Ampere architecture spoil the party? And what about Intel Xe Graphics and the Xe HPG? It's too soon to say, but here's everything we know about Big Navi, including the RDNA 2 architecture, potential performance, expected release date and pricing.

With Nvidia's GeForce RTX 3090, GeForce RTX 3080, and GeForce RTX 3070 now revealed, the ball is in AMD's court. There are two ways to look at the Nvidia launch. Either it's Nvidia doing its best to bury AMD — 30 TFLOPS from the RTX 3080 and double the performance of the RTX 2080!? We didn't see that coming — or Nvidia has a good idea of what Big Navi will deliver and has priced its cards more aggressively to compete.

We've done our best to sort fact from fiction, but even without hard numbers from AMD, we have a reasonable idea of what to expect. The Xbox Series X and PlayStation 5 hardware announcements certainly add fuel to the fire, and give us realistic ideas of where Big Navi is likely to land in the PC world. If AMD plays its cards right, perhaps Big Navi will finally put AMD's reputation for high graphics card power consumption behind it. Let's start at the top, with the new RDNA 2 architecture that powers Big Navi / Navi 2x. But first, here's a brief list of what we know (or think we know) so far.

Big Navi / RDNA 2 at a Glance

  • Up to 80 CUs / 5120 shaders
  • 50% better performance per watt
  • Coming October 28 (confirmed)
  • Pricing of $549-$599 for RX 6900 XT (rumor, big spoonful of salt)

(Image credit: AMD)

The RDNA 2 Architecture in Big Navi 

Every generation of GPUs is built from a core architecture, and each architecture offers improvements over the previous generation. It's an iterative and additive process that never really ends. AMD's GCN architecture went from first generation for its HD 7000 cards in 2012 up through fifth gen in the Vega and Radeon VII cards in 2017-2019. The RDNA architecture that powers the RX 5000 series of AMD GPUs arrived in mid 2019, bringing major improvements to efficiency and overall performance. RDNA 2 looks to double down on those improvements in late 2020.

First, a quick recap of RDNA 1 is in order. The biggest changes with RDNA 1 over GCN involve a redistribution of resources and a change in how instructions are handled. In some ways, RDNA doesn't appear to be all that different from GCN. The instruction set is the same, but how those instructions are dispatched and executed has been improved. RDNA also adds working support for primitive shaders, something present in the Vega GCN architecture that never got turned on due to complications.

Perhaps the most noteworthy update is that the wavefronts—the core unit of work that gets executed—have changed: GCN used 64-thread wavefronts issued to SIMD16 execution units over four cycles, while RDNA uses 32-thread wavefronts issued to SIMD32 execution units in a single cycle. SIMD stands for Single Instruction, Multiple Data; it's a vector processing element that optimizes workloads where the same instruction needs to be run on large chunks of data, which is common in graphics workloads.

This matching of the wavefront size to the SIMD size helps improve efficiency. GCN issued one instruction per wave every four cycles; RDNA issues an instruction every cycle. GCN used a wavefront of 64 threads (work items); RDNA supports 32- and 64-thread wavefronts. GCN has a Compute Unit (CU) with 64 GPU cores, 4 TMUs (Texture Mapping Units) and memory access logic. RDNA implements a new Workgroup Processor (WGP) that consists of two CUs, with each CU still providing the same 64 GPU cores and 4 TMUs plus memory access logic.
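To make that issue-rate difference concrete, here's a minimal sketch of the arithmetic (Python; the wave and SIMD widths are the ones described above, the function name is ours):

```python
# Toy model of instruction issue, per the figures above: GCN pushes a 64-thread
# wave through a 16-lane SIMD over four cycles; RDNA pushes a 32-thread wave
# through a 32-lane SIMD in a single cycle.

def cycles_per_wave_instruction(wave_width: int, simd_width: int) -> int:
    """Cycles for one SIMD to execute one instruction across a full wave."""
    return wave_width // simd_width

print(cycles_per_wave_instruction(64, 16))  # GCN:  4 cycles
print(cycles_per_wave_instruction(32, 32))  # RDNA: 1 cycle
```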

How much do these changes matter when it comes to actual performance and efficiency? It's perhaps best illustrated by looking at the Radeon VII, AMD's last GCN GPU, and comparing it with the RX 5700 XT. Radeon VII has 60 CUs, 3840 GPU cores, 16GB of HBM2 memory with 1 TBps of bandwidth, a GPU clock speed of up to 1750 MHz, and a peak performance rating of 13.8 TFLOPS. The RX 5700 XT has 40 CUs, 2560 GPU cores, 8GB of GDDR6 memory with 448 GBps of bandwidth, and clocks at up to 1905 MHz with peak performance of 9.75 TFLOPS.

On paper, Radeon VII looks like it should come out with an easy victory. In practice, across a dozen games that we've tested, the RX 5700 XT is slightly faster at 1080p gaming and slightly slower at 1440p. Only at 4K is the Radeon VII able to manage a 7% lead, helped no doubt by its memory bandwidth. Overall, the Radeon VII only has a 1% performance advantage, but it uses 300W compared to the RX 5700 XT's 225W.

In short, AMD is able to deliver roughly the same performance as the previous generation, with a third fewer cores, less than half the memory bandwidth and using 25% less power. That's a very impressive showing, and while TSMC's 7nm FinFET manufacturing process certainly warrants some of the credit (especially in regards to power), the performance uplift is mostly thanks to the RDNA architecture.
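For reference, the peak TFLOPS figures used throughout this article follow the standard calculation of GPU cores times two FP32 operations (one fused multiply-add) per clock, times the clock speed. A quick sketch (Python), using the RX 5700 XT above and the rumored 80 CU configuration discussed later:

```python
def peak_gflops(cus: int, clock_mhz: float, cores_per_cu: int = 64) -> float:
    """Peak FP32 GFLOPS = cores * 2 ops per clock (one FMA) * clock in GHz."""
    return cus * cores_per_cu * 2 * clock_mhz / 1000

print(peak_gflops(40, 1905))  # RX 5700 XT: ~9754 GFLOPS (9.75 TFLOPS)
print(peak_gflops(80, 1600))  # Rumored Navi 21 at 1600 MHz: 16384 GFLOPS
```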

(Image credit: AMD)

That's a lot of RDNA discussion, but it's important because RDNA 2 appears to carry over all of that, with one major new addition: Support for ray tracing. It also supports Variable Rate Shading (VRS), which is part of the DirectX 12 Ultimate spec. There will almost certainly be other tweaks to the architecture, as AMD is making some big claims about Big Navi / RDNA 2 / Navi 2x when it comes to performance per watt. Specifically, AMD says RDNA 2 will offer 50% more performance per watt than RDNA 1, which is frankly a huge jump—the same large jump RDNA 1 saw relative to GCN.

It means AMD claims RDNA 2 will deliver either the same performance while using 33% less power, or 50% higher performance at the same power, or most likely some in-between solution with higher performance and lower power requirements. Of course, there's another way to read things: RDNA 2 could be up to 1.5X the performance per watt only if you restrict it to the same performance level as RDNA 1. That's pretty much what Nvidia is doing with its claimed 1.9X efficiency increase for Ampere.
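Put numbers to that claim and the two readings look like this (a trivial sketch in Python, purely illustrative):

```python
ppw_gain = 1.5  # AMD's claimed RDNA 2 improvement over RDNA 1

# Reading 1: same performance, lower power
power_factor = 1 / ppw_gain
print(f"Same performance at {(1 - power_factor) * 100:.0f}% less power")  # ~33%

# Reading 2: same power, higher performance
print(f"Or {(ppw_gain - 1) * 100:.0f}% more performance at the same power")  # 50%
```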

The one thing we know for certain is that RDNA 2 / Big Navi / Navi 2x GPUs will all support hardware ray tracing. That will bring AMD up to feature parity with Nvidia. There was some question as to whether AMD would use the same BVH approach to ray tracing calculations as Nvidia, and with the PlayStation 5 and Xbox Series X announcements out of the way, the answer appears to be yes.

If you're not familiar with the term BVH, it stands for Bounding Volume Hierarchy and is used to efficiently find ray and triangle intersections; you can read more about it in our discussion of Nvidia's Turing architecture and its ray tracing algorithm. While AMD didn't provide much detail on its BVH hardware, BVH as a core aspect of ray tracing was definitely mentioned, and we heard similar talk about ray tracing and BVH with the VulkanRT and DirectX 12 Ultimate announcements.

We don't know how much ray tracing hardware is present, or how fast it will be. Even if AMD takes the same approach as Nvidia and puts one RT core (or whatever AMD wants to call it) into each CU, the comparison between AMD and Nvidia isn't clear cut. Nvidia, for example, says it doubled the performance of its RT cores in Ampere. Will AMD's RT cores be like Nvidia's Gen1, Gen2, or something else? The fact is, we don't know yet and won't know until AMD says more.

Note that Nvidia also has Tensor cores in its Turing architecture, which are used for deep learning and AI computations, as well as DLSS (Deep Learning Super Sampling), which has now been generalized with DLSS 2.0 to improve performance and image quality and make it easier for games to implement DLSS. So far, AMD has said nothing about RDNA 2 / Navi 2x including Tensor cores or an equivalent to DLSS, though AMD's CAS (Contrast Adaptive Sharpening) and RIS (Radeon Image Sharpening) do overlap with DLSS in some ways. Recently, Sony patents detailed a DLSS-like technique for image reconstruction, presumably for the PlayStation 5. It may be possible to do that without any Tensor cores, using just the FP16 or INT8 capabilities of Navi 2x.

We also know that AMD is planning multiple Navi 2x products, and we expect to see extreme, high-end and mainstream options—though budget Navi 2x seems unlikely in the near term, given the RX 5500 XT only launched last December. AMD could launch multiple GPUs in a relatively short period of time, but more likely we'll see the highest performance options first, followed by high-end and eventually mid-range solutions. Some of those may not happen until 2021, however.

(Image credit: Sony, Microsoft)

Console Specifications and a Historical Recap 

While we don't officially know how many CUs will be present in Navi 2x, for any of the configurations, there are hints as to what we can expect thanks to the console announcements. We're going to provide a lot of background information on the previous-generation consoles, as well as the upcoming consoles, to help inform our specification speculation. You've been warned.

Xbox Series X will have a relatively massive 52 CUs in its GPU, while the PlayStation 5 will 'only' have 36 CUs. Sony's PS5 CUs are clocked higher, but in terms of raw performance the Xbox Series X is clearly faster. Looking at historical console hardware launches, it's a safe bet that neither represents the pinnacle of what AMD will launch in PC hardware this year.

Back in 2013, when the PlayStation 4 and Xbox One first launched, the PS4 clearly had the faster GPU. It had 18 CUs or 1152 GPU cores running at 800 MHz, compared to the Xbox One with 12 CUs or 768 cores running at 853 MHz. That's 1843 GFLOPS for the PS4 vs. 1310 GFLOPS on the Xbox One. More important, though, is the PC graphics hardware AMD was shipping at the time.

The Radeon HD 7970 GHz Edition had been shipping for over a year with 32 CUs and 2048 GPU cores, and up to 4301 GFLOPS. The R9 290X launched around the same time as the PS4 and Xbox One, with 44 CUs and 2816 cores, and its 1000 MHz clock speed meant 5632 GFLOPS. That means AMD's top PC GPU when the PS4 launched was about three times as fast, though the PS4 was closer in terms of memory bandwidth (176 GBps on the PS4 vs. 320 GBps on the 290X).

The same thing happened in 2017 with the console updates. The PS4 Pro moved to 36 CUs and 2304 cores, with a 911 MHz core clock providing up to 4198 GFLOPS of compute. The Xbox One X had 40 CUs and 2560 cores clocked at 1172 MHz for 6001 GFLOPS. The Xbox is basically using a consolized variant of the RX 580, while the PS4 Pro is using a consolized RX 570. The top PC GPU from AMD in 2017 was the RX Vega 64, with 64 CUs and 4096 cores running at up to 1677 MHz, yielding 13,738 GFLOPS of compute. Again, PC hardware offered at least twice the compute performance—and since we're comparing similar architectures, it's a 'fair' comparison (i.e., unlike comparing GFLOPS between AMD and Nvidia GPUs).

(Image credit: AMD)

Potential Big Navi / Navi 2x Specifications 

What do the console specifications mean for Big Navi / Navi 2x and RDNA 2 desktop GPUs? Obviously times change, but we definitely know a few things. First, AMD is fully capable of building an RDNA 2 / Big Navi GPU with at least 52 CUs, and very likely can and will go higher. AMD is also using two completely different GPU configurations for the Xbox Series X and PlayStation 5, though that doesn't mean either configuration will actually end up in a PC graphics card. Sony's Mark Cerny was quick to point out that there's some undisclosed 'special sauce' in the PS5 processor, for example. Basically, the upcoming consoles give us a minimum baseline for what AMD can do with Big Navi.

AMD has a lot of options available. The PC Navi 2x GPUs are going to be different from what the consoles are using, because they'll be focused purely on graphics—there's not going to be any Zen 2 CPU chiplet, for example. There's a balancing act between chip size, clock speed and power, and every processor can prioritize things differently. Larger chips use more power and cost more to manufacture, and they typically run at lower clock speeds to compensate. Smaller chips have better yields, cost less and use less power, but for GPUs there's a lot of base functionality that has to be present, so a chip that's half the performance usually isn't half the size.

Looking at Navi 10 and RDNA 1, it's not a stretch to imagine AMD shoving twice the number of GPU cores into a Navi 2x GPU. Navi 10 is relatively small at just 251mm square, and AMD has used much larger die sizes in the past. Anyway, let's cut to the chase. There are lots of rumors floating about, and as always we recommend taking these with a healthy dose of skepticism. The reality is that AMD can and likely will alter CU and core counts as it gets closer to launch. The Navi GPU designs are complete and we've seen a few early demonstrations of working hardware, but that doesn't mean final specs are known—not by AMD, and certainly not by any leakers. A GPU's maximum CU count can't be exceeded, but disabling parts of each GPU is common practice and has been for years.

The following table of potential specs gives a rundown of what we expect to see. The question marks and less-than signs indicate our own best guesses based purely on previous GPU launches and the current graphics card market. We've run some numbers to help fill in the remaining data, based on AMD saying Navi 2x / Big Navi / RDNA 2 will provide a 50% improvement in performance per watt compared to Navi 1x. The table gives figures that will deliver on that goal, but AMD can still change TDP, core counts, and clock speeds to end up with the same 50% improvement in performance per watt with completely different specs. We think it's unlikely AMD will go higher than these estimates. Lower is more plausible, and nothing is certain yet.

AMD Big Navi / Navi 2x Estimated Specifications
GPU                    | Navi 21      | Navi 22?     | Navi 23
Graphics Card Model    | RX 6900 XT?  | RX 6800 XT?  | RX 6700 XT?
Process (nm)           | 7            | 7            | 7
Transistors (billion)  | 21           | ?            | ?
Die size (mm^2)        | 505          | ?            | ?
CUs                    | Up to 80     | Up to 64?    | Up to 40?
GPU cores              | <5120        | <4096?       | <2560?
Max Clock (MHz)        | <1600?       | <1800?       | <2250?
VRAM Speed (MT/s)      | 16000?       | 16000?       | 14000?
VRAM (GB)              | 16?          | 8?           | 8?
Bus width              | 256??        | 256??        | 256??
ROPs                   | 64?          | 64?          | 64?
TMUs                   | <320?        | <256?        | <160?
GFLOPS (boost)         | <16384?      | <14746?      | <11520?
Bandwidth (GB/s)       | 512?         | 512?         | 448?
TBP (watts)            | 250?         | 225?         | 175?
Launch Date            | October 2020 | October 2020 | Winter 2021?
Launch Price           | $599?        | $499?        | $299?
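For what it's worth, the GFLOPS and TBP estimates in the table were picked so that the math lines up with AMD's 50% performance-per-watt claim. A quick check (Python), using the RX 5700 XT as the RDNA 1 baseline:

```python
# RDNA 1 baseline: RX 5700 XT
baseline_gflops = 9754  # 40 CUs * 64 cores * 2 * 1905 MHz
baseline_watts = 225

# Estimated Navi 21 from the table above
navi21_gflops = 16384   # 80 CUs * 64 cores * 2 * 1600 MHz
navi21_watts = 250

baseline_ppw = baseline_gflops / baseline_watts  # ~43.4 GFLOPS per watt
navi21_ppw = navi21_gflops / navi21_watts        # ~65.5 GFLOPS per watt
print(f"Estimated perf per watt gain: {navi21_ppw / baseline_ppw:.2f}x")  # ~1.51x
```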

The highest spec rumors point to a Navi 21 GPU with 80 CUs and 5120 GPU cores, and basically double the size (505mm square) of the current Navi 10 used in the RX 5700 XT. Those numbers are awfully close to just being double the Navi 10, however, which gives us pause. RDNA 2 has to add new hardware for ray tracing, which judging by Nvidia's Turing GPUs should require quite a few transistors. Let us explain.

The TU106 GPU used in the RTX 2060 has a maximum of 36 SMs (Nvidia's equivalent of AMD's CU, basically) and a die size of 445mm square. The TU116 drops the RT and Tensor cores and has a maximum of 24 SMs and a die size of 284mm square. Analyzing die shots of each GPU and focusing just on the SM blocks, we end up with a rough estimate of 5.4mm square for the Turing RTX SM vs. 4.5mm square for the Turing GTX SM—meaning, the RT and Tensor core logic makes each SM about 20% larger. Of course, there's other hardware that doesn't need to be duplicated (e.g., video blocks), so twice the CU count of Navi 10 isn't out of the question.
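Using those die-shot estimates, the overhead works out as follows (Python; the 5.4 and 4.5 mm^2 per-SM figures are the rough measurements described above):

```python
rtx_sm_mm2 = 5.4  # Turing SM with RT and Tensor cores (TU106)
gtx_sm_mm2 = 4.5  # Turing SM without them (TU116)

overhead = rtx_sm_mm2 / gtx_sm_mm2 - 1
print(f"RT and Tensor logic adds roughly {overhead * 100:.0f}% per SM")  # ~20%
```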

Other rumors give the same die size and transistor counts, but different CU and core counts. We're going with the assumption of several GPUs (Navi 21/22/23), but it could be that Navi 22 is an HBM2 mobile variant for Apple, with a trimmed-down Navi 21 going into the second-tier products. Either way, the core counts are plausible, but the naming is up for grabs.

Much less is known about the third potential configuration, Navi 23. There are no leaks discussing die size or transistor counts, which makes sense if AMD plans to ship Navi 21 first and follow up next year with Navi 23. Best guess right now is that Navi 23 ends up being a similar configuration to Navi 10 (ie, 40 CU maximum), but with ray tracing and potentially higher clock speeds. This would be a chip similar to the PlayStation 5's GPU: smaller but with clock speeds of 2GHz or more.

Navi 10

(Image credit: AMD)

Moving over to the memory side of things, we have more questions than answers. Some rumors say 16GB for the top product, 12GB for a high-end option, and 8GB for the mainstream GPU. That sounds good on paper ... but how does AMD get there?

GDDR6 memory comes in either 1GB (8Gb) or 2GB (16Gb) chips, each of which has a 32-bit interface. To get to 16GB, AMD could use sixteen 1GB chips on a 512-bit interface. That would be nuts. The last 512-bit interface GPU we saw was Hawaii (R9 290X/390X), and it was a power hungry beast. The other option is to use eight 2GB chips on a 256-bit interface, which cuts memory bandwidth in half.

For now, we're guessing AMD will use 16GB and 256-bit for the top card, and 8GB with 256-bit for the second tier card. Both will clock the GDDR6 at a higher speed of 16Gbps, but even then memory bandwidth is a concern. Perhaps AMD has dramatically improved memory compression techniques to make bandwidth less critical, or has more cache to help out. Either way, VRAM is a big question still.
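The underlying memory math is simple: each GDDR6 chip sits on its own 32-bit channel, and bandwidth is bus width times data rate. A rough sketch of the configurations discussed here (Python, illustrative only):

```python
def gddr6_config(bus_width_bits: int, chip_gb: int, speed_gtps: float):
    """Return (chip count, capacity in GB, bandwidth in GB/s) for a GDDR6 setup."""
    chips = bus_width_bits // 32                      # one 32-bit channel per chip
    capacity_gb = chips * chip_gb                     # 1GB (8Gb) or 2GB (16Gb) chips
    bandwidth_gbs = bus_width_bits * speed_gtps / 8   # GB/s
    return chips, capacity_gb, bandwidth_gbs

print(gddr6_config(256, 2, 16))  # 8 chips, 16GB, 512 GB/s   (our top-card guess)
print(gddr6_config(512, 1, 16))  # 16 chips, 16GB, 1024 GB/s (the 512-bit option)
print(gddr6_config(256, 1, 14))  # 8 chips, 8GB, 448 GB/s    (RX 5700 XT-style)
```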

There are also rumors of a 12GB card playing second string, which makes things even more difficult. That would necessitate a 384-bit interface or a 192-bit interface (or a weird asymmetrical memory interface, which is a huge can o' worms). But if the top card sticks with a 256-bit interface, there's no way the step down would go wider.

So maybe it's 24GB at the top, 12GB at the middle, and 8GB at the 'bottom' of the current Navi 2x stack. For now, we're assuming 16GB, 8GB, and 8GB (possibly 6GB) for the three top GPUs, because that makes the interface width manageable. This is why Nvidia has 24GB, 10GB, and 8GB on its RTX 30-series parts, incidentally. 384-bit interfaces are still manageable, and can be cut down as needed. 512-bit remains problematic.

Big Navi / Navi 2x Graphics Card Model Names 

(Image credit: AMD)

What will AMD call the retail products using Big Navi / Navi 2x GPUs? AMD has at least revealed that the Navi 2x family will be sold under the RX 6000 series, which is what most of us expected. Beyond that, there are still some remaining questions.

AMD has said it will launch a whole series of Navi 2x GPUs. The Navi 1x family consists of RX 5700 XT, RX 5700, RX 5600 XT, and RX 5500 XT (in 4GB and 8GB models), along with RX 5600/5500/5300 models for the OEM market that lack the XT suffix. AMD could simply add 1000 points to the current models, but we expect there will be a few more options this round.

We expect the top model to be called RX 6900 XT or something similar, with the various performance tiers as RX 6800, 6700, etc. The consumer models will likely keep the XT suffix, and AMD could do non-XT models for the OEM market. We think it would be great to have more consistent branding (e.g., all XT models are for consumers), but we'll have to see what AMD decides to do.

Big Navi / Navi 2x Release Date 

AMD has reiterated many times this year that RDNA 2, aka Big Navi—which AMD is even using now in homage to the enthusiast community's adoption of that moniker—will arrive before the end of 2020. AMD has now announced a Future of Radeon PC Gaming event that will take place on October 28.

AMD could potentially launch the RX 6000 GPUs at that time, but more likely it will first reveal the architecture, specs, and other details, similar to what Nvidia did with its Ampere announcement. That means actual GPUs will probably arrive in November, just in time for holiday shoppers.

While the impact of COVID-19 around the globe is immense, AMD still plans on launching at least some Navi 2x parts in 2020. However, given the late date of the event, it's likely we will only see the top two products from RDNA 2. It might be more than that, but most new GPU families roll out over a period of several months.

Big Navi / Navi 2x Cost 

(Image credit: AMD)

We provided our own estimated pricing in the table near the top, based on potential performance and the state of the graphics card market. We've changed those estimates quite a bit since the Nvidia Ampere announcement, as AMD can't hope to sell slower cards at equal or higher pricing.

Officially, AMD hasn't said anything in regards to pricing yet, and that will likely remain the case until the final two or three weeks before launch. Other factors, like the price of competing Nvidia (and maybe even Intel?) GPUs, will be considered as the launch date approaches. We can look back at the Navi 10 / RX 5700 XT launch for context.

Rumors came out more than six months before launch listing various prices. We saw everything from RTX 2080 performance for $250 to $500, or RTX 2060 performance for under $200. AMD officially revealed prices of $449 for the RX 5700 XT and $379 for the RX 5700 about a month before launch.

After the initial RX 5700 XT reveal, Nvidia, to the surprise of pretty much no one, launched its RTX 2070 Super and RTX 2060 Super, providing improved performance at lower prices. (The RTX 2080 Super was also announced, but it didn't launch until two weeks after the RX 5700 series.) Just a few days before launch, AMD then dropped the prices of its RX 5700 XT to $399, and the RX 5700 to $349, making them far more appealing. (The RX 5600 XT arrived about six months later priced at $299.) AMD would later go on to state that this was all premeditated—gamesmanship to get Nvidia to reveal its hand early.

The bottom line is that no one, including AMD itself, knows what the final pricing will be on a new graphics card months before launch. There are plans with multiple contingencies, and ultimately the market will help determine the price. If Nvidia launches Ampere before AMD launches Big Navi, which seems likely, that will naturally impact pricing. If Ampere is priced aggressively and performs better, AMD will likely reduce prices. Conversely, if Ampere increases Nvidia's pricing ladder and doesn't significantly outperform Navi 2x, AMD may increase prices on Big Navi.

Intel's Xe Graphics may also prove to be more capable than many are assuming, which would have a knock-on effect to both Ampere and Navi 2x. (Yeah, don't hold your breath.) AMD also explicitly states "enthusiast-class performance" in the above slide, and that has never been synonymous with "affordable."

There are also multiple reports of a 505mm square die size, and if that's correct we have to assume Big Navi / Navi 2x graphics cards will go after the enthusiast segment—meaning, $500 or more. TSMC's 7nm FinFET lithography is more expensive than its 12nm, and 505mm square means yields and dies per wafer are both going to be lower. Plus, 16GB of GDDR6 will increase both the memory and board price. Big chips lead to big prices, in other words.

The only real advice we can give right now is to wait and see. AMD (and Nvidia and Intel) will do its best to deliver RDNA 2 and Navi 2x GPUs at compelling prices. That doesn't mean we'll get RTX 2080 Super performance for $250, sadly, but if Big Navi can give Nvidia some much-needed competition in the enthusiast graphics card segment, we should see bang-for-the-buck improvements across the entire spectrum of GPUs. And if AMD really does have an 80-CU monster Navi 21 GPU coming that will beat the RTX 2080 Ti in performance, we expect AMD to charge accordingly.

Big Navi Closing Thoughts

AMD has a lot riding on Big Navi, RDNA 2, and Navi 2x. Just like Nvidia's Ampere, AMD has a lot to prove. This will be the GPU architecture that powers the next generation of consoles, which tend to have much longer shelf lives than PC graphics cards. Look at the PS4 and Xbox One: both launched in late 2013 and are still in use today. There are still PC gamers with GTX 700-series or R9 200-series graphics cards, but if you're running such a GPU, we feel for you.

We're very interested in finding out how Big Navi performs, with and without ray tracing. 50% better performance per watt can mean a lot of different things, and AMD hasn't shied away from 300W GPUs for the past several generations of hardware. A 300W part with 50% better performance per watt would basically be double the performance of the current RX 5700 XT, and that's enough to potentially compete with whatever Nvidia has to offer. Even 225W with 50% higher performance than the 5700 XT would be pretty impressive.
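That 'double the performance' estimate is just the power ratio multiplied by the efficiency claim; a minimal sketch (Python), assuming performance scales linearly with power times performance per watt:

```python
rx5700xt_watts = 225
ppw_gain = 1.5  # AMD's claimed RDNA 2 improvement

# Relative performance vs. RX 5700 XT = (new power / old power) * perf-per-watt gain
for new_watts in (225, 300):
    relative_perf = (new_watts / rx5700xt_watts) * ppw_gain
    print(f"{new_watts}W RDNA 2 part: ~{relative_perf:.2f}x the RX 5700 XT")  # 1.5x, 2.0x
```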

Realistically, AMD's 50% performance-per-watt improvements probably only occur in specific scenarios, just like Nvidia's claimed 90% efficiency improvement for Ampere. Particularly for the higher performance parts, we're skeptical of claims of 50% improvements, but we'll withhold judgment for now.

As always, without actual hardware in hand, running actual gaming benchmarks, we can't declare a victor. September and October are shaping up to be very exciting. Considering it's been more than a year since AMD's Navi architecture launched, and two years since Nvidia's Turing architecture reveal, it's well past time for a new series of GPUs from both companies. Let's just hope the prices are compelling, and that upcoming games will put the hardware to good use.

  • animalosity
    Unless my math is wrong; 80 Compute Units * 96 Raster Operations * 1600 Mhz clock = 12.28 TFLOPS of single precision floating point (FP32).

    Not bad AMD. Not bad. Let's see what that translates to in the real world, though with the advances of DX12 and now Vulkan being implemented I expect AMD to be on a more even level playing field with high end Nvidia. I might be inclined to head back to team Red, especially if the price is right.
    Reply
  • JarredWaltonGPU
    animalosity said:
    Unless my math is wrong; 80 Compute Units * 96 Raster Operations * 1600 Mhz clock = 12.28 TFLOPS of single precision floating point (FP32).

    Not bad AMD. Not bad. Let's see what that translates to in the real world, though with the advances of DX12 and now Vulkan being implemented I expect AMD to be on a more even level playing field with high end Nvidia. I might be inclined to head back to team Red, especially if the price is right.
    Your math is wrong. :-)

    FLOPS is simply FP operations per second. It's calculated as a "best-case" figure, so FMA instructions (fused multiply add) count as two operations, and each GPU core in AMD and Nvidia GPUs can do one FMA per clock (peak theoretical performance). So FLOPS ends up being:
    GPU cores * 2 * clock

    For the tables:
    80 CUs * 64 cores/CU * 2 * clock (1600 MHz) = 16,384 GFLOPS.

ROPs and TMUs and some other functional elements of GPUs might do work that sort of looks like an FP operation, but they're not programmable or accessible in the same way as the GPU cores, and so any instructions run on the ROPs or TMUs generally aren't counted as part of the FP32 performance.
    Reply
  • animalosity
    JarredWaltonGPU said:
    Your math is wrong. :)

    FLOPS is simply FP operations per second. It's calculated as a "best-case" figure, so FMA instructions (fused multiply add) count as two operations, and each GPU core in AMD and Nvidia GPUs can do one FMA per clock (peak theoretical performance). So FLOPS ends up being:
    GPU cores * 2 * clock

    For the tables:
    80 CUs * 64 cores/CU * 2 * clock (1600 MHz) = 16,384 GFLOPS.

Ah yes, I knew I was forgetting Texture Mapping Units. Thank you for the correction. I am assuming you meant 16.3 TFLOPS vice GigaFLOPS. I knew what you were trying to convey. Either way, that's some pretty impressive theoretical compute performance. Excited to see how that translates to real world performance versus some pointless synthetic benchmark.
    Reply
  • JamesSneed
    Speaking of FLOPS we also should note that AMD gutted most of GCN that was left especially the parts that helped compute. I fully expect the same amount of FLOPS from this architecture to translate into more FPS since they are no longer making general gaming and compute GPU but a dedicated gaming GPU.
    Reply
  • JarredWaltonGPU
    animalosity said:
Ah yes, I knew I was forgetting Texture Mapping Units. Thank you for the correction. I am assuming you meant 16.3 TFLOPS vice GigaFLOPS. I knew what you were trying to convey. Either way, that's some pretty impressive theoretical compute performance. Excited to see how that translates to real world performance versus some pointless synthetic benchmark.
    Well, 16384 GFLOPS is the same as 16.384 TFLOPS if you want to do it that way. I prefer the slightly higher precision of GFLOPS instead of rounding to the nearest 0.1 TFLOPS, but it would be 16.4 TFLOPS if you want to go that route.
    Reply
  • JarredWaltonGPU
    JamesSneed said:
    Speaking of FLOPS we also should note that AMD gutted most of GCN that was left especially the parts that helped compute. I fully expect the same amount of FLOPS from this architecture to translate into more FPS since they are no longer making general gaming and compute GPU but a dedicated gaming GPU.
    I'm not sure that's completely accurate. If you are writing highly optimized compute code (not gaming or general code), you should be able to get relatively close to the theoretical compute performance. Or at least, both GCN and Navi should end up with a relatively similar percentage of the theoretical compute. Which means:

    RX 5700 XT = 9,654 GFLOPS
    RX Vega 64 = 12,665 GFLOPS
    Radeon VII = 13,824 GFLOPS

    For gaming code that uses a more general approach, the new dual-CU workgroup processor design and change from 1 SIMD16 (4 cycle latency) to 2 SIMD32 (1 cycle latency) clearly helps, as RX 5700 XT easily outperforms Vega 64 in every test I've seen. But with the right computational workload, Vega 64 should still be up to 30% faster. Navi 21 with 80 CUs meanwhile would be at least 30% faster than Vega 64 in pure compute, and probably a lot more than that in games.
    Reply
  • JamesSneed
    JarredWaltonGPU said:
    I'm not sure that's completely accurate. If you are writing highly optimized compute code (not gaming or general code), you should be able to get relatively close to the theoretical compute performance. Or at least, both GCN and Navi should end up with a relatively similar percentage of the theoretical compute. Which means:

    RX 5700 XT = 9,654 GFLOPS
    RX Vega 64 = 12,665 GFLOPS
    Radeon VII = 13,824 GFLOPS

    For gaming code that uses a more general approach, the new dual-CU workgroup processor design and change from 1 SIMD16 (4 cycle latency) to 2 SIMD32 (1 cycle latency) clearly helps, as RX 5700 XT easily outperforms Vega 64 in every test I've seen. But with the right computational workload, Vega 64 should still be up to 30% faster. Navi 21 with 80 CUs meanwhile would be at least 30% faster than Vega 64 in pure compute, and probably a lot more than that in games.


    "Navi 21 with 80 CUs meanwhile would be at least 30% faster than Vega 64 in pure compute, and probably a lot more than that in games. "

Was my point ^ We will see more FPS in games than the FLOPS are telling us. It's not a case of FLOPS being 30% higher meaning we can expect that much more gaming performance; it won't be linear this go around.
    Reply
  • JarredWaltonGPU
    JamesSneed said:
    "Navi 21 with 80 CUs meanwhile would be at least 30% faster than Vega 64 in pure compute, and probably a lot more than that in games. "

Was my point ^ We will see more FPS in games than the FLOPS are telling us. It's not a case of FLOPS being 30% higher meaning we can expect that much more gaming performance; it won't be linear this go around.
    I agree with that part, though it wasn't clear from your original post that you were saying that. Specifically, the bit about "AMD gutted most of GCN that was left especially the parts that helped compute" isn't really accurate. AMD didn't "gut" anything -- it added hardware and reorganized things to make better use of the hardware. And ultimately, that leads to better performance in nearly all workloads.

    Interesting thought:
    If AMD really does an 80 CU Navi 2x part, at close to the specs I listed, performance should be roughly 60% higher than RX 5700 XT. Considering the RTX 2080 Ti is only about 30% faster than RX 5700 XT, that would actually be a monstrously powerful GPU. I suspect it will be a datacenter part first, if it exists, and maybe AMD will finally get a chance to make a Titan killer. Except Nvidia can probably get a 40-50% boost to performance over Turing by moving to 7nm and adding more cores, so I guess we wait and see.
    Reply
  • jeremyj_83
    JarredWaltonGPU said:
    I agree with that part, though it wasn't clear from your original post that you were saying that. Specifically, the bit about "AMD gutted most of GCN that was left especially the parts that helped compute" isn't really accurate. AMD didn't "gut" anything -- it added hardware and reorganized things to make better use of the hardware. And ultimately, that leads to better performance in nearly all workloads.

    Interesting thought:
    If AMD really does an 80 CU Navi 2x part, at close to the specs I listed, performance should be roughly 60% higher than RX 5700 XT. Considering the RTX 2080 Ti is only about 30% faster than RX 5700 XT, that would actually be a monstrously powerful GPU. I suspect it will be a datacenter part first, if it exists, and maybe AMD will finally get a chance to make a Titan killer. Except Nvidia can probably get a 40-50% boost to performance over Turing by moving to 7nm and adding more cores, so I guess we wait and see.
    Looking at the numbers AMD could get an RX 5700XT performance part in a 150W envelope if their performance/watt numbers can be believed. Having a 1440p GPU in the power envelope of a GTX 1660 would be a killer product.
    Reply
  • JamesSneed
    I am expecting INT8 performance to not move much from the RX5700XT. Shall see though as they do need to handle ray tracing.
    Reply