Nvidia's GeForce RTX 4090 might look incredibly strong, and it will certainly rank as the fastest option on our list of the best graphics cards when it debuts (at least until AMD's RDNA 3 GPUs arrive). But the cut-down AD102 die in the RTX 4090 doesn't come close to showing off the full potential of AD102 with all of its cores and cache enabled. That, combined with additional enhancements, could hint at a future RTX 4090 Ti that will be much faster, and perhaps even more expensive.
We've covered the specs for the Nvidia RTX 40-series and Ada Lovelace GPUs, but those only show the announced and rumored cards. Nvidia's full AD102 die comes equipped with 144 SMs, 18,432 CUDA cores, 96MB of L2 cache, and 192 ROPs. That's 12% more CUDA cores and a whopping 33% more L2 cache than the RTX 4090 we have today. Thanks to the additional SMs, the fully enabled AD102 die also packs 9% more ROPs and 12% more Texture Mapping Units (TMUs).
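Those percentages are simple ratios of unit counts. The following is a minimal back-of-the-envelope sketch; the RTX 4090's counts (128 SMs, 16,384 CUDA cores, 176 ROPs, 512 TMUs) and the full die's 576 TMUs are Nvidia's published figures rather than numbers stated above, and the dictionary layout and function names are purely illustrative.

```python
# Back-of-the-envelope comparison: fully enabled AD102 vs. the cut-down AD102
# in the RTX 4090. The RTX 4090 counts and both TMU counts are Nvidia's
# published specs (not stated in the paragraph); names here are illustrative.

FULL_AD102 = {"SMs": 144, "CUDA cores": 18_432, "L2 cache (MB)": 96, "ROPs": 192, "TMUs": 576}
RTX_4090 = {"SMs": 128, "CUDA cores": 16_384, "L2 cache (MB)": 72, "ROPs": 176, "TMUs": 512}


def percent_gain(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


def uplift_table(full: dict, cut: dict) -> dict:
    """Percentage increase for every spec present in both dictionaries."""
    return {name: percent_gain(full[name], cut[name]) for name in full if name in cut}


if __name__ == "__main__":
    for name, gain in uplift_table(FULL_AD102, RTX_4090).items():
        print(f"{name:>14}: +{gain:.1f}%")
```

Run as-is, it reports +12.5% for SMs, CUDA cores, and TMUs, +33.3% for L2 cache, and +9.1% for ROPs; the text rounds the 12.5% figures down to 12%.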
But that's not all that could be done for a future 4090 Ti. Micron has new 24Gbps GDDR6X memory modules in the works, a further 14% boost over the RTX 4090's 21Gbps modules, and still faster than the RTX 4080 16GB's 22.4Gbps modules that Nvidia claims are the fastest in the world right now. On AD102's 384-bit memory interface, that would push the hypothetical (but very likely) RTX 4090 Ti up to 1152 GB/s of bandwidth.
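Peak memory bandwidth is just bus width times per-pin data rate, divided by eight bits per byte. The sketch below assumes a hypothetical RTX 4090 Ti keeps the RTX 4090's 384-bit interface; the function name is our own.

```python
# Theoretical peak memory bandwidth:
#   GB/s = bus width (bits) * per-pin data rate (Gbps) / 8 bits per byte
# Assumption: a hypothetical RTX 4090 Ti keeps AD102's 384-bit bus.

BUS_WIDTH_BITS = 384


def peak_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Return theoretical peak memory bandwidth in GB/s."""
    return bus_width_bits * data_rate_gbps / 8


for label, rate in [("RTX 4090 (21 Gbps)", 21.0), ("Hypothetical RTX 4090 Ti (24 Gbps)", 24.0)]:
    print(f"{label}: {peak_bandwidth_gb_s(rate):.0f} GB/s")
```

That works out to 1008 GB/s for the RTX 4090 and 1152 GB/s with 24Gbps modules, the same 14% gain as the per-pin data rate itself.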
Faster memory would come with higher power consumption, though, and we suspect Nvidia is seriously holding back AD102's full clock speed and power potential as well. Remember all those rumors of 600W RTX 40-series graphics cards? We know Nvidia has successfully overclocked the RTX 4090 to more than 3.0GHz, and that would definitely push up power use.
It looks like the Ada architecture and TSMC's 4N process have plenty of headroom beyond the RTX 4090's 2520 MHz boost clock. Once the process matures a bit more, and if Nvidia is willing to raise power limits, we wouldn't be surprised to see an RTX 4090 Ti clocked closer to 2800 MHz.
With all these bells and whistles enabled, AD102's theoretical performance could reach a whopping 103 teraflops in FP32 workloads, 826 teraflops of FP16 Tensor throughput, and 1652 teraflops of FP8 Tensor throughput (the Tensor figures assume sparsity). That would be a huge 25% jump over the RTX 4090.
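The 103 teraflops figure and the 25% uplift follow from the usual peak-FP32 formula: CUDA cores times 2 FLOPs per clock (one fused multiply-add) times boost clock. This is a rough sketch; the 2.8 GHz clock for a full AD102 card is our speculation, and real workloads never sustain these theoretical peaks.

```python
# Theoretical peak FP32 throughput:
#   TFLOPS = CUDA cores * 2 FLOPs per clock (one FMA) * boost clock (GHz) / 1000
# The 2.8 GHz clock for a full AD102 card is speculative, not a spec.


def fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    """Return theoretical peak FP32 TFLOPS."""
    return cuda_cores * 2 * boost_ghz / 1000


rtx_4090 = fp32_tflops(16_384, 2.52)
full_ad102 = fp32_tflops(18_432, 2.80)  # hypothetical RTX 4090 Ti

print(f"RTX 4090: {rtx_4090:.1f} TFLOPS")
print(f"Full AD102 @ 2.8 GHz: {full_ad102:.1f} TFLOPS")
print(f"Uplift: {(full_ad102 / rtx_4090 - 1) * 100:.0f}%")
```

Both the core count (1.125x) and the clock (about 1.11x) rise, so the product lands at 1.25x. Applying that same factor to the RTX 4090's published sparse Tensor throughput gives roughly the 826 and 1652 teraflops figures.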
These gains would only be realized in GPU-limited scenarios, of course, so probably not in 1080p or 1440p gaming. Heavy compute applications would also likely benefit, since the combination of more L2 cache, additional GDDR6X bandwidth, and more cores and clocks could result in tangible improvements.
| | RTX 4090 Ti (Full AD102) | RTX 4090 | RTX 3090 Ti |
|---|---|---|---|
| Process | TSMC 4N | TSMC 4N | Samsung 8N |
| Ray Tracing Cores | 144 | 128 | 84 |
| VRAM Speed | 24 Gbps? | 21 Gbps | 21 Gbps |
| L2 Cache Capacity | 96MB | 72MB | 6MB |
When Will We See an RTX 4090 Ti?
It appears Nvidia has a lot of performance headroom remaining with its AD102 die, with the potential to create an RTX 4090 Ti that could theoretically smoke the RTX 4090. It would certainly cost a lot more money and consume way more power than an RTX 4090, but it can be done.
All of this will depend on how hard Nvidia wants to push its AD102 die, which will almost certainly depend on how close AMD's upcoming RDNA 3 chips come to matching Nvidia's performance. Yields on fully functional AD102 GPUs would also play a role, though it's doubtful these would be high-volume parts.
Nvidia could add some or all of these enhancements to an RTX 4090 Ti any time it feels the need. We didn't get the RTX 3090 Ti until 18 months after the RTX 3090's debut, but there were a lot of compounding factors in play. More likely, we'll see a 2023 refresh of the RTX 40-series some nine to 12 months after the initial salvo.
There's also a slim chance Nvidia could skip the RTX 4090 Ti entirely in favor of a new Titan variant, but we doubt that will happen. Titan cards tend to cut too deeply into the profits of Nvidia's lucrative RTX A-series professional cards.