Nvidia will increase shader counts but not ROPs on RTX 50-series GPUs, except on the lowest-tier GB207, according to leak
Rasterization games tend to love ROPs.
GPU shader cores (called CUDA cores in Nvidia parlance) and ROPs are important aspects of modern GPUs. With its upcoming RTX 50-series, it appears Nvidia has focused on the former rather than both. Harukaze5719 reports on X (formerly Twitter) that Nvidia's upcoming Blackwell RTX 50-series GPUs will only see CUDA core count improvements over the Ada Lovelace RTX 40-series GPUs, with ROPs staying the same on the various tiers. The only exception is the entry-level GB207 die, which will get a whopping 33% reduction in ROP count.
ROPs, or Render Output units (also Raster Operations Pipelines), play a vital role in the traditional GPU 3D rendering pipeline. As the name implies, these handle the processing of pixel and texel information, or in other words, rasterization workloads. ROPs generally aren't as important as shader cores, but they still play a key role in the GPU pipeline. You want to scale the number of ROPs relative to the number of shader cores and other processing clusters to provide optimal performance.
"So then might be like this one? All number based on kopite7kimi and formula isn't change (1 GPC = 1 ROPs / 1 TPC = 2 SM / SM = 128 CUDA)" https://t.co/158neeR86i (June 11, 2024)
Harukaze's new information (which is based on numbers from popular leaker Kopite7kimi and assumes that Ada's per-GPC, per-TPC, and per-SM ratios carry over) suggests that Nvidia won't be adding more render output units to its gaming-oriented variant of the Blackwell GPU architecture. From the presumably mainstream GB206 all the way up to the flagship GB202 die, the various GPUs will supposedly sport the exact same ROP count as their Ada Lovelace (RTX 40-series) predecessors. GB207, the only exception, will reportedly go the other way, with a 33% reduction in ROP count compared to AD107.
It might seem strange for Nvidia not to increase ROP counts, but the company's architects very likely think Blackwell already has enough ROPs. As previously mentioned, ROPs aren't the be-all and end-all of GPU performance, especially in modern workloads that incorporate ray tracing, upscaling, and other effects. More ROPs don't necessarily mean more performance if the architecture becomes unbalanced. Nvidia could also be improving per-ROP performance in Blackwell, which would provide another explanation for the rumored changes.
Take GB207's 33% ROP nerf. Nvidia's outgoing AD107 GPU die has an identical ROP count to the slightly larger and thus more expensive AD106 die. But despite that parity, AD107-based GPUs never managed to compete with AD106-based GPUs. As our RTX 4060 review showed, the AD107-equipped RTX 4060 card comes nowhere near the RTX 4060 Ti in gaming performance. The key differences between the two are the CUDA core counts and other processing cores (RT, tensor, and texture).
Perhaps AD107 was "overspecced" and Nvidia will cut the ROP count with GB207. It also appears Nvidia will be cutting the CUDA core count to just 2,560, fewer than the 3,072 on the RTX 4060. The GB206 meanwhile has up to 4,608 shaders, the same number as AD106 (though the RTX 4060 Ti only had 4,352 cores enabled). These changes will most likely make for a bigger gap between the GB207 and GB206 parts.
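To put rough numbers on that gap, here is a minimal sketch that uses only the core counts quoted above. The helper name `pct_change` and the constant names are ours, chosen for illustration, and the Blackwell figures are still rumors.

```python
def pct_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# CUDA core counts quoted in the article (Blackwell figures are rumored).
RTX_4060_CORES = 3072   # AD107-based RTX 4060
GB207_CORES = 2560      # rumored GB207
FULL_X06_CORES = 4608   # full AD106 and rumored full GB206

print(f"GB207 vs RTX 4060: {pct_change(RTX_4060_CORES, GB207_CORES):.1f}%")
print(f"Full AD106 vs RTX 4060: {FULL_X06_CORES / RTX_4060_CORES:.2f}x")
print(f"Full GB206 vs GB207: {FULL_X06_CORES / GB207_CORES:.2f}x")
```

On these figures the entry-level part loses about a sixth of its shaders, while the ratio between the full x06 die and the entry-level part grows from 1.5x to 1.8x.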
Speaking of CUDA cores, Nvidia will supposedly have up to 24,576 shaders (192 SMs — Streaming Multiprocessors) on its top GB202 die. That will also have a 512-bit memory interface, which when coupled to GDDR7 could provide a massive boost to memory bandwidth. GB203 on the other hand will be similar to the current AD103, with up to 84 SMs and 10,752 shaders compared to 80 SMs and 10,240 CUDA cores on AD103, and also the same 256-bit interface (but with GDDR7 support). That makes for an absolutely massive gulf between the potential RTX 5090 and RTX 5080, if these rumors prove correct.
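For a sense of what the bus widths mean, peak memory bandwidth is the bus width times the per-pin data rate, divided by eight bits per byte. The sketch below applies that to the rumored 512-bit and 256-bit interfaces. The leak gives no GDDR7 speed, so `ASSUMED_GDDR7_GBPS` is a placeholder of our own and not a reported spec; the function name is also ours.

```python
def bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) * per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# Placeholder per-pin data rate. This is an assumption, not from the leak.
ASSUMED_GDDR7_GBPS = 28.0

# Bus widths as rumored in the article.
for die, bus_bits in (("GB202", 512), ("GB203", 256)):
    gbs = bandwidth_gb_per_s(bus_bits, ASSUMED_GDDR7_GBPS)
    print(f"{die}: {bus_bits}-bit at {ASSUMED_GDDR7_GBPS} Gbps = {gbs:.0f} GB/s")
```

Whatever the real data rate turns out to be, a 512-bit bus would deliver twice the bandwidth of a 256-bit bus running the same memory.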
Going down the stack, GB205 replaces AD104, but where AD104 had up to 60 SMs and 7,680 shaders, the new chip will apparently max out with 50 SMs and 6,400 shaders — and again, stick to the same 192-bit memory interface. GB206 will retain the same 36 SMs and 4,608 CUDA core count as its AD106 predecessor, with a 128-bit interface. And last and least, the GB207 die will only offer 20 SMs and 2,560 CUDA cores, with a 128-bit GDDR6 memory interface.
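The shader counts in this rundown all follow from the formula in the leak (1 TPC = 2 SMs, 1 SM = 128 CUDA cores). A minimal sketch of that arithmetic is below. The SM counts are the rumored maximums quoted above, and the 128-cores-per-SM figure is an assumption carried over from Ada rather than a confirmed Blackwell spec. The function and constant names are ours.

```python
CORES_PER_SM = 128  # assumption: same as Ada, per the leak's formula
SMS_PER_TPC = 2     # per the leak's formula

# Rumored maximum SM counts per die, as quoted in the article.
RUMORED_MAX_SMS = {
    "GB202": 192,
    "GB203": 84,
    "GB205": 50,
    "GB206": 36,
    "GB207": 20,
}


def cuda_cores(sms: int) -> int:
    """CUDA cores implied by an SM count under the assumed ratio."""
    return sms * CORES_PER_SM


def tpcs(sms: int) -> int:
    """TPCs implied by an SM count (two SMs per TPC)."""
    return sms // SMS_PER_TPC


for die, sms in RUMORED_MAX_SMS.items():
    print(f"{die}: {sms} SMs = {tpcs(sms)} TPCs = {cuda_cores(sms):,} CUDA cores")
```

If Nvidia changes the per-SM core count for Blackwell, every shader figure here shifts with it, which is why the totals should be read as derived estimates rather than independent data points.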
It hopefully goes without saying, but readers should take all of the provided information with a huge serving of salt. This unofficial data might come from a leak, or it could simply be rumor mongers spitballing various ideas based on what makes sense. Nvidia will release the first two RTX 50-series GPUs toward the end of the year, according to current rumors, but the last three dies won't come out until 2025. That leaves plenty of time for changes and further speculation. We haven't heard about the consumer Blackwell architectural changes either, though it's a safe bet there will be upgraded CUDA, Tensor, and RT cores, and potentially changes to the ROPs and other elements as well.
One thing is certain, though: If Nvidia really does plan on a 512-bit memory interface and up to 192 SMs with the top GB202 solution, that will not come cheap. Ultimate performance, lots of power, and a shark-sized bite out of your bank account.
Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.
SyCoREAPER My mouth is covered in salt with this statement.
Seems more like a mid-gen overhaul rather than a true next-gen to me.
edzieba ROP-scaling is more closely tied to output resolution than render complexity, as the ROPs are dumping their pixels into output buffers rather than working in intermediary shading. If you're busting out triple UHD monitor setups running at 240Hz each then ROP count may be something to take note of, but otherwise not.
Remember when Pixel Fillrate was a figure of note in GPU performance? Remember how it isn't even a footnote now and hasn't been for years? ROPs are what contribute to pixel fillrate.
Metal Messiah. These are all Fake "made up" specs. Can fully confirm this after talking to other sources (AIB board channel forums, and Benchlife members).
"Harukaze5719" just copied the data from Kopite7kimi's leaked tweet (which in itself is wrong), and made some of his own predictions, and then assumptions regarding ROPs.
Nothing is final.
So don't bother scratching up your head too much on this leaked specs data, or any SM or ROP theory. This is literally the 4th-5th time the said "Kopite7kimi" leaker has changed his own prediction/Tweet.
Earlier he himself debunked the 512-bit Memory bus rumor on the flagship GB202 die. And now, 512-bit is back on track. Of course, the full die might have a 512-bit interface, but Nvidia will never use the FULL die for any gaming GPU.
And then recently, he himself tweeted that the GB202 RTX 5090 might sport a 448-bit Memory Bus instead. So it's all confusing and messed up right now.
Expect a dozen more random Tweets like these to spawn, till these cards arrive in shelf, with each Tweet having some changed/altered info than before!
EDIT:
Those total number of cores on the new Blackwell Gaming GPUs are just based on the "assumption" that Blackwell lineup will also use Ada's "128 cores per SM count". But these are not yet final/official.
Hence, we get GB202 with 24,576 Cores.
Amdlova Only thing I feel of this new gen is the GDDR7 will be the same with 30% price increase.
umeng2002_2 With DLSS scaling and ray/path tracing becoming more and more popular, I can see how this move makes sense.
mac_angel I kinda feel like they are purposely gimping the gaming GPUs by a LOT, on purpose, to stop companies from buying and using them for AI and other professional applications; forcing them to pay hugely premium prices for full working chips.
valthuer Metal Messiah. said: These are all Fake "made up" specs. Can fully confirm this after talking to other sources (AIB board channel forums, and Benchlife members).
"Harukaze5719" just copied the data from Kopite7kimi's leaked tweet (which in itself is wrong), and made some of his own predictions, and then assumptions regarding ROPs.
Nothing is final.
So don't bother scratching up your head too much on this leaked specs data, or any SM or ROP theory. This is literally the 4th-5th time the said "Kopite7kimi" leaker has changed his own prediction/Tweet.
Earlier he himself debunked the 512-bit Memory bus rumor on the flagship GB202 die. And now, 512-bit is back on track. Of course, the full die might have a 512-bit interface, but Nvidia will never use the FULL die for any gaming GPU.
And then recently, he himself tweeted that the GB202 RTX 5090 might sport a 448-bit Memory Bus instead. So it's all confusing and messed up right now.
Expect a dozen more random Tweets like these to spawn, till these cards arrive in shelf, with each Tweet having some changed/altered info than before!
Sometimes, i wonder: is there any truth to any of these rumours, at all?
And i'm not just talking about the specs.
Can we even begin to trust the speculated release dates?
35below0 (formally Twitter)
If true, this isn't a cut it's a disembowelment. Great things are expected of 50XX. A hacked down 4060 isn't one of them.
Alvar "Miles" Udell I would completely expect this if AMD continues with their current strategy of pricing their cards too close. People will not be happy, but they will still buy it, especially because nVidia is making more with their AI cards than GeForce could ever bring in.
baboma >Only thing I feel of this new gen is the GDDR7 will be the same with 30% price increase.
At low/mid, ie 5060/5070, every rumor indicates continuing "cost optimization" trend by downspec'ing to fit within specific price ranges. Ballparking the rumored nerfs for 5060 vs 4060, 5060 should still stay at $300 mark.
While users obviously don't like it, it's a sound business move in light of, yes, higher priority for AI. It's practiced by both Nvidia and AMD, and presumably Intel for Battlemage.
At high-end, ie 5090, rumored specs show substantial perf increase, and proportional price increase should be expected. I'm ballparking $2K MSRP for 5090, with street pricing possibly higher.
Again, good business sense. High-end users are price insensitive, and will pay more to get the best perf. Low/mid users are price sensitive, so downspec'ing to maintain specific price points is the norm.
As usual, Videocardz piece on this is clearer w/ accompanying charts comparing Blackwell vs Ada SKUs. It's still rumor status, so while specific figures may be off, the gist gleaned above should be fairly on point.
https://videocardz.com/newz/geforce-rtx-50-blackwell-gb20x-gpu-specs-have-been-leaked