Nvidia’s Turing Architecture Explored: Inside the GeForce RTX 2080
Designing for The Future: Tensor Cores and DLSS
Although the Volta architecture was full of significant changes compared to Pascal, the addition of Tensor cores was most indicative of GV100’s ultimate purpose: to accelerate 4x4 matrix operations with FP16 inputs, which form the basis of neural network training and inferencing.
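That fused operation can be sketched in a few lines. The following is an illustrative NumPy model of what one Tensor core computes, not Nvidia's actual API: 4x4 tiles, FP16 inputs, higher-precision accumulation.

```python
import numpy as np

# Illustrative model of one Tensor core operation: D = A x B + C on 4x4
# tiles, with FP16 inputs and FP32 accumulation. Names and shapes mirror
# the text, not any real API.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float16)  # FP16 input tile
B = rng.standard_normal((4, 4)).astype(np.float16)  # FP16 input tile
C = np.zeros((4, 4), dtype=np.float32)              # FP32 accumulator

# Promote to FP32 before multiplying, since the hardware accumulates at
# higher precision than its inputs.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D.dtype, D.shape)
```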
Like the Volta SM, Turing exposes two Tensor cores per quad, or eight per Streaming Multiprocessor. TU102 does feature fewer SMs than GV100 (72 versus 84), and GeForce RTX 2080 Ti has fewer SMs enabled than Titan V (68 versus 80). So, the RTX 2080 Ti only has 544 Tensor cores to Titan V’s 640. But TU102’s Tensor cores are implemented differently: they also support INT8 and INT4 operations. This makes sense, of course; GV100 was designed to train neural networks, while TU102 is a gaming chip that uses trained networks for inferencing.
Nvidia claims that TU102’s Tensor cores deliver up to 114 TFLOPS for FP16 operations, 228 TOPS for INT8, and 455 TOPS for INT4. The FP16 multiply with FP32 accumulation used for deep learning training is supported as well, but at half the rate of FP16 accumulation.
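To illustrate why INT8 support matters for inferencing, here is a hedged sketch of symmetric post-training quantization, the kind of conversion that lets a network trained at higher precision run through an INT8 path. The function names are our own invention, not part of NGX or any Nvidia library.

```python
import numpy as np

# Hypothetical symmetric INT8 quantization of trained FP32 weights, the
# sort of transform that lets inference exploit TU102's faster INT8 rate.
def quantize_int8(w):
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

weights = np.array([0.50, -1.20, 0.03, 0.90], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Round-trip error is bounded by half a quantization step.
print(np.max(np.abs(weights - restored)) <= scale / 2 + 1e-7)
```

Inference then runs on the int8 tensor `q`, trading a bounded precision loss for roughly double FP16 throughput on this hardware.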
Most of Nvidia’s current plans for the Tensor cores revolve around neural graphics. However, the company is also researching other applications of deep learning on desktop cards. Intelligent enemies, for instance, would completely change the way gamers approach boss fights. Speech synthesis, voice recognition, material/art enhancement, cheat detection, and character animation are all areas where AI is already in use, or where Nvidia sees potential.
But of course, Deep Learning Super Sampling (DLSS) is the focus for GeForce RTX. Implementing DLSS does require developer support through Nvidia’s NGX API. But the company claims integration is fairly easy, and it has a list of games with planned support to demonstrate industry enthusiasm for what DLSS can do for image quality. That might be because Nvidia handles the heavy lifting itself. The company offers to generate ground-truth images: the highest-quality representation possible, achieved through super-high resolution, lots of samples per frame, or lots of frames averaged together. It then trains an AI model on its 660-node, DGX-1-based SaturnV server to get lower-quality images as close to the ground truth as possible. These models are downloaded through Nvidia’s driver and run on the Tensor cores of any GeForce RTX graphics card. Nvidia claims that each AI model is measured in megabytes, making them relatively lightweight.
While we hoped Nvidia's GeForce Experience (GFE) software wouldn't be a prerequisite for DLSS, we suspected it probably would be. Sure enough, the company confirmed that the features of NGX are tightly woven into GFE. If the software detects a Turing-based GPU, it downloads a package called NGX Core, which determines whether installed games/apps are relevant to NGX. When there's a match, NGX Core retrieves any associated deep neural networks for later use.
Will DLSS be worth the effort? It’s hard to say at this point. We’ve seen one example of DLSS from Epic’s Infiltrator demo and it looked great. But it’s unclear if Nvidia can get the same caliber of results from any game, regardless of genre, pace, environmental detail, and so on. What we do know is that DLSS is a real-time convolutional auto-encoder trained on images sampled 64 times. It’s given a normal-resolution frame through the NGX API, and spits back a higher-quality version of that frame.
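As a rough sketch of that training setup (everything here is hypothetical and greatly simplified; Nvidia's actual network and loss function are not public), the objective looks like a standard reconstruction loss between the network's output and the 64-sample ground-truth frame:

```python
import numpy as np

# Toy version of the DLSS training objective. A "frame" is an HxWx3 array;
# the ground truth stands in for a 64-sample render, the input for a normal
# render. The real model is a convolutional auto-encoder; here an identity
# function stands in so the loss is well-defined and runnable.
def reconstruction_loss(predicted, ground_truth):
    return float(np.mean((predicted - ground_truth) ** 2))

rng = np.random.default_rng(1)
ground_truth = rng.random((8, 8, 3))   # stands in for the 64-sample reference
noisy_input = ground_truth + 0.05 * rng.standard_normal((8, 8, 3))

model = lambda frame: frame  # placeholder for the trained auto-encoder
loss = reconstruction_loss(model(noisy_input), ground_truth)
print(loss)  # training would push this toward zero across many frames
```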
Shortly after its Gamescom announcement, Nvidia started teasing performance figures for GeForce RTX 2080 with DLSS enabled, compared against GeForce RTX 2080 without it and GeForce GTX 1080. Those results made it seem like turning DLSS on improved frame rates outright, but there wasn't much supporting data to clarify how the benchmark numbers were achieved. As it turns out, DLSS improves performance by reducing the card's shading workload while achieving similar quality.
Turing can produce higher-quality output from a given number of input samples than a post-processing algorithm like Temporal Anti-Aliasing (TAA). For DLSS, Nvidia turns that into a performance benefit by reducing the number of input samples fed to the network until the final output, at the same resolution as TAA, reaches roughly similar quality. Even though Turing spends time running the deep neural network, the savings from doing less shading work are greater. In its DLSS 2X mode, Nvidia says it can achieve the equivalent of 64x super-sampling by rendering its inputs at the target resolution, while side-stepping the transparency artifacts and blurriness sometimes seen with TAA.
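A back-of-envelope model shows the trade. All numbers below are invented for illustration; only the structure (less shading work, plus a roughly fixed network cost) comes from Nvidia's description.

```python
# Hypothetical per-frame costs in milliseconds. The only idea mirrored from
# the text: shading cost scales with how many samples you shade, while the
# DNN pass is a roughly fixed add-on.
def frame_time_ms(shaded_fraction, shade_full_ms=12.0, fixed_ms=4.0, dnn_ms=1.5):
    return fixed_ms + shade_full_ms * shaded_fraction + dnn_ms

taa_ms = 4.0 + 12.0           # shade everything, no network pass
dlss_ms = frame_time_ms(0.6)  # shade ~60% as much, then run the network

# DLSS wins whenever the shading it skips costs more than the DNN pass adds.
print(dlss_ms < taa_ms)
```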
Twenty-five games are already queued up with DLSS support, including existing titles like Ark: Survival Evolved, Final Fantasy XV, and PlayerUnknown’s Battlegrounds, plus several others that aren’t out yet.
siege19: "And although veterans in the hardware field have their own opinions of what real-time ray tracing means to an immersive gaming experience, I’ve been around long enough to know that you cannot recommend hardware based only on promises of what’s to come."

So wait, do I preorder or not? (kidding)
jimmysmitty: Well done article, Chris. This is why I love you. Details and logical thinking based on the facts we have.

Next up, benchmarks. Can't wait to see if the improvements Nvidia made come to fruition in performance worthy of the price.
Lutfij: Waiting with bated breath for performance metrics.

Pricing seems to be off, but the follow-up review should guide users as to its worth!
Krazie_Ivan: i didn't expect the 2070 to be on TU106. as noted in the article, **106 has been a mid-range ($240-ish MSRP) chip for a few generations... asking $500-600 for a mid-range GPU is insanity. esp since there's no way it'll have playable fps with RT "on" if the 2080ti struggles to maintain 60. DLSS is promisingly cool, but that's still not worth the MASSIVE cost increases.
jimmysmitty (in reply to Krazie_Ivan): It is possible that they are changing their lineup scheme. 106 might have become the low high-end card, and they might have something lower to replace it. This happens all the time.
Lucky_SLS: turing does seem to have the ability to pump up the fps if used right with all its features. I just hope that nvidia really made a card to power up its upcoming 4k 200hz hdr g-sync monitors. wow, thats a mouthful!
anthonyinsd: ooh man, the jedi mind trick Nvidia played on hyperbolic gamers to get rid of their overstock is gonna be EPIC!!! and just based on facts: 12nm, gddr6, awesome new voltage regulation, and game-only processes, thats a win in my book. I mean, if all you care about is your rast score, then you should be on the hunt for a Titan V; if it doesn't rast, its trash lol. been 10 years since econ 101, but if you want to get rid of overstock you dont tell much about the new product till its out; then the people who thought they were smart getting the older product now want to buy the new one too...
none12345: I see a lot of features that are seemingly designed to save compute resources and output lower image quality, with the promise that those savings will then be applied to increase image quality on the whole.

I'm quite dubious about this. My worry is that some of the areas of computer graphics that need the most love are going to get even worse. We can only hope that overall image quality goes up at the same frame rate, rather than frame rate going up and parts of the image getting worse.

I do not long to return to the days when different graphics cards output different image quality at the same up-front graphics settings. That was very annoying in the past. Some cards looked faster if you just went by their fps numbers, but then you looked at the image quality and noticed that one was noticeably worse.

I worry that in the end we might end up in the age of blur, where we have localized areas of shiny, highly detailed objects/effects layered on top of an increasingly blurry background.
CaptainTom: I have to admit that since I have a high-refresh (non-Adaptive-Sync) monitor, I am eyeing the 2080 Ti. DLSS would be nice if it were free at 1080p (and worked well), and I still don't need to worry about Gstink. But then again, I have a sneaking suspicion that AMD is going to respond with 7nm cards sooner than everyone expects, so we'll see.

P.S. Guys, the 650 Ti was a 106 card lol. Now a xx70 is a 106 card. Can't believe the tech press is actually ignoring the fact that Nvidia is relabeling their low-end offering as a xx70 and selling it for $600 (halo-product pricing). I swear Nvidia could get away with murder...
mlee 2500: 4nm is no longer considered a "slight density improvement."

Hasn't been for over a decade. It's only lumped in with 16 from a marketing standpoint because it's no longer the flagship lithography (7nm).