Designing for the Future: Tensor Cores and DLSS
Although the Volta architecture was full of significant changes compared to Pascal, the addition of Tensor cores was most indicative of GV100’s ultimate purpose: to accelerate 4x4 matrix operations with FP16 inputs, which form the basis of neural network training and inferencing.
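To make that operation concrete, here is a minimal Python sketch of the matrix multiply-accumulate a Tensor core is built around: D = A x B + C on 4x4 matrices, with A and B rounded to FP16 and the running sum kept at FP32. The function names are hypothetical, and the loops only emulate the arithmetic in software; they say nothing about how the hardware actually schedules the work.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def tensor_fma_4x4(a, b, c):
    """Return D = A x B + C for 4x4 matrices (lists of lists).

    A and B are rounded to FP16 before multiplying; the sum is
    re-rounded to FP32 after every add to mimic FP32 accumulation.
    """
    n = 4
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                # One multiply-add; 4 * 4 * 4 = 64 of these per matrix op.
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            d[i][j] = acc
    return d
```

Rounding the inputs is where precision goes: a value like 0.1 cannot be stored exactly in FP16, so the product is computed from a slightly different number than the one supplied.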
Like the Volta SM, Turing exposes two Tensor cores per quad, or eight per Streaming Multiprocessor. TU102 does feature fewer SMs than GV100 (72 versus 84), and GeForce RTX 2080 Ti has fewer SMs enabled than Titan V (68 versus 80). So, the RTX 2080 Ti only has 544 Tensor cores to Titan V’s 640. But TU102’s Tensor cores are implemented differently in that they also support INT8 and INT4 operations. This makes sense, of course: GV100 was designed to train neural networks, while TU102 is a gaming chip able to use trained networks for inferencing.
Nvidia claims that TU102’s Tensor cores deliver up to 114 TFLOPS for FP16 operations, 228 TOPS of INT8, and 455 TOPS of INT4. The FP16 multiply with FP32 accumulation operations used for deep learning training are supported as well, but at half the rate of FP16 accumulation.
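Those peak figures can be roughly reproduced from the core counts above. The sketch below assumes each Tensor core completes 64 multiply-adds per clock (one 4x4x4 matrix operation) and that the chip runs at 1635 MHz, the GeForce RTX 2080 Ti Founders Edition's rated GPU Boost clock. Neither number is stated in this section, so treat both as assumptions, and the function names as hypothetical.

```python
TENSOR_CORES_PER_SM = 8       # from the text: two per quad, four quads per SM
FMA_PER_CORE_PER_CLOCK = 64   # assumption: one 4x4x4 matrix FMA per clock
OPS_PER_FMA = 2               # each FMA counts as a multiply plus an add


def tensor_cores(sm_count: int) -> int:
    """Tensor cores available for a given number of enabled SMs."""
    return sm_count * TENSOR_CORES_PER_SM


def peak_tera_ops(sm_count: int, clock_mhz: float, rate: float = 1.0) -> float:
    """Peak tera-operations per second.

    rate scales throughput relative to FP16 with FP16 accumulate:
    2.0 for INT8, 4.0 for INT4, 0.5 for FP16 with FP32 accumulate.
    """
    ops_per_clock = tensor_cores(sm_count) * FMA_PER_CORE_PER_CLOCK * OPS_PER_FMA
    return ops_per_clock * clock_mhz * 1e6 * rate / 1e12


# 68 enabled SMs at an assumed 1635 MHz
for label, rate in [("FP16", 1.0), ("INT8", 2.0), ("INT4", 4.0),
                    ("FP16 with FP32 accumulate", 0.5)]:
    print(f"{label}: {peak_tera_ops(68, 1635, rate):.1f}")
```

Under those assumptions, the FP16, INT8, and INT4 results round to the 114, 228, and 455 figures Nvidia quotes, and the FP32-accumulate case lands at half the FP16 rate.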
Most of Nvidia’s current plans for the Tensor cores revolve around neural graphics. However, the company is also researching other applications of deep learning on desktop cards. Intelligent enemies, for instance, would completely change the way gamers approach boss fights. Speech synthesis, voice recognition, material/art enhancement, cheat detection, and character animation are all areas where AI is already in use, or where Nvidia sees potential.
But of course, Deep Learning Super Sampling (DLSS) is the focus for GeForce RTX. Implementing DLSS does require developer support through Nvidia’s NGX API. But the company claims integration is fairly easy, and has a list of games with planned support to demonstrate industry enthusiasm for what DLSS can do for image quality. That enthusiasm may owe something to the fact that Nvidia handles the heavy lifting itself. The company offers to generate ground truth images: the highest-quality representation possible, achieved through super-high resolution, lots of samples per frame, or lots of frames averaged together. Then, it trains an AI model on its 660-node DGX-1-based SaturnV server to get lower-quality images as close to the ground truth as possible. These models are downloaded through Nvidia’s driver and accessed using the Tensor cores on any GeForce RTX graphics card. Nvidia claims that each AI model is measured in megabytes, making them relatively lightweight.
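As a rough illustration of what "lots of samples per frame" buys, the Python sketch below renders the same tiny image twice: once with a single sample per pixel and once with 64 samples averaged per pixel. The scene function, the image size, and the 8x8 sub-pixel grid are all hypothetical choices made for this example; Nvidia has not described its ground truth pipeline at this level of detail.

```python
def shade(x: float, y: float) -> float:
    """Hypothetical scene: white (1.0) where y > x, black (0.0) elsewhere."""
    return 1.0 if y > x else 0.0


def render_pixel(px: int, py: int, grid: int) -> float:
    """Average grid * grid evenly spaced samples inside pixel (px, py)."""
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            sample_x = px + (i + 0.5) / grid
            sample_y = py + (j + 0.5) / grid
            total += shade(sample_x, sample_y)
    return total / (grid * grid)


def render(width: int, height: int, grid: int):
    """Return the image as rows of pixel values, indexed [y][x]."""
    return [[render_pixel(x, y, grid) for x in range(width)]
            for y in range(height)]


low_quality = render(4, 4, grid=1)   # 1 sample per pixel: hard, aliased edge
ground_truth = render(4, 4, grid=8)  # 64 samples per pixel: partial coverage
```

Pixels that the diagonal edge crosses come out fully black in the one-sample image but as an intermediate gray in the 64-sample version. A pair like this (cheap input, expensive target) is the kind of example a network could be trained on to learn the mapping from one to the other.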
While we hoped Nvidia's GeForce Experience (GFE) software wouldn't be a requisite of DLSS, we suspected it probably would be. Sure enough, the company confirmed that the features of NGX are tightly woven into GFE. If the software detects a Turing-based GPU, it downloads a package called NGX Core, which determines if games/apps are relevant to NGX. When there's a match, NGX Core retrieves any associated deep neural networks for later use.
Will DLSS be worth the effort? It’s hard to say at this point. We’ve seen one example of DLSS from Epic’s Infiltrator demo and it looked great. But it’s unclear if Nvidia can get the same caliber of results from any game, regardless of genre, pace, environmental detail, and so on. What we do know is that DLSS is a real-time convolutional auto-encoder trained on images sampled 64 times. It’s given a normal-resolution frame through the NGX API, and spits back a higher-quality version of that frame.
Shortly after its Gamescom announcement, Nvidia started teasing performance figures of GeForce RTX 2080 with DLSS enabled versus GeForce RTX 2080 versus GTX 1080. Those results made it seem like simply turning DLSS on improved frame rates, but there wasn't much supporting data to clarify how the benchmark numbers were achieved. As it turns out, DLSS improves performance by reducing the card's shading workload, while achieving similar quality.
Turing can produce higher-quality output from a given number of input samples than a post-processing algorithm like Temporal Anti-Aliasing (TAA) can. For DLSS, Nvidia turns that into a performance benefit by reducing input samples to the network until the final output (at the same resolution as TAA) reaches similar quality. Even though Turing spends time running the deep neural network, the savings attributable to less shading work are greater. Set to 2x DLSS, Nvidia says it can achieve the equivalent of 64x super-sampling by rendering its inputs at the target resolution, while side-stepping the transparency artifacts and blurriness sometimes seen from TAA.
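The trade-off can be sketched as a simple frame-time budget. The model below is an assumption made for illustration, not something Nvidia has published: it treats shading cost as scaling linearly with the fraction of samples shaded and the network as a fixed per-frame cost. The function names and the millisecond figures are hypothetical.

```python
def frame_time_ms(full_shading_ms: float, sample_fraction: float,
                  network_ms: float) -> float:
    """Frame time when only sample_fraction of the samples are shaded
    and a network pass of network_ms is added on top.

    Assumes shading cost scales linearly with the number of samples.
    """
    return full_shading_ms * sample_fraction + network_ms


def dlss_is_faster(full_shading_ms: float, sample_fraction: float,
                   network_ms: float) -> bool:
    """True when reduced shading plus the network beats full shading."""
    reduced = frame_time_ms(full_shading_ms, sample_fraction, network_ms)
    return reduced < full_shading_ms


# Hypothetical numbers: 16 ms of shading, half the samples, a 2 ms network pass
print(frame_time_ms(16.0, 0.5, 2.0))   # 10.0
print(dlss_is_faster(16.0, 0.5, 2.0))  # True
```

The same model shows where the approach stops paying off: if a frame is already cheap to shade, the fixed network cost can outweigh what is saved on samples.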
Twenty-five games are already queued up with DLSS support, including existing titles like Ark: Survival Evolved, Final Fantasy XV, and PlayerUnknown’s Battlegrounds, plus several others that aren’t out yet.