Page 1:Meet TU102 and GeForce RTX 2080 Ti
Page 2:Meet TU104 and GeForce RTX 2080
Page 3:Meet TU106 and GeForce RTX 2070
Page 4:Turing Improves Performance in Today’s Games
Page 5:Designing for The Future: Tensor Cores and DLSS
Page 6:Hybrid Ray Tracing in Real-Time
Page 7:NVLink: A Bridge To…Anywhere?
Page 8:Mesh Shading: A Foundation for More On-Screen Objects
Page 9:Variable Rate Shading: Get Smarter About Shading, Too
Page 10:RTX-OPS: Trying to Make Sense of Performance
Page 11:Display Outputs and the Video Controller
Page 12:Nvidia’s Founders Edition: Farewell, Beautiful Blower
Page 13:Overclocking: Making The Most Of Headroom With Nvidia Scanner
Page 14:Ray Tracing And AI: Betting It All on Black
Hybrid Ray Tracing in Real-Time
Beyond the scope of anything that Volta touched, and arguably the most promising chapter in Turing’s story, is the RT core bolted to the bottom of each SM in TU102. Nvidia’s RT cores are essentially fixed-function accelerators for Bounding Volume Hierarchy (BVH) traversal and triangle intersection evaluation. Both operations are essential to the ray tracing algorithm. For more background about ray tracing and why it’s so visually appealing, see our feature: What is Ray Tracing and Why Do You Want it in Your GPU?
In short, BVHs form boxes of geometry in a given scene. These boxes help narrow down the location of triangles intersecting rays through a tree structure. Each time a triangle is found to be in a box, that box is subdivided into more boxes until the final box can be divided into triangles. Without BVHs, an algorithm would be forced to search through the entire scene, burning tons of cycles testing every triangle for an intersection.
Running this algorithm today is entirely possible using the Microsoft D3D12 Raytracing Fallback Layer APIs, which use compute shaders to emulate DirectX Raytracing on devices without native support (and redirect to DXR when driver support is identified). On a Pascal-based GPU, for instance, the BVH scan happens on programmable cores, which fetch each box, decode it, test for an intersection, and determine if there’s a sub-box or triangles inside. The process iterates until triangles are found, at which point they’re tested for intersection with the ray. As you might imagine, this operation is very expensive to run in software, preventing real-time ray tracing from running smoothly on today’s graphics processors.
By creating fixed-function accelerators for the box and triangle intersection steps, the SM casts a ray into the scene using a ray generation shader and hands off acceleration structure traversal to the fixed-function RT core. All of the intersection evaluation happens much more quickly as a result, and the SM’s other resources are freed up for shading just as they would for a traditional rasterization workload.
According to Nvidia, a GeForce GTX 1080 Ti can cast about 1.1 billion rays per second in software using its CUDA cores capable of 11.3 FP32 TFLOPs. In comparison, GeForce RTX 2080 Ti can cast about 10 billion rays per second using its 68 RT cores. It’s important to note that neither of those figures are based on calculated peaks like a lot of speeds and feeds. Rather, Nvidia took the geometric mean of results from several workloads to settle on its “10+ gigarays”value.
MORE: Best Graphics Cards
MORE: All Graphics Content
- Meet TU102 and GeForce RTX 2080 Ti
- Meet TU104 and GeForce RTX 2080
- Meet TU106 and GeForce RTX 2070
- Turing Improves Performance in Today’s Games
- Designing for The Future: Tensor Cores and DLSS
- Hybrid Ray Tracing in Real-Time
- NVLink: A Bridge To…Anywhere?
- Mesh Shading: A Foundation for More On-Screen Objects
- Variable Rate Shading: Get Smarter About Shading, Too
- RTX-OPS: Trying to Make Sense of Performance
- Display Outputs and the Video Controller
- Nvidia’s Founders Edition: Farewell, Beautiful Blower
- Overclocking: Making The Most Of Headroom With Nvidia Scanner
- Ray Tracing And AI: Betting It All on Black