Nvidia researchers have been looking into techniques to improve GPU ray tracing performance. A recently published paper, spotted by 0x22h, presents GPU Subwarp Interleaving as a technique with the potential to accelerate real-time ray tracing by as much as 20%.
However, to reach this headline figure, some micro-architectural enhancements need to be put in place; otherwise, the gains from the technique are limited. Furthermore, existing GPU architectures such as Turing lack the required changes (the study used a modified, Turing-like design), so Nvidia will have to bake them into a new GPU as architectural extensions. This means the Subwarp Interleaving gains aren't likely to be seen until a generation after Lovelace.
Real-time ray tracing will only get bigger in the world of graphics, and Nvidia continues to tackle the issue of RT performance from multiple angles to maintain a competitive advantage and marketing halo. With this in mind, a research team consisting of Sana Damani (Georgia Institute of Technology), Mark Stephenson (Nvidia), Ram Rangan (Nvidia), Daniel Johnson (Nvidia), Rishkul Kulkarni (Nvidia), and Stephen W. Keckler (Nvidia) has published a paper which shows early promise in ray tracing microbenchmark studies.
The paper's main thrust is that the way modern Nvidia GPUs are designed gets in the way of RT performance. "First, GPUs group threads into units, which we call warps, that fetch from a single program counter (PC) and execute in SIMT (single instruction, multiple thread) fashion," explain the researchers. "Second, GPUs hide stalls by concurrently scheduling among many active warps." These design choices cause problems for real-time ray tracing: warps diverge, and in warp-starved scenarios the scheduler runs out of ready warps to switch between, so it can no longer hide stalls and GPU efficiency drops.
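To picture the warp-starved case, here is a deliberately crude C++ toy model (our own illustrative sketch, not code from the paper, with a hypothetical 300-cycle memory latency and only four active warps): the scheduler hides a stall by switching to another ready warp, but once every warp is waiting on memory it has nothing left to issue.

```cpp
#include <cstdio>
#include <vector>

// One entry per warp: the cycle at which its outstanding memory request returns.
struct Warp {
    int stall_until = 0;
};

int main() {
    const int kMemLatency  = 300;  // hypothetical long-latency operation
    const int kActiveWarps = 4;    // few active warps => poor latency hiding
    std::vector<Warp> warps(kActiveWarps);

    int issued = 0, idle = 0;
    for (int cycle = 0; cycle < 2000; ++cycle) {
        bool found = false;
        for (auto& w : warps) {
            if (cycle >= w.stall_until) {              // this warp is ready to run
                w.stall_until = cycle + kMemLatency;   // issue, then it stalls again
                ++issued;
                found = true;
                break;
            }
        }
        if (!found) ++idle;  // every warp is stalled: nothing left to hide the latency with
    }
    std::printf("issue cycles: %d, idle cycles: %d\n", issued, idle);
    return 0;
}
```

In this toy run, the vast majority of cycles end up idle, which is the efficiency loss the researchers are targeting.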
Bring in the Subwarp Scheduler
GPU Subwarp Interleaving is the paper's answer to these sticky situations, where a contemporary GPU's warp scheduler is left with nothing but stalled, divergent warps to choose from. The key technique is described as follows: "When a long latency operation stalls a warp and the GPU's warp scheduler cannot find an active warp to switch to, a subwarp scheduler can instead switch execution to another divergent subwarp of the current warp."
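The paper describes this as a hardware change, but as a rough intuition only, here is the same toy model extended with our own simplification: each diverged warp is split into two subwarps (one per branch path, each with its own state), and when no warp is ready the scheduler may fall back to a ready subwarp of a stalled warp.

```cpp
#include <cstdio>
#include <vector>

// A diverged warp is modelled as two subwarps, one per branch path, each with
// its own outstanding memory request (and, in the real design, its own PC).
struct Subwarp {
    int stall_until = 0;
};

struct Warp {
    std::vector<Subwarp> subwarps;
};

int main() {
    const int kMemLatency = 300;
    // Same four warps as before, but each has diverged into two subwarps.
    std::vector<Warp> warps(4, Warp{{Subwarp{}, Subwarp{}}});

    int issued = 0, idle = 0;
    for (int cycle = 0; cycle < 2000; ++cycle) {
        bool found = false;
        for (auto& w : warps) {
            // Subwarp scheduling: a stalled warp can still issue from a ready subwarp.
            for (auto& sw : w.subwarps) {
                if (cycle >= sw.stall_until) {
                    sw.stall_until = cycle + kMemLatency;
                    ++issued;
                    found = true;
                    break;
                }
            }
            if (found) break;
        }
        if (!found) ++idle;
    }
    std::printf("issue cycles: %d, idle cycles: %d\n", issued, idle);
    return 0;
}
```

With twice as many independently schedulable entities to pick from, the toy model spends noticeably fewer cycles idle; that, in very simplified form, is where the reported gains come from.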
On a microarchitecturally enhanced Turing-like GPU with Subwarp Interleaving, the researchers achieved what they call "compelling performance gains" of 6.3% on average, and up to 20% in the best cases, across a suite of applications with ray tracing workloads.
As we mentioned in the intro, these RT performance gains aren't going to be available to existing GPU families, so you won't be getting a 20% boost courtesy of a driver update to your current GeForce card. However, research like this is critical to future GPU architectures, and one must assume GPU Subwarp Interleaving is one of many background projects Nvidia is working on to push ray tracing performance forward in future generations. It may well be compounded by other GPU advancements that arrived in Ampere and are coming to Lovelace before this particular microarchitectural enhancement makes it into shipping GPUs.