Nvidia's Response to Early DIY Quad-SLI

Latency And Overhead

Looking at raw horsepower, Nvidia distributes rendering across the four graphics processors using four main modes: AFR, SFR (split frame rendering), AFR of SFR, and single-processor operation. The first mode is alternate frame rendering (AFR), which Nvidia uses to boost frame rates. In this mode, consecutive frames are dealt out in turn, so each of the four graphics processors renders every fourth frame and the rendering load is shared across all of them. This sounds ideal, but it does not work well in some Direct3D applications. Nvidia claims that "DirectX 9 doesn't support queuing of enough back-buffers to effectively support high-performance four-way AFR mode, combined with 7950GX2 GPUs being clocked a bit lower than standard GeForce 7900 GTX GPUs."
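The round-robin idea behind four-way AFR can be sketched in a few lines. This is a minimal illustration, assuming frames are handed out strictly in order so that frame n goes to GPU n mod 4; the function name `afr_assignment` is hypothetical, and this is not Nvidia's actual driver logic.

```python
# Minimal sketch of alternate frame rendering (AFR) scheduling.
# Assumption: frames are dealt out round-robin, so frame n goes to GPU n % num_gpus.
# Illustration only; this is not Nvidia's driver implementation.


def afr_assignment(num_frames: int, num_gpus: int = 4) -> dict[int, list[int]]:
    """Return a mapping of GPU index -> list of frame numbers that GPU renders."""
    schedule: dict[int, list[int]] = {gpu: [] for gpu in range(num_gpus)}
    for frame in range(num_frames):
        schedule[frame % num_gpus].append(frame)  # each GPU gets every num_gpus-th frame
    return schedule


if __name__ == "__main__":
    for gpu, frames in afr_assignment(8).items():
        print(f"GPU {gpu}: frames {frames}")
```

With eight frames, GPU 0 renders frames 0 and 4, GPU 1 renders frames 1 and 5, and so on, which is why four frames must be in flight at once to keep all four processors busy.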

In an interview with Kyle Bennett of HardOCP, Nvidia SLI Product Manager Chris Daniel stated that "Outside of the SLI antialiasing modes, for example 2560x1600 at 4xAA/8xAF, the benefits of Quad-SLI are significant in OpenGL games, but often less prominent in Direct3D games. It turns out that DX9 does not support queuing of enough frames using standard D3D API programming practices (that would attain WHQL certification) to effectively support high-performance 4-way AFR mode used in Quad systems. Quad-SLI instead uses 'AFR of SFR' for many D3D games. In the DirectX 10 future, we expect 4-way AFR to work as effectively as it does with OpenGL today."
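"AFR of SFR" combines the two techniques: pairs of GPUs take turns on whole frames (AFR), and within a pair the frame is split between the two GPUs (SFR), so only two frames need to be in flight. The sketch below assumes the four GPUs are grouped into two pairs and that each frame is split into two equal horizontal bands; real SFR load-balances the split line dynamically, and the names `Tile` and `afr_of_sfr` are hypothetical.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    gpu: int      # global GPU index, 0-3
    frame: int    # frame number
    y_start: int  # first scanline (inclusive)
    y_end: int    # last scanline (exclusive)


def afr_of_sfr(frame: int, height: int, num_pairs: int = 2, gpus_per_pair: int = 2) -> list[Tile]:
    """Assign one frame to the GPU pair whose turn it is, split into horizontal bands.

    Assumptions (illustrative only): GPUs 0-1 form pair 0 and GPUs 2-3 form pair 1,
    and the split is a fixed, even division of the screen height.
    """
    pair = frame % num_pairs                 # AFR step: pairs alternate by frame
    band = height // gpus_per_pair           # SFR step: even bands within the pair
    tiles = []
    for i in range(gpus_per_pair):
        y_start = i * band
        # The last band absorbs any leftover scanlines when height is not evenly divisible.
        y_end = height if i == gpus_per_pair - 1 else (i + 1) * band
        tiles.append(Tile(gpu=pair * gpus_per_pair + i, frame=frame, y_start=y_start, y_end=y_end))
    return tiles


if __name__ == "__main__":
    for f in range(4):
        for tile in afr_of_sfr(f, height=1600):
            print(tile)
```

At any moment only two frames are being rendered (one per pair), which fits within a shallower frame queue than four-way AFR requires, at the cost of SFR's per-frame splitting overhead.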

To define the terms a bit further, the front buffer holds the completed frame being sent to the screen, while the back buffers hold frames in various states of completion. The twist comes when doing AFR with four graphics processors. Only three frames are allowed in the back-buffer queue, which means one processor is always left waiting for work. If Direct3D does not allow the application and Quad-SLI to queue as many frames as they need, the GPUs end up waiting for work that the CPU cannot yet feed them. This is what we call being "CPU-bound" or "system-bound." With a ceiling like this, performance does not scale efficiently as more processors are added. Four graphics processors should still have some advantage over two, but the gain will not be 100%; in some cases it is not even 30%.
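The effect of the three-frame ceiling can be illustrated with a rough upper-bound model. This is a toy sketch, not a measurement: it assumes each GPU renders a whole frame in a fixed time, that no more GPUs can be busy than there are queued frames, and that the CPU prepares frames serially. The function `afr_fps_bound` and all timings are hypothetical.

```python
def afr_fps_bound(num_gpus: int, max_queued_frames: int,
                  gpu_ms_per_frame: float, cpu_ms_per_frame: float) -> float:
    """Rough upper bound on AFR frame rate under a limited frame queue.

    Toy-model assumptions (not measured behaviour):
    - each GPU renders one whole frame in gpu_ms_per_frame;
    - only max_queued_frames frames can be in flight, so at most that many GPUs are busy;
    - the CPU prepares every frame serially in cpu_ms_per_frame.
    """
    busy = min(num_gpus, max_queued_frames)          # GPUs that actually have a frame to render
    gpu_limited_fps = busy * 1000.0 / gpu_ms_per_frame
    cpu_limited_fps = 1000.0 / cpu_ms_per_frame      # CPU cannot submit frames faster than this
    return min(gpu_limited_fps, cpu_limited_fps)


if __name__ == "__main__":
    # Hypothetical timings chosen only to show the shape of the scaling curve.
    for gpus in (1, 2, 4):
        fps = afr_fps_bound(gpus, max_queued_frames=3, gpu_ms_per_frame=40, cpu_ms_per_frame=10)
        print(f"{gpus} GPU(s): {fps:.1f} fps")
```

With these made-up timings the bound rises from 50 fps with two GPUs to 75 fps with four, a 50% gain instead of 100%, because the fourth GPU never has a frame to work on. If the CPU cost per frame grows, the CPU-limited term caps the result and the gain over two GPUs shrinks further.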

As Nvidia stated earlier, there is also an issue with overhead. One basic problem arises when many frames have to be queued: keeping four different processors supplied with enough work introduces latency. The diagram provided shows Nvidia's explanation of this in pictorial form. Extrapolating from this concept, as scaling efficiency decreases, latency per frame increases. At some point, this can ultimately mean lower frame rates than 2-way SLI or even a single-card configuration. For Quad-SLI to operate effectively, Nvidia needs to keep these latencies under control.
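A simple way to see why deeper queues add latency: in steady-state AFR with N frames in flight, each GPU needs roughly N display intervals to finish its own frame, so the delay from submission to display is about N divided by the frame rate. The sketch below encodes only that simplification; `queue_latency_ms` and the frame rates are hypothetical, chosen for illustration rather than taken from any benchmark.

```python
def queue_latency_ms(frames_in_flight: int, fps: float) -> float:
    """Approximate delay from a frame's submission to its display, in milliseconds.

    Simplifying assumption: with frames_in_flight frames being rendered in parallel,
    each frame takes about frames_in_flight display intervals (1000 / fps ms each).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames_in_flight * 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical configurations; the frame rates are invented for illustration only.
    configs = [
        ("2-way SLI, 2 frames in flight", 2, 50.0),
        ("Quad-SLI, 4 frames in flight", 4, 60.0),
    ]
    for name, frames, fps in configs:
        print(f"{name}: {fps:.0f} fps, ~{queue_latency_ms(frames, fps):.1f} ms queue latency")
```

In this made-up comparison, the quad setup renders 20% more frames per second, yet each frame waits about 67 ms instead of 40 ms, which is why poor scaling combined with a deeper queue can make Quad-SLI feel worse than 2-way SLI.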