
Virtualizing The GPU: How It Works

Can Lucidlogix Right Sandy Bridge’s Wrongs? Virtu, Previewed

Interestingly, some of the same technologies that went into Lucidlogix’s Hydra also play into its Virtu software.

Normally, when you fire up a game—let’s say it’s a modern DirectX 11 title like Metro 2033—it loads certain DLLs based on the hardware installed. If you’re using a Radeon HD 4870, for example, the game will only run through the DirectX 10 or DirectX 9 code path. The same would hold true in a machine limited to Intel’s HD Graphics engine.
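The selection logic boils down to picking the newest renderer the hardware can handle. Below is a simplified sketch that assumes a hypothetical lookup table of each adapter's highest supported Direct3D version. The Radeon HD 4870 and Sandy Bridge's HD Graphics are both DirectX 10.1 parts, and the Radeon HD 6970 is a DirectX 11 card. Real titles typically query Direct3D feature levels when they create a device rather than matching adapter names, so `ADAPTER_MAX_D3D` and `pick_code_path` are illustrative names, not any game's actual code.

```python
# Hypothetical table: highest Direct3D version each adapter supports.
ADAPTER_MAX_D3D = {
    "Radeon HD 4870": 10.1,
    "Intel HD Graphics (Sandy Bridge)": 10.1,
    "Radeon HD 6970": 11.0,
}

# Renderers a title like Metro 2033 ships with: DX11, DX10 and DX9 paths.
GAME_CODE_PATHS = [11.0, 10.0, 9.0]


def pick_code_path(adapter: str, paths: list[float] = GAME_CODE_PATHS) -> float:
    """Return the newest code path whose version the adapter can run."""
    limit = ADAPTER_MAX_D3D[adapter]
    for version in sorted(paths, reverse=True):
        if version <= limit:
            return version
    raise RuntimeError(f"{adapter} cannot run any available code path")


for name in ADAPTER_MAX_D3D:
    print(f"{name}: DirectX {pick_code_path(name):.0f} path")
```

A DirectX 10.1 card lands on the game's DX10 renderer because the game has no 10.1 path. The player can still force the older DX9 path manually, and this sketch does not model that choice.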

Virtu inserts an abstraction layer between the application and operating system, though. Depending on the piece of software requesting resources, Virtu’s rendering assignment manager sends the workload to either Intel’s HD Graphics or the discrete card. Both the abstraction layer and rendering assignment manager are borrowed from Hydra.

Applications that don’t need the discrete card’s performance, or that actually run best on HD Graphics, are handled by Intel’s integrated component. Web content, video playback, and the Aero interface all fall under that umbrella, as do apps optimized for Quick Sync. There’s really nothing fancy involved.

Games better rendered on the discrete card, however, are redirected by the assignment manager and processed by the GPU. From there, Lucid’s InterOp engine maps the discrete card’s frame buffer to the HD Graphics’ memory—necessary, since the display output is connected to that device.
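The split of work can be pictured as a per-application routing table. Lucid hasn't detailed how its rendering assignment manager decides, so the `PROFILES` table, the `assign` function, the executable names, and the fallback to integrated graphics for unknown programs are all assumptions made for this sketch. The one real name is `dwm.exe`, the Windows Desktop Window Manager that draws Aero. The sketch captures the two points above: which GPU renders, and whether the finished frame must then be copied to the HD Graphics side because the display hangs off that device.

```python
from dataclasses import dataclass
from enum import Enum


class Gpu(Enum):
    INTEGRATED = "Intel HD Graphics"
    DISCRETE = "discrete card"


# Hypothetical application profiles; executable names are illustrative.
PROFILES = {
    "browser.exe": Gpu.INTEGRATED,
    "video_player.exe": Gpu.INTEGRATED,
    "dwm.exe": Gpu.INTEGRATED,             # Aero desktop compositor
    "quicksync_encoder.exe": Gpu.INTEGRATED,
    "metro2033.exe": Gpu.DISCRETE,
}


@dataclass
class Assignment:
    render_on: Gpu
    copy_to_igp: bool  # True when the frame must be moved to the IGP for display


def assign(executable: str) -> Assignment:
    """Pick a GPU for an application (assumed default: integrated)."""
    gpu = PROFILES.get(executable, Gpu.INTEGRATED)
    # The monitor is wired to the IGP, so discrete-rendered frames need a copy.
    return Assignment(render_on=gpu, copy_to_igp=gpu is Gpu.DISCRETE)


print(assign("metro2033.exe"))
print(assign("dwm.exe"))
```

Defaulting unknown programs to the integrated GPU is a power-saving guess on our part, not documented Virtu behavior.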

Overcoming Overhead

Naturally, mapping one adapter’s frame buffer to the other’s over PCI Express is not free. You’re generally looking at 1 to 1.2 ms per frame.

So, say you’re running Call of Duty at 100 frames per second. That means each frame is being rendered in 10 ms. Factor in the 1.2 ms it takes to move that frame from the discrete GPU to the integrated GPU for output, and you’re looking at 11.2 ms per frame, or slightly more than 89 frames per second.

Now take that number to the other extreme. Let’s say you’re running Metro 2033 at 20 frames per second. Each frame gets rendered in 50 ms. Add 1.2 ms for the frame buffer transfer and you’re looking at 51.2 ms, or 19.53 frames per second. Clearly, the overhead matters more at high frame rates: the same 1.2 ms costs roughly 11% of performance at 100 frames per second, but only about 2% at 20 frames per second.
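Both examples reduce to one formula. If every frame pays a serial copy of t_copy milliseconds, the new frame rate is 1000 / (1000 / fps + t_copy). Here is a small Python sketch of that arithmetic, assuming the 1.2 ms upper bound quoted above and a copy that fully stalls rendering. The function name `fps_with_copy` is our own.

```python
def fps_with_copy(fps: float, copy_ms: float = 1.2) -> float:
    """Frame rate after adding a serial frame-buffer copy to every frame."""
    frame_ms = 1000.0 / fps             # render time per frame
    return 1000.0 / (frame_ms + copy_ms)


for fps in (100, 20):
    after = fps_with_copy(fps)
    loss = 100 * (1 - after / fps)
    print(f"{fps} fps -> {after:.2f} fps ({loss:.1f}% lower)")
# 100 fps -> 89.29 fps (10.7% lower)
# 20 fps -> 19.53 fps (2.3% lower)
```

Because the copy time is fixed, it becomes a larger fraction of each frame as the render time shrinks. That is why the penalty grows with frame rate.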

Even though those frame rate drops aren’t too impactful, there is a way to work around them—at least to some extent. In fact, we’ve already seen the strategy from Nvidia with its Optimus technology. From Nvidia’s Optimus white paper:

“To preserve coherency, the 3D engine is blocked from rendering until the mem2mem transfer completes. This time-consuming (synchronous) DMA operation can stall the 3D engine and have a negative impact upon performance. The new Optimus Copy Engine relies on the bidirectional bandwidth of the PCI Express bus to allow simultaneous 3D rendering and copying of display data from the GPU frame buffer to the main memory area used as the IGP frame buffer.”

Lucid similarly employs an asynchronous copy, using multiple buffers to transfer data while the next frame renders. In theory, that means you’ll still see 100 frames per second in Call of Duty, and the only cost is a small amount of added latency. That latency is masked by the fact that a DirectX game buffers up to three frames ahead. That’s the theory, anyway. In practice, we still see some performance loss.
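Why overlapping the copy restores throughput but not latency becomes clearer in a toy model. The sketch below is our own simplification, not Lucid's or Nvidia's implementation. It treats the 3D engine and the copy step as two pipeline stages and assumes two render buffers (`buffers=2`). It also ignores PCI Express contention and DirectX's three-frame render-ahead queue. In synchronous mode, the 3D engine waits for each copy. In asynchronous mode, it starts the next frame right away and only stalls if it would overwrite a buffer that is still being copied.

```python
from dataclasses import dataclass


@dataclass
class Result:
    fps: float         # steady-state frames reaching the IGP per second
    latency_ms: float  # average time from render start to frame landing in IGP memory


def simulate(render_ms: float, copy_ms: float, frames: int = 100,
             async_copy: bool = True, buffers: int = 2) -> Result:
    """Two-stage pipeline: the 3D engine renders, then the frame is copied."""
    render_free = 0.0  # time the 3D engine becomes idle
    copy_free = 0.0    # time the copy path becomes idle
    starts: list[float] = []
    done: list[float] = []
    for i in range(frames):
        start = render_free
        if async_copy and i >= buffers:
            # Buffer i % buffers is being reused: wait until its old copy finished.
            start = max(start, done[i - buffers])
        render_end = start + render_ms
        copy_end = max(render_end, copy_free) + copy_ms
        copy_free = copy_end
        starts.append(start)
        done.append(copy_end)
        # A synchronous copy blocks the 3D engine until the transfer completes.
        render_free = render_end if async_copy else copy_end
    interval = (done[-1] - done[0]) / (frames - 1)
    latency = sum(d - s for s, d in zip(starts, done)) / frames
    return Result(fps=1000.0 / interval, latency_ms=latency)


for mode in (False, True):
    r = simulate(render_ms=10.0, copy_ms=1.2, async_copy=mode)
    print(f"async={mode}: {r.fps:.1f} fps, {r.latency_ms:.1f} ms latency")
# async=False: 89.3 fps, 11.2 ms latency
# async=True: 100.0 fps, 11.2 ms latency
```

In this model, the asynchronous copy brings throughput back to the render-bound 100 frames per second. Each frame still reaches the display buffer 1.2 ms after it finishes rendering, which corresponds to the small latency described above. When the copy takes longer than rendering, throughput is capped by the copy time instead.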

However, if access to Quick Sync is important to you—and if it isn’t now, there’s a good chance it will be at some point in the next year or two—then Virtu as it exists today presents an acceptable compromise. An upcoming version of Virtu will be even more attractive. More on that shortly.
