The Complexity of Benchmarking VR
As you might expect, the tools for quantifying a virtual reality experience are still immature. Tests we were running on pre-production hardware no longer work on the retail Rift. Other benchmarks that were supposed to be ready for launch are still in the final stages of validation. And when it comes to guidance from Oculus itself, we’re told that much of the evaluation process involves simply using the HMD.
Well sure, we’re doing plenty of that too. But passing judgement on such a highly anticipated piece of hardware with little more than “the experience feels smooth enough” feels so…incomplete.
Consulting with AMD and Nvidia turned up conflicting guidance; one company conceded that Fraps is the best we have for measuring performance, while the other expressed apprehension about trusting any metric at this early point. After all, what good is a result if it can’t be knowledgeably explained?
Indeed, this is not the first time Fraps’ utility was called into question. Years ago, long before we were even thinking about a retail HMD, AMD was on trial for inconsistent frame delivery, which manifested as stuttering. To make a long story short, the company helped explain how Fraps collects data and where its results can be misconstrued. Because the software does its work in between the game and Windows, intercepting Present calls before they reach Direct3D, several stages in the display pipeline don’t factor into a Fraps result.
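To make the Present-interception idea concrete, here is a minimal Python sketch of how per-frame times fall out of a list of Present timestamps. It assumes a log of cumulative timestamps in milliseconds, one per Present call; the function name and the sample values are hypothetical and are not taken from Fraps itself.

```python
def frame_times_ms(present_timestamps_ms):
    """Return the interval between consecutive Present calls, in ms.

    Assumption: the input is a list of cumulative timestamps, one per
    Present call, in the order the calls were made.
    """
    return [
        later - earlier
        for earlier, later in zip(present_timestamps_ms, present_timestamps_ms[1:])
    ]


# Hypothetical timestamps, not measured data.
timestamps = [0.0, 11.0, 22.5, 45.0, 56.0]
intervals = frame_times_ms(timestamps)
print(intervals)
```

Note that these intervals only describe when the game submitted frames. Anything that happens later in the display pipeline is invisible to a measurement taken at this point.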
There’s even more complexity in VR. Specific to the Rift, each eye’s frame passes through Asynchronous Timewarp (ATW) prior to display. In the event that a frame misses its 11.11ms deadline (corresponding to a 90Hz refresh), ATW reuses the previous frame with updated headset position information, preventing the judder that would have occurred otherwise.
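The 11.11ms figure is simply the reciprocal of the refresh rate. The sketch below works that out and counts how many frame times in a hypothetical log exceed the budget. The function names and sample values are illustrative assumptions, and a frame that runs over budget here is not necessarily one the user would notice on the Rift.

```python
def frame_budget_ms(refresh_hz):
    """Time available per frame, in milliseconds, at a given refresh rate."""
    return 1000.0 / refresh_hz


def count_over_budget(frame_times, refresh_hz=90.0):
    """Count frame times (ms) that exceed the per-frame budget."""
    budget = frame_budget_ms(refresh_hz)
    return sum(1 for t in frame_times if t > budget)


budget = frame_budget_ms(90.0)
print(f"Budget at 90Hz: {budget:.2f} ms")

# Hypothetical frame times in milliseconds, not measured data.
sample = [10.8, 11.0, 13.4, 22.5, 10.9]
over = count_over_budget(sample)
print(f"{over} of {len(sample)} frames exceeded the budget")
```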
Now, we’re very short on confirmed fact at this point—Oculus is being elusive, while representatives at AMD and Nvidia don’t appear to know for sure—but we suspect that a missed frame would show up on Fraps, while Oculus’ runtime would mask its effect in real-world game play. And after spending plenty of time benchmarking and playing on the Rift, that’s what seems to be happening.
Thus, in a world where Fraps is our only insight into performance, we have to kick off our benchmarks with a colossal caveat. There is some correlation between our Fraps results and playing on the Rift, but read our accompanying analysis before drawing conclusions. In many cases, problematic-looking charts really aren’t as troublesome as they seem.
Our comparisons are further complicated by dynamic quality adjustments that games will increasingly use to maintain that critical 11.11ms cadence. You see, in order to correct for the spatial distortion imposed by an HMD’s lenses, more pixels need to be rendered than the actual resolution of the display; a common target is 1.4x. But if a game engine determines it’s going to miss its frame rate goal, 1.4x might slip to 1.3x or 1.2x, sacrificing some quality (we’re told this can be almost imperceptible) in the name of maintaining an immersive experience. Fraps should pick up the performance side of that, but a difference in quality adds an unknown to our analysis…at least until someone creates a tool for collecting that information. Beyond these adaptive changes to the viewport, Valve Software’s senior graphics programmer Alex Vlachos says that anti-aliasing can also be adjusted on the fly, or radial density masking can be applied to maintain performance.
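To get a feel for what those multipliers mean in pixels, here is a rough Python sketch. It rests on two assumptions that the text above does not state: that the Rift’s panel is 1080x1200 per eye, and that the multiplier is applied to each axis rather than to the total pixel count. The function name is hypothetical.

```python
def render_target(width, height, scale):
    """Per-eye render target size when `scale` is applied to each axis."""
    return round(width * scale), round(height * scale)


# Assumed per-eye panel resolution (not given in the article text).
EYE_W, EYE_H = 1080, 1200
native_pixels = EYE_W * EYE_H

for scale in (1.4, 1.3, 1.2):
    w, h = render_target(EYE_W, EYE_H, scale)
    ratio = (w * h) / native_pixels
    print(f"{scale}x -> {w}x{h} per eye, {ratio:.2f}x the native pixel count")
```

Under those assumptions, dropping from 1.4x to 1.2x trims the per-eye pixel count by roughly a quarter, which is why a small change in the multiplier can buy back a meaningful amount of frame time.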
Understand also that frame rates and more granular frame times do not reflect all aspects of the HMD’s performance, either. Latency and persistence play a significant role in the comfort and enjoyment of a VR experience, and we simply cannot get those measurements from the Rift with the tools at our disposal today.
What we can do, then, is present the Fraps-based numbers we generated and do our best to correlate them to our real-world experience gaming with the Rift. In the days to come, we expect a press preview build of VRMark that facilitates draw call-to-photon latency measurement, which will effectively capture more of the pipeline. Basemark’s VRScore is also imminent. That one claims to offer some of the same measurements as VRMark, plus simultaneous left and right eye latency measurements, dropped frame detection and duplicate frame detection. When final, approved versions of those early tools become available, we’ll report the results. So without further ado, let’s look at the hardware involved.
How We Tested Oculus Rift
Oculus shipped us a VR-ready PC for our evaluation with decidedly mainstream specifications. And while we certainly used that for much of our testing, we put one of our quickest lab machines to work for the performance measurements. This consists of a Core i7-5960X host processor, 16GB of G.Skill DDR4 memory, MSI’s X99S Xpower AC motherboard and a 500GB Crucial MX200 SSD.
Given the computational requirements of VR, there are only a few graphics cards relevant to the discussion. We wanted to test Oculus’ recommended Radeon R9 290 and GeForce GTX 970, a handful of cards above that specification and at least one board below it. As it has so many times in the past, MSI stepped in with a number of boards to help complete our line-up, including:
Those cards complement the Radeon R9 Fury X, GeForce GTX 980 Ti and GeForce GTX 970 we already had in the lab.
It’s also worth mentioning that MSI sent two of its R9 390X and GTX 980 cards so that we could generate results using CrossFire and SLI. We were later told, however, that none of Oculus’ launch titles support multi-GPU rendering. When applications start utilizing AMD’s Affinity Multi-GPU and Nvidia’s VR SLI technologies, we’ll use those cards to evaluate the speed-up (in his 2016 GDC presentation, “Advanced VR Rendering Performance,” Valve’s Vlachos said to expect a 30-35% increase).
We chose four applications from Oculus’ list of launch titles to benchmark. All of them offer a limited number of quality options, which we maxed out in three cases. And they’re all imperfect in that it’s incredibly difficult to repeat the same sequence in successive runs.
In the middle of our testing, Nvidia sent word of a new driver, 364.64, so we installed it and started over. We’re told that this is a version of the driver that will launch alongside the Rift; the launch release should include a few extra bug fixes. However, Nvidia tells us that 364.64 is representative of the performance and image quality available at launch.
Separately, AMD informed us of a new Radeon Software Crimson Edition 16.3.2 driver that will support Oculus’ SDK 1.3. That news came several days after we wrapped up benchmarking and sent the Rift off for experiential testing at another Tom’s Hardware lab. As such, our results come from version 16.3.1, published on 3/16/16. We asked AMD for an accounting of the changes made to 16.3.2, but have not received a response.
Benchmarks | |
---|---|
EVE: Valkyrie | Highest detail settings, basic training run-through, Fraps-based recording |
Chronos | Epic graphics options (Rendering Quality, Shadow Quality, Rendering Resolution), Tale of the Scouring sequence, Fraps-based recording |
Radial-G | Highest detail settings, Dead Zone Alpha race, Fraps-based recording |
Lucky's Tale | Medium detail settings, first level run-through to first door, Fraps-based recording |