Proof of Concept: Hardware Versus Software Capture
First things first. Before we start pitting graphics architectures against each other or digging into quality settings, we need to establish the comparability of hardware- and software-based FCAT.
For this, we’ll use three different GPUs with varying performance under Gunfire Games’ Chronos. It’s one of the most taxing titles we’ve tested, and we deliberately select the Epic quality preset to push each board as hard as possible. We record a single run with both performance tools at the same time. The two captures are processed independently, then aligned using Nvidia’s FCAT VR Analyzer to ensure they overlay as perfectly as possible.
We find it easiest to look at the fastest card first, and that’s Nvidia’s Titan X (Pascal).
The Analyzer included with Nvidia’s toolset generates three charts. Up top, we see frame time over time, with both the hardware- and software-based results plotted together. Below that are the two runs, graphed separately, illustrating frame type over time. The legend shows real frames in green, synthesized frames (via asynchronous spacewarp) in yellow, and dropped frames in red.
Even at Chronos’ most demanding settings, a Titan X is fast enough to sustain 90 real frames per second throughout our test sequence. Both the hardware and software captures recognize four dropped frames, which frankly may correspond to scene changes as the RPG changes perspective. Regardless, you can see in the frame time chart up top where small spikes cause the hardware capture to flip from 11 to 22ms, and where the unconstrained frame time measured by software briefly blips. The red line staying far enough below 11ms is what facilitates an ideal result.
If it weren’t for a 90 Hz refresh, the Titan X would average 134 FPS in this workload.
The GeForce GTX 1070 chart looks significantly different, requiring some more information to properly analyze. Let’s start with the frame time over time chart up top.
Our software capture oscillates above and below 11ms. When it’s below, we get 90 newly-rendered frames per second. When the line spends too much time above 11ms, asynchronous spacewarp kicks in and we end up synthesizing alternating frames.
To the hardware-based video capture solution, it looks like we’re dropping every other frame, pushing the green line up to 22ms (or 45 FPS). That’s because the tool looks at the overlay’s colored bars, determines whether the expected color sequence was broken, and, if so, counts how many colors were skipped. Based on that number, the scripts determine how long it took to display the last frame (one missed color equals 11 + 11ms, or 22ms; two missed colors equal 11 + 22ms, or 33ms; and so on).
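The arithmetic in that parenthetical can be sketched in a few lines of Python. This is only our illustration of the rule described above, not Nvidia’s actual script; the function name and constants are hypothetical, and we assume each missed color costs exactly one 90 Hz refresh interval (about 11.1ms, which the text rounds to 11ms).

```python
# Illustrative sketch only; not taken from Nvidia's FCAT VR scripts.
REFRESH_HZ = 90
REFRESH_INTERVAL_MS = 1000 / REFRESH_HZ  # ~11.1 ms per scan-out


def frame_time_ms(missed_colors: int) -> float:
    """Display time of a frame, given how many overlay colors were skipped.

    Assumption: every skipped color means the previous frame stayed on
    screen for one additional refresh interval.
    """
    if missed_colors < 0:
        raise ValueError("missed_colors cannot be negative")
    return (1 + missed_colors) * REFRESH_INTERVAL_MS


if __name__ == "__main__":
    for missed in range(3):
        print(f"{missed} missed color(s): {frame_time_ms(missed):.1f} ms")
```

With zero, one, and two missed colors, this lands on roughly 11, 22, and 33ms, matching the steps visible in the hardware capture’s frame time line.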
There’s another interesting observation to address. In several instances, the video capture reflects 90 real frames rendered per second even as the software tool shows frame times exceeding 11ms. How is that possible, when we’re taught that work on a frame has to be completed in less than 11ms to appear during the subsequent scan? In practice, the envelope can expand or contract due to preemption and parallelization done by the VR runtime. Oculus’ adaptive queue ahead feature is designed to facilitate this, so the 11ms cut-off is not absolute, though it’s generally true. Even so, these optimizations are not enough to keep you from dropping frames if you render at more than 11ms for an extended period of time.
The next two charts break down the composition of each 90 FPS run. In the last one (hardware capture), Nvidia’s scripts report 1964 dropped frames. In reality, asynchronous spacewarp is active when the 1070 isn’t able to render 90 real frames per second. But that’s not apparent in a recorded video, so we end up with lots of red. Data analyzed by the FCAT VR Capture tool can convey this information, though. As a result, the chart corresponding to software capture tells us only 12 frames were dropped, while 1952 were synthesized.
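A quick sanity check shows how the two captures line up. The counts below come straight from the charts; the variable and function names are ours, purely for illustration, and the assumption is that every frame the video capture flags as dropped is either truly dropped or synthesized according to the software capture.

```python
# Counts reported for the GeForce GTX 1070 run (from the charts above).
HARDWARE_DROPPED = 1964      # video capture: everything not real shows as dropped
SOFTWARE_DROPPED = 12        # software capture: truly dropped frames
SOFTWARE_SYNTHESIZED = 1952  # software capture: frames generated by ASW


def unexplained_frames(hw_dropped: int, sw_dropped: int, sw_synth: int) -> int:
    """Frames the hardware capture calls dropped that software cannot account for."""
    return hw_dropped - (sw_dropped + sw_synth)


if __name__ == "__main__":
    gap = unexplained_frames(HARDWARE_DROPPED, SOFTWARE_DROPPED, SOFTWARE_SYNTHESIZED)
    synth_share = SOFTWARE_SYNTHESIZED / HARDWARE_DROPPED
    print(f"Unexplained frames: {gap}")
    print(f"Share of 'dropped' frames that were synthesized: {synth_share:.1%}")
```

In this run the two tallies reconcile exactly: 12 dropped plus 1952 synthesized equals the 1964 frames the hardware capture marks in red.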
The introduction of asynchronous spacewarp bumped the Rift’s minimum graphics specification down to a GeForce GTX 960, but Chronos’ Epic detail preset demonstrates that taxing VR workloads need more than a mainstream GPU.
Our hardware-based capture system reflects an almost-constant 22ms frame time. Were it not for warping, this would be a fairly abysmal VR experience. The stats tell us we generated 3616 new frames through the run and dropped 3626 frames. That slightly-higher-than 1:1 ratio of dropped to real frames is likely due to the four spikes up to 33ms and subsequent dips to 11ms.
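To put those counts in perspective, here is a rough back-of-the-envelope calculation. It assumes that real plus dropped frames together account for every 90 Hz refresh interval in the hardware capture; that is our simplification, not something the tool reports, and the names are hypothetical.

```python
REFRESH_HZ = 90

# Hardware-capture counts quoted above.
REAL_FRAMES = 3616
DROPPED_FRAMES = 3626


def dropped_to_real_ratio(real: int, dropped: int) -> float:
    """How many dropped frames the capture shows per real frame."""
    return dropped / real


def effective_real_fps(real: int, dropped: int, refresh_hz: int = REFRESH_HZ) -> float:
    """Average rate of real frames, assuming real + dropped covers every refresh."""
    return refresh_hz * real / (real + dropped)


if __name__ == "__main__":
    ratio = dropped_to_real_ratio(REAL_FRAMES, DROPPED_FRAMES)
    fps = effective_real_fps(REAL_FRAMES, DROPPED_FRAMES)
    print(f"Dropped-to-real ratio: {ratio:.4f}")
    print(f"Effective real frame rate: {fps:.1f} FPS")
```

Under that assumption, the ratio sits just above 1:1 and the real frame rate works out to a hair under 45 FPS, consistent with the near-constant 22ms frame time.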
Picking through the software capture tells us that the frames a video sequence originally reported as dropped were in fact synthesized by ASW. The stats spit back 3685 new frames and 3695 synthesized ones. This isn’t the most extreme example we’ve seen of Nvidia’s tool reporting more synthesized than real frames, and it leaves one question unanswered for us: if ASW generates extrapolated frames using previous real frames, how does the technology operate in situations where the frame rate drops below 45? Does it simply extrapolate based on an extrapolation? We’ll address the question in more depth shortly, and present a potential answer.
Hardware Vs. Software: It Works
The real purpose of this first look at FCAT VR is to test the relationship between hardware-based capture and Nvidia’s FCAT VR Capture tool.
We now have the infrastructure in place to intercept and record a 2160x1200 signal at 90 Hz, analyze the output, and chart it just as we did during the days of Challenging FPS: Testing SLI And CrossFire Using Video Capture. Nvidia knows this methodology is expensive and time-consuming, though. So it developed software to make VR benchmarking more accessible.
By instead tapping into telemetry data exposed to developers by Oculus’ runtime and SteamVR, Nvidia’s FCAT VR Capture application facilitates Fraps-like recording of several different statistics, which can be analyzed and charted through a Python-based Analyzer GUI.
Anyone could have accessed this information previously. In fact, we have an unpublished story in our CMS from last year extensively measuring graphics performance on the Vive. But we uncovered so many issues that changed with each subsequent driver update that it was never possible to push the piece live before its findings were rendered obsolete. Eventually we shelved it. The point is that Nvidia isn’t presenting anything you couldn’t get at yourself, given the time and effort. By making the tool available publicly, Nvidia gives the technical community an opportunity to dissect it. And by facilitating a comparison to hardware-based capture, it lets us verify agreement between the two methods.