Hardware And Software: Two Ways To Test
Hardware FCAT: Measuring Performance Through Video Capture
Let’s start with a walk-through of what goes into testing with FCAT through video capture, otherwise known as the hardware method.
Just like the FCAT we introduced three years ago, the new version employs a gaming system to which the HMD is attached and a capture machine responsible for recording. In between, an HDMI splitter takes input from the gaming system’s GPU and outputs two identical signals: one to the HMD and another to the capture system.
Software running on the gaming system applies two overlays to the HMD. The overlays display colored bars, one per frame, in a known sequence. Deviations from that sequence in a recorded file tell you something went wrong and a frame was not displayed. The square in the upper left-hand corner tracks warp misses, while the bar in the middle-left helps monitor for app misses. Runt frames—an artifact we checked for using the original FCAT—are no longer an issue since the Rift and Vive employ 90 Hz refresh cycles. Consequently, only full frames are displayed, and you’ll never see a partial bar.
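The sequence check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Nvidia's Extractor: the four-color palette, its order, and the function name `find_sequence_breaks` are assumptions made for the example, and it presumes one color sample per captured refresh.

```python
from typing import List, Tuple

# Hypothetical overlay palette; the real FCAT color order is not given here.
PALETTE = ["white", "lime", "blue", "red"]


def find_sequence_breaks(
    observed: List[str], palette: List[str] = PALETTE
) -> List[Tuple[int, str]]:
    """Return (index, kind) for each captured refresh that breaks the cycle.

    kind is "repeat" when the previous color is shown again (no new frame
    arrived) and "skip" when one or more colors were never displayed.
    """
    breaks = []
    for i in range(1, len(observed)):
        prev = palette.index(observed[i - 1])
        cur = palette.index(observed[i])
        step = (cur - prev) % len(palette)  # 1 means "next color in the cycle"
        if step == 0:
            breaks.append((i, "repeat"))
        elif step > 1:
            breaks.append((i, "skip"))
    return breaks


if __name__ == "__main__":
    capture = ["white", "lime", "lime", "blue", "white"]
    print(find_sequence_breaks(capture))  # [(2, 'repeat'), (4, 'skip')]
```

A clean capture returns an empty list. Anything else points to the refresh where the recorded bar left the expected order.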
Ensuring FCAT’s accuracy requires a flawless capture at 2160x1200 and 90 Hz, without dropped or added frames. This is no small feat. While the gaming system’s specs are not integral to the benchmark’s integrity, the capture machine needs a fairly specific configuration. We’re using a Core i7-5960X on an MSI X99A Gaming Pro motherboard with 16GB of G.Skill DDR4-3000 installed. Graphics performance is unimportant on this machine, so we added a GeForce GTX 750 Ti to keep heat and power to a minimum.
The more important components are a Datapath VisionSC-HD4+ capture card and an Intel SSD 750 1.2 TB. Datapath’s eight-lane PCIe card supports four independent channels of HDMI 1.4 capture, two of which run at up to 297 MHz. As luck would have it, this is close to the rate needed for the Rift’s 2160x1200 at 90 Hz. We only need one of those channels for today’s project, but that’s still a ton of data moving over the PCIe bus and onto storage. Fortunately, Intel’s SSD 750 is up to the task, facilitating sequential writes at up to 1.2 GB/s. The sample Intel sent over comes in a 2.5” form factor and attaches to the motherboard’s U.2 connector, enabling a four-lane PCIe 3.0 link.
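To put "a ton of data" into numbers, here is a rough back-of-the-envelope calculation. It assumes the capture is stored as uncompressed 24-bit RGB (3 bytes per pixel), which the article does not state, and it ignores container overhead and blanking intervals, so treat the result as an estimate rather than a measured figure.

```python
WIDTH, HEIGHT, REFRESH_HZ = 2160, 1200, 90
BYTES_PER_PIXEL = 3            # assumption: uncompressed 24-bit RGB
SSD_WRITE_BYTES_PER_S = 1.2e9  # SSD 750 sequential write rate quoted above


def capture_rate_bytes_per_s(width: int, height: int, refresh_hz: int,
                             bytes_per_pixel: int) -> int:
    """Raw video data rate for the visible pixels only."""
    return width * height * refresh_hz * bytes_per_pixel


if __name__ == "__main__":
    rate = capture_rate_bytes_per_s(WIDTH, HEIGHT, REFRESH_HZ, BYTES_PER_PIXEL)
    print(f"Active pixels per second: {WIDTH * HEIGHT * REFRESH_HZ / 1e6:.2f} M")
    print(f"Capture data rate: {rate / 1e6:.2f} MB/s")
    print(f"SSD headroom factor: {SSD_WRITE_BYTES_PER_S / rate:.2f}x")
```

Under those assumptions the stream works out to roughly 700 MB/s, which is why a drive rated for 1.2 GB/s of sequential writes leaves comfortable margin.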
Once the capture card’s resolution and timing settings are configured to match the HMD, and VirtualDub is set up to record at 2160x1200 @ 90 FPS, all that’s left is capturing video. The overlay is started on the gaming machine, followed by a VR application. We get to where we want to benchmark and then start recording on the capture machine (VirtualDub’s default hotkey is F5). At the end of our run, recording is stopped, leaving a massive AVI file to feed through Nvidia’s Extractor tool and subsequently analyze through FCAT’s Perl scripts.
Software FCAT: Simplifying Testing With A Local GUI
Nvidia’s FCAT software tool does away with the capture machine completely, leaving us with our gaming machine (Core i7-6700K, MSI Z170A Gaming M7 motherboard, 16GB of G.Skill DDR4 memory, and a 500GB Crucial MX200 SSD) and Rift. FCAT VR reads the performance information that Oculus’ runtime logs to Event Tracing for Windows (ETW). Per the FCAT documentation, the following are measured in milliseconds:
- Game Start: Timestamp when Game starts preparing frame
- Game Complete CPU/GPU: Timestamp when Game has prepared frame (all CPU-side work is done), and then another timestamp when frame is finished on GPU
- Queue Ahead: Amount of queue ahead that was allowed for the frame
- Runtime Sample: Timestamp for warp start. Usually fixed amount of time before v-sync
- Runtime Complete: Warp finished on GPU
- V-sync: V-sync interrupt for HMD (Nvidia-only)
There are also integer counters in Nvidia’s output file to report app and warp misses. This is important functionality, since the hardware-based FCAT is limited in its ability to catch those events.
Testing on a Vive is similar, except that FCAT ties into an API exposed by SteamVR to generate its timestamps. The list of captured events is different (longer, in fact), but we end up with the same performance statistics.
There’s one other benefit associated with software-based FCAT: Because the software is fed granular timing information from ETW and SteamVR, we can calculate what Nvidia calls unconstrained frame rate—that is, the performance we would have seen were it not for a 90 Hz refresh rate. This is particularly powerful in that we’re able to estimate headroom based on how long the render takes.
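As a rough sketch of that idea, the snippet below turns per-frame render times into an unconstrained frame rate and a per-frame headroom figure. Nvidia's exact formula is not spelled out here, so this assumes the unconstrained rate is simply 1000 divided by the average render time in milliseconds. The function names and sample values are hypothetical.

```python
from typing import Sequence

REFRESH_HZ = 90
REFRESH_INTERVAL_MS = 1000.0 / REFRESH_HZ  # about 11.1 ms per refresh


def unconstrained_fps(render_times_ms: Sequence[float]) -> float:
    """Frame rate the GPU could sustain if it never waited on the display.

    Assumption: 1000 / mean render time; not confirmed as Nvidia's method.
    """
    avg_ms = sum(render_times_ms) / len(render_times_ms)
    return 1000.0 / avg_ms


def headroom_ms(render_time_ms: float) -> float:
    """Time left in one 90 Hz refresh interval after rendering a frame."""
    return REFRESH_INTERVAL_MS - render_time_ms


if __name__ == "__main__":
    samples = [8.0, 8.0, 8.0]  # made-up render times in milliseconds
    print(f"Unconstrained: {unconstrained_fps(samples):.1f} FPS")
    print(f"Headroom per frame: {headroom_ms(8.0):.2f} ms")
```

A negative headroom value would mean the frame took longer than one refresh interval to render, which is where misses become likely.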
Meet The VR Capture Utility
When we first started beta-testing FCAT VR, it was referred to as FCAT 2.0. Back then, it consisted of a simple UI that let you specify a log file destination and a benchmarking hotkey. You'd fire up the capture utility and then launch your VR application. A red bar on the right side of the HMD told you FCAT Capture was running, but idle. Hitting Scroll Lock (the only hotkey supported) turned the bar green until you pressed it again, ending the run.
Not much changes in the version of FCAT VR Capture that Nvidia plans to publish. You're still able to specify a custom output directory, and Scroll Lock remains the only functional hotkey. Nvidia does, however, add options for capture delay and capture duration, making it easier to control how tests start and stop.
The first FCAT 2.0 builds we played with used Perl scripts to read in the captured data, set up comparisons, and customize the output charts. After a number of revisions (one of which was precipitated by Oculus' ASW introduction), the charts ended up looking something like this:
But all along, Nvidia knew a lot of time and effort was spent editing script files to create these charts. In an effort to encourage broader adoption of FCAT VR, the company created a Python-based GUI into which we're able to drag log files and create charts much more easily.
This VR Analyzer conveys:
- The delivered frame rate, or what actually shows up on-screen. If you have a perfect run, you'll see 90 FPS. If your card spends half of its time at 90 and half of its time at 45 FPS, this field averages out to 67.5 FPS.
- The unconstrained frame rate, or what you'd see if it weren't for the 90 Hz display interval.
- Refresh intervals, or 90 times the benchmark's duration in seconds.
- New frames
- Dropped frames (app misses)
- Frames synthesized by asynchronous spacewarp
- Warp misses
- Average frame time
All of that data paints an interesting picture of what's happening on-screen, complementing the subjective observations we relied on previously, plus the video-based capture technique we'll compare on the next couple of pages.
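The arithmetic behind the first few fields can be illustrated with a short sketch. It assumes the delivered frame rate is the number of new frames divided by the run's duration, which matches the 67.5 FPS example above but is not confirmed as the analyzer's exact calculation. The `RunSummary` type and `summarize` function are hypothetical names.

```python
from dataclasses import dataclass

REFRESH_HZ = 90


@dataclass
class RunSummary:
    refresh_intervals: int
    new_frames: int
    delivered_fps: float


def summarize(duration_s: float, new_frames: int) -> RunSummary:
    """Derive headline numbers from a run's length and its new-frame count."""
    return RunSummary(
        refresh_intervals=round(REFRESH_HZ * duration_s),
        new_frames=new_frames,
        delivered_fps=new_frames / duration_s,  # assumption: new frames only
    )


if __name__ == "__main__":
    # 10-second run: 5 s at 90 FPS (450 frames) plus 5 s at 45 FPS (225 frames).
    summary = summarize(duration_s=10.0, new_frames=450 + 225)
    print(summary.refresh_intervals)  # 900
    print(summary.delivered_fps)      # 67.5
```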