Multi-Card Graphics Problems, And A Solution: Nvidia's FCAT
Even now, the industry measures rendered frames per second using software-based tools like Fraps and in-game counters that capture frames from the graphics card's memory every second. On the surface, that sounds like a great way to generate accurate data.
The problem is that there can be a significant disconnect between the game engine's output and what you see on your monitor. And, when you move beyond the issues that might affect a single-GPU setup and start considering the complexity of multiple graphics processors working cooperatively, two additional variables surface, affecting your experience: dropped frames and what Nvidia is introducing to us as runt frames.
Briefly, even after a frame is loaded into memory, there's still a lot of work that goes on in the graphics pipeline; this takes time. As a result, some frames are dropped before they ever show up on-screen. Other frames show up, but only on a very small part of the screen. Nvidia's definition of a runt is any frame that shows up on fewer than 21 scan lines on a monitor.
Dropped and runt frames have no positive impact on what you see while you're playing a game. They still get counted by benchmark tools, though. This is a problem, particularly if a combination of hardware is rendering as quickly as possible without any consideration of consistency. The more drops and runts generated, the less accurate utilities like Fraps become. The good news is that these two issues only affect multi-GPU configurations enough to be a problem. In single-card testing, Fraps and the in-game metrics we use are much more accurate.
So, how do we capture frame rate data from AMD CrossFire- and Nvidia SLI-based setups without runt and dropped frames interfering with the result?
We record the display output and analyze it, frame by frame. Sound intense? It is. In fact, it's potentially so time-consuming that Nvidia developed its own suite of tools to make this possible.
Some of what we're using is commercially available or open source. For example, the system being benchmarked is connected to a Gefen dual-link DVI distribution amplifier, which in turn outputs to a monitor and a Datapath Limited VisionDVI-DL capture card, plugged into a second machine. For an accurate analysis, the recorded video has to be flawless, without dropped frames. So, we have a trio of striped SSDs able to cope with the uncompressed stream (as much as 650 MB/s at the card's highest supported resolution).
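To get a feel for why the capture machine needs that much storage throughput, here is a rough back-of-the-envelope sketch in Python. The function name, the 24-bit color depth, and the 1920x1080 at 60 Hz example are assumptions chosen for illustration; the numbers are not meant to reproduce the 650 MB/s figure exactly, since the capture card's actual pixel format and timings aren't specified here.

```python
def uncompressed_rate_mb_s(width, height, refresh_hz, bytes_per_pixel=3):
    """Approximate data rate of an uncompressed video stream in MB/s.

    Assumes every visible pixel is stored at a fixed byte depth and that
    one full frame is written per display refresh (hypothetical model).
    """
    bytes_per_second = width * height * bytes_per_pixel * refresh_hz
    return bytes_per_second / 1_000_000


if __name__ == "__main__":
    # Illustrative resolution and refresh rate, not taken from the article.
    rate = uncompressed_rate_mb_s(1920, 1080, 60)
    print(f"1920x1080 @ 60 Hz, 24-bit: {rate:.1f} MB/s")
```

Even at this modest resolution, the stream runs to several hundred megabytes per second, which is why a single drive is unlikely to keep up and the recording is spread across striped SSDs.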
The system getting tested runs an overlay, which applies a colored bar to each frame coming from the graphics card in a repeating sequence of 16 colors. This software is Nvidia's, but it's necessary in order to automate the analysis. Company representatives say that the folks at Beepa could conceivably add this overlay functionality to Fraps, taking Nvidia out of the equation.
Gameplay, with the overlay bars, is captured by VirtualDub onto the solid-state storage. From there, another Nvidia app reads in the video and creates an information-rich Excel spreadsheet by analyzing the video's frames and looking for the color bar sequence. Knowing what it expects, the utility can easily see if a frame is dropped (color missing) or if a frame occupies fewer than 21 scan lines on the screen (a runt).
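The detection idea described above can be sketched in a few lines of Python. This is not Nvidia's tool, just a minimal illustration under stated assumptions: the overlay colors have already been decoded into indices 0 through 15, and the input is one index per captured scan line, in display order. The names `classify_frames`, `NUM_COLORS`, and `RUNT_THRESHOLD` are hypothetical.

```python
from itertools import groupby

NUM_COLORS = 16       # length of the repeating overlay color sequence
RUNT_THRESHOLD = 21   # frames on fewer scan lines than this are runts


def classify_frames(scanline_colors):
    """Group scan lines into rendered frames and flag runts and drops.

    scanline_colors: overlay color index (0..15) for each captured scan
    line, in display order (an assumed, pre-decoded input format).
    Returns (frames, dropped), where frames is a list of
    (color_index, scan_lines, label) tuples and dropped is the number of
    colors skipped in the expected sequence.
    """
    frames = []
    dropped = 0
    previous = None
    for color, group in groupby(scanline_colors):
        height = sum(1 for _ in group)
        if previous is not None:
            # Colors missing between two consecutive visible frames
            # correspond to frames that never reached the screen.
            dropped += (color - previous - 1) % NUM_COLORS
        label = "runt" if height < RUNT_THRESHOLD else "full"
        frames.append((color, height, label))
        previous = color
    return frames, dropped


if __name__ == "__main__":
    # Color 1 shows on only 10 lines (a runt); color 3 never appears.
    capture = [0] * 500 + [1] * 10 + [2] * 570 + [4] * 400
    frames, dropped = classify_frames(capture)
    for frame in frames:
        print(frame)
    print("dropped frames:", dropped)
```

One limitation of this simplified version: if exactly 16 frames in a row were dropped, the sequence would appear unbroken, so the sketch would miss them.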
Even the Excel file is very data-heavy, though. So, Nvidia developed a series of Perl scripts able to parse the information and create frame rate/frame time analysis, along with charts reflecting the information.
Nvidia calls this complete package FCAT, for Frame Capture Analysis Tool. FCAT's resulting performance data, which excludes the dropped and runt frames that today's reporting still counts, should be more in line with what gamers see.
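To illustrate how much the two numbers can diverge, the following Python sketch compares a frame rate that counts every rendered frame with one that counts only frames large enough to matter on-screen. How FCAT's scripts actually compute their figures isn't described here, so the function `frame_rates` and its inputs are assumptions made for the example.

```python
def frame_rates(frame_heights, dropped_count, duration_s, runt_threshold=21):
    """Return (reported_fps, observed_fps) for one capture.

    frame_heights: scan lines occupied by each frame that reached the
    display; dropped_count: frames that never appeared at all;
    duration_s: length of the capture in seconds. All three are
    hypothetical inputs, e.g. taken from a per-frame analysis table.
    """
    shown = len(frame_heights)
    runts = sum(1 for h in frame_heights if h < runt_threshold)
    # A software counter tallies every frame the game rendered.
    reported_fps = (shown + dropped_count) / duration_s
    # The viewer only benefits from frames that are not runts or drops.
    observed_fps = (shown - runts) / duration_s
    return reported_fps, observed_fps


if __name__ == "__main__":
    heights = [500, 10, 570, 400, 15, 600]  # two runts among six frames
    reported, observed = frame_rates(heights, dropped_count=2, duration_s=0.5)
    print(f"reported: {reported:.1f} FPS, observed: {observed:.1f} FPS")
```

In this made-up capture, half of the frames a software counter would report contribute nothing visible, so the observed rate is half the reported one.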