Challenging FPS: Testing SLI And CrossFire Using Video Capture
1. Frames Per Second: Why The World Was Wrong

"You take the red pill - you stay in Wonderland, and I show you how deep the rabbit hole goes."
- Morpheus, The Matrix

Over the years, we've accumulated mountains of data from benchmarking tools like Fraps and from metrics built into top titles to help us evaluate performance. Historically, that information shaped our impression of how much faster one graphics card is than another, or what speed-up to expect from a second GPU in CrossFire or SLI.

As a rule, human beings don't respond well when their beliefs are challenged. But how would you feel if I told you that the frames-per-second method for conveying performance, as it's often presented, is fundamentally flawed? It's tough to accept, right? To be honest, that was my own reaction when I first heard that Scott Wasson at The Tech Report was examining frame times using Fraps. His initial investigation and continued persistence were largely responsible for drawing attention to performance "inside the second," which is often discussed in terms of uneven or stuttery playback, even in the face of high average frame rates.

I still remember talking about this with Scott roughly two years ago, and despite his impressive volume of work since then, we're still left with more questions than answers. There are a couple of reasons this issue has taken so long to gain traction.

First, as mentioned, even open-minded enthusiasts are uncomfortable with fundamental changes to what they previously took for granted (after all, that would mean we, you, and much of the industry were often wrong in our analysis). Nobody wants to believe that the information we were gleaning wasn't as precise as we thought. So, many folks avoided the subject for as long as possible.

Second, and perhaps more important from a technical standpoint, there is no complete replacement for reporting average frame rate. Frame times and latency are not perfect answers to the problem; there are other variables in play, including where in the graphics pipeline Fraps pulls its information from. At the end of the day, there is no metric that lets us definitively compare the smoothness of video performance based exclusively on objective observation.

That's what we're looking for; that's the Holy Grail. We'd need something to replace FPS. The bad news is that we're not there yet.


But frames per second is far from a useless yardstick. It reliably tells us when a piece of hardware delivers poor performance. When you see a card averaging less than 15 FPS, for instance, you know that combination of settings isn't running fluidly enough for a perceived sense of realism. There is no ambiguity in that. Unfortunately, an average frame rate says nothing about how consistently frames are delivered, particularly when two solutions serve up high frame rates and appear to perform comparably.
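
To make that limitation concrete, here is a small sketch using made-up frame times (illustrative numbers, not measured data): two runs with identical average frame rates, one perfectly even and one alternating between short and long frames. The helper name average_fps is hypothetical.

```python
from statistics import pstdev

def average_fps(frame_times_ms):
    """Average frame rate over a run: frames rendered / total seconds."""
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)

# Hypothetical frame times in milliseconds (illustrative, not measured data).
steady = [16.0] * 8          # every frame takes 16 ms
uneven = [8.0, 24.0] * 4     # frames alternate between 8 ms and 24 ms

for name, times in (("steady", steady), ("uneven", uneven)):
    print(f"{name}: {average_fps(times):.1f} FPS average, "
          f"std dev {pstdev(times):.1f} ms, worst frame {max(times):.0f} ms")
```

Both runs deliver eight frames in 128 ms, so both average 62.5 FPS. Only the spread of frame times (0 ms versus 8 ms) and the 24 ms worst case reveal that the second run would feel less smooth.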

It's not all doom and gloom, though. This is an exciting time to be involved in PC hardware, and graphics performance gives us a new frontier to explore. There are a lot of smart people working on this problem, and it's something that'll eventually be conquered. For our part, we've put our own research into the question of smoothness, which you've recently seen reflected in charts that include average frame rates, minimum frame rates, frame rates over time, and frame time variance. None of those address the challenge completely, but they help paint a more complete picture when it comes to choosing the right graphics card for your games.

Today, we're exploring another tool that's going to help us dig into the performance of graphics cards (particularly multi-GPU configurations): Nvidia's Frame Capture Analysis Tool, or FCAT.

2. Multi-Card Graphics Problems, And A Solution: Nvidia's FCAT

Even now, the industry measures frames per second using software-based tools like Fraps and in-game counters, which log each frame as the game engine hands it off to the graphics pipeline and report how many arrive every second. On the surface, that sounds like a great way to generate accurate data.
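
The bookkeeping behind a software frame counter can be sketched roughly as follows: timestamp each frame as the game submits it, then count how many timestamps land in each second. This is a simplified illustration with a hypothetical function name, not Fraps' actual code. Notice what it cannot see: whether each frame ever reached the display, or for how long.

```python
from collections import Counter

def fps_by_second(timestamps_ms):
    """Hypothetical helper: bucket per-frame submission timestamps
    (ms since the run began) into whole seconds and count each bucket."""
    counts = Counter(int(t // 1000) for t in timestamps_ms)
    last_second = int(max(timestamps_ms) // 1000)
    # Seconds with no frames at all still get an explicit zero.
    return [counts[s] for s in range(last_second + 1)]

# Hypothetical timestamps: three frames in second 0, none in second 1, two in second 2.
print(fps_by_second([0.0, 400.0, 999.9, 2000.0, 2500.0]))  # [3, 0, 2]
```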

The problem is that there can be a significant disconnect between the game engine's output and what you see on your monitor. And when you move beyond the issues that might affect a single-GPU setup and start considering multiple graphics processors working cooperatively, two additional variables surface that affect your experience: dropped frames and what Nvidia is introducing to us as runt frames.

Briefly, even after a frame is loaded into memory, there's still a lot of work left in the graphics pipeline, and that takes time. As a result, some frames are dropped before they ever show up on-screen. Other frames show up, but only on a very small part of the screen. Nvidia defines a runt as any frame that occupies fewer than 21 scan lines on a monitor.
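
Under Nvidia's definition, each rendered frame falls into one of three buckets. The sketch below assumes we already know how many scan lines each frame occupied on screen; the function name is hypothetical, and the threshold is a parameter because FCAT lets users change the runt size.

```python
RUNT_THRESHOLD = 21  # Nvidia's definition: fewer than 21 scan lines is a runt

def classify_frame(scan_lines, threshold=RUNT_THRESHOLD):
    """Hypothetical helper: label one rendered frame by how much of the
    screen it occupied."""
    if scan_lines == 0:
        return "dropped"  # rendered, but never made it onto the display
    if scan_lines < threshold:
        return "runt"     # visible, but only as a thin sliver
    return "full"

print([classify_frame(n) for n in (0, 12, 21, 540)])  # ['dropped', 'runt', 'full', 'full']
```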

Dropped and runt frames have no positive impact on what you see while you're playing a game. They still get counted by benchmark tools, though. This is a problem, particularly when a combination of hardware renders as quickly as possible without any regard for consistency. The more drops and runts generated, the less accurate utilities like Fraps become. The good news is that these two issues are only significant enough to be a problem in multi-GPU configurations. In single-card testing, Fraps and the in-game metrics we use are much more accurate.

So, how do we capture frame rate data from AMD CrossFire- and Nvidia SLI-based setups without runt and dropped frames interfering with the result?

We record the display output and analyze it, frame by frame. Sound intense? It is. In fact, it's potentially so time-consuming that Nvidia developed its own suite of tools to make it practical.

Some of what we're using is commercially available hardware or open-source software. For example, the system being benchmarked is connected to a Gefen dual-link DVI distribution amplifier, which in turn outputs to a monitor and to a Datapath Limited VisionDVI-DL capture card plugged into a second machine. For an accurate analysis, the recorded video has to be flawless, with no dropped frames. So, we use a trio of striped SSDs able to keep up with the uncompressed stream (as much as 650 MB/s at the card's highest supported resolution).
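
The storage requirement follows from simple arithmetic: uncompressed capture stores every pixel of every refresh. The sketch below assumes 24-bit color (3 bytes per pixel) and uses 1920x1080 at 60 Hz purely as an illustrative mode; it is not the capture card's highest supported resolution, which is what the 650 MB/s figure refers to. The function name is hypothetical.

```python
def capture_bytes_per_second(width, height, refresh_hz, bytes_per_pixel=3):
    """Hypothetical helper: data rate of an uncompressed capture, where every
    pixel of every refresh is written to disk (3 bytes/pixel assumes 24-bit RGB)."""
    return width * height * bytes_per_pixel * refresh_hz

# Illustrative mode only: 1920x1080 at 60 Hz with 24-bit color.
rate = capture_bytes_per_second(1920, 1080, 60)
print(f"{rate / 1e6:.1f} MB/s")  # 373.2 MB/s
```

At that rate, a single minute of footage comes to roughly 22.4 GB, and the figure scales linearly with resolution and refresh rate, which is why the capture storage must sustain sequential writes for the entire run.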

The system being tested runs an overlay that applies a colored bar to each frame coming from the graphics card, in a repeating sequence of 16 colors. This software is Nvidia's, but it's necessary in order to automate the analysis. Company representatives say that the folks at Beepa could conceivably add this overlay functionality to Fraps, taking Nvidia out of the equation.

Game play, with the overlay bars, is captured by VirtualDub onto the solid-state storage. From there, another Nvidia application reads in the video, analyzes its frames for the color bar sequence, and creates an information-rich Excel spreadsheet. Because it knows which color should come next, the utility can easily tell whether a frame was dropped (a color is missing) or whether a frame occupies fewer than 21 scan lines on the screen (a runt).
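
The detection logic can be sketched like this, assuming the captured video has already been reduced to a list of (color index, scan lines) pairs, one per distinct frame in on-screen order. The integer color indices stand in for the actual RGB values the tool reads, and the function name is hypothetical; this is not Nvidia's implementation. A skipped color means that frame was dropped; a short bar means a runt.

```python
PALETTE_SIZE = 16  # the overlay cycles through 16 colors

def count_frames(observed, runt_threshold=21):
    """Hypothetical helper: tally full, runt and dropped frames from a
    captured color-bar sequence.

    observed: (color_index, scan_lines) pairs in on-screen order, one pair
    per distinct frame. Assumes fewer than 16 consecutive frames are ever
    dropped; beyond that the sequence wraps and drops are undercounted.
    """
    tally = {"full": 0, "runt": 0, "dropped": 0}
    previous = None
    for color, lines in observed:
        if previous is not None:
            # Distance along the palette; 0 means the same color came round again.
            step = (color - previous) % PALETTE_SIZE or PALETTE_SIZE
            tally["dropped"] += step - 1  # every skipped color was never displayed
        tally["runt" if lines < runt_threshold else "full"] += 1
        previous = color
    return tally

# Hypothetical capture: color 2 never appears (dropped), color 1 is a 9-line runt.
print(count_frames([(0, 540), (1, 9), (3, 531), (4, 1080)]))
# {'full': 3, 'runt': 1, 'dropped': 1}
```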

The Excel file is itself very data-heavy, though. So, Nvidia developed a series of Perl scripts that parse the information, generate frame rate and frame time analysis, and chart the results.
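
The article doesn't detail how Nvidia's Perl scripts work, but one plausible way to turn scan-line counts into frame times is to treat each scan line as a fixed slice of the refresh interval. The Python sketch below (the function name is hypothetical) assumes a 60 Hz display with exactly 1,080 visible lines per refresh and ignores blanking intervals; a real tool would have to account for full display timings.

```python
def on_screen_times_ms(scan_lines, lines_per_refresh=1080, refresh_hz=60.0):
    """Hypothetical helper: convert scan-line counts into on-screen durations (ms).

    Simplification: treats each refresh as exactly lines_per_refresh visible
    lines and ignores blanking intervals.
    """
    ms_per_line = 1000.0 / refresh_hz / lines_per_refresh
    return [n * ms_per_line for n in scan_lines]

times = on_screen_times_ms([1080, 540, 12])
print([round(t, 2) for t in times])  # [16.67, 8.33, 0.19]
```

Under these assumptions, a 12-line runt is visible for less than a fifth of a millisecond, which illustrates why it contributes so little to what the player perceives.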

Nvidia calls this complete package FCAT, for Frame Capture Analysis Tool. Because FCAT's performance data excludes the dropped and runt frames that today's tools count, it should be more in line with what gamers actually see.

3. Test System And Benchmarks

We're using Nvidia's FCAT suite of tools to compare the performance of two Radeon HD 7870s in CrossFire to a pair of GeForce GTX 660 Tis in SLI across a wide spectrum of modern games. These closely priced (and popular) competitors are perfect for showcasing the differences between each company's multi-card rendering technologies.

Test System
CPU: Intel Core i5-3550 (Ivy Bridge), 3.3 GHz base, 3.7 GHz Turbo Boost
Motherboard: Gigabyte Z77X-UP7, LGA 1155, Intel Z77 Express chipset
Networking: On-board Gigabit LAN controller
Memory: Corsair Vengeance LP PC3-16000, 4 x 4 GB, 1600 MT/s, CL 8-8-8-24-2T
Graphics: 2 x GeForce GTX 660 Ti 2 GB GDDR5 in SLI; 2 x Radeon HD 7870 2 GB GDDR5 in CrossFire
Hard Drive: Western Digital Caviar Black 1 TB, 7,200 RPM, 32 MB cache, SATA 3Gb/s
Power: ePower EP-1200E10-T2 1,200 W, ATX12V, EPS12V

Software and Drivers
Operating System: Microsoft Windows 8
DirectX: DirectX 11.1
Graphics Drivers: AMD Catalyst 13.3 Beta 3; Nvidia GeForce 314.22 Beta

Benchmarks
Borderlands 2: v.1.0.28.697606, custom benchmark, 60-second Fraps run
Crysis 3: v.1.0.0.1, custom benchmark, 60-second Fraps run
F1 2012: v.1.3.3.0, included benchmark, 60-second Fraps run
Far Cry 3: v.1.0.0.1, custom benchmark, 50-second Fraps run
Tomb Raider: v.1.0.722.3, custom benchmark, 45-second Fraps run

4. Results: Batman Arkham City

With results to compare in nine games, we're proceeding alphabetically. Batman: Arkham City is first in line.

First, we'll look at the minimum and average frame rates. When you see "Hardware FPS" in the charts, you're looking at the number of frames per second the graphics card is rendering. "Practical FPS" is the more accurate representation of what you'd actually see, with dropped frames and runts taken out of the equation.
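
Following those definitions, both figures can be derived from the same capture: hardware FPS counts every rendered frame, while practical FPS keeps only frames that were displayed and met the runt threshold. This minimal sketch assumes per-frame scan-line counts are already known (0 meaning dropped); the function name is hypothetical, and FCAT's scripts may compute the values differently (for example, over per-second windows).

```python
def hardware_and_practical_fps(scan_lines, duration_s, runt_threshold=21):
    """Hypothetical helper.

    scan_lines: lines each rendered frame occupied on screen (0 = dropped).
    Hardware FPS counts every frame the GPU rendered; practical FPS keeps
    only frames tall enough to count, discarding drops and runts.
    """
    hardware = len(scan_lines) / duration_s
    practical = sum(1 for n in scan_lines if n >= runt_threshold) / duration_s
    return hardware, practical

# Hypothetical one-second pattern: every refresh shows one nearly full frame
# followed by an 8-line sliver of the next frame.
print(hardware_and_practical_fps([1072, 8] * 60, 1.0))  # (120.0, 60.0)
```

In this contrived pattern, the runts double the hardware figure without adding anything the player would perceive as extra smoothness.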

In addition, we have a fifth data point, captured with Fraps on the CrossFire-based system, for comparison.

The difference between the hardware and practical frame rates on our GeForce GTX 660 Ti-based SLI setup is only about one frame per second. In comparison, the gap is more like five frames per second on the Radeon HD 7870-based CrossFire configuration.

Fraps would have us believe that the hardware FPS number is right, even though the practical frame rate is lower.

The frame rate over time chart shows us that the Radeon configuration's hardware FPS (the thin red line) and the Fraps result (the thin red line with dots) sometimes spike above the practical FPS (the thick red line). When it comes to the GeForce cards, both lines remain close together.

For reference, we're also including our frame time variance chart, generated from FCAT's data. Both the GeForce and Radeon cards stay below 10 ms at the 95th percentile.
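
The article doesn't spell out the exact formula behind frame time variance. The sketch below is one plausible interpretation, not necessarily the method behind our charts: it treats variance as the absolute change in frame time from one frame to the next and takes a nearest-rank 95th percentile. Both function names are hypothetical.

```python
import math

def frame_time_deltas(frame_times_ms):
    """Assumed definition: absolute change in frame time from one frame to the next."""
    return [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]

def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct percent of values are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical frame times (ms) with a single 33 ms hitch.
times = [16, 17, 16, 16, 33, 16, 17, 16, 16, 17, 16]
print(nearest_rank_percentile(frame_time_deltas(times), 95))  # 17
```

In a run this short, a single hitch sets the 95th-percentile value. Over a real 60-second capture with thousands of frames, an isolated spike carries far less weight, and a consistently high percentile points to repeated unevenness.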

5. Results: Borderlands 2

In Borderlands 2, we see a 0.1 FPS difference between the GeForce cards' rendered and practical frame rates, while 2.4 FPS separate those two figures on the Radeon cards. Fraps is even more optimistic than either FCAT-based measurement.

As we look at frame rates over time, it's obvious that these configurations perform similarly. However, we see some frame rate spikes on AMD's hardware that aren't reflected in the practical result once drops and runts are taken into account.

Frame time variance at these high frame rates is minimal.

6. Results: Crysis 3

The GeForce cards demonstrate no difference between our hardware and practical results, while the Radeon combo exhibits a 1.3 FPS split. In this case, Fraps actually tracks more closely with the practical result. Just bear in mind that Fraps needs to be run separately, since its overlay doesn't appear to cooperate with the one included in FCAT.

Although there is little difference between the actual rendered frames from AMD's cards and the practical result, we can clearly see the dropped and runt frames when we look at frame rate over time.

Frame time variance is relatively high for both graphics setups on this game. The GeForce cards seemed choppier, based on my experience, but that's a subjective call.

7. Results: F1 2012

Let's see how these dual-card setups take on F1 2012, a game we consider to be platform-limited.

Interestingly, despite this title's reliance on processor and memory performance, the Radeon cards show our largest discrepancy yet between actually rendered frames (including drops and runts) and the practical output you'd experience: a 13.1 FPS delta. It's also interesting that Fraps reports a result closer to AMD's practical output; in theory, you'd expect the opposite.

Meanwhile, the GeForce boards show no such gap, making this the first time we've seen quantifiable evidence of Nvidia's effort to deliver consistent frames rather than push frames out as fast as they can be rendered.

Illustrating frame rates over time gives us a dramatic visualization of runt and dropped frames causing spikes during the benchmark run.

Regardless of those issues, frame time variance appears fairly modest, even at the 95th percentile.

8. Results: Far Cry 3

Performance-wise, these settings favor Nvidia's cards, which yield identical actual and practical frame rates. The Radeon boards show a 3.4 FPS gap between those same two measurements. Predictably, Fraps reports a result close to what the Radeon cards are actually rendering.

As before, we see spikes in what AMD's cards are actually rendering, which includes the dropped and runt frames.

The spikiness of AMD's Radeon cards also shows up in frame time variance. At the 95th percentile, we can see that frames aren't being delivered as consistently. I did notice that the game felt a little laggier, too.

9. Results: Hitman Absolution

Like Far Cry 3, this title is included in AMD's Never Settle bundle. We were surprised to see it take up 25 GB of storage space.

So, check this out: two Radeon HD 7870s in CrossFire achieve the highest "actual" frame rate, while the corrected "practical" result comes in at the bottom. On average, the difference is just under 5 FPS. However, the minimum drops by 9 FPS.

By now we're familiar with this chart format, which shows the hardware frame rate spiking occasionally.

This is another game in which the Radeon cards suffer from relatively high frame time variance. We didn't notice this negatively affecting our experience in Hitman (and we've repeatedly told both AMD and Nvidia that frame rate consistency seems to matter more in certain titles than in others). However, it does help explain the data we're reporting.

10. Results: Metro 2033

This title is both very mature and graphically demanding (something that seems to favor more consistent results). The actual and practical frame rates are stable on both the Radeon and GeForce cards.

A glance at frame rate over time reveals a couple of spikes, but nothing so severe that it'd drastically change our averages.

Low frame time variance generated from the FCAT data suggests that game play in Metro 2033 is fairly smooth.

11. Results: Elder Scrolls V: Skyrim

Nvidia's cards achieve identical actual and practical frame rates, while the Radeon boards show 3.1 FPS between what the cards actually render and what you might expect to experience once drops and runts are factored out.

12. Results: Tomb Raider

In perhaps the most dramatic finish of our nine-game suite, the Radeon cards encounter their largest drop in experiential performance, sacrificing 16.5 FPS on average.

Comparatively, the GeForce GTX 660 Tis turn back the same result for actual and practical frame rate.

Frame rate over time shows us just how many dropped and runt frames must be discarded for us to reach our practical frame rate.

Frame time variance is low from the GeForce cards, whereas the Radeon boards encounter notably more variance. Additionally, we noticed a bit of this during game play.

13. When Frame Rates Aren't What They Seem...

We've known about Nvidia's frame metering technology for years, and about its efforts to quantify the benefits of spacing frames out evenly and minimizing dropped and runt frames for months. Only recently, however, was the company willing or able to show off the fruits of that development work. Even today, the tools can be a little finicky. We would have had even more performance data, except that our X79 Express-based benchmark platform was spitting out FCAT data that clearly wasn't right. Switching over to Z77 Express at the last minute gave us the results we were looking for.

As we expected, based on both The Tech Report's work and Nvidia's in-house examples, AMD's Radeon HD 7870s in CrossFire tend to suffer more dropped and runt frames than a pair of GeForce GTX 660 Tis in SLI. This validates much of the trepidation about multi-card configurations expressed in Best Graphics Cards for the Money, and confirms that two of Nvidia's boards are the more appealing option. AMD even admits it's playing catch-up in this area, and is addressing its shortcomings through successive driver updates.

The Datapath card we used for video capture

In any case, it's telling that Nvidia put all of this effort into quantifying graphics performance in a scientifically sound way. The company deserves credit for putting resources into something we wouldn't have been able to develop on our own, even if its motivation wasn't altruistic. The fact is that FCAT gives us a tool for evaluating something we couldn't accurately measure before. We now have two Tom's Hardware labs equipped with the hardware and software to run tests using FCAT, and we're already in the process of testing for a much more comprehensive Part 2.

Having said all of that, we're once again left with questions, even as we uncover a number of answers. While it seems obvious that a runt frame of fewer than 21 scan lines contributes little (or nothing) to the smoothness of a game, would a hardcore gamer see a quality improvement if the screen were split evenly among two, three, or even 50 frames, each 22 or more scan lines tall? FCAT is built to let users specify how large a runt frame can be, and we'll need to experiment with the script's switches to dial in our own recipe for performance evaluation. What we're presenting today is really what FCAT can do out of its proverbial box.

But questions like that tell us we have a lot more work to do. Hopefully, innovative tools like Nvidia's FCAT help us solve them. Keep an eye on this space for our upcoming follow-up.