Update: After this story went live in mid-April, AMD let us know that a number of our results did not match what the company was seeing in its own lab. Today’s update is the product of hundreds of benchmark runs designed to diagnose the behavior of two Radeon R9 295X2 cards in quad-CrossFire.
According to AMD, the only officially supported way to connect a 295X2 to a 4K display is through DisplayPort. The dual-HDMI method we’re forced to use (described in detail in Challenging FPS: Testing SLI And CrossFire Using Video Capture) is not considered valid. So, the first step was to run all of our tests using a DisplayPort cable and Fraps, comparing the outcome to FCAT-generated data.
Using that information, we identified a number of games that behaved the same regardless of test method, and a couple that suggested something else was going on. So we dug in.
All of the charts on the following pages were re-rendered, and much of what you’ll see confirms our original conclusions. But because we’ve invested so much more time in benchmarking and troubleshooting, all of the analysis is new.
We also gathered additional information about AMD's XDMA engine. The Catalyst driver dynamically calculates available PCIe bandwidth in real time. If there isn't enough, either because your platform can't keep up or because the resolution you're driving is too throughput-intensive, compositing happens in software instead. Ideally, then, you want x16 slots running at x16 transfer rates, without PLX switches in front of them adding latency. AMD says its dual-GPU boards will still work on links narrower than 16 lanes and behind motherboard switches, though stuttering and frame timing issues become more likely, as you'd expect.
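AMD hasn’t published how Catalyst makes this call, so the following is only a hypothetical Python sketch of the general idea: estimate a link’s theoretical bandwidth from its PCIe generation and lane count, then fall back to software compositing when the required transfer rate exceeds some fraction of it. The per-lane figures follow from the PCIe 2.0 and 3.0 signaling rates and encodings; the `headroom` factor and the function names are assumptions for illustration, not AMD’s logic.

```python
# Hypothetical sketch only: AMD hasn't published how Catalyst decides
# between hardware (XDMA) and software compositing.

# Effective per-lane throughput in MB/s after line encoding
# (rates given in megatransfers per second, one bit per transfer per lane):
#   PCIe 2.0: 5000 MT/s with 8b/10b encoding    -> 500 MB/s
#   PCIe 3.0: 8000 MT/s with 128b/130b encoding -> ~984.6 MB/s
PER_LANE_MB_S = {
    2: 5000 * 8 / 10 / 8,
    3: 8000 * 128 / 130 / 8,
}


def link_bandwidth_mb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link, in MB/s."""
    return PER_LANE_MB_S[gen] * lanes


def choose_compositing(required_mb_s: float, gen: int, lanes: int,
                       headroom: float = 0.5) -> str:
    """Use hardware compositing only if the required transfer rate fits in
    an assumed fraction ('headroom') of the link; otherwise use software."""
    budget = link_bandwidth_mb_s(gen, lanes) * headroom
    return "hardware" if required_mb_s <= budget else "software"


print(round(link_bandwidth_mb_s(3, 16)))  # 15754
print(choose_compositing(2000, 3, 16))    # hardware
print(choose_compositing(2000, 2, 4))     # software
```

Halving the lane count halves the theoretical budget, which is one reason a narrower link leaves less margin before any such fallback kicks in.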
Does our revisit vindicate a pair of $1500 graphics cards working in parallel, or are you better off spending your budget elsewhere?
Twenty-four point eight billion transistors. Eleven thousand two hundred sixty-four shaders. Sixteen gigabytes of GDDR5 memory. Twenty-three teraflops of compute power. One thousand watts of rated board power.
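The headline numbers are simple multiples of per-GPU specifications. As a quick Python check, assuming the R9 295X2’s published figures of 6.2 billion transistors, 2816 shaders, 4 GB of GDDR5 and a 1018 MHz peak clock per GPU, plus 500 W of rated board power per card:

```python
# Published per-GPU figures for the Hawaii chips on a Radeon R9 295X2
# (specifications, not measurements); two cards carry four GPUs in total.
GPUS_PER_CARD = 2
CARDS = 2
TRANSISTORS_PER_GPU = 6.2e9
SHADERS_PER_GPU = 2816
MEMORY_GB_PER_GPU = 4
CLOCK_GHZ = 1.018    # peak engine clock
BOARD_POWER_W = 500  # rated power per card

gpus = GPUS_PER_CARD * CARDS
transistors = TRANSISTORS_PER_GPU * gpus
shaders = SHADERS_PER_GPU * gpus
memory_gb = MEMORY_GB_PER_GPU * gpus
# Each shader can retire one fused multiply-add (2 FLOPs) per clock.
tflops = shaders * 2 * CLOCK_GHZ / 1000
power_w = BOARD_POWER_W * CARDS

print(f"{transistors / 1e9:.1f} billion transistors")  # 24.8 billion transistors
print(f"{shaders} shaders")                            # 11264 shaders
print(f"{memory_gb} GB of GDDR5")                      # 16 GB of GDDR5
print(f"{tflops:.1f} TFLOPS")                          # 22.9 TFLOPS
print(f"{power_w} W rated board power")                # 1000 W rated board power
```

Counting a fused multiply-add as two floating-point operations per shader per clock gives about 22.9 TFLOPS, which rounds to the 23 quoted above.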
No matter which specification you use to describe two Radeon R9 295X2 cards in CrossFire, the number is unlike any other describing a gaming PC’s capabilities. And yet, late last Friday, a box from iBuyPower showed up at the lab with a tall, majestic Erebus PC sporting a pair of the dual-GPU, liquid-cooled flagships I reviewed in Radeon R9 295X2 8 GB Review: Project Hydra Gets Liquid Cooling. iBuyPower’s builders dutifully matched my test bed’s specifications, including a Core i7-4960X overclocked to 4.2 GHz, 32 GB of memory in a quad-channel configuration, and an even beefier 1350 W power supply. Despite the box’s brawn (and the haste with which I asked to have it built), every cable was neatly tied down, leaving a clean, roomy interior for the lights on AMD’s latest cards to illuminate behind the side window.
Two R9 295X2s in iBuyPower's Erebus gaming PC
Also nifty, though not something we typically think much about as enthusiasts: iBuyPower shipped this exceedingly heavy system with two liquid-cooled cards installed, and it arrived working perfectly. Polyurethane foam works wonders.
And so, with years of experience under our belts suggesting that four GPUs working cooperatively typically don’t behave well at all, regardless of vendor, we set out to gauge whether history repeats itself, or whether AMD’s best-built board breaks precedent in more ways than one.

Testing Two Radeon R9 295X2s
We have reason to believe that two Radeon R9 295X2s might behave differently from the quad-GPU solutions we've tested previously. To begin, there's AMD's new XDMA engine, which enables CrossFire without a bridge connector. We're also testing at 3840x2160 on an overclocked Ivy Bridge-E-based platform, which should, hopefully, minimize the poor scaling you'd expect from a platform-bound configuration. Finally, AMD sent over a new beta driver late last week that was supposed to be optimized for four-way CrossFire.
| Test Hardware | |
|---|---|
| Processors | Intel Core i7-4960X (Ivy Bridge-E) 3.5 GHz Base Clock Rate, Overclocked to 4.2 GHz, LGA 2011, 15 MB Shared L3, Hyper-Threading enabled, Power-savings enabled |
| Motherboard | MSI X79A-GD45 Plus (LGA 2011) X79 Express Chipset, BIOS 17.8 |
| | ASRock X79 Extreme11 (LGA 2011) X79 Express Chipset |
| Memory | G.Skill 32 GB (8 x 4 GB) DDR3-2133, F3-17000CL9Q-16GBXM x2 @ 9-11-10-28 and 1.65 V |
| Hard Drive | Samsung 840 Pro SSD 256 GB SATA 6Gb/s |
| Graphics | 2 x AMD Radeon R9 295X2 8 GB |
| | 2 x AMD Radeon R9 290X 4 GB (CrossFire) |
| | AMD Radeon HD 7990 6 GB |
| | 2 x Nvidia GeForce GTX Titan 6 GB (SLI) |
| | 2 x Nvidia GeForce GTX 780 Ti 3 GB (SLI) |
| | Nvidia GeForce GTX 690 4 GB |
| Power Supply | Rosewill Lightning 1300 1300 W, Single +12 V rail, 108 A output |
| System Software And Drivers | |
| Operating System | Windows 8.1 Professional 64-bit |
| DirectX | DirectX 11 |
| Graphics Driver | AMD Catalyst 14.4 Beta |
| | Nvidia GeForce 337.50 Beta |
We make sure that Frame Pacing is enabled, that Tessellation Mode is controlled by the application, and that v-sync is forced off in AMD's driver.

| Benchmarks And Settings | |
|---|---|
| Battlefield 4 | 3840x2160: Ultra Quality Preset, v-sync off, 100-second Tashgar playback, FCAT and Fraps |
| Arma 3 | 3840x2160: Ultra Quality Preset, 8x FSAA, Anisotropic Filtering: Ultra, v-sync off, Infantry Showcase, 30-second playback, FCAT and Fraps |
| Metro: Last Light | 3840x2160: Very High Quality Preset, 16x Anisotropic Filtering, Normal Motion Blur, v-sync off, Built-In Benchmark, FCAT and Fraps |
| Assassin's Creed IV | 3840x2160: Maximum Quality options, 4x MSAA, 40-second Custom Run-Through, FCAT and Fraps |
| Grid 2 | 3840x2160: Ultra Quality Preset, 120-second recording of built-in benchmark, FCAT and Fraps |
| Thief | 3840x2160: Very High Quality Preset, 70-second recording of built-in benchmark, FCAT and Fraps |
| Tomb Raider | 3840x2160: Ultimate Quality Preset, FXAA, 16x Anisotropic Filtering, TressFX Hair, 45-second Custom Run-Through, FCAT and Fraps |

FCAT says Arma 3 averages 73 FPS on a pair of Radeon R9 295X2s; Fraps says 72 FPS. I’d say that’s pretty close to a consensus.
This chart reflects performance a fair bit higher than what we originally reported. Previously, scaling was around 20%. Here, it’s 46%. We can’t say for sure why the new number is so much higher. However, all of the benchmarks in today’s piece were run with our Erebus test platform’s side panel off, allowing maximum airflow.
Why does that matter? As it turns out, the Hawaii GPUs on AMD’s Radeon R9 295X2 have a 75 °C limit necessitated by the liquid cooling hardware, and it’s not hard to trigger throttling using a workload like GUIMiner. It’s likely that the closed case and heat runs we used in the original version of this piece dragged down performance in Arma as well.
We’ll keep an eye out for similar behavior as the evaluation continues. Although you wouldn’t run a pair of Radeon R9 295X2s with the side panel off in the real world, AMD is also clear that its radiators need to be set up in a specific way for optimal thermal performance, and company reps say that wasn’t the case in the system we initially reviewed.

The hypothesis is bolstered by a far less frenetic frame rate over time chart than the one in our original story. Two Radeon R9 295X2s now sustain their performance for longer, nosing up over 75 FPS and never dropping below 65.
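A frame rate over time chart like this one can be built by bucketing per-frame times into one-second intervals. This is a minimal Python sketch under that assumption; the article doesn’t describe its exact charting method, and the frame times below are invented to mirror the 65 to 75 FPS range described above.

```python
from collections import Counter
from typing import Dict, List


def fps_over_time(frame_times_ms: List[float]) -> Dict[int, int]:
    """Count how many frames finish within each whole second of a run."""
    counts = Counter()
    elapsed_ms = 0.0
    for t in frame_times_ms:
        elapsed_ms += t
        counts[int(elapsed_ms // 1000)] += 1
    return dict(sorted(counts.items()))


# Invented run: about 75 FPS for one second, then about 65 FPS.
times = [13.3] * 75 + [15.4] * 65
print(fps_over_time(times))  # {0: 75, 1: 65}
```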

Frames are delivered more smoothly as well, and in worst-case frame time variance, a pair of 295X2s trails only a single dual-Hawaii board.

The twin 295X2s are plagued by a handful of frame time variance spikes that are noticeable on-screen, but behave themselves otherwise.
Arma 3 isn’t one of the games we originally took issue with, so it makes sense that the results on this page don’t sway our ultimate opinion much. We’re glad to measure additional performance in a best-case cooling and airflow scenario. And if this game is the only one you’re worried about playing, a second $1500 card might be worth an extra 40-something percent performance.
Let’s leave the editorialization there for now and continue on with the hard numbers.

In Assassin’s Creed IV, FCAT tells us that two Radeon R9 295X2s average 42 FPS, and Fraps reports 42 FPS as well. Both measurement methods agree, and they confirm what we saw in my original quad-CrossFire evaluation.
While AMD does come away with the first-place finish, it’d be hard to justify a second $1500 graphics card for an extra 16% performance over a single $1500 board.

Zoom out from the similar averages and you’ll discover frame rate over time graphs that look a lot alike, too. A pair of Radeon R9 295X2s is indisputably fastest. However, it also jumps up and down the performance chart in a more exaggerated manner.

This, in part, translates to the least attractive frame time variance results: individual frames are not paced well, resulting in longer pauses that you perceive as stuttering as you play through the game.

Once the frame time variance is charted out, you clearly see the spikes in the difference in time between frames. For each of them, there’s a blip in the overall experience. The same held true in our original evaluation, but because I left the GeForce GTX 690 on the graph, you missed out on the overall impact. Nvidia’s dual-GK104-based card is wholly unsuited for 4K, so it and the Radeon HD 7990 were pulled to make the data more readable.
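The article doesn’t spell out its frame time variance formula. As a simplifying assumption, the Python sketch below treats variance as the absolute change in frame time from one frame to the next, then flags spikes above a hypothetical 10 ms threshold. The frame times are invented for illustration.

```python
from typing import List


def frame_time_variance(frame_times_ms: List[float]) -> List[float]:
    """Absolute change in frame time between consecutive frames, in ms.

    Simplifying assumption: this stands in for the article's frame time
    variance metric, whose exact formula isn't given.
    """
    return [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]


def find_spikes(variance_ms: List[float], threshold_ms: float = 10.0) -> List[int]:
    """Indices where the frame-to-frame change exceeds a hypothetical threshold."""
    return [i for i, v in enumerate(variance_ms) if v > threshold_ms]


# Invented frame times: steady ~24 ms frames with one 48 ms hitch.
times = [24.0, 23.5, 24.5, 24.0, 48.0, 24.0, 24.5]
variance = frame_time_variance(times)
print(variance)               # [0.5, 1.0, 0.5, 24.0, 24.0, 0.5]
print(find_spikes(variance))  # [3, 4]
```

A single long frame produces two spikes under this definition, one entering the slow frame and one leaving it, which is why isolated hitches stand out so clearly on a line chart.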

Our results in Battlefield 4 change dramatically compared to the original story, which showed barely any scaling at all with a second Radeon R9 295X2. As you can imagine, that first round of numbers seemed implausible, and we retested multiple times across several drive images. Consistently, we ended up with a frame rate over time chart that largely tracked a single 295X2, but spiked and dipped much more severely.
After that piece went live, AMD shared its own results with us, prompting me to revisit this game (and indeed all of the others as well). At one point, I noticed that, after installing the Catalyst beta package on the two-card system, CrossFire was reported as enabled, yet scaling was off in every test run. Then, after I clicked the “Something requires your attention” pop-up, the technology suddenly showed up as disabled. Between this and another apparent issue, where turning CrossFire on or off caused the screen to go dark and necessitated a soft reboot, there appear to be a couple of minor software bugs.
At any rate, toggling CrossFire off and back on seemed to help, yielding more impressive scaling figures. FCAT reports an average of 84 FPS, and Fraps says 84 FPS as well. Both tools agree on a 75% boost from the second card.
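The scaling percentages quoted throughout this piece are simple ratios of average frame rates. A small Python helper, checked against averages reported in this article:

```python
def scaling_percent(multi_fps: float, single_fps: float) -> float:
    """Percent performance gained (or lost, if negative) by adding a card."""
    return (multi_fps / single_fps - 1.0) * 100.0


# Average frame rates from this article: two R9 295X2s vs. one.
print(round(scaling_percent(84, 48)))   # Battlefield 4: 75
print(round(scaling_percent(152, 98)))  # Grid 2: 55
print(round(scaling_percent(57, 60)))   # Tomb Raider sequence: -5
```

A negative result, as in the Tomb Raider sequence, means adding the second card made the game slower.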

Charting frame rate over time exposes more dramatic swings in instantaneous performance, with dips to 50 FPS and peaks up to 100 FPS. Even at their slowest, however, two Radeon R9 295X2 cards are faster than the fastest competition.
It’d be easy to call that the perfect example of why you’d spend $3000 on four Hawaii GPUs. But it’s not. There’s an experiential element that doesn’t show up in the average frame rate or frame rate over time charts, and that’s stutter. The stutter is so much more apparent with four GPUs than two, and I’d rather have the smoother game at lower frame rates than whatever two Radeon R9 295X2s give you. Battlefield 4 suffers from this more severely than any other title in our suite.
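Why averages hide stutter: two runs can take exactly the same time to render exactly the same number of frames while pacing them very differently. This hypothetical Python example compares evenly spaced 20 ms frames with an alternating 10 ms/30 ms pattern, a simplified picture of the uneven pacing multi-GPU setups can produce. It uses mean frame-to-frame change as the pacing measure, which is an assumption for illustration rather than the article’s metric.

```python
from statistics import mean
from typing import List


def average_fps(frame_times_ms: List[float]) -> float:
    """Frames rendered divided by elapsed seconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def mean_frame_to_frame_change(frame_times_ms: List[float]) -> float:
    """Average absolute change in frame time between neighboring frames (ms)."""
    return mean(abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:]))


steady = [20.0] * 100        # evenly paced frames
uneven = [10.0, 30.0] * 50   # same total time, poorly paced (hypothetical)

print(average_fps(steady), average_fps(uneven))  # 50.0 50.0
print(mean_frame_to_frame_change(steady))        # 0.0
print(mean_frame_to_frame_change(uneven))        # 20.0
```

Both runs score an identical 50 FPS average, yet only the second one would feel choppy.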

Nvidia’s GeForce GTX 780 Ti actually shows up in last place due to its 3 GB of memory per GPU, which isn’t enough for a smooth experience in this game. I’ve already done everything I can to dissuade you from 3 GB cards for 3840x2160, and that story remains intact.
More surprising to me is that we don’t get the sense of choppiness by looking at frame time variance. Typically a 95th percentile result in the 6 ms range isn’t bad.
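A 95th percentile figure reports the value that 95% of samples fall at or below, so a handful of large spikes can sit above it without moving it. That helps explain how a game can feel choppy while posting an unremarkable 95th percentile result. The article doesn’t say which percentile method its tools use; this Python sketch assumes the nearest-rank method and uses invented variance samples.

```python
import math
from typing import List


def percentile_nearest_rank(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented frame time variance samples (ms) for a 20-frame stretch.
variance_ms = [0.5, 1.2, 0.8, 2.0, 6.1, 0.9, 1.1, 3.4, 0.7, 14.0,
               1.0, 0.6, 2.2, 1.5, 0.4, 5.8, 1.3, 0.9, 2.7, 1.8]
print(percentile_nearest_rank(variance_ms, 95))  # 6.1
```

Here the single 14 ms spike, the sort of hitch a player notices, lands entirely above the 95th percentile.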

Putting frame time variance on a line chart shows where there’s a ton of difference between frames rendered by two GeForce GTX 780 Tis. You can see where the two Radeon R9 295X2s peek out from behind, though. And again, regardless of what the charts say, the stutter is impossible to ignore and all the more bothersome from three-grand worth of graphics cards.

Aside from Battlefield 4, Grid 2 is the other game that previously benchmarked poorly, demonstrating negative scaling. The original prognosis was that this typically-platform-bound title was maxed out by a pair of Hawaii GPUs, and the overhead of two more hurt performance. Again, a spiky frame rate over time graph seemed to corroborate.
The same troubleshooting that knocked the Battlefield 4 numbers into line works here as well. FCAT shows the built-in benchmark averaging 152 FPS, and Fraps says 156. FCAT’s output reveals dropped frames, which Fraps still counts, so the small gap checks out. Either way, we end up with 55% scaling, which isn’t bad for a title often held back by processor and system memory performance.
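FCAT counts what actually reaches the display, while Fraps timestamps frames as the game presents them, so frames that are dropped (never shown) or appear only as thin slivers ("runts") still count toward Fraps’ average. The Python sketch below approximates that distinction under stated assumptions: a 60 Hz display with 2160 visible scanlines, vertical blanking ignored, each frame covering scanlines in proportion to its frame time, and a hypothetical 20-scanline runt threshold. It is not FCAT’s actual analysis code, and the frame times are invented.

```python
from typing import List, Tuple

REFRESH_HZ = 60       # assumed display refresh rate
VISIBLE_LINES = 2160  # assumed scanlines per refresh; vertical blank ignored
RUNT_LINES = 20       # hypothetical runt threshold, in scanlines

# Time the display takes to scan out one line, in milliseconds.
LINE_TIME_MS = 1000.0 / REFRESH_HZ / VISIBLE_LINES


def classify(frame_times_ms: List[float]) -> Tuple[int, int, int]:
    """Return (full, runt, dropped) counts, assuming each frame covers
    scanlines in proportion to how long it stays on screen."""
    full = runt = dropped = 0
    for t in frame_times_ms:
        lines = int(t / LINE_TIME_MS)
        if lines == 0:
            dropped += 1  # replaced before a single line was scanned out
        elif lines < RUNT_LINES:
            runt += 1     # visible only as a thin sliver
        else:
            full += 1
    return full, runt, dropped


def average_fps(frame_count: int, frame_times_ms: List[float]) -> float:
    """Frames counted per second of elapsed time."""
    return 1000.0 * frame_count / sum(frame_times_ms)


# Invented run: steady 20 ms frames, then a burst of extremely short ones.
times = [20.0] * 50 + [0.05] * 100 + [0.005] * 20
full, runt, dropped = classify(times)
print(full, runt, dropped)                    # 50 100 20
print(round(average_fps(len(times), times)))  # 169 (every presented frame)
print(round(average_fps(full, times)))        # 50 (full frames only)
```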

Even though frame rates peak above 200 FPS and dip under 120, Grid 2 runs smoothly at 3840x2160. This isn’t one of the games we’d worry about with regard to stuttering. Unfortunately, the second Radeon R9 295X2 also isn’t needed for an enjoyable experience, even with the Ultra preset applied. A pair of Hawaii GPUs is already capable of 98 FPS on average, after all.

The performance of all five configurations is so high that even a last-place showing in the frame time variance chart is perfectly acceptable for two Radeon R9 295X2s. At worst, you’re looking at a 95th percentile figure under 2 ms.

This is what that looks like on a line over time. There are some clear examples where the time between two frames spikes, but every combination of cards experiences that on occasion. The higher average variance comes from the underlying trend, which you can see as the red line consistently peeking up over the other colors along the bottom.

By this point, I was hoping that one little driver toggle fix would solve the performance issues throughout our benchmark suite. Unfortunately, the taxing Metro: Last Light benchmark shows that there are still issues to be worked out. FCAT indicates 49 average frames per second, Fraps confirms, and our original quad-CrossFire piece also reported 49 FPS.

Frame rate over time makes it pretty clear that there’s some sort of bottleneck in the Metro benchmark. Thinking this might be limited to the canned test, we played through pieces of the actual game and found that performance didn’t improve there, either.
Given how closely four different configurations are grouped, the limit might not be an AMD-only problem. With that said, AMD didn’t have any feedback on what might be happening when we asked.

The frame time variance figures show only that two Radeon R9 295X2s land at the back of the pile. A 5 ms worst-case figure shouldn’t be construed as bad, though. Moreover, our experiences in-game and with the built-in benchmark didn’t turn up issues with stuttering.

Quad-CrossFire does demonstrate the large variance spikes, though again, the experience isn’t made perceptibly worse by them.

For the first time, we observe a significant difference between FCAT- and Fraps-reported results, using Thief’s built-in benchmark. FCAT tells us there’s a 62 FPS average, while Fraps spits back 77 FPS. But Fraps also tries to convince us that the game dips as low as 6 FPS and shoots as high as 1174 FPS, which surely throws the average out of whack. The FCAT number is far more believable, dropping to 16 FPS and peaking under 90 FPS. That works out to scaling in the 38% range, which is not great.

What’s behind the big difference between FCAT and Fraps in this title? Using four GPUs, the Thief benchmark behaves strangely: it starts, chops through a few seconds, and then renders faster than real time until it catches up with where the action is supposed to be. AMD suggests that this could be caused by the app compiling thousands of shaders up front. If you play through the game for several minutes, the frame rate does even out a bit.
The big drop in performance happens at the end of the test for no clear reason.

Two Radeon R9 295X2s in CrossFire again yield the highest worst-case frame time variance, though I normally don’t consider the 6 ms range problematic. We do know, however, that results in the 5 ms range can be distinguished in blind testing, depending on the title.

Big frame time variance spikes point to our biggest problem with the Thief benchmark: severe stuttering. As with Battlefield 4, the experience in Thief simply isn’t acceptable. We confirmed the issue by going into the actual game, where we encountered the same stuttering.

Tomb Raider received the bulk of our attention because of its odd behavior. But it too typifies the issues AMD is facing.
Let’s start with the chart above. It reflects the custom run-through we benchmarked. FCAT says we’re seeing an average of 57 FPS, Fraps confirms that 57 FPS sounds about right, and our previous FCAT-generated chart said 51 FPS. In all cases, that’s negative scaling compared to one Radeon R9 295X2 at 60 FPS.
Now, bear in mind that this benchmark is a sequence taken straight from the game. It was chosen by our very own Paul Henningsen for its load and repeatability compared to sequences elsewhere. If you instead run Tomb Raider’s built-in benchmark, you’ll see around 56 FPS from one Radeon R9 295X2 and around 100 FPS from two. That’s the result AMD is expecting, and we replicated it on our side.
So we have a built-in benchmark that is helped along by four GPUs and a real sequence from Tomb Raider that scales negatively. Almost certainly, something is bottlenecking AMD’s cards: folks at iBuyPower, running numbers concurrently using our test sequence, showed that two, three, and four GeForce GTX Titans still scale performance.
But why the negative scaling? As it turns out, with a single Radeon R9 295X2 installed, there’s an aspect ratio bug that renders the scene offset to one side. That means less of Lara is rendered on a regular basis, lightening the load. Switching to quad-CrossFire fixes the aspect ratio, creating a more demanding workload, and thus the frame rate drops compared to a single card.

The red line speaks for itself; one Radeon R9 295X2 outperforms two, but only because the sequence it’s rendering is also incorrect. There are bugs that need to be fixed.

As two Radeon R9 295X2s struggle with whatever’s going on in Tomb Raider, frame time variance is all over the place in a bad way. As you might have guessed, stuttering is a prominent issue in this game as well, and it’s so much worse with four GPUs than two.

And there’s what it looks like over time. Ouch.
Update, May 6, 2014: AMD maintains that the only supported way to hook its Radeon R9 295X2 up to a 4K monitor is through DisplayPort, and that no enthusiast will use the dual-HDMI method required for us to benchmark with our FCAT toolset. With that last point, I agree completely. DisplayPort is the only practical way to go in the real world.
But that also means leaning on Fraps exclusively for performance evaluation. We’ve already opened Pandora’s Box on that, and we know why Fraps isn’t the best solution for testing multi-GPU configurations. There’s no going back, and I just wouldn’t feel comfortable presenting Fraps data and asking you to make an exception and trust it. So, in addition to re-running all of the FCAT-generated numbers with lots of extra airflow and in two different platforms (I swapped the cards out of iBuyPower's box and into my own as a sanity check), we also ran Fraps numbers hooked up through DisplayPort. All of the results for quad-CrossFire are mentioned on the benchmark pages.
In two cases, a comparison between the results suggested an issue that went undetected in the first piece. Battlefield 4 and Grid 2 should have been much faster with four Hawaii GPUs compared to two, but weren’t. I have some guesses as to what happened, and I’ve presented them, along with all of my testing notes, to AMD’s driver team. We now report closer-to-expected frame rates in those titles.
Assassin’s Creed, Metro, Thief, and Tomb Raider come fairly close to what we saw a couple of weeks ago, and today’s re-worked charts make the same point over again. Arma 3 speeds up today versus our first evaluation of quad-CrossFire, but again, all of our testing is best-case, with maximum airflow through an open chassis.
Quantitatively, we end up with a couple of games in which AMD serves up impressive scaling given the historically difficult move from two to four GPUs. A few others demonstrate more modest scaling. And the last couple don’t scale at all, really. As a matter of principle, just based on the numbers, I stand by our original take on quad-CrossFire using two Radeon R9 295X2s: they’re a work in progress for gaming at 3840x2160. Idiosyncrasies abound, though AMD tells us that the experience from an integrator like Maingear would be significantly different.
But because some of the performance metrics suggest more attractive performance in games that we weren’t seeing before, it’s also important to consider the qualitative experience of gaming on four GPUs. In Battlefield 4, for example, the idea of 84 FPS from two 295X2s is sexy compared to one board’s 48 FPS. The stuttering you run into, however, makes the benchmark result irrelevant—you’d unplug the second card before ever trying to play that way. Assassin’s Creed IV is similarly intolerable, though AMD says that one's problematic because of Nvidia and its GameWorks library. Arma and Tomb Raider don’t escape this “Quality Index” unscathed, either. Only Grid 2 and Metro looked smooth enough to enjoy, and in the latter title, quad-CrossFire doesn’t affect performance at all.
What on earth could be to blame? First, consider the complexity of what AMD is trying to do. You have two dual-GPU boards communicating over the PCI Express bus. Each has its own PLX switch to facilitate communication on-card, but the boards have to reach across the bus to synchronize with each other. At 3840x2160, each frame is about 33.2 MB. AMD’s driver has mechanisms to evaluate available bandwidth and cope with shortfalls by switching to software compositing. Still, ignoring its best practices (by using a link narrower than 16 lanes, or a motherboard with another PLX switch) could cause issues at 4K. We tried isolating those variables, though, and continued to see the same problems.
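The 33.2 MB figure corresponds to a 32-bit (4-byte) color buffer at 3840x2160. The Python sketch below reproduces it and then estimates how much copy traffic alternate-frame rendering could generate, under assumptions the article doesn’t confirm: frames are assigned round-robin to all four GPUs, the display is attached to one GPU on the first card, and each frame from another GPU is copied to it exactly once.

```python
from typing import Tuple

WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 4  # assumption: 32-bit color, matching the ~33.2 MB figure

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
frame_mb = frame_bytes / 1e6
print(f"{frame_mb:.1f} MB per frame")  # 33.2 MB per frame


def afr_traffic_mb_per_s(fps: float, gpus_per_card: int = 2,
                         cards: int = 2) -> Tuple[float, float]:
    """Rough copy traffic to the display GPU under alternate-frame rendering.

    Assumes round-robin frame assignment across all GPUs and one copy per
    frame rendered elsewhere. Returns (traffic from the sibling GPU on the
    same card, traffic arriving from the other card(s) across the board).
    """
    per_gpu_fps = fps / (gpus_per_card * cards)
    same_card = per_gpu_fps * (gpus_per_card - 1) * frame_mb
    other_cards = per_gpu_fps * gpus_per_card * (cards - 1) * frame_mb
    return same_card, other_cards


# 84 FPS is the Battlefield 4 average reported in this article for two R9 295X2s.
on_card, cross_card = afr_traffic_mb_per_s(84)
print(f"{on_card:.0f} MB/s on-card, {cross_card:.0f} MB/s between cards")
# 697 MB/s on-card, 1393 MB/s between cards
```

Under these assumptions, only the second figure has to cross the motherboard’s PCIe slots; the rest travels over the first card’s own PLX switch. These are rough, best-case estimates that ignore protocol overhead and any other traffic on the bus.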
Obviously, this information is relevant to a privileged few able to consider three-grand worth of graphics hardware. But 4K is the future for a great many enthusiasts, and driving that many pixels isn’t easy. Gamers buying today are almost certainly looking at multi-GPU configurations of some sort, making twin Radeon R9 295X2s an attractive option (particularly given their outward-venting closed-loop liquid coolers). And although I continue saying good things about the 295X2, and wouldn’t have any problem using one in my personal system, there are still issues to work out with a pair operating in tandem.