GeForce GTX 680 2 GB Review: Kepler Sends Tahiti On Vacation
1. GeForce GTX 680: The Card And Cooling

Nvidia is fond of drawing parallels. With its prior-generation graphics cards, the company likened each model to a different role on a virtual battlefield. GeForce GTX 480 was the tank—big performance and, to a fault, a big price, big power, and big heat, as well. GeForce GTX 460 came to be referred to as the hunter, incorporating a better balance of speed, efficiency, and cost more apropos to gamers. Finally, GeForce GTS 450 was dubbed the sniper for its focus on enabling playable frame rates at 1680x1050, according to Nvidia.

As silly as that trio of categories seemed, they make it easier for us to put a finger on the pulse of GeForce GTX 680. Though its name (and price) suggests a successor to Nvidia’s current single-GPU flagship, this is decidedly the hunter—a gamer-oriented card that almost completely de-emphasizes the once-emphatic message of improving general-purpose compute performance. But hey, it does that whole gamer thing really well, just like the GeForce GTX 460.

GK104: The hunter that succeeded a tank

Fitting In Isn’t Always Easy

Regardless of the role it was designed to fill, competition is the biggest influencer of positioning. AMD may have higher-end cards on its roadmap that we haven’t seen or heard about yet. However, in the context of AMD’s six Radeon HD 7000-series boards that are already available, Nvidia knows exactly what it’s up against.

Had Radeon HD 7970 been 30 or 40 percent faster than it is, there’s a good chance we wouldn’t be looking at a GeForce GTX 680 today. Maybe it would have been called GTX 660 or 670. But because of where AMD’s flagship sits, Nvidia sees justification in crowning its new hunter as a successor to its old tank—all the while making it pretty clear that another piece of heavy armor is in the works.  

What we have on our hands, then, is a $500 card based on Nvidia’s GK104 graphics processor, designed specifically for gamers (if you’re interested in compute potential, you’ll have to keep waiting). GeForce GTX 680 addresses some of last generation’s most glaring competitive disadvantages, and it adds a handful of interesting features, too. 

Meet GeForce GTX 680

At 10” long, the GeForce GTX 680’s PCB is half an inch longer than AMD’s Radeon HD 7800 boards and half an inch shorter than the Radeon HD 7900s.

Looking at the card head-on, we see that it employs a centrifugal fan, which pushes hot air out of the card’s rear bracket. There’s only about half of one slot available for passing exhaust back there. But as we’ll see in our thermal and acoustic tests, the GTX 680 does not have a problem contending with heat.

The rest of the dual-slot bracket plays host to four display outputs: two dual-link DVI connectors, one full-sized HDMI port, and a DisplayPort output. All of them are usable concurrently, addressing one of our sharpest critiques of all prior-gen Fermi-based boards. At long last, we can consider multi-monitor gaming tests to replace 2560x1600 in our single-card reviews (and indeed, multi-monitor benchmarks will follow in a story on which we're already working)! Like AMD, Nvidia claims that this card supports HDMI 1.4a, monitors with 4K horizontal resolutions, and multi-stream audio.

Up top, GeForce GTX 680 features twin SLI connectors, enabling two-, three-, and four-way SLI configurations. AMD's Radeon HD 7970 and 7950 similarly support up to four-way arrays.

We also get our first piece of physical evidence that Nvidia's GK104 processor was designed to be a viable option in more mainstream environments: GeForce GTX 680 employs two six-pin auxiliary power connectors. Those two inputs, plus the PCI Express slot, facilitate up to 225 W of power delivery. Nvidia rates this card for up to 195 W. However, it also says typical power use is closer to 170 W. Keep those numbers in mind—the available headroom between 170 W and the 225 W specification ceiling comes into play shortly.

Keeping GeForce GTX 680 Cool

Nvidia claims it put significant effort into three aspects of its dual-slot cooler's design, contributing to an impressive acoustic footprint, even under load. We can peel back the GTX 680's shroud for a closer look…

First, the company cites a trio of horseshoe-shaped heat pipes embedded in the GPU heat sink, which quickly draw heat away from GK104. Thermal energy is transferred from those pipes into a dual-slot aluminum sink. 

Optimizations to the sink itself also rank on Nvidia’s list of improvements. For instance, the fin stack is angled where air exits, creating more space, internally, between the cooler and exhaust grate. Apparently, in prior designs, heat was getting trapped between the bracket and fins, affecting cooling performance. This new approach is said to yield lower temperatures than the older implementation.

Finally, Nvidia says it added acoustic dampening material to the fan motor—a step it also took with the GeForce GTX 580, which contributed to getting that card’s noise level down compared to its maligned predecessor.

2. GK104: The Chip And Architecture

GeForce GTX 680’s Vital Signs

Once we strip the card of its cooling hardware, we’re left with a bare board.

The GK104 GPU sits at its center, composed of 3.54 billion transistors on a 294 mm² die. Again, Nvidia lands between two of AMD’s current-gen offerings: the Tahiti GPU contains 4.31 billion transistors in a 365 mm² die, while Pitcairn is composed of 2.8 billion transistors in a 212 mm² die.

Because GK104 is manufactured on TSMC's 28 nm node, the GPU's supply will almost certainly be tight as multiple customers vie for the foundry's limited output. With that said, Nvidia tells us to expect some availability at launch, with another wave of boards to arrive in early April.

Now, let’s have a look at the GPU’s specifications compared to GeForce GTX 580 and AMD’s Radeon HD 7970:


                      | GeForce GTX 680 | Radeon HD 7950 | Radeon HD 7970       | GeForce GTX 580
Shaders               | 1536            | 1792           | 2048                 | 512
Texture Units         | 128             | 112            | 128                  | 64
Full Color ROPs       | 32              | 32             | 32                   | 48
Graphics Clock        | 1006 MHz        | 800 MHz        | 925 MHz              | 772 MHz
Texture Fillrate      | 128.8 Gtex/s    | 89.6 Gtex/s    | 118.4 Gtex/s         | 49.4 Gtex/s
Memory Clock          | 1502 MHz        | 1250 MHz       | 1375 MHz             | 1002 MHz
Memory Bus            | 256-bit         | 384-bit        | 384-bit              | 384-bit
Memory Bandwidth      | 192.3 GB/s      | 240 GB/s       | 264 GB/s             | 192.4 GB/s
Graphics RAM          | 2 GB GDDR5      | 3 GB GDDR5     | 3 GB GDDR5           | 1.5 GB GDDR5
Die Size              | 294 mm²         | 365 mm²        | 365 mm²              | 520 mm²
Transistors (Billion) | 3.54            | 4.31           | 4.31                 | 3
Process Technology    | 28 nm           | 28 nm          | 28 nm                | 40 nm
Power Connectors      | 2 x 6-pin       | 2 x 6-pin      | 1 x 8-pin, 1 x 6-pin | 1 x 8-pin, 1 x 6-pin
Maximum Power         | 195 W           | 200 W          | 250 W                | 244 W
Price (Street)        | $510            | $450           | $550                 | ~$410


The GK104 GPU is broken down into four Graphics Processing Clusters (GPCs), each of which contains two Streaming Multiprocessors (formerly referred to as SMs, and now called SMXs).

GeForce GTX 680's GK104 GPU

While there’s undoubtedly a lot of depth we could go into on the evolution from the original GF100 design to the GK104 architecture debuting today, the easiest way to characterize this new chip is in reference to the GF104 processor that first powered GeForce GTX 460.

In comparison to GF104's SM, each GK104 SMX features twice as many warp schedulers (four, up from two), dispatch units (eight, up from four), and texture units (16, up from eight), along with a register file that's twice as large. It also sports four times as many CUDA cores: GF104 included 48 shaders per SM; GK104 ups that to 192 shaders in each SMX.

GK104 SMX (Left) Versus GF104 SM (Right)
Per SM/SMX:            | GK104 | GF104 | Ratio
CUDA Cores             | 192   | 48    | 4x
Special Function Units | 32    | 8     | 4x
Load/Store Units       | 32    | 16    | 2x
Texture Units          | 16    | 8     | 2x
Warp Schedulers        | 4     | 2     | 2x
Geometry Engines       | 1     | 1     | 1x


Why quadruple the number of CUDA cores and double the other resources? Kepler’s shaders run at the processor’s frequency (1:1). Previous-generation architectures (everything since G80, that is) operated the shaders two times faster than the core (2:1). Thus, doubling shader throughput at a given clock rate requires four times as many cores running at half-speed. 
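To put numbers on that, here's a quick back-of-the-envelope sketch in Python, using the per-multiprocessor counts from the table above:

    # Per-multiprocessor FP32 issue rate, normalized to core-clock cycles.
    # GF104 ran 48 cores at twice the core clock; GK104 runs 192 cores at 1:1.
    gf104_sm_rate = 48 * 2     # 96 shader-ops per core clock, per SM
    gk104_smx_rate = 192 * 1   # 192 shader-ops per core clock, per SMX

    print(gk104_smx_rate / gf104_sm_rate)  # 2.0: twice the throughput,
    # delivered by four times the cores running at half the shader clock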

The question then becomes: Why on earth would Nvidia throttle back its shader clock in the first place? It’s all about the delicate balance of performance, power, and die space, baby. Fermi allowed Nvidia’s architects to optimize for area. Fewer cores take up less space, after all. But running them twice as fast required much higher clock power. Kepler, on the other hand, is tuned for efficiency. Halving the shader clock slashes power consumption. However, comparable performance necessitates two times as many data paths. The result is that Kepler trades off die size for some reduction in power on the logic side, and more significant savings from clocking.

Additional die area and power are cut by eliminating some of the multi-ported hardware structures used to help schedule warps and moving that work to software. By minimizing the amount of power and area consumed by control logic, more space is freed up for doing useful work versus Fermi.

Nvidia claims that the underlying changes made to its SMX architecture result in a theoretical doubling of performance per watt compared to Kepler’s predecessor—and this is actually something we’ll be testing today.

Alright. We have eight of these SMXs, each with 192 CUDA cores, adding up to 1536. Sixteen texture units per SMX yield 128 in a fully-enabled GK104 processor. And one geometry engine per SMX gives us a total of eight. Now, wait a minute—GF104 had eight PolyMorph engines, too. So Nvidia multiplied all of these other resources out, but left its primitive performance alone? Not quite.

To begin, Nvidia claims that each PolyMorph engine is redesigned to deliver close to two times the per-clock performance of Fermi’s fixed-function geometry logic. This improvement is primarily observable in synthetic metrics, which we’ve seen developers (rightly) argue should represent the future of their efforts, exploiting significantly more geometry than today’s titles to augment realism. But if you’re playing something available right now, like HAWX 2, how can you expect Kepler to behave?

In absolute terms, GeForce GTX 680 easily outperforms the Radeon HD 7970 and 7950, regardless of whether tessellation is used or not. However, Nvidia’s new card takes a 31% performance hit when you toggle tessellation on in the game’s settings. In comparison, the Tahiti-based Radeons take a roughly 16% ding. Now, there’s a lot more going on in a game than just tessellation to affect overall performance. But it’s still interesting to note that, in today’s titles, the advantage Nvidia claims doesn’t necessarily pan out. 

In order to support the theoretically-higher throughput of its geometry block, Nvidia also doubles the number of raster engines compared to GF104, striking a 1:1 ratio with the ROP partitions.

Like GF104, each of GK104’s ROPs outputs eight 32-bit integer pixels per clock, adding up to 32. The two GPUs also share 256-bit aggregate memory buses in common. Where they differ is maximum memory bandwidth. 

Perhaps more true to its mid-range price point, GeForce GTX 460 included up to 1 GB of GDDR5 running at 900 MHz, yielding 115.2 GB/s. GeForce GTX 680 is being groomed to succeed GeForce GTX 580, though, which utilized 1002 MHz GDDR5 on a 384-bit aggregate bus to serve up 192.4 GB/s of throughput. Thus, GeForce GTX 680 comes armed with 2 GB of GDDR5 operating at 1502 MHz, resulting in a very similar 192.26 GB/s figure.
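If you want to check those bandwidth figures yourself, the math is straightforward: GDDR5 transfers four bits per pin per memory-clock cycle, so peak throughput is just clock × 4 × bus width. A quick sketch:

    # Peak GDDR5 bandwidth: memory clock (Hz) * 4 transfers/clock * bus width (bytes)
    def gddr5_bandwidth_gbps(mem_clock_mhz, bus_width_bits):
        return mem_clock_mhz * 1e6 * 4 * (bus_width_bits / 8) / 1e9

    print(gddr5_bandwidth_gbps(1502, 256))  # GeForce GTX 680: ~192.3 GB/s
    print(gddr5_bandwidth_gbps(1002, 384))  # GeForce GTX 580: ~192.4 GB/s
    print(gddr5_bandwidth_gbps(900, 256))   # GeForce GTX 460: ~115.2 GB/s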

3. GPU Boost: Graphics Afterburners

GPU Boost: When 1 GHz Isn’t Enough

Beyond its use of very fast GDDR5 memory, GeForce GTX 680 also enjoys a pretty significant graphics clock advantage compared to any of Nvidia’s prior designs: the GPU operates at 1006 MHz by default. But Nvidia says the engine can actually run faster—even within its 195 W thermal design power.

The problem with setting a higher frequency is that applications tax the processor in different ways, some more rigorously than others. Nvidia guarantees that GeForce GTX 680 will duck under its power ceiling at 1006 MHz, even in demanding real-world titles. Less power-hungry games end up leaving performance on the table, though. Enter GPU Boost.

When headroom exists in the card’s thermal budget, GPU Boost dynamically increases the chip’s clock rate and voltage to improve performance. The mechanism is actually somewhat similar to Intel’s Turbo Boost technology in that a number of variables are monitored (power use, temperature, utilization, and more—Nvidia won’t divulge the full list) by on-board hardware and software. Data is fed through a software algorithm, which then alters the GPU’s clock within roughly 100 ms.  
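Since Nvidia won't divulge the algorithm itself, the sketch below is purely conceptual: a simple step-up/step-down control loop against the 170 W typical board power, where the 13 MHz step size and the read_board_power() stub are our own illustrative assumptions, not Nvidia's implementation:

    import random, time

    BASE_MHZ, MAX_BOOST_MHZ, STEP_MHZ = 1006, 1110, 13  # step size is an assumption
    POWER_TARGET_W = 170  # Nvidia's quoted typical board power

    def read_board_power():
        # Stand-in for the card's on-board power monitoring hardware.
        return random.uniform(150, 195)

    def boost_step(clock_mhz, power_w):
        if power_w < POWER_TARGET_W and clock_mhz < MAX_BOOST_MHZ:
            return clock_mhz + STEP_MHZ  # headroom available: step clock/voltage up
        if power_w > POWER_TARGET_W and clock_mhz > BASE_MHZ:
            return clock_mhz - STEP_MHZ  # over budget: step back toward the base clock
        return clock_mhz

    clock = BASE_MHZ
    for _ in range(50):
        clock = boost_step(clock, read_board_power())
        time.sleep(0.1)  # the clock is altered within roughly 100 ms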

As you might imagine, for every variable that GeForce GTX 680 measures, a handful of other factors come into play. Are you using the card in Antarctica or in Death Valley? Was your GK104 cut from the outside of the wafer or the center? The point is that no two GeForce GTX 680s will behave identically. And there's no way to turn GPU Boost off, so we can't isolate its effect. Consequently, Nvidia's board partners are going to cite two figures on their packaging. First, you'll get the base clock, equivalent to the classic core speed. Second, vendors will need to report a boost clock, representing the average frequency achieved under load in a game that doesn't hit this card's TDP. For GeForce GTX 680, that number should be 1058 MHz. Quite the mouthful, right? Let's try illustrating.

I ripped the chart above from an upcoming page where we measure performance per watt. What it shows is power consumption over time during six different game benchmarks, run at 1920x1080. And as you can see, system power use ranges from about 260 to somewhere around 370 W. On average, there might be a 50 W difference between a largely processor-bound game like World of Warcraft and a shader-intensive title like DiRT 3.

Given these power numbers, you might expect that the most power-hungry titles hit the 680’s ceiling, while others get nice, fat clock rate infusions.

Power in %, frequency in MHz, and utilization in %, over time

In all actuality, though, even a game like DiRT gets sped up. The first chart from the top shows maximum board power in percent, and it's pretty clear that World of Warcraft bounces around quite a bit. Given plenty of observable headroom, it gets our board's maximum clock rate of 1110 MHz. Metro 2033 is another fairly low-power game, which also gets pegged at 1110 MHz. Crysis 2, DiRT 3, and even Battlefield 3 receive less consistent boosts. Still, they run a bit faster, too.

How does GPU Boost differ from AMD’s PowerTune technology, which we first covered in our Radeon HD 6970 and 6950 review? In essence, PowerTune adds granularity between a GPU’s highest p-state and the intermediate state it’d hit if TDP were exceeded. From my Cayman coverage:

“…rather than designing the Radeon HD 6000s with worst-case applications in mind, AMD is able to dial in a higher core clock at the factory (880 MHz in the case of the 6970) and rely on PowerTune to modulate performance down in the applications that would have previously forced the company to ship at, say, 750 MHz.”

In other words, a PowerTune-equipped graphics card might ship at a faster frequency than a board designed with a worst-case thermal event in mind—and that’s good. But it does not accelerate to exploit available power headroom. AMD considers this a good thing, touting consistent performance as a virtue.

Everyone's take is going to be different, of course. We're not particularly enthused about the impact of variability in our numbers and inconsistent scaling from one game to another as GPU Boost tinkers around in the background. At the same time, we appreciate the ingenuity of a technology that dynamically strives to maximize performance within a given power budget when it can, and then throttles down when it's not needed. I think that our concerns have to be outweighed by the real-world potential benefit to gamers in this case. We would like to have an option to turn GPU Boost off, though, if only to measure a baseline performance number in some sort of development mode.

4. Overclocking: I Want More Than GPU Boost

The implementation of GPU Boost does not preclude overclocking. But because you can’t disable GPU Boost like you might with Intel’s Turbo Boost, you have to operate within the technology’s parameters.

For example, overclocking is now achieved through an offset. You can easily push the base 3D clock up 100, 150, or even 200 MHz. However, if a game was already TDP-constrained at the default clock, it won’t run any faster. In apps that weren’t hitting the GTX 680’s thermal limit before, the offset pushes the performance curve closer to the ceiling.

Because GPU Boost was designed to balance clock rate and voltage with thermal design power in mind, though, overclocking is really made most effective by adjusting the board’s power target upward as well. EVGA’s Precision X tweaking tool includes built-in sliders for both the power target and the GPU clock offset.

Although GeForce GTX 680’s TDP is 195 W, Nvidia says the card’s typical board power is closer to 170 W. So, increasing the power slider actually moves this number higher. At +32%, Precision X’s highest setting is designed to get you right up to the 225 W limit of what two six-pin power connectors and a PCI Express slot are specified to deliver.  
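That +32% ceiling isn't arbitrary; it falls straight out of the arithmetic between the 170 W typical figure and the 225 W specification limit:

    typical_power_w = 170         # Nvidia's quoted typical board power
    spec_ceiling_w = 2 * 75 + 75  # two 6-pin connectors (75 W each) plus the PCIe slot (75 W)

    headroom = spec_ceiling_w / typical_power_w - 1
    print(f"+{headroom:.0%}")     # +32%, matching Precision X's maximum setting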

Using Crysis 2 as our very-consistent test case, we can measure the impact of each different alteration and its effect on performance.

First, we launch a single run of the Central Park level at 1920x1080 in DirectX 11 mode, without anti-aliasing. We get a 72.3 FPS result, and we observe GPU Boost pushing the GeForce GTX 680 between 1071 and 1124 MHz during the run (up from the 1006 MHz base).

The top chart shows that we’re bouncing around the upper end of GK104’s power ceiling. So, we increase the target board power by 15%. The result is a small jump to 74.2 FPS, along with clocks that vacillate between 1145 and 1197 MHz.

Figuring the power target boost likely freed up some thermal headroom, we then increase the offset by 100 MHz, which enables even better performance—76.1 FPS. This time, however, we get a constant 1215 MHz. Nvidia says this is basically as fast as the card will go given our workload and the power limit.

So why not up the target power again? At 130% (basically, the interface’s 225 W specification), performance actually drops to 75.6 FPS, and the graph over time shows a constant 1202 MHz. We expected more performance, not less. What gives? This is where folks are going to find a problem with GPU Boost. Because outcome is dependent on factors continually being monitored, performance does change over time. As a GPU heats up, current leakage increases. And as that happens, variables like frequency and voltage are brought down to counter a vicious cycle.

The effect is similar to heat soak in an engine. If you’re on a dynamometer doing back to back pulls, you expect to see a drop in horsepower if you don’t wait long enough between runs. Similarly, it’s easy to get consistently-high numbers after a few minute-long benchmarks. But if you’re gaming for hours, GPU Boost cannot be as effective.

Our attempt to push a 200 MHz offset demonstrates that, even though this technology tries to keep you at the highest frequency under a given power ceiling, increasing both limits still makes it easy to exceed the board’s potential and seize up.

Sliding back a bit to a 150 MHz offset gives us stability, but performance isn’t any better than the 100 MHz setting. No doubt, it’ll take more tinkering to find the right overclock with GPU Boost in the mix and always on.

5. PCI Express 3.0 And Adaptive V-Sync

PCI Express 3.0: One Last Perf Point

GeForce GTX 680 includes a 16-lane PCI Express interface, just like almost every other graphics card we’ve reviewed in the last seven or so years. However, it’s one of the first boards with third-gen support. All six Radeon HD 7000 family members preempt the GeForce GTX 680 in this regard. But we already know that, in today’s games, doubling the data rate of a bus that isn’t currently saturated doesn’t impact performance very much.

By default, GTX 680 runs in X79 at PCIe 2.0 data rates. Enabling PCIe 3.0 is achieved through a driver update.

Nevertheless, PCI Express 3.0 support becomes a more important discussion point here because Nvidia's press driver doesn't enable it on X79-based platforms. The company's official stance is that the card is gen-three-capable, but that X79 Express is only validated for second-gen data rates. Drop it into an Ivy Bridge-based system, though, and it should immediately enable 8 GT/s transfer speeds.

Nvidia sent us an updated driver to prove that GeForce GTX 680's third-gen support does work, and indeed, data transfer bandwidth shot up to almost 12 GB/s. Should Nvidia validate GTX 680 on X79, a new driver should be the answer. In contrast, the data bandwidth of AMD's Radeon HD 7900s slides back from what we've seen in previous reviews. Neither AMD nor Gigabyte is able to explain why this is happening.

Adaptive V-Sync: Smooth Is Good

When we benchmark games, we’re perpetually looking for ways to turn off vertical synchronization, or v-sync, which creates a relationship between our monitors’ refresh and graphics card frame rate. By locking our frame rate to 60 FPS on a 60 Hz LCD, for example, we wouldn’t be conveying the potential performance of a high-end graphics card capable of averaging 90 or 100 FPS. In most titles, turning off v-sync is a simple switch. In others, we have to hack our way around the feature to make the game testable.

In the real world, however, you want to use v-sync to prevent tearing—an artifact that occurs when in-game frame rates are higher than the display’s refresh and you show more than one frame on the screen at a time. Tearing bothers gamers to varying degrees. However, if you own a card capable of keeping you above a 60 FPS minimum, there’s really no downside to turning v-sync on.

Dropping under 60 FPS is where you run into problems. Because the technology synchronizes the graphics card's output to a fixed refresh, anything below 60 FPS has to snap to an even divisor of 60. So, running at 47 frames per second, for instance, actually forces you down to 30 FPS. The transition from 60 to 30 manifests on-screen as a slight stutter. Again, the degree to which this bothers you during game play is going to vary. If you know where and when to expect the stutter, though, spotting it is pretty easy.

Nvidia’s solution to the pitfalls of running with v-sync on or off is called adaptive v-sync. Basically, any time your card pushes more than 60 FPS, v-sync remains enabled. When the frame rate drops below that barrier, v-sync is turned off to prevent stuttering. The 300.99 driver provided with press boards enables adaptive V-sync through a drop-down menu that also contains settings for turning v-sync on or off.
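To see why 47 FPS collapses to 30 FPS, and how adaptive v-sync sidesteps the drop, consider this simplified model of a 60 Hz display (it ignores triple buffering):

    import math

    REFRESH = 60  # Hz

    def vsync_fps(raw_fps):
        # Traditional v-sync: every frame waits for a refresh boundary, so the
        # delivered rate snaps to an even divisor of the refresh rate.
        return REFRESH / math.ceil(REFRESH / raw_fps)

    def adaptive_vsync_fps(raw_fps):
        # Adaptive v-sync: synchronized above the refresh rate, unsynchronized below it.
        return REFRESH if raw_fps > REFRESH else raw_fps

    print(vsync_fps(47))           # 30.0 -- the stutter-inducing transition
    print(adaptive_vsync_fps(47))  # 47   -- v-sync simply turns off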

Given limited time for testing, I was only really able to play a handful of games with and without v-sync, and then using adaptive v-sync. The tearing effect with v-sync turned off is the most distracting artifact. I’m less bothered when v-sync is on. Though, to be honest, it takes a title like Crysis 2 at Ultra quality to bounce above and below 60 FPS with any regularity on a GeForce GTX 680.

Overall, I’d call adaptive v-sync a good option to have, particularly as it permeates slower models in Nvidia’s line-up, which are more likely to spend time under the threshold of a display’s native refresh rate.

6. Hardware Setup And Benchmarks
Test Hardware
Processors
Intel Core i7-3960X (Sandy Bridge-E), 3.3 GHz overclocked to 4.2 GHz (42 * 100 MHz), LGA 2011, 15 MB Shared L3, Hyper-Threading enabled, Power-savings enabled
Motherboard
Gigabyte X79-UD5 (LGA 2011) X79 Express Chipset, BIOS F9
Memory
G.Skill 16 GB (4 x 4 GB) DDR3-1600, F3-12800CL9Q2-32GBZL @ 9-9-9-24 and 1.5 V
Hard Drive
Intel SSDSC2MH250A2 250 GB SATA 6Gb/s
Graphics
Nvidia GeForce GTX 680 2 GB

AMD Radeon HD 7970 3 GB

AMD Radeon HD 7950 3 GB

AMD Radeon HD 6990 4 GB

Nvidia GeForce GTX 590 3 GB

Nvidia GeForce GTX 580 1.5 GB
Power Supply
Cooler Master UCP-1000 W
System Software And Drivers
Operating System
Windows 7 Ultimate 64-bit
DirectX
DirectX 11
Graphics Driver
Nvidia GeForce Release 300.99 (For GTX 680)

Nvidia GeForce Release 296.10 (For GTX 580 and 590)

AMD Catalyst 12.2


Undoubtedly, a number of Tom's Hardware regulars saw that some of my launch-day data was pulled from the CMS earlier in the week and quickly spread around the Web. Based on that data, many conclusions were prematurely drawn without the context of hardware setup, drivers, or lab time in front of all of these cards. Even another vendor got in on the action, wondering why the stolen Skyrim results with FXAA enabled were so different from the Skyrim numbers with adaptive anti-aliasing from Don's Radeon HD 7800-series story.

Understand that, for each of these reviews, we update our test bed to the latest BIOS, we download all of the latest Windows Updates, and we download the games themselves, along with their newest patches. Some variation is expected, and anything out of the ordinary is checked and checked again. In this case, because I looked at some of the feedback, I was able to go back, re-run tests, and have another lab validate the results—everything checks out.

Of course, if you’re truly interested in a discussion about graphics testing methodology, test settings, and analysis, I welcome constructive discourse.

Games
Battlefield 3
Ultra Quality Settings, No AA / 16x AF, 4x MSAA / 16x AF, v-sync off, 1680x1050 / 1920x1080 / 2560x1600, DirectX 11, Going Hunting, 90-second playback, Fraps
Crysis 2
DirectX 9 / DirectX 11, Ultra System Spec, v-sync off, 1680x1050 / 1920x1080 / 2560x1600, No AA / No AF, Central Park, High-Resolution Textures: On
Metro 2033
High Quality Settings, AAA / 4x AF, 4x MSAA / 16x AF, 1680x1050 / 1920x1080 / 2560x1600, Built-in Benchmark, Depth of Field filter Disabled, Steam version
DiRT 3
Ultra High Settings, No AA / No AF, 8x AA / No AF, 1680x1050 / 1920x1080 / 2560x1600, Steam version, Built-In Benchmark Sequence, DX 11
The Elder Scrolls V: Skyrim
High Quality (8x AA / 8x AF) / Ultra Quality (8x AA, 16x AF) Settings, FXAA enabled, vsync off, 1680x1050 / 1920x1080 / 2560x1600, 25-second playback, Fraps
3DMark 11
Version 1.03, Extreme Preset
HAWX 2
Highest Quality Settings, 8x AA, 1920x1200, Retail Version, Built-in Benchmark, Tessellation on/off
World of Warcraft: Cataclysm
Ultra Quality Settings, No AA / 16x AF, 8x AA / 16x AF, From Crushblow to The Krazzworks, 1680x1050 / 1920x1080 / 2560x1600, Fraps, DirectX 11 Rendering, x64 Client
SiSoftware Sandra 2012
Sandra Tech Support (Engineer) 2012.SP2, GP Processing and GP Bandwidth Modules
CyberLink MediaEspresso 6.5
449 MB MPEG-2 1080i Video Sample to Apple iPad 2 Profile (1024x768)
LuxMark 2.0
64-bit Binary, Version 1.0, Classroom Scene
7. Benchmark Results: 3DMark 11 (DX 11)

Although we like to start our benchmark suite with Futuremark’s synthetic metric, we know that it doesn’t always reflect the ebb and flow of real-world games, which are often affected by developer relationships and extra optimizations in one direction or the other.

From that perspective, the GeForce GTX 680 falls right under both dual-GPU cards, which we’re really only including for the purpose of exhibition at this point, since neither is available any more.

AMD’s Radeon HD 7970 gives up its short-lived lead, though based on its advantage over the GeForce GTX 580, we can tell it’s still a very fast piece of hardware.

8. Benchmark Results: Battlefield 3 (DX 11)

At 1680x1050 and 1920x1080, Nvidia's GeForce GTX 680 slides past the GeForce GTX 590 when anti-aliasing isn't in the picture, though AMD's Radeon HD 6990 is the fastest card of all, both with and without 4x MSAA/FXAA. Nvidia's new flagship holds its largest lead over the Radeon HD 7970 at lower resolutions. By the time we reach 2560x1600, the gap is much smaller.

We received some feedback in our Radeon HD 7800-series launch story asking us to go back to Ultra quality settings, and that’s what we’re doing here. It’s pretty amazing that, late last year, we were having a tough time coming up with single-card configurations that’d handle this game at its highest detail options in Battlefield 3 Performance: 30+ Graphics Cards, Benchmarked. Now, almost any of these boards is suitable all the way up to 2560x1600 without AA turned on (though you could probably get away with FXAA, no sweat).

9. Benchmark Results: Crysis 2 (DX 9/DX 11)

Sorted according to DirectX 9 results, Nvidia's new GeForce GTX 680 leads the Radeon HD 7970 at 1680x1050 and 1920x1080, falling behind at 2560x1600. It's more probable, though, that if you own one of these high-end boards, you'll use it to play under DirectX 11. In that case, Nvidia's single-GPU flagship is actually a second-place finisher up and down the charts, getting bested only by the defunct GeForce GTX 590. It puts down playable average frame rates at up to 1920x1080, too.

Very recently, AMD's cards were having a difficult time in our DirectX 9-based Crysis 2 tests. The company's driver team is clearly hard at work optimizing for the idiosyncrasies of its new GCN architecture. Now, DirectX 9 is one of AMD's strengths in this game, while performance under DirectX 11 appears to lag more than we'd expect given other benchmark results.

10. Benchmark Results: The Elder Scrolls V: Skyrim (DX 9)

The GeForce GTX 680 takes second place in Skyrim at 1680x1050 and 1920x1080. But across-the-board performance is so good that the victory isn’t entirely meaningful.

Frame rates slow down just enough at 2560x1600 that the Radeon HD 6990 sneaks past Nvidia’s new card. Again, though, neither the 6990 nor the GTX 590 are even for sale anymore, so their significance is largely symbolic. Frankly, I’m glad to see them go.

Back on topic, the GeForce GTX 680 maintains just enough distance ahead of AMD’s single-GPU flagship to edge it out at all three of our tested resolutions.

11. Benchmark Results: DiRT 3 (DX 11)

With no MSAA applied to our test sequence, the GeForce GTX 680 manages to beat both of the dual-GPU cards at each of the resolutions we test in DiRT 3. Applying 8x MSAA drops Nvidia’s new board behind the Radeon HD 6990 and GeForce GTX 590, again, at all three resolutions. However, it’s still the fastest single-GPU card available, posting playable numbers all the way through 2560x1600.

AMD’s Radeon HD 7970 has little trouble staving off the previous-best GeForce GTX 580, particularly at higher resolutions like 2560x1600.

12. Benchmark Results: World Of Warcraft: Cataclysm (DX 11)

We had a couple of folks request benchmarks with the new 64-bit World of Warcraft client, which was released with patch 4.3.3. So, today’s charts represent the latest 64-bit build.

Nvidia tends to do better in WoW, a trend we first observed in World Of Warcraft: Cataclysm--Tom's Performance Guide. That trend continues more than a year later, after many game patches and driver revisions. The GeForce GTX 590 takes first place, followed closely by the GeForce GTX 680 and GTX 580.

Did you notice the Radeon HD 6990 sitting at the bottom of the charts in our first two resolutions? Although its performance without anti-aliasing lands it in last place, two Cayman GPUs give up very little when you apply 8x MSAA. WoW depends heavily on host processor performance. Meanwhile, CrossFire is known to exact more CPU overhead than SLI. Put those two observations together, and AMD’s dual-GPU card lags behind the crowd, but then enjoys the addition of anti-aliasing at very little performance cost.

Of course, once we throttle up to 2560x1600, the graphics load is such that the 6990 springs up behind Nvidia’s GeForce GTX 680 for third place.

13. Benchmark Results: Metro 2033 (DX 11)

Testing high-end graphics cards allows us to step up to High detail settings in Metro 2033. We still can't get consistently playable frame rates at the Very High preset, and we dare not enable the DirectCompute-based depth of field filter, known to take a debilitating cut out of the overall frame rate.

At these settings, the GeForce GTX 680 starts strongly, finishing behind the Radeon HD 6990 and GeForce GTX 590 with adaptive anti-aliasing enabled. Once 4x MSAA is turned on, however, AMD’s Radeon HD 7970 is faster.

As we work our way up to 2560x1600, AMD's single-GPU flagship manages to deliver better performance at both anti-aliasing settings, and the Radeon HD 7950 nearly manages to match pace with the GeForce GTX 680 under 4x MSAA.

Did you think you’d see a day when the GeForce GTX 580 would be the anchor on a chart of eight high-end graphics cards? Crazy.

14. Benchmark Results: Sandra 2012

If you need any evidence that GK104 was originally intended to fill the same “Hunter” role that the GeForce GTX 460 originally targeted, this is it.

Although the GK104 GPU’s increased shader count has a positive impact on 32-bit floating-point math, drastically outperforming the GeForce GTX 590, it’s unable to catch AMD’s Radeon HD 7950, 6990, or 7970.

Moreover, Nvidia limits 64-bit double-precision math to 1/24 of single-precision rates, protecting its more compute-oriented cards from being displaced by purpose-built gamer boards. The result is that GeForce GTX 680 underperforms the GeForce GTX 590 and 580, and, to a far more dire degree, the three competing boards from AMD.

AMD’s GCN architecture absolutely dominates this benchmark, forming a class entirely separate from the GeForce GTX 680 or Radeon HD 6990, which trade blows.

Using Nvidia’s latest 296.10 driver (and several earlier versions), the GeForce GTX 590 and 580 cannot complete this test using the OpenCL or DirectCompute paths.

15. Benchmark Results: Compute Performance In LuxMark 2.0

Last generation, Nvidia made compute performance and gaming equally important on its flagship GPU—the same piece of silicon that went into high-end Quadro cards and the GeForce GTX 480.

This time around, at the event introducing GeForce GTX 680 to press from around the world, the company refused to discuss compute, joking that it took a lot of heat for pushing the subject with Fermi and didn’t want to go there again.

The more complete story is that it doesn't want to go there…yet. Sandra 2012 just showed us that the GeForce GTX 680 trails AMD's Radeon HD 7900 cards in 32-bit math. And it gets absolutely decimated in 64-bit floating-point operations, as Nvidia purposely protects its profitable professional graphics business by artificially capping performance.

Not surprisingly, then, the OpenCL-based LuxMark 2.0 benchmark shows the GeForce GTX 680 dragging across the finish line.

In comparison, the GeForce GTX 580/590’s GF110 GPU is better-suited to general-purpose compute tasks. And Nvidia argues it’d rather sell you a workstation-oriented Quadro card or dedicated Tesla-based board. We’d counter that AMD’s Radeon HD 7900-series cards are, at least from a performance perspective, clearly viable alternatives in this particular workload (not to mention a lot cheaper).

16. Benchmark Results: NVEnc And MediaEspresso 6.5

Back when Intel introduced Quick Sync as Sandy Bridge’s secret weapon, I estimated that it’d take both AMD and Nvidia about a year to go from CUDA- and APP-based video transcoding to a more purpose-built fixed-function pipeline capable of better performance at substantially lower power use.

Well, AMD introduced its solution almost exactly one year after I wrote Intel’s Second-Gen Core CPUs: The Sandy Bridge Review. Unfortunately, drivers enabling the hardware-based feature weren’t ready when its Radeon HD 7970 launched. They didn’t make it into the Radeon HD 7950 review, either. We missed Video Codec Engine functionality a couple of weeks later when Radeon HD 7770 and 7750 hit our lab. And we were told to keep waiting more recently before the Radeon HD 7870 and 7850 introduction.

Now it's Nvidia's turn. GeForce GTX 680 includes a feature called NVEnc, theoretically able to take a number of input codecs and decode, preprocess, and encode H.264-based content.

Intel’s year-old Quick Sync feature accepts MPEG-2, VC-1, and H.264 and outputs MPEG-2 or H.264. Conversely, Nvidia is not specific about compatible input formats. However, we know it’s limited to H.264 output. But while Intel’s engine maxes out at 1080p in and out, NVEnc purportedly supports up to 4096x4096 encodes.

Like Quick Sync, NVEnc is currently exposed through a proprietary API, though Nvidia does have plans to provide access to NVEnc through CUDA.

Nvidia gave us access to a beta version of CyberLink’s MediaEspresso 6.5 with support for its NVEnc fixed-function encode/decode acceleration feature.

Our standard workload for this app involves converting an almost-500 MB MPEG-2 file into an iPad 2-friendly H.264-encoded movie. We ran it over and over, coming up with inferior performance on the GeForce GTX 680 compared to Nvidia’s GeForce GTX 580 or 590. Then, the company let us know that there’s a bug in its driver affecting the performance of MPEG-2 transcodes.

So, I grabbed the H.264-based trailer for The Assault and tried again. Sure enough, NVEnc made a much more pronounced difference, cutting the transcode time almost in half compared to the other two Nvidia cards.

It’s worth mentioning that, whereas we’ve had major issues getting AMD’s hardware-accelerated encode working in MediaEspresso, the latest drivers and latest build of CyberLink’s software seem to address our struggles. However, performance remains pretty modest. While the new Radeon HD 7900-series cards manage to slightly trail Nvidia’s prior-generation hardware in an H.264-to-H.264 transcode, the MPEG-2-to-H.264 operation is far less favorable. In both cases, the Radeon HD 6990 shows downright poorly.

Now, as far as we know, AMD’s Video Codec Engine—introduced late last year and conceptually similar to NVEnc—is still not functional. There’s a good chance this could help put AMD’s newest cards back in the running. However, the fact that we’re still waiting for driver support almost four months later is not impressive.

17. Temperature And Noise

Idle Temperature And Noise

Two years ago, when Nvidia led off with its Fermi-based GeForce GTX 480, the company was chided for egregious power consumption and correspondingly bad thermal output. Subsequently, it seemed to put more effort into augmenting its cooling solutions. The GeForce GTX 580 seemed to be the culmination of those improvements, enabling a single-GPU flagship that didn’t need to make its presence known.

A handful of engineering emphases, plus the benefits of 28 nm manufacturing, pay off here. The GeForce GTX 680 is quieter than any other high-end card at idle. And it doesn’t seem to require much airflow to stay cool; only the GeForce GTX 590 manages to hit a lower temperature after 10 minutes on the Windows desktop.

Load Temperature And Noise

More telling than idle performance, however, is a graphics card’s behavior under load. This is where the GeForce GTX 680 really impresses, achieving the lowest acoustic measurement in the bunch.

And although it remains quiet, the card’s cooler doesn’t allow temperatures to get out of control. AMD’s Radeon HD 7970 and 7950 both turn in lower thermal readings. However, the GTX 680 does outperform Nvidia’s other two tested cards.

18. Power Consumption

Nvidia’s idle power consumption is just as impressive as AMD’s. Its GeForce GTX 680 sits right between the Radeon HD 7970 and 7950 on Windows’ desktop.

However, Nvidia doesn't enjoy the benefits of AMD's ZeroCore technology once our test platform shuts off its display. Both Radeon HD 7900s shed an additional 13-16 W, while GeForce GTX 680 only drops 2 W. That's certainly an improvement over GeForce GTX 580, but AMD unquestionably holds the advantage here.

Now, here’s where stuff gets real.

As we saw several times in the performance benchmarks, the GeForce GTX 680 actually comes close to matching the performance of GeForce GTX 590 and Radeon HD 6990 in a few situations, and can even beat them in a title like DiRT 3. But look at the difference in power consumption.

The GeForce GTX 590—even though we think Nvidia did a fair job keeping it cool and quiet—is an ugly power hog. The Radeon HD 6990, which isn’t cooled well or kept quiet at all, is a little better. But still. Yuck.

As we start looking at the single-GPU cards, the situation improves. GeForce GTX 680 sits somewhere between Radeon HD 7970 and 7950—both cards that we’ve already observed to offer substantially better performance per watt than the previous champion, GeForce GTX 580.

What’s also interesting about the 3DMark demo, specifically, is that it throws two different workloads at our contenders. The GeForce GTX 590, in particular, shows that Deep Sea incurs more of a power cost, though all of the cards demonstrate some degree of inconsistent power consumption. Meanwhile, Nvidia’s GeForce GTX 680 gives us a fairly straight line all the way across, illustrating that GPU Boost is continually adjusting clocks/voltage to operate within its power envelope.

But although this information is interesting as theory, it’s not particularly telling of performance per watt. So, let’s dice this up another way…

19. Performance Per Watt: The Index

Both AMD and Nvidia claim to be offering unprecedented performance per watt of power consumed, and we believe that both companies are telling the truth.

Nvidia is taking the extra step, though, of adjusting its clock rate and voltage in real-time, based on the premise that no two workloads exact the same power demands. As a result, we can’t simply test one game, divide its average frame rate by average power use, and expect you to believe the outcome is representative of all games. But we also don’t have time to test every game at every resolution (yes, power consumption changes based on resolution, detail settings, and so on). So, we took the games from our suite, set them all to 1920x1080 using the most demanding settings possible, and charted the power behavior for each on all six cards.

This starts a little messy, but it gets easier as we go, so bear with us. First, four different graphics cards in six games. We have the data for GeForce GTX 590 and Radeon HD 6990 as well, but those two cards are just ugly...

It doesn’t matter that some of these tests wrap up before the others. What’s important is that we have the power captured, along with the performance generated during the test run. Charting everything out on a line graph simply shows you the upper and lower bounds for system power use in each game—and that no two games are identical.

Averaging all of the games together, we come up with an average power use figure for each card. AMD’s Radeon HD 7950 uses the least power, on average, followed by Nvidia’s GeForce GTX 680.

We already know that the Radeon HD 7970 is a faster graphics card than Nvidia’s GeForce GTX 580. The fact that it also uses less power tells us it’s more efficient without needing this next graph.

Averaging the frame rates for all six games in our 1920x1080 runs gives us an index of sorts there too, represented in frames per second. The GeForce GTX 680 easily captures the top position, followed by AMD’s Radeon HD 7970. The GeForce GTX 580 takes third place, followed closely by the Radeon HD 7950.

Update (3/23/2012): The original chart on this page showed GeForce GTX 680 at 172% of GeForce GTX 580's performance per watt. This result was derived from an Excel division error, which was noticed by German reader csc. It has since been corrected, yielding a more modest number. The overall effect remains the same, though we're certainly a lot further from Nvidia's original claim of a 2x improvement over GeForce GTX 580. Our apologies for the mistake.

Now, the GeForce GTX 580 is our frame of reference. We want to know how AMD’s and Nvidia’s respective architectures perform in comparison. We set the GTX 580 as 100%, and the rest of the results speak for themselves.
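For anyone who wants to reproduce the index (or audit it for Excel errors), the calculation boils down to a few lines; the numbers below are placeholders, not our measured results:

    # Normalize each card's (average FPS / average system power) to GTX 580 = 100%.
    def perf_per_watt_index(results, baseline="GeForce GTX 580"):
        ratios = {card: fps / watts for card, (fps, watts) in results.items()}
        return {card: 100 * r / ratios[baseline] for card, r in ratios.items()}

    results = {  # card: (average FPS across six games, average system power in W)
        "GeForce GTX 580": (60.0, 350.0),  # placeholder values
        "GeForce GTX 680": (75.0, 305.0),  # placeholder values
    }
    print(perf_per_watt_index(results))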

The Radeon HD 7970 and 7950 both do deliver more performance per watt of power used compared to GeForce GTX 580—and by a significant amount. But GeForce GTX 680 is like, way up there.

As a gamer, do you care about this? Not nearly as much as absolute performance, we imagine. And I personally doubt I’d ever pay more for a card specifically because it gave me better performance/watt. But with AMD and Nvidia both talking about their efficiency this generation, thanks to 28 nm manufacturing and new architectural decisions, the exercise is still interesting.

20. GeForce GTX 680: The Hunter Scores A Kill

Sometimes, when a new graphics card launches, we really have to put some effort into figuring out whether the performance and features justify the price. It’s not a science, and the right answer isn’t always crystal clear.

This is not one of those times.

GeForce GTX 680 is now the fastest single-GPU graphics card, and not by a margin that leaves room to hem or haw. Making matters worse for AMD, the GTX 680 is priced right between its Radeon HD 7970 and 7950. Provided that Nvidia's launch price sticks, both Radeon HD 7900s need to be significantly less expensive in order to compete. I'd expect to see the 7970 drop $100. The 7950 would have to slide $50 to leave some room between the 7870 and 7970.

Every indication points to the GeForce GTX 680 beginning its life as a GK104-based embryo, destined to do its duty as Nvidia's hunter-class card. With pointed strengths in gaming, compute performance was something it had to sacrifice, just like GeForce GTX 460. But fate dealt this chip a different hand when it proved competitive against AMD's flagship in games. GK104 would not be following in GF104's footsteps. Instead, it'd take the reins from the GF110-derived tank, GeForce GTX 580. In principle, that's like Rosie taking over for Oprah. But rather than falling on its face, GK104 turns out to be a great follow-up.

Make no mistake—AMD's Radeon HD 7970 serves up better frame rates than Nvidia's outgoing flagship at lower power. That's a recipe for superior performance per watt, and our index demonstrates AMD's success versus GeForce GTX 580. But then GeForce GTX 680 steps up with enough speed to outpace every other single-GPU card out there. And it only requires a pair of six-pin auxiliary power connectors. We can't quite corroborate Nvidia's claim that it improved on Fermi's performance per watt by 2x. But real data does suggest it gets 44% of the way there, which is still pretty crazy.

Given our benchmark data, power results, a distinguished list of features, and a competitive price tag, the GeForce GTX 680 is easily a better gaming card than Radeon HD 7970. And because Nvidia finally supports more than two display outputs, I can consider Kepler for my own workstation.

That is, of course, if I’m able to part ways with $500. Budget-constrained gamers should remember that the Radeon HD 7870, which AMD previewed earlier this month, just recently showed up on shopping sites. Sitting right around $360, I consider it a smarter value than both of the 7900s. It trades blows with GeForce GTX 580 in the benchmarks, and it sips power. Don’t let today’s GeForce GTX 680 news completely overshadow availability of what we consider to be a far more accessible piece of hardware. Kepler is cool, but it’s definitely pricey.

But hey, at least on the bright side, it should be available on launch day, and it should sell for close to Nvidia's estimated street price. Prior to the embargo lifting, Newegg had a bit of a slip and its GeForce GTX 680s were made available for a brief time. Tom's Hardware reader Doug Mytty sent us the above screen shot showing a number of brands selling cards around $500. A couple of others go quite a bit higher, but that's par for the course, really.