GK104: The Chip And Architecture

GeForce GTX 680 2 GB Review: Kepler Sends Tahiti On Vacation
GeForce GTX 680’s Vital Signs

Once we strip the card of its cooling hardware, we’re left with a bare board.

The GK104 GPU sits at its center, composed of 3.54 billion transistors on a 294 mm² die. Again, Nvidia lands between two of AMD’s current-gen offerings: the Tahiti GPU contains 4.31 billion transistors in a 365 mm² die, while Pitcairn is composed of 2.8 billion transistors in a 212 mm² die.

Because GK104 is manufactured on TSMC’s 28 nm node, supply will almost certainly be tight as multiple customers vie for the foundry’s limited output. With that said, Nvidia tells us to expect some availability at launch, with another wave of boards to arrive in early April.

Now, let’s have a look at the GPU’s specifications compared to GeForce GTX 580 and AMD’s Radeon HD 7950 and 7970:


                        GeForce GTX 680   Radeon HD 7950   Radeon HD 7970         GeForce GTX 580
Shaders                 1536              1792             2048                   512
Texture Units           128               112              128                    64
Full Color ROPs         32                32               32                     48
Graphics Clock          1006 MHz          800 MHz          925 MHz                772 MHz
Texture Fillrate        128.8 Gtex/s      89.6 Gtex/s      118.4 Gtex/s           49.4 Gtex/s
Memory Clock            1502 MHz          1250 MHz         1375 MHz               1002 MHz
Memory Bus              256-bit           384-bit          384-bit                384-bit
Memory Bandwidth        192.3 GB/s        240 GB/s         264 GB/s               192.4 GB/s
Graphics RAM            2 GB GDDR5        3 GB GDDR5       3 GB GDDR5             1.5 GB GDDR5
Die Size                294 mm²           365 mm²          365 mm²                520 mm²
Transistors (Billion)   3.54              4.31             4.31                   3
Process Technology      28 nm             28 nm            28 nm                  40 nm
Power Connectors        2 x 6-pin         2 x 6-pin        1 x 8-pin, 1 x 6-pin   1 x 8-pin, 1 x 6-pin
Maximum Power           195 W             200 W            250 W                  244 W
Price (Street)          $510              $450             $550                   ~$410
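The texture fillrate figures in the table follow from two other rows: texture units multiplied by graphics clock. The Python sketch below reproduces them under the assumption that each texture unit delivers one texel per clock; the function name and dictionary layout are ours, chosen only for illustration.

```python
def texture_fillrate_gtex(texture_units: int, graphics_clock_mhz: float) -> float:
    """Peak texture fillrate in Gtex/s, assuming one texel per unit per clock."""
    return texture_units * graphics_clock_mhz / 1000.0


# (texture units, graphics clock in MHz), taken from the specification table
cards = {
    "GeForce GTX 680": (128, 1006),
    "Radeon HD 7950": (112, 800),
    "Radeon HD 7970": (128, 925),
    "GeForce GTX 580": (64, 772),
}

for name, (units, clock_mhz) in cards.items():
    print(f"{name}: {texture_fillrate_gtex(units, clock_mhz):.1f} Gtex/s")
```

These are theoretical peaks at the listed base clocks, not measured results.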


The GK104 GPU is broken down into four Graphics Processing Clusters (GPCs), each of which contains two Streaming Multiprocessors (formerly referred to as SMs, and now called SMXs).

GeForce GTX 680's GK104 GPU

While there’s undoubtedly a lot of depth we could go into on the evolution from the original GF100 design to the GK104 architecture debuting today, the easiest way to characterize this new chip is in reference to the GF104 processor that first powered GeForce GTX 460.

In comparison to GF104's SM, GK104's SMX features two times as many warp schedulers (from two up to four), dispatch units (from four up to eight), and texture units (from eight up to 16) per Streaming Multiprocessor, along with a register file that’s twice as large. It also sports four times as many CUDA cores: GF104 included 48 shaders per SM; GK104 ups that to 192 shaders in each SMX.

GK104 SMX (Left) Versus GF104 SM (Right)
Per SM:                  GK104   GF104   Ratio
CUDA Cores               192     48      4x
Special Function Units   32      8       4x
Load/Store               32      16      2x
Texture Units            16      8       2x
Warp Schedulers          4       2       2x
Geometry Engines         1       1       1x


Why quadruple the number of CUDA cores and double the other resources? Kepler’s shaders run at the processor’s frequency (1:1). Previous-generation architectures (everything since G80, that is) operated the shaders two times faster than the core (2:1). Thus, doubling shader throughput at a given clock rate requires four times as many cores running at half-speed. 
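The arithmetic behind that claim can be sketched in a few lines of Python. The model is deliberately crude and is our own simplification: throughput per multiprocessor is treated as core count times the shader-to-core clock ratio, ignoring scheduling and memory effects.

```python
def relative_shader_throughput(cores: int, shader_clock_ratio: float) -> float:
    """Shader operations per core-clock cycle for one multiprocessor.

    shader_clock_ratio is the shader clock divided by the core clock
    (2.0 for the G80-through-Fermi designs, 1.0 for Kepler).
    """
    return cores * shader_clock_ratio


gf104_sm = relative_shader_throughput(cores=48, shader_clock_ratio=2.0)
gk104_smx = relative_shader_throughput(cores=192, shader_clock_ratio=1.0)

# Four times the cores at half the relative clock nets out to 2x.
print(gk104_smx / gf104_sm)  # 2.0
```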

The question then becomes: Why on earth would Nvidia throttle back its shader clock in the first place? It’s all about the delicate balance of performance, power, and die space, baby. Fermi allowed Nvidia’s architects to optimize for area. Fewer cores take up less space, after all. But running them twice as fast required much higher clock power. Kepler, on the other hand, is tuned for efficiency. Halving the shader clock slashes power consumption. However, comparable performance necessitates two times as many data paths. The result is that Kepler trades off die size for some reduction in power on the logic side, and more significant savings from clocking.

Additional die area and power are cut by eliminating some of the multi-ported hardware structures used to help schedule warps and moving that work to software. By minimizing the amount of power and area consumed by control logic, more space is freed up for doing useful work versus Fermi.

Nvidia claims that the underlying changes made to its SMX architecture result in a theoretical doubling of performance per watt compared to Kepler’s predecessor—and this is actually something we’ll be testing today.

Alright. We have eight of these SMXs, each with 192 CUDA cores, adding up to 1536. Sixteen texture units per SMX yield 128 in a fully-enabled GK104 processor. And one geometry engine per SMX gives us a total of eight. Now, wait a minute—GF104 had eight PolyMorph engines, too. So Nvidia multiplied all of these other resources out, but left its primitive performance alone? Not quite.
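As a quick check on those totals, here is a small Python sketch that multiplies the per-SMX resources out across the chip. The class and function names are hypothetical, and the counts are the ones quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiprocessorConfig:
    cuda_cores: int
    texture_units: int
    geometry_engines: int


# Per-SMX resources and chip layout as described in the article
GK104_SMX = MultiprocessorConfig(cuda_cores=192, texture_units=16, geometry_engines=1)
GK104_SMX_COUNT = 4 * 2  # four GPCs, two SMXs in each


def chip_totals(sm: MultiprocessorConfig, sm_count: int) -> dict:
    """Scale one multiprocessor's resources up to the whole GPU."""
    return {
        "cuda_cores": sm.cuda_cores * sm_count,
        "texture_units": sm.texture_units * sm_count,
        "geometry_engines": sm.geometry_engines * sm_count,
    }


totals = chip_totals(GK104_SMX, GK104_SMX_COUNT)
print(totals)  # {'cuda_cores': 1536, 'texture_units': 128, 'geometry_engines': 8}
```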

To begin, Nvidia claims that each PolyMorph engine is redesigned to deliver close to two times the per-clock performance of Fermi’s fixed-function geometry logic. This improvement is primarily observable in synthetic metrics, which we’ve seen developers (rightly) argue should represent the future of their efforts, exploiting significantly more geometry than today’s titles to augment realism. But if you’re playing something available right now, like HAWX 2, how can you expect Kepler to behave?

In absolute terms, GeForce GTX 680 easily outperforms the Radeon HD 7970 and 7950, regardless of whether tessellation is used or not. However, Nvidia’s new card takes a 31% performance hit when you toggle tessellation on in the game’s settings. In comparison, the Tahiti-based Radeons take a roughly 16% ding. Now, there’s a lot more going on in a game than just tessellation to affect overall performance. But it’s still interesting to note that, in today’s titles, the advantage Nvidia claims doesn’t necessarily pan out. 

In order to support the theoretically-higher throughput of its geometry block, Nvidia also doubles the number of raster engines compared to GF104, striking a 1:1 ratio with the ROP partitions.

Like GF104, each of GK104’s four ROP partitions outputs eight 32-bit integer pixels per clock, adding up to 32. The two GPUs also share a 256-bit aggregate memory bus. Where they differ is maximum memory bandwidth.

Perhaps more true to its mid-range price point, GeForce GTX 460 included up to 1 GB of GDDR5 running at 900 MHz, yielding 115.2 GB/s. GeForce GTX 680 is being groomed to succeed GeForce GTX 580, though, which utilized 1002 MHz GDDR5 on a 384-bit aggregate bus to serve up 192.4 GB/s of throughput. Thus, GeForce GTX 680 comes armed with 2 GB of GDDR5 operating at 1502 MHz, resulting in a very similar 192.26 GB/s figure.
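Those bandwidth numbers can be reproduced from the memory clock and bus width. The sketch below assumes the quoted clocks are GDDR5 command clocks, with four data transfers per clock on every pin; the function name and its default parameter are ours.

```python
def gddr5_bandwidth_gbs(memory_clock_mhz: float, bus_width_bits: int,
                        transfers_per_clock: int = 4) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes GDDR5 moves four data words per quoted memory clock.
    """
    bytes_per_transfer = bus_width_bits / 8
    return memory_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


print(f"GeForce GTX 460: {gddr5_bandwidth_gbs(900, 256):.1f} GB/s")    # 115.2
print(f"GeForce GTX 580: {gddr5_bandwidth_gbs(1002, 384):.1f} GB/s")   # 192.4
print(f"GeForce GTX 680: {gddr5_bandwidth_gbs(1502, 256):.2f} GB/s")   # 192.26
```

The calculation shows how GeForce GTX 680 makes up for its narrower bus: a 50% higher memory clock on two-thirds of the bus width lands almost exactly where GeForce GTX 580 did.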

Top Comments
  • 44
    borden5 , March 22, 2012 12:55 PM
    oh man this's good news for consumer, hope to see a price war soon
  • 38
    Anonymous , March 22, 2012 12:46 PM
    Hail to the new king.
  • 33
    outlw6669 , March 22, 2012 12:59 PM
    Nice results, this is how the transition to 28nm should be.
    Now we just need prices to start dropping, although significant drops will probably not come until the GK110 is released :/ 
Other Comments
  • 26
    johnners2981 , March 22, 2012 12:58 PM
    Damn prices, in europe we have to pay the equivalent of $650-$700 to get one
  • 23
    Anonymous , March 22, 2012 1:00 PM
    Finally we will see prices going down (either way :-) )
  • -4
    Scotty99 , March 22, 2012 1:03 PM
    Its a midrange card, anyone who disagrees is plain wrong. Thats not to say its a bad card, what happened here is nvidia is so far ahead of AMD in tech that the mid range card purposed to fill the 560ti in the lineup actually competed with AMD's flagship. If you dont believe me that is fine, you will see in a couple months when the actual flagship comes out, the ones with the 384 bit interface.
  • 26
    Chainzsaw , March 22, 2012 1:04 PM
    Wow not too bad. Looks like the 680 is actually cheaper than the 7970 right now, about 50$, and generally beats the 7970, but obviously not at everything.

    Good going Nvidia...
  • 32
    SkyWalker1726 , March 22, 2012 1:05 PM
    AMD will certainly Drop the price of the 7xxx series
  • 20
    rantoc , March 22, 2012 1:13 PM
    2x of thoose ordered and will be delivered tomorrow, will be a nice geeky weekend for sure =)
  • 23
    Scotty99 , March 22, 2012 1:21 PM
    scrumworks wrote: Nothing surprising here. Little overclocking can put Tahiti right at the same level. Kepler is actually losing to Tahiti in really demanding games like Metro 2033 that uses the latest tech. Pointless to test ancient and low tech games like World of Warcrap that is ancient, uses dx9 and is not considered cutting edge in any meter.


    Sigh...

    WoW has had DX11 for quite a long time now. Also, go play in a 25 man raid with every detail setting on ultra with 8xAA and 16x AAF and tell me WoW is not taxing on a PC.
  • 16
    yougotjaked , March 22, 2012 1:21 PM
    Wait what does it mean by "if you’re interested in compute potential, you’ll have to keep waiting"?
  • 0
    dragonsqrrl , March 22, 2012 1:22 PM
    Just throwing this out there now, but some AMD fanboy will find a way to discredit or marginalize these results.

    ...oh, wait.
  • 24
    klausey , March 22, 2012 1:24 PM
    Great to see nVidia jumping back into the game and forcing AMD to lower its prices accordingly. I was shocked to see the card actually available at the MSRP of $500 on launch day. I guess we'll see how long that lasts.

    For everyone suggesting that nVidia will release another true "flagship" beyond the 680, I think you are spot on, IF AMD gives them a reason to. There's no reason to push it at the moment as they already hold the crown. If, on the other hand, AMD goes out and makes a 7980, or 79070 SE card with higher clocks (more like what the 7970 can achieve when properly overclocked), I definitely see nVidia stepping their game up a bit.

    Either way, it's awesome to see both AMD and now nVidia taking power consumption into consideration. I'm tired of my computer room feeling like a toaster after an all nighter.
  • 18
    rantoc , March 22, 2012 1:24 PM
    yougotjaked wrote: Wait what does it mean by "if you’re interested in compute potential, you’ll have to keep waiting"?


    He means waiting for the GK110, that will be a more of a compute card while this GK104 is more equiped towards gaming.
  • 13
    EXT64 , March 22, 2012 1:26 PM
    Really disappointing DP compute, but a tradeoff had to be made and this card is meant for gaming, so I can understand their position. Hopefully GK110 is a real card and will eventually come out.
  • -7
    Anonymous , March 22, 2012 1:27 PM
    but will it run tetris with multiple displays?
  • 7
    amk-aka-Phantom , March 22, 2012 1:27 PM
    Oh yeah, team green strikes back! :D  Now let's see what 660 Ti will be like, might suggest that to a friend.