The GeForce GTX 560 Ti Review
Back in July of last year, I speculated that Nvidia turned off one Streaming Multiprocessor on the GeForce GTX 460’s GF104 processor to avoid blowing GeForce GTX 465 out of the water. Wouldn’t it have been cool to see a version of that chip with all eight SMs enabled, though? I mean, the GeForce GTX 460 was an award winner with 336 CUDA cores and 56 texture units. And yes, the fully-functioning version might have thrown a wrench into more expensive (and less attractive) GeForce boards. But it also would have wrecked the Radeon HD 6800-series cards a full three months before they launched.
Of course, it made sense that Nvidia wouldn’t do such a thing. The GeForce GTX 465 had enough trouble competing in a world without GF104. That card is no longer being manufactured, though. And we have to imagine that Nvidia is eager to shift away from GF100-based boards entirely.
Up until today, the company had one card left employing its original Fermi-based GPU: GeForce GTX 470. At $259, down from a launch price of $350, the 470 filled an important gap in Nvidia’s portfolio between the $200 GTX 460 and the $349 GTX 570. It’s amazing how healthy competition turns into better value for us game enthusiasts, isn’t it?
GF100 says its final farewell today, just under a year after its original introduction. It’s being replaced by the GeForce GTX 560 Ti. Rather than center on an unrestricted version of GF104, GTX 560 Ti employs a re-spun derivative of the chip that folds in some of the transistor-level improvements first seen in the GeForce GTX 580.
The result is a graphics processor armed with fewer than two billion transistors (1.95 billion, according to Nvidia) that universally matches or exceeds the performance of the three billion-transistor GF100 as it appeared on GeForce GTX 470.
Building A Faster Gaming GPU
We all know that Nvidia’s GeForce GTX 500-series is hardly different from the 400-series, architecturally.
The GeForce GTX 580 and 570 both center on GF110—a reworked GF100 with improved texture filtering, better Z-cull efficiency, and a number of transistor-level optimizations that facilitate higher clocks at comparable power use.
Similarly, GeForce GTX 560 Ti centers on GF114, the re-spun version of GF104. Before you get too excited, though, remember that GF104 already incorporated the texture filtering improvements that didn’t make it into GF100. That is to say, 64-bit FP16 texel throughput doubled from two per clock to four per clock, per texture unit. GF104 has this capability and GF110 has it, but GF100 did not. What’s more, Nvidia decided not to carry over the Z-cull improvements from GF110, instead choosing to leave the raster engine unchanged.
The net effect is that GF114 is functionally identical to GF104. In fact, Nvidia even cites the same 1.95 billion transistor count. And we’re still looking at TSMC’s 40 nm process here.
The reworked silicon can clock higher at lower power levels, yielding more performance, but it’s still an improved GF104. Of course, the main difference is that, while Nvidia turned off one of GF104’s Streaming Multiprocessors to create GeForce GTX 460, GeForce GTX 560 Ti sports an unshorn GF114 (I challenge you to use that word in a sentence at some point today). Compared to the 460, that means higher clocks, more CUDA cores, theoretically higher geometry performance due to an eighth PolyMorph engine, and eight additional texture units. All of those factors combine to create a card that doesn’t end up replacing GeForce GTX 460 at all; instead, it is fast enough to eclipse the GeForce GTX 470.
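For anyone who wants to check the unit counts, here is a minimal Python sketch of the arithmetic. It takes the GeForce GTX 460 figures quoted earlier (336 CUDA cores and 56 texture units with seven of eight SMs active) and scales them to a fully enabled chip. The function and constant names are purely illustrative, and the sketch assumes resources are split evenly across SMs.

```python
# Figures quoted in this review for GeForce GTX 460 (GF104 with one SM disabled).
GTX460_CUDA_CORES = 336
GTX460_TEXTURE_UNITS = 56

TOTAL_SMS = 8                      # SMs physically present on GF104/GF114
GTX460_ACTIVE_SMS = TOTAL_SMS - 1  # one SM turned off on GTX 460


def per_sm(total: int, active_sms: int) -> int:
    """Split a chip-wide total evenly across the active SMs (assumed uniform)."""
    value, remainder = divmod(total, active_sms)
    if remainder:
        raise ValueError("total does not divide evenly across the active SMs")
    return value


cores_per_sm = per_sm(GTX460_CUDA_CORES, GTX460_ACTIVE_SMS)
tex_per_sm = per_sm(GTX460_TEXTURE_UNITS, GTX460_ACTIVE_SMS)

# Re-enable the eighth SM to get the fully enabled configuration.
full_cores = cores_per_sm * TOTAL_SMS
full_tex = tex_per_sm * TOTAL_SMS

print(f"Per SM: {cores_per_sm} CUDA cores, {tex_per_sm} texture units")
print(f"All {TOTAL_SMS} SMs: {full_cores} CUDA cores, {full_tex} texture units")
print(f"Gain over GTX 460: {full_cores - GTX460_CUDA_CORES} cores, "
      f"{full_tex - GTX460_TEXTURE_UNITS} texture units")
```

The eight-unit texture gain matches the figure cited above; the core count simply follows from the same per-SM division.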
Those Specs Look Familiar
If you already know GF104, then this is going to look a bit like a study guide from Last Year’s GPUs 101. GF114 consists of two Graphics Processing Clusters (GPCs), each with four Streaming Multiprocessors. As you already know, all eight SMs are fully enabled in GeForce GTX 560 Ti.
Taken (and modified) from my GeForce GTX 460 review:
“Instead of the GF100’s 32 CUDA cores per SM, GF114 wields 48 cores per SM. Keeping these more complex SMs fed with information necessitates higher instruction throughput, so we see another enhancement: taking GF100’s two dispatch units per SM to GF114’s four. Similarly, each SM now boasts eight texture units (instead of four).
In the simplest terms possible, this is a wider GPU than GF100/GF110. The result is better performance than a scaled-down GF100 in the types of apps that most people play today.
The chip’s back-end is a bit different, too. A complete GF100 offers six ROP partition units independent of the GPCs, each capable of outputting eight 32-bit integer pixels per clock (totaling 48). Each of the six partitions is also associated with its own 64-bit memory path, yielding an aggregate 384-bit bus. GF114 gets a maximum of four partitions, yielding up to 32 pixels per clock and a 256-bit bus.”
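The back-end numbers in that excerpt follow directly from the partition count, and a short Python sketch makes the relationship explicit. The `BackEnd` class and its field names are hypothetical, invented here for illustration; only the per-partition figures (eight pixels per clock, one 64-bit memory path) come from the text.

```python
from dataclasses import dataclass

PIXELS_PER_PARTITION = 8     # 32-bit integer pixels per clock, per ROP partition
BUS_BITS_PER_PARTITION = 64  # width of the memory path tied to each partition


@dataclass(frozen=True)
class BackEnd:
    """Illustrative model of a GPU back-end built from ROP partitions."""
    name: str
    rop_partitions: int

    @property
    def pixels_per_clock(self) -> int:
        return self.rop_partitions * PIXELS_PER_PARTITION

    @property
    def bus_width_bits(self) -> int:
        return self.rop_partitions * BUS_BITS_PER_PARTITION


GF100 = BackEnd("GF100", rop_partitions=6)
GF114 = BackEnd("GF114", rop_partitions=4)

for chip in (GF100, GF114):
    print(f"{chip.name}: {chip.pixels_per_clock} pixels/clock, "
          f"{chip.bus_width_bits}-bit bus")
```

Modeling both values as properties of the partition count keeps the two figures in lockstep, which is why a chip with fewer partitions loses ROP throughput and bus width together.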