GF110: Nvidia Gives Fermi A Facelift
As I’ve mentioned, GF110 is decidedly evolutionary, and Nvidia already had a foundation to build on with GF100, so the number of changes made this time around is actually pretty small. We’re still looking at a 40 nm chip consisting of roughly three billion transistors.
First off, the GPU itself is largely the same. This isn’t a GF100 to GF104 sort of change, where Streaming Multiprocessors get reoriented to improve performance at mainstream price points (read: more texturing horsepower). The emphasis here remains compute muscle. Really, there are only two feature changes: full-speed FP16 filtering and improved Z-cull efficiency.
GF110 can perform FP16 texture filtering in one clock cycle (similar to GF104), while GF100 required two cycles. In texturing-limited applications, this speed-up may translate into performance gains. The culling improvements give GF110 an advantage in titles that suffer lots of overdraw, helping make better use of available memory bandwidth. Nvidia claims these two enhancements are worth up to roughly 14% on a clock-for-clock basis.
|Specification||GeForce GTX 580||GeForce GTX 480||GeForce GTX 470|
|Graphics Processing Clusters (GPCs)||4||4||4|
|Streaming Multiprocessors (SMs)||16||15||14|
|Graphics Clock||772 MHz||700 MHz||607 MHz|
|Shader Clock||1544 MHz||1401 MHz||1215 MHz|
|Memory Clock (Data Rate)||1002 MHz (4008 MT/s)||924 MHz (3696 MT/s)||837 MHz (3348 MT/s)|
|Memory Capacity||1.5 GB GDDR5||1.5 GB GDDR5||1.25 GB GDDR5|
|Memory Bandwidth||192.4 GB/s||177.4 GB/s||133.9 GB/s|
|Fillrate||49.4 GTexels/s||42.0 GTexels/s||34.0 GTexels/s|
|Manufacturing Process||40 nm TSMC||40 nm TSMC||40 nm TSMC|
|Display Outputs||2 x DL-DVI, 1 x mini-HDMI||2 x DL-DVI, 1 x mini-HDMI||2 x DL-DVI, 1 x mini-HDMI|
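The bandwidth and fillrate rows follow directly from the clocks and unit counts. Here is a minimal Python sketch of that arithmetic. The clocks, data rates, and SM counts come from the table, and the four texture units per SM come from this article. The 384-bit bus for the GTX 480 and the 320-bit bus for the GTX 470 are not stated here and are assumptions on our part, as are all of the function and variable names.

```python
# Peak theoretical figures derived from the spec table.
# bus_bits for the GTX 480 (384) and GTX 470 (320) are assumptions,
# not values stated in the article.
CARDS = {
    "GeForce GTX 580": {"sms": 16, "core_mhz": 772, "data_rate_mts": 4008, "bus_bits": 384},
    "GeForce GTX 480": {"sms": 15, "core_mhz": 700, "data_rate_mts": 3696, "bus_bits": 384},
    "GeForce GTX 470": {"sms": 14, "core_mhz": 607, "data_rate_mts": 3348, "bus_bits": 320},
}

TEXTURE_UNITS_PER_SM = 4  # per the article's description of one SM


def memory_bandwidth_gbs(data_rate_mts, bus_bits):
    """Peak memory bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_bits / 8
    return data_rate_mts * bytes_per_transfer / 1000  # MT/s * bytes = MB/s, then GB/s


def texel_fillrate_gtexels(sms, core_mhz):
    """Peak bilinear texel rate in GTexels/s: one texel per texture unit per core clock."""
    texture_units = sms * TEXTURE_UNITS_PER_SM
    return texture_units * core_mhz / 1000  # units * MHz = MTexels/s, then GTexels/s


if __name__ == "__main__":
    for name, c in CARDS.items():
        bw = memory_bandwidth_gbs(c["data_rate_mts"], c["bus_bits"])
        fill = texel_fillrate_gtexels(c["sms"], c["core_mhz"])
        print(f"{name}: {bw:.1f} GB/s, {fill:.1f} GTexels/s")
```

These are theoretical peaks, so they say nothing about how much of that throughput a given game actually sustains.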
Part of the shift from GF100 to GF110 involves a chip-level re-work. There are different types of transistors an architect can use to build an integrated circuit, depending on the properties they want to impart. Nvidia’s engineers went back to their GF100 design and purportedly modified much of it, implementing slower, lower-leakage transistors in less timing-sensitive paths and faster, higher-leakage transistors in other areas.
The result was a power savings significant enough to let Nvidia not only turn on the 16th Streaming Multiprocessor that was disabled on GF100 (adding 32 CUDA cores, four texture units, and a single PolyMorph geometry engine), but also ramp up clock rates. Whereas the GeForce GTX 480 sported core/shader/memory frequencies of 700/1401/924 MHz, GeForce GTX 580 employs a 772 MHz core clock, a 1544 MHz shader frequency, and a 1002 MHz memory clock (which translates to a 4008 MT/s data rate). All told, GF110 offers specifications closer to what we were expecting earlier this year. It includes 512 functional CUDA cores, 64 texture units, and 16 PolyMorph engines.
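To put numbers on that, here is a rough Python sketch that scales the per-SM resources quoted above (32 CUDA cores, four texture units, one PolyMorph engine) by the SM count, then works out the clock increases from GTX 480 to GTX 580. The helper names are ours, purely for illustration.

```python
# Per-SM resources as described in the article.
CUDA_CORES_PER_SM = 32
TEXTURE_UNITS_PER_SM = 4
POLYMORPH_ENGINES_PER_SM = 1


def unit_counts(sms):
    """Total functional units for a chip with the given number of enabled SMs."""
    return {
        "cuda_cores": sms * CUDA_CORES_PER_SM,
        "texture_units": sms * TEXTURE_UNITS_PER_SM,
        "polymorph_engines": sms * POLYMORPH_ENGINES_PER_SM,
    }


def uplift_percent(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1) * 100


if __name__ == "__main__":
    print("GTX 580 (16 SMs):", unit_counts(16))
    print("GTX 480 (15 SMs):", unit_counts(15))

    # Core / shader / memory clocks in MHz, GTX 580 vs. GTX 480.
    for label, new, old in [("core", 772, 700), ("shader", 1544, 1401), ("memory", 1002, 924)]:
        print(f"{label} clock: +{uplift_percent(new, old):.1f}%")
```

Roughly 10% more core and shader clock and about 8% more memory clock, on top of one extra SM, is where the GTX 580's on-paper advantage over the GTX 480 comes from.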
The independent back-end looks the same on a block diagram, the main change being the faster memory clock. It still features six ROP partitions, each associated with a 64-bit memory interface (totaling 384-bit aggregate). Each partition is capable of outputting eight 32-bit integer pixels at a time, totaling 48 pixels per clock.
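The back-end totals are simple multiplication, sketched below in Python. The partition count, per-partition bus width, and pixels per partition come from the paragraph above. The ROP-side pixel rate at 772 MHz is our own derived upper bound rather than a figure Nvidia quotes, and other stages of the pipeline may keep the chip from reaching it.

```python
# Back-end layout as described in the article.
ROP_PARTITIONS = 6
BUS_BITS_PER_PARTITION = 64
PIXELS_PER_PARTITION_PER_CLOCK = 8  # 32-bit integer pixels


def aggregate_bus_bits():
    """Total memory interface width across all ROP partitions."""
    return ROP_PARTITIONS * BUS_BITS_PER_PARTITION


def pixels_per_clock():
    """Peak 32-bit integer pixels the ROPs can output per core clock."""
    return ROP_PARTITIONS * PIXELS_PER_PARTITION_PER_CLOCK


def rop_peak_gpixels(core_mhz):
    """ROP-side upper bound in GPixels/s (derived figure, not from the article)."""
    return pixels_per_clock() * core_mhz / 1000


if __name__ == "__main__":
    print(f"Memory interface: {aggregate_bus_bits()}-bit")
    print(f"ROP output: {pixels_per_clock()} pixels/clock")
    print(f"ROP peak at 772 MHz: {rop_peak_gpixels(772):.1f} GPixels/s")
```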
Telling That Tessellation Tale
Geometry is suddenly causing quite a stir. When Nvidia briefed us on its Fermi architecture and how it related to gaming, DirectX 11 and tessellation came up over and over. The fact that the company built one of its PolyMorph engines into each of its Streaming Multiprocessors (16 in total on the GF110 ASIC) would purportedly make a huge difference in titles that extensively utilized tessellation to help increase realism. The problem, of course, was that there weren't any games to show off at the time. Yeah, DiRT 2 and Aliens vs. Predator were out, but both "first-gen" DirectX 11 titles are very choosy in where geometry gets added (neither was able to back Nvidia's claims that more geometry was the future of gaming).
And then there was HAWX 2. The game doesn't actually launch until after the GeForce GTX 580, but a review copy of the game did arrive a couple of days ago. We plan to roll this one into the test suite, just as we used HAWX before it. The controversy, it seems, is that AMD feels HAWX 2 employs an unrealistically high tessellation factor, handicapping the performance of its cards. The company claims to be lobbying for a patch of some sort that'd enable a definable degree of geometry. But once the game starts shipping, with or without the patch, that's the experience available to gamers, and we'll report on it in whichever state it exists.
For now, we're using Unigine to measure the scaling performance of each architecture as geometry is ramped up.
Though AMD claims its approach to tessellation is the right one, it's pretty clear that as complexity increases, Nvidia's design maintains more of its performance. The extra SM (and PolyMorph engine) works in concert with higher clocks to give the GeForce GTX 580 a fairly commanding win over the 480 with tessellation set to Extreme in this test. The Radeon HD 5870 can't even keep up with Nvidia's GeForce GTX 470 once complexity is cranked up.
Also interesting is the fact that AMD's Radeon HD 6870 outperforms the technically more capable 5870. AMD claimed that tessellation performance was one of the attributes it worked on with the Barts GPU, and despite its shader deficiency, the newer card finishes ahead (and with a higher minimum frame rate).