The Differences Between Hawaii And Tahiti GPUs
While Hawaii's 438 mm² die is still smaller than the GK110 on Nvidia's Quadro K6000, it's the largest GPU AMD has ever manufactured. The legendary R600 was a "mere" 420 mm².
In most respects, the implementation of AMD's Graphics Core Next architecture on Hawaii is almost identical to that of the FirePro W9000’s Tahiti GPU. Specifically, the Compute Unit building block is the same: each CU's 64 IEEE 754-2008-compliant shaders are split across four vector units, complemented by 16 texture fetch load/store units.
There are, of course, improvements over the Tahiti GPU on AMD's FirePro W9000. These include device flat addressing to support standard calling conventions, more precise native LOG and EXP operations, and optimizations to the Masked Quad Sum of Absolute Differences (MQSAD) function that speed up motion estimation algorithms.
With the introduction of DirectX 11.2, Hawaii also adds programmable LOD clamping and the ability to tell a shader whether a surface is resident. Both are tier-two features associated with tiled resources.
The main departure from the W9000's GPU is the arrangement of Compute Units. Whereas Tahiti employs 32 CUs, totaling 2048 shaders and 128 texture units, Hawaii wields 44 CUs organized into four of what AMD calls Shader Engines. The math adds up to 2816 aggregate shaders and 176 texture units.
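The totals follow directly from the per-CU resources. Dividing Tahiti's figures by its 32 CUs gives 64 shaders and four texture units per CU, and the short Python sketch below scales those numbers to each chip. The `gpu_totals` helper is purely illustrative and not part of any AMD tool.

```python
# Per-CU resources in Graphics Core Next, derived from the Tahiti totals
# above (2048 / 32 = 64 shaders, 128 / 32 = 4 texture units).
SHADERS_PER_CU = 64
TEXTURE_UNITS_PER_CU = 4


def gpu_totals(compute_units: int) -> tuple[int, int]:
    """Return (shaders, texture units) for a GCN GPU with the given CU count."""
    return compute_units * SHADERS_PER_CU, compute_units * TEXTURE_UNITS_PER_CU


tahiti = gpu_totals(32)      # FirePro W9000
hawaii = gpu_totals(4 * 11)  # FirePro W9100: four Shader Engines x 11 CUs

print(f"Tahiti: {tahiti[0]} shaders, {tahiti[1]} texture units")
print(f"Hawaii: {hawaii[0]} shaders, {hawaii[1]} texture units")
print(f"Added shaders: {hawaii[0] - tahiti[0]}")
```

The difference of 768 shaders is the same figure that comes up again when considering the extra load on Hawaii's memory subsystem.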
The new GPU employs eight revamped Asynchronous Compute Engines, responsible for scheduling real-time and background tasks to the CUs. The W9000 has only two. Each ACE manages up to eight queues, totaling 64, and has access to L2 cache and shared memory.
Dedicating more hardware to arbitrating between compute and graphics work makes a lot of sense, since it improves overall efficiency.
The W9000’s front-end fed vertex data to the shaders through a pair of geometry processors. Given its quad-Shader Engine layout, the FirePro W9100 doubles that number, facilitating four primitives per clock cycle instead of two. There’s also more inter-stage storage between the front- and back-end to hide latencies and realize as much of that peak primitive throughput as possible.
In addition to a dedicated geometry engine (and 11 CUs), each Shader Engine has its own rasterizer and four render back-ends, together capable of 16 pixels per clock. That’s 64 pixels per clock across the GPU, twice what the W9000’s GPU could do. The W9100’s Hawaii chip also enables up to 256 depth and stencil operations per cycle, again doubling Tahiti’s 128.
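Because each Shader Engine brings its own geometry engine, rasterizer, and render back-ends, the chip-wide peaks scale with the number of engines. The sketch below multiplies per-engine rates taken from the text. Two points are assumptions: Tahiti is modeled as two such engines (matching its pair of geometry processors), and the 64 depth/stencil operations per engine are simply 256 divided evenly by four. The `ShaderEngine` class and `chip_peaks` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderEngine:
    """Peak rates per Shader Engine, per clock (figures from the text)."""
    primitives: int = 1       # one geometry engine per Shader Engine
    pixels: int = 16          # four render back-ends, 4 pixels each
    depth_stencil: int = 64   # assumption: 256 split evenly over 4 engines


def chip_peaks(engines: int, se: ShaderEngine = ShaderEngine()) -> dict[str, int]:
    """Scale per-engine rates to chip-wide peaks per clock."""
    return {
        "primitives": engines * se.primitives,
        "pixels": engines * se.pixels,
        "depth_stencil": engines * se.depth_stencil,
    }


tahiti = chip_peaks(2)  # W9000: assumed two engines
hawaii = chip_peaks(4)  # W9100: four Shader Engines
for name in hawaii:
    print(f"{name}: Tahiti {tahiti[name]}/clk, Hawaii {hawaii[name]}/clk")
```

Every peak doubles because the engine count doubles; the per-engine resources stay the same.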
On a graphics card designed for high resolutions, a high pixel fill rate comes in handy. According to AMD, it often shifts the chip’s performance bottleneck from fill rate to memory bandwidth.
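A back-of-the-envelope estimate shows why extra fill rate can push the bottleneck onto memory. The sketch assumes a 930 MHz engine clock and 32-bit (4-byte) color writes. It also ignores caching, compression, and depth traffic, so treat the numbers as a rough indication of demand rather than a measurement. The 320 GB/s figure is the W9100's peak memory bandwidth quoted later in this section.

```python
# Assumed values for illustration only.
ENGINE_CLOCK_HZ = 930e6        # assumed W9100 engine clock
PIXELS_PER_CLOCK = 64          # Hawaii back-end peak (from the text)
BYTES_PER_PIXEL = 4            # assumed 32-bit RGBA8 color target
MEMORY_BANDWIDTH_GBPS = 320.0  # W9100 peak: 512-bit bus at 5 Gb/s

fill_rate = PIXELS_PER_CLOCK * ENGINE_CLOCK_HZ  # pixels per second

# Plain writes touch each pixel once; alpha blending reads it, then writes it.
write_only_gbps = fill_rate * BYTES_PER_PIXEL / 1e9
blended_gbps = 2 * write_only_gbps

print(f"Peak fill rate: {fill_rate / 1e9:.2f} Gpixels/s")
print(f"Color traffic, writes only: {write_only_gbps:.1f} GB/s")
print(f"Color traffic, blended:     {blended_gbps:.1f} GB/s")
print("Exceeds memory bandwidth when blending:", blended_gbps > MEMORY_BANDWIDTH_GBPS)
```

Under these assumptions, plain color writes alone use roughly three-quarters of the bus at peak fill, and blending would demand more than the bus can supply. This is consistent with AMD's point that memory bandwidth, not fill, becomes the limit.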
The shared L2 read/write cache grows from 768 KB in Tahiti to 1 MB, divided into sixteen 64 KB partitions. The 33% capacity increase comes from adding partitions (Tahiti has twelve), so bandwidth between the L1 and L2 caches rises by the same 33%, topping out at 1 TB/s.
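Both 33% figures follow from the partition count. The sketch below assumes that Tahiti's 768 KB uses the same 64 KB slices and that each slice contributes an equal share of L1-to-L2 bandwidth, so aggregate bandwidth grows in proportion to the number of slices.

```python
PARTITION_KB = 64  # size of one L2 partition (from the text)

tahiti_l2_kb = 768
hawaii_l2_kb = 1024  # 1 MB

tahiti_slices = tahiti_l2_kb // PARTITION_KB
hawaii_slices = hawaii_l2_kb // PARTITION_KB

# Assumption: every slice adds the same bandwidth, so capacity growth
# and bandwidth growth are the same ratio.
growth = hawaii_slices / tahiti_slices - 1
print(f"Partitions: {tahiti_slices} -> {hawaii_slices}, growth {growth:.0%}")
```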
It makes sense, then, that increasing geometry throughput, adding 768 shaders, and doubling the back-end’s peak pixel fill would put additional demands on Hawaii’s memory subsystem. AMD addresses this with a redesigned controller.
The new GPU features a 512-bit aggregate interface that the company says occupies about 20% less area than Tahiti’s 384-bit design and enables 50% more bandwidth per mm².
How is this possible? Supporting very fast data rates costs die space. Tahiti’s bus had to hit 6 Gb/s at a higher voltage, which made it less efficient than Hawaii’s. Hawaii’s bus targets lower frequencies at a lower voltage and can consequently be smaller. Operating at 5 Gb/s on the FirePro W9100, the 512-bit bus pushes up to 320 GB/s. In comparison, Tahiti maxed out at 288 GB/s.
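Peak memory bandwidth is the bus width in bits times the per-pin data rate, divided by eight to convert bits to bytes. The sketch below reproduces both figures; `peak_bandwidth_gbs` is a hypothetical helper name.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


tahiti_bw = peak_bandwidth_gbs(384, 6.0)  # FirePro W9000
hawaii_bw = peak_bandwidth_gbs(512, 5.0)  # FirePro W9100
print(f"Tahiti: {tahiti_bw:.0f} GB/s, Hawaii: {hawaii_bw:.0f} GB/s")
```

The wider bus more than offsets the slower signaling: one-third more data pins at roughly 17% lower speed still nets about 11% more peak bandwidth.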