Harnessing The Potential Of GCN
From V- To W-Series: In A New League
The new FirePro W family centers on AMD’s Graphics Core Next (GCN) architecture, which is the design used in the company's Radeon HD 7xxx-series desktop boards. The FirePro W boards succeed the V-series, which employed an older Very Long Instruction Word (VLIW) architecture. VLIW enabled decent 3D performance, but it struggled in compute-heavy applications. GCN was designed to alleviate that issue, and we've already seen it do wonders for AMD's consumer offerings in that regard.
A Detailed Look At The Compute Unit
GCN's Compute Unit (CU) replaces the Single Instruction Multiple Data (SIMD) engine that AMD has used since the days of its Radeon HD 2000. A CU consists of four vector units (VUs), each of which contains 16 ALUs and its own register file. Each VU can operate independently and executes one quarter of a wavefront (a group of 64 work-items) per clock cycle, so it needs four cycles to finish one wavefront instruction. Because the four VUs work in parallel, a CU completes four wavefront instructions every four clock cycles, or one per clock cycle on average. Although the VUs execute in vector mode, each lane can be programmed as if it were running ordinary scalar code.
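To make the "one wavefront per clock on average" arithmetic concrete, here is a minimal Python model of a single CU. It assumes four 16-lane VUs, 64-item wavefronts, and a CU that starts work on one VU per cycle in round-robin order. The function name `simulate_cu` and the round-robin policy are illustrative assumptions, not a description of AMD's actual scheduler.

```python
# Minimal model of one GCN-style Compute Unit (CU), assuming:
#   - 4 vector units (VUs), each 16 lanes wide
#   - a wavefront is 64 work-items, so one VU needs 64 / 16 = 4 cycles
#     to run one wavefront instruction
#   - the CU starts one VU per cycle in round-robin order (hypothetical)

VUS_PER_CU = 4
LANES_PER_VU = 16
WAVEFRONT_SIZE = 64

CYCLES_PER_WAVEFRONT = WAVEFRONT_SIZE // LANES_PER_VU  # 4 cycles per VU


def simulate_cu(num_wavefront_instructions: int) -> int:
    """Return the cycle at which the last of the given wavefront
    instructions finishes on one CU, under the assumptions above."""
    # Cycle at which each VU becomes free again.
    vu_free_at = [0] * VUS_PER_CU
    finish = 0
    for i in range(num_wavefront_instructions):
        vu = i % VUS_PER_CU  # round-robin VU selection
        # Start no earlier than issue slot i and no earlier than the
        # moment this VU finished its previous wavefront instruction.
        start = max(i, vu_free_at[vu])
        end = start + CYCLES_PER_WAVEFRONT
        vu_free_at[vu] = end
        finish = max(finish, end)
    return finish
```

Under this model, `simulate_cu(400)` returns 403: after a three-cycle ramp-up, the CU retires one wavefront instruction per clock even though each individual VU needs four clocks per wavefront.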
Each CU also has a scalar unit that handles operations such as flow control. The VUs could take on that work, but they are better suited to other tasks. In addition, every CU has four texture units connected to a 16 KB read/write L1 cache. That cache is not only twice the size of the one in the VLIW4 architecture, but it can also be written to rather than only read from.
FirePro W9000 Hits 1 TFLOP Of Double-Precision Math
The Tahiti GPU in AMD's flagship FirePro W9000 features 32 CUs. Each sports 64 ALUs, for a total of 2048 ALUs. At a GPU clock of 975 MHz, that yields up to 4 TFLOPs of single-precision (32-bit) compute performance and 1 TFLOP of double-precision floating-point math. Because 1 TFLOP makes a good marketing figure, it's a fair bet that AMD chose the W9000's 975 MHz clock rate for this specific reason. The second-fastest model, the W8000, runs at a more conservative 900 MHz.
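The peak figures follow from simple multiplication. The sketch below assumes each ALU retires one fused multiply-add per clock (counted as two floating-point operations) and that double precision runs at one quarter of the single-precision rate, which is the ratio implied by the 4 TFLOP and 1 TFLOP numbers above. The function and constant names are hypothetical.

```python
def peak_tflops(cus: int, alus_per_cu: int, clock_mhz: float,
                flops_per_alu_per_clock: int = 2) -> float:
    """Peak throughput in TFLOPS, assuming every ALU retires one fused
    multiply-add (2 FLOPs) on every clock cycle."""
    alus = cus * alus_per_cu
    return alus * flops_per_alu_per_clock * clock_mhz * 1e6 / 1e12


# FirePro W9000: 32 CUs x 64 ALUs at 975 MHz.
W9000_SP_TFLOPS = peak_tflops(cus=32, alus_per_cu=64, clock_mhz=975)

# Assumption: double precision runs at 1/4 of the single-precision rate.
W9000_DP_TFLOPS = W9000_SP_TFLOPS / 4
```

This works out to about 3.99 TFLOPS single precision and 1.00 TFLOP double precision, which AMD rounds to 4 and 1. A clock just below 975 MHz would drop the double-precision figure under the 1 TFLOP mark.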
At that clock rate, the card's L1 caches deliver 2 TB/s of aggregate bandwidth. The GPU is also equipped with 768 KB of L2 cache.
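The 2 TB/s figure is an aggregate across all CUs. As a rough check, the sketch below assumes each CU's L1 cache can deliver 64 bytes per clock. That per-CU rate is an assumption on our part, chosen because it reproduces the quoted total; the function name is hypothetical.

```python
def l1_bandwidth_tb_per_s(cus: int, bytes_per_clock_per_cu: int,
                          clock_mhz: float) -> float:
    """Aggregate L1 bandwidth in TB/s (decimal terabytes), assuming each
    CU's L1 cache returns a fixed number of bytes on every clock."""
    return cus * bytes_per_clock_per_cu * clock_mhz * 1e6 / 1e12


# Assumption: 64 bytes per clock per CU, 32 CUs, 975 MHz.
W9000_L1_TB_PER_S = l1_bandwidth_tb_per_s(32, 64, 975)
```

The result is roughly 1.997 TB/s, consistent with the "2 TB/s" quoted for the W9000.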
Better Tessellation And Order-Independent Transparency (OIT)
As with the desktop-oriented Tahiti-based cards, both high-end FirePro cards use a GPU with two geometry engines that offer better tessellation performance than their predecessors. These ninth-generation fixed-function tessellation engines can handle about 2 billion triangles/s, and AMD promises between 1.7x and 4x better tessellation performance than the previous generation, depending on the number of tessellation divisions.
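The "about 2 billion triangles/s" figure lines up with the W9000's clock if each geometry engine is assumed to process one primitive per clock. The sketch below shows that arithmetic. The one-primitive-per-clock rate is our assumption, and the function name is hypothetical.

```python
def triangles_per_second(geometry_engines: int, prims_per_clock: int,
                         clock_mhz: int) -> int:
    """Peak primitive rate, assuming each geometry engine handles a fixed
    number of primitives on every clock cycle."""
    return geometry_engines * prims_per_clock * clock_mhz * 1_000_000


# Two geometry engines, assumed 1 primitive/clock each, at 975 MHz.
W9000_TRIANGLE_RATE = triangles_per_second(2, 1, 975)
```

That gives 1.95 billion triangles/s, or "about 2 billion" once rounded.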
The hardware-accelerated OIT mode is supposed to result in better output quality, minimizing artifacts and transparency render errors. Applications do have to be written to take advantage of this feature, though.
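Order-independent transparency addresses a basic problem: blending transparent surfaces gives the wrong answer unless they are composited in depth order, and applications rarely submit them that way. One common software approach, shown below as a minimal sketch, collects every transparent fragment that lands on a pixel, sorts them far-to-near, and then blends. This is a generic illustration of the technique, not AMD's hardware implementation; `Fragment` and `resolve_pixel` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    depth: float                     # distance from camera; larger = farther
    rgb: tuple[float, float, float]  # fragment color
    alpha: float                     # opacity in [0, 1]


def resolve_pixel(fragments: list[Fragment],
                  background: tuple[float, float, float]) -> tuple:
    """Blend one pixel's transparent fragments with the 'over' operator
    after sorting them far-to-near, so the result does not depend on the
    order in which the fragments were submitted."""
    color = list(background)
    for frag in sorted(fragments, key=lambda f: f.depth, reverse=True):
        color = [frag.alpha * c + (1.0 - frag.alpha) * b
                 for c, b in zip(frag.rgb, color)]
    return tuple(color)


red_near = Fragment(depth=1.0, rgb=(1.0, 0.0, 0.0), alpha=0.5)
blue_far = Fragment(depth=2.0, rgb=(0.0, 0.0, 1.0), alpha=0.5)
```

Whether the fragments arrive as `[red_near, blue_far]` or `[blue_far, red_near]`, `resolve_pixel` over a black background returns the same color, `(0.5, 0.0, 0.25)`. Without the sort, the two submission orders would produce different results, which is the kind of transparency error OIT is meant to eliminate.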
PowerTune and ZeroCore
We first dove into PowerTune in Radeon HD 6970 And 6950 Review: Is Cayman A Gator Or A Crock? The feature monitors power consumption and lowers the GPU's clock frequency as needed to keep the card from exceeding its thermal design power (TDP).
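The behavior described above amounts to a feedback loop: compare power draw against the TDP and step the clock down when the limit is exceeded. The sketch below is a deliberately simplified controller in that spirit. The step size, clock limits, and function name are all hypothetical and are not PowerTune's actual parameters or algorithm.

```python
def next_clock_mhz(current_mhz: int, power_w: float, tdp_w: float,
                   max_mhz: int, min_mhz: int, step_mhz: int = 25) -> int:
    """One step of a hypothetical TDP-capping controller: lower the clock
    when power exceeds the TDP, otherwise raise it back toward max_mhz.
    The clock always stays within [min_mhz, max_mhz]."""
    if power_w > tdp_w:
        return max(min_mhz, current_mhz - step_mhz)
    return min(max_mhz, current_mhz + step_mhz)
```

For example, with a hypothetical 250 W TDP, `next_clock_mhz(975, 260, 250, 975, 300)` steps down to 950 MHz, while a reading of 200 W leaves the clock at its 975 MHz ceiling. A real implementation would also need to avoid oscillating around the limit, which this sketch ignores.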
AMD's Tahiti GPU adds another power-saving feature called ZeroCore, which is made up of several components. A deep sleep mode and the DRAM’s stutter mode both lower power consumption, and the contents of the frame buffer can now be compressed.
ZeroCore kicks in when the card is idle or the system goes to sleep. At the Windows desktop, these high-end cards draw only 15 W. Once Windows has sat idle long enough to switch off the display signal, power use drops even further. The card is then nearly shut off, dissipating so little heat that its fan can stop spinning.
CrossFire users should be particularly excited about ZeroCore, which is able to turn off the second, third, and fourth graphics cards when they aren’t needed, lowering power consumption and thermal output. Granted, multi-card configurations are rare indeed in workstations.