Double Or Nothing
It’s Good To Set Goals
ATI says it approached this design with five different goals. First on the list, naturally, was incorporating DirectX 11 support to coincide with the launch of Windows 7. The timing there couldn’t have been much better, as Microsoft’s next-gen operating system has reached RTM and is on the verge of retail availability.
Second, it wanted to improve performance in DirectX 9, 10, and 10.1 titles. Because DirectX 11 games aren’t shipping yet, the company knew its “legacy” capabilities would be the benchmark by which it’d be measured for many months after launch.
Third, the company had an eye on stream computing. This is an area Nvidia’s CUDA architecture has outright dominated since inception. With OpenCL 1.0 and DirectCompute now standardizing the way developers handle GPGPU functionality, this is ATI’s first chance to really step out.
Fourth, it shot for two times the processing power of its previous generation in a comparable power envelope. According to ATI’s own measurement, it achieved that goal. And while maximum TDP is actually higher this time around, idle power is significantly lower.
Finally, ATI’s architects sought innovation, achieved through Cypress’ display output configuration and certain image quality enhancements.
How Do You Double Performance?
Perhaps the easiest way to double the processing power of a GPU is by doubling the resources most likely to affect performance. The result is 2.7 TeraFLOPS single-precision and 544 GigaFLOPS double-precision performance.
| ||Radeon HD 5870||Radeon HD 4870|
|Die Size||334 square millimeters||263 square millimeters|
|Transistors||2.15 billion||0.956 billion|
|Memory Bandwidth||153 GB/s||115 GB/s|
|Idle Board Power||27W||90W|
|Active Board Power||188W||160W|
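To put the table in terms of ATI's "twice the processing power in a comparable power envelope" goal, the short Python sketch below divides each Radeon HD 5870 figure by its Radeon HD 4870 counterpart. The numbers are copied from the table; the dictionary keys and the `ratios` helper are names made up for this illustration.

```python
# Figures copied from the comparison table above,
# stored as (Radeon HD 5870, Radeon HD 4870).
SPECS = {
    "die_size_mm2": (334, 263),
    "transistors_billion": (2.15, 0.956),
    "memory_bandwidth_gbs": (153, 115),
    "idle_board_power_w": (27, 90),
    "active_board_power_w": (188, 160),
}


def ratios(specs):
    """Return the HD 5870 / HD 4870 ratio for every row of the table."""
    return {name: new / old for name, (new, old) in specs.items()}


if __name__ == "__main__":
    for name, ratio in ratios(SPECS).items():
        print(f"{name}: {ratio:.2f}x")
```

Run as a script, this prints one ratio per row. By these figures, the transistor count more than doubles while active board power rises by less than 18 percent, and idle board power falls to 30 percent of the older card's.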
Whereas the RV770 had 10 SIMD cores, Cypress sports 20. As before, each core contains 16 stream processor units, and each stream processor boasts five ALUs, which ATI calls stream cores. Multiply those out and you get 1,600 total stream cores, or shaders. Sixteen hundred shaders times 850 MHz times two floating-point operations per clock (one multiply-add) gives you that 2.7 TFLOPS measurement, all else being perfect.
As with the generation prior, texture units are tied to the SIMD arrays—four per engine. With 20 arrays, that’s 80 total texture units. Of course, RV770 featured 40.
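The unit counts in the two paragraphs above are simple multiplication, and a few lines of Python make the arithmetic easy to check. This is only a sketch of the math in the text: the function and parameter names are invented for illustration, and the result is a theoretical peak, not a measured figure.

```python
def stream_cores(simd_cores, units_per_core=16, alus_per_unit=5):
    """SIMD cores x 16 stream processor units x 5 ALUs (stream cores)."""
    return simd_cores * units_per_core * alus_per_unit


def peak_tflops(cores, clock_mhz, flops_per_clock=2):
    """Theoretical single-precision peak: cores x clock x FLOPs per clock."""
    return cores * clock_mhz * 1e6 * flops_per_clock / 1e12


def texture_units(simd_cores, units_per_simd=4):
    """Four texture units are tied to each SIMD array."""
    return simd_cores * units_per_simd


if __name__ == "__main__":
    cypress_cores = stream_cores(20)  # Cypress: 20 SIMD cores
    print("Cypress stream cores:", cypress_cores)
    print("Cypress peak TFLOPS:", round(peak_tflops(cypress_cores, 850), 2))
    print("Cypress texture units:", texture_units(20))
    print("RV770 stream cores:", stream_cores(10))  # RV770: 10 SIMD cores
    print("RV770 texture units:", texture_units(10))
```

With 20 SIMD cores at 850 MHz, the math works out to 1,600 stream cores, 2.72 TFLOPS, and 80 texture units, in line with the figures quoted above. Halving the SIMD count gives RV770's 40 texture units.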
Although they look fairly similar on a full-size die shot, Cypress’ render back-ends are significantly improved. This part of the chip was a concern back when ATI first introduced us to its RV770 architecture. But GDDR5 memory helped mitigate the effects of stepping down to an aggregate 256-bit memory bus. Moreover, improvements to anti-aliasing performance and Z/stencil rate demonstrated that ATI had fixed much of what was “broken” on RV670.