Double Or Nothing

ATI Radeon HD 5870: DirectX 11, Eyefinity, And Serious Speed
By , Fedy Abi-Chahla

It’s Good To Set Goals

ATI says it approached this design with five different goals. First on the list, naturally, was incorporating DirectX 11 support to coincide with the launch of Windows 7. The timing there couldn’t have been much better, as Microsoft’s next-gen operating system has hit RTM and is on the verge of retail availability.

Second, it wanted to improve performance in DirectX 9, 10, and 10.1 titles. Because DirectX 11 games aren’t shipping yet, the company knew its “legacy” capabilities would be the benchmark by which it’d be measured for many months after launch.

Third, the company had an eye on stream computing. This is an area Nvidia’s CUDA architecture has outright dominated since inception. With OpenCL 1.0 and DirectCompute now standardizing the way developers handle GPGPU functionality, this is ATI’s first chance to really step out.

Fourth, it shot for two times the processing power of its previous generation in a comparable power envelope. According to ATI’s own measurement, it achieved that goal. And while maximum TDP is actually higher this time around, idle power is significantly lower.

Finally, ATI’s architects sought innovation, achieved through Cypress’ display output configuration and certain image quality enhancements.  

How Do You Double Performance?

Perhaps the easiest way to double the processing power of a GPU is to double the resources most likely to affect performance. The result is 2.7 TeraFLOPS of single-precision and 544 GigaFLOPS of double-precision performance.


| | Radeon HD 5870 | Radeon HD 4870 |
|---|---|---|
| Die Size | 334 square millimeters | 263 square millimeters |
| Transistors | 2.15 billion | 0.956 billion |
| Memory Bandwidth | 153 GB/s | 115 GB/s |
| AA Resolve | 128 | 64 |
| Z/Stencil | 128 | 64 |
| Texture Units | 80 | 40 |
| Shader (ALUs) | 1,600 | 800 |
| Idle Board Power | 27W | 90W |
| Active Board Power | 188W | 160W |
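To see which resources actually doubled, here is a minimal Python sketch that divides each HD 5870 figure from the spec comparison above by its HD 4870 counterpart. The dictionary keys and the function name are illustrative choices, not anything ATI publishes.

```python
# Figures copied from the spec comparison above: (HD 5870, HD 4870).
# The key names are illustrative labels, not official terminology.
specs = {
    "die_size_mm2": (334, 263),
    "transistors_billion": (2.15, 0.956),
    "memory_bandwidth_gbs": (153, 115),
    "aa_resolve": (128, 64),
    "z_stencil": (128, 64),
    "texture_units": (80, 40),
    "shader_alus": (1600, 800),
    "idle_board_power_w": (27, 90),
    "active_board_power_w": (188, 160),
}


def generational_ratios(table):
    """Return the HD 5870 / HD 4870 ratio for every row of the table."""
    return {name: new / old for name, (new, old) in table.items()}


ratios = generational_ratios(specs)
for name, ratio in ratios.items():
    print(f"{name:24s} {ratio:.2f}x")
```

The shader, texture, AA resolve, and Z/stencil rows all come out at exactly 2x, while memory bandwidth grows by roughly a third and active board power by less than a fifth.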


Whereas the RV770 had 10 SIMD cores, Cypress sports 20. As before, each core contains 16 stream processor units. And each stream processor boasts five ALUs, which ATI calls stream cores. Multiply those out and you get 1,600 total stream cores or shaders. Sixteen hundred shaders times 850 MHz times two floating-point operations per clock (one multiply-add) gives you that 2.7 TFLOPS measurement, assuming every ALU is kept busy on every cycle.
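The multiplication above can be written out as a short Python sketch. The function and parameter names are hypothetical. The one-fifth double-precision ratio is an assumption inferred from the article's own 2.7 TFLOPS and 544 GFLOPS figures, not something the text states directly.

```python
def peak_gflops(simd_cores, processors_per_core, alus_per_processor,
                clock_mhz, flops_per_alu_per_clock=2):
    """Return (total ALUs, theoretical peak single-precision GFLOPS).

    flops_per_alu_per_clock=2 counts one multiply-add as two operations.
    """
    alus = simd_cores * processors_per_core * alus_per_processor
    gflops = alus * clock_mhz * flops_per_alu_per_clock / 1000.0
    return alus, gflops


# Cypress: 20 SIMD cores x 16 stream processors x 5 ALUs at 850 MHz.
cypress_alus, cypress_sp = peak_gflops(20, 16, 5, 850)

# Assumption: double precision runs at one fifth of the single-precision
# rate, which is the ratio implied by the article's two headline numbers.
cypress_dp = cypress_sp / 5

print(cypress_alus)  # 1600
print(cypress_sp)    # 2720.0
print(cypress_dp)    # 544.0
```

These are theoretical ceilings. Real workloads only approach them when all five ALUs in every stream processor have independent work on each clock.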

Block diagram of Cypress

As with the generation prior, texture units are tied to the SIMD arrays, four per array. With 20 arrays, that’s 80 total texture units, double the 40 featured on RV770.

And though they look fairly similar on a full-size die shot, Cypress’ render back-ends are also significantly improved. This part of the chip was a concern back when ATI first introduced us to its RV770 architecture. But GDDR5 memory helped mitigate the effects of stepping down to an aggregate 256-bit memory bus. Moreover, improvements to anti-aliasing performance and Z/stencil rate demonstrated that ATI had fixed much of what was “broken” on RV670.
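A rough back-calculation shows why fast GDDR5 matters on a 256-bit bus: dividing the memory bandwidth figures from the spec comparison by the bus width gives the data rate each pin has to sustain. This sketch assumes the Radeon HD 5870 also uses a 256-bit bus (the article states that width only for RV770), and the function name is illustrative.

```python
def per_pin_data_rate_gbps(bandwidth_gb_per_s, bus_width_bits):
    """Convert aggregate bandwidth (GB/s) into the per-pin rate (Gbit/s)."""
    return bandwidth_gb_per_s * 8 / bus_width_bits


# Assumption: both boards use a 256-bit aggregate memory bus.
BUS_WIDTH_BITS = 256

hd5870_rate = per_pin_data_rate_gbps(153, BUS_WIDTH_BITS)
hd4870_rate = per_pin_data_rate_gbps(115, BUS_WIDTH_BITS)

print(f"HD 5870: {hd5870_rate:.2f} Gbit/s per pin")
print(f"HD 4870: {hd4870_rate:.2f} Gbit/s per pin")
```

Under that assumption, the extra bandwidth on the newer board comes entirely from faster memory rather than a wider bus.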
