
ATI Radeon HD 5870: DirectX 11, Eyefinity, And Serious Speed

Double Or Nothing

It’s Good To Set Goals

ATI says it approached this design with five different goals. First on the list, naturally, was incorporating DirectX 11 support to coincide with the launch of Windows 7. The timing there couldn’t have been much better, as Microsoft’s next-gen operating system has reached RTM and is on the verge of retail availability.

Second, it wanted to improve performance in DirectX 9, 10, and 10.1 titles. Because DirectX 11 games aren’t shipping yet, the company knew its “legacy” capabilities would be the benchmark by which it’d be measured for many months after launch.

Third, the company had an eye on stream computing. This is an area Nvidia’s CUDA architecture has outright dominated since inception. With OpenCL 1.0 and DirectCompute now standardizing the way developers handle GPGPU functionality, this is ATI’s first chance to really step out.

Fourth, it shot for two times the processing power of its previous generation in a comparable power envelope. According to ATI’s own measurement, it achieved that goal. And while maximum TDP is actually higher this time around, idle power is significantly lower.

Finally, ATI’s architects sought innovation, achieved through Cypress’ display output configuration and certain image quality enhancements.  

How Do You Double Performance?

Perhaps the easiest way to double the processing power of a GPU is by doubling the resources most likely to affect performance. The result is 2.7 TeraFLOPS single-precision and 544 GigaFLOPS double-precision performance.

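|                    | Radeon HD 5870         | Radeon HD 4870         |
|--------------------|------------------------|------------------------|
| Die Size           | 334 square millimeters | 263 square millimeters |
| Transistors        | 2.15 billion           | 0.956 billion          |
| Memory Bandwidth   | 153 GB/s               | 115 GB/s               |
| AA Resolve         | 128                    | 64                     |
| Texture Units      | 80                     | 40                     |
| Shader (ALUs)      | 1,600                  | 800                    |
| Idle Board Power   | 27W                    | 90W                    |
| Active Board Power | 188W                   | 160W                   |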
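To see which resources were actually doubled, it can help to divide each Radeon HD 5870 figure by its Radeon HD 4870 counterpart. The following is only a rough sketch: the numbers are copied from the table above, while the dictionary layout and the `ratios` helper are illustrative names, not anything ATI publishes.

```python
# Figures copied from the comparison table: (Radeon HD 5870, Radeon HD 4870).
# The labels and structure here are illustrative assumptions.
specs = {
    "Die size (mm^2)": (334, 263),
    "Transistors (billions)": (2.15, 0.956),
    "Memory bandwidth (GB/s)": (153, 115),
    "AA resolve": (128, 64),
    "Texture units": (80, 40),
    "Shaders (ALUs)": (1600, 800),
    "Idle board power (W)": (27, 90),
    "Active board power (W)": (188, 160),
}


def ratios(table):
    """Return {row name: new / old} for every row in the table."""
    return {name: new / old for name, (new, old) in table.items()}


if __name__ == "__main__":
    for name, ratio in ratios(specs).items():
        print(f"{name:26s} {ratio:5.2f}x")
```

Shaders, texture units, and AA resolve come out at exactly 2x, while memory bandwidth grows by only about a third and active board power by less than a fifth. Idle board power drops to 0.3x of the older card's figure.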

Whereas the RV770 had 10 SIMD cores, Cypress sports 20. As before, each core contains 16 stream processor units, and each stream processor boasts five ALUs, which ATI calls stream cores. Multiply those out (20 x 16 x 5) and you get 1,600 total stream cores or shaders. Sixteen hundred shaders times 850 MHz times two floating-point operations per clock gives you that 2.7 TFLOPS measurement, all else being perfect.
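The arithmetic in that paragraph can be written out as a quick sanity check. This is a minimal sketch, not ATI's own method: the constant and function names are made up for illustration, and the factor of two assumes each ALU retires one multiply-add per clock, counted as two floating-point operations.

```python
# Cypress figures quoted in the article.
SIMD_CORES = 20          # RV770 had 10
UNITS_PER_CORE = 16      # stream processor units per SIMD core
ALUS_PER_UNIT = 5        # "stream cores" per stream processor unit
CLOCK_MHZ = 850
# Assumption: one multiply-add per ALU per clock, counted as two operations.
OPS_PER_ALU_PER_CLOCK = 2


def stream_cores(simd_cores, units_per_core, alus_per_unit):
    """Total ALU count: cores x units per core x ALUs per unit."""
    return simd_cores * units_per_core * alus_per_unit


def peak_tflops(alus, clock_mhz, ops_per_clock):
    """Theoretical peak throughput in TFLOPS (every ALU busy every clock)."""
    return alus * clock_mhz * 1e6 * ops_per_clock / 1e12


if __name__ == "__main__":
    cypress = stream_cores(SIMD_CORES, UNITS_PER_CORE, ALUS_PER_UNIT)
    rv770 = stream_cores(10, UNITS_PER_CORE, ALUS_PER_UNIT)
    print("Cypress stream cores:", cypress)
    print("RV770 stream cores:  ", rv770)
    print("Cypress peak TFLOPS: ", peak_tflops(cypress, CLOCK_MHZ, OPS_PER_ALU_PER_CLOCK))
```

The unrounded result is 2.72 TFLOPS, which the article quotes as 2.7. The 544 GigaFLOPS double-precision figure cited earlier works out to one-fifth of that number; that ratio is inferred here from the two quoted figures rather than stated in the article.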

Block diagram of Cypress

As with the generation prior, texture units are tied to the SIMD arrays—four per engine. With 20 arrays, that’s 80 total texture units. Of course, RV770 featured 40.

And though they look fairly similar on a full-size die shot, Cypress’ render back-ends are significantly improved. This part of the chip was a concern back when ATI first introduced us to its RV770 architecture. But GDDR5 memory helped mitigate the effects of stepping down to an aggregate 256-bit memory bus. Moreover, improvements to anti-aliasing performance and Z/stencil rate demonstrated that ATI had fixed much of what was “broken” on RV670.

Chris Angelini
Chris Angelini is an Editor Emeritus at Tom's Hardware US. He edits hardware reviews and covers high-profile CPU and GPU launches.