Geometry Performance, PowerPlay
AMD hasn’t only shored up its architecture’s weaknesses; its engineers have also built on the cards’ existing strengths. Geometry Shader performance has been improved, which is not surprising: this type of shader is still very recent, and the preceding architecture was the first implementation from either AMD or Nvidia. Both have now had time to refine their first versions. Like Nvidia, AMD has increased the size of the Geometry Shader output buffer in order to keep more data on the GPU, and the number of Geometry Shader threads processed at one time has been multiplied by four. Let’s look at the practical results of these improvements:
Though the RV770 wasn’t very impressive on the Galaxy benchmark (a test on which the GT200 shows only a very limited gain over the G92, and which does not seem to be influenced much by the size of the buffer in any case), it really showed its capabilities with Hyperlight, where it placed second, just behind the GTX 280.
Let’s continue our tests with the focus on geometry, this time with vertex shading:
Again not surprisingly, the AMD architecture still holds sway. But again there’s some disappointment, since you’d expect an architecture with 800 ALUs to score much better. In practice, however, all current GPUs are limited by the throughput of the setup engine, which holds them to one triangle per cycle in the best case. The Vertex Shader 3.0 test simply refused to run on the RV770.
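As a rough illustration of why 800 ALUs don’t translate into proportionally higher geometry scores, the sketch below models triangle throughput as the lower of two bounds: what the setup engine can accept and what the shader ALUs can produce. It is a deliberately simplified model. The 750 MHz core clock, the per-vertex ALU cost and the one-vertex-per-triangle ratio are assumptions chosen for illustration, not figures measured in these tests.

```python
def setup_bound(core_clock_hz, triangles_per_cycle=1.0):
    """Upper bound imposed by the setup engine (one triangle per cycle at best)."""
    return core_clock_hz * triangles_per_cycle


def shader_bound(core_clock_hz, alu_count, alu_ops_per_vertex,
                 vertices_per_triangle=1.0):
    """Upper bound imposed by the ALUs, assuming one ALU op per ALU per cycle.

    alu_ops_per_vertex and vertices_per_triangle are hypothetical workload
    parameters, not values taken from the benchmarks in the article.
    """
    ops_per_second = core_clock_hz * alu_count
    return ops_per_second / (alu_ops_per_vertex * vertices_per_triangle)


def effective_triangle_rate(core_clock_hz, alu_count, alu_ops_per_vertex):
    """The slower of the two stages determines the overall rate."""
    return min(setup_bound(core_clock_hz),
               shader_bound(core_clock_hz, alu_count, alu_ops_per_vertex))


if __name__ == "__main__":
    clock = 750e6  # assumed core clock in Hz
    for ops in (100, 800, 1600):
        rate = effective_triangle_rate(clock, 800, ops)
        print(f"{ops:5d} ALU ops/vertex -> {rate / 1e6:.0f} Mtriangles/s")
```

Under these assumptions the ALUs only become the bottleneck once a vertex costs more than 800 operations; below that, the setup engine caps the result regardless of how many ALUs are available.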
Let’s move on with vertex shader performance, this time specifically targeting texture fetching, since it’s a useful technique, especially for displacement mapping. Nvidia kept the advantage by a nose on the Earth test, but on the Waves test AMD was far ahead, even leaving the hottest new series from Nvidia behind.
PowerPlay
AMD has also improved management of its GPU’s power consumption, in particular by introducing clock gating, which disables certain parts of the chip when they’re not being used. AMD has also corrected a bug in its power management that showed up on the RV670 when it was paired with a midrange or low-end CPU. With such CPUs, the RV670 was sometimes underused and so shifted to low-power mode. When the CPU then finished processing its data and suddenly sent it in a burst, the GPU had to move back into high-performance mode, which took several cycles and could cause micro-stuttering.
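The stuttering mechanism can be pictured with a toy model: whenever the gap before a frame’s work arrives is long enough for the GPU to drop into low-power mode, that frame pays an extra ramp-up delay. The function name, the millisecond units and all the numbers below are hypothetical, and the adjustable idle threshold is only a way of showing the effect; the article does not say how AMD actually corrected the bug.

```python
def simulate_frame_times(gaps_ms, render_ms, ramp_up_ms, idle_threshold_ms):
    """Return the GPU time spent on each frame under a toy power governor.

    gaps_ms: idle time before each frame's work arrives from the CPU.
    If a gap exceeds idle_threshold_ms, the GPU is assumed to have dropped
    to low-power mode and must pay ramp_up_ms before rendering at full speed.
    """
    frame_times = []
    for gap in gaps_ms:
        penalty = ramp_up_ms if gap > idle_threshold_ms else 0.0
        frame_times.append(render_ms + penalty)
    return frame_times


if __name__ == "__main__":
    # A slow CPU delivers work irregularly: some gaps are short, some long.
    gaps = [2.0, 12.0, 3.0, 15.0]
    eager = simulate_frame_times(gaps, render_ms=10.0, ramp_up_ms=4.0,
                                 idle_threshold_ms=5.0)
    patient = simulate_frame_times(gaps, render_ms=10.0, ramp_up_ms=4.0,
                                   idle_threshold_ms=20.0)
    print("downshift after 5 ms idle: ", eager)
    print("downshift after 20 ms idle:", patient)
```

With the eager threshold, frame times alternate between 10 ms and 14 ms, which is the kind of uneven pacing perceived as micro-stuttering; with the longer threshold every frame takes 10 ms.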
The GPU also has a microcontroller that takes readings:
- of the temperature at various sensors distributed around the GPU;
- of the activity of the different areas of the GPU.
This microcontroller controls clock gating and the GPU frequency as a function of these readings, which minimizes the overhead at the driver level.
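To make the microcontroller’s role more concrete, here is a minimal sketch of a decision step that takes temperature and activity readings and returns which blocks to gate and which clock to run. Everything specific in it is an assumption made for illustration: the block names, the thresholds and the three clock levels are hypothetical, and the real firmware logic is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Readings:
    temperatures_c: list  # one value per on-die sensor
    activity: dict        # block name -> utilization between 0.0 and 1.0


def decide(readings, max_temp_c=95.0, idle_activity=0.05,
           clocks_mhz=(300, 500, 750)):
    """Return (blocks_to_gate, clock_mhz) for one set of readings.

    All thresholds and clock levels are hypothetical illustration values.
    """
    # Clock-gate every block that is effectively idle.
    gated = sorted(name for name, load in readings.activity.items()
                   if load <= idle_activity)

    hottest = max(readings.temperatures_c)
    busiest = max(readings.activity.values())

    if hottest >= max_temp_c:
        clock = clocks_mhz[0]  # too hot: fall back to the lowest clock
    elif busiest < 0.3:
        clock = clocks_mhz[0]  # light load: lowest clock is enough
    elif busiest < 0.7:
        clock = clocks_mhz[1]  # moderate load: intermediate clock
    else:
        clock = clocks_mhz[2]  # heavy load: full clock
    return gated, clock


if __name__ == "__main__":
    sample = Readings(temperatures_c=[60.0, 72.0],
                      activity={"shader_core": 0.9, "video_decoder": 0.0})
    print(decide(sample))
```

Because a loop of this kind runs on the GPU itself, the driver doesn’t need to poll sensors or issue clock changes, which is the overhead saving described above.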