Thermal Management In A Modern Graphics Card
Modern graphics cards from both AMD and Nvidia employ protection mechanisms that ramp up fan speeds and, if the GPU still gets too hot, throttle back clock rates and voltages. This technology doesn't always keep your system stable (particularly when you're overclocking). Rather, it's meant to keep the hardware from getting damaged. So it's not unheard of for an over-tuned card to crash and require a reset.
There has been much debate about how hot is too hot for a GPU. However, higher temperatures, if the equipment tolerates them, are actually desirable from a cooling standpoint: the larger the difference between the chip and the ambient air, the more heat can be transferred away. At least from a technical perspective, AMD's frustration over reactions to the Hawaii GPU's thermal ceiling is understandable. I'm not aware of any long-term studies that speak to the viability of specific temperature set points. Beyond my own experience with device stability, I have to rely on manufacturer specifications.
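To put rough numbers on the temperature-difference argument, here is a minimal sketch that treats the cooler as a simple linear model: heat removed equals a fixed conductance (in watts per kelvin) times the gap between GPU and ambient temperature. The function name and the 4 W/K figure are assumptions made up for illustration, not measurements of any real card.

```python
def heat_removed_w(gpu_temp_c: float, ambient_c: float,
                   conductance_w_per_k: float) -> float:
    """Steady-state heat flow for a simple linear cooler model.

    conductance_w_per_k is a hypothetical figure describing how many watts
    the cooler sheds per degree of difference between GPU and ambient air.
    """
    return conductance_w_per_k * (gpu_temp_c - ambient_c)


if __name__ == "__main__":
    # Same hypothetical cooler, same 25 degree room, two temperature targets.
    for target_c in (80.0, 95.0):
        watts = heat_removed_w(target_c, ambient_c=25.0, conductance_w_per_k=4.0)
        print(f"{target_c:.0f} C target: {watts:.0f} W dissipated")
```

Under these assumed values, the same cooler at the same fan speed moves 280 W at a 95 °C target versus 220 W at 80 °C. Real coolers are not perfectly linear, so take this as the direction of the effect rather than its exact size.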
On the other hand, it is a well-known fact that silicon transistors broadly perform better at lower temperatures. That is the main reason you see competitive overclockers using liquid nitrogen to get the chips they're testing as cold as possible. In general, lower temperatures help facilitate more overclocking headroom.
Some of the most power-hungry cards in the world are the Radeon HD 7990 (375 W TDP) and GeForce GTX 690 (300 W TDP). Both are dual-GPU cards. Single-GPU boards tend to be quite a bit lower, though the Radeon R9 290-series cards creep up closer to 300 W. In either case, that's a lot of heat to dissipate.
Volumes have been written about graphics card cooling, so we won't delve into that. Rather, we're interested in what actually happens when you begin applying load to a modern GPU.
- You launch a processing-intensive application like a 3D game or your favorite bitcoin miner
- The card's clock rates increase to their nominal/boost values; the board starts warming up as it draws more current
- Fan speed progressively rises, up to a point defined by firmware; usually it'll taper off when acoustics approach 50 dB(A)
- If the programmed fan speed isn't enough to keep the GPU's temperature below a certain level, clock rates scale back until the temperature falls below the set threshold
- Your card should operate stably within a relatively narrow frequency and temperature range until the application driving the load is shut down
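The sequence above can be sketched as a toy simulation. This is not how any vendor's firmware actually works; it is a minimal model, assuming power scales linearly with clock rate, a fan curve that ramps with temperature up to a fixed cap, and a throttle that nudges the clock down above a threshold and back up below it. Every name and constant here is hypothetical.

```python
from dataclasses import dataclass

# Every constant here is a made-up illustration value, not a vendor spec.
AMBIENT_C = 25.0          # air temperature inside the case
IDLE_MHZ = 300.0          # lowest clock the throttle may fall back to
BOOST_MHZ = 1000.0        # nominal/boost clock under load
CLOCK_STEP_MHZ = 13.0     # how far the clock moves per one-second tick
POWER_AT_BOOST_W = 250.0  # board power at the boost clock
THROTTLE_C = 95.0         # temperature the firmware tries not to exceed
FAN_MIN_PCT = 20.0
FAN_CAP_PCT = 55.0        # firmware fan limit (the acoustic ceiling)
HEAT_CAPACITY_J_PER_K = 50.0  # thermal mass of GPU plus heat sink


@dataclass
class CardState:
    temp_c: float
    clock_mhz: float
    fan_pct: float


def step(state: CardState, dt_s: float = 1.0) -> CardState:
    """Advance the toy model by one time step."""
    # Simplification: power scales linearly with clock rate.
    power_w = POWER_AT_BOOST_W * state.clock_mhz / BOOST_MHZ
    # A faster fan removes more heat per degree above ambient.
    conductance_w_per_k = 1.0 + 0.04 * state.fan_pct
    removed_w = conductance_w_per_k * (state.temp_c - AMBIENT_C)
    temp_c = state.temp_c + (power_w - removed_w) * dt_s / HEAT_CAPACITY_J_PER_K

    # Fan curve: ramp with temperature, but never past the firmware cap.
    fan_pct = min(FAN_CAP_PCT,
                  max(FAN_MIN_PCT, FAN_MIN_PCT + 1.5 * (temp_c - 40.0)))

    # Throttle: back off above the threshold, otherwise climb toward boost.
    if temp_c > THROTTLE_C:
        clock_mhz = max(IDLE_MHZ, state.clock_mhz - CLOCK_STEP_MHZ)
    else:
        clock_mhz = min(BOOST_MHZ, state.clock_mhz + CLOCK_STEP_MHZ)
    return CardState(temp_c, clock_mhz, fan_pct)


def simulate(seconds: int = 600) -> list:
    """Launch a load on a cool card and record one state per second."""
    state = CardState(temp_c=35.0, clock_mhz=BOOST_MHZ, fan_pct=FAN_MIN_PCT)
    history = [state]
    for _ in range(seconds):
        state = step(state)
        history.append(state)
    return history


if __name__ == "__main__":
    for s in simulate()[::60]:
        print(f"{s.temp_c:6.1f} C  {s.clock_mhz:6.0f} MHz  fan {s.fan_pct:4.1f}%")
```

With these assumed numbers, the capped fan can only shed 3.2 W/K × 70 K = 224 W at 95 °C, which is less than the 250 W the card draws at boost. The model therefore pins the fan at its cap and settles at a clock rate below boost, hovering around the threshold temperature, which is the narrow operating range the last step describes.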
As you can imagine, the exact thermal throttling point depends on many factors, including the specific load, the enclosure's airflow, the ambient air temperature, and even ambient air pressure. That's why cards throttle at different times, or not at all. This thermal throttling point can be used to define a reference level of performance. And if we set a card's fan speed (and thus noise level) manually, we can create a noise-dependent measurement level. What use is that? Let's find out...