AMD Threadripper BIOS Reporting Wrong Temps?
While testing a new water block for AMD's Threadripper platform, we stumbled across a bug in the company's firmware. Compared with the values from our launch coverage, which used a comparable water block, the temperatures we measured during overclocking (at 325W power consumption) were lower by up to 25 Kelvin for Tctl (the cores) and up to 35 Kelvin for Tdie (the chip temperature)!
This alone wouldn't be dramatic, since the values we saw operating the CPU at its factory clock rate (and about 180W of package power) now correspond to what we can expect from a good water block, a custom loop, and a soldered IHS. Recall that for the Threadripper launch, AMD circulated a 27°C offset to the Tctl values to get to average core temperature. Thus, we were happy when our factory clock values looked realistic.
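To make the offset arithmetic concrete, here is a minimal sketch. It assumes the 27°C offset is simply subtracted from Tctl to estimate the die temperature; the function name and constant are ours for illustration and do not come from AMD.

```python
# Offset AMD communicated for Threadripper at launch, as cited in the article.
TCTL_OFFSET_C = 27.0


def tdie_from_tctl(tctl_c: float) -> float:
    """Estimate die temperature in °C by removing the fixed Tctl offset.

    Assumption: the offset is a plain subtraction, with no scaling.
    """
    return tctl_c - TCTL_OFFSET_C


# Example: a Tctl reading of 67°C corresponds to an estimated 40°C on the die.
print(tdie_from_tctl(67.0))
```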
Then we found that as power consumption rose, the reported temperature went down! At 180W we saw approximately 67°C for Tctl, but this reading dropped to 51°C at about 325W (16 Kelvin less). This makes very little sense, of course, especially since these are the same values the sensor reports at idle with short, small load peaks.
We saw the same effect with Tdie. The 24°C value at 325W is nonsensical. Note that AMD's WattMan also uses this extremely low value, and motherboards rely on it as well for their temperature-based fan control. As you can imagine, this causes significant issues when overclocking.
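The plausibility argument here is simple: with the same cooler and ambient conditions, a higher package power should not produce a lower temperature reading. A rough sketch of that check follows, using the two Tctl readings quoted above. The function name and data layout are hypothetical, and the check is only a sanity test, not a thermal model.

```python
def implausible_pairs(readings):
    """Return adjacent pairs where higher power reports a lower temperature.

    readings: list of (package_power_w, reported_temp_c) tuples, assumed to be
    taken with the same cooler and ambient conditions.
    """
    ordered = sorted(readings)  # sort by power, then temperature
    flagged = []
    for low, high in zip(ordered, ordered[1:]):
        if high[0] > low[0] and high[1] < low[1]:
            flagged.append((low, high))
    return flagged


# Tctl readings from the article: about 67°C at 180W, 51°C at about 325W.
tctl_readings = [(180, 67.0), (325, 51.0)]
for low, high in implausible_pairs(tctl_readings):
    print(f"{high[0]}W reports {high[1]}°C, below {low[1]}°C at {low[0]}W")
```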
So, we set out looking for clues. In order to exclude our systems as the source of error, we tested them extensively.
We started with a new, clean Windows image with old and new drivers. We switched between three motherboards from different manufacturers (Asus, Gigabyte, and ASRock) using the latest BIOS. None of these changes made a difference.
But after we flashed back from BIOS 0503 to the old 0304 (used for our launch review) on Asus' X399 ROG Zenith motherboard, we saw the old temperature values once again, in addition to the already-documented stability problems. We therefore hypothesize that the cause of the error is the AGESA code 1003 Patch 4, and that it reports the calculated temperatures incorrectly during overclocking, which can cause fan speeds to drop as power consumption increases.
We tested further with a much weaker AIO cooler, and our overclocking led to significantly lower fan speeds when using the motherboard's PWM-controlled fans. The result is a thermal accident waiting to happen. An air cooler is therefore out of the question for now.
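To illustrate why an under-reported temperature slows the fans, here is a small sketch of a temperature-controlled fan curve with linear interpolation. The curve points are hypothetical and not taken from any real motherboard, and the Tctl readings from the water block test above are used only as illustrative inputs.

```python
# Hypothetical fan curve as (temperature in °C, PWM duty in %) points.
# These values are assumptions for illustration, not from any real board.
FAN_CURVE = [(30.0, 30.0), (50.0, 50.0), (70.0, 100.0)]


def fan_duty(temp_c, curve=FAN_CURVE):
    """Return the PWM duty in % for a temperature, interpolating linearly."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    if temp_c >= curve[-1][0]:
        return curve[-1][1]
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


# The lower reading at higher power requests a much lower fan duty.
print(fan_duty(67.0))  # reading at about 180W
print(fan_duty(51.0))  # reading at about 325W
```

With this made-up curve, the 51°C reading asks for far less airflow than the 67°C reading, even though the CPU is drawing considerably more power.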
We have already informed AMD about these measurements, and we are awaiting a statement or a new BIOS, which we will re-test for an update. For now, we recommend manually controlling the fans when using the current BIOS versions.
cia1413 Why would you quote a K value? I would bet less than 1% of people know intuitively what that means in relation to C.
Clamyboy74, quoting an earlier comment: "Why would you quote a K value? I would bet less than 1% of people know intuitively what that means in relation to C."
It's just math, subtract 273
JamesSneed, quoting an earlier comment: "Why would you quote a K value? I would bet less than 1% of people know intuitively what that means in relation to C."
Personally it bothered me more mixing Kelvin and Celsius temperatures throughout the article. Just pick a direction for god sakes.
derekullo Might as well mix the rest in:
Fahrenheit, Rankine, Delisle, Newton, Réaumur, Rømer and Electronvolts.
Newton, at some archaic time, was a measure of temperature lol.
https://en.wikipedia.org/wiki/Temperature#Units
InvalidError My guess about where the reading difference comes from? AMD didn't use a separate ground circuit for its temperature sensing element and the "temperature" reading ends up offset by the "voltage droop" on the ground as current goes up. Electrical resistance also goes up with temperature, which could compound the effect.
Gillerer, quoting earlier comments: "Why would you quote a K value? I would bet less than 1% of people know intuitively what that means in relation to C." "It's just math, subtract 273"
In delta temperatures, value in K is exactly the same as value in °C: ∆T of +1 K = +1°C.
willwill56 lol at all of the comments confused by Kelvin.
Specifying change in temperature in Kelvin is perfectly fine. 10 Kelvin colder is, by definition of the Kelvin itself, exactly the same as 10 Celsius colder.
davidchaoth "We have already informed AMD about these measurements, and we are awaiting a statement or a new BIOS, ..."
AMD's statement: All our employees' paychecks have been correctly deposited; the auto-deposit function works well as expected : )