Overclocking GP102 Under LN2
Scaling As A Function Of Voltage
In order to determine the card’s behavior as a function of its supply voltage, we ran a series of tests while holding temperature constant. This involved lowering the temperature to -100°C and searching for a peak stable frequency while adjusting the GPU's voltage.
At 1.1V, GP102 runs stably at 2.3 GHz. Thanks to the low temperature, the GPU truly benefits from more voltage (rather than simply overheating): stepping up by 0.1V, to 1.2V, yields a gain of 40 MHz.
The next step is more interesting: scaling seems to accelerate, and the jump from 1.2 to 1.3V offers a bonus of 70 MHz. We obtain an extra 50 MHz by pushing to 1.4V, after which the curve starts to level off. This trend is confirmed by pushing to 1.45V. And that's where we stop. It's useless to continue risking damage for clearly diminishing returns. Going any further requires lower temperatures.
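To make the shape of that curve easier to see, here is a minimal Python sketch that accumulates the gains quoted above into absolute clock rates. It assumes the 40 MHz figure refers to the step from 1.1V to 1.2V, and it leaves out the 1.45V point because no clock rate is quoted for it. The names `voltage_curve` and `GAINS_MHZ` are purely illustrative.

```python
# Stable clock at 1.1V and the per-step gains quoted in the text,
# all measured with the pot held at -100°C.
BASE_VOLTAGE = 1.1
BASE_CLOCK_MHZ = 2300
# (target voltage, gain in MHz over the previous step).
# Assumption: the 40 MHz gain is the 1.1V -> 1.2V step.
GAINS_MHZ = [(1.2, 40), (1.3, 70), (1.4, 50)]


def voltage_curve(base_v, base_clock, gains):
    """Return [(voltage, clock_mhz)] by accumulating the per-step gains."""
    points = [(base_v, base_clock)]
    clock = base_clock
    for volts, gain in gains:
        clock += gain
        points.append((volts, clock))
    return points


curve = voltage_curve(BASE_VOLTAGE, BASE_CLOCK_MHZ, GAINS_MHZ)
for volts, clock in curve:
    print(f"{volts:.1f} V -> {clock} MHz")
```

Read this way, the card lands at roughly 2460 MHz at 1.4V, with the largest single gain coming from the 1.2V to 1.3V step.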
Scaling As A Function Of Temperature
Our goal here is the same, except this time we vary the temperature at a fixed voltage.
The chart starts at 20°C with a voltage setting of 1.35V. It just didn't seem wise to go any higher at a non-negative temperature.
Going from 20°C to 0°C yields a gain of 60 MHz. It's already looking like temperature makes more of a difference than supply voltage for Nvidia's GP102 processor. Now we understand why MSI shipped its Lightning Z board with such a large heat sink.
The 2300 MHz threshold is crossed at -50°C, and the scaling almost seems linear from there. We push ahead to 2440 MHz at -100°C and 2510 MHz at -130°C. A clock rate of over 2.5 GHz using "only" 1.35V isn't bad at all.
Usually, we stop experimenting when we see scaling taper off. But that just wasn't happening here. So why did we give up in the face of such success? First of all, we ran out of LN2, despite allocating 200 liters of it for this experiment. Moreover, it's difficult to create these curves. Thermal paste doesn't like negative temperatures, which cause it to degrade quickly. When this degradation occurs, you'll start seeing the maximum clock rate drop. And the worse the paste gets, the further the frequency falls. Homing in on an operational sweet spot involves warming up the card and cooling it back down to make sure you get a good reading, and not a value held back by poor thermal transfer. Playing yo-yo consumes a significant amount of liquid nitrogen.
Beyond -130°C, we weren’t able to launch a test run without the thermal paste getting in our way. We tried three times before exhausting the LN2.
The important takeaway, though, is that these cards love cold temperatures. Our -130°C limit wasn't imposed by GP102, but rather by our thermal paste. By extrapolation, frequencies in the 2600 MHz range should be possible. You just have to find a way to test at -160°C without degradation.
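The extrapolation can be sketched as a straight line through the two coldest measurements at 1.35V (2440 MHz at -100°C and 2510 MHz at -130°C). This is only a back-of-the-envelope estimate: it assumes the near-linear scaling continues below -130°C, which we could not verify, and the function name `extrapolate_clock` is our own.

```python
def extrapolate_clock(p1, p2, target_temp_c):
    """Linearly extrapolate a clock rate (MHz) from two (temp_c, mhz) points."""
    (t1, f1), (t2, f2) = p1, p2
    slope = (f2 - f1) / (t2 - t1)  # MHz per °C (negative: colder is faster)
    return f2 + slope * (target_temp_c - t2)


# Two measured points from the temperature curve at 1.35V.
estimate = extrapolate_clock((-100, 2440), (-130, 2510), -160)
print(f"Estimated clock at -160°C: {estimate:.0f} MHz")
```

The two points give a slope of about 2.3 MHz per degree, which puts the -160°C estimate at roughly 2580 MHz, in line with the 2600 MHz range mentioned above.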
With the pot reading 20°C, our GPU reports a temperature of between 30 and 40°C.
The PCB temperature rises due to heat from the memory chips and VRMs. We can’t cite a final value for how warm the PCB got, though, because it never stabilized during our quick tests. More interesting was the reported VRM temperature of 74°C. That's nothing to worry about; it's normal to see elevated readings at such high voltages, particularly before our pot cools down.
When the internal sensor drops below a 0°C threshold, we see the frequency fall. On the previous screen, a +230 MHz overclock yielded 2138 MHz. So you'd think that +430 MHz would give us 2338 MHz. But no, we get 2282 MHz instead.
While the difference isn't earth-shattering, you should still use caution when playing around at temperatures in the 0°C range, since a frequency boost will kick in and out. If you launch a test at -10°C, the boost is deactivated altogether and you'll have to set a larger offset to compensate. But if the GPU temperature then climbs past 0°C, the boost kicks back in with an instant 60 MHz speed-up, and the card crashes.
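The arithmetic behind that missing boost is simple enough to spell out. The sketch below assumes clock offsets add linearly to the reported frequency, which is our simplification rather than a documented description of how the boost behaves; the helper name `expected_clock` is illustrative.

```python
def expected_clock(ref_clock_mhz, ref_offset_mhz, new_offset_mhz):
    """Clock we would expect if offsets simply added to the reference reading."""
    return ref_clock_mhz + (new_offset_mhz - ref_offset_mhz)


# Reference reading above 0°C: +230 MHz offset gave 2138 MHz.
expected = expected_clock(2138, 230, 430)
observed = 2282  # what the card actually reported below 0°C with +430 MHz
shortfall = expected - observed

print(f"Expected {expected} MHz, observed {observed} MHz, "
      f"missing {shortfall} MHz")
```

The shortfall works out to 56 MHz, which matches the roughly 60 MHz jump that hits when the sensor crosses back above 0°C. An offset tuned to the edge of stability with the boost off has no margin left to absorb it.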
At a pot temperature of -100°C, the GPU reads -40°C and doesn’t budge. The PCB temperature also drops as the cold spreads from the pot, and the GDDR5X cools down as well.
The VRMs stabilize to around 50°C. This is an ideal temperature because it's far from overheating, and a positive reading keeps the board from freezing. This is the barrier against condensation that we mentioned earlier.
As you test under air cooling, it's possible to hypothesize how a piece of hardware will behave under liquid nitrogen. An air-cooled processor that is stable at 5 GHz and 1.2V will, without a doubt, fare better with LN2 than another chip that barely held 4.8 GHz at 1.4V.
But when the difference between samples is small, all bets are off. It is completely possible for a circuit that holds 5 GHz at 1.2V on air to under-perform a sample that maintained 5 GHz at 1.22V once LN2 is applied. This is even more true in our case, since temperatures clearly have a huge effect on overclocking potential.
A GPU that scales better under LN2 cooling or doesn't suffer under degrading thermal paste will almost certainly come out on top. Likewise, certain cards sometimes hide bad surprises: they refuse to work below 0°C. Therefore, we tested all four of our cards in order to profile their behavior. This little experiment consumed most of our liquid nitrogen.
Our first GPU was a real disappointment. In short, we weren't able to run a single benchmark below -40°C. Once we hit that temperature, the card froze up. And from -40 to -110°C, it was impossible to restart. But at -120°C, a miracle happened and the system started working again. We thought we had been saved, but the internal temperature sensor insisted on being fussy.
In theory, when they are too cold, the sensors report the last value read, or the lowest value they're capable of reading. For example, motherboards often get stuck at -11°C. In the case of our first GeForce GTX 1080 Ti Lightning Z, the temperatures were all over the place. Once the sensor hit its minimum, it'd jump to its maximum. This made it impossible to launch a test because the card would go into protection mode, thinking it was above 90°C each time we tried.
GPUs #2, #3, And #4
Thankfully, the next three cards were more cooperative. We were able to overclock them above 2500 MHz at around -120°C and 1.45V. The hardest part was determining which one was the best because their results were so close.
In order to make sure our power supply wasn't getting in the way of higher overclocks, we fired up its power consumption monitor. With a 1.5V setting, the output approached 900W. This 1200W Cooler Master PSU apparently still has some headroom, though its monitoring is laggy and doesn't properly reflect short power spikes. With concerns still lingering, we replaced the MasterWatt Maker 1200W with a Corsair AX1500i and didn't notice a difference.
Because of the aforementioned problems with thermal paste degradation, ranking our three remaining Lightning Z cards took hours. Without a strong indication one way or the other, we cut one card that proved a little difficult to restart and a second one that seemed a little weaker than the last board standing. Using this sample, we hit 2570 MHz during our first Time Spy benchmark.