i7-5820k Core temps MUCH higher than CPU Temperature. Is it safe?

mordhau5

Commendable
Sep 16, 2016
5
0
1,510
I've had my current rig for a little over a year now and recently decided to put it on custom water :) After a few months of designing/building, my loop is functional and keeps idle temps generally around 28c and under 45c during load (GPU and CPU, no OC). This is all grand, so I decided I'm ready to really push this thing, and managed to get 4.6GHz to validate at around 1.45v (yikes). I know it's high, but should still be safe. It's not long-term stable yet as p95 and RealBench both crash eventually. Bearing this in mind, I want to proceed to a more stable voltage but my temps are what I'm really worried about....

I'm using SpeedFan to monitor temps and adjust fan speeds. Currently, I have fans set to spin up according to water temp (which I'm monitoring from a temperature-sensitive plug). My board reports the socket chilling at a cool 50c (with the fans barely running) after 30 mins of p95 and a few degrees lower with RealBench. My core temps, however, fluctuate wildly between 60c and 95c. I've seen several cores sit at 90c+ for a minute or more. That seems way too high! Why is the package as a whole sitting at a nice comfortable temperature when the cores are screaming? Do the extra two cores on Haswell-E really mean that much extra heat? Bad internal TIM under the lid? Bad silicon? What gives!

Relevant parts:
CPU: i7-5820k
Board: Asus X99-A
TIM: Coollaboratory Liquid Ultra
Block: XSPC Raystorm Pro (rest of the loop available if needed)

First question here on Tom's so please let me know if there's more info I need to provide. Couldn't find any questions that dealt specifically with core temps higher than CPU temp.

EDIT1: some grammar fixes and add relevant info
 

mordhau5



So you don't think the cores themselves getting super hot is as important as the whole package being cool? I'm more than comfortable with my socket sitting around 50c like it is now, but not if I'm damaging the cores with that much voltage. It seems like maybe I can't pull the heat out of the cores themselves quickly enough, and that's why I was wondering if it could be limited by the lid/internal TIM.



Ok, I was afraid that maybe I had made a poor choice in mobos (as far as this application goes). I'm reading now that the x-99a uses lower-quality VRMs than even the step up, x-99 Deluxe. How did you destroy the others if I might ask?
 

mordhau5



Damn... I've had to RMA one already that was stuck in POST with "NVRAM initialization" error-code and the new one still takes forever to POST. Seems like a bad board for anybody really trying to push their systems - or really anybody at all. And now I'm having trouble getting my m.2 ssd's full speed out of it.
 

CompuTronix

Intel Master
Moderator
mordhau5,

On behalf of the Moderator Team, welcome to Tom's!

I'm sure your custom water cooling is solid, and Liquid Ultra is a fantastic TIM, so let's rule out those variables.

Here's the normal operating range for Core temperature:

80C Hot (100% Load)
75C Warm
70C Warm (Heavy Load)
60C Norm
50C Norm (Medium Load)
40C Norm
30C Cool (Idle)
25C Cool

Core temperatures in the mid 70's are safe, so you need to keep it under 80C.
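For anyone logging temperatures, the scale above can be turned into a quick lookup. This is only a minimal Python sketch: the function name `classify_core_temp` is made up for illustration, and the cut-offs between the listed points (for example, treating everything from 40C up to 69C as "Norm") are an assumption, since the scale only names specific temperatures.

```python
def classify_core_temp(temp_c: float) -> str:
    """Map a Core temperature in Celsius to the label used in the scale above.

    Assumption: a reading takes the label of the nearest listed point at or
    below it (80 Hot, 70 Warm, 40 Norm, anything lower Cool).
    """
    if temp_c >= 80:
        return "Hot"
    if temp_c >= 70:
        return "Warm"
    if temp_c >= 40:
        return "Norm"
    return "Cool"


# Readings taken from the original post: 28C idle, 95C peak under load.
for reading in (28, 50, 75, 95):
    print(reading, classify_core_temp(reading))
```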

Your high Core temperatures are being driven strictly by your Core voltage being much too high, but there are also a few common misconceptions we need to clear up, which always create considerable confusion in processor temperature topics.

(1) Your i7 5820K is 22 nanometer microarchitecture. As such, the maximum recommended Vcore is 1.3 volts. 1.45 is potentially destructive to your processor with respect to electromigration. Also, as has already been discussed, X99 motherboards are vulnerable to high TDP (140 Watts) CPU's as a result of high Vcore due to overclocking, which causes even higher Wattage and amperage to be drawn through the power planes within the layers of the motherboard to feed devices such as VRM's.

Use "Core Temp" to monitor your processor's Power (Watts) and you'll see what I mean - http://www.alcpu.com/CoreTemp

(2) Do NOT run any versions of Prime95 later than 26.6. Here's why:

Core i 2nd through 6th Generation CPU's have AVX (Advanced Vector Extension) instruction sets. Recent versions of Prime95, such as 28.9, run AVX code on the Floating Point Unit (FPU) math coprocessor, which produces unrealistically high temperatures. The FPU test in the utility AIDA64 shows similar results.

Prime95 v26.6 produces temperatures on 3rd through 6th Generation processors more consistent with 2nd Generation, which also have AVX instructions, but do not suffer from thermal extremes due to having a significantly larger Die.

Please download Prime95 version 26.6 - http://windows-downloads-center.blogspot.com/2011/04/prime95-266.html

Run only Small FFT’s for 10 minutes.

Your Core temperatures will test 10 to 20C lower with v26.6 than with v28.9, not to mention the Wattage.

Keep in mind that Prime95 Small FFT's is a steady-state 100% workload which produces steady-state temperatures. On the other hand, RealBench is a fluctuating workload which produces fluctuating temperatures, so it's normal to see your Core temperatures fluctuate when running RealBench.

(3) Let's get our terminology straight; there is no "Socket" temperature for Intel processors. "Socket" applies to AMD processors.

The term "CPU" temperature is synonymous with Intel's "Tcase" specification, which is a laboratory thermocouple measurement sampled on the surface of the IHS. The last processors that actually had an analog thermal diode to measure "CPU" temperature were socket 1366 back in the day of the 1st Gen i7 920.

Also, Core temperature is 5C higher than the Tcase specification (CPU temperature) due to the proximity of the DTS sensors to the heat sources, which originate in the "hot spots" at the transistor junctions within the Cores. As such, the following comments offered above by Multipack are incorrect:


Much to the contrary, the hottest processor temperatures are always Core temperatures, which are indeed most relevant.

Further, "water" or liquid temperature in your loop has no relationship to "Package" temperature, which is defined as the hottest Core.

(4) Don't depend on SpeedFan to correctly assign labels to devices. This is often the cause of much confusion when using SpeedFan. Don't get me wrong; it's a great utility that's highly configurable which I've used on various rigs for many years, but it has a bit of a challenging learning curve to get it set up correctly so that the labels accurately correspond to the appropriate devices. If SpeedFan is telling you there's a "CPU" temperature, there actually isn't one; it simply misused the term "CPU" to assign a temperature to a device such as VRM's.

(5) Intel groups your i7 5820K as a "High-End" processor - http://ark.intel.com/products/family/79318/Intel-High-End-Desktop-Processors#@Desktop

As such, High-End processors do not use TIM between the Die and the IHS; they still use Indium solder due to the need to efficiently dissipate heat from high TDP processors such as the 140 Watt i7 5820K, so you can dispose of this as a variable. Other than High-End processors, 2nd Gen Sandy Bridge were the last processors to all use solder due to the cost of Indium, which is a rather expensive and toxic exotic material. Again, other than High-End processors, 3rd Gen and later now use a Dow Corning TIM.

The bottom line is that you need to reduce your Vcore to something closer to 1.3. When tweaking your processor near its highest overclock, keep in mind that for an increase of 100 MHz, a corresponding increase of about 50 millivolts (0.050) is needed to maintain stability. If 75 to 100 millivolts or more is needed for the next stable 100 MHz increase, it means your processor is overclocked beyond its limits.
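The rule of thumb above can be checked against your own overclocking notes. Here is a minimal Python sketch, not a tuning tool: the function names are made up, the "borderline" label for steps between 50 and 75 millivolts is an assumption (the rule only names the two ends), and the clock/voltage pairs in the example are hypothetical. Values are kept as whole MHz and millivolts to avoid floating-point rounding at the thresholds.

```python
def mv_per_100mhz(freq_a_mhz: int, vcore_a_mv: int,
                  freq_b_mhz: int, vcore_b_mv: int) -> float:
    """Extra Vcore (in millivolts) spent per 100 MHz between two stable points."""
    delta_mhz = freq_b_mhz - freq_a_mhz
    if delta_mhz <= 0:
        raise ValueError("the second point must be at a higher clock")
    return (vcore_b_mv - vcore_a_mv) / (delta_mhz / 100)


def classify_step(mv: float) -> str:
    """Apply the ~50 mV per 100 MHz rule of thumb."""
    if mv >= 75:
        return "beyond limits"
    if mv > 50:
        return "borderline"  # assumption: the rule does not label this range
    return "expected"


# Hypothetical notes: 4.4 GHz @ 1.250 V, 4.5 GHz @ 1.300 V, 4.6 GHz @ 1.450 V.
print(classify_step(mv_per_100mhz(4400, 1250, 4500, 1300)))
print(classify_step(mv_per_100mhz(4500, 1300, 4600, 1450)))
```

With these made-up figures the first step costs 50 mV and the second 150 mV, so the second step is the one the rule would flag.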

With high TDP air or liquid cooling you might reach the Vcore limit before 80C. With low-end cooling you’ll reach 80C before the Vcore limit. Regardless, whichever limit you reach first is where you should stop, declare victory and have a beer.

Remember to keep overclocking in perspective. For example, the difference between 4.5 GHz and 4.6 GHz is only 2.2%, which has no noticeable impact on overall system performance. It simply isn’t worth pushing your processor beyond recommended Core voltage and Core temperature limits just to squeeze out another 100 MHz.
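The 2.2% figure is just the clock difference divided by the starting clock. A short Python sketch of that arithmetic (the function name is made up for illustration) makes it easy to try other clock pairs:

```python
def clock_gain_percent(old_ghz: float, new_ghz: float) -> float:
    """Relative clock increase, in percent, when going from old_ghz to new_ghz."""
    return (new_ghz - old_ghz) / old_ghz * 100


# 4.5 GHz -> 4.6 GHz, the example from the paragraph above.
print(f"{clock_gain_percent(4.5, 4.6):.1f}%")  # prints 2.2%
```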

If you'd like to get yourself up to speed on this topic, then you might want to read this Sticky: Intel Temperature Guide - http://www.tomshardware.com/forum/id-1800828/intel-temperature-guide.html

CT :sol:
 
Solution
I meant all that matters is the core temp. I just said that to distinguish it from overall system/rad temp and other sensors on the motherboard etc. Not contrary, concurrent..... Also I find it curious that you say the part of the motherboard containing the cpu doesn't get hot because it doesn't exist... Touch it after intensive use and see if it's warm. It'll burn your fingers. I guess pain is only abstract. If I misled then it was because I was trying to keep it simple ;-)
 

scuzzycard

Honorable
^That was a good read CT - my thoughts on the X99-A are not that it is garbage - although it's clear that it cannot handle the extreme loads of an overclocked HEDT processor. My experiences with my 4 X99-A's really don't make that much sense to me. What I find interesting is that this board has a much more robust VRM than my video cards do - and I pull 350 watts from each of their poor little 6-phase VRM's all the time, but this board likes to go up in flames when I pull more than 200 watts from its 8-phase all-digital CPU VRM. All of ASUS's X99 boards have either the same VRM as the X99-A, or a very slightly upgraded version of the same VRM. And they all have problems. It's not like ASUS just went cheap - the hardware is pretty expensive, but it seems like there is something wrong in the firmware. There is something that makes this board want to fry your processor with prodigious voltage right after it is placed under a heavy load. It could be immediately after, or it can happen after a reboot, warm or cold...

What I do know is that if you run any processor at more than 1.25V, or you try to run the cache above 1.20V, or the DRAM above 1.35V - it will probably die - at least mine always did :)
 

mordhau5



Holy crap I never dreamed of getting such a concise and thorough answer complete with links, solution, etc. THANK you! I'd like to just comment on a few of your points, partly for questions and partly just amazement for the time you took to write this.

1) I KNEW it.....I was unsure and should have been more cautious, but I had a hunch. I have read many many OC posts saying you'll "burn your CPU out (with heat) before you actually overvolt it" but physically that didn't make sense to me. When your die is that small the tiniest fractions of a volt must severely reduce the life span. Thank you so much for clearing this up for me. I will never again go the "first start with a voltage you can keep cool enough then bump up clock until you're unstable, then drop down" method.

I ran CoreTemp prior to this post and witnessed power consumption over 200W at times (but again that was with the newest p95 and using the "blend" test), which leads me to

2) Thanks so much for this tip. I knew p95 is an old standard for stability testing (I've used it before in my 2nd rig which was actually an i7 920!) but I did not know the newest version created an unrealistic scenario. I downloaded and ran 26.6 after reading this and my core temps are MUCH cooler like you said. Hovering around 80c or slightly above. I'm gonna look into bumping voltage down from the 1.35 it's stable at now to see how I can get those temps closer to 80 and below.

RealBench unfortunately does not run properly for me, it seems to crash always, no matter what system I run it on, even at stock settings :/ Always causes the display driver to fail. Even when every other benchmark and game runs fine.

3) Thanks for clearing that up. I'm often guilty of mixing up terminology and not getting my point across well. I was using "socket" to mean the chip as a whole (which if I'm reading correctly is also not the case). React to the highest of the core temps, not the CPU temp as it's not directly reflective of the cores. I'm still a bit confused how there could be a 20-30c diff in my cores and my package temp. I don't think speedfan thinks "CPU" is the board VRMs as it's definitely cooler now that I'm on water (and I'm not cooling the board VRMs).

4) I did check with the monitor that comes on board to make sure speedfan has its labels right and corrected any that were wrong. There is quite a learning curve. I currently have the Rad fans (all on a pwm splitter) set to spin up based on the sum of GPU and CPU temps. I had originally intended the system to only spin fans up if it found that water was beginning to heat up, as the amount of heat in the water affects its ability to take on more heat, but as it turns out Speedfan cannot find my board's external therm register address. Also I'm now remembering a physics lesson in which the heat water takes on doesn't necessarily equal its own heat, something something "specific heat capacity".

Then again Speedfan also recently started a fun thing where 4/5 times it starts up, the CPU temp goes from -100c to 60c back and forth and all the fan speeds are over 65,000 rpm. Not the most foolproof program.

5) Very. Helpful. Thank you. Seems like that sort of throws a wrench into that whole "delidding club" notion if you're running a "high-end processor"

All in all: damn, this solution needs to be stickied forever
 

mordhau5



You probably already knew this but quality of the VRM is often more important than the number of phases. When I was trying to decide on "what model of gtx1080 should I buy" a lot of it initially came down to which one had more VRM phases. Then I learned that the quality of the VRM isn't always in how many phases it has. "EvilGenius" helped me understand better here: http://forums.overclockers.com.au/showthread.php?t=938062

Have you heard of many other people having this problem with x99 though? The reviews are generally favorable and I haven't seen too much trash talk on it outside of this thread. All I know is it burned out my first chip around 1.28v and then later locked itself out with perma-"NVRAM error" and needed to be RMA'd.

I should also add that an official MFR response to one complaint I saw on Newegg mentioned x99 as a specifically slow boot-er because of DRAM training. http://blog.asset-intertech.com/test_data_out/2014/11/memory-training-testing-and-margining.html And apparently you can de-activate it to speed up boot, at the cost of some potential stability.
 

scuzzycard

Honorable
It's really just an ASUS thing - other X99 boards seem to work as expected. So far, this fourth X99-A has been different from the others - it has made 2 attempts at killing my processor, but they were both weak and ineffective. The first time, I got an overvoltage error at POST, so I pulled the plug before seeing the voltage - but I believe it has to be above 1.6V for that error to appear. The second time, I caught it giving my CPU around 1.33 and cache around 1.43 when both were set to around 1.2V. That was a month ago, and I have since set AIDA64 to shut my system down if either Vcore or Vring goes above 1.3V. There have been no assassination attempts in the past month.
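The shutdown rule itself is just a threshold check on two voltage rails. Below is a minimal Python sketch of that logic only; it is not how AIDA64 is configured and does not read any real sensors. The function name, the dictionary layout, and the idea of feeding it readings by hand are all assumptions for illustration. The sample voltages are the ones quoted above.

```python
VOLTAGE_LIMIT_V = 1.3  # shutdown threshold mentioned above


def rails_over_limit(readings_v: dict, limit_v: float = VOLTAGE_LIMIT_V) -> list:
    """Return the names of rails whose voltage exceeds limit_v, sorted by name."""
    return sorted(name for name, volts in readings_v.items() if volts > limit_v)


# The second incident: both rails set to about 1.2 V, but observed much higher.
incident = {"Vcore": 1.33, "Vring": 1.43}
normal = {"Vcore": 1.20, "Vring": 1.20}

for label, readings in (("incident", incident), ("normal", normal)):
    tripped = rails_over_limit(readings)
    action = "shut down" if tripped else "keep running"
    print(label, tripped, action)
```

A real watchdog would poll a sensor source in a loop and trigger the shutdown itself; that part is left out because it depends on the monitoring tool.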

Unless this board behaves so well in the next 6 months that I change my mind, I will be switching it out for an ASROCK OC Formula 3.1. I spent quite a lot of money to have a machine I wouldn't have to touch for years, and I'm not going to let a motherboard that needs a psychiatrist continue to ruin that :)
 

CompuTronix

Intel Master
Moderator


Again, "Package" temperature is the hottest Core, so as there's no IGPU, I wouldn't lose any sleep over which sensor is labeled "Package" temperature. Also, whatever sensor SpeedFan is calling "CPU" temperature could instead be "PCH" (Platform Controller Hub or "Chipset") temperature, but it most certainly can not be CPU temperature.