I had the most puzzling thing happen in helping a friend with a hot-running i7 that he uses as a professional animator and modeler for a major gaming company.
The machine was making a lot of noise. A fan would come on at 100% for about 30 seconds, cycle down lower, and then repeat every 90 seconds. It was quite irritating to all concerned. I also noticed the computer seemed to be running slowly, even when just browsing the web.
Speedfan said the back fan was cycling from 1500 to 2600 rpm. There is only one slow intake fan - probably 120mm at about 800rpm. He had already cleaned the front grill. He said he looked inside and saw a temp readout of 95. I downloaded and displayed core temps, and saw 72, 74, 75, 72, and I thought those were high.
So I looked inside the case and saw dust everywhere, including a layer caking the top of the stock Intel cooler. I took off the cooler, blew the dust out, and picked up some Arctic Silver 5 from the local Radio Shack. I reinstalled the cooler, and the digital readout dropped to 84. That corresponded to the cpu temp in the bios, and also in speedfan. Great! Dropped overall chip temp by 11 degrees. I told him his new core temps would be lower.
The back fan stopped cycling on, and the computer seemed to be cooler.
But I was shocked to see core temps go to 100, 99, 98, 99!
What in the world is happening? Any ideas? Now the computer even shuts itself off from time to time.
If by "digital readout" you mean you stuck a temperature sensing probe near the heatsink, then I have your answer:
what has happened is that the heat sink has not been attached properly!
That's why the heat sink temps have dropped: the heat from the CPU isn't getting properly transferred to it. At the same time, reduced heat transfer also means that the cores will start to overheat.
Hi, thanks for all the input so far. I did order an arctic cooling pro from newegg - same push pins so I won't have to take out the mobo. When I say digital readout - we didn't stick a temperature probe. The EVGA motherboard has this - I guess as a standard item, on its 1366 model. His i7 dates back to beginning of 2010. The heatsink seemingly dropped that temp by 11 degrees, but the cores went through the roof.
So you guys are saying - maybe the heat sink was not properly installed. Well, maybe you're right. It appeared to snap on - all four posts. It seemed to be on solidly. And I cleaned off the old thermal compound, used the arctic silver cleaning solutions, and then applied the arctic silver 5, and was gratified to see the cpu temp drop 11 degrees. But not the core temps which climbed.
Here's my theory - let me know what you think. I believe that the power management in the computer is faulty. The cpu temp is the overall chip temperature from wherever EVGA has the probe mounted. Maybe it is the IHS temperature. I believe, that because that overall cpu temp was high at 95, the computer was throttling back the rated 2.93 ghz frequency of the cpu. As I recall, it seemed slow. In addition the computer was cycling that rear exhaust fan at 100% every 90 seconds for about 30 seconds.
My theory is that the heat sink is on good, dropping the cpu to 84, and the computer now thinks it is fine to run it at full 2.93. I think we picked up a binned 940 that would have been better off in the pile of 920s at 2.66 ghz. So the computer is running at full speed, since everything "is cool" and it allows for core temps up to 100 - with very good core temp management systems. So there they are, almost all at 100. But really, that's too hot for sustainable operation, and errors are cropping up.
That's my theory - idiotic power management gone wrong.
But I agree - get a better cooler, and in addition increase case air movement - something like a kama bay 3 x 5.25 optical drive cage for a 120mm filtered intake fan at 1600 rpm to give that case more breathing. Anybody know how to find a kama bay these days? Or does anybody know a good way to take 3 optical covers, and draw a nice circular hole - with a soldering iron maybe?
Well, it's a 3-graphic card board I noticed. And yes I saw the nvidia sli insignia. So I think you're right - that's the board. You like those boards I take it.
So what do you think? Guys are saying I didn't seat the hsf properly. But how could I reduce the cpu temp - the IHS I guess - by 11 degrees if I didn't properly seat the heat sink? And then why are core temps now about 25 degrees hotter?
So, cpu temp is down by 11, core temps up by roughly 23-28 going by the readings I posted. How is that possible? Is there an evga forum I should go to?
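For reference, the size of the swing can be worked out directly from the readings posted earlier in the thread. This is a minimal Python sketch, assuming the before and after readings were taken under comparable load (the posts don't confirm that); the variable and function names are just for illustration.

```python
# Readings in degrees C, as posted earlier in this thread.
cpu_before, cpu_after = 95, 84
cores_before = [72, 74, 75, 72]
cores_after = [100, 99, 98, 99]


def deltas(before, after):
    """Per-sensor change, computed as after minus before."""
    return [a - b for b, a in zip(before, after)]


cpu_delta = cpu_after - cpu_before
core_deltas = deltas(cores_before, cores_after)
avg_core_delta = sum(core_deltas) / len(core_deltas)

print("CPU (board sensor) change:", cpu_delta)
print("Per-core change:", core_deltas)
print("Average core change:", avg_core_delta)
```

On those numbers the board sensor fell by 11 while the cores rose by 23 to 28 each, so the two sensors moved in opposite directions by a wide margin either way.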
Btw, can we get a picture of the case - the intake fans, exhaust fans, interior wire management etc. - so we can get a better understanding of the situation and possibly identify areas where you might need improvements?
@ lilotimz: great suggestion about a great cooler - thanks for the tip - I printed it to my paperport cooler folder. Everything is very clean inside the case - but the case is about 50 miles from where I am, so I don't have access to it for another week or so.
@baddad - I wish I were talking F, but no, I've been talking C so the situation is BAD!!
@amuffin and $hawn: the temp on the digital readout per the user manual is the cpu temp. Also the bios identifies it as the cpu temp. So does speedfan. I would guess it's the package - maybe the integrated heat spreader, or some monitoring point close to it. On this machine it's consistently lower than the core temps. At idle right now, we got cpu temp down to 59, and core temps down to mid 70's.
The machine is up in LA, and I am down in Orange County, but I think we did some good this evening over the phone by dramatically underclocking the machine - but it still seems plenty powerful for his needs. He is not a gamer.
What was happening to cause us to take dramatic action, is that the situation got progressively worse, with the latest development being that the machine was shutting off every couple of minutes, and of course was entirely unusable.
My puzzling question remains: why would cpu temp drop 11 degrees C, from 95 C to 84 C, but core temps climb 25 degrees C or so, to 100, 99, 98, 99 and such - right at the max threshold temp of i7 nehalem?
And my bizarre theory remains the same - by affixing the stock intel cooler with a superior cooling paste, namely arctic silver 5, I was able to drive the cpu temp down, which allowed the machine to think things were cool, speed-step the hot-running 940 to full rate 2.93 ghz, power management allowing core temps to build, but not beyond 100 C, which staggers my imagination! I know graphics cards also will support temps like that, but the BOILING POINT OF WATER!!!
But intel outsmarted itself, and the machine apparently is not capable of maintaining those core temps without producing errors. When I say the machine was shutting off every couple of minutes, Miles explained to me just a while ago, that this was not an orderly shutdown - this was like somebody pulled the power cord. Ouch! I hope his hard drive was not damaged with sector read errors.
I think that the machine detected cpu computational errors, and therefore the machine took a very drastic emergency shutoff procedure.
So to be clear, I am not talking Fahrenheit, I am most definitely talking centigrade, and 100 degree core temps mean 212 degrees fahrenheit = the boiling point of water. I did a lot of reading about i7 nehalem, and the guy who wrote the article, which I found by googling i7 core temp limits, interviewed intel engineers whom he had access to as a system integrator. He was told how they binned chips, and he described how a 940 that barely passed the tests would end up as a hot-running 940 at up to 2.93ghz, whereas if it had failed, the same chip would be a cheaper, cool-running 920, never being pushed past 2.66ghz. So I think that's what we got in this chip, a hot 940 that barely passed the tests.
I appreciate all the suggestions about re-seating the heat sink. If I were up there that would be the first thing I would look at. I asked my friend, Miles, if he wanted to take the side of the case off and make sure the heat sink was not crooked, as if one push-pin had become disengaged. But I doubt it, because then the cpu temp would not be so low, at 84 degrees C.
So, tonight, with the computer now shutting off within 2 minutes, we went into the Bios. Miles has never been in a bios before, but I was on the phone with a copy of the manual. The first thing I had Miles do is turn off HT. Right away, there went half the 8 cores (not full cores, HT cores, but still heat-producing elements I would imagine.) That's got to help with heat, is what I told myself. Then I had Miles drop vcore voltage from the current 1.1 down to 1.0, the minimum allowable. That might have been unstable at 2.93, but I had no intention of letting the machine run that fast. I dropped the clock multiplier from 22, on the 133 MHz base clock, to the minimum of 12 - I was going to put it at 15, for about 2ghz, but he said the minimum was 12 so I said, let's go minimum for now.
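The clock speeds mentioned here are just the multiplier times the 133 MHz base clock. A quick Python sketch of that arithmetic, assuming the nominal base clock is 133.33 MHz (the post rounds it to 133) and using a made-up helper name:

```python
BCLK_MHZ = 133.33  # assumed nominal base clock; the post rounds it to 133


def core_clock_ghz(multiplier, bclk_mhz=BCLK_MHZ):
    """Core clock in GHz for a given multiplier and base clock in MHz."""
    return multiplier * bclk_mhz / 1000.0


# 22 is the stock multiplier, 15 the one first considered, 12 the one used.
for mult in (22, 15, 12):
    print(f"x{mult}: {core_clock_ghz(mult):.2f} GHz")
```

That lines up with the figures in the post: x22 is the rated 2.93, x15 would have been about 2.0, and x12 gives 1.6.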
Now it's a drastically underclocked 1.6ghz i7 940 - and compared to before, it now is running cool, so we accomplished our objective. My goal was to get all core temps below 90, under load, and we did.
I also increased all fans to max - there are only 3 fans: the cpu fan at 3,000, the 92mm back case fan at 4900, and the front 120mm intake at a lazy 900. That may not have made any difference, but I did that just for thoroughness. The back 92mm fan is loud, but before we went to max, he could see it cycling in the bios at up to 7000, where it was VERY loud. We set it on max, but it does not show up at 7000 as I would have thought; it now shows up at about 4900. It definitely makes noise, but it is not as loud as when it did the 30 second whoosh, and because the noise is now steady rather than a whoooosh every minute or so, it does not annoy him.
To test our underclock, he loaded the machine up with lots of applications: Doda running, modeling (he is modeling some of the doda characters), multiple YouTube videos running at 1080, etc. He got cpu usage including kernel up to about 50%. The result is that, under load, cpu temp has dropped to about 70, and core temps are down to the mid-80s C.
I told him that as long as core temps are under 95 C, I am certain his computer will never shut down. He monitors core temperatures with speedfan, and also with core temp.
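If anyone wants to check logged readings against that ceiling, a few lines of Python will do it. This is only a sketch: the 95 C limit is the figure from this post, not an official Intel number, the "mid-80s" sample values are made up for illustration, and the readings would have to be typed in or exported by hand from speedfan or core temp.

```python
LIMIT_C = 95  # ceiling used in this post; not an official Intel figure


def cores_over_limit(core_temps_c, limit_c=LIMIT_C):
    """Return (core index, temp) pairs at or above the limit."""
    return [(i, t) for i, t in enumerate(core_temps_c) if t >= limit_c]


# Readings from before the underclock, as posted earlier in the thread.
print(cores_over_limit([100, 99, 98, 99]))

# Hypothetical "mid-80s" readings after the underclock.
print(cores_over_limit([84, 85, 86, 85]))
```

The first call flags all four cores and the second flags none.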
So to summarize, we severely underclocked the machine, locking it to 1.6ghz. We dropped vcore about 9%, down to 1.0 from 1.1. We turned off HT, thereby cutting active "cores" in half (I know that those HT cores are logical cores, not full physical cores, but they are still heat-producing elements, so now there are only 4, not 8.) We feel that we now have a cool machine that will not shut off.
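As a rough sanity check on why this helps so much, the usual first-order rule of thumb is that dynamic CPU power scales with frequency times voltage squared. The Python sketch below applies that rule to the settings above. It is an approximation only: it ignores leakage power and the effect of turning off HT, and it assumes the voltage under load matches the bios setting.

```python
def relative_dynamic_power(f_new, f_old, v_new, v_old):
    """Ratio of new to old dynamic power under the rule of thumb P ~ f * V^2."""
    return (f_new / f_old) * (v_new / v_old) ** 2


# Vcore went from 1.1 to 1.0; clock went from 2.93 to 1.6 GHz.
vcore_drop_pct = (1.1 - 1.0) / 1.1 * 100
ratio = relative_dynamic_power(1.6, 2.93, 1.0, 1.1)

print(f"Vcore drop: {vcore_drop_pct:.1f}%")
print(f"Estimated dynamic power vs. stock: {ratio:.0%}")
```

Under those assumptions the chip would be dissipating somewhat less than half of its stock dynamic power, which fits with the temps coming down as far as they did.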
Tomorrow or the next day when we get around to it, we might play with HT, we might play with vcore, we might play with clock multiplier, and see what happens to core temps.
I want to mention again that I appreciate the suggestion about that other cooler that looks really good for about $30. The one thing about the arctic cooling freezer pro 7 that is an advantage, is the push-pins - you don't have to remove the motherboard.
By the way amuffin - I just noticed those forum links. Thanks! I'll check them out tomorrow for sure!