Hey guys,
Thanks for all the suggestions.
@lilotimz: great suggestion about a great cooler - thanks for the tip - I printed it to my paperport cooler folder. Everything is very clean inside the case - but the case is about 50 miles from where I am, so I won't have access to it for another week or so.
@baddad - I wish I were talking F, but no, I've been talking C so the situation is BAD!!
@amuffin and $hawn: the temp on the digital readout per the user manual is the cpu temp. Also the bios identifies it as the cpu temp. So does speedfan. I would guess it's the package - maybe the integrated heat spreader, or some monitoring point close to it. On this machine it's consistently lower than the core temps. At idle right now, we got cpu temp down to 59, and core temps down to mid 70's.
The machine is up in LA, and I am down in Orange County, but I think we did some good this evening over the phone by dramatically underclocking the machine - but it still seems plenty powerful for his needs. He is not a gamer.
What pushed us to take dramatic action is that the situation got progressively worse: the latest development was that the machine was shutting off every couple of minutes, and of course was entirely unusable.
----------------------------------------------------
My puzzling question remains: why would cpu temp drop 11 degrees C, from 95 C to 84 C, but core temps climb 35 degrees C, to 100, 99, 98, 99 and such - at the max threshold temp of i7 nehalem?
And my bizarre theory remains the same - by affixing the stock intel cooler with a superior cooling paste, namely arctic silver 5, I was able to drive the cpu temp down, which let the machine think things were cool and speed-step the hot-running 940 up to its full 2.93 ghz, with power management allowing core temps to build, but not beyond 100 C, which staggers my imagination! I know graphics cards also will support temps like that, but the BOILING POINT OF WATER!!!
But intel outsmarted itself, and the machine apparently is not capable of maintaining those core temps without producing errors. When I say the machine was shutting off every couple of minutes, Miles explained to me just a while ago that this was not an orderly shutdown - this was like somebody pulled the power cord. Ouch! I hope his hard drive was not damaged with sector read errors.
I think that the machine detected cpu computational errors, and therefore the machine took a very drastic emergency shutoff procedure.
-------------------------------------
So to be clear, I am not talking Fahrenheit, I am most definitely talking centigrade, and 100 degree core temps mean 212 degrees fahrenheit = the boiling point of water. I did a lot of reading about i7 nehalem, and the guy who wrote the article, which I found by googling i7 core temp limits, interviewed intel engineers whom he had access to as a system integrator. He was told how they binned chips, and he described how a 940 that barely passed the tests would end up as a hot-running 940 at up to 2.93ghz, whereas if it had failed, the same chip would be a cheaper, cool-running 920, never being pushed past 2.66ghz. So I think that's what we got in this chip, a hot 940 that barely passed the tests.
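For anyone who wants to double-check the C to F numbers, here is a quick sketch of the conversion. The function name is just mine for illustration; the temperatures are the ones quoted above.

```python
def celsius_to_fahrenheit(temp_c):
    """Convert a temperature from Celsius to Fahrenheit."""
    return temp_c * 9 / 5 + 32

# Temperatures quoted in this thread, in degrees C.
readings = [
    ("core temps at their worst", 100),
    ("cpu temp before the new paste", 95),
    ("cpu temp after the new paste", 84),
]

for label, temp_c in readings:
    print(f"{label}: {temp_c} C = {celsius_to_fahrenheit(temp_c):.1f} F")
```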
I appreciate all the suggestions about re-seating the heat sink. If I were up there that would be the first thing I would look at. I asked my friend, Miles, if he wanted to take the side off the case and make sure the heat sink was not crooked, as if one push-pin had become disengaged. But I doubt it, because then the cpu temp would not be so low, at 84 degrees C.
So, tonight, with the computer now shutting off within 2 minutes, we went into the Bios. Miles has never been in a bios before, but I was on the phone with a copy of the manual. The first thing I had Miles do is turn off HT. Right away, there went half the 8 cores (not full cores, HT cores, but still heat-producing elements I would imagine.) That's got to help with heat, is what I told myself. Then I had Miles drop vcore voltage from the current 1.1 down to 1.0, the minimum allowable. That might have been unstable at 2.93, but I had no intention of letting the machine run that fast. I dropped the clock multiplier from 22 on the 133 fsb to the minimum of 12 - I was going to put it at 15, for about 2ghz, but he said the minimum was 12 so I said, let's go minimum for now.
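If anybody wants to see how the multiplier maps to clock speed, here is a rough sketch. It assumes the 133 figure is the nominal 133.33 MHz base clock (I have been rounding it to 133), and the function name is only for illustration.

```python
BASE_CLOCK_MHZ = 133.33  # assumed nominal base clock; rounded to 133 in the post

def core_clock_ghz(multiplier, base_clock_mhz=BASE_CLOCK_MHZ):
    """Core clock in GHz = multiplier x base clock."""
    return multiplier * base_clock_mhz / 1000.0

# 22 = stock multiplier, 15 = what I first had in mind, 12 = the bios minimum
for mult in (22, 15, 12):
    print(f"x{mult}: {core_clock_ghz(mult):.2f} GHz")
```

So the 12 multiplier is where the 1.6ghz figure comes from, and 15 would have landed at about 2ghz.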
Now it's a drastically underclocked 1.6ghz i7 940 - and compared to before, it now is running cool, so we accomplished our objective. My goal was to get all core temps below 90, under load, and we did.
I also increased all fans to max - there are only 3 fans: the cpu fan at 3,000, the 92mm back case fan at 4900, and the front 120mm intake at a lazy 900. That may not have made any difference, but I did that just for thoroughness. The back 92mm fan is loud, but before we went to max, he could see it cycling in the bios at up to 7000, where it was VERY loud. We set it on max, but it does not show up at 7000 as I would have thought; it now shows up at about 4900. It definitely makes noise, but it is not as loud as when it did the 30 second whoosh, and because it stays constant it does not annoy him: steady noise, not a variable 30 second whoooosh every minute or so.
To test our underclock, he loaded the machine up with lots of applications: Doda running, modeling (he is modeling some of the doda characters), multiple YouTube videos running at 1080, etc. He got cpu usage including kernel up to about 50%. The result is that, under load, cpu temp has dropped down to about 70, and core temps are down to the mid-80s Centigrade.
I told him that as long as core temps are under 95 C, I am certain his computer will never shut down. He monitors core temperatures with speedfan, and also with core temp.
So to summarize, we severely underclocked the machine, locking it to 1.6ghz. We dropped vcore about 9%, down to 1.0 from 1.1. We turned off HT, thereby cutting active "cores" in half (I know that those HT cores are not full physical cores, but they are still heat-producing elements, so now there are only 4, not 8.) We feel that we now have a cool machine that will not shut off.
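To get a rough feel for why this helps so much, here is a back-of-envelope sketch. It assumes the usual rule of thumb that dynamic power scales with voltage squared times frequency, which is my assumption and not something I measured. It ignores leakage, the effect of turning off HT, and whatever vcore actually does under load, so treat the result as a ballpark only.

```python
def relative_dynamic_power(v_old, f_old, v_new, f_new):
    """Ratio of new to old dynamic power, assuming P is proportional to V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)

# Before: 1.1 V at 2.93 GHz. After: 1.0 V at 1.6 GHz.
ratio = relative_dynamic_power(v_old=1.1, f_old=2.93, v_new=1.0, f_new=1.6)
voltage_drop_pct = (1.1 - 1.0) / 1.1 * 100

print(f"vcore drop: {voltage_drop_pct:.1f}%")
print(f"estimated dynamic power vs. before: {ratio:.0%}")
```

Under that assumption the chip would be burning somewhere under half the dynamic power it did at stock, which fits with the temps coming down.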
Tomorrow or the next day when we get around to it, we might play with HT, we might play with vcore, we might play with clock multiplier, and see what happens to core temps.
I want to mention again that I appreciate the suggestion about that other cooler that looks really good for about $30. The one advantage of the arctic cooling freezer pro 7 is the push-pins - you don't have to remove the motherboard.
By the way amuffin - I just noticed those forum links. Thanks! I'll check them out tomorrow for sure!
Rich