CPU has gone up in flames... almost

Hey guys,
I was writing a rather long-winded, detailed tale of woe, as my CPU just failed on me. I will try again, but keep it shorter.
I have been running an AMD X4 940 CPU for the last 3 years. It has performed very well and I've been quite pleased with my purchase.
I was running this under a TRU in push-pull which also performed great. I read reviews about the Spire Thermax II, still rated #1 for AMD at Frostytech. I decided to get it, as I needed the TRU for another build.
It was hate at first sight. Stupid mounting hardware and the most idiotic and useless fan retention system I've ever seen. I decided to give it a go anyway. My temps were a bit higher (going from 33C idle and 45ish load to 38C idle and 50C load), but I figured it may be because I only used the one fan. I lowered my OC from 3.6 to 3.4 GHz so I wouldn't have to worry about temps.
About a week ago, I noticed my temps rising. I was idling in the 40s with loads in the 50s. The other day it got worse, with idling temps soaring.
I decided I needed to inspect things and clear out the dust bunnies. Without too much detail: I tested everything, cleaned it all out, checked the TIM (which was fine), cleaned the heatsink and reseated it. All of a sudden I couldn't keep the machine on for more than 5 minutes without it shutting down from reaching 60C at idle. This was after adding a second fan in pull position and turning the exhaust fan to high speed.
I ran some more tests and determined it was the CPU that suddenly had out of control thermals. I wanted to share this because it is not something I've encountered before and I've not seen this on the forums. Although I've heard people claim their CPU is non-functional and had out of control thermals, there always seemed to be a root cause for this. I now have a Sempy 140 2.7 GHz single core that idles near room temp and gets to a measly 32C maxed out. Which doesn't take much!! :(

I am very sad now and have retired my 940, RIP. Mostly I am sad that I gave away all my dual cores! With no money to replace now, I am gaming with no joy.

I am hoping that someone will have similar experiences to share. I can only guess that after 3 years of running hard it just finally spiraled out of control. As far as I can tell, the CPU itself is still operational but I can imagine what might have happened had I not set a 60C limit in the BIOS for shutdown. :)
 
Solution

Dogsnake

Distinguished
First, it seems the trouble started when you swapped out coolers. I would check the mounting hardware and the physical installation in general. It sounds like a bad cooler install. Please describe what thermal compound you used and how it was applied. When you removed the cooler, was it spread evenly on the CPU top and cooler base? It is possible that the CPU chose this moment to go bad, but all this started when you pulled the TRU. I would also go back to square one by setting the BIOS to "stock" and see where it goes. If the voltage regulation on the MB has gone bad, you could get the same results. Could you have damaged a MB component while installing the coolers?
 

sportsfanboy

Distinguished
Out of control thermals doesn't sound right; as said above, it seems that the cooler install is the culprit. I don't understand how a solid state device composed mainly of transistors can have "out of control thermals". If the internal circuitry was so damaged that temperatures were building despite a proper heatsink install, it's unlikely it would have worked in the first place to test your theory.

Reseat reseat reseat, slowly slowly slowly, carefully carefully carefully :)
 
Thanks for your comments. As mentioned above I did test thoroughly to eliminate any other possibilities, including BIOS reset and fail safe defaults.

My wife and my friend suggested early on that maybe it was a CPU that had somehow lost its thermal controls. I scoffed at this, saying "possibly", but I did everything I could to prove this theory wrong. I really wanted my CPU to still be functional, you see?

I installed the CPU with other coolers, and I assure you most carefully, with the same results; soaring temperatures and a thermal shutdown within five minutes.

The last thing I tried was using the OEM heatsink. Same thing; in fact, the heatsink became very hot to the touch almost instantly. After replacing the CPU with the Sempron 140, the CPU was under 30C. It sat for 20 minutes at 28C, not budging one bit. The core is at 29C as I write this. Wish I had a better processor, but this is all I have for now.

The method I use for TIM is the same as it's always been. I start with a drop of AS5, then with a small artist's brush I spread as thin a layer as possible across the CPU heat spreader. It's a method that has provided excellent results. I also used to lap my heatsink and CPU; the X4 940 had been lapped when first installed.

@dogsnake I am wondering if the stupid Spire heatsink may have damaged my CPU; as you say, I had no problems prior to installing that beast. But I find this unlikely. I think it's more likely the damn CPU just burned out.

@ uther: In the fall I am upgrading to 990FX and Bulldozer. This is when I will have money. At that time, I plan to put the CPU back in and see what happens. I still need my current motherboard for the time being. I will definitely document this and return here to post my results. For this venture I will probably use an OEM heatsink, as I am uncertain just how hot this thing will get! Anyway, blowing up my CPU sounds like good fun!!
 


I agree wholeheartedly; in fact, this was the basis for my argument with my wife (not really an "argument", more a logic argument). I didn't see how the CPU could be responsible for the soaring temps.
I told them that if a CPU were to "burn out" it would probably be non-functional and unable to run anything. I was able to enter the BIOS at any time, and even booted to Windows a few times before it got too hot and shut down. As far as I can tell, it's still a functioning CPU, just one that's too hot to use. That's exactly why I chose to share this, because it's so unusual and I've never heard of anyone experiencing this before.

Do you really think I installed the Spire heatsink wrong 3 times, followed by installing the OEM heatsink wrong twice? Even if I was a complete moron with no experience, that's a bit hard to swallow! ;P

The first thing I checked was to make sure I installed the heatsink correctly.
 

sportsfanboy

Distinguished
Hmm, that's really weird. You would think a burnt out or dying chip would just have stability issues, or not POST, or not POST every time, or something. Didn't mean to insult ya on the heatsink thingy, but I've never heard of this before. So no matter what you do, the temperatures get how high?
 


No worries. The temps get to 60C now in a matter of about 2 mins from a cold start, with any heat sink and default BIOS options. After that it will refuse to boot long enough for me to get into the BIOS to check the temp! :pt1cable:

My Antec 650's voltages are spot on as always. I even tried lowering the multiplier to run at 2.8 GHz and lowering the voltage to 1.2V, but it still had no effect on the 60C+ CPU temps.
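
As a rough sanity check on the underclock/undervolt test above: dynamic CPU power scales roughly with voltage squared times frequency, so dropping clocks and voltage should cut heat noticeably on a healthy chip. Here is a minimal Python sketch of that estimate. The 3.0 GHz / 1.35V baseline is an assumed stock setting used only for illustration (it is not stated in this thread), and the formula ignores leakage (static) power entirely.

```python
def relative_dynamic_power(v_new, f_new, v_base, f_base):
    """Ratio of dynamic power after a change, using P ~ C * V^2 * f.

    The capacitance term C cancels out because the chip is the same.
    """
    return (v_new / v_base) ** 2 * (f_new / f_base)


# Assumed baseline: 3.0 GHz at 1.35 V (hypothetical stock values).
# Tested settings from the post above: 2.8 GHz at 1.2 V.
ratio = relative_dynamic_power(1.2, 2.8, 1.35, 3.0)
print(f"Estimated dynamic power vs. baseline: {ratio:.0%}")
```

Under those assumptions the estimate is roughly a quarter less dynamic power, which is why temps that don't move at all after such a change look odd.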

I need my current motherboard, so I don't want to experiment with "how high can it go?", but keeping the shutdown threshold at 60C keeps the 940 from exceeding the 62C safe operating temp as described by AMD for Phenom IIs.

It was such a good chip, I'm sad to see it departed but I am looking forward to testing the limits in September!!
 
Well, the only thing I could say (unless you've already done this and I didn't see) is to try the CPU in another motherboard. Normally a reset to default settings should have sorted things out, although there may be a chance that the motherboard could still be an issue for higher-powered CPUs.

Although it sounds like that may not be an option, and with the BD build in the not too distant future, it may not be worth testing...

Very, very strange issue indeed.
 

sportsfanboy

Distinguished
Good point. It's possible the mobo is flaking out and reading temps wrong, thereby shutting the system down. Is the heatsink warm or hot to the touch? Because you might have some answers if it isn't and the BIOS is reporting high temps.
 
Yes, I'd like to have another board to test it in. I've had my eye on my wife's PC, and she has graciously offered me her triple core for the time being, but we both need our PCs right now for school. I played some Heroes V last night and it ran OK, though much slower, and for some reason load times increased horribly...

I think I will do some testing with both PCs this weekend when I have some more time. I've increased the voltage and clock speed on the Sempy with no ill effects; temps are still right around 28C.
 


Yep, thought of that too. In a previous post I mentioned how, after starting up the 940 with the OEM heatsink, it became hot to the touch almost immediately. I swapped out the 940 for the Sempron and tested immediately. The Sempron was initially at 40C due to the hot heatsink. Temps dropped quickly back to 28C for Sempron idle. All other sensor temps are reporting as they always have, and the Sempron temps are consistent with temps I've observed with this chip on several different builds. It's a go-to spare CPU that is cool, quiet and reliable.

The sensor appears to be operating correctly. And the evidence for out of control thermals is abundant, including a heatsink getting hotter to the touch than the one on my GPU. Also, the voltages are within the values they have always been at, and I have every control over the voltages for CPU, NB, HT, etc. Everything is set as it should be at default values. The motherboard appears to be fully functional. No problems whatsoever since switching CPUs, except for a weird driver that Windows insisted on installing for the AM3 CPU. The new CPU easily accepts extra voltage and overclocking.
 

sportsfanboy

Distinguished
The only other thing I can think of is that the mobo is not being stressed while running the Sempron. So maybe the issues aren't manifesting themselves with that chip, at least not now. Dunno, almost out of ideas. Moving the chip to another motherboard would say for sure.
 

Here are some screenshots: [two images not preserved]
 
My thinking on it is that maybe a component on the motherboard could be causing the issue (pretty much what sportsfanboy is saying about the MB not being stressed).

Most CPUs don't normally have issues. I've noticed that when a chip overheats on default settings, it's typically caused by the PSU or MB.

If the PSU is fine, then the motherboard may be the issue. If a CPU has the same issues on another MB, then the logical conclusion would be that the CPU has the issue.

Those are my steps to determine what's fully going on.

Although if it was the PSU, I would think the GPU would show the signs first (I've seen people in the past replace a PSU with a more powerful, quality PSU and GPU temps went down with it), and with the PSU already tested, it doesn't appear the PSU is the issue at all (at least to me or anyone else).

So the next step is the motherboard test. If the issue shows on the other board, then we'll know that the CPU is bad. If not, then I think it's safe to say where the issue resides (the MB), but how the issue came about will be beyond me.
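
The swap test described above boils down to a small decision table. A minimal Python sketch of that reasoning follows; the function name and the wording of the verdicts are made up for illustration, and it assumes the PSU has already been ruled out as discussed.

```python
def diagnose_swap_test(hot_on_original_board, hot_on_second_board):
    """Interpret a CPU swap test, assuming the PSU is already ruled out.

    Each argument is True if the suspect CPU overheats on that board
    at default settings with a properly mounted cooler.
    """
    if hot_on_original_board and hot_on_second_board:
        # The fault follows the CPU from board to board.
        return "CPU is bad"
    if hot_on_original_board and not hot_on_second_board:
        # The fault stays with the original board.
        return "original motherboard is suspect"
    if not hot_on_original_board and hot_on_second_board:
        # Unexpected result: recheck the install on the second board.
        return "inconclusive, recheck second board"
    return "fault not reproduced"


# Example: the 940 overheats on both boards.
print(diagnose_swap_test(True, True))
```

The point of the table is that only a result where the heat follows the chip to the second board pins it on the CPU; anything else still leaves the original board in question.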
 
If you're brave, you could try a 1M SuperPi run. Chances are it will fail before it finishes, not due to heat, but possibly due to electron leakage out of the lanes. If this is the case, SuperPi will get the calculations wrong and thus fail.
Electron leakage becomes more of a worry after long periods of overclocking. Normally it can be combated for a short while by increasing volts, but it looks like yours has gone beyond that.
Either way, it will mean you want a new CPU...