tss26

Distinguished
Jul 18, 2009
6
0
18,510
Last week I changed my CPU HSF. It ran fine with the new fan for several days (with good temperatures) then all of a sudden it just switches off. Every time I try to start it, the fan jolts a little bit but doesn't start spinning, and it shuts down automatically shortly thereafter. There's no video signal either. If you turn it off before it shuts down, the fan starts spinning, then winds down as it loses momentum.

So I removed the mainboard and everything else then put them back in again. It worked properly for a while but shut down again about 15 mins later and I was back to square one.

So then I took everything out again, thinking that there was something in the case that was causing the motherboard to short circuit. I placed the board on an anti-static bag and connected everything again. Again, this worked, but only for about 15-20 mins before I got the same problems as before. My fan is 4 pin but I've tried running it on 3 pin connectors as well, which wasn't any help.

I'm really at a complete and utter loss as to what the problem could be. The only thing I can think of is that the fault is in the mainboard itself. However, I don't feel like spending $250 on a new mainboard just to test this hypothesis, so any advice would be appreciated.

My system consists of a stock E8200, 2 GB of RAM, a GA-P35-DS3P and an 8800GT, all in an Enermax Chakra case. Thanks in advance.



 

bilbat

Splendid
First things first:
I placed the board on an anti-static bag and connected everything again.
The anti-static bag makes a great place to work on your board if you don't happen to have an anti-stat mat:
dscn1649s.jpg

but the same thing that actually makes it 'anti-static', also makes it a really poor place to run the board - it's mildly conductive - not enough to, say, actually short a trace and let the 'magic smoke' out of the MOBO (all modern electronics work because they have 'magic smoke' inside - once you let out the smoke, they cease to operate! :pt1cable: ), but enough to futz up their operation in strange ways...

It ran fine with the new fan for several days (with good temperatures) then all of a sudden it just switches off.
You fail to note here what your temps look like when you're having the actual problem. This looks a lot like an HSF problem: too much, or not enough, or a 'bad spread' of paste, or a loose, unseated, cracked, or broken retaining pin. I have had numerous people swear to me here that they're sure they did the HSF install correctly, and then sheepishly come back to report either a pin loose, unlocked, or cracked... It's not inattention or incompetence, either - there seems to be a contest going on (but I think Intel's system wins hands down with their really crappy 'stock' 775 piece) to see who can design the worst, most impractical way to attach a heatsink! I've tried a few, and, with one exception, they all stank! (The exception: I finally bit the bullet and went to water - have a D-Tek Fuzion, and it came with: an 'x' shaped, threaded, metal backing plate with reinforcing ribs and a molded-on foam rubber insulating pad, and a spring-loaded set of screws with a machined release [not a single &^%$ ^%#@ plastic piece involved!] that lets you get to exactly the correct tension, and then simply turns without tightening further; the drawback - you have to have unimpeded access to the back of the MOBO; I've written D-Tek and told them they should quit wasting their time on water-blocks, and sell mounting systems to every HSF manufacturer!)

IMHO, the best feature of GB's 'ultra-durable' MOBOs (which is, otherwise, a bunch of ditzy marketing hype - '50 degrees C cooler'? - I'm pretty sure that if anything on my MOBO was 50 cooler, there'd be frost forming on it!) is that they'll take enough pressure to get an HSF seated without giving you the ominous feeling that you're a half-ounce of pressure away from a dreadful, fatal snap! Another point to be made is procedure: it's usually easier to 'work your way' around the chip, but, for the best results, you want to do a pair of diagonally opposed pins first, and then finish up with the other two...

The symptom is this - the board powers up, fans start to spin as the BIOS starts to control them, but the CPU is not 'spilling' enough heat to the HSF to keep it reasonably cool, so it goes into thermal shutdown to protect your chip. If you're getting any time at all that it's remaining on, try going to the 'health' page of the BIOS and watch your CPU temp - if it's going up and up and up and - powee - dead - it's the HSF; also note whether the HSF fan continues spinning at a fairly consistent speed - if it doesn't, it could be a configuration problem, the board simply 'doesn't like' its fan (have seen a few of these - appears to be a compatibility problem with the PWM switching voltage on GB boards), or it's binding due to a bad bearing...
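
If you'd rather jot the readings down than eyeball them, the 'up and up and up' check can be sketched roughly in Python. This is only an illustration: the function name, the 80% 'mostly rising' cutoff, the 15 C minimum rise, and both sample lists are assumptions made up for the example, not values from any BIOS or spec.

```python
def looks_like_thermal_runaway(temps_c, min_rise_c=15.0):
    """Return True if the readings climb almost every step and rise a lot overall.

    temps_c: CPU temperatures in Celsius, noted by hand at regular intervals.
    min_rise_c: assumed minimum total rise that counts as 'running away'.
    """
    if len(temps_c) < 3:
        return False  # too few samples to call it a trend
    steps = len(temps_c) - 1
    rises = sum(1 for a, b in zip(temps_c, temps_c[1:]) if b > a)
    mostly_rising = rises >= 0.8 * steps  # assumed cutoff: 80% of steps go up
    return mostly_rising and (temps_c[-1] - temps_c[0]) >= min_rise_c


# Made-up readings: a badly seated heatsink versus a healthy idle.
climbing = [38, 45, 53, 61, 70, 79, 88]
steady = [30, 31, 30, 32, 31, 30, 31]

print(looks_like_thermal_runaway(climbing))  # True
print(looks_like_thermal_runaway(steady))    # False
```

A steady reading that still ends in a shutdown would point away from the HSF and toward the fan, board, or PSU.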
 

compdude61

Distinguished
Jun 6, 2009
48
0
18,530
Hmm! You may have given me some valuable info on my problem with my Foxconn 775 Intel E2200 Dual Core CPU board. Mine does the exact same thing, with the CPU fan spinning up fast and then slowing down right after powering up the system. I may give it a shot and purchase another CPU HSF to try. My HSF is attached to the board by 4 screws. I have a new HSF, but its hold-downs use 4 plastic locking pins, which are too large to fit into the threaded holes in the mobo. I wish I could find a way to make it fit. Maybe the fan is my problem, causing the system to shut down to protect the CPU. Maybe I could plug the HSF into one of my towers lying around and give it a shot? Bench test is what I mean. Thanks for that additional info. I did not try this in my other testing.
 

tss26


Well actually, the reason I had to change my HSF was that I broke one of the plastic screws on the stock HSF when I was trying to put it back on after cleaning it.

So I got a Cooler Master Vortex 752 (the cheapest HSF I could obtain at such short notice) and some Titan Nano thermal grease. One possible issue is that I removed the old stuff using nail polish remover and a cotton bud (hardly lint free!). I also managed to spill some of the nail polish onto the mainboard if that makes a difference. I did that credit card thing to make sure the spread was good.

But if poor thermal conductivity was the issue why would it work fine (averaging 30C idle and a maximum of about 45C under load) for several days before suddenly failing? And the two times I managed to get it up and running again the temps were the same. So that's why I think it might be a motherboard issue and not the CPU.


If I put the fan on one of the 3-pin connectors for other fans it spins, but the same thing happens. I even tried blowing a hair dryer (on cold) on the HSF, which kept it going a bit longer, but it shut down again shortly after starting Windows.

The last time something like this happened it turned out to be the power supply. Could that possibly be the problem here? Although I have a Silverstone PSU now and not some cheap one, so I don't think so...
 

bilbat

Splendid
One possible issue is that I removed the old stuff using nail polish remover and a cotton bud (hardly lint free!).

Not a likely problem - a few barely visible cotton fibers 'stuck' in the TIM (Thermal Interface Material - the heatsink compound) certainly wouldn't cause its failure...

I also managed to spill some of the nail polish onto the mainboard if that makes a difference.

Also nothing to worry about; manufacturers actually run the boards through a vapor-phase-change solvent cleaning step after the wave solder to clean up flux residue...

I did that credit card thing to make sure the spread was good.

This is not a good idea - sort of a computer version of an 'old wives' tale'! The best way to get a good spread of TIM is to place a single, uniform (i.e., as close as manageable to spherical) 'blob' of TIM on the CPU, and then mount the HSF by placing it atop - as flat as you can; if you do it this way, the 'squeeze' applied as you tighten the HSF retainers pushes the blob toward the outer edges of the interface uniformly, with no chance of trapping air. If you start out with a 'sort of' flattened application by 'spreader' (i.e., credit card edge), low spots are likely to trap air - and, think about it: if a bubble is trapped, and there's nowhere for it to go, what happens as you 'squeeze' it? The volume remains the same, but it gets 'thinner' due to the pressure, so it gets wider, causing more area under the block to lose contact with the actual TIM.

This is also a reason for 'delayed' failure of the thermal interface: the TIM isn't very viscous, but it is, somewhat; and thermal cycling makes it more so; with a good spread, this is an advantage, as your thin, even layer gets even thinner, and therefore, more conductive; if you have a bubble trapped, as the TIM spreads from continued pressure and thermal cycling, your 'bubble' also gets thinner - meaning, at a constant volume, you lose more contact area between the CPU and sink, degrading the thermal interface, and causing the chip to get hotter and hotter...

All in all, though, based on your comments about the observed temps the two times you got it to 'catch', I'm leaning toward doubting a thermally induced problem; power supply is always possible - if you'd be so kind as to list everything 'in the box' that's drawing power, and the model # of the PSU, I can check for you to see if the setup is in any way 'marginal'.
 

tss26


I have an SST-ST50EF. The only devices other than what I've listed in the OP are a hard drive, a DVD drive and a sound card. So I don't think it's a case of the PSU being overloaded.

What part is responsible for PWM generation? Is it on the mainboard or is it internal to the CPU itself? That seems like the most likely problem to me.
 

bilbat

Splendid
I checked the PSU specs - looks fairly remarkable - how do you find it for noise? You've got two 18 Amp 12V rails, and 25 Amps on the 5V - I think, in a pinch, you could use it for welding! Also has short-circuit 'fold-back', overcurrent, and open-circuit protection - don't think you could damage it, even if you were trying!

PWM is a slightly odd concept. If you already know the basics, I'm not trying to be condescending, just thorough... What happens in PWM (Pulse Width Modulation) is that a signal at a fixed frequency (someplace around 18 kHz, I think, for fans) has its 'duty cycle' varied (in other words, the fraction of each cycle it spends 'on'; the English have a really descriptive term for the closely related ratio of 'on time' to 'off time' - they call it 'mark-to-space ratio'), so that, for various fan speeds, it looks like this:
pwmwave1.jpg

The important part of this drawing is the 'level' at Vth; the board's PWM output, on pin 4 of the header, is ostensibly a 'chopped' 5V signal that operates 12V switches on the FAN to regulate the speed, but it has this 'on' threshold that is the actual switching voltage of the physical switches. There is also a 'low' threshold that I didn't add to the drawing (confusing enough, I figured) that sets the highest voltage, beneath which the signal will be interpreted as 'off'.
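
For anyone who likes numbers better than waveforms, here is a rough Python sketch of the timing behind a given duty cycle. The function name is made up for illustration, and the 25 kHz default is the target frequency from the Intel spec quoted further down the thread rather than anything measured on this board.

```python
def pwm_timing(duty_cycle, frequency_hz=25_000.0):
    """Return (period_us, on_us, off_us) for one PWM cycle.

    duty_cycle: fraction of each cycle spent 'on', from 0.0 to 1.0.
    frequency_hz: PWM frequency; 25 kHz is assumed from the quoted spec.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0.0 and 1.0")
    period_us = 1e6 / frequency_hz   # length of one full cycle in microseconds
    on_us = duty_cycle * period_us   # the 'mark'
    off_us = period_us - on_us       # the 'space'
    return period_us, on_us, off_us


for duty in (0.25, 0.50, 0.75):
    period, on, off = pwm_timing(duty)
    print(f"duty {duty:.0%}: on {on:.1f} us, off {off:.1f} us of a {period:.1f} us cycle")
```

At 25 kHz each cycle lasts 40 microseconds, so a 25% duty cycle means 10 microseconds on and 30 off, which is a mark-to-space ratio of 1:3.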

Intel set the general PWM spec; I've posted it here:
http://rapidshare.com/files/259935766/PWM_fan_specs.pdf.html

Ah - my frequency guess was off ("Target frequency 25 kHz, acceptable operational range 21 kHz to 28 kHz"), but for our purposes - doesn't matter; the important stuff is on page 9:
Maximum voltage for logic low: VIL = 0.8 V
Absolute maximum current sourced: Imax = 5 mA (short circuit current)
Absolute maximum voltage level: VMax = 5.25 V (open circuit voltage)
You see, they give the 'logic low' threshold, and the highest signal allowed, but they don't mandate the 'switching' or 'logic high' threshold, and we're guessing (have had this problem with two specific HSFs, but, unfortunately long enough ago so I can't remember the brands) that this is where the problem is. We have tested old fan in new motherboard, works and exhibits speed control, so we know the MOBO PWM is working; new fan in different MOBO, so we know the new fan is controlling OK; new fan in GB MOBO, either dead (in one of the cases), or spits at first -spins a little (startup surge specified in spec??), then dead. All we can figure is that this must be the situation:
pwmwave2.jpg

I mention this as I have seen it numerous times, always with two products; our guess was that the switching threshold on some fans is a little bit high, and the 'chopping' signal on GB's might be a bit low; put 'em together - no go.
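
That guess can be written down as a toy model in Python. To be clear, this is a sketch of the hypothesis only, not verified fan behavior: the function name and the example voltages (3.3 V board output, 2.8 V and 3.6 V fan thresholds) are invented for illustration. Only the 0.8 V and 5.25 V limits come from the spec lines quoted above.

```python
VIL_MAX_V = 0.8   # from the quoted spec: maximum voltage for logic low
V_MAX_V = 5.25    # from the quoted spec: absolute maximum voltage level


def effective_duty(commanded_duty, board_high_v, fan_vth_v):
    """Duty cycle the fan registers under the threshold-mismatch guess.

    commanded_duty: duty cycle the board is trying to send (0.0 to 1.0).
    board_high_v:   'high' level of the board's PWM signal (hypothetical).
    fan_vth_v:      fan's 'on' switching threshold (hypothetical, since the
                    spec does not mandate a logic-high threshold).
    """
    if not 0.0 <= commanded_duty <= 1.0:
        raise ValueError("commanded_duty must be between 0.0 and 1.0")
    if not VIL_MAX_V < board_high_v <= V_MAX_V:
        raise ValueError("board_high_v is outside the window the spec allows")
    if board_high_v >= fan_vth_v:
        return commanded_duty  # pulses clear the threshold, fan follows them
    return 0.0                 # pulses never register as 'on'


# Same board signal, two hypothetical fans:
print(effective_duty(0.6, board_high_v=3.3, fan_vth_v=2.8))  # 0.6
print(effective_duty(0.6, board_high_v=3.3, fan_vth_v=3.6))  # 0.0
```

Both combinations sit inside the limits the spec does state, which is how a fan and a board could each be 'in spec' and still refuse to work together.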

Easy to test, as above. Old fan in this board, new fan in any other (non-GB) board; both work, I can't think of any other explanation...

What part is responsible for PWM generation? Is it on the mainboard or is it internal to the CPU itself?

I'm not absolutely positive, but I believe this is handled by the ITE IT8718 LPC I/O chip, but, of course, the actual PWM manipulation of the triggering is done by the CPU running some BIOS code, so everyone's got a hand in it...
 

rockyjohn

Distinguished


Actually, according to a little company called Arctic Silver, the best way to apply the thermal paste depends on the particular processor and its features. For an Intel dual core, this is a single line of paste. At least for some AMD chips, and maybe others, you do spread it. Not an old wives' tale at all. For more specific directions for various CPUs, see

http://www.arcticsilver.com/arctic_silver_instructions.htm
 

rockyjohn

Distinguished


I have never seen that. Being extremely curious and considering such a board for my next build, I went to their website and looked at some of their information on the ultra-durable boards and did not see anything like that. Would you please provide a link to where they make these claims?

They did, however, explain certain features of those boards, such as two 2 oz copper layers for the power and ground layers to lower system temperatures (no claim as to specific amounts) and improve energy efficiency, ferrite core chokes, and low RDS(on) MOSFETs, which seem like more than marketing hype. I am no electrical expert, but I bet if this was all marketing hype the reviews would have said so - all I have seen are positive reviews of the boards' features.
 

rockyjohn

Distinguished


Actually this could be a problem.

First, I assume you really meant "nail polish remover" and not nail polish. If you have some "crimson fire" nail polish left on the board this would be a real issue.

Second, not all solvents are the same - the fact that manufacturers use specially selected - and in the cpu manufacturing process most likely specially formulated - solvents in a closely controlled process does not mean that spilling other solvents on the board will have no impact.

How soon after spilling it did you first power up the board? If there was even a small amount of the liquid left, it could short out a circuit. In addition, some nail polish removers contain oils, scents, and colorings that could leave a residue after the rest of the solvent evaporates, and that residue could create a short. What did you do to clean the spot afterwards? Note that oils might resist simple cleaning and leave an almost invisible film still capable of short-circuiting a component.
 

rockyjohn

Distinguished
Please stop with your bogus personal attacks and off-topic posts. I corrected you in two cases above because you were flat wrong and providing people with misleading information. Should I just ignore what I know to be errors - maybe at a cost to the originator and others reading the post? When someone has what strongly appears to be a heat issue after changing their HSF, getting accurate information about how to apply thermal paste is important, and you should thank me for correcting the misleading information you provided.

And I asked you to post support for a statement you made attacking the maker of a motherboard I am considering for purchase, because I found it not believable. Why are you personally attacking me for that? If your statements are true, why not just post the link to where the statements were made instead?
In fact, for all three - why not post correct information if I am wrong? If I am not wrong - why are you personally attacking me? Let's stick to the originator's issues.

And to give readers a clear picture, his statement above "All because we decided you were irrelevant, and ignored you?" refers to another thread where I had the temerity to disagree with him. Apparently it is a pattern - if you disagree or show him wrong, he responds with personal attacks or put downs - not at all in keeping with the spirit of the forum to help the thread originator by focusing on his issues. Here is the other thread should you wish to see for yourself.

http://www.tomshardware.com/forum/261510-30-trouble-china-ep45-ud35#t1831315
 

rockyjohn

Distinguished
bilbat - tss26 started this thread 6 days ago - and you started trying to help 6 days ago.
You have posted some long lists of things you wanted to talk about - which were of little direct help to tss, and have left him hanging since four days ago with "I'm going to post this temporarily, and come back to edit... "

When are you going to go back and edit? Do you intend to continue helping him try to solve his problem? Meanwhile I see you have gone on to posting on many later threads. Tss26 is probably not looking for a lot of generic info on HSF and power supplies, he just wants his system running.

tss - are you still watching this thread and still have same problems? Please let me know and I will try to help. If so:
Is system still "on the table" or back in the case?
Do you have the old HSF? If so, first thing I would do is try to reinstall it - since it was your last good config - and see if it works. If so, this narrows down the likely causes to a bad new HSF or a problem with installing it. When you reinstall the old HSF, take care in applying thermal paste and make sure it is correctly and fully connected to the system and that the electrical connection is hooked back up properly (I say the obvious because these could have been issues in installing the new HSF).
Also - since removing the first HSF initially - have you made any other changes at all to system?
Also - did you review my post above about the "nail polish" and are you reasonably certain there is no issue there? With regard to that, please also carefully inspect the top of the mobo and report any indications of overheating - such as black marks or bulging capacitors. This is less likely, as system shutdown should be protecting the system, but it doesn't hurt and is easy to check.
 

bilbat

Splendid
I believe that "come back to edit" was just before you interrupted me; I had already spent some time in CAD doing an illustration of a previous problem we have seen - exactly these symptoms - specific to individual brands and models of fan, and, unfortunately, I can't remember which... Appears to be a threshold problem between the GB implementation of PWM, and the PWM spec; symptoms - old fan works fine, and exhibits speed control - proving the on-board PWM is working; new fan either doesn't work, or 'spits and sputters', usually showing a turn-on burst, followed by nothing; new fan works in other MOBO - proving fan's on-board PWM switches are functional; has to be a threshold problem - saved the cad waveform drawings to elaborate and explain; you interrupted, causing me to spend a day or two wondering if I really thought doing this was worth the aggravation, and passing out my e-mail to 'satisfied customers' who refer back for periodic miscellany; an awful toothache; tooth pulled; a day spent taking my elderly parents to a local function; do you now imply that you can schedule me, or that I should feel guilty about my schedule? I believe this is 'gratis'; I do it to be helpful - if I occasionally lag a little - feel free to call Gigabyte tech support - oh, ahh, you do speak Mandarin, don't you... I do...
 

rockyjohn

Distinguished

I see. Now you are blaming me for you not returning for 4 days? I did not realize that I had such power over your schedule.

Just another example of how you ramble on about things the originator may have little interest in - using up your time and his instead of helping him fix his issue. For third parties reading this and wondering why I am concerned about this issue, I refer you to this thread where he took another person on a long journey, until that person stopped coming back. You can't understand my concern about this issue without seeing how this is not an isolated incident - and since bilbat appears to be distorting things again in the same way - it is relevant:

http://www.tomshardware.com/forum/261510-30-trouble-china-ep45-ud35


I never implied, as you state, that I can schedule you nor that you should feel guilty about your schedule. Where do you keep coming up with such crap? Just more attempts to mislead. But I do see - just by looking at the current listings for the Gigabyte forum alone - that you have the last response on 13 other threads since you deserted this one - with multiple posts in some threads. Clearly you were out "taking on" new threads while leaving this one, which you had already started to help with, high and dry - and waiting for the continuation you said you would provide. So when you complain about your toothache and parents it is just another misleading statement. You have clearly spent a lot of time on the forum - just with other new threads.

It is nice to help. You are to be commended for that. But why not focus more on helping someone you started with and made a commitment to before deserting them and starting with others? And this is not the first thread you have deserted. Just recently you recruited a lot of users having problems with their EX 58 boards to join a new thread you originated and repost their information, promising to work with them and solve their issues. Then after making a few charts and pontificating some more - you apparently abandoned them and have not been back in the past 9 days - no doubt due solely to your dental problems and parents' issues.

http://www.tomshardware.com/forum/256094-30-ex58-ud3r-failure-debugging
http://www.tomshardware.com/forum/261336-30-ganging

Could it be the current thread was no longer as entertaining to you? You wanted to spin some more of your yarn like you did in the above post or earlier in this thread?

And you end your post with "oh, ahh, you do speak Mandarin, don't you... I do..." - another of your childish put-downs, made while attempting to puff yourself up. Just like in the thread linked above.

I am just attempting to help posters and solve their issues. I just wish you would do likewise instead of using the forum as a place to pontificate, take posters on roundabout journeys that don't help them solve their issues, and, when corrected by others like me when you do so or when you state erroneous facts, respond with childish personal attacks. I don't have time for this either - but feel I must respond to your misleading statements.
 

tss26

LOL, calm down guys. I'm still stumped, but I've had it running now for the longest time yet -- over half an hour now! I'm just waiting for it to shut down again.

I ran Super Pi for 32 million digits and the maximum temp was 46/41 (core 1 / core 2). Then I ran 3D Mark and the maximum was 48 / 45. So I really don't think it's a thermal issue.

I really think it is the motherboard as it seems to work after nudging the board a bit. Not sure why. Has anyone seen anything like this before?
 

rockyjohn

Distinguished
It could be a thermal short that does not express itself until the board is heated - but those are extremely rare.

Does the system still regularly shut down after 15 minutes? How consistent is the pattern? What changes, if any, have you made since your last report - both in things you have tried and changes in symptoms?

Please describe any other current issues and respond to the questions I asked 4 posts above.
Also, how many times have you reinstalled the HSF since you switched to the new one? Any chance the thermal paste was not properly applied both (all) times?

Have you run the system in the BIOS, monitoring the temperature from there? If not, you might want to try it to see if it still shuts down with that minimal load and what the temp readings are.

Have you tried running it with only one stick of memory, just to minimize the possibility of an issue related to the memory itself or the memory sockets? Obviously this is a less likely case, but you are down to checking those now, and if you can rule out the other options it increases the likelihood of a mobo problem and builds the case to RMA it. Also, it is a pain to RMA a motherboard only to get a new one and find that it was not the problem when the issue repeats.

Finally to rule out the PSU, can you try it on another board or get another known, working PSU to try in your system?
 

tss26

Sorry, formatting is messed up.

>Is system still "on the table" or back in the case?

It's back in the case now.

>Do you have the old HSF? If so, first thing I would do is try to reinstall it - since it was your last good config - and see if it works. If so, this narrows down the likely causes to a bad new HSF or a problem with installing it. When you reinstall the old HSF, take care in applying thermal paste and make sure it is correctly and fully connected to the system and that the electrical connection is hooked back up properly (I say the obvious because these could have been issues in installing the new HSF).

The problem is that one of those annoying push pins on the stock HSF broke, which is why I needed to get a new one. However I plugged in the old HSF fan and it had the same problem.

>Also - since removing the first HSF initially - have you made any other changes at all to system?

No.

>Also - did you review my post above about the "nail polish" and are you reasonably certain there is no issue there? With regard to that, please also

Yes, I meant to type "nail polish remover". There doesn't seem to be any visible mark on the board, but as you said it might not be visible.

>carefully inspect the top of the mobo and report any indications of overheating - such as black marks or bulging capacitors. This is less likely, as system shutdown should be protecting the system, but it doesn't hurt and is easy to check.

Nothing obvious.

>Does the system still regularly shut down after 15 minutes? How consistent is the pattern? What changes, if any, have you made since your last report - both in things you have tried and changes in symptoms?

Good news! A couple of days ago I had it working continuously for several hours! It may be that the problem has solved itself, but I've been busy so haven't had the chance to try it again. I'll give it another go tonight and see what happens.

It started working after I prodded and flexed the board by pushing and pulling on the edges. That makes me think the problem was most likely to do with the board's circuitry and the issue was solved by applying a bit of force and deforming the board slightly.

Here's hoping the problem doesn't recur. I'll keep you posted.

 

bilbat

Splendid
It started working after I prodded and flexed the board by pushing and pulling on the edges. That makes me think the problem was most likely to do with the board's circuitry and the issue was solved by applying a bit of force and deforming the board slightly.
That's hilarious, and entirely possible. Back in the late eighties to early nineties, I think, I carried around a CRT 'terminal' that weighed about thirty-five or forty pounds, which was used to program and display 'ladder logic' that ran in Allen Bradley programmable logic controllers; this was about a six thousand dollar piece of hardware. One day I had an AB guy in to look at a particular machine's 'set-up' and give us some kind of advice - I forget about what. For weeks I had been having intermittent problems with our terminal glitching up - losing connections, losing video; was starting to think it might be time to bite the bullet, and invest in a new one. It just happened to do it while the AB guy was there - he shut it down, pulled the keyboard (inside of which he said the whole 'brain board' lived), grabbed it by its two corners, and gave it a mighty diagonal wrench (I cringed at the popping noises!), stuck it back in, powered up - and it worked fine for years until the programming moved to laptops...