2 murdered GTX 570s: is the PSU or the mobo the serial killer?

steiner666

Distinguished
Jul 30, 2008
369
0
18,780
OK, here are my specs:

Asus P5E-VM HDMI (OCed to 420 MHz FSB, Prime95-stable for years)
Intel Q9550
4x2GB G.Skill DDR2-1000 (tested good in Memtest86+)
NVIDIA GTX 570
Antec NeoPower 650W
Creative X-Fi PCI sound card


A concise catch-up of the problems I've been having for the past week or two, from the start:

- I shut my computer down one day and opened it up to clean it with canned air, something I've done countless times on my personal computers and at work. I was properly grounded and careful as always; the only odd thing was that some mist sprayed out of the canned air (like it does when you hold it upside down) even though I was holding it almost perfectly vertical (while blowing out the PSU, I believe). I let the computer sit for 10 minutes or so before powering it on to let any moisture dry out.

- The computer wouldn't start once I plugged it back in. The case/CPU fans would spin, but there was no video signal and no power to my USB devices (they usually flash and then stay on during POST). I lost the speaker for my mobo, so I couldn't listen for beep codes. But it did start working again after I removed 2 of the 4 sticks of RAM, on a whim. I don't think the RAM was responsible for this; I think the mobo reset the BIOS to defaults during my many attempts to turn the PC on, and IIRC the mobo defaults to too low a voltage for my RAM, especially with 4 sticks.

- So my BIOS was reset to defaults, and I had forgotten to save my OC settings to one of the stored profiles, so I started from scratch. I got it back to where it was and decided to take it a bit further while I was at it; I got things stable at a 450 MHz bus with some added vcore and vNB, still well within the rated limits of both.

- The computer ran fine that day; I played a ton of BF3 and Skyrim with no problems.

- The next day I get a message saying my beta version of MSI Afterburner is about to expire, so I grab the new beta, and while I was at it I got the newest version of OCCT as well. You can set the voltage on the 570 up to 1.1 V using Afterburner. I've had my card at 1.075 V to keep the 725 MHz to 900 MHz OC stable, and it's been fine for months, other than a brief period of instability (I think caused by a driver version) where my card needed to be bumped up to 1.1 V to be stable; even then, with my aggressive custom fan profile, it never got much into the 70s C. After I installed the next drivers, though, I noticed I was able to get it stable at a lower voltage again.

So on the day in question, I bump the voltage up to 1.1 V and the core clock to 910 MHz, load up OCCT and run a stress test, and my PC shuts off instantly. I turn the computer back on and run OCCT again, this time at the 900 MHz / 1.075 V that was stable for so long; same thing, it just kills everything instantly. So I turn the computer back on, lower the core/shader clocks and voltage back to almost stock (I never touch memory), and try again, and again the computer shuts off.

- This time the computer won't come back on. The case fans and lights come on, but there is no video, and this time the GPU's fan is running full blast, really loud, from the second I push the power button to the second I shut it off. I reset the CMOS, unplugged, reseated the GPU, checked the power connections, everything; still nothing. I put the card in an older gaming rig and it did the same thing: no picture and max fan. So I got hold of EVGA and sent the card in. Fortunately my mobo has onboard video (something I've always had disabled), so I enabled it and lowered my bus back to stock so it runs OK with it on.

- My friend comes over with his i5 rig, which also has an EVGA GTX 570 in it. I help him OC his card (something he'd wanted help with for months) and we get it stable at about the same settings mine ran at. Then we took his 570 out of his computer and put it in mine; I went into the BIOS, disabled onboard video and reloaded my OC profile. When we got into Windows the GPU seemed to be working fine, but my sound card wasn't detected and neither was the onboard NIC. I restarted and shut down a few times and they still weren't detected. I shut down and unplugged the computer, removed the sound card and booted into Windows, then shut down and put the card back in, and when Windows came back up the sound card and onboard NIC were both detected and working normally...



- I did a clean install of Afterburner and OCCT for the heck of it. Then I set his GPU to the same speeds we'd just had stable in his computer (900 MHz and 1.075 V) and ran a brief OCCT scan to make sure it was OK, and it seemed to be. I get into BF3 and play for an hour or so and get a crash to desktop with a popup in the tray saying the display driver/adapter stopped responding, so I lowered the clocks to 850 MHz and resumed playing, and it was fine for another 3 hours or so.

- We leave the GPU in the computer overnight (the PC is left on like normal), and when I get back on it in the morning, I decide to try to figure out why it wasn't stable. I run Prime95 to make sure my CPU/mobo OC is stable, and it seemed to be; I let it go through like 8 tests. Then I set the card back to the 900 MHz / 1.075 V it was stable at and launch OCCT while Prime was still running, to see whether it's the CPU and GPU both being under load that makes things unstable. Well, as soon as I start the OCCT GPU test the computer shuts off again. We get back into Windows and try running just OCCT by itself, and it shuts off again. We get back into Windows, set everything back to stock and try again, and again it shuts off.

- This time there's no video when we turn the PC back on, but no max fan speed like with the other card. I try the usual tricks but it won't come back on. So we put it in his PC again and it does the same thing in his, and during the many attempts to get the card running again it did the full-blast fan thing too. After a few hours we give up and decide that another GPU has been murdered by my computer.

- I switch back to the onboard video and get into Windows, only to discover that my onboard NIC isn't working, though the sound card is fine this time. I try disabling/enabling it in the BIOS, shutting down, unplugging, resetting the CMOS and defaulting the BIOS, but hardware scans in Windows never find any network adapters, or even unknown devices.

That pretty much brings things up to date. My friend is RMAing that card, and my replacement for the first murdered card should be here in a couple of days. Needless to say, I'm really worried about frying a third card in my PC, and I'm at a loss as to what the cause could be. Logically, since the GPU is attached to both the mobo and the PSU, I'm thinking it could be either one of those... Here are my guesses and supporting evidence:

1) It could be the PSU. While this Antec PSU has served me well through a couple of builds, it's getting a little old... and I have heard about dying PSUs causing odd problems like this. Also there was that tiny mist of moisture the canned air blew into it (way back at the beginning of this story lol).

But does it explain why my onboard NIC is dead? Or the weird period where the NIC and the PCI sound card both stopped working for a while? I haven't heard any weird noises or smelled anything...

Also, I ran a PSU test in OCCT and it seemed to be fine, but at that point my GPU was already removed, so it didn't have that load on it...

2) It could be the motherboard. During all of this I have been tweaking my CPU/mobo overclock, but I haven't really changed any speeds or voltages drastically... Temps all seemed fine and Prime95 runs fine...

3) It could be software/driver related. Is it just a coincidence that I installed the new MSI Afterburner beta immediately before running the stress tests that killed my first card?


Thoughts?
 

Helltech

Distinguished
Well, I read it all.

None of the conclusions I can come up with would make the card run its fan at full speed in ANOTHER computer, I don't think.

My friend had a similar problem, with the fan on his card going full blast and a lot of crashes; it turned out to be his RAM. I doubt this is your problem, but you could run Memtest86+ and see how that goes =/
 

chesteracorgi

Distinguished
I think your first guess is right. A dying PSU can cause these problems, and the sound/NIC problems can be the result of a transient spike. If it is the mobo, I would think it took a dump as a result of that transient spike.

I think it is much less likely that the software caused the problem.
 

steiner666

Thanks for your replies, guys!

I've been leaning towards the PSU too, since the motherboard seems to be working fine except for the NIC. After looking for a good mATX LGA775 board, I'm glad it's looking like the PSU is what needs replacing, since it's impossible to find a good P45/G45 mATX board new, or even a decently priced refurb, and I really don't want to go with a G41 and have to get DDR3 RAM, lose my RAID capability, have a hot NB, and not be able to raise my bus speed much.

I ordered a nice-looking Corsair 750W, and I'm going to try hard not to even put my GPU in my computer until the PSU gets here, even though there will be a few days of temptation to do so...

 

steiner666



Yup, sure did. I always make sure to do things asynchronously
 

CryptorX

Distinguished
Aug 9, 2009
111
0
18,690
Two murdered 570s... sorry, but I can't even read this... my heart can't take it! :pfff:
But... I can't stay idle either...

Well... first of all, your PSU seems to be a good one; I've checked some reviews and it seems to be very good. But I noticed in your config that you have your CPU set to 3.8 GHz, which is a big OC, and running a GTX 570 at 900 MHz also raises the bar for your PSU. I'm not saying you've been pushing it near its limits, but I wouldn't be surprised if you were. I've searched a lot trying to find how many watts that CPU draws at or near that speed, with no luck so far. Without an OC it draws around 180 W, so if I had to guess I'd say something like 250 W at that speed. One 570, if I'm not mistaken, draws near 230 W at full load, and with that OC it should be around 280 W. Put together, that gives us around 530 W, leaving 120 W for the rest of the system, and that margin will easily run out with hard drives, CD/DVD drives, fans, RAM, sound card, etc...
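To make the power-budget arithmetic above explicit, here is a minimal Python sketch. It only re-adds the figures quoted in the post (650 W PSU rating, ~250 W guessed for the overclocked Q9550, ~280 W guessed for the overclocked GTX 570); these are the poster's estimates, not measurements, and the helper names are hypothetical.

```python
# Re-adds the power estimates quoted in the post; none of these are measurements.
PSU_RATED_W = 650    # Antec NeoPower 650W rating
CPU_OC_EST_W = 250   # poster's guess for the Q9550 at ~3.8 GHz
GPU_OC_EST_W = 280   # poster's guess for the GTX 570 at 900 MHz


def remaining_headroom(rated_w, *loads_w):
    """Watts left over after subtracting the listed loads from the PSU rating."""
    return rated_w - sum(loads_w)


def load_fraction(rated_w, *loads_w):
    """Share of the PSU rating consumed by the listed loads (1.0 = fully loaded)."""
    return sum(loads_w) / rated_w


left = remaining_headroom(PSU_RATED_W, CPU_OC_EST_W, GPU_OC_EST_W)
share = load_fraction(PSU_RATED_W, CPU_OC_EST_W, GPU_OC_EST_W)
print(f"CPU + GPU estimate: {CPU_OC_EST_W + GPU_OC_EST_W} W")
print(f"Left for drives, fans, RAM, sound card, etc.: {left} W")
print(f"CPU + GPU share of the rating: {share:.0%}")
```

With these guesses the CPU and GPU alone take about 82% of the label rating. Keep in mind that the rating is total DC output; how much of it is actually available on the 12 V rail(s) feeding the CPU and GPU depends on the specific unit.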

Maybe that even explains why some hardware is sometimes detected correctly and other times isn't, and why BF3 crashed with the card at 900 MHz but not at 850 MHz; maybe it wasn't the card that couldn't hold it but the PSU that was failing.

I know that some PSUs (I believe most of them), when pushed past their rated power, may compromise their protections, like active PFC and others, which are vital to protect your system from power anomalies like current peaks. There may also be spikes that generate EMI (electromagnetic interference) when OCing, which may cause system instability or even damage some hardware, but about this last part I'm not sure. I noticed on page 4-16 of your motherboard's manual that your BIOS has options to enable PCI-E spread spectrum and CPU spread spectrum. I'm guessing it's too late now, but you should try them, as they reduce electromagnetic interference at the cost of some system stability.

Anyway, just trying to bring new possibilities to the table... Maybe someone can elaborate on this more than I can... but I'll keep looking into it...
 

CryptorX

steiner666... have you ever had problems with your USB devices? Problems like them being disabled and re-enabled as if you had unplugged and replugged them without actually doing anything, or not being detected at all?
Because in this review it says that under heavy load the overcurrent protection often shuts the 5V line off... if that ever happened to you, then it's very likely that the PSU was not only under heavy load but being overloaded...
 

CryptorX





That makes three of us, but if this PSU couldn't handle that config with those OCs, it must have been due to lack of power, and I just can't see how lack of power can damage anything. It's the most likely cause of the crashes and system instability... but damaging the cards? It must have been a spike, but was it caused by a faulty PSU or by the CPU OC? Either way, I believe having spread spectrum enabled would have saved the cards... though I'd prefer to rely on a good PSU...
 

steiner666



The mystery continues.

I got my replacement GTX 570 from EVGA and my Corsair AX750 PSU last Friday. I installed the PSU (very, very nice, I must say) and powered on with just the onboard video to make sure everything was working fine, and it was. I put the "new" 570 in and powered on, got into Windows, launched Skyrim, hit Continue at the main menu and crossed my fingers. I got into the game and it didn't immediately shut off...

But before I could celebrate, a weird glitch appeared on the screen and then it went blank. "wtf?!" I said, as the picture came back on and then went blank again seconds later. I managed to get to the desktop and noticed the popup from the system tray telling me that the display driver had stopped responding and had recovered. I ran a test in OCCT and artifacts were detected instantly; same in FurMark. So I installed Afterburner and verified that the card was running at stock speeds/volts, and it was indeed. I powered down, rechecked the connections and reseated the GPU for the hell of it, got into Windows and was met with the same results. Every game I tried crashed to Windows within seconds. I emailed EVGA about the problem and they said something along the lines of "some cards do have problems running at stock voltages." I asked them, "How the hell did the fact that this one can't even run at stock speeds elude the person in charge of testing this replacement card?" After all, it clearly said "inspected for quality assurance" on the sticker I had to cut to open the box the card arrived in.

So, REALLY not wanting to go another 1-2 weeks without a GPU in the busiest gaming season of the year, I started playing with the voltages and found that if I raised the voltage to 1.075 V from the default 0.9xx V, it was stable at stock speeds. However, under load the card was emitting a high-frequency whine like the one a noisy fluorescent/neon light makes. It was somewhat noticeable at stock voltage, but even more noticeable at the higher ones. At first I was worried it was coming from the PSU, but I put my ear up to my case and it was clearly coming from the card. I emailed EVGA about the issue and they advised me to try updating the card's firmware, which I did, without really noticing any change. I did notice that while it took 1.075 V for the card to be stable at around its 730 MHz stock speed, I WAS able to push the core to slightly above 800 MHz at that voltage :pt1cable:

The next day I was running some stress tests to see if I could get it stable at a lower voltage, and my rig completely shut off again. I pushed the power button to turn it back on, but the computer wouldn't turn on. I flipped the PSU switch off, waited a bit and flicked it back on, and the fans lit up and spun for a split second, but then nothing; the power button still wouldn't do anything. So I pulled out the GPU, and the computer powered on the next time I tried. Even though I had been monitoring the temps while stressing the card and they were only in the 70s C, when I pulled the card out a minute later it was still surprisingly warm, especially on the back. I smelled hot/fried electronics, and after sniffing the card over I decided the smell was definitely coming from the back of the card, near the PCI-E connectors. No doubt the VRMs, which were clearly even more crippled on this 570 than on most, had finally burned up.

I emailed EVGA and got another RMA number. The card just got delivered to them today, so I expect to have the replacement by next Wednesday at the latest, since I raised enough hell for a manager to authorize 3-day instead of 5-day return shipping. I'll update then. At least I'm 99% sure this card's death had nothing to do with any of my hardware; it was messed up from the beginning, probably some other messed-up card someone else sent in that EVGA just turned around and shipped to me.



Oh, and I figured out the mystery of the missing onboard NIC. Apparently the NIC chip sits right at the edge of the motherboard, next to the PCI slots my GPU takes up, and at some point during the many removals/insertions of GPUs one of the cards' brackets must have scratched a trace from the NIC chip badly enough that the onboard NIC doesn't work any more. I ordered a nice Intel PCI-E NIC, since the x1 slot above my GPU is the only one that's open when a dual-slot card is in there. The thing works better than the onboard one anyway, so I'm happy this happened and made me switch lol.
 

CryptorX

I still think the PSU is somewhat related to the problem, even more now because of what happened with the PC shutting down and then refusing to turn on again. If it were a motherboard problem, it wouldn't turn on a few moments later; when a mobo dies, it simply dies. When a PSU gets overloaded, on the other hand, it shuts down and can be started again once the temperatures go down (I've also heard about some PSUs that use timers, but I'm not sure about that). I think you should try another PSU.

Also, having your previous card overclocked for some time might have stressed the motherboard circuitry too much; you should check your motherboard for swollen capacitors. It's either a motherboard problem or a faulty PSU. But what leaves me wondering is this: if it's truly the PSU, how can your RAM and CPU be okay when they are much more vulnerable to voltage problems? I believe a PSU that screws up a graphics card would much more easily damage RAM or a CPU, or at least make them so unstable that you'd be constantly running into BSODs.
Have you tried running stress tests like Prime95 to check both the RAM and the CPU? You should if you haven't. If they're okay, that dims the chances of it being a PSU problem, though I still think the PSU is being overloaded.

So that would probably leave us with only the PCI-E slot... it may be damaged... And by the way, I've never heard of cards that needed voltage adjustments to run at stock speeds, and if EVGA support told you to adjust them, that really raises some suspicions about the cards they're sending your way, not to mention that messing with voltages voids the warranty and they told you to do it anyway... as if the cards were already messed up... that's what I find really suspicious...
 

steiner666

Like I said in my previous post, I did change PSUs. I went from my 3-year-old ~$90 650W Antec to a $160 750W Corsair. I think the reason the PC wouldn't turn on this time, with the AX750 installed, is that it has more protection features that probably detected the short (or whatever) in the card and wouldn't power on because of it (just speculation).

As I also said, I'm fairly certain this card's death was entirely unrelated to the previous two because of the issues it had from the beginning. At first I was thinking that a problem with my old PSU caused the first two cards to die and this one was just faulty. But lately I've been thinking about how likely it is that all of the cards simply died because all 3 were faulty.

Card #1 was the card I originally purchased in Feb and ran overclocked ~170 MHz at nearly the max safe voltage the card allows itself to be set to (no, changing the voltage at all doesn't void the warranty; flashing the card's BIOS to override the manufacturer's specified max voltage limits will, though, which I've never been stupid enough to do). It ran great and then just died a few weeks ago... could it have just been its time to go?

Card #2 that was tried in my computer was my friend's 570. Same model, but this wasn't the original card he purchased in Feb; it was an RMA replacement he got through EVGA a few months back because his original had (we suspected) bad VRAM and was artifacting even in Windows and videos unless we drastically lowered the memory clock. After seeing <sarcasm> how thoroughly EVGA clearly tests all RMA replacements before sending them out </sarcasm>, I'm wondering whether, even though his died a day after being put in my computer, it was just faulty too. I had just shown him how to OC the card in his computer the night before, and it was running at nearly what I had mine set to; it could have just taken a couple of days running at those volts to kill it.

Card #3, as I already discussed, was the crap replacement EVGA sent me for the first card, which couldn't even run at stock speeds.

*Forgot to mention: yes, the system is entirely stable. I ran Prime95 plenty of times while I was having these GPU issues and never had a BSoD, a crash, or even a failure on one of the cores. That's part of why I'm pretty sure this has all been either a mix of my old PSU being bad and 1 card being bad, or just the cards being the problem. Everything else has been fine.

I think, based on my experiences and the ones I'm reading about in other threads and on other forums from people having problems with 570s, that the real issue is that these cards have power regulation circuitry too weak to support the settings they're allowed to be set to (or sometimes, obviously, even what they're set to at stock). It's no wonder the 570 seemed like such an awesome deal: overclocks that brought it pretty much up to 580 speeds for ~$150 less? Sounds awesome! Until you read the fine print on the specs and see that it only has 4-phase power vs the 580's 8-phase; that's where they cut costs.
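The phase-count argument above can be illustrated with simple arithmetic: the core current (roughly P / V) is shared between the VRM phases, so fewer phases means more current through each one. In this minimal Python sketch the 200 W core power is a HYPOTHETICAL round number chosen for illustration, not a measured or published GTX 570 figure; the 1.1 V value is the Afterburner maximum mentioned in the thread, the 4/6/8 phase counts are the ones quoted in the posts, and the phases are assumed to share current evenly.

```python
# HYPOTHETICAL core power, chosen only to illustrate the arithmetic.
CORE_POWER_W = 200.0
# Max voltage Afterburner allows on the 570, per the thread.
CORE_VOLTAGE_V = 1.1


def per_phase_current(power_w, voltage_v, phases):
    """Current each VRM phase carries, assuming perfectly balanced phases."""
    total_current_a = power_w / voltage_v  # I = P / V
    return total_current_a / phases


for phases in (4, 6, 8):
    amps = per_phase_current(CORE_POWER_W, CORE_VOLTAGE_V, phases)
    print(f"{phases} phases: {amps:.1f} A per phase")
```

Under these assumptions a 4-phase design pushes about 45 A through each phase versus about 23 A with 8 phases. Since conduction loss in each phase goes roughly as I²R, halving the per-phase current cuts that phase's conduction loss to about a quarter, which is one reason extra phases help under overvolting.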

Needless to say, when this next card shows up I'm going to overclock it, but only as far as I can without raising the voltage at all (... that is, assuming it can even run at stock settings), and then wait until the 600 series is out and make a more careful purchase next time around. In the meantime, I'd suggest anyone looking at buying a 570 get the special versions some manufacturers offer with 6- or 8-phase power; even though they're like $30-50 more, they're totally worth it. Captain Hindsight to the rescue!
 
I agree with your analysis. I don't think the new PSU is any part of the problem; you got some bum cards. I had to return my first GTX560Ti, and I've been suspicious of the second due to some sporadic texture flickering and other unusual effects; I've been thinking that perhaps Fermi simply wasn't ready. It will be interesting to see if the new card is good. If not, in your place I'd probably scratch eVGA off my short list, at least for a while.
 

CryptorX



Sorry... I focused more on other details and completely forgot that... I was in a bit of a hurry when I replied as well... That eliminates the PSU problem (if there ever was one), but I don't think it's a good idea to rely on the possibility that all three cards were bad. That said, I have heard more than once that the reference 570s aren't that great at OCing because of some problems related to voltages. Even I, when I bought mine, was strongly advised to choose one with a custom-designed PCB because the reference ones didn't have as much overclocking room because of those problems. I'm not a great OC adept, but I didn't like the word "problems", so I decided to go for a Gainward Golden Sample despite EVGA's 10-year warranty...

but... three cards in a row... :heink:
I wouldn't OC the next one if I were you... and if those cards have stability problems at stock voltages, I would demand a refund or a different model. I think you have the right to demand it, or at least to demand the same model but from a different factory; some components vary from factory to factory, though I'm not sure whether EVGA gets their cards from multiple places like some other brands do... Also, I have seen in a motherboards.org review that the reference GTX 580s are supplied to the different brands like EVGA, ASUS, etc. already built by NVIDIA; I don't know if the same goes for the 570s...
 

steiner666




Yeah, I'm confident that a majority, if not all, of these dead cards were killed by the gimped 4-phase VRMs on the 570s. I got my replacement today and it runs fine at stock speeds and volts. I got it overclocked to 875 MHz with only a minimal voltage increase: I bumped it up to 1.02 V, which I think should be fine; it's just that the 1.1 V max they allow these cards to be set to seems too high to be safe. The fault lies with whoever designed the reference 570 with 4-phase power instead of at least 6. I really wish I could return this card and get one of the 570s with better power regulation that can handle OCing better. I tried talking EVGA into extending their 90-day trade-in program for me, but they wouldn't have it. Oh well, live and learn. When I buy my next card I'm going to research the VRMs a lot.
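A rough way to see why holding the voltage down matters more than shaving a few MHz is the first-order approximation for CMOS dynamic power, P ∝ f · V². The minimal Python sketch below applies it to settings mentioned in this thread (875 MHz at 1.02 V versus 900 MHz at 1.1 V or 1.075 V). It ignores leakage and static power and says nothing about any specific card's VRM limits, so treat the ratios as ballpark figures only.

```python
def relative_dynamic_power(f_new, v_new, f_ref, v_ref):
    """First-order CMOS dynamic power ratio, P ~ f * V^2 (leakage ignored)."""
    return (f_new / f_ref) * (v_new / v_ref) ** 2


# (clock in MHz, core voltage in V) pairs taken from the thread.
new_setting = (875, 1.02)
for ref in [(900, 1.1), (900, 1.075)]:
    ratio = relative_dynamic_power(*new_setting, *ref)
    print(f"875 MHz @ 1.02 V vs {ref[0]} MHz @ {ref[1]} V: {ratio:.2f}x dynamic power")
```

By this estimate, 875 MHz at 1.02 V draws roughly 0.84x the dynamic power of 900 MHz at 1.1 V and about 0.88x that of 900 MHz at 1.075 V, and almost all of that saving comes from the squared voltage term rather than the 25 MHz clock difference.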