Two failed cards in a row?

Ildon

Distinguished
Oct 21, 2010
7
0
18,510
Hello, folks. For a little over two years now, I've been using a GeForce 8800 GTS card in my homebuilt system, and have had little to complain about. Sure, it wasn't the fastest thing around and it wasn't going to win any benchmark contests, but it was dependable and it was cheap. Then, suddenly, it failed on me without any warning whatsoever. No gradual artifacting, no weird colors, none of the usual signs I've read about. One day it was fine; the next, it locked up while I was playing World of Warcraft, with the colors distorted. A hard reboot was required, and though I made it back into Windows, the exact same problem hit me as soon as I restarted WoW.

This time, unfortunately, I noticed artifacting immediately, right on the POST screen, which is of course bad news. From this point on, I was unable to get back into Windows normally. It froze after the 'Loading Windows...' screen and eventually gave me a blue screen with the "attempt to reset the display adapter and recover from timeout failed" message. I could get into safe mode just fine, with no artifacts present once I made it in. I contacted EVGA, RMA'd the card, and eventually received an 8800 GTX in exchange.

This worked fine for about an hour, and then... artifacts again, this time while playing Dawn of War. Computer locked up, distorted colors, hard reboot. Artifacts on POST immediately, albeit 'different' ones. Same blue screen upon attempting to load Windows, only now even safe mode is afflicted by artifacting. This made me think that perhaps my computer was frying cards for some reason, but when I put my roomie's card in, nothing seemed to go wrong after a number of hours under semi-heavy load.

Furthermore, with this RMA card, I was once able to clear the problem up by disconnecting everything from the card and plugging it all back in. As I understand it, a fried chip or physical damage wouldn't clear up even temporarily, no matter how much you fiddle with things, which leaves me really confused here. This 'fix' only lasted for about 10 minutes in Windows, and the card fritzed out again after about 2 minutes of WoW. I've since been unable to clear things up again despite a couple of hours of disconnecting/reconnecting and swapping cables around.

The good Dr. Google brought up numerous instances of this sort of problem, along with more than one mention of the 'oven bake' remedy. As I'm hoping to work things out with EVGA and get this thing switched out for something else, I'm loath to try throwing it into the oven or doing anything else they could latch on to in order to blame this on me.

So... I guess what I'm looking for here would be things I could do in order to determine if there's something in my computer eating cards alive or what. Even if I do get another card from them, there's no point in bothering if it's just going to follow the first two down. On the other hand, I've read plenty of cases online where people got poorly-soldered cards three or more times in a row through RMA, so I feel it could go either way.

I'm not sure what else I could add, other than a GPU-Z log from the one time I was able to clear it up and get back in, which shows nothing much out of the ordinary: a brief spike in GPU load (up to 23% or so while loading a zone in-game), everything stable, and then a sharp jump from 2% load to 99% load in the very last log entry.

Is there anything else I could throw up here to help you guys out? I did find the .dmp file Windows made after a few of the restarts, but they make little sense to me. If there's anything you guys can think of to help me isolate the root of this problem and hopefully solve it, I'd certainly appreciate it. Thank you.

EDIT: Sorry, I forgot to include my specs.

Motherboard: Gigabyte EP35-DS3R
Video card: EVGA Nvidia GeForce 8800 GTX
CPU: Intel Core 2 Quad Q6600 2.40GHz
PSU: Corsair 750W
RAM: 6 GB Corsair XMS
OS: Windows 7 Ultimate
 
Well, no need to RMA it again. The 8800 GTX is an old card and, as it ages, it can get hot pretty fast...
You can try reapplying the thermal paste and making sure the card is free of dust...
Or you can buy a new card...
 

4745454b

Titan
Moderator
I'm going to disagree with Wa1 here. The 8800 GTX isn't a new card, so they probably just sent something they had in the back. It might have been refurbed, who knows. Either way, I wouldn't mess with it. Tell them it doesn't work and that they sent you a bad card. Try another card, or see if they have something newer that might not fail. Either way, don't bake it or reapply paste, as they might claim that's why the card doesn't work.
 

deweycd

Distinguished
Sep 13, 2005
846
0
19,010
It is likely that you were sent a refurbished card that had a previous issue. Older cards like this tend to have more and more issues as they age and go through more RMAs. RMA it and try again.
 
Err, I should make my statement clear: yes, you are right, 4745454b. Maybe the OP just got a bad replacement, so the best thing is to return the card (again) and tell them that it isn't working... :)

Thanks for clearing that up, 4745454b...
 

Ildon

I'm going to give another RMA a try, I guess. Disappointing, but that's the way it goes. My question to you guys is now... what can I do to make absolutely certain that there's nothing faulty on my end that will cause the next card to go up in proverbial flames? I want to do whatever I can to ensure that my rig is as safe for the (hopefully) incoming card as possible and that this doesn't happen again.

Is there a way to somehow test the PSU, for example? Some signs to watch for to see if everything is normal, or if there is something wonky going on with it somewhere? Anything else that might cause this sort of video card death?
 

Ildon

How does one test the voltages? I saw an area in the BIOS where I could monitor voltages, but I really have no idea what to look for in such a thing.

Not that I'm questioning anyone's judgement, of course, and I agree with you guys about the wonky card being sent out to me. I just want to cover all of my bases and make sure there's no way at all this could be the result of anything on my end. Careful to a fault, perhaps, but it's not such a bad flaw in the end.
 

Ildon

A bit of an addendum here: I booted up the PC about an hour ago with the intention of going into safe mode and copying some files onto a USB drive before sending the card out. Lo and behold, it's... working? I'm not really sure what's going on at this point. I would assume that it just got too hot yesterday and was shutting itself down, but all of the logs I've taken thus far show only a marginal rise in temperature as time passed, all the way up to the time of crash.

I'm pumping up the effects in WoW at the moment and going to the most crowded place I can find, hoping to trigger the crash again to see what the root cause was, but still nothing. About a 5-degree rise in temperature, and that's it. Anyone have any advice here? I don't want to RMA the card only to have them turn around and send it right back to me, claiming that there's nothing wrong with it. If not for the nightmare of last night, I wouldn't believe it myself as I sit here watching it in action now.
 

Ildon

I've gone ahead and RMA'd the card despite the episode of good health earlier today. I have no idea how it managed to recover, but I'm not willing to risk it behaving for now and then dying for good once the return window has passed me by. That said, I'd still like to eliminate all possible causes that could be lurking in my rig. I was uncomfortable with the coincidence before, but the fact that the card was working again this morning struck me as incredibly odd and has me right back to thinking that there's something amiss on my end.

I'm not asking for anyone to hold my hand and walk me through all of this. I'm more than willing to do my own reading and learn whatever it takes to diagnose the issue, if someone can just suggest a specific issue for me to go and research. Thus far, blindly searching Google has told me to 'test PSU and mobo', but really... I have no idea how to test anything like that. I saw 'power supply testers' for decent prices online, but I still don't know what I'm looking for with such a thing or how to go about narrowing it down.

I can't do any work until I get a working card, so I really am eager to solve this problem conclusively. If I 'break' a third card, I doubt that I'll get another one out of them. I've got about two weeks before the replacement's replacement arrives, which I think is plenty of time to test out all sorts of things, so please, let me have it!
 

Ildon



It was my understanding that it had been tested for three hours, but upon checking with her a moment ago, 'a few hours' suddenly became 'a few minutes'; nowhere near long enough for a conclusive test. I could put it back in there, as I do still have access to her card, but I'm afraid of what might happen to it if indeed it is my hardware at fault here.

I don't have access to a multimeter, but I suppose that I could go out and buy one (assuming that I can figure out how to use it for testing a PSU), or I could pick up one of those 'power supply testers' I noticed online earlier (assuming that they actually work). That would only test the PSU, I think. How can I go about testing whether it's the motherboard/PCIe slot?
 

4745454b

The only way I know of to test a slot is with a card. If you put a known good card into a slot and it F's up, then you know it's the slot. You can use software tools to measure the voltages. They're not as accurate as a multimeter, but in my experience, if they report good levels, then you're OK.
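As a rough guide to what "good levels" means when reading the BIOS hardware monitor or a software tool: the ATX power supply specification allows the +3.3V, +5V and +12V rails to vary by ±5% from nominal. Below is a minimal Python sketch, assuming you copy the readings down by hand. The rail labels, the `check_rails` helper and the sample values are hypothetical, and software-reported voltages are only approximate.

```python
# Nominal ATX rail voltages and the +/-5% tolerance the ATX spec allows on them.
NOMINAL = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def check_rails(readings, tolerance=TOLERANCE):
    """Return {rail: (reading, low, high, ok)} for each rail that was read."""
    report = {}
    for rail, value in readings.items():
        nominal = NOMINAL[rail]
        low = nominal * (1 - tolerance)
        high = nominal * (1 + tolerance)
        # Bounds are rounded only for display; the check uses the exact window.
        report[rail] = (value, round(low, 3), round(high, 3), low <= value <= high)
    return report


if __name__ == "__main__":
    # Hypothetical readings copied by hand from a BIOS hardware monitor page.
    sample = {"+3.3V": 3.31, "+5V": 4.97, "+12V": 11.23}
    for rail, (value, low, high, ok) in check_rails(sample).items():
        status = "OK" if ok else "OUT OF RANGE"
        print(f"{rail}: {value:.2f} V (allowed {low}-{high} V) {status}")
```

Readings taken at idle in the BIOS can look fine while the rails sag under gaming load, so it is worth comparing numbers from a software monitor while a game is running as well.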