Possible failed graphics card?

Agni451

Honorable
Jun 16, 2012
3
0
10,510
I have two GTX 570s running 3 monitors. Recently (past ~5 days), I've been experiencing seemingly random restarts (at least 8) and two BSODs. What ALWAYS happens with the restarts is that the two monitors connected to the primary GTX 570 (call it Card A) go black, and if I have music playing, it freezes on one note and keeps blaring it. The third monitor, on Card B, looks fine, and I've even had the mouse work on that monitor for a few seconds. Then the computer restarts. This happens during gameplay (which has been very infrequent the last couple of months) AND during simple tasks like browsing the internet or working on a Word doc.

The one BSOD whose BCCode I remember (7E) happened when I was just checking email. Unfortunately, neither BSOD was recorded in the Event Log, so I have no idea what program caused it. Even more recently (just the last couple of surprise restarts), the screen freezes right at the end of the "Starting Windows" boot screen with the shining logo. But the computer keeps starting up in the background: it plays the welcome sound, and I could type in my password, except that the display is still stuck. It sounds weird, but at that point I can also restart the computer with a few key presses, done from memory (highlight the red box on the lower right, then select Restart). It will freeze like this several times; then, on the 3rd or 4th manual restart, the screen gets past the boot screen and to the login like normal. Then everything's fine until the next random restart.

What I've done:
1. Virus scan: nothing found by Norton or SmitFraudFix
2. Defrag
3. Chkdsk: everything fine, no bad sectors
4. Updated my previous 296.10 drivers to the latest 301.42
5. I always have EVGA Precision open in the corner of one monitor to check temps. Card A idles at 55C (that's with Card B running; it's 46C without) and reaches 78C under load. Card B, which is almost constantly under 80% load, idles at 40C and reaches 72C under load. CPU temps never get above 70C under full load.

Tomorrow I'm probably going to reinstall Windows to rule out software problems completely. I have a bad feeling that it won't help, and that this is either a motherboard or a video card problem. Any more ideas on how to determine the cause before I reinstall or spend $300 on a replacement for Card A? All in all, I tend to have my computer on doing something 24/7, but it only uses Card A during games. Card B has gotten a workout being used for BOINC/Folding@home. Card A really hasn't been pushed hard at all the last few months, so I'm at a loss as to why it's failing (if it is).

My computer specs (assembled Jan 2011):
Gigabyte UD3R motherboard
2x GTX 570 @ stock
i7 950 @ stock
12GB DDR3 1333
150GB Raptor for OS
1TB Hitachi data
320GB WD
Corsair 850W
3x Samsung HD monitors (2 on Card A, 1 on Card B)

The last occurrence was just 15 minutes ago- screens went blank, then the system restarted. It then froze at the startup screen, and I had to go through two manual restarts to get to the desktop again. I was watching a .ts video file in MPC at the time on my primary monitor, but I'm still not seeing a definite pattern.
 
I hate to say it as a collector, but these high-end Fermi cards are not built as well as they should be, and they have a lot in common with the old GT200A series cards when it comes to degradation problems. Is there any coil squeal while the card is under full load?
Second, the GTX 570 is well known to be very sensitive to overclocking when core voltage is raised beyond a certain limit. The MOSFETs on the GTX 570 are under more stress than on the 580 because there are two fewer phases powering the GPU core, so they run much hotter under load.

It is best to consider an RMA, but it is also possible that the card is fine, in which case your board or PSU could easily be the one having trouble. I would bet on the board, because I had the same issues myself before my board finally went tits up.
 

Agni451

No squealing, but the primary card (GPU A) runs hotter than the secondary (+15C) because it's sandwiched between the secondary GPU and the CPU heatsink (V8). I have the cards at the factory OC of only 18 MHz (core), so I'm not really pushing the voltage.

I really hope it's not the board (PCI-e x16 failure?), because I'd be SOL if that happened. I'm going to try swapping the GPUs and see what happens. If the problem is still with GPU A, I'm reinstalling Windows. If that doesn't work, I'll have to get a new card.
 

Agni451

Well, I did a clean reinstall yesterday, and it ran fine for 24+ hours. However, tonight I played a video and it crashed again. All 3 screens were frozen but not black this time, and then the computer restarted. In the Event Viewer, I finally got a log of what happened. The BCCode was 278 = 0x116 = VIDEO_TDR_ERROR. So it's got to be the video card, right? Would this error occur if it were the motherboard? I just did a clean reinstall of the 301.42 drivers, but I'm not hopeful that this is the solution. I am very reluctant to do something drastic like trying to upgrade the BIOS of the actual card. Is there any hope left, or should I just bite the bullet and buy a new card?
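
For anyone checking the 278 = 0x116 conversion: the Event Viewer showed the code in decimal, while bug check references list it in hex. A minimal Python sketch of that arithmetic follows. The function name `bugcheck_to_hex` is just made up for illustration, and the only name in the table is the one quoted above.

```python
def bugcheck_to_hex(code: int) -> str:
    """Format a decimal bug check code as the hex form used in BSOD references."""
    return f"0x{code:X}"


# Only the code/name pair mentioned in this thread; not a complete table.
KNOWN_CODES = {
    0x116: "VIDEO_TDR_ERROR",
}

if __name__ == "__main__":
    decimal_code = 278  # value as shown in the event log
    hex_code = bugcheck_to_hex(decimal_code)
    name = KNOWN_CODES.get(decimal_code, "unknown")
    print(f"{decimal_code} = {hex_code} = {name}")
```

Going the other way, `int("7E", 16)` turns the earlier 7E code back into decimal.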
 

stgoo

Honorable
Jul 13, 2012
1
0
10,510