Computer Restarting Under High SLI GPU Load

jddg5wa

Honorable
Jan 24, 2013
55
0
10,630
I'm currently running dual GTX 660s on a 750W Fatality PSU. I've also tested a Corsair CX850M and my original Antec HCG-750. The problem persists with each of them.

I can put a full load on a single card and it will do fine. As soon as I turn SLI on and run a high load on both cards, the PC restarts itself. The memory sticks each test fine, and the problem persists with either one installed on its own. I've tested with FurMark: it runs fine on each card solo but only lasts about a minute when the cards are in SLI mode. The only things I haven't tested are the CPU and motherboard, but I'll be getting a motherboard Sunday to test with.

I'm here as a last resort. I've tried so many things with no success and know I am missing something. Hope someone here can shed some light on the issue. Thanks.

Also, something small to note: the Ethernet will sometimes disconnect when there is a high load. Not every time or consistently, just occasionally.
 
Hmm, well the Fatality and Antec PSUs are both tier 2, and the Corsair is a tier 3...so solid units. The PSU was my first thought, but it's too much of a coincidence for it to be all three units.

What are your temps like when running SLI? Also, what are your other specs for this system?
 

jddg5wa



Yeah, I've really been testing those PSUs. From what I've read, that usually seems to be the cause: not enough power to the cards combined. Although I've seen anything from drivers to bad memory turn out to be the fix, so it's a weird one. I'm at my wits' end trying to solve it.
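For a rough sanity check on the "not enough power" theory, here is a back-of-the-envelope sketch. The 140 W figure is NVIDIA's rated TDP for a reference GTX 660; the CPU and rest-of-system numbers are assumptions, since the other specs aren't listed in the thread, and factory-overclocked cards can draw more.

```python
# Rough power-budget estimate for a dual GTX 660 build.
# Nominal figures only; none of these are measurements from this system.
GPU_TDP_W = 140         # rated TDP of a reference GTX 660
GPU_COUNT = 2
CPU_W = 125             # ASSUMPTION: CPU model is not given in the thread
REST_OF_SYSTEM_W = 75   # ASSUMPTION: motherboard, RAM, drives, fans


def estimated_load_w(gpu_tdp=GPU_TDP_W, gpus=GPU_COUNT,
                     cpu=CPU_W, rest=REST_OF_SYSTEM_W):
    """Sum the nominal draw of every component."""
    return gpu_tdp * gpus + cpu + rest


def headroom_fraction(psu_w, load_w):
    """Fraction of the PSU's rated output left unused at this load."""
    return (psu_w - load_w) / psu_w


if __name__ == "__main__":
    load = estimated_load_w()
    for psu in (750, 850):
        print(f"{psu} W PSU: est. load {load} W, "
              f"headroom {headroom_fraction(psu, load):.0%}")
```

With these assumed numbers the estimated load is well under what any of the three PSUs is rated for, which fits the point that all three units failing the same way would be a big coincidence. It says nothing about transient spikes or how well the board regulates power.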

The full-load temps I usually see are 72C and climbing on the main GPU and about 50C to 60C max on the secondary, right before it freezes. Without SLI the temps are usually 60C to 62C on the main. Idle temps are pretty normal at about 30C for both cards.

I can't really remember the other temps before the crash. Usually I notice the CPU leveling out around 60C.
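Since the readings right before the crash are hard to remember, one option is to log them to disk so the last line survives the restart. This is a minimal sketch, assuming `nvidia-smi` is installed with the driver, is on the PATH, and reports temperature for these cards (some GeForce cards return "[Not Supported]" for certain fields). The log file name and the one-second interval are arbitrary choices.

```python
import os
import subprocess
import time

# nvidia-smi query that prints one "index, temperature" line per GPU.
QUERY = ["nvidia-smi",
         "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"]


def parse_temps(csv_text):
    """Turn lines like '0, 72' into {0: 72}; blank lines are skipped."""
    temps = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def read_temps():
    """Run nvidia-smi once and return {gpu_index: temp_in_C}."""
    out = subprocess.run(QUERY, capture_output=True,
                         text=True, check=True).stdout
    return parse_temps(out)


def log_temps(path="gpu_temps.log", interval_s=1.0):
    """Append a timestamped reading every interval until interrupted."""
    with open(path, "a") as f:
        while True:
            f.write(f"{time.strftime('%H:%M:%S')} {read_temps()}\n")
            f.flush()
            os.fsync(f.fileno())  # push to disk so a hard reset keeps it
            time.sleep(interval_s)


if __name__ == "__main__":
    log_temps()
```

Start it before launching FurMark; after the reboot, the tail of `gpu_temps.log` shows what both cards were at just before the reset.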

 

jddg5wa



Is that supposed to work? I can't even get it to recognize the secondary GPU. Although the GPU power light is on and the fans are running.

Edit: A bit quick to say that. The GPU does appear in Device manager but has an error "Windows has stopped the device because it has reported problems".

Also, it doesn't matter which card is in the second slot on the second PSU; I get the same error.
 

jddg5wa

I wasn't around to see it happen, but my computer had an unexpected restart error with just one card in the motherboard. So it must not only be high load with SLI. There wasn't even a load on the card when I left it.

When I got back to my computer I also noticed a ton of errors about nvlddmkm not being found and a couple of errors about it not responding and recovering.


Edit: So I realized after the fact that I did something stupid that could have cracked solder joints and such. I used a can of compressed air, upside down, to cool the GPUs while running FurMark; for sure not doing that again. Either way, I managed to get FurMark to run for 6 minutes, then FurMark itself just froze. So I restarted and, with no air, got it to run for 8 minutes. The temps for the 8-minute session were in the high 70s C for GPU-1 and the 60s C for GPU-2.

The 8-minute session did result in a restart as usual, but there was no error about the restart being unexpected. I'm not entirely sure what that means or how to test further.
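To compare runs like these, it can help to tally which kinds of events show up after each restart. The sketch below is only an illustration: the source names and event IDs in the table are the ones commonly associated with these messages on Windows (Kernel-Power 41, EventLog 6008, Display 4101) and should be checked against what Event Viewer actually shows on this machine. The sample list is made up, and how the events get exported from Event Viewer is left out.

```python
from collections import Counter

# Commonly seen (source, event ID) pairs for this kind of failure.
# ASSUMPTION: verify these against the actual Event Viewer entries.
KNOWN_EVENTS = {
    ("Kernel-Power", 41): "rebooted without a clean shutdown",
    ("EventLog", 6008): "previous shutdown was unexpected",
    ("Display", 4101): "display driver stopped responding and recovered",
}


def tally(events):
    """Count events by meaning; anything not in the table is 'other'."""
    counts = Counter()
    for source, event_id in events:
        counts[KNOWN_EVENTS.get((source, event_id), "other")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample, not taken from the poster's log.
    sample = [("Display", 4101), ("Display", 4101),
              ("Kernel-Power", 41), ("nvlddmkm", 14)]
    for meaning, count in tally(sample).most_common():
        print(f"{count:3d}  {meaning}")
```

A restart with driver-recovery events but no unexpected-shutdown entry would point more toward the driver or card than toward a sudden power loss, though that is a rule of thumb rather than a diagnosis.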

 
uh huh

You could remove the hardware from Windows, restart, then reinstall the drivers (using the latest NVIDIA drivers, of course) and see what happens. Temps seem OK, actually. Also, check around online to see if anyone else is having issues with SLI on your motherboard. Hopefully it's a software issue and not a hardware issue; I know that SLI/Crossfire still has bugs every now and then.
 

jddg5wa

I had a long post written about what is going on now, but things changed with a last test I did before commenting. I'm pretty sure the cause is hardware and not the GPUs, as problems happen no matter which card I have plugged in.

Someone suggested I run FurMark on each of the cards for at least 15 min. I suspected GPU-1 to be the problem because during its run the display driver crashed and recovered, while GPU-2 ran fine. Plus, when I switched from GPU-2 back to GPU-1, I had a weird white checkerboard artifact covering the screen on login and the PC reset. I was then able to log back in with GPU-1 and run FurMark for over an hour.

However, when I switched back to GPU-2 the same artifacts happened, even worse this time, as they appeared directly after the BIOS screen and not on login.

I'm headed out tomorrow to get a motherboard to test with. At this point it's the last thing I know to test.

Either way, the problem I notice now, on top of resetting under high SLI load, is that when I turn off the computer and switch to the other graphics card I get artifacts, and then the PC resets and comes back with no artifacts.

Edit: It's getting worse. The display driver is crashing/recovering with either card in the mobo, and I had a freeze with either card, even with little to no load on the card.

And it's gotten even worse; now it just resets at any old time. Someone said it could be the motherboard having poor power regulation. Either way, I'm taking it into Fry's to see if I can test the parts before leaving.

Thanks for the help!
 
I had something similar happen to me once. Fresh install of Windows, and eventually the PC would just behave oddly: restarts, graphics glitches, etc. I figured it was the RAM and replaced it. Reinstalled Windows, and eventually it did it again...turns out the motherboard had gone south, was messing with the RAM, and was corrupting the HDD. Replaced the motherboard and everything worked perfectly.
 
Solution