GTX 670 SLI crashing

Mietoxd

Honorable
Aug 14, 2013
8
0
10,510
Hello. Ever since I added a second GTX 670 for SLI, my games and sometimes the whole system keep crashing. It happens in pretty much every game I can think of: Skyrim, Dota 2, Batman: Arkham City, Far Cry 3, etc. Sometimes the screen flashes black repeatedly and then returns to the game, but with low FPS. I can fix that by alt-tabbing twice, and the game then works fine for a while until the problem repeats. Other times the whole system hangs on a black screen, completely unresponsive, and I have to reset it with the power button.

Playing fullscreen or windowed makes no difference. I've also kept track of temps and power with Afterburner; both seem fine, except that the "main" card, which my two screens are connected to, runs 10-15 degrees hotter than the other. Otherwise both cards run with identical clocks etc.

Earlier, turning SLI on from the control panel would also hang my system, and I had to reinstall the drivers every time that happened, but I fixed that with a motherboard BIOS update.

Specs:

Asus P8Z77-V LK
i7-3770
Corsair Vengeance 16 GB RAM
Palit Jetstream GTX 670
MSI Twin Frozr Power Edition GTX 670
Corsair TX850 PSU
OCZ Vertex 3 60 GB SSD
Samsung 830 series 120 GB SSD

Can anyone help me?
 
Mietoxd



Actually, I just ran Heaven, and it reported slightly different clock speeds. The first 670 was running at 1254 MHz while the second was running at 1304 MHz. The memory speeds also differed slightly: the first card was at 1305 MHz and the second at 1355 MHz. While idle, though, the cards run identical clocks.
 
I really can't say whether the clock difference is the real problem. But from what I have always understood, with Nvidia cards you should use the same make and model. I can say that my two Gigabyte GTX 670s (WindForce 2) are completely stable in everything I have played.

You might try a program like EVGA Precision X to set both cards to the same clock rate and see if you can get them stable; then you can at least rule that in or out.
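
To put the reported mismatch in numbers, here is a minimal Python sketch using the Heaven readings quoted above. The helper names are hypothetical, and matching both cards to the slower one is only an assumption about what "the same clock rate" would mean in practice.

```python
# Clocks reported by Heaven in the post above (MHz).
CARD_1 = {"core": 1254, "memory": 1305}
CARD_2 = {"core": 1304, "memory": 1355}


def clock_gaps(first, second):
    """Return {domain: (gap_mhz, gap_percent_of_first)} for each clock domain."""
    gaps = {}
    for domain, base in first.items():
        gap = second[domain] - base
        gaps[domain] = (gap, round(100.0 * gap / base, 2))
    return gaps


def common_clocks(first, second):
    """Assumed target: run both cards at the slower card's clock in each domain."""
    return {domain: min(first[domain], second[domain]) for domain in first}


if __name__ == "__main__":
    for domain, (mhz, pct) in clock_gaps(CARD_1, CARD_2).items():
        print(f"{domain}: second card is {mhz:+d} MHz ({pct:+.2f}%) relative to the first")
    print("common target:", common_clocks(CARD_1, CARD_2))
```

Both domains differ by 50 MHz, a bit under 4%, so the second card would need roughly a -50 MHz offset on core and memory to match the first.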
 

Mietoxd



Already did that. Both cards work fine on their own, and with SLI disabled every game works.
 

Mietoxd

I tried both cards individually and they work. I also tried swapping them around on the motherboard and moving the SLI bridge to the second slot, but the issue still persists. I looked at Event Viewer after a crash and there were three errors: two about Desktop Window Manager and one LiveKernelEvent. Could these have something to do with this?

Anyone?
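
For anyone who wants to pull the same entries without clicking through Event Viewer, here is a rough Python sketch that wraps the built-in Windows `wevtutil` tool. It assumes the Desktop Window Manager and LiveKernelEvent entries were written to the Application log, which the post does not state; the function names and the default count of 20 are hypothetical.

```python
import platform
import subprocess


def recent_errors_command(log="Application", count=20):
    """Build a wevtutil call listing the newest Critical/Error events of one log."""
    xpath = "*[System[(Level=1 or Level=2)]]"  # Level 1 = Critical, 2 = Error
    return [
        "wevtutil", "qe", log,
        f"/q:{xpath}",   # filter by severity
        f"/c:{count}",   # maximum number of events
        "/rd:true",      # newest first
        "/f:text",       # human-readable output
    ]


def recent_errors(log="Application", count=20):
    """Run the query and return its text output, or None when not on Windows."""
    if platform.system() != "Windows":
        return None
    result = subprocess.run(
        recent_errors_command(log, count),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `recent_errors()` right after a crash and again with `log="System"` should show whether the same sources turn up every time.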
 

Mietoxd



I'm currently using the 326.41 Beta drivers, but I also tried the 320.49 WHQL drivers and the problems were still there.
 
Re-seat the cards in their slots, re-seat the SLI bridge, update your motherboard BIOS, and check for graphics card BIOS updates. I would use Driver Sweeper to remove all traces of the drivers and then clean-install the latest driver. If none of that fixes it, swap the SLI bridge first, as it's the cheapest part to replace. Even a clean install of Windows may fix the issue. Just because the cards work properly on their own doesn't mean one of them isn't causing the fault in SLI, although this is hard to diagnose if you don't have access to another card. It could even be a motherboard fault.

Also, give us more details of the crash: check your Windows error report log for what happened at the time of the crash.

I can confirm that my 660s run at different clock speeds in SLI despite being completely identical. The bottom card runs a few MHz higher on the core clock; the RAM runs at the same speed. If you think the clock speed difference might be causing a problem, underclock the cards slightly to Nvidia's standard clock speed (not the manufacturer's factory OC speeds). You can set up a forced clock rate with NVIDIA Inspector and a batch file to force a constant clock rather than the variable boost clock, which fixed some problems for me when I was running a single GTX 660.
 

Mietoxd

I don't want to jinx it, but I may have solved the problem. I changed the Windows power plan to High performance, and in the Nvidia control panel I set the power management mode to prefer maximum performance. I managed to play Splinter Cell Blacklist for a good 8 hours straight without a single problem. I also installed the latest beta drivers.
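
The power plan half of this fix can also be applied from a script. The following Python sketch wraps the built-in `powercfg` tool, where `SCHEME_MIN` is its alias for the High performance plan. The function names are hypothetical, and the Nvidia control panel setting is not covered here.

```python
import platform
import subprocess

# powercfg's built-in alias for the "High performance" plan
# (minimum power savings).
HIGH_PERFORMANCE_ALIAS = "SCHEME_MIN"


def high_performance_command():
    """Return the powercfg call that activates the High performance plan."""
    return ["powercfg", "/setactive", HIGH_PERFORMANCE_ALIAS]


def activate_high_performance():
    """Switch the plan; returns False without doing anything when not on Windows."""
    if platform.system() != "Windows":
        return False
    subprocess.run(high_performance_command(), check=True)
    return True
```

Running `powercfg /getactivescheme` afterwards shows which plan is active.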
 

jimbob79

Honorable
Sep 16, 2013
1
0
10,520
Hi,

The power plan change gives some more stability because PCIe power saving is disabled in that mode; it can also be disabled manually in the other plans. However, that alone didn't completely solve the problem, at least for me.
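
As a sketch of the manual route, the PCIe setting in question (Link State Power Management) can be switched off for whichever plan is currently active via `powercfg`. The Python wrapper below is only an illustration with hypothetical function names; it changes the on-AC value, which is what matters for a desktop.

```python
import platform
import subprocess


def pcie_link_power_commands(ac_value=0):
    """powercfg calls that set PCIe Link State Power Management on AC power.

    0 = Off, 1 = Moderate power savings, 2 = Maximum power savings.
    """
    if ac_value not in (0, 1, 2):
        raise ValueError("ac_value must be 0, 1 or 2")
    return [
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PCIEXPRESS", "ASPM", str(ac_value)],
        # Re-activate the current plan so the new value takes effect.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


def disable_pcie_link_power_saving():
    """Apply the 'Off' setting; returns False when not on Windows."""
    if platform.system() != "Windows":
        return False
    for command in pcie_link_power_commands(0):
        subprocess.run(command, check=True)
    return True
```

The same setting is reachable in the GUI under the plan's advanced settings, in the PCI Express group.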

For me, the ultimate fix was to increase my CPU voltages. I used to run my 3570K at 4.5 GHz @ 1.17 V, but after having done everything the internet suggested for this SLI issue, I figured that the problem might lie elsewhere. I set my CPU voltage to 1.3 V, VCCSA to 1.07500 V, CPU PLL to 1.88750 V and PCH voltage to 1.1 V, and hey presto! I haven't had a single driver crash, kernel error or device removed error in over a week! Previously I needed to downclock my FTWs even below the factory overclock to get them to work in SLI, but now I have run both of them stable at 1281 on the core and 3650 on the memory: http://www.3dmark.com/3dm11/7178477. I think that adding a second GPU adds so much I/O load that the previously optimized CPU voltages just weren't enough.

I think I may have some headroom to pinch off from the voltages, but right now I'm just enjoying the results :D
 
Solution