Graphics card issues, don't know what to do next.

WallHackJack

Commendable
Oct 31, 2016
5
0
1,510
I've tried everything! But my two EVGA GTX 960s will not stop giving me trouble.

I get 3 main crashes.

1. "The display device has stopped responding and has recovered" (crashes the game)
2. "The display device GTX 960 is not removable and cannot be unplugged" (crashes the game)
3. The device doesn't recover, forcing me to restart my machine (the display won't come back, but I can hear the device-removal sound or a similar error sound).

I've tried everything. I've run the 960s in SLI and separately, using both of the PCIe slots. I've tried every combination of my RAM sticks. I've tried underclocking my card when running demanding games. I've reinstalled Windows, my games, and my drivers. I've been monitoring the heat on my cards and my processor. I don't know where to go next; it's too hard to pin down when I'm getting such a barrage of errors. I don't even know if it's really my cards that are the problem. I've seen similar threads saying RAM was the problem.

Please help me! :(

  - What tests should I be running?
  - What information can I dump here that would be most helpful?
  - Which component is most likely failing here?


Additional Info

  - The crashes occur while playing games, specifically Overwatch.
  - The crashes occur more frequently the longer the computer has been in use.
  - Once a crash occurs, more crashes follow, even if the computer is restarted.
  - Both of my 960s produce identical errors when swapped out for one another.
  - Sometimes if I hit my build with my knee or something I'll get one of the above errors, but everything is seated correctly, so 9 times out of 10 when I touch the rig nothing happens.
  - Replacing the 960 with an old 580 seemed to stop the crashes, but I didn't test for long.
  - Underclocking sometimes gives me longer before a crash occurs.
  - I'm sure stress-testing software could trigger the crashes.
  - RAM sticks seem pretty hot when I take them out. Case airflow is awful, but the processor fan is great and GPU temps are OK (a crash can happen below 72°C).

 
Solution

Endless8

Respectable
Oct 21, 2016
389
0
2,160
Hi WallHackJack,
Did you test each 960 separately? Run a stress test on each of them, run the same stress test on your GTX 580, and watch what happens. Running MemTest would be good too.
I look forward to your reply.
 

WallHackJack

Hey Endless, getting on it now. I've run the Nvidia stress test in the past. By separated, do you mean running a stress test that targets only the GPU? I'll use Nvidia's again unless you have a better recommendation. (Update: FurMark instead.)

Same for memory: any software recommendation, or should I boot from a flash drive? Those are always so slow and I don't know how to read the results.

Anyways, getting on it.
 

Endless8

FurMark will be good for testing. By separated I mean plug in only one card, run a test, then do the same for the second one.
Edit: For ram test https://www.youtube.com/watch?v=3mcVGz9Ryuc
 

WallHackJack

Gotcha, I'm only running one card at a time. Ran FurMark a few times and it crashed at 70% the first time, then at 40% the other two times. Temps rise to the 80°C mark, which is listed as the target temp in EVGA Precision XOC. The crash black-screens me, but then results in "Furmark has stopped responding" without triggering my normal issues.

Now that I have something to test with that doesn't completely destroy my system, I was able to play around with different card settings through the EVGA software. I set my fan to run at 60% (2000 RPM) instead of auto, which was always hovering at 15% until the card hit 80 degrees. It worked! I was able to get through the test without a crash. My feet feel colder too!

I have the results from FurMark; is there anything you need? 2000 RPM while gaming won't hurt it, will it? It's 60% power through the official software anyway.

I'll get on the memory test later, but I can tell you there are errors; I've done that process before. Otherwise, though, Windows seems to work fine when not gaming. Plus, this new revelation should narrow it down to my card, correct?
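
For anyone who wants numbers behind the temperature theory, here is a minimal Python sketch that logs GPU temperature and fan speed while a stress test runs. It assumes `nvidia-smi` is installed and on the PATH. The function names (`parse_smi_line`, `log_gpu`) and the log file name are made up for illustration and are not part of any tool mentioned in this thread.

```python
import csv
import subprocess
import time

# nvidia-smi query for temperature (°C) and fan speed (%), one line per GPU.
QUERY = ["nvidia-smi",
         "--query-gpu=temperature.gpu,fan.speed",
         "--format=csv,noheader,nounits"]


def parse_smi_line(line):
    """Turn a line like '72, 60' into (72, 60); unreadable fields become None."""
    fields = [f.strip() for f in line.split(",")]

    def to_int(text):
        try:
            return int(text)
        except ValueError:
            return None  # e.g. '[N/A]' when a value is not reported

    temp = to_int(fields[0]) if len(fields) > 0 else None
    fan = to_int(fields[1]) if len(fields) > 1 else None
    return temp, fan


def log_gpu(path="gpu_log.csv", seconds=600, interval=1.0):
    """Append one row per interval for the first GPU that nvidia-smi lists."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["elapsed_s", "temp_c", "fan_pct"])
        start = time.time()
        while time.time() - start < seconds:
            out = subprocess.run(QUERY, capture_output=True, text=True).stdout
            lines = out.strip().splitlines()
            temp, fan = parse_smi_line(lines[0] if lines else "")
            writer.writerow([round(time.time() - start, 1), temp, fan])
            f.flush()  # push each row out of Python's buffer right away
            time.sleep(interval)
```

Calling `log_gpu()` in a second window before starting FurMark should leave a CSV whose last rows show roughly how hot the card was when the run died, which makes it easier to compare a run on auto fan against one at 60%.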
 

Endless8

Well, pretty strange. Your GPU memory may be overheating, and that's why it happens. And the problem happens on both cards? Make sure your card is not dusty and try replacing the thermal paste on it with something like Noctua NT-H1.
You can set a fan speed target in MSI Afterburner (it should be in EVGA Precision too); it will ramp up your fan as the temperature increases. And nah, 60% won't hurt at all, it will just be louder.
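
To illustrate the idea of a temperature-based fan target, here is a rough Python sketch of a fan curve: a few (temperature, fan %) points with straight lines between them. This is only a model of the concept, not how MSI Afterburner or EVGA Precision work internally, and the curve points are assumptions. Only the 60% figure comes from the test in this thread.

```python
# Hypothetical curve points: (GPU temperature in °C, fan speed in %).
CURVE = [(40, 30), (60, 45), (70, 60), (80, 100)]


def fan_percent(temp_c, curve=CURVE):
    """Linearly interpolate a fan speed (%) for a given temperature (°C)."""
    # Below the first point or above the last, hold the end value.
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])
    # Otherwise find the segment containing temp_c and interpolate along it.
    for (t0, f0), (t1, f1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)
```

With these made-up points, `fan_percent(65)` comes out to 52.5 and the fan is already at 60% by 70°C, instead of sitting near 15% until the card reaches 80°C as described for the auto setting.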
 
Solution

WallHackJack

The power supply is a bigass AX1200. Pretty new. Again, crashes seem to occur more frequently once a crash has already happened, even after a full reboot, including powering down the power supply.
 

WallHackJack

Yeah, I get the same set of problems on both cards (they are very clean). I bet if I repeated the process above on the other card with and without the fan boost, we'd see FurMark do the same thing. EVGA's "auto" fan controls are not great; I'm comfortable going manual and adjusting it to make sure I don't hit 75 degrees.

So it seems like card temperature was the culprit here, even though my card should be fine running at 75 degrees. I guess there are many parts to a graphics card, so I'll try your solution after some Overwatch. If we crash again with the fan up, I guess it's back to the drawing board.

 

Endless8

By auto I meant this: https://www.youtube.com/watch?v=ZsHVhZ_ARn4 and OK, post if something happens :)