Graphics Card appears to be crashing

browncoat40

Commendable
Jan 18, 2017
6
0
1,520
I'm having an odd issue with my graphics. Mid-game, most often 5-10 minutes in, the graphics crash. Both monitors display the "no signal" message and start searching for a signal. In the meantime, the PC is still outputting audio, so it isn't completely locked up. I can button-mash and it'll make sounds, and videos or streams keep playing. But the graphics don't come back, not even when I restart the driver with WIN+CTRL+SHIFT+B. To reset the graphics, I have to hard shut down by holding the power button. It doesn't happen all the time, though; sometimes I can go for days without issue.
PC specs:
Windows 10
Asus H97i Plus
i7 4790 w/ Kraken x61
GTX 1070FE
EVGA 650 Platinum PSU
2x8GB Corsair Vengeance DDR3
Samsung 850 (M.2)
no overclocks anywhere

Errors on Event Viewer
Adding to the confusion, I don't get consistent errors. Sometimes I get:
Graphics Exception on BE 29: CRD_CACHE_HIT_FROM_OTHER_GPC
nvlddmkm: "The description for Event ID 13 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer."
....(like 10 similar errors with different exception strings)
Sometimes:
Warning Microsoft-Windows-TaskScheduler 414 Task Misconfiguration Task Scheduler service found a misconfiguration in the NT TASK\Microsoft\Windows\....
(bunch of errors with different tasks)
Sometimes it doesn't record any errors, and I think I've seen a few others. Most often, they have to do with the task scheduler, nvlddmkm, or tasks not having the correct access.

Onset:
The whole PC is ~2 years old, except for the 1070, which was new in October. Everything ran well from October to December. In late December, I'd get these crashes every so often, but not consistently enough to expect or compensate for them. Last week, they started happening consistently every time I pushed the hardware (Overwatch and Rocket League).

Troubleshooting:
I've done most of the nvlddmkm troubleshooting: I've checked the connections, set the high-performance power plan, turned off PCIe power saving, changed the theme, rolled back the drivers as far as they would go without running a driver sweeper, made sure the PC is up to date, updated all the other drivers, added TdrDelay to the registry, and made sure my temps were reasonable (crashes happen in the low-to-mid 70s °C). I then reformatted and did all that again. All to no avail.
Next, I'm going to try the oldest 1070 driver I can find and up the voltage with EVGA Precision. (I've got the tests set up to go after work.)
Any suggestions on other things to check? Or if it is hardware failure, is there any sure way to tell if it's a PSU, GPU or Mobo issue?
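
For anyone following the TdrDelay step above, here is a minimal sketch of what that registry change looks like. It only builds the text of a .reg file and does not touch the registry. The key path and value name are the standard Windows TDR settings; the 8-second value and the function name `tdr_delay_reg_text` are illustrative assumptions, not something taken from the post.

```python
# Sketch only: generate the text of a .reg file that sets TdrDelay.
# TdrDelay is a REG_DWORD (in seconds) under the GraphicsDrivers key;
# the value chosen below is an assumption for illustration.

GRAPHICS_DRIVERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
)


def tdr_delay_reg_text(seconds: int) -> str:
    """Return .reg file contents that set TdrDelay to `seconds`."""
    if not 0 < seconds <= 0xFFFFFFFF:
        raise ValueError("seconds must fit in a positive 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        # DWORD values in .reg files are written as 8 hex digits.
        f'"TdrDelay"=dword:{seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(tdr_delay_reg_text(8))
```

Saving that output as a .reg file and importing it needs admin rights and a reboot to take effect. As this thread shows, a longer timeout does not help when the driver resets are caused by a hardware fault.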

 

Ramlethal

Estimable


56 degrees on the CPU is quite a high temp, you know... The 76 degrees on the 1070 is almost fine... Still, it's recommended to turn the fans to 100% while playing...
What sort of PSU do you have?
 

Pkai92

Respectable
Oct 20, 2016
357
0
1,860
Temps are perfectly fine if these are the temps displayed while gaming. Make sure that the drivers are up to date. Check the power cables from your PSU to your GPU. Try changing the graphics card slot. Try your GPU in another PC if available (a friend's PC or something).
 

browncoat40



Yup, temps are fine, well within the norms. And they've been running that way just fine. I've got the most up-to-date drivers, which may be the problem instead of the solution. Checked and re-checked the cables. ITX board, so no extra slot. If I rule everything else out, I'll steal my wife's PC to try the card in a different machine... but lord knows what fights might result. As such, that'll be the step right before RMAing and hoping the GPU is at fault.
 

browncoat40

*Update* this didn't fix the issue, I only thought it did.
Alright, so I uninstalled the current drivers via Device Manager (deleting the driver) and installed the oldest ones (circa Jun '16; it's currently Jan '17). Then I upped the voltage to 87% via EVGA PrecisionX. Ran Unigine Heaven for half an hour. Played Overwatch for half an hour. No problems. So I guess that fixed it?
I tried to break it to see what the problem was: I installed the most current drivers via GeForce Experience and set Precision back to default. It did half an hour of Heaven and an hour of Overwatch without issue. So I might have fixed it, but I can't reproduce the problem.
I'll let you know if it was "just a good day" and the problems recur. Thanks for your help!
 

browncoat40

So it wasn't the drivers or the voltage going to the card. It ran fine all of Tuesday night through 4 hours of gaming and stayed on overnight, but on Wednesday I had the same issues as before, with no changes to the system. However, the ambient temperature was about 10°F (5-6°C) warmer. Apparently one of the minor components on the card likes to overheat. I upped the speed of the case fan directly feeding the card and added another fan, and it seems to be running fine now. Should I be concerned that my Founders Edition "blower style" card isn't able to cool its components without additional airflow on its backplate? I ask because I've seen videos with similar blower cards sandwiched right on top of each other, and they seem to run fine. I'll let you know if simply upping the case fans fixes the issue.
 

browncoat40

Alright, I'm fairly sure I've found the culprit. It was my PSU.
I swapped in a 960 from another system for my 1070 and had exactly the same issues. I had to stress both the GPU and CPU to cause a crash, though. Apparently, the PSU could handle maxing out either the GPU or the CPU, but not both. I had to run Prime95 and Unigine Heaven together to get the 960 to crash. The sounds after the graphics crash meant that the CPU was still good. Swapping the cards with the same result meant it wasn't the graphics card. RAM issues probably would have presented differently and more consistently in the event log. That left the mobo and PSU. The PSU was a replacement from an RMA, and it took about the same amount of time to fail as the original unit. I may have just had bad luck, but I'll be staying away from EVGA's SuperNova P2 650W from now on. It could have been the motherboard, but I would have expected a full crash instead of just the graphics card crashing.
Anyway, swapped out the PSU for a shiny new one, and it's been running perfectly the last few days, including pushing the system as hard as it can go.
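
The process of elimination in this post can be written down as a small sketch. It is only an illustration of the reasoning, not a diagnostic tool: the function names `remaining_suspects` and `most_likely` are hypothetical, and each rule is a heuristic taken from the post (for example, RAM is only "probably" ruled out).

```python
from typing import Optional, Set


def remaining_suspects(audio_continues: bool,
                       other_gpu_also_crashes: bool,
                       event_log_errors_consistent: bool) -> Set[str]:
    """Narrow down the faulty part using the heuristics from the post."""
    suspects = {"CPU", "GPU", "RAM", "motherboard", "PSU"}
    if audio_continues:
        # Sound kept playing after the display died, so the CPU was alive.
        suspects.discard("CPU")
    if other_gpu_also_crashes:
        # A different card (the 960) failed the same way.
        suspects.discard("GPU")
    if not event_log_errors_consistent:
        # Heuristic from the post: RAM faults would likely log consistently.
        suspects.discard("RAM")
    return suspects


def most_likely(suspects: Set[str],
                crash_needs_cpu_and_gpu_load: bool) -> Optional[str]:
    """Pick the PSU when only combined CPU+GPU load triggers the crash."""
    if crash_needs_cpu_and_gpu_load and "PSU" in suspects:
        return "PSU"
    return None


if __name__ == "__main__":
    left = remaining_suspects(True, True, False)
    print(sorted(left), "->", most_likely(left, True))
```

With the observations from this thread, the sketch leaves the motherboard and PSU, and the combined-load behaviour points at the PSU, which matches what the replacement confirmed.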
 