Extremely hard to diagnose issue with tower

maceman_black

Prominent
Sep 15, 2017
2
0
510
OK everyone. I have a very hard-to-diagnose issue with my tower that started a few weeks ago.

The problem: The computer loses monitor signal after playing certain games for several hours (anywhere from 3 to 8 hours), and only certain games, it seems. The game sound continues to play in the background, and the computer has to be rebooted. Sometimes I can't get it to happen, and then it will seemingly happen out of nowhere. I was not able to detect any temperatures even remotely close to overheating on the GPU or CPU before these incidents occurred. HW Monitor reports that my PSU voltages are all good, slightly above nominal in some cases (for example, 12V is running at 12.107).

Solution Attempts:
- Ran MemTest for a day, no errors.
- Switched the monitor from DVI to DisplayPort (totally different port and cable).
- Disassembled the PC, swapped RAM slots, reapplied Arctic Silver, and moved the GPU to a different PCIe slot.

After doing the above steps, the computer seemed perfectly fine for over a week with absolutely no freezes or shutdowns. I even ran DOOM 2016 for 2 days straight with no issues.

However, over the last 2 days I have seen 2 crashes. They were not exactly the same as before: the computer just froze, and the monitor no longer loses signal. No overheating as far as I can tell. This time the game sound doesn't keep playing in the background.

In addition, and strangely enough, the first of the 2 new crashes gave me a BSOD message when I rebooted the computer. THIS is the ONLY one of the crashes that has ever shown an error.

Additional information about the problem:
BCCode: 116
BCP1: FFFFFA80149294E0
BCP2: FFFFF88011A276C0
BCP3: FFFFFFFFC000009A
BCP4: 0000000000000004
OS Version: 6_1_7601
Service Pack: 1_0
Product: 256_1

However, the second crash, which just occurred, did not. Also, strangely, upon rebooting after the first crash, the BIOS threw an error saying that overclocking was unsuccessful on the CPU and forced me to enter the setup, where I reset my CPU to default. Thing is, my GPU and CPU are not and have never been overclocked.
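Going back to the BSOD parameters above, here is a rough sketch of how I read them. It assumes the usual Windows conventions: BCCode is the bug check code in hex, and for bug check 0x116 the third parameter (BCP3) carries the NTSTATUS of the last failed operation in its low 32 bits. The two lookup tables are tiny and only cover the values from this one crash, and the `decode` helper is just a name I made up for illustration.

```python
# Minimal lookup tables covering only the values seen in this crash report.
BUGCHECK_NAMES = {0x116: "VIDEO_TDR_FAILURE"}
NTSTATUS_NAMES = {0xC000009A: "STATUS_INSUFFICIENT_RESOURCES"}


def decode(bccode_hex, bcp3_hex):
    """Decode the bug check code and the NTSTATUS carried in BCP3.

    Assumption: BCP3 is a sign-extended 64-bit value whose low 32 bits
    are the NTSTATUS; the top two bits of an NTSTATUS are its severity
    (3 means "error").
    """
    bccode = int(bccode_hex, 16)
    status = int(bcp3_hex, 16) & 0xFFFFFFFF  # keep the low 32 bits
    severity = status >> 30                   # top two bits
    return (
        BUGCHECK_NAMES.get(bccode, "unknown"),
        f"0x{status:08X}",
        NTSTATUS_NAMES.get(status, "unknown"),
        severity,
    )


print(decode("116", "FFFFFFFFC000009A"))
```

If that reading is right, the BSOD is a display driver timeout (TDR) failure, which at least matches the nvlddmkm errors in the event log. It does not say whether the root cause is the driver, the GPU, or the power feeding it.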

I am tearing my hair out over this issue, as I find it really hard to diagnose. All I really know is that it's not an issue with the RAM, and that switching to DisplayPort from DVI seemed to fix the issue, but apparently that only lasted a week.

I am assuming that the most likely culprits are either my PSU or my GPU, or possibly an issue with the PCIe slots on my mobo. Luckily my GPU is still under warranty, so I was planning to RMA it to EVGA soon. However, I am concerned that if something else is wrong, I will just damage the new GPU they send me.

It's such a weird issue because I can run a game like DOOM 2016 for days at a time with no issue with the PC at full load. Then randomly I'll play Starcraft 2 or GTAV for a few hours and the PC will crash. The two crashes that have happened since "temporarily fixing the problem" were 1: during Starcraft 2 after about 6 hours of play (the one that threw a BSOD error), and 2: just while I was on the desktop, but I had a ton of internet tabs with animations playing and Second Life open.

Any help would be GREATLY appreciated.

PC specs:
- AMD FX 8350 4.0 GHz
- Asus AM3+ mobo
- 16 GB Ripjaws RAM
- EVGA GeForce GTX 970 4 GB VRAM
- Windows 7 64-bit on an Intel solid state drive
- Several other mechanical hard drives for storage
- Thermaltake 750W power supply

Games that have caused the issue:
- GTAV
- Starcraft 2
- Desktop? Or Second Life maybe?

One last interesting piece of information: I am an indie game dev and I use Unity 2017.1. Unity taxes my system more than most of the games I play on it and it has never caused any crash or hanging issues, and I use it every day.

EDIT: I have found something interesting

It seems that every single time the computer crashed, I got the same error, which seems to be related to the display driver crashing. Problem is, I have updated and tried several Nvidia driver versions. So there is something else going on here, maybe a PSU problem, maybe the GPU.

Event ID 13 from source nvlddmkm

\Device\Video7

The event carries a bunch of different error types. A few of them, for example:

- Graphics Exception: Shader Program Header 3 Error
- NVRM: Graphics TEX Exception on (GPC 0, TPC 0): TEX NACK / Page Fault
- Graphics Exception: Shader Program Header 18 Error
- Graphics Exception: ESR 0x504224=0x80000041 0x504228=0x900080 0x50422c=0xa2ac1b 0x504234=0x0

Also, I'm not sure whether my PSU voltages are too low or not. Here they are at idle:
12V = 12.107
5V = 4.974
3.3V = 3.147
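
To put numbers on that, here is a quick sketch that compares the idle readings against a tolerance band. It assumes the +/-5 % figure usually quoted for the +12V, +5V and +3.3V rails in the ATX power supply spec, and it assumes the HW Monitor readings are accurate, which software sensor readings often are not. The `check_rails` name is just something I made up for this.

```python
# Nominal rail voltage -> idle reading reported by HW Monitor.
READINGS = {12.0: 12.107, 5.0: 4.974, 3.3: 3.147}

# Assumption: +/-5 % tolerance on the positive rails (ATX spec figure).
TOLERANCE = 0.05


def check_rails(readings, tolerance=TOLERANCE):
    """Return {nominal: (deviation in percent, within tolerance?)}."""
    results = {}
    for nominal, measured in readings.items():
        deviation = (measured - nominal) / nominal
        results[nominal] = (round(deviation * 100, 2), abs(deviation) <= tolerance)
    return results


for nominal, (pct, ok) in check_rails(READINGS).items():
    verdict = "within" if ok else "outside"
    print(f"{nominal:>4} V rail: {pct:+.2f} % -> {verdict} tolerance")
```

Under those assumptions all three rails are inside the band, but the 3.3V rail is about 4.6 % low, so it is close to the lower limit of roughly 3.135 V. These are idle numbers only; they say nothing about what the rails do under load.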


 

maceman_black

Prominent
Sep 15, 2017
2
0
510
I also noticed something. I'm not sure whether these PSU voltages are low or not: 12V = 12.107, 5V = 4.974, 3.3V = 3.147.

The 3.3V rail is the one that concerns me. These are the voltages according to HW Monitor while the system is idle.