Can't figure out what's wrong with my PC

Irbyz

Distinguished
Aug 3, 2011
My system specs:
OS Windows 10 and Arch Linux
CPU Intel Core i3-530 2.93GHz 4MB Cache with Cooler Master Hyper T4 cooler
GPU Palit GTX660 OC 2048MB GDDR5
RAM Corsair XMS3 4GB (2x2GB) DDR3 1333MHz
MB ASUS P7H55/USB3
HDD Western Digital Caviar Green 1TB 64MB Cache (and some old 160GB Seagate, don't remember the actual model)
PSU Cooler Master eXtreme Power Plus 460W
Monitor LG W2363D-PF 120Hz
Audio FiiO E10 USB DAC Headphone Amplifier (onboard audio is disabled in BIOS)

I bought Overwatch recently and wasn't getting the performance I wanted (about 120 FPS constantly with everything on low). So I decided to overclock my CPU. I had some BSODs, but after some time it seemed that I had reached a stable 4.0 GHz. However, the next day, after I had been playing a game for a few minutes, the monitor lost signal and the graphics card fan started spinning at maximum speed. There was a sound as if an error window had popped up, but the system wouldn't react to the keyboard, so only a cold restart helped.
I tried milder overclock settings and then reverted completely to stock, but this kept happening. I run MSI Afterburner/HWiNFO64/RivaTuner On-Screen Display while in game, and all the temps were always fine: CPU temp never went above 60 °C and GPU temp never above 70 °C. All voltages seemed fine as well. The only thing that changes right before the problem occurs is that GPU usage suddenly becomes very low, dropping to 50-70%, with in-game FPS dropping significantly as well. There is nothing relevant in Windows Event Viewer.
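For anyone who wants to catch that usage drop in a log instead of watching the overlay, here is a rough Python sketch. It assumes samples were captured on the Linux side with something like `nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,fan.speed --format=csv,noheader,nounits -l 1`, so each line looks like `69, 98, 45`. The function names and the thresholds (`drop=25`, `max_temp=80`) are my own picks for illustration, not values from any tool.

```python
def parse_sample(line):
    """Parse one 'temp, util, fan' CSV line into a tuple of ints."""
    temp, util, fan = (int(field) for field in line.split(","))
    return temp, util, fan


def find_usage_drops(samples, drop=25, max_temp=80):
    """Return indexes where GPU usage falls by at least `drop` points
    between consecutive samples while the temperature stays below
    `max_temp` (so the drop is probably not thermal throttling)."""
    drops = []
    for i in range(1, len(samples)):
        prev_util = samples[i - 1][1]
        temp, util, _fan = samples[i]
        if prev_util - util >= drop and temp < max_temp:
            drops.append(i)
    return drops


# Made-up sample lines, only to show the expected input shape.
log_lines = ["68, 99, 45", "69, 98, 45", "69, 60, 45", "70, 55, 100"]
samples = [parse_sample(line) for line in log_lines]
print(find_usage_drops(samples))
```

A drop flagged while temperatures are still low would fit what I saw: usage falling without any sign of overheating.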
Then the problem was gone for a couple of days, but then it came back.
I tried Metro 2033 to make sure it's not just an Overwatch problem, and almost as soon as I started a new game, the same thing happened. Then I booted into Arch Linux and tried running Unreal Tournament 4 Pre-Alpha. It was stuttering a lot and soon turned into a slideshow, followed by the same thing. This is what I found in the log:

Code:
Nov 11 20:15:54 miow kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Nov 11 20:15:54 miow kernel: NVRM: A GPU crash dump has been created. If possible, please run
                             NVRM: nvidia-bug-report.sh as root to collect this data before
                             NVRM: the NVIDIA kernel module is unloaded.
Nov 11 20:15:54 miow kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917c:0:0:0x00000040
Nov 11 20:15:54 miow kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917c:0:0:0x0000000f
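To check how often this happens, the kernel log can be scanned for the same message. Below is a minimal Python sketch, assuming syslog-style lines like the ones above (for example saved from `journalctl -k`). The function name `find_gpu_drops` is hypothetical, and the pattern only covers the exact "fallen off the bus" wording shown in this log.

```python
import re

# Strips ANSI color codes, both as a real ESC byte and in the
# caret-notation form "^[[0;1;39m" that shows up in pasted logs.
ANSI = re.compile(r"(?:\x1b|\^\[)\[[0-9;]*m")
GPU_LOST = re.compile(r"NVRM: GPU at (\S+) has fallen off the bus")


def find_gpu_drops(lines):
    """Return (timestamp, pci_address) for each 'fallen off the bus' line."""
    hits = []
    for raw in lines:
        line = ANSI.sub("", raw)
        match = GPU_LOST.search(line)
        if match:
            # Syslog-style timestamps occupy the first three fields.
            timestamp = " ".join(line.split()[:3])
            hits.append((timestamp, match.group(1)))
    return hits


# Two lines copied from the log above.
sample = [
    "Nov 11 20:15:54 miow kernel: ^[[0;1;39mNVRM: GPU at 0000:01:00.0 has fallen off the bus.",
    "Nov 11 20:15:54 miow kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917c:0:0:0x00000040",
]
for when, address in find_gpu_drops(sample):
    print(when, address)
```

The PCI address in each hit shows which device dropped, and the timestamps can be compared with when the monitor lost signal.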

One time when playing OW, after the monitor lost signal and the GPU fan started spinning at 100% as usual, the GPU fan stopped a second later, and another second later the PC crashed and rebooted itself. After that, I think, the graphics card started making a hissing sound.

At some point I decided to take the GPU out and put it in another PCIe slot. When I turned the PC on, the GPU fan would spin for a moment and then stop. I put it back in the old slot, with the same result, and now this happens all the time.
After some minutes in the OS it comes back to life; otherwise I have to spin it by hand to help it start. But it stops spinning shortly after I launch a game or benchmark. One thing I found that helps keep it spinning in games is setting a higher flat fan speed in Afterburner. The fan has become noisier too.
But the monitor hasn't lost signal again so far, though of course I'm hesitant to do more testing now.

So now there's something wrong with the GPU fan, that much is clear. But why was I getting this problem with the monitor losing signal and the GPU fan spinning at max speed? It seems very likely to recur. Is it a sign of the graphics card dying (which would be very strange, since I was only overclocking CPU/RAM and never exceeded recommended voltages), or could it be some other part of my PC? On top of that, the DVD-ROM drive seems noisier now too when I start my PC. Unfortunately, I can't really swap parts to pinpoint which one is faulty, and I can't RMA any of the parts either, as the warranty has run out.

Looking forward to any clues, and thanks in advance!
 

LaserNinja11

Honorable
Dec 28, 2013
Overclocking is something you should only do in software, and don't go full "PC go faster" mode all the way up to 7.7 GHz. You need to go up in steps of, say, 500 MHz at a time and see which works best for you.
It seems clear there is a GPU fan problem. This could be causing overheating, and therefore a decrease in performance (throttling) and, in extreme situations, a shutdown. I'm not sure if your GPU needs external PCIe power from the power supply, but if it does, try using a different cable. Otherwise, I have no idea.
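To make the "go up in steps" idea concrete, here is a small Python sketch that lists the clock speeds to test on the way from stock to a target. The 2930 MHz and 4000 MHz figures come from the original post; the function name `overclock_steps` and the 500 MHz default are just for illustration, and each step would still need its own stability test before moving on.

```python
def overclock_steps(stock_mhz, target_mhz, step_mhz=500):
    """List the clock speeds to test, from the first step above stock
    up to and including the target."""
    steps = []
    freq = stock_mhz + step_mhz
    while freq < target_mhz:
        steps.append(freq)
        freq += step_mhz
    # Always finish on the target itself, even if the last step is smaller.
    steps.append(target_mhz)
    return steps


print(overclock_steps(2930, 4000))
```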
 