PC Crashes Frequently: Black Screen & Reboot Every Few Minutes

Ransome

Distinguished
Jul 24, 2012
1,163
2
19,315
Very low CPU temps.
Very low GPU temps.
No Overclock on both GPU and CPU.
Everything has worked fine for years. I haven't changed a thing in the past few months.

PC is extremely clean. Bare-minimum background applications.
Games run smoothly and have run that way for a long time. Latest drivers.

Suddenly, since this afternoon, the PC has started crashing very frequently (every few minutes) without warning, without reason, and without explanation.
While writing this message, I had 2 crashes!

Happens while idle, under mid-load and under stress.

Nothing triggers it in particular.

Totally random and very frequent! The PC is 100% unusable right now.

I tried removing the GPU driver (with DDU) along with many other applications, hidden devices, programs and device drivers, and then reinstalling them. No use.

Already ran: CCleaner, Windows disk error checking, and Trim/defrag.

PLEASE HELP. I am at my wit's end here and have no clue what to do!

This is my primary, high-budget, high-end gaming PC. It has never had issues like this.
 

Ransome

My full Specs:
Windows 10 64bit
Gigabyte G1 Gaming GeForce GTX 980 SLI
Core i5-3570K (OC @ 4.2 GHz, -0.03 offset voltage)
Asus Sabertooth Z77 Motherboard
16 GB of G.Skill Ripjaws X DDR3 1600 MHz (XMP profile)
SSD: 256GB Crucial M4
HDD: Western Digital Black 2TB
CPU Cooler: Noctua NH-D14
Case: Corsair CC600T Graphite White Special Edition
PSU: Corsair 850AX Gold
Monitor: Asus ROG SWIFT PG278Q, 2560 x 1440 @ 144 Hz + Nvidia G-Sync.
Secondary Screen: 55" Samsung UA55D6400 1080p TV @ 60 Hz, connected via a Pioneer VSX-823-K AV receiver (home theater system)

Not sure about integrated GPU.
I don't know exactly how to use MemTest86 or where to get it (from a safe source). Removing one RAM stick would be difficult, as my Noctua cooler blocks the RAM slots (especially the 2nd one).
 

Eximo

Titan
Ambassador
Did you just get a batch of Windows 10 updates? The Anniversary Update particularly targets 'incompatible' applications and tries to remove them. When it fails to uninstall a program and that program then runs, it can cause unexpected behavior. So look for anything that might run occasionally.

On mine I couldn't use Explorer until I deleted the root folder of a program it didn't like.
 

Ransome

Yes, I think I got some automatic Windows updates today. I think they made Chrome stop working properly, so I uninstalled it. I tried restoring to the Oct 2 restore point, but it didn't help. It's still crashing very frequently. What can I possibly do now? What should I look for?
This is so oppressively disheartening.

*The only thing that happened recently was yesterday: a power failure courtesy of the infamous local power company. The electricity went out and came back almost instantly. I have a 5-minute timer surge protector and a surge-protected AC splitter for the main cables. I don't know if that power blip had anything to do with it.
 

Ransome

And how would I do that?

I have been struggling with this issue for days now.
I thought I had fixed it, but it seems to come back at times.
Instead of a hard freeze plus automatic reboot every few minutes, I now get an occasional hard freeze every several hours of gameplay. Again, no performance issues, nothing serious running in the background, no heat issues. Nada.

I can't tell whether this is mere coincidence and has something to do with my CPU OC or GPU OC, or whether it is the same problem.

A few days ago, hours after my last post, I tried removing the timer surge protector from the main power socket on the wall.
That surge protector is a small, socket-shaped device with 2 LED lights: one for power and one for timer protection. When the power in the house drops or flickers, the timer cuts all power to the connected device (the PC) for 3-5 minutes, preventing electricity surges (potentially harmful AC power spikes when the power comes back on).

That puny Surge Protector seems to have been the source of the frequent mega-freezes every few minutes!
Once I removed it, the issue was gone.

I connected a new surge protector from a different manufacturer, and the issue was gone for a while.

Yesterday and today, while playing The Witcher 3, the entire system froze after a few hours of gameplay.
The image still appears, and the sound either mutes or a loud corrupted noise comes from the speakers; then nothing responds, and after about 10-15 seconds the PC reboots automatically. When it comes back on, there is no message from Windows.

What the hell is causing this? Please help.
My PC never had these issues before.

What's strange is that I can run an hour's worth of Prime95 Blend or Short and get no issues.
 

Eximo

Many of the monitoring tools out there will show you the voltages the computer is reporting. You can also look in the BIOS.

If you have been having power issues, and they are common, you might consider investing in a UPS/Power Conditioner.

Overclocks can degrade over time, so disabling them temporarily while you are troubleshooting is a good idea. Given that you have a pair of GPUs, one of them might be misbehaving. Monitoring temperatures and voltages while running a demanding game might reveal the problem.

Could still be the power supply, it may not have fared well through a sag or surge.

When in doubt, simplify. Start taking things out. Go to a single memory stick and a single GPU (or use the on-chip graphics), and see if it is stable.

I've found a lot of problems are solved in just a reassembly. Loose connections can be anywhere.
 

Ransome

I suspect the culprit is the GPUs or the SLI setup. I am not sure yet.

The thing is, I ran a Blend test in Prime95 with lots of apps open. It caused no issues over an hour of stress testing.

Then I opened a game, The Witcher 3, loaded a specific save, and it crashed after a few minutes.
I tried a wall socket in a different room, tried with and without the surge protector, and tried a different splitter.

After successive crashes, 2 things happened:
The PC came up in a very low, zoomed-in resolution, as if I had uninstalled the graphics driver. I rebooted again, and it failed to boot several times.
When it finally booted, I noticed 1 of the GPUs was missing, not appearing in the SLI setup or in Device Manager.

I've decided to open up everything, clean it manually, and make sure all the cables (including LED and fan cables) are secure. I'll wipe away any dust or debris, even though my case is rather clean already.

This is really a bummer but I am trying to stay optimistic.

I remember issues a few months ago where MSI Afterburner freaked out and cranked the overclock by itself to +999999 or something immense on both core clock and memory. That caused horrid slowdowns, visual artifacts and crashes. It was hard to stop because Afterburner was launching at startup. It was an odd bug, definitely not caused by me, and I couldn't find anything about it on Google! It happened 2-3 times. Since that was months ago, I hope it didn't damage or burn out the GPUs.

I sure hope it's some loose connection or component. I don't feel like buying random hardware, shooting in the dark, and I really don't want to spend more money and time on this rig at this time. Aside from the crashes, everything runs smoothly.
 

Ransome

Damn it. I spent the better part of the day disassembling the entire rig, cleaning the heck out of it (it was exceptionally clean to begin with; now it's spotless) and reassembling everything, making sure I connected everything right.

Then I turned on the PC, and the 2nd GPU isn't detected in Device Manager. What's more, I am pretty sure I swapped the 2nd GPU with the 1st, which means it is more likely an issue with the motherboard than with either GPU. I am sick and sour from leaning, kneeling and crouching over this thing, and tired of removing parts. This is frustrating. It makes no sense.

Is there a solution to this?
Is the 2nd PCI-E slot dead? Or is there a remedy? A BIOS setting, maybe? Something else? The LED lighting on both GPUs is on. It could be a problem with one of the cards, but that's less likely. Am I missing anything? :(
 

Ransome

Hey guys, thanks for all the replies so far.
Dudeman, I have placed my 2nd GPU (which is actually newer and a later revision: Rev 1.1, whereas my 1st GPU is Rev 1.0) as the primary card.
After doing a DDU removal of the Nvidia driver, I couldn't find the 2nd GPU in Device Manager.
I then made sure again that the PCI-E power cables were connected firmly, both to the GPU and to the PSU, as well as all the other cables. I also double-checked that both GPUs were properly seated in their PCI-E slots, applying firmer pressure on the backplate side and counter-pressure on the other side of the case.

I then restarted the PC, enabled XMP and set my CPU OC to 4.2 GHz with default auto voltage.
Finally, I did a clean install of the latest Nvidia driver, and after the automatic reboot, both GPUs were detected! I activated SLI and a milder GPU OC (+100 MHz core, power limit maxed), and it is working!!
I have left the PC at the very same location in The Witcher 3 for hours now, and it has been running well without a hitch! By the Lord, I hope it stays stable! Keeping my fingers crossed.
I really hope this works guys!

So to sum it up, the only differences now are: a far leaner, cleaner PC rig, especially the heatsink, the deeper case parts and the fans (including the Noctua CPU cooler fans), with that elusive, stubborn black dust removed. All cables have been re-secured and connected firmly. The 2nd GPU has become the primary workhorse while the 1st takes the back seat. I think I might have moved the SLI bridge to the left connectors. The drivers have been reinstalled (again). The CPU OC in the BIOS has been changed from a -0.030 offset to AUTO voltage, which had been working solidly for years. I might try a -0.05 or +0.05 offset if all is stable.

Here's hoping I can share good news in a few days!!!

Would like to hear what you think nonetheless! Thanks.
 
Solution

Eximo

Sounds like a classic case of try everything. Something worked, but we'll never know what.

Still, the different revisions of the cards are interesting. I know EVGA had issues with cards not getting along when one model predated their fancy LED SLI bridge and the other came after it. Making the newer card the anchor in the first slot may have been all that was needed, or using the other SLI path could have fixed it. There might be a wiggly pad or a bad capacitor somewhere.