Computer crashes without BSoD, restarts, and crashes again for several loops

explorer45

Commendable
Dec 10, 2016
10
0
1,510
Hello,

I am hoping to get some help with my computer. I was watching a video on YouTube when the computer turned off as if I had hit the power switch, with no error report or blue screen. It restarted automatically, but when I went to log in, it crashed the same way. From there it entered a loop of this behavior, sometimes crashing before login and sometimes after. After 4-5 restarts, a separate boot screen appeared that let me try to repair the system. I ran a System Restore, but I don't know if it worked: System Restore said it failed, but on the next boot it said it succeeded. I thought this behavior could mean the computer was overheating, so I cleaned it out and dusted it. That took a little while, and afterwards it booted and ran fine for about an hour. During that time I ran Malwarebytes, and it found nothing. After about an hour, the computer crashed again, same as before: on YouTube, no BSoD. I turned it off, so I don't know whether the reboot loop restarted.

My computer uses
a Gigabyte H97-D3H motherboard,
an Intel Core i5 processor,
a GeForce GTX 970 graphics card,
a Cooler Master V650 semi-modular power supply (model RS-650-AMAA-G1),
two Corsair Vengeance 8 GB RAM sticks,
a 1 TB hard drive,
a Crucial MX100 256GB SSD,
and a TP-Link N900 Wireless Dual Band PCI Express Adapter for wireless.

I run Windows 8.1 from the SSD.

Any help on this matter would be greatly appreciated.

EDIT: Finally fixed the problem. The PSU had started to fail, no real idea why. Threw in a new one and it's working fine again.
 
Solution
Some possibilities for your issue are:
1. A missing or bad driver. Check Device Manager for any yellow triangles. If there are any, uninstall and update the driver.
2. Problems with RAM can cause this. Try running with one stick of RAM in the first slot, and swap the sticks around in case one is failing. If you suspect a stick, run Memtest86+ from a USB stick.
3. Your system may be overheating under load. You can check using HWMonitor while stress testing. Also look at your rail voltages to check that the PSU is doing its job.
 

explorer45



I checked the drivers, no triangles. I downloaded HWMonitor, [strike]and it showed my 970 is getting up to 95 - 100 degrees on the desktop, with no games or internet running, just HWMonitor. It also shows that my GPU fans are not spinning. From observation, I saw them spin up during boot, but they stop after login and do not restart. All other fans seem to be working. I assume this is the issue, but honestly, I am not very knowledgeable about technical matters. Assuming this is the issue, do I need to get a new GPU, or is it something with the drivers? [/strike]

EDIT: I just realized that HWMonitor was reporting the temperature as 100 degrees Fahrenheit; the dangerous range is around that number, but in Celsius. I'll check the RAM tomorrow.
 


100F = 38C (approx), and that is not dangerous at all.
Ideal system temperatures are 10-15C above ambient room temperature at idle and 60-65C under load.
Can you report the rail voltages for 12V, 5V and 3.3V from HWMonitor?
Which stress tester are you using?
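
For anyone who wants to check the conversion behind the 100F figure, here is a minimal Python sketch. The function name is only for illustration.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


if __name__ == "__main__":
    reading_f = 100.0  # the GPU reading reported earlier in the thread
    reading_c = fahrenheit_to_celsius(reading_f)
    print(f"{reading_f:.0f}F = {reading_c:.1f}C")
```

Running it for the 100F reading gives a value just under 38C, which matches the approximation above.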
 

explorer45



As for a stress tester, I'm not sure what you mean; I am only using HWMonitor for the temperatures.
According to HWMonitor,
-12V is at -6.8/-6.9V
5V is at 3.34V
-5V is at -6V
3.3V is at 2.01V
5V VCCH is at 2.84V
VBAT is at 1.57V
CPU VCORE is at .132V to 1.06V max
and VIN1 is at 2.016V

What are the usual limits for these voltages? This is really the first time I've delved into this kind of stuff.

For the record, I started the computer again today, and it lasted about three and a half hours before crashing and entering the loop. That is about how long I thought it lasted when this first began.
 
I think the voltages should be within 5% of the reference voltage,
i.e. +3.3V would have a max of +3.47V and a min of +3.14V.
Your numbers are off, but you need to work out whether the fault is in the power supply or in the test tool you are using.
You might go into the BIOS and see if you can read the voltages there; some systems can do this.
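
As a rough sketch of that 5% rule, the snippet below works out the allowed window for each positive rail. The 5% figure is the rule of thumb from this post, taken here as an assumption, and the function name is made up for the example.

```python
from typing import Tuple

TOLERANCE = 0.05  # 5% rule of thumb from the post above (assumption, not a measured spec)


def tolerance_window(nominal_v: float, tolerance: float = TOLERANCE) -> Tuple[float, float]:
    """Return the (minimum, maximum) acceptable voltage for a rail."""
    delta = abs(nominal_v) * tolerance
    return nominal_v - delta, nominal_v + delta


if __name__ == "__main__":
    # Nominal values of the three main positive rails
    for nominal in (3.3, 5.0, 12.0):
        low, high = tolerance_window(nominal)
        print(f"+{nominal:g}V rail: {low:.3f}V to {high:.3f}V")
```

A reading outside the printed window for its rail is the kind of number worth double-checking in the BIOS before blaming the PSU.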

Here are some voltages: http://www.kmepc.com/WebForms/TechnicFAQInfoPage.aspx?para=0000000018

With incorrect voltages you would typically get bugcheck 0x124 errors, while random memory errors result in bugchecks that show 0xc0000005 as parameter one.





 

explorer45



According to the BIOS,
+3.3V is at 3.32V
+5V is at 4.9V
and +12 V is at 12.1V

Looks like the test tool was off.
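
To put numbers on that, here is a small Python sketch comparing the HWMonitor and BIOS readings posted in this thread against their nominal rails. The 5% limit is the rule of thumb suggested earlier in the thread, and the helper names are just for illustration.

```python
# Nominal rail voltages
NOMINAL = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}

# Readings as posted in this thread
READINGS = {
    "HWMonitor": {"+3.3V": 2.01, "+5V": 3.34},
    "BIOS": {"+3.3V": 3.32, "+5V": 4.9, "+12V": 12.1},
}


def deviation_percent(measured: float, nominal: float) -> float:
    """Signed deviation of a reading from its nominal value, in percent."""
    return (measured - nominal) / nominal * 100.0


def within_tolerance(measured: float, nominal: float, limit_percent: float = 5.0) -> bool:
    """True if the reading is within the given percentage of nominal."""
    return abs(deviation_percent(measured, nominal)) <= limit_percent


if __name__ == "__main__":
    for source, rails in READINGS.items():
        for rail, measured in rails.items():
            dev = deviation_percent(measured, NOMINAL[rail])
            verdict = "OK" if within_tolerance(measured, NOMINAL[rail]) else "OUT OF RANGE"
            print(f"{source:10s} {rail:6s} {measured:6.2f}V  {dev:+6.1f}%  {verdict}")
```

The BIOS readings all land within a couple of percent of nominal, while the HWMonitor readings are off by more than 30%. A PC would not stay running at voltages that far off, so the tool misreading the sensors is the likelier explanation.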
 
That is good news.
Generally a crash without a bugcheck indicates a power problem OR a problem where the bugcheck memory dump cannot be written to disk.

- For the disk-related problem, you would update the BIOS and the motherboard SATA drivers.
You might also move the disk to a different port or SATA controller if you have two in your machine.

- The problem can also be a GPU pulling too much power from the motherboard slot. The motherboard can detect this and reset the CPU. If you have a good power supply, you end up with a black screen. If you have a cheap power supply, it will let the CPU restart with unstable power and you get a bugcheck 0x124 on the restart.
I would make sure the GPU is getting proper power: check any extra power cables running from the GPU to the PSU. I would also remove any overclock of the GPU or CPU and see if you still have the problem. If so, I would underclock the GPU.
You might turn off GPU hardware acceleration in your browser until the problem is resolved, just so it will not trigger a bugcheck.
You will also want to make sure your PSU's modular connections are firmly seated. An unplugged cable can cause the motherboard to reboot when the GPU goes into 3D mode.




 

explorer45



I checked all the wires going to and from the supply, and they all seemed to be hooked up properly. The RAM and GPU also seem to be hooked up right. As far as I know, none of the components are overclocked, so that can't be the issue. For what it may be worth, given how it was off with the voltages, HWMonitor said that the GPU power was at 11.58%, whatever that means. It also reported that the CPU was drawing no more than 50 W for each category it reported, around 90-100 W max.
 
The only real driver update for this older board was the Realtek audio driver; it could corrupt memory and cause problems with graphics cards not responding. The driver was released for Windows 8.1 on the Gigabyte motherboard site:
Realtek HD Audio Driver dated 2016/02/23 http://www.gigabyte.com/products/product-page.aspx?pid=4962#driver

Generally the screen would just freeze, sometimes with sound, if you had speakers connected to the monitor.

You might check whether the fan in the PSU has stopped working, i.e. the PSU overheats and sets the power_OK signal to false; the motherboard gets the signal and holds the CPU in reset until the power supply sets the signal back to true. It could take some time for the PSU to cool. You might heat the PSU with a hair dryer to see if that triggers the failure.
If the PSU fan is stopped by dust, blow the dust out. Sometimes just blowing air into a stuck fan will start it again.



 

explorer45



That doesn't sound like the error I'm having; the computer and sound don't freeze. It acts as if I had forced a restart by holding down the power button. It just shuts down, and then turns back on. I did just check the PSU fan, and it seems to be working fine: it spun up during boot and stayed on. Just to make sure, I also checked all the other fans, and they seem to be working as well. I have two case fans, a fan on the CPU, two on the GPU, and one on the PSU. All spin up during boot just fine.

It seems to me that something is "building up" in the computer that forces a shutdown. If it's left overnight, it will last about three and a half hours. Leaving it for less time means it takes less time to crash, and if you try to reboot it right away, it will crash almost instantly. Other than heat, is there anything that "builds up" that could cause a crash or a forced restart?
 
Heat can cause thermal expansion, which can cause short circuits or cause components to become disconnected from the circuit traces. It happens more often with older electronics.
These faults are also very hard to find: you have to use a heat gun to try to find the faulty component or connection, then look with a stereo microscope to see if there are any broken traces. Most people give up and start swapping out parts until they hit the right one.



 


It is when the system is under load (as in stress testing) that faults tend to show up, and I suspect your PSU until it is ruled out as the culprit.

Conduct a stress test, explorer45, to determine the culprit.
Download AIDA64 and put it side by side on your desktop with HWMonitor.
In AIDA64, go to the Tools menu to run the stress test. Check the boxes for CPU, FPU and Cache.
Run the test for 10 mins, and stop the test if temps reach 80C.
At the 10 min mark, take screenshots showing rail voltages and temperatures with 100% utilization on all cores.
You can use Imgur to host your screenshots. Upload the files to Imgur, then go to your images and copy the BBCode link to post here for analysis.

AIDA64 is free trial software and will test the PSU, CPU and other sub-systems like GPU and RAM.
 
Solution

explorer45



Alright, I did the stress test for CPU, FPU, and Cache, here are the results.
[Screenshot: oSuR2en.png]
[Screenshot: egOAg9y.png]
 
Thanks for the reports explorer45.

As you can see, all rail voltages are out of spec in HWMonitor. Acceptable tolerances are ±5%.
Your temperatures are fine at the 10 min mark, so that's not the issue.

In AIDA64 you should monitor the voltage graph shown during the test to see if voltages spike above or droop below the acceptable range. Although the AIDA64 stats show the voltages are OK, that is not conclusive.

You can either extend the test to, say, 1 hr or more to see what happens, or do the definitive test: swap out the PSU for a known working unit of the same wattage or higher.

If it still keeps happening, then some checks on the OS can be done.
You can run "sfc /scannow" (without the quotation marks) in an elevated command prompt.
This will check your OS files for corruption, report anything it finds, and attempt to fix it.



 

explorer45



For what it's worth, I did check the graph, and all three voltages stay constant and within acceptable bounds. I also checked the BIOS diagnostics, and they reported acceptable values as well. I don't know which readings are right and which are wrong.
After running the scan, the command prompt reported no missing files, so that's not the issue.
I don't have any spare PSUs or RAM sticks, so if either is the issue, I won't be able to test it immediately. I'll get back to this thread when I have done the swaps.
Also, I never mentioned it, but all of the parts are about 18 to 21 months old. I don't know how long these parts are supposed to last, but I would assume that this is well within their lifespans.
 
A good way to test your RAM is to run with just one module in the first slot and then swap the modules around.
If a module is failing, then RMA the complete kit.
You can also test the DIMMs using Memtest86+ by booting from a USB stick. If you do this, test the modules both together and individually, for a minimum of 3 passes. This is a much better way than using the Windows memory test.

If any part of your system is still under warranty, you should submit a ticket on the manufacturer's website, obtain an RMA authorization, and get the product replaced.