Completely new rig, sudden reboot problems (idle as well as gaming)

triops_90

Reputable
Jan 10, 2016
11
0
4,510
Hey fellas,

I hope this is the right subforum for this problem; if not, please excuse me.

The problem is sudden reboots on an almost completely new rig. A few weeks ago I built my new PC with the following specs:
-Motherboard: Asus Z170 Pro Gaming
-CPU: Intel i5 6600k
-Memory: 2 x 8GiB DDR4 Corsair Vengeance LPX 2400MHz (running @2133MHz)
-Graphics: Asus Strix GTX970OC
-PSU: be quiet Pure Power CM BQT (630W)
-Drives: Crucial BX100 SSD, some Seagate SSHD
-OS: Windows 10 Pro x64

At first the computer seemed to run fine, but one or two weeks after I installed the hardware, sudden reboots started to happen.

It doesn't matter what I'm doing: sometimes it happens after 10 minutes of office work, sometimes after 1 minute or 1 hour of gaming at maxed settings, and sometimes even during the boot process. It's like someone pulls the plug; the screen just goes black. About two or three seconds after the crash the PC boots again (even though I have disabled automatic restart after crashes in Windows). All in all, it seems to be getting worse and happening more often than right after the "birth" of this computer.

Interestingly, in most cases the screen just stays black while rebooting, as if my graphics card were producing no signal. I then have to switch the PC off manually by turning the PSU off and on again; after that it boots, my graphics card seems to be OK again, and a normal picture is displayed. This doesn't happen every time, though: sometimes the graphics card works fine after the reboot, but in most cases it doesn't.

The Windows event log shows only the generic Kernel Power Event 41. I also checked the component temperatures in Windows, and everything was fine (CPU 30-40°C, GPU ~40°C).

What I have already tried, without any effect:
1) Before the be quiet 630W PSU, I had a roughly 8-year-old 500W be quiet PSU installed, which I replaced with the stronger one.
2) Changed the graphics card: downgraded to an Nvidia GTX 650 Ti (I thought power spikes from the 970 might be causing it).
3) Removed one memory stick.
4) Checked all cables and connections.
5) Deactivated the "Asus anti power surge" feature in the BIOS after reading that it tends to "misdiagnose" power surges.

My next guess would be either the power switch (a mechanical problem? But then why would my graphics card not work in most cases after the reboot?) or something motherboard-related (which I don't know how to diagnose; the next step would be to RMA it).


Dear community, do you have any idea how this problem can be dealt with? I'd appreciate any tips :) Please excuse any language problems; English is not my first language.

Have a nice Sunday!
 

triops_90

The current BIOS version is "0802 x64", build date 09/02/2015.

The system has already crashed once while booting, shortly after the initial boot screen, so I'm afraid of updating the BIOS (if there is a newer version) and corrupting the system further if it crashes during the update.
 

Geekwad

Admirable


Agreed. This is why it is NEVER advisable to update the BIOS from within the OS. Always, always, always flash the BIOS from within the BIOS setup itself, using the board's built-in flash tool.

It's still worth updating, because there have been several system stability updates since your version:

http://www.asus.com/us/supportonly/Z170%20PRO%20GAMING/HelpDesk_Download/

Follow the directions for flashing the most recent version:

http://dlcdnet.asus.com/pub/ASUS/mb/LGA1151/Z170-PRO-GAMING/E10719_Z170_PRO_GAMING_UM_V2_WEB.pdf


 

triops_90

Updated the BIOS to version 1202 x64; the problem still persists :(
 

Geekwad

OK, this might seem like a bit of a wild goose chase (it is), but here are some ideas to try:

Try uninstalling all the Nvidia software/drivers again and reinstalling a fresh set directly from Nvidia (go back to the 970):

http://www.geforce.com/drivers

What is the status of Windows Updates? Anything hanging, or saying it can't install?

In Control Panel > Device Manager, are there any "!" warning icons indicating driver problems? Some Asus motherboards from the era of your old BIOS shipped with chipset drivers (on the installation disc) that were not perfect. There are other chipset drivers on the motherboard page:

http://www.asus.com/us/supportonly/Z170%20PRO%20GAMING/HelpDesk_Download/

Or Intel has a chipset driver utility that is worth a try if you can't identify what you had:

https://downloadcenter.intel.com/download/24345/Intel-Driver-Update-Utility

Can you log or access Event Viewer to see what the +12V rail is doing during the crash? Is it dipping at all? Are there any appliances that cycle heavy loads on and off on the same circuit as your PC, like a fridge, sump pump, microwave, etc.? Do the lights ever dim where you are?
 

triops_90

First of all, thank you for your tips and patience! Really appreciate it.

Graphics drivers: I uninstalled all Nvidia software and drivers and reinstalled the newest version straight from their homepage; it didn't help.

Device Manager is fine, too; no problems detected. Nevertheless, I installed an update for the "Management Engine Interface" from the chipset section of the Asus portal; it didn't help.

The Intel chipset driver utility didn't offer me any updates because "no drivers were found for [my] product".

Concerning the Event Viewer, can this tool explicitly monitor the 12V rail? I don't have numbers for the crash situations, but in general the 3.3V, 5V and 12V rails are definitely within the ±5% tolerance. I have checked that more than a few times, though of course only in "normal" situations.

Looking at the Event Viewer was a saddening experience. I assembled my PC at the beginning of December; the first few crashes (9 total) happened between 14 and 30 December, and since January it seems to have gotten worse (~40 crashes between 2 January and now). In December I spent a lot more time in front of the PC than now, so there are a lot more crashes per hour of runtime now than there were before; the problem really seems to be getting worse.

All the crash logs contain the same information: Kernel Power 41 (63), Keywords (70368744177664),(2)/0x8000400000000002 [full log (in German, sorry): http://pastebin.com/7p6A7dTM].
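The two numbers in parentheses and the hex value in those log entries are consistent with each other: 70368744177664 is 2^46 and 2 is 2^1, and both are bits of 0x8000400000000002, which additionally has bit 63 set. A minimal Python sketch that splits the mask into its set bits; it deliberately does not try to name what each bit means, and decoding the keywords says nothing about the cause of the crash:

```python
# Split the Keywords value from the Kernel Power 41 entries into its set bits.
# No meaning is assigned to the individual bits here.

def set_bits(mask: int) -> list[int]:
    """Return the positions of all set bits in mask, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

keywords = 0x8000400000000002  # hex value quoted from the crash log

print(set_bits(keywords))          # [1, 46, 63]
print(1 << 46 == 70368744177664)   # True: first number shown in parentheses
print(1 << 1 == 2)                 # True: second number shown in parentheses
```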

There are no such appliances on the same circuit; the only loads are my PC + TFT and sometimes my notebook. The lights aren't dimming at all.

My next step would be to get a new case (I already wanted one before the problems started, so it's not money wasted on bug-fixing :p) to rule out a faulty button or unwanted contact between the motherboard and the chassis, and/or to RMA my motherboard, because I'm really out of ideas. What keeps bugging me is that the problem keeps getting worse, which leads me to conclude that it has to be a hardware problem, but again, I'm all out of ideas :(
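The "getting worse" impression can be put into numbers using the crash counts from the Event Viewer (9 crashes between 14 and 30 December, ~40 between 2 January and "now"). A small Python sketch, assuming the years are 2015/2016 and taking 10 January 2016 as "now" purely for illustration; it only computes crashes per calendar day, because the actual runtime hours aren't known:

```python
from datetime import date

def crashes_per_day(crashes: int, first: date, last: date) -> float:
    """Average crashes per calendar day, counting both end dates."""
    days = (last - first).days + 1
    return crashes / days

december = crashes_per_day(9, date(2015, 12, 14), date(2015, 12, 30))
# ASSUMPTION: "now" is taken as 10 January 2016 purely for illustration.
january = crashes_per_day(40, date(2016, 1, 2), date(2016, 1, 10))

print(f"{december:.2f} vs {january:.2f} crashes/day")  # 0.53 vs 4.44 crashes/day
```

Even per calendar day that is roughly an eightfold increase, and since the PC was used more in December, the increase per hour of runtime would be larger still.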
 

Geekwad

What kind of case is it in currently? Could it be a grounding problem between the motherboard and case? Intermittent problems are the very worst to track down :(

Have you tried activating the XMP profile for your RAM so it runs at its rated 2400MHz? Since it's running at the stock default, its timings could be off and causing issues. Also make sure that your two sticks are in A2/B2 and not in A1/B1.
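One way to see why speed and timings belong together: XMP applies speed, timings and voltage as one profile, and the same CAS latency number means a different delay in nanoseconds at a different data rate. A rough Python sketch of the usual first-word latency formula; the CL values below are hypothetical examples, not the actual timings of these Corsair modules:

```python
def first_word_latency_ns(cas_latency: int, data_rate_mts: int) -> float:
    """CAS latency in nanoseconds for DDR memory.

    DDR transfers twice per clock, so the clock in MHz is data_rate / 2 and
    one clock cycle lasts 1000 / (data_rate / 2) = 2000 / data_rate ns.
    """
    return cas_latency * 2000 / data_rate_mts

# HYPOTHETICAL CL values, chosen only to illustrate the calculation.
print(round(first_word_latency_ns(15, 2133), 2))  # 14.06
print(round(first_word_latency_ns(14, 2400), 2))  # 11.67
```

The real values can be read from the module's SPD/XMP data, e.g. in the BIOS or in a tool like CPU-Z.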

EDIT: Use this for monitoring voltages:

http://www.cpuid.com/softwares/hwmonitor.html

It isn't perfect and not always totally accurate, but it might help narrow down an issue with the mobo or GPU.
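If you jot down HWMonitor readings (ideally right before a crash), a quick check against the ±5% window for the +3.3V, +5V and +12V rails might look like this Python sketch. The sample readings are made up for illustration; keep in mind that software sensor readings are approximate and a short dip can fall between two samples:

```python
# Nominal rail voltages and a +/-5% tolerance window around each.
NOMINAL = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05

def out_of_spec(readings: dict[str, float]) -> dict[str, float]:
    """Return the rails whose reading falls outside nominal +/- TOLERANCE."""
    bad = {}
    for rail, volts in readings.items():
        nominal = NOMINAL[rail]
        if abs(volts - nominal) > nominal * TOLERANCE:
            bad[rail] = volts
    return bad

# MADE-UP example readings, not measurements from this system.
sample = {"+3.3V": 3.34, "+5V": 5.02, "+12V": 11.32}
print(out_of_spec(sample))  # {'+12V': 11.32}
```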
 

triops_90

It's a Thermaltake case; I don't know the exact model. I had already thought of a grounding problem, which is another reason for the new case.

I have activated the XMP profile for my RAM, and so far there have been no more crashes; my system has been running for about 45 minutes without a reboot. Although it seems to have solved the issue, I'm not 100% convinced. What I don't understand is that the problem has been progressively getting worse over the past few days, so how could it be a simple settings issue? If it really is these settings (timings, voltages and whatnot), my gut feeling tells me it should have been consistently bad from the start.

But let's hope this has solved the problem. What a wild goose chase, and thank you for coming along on it with your tips! I'm going to let the system run for a few hours; if no problems occur, I'll mark the solution for people having the same problem.

 

Geekwad

OK,

So the OS is on the 240GB Crucial SSD... is there anything on the 2TB Seagate? I do think you're probably headed for a mobo RMA, but I was thinking of taking the SSD out of the mix and doing a clean OS install on the HDD.
 

triops_90

Thanks for checking the dump. The Seagate drive is mostly empty; there are just a few games, nothing relevant to the system.

I'm gonna remove the SSD after work and install an OS on the HDD; I'll update later on.

RMAing the mainboard would be my next step, too. Let's hope they can deal with such a nonspecific error, or that a new mobo solves the problem.

Furthermore, a friend pointed out that I haven't checked the CPU for faults. Do you think it's worth buying or borrowing a compatible CPU to test this? My gut feeling tells me that such a fundamental component wouldn't cause a "random" failure like this.
 

Geekwad

It certainly could, but the generic kernel power event doesn't suggest that. The PSU is usually the first suspect, but that's been ruled out. The GPU also seems to be ruled out, since it happened with more than one card, but the power delivery/signal to the PCIe slots could still be the problem. With the clean install, also get all drivers from the Asus website to make sure only the most recent versions are installed.

Also on the clean install, take the dedicated GPU out entirely and use the integrated graphics to remove the 970 GPU from the mix too. If that happens to work, then putting the GPU into a different PCIe slot is worth a try just to see if the slot itself is the issue. It would still mean RMA'ing the board, but would be nice to know where the issue really was.
 

triops_90

Finally had the time for further tests.

I removed the graphics card, left just one RAM stick installed (tried both), unplugged either the HDD or the SSD and ran a live Linux system; the problem still persisted.

After one reboot, all of a sudden nothing is displayed anymore, even after plugging the graphics card back in. It's as if the cable is plugged into my monitor but no signal is coming through (of course I plugged the cable back into the graphics card :p). What could it mean when neither the onboard graphics nor the graphics card displays anything?

Now I'm gonna RMA the mobo, gonna update you when I know more :) Thanks for all your help!
 

triops_90

Let's hope it's the board! Preparing to RMA at this moment, gonna update when I hear something.

 

triops_90

Update: After RMAing the board I got a completely new one... which wasn't working properly either :D I inserted my RAM sticks into DIMM A2+B2 as shown in the manual; the DRAM LED was flashing and the system wouldn't boot. Each stick works on its own in B1 or B2, and both also work together in B1+B2 (single channel), so it has to be a problem with the RAM slot. RMA'd it again. I'm getting really tired of this board series..
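The conclusion about the RAM slot follows from simple elimination: a slot that appears in a failing configuration but in no working one is the suspect. A minimal Python sketch using the test results reported in this post (both sticks collapsed into one entry per slot). Note that A1 was never tested, so this can't tell slot A2 apart from a problem affecting the whole A channel; it only shows where the fault appears:

```python
# Elimination over pass/fail boot tests: suspect slots are those used in a
# failing configuration but never in a working one.

def suspect_slots(results: dict[frozenset[str], bool]) -> set[str]:
    """results maps a set of populated slots to True (boots) or False (fails)."""
    good = set().union(*(slots for slots, ok in results.items() if ok))
    bad = set().union(*(slots for slots, ok in results.items() if not ok))
    return bad - good

tests = {
    frozenset({"B1"}): True,         # each stick alone in B1
    frozenset({"B2"}): True,         # each stick alone in B2
    frozenset({"B1", "B2"}): True,   # both sticks, single channel
    frozenset({"A2", "B2"}): False,  # manual's layout, DRAM LED flashing
}
print(suspect_slots(tests))  # {'A2'}
```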

Let's hope again that it was just the motherboard/RAM slot and not the CPU. But if it were the CPU, there should be more signs than "just" a non-working RAM slot (I hope :p). I even used the Asus CPU installation kit, which basically ensures that you can't bend any of the motherboard's socket pins.