BSoD on brand new system. Please help diagnose possible faulty component.

Jonathah

Reputable
Jan 29, 2015
7
0
4,510
First, the problem:
My computer seems to run well and stably most of the time. However, when I put it into standby, especially when I have a lot open such as my browser with 10 tabs or a game, it enters standby apparently normally, but when I resume it sometimes blue-screens with that Windows 8 frown-face screen, then dumps and restarts. I have also left my computer idle and returned hours later to find that it had rebooted itself (presumably after a blue screen).
In addition, it has occasionally blue screened while in use, such as during a game, but I suspect it is related to the same cause.

In search of the culprit I downloaded some diagnostic software, and everything comes back normal except Prime95. When running Prime95 I select the first option (small FFTs) and it crashes in less than a second. It doesn't even blue-screen; it immediately blacks out and reboots. When running the second option (large FFTs) it runs for about 5 minutes before I stop the test and move on. When running the third option (blended) it lasts about a minute and then black-screen reboots. I have no idea what an FFT is, so I have no idea what component or configuration might be causing the problem.

I booted from the USB flash drive version of MemTest86, configured it for the most thorough test I could, and ran it. It found zero errors until "Test 10", where it found 13. I thought "Aha! I just have to identify the problem stick and problem solved before the Newegg return policy expires!"
So I took all 4 sticks and tested them with MemTest86 one at a time to find the problem stick. But BIOS warned me that my RAM had changed and demanded that I enter BIOS or restore BIOS defaults. I had trouble initially even booting past BIOS until I disabled certain memory features in BIOS. Then, running MemTest86 with just 1 stick installed, each stick passed with no errors. Huh!? What about those 13? So I put all 4 sticks back in and tested again, and this time there were no errors. That made me think it was just a BIOS setting, so I left all memory enhancements disabled (even the ones that are Enabled by default) and ran Prime95 again. It still black-screens and reboots.

So I go into BIOS again and disable everything, from my CPU "K OC" to Turbo and Hyper-Threading. (I wasn't manually overclocking; these were just the defaults.) Now I run Prime95 and this time it runs indefinitely, with no black-screens. Success? It seems strange that I have to sacrifice such major features to get a stable system, so I decided to put it through a different kind of stress test. I open my browser with 20 tabs, a game, benchmarking software, monitoring software, and the EVGA software. My memory only hits 20%, which I thought was strange (I guess 16GB of RAM is really very unnecessary). I put it into standby and wake it back up. Success. Beginning to think this was really stable now, I go through my browser tabs to make sure each one has fully loaded, and then a blue screen hits me.

I am at a loss and don't think I can continue on my own, especially because I am running out of time to return any problem components to Newegg. Must identify it quickly.

So here is my build:
I have only built 4 systems in the last 10 years, so I just have a bit of experience but not enough it seems. This is a new system with 100% new components mostly from Newegg over the holiday sales.

Intel Core i7 4790K Haswell
CoolerMaster Nepton 140XL CPU cooler
(my temp never goes over 66C under stress test, still seems a tiny bit high)

G.Skill Sniper DDR3-1600 CL9-9-9-24 1.5V
4 sticks of 4GB (F3-12800CL9D-GBSR)

Gigabyte Z97X-UD5H Mobo
using integrated mobo Audio

EVGA Geforce GTX 970 FTW

Sandisk Extreme Pro 480GB SSD (OS on this one)
WD Black 1TB HDD (Games, media on this one)
ASUS ODD DVD multi

Windows 8.1 Pro 64-bit
Avast Anti-virus

I would list my BIOS settings, but I've tried so many different configurations I haven't really settled on one, but I would love to get some suggestions if anyone is familiar with these Gigabyte BIOS options.

If anyone wants to suggest a good, effective, safe OC configuration for this system that would also solve my stability issues, I would happily accept that as well. I have never done any OCing (that's the reason I went with the DDR3 1600, but if my memory is bad, I might replace it with 2400 or something if that would balance out my system).

I apologize if I was not as concise as I could have been. I just don't want to leave out any details that might be useful. I have until January 31 to return these items to Newegg if I can identify any faulty components. Though it's become evident that I may not know what I'm doing.... It all seems perfect, until it crashes.

I am not sure if this is relevant because I could not reproduce any errors upon further testing with MemTest86, but here were the initially reported errors:
Error Confidence Value 236
Lowest Error Address 0013c53df7c - 5061.2 MB
Highest Error Address 0014e53dffc - 5349.2 MB
Bits in Error Mask 00000008
Bits in Error - Total: min: 1 max: 1 avg: 1
Max contiguous errors: 1
Tests 0 through 9 = 0 errors
Test 10 = 13 errors
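
For anyone trying to read a report like this, the numbers can be decoded with a few lines of Python. This is only a rough sketch of the arithmetic, not anything MemTest86 itself provides: the function names are made up for illustration, and it assumes the "MB" figure is the address divided by 1024*1024 and that the error mask sets one bit per failing data bit.

```python
# Values copied from the MemTest86 report above.
LOWEST_ERROR_ADDRESS = 0x0013C53DF7C
HIGHEST_ERROR_ADDRESS = 0x0014E53DFFC
ERROR_MASK = 0x00000008

BYTES_PER_MB = 1024 * 1024  # assumption: the report's "MB" means 2**20 bytes


def address_to_mb(address):
    """Convert a physical byte address to megabytes."""
    return address / BYTES_PER_MB


def failing_bits(mask):
    """Return the positions (0 = least significant) of the bits set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    low_mb = address_to_mb(LOWEST_ERROR_ADDRESS)
    high_mb = address_to_mb(HIGHEST_ERROR_ADDRESS)
    print(f"Lowest error address:  {low_mb:.1f} MB")
    print(f"Highest error address: {high_mb:.1f} MB")
    print(f"Span between them:     {high_mb - low_mb:.1f} MB")
    print(f"Failing bit positions: {failing_bits(ERROR_MASK)}")
```

Run as-is, it reproduces the 5061.2 MB and 5349.2 MB figures from the report and shows that the mask 00000008 has only bit 3 set, so every recorded error was the same single data bit. The address on its own probably does not say which stick is at fault, because the memory controller may interleave addresses across channels.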
 

Jonathah

I think I fixed the black-screen power loss with Prime95. I kept searching over the last few hours and found another post here on Tom's Hardware:
http://www.tomshardware.com/answers/id-2207989/4790k-crashes-turbo-boost-default-settings.html
The part that worked for me was:


I followed his example and my Prime95 test seems to be stable for the amount of time that I've had to test it so far.
I doubt that this will fix the occasional Standby/Resume and occasional random blue screen error, so I wouldn't call my problem totally solved. I am still open to any suggestions about that. Also still open to anyone familiar with this hardware combination and what settings are best to use, especially in light of the fact that the "Auto" settings for Voltages were the culprit for one problem so far.
 

Jonathah

I'm still having blue screens while the PC is idle. Nothing is going on, all applications are closed except for normal background stuff (e.g. antivirus), and then it blue-screens. I am not familiar with how to look up these kinds of things in the Event Viewer or how to analyze dump files, but here goes...

Around the time it crashed (I'm not sure of the exact time) I get 205 of these events in a row in the Event Viewer. They all seem to be identical, and the event description reads:

"wuaueng.dll (656) SUS20ClientDataStore: The database page read from the file "C:\WINDOWS\SoftwareDistribution\DataStore\DataStore.edb" at offset 98304 (0x0000000000018000) (database page 2 (0x2)) for 32768 (0x00008000) bytes failed verification. Bit 64483 was corrupted and has been corrected. This problem is likely due to faulty hardware and may continue. Transient failures such as these can be a precursor to a catastrophic failure in the storage subsystem containing this file. Please contact your hardware vendor for further assistance diagnosing the problem."

It says likely due to faulty hardware, but what hardware? Memory? CPU? SSD? I'm not sure which component to return, or whether this event info is even accurate or relevant.
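
To at least make sense of the numbers in that event, here is a small Python sketch of the arithmetic. It is my own illustration, not a Windows tool: the function names are hypothetical, and it assumes the reported bit index is counted from the start of the 32768-byte page that was read.

```python
# Values copied from the Event Viewer message above.
FILE_OFFSET_BYTES = 98304   # "at offset 98304 (0x0000000000018000)"
PAGE_SIZE_BYTES = 32768     # "for 32768 (0x00008000) bytes"
CORRUPTED_BIT = 64483       # "Bit 64483 was corrupted and has been corrected"


def locate_bit(bit_index):
    """Split a bit index into (byte offset within the page, bit within that byte)."""
    return divmod(bit_index, 8)


def page_slot(file_offset, page_size):
    """Return how many whole pages precede the given file offset."""
    return file_offset // page_size


if __name__ == "__main__":
    byte_in_page, bit_in_byte = locate_bit(CORRUPTED_BIT)
    slot = page_slot(FILE_OFFSET_BYTES, PAGE_SIZE_BYTES)
    # Assumption: the bit index is relative to the start of the page read.
    byte_in_file = FILE_OFFSET_BYTES + byte_in_page
    print(f"Offset is hex {FILE_OFFSET_BYTES:#x}, page-sized slot {slot}")
    print(f"Corrupted bit is in byte {byte_in_page} of the page, bit {bit_in_byte}")
    print(f"Approximate byte offset in the file: {byte_in_file}")
```

Under those assumptions the flipped bit sits in byte 8060 of the page that starts three page-lengths into the file (the log calls it database page 2, so the file evidently numbers pages differently than this sketch does). It is a single corrected bit, and the arithmetic alone cannot say whether memory, CPU, or SSD flipped it.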
 

Jonathah

One of the recent BSoD messages said DPC Watchdog Violation. While I was in the process of looking that one up, I got another BSoD that said IRQL_NOT_LESS_OR_EQUAL.

I couldn't find anything helpful on that second message, but for the first one I found a site claiming "DPC_WATCHDOG_VIOLATION is very common if you have a Solid State Drive (SSD) in Windows 8. Many SSDs can’t handle Windows 8 correctly until you update the firmware on the drive."

My SSD is a SanDisk, and the SanDisk SSD Dashboard software assures me that my firmware is already up-to-date.
So is this site's claim accurate? If so, does that imply that perhaps Windows 7 would work better? I would be willing to make that compromise if it solved my blue-screen problems without my having to return any of my hardware.

But I need advice before wasting time or causing more harm. I think I have already introduced more errors from tinkering around too much. For example, I tried a 20% boost option that was built into my Gigabyte BIOS and that made boot times take 10 times longer. I disabled the boost, set the BIOS back to default, but the very lengthy boot times remain. My previous boot time was around 6 seconds, now it is over a minute. At this point I don't even know if my CPU has been damaged, or what... is it possible to affect only boot times? The rest of my PC operation is fine, in terms of speed. I can still play games and do anything else at normal timings, it is merely the black screen between the Windows 8 logo screen and the logon screen that hangs for far too long.

I have never had this much trouble before. My previous system lasted 8 years (still works, but time for upgrade) without any problems whatsoever....

At this point I think I will just RMA every single piece to Newegg, because I cannot figure out which component is causing the trouble. But I will keep searching till the last minute.
 

Jonathah



Honestly, I don't know anything about voltages. That is probably the biggest gap in my knowledge of anything computer related, which is why I have never tried to change them before. But this time my default settings were very unstable, so I searched for knowledge from someone more familiar. The person I quoted had the same mobo and CPU that I have, and had the same result from the same Prime95 test that I had.

I admit I did hesitate to blindly follow his example, not knowing for myself whether it was safe or absurd. But lacking any other specifics from anyone, I took a chance. And my result was an improvement, at least in regard to that one scenario of instantaneous crashes when testing stability with Prime95. Now I can run all the Prime95 tests without crashing.

Your input does concern me though. Is it really so bad? I think my default "auto" voltage was even lower. So if this voltage is far from stable, please provide me with more suitable settings. Or point me to the best resource by which I can make my own determination.

I suppose I've been lucky that in my PC building experience I have never run across a system that wasn't stable by default, so this is my first time having to make manual adjustments. That is the reason I finally decided to ask the experts at this site, after all. Although perhaps I gave too much information and scared away everyone in the world of "TL;DR"

TL;DR
I don't know what an offset is. I don't know for sure what the vcore really is. I didn't know 1.13v is low. What is best? And I don't know what LLC is, but I will go into my BIOS and turn it off if that will result in a healthier system. And I would be willing to learn all about it, if I knew where the best place to start was. Perhaps I should start a new thread specifically stating my motherboard and requesting settings for input.
 

Jonathah

It is very surprising that no one here has even made the attempt to be helpful, but I do know this site comes up frequently in search results. That is what gave me the hope that it was a good resource to ask the hard technical questions. Even unanswered threads such as this one pollute the internet for all time, making it that much harder to search for relevant websites.

So for those who come upon this by searching for a problem similar to mine, here is the best I was able to do in correction of these problems.

For those who have blue screens (BSOD) in Windows 8 when resuming from standby and at seemingly random other times, with the error message occasionally being "DPC_WATCHDOG_VIOLATION" and/or "IRQL_NOT_LESS_OR_EQUAL", and who also have an SSD such as the SanDisk Extreme Pro 480GB, even with up-to-date firmware confirmed through SanDisk SSD Dashboard: the solution that worked for me was simply to downgrade to Windows 7. I did a clean format of my SSD, wiping Windows 8 and replacing it with Windows 7. I have not had a single crash yet, no blue screens whatsoever, after 5 days of regular use.

For those who had instantaneous blackouts when running the Prime95 small FFTs stress test with an Intel Core i7 4790K and a Gigabyte Z97X-UD5H motherboard: in my case the problem was the BIOS default "Auto" settings for voltages. The credit for this solution goes to the person I quoted and linked above. I cannot confirm the full safety of these settings, and one member criticized them as absurd but then offered no justification for that comment, leaving the issue more of a mystery. I can say that they solved my problem, and now my system runs stably even under the Prime95 tests. Here are the custom BIOS settings that worked for me:
CPU CLOCK ratio: 44
Vcore manually to 1.13
CPU VRIN external override to 2.00
CPU VRIN loadline calibration: extreme
CPU VRIN current protection: extreme
PWM Phase Control: eXm Perf
and everything else on auto.

Hope this helps anyone who comes up against the issues I had.