Various BSODs- I'm at a loss

sashcroft07

Prominent
Sep 1, 2017
5
0
510
Hey guys, I was hoping you might be able to help me diagnose these various BSODs I've been getting over the last few months. Excuse my noobiness, I'm not exactly a pro at this xD

My specs:
OS: Microsoft Windows 7 Ultimate 64-bit (Desktop)
Motherboard: ASUSTeK COMPUTER INC. (MAXIMUS VII HERO)
Processor: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (architecture: x64; 4001 MHz)
RAM: 2 sticks of 4.00 GB
SSD: Samsung SSD 840 EVO 250G SCSI Disk Device (232.9 GB)
Other Drive: WDC WD20EZRX-22D8PB0 SCSI Disk Device (1.8 TB)
Graphics Card: NVIDIA GeForce GTX 970 (1920x1080x32b)
PSU: Corsair RM750

Right, essentially I've been having consistent BSODs over the last few months, almost exclusively while gaming on high graphics settings. I would say 90% are 0x124, 8% are 0x101 and the remaining 2% have been 0xF4 and 0x7A. I'll list what my buddy (my go-to guy on building computers) and I have done so far to try and cut down the list of probable offenders.

Overheating issues: All temperatures well within acceptable levels.
Malware Issues: No Malware on computer.
Driver issues: Used Driver Booster to update everything I could- before this I was updating everything manually.
Benchmarking (Graphics card): Used Heaven Benchmark 4.0- everything ran great on highest settings possible, fans kicked in no problem so no overheating. Ran multiple times and no crashes.
RAM testing: Ran memtest overnight for the full number of passes- no issues with RAM. Also tried re-seating RAM multiple times.
Memory testing: No memory problems whatsoever- ran multiple times using the command prompt.
Overclocking: The graphics card is only Factory Overclocked- I don't mess around with any other overclocking manually.
BIOS: I haven't flashed the BIOS (?) to see if this would fix the problem.
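
For reference, the four stop codes mentioned above have symbolic names that are easier to search for. A small Python sketch of a lookup and tally; the helper names `describe` and `tally` and the sample history are made up for illustration and are not taken from the real dump files.

```python
from collections import Counter

# Symbolic names Windows uses for the stop codes mentioned in the post.
BUGCHECK_NAMES = {
    0x124: "WHEA_UNCORRECTABLE_ERROR",
    0x101: "CLOCK_WATCHDOG_TIMEOUT",
    0xF4: "CRITICAL_OBJECT_TERMINATION",
    0x7A: "KERNEL_DATA_INPAGE_ERROR",
}


def describe(code):
    """Format a stop code together with its symbolic name."""
    return f"0x{code:X} {BUGCHECK_NAMES.get(code, 'UNKNOWN')}"


def tally(codes):
    """Count how often each stop code appears, most frequent first."""
    return [(describe(code), count) for code, count in Counter(codes).most_common()]


# Hypothetical crash history, only to show the call (not the real dumps).
history = [0x124, 0x124, 0x101, 0x124]
for label, count in tally(history):
    print(label, count)
```

Of the four, 0x124 and 0x101 are the two that show up in the WhoCrashed output further down.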

Most of the time the crashes happen when gaming on high settings- these are essentially always 0x124 errors. When gaming on lowest settings I hardly ever get crashes. These crashes have happened during Far Cry 4, Guild Wars 2, Total War: Warhammer and Dishonored 2. Note I've never had crashes in games with lower graphics demands, for example Civ 5 or the Dark Souls series, and hardly ever when playing GW2 on lowest graphics settings.

My buddy and I are inclined to think the problem is a failing SSD, after exploring all other options and based on the following evidence. Samsung Magician and CrystalDiskInfo were once able to access the drives but now suddenly don't recognize either drive at all, even after reinstalling (and they occasionally crash while opening). HWiNFO crashes when using Summary-only mode and when loading the drives. Data transfers are slow, even over USB 3.0 ports. When I move icons around the desktop, the computer lags out and occasionally crashes or resets all shortcuts to their default locations. However, before Samsung Magician stopped reading the drives, it stated the drive was in excellent condition (something like 96% intact?).

I'm not really sure how to attach dump files here so I'll copy and paste some of the analysis conclusions from WhoCrashed.

On Sat 23/09/2017 17:08:25 your computer crashed
crash dump file: C:\Windows\MiniDump\092317-11731-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x6F980)
Bugcheck code: 0x101 (0x19, 0x0, 0xFFFFF88002F65180, 0x2)
Error: CLOCK_WATCHDOG_TIMEOUT
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem. This problem might also be caused because of overheating (thermal issue).
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.

On Sat 23/09/2017 15:54:12 your computer crashed
crash dump file: C:\Windows\MiniDump\092317-7550-01.dmp
This was probably caused by the following module: hal.dll (hal+0x12A3B)
Bugcheck code: 0x124 (0x0, 0xFFFFFA80081CE028, 0xBE000000, 0x800400)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem. This problem might also be caused because of overheating (thermal issue).
The crash took place in a standard Microsoft module. Your system configuration may be incorrect. Possibly this problem is caused by another driver on your system that cannot be identified at this time.
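
The four numbers in parentheses on the 0x124 "Bugcheck code" line carry more detail than the WhoCrashed summary shows. Below is a rough Python sketch of pulling them apart. It assumes Microsoft's documented layout for bug check 0x124 when the first parameter is 0x0 (machine check exception), where parameters 3 and 4 are the high and low 32 bits of the CPU's MCi_STATUS register, and it assumes the flag positions from Intel's machine-check architecture. The function names `parse_bugcheck` and `decode_mci_status` are made up for illustration.

```python
import re

# Matches lines such as:
# "Bugcheck code: 0x124 (0x0, 0xFFFFFA80081CE028, 0xBE000000, 0x800400)"
BUGCHECK_RE = re.compile(r"Bugcheck code:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")


def parse_bugcheck(line):
    """Return (code, [param1..param4]) from a WhoCrashed line, or None."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def decode_mci_status(high, low):
    """Split MCi_STATUS (given as two 32-bit halves) into named fields."""
    status = (high << 32) | low

    def bit(n):
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),            # VAL
        "overflow": bit(62),         # OVER
        "uncorrected": bit(61),      # UC
        "enabled": bit(60),          # EN
        "misc_valid": bit(59),       # MISCV
        "addr_valid": bit(58),       # ADDRV
        "context_corrupt": bit(57),  # PCC
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


line = "Bugcheck code: 0x124 (0x0, 0xFFFFFA80081CE028, 0xBE000000, 0x800400)"
code, params = parse_bugcheck(line)
if code == 0x124 and params[0] == 0x0:
    fields = decode_mci_status(params[2], params[3])
    for name, value in fields.items():
        print(name, value if isinstance(value, bool) else hex(value))
```

For the dump above this reports an uncorrected error with MCA error code 0x400, which appears to be the "internal timer error" entry in Intel's simple error code table. That is worth checking against the Intel manual before drawing conclusions from it.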

Before I end up replacing parts, does anybody know whether it's likely the problem is still faulty drivers/software? I'll be reinstalling Windows and wiping the damn thing just to see if it is indeed a software problem. I'm also wondering whether the whole problem could be caused by a naff wireless adapter I got from BT that disconnects constantly.

I'll try to use a file hosting site if anybody requests the dump files. Thanks guys!

Edit: Shareable link to my google drive with the .dmp files if anybody wants to take a look.

https://drive.google.com/open?id=0B_ycYprcUXDAZ3QtYy1pT1dHMjQ
 
you should provide the actual minidump files so they can be looked at with the debugger.

for a bugcheck 0x124 the memory dump will show the system uptime. if it is under 15 seconds then the motherboard logic reset your CPU, most likely because the GPU got warm and pulled too much power from the PCIe bus. you can see if underclocking the GPU prevents the problem.

if the system uptime is longer than 15 seconds, then I would look for overheating.

the memory dump will also show other common problems (duplicate copies of overclocking software running is pretty common)
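
The uptime rule of thumb above can be written down as a short Python helper. This is only a sketch that restates the heuristic from this post: the 15 second cutoff comes from the post, not from any official threshold. It also assumes the debugger prints uptime in the form `System Uptime: 0 days 0:00:08.565`, and the names `uptime_seconds` and `triage_0x124` are made up.

```python
import re

# Assumed debugger output format: "System Uptime: 0 days 0:00:08.565"
UPTIME_RE = re.compile(r"System Uptime:\s*(\d+) days (\d+):(\d+):(\d+(?:\.\d+)?)")


def uptime_seconds(debugger_text):
    """Extract the system uptime in seconds, or None if the line is missing."""
    match = UPTIME_RE.search(debugger_text)
    if match is None:
        return None
    days, hours, minutes = (int(match.group(i)) for i in (1, 2, 3))
    seconds = float(match.group(4))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def triage_0x124(seconds, cutoff=15.0):
    """Apply the uptime rule of thumb from the post to a 0x124 dump."""
    if seconds < cutoff:
        return "under cutoff: possible CPU reset by the motherboard, try underclocking the GPU"
    return "over cutoff: look for overheating"


sample = "System Uptime: 0 days 0:00:08.565"  # made-up sample, not from the real dumps
print(triage_0x124(uptime_seconds(sample)))
```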
 

sashcroft07

Hey thanks for the reply johnbl. I'm currently in the process of reinstalling Windows to rule out any software problems, but I'll upload a few minidump files as I'm pretty sure I still have access to them. Any recommendations as to a file sharing service?

Also, I have my doubts as to overheating issues. Apart from the fact that the maximum temperature the GPU has ever reached is 79 degrees while benchmarking on ultra, I've cleaned most of the dust out of the case and opened it up with a pretty beefy mains fan pointed at it. I even lowered the temperatures at which the fans kick in by switching them to the 'Turbo' setting in the BIOS.
 
I recommend not installing any software after the reinstallation, including drivers! Do, however, update Windows and let Windows install any drivers it deems necessary.

On which drive is Windows installed? If it's the SSD, do you have a spare drive to install Windows on for testing?
 

sashcroft07

Hey axe, sorry for the late reply, I've only just managed to get Windows up and running on a spare HDD. Update on the crashes: still getting them after a clean install of Windows. So it may not have been the SSD or a software issue. Updated Windows & drivers, and still got a crash, so I still think it's pointing to hardware. Though one of the games that was crashing consistently is now playable, it still crashes eventually.

Managed to find a good way to share the .dmp files- I've pasted a shareable link from my google drive below.

https://drive.google.com/open?id=0B_ycYprcUXDAZ3QtYy1pT1dHMjQ

Can anybody with some debugging experience take a look?

 

sashcroft07

Strange, I downloaded a stress tester for the CPU (Prime95) and tested it for an hour on 100%, got 0 errors for I think 13 tests run. Temperatures and fans during the tests were all fine too. Would that not rule out a bad CPU? Should I perhaps run the tests overnight considering I don't need to worry about temperatures?

If the problem is indeed the CPU, is there any way to fix it without buying a new one? Or do you recommend a new one anyway? Cheers
 
Software tests for hardware parts cannot find everything. They may find hardware errors with the CPU, but they can't find that the cause is a broken pin in the socket causing damage to the CPU which as a result requires replacement for both CPU & motherboard, as an example.

If there is any problem with the CPU, there is no way of fixing it. Replacement is required.
 
install the current bios update to get proper support for your low power cpu.
https://www.asus.com/us/Motherboards/MAXIMUS_VII_HERO/HelpDesk_Download/
looks like the current version is Version 3201 dated 2017/03/03
old bios version would apply too high of a voltage for the clock rate and will cause the cpu to overheat or mess up internal bus transfers inside the cpu.
(which looks like a cpu hardware problem, but is more likely a bios setting problem)


your cpu was released just before the bios version you have installed. Good odds that it does not have the correct bios patches for your cpu.
generally when I see this, the bios reports a max speed of 3800MHz and a current speed of 4000MHz

be sure to update the motherboard drivers if you have not done so.


BIOS Version 1104
BIOS Release Date 07/16/2014
Manufacturer ASUSTeK COMPUTER INC.
Product MAXIMUS VII HERO
Version Rev 1.xx
Processor Version Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
Processor Voltage 8ch - 1.2V
External Clock 100MHz
Max Speed 3800MHz
Current Speed 4000MHz
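
The mismatch described above (a reported max speed below the current speed) is easy to check mechanically. A small Python sketch, assuming the fields are pasted as plain text in the layout shown above; `parse_mhz` and `runs_above_reported_max` are hypothetical helper names.

```python
import re


def parse_mhz(text, label):
    """Return the MHz value on the line starting with `label`, or None."""
    pattern = rf"^{re.escape(label)}\s+(\d+)\s*MHz"
    match = re.search(pattern, text, re.MULTILINE | re.IGNORECASE)
    return int(match.group(1)) if match else None


def runs_above_reported_max(text):
    """True if the current speed exceeds the reported max, None if a field is missing."""
    max_speed = parse_mhz(text, "Max Speed")
    current_speed = parse_mhz(text, "Current Speed")
    if max_speed is None or current_speed is None:
        return None
    return current_speed > max_speed


bios_info = """External Clock 100MHz
Max Speed 3800MHz
Current Speed 4000MHz"""
print(runs_above_reported_max(bios_info))
```

A `True` result only flags the pattern described in this post. It does not prove that the BIOS is the cause of the crashes.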
 
Solution

sashcroft07

Thanks John I'll make sure to update the BIOS- ironically enough it's also what my friend recommended but I *was* hesitant to do it, what with my system being unstable.

Strangely enough, the problem may have corrected itself- after the wipe I haven't had a single blue screen and it's been a few weeks. I did change some settings in my BIOS to stabilize the CPU- I turned off the settings that change voltages depending on workload. Since then the games that were consistently causing blue screens don't seem to any more.

I'll update this thread if a problem arises though, and try your fix if the problems persist.
 
cool, the tables in the bios for voltage at different clock frequencies are what should have been updated for the low voltage cpu. I think your change just turned those settings off. These settings get tuned by the motherboard vendor in the various updates to the BIOS.