Random reboots, no BSOD

Molloch

Reputable
Sep 3, 2015
18
0
4,510
Hi everyone, this is my first time using this forum, so if I'm posting this in the wrong section, please tell me!

Alright, so here is the problem. I bought this computer in January 2014 and it worked well; I never had any problem with it. But 2 weeks ago it started crashing randomly, with no BSOD and for no apparent reason. I don't remember modifying anything important. The crashes occur randomly, although more often when I'm in a game. It doesn't just shut down either: it restarts a few seconds later every time. In these two weeks I researched as much as I could and explored as many options as I could.

Here are the specs:

- OS: Windows 7 x64
- CPU: Intel Core i5 4440
- MOBO: ASRock B85M-HDS
- RAM: 2x4GB Kingston HyperX DDR3
- GPU: NVIDIA GeForce GTX 950
- HDD: ST1000DM003-1CH162 SCSI
- PSU: Corsair VS650

So, in these two weeks, I did several things:

- cleaned the computer of its (considerable) amount of dust
- ran several antivirus and anti-malware scans, and checked the temperatures of everything with different programs (nothing infectious, and the temperatures all seemed normal)
- restored the system from an older save
- updated the drivers and the BIOS
- checked the mobo for any physical damage
- tested my RAM with memtest86: one full pass with no errors, though I'm not sure this is enough

The support team of the website I bought the computer from told me it could very well be a hardware problem, most likely a PSU or mobo malfunction. They told me to check the event log, and the infamous Kernel-Power 41 (63) error showed up every time a reboot occurred. Support then told me to run an OCCT test on my CPU, during which the computer crashed, and then another OCCT test on the PSU, during which it also crashed several seconds in.

Today I did something stupid: I went and bought a new PSU without really knowing whether it was the real problem. The GPU is also new. So I installed these two components and started the computer after checking that everything was correctly connected. Miracle: the computer did not crash for a solid 2 hours. I installed the drivers for the new GPU and enjoyed my functional computer.

Then I started a game, played for 15 minutes, and the computer crashed again: same errors, same everything. I wanted to punch the universe in the mouth, then realised I was stupid for buying this stuff without knowing better. But hey, I can still use it after the problem is fixed. I then ran an OCCT test on the PSU and the computer crashed after 2 minutes (much longer than the several seconds it lasted in the previous tests), so I knew my problems weren't resolved.

So here I am, sending you this message on the very computer I'm having problems with. It has been running for 4 hours now without any problem; I'm watching a stream on Twitch and everything runs fine, but I'm scared to launch a game or anything heavy because it IS going to crash. Can anyone tell me if I'm missing something monstrously obvious, or if I need to stop denying the evidence and buy a new mobo? Or maybe there is some other scan or software I can use to see if anything is wrong?

PS: English isn't my first language. Sorry for the long story, and thank you in advance.



 


Zerk2012

Titan
Ambassador
Turn off auto reboot: http://pcsupport.about.com/od/windows7/ht/automatic-restart-windows-7.htm
Download Prime95, MSI Kombustor, and CPUID HWMonitor, run all 3 at the same time, and report back with your max temperatures. HWMonitor will show them for both the CPU and the GPU.
Error 41 can be caused by several different things. All it tells you is that Windows restarted without shutting down cleanly; the cause can be anything from heat to software or antivirus, and everything in between.
 

Molloch

Hey, thanks for the help.

I downloaded all 3 of these programs and ran Prime95 first. While I was figuring out MSI Kombustor, it crashed again. I had checked the temperatures just before that: all cores were at 75°C, which I think is fine. The "AUXTIN" temperature stays at 50°C even at idle, as I am writing this message. Is that alarming?

By the way, when the PC crashed, it had a hard time rebooting. It shut down the first time, then rebooted, and then rebooted a few more times before I got a clean boot. Now all seems fine, as it was before I ran the test. This is exactly the same as with OCCT :(

Any other ideas? And again, thanks for taking the time to help.

EDIT: oh, and I have had automatic reboot disabled since the first crash; it was the first thing I did.

EDIT2: alright, I tried the MSI Kombustor stress test with the default settings at my native resolution. It was going very well until the very end; at about 95%, the computer crashed again. The GPU temperature was 59°C when it rebooted. It also had difficulty starting back up: it rebooted a few times unsuccessfully before starting clean again. But remember, the GPU and PSU are fresh from yesterday.
 

Molloch

So I did that: I tried one stick and ran Prime95, and it crashed after 20 seconds. Same thing with the other stick. So I thought maybe one of the slots was malfunctioning. I tried the same thing with the other slot and both sticks, with the same result: a crash after 20 seconds, more or less. And the Event Viewer just keeps pointing to this error:

- System
  - Provider
    [ Name] Microsoft-Windows-Kernel-Power
    [ Guid] {331C3B3A-2005-44C2-AC5E-77220C37D6B4}
  EventID 41
  Version 2
  Level 1
  Task 63
  Opcode 0
  Keywords 0x8000000000000002
  - TimeCreated
    [ SystemTime] 2015-09-04T15:59:56.729205700Z
  EventRecordID 138469
  Correlation
  - Execution
    [ ProcessID] 4
    [ ThreadID] 8
  Channel System
  Computer Molokh-PC
  - Security
    [ UserID] S-1-5-18

- EventData
  BugcheckCode 0
  BugcheckParameter1 0x0
  BugcheckParameter2 0x0
  BugcheckParameter3 0x0
  BugcheckParameter4 0x0
  SleepInProgress false
  PowerButtonTimestamp 0
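
For reference, the EventData fields above can be read mechanically. Below is a minimal Python sketch, assuming the fields have been copied out as plain "Name Value" lines as in this post. The helper names `parse_event_data` and `describe_kernel_power_41` are made up for illustration. As far as I know, a BugcheckCode of 0 means Windows did not record a stop error before the restart, which would fit with seeing no BSOD and no dump file. It does not say which component is at fault.

```python
# Sketch: interpret the EventData block of a Kernel-Power 41 event.
# Assumes the fields were pasted as "Name Value" lines, as in the post above.

EVENT_DATA = """BugcheckCode 0
BugcheckParameter1 0x0
BugcheckParameter2 0x0
BugcheckParameter3 0x0
BugcheckParameter4 0x0
SleepInProgress false
PowerButtonTimestamp 0
"""


def parse_event_data(text):
    """Turn 'Name Value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def describe_kernel_power_41(fields):
    """Return a short, hedged reading of the three most useful fields."""
    # int(x, 0) accepts both decimal ("0", "159") and hex ("0x9f") notation.
    bugcheck = int(fields.get("BugcheckCode", "0"), 0)
    power_button = int(fields.get("PowerButtonTimestamp", "0"), 0)
    sleeping = fields.get("SleepInProgress", "false").lower() == "true"

    if bugcheck != 0:
        return "stop error 0x%X was recorded; look for a dump file" % bugcheck
    if power_button != 0:
        return "power button press was recorded before the restart"
    if sleeping:
        return "restart happened during a sleep transition"
    return "no stop error recorded; sudden reset or power loss is likely"


if __name__ == "__main__":
    print(describe_kernel_power_41(parse_event_data(EVENT_DATA)))
```

With the values from this event, the function falls through to the last branch, so the log alone only narrows things down to "something reset the machine abruptly".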
 

Molloch

No new software was installed before the first crash, which happened on 21/08/2015. The only thing I see in the installation log before that is a Windows update from 19/08/2015.
 

Zerk2012

I would first check the hard drive: http://knowledge.seagate.com/articles/en_US/FAQ/170511en?language=en_US
The bad thing about that code is that just about anything can cause it, from a simple driver issue to your antivirus; even the Windows update could cause it.
Turn off any energy-saving settings in Windows, then try again.
From there, I would back up anything on the drive that you can't replace and do a fresh install of the OS and drivers.
 

Molloch

The hard drive was okay, and there are NO crash dumps. I already tried the WhoCrashed program, and although crash dumps are enabled on my computer, they just aren't created when it crashes. Thanks for your help, guys!

Edit: well damn, I selected a best solution without meaning to. The problem is still here :(
 
Force a memory dump using the keyboard.
http://www.dell.com/support/article/us/en/19/SLN283146/EN

Then put the memory dump on a server and post a link.

This will make sure your memory dump settings are OK, and I can look at your running system to confirm you don't have certain problems.

If you get another bugcheck but no memory dump is saved to disk, the problem is likely to be in the disk subsystem, for example SATA drivers, bad hard drive connections, or SSD firmware. You may consider putting your drive on a different SATA port. The primary SATA port (the slower one) is the most likely to work. Ports supported by external USB 3.0 chips have the most problems and often require BIOS updates and special drivers to fix. There can also be bugs in the drive that require firmware updates. Maybe run crystaldiskinfo.exe to read the firmware version and SMART error data.
 

Molloch

Hello again, and thanks for your time. I did what johnbl said, and I was able to analyze those crash dumps with WhoCrashed. Here are the results:

Crash Dump Analysis
--------------------------------------------------------------------------------

Crash dump directory: C:\Windows\Minidump

Crash dumps are enabled on your computer.

On Sun 6/09/2015 19:45:43 GMT your computer crashed
crash dump file: C:\Windows\Minidump\090615-13213-01.dmp
This was probably caused by the following module: kbdhid.sys (kbdhid+0x2DDF)
Bugcheck code: 0xE2 (0x0, 0x0, 0x0, 0x0)
Error: MANUALLY_INITIATED_CRASH
file path: C:\Windows\system32\drivers\kbdhid.sys
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: HID Keyboard Filter Driver
Bug check description: This indicates that the user deliberately initiated a crash dump from either the kernel debugger or the keyboard.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem.
The crash took place in a standard Microsoft module. Your system configuration may be incorrect. Possibly this problem is caused by another driver on your system that cannot be identified at this time.



On Sun 6/09/2015 19:45:43 GMT your computer crashed
crash dump file: C:\Windows\memory.dmp
This was probably caused by the following module: lhidfilt.sys (LHidFilt+0xE485)
Bugcheck code: 0xE2 (0x0, 0x0, 0x0, 0x0)
Error: MANUALLY_INITIATED_CRASH
file path: C:\Windows\system32\drivers\lhidfilt.sys
product: Logitech SetPoint(TM)
company: Logitech, Inc.
description: Logitech HID Filter Driver.
Bug check description: This indicates that the user deliberately initiated a crash dump from either the kernel debugger or the keyboard.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem.
A third party driver was identified as the probable root cause of this system error. It is suggested you look for an update for the following driver: lhidfilt.sys (Logitech HID Filter Driver., Logitech, Inc.).
Google query: Logitech, Inc. MANUALLY_INITIATED_CRASH


If you need me to drop those dump files somewhere, tell me, I can't do it from where I am currently but will do asap.
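
A note on reading this report: both dumps carry bugcheck 0xE2 (MANUALLY_INITIATED_CRASH), which is the code produced by the forced keyboard dump itself. The modules WhoCrashed names (kbdhid.sys, lhidfilt.sys) sit on the keyboard path that handled the key press, so they are probably not the cause of the random reboots. Here is a small Python sketch of that filtering step, assuming the report is pasted as text in the layout shown above. The function names `parse_crashes` and `real_crashes` are hypothetical, and only the three relevant lines per dump are included in the sample.

```python
import re

MANUAL_CRASH = 0xE2  # MANUALLY_INITIATED_CRASH, produced by a forced dump

# Trimmed copy of the WhoCrashed report above (three lines per dump).
REPORT = r"""crash dump file: C:\Windows\Minidump\090615-13213-01.dmp
This was probably caused by the following module: kbdhid.sys (kbdhid+0x2DDF)
Bugcheck code: 0xE2 (0x0, 0x0, 0x0, 0x0)
crash dump file: C:\Windows\memory.dmp
This was probably caused by the following module: lhidfilt.sys (LHidFilt+0xE485)
Bugcheck code: 0xE2 (0x0, 0x0, 0x0, 0x0)
"""


def parse_crashes(report):
    """Return a list of (dump_file, module, bugcheck_code) tuples."""
    pattern = re.compile(
        r"crash dump file: (?P<dump>.+)\n"
        r"This was probably caused by the following module: (?P<module>\S+).*\n"
        r"Bugcheck code: (?P<code>0x[0-9A-Fa-f]+)"
    )
    return [
        (m.group("dump").strip(), m.group("module"), int(m.group("code"), 16))
        for m in pattern.finditer(report)
    ]


def real_crashes(crashes):
    """Drop dumps that were triggered on purpose from the keyboard."""
    return [c for c in crashes if c[2] != MANUAL_CRASH]


if __name__ == "__main__":
    crashes = parse_crashes(REPORT)
    print(len(crashes), "dumps parsed,",
          len(real_crashes(crashes)), "not manually initiated")
```

For the two dumps above, nothing is left after filtering, so they are useful for checking the dump settings and the driver list, not for naming a culprit.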

 
Overall, I have no real idea. You have a new BIOS dated 2015 (new USB fixes), old USB drivers from 2013, and new USB filter drivers from 2015. You might just need to update the USB chipset drivers to match your new filter drivers and new BIOS.
The last BIOS update was a microcode update; this is generally done to fix or disable some broken feature in the CPU.

I don't see proper updates on the motherboard site.

You might change your memory dump type from mini dump to kernel dump.
https://www.sophos.com/en-us/support/knowledgebase/111474.aspx

Then start Windows, run cmd.exe as an admin, then run
sfc.exe /scannow
verifier.exe /standard /all
then reboot your system.
Verifier will run and check your device drivers for common errors, and will bugcheck the system if it finds a problem.

If you have the system set to make a kernel dump, it will store the dump in c:\windows\memory.dmp
Note: to turn off verifier you will need to run
verifier.exe /reset
or it will keep running until you execute that command, even after you reboot. So turn it off when you are done testing.
It creates more debug info, and the kernel dump will save that info along with the internal error logs that allow USB and plug-and-play problems to be debugged.
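
The order of those commands matters, and it is easy to forget the reset at the end, so here is a small Python sketch that just lists the steps. It only uses the three commands quoted above; the names `CHECK_STEPS`, `RESET_STEP` and `run_steps` are made up for illustration. By default it is a dry run that prints the commands without executing anything. Actually running them would need an elevated prompt on Windows.

```python
import subprocess

# Commands quoted from the post above; they need an elevated prompt.
CHECK_STEPS = [
    ["sfc.exe", "/scannow"],
    ["verifier.exe", "/standard", "/all"],
]
RESET_STEP = ["verifier.exe", "/reset"]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    shown = []
    for cmd in steps:
        line = " ".join(cmd)
        print(line)
        shown.append(line)
        if not dry_run:
            # check=False: do not raise on a non-zero exit code;
            # this sketch does not try to interpret the result.
            subprocess.run(cmd, check=False)
    return shown


if __name__ == "__main__":
    run_steps(CHECK_STEPS)    # enable the checks, then reboot manually
    run_steps([RESET_STEP])   # after testing: turn verifier off again
```

Keeping the reset as its own step is deliberate: it has to be run after the reboot and the test session, not straight after enabling verifier.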




-------
Looking at the other memory dump, you do have a mix of some old drivers from 2010 and some new drivers from 2015.
This driver is listed as causing bugchecks, but I think that was an older version of the driver:
Daemon Tools driver:
\SystemRoot\system32\DRIVERS\dtsoftbus01.sys Fri Feb 21 01:49:36 2014
http://www.carrona.org/drivers/driver.php?id=dtsoftbus01.sys

This is another driver that is kind of old and that I don't see very often:
LSI SAS driver. You might see if there is an update for it:
\SystemRoot\system32\drivers\lsi_sas.sys Mon May 18 17:20:23 2009
http://www.carrona.org/drivers/driver.php?id=lsi_sas.sys
-------
The first one looks like a Windows subsystem got an access violation when running
C:\Users\Molokh\AppData\Local\Temp\~nsu.tmp\Au_.exe,
you might want to find out what Au_.exe is doing and whether you want it.

-------
machine info:
BIOS Version P1.70
BIOS Release Date 07/23/2015
Manufacturer ASRock
Product B85M-HDS
Processor Version Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz
Processor Voltage 8ch - 1.2V
External Clock 100MHz
Max Speed 3800MHz
Current Speed 3100MHz
Memory:
Speed 1600MHz
Manufacturer Kingston
Part Number KHX1600C9D3/4GX


 

Molloch

Alright, it seems it was a Yahoo! toolbar that slipped in with one of the programs I downloaded in the last few days. I got rid of it, but it dates from 31/08, while the crashes began on the 23rd, so I don't think it is related. Did you see anything weird in the hardware log, by the way?

Thanks again for the help!
 
added info to the above post.



 

Molloch

So I updated Daemon Tools, but I have no idea what I'm looking for with the lsi_sas.sys thing. I think I ended up finding it on the Avago website, but clicking the download button just redirects to the home page and nothing happens. After looking for a bit, I'm not even sure why I have this thing. It seems to be related to a Host Bus Adapter, and I'm pretty sure I don't have one in the machine.

I also switched to kernel memory dumps, and sfc.exe reveals no errors. Anything else I can do?

Edit: Just found out I had another memory dump I forgot to include, here it is: http://www.filedropper.com/memory_2

Edit 2: OK, I think I found out what lsi_sas.sys is about. I used to have a cordless mouse, but I'm using a USB-connected one now. Could that be the problem? Should I uninstall that driver, and if yes, can you help me find it? I wouldn't want to uninstall the other USB 3.0 drivers.
 
I think the LSI SAS driver is from the company LSI, for Serial Attached SCSI drives.
It would be a host controller because you attach drives to it (maybe a CD player?).
https://en.wikipedia.org/wiki/Serial_attached_SCSI

My debugger is having problems reading your memory dump file. This could mean it is getting corrupted as it is written to disk
(could be a problem with the storage drivers, or the actual hard drive).

Try to update any storage drivers, and run tests on your hard drive. Maybe run crystaldiskinfo.exe to check the SMART errors reported by the drive.


 

Molloch

Well alright, I think I've got it updated now. It was part of a pack for the USB 3.0 eXtensible Host Controller driver. Unfortunately, I ran the Intel Processor Diagnostic Tool and it crashed again about 75% in, during the CPU stress part of the test, although I believe things are getting better. In the beginning I'd get absolutely random crashes when the computer was idle; now I can sometimes play a game for an hour without a crash, so that's something. Do I need to try anything else, or is it definitely my CPU slowly giving up?

Edit: I'm currently running a "SeaTools for Windows" scan on my hard disk. I'll let you know if something comes up at the end.
 
Just watch the heat during CPU stress. If the cooling is not so good, the heat will build up and the CPU should start to throttle. If this is turned off in the BIOS, it will overheat and generate errors.




 
Solution

Molloch

Okay, so apparently I have the latest driver for my hard drive, and all the tests came back as successful, so it's OK on that side.

I ran another test like you said and watched the temperatures: the computer crashed as soon as core 0 reached 61°C. I don't know if it is a coincidence or if the limit is 60°C. What do you think?

Also, should I check the BIOS for CPU throttling and activate or deactivate it? I'm not used to navigating the BIOS and I don't want to screw anything up by accident.
 
72.72°C is the max temperature that the entire CPU case should reach before shutdown.
Individual cores may get hotter than the overall case temperature. I would reset the BIOS to defaults; you want the CPU to shut down or start to throttle if it gets too hot.
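
To put the numbers from this thread side by side, here is a rough Python sketch. It assumes the readings were noted by hand from the monitoring tool, and it uses the 72.72°C case limit quoted above. The names `TCASE_LIMIT_C`, `READINGS` and `check_readings` are made up for illustration. Core temperatures and case temperature are different measurements, so this is only a rough screen and not a verdict.

```python
TCASE_LIMIT_C = 72.72  # case-temperature limit quoted in the reply above

# Core temperatures reported earlier in this thread (degrees Celsius).
READINGS = {
    "Prime95 run, all cores": 75.0,
    "latest stress test, core 0 at crash": 61.0,
}


def check_readings(readings, limit=TCASE_LIMIT_C):
    """Return {label: margin}, where margin = limit - reading.

    A negative margin means the reading was above the limit.
    Core readings can legitimately exceed the case limit, so this
    is a rough screen, not a pass/fail verdict.
    """
    return {label: round(limit - temp, 2) for label, temp in readings.items()}


if __name__ == "__main__":
    for label, margin in check_readings(READINGS).items():
        state = "above" if margin < 0 else "below"
        print("%s: %.2f C %s the limit" % (label, abs(margin), state))
```

The crash at 61°C on core 0 is well under the quoted limit, which suggests (but does not prove) that temperature alone is not what triggers the reboot.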