My new homebuilt PC gives BSODs, but only after being off for a while

TheBloke

Honorable
Nov 2, 2012
11
0
10,510
I just built a PC. But every time it has been turned off for some hours, it gives a BSOD shortly after being turned on. Apart from that it runs perfectly. The BSODs are of different kinds.

All fans are running, and the CPU and MB temperatures are low.

Any suggestions on how to find out what is wrong?


Intel Core i7 4820K LGA2011
EVGA GTX770
KINGSTON 180GB SSDNow KC300
Asus MB P9X79 LGA2011 ATX
Corsair Vengeance 8 GB DDR3 kit
ATX Corsair Carbide 200R
Seagate HD3.5 SATA3 1TB ST1000DM003
Power supply 650W Energon EPS-650W CM
Intel CPU Cooler Air 2011
 

TheBloke

After reinstalling the OS to UEFI boot, everything in the OS works again.

But the BSODs are back when starting the PC cold. It has passed Memtest+ and Microsoft's memory test.

I have updated the motherboard firmware to 4608; it did not help.

The fans start right away, and CPU and motherboard temperature is low.

The BSOD normally happens within the first 1-10 minutes of starting.

"Page error in non-paged area" and "Memory Management" are the most common ones.

The memory is: Vengeance® — 8GB Dual Channel DDR3 Memory Kit (CMZ8GX3M2A1600C9).

Btw, Windows is set to restart on a BSOD, but it does not; I just see the blue screen.
 
In your BIOS, mark the SATA port for your OS drive as hot-swap enabled. If the connection fails and the port is reset, it will not reconnect and you will not get a minidump unless that is enabled. If you do get a memory dump, post it on SkyDrive with public access.
 
Edit: Also scan for malware using Malwarebytes, only because the function that failed was one that was used as an example in a book on how to write malware. I would expect Microsoft to add a check, and to bugcheck if the structure is modified, to block this type of attack.

I looked at two of the crash dumps; both were caused by a corrupt page table. A page table is a data structure that the Windows memory manager uses to map virtual memory to physical memory (basically used to move programs from the hard drive to memory (RAM) so the CPU can run them).
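To make the page table idea a bit more concrete, here is a minimal Python sketch. It is a toy, flat table and not how Windows actually lays out its multi-level page tables; the names `translate`, `PageFault` and the mapping values are made up for illustration. It only shows why a damaged entry sends the same virtual address to the wrong physical memory.

```python
PAGE_SIZE = 4096  # 4 KiB pages, a common page size on x86-64


class PageFault(Exception):
    """Raised when a virtual page has no mapping (toy stand-in for a fault)."""


def translate(page_table, virtual_address):
    """Translate a virtual address using a flat dict: virtual page -> physical frame."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page_number)
    if frame is None:
        raise PageFault(f"no mapping for virtual page {page_number:#x}")
    return frame * PAGE_SIZE + offset


# Hypothetical mapping, values chosen only for the example.
page_table = {0x10: 0x2A, 0x11: 0x2B}
address = 0x10 * PAGE_SIZE + 0x123

good = translate(page_table, address)

# Simulate corruption: flip one bit of the frame number in one entry.
page_table[0x10] ^= 0x80
bad = translate(page_table, address)

print(f"before corruption: {good:#x}")
print(f"after corruption:  {bad:#x}")
```

The virtual address never changes, but after the entry is damaged it resolves to a different physical location, which is the kind of situation that ends in a bugcheck.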

As a first stab at fixing this, update the Intel Rapid Storage Technology drivers:
http://www.intel.com/p/en_US/support/highlights/sftwr-prod/imsm
Your Intel drivers were:
iaStorF.sys Thu Aug 01 18:39:54 2013
iaStorA.sys Thu Aug 01 18:39:52 2013

Also, because of the nature of this issue, you will want to scan your OS files for corruption (they looked OK in the debug memory image, but check anyway).
Open cmd.exe as an admin and run:

sfc.exe /scannow <- will fix corrupted files if found
Give that a shot and go from there.
 

TheBloke

The driver update did not help. Just gave a BSOD 2 minutes after startup, after being turned off for 2 hours.

Not likely to be a malware problem, unless an official driver is malware. Only installed Steam and a mainstream game inside that.

Also has anti-malware installed.
 
Notes: the BIOS reports the CPU at 1.0 volt. You might want to confirm the value and maybe increase it (or the BIOS may be indicating the wrong value).
You may also want to try your memory in different banks.


edit:file 010914-10951.dmp
Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.

BaseBoardManufacturer = ASUSTeK COMPUTER INC.
BaseBoardProduct = P9X79
BiosVendor = American Megatrends Inc.
BiosVersion = 4608
BiosReleaseDate = 12/24/2013
CurrentSpeed: 3700MHz

ChannelB_Dimm1 corsair 1600Mhz 4096MB part num CMZ8GX3M2A1600C9
ChannelD_Dimm1 corsair 1600Mhz 4096MB part num CMZ8GX3M2A1600C9



Processor Voltage 8ah - 1.0V






I looked at another of your bugchecks.
Your win32k.sys driver in memory had a single bit corrupted:

2c = 00101100
ac = 10101100

This could be a bug in the electronics, driver, or CPU, or just a low-power condition at the CPU.
These types of errors are very hard to find.
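If you want to check a pair of bytes like this yourself, XOR them: any bit set in the result is a bit that differs. A small Python sketch (the helper name `differing_bits` is made up, and it does not assume which of the two values is the correct one):

```python
def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where two bytes differ."""
    diff = a ^ b
    return [bit for bit in range(8) if diff & (1 << bit)]


value_a = 0x2C  # 00101100
value_b = 0xAC  # 10101100

print(f"{value_a:08b} xor {value_b:08b} = {value_a ^ value_b:08b}")
print("differing bit positions:", differing_bits(value_a, value_b))
```

For the two values above, the XOR is 10000000, so only the top bit (bit 7) differs.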






I would go into the BIOS and disable your onboard USB controller for the USB 3.0 device
and remove its driver:
asahci64.sys
http://www.sysnative.com/drivers/driver.php?id=asahci64.sys (for the USB port, I think)

Reboot and see if you bugcheck.

It looks like internal data structures were corrupted or deleted twice.

(The only two suspect drivers were the ASMedia driver and the McAfee driver.)

Failing that, you are back to a hardware issue, like a crack or a chip leg popped off its solder pad (thermal expansion can make and break connections).
It does happen.



 

TheBloke

There is no McAfee product or driver installed on the system.

The CPU normally runs at 1.2 volts (reported by HWInfo).

asahci64.sys seems to be the SATA driver, not the USB driver. Should I still remove the USB 3.0 driver?

I was also looking around on this site, and a common reply is to increase the memory voltage. Could that be worth trying too? If so, how far above 1.5V should I increase it?

Is there any chance 650W is too little for this system?
 
Edit: I would not increase the memory voltage in this case. You got a single-bit error, and I would expect it is always in the same location. If it were low voltage, I would expect larger errors across physical memory locations, not a single bit. I would hope that it is a RAM stick defect (easy to fix by replacement, or by underclocking the RAM).

You are correct, I just got sloppy.
http://www.sysnative.com/drivers/driver.php?id=asahci64.sys is their SATA driver; look for an update for it.

In this case your copy of win32k has a single bit wrong while it is in memory. If your copy on disk is correct, then it had to be modified as it was copied to RAM or while in RAM. Windows will load the modules in different sections of memory on each boot (to make it harder to hack) but will only check certain structures and bugcheck if they are modified.

Try swapping your RAM modules with each other. If it is the RAM, you can move the bad bit location from a place where drivers are loaded (and later crash) to a place where your user programs are loaded (then they might crash, but your OS will still be running and you just restart the app). If you swap and still have the issue, the problem might be on the motherboard or CPU. (After that I would try the RAM in another bank set, to see if it is the slot connection.)

NOTE: you can confirm RAM thermal breaks. Let your system cool off, then heat the RAM chips directly with a heat gun before you boot it cold, and see if the problem goes away.

I have found a bad RAM stick this way. I pulled the heat sink off the RAM stick, put it under a stereo microscope, and saw that one of the chip legs was sitting on its pad but there was no solder on the connection. When I looked up the chip, it was an address line that was not connected, but the power feeds were. When the chip was cool, the leg would contract away from the pad and that address line would read 0. A few minutes after the chip got power it would heat up, and thermal expansion would cause the leg to expand and touch the pad.

This caused a whole physical block of memory to move to a new location, offset by the size of the block that the address line controlled. If a checked Windows structure was in that block, the mapping was no longer correct and the system might bugcheck, depending on what driver was loaded in that location.

In my case the defect was on an address line. In your case you could have one on a data line: you write a 1 bit and read back a 0 bit.
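The address-line case can be played out in a few lines of Python. This is only a toy model under my own assumptions (a 16-cell "RAM", a made-up class `FaultyRam`, address bit 3 open while cold); real DRAM addressing is more involved, but the effect of data appearing to move is the same idea.

```python
class FaultyRam:
    """Toy RAM where one address line can be stuck at 0 (open solder joint)."""

    def __init__(self, size, stuck_address_bit=None):
        self.cells = [0] * size
        self.stuck_address_bit = stuck_address_bit  # None means healthy

    def _effective(self, address):
        # With the line open, the chip always sees that address bit as 0.
        if self.stuck_address_bit is None:
            return address
        return address & ~(1 << self.stuck_address_bit)

    def write(self, address, value):
        self.cells[self._effective(address)] = value

    def read(self, address):
        return self.cells[self._effective(address)]


def find_bad_addresses(ram, size):
    """Write each address's own index, then list addresses that read back wrong."""
    for address in range(size):
        ram.write(address, address)
    return [a for a in range(size) if ram.read(a) != a]


# Cold boot: address bit 3 is open, so a write to address 12 lands in cell 4.
ram = FaultyRam(16, stuck_address_bit=3)
ram.write(12, 0xAB)

# A few minutes later the leg expands and makes contact again.
ram.stuck_address_bit = None
value_after_warmup = ram.read(12)

print(f"read back from address 12 after warm-up: {value_after_warmup:#x}")
print("bad addresses while cold:", find_bad_addresses(FaultyRam(16, 3), 16))
```

Data written while the chip was cold is simply not where the system looks for it once the chip has warmed up, so anything loaded into that block early after power-on can break a few minutes later.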

A RAM test is a good place to start. Even if it passes, you should rotate your RAM and retest
(you can have an error in memory that was not under test by the software RAM tester).