At wit's end. New build, 7 months of BSODs

wrightfellow

Honorable
Aug 17, 2012
18
0
10,510
[UPDATE: issue traced back to a faulty RAM stick. The bad RAM wasn't detected through memtest *until I tested each stick individually*. Lesson: test your RAM one stick at a time.]

I'm a noob who's learning the hard way. First build, chronic troubles, not sure what to do anymore.

Here are my specs:

cpu: i5-3570k
mobo: GA-Z77X-UD5H
gpu: Gigabyte nvidia 660 ti
ssd: Samsung 830 (windows and a few utilities)
hdd: western digital 1tb (data drive)
ram: corsair vengeance 16gb
psu: corsair 650w 80 plus
keyboard: logitech G110
mouse: logitech performance mx

Over the course of seven months, I have continued to get bsods. Performance is great on this rig until the bsods come. And they come randomly. I've tried a lot of fixes, updated to latest BIOS, tweaked settings, taken it to a shop, (fyi: the shop was unable to replicate the errors so unable to fix) and anyway ...

Here's my latest history, over the course of the last five days. My notes aren't very complete but hopefully this explains things well:

The bsods almost always happen while the system is idle; the system always seems to be cool, too. They occasionally happen while gaming, but there's no pattern. I would try typical fixes--drivers, etc.--with no resolution. Then the frequency of bsods began to increase a few weeks ago, to the point where I scrubbed my ssd, reinstalled windows, and tried to start anew.

For a couple of days, things were fine. Then the first bsod came shortly after a windows update. I did a system restore. Things seemed okay. Later, a notification for a corrupt DXSETUP.exe file. Ran chkdsk, got a "repairing USNJournal $J data". Things seemed okay again.

From this point, I undertook every fix/optimization I could think of. Hard drives are on intel ports, drivers fully updated, ssd is benchmarked with great performance, and then the second bsod. 0x0000000a. Caused by ntoskrnl.exe according to blue screen viewer. This one came shortly after unplugging a game controller. Not sure if that's connected or not.

Anyway, updated drivers for the usb controller and left PC on all night.

Next morning, turned on monitor and saw third BSOD. Stop Code 50. Driver nvlddmkm.sys indicated. At that point, I fully removed my nvidia driver, ran Driver Sweeper to clean out the leftovers, reinstalled the most recent drivers, changed some PhysX settings in the Nvidia control panel, and thought it was solved.

Later, I decided to address all drivers. I ran Slim Driver and updated 20 drivers on the machine, ranging from intel chipsets to sound drivers, etc.

Then a fourth bsod. This time 0x0000001a. blbdrive.sys indicated. I double check all my drivers to make sure they're okay. Unsure what else to do, I leave it alone.

Next morning, fifth bsod. Again, 0x0000001a. This time fltmgr.sys indicated. On a whim, I removed two sticks of ram, taking me down to 8 gb. I run memtest through 3 passes to check for errors (a few months ago, I tested all 16 gb through 7 passes). No errors. **memtest has never detected an error with my RAM.** I then decide to reset CMOS. Why not, right? I use the reset CMOS button on the mobo. Later I start up the pc, select default settings in BIOS, start to load windows, and boom: get a 0x0000007e before windows can fully boot. Now I'm stuck in this loop with 0x0000007e bsods and can't boot into windows at all.

Long history aside, here's what I have:

All drivers updated
Latest bios from Gigabyte (F14)
Attempted resetting CMOS
Hard Drives connected to intel ports
No overclocking, trying to run perfectly on defaults
No apparent temperature issues
I've used two different anti-virus packages--avast for six months and Norton for this past week.
Persistent bsods over many months, possibly 18 in six months before scrubbing ssd and reinstalling windows and 7 bsods after a clean install of windows.
Variable bsods with many different drivers/codes indicated.
Hardly any pattern whatsoever. Usually while the pc is idling or doing low-grade tasks.

And again, this history I've given is actually a repeated cycle of the past six months or so. I've tried every software adjustment I can think of after trawling nearly a hundred forum threads.

My question is ... what would you do?

Should I chalk it up to some hardware issue?
Take it to a tech?
Give up on this godforsaken thing altogether?

Please note I'm a total noob but I've done everything I can think of from reading forums, etc. It's beyond frustrating now.
 

animal

Distinguished
You might try running Malwarebytes; there could be a malicious program corrupting files. You can get the free version. The only drawback is that you have to start each scan yourself, since the free version doesn't allow scheduled scans. Try it and see if it helps/resolves the BSOD issue(s). If it does, watch sites like Newegg or TigerDirect; the PRO version goes on sale quite often.
 

USAFRet

Titan
Moderator
Two main systems, hardware and software.
Since you have done a full reinstall of Windows and it still does it, that would 'probably' rule out the software. Unless there is a lingering virus/trojan in your install disk.

So let's move on to the hardware.
Take everything apart, and start at the beginning.

Minimal system, 1 stick of ram at a time. Test at each change.
 
Are you running Windows security essentials?

The memory may have been the problem with the 0x0000001a

Unfortunately 0x0000007e (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED) is a generic stop code that doesn't point to one cause. It can be a bad drive, memory, a driver, etc.

There are some people on this forum who can read the minidump files.
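
For reference, here is a rough lookup of the stop codes that come up in this thread, as a minimal Python sketch. The symbolic names are the standard Windows bug check names; the `STOP_CODES` table and the `stop_code_name` helper are made up for illustration and only cover the codes mentioned here.

```python
# Bug check (stop) codes mentioned in this thread and their symbolic names.
# This table is illustrative and deliberately incomplete.
STOP_CODES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1A: "MEMORY_MANAGEMENT",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def stop_code_name(code):
    """Return the symbolic name for a stop code.

    Accepts an int (0x1A) or a hex string such as '0x0000001a' or '7E'.
    """
    if isinstance(code, str):
        code = int(code, 16)  # base 16 also accepts an optional '0x' prefix
    return STOP_CODES.get(code, "UNKNOWN (not in this table)")


if __name__ == "__main__":
    for raw in ("0x0000000a", "0x0000001a", "50", "0x0000007e"):
        print(raw, "->", stop_code_name(raw))
```

The name only says what kind of fault the kernel hit, not which part is to blame, so it is a starting point rather than a diagnosis.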
 

wrightfellow

Not running Windows security essentials. I've been using Norton this latest go-round.

And to USAF, I'll try removing, testing tomorrow. Leaving for a party at the moment.

Any further advice, steps, etc., is very much welcome. I can implement steps tomorrow morning. Thanks to any and all.
 

g-unit1111

Titan
Moderator
One thing I'd suggest doing is updating the firmware on your SSD. I ran into similar issues on my old build and got BSODs left and right. I ran the firmware update and that stabilized it until I was able to dump the SSD and replace it with a better one.
 

wrightfellow

Sorry, should have mentioned before that I have the most current firmware on the ssd. Have had it current for several months now. Thanks for the reply.
 

wrightfellow

Update: this morning, I finally get through the 7e reboot bsods and a couple scary things happen (scary for me, anyway).

First, after bios screen, I get a "Bootmgr is corrupt. Cannot boot". Then I restart and enter startup repair. In startup repair, halfway through the process, I get a popup that says "Instruction at 0x773624e referenced memory at 0xfffffff. The memory could not be read."

Restart, do another startup repair, get another error, then somehow end up at system repair options and try the Windows Memory Diagnostic tool. Tests are run, come back clear, and then another startup. This time, I get to my desktop screen but it's all effed up. DVD drive isn't detected, no internet, no system restore options in control panel, antivirus is down, Event Viewer shows no files ... Completely effed. How on earth has this happened?

One thing seems clear: the latest steps for troubleshooting are completely impossible. So many things now seem wrong. Looks like I need to reinstall windows again.

Meanwhile, I'm really starting to think this is an issue that begins with my graphics card. Here are other people with this card who had similar kernel power bsods:

http://www.tomshardware.com/forum/390715-33-gigabyte-n66twf2-display-driver-mode-kernel-error
http://www.tomshardware.com/forum/389532-33-bsod
https://forums.geforce.com/default/topic/532554/msi-gtx-660-ti-pe-kernel-mode-bsod-39-s-rma-39-s-black-screens-/
http://www.tomshardware.com/forum/370896-33-gigabyte-660ti-random-reboots

My next step then, I suppose, is to scrub ssd again, reinstall windows, while trying to get an rma for the graphics card?

I just don't know if I can stomach reliving the past week yet again. The past week has been incredible. From a clean install to a progressively degraded system that now is just screwed. Six months of this and then, this week, it accelerated.

My only other option is to take it to a tech pro and let them deal with things from here. Likely to be expensive but I'm at the point I'd rather spend money than time/stress.

Anyone's thoughts and suggestions are very welcome and much appreciated.
 

g-unit1111

There are two things I would suggest doing before taking your system to a tech:

1. Take your SSD out and try installing Windows on your mass storage drive. Chances are good that you might have got a bad SSD.

2. If you are overclocking your system - return your motherboard's BIOS to default settings. If you do that and Windows still gives you BSODs, then I would suggest taking your system to a tech.
 

wrightfellow

Thanks for the input. Before I begin, should I also remove the graphics card, too, or just keep it in and do this with only the ssd removed? Also, how difficult will it be to extract windows from the mass storage device? Let's say I put it on, things go well, we establish the ssd as the problem ... will I risk data loss by later extracting windows from the mass device and putting on the new ssd? I think I know the answer but I'm still such a noob with this stuff.
 

USAFRet

It doesn't work like that. You can extract your data, but not extract the Windows installation.
You might be able to clone it, but going between an HDD and an SSD is problematic.
 

g-unit1111

No, just remove the SSD and leave the graphics card. It won't be difficult at all to remove the Windows install on the mass storage drive once you install the replacement SSD. Windows will simply rename the folder "Windows.old" and you just delete that.
 

wrightfellow

UPDATE: I decided to just get another ssd, do a clean install, and see how things go. Sadly, after doing the installation, updating drivers, and leaving the PC on all night, I awoke this morning to find another Blue Screen. This time it's MEMORY_MANAGEMENT, and the driver indicated is nvlddmkm.sys.

I had the same blue screen (memory_management, code 1a) three times during last week's reinstall. Once caused by nvlddmkm.sys, other times by blbdrive.sys and fltmgr.sys.

Video card? RAM slot? Mobo? Again, different ssd, third time with a clean windows installation, bsod happened within approx seven hours of initial install.

Also, Blue Screen Viewer indicates that the crash occurred the instant the pc woke this morning. Does that mean anything? And I don't think it's power settings because I've had the bsod while the pc idles and/or sleeps.



 

g-unit1111

Wow, that's bizarre. If you're getting memory management errors, maybe try running memtest. If that doesn't turn anything up, you're going to have to pull out each of the memory modules in your system and test them one by one.
 
wrightfellow,

I'm not a diagnostician by any means, but by googling "BSOD GA-Z77X-UD5H", I've found quite a few threads discussing similarly frustrating and serious problems. One older thread in particular caught my attention, as the OP has a system that sounds very similar to yours and that thread seems comprehensive in suggestions and experiments >

http://atforums.mobi/msg.php?threadid=2244025&catid=5&rnum=56

> two interesting mentions concern the status of "Virtu" and a couple mentioned enabling the integrated graphics. The latter is an odd one and the poster didn't himself understand why enabling IG solved his problems. There are in all the threads repeated mention of possible memory related problems.

Also, from Gigabyte's site, here are the Corsair memory sets that are listed as compatible with the GA-Z77X-UD5H >

http://www2.corsair.com/configurator/product_results.aspx?id=4577420

> just in case your Corsair set is not on that list, that might be a consideration. One poster mentioned that dialing his memory back to 1333 solved his troubles, others talked about altering the voltage.

Sorry, I can't offer more specific suggestions, but I can offer my sympathy and hopes you can work it out. This kind of intermittent, unexplained problem is the worst form of digital torture. It reminds me of months of Hell with Windows 95- repeated BSOD's due to "SDRS" -Sudden Disappearing Registry Syndrome.


Cheers,

BambiBoom
 

wrightfellow

Dear good people ...

I might have found a solution. I'm going to give it a week to make sure and then, hopefully, return to list this issue as SOLVED.

It appears that the whole thing is due to nvidia drivers. I don't know why or how but any driver that is later than 306.23 seems to cause BSODs for me. I haven't tested them all but I've tested 311, 314, 305 ... and after searching other people's threads, it seems that 306.23 was stable.

So I downloaded it three days ago and things have been great. No blue screens.

I'm still skeptical, though. It seems so strange to see all these crazy errors just from a stupid video driver. But the 660ti has a bad history already with these things. And I still want to RMA my original ssd to see if it had any errors. But for now, things are good. I'll update again with final assessment (hopefully) in the next few days. Wish me luck, bros. And thanks so much for everyone's help!

**Also, bambiboom, thanks for the link. That's going to be a great resource in the future**
 

g-unit1111

Interesting, never would have guessed that in a million years. :lol:
 

wrightfellow

Well, gentlemen, as I feared, I spoke too soon.

Powered up the PC this morning and immediately got a BSOD. DRIVER_IRQL_NOT_LESS_OR_EQUAL.
Restarted, got a second BSOD. CODE 3B.
Restarted, began Startup Repair, got a BSOD. CODE 7F.
Restarted again, began Startup Repair again, got a BSOD. CODE 7E.
Then I powered off, booted back in Safe Mode, and pulled the reports from Blue Screen viewer.

It said my first BSOD was caused by ntoskrnl.exe.
My second BSOD was caused by nvlddmkm.sys.
The other BSODs weren't recorded.

All of this happened after 5 whole days of a perfectly stable system. 5 whole days of blissful bliss. It's the same sort of BSOD loop that occurred last week, prior to my reinstalling Windows for the 3rd time. What on earth could be causing this behavior?

Any thoughts? The nvlddmkm.sys driver was the source of my first BSOD five days ago (on my latest Windows install). Then, after rolling back the video driver from 314 to 306, I had stability until this morning. I've tested the video ram and it passed clear but I still lean towards this being a video card issue. But would a video card cause this sort of restarting BSOD behavior?

Let me reiterate what I've done so far ... many of these actions have been performed multiple times over the course of 7 months and 3 different windows installs ...


  • Done a clean install of Windows three times.
  • With every system reinstall, I've updated all drivers.
  • Updated to the latest BIOS.
  • Paid special attention to video drivers and used Driver Sweeper to clean out files before installing any new drivers.
  • Researched video drivers and installed what appears to be the most stable version: 306.
  • Tested my RAM through 10 passes using memtest.
  • Have tried two different antivirus programs.
  • Have replaced the SSD during my 3rd installation of windows (which was done this past Friday).
  • Have swapped power connectors, disabled competing sound drivers, tweaked power settings, PhysX settings, cleared CMOS.

Probably done something else, too, and just don't remember it. But one thing is clear: it's not the SSD, I can't imagine it being software anymore, either. Do note that my video card runs perfectly fine, V-RAM checks out ... and a tech has tested my power supply before.

Maybe it's voltages or such things, as indicated by the resource provided by bambiboom, but maybe it's just hardware?

Any thoughts, help, advice, or consoling would be much appreciated. I'm really starting to hate my computer. Starting to wish I'd stuck to being a console gamer.

 

wrightfellow

Okay, sorry for the double post but I may have found something new with some extra research. It relates to RAM timings, etc.

First, my RAM is Corsair Vengeance Dual Channel 4GB sticks. Its settings, based on the Corsair website provided by bambiboom, are as follows:

1600 MHz
8-8-8-24-2T
1.5v

The motherboard, meanwhile supports the following settings:

Support for DDR3 2800(OC)/1600/1333/1066 MHz memory modules
Support for non-ECC memory modules
Support for Extreme Memory Profile (XMP) memory modules

In other forums, they solved their issues by reducing to 1333 MHz for best stability and, in one case, also adjusting RAM timings. So I looked at a photo I took of my last memtest. It says my RAM is at "DDR3-1334 mhz" and has a "CAS: 9-9-9-24", which I assume is timing. Could this be my issue? And if so, do I just adjust in BIOS?
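
To put the two sets of numbers side by side: absolute CAS latency is the CAS cycle count divided by the memory clock, and the memory clock is half the DDR data rate. A small Python sketch (the function name `cas_latency_ns` is just illustrative, and it treats the "1600" and "1334" figures as data rates in MT/s) compares the rated 1600 CL8 against the 1334 CL9 that memtest reported:

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """Absolute CAS latency in nanoseconds.

    DDR moves two transfers per clock, so clock (MHz) = data rate / 2 and
    one clock period (ns) = 1000 / clock = 2000 / data rate.
    """
    return cas_cycles * 2000 / data_rate_mts


if __name__ == "__main__":
    rated = cas_latency_ns(1600, 8)     # Corsair's listed 1600 MHz, 8-8-8-24
    reported = cas_latency_ns(1334, 9)  # what the memtest photo showed
    print(f"rated:    {rated:.1f} ns")     # prints "rated:    10.0 ns"
    print(f"reported: {reported:.1f} ns")  # prints "reported: 13.5 ns"
```

So the sticks appear to be running slower and looser than their rating. That would normally be the conservative side rather than an obvious cause of crashes, but treat that reading as an assumption and not a diagnosis.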
 

g-unit1111

Well, before you do that, there's only one real proper way to test your memory to determine if there's a defective module or not. And that is to take all of the sticks of RAM out, insert them into slot [1] one by one, and start up your PC and test with each stick alone in the slot. It's not fun but it's the only way it can be done. Once you figure out which one is the defective module, contact Corsair to get replacement RAM.
 
Solution

wrightfellow

g-unit, I can't thank you enough for the persistent advice to check the RAM with the correct method. After 10 passes through memtest, 7 with all four sticks and 3 with just two, I finally took your advice and tested one stick at a time.

My first stick? No worries. The second stick? It blew up with 3,000 errors in less than thirty seconds. After countless months and countless other fixes, this finally has solved the problem. I'm using a friend's RAM and have had zero issues over the past week.

SO TO EVERYONE ELSE WHO MAY READ THIS: Take g-unit's advice and only test your RAM one stick at a time. Memtest won't necessarily detect the error otherwise.

Thanks again to everyone. I'm embarrassed that the issue was so simple to fix and yet took so long to find. This is a lesson in practicing good trouble-shooting methods.