Constant BSOD (Probably GPU Driver Related)

Aaron Solomon

Honorable
Jan 13, 2014
11
0
10,510
Dumps: https://app.box.com/s/dzlawpgzxafzgydn07rh
System Information: https://app.box.com/s/tf6pyrqlbzrmwbzhbcua

When doing anything, even idling, it will BSOD with whatever error. I have deleted the older dumps (which is why they only go back to June), but the crashes have been going on since probably 2011. There were just getting to be too many dumps. It will even BSOD every 5 minutes sometimes.

I have run memtest with 0 errors.
I have done check disk with 0 errors.
I have run driver verifier w/ 0 issues (ran it for a month and it didn't crash at all, turned it off and it crashed a week later).

I am running Malwarebytes and SUPERAntiSpyware.


For GPU, I have completely cleaned out the driver, did a thorough registry clean, went into safe mode, did another registry clean, and then completely reinstalled it. It still crashed.

I have completely dusted out the inside and outside of my computer. No changes, keeps crashing.

I used overdrive to lower videocard to bare minimum and maxed out the fan speed. I also tried playing games on bare minimum settings. Still crashed.

I tried disabling all hardware acceleration. No change, still crashed.

The crashes are completely random, I have no idea :|.

I have had my computer both in overclocked mode (tuned RAM, overclocked CPU) and in regular mode. In either mode, it crashes at the same random frequency.

Please help, I've had this problem for almost 3 years now :(. I've never posted on it and tried to resolve it on my own, but everything I've tried has failed.

I think that it's just AMD's crappy super buggy drivers, and if that's the case, then there is nothing I can do about it.

Thank you for taking the time to read this :).
 
IRQ error

Interesting. I would actually think it's your motherboard.

There should be an option in the BIOS to reassign all IRQs.

If not, remove any attached USB devices.

Edit: Changed my mind.

It's your RAM that's bad. You can do a RAM test, but if you are overclocking, I would undo the overclock.

If you recently bought new RAM, go in the BIOS and make sure it's set to the correct profile settings.

Double edit; missed the part about overclocking

What RAM are you using? My guess here is that the timings or frequency are wrong, or that you fried the RAM or memory controller when overclocking.
 

Aaron Solomon



Right now everything's on default.

The timings table looks strange =o (brand, type, etc included, all 4 modules are the same type)

https://app.box.com/s/xu37sjfgviwzeh81h5on
 

Aaron Solomon



That's my timings table, so it's currently using that profile :\.

Whenever my comp BSODs, the motherboard resets all hardware to defaults, so nothing is customized atm.

edit
also, I don't think that either the RAM or memory controller are fried as memtest comes up with 0 errors :\. I've run it many times, including on all sticks and individual sticks =(.
 

Aaron Solomon

I haven't changed out OS or Hardware since I first built the machine.

It's Windows 7 ultimate. I posted up my dxdiag ;).

I always keep all drivers and the OS up to date. I also regularly clean the registry. My computer (besides the BSODs) still runs as well as it did when I first made it :\.

edit
I see that ASUS has been releasing one BIOS after another, including some quick fixes. I'll try an update.
 
How old as in, how long have you been using Windows since your last format?

If you haven't changed anything, I say hardware issue. Easiest thing to check is the Hard drive.

But it's all guess work without trial and error with other components.

Run sfc /scannow in CMD and see if that finds any file structure errors.
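For anyone repeating the sfc step: the on-screen summary is short, and the details go to the CBS log, where System File Checker entries are tagged `[SR]`. The following is only a rough sketch for pulling those entries out. The helper names `sr_lines` and `unrepaired` are made up for illustration, and matching on the phrase "cannot repair" is an assumption about how unrepaired files are worded in the log.

```python
from pathlib import Path

# Default location of the servicing log on Windows Vista/7 and later.
CBS_LOG = Path(r"C:\Windows\Logs\CBS\CBS.log")


def sr_lines(lines):
    """Keep only the System File Checker entries, which carry the [SR] tag."""
    return [line.rstrip("\n") for line in lines if "[SR]" in line]


def unrepaired(lines):
    """Return [SR] entries that mention a file sfc could not repair.

    Assumption: such entries contain the phrase "cannot repair".
    """
    return [line for line in sr_lines(lines) if "cannot repair" in line.lower()]


if __name__ == "__main__" and CBS_LOG.exists():
    # Reading this log may require an elevated (administrator) prompt.
    with CBS_LOG.open(encoding="utf-8", errors="replace") as fh:
        entries = sr_lines(fh)
    print(len(entries), "[SR] entries,", len(unrepaired(entries)), "unrepaired")
```

An empty result from `unrepaired` would match a "0 integrity violations" summary. It says nothing about hardware either way.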

If the BIOS is resetting, that would mean there's a read error with the CPU though... I still think it's the memory controller, which is embedded on the CPU for i7s.
 

Aaron Solomon

-If you haven't changed anything, I say hardware issue. Easiest thing to check is the Hard drive.

have run check disk on it in the past with 0 errors

-How old as in, how long have you been using Windows since your last format?
not sure when I built the machine, but I've never formatted except for when I first installed Windows. Maybe 2010?

-Run sfc /scannow in CMD and see if that finds any file structure errors.
ran, 0 integrity violations

The BIOS is resetting because that's what the motherboard does. If it runs into a BSOD, it loads up defaults for all hardware. I have a P9X79 =).
 

Aaron Solomon

Well, given that the RAM throws no errors, i hope that the BIOS update fixes it :(. My BIOS was 1 year old and more than 2000 versions behind :eek:. Most of them have had to do with system stability =).

I'll try checking the CPU for failures.

edit
Actually, it'd fail the POST check if CPU had errors, so I don't think that's the case. I suppose I could try prime95 or something :).

edit
I've run prime95 before with 0 errors... so CPU isn't the problem, lol. Did several insane tests.

Guess that leaves the mobo, unless RAM can fail in other ways that can't be caught by memtest? :\

edit
got a new dump for you. This is the first time I've ever gotten this error.

Happened while testing the CPU like 8 times :\. Don't know if it was random or not. It passed each test. I just benched the CPU for awhile with no crashes or errors and spammed test, so I think that it just happened by coincidence.

https://app.box.com/s/l7jv51y8w75ct5eassdc
 

Aaron Solomon

Well, I'm hoping it was the BIOS. I'll have to wait a month or so to see. The BIOS had a lot of instability issues apparently. The last few updates I saw of it all had to do with system instability and USB incompatibility.

Even now, Asus is releasing a new Bios like every 4-6 days, so the problem might still be there :\.

There is no way to know really I guess :|.

Normally, memtest would throw errors with all of the RAM together if it was the memory controller. I had no errors and I tested it multiple times to be certain ;(. prime95 may also crash the computer.

Anyone know of a good way to test the memory controller?

I googled bad memory controllers, and in every case, people have had infinite boot loops or couldn't boot at all (RAM error beeps). If people could get into their computer, it was the BIOS every time (almost every time).

Also, in most of these cases, it was an asus motherboard similar to mine (and they've been releasing non-stop new BIOS versions for the past year). The cases go from 2009 to now. I'm hoping against hope that it's the BIOS.
 
It's not the BIOS. After POST, the BIOS has done its job. Any BSOD is a dump from Windows, usually because it can't read the memory or the CPU is giving bad data. Troubleshoot the hardware. If it POSTs, the BIOS is fine.

The memory controller is on the CPU. You may need to run the test for a few hours. It could be your PSU dying, though.
 

Aaron Solomon

Ok, I decided to run memtest again (after a long time).

At blockmove, the entire computer crashed (screen went black). The fan maxed out etc, then the computer shut down.

This crash occurs on memtest 5 when all 12 threads are running in parallel. With 1 core at a time, there are no errors or crashes.

edit
retesting with a different version of memtest since I'm getting different results based on the version :\

edit
nvm, my comp just isn't compatible with v5 >.<.

ehm, also, the memory controller primarily gets damaged when voltage settings change, no? My RAM voltage has always been at 1.5v. The only thing that the tuning did was lower it from 1600 to 1333 (don't know why), which was still compatible with the memory controller.
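To put rough numbers on the 1600 to 1333 change, here is a small sketch of the usual DDR arithmetic. The CAS latency of 9 is a hypothetical value chosen only for illustration, since the actual timings are in the linked table and not in the post, and the function names are made up for this example.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """First-word latency in nanoseconds.

    DDR transfers twice per clock, so the clock period in ns is
    2000 / data rate (in MT/s); latency is that period times CAS cycles.
    """
    return cas_cycles * 2000.0 / data_rate_mts


def peak_bandwidth_mb_s(data_rate_mts, bus_bytes=8):
    """Theoretical peak for one 64-bit (8-byte) channel, in MB/s."""
    return data_rate_mts * bus_bytes


if __name__ == "__main__":
    # CL9 is a hypothetical timing, not taken from the thread.
    for rate, cl in [(1600, 9), (1333, 9)]:
        print(
            f"DDR3-{rate} CL{cl}: "
            f"{cas_latency_ns(rate, cl):.2f} ns, "
            f"{peak_bandwidth_mb_s(rate)} MB/s per channel"
        )
```

At the same CAS setting, the lower speed means slightly higher latency and lower peak bandwidth. That is a performance difference, not a sign of damage.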

The BIOS was causing BSODs all over the place (system instability etc) according to ASUS, which is why I'm hoping it was the BIOS. I haven't had a crash or any problems since updating the BIOS (besides the memtest crashes on v5), but it hasn't been that long yet :\.

edit
was looking through BIOS and it appears that most everything was on auto. Explicitly set profile to XMP. Some other things were on auto too, like voltage overload on CPU ... =|. I'm hoping that the mobo didn't kill the CPU.

When I did the overclock thing, it was actually the Overclock Tuner on the mobo, which was 100% automatic. Have no idea what it actually did. All of those settings were lost with the first BSOD though.
 
The memory controller is on the CPU, so if you are adjusting CPU voltage, you could damage the memory controller.

If you changed the RAM voltage, you may have damaged just the RAM but it's unlikely if they are properly cooled.

What heatsink do you have on that i7?

Try the BIOS flash, just be careful. If you BSOD during the flash, that's bad news bears! I would do it from a bootable USB with DOS.
 

Aaron Solomon

I tried a BIOS flash already ;), and no crashes since then like I said, but it hasn't been that long.

For the heatsink, it's some huge brick of a dual fan thing that was really expensive : |. It was the best one I could buy at the time >.<. It keeps the CPU at a constant temp no matter what I do, it's amazing o-o.
 

Kari

Splendid
couple of questions

1 What's your psu?
2 What memtest are you using to test the ram? Sounds like it is something other than Memtest86+ http://www.memtest.org/


edit and imo continuous usage of registry cleaners will eventually mess things up :p
 

Aaron Solomon

PSU: 1000w 80+ gold Silent Pro from Cooler Master

http://www.coolermaster.com/powersupply/silent-pro-gold/silent-pro-gold-1000w/

memtest: I used memtest+ v5 and v4. I also used memtest v4.
memtest+ v5 completely crashed my computer on the move block test when all CPUs were going parallel. When single CPUs were used (did sequence), no problems. 0 errors ofc.
memtest v4 did not crash when all CPUs were going parallel for the move block test. I could not start memtest v5, it wouldn't let me ;).
memtest+ v4 doesn't support parallel, so no crash. Got 0 errors too.
prime95 also gave 0 errors.
 
Try: http://sourceforge.net/projects/cudagpumemtest/

PSU has a good review; http://www.anandtech.com/show/3856/cooler-master-silent-pro-m1000-1000w and it outputs to spec.


Open the case, remove dust from the motherboard, verify that everything is properly in place. Make sure the fans are all spinning, make sure there are no screws shorting the motherboard to the tower.

Make sure the RAM sticks are in.

On another forum, someone recommended the "wiggle test" where you wiggle a stick of RAM during Memtest to see if it puts out errors.


Are you on a clean install of Windows, a pirated version, or an HD from another system?
I recommend going into add-remove and making sure any programs that are not mandatory are removed.

I'd also boot into Safemode, and verify you don't have two different video card or soundcard drivers installed, and disable or remove anything that could cause a conflict. If you are using a generic CD ROM, you may even want to dig for the proper driver if it's not installed.
 

Aaron Solomon

-Open the case, remove dust from the motherboard, verify that everything is properly in place. Make sure the fans are all spinning, make sure there are no screws shorting the motherboard to the tower.

I regularly clean my computer and everything inside of it is tight. I was the one that screwed the stuff in.

-Make sure the RAM sticks are in.

Don't need to check, I know they're in. They are hell to put in and unless the motherboard cracked in two, they aren't going to come out without some serious effort. Took me 2 hours last time to get them back in.

-Are you on a clean install of Windows, a pirated version, or an HD from another system?
-I recommend going into add-remove and making sure any programs that are not mandatory are removed.

My Windows is legit. I am very careful with software as well. I have no spyware, malware, or unwanted software on my machine.

-I'd also boot into Safemode, and verify you don't have two different video card or soundcard drivers installed, and disable or remove anything that could cause a conflict.

Have already done that in the past. Remember that I cleaned up the corrupted GPU driver? : P

-If you are using a generic CD ROM, you may even want to dig for the proper driver if it's not installed.

I regularly maintain my drivers. I use SlimDrivers now, it's awesome :D

-Try: http://sourceforge.net/projects/cudagpumemtest/

I'm not going to do that. The only problem with the GPU has been the driver. The GPU itself is fine.

http://www.gigabyte.com/products/product-page.aspx?pid=3784#ov

Why do I say this? I used to get BSODs every 5-10 minutes from the GPU driver. The BSODs are identical to the ones I showed you in that zip file, which you said have nothing to do with the driver. Why do I say they do? When hardware acceleration was enabled, my computer crashed non-stop. This points directly to a GPU Driver problem. When I disabled it, it only crashed once a week. Major difference. This last Radeon driver finally lets me enable hardware acceleration without mega crashing. It's finally getting pretty stable.

In the past, I also benched the GPU extensively with furmark because it used to fail all the time. The GPU would just kick out, I'd lose display, and then it'd restart : |. I had it on the lowest settings it could go and it was still doing that. With this last driver, I have it on default settings and no problems :D. The GPU is also staying cooler o_O. I even heated up the house to make sure that it wouldn't crash in the summer from overheating or something ;D. I finally don't have to run the fans at max speed to keep the thing cool. Do not underestimate how bad the GPU driver was and has been since I first built this computer. It is only in the last 3 months that it is finally starting to not suck (note that most of my crashes stopped on 10/6, which is when I did another update of the GPU driver). I did another update of it just now too : \.

This is also why I put GPU driver in the thread title, because the crashes are identical to the crashes I was getting from the GPU driver. However, there are probably multiple culprits. It could still be that the GPU driver isn't 100% reliable though.


When I think about it, my random crashes stopped on 10/6 with the last GPU update. All of my current crashes have been when I have completely turned off the computer (power switch off, unplugged) and then plugged it back in. It would BSOD every couple of minutes for the first couple of boots and then wouldn't BSOD again until I'd unplug it again. I move it between a few rooms ;). This is probably from the BIOS USB device incompatibilities, which according to Asus, caused BSODs. I dunno tho.
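One way to check this pattern against the dumps themselves is to line up the minidump timestamps and group crashes that happen close together. This is only a sketch: it assumes dumps are in the default `C:\Windows\Minidump` folder, that file modification times roughly match crash times, and that a 30-minute gap is a sensible way to separate one burst from the next. The function names are made up for illustration.

```python
from datetime import datetime, timedelta
from pathlib import Path

# Default minidump folder; adjust if the dump location was changed.
MINIDUMP_DIR = Path(r"C:\Windows\Minidump")


def dump_times(folder):
    """Sorted modification times of the .dmp files in a folder."""
    return sorted(
        datetime.fromtimestamp(p.stat().st_mtime)
        for p in Path(folder).glob("*.dmp")
    )


def bursts(times, max_gap=timedelta(minutes=30)):
    """Group sorted crash times into bursts.

    A new burst starts whenever the gap since the previous crash
    exceeds max_gap (30 minutes is an arbitrary assumption).
    """
    groups = []
    for t in times:
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


def count_since(times, cutoff):
    """Number of crashes at or after a cutoff, e.g. a driver update date."""
    return sum(1 for t in times if t >= cutoff)


if __name__ == "__main__" and MINIDUMP_DIR.is_dir():
    # Reading this folder may require an elevated (administrator) prompt.
    for group in bursts(dump_times(MINIDUMP_DIR)):
        print(group[0].strftime("%Y-%m-%d %H:%M"), "-", len(group), "crash(es)")
```

If the bursts line up with the days the machine was unplugged and moved, that would support the cold-boot theory. Isolated single crashes spread across ordinary days would point back to something else.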

I'll test it out tomorrow (it's getting late now) to see if it crashes at all when I totally unplug it.


So anyways, thanks for your tips all, but I think that in the end it was my GPU driver and my BIOS.

I know everyone really wants it to be the RAM or the memory controller, but my system has only been getting more stable with GPU driver updates. Hopefully the BIOS update finishes it off ;).

Thanks everyone again for all your time =). I'll put an update up tomorrow on how the test goes with the unplugging.