Seemingly random video driver crashes, system crashes, failures to boot, and fire and brimstone.

inb4tehlulz

Honorable
Jun 12, 2013
7
0
10,510
This February I built an almost all-new PC: new mobo, CPU, video card, and RAM. I used an Asus Radeon HD7750 (which comes "overclocked") and had tons of trouble with DX11. There were driver crashes that recovered on their own, and also entire program lockups that required ending the task. When I switched back to DX9 most of my problems went away, with the exception of a few random driver crashes. Using the Asus utility for the card, I underclocked it and no longer had crashes.

My first thoughts were driver issues, possibly a power supply issue (possibly underpowered), or even a problem with the Windows install (I had not used an official Windows disc to install, since I no longer had one). Long story short, I had been using a 550W power supply (with a max of 600W) from around 2005, a TTGI TT-550K04, and I had no replacement supply to test with at the time. I had updated all drivers, tried downgrading video drivers, then upgrading, then totally removing them, clearing out registry keys, and reinstalling, all to no avail. So I went ahead and found where to download the actual Windows 7 install ISO, checked the hash, installed it using my product key, and reinstalled everything while being super careful: install a driver, restart, rinse, repeat. The problem still existed. I even removed everything except for my SSD, video card, RAM, processor, keyboard, and mouse.

So where things stood was that with a fresh, seemingly good installation of Windows and all the updated drivers, I could not run DX11 without crashes, and I had to underclock to avoid random crashes during games. I just dealt with it for a bit, because I couldn't RMA the graphics card, or honestly any parts yet, since I was in the middle of classes for college and I was not about to program on a netbook for 3 weeks.

Fast forward to now. I ordered a new video card and a new power supply. I uninstalled everything related to AMD, ATI, Radeon, etc., ripped open the PC, installed everything, did crazy awesome cable management, and booted up. I installed drivers, set everything up, and everything seemed great... for about 3 hours. I had a game running, DVDFab copying a movie, and Firefox running with YouTube, then suddenly all the screens flashed to white. They flashed back, Firefox crashed, I closed Firefox, and suddenly one screen was purple and the other was white. The system just sat there hung on white and purple until I did a hard reset. Since that hard reset the system has been crashing constantly: the Nvidia kernel driver crashes, and I can't load Firefox without the entire system hard crashing to a reboot. All kinds of hell, but worse yet, now randomly while the BIOS is booting the screen shoots to purple and the system reboots or hangs.

So I went and did a few things:

A memory test. It ran for a good amount of time with zero errors, but then the screen went purple and it crashed.

Reinstalled Windows. Everything seemed to run well until I installed the Nvidia drivers. Then there were constant crashes and hangs, sometimes without even finishing loading Windows before the screen went some weird color and the system rebooted. Or the screen didn't turn on at all.

Safe mode appears to work flawlessly, but it doesn't load the Nvidia drivers. That led me to believe I could run in normal mode without the Nvidia drivers and have no problems, but that isn't the case. I uninstalled all the drivers, and the system, while more stable, still had occasional hangs. Of course, the fact that it crashed the same way in BIOS makes it hard to call this a driver error; it seems more likely that the drivers are exacerbating a hardware problem.

Removed the new video card, reinstalled the old card, removed the old drivers completely, and installed new drivers for the Radeon. The system seemed fine, so I loaded up FurMark to try to crash it, because I'm a sadist. Everything ran well for about 5 minutes. The video card got to about 60C with the fan still at only 30 percent (why such a slow ramp-up, I don't know), and I was watching the rest of the system in SpeedFan, where nothing got over 44C.

Suddenly there was a hard crash, and then the system wouldn't boot. I was forced to unplug the power, wait, and try again, but still no boot. I then unplugged the power, flipped off the power supply, unplugged it from the mobo, and pulled the CMOS battery. I put it all back together and tried again, and it kept failing to boot: the power light turns on, the fans turn on, then the power light fades away. I kept unplugging and replugging, and after about 15 minutes it turned on and started to boot. It seems to be booting and resetting far slower than normal, but other than that it's running. Now I am getting random resets.

I'm about to use the Ultimate Boot CD to run more diagnostics and tests and see if I can narrow down the problem. I'll be checking memory, hard drives, and anything else I can. I have dump files from the Nvidia card that I've attempted to read, but I don't really know enough to fully interpret them. I can provide them if needed. Below is what I believe to be the most pertinent information.

3 Dumps with this information
VIDEO_TDR_FAILURE (116)
DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT
MODULE_NAME: nvlddmkm
IMAGE_NAME: nvlddmkm.sys

and 1 dump with this

WHEA_UNCORRECTABLE_ERROR (124)
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
PROCESS_NAME: iexplore.exe
MODULE_NAME: hardware
IMAGE_NAME: hardware
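
For anyone collecting summaries like the ones above from several dumps, here is a minimal Python sketch that tallies them by bug check name and faulting module. Bug check 0x116 means the display driver failed to recover from a timeout, and 0x124 is a fatal hardware error reported through WHEA. The text layout fed to the parser is an assumption modeled on the excerpts in this post, and `DUMP_SUMMARIES` and `tally_dumps` are hypothetical names made up for the example.

```python
import re
from collections import Counter

# Summaries modeled on the excerpts quoted in the post (3 TDR dumps, 1 WHEA dump).
# The exact text layout is an assumption for this sketch.
TDR = (
    "VIDEO_TDR_FAILURE (116)\n"
    "DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT\n"
    "MODULE_NAME: nvlddmkm\n"
    "IMAGE_NAME: nvlddmkm.sys"
)
WHEA = (
    "WHEA_UNCORRECTABLE_ERROR (124)\n"
    "DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT\n"
    "PROCESS_NAME: iexplore.exe\n"
    "MODULE_NAME: hardware\n"
    "IMAGE_NAME: hardware"
)
DUMP_SUMMARIES = [TDR, TDR, TDR, WHEA]

# A bug check line looks like "NAME (hexcode)"; module lines look like "MODULE_NAME: x".
BUGCHECK_RE = re.compile(r"^([A-Z_]+) \(([0-9a-fA-F]+)\)$", re.MULTILINE)
MODULE_RE = re.compile(r"^MODULE_NAME:\s*(\S+)$", re.MULTILINE)


def tally_dumps(summaries):
    """Count dumps per (bug check name, module name) pair."""
    counts = Counter()
    for text in summaries:
        bugcheck = BUGCHECK_RE.search(text)
        module = MODULE_RE.search(text)
        if bugcheck and module:
            counts[(bugcheck.group(1), module.group(1))] += 1
    return counts


if __name__ == "__main__":
    for (name, module), count in tally_dumps(DUMP_SUMMARIES).items():
        print(f"{count} x {name} in {module}")
```

A WHEA dump that blames "hardware" rather than a driver file fits the suspicion that the drivers are only exposing an underlying hardware fault.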

The system has not made any dump files with the ATI card; apparently it hasn't bluescreened, just hard reset. Of course, I never saw a blue screen with the Nvidia card either, but Windows wrote those dumps anyway.

A list of all hardware being used is below.

OCZ Agility 3 AGT3-25SAT3-60G 2.5" 60GB SATA III MLC Internal Solid State Drive (SSD) (About 1 year old this month, never had a problem with it in the old build, assuming it's good)

About a 5 year old 500GB Western Digital 7200 RPM drive, which I disconnected during the new install of windows and tests so I'll consider it a non factor in the original or ongoing problems.

ASRock Z75 Pro3 LGA 1155 Intel Z75 HDMI SATA 6Gb/s USB 3.0 ATX Intel Motherboard

Intel Pentium G860 Sandy Bridge 3.0GHz LGA 1155 65W Dual-Core Desktop Processor Intel HD Graphics BX80623G860

G.SKILL Sniper Series 8GB (2 x 4GB) 240-Pin DDR3 SDRAM DDR3 1866 (PC3 14900) Desktop Memory Model F3-14900CL9D-8GBSR

ASUS HD7750-1GD5-V2 Radeon HD 7750 1GB 128-bit GDDR5 PCI Express 3.0 x16 HDCP Ready Video Card (Original Video Card)

EVGA GeForce GTX 660 SUPERCLOCKED 2048MB GDDR5 DVI HDMI DP Graphics Card 02G-P4-2662-KR (New Video Card)

OCZ ZT Series 750W Fully-Modular 80PLUS Bronze High Performance Power Supply compatible with Intel Sandy Bridge Core i3 i5 i7 and AMD Phenom (New Power Supply)


Any advice would be greatly appreciated. I'm about to whip out my multimeter and start checking leads at this point. In retrospect I wish I had done a stress burn-in the day I bought everything; perhaps I would have found a problem then. Never again will I skip that.

Thanks everyone.

Edit:
I meant to add that at this point I think the problem is less likely the video card and more likely the motherboard, the PSU, or both. That said, I find it highly unlikely that the PSU would be bad on arrival, and I had stability problems with the other PSU as well. The only things that have been constant are the mobo, processor, and RAM. If the RAM continues to test good, I'm thinking it's gotta be the mobo. I know the processor is almost never the culprit.
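
The elimination reasoning here can be sketched as a set intersection over the configurations that crashed. This is only an illustration in Python: the component labels, `FAILING_CONFIGS`, `ASSUMED_GOOD`, and `common_components` are all names invented for the sketch, and treating the SSD as good is the post's own assumption, not something the code proves.

```python
# Each set is one configuration that crashed, as described in the post.
# Labels are shorthand made up for this sketch.
FAILING_CONFIGS = [
    {"mobo", "cpu", "ram", "ssd", "hd7750", "old_psu"},  # original build
    {"mobo", "cpu", "ram", "ssd", "gtx660", "new_psu"},  # new card and PSU
    {"mobo", "cpu", "ram", "ssd", "hd7750", "new_psu"},  # old card, new PSU
]

# Parts the post treats as good without further testing (an assumption).
ASSUMED_GOOD = {"ssd"}


def common_components(configs):
    """Return the components present in every failing configuration."""
    return set.intersection(*configs)


if __name__ == "__main__":
    suspects = common_components(FAILING_CONFIGS) - ASSUMED_GOOD
    print(sorted(suspects))
```

Anything that was swapped out while the crashes continued drops out of the intersection, which leaves the mobo, CPU, and RAM as the remaining suspects.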

Because I ordered the new video card and PSU through Amazon, I have also gone ahead and had replacements shipped (it's free, and I have Prime), so on Friday I will at least be able to try swapping things in if I don't find anything else wrong. If the problem is the mobo, though, I'm hesitant to add a video card to it, because that might fry the card. I know it wouldn't cost me anything to replace it again, but if I start frying parts from lack of due diligence, that's not a financial burden I want someone else to have to carry.
 

inb4tehlulz

Alright, I did a memory test for about 4 hours with no errors. Then I pulled the power supply, jumped it, and checked the 24-pin, as well as what I was getting from the 12V 4-pin connectors for both the mobo and PCIe, plus the SATA and peripheral power. All values are within tolerance, both under load and not under load. Most values were the same under load; a couple differed by about 0.10V. (I can post the actual numbers if it will help.)
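
For reference, a rough Python sketch of the tolerance check described above. It assumes the usual ATX limits of +/-5% on the +3.3V, +5V, and +12V rails and +/-10% on -12V. The readings are invented placeholders, since the actual measurements were not posted, and `RAILS`, `within_tolerance`, and `out_of_spec` are hypothetical names.

```python
# Nominal voltage and allowed fractional deviation per rail
# (ATX: +/-5% on the positive rails, +/-10% on -12V).
RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def within_tolerance(rail, measured):
    """True if the measured voltage is inside the allowed band for that rail."""
    nominal, fraction = RAILS[rail]
    return abs(measured - nominal) <= abs(nominal) * fraction


def out_of_spec(readings):
    """Return the rails whose readings fall outside tolerance."""
    return [rail for rail, volts in readings.items()
            if not within_tolerance(rail, volts)]


# Placeholder multimeter readings; NOT the numbers measured in this thread.
readings = {"+3.3V": 3.28, "+5V": 5.05, "+12V": 12.10, "-12V": -11.60}

if __name__ == "__main__":
    print(out_of_spec(readings))
```

A clean result here does not fully clear the PSU, since a multimeter shows only steady-state voltage and will not reveal ripple or brief dips under transient load.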

While testing my voltages under load, though, I noticed that the heat sink for the chipset is hot. I mean really hot, so I went to watch the hardware monitor some more. I don't know what all of these readings are, but here is the screenshot. Supposedly one is getting to 122, another to 105, and even the one at 68.5 is crazy. All I've done is run the PC for about an hour with Firefox and the hardware monitor open. Those are pretty crazy temperatures.

nvb3t3.jpg


Also I noticed, and I don't know what it is specifically, but there is some kind of heat sink directly above the processor, but below the 12 V power connection and it appears to be loose. It doesn't seem to be hot, but it doesn't seem like it could be helping much either.

 

inb4tehlulz

Tried running FurMark again, and this time it finished a 15 minute benchmark, but with one screen flash. I'm not sure if the driver crashed, but Windows didn't say it did. I also tried to research what tmpin3 and tmpin4 are, and according to someone on a Russian tech forum, they are sensors connected to nothing and should be ignored.

I tried both Prime95 and LinX. Prime95 seemed to fail after one or two minutes; LinX went about 30 minutes and then failed. I decided to run a memory test again, this time one stick at a time, and both sticks passed in Memtest86+. Then I ran both sticks together and they passed too. I double-checked all the RAM settings in BIOS and they are correct. Nothing is overclocked, and now I'm pretty much out of ideas to even try.
 
Guest
I have a similar problem with my 660 Ti. I bought a "custom" PC and it had a bad bottleneck. It very rarely had a video driver crash, and very rarely had Windows crash to a randomly colored screen. That stopped after I noticed my GPU was overclocked and put it back to stock clocks. Fast forward to last week: I bought a new CPU, mobo, and PSU, and my graphics card is no longer bottlenecked, so I get 60 FPS in Far Cry 3 on ultra, but my drivers keep crashing and sometimes I get a crash to a solid colored screen.


So I checked whether my GPU was overheating, and yup, it was. I cleaned out the dust, set the fan to max, and updated drivers, and that almost got rid of the problem. But every couple of hours my drivers still crash and recover after a few seconds, and then I have to set the fan again and stuff.


Also, before I updated my drivers I had put my PC to sleep (after my driver had crashed during that session) and it didn't wake up. It just restarted, and when I checked Event Viewer it said:


Windows failed to resume from hibernate with error status 0xc0000001

That only happened once...


So please help.
 

inb4tehlulz

It's probably worth updating this thread: I troubleshot as much as I could and figured out that the real issue seemed to be the motherboard. I RMAed it and suddenly everything worked.