Does NVIDIA Suck Now? BSODs galore, please help

demetrius202

I've built a rig and previously posted on the forums about troubleshooting my BSODs:

http://www.tomshardware.com/answers/id-1908611/absence-pin-atx-causing-crashes.html#xtor=EPR-8809

I've replaced the PSU with a nice, silent OCZ 1000W unit, and after the first 15 minutes of gaming (Skyrim) I get a freaking BSOD, followed by more while watching a movie. I windowed Skyrim and watched my GPU; the temp went to 44C and stayed there, but then I got another BSOD. Here is my .dmp analysis per WhoCrashed:

On Fri 12/6/2013 1:55:36 PM GMT your computer crashed
crash dump file: C:\Windows\Minidump\120613-9968-01.dmp
This was probably caused by the following module: hal.dll (hal+0x12A3B)
Bugcheck code: 0x124 (0x0, 0xFFFFFA8010930028, 0xBE200000, 0x5110A)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem problem. This problem might be caused by a thermal issue.
The crash took place in a standard Microsoft module. Your system configuration may be incorrect. Possibly this problem is caused by another driver on your system that cannot be identified at this time.



On Fri 12/6/2013 1:55:36 PM GMT your computer crashed
crash dump file: C:\Windows\memory.dmp
This was probably caused by the following module: hal.dll (hal!HalBugCheckSystem+0x1E3)
Bugcheck code: 0x124 (0x0, 0xFFFFFA8010930028, 0xBE200000, 0x5110A)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem problem. This problem might be caused by a thermal issue.
The crash took place in a standard Microsoft module. Your system configuration may be incorrect. Possibly this problem is caused by another driver on your system that cannot be identified at this time.



On Fri 12/6/2013 5:38:51 AM GMT your computer crashed
crash dump file: C:\Windows\Minidump\120613-9703-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x75BC0)
Bugcheck code: 0x3B (0xC0000005, 0xFFFFFFFF, 0xFFFFF8800C0EC7D0, 0x0)
Error: SYSTEM_SERVICE_EXCEPTION
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that an exception happened while executing a routine that transitions from non-privileged code to privileged code.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.



On Fri 12/6/2013 12:32:11 AM GMT your computer crashed
crash dump file: C:\Windows\Minidump\120513-10701-01.dmp
This was probably caused by the following module: hal.dll (hal+0x12A3B)
Bugcheck code: 0x124 (0x0, 0xFFFFFA8010917028, 0xBE200000, 0x5110A)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem problem. This problem might be caused by a thermal issue.
The crash took place in a standard Microsoft module. Your system configuration may be incorrect. Possibly this problem is caused by another driver on your system that cannot be identified at this time.
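
The report above lists four entries, but two of them share a timestamp (the minidump and memory.dmp of the same crash). A minimal Python sketch, assuming the report text keeps the WhoCrashed line layout quoted above, that counts distinct crashes per bug check code; the function name `tally_crashes` and the abbreviated `REPORT` string are illustrative, not part of any tool:

```python
import re
from collections import Counter

# Abbreviated copy of the report above: only the lines the parser reads.
REPORT = """
On Fri 12/6/2013 1:55:36 PM GMT your computer crashed
Bugcheck code: 0x124 (0x0, 0xFFFFFA8010930028, 0xBE200000, 0x5110A)
Error: WHEA_UNCORRECTABLE_ERROR

On Fri 12/6/2013 1:55:36 PM GMT your computer crashed
Bugcheck code: 0x124 (0x0, 0xFFFFFA8010930028, 0xBE200000, 0x5110A)
Error: WHEA_UNCORRECTABLE_ERROR

On Fri 12/6/2013 5:38:51 AM GMT your computer crashed
Bugcheck code: 0x3B (0xC0000005, 0xFFFFFFFF, 0xFFFFF8800C0EC7D0, 0x0)
Error: SYSTEM_SERVICE_EXCEPTION

On Fri 12/6/2013 12:32:11 AM GMT your computer crashed
Bugcheck code: 0x124 (0x0, 0xFFFFFA8010917028, 0xBE200000, 0x5110A)
Error: WHEA_UNCORRECTABLE_ERROR
"""

HEADER = re.compile(r"^On (.+?) your computer crashed$")
BUGCHECK = re.compile(r"^Bugcheck code: (0x[0-9A-Fa-f]+)")
ERROR = re.compile(r"^Error: (\w+)$")


def tally_crashes(report):
    """Count distinct crashes per (bug check code, error name).

    Entries with the same timestamp and code are treated as one crash,
    since a minidump and a full memory dump can describe the same event.
    """
    seen = set()
    timestamp = code = None
    for line in report.splitlines():
        line = line.strip()
        if m := HEADER.match(line):
            timestamp, code = m.group(1), None
        elif m := BUGCHECK.match(line):
            code = m.group(1)
        elif (m := ERROR.match(line)) and timestamp and code:
            seen.add((timestamp, code, m.group(1)))
    return Counter((code, name) for _, code, name in seen)


for (code, name), count in tally_crashes(REPORT).most_common():
    print(f"{code} {name}: {count}")
```

Counted this way, the hardware-flavored 0x124 stop accounts for two of the three distinct crashes, and the 0x3B stop for one.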

I don't know what other thermal issue there could be. I'd installed the latest NVIDIA driver, 331 (actually 331.xx, but I can't check since I just uninstalled it), and I'm now going through and uninstalling everything NVIDIA, even the HD audio driver. Then I'm going to find a pre-314 driver to install, per instructions. Is this right? My other question is: does NVIDIA now suck?! I never would've invested in a GTX 690 if I'd known how terrible NVIDIA's software had become, and I'd been using a 560 Ti with no trouble.

I've even tried to disable one of the 690's two GPUs since I hear SLI causes issues, but this just disables the whole card, so I didn't even try to game on it.
 
Maybe you're getting a misread on the temps? You might have a bad video card. Just exchange it if it's still under warranty.

How long have you had this system?

Other than that, have you tested your PSU? This would help determine if the PSU is getting power to the video card. It really could be a number of hardware issues causing the problem. Can you get a hold of another video card from a friend or another system and see if the system will run with it?

I don't have a 690, but I've run SLI with 580s, 680s, and 780s and haven't had issues myself. SLI does not cause issues. I don't think the Nvidia drivers are your problem.
 

indigo5

Have you overclocked? An unstable overclock can cause this error.

Generic "Stop 0x124" Troubleshooting Strategy:
1) Ensure that none of the hardware components are overclocked. Hardware that is driven beyond its design specifications - by overclocking - can malfunction in unpredictable ways.

2) Ensure that the machine is adequately cooled. If there is any doubt, open up the side of the PC case (be mindful of any relevant warranty conditions!) and point a mains fan squarely at the motherboard. That will rule out most (lack of) cooling issues.

3) Update all hardware-related drivers: video, sound, RAID (if any), NIC... anything that interacts with a piece of hardware. It is good practice to run the latest drivers anyway.

4) Update the motherboard BIOS according to the manufacturer's instructions. Their website should provide detailed instructions as to the brand and model-specific procedure.

5) Rarely, bugs in the OS may cause "false positive" 0x124 events where the hardware wasn't complaining but Windows thought otherwise (because of the bug). At the time of writing, Windows 7 is not known to suffer from any such defects, but it is nevertheless important to always keep Windows itself updated.

6) Attempt to (stress) test those hardware components which can be put through their paces artificially. The most obvious examples are the RAM and HDD(s). For the RAM, use the in-built memory diagnostics (run MDSCHED) or the 3rd-party memtest86 utility to run many hours worth of testing. For hard drives, check whether CHKDSK /R finds any problems on the drive(s), notably "bad sectors". Unreliable RAM, in particular, is deadly as far as software is concerned, and anything other than a 100% clear memory test result is cause for concern. Unfortunately, even a 100% clear result from the diagnostics utilities does not guarantee that the RAM is free from defects - only that none were encountered during the test passes.

7) As the last of the non-invasive troubleshooting steps, perform a "vanilla" reinstallation of Windows: just the OS itself without any additional applications, games, utilities, updates, or new drivers - NOTHING AT ALL that is not sourced from the Windows 7 disc. Should that fail to mitigate the 0x124 problem, jump to the next steps. Otherwise, if you run the "vanilla" installation long enough to convince yourself that not a single 0x124 crash has occurred, start installing updates and applications slowly, always pausing between successive additions long enough to get a feel for whether the machine is still free from 0x124 crashes. Should the crashing resume, obviously the very last software addition(s) may be somehow linked to the root cause.

If stop 0x124 errors persist despite the steps above, and the hardware is under warranty, consider returning it and requesting a replacement which does not suffer periodic MCE events. Be aware that attempting the subsequent hardware troubleshooting steps may, in some cases, void your warranty:
8) Clean and carefully remove any dust from the inside of the machine. Reseat all connectors and memory modules. Use a can of compressed air to clean out the RAM DIMM sockets as much as possible.

9) If all else fails, start removing items of hardware one-by-one in the hope that the culprit is something non-essential which can be removed. Obviously, this type of testing is a lot easier if you've got access to equivalent components in order to perform swaps.


Should you find yourself in the situation of having performed all of the steps above without a resolution of the symptom, unfortunately the most likely reason is because the error message is literally correct - something is fundamentally wrong with the machine's hardware.
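
The 0x124 parameters themselves hint at how literal the message is. Per Microsoft's bug check reference, when the first parameter is 0x0 (a machine check exception), parameters 3 and 4 are the high and low 32 bits of the MCi_STATUS register for the bank that reported the error. Below is a minimal Python sketch that decodes the values from the dumps in the first post (0xBE200000, 0x5110A). It assumes the standard x86 machine-check flag positions; the function name `decode_mci_status` is hypothetical, and the sketch deliberately does not interpret the error code further, since that is CPU-specific.

```python
# Flag bit positions in the upper part of MCi_STATUS (x86 machine-check
# architecture). Assumed to apply to the CPU in question.
FLAGS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC holds additional information
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(high32, low32):
    """Combine bug check parameters 3 and 4 and pull out the basic fields."""
    status = (high32 << 32) | low32
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    decoded["mca_error_code"] = status & 0xFFFF              # bits 15:0
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return decoded


info = decode_mci_status(0xBE200000, 0x5110A)
for name, value in info.items():
    shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
    print(f"{name}: {shown}")
```

With UC and PCC both set, the CPU itself reported an uncorrected error, which fits the WHEA_UNCORRECTABLE_ERROR name and the hardware-first order of the steps above. It does not say which component caused it.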
 
indigo's item 1 is key. An overclock changes the predictability of the behavior of everything.

I especially like indigo's step 4 above. A lot of times, the BIOS update can fix a lot of hardware compatibility issues given all hardware is functioning properly.

If you're having a power problem, troubleshooting the other hardware can be problematic. If the BIOS update doesn't work, I would test the PSU (since it's the hardware item you changed most recently), and then proceed to troubleshoot the other issues.
 
Solution

demetrius202

Update: I just wanted people to know the solution instead of just gorging myself on RPG gaming with no BSODs and ignoring the forums, lol. As someone suggested on my previous post (see the link in my first post above), the current NVIDIA drivers suck, at least on my rig with the above settings (see link again). I uninstalled ALL NVIDIA drivers, went to the website, and installed an old driver (310.90), and I've been gaming for about 7 hrs with no BSODs. Freaking beautiful. If a more current RPG comes out I may be forced to update my drivers (waiting for DA3 and ESO), but for now my rig is STABLE. Woohooo.