Help! BSOD keeps interrupting games (or Prime95), Probably caused by AuthenticAMD

miggello

Reputable
Feb 17, 2015
6
0
4,510
First time posting so please take it easy on me.

I've been having this issue since I got my computer several years ago: during periods of high processor activity (very consistently, within a few seconds, if I run a Prime95 stress test) my computer crashes to a BSOD.

After browsing similar issues other people have had, I ran a command from the command prompt to find corrupted files and successfully restored them online, but this didn't clear the issue.

I ran a bugcheck and compiled the results below along with my system information.
Any input into what may be the problem and how I can correct it would be greatly appreciated.

SYSTEM:

OS: Windows 8.1
Processor: AMD FX-8350 Eight-Core ~4.0GHz (black edition), I believe this is manufacturer overclocked but I'm not certain.
Motherboard: GA-78LMT-USB3 with most recent bios update (F4 XV)
Memory: 2x Kingston 8GB DDR III SDRAM
Graphics: NVIDIA GeForce GTX 660 Ti 2
PSU: GreenMe 650W

BUGCHECK:

************* Symbol Path validation summary **************
Response Time (ms) Location
Deferred srv*

************* Symbol Path validation summary **************
Response Time (ms) Location
Deferred srv*

Microsoft (R) Windows Debugger Version 6.3.9600.17298 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Windows\Minidump\021715-18562-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available


************* Symbol Path validation summary **************
Response Time (ms) Location
Deferred srv*

************* Symbol Path validation summary **************
Response Time (ms) Location
Deferred srv*
Symbol search path is: srv*
Executable search path is: srv*
Windows 8 Kernel Version 9600 MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS Personal
Built by: 9600.17630.amd64fre.winblue_r7.150109-2022
Machine Name:
Kernel base = 0xfffff801`ef206000 PsLoadedModuleList = 0xfffff801`ef4df250
Debug session time: Tue Feb 17 15:12:06.927 2015 (UTC - 5:00)
System Uptime: 0 days 0:00:08.636
Loading Kernel Symbols
..

Press ctrl-c (cdb, kd, ntsd) or ctrl-break (windbg) to abort symbol loads that take too long.
Run !sym noisy before .reload to track down problems loading symbols.

..........................................................
Loading User Symbols
Mini Kernel Dump does not contain unloaded driver list
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 124, {0, ffffe0009db03038, 0, 0}

Probably caused by : AuthenticAMD

Followup: MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffffe0009db03038, Address of the WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000000000, Low order 32-bits of the MCi_STATUS value.

Debugging Details:
------------------


BUGCHECK_STR: 0x124_AuthenticAMD

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

PROCESS_NAME: System

CURRENT_IRQL: 0

ANALYSIS_VERSION: 6.3.9600.17298 (debuggers(dbg).141024-1500) amd64fre

STACK_TEXT:
ffffd001`4ff47500 fffff801`ef58ced8 : ffffd001`4f778180 ffffe000`9db03010 ffffe000`9b7e0800 fffff801`ef29ff79 : nt!WheapCreateLiveTriageDump+0x81
ffffd001`4ff47a30 fffff801`ef3ccab4 : ffffe000`9db03010 fffff801`ef4ee7c0 ffffe000`9b7e0880 fffff801`ef4b8300 : nt!WheapCreateTriageDumpFromPreviousSession+0x44
ffffd001`4ff47a60 fffff801`ef3cd8d1 : fffff801`ef4ee760 fffff801`ef4ee7c0 ffffe000`9b7e0880 00000000`00000000 : nt!WheapProcessWorkQueueItem+0x48
ffffd001`4ff47aa0 fffff801`ef2b13ac : fffff801`ef71a084 fffff801`ef3cd8ac ffffe000`9b7e0880 fffff801`ef4b8300 : nt!WheapWorkQueueWorkerRoutine+0x25
ffffd001`4ff47ad0 fffff801`ef2de280 : ffffd001`5395f3c0 ffffe000`9b7e0880 00000000`00000080 ffffe000`9b7e0880 : nt!ExpWorkerThread+0x28c
ffffd001`4ff47b80 fffff801`ef35cfc6 : ffffd001`53953180 ffffe000`9b7e0880 ffffd001`5395f3c0 00000000`00000000 : nt!PspSystemThreadStartup+0x58
ffffd001`4ff47be0 00000000`00000000 : ffffd001`4ff48000 ffffd001`4ff41000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


STACK_COMMAND: kb

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: AuthenticAMD

IMAGE_NAME: AuthenticAMD

DEBUG_FLR_IMAGE_TIMESTAMP: 0

IMAGE_VERSION:

FAILURE_BUCKET_ID: 0x124_AuthenticAMD_PROCESSOR_CACHE_PRV

BUCKET_ID: 0x124_AuthenticAMD_PROCESSOR_CACHE_PRV

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:0x124_authenticamd_processor_cache_prv

FAILURE_ID_HASH: {cedb04af-9437-ee8e-2e67-54d858b5dbbc}

Followup: MachineOwner
---------






 
Solution
Bugcheck 0x124 (WHEA_UNCORRECTABLE_ERROR) with parameter 1 = 0 indicates that the CPU called the bugcheck.

If you have access to the memory dump, look at the system uptime.
If the system uptime is 5 or 6 seconds, then most likely you had a power fluctuation to the CPU, the CPU reset, and your power supply did not block the CPU from restarting before the power was stable.

If the uptime is over about 10 minutes, then use the command
!errrec ffffe0009db03038
(the address shown as parameter 2 in the bugcheck code).
This will dump the info as to why the CPU called the bugcheck. Most often it will be from the memory controller getting a CRC error.
For this case you need to:
- Remove any overclocking in BIOS, remove any BIOS overclocking of the PCI bus, and remove any software overclocking of the CPU and GPU.
- Check for overheating issues: blow out dust from the GPU, CPU, and power supply fans and confirm they are spinning.
- Confirm your power supply is working correctly and your GPU is getting proper power. (An underpowered GPU can cause brownouts on the motherboard's power; this causes memory errors that the CPU's memory controller reports to the CPU, and the CPU shuts the system down.)
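
If you want to automate the triage described above, here is a rough Python sketch. It only reads the text that the debugger already printed (the "BugCheck" line and the "System Uptime" line) and picks one of the branches. The function names and the two cutoff values are my own assumptions for illustration; the advice above only says "5 or 6 seconds" and "over about 10 mins", so treat anything in between as inconclusive.

```python
import re

# Assumed cutoffs (not from WinDbg): the advice above says "5 or 6 seconds"
# and "over about 10 mins", so these exact numbers are illustrative only.
SHORT_UPTIME_S = 10.0
LONG_UPTIME_S = 600.0


def parse_uptime_seconds(text):
    """Return the 'System Uptime' value in seconds, or None if absent."""
    m = re.search(r"System Uptime:\s*(\d+) days (\d+):(\d+):(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    days, hours, minutes, seconds = (int(g) for g in m.groups()[:4])
    fraction = float("0." + m.group(5)) if m.group(5) else 0.0
    return days * 86400 + hours * 3600 + minutes * 60 + seconds + fraction


def parse_bugcheck(text):
    """Return (code, [argument strings]) from the 'BugCheck' line, or None."""
    m = re.search(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", text)
    if m is None:
        return None
    return int(m.group(1), 16), [a.strip() for a in m.group(2).split(",")]


def triage(text):
    """Map pasted debugger output to one of the branches described above."""
    bugcheck = parse_bugcheck(text)
    if bugcheck is None or bugcheck[0] != 0x124:
        return "not-0x124"
    args = bugcheck[1]
    try:
        param1 = int(args[0], 16)
    except ValueError:
        return "unparsed"
    if param1 != 0 or len(args) < 2:
        return "param1-nonzero"  # outside the case covered by this advice
    uptime = parse_uptime_seconds(text)
    if uptime is None:
        return "no-uptime"
    if uptime <= SHORT_UPTIME_S:
        return "suspect-power"  # crash right after restart: look at power first
    if uptime >= LONG_UPTIME_S:
        return "run: !errrec " + args[1]  # parameter 2 is the record address
    return "inconclusive"


# Two lines copied from the debugger output posted in this thread.
SAMPLE = """\
System Uptime: 0 days 0:00:08.636
BugCheck 124, {0, ffffe0009db03038, 0, 0}
"""

if __name__ == "__main__":
    print(triage(SAMPLE))
```

With the assumed 10-second cutoff, the 8-second uptime from the dump in this thread lands in the power branch. Moving the cutoff down to 6 seconds would make the same dump come back as inconclusive, so pick the numbers to suit your own judgment.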

 

miggello



Thank you for your response, john. It looks like the first case is definitely the problem I'm having. The system uptime was only 8 seconds for the instance above, and every time I run a stress test a crash occurs promptly. I've removed any dust from around components and ruled heat out as an issue (the crash occurs long before temperatures rise when stress testing). How do I confirm my power supply is working correctly and the GPU is getting power?

Also is it possible to verify if power fluctuations are the issue and address those?
 
If your CPU and GPU are not overclocked and your fans are working, you would next check any supplemental power connections from the power supply directly to the GPU; maybe disconnect and reconnect them. Sometimes people forget to connect them, or they have a bad connection.
If the connections are good, you could be looking at a power supply failure. The power supply is supposed to send a power_ok signal to the motherboard when the power is good. The cheap ones just lie, hard-coding a power_ok signal and letting the power fluctuate. Sometimes with an older motherboard the voltage regulators start to fail. Most people just end up getting a new power supply if theirs is underpowered, very old, or a questionable brand.
 

miggello



OK, I'll pull out and re-attach all power connections when I get home this afternoon to try to rule out a bad connection somewhere. The power supply I'm using is a GreenMe 650W, which doesn't seem to be one of the top brands but isn't badly rated on Newegg either. I'm hoping that turns out not to be the problem, as it will probably cost $100 to replace. A little additional information: I went into the BIOS and set my clock settings to the 'safe defaults', so an overclock shouldn't be the issue.
 

miggello



So I snugged all of my power cables back up, and I'm still getting crashes, so I don't think a bad connection was the issue. I did find a change that seems to stop at least some of the crashes. In the Advanced BIOS settings I changed Load Line Control from 'Auto' to 'Extreme' and ran the Prime95 tests that previously crashed my computer almost immediately. The tests ran longer than they had before crashing, but the change seems to have made my CPU temperatures skyrocket, and I had to stop the tests fairly quickly. The Small FFTs torture test (maximum heat, FPU stress, data fits in L2 cache, RAM not tested much) ran for only about 10 seconds before processor temperatures got to a point where I had to stop it. The Blend test (tests some of everything, lots of RAM tested) lasted slightly longer but again ran the CPU temperatures up (I stopped the test when CPU temperatures were approaching 70 C).

I have a better CPU fan and Arctic Silver 5 on the way, so hopefully I can soon make the CPU temperature increase a non-issue. But the fact that the crash seems to be voltage related may suggest the problem is either the power supply or my motherboard. Are there any diagnostics I can run that will record what's happening during the torture tests and could help with diagnosing the problem?
 

miggello

UPDATE 2-23-15: I replaced my previous PSU (IN WIN GreenMe 650W) with a new Corsair RM 750W PSU and re-ran the stress tests. This didn't result in any improvement; my computer crashed the same as before. I have a replacement motherboard coming in, the ASUS M5A99FX, and am really hoping that fixes the problem.
 

miggello



I replaced my motherboard with a new one last night, and the old board seems to have been the problem. I was able to run Prime95 for quite a while with no errors and no crash, as well as play Crysis 2 for the first time since I got it! This is mostly speculation, but I'm guessing the motherboard has had a defect since I bought it, where under duress it would 'brown out' the processor and cause a crash. Thanks for trying to help with this!