WHEA_UNCORRECTABLE_ERROR on Surface Pro 4: Bug Check Analysis

egatzonis

Prominent
Dec 16, 2017
4
0
510
Hi, I have been getting WHEA_UNCORRECTABLE_ERROR on my Surface Pro 4 (6 months old). It has happened since the beginning, but rather infrequently: about 10 times in the six months I have owned it. It happened again today, so I decided to see what is going on. I am including the bug check analysis, and I need somebody more knowledgeable to confirm what I am reading. Is the processor cache to blame?
Please note that I have run Intel's Processor Diagnostic, which reports no errors. The bug check analysis follows:


Microsoft (R) Windows Debugger Version 10.0.16299.15 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Users\gatzo\Downloads\121617-12250-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

WARNING: Whitespace at start of path element
Error: Empty Path.
WARNING: Whitespace at start of path element
Symbol search path is: srv*c:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
WARNING: Whitespace at start of path element
Windows 10 Kernel Version 16299 MP (4 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 16299.15.amd64fre.rs3_release.170928-1534
Machine Name:
Kernel base = 0xfffff802`5348e000 PsLoadedModuleList = 0xfffff802`537effb0
Debug session time: Sat Dec 16 10:19:52.627 2017 (UTC + 2:00)
System Uptime: 0 days 20:30:58.489
Loading Kernel Symbols
.

Press ctrl-c (cdb, kd, ntsd) or ctrl-break (windbg) to abort symbol loads that take too long.
Run !sym noisy before .reload to track down problems loading symbols.

..............................................................
................................................................
................................................................
.....................................
Loading User Symbols
Loading unloaded module list
..................................................
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 124, {0, ffffbf0112070028, be000000, 100110a}

*** WARNING: Unable to verify timestamp for IntcAudioBus.sys
*** ERROR: Module load completed but symbols could not be loaded for IntcAudioBus.sys
Probably caused by : GenuineIntel

Followup: MachineOwner
---------

1: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffffbf0112070028, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000be000000, High order 32-bits of the MCi_STATUS value.
Arg4: 000000000100110a, Low order 32-bits of the MCi_STATUS value.

Debugging Details:
------------------


DUMP_CLASS: 1

DUMP_QUALIFIER: 400

BUILD_VERSION_STRING: 10.0.16299.64 (WinBuild.160101.0800)

SYSTEM_MANUFACTURER: Microsoft Corporation

SYSTEM_PRODUCT_NAME: Surface Pro 4

SYSTEM_SKU: Surface_Pro_4

SYSTEM_VERSION: D:0B:08F:1C:03P:38

BIOS_VENDOR: Microsoft Corporation

BIOS_VERSION: 108.1866.769

BIOS_DATE: 10/10/2017

BASEBOARD_MANUFACTURER: Microsoft Corporation

BASEBOARD_PRODUCT: Surface Pro 4

DUMP_TYPE: 2

BUGCHECK_P1: 0

BUGCHECK_P2: ffffbf0112070028

BUGCHECK_P3: be000000

BUGCHECK_P4: 100110a

BUGCHECK_STR: 0x124_GenuineIntel

CPU_COUNT: 4

CPU_MHZ: 8a0

CPU_VENDOR: GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 4e

CPU_STEPPING: 3

CPU_MICROCODE: 6,4e,3,0 (F,M,S,R) SIG: BA'00000000 (cache) BA'00000000 (init)

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

PROCESS_NAME: dwm.exe

CURRENT_IRQL: f

ANALYSIS_SESSION_HOST: SURFACEVAGS

ANALYSIS_SESSION_TIME: 12-16-2017 12:40:46.0164

ANALYSIS_VERSION: 10.0.16299.15 amd64fre

STACK_TEXT:
ffffae00`a6c0c5c8 fffff802`5344af1f : 00000000`00000124 00000000`00000000 ffffbf01`12070028 00000000`be000000 : nt!KeBugCheckEx
ffffae00`a6c0c5d0 fffff806`770a148a : ffffbf01`12070028 ffffbf01`0ff29040 ffffbf01`0ff29040 ffffbf01`0ff29040 : hal!HalBugCheckSystem+0xcf
ffffae00`a6c0c610 fffff802`5370ed41 : ffffbf01`12070028 00000000`00000000 ffffbf01`0ff29040 ffffbf01`0ff29040 : PSHED!PshedBugCheckSystem+0xa
ffffae00`a6c0c640 fffff802`5344b460 : 00000000`00000728 00000000`00000001 00000000`00000000 ffffbf01`0ff29090 : nt!WheaReportHwError+0x261
ffffae00`a6c0c6a0 fffff802`5344b7c0 : 00000000`00000010 00000000`00000001 ffffae00`a6c0c848 00000000`00000001 : hal!HalpMcaReportError+0x50
ffffae00`a6c0c7f0 fffff802`5344b6ae : ffffbf01`0ff28b60 00000000`00000001 00000000`00000000 00000000`00000000 : hal!HalpMceHandlerCore+0xe0
ffffae00`a6c0c840 fffff802`5344b8f2 : 00000000`00000004 00000000`00000001 00000000`00000000 00000000`00000000 : hal!HalpMceHandler+0xda
ffffae00`a6c0c880 fffff802`5344ba80 : ffffbf01`0ff28b60 ffffae00`a6c0cab0 00000000`00000000 00000000`00000000 : hal!HalpMceHandlerWithRendezvous+0xce
ffffae00`a6c0c8b0 fffff802`535fc7fb : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : hal!HalHandleMcheck+0x40
ffffae00`a6c0c8e0 fffff802`535fc56c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxMcheckAbort+0x7b
ffffae00`a6c0ca20 fffff806`7aeed97b : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiMcheckAbort+0x1ac
ffffae00`a6c1ae80 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : IntcAudioBus+0x1d97b


THREAD_SHA1_HASH_MOD_FUNC: fd23acc635c95369845545ab6491c5a7ece55d8f

THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 23850cb567dbc00f9bc36c9cda8f235d859cd6ba

THREAD_SHA1_HASH_MOD: 7b58a27a1657f238a46f01f888fd06880593a81a

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: GenuineIntel

IMAGE_NAME: GenuineIntel

DEBUG_FLR_IMAGE_TIMESTAMP: 0

STACK_COMMAND: .thread ; .cxr ; kb

FAILURE_BUCKET_ID: 0x124_GenuineIntel_PROCESSOR_CACHE

BUCKET_ID: 0x124_GenuineIntel_PROCESSOR_CACHE

PRIMARY_PROBLEM_CLASS: 0x124_GenuineIntel_PROCESSOR_CACHE

TARGET_TIME: 2017-12-16T08:19:52.000Z

OSBUILD: 16299

OSSERVICEPACK: 64

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK: 272

PRODUCT_TYPE: 1

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

OSEDITION: Windows 10 WinNt TerminalServer SingleUserTS

OS_LOCALE:

USER_LCID: 0

OSBUILD_TIMESTAMP: 2017-10-25 06:06:03

BUILDDATESTAMP_STR: 160101.0800

BUILDLAB_STR: WinBuild

BUILDOSVER_STR: 10.0.16299.64

ANALYSIS_SESSION_ELAPSED_TIME: 4d2

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:0x124_genuineintel_processor_cache

FAILURE_ID_HASH: {4c8f3f5e-1af5-ed8b-df14-d42663b1dfa7}

Followup: MachineOwner
---------

Thanks
 
Solution
The system being up for 2 minutes and having a 20-hour uptime aren't mutually exclusive. If Fast Startup is enabled, the uptime is counted from the last restart, not from the last time you turned it on.

Colif

Win 11 Master
Moderator
Do you have latest firmware and driver package from here? https://docs.microsoft.com/en-us/surface/deploy-the-latest-firmware-and-drivers-for-surface-devices#surface-pro-4 - BIOS updates may fix this.

Normally the only software that will cause WHEA errors is overclocking software, so remove MSI Afterburner if it's installed.

The error occurred in dwm.exe, which is the Desktop Window Manager. I may be misreading this, but it looks like a driver error.

Can you follow option one here
and then do the step below: Small memory dumps - Have Windows Create a Small Memory Dump (Minidump) on BSOD

That creates a file in C:\Windows\Minidump.
Copy that file to Documents.
Upload the copy from Documents to a cloud server and share the link here, and someone with the right software to read it will help you fix it :)

Dumps will show what drivers were actually loaded when it crashed, might narrow down the field.
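
If you would rather script the copy step, here is a minimal Python sketch. It assumes the default minidump folder (C:\Windows\Minidump) and your Documents folder as the destination; the function name copy_latest_minidump is made up for illustration, and reading the minidump folder usually needs an elevated prompt.

```python
import shutil
from pathlib import Path
from typing import Optional


def copy_latest_minidump(dump_dir: Path, dest_dir: Path) -> Optional[Path]:
    """Copy the most recently modified .dmp file from dump_dir into dest_dir."""
    if not dump_dir.is_dir():
        return None  # folder missing: no dump has ever been written
    dumps = sorted(dump_dir.glob("*.dmp"), key=lambda p: p.stat().st_mtime)
    if not dumps:
        return None  # folder exists but holds no crash dumps
    dest_dir.mkdir(parents=True, exist_ok=True)
    newest = dumps[-1]
    # copy2 keeps the original timestamps, which helps match the dump to a crash
    return Path(shutil.copy2(newest, dest_dir / newest.name))


if __name__ == "__main__":
    # Assumed default locations; adjust if your system is configured differently.
    copied = copy_latest_minidump(Path(r"C:\Windows\Minidump"), Path.home() / "Documents")
    print(copied if copied else "No .dmp files found")
```

The copy in Documents is the one to upload, since the original folder is protected.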
 

egatzonis


Hi,
Actually, the bug check analysis in my original post was produced with the Windows Debugger from the minidump created by Windows. I can provide the minidump itself, if useful. The bug check analysis blames the Intel processor cache, although you are right that the WDM driver error is confusing. I am not overclocking. I have installed all the latest Windows and firmware updates available through Windows Update. I assume these are the latest and there is no point installing them manually.
Link to Minidump : https://1drv.ms/u/s!Aget539abtS4ldUIuF1-3bjFFJiJ1A
 

gardenman

Splendid
Moderator
Hi, I ran the dump file through the debugger and got the following information: https://pste.eu/p/gcon.html

File: 121617-12250-01.dmp (Dec 16 2017 - 03:19:52)
BugCheck: [WHEA_UNCORRECTABLE_ERROR (124)]
*** WARNING: Unable to verify timestamp for IntcAudioBus.sys
Probably caused by: GenuineIntel (Process: dwm.exe)
Uptime: 0 Day(s), 20 Hour(s), 30 Min(s), and 58 Sec(s)

I can't help you with this. Wait for additional replies. Good luck.
 

egatzonis

Thanks. Will wait. Hopefully somebody with more experience in memory dumps can figure it out. Even though it mentions a driver fault, the processor cache is mentioned as well.
 
Most likely a power or overheating problem. As long as you did not block the vents and cause overheating, I would see if Microsoft would swap out the power brick or the actual machine.
The other cause of this would be overclocking, but I did not see any overclocking drivers.
(Note: power bricks can overheat too. My wife had one that slipped between the cushions of the sofa, and it was very hot when I pulled it out.)

The processor cache is inside the CPU. When the CPU overheats or has voltage fluctuations, the CPU's memory controller can get errors when moving data inside the CPU. The memory controller in the CPU can detect the error, and the CPU will then tell Windows to shut down with bugcheck 0x124.
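
To check the cache reading against the raw numbers, here is a rough Python sketch that decodes Arg3 and Arg4 (the high and low halves of MCi_STATUS) from the dump in the first post. The flag bit positions and the 000F 0001 RRRR TTLL cache-error pattern are taken from my reading of the machine-check chapter of the Intel SDM, so treat them as assumptions to verify; the function name and the output layout are made up for illustration.

```python
# Flag bits in the upper part of the 64-bit IA32_MCi_STATUS register
# (positions assumed from the Intel SDM machine-check chapter).
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN", 59: "MISCV", 58: "ADDRV", 57: "PCC"}
LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}
TYPES = {0: "instruction", 1: "data", 2: "generic"}


def decode_mci_status(high: int, low: int) -> dict:
    """Combine the two 32-bit bugcheck arguments and pick out the main fields."""
    status = (high << 32) | low
    code = status & 0xFFFF  # architectural MCA error code, bits 15:0
    result = {
        "status": f"{status:#018x}",
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": f"{code:#06x}",
    }
    # Cache hierarchy errors match 000F 0001 RRRR TTLL (bit 12, F, is ignored here).
    if code & 0xEF00 == 0x0100:
        result["class"] = "cache hierarchy"
        result["level"] = LEVELS[code & 0x3]
        result["transaction"] = TYPES.get((code >> 2) & 0x3, "reserved")
    return result


if __name__ == "__main__":
    # Arg3 and Arg4 from the bugcheck in this thread.
    for key, value in decode_mci_status(0xBE000000, 0x0100110A).items():
        print(f"{key}: {value}")
```

For this dump it reports VAL, UC, EN, MISCV, ADDRV and PCC set, with a cache hierarchy error at level L2 and a generic transaction type. UC (uncorrected) together with PCC (processor context corrupt) is why the error is fatal, and the L2 cache code is consistent with the PROCESSOR_CACHE bucket the debugger picked.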





 

egatzonis

Thanks. Will call Microsoft. Just a note: it's not a matter of overheating. Every time this happened, it was 1 or 2 minutes after boot. The system was cold. No overclocking. Also, it was running on battery.
 
Make sure the BIOS is updated. It will be applying custom settings when running on battery.
(Just in case you have blocked updates.)
Note: the debugger indicated the system was up for 20 hours.
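
The 20-hour figure can be turned into a kernel start time with a little arithmetic. A quick Python sketch, using the "Debug session time" and "System Uptime" values from the dump in the first post (the function name is just for illustration):

```python
from datetime import datetime, timedelta, timezone


def kernel_start_time(crash: datetime, uptime: timedelta) -> datetime:
    """Return the moment the uptime counter started, i.e. crash time minus uptime."""
    return crash - uptime


# Values copied from the debugger output: "Sat Dec 16 10:19:52.627 2017 (UTC + 2:00)"
# and "System Uptime: 0 days 20:30:58.489".
local_tz = timezone(timedelta(hours=2))
crash = datetime(2017, 12, 16, 10, 19, 52, 627000, tzinfo=local_tz)
uptime = timedelta(hours=20, minutes=30, seconds=58, milliseconds=489)

if __name__ == "__main__":
    print(kernel_start_time(crash, uptime).isoformat())
    # → 2017-12-15T13:48:54.138000+02:00
```

So the uptime counter started around 13:48 local time on Dec 15. If the machine was powered on only a minute or two before the crash, the counter was not reset at that power-on.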