New PC gets a whea_uncorrectable_error BSOD

Igor_I

Prominent
Feb 19, 2017
5
0
510
tl;dr It would be great if somebody can help me with debugging minidumps as I don't have enough skills.

Greetings,

Long story: I ordered a new PC (a New Year's present to myself) with Win 8.1. It was kinda strange to get a BSOD on the 3rd day of using it (in Firefox while watching a video), but I ignored it and everything was fine until recently, when I got a BSOD while playing Pillars of Eternity (after playing 1-2 hours daily for a week without any issues), and some kind of nightmare started. I now get BSODs with high frequency, mostly while browsing the web.

I can't find a pattern that reliably reproduces the issue, but I have lots of minidumps.
I've tried updating the BIOS, setting the XMP memory profile in the BIOS, installing all driver and Windows updates, and resetting the BIOS to defaults (on the updated version). I ran multiple memory checks (the ones that run after a Windows reboot) and AIDA stress tests. I also tried restoring the OS from a 10-day-old restore point (from when everything was fine) and running sfc /scannow - no luck.

I thought XMP profile helped as I had no bsods for several days, but it did not last long.
Diagnostic startup (which disables all non-windows services) does not help either.

Going to return the PC to the shop to fix the issue, but it'd be great if the issue can be solved without them.

P.S. Please take a look at some AIDA\whocrashed\bluescreenview screens and minidumps:

https://www.dropbox.com/sh/btdqm8mhl1eg2h6/AACBTWMyDguH4kTLof676Grfa?dl=0
https://www.dropbox.com/sh/63dkclexseawrcj/AAB8sLeN67UyngEj0PwBwMKra?dl=0


 

Igor_I

Hello Colif,

Thank you very much for the suggestion !
I have performed the test and it seems to be OK.
Not sure why exactly it tests only one core as there should be more.

If you have any additional suggestions I'd gladly follow them as I'm out of ideas.

--- IPDT64 - Revision: 4.0.0.29
--- IPDT64 - Start Time: 19.02.2017 21:14:01

--------------------------------------------------------------------
-- Testing
--------------------------------------------------------------------
CPU 1 - SPBC - Pass.
CPU 1 - Genuine Intel - Pass.
CPU 1 - BrandString - Pass.
CPU 1 - Floating Point - Pass.
CPU 1 - Prime Number - Pass.
CPU 1 - Cache - Pass.
CPU 1 - MMXSSE - Pass.
CPU 1 - FMA - Pass.
CPU 1 - AVX - Pass.
CPU 1 - IMC - Pass.
CPU 1 - PCH - Pass.
CPU 1 - IGD - Pass.
CPU 1 - GFX - Pass.
CPU 1 - CPULoad - Pass.
CPU 1 - CPU Frequency - Pass.

IPDT64 Passed
--- IPDT64 - Revision: 4.0.0.29
--- IPDT64 - End Time: 19.02.2017 21:18:50

--------------------------------------------------------------------
PASS




 

Colif

Win 11 Master
Moderator
That is a good thing: it means it's probably not the CPU, so that is one less expensive thing to replace. I doubt it only tests 1 core; it is more likely that it says CPU 1 because it can also test multi-CPU systems.

I can't actually read the dump files. I just knew what a WHEA error is and saw you had an Intel CPU, so I could at least offer you IPDT to run.
There are a few people here who can read them, so hopefully they will look in.
 

Igor_I

Sure, hope they can take a look. Thanks again!
The most irritating thing is that I cannot reproduce it at will.

Like, I can browse the web for a long time, or play some game, and nothing happens. I did get this BSOD once while playing Dota 2, but at other times I could play literally for hours (even after this BSOD issue started).

So it mostly happens when I'm browsing the web and listening to Flash\HTML5 audio, but I cannot trigger it at will.




 

Igor_I

Hello again,

So I deleted all my games and had no BSODs for about 3 days (which was kinda insane), but it would have been crazy for that to be related, and sure enough the BSODs have returned.

I've downloaded WinDbg and tried it out a little, but I'm not sure what to do next. Is this info enough to show the vendor and ask them to replace some hardware? Every part of my PC can still be replaced within the next ~10 months, I guess.

Please find the analysis below:

6: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffffe001aa27c028, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000b2000000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000010014, Low order 32-bits of the MCi_STATUS value.

Debugging Details:
------------------


BUGCHECK_STR: 0x124_GenuineIntel

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

PROCESS_NAME: System

CURRENT_IRQL: f

ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre

STACK_TEXT:
ffffd000`78fa6df8 fffff801`593c7213 : 00000000`00000124 00000000`00000000 ffffe001`aa27c028 00000000`b2000000 : nt!KeBugCheckEx
ffffd000`78fa6e00 fffff801`58dc1b31 : 00000000`00000001 ffffe001`a5d39880 ffffe001`a5d39880 ffffe001`aa27c028 : hal!HalBugCheckSystem+0xcf
ffffd000`78fa6e40 fffff801`593c76a0 : 00000000`00000728 00000000`00000006 ffffd000`78fa7230 00000000`00000000 : nt!WheaReportHwError+0x22d
ffffd000`78fa6ea0 fffff801`593c7a0d : ffffe001`00000010 ffffe001`a5d39880 ffffd000`78fa7048 ffffe001`a5d39880 : hal!HalpMcaReportError+0x50
ffffd000`78fa6ff0 fffff801`593c78f8 : ffffe001`a5d38f10 00000000`00000001 00000000`00000006 00000000`00000000 : hal!HalpMceHandlerCore+0xe1
ffffd000`78fa7040 fffff801`593c7b42 : 00000000`00000008 00000000`00000001 00000000`00000000 00000000`00000000 : hal!HalpMceHandler+0xe4
ffffd000`78fa7080 fffff801`593c7ccf : ffffe001`a5d38f10 ffffd000`78fa72b0 00000000`00000000 00000000`00000000 : hal!HalpMceHandlerWithRendezvous+0xce
ffffd000`78fa70b0 fffff801`58d5cebb : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : hal!HalHandleMcheck+0x40
ffffd000`78fa70e0 fffff801`58d5cc71 : 00000000`00000000 fffff801`58d5cbf2 00000000`00000000 00000000`00000000 : nt!KxMcheckAbort+0x7b
ffffd000`78fa7220 fffff801`674b2214 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiMcheckAbort+0x171
ffffd000`78fb37e8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : intelppm!MWaitIdle+0x18


STACK_COMMAND: kb

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: GenuineIntel

IMAGE_NAME: GenuineIntel

DEBUG_FLR_IMAGE_TIMESTAMP: 0

IMAGE_VERSION:

FAILURE_BUCKET_ID: 0x124_GenuineIntel_PROCESSOR_TLB

BUCKET_ID: 0x124_GenuineIntel_PROCESSOR_TLB

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:0x124_genuineintel_processor_tlb

FAILURE_ID_HASH: {03e060e2-37bb-3e02-4bd4-77932f0be153}



6: kd> !errrec ffffe001aa27c028
===============================================================================
Common Platform Error Record @ ffffe001aa27c028
-------------------------------------------------------------------------------
Record Id : 01d29027dfa13135
Severity : Fatal (1)
Length : 928
Creator : Microsoft
Notify Type : Machine Check Exception
Timestamp : 2/26/2017 19:05:35 (UTC)
Flags : 0x00000000

===============================================================================
Section 0 : Processor Generic
-------------------------------------------------------------------------------
Descriptor @ ffffe001aa27c0a8
Section @ ffffe001aa27c180
Offset : 344
Length : 192
Flags : 0x00000001 Primary
Severity : Fatal

Proc. Type : x86/x64
Instr. Set : x64
Error Type : TLB error
Flags : 0x00
Level : 0
CPU Version : 0x00000000000506e3
Processor ID : 0x0000000000000006

===============================================================================
Section 1 : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor @ ffffe001aa27c0f0
Section @ ffffe001aa27c240
Offset : 536
Length : 128
Flags : 0x00000000
Severity : Fatal

Local APIC Id : 0x0000000000000006
CPU Id : e3 06 05 00 00 08 10 06 - bf fb fa 7f ff fb eb bf
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0 @ ffffe001aa27c240

===============================================================================
Section 2 : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor @ ffffe001aa27c138
Section @ ffffe001aa27c2c0
Offset : 664
Length : 264
Flags : 0x00000000
Severity : Fatal

Error : DTLBL0_ERR (Proc 6 Bank 2)
Status : 0xb200000000010014
 

Colif

Do you happen to have AI Suite installed? If so, remove it.

In your case, we're dealing with a Translation lookaside-buffer (TLB) error. It specifically occurred on Processor #6 and was reported by machine-check bank #2 (the "Proc 6 Bank 2" in your !errrec output).
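If you want to check that reading yourself, here is a rough Python sketch that rebuilds the 64-bit MCi_STATUS value from Arg3 and Arg4 of your bugcheck and decodes it. The flag positions and the TLB error-code layout (0000 0000 0001 TTLL) are taken from my reading of the machine-check chapter of Intel's Software Developer's Manual; the function names are just made up for this example, and it only handles the TLB class of error codes.

```python
# Sketch only: decode the MCi_STATUS value from a 0x124 bugcheck.
# Bit layout as described in the Intel SDM machine-check architecture chapter.

ARG3_HIGH = 0xB2000000  # bugcheck Arg3: high-order 32 bits of MCi_STATUS
ARG4_LOW = 0x00010014   # bugcheck Arg4: low-order 32 bits of MCi_STATUS

# Architectural flag bits in the upper part of MCi_STATUS.
FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds an address
    57: "PCC",    # processor context corrupt
}


def decode_status(high, low):
    """Return (status, names of set flags, MCA error code, model-specific code)."""
    status = (high << 32) | low
    set_flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF
    model_specific = (status >> 16) & 0xFFFF
    return status, set_flags, mca_code, model_specific


def describe_tlb_code(mca_code):
    """Name a TLB-class MCA error code (0000 0000 0001 TTLL), else None."""
    if mca_code >> 4 != 0b0001:
        return None
    tt = (mca_code >> 2) & 0b11  # transaction type
    ll = mca_code & 0b11         # TLB level
    kind = {0: "I", 1: "D", 2: "G"}.get(tt, "?")  # instruction, data, generic
    level = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}[ll]
    return f"{kind}TLB{level}_ERR"


status, flags, mca_code, model_specific = decode_status(ARG3_HIGH, ARG4_LOW)
print(hex(status), flags, describe_tlb_code(mca_code))
```

For your dump this gives VAL, UC, EN and PCC set, and an error code that names a level-0 data TLB error, which matches the DTLBL0_ERR line that !errrec printed. UC plus PCC is why Windows cannot recover and bugchecks instead.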

What is a TLB? In a very basic definition, a Translation lookaside-buffer (TLB) is a cache that memory management hardware uses to improve virtual address translation speed. All current desktop, laptop, and server processors include one or more TLBs in the memory management hardware, and it is nearly always present in any hardware that utilizes paged or segmented virtual memory.

By default, a TLB miss, whether caused by hardware or software, is not fatal: if the virtual address is not stored in the TLB, the translation is simply looked up the slow way from the page tables. Here, however, we're crashing on a TLB failure. This implies that the CPU determined there was corruption or a hardware error in the data, and therefore notified Windows that an unrecoverable hardware error had occurred.
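To make the miss-versus-error distinction a bit more concrete, here is a toy Python model of a TLB. It is purely illustrative and not how the hardware is built: the ToyTLB class, the 4-entry capacity and the dictionary standing in for the page table are all assumptions made for this example. The point is that a miss just costs an extra lookup and still returns the right answer.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages


class ToyTLB:
    """Tiny LRU cache of virtual-page -> physical-frame translations."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # dict: virtual page number -> frame number
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            # Hit: the translation is already cached.
            self.hits += 1
            self.entries.move_to_end(vpn)
        else:
            # Miss: not fatal, just fall back to the page table ("page walk")
            # and cache the result, evicting the least recently used entry.
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)
        return self.entries[vpn] * PAGE_SIZE + offset


tlb = ToyTLB({0: 7, 1: 3})
first = tlb.translate(0x0010)   # miss, filled from the page table
second = tlb.translate(0x0020)  # hit, same page as before
print(hex(first), hex(second), tlb.hits, tlb.misses)
```

A 0x124 with a TLB error is a different situation: the cached entry itself (or the logic around it) was found to be bad, so there is no trustworthy answer to fall back on.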

Unfortunately, this is not an easy *124 to solve, and requires a fair bit of troubleshooting as it can be: CPU itself, Motherboard, RAM, etc. In most cases, it's the CPU, with the RAM being 2nd most common and the Motherboard being 3rd.
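Since the CPU is the first suspect, it may help to tell the vendor exactly which part the error record names. Below is a small Python sketch that decodes the "CPU Version" field from your !errrec output (0x506e3) using the standard CPUID signature layout; the function name is made up for this example.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf-1 EAX signature into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model only applies to base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


family, model, stepping = decode_cpuid_signature(0x506E3)
print(f"family {family}, model {model:#x}, stepping {stepping}")
```

That works out to family 6, model 0x5e, stepping 3, which as far as I know is a Skylake-generation desktop part. The first four bytes of the "CPU Id" line (e3 06 05 00) are the same value in little-endian byte order.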

----------------

1. Ensure your temperatures are within standard and nothing's overheating. You can use a program such as Speccy if you'd like to monitor temps - http://www.piriform.com/speccy

2. Clear your CMOS (or load optimized BIOS defaults) to ensure there's no improper BIOS setting - http://pcsupport.about.com/od/fixtheproblem/tp/clearcmos.htm

3. Ensure your BIOS is up to date (yours is - version 1707).

4. The only software conflicts that usually cause *124 bugchecks are OS-to-BIOS utilities from manufacturers, like Asus' AI Suite. If you have something like this installed, remove it ASAP. I see you do (Asus PC Probe, specifically).

5. Run Memtest for NO LESS than ~8 passes (several hours). If you ran Memtest already, but not for at least ~8 passes at stock settings, run it again:

https://answers.microsoft.com/en-us/windows/forum/windows8_1-update/multiple-bsods-already-tested-hard-drives-and/8567c30e-d91d-49a3-ae95-b0b96f5a77ac
 

Igor_I

Hello Colif!
I'm really grateful for all your attempts to help.

I'm afraid I don't have ASUS AI Suite. The only ASUS software I see around is 'ASUS Product Register Program' (for graphics card, I guess, never used it) and Sonic Radar 2 (never used).
I've got some pre-installed Intel and other software (like xsplit gamecaster), but I'm starting to think that this is really a hardware issue. Just wanted to be exactly sure what's wrong.

I'll run Memtest (not the one that comes with Windows) for 8+ passes, plus Intel Burn Test, and after that I'm done - gonna bring the PC to the vendor. I'm sure it will sound stupid that I have no step-by-step way to reproduce the issue, but I do hope that the number of minidumps will bring them over to my side.

Thanks again!