BSOD 0x124 GenuineIntel_PCIEXPRESS

Status
Not open for further replies.

Wolken

Commendable
Jul 20, 2016
4
0
1,510
Since I built this computer last year, I keep getting this very same BSOD at random.
For instance, I didn't get any between January and June, but then they started again, almost every week.

I constantly install the newest drivers for each component.

My temps are all right, both during stress tests and daily usage.

These BSODs appear quite randomly, but mostly when playing games (WoW seems to trigger most of them, but Witcher 3 does too), or when doing video conversion, 3D modeling and other intensive work. The crazy part is that after one BSOD I can play for hours without crashes, but almost every time I launch WoW for the first time in a while, it BSODs within the first hour of play.


I ran stress tests on most components and none of them detected a fault:
RAM: memtest86 , memtest86+ >10 passes
CPU: prime95 overnight , intel processor diagnostic tool
GPU: Heaven & Valley benchmarks overnight , 1h furmark
[The only error I was able to find was on one RAM bank with HCI MemTest, which runs in Windows rather than from a live CD. But when I tested the same bank again for more than 24 hours I couldn't reproduce the error, so it may have been a false positive.]

My build:
cpu: intel i7-5820K
mobo: MSI X99S Sli Plus
gpu: GTX 970 (Gigabyte)
RAM: Crucial Ballistix Sport DDR4 2x8GB
system drive: Samsung SSD 850 pro 256GB
psu: 750W Seasonic S12G-750 (80+Gold)
OS: fully updated Win7

My BSOD:
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 124, {4, fffffa8008f0a038, 0, 0}

Probably caused by : GenuineIntel

Followup: MachineOwner
---------

6: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000004, PCI Express Error
Arg2: fffffa8008f0a038, Address of the WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------


BUGCHECK_STR: 0x124_GenuineIntel

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT

PROCESS_NAME: Wow-64.exe

CURRENT_IRQL: 9

ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre

STACK_TEXT:
fffff880`0eabd468 fffff800`04215a3b : 00000000`00000124 00000000`00000004 fffffa80`08f0a038 00000000`00000000 : nt!KeBugCheckEx
fffff880`0eabd470 fffff800`03da6bff : 00000000`00000001 fffffa80`08edf0c0 00000000`00000000 fffffa80`08edeb60 : hal!HalBugCheckSystem+0x1e3
fffff880`0eabd4b0 fffff880`00fa9bcf : fffffa80`00000750 fffffa80`08edf0c0 00000000`00000000 fffffa80`08f09ab0 : nt!WheaReportHwError+0x26f
fffff880`0eabd510 fffff880`00fa95f6 : 00000000`00000000 fffff880`0eabd660 fffffa80`08f20cc0 fffff800`03c8edef : pci!ExpressRootPortAerInterruptRoutine+0x27f
fffff880`0eabd570 fffff800`03c8825c : 00000000`00000031 00000000`00000000 fffffa80`08f20cc0 00000000`00000001 : pci!ExpressRootPortInterruptRoutine+0x36
fffff880`0eabd5e0 fffff800`03cb1356 : fffff880`0eabd7c0 00000000`001131ae fffffa80`033950a0 fffff880`035e6180 : nt!KiInterruptDispatch+0x16c
fffff880`0eabd770 fffff800`03c9cfff : 00000000`00000002 fffff880`00000002 fffffa80`088a03f8 00000000`00000002 : nt!MiRemoveAnyPage+0x146
fffff880`0eabd890 fffff800`03c998fe : 00000000`00000001 00000000`68d9b000 fffff880`0eabdae0 fffff680`00346cd8 : nt!MiResolveDemandZeroFault+0x54f
fffff880`0eabd980 fffff800`03c8a52e : 00000000`00000001 00000000`68d9b000 00000000`68cdba01 00000000`ffffffff : nt!MmAccessFault+0x5de
fffff880`0eabdae0 000007fe`f545ff5b : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiPageFault+0x16e
00000000`0290eeb0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x000007fe`f545ff5b


STACK_COMMAND: kb

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: GenuineIntel

IMAGE_NAME: GenuineIntel

DEBUG_FLR_IMAGE_TIMESTAMP: 0

IMAGE_VERSION:

FAILURE_BUCKET_ID: X64_0x124_GenuineIntel_PCIEXPRESS

BUCKET_ID: X64_0x124_GenuineIntel_PCIEXPRESS

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:x64_0x124_genuineintel_pciexpress

FAILURE_ID_HASH: {3fd82133-403a-028a-6939-36a9eccfc17d}

Followup: MachineOwner
---------

6: kd> !errrec fffffa8008f0a038
===============================================================================
Common Platform Error Record @ fffffa8008f0a038
-------------------------------------------------------------------------------
Record Id : 01d1e1a8ef791428
Severity : Fatal (1)
Length : 672
Creator : Microsoft
Notify Type : PCI Express Error
Timestamp : 7/20/2016 8:53:47 (UTC)
Flags : 0x00000000

===============================================================================
Section 0 : PCI Express
-------------------------------------------------------------------------------
Descriptor @ fffffa8008f0a0b8
Section @ fffffa8008f0a148
Offset : 272
Length : 208
Flags : 0x00000001 Primary
Severity : Recoverable

Port Type : Root Port
Version : 1.1
Command/Status: 0x0010/0x0407
Device Id :
VenId:DevId : 8086:2f08
Class code : 030400
Function No : 0x00
Device No : 0x03
Segment : 0x0000
Primary Bus : 0x00
Second. Bus : 0x00
Slot : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ fffffa8008f0a17c
Device Caps : 00008001 Role-Based Error Reporting: 1
Device Ctl : 0027 ur FE NF CE
Dev Status : 0002 ur fe NF ce
Root Ctl : 0008 fs nfs cs

AER Information @ fffffa8008f0a1b8
Uncorrectable Error Status : 00000020 ur ecrc mtlp rof uc ca cto fcp ptlp SD dlp und
Uncorrectable Error Mask : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Severity : 00062010 ur ecrc MTLP ROF uc ca cto FCP ptlp sd DLP und
Correctable Error Status : 00000000 adv rtto rnro dllp tlp re
Correctable Error Mask : 00000000 adv rtto rnro dllp tlp re
Caps & Control : 00000005 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
Header Log : 00000000 00000000 00000000 00000000
Root Error Command : 00000000 fen nfen cen
Root Error Status : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
Correctable Error Source ID : 00,00,00
Correctable Error Source ID : 00,00,00

===============================================================================
Section 1 : Processor Generic
-------------------------------------------------------------------------------
Descriptor @ fffffa8008f0a100
Section @ fffffa8008f0a218
Offset : 480
Length : 192
Flags : 0x00000000
Severity : Informational

Proc. Type : x86/x64
Instr. Set : x64
CPU Version : 0x00000000000306f2
Processor ID : 0x0000000000000006

Honestly, I don't know what else to do!
Thank you for any help.
 
Two devices were talking on your PCIe bus and one of them just went away.
In the Uncorrectable Error Status line, the flags in capital letters are the ones that are set;
in this case SD = Surprise Down.

Generally, you update the BIOS, make sure your PCIe bus is not overclocked (set it to 100 MHz in the BIOS),
make sure your GPU is not overclocked, and remove any overclocking drivers for the CPU and GPU.

fix anything that can screw up a GPU:
update the motherboard sound drivers
update the network drivers
check for overheating (not likely in this case)
check for proper power connections (supplemental power connections; check the actual connectors for burn or overheating marks)

and try again.

The problem is that many devices now talk through PCIe and you don't know which one failed
(best guess is the video card).
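For what it's worth, the record does say which port raised the error, even though it can't say which device behind that port dropped off. A minimal Python sketch, assuming the field labels look exactly like the !errrec output pasted above (the function name reporting_port and the RECORD excerpt are only for illustration), that turns those fields into the usual bus:device.function form:

```python
import re

# Lines copied from the "Section 0 : PCI Express" part of the !errrec output in this thread.
RECORD = """\
VenId:DevId : 8086:2f08
Function No : 0x00
Device No : 0x03
Primary Bus : 0x00
"""


def reporting_port(text):
    """Extract vendor/device ID and bus:device.function of the port that reported the error."""

    def field(label):
        # Each field is printed as "<label> : <value>" on its own line.
        match = re.search(rf"^{re.escape(label)}\s*:\s*(\S+)", text, re.MULTILINE)
        if match is None:
            raise ValueError(f"field not found: {label}")
        return match.group(1)

    vendor, device = field("VenId:DevId").split(":")
    bus = int(field("Primary Bus"), 16)
    dev = int(field("Device No"), 16)
    func = int(field("Function No"), 16)
    return {"vendor": vendor, "device": device, "bdf": f"{bus:02x}:{dev:02x}.{func:x}"}


port = reporting_port(RECORD)
print(port)
```

For this dump that is the root port at 00:03.0 with vendor ID 8086 (Intel). Which physical slot hangs off that root port depends on the board layout, which the dump itself does not state.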

Update the chipset drivers and any USB 3.0 drivers from your motherboard vendor.

If all else fails, set your memory dump type to kernel or full, produce a memory dump and
have it looked at. The kernel dump can (maybe) tell you the name of the driver that is failing.
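If you would rather script the dump setting than click through Startup and Recovery, here is a rough Python sketch that only builds the reg.exe command line and does not run it. The helper name crash_dump_command is made up for this example; the CrashDumpEnabled values are the documented ones (1 = complete, 2 = kernel, 3 = small).

```python
# CrashDumpEnabled values under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl.
DUMP_TYPES = {"none": 0, "complete": 1, "kernel": 2, "small": 3}

CRASH_CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"


def crash_dump_command(kind):
    """Build (but do not execute) the reg.exe arguments that select a dump type."""
    if kind not in DUMP_TYPES:
        raise ValueError(f"unknown dump type: {kind!r}")
    return [
        "reg", "add", CRASH_CONTROL_KEY,
        "/v", "CrashDumpEnabled",
        "/t", "REG_DWORD",
        "/d", str(DUMP_TYPES[kind]),
        "/f",  # overwrite the existing value without prompting
    ]


command = crash_dump_command("kernel")
print(" ".join(command))
```

The printed command would have to be run from an elevated prompt, and the setting takes effect after a reboot. The pagefile on the system drive also needs to be large enough to hold the dump.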


Here are some of the codes for the PCIe errors (helps when googling for a correct answer):
UR = Unsupported Request Error
MTLP = Malformed TLP
SD = Surprise Down
ROF = Receiver Overflow
UC = Unexpected Completion
CTO = Completion Timeout
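To check the flags by hand, the hex value in front of them can be decoded bit by bit. A minimal Python sketch, assuming the standard AER Uncorrectable Error Status bit layout from the PCI Express specification (the table and function names are just for this example), applied to the two values from the dump above:

```python
# Bit positions of the AER Uncorrectable Error Status / Severity registers.
# The short names match the flag labels WinDbg prints in !errrec output.
AER_UNCORRECTABLE_BITS = {
    0: ("und", "Undefined"),
    4: ("dlp", "Data Link Protocol Error"),
    5: ("sd", "Surprise Down"),
    12: ("ptlp", "Poisoned TLP"),
    13: ("fcp", "Flow Control Protocol Error"),
    14: ("cto", "Completion Timeout"),
    15: ("ca", "Completer Abort"),
    16: ("uc", "Unexpected Completion"),
    17: ("rof", "Receiver Overflow"),
    18: ("mtlp", "Malformed TLP"),
    19: ("ecrc", "ECRC Error"),
    20: ("ur", "Unsupported Request"),
}


def decode_aer_uncorrectable(value):
    """Return the (short, long) names of every known bit set in value, lowest bit first."""
    return [
        names
        for bit, names in sorted(AER_UNCORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


status = decode_aer_uncorrectable(0x00000020)    # "Uncorrectable Error Status" from the dump
severity = decode_aer_uncorrectable(0x00062010)  # "Uncorrectable Error Severity" from the dump
print(status)
print(severity)
```

The status value decodes to Surprise Down only, and the severity value decodes to the same four flags WinDbg shows in capitals on that line (DLP, FCP, ROF, MTLP).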
 

Wolken

Thank you for your help!!!
@hang-the-nine
I ran memtest86 overnight with both sticks together, and HCI MemTest on the sticks one at a time.
I did not try a clean installation, as I thought this error points to a hardware failure rather than a software issue.
I gave up on the idea of overclocking, since the system wasn't stable anyway, so no OC.

@johnbl
I have updated all the mobo drivers and the BIOS multiple times during the last year without any effect. I reset my BIOS to default settings, so everything is set to Auto. Is this fine, or should I force base values?
If I follow your diagnosis correctly, the information reported in the dump would rule out a RAM problem, am I right?
Could the Surprise Down be caused by just ANY peripheral connected via USB or SATA? For instance a USB keyboard, or a non-system hard drive?
Thanks again.
 
Set your system to make a kernel memory dump, then after the next bugcheck upload it to a server and post a link.
Yes, USB drivers can mess up a PCIe port. I can take a look at the internal error logs in the kernel dump.
I do not think it is a system RAM problem.



 

Wolken



You can get the dump at this mega link

Windbg additionally reports this on the kernel dump:
STACK_TEXT:
fffff880`0eabd468 fffff800`04215a3b : 00000000`00000124 00000000`00000004 fffffa80`08f0a038 00000000`00000000 : nt!KeBugCheckEx
fffff880`0eabd470 fffff800`03da6bff : 00000000`00000001 fffffa80`08edf0c0 00000000`00000000 fffffa80`08edeb60 : hal!HalBugCheckSystem+0x1e3
fffff880`0eabd4b0 fffff880`00fa9bcf : fffffa80`00000750 fffffa80`08edf0c0 00000000`00000000 fffffa80`08f09ab0 : nt!WheaReportHwError+0x26f
fffff880`0eabd510 fffff880`00fa95f6 : 00000000`00000000 fffff880`0eabd660 fffffa80`08f20cc0 fffff800`03c8edef : pci!ExpressRootPortAerInterruptRoutine+0x27f
fffff880`0eabd570 fffff800`03c8825c : 00000000`00000031 00000000`00000000 fffffa80`08f20cc0 00000000`00000001 : pci!ExpressRootPortInterruptRoutine+0x36
fffff880`0eabd5e0 fffff800`03cb1356 : fffff880`0eabd7c0 00000000`001131ae fffffa80`033950a0 fffff880`035e6180 : nt!KiInterruptDispatch+0x16c
fffff880`0eabd770 fffff800`03c9cfff : 00000000`00000002 fffff880`00000002 fffffa80`088a03f8 00000000`00000002 : nt!MiRemoveAnyPage+0x146
fffff880`0eabd890 fffff800`03c998fe : 00000000`00000001 00000000`68d9b000 fffff880`0eabdae0 fffff680`00346cd8 : nt!MiResolveDemandZeroFault+0x54f
fffff880`0eabd980 fffff800`03c8a52e : 00000000`00000001 00000000`68d9b000 00000000`68cdba01 00000000`ffffffff : nt!MmAccessFault+0x5de
fffff880`0eabdae0 000007fe`f545ff5b : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiPageFault+0x16e
00000000`0290eeb0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x000007fe`f545ff5b

Thank you a lot!!



 
The system also thinks one of your chipset drivers is unknown.
This should be the Intel C610 series/X99 chipset.

Install the updated motherboard drivers (all of them):
https://us.msi.com/Motherboard/support/X99S-SLI-PLUS.html#down-driver&Win7 64

I think the new drivers have a chance of fixing the various problems. I would still look for an update for the Microsoft webcam.

-----------------------------



-----------
the system thinks the actual physical devices below were removed right before the crash.
----------
It looks like your system is running out of virtual memory.
You might set your pagefile to 2 times the size of your physical RAM. This would give you more time before you crash.
Your system was up for 22 hours with an 8 GB pagefile.sys.
--------------

The pool tag using your pagefile was 'ismc', which I think belongs to the Intel storage driver.
You can update your Intel storage drivers from the motherboard website or from Intel.
The driver date was Nov 04 02:27:49 2015.

I kind of expect that another driver is causing the problem.
I would be looking at nx6000.sys (Thu Dec 02 14:23:39 2010), the Microsoft LifeCam driver.
Look here and see if this is the correct product:
https://www.microsoft.com/accessories/en-us/d/lifecam-hd-5000
If so, you might want to update the drivers and the firmware for the device.




-----------------
Plug and Play reports two devices with problems:
one is i8042prt, the old PS/2 keyboard/mouse port driver
the second is teamviewervpn

Here is the error text:

Dumping IopRootDeviceNode (= 0xfffffa8008577350)
DevNode 0xfffffa8008f11010 for PDO 0xfffffa8008f0fe40
InstancePath is "ACPI\PNP0303\0"
ServiceName is "i8042prt"
State = DeviceNodeRemoved (0x312)
Previous State = DeviceNodeRemovePendingCloses (0x311)
Problem = CM_PROB_DEVICE_NOT_THERE
DevNode 0xfffffa800859bd90 for PDO 0xfffffa800859b060
InstancePath is "Root\NET\0001"
ServiceName is "teamviewervpn"
State = DeviceNodeInitialized (0x302)
Previous State = DeviceNodeUninitialized (0x301)
Problem = CM_PROB_DISABLED
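
If you want to pull just the problem devices out of a longer devnode listing, a small Python sketch along these lines would do it. It assumes the text is laid out exactly like the excerpt above; the function name problem_devices is made up for this example.

```python
import re

# Excerpt of the devnode listing quoted in this thread.
DEVNODE_DUMP = r"""
DevNode 0xfffffa8008f11010 for PDO 0xfffffa8008f0fe40
InstancePath is "ACPI\PNP0303\0"
ServiceName is "i8042prt"
Problem = CM_PROB_DEVICE_NOT_THERE
DevNode 0xfffffa800859bd90 for PDO 0xfffffa800859b060
InstancePath is "Root\NET\0001"
ServiceName is "teamviewervpn"
Problem = CM_PROB_DISABLED
"""


def problem_devices(text):
    """Collect InstancePath, ServiceName and Problem for each DevNode entry."""
    devices = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("DevNode "):
            devices.append({})  # a new device entry starts here
        elif devices:
            quoted = re.match(r'(InstancePath|ServiceName) is "(.*)"$', line)
            if quoted:
                devices[-1][quoted.group(1)] = quoted.group(2)
            elif line.startswith("Problem = "):
                devices[-1]["Problem"] = line[len("Problem = "):]
    return devices


devices = problem_devices(DEVNODE_DUMP)
for device in devices:
    print(device["ServiceName"], device["Problem"])
```

CM_PROB_DISABLED is what Device Manager shows as Code 22 (device disabled), and CM_PROB_DEVICE_NOT_THERE is Code 24 (device not present).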



machine info:
BIOS Version 1.90
BIOS Release Date 10/30/2015
Manufacturer MSI
Product Name MS-7885
Version 1.0
Product X99S SLI PLUS (MS-7885)
Version 1.0
Processor Version Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
Processor Voltage 80h - 0.0V
External Clock 100MHz
Max Speed 4000MHz
Current Speed 3300MHz

memory:
Speed 2400MHz
Part Number BLS8G4D240FSA.M16FA


 

Wolken


Thank you for your effort man, I really appreciate it!

I installed all the mobo drivers and the latest NVIDIA driver as well, and unplugged all the USB devices except keyboard and mouse. Now it's a matter of waiting, as the BSODs happen quite randomly.

Still, it seems strange to me that the problem would be caused by these peripherals. The same BSODs have been happening since I built the PC in January 2015, before I had the webcam and other hardware, and about every 3-4 months I have installed all the updated drivers without any improvement.



I increased the pagefile to 16GB, but could this actually have been the cause of the crashes, or does increasing it only make the BSOD harder to trigger? Wouldn't that just increase the time needed to diagnose the actual cause?

The TeamViewer VPN is a virtual device that I've disabled intentionally; I think that's what that message means...

I'll post an update if I get new crashes or if I don't get any for a while, thanks!



 