New build / Win 10 nightmare - looking for guidance

JodahStarlen

Distinguished
Dec 18, 2010
13
0
18,510
I just built a new PC and am dealing with some incredibly frustrating recurring failures. I'd love to know what my next steps should be to isolate where my failure is; I'm not far from RMA'ing the whole build and going for a nice long walk. :(

First, stats:
OS: Windows 10 Pro (retail copy, not upgrade)
CPU: i7-6700K w/ Hyper 212 EVO
MOBO: Gigabyte Z170MX-Gaming 5 && MSI Z170M Mortar
RAM: Corsair Vengeance LPX 32gb (2x16) && G.Skill Ripjaws V 16gb (2x8) (both DDR4 @ 2400)
SSD: Samsung EVO 850 500gb
PSU: Corsair CX750M (one old used one, then another new from the store)
Case: Thermaltake Core V21
No wifi, No GPU, no optical drives, no additional peripherals.

Minidumps (this is a small sample from a few hours of working, but pretty consistent with pattern): https://www.dropbox.com/s/qhhdys3rlnr5kn2/dumps-11-15.zip

Local tests:
sfc /scannow (no invalid areas found)
chkdsk (no issues detected)
dism /restorehealth (went through the process, but made no changes)

History:
This was a mostly new build I knocked out over the weekend. It started with the Z170MX mobo, Corsair 2x16 RAM & my CX750M PSU from my previous PC (the PSU itself is a couple of years old). All parts other than the PSU are brand new, opened-the-box-myself.

POST'ed on the first try with no problems, Windows 10 install from a USB key (created from Microsoft ISO download & Rufus bootable USB maker) went as expected. Within 5 minutes I had my first BSOD, and they've never stopped. I can turn the PC on and watch it bluescreen at random intervals all on its own, with varying messages. Occasionally I'll get maybe 10 minutes of zippy use out of it before it fails. I've been unable to isolate a particular app or action that triggers it, but there's no warning or slow-down. It's an immediate failure, dump & restart.

Common BSOD errors include:
- IRQL_NOT_LESS_OR_EQUAL
- SYSTEM_SERVICE_EXCEPTION
- MEMORY_MANAGEMENT
- SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
... along with a handful of others. Most don't have associated files on the BSOD. BlueScreenView indicates most of them source from `ntoskrnl`, which I understand is core to Windows and probably a red herring.

I've tried all of the following:
- Reset Windows
- Refresh Windows
- Format & clean install Windows
- Replace the PSU (bought a brand new CX750M and replaced the PSU & cables)
- Swapped the motherboard (Gigabyte Z170MX -> MSI Z170M)
- Moved/reseated RAM
- Tried only one stick of RAM
- Moved one stick of RAM between slots.
- Enabled/disabled XMP to knock RAM to 2400
- Run ram at default speed (2133)
- Tried new RAM entirely (Corsair 2x16 -> G.Skill 2x8, both DDR4 @ 2400)
- Repeated all steps with the new RAM
- Update BIOS (on both boards, incrementally through updates to see if a particular rev stabilizes)
- Update drivers (manufacturer sourced, via Driver Verifier, etc. No unrecognized devices)
- Tried a few different SATA cables and moved to different SATA ports
- Swapping keyboards
- Swapping mice
- Swapping monitors (tearing my hair out at this point)
- Unplugging all case-related peripherals (USB front ports, LEDs, etc)
- Plugging into a different surge protector
- Plugging into an entirely different circuit in a different part of the house

None of this has affected the errors: they're still the same, even after swapping out the mobos & RAM. The only portion of this build that I haven't attempted to replace at least once is the i7 itself & the SSD. I can definitely go grab another SSD if that's a possible point of failure, but I'm timeboxing this build to another week. If I'm not on track to resolve it by then I'm just going to buy a prebuilt from someone so I don't feel so bad about myself when it doesn't work. :)
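Put as a quick process-of-elimination sketch: subtract everything that has already been swapped from the full parts list, and whatever is left is what still needs testing. The component labels below are just shorthand for the parts listed above, chosen for this illustration.

```python
# Parts of the build, using informal labels taken from the list above.
components = {
    "CPU", "motherboard", "RAM", "SSD", "PSU",
    "SATA cable", "keyboard", "mouse", "monitor", "power circuit",
}

# Parts that were replaced or swapped at least once with no change in the errors.
swapped_without_effect = {
    "motherboard", "RAM", "PSU",
    "SATA cable", "keyboard", "mouse", "monitor", "power circuit",
}

# Whatever has never been swapped is still a suspect.
untested = sorted(components - swapped_without_effect)
print(untested)
```

With the swaps described so far, that leaves only the CPU and the SSD.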

I'm STUMPED. I know I've gotta be missing something obvious - I've never struggled with a build issue this badly. Does anyone have any tips/suggestions/thoughts about where else I should be debugging this? Nothing's too crazy to try, and I'm happy to provide whatever logs or output might help track down what's up. Thanks for sticking with me!


 

jpe1701

Honorable
Wow, that's tough. Any chance it's got something to do with the installation media? It just seems to me you have tried everything else. That must be really annoying. Hopefully some of the smart people on here can help.
 

weilin

Distinguished
Have you made sure the motherboard has been upgraded to the latest BIOS? Did you run memtest86+ to make sure there are no memory errors?

Your errors point to a memory issue. Depending on what ends up in the bad memory sector, a different error pops up...

If all that was done and passes... Try running Prime95, you can see if that results in an error...
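To illustrate why one bad memory location could show up as several different stop codes, here is a toy simulation. It is only a sketch: the region names and the region-to-error mapping are made up for illustration and are not how Windows actually chooses a bug check code.

```python
import random

# Hypothetical mapping: what kind of data happened to sit in the faulty
# location, and the sort of symptom corrupting it might produce.
# Illustrative only, not taken from any Windows documentation.
SYMPTOM_BY_REGION = {
    "kernel pointer": "IRQL_NOT_LESS_OR_EQUAL",
    "page table entry": "MEMORY_MANAGEMENT",
    "driver code": "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    "system call data": "SYSTEM_SERVICE_EXCEPTION",
    "unused memory": "no crash this time",
}


def simulate_crashes(runs, seed=0):
    """Pick, for each boot, which kind of data landed on the bad cell."""
    rng = random.Random(seed)  # seeded so repeated runs give the same sequence
    regions = list(SYMPTOM_BY_REGION)
    return [SYMPTOM_BY_REGION[rng.choice(regions)] for _ in range(runs)]


if __name__ == "__main__":
    for symptom in simulate_crashes(5):
        print(symptom)
```

The same single fault produces a varying mix of messages from run to run, which would match the "random intervals, varying messages" pattern described above.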
 

JodahStarlen

Distinguished
@jpe1701 - I wondered the same about installation media. I'm using a Samsung 32g USB3 drive, formatted with the latest Win10.iso from Microsoft, made bootable with the Rufus utility (this was the most commonly-suggested procedure I found for PCs without disk drives). I did re-download the ISO on a different PC last night and re-formatted the usb stick, but a clean install from the rebuilt USB resulted in the same errors during install/use. :( I also tried using the Windows "Refresh Tool", that purports to download the Windows installer while installing. Any other thoughts re: ways to install? I don't have an external disk drive or a disk to install from.

@weilin - I've tried both default & updated BIOS'es on both motherboards but saw the same errors in each case. I also saw the same errors with entirely different RAM. I thought the first mobo might have had a bad DIMM slot but two in a row with the same issues, even across different RAM, led me to believe that's not it. I am running memtest86 now and will try Prime95 afterwards, if I can get the system to stay up long enough to run it. I'll let you know the results, though! Thanks for the suggestions.
 

JodahStarlen

Distinguished
Update:
- I had some old HDDs lying around so I tried a fresh install of Windows on one of those. Got the same errors during install & while using Windows (specifically IRQL & SYSTEM_SERVICE). So I think I've ruled out the SSD.

- memtest86 ran for ~10 hours with the Corsair RAM and had only one error (pic below). I'm now unable to POST with the G.Skill. Maybe I just got really unlucky and had 2 bad sets of RAM? I may try to buy another set and test it out today.
Memtest86 results: https://www.dropbox.com/s/egd5zvm27xm7z28/20161116_172008.jpg?dl=0


Any other thoughts? I'm asking around locally for an alternative installation media as well, in case that's part of my problem.
Thanks!
 

weilin

Distinguished
Yes, any memory error is an instant disqualification. Try running memtest with only one stick of memory. If that passes, see if your computer is stable with just that stick.

Then test the other stick to double confirm it was that stick causing all the issues...
 

JodahStarlen

Distinguished
I went ahead and bought an entirely new set of ram to see if that was the issue. Had 3 of the same sorts of bluescreens within 15 minutes of rebooting. :(

Is there any chance this could be the processor? That's the only component I haven't been able to test a different one of at this point.
 

JodahStarlen

Distinguished
This was apparently a processor issue - after the third set of brand new RAM failed, I RMA'ed the i7 for a new one. The new processor's working great so far.

Thanks for all the help!