Help narrowing down memory/motherboard issue!

luxuselg

Distinguished
Sep 27, 2007
17
0
18,510
Hello!

In short:
My current problem is that I get intermittent BSODs of type 'MEMORY_MANAGEMENT' and others, general system instability, and some spontaneous reboots.
This prompted a memtest, which reported over 400k errors. I then tried to test each stick of memory by itself, but no configuration other than the initial one will boot.
Could this be a motherboard issue, or is it a memory issue?

The full story with more detail:
The issues started while playing relatively demanding fullscreen games a couple of weeks back. My displays would suddenly go dark, the sound would glitch out and/or loop, and the system itself would (apparently) hardlock.

After some research, I marked this down to unrecoverable display driver crashes, and tried disabling SLI. No dice.
Then I tried actually disabling the card in slot #1 (the 780Ti SC) in Device Manager, thus also disabling SLI completely, and plugged both my screens into the second card.
This seemed to work without issues, except for a huge hassle on reboot, as I would have to plug one screen back into (the disabled) card #1 to be able to get into Windows, then re-enable card #1 and reboot, THEN plug the screen back into #2 and disable #1 again. Whew.

At any rate, this worked without crashing for about a week. Then weird stuff started happening. Chrome tabs would crash or fail to load, Dropbox and Google Drive would crash on startup, and the system would occasionally just reboot spontaneously.

Figuring this might be because of my ‘unconventional’ GPU setup, I removed the card in slot #1, and replaced it with the card from slot #2, taking the #1 card out completely.
This is when the BSODs started happening.
To test whether or not card #1 was, in fact, broken, I ran some 3DMark stress tests on card #2 (now in slot #1). They ran fine, but as soon as I attempted to run a regular benchmark, the system crashed with a MEMORY_MANAGEMENT stop error. Upon reboot I was greeted by another identical bluescreen about 5-10 seconds after logging into Windows.

These bluescreens have been popping up intermittently since then.
Because of the memory-related error message, I figured I might as well try a Windows Memory Diagnostic and see if any errors popped up, but the test came out clean after 2 passes.
Believing this had to be a mistake, I ran Memtest from a USB stick.
Memtest would consistently crash on test #1 or #2 with an error saying it couldn’t start CPU 0, so I had to disable the first two tests to get it to run at all.
I ran the remaining tests for 4 passes, and was presented with a little over 400,000 errors.
You can take a look at the memtest log itself here.

Being fairly confident that I’d narrowed this down to a memory issue, I then started the process of testing each stick on its own to isolate the faulty module, but no configuration of memory sticks other than my initial configuration would boot at all.

Here are my test results in more detail:
Orig config (rotated 90 degrees clockwise):

DIMM Slot A1 - Serial No. 348325
DIMM Slot A2 -
DIMM Slot B1 - Serial No. 348324
DIMM Slot B2 -

CPU SOCKET

DIMM Slot D2 -
DIMM Slot D1 - Serial No. 348322
DIMM Slot C2 -
DIMM Slot C1 - Serial No. 348323

(This is the config ASUS recommends for this mobo)

Test 1:
A1 - 348325
A2 -
B1 -
B2 -

D2 -
D1 -
C2 -
C1 -
Result:
PC reboots after ~20 seconds, no display output at all. No beeps. Loops endlessly.

Test 2:
A1 -
A2 -
B1 - 348324
B2 -

D2 -
D1 -
C2 -
C1 -
Result:
Same as test 1

Test 3:
A1 -
A2 -
B1 -
B2 -


D2 -
D1 - 348322
C2 -
C1 -
Result:
Same as test 1

Test 4:
A1 -
A2 -
B1 -
B2 -

D2 -
D1 -
C2 -
C1 - 348323
Result:
Same as test 1

Control test because wtf:
A1 - 348325
A2 -
B1 - 348324
B2 -

D2 -
D1 - 348322
C2 -
C1 - 348323
Result:
Normal boot to windows without issues.

Test 5:
A1 -
A2 -
B1 - 348324
B2 -

D2 -
D1 - 348322
C2 -
C1 - 348323
Result:
Same as test 1

Test 6:
A1 -
A2 -
B1 - 348324
B2 -

D2 -
D1 -
C2 -
C1 - 348323
Result:
Same as test 1

Test 7:
A1 -
A2 - 348325
B1 -
B2 - 348324

D2 - 348322
D1 -
C2 - 348323
C1 -
Result:
No reboot, but no display output. No beeps. No response to any keyboard or mouse input.

## I reset the CMOS at this point ##

Test 8:
A1 -
A2 - 348325
B1 -
B2 - 348324

D2 - 348322
D1 -
C2 - 348323
C1 -
Result:
Same as test 7

Control test 2:
A1 - 348325
A2 -
B1 - 348324
B2 -

D2 -
D1 - 348322
C2 -
C1 - 348323
Result:
Normal boot, but since the BIOS is back to defaults, the system will not pass POST because it can’t find the CPU fan (what).
Setting the CPU fan to manual 100% speed fixes this error.
(This error never happened before the BSODs started.)
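For anyone skimming, the runs above can be condensed into a small tally. This is only a sketch in Python: the slot/serial data is copied from the list above, while the outcome labels ("boots", "reboot loop", "hang") and the helper names are shorthand invented for illustration. Control 2 is counted as booting, since it does so once the CPU fan setting is changed.

```python
# Slot -> serial for the original (working) layout, copied from the post.
ORIGINAL = {"A1": "348325", "B1": "348324", "D1": "348322", "C1": "348323"}
# Same sticks moved to the second slot of each channel (tests 7 and 8).
SHIFTED = {"A2": "348325", "B2": "348324", "D2": "348322", "C2": "348323"}

# (label, populated slots, outcome). Outcome labels are illustrative shorthand.
TESTS = [
    ("Test 1", {"A1": "348325"}, "reboot loop"),
    ("Test 2", {"B1": "348324"}, "reboot loop"),
    ("Test 3", {"D1": "348322"}, "reboot loop"),
    ("Test 4", {"C1": "348323"}, "reboot loop"),
    ("Control 1", dict(ORIGINAL), "boots"),
    ("Test 5", {"B1": "348324", "D1": "348322", "C1": "348323"}, "reboot loop"),
    ("Test 6", {"B1": "348324", "C1": "348323"}, "reboot loop"),
    ("Test 7", dict(SHIFTED), "hang"),
    ("Test 8", dict(SHIFTED), "hang"),  # repeated after the CMOS reset
    ("Control 2", dict(ORIGINAL), "boots"),  # after working around the fan error
]


def labels_with(tests, outcome):
    """Return the labels of all runs that ended with the given outcome."""
    return [label for label, _, result in tests if result == outcome]


def only_original_boots(tests, original):
    """True if a run booted exactly when it used the original layout."""
    return all((slots == original) == (result == "boots")
               for _, slots, result in tests)


for outcome in ("boots", "reboot loop", "hang"):
    print(f"{outcome}: {', '.join(labels_with(TESTS, outcome))}")
print("Only the original layout boots:", only_original_boots(TESTS, ORIGINAL))
```

Grouping by outcome makes the pattern easier to see: every partial configuration reboot-loops, the shifted layout hangs, and only the full original layout boots.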
And now I’m here.
I’m sort of at a loss at this point. I’m reasonably certain that the problem lies either with the memory or the motherboard, but I don’t know how to proceed to narrow it down further.

Any help would be appreciated!
 

luxuselg

I'm sorry for the wall of text. :)

Yes, the board will boot when all 4 sticks are installed in the recommended slots, but it will not boot in any other case (i.e. fewer than 4 sticks, or sticks in other positions).
 

luxuselg

I also realize I forgot to post my system specs. Here they are:
Motherboard: Asus Rampage IV Black Edition
Processor: Intel i7 4930K
Memory: 4x4GB Corsair Dominator CMD16GX3M4A1866C9 (ver 3.24)
Video Card # 1: EVGA GTX 780 Ti SC
Video Card # 2: EVGA GTX 780 Ti K|ngp|n
Hard Drive # 1: Samsung SSD 840 EVO 120GB
Hard Drive # 2: Samsung SSD 850 EVO 250GB
Hard Drive # 3: Some 2TB SATA drive
Power Supply: Corsair AX1200i
Case: Corsair Obsidian 750D
Monitor: Asus PG278Q
Operating System: Windows 7 64bit
 
Not sure if you did, but clearing the CMOS will usually fix that, since the board may get stuck working only with that RAM configuration. I had a Gigabyte motherboard die on me in the summer: it wouldn't boot with more than one stick of RAM installed, but I would still get a BSOD a few minutes after booting into Windows.
 

luxuselg

So after clearing the CMOS one more time, I got the system to run with only one stick. On booting to Windows with the first stick in for testing, I got another bluescreen, but the stick produced no errors in memtest. (All 12 tests, 4 passes.)
I tested a second stick afterwards, which also came out clean. I'll test the remaining two sticks after work today.

I find it odd that even though memtest didn't give me any errors, I would still get a bluescreen in Windows, but that might just be because of the hardware change? The BSOD didn't have any specific error message, anyway.

Will update when I've tested the two remaining sticks.
 


It could also be that the RAM isn't being used much before you boot into Windows, so a stick might only start throwing errors under load. The only thing you can do is test all your RAM sticks one at a time; if they all check out with no errors or BSODs, then it's likely a bad mobo.
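That rule of thumb can be written down as a tiny decision helper. This is a rough sketch only: `StickResult`, `diagnose`, and the sample labels are hypothetical names made up for illustration, and the "inconclusive" branch is an added assumption for the case the advice above does not cover (clean memtest but a BSOD in Windows).

```python
from dataclasses import dataclass


@dataclass
class StickResult:
    label: str           # any identifier for the stick, e.g. its serial number
    memtest_errors: int  # errors reported with only this stick installed
    bsod: bool           # did Windows blue-screen with only this stick installed?


def diagnose(results):
    """Apply the one-stick-at-a-time rule of thumb to a list of results."""
    failing = [r.label for r in results if r.memtest_errors > 0]
    if failing:
        return "suspect RAM: " + ", ".join(failing)
    if any(r.bsod for r in results):
        # Assumption: clean memtest plus a BSOD does not point either way yet.
        return "inconclusive"
    return "likely motherboard"


# Illustrative values only, loosely following the update above:
# first stick blue-screened but showed no memtest errors, second was clean.
so_far = [
    StickResult("stick-1", 0, True),
    StickResult("stick-2", 0, False),
]
print(diagnose(so_far))
```

The helper only makes sense once all four sticks have been tested; a partial list can shift from one verdict to another as results come in.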