Please help with a mysterious memory-related problem

frankiee69

Commendable
Apr 14, 2016
9
0
1,510
Hi guys, I could really use some help from the experts. I am literally banging my head on this!

I have an R4BE board with a 4930K and 64GB of RAM (8x8GB G.Skill DIMMs rated at 2133 MHz). This machine dual-boots Win 8.1 and OS X 10.11.

While the machine ran flawlessly for over 2 years, a strange problem occurred at the time I upgraded to 10.11, and I still do not know exactly what is causing it. What I get is (usually) a "type 14=page fault" Kernel Panic when running OS X, but only when:

  • the machine has slept before
  • I am running a Memtest within OS X
  • ... or usually at shutdown / restart, but sometimes the machine just restarts "out of nowhere"

and only with all 8 DIMMs inserted.

This does not happen when:

  • running Windows (at least I could not reproduce it there)
  • running the UEFI version of Memtest (for > 20 hours)
  • having only four DIMMs in the machine. It does not matter which four: the problem goes away when I take out four of them, and it also stays away when I swap the remaining four with the four I took out

So, what could this be? The RAM? The memory controller of the CPU? A software problem?

I have already tried a LOT of things, including:

  • Disabling USB2 and USB3
  • Removing cards
  • I even swapped the mobo for another model! (R4E and then R4BE)
  • OC or no OC does not matter (I overclocked mildly, i.e. CPU to x42 and RAM to 1866MHz, which is still well below spec, with no problems in IBT, Prime95 etc., so I guess it was absolutely stable)

I have absolutely no idea what to do, so any help or any tips are GREATLY appreciated!
 

frankiee69

The model number is: F3-17000CL11Q2-64GBZLD. It's a kit and it should be certified.

For me the main question now is how to find out whether it's a bad IMC or bad RAM without exchanging everything (that would be pretty expensive), or, if both seem to be OK, how to stabilize the memory situation with regard to sleep / wake. It might be that 10.11 has become kinda "aggressive" with paging and / or virtual memory. It looks like the kernel maps are getting corrupted or something. Still, it is strange that it only happens after the machine has slept.
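
As a side note on how the "17000" in the model number relates to the 2133 MHz rating from the first post: DDR3 module names are conventionally derived from the transfer rate times the 8-byte (64-bit) module bus width, rounded. A minimal sketch of that arithmetic, assuming the "17000" in the G.Skill part number follows the usual PC3 naming (the function name is just for illustration):

```python
# Peak theoretical bandwidth of one DDR3 module, in MB/s.
# Assumption: standard 64-bit (8-byte) DIMM data bus.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_mb_s(mega_transfers_per_s: int) -> int:
    """Return transfer rate (MT/s) multiplied by the bus width in bytes."""
    return mega_transfers_per_s * BUS_WIDTH_BYTES


# DDR3-2133 (the kit's rating) and DDR3-1866 (the speed used for the mild OC)
for rate in (2133, 1866):
    print(f"DDR3-{rate}: {peak_bandwidth_mb_s(rate)} MB/s")
```

The result for 2133 is close to 17000, which is why such modules are sold as PC3-17000; the module name is a rounded marketing figure, not an exact measurement.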

Maybe some BIOS tweaks could help to find out?
 

frankiee69

OK, I did that and it actually seems the problem got worse, at least once I got to 1.25V. (I was careful and first tried 1.1, then 1.2, then 1.25 as you recommended.)

With the last increase I even had *multiple* panics during a single restart, I think three or even four of them in a row, something I never had before! In the subsequent test my machine refused to reboot at all, i.e. I got a black screen after all apps had quit, so after a while I had to press the reset button since no restart occurred and the machine seemed to be stuck. At the next restart I did see the usual "type 14" panic report though. What also seemed to change is that the crashing process varied more, i.e. it was not only launchd that crashed (usually it's that one 95% of the time) but all sorts of other processes like kextd, mdworker or aslmanager. But the type was still "page fault", just in a different process. BTW the page fault error code is always "0", and CR2 is always "0x0000000000000030", if that helps. So while the process that apparently crashes can vary, some other parameters are always the same.
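
For reference, the error code and CR2 value from the panic report can be decoded mechanically, since on x86 the page-fault error code is a bit field and CR2 holds the faulting address. A minimal sketch, assuming the standard x86 bit layout (bit 0 present, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch); the class and function names are made up for illustration and are not part of any OS X tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageFault:
    present: bool            # bit 0: False = page not mapped, True = protection violation
    write: bool              # bit 1: False = read access, True = write access
    user: bool               # bit 2: False = kernel (supervisor) mode, True = user mode
    reserved_bit: bool       # bit 3: a reserved bit was set in a paging structure
    instruction_fetch: bool  # bit 4: fault happened while fetching an instruction


def decode_page_fault(error_code: int) -> PageFault:
    """Split an x86 page-fault error code into its individual flags."""
    return PageFault(
        present=bool(error_code & 0x01),
        write=bool(error_code & 0x02),
        user=bool(error_code & 0x04),
        reserved_bit=bool(error_code & 0x08),
        instruction_fetch=bool(error_code & 0x10),
    )


def looks_like_null_deref(cr2: int, page_size: int = 4096) -> bool:
    """True if the faulting address lies in the first page (NULL plus a small offset)."""
    return 0 <= cr2 < page_size


# Values reported in the post: error code 0, CR2 = 0x30
print(decode_page_fault(0))
print(looks_like_null_deref(0x0000000000000030))
```

Read this way, error code 0 with CR2 = 0x30 would be a kernel-mode read of an unmapped address very close to zero, which typically looks like a NULL pointer plus a small field offset. That fits the same kernel code path failing each time, but on its own it does not say whether the root cause is hardware or software.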

Hmm, now what does this mean? Bad IMC? Or is there something else I can try?
 

frankiee69

HD - not sure - at least I tried a fresh install on a different HDD and the same problems were there.

About sleep states: sleep is definitely involved, since this only happens after the machine has slept. Any idea what I can do, for example with regard to BIOS settings?
 

frankiee69

Setting the hard drives to "never turn off" does not help, unfortunately. I still think this is somehow memory related, since reducing the RAM to 32GB did help. I am not sure whether a specific DIMM has a problem, as it did not matter which 4 DIMMs I inserted. So it might be all 8 DIMMs not playing well together, or could this be the CPU's memory controller having a problem?

I have no clue how to find out what the source of the problem is, or whether this is still some kind of weird software error. So, as increasing VCCSA (and VCORE as well) did not help, does this rather point to the CPU having a problem, or is it more likely the RAM? Downclocking the RAM to just 800MHz did not help either.

I also checked the DRAM timings in the BIOS and there were some values that seemed a bit odd to me. I don't know if this is normal:

DRAM CKE minimum pulse width has only three readings, but shouldn't there be four, one for each channel? Plus, some readings were not the same for all channels, namely tRWDR, tRWDD, tRWSR and DRAM IOL. These differed between channels A/C and B/D. Is this to be expected?

 

frankiee69

I increased the DIMM voltage to 1.55V and VCCSA to 1.15V. In my first test after that I wasn't even able to reboot completely (as happened before when increasing VCCSA only): the machine hung again with a black screen. (But the usual KP still occurred; I saw it at the next restart after pressing the reset button.) Oh well ...

So if anything, increasing the voltages, especially VCCSA, seems to make the problem a bit worse, not better! What could this mean?

And what do you think of the BIOS readings I mentioned above? Are the values really supposed to differ across the four channels? What I find especially strange is that there are only three entries for "DRAM CKE minimum pulse width". An explanation I found for this value reads:

"CKE defines the minimum number of clocks that must elapse before the system can transition from normal operating to low power state and vice versa."

At least it has something to do with sleep, I guess. So might this be an indication of something being wrong with one of my DIMMs?

Anyway, thanks a lot again for your help! I really appreciate it, since I am really overwhelmed by this type of problem.