BSOD once a day when using 2 RAM slots

Mar 7, 2018
7
0
10
I have a problem with my new PC setup. The setup is assembled by a computer store.

My setup:
Windows 10
Intel Core i7-7700K, 4x 4.20GHz
Xilence i250 PWM
Asus Prime Z270-K, 1151
Nvidia GeForce GTX1080 8GB
2x Crucial 16GB DDR4-2400 (=32GB RAM)
250GB Samsung SSD 850 EVO
2TB Toshiba SATA III

My problem:
Once a day I get a BSOD with the name "memory management" while using my PC. It happens after the PC has been running for a while, when I start a new program or load a game. After the BSOD I don't get a second one on the same day.

See attached pictures for more information.
Attachments: whocrashed, cpu-z1, cpu-z2, cpu-z3, cpu-z4, cpu-z5

What I have done for now:
- Updated all drivers
- Updated BIOS to the latest version (1002, 14.12.2017)
- Memtest, multiple times with both sticks or only one RAM stick, no errors
- Chkdsk
- Tested each RAM stick separately for 2-4 days. With a single stick (16GB) I did not get a single bluescreen

Ideas:
- Could I fix the problem by setting some options in the BIOS manually? I read about increasing the memory voltage but I know nothing about it
- Could one RAM stick be broken although I tested them separately with no problems?
- Could there be a problem with the Dual Channel Mode on my mainboard?

Thanks for the help!

Kaesezwerg.
 
most likely a corrupted page table entry, not sure because of the undocumented error code returned.

I would make sure your BIOS is updated to a current version and make sure you have the current SATA drivers from your motherboard vendor's website. I would then turn off the virtual memory, reboot and make sure the hidden pagefile c:\pagefile.sys was deleted, then turn virtual memory back on.

I would also run a Malwarebytes scan then run cmd.exe as an admin and run
sfc.exe /scannow
dism.exe /online /cleanup-image /restorehealth

reboot and wait to see if you get another bugcheck.

you might also consider putting your drive's data cable on a different SATA controller/port
(most systems have 2 SATA controllers; the slower one is less likely to have bugs in its driver since it is often updated through Windows Update. Some systems also have special custom functions on the higher-numbered SATA ports, so put the SATA data cable on a lower port number (0-3).)

sata drives also have firmware that might need to be updated, Samsung also has custom drivers that people install then hit various bugs because they don't get the updates. ie Samsung magician software.

if you do get another bugcheck, you should copy your memory dump (c:\windows\minidump directory) to a cloud server, share the file for public access and post a link
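
Gathering the dump files for upload could be sketched like this. It is only a minimal Python sketch, assuming the dumps are `.dmp` files sitting directly in the minidump directory mentioned above; the function name `collect_minidumps` and the zip file name are made up for illustration, and reading that directory usually needs an elevated prompt.

```python
import zipfile
from pathlib import Path


def collect_minidumps(dump_dir, out_zip):
    """Pack every *.dmp file found in dump_dir into out_zip.

    Returns the names of the files that were added, sorted by name.
    """
    dump_dir = Path(dump_dir)
    dumps = sorted(p for p in dump_dir.glob("*.dmp") if p.is_file())
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for dump in dumps:
            # Store only the file name, not the full local path
            archive.write(dump, arcname=dump.name)
    return [p.name for p in dumps]
```

Calling `collect_minidumps(r"C:\Windows\Minidump", "minidumps.zip")` would then leave a single archive that can be uploaded and shared.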


 
Thanks for your answer.

BIOS is up to date and also all drivers should be.

I now decided to send the PC back to the computer store I bought it from because I still have warranty for the hardware.

Could inserting new RAM fix a problem with faulty page table entries? How does something like this happen?
I already had this problem right after I made a fresh installation of Windows 10.
 
it is hard to say without looking at the windows memory dump. This error is pretty common when you get certain malware infections, or have a custom low level driver installed that might have a bug (the Samsung magician driver for example, or bugs in a solid state drive's firmware).
most of the time you just need to update the driver and firmware. often motherboards will have two SATA controllers; the one controlled by the CPU chipset is more likely to work correctly since it is often updated by the windows update software. The secondary (often the faster controller) will often need to be updated manually by going to the motherboard vendor's website to download the update.

generally if memtest86 works ok then the problem is not the RAM. the corrupted page table could come from incorrect voltage settings in the BIOS, bad overclock settings applied by an overclocking driver, overheating, or from driver bugs and malware attempting to infect c:\pagefile.sys (virtual memory subsystem)

generally, you would update the drivers from the motherboard vendor, do a Malwarebytes scan, run crystaldiskinfo.exe to read the firmware version of your SSD and then check to see if there is an update.
sometimes malware will replace a storage driver and just cause lots of problems.

key point is that memory manager bugchecks can be caused by a RAM stick, but also by the cache RAM inside the CPU and by the virtual memory subsystem (pagefile.sys). anything that affects storage can be the cause of a virtual memory error. sometimes looking at the bad memory address can help figure out the source of the corruption. for example, low values like memory address 20 are often caused by one driver overwriting another driver's data.
(most kernel mode memory addresses will look like 0xffff0123 (leading F characters), while data will look like 0x00000020 (leading zeros, more likely a size field from a pool header), which indicates a programming error or a driver overwriting another driver's data)
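
The rule of thumb above can be written down as a rough check. This is a Python sketch and not a debugger feature: the function name `classify_address`, the 4 KiB cutoff for "small" values, and the assumption of 64-bit Windows (where kernel-mode addresses sit in the upper half of the address space and therefore start with F digits) are my own choices for illustration.

```python
KERNEL_BASE_64 = 0xFFFF_8000_0000_0000  # start of the upper canonical half on x64
SMALL_VALUE_LIMIT = 0x1000              # assumed cutoff: below one 4 KiB page


def classify_address(value):
    """Give a rough guess at what a 64-bit value from a bugcheck represents."""
    if not 0 <= value <= 0xFFFF_FFFF_FFFF_FFFF:
        raise ValueError("expected an unsigned 64-bit value")
    if value >= KERNEL_BASE_64:
        return "kernel-mode address"
    if value < SMALL_VALUE_LIMIT:
        # Too small to be a real pointer; more likely a size, count or flag
        return "small value, likely data rather than a pointer"
    return "user-mode address or other data"
```

For example, `classify_address(0x20)` falls into the "small value" case, which matches the "memory address 20" situation described above.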

anyway, if you returned the machine, be sure to run your new machine thru tests when you get it. New machines often have a 5% or so failure rate if they are actually tested.







 
Crucial 16GB DDR4-2400 UDIMM
CT16G4DFD824A
http://www.crucial.com/usa/en/ct16g4dfd824a

Modules sold as singles carry no guarantee to be compatible together when combined. If you want to use two modules then buy a kit of two that have already been tested together and are guaranteed to work together.


You wrote, "- Tested each RAM stick separately for 2-4 days. With one RAM (16GB) I got not a single bluescreen".
https://ibb.co/nxLa07
 
Thanks for your explanations.

I mentioned to them that the 2 RAM sticks may not work perfectly together although they are both exactly the same. But I will speak to the computer store once again.

They told me they will also check the BIOS settings. But it would be more important to check whether all drivers - especially the mainboard drivers - are really up to date. The problem is that I have no idea which settings I have to change in the BIOS so that my PC runs with the larger amount of RAM (2 sticks).

I will keep the tests you mentioned in my mind.
 
Update:

I got my PC back from the support. They tested the RAM and adjusted the BIOS settings.

But I still get the BSOD once a day.
The BSOD always occurs when I'm in the loading screen of a game.
It's always MEMORY MANAGEMENT with stop code 0x1a. Dumpfile from today: Link

Yesterday, I ran driver verifier and deleted a driver that caused a BSOD while running the verifier. But I still got the MEMORY MANAGEMENT BSOD today.
The strange thing is that I never get this BSOD a second time on the same day.

Thank you for your help!
 
you can get errors like this when you have a bad solder connection on a chip. Something like a leg of a chip is not correctly soldered to the pad. when you turn on the system the system boots with the leg disconnected but after 15 seconds or so the chip heats up and the leg expands and makes a connection to the pad. This means the system will pass all of the tests.

these kinds of defects are very hard to find. Hopefully it will be in the RAM module but it can be on any connection in the system. you can only find the problem using a heat gun and cooling or by swapping out parts.
The last time I took the time to find something like this it took about a month and it was a leg to a RAM module and I confirmed the problem by removing the heat sink and looking at the legs with a stereoscope to see the crack.

only symptom was the problem would occur 1 time a day in the morning. I later figured out that it would happen only when the system was cold and I could prevent the problem by heating up the system with a heat gun before I turned on the power. Sucked up a bunch of time to figure it out. had to remove the heat spreader from the RAM to see the problem. solder was on the leg but the pad had separated from the board and did not connect electrically when the chip was cold.



 
Thanks for the quick response.

Does the dumpfile give you a hint that it could be such a problem you described?

I also suspected some time ago that it could be a problem with cold/heat.
Before I got the MEMORY MANAGEMENT BSOD once a day while using the PC, I had an issue when booting the PC after it had been turned off for some hours, e.g. in the morning after it was switched off overnight. At each - I call it - "cold boot" I got a random BSOD. After restarting I could use the PC without any problems the whole day. But on the next day I got the same problem again while booting. I solved that problem by turning off a Windows fast boot option. But now it seems the problem has only moved.
 
if the problem is on a memory address line on a memory chip, then what will happen is windows will load drivers into memory in under 5 seconds, then after 15 seconds or so the address leg will make a connection and the memory block will appear to jump to the new address. So whatever driver is loaded over that memory address will not work correctly. Problem is that windows attempts to load the drivers in a different order on each boot up to make it harder for malware and viruses to know where to patch memory. So the symptom you end up getting is a different bugcheck on each boot up depending on which driver is loaded into the "bad" memory spot.
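
To make the "memory block jumps to a new address" effect concrete, here is a toy Python model. It is only a sketch under loud assumptions: a 16-cell memory, one flaky address line (bit 2), and a disconnected line that the chip sees as 0 while cold (a real floating line could read either way). The class and function names are invented for illustration.

```python
class ToyMemory:
    """Tiny RAM model with one address line that only connects when warm."""

    def __init__(self, size=16, flaky_bit=2):
        self.cells = [0] * size
        self.flaky_mask = 1 << flaky_bit
        self.warm = False  # cold at power-on: the flaky line is disconnected

    def _physical(self, address):
        # Assumption: while cold, the chip sees the disconnected line as 0
        return address if self.warm else address & ~self.flaky_mask

    def write(self, address, value):
        self.cells[self._physical(address)] = value

    def read(self, address):
        return self.cells[self._physical(address)]


def demo():
    ram = ToyMemory()
    ram.write(6, 0xD1)        # "driver" loaded at address 6 during a cold boot
    cold_read = ram.read(6)   # reads back fine while the part is still cold
    ram.warm = True           # the leg expands and the address line connects
    warm_read = ram.read(6)   # address 6 now selects a different cell
    return cold_read, warm_read, ram.read(2)
```

In this model the data written at address 6 while cold actually landed in cell 2, so once the line connects, address 6 no longer returns what was stored there. A memory test that starts after the warm-up never sees the mismatch.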

you may find that moving the memory around to different slots can help move the problem from kernel memory which causes bugchecks to user memory which will just crash your app.

and this assumes the problem is in the removable RAM chips.

you could remove 1/2 of your ram, cool down your system and reboot and see if you can get the problem again. Then attempt to swap out ram to see if you can isolate the problem to a ram module or particular ram slot.


or you can do a full shutdown each time you stop windows, then boot into BIOS for maybe 30 seconds before rebooting into windows each time you boot (kind of a pain but it will give you an idea if you are looking at thermal related problem)

cans of compressed air or Freon can be used to cool components so you don't have to wait too long to test.
or a heat gun or hair dryer can heat a component when the system is cold to help test.
For example, in the morning before applying power, heat the memory slightly with a hair dryer, then boot up and see whether the bugcheck stays away.

guess you could turn off all of your sleep functions and run the system in full power mode so all of the parts just stay warm. That would prevent the problem also.




 
I tested both of my RAM sticks separately for 2-4 days each. While doing this, I had no problems or errors. The strange thing is that after putting both RAM sticks back in my system, it ran without any problems for 2 days. But since day 3 I have been getting the same BSOD again once a day.
Could it also be the RAM of the GPU, or am I wrong?

Turning off more sleep functions besides the fast boot option might help me, but it would not fix the problem itself.
I will try to convince the support tomorrow to take my machine back and maybe fully replace it by a new one because I bought the whole system there.
 
with bugchecks you never know when they are gone. Sometimes it might take a week to hit the problem.
with thermal problems, a longer test run does not help. The problem only shows up during the short 15 second window while the part is cold. booting up and running a test that hits the problem in those few seconds is very hard without a dedicated tester.

if they will not swap out the machine, see if you can get a full swap of the memory sticks.

users should not have to deal with this. I would do a return for cash if you can. (rather than a repair)

not likely to be the gpu

new ram sticks tend to have about 6% failure rates when tested. thermal failures generally pass the software tests but are still defective.



 
Yeah, that's also my opinion: I - as a user - should not have to deal with these kinds of problems. I sure could test these things for the next few weeks, but that is what they have technicians for.

You wrote, "if they will not swap out the machine, see if you can get a full swap of the memory sticks."
That is what I want to do. But of course it would be better to swap the whole machine because I am slowly losing trust in it :D

You wrote, "I would do a return for cash if you can. (rather than a repair)"
That's the point. I gave it back 2 times already and all they did in both cases was adjust BIOS settings and test the RAM. But would you take a new machine, or would you ask for a refund and switch to another store? I have to say that they were kinda friendly and generous because I never had to pay for sending my machine back etc.
I hope they will just switch the machine.

 
bios settings can be perfect and you would still have this problem if it is a crack in a trace or bad solder joint.
a total swap of the RAM would be my next step if you want to spend the time and they extend the initial warranty. (on the entire machine , not just the ram)

A replacement machine would restart your warranty at the time of replacement. it would be the best option for most users since they will have lost trust in the machine, even though you lose the time you spent on installing programs and will never get back the time you spent on attempting fixes or going to the repair shop. (loss of use of the machine)

I have worked on problems that took 30 days and found defective hardware, then was told I had to have it repaired by the manufacturer rather than have it replaced by the vendor. I don't let the exchange periods expire if I can help it.

a replacement should also be tested since there can be hardware design defects on certain model designs.