Memtest fails that recover with power button but not reset button

December 21, 2010 12:01:44 PM

For a while now, I have been suffering from BSODs, often upon powerup. I was able to observe massive memtest fails, which would explain the early BSODs. I have collected some interesting clues, but can't put together a diagnosis. Here is what I know:

- Not SW related, as I have wiped and reinstalled OS on different drives numerous times
- Not GPU related, as I changed those
- Not CPU related, as I've tried different i7 920's (but could be related to mobo CPU socket and such)
- FYI, mobo is Gigabyte X58A-UD3R, F6 BIOS. I turned my frequencies down, to rule out O/C.
- Issue generally occurs only when PC has been turned off for a long time. I basically get the issue either when I come home from work or when I wake up in the morning (PC off for hours).
- When the issue occurs, Memtest86 reports numerous fails in a consistent memory address range, all on a single bit (sort of like a massive bitline fail, with some 500 consecutive addresses at the end of a RAM reporting bad bit 4). I have 4x G.Skill F3-10600CL9D-4GBNT and have set manual timings to match 9-9-9-24 and CR 2. The fails are in the range of about 1500-2000 MB, which sounds like the last quarter of a 2 GB address space.
- Using PC reset or ctrl-alt-del type reboots does not resolve the issue
- Using PC power off button, followed by power on RESOLVES the issue and memtest starts passing. This is VERY consistent and I think is the biggest clue, if someone can explain it.
- I have run memtest on each of the 4 sticks individually, 2 passes each, with all passing results. However, I can't rule out DRAM since the issue comes and goes, so the passes may have been during a "good" state.
- I tried changing voltage regulation on mobo to rule out vdroop, no difference.
- I have one data point where changing BCLK from 175 down to 120 recovered the massive fails. However, I need to check this again for consistency. Note that DRAM voltage made no difference (tried from 1.4 to 1.7, with the same massive fails when in the failing state).
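
For anyone who wants to do the same kind of pattern-spotting on a memtest error list, here is a minimal Python sketch of the idea behind the "single bad bit in a consistent range" observation above. XOR-ing the expected and actual patterns leaves only the bits that flipped. The record layout, the names `MemError` and `summarize`, and the sample values are all made up for illustration; they are not copied from my actual Memtest86 output.

```python
from dataclasses import dataclass


@dataclass
class MemError:
    address: int   # failing address as reported by the tester
    expected: int  # pattern that was written
    actual: int    # pattern that was read back


def summarize(errors):
    """Return (lowest address, highest address, list of failing bit positions)."""
    bad_mask = 0
    for e in errors:
        # XOR leaves a 1 only where the read-back value differs from what was written
        bad_mask |= e.expected ^ e.actual
    bits = [i for i in range(bad_mask.bit_length()) if (bad_mask >> i) & 1]
    lo = min(e.address for e in errors)
    hi = max(e.address for e in errors)
    return lo, hi, bits


# Hypothetical sample records (illustrative only)
sample = [
    MemError(0x7C000000, 0xFFFFFFFF, 0xFFFFFFEF),
    MemError(0x7C000040, 0x00000000, 0x00000010),
    MemError(0x7C000080, 0xFFFFFFFF, 0xFFFFFFEF),
]

lo, hi, bits = summarize(sample)
print(f"range {lo:#x}-{hi:#x}, failing bits {bits}")
```

If every error collapses to one bit position and the addresses cluster in one region, that points at one physical device or data line rather than random corruption.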

This is a start. I plan the following experiments later, but it will take some time, since the issue requires long powered-off times to show up, so I can't test it in a timely manner.

- Remove 2 of the sticks, wait X hours (to reproduce issue), see if it's still there. If issue present, remove 1 stick, wait X hours, try again. If issue present, switch with one of the earlier-removed sticks, wait X hours, try again. This is to see if a specific module is at fault or if this is not the memory itself causing the issue. Also need to check if a specific slot on mobo is reporting the fails.
- Repeat BCLK experiment


Thanks in advance for any ideas.

December 21, 2010 12:04:08 PM

As another note, waiting 15 mins is NOT enough to reproduce the issue, so it seems like it really has to be a long time in Off state.
December 21, 2010 12:50:49 PM

Just a few suggestions... have you checked for bent pins on the cpu socket? Also, it may be the memory sockets. You replaced just about everything other than the mobo.
December 21, 2010 1:08:22 PM

Hawkeye22 said:
Just a few suggestions... have you checked for bent pins on the cpu socket? Also, it may be the memory sockets. You replaced just about everything other than the mobo.


Not yet, but I also considered checking CPU pins, especially considering that I've seated many CPUs in it that I used to sell on ebay.

Mem sockets could also be true, I once had a minor spill of liquid coolant between sockets, could be related.

But how the hell does power button (and not reset) fix the issue under any of these possibilities? That's what I'd like to know.

December 21, 2010 1:22:52 PM

korndog13 said:
But how the hell does power button (and not reset) fix the issue under any of these possibilities? That's what I'd like to know.


Yeah, that's tricky. I'm not sure what all the differences are between a soft reset (reset button) and a hard reset (power switch). Apparently some flags/switches persist across soft boots, whereas they must get cleared during a hard reset.

December 21, 2010 1:35:24 PM

The other clue, of course, is that if I wait 30 mins (tried this morning) in the power-off state, no issue is present. But give it several hours (like 4+ hrs), and it very consistently produces the issue. It almost seems like some sort of leakage taking place over a long time and maybe getting stored in a capacitor or something. I don't know what else to think :)  Anyway, I will try checking the CPU socket for bent pins later tonight.

December 22, 2010 12:01:48 PM

Good news! I was able to narrow down the problem to a particular DIMM module. What I did was remove the DIMM that was sitting in the memory address space that was failing, then wait X hours and test: no issue. So now I knew it was either the DIMM or the slot it was in. Next, of the remaining 3 passing DIMMs, I took one and placed it into the removed DIMM's slot. Since then I've run the "wait X hours" test twice with no problem. I will do a final confirmation by reinserting what I think is the bad DIMM to see if the problem comes back.
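
The elimination logic here boils down to a small truth table. A minimal Python sketch of it, with a hypothetical function name and argument names chosen only for illustration:

```python
def locate_fault(fails_without_suspect: bool,
                 fails_with_good_dimm_in_slot: bool) -> str:
    """Classify the fault from two 'wait X hours, then memtest' observations.

    fails_without_suspect: errors still appear with the suspect DIMM removed.
    fails_with_good_dimm_in_slot: errors appear with a known-good DIMM
        sitting in the slot the suspect DIMM came from.
    """
    if fails_without_suspect:
        # Removing the suspect did not help, so neither it nor its slot explains the errors
        return "elsewhere"
    if fails_with_good_dimm_in_slot:
        # A good DIMM fails in that slot, so the slot (or its wiring) is suspect
        return "slot"
    # Slot works with a good DIMM and errors vanished without the suspect
    return "dimm"


# My two observations so far: no errors in either configuration
print(locate_fault(False, False))
```

This only holds if each observation was taken after a long enough power-off period to trigger the failing state, which is why the reinsert test is still needed as confirmation.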
December 22, 2010 12:19:18 PM

Good job on tracking that down. They can be a real PITA.