For a while now, I have been suffering from BSODs, often at power-up. I was able to observe massive memtest failures, which would explain the early BSODs. I have collected some interesting clues, but can't put together a diagnosis. Here is what I know:
- Not SW related, as I have wiped and reinstalled OS on different drives numerous times
- Not GPU related, as I changed those
- Not CPU related, as I've tried different i7 920s (but it could still be related to the mobo's CPU socket or similar)
- FYI, mobo is Gigabyte X58A-UD3R, F6 BIOS. I turned my frequencies down, to rule out O/C.
- The issue generally occurs only when the PC has been turned off for a long time. I basically get it either when I come home from work or when I wake up in the morning (PC off for hours).
- When the issue occurs, Memtest86 reports numerous failures in a consistent memory address range, all on a single bit (sort of like a massive bitline failure, with some 500 consecutive addresses at the end of a RAM reporting a bad bit 4). I have 4x G.Skill F3-10600CL9D-4GBNT, with manual timings set to match 9-9-9-24 and CR 2. The failures are in the range of about 1500-2000, which sounds like the last quarter of a 2 GB address space.
- Using PC reset or ctrl-alt-del type reboots does not resolve the issue
- Using PC power off button, followed by power on RESOLVES the issue and memtest starts passing. This is VERY consistent and I think is the biggest clue, if someone can explain it.
- I have run memtest on each of the 4 sticks individually, 2 passes each, and all passed. However, I can't rule out the DRAM, since the issue comes and goes and the passes may have happened during a "good" state.
- I tried changing voltage regulation on mobo to rule out vdroop, no difference.
- I have one data point where changing BCLK from 175 down to 120 cleared the massive fails. However, I need to check this again for consistency. Note that DRAM voltage made no difference (tried from 1.4 to 1.7, with the same massive fails when in the failing state).
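To sanity-check the "last quarter" estimate and the bit 4 signature, here is a rough sketch of the arithmetic. It assumes the 1500-2000 figures are addresses in MB and that a module maps linearly onto a 2048 MB space; neither is confirmed, and multi-channel interleaving on X58 may well break the second assumption. The function names are just for illustration.

```python
def range_as_fraction(start_mb, end_mb, module_mb=2048):
    """Express a failing address range as fractions of one module's address space.

    Assumes a linear (non-interleaved) mapping, which may not hold in practice.
    """
    return start_mb / module_mb, end_mb / module_mb


def failing_bit_mask(bit):
    """Mask that an error on the given data bit would show in expected XOR actual."""
    return 1 << bit


start, end = range_as_fraction(1500, 2000)
print(f"failing range spans {start:.2f} .. {end:.2f} of the module")
print(f"bit 4 error mask: {failing_bit_mask(4):#x}")
```

Under those assumptions the range starts slightly before the three-quarter mark and stops just short of the top, so "last quarter" is a fair approximation.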
This is a start. I plan the following experiments later, but it will take some time, since the issue requires long powered-off times to show up, so I can't test it in a timely manner.
- Remove 2 of the sticks, wait X hours (to reproduce issue), see if it's still there. If issue present, remove 1 stick, wait X hours, try again. If issue present, switch with one of the earlier-removed sticks, wait X hours, try again. This is to see if a specific module is at fault or if this is not the memory itself causing the issue. Also need to check if a specific slot on mobo is reporting the fails.
- Repeat BCLK experiment
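The stick-swapping plan above is essentially a bisection, so I sketched it out to count how many cold-boot waits it needs. This is only a sketch: `fails_after_cold_boot` is a hypothetical stand-in for the manual step (install those sticks, leave the PC off for X hours, run memtest), and it assumes a single bad module that fails reliably once the PC has been off long enough. It does not cover a bad slot.

```python
from typing import Callable, Optional, Sequence


def find_faulty_stick(
    sticks: Sequence[str],
    fails_after_cold_boot: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Bisect the installed sticks; return the suspect, or None if no subset fails."""
    candidates = list(sticks)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        half, rest = candidates[:mid], candidates[mid:]
        if fails_after_cold_boot(half):
            candidates = half
        elif fails_after_cold_boot(rest):
            candidates = rest
        else:
            # Neither half fails on its own: points away from a single bad module
            # (slot, memory controller, or something that needs all sticks present).
            return None
    # Confirm once more, since the fault is intermittent.
    return candidates[0] if fails_after_cold_boot(candidates) else None


# Simulated run: pretend stick "C" is the bad one and count the waits needed.
waits = []

def simulated_test(installed):
    waits.append(list(installed))
    return "C" in installed

print(find_faulty_stick(["A", "B", "C", "D"], simulated_test), len(waits))
```

In the simulated case that is four long waits for four sticks, and any result of None would push suspicion toward the slots or the board rather than the modules.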
Thanks in advance for any ideas.