I've been having a frustrating time with my system, mostly because I haven't been able to determine to my satisfaction whether my problem is software- or hardware-based, and further whether it's due to bad component(s) or bad settings/incompatibilities.
Self-taught computer geek with a (very rusty) Engineering degree gathering dust somewhere. I've been demoted from "Child Prodigy" somewhere in the '80s to "Obsolete Fossil" now. I've been building my own machines regularly for about 15 years, but sadly every year I understand what I'm doing a little less, and rely on dances and rituals to empower the strange artifacts falling from the gods' sky-cities. I'm reasonably savvy under the hood (but feel naked without my soldering iron), and can follow directions pretty well even if I've long lost the insight into figuring out what the hell Windows is actually doing.
Gigabyte GA-EP45-DS4P MB F8 BIOS
Corsair XMS2 PC2-8500 DDR2 1066MHz 2048MBx2
Intel Core 2 Duo E8500
GeForce GTX260 video
Creative X-Fi Titanium pro
WD 75GB x2 SATA RAID (Current boot drive)
WD 250GB SATA (Alternate boot drive)
Pair of misc optical drives, Saitek USB KB, Razer USB mouse, generic floppy
Belkin N1 Wireless adapter (PCI)
Antec case/Antec 700W Earthwatts PS (or something like that)
WinXP Pro 32-bit SP3
ESET Nod32 Antivirus
All drivers and BIOS are current, nothing is overclocked, but a word about the age of some of the components:
I've had a string of "Bad luck" before with this box, starting with what I think was a rotten power supply last year, so...
The RAM is new (04/09), "Upgrading" from 4x 1GB 800MHz Crucial sticks (two slightly different sets of two)
The MB is new (05/09) replacing an identical model which got "damaged" when installing the RAM (Stuck in S3 infinite loop)
The CPU is new (05/09), impulse purchased when replacing the MB (old was an E6600 -- wasn't positive it wasn't damaged from earlier)
The Power supply, the 250GB HD, and the video card are from last fall (along with the MB which just got replaced)
The Sound card was added 6-9 months ago
The RAID drives, optical drives and case are a little older, though the RAID was wiped and rebuilt in September (this is also when XP was installed)
Of note, I was having minor problems with the old RAM and MB (same model MB but different physical unit), which is why I opted to replace the RAM in the first place. The old (RMA'd) MB, RAM, and CPU have already been adopted by a loving home out of town, so please don't suggest that I retry them.
The problem I'm having:
When I rebuilt the system in May, the machine was completely stable for about a month, working what I'd consider 100% normal.
Out of the blue (sorry for the pun), I began seeing BSODs. These would occur anywhere from a few minutes to, more typically, a few hours after booting, unrelated to system activity/load. Most commonly, if I left the machine idle overnight (typical for me), in the morning it would be frozen or BSODed.
The BSOD stop codes would vary, but typically would point to the RAM.
Windows on restarting would report recovery from a "Serious error," with either the details "corrupted," or claiming the culprit as an unspecified "bad driver."
If I immediately (warm or quick cold) reboot and run a RAM checker (either MemTest86 or the Windows one), I see multiple errors in random addresses on both sticks. These go away if I leave the system off for a few minutes and restart.
What I've done/thought:
If I simply shut the system down and full cold boot, the problem "Resets" and I get (typically) a few hours of stable computing.
I flashed the BIOS to a newer version (No effect)
I've updated all drivers I could think of (NE)
I've adjusted BIOS settings, including varying RAM voltage and slowing down the RAM to 800MHz (NE)
I've "cleaned" my Windows registry (to the best of my ability) (seemed to make the machine last longer at first, but probably NE)
I've done a Repair Install of XP on my main drive (currently the RAID) (NE)
I've done a clean XP install on the non-RAID drive (same problem, plus its boot sector keeps getting corrupted for some reason, bad "HAL.dll" typically)
I've run Chkdsk /p ("Fixed" some bad sectors, otherwise NE)
I've run the RAM sticks individually in different slots (NE)
I've checked system temperatures. I have no direct way of monitoring RAM temp, but the system is always cool when I check (40-50°C), and the crash often happens when not under heavy load.
I've cold booted and run Memtest (without ever going into XP) and it's gone 8-12 hours without any errors
What I'm not presently willing to do:
Buy a Mac
Pith myself (equivalent)
Live without a desktop PC
Punt to Alienware (though starting to think about it...)
I'm kind of at my wits' end here, as I'm loath to do a full HD wipe and reinstall (in case it's really a hardware problem), or start replacing hardware that isn't all that old (in case some driver is really the problem). Sorry for the Wall of Text, but any thoughts would be a big help!
ADDENDUM: Tonight's winning Stop code: 0x00000050, which appears to be a video driver error. This isn't the only stop code I get, merely the one I wrote down this time. My video driver is current (185.85, updated after the problem first began), and my video card is working fine when my system isn't crashing.
Another stop code, 0x0000001A this time, a memory management code. Still no real insight into what's causing all this.
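For anyone keeping their own log of these, here's a minimal Python sketch for turning stop codes like the two above into their standard Windows bug check names. The table is deliberately partial, and the helper name `describe_stop_code` is just something made up for illustration. Note that 0x50 is a generic page fault code that can come from a bad driver or from bad memory, so the name alone doesn't pin the blame on any one component.

```python
# Partial table of Windows bug check (stop) codes and their standard names.
# Only a handful of common codes are listed; anything else reports as unknown.
STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001A: "MEMORY_MANAGEMENT",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def describe_stop_code(text):
    """Hypothetical helper: parse a hex stop code string and look up its name."""
    code = int(text, 16)  # accepts strings with or without the "0x" prefix
    name = STOP_CODES.get(code, "unknown (not in this partial table)")
    return f"0x{code:08X}: {name}"


if __name__ == "__main__":
    # The two codes written down in this thread.
    for seen in ("0x00000050", "0x0000001A"):
        print(describe_stop_code(seen))
```

Writing down each code as it happens and running the list through something like this makes it easier to see whether the crashes cluster around memory-related codes.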
All these problems seem to point to a faulty stick of RAM. Properly functioning RAM shouldn't get ANY errors in Memtest86+ if the RAM speed/timings/voltage are set to the manufacturer's specs in the BIOS. I would make sure all the RAM settings are correct and run Memtest86+ on each stick of RAM by itself to find the bad stick.
All BIOS settings are manufacturer's specs, and the errors aren't confined to one stick. I've run each stick individually and had the exact same behavior. Also, please remember that when first booting, I get no errors on either stick, but once I get a crash, the errors occur spread out among both sticks, at seemingly random addresses (the addresses tend to be the same each time the problem occurs, but next time around they're completely different).
I'm starting to think that despite what the Corsair and Crucial sites tell me, this motherboard doesn't play entirely well with either my new or old memory. My next step will probably be to buy some fresh RAM again, this time looking for modules explicitly named on the Gigabyte vendor list (a little tough, since that list is a year or two old, and RAM part numbers change way faster than that). As an added bonus, on the Once Bitten Twice Shy front, I'm probably going to pick a different manufacturer, though I have no real experience with any of the other brands. Any suggestions?
Well, I'm not willing to say that the RAM is faulty (both sticks behaved identically), but the Gigabyte motherboard seems to be really picky about memory. I've seen some threads from other people having a hard time getting their boards to run stable with 1066MHz modules, even though Gigabyte claims it'll handle RAM overclocked to 1333.
Off of the Gigabyte QVL, I was able to find a couple of modules of Qimonda HYS64T256020EU-2.5-C2 for a good price (not sure if they're "Previously owned"). They only run at 800MHz, but I've had the system up for about 48 hours now with nary a hitch.