About four or five weeks ago, I replaced the fan on my GPU with an aftermarket cooler (the stock fan was dying and made an awful grinding sound). My system was OK for a few days, and then I started running into problems. While playing games I'd get occasional lockups and freezes that required a reboot to fix. Ctrl+Alt+Del did nothing; Windows Explorer and Firefox would both stop responding, audio would cut out, and then the game would stop responding.
The lockups also happened while watching videos, or shortly after getting to the desktop. For about two weeks I assumed this was a problem with my hard drive, so I backed up everything important to external drives or cloud storage. Then I started getting errors saying Windows couldn't detect a bootable hard drive (I don't remember the exact message, apologies). At first I thought it might be a problem with the Windows file system, so I ran Chkdsk, but it reported 0 bad sectors.
After that, multi-bit ECC errors became common, along with complete failures to recognise my HDD during boot, which took multiple restarts to get past and into Windows. This went on and off for about two weeks. Curiously, my Ethernet connection also dropped out spontaneously, although to my knowledge that only happened once. This is when I began to suspect I was looking at either a mainboard failure or a combined failure of my hard drive and RAM modules.
I removed two RAM modules and had no errors until a few days ago, when I found my rig had reset while I was away and was showing that SMART had detected a bad drive. I took that as confirmation of my HDD theory, so I rebooted, and the mainboard couldn't find a bootable drive. It also hung on the POST screen while running the ATA checks for a boot drive. I removed the drive and filed an RMA with WDC for a replacement, which arrived today.
Yesterday, however, I booted Hiren's and ran Memtest to find out which memory modules were faulty. I tested all four modules for eight hours (ten passes) with zero errors, ECC or otherwise. This is when I started to suspect my mainboard might be failing.
Today I hooked the old drive back up, booted Hiren's again and checked it. Previously I'd been unable to find the drive to test it, but today I tried SeaTools and it found the drive fine. SeaTools reported that SMART had not been tripped, contrary to what the mainboard said, although it did mention an overtemperature event.
Earlier, before running SeaTools, I'd also tried booting into Windows: the mainboard recognised the boot drive and got as far as the login screen, although I didn't bother trying to get to the desktop.
The board also kept popping up the multi-bit ECC error (with Unicode smiley faces) whenever it couldn't find a drive to boot from. I'm not sure whether this is a generic MSI error or something to do with the mainboard's RAM controller.
My guess is a mainboard failure, or possibly a PSU failure, since I haven't heard any beeps during POST (although the speaker may simply be disconnected; I didn't check).
I'm hoping some more knowledgeable folks here can weigh in with a second opinion.
Gigabyte HD5870 2GB w/ AC Accelero L2 Plus (stock clocks)
AMD Phenom II X6 1090T Black (stock HSF, stock clocks)
8GB G.Skill DDR3 PC3-10600 (4x 2GB)
MSI 870-G45 Mainboard
Corsair 700W Silent Pro M
Regarding SeaTools: I'm aware that HDD temperature correlates only weakly with failure, and the SeaTools test found no errors, but is it possible this could still be a problem with the hard disk? I don't have a test PC I can hook the HDD up to and boot from to confirm, since I'm currently on a laptop and my other desktop isn't up to snuff for W7. ):
Also, is there any reliable way to test the PSU under load and confirm it isn't the source of the problem, other than swapping in a PSU of similar capacity or using a multimeter? My current build will be two years old next month, so the PSU has aged somewhat. I don't have another PSU with enough wattage to run my rig, and from what I've read those little plug-in testers (PSU testers, I think?) aren't bulletproof because they don't test the PSU under load.
Can I assume that if my PSU passes the plug-in tester, it's OK?
I would also think a mainboard RAM controller failure would show up as errors in Memtest, but I'm not entirely sure. Beyond that, could I be looking at a CPU failure even though my rig POSTs successfully, or is the multi-bit ECC error just a generic message on MSI boards?
Apologies for the huge number of questions, but I'm not especially tech-literate when it comes to diagnosing hardware problems by hunch, and I don't have much in the way of testing equipment besides Hiren's/UBCD and gut instinct. >:
Unfortunately, sometimes it's just a process of elimination.
There is no reliable way to test a PSU with the equipment you have, and proper PSU testers are expensive. The paperclip test just tells you that the PSU is not totally dead, which you already know.
Reading your symptoms, I was thinking either PSU or memory. It's still possible to get a bad PSU from Corsair, but I would lean toward the memory. Since you tested it and it came back clean, I would agree that it's the memory controller on the mainboard.