I have a home-built system that's been running just fine for a few years, and has recently started giving me problems.
The system sometimes runs just fine for a day or more, and other times the problem occurs twice in ten minutes. On average, I'd say it occurs once per evening of gaming (4 hour session approx).
There is a "blurt" from the speakers, and the screen image either freezes or blacks out (both are common). The system fans keep running, but the lights on my mouse and keyboard (Logitech G5 and G15) go out. The system is unresponsive to the keyboard, and only a hard reset (holding the power button for 20 secs) seems to work at this point.
I initially suspected overheating, but the problem occurs no matter what level of gaming I'm doing (BFBC2 seems no more likely to cause it than WoW) and it has even happened when just browsing the web.
Q6600 @ stock with a non-reference cooler
HIS HD 5850 (upgraded ~2-3 months ago from an HD4850)
When the problem first occurred I assumed it was one of two options: Overheating or PSU failing.
I removed the CPU cooler, cleaned both it and the CPU, and refitted it with Arctic Silver 5 paste. Both CPU and GPU run at reasonable temperatures at all times.
I replaced the PSU with a Corsair 750W.
The problem persists.
Now I am faced with the following possibilities:
RAM failure - How do I test for this? I have no way to reliably cause the error to occur, so testing each stick would be quite hit-and-miss. Is there a program I could use?
Motherboard dying - How would I identify this?
CPU dying - Ditto (although this seems the least likely)
Faulty GPU - again, I have no clue how to diagnose this.
It's a problem with my peripherals - surely some sort of short in a USB device would just damage the device? It couldn't affect the host system to this degree, could it?
Memtest is generally the test to use to force a RAM error. You can download it for free; just make sure you have a CD burner and ISO image burning software to make the boot disk.
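For anyone wondering what a RAM test is actually doing, here is a rough sketch of the basic idea in Python: write a known bit pattern across a block of memory, read it back, and report anything that changed. The function name, buffer size, and choice of patterns are my own for illustration and are not Memtest's actual algorithms. It also only touches a small buffer handed out by the operating system, so it is no substitute for booting Memtest from a CD.

```python
def pattern_test(buffer_size=1 << 20, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern across a buffer, read it back, and collect mismatches.

    Returns a list of (offset, expected, actual) tuples; an empty list means
    every byte read back exactly as it was written.
    """
    buf = bytearray(buffer_size)
    errors = []
    for pattern in patterns:
        # Fill the whole buffer with the pattern byte.
        buf[:] = bytes([pattern]) * buffer_size
        # Read every byte back and compare it with what was written.
        for offset, value in enumerate(buf):
            if value != pattern:
                errors.append((offset, pattern, value))
    return errors


if __name__ == "__main__":
    found = pattern_test()
    print("errors found:", len(found))
```

The alternating patterns (0x55 and 0xAA) flip every neighbouring bit between passes, which is the sort of thing that tends to expose a stuck or leaky cell. A real tester runs many more patterns over all of physical memory, which is why it needs to run for hours.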
Generally, if the CPU were dead the system wouldn't boot at all. Save for static shock, the CPU is one of the most resilient parts in your machine and can keep going for a decade.
Test for a faulty GPU by running the system with one known to work just fine, maybe a friend's GPU.
To test for peripheral problems, leave the computer on with no peripherals attached (I'm assuming it will eventually crash). If it doesn't, plug a peripheral in and test again. Lather, rinse, repeat.
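If it helps to keep track of that process, the logic can be written down as a tiny Python helper. This is only a sketch of the elimination procedure described above; the function name and the example log are made up for illustration, not real test results.

```python
def diagnose(trials):
    """Interpret an ordered log of elimination trials.

    `trials` is a list of (peripheral_added, crashed) pairs in the order the
    tests were run. Use None as the peripheral for the bare run with nothing
    attached.
    """
    for peripheral_added, crashed in trials:
        if crashed:
            if peripheral_added is None:
                # It crashed with nothing plugged in, so look elsewhere.
                return "crashed with nothing attached: peripherals are not the cause"
            return f"suspect: {peripheral_added}"
    return "no crash yet: add the next peripheral or run for longer"


if __name__ == "__main__":
    # Hypothetical log, for illustration only.
    example_log = [(None, False), ("keyboard", False), ("mouse", True)]
    print(diagnose(example_log))
```

Bear in mind that with a fault this intermittent, a single crash after adding a device is only a hint. Swap the order and repeat before blaming the device.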
Faulty motherboards are a bit odd because they are linked to everything in your system. They are often a last-resort option, and the board should only be RMA'd once all your other parts are known to be working fine.
Just out of curiosity, how was your system when you had the 4850 in there?
Worked fine with the 4850, it was replaced to improve performance in BFBC2.
I can try to dig it out and see if I can replicate the problem with that installed.
I've thought of trying that before, but since I don't have a way to reliably replicate the problem I feared it would be quite a time consuming and hit-and-miss approach.
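On the hit-and-miss worry, a rough calculation shows how long a crash-free run has to be before it means much. The sketch below assumes the crashes arrive at random at roughly the once-per-4-hours rate estimated in the first post (a Poisson model, which is an assumption, not something measured). The function names are made up for illustration.

```python
import math


def prob_no_crash(hours, mean_hours_between_crashes=4.0):
    """Chance that a still-faulty system stays quiet for `hours`."""
    return math.exp(-hours / mean_hours_between_crashes)


def hours_needed(confidence, mean_hours_between_crashes=4.0):
    """Crash-free hours needed so that a still-faulty system would have
    crashed at least once with probability `confidence`."""
    return -mean_hours_between_crashes * math.log(1.0 - confidence)


if __name__ == "__main__":
    for h in (4, 8, 12, 24):
        print(f"{h:>2} h crash-free: a faulty system stays quiet "
              f"{prob_no_crash(h):.1%} of the time")
    print(f"hours needed for 95% confidence: {hours_needed(0.95):.1f}")
```

Under those assumptions, a single clean 4-hour evening proves very little, while around 12 crash-free hours with the swapped part would be fairly convincing. Since the real crashes come in bursts (twice in ten minutes, then nothing for a day), treat the numbers as a lower bound on how patient you need to be.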
I'll also give memtest a go - I have a CD/DVD burner and a copy of Nero that came free with it that I've used all of once.
-You can try using the PC in safe mode, or even let it sit in the BIOS for some hours, and see if it crashes. It may be boring, but it may give a clue as to whether your problem lies with drivers or hardware.
-Also do a clean reinstall of your video drivers with Driver Sweeper.
-If you use memtest, you'll want to leave it running for about 12 hours or 20 passes to be sure your RAM is stable. Using OCCT or Prime95 for some hours to stress your PC may also help; OCCT has built-in GPU- and PSU-specific stress tests.
-This may be a long shot, but if your crash is similar to the one I experience in my rig, you may be able to replicate it by turning on the PC and immediately starting a session of heavy gaming. Most of the time this provokes the deadly sound loop and total system freeze on my PC.
Having tweaked the BIOS settings (putting the CPU fan profiles on a higher setting, basically), I gave it a very brief test last night. Running Prime, I kept an eye on the CPU temp for ~30 minutes; it never peaked above 61, so that's fine.
I used MSI Afterburner to test the graphics card - under 100% load using the Kombustor test (I think that's the name), the temperature reached a plateau of 63 degrees at 56% fan and stayed there for 20 mins. I also checked to see if it was 'drooping' from the slot, and it appears to be well seated.
Memtest on a bootable CD - didn't have time to run it for long, but 1 pass with no errors.
Obviously I need to run these for longer to be sure, but the problem does not appear to be easily identifiable.
Memtest - any reason not to run it overnight? I mean, unattended. If it DOES encounter a problem, it's not going to cause any issue that could damage the hardware, right? (I don't know how it would, but I'd rather know for sure before I do it)
GPU driver - was installed after a Driver Sweeper clear. I will try and replace with the latest one tonight.
Motherboard drivers - honestly, these are a massive pain to me. I'm never convinced I'm getting the right ones, and most motherboard manufacturers' sites seem to be under the influence of poor or partial translation, or at least assume you already know every individual chip on your board. I will see if I can find replacement drivers, particularly for the onboard sound. Does replacing MoBo drivers require a driver sweep too, and if so, what tool should I use?
Some folks say Memtest needs an all-night session to check RAM properly. Personally, I ran it for 12 or 13 hours and didn't notice any kind of "stress" on the rig; cool air came out of the case during the whole test. Also, surfing around, I found nothing about Memtest damaging anyone's RAM.
Updating your BIOS is advisable too. As for MoBo drivers, I've never read anything about using Driver Sweeper or the like on them, although you can use it on the sound drivers. Maybe you can just update them while in safe mode to be sure.
For this you may use DriverScanner; even if you don't buy the full version, it is useful for properly identifying the device names and the latest driver versions you should look for.
This random crash bug is an acute pain where the sun don't shine; I know it from experience. Yours seems to point to the GPU, since your rig worked fine with the 4850. If possible, run the stress tests for at least an hour or two to make sure the new card is stable, and it would be great if you could borrow a similar GPU from a friend, as Griffolion says.
Be patient mate, and keep on troubleshooting; the solution may be very close.