Any tips, tricks, or programs to use would be great. Let me know what you've got!
To make sure your machine is 100% stable, you have to stress each major component to the max: CPU, memory, graphics, and storage I/O.
I run 3 programs at the same time to stress my machine close to the max.
1) Prime95 - Puts a heavy load on the CPU and a moderate load on RAM.
2) QuickPAR - Run one instance for every two CPU cores, so if you have 4 cores, run 2 instances. This puts a moderate to heavy load on the CPU, a very heavy load on RAM, and a moderate load on storage I/O.
3) Any 3D game - This typically puts a moderate load on the CPU and RAM, and a heavy load on the graphics card.
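The instance-count rule in item 2 is simple arithmetic. Here's a minimal Python sketch of it; the function name `quickpar_instances` is hypothetical, and rounding down (with a minimum of one instance) for odd or single-core counts is my own assumption, not part of the advice above:

```python
import os

def quickpar_instances(cores: int) -> int:
    """One QuickPAR instance per two CPU cores, rounded down, but at least one."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return max(1, cores // 2)

if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs (hardware threads), which can be
    # twice the physical core count on CPUs with SMT/Hyper-Threading,
    # so check your actual core count before trusting this number.
    logical = os.cpu_count() or 1
    print(f"{logical} logical CPUs -> {quickpar_instances(logical)} instance(s)")
    for cores in (4, 6):
        print(f"{cores} cores -> {quickpar_instances(cores)} instance(s)")
```

So a quad-core gets 2 instances and a six-core gets 3.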
Launch Prime95 and start the torture test. Let it run for about 16 minutes, and make sure all cores pass the first test. Prime95 checks the accuracy of its floating-point calculations, and when something goes wrong, it reports warnings or errors. Warnings typically mean that either the CPU or the RAM is being stressed just a bit too hard. Errors mean something is seriously wrong; don't do any important work until the problem is solved. With everything working properly, you should get zero errors.
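Roughly speaking, the torture test runs calculations whose correct results are already known and flags any result that doesn't match. As a toy illustration of that known-answer idea only (not Prime95's actual code, which uses heavily optimized floating-point FFT multiplication on much larger numbers), here's a Python sketch that runs the Lucas-Lehmer Mersenne-prime test on small exponents and compares each result with the known answer. The function names and the exponent list are my own choices, and since pure Python mostly exercises integer math here, this is no substitute for Prime95 as an actual stress test:

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime (p must be an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# Known answers: exponent p -> whether 2**p - 1 is prime.
KNOWN_ANSWERS = {
    3: True, 5: True, 7: True, 11: False, 13: True, 17: True,
    19: True, 23: False, 29: False, 31: True, 37: False, 61: True,
    89: True, 107: True, 127: True,
}

def torture_pass() -> list[int]:
    """Recompute every known answer and return the exponents that disagree."""
    return [p for p, expected in KNOWN_ANSWERS.items()
            if lucas_lehmer(p) != expected]

if __name__ == "__main__":
    # On healthy hardware every round reports OK; any mismatch means a
    # calculation came out wrong, which is what a torture test looks for.
    for round_number in range(1, 4):
        failures = torture_pass()
        status = "OK" if not failures else f"MISMATCH on exponents {failures}"
        print(f"round {round_number}: {status}")
```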
Then, while Prime95 is still running, launch QuickPAR.
QuickPAR is a parity/redundancy calculator that creates recovery files you can use to repair damaged files. Set it to 15% redundancy and a block size of 384,000 bytes, and choose between 3 and 4.5 GB of data. Run one instance for every two cores you have.
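To get a feel for how much work each instance is doing, here's a rough Python sketch of the block arithmetic for those settings. It assumes "GB" means binary gigabytes (GiB), that the data is split into 384,000-byte blocks with the last one padded, and that the recovery block count is simply 15% of the source blocks rounded up; QuickPAR's own rounding may differ. `par_block_counts` is a hypothetical name:

```python
BLOCK_SIZE = 384_000        # bytes per block, as suggested above
REDUNDANCY_PERCENT = 15     # 15% redundancy

def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, avoiding floating-point surprises."""
    return -(-a // b)

def par_block_counts(data_bytes: int, block_size: int = BLOCK_SIZE,
                     redundancy_percent: int = REDUNDANCY_PERCENT) -> tuple[int, int]:
    """Return (source_blocks, recovery_blocks) under a simple round-up model."""
    source_blocks = ceil_div(data_bytes, block_size)
    recovery_blocks = ceil_div(source_blocks * redundancy_percent, 100)
    return source_blocks, recovery_blocks

if __name__ == "__main__":
    for gib in (3, 4.5):
        data_bytes = int(gib * 1024**3)
        source, recovery = par_block_counts(data_bytes)
        recovery_mib = recovery * BLOCK_SIZE / 1024**2
        print(f"{gib} GiB -> {source} source blocks, {recovery} recovery blocks "
              f"(about {recovery_mib:.0f} MiB of recovery data)")
```

Under these assumptions, 3 GiB works out to 8,389 source blocks and 1,259 recovery blocks per instance, and 4.5 GiB to 12,583 and 1,888.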
QuickPAR performs 3 checks. First, a quick memory test at the beginning detects flaky memory right away.
At the 50% mark, QuickPAR does a parity check to see whether the calculations mesh properly with the hash (or so I believe; it does some sort of complicated check, anyway). A failure at the 50% mark usually means memory is being stressed too far, although sometimes it's the CPU.
Finally, once QuickPAR reaches the 100% mark, it checks every file against the redundancy calculation. A failure at this point usually means the CPU is being stressed too far, although sometimes it can be the RAM.
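Checks like these work because a PAR2 recovery set records checksums for each block of the source files, so any block that was read or computed incorrectly stands out. Here's a minimal Python sketch of that block-verification idea, assuming an MD5 hash plus a CRC32 per block (which, as far as I know, is roughly what the PAR2 format stores). The function names are hypothetical, and real PAR2 repair goes further by rebuilding bad blocks from Reed-Solomon recovery data, which this sketch doesn't attempt:

```python
import hashlib
import zlib

def block_checksums(data: bytes, block_size: int) -> list[tuple[str, int]]:
    """Record an (MD5 hex digest, CRC32) pair for each block of the data."""
    checksums = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        checksums.append((hashlib.md5(block).hexdigest(), zlib.crc32(block)))
    return checksums

def find_bad_blocks(data: bytes, block_size: int,
                    expected: list[tuple[str, int]]) -> list[int]:
    """Return the indices of blocks whose checksums no longer match.

    Assumes the data is the same length as when the checksums were recorded.
    """
    actual = block_checksums(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]

if __name__ == "__main__":
    block_size = 1024                       # tiny blocks so the demo runs instantly
    original = bytes(range(256)) * 40       # 10,240 bytes -> 10 blocks
    recorded = block_checksums(original, block_size)

    # Simulate a single flipped bit in block 3, the kind of silent
    # corruption that unstable RAM or an overstressed CPU can cause.
    damaged = bytearray(original)
    damaged[3 * block_size + 17] ^= 0x01
    print("bad blocks:", find_bad_blocks(bytes(damaged), block_size, recorded))
```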
If QuickPAR shows no errors after the initial memory test, launch your 3D game. Don't worry if the game runs a bit slowly. The goal is to engage all 3 (CPU, RAM, and GPU) and make sure they all work while loaded to the max.
-----
This method is rigorous and thorough, and it has proven itself on my own machine.
My initial system had 2 GB of Crucial Ballistix RAM. It worked well for the first four months. Then, all of a sudden, it would no longer run at the DDR2-1066 setting, so I reluctantly lowered the RAM speed to DDR2-800.
At DDR2-1066, Prime95 would fail in a few seconds.
At DDR2-800, Prime95 passed with flying colours.
Even so, I would get occasional crashes, perhaps once every week or two.
Then I upgraded my RAM by purchasing some cheap Kingston DDR2-800 memory.
So I had 2x4 GB Kingston in the primary slots and 2x1 GB Crucial in the secondary ones.
At DDR2-800, Prime95 passed with flying colours.
Crashes reduced to about once every 3 weeks.
The occasional crashes still concerned me, because before the Crucial RAM flaked out, I'd had zero crashes.
I suspected the Crucial memory was still flaking out, but didn't know how to test for it.
Then another upgrade: my old CPU, an AMD Phenom 9950 quad-core, was replaced with an AMD Phenom II X6 1090T six-core. Now I could run QuickPAR redundancy jobs while playing my games at the same time.
Crashes increased significantly, to once or twice per day.
At that point, I knew for sure there was a problem.
Eventually I worked out the testing method above, and sure enough, as soon as I started the 3rd QuickPAR instance, QuickPAR reported memory errors right away (on the 3rd instance only, not the first two).
I removed the Crucial memory, and voilà, everything passed with flying colours. It took my machine about 4 hours to complete the QuickPAR calculations (Prime95 uses a lot of the available CPU capacity, so QuickPAR runs about 1/3 slower than normal).
This mix of synthetic stress (the Prime95 torture test), real-world calculations (QuickPAR instances on real data), and real-world gaming (any modern 3D game) is the best way I've found to verify that your machine is running top-notch.
Best of luck
--ZeeRoD