Reader How Tos: Building For Stability

Stability Testing, Continued

The final stage is deciding what to do if the system does fail. What follows is my own thought process: I try to place the failure in one of my main categories of instability (hardware, heat or software).

  1. The type of failure shapes my gut reaction. In my experience a complete lock indicates hardware, as a software fault normally causes a system crash of some description. A BSOD (blue screen of death) can indicate hardware or software, depending on the error description. Reboots generally point to hardware. A crash to desktop can be either, but I have found it normally points to RAM. Reboots are difficult to distinguish from a crash to the desktop when testing overnight, so I add a login screen or similar to make the difference visible.
  2. If the test fails, I rerun it with the case off in a cool room. (It is difficult to gauge "cool," but be aware of the ambient temperature.) This should eliminate heat-based problems, so if the system does not crash with the case off, I investigate my cooling solutions further (e.g. heatsink mounting, airflow through the case).
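
The two steps above can be written down as a small lookup. This is only a sketch of my rules of thumb, not a diagnostic tool: the names Symptom and first_suspects are made up for illustration, and the ordering of suspects simply mirrors the text.

```python
from enum import Enum


class Symptom(Enum):
    LOCKUP = "complete lock"
    BSOD = "blue screen of death"
    REBOOT = "reboot"
    CRASH_TO_DESKTOP = "crash to desktop"


def first_suspects(symptom: Symptom, passes_with_case_off: bool) -> list[str]:
    """Return the categories to investigate first, most likely first.

    Hypothetical helper: it encodes the rules of thumb from the text,
    nothing more.
    """
    # If the rerun with the case off in a cool room survives, look at cooling.
    if passes_with_case_off:
        return ["heat"]
    suspects = {
        Symptom.LOCKUP: ["hardware"],
        # A BSOD can go either way; the error description decides.
        Symptom.BSOD: ["hardware", "software"],
        Symptom.REBOOT: ["hardware"],
        # Either is possible, but RAM is the usual culprit.
        Symptom.CRASH_TO_DESKTOP: ["hardware (RAM)", "software"],
    }
    return suspects[symptom]
```

For example, first_suspects(Symptom.REBOOT, False) gives ["hardware"], while any symptom that disappears with the case off gives ["heat"].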

Consider your cooling; this solution has a PSU exhaust and chassis fan around the processor.

  1. If I suspect a software-based error, I investigate drivers, moving backward through releases and reading comments on newsgroups. Have they made "optimisations" that reduce stability? Bear in mind that an apparent software error can hide a PSU or RAM fault: software does not compensate for bad reads or writes of RAM, so a hardware fault can surface as a software error. VIA drivers have, in my opinion, been notoriously unreliable in the past, so I recommend reading the VIA Arena forums (http://forums.viaarena.com/) for feedback on the latest revision; the principle is the same for the other chipset manufacturers. Graphics card drivers suffer from the same problem, so I check forums and look for new "beta" drivers, as these are often the preliminary fix for just my problem. (It can be weeks before the official release, and some drivers never get beyond beta!)
  2. If I suspect the hardware as the cause, I start to use more directed benchmarking. There are numerous RAM tests (that run from boot disks) to help check RAM integrity. I have found that these sometimes fail to recreate the problem, as they do not produce the same amount of loading (hence heat) as a full multimedia test.
  3. The SiSoftware Sandra application (http://www.sisoftware.co.uk/sandra) provides a "burn in" test, but it has not always recreated my problem. I tend to loop a mix of tests, combining some that load the system heavily with some that test fundamental hardware performance. Numerous processor, graphics card and RAM tests are free to download off the net and can be found with minimal effort.
  4. If you still believe there is a hardware problem, you should by now have a reasonable hunch as to the cause. The process can be approached from several angles. First, replacement items can be arranged from the original supplier (though this takes time). Second, I borrow components from a "donor" system for testing. Finally, I put suspect components in a donor system to see if the problem is recreated. Make sure you ask the owner of the donor PC first, because there is a small chance you could damage donor parts.
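
Looping a mix of tests overnight is easier to interpret if something records which test was running when the machine died. The following is a minimal sketch, assuming each test can be started from the command line; the function names soak and log_line, the log file and the commands passed in are all hypothetical, not part of any particular benchmark. Each log entry is forced to disk so that it survives a hard lock or reboot.

```python
import os
import subprocess
import time


def log_line(log_path, message):
    """Append a timestamped line and force it to disk immediately."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {message}\n")
        log.flush()
        os.fsync(log.fileno())  # so the entry survives a lock-up or reboot


def soak(commands, passes, log_path):
    """Run every command in turn, `passes` times over.

    Returns (round number, command) for the first command that exits
    with a non-zero code, or None if every run completed cleanly.
    """
    for round_no in range(1, passes + 1):
        for cmd in commands:
            log_line(log_path, f"round {round_no} START {cmd}")
            result = subprocess.run(cmd)
            log_line(log_path, f"round {round_no} END {cmd} exit={result.returncode}")
            if result.returncode != 0:
                return (round_no, cmd)
    return None
```

If the machine locks or reboots, the last line of the log is a START entry with no matching END, which names the test that was running at the time. A gap in the timestamps followed by nothing at all suggests a lock; a fresh boot with a truncated log suggests a reboot.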

This guide is not exhaustive, but the main aim should be to think about the possible causes and use a systematic approach if problems occur.

Conclusion

I wrote this article because I had not seen one that I thought discussed the many areas of concern I have when building a PC. There is a lack of detail in many areas in this article, as I am trying to describe my approach, rather than to just provide an endless list.

If you consider all the OSes, chipsets and processors involved, I believe you could write a book on this subject. However, I hope my description provokes thought when you contemplate building a system.

My approach has been criticized as being overly cautious, but I believe the need is understood when building a PC for someone else (particularly distant friends, where repair is difficult), or when you encounter problems with reliability. If you have never experienced such problems, consider yourself very fortunate!

My Systems

I currently have two PCs, both Athlon XP-based and using the VIA KT266A chipset. I won't bore you with the specifications, but I have devoted a lot of attention to noise, as fan drone drives me mad! My XP1700+ is overclocked to a 1900+, but is whisper quiet with a GF4Ti4400.