Reader's Voice: Building Your Own File Server

Power, Heat, And Memory

The interior of the case. It isn't pretty, with four PATA cables, seven hard drives, a DVD drive, and power connectors.

Both of my servers use 80% efficient power supplies. The first server has dual Pentium III 933 MHz CPUs, six 250 GB hard drives, and an operating system disk. Its peak power when booting is 214 W, and its power consumption is 95 W when the CPU runs at 100% load. The second server has dual low-voltage Xeon 2.8 GHz CPUs, six 750 GB hard drives, and an operating system disk. Its peak power when booting is 315 W, and the machine uses 164 W when idle and 260 W when the CPU is pegged at 100%.

Unless you have more than six hard drives in your array or a really hot CPU, you don't need a power supply rated at more than 400 W. The power supply does have to deliver enough current on each of the voltage rails the computer uses, but buying a 750 W or larger power supply will only waste money, and it will end up running less efficiently than a 400 W power supply.
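To put rough numbers on that sizing argument, here is a minimal sketch of the arithmetic. It assumes the wattages quoted above were measured at the wall and that the supply is a flat 80% efficient at every load; neither is stated outright, and real efficiency varies with load. The function names are made up for this illustration.

```python
def dc_load_watts(wall_watts, efficiency=0.80):
    """Estimate the DC power delivered to the components for a given wall draw.

    Assumes a constant efficiency, which is a simplification.
    """
    return wall_watts * efficiency


def load_fraction(wall_watts, psu_rating_watts, efficiency=0.80):
    """Fraction of the supply's rated output that is actually in use."""
    return dc_load_watts(wall_watts, efficiency) / psu_rating_watts


# Figures quoted for the second server (assumed to be wall measurements).
readings = {"boot peak": 315, "idle": 164, "full CPU load": 260}

for rating in (400, 750):
    for label, watts in readings.items():
        pct = 100 * load_fraction(watts, rating)
        print(f"{rating} W supply, {label}: {pct:.0f}% of rated output")
```

Under these assumptions the 400 W supply never goes above about two-thirds of its rating, even at boot, while a 750 W supply would spend most of its life below 20% of its rating, a range where power supplies are generally less efficient.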

Memory

Most enthusiasts don't spend enough time talking about memory reliability. They mostly care about clock frequency and latency, which are less important in this environment than reliability. As data goes in and out of your file server, it is stored in system memory. Parity calculations are done in memory, and data from disk is cached there. The best pre-built file server boxes use error correcting code (ECC) memory, while the cheaper ones don't. In my opinion, it is silly to build a high-performance file server and not use ECC memory.

This is the Supermicro MV8 controller card, plugged into a PCI-X slot.

Memory is unlikely to have a permanent error, but it is very likely to have non-permanent errors, called transient errors. IBM estimates that with 1 GB of memory, there will be a transient error every week. Alpha particles in the memory packaging and cosmic rays cause these errors. ECC memory carries extra bits that are used to detect and correct them. Standard-quality ECC memory will correct all 1-bit errors and detect all 2-bit errors in each 64 bits of memory. There are higher-quality ECC components available, such as what IBM offers with its Chipkill memory.
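The correct-one, detect-two behavior described above can be shown with a toy example. The sketch below uses an extended Hamming code on 4 data bits with 4 check bits. Real ECC modules protect 64 data bits with 8 check bits, so this is a scaled-down illustration of the principle rather than the code a memory controller actually implements. The function names are invented for the example.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Indices 1..7 form a Hamming(7,4) word; index 0 is an overall parity bit.
    """
    code = [0] * 8
    # Data bits go in the positions that are not powers of two.
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    # Each check bit makes the parity of the positions it covers even.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Overall parity bit makes the whole 8-bit word even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)  # work on a copy
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions points at a single flipped bit
    odd_parity = sum(code) % 2 == 1

    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Odd number of flips: treat as one error and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but a nonzero syndrome: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
word[5] ^= 1              # one transient bit flip
print(decode(word))       # ('corrected', 11)
word[2] ^= 1              # a second flip in the same word
print(decode(word))       # ('uncorrectable', None)
```

A single flipped bit is repaired silently, while a second flip in the same word is reported but cannot be fixed.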

Errors in areas of memory that are written to before they are read, or in unused areas of memory, are not a problem. However, a memory error that does affect processing in some way is a bad thing. Serious server motherboards, such as those from Tyan and Supermicro, will log memory errors. Cheaper motherboards, such as my Asus CUR-DLS and Asus NCCH-DL, will support ECC memory, but won't log memory errors.
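As one illustration of what error logging can look like in practice, the sketch below reads the corrected and uncorrected error counters that the Linux kernel's EDAC subsystem exposes through sysfs. This assumes a Linux system with an EDAC driver loaded for the board's chipset; the article does not say which operating system or tool does the logging on the boards mentioned, so treat this as one possible approach. The function name is made up for the example.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {'corrected': n, 'uncorrected': n}} from EDAC sysfs.

    Returns an empty dict when no memory controllers are exposed, for example
    when no EDAC driver is loaded or the platform has no ECC support.
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = {"corrected": corrected, "uncorrected": uncorrected}
    return counts


counts = read_edac_counts()
if not counts:
    print("No EDAC memory controllers found")
for name, c in counts.items():
    print(f"{name}: {c['corrected']} corrected, {c['uncorrected']} uncorrected")
```

A corrected count that keeps climbing on one controller is worth investigating even though each individual error was repaired.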

Some chipsets do not support ECC memory at all, and motherboards built on those chipsets cannot use it. I recommend using only motherboards that support ECC memory, and using only ECC memory in them. If you are really worried about memory errors, the better motherboards will support IBM's Chipkill technology, which will detect and correct many multi-bit errors and even continue to function if a single memory chip fails.

See the following links for details:

ECC SDRAM Primer
IBM’s Chipkill White Paper
EETimes on IBM’s Chipkill
Soft Errors in Electronic Memory