There’s More To An Enterprise Drive
Reason #3: Reliability
Clearly, building a structurally stronger drive helps it last longer and reduces the wear and tear caused by vibration. But there’s much more to how enterprise drives go about extending their functional lifespan well beyond that of desktop drives.
Ultimately, the most important characteristic of any hard drive is its ability to store data and retrieve data accurately upon request. Without accuracy, no other characteristics matter. This is why all drives implement error correction code (ECC), a technology that allows in-flight data to be checked for errors and, if necessary, automatically corrected. SAS drives will also typically add error detection code (EDC) for additional data protection as well as more advanced firmware features and algorithms designed to assist in error handling.
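The division of labor between detecting and correcting bit errors can be illustrated with a classic single-error-correcting code. The Hamming(7,4) sketch below is a textbook toy chosen for brevity, not the far stronger codes actual drive firmware uses; the function names are ours:

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword (positions p1 p2 d1 p3 d2 d3 d4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Recompute the parities; the syndrome gives the 1-based error position (0 = clean)."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 * 1 + s2 * 2 + s3 * 4
    if pos:                # a nonzero syndrome pinpoints the flipped bit...
        c = c[:]
        c[pos - 1] ^= 1    # ...so we can correct it in place
    return [c[2], c[4], c[5], c[6]]

codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1  # simulate a single-bit error in flight
print(hamming74_decode(codeword))  # recovers [1, 0, 1, 1]
```

The same principle scales up in real ECC: extra parity information stored alongside the data lets the drive detect that a read went wrong and, within limits, repair it transparently.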
These added safeguards are sufficient to give enterprise drives an order of magnitude greater data protection. Whereas a Barracuda desktop drive will experience an unrecoverable read error once in every 10^14 bits read, a Constellation ES nearline drive will experience one such error in every 10^15 bits. In a three-drive RAID, this improvement would drop the chance of an unrecoverable read error from 12% down to under 2%. More volumes in the RAID and/or the use of a more fault-tolerant RAID type minimize the risk further.
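The effect of that order-of-magnitude difference compounds over large reads. Assuming a constant per-bit error rate, the probability of hitting at least one unrecoverable read error while reading n bits is 1 − (1 − p)^n. The 2 TB read size below is our illustrative assumption, not a figure from the article:

```python
def ure_probability(bits_read, per_bit_rate):
    """Chance of at least one unrecoverable read error across bits_read bits."""
    return 1 - (1 - per_bit_rate) ** bits_read

bits = 2 * 10**12 * 8  # reading 2 TB, e.g. during a rebuild (assumed workload)
desktop = ure_probability(bits, 1e-14)   # 1 error per 10^14 bits read
nearline = ure_probability(bits, 1e-15)  # 1 error per 10^15 bits read
print(f"desktop:  {desktop:.1%}")   # roughly 15%
print(f"nearline: {nearline:.1%}")  # roughly 1.6%
```

With these assumptions the desktop-class rate lands in the low double digits while the nearline rate stays under 2%, in line with the article's 12%-versus-2% comparison.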
The risk of a non-recoverable error happening during a RAID rebuild is real and potentially very destructive. This is why rebuilds need to complete as quickly as possible, in order to keep that window of vulnerability short. Seagate introduced a feature called RAID Rebuild into its enterprise drives (also called Rebuild Assist in versions submitted to open standards committees) designed to keep this window narrower than ever before. See Seagate’s “Reducing RAID Recovery Downtime” technical paper at http://bit.ly/yieFrd for more details.
“If you are rebuilding a RAID, and you experience a second error during that time, you’re out of luck – you’ve lost everything,” says Seagate’s Barbara Craig. “Plus there’s the downtime involved in rebuilding. Our new RAID Rebuild cuts the rebuild time down to approximately 10% of the time. Essentially, we do copies of all of the data that’s good, and then we just rebuild the part that’s in the damaged area of the drive rather than the entire drive.”
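The copy-then-reconstruct idea Craig describes can be sketched as a toy cost model. This is our simplification, not Seagate's implementation: readable sectors are copied cheaply, and only sectors in the damaged region pay the expensive parity-reconstruction cost (modeled here, arbitrarily, as 10x a copy):

```python
def full_rebuild(sectors, rebuild_cost=10):
    """Conventional rebuild: reconstruct every sector from parity."""
    return sectors * rebuild_cost

def rebuild_assist(sectors, bad_sectors, copy_cost=1, rebuild_cost=10):
    """Rebuild-assist-style recovery: copy what is still readable,
    reconstruct only the sectors the drive reports as unreadable."""
    return sum(rebuild_cost if i in bad_sectors else copy_cost
               for i in range(sectors))

total = 1000
bad = {17, 318, 319}                     # a small damaged region
print(full_rebuild(total))               # 10000 cost units
print(rebuild_assist(total, bad))        # 1027 cost units, ~10% of a full rebuild
```

When damage is confined to a small area, the assisted rebuild's cost is dominated by cheap copies, which is why the window of vulnerability shrinks so dramatically.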
The flip side of reliability is device longevity: how long one can expect the drive to run before ceasing to function. This is typically expressed with the mean time between failures (MTBF) metric, or sometimes as an annual failure rate (AFR) percentage. It’s commonly known that desktop drives have an MTBF of around 700,000 hours. This reflects a bell curve-type analysis in which the peak of the curve rests at 700,000 hours: half of the drives in any given lot will fail sooner than this, a few considerably sooner. Every drive that dies requires IT support time, creating additional costs beyond any replacement expenses. Thus it pays to push the mid-point of that MTBF bell curve as far to the right as possible. Enterprise drives generally feature a 1.2, 1.4, or even 2.0 million-hour MTBF.
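MTBF figures translate into the AFR percentages vendors also quote. Under the standard (and simplifying) assumption of a constant failure rate, AFR = 1 − e^(−8760 / MTBF):

```python
import math

HOURS_PER_YEAR = 8760

def afr(mtbf_hours):
    """Annual failure rate implied by an MTBF, assuming an exponential
    (constant-rate) failure model -- a common engineering simplification."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

for mtbf in (700_000, 1_200_000, 2_000_000):
    print(f"{mtbf:>9,} h MTBF -> AFR {afr(mtbf):.2%}")
```

A 700,000-hour MTBF works out to an AFR of roughly 1.2%, while a 1.2 million-hour drive lands near 0.7%, which is why pushing that curve rightward pays off across a large fleet.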
But that’s not the end of the MTBF story. Keep in mind that heat plays a key role in wear on mechanical parts and electronics. When desktop drives specify 700,000 hours, this is usually based on laboratory tests conducted in a roughly 40°C ambient setting, just over the maximum recommended temperature threshold Intel advises for performance-oriented desktop PCs. Enterprise drives achieve their 1.2 million-hour rating under 60°C conditions, which is much more applicable to a high-volume, high-density data center application. Put drives qualified for 40°C into a 60°C environment and watch both performance decline and the MTBF curve swing to the left.
“Under data center conditions, a SAS drive operates anywhere from 1.5 to three or four times faster than a desktop drive,” Seagate representatives informed us. “So if you take a SATA drive and try to run it in exactly the same enterprise application, the very fact that the drive is, let’s say, three times slower than an enterprise drive means that it will take three times longer to do the work. If an enterprise drive is busy 33% of the time, a desktop drive would be always busy with no idle time whatsoever. The usage goes up, the temperature goes up. When temperature goes up, typically reliability comes down. So you’re going to get more failures.”
The greater fragility of desktop drives is part of why their duty cycle is so much lower than that of their enterprise counterparts. Desktop models sport a recommended duty cycle of eight hours per day, five days per week, roughly 2,400 powered-on hours per year. Enterprise drives are expected to function around the clock: 8,760 hours per year. Expressed differently, one could expect to go through 3.65 desktop drives during the effective use time of one enterprise drive.
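The 3.65 figure falls straight out of the two duty cycles:

```python
desktop_hours_per_year = 2400       # ~8 h/day, 5 days/week duty cycle
enterprise_hours_per_year = 24 * 365  # around-the-clock operation = 8,760 h

ratio = enterprise_hours_per_year / desktop_hours_per_year
print(ratio)  # 3.65 desktop duty cycles per enterprise duty cycle
```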