Enterprise SSDs: Are You Asking Too Much From Client Drives?

#2) Enterprise Endurance

All computer components fail eventually. The object of the game is to get the longest service life with the highest performance at the lowest price, and buyers inevitably have to weigh compromises among those factors in their storage decisions. Enterprises are rightfully phobic about data corruption and loss: a home user will survive if a Web cache file becomes inaccessible, but an enterprise will have a serious problem if it suddenly finds a batch of mission-critical data rendered into meaningless gibberish. This is why many enterprises implement parity and redundancy in their systems, but such caution adds cost. One way to mitigate that cost is to experience fewer hardware failures, and that means using a higher grade of component in storage systems.

When discussing SSDs, longevity is often described in terms of endurance. This refers to the number of write (program/erase) cycles a NAND cell can sustain before it wears out and becomes unable to retain data. SSD controllers use advanced wear leveling algorithms to make sure that writes get spread evenly across all of the NAND media, so when one cell wears out, it’s a fair bet that many other cells aren’t far behind. As such, drive vendors can approximate a drive’s endurance based on the expected endurance of its NAND cells.
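As a rough illustration of how a vendor might work from cell endurance to a drive-level rating, total endurance can be approximated from capacity, the NAND's rated program/erase cycles, and the controller's write amplification factor. This is a first-order sketch; the function name and figures are illustrative, not from any vendor's datasheet:

```python
def estimated_tbw(capacity_gb, pe_cycles, write_amplification):
    """First-order estimate of drive endurance in terabytes written (TBW).

    Assumes wear leveling spreads writes evenly, so total endurance is
    roughly capacity x rated P/E cycles, reduced by write amplification
    (the ratio of NAND writes to host writes).
    """
    return capacity_gb * pe_cycles / write_amplification / 1000  # GB -> TB

# Hypothetical 400 GB drive, NAND rated at 10,000 P/E cycles,
# write amplification factor of 2.0:
assert estimated_tbw(400, 10_000, 2.0) == 2000.0  # 2,000 TB written
```

Real ratings also fold in over-provisioning, workload mix, and retention margins, so published TBW figures are typically more conservative than this simple product.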

When SSDs were first entering the mainstream, single-level cell (SLC) NAND offered much faster performance and a 100,000 write cycle endurance rating, which made it the clear choice for enterprises. In contrast, multi-level cell (MLC) NAND had higher data density (and thus higher capacity), but its rated endurance was only 10,000 cycles. Since consumers and office workers generate far fewer writes each day, this lower endurance wasn’t seen as a problem for client drives.
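Some back-of-the-envelope arithmetic shows why 10,000 cycles was considered acceptable for client use. Assuming a hypothetical 128 GB MLC drive, roughly 20 GB of host writes per day, and a modest write amplification factor (all illustrative assumptions, not measured figures):

```python
def drive_life_years(capacity_gb, pe_cycles, gb_written_per_day, waf=1.5):
    """Estimated service life under a steady daily write load.

    Illustrative only: assumes ideal wear leveling and a fixed write
    amplification factor (waf).
    """
    total_host_writes_gb = capacity_gb * pe_cycles / waf
    return total_host_writes_gb / gb_written_per_day / 365

# Hypothetical client workload: 128 GB MLC drive, 10,000 cycles, 20 GB/day
years = drive_life_years(128, 10_000, 20)
assert years > 100  # wear-out life far exceeds the drive's useful life
```

Even with generous margins for write amplification, a light client workload cannot burn through 10,000 cycles within the machine's lifetime; an enterprise database server writing hundreds of gigabytes per day is a very different story.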

SSD endurance can also be affected by the sophistication (or lack thereof) of a controller’s wear leveling routines. Designing algorithms that actually distribute writes with near-perfect evenness across the NAND is harder than it sounds, and lower-quality implementations often develop “hot spots” in which NAND blocks fail prematurely. The SSD space is brimming with new companies. Enterprises often shun untried vendors, and this is one place where that caution is justified.
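The core idea behind wear leveling can be sketched as a toy allocator that always writes to the least-worn free block. This is a deliberately simplified model (real controllers also migrate cold data, manage bad blocks, and work at page granularity), but it shows why even wear falls out of tracking erase counts:

```python
import heapq

class WearLeveler:
    """Toy wear-leveling model: always allocate the least-worn block.

    Simplification for illustration only; real controllers track erase
    counts per block and additionally relocate static ("cold") data so
    that rarely rewritten blocks still participate in leveling.
    """
    def __init__(self, num_blocks):
        # Min-heap of (erase_count, block_id): cheapest wear comes out first.
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self):
        erases, block = heapq.heappop(self.free)
        # Block is erased before reuse, so its wear count goes up by one.
        heapq.heappush(self.free, (erases + 1, block))
        return block

wl = WearLeveler(4)
writes = [wl.allocate() for _ in range(8)]
# Eight writes across four blocks: each block is used exactly twice,
# so no "hot spot" develops.
assert sorted(writes) == [0, 0, 1, 1, 2, 2, 3, 3]
```

A naive allocator that always reused the same free block would concentrate all eight erases on one block, wearing it out four times faster, which is exactly the hot-spot failure mode described above.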

No matter how good the wear leveling and how long the cell endurance, though, NAND blocks fail sooner or later. Enterprise drives from the SCSI, SAS, and Fibre Channel worlds have long histories of implementing extra data protection measures for those instants of data failure, and thankfully many of these have carried over into the SSD space, including:

  • Error correction code (ECC). ECC relies on extra bits in data streams to detect and then correct any information that might have been corrupted while in flight between two devices.
  • T10 Protection Information (PI). PI allows a checksum to pass from the application through the host bus adapter and out to the storage device. ECC and other measures perform a similar function, but PI does a more thorough job of validating data as it moves through the entire storage subsystem.
  • Input/output error detection and correction code (IOEDC/IOECC). NAND suppliers provide basic IOEDC/IOECC intelligence, but it falls to the drive vendor to add safeguards beyond this. Some do, some don’t. Not all IOEDC/IOECC implementations are created equal.
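Production drives use much stronger codes than this (BCH or LDPC codes over entire NAND pages), but the detect-then-correct behavior that ECC provides can be illustrated with a classic Hamming(7,4) code, which protects four data bits with three parity bits and can repair any single flipped bit:

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4          # covers positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4          # covers positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(code):
    """Detect and correct up to one flipped bit, then return the data bits."""
    c = code[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 = clean
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])
word[3] ^= 1  # simulate one bit corrupted in flight
assert hamming74_correct(word) == [1, 0, 1, 1]  # data recovered intact
```

The principle scales: the stronger the code, the more simultaneous bit errors a drive can absorb silently, which is one of the places enterprise drives spend their extra cost.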

These measures and others add “robustness” to a drive’s ability to protect data integrity. Client drives tend not to support them, or implement them with less depth than is found in enterprise drives. If data is worth protecting, it’s worth protecting thoroughly.