The Roots of Wear and Tear
Recall that single-level cell (SLC) technology stores one bit in each cell. The value of that cell is either 1 (empty floating gate, no charge) or 0 (containing electrons, negative charge). Multi-level cell (MLC) technology stores more than one bit per cell by distinguishing several charge levels on the floating gate rather than just two. With two bits per cell, each unit can register values of 00, 01, 10, or 11: four possible values per cell rather than SLC’s two. This yields higher data density and explains much of why MLC drives can deliver higher capacities at lower price points.
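The value counts above follow directly from the number of bits stored per cell. As a minimal sketch (the function names `cell_values` and `capacity_bits` are illustrative, not taken from any vendor tool), the relationship can be written out like this:

```python
def cell_values(bits_per_cell: int) -> list:
    """List every bit pattern a single cell can represent."""
    count = 2 ** bits_per_cell  # each extra bit doubles the number of values
    return [format(i, f"0{bits_per_cell}b") for i in range(count)]


def capacity_bits(cell_count: int, bits_per_cell: int) -> int:
    """Raw capacity of an array of cells, ignoring spare area and ECC."""
    return cell_count * bits_per_cell


print(cell_values(1))  # SLC: ['0', '1']
print(cell_values(2))  # two-bit MLC: ['00', '01', '10', '11']

# The same hypothetical array of one million cells holds twice the data as MLC.
print(capacity_bits(1_000_000, 1), capacity_bits(1_000_000, 2))
```

The same number of physical cells yields twice the raw capacity at two bits per cell, which is the density advantage described above.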
The downside of having thick insulation is the extra voltage required to force electrons back and forth across that barrier. Programming uses a 20V charge while erasure uses -20V. Over thousands of electron flows, the oxide layer wears down. Imagine sticking a knitting needle through a thick slab of fiberglass insulation. One poke isn’t going to make any difference to the insulation’s heat retention properties. But thousands of such pokes will leave your insulation in tatters. Without an effective insulating oxide layer, NAND cells can’t hold their electron charges reliably and are thus permanently damaged. The drive controller marks the affected block as bad. Obviously, the more electrons you have going in and out of a cell, the faster that oxide layer will deteriorate. Again, this is why SLC technology offers much longer endurance than MLC.
That said, the challenge to maximize cell-level and drive-level endurance remains for both technologies. Traditionally, MLC cells could withstand roughly 10,000 program/erase cycles while SLC could tolerate 100,000. Depending on the vendor, these numbers today can be substantially higher or lower. But using those traditional numbers as a reference, if you were to program and erase the same MLC block once per minute, it would reach its 10,000-cycle limit after 10,000 minutes, or about one week. Fortunately, SSDs contain many thousands of blocks, and the wear leveling algorithms built into their controllers intelligently distribute write and erase operations across all of the NAND media. This “wear leveling” is critical in prolonging device longevity, and some vendors do it better than others. Unfortunately, it’s difficult to get a sense of wear leveling quality simply by looking at a spec sheet.
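The one-week figure and the benefit of wear leveling can both be checked with a little arithmetic. The sketch below uses the traditional cycle ratings quoted above. The `WearLeveler` class is a deliberately naive illustration that always writes to the least-erased block; it is an assumption made for demonstration, not how any particular controller works.

```python
MINUTES_PER_DAY = 24 * 60


def days_until_worn_out(rated_cycles: int, cycles_per_minute: float = 1.0) -> float:
    """Days until one block hits its rated cycle count when hammered alone."""
    return rated_cycles / cycles_per_minute / MINUTES_PER_DAY


def days_with_ideal_leveling(rated_cycles: int, block_count: int,
                             cycles_per_minute: float = 1.0) -> float:
    """Same workload, spread perfectly evenly over block_count blocks."""
    return days_until_worn_out(rated_cycles, cycles_per_minute) * block_count


class WearLeveler:
    """Naive leveling: every write goes to the block with the fewest erases."""

    def __init__(self, block_count: int):
        self.erase_counts = [0] * block_count

    def write(self) -> int:
        # Pick the least-worn block (lowest index wins ties).
        block = min(range(len(self.erase_counts)),
                    key=self.erase_counts.__getitem__)
        self.erase_counts[block] += 1
        return block


print(round(days_until_worn_out(10_000), 1))   # MLC: 6.9 days
print(round(days_until_worn_out(100_000), 1))  # SLC: 69.4 days

leveler = WearLeveler(block_count=4)
for _ in range(10):
    leveler.write()
print(leveler.erase_counts)  # [3, 3, 2, 2]
```

With perfect leveling, the time to wear-out scales with the number of blocks available, which is why the quality of a controller's algorithm matters so much. Real controllers also have to handle static data, garbage collection, and spare blocks, none of which this sketch attempts to model.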
Additionally, imagine walking into a room and assessing the level of the light you find there. With only two possible options, on or off, the measurement is easy. But with four possible light settings, you might wonder if the lights are all the way up or only three-quarters up. You might have to spend a little more time thinking about it, essentially measuring and performing error checking. This is why MLC memory uses four or more times as many error correction bits as SLC. As MLC cells wear, their readings become harder still to distinguish, so even more error correction processing is needed to derive accurate values from them. This is part of why read performance in some MLC drives tends to degrade with time as the media gets heavily used.
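The light-level analogy can be made concrete with a toy model. The sketch below assumes an idealized cell whose charge is normalized to the range 0 to 1, with evenly spaced target levels and a reader that picks the nearest level. Real NAND threshold voltages and read circuits are more complicated, and all names here are illustrative.

```python
def level_centers(bits_per_cell: int) -> list:
    """Evenly spaced target charge levels across a normalized 0..1 window."""
    levels = 2 ** bits_per_cell
    return [i / (levels - 1) for i in range(levels)]


def read_margin(bits_per_cell: int) -> float:
    """Largest charge drift a cell tolerates before it reads as a neighbor."""
    levels = 2 ** bits_per_cell
    return 0.5 / (levels - 1)  # half the gap between adjacent levels


def read_level(charge: float, bits_per_cell: int) -> int:
    """Return the index of the level closest to the measured charge."""
    centers = level_centers(bits_per_cell)
    return min(range(len(centers)), key=lambda i: abs(charge - centers[i]))


drift = 0.2  # hypothetical charge shift caused by wear

# SLC: a cell programmed to level 1 (charge 1.0) still reads back as level 1.
print(read_level(1.0 - drift, bits_per_cell=1))    # 1

# MLC: a cell programmed to level 1 (charge 1/3) now reads back as level 2.
print(read_level(1 / 3 + drift, bits_per_cell=2))  # 2
```

In this model, the same amount of drift that an SLC cell shrugs off pushes a two-bit MLC cell into the wrong level, which is the kind of misread that error correction has to catch.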
SSD Endurance Characteristics
Understanding the differences between MLC and SLC is important. The underlying cell-level technology plays a large role in determining final SSD endurance, but there is much more to the discussion. As noted, there can be a considerable endurance variance within the MLC or SLC categories from vendor to vendor and drive to drive. Perhaps surprisingly, some of the variance can also be caused by environment and usage circumstances. How a drive gets used can literally make or break that device’s endurance promises.
“There are different temperature ranges and different power-on hours,” says Seagate’s Teresa Worth. “Data centers are 24x7, 365 days a year whereas a heavy laptop user might be on eight to 10 hours a day, five days a week. Those are very different usage patterns. Likewise, the workloads are very different. With an enterprise, you’re going to have complex, intensive workloads, with much more emphasis on random reads and writes vs. more simple, read-oriented workloads. The data patterns are going to be different, as well.”