Old School Endurance
Can MLC Endure Your Data Load?
Imagine a glass vial filled with water, sealed with a flexible rubber cap and hung so that the cap faces downward. One could push a syringe needle through the cap, draw out a couple of drops, withdraw the needle, and the rubber seal would still be watertight. No drops would leak out. But what if the needle punctured the rubber a hundred times? A thousand? Ten thousand? At some point, the rubber will be so degraded that it can no longer keep the vial watertight, and the vial will no longer reliably hold water.
NAND cells work in much the same fashion. Each flash chip contains billions of memory cells. Each cell has a floating gate sitting above a silicon substrate, with a thin oxide layer sandwiched in between. A single-level cell (SLC) stores one bit in that gate; a multi-level cell (MLC) stores two or more bits by distinguishing between several levels of stored charge. Data writes push electrons from the substrate through the oxide and into the floating gate, where they remain, like water in the vial, until an erase operation reverses the voltage and pulls the electrons back out. The oxide layer is the rubber cap that holds in the electrons, thus preserving the cell’s data. Each time electrons pass through the oxide, the layer erodes slightly. The amount of erosion per cycle depends in part on the voltage behind the push. Storing more bits per cell also means squeezing more charge levels into the same voltage window, which leaves less margin between them, so an MLC cell tolerates less oxide wear before its levels blur together. This is why SLC cells offer greater endurance than MLC. After enough thousands of program/erase cycles, the oxide is worn out and the controller must retire the block containing the cell, after which it no longer treats that block as a valid storage location.
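The retirement step described above can be sketched as a counter per erase block. This is a minimal illustration, not how any particular controller works: the `EraseBlock` class and its fields are hypothetical, and retiring a block on a fixed rated count is a simplification. Real controllers typically retire blocks when program or erase operations fail, or when errors exceed what error correction can handle.

```python
from dataclasses import dataclass


@dataclass
class EraseBlock:
    """Hypothetical model of one NAND erase block with a fixed rated endurance."""

    rated_pe_cycles: int   # e.g. 3,000 for low-endurance MLC, far more for SLC
    pe_count: int = 0      # program/erase cycles consumed so far
    retired: bool = False  # True once the controller stops using the block

    def program_erase(self) -> None:
        """Consume one program/erase cycle, retiring the block when its budget is spent."""
        if self.retired:
            raise RuntimeError("block has been retired; the controller must not use it")
        self.pe_count += 1
        # Simplification: retire on a fixed count rather than on observed errors.
        if self.pe_count >= self.rated_pe_cycles:
            self.retired = True


mlc = EraseBlock(rated_pe_cycles=3_000)
while not mlc.retired:
    mlc.program_erase()
print(mlc.pe_count, mlc.retired)
```

Once `retired` is set, any further write attempt raises an error, mirroring the controller no longer treating the block as valid storage.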

When SSDs first entered the mainstream, NAND manufacturers were working with larger fabrication nodes than today’s 22 to 30 nm lithography range. Back then, MLC cells were commonly rated for about 10,000 program/erase cycles and SLC for about 100,000. Unfortunately, smaller process nodes mean smaller cell features, which in turn reduce each cell’s endurance. Today, some MLC NAND (such as that used in the Kingston HyperX 3K) is rated for only 3,000 cycles, and SLC is veering down toward 50,000. Nearly all current MLC stores two bits per cell, which requires four distinguishable charge levels. Triple-level cell (TLC) NAND, which stores three bits per cell, has been around for a few years, but no one has taken it into mass production precisely because of these endurance concerns. Squeezing eight charge levels into each cell would push endurance ratings even below their already depressed levels.
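To see what these ratings mean in practice, a rough lifetime estimate multiplies capacity by rated program/erase cycles. The sketch below assumes perfect wear leveling, a hypothetical 120 GB drive, and a hypothetical 20 GB of host writes per day; `write_amplification` (NAND writes divided by host writes) is an added assumption, set to 1.0 here, meaning no amplification, which is optimistic for most drives.

```python
def lifetime_host_writes_tb(capacity_gb: float, pe_cycles: int,
                            write_amplification: float = 1.0) -> float:
    """Approximate total host writes (TB) before the rated P/E budget is exhausted.

    Assumes ideal wear leveling, so every cell absorbs the same number of cycles.
    """
    nand_writes_gb = capacity_gb * pe_cycles  # total NAND writes the cells can absorb
    return nand_writes_gb / write_amplification / 1000


# Hypothetical 120 GB drive, using the 10,000 and 3,000 cycle figures from the text.
for label, cycles in [("older MLC", 10_000), ("3K MLC", 3_000)]:
    tb = lifetime_host_writes_tb(120, cycles)
    years = tb * 1000 / 20 / 365  # hypothetical 20 GB of host writes per day
    print(f"{label}: {tb:.0f} TB, about {years:.0f} years at 20 GB/day")
```

The drop from 10,000 to 3,000 cycles cuts the estimated write budget by the same factor of about 3.3, and any write amplification above 1.0 shrinks it further.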

The counterforce against this endurance decline is capacity expansion. Roughly speaking, when capacity (the number of available NAND cells) doubles, each cell should be written only half as often for the same workload, thanks to wear-leveling algorithms that distribute writes more or less evenly across the media. In a better world, capacity would grow faster than shrinking process nodes erode endurance. Unfortunately, this is not the case. The NAND market is losing endurance faster than rising capacities can recoup it, leaving a looming crisis that manufacturers’ R&D efforts are racing to solve.
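The halving effect of wear leveling can be illustrated with a deliberately idealized leveler that always sends the next erase to the least-worn block. The `wear_level` function and the 64,000-erase workload are hypothetical; real controllers also relocate static (rarely rewritten) data so that its blocks share the wear, which this sketch ignores.

```python
import heapq


def wear_level(num_blocks: int, num_erases: int) -> list[int]:
    """Return per-block erase counts after sending every erase to the least-worn block."""
    heap = [(0, block) for block in range(num_blocks)]  # (erase count, block id)
    heapq.heapify(heap)
    counts = [0] * num_blocks
    for _ in range(num_erases):
        count, block = heapq.heappop(heap)  # least-worn block, ties broken by id
        counts[block] = count + 1
        heapq.heappush(heap, (count + 1, block))
    return counts


workload = 64_000  # hypothetical number of block erases caused by the workload
for blocks in (64, 128):
    counts = wear_level(blocks, workload)
    print(f"{blocks} blocks: max {max(counts)} erases per block")
```

Doubling the block count halves the per-block wear, so capacity growth alone can only offset a proportional endurance drop. Breaking even on the fall from 10,000 to 3,000 rated cycles would need roughly 3.3 times the capacity for the same workload.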