SSD enthusiast Russ Bishop (@xenadu02) has reportedly tested four NVMe SSDs to see how well they protect data against power loss. According to his testing, two of the four drives lost data that they had already reported as flushed from their DRAM cache when power was deliberately cut.
Most SSDs on the market today use a DRAM cache to improve latency and bandwidth. However, DRAM is volatile: it cannot retain data once power is lost, which is a legitimate reliability concern for SSDs when unexpected power outages occur. Most consumer SSDs aren't equipped with the power loss capacitors that we see on enterprise SSDs, which makes them more vulnerable to losing data during unexpected power loss events.
The DRAM chip holds a lot of important data, not just temporary data waiting to be transferred to NAND storage. DRAM also holds the drive's FTL, or Flash Translation Layer, which maps the logical addresses the host uses to the physical NAND locations where the data is actually stored. If the FTL is corrupted, the contents of the entire SSD could become corrupted as well.
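To make the role of the FTL more concrete, here is a toy model. It is a minimal sketch under simplifying assumptions, not how any vendor's firmware works: the class name `ToyFTL` and its methods are hypothetical, a plain dictionary stands in for the DRAM-resident map, and real FTLs also handle garbage collection, wear leveling, and bad blocks. It shows why losing the map is so serious: the data is still physically on the NAND, but the drive no longer knows which page holds which logical block.

```python
class ToyFTL:
    """Hypothetical, simplified flash translation layer: a logical-to-physical map."""

    def __init__(self, num_physical_pages: int) -> None:
        self.nand = [None] * num_physical_pages  # physical pages (persistent)
        self.l2p = {}                             # logical page -> physical page (in DRAM)
        self.next_free = 0                        # naive append-only allocator

    def write(self, lpn: int, data: bytes) -> None:
        # NAND pages are not overwritten in place, so each write lands on a
        # fresh physical page and the map is updated to point at it.
        if self.next_free >= len(self.nand):
            raise RuntimeError("out of free pages (no garbage collection in this toy)")
        ppn = self.next_free
        self.next_free += 1
        self.nand[ppn] = data
        self.l2p[lpn] = ppn

    def read(self, lpn: int):
        ppn = self.l2p.get(lpn)
        return None if ppn is None else self.nand[ppn]

    def lose_map(self) -> None:
        # Simulates the DRAM-resident map being lost or corrupted.
        self.l2p = {}


ftl = ToyFTL(num_physical_pages=8)
ftl.write(0, b"old")
ftl.write(0, b"new")  # same logical page, new physical page
print(ftl.read(0))    # b'new'
ftl.lose_map()
print(ftl.read(0))    # None: the bytes are still on NAND but unreachable
print(ftl.nand[:2])   # [b'old', b'new']
```

Because the stale copy (`b"old"`) also remains on flash, a drive without its map cannot even tell which copy of a block is the current one.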
Thankfully, some SSD manufacturers have countermeasures in place for such an event. One example is Samsung's use of journaling to keep as much data intact as possible during a power outage. Journaling lets the SSD record the changes it is about to make (such as writes coming from the OS's file system) before it actually applies them. If a power outage wipes the DRAM cache, the SSD can consult the journal to determine which data had already been transferred to the NAND and which data was lost.
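The journaling idea can be illustrated with a generic write-ahead journal. This is a hypothetical sketch of the general technique, not Samsung's firmware: `JournaledDrive` and its methods are invented for illustration, and the code assumes the journal and checkpoint live in non-volatile NAND while the working map lives in DRAM. Each map change is appended to the journal before the drive relies on it, so after power loss the drive can rebuild its map from the last checkpoint plus the journal.

```python
from dataclasses import dataclass, field


@dataclass
class JournaledDrive:
    """Hypothetical drive model: persistent journal + checkpoint, volatile map."""
    nand_pages: dict = field(default_factory=dict)  # persistent: ppn -> data
    journal: list = field(default_factory=list)     # persistent: (lpn, ppn) records
    checkpoint: dict = field(default_factory=dict)  # persistent: last saved map
    dram_map: dict = field(default_factory=dict)    # volatile: lpn -> ppn
    next_ppn: int = 0

    def write(self, lpn: int, data: bytes) -> None:
        ppn = self.next_ppn
        self.next_ppn += 1
        self.nand_pages[ppn] = data
        # Record the map change durably *before* the drive depends on it.
        self.journal.append((lpn, ppn))
        self.dram_map[lpn] = ppn

    def save_checkpoint(self) -> None:
        # Persist the whole map; journal entries before this point are no longer needed.
        self.checkpoint = dict(self.dram_map)
        self.journal.clear()

    def power_loss(self) -> None:
        self.dram_map = {}  # DRAM contents vanish

    def recover(self) -> None:
        # Start from the last checkpoint, then replay journal entries in order.
        self.dram_map = dict(self.checkpoint)
        for lpn, ppn in self.journal:
            self.dram_map[lpn] = ppn

    def read(self, lpn: int):
        ppn = self.dram_map.get(lpn)
        return None if ppn is None else self.nand_pages[ppn]


d = JournaledDrive()
d.write(0, b"A")
d.save_checkpoint()
d.write(0, b"B")
d.write(1, b"C")
d.power_loss()
d.recover()
print(d.read(0), d.read(1))  # b'B' b'C'
```

Data that never reached the NAND (and therefore never got a journal entry) is still lost; the journal only lets the drive rebuild a consistent map of what did make it.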
Other approaches rely on power-loss detection circuitry that senses the supply voltage dropping and triggers a flush of the DRAM contents before power is completely gone. These techniques are often good enough for consumer SSDs, to the point where actual data loss is rare during a power loss event.
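Whether such a last-moment flush succeeds comes down to simple arithmetic: after the circuitry detects the outage, the dirty data has to be written to NAND before the remaining power runs out. The sketch below assumes a fixed hold-up time and a constant NAND write rate; the function names and every number in the example are hypothetical, chosen only for illustration.

```python
def flush_time_s(dirty_bytes: int, nand_write_bytes_per_s: float) -> float:
    """Seconds needed to write the dirty cache contents to NAND."""
    return dirty_bytes / nand_write_bytes_per_s


def can_flush_before_power_dies(dirty_bytes: int,
                                nand_write_bytes_per_s: float,
                                holdup_s: float,
                                detect_latency_s: float) -> bool:
    """True if detecting the drop plus flushing fits inside the hold-up window."""
    needed = detect_latency_s + flush_time_s(dirty_bytes, nand_write_bytes_per_s)
    return needed <= holdup_s


MiB = 1024 * 1024
# Hypothetical numbers: 5 ms of residual power, 0.1 ms to detect the drop,
# and 1 GB/s of sustained NAND write bandwidth.
print(can_flush_before_power_dies(1 * MiB, 1e9, 5e-3, 1e-4))   # True
print(can_flush_before_power_dies(64 * MiB, 1e9, 5e-3, 1e-4))  # False
```

Under these assumed figures, at most about 4.9 MB of dirty data fits in the window, so the outcome depends heavily on how much unflushed data the drive is holding at the moment power drops.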
Fun story: I tested a random selection of four NVMe SSDs from four vendors. Half lose FLUSH’d data on power loss. That is, the flush went to the drive, was confirmed, and success was reported all the way back to userspace. Then I manually yanked the cable. Boom, data gone.
February 21, 2022
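Bishop's tweets don't include his test harness, so the following is only a hypothetical sketch of a test in the same spirit: append numbered records to a file, force each one to stable storage, and log the acknowledgment before the next write. The record format and function names are assumptions. On Linux, `os.fsync()` on a typical file system should cause the kernel to send a flush to a drive that reports a volatile write cache; on macOS, `fsync()` does not flush the drive's cache, so the sketch uses `fcntl.F_FULLFSYNC` there when it is available. The acknowledgments must be recorded somewhere that survives the power cut, such as a terminal on a second machine.

```python
import os
import struct
import sys
import zlib


def sync_to_stable_storage(fd: int) -> None:
    """Ask the OS to push data through to the drive and flush its write cache."""
    if sys.platform == "darwin":
        import fcntl
        full = getattr(fcntl, "F_FULLFSYNC", None)
        if full is not None:
            fcntl.fcntl(fd, full)  # macOS: plain fsync() does not flush the drive cache
            return
    os.fsync(fd)


def make_record(seq: int) -> bytes:
    """Hypothetical 12-byte record: little-endian sequence number + CRC32 of it."""
    body = struct.pack("<Q", seq)
    return body + struct.pack("<I", zlib.crc32(body))


def write_acknowledged_records(path: str, count: int, on_ack=print) -> None:
    """Append records one at a time; report each only after the sync returns.

    Assumes the log file's directory entry is already durable (for example,
    the file was created and its directory synced before the test starts).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for seq in range(count):
            record = make_record(seq)
            if os.write(fd, record) != len(record):
                raise OSError(f"short write for record {seq}")
            sync_to_stable_storage(fd)
            on_ack(seq)  # from here on, this record counts as acknowledged
    finally:
        os.close(fd)
```

After pulling the power and rebooting, every sequence number that was reported through `on_ack` should still be readable from the file on a drive that honors flushes.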
Update 2: models that lost writes:
SK Hynix Gold P31 2TB SHGP31-2000GM-2, FW 31060C20
Sabrent Rocket 512 (Phison PH-SBT-RKT-303 controller, no version or date codes listed)
February 23, 2022
Bishop tested four NVMe SSDs -- the SK Hynix Gold P31 2TB, Sabrent Rocket 512GB, Samsung 970 Evo Plus 2TB, and the Western Digital Red SN700 1TB -- in an effort to see how these drives behave during an unexpected power outage.
The SK Hynix Gold and Sabrent Rocket lost data in the power-loss test even though that data had been "flushed" from DRAM, meaning it never completed its final trip to the NAND. That isn't entirely unexpected given that none of these consumer-class drives have capacitors for full power-loss protection, but it does indicate that some drives may have better emergency data flushing systems even without a full-fledged power loss protection feature.
For now, Bishop says he is going to test eight more drives, including the Intel 670P, Samsung 980 (a DRAMless drive), Crucial P5 Plus, and more to see how various drives handle power loss.
Tomorrow I'll have results for:
Intel 670p
Samsung 980
WD Black SN750
WD Green SN350
Kingston NV1
Seagate Firecuda 530
Crucial P2
Crucial P5 Plus
February 23, 2022
SSDs have no mechanical storage, period. Their storage is all electronic. Any power loss, brownout, power supply blowing up, failing capacitor on the motherboard, or power fluctuation of any type will endanger your data. Not running with a surge protector or battery backup is like inviting the only significant danger to your storage into the house.
I used to buy end-of-the-line Crucial drives with advertised power-loss protection, which I think was tested.
I've got UPSes, and laptops have batteries... but still, it's quite a gamble to rely on OSes nowadays to shut down properly when certain things are running or when users are logged in.
WD and SK are also putting terms-of-service stickers on their SSDs, according to Russ. That's a strike against those brands, in addition to the particular models that fail on power loss.
In any case, testing how a drive handles loss of power seems rather useful, and makes me wonder why drive reviews don't test for that, seeing as different drives may handle the situation differently.
Actually, it is unexpected. The tests that were performed involved issuing a write command followed by a flush command. Per the NVMe 1.4 specification:
Data that the drive reported as written to stable storage, via the successful completion of a flush command, was not present on the drive when it was read back after power was restored. That means these drives do not conform to the specification.
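Checking a drive against this requirement is then mechanical: compare the highest sequence number acknowledged before the power cut with the records that can be read back afterward. The sketch below is hypothetical; it assumes a test log made of 12-byte records (an 8-byte little-endian sequence number followed by a CRC32 of those 8 bytes) and assumes the last acknowledged sequence number was recorded off the machine under test. A conforming drive yields an empty list of missing records.

```python
import struct
import zlib

RECORD = struct.Struct("<QI")  # sequence number, CRC32 of the packed sequence number


def recovered_sequences(data: bytes) -> list[int]:
    """Return sequence numbers of intact records, stopping at the first bad one."""
    seqs = []
    for offset in range(0, len(data) - RECORD.size + 1, RECORD.size):
        seq, crc = RECORD.unpack_from(data, offset)
        if zlib.crc32(struct.pack("<Q", seq)) != crc:
            break  # torn or garbage record: nothing after it is trusted
        seqs.append(seq)
    return seqs


def missing_acknowledged(data: bytes, last_acked: int) -> list[int]:
    """Acknowledged sequence numbers (0..last_acked) absent after power returns."""
    present = set(recovered_sequences(data))
    return [seq for seq in range(last_acked + 1) if seq not in present]


# Example: three intact records on disk.
on_disk = b"".join(RECORD.pack(s, zlib.crc32(struct.pack("<Q", s))) for s in range(3))
print(missing_acknowledged(on_disk, last_acked=2))  # []
print(missing_acknowledged(on_disk, last_acked=4))  # [3, 4]
```

A non-empty result for a sequence number that was acknowledged after a successful flush is exactly the non-conformance described above.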
The power loss protection feature is addressed in section 220.127.116.11, mentioned in the earlier quote.
That is, drives that support power loss protection back their DRAM with a battery or capacitor so that the write cache is not considered a volatile cache. As such, all acknowledged writes are considered to be on stable storage, even if not followed by a flush. In fact, section 6.8 says that in such a case, a flush command is a no-op if a sanitize operation is not in progress.
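The difference between these kinds of drive, and the failure Bishop observed, can be summarized with a small simulation. This is a hypothetical model, not code from the specification: `SimDrive`, its parameters, and the `"success"` status string are invented for illustration, and sanitize operations are ignored. A drive with a volatile write cache only guarantees durability after a flush; a drive with power loss protection treats acknowledged writes as durable, so flush has nothing to do; and a non-conforming drive reports a successful flush without persisting the data.

```python
class SimDrive:
    """Hypothetical model of write-cache behavior across a power cut."""

    def __init__(self, volatile_cache: bool, honors_flush: bool = True) -> None:
        self.volatile_cache = volatile_cache
        self.honors_flush = honors_flush
        self.cache = {}  # DRAM write cache: lba -> data
        self.media = {}  # NAND (or anything guaranteed to survive power loss)

    def write(self, lba: int, data: bytes) -> None:
        if self.volatile_cache:
            self.cache[lba] = data  # acknowledged while still only in DRAM
        else:
            # Power loss protection: the capacitor-backed cache is not volatile,
            # so an acknowledged write is as good as on the media.
            self.media[lba] = data

    def flush(self) -> str:
        if self.volatile_cache and self.honors_flush:
            self.media.update(self.cache)
            self.cache.clear()
        # Without a volatile cache there is nothing to do; a non-conforming
        # drive also reports success despite leaving data in DRAM.
        return "success"

    def power_cut(self) -> None:
        if self.volatile_cache:
            self.cache.clear()  # whatever was only in DRAM is gone

    def read(self, lba: int):
        return self.cache.get(lba, self.media.get(lba))


for label, drive in [
    ("volatile cache, conforming", SimDrive(volatile_cache=True)),
    ("power loss protection", SimDrive(volatile_cache=False)),
    ("volatile cache, non-conforming", SimDrive(volatile_cache=True, honors_flush=False)),
]:
    drive.write(7, b"x")
    status = drive.flush()
    drive.power_cut()
    print(f"{label}: flush={status}, after power cut -> {drive.read(7)!r}")
```

In all three cases the host sees an identical successful flush, which is why a power-cut test, rather than the flush status itself, is needed to tell the non-conforming drive apart.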