Backblaze, purveyor of cloud storage, has published the statistics for the 2,906 SSDs used as boot drives on its storage servers. To be clear, these aren't purely boot drives: they also read and write log files and temporary files, the former of which can generate quite a bit of wear and tear. Backblaze has been using SSDs as boot drives since 2018, and like its hard drive statistics, the data is one of the few ways to get real insight into how large quantities of mostly consumer drives hold up over time.
Before we get to the stats, there are some qualifications. First, most of the SSDs Backblaze uses aren't the latest M.2 NVMe models. They're also generally quite small in capacity: most drives offer just 250GB of storage, about a third are 500GB, and only three are larger 2TB drives. Using many units of the same model does keep hardware management simple, but if you're hoping to see stats for popular drives that might make our list of the best SSDs, you'll be disappointed.
Here are the stats for the past year.
While seven of the drive models used have zero failures, only one of those has a significant number of installed drives: the Dell DELLBOSS VD (Boot Optimized Storage Solution). The other six have fewer than 40 SSDs in use, with four that are only installed in two or three servers. It's more useful to pay attention to the drives that have seen a lot of use, specifically the ones with over 100,000 drive days.
Those consist of the Crucial MX500 250GB, Seagate BarraCuda SSD and SSD 120, and the Dell BOSS. It's interesting to note that the average age of the Crucial MX500 drives is only seven months, even though the MX500 first became available in early 2018. Clearly, Backblaze isn't an early adopter of the latest SSDs. Still, overall the boot SSDs have an annualized failure rate below 1%.
Stepping back to an even longer view of the past three years, that sub-1% annualized failure rate holds, with just 46 total drive failures over that time span. Backblaze also notes that after two relatively quick failures in 2021, the MX500 did much better in 2022. Seagate's older ZA250CM10002 slipped to a 2% failure rate last year, while the newer ZA250CM10003 had more days in service and fewer failures, so it will be interesting to see if those trends continue.
Another piece of data Backblaze looked at is SSD temperature, as reported by SMART (the Dell BOSS doesn't appear to support this). The chart isn't zero-based, so it might look like there's a decent amount of fluctuation at first glance, but in reality the drives ranged from an average of 34.4C up to 35.4C — just a 1C span.
Of course that's just the average, and there are some outliers. There were four observations of a 20C drive, and one instance of a drive at 61C, with most falling in the 25–42 degrees Celsius range. It would be nice if the bell curve seen above also correlated with failed drives in some fashion, but with only 25 total failures during the year, that was not to be — Backblaze called its resulting plot "nonsense."
Ultimately, the number of SSDs in use by Backblaze pales in comparison to the number of hard drives — check the latest Backblaze HDD report, for example, where over 290,000 drives were in use during the past year. That's because no customer data gets stored on the SSDs, so they're only for the OS, temp files, and logs. Still, data from nearly 3,000 drives is a lot more than what any of us (outside of IT people) are likely to access over the course of a year. The HDDs, incidentally, had an AFR of 1.37%.
Does this prove SSDs are more reliable than HDDs? Not really, and having a good backup strategy is still critical. Hopefully, in the coming years we'll see more recent M.2 SSDs make their way into Backblaze's data — we'd love to see maybe 100 or so each of some PCIe 3.0 and PCIe 4.0 drives, for example, but that of course assumes that the storage servers even support those interfaces. Given time, they almost certainly will.
I felt the NAND burn from here.
Because servers are under constant power and have many ways to mitigate loss of power, they are not comparable to consumer PCs.
Backblaze literally says their data is ONLY applicable to comparable server environments and has no relevance to consumer devices.
You can in theory put an uninterruptible power supply in front of a PC, which would give you power stability similar to a server's, but it would be much cheaper to just back up to more media. Another thing is that cheap consumer UPSes use batteries, and those are about as likely to catch fire as the storage is to fail from a power interruption.
Depends on drive and power supply. Some have capacitors which allow all write operations to complete. If you are on a UPS or laptop, you are covered there too.
That said, power loss to an SSD results in FILE CORRUPTION (data), NOT DRIVE FAILURE (physical). Physical failures affect everyone, power failure or not.
File corruption errors are what redundant filesystems like ZFS check for: they add another few layers on top of the block-sector checksum. And NTFS has a transaction log, with transact begin/end records on table entries. If power goes out before an operation is done, the transact end is missing, the file is assumed bad, and the table entry is rolled back. XFS isn't bad either; I consider it reliable enough for smaller backup arrays.
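The ZFS behavior described above can be sketched in miniature. This is a toy model, not ZFS code: the in-memory `storage` dict, the `write_block`/`read_block` helpers, and the choice of SHA-256 are all illustrative (ZFS defaults to fletcher4, with SHA-256 as an option). The key idea is that the checksum lives with the *pointer* to the block, not inside the block itself, so a torn or bit-rotted write is caught on the next read.

```python
import hashlib

def checksum(block: bytes) -> bytes:
    # ZFS defaults to fletcher4; SHA-256 is one of its optional algorithms.
    return hashlib.sha256(block).digest()

def write_block(storage: dict, addr: int, block: bytes):
    storage[addr] = block
    # Return a "block pointer" that carries the checksum alongside the address.
    return (addr, checksum(block))

def read_block(storage: dict, pointer) -> bytes:
    addr, expected = pointer
    block = storage[addr]
    # Verify on every read: corruption is detected even if the bad block
    # itself was written without any I/O error being reported.
    if checksum(block) != expected:
        raise IOError(f"checksum mismatch at block {addr}")
    return block
```

A real pool would then repair the bad block from a mirror or parity copy; this sketch only detects the damage.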
We tried using an SSD for the data drive (a PostgreSQL database tablespace) but noticed that it would hang every now and then, which we attribute to the drive's garbage collection. And in the end it wasn't significantly faster than the HDDs, which we over-provisioned to keep the data as much as possible on the outermost tracks.
In 23 years we've never had an HDD failure but we replace servers and thus the drives every 4 years.
StorageReview routinely does reviews of enterprise SSDs. Here's a recent one:
Now, I'm dying to know which SSD you tried. Because even an optimal data layout on the fastest 15k RPM HDDs is still limited to mere hundreds of IOPS, whereas a good NVMe drive can deliver several tens of thousands at a mere QD=1.
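That "hundreds of IOPS" ceiling falls out of simple arithmetic. The 3.5 ms average seek below is an assumed typical figure for a 15k enterprise drive, not a spec for any particular model:

```python
# Back-of-envelope ceiling on random IOPS for a 15,000 RPM hard drive.
rpm = 15_000
rotation_ms = 60_000 / rpm                 # 4.0 ms per full revolution
rotational_latency_ms = rotation_ms / 2    # on average, wait half a turn
avg_seek_ms = 3.5                          # assumed typical seek time

# Each random access pays one seek plus the average rotational latency.
iops = 1_000 / (avg_seek_ms + rotational_latency_ms)
print(round(iops))                         # ~182 random IOPS at QD=1
```

Higher queue depths let the drive reorder requests and do somewhat better, but the mechanical floor never gets anywhere near SSD territory.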
Your server fleet consists of just 1, I take it?
Both SSDs and HDDs have to write entire blocks. An update of a partial block or RAID stripe necessarily involves a read-modify-write operation. SSDs are much faster at that than HDDs.
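That read-modify-write cycle can be sketched in a few lines. This assumes a hypothetical 4096-byte block size, with a seekable file object standing in for the raw device:

```python
BLOCK = 4096  # assumed block size; real devices report theirs

def write_partial(dev, offset: int, data: bytes) -> None:
    """Update len(data) bytes at offset, touching only whole blocks."""
    start = (offset // BLOCK) * BLOCK                  # round down to a boundary
    end = -(-(offset + len(data)) // BLOCK) * BLOCK    # round up (ceiling)
    dev.seek(start)
    buf = bytearray(dev.read(end - start))             # READ the covering blocks
    lo = offset - start
    buf[lo:lo + len(data)] = data                      # MODIFY in memory
    dev.seek(start)
    dev.write(bytes(buf))                              # WRITE whole blocks back
```

An HDD pays a full seek plus rotation for the read and waits another rotation for the write-back; an SSD serves the read from flash in microseconds, which is why it wins so decisively at this access pattern.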
It'd be interesting to know why it hangs for entire seconds, but it's well known that consumer SSDs bog down under sustained load. It's principally due to the way they use low-density (pseudo-SLC) flash to buffer writes: once you fill that buffer, your write speed drops to the native speed of the high-density flash.
While it seems the 970 can handle sequential writes without a dropoff, there definitely appears to be some buffering in effect for small writes:
Another issue with sustained workloads on M.2 drives is thermal throttling! And that's something that you'll definitely encounter with a heavy database workload.
Still, it's not a big enough drop to explain the hang.
It's recommended not to mount it with the TRIM option. The preferred way to handle TRIM is to schedule fstrim to run during off-peak hours.
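On Linux, that advice translates roughly to the following. The device path and mount point are placeholders; the `fstrim.timer` unit ships with util-linux on most systemd distributions:

```shell
# Leave 'discard' (inline TRIM) out of the mount options; a plain
# /etc/fstab entry might look like this:
#   /dev/nvme0n1p2  /srv/pgdata  ext4  defaults,noatime  0 2

# Instead, enable the stock weekly fstrim timer:
sudo systemctl enable --now fstrim.timer

# Or run a one-off trim during a quiet window:
sudo fstrim --verbose /srv/pgdata
```

Batching TRIM this way keeps discard work out of the latency-sensitive foreground I/O path.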
Whatever the specific cause of the hangs, I think the main issue is probably trying to use a consumer SSD outside of its intended usage envelope. The issue might've been compounded by misuse of the TRIM mount option.