Softlayer: Roughly 5000 SSDs!
The folks at Softlayer are old friends, but they also manage the largest Web hosting company in the world. As such, they know a lot about storage. With close to 5000 SSDs deployed, they give us an impressive data set to analyze. Here is what Softlayer reports.
| Drive | Number of Drives | Avg. Failure Rate | Years in Use |
|---|---|---|---|
| Intel 64 GB X25-E (SLC) | 3586 | 2.19% | 2 |
| Intel 32 GB X25-E (SLC) | 1340 | 1.28% | 2 |
| Intel 160 GB X25-M (MLC) | 11 | 0% | less than 1 |
| Hard Drives | 117 989 | see Dr. Schroeder's study | - |
The company sees failure rates for its SAS and SATA drives similar to those cited in the Google study. Simply put, hard drive failure rates increase with age, and the actual rates match those seen in the two studies cited earlier. In the first year, there is a 0.5-1% annualized failure rate (AFR), which climbs toward 5-7% by the fifth year.
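To see what those hard drive AFR figures imply over a drive's lifetime, here is a minimal sketch that compounds per-year failure rates into a cumulative failure probability. The linear ramp between year one and year five is our assumption for illustration (the source only gives the endpoints), and the function names are hypothetical.

```python
def linear_afr(first_year, fifth_year, years=5):
    """Per-year AFRs, ASSUMING a straight-line ramp between the two endpoints."""
    step = (fifth_year - first_year) / (years - 1)
    return [first_year + step * i for i in range(years)]


def cumulative_failure(afrs):
    """Chance a drive has failed by the end of each year.

    Each AFR is treated as the probability of failing during that year,
    given the drive survived to the start of it.
    """
    surviving = 1.0
    result = []
    for afr in afrs:
        surviving *= 1.0 - afr
        result.append(1.0 - surviving)
    return result


# Endpoints from the text: 0.5-1% in year one, 5-7% in year five.
low = cumulative_failure(linear_afr(0.005, 0.05))
high = cumulative_failure(linear_afr(0.01, 0.07))

for year, (lo, hi) in enumerate(zip(low, high), start=1):
    print(f"Year {year}: {lo:.1%} to {hi:.1%} of drives failed so far")
```

A different ramp shape would shift the intermediate years, but the compounding step stays the same.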
While hard drive failure rates are no surprise, the SSD failure rates are telling, too. Though our data points here are limited, SSD failure rates seem to increase over time as well. Granted, these drives have only been in use for two years. Clearly, we need to follow up after these SSDs have been in use for three and four years to see if a trend can be established.
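For a sense of scale, the SSD rows in the table can be combined into a rough fleet-wide figure. This is only a sketch: Softlayer did not say whether "Avg. Failure Rate" is annualized or cumulative, so the code simply treats it as the fraction of each population that failed, and the implied failure counts are our estimates rather than reported numbers.

```python
# (drive model, number of drives, reported average failure rate) from the table
ssd_fleet = [
    ("Intel 64 GB X25-E (SLC)", 3586, 0.0219),
    ("Intel 32 GB X25-E (SLC)", 1340, 0.0128),
    ("Intel 160 GB X25-M (MLC)", 11, 0.0),
]


def total_drives(rows):
    """Total number of drives across all models."""
    return sum(count for _, count, _ in rows)


def implied_failures(rows):
    """Estimated failed drives per model: count * rate (an estimate, not reported data)."""
    return {model: count * rate for model, count, rate in rows}


def weighted_failure_rate(rows):
    """Failure rate across the whole fleet, weighted by population size."""
    failed = sum(count * rate for _, count, rate in rows)
    return failed / total_drives(rows)


for model, failures in implied_failures(ssd_fleet).items():
    print(f"{model}: roughly {failures:.0f} failed drives")

print(f"Fleet: {total_drives(ssd_fleet)} SSDs, "
      f"weighted failure rate {weighted_failure_rate(ssd_fleet):.2%}")
```

The weighted figure is dominated by the 64 GB X25-E, which makes up most of the population.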
Softlayer almost exclusively uses SLC-based SSDs due to write endurance concerns. Based on the company's usage patterns, we know that none of the failures have to do with write exhaustion. But alarmingly, many of these SSDs failed without any early warning from SMART. This is something that we continue to hear from different data centers. As InterServer pointed out, hard drives tend to fail more gracefully. SSDs often die more abruptly, for any number of reasons that we've heard reported by actual end-users in the real world.
Softlayer's experience is more mixed; some drives were recoverable, while others were not. None of the company’s 11 X25-Ms have failed, but that’s a tiny sample size and they've only been in service since June 2010.
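To illustrate why zero failures among 11 drives says so little, the sketch below computes the one-sided upper confidence bound on the true failure rate when no failures are observed. It assumes each drive fails independently with the same probability (a simple binomial model, which is our assumption and not something Softlayer stated); the function name is hypothetical.

```python
def zero_failure_upper_bound(n_drives, confidence=0.95):
    """Upper bound on the failure probability after 0 failures in n_drives.

    With failure probability p, the chance of seeing no failures is (1 - p) ** n.
    Setting that equal to alpha = 1 - confidence and solving for p gives
    p = 1 - alpha ** (1 / n).
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_drives)


bound = zero_failure_upper_bound(11)
print(f"0 failures in 11 drives is still consistent with a rate up to {bound:.1%}")
```

Under these assumptions, a clean record across 11 drives cannot rule out a failure rate far higher than anything in the table.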
As we explained in the article, write endurance is a specified failure mode. That won't happen in the first year, even with enterprise-level use, so it has nothing to do with our data. We're interested in random failures: the stuff people have been complaining about, such as BSODs with OCZ drives, LPM issues with m4s, and the SSD 320 problem that makes capacity disappear. These are mostly "soft" errors. Any hard error that occurs is subject to the "defective parts per million" problem that any electrical component also suffers from.
All of the data is so fragmented... I doubt that would help. You'd still need to go through it with a fine-tooth comb to figure out how the numbers were calculated.
gpm23: You guys do the most comprehensive research I have ever seen. If I ever have a question about anything computer related, this is the first place I go to. Without a doubt the most knowledgeable site out there. Excellent article and keep up the good work.
Thank you. I personally love these types of articles. Very reminiscent of academia. :)
Of the sealed issue of return, if by the time you check that you had been using something different and something said something else different, what you bought that was different might not be of useful use of the same thing.
Otherwise just ideas of working with more are hard said for what not to be using that was used before. Yes?
But for a lot of interest into it maybe is still that of rather for the performance is there anything of actual use of it, yes?
To say the smaller amounts of information lost to say for the use of SSDs if so, makes a difference as probably are found. But of Writing order in which I think they might work with at times given them the benefit of use for it. Since they seem to be faster. Or are.
Temperature doesn't seem to be much help for many things are times for some reason. For ideas of SSDs, finding probably ones that are of use that reduce the issues is hard from what was in use before.
When things get better for use of products is hard placed maybe.
But to say there are issues is speculative, yes? Especially me not reading the whole article.
But of investments and use of say "means" an idea of waste and less use for it, even if it's on lesser note, is waste. In many senses to say of it though.
Otherwise some ideas, within computing may be better of use with the drives to say. Of what, who knows...
Otherwise again, it will be more of operation place of instances of use. Which I think will fall into order of access with storage, rather information is grouped or not grouped to say as well.
But still, they should be usually useful without too many issues, but still maybe ideas of timing without some places not used as much in some ways.
They will continue to receive invites for our stories, and hopefully we can do more with OWC in the future!