The Week In Storage: Self Serving Survey Says SSDs Are Screwy, NVMe Over Fabrics Released

This week in storage (well, the week prior) found us in hot, humid, steamy, rainy, sweaty and wonderful Taipei for Computex 2016. All brutal combinations of heat and precipitation aside, Computex was yet another exciting affair for storage, even though the show overall felt a bit wilted this year. Perhaps my personal perception is due to a seemingly flat year in terms of whizbang product announcements on the PC side. It might be that CES Asia, which occurred in Shanghai a few weeks prior, stole some of the Computex thunder from the cloudy Taipei skies.

There was still plenty to gawk at whilst rudely blocking the aisle, such as the Adata DRAMless SSDs or its latest Tigershark SSDs. True to Chris Ramseyer's Computex predictions, several vendors uncorked 2 TB SSDs, including Micron's 3D NAND-powered 2100 series, the Patriot Ignite, and the Corsair Neutron XTi. Micron unveiled its forthcoming high-performance Ballistix M.2 SSDs, and Areca also unveiled its new high-capacity Thunderbolt 3 products.

Intel also joined in with a 3D XPoint demo, and as usual, Intel's as-yet undefined type of memory impressed us with its speed. More details also emerged on the Kaby Lake/NVDIMM front, which will be one of the vehicles that speed 3D XPoint into the system.

Self Serving Survey Says Your SSD Is Going To Die And Take Your Data With It

Another week, another self-serving survey. Self-serving surveys are (unfortunately) becoming increasingly popular among the storage marketing folk, and the more panic-inducing the survey, the better.

The process is simple: A company typically commissions a survey that justifies some aspect of its business model, thus drawing attention to the relevance of its products or services. The company then sends out a culled portion of the responses, preferably something shocking, as a press release to the awaiting news world. The media regurgitates the data, usually with little to no fact checking, thus placing said company directly in the limelight. Advertising by way of survey, what wonders the world beholds.

The latest hot topic is a press release from Kroll Ontrack titled "Data Loss from SSD Technology Increases with Wider Adoption." My eyes instinctively narrow at the sight of the title, as it is akin to saying "The Sky Is Blue, and Other Assorted Painfully Obvious Statements of Fact." There is no perfect data storage device known to man, so data loss is a given. Naturally, there will be no data loss if no one uses the storage device, and of course, data loss from any storage medium will increase with more use.

The survey goes on to reveal that, of nearly 2,000 respondents, 38 percent had an SSD failure. Of the 38 percent that experienced a failure, two-thirds permanently lost the data stored on the SSD. This means that roughly 24 percent of the respondents to the survey had lost data on an SSD. Curiously, there is no mention of how many reported lost data due to an HDD.
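The arithmetic behind that combined figure is easy to check. Below is a minimal sketch that takes the press release's "38 percent" and "two-thirds" at face value; the variable names are ours, and the 1,849 pool size comes from the survey's own annotation. Read literally, the numbers land just above 25 percent, so the roughly 24 percent quoted above presumably reflects unrounded underlying figures.

```python
from fractions import Fraction

# Figures as quoted from the press release (taken at face value).
respondents = 1849                     # survey pool size from the survey's annotation
failure_share = Fraction(38, 100)      # share reporting an SSD failure
loss_given_failure = Fraction(2, 3)    # of those, share that permanently lost data

# Share of ALL respondents who lost data on an SSD.
loss_share = failure_share * loss_given_failure
approx_people = round(respondents * loss_share)

print(f"Share of respondents with SSD data loss: {float(loss_share):.1%}")
print(f"Approximate head count: {approx_people}")
```

Either way, the result describes the survey pool only, not SSD owners in general.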

Cue the end-of-the-SSD-world headlines as some rush to proclaim that SSDs are failing at spectacular rates and that users likely will not be able to retrieve the data. The first tip-off that the survey is somewhat skewed comes in the form of the 92 percent of respondents who use SSDs, which is far above the share of normal users that have SSDs in their systems. (SSDs make up only 10 percent of brand new desktop PCs, for instance.)

If we take a second to dig into the single annotation on the survey, we see that the survey pool consists of 1,849 Kroll Ontrack data recovery customers. This is a bit like shooting fish in a barrel, as data recovery customers have already experienced a failure and the resultant data recovery or loss.

In Kroll Ontrack's defense, it does note in the press release that the survey consisted of its own customers. Self-serving surveys can backfire though, as it is easy to misunderstand the context of a survey. For instance, if we consider that the respondents are all Kroll Ontrack customers, one might surmise (perhaps incorrectly) that Kroll Ontrack was unable to recover data from failed SSDs two-thirds of the time. This does not engender much faith; perhaps other companies are more successful.

In either case, it is important to note that the responses are not indicative of either the failure rate or the recoverability of SSDs in the general population.

All storage devices suffer failures. There are significant hurdles to recovering data from SSDs, as there are with all forms of data storage. Fortunately, most SSD failures are in the SSD controller or logic and not a physical failure of the actual medium (NAND). This increases recoverability. Clever data recovery vendors work with SSD manufacturers to understand the device, and thus recover the data.

HDDs (arguably) suffer a higher rate of physical damage to the actual storage medium (platters) due to head crashes. Head crashes consist of the heads colliding with the platter, usually due to shock or vibration. This literally scrapes away the data, which renders it unrecoverable (we are also merely scratching the surface of this topic, as it would take several pages to explain SSD vs. HDD failures).

SSDs are nearly impervious to shock and vibration, and publicly released SSD failure rate data from vendors indicates that SSD failure rates hover in the tenths of one percent, whereas the HDD AFR (Annual Failure Rate) is usually in the 4 to 8 percent range. It is hard to accurately gauge the typical HDD failure rate due to a lack of industry disclosure, which alone speaks volumes.
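To put annual rates in perspective, here is a rough sketch of how an AFR compounds over a drive's service life. It assumes a constant, independent failure rate every year, which real drives do not strictly follow. The 4 and 8 percent HDD values are the range quoted above, while the 0.5 percent SSD value is simply an illustrative pick from "tenths of one percent." The function name and the five-year window are our own choices.

```python
def cumulative_failure(afr: float, years: int) -> float:
    """Chance of at least one failure within `years`, assuming a constant, independent AFR."""
    return 1.0 - (1.0 - afr) ** years

# (label, annual failure rate); the SSD value is illustrative, not measured.
scenarios = [
    ("SSD at 0.5% AFR", 0.005),
    ("HDD at 4% AFR", 0.04),
    ("HDD at 8% AFR", 0.08),
]

for label, afr in scenarios:
    print(f"{label}: {cumulative_failure(afr, 5):.1%} chance of failure within 5 years")
```

Under those assumptions, a seemingly small annual rate adds up quickly over several years of service.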

SSDs are more reliable than HDDs, but all storage devices will fail, given enough time. All data will be lost eventually, and it does not matter where it is stored. As always, we should store data on several different storage devices, and at multiple locations, to mitigate risk. 
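The case for multiple copies can be sketched with one line of probability. The 5 percent per-device loss figure below is purely hypothetical, and the calculation assumes the copies fail independently of one another. That assumption is exactly why separate locations matter: a fire or a power surge can take out every drive in the same room at once.

```python
def all_copies_lost(p_single: float, copies: int) -> float:
    """Chance that every copy is lost in the same period, assuming independent failures."""
    return p_single ** copies

p = 0.05  # hypothetical chance of losing any one device in a given year

for copies in (1, 2, 3):
    print(f"{copies} copy/copies: {all_copies_lost(p, copies):.6f} chance of losing everything")
```

Each additional independent copy multiplies the risk by the per-device probability, so even a second copy makes total loss far less likely.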

Of course, these surveys are tightly controlled affairs; I have yet to see a company promote a "survey" that states it is irrelevant, or that its product/service is only marginally important. 

This Week's Storage Tidbit

NVMe is coming soon to a network near you.

NVMe is best known for its use as a storage protocol designed specifically for non-volatile memory (such as flash) that reduces latency and system overhead while boosting performance. The new protocol has gained broad industry acceptance for DAS (Direct Attached Storage) implementations, meaning directly inside of the computer or server.

The obvious performance and efficiency gains from NVMe are wonderful inside the system, but it was only a matter of time before our engineering friends decided to bring NVMe out of the box. The new NVMe over Fabrics 1.0 specification allows the protocol, which typically communicates over the PCIe bus, to communicate over other interconnects (such as Ethernet, Fibre Channel, InfiniBand and others).

PCIe can be used to connect servers, as with the technique employed in the FSA 200 we recently evaluated, but it has length restrictions that limit applications. Expanding NVMe over the various networking options extends the range of shareable storage in datacenter applications and also provides much better performance than competing protocols.

Protocol perfection is not achieved when there is nothing left to add, but rather when there is nothing left to remove. NVMe revolutionized storage performance by peeling back the performance-inhibiting layers of storage abstraction, and extending the speedy protocol over the network promises to revolutionize the performance of shared storage, as well.

Paul Alcorn is a Contributing Editor for Tom's Hardware, covering Storage. Follow him on Twitter and Google+.


  • sh4dow83
    That survey does seem questionable to say the least. Personally, I had about six HDDs fail on me over the years but no SSD so far. I own 3.
    And so I plan on migrating my HDD RAID1 to SSD once the prices drop a bit further still.
  • sh4dow83
    By which I'm by the way not claiming that MY experiences are representative of anything.
    By the way... why didn't you guys research the average failure rates of HDDs? Because I bothered doing that before relying so heavily on SSD and found that they indeed fail much more often.
  • John Lauro
    I have lost many hard drives and SSDs. Not sure which I would say is more reliable, but what I have found is that an HDD can tend to be brought back to life a little so you can read most if not all of the data. However, when an SSD decides to fail (and I am talking failure way prior to its estimated life expectancy), it dies immediately, with no way of setting up a machine to keep retrying and eventually getting the data off. Plus they reuse different bits, so it's also less likely a recovery service will be able to pull the data off the individual chips and put it back together.

    I think manufacturers should be learning how to have drives fail in a more readable state, but I haven't seen that yet with SSD. I currently have about 120 SSD and way more than that HDDs, and haven't lost any data from a SSD failure. However, all the SSDs are in RAID 1 or RAID 5 or 6, etc...

    Based on my experience, HDD give more warning signs they are degrading... SSD just die for no good reason. As long as you have automatic redundancy in place, the failure rates are not that significant even if one was 10x as likely to fail as the other. However, if you only have a single drive in a computer it can be critical...
  • PaulAlcorn
    18102289 said:
    By the way... why didn't you guys research the average failure rates of HDDs?

    Unfortunately, much of the data that is available is anecdotal, and/or drawn from inherently flawed test environments, which are typically not actual test environments. An accurate measurement of failures would require a large sample base (hundreds) and a solid test methodology and environment. Of course, HDD vendors could just share the failure rate data as a few of the SSD vendors have (Intel, OCZ), but we have not seen that as of yet. Better yet, Google (or any of the hyperscalers) could share their data. That would be the best answer.

  • CaedenV
    It would be interesting to see how many people have lost data on a SSD 'lately'

    I mean, of early adopters, who didn't lose data on an SSD a few years ago? They were crap, and we all knew it, and we all prepared for it. The point was that while it was not reliable, the convenience of speed far outweighed that fact of life.
    But, of all the SSDs I have bought in the last 2 years (~12 of them... I have a problem lol), I have only had 1 fail. All of these are in use every day in either my own machines, or machines owned by close friends. They run the gamut on quality, and the only one that failed is a 2 year old Kingston (which I think is under warranty).
    But in that same 2 year period I have also bought 10 3TB HDDs. 3 of which showed up dead, another of which died after 2 months, and another died after a year. Granted, these were all bottom-barrel HDDs (next version of my home server will focus more on quality drives... I just needed to get something up and running), but still... of all of the crap OCZ and Mushkin drives I have bought over the years, not a single one of those have died in the last 3 years.

    I know it is anecdotal, but SSDs seem MUCH more reliable than HDDs in this day and age. But, if asked about data loss... it is true, the SSD that died took some important data with it to its grave, but as all the HDDs have been in RAID I have not lost anything on them. I suppose when you know you can't trust something, you take appropriate precautions.
  • stevenrix
    From a personal experience i've been using hard-drives for the last 30 years: all my drives failed at some point in time. The fastest failure rates was with Maxtor and Seagate. Your mileage may vary, but some of my drives lasted barely 1 month and other lasted more than 10 years. I switched to SSDs 5 years ago and so far they haven't failed on me yet but that will come one of those days.
    It's very important to do backups or have a redundant solution, the price to recover 3 hard-drives of 500 gigs for a company is around $7600. That's a lot of money.
  • PaulAlcorn
    18104185 said:
    From a personal experience i've been using hard-drives for the last 30 years: all my drives failed at some point in time. The fastest failure rates was with Maxtor and Seagate. Your mileage may vary, but some of my drives lasted barely 1 month and other lasted more than 10 years. I switched to SSDs 5 years ago and so far they haven't failed on me yet but that will come one of those days.
    It's very important to do backups or have a redundant solution, the price to recover 3 hard-drives of 500 gigs for a company is around $7600. That's a lot of money.

    Excellent insight, we definitely always need some form of redundancy.
    I tried to upvote your comment, but accidentally downvoted it, and it will not let me change it. Gah! I upvote this comment :)
  • PaulAlcorn
    18104793 said:
    I keep 3HD with one Crucial MX200 500GB for programs then another to store all my files,
    then i back up everything on my trusty WD Black 2TB HD,
    then i Re Back everything up thru online backup,
    then take my very most important documents putting them on a thumb drive encrypted then placed in a safe in my underground bunker.

    People should never underestimate the importance of the secret bunker in a well-rounded data protection plan.

    Also, the cloud makes backup so easy now, it's hard to justify not keeping an offsite backup. I used to take a few Bitlocker'd drives over to a friend's house every few months for off-site, but now I just upload everything to AWS.

    $59.99 yearly for unlimited cloud backup (I scored it for $5.99 on a Black Friday special) - Bezos, that was a bad idea, I have 13TB uploaded and counting :)

  • ohim
    I had only 1 HDD failure since 2005 (only Seagate HDDs, no WD). And 1 SSD failure, Corsair brand.

    But each time i have valuable data i never store it on only 1 drive, i always do multiple backups on multiple drives.

    The article is misleading on this matter, fueling the fear of losing data. When you have only 1 copy of your data.. expect that there is a possibility that you'll lose it all!
  • Darkk
    18105064 said:
    18104185 said:
    From a personal experience i've been using hard-drives for the last 30 years: all my drives failed at some point in time. The fastest failure rates was with Maxtor and Seagate. Your mileage may vary, but some of my drives lasted barely 1 month and other lasted more than 10 years. I switched to SSDs 5 years ago and so far they haven't failed on me yet but that will come one of those days.
    It's very important to do backups or have a redundant solution, the price to recover 3 hard-drives of 500 gigs for a company is around $7600. That's a lot of money.

    Excellent insight, we definitely always need some form of redundancy.
    I tried to upvote your comment, but accidentally downvoted it, and it will not let me change it. Gah! I upvote this comment :)

    In enterprise environments you should be running offline backups AND redundancy. Redundancy is great for quick recovery but if you get hit by some crypto ransom malware it's not going to do you a bit of good if you don't have offline backups. Don't rely on volume shadows / snapshots as they might get deleted by the malware.

    In addition you should have multiple timed offline backups. Finally test your backups!
