
RAID 5 May Be Doomed in 2009

Source: Tom's Hardware | 34 comments

A story appearing online is forecasting the doom of RAID 5 in 2009. With the capacity of modern SATA hard drives now reaching 2 terabytes, a read error during a RAID 5 reconstruction is reportedly becoming almost unavoidable.

According to ZDNet, SATA drives often have an unrecoverable read error (URE) rate of one in 10^14 bits, meaning that on average a drive will fail to read a sector about once in every 100,000,000,000,000 bits read. With hard drive capacities expected to reach two terabytes in 2009, a read error becomes practically unavoidable when recovering from a single drive failure in a 7-drive RAID 5 array. Upon encountering such a read error during reconstruction, it is claimed, the array volume will be declared unreadable and the recovery process will be halted. All 12 terabytes of data stored on the drives would then be lost, or at least would require some extra effort and knowledge to recover.
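As a rough check on that claim, the following Python sketch estimates the chance of hitting at least one URE while reading the six surviving 2-terabyte drives in full. It assumes read errors are independent and occur at exactly the quoted rate of one per 10^14 bits, which real drives need not follow; the function and variable names are illustrative only.

```python
import math

def p_at_least_one_ure(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of one or more unrecoverable read errors.

    Assumes each bit fails independently with probability ure_per_bit
    (a simplification, not a model of real drive behavior).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 1e12  # decimal terabyte, as drive vendors count it

# 7-drive RAID 5 with one failed drive: the 6 survivors must be read in full
surviving_drives = 6
drive_size = 2 * TB
p_rebuild = p_at_least_one_ure(surviving_drives * drive_size)

print(f"Chance of a URE during rebuild: {p_rebuild:.2f}")
```

Under these assumptions the result is roughly 0.62 for the 7-drive case: high, but short of certain. Reading a single 12-terabyte drive end to end gives the same figure, since the same number of bits is read.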

RAID 5 is described as a striped set with distributed parity, which protects against a single disk failure. When a drive fails in a RAID 5 set, the failed drive can be replaced, the data can be rebuilt from the distributed parity, and the array can eventually be restored. If more than one drive fails, however, the array will lose data. For some, this can make the reconstruction process after a single drive failure a stressful event, as the array is vulnerable to further drive failures during that time.
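To illustrate how a rebuild from parity works, here is a minimal Python sketch of a single stripe. RAID 5 parity is the bytewise XOR of the data blocks in a stripe, so any one missing block can be recomputed from the rest. The block contents and the helper name `xor_blocks` are made up for the example, and the rotation of parity across drives is left out.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe spread over three data drives (toy 4-byte blocks)
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # stored on a fourth drive

# Drive 1 fails: rebuild its block from the survivors plus parity
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The rebuild needs every surviving block in the stripe to be readable, which is why a single unreadable sector on any remaining drive can stall the reconstruction.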

While RAID 6, which tolerates two drive failures instead of just one, may seem like a solution, the increased redundancy may not be cost effective. Also, as hard drive capacities continue to increase exponentially year after year, even RAID 6 may soon become prone to the same problems. When single disk drives reach 12 terabytes in size, even a direct drive-to-drive copy may commonly encounter these read errors. Using drives with smaller capacities and better unrecoverable read error rates could be a way to avoid these potential headaches.

The problem comes from the increasingly tight data density packed onto drive platters. With traditional recording methods, the magnetic polarity of one bit can leak onto adjacent bits, flipping an otherwise good bit. Manufacturers have switched to perpendicular recording to avoid such problems and increase density, but even this method has its physical limits. Manufacturers will have to find more creative solutions down the road if drives are going to exceed 2TB in size.

34 Comments
  • -1 Hide
    Acethechosenone , October 22, 2008 11:17 PM
    "When single disk drives become 12-terrabytes in size"

    Surely we are nowhere near that right?
  • 5 Hide
    jhansonxi , October 22, 2008 11:19 PM
    News about an article posted 15 months ago? You must have quite a backlog!
  • 3 Hide
    mtyermom , October 22, 2008 11:23 PM
    At the rate drive capacity is growing... it'll be sooner than you think.
  • 2 Hide
    Nik_I , October 22, 2008 11:27 PM
    mtyermom: At the rate drive capacity is growing... it'll be sooner than you think.

    i agree. i remember when i got my dell p4 system in early 2004. it came with a 120GB hard drive which was fairly impressive for its time, and now we're already up to 1.5TB, more than ten times the capacity in only 4 years. getting up to 12TB may happen within the next couple of years.
  • -1 Hide
    braindonor75 , October 22, 2008 11:41 PM
    RAID 5 seeing this issue, I can see. RAID 6 maybe, but by the time we have 12TB drives, they are no longer going to be spindles. SSDs are far from high capacity now, but given time that will happen.
  • 2 Hide
    enewmen , October 23, 2008 1:00 AM
    I still don't understand how RAID with parity can be less reliable than 1 drive with no parity. I never in my life had more than 1 drive fail at the same time (with big & small RAID5 arrays).
    i.e. If one 12TB drive fails in RAID5, it's POSSIBLE (or likely) a reconstruction can fail. If one 12TB drive fails with no RAID, all data is 100% gone for sure.
    Where can I find more info on this?
  • -1 Hide
    Anonymous , October 23, 2008 1:34 AM

    The point of this article was that the increase in drive size will increase the chance of read errors, thus increasing the chance of an error happening while you are rebuilding your array. According to this article, if there is a read error during the reconstruction then the whole array will be lost. If you'll remember from earlier, the read error will be more likely due to the size hard drives will be in the near future. So, in conclusion, your experience with hard drives in the present has no relevance to the issues of larger hard drives of the future, which is the subject of this article.
  • 1 Hide
    nekatreven , October 23, 2008 2:11 AM
    enewmen: I still don't understand how RAID with parity can be less reliable than 1 drive with no parity. I never in my life had more than 1 drive fail at the same time (with big & small RAID5 arrays). i.e. If one 12TB drive fails in RAID5, it's POSSIBLE (or likely) a reconstruction can fail. If one 12TB drive fails with no RAID, all data is 100% gone for sure. Where can I find more info on this? Thanks!

    I couldn't give you info that is 100% technically sound on this topic (or a terribly elegant one), at least compared to some hardware junkies...but I'll take a stab at it.

    Consider, that with one drive you are dealing with the probability of one drive failing. That is, the probability that one drive is defective, or has an off 'moment' inside, or is just too old, or whatever. This is the chance that the one drive will 'become another statistic', as I'll call it.

    The simple explanation here is that the more drives you have, the better chance you have of a single drive failure. So instead of having 1 chance for one drive to fail, with five drives you have 5 chances for one drive to fail. Then, the more drives you add, the better the probability that one of them will 'become another statistic.'

    The problem enters when you consider that (after a drive failure) rebuilding an array of a certain size takes a quantifiable amount of time and number of disk operations. I have not personally done the math, but the idea is that each drive may (for argument's sake) have a 1 in 10,000 operations error rate, and rebuilding an array of the specified size or disk count may require 10,001 disk operations from EACH disk; and this is assuming that each drive operates within its specified tolerances. Another drive really could fail altogether.

    The basic point is that the numbers can catch up to you
  • 1 Hide
    enewmen , October 23, 2008 2:25 AM
    thanks for explaining..
    It still seems like RAID6 is the better choice over no RAID. (the title of the article can be more clear)
    Let's hope the rebuilding time also decreases with capacity increases.
  • 1 Hide
    FriendlyFire , October 23, 2008 2:47 AM
    What I hate about such types of articles is that the guy basically says: "RAID 5 sucks, RAID 6 sucks, you're screwed. HA! Told ya!"

    It's a bit like announcing the end of the world because X is going to happen, yet not giving any practical solution for how X can be solved. The negative tone of the whole article is just like saying to us that we're doomed to lose our whole data every once in a while and that, oh, it's life... I'm sure there is already research being conducted to circumvent such an issue, I'm sure technologies will show up in due time, and I'm pretty darn sure the drives will get larger and larger without stopping. There have always been seemingly insurmountable issues, yet here we are today, through all those impossible obstacles and still living to talk about it!
  • 4 Hide
    bf2gameplaya , October 23, 2008 2:51 AM
    Keyword: unrecoverable.

    Error correction routines must become more robust otherwise all data will be corrupted, by their logic.

    CRC errors happen constantly with I/O, you never notice them, or maybe you do..that weird hiccup or glitch. RAM has ECC as well, so does your CPU.

    Our favorite hobby is awash in bit flipping and transients. If it wasn't for error correction algorithms, absolutely nothing electrical would work...hell, all MP3s are just one big approximation of what you might be listening to.
  • 0 Hide
    cruiseoveride , October 23, 2008 2:51 AM
    you get a boo for ugly hair.

    I don't see why this is all so melodramatic. Just increase the parity information with each stripe. Big deal, as if we can't afford it.
  • -3 Hide
    Darkk , October 23, 2008 3:06 AM
    Makes me wonder why in hell you want a single 12TB drive volume? Ever heard of the expression, "Don't put all your eggs in one basket?"

    Smaller drives are better.
  • 0 Hide
    xxsk8er101xx , October 23, 2008 3:29 AM
    Ok guys they are talking about Corporations with NAS or SANs devices that hold 12 drives and 6TB of data. This is in RAID 5. Just 3 weeks ago we had 2 drives fail. 1 was part of the parity and the other was a spare. Heck one of our servers holds 6 drives 2 used in raid 1 and 4 in raid 5.

    The more drives you have, the higher the likelihood that you have multiple failures at the same time.

    RAID 6 is expensive and requires hardware that is expensive. If you've ever bought a Server you will understand. One of our PowerEdge servers cost over 4 grand and that just had sata drives. Throw in 15k SAS drives and you're looking at 7-8grand. Now you need windows server...

    This is old news anyway. Companies have and use redundant systems now. Companies don't rely on RAID all that much. A lot of companies also invest in imaging software like Acronis Echo or Symantec Ghost as well. As soon as your company experiences multiple drive failures in a RAID 5 they will switch to redundant systems faster than you can ship the server. You'll be buying the server that same day.
  • -1 Hide
    FilthPig2004 , October 23, 2008 3:32 AM

    Exactly. Keep each RAID-5 volume under the size where read errors are statistically likely, and it's a non-issue. Assuming the scenario being suggested does occur, then it will simply become an industry standard that all RAID-5 arrays are built under the safe size limit. Either the drive designers will figure out how to resolve the problem, or another technology will replace hard drives, but no corporation or responsible individual is going to risk data corruption in exchange for capacity. Especially when media distribution is becoming increasingly an online enterprise...does anyone think that you'll be able to re-download your multi-terabyte movie collection for free (legitimately, I mean) when your array craps out?

    Also, let's not forget that RAID-5 is not a backup solution. It's a high-availability solution.
  • 1 Hide
    xxsk8er101xx , October 23, 2008 3:32 AM
    What they're talking about is a 12TB RAID Array. Where all the drives in your system, SANs, or NAS box equals 12TB. Then they slice it up for different purposes.

    It's a little bit more complicated as it requires planning of how much space this particular section will need. If you use roaming profiles will you need 1TB or 2?

    Darkk: Makes me wonder why in hell you want a single 12TB drive volume? Ever heard of the expression, "Don't put all your eggs in one basket?" Smaller drives are better.

  • 0 Hide
    Pei-chen , October 23, 2008 3:42 AM
    I had a case of déjà vu when reading this article. I think I saw it before on the same PC and at the same time of day.
  • -1 Hide
    geok1ng , October 23, 2008 4:19 AM
    RAID 5 is a nonsense solution: it is slower for writes than RAID 0 and slower for recoveries than RAID. you can lose an entire business day to recover a RAID 5; imagine the time consumed when 2TB+ disks are on the market. and as stated in the article, RAID 5 is not a failproof solution.

    The whole idea behind RAID is that it is an inexpensive/independent array of disks that is safer or faster than JBOD. so with 3 drives in a RAID 5 i have safety and speed, but the safety implies hours of recovery time, during which not a single bit on the entire array can go wrong, and the speed is just for reads; writes on RAID 5 take longer. with just one more disk i can make a RAID 10 solution, for the best of both worlds, or i can go cheap and use the very same 3 disks for a RAID 0 + external daily backup, an elegant way to have lightning-fast performance and fast system recovery.
    For business solutions, the cost of taking too long to recover the array far exceeds the price gap between RAID 5 and RAID 10/RAID 6, and with the increased likelihood that the array rebuild will be unsuccessful, this cost gap will be reversed.

    Let's all say no to RAID 5!!
  • 1 Hide
    enewmen , October 23, 2008 4:45 AM
    RAID5 saved my butt before. Even with the long recovery time.
    I then quickly got the external drive to be double sure the data is safe.
    This works well for me for home-use.
    2 cents worth..