RAID 5 May Be Doomed in 2009

A story appearing online is forecasting the doom of RAID 5 in 2009. Apparently, with storage capacities of modern SATA hard drives now reaching 2 terabytes, the odds of a read error during a RAID 5 disk reconstruction are becoming unavoidable.

According to ZDNet, SATA drives often have an unrecoverable read error (URE) rate of one per 10^14 bits, which implies that a drive will fail to read a sector roughly once in every 100,000,000,000,000 bits read. With hard drive capacities expected to reach two terabytes in 2009, a read error becomes practically unavoidable when recovering from a single disk failure in a 7-drive RAID 5 array. Upon encountering such a read error during the reconstruction process, it is claimed that the array volume will be declared unreadable and the recovery process will be halted. All 12 terabytes of data stored on the array would apparently be lost... or at least would require some extra effort and knowledge to recover.
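
The arithmetic behind that claim is simple enough to check. Rebuilding after one failure in the 7-drive array means reading all 12 terabytes on the six surviving drives, which comes to roughly 10^14 bits, or about one expected error at the quoted URE rate. Here is a back-of-the-envelope sketch in Python, assuming the simple model of independent bit errors at a rate of one per 10^14 bits:

```python
import math

URE_RATE = 1e-14  # unrecoverable read errors per bit read (quoted SATA figure)

def rebuild_failure_probability(surviving_drives: int, drive_tb: float) -> float:
    """Chance of at least one URE while reading every surviving drive in full."""
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    # 1 - (1 - p)^n, computed stably for a tiny p and a huge n
    return -math.expm1(bits_read * math.log1p(-URE_RATE))

# A 7-drive RAID 5 array of 2TB drives: one drive fails, and the remaining
# six (12 terabytes of data) must be read in full to rebuild it.
print(f"{rebuild_failure_probability(6, 2):.0%}")  # ~62%
```

Under that model the rebuild has roughly a 62% chance of hitting at least one unreadable sector, which is exactly the failure scenario being described.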

RAID 5 is described as a striped set with distributed parity, which protects against a single disk failure. When a drive in a RAID 5 set fails, it can be replaced, the data can be rebuilt from the distributed parity, and the array can eventually be restored. If more than one drive fails, however, the array will suffer data loss. For some, this makes the reconstruction process after a single drive failure a stressful event, as the array is vulnerable to further drive failures during that time.
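
The "distributed parity" doing the work here is just an XOR across each stripe: the parity block is the XOR of the corresponding data blocks, so any single missing block can be recomputed from everything that survives. A toy illustration in Python, using hypothetical four-byte "drives" rather than a real RAID implementation:

```python
from functools import reduce

# RAID 5 parity in miniature: the parity block is the XOR of the data
# blocks, so any one missing block can be rebuilt by XOR-ing the rest.
drives = [b"\x11\x22\x33\x44", b"\xaa\xbb\xcc\xdd", b"\x01\x02\x03\x04"]

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

parity = xor_blocks(drives)               # stored alongside the data
lost = drives.pop(1)                      # simulate a single drive failure
rebuilt = xor_blocks(drives + [parity])   # XOR of the survivors plus parity
assert rebuilt == lost                    # the failed drive's data is recovered
```

Real controllers rotate the parity blocks across all of the drives (hence "distributed"), but the recovery math is the same XOR.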

While using RAID 6 instead may seem like a solution, since it tolerates two drive failures instead of just one, the increased redundancy may not be cost effective. Also, as hard drive capacities continue to increase exponentially year after year, even RAID 6 may soon become prone to the same problems. When single disk drives reach 12 terabytes in size, even a direct drive-to-drive copy may commonly encounter these read errors. Using disk drives with smaller capacities and improved unrecoverable read error rates could be one way to avoid these potential headaches.
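
That drive-to-drive copy scenario can be checked with the same simple error model used above. One full pass over a hypothetical 12TB drive reads about 9.6 × 10^13 bits, nearly a full expected error at the quoted rate:

```python
import math

URE_RATE = 1e-14          # unrecoverable errors per bit read, as quoted
bits = 12 * 1e12 * 8      # one full pass over a hypothetical 12TB drive
p_fail = -math.expm1(bits * math.log1p(-URE_RATE))
print(f"{p_fail:.0%}")    # ~62%: even a plain copy would likely hit a URE
```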

The problem comes from the increasingly tight data density packed onto drive platters. With traditional recording methods, the magnetic pole of one bit can leak its polarity onto adjacent bits, flipping an otherwise normal bit. Manufacturers have switched to perpendicular recording to avoid such problems and increase density, but even this method has its physical limits. Manufacturers will have to find more creative solutions down the road if drives are to exceed 2TB in size.

  • Acethechosenone
    "When single disk drives become 12-terrabytes in size"

    Surely we are nowhere near that, right?
    Reply
  • jhansonxi
    News about an article posted 15 months ago? You must have quite a backlog!
    Reply
  • mtyermom
    At the rate drive capacity is growing... it'll be sooner than you think.
    Reply
  • Nik_I
    mtyermom: "At the rate drive capacity is growing... it'll be sooner than you think."
    I agree. I remember when I got my Dell P4 system in early 2004. It came with a 120GB hard drive, which was fairly impressive for its time, and now we're already up to 1.5TB, more than ten times the capacity in only four years. Getting up to 12TB may happen within the next couple of years.
    Reply
  • braindonor75
    RAID 5 seeing this issue, I can see. RAID 6, maybe, but by the time we have 12TB drives they're no longer going to be spindles. SSDs are far from high capacity now, but given time that will happen.
    Reply
  • enewmen
    I still don't understand how RAID with parity can be less reliable than 1 drive with no parity. I never in my life had more than 1 drive fail at the same time (with big & small RAID5 arrays).
    i.e. If one 12TB drive fails in RAID5, it's POSSIBLE (or likely) a reconstruction can fail. If one 12TB drive fails with no RAID, all data is 100% gone for sure.
    Where can I find more info on this?
    Thanks!
    Reply
  • @enewmen
    The point of this article is that the increase in drive size will increase the chance of read errors, and thus the chance of an error happening while you are rebuilding your array. According to the article, if there is a read error during the reconstruction, then the whole array will be lost. That read error becomes more likely at the sizes hard drives will reach in the near future, so your experience with hard drives in the present has little relevance to the larger hard drives this article is about.
    Reply
  • nekatreven
    enewmen: "I still don't understand how RAID with parity can be less reliable than 1 drive with no parity. I never in my life had more than 1 drive fail at the same time (with big & small RAID5 arrays). i.e. If one 12TB drive fails in RAID5, it's POSSIBLE (or likely) a reconstruction can fail. If one 12TB drive fails with no RAID, all data is 100% gone for sure. Where can I find more info on this? Thanks!"
    I couldn't give you info that is 100% technically sound on this topic (or a terribly elegant explanation), at least compared to some hardware junkies... but I'll take a stab at it.

    Consider that with one drive you are dealing with the probability of one drive failing. That is, the probability that the one drive is defective, or has an off 'moment' inside, or is just too old, or whatever. This is the chance that the one drive will 'become another statistic', as I'll call it.

    The simple explanation is that the more drives you have, the better the chance of a single drive failure. So instead of having 1 chance for one drive to fail... you have 5 chances for one drive to fail. And the more drives you add, the better the probability that one of them will 'become another statistic.'

    The problem enters when you consider that (after a drive failure) rebuilding an array of a certain size takes a quantifiable amount of time and number of disk operations. I have not personally done the math, but the idea is that each drive may (for argument's sake) have a 1-in-10,000 chance of an error per operation, while rebuilding an array of the specified size or disk count may require 10,001 operations from EACH disk; and this is assuming that each drive operates within its specified tolerances. Another drive really could fail altogether.

    The basic point is that the numbers can catch up to you.
    Reply
  • cruiseoveride
    ALL is possible on Linux!!!!!

    !go Linux!
    Reply
  • enewmen
    Thanks for explaining.
    It still seems like RAID 6 is the better choice over no RAID (the title of the article could be clearer).
    Let's hope the rebuild time also decreases as capacities increase.
    Reply