Actually, drive usage appears to have little impact (see below).
Yes, some die and some do not.
The point of the post is not to prove that RAID-0 is a reliable, safe method.
The point is to show that the failure rate is probably FAR higher than most expect over a 3-year period.
Many posters keep citing MTBF ratings of 500,000+ hours and treating a drive failure within many years as almost statistically impossible.
This study, which is likely the most comprehensive to date, is quite shocking in the rate of drive failure it reports.
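To put a number on the MTBF argument, here is a rough sketch of what a quoted MTBF implies if you take it at face value. It assumes a constant failure rate (exponential lifetimes), which is the usual way to read an MTBF figure and not something the study endorses. The function name is made up for illustration, and 500,000 hours is simply the figure posters keep quoting.

```python
import math

HOURS_PER_YEAR = 8760  # 24 * 365, ignoring leap years


def failure_probability(mtbf_hours, years):
    """Chance that one drive fails within `years`.

    ASSUMPTION: constant failure rate (exponential lifetime model),
    so P(fail by t) = 1 - exp(-t / MTBF). Real drives need not behave
    this way; this only shows what the spec-sheet number implies.
    """
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    hours = years * HOURS_PER_YEAR
    return 1.0 - math.exp(-hours / mtbf_hours)


if __name__ == "__main__":
    mtbf = 500_000  # hours, the figure often quoted in these threads
    for years in (1, 3):
        p = failure_probability(mtbf, years)
        print(f"MTBF {mtbf:,} h -> {p:.1%} chance of failure within {years} year(s)")
```

Even on the manufacturer's own number, that works out to a bit under 2% per year, or roughly 5% over three years, for a single drive running around the clock. That is small, but nowhere near "statistically impossible", and the study's whole point is that the spec-sheet MTBF is optimistic.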
---------------------------------------------------------------------
The study uses failure data from several large scale deployments,
including a large number of SATA drives. They
report a significant overestimation of mean time to failure
by manufacturers and a lack of infant mortality effects.
-----------------------------------------------------------------------
In this study we report on the failure characteristics of
consumer-grade disk drives. To our knowledge, the
study is unprecedented in that it uses a much larger
population size than has been previously reported and
presents a comprehensive analysis of the correlation between
failures and several parameters that are believed to
affect disk lifetime. Such analysis is made possible by
a new highly parallel health data collection and analysis
infrastructure, and by the sheer size of our computing
deployment.
One of our key findings has been the lack of a consistent
pattern of higher failure rates for higher temperature
drives or for those drives at higher utilization levels.
Such correlations have been repeatedly highlighted
by previous studies, but we are unable to confirm them
by observing our population. Although our data do not
allow us to conclude that there is no such correlation,
it provides strong evidence to suggest that other effects
may be more prominent in affecting disk drive reliability
in the context of a professionally managed data center
deployment.
--------------------------------------------------------------
Points to ponder:
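1) MTBF is a statistical probability construct. Way back in High School, introductory statistics taught that combined probability is not additive, but MULTIPLICATIVE. In short, the combined probability equals X times Y, NOT X plus Y, where X and Y are the individual probabilities of two independent events occurring. In our case, X and Y are the probabilities that each of two (or more) individual hard drives survives a given period. A RAID 0 array survives only if every drive in it survives, so its odds of survival are the product of the individual odds, which is always lower than the odds for any single drive.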
2) MTBF is calculated on a relatively small sample size of a particular manufacturing run. Smaller sample size means greater uncertainty. And, as we also learned in High School, statistics can be manipulated.
3) On what basis do you assume the manufacturer is taking a truly random sample of product from a particular run?
4) On what basis do you assume that the manufacturer of a particular line is not fudging the data? There were cases of "over-optimistic" specs for assorted audio equipment over the years. In some cases, investigation demonstrated (at best) dubious methodology. Keep in mind that everyone was scratching each other's backs. Even US-based manufacturers like McIntosh or European-based manufacturers like Bang & Olufsen used components manufactured in Japan. And the Japanese electronics manufacturers were, and remain, an incestuous bunch. The acronym "CYA" comes to mind.
5) The article specifically references the fact that "consumer level" devices were under study. About 2 years ago, all of the major "first-tier" hard drive manufacturers changed their warranty policies. The warranties on consumer products were dropped from 3 years to 1. This was discussed at some length here at THG.
6) The failure rate statistics quoted in the article are instructive on a number of levels. Firstly, they confirm one of the most serious concerns raised in the THG article: that the quality of the product had been severely compromised in order to reduce costs. Secondly, since Google is most likely to use product from first-tier manufacturers, this doesn't say anything positive about the reliability of current product. Thirdly, remember that the past 18 months have brought large increases in capacity, 2 revs of a new interface that has essentially replaced a technology in use for almost 20 years (IDE/ATA), and an entirely new recording technique (perpendicular vs longitudinal) used to increase storage capacity, and all of this has been accompanied by very significant price decreases. Such price decreases would be abnormal in any other market, especially for really new, bleeding/leading-edge equipment. How did you think the manufacturers were managing to stay in business? Something has to give under these circumstances. You'd better believe that quality and reliability on the one hand and very low product prices on the other are mutually exclusive. If you think I am wrong, why is it that there are so few hard drive manufacturers left out there? Do the corporate names Quantum, Conner Peripherals, Fujitsu, and Maxtor ring any bells? Conner was bought out by Seagate, Quantum's drive business by Maxtor, and Maxtor in turn by Seagate. Fujitsu got out of the IDE/consumer market about 18 months ago. I don't think IBM is manufacturing consumer hard drives anymore. Samsung is not considered a first-tier manufacturer, but they still offer a 3 year warranty, and have some of the best prices around. Draw your own conclusions.
7) What passes for RAID controllers on assorted MoBos is a very bad joke, in extremely poor taste. These clowns are unable to ensure compatibility of their onboard RAID controllers between revs of the same model of MoBo. Never mind different generations. And while using the same controller chip at that. What's wrong with this picture?
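Point 1 above can be sketched in a few lines. This is only an illustration: it assumes drives fail independently, the function name is hypothetical, and the 5% per-drive figure is a placeholder, not a number from the study. Note that the multiplication applies to survival, since a RAID 0 array survives only if every member drive survives.

```python
def raid0_failure_probability(drive_failure_probs):
    """Chance that a RAID 0 array is lost over some period.

    ASSUMPTION: drives fail independently of each other. Each entry is
    one drive's probability of failing within the period of interest.
    """
    survival = 1.0
    for p in drive_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
        survival *= 1.0 - p  # every drive has to survive
    return 1.0 - survival


if __name__ == "__main__":
    per_drive = 0.05  # placeholder value, NOT taken from the study
    for n in (1, 2, 4):
        risk = raid0_failure_probability([per_drive] * n)
        print(f"{n} drive(s): {risk:.2%} chance of losing the array")
```

With two drives at 5% each, the array's risk comes out at 9.75%, nearly double that of a single drive, and it keeps climbing with every drive added to the stripe.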
Given that these are trivial, stupidly inconsequential and jejune concerns, I am amazed that there are so few RAID 0 systems set up out there, especially in mission-critical applications. Even more amazing, given that a RAID 0 setup won't actually lead to a real and noticeable performance improvement in 90% of the cases where it is implemented, is that there are so few RAID 0 setups out there.
The good readership will need to bear with my sarcasm. I am not one of those morons who are willing to pay a several-hundred dollar premium for the latest graphics card that only gives a 1 to 3 FPS "improvement" in performance over the previous generation. Even a gain of 10 to 50 FPS cannot be seen once you are already above 60 FPS (why do you think movies are still made at 24 FPS?). Tragically, I don't have more money than brains, and what little brains I do have, I tend (for some very strange reasons) to use very carefully.
But then, what do I know?
Since it is your time, money and work at risk here, you are free to make your own decisions and deal with any negative consequences. Just don't whine and ask for help when (not if) your system crashes and burns. Deal with your own, informed, screw-ups.