What is MTBF?

Guest
I don't understand what is meant by failure rate. "So the hard drive fails and I buy a new one 1.2 million times" is how my brain interpreted it. Can someone explain MTBF?
 
Guest



MTBF stands for mean time between failures. It's a misconception that it's the life expectancy, but I don't understand what "mean time between failures" actually means.
 

Rusting In Peace

Distinguished
Jul 2, 2009
1,048
0
19,460
I don't think between is right. Surely this is mean time before failure. A single hard drive can only fail once.

Mean time between failures would be having n of the same disk and taking the average time before you had to replace one disk with the next. This is not a standard scenario for disk replacement: the same disk model is rarely used again, given changes in cost and technological improvements.
 
Guest


So you're saying that when a hard drive claims 1.2 million hours MTBF, it claims that I can run it for 136 years straight?
 
No, that is not what the manufacturer is saying. What the manufacturer is saying is that, of a sample of 100 drives, 50 will fail before the 136 years and 50 will fail after.

"Mean time between failures" does not mean "average". "Mean" tells you that half the units failed at some point. That is different from "average".

It is largely statistical. In the case of the 1.2 million hours MTBF (I assume you are talking about the WD RE3 drives), I can guarantee that WD did not build 100 of these back in 1870 and run them continuously until 50 (arithmetic mean) drives failed.
 

jfby

Distinguished
Jun 4, 2010
418
0
18,810


It is Mean Time Between Failures, modeled on a system that fails, is instantly repaired, and then runs until the next time it fails (for whatever reason). A more accurate description would be Mean Time To Failure, but this is not what the marketing execs like to say, and it isn't as slick as MTBF.

According to Wikipedia, given X number of failures, the MTBF would be the amount of time the system runs (let's say 20 years) divided by the number of failures (let's say 2). The MTBF of this system would be 10 years.
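Written out as a formula, that example looks like the following. The symbols T_total and n_failures are just labels picked here for illustration, not notation taken from the Wikipedia article:

```latex
\mathrm{MTBF} = \frac{T_{\text{total}}}{n_{\text{failures}}}
             = \frac{20\ \text{years}}{2}
             = 10\ \text{years}
```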

So theoretically, a number of systems using a part with an MTBF of 50 years would, on average, only fail once every 50 years.
 
Guest


So how do I know if my drive is one of the drives that will fail after 136 years? :lol:

What does 1.2 million hours tell me about the life span? How can I "convert" a 1.2 million hour MTBF into a life expectancy?
 

jfby

Distinguished
Jun 4, 2010
418
0
18,810
Look at it like this, based on the equation for how MTBF is calculated. If you take 100,000 units and run them for 20 hours, that gives you 2 million hours of operation. According to a 1.2 million hour MTBF, you would expect 2 units to fail (rounded up, of course).
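Here is a rough sketch of that arithmetic in Python. The function name `expected_failures` is made up for this example, and it assumes failures occur at a constant rate across the whole fleet, which is a simplification:

```python
def expected_failures(units: int, hours_each: float, mtbf_hours: float) -> float:
    """Expected failures in a fleet, assuming a constant failure rate (a simplification)."""
    total_unit_hours = units * hours_each  # pooled operating time across all units
    return total_unit_hours / mtbf_hours


# Figures from the post: 100,000 units, 20 hours each, 1.2 million hour MTBF.
failures = expected_failures(100_000, 20, 1_200_000)
print(f"{failures:.2f}")  # about 1.67, which rounds up to 2 units
```

The point is that the number only says something about a large population of drives, not about any one of them.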

If you want to get statistical about it, look here too: http://en.wikipedia.org/wiki/Normal_distribution.

Assuming a normal distribution about the mean time to failure, roughly half of the units could last over 1.2 million hours and the other half will fail before.

Anyone who has had a 50,000 hr MTBF PSU die after just 18,000 hrs of operation knows that the 50,000 number doesn't really mean that much. That is where warranties come in; I would consider the warranty and word of mouth before I would believe someone saying they had a 1.2 million hour MTBF. I'm not even saying they are being dishonest, just that MTBF is only a theoretical, statistical number.
 
MTBF as calculated by manufacturers is a way to lie using statistics. For example, they'll test 100 drives until one fails. Suppose that's after a month of continuous operation. They'll multiply that by the number of drives and say that they only had one failure in 100 operating months, or over 8 years. Furthermore, it says nothing of the usage pattern. Was the drive being hammered as if by heavy database processing, was it doing occasional random seeks, was it repetitively formatting, etc.?
It's about as reliable as giving a piece of hard candy to 100 monkeys and seeing how long it took the first one to dissolve, then multiplying it by 100 to claim how long each piece would last.
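To put numbers on the 100-drive example, here is a small sketch. `pooled_mtbf_hours` is a hypothetical helper name, and the 730 hours per month is an assumption (8,760 hours per year divided by 12); this is not any manufacturer's actual test procedure:

```python
HOURS_PER_YEAR = 8760
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def pooled_mtbf_hours(drives: int, test_hours: float, failures: int) -> float:
    """MTBF estimate from pooled drive-hours divided by observed failures."""
    return drives * test_hours / failures


# 100 drives run for one month, with a single failure observed.
mtbf = pooled_mtbf_hours(100, HOURS_PER_MONTH, 1)
print(mtbf)                              # 73000.0 hours
print(round(mtbf / HOURS_PER_YEAR, 2))   # 8.33 years
```

Note that no drive in this test actually ran for more than a month, yet the resulting figure is over 8 years.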
 

jfby

Distinguished
Jun 4, 2010
418
0
18,810
You cannot calculate a lifespan for a single item from a statistical average (however dubiously the MTBF number was obtained).

The MTBF is really just an estimate of the average time between failures for a specific device; it's not a promise at all. If your unit dies after 1 hr, how does this violate an MTBF of 1.2 million hours? It doesn't; theoretically, enough devices will last long enough to offset yours dying after 1 hr.
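One common way to turn an MTBF into something more tangible is to ask how likely a unit is to fail within a given period. The sketch below assumes an exponential lifetime model (constant failure rate), which is a textbook simplification and not something the manufacturer states; `failure_probability` is a name invented for this example:

```python
import math


def failure_probability(hours: float, mtbf_hours: float) -> float:
    """P(unit fails within `hours`), ASSUMING an exponential lifetime model."""
    return 1.0 - math.exp(-hours / mtbf_hours)


MTBF = 1_200_000   # hours, the figure quoted in this thread
HOURS_PER_YEAR = 8760

p_year = failure_probability(HOURS_PER_YEAR, MTBF)
print(f"{p_year:.2%}")  # roughly 0.7% chance of failing in one year of 24/7 use

p_first_hour = failure_probability(1, MTBF)
print(f"{p_first_hour:.7%}")  # tiny, but not zero
```

Under that assumption, a drive dying in the first hour is rare but perfectly consistent with the quoted MTBF, and the figure reads more like a yearly failure rate for a large fleet than a lifespan for one drive.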
 
Failure rates are affected by environmental factors, such as temperature and humidity, handling procedures, workloads and "duty cycles" or powered-on hours patterns during the drive's life. Did the drive manufacturer actually subject their drives to these varying environmental factors that users would subject them to? I highly doubt it.

Real-world MTBF values seem to be several times shorter than vendors claim. The 1.2 million hour MTBF figure you're quoting is an overinflated value put out there by the manufacturer's marketing department and has no basis in real-world use.
 

simontompkins

Distinguished
Jul 12, 2006
31
1
18,535
Not sure what point was being made, but mean and average are the same thing.

MTBF is needed for enterprise-level companies. Server efficiency is meant to be greater than about 99.98%, which sounds good unless you transact billions of dollars and your servers go down.

It's very much in companies' interests to give good figures for MTBF. It allows networks to be updated before failure. Switches are another example, where a company will replace them long before end of life.

Also, imagine a company exaggerating an MTBF, then a bank's servers going down. If it could be proved that the bank would have replaced their hard drives if they'd had better MTBF figures, they could potentially sue for vast sums.

Also, although statistically one hard drive could fail after one hour and not upset MTBF figures, that's very unlikely. Modern fabrication plants mean it's very unlikely that a single hard drive would fail on its own; it's much more likely that whole batches would fail. That is why fabrication plants have all kinds of measures to ensure they spot problems.
 

jfby

Distinguished
Jun 4, 2010
418
0
18,810


I understand what you are saying, and I was just giving an example. For a single person, though, it is still not possible to guess a single item's lifespan (please explain if I am wrong though...).
 
MTBF is a statistical measure derived by extrapolation methods after lab tests. It is primarily for comparison purposes. MTBF requirements drive product design, component choices within the product, redundancies, and manufacturing methods and QA plans.

Statistical inference and extrapolation of test results under controlled environments are used to determine this life. This is done not just in electronics, but also in other products, to determine long-term aging effects. Obviously, manufacturers cannot wait 5 years in order to evaluate the 5-year aging effect on a certain product. Accelerated aging, along with statistical methods, is used to predict these results by conducting tests over a 30-day period (not 5 years). The same goes for MTBF on electronic products and other products.

Use it to compare apples with apples.