Channel: Schneider Electric Blog » From the Trenches

Understanding Mean Time between Failure in the Data Center – Part 2


In this blog I am going to finish the topic I started in last week’s blog on understanding MTBF.  If you recall, I stated toward the end of last week’s blog that MTBF calculations can lead to numbers that are clearly nonsensical but that may have some value in a relative sense.  That is, the reliability of two products or systems can be compared IF calculated in EXACTLY the same way and IF ALL the same assumptions are made.  I’m going to give an example of this, but first let me explain a common method for calculating MTBF using the Annual Failure Rate, or AFR.  The AFR is the percentage of a population of units that are expected to “fail” in a calendar year.  Again, note that “fail” has been highlighted in quotes because a precise definition of what constitutes a failure is required.  For a product that is in use continuously (i.e., turned on and left on until failure), the AFR and MTBF are mathematically linked according to:

MTBF(hrs) = 876,000 / AFR(%)

(8,760 hours per year, times a multiplier of 100 to convert from a percentage, yields the 876,000 figure)

Therefore a unit with an AFR of 10% has an MTBF of 87,600 hours.  Since field returns occur on a calendar and not an operating hour basis, it is only actually possible to measure AFR directly.  MTBF is then estimated from the AFR by applying the above formula.  There is a vast library of assumptions and sometimes imprecisely known factors that affect AFR but we’ll leave that discussion for a later time.  Now, let’s move on to my example.
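As a quick sanity check, the conversion can be sketched in a couple of lines of Python (the function name is mine, chosen for illustration):

```python
HOURS_PER_YEAR = 8_760

def mtbf_hours(afr_percent: float) -> float:
    """Convert an Annual Failure Rate (in percent) to MTBF in hours
    for a unit that operates continuously, per MTBF = 876,000 / AFR(%)."""
    return HOURS_PER_YEAR * 100 / afr_percent  # 876,000 / AFR(%)

# A unit with a 10% AFR:
print(mtbf_hours(10))  # 87600.0 hours
```

The same formula runs in reverse, of course: a quoted MTBF implies an AFR of 876,000 divided by the MTBF in hours.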

If 100 standard incandescent light bulbs (forgive me, Green Gods, for not using CFL’s) are all placed into continuous operation (turned on and left that way) on day 1, and exactly one month later 1 has failed, we have an AFR of (1/100) × 12 = 12% at that point in time.  Using the above formula, this gives us an MTBF of 73,000 hours, which equates to 8.3 years.  Now, we know from experience that ALL the light bulbs will likely have failed by the end of 1 year, so an 8.3-year MTBF is meaningless in an absolute sense.  However, when comparing a light bulb from manufacturer A with one from manufacturer B, it is valid to consider the MTBF’s of each manufacturer IF AND ONLY IF they were calculated the exact same way with all the same assumptions.  Also note that light bulb manufacturers have the luxury of manufacturing a product whose failure is not usually “mission critical,” whose lifespan is relatively short, and whose volume is very large, so they can calculate MTBF with better accuracy.  As such, you’ll commonly see an “MTBF” of 2,000 hours, or roughly 3 months of continuous operation, quoted for a standard incandescent bulb.
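The light-bulb arithmetic above can be reproduced as a short sketch (again, the function names are my own illustration, not an established tool):

```python
HOURS_PER_YEAR = 8_760

def annualized_afr(failures: int, population: int, months_observed: float) -> float:
    """Annualize an observed failure fraction: scale the fraction of the
    population that failed from the observation window up to 12 months,
    expressed as a percentage."""
    return failures / population * (12 / months_observed) * 100

def mtbf_hours(afr_percent: float) -> float:
    """MTBF(hrs) = 876,000 / AFR(%) for continuously operated units."""
    return HOURS_PER_YEAR * 100 / afr_percent

# 1 of 100 bulbs fails in the first month:
afr = annualized_afr(failures=1, population=100, months_observed=1)
mtbf = mtbf_hours(afr)
print(afr)                     # 12.0 (%)
print(mtbf)                    # 73000.0 hours
print(mtbf / HOURS_PER_YEAR)   # ~8.3 years
```

The 8.3-year figure falls straight out of the math, which is exactly the point: the calculation is internally consistent even when the absolute number is nonsense for a bulb that burns out within a year.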

I’ve seen MTBF’s for UPS’s quoted in the range of 100 years.  This is not a practical number, but if you can assume that the 100-year figure and the number from another UPS manufacturer were calculated in the EXACT same way with all the same EXACT assumptions, then you might be able to use it in making some decisions.  This is not likely, so please proceed with great caution when utilizing this metric.  If you are interested in getting a more technical feel for MTBF, I recommend white paper 112, “Performing Effective MTBF Comparisons for Data Center Infrastructure”.

 

Please follow me on Twitter @DomenicAlcaro

 

About Domenic Alcaro:

Domenic Alcaro is the Vice President of Mission Critical Services and Software. Prior to his current role, Domenic held technical, sales, and management roles during his more than 14 years at Schneider Electric. In his most recent role as Vice President, Enterprise Sales, he was responsible for helping large corporations improve their enterprise IT infrastructure availability.



