Failure Modes and Effects Analysis (FMEA) is no doubt one of the more widely used tools in risk management. It originated in the American defense services during the 1940s and 1950s, and over the decades made inroads into the U.S. National Aeronautics and Space Administration (NASA), aerospace, and other industries. Its utility is limited only by your imagination; I have applied it across the spectrum from vacations to business operations and found that it enables me to “see” the 90 percent of the risks that lurk below the water’s surface.
But you cannot see clearly if tears of grief or pain well up in your eyes. While the FMEA is very useful in failure prevention, it is often associated with much agony, as many users struggle to apply it. In this article, I have extracted material from my upcoming books to discuss three of the many reasons why this happens, and what to do about it.
In risk management, several terms are used casually, with inadequate understanding or definition. The standards are not much help either, since the definition of “risk” varies from one standard to another; ISO 9000:2015, for example, refers to it as “the effect of uncertainty.”
Using terms interchangeably can lead to confusion and faulty analysis. Take the case of mixing up risk level and harm. Risk level is a combination of the frequency and the severity of a given failure. Harm is closer to severity: it is the impact or effect of a failure, and it is situation-specific.
Think of a boat with a large hole in the bottom versus another one with a small hole. The former will flood and sink quickly. On the other hand, if the latter boat has not one but many small holes, it will also sink rapidly. Small events occurring in large numbers can and do add up to the same outcome as a single large event.
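To make the arithmetic behind the boat analogy concrete, here is a minimal sketch. The function name and the inflow figures are invented purely for illustration; they are not measurements and are not taken from any standard.

```python
def total_inflow(hole_count: int, litres_per_minute_per_hole: int) -> int:
    """Combine frequency (how many holes) with severity (inflow per hole)."""
    return hole_count * litres_per_minute_per_hole


# Hypothetical numbers, chosen only to illustrate the point.
one_large_hole = total_inflow(hole_count=1, litres_per_minute_per_hole=100)
many_small_holes = total_inflow(hole_count=20, litres_per_minute_per_hole=5)

# Both boats take on water at the same overall rate.
print(one_large_hole, many_small_holes)
```

Looking at the size of a single hole (the severity) alone would rank the second boat as far safer, even though the combined inflow is identical.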
Along the same lines, more than 30,000 fatalities occur annually in the U.S. due to auto accidents. What is the corresponding figure for deaths from nuclear accidents? Zero! Yet most people feel that a nuclear power plant carries far more risk than the automobile. They would rather have coal-fired power plants that contribute to deaths in the mining industry and expose miners to coal dust health hazards that cause cancer.
The remedy lies in standardizing the terminology and educating everyone in the company on its proper use, together with a process for conducting risk management.
In the boat example above, what exactly was the failure? The holes in the boat? The flooding of the boat? The sinking of the boat? If you consider the “failure” to be the flooding of the boat, then the effect is that it sinks, and the associated harm is the drowning of the occupants. In this case, the cause of the failure would be the hole(s).
If you consider the hole(s) in the boat as a failure, then the immediate effect is flooding, and the downstream effect is sinking. What is the cause? It’s not clear from the provided information; it needs to be investigated. Teams (or individuals) working on FMEAs get caught up in this kind of reasoning and find themselves running around in circles as the same thing is labeled a failure mode by some, while others label it an effect.
The antidote is to ask two questions: what is the object of analysis, and what are its functions? If it is the boat, its function is to stay afloat. When a function is not accomplished, that is a failure.
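One way to keep a team anchored to those two questions is to record the object and its function first, and only then attach failure modes, effects, and causes. The sketch below is an assumption about how such a record could look; the class and field names are hypothetical, and the way the boat example is filled in is one possible reading of it, not a prescribed FMEA format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureMode:
    description: str    # the way in which the function is not accomplished
    effects: List[str]  # what follows from the failure, including harm
    causes: List[str]   # what leads to the failure


@dataclass
class AnalysisItem:
    name: str      # the object of analysis
    function: str  # what the object is supposed to do
    failure_modes: List[FailureMode] = field(default_factory=list)


# The boat example, entered function first (illustrative reading only).
boat = AnalysisItem(name="boat", function="stay afloat")
boat.failure_modes.append(
    FailureMode(
        description="does not stay afloat (sinks)",
        effects=["occupants drown"],
        causes=["flooding through hole(s) in the hull"],
    )
)

for mode in boat.failure_modes:
    print(f"{boat.name} / {boat.function}: {mode.description}")
```

Because the function is written down before anything else, a candidate entry such as "flooding" has to be placed relative to "stay afloat," which makes it harder for the same item to be labeled a failure mode by one person and an effect by another.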
Rating tables are a critical element in the FMEA; they are the backbone when it comes to expressing the risk level as a risk priority number (RPN). The RPN integrates the three dimensions of failure (severity of the effect, frequency of the failure, and detectability of the failure) into one number that is easy to comprehend and use.
Each dimension is evaluated using a numerical rating scale that converts different conditions or states to numbers. For example, where the “effect” of a failure is severe enough to cause a fatality, the Severity rating would be the highest possible (typically 5 or 10, depending on whether the rating scale runs from 1 to 5 or from 1 to 10). For Occurrence and Detectability, the scales can be quantitative where data is available. The ratings are integrated by multiplying the ratings for Severity, Occurrence, and Detection to provide a single number representative of the risk level. This number can vary from 1 to 125 for a five-level rating scale, such as the one below:
The trouble starts when the scale descriptions are too sparse or ill-defined, leaving the door open to varying interpretations. For instance, the Severity of a failed car horn in India would be very high, whereas in the U.S. it might be low. The failure is the same in both cases. Therefore, proper descriptors are a must.
Constructing rating tables with the intended use of the product in mind, and who the users are, can alleviate much confusion and error.
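The RPN arithmetic described earlier can be sketched in a few lines. This is a minimal illustration that assumes a five-level scale (1 to 5) for all three dimensions; the names `Ratings`, `rpn`, `SCALE_MIN`, and `SCALE_MAX` are hypothetical and do not come from any FMEA standard or tool.

```python
from dataclasses import dataclass

# Assumption: a five-level rating scale for every dimension.
SCALE_MIN, SCALE_MAX = 1, 5


@dataclass(frozen=True)
class Ratings:
    severity: int
    occurrence: int
    detection: int

    def __post_init__(self) -> None:
        # Reject ratings that fall outside the agreed scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(
                    f"{name} must be between {SCALE_MIN} and {SCALE_MAX}, got {value}"
                )


def rpn(ratings: Ratings) -> int:
    """Risk priority number: Severity x Occurrence x Detection."""
    return ratings.severity * ratings.occurrence * ratings.detection


print(rpn(Ratings(severity=1, occurrence=1, detection=1)))  # lowest possible: 1
print(rpn(Ratings(severity=5, occurrence=5, detection=5)))  # highest possible: 125
print(rpn(Ratings(severity=5, occurrence=2, detection=3)))  # 30
```

The calculation is trivial; the hard part is choosing the three inputs, which is why the descriptors behind each rating matter far more than the multiplication.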
So, there you have it — three ways to change gears and “do FMEAs without tears.”
All contents of this article are copyright of Rai Chowdhary and may be used by MasterControl under permission from the author.