Emergency Management vs. Failure Elimination

August 4, 2016

In many cases, the results you are chasing can be reached from a completely different angle than you might expect. Most people in the reliability community are trained to treat breakdown elimination as the number one goal. While that is ultimately the endgame, a site with high levels of reactive work can achieve a significant change in performance by containing existing problems rather than eliminating them.

 

Two Types of Downtime – Same Result

Of the two primary indicators of a world-class maintenance system, MTTR (mean time to repair) and MTBF (mean time between failures), which is easier to impact in the short term? The answer is MTTR, because many of the factors that determine repair duration are matters of policy rather than asset condition. The MTBF we see today is not driven by what we did yesterday; it reflects behavior from over a year ago. In essence, you may be experiencing a “Reliability Hangover”. MTBF, or failure rate, is a lagging KPI: if you performed all the correct preventive maintenance tomorrow, the equipment would not run better right away.

 

MTTR, on the other hand, can be impacted immediately by the right program: one focused on identifying losses, designing countermeasures, and tracking performance metrics. This provides the fuel for continuous improvement in a world measured in real time.
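
To make the difference between the two indicators concrete, here is a minimal Python sketch of how MTTR and MTBF are commonly calculated from a downtime log. The one-week window, the three repair durations, and the 35% shortening factor are hypothetical values chosen only for illustration; they are not the client's data, and the function names are our own.

```python
def mttr(repair_minutes):
    """Mean time to repair: total repair time divided by number of failures."""
    return sum(repair_minutes) / len(repair_minutes)


def mtbf(repair_minutes, window_minutes):
    """Mean time between failures: total uptime divided by number of failures."""
    uptime = window_minutes - sum(repair_minutes)
    return uptime / len(repair_minutes)


# Hypothetical observation window: one week of scheduled run time, in minutes.
WINDOW_MIN = 7 * 24 * 60

# Hypothetical baseline: three unplanned stops and how long each took to fix.
baseline = [90, 45, 60]

# Same three failures, but each repair is 35% shorter (assumed for illustration).
# Nothing about the asset's condition has changed, only the response.
contained = [minutes * 0.65 for minutes in baseline]

for label, repairs in (("baseline", baseline), ("contained", contained)):
    print(f"{label}: MTTR = {mttr(repairs):.2f} min, "
          f"MTBF = {mtbf(repairs, WINDOW_MIN):.2f} min")
```

With these made-up numbers, MTTR drops from 65 minutes to about 42 minutes, while MTBF barely moves (3,295 minutes versus roughly 3,318), because the number of failures is unchanged. That is the point of containment: the failure count stays the same for now, but each failure costs less.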

 

Case Study Overview

Recently, we had the opportunity to work with a client that was experiencing high levels of reactive maintenance. Besides the typical falsely prioritized work, every down item was treated as having equal priority, which pulled the team in many directions.

 

While performing a maintenance and reliability assessment, we witnessed an unplanned downtime event on a critical asset and the typical behavior that followed. We were on the floor with the plant manager, and the immediate reaction was to jump right in and see what was going on. Against our instincts, we stood back and just observed. The event lasted a painful 45 minutes while the operator tried to solve the problem. Meanwhile, the issue blocked or starved other stations up and down the line. Operators upstream and downstream watched their colleague struggle. There was no sense of urgency and no required process for them to follow. THIS WAS NORMAL. In addition, the maintenance team was unaware of the downtime because the operator kept trying to work his way through the problem.

 

The maintenance planning organization had many PMs that were due but stated that operations never gave them the equipment to work on. All of these situations rested on incorrect assumptions.

 

By containing these problems through an escalated, critical-down management system, this site was able to reduce reactive maintenance MTTR by 35%. Not by eliminating the problems, but rather by getting out of their way. The site has reduced its overall downtime by 45%, all within 90 days!

 

Bottom Line: A 30-minute problem should not take 90 minutes.

In the blogs that will follow, this solution will be broken down into individual segments, illustrating first-hand how they solved this chronic problem.  Our multi-part blog series will review:

  • Today’s Normal – Apathy vs. a sense of urgency

  • Establishing straight-line communications

  • Developing a trauma center approach

  • Organizing reliability for small unplanned windows of opportunity

  • Cultural transformation