Imagine yourself as a fighter pilot hurtling in to land on the pitching deck of an aircraft carrier. Your 47,000-lb. F/A-18E Super Hornet is fast approaching the deck, and if you have a bad landing, you may crash and burn or plunge over the edge. Between you and catastrophe is a slender cable, part of an arresting system designed to catch your jet and slow it enough to stop it before it runs out of deck. Your plane trails a hook that should catch the arresting cable, which in turn pulls on other cables and pistons that slow your jet. Each time the cable catches a plane, it becomes somewhat worn or damaged. Because a snapped cable would be disastrous, you fervently hope that somebody has been maintaining it appropriately. Fortunately, through metallurgical studies, simulations, and painful past experience, the Navy knows the arresting cable is good for at least 125 landings. It might actually last longer, but the Navy does not want to risk losing a pilot and an expensive plane, so its conservative maintenance strategy is to replace the cable immediately after 125 landings.
For the Navy, a well-managed repair and maintenance (R&M) function is a matter of life or death. It is also vital in many industries, such as manufacturing, transportation, oil extraction, mining, metals, pharmaceuticals, medical devices, airlines, and chemicals. A good R&M function reduces capital expenditures because well-maintained equipment lasts much longer and is less likely to fail and stop production. Well-maintained equipment also helps meet quality expectations and ensure customer satisfaction. Finally, it is usually much more cost-effective to maintain equipment appropriately than to repair or replace it once it breaks. A familiar example is the modest cost of periodically changing a car's oil compared with the high cost of an engine failure.
Although there is little internal auditors can tell a maintenance engineer about the technical aspects of equipment maintenance, there is much they can do to review and test the R&M processes to improve their effectiveness.
Developing an Audit Program
By understanding the elements of a well-run R&M function, an auditor can design an operational audit program tailored to the specific needs of the organization. One way to gain this understanding is to team up with the most knowledgeable members of an organization’s R&M function. For example, in my role as an audit executive with a US $23 billion global industrial company with hundreds of manufacturing locations, I invited the three R&M managers with the best reputations to join two audit managers and me for a three-day session in which we generated our R&M audit program and a separate guide that became the reference material for the auditors. For the first several audits, the R&M experts on the development team participated as guest auditors and were paired with senior auditors. This helped validate the R&M audit program and facilitated knowledge transfer to internal audit.
The R&M experts on the audit program development team agreed that an important element of an effective R&M function is a computer system that holds key information on the facility’s equipment. The system should be able to answer these types of questions:
- How critical is each piece of equipment to overall production? Is it a bottleneck?
- When did it last break down?
- How often does the equipment fail?
- What is the estimated useful life?
- Which components generally fail?
- How was it fixed last time? How long did it take, and how costly was the repair?
- Are needed spare parts available in stock?
- What is the equipment’s maintenance history?
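To make the list above concrete, here is a minimal sketch of how an equipment record might be structured so that several of these questions can be answered directly from repair history. It is an illustration only: the class names `RepairEvent` and `Equipment`, their fields, and the sample press data are all hypothetical and do not describe any particular R&M system.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RepairEvent:
    """One breakdown and the repair that followed it (hypothetical fields)."""
    failed_on: date
    component: str
    fix: str
    hours: float
    cost: float


@dataclass
class Equipment:
    """A minimal equipment record; field names are illustrative only."""
    asset_id: str
    is_bottleneck: bool
    estimated_life_years: float
    repairs: list[RepairEvent] = field(default_factory=list)

    def last_breakdown(self) -> date | None:
        # When did it last break down?
        return max((r.failed_on for r in self.repairs), default=None)

    def mean_days_between_failures(self) -> float | None:
        # How often does it fail? Needs at least two breakdowns to measure a gap.
        dates = sorted(r.failed_on for r in self.repairs)
        if len(dates) < 2:
            return None
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        return sum(gaps) / len(gaps)

    def most_common_failure(self) -> str | None:
        # Which component generally fails?
        counts = Counter(r.component for r in self.repairs)
        return counts.most_common(1)[0][0] if counts else None

    def last_fix(self) -> RepairEvent | None:
        # How was it fixed last time, and how long and costly was the repair?
        return max(self.repairs, key=lambda r: r.failed_on, default=None)


# Hypothetical press with three recorded breakdowns.
press = Equipment("PRESS-07", is_bottleneck=True, estimated_life_years=20)
press.repairs += [
    RepairEvent(date(2023, 1, 10), "hydraulic seal", "replaced seal", 6, 1200.0),
    RepairEvent(date(2023, 4, 20), "hydraulic seal", "replaced seal", 5, 1100.0),
    RepairEvent(date(2023, 7, 29), "motor bearing", "replaced bearing", 12, 4800.0),
]
print(press.mean_days_between_failures())  # 100.0
```

Keeping the history as a list of events, rather than as summary counters, lets auditors recompute frequencies, repair times, and costs themselves instead of relying on figures the system reports.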
Another important module of an R&M system is a scheduling function that records the date of the next maintenance event for each piece of equipment. Every morning, the system should generate a report listing each engineer's scheduled maintenance events for the day, with past-due events flagged for R&M management.
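The morning report described above reduces to a simple filter and grouping. The sketch below assumes a hypothetical `ScheduledEvent` record with an asset, an assigned engineer, a due date, and a completion flag; it keeps overdue work on the engineer's list (marked as past due) and also returns it separately, oldest first, for management follow-up.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class ScheduledEvent(NamedTuple):
    """One planned maintenance task (hypothetical fields)."""
    asset_id: str
    engineer: str
    due: date
    done: bool = False


def morning_report(events, today):
    """Return (tasks per engineer, past-due events for management).

    Each engineer's list holds (asset_id, is_past_due) pairs for every open
    event due today or earlier, so overdue work stays on someone's list.
    """
    by_engineer = defaultdict(list)
    past_due = []
    for ev in events:
        if ev.done or ev.due > today:
            continue  # already completed, or not yet due
        overdue = ev.due < today
        by_engineer[ev.engineer].append((ev.asset_id, overdue))
        if overdue:
            past_due.append(ev)
    past_due.sort(key=lambda ev: ev.due)  # oldest first for R&M management
    return dict(by_engineer), past_due
```

An auditor could run the same logic against an extract of the schedule to confirm that the report the system actually produces does not quietly drop past-due items.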
Preventive or Predictive
The R&M experts explained to the auditors on the program development team that there are two main types of maintenance practices: preventive maintenance and predictive maintenance. Each is appropriate for different circumstances.
Preventive maintenance is the approach used in the aircraft carrier example. Based on engineering studies, past failures, benchmarking data, and information from the equipment manufacturer, the R&M function estimates when a part is likely to fail and schedules the maintenance activity a conservative interval before the anticipated failure. The more costly a failure could be, the wider that safety margin should be. Although this type of maintenance is much less expensive than repairing critical failures, it leaves money on the table. Following preventive maintenance principles, for example, the Navy replaces its arresting cables before they need replacing. This approach squanders useful life, which is wasteful when the scrapped part is expensive. Maintenance management should perform a cost-benefit analysis that weighs the probability and cost of equipment failure against the cost of the useful life wasted when equipment is replaced early. The auditors could evaluate this analysis as part of the audit program.
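One standard way to frame this cost-benefit analysis is the classic age-replacement model: replace a part at a fixed age or at failure, whichever comes first, and choose the age that minimizes long-run expected cost per unit of use. The sketch below assumes part life follows a Weibull distribution; the function names and every number in the usage (costs, shape, scale, candidate intervals) are hypothetical and are not the Navy's figures.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability a part is still working at age t under a Weibull life model."""
    return math.exp(-((t / scale) ** shape))


def cost_per_use(interval, planned_cost, failure_cost, shape, scale, steps=1000):
    """Long-run expected cost per unit of use (e.g., per landing) when a part is
    replaced at age `interval` or at failure, whichever comes first."""
    survive = weibull_survival(interval, shape, scale)
    # Expected cost of one cycle: planned replacement if it survives, failure otherwise.
    cycle_cost = planned_cost * survive + failure_cost * (1 - survive)
    # Expected cycle length: integral of survival from 0 to interval (trapezoid rule).
    h = interval / steps
    area = 0.5 * (1.0 + survive)  # survival at age 0 is 1
    for i in range(1, steps):
        area += weibull_survival(i * h, shape, scale)
    cycle_length = area * h
    return cycle_cost / cycle_length


def best_interval(candidates, **params):
    """Pick the candidate replacement age with the lowest long-run cost per use."""
    return min(candidates, key=lambda t: cost_per_use(t, **params))


# Hypothetical inputs: same part, two very different failure consequences.
costly = best_interval(range(10, 401, 10), planned_cost=10_000,
                       failure_cost=2_000_000, shape=3.0, scale=200.0)
cheap = best_interval(range(10, 401, 10), planned_cost=10_000,
                      failure_cost=20_000, shape=3.0, scale=200.0)
print(costly, cheap)  # the costlier failure yields the shorter interval
```

The comparison mirrors the principle in the text: when a failure is far more expensive than a planned replacement, the cost-minimizing interval moves well ahead of the part's typical life, accepting wasted useful life in exchange for lower failure risk.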
A more advanced and potentially less costly form of maintenance, called predictive maintenance, evaluates the condition of equipment through constant monitoring. The goal is to perform maintenance just before failure, based on systematic observation of the equipment. Maintenance performed at that point is most cost-effective and takes place before the equipment loses performance. Predictive maintenance wastes little of the remaining life of discarded parts, and it requires fewer maintenance events over the life of the equipment, greatly reducing labor cost. Despite these advantages, monitoring has a cost, which must be weighed against the wasted useful life and the additional maintenance events of preventive maintenance.
Take, for example, large industrial motors, which have large ball bearings that become damaged over time. Bad bearings could cause a catastrophic failure and burn out the motor, possibly requiring an expensive replacement. Using preventive maintenance, the motor would be completely dismantled at fixed intervals and the ball bearings replaced as a matter of routine. With predictive maintenance, the R&M function might instead take weekly infrared pictures of the motor. These pictures can reveal “hot spots” in ball bearings that may indicate the motor is about to fail. In response, the organization would schedule maintenance of the motor right away.
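Because infrared inspections ultimately produce temperature readings, the decision to “schedule maintenance right away” can be expressed as a simple rule. The sketch below is an illustration under stated assumptions, not an engineering standard: it takes one bearing's weekly hot-spot temperatures, uses the median of the first few weeks as a baseline, and flags the first week that rises more than a threshold above it. The function name `hot_spot_alert`, the four-week baseline, and the 15 °C threshold are all hypothetical; actual alarm limits should come from the motor manufacturer or a thermography specialist.

```python
from statistics import median


def hot_spot_alert(weekly_temps_c, baseline_weeks=4, max_rise_c=15.0):
    """Return the index of the first week whose bearing temperature exceeds the
    baseline by more than max_rise_c, or None if no week does.

    The baseline is the median of the first `baseline_weeks` readings
    (a hypothetical rule chosen for illustration).
    """
    if len(weekly_temps_c) <= baseline_weeks:
        return None  # not enough history to judge a trend
    baseline = median(weekly_temps_c[:baseline_weeks])
    for week, temp in enumerate(weekly_temps_c[baseline_weeks:], start=baseline_weeks):
        if temp - baseline > max_rise_c:
            return week
    return None


readings = [60, 61, 59, 60, 62, 66, 71, 78]  # hypothetical weekly readings, °C
print(hot_spot_alert(readings))  # 7
```

A rising trend, rather than a single absolute limit, is used here because the same bearing may run at different normal temperatures on different motors.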
To evaluate equipment condition, predictive maintenance uses nondestructive testing technologies such as infrared pictures, acoustic tests, vibration analysis, sound-level measurements, and used-oil analysis. This type of maintenance might require technical monitoring expertise, which may need to come from external specialized services if it is not available in-house.
The desired maintenance approach should be considered when purchasing new equipment. The organization might, for instance, require the vendor to provide built-in gauges or warning lights that signal an imminent need for maintenance, much like the “check engine light” in cars. Although built-in monitoring tools add to the purchase price, they can make economic sense over the life of the equipment. A best-in-class R&M function works closely with the manufacturing engineers to decide, in advance, the most effective way to maintain the equipment.
The auditors should ask about the use of preventive versus predictive maintenance practices and probe whether predictive practices were considered. The auditors also could ask to see the analysis of the cost of continually monitoring the health of equipment (for predictive maintenance) compared to the cost of squandered useful life and additional labor if preventive maintenance is used. One common recommendation is to consider adding predictive maintenance practices for specific pieces of equipment.
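The comparison auditors might ask to see can be reduced to an annual cost per strategy. The sketch below assumes a hypothetical `MaintenanceStrategy` record: wasted useful life shows up as more frequent part replacements under preventive maintenance, monitoring appears as an annual cost under predictive maintenance, and residual failures are priced at an expected cost. All figures in the usage are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceStrategy:
    """Annual cost drivers for one maintenance approach (hypothetical fields)."""
    name: str
    events_per_year: float
    part_cost: float
    labor_cost_per_event: float
    monitoring_cost_per_year: float = 0.0
    failures_per_year: float = 0.0
    cost_per_failure: float = 0.0

    def annual_cost(self) -> float:
        # Planned work: parts (including life discarded early) plus labor per event.
        planned = self.events_per_year * (self.part_cost + self.labor_cost_per_event)
        # Unplanned work: expected number of failures times their cost.
        unplanned = self.failures_per_year * self.cost_per_failure
        return planned + unplanned + self.monitoring_cost_per_year


preventive = MaintenanceStrategy("preventive", events_per_year=4, part_cost=3_000,
                                 labor_cost_per_event=2_000,
                                 failures_per_year=0.25, cost_per_failure=20_000)
predictive = MaintenanceStrategy("predictive", events_per_year=1.5, part_cost=3_000,
                                 labor_cost_per_event=2_000, monitoring_cost_per_year=12_000,
                                 failures_per_year=0.25, cost_per_failure=20_000)
cheapest = min((preventive, predictive), key=MaintenanceStrategy.annual_cost)
print(cheapest.name)  # predictive
```

With these invented numbers the two approaches differ by only a few percent, which is a useful reminder that the answer depends heavily on the inputs; the audit question is whether management has gathered defensible ones.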
Another superior R&M concept, Total Productive Maintenance (TPM), was launched in Japan and refined by Toyota in the 1980s. In TPM, equipment operators are trained to perform many of the day-to-day diagnostic and simple maintenance tasks. To enable this, the organization creates a TPM team for each production station that includes the operators as well as a maintenance engineer. The operators are trained to understand the machinery, identify potential problems, and maintain the equipment before a failure can affect production. This decreases downtime and reduces the overall cost of maintenance. Usually, the simpler maintenance tasks (e.g., changing oil, adding lubrication, tightening belts, and replacing modular components) are assigned to the less technically skilled but lower-cost operators, while the more complex tasks are scheduled for higher-cost engineers. The operators, as the employees closest to the equipment, are in the best position to notice odd noises, burnt smells, rough vibrations, and similar warning signs. Usually part of a broader Lean Six Sigma or Total Quality program, TPM is a proactive, efficient approach that aims to identify issues as early as possible and prevent problems before they occur. Keys to TPM are a high degree of employee involvement, teaming, empowerment, and training. Not every corporate culture can make this work.
Good R&M departments also track a set of performance metrics. For example, maintenance management should measure the hours spent on maintenance as a share of the total hours spent on repair and maintenance combined. A maintenance share of 80 percent or greater is usually excellent; a share of 60 percent indicates the R&M process has significant opportunities for improvement. Auditors can easily calculate this ratio to assess the R&M process. Because best-in-class percentages vary by industry, a benchmarking or research effort may be needed to set goals. Another good metric for auditors to test is the percentage of maintenance events completed as originally scheduled. A completion rate of 95 percent or greater is considered good performance, as long as the missed events are subsequently completed within an established, reasonable time frame.
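Both metrics are simple ratios that auditors can recompute from timesheet and schedule data. The sketch below uses hypothetical function names; the 80 percent and 95 percent figures in the usage are the rules of thumb from the text, which may need adjusting to industry benchmarks. The last function checks the proviso that missed events are eventually completed within an allowed delay, where the maximum delay is an assumed audit parameter.

```python
def maintenance_share(maintenance_hours, repair_hours):
    """Maintenance hours as a fraction of total repair-and-maintenance hours."""
    total = maintenance_hours + repair_hours
    return maintenance_hours / total if total else 0.0


def schedule_compliance(scheduled_events, completed_on_schedule):
    """Fraction of scheduled maintenance events completed as originally scheduled."""
    return completed_on_schedule / scheduled_events if scheduled_events else 1.0


def missed_events_caught_up(days_late, max_days_late):
    """True if every missed event was completed within the allowed delay.

    `days_late` holds one entry per missed event; None means never completed.
    """
    return all(d is not None and d <= max_days_late for d in days_late)


share = maintenance_share(3_200, 800)       # hypothetical annual hours
compliance = schedule_compliance(400, 384)  # hypothetical event counts
print(share >= 0.80, compliance >= 0.95)    # True True
```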
Another critical area internal auditors should analyze is spare-parts management. For each piece of equipment, the R&M function should make an analytic decision about which spare parts to stock and which to order only when needed.
I worked at one organization where the manufacturing facilities used huge generators that took the vendor more than a year to build to specification and deliver. If these generators failed, the entire production center would grind to a halt, potentially for more than a year. These generators can cost as much as US $10 million each, making them very expensive to stock as a spare part. However, not stocking a spare generator might risk hundreds of millions of dollars in lost production.
After much analysis, the organization ultimately decided to stock just one spare generator, at one of four similar plants. If a generator failed at one of the other three plants, the spare could be transported there within two weeks. The emergency shipping and lost production were expensive, but given a careful analysis of the costs of the various options and the probability of a total generator failure, it was an acceptable solution. Of course, some risk remained that two or more plants might need a new generator before the spare could be restocked.
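The residual risk mentioned above can be estimated roughly with a binomial model. This sketch assumes generator failures are independent and that each plant's generator has a hypothetical 2 percent chance of failing within a one-year window, used here as a simplified stand-in for the more-than-a-year restocking lead time. Neither figure comes from the organization's actual analysis.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n independent units fail in the window), binomial model."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# Hypothetical: 4 plants, 2% chance each generator fails in a one-year window.
one_or_more = prob_at_least(1, 4, 0.02)
two_or_more = prob_at_least(2, 4, 0.02)  # the case a single spare cannot cover
print(round(two_or_more, 6))  # 0.002336
```

Under these assumptions, the chance that a single pooled spare is insufficient is well under 1 percent per year, but an auditor would still want to see how management estimated the failure probability and whether failures could be correlated (for example, a common design flaw).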
Auditors can evaluate the economic analysis used to decide which spare parts to stock vs. which ones to order as needed. Because stocking spare parts ties up valuable working capital, this cost should be compared to the risk of losing production if a critical piece of equipment stops working. The economic analysis needs to consider the probability of a part failing, the cost of stocking the part, the criticality of the part to production, the lead time to get the part once ordered, and the business impact of not having production during the lead time period. If the economic analysis is incomplete, it is unlikely that the appropriate spare-parts decisions have been made.
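The factors listed above can be combined into an expected annual cost for each option. The sketch below is a simplified illustration for a single plant: the function names, the 25 percent carrying-cost rate, the failure probability, the swap and lead times, and the daily production loss are all hypothetical. It omits the replacement part's purchase price because that is incurred in either scenario once a failure occurs, and it does not model pooled spares like the four-plant arrangement described earlier.

```python
def expected_annual_cost_if_stocked(part_cost, carrying_rate, p_fail,
                                    swap_days, loss_per_day):
    """Carrying cost of the spare plus expected lost production while swapping it in."""
    return part_cost * carrying_rate + p_fail * swap_days * loss_per_day


def expected_annual_cost_if_ordered(p_fail, lead_time_days, loss_per_day,
                                    expedite_cost=0.0):
    """Expected lost production (plus any expediting) while waiting for delivery."""
    return p_fail * (lead_time_days * loss_per_day + expedite_cost)


# Hypothetical generator-like part: expensive, rarely fails, very long lead time.
stock = expected_annual_cost_if_stocked(10_000_000, 0.25, 0.02, 3, 500_000)
order = expected_annual_cost_if_ordered(0.02, 400, 500_000)
print("stock" if stock < order else "order as needed")  # stock
```

If any of these inputs is missing from management's analysis, such as the lead time or the daily cost of lost production, the comparison cannot be made, which supports the point that an incomplete analysis is unlikely to yield sound spare-parts decisions.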
Sometimes, even with the best maintenance practices, critical equipment fails and needs to be repaired immediately. In this case, the R&M group should have a “race car pit crew” mentality: Get in and out as quickly as possible. Every minute a critical piece of equipment is down, the organization loses valuable production.
Auditors should ask to see the contingency plans the R&M organization has created to mobilize in case of unexpected critical equipment failures. Contingency plans should be in place and kept up to date. Needed parts and tools should be identified and organized. The repair steps should be rehearsed, especially if they are complex.
Auditors also could review past equipment failures that affected production. After determining how long production was impaired in each case, the auditors could ask about the repair practices used and recommend ways to improve preparation. This is an area that may be beyond the auditors’ technical skills to assess. If the organization has R&M experts outside the specific facility being audited, auditors could engage them to help analyze past production shutdowns.
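A first pass at this review is simple arithmetic on when production stopped and when it restarted. The sketch below takes a list of (stopped, restarted) timestamps, which an auditor might assemble from production logs, and summarizes total and average downtime, the longest outage, and which outages ran past a restore target. The function name, the dictionary keys, the sample outages, and the 8-hour target are hypothetical.

```python
from datetime import datetime


def outage_summary(outages, target_hours):
    """Summarize past production stoppages given (stopped, restarted) datetimes."""
    hours = [(restarted - stopped).total_seconds() / 3600 for stopped, restarted in outages]
    if not hours:
        return {"count": 0, "total_hours": 0.0, "mean_hours": 0.0,
                "longest_hours": 0.0, "over_target": []}
    return {
        "count": len(hours),
        "total_hours": sum(hours),
        "mean_hours": sum(hours) / len(hours),
        "longest_hours": max(hours),
        # Positions of outages that ran past the hypothetical restore target
        "over_target": [i for i, h in enumerate(hours) if h > target_hours],
    }


outages = [
    (datetime(2024, 1, 5, 8, 0), datetime(2024, 1, 5, 14, 0)),    # 6 hours
    (datetime(2024, 3, 2, 22, 0), datetime(2024, 3, 3, 10, 0)),   # 12 hours
    (datetime(2024, 6, 10, 9, 30), datetime(2024, 6, 10, 12, 30)),  # 3 hours
]
summary = outage_summary(outages, target_hours=8)
print(summary["over_target"])  # [1]
```

The outages flagged as over target are natural candidates for the deeper review with R&M experts described above.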
A Worthwhile Investment
An effective operational R&M audit program should review, analyze, and test the various elements comprising an R&M organization, such as the maintenance information system, type of maintenance practices, metrics, and preparedness to repair equipment when it fails. Investing the time and effort to create an operational audit program for this vitally important function can contribute greatly to reducing risk, minimizing cost, and improving operations.