When a critical machine goes down, the natural instinct is to swap the failed part and get production moving again. But effective root cause analysis (RCA) for equipment failure demands more than a quick fix. It requires understanding the physical and chemical mechanisms that led to the failure. As a practicing tribologist, I've seen too many plants replace bearings, gears, or seals only to have the same failure repeat within weeks. The real culprit is almost always a lubrication or wear issue that a proper RCA would have caught.
Why Traditional Troubleshooting Falls Short
Most maintenance teams rely on experience and intuition to diagnose failures. They look at the broken part, say a spalled bearing, and assume it was a material defect or just "old age." But that approach ignores the underlying physics. ISO 15243:2017 classifies rolling bearing failures into six main modes: rolling contact fatigue, wear, corrosion, electrical erosion, plastic deformation, and cracking and fracture. Each mode has distinct root causes tied to lubrication, contamination, or operating conditions. Without a systematic root cause analysis, you're just treating symptoms.
In the lab we treat this as the forensic counterpart of a failure mode and effects analysis (FMEA), applied to tribology. On your shop floor, it means collecting the right data before you touch a wrench: oil analysis, vibration data, operating temperature history, and a visual inspection of the failed surface. An oil analysis report from an ISO/IEC 17025 accredited lab can reveal particle counts (ISO 4406), water content (ASTM D6304), and additive depletion. These clues point directly to the root cause.

The Tribology-Based RCA Framework
A structured root cause analysis for equipment failure follows three steps: data collection, hypothesis testing, and corrective action. I recommend this framework, which I've developed over 25 years of consulting.
**Step 1: Collect forensic evidence.** Pull the oil sample taken just before failure, if available. Look at the wear debris under a microscope. Ferrography (ASTM D7690) can distinguish between normal rubbing wear, cutting wear from hard particles, and fatigue spalls. Also document the operating conditions—load, speed, temperature—at the time of failure.
**Step 2: Formulate and test hypotheses.** List possible failure mechanisms based on the evidence. For a bearing that failed by fatigue spalling, the root cause could be excessive load, insufficient lubricant viscosity, or contamination. Use the Stribeck curve to check whether the lubricant film thickness was adequate for the operating conditions. If the specific film thickness (λ ratio, the minimum film thickness divided by the composite surface roughness) was below 1.0, the contact was running in boundary lubrication, and whatever thinned the film that far is a root cause.
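The λ check in Step 2 can be sketched in a few lines. This is a minimal illustration, assuming you already have an estimate of the minimum film thickness (for example from an EHL calculation) and the RMS roughness of both surfaces. The function names and the example values are hypothetical, and the 3.0 cutoff for full-film lubrication is a commonly quoted rule of thumb rather than something stated above.

```python
import math


def lambda_ratio(h_min_um: float, rq1_um: float, rq2_um: float) -> float:
    """Specific film thickness: minimum film thickness divided by the
    composite RMS roughness of the two contacting surfaces (all in micrometres)."""
    composite_rq = math.sqrt(rq1_um ** 2 + rq2_um ** 2)
    if composite_rq <= 0:
        raise ValueError("surface roughness must be positive")
    return h_min_um / composite_rq


def regime(lam: float) -> str:
    """Classify the lubrication regime from the lambda ratio."""
    # The 1.0 threshold comes from the text; the 3.0 full-film cutoff is an
    # assumed rule of thumb and varies between references.
    if lam < 1.0:
        return "boundary"
    if lam < 3.0:
        return "mixed"
    return "full film"


# Hypothetical example values, not measured data.
lam = lambda_ratio(h_min_um=0.12, rq1_um=0.15, rq2_um=0.10)
print(f"lambda = {lam:.2f} -> {regime(lam)}")
```

A result in the boundary regime does not name the root cause by itself. It tells you to look at what thinned the film: viscosity, temperature, speed, or load.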
**Step 3: Implement permanent corrective action.** Replace the oil with one meeting the correct viscosity grade (ISO 3448), install better seals to keep contaminants out, or adjust the operating parameters. Verify the fix with follow-up oil analysis.
Application Note: In a paper mill hydraulic pump failure I consulted on, the analysis revealed that the oil's viscosity index improver (VII) had sheared down, dropping viscosity below the pump's minimum requirement. The corrective action was switching to an oil with a more shear-stable VII.
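A shear-down like the one in the application note shows up as a drop in kinematic viscosity at 40 °C between the fresh oil and the in-service sample. The sketch below is one way to screen for it, assuming the ISO 3448 convention that each grade spans ±10 % of its midpoint and that, for VG 10 and above, the grade number is the midpoint. The function names and sample values are hypothetical.

```python
def iso_vg_limits(grade: int) -> tuple[float, float]:
    """Return (min, max) kinematic viscosity at 40 °C in mm²/s for an ISO VG."""
    # Assumption: grade number equals the midpoint, which holds for VG 10 and up.
    if grade < 10:
        raise ValueError("this sketch only covers ISO VG 10 and above")
    return 0.9 * grade, 1.1 * grade


def shear_loss_percent(fresh_kv40: float, used_kv40: float) -> float:
    """Percentage viscosity loss of the used oil relative to the fresh oil."""
    return 100.0 * (fresh_kv40 - used_kv40) / fresh_kv40


def assess(grade: int, fresh_kv40: float, used_kv40: float) -> tuple[str, float]:
    """Report whether the used oil is still in grade, plus its viscosity loss."""
    low, high = iso_vg_limits(grade)
    if used_kv40 < low:
        status = "below grade"
    elif used_kv40 > high:
        status = "above grade"
    else:
        status = "in grade"
    return status, shear_loss_percent(fresh_kv40, used_kv40)


# Hypothetical VG 46 hydraulic oil: fresh 46.0 mm²/s, in-service 38.5 mm²/s.
status, loss = assess(46, fresh_kv40=46.0, used_kv40=38.5)
print(f"{status}, viscosity loss {loss:.1f} %")
```

A low reading alone does not prove VII shear. Dilution or a wrong top-up oil can produce the same number, so treat this as a screen that prompts further testing. The pump's own minimum viscosity at operating temperature still has to come from the OEM data.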
Common Failure Modes and Their Root Causes
In my experience, three failure modes account for more than 80% of equipment failures in industrial plants: abrasive wear, adhesive wear, and fatigue spalling. Each has a specific root cause tied to lubrication.
**Abrasive wear** is caused by hard particles (dirt, wear debris) in the lubricant. Root cause: inadequate filtration or poor seal maintenance. ISO 4406 cleanliness targets for gearboxes and hydraulic systems exist for a reason. An RCA on a failed gearbox often shows a jump in particle counts weeks before failure.
**Adhesive wear** (scuffing) occurs when the oil film collapses and metal-to-metal contact happens. Root cause: insufficient viscosity, or a high operating temperature that thins the oil below the minimum required viscosity. For gear contacts, viscosity selection guidance is given in ANSI/AGMA 9005, and scuffing risk can be checked with the ISO/TS 6336-20 and ISO/TS 6336-21 methods.
**Fatigue spalling** is the classic surface fatigue failure. Root cause: cyclic stress exceeding the material's endurance limit, often exacerbated by contaminants that cause surface dents or by inadequate lubrication that increases friction. In one crusher bearing case, the analysis revealed that water contamination (from a leaking seal) caused hydrogen embrittlement, accelerating fatigue.
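The jump in particle counts mentioned under abrasive wear can be flagged automatically from routine oil reports. The sketch below compares successive ISO 4406 codes and flags any sample where a size channel rises by two or more range numbers. Each range number corresponds to roughly a doubling of the particle count, so a two-step rise is about a fourfold increase. The two-step threshold, the function names, and the sample history are assumptions for illustration.

```python
def parse_iso4406(code: str) -> tuple[int, ...]:
    """Split a code such as '18/16/13' into its integer range numbers."""
    return tuple(int(part) for part in code.strip().split("/"))


def flag_jumps(history, min_step=2):
    """Return (label, worst_rise) for each sample whose code rose by at least
    min_step range numbers in any channel versus the previous sample.

    history is a chronological list of (label, iso4406_code) pairs.
    """
    flagged = []
    for (_, prev_code), (label, code) in zip(history, history[1:]):
        rises = [
            cur - prev
            for prev, cur in zip(parse_iso4406(prev_code), parse_iso4406(code))
        ]
        worst = max(rises)
        if worst >= min_step:
            flagged.append((label, worst))
    return flagged


# Hypothetical gearbox sample history, not real data.
history = [
    ("week 1", "18/16/13"),
    ("week 5", "18/16/13"),
    ("week 9", "21/19/15"),
    ("week 13", "22/20/17"),
]
print(flag_jumps(history))  # [('week 9', 3), ('week 13', 2)]
```

Comparing against the previous sample catches sudden ingress, such as a failed seal or breather. A slow drift is better caught with the absolute limits in your alarm set.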

Implementing a Long-Term Reliability Program
Once you've completed the root cause analysis and fixed the immediate problem, the next step is to prevent recurrence. Build a reliability program around oil analysis, vibration monitoring, and thermography. Set alarms for critical parameters: water content above 500 ppm (ASTM D6304), particle counts above ISO 4406 20/18/15 for most gearboxes, and ferrous wear debris above 10 ppm. Train your team to recognize the visual signs of failure modes: a polished surface indicates mild wear, while a matt grey finish suggests incipient scuffing.
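The alarm limits above are easy to encode so that every incoming oil report is screened the same way. The following is a minimal sketch using the three limits quoted in this section. The `OilSample` structure and `check_alarms` function are hypothetical names, and treating the ISO 4406 limit as exceeded when any single channel is above its limit is an assumption about how the code is compared.

```python
from dataclasses import dataclass

# Limits quoted in the text; adjust per machine and OEM guidance.
WATER_LIMIT_PPM = 500
ISO4406_LIMIT = (20, 18, 15)
FERROUS_LIMIT_PPM = 10


@dataclass
class OilSample:
    water_ppm: float   # Karl Fischer water content (ASTM D6304)
    iso4406: str       # cleanliness code, e.g. "19/17/14"
    ferrous_ppm: float # ferrous wear debris


def check_alarms(sample: OilSample) -> list[str]:
    """Return the names of all alarm limits the sample exceeds."""
    alarms = []
    if sample.water_ppm > WATER_LIMIT_PPM:
        alarms.append("water")
    code = tuple(int(part) for part in sample.iso4406.split("/"))
    # Assumption: any one channel above its limit trips the particle alarm.
    if any(value > limit for value, limit in zip(code, ISO4406_LIMIT)):
        alarms.append("particle count")
    if sample.ferrous_ppm > FERROUS_LIMIT_PPM:
        alarms.append("ferrous debris")
    return alarms


# Hypothetical sample, not real data.
sample = OilSample(water_ppm=620, iso4406="21/18/14", ferrous_ppm=6.0)
print(check_alarms(sample))  # ['water', 'particle count']
```

A sample exactly at a limit does not alarm here, since the text says "above". Tighten the comparison if your program treats the limit itself as a trigger.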
In the lab we call this condition-based maintenance. On your shop floor, it means scheduling oil changes based on oil condition, not calendar time. It means verifying that every new lubricant batch meets the OEM specification, including the NLGI consistency grade for greases or the ISO viscosity grade for oils.
Conclusion
Equipment failures are rarely random. They follow predictable patterns rooted in tribology. A thorough root cause analysis will reveal the real issue, whether it's the wrong oil, contamination, or an overlooked operating condition. Master this process, and you'll not only reduce downtime but extend asset life by years. The next time a machine fails, don't just replace the part. Ask why it failed. The answer is almost always written in the oil and the wear surface.