Condition Monitoring
End-to-end ISO 13374 predictive maintenance: from sensor to prognostics, with a single unified alert inbox and a health index blending 5 signals.
Condition Monitoring is Rela AI's predictive maintenance system. It goes beyond threshold alarms: it learns how each asset behaves normally, monitors health over time, detects anomalies with ML, estimates when it might fail, and consolidates all those signals into a single alert row per asset — not three for the same physical event.
Executive summary
The shift: stop reacting, start anticipating. Before, three independent systems shouted the same thing and no one knew what to do. Today, one unified inbox consolidates mechanical, energy, and RUL detections into a single alert per asset, with canonical severity A/B/C/D/F and actionable recommendations.
Before vs After
| Dimension | Before (reactive) | After (Rela AI predictive) |
|---|---|---|
| Fault detection | Equipment fails → operator calls → tech arrives | 18–72h lead time with estimated RUL |
| Operator inbox | 3 alerts for the same spike (anomaly + energy + RUL) | 1 consolidated row per asset, severity = max |
| Maintenance decision | Calendar-fixed or reactive | Real condition + confidence + per-tenant thresholds |
| Early ML signal | Ignored until an ISA-18.2 alarm fired | anomaly_pressure drops AHI from the first ML event |
| Silent drift | Motor loses 0.5%/week for months undetected | Page-Hinkley catches it within 20 samples |
| Auditability | "What threshold was active when this got an A?" unanswered | Every snapshot carries a traceable config_version |
What it's for
Reactive maintenance (wait for failure) and calendar-based preventive maintenance (change oil every 3 months regardless of actual state) are the two extremes. Predictive maintenance is the sweet spot: intervene exactly when the asset needs it, based on real condition.
Condition Monitoring lets you:
- Detect gradual degradation weeks before a failure.
- Estimate remaining useful life of a component in days/hours.
- Prioritize maintenance on equipment that needs it most.
- Reduce unplanned downtime — the costliest form of production loss.
- Consolidate the noise from multiple detection systems into an actionable inbox.
End-to-end pipeline
flowchart LR
S[Sensors / PLC / SCADA] --> I[Ingest MQTT / OPC UA / Modbus / S7 / EtherNet-IP / HTTP]
I --> F[Field mapping + normalization]
F --> T[Trends + baselines]
T --> A1[ML detection<br/>IsolationForest + LOF]
T --> A2[Energy residuals<br/>z-score + Page-Hinkley]
T --> H[Asset Health Index<br/>5 sub-indices]
H --> P[RUL prognostics<br/>+ CBM triggers]
A1 --> AG[Alert Aggregator<br/>per-asset dedup]
A2 --> AG
P --> AG
AG --> INBOX[Unified inbox<br/>1 row per asset]
INBOX --> TASK[Work order]
INBOX --> CMMS[CMMS sync]
INBOX --> NOTIF[WhatsApp / Email]
Each block is documentable, observable, and per-tenant configurable. See Anomaly Detection, Alert Aggregator, Industrial Protocols.
ISO 13374 — 6 levels
L1 — Data acquisition
Raw sensor, PLC and SCADA data arrive via natively supported protocols: HTTP, MQTT, OPC UA (incl. Reverse Connect), Modbus TCP, S7comm, EtherNet/IP (CIP). Stored with original timestamp, normalized for uniform downstream processing.
L2 — Trend analysis
Moving averages, min/max, rate of change, and standard deviation are computed over the raw data, turning isolated readings into readable trends.
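As an illustration of the trend statistics listed above, here is a minimal sketch using only the Python standard library. The function name `rolling_trend`, the record keys, and the window size are assumptions made for this example, not Rela AI's actual API.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_trend(readings, window=5):
    """Yield one trend record per reading once the window is full.

    readings: iterable of (timestamp_seconds, value) pairs.
    The window size is a hypothetical choice for this sketch.
    """
    buf = deque(maxlen=window)
    for ts, value in readings:
        buf.append((ts, value))
        if len(buf) < window:
            continue
        values = [v for _, v in buf]
        (t0, v0), (t1, v1) = buf[0], buf[-1]
        yield {
            "timestamp": ts,
            "moving_avg": mean(values),
            "min": min(values),
            "max": max(values),
            "std_dev": pstdev(values),
            # Rate of change across the window, in units per second.
            "rate_of_change": (v1 - v0) / (t1 - t0) if t1 != t0 else 0.0,
        }


samples = [(0, 10.0), (60, 10.0), (120, 11.0), (180, 12.0), (240, 12.0)]
trends = list(rolling_trend(samples, window=5))
```

A fixed-size deque keeps memory constant per sensor, which matters when the same computation runs over many streams.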
L3 — State detection
Two complementary flows:
- Condition vs baseline — current data vs learned baselines.
- ML anomaly detection — an IsolationForest + LocalOutlierFactor ensemble scores each reading from 0 to 1; readings scoring above 0.7 enter the inbox.
Each detection is translated to a canonical severity (info, warning, high, critical) — the common language for every detector.
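The translation from an ensemble score to a canonical severity might look like the sketch below. Only the 0.7 inbox threshold and the four severity names come from the text above; the cut points in `SEVERITY_CUTS` and both function names are hypothetical and chosen purely for illustration.

```python
INBOX_THRESHOLD = 0.7  # from the text: scores above 0.7 enter the inbox

# Hypothetical cut points, highest first. These are NOT documented values.
SEVERITY_CUTS = [(0.95, "critical"), (0.85, "high"), (0.7, "warning")]


def to_canonical_severity(score: float) -> str:
    """Map an ensemble anomaly score in [0, 1] to a canonical severity."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0, 1], got {score}")
    for cut, severity in SEVERITY_CUTS:
        if score > cut:
            return severity
    return "info"


def enters_inbox(score: float) -> bool:
    """Only scores strictly above the threshold are published."""
    return score > INBOX_THRESHOLD
```

Keeping the mapping in one table means every detector can share the same vocabulary without hard-coding its own thresholds.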
L4 — Health assessment
AHI 0–100 combining 5 sub-indices:
| Sub-index | Default weight | Measures |
|---|---|---|
| condition | 0.35 | Instantaneous vs baselines |
| alarm_health | 0.20 | Accumulated alarm-hours (ISA-18.2), cap per alarm |
| maintenance_compliance | 0.15 | Overdue preventive plans |
| trend_stability | 0.10 | 24h trend direction + r² |
| anomaly_pressure | 0.20 | Recent ML detection density (7d), severity-weighted |
Weights are tunable per tenant and per asset_type. See Predictive Config.
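One plausible way to blend the five sub-indices is a weighted average, sketched below. The default weights are the ones in the table; the assumptions are that each sub-index is already on a 0–100 scale and that the blend is a plain weighted mean. The function name `asset_health_index` is hypothetical.

```python
DEFAULT_WEIGHTS = {
    "condition": 0.35,
    "alarm_health": 0.20,
    "maintenance_compliance": 0.15,
    "trend_stability": 0.10,
    "anomaly_pressure": 0.20,
}


def asset_health_index(sub_indices, weights=DEFAULT_WEIGHTS):
    """Weighted mean of sub-indices, each assumed to be on a 0-100 scale."""
    if set(sub_indices) != set(weights):
        raise ValueError("sub_indices and weights must share the same keys")
    total = sum(weights.values())
    # Dividing by the total keeps the result on 0-100 even if a tenant's
    # custom weights do not sum exactly to 1.
    score = sum(weights[k] * sub_indices[k] for k in weights) / total
    return max(0.0, min(100.0, score))


example = asset_health_index({
    "condition": 90,
    "alarm_health": 80,
    "maintenance_compliance": 100,
    "trend_stability": 70,
    "anomaly_pressure": 60,
})
```

With these inputs the contributions are 31.5 + 16 + 15 + 7 + 12, so the example evaluates to approximately 81.5.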
Grades:
| Grade | AHI | Status |
|---|---|---|
| A | 90-100 | Excellent |
| B | 70-89 | Good |
| C | 50-69 | Acceptable |
| D | 30-49 | Unsatisfactory |
| F | 0-29 | Critical — imminent failure risk |
A/B/C/D thresholds are tenant-configurable via ahi_grades.
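The grade lookup can be sketched as follows. The lower bounds mirror the table above; representing `ahi_grades` as a mapping from grade to lower bound is an assumption about its shape, and `grade_for` is a hypothetical helper name.

```python
# Lower bound of each grade, taken from the table above.
DEFAULT_AHI_GRADES = {"A": 90, "B": 70, "C": 50, "D": 30}


def grade_for(ahi, ahi_grades=DEFAULT_AHI_GRADES):
    """Return the grade whose lower bound the AHI meets; F otherwise."""
    ordered = sorted(ahi_grades.items(), key=lambda kv: kv[1], reverse=True)
    for grade, lower_bound in ordered:
        if ahi >= lower_bound:
            return grade
    return "F"


# A stricter tenant could raise the bar for an A, for example:
strict_grades = {"A": 95, "B": 75, "C": 50, "D": 30}
```

Under the defaults an AHI of 92 is an A, while the stricter tenant configuration grades the same value as a B.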
L5 — Prognostics
Based on AHI history: RUL with bootstrap confidence, degradation rate (AHI pts/day), CBM trigger, failure probability. See Prognostics.
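To make the idea of RUL with bootstrap confidence concrete, the sketch below fits a straight line to the AHI history and resamples the points to get an interval. Linear degradation, the failure level of 30 (the top of grade F), the 5th/95th percentile interval, and all function names are assumptions for illustration; the product's actual model is not described here.

```python
import random


def slope_per_day(days, ahi):
    """Least-squares slope in AHI points per day; None if undefined."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(ahi) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ahi))
    return sxy / sxx


def rul_from_slope(current_ahi, slope, failure_ahi):
    """Days until the line reaches failure_ahi; None if not degrading."""
    if slope is None or slope >= 0:
        return None
    return max(0.0, (current_ahi - failure_ahi) / -slope)


def estimate_rul(days, ahi, failure_ahi=30.0, n_boot=500, seed=0):
    """Return (point_estimate_days, (low, high)) via a simple bootstrap."""
    point = rul_from_slope(ahi[-1], slope_per_day(days, ahi), failure_ahi)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    indices = range(len(days))
    estimates = []
    for _ in range(n_boot):
        # Resample observations with replacement and refit the slope.
        pick = rng.choices(indices, k=len(days))
        slope = slope_per_day([days[i] for i in pick], [ahi[i] for i in pick])
        rul = rul_from_slope(ahi[-1], slope, failure_ahi)
        if rul is not None:
            estimates.append(rul)
    if point is None or not estimates:
        return point, None
    estimates.sort()
    low = estimates[int(0.05 * (len(estimates) - 1))]
    high = estimates[int(0.95 * (len(estimates) - 1))]
    return point, (low, high)


point, interval = estimate_rul([0, 7, 14, 21], [87.0, 80.0, 72.0, 65.0])
```

The fitted slope doubles as the degradation rate in AHI points per day, and a wide interval is a signal that more snapshots are needed before the estimate should drive a CBM trigger.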
L6 — Recommendations + consolidated inbox
AI generates natural-language recommendations. Each detection is published to the Alert Aggregator, which consolidates across the 3 systems into one row per asset and routes to WhatsApp/email/CMMS/tasks.
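The consolidation step can be pictured with the sketch below: detections for the same asset that arrive within the dedup window collapse into one row whose severity is the maximum seen. The `Detection` and `InboxRow` classes, the 30-minute default, and the rule that the window slides from the last detection are assumptions for illustration; only `source_systems`, `alert_dedup_window_minutes`, and the severity names appear in this document.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"info": 0, "warning": 1, "high": 2, "critical": 3}


@dataclass
class Detection:
    asset_id: str
    source_system: str  # "anomaly", "energy" or "prognostics"
    severity: str
    minute: int  # minutes since an arbitrary start, for simplicity


@dataclass
class InboxRow:
    asset_id: str
    severity: str
    source_systems: list
    first_seen: int
    last_seen: int


def aggregate(detections, alert_dedup_window_minutes=30):
    """Collapse detections into one inbox row per asset per window."""
    open_rows = {}  # asset_id -> most recent row for that asset
    rows = []
    for det in sorted(detections, key=lambda d: d.minute):
        row = open_rows.get(det.asset_id)
        expired = (
            row is None
            or det.minute - row.last_seen > alert_dedup_window_minutes
        )
        if expired:
            row = InboxRow(det.asset_id, det.severity, [det.source_system],
                           det.minute, det.minute)
            open_rows[det.asset_id] = row
            rows.append(row)
            continue
        # Same physical event: keep the highest severity, merge the sources.
        if SEVERITY_RANK[det.severity] > SEVERITY_RANK[row.severity]:
            row.severity = det.severity
        if det.source_system not in row.source_systems:
            row.source_systems.append(det.source_system)
        row.last_seen = det.minute
    return rows


rows = aggregate([
    Detection("B-07", "anomaly", "high", 0),
    Detection("B-07", "energy", "warning", 2),
    Detection("B-07", "prognostics", "critical", 5),
    Detection("B-07", "anomaly", "warning", 120),
])
```

Here the first three detections merge into a single critical row, and the detection two hours later opens a new row because it falls outside the window.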
Predictive maturity levels
| Level | Name | Requirements | Capabilities |
|---|---|---|---|
| 0 | Monitoring | Insufficient data | Basic alerts |
| 1 | Health tracking | 10+ snapshots | AHI active, trends visible |
| 2 | Prediction | 30+ snapshots + 1 registered failure | Reliable RUL, recommendations |
| 3 | Optimized | 30+ snapshots + 3 failures + confidence > 70% | Full automation, auto-CBM |
Assets progress through the levels automatically as they meet the requirements; the thresholds are tenant-configurable.
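The requirements in the maturity table translate into a short decision function. The defaults below are the values from the table; the `MaturityThresholds` class and its field names are hypothetical stand-ins for whatever the tenant configuration actually exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaturityThresholds:
    tracking_snapshots: int = 10
    prediction_snapshots: int = 30
    prediction_failures: int = 1
    optimized_failures: int = 3
    optimized_confidence: float = 0.70


def maturity_level(snapshots, failures, confidence,
                   t=MaturityThresholds()):
    """Return the predictive maturity level (0-3) for one asset."""
    enough_history = snapshots >= t.prediction_snapshots
    if (enough_history and failures >= t.optimized_failures
            and confidence > t.optimized_confidence):
        return 3  # Optimized
    if enough_history and failures >= t.prediction_failures:
        return 2  # Prediction
    if snapshots >= t.tracking_snapshots:
        return 1  # Health tracking
    return 0  # Monitoring
```

Note that an asset with plenty of snapshots but no registered failure stays at level 1, since the table requires at least one failure for level 2.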
Audit trail: config_version
Every health snapshot and every prognostics record carries config_version — the number of the predictive configuration active at compute time. Past alerts are not rewritten when config changes. Audit can rebuild exactly which thresholds produced each historical grade.
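The audit property described above can be illustrated with an append-only configuration history and snapshots that store the version they were computed under. `ConfigHistory`, `HealthSnapshot`, and their methods are hypothetical names for this sketch; only `config_version` and `ahi_grades` come from the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthSnapshot:
    asset_id: str
    ahi: float
    grade: str
    config_version: int  # version active when the snapshot was computed


class ConfigHistory:
    """Append-only store: publishing never rewrites earlier versions."""

    def __init__(self):
        self._versions = []

    def publish(self, ahi_grades):
        self._versions.append(dict(ahi_grades))
        return len(self._versions)  # versions are numbered from 1

    def get(self, version):
        return dict(self._versions[version - 1])


history = ConfigHistory()
v1 = history.publish({"A": 90, "B": 70, "C": 50, "D": 30})
snapshot = HealthSnapshot("C-03", 92.0, "A", config_version=v1)

# A later, stricter config does not touch the stored snapshot.
v2 = history.publish({"A": 95, "B": 75, "C": 50, "D": 30})
thresholds_at_compute_time = history.get(snapshot.config_version)
```

An auditor can then answer which threshold produced the A by reading the thresholds for the snapshot's own version rather than the current one.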
Use cases with measurable impact
Case 1 — Pre-failure caught 18h early. The AHI of compressor C-03 drops from 87 to 65 over 3 weeks. Prognostics estimates 9 days until grade F. A task is created automatically and the tech finds a clogged oil filter. After replacement, the AHI is back to 83 within 2 days.
Case 2 — Silent drift. Extruder motor within ±2σ for 4 months but drifting +0.3%/week. Page-Hinkley catches the regime change at sample 20. Tech realigns coupling, consumption returns to baseline.
Case 3 — Unified inbox. Vibration spike on pump B-07 fires 3 detectors. Before: 3 alerts. After: 1 row with source_systems = [anomaly, energy, prognostics]. See Unified Inbox case.
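The drift detection in Case 2 relies on the Page-Hinkley test, which accumulates deviations from the running mean and raises an alarm when the accumulated sum rises far enough above its historical minimum. Below is a minimal one-sided sketch that detects upward drift; the class name and the `delta`, `threshold`, and `min_samples` values are assumptions for illustration and would need tuning per signal.

```python
class PageHinkley:
    """One-sided Page-Hinkley test for an upward shift in the mean."""

    def __init__(self, delta=0.005, threshold=1.0, min_samples=5):
        self.delta = delta            # tolerated magnitude of change
        self.threshold = threshold    # alarm level (often written lambda)
        self.min_samples = min_samples
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0                # cumulative deviation m_t
        self.cum_min = 0.0            # running minimum M_t

    def update(self, x):
        """Feed one sample; return True when drift is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.n >= self.min_samples
                and self.cum - self.cum_min > self.threshold)


def first_alarm(samples, **kwargs):
    """Return the 1-based index of the first alarm, or None."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(samples, start=1):
        if detector.update(x):
            return i
    return None


flat = [100.0] * 60
drifting = [100.0] * 30 + [100.0 + 0.5 * k for k in range(1, 31)]
alarm_flat = first_alarm(flat)
alarm_drift = first_alarm(drifting)
```

Because the test tracks cumulative deviation rather than individual readings, it can flag a slow drift even while every sample stays inside a ±2σ band.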
Rollout
Phase 1 — connect sources, accumulate 7-14 days of data, review trends.
Phase 2 — compute baselines, train ML anomaly models, turn on health assessment.
Phase 3 — enable RUL prognostics, tune alert_dedup_window_minutes, wire recommendations.
Key benefits
- Reactive → predictive with days of lead time.
- 5-signal health index including early ML detection.
- Single inbox — cross-system consolidation with max severity.
- Per-tenant and per-asset-type thresholds.
- Full config_version traceability.
- Gradual drift detection (Page-Hinkley).
- ISO 13374 standard-aligned.
Predictive KPIs
Value metrics for predictive maintenance including anomaly-to-WO time, false positive rate, percentage of auto-created WOs, unplanned downtime, and RUL accuracy.
Baselines and Condition State
Baselines record how equipment behaves when it is healthy. The system compares each new measurement against that reference to automatically detect when something changes.