Case: Unified Inbox — 1 spike, 1 alert
The same physical event that used to generate 3 red rows on the board now consolidates into a single line with source_systems = [anomaly, energy, prognostics]. One action closes the incident.
Context
- Industry: metalworking
- Equipment: process centrifugal pump B-07, 75 kW
- Sensors: triaxial vibration, discharge pressure, motor current, instantaneous kWh
- Criticality: high — feeds the water-jet cutting line
- Rela AI stack active: ML anomaly detection + energy monitoring + prognostics
The problem before the aggregator
When a bearing started failing, three subsystems caught the same anomaly at the same time and produced three independent alerts:
[14:02:15] 🔴 CRITICAL anomaly_detection B-07 ensemble score 0.92
[14:02:17] 🔴 CRITICAL energy_anomaly B-07 z=3.4 on kwh
[14:02:21] 🔴 CRITICAL prognostics B-07 RUL 18h
Real consequences the customer experienced:
- Operator saw 3 reds at once and didn't know which to address first.
- 3 duplicate tasks on the kanban for the same asset: the technician closed all three, and the supervisor confirmed the same thing 3 times.
- 3 WhatsApp messages within 6 seconds, which the operator read as 3 different problems.
- MTTR metrics were distorted: the CMMS synced 3 incidents when there was really one.
What Rela AI does now
sequenceDiagram
participant S as Sensors
participant A as anomaly_detection
participant E as energy_service
participant P as prognostics
participant AG as alert_aggregator
participant I as Inbox
S->>A: Vibration spike
A->>A: Ensemble score 0.75 (warning)
A->>AG: ingest_alert(warning)
AG->>I: Insert row · status=open · severity=warning
S->>E: kWh spike
E->>E: z-score 3.4 (high)
E->>AG: ingest_alert(high)
AG->>I: Upgrade · severity=high · source_systems=[anomaly,energy] · count=2
S->>P: AHI collapses
P->>P: RUL 18h (critical)
P->>AG: ingest_alert(critical)
AG->>I: Upgrade · severity=critical · source_systems=[anomaly,energy,prognostics] · count=3
The three detections arrive at the aggregator within the configured window (alert_dedup_window_minutes, default 60). Result in _alerts:
| Field | Value |
|---|---|
| asset_id | B-07 |
| status | open |
| severity | critical (max of 3) |
| source_systems | ["anomaly", "energy", "prognostics"] |
| count | 3 |
| sources (upgrade trail) | warning→ / →high / →critical |
| first_seen_at | 14:02:15 |
| last_seen_at | 14:02:21 |
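The merge that produces this row can be sketched as follows. This is a minimal illustration, not the actual `alert_aggregator_service` code: the `Alert` dataclass, the in-memory `store` dict, the `ingest_alert` signature, and the choice to measure the window from `last_seen_at` are all assumptions, and the date is a placeholder.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Canonical severity scale from the document, lowest to highest.
SEVERITY_ORDER = ["info", "warning", "high", "critical"]


@dataclass
class Alert:
    asset_id: str
    severity: str
    first_seen_at: datetime
    last_seen_at: datetime
    status: str = "open"
    source_systems: list = field(default_factory=list)
    count: int = 0


def ingest_alert(store, asset_id, source, severity, seen_at, window_minutes=60):
    """Merge a detection into the asset's alert, or start a new one.

    Assumption: the dedup window is measured from the last detection seen.
    """
    window = timedelta(minutes=window_minutes)
    alert = store.get(asset_id)
    if alert is None or seen_at - alert.last_seen_at > window:
        alert = Alert(asset_id, severity, seen_at, seen_at)
        store[asset_id] = alert
    # Keep the worst severity observed so far.
    if SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(alert.severity):
        alert.severity = severity
    if source not in alert.source_systems:
        alert.source_systems.append(source)
    alert.count += 1
    alert.last_seen_at = seen_at
    return alert


store = {}
t0 = datetime(2024, 1, 1, 14, 2, 15)  # placeholder date, time taken from the case
ingest_alert(store, "B-07", "anomaly", "warning", t0)
ingest_alert(store, "B-07", "energy", "high", t0 + timedelta(seconds=2))
merged = ingest_alert(store, "B-07", "prognostics", "critical", t0 + timedelta(seconds=6))
print(merged.severity, merged.source_systems, merged.count)
# critical ['anomaly', 'energy', 'prognostics'] 3
```

Keying the store by asset alone mirrors the "per-asset merge rules" described later; a real implementation would presumably also persist the upgrade trail.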
What the operator sees
Before:
[14:02:15] 🔴 CRITICAL anomaly B-07 score 0.92
[14:02:17] 🔴 CRITICAL energy B-07 z=3.4
[14:02:21] 🔴 CRITICAL prognostics B-07 RUL 18h
After:
[14:02:15 → 14:02:21] 🔴 CRITICAL B-07 [anomaly, energy, prognostics] count=3
└ trail:
14:02:15 anomaly warning→ ensemble score 0.75
14:02:17 energy →high z=3.4 on kwh
14:02:21 prognostics →critical RUL 18h
One row. Severity is the worst observed. The trail tells the escalation story, so an audit no longer requires opening 3 collections.
Post-ACK escalation
Maintenance lead marks the row as acknowledged. 15 min later, a new critical detection from anomaly_detection arrives for the same asset.
Before auto re-open: the acknowledged alert stayed silenced while worse detections arrived.
Now: the row flips to status = open with reopened_reason = severity_upgrade. UI promotes it back to the active inbox view and fires a new notification — no way to lose visibility during escalation.
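A minimal sketch of the re-open rule, assuming it fires only when the incoming severity is strictly higher than the stored one (which is what `reopened_reason = severity_upgrade` suggests). The `apply_detection` helper and the dict-shaped alert are hypothetical, and the example starts from a row acknowledged at `high`.

```python
# Canonical severity scale from the document, mapped to comparable ranks.
SEVERITY_RANK = {"info": 0, "warning": 1, "high": 2, "critical": 3}


def apply_detection(alert: dict, severity: str) -> dict:
    """Return a copy of the alert after a new detection of the given severity."""
    updated = dict(alert)
    if SEVERITY_RANK[severity] > SEVERITY_RANK[alert["severity"]]:
        updated["severity"] = severity
        # An acknowledged row must not stay silenced while things get worse.
        if alert["status"] == "acknowledged":
            updated["status"] = "open"
            updated["reopened_reason"] = "severity_upgrade"
    return updated


acked = {"asset_id": "B-07", "status": "acknowledged", "severity": "high"}

reopened = apply_detection(acked, "critical")
print(reopened["status"], reopened["reopened_reason"])  # open severity_upgrade

unchanged = apply_detection(acked, "warning")
print(unchanged["status"])  # acknowledged
```

Under this assumption, an equal or lower detection leaves the acknowledgement in place, so the lead is only interrupted when the situation actually worsens.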
Hysteresis: trail stays clean
If an incoming detection has equal or lower severity than the stored alert, the aggregator still updates the stats (count and source_systems) but does not append to sources. This prevents a critical alert from ending up with a trail of 87 warning entries that add no information.
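The hysteresis rule can be illustrated with a short sketch. The `record_detection` helper and the shape of the trail entries are hypothetical; only the field names `count`, `source_systems`, `sources`, and `severity` come from the document.

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "high": 2, "critical": 3}


def record_detection(alert: dict, source: str, severity: str, note: str) -> None:
    """Update stats on every detection; extend the trail only on an upgrade."""
    alert["count"] += 1
    if source not in alert["source_systems"]:
        alert["source_systems"].append(source)
    if SEVERITY_RANK[severity] > SEVERITY_RANK[alert["severity"]]:
        alert["severity"] = severity
        alert["sources"].append({"source": source, "to": severity, "note": note})


# State of the B-07 row after the three original detections.
alert = {
    "severity": "critical",
    "count": 3,
    "source_systems": ["anomaly", "energy", "prognostics"],
    "sources": [
        {"source": "anomaly", "to": "warning", "note": "ensemble score 0.75"},
        {"source": "energy", "to": "high", "note": "z=3.4 on kwh"},
        {"source": "prognostics", "to": "critical", "note": "RUL 18h"},
    ],
}

# 87 further warnings arrive for the same asset.
for _ in range(87):
    record_detection(alert, "anomaly", "warning", "ensemble score 0.75")

print(alert["count"], len(alert["sources"]))  # 90 3
```

The count still reflects every detection, while the trail keeps only the three entries that changed the severity.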
Impact
| Metric | No aggregator | With aggregator |
|---|---|---|
| Red rows per incident | 3 | 1 |
| Tasks on kanban | 3 duplicates | 1 |
| WhatsApp notifications | 3 in 6s | 1 |
| Operator decision time | +90s of confusion | immediate |
| MTTR to CMMS | 3 mixed incidents | 1 clean incident |
| Post-ACK escalation visibility | zero (silenced) | automatic re-open |
Configuration used
{
"alert_dedup_window_minutes": 60,
"ahi_weights": {
"condition": 0.35,
"alarm_health": 0.20,
"maintenance_compliance": 0.15,
"trend_stability": 0.10,
"anomaly_pressure": 0.20
}
}
See Predictive Config and Unified Inbox for the full parameter list.
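As a sanity check on a configuration like this one, a small validator might look like the sketch below. The `validate_config` helper is hypothetical, and the rule that `ahi_weights` must sum to 1.0 is an assumption: the source does not state it, although the values shown do add up to 1.0.

```python
import json
import math

CONFIG_JSON = """
{
  "alert_dedup_window_minutes": 60,
  "ahi_weights": {
    "condition": 0.35,
    "alarm_health": 0.20,
    "maintenance_compliance": 0.15,
    "trend_stability": 0.10,
    "anomaly_pressure": 0.20
  }
}
"""


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the config passes."""
    problems = []
    window = config.get("alert_dedup_window_minutes")
    if not isinstance(window, int) or window <= 0:
        problems.append("alert_dedup_window_minutes must be a positive integer")
    weights = config.get("ahi_weights", {})
    if any(w < 0 for w in weights.values()):
        problems.append("ahi_weights must be non-negative")
    # Assumed rule: weights are normalized (not confirmed by the source).
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        problems.append("ahi_weights should sum to 1.0")
    return problems


config = json.loads(CONFIG_JSON)
print(validate_config(config))  # []
```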
What makes it possible
Backed by concrete technical changes:
- R1: alert_aggregator_service with per-asset merge rules and configurable window.
- R3: canonical severity (info/warning/high/critical) shared by the 3 detectors.
- R6: E2E test (test_single_spike_consolidates_into_one_alert) that codifies the "1 spike → 1 alert" contract.
- R8: trail hysteresis, re-open on upgrade, dedup search that includes acknowledged.
Why it matters commercially
- Less noise → more action. Operators no longer ignore alerts on the grounds that "they all shout the same thing".
- Clean MTTR. CMMS reports real incidents, not duplicates — the metric is trustworthy again.
- Post-ACK confidence. Shift leads can acknowledge without fear of missing escalations.
- External integrations. WhatsApp, email, ServiceNow, SAP PM — all receive one notification per incident, not three.
Not a UI "nice to have": these are persisted contract changes in _alerts, with integration tests guaranteeing nobody breaks semantics. See Unified Inbox for the full technical contract.