Unified Inbox (Alert Aggregator)
One inbox per asset: detections from anomalies, energy, and prognostics collapse into a single row. Canonical severity, windowed dedup, automatic re-open on escalation.
Unified Inbox — Alert Aggregator
The classic problem: three independent detection systems (mechanical anomalies, energy anomalies, RUL prognostics) produce three "critical" alerts when a pump degrades. The operator sees three rows, hesitates, and noise kills signal.
The solution: a single inbox consolidating the three sources into one row per asset, with severity = the highest observed, source(s) = union of detectors, and a severity-change trail to audit how the incident escalated.
Executive summary
One spike → one alert. Before: 3 reds in the inbox for the same physical event. After: 1 row with
source_systems = [anomaly, energy, prognostics]. Operator acts once; the system closes all three detections together.
What it's for
- Eliminate redundancy: the same phenomenon can fire ML detection + energy residual + critical RUL in seconds. Same story.
- Preserve the trail: consolidation does not lose information — each detector contributes to the alert's
sourceswith its summary and timestamp. - Canonical max severity: the alert reflects the worst diagnosis.
- Automatic escalation: if operator acknowledges but a new higher-severity detection arrives, the alert reopens (
reopened_reason = severity_upgrade).
How it works
Input contract
Each of the three detectors calls the aggregator after persisting its own record:
ingest_alert(
tenant_id,
asset_id,
source_system="anomaly" | "energy" | "prognostics",
severity="warning" | "high" | "critical",
summary="...",
source_doc_id="<optional origin doc id>",
extra={...}
)Input filter: only severity >= warning feeds the aggregator. info is noise, not incident.
Merge rules
flowchart TD
IN[New detection<br/>asset_id + severity] --> Q{Active row<br/>for this asset<br/>in window?}
Q -- No --> INS[Insert new row<br/>status=open]
Q -- Yes --> CMP{Incoming severity<br/>> stored?}
CMP -- Yes --> UP[Upgrade<br/>severity=max<br/>push sources<br/>re-open if acked]
CMP -- No --> EQ[Silent merge<br/>count += 1<br/>source_systems addToSet<br/>NO push sources]- Dedup window:
alert_dedup_window_minutes(default 60, per-tenant). - Search: includes
openandacknowledgedrows. An acknowledged alert within the window is still "live" for dedup. - Severity upgrade:
severity := max(stored, incoming)per canonical order. - Re-open on escalation: acknowledged row + higher incoming → back to
openwithreopened_reason = severity_upgrade. - Trail hysteresis: equal or lower severity still bumps
countandsource_systems, but does not push tosources.
Persistence
Per-tenant _alerts collection. Key fields:
| Field | Meaning |
|---|---|
asset_id | Affected asset. Dedup key. |
status | open / acknowledged (with reopened_* if reopened). |
severity | Max historic within the row. |
source_systems | Set of contributing detectors. |
sources | Append-only trail of upgrades. |
count | Total ingested detections. |
first_seen_at, last_seen_at | Timestamps. |
Contract-codifying tests
Behavior is codified in integration tests (tests/predictive/test_alert_aggregator_cross_system.py):
test_single_spike_consolidates_into_one_alert— all 3 detectors fire in the same window on the same asset →len(_alerts) == 1,source_systems == {anomaly, energy, prognostics},severity == critical,count == 3,len(sources) == 3.test_two_assets_two_separate_alerts— spikes on different assets → 2 separate rows.test_upgrade_reopens_acknowledged_alert— ACK + critical incoming →statusreturns toopen.test_same_severity_on_ack_keeps_it_acknowledged— ACK + same severity incoming → does not reopen.test_lower_severity_does_not_push_source_trail— stored critical + warning incoming →$pushto trail does NOT happen.
How to use it
View the inbox
- Operations → Alerts — tenant list, sorted by
last_seen_atdesc. - Columns: Asset · Severity · Source systems · Count · First seen · Last seen · Actions.
- Filters: severity, source, status, date range.
Per-alert actions
| Action | Effect |
|---|---|
| Acknowledge | status := acknowledged, records actor. |
| Create task | Converts into work order, linking all sources. |
| Notify | Fires WhatsApp/email. |
| Close | Closes the incident. New detections start a new one. |
Per-tenant config
Via Predictive Config:
alert_dedup_window_minutes— default 60.
Before / After
Before
[14:02:15] CRITICAL anomaly_detection pump-B07 vibration ensemble score 0.92
[14:02:17] CRITICAL energy_anomaly pump-B07 z=3.4 on kwh
[14:02:21] CRITICAL prognostics pump-B07 RUL 18hAfter
[14:02:15→14:02:21] CRITICAL pump-B07 [anomaly,energy,prognostics] count=3
└ sources:
14:02:15 anomaly warning→ ensemble score 0.75
14:02:17 energy →high z=3.4 on kwh
14:02:21 prognostics →critical RUL 18hLimitations
- Dedup is by
asset_id. Different metrics of the same asset collapse into the same row. - Window slides on
last_seen_at. A detection arriving outside the window opens a new row. - Trail captures upgrades, not repetitions. Use
countfor frequency stats. - Not a full workflow system. Shift management, assignments, SLAs live in Work Orders.
Key benefits
- Single inbox per asset — less noise, more action.
- Canonical
maxseverity, auditable trail. - Automatic re-open on escalation post-ACK.
- Smart hysteresis — trail reflects severity changes, not repetitions.
- Per-tenant configurable window.
- E2E integration tests guaranteeing the contract.