Predictive Configuration
Per-tenant predictive engine control panel: AHI weights, grades, RUL thresholds, failure probability, inbox window, and per-asset-type overrides. Full traceability via config_version.
Predictive configuration is the control panel of the predictive maintenance engine. Each tenant tunes how the AHI is computed, when an asset is considered at risk, which window the unified inbox uses to consolidate alerts, and the minimum confidence for auto-interventions — all without code changes or deploys.
What is it for?
- Configure thresholds, baselines, and detection windows per asset.
- Version the predictive config for safe rollback.
- Propagate changes to the pipeline without restart.
How it works
The config is versioned (config_version) and consumed online by the predictive pipeline. Changes are validated, persisted, and propagated via pub/sub without worker restart.
What it's for
- Personalize the engine to each organization's risk tolerance and operational context.
- Tune the 5 AHI sub-index weights and grade thresholds (A/B/C/D).
- Define RUL breakpoints (hours separating critical/high/medium/low) and failure probability breakpoints.
- Configure the Alert Aggregator dedup window.
- Override any parameter per asset_type (critical pumps stricter than tolerant compressors).
- Full traceability via config_version: every persisted snapshot carries its compute-time version.
Configurable parameters
AHI weights (ahi_weights)
Relative importance of each Asset Health Index sub-index. Sum must be 1.0.
| Sub-index | Default | Measures |
|---|---|---|
| condition | 0.35 | Instantaneous vs baselines |
| alarm_health | 0.20 | Accumulated alarm-hours ISA-18.2 with per-alarm cap |
| maintenance_compliance | 0.15 | Preventive plan compliance |
| trend_stability | 0.10 | 24h trend direction + r² |
| anomaly_pressure | 0.20 | Recent ML detection density (7d) |
See Condition Monitoring for sub-index semantics.
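As an illustration of how the weights could be validated and applied, here is a minimal sketch. It assumes the AHI is a plain weighted sum of 0-100 sub-index scores, which the text implies but does not state; `validate_ahi_weights` and `compute_ahi` are hypothetical names, and the non-negativity check is an added assumption.

```python
import math

# Default weights from the table above; keys are the documented sub-index names.
DEFAULT_AHI_WEIGHTS = {
    "condition": 0.35,
    "alarm_health": 0.20,
    "maintenance_compliance": 0.15,
    "trend_stability": 0.10,
    "anomaly_pressure": 0.20,
}


def validate_ahi_weights(weights: dict[str, float]) -> None:
    """Raise ValueError unless the five sub-indexes are present and sum to 1.0."""
    if set(weights) != set(DEFAULT_AHI_WEIGHTS):
        raise ValueError("ahi_weights must define exactly the five sub-indexes")
    if any(w < 0 for w in weights.values()):  # assumption: negative weights are invalid
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-6):
        raise ValueError("ahi_weights must sum to 1.0")


def compute_ahi(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of 0-100 sub-index scores (assumed aggregation rule)."""
    validate_ahi_weights(weights)
    return sum(weights[name] * sub_scores[name] for name in weights)


scores = {
    "condition": 80,
    "alarm_health": 90,
    "maintenance_compliance": 100,
    "trend_stability": 60,
    "anomaly_pressure": 70,
}
ahi = compute_ahi(scores, DEFAULT_AHI_WEIGHTS)  # about 81 with the default weights
```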
AHI grades (ahi_grades)
Thresholds translating 0–100 score into A/B/C/D/F:
| Parameter | Default | Meaning |
|---|---|---|
| grade_a | 90 | AHI ≥ 90 → A |
| grade_b | 70 | AHI ≥ 70 → B |
| grade_c | 50 | AHI ≥ 50 → C |
| grade_d | 30 | AHI ≥ 30 → D; below → F |
Stricter tenants can raise thresholds (grade_a = 95, etc.) so the same numeric health gets a tougher grade.
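The grade mapping can be sketched as a descending threshold check. The function name `ahi_to_grade` is hypothetical; the thresholds and the ≥ comparisons come from the table.

```python
DEFAULT_AHI_GRADES = {"grade_a": 90, "grade_b": 70, "grade_c": 50, "grade_d": 30}


def ahi_to_grade(ahi: float, grades: dict[str, int] = DEFAULT_AHI_GRADES) -> str:
    """Return the first grade whose threshold the score meets, else F."""
    for letter in ("a", "b", "c", "d"):
        if ahi >= grades[f"grade_{letter}"]:
            return letter.upper()
    return "F"


# A stricter tenant raises grade_a, so the same score earns a tougher grade.
strict_grades = {**DEFAULT_AHI_GRADES, "grade_a": 95}
default_grade = ahi_to_grade(92)                 # "A"
strict_grade = ahi_to_grade(92, strict_grades)   # "B"
```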
RUL thresholds (rul_thresholds)
Remaining hours separating risk levels:
| Parameter | Default | Allowed range | Meaning |
|---|---|---|---|
| critical_hours | 24 | 1 to 8760 (1 year) | RUL under 24h becomes critical |
| high_hours | 168 | 1 to 17520 (2 years) | RUL under 168h (7d) becomes high |
| medium_hours | 720 | 1 to 43800 (5 years) | RUL under 720h (30d) becomes medium |
Hierarchy rule: the three values must satisfy critical_hours <= high_hours <= medium_hours. Requests that violate this ordering are rejected with 422. This prevents accidental misconfiguration like critical=500 with high=100.
The ranges were intentionally relaxed to cover long-lived industrial assets (transformers, pressure vessels) whose expected remaining life is measured in years rather than weeks.
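The range and hierarchy checks, plus the resulting risk classification, might look like the sketch below. The function names are hypothetical, and returning a list of error strings (which a service could turn into a 422 response) is an illustrative choice, not the documented implementation.

```python
# Allowed ranges and defaults taken from the table above.
RUL_LIMITS = {
    "critical_hours": (1, 8760),
    "high_hours": (1, 17520),
    "medium_hours": (1, 43800),
}
DEFAULT_RUL_THRESHOLDS = {"critical_hours": 24, "high_hours": 168, "medium_hours": 720}


def validate_rul_thresholds(t: dict[str, int]) -> list[str]:
    """Return validation errors; an empty list means the thresholds are valid."""
    errors = []
    for name, (low, high) in RUL_LIMITS.items():
        if not low <= t[name] <= high:
            errors.append(f"{name} must be between {low} and {high}")
    if not t["critical_hours"] <= t["high_hours"] <= t["medium_hours"]:
        errors.append("expected critical_hours <= high_hours <= medium_hours")
    return errors


def rul_risk_level(rul_hours: float, t: dict[str, int] = DEFAULT_RUL_THRESHOLDS) -> str:
    """Classify remaining useful life using strict 'under' comparisons."""
    if rul_hours < t["critical_hours"]:
        return "critical"
    if rul_hours < t["high_hours"]:
        return "high"
    if rul_hours < t["medium_hours"]:
        return "medium"
    return "low"


# The misconfiguration named in the text: critical=500 with high=100.
bad = validate_rul_thresholds({"critical_hours": 500, "high_hours": 100, "medium_hours": 720})
```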
Failure probability (failure_probability)
AHI breakpoints into failure risk buckets:
| Parameter | Default | Meaning |
|---|---|---|
| critical_ahi | 30 | AHI ≤ 30 → critical probability |
| high_ahi | 50 | 30 < AHI ≤ 50 → high |
| normal_ahi | 70 | 50 < AHI ≤ 70 → medium |
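A minimal sketch of the bucket lookup follows. The table does not name the bucket for AHI above normal_ahi, so the "low" label here is an assumption, as is the function name `failure_bucket`.

```python
DEFAULT_FAILURE_PROBABILITY = {"critical_ahi": 30, "high_ahi": 50, "normal_ahi": 70}


def failure_bucket(ahi: float, cfg: dict[str, int] = DEFAULT_FAILURE_PROBABILITY) -> str:
    """Map an AHI score to a failure-risk bucket using inclusive upper bounds."""
    if ahi <= cfg["critical_ahi"]:
        return "critical"
    if ahi <= cfg["high_ahi"]:
        return "high"
    if ahi <= cfg["normal_ahi"]:
        return "medium"
    return "low"  # assumption: the table does not name this bucket
```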
Alert Aggregator
| Parameter | Default | Meaning |
|---|---|---|
| alert_dedup_window_minutes | 60 | Window within which same-asset detections collapse into one row |
See Unified Inbox.
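To make the dedup window concrete, the sketch below collapses same-asset detections into one row. It assumes the window is anchored at the first detection of each row (a sliding window anchored at the latest detection would be an equally plausible reading); `AlertRow` and `aggregate` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AlertRow:
    asset_id: str
    first_seen: datetime
    last_seen: datetime
    count: int = 1


def aggregate(detections, window_minutes: int = 60) -> list[AlertRow]:
    """Collapse (asset_id, timestamp) detections that fall within the window."""
    window = timedelta(minutes=window_minutes)
    open_rows: dict[str, AlertRow] = {}  # latest row per asset
    rows: list[AlertRow] = []
    for asset_id, ts in sorted(detections, key=lambda d: d[1]):
        row = open_rows.get(asset_id)
        if row is not None and ts - row.first_seen <= window:
            row.last_seen = ts
            row.count += 1
        else:
            row = AlertRow(asset_id, ts, ts)
            open_rows[asset_id] = row
            rows.append(row)
    return rows


day = datetime(2024, 3, 3)
rows = aggregate([
    ("pump-1", day.replace(hour=10, minute=0)),
    ("pump-1", day.replace(hour=10, minute=30)),  # merged into the 10:00 row
    ("pump-2", day.replace(hour=10, minute=10)),
    ("pump-1", day.replace(hour=11, minute=30)),  # 90 min after 10:00: new row
])
```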
Other
| Parameter | Default | Meaning |
|---|---|---|
| cbm_trigger_multiplier | 1.5 | Metric crossing baseline_max × 1.5 triggers CBM |
| ahi_risk_threshold | 70 | AHI ≤ 70 considered "at risk" for executive dashboards |
| confidence_snapshots_ceiling | 30 | Snapshots for RUL confidence to saturate at 100% |
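These three parameters reduce to simple comparisons, sketched below with hypothetical function names. The table only says that RUL confidence saturates at the ceiling; the linear ramp up to that point is an assumption, and "crossing" is read as strictly exceeding.

```python
def cbm_triggered(value: float, baseline_max: float, multiplier: float = 1.5) -> bool:
    """True when the metric exceeds baseline_max scaled by cbm_trigger_multiplier."""
    return value > baseline_max * multiplier


def is_at_risk(ahi: float, threshold: float = 70) -> bool:
    """True when AHI is at or below ahi_risk_threshold."""
    return ahi <= threshold


def rul_confidence(snapshot_count: int, ceiling: int = 30) -> float:
    """Confidence in percent; linear growth up to the ceiling is assumed."""
    return min(100.0, 100.0 * snapshot_count / ceiling)
```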
Maturity requirements (maturity_requirements)
| Parameter | Default | Meaning |
|---|---|---|
| level_1_snapshots | 10 | Snapshots to exit Level 0 |
| level_2_snapshots | 30 | Snapshots for Level 2 |
| level_2_failures | 1 | Registered failures for Level 2 |
| level_3_failures | 3 | Failures for Level 3 |
| level_3_confidence | 70 | Minimum RUL confidence (%) for Level 3 |
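One way to read these requirements is as cumulative gates, where each level also requires the previous one. That reading is an assumption, since the table lists requirements per level without saying how they combine; `maturity_level` is a hypothetical name.

```python
DEFAULT_MATURITY_REQUIREMENTS = {
    "level_1_snapshots": 10,
    "level_2_snapshots": 30,
    "level_2_failures": 1,
    "level_3_failures": 3,
    "level_3_confidence": 70,
}


def maturity_level(snapshots: int, failures: int, rul_confidence: float,
                   req: dict[str, int] = DEFAULT_MATURITY_REQUIREMENTS) -> int:
    """Return 0-3, treating each level as a gate on top of the previous one."""
    level = 0
    if snapshots >= req["level_1_snapshots"]:
        level = 1
    if level == 1 and snapshots >= req["level_2_snapshots"] and failures >= req["level_2_failures"]:
        level = 2
    if level == 2 and failures >= req["level_3_failures"] and rul_confidence >= req["level_3_confidence"]:
        level = 3
    return level
```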
Per-asset-type overrides
Any sub-dict can be overridden per asset_type without repeating others:
    {
      "rul_thresholds": { "critical_hours": 24, "high_hours": 168, "medium_hours": 720 },
      "asset_type_overrides": {
        "critical_pump": { "rul_thresholds": { "critical_hours": 12 } },
        "hvac_chiller": { "rul_thresholds": { "critical_hours": 48, "high_hours": 336 } }
      }
    }

config_version: audit traceability
Every update_config atomically increments config_version ($inc: {config_version: 1}). New tenants start at 0 (factory defaults).
Each persisted document carries the active config_version at compute time:
- _asset_health_snapshots: config_version field per snapshot.
- _asset_prognostics: config_version field per prognostics record.
- compute_enhanced_prognostics: config_version in response.
Why it matters: if a tenant adjusts ahi_grades three times in 6 months, old snapshots are not rewritten — each keeps its compute-time version. An audit can rebuild exactly which thresholds produced each historical grade.
Callers can't smuggle a config_version: the service strips it in every update_config — the counter is system-owned.
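The version rules can be illustrated with an in-memory stand-in for the persistence layer. `InMemoryConfigStore` and `stamp_snapshot` are hypothetical; the real service increments atomically with `$inc`, which this single-threaded sketch does not reproduce, and it replaces top-level keys instead of deep-merging to stay short.

```python
import copy


class InMemoryConfigStore:
    """Illustrative stand-in for the tenant config collection."""

    def __init__(self):
        self._docs = {}

    def get(self, tenant_id: str) -> dict:
        # New tenants start at version 0 (factory defaults).
        return copy.deepcopy(self._docs.get(tenant_id, {"config_version": 0}))

    def update_config(self, tenant_id: str, patch: dict) -> dict:
        # Strip any caller-supplied config_version: the counter is system-owned.
        clean = {k: v for k, v in patch.items() if k != "config_version"}
        doc = self._docs.setdefault(tenant_id, {"config_version": 0})
        doc.update(copy.deepcopy(clean))
        doc["config_version"] += 1
        return copy.deepcopy(doc)


def stamp_snapshot(store: InMemoryConfigStore, tenant_id: str, asset_id: str, ahi: float) -> dict:
    """Build a snapshot that records the config_version active at compute time."""
    return {"asset_id": asset_id, "ahi": ahi,
            "config_version": store.get(tenant_id)["config_version"]}


store = InMemoryConfigStore()
store.update_config("tenant-1", {"ahi_grades": {"grade_a": 95}, "config_version": 999})
old_snapshot = stamp_snapshot(store, "tenant-1", "pump-7", 88.0)   # version 1
store.update_config("tenant-1", {"ahi_grades": {"grade_a": 92}})
new_snapshot = stamp_snapshot(store, "tenant-1", "pump-7", 88.0)   # version 2
```

The old snapshot keeps version 1 after the second update, which is what lets an audit tie each historical grade to the thresholds in force when it was computed.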
Cache
Config is cached in memory (5 min TTL) and Redis when available. Every update_config invalidates the cache. Changes require no restart.
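A minimal sketch of the in-memory layer, assuming a per-tenant entry with a 300-second TTL and explicit invalidation on update. `ConfigCache` and its `loader` callback are hypothetical, and the Redis layer is left out.

```python
import time


class ConfigCache:
    """Per-tenant TTL cache; the loader is called on a miss or after expiry."""

    def __init__(self, loader, ttl_seconds: float = 300, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # tenant_id -> (expires_at, config)

    def get(self, tenant_id: str) -> dict:
        now = self._clock()
        entry = self._entries.get(tenant_id)
        if entry is not None and now < entry[0]:
            return entry[1]
        config = self._loader(tenant_id)
        self._entries[tenant_id] = (now + self._ttl, config)
        return config

    def invalidate(self, tenant_id: str) -> None:
        """Call after every update_config so the next read reloads."""
        self._entries.pop(tenant_id, None)
```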
How to use it
Dashboard
- Configuration → Predictive Engine.
- Adjust AHI weights via sliders (sum auto-normalizes).
- Modify grade and RUL thresholds.
- Configure per-asset-type overrides in the Per-Type section.
- Save: config_version increments.
- Reset Defaults: clears the tenant configuration.
Changes are logged in the tenant audit trail with date, user, previous/new values. Combined with config_version, you can answer "which thresholds applied on March 3?" with certainty.
Raising critical_hours widens the critical RUL band, and raising grade_a toughens grading, so either change may generate more automatic urgent tasks. Review the impact in staging before applying it in production.
API
    GET   /api/v1/predictive-config        # returns resolved config
    PATCH /api/v1/predictive-config        # partial update, bumps config_version
    POST  /api/v1/predictive-config/reset  # clears tenant config

Updates are deep-merge: sending {"rul_thresholds": {"critical_hours": 12}} preserves the other rul_thresholds fields.
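A partial update could be issued as in the sketch below, which only builds the request so it runs offline. The host name and the Bearer-token header are assumptions; the source does not describe authentication.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # hypothetical host


def build_patch_request(patch: dict, token: str) -> urllib.request.Request:
    """Build a PATCH carrying only the fields to change."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/predictive-config",
        data=json.dumps(patch).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


# Only critical_hours is sent; high_hours and medium_hours stay as they are.
req = build_patch_request({"rul_thresholds": {"critical_hours": 12}}, token="<token>")
# urllib.request.urlopen(req) would send it.
```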
Key benefits
- One per-tenant panel — tune without code, without deploys.
- Granularity: weights + grades + thresholds + per-type overrides.
- Incremental, traceable config_version in every persisted snapshot.
- 5-min cache + automatic invalidation on updates.
- System protects its own counter — callers can't smuggle versions.
- Idempotent reset that doesn't break versioning.
Consumers
Resolved config (defaults + tenant + asset_type override) is consumed by:
- health_assessment_service: weights + grades + anomaly_pressure.
- prognostics_service: rul_thresholds + failure_probability.
- maturity_calculation_service: maturity_requirements.
- alert_aggregator_service: alert_dedup_window_minutes.
- Daily WhatsApp/email briefings: ahi_risk_threshold.