Per-tenant predictive engine control panel: AHI weights, grades, RUL thresholds, failure probability, inbox window, and per-asset-type overrides. Full traceability via config_version.

Predictive Configuration

Predictive configuration is the control panel of the predictive maintenance engine. Each tenant tunes how the AHI is computed, when an asset is considered at risk, which window the unified inbox uses to consolidate alerts, and the minimum confidence for auto-interventions — all without code changes or deploys.

What is it for?

Configure thresholds, baselines, and detection windows per tenant, with optional per-asset-type overrides.
Version the predictive config for safe rollback.
Propagate changes to the pipeline without restart.

How it works

The config is versioned (config_version) and consumed online by the predictive pipeline. Changes are validated, persisted, and propagated via pub/sub without worker restart.

What it's for

Personalize the engine to each organization's risk tolerance and operational context.
Tune the 5 AHI sub-index weights and the four grade thresholds (A/B/C/D; any score below the D threshold is graded F).
Define RUL breakpoints (hours separating critical/high/medium/low) and failure probability breakpoints.
Configure the Alert Aggregator dedup window.
Override any parameter per asset_type (critical pumps stricter than tolerant compressors).
Full traceability via config_version — every persisted snapshot carries its compute-time version.

Configurable parameters

AHI weights (`ahi_weights`)

Relative importance of each Asset Health Index sub-index. Sum must be 1.0.

Sub-index	Default	Measures
`condition`	0.35	Instantaneous vs baselines
`alarm_health`	0.20	Accumulated alarm-hours ISA-18.2 with per-alarm cap
`maintenance_compliance`	0.15	Preventive plan compliance
`trend_stability`	0.10	24h trend direction + r²
`anomaly_pressure`	0.20	Recent ML detection density (7d)

See Condition Monitoring for sub-index semantics.
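
As an illustration of how these weights might be applied, here is a minimal sketch that validates the sum-to-1.0 rule and combines five sub-index scores into one AHI. It assumes each sub-index is already scored on a 0–100 scale and that the AHI is a plain weighted sum; the function names `validate_weights` and `compute_ahi` are hypothetical and are not taken from the service.

```python
import math

# Default weights from the table above.
DEFAULT_AHI_WEIGHTS = {
    "condition": 0.35,
    "alarm_health": 0.20,
    "maintenance_compliance": 0.15,
    "trend_stability": 0.10,
    "anomaly_pressure": 0.20,
}


def validate_weights(weights):
    """Raise ValueError unless all five sub-indices are present and sum to 1.0."""
    if set(weights) != set(DEFAULT_AHI_WEIGHTS):
        raise ValueError("ahi_weights must define exactly the five sub-indices")
    if any(w < 0 for w in weights.values()):
        raise ValueError("ahi_weights must be non-negative")
    # Tolerate floating-point rounding when checking the sum.
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-6):
        raise ValueError("ahi_weights must sum to 1.0")


def compute_ahi(sub_scores, weights=DEFAULT_AHI_WEIGHTS):
    """Weighted sum of 0-100 sub-index scores (assumed combination rule)."""
    validate_weights(weights)
    return sum(weights[name] * sub_scores[name] for name in weights)
```

With the default weights, sub-scores of 90, 80, 100, 60, and 70 (in table order) combine to roughly 82.5.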

AHI grades (`ahi_grades`)

Thresholds that translate the 0–100 AHI score into a letter grade (A, B, C, D, or F):

Parameter	Default	Meaning
`grade_a`	90	AHI ≥ 90 → A
`grade_b`	70	AHI ≥ 70 → B
`grade_c`	50	AHI ≥ 50 → C
`grade_d`	30	AHI ≥ 30 → D; below → F

Stricter tenants can raise thresholds (grade_a = 95, etc.) so the same numeric health gets a tougher grade.
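
The mapping in the table can be sketched as a cascade of comparisons. The function name `grade_for` is hypothetical; only the parameter names and defaults come from the table.

```python
# Defaults from the table above.
DEFAULT_AHI_GRADES = {"grade_a": 90, "grade_b": 70, "grade_c": 50, "grade_d": 30}


def grade_for(ahi, grades=DEFAULT_AHI_GRADES):
    """Return the letter grade for an AHI score, checking the highest threshold first."""
    if ahi >= grades["grade_a"]:
        return "A"
    if ahi >= grades["grade_b"]:
        return "B"
    if ahi >= grades["grade_c"]:
        return "C"
    if ahi >= grades["grade_d"]:
        return "D"
    return "F"


# A stricter tenant raises grade_a, so the same score earns a lower grade.
strict = dict(DEFAULT_AHI_GRADES, grade_a=95)
print(grade_for(92), grade_for(92, strict))  # prints "A B"
```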

RUL thresholds (`rul_thresholds`)

Remaining useful life (RUL) breakpoints, in hours, that separate the risk levels:

Parameter	Default	Allowed range	Meaning
`critical_hours`	24	1 to 8760 (1 year)	RUL under 24h becomes critical
`high_hours`	168	1 to 17520 (2 years)	RUL under 168h (7d) becomes high
`medium_hours`	720	1 to 43800 (5 years)	RUL under 720h (30d) becomes medium

Hierarchy rule: the three values must satisfy critical_hours <= high_hours <= medium_hours. Requests that violate this ordering are rejected with HTTP 422. This prevents accidental misconfiguration such as critical=500 with high=100.

The ranges were intentionally relaxed to cover long-lived industrial assets (transformers, pressure vessels) whose expected remaining life is measured in years rather than weeks.
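
The range and hierarchy checks, plus the resulting classification, could look like the sketch below. The helper names are hypothetical, the `ValueError` stands in for whatever the API layer turns into a 422 response, and treating an RUL at or above `medium_hours` as "low" is an assumption based on the four levels listed earlier.

```python
# Allowed ranges from the table above: (minimum, maximum) in hours.
RUL_RANGES = {
    "critical_hours": (1, 8760),
    "high_hours": (1, 17520),
    "medium_hours": (1, 43800),
}

DEFAULT_RUL_THRESHOLDS = {"critical_hours": 24, "high_hours": 168, "medium_hours": 720}


def validate_rul_thresholds(t):
    """Raise ValueError if a value is out of range or the ordering is violated."""
    for name, (low, high) in RUL_RANGES.items():
        if not low <= t[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if not t["critical_hours"] <= t["high_hours"] <= t["medium_hours"]:
        raise ValueError("critical_hours <= high_hours <= medium_hours is required")


def rul_risk_level(rul_hours, t=DEFAULT_RUL_THRESHOLDS):
    """Classify a remaining-useful-life estimate against the breakpoints."""
    if rul_hours < t["critical_hours"]:
        return "critical"
    if rul_hours < t["high_hours"]:
        return "high"
    if rul_hours < t["medium_hours"]:
        return "medium"
    return "low"  # assumption: anything at or above medium_hours is low risk
```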

Failure probability (`failure_probability`)

AHI breakpoints that map the score into failure-risk buckets:

Parameter	Default	Meaning
`critical_ahi`	30	AHI ≤ 30 → critical probability
`high_ahi`	50	30 < AHI ≤ 50 → high
`normal_ahi`	70	50 < AHI ≤ 70 → medium
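
A minimal sketch of the bucket lookup follows. The function name `failure_bucket` is hypothetical, and the "low" bucket for scores above `normal_ahi` is an assumption, since the table does not name it.

```python
# Defaults from the table above.
DEFAULT_FAILURE_PROBABILITY = {"critical_ahi": 30, "high_ahi": 50, "normal_ahi": 70}


def failure_bucket(ahi, fp=DEFAULT_FAILURE_PROBABILITY):
    """Map an AHI score to a failure-risk bucket, lowest scores first."""
    if ahi <= fp["critical_ahi"]:
        return "critical"
    if ahi <= fp["high_ahi"]:
        return "high"
    if ahi <= fp["normal_ahi"]:
        return "medium"
    return "low"  # assumption: not specified in the table
```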

Alert Aggregator

Parameter	Default	Meaning
`alert_dedup_window_minutes`	60	Window within which same-asset detections collapse into one row

See Unified Inbox.
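
To make the window concrete, the sketch below collapses same-asset detections into rows. It assumes the window is anchored at the first detection of each row and that a detection exactly at the window edge still joins the row; the real aggregator may use a sliding window instead. The function name and row fields are hypothetical.

```python
from datetime import datetime, timedelta


def collapse_detections(detections, window_minutes=60):
    """Group (asset_id, timestamp) pairs into one row per asset per window."""
    window = timedelta(minutes=window_minutes)
    open_rows = {}  # asset_id -> row currently accepting detections
    rows = []
    for asset_id, ts in sorted(detections, key=lambda d: d[1]):
        row = open_rows.get(asset_id)
        if row is not None and ts - row["first_seen"] <= window:
            # Still inside the window: fold into the existing row.
            row["last_seen"] = ts
            row["count"] += 1
        else:
            # Window expired or first detection for this asset: start a new row.
            row = {"asset_id": asset_id, "first_seen": ts, "last_seen": ts, "count": 1}
            open_rows[asset_id] = row
            rows.append(row)
    return rows


t0 = datetime(2024, 1, 1, 8, 0)
rows = collapse_detections([
    ("pump-1", t0),
    ("pump-1", t0 + timedelta(minutes=30)),
    ("pump-2", t0 + timedelta(minutes=10)),
    ("pump-1", t0 + timedelta(minutes=90)),
])
print(len(rows))  # prints 3
```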

Other

Parameter	Default	Meaning
`cbm_trigger_multiplier`	1.5	Metric crossing `baseline_max × 1.5` triggers CBM
`ahi_risk_threshold`	70	AHI ≤ 70 considered "at risk" for executive dashboards
`confidence_snapshots_ceiling`	30	Snapshots for RUL confidence to saturate at 100%
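
Two of these parameters reduce to simple arithmetic, sketched below. Both function names are hypothetical. "Crossing" is read as strictly greater than the scaled baseline, and the linear ramp toward 100% confidence is an assumption: the table only states that confidence saturates at the ceiling.

```python
def cbm_triggered(value, baseline_max, multiplier=1.5):
    """True when a metric exceeds baseline_max scaled by cbm_trigger_multiplier."""
    return value > baseline_max * multiplier


def rul_confidence(snapshots, ceiling=30):
    """RUL confidence in percent, assumed to grow linearly until the ceiling."""
    if ceiling <= 0:
        raise ValueError("confidence_snapshots_ceiling must be positive")
    return min(snapshots / ceiling, 1.0) * 100


print(cbm_triggered(16, 10))  # prints True (16 > 10 * 1.5)
print(rul_confidence(15))     # prints 50.0
```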

Maturity requirements (`maturity_requirements`)

Parameter	Default	Meaning
`level_1_snapshots`	10	Snapshots to exit Level 0
`level_2_snapshots`	30	Snapshots for Level 2
`level_2_failures`	1	Registered failures for Level 2
`level_3_failures`	3	Failures for Level 3
`level_3_confidence`	70	Minimum RUL confidence (%) for Level 3
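
One plausible reading of these requirements is a cumulative ladder, where each level also requires the previous one. The sketch below encodes that reading; the cumulative rule and the function name `maturity_level` are assumptions, not confirmed behavior of maturity_calculation_service.

```python
# Defaults from the table above.
DEFAULT_MATURITY_REQUIREMENTS = {
    "level_1_snapshots": 10,
    "level_2_snapshots": 30,
    "level_2_failures": 1,
    "level_3_failures": 3,
    "level_3_confidence": 70,
}


def maturity_level(snapshots, failures, rul_confidence, req=DEFAULT_MATURITY_REQUIREMENTS):
    """Return 0-3, assuming each level requires all lower levels to be met."""
    if snapshots < req["level_1_snapshots"]:
        return 0
    if snapshots < req["level_2_snapshots"] or failures < req["level_2_failures"]:
        return 1
    if failures < req["level_3_failures"] or rul_confidence < req["level_3_confidence"]:
        return 2
    return 3
```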

Per-asset-type overrides

Any sub-dict can be overridden per asset_type. An override lists only the fields that change; everything else is inherited from the tenant-level values:

{
  "rul_thresholds": { "critical_hours": 24, "high_hours": 168, "medium_hours": 720 },
  "asset_type_overrides": {
    "critical_pump": { "rul_thresholds": { "critical_hours": 12 } },
    "hvac_chiller":  { "rul_thresholds": { "critical_hours": 48, "high_hours": 336 } }
  }
}
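
Resolution for a given asset type can be sketched as follows. The function name `resolve_config` is hypothetical, and the sketch starts from a tenant config that already has the factory defaults merged in, so it only shows the final override layer.

```python
import copy


def resolve_config(config, asset_type=None):
    """Return the config that applies to asset_type, without mutating the input."""
    overrides = config.get("asset_type_overrides", {}).get(asset_type, {})
    # Start from the tenant-level values, leaving the override table out of the result.
    resolved = {k: copy.deepcopy(v) for k, v in config.items() if k != "asset_type_overrides"}
    for section, values in overrides.items():
        if isinstance(values, dict) and isinstance(resolved.get(section), dict):
            resolved[section].update(values)  # only the overridden fields change
        else:
            resolved[section] = copy.deepcopy(values)
    return resolved


config = {
    "rul_thresholds": {"critical_hours": 24, "high_hours": 168, "medium_hours": 720},
    "asset_type_overrides": {
        "critical_pump": {"rul_thresholds": {"critical_hours": 12}},
        "hvac_chiller": {"rul_thresholds": {"critical_hours": 48, "high_hours": 336}},
    },
}
print(resolve_config(config, "critical_pump")["rul_thresholds"])
# prints {'critical_hours': 12, 'high_hours': 168, 'medium_hours': 720}
```

An asset type with no entry in asset_type_overrides simply receives the tenant-level values.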

`config_version` — audit traceability

Every update_config atomically increments config_version ($inc: {config_version: 1}). New tenants start at 0 (factory defaults).

Each persisted document carries the active config_version at compute time:

_asset_health_snapshots — config_version field per snapshot.
_asset_prognostics — config_version field per prognostics record.
compute_enhanced_prognostics — config_version in response.

Why it matters: if a tenant adjusts ahi_grades three times in 6 months, old snapshots are not rewritten — each keeps its compute-time version. An audit can rebuild exactly which thresholds produced each historical grade.

Callers cannot set config_version themselves: the service strips any caller-supplied value from every update_config payload, because the counter is system-owned.
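
The strip-then-increment behavior can be illustrated with an in-memory stand-in for the real store. The class `InMemoryConfigStore` is hypothetical, and this single-process dictionary does not reproduce the atomicity that the database-level `$inc` provides.

```python
class InMemoryConfigStore:
    """Toy stand-in for the tenant config collection (not atomic, single process)."""

    def __init__(self):
        self._docs = {}  # tenant_id -> config document

    def get(self, tenant_id):
        # New tenants start at version 0 (factory defaults).
        return dict(self._docs.get(tenant_id, {"config_version": 0}))

    def update_config(self, tenant_id, patch):
        # Drop any caller-supplied config_version: the counter is system-owned.
        clean = {k: v for k, v in patch.items() if k != "config_version"}
        doc = self._docs.setdefault(tenant_id, {"config_version": 0})
        doc.update(clean)
        doc["config_version"] += 1  # mirrors $inc: {config_version: 1}
        return dict(doc)


store = InMemoryConfigStore()
doc = store.update_config("tenant-1", {"config_version": 99, "ahi_risk_threshold": 65})
print(doc["config_version"])  # prints 1, not 99

# A snapshot computed now would be stamped with the active version.
snapshot = {"ahi": 82, "grade": "B", "config_version": store.get("tenant-1")["config_version"]}
```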

Cache

Config is cached in memory (5-minute TTL) and in Redis when available. Every update_config invalidates the cache, so changes take effect without a restart.
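
The in-memory layer can be pictured as a small TTL cache with explicit invalidation. This is a sketch under assumptions: the class `TTLConfigCache`, its `loader` callback, and the injectable `clock` are hypothetical, and the Redis layer is left out.

```python
import time


class TTLConfigCache:
    """Per-tenant cache that reloads an entry once it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds=300, clock=time.monotonic):
        self._loader = loader    # function: tenant_id -> resolved config
        self._ttl = ttl_seconds  # 300 s matches the 5-minute TTL
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # tenant_id -> (stored_at, config)

    def get(self, tenant_id):
        now = self._clock()
        entry = self._entries.get(tenant_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh
        config = self._loader(tenant_id)
        self._entries[tenant_id] = (now, config)
        return config

    def invalidate(self, tenant_id):
        """Call after every update_config so the next read reloads."""
        self._entries.pop(tenant_id, None)
```

Calling `invalidate` from the update path is what lets a change become visible immediately instead of waiting for the TTL to expire.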

How to use it

Dashboard

Configuration → Predictive Engine.
Adjust AHI weights via sliders (the sliders auto-normalize so the sum stays at 1.0).
Modify grade and RUL thresholds.
Configure per-asset-type overrides in the Per-Type section.
Save — config_version increments.
Reset Defaults — clears the tenant configuration.

Changes are logged in the tenant audit trail with date, user, previous/new values. Combined with config_version, you can answer "which thresholds applied on March 3?" with certainty.

Lowering critical_hours or grade_a may generate more automatic urgent tasks. Review the impact in staging before applying the change in production.

API

GET   /api/v1/predictive-config            # returns resolved config
PATCH /api/v1/predictive-config            # partial update, bumps config_version
POST  /api/v1/predictive-config/reset      # clears tenant config

Updates are deep-merge: sending {"rul_thresholds": {"critical_hours": 12}} preserves other rul_thresholds fields.
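
The deep-merge semantics can be sketched with a short recursive helper. The function name `deep_merge` is hypothetical and illustrates the documented behavior rather than the service's own code.

```python
import copy


def deep_merge(base, patch):
    """Return a new dict where patch values override base, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


current = {
    "rul_thresholds": {"critical_hours": 24, "high_hours": 168, "medium_hours": 720},
    "ahi_risk_threshold": 70,
}
updated = deep_merge(current, {"rul_thresholds": {"critical_hours": 12}})
print(updated["rul_thresholds"])
# prints {'critical_hours': 12, 'high_hours': 168, 'medium_hours': 720}
```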

Key benefits

One per-tenant panel — tune without code, without deploys.
Granularity: weights + grades + thresholds + per-type overrides.
Incremental, traceable config_version in every persisted snapshot.
5-min cache + automatic invalidation on updates.
System protects its own counter — callers can't smuggle versions.
Idempotent reset that doesn't break versioning.

Consumers

Resolved config (defaults + tenant + asset_type override) is consumed by:

health_assessment_service — weights + grades + anomaly_pressure.
prognostics_service — rul_thresholds + failure_probability.
maturity_calculation_service — maturity_requirements.
alert_aggregator_service — alert_dedup_window_minutes.
Daily WhatsApp/email briefings — ahi_risk_threshold.
