Predictive Configuration

Per-tenant predictive engine control panel: AHI weights, grades, RUL thresholds, failure probability, inbox window, and per-asset-type overrides. Full traceability via config_version.

Predictive configuration is the control panel of the predictive maintenance engine. Each tenant tunes how the AHI is computed, when an asset is considered at risk, which window the unified inbox uses to consolidate alerts, and the minimum confidence for auto-interventions — all without code changes or deploys.

What is it for?

  • Configure thresholds, baselines, and detection windows per asset.
  • Version the predictive config for safe rollback.
  • Propagate changes to the pipeline without restart.

How it works

The config is versioned (config_version) and consumed online by the predictive pipeline. Changes are validated, persisted, and propagated via pub/sub without worker restart.

What it's for

  • Personalize the engine to each organization's risk tolerance and operational context.
  • Tune the 5 AHI sub-index weights and grade thresholds (A/B/C/D).
  • Define RUL breakpoints (hours separating critical/high/medium/low) and failure probability breakpoints.
  • Configure the Alert Aggregator dedup window.
  • Override any parameter per asset_type (critical pumps stricter than tolerant compressors).
  • Full traceability via config_version — every persisted snapshot carries its compute-time version.

Configurable parameters

AHI weights (ahi_weights)

Relative importance of each Asset Health Index sub-index. Sum must be 1.0.

| Sub-index | Default | Measures |
|---|---|---|
| condition | 0.35 | Instantaneous vs baselines |
| alarm_health | 0.20 | Accumulated alarm-hours ISA-18.2 with per-alarm cap |
| maintenance_compliance | 0.15 | Preventive plan compliance |
| trend_stability | 0.10 | 24h trend direction + r² |
| anomaly_pressure | 0.20 | Recent ML detection density (7d) |

See Condition Monitoring for sub-index semantics.
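
As an illustration of how the weights combine, here is a minimal sketch that assumes the AHI is a plain weighted sum of sub-index scores on a 0 to 100 scale. The function names `validate_weights` and `compute_ahi` are hypothetical and not part of the product API; only the default weights come from the table above.

```python
import math

# Default weights from the table above.
DEFAULT_AHI_WEIGHTS = {
    "condition": 0.35,
    "alarm_health": 0.20,
    "maintenance_compliance": 0.15,
    "trend_stability": 0.10,
    "anomaly_pressure": 0.20,
}


def validate_weights(weights):
    """Raise ValueError unless all five sub-indices are present and sum to 1.0."""
    if set(weights) != set(DEFAULT_AHI_WEIGHTS):
        raise ValueError("ahi_weights must define exactly the five sub-indices")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-6):
        raise ValueError("ahi_weights must sum to 1.0")


def compute_ahi(sub_indices, weights=DEFAULT_AHI_WEIGHTS):
    """Weighted sum of sub-index scores (assumption: each score is 0-100)."""
    validate_weights(weights)
    return sum(weights[name] * sub_indices[name] for name in weights)
```

Under this assumption, an asset scoring 100 on condition and 0 everywhere else would land at an AHI of about 35, which shows how heavily the default weighting leans on instantaneous condition.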

AHI grades (ahi_grades)

Thresholds translating 0–100 score into A/B/C/D/F:

| Parameter | Default | Meaning |
|---|---|---|
| grade_a | 90 | AHI ≥ 90 → A |
| grade_b | 70 | AHI ≥ 70 → B |
| grade_c | 50 | AHI ≥ 50 → C |
| grade_d | 30 | AHI ≥ 30 → D; below → F |

Stricter tenants can raise thresholds (grade_a = 95, etc.) so the same numeric health gets a tougher grade.
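
The grade mapping can be sketched as a top-down threshold check. The helper name `ahi_grade` is hypothetical; the thresholds and the "≥" comparisons follow the table above.

```python
# Default thresholds from the ahi_grades table.
DEFAULT_AHI_GRADES = {"grade_a": 90, "grade_b": 70, "grade_c": 50, "grade_d": 30}


def ahi_grade(score, grades=DEFAULT_AHI_GRADES):
    """Return the first grade whose threshold the score meets, else F."""
    for letter in ("a", "b", "c", "d"):
        if score >= grades[f"grade_{letter}"]:
            return letter.upper()
    return "F"


# A stricter tenant: the same score of 92 drops from A to B.
strict = dict(DEFAULT_AHI_GRADES, grade_a=95)
default_grade = ahi_grade(92)
strict_grade = ahi_grade(92, strict)
```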

RUL thresholds (rul_thresholds)

Remaining useful life (RUL) breakpoints, in hours, that separate risk levels:

| Parameter | Default | Allowed range | Meaning |
|---|---|---|---|
| critical_hours | 24 | 1 to 8760 (1 year) | RUL under 24h becomes critical |
| high_hours | 168 | 1 to 17520 (2 years) | RUL under 168h (7d) becomes high |
| medium_hours | 720 | 1 to 43800 (5 years) | RUL under 720h (30d) becomes medium |

Hierarchy rule: the three values must satisfy critical_hours <= high_hours <= medium_hours. Requests that violate this ordering are rejected with 422. This prevents accidental misconfiguration like critical=500 with high=100.

The ranges were intentionally relaxed to cover long-lived industrial assets (transformers, pressure vessels) whose expected remaining life is measured in years rather than weeks.
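
The range and hierarchy checks, plus the resulting classification, might look like the sketch below. The function names are hypothetical, and returning a list of error strings is an illustrative choice; the service itself is only documented as rejecting invalid requests with 422. Treating an RUL at or above medium_hours as "low" follows the critical/high/medium/low wording used earlier on this page.

```python
# Allowed ranges from the rul_thresholds table (hours).
RUL_RANGES = {
    "critical_hours": (1, 8760),
    "high_hours": (1, 17520),
    "medium_hours": (1, 43800),
}


def validate_rul_thresholds(t):
    """Return a list of validation errors; an empty list means the config is valid."""
    errors = []
    for name, (low, high) in RUL_RANGES.items():
        if not low <= t[name] <= high:
            errors.append(f"{name} must be between {low} and {high}")
    if not t["critical_hours"] <= t["high_hours"] <= t["medium_hours"]:
        errors.append("critical_hours <= high_hours <= medium_hours is required")
    return errors


def rul_risk_level(rul_hours, t):
    """Classify a remaining-useful-life estimate using strict 'under' comparisons."""
    if rul_hours < t["critical_hours"]:
        return "critical"
    if rul_hours < t["high_hours"]:
        return "high"
    if rul_hours < t["medium_hours"]:
        return "medium"
    return "low"
```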

Failure probability (failure_probability)

AHI breakpoints that map the score into failure risk buckets:

| Parameter | Default | Meaning |
|---|---|---|
| critical_ahi | 30 | AHI ≤ 30 → critical probability |
| high_ahi | 50 | 30 < AHI ≤ 50 → high |
| normal_ahi | 70 | 50 < AHI ≤ 70 → medium |
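
A minimal sketch of the bucket lookup follows. The function name `failure_risk_bucket` is hypothetical, and labeling scores above normal_ahi as "low" is an assumption, since the table does not name that bucket.

```python
# Default breakpoints from the failure_probability table.
DEFAULT_FAILURE_PROBABILITY = {"critical_ahi": 30, "high_ahi": 50, "normal_ahi": 70}


def failure_risk_bucket(ahi, breakpoints=DEFAULT_FAILURE_PROBABILITY):
    """Map an AHI score to a failure-risk bucket using inclusive upper bounds."""
    if ahi <= breakpoints["critical_ahi"]:
        return "critical"
    if ahi <= breakpoints["high_ahi"]:
        return "high"
    if ahi <= breakpoints["normal_ahi"]:
        return "medium"
    return "low"  # assumption: the table does not name the bucket above normal_ahi
```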

Alert Aggregator

| Parameter | Default | Meaning |
|---|---|---|
| alert_dedup_window_minutes | 60 | Window within which same-asset detections collapse into one row |

See Unified Inbox.
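
To make the dedup window concrete, here is a rough sketch of collapsing same-asset detections into rows. It assumes the window is measured from the first detection of each row (a fixed window); the real aggregator may use a sliding window or other keys. The function name `aggregate_alerts` and the row fields are hypothetical.

```python
from datetime import datetime, timedelta


def aggregate_alerts(detections, window_minutes=60):
    """Collapse (asset_id, timestamp) detections into one row per asset per window."""
    window = timedelta(minutes=window_minutes)
    rows = []
    open_rows = {}  # asset_id -> the row currently accepting detections
    for asset_id, ts in sorted(detections, key=lambda d: d[1]):
        row = open_rows.get(asset_id)
        if row is not None and ts - row["first_seen"] <= window:
            # Same asset, still inside the window: fold into the existing row.
            row["count"] += 1
            row["last_seen"] = ts
        else:
            # First detection for this asset, or the window has expired.
            row = {"asset_id": asset_id, "first_seen": ts, "last_seen": ts, "count": 1}
            rows.append(row)
            open_rows[asset_id] = row
    return rows
```

With the default 60-minute window, detections on the same pump at 10:00 and 10:30 share a row, while a third detection at 11:30 opens a new one.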

Other

| Parameter | Default | Meaning |
|---|---|---|
| cbm_trigger_multiplier | 1.5 | Metric crossing baseline_max × 1.5 triggers CBM |
| ahi_risk_threshold | 70 | AHI ≤ 70 considered "at risk" for executive dashboards |
| confidence_snapshots_ceiling | 30 | Snapshots for RUL confidence to saturate at 100% |
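
The three parameters above can be read as simple predicates. The sketch below uses hypothetical function names and makes two assumptions that the table does not state: "crossing" is treated as strictly greater than the scaled baseline, and RUL confidence is assumed to grow linearly with snapshot count until it saturates.

```python
def cbm_triggered(value, baseline_max, multiplier=1.5):
    """True when the metric exceeds baseline_max x multiplier (assumed strict)."""
    return value > baseline_max * multiplier


def is_at_risk(ahi, threshold=70):
    """True when the AHI is at or below ahi_risk_threshold."""
    return ahi <= threshold


def rul_confidence_pct(snapshot_count, ceiling=30):
    """Assumed linear ramp that saturates at 100% once the ceiling is reached."""
    return min(100.0, 100.0 * snapshot_count / ceiling)
```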

Maturity requirements (maturity_requirements)

| Parameter | Default | Meaning |
|---|---|---|
| level_1_snapshots | 10 | Snapshots to exit Level 0 |
| level_2_snapshots | 30 | Snapshots for Level 2 |
| level_2_failures | 1 | Registered failures for Level 2 |
| level_3_failures | 3 | Failures for Level 3 |
| level_3_confidence | 70 | Minimum RUL confidence (%) for Level 3 |
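
One way to read these requirements is as cumulative gates, where each level also requires everything below it. That cumulative reading is an assumption, as is the function name `maturity_level`; the sketch only shows how the five parameters could interact.

```python
# Defaults from the maturity_requirements table.
DEFAULT_MATURITY = {
    "level_1_snapshots": 10,
    "level_2_snapshots": 30,
    "level_2_failures": 1,
    "level_3_failures": 3,
    "level_3_confidence": 70,
}


def maturity_level(snapshots, failures, rul_confidence, req=DEFAULT_MATURITY):
    """Return 0-3, assuming each level requires all lower-level gates as well."""
    if snapshots < req["level_1_snapshots"]:
        return 0
    if snapshots < req["level_2_snapshots"] or failures < req["level_2_failures"]:
        return 1
    if failures < req["level_3_failures"] or rul_confidence < req["level_3_confidence"]:
        return 2
    return 3
```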

Per-asset-type overrides

Any sub-dict can be overridden per asset_type; fields that are not overridden keep their tenant-level values:

{
  "rul_thresholds": { "critical_hours": 24, "high_hours": 168, "medium_hours": 720 },
  "asset_type_overrides": {
    "critical_pump": { "rul_thresholds": { "critical_hours": 12 } },
    "hvac_chiller":  { "rul_thresholds": { "critical_hours": 48, "high_hours": 336 } }
  }
}
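
Resolution for a given asset type can be sketched as a deep merge of the matching override onto the tenant-level values. The helper names `deep_merge` and `resolve_config` are hypothetical; the sample data is the JSON shown above.

```python
import copy


def deep_merge(base, patch):
    """Return a new dict where nested dicts in patch are merged into base."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_config(config, asset_type=None):
    """Apply the asset_type override (if any) on top of the tenant-level config."""
    overrides = config.get("asset_type_overrides", {})
    base = {k: v for k, v in config.items() if k != "asset_type_overrides"}
    return deep_merge(base, overrides.get(asset_type, {}))


config = {
    "rul_thresholds": {"critical_hours": 24, "high_hours": 168, "medium_hours": 720},
    "asset_type_overrides": {
        "critical_pump": {"rul_thresholds": {"critical_hours": 12}},
        "hvac_chiller": {"rul_thresholds": {"critical_hours": 48, "high_hours": 336}},
    },
}
pump = resolve_config(config, "critical_pump")
```

Here `pump["rul_thresholds"]` keeps high_hours and medium_hours from the tenant level and only tightens critical_hours to 12.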

config_version — audit traceability

Every update_config atomically increments config_version ($inc: {config_version: 1}). New tenants start at 0 (factory defaults).

Each persisted document carries the active config_version at compute time:

  • _asset_health_snapshots: config_version field per snapshot.
  • _asset_prognostics: config_version field per prognostics record.
  • compute_enhanced_prognostics: config_version in response.

Why it matters: if a tenant adjusts ahi_grades three times in 6 months, old snapshots are not rewritten — each keeps its compute-time version. An audit can rebuild exactly which thresholds produced each historical grade.

Callers can't smuggle a config_version: the service strips it in every update_config — the counter is system-owned.
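
The system-owned counter can be illustrated with an in-memory stand-in for the config store. The class `InMemoryConfigStore` is hypothetical and merges only top-level keys for brevity; the real service persists to a database and uses an atomic $inc, which a plain Python dict does not provide.

```python
class InMemoryConfigStore:
    """Toy store showing that config_version is stripped from input and incremented."""

    def __init__(self):
        self._docs = {}

    def get_version(self, tenant_id):
        # New tenants start at 0 (factory defaults).
        return self._docs.get(tenant_id, {}).get("config_version", 0)

    def update_config(self, tenant_id, patch):
        # Drop any caller-supplied version: the counter is owned by the system.
        clean = {k: v for k, v in patch.items() if k != "config_version"}
        doc = self._docs.setdefault(tenant_id, {"config_version": 0})
        doc.update(clean)  # simplified top-level merge for this sketch
        doc["config_version"] += 1
        return doc["config_version"]


store = InMemoryConfigStore()
first = store.update_config("tenant-a", {"ahi_risk_threshold": 65, "config_version": 99})
second = store.update_config("tenant-a", {"ahi_risk_threshold": 60})
```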

Cache

Config is cached in memory (5 min TTL) and in Redis when available. Every update_config invalidates the cache, so changes take effect without a restart.
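
The in-memory layer behaves roughly like the TTL cache sketched below. The class `TTLConfigCache`, its `loader` callback, and the injectable `clock` are hypothetical; the Redis layer is omitted.

```python
import time


class TTLConfigCache:
    """Per-tenant cache with a time-to-live and explicit invalidation."""

    def __init__(self, loader, ttl_seconds=300, clock=time.monotonic):
        self._loader = loader      # called on a miss, e.g. a database read
        self._ttl = ttl_seconds    # 300 s matches the documented 5 min TTL
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # tenant_id -> (stored_at, config)

    def get(self, tenant_id):
        now = self._clock()
        entry = self._entries.get(tenant_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        config = self._loader(tenant_id)
        self._entries[tenant_id] = (now, config)
        return config

    def invalidate(self, tenant_id):
        """Called after every update so the next read reloads fresh config."""
        self._entries.pop(tenant_id, None)
```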

How to use it

Dashboard

  1. Configuration → Predictive Engine.
  2. Adjust AHI weights via sliders (sum auto-normalizes).
  3. Modify grade and RUL thresholds.
  4. Configure per-asset-type overrides in the Per-Type section.
  5. Save: config_version increments.
  6. Reset Defaults — clears the tenant configuration.

Changes are logged in the tenant audit trail with date, user, previous/new values. Combined with config_version, you can answer "which thresholds applied on March 3?" with certainty.

Lowering critical_hours or grade_a may generate more automatic urgent tasks. Review the impact in staging before applying the change in production.

API

GET  /api/v1/predictive-config            # returns resolved config
PATCH /api/v1/predictive-config           # partial update, bumps config_version
POST /api/v1/predictive-config/reset      # clears tenant config

Updates are deep-merge: sending {"rul_thresholds": {"critical_hours": 12}} preserves other rul_thresholds fields.
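
A partial update could be prepared from Python as in the sketch below, using only the standard library. The host `rela.example.com` is a placeholder, authentication is omitted because this page does not describe it, and the helper name `build_patch_request` is hypothetical. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://rela.example.com"  # hypothetical host, replace with your own


def build_patch_request(partial_config, base_url=BASE_URL):
    """Build a PATCH request carrying only the fields that should change."""
    body = json.dumps(partial_config).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/predictive-config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Only critical_hours is sent; deep-merge keeps high_hours and medium_hours.
req = build_patch_request({"rul_thresholds": {"critical_hours": 12}})
# urllib.request.urlopen(req) would send it once authentication is added.
```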

Key benefits

  • One per-tenant panel — tune without code, without deploys.
  • Granularity: weights + grades + thresholds + per-type overrides.
  • Incremental, traceable config_version in every persisted snapshot.
  • 5-min cache + automatic invalidation on updates.
  • System protects its own counter — callers can't smuggle versions.
  • Idempotent reset that doesn't break versioning.

Consumers

Resolved config (defaults + tenant + asset_type override) is consumed by:
