Rela AI Docs

OEE Accuracy

How Rela-ai filters structural biases (dead sensors, shared PLCs, fabricated downtime) so that OEE reflects real operation rather than a cosmetic number.


The OEE a dashboard shows is only useful if it reflects reality. An audit surfaced several places where the traditional computation diverges from real operation: downtime fabricated by heuristics, dead sensors producing a fake 0%, shared PLCs inflating metrics via double-counting, broken calibration hidden behind a 100% cap. This page documents each one and how it was closed.

What is it for?

  • Know whether the OEE number you're looking at is trustworthy or contaminated by misconfiguration.
  • Audit the calculation point by point against the Nakajima standard.
  • Spot dead sensors or shared PLCs silently inflating or deflating the number.

How it works

Each bias source is closed by an explicit filter and, wherever the result can still be affected, a flag in the response JSON. When the calculation is affected, the response says so: status="stale_source", downtime_estimation="heuristic", performance_capped=true. The operator sees the number and knows whether to trust it.

Executive summary

Verifiable OEE: every deviation from the Nakajima standard is covered by an explicit filter and a response-side flag. The operator sees reality, including when reality is contaminated by bad config.

| Before | Now |
|---|---|
| Hardcoded 5 min per downtime event (a 2h stop reported as 5 min) | Real duration from metadata.duration_seconds + downtime_estimation flag |
| Dead sensor produced a fake 0% OEE | Staleness filter + status=stale_source |
| PLC shared across lines was double-counted | asset_id scope in the $match |
| Performance > 100% silently capped | performance_pct_raw + performance_capped flag |
| Buffer/store-and-forward misaligned periods | metadata.timestamp preferred over created_at |
| Orphan configs (asset/source deleted) | 404 validation on configure |
| Threshold changes rewrote history silently | Fire-and-forget audit trail |
| Shifts and PM tanked availability | shift_pattern + subtract_scheduled_maintenance |
| Trend truncated on a transient error | Robust loop + calendar-day windows |
| No Pareto or version stamp | downtime_breakdown + config_version |

Closed findings

Critical (H)

OEE-H1 — Real downtime duration

The legacy computation charged a fixed 5 min per downtime event regardless of its real length. A 2-hour stop reported as a single event counted as 5 min; a 10-second stop that rebounded into 12 events counted as 60 min (12 × 5). The resulting Availability bore no reliable relation to what actually happened on the line.

The resolver now follows a priority order:

  1. Sum of metadata.duration_seconds (first-class, measured).
  2. Sum of metadata.duration_minutes (convenience shortcut).
  3. Legacy 5 min × event_count heuristic (backward compat, flagged).

The response carries downtime_estimation set to measured, heuristic, or none, so the dashboard can warn when the number is approximate.

{
  "downtime_minutes": 90.0,
  "downtime_estimation": "measured"
}
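A minimal Python sketch of the priority order above, assuming each downtime event is a dict with an optional metadata sub-document. Two details are assumptions rather than documented behavior: the order is applied per event (the real resolver may apply it to the period as a whole), and "none" is taken to mean there were no downtime events at all.

```python
from typing import Any

# Legacy heuristic, kept only for backward compatibility (OEE-H1 step 3).
LEGACY_MINUTES_PER_EVENT = 5.0


def resolve_downtime(events: list[dict[str, Any]]) -> tuple[float, str]:
    """Return (downtime_minutes, downtime_estimation) for a period's downtime events."""
    if not events:
        return 0.0, "none"  # assumed meaning of "none": nothing to estimate
    total = 0.0
    used_heuristic = False
    for event in events:
        meta = event.get("metadata") or {}
        if meta.get("duration_seconds") is not None:
            total += float(meta["duration_seconds"]) / 60.0  # 1. measured seconds
        elif meta.get("duration_minutes") is not None:
            total += float(meta["duration_minutes"])  # 2. convenience shortcut
        else:
            total += LEGACY_MINUTES_PER_EVENT  # 3. legacy heuristic, flagged below
            used_heuristic = True
    return total, ("heuristic" if used_heuristic else "measured")
```

For a single 2-hour stop with metadata.duration_seconds = 5400 this yields (90.0, "measured"), the same values as the JSON above; the same stop without any duration field would fall back to 5 min and be flagged "heuristic".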

OEE-H2 — Staleness guard on count source

A count source flagged stale by the sensor_watchdog no longer feeds the calculation. Instead, the response short-circuits with status="stale_source" and a message pointing to the watchdog. This prevents a fake 0% OEE when the sensor has died but the line is still producing.

{
  "status": "stale_source",
  "message": "Count source is stale; OEE not computed."
}
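The guard can be sketched as a check that runs before any aggregation. This assumes the watchdog writes a status of "stale" or "connected" on the source document in _machine_event_sources; the function names are hypothetical and the compute callback stands in for the real calculation.

```python
from typing import Any, Callable, Optional


def stale_source_guard(source_doc: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the short-circuit payload when the count source is stale, else None."""
    # Assumption: sensor_watchdog sets status="stale" (or "connected") on the source.
    if source_doc is not None and source_doc.get("status") == "stale":
        return {
            "status": "stale_source",
            "message": "Count source is stale; OEE not computed.",
        }
    return None


def guarded_calculate(
    source_doc: Optional[dict[str, Any]],
    compute: Callable[[], dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical wrapper: never run the OEE computation against a dead sensor."""
    guarded = stale_source_guard(source_doc)
    if guarded is not None:
        return guarded  # short-circuit instead of reporting a fake 0%
    return compute()
```

Running the guard first means a dead sensor never reaches the division step, so the dashboard receives an explicit status instead of a plausible-looking number.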

OEE-H3 — Asset-id scoping

When two assets share the same count_source_id (typically two lines behind the same PLC), every event was counted once per asset: the production counts behind Performance and Quality were inflated and downtime was doubled. The $match now includes an $or that accepts events with asset_id at the root, inside metadata, or absent (legacy single-asset).
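The scoping filter can be sketched as a plain Python dict suitable for a pymongo $match stage. The event field holding the source reference (source_id below) is a hypothetical name, and the legacy branch assumes "absent" means missing both at the root and inside metadata.

```python
from typing import Any


def build_event_match(asset_id: str, count_source_id: str) -> dict[str, Any]:
    """Build a $match filter that scopes shared-PLC events to a single asset."""
    return {
        # Hypothetical field name for the source reference on each event.
        "source_id": count_source_id,
        "$or": [
            {"asset_id": asset_id},  # asset_id at the document root
            {"metadata.asset_id": asset_id},  # asset_id nested in metadata
            {  # legacy single-asset events carry no asset_id at all
                "asset_id": {"$exists": False},
                "metadata.asset_id": {"$exists": False},
            },
        ],
    }
```

As written, the third branch still matches untagged events for every asset on a shared source, so the double-counting fix relies on producers tagging asset_id on shared PLCs.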

OEE-H4 — Raw performance + performance_capped

A performance above 100% is not a glitch; it's a strong signal of broken calibration (a misconfigured ideal_cycle_time_seconds, or double-counting). Legacy code silently capped it at 100%. The response now exposes both:

{
  "performance_pct": 100.0,
  "performance_pct_raw": 173.6,
  "performance_capped": true
}

The OEE KPI still uses the capped value (Nakajima defines 0-100), but the dashboard can render a "calibration check" badge when raw > 100.
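A sketch of how the three fields and the KPI could be derived from the standard Nakajima definitions (Performance = ideal cycle time × total count / operating time; OEE = A × P × Q). The function names are illustrative, and treating zero operating time as 0% instead of an error is an assumption.

```python
def performance_fields(
    ideal_cycle_time_seconds: float,
    total_count: int,
    operating_minutes: float,
) -> dict[str, object]:
    """Return capped and raw Performance plus the performance_capped flag."""
    operating_seconds = operating_minutes * 60.0
    raw = (
        ideal_cycle_time_seconds * total_count / operating_seconds * 100.0
        if operating_seconds > 0
        else 0.0  # assumption: no operating time means 0%, not an error
    )
    return {
        "performance_pct": min(raw, 100.0),  # capped value feeds the OEE KPI
        "performance_pct_raw": raw,  # uncapped value exposes calibration problems
        "performance_capped": raw > 100.0,
    }


def oee_pct(availability_pct: float, performance_pct: float, quality_pct: float) -> float:
    """OEE = A x P x Q, with each factor expressed as a percentage."""
    return availability_pct * performance_pct * quality_pct / 10_000.0
```

For example, performance_fields(2.0, 3600, 60.0) reports a raw value of 200.0, a capped value of 100.0 and performance_capped set to True, which is exactly the case the calibration badge is meant to surface.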

Medium (M)

OEE-M1 — Real timestamp over created_at

$match prefers metadata.timestamp (ISO) when the event carries one, with created_at as the fallback. This corrects the buffer/store-and-forward bias: a reading that fired at 14:00 but was re-ingested at 14:45 now lands in the 14:00 bucket, not the 14:45 one.
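The same preference can be sketched on the application side as a small helper. It assumes created_at is stored as a timezone-aware datetime and that a naive device timestamp is UTC; both are assumptions. Server-side, an aggregation could express the same rule with $dateFromString and its onNull/onError options falling back to $created_at.

```python
from datetime import datetime, timezone
from typing import Any


def effective_timestamp(event: dict[str, Any]) -> datetime:
    """Prefer the device-side metadata.timestamp; fall back to created_at."""
    raw = (event.get("metadata") or {}).get("timestamp")
    if isinstance(raw, str):
        try:
            # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalise it.
            ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            ts = None  # malformed ISO string: fall back to ingestion time
        if ts is not None:
            # Assumption: naive device timestamps are UTC.
            return ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)
    return event["created_at"]
```

A reading with metadata.timestamp "2026-04-17T14:00:00Z" and created_at 14:45 is bucketed at 14:00; an event without metadata keeps its created_at.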

OEE-M2 — shift_pattern per day

New config field:

{
  "shift_pattern": {
    "hours_by_weekday": {"1": 8, "2": 8, "3": 8, "4": 8, "5": 8}
  }
}

ISO weekday: 1=Monday, 7=Sunday. A plant running Mon-Fri 8h and dark on weekends sees planned=0 min on Saturday, not 0% OEE against a phantom 480-min plan.
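A sketch of how planned minutes could be derived per calendar day from this field, using Python's date.isoweekday() (1 = Monday, 7 = Sunday). The fallback to a flat planned_production_hours when no shift_pattern is configured is an assumption, as is treating a missing weekday as a dark day.

```python
from datetime import date
from typing import Any, Optional


def planned_minutes_for_day(
    day: date,
    shift_pattern: Optional[dict[str, Any]],
    planned_production_hours: float,
) -> float:
    """Planned minutes for one calendar day, honoring shift_pattern when present."""
    if not shift_pattern:
        # Assumption: without a pattern, every day uses the flat planned_production_hours.
        return planned_production_hours * 60.0
    hours_by_weekday = shift_pattern.get("hours_by_weekday", {})
    # Keys are ISO weekday strings "1".."7"; a missing day is treated as dark (0 h).
    return float(hours_by_weekday.get(str(day.isoweekday()), 0)) * 60.0
```

With the Mon-Fri pattern above, Friday 2026-04-17 yields 480.0 planned minutes and Saturday 2026-04-18 yields 0.0.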

OEE-M3 — Scheduled PM subtracts from planned

New flag subtract_scheduled_maintenance: true (default). Entries in _maintenance_plans whose next_due_at falls inside the period reduce planned_minutes instead of counting as downtime. A 4h PM inside an 8h shift no longer reports Availability=50% (as if the PM were a failure); planned becomes 240 min, and Availability stays at 100% if the remaining 4h of real production ran without stops. An opt-out is available for tenants whose internal convention keeps PM in the downtime bucket.
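A sketch of both conventions, reproducing the 4h-PM-in-an-8h-shift example. It assumes each plan carries a PM length in a hypothetical duration_minutes field, and it clips each plan to the period boundaries, which is a refinement beyond "next_due_at inside the period". Returning None when the whole plan is maintenance is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, Optional


def scheduled_maintenance_minutes(
    plans: Iterable[dict[str, Any]], period_start: datetime, period_end: datetime
) -> float:
    """Minutes of scheduled PM overlapping the period (overlapping plans are summed)."""
    total = 0.0
    for plan in plans:
        start = plan["next_due_at"]
        end = start + timedelta(minutes=plan["duration_minutes"])  # hypothetical field
        overlap = (min(end, period_end) - max(start, period_start)).total_seconds()
        if overlap > 0:
            total += overlap / 60.0
    return total


def availability_pct(
    planned_minutes: float,
    downtime_minutes: float,
    maintenance_minutes: float,
    subtract_scheduled_maintenance: bool = True,
) -> Optional[float]:
    if subtract_scheduled_maintenance:
        planned = planned_minutes - maintenance_minutes  # PM shrinks the plan
    else:
        planned = planned_minutes
        downtime_minutes += maintenance_minutes  # opt-out: PM stays in the downtime bucket
    if planned <= 0:
        return None  # assumption: availability is undefined with nothing planned
    return (planned - downtime_minutes) / planned * 100.0
```

With 240 min of PM and no other stops, the default yields 100.0 while the opt-out yields 50.0, the two outcomes described above.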

OEE-M4 — Validate asset_id + count_source_id

POST /configure returns 404 when asset_id matches neither an _assets._id ObjectId nor an asset_code, or when count_source_id isn't present in _machine_event_sources. This prevents orphan configs from producing a silent 0% OEE after the asset or source has been deleted.
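A sketch of the dual lookup against any pymongo-style collection exposing find_one(filter). To keep it free of the bson dependency, ObjectId conversion is passed in (with pymongo you would pass bson.ObjectId). NotFoundError stands in for the HTTP 404, and keying sources by _id is an assumption.

```python
import re
from typing import Any, Callable, Optional, Protocol


class Collection(Protocol):
    """Anything with a pymongo-style find_one(filter) method."""

    def find_one(self, filter: dict[str, Any]) -> Optional[dict[str, Any]]: ...


class NotFoundError(Exception):
    """Hypothetical stand-in for the 404 returned by POST /configure."""


OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def asset_lookup_filter(asset_id: str, to_object_id: Callable[[str], Any]) -> dict[str, Any]:
    """Match _assets._id as an ObjectId (when the string looks like one) or asset_code."""
    branches: list[dict[str, Any]] = [{"asset_code": asset_id}]
    if OBJECT_ID_HEX.fullmatch(asset_id):
        branches.insert(0, {"_id": to_object_id(asset_id)})
    return {"$or": branches}


def validate_oee_config(
    asset_id: str,
    count_source_id: str,
    assets: Collection,
    sources: Collection,
    to_object_id: Callable[[str], Any] = str,
) -> None:
    if assets.find_one(asset_lookup_filter(asset_id, to_object_id)) is None:
        raise NotFoundError(f"asset {asset_id!r} not found")
    # Assumption: sources in _machine_event_sources are keyed by _id.
    if sources.find_one({"_id": count_source_id}) is None:
        raise NotFoundError(f"count source {count_source_id!r} not found")
```

Checking the pattern before converting avoids passing arbitrary asset codes to the ObjectId constructor, which would reject them.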

OEE-M5 — Config mutation audit trail

Any change to one of the 6 regulated fields (planned_production_hours, ideal_cycle_time_seconds, count_source_id, count_metric_field, reject_metric_field, downtime_event_type) writes an entry to _audit_trail recording the actor, the timestamp, and the previous and new snapshots. An auditor can then answer "who moved ideal_cycle_time_seconds from 2.5 to 3.0 on March 3rd?".
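A sketch of the diff-then-write logic. The action value comes from the collections table on this page; changed_fields is a hypothetical convenience field, and "fire-and-forget" is approximated synchronously (a failed write is logged and swallowed) rather than scheduled in the background as the real code may do.

```python
import logging
from datetime import datetime, timezone
from typing import Any, Callable, Optional

REGULATED_FIELDS = (
    "planned_production_hours",
    "ideal_cycle_time_seconds",
    "count_source_id",
    "count_metric_field",
    "reject_metric_field",
    "downtime_event_type",
)

logger = logging.getLogger(__name__)


def build_audit_entry(
    actor: str, previous: dict[str, Any], new: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Return an _audit_trail entry if any regulated field changed, else None."""
    changed = [f for f in REGULATED_FIELDS if previous.get(f) != new.get(f)]
    if not changed:
        return None
    return {
        "action": "oee_config_updated",
        "actor": actor,
        "timestamp": datetime.now(timezone.utc),
        "changed_fields": changed,  # hypothetical convenience field
        "previous": {f: previous.get(f) for f in REGULATED_FIELDS},
        "new": {f: new.get(f) for f in REGULATED_FIELDS},
    }


def record_audit(write: Callable[[dict[str, Any]], Any], entry: Optional[dict[str, Any]]) -> None:
    """A failed audit write is logged and never blocks the config update."""
    if entry is None:
        return
    try:
        write(entry)
    except Exception:
        logger.exception("audit trail write failed; config update kept")
```

Snapshotting only the regulated fields keeps entries small while still answering "from what to what" for every change that can move the KPI.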

OEE-M6 — Robust trend against transient errors

A transient DB error (e.g. a timeout on day 3 of a 7-day trend) no longer truncates the result. Each day is computed in its own try/except; a failed day returns {"date": "...", "status": "error"} as a placeholder and the loop continues. Only a 404 (not configured) is terminal, because it signals a config error rather than a transient one.
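The loop can be sketched as follows. compute_day stands in for the per-day OEE computation, and NotConfiguredError is a hypothetical exception representing the terminal 404; every other exception is treated as transient.

```python
from datetime import date, timedelta
from typing import Any, Callable


class NotConfiguredError(Exception):
    """Hypothetical stand-in for the terminal 404 (asset has no OEE config)."""


def oee_trend(
    compute_day: Callable[[date], dict[str, Any]], end_day: date, days: int
) -> list[dict[str, Any]]:
    results: list[dict[str, Any]] = []
    for offset in range(days - 1, -1, -1):  # oldest day first
        day = end_day - timedelta(days=offset)
        try:
            results.append({"date": day.isoformat(), **compute_day(day)})
        except NotConfiguredError:
            raise  # config error: the other days cannot succeed either
        except Exception:
            # Transient failure: keep a placeholder and continue with the next day.
            results.append({"date": day.isoformat(), "status": "error"})
    return results
```

A timeout on the third day of a 7-day trend ending 2026-04-17 leaves seven entries, with {"date": "2026-04-13", "status": "error"} in third position.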

OEE-M7 — Calendar-day windows

Trend windows are anchored at UTC midnight (00:00 to 24:00) instead of rolling 24h anchored to call time. The label "2026-04-17" now matches exactly the window it represents.
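A minimal sketch of a calendar-day window, assuming the trend works from date objects: the window is the half-open interval from UTC midnight of the labeled date to UTC midnight of the next day, independent of when the call is made.

```python
from datetime import date, datetime, time, timedelta, timezone


def calendar_day_window(day: date) -> tuple[datetime, datetime]:
    """[00:00, 24:00) UTC window for the date the trend label names."""
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return start, start + timedelta(days=1)
```

For date(2026, 4, 17) this returns 2026-04-17T00:00:00+00:00 and 2026-04-18T00:00:00+00:00, the same bounds as period_start and period_end in the full response shape below.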

Low (L)

OEE-L2 — Downtime Pareto breakdown

New downtime_breakdown list on the response, grouped by event_type and sorted descending by minutes:

{
  "downtime_breakdown": [
    {"event_type": "STOP_COMPRESSOR", "event_count": 2, "minutes": 40.0},
    {"event_type": "STOP_CHANGEOVER", "event_count": 3, "minutes": 20.0}
  ]
}

Answers "which stop type cost us the most time?" without a second query.
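A sketch of the grouping in plain Python, assuming each downtime event already carries its resolved length in a hypothetical minutes key (for instance from the OEE-H1 priority order). A MongoDB $group on event_type followed by a $sort on minutes would do the same server-side.

```python
from collections import defaultdict
from typing import Any, Iterable


def downtime_breakdown(events: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group downtime by event_type and sort descending by total minutes."""
    counts: dict[str, int] = defaultdict(int)
    minutes: dict[str, float] = defaultdict(float)
    for event in events:
        counts[event["event_type"]] += 1
        minutes[event["event_type"]] += event["minutes"]  # hypothetical resolved length
    rows = [
        {"event_type": t, "event_count": counts[t], "minutes": minutes[t]} for t in counts
    ]
    return sorted(rows, key=lambda row: row["minutes"], reverse=True)
```

Two compressor stops of 25 and 15 min plus three changeovers of 5, 10 and 5 min reproduce the two rows in the example above.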

OEE-L3 — config_version in the response

Each configure_oee call applies $inc to config_version in MongoDB, and each calculate_oee call stamps the active config_version in the response. A historical OEE value is therefore anchored to the threshold set that produced it, so the audit question "which config produced that 87%?" is trivially answerable.
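A sketch of the two halves, with hypothetical function names. The update document could be passed to pymongo's update_one(filter, update, upsert=True); on a new document, $inc creates config_version with value 1. Rejecting config_version inside the new fields avoids a MongoDB update conflict from touching the same path with both $set and $inc. Defaulting an unversioned config to 0 is an assumption.

```python
from typing import Any


def configure_update(fields: dict[str, Any]) -> dict[str, Any]:
    """Update document for _oee_configs: apply new fields and bump config_version."""
    if "config_version" in fields:
        raise ValueError("config_version is server-managed; do not set it directly")
    return {"$set": dict(fields), "$inc": {"config_version": 1}}


def stamp_config_version(result: dict[str, Any], config: dict[str, Any]) -> dict[str, Any]:
    """Attach the config_version that produced this OEE result."""
    # Assumption: a config that was never versioned reports 0.
    return {**result, "config_version": config.get("config_version", 0)}
```

Because the increment happens inside the same update that changes the fields, the version can never drift from the thresholds it labels.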

Full response shape

{
  "asset_id": "line-01",
  "oee_pct": 78.3,
  "availability_pct": 93.8,
  "performance_pct": 85.0,
  "performance_pct_raw": 85.0,
  "performance_capped": false,
  "quality_pct": 98.2,
  "total_count": 4800,
  "good_count": 4714,
  "reject_count": 86,
  "planned_minutes": 480.0,
  "operating_minutes": 450.0,
  "downtime_minutes": 30.0,
  "downtime_estimation": "measured",
  "downtime_breakdown": [
    {"event_type": "STOP_COMPRESSOR", "event_count": 2, "minutes": 22.0},
    {"event_type": "STOP_CHANGEOVER", "event_count": 1, "minutes": 8.0}
  ],
  "scheduled_maintenance_minutes": 0.0,
  "config_version": 7,
  "period_start": "2026-04-17T00:00:00+00:00",
  "period_end": "2026-04-18T00:00:00+00:00"
}

MongoDB collections touched

| Collection | Use |
|---|---|
| _oee_configs | Per-asset config with monotonic config_version. |
| _machine_events | Single source of events (production + downtime), filtered by asset_id + real timestamp. |
| _machine_event_sources | connected/stale status read by the OEE-H2 guard. |
| _assets | asset_id validation on configure (dual lookup: ObjectId or asset_code). |
| _maintenance_plans | Scheduled windows reducing planned_minutes (OEE-M3). |
| _audit_trail | Fire-and-forget entries with action=oee_config_updated. |

Key benefits

  • Decision-making on real data, not heuristics.
  • Zero fake 0% OEE from dead sensors.
  • Zero double-counting across lines sharing a PLC.
  • Visible calibration: performance_pct_raw > 100 triggers review.
  • Stable trend: labels and windows aligned, transient errors don't truncate.
  • Audit-ready: every threshold mutation leaves a trail, every historical KPI carries its config_version.
