How Rela-ai filters structural biases (dead sensors, shared PLCs, fabricated downtime) so OEE reflects real operation — not a cosmetic number.

OEE Accuracy

The OEE a dashboard shows is only useful if it reflects reality. An audit surfaced several places where the traditional computation diverges from real operation: downtime fabricated by heuristics, dead sensors producing a fake 0%, shared PLCs inflating metrics via double-counting, broken calibration hidden behind a 100% cap. This page documents each one and how it was closed.

What is it for?

Know whether the OEE number you're looking at is trustworthy or contaminated by wrong config.
Audit the calculation point by point against the Nakajima standard.
Spot dead sensors or shared PLCs silently inflating or deflating the number.

How it works

Each bias source is closed by an explicit filter and a response-JSON flag. If the calculation is affected, the response says so: status=stale_source, downtime_estimation=heuristic, performance_capped=true. The operator sees the number AND knows whether to trust it.

Executive summary

Verifiable OEE: every deviation from the Nakajima standard is covered by an explicit filter and a response-side flag. The operator sees reality — including when reality is contaminated by bad config.

Before	Now
Hardcoded 5 min per downtime event (a 2h stop reported as 5 min)	Real duration from `metadata.duration_seconds` + `downtime_estimation` flag
Dead sensor produced a fake 0% OEE	Staleness filter + `status=stale_source`
PLC shared across lines was double-counted	Asset-ID scope in the `$match`
Performance > 100% silently capped	`performance_pct_raw` + `performance_capped` flag
Buffer/store-and-forward misaligned periods	`metadata.timestamp` over `created_at`
Orphan configs (asset/source deleted)	404 validation on configure
Threshold changes rewrote history silently	Fire-and-forget audit trail
Shifts and PM tanked availability	`shift_pattern` + `subtract_scheduled_maintenance`
Trend truncated on a transient error	Robust loop + calendar-day windows
No Pareto or version stamp	`downtime_breakdown` + `config_version`

Closed findings

Critical (H)

OEE-H1 — Real downtime duration

The legacy computation charged a flat 5 min per downtime event regardless of its real length. A 2-hour stop reported as a single event counted as 5 min; a 10-second stop with 12 rebounds counted as 60 min. The resulting Availability bore little relation to reality.

The resolver now follows a priority order:

1. Sum of metadata.duration_seconds (first-class, measured).
2. Sum of metadata.duration_minutes (convenience shortcut).
3. Legacy 5 min × event_count heuristic (backward compat, flagged).

The response carries downtime_estimation with measured, heuristic or none so the dashboard can warn when the number is approximate.

{
  "downtime_minutes": 90.0,
  "downtime_estimation": "measured"
}
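The priority order above can be sketched in a few lines of Python. This is a minimal illustration, not the actual resolver: the function name `resolve_downtime` is hypothetical, and two behaviours are assumptions not stated in the text (the minutes shortcut is labelled "measured", and the first tier that has any data wins for the whole period).

```python
from typing import Iterable, Mapping, Tuple

LEGACY_MINUTES_PER_EVENT = 5.0  # the legacy heuristic described above


def resolve_downtime(events: Iterable[Mapping]) -> Tuple[float, str]:
    """Return (downtime_minutes, downtime_estimation) for downtime events."""
    events = list(events)
    if not events:
        return 0.0, "none"

    metas = [e.get("metadata") or {} for e in events]

    # 1. Measured seconds take priority.
    seconds = [m["duration_seconds"] for m in metas
               if m.get("duration_seconds") is not None]
    if seconds:
        return sum(seconds) / 60.0, "measured"

    # 2. Minutes shortcut (assumption: also reported as "measured").
    minutes = [m["duration_minutes"] for m in metas
               if m.get("duration_minutes") is not None]
    if minutes:
        return float(sum(minutes)), "measured"

    # 3. Legacy fallback: flat 5 min per event, flagged as heuristic.
    return LEGACY_MINUTES_PER_EVENT * len(events), "heuristic"
```

A single event carrying `duration_seconds: 5400` resolves to 90.0 minutes with `"measured"`, matching the sample response above.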

OEE-H2 — Staleness guard on count source

A count source flagged stale by the sensor_watchdog no longer feeds the calculation. Instead the response short-circuits with status="stale_source" and a message pointing to the watchdog. This prevents a fake 0% OEE when the sensor has died but the line is still producing.

{
  "status": "stale_source",
  "message": "Count source is stale; OEE not computed."
}
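A rough sketch of the short-circuit, assuming the source document exposes a `status` field whose value is `"stale"` when the watchdog has flagged it (the field name and the helper `calculate_with_guard` are illustrative, not taken from the codebase):

```python
from typing import Callable, Mapping


def calculate_with_guard(source: Mapping, compute: Callable[[], dict]) -> dict:
    """Skip the OEE computation entirely when the count source is stale."""
    # `source` stands in for a document from _machine_event_sources.
    if source.get("status") == "stale":
        return {
            "status": "stale_source",
            "message": "Count source is stale; OEE not computed.",
        }
    # Only a healthy source reaches the real calculation.
    return compute()
```

The point of returning early is that `compute` never runs on an empty event window, so no 0% figure is ever produced for a dead sensor.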

OEE-H3 — Asset-id scoping

When two assets share the same count_source_id (typical: two lines behind the same PLC), every event was counted once per asset — Performance and Quality symmetrically inflated, downtime doubled. The $match now includes an $or accepting events with asset_id at root, inside metadata, or absent (legacy single-asset).
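One possible shape for that filter, written as a Python dict ready to be passed as a `$match` stage. The `source_id` field name and the helper `build_event_match` are assumptions for illustration; only the three-way `$or` on `asset_id` comes from the description above.

```python
def build_event_match(count_source_id: str, asset_id: str) -> dict:
    """Build a $match filter scoped to one asset behind a shared source."""
    return {
        "source_id": count_source_id,  # assumed field name on the event
        "$or": [
            # asset_id stored at the root of the event
            {"asset_id": asset_id},
            # asset_id stored inside metadata
            {"metadata.asset_id": asset_id},
            # legacy single-asset events that carry no asset_id at all
            {
                "asset_id": {"$exists": False},
                "metadata.asset_id": {"$exists": False},
            },
        ],
    }
```

Events tagged with a different asset fail all three branches, so a second line behind the same PLC no longer contributes to this asset's counts.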

OEE-H4 — Raw performance + `performance_capped`

performance > 100% is not a glitch; it is a strong signal of broken calibration (ideal cycle time misconfigured, or double-counting). Legacy code silently capped it at 100%. The response now exposes both:

{
  "performance_pct": 100.0,
  "performance_pct_raw": 173.6,
  "performance_capped": true
}

The OEE KPI still uses the capped value (Nakajima defines 0-100), but the dashboard can render a "calibration check" badge when raw > 100.
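As a sketch, the three fields can be derived from one raw ratio. The formula below is the standard Performance definition (ideal cycle time × total count ÷ operating time); the function name `compute_performance` and the one-decimal rounding are assumptions.

```python
def compute_performance(total_count: int,
                        ideal_cycle_time_seconds: float,
                        operating_minutes: float) -> dict:
    """Return capped and raw Performance plus the capped flag."""
    if operating_minutes <= 0:
        raw = 0.0  # nothing ran, so there is no rate to report
    else:
        ideal_seconds = total_count * ideal_cycle_time_seconds
        raw = 100.0 * ideal_seconds / (operating_minutes * 60.0)
    return {
        "performance_pct": round(min(raw, 100.0), 1),  # value fed to OEE
        "performance_pct_raw": round(raw, 1),          # uncapped, for diagnosis
        "performance_capped": raw > 100.0,
    }
```

For example, 1000 parts at an ideal 3.0 s each in 40 operating minutes gives a raw value of 125.0, which is capped to 100.0 and flagged.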

Medium (M)

OEE-M1 — Real timestamp over `created_at`

$match prefers metadata.timestamp (ISO) when the event carries one; created_at is the fallback. Corrects the buffer/store-and-forward bias: a reading that fired at 14:00 but was re-ingested at 14:45 now lands in the 14:00 bucket, not 14:45.
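The same preference rule, expressed application-side in Python for clarity (the real filter lives in the `$match`). The helper name `effective_event_time` is hypothetical, and treating a timezone-less string as UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import Mapping


def effective_event_time(event: Mapping) -> datetime:
    """Prefer the device timestamp; fall back to the ingestion time."""
    raw = (event.get("metadata") or {}).get("timestamp")
    if isinstance(raw, str):
        try:
            parsed = datetime.fromisoformat(raw)
        except ValueError:
            parsed = None  # unparseable value: use created_at instead
        if parsed is not None:
            if parsed.tzinfo is None:
                # Assumption: naive timestamps are UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed
    return event["created_at"]
```

A reading with `metadata.timestamp` of 14:00 and `created_at` of 14:45 resolves to 14:00, so it is bucketed where it actually happened.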

OEE-M2 — `shift_pattern` per day

New config field:

{
  "shift_pattern": {
    "hours_by_weekday": {"1": 8, "2": 8, "3": 8, "4": 8, "5": 8}
  }
}

ISO weekday: 1=Monday, 7=Sunday. A plant running Mon-Fri 8h and dark on weekends sees planned=0 min on Saturday, not 0% OEE against a phantom 480-min plan.
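A small sketch of how planned minutes could be derived from this field. The function name and the fallback to `planned_production_hours` when no pattern is configured are assumptions; the weekday keys follow the ISO numbering described above.

```python
from datetime import date
from typing import Mapping, Optional


def planned_minutes_for_day(day: date,
                            shift_pattern: Optional[Mapping],
                            planned_production_hours: float) -> float:
    """Planned production minutes for one calendar day."""
    if not shift_pattern:
        # Assumption: without a pattern, every day uses the flat default.
        return planned_production_hours * 60.0
    hours_by_weekday = shift_pattern.get("hours_by_weekday", {})
    # date.isoweekday() returns 1 for Monday through 7 for Sunday.
    hours = hours_by_weekday.get(str(day.isoweekday()), 0)
    return float(hours) * 60.0
```

With the Mon-Fri pattern above, Friday 2026-04-17 yields 480.0 planned minutes and Saturday 2026-04-18 yields 0.0.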

OEE-M3 — Scheduled PM subtracts from `planned`

New flag subtract_scheduled_maintenance: true (default). _maintenance_plans with next_due_at inside the period reduce planned_minutes instead of counting as downtime. A 4h PM inside an 8h shift no longer reports Availability=50% (as if the PM were a failure); planned becomes 240 min and Availability stays 100% for the remaining 4h of real production. Opt-out available for tenants whose internal convention keeps PM in the downtime bucket.
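The arithmetic behind the 4h PM example can be sketched as follows. The function name is hypothetical, and returning `None` for Availability when nothing is planned is an assumption rather than documented behaviour.

```python
def availability_with_pm(planned_minutes: float,
                         downtime_minutes: float,
                         pm_minutes: float,
                         subtract_scheduled_maintenance: bool = True) -> dict:
    """Availability with scheduled PM either removed from plan or kept as downtime."""
    if subtract_scheduled_maintenance:
        # PM shrinks the plan; it is not a failure.
        planned = max(planned_minutes - pm_minutes, 0.0)
        downtime = downtime_minutes
    else:
        # Opt-out convention: PM stays in the downtime bucket.
        planned = planned_minutes
        downtime = downtime_minutes + pm_minutes
    operating = max(planned - downtime, 0.0)
    availability = round(100.0 * operating / planned, 1) if planned > 0 else None
    return {
        "planned_minutes": planned,
        "operating_minutes": operating,
        "availability_pct": availability,
    }
```

With a 480-min shift, no failures, and a 240-min PM, the default gives planned 240 min and Availability 100.0; with the flag off, the same day reports 50.0.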

OEE-M4 — Validate `asset_id` + `count_source_id`

POST /configure returns 404 when asset_id doesn't exist (neither as _assets._id ObjectId nor asset_code) or when count_source_id isn't present in _machine_event_sources. Prevents orphan configs producing a silent 0% OEE when the asset or source has been deleted.

OEE-M5 — Config mutation audit trail

Any change to the 6 regulated fields (planned_production_hours, ideal_cycle_time_seconds, count_source_id, count_metric_field, reject_metric_field, downtime_event_type) writes an entry to _audit_trail with actor, timestamp, previous and new snapshot. An auditor can answer "who moved ideal_cycle_time_seconds from 2.5 to 3.0 on March 3rd".
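A minimal sketch of how such an entry might be assembled. The six field names and the `oee_config_updated` action come from this page; the helper name `build_audit_entry` and the `changed_fields` key are illustrative additions.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional

REGULATED_FIELDS = (
    "planned_production_hours",
    "ideal_cycle_time_seconds",
    "count_source_id",
    "count_metric_field",
    "reject_metric_field",
    "downtime_event_type",
)


def build_audit_entry(previous: Mapping, new: Mapping, actor: str) -> Optional[dict]:
    """Return an audit document, or None when no regulated field changed."""
    changed = [f for f in REGULATED_FIELDS if previous.get(f) != new.get(f)]
    if not changed:
        return None
    return {
        "action": "oee_config_updated",
        "actor": actor,
        "timestamp": datetime.now(timezone.utc),
        "changed_fields": changed,  # illustrative convenience key
        "previous": {f: previous.get(f) for f in REGULATED_FIELDS},
        "new": {f: new.get(f) for f in REGULATED_FIELDS},
    }
```

Storing both snapshots in full, rather than only the diff, lets an auditor reconstruct the complete threshold set on either side of the change.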

OEE-M6 — Robust trend against transient errors

A transient DB error (e.g. a timeout on day 3 of a 7-day trend) no longer truncates the result. Each day computes in its own try/except; failures return {"date": "...", "status": "error"} as a placeholder and the loop continues. Only a 404 (not-configured) is terminal, because that's a config error, not transient.
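The loop described above might look roughly like this. `NotConfiguredError` is a stand-in for however the service signals a 404, and `compute_day` is a hypothetical callable that computes one day's OEE.

```python
from datetime import date
from typing import Callable, Iterable, List


class NotConfiguredError(Exception):
    """Stand-in for the 404 raised when no OEE config exists."""


def compute_trend(days: Iterable[date],
                  compute_day: Callable[[date], dict]) -> List[dict]:
    """Compute one result per day; a failing day becomes a placeholder."""
    results = []
    for day in days:
        try:
            results.append(compute_day(day))
        except NotConfiguredError:
            # Config error: retrying other days cannot help, so propagate.
            raise
        except Exception:
            # Transient failure: keep the slot and move on to the next day.
            results.append({"date": day.isoformat(), "status": "error"})
    return results
```

A 7-day trend with a timeout on day 3 still returns seven entries, with the third marked `"status": "error"`.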

OEE-M7 — Calendar-day windows

Trend windows are anchored at UTC midnight (00:00 to 24:00) instead of rolling 24h anchored to call time. The label "2026-04-17" now matches exactly the window it represents.
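For illustration, UTC-midnight windows can be generated like this (the helper name `day_windows` and the tuple layout are assumptions):

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import List, Tuple


def day_windows(last_day: date, days: int) -> List[Tuple[str, datetime, datetime]]:
    """Return (label, start, end) per calendar day, oldest first, in UTC."""
    windows = []
    for offset in range(days - 1, -1, -1):
        d = last_day - timedelta(days=offset)
        # Anchor at 00:00 UTC of the labelled day, not at call time.
        start = datetime.combine(d, time.min, tzinfo=timezone.utc)
        windows.append((d.isoformat(), start, start + timedelta(days=1)))
    return windows
```

The window labelled "2026-04-17" runs from 2026-04-17T00:00:00+00:00 to 2026-04-18T00:00:00+00:00, which is the same `period_start`/`period_end` pair shown in the full response shape.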

Low (L)

OEE-L2 — Downtime Pareto breakdown

New downtime_breakdown list on the response, grouped by event_type and sorted descending by minutes:

{
  "downtime_breakdown": [
    {"event_type": "STOP_COMPRESSOR", "event_count": 2, "minutes": 40.0},
    {"event_type": "STOP_CHANGEOVER", "event_count": 3, "minutes": 20.0}
  ]
}

Answers "which stop type cost us the most time?" without a second query.
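A sketch of the grouping, assuming each downtime event carries `event_type` and a measured `metadata.duration_seconds` (the function name and the one-decimal rounding are illustrative):

```python
from collections import defaultdict
from typing import Iterable, List, Mapping


def downtime_breakdown(events: Iterable[Mapping]) -> List[dict]:
    """Group downtime by event_type, sorted by minutes descending."""
    counts = defaultdict(int)
    seconds = defaultdict(float)
    for event in events:
        kind = event["event_type"]
        counts[kind] += 1
        seconds[kind] += event["metadata"]["duration_seconds"]
    rows = [
        {"event_type": kind,
         "event_count": counts[kind],
         "minutes": round(seconds[kind] / 60.0, 1)}
        for kind in counts
    ]
    # Largest contributor first, as a Pareto chart expects.
    rows.sort(key=lambda row: row["minutes"], reverse=True)
    return rows
```

Summing seconds first and converting once per group avoids accumulating rounding error across many short stops.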

OEE-L3 — `config_version` in the response

Each configure_oee call applies $inc to config_version in MongoDB, and each calculate_oee stamps the active config_version in the response. A historical OEE value is therefore anchored to the threshold set that produced it, so the audit question "which config produced that 87%?" is trivially resolvable.
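In MongoDB terms, the two halves could be sketched as below. Both helper names are hypothetical; the only elements taken from this page are the `$inc` on `config_version` and the stamping of that value on the response.

```python
from typing import Mapping


def build_configure_update(fields: Mapping) -> dict:
    """Update document: set the new thresholds and bump the version by one."""
    # config_version must not appear in both $set and $inc, so drop it here.
    to_set = {k: v for k, v in fields.items() if k != "config_version"}
    return {"$set": to_set, "$inc": {"config_version": 1}}


def stamp_response(response: Mapping, config: Mapping) -> dict:
    """Attach the version of the config that produced this result."""
    return {**response, "config_version": config["config_version"]}
```

Because `$inc` is applied atomically by the server, two concurrent configure calls still produce two distinct version numbers.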

Full response shape

{
  "asset_id": "line-01",
  "oee_pct": 78.3,
  "availability_pct": 93.8,
  "performance_pct": 85.0,
  "performance_pct_raw": 85.0,
  "performance_capped": false,
  "quality_pct": 98.2,
  "total_count": 4800,
  "good_count": 4714,
  "reject_count": 86,
  "planned_minutes": 480.0,
  "operating_minutes": 450.0,
  "downtime_minutes": 30.0,
  "downtime_estimation": "measured",
  "downtime_breakdown": [
    {"event_type": "STOP_COMPRESSOR", "event_count": 2, "minutes": 22.0},
    {"event_type": "STOP_CHANGEOVER", "event_count": 1, "minutes": 8.0}
  ],
  "scheduled_maintenance_minutes": 0.0,
  "config_version": 7,
  "period_start": "2026-04-17T00:00:00+00:00",
  "period_end": "2026-04-18T00:00:00+00:00"
}

MongoDB collections touched

Collection	Use
`_oee_configs`	Per-asset config with monotonic `config_version`.
`_machine_events`	Single source of events (production + downtime) filtered by `asset_id` + real timestamp.
`_machine_event_sources`	`connected`/`stale` status read by the OEE-H2 guard.
`_assets`	`asset_id` validation on `configure` (dual lookup: ObjectId or `asset_code`).
`_maintenance_plans`	Scheduled windows reducing `planned_minutes` (OEE-M3).
`_audit_trail`	Fire-and-forget entries with `action=oee_config_updated`.

Key benefits

Decision-making on real data, not heuristics.
Zero fake 0% OEE from dead sensors.
Zero double-counting across lines sharing a PLC.
Visible calibration: performance_pct_raw > 100 triggers review.
Stable trend: labels and windows aligned, transient errors don't truncate.
Audit-ready: every threshold mutation leaves a trail, every historical KPI carries its config_version.
