Temperature monitoring for refrigeration and transport with traceable excursions, configurable grace, and journey/leg attribution for shipments.

Cold Chain

Cold chain is the uninterrupted control of temperature from production to consumption. If a refrigerated asset or shipment leaves the safe band, the product may be compromised. Rela-ai monitors every asset, every batch, and every transport leg, and leaves a verifiable audit trail for regulators.

Executive summary

Cold chain without cry-wolf: every deviation runs through staleness, flapping, and configurable grace filters before it touches the inbox — and ends up attached to the batch and the transport leg that caused it.

Before this iteration, a 40-second door-opening could fire three separate alerts and leave zero information about the affected batch. Now:

Before	Now
Dead sensor sending 0 °C fabricated excursions	CC-H1 filter: `stale` sources are discarded
Cold chain alerted separately from the rest of the system	CC-H2: feed to the alert aggregator with `source_system=cold_chain`
Impossible to know WHICH sensor opened the excursion	CC-H3: `source_id` persisted on every excursion
Door open/close/open = 2 artificial incidents	CC-M2: re-open inside a 5-min flap window
Mutations of `temperature_max` invisible	CC-M5: fire-and-forget audit trail
Rigid "warning to critical" grace	CC-M1: `escalate` or `suppress` (HACCP-style) mode
Impossible to know WHICH batch was inside	CC-L1: `batch_id` and `product_type` on the excursion
Impossible to know WHICH transport leg failed	CC-M6: journey + legs with `excursion_ids`

What it does

Detects out-of-band temperatures in freezers, cold rooms, and vehicles.
Tells a 30-second door opening from a 2-hour compressor failure.
Attributes every breach to the batch and the transport leg responsible.
Produces an immutable audit trail for HACCP, GDP, WHO TRS 961, and FSMA.
Unifies alerts with the rest of the pipeline (maintenance, quality, SPC).

How it works

flowchart LR
  SENSOR[Sensor] --> READING[Reading]
  READING --> STALE{Source stale?}
  STALE -- yes --> DROP[Silent discard]
  STALE -- no --> BAND{Out of band?}
  BAND -- no --> CLEAR[Clear buffers + resolve active]
  BAND -- yes --> ACTIVE{Active excursion?}
  ACTIVE -- yes --> UPDATE[Update peak, duration, sample_count]
  ACTIVE -- no --> FLAP{Flap window 5 min?}
  FLAP -- yes --> REOPEN[Re-open previous excursion]
  FLAP -- no --> MODE{Grace mode?}
  MODE -- escalate --> OPEN[Open at warning, escalate to critical if it lasts]
  MODE -- suppress --> BUFFER[Volatile buffer: promote only if sustained]
  OPEN --> AGG[Alert Aggregator]
  REOPEN --> AGG
  BUFFER --> AGG
  OPEN --> JOURNEY[Attach to active leg if applicable]
  BUFFER --> JOURNEY

Filters and protections

CC-H1 — staleness: a source flagged stale by the sensor watchdog is ignored before evaluating range. A dead sensor sending 0 °C cannot fabricate fake excursions.
CC-M2 — flap window: if a resolved excursion of the same direction reappears within 5 minutes, it is re-opened instead of creating a new one. Prevents fragmenting a real incident.
CC-H2 — aggregator feed: every new or escalated excursion publishes to the alert aggregator with source_system=cold_chain, canonical severity, and excursion_id reference.
CC-M1 — configurable grace:
- escalate (default): every excursion opens at warning, promotes to critical when grace is exceeded.
- suppress (HACCP-style): transients inside the grace window do not persist; an excursion only opens if it outlasts grace, and it opens directly at critical.
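
The order in which these filters apply can be sketched as a single routing function. This is a minimal illustration under stated assumptions, not the service's code: `AssetState`, `route_reading`, the branch names, and the inclusive 5-minute boundary are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FLAP_WINDOW = timedelta(minutes=5)  # mirrors the fixed 5-minute flap window


@dataclass
class AssetState:
    """Hypothetical in-memory view of one asset; not the real schema."""
    temperature_min: float
    temperature_max: float
    grace_mode: str = "escalate"                   # "escalate" or "suppress"
    active_direction: Optional[str] = None         # set while an excursion is open
    last_resolved_direction: Optional[str] = None  # last PERSISTED excursion
    last_resolved_at: Optional[datetime] = None


def route_reading(state: AssetState, temperature: float, now: datetime,
                  source_is_stale: bool) -> str:
    """Return the name of the branch a reading would take."""
    if source_is_stale:
        return "discard"                  # CC-H1: checked before the range
    if state.temperature_min <= temperature <= state.temperature_max:
        return "clear_and_resolve"
    if state.active_direction is not None:
        return "update_active"
    direction = "high" if temperature > state.temperature_max else "low"
    recently_resolved = (
        state.last_resolved_at is not None
        and state.last_resolved_direction == direction
        and now - state.last_resolved_at <= FLAP_WINDOW  # boundary is assumed
    )
    if recently_resolved:
        return "reopen_previous"          # CC-M2: same direction, inside window
    if state.grace_mode == "suppress":
        return "buffer"                   # CC-M1: promote only if sustained
    return "open_warning"                 # CC-M1: escalate mode
```

Keeping the staleness check first is what prevents a dead sensor from ever reaching the range comparison.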

Asset configuration

Dashboard

Assets → your equipment → edit.
Toggle Cold Chain on.
Fill in:
- Min / max temperature: safe limits.
- Unit: °C or °F.
- Grace minutes: 1 to 1440. Up to 24 hours for pharmaceutical shipments (WHO TRS 961 Annex 9, EU GDP).
- Grace mode: escalate or suppress.
- Event source: the source_id providing readings. Validated against _machine_event_sources (CC-M4).
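
A client-side pre-check of these constraints might look like the sketch below. It is an illustration only: `validate_cold_chain_config` is a hypothetical helper, the field names are borrowed from the PATCH payload documented for the API, the `"F"` unit code and the min-below-max rule are assumptions, and the server remains the authority on what is accepted.

```python
ALLOWED_UNITS = {"C", "F"}  # "C" appears in the API example; "F" is assumed
ALLOWED_GRACE_MODES = {"escalate", "suppress"}


def validate_cold_chain_config(cfg: dict, known_source_ids: set) -> list:
    """Return a list of problems; an empty list means the config looks valid."""
    problems = []
    if not cfg.get("cold_chain_enabled"):
        return problems  # assumption: nothing to check while the toggle is off
    # CC-M4: the source must exist among the known event sources.
    if cfg.get("cold_chain_source_id") not in known_source_ids:
        problems.append("cold_chain_source_id is not a known event source")
    low, high = cfg.get("temperature_min"), cfg.get("temperature_max")
    if low is None or high is None or low >= high:
        problems.append("temperature_min must be below temperature_max")
    if cfg.get("temperature_unit") not in ALLOWED_UNITS:
        problems.append("temperature_unit must be C or F")
    grace = cfg.get("excursion_grace_minutes")
    if not isinstance(grace, int) or not 1 <= grace <= 1440:
        problems.append("excursion_grace_minutes must be between 1 and 1440")
    if cfg.get("cold_chain_grace_mode", "escalate") not in ALLOWED_GRACE_MODES:
        problems.append("cold_chain_grace_mode must be escalate or suppress")
    return problems
```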

API

PATCH /api/v1/assets/{asset_id}
{
  "cold_chain_enabled": true,
  "cold_chain_source_id": "src_fridge_01",
  "cold_chain_metric": "temperature",
  "temperature_min": -22,
  "temperature_max": -16,
  "temperature_unit": "C",
  "excursion_grace_minutes": 15,
  "cold_chain_grace_mode": "escalate"
}

CC-M5: any change to the 8 regulated fields (cold_chain_enabled, cold_chain_source_id, cold_chain_metric, temperature_min, temperature_max, temperature_unit, excursion_grace_minutes, cold_chain_grace_mode) writes an entry to _audit_trail with the actor, the timestamp, and the previous and new snapshots. An auditor can answer questions such as "who moved temperature_max from −18 to −12 on March 3rd?".
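
To make the shape of such an entry concrete, here is a minimal sketch of how a before/after comparison over the regulated fields could be built. The function `build_audit_entry` and the keys `changed_fields`, `previous`, and `new` are hypothetical; only the field list and the `asset_cold_chain_updated` action come from this page.

```python
from datetime import datetime, timezone

REGULATED_FIELDS = (
    "cold_chain_enabled", "cold_chain_source_id", "cold_chain_metric",
    "temperature_min", "temperature_max", "temperature_unit",
    "excursion_grace_minutes", "cold_chain_grace_mode",
)


def build_audit_entry(before: dict, after: dict, actor: str):
    """Return an audit entry if any regulated field changed, else None."""
    previous = {f: before.get(f) for f in REGULATED_FIELDS}
    new = {f: after.get(f) for f in REGULATED_FIELDS}
    changed = [f for f in REGULATED_FIELDS if previous[f] != new[f]]
    if not changed:
        return None  # unregulated edits leave no cold chain audit entry
    return {
        "action": "asset_cold_chain_updated",
        "actor": actor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "changed_fields": changed,  # hypothetical convenience key
        "previous": previous,       # snapshot before the change
        "new": new,                 # snapshot after the change
    }
```

Storing both full snapshots, rather than only the delta, lets an auditor reconstruct the complete configuration at any point without replaying earlier entries.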

Reading ingestion

Two paths, same pipeline (same filters, same dedup, same aggregator).

Path 1 — via machine events

Configure an event_source of type http / mqtt / opcua. Every event carrying temperature (or the field configured in cold_chain_metric) triggers check_cold_chain_from_event.
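
As a rough illustration of the metric lookup, the sketch below pulls the configured field out of an event. The flat event shape and the helper name `extract_reading` are assumptions; real events may nest the value differently.

```python
from typing import Optional


def extract_reading(event: dict, metric: str = "temperature") -> Optional[float]:
    """Return the numeric value of the configured metric, or None if absent."""
    value = event.get(metric)
    # Booleans are ints in Python, so exclude them explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)
```

An event that does not carry the configured field yields `None`, which a caller would treat as "nothing to evaluate" rather than as an error.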

Path 2 — direct ingestion `POST /api/v1/cold-chain/readings`

For sensors outside the machine_events pipeline: Bluetooth loggers, handheld probes at receiving, transport recorders that bulk-upload on dock arrival.

POST /api/v1/cold-chain/readings
{
  "source_id": "logger-bt-042",
  "asset_id": "65a...",
  "temperature": -12.4,
  "batch_id": "LOT-2026-03-A",
  "product_type": "mRNA vaccine"
}

If you omit asset_id, the service resolves every asset whose cold_chain_source_id matches and evaluates each.

CC-L1: batch_id and product_type are carried onto the resulting excursion. A breach on a fridge holding insulin at 14:00 and yogurt at 17:00 is two different compliance stories — the asset is the same, the batch inside rotates.
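
A logger-side client could assemble the request as in the sketch below, using only the Python standard library. The base URL, the bearer-token authentication, and the helper name `build_reading_request` are assumptions made for illustration; check your deployment for the actual authentication scheme.

```python
import json
import urllib.request


def build_reading_request(base_url, token, source_id, temperature,
                          asset_id=None, batch_id=None, product_type=None):
    """Build (but do not send) a POST to /api/v1/cold-chain/readings."""
    payload = {"source_id": source_id, "temperature": temperature}
    optional = (("asset_id", asset_id), ("batch_id", batch_id),
                ("product_type", product_type))
    for key, value in optional:
        if value is not None:  # omitted asset_id lets the service resolve assets
            payload[key] = value
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/cold-chain/readings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )
```

Sending is then a matter of passing the result to `urllib.request.urlopen`. Building the request separately keeps the payload easy to inspect when a recorder bulk-uploads many readings on dock arrival.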

Grace modes — `escalate` vs `suppress`

`escalate` (default)

Every excursion is recorded from the first reading. Good for food-service where every door opening matters.

t=0s   Out-of-band reading    -> Excursion opened (severity=warning)
t=5s   Out-of-band reading    -> Update: sample_count=2, peak refreshed
t=30s  In-band reading        -> Excursion resolved

Result: one row in _cold_chain_excursions with duration=0.5 min.

`suppress` (HACCP-style)

Transient excursions are forgiven. Only excursions that outlast grace persist, and they open directly at critical. Good for pharma and GDP where cry-wolf alerting destroys the signal.

t=0s    Out-of-band reading    -> Buffer opened (NO excursion persisted)
t=30s   Out-of-band reading    -> Buffer updates sample_count + peak
t=2min  In-band reading        -> Buffer dropped silently (0 excursions)

If the excursion lasts long enough:

t=0s        Out-of-band         -> Buffer opened
t=15min     Out-of-band         -> elapsed >= grace: promote
                                   persisted excursion with severity=critical,
                                   started_at=t=0s, duration=15min

Switch modes per asset via cold_chain_grace_mode. Default escalate preserves historical behaviour for assets that predated the flag.
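
The timelines above can be replayed with a small simulator. This is a sketch under stated assumptions, not the service's logic: `simulate` is a hypothetical function, readings are `(seconds, temperature)` pairs, peak tracking and the flap window are left out, and "grace exceeded" is interpreted as elapsed >= grace in both modes.

```python
def simulate(readings, temperature_min, temperature_max, grace_minutes, mode):
    """Return the excursions that would be persisted for a reading sequence."""
    persisted = []  # stands in for rows in _cold_chain_excursions
    current = None  # open excursion (escalate) or volatile buffer (suppress)
    for t_seconds, temp in readings:
        out_of_band = temp < temperature_min or temp > temperature_max
        if not out_of_band:
            if current is not None and current["persisted"]:
                current["duration_min"] = (t_seconds - current["started_at"]) / 60
                current["resolved"] = True
            current = None  # an unpromoted buffer is dropped silently
            continue
        if current is None:
            current = {"started_at": t_seconds, "sample_count": 0,
                       "severity": "warning", "resolved": False,
                       "duration_min": 0.0, "persisted": mode == "escalate"}
            if current["persisted"]:
                persisted.append(current)  # escalate records from reading one
        current["sample_count"] += 1
        current["duration_min"] = (t_seconds - current["started_at"]) / 60
        if current["duration_min"] >= grace_minutes:
            current["severity"] = "critical"
            if not current["persisted"]:   # suppress: promote the buffer
                current["persisted"] = True
                persisted.append(current)
    return persisted
```

Feeding it the 30-second `escalate` timeline yields one warning excursion, while the 2-minute `suppress` timeline yields none, which is the practical difference between the two modes.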

Journey / Leg — refrigerated shipments

CC-M6. A journey is an end-to-end shipment. A leg is a custody segment within it (warehouse, then truck, then distribution center, then pharmacy). Hand-offs between legs are the highest-risk moments in cold chain and are where accountability questions arise ("whose truck warmed up?").

Model a shipment

POST /api/v1/cold-chain/journeys
{
  "journey_code": "JRN-2026-03-001",
  "origin": "DC Madrid",
  "destination": "Pharmacy Valencia",
  "batch_id": "LOT-42",
  "product_type": "insulin",
  "legs": [
    {"sequence": 0, "from_location": "DC Madrid",    "to_location": "Cross-dock"},
    {"sequence": 1, "from_location": "Cross-dock",   "to_location": "Truck 7", "asset_id": "65a..."},
    {"sequence": 2, "from_location": "Truck 7",      "to_location": "Pharmacy"}
  ]
}

Lifecycle

stateDiagram-v2
  [*] --> planned
  planned --> in_transit: start_leg (first time)
  in_transit --> in_transit: start_leg / complete_leg
  in_transit --> delivered: explicit PATCH once the final leg is completed
  planned --> cancelled
  in_transit --> cancelled

POST /api/v1/cold-chain/journeys/{id}/legs/{leg_id}/start — marks the leg as active and, on the first activation, stamps actual_start_at on the journey. Subsequent activations do NOT overwrite the original dispatch time.
POST /api/v1/cold-chain/journeys/{id}/legs/{leg_id}/complete — marks the leg as completed with ended_at.
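
The two transitions can be sketched over plain dictionaries as follows. This is an illustration rather than the service's implementation: `start_leg`, `complete_leg`, and the leg fields `leg_id`, `status`, and `started_at` are assumed names. Following the limitation documented on this page, completing the last leg does not move the journey to delivered.

```python
from datetime import datetime, timezone


def _find_leg(journey: dict, leg_id: str) -> dict:
    for leg in journey["legs"]:
        if leg["leg_id"] == leg_id:
            return leg
    raise KeyError(f"unknown leg {leg_id}")


def start_leg(journey: dict, leg_id: str, now=None) -> dict:
    """Activate a leg; stamp actual_start_at only on the first activation."""
    if journey["status"] not in ("planned", "in_transit"):
        raise ValueError(f"cannot start a leg on a {journey['status']} journey")
    now = now or datetime.now(timezone.utc)
    leg = _find_leg(journey, leg_id)
    leg["status"] = "active"
    leg["started_at"] = now                # assumed leg-level field
    if journey.get("actual_start_at") is None:
        journey["actual_start_at"] = now   # never overwritten afterwards
    journey["status"] = "in_transit"
    return journey


def complete_leg(journey: dict, leg_id: str, now=None) -> dict:
    """Complete a leg; the journey status is deliberately left untouched."""
    now = now or datetime.now(timezone.utc)
    leg = _find_leg(journey, leg_id)
    leg["status"] = "completed"
    leg["ended_at"] = now
    return journey
```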

Automatic excursion attribution

When check_excursion opens a new excursion for an asset_id, the service looks for an in_transit journey with an active leg referencing that asset. If found, it appends the excursion_id to that leg's excursion_ids array (legs.$.excursion_ids). The update is fire-and-forget: a journey-side failure never breaks the cold chain pipeline.

The audit answers "which transport leg failed?" without manual joins across collections.
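
A minimal in-memory sketch of that lookup is shown below. It assumes journeys are plain dictionaries with embedded legs and that an active leg carries `status == "active"`; the function name `attach_excursion_to_active_leg` is hypothetical, and the real service performs the update against the journeys collection rather than on Python lists.

```python
import logging

logger = logging.getLogger("cold_chain.journey")


def attach_excursion_to_active_leg(journeys: list, asset_id: str,
                                   excursion_id: str) -> bool:
    """Append excursion_id to the matching active leg; never raise."""
    try:
        for journey in journeys:
            if journey.get("status") != "in_transit":
                continue
            for leg in journey.get("legs", []):
                if leg.get("status") == "active" and leg.get("asset_id") == asset_id:
                    ids = leg.setdefault("excursion_ids", [])
                    if excursion_id not in ids:  # keep the array free of repeats
                        ids.append(excursion_id)
                    return True
    except Exception:
        # Fire-and-forget: log and carry on so excursion handling is unaffected.
        logger.exception("journey attribution failed for %s", excursion_id)
    return False
```

Swallowing the exception here is a deliberate trade-off: attribution is useful context, but losing it is preferable to losing the excursion itself.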

Status dashboard

GET /api/v1/cold-chain/status returns, per asset, the three numbers an operator reads at a glance:

Field	Meaning
`peak_deviation`	How far outside the band (asset unit).
`duration_minutes`	How long it has been outside.
`sample_count`	How many readings back the excursion (CC-L3).

A 1-sample critical is a glitch; a 20-sample critical over 20 minutes is a real event. The third number is what separates real triage from noise.

[
  {
    "id": "65a...",
    "name": "Freezer Unit A",
    "temperature_min": -22,
    "temperature_max": -16,
    "has_excursion": true,
    "peak_deviation": 3.2,
    "duration_minutes": 18.5,
    "sample_count": 37,
    "excursion": { "id": "...", "severity": "critical", "source_id": "src_fridge_01", "batch_id": "LOT-42" }
  }
]
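
For intuition about how the three numbers relate to raw readings, the sketch below derives them from a list of out-of-band samples. It is an assumption-laden illustration: `summarize_excursion` is a hypothetical helper, duration is measured between the first and last sample rather than against the current time, and the rounding to two decimals is arbitrary.

```python
def summarize_excursion(readings, temperature_min, temperature_max):
    """readings: (minutes_since_start, temperature) pairs, all out of band."""
    # Distance outside the band, whichever side the reading falls on.
    deviations = [
        max(temp - temperature_max, temperature_min - temp, 0.0)
        for _, temp in readings
    ]
    return {
        "peak_deviation": round(max(deviations), 2),
        "duration_minutes": readings[-1][0] - readings[0][0],
        "sample_count": len(readings),
    }
```

A high `peak_deviation` backed by a single sample deserves less urgency than a modest deviation backed by dozens, which is why the three values are reported together.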

Endpoints

Method	Path	What it does
`POST`	`/api/v1/cold-chain/readings`	Direct ingestion with `batch_id` / `product_type`.
`GET`	`/api/v1/cold-chain/status`	Per-asset status + peak/duration/sample.
`GET`	`/api/v1/cold-chain/excursions`	History with filters `asset_id`, `resolved`.
`POST`	`/api/v1/cold-chain/journeys`	Create journey + legs.
`GET`	`/api/v1/cold-chain/journeys`	List, filter by `status` and `batch_id`.
`POST`	`/api/v1/cold-chain/journeys/{id}/legs`	Append a leg.
`POST`	`/api/v1/cold-chain/journeys/{id}/legs/{leg_id}/start`	Mark leg `active`.
`POST`	`/api/v1/cold-chain/journeys/{id}/legs/{leg_id}/complete`	Mark leg `completed`.

Real-world scenarios

1. Short door-opening on a meat freezer

Grace mode escalate, grace 15 min. An operator opens the door for 40 seconds. Temperature rises from −20 to −16.5 °C.

t=0s: excursion opened (warning, duration 0).
t=20s: update (sample_count=2, peak 3.5 °C).
t=40s: in-band reading, so the excursion resolves. Duration 0.7 min.
Dashboard shows peak=3.5, duration=0.7, sample_count=2. Operator dismisses.

2. Insulin shipment with a breach on leg 2

Grace mode suppress, grace 60 min. Journey with 3 legs. The truck hits traffic and the temperature stays above 8 °C for 75 minutes.

Leg 2 active, asset = "Truck 7".
t=0: buffer opened (no excursion persisted).
t=60min: the buffer reaches grace and is promoted. The excursion is persisted with started_at = t=0, duration=60min, severity=critical, batch_id=LOT-42, product_type=insulin.
The CC-M6 hook appends excursion_id to leg 2. The audit answers: "the breach happened on the Cross-dock to Truck 7 leg".

3. Dead sensor sending 0 °C

The sensor watchdog flags src_fridge_05 as stale. The reading pipeline receives 0 °C (out of band against a −18 °C upper limit). CC-H1 discards the reading on the first line of check_excursion. No ghost excursion.

Limitations and assumptions

cold_chain_enabled=True requires a valid cold_chain_source_id in _machine_event_sources (CC-M4). If the source is deleted afterwards, the service logs a warning but doesn't block; the validation is only an early warning.
The flap window is fixed at 5 minutes. Adjustable only in code (_COLD_CHAIN_FLAP_WINDOW_MINUTES).
suppress does not apply to flap-reopen: if a PERSISTED excursion resolves and a new one appears in the flap window, it re-opens regardless of mode (the noise call was already made).
journey.status doesn't auto-close when the last leg completes; it requires an explicit PATCH to delivered.

Per-tenant MongoDB collections

Collection	Content
`_cold_chain_excursions`	Persisted excursions (full schema with `batch_id`, `product_type`, `source_id`, `sample_count`, `peak_deviation`, `duration_minutes`).
`_cold_chain_excursion_buffers`	Volatile `suppress`-mode buffers. Dropped on in-band resolve or on promotion.
`_cold_chain_journeys`	Shipments with embedded legs.
`_audit_trail`	Fire-and-forget entries with `action=asset_cold_chain_updated`.

Findings closed in this iteration

ID	What was closed
CC-H1	Staleness filter before range check.
CC-H2	Aggregator feed with dedicated source_system.
CC-H3	`source_id` persisted on excursion (propagated from webhook).
CC-M1	`cold_chain_grace_mode` flag with HACCP-style mode.
CC-M2	5-min flap window for consecutive-excursion reopen.
CC-M3	`POST /readings` endpoint for ingestion without `machine_events`.
CC-M4	`cold_chain_source_id` validation on asset create/update.
CC-M5	Audit trail on mutations of the 8 regulated fields.
CC-M6	Journey + Leg with automatic attribution.
CC-L1	`batch_id` + `product_type` persisted on excursion.
CC-L2	`excursion_grace_minutes` range widened to 1 through 1440.
CC-L3	`peak_deviation`, `duration_minutes`, `sample_count` on status.

Key benefits

Verifiable audit trail for HACCP, GDP, WHO TRS 961, FSMA.
Zero cry-wolf: staleness + flapping + grace mode remove structural noise.
Real attribution: every breach knows its batch, product, and leg.
Unified inbox: same inbox as maintenance and quality; the operator never switches screens.
Pharma-ready: grace up to 24 h, suppress mode, batch tracking, journey model.
