Cold Chain

Temperature monitoring for refrigeration and transport with traceable excursions, configurable grace, and journey/leg attribution for shipments.

Cold chain is the uninterrupted temperature control of a product from production to consumption. If a refrigerated asset or shipment leaves the safe band, the product may be compromised. Rela AI monitors every asset, every batch, and every transport leg, and leaves a verifiable audit trail for regulators.

Executive summary

Cold chain without cry-wolf: every deviation passes through staleness, flap, and configurable grace filters before it reaches the inbox, and each one is attached to the batch and the transport leg it occurred on.

Before this iteration, a 40-second door-opening could fire three separate alerts and leave zero information about the affected batch. Now:

Before | Now
--- | ---
Dead sensor sending 0 °C fabricated fake excursions | CC-H1 filter: stale sources are discarded
Cold chain alerted separately from the rest of the system | CC-H2: feed to the alert aggregator with source_system=cold_chain
Impossible to know WHICH sensor opened the excursion | CC-H3: source_id persisted on every excursion
Door open/close/open = 2 artificial incidents | CC-M2: re-open inside a 5-min flap window
Mutations of temperature_max invisible | CC-M5: fire-and-forget audit trail
Rigid "warning to critical" grace | CC-M1: escalate or suppress (HACCP-style) mode
Impossible to know WHICH batch was inside | CC-L1: batch_id and product_type on the excursion
Impossible to know WHICH transport leg failed | CC-M6: journey + legs with excursion_ids

What it does

  • Detects out-of-band temperatures in freezers, cold rooms, and vehicles.
  • Tells a 30-second door opening from a 2-hour compressor failure.
  • Attributes every breach to the batch and the transport leg responsible.
  • Produces an immutable audit trail for HACCP, GDP, WHO TRS 961, and FSMA.
  • Unifies alerts with the rest of the pipeline (maintenance, quality, SPC).

How it works

flowchart LR
  SENSOR[Sensor] --> READING[Reading]
  READING --> STALE{Source stale?}
  STALE -- yes --> DROP[Silent discard]
  STALE -- no --> BAND{Out of band?}
  BAND -- no --> CLEAR[Clear buffers + resolve active]
  BAND -- yes --> ACTIVE{Active excursion?}
  ACTIVE -- yes --> UPDATE[Update peak, duration, sample_count]
  ACTIVE -- no --> FLAP{Flap window 5 min?}
  FLAP -- yes --> REOPEN[Re-open previous excursion]
  FLAP -- no --> MODE{Grace mode?}
  MODE -- escalate --> OPEN[Open at warning, escalate to critical if it lasts]
  MODE -- suppress --> BUFFER[Volatile buffer: promote only if sustained]
  OPEN --> AGG[Alert Aggregator]
  REOPEN --> AGG
  BUFFER --> AGG
  OPEN --> JOURNEY[Attach to active leg if applicable]
  BUFFER --> JOURNEY

Filters and protections

  1. CC-H1 — staleness: a source flagged stale by the sensor watchdog is ignored before evaluating range. A dead sensor sending 0 °C cannot fabricate fake excursions.
  2. CC-M2 — flap window: if a resolved excursion of the same direction reappears within 5 minutes, it is re-opened instead of creating a new one. Prevents fragmenting a real incident.
  3. CC-H2 — aggregator feed: every new or escalated excursion publishes to the alert aggregator with source_system=cold_chain, canonical severity, and excursion_id reference.
  4. CC-M1 — configurable grace:
    • escalate (default): every excursion opens at warning, promotes to critical when grace is exceeded.
    • suppress (HACCP-style): transients inside the grace window do not persist; an excursion only opens if it outlasts grace, and it opens directly at critical.
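
The order in which these filters apply can be sketched as a single decision function. This is a minimal illustration, not the service's implementation: the `LastExcursion` class, the `decide` function, and the returned action names are hypothetical, and only the 5-minute flap window, the staleness-first ordering, and the two grace modes come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FLAP_WINDOW = timedelta(minutes=5)  # fixed window described above


@dataclass
class LastExcursion:
    """Hypothetical summary of the asset's most recent excursion."""
    direction: str                    # "high" or "low"
    resolved_at: Optional[datetime]   # None while the excursion is still active


def decide(reading: float, low: float, high: float, source_is_stale: bool,
           last: Optional[LastExcursion], now: datetime, grace_mode: str) -> str:
    """Return the action taken for one reading (action names are illustrative)."""
    if source_is_stale:
        return "drop"            # CC-H1: ignored before the range check
    if low <= reading <= high:
        return "clear"           # in band: clear buffers, resolve active excursion
    direction = "high" if reading > high else "low"
    if last is not None and last.resolved_at is None:
        return "update"          # active excursion: refresh peak and duration
    if (last is not None and last.direction == direction
            and now - last.resolved_at <= FLAP_WINDOW):
        return "reopen"          # CC-M2: same direction inside the flap window
    # CC-M1: the grace mode decides how a brand-new excursion starts.
    return "open_warning" if grace_mode == "escalate" else "buffer"
```

Note that the flap check runs before the grace mode is consulted, which matches the limitation that suppress does not apply to flap-reopen.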

Asset configuration

Dashboard

  1. Assets → your equipment → edit.
  2. Toggle Cold Chain on.
  3. Fill in:
    • Min / max temperature: safe limits.
    • Unit: °C or °F.
    • Grace minutes: 1 to 1440. Up to 24 hours for pharmaceutical shipments (WHO TRS 961 Annex 9, EU GDP).
    • Grace mode: escalate or suppress.
    • Event source: the source_id providing readings. Validated against _machine_event_sources (CC-M4).
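
Before saving, a client may want to check the same constraints locally. The sketch below is an assumption-laden example: `validate_cold_chain_config` is a hypothetical helper, the rule that the minimum must be below the maximum is assumed rather than documented, and the unit codes "C" and "F" are inferred from the API example. The 1 to 1440 range, the two grace modes, and the source check (CC-M4) come from the list above.

```python
ALLOWED_UNITS = {"C", "F"}                       # assumed wire values for °C / °F
ALLOWED_GRACE_MODES = {"escalate", "suppress"}


def validate_cold_chain_config(cfg: dict, known_source_ids: set) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    # Assumption: the safe band must be non-empty (min strictly below max).
    if cfg["temperature_min"] >= cfg["temperature_max"]:
        errors.append("temperature_min must be below temperature_max")
    if cfg["temperature_unit"] not in ALLOWED_UNITS:
        errors.append("temperature_unit must be C or F")
    # Documented range: 1 to 1440 minutes (24 hours).
    if not 1 <= cfg["excursion_grace_minutes"] <= 1440:
        errors.append("excursion_grace_minutes must be between 1 and 1440")
    if cfg.get("cold_chain_grace_mode", "escalate") not in ALLOWED_GRACE_MODES:
        errors.append("cold_chain_grace_mode must be escalate or suppress")
    # CC-M4: the source must exist among the tenant's event sources.
    if cfg["cold_chain_source_id"] not in known_source_ids:
        errors.append("cold_chain_source_id is not a known event source")
    return errors
```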

API

PATCH /api/v1/assets/{asset_id}
{
  "cold_chain_enabled": true,
  "cold_chain_source_id": "src_fridge_01",
  "cold_chain_metric": "temperature",
  "temperature_min": -22,
  "temperature_max": -16,
  "temperature_unit": "C",
  "excursion_grace_minutes": 15,
  "cold_chain_grace_mode": "escalate"
}

CC-M5: any change to the 8 regulated fields (cold_chain_enabled, cold_chain_source_id, cold_chain_metric, temperature_min, temperature_max, temperature_unit, excursion_grace_minutes, cold_chain_grace_mode) writes an entry to _audit_trail with actor, timestamp, previous and new snapshot. An auditor can answer "who moved temperature_max from −18 to −12 on March 3rd".
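
A rough sketch of how such an entry could be derived from the asset before and after an update. The function name `build_audit_entry` and the keys `changed_fields`, `previous`, and `new` are hypothetical; the eight field names and the action value `asset_cold_chain_updated` are taken from this page.

```python
from datetime import datetime, timezone
from typing import Optional

REGULATED_FIELDS = (
    "cold_chain_enabled", "cold_chain_source_id", "cold_chain_metric",
    "temperature_min", "temperature_max", "temperature_unit",
    "excursion_grace_minutes", "cold_chain_grace_mode",
)


def build_audit_entry(actor: str, before: dict, after: dict) -> Optional[dict]:
    """Return an audit entry if any regulated field changed, else None."""
    previous = {f: before.get(f) for f in REGULATED_FIELDS}
    new = {f: after.get(f) for f in REGULATED_FIELDS}
    changed = [f for f in REGULATED_FIELDS if previous[f] != new[f]]
    if not changed:
        return None  # edits to unregulated fields (e.g. the name) are not audited here
    return {
        "action": "asset_cold_chain_updated",
        "actor": actor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "changed_fields": changed,   # hypothetical convenience key
        "previous": previous,        # snapshot before the change
        "new": new,                  # snapshot after the change
    }
```

Storing both snapshots, rather than only the changed values, is what lets an auditor reconstruct the full configuration in force at any point in time.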

Reading ingestion

Two paths, same pipeline (same filters, same dedup, same aggregator).

Path 1 — via machine events

Configure an event_source of type http / mqtt / opcua. Every event carrying temperature (or the field configured in cold_chain_metric) triggers check_cold_chain_from_event.

Path 2 — direct ingestion POST /api/v1/cold-chain/readings

For sensors outside the machine_events pipeline: Bluetooth loggers, handheld probes at receiving, transport recorders that bulk-upload on dock arrival.

POST /api/v1/cold-chain/readings
{
  "source_id": "logger-bt-042",
  "asset_id": "65a...",
  "temperature": -12.4,
  "batch_id": "LOT-2026-03-A",
  "product_type": "mRNA vaccine"
}

If you omit asset_id, the service resolves every asset whose cold_chain_source_id matches and evaluates each.

CC-L1: batch_id and product_type are carried onto the resulting excursion. A breach on a fridge holding insulin at 14:00 and yogurt at 17:00 is two different compliance stories — the asset is the same, the batch inside rotates.
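
The fan-out and the batch carry-over can be pictured as follows. This is an in-memory sketch under stated assumptions: `resolve_targets` is a hypothetical name, assets are plain dicts with an `id` key, and filtering on `cold_chain_enabled` during fan-out is assumed rather than documented.

```python
def resolve_targets(reading: dict, assets: list) -> list:
    """Return one evaluation target per asset the reading applies to."""
    if reading.get("asset_id"):
        # Explicit asset: evaluate only that one.
        matched = [a for a in assets if a["id"] == reading["asset_id"]]
    else:
        # No asset_id: every asset bound to this source is evaluated.
        matched = [
            a for a in assets
            if a.get("cold_chain_enabled")  # assumption: disabled assets are skipped
            and a.get("cold_chain_source_id") == reading["source_id"]
        ]
    return [
        {
            "asset_id": a["id"],
            "source_id": reading["source_id"],
            "temperature": reading["temperature"],
            # CC-L1: batch context travels with the reading onto any excursion.
            "batch_id": reading.get("batch_id"),
            "product_type": reading.get("product_type"),
        }
        for a in matched
    ]
```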

Grace modes — escalate vs suppress

escalate (default)

Every excursion is recorded from the first reading. Good for food-service where every door opening matters.

t=0s   Out-of-band reading    -> Excursion opened (severity=warning)
t=5s   Out-of-band reading    -> Update: sample_count=2, peak refreshed
t=30s  In-band reading        -> Excursion resolved

Result: one row in _cold_chain_excursions with duration=0.5 min.

suppress (HACCP-style)

Transient excursions are forgiven. Only excursions that outlast grace persist, and they open directly at critical. Good for pharma and GDP where cry-wolf alerting destroys the signal.

t=0s    Out-of-band reading    -> Buffer opened (NO excursion persisted)
t=30s   Out-of-band reading    -> Buffer updates sample_count + peak
t=2min  In-band reading        -> Buffer dropped silently (0 excursions)

If the excursion lasts long enough:

t=0s        Out-of-band         -> Buffer opened
t=15min     Out-of-band         -> elapsed >= grace: promote
                                   persisted excursion with severity=critical,
                                   started_at=t=0s, duration=15min

Switch modes per asset via cold_chain_grace_mode. Default escalate preserves historical behaviour for assets that predated the flag.
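
The three timelines above can be replayed with a small simulation. It is a sketch only: `simulate` and its dict keys are hypothetical, peak tracking is omitted for brevity, and using `>=` for the escalate promotion is an assumption (the page states `elapsed >= grace` only for suppress).

```python
def simulate(samples, low, high, grace_minutes, mode):
    """Replay (seconds, temperature) samples; return the persisted excursions."""
    persisted = []
    current = None  # open excursion (escalate) or volatile buffer (suppress)
    for t, temp in samples:
        if low <= temp <= high:
            if current is not None and current["persisted"]:
                current["duration_min"] = (t - current["started_at"]) / 60
                current["resolved"] = True
            current = None  # a suppress buffer is simply dropped here
            continue
        if current is None:
            current = {"started_at": t, "sample_count": 0, "severity": "warning",
                       "duration_min": 0.0, "resolved": False,
                       "persisted": mode == "escalate"}
            if current["persisted"]:
                persisted.append(current)  # escalate: recorded from the first reading
        current["sample_count"] += 1
        current["duration_min"] = (t - current["started_at"]) / 60
        if current["duration_min"] >= grace_minutes:
            current["severity"] = "critical"
            if not current["persisted"]:
                current["persisted"] = True  # suppress: promote, opens at critical
                persisted.append(current)
    return persisted


# Escalate timeline: one warning row lasting 0.5 min.
rows = simulate([(0, -12), (5, -12), (30, -18)], -22, -16, 15, "escalate")
# Suppress timeline with a 2-minute transient: nothing persisted.
none = simulate([(0, -12), (30, -12), (120, -18)], -22, -16, 15, "suppress")
```

In the suppress case the promoted excursion keeps the buffer's original start time, so the recorded duration covers the whole out-of-band period rather than only the part after promotion.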

Journey / Leg — refrigerated shipments

CC-M6. A journey is an end-to-end shipment. A leg is one custody segment of that shipment (for example warehouse, truck, distribution center, pharmacy). Hand-offs between legs are the highest-risk moments in the cold chain and are where accountability questions arise ("whose truck warmed up?").

Model a shipment

POST /api/v1/cold-chain/journeys
{
  "journey_code": "JRN-2026-03-001",
  "origin": "DC Madrid",
  "destination": "Pharmacy Valencia",
  "batch_id": "LOT-42",
  "product_type": "insulin",
  "legs": [
    {"sequence": 0, "from_location": "DC Madrid",    "to_location": "Cross-dock"},
    {"sequence": 1, "from_location": "Cross-dock",   "to_location": "Truck 7", "asset_id": "65a..."},
    {"sequence": 2, "from_location": "Truck 7",      "to_location": "Pharmacy"}
  ]
}

Lifecycle

stateDiagram-v2
  [*] --> planned
  planned --> in_transit: start_leg (first time)
  in_transit --> in_transit: start_leg / complete_leg
  in_transit --> delivered: explicit PATCH after final leg completed
  planned --> cancelled
  in_transit --> cancelled

  • POST /api/v1/cold-chain/journeys/{id}/legs/{leg_id}/start — marks the leg as active and, on the first activation, stamps actual_start_at on the journey. Subsequent activations do NOT overwrite the original dispatch.
  • POST /api/v1/cold-chain/journeys/{id}/legs/{leg_id}/complete — marks the leg as completed with ended_at.
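
The two transitions can be sketched on an in-memory journey. The leg keys `leg_id`, `status`, and `started_at` and the helper names are hypothetical; `actual_start_at`, `ended_at`, and the journey states come from this page.

```python
from datetime import datetime, timezone


def _find_leg(journey: dict, leg_id: str) -> dict:
    for leg in journey["legs"]:
        if leg["leg_id"] == leg_id:
            return leg
    raise KeyError(leg_id)


def start_leg(journey: dict, leg_id: str, now=None) -> dict:
    if journey["status"] not in ("planned", "in_transit"):
        raise ValueError(f"cannot start a leg on a {journey['status']} journey")
    now = now or datetime.now(timezone.utc)
    leg = _find_leg(journey, leg_id)
    leg["status"] = "active"
    leg["started_at"] = now
    # Only the first activation stamps the dispatch time; later ones keep it.
    if journey.get("actual_start_at") is None:
        journey["actual_start_at"] = now
    journey["status"] = "in_transit"
    return journey


def complete_leg(journey: dict, leg_id: str, now=None) -> dict:
    if journey["status"] != "in_transit":
        raise ValueError(f"cannot complete a leg on a {journey['status']} journey")
    leg = _find_leg(journey, leg_id)
    leg["status"] = "completed"
    leg["ended_at"] = now or datetime.now(timezone.utc)
    # The journey status is left untouched, even for the last leg.
    return journey
```

Leaving the journey status alone in `complete_leg` mirrors the limitation noted further down this page: completing the last leg does not move the journey to delivered by itself.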

Automatic excursion attribution

When check_excursion opens a new excursion for an asset_id, the service looks for an in_transit journey with an active leg referencing that asset. If found, it appends the excursion_id to the leg's legs.$.excursion_ids array. Fire-and-forget: a journey-side failure never breaks the cold chain pipeline.

The audit answers "which transport leg failed?" without manual joins across collections.
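
A minimal in-memory sketch of that hook, assuming journeys are plain dicts and that an active leg carries a hypothetical `status` of "active". The real service updates a MongoDB document; the list manipulation here only stands in for that write.

```python
import logging
from typing import Optional

logger = logging.getLogger("cold_chain.journeys")


def attach_excursion_to_leg(journeys: list, asset_id: str,
                            excursion_id: str) -> Optional[str]:
    """Append excursion_id to the active leg for asset_id; never raise."""
    try:
        for journey in journeys:
            if journey.get("status") != "in_transit":
                continue
            for leg in journey.get("legs", []):
                if leg.get("status") == "active" and leg.get("asset_id") == asset_id:
                    leg.setdefault("excursion_ids", []).append(excursion_id)
                    return journey.get("journey_code")
    except Exception:
        # Fire-and-forget: log and carry on so the excursion itself still persists.
        logger.exception("journey attribution failed for excursion %s", excursion_id)
    return None
```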

Status dashboard

GET /api/v1/cold-chain/status returns, per asset, the three numbers an operator reads at a glance:

Field | Meaning
--- | ---
peak_deviation | How far outside the band (asset unit).
duration_minutes | How long it has been outside.
sample_count | How many readings back the excursion (CC-L3).

A 1-sample critical is a glitch; a 20-sample critical over 20 minutes is a real event. The third number is what separates real triage from noise.

[
  {
    "id": "65a...",
    "name": "Freezer Unit A",
    "temperature_min": -22,
    "temperature_max": -16,
    "has_excursion": true,
    "peak_deviation": 3.2,
    "duration_minutes": 18.5,
    "sample_count": 37,
    "excursion": { "id": "...", "severity": "critical", "source_id": "src_fridge_01", "batch_id": "LOT-42" }
  }
]
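
How the three numbers relate to raw readings can be illustrated as follows. This is a sketch, not the service's computation: `summarize_excursion` is a hypothetical helper, the rounding to one decimal is assumed, and the band used in the usage line is made up for the example.

```python
def summarize_excursion(samples, low, high, now_s):
    """Summarize (seconds, temperature) samples against the band [low, high]."""
    out = [(t, temp) for t, temp in samples if temp < low or temp > high]
    if not out:
        return {"has_excursion": False}
    # Distance outside the band: below the minimum or above the maximum.
    peak = max(max(low - temp, temp - high) for _, temp in out)
    started = min(t for t, _ in out)
    return {
        "has_excursion": True,
        "peak_deviation": round(peak, 1),                      # asset unit
        "duration_minutes": round((now_s - started) / 60, 1),
        "sample_count": len(out),
    }


# Two out-of-band samples against a hypothetical -25..-20 band, read at t=40s.
summary = summarize_excursion([(0, -18.0), (20, -16.5)], -25, -20, 40)
```

With only two samples behind it, a result like this is the kind an operator would reasonably treat as a brief transient rather than a sustained failure.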

Endpoints

Method | Path | What it does
--- | --- | ---
POST | /api/v1/cold-chain/readings | Direct ingestion with batch_id / product_type.
GET | /api/v1/cold-chain/status | Per-asset status + peak/duration/sample.
GET | /api/v1/cold-chain/excursions | History with filters asset_id, resolved.
POST | /api/v1/cold-chain/journeys | Create journey + legs.
GET | /api/v1/cold-chain/journeys | List, filter by status and batch_id.
POST | /api/v1/cold-chain/journeys/{id}/legs | Append a leg.
POST | /api/v1/cold-chain/journeys/{id}/legs/{leg_id}/start | Mark leg active.
POST | /api/v1/cold-chain/journeys/{id}/legs/{leg_id}/complete | Mark leg completed.

Real-world scenarios

1. Short door-opening on a meat freezer

Grace mode escalate, grace 15 min. An operator opens the door for 40 seconds. Temperature rises from −20 to −16.5 °C.

  • t=0s: excursion opened (warning, duration 0).
  • t=20s: update (sample_count=2, peak 3.5 °C).
  • t=40s: in-band reading, so the excursion resolves. Duration 0.7 min.
  • Dashboard shows peak=3.5, duration=0.7, sample_count=2. Operator dismisses.

2. Insulin shipment with a breach on leg 2

Grace mode suppress, grace 60 min. Journey with 3 legs. Truck hits traffic and temperature exceeds 8 °C for 75 minutes.

  • Leg 2 active, asset = "Truck 7".
  • t=0: buffer opened (no excursion persisted).
  • t=60min: the buffer reaches the grace limit and is promoted. The excursion is persisted with started_at = t=0, duration=60min, severity=critical, batch_id=LOT-42, product_type=insulin.
  • The CC-M6 hook appends excursion_id to leg 2. Audit answers: "the breach happened on the Cross-dock to Truck 7 leg".

3. Dead sensor sending 0 °C

The sensor watchdog flags src_fridge_05 as stale. The reading pipeline receives 0 °C (out of band against a −18 °C maximum). CC-H1 discards the reading on the first line of check_excursion. No ghost excursion.

Limitations and assumptions

  • cold_chain_enabled=True requires a valid cold_chain_source_id in _machine_event_sources (CC-M4). If the source is deleted afterwards, the service logs a warning but doesn't block; the validation is an early warning at save time, not an ongoing guarantee.
  • The flap window is fixed at 5 minutes. Adjustable only in code (_COLD_CHAIN_FLAP_WINDOW_MINUTES).
  • suppress does not apply to flap-reopen: if a PERSISTED excursion resolves and a new one appears in the flap window, it re-opens regardless of mode (the noise call was already made).
  • journey.status doesn't auto-close when the last leg completes; it requires an explicit PATCH to delivered.

Per-tenant MongoDB collections

Collection | Content
--- | ---
_cold_chain_excursions | Persisted excursions (full schema with batch_id, product_type, source_id, sample_count, peak_deviation, duration_minutes).
_cold_chain_excursion_buffers | Volatile suppress-mode buffers. Dropped on in-band resolve or on promotion.
_cold_chain_journeys | Shipments with embedded legs.
_audit_trail | Fire-and-forget entries with action=asset_cold_chain_updated.

Findings closed in this iteration

ID | What was closed
--- | ---
CC-H1 | Staleness filter before range check.
CC-H2 | Aggregator feed with dedicated source_system.
CC-H3 | source_id persisted on excursion (propagated from webhook).
CC-M1 | cold_chain_grace_mode flag with HACCP-style mode.
CC-M2 | 5-min flap window for consecutive-excursion reopen.
CC-M3 | POST /readings endpoint for ingestion without machine_events.
CC-M4 | cold_chain_source_id validation on asset create/update.
CC-M5 | Audit trail on mutations of the 8 regulated fields.
CC-M6 | Journey + Leg with automatic attribution.
CC-L1 | batch_id + product_type persisted on excursion.
CC-L2 | excursion_grace_minutes range widened to 1 through 1440.
CC-L3 | peak_deviation, duration_minutes, sample_count on status.

Key benefits

  • Verifiable audit trail for HACCP, GDP, WHO TRS 961, FSMA.
  • Zero cry-wolf: staleness + flapping + grace mode remove structural noise.
  • Real attribution: every breach knows its batch, product, and leg.
  • Unified inbox: same inbox as maintenance and quality; the operator never switches screens.
  • Pharma-ready: grace up to 24 h, suppress mode, batch tracking, journey model.
