Cold Chain
Temperature monitoring for refrigeration and transport with traceable excursions, configurable grace, and journey/leg attribution for shipments.
Cold chain is the uninterrupted temperature control from production to consumption. If a refrigerated asset or shipment leaves the safe band, the product may be compromised. Rela-ai monitors every asset, every batch, and every transport leg — and leaves a verifiable audit trail for regulators.
Executive summary
Cold chain without cry-wolf: every deviation runs through staleness, flapping, and configurable grace filters before it touches the inbox — and ends up attached to the batch and the transport leg that caused it.
Before this iteration, a 40-second door-opening could fire three separate alerts and leave zero information about the affected batch. Now:
| Before | Now |
|---|---|
| Dead sensor sending 0 °C fabricated excursions | CC-H1 filter: stale sources are discarded |
| Cold chain alerted separately from the rest of the system | CC-H2: feed to the alert aggregator with source_system=cold_chain |
| Impossible to know WHICH sensor opened the excursion | CC-H3: source_id persisted on every excursion |
| Door open/close/open = 2 artificial incidents | CC-M2: re-open inside a 5-min flap window |
| Mutations of temperature_max invisible | CC-M5: fire-and-forget audit trail |
| Rigid "warning to critical" grace | CC-M1: escalate or suppress (HACCP-style) mode |
| Impossible to know WHICH batch was inside | CC-L1: batch_id and product_type on the excursion |
| Impossible to know WHICH transport leg failed | CC-M6: journey + legs with excursion_ids |
What it does
- Detects out-of-band temperatures in freezers, cold rooms, and vehicles.
- Tells a 30-second door opening from a 2-hour compressor failure.
- Attributes every breach to the batch and the transport leg responsible.
- Produces an immutable audit trail for HACCP, GDP, WHO TRS 961, and FSMA.
- Unifies alerts with the rest of the pipeline (maintenance, quality, SPC).
How it works
```mermaid
flowchart LR
    SENSOR[Sensor] --> READING[Reading]
    READING --> STALE{Source stale?}
    STALE -- yes --> DROP[Silent discard]
    STALE -- no --> BAND{Out of band?}
    BAND -- no --> CLEAR[Clear buffers + resolve active]
    BAND -- yes --> ACTIVE{Active excursion?}
    ACTIVE -- yes --> UPDATE[Update peak, duration, sample_count]
    ACTIVE -- no --> FLAP{Flap window 5 min?}
    FLAP -- yes --> REOPEN[Re-open previous excursion]
    FLAP -- no --> MODE{Grace mode?}
    MODE -- escalate --> OPEN[Open at warning, escalate to critical if it lasts]
    MODE -- suppress --> BUFFER[Volatile buffer: promote only if sustained]
    OPEN --> AGG[Alert Aggregator]
    REOPEN --> AGG
    BUFFER --> AGG
    OPEN --> JOURNEY[Attach to active leg if applicable]
    BUFFER --> JOURNEY
```

Filters and protections
- CC-H1 (staleness): the reading is ignored, before the range is evaluated, if the sensor watchdog has flagged its source as `stale`. A dead sensor sending 0 °C therefore cannot fabricate excursions.
- CC-M2 (flap window): a resolved excursion that reappears in the same direction within 5 minutes is re-opened instead of creating a new one. This prevents fragmenting a real incident.
- CC-H2 (aggregator feed): every new or escalated excursion publishes to the alert aggregator with `source_system=cold_chain`, its canonical severity, and an `excursion_id` reference.
- CC-M1 (configurable grace):
  - `escalate` (default): every excursion opens at `warning` and is promoted to `critical` when grace is exceeded.
  - `suppress` (HACCP-style): transients inside the grace window are not persisted. An excursion opens only if it outlasts grace, and then it opens directly at `critical`.
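The routing order above can be sketched in Python. This is an illustrative sketch, not the service's code. `route_reading`, the `Excursion` fields and the inclusive 5-minute comparison are assumptions. The function returns the flowchart branch a reading would take and stops before the grace-mode step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical constant mirroring the documented 5-minute flap window
# (the service keeps its own value in _COLD_CHAIN_FLAP_WINDOW_MINUTES).
FLAP_WINDOW = timedelta(minutes=5)


@dataclass
class Excursion:
    direction: str                      # "above" or "below" the safe band
    started_at: datetime
    resolved_at: Optional[datetime] = None


def route_reading(
    value: float,
    source_is_stale: bool,
    t_min: float,
    t_max: float,
    active: Optional[Excursion],
    last_resolved: Optional[Excursion],
    now: datetime,
) -> str:
    """Return the flowchart branch a reading takes (illustrative only)."""
    # CC-H1: a stale source is discarded before the range is even evaluated.
    if source_is_stale:
        return "drop"
    if t_min <= value <= t_max:
        return "clear"                  # clear buffers and resolve any active excursion
    direction = "above" if value > t_max else "below"
    if active is not None:
        return "update"                 # refresh peak, duration, sample_count
    # CC-M2: a same-direction breach shortly after a resolve re-opens the old one.
    if (
        last_resolved is not None
        and last_resolved.resolved_at is not None
        and last_resolved.direction == direction
        and now - last_resolved.resolved_at <= FLAP_WINDOW
    ):
        return "reopen"
    return "open"                       # handed on to the grace-mode logic (CC-M1)
```

The stale check comes first on purpose. A 0 °C reading from a dead sensor never reaches the band comparison, so it cannot open or re-open anything.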
Asset configuration
Dashboard
- Assets → your equipment → edit.
- Toggle Cold Chain on.
- Fill in:
  - Min / max temperature: the safe limits.
  - Unit: °C or °F.
  - Grace minutes: 1 to 1440. The 24-hour upper bound covers pharmaceutical shipments (WHO TRS 961 Annex 9, EU GDP).
  - Grace mode: `escalate` or `suppress`.
  - Event source: the `source_id` providing readings, validated against `_machine_event_sources` (CC-M4).
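These rules could be checked in code along the lines of the sketch below. It is a hypothetical validator, not the service's. `validate_cold_chain_config` is an invented name. Three details are assumptions inferred from the fields above: grace is treated as an integer, `temperature_min` must be strictly below `temperature_max`, and the unit must be `"C"` or `"F"`.

```python
# Hypothetical validator for the cold chain fields; not the service's implementation.
GRACE_MODES = {"escalate", "suppress"}
UNITS = {"C", "F"}                       # assumption: unit codes as in the API example


def validate_cold_chain_config(cfg: dict, known_source_ids: set) -> list:
    """Return human-readable errors; an empty list means the config looks valid."""
    errors = []
    t_min, t_max = cfg.get("temperature_min"), cfg.get("temperature_max")
    if t_min is None or t_max is None or t_min >= t_max:
        errors.append("temperature_min must be below temperature_max")
    if cfg.get("temperature_unit") not in UNITS:
        errors.append("temperature_unit must be 'C' or 'F'")
    grace = cfg.get("excursion_grace_minutes")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(grace, bool) or not isinstance(grace, int) or not 1 <= grace <= 1440:
        errors.append("excursion_grace_minutes must be an integer from 1 to 1440")
    # A missing mode falls back to the documented default, escalate.
    if cfg.get("cold_chain_grace_mode", "escalate") not in GRACE_MODES:
        errors.append("cold_chain_grace_mode must be 'escalate' or 'suppress'")
    # CC-M4: an enabled asset must point at a registered event source.
    if cfg.get("cold_chain_enabled") and cfg.get("cold_chain_source_id") not in known_source_ids:
        errors.append("cold_chain_source_id not found in _machine_event_sources")
    return errors
```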
API
```http
PATCH /api/v1/assets/{asset_id}

{
  "cold_chain_enabled": true,
  "cold_chain_source_id": "src_fridge_01",
  "cold_chain_metric": "temperature",
  "temperature_min": -22,
  "temperature_max": -16,
  "temperature_unit": "C",
  "excursion_grace_minutes": 15,
  "cold_chain_grace_mode": "escalate"
}
```

CC-M5: any change to the 8 regulated fields (`cold_chain_enabled`, `cold_chain_source_id`, `cold_chain_metric`, `temperature_min`, `temperature_max`, `temperature_unit`, `excursion_grace_minutes`, `cold_chain_grace_mode`) writes an entry to `_audit_trail` with the actor, a timestamp, and the previous and new snapshots. An auditor can answer "who moved `temperature_max` from −18 to −12 on March 3rd?"
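The sketch below shows a minimal version of the CC-M5 diff, assuming the audit entry is a plain document. `build_audit_entry`, `write_fire_and_forget` and the `changed_fields` key are hypothetical. Only `action=asset_cold_chain_updated` and the list of regulated fields come from this page.

```python
import logging
from datetime import datetime, timezone
from typing import Callable, Optional

# The 8 regulated fields listed above.
REGULATED_FIELDS = (
    "cold_chain_enabled", "cold_chain_source_id", "cold_chain_metric",
    "temperature_min", "temperature_max", "temperature_unit",
    "excursion_grace_minutes", "cold_chain_grace_mode",
)


def build_audit_entry(actor: str, before: dict, after: dict,
                      now: Optional[datetime] = None) -> Optional[dict]:
    """Return an _audit_trail document, or None when no regulated field changed."""
    changed = [f for f in REGULATED_FIELDS if before.get(f) != after.get(f)]
    if not changed:
        return None
    return {
        "action": "asset_cold_chain_updated",
        "actor": actor,
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "changed_fields": changed,                       # hypothetical convenience key
        "previous": {f: before.get(f) for f in REGULATED_FIELDS},
        "new": {f: after.get(f) for f in REGULATED_FIELDS},
    }


def write_fire_and_forget(write: Callable[[dict], None], entry: dict) -> None:
    """A failed audit write is logged, never raised into the asset update."""
    try:
        write(entry)
    except Exception:
        logging.getLogger(__name__).warning("audit trail write failed", exc_info=True)
```

Storing full snapshots rather than only the changed keys is what lets an auditor reconstruct the complete configuration at any point in time.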
Reading ingestion
Two paths, same pipeline (same filters, same dedup, same aggregator).
Path 1 — via machine events
Configure an event_source of type http / mqtt / opcua. Every event carrying temperature (or the field configured in cold_chain_metric) triggers check_cold_chain_from_event.
Path 2 — direct ingestion POST /api/v1/cold-chain/readings
For sensors outside the machine_events pipeline: Bluetooth loggers, handheld probes at receiving, transport recorders that bulk-upload on dock arrival.
```http
POST /api/v1/cold-chain/readings

{
  "source_id": "logger-bt-042",
  "asset_id": "65a...",
  "temperature": -12.4,
  "batch_id": "LOT-2026-03-A",
  "product_type": "mRNA vaccine"
}
```

If you omit `asset_id`, the service resolves every asset whose `cold_chain_source_id` matches and evaluates each one.
CC-L1: batch_id and product_type are carried onto the resulting excursion. A breach on a fridge holding insulin at 14:00 and yogurt at 17:00 is two different compliance stories — the asset is the same, the batch inside rotates.
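Both paths reduce to the same kind of reading before evaluation. The sketch below illustrates that convergence, the `asset_id` fan-out and the batch context carried onto the excursion. The function names and the event shape are assumptions, and so is skipping assets whose `cold_chain_enabled` is off.

```python
from typing import Optional

# Hypothetical shapes: `event` is a machine event dict; `assets` holds dicts with
# the fields from the PATCH example (plus an "id").


def reading_from_event(event: dict, metric: str) -> Optional[dict]:
    """Path 1: read the asset's cold_chain_metric field from a machine event."""
    value = event.get(metric)
    if value is None:
        return None                      # the event does not carry the monitored field
    return {"source_id": event.get("source_id"), "temperature": float(value)}


def resolve_target_assets(reading: dict, assets: list) -> list:
    """Path 2: an explicit asset_id wins; otherwise fan out by source_id."""
    if reading.get("asset_id"):
        return [a for a in assets if a["id"] == reading["asset_id"]]
    # Assumption: assets with cold chain disabled are not evaluated.
    return [
        a for a in assets
        if a.get("cold_chain_enabled")
        and a.get("cold_chain_source_id") == reading.get("source_id")
    ]


def excursion_context(reading: dict) -> dict:
    """CC-H3 / CC-L1: who reported the breach and what was inside the asset."""
    return {k: reading.get(k) for k in ("source_id", "batch_id", "product_type")}
```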
Grace modes — escalate vs suppress
escalate (default)
Every excursion is recorded from the first reading. Good for food-service where every door opening matters.
```
t=0s   Out-of-band reading -> Excursion opened (severity=warning)
t=5s   Out-of-band reading -> Update: sample_count=2, peak refreshed
t=30s  In-band reading     -> Excursion resolved
```

Result: one row in `_cold_chain_excursions` with `duration=0.5 min`.
suppress (HACCP-style)
Transient excursions are forgiven. Only excursions that outlast grace persist, and they open directly at critical. Good for pharma and GDP where cry-wolf alerting destroys the signal.
```
t=0s    Out-of-band reading -> Buffer opened (NO excursion persisted)
t=30s   Out-of-band reading -> Buffer updates sample_count + peak
t=2min  In-band reading     -> Buffer dropped silently (0 excursions)
```

If the excursion lasts long enough:

```
t=0s     Out-of-band -> Buffer opened
t=15min  Out-of-band -> elapsed >= grace: promote
                        persisted excursion with severity=critical,
                        started_at=t=0s, duration=15min
```

Switch modes per asset via `cold_chain_grace_mode`. The default, `escalate`, preserves historical behaviour for assets that predate the flag.
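The two timelines can be replayed with a small simulator. This is an illustrative model, not the service's algorithm. `simulate` and its `(t_seconds, deviation)` input are hypothetical. The rule that promotion keeps the buffer's original `started_at` follows the examples above. The flap window is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Excursion:
    started_at: float                 # seconds
    severity: str                     # "warning" or "critical"
    peak: float                       # largest deviation seen, in asset units
    last_seen: float                  # time of the latest out-of-band reading
    sample_count: int = 1
    ended_at: Optional[float] = None

    @property
    def duration_minutes(self) -> float:
        end = self.ended_at if self.ended_at is not None else self.last_seen
        return (end - self.started_at) / 60


def simulate(readings: List[Tuple[float, float]], mode: str,
             grace_minutes: int) -> List[Excursion]:
    """Replay (t_seconds, deviation) pairs, where deviation <= 0 means in band.

    Returns only what would be persisted to _cold_chain_excursions.
    """
    grace = grace_minutes * 60
    persisted: List[Excursion] = []
    active: Optional[Excursion] = None    # persisted and still open
    buffer: Optional[Excursion] = None    # suppress mode: volatile, never persisted
    for t, dev in readings:
        if dev <= 0:                      # in band: resolve active, drop any buffer
            if active is not None:
                active.ended_at = t
                active = None
            buffer = None
            continue
        current = active if active is not None else buffer
        if current is not None:           # extend whatever is already tracking
            current.sample_count += 1
            current.peak = max(current.peak, dev)
            current.last_seen = t
        elif mode == "escalate":          # escalate: persisted from the first reading
            active = Excursion(t, "warning", dev, t)
            persisted.append(active)
        else:                             # suppress: start a volatile buffer
            buffer = Excursion(t, "critical", dev, t)
        if active is not None and mode == "escalate":
            if t - active.started_at >= grace:
                active.severity = "critical"
        elif buffer is not None and t - buffer.started_at >= grace:
            active, buffer = buffer, None  # promote, keeping the original started_at
            persisted.append(active)
    return persisted
```

Replaying the timelines gives the documented outcomes. In `escalate` mode the door opening yields one `warning` row of 0.5 minutes. In `suppress` mode the 2-minute transient yields no rows, and the sustained breach yields one `critical` row of 15 minutes starting at t=0.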
Journey / Leg — refrigerated shipments
CC-M6. A journey is an end-to-end shipment. A leg is a custody segment (warehouse → truck → distribution center → pharmacy). Hand-offs between legs are the highest-risk moments in the cold chain, and they are where accountability questions arise ("whose truck warmed up?").
Model a shipment
```http
POST /api/v1/cold-chain/journeys

{
  "journey_code": "JRN-2026-03-001",
  "origin": "DC Madrid",
  "destination": "Pharmacy Valencia",
  "batch_id": "LOT-42",
  "product_type": "insulin",
  "legs": [
    {"sequence": 0, "from_location": "DC Madrid", "to_location": "Cross-dock"},
    {"sequence": 1, "from_location": "Cross-dock", "to_location": "Truck 7", "asset_id": "65a..."},
    {"sequence": 2, "from_location": "Truck 7", "to_location": "Pharmacy"}
  ]
}
```

Lifecycle
```mermaid
stateDiagram-v2
    [*] --> planned
    planned --> in_transit: start_leg (first time)
    in_transit --> in_transit: start_leg / complete_leg
    in_transit --> delivered: explicit PATCH after final leg
    planned --> cancelled
    in_transit --> cancelled
```

- `POST /api/v1/cold-chain/journeys/{id}/legs/{leg_id}/start` marks the leg as `active`. On the first activation it also stamps `actual_start_at` on the journey. Subsequent activations do NOT overwrite the original dispatch time.
- `POST /api/v1/cold-chain/journeys/{id}/legs/{leg_id}/complete` marks the leg as `completed` and sets `ended_at`.
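The lifecycle can be modelled as a small state machine. The in-memory `Journey` class below is a hypothetical sketch. The `leg_id` key, the `pending` initial leg status and the method names are assumptions. It keeps two documented rules: only the first activation stamps `actual_start_at`, and completing the last leg does not deliver the journey.

```python
from datetime import datetime
from typing import Dict, List, Optional


class Journey:
    """In-memory sketch of the journey lifecycle; not the service's implementation."""

    def __init__(self, journey_code: str, legs: List[dict]):
        self.journey_code = journey_code
        self.status = "planned"
        self.actual_start_at: Optional[datetime] = None
        # Hypothetical: legs keyed by "leg_id" and starting in a "pending" state.
        self.legs: Dict[str, dict] = {
            leg["leg_id"]: {**leg, "status": "pending", "excursion_ids": []} for leg in legs
        }

    def start_leg(self, leg_id: str, now: datetime) -> None:
        if self.status not in ("planned", "in_transit"):
            raise ValueError(f"cannot start a leg on a {self.status} journey")
        self.legs[leg_id]["status"] = "active"
        if self.actual_start_at is None:   # only the first activation stamps dispatch
            self.actual_start_at = now
        self.status = "in_transit"

    def complete_leg(self, leg_id: str, now: datetime) -> None:
        leg = self.legs[leg_id]
        leg["status"] = "completed"
        leg["ended_at"] = now
        # No auto-close here: "delivered" requires an explicit update.

    def mark_delivered(self) -> None:
        if self.status != "in_transit":
            raise ValueError("only an in_transit journey can be delivered")
        self.status = "delivered"

    def cancel(self) -> None:
        if self.status not in ("planned", "in_transit"):
            raise ValueError(f"cannot cancel a {self.status} journey")
        self.status = "cancelled"
```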
Automatic excursion attribution
When check_excursion opens a new excursion for an asset_id, the service looks for an in_transit journey with an active leg referencing that asset. If found, it appends the excursion_id to the leg's legs.$.excursion_ids array. Fire-and-forget: a journey-side failure never breaks the cold chain pipeline.
The audit answers "which transport leg failed?" without manual joins across collections.
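The attribution hook can be sketched as follows, under stated assumptions. `attach_excursion` and `mongo_attach_update` are hypothetical names. Journeys are plain dicts shaped like the creation request, plus `status` fields. The second function only builds the filter and update documents that a MongoDB `update_one` call would take. `$elemMatch` combined with the positional `$` is standard MongoDB update syntax.

```python
import logging
from typing import List, Optional, Tuple

log = logging.getLogger(__name__)


def attach_excursion(journeys: List[dict], asset_id: str,
                     excursion_id: str) -> Optional[Tuple[str, int]]:
    """Append excursion_id to the active leg using asset_id; never raises."""
    try:
        for journey in journeys:
            if journey.get("status") != "in_transit":
                continue
            for leg in journey.get("legs", []):
                if leg.get("status") == "active" and leg.get("asset_id") == asset_id:
                    leg.setdefault("excursion_ids", []).append(excursion_id)
                    return journey["journey_code"], leg["sequence"]
    except Exception:  # fire-and-forget: the cold chain pipeline must not break
        log.warning("journey attribution failed", exc_info=True)
    return None


def mongo_attach_update(asset_id: str, excursion_id: str) -> Tuple[dict, dict]:
    """Filter/update pair for update_one on the tenant's journeys collection."""
    query = {
        "status": "in_transit",
        "legs": {"$elemMatch": {"asset_id": asset_id, "status": "active"}},
    }
    # The positional $ targets the leg matched by $elemMatch in the query.
    update = {"$push": {"legs.$.excursion_ids": excursion_id}}
    return query, update
```

With PyMongo, the pair would be passed as `collection.update_one(query, update)`, where `collection` is assumed to be the tenant's `_cold_chain_journeys` collection.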
Status dashboard
GET /api/v1/cold-chain/status returns, per asset, the three numbers an operator reads at a glance:
| Field | Meaning |
|---|---|
| `peak_deviation` | How far outside the band (asset unit). |
| `duration_minutes` | How long it has been outside. |
| `sample_count` | How many readings back the excursion (CC-L3). |
A 1-sample critical is a glitch; a 20-sample critical over 20 minutes is a real event. The third number is what separates real triage from noise.
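The three numbers can be derived from the readings behind one excursion, as in this sketch. `deviation` and `status_numbers` are hypothetical helpers. Measuring duration from the first to the latest sample, rather than to the current time, is an assumption.

```python
from typing import List, Tuple


def deviation(value: float, t_min: float, t_max: float) -> float:
    """Distance outside the safe band in the asset's unit (0 when in band)."""
    if value > t_max:
        return value - t_max
    if value < t_min:
        return t_min - value
    return 0.0


def status_numbers(samples: List[Tuple[float, float]], t_min: float, t_max: float) -> dict:
    """samples: (t_seconds, value) readings backing one excursion, oldest first."""
    return {
        "peak_deviation": round(max(deviation(v, t_min, t_max) for _, v in samples), 1),
        "duration_minutes": round((samples[-1][0] - samples[0][0]) / 60, 1),
        "sample_count": len(samples),
    }
```

Take a freezer with a −22 to −16 °C band and a worst reading of −12.8 °C. This gives a `peak_deviation` of 3.2, the value shown in the sample response.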
```json
[
  {
    "id": "65a...",
    "name": "Freezer Unit A",
    "temperature_min": -22,
    "temperature_max": -16,
    "has_excursion": true,
    "peak_deviation": 3.2,
    "duration_minutes": 18.5,
    "sample_count": 37,
    "excursion": { "id": "...", "severity": "critical", "source_id": "src_fridge_01", "batch_id": "LOT-42" }
  }
]
```

Endpoints
| Method | Path | What it does |
|---|---|---|
| `POST` | `/api/v1/cold-chain/readings` | Direct ingestion with `batch_id` / `product_type`. |
| `GET` | `/api/v1/cold-chain/status` | Per-asset status + peak/duration/sample. |
| `GET` | `/api/v1/cold-chain/excursions` | History with filters `asset_id`, `resolved`. |
| `POST` | `/api/v1/cold-chain/journeys` | Create journey + legs. |
| `GET` | `/api/v1/cold-chain/journeys` | List, filter by `status` and `batch_id`. |
| `POST` | `/api/v1/cold-chain/journeys/{id}/legs` | Append a leg. |
| `POST` | `/api/v1/cold-chain/journeys/{id}/legs/{leg_id}/start` | Mark leg active. |
| `POST` | `/api/v1/cold-chain/journeys/{id}/legs/{leg_id}/complete` | Mark leg completed. |
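For scripted uploads, such as a transport recorder dumping readings on dock arrival, a request to the ingestion endpoint can be prepared with Python's standard library. The base URL is a placeholder. Authentication is left out because this page does not describe it. `build_reading_request` is a hypothetical helper.

```python
import json
import urllib.request

# Placeholder base URL; authentication is not covered on this page and is omitted.
BASE_URL = "https://rela.example.com"


def build_reading_request(reading: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /api/v1/cold-chain/readings."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/cold-chain/readings",
        data=json.dumps(reading).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)` against a real deployment.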
Real-world scenarios
1. Short door-opening on a meat freezer
Grace mode escalate, grace 15 min. An operator opens the door for 40 seconds. Temperature rises from −20 to −16.5 °C.
- t=0s: excursion opened (warning, duration 0).
- t=20s: update (sample_count=2, peak 3.5 °C).
- t=40s: in-band reading → resolve. Duration 0.7 min.
- The dashboard shows `peak=3.5`, `duration=0.7`, `sample_count=2`. The operator dismisses it.
2. Insulin shipment with a breach on leg 2
Grace mode suppress, grace 60 min. Journey with 3 legs. Truck hits traffic and temperature exceeds 8 °C for 75 minutes.
- Leg 2 (sequence 1, Cross-dock → Truck 7) is active; its asset is Truck 7.
- t=0: buffer opened (no excursion persisted).
- t=60min: the buffer exceeds grace → promote. The excursion is persisted with `severity=critical`, `started_at=t=0`, `duration=60min`, `batch_id=LOT-42`, `product_type=insulin`.
- The CC-M6 hook appends the `excursion_id` to leg 2. The audit answers: "the breach happened on the Cross-dock → Truck 7 leg".
3. Dead sensor sending 0 °C
The sensor watchdog flags `src_fridge_05` as stale. The reading pipeline then receives 0 °C, which is out of band against a −18 °C ceiling. CC-H1 discards the reading on the first line of `check_excursion`, so no ghost excursion is created.
Limitations and assumptions
- `cold_chain_enabled=True` requires a valid `cold_chain_source_id` in `_machine_event_sources` (CC-M4). If the source is deleted afterwards, the service logs a warning but does not block. The validation is an early warning, not a guarantee.
- The flap window is fixed at 5 minutes and can be changed only in code (`_COLD_CHAIN_FLAP_WINDOW_MINUTES`).
- `suppress` does not apply to flap re-open. If a PERSISTED excursion resolves and a new one appears inside the flap window, it is re-opened regardless of mode, because the noise call was already made.
- `journey.status` does not auto-close when the last leg completes; it requires an explicit `PATCH` to `delivered`.
Per-tenant MongoDB collections
| Collection | Content |
|---|---|
| `_cold_chain_excursions` | Persisted excursions (full schema with `batch_id`, `product_type`, `source_id`, `sample_count`, `peak_deviation`, `duration_minutes`). |
| `_cold_chain_excursion_buffers` | Volatile `suppress`-mode buffers. Dropped on in-band resolve or on promotion. |
| `_cold_chain_journeys` | Shipments with embedded legs. |
| `_audit_trail` | Fire-and-forget entries with `action=asset_cold_chain_updated`. |
Findings closed in this iteration
| ID | What was closed |
|---|---|
| CC-H1 | Staleness filter before range check. |
| CC-H2 | Aggregator feed with dedicated source_system. |
| CC-H3 | source_id persisted on excursion (propagated from webhook). |
| CC-M1 | cold_chain_grace_mode flag with HACCP-style mode. |
| CC-M2 | 5-min flap window for consecutive-excursion reopen. |
| CC-M3 | POST /readings endpoint for ingestion without machine_events. |
| CC-M4 | cold_chain_source_id validation on asset create/update. |
| CC-M5 | Audit trail on mutations of the 8 regulated fields. |
| CC-M6 | Journey + Leg with automatic attribution. |
| CC-L1 | batch_id + product_type persisted on excursion. |
| CC-L2 | excursion_grace_minutes range widened to 1 through 1440. |
| CC-L3 | peak_deviation, duration_minutes, sample_count on status. |
Key benefits
- Verifiable audit trail for HACCP, GDP, WHO TRS 961, FSMA.
- Zero cry-wolf: staleness + flapping + grace mode remove structural noise.
- Real attribution: every breach knows its batch, product, and leg.
- Unified inbox: same inbox as maintenance and quality; the operator never switches screens.
- Pharma-ready: grace up to 24 h, suppress mode, batch tracking, journey model.