Edge Gateway — Install and Operations
How to install the rela-ai-edge container in your plant, register it, assign sources, monitor the fleet, and troubleshoot connection issues.
Edge Gateway
The edge gateway is a Docker container that runs inside your plant network and polls the PLCs locally. Instead of Rela AI opening inbound TCP connections to the PLC, the gateway reads the data and pushes it out via HTTPS to the Cloud Run worker.
What it's for
For plants with strict firewalls where you can't open inbound ports, not even via VPN. The gateway only requires outbound HTTPS to the Rela AI domain — a rule 95% of industrial firewalls already allow without IT negotiation. It also adds local buffering: if internet goes down, events accumulate on the gateway's disk and drain when the connection returns.
How it works
- Install the container on a machine with Docker inside the plant network (Raspberry Pi, mini-PC, NUC).
- On startup, the container registers with our backend using a unique per-gateway token.
- It receives its config (which sources to poll) via HTTPS.
- Opens Modbus / OPC UA / S7 connections locally to the PLC.
- Each read → event → SQLite buffer → HTTP push to the agent's webhook.
- If the network drops, the buffer holds up to 100k events; on recovery, drains FIFO.
- Sends heartbeats every 60 seconds; if the backend sees none for more than 1 hour, it fires a GATEWAY_OFFLINE alert.
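The store-and-forward behaviour in the steps above can be sketched roughly as follows. This is a minimal illustration, not the container's actual code: the `EventBuffer` class, the `events` table schema, and the `push` callback are hypothetical names chosen for the example. Only the 100k cap, the FIFO drain order, and the use of SQLite come from the description.

```python
import sqlite3

MAX_EVENTS = 100_000  # cap stated in the text above


class EventBuffer:
    """Store-and-forward queue backed by SQLite (hypothetical schema)."""

    def __init__(self, path=":memory:", max_events=MAX_EVENTS):
        self.max_events = max_events
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )

    def depth(self):
        """Number of events still waiting to be pushed."""
        return self.db.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    def enqueue(self, payload):
        """Append an event; drop the oldest rows if the cap is exceeded."""
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
            overflow = self.depth() - self.max_events
            if overflow > 0:
                self.db.execute(
                    "DELETE FROM events WHERE id IN ("
                    "SELECT id FROM events ORDER BY id LIMIT ?)",
                    (overflow,),
                )

    def drain(self, push):
        """Send events oldest-first; stop at the first failed push."""
        sent = 0
        rows = self.db.execute(
            "SELECT id, payload FROM events ORDER BY id"
        ).fetchall()
        for event_id, payload in rows:
            if not push(payload):
                break  # network still down: keep this event and the rest
            with self.db:
                self.db.execute("DELETE FROM events WHERE id = ?", (event_id,))
            sent += 1
        return sent
```

An event is deleted only after `push` reports success, so a drop in the middle of a drain leaves the remaining events queued for the next attempt.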
Benefits
| Benefit | Detail |
|---|---|
| Firewall-friendly | Outbound HTTPS 443 only. No VPN needed, no inbound ports. |
| Local buffering | Store-and-forward SQLite — losing internet doesn't lose data. |
| Local polling | PLC latency under 1ms (same network). No internet RTT. |
| Multi-protocol | One gateway can poll Modbus + OPC UA + S7 concurrently. |
| Centralized monitoring | Dashboard shows fleet with last_heartbeat, CPU, memory, queue depth. |
Install — step by step
1. Register the gateway in the dashboard
Dashboard → Connections → Edge Gateways → Register Gateway.
Returns a gateway_id (e.g. gw_a1b2c3) and a shared_secret the container uses to authenticate. Save the secret right away: it is shown only once.
2. Prepare the host
Minimum requirements:
- Linux x86_64 or ARM64 (Raspberry Pi 4+ works).
- Docker 20+ installed.
- 1 GB RAM, 10 GB disk.
- Connectivity to the PLC network (same subnet or routed).
- Internet egress (HTTPS 443) to rela-ai-worker-*.run.app.
3. Start the container
    docker run -d \
      --name rela-ai-edge \
      --restart unless-stopped \
      -v rela-edge-data:/var/lib/rela-edge \
      -e GATEWAY_ID=gw_a1b2c3 \
      -e GATEWAY_SECRET='<shared_secret>' \
      -e BACKEND_URL=https://rela-ai-worker-687568754456.europe-west1.run.app \
      gcr.io/rela-ai-488016/rela-ai-edge:latest

Or with docker compose:

    services:
      rela-ai-edge:
        image: gcr.io/rela-ai-488016/rela-ai-edge:latest
        container_name: rela-ai-edge
        restart: unless-stopped
        volumes:
          - rela-edge-data:/var/lib/rela-edge
        environment:
          GATEWAY_ID: gw_a1b2c3
          GATEWAY_SECRET: <shared_secret>
          BACKEND_URL: https://rela-ai-worker-687568754456.europe-west1.run.app
    volumes:
      rela-edge-data:

4. Verify the gateway is online
In the dashboard, within ~60 seconds you should see the gateway in green with a recent last_heartbeat.
Local logs:
    docker logs -f rela-ai-edge

You should see:

    edge_started gateway_id=gw_a1b2c3
    heartbeat_sent seq=1 latency_ms=142
    config_fetched sources=0

5. Assign sources to the gateway
In the dashboard, edit the sources you want running from this gateway:
- Deployment Mode: edge
- Edge Gateway ID: gw_a1b2c3
On the next heartbeat (≤60s), the gateway downloads the updated config and starts polling.
Fleet monitoring
Dashboard → Connections → Edge Gateways shows every gateway for the tenant with:
| Column | Meaning |
|---|---|
| Name | Label (e.g. "Gateway North Plant") |
| Status | online if last_heartbeat < 60s, warning if 1-60min, offline if >60min |
| Last heartbeat | Relative time since the last ping |
| CPU / Mem | Host metrics reported by the container |
| Queue depth | Events pending in the buffer (should be low except during outages) |
| Firmware version | Container version |
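The Status column logic can be expressed as a small function. This is a sketch under assumptions: the function name `gateway_status` is hypothetical, and the handling of the exact boundaries (a heartbeat exactly 60 s or exactly 60 min old counts as warning) is a guess, since the table does not specify it.

```python
from datetime import datetime, timedelta, timezone


def gateway_status(last_heartbeat, now=None):
    """Classify a gateway from the age of its last heartbeat."""
    now = now or datetime.now(timezone.utc)
    age = now - last_heartbeat
    if age < timedelta(seconds=60):
        return "online"
    if age <= timedelta(minutes=60):  # boundary handling is an assumption
        return "warning"
    return "offline"
```

For example, a gateway last seen 5 minutes ago would be reported as warning, and one last seen 2 hours ago as offline.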
Automatic alerts
A Cloud Scheduler job runs hourly and scans every gateway. If it finds one with last_heartbeat > 60min, it fires a canonical GATEWAY_OFFLINE alert with warning severity in the inbox. Operators see it like any other asset alert.
The inbox dedup consolidates outages from the same gateway into a single row until it heartbeats again (no spam during long outages).
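One way to picture the hourly scan combined with the dedup is the sketch below. It is illustrative only: `scan_gateways`, the `open_alerts` set, and the dictionary of heartbeat timestamps are hypothetical stand-ins, and the real backend may track open alerts differently.

```python
from datetime import datetime, timedelta, timezone

OFFLINE_AFTER = timedelta(minutes=60)  # threshold stated above


def scan_gateways(last_heartbeats, open_alerts, now):
    """Return gateway IDs that need a new GATEWAY_OFFLINE alert.

    last_heartbeats maps gateway_id -> datetime of the last heartbeat.
    open_alerts is a set of gateway IDs with an alert already open; it is
    updated in place so repeated scans do not re-alert during one outage.
    """
    new_alerts = []
    for gateway_id, last_seen in last_heartbeats.items():
        if now - last_seen > OFFLINE_AFTER:
            if gateway_id not in open_alerts:
                open_alerts.add(gateway_id)
                new_alerts.append(gateway_id)
        else:
            # The gateway has heartbeated again: close the outage.
            open_alerts.discard(gateway_id)
    return new_alerts
```

Running the scan twice during the same outage yields an alert only the first time; once the gateway heartbeats again, a later outage produces a fresh alert.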
Firmware update
Dashboard → Edge Gateways → <gateway> → Update firmware. Pick the target version. On the next heartbeat the gateway receives the command and:
- Downloads the new Docker image.
- Does a docker stop on the old container.
- Starts the new one.
- Reports success or failure on the next heartbeat.
If the update fails (corrupted image, insufficient disk space), the gateway stays on the previous version and the error is shown in the dashboard.
Troubleshooting
The gateway doesn't come online after 2 minutes
Check 1 — container logs:
    docker logs --tail 50 rela-ai-edge

Common messages and their causes:
| Message | Cause | Fix |
|---|---|---|
| DNS resolution failed for rela-ai-worker-*.run.app | No internet or DNS blocked | Try nslookup rela-ai-worker-687568754456.europe-west1.run.app from the host |
| SSL: CERTIFICATE_VERIFY_FAILED | System time out of sync + expired certs | sudo systemctl restart systemd-timesyncd |
| HTTP 401 Unauthorized | Wrong GATEWAY_SECRET | Regenerate the secret in the dashboard and update the container |
| HTTP 403 Forbidden | GATEWAY_ID doesn't exist or was deleted | Register it again in the dashboard |
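If you triage several gateways, the table above can be turned into a small lookup. The sketch below is an assumption-laden helper, not part of the product: `KNOWN_ERRORS` and `triage` are hypothetical names, and the matching is a plain substring search on the message fragments listed in the table.

```python
# (fragment to look for, likely cause, suggested fix), taken from the table
KNOWN_ERRORS = [
    ("DNS resolution failed",
     "No internet or DNS blocked",
     "Run nslookup against the backend host from the gateway machine"),
    ("CERTIFICATE_VERIFY_FAILED",
     "System time out of sync + expired certs",
     "sudo systemctl restart systemd-timesyncd"),
    ("401 Unauthorized",
     "Wrong GATEWAY_SECRET",
     "Regenerate the secret in the dashboard and update the container"),
    ("403 Forbidden",
     "GATEWAY_ID doesn't exist or was deleted",
     "Register it again in the dashboard"),
]


def triage(log_text):
    """Return (cause, fix) pairs for every known error found in the logs."""
    return [
        (cause, fix)
        for fragment, cause, fix in KNOWN_ERRORS
        if fragment in log_text
    ]
```

Feed it the captured output of the docker logs command; an empty list means none of the known messages appeared and the logs need a manual read.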
Check 2 — outbound connectivity:
    curl -v https://rela-ai-worker-687568754456.europe-west1.run.app/internal/health

Should return 200 OK with JSON {"status":"ok"}.
The gateway is online but doesn't read the PLC
- Verify the gateway and the PLC are on the same network (ping from gateway to PLC should work).
- Review the container logs for specific Modbus/OPC UA errors (same troubleshooting as Modbus sources or OPC UA sources).
- In the dashboard, each source has a "Last read" widget — if it's been at 0 for a while, the container logs tell you why.
Queue depth rising endlessly
This means the gateway is reading events faster than it can push them to the backend. Common causes:
- Slow internet: check HTTPS latency in the logs. >2s usually indicates a problem.
- Backend down: last POST returned 5xx. Check the dashboard.
- Too many sources: redistribute across gateways.
The queue caps at 100k events. When it reaches that limit, it starts discarding the oldest events and logs a warning. If you see this, it's time to add another gateway or debug the bottleneck.
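To judge how urgent a rising queue is, you can estimate the time left before the cap is hit from the read and push rates. The helper below is a back-of-the-envelope sketch; `seconds_until_full` and its parameters are hypothetical, and it assumes both rates stay constant.

```python
QUEUE_CAP = 100_000  # cap stated above


def seconds_until_full(depth, read_rate, push_rate, cap=QUEUE_CAP):
    """Seconds until the buffer hits the cap, or None if it is not growing.

    depth     -- events currently queued
    read_rate -- events per second produced by polling
    push_rate -- events per second successfully pushed to the backend
    """
    growth = read_rate - push_rate
    if growth <= 0:
        return None  # queue is stable or draining
    return max(cap - depth, 0) / growth
```

For instance, with 20,000 events queued, 50 events/s read and 30 events/s pushed, the queue grows by 20 events/s and the remaining 80,000 slots fill in 4,000 seconds, a little over an hour.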
Queue depth at 0 and no events reaching the dashboard
The gateway is connected and pushing fine, but the backend pipeline isn't processing them. Causes:
- The source that should emit has enabled: false.
- The machine agent assigned to that source has min_severity too high.
- A correlation rule is suppressing the events (see Event Rules docs).
Security considerations
- The shared_secret is equivalent to an API key: treat it like any other secret.
- The gateway rotates its auth token automatically every 24 hours using the secret.
- Each gateway can only read the sources assigned to its gateway_id. It cannot impersonate other gateways of the same tenant.
- All outbound traffic is TLS 1.3 encrypted.
- The gateway accepts no inbound connections from the internet: it is a client, not a server.
How to test the gateway works end-to-end
Operational checklist:
- Dashboard → Edge Gateways → gw_xxx: status online, last_heartbeat under 1min.
- Reasonable CPU / Mem (under 20% CPU, under 500MB RAM at idle).
- Queue depth low or 0.
- A source with deployment_mode=edge and edge_gateway_id=gw_xxx must be connected and reading.
- Events arriving in the inbox with source_system=edge_gateway in the metadata.
- No GATEWAY_OFFLINE alerts for the gateway itself in the inbox.
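If you record these values by hand or export them from your own tooling, the checklist can be evaluated mechanically. Everything in this sketch is hypothetical: the `snapshot` dictionary and its keys are not a documented API, and the `max_queue_depth` default of 100 is an arbitrary stand-in for "low".

```python
def failed_checks(snapshot, max_queue_depth=100):
    """Return the checklist items that a gateway snapshot does not meet."""
    failures = []
    if snapshot["status"] != "online":
        failures.append("status is not online")
    if snapshot["heartbeat_age_s"] >= 60:
        failures.append("last_heartbeat is not under 1 min")
    if snapshot["cpu_percent"] >= 20:
        failures.append("CPU is not under 20% at idle")
    if snapshot["mem_mb"] >= 500:
        failures.append("memory is not under 500 MB at idle")
    if snapshot["queue_depth"] > max_queue_depth:
        failures.append("queue depth is not low")
    if not snapshot["edge_source_connected"]:
        failures.append("no edge source is connected and reading")
    if snapshot["edge_events_in_inbox"] == 0:
        failures.append("no events with source_system=edge_gateway in the inbox")
    if snapshot["offline_alerts"] > 0:
        failures.append("GATEWAY_OFFLINE alert present in the inbox")
    return failures
```

An empty list means every item on the checklist passed for that snapshot.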