How to install the rela-ai-edge container in your plant, register it, assign sources, monitor the fleet, and troubleshoot connection issues.

Edge Gateway

The edge gateway is a Docker container that runs inside your plant network and polls the PLCs locally. Instead of Rela AI opening inbound TCP connections to the PLC, the gateway reads the data and pushes it out via HTTPS to the Cloud Run worker.

What it's for

The gateway is for plants with strict firewalls where you can't open inbound ports, not even via VPN. It only requires outbound HTTPS to the Rela AI domain, a rule 95% of industrial firewalls already allow without IT negotiation. It also adds local buffering: if the internet connection goes down, events accumulate on the gateway's disk and drain when the connection returns.

How it works

Install the container on a machine with Docker inside the plant network (Raspberry Pi, mini-PC, NUC).
On startup, the container registers with our backend using a unique per-gateway token.
It receives its config (which sources to poll) via HTTPS.
Opens Modbus / OPC UA / S7 connections locally to the PLC.
Each read → event → SQLite buffer → HTTP push to the agent's webhook.
If the network drops, the buffer holds up to 100k events; on recovery, it drains them in FIFO order.
Sends heartbeats every 60 seconds; if the backend sees none for more than 1 hour, it fires a GATEWAY_OFFLINE alert.
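
The store-and-forward steps above can be sketched as a bounded FIFO queue on SQLite. This is a minimal illustration, not the gateway's actual code: the EventBuffer class, the table schema, and the push callback are assumptions made for the example. Only the 100k cap, the drop-oldest behavior, and the FIFO drain order come from this page.

```python
import sqlite3

MAX_EVENTS = 100_000  # buffer cap stated on this page


class EventBuffer:
    """Bounded FIFO queue on SQLite (hypothetical schema, for illustration)."""

    def __init__(self, path=":memory:", max_events=MAX_EVENTS):
        self.max_events = max_events
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def depth(self):
        """Number of events waiting to be pushed (the 'queue depth')."""
        return self.db.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    def enqueue(self, payload):
        """Store one event; discard the oldest rows if the cap is exceeded."""
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
            overflow = self.depth() - self.max_events
            if overflow > 0:
                self.db.execute(
                    "DELETE FROM events WHERE id IN "
                    "(SELECT id FROM events ORDER BY id LIMIT ?)",
                    (overflow,),
                )

    def drain(self, push, batch_size=100):
        """Push events oldest-first; stop at the first failed push."""
        sent = 0
        while True:
            rows = self.db.execute(
                "SELECT id, payload FROM events ORDER BY id LIMIT ?", (batch_size,)
            ).fetchall()
            if not rows:
                return sent
            for event_id, payload in rows:
                if not push(payload):
                    return sent  # backend unreachable; the event stays queued
                with self.db:
                    self.db.execute("DELETE FROM events WHERE id = ?", (event_id,))
                sent += 1
```

An event is deleted only after push reports success, so a connection that drops mid-drain leaves the remaining events on disk for the next attempt.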

Benefits


Benefit	Details
Firewall-friendly	Outbound HTTPS 443 only. No VPN needed, no inbound ports.
Local buffering	Store-and-forward on SQLite: an internet outage doesn't lose data as long as the backlog stays under the 100k-event buffer cap.
Local polling	PLC latency under 1ms (same network). No internet RTT.
Multi-protocol	One gateway can poll Modbus + OPC UA + S7 concurrently.
Centralized monitoring	Dashboard shows fleet with last_heartbeat, CPU, memory, queue depth.

Install — step by step

1. Register the gateway in the dashboard

Dashboard → Connections → Edge Gateways → Register Gateway.

Registration returns a gateway_id (e.g. gw_a1b2c3) and a shared_secret that the container uses to authenticate. Save the secret when it appears: it is shown only once.

2. Prepare the host

Minimum requirements:

Linux x86_64 or ARM64 (Raspberry Pi 4+ works).
Docker 20+ installed.
1 GB RAM, 10 GB disk.
Connectivity to the PLC network (same subnet or routed).
Internet egress (HTTPS 443) to rela-ai-worker-*.run.app.
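
Part of the requirements list above can be checked before installing. The sketch below is a hypothetical preflight helper, not something shipped with the gateway. It covers only the CPU architecture, the presence of a docker binary, and free disk space, and it assumes the 10 GB figure means free space. RAM, Docker version, PLC reachability, and internet egress still need to be verified separately.

```python
import platform
import shutil

# Architecture names commonly reported for x86_64 and ARM64 hosts.
SUPPORTED_ARCHES = {"x86_64", "amd64", "aarch64", "arm64"}
MIN_FREE_DISK_BYTES = 10 * 1024**3  # 10 GB, assumed to mean free space


def arch_supported(machine):
    """True if the machine string names an x86_64 or ARM64 CPU."""
    return machine.lower() in SUPPORTED_ARCHES


def preflight(path="/"):
    """Return a list of problems found; an empty list means no problem detected."""
    problems = []
    if not arch_supported(platform.machine()):
        problems.append(f"unsupported architecture: {platform.machine()}")
    if shutil.which("docker") is None:
        problems.append("docker not found on PATH")
    if shutil.disk_usage(path).free < MIN_FREE_DISK_BYTES:
        problems.append("less than 10 GB free disk")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```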

3. Start the container

docker run -d \
  --name rela-ai-edge \
  --restart unless-stopped \
  -v rela-edge-data:/var/lib/rela-edge \
  -e GATEWAY_ID=gw_a1b2c3 \
  -e GATEWAY_SECRET='<shared_secret>' \
  -e BACKEND_URL=https://rela-ai-worker-687568754456.europe-west1.run.app \
  gcr.io/rela-ai-488016/rela-ai-edge:latest

Or with docker compose:

services:
  rela-ai-edge:
    image: gcr.io/rela-ai-488016/rela-ai-edge:latest
    restart: unless-stopped
    volumes:
      - rela-edge-data:/var/lib/rela-edge
    environment:
      GATEWAY_ID: gw_a1b2c3
      GATEWAY_SECRET: <shared_secret>
      BACKEND_URL: https://rela-ai-worker-687568754456.europe-west1.run.app

volumes:
  rela-edge-data:

4. Verify the gateway is online

In the dashboard, within ~60 seconds you should see the gateway in green with a recent last_heartbeat.

Local logs:

docker logs -f rela-ai-edge

You should see:

edge_started gateway_id=gw_a1b2c3
heartbeat_sent seq=1 latency_ms=142
config_fetched sources=0

5. Assign sources to the gateway

In the dashboard, edit the sources you want running from this gateway:

Deployment Mode: edge
Edge Gateway ID: gw_a1b2c3

On the next heartbeat (≤60s), the gateway downloads the updated config and starts polling.

Fleet monitoring

Dashboard → Connections → Edge Gateways shows every gateway for the tenant with:

Column	Meaning
Name	Label (e.g. "Gateway North Plant")
Status	`online` if `last_heartbeat` is less than 60s old, `warning` if 1-60min old, `offline` if more than 60min old
Last heartbeat	Relative time since the last ping
CPU / Mem	Host metrics reported by the container
Queue depth	Events pending in the buffer (should be low except during outages)
Firmware version	Container version
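
The Status thresholds in the table can be written as a small function. This is a sketch of the rule as documented, not the dashboard's code. The function name is hypothetical, and the handling of the exact boundaries (60 seconds and 60 minutes) is an assumption because the table doesn't specify them.

```python
def gateway_status(heartbeat_age_s):
    """Map seconds since the last heartbeat to a dashboard status."""
    if heartbeat_age_s < 60:
        return "online"
    if heartbeat_age_s <= 60 * 60:  # boundary handling is an assumption
        return "warning"
    return "offline"
```

Because heartbeats are sent every 60 seconds, a healthy gateway sits close to the online/warning boundary just before each heartbeat, so a brief warning state alone is not necessarily a sign of trouble.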

Automatic alerts

A Cloud Scheduler job runs hourly and scans every gateway. If it finds one whose last_heartbeat is more than 60 minutes old, it fires a canonical GATEWAY_OFFLINE alert with warning severity in the inbox. Because the scan is hourly, the alert can lag the last heartbeat by up to about two hours. Operators see it like any other asset alert.

The inbox dedup consolidates outages from the same gateway into a single row until it heartbeats again (no spam during long outages).
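
The scan-and-dedup behavior described above can be sketched as follows. This is an illustration under assumptions, not the backend's implementation: the function name, the dictionary of heartbeat timestamps, and the set of open alerts are hypothetical stand-ins for whatever the backend stores.

```python
from datetime import datetime, timedelta, timezone

OFFLINE_AFTER = timedelta(minutes=60)  # threshold stated on this page


def scan_gateways(last_heartbeats, open_alerts, now):
    """Return gateway IDs that need a new GATEWAY_OFFLINE alert.

    last_heartbeats: dict of gateway_id -> datetime of the last heartbeat
    open_alerts: set of gateway IDs with an alert already open (updated in place)
    """
    new_alerts = []
    for gateway_id, seen_at in sorted(last_heartbeats.items()):
        if now - seen_at > OFFLINE_AFTER:
            if gateway_id not in open_alerts:
                open_alerts.add(gateway_id)
                new_alerts.append(gateway_id)
            # already open: consolidated into the existing row, no new alert
        else:
            open_alerts.discard(gateway_id)  # heartbeats resumed
    return new_alerts


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    beats = {"gw_a1b2c3": now - timedelta(minutes=90)}
    print(scan_gateways(beats, set(), now))
```

Tracking open alerts per gateway is what keeps a long outage from producing one alert per hourly scan.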

Firmware update

Dashboard → Edge Gateways → <gateway> → Update firmware. Pick the target version. On the next heartbeat the gateway receives the command and:

Downloads the new Docker image.
Stops the old container with docker stop.
Starts the new one.
Reports success or failure on the next heartbeat.

If the update fails (corrupted image, insufficient space), the gateway stays on the previous version and the error appears in the dashboard.

Troubleshooting

The gateway doesn't come online after 2 minutes

Check 1 — container logs:

docker logs --tail 50 rela-ai-edge

Common messages and their causes:

Message	Cause	Fix
`DNS resolution failed for rela-ai-worker-*.run.app`	No internet or DNS blocked	Try `nslookup rela-ai-worker-687568754456.europe-west1.run.app` from the host
`SSL: CERTIFICATE_VERIFY_FAILED`	System time out of sync, or expired certs	`sudo systemctl restart systemd-timesyncd`
`HTTP 401 Unauthorized`	Wrong `GATEWAY_SECRET`	Regenerate the secret in the dashboard and update the container
`HTTP 403 Forbidden`	`GATEWAY_ID` doesn't exist or was deleted	Register it again in the dashboard
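
When triaging several gateways, the table above can be turned into a simple lookup over saved log output. The sketch below is a hypothetical helper: it matches only the four message fragments listed in the table and assumes they appear verbatim in the log text.

```python
# (fragment to look for, hint taken from the table above)
KNOWN_ERRORS = [
    ("DNS resolution failed",
     "No internet or DNS blocked: try nslookup on the backend host name"),
    ("CERTIFICATE_VERIFY_FAILED",
     "System time out of sync or expired certs: restart systemd-timesyncd"),
    ("HTTP 401",
     "Wrong GATEWAY_SECRET: regenerate the secret and update the container"),
    ("HTTP 403",
     "GATEWAY_ID doesn't exist or was deleted: register it again"),
]


def diagnose(log_text):
    """Return a hint for every known error fragment found in the log text."""
    return [hint for fragment, hint in KNOWN_ERRORS if fragment in log_text]


if __name__ == "__main__":
    import sys
    for hint in diagnose(sys.stdin.read()):
        print(hint)
```

One way to use it, assuming the script is saved as diagnose.py, is to pipe both output streams of docker logs into it: docker logs rela-ai-edge 2>&1 | python3 diagnose.py.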

Check 2 — outbound connectivity:

curl -v https://rela-ai-worker-687568754456.europe-west1.run.app/internal/health

This should return 200 OK with the JSON body {"status":"ok"}.
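
The same check can be scripted, for example to run from cron on the gateway host. This is a minimal sketch using only the Python standard library. The function names are hypothetical; the /internal/health path and the expected response come from this page.

```python
import json
import urllib.request


def is_healthy(status_code, body):
    """True when the response is 200 with a JSON object whose status is "ok"."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        return False  # not JSON, or JSON that is not an object


def check_health(base_url, timeout=10):
    """GET <base_url>/internal/health; any network or TLS error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/internal/health", timeout=timeout) as resp:
            return is_healthy(resp.status, resp.read())
    except OSError:  # covers DNS failures, timeouts, TLS errors, HTTP errors
        return False
```

Calling check_health with the BACKEND_URL value from the container configuration exercises DNS, the firewall rule, and TLS in one step, which are the same paths the gateway itself depends on.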

The gateway is online but doesn't read the PLC

Verify the gateway and the PLC are on the same network (ping from gateway to PLC should work).
Review the container logs for specific Modbus/OPC UA errors (same troubleshooting as Modbus sources or OPC UA sources).
In the dashboard, each source has a "Last read" widget — if it's been at 0 for a while, the container logs tell you why.

Queue depth rising endlessly

A steadily rising queue depth means the gateway reads events faster than it can push them to the backend. Common causes:

Slow internet: check HTTPS latency in the logs. >2s usually indicates a problem.
Backend down: last POST returned 5xx. Check the dashboard.
Too many sources: redistribute across gateways.

The queue caps at 100k events. Once it reaches that cap, the gateway discards the oldest events and logs a warning. If you see this, add another gateway or debug the bottleneck.
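
To judge how urgent a rising queue is, the time left before the cap follows from the gap between the read rate and the push rate. The helper below is a back-of-the-envelope sketch: the function name and the example rates are assumptions, and only the 100k cap comes from this page.

```python
QUEUE_CAP = 100_000  # events, as stated on this page


def seconds_until_full(depth, read_rate, push_rate, cap=QUEUE_CAP):
    """Seconds until the buffer reaches the cap; None if the queue is not growing.

    read_rate and push_rate are in events per second.
    """
    growth = read_rate - push_rate
    if growth <= 0:
        return None
    return max(cap - depth, 0) / growth
```

For example, with 40,000 events queued, 50 events/s read, and 30 events/s pushed, the queue grows by 20 events/s and hits the cap in 60,000 / 20 = 3,000 seconds, or 50 minutes. During a full outage (push rate 0) at 10 events/s, an empty buffer lasts 10,000 seconds, a little under three hours.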

Queue depth at 0 and no events reaching the dashboard

The gateway is connected and pushing fine, but the backend pipeline isn't processing the events. Possible causes:

The source that should emit has enabled: false.
The machine agent assigned to that source has min_severity too high.
A correlation rule is suppressing the events (see Event Rules docs).

Security considerations

The shared_secret is equivalent to an API key: treat it like any other secret.
The gateway rotates its auth token automatically every 24 hours using the secret.
Each gateway can only read the sources assigned to its gateway_id. It cannot impersonate other gateways of the same tenant.
All outbound traffic is TLS 1.3 encrypted.
The gateway accepts no inbound connections from the internet: it is a client, not a server.

How to test the gateway works end-to-end

Operational checklist:

Dashboard → Edge Gateways → gw_xxx: status online, last_heartbeat under 1min.
Reasonable CPU / Mem (under 20% CPU, under 500MB RAM at idle).
Queue depth low or 0.
A source with deployment_mode=edge and edge_gateway_id=gw_xxx must be connected and reading.
Events arriving in the inbox with source_system=edge_gateway in the metadata.
No GATEWAY_OFFLINE alerts for the gateway itself in the inbox.
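
The numeric items in the checklist above can be evaluated against a snapshot of the gateway's dashboard values. This is a sketch only: the dictionary keys and the max_queue_depth threshold are hypothetical (the page says only "low or 0"), and the source and inbox checks still need to be confirmed by hand.

```python
def failed_checks(gw, max_queue_depth=100):
    """Return the names of checklist items the snapshot fails (empty list = pass).

    gw is a dict with hypothetical keys: status, heartbeat_age_s, cpu_percent,
    mem_mb, queue_depth, offline_alert_open.
    """
    checks = [
        ("status online", gw["status"] == "online"),
        ("heartbeat under 1 min", gw["heartbeat_age_s"] < 60),
        ("CPU under 20%", gw["cpu_percent"] < 20),
        ("memory under 500 MB", gw["mem_mb"] < 500),
        ("queue depth low", gw["queue_depth"] <= max_queue_depth),
        ("no open GATEWAY_OFFLINE alert", not gw["offline_alert_open"]),
    ]
    return [name for name, ok in checks if not ok]
```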
