Edge Gateway — Install and Operations

How to install the rela-ai-edge container in your plant, register it, assign sources, monitor the fleet, and troubleshoot connection issues.

Edge Gateway

The edge gateway is a Docker container that runs inside your plant network and polls the PLCs locally. Instead of Rela AI opening inbound TCP connections to the PLC, the gateway reads the data and pushes it out via HTTPS to the Cloud Run worker.

What it's for

The gateway is for plants with strict firewalls where you can't open inbound ports, not even via VPN. It only requires outbound HTTPS to the Rela AI domain, a rule 95% of industrial firewalls already allow without IT negotiation. It also adds local buffering: if the internet connection goes down, events accumulate on the gateway's disk and drain when the connection returns.

How it works

  1. Install the container on a machine with Docker inside the plant network (Raspberry Pi, mini-PC, NUC).
  2. On startup, the container registers with our backend using a unique per-gateway token.
  3. It receives its config (which sources to poll) via HTTPS.
  4. Opens Modbus / OPC UA / S7 connections locally to the PLC.
  5. Each read → event → SQLite buffer → HTTPS push to the agent's webhook.
  6. If the network drops, the buffer holds up to 100k events; on recovery, it drains in FIFO order.
  7. Sends heartbeats every 60 seconds; if the backend doesn't see any for >1h, it fires a GATEWAY_OFFLINE alert.
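
The store-and-forward behavior in steps 5 and 6 can be sketched as follows. This is a minimal illustration, not the gateway's actual implementation: the class `EventBuffer`, the table layout, and the `push` callback are hypothetical. Only the SQLite backing, the FIFO drain, and the 100k cap with oldest-first discard come from this page.

```python
import sqlite3

MAX_EVENTS = 100_000  # buffer cap stated on this page


class EventBuffer:
    """Bounded FIFO buffer backed by SQLite (illustrative sketch)."""

    def __init__(self, path=":memory:", max_events=MAX_EVENTS):
        self.db = sqlite3.connect(path)
        self.max_events = max_events
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )

    def depth(self):
        """Number of events waiting to be pushed (the 'queue depth')."""
        return self.db.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    def put(self, payload):
        """Append one event; if over the cap, discard the oldest rows.

        Returns the number of events discarded.
        """
        with self.db:  # one transaction: insert plus optional trim
            self.db.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
            overflow = self.depth() - self.max_events
            if overflow > 0:
                self.db.execute(
                    "DELETE FROM events WHERE id IN "
                    "(SELECT id FROM events ORDER BY id LIMIT ?)",
                    (overflow,),
                )
        return max(overflow, 0)

    def drain(self, push, batch=100):
        """Push events oldest-first; delete each only after push() returns True.

        Stops at the first failed push so nothing is lost. Returns events sent.
        """
        sent = 0
        while True:
            rows = self.db.execute(
                "SELECT id, payload FROM events ORDER BY id LIMIT ?", (batch,)
            ).fetchall()
            if not rows:
                return sent
            for row_id, payload in rows:
                if not push(payload):
                    return sent
                with self.db:
                    self.db.execute("DELETE FROM events WHERE id = ?", (row_id,))
                sent += 1
```

Deleting a row only after its push succeeds means a crash mid-drain can resend an event but cannot drop one, which is the usual trade-off for a store-and-forward queue.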

Benefits

  • Firewall-friendly: Outbound HTTPS 443 only. No VPN needed, no inbound ports.
  • Local buffering: Store-and-forward on SQLite. Losing internet doesn't lose data, up to the 100k-event buffer cap.
  • Local polling: PLC latency under 1 ms (same network). No internet RTT.
  • Multi-protocol: One gateway can poll Modbus + OPC UA + S7 concurrently.
  • Centralized monitoring: The dashboard shows the fleet with last_heartbeat, CPU, memory, and queue depth.

Install — step by step

1. Register the gateway in the dashboard

Dashboard → Connections → Edge Gateways → Register Gateway.

This returns a gateway_id (e.g. gw_a1b2c3) and a shared_secret that the container uses to authenticate. Save the shared_secret when it appears; it is shown only once.

2. Prepare the host

Minimum requirements:

  • Linux x86_64 or ARM64 (Raspberry Pi 4+ works).
  • Docker 20+ installed.
  • 1 GB RAM, 10 GB disk.
  • Connectivity to the PLC network (same subnet or routed).
  • Internet egress (HTTPS 443) to rela-ai-worker-*.run.app.
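
The minimum requirements above can be checked before install with a short preflight script. This is a minimal sketch and not part of the Rela AI tooling: `check_host` and `local_facts` are hypothetical names, the 10 GB figure is treated as free disk space (an assumption), and the Docker version and network reachability are not checked here.

```python
import os
import platform
import shutil

# Common spellings of the two supported architectures.
SUPPORTED_ARCHS = {"x86_64", "amd64", "aarch64", "arm64"}
MIN_RAM_GB = 1
MIN_DISK_GB = 10
RAM_SLACK = 0.9  # reported RAM is usually a bit below the nominal size


def check_host(arch, ram_gb, free_disk_gb):
    """Return a list of problems; an empty list means the host passes."""
    problems = []
    if arch.lower() not in SUPPORTED_ARCHS:
        problems.append(f"unsupported architecture: {arch}")
    if ram_gb < MIN_RAM_GB * RAM_SLACK:
        problems.append(f"RAM too low: {ram_gb:.2f} GB < {MIN_RAM_GB} GB")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(f"disk too low: {free_disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def local_facts(path="/"):
    """Collect (arch, ram_gb, free_disk_gb) on a Linux host."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_bytes = shutil.disk_usage(path).free
    return platform.machine(), ram_bytes / 1024**3, free_bytes / 1024**3
```

On the target host, `check_host(*local_facts())` returns an empty list when the architecture, RAM, and disk checks pass.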

3. Start the container

docker run -d \
  --name rela-ai-edge \
  --restart unless-stopped \
  -v rela-edge-data:/var/lib/rela-edge \
  -e GATEWAY_ID=gw_a1b2c3 \
  -e GATEWAY_SECRET='<shared_secret>' \
  -e BACKEND_URL=https://rela-ai-worker-687568754456.europe-west1.run.app \
  gcr.io/rela-ai-488016/rela-ai-edge:latest

Or with docker compose:

services:
  rela-ai-edge:
    image: gcr.io/rela-ai-488016/rela-ai-edge:latest
    restart: unless-stopped
    volumes:
      - rela-edge-data:/var/lib/rela-edge
    environment:
      GATEWAY_ID: gw_a1b2c3
      GATEWAY_SECRET: <shared_secret>
      BACKEND_URL: https://rela-ai-worker-687568754456.europe-west1.run.app

volumes:
  rela-edge-data:

4. Verify the gateway is online

In the dashboard, within ~60 seconds you should see the gateway in green with a recent last_heartbeat.

Local logs:

docker logs -f rela-ai-edge

You should see:

edge_started gateway_id=gw_a1b2c3
heartbeat_sent seq=1 latency_ms=142
config_fetched sources=0

5. Assign sources to the gateway

In the dashboard, edit the sources you want running from this gateway:

  • Deployment Mode: edge
  • Edge Gateway ID: gw_a1b2c3

On the next heartbeat (≤60s), the gateway downloads the updated config and starts polling.

Fleet monitoring

Dashboard → Connections → Edge Gateways shows every gateway for the tenant with:

| Column | Meaning |
| --- | --- |
| Name | Label (e.g. "Gateway North Plant") |
| Status | online if the last heartbeat is less than 60 s old, warning if 1-60 min, offline if more than 60 min |
| Last heartbeat | Relative time since the last ping |
| CPU / Mem | Host metrics reported by the container |
| Queue depth | Events pending in the buffer (should be low except during outages) |
| Firmware version | Container version |
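
The Status thresholds can be expressed as a small function. This is a sketch of the rule as documented on this page, not the dashboard's actual code; the function name `gateway_status` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def gateway_status(last_heartbeat, now=None):
    """Map the age of the last heartbeat to online / warning / offline."""
    now = now or datetime.now(timezone.utc)
    age = now - last_heartbeat
    if age < timedelta(seconds=60):
        return "online"
    if age <= timedelta(minutes=60):
        return "warning"
    return "offline"
```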

Automatic alerts

A Cloud Scheduler job runs hourly and scans every gateway. If it finds one with last_heartbeat > 60min, it fires a canonical GATEWAY_OFFLINE alert with warning severity in the inbox. Because the scan is hourly, the alert can arrive up to about two hours after the gateway's last heartbeat. Operators see it like any other asset alert.

The inbox dedup consolidates outages from the same gateway into a single row until it heartbeats again (no spam during long outages).
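
The scan-and-dedup behavior can be sketched like this. It is an illustration under assumptions: `scan_gateways`, the `open_alerts` set, and the alert dictionary shape are hypothetical; only the 60-minute threshold, the GATEWAY_OFFLINE name, the warning severity, and the one-row-per-outage rule come from this page.

```python
from datetime import datetime, timedelta, timezone

OFFLINE_AFTER = timedelta(minutes=60)


def scan_gateways(last_heartbeats, open_alerts, now):
    """Return the alerts to fire on this scan.

    last_heartbeats: dict of gateway_id -> datetime of the last heartbeat.
    open_alerts: set of gateway_ids with an alert already open (updated in place).
    """
    fired = []
    for gw_id, seen in last_heartbeats.items():
        if now - seen > OFFLINE_AFTER:
            if gw_id not in open_alerts:  # dedup: one alert per outage
                open_alerts.add(gw_id)
                fired.append(
                    {"type": "GATEWAY_OFFLINE", "severity": "warning", "gateway_id": gw_id}
                )
        else:
            open_alerts.discard(gw_id)  # gateway heartbeats again: outage closed
    return fired
```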

Firmware update

Dashboard → Edge Gateways → <gateway> → Update firmware. Pick the target version. On the next heartbeat the gateway receives the command and:

  1. Downloads the new Docker image.
  2. Stops the old container with docker stop.
  3. Starts the new one.
  4. Reports success or failure on the next heartbeat.

If the update fails (corrupted image, insufficient space), the gateway stays on the previous version and the error is shown in the dashboard.
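
The update sequence and its stay-on-previous-version guarantee can be sketched as below. This is a hedged illustration, not the gateway's code: `apply_update` and the injected `pull`, `stop`, and `start` callables are hypothetical, and restarting the old version after a failed start is an assumption (this page only names download-time failures).

```python
def apply_update(current, target, pull, stop, start):
    """Run the update steps and return (running_version, error_or_None).

    pull, stop, start are callables taking a version and raising on failure.
    """
    try:
        pull(target)  # step 1: download; the old container keeps running
    except Exception as exc:
        return current, f"download failed: {exc}"
    stop(current)  # step 2: stop the old container
    try:
        start(target)  # step 3: start the new one
    except Exception as exc:
        start(current)  # assumption: fall back to the previous version
        return current, f"start failed: {exc}"
    return target, None  # step 4: the result goes out on the next heartbeat
```

Downloading before stopping anything is what keeps a corrupted image or a full disk from causing downtime.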

Troubleshooting

The gateway doesn't come online after 2 minutes

Check 1 — container logs:

docker logs rela-ai-edge | tail -50

Common messages and their causes:

| Message | Cause | Fix |
| --- | --- | --- |
| DNS resolution failed for rela-ai-worker-*.run.app | No internet or DNS blocked | Try nslookup rela-ai-worker-687568754456.europe-west1.run.app from the host |
| SSL: CERTIFICATE_VERIFY_FAILED | System time out of sync + expired certs | sudo systemctl restart systemd-timesyncd |
| HTTP 401 Unauthorized | Wrong GATEWAY_SECRET | Regenerate the secret in the dashboard and update the container |
| HTTP 403 Forbidden | GATEWAY_ID doesn't exist or was deleted | Register it again in the dashboard |

Check 2 — outbound connectivity:

curl -v https://rela-ai-worker-687568754456.europe-west1.run.app/internal/health

This should return 200 OK with JSON {"status":"ok"}.

The gateway is online but doesn't read the PLC

  • Verify the gateway can reach the PLC, on the same subnet or over a routed path (ping from the gateway host to the PLC should work).
  • Review the container logs for specific Modbus/OPC UA errors (same troubleshooting as Modbus sources or OPC UA sources).
  • In the dashboard, each source has a "Last read" widget — if it's been at 0 for a while, the container logs tell you why.

Queue depth rising endlessly

This means the gateway reads faster than it can push to the backend. Common causes:

  1. Slow internet: check HTTPS latency in the logs. >2s usually indicates a problem.
  2. Backend down: last POST returned 5xx. Check the dashboard.
  3. Too many sources: redistribute across gateways.

The queue caps at 100k events. When it reaches the cap, the gateway starts discarding the oldest events with a warning. If you see this, it's time to add another gateway or debug the bottleneck.
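
To judge how urgent a rising queue is, you can estimate the time left before the cap from the current depth and the two rates. The helper below is a back-of-the-envelope sketch; `seconds_until_cap` is a hypothetical name and the rates are values you would measure yourself, for example from the logs.

```python
def seconds_until_cap(depth, read_rate, push_rate, cap=100_000):
    """Seconds until the buffer reaches the cap, or None if it is not growing.

    depth: events currently queued.
    read_rate / push_rate: events per second read from PLCs / pushed to the backend.
    """
    net = read_rate - push_rate  # net growth in events per second
    if net <= 0:
        return None  # queue is stable or draining
    return max(cap - depth, 0) / net
```

For example, with 10,000 events queued, 50 events/s read, and 20 events/s pushed, the queue grows by 30 events/s and the remaining 90,000 slots fill in 3,000 seconds, or 50 minutes.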

Queue depth at 0 and no events reaching the dashboard

The gateway is connected and pushing fine, but the backend pipeline isn't processing the events. Possible causes:

  • The source that should emit has enabled: false.
  • The machine agent assigned to that source has min_severity too high.
  • A correlation rule is suppressing (see Event Rules docs).
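
The three causes above amount to a chain of filters, which can be sketched as a function that reports why an event was dropped. This is illustrative only: `drop_reason`, the dictionary shapes, and the severity ordering (info < warning < critical) are assumptions, not the backend's actual schema.

```python
# Assumed severity scale; only "warning" is named on this page.
SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}


def drop_reason(event, source, agent, suppressed_ids=frozenset()):
    """Return why an event would not reach the dashboard, or None if it passes."""
    if not source.get("enabled", True):
        return "source disabled"
    if SEVERITY_ORDER[event["severity"]] < SEVERITY_ORDER[agent["min_severity"]]:
        return "below agent min_severity"
    if event["id"] in suppressed_ids:
        return "suppressed by correlation rule"
    return None
```

Checking in this order mirrors the list: source settings first, then the agent's threshold, then rule suppression.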

Security considerations

  • The shared_secret is equivalent to an API key — treat it as any secret.
  • The gateway rotates its auth token automatically every 24 hours using the secret.
  • Each gateway can only read the sources assigned to its gateway_id. It cannot impersonate other gateways of the same tenant.
  • All outbound traffic is TLS 1.3 encrypted.
  • No inbound connection to the gateway from the internet — it's a client, not a server.

How to test the gateway works end-to-end

Operational checklist:

  1. Dashboard → Edge Gateways → gw_xxx: status online, last_heartbeat under 1min.
  2. Reasonable CPU / Mem (under 20% CPU, under 500MB RAM at idle).
  3. Queue depth low or 0.
  4. A source with deployment_mode=edge and edge_gateway_id=gw_xxx must be connected and reading.
  5. Events arriving in the inbox with source_system=edge_gateway in the metadata.
  6. No GATEWAY_OFFLINE alerts for the gateway itself in the inbox.
