Joseph Doherty 5f97c9d1ed docs(glauth): point all dev/test LDAP at the shared GLAuth on 10.100.0.35
deployment.md / CLAUDE.md / env_vars.md: the per-app LDAP (scadabridge-ldap
container, OtOpcUa DevStubMode, per-box C:\publish\glauth) is replaced by one
shared zb-shared-glauth on 10.100.0.35:3893 (dc=zb,dc=local); source of truth
infra/glauth/. Fixed stale baseDNs (dc=lmxopcua/dc=otopcua -> dc=zb).
2026-06-04 16:37:52 -04:00

# Deployment & Environments — SCADA/OT family
> How the sister projects are deployed: environments, hosts, SSH access, Docker/Traefik
> topology, databases, and the full service/port map. Compiled **2026-06-03** by reading the
> actual compose/Traefik/SSH files (not docs alone). For the per-service **environment
> variables** see the companion [`env_vars.md`](env_vars.md).
>
> **Source confidence:** container/port/Traefik/DB facts below are read straight from the
> compose + `traefik/*.yml` files. SSH facts are from `~/.ssh/config` + `~/.ssh/known_hosts`.
> Where a fact is referenced in repo docs but not pinned in a config/script on this machine,
> it's marked _(referenced, not scripted in-repo)_ — don't treat those as automated.
---
## 1. Environment inventory
| Environment | Where it runs | What it is | Entry point |
|---|---|---|---|
| **ScadaBridge `docker`** | This Mac (Docker Desktop/OrbStack) | Full hub-and-spoke: 2 Central + 3 sites ×2 nodes | `http://localhost:9000` |
| **ScadaBridge `docker-env2`** | This Mac | Second isolated cluster: 2 Central + 1 site ×2 nodes | `http://localhost:9100` |
| **ScadaBridge `infra`** | This Mac | Shared backing services (MSSQL, OPC-UA sims, SMTP, REST, Playwright) — **not** LDAP (see shared GLAuth below) | n/a (deps) |
| **OtOpcUa `otopcua-dev`** | This Mac | 3 independent Akka clusters (MAIN + SITE-A + SITE-B) sharing one ConfigDb | `http://localhost:9200` |
| **MxAccessGateway** | `windev` (10.100.0.48), Windows | Windows-native gRPC gateway + per-session x86 worker (no Docker) | `http://10.100.0.48:5120` (gRPC) |
| **Production (VD03)** | `wonder-app-vd03.zmr.zimmer.com` | Single-node ScadaBridge + MxGateway prod host | see `docs/operations/` runbooks |

The three ScadaBridge stacks share one external Docker network **`scadabridge-net`**; the
OtOpcUa `otopcua-dev` stack runs on its **own** default network (`otopcua-dev_default`) and is
network-isolated from ScadaBridge. All local stacks can run simultaneously — host ports do not
collide (see [§7](#7-consolidated-host-port-map)).
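
The no-collision claim can be spot-checked with a few lines of shell. This is a minimal sketch, not a script from any of the repos: `find_duplicate_ports` is a hypothetical helper, and the port list is copied by hand from the LB and dashboard ports in this document.

```bash
# Hypothetical helper: print every port that appears more than once in the arguments.
# Prints nothing when all ports are unique.
find_duplicate_ports() {
  printf '%s\n' "$@" | sort -n | uniq -d
}

# LB and dashboard host ports of the local stacks, as listed in this document.
lb_ports="9000 8180 9100 8181 9200 8089"

# Word splitting of $lb_ports is intentional here.
dups="$(find_duplicate_ports $lb_ports)"
if [ -z "$dups" ]; then
  echo "no collisions"
else
  echo "colliding ports: $dups"
fi
```

To check a live machine instead of a hand-copied list, the same function could be fed the host ports reported by `docker ps --format '{{.Ports}}'` after extracting the numbers; that parsing step is left out here.
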
> On this Apple-Silicon Mac, MSSQL runs under amd64 emulation (slow first-ready; the "platform
> does not match" warning is expected/benign). See [[scadabridge-local-deploy-gotchas]].
---
## 2. Hosts & SSH connectivity
### 2.1 Host inventory
| Host | Address | OS | Role | SSH port |
|---|---|---|---|---|
| **This Mac** | local | macOS (darwin) | Dev workstation — runs all local Docker stacks | n/a |
| **windev** | `10.100.0.48` | Windows | OtOpcUa Windows-service host **+** MxAccessGateway (gRPC `5120` / dashboard `5130`) | 22 |
| **fixture host** | `10.100.0.35` | Debian/Linux + Docker | OtOpcUa driver **integration-test fixtures** + a test SQL Server + the shared GLAuth LDAP (`zb-shared-glauth`, `:3893`) | 22 |
| **VD03 (prod)** | `wonder-app-vd03.zmr.zimmer.com` | Windows | Production single-node ScadaBridge + MxGateway | **2222** |
| **gitea** | `gitea.dohertylan.com` (`10.100.0.228`) | Linux | Git remotes + NuGet feed (`/api/packages/dohertj2/nuget`) | 22 |

All are on the private `10.x` lab network — a LAN/VPN connection is required.
### 2.2 How to connect (passwordless SSH)
Auth is **key-based (passwordless)** with `~/.ssh/id_ed25519` (a legacy `~/.ssh/id_rsa` exists
as fallback). Only **one** host alias is defined in `~/.ssh/config`:
```sshconfig
# ~/.ssh/config (verified)
Include ~/.orbstack/ssh/config # OrbStack local Linux VMs — use `ssh orb` / `orb` CLI
Host windev
    HostName 10.100.0.48
    User dohertj2
    IdentityFile ~/.ssh/id_ed25519
    # Port 22 (default)
```
| Target | Command | Notes |
|---|---|---|
| **windev** (Win host) | `ssh windev` | Configured alias; user `dohertj2`, key `id_ed25519`, port 22 |
| **fixture host** | `ssh dohertj2@10.100.0.35` | In `known_hosts`; **no** config alias — pass user explicitly; port 22, key-based |
| **VD03 (prod)** | `ssh dohertj2@wonder-app-vd03.zmr.zimmer.com -p 2222` | In `known_hosts` on **port 2222** (the only non-standard SSH port); user/key not pinned in config — confirm before use |
| **local Linux VMs** | `ssh orb` / `orb` | OrbStack-managed |
> ⚠️ `~/bin` is **empty** on this Mac. OtOpcUa's `CLAUDE.md` mentions an `lmxopcua-fix` helper "in
> `~/bin`" for controlling the `10.100.0.35` fixture containers — it is **not present here** (it's a
> Windows-side helper). On this machine, drive the fixture host with direct SSH, e.g.
> `ssh dohertj2@10.100.0.35 'docker compose -f /opt/otopcua-<driver>/docker-compose.yml up -d'`.
> Treat the exact remote paths/commands as _(referenced, not scripted in-repo)_ — verify on the host.
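
Since the `lmxopcua-fix` helper is not available on this Mac, a small dry-run wrapper can stand in for it. The sketch below only prints the SSH command; `fixture_compose_cmd` and the driver directory names (`modbus`, `s7`) are hypothetical, and the `/opt/otopcua-<driver>/` layout is the referenced-but-unscripted path from the note above, so verify it on the host before running anything.

```bash
# Hypothetical helper (not part of any repo): print the SSH command that would run
# `docker compose` for one fixture on the fixture host. Nothing is executed.
FIXTURE_HOST="dohertj2@10.100.0.35"

fixture_compose_cmd() {
  local driver="$1"; shift
  # Default action is "up -d" when no further arguments are given.
  local action="${*:-up -d}"
  # ASSUMPTION: remote path layout /opt/otopcua-<driver>/ (referenced, not scripted in-repo).
  printf "ssh %s 'docker compose -f /opt/otopcua-%s/docker-compose.yml %s'\n" \
    "$FIXTURE_HOST" "$driver" "$action"
}

fixture_compose_cmd modbus    # driver name is a placeholder
fixture_compose_cmd s7 ps     # driver name is a placeholder
```
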
---
## 3. ScadaBridge deployment
.NET 10 + Akka.NET. One image `scadabridge:latest` (built by `docker/build.sh`) backs every node;
role is chosen by `SCADABRIDGE_CONFIG` (`Central`|`Site`) → `appsettings.{role}.json`. Central is a
2-node Akka cluster (split-brain resolver = `keep-oldest`); each Site is its **own** 2-node Akka
cluster reached from Central via ClusterClient.
### 3.1 `docker/` — primary 3-site cluster (network `scadabridge-net`)
| Service | Container | Host→container ports | Role | Volumes |
|---|---|---|---|---|
| central-a | `scadabridge-central-a` | `9001:5000` (UI+Inbound API), `9011:8081` (Akka) | Central | `central-node-a/appsettings.Central.json` (ro), `…/logs` |
| central-b | `scadabridge-central-b` | `9002:5000`, `9012:8081` | Central | `central-node-b/…` |
| site-a-a | `scadabridge-site-a-a` | `9021:8082` (Akka), `9023:8083` (gRPC) | Site | `site-a-node-a/{appsettings.Site.json,data,logs}` |
| site-a-b | `scadabridge-site-a-b` | `9022:8082`, `9024:8083` | Site | `site-a-node-b/…` |
| site-b-a | `scadabridge-site-b-a` | `9031:8082`, `9033:8083` | Site | `site-b-node-a/…` |
| site-b-b | `scadabridge-site-b-b` | `9032:8082`, `9034:8083` | Site | `site-b-node-b/…` |
| site-c-a | `scadabridge-site-c-a` | `9041:8082`, `9043:8083` | Site | `site-c-node-a/…` |
| site-c-b | `scadabridge-site-c-b` | `9042:8082`, `9044:8083` | Site | `site-c-node-b/…` |
| traefik | `scadabridge-traefik` | `9000:80` (Central LB), `8180:8080` (dashboard) | LB | `traefik/{traefik,dynamic}.yml` (ro) |

All `restart: unless-stopped`; image `scadabridge:latest` (traefik `traefik:v3.4`).
**Access:** Central UI/API via LB `http://localhost:9000`; direct nodes `:9001`/`:9002`; Traefik
dashboard `http://localhost:8180`; Management API `http://localhost:9000/management`; health
`…/health/ready` + `…/health/active`.
### 3.2 `docker-env2/` — secondary 1-site cluster (same `scadabridge-net`)
| Service | Container | Host→container ports | Role |
|---|---|---|---|
| central-a | `scadabridge-env2-central-a` | `9101:5000`, `9111:8081` | Central |
| central-b | `scadabridge-env2-central-b` | `9102:5000`, `9112:8081` | Central |
| site-x-a | `scadabridge-env2-site-x-a` | `9121:8082`, `9123:8083` | Site |
| site-x-b | `scadabridge-env2-site-x-b` | `9122:8082`, `9124:8083` | Site |
| traefik | `scadabridge-env2-traefik` | `9100:80` (LB), `8181:8080` (dashboard) | LB |

**Access:** LB `http://localhost:9100`; direct `:9101`/`:9102`; dashboard `http://localhost:8181`.
This cluster's DBs and **auth cookie name** are distinct from `docker/` so the two can run on
`localhost` at once — cookie `ZB.MOM.WW.ScadaBridge.Auth.env2` vs the default; see
[[scadabridge-local-deploy-gotchas]].
### 3.3 `infra/` — shared backing services (network `scadabridge-net`)
| Service | Container | Image | Host ports | Purpose |
|---|---|---|---|---|
| mssql | `scadabridge-mssql` | `mcr.microsoft.com/mssql/server:2022-latest` | `1433:1433` | SQL Server — Central DBs for **both** clusters; named vol `scadabridge-mssql-data`; init via `/docker-entrypoint-initdb.d/{setup,machinedata_seed,setup-env2}.sql` |
| opcua | `scadabridge-opcua` | `mcr.microsoft.com/iotedge/opc-plc:latest` | `50000:50000`, `8080:8080` | OPC-UA simulator 1 (`--unsecuretransport --autoaccept`) |
| opcua2 | `scadabridge-opcua2` | `…/opc-plc:latest` | `50010:50010`, `8081:8080` | OPC-UA simulator 2 |
| smtp | `scadabridge-smtp` | `axllent/mailpit:latest` | `1025:1025`, `8025:8025` | SMTP sink + web UI (`http://localhost:8025`) |
| restapi | `scadabridge-restapi` | local build `./restapi` | `5200:5200` | Test REST endpoint |
| playwright | `scadabridge-playwright` | `mcr.microsoft.com/playwright:v1.58.2-noble` | `3000:3000` | Browser-automation server |
> **LDAP is NOT started by `infra/`.** The per-app `scadabridge-ldap` container has been retired
> (commented out in `infra/docker-compose.yml`). All three apps (ScadaBridge, OtOpcUa, MxAccessGateway)
> now share a single **`zb-shared-glauth`** container on the Linux fixture host **`10.100.0.35:3893`**
> (`baseDN dc=zb,dc=local`, Transport=None). Source of truth and deploy/verify runbook:
> **`scadaproj/infra/glauth/`** (`config.toml` + `docker-compose.yml` + `README.md`); deploy by
> scp-ing those two files to `10.100.0.35` and running `docker compose up -d`.
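
A quick way to confirm the shared GLAuth answers on the stated coordinates is a base-scope `ldapsearch`. The sketch below only builds and prints the command. It assumes the OpenLDAP client tools are installed, and the bind DN is a placeholder: the real accounts are whatever the GLAuth config under `scadaproj/infra/glauth/` defines.

```bash
# Shared GLAuth coordinates as stated in this document.
GLAUTH_HOST="10.100.0.35"
GLAUTH_PORT="3893"
GLAUTH_BASE_DN="dc=zb,dc=local"

# Build the plain-LDAP URI (Transport=None, so ldap:// rather than ldaps://).
glauth_uri() {
  printf 'ldap://%s:%s\n' "$GLAUTH_HOST" "$GLAUTH_PORT"
}

# Print (do not run) a base-scope ldapsearch for the given bind DN.
# -x simple bind, -H server URI, -D bind DN, -W prompt for password,
# -b search base, -s base limits the search to the base entry.
glauth_search_cmd() {
  local bind_dn="$1"
  printf 'ldapsearch -x -H %s -D "%s" -W -b "%s" -s base\n' \
    "$(glauth_uri)" "$bind_dn" "$GLAUTH_BASE_DN"
}

# ASSUMPTION: placeholder bind DN; substitute an account from the GLAuth config.
glauth_search_cmd "cn=<service-account>,dc=zb,dc=local"
```
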
### 3.4 Traefik (ScadaBridge)
Both clusters use a file provider + insecure API dashboard. `traefik.yml`: entrypoint `web:80`,
`api.dashboard: true / insecure: true`, file provider `dynamic.yml`. `dynamic.yml` router
`central` (`PathPrefix(/)` → service `central`) load-balances the two Central containers with an
**active health check** on `/health/active` (interval 5s, timeout 3s) — so traffic only routes to
the active leader (standby returns 503 and is dropped from rotation):
```yaml
# docker/traefik/dynamic.yml (env2 points at scadabridge-env2-central-a/-b)
http:
  routers: { central: { rule: "PathPrefix(`/`)", service: central, entryPoints: [web] } }
  services:
    central:
      loadBalancer:
        healthCheck: { path: /health/active, interval: 5s, timeout: 3s }
        servers: [ {url: "http://scadabridge-central-a:5000"}, {url: "http://scadabridge-central-b:5000"} ]
```
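
To see which Central node Traefik currently treats as the leader, the same `/health/active` endpoint can be probed on the direct node ports. This is a rough sketch with hypothetical function names; it assumes `curl` is available and that the nodes answer 200 (active) or 503 (standby) as described above.

```bash
# Map the HTTP status returned by /health/active to a node role.
# 200 = active leader, 503 = standby; anything else (including curl's "000"
# for a failed connection) is reported as unknown.
classify_central() {
  case "$1" in
    200) echo "active" ;;
    503) echo "standby" ;;
    *)   echo "unreachable-or-unknown" ;;
  esac
}

# Probe one direct Central port and print its role. Needs a running stack.
probe_central() {
  local port="$1" code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 \
    "http://localhost:${port}/health/active")"
  echo "localhost:${port} -> $(classify_central "$code")"
}

# Example (docker/ stack; use 9101 and 9102 for env2):
#   probe_central 9001; probe_central 9002
```
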
### 3.5 Databases (ScadaBridge)
- **Central → MSSQL** (`scadabridge-mssql:1433`), app login `scadabridge_app` / `ScadaBridge_Dev1#` 🔒(dev-only):
  - `docker/`: `ScadaBridgeConfig` + `ScadaBridgeMachineData`
  - `docker-env2/`: `ScadaBridgeConfig2` + `ScadaBridgeMachineData2`
  - Created by `infra/mssql/setup.sql` + `setup-env2.sql` at MSSQL init; EF Core migrations run on Central startup; `docker-env2/init-db.sh` ensures the env2 DBs before deploy; `seed-sites.sh` seeds Site rows post-deploy.
- **Site → SQLite**, per node under the mounted `…/data` volume (`SiteDbPath`, plus a store-and-forward DB). Not networked, not replicated across hosts.
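
As an illustration of the Central database layout, the sketch below maps each stack to its two database names and builds a T-SQL query that lists which of them exist. The function names are hypothetical. Running the query assumes `sqlcmd` is installed on the Mac and that the app login is allowed to see these databases in `sys.databases`.

```bash
# Central database names per stack, as listed above.
central_dbs() {
  case "$1" in
    docker)      echo "ScadaBridgeConfig ScadaBridgeMachineData" ;;
    docker-env2) echo "ScadaBridgeConfig2 ScadaBridgeMachineData2" ;;
    *)           echo "unknown stack: $1" >&2; return 1 ;;
  esac
}

# Build a T-SQL query that returns whichever of the expected databases exist.
db_check_query() {
  local names="" db
  for db in $(central_dbs "$1"); do
    names="${names:+$names, }'$db'"
  done
  echo "SELECT name FROM sys.databases WHERE name IN ($names);"
}

db_check_query docker-env2

# To run it against the local MSSQL container (dev-only credentials from this section;
# some sqlcmd versions also need -C to trust the server certificate):
#   sqlcmd -S localhost,1433 -U scadabridge_app -P 'ScadaBridge_Dev1#' -Q "$(db_check_query docker)"
```
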
### 3.6 Deploy commands (ScadaBridge)
```bash
cd ~/Desktop/ScadaBridge
cd infra && docker compose up -d # 1) backing services (MSSQL, OPC-UA, SMTP, REST) — LDAP is shared glauth on 10.100.0.35 (scadaproj/infra/glauth/)
bash docker/build.sh # 2) create scadabridge-net (if missing) + build scadabridge:latest
bash docker/deploy.sh # 3) up -d --force-recreate; prints access points (9000/9001/9002/8180)
bash docker/seed-sites.sh # 4) seed sites + data-connections (optional)
# env2 cluster:
bash docker-env2/deploy.sh # reuses the image; runs init-db.sh; ports 9100/9101/9102/8181
```
> **Caveat:** `deploy.sh` does `up -d --force-recreate`, starting both Central nodes at once — they
> can split-brain on a simultaneous start. Start Central **sequenced** (central-a → wait `/health/active`
> 200 → central-b). Central also requires `ScadaBridge__InboundApi__ApiKeyPepper` (dev value is inline in
> both composes). Full detail: [[scadabridge-local-deploy-gotchas]].
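
The sequenced start described in the caveat could look roughly like this. It is a sketch under stated assumptions, not a replacement for `deploy.sh`: the compose file path `docker/docker-compose.yml` and the function names are assumptions, and the service names are taken from the "Service" column in §3.1.

```bash
# Return the HTTP status code for a URL ("000" when the connection fails).
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$1"
}

# Poll a URL until it answers 200: up to $2 attempts, $3 seconds apart.
wait_for_200() {
  local url="$1" tries="${2:-60}" delay="${3:-2}" i=1
  while [ "$i" -le "$tries" ]; do
    [ "$(probe_status "$url")" = "200" ] && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Start central-a, wait until it reports itself active, then start central-b.
# ASSUMPTION: compose file location; adjust to the real path in the repo.
start_central_sequenced() {
  local compose="docker/docker-compose.yml"
  docker compose -f "$compose" up -d central-a || return 1
  wait_for_200 "http://localhost:9001/health/active" || {
    echo "central-a never became active" >&2
    return 1
  }
  docker compose -f "$compose" up -d central-b
}

# Usage (from the ScadaBridge checkout, after build.sh): start_central_sequenced
```
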
---
## 4. OtOpcUa deployment (`otopcua-dev`)
.NET 10 OPC-UA server. **Three independent Akka clusters** share the single `OtOpcUa` ConfigDb
(multi-tenancy via the `ServerCluster` table); Akka isolation is by disjoint seed lists (same
system name `otopcua`, internal remoting port `4053`). Built locally from `docker-dev/Dockerfile`
→ image `otopcua-host:dev`. **No per-app LDAP container:** `docker-dev` is un-stubbed
(`Authentication__Ldap__DevStubMode` removed) and binds the **shared GLAuth** at
`10.100.0.35:3893` (`baseDN dc=zb,dc=local`, Transport=None). Start the shared glauth first via
`scadaproj/infra/glauth/` if it is not already running.
| Service | Container | Host→container ports | Cluster / role |
|---|---|---|---|
| sql | (`otopcua-dev-sql-1`) | `14330:1433` | SQL Server 2022 — the shared `OtOpcUa` ConfigDb |
| cluster-seed | one-shot | — | `mssql-tools` running `/seed/entrypoint.sh` (idempotent ServerCluster/ClusterNode seed) |
| admin-a | host | _(none — internal `:9000` UI behind Traefik)_ | MAIN, role `admin` (seed) |
| admin-b | host | _(none)_ | MAIN, role `admin` (joins admin-a) |
| driver-a | host | `4840:4840` (OPC UA) | MAIN, role `driver` |
| driver-b | host | `4841:4840` | MAIN, role `driver` |
| site-a-1 | host | `4842:4840` | SITE-A, `admin,driver` (seed) |
| site-a-2 | host | `4843:4840` | SITE-A, `admin,driver` |
| site-b-1 | host | `4844:4840` | SITE-B, `admin,driver` (seed) |
| site-b-2 | host | `4845:4840` | SITE-B, `admin,driver` |
| traefik | host | `9200:80` (Admin UI LB), `8089:8080` (dashboard) | `traefik:v3.1` |
- **OPC UA endpoints:** `opc.tcp://localhost:4840` (driver-a) … `:4845` (site-b-2). Admin nodes serve no OPC UA.
- **Admin UI (Traefik, sticky cookie `otopcua_lb`, health-checked on `/health/active`):**
  - MAIN cluster: `http://localhost:9200`
  - SITE-A: `http://site-a.localhost:9200` · SITE-B: `http://site-b.localhost:9200` (Host-header routing; macOS auto-resolves `*.localhost`)
  - Traefik dashboard: `http://localhost:8089`
- **DB:** `sql` service, `14330:1433`, SA `OtOpcUa!Dev123` 🔒(dev-only), database `OtOpcUa`; EF auto-migrates on host start, then `cluster-seed` inserts the 3 ServerCluster + 6 ClusterNode rows.
- **Deploy:** `docker compose -f docker-dev/docker-compose.yml up -d --build` ; tear down with `… down -v`.
- **Galaxy link:** driver nodes resolve `GALAXY_MXGW_API_KEY` and connect out to MxAccessGateway (see §5).
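
For scripting against this stack, the node-to-endpoint mapping from the table and bullets above can be captured in two small lookup functions. This is only an illustrative sketch; `opcua_endpoint` and `admin_ui_url` are hypothetical names, and the ports are the published host ports listed above.

```bash
# OPC UA endpoint for a driver-capable node (admin-a/admin-b serve no OPC UA).
opcua_endpoint() {
  local port
  case "$1" in
    driver-a) port=4840 ;;
    driver-b) port=4841 ;;
    site-a-1) port=4842 ;;
    site-a-2) port=4843 ;;
    site-b-1) port=4844 ;;
    site-b-2) port=4845 ;;
    *) echo "no OPC UA endpoint for node: $1" >&2; return 1 ;;
  esac
  echo "opc.tcp://localhost:${port}"
}

# Admin UI URL per cluster (Traefik Host-header routing on port 9200).
admin_ui_url() {
  case "$1" in
    MAIN)   echo "http://localhost:9200" ;;
    SITE-A) echo "http://site-a.localhost:9200" ;;
    SITE-B) echo "http://site-b.localhost:9200" ;;
    *) echo "unknown cluster: $1" >&2; return 1 ;;
  esac
}

for node in driver-a driver-b site-a-1 site-a-2 site-b-1 site-b-2; do
  printf '%-9s %s\n' "$node" "$(opcua_endpoint "$node")"
done
```
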
> **Integration-test fixtures (separate from this stack)** run on the Linux **fixture host
> `10.100.0.35`** (Modbus `:5020`, Allen-Bradley `:44818`, S7 `:102`, OPC-UA `:50000`, SQL `:14330`).
> Those are test endpoints, not the deployed app; per-fixture env defaults are in [`env_vars.md`](env_vars.md) §1.3.
---
## 5. MxAccessGateway deployment (Windows-native, no Docker)
Two processes: an **x64 .NET 10 Server** (ASP.NET Core gRPC + Blazor dashboard) and a **per-session
x86 .NET 4.8 Worker** that owns the 32-bit AVEVA MXAccess COM/STA. Windows-only. Deployed on
**`windev` (10.100.0.48)** and **VD03**, run as a **Windows Service via NSSM** (config delivered as
`Kestrel__Endpoints__…` environment variables, not `appsettings.json`).
### 5.1 Endpoint/port map
| Endpoint | Default URL | Protocol | Config key | Purpose |
|---|---|---|---|---|
| **Http (gRPC)** | `http://0.0.0.0:5120` (h2c) | HTTP/2 cleartext | `Kestrel__Endpoints__Http__Url` / `__Protocols=Http2` | Public gRPC: sessions, MxCommand/MxEvent, Galaxy browse |
| **Dashboard** | `http://0.0.0.0:5130` | HTTP/1.1 | `Kestrel__Endpoints__Dashboard__Url` | Blazor dashboard + SignalR hubs + `/login` |

Local dev (`launchSettings.json`): gRPC `http://localhost:5120` (https dev profile adds `7121`).
TLS optional — set `…Http__Url=https://…`; the gateway auto-generates a self-signed cert if none is
supplied (`docs/GatewayConfiguration.md`). Dashboard cookie name is now configurable
(`MxGateway:Dashboard:CookieName`).
### 5.2 Run / host
```powershell
# local dev
dotnet run --project src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj
# the x86 worker must be published first; path = MxGateway:Worker:ExecutablePath
dotnet build src/MxGateway.Worker/MxGateway.Worker.csproj -p:Platform=x86
```
- **Worker model:** the Server spawns one `ZB.MOM.WW.MxGateway.Worker.exe` (x86) **per gRPC session**;
IPC over a named pipe (`\\.\pipe\mxgateway-<session>` + a per-session `MXGATEWAY_WORKER_NONCE`);
heartbeat 5s / grace 15s; max 64 concurrent sessions. The worker exits when the session closes.
- **Production hosts:** both `10.100.0.48` and `wonder-app-vd03` serve gRPC on `:5120` (per
`docs/GatewayConfiguration.md`).
### 5.3 Who connects to it
| Client | Connects to | Auth |
|---|---|---|
| OtOpcUa `GalaxyDriver` | `http://10.100.0.48:5120` (gRPC) | API key via `GALAXY_MXGW_API_KEY` (`mxgw_…` bearer) 🔒 |
| ScadaBridge MxGateway adapter | same gRPC endpoint `:5120` | API key |
---
## 6. Cross-project runtime data flow (deployed)
```
AVEVA Galaxy (Wonderware) ──MXAccess COM (32-bit)──► MxAccessGateway (windev:5120 gRPC / :5130 dashboard)
                                                         ▲                      ▲
                         OtOpcUa GalaxyDriver ───gRPC────┘                      │ gRPC
                         (otopcua-dev: opc.tcp :4840-4845)                      │
                                   │ OPC UA                                     │
                                   ▼                                            │
                           ScadaBridge DCL ◄──OPC UA──┐   ┌──MxGateway adapter──┘
                          (docker :9000 / env2 :9100) └───┘
```
ScadaBridge reaches Wonderware data two ways: **(1)** OPC UA → OtOpcUa → gateway, or **(2)** its
MxGateway adapter → gateway directly. The break surface is the wire contracts (the gateway `.proto`s
and OtOpcUa's OPC-UA address space), not compile references.

---
## 7. Consolidated host port map
Every published host port across the local stacks (no collisions — all can run at once):
| Port | → Container:port | Service | Stack |
|---|---|---|---|
| 1025 | `scadabridge-smtp`:1025 | SMTP submission | infra |
| 1433 | `scadabridge-mssql`:1433 | SQL Server (ScadaBridge Central DBs) | infra |
| 3000 | `scadabridge-playwright`:3000 | Playwright server | infra |
| 3893 | `zb-shared-glauth`:3893 on **10.100.0.35** | LDAP (shared GLAuth — remote fixture host, not a local container) | scadaproj/infra/glauth/ |
| 5200 | `scadabridge-restapi`:5200 | Test REST API | infra |
| 8025 | `scadabridge-smtp`:8025 | Mailpit web UI | infra |
| 8080 | `scadabridge-opcua`:8080 | OPC-UA sim 1 web UI | infra |
| 8081 | `scadabridge-opcua2`:8080 | OPC-UA sim 2 web UI | infra |
| 50000 | `scadabridge-opcua`:50000 | OPC-UA sim 1 endpoint | infra |
| 50010 | `scadabridge-opcua2`:50010 | OPC-UA sim 2 endpoint | infra |
| 9000 | `scadabridge-traefik`:80 | **Central UI/API (LB)** | docker |
| 8180 | `scadabridge-traefik`:8080 | Traefik dashboard | docker |
| 9001 / 9002 | central-a / central-b :5000 | Central UI+Inbound API (direct) | docker |
| 9011 / 9012 | central-a / central-b :8081 | Akka remoting | docker |
| 9021-9024 | site-a-a/b :8082 / :8083 | Site A Akka / gRPC | docker |
| 9031-9034 | site-b-a/b :8082 / :8083 | Site B Akka / gRPC | docker |
| 9041-9044 | site-c-a/b :8082 / :8083 | Site C Akka / gRPC | docker |
| 9100 | `scadabridge-env2-traefik`:80 | **Central UI/API (LB)** | docker-env2 |
| 8181 | `scadabridge-env2-traefik`:8080 | Traefik dashboard | docker-env2 |
| 9101 / 9102 | env2 central-a / central-b :5000 | Central (direct) | docker-env2 |
| 9111 / 9112 | env2 central-a / central-b :8081 | Akka remoting | docker-env2 |
| 9121-9124 | env2 site-x-a/b :8082 / :8083 | Site X Akka / gRPC | docker-env2 |
| 14330 | `otopcua-dev` sql :1433 | SQL Server (`OtOpcUa` DB) | otopcua-dev |
| 4840 / 4841 | driver-a / driver-b :4840 | OPC UA (MAIN) | otopcua-dev |
| 4842 / 4843 | site-a-1 / site-a-2 :4840 | OPC UA (SITE-A) | otopcua-dev |
| 4844 / 4845 | site-b-1 / site-b-2 :4840 | OPC UA (SITE-B) | otopcua-dev |
| 9200 | `otopcua-dev` traefik :80 | **Admin UI (LB)** | otopcua-dev |
| 8089 | `otopcua-dev` traefik :8080 | Traefik dashboard | otopcua-dev |

**Remote (non-local) endpoints:** MxAccessGateway gRPC `10.100.0.48:5120` (h2c) / dashboard `:5130`;
production gRPC on `wonder-app-vd03:5120`. SSH: windev/fixture/gitea on `22`, **VD03 on `2222`**.

---
## 8. Secrets & dev-only values
Every credential shown above (`OtOpcUa!Dev123`, `ScadaBridge_Dev1#`, the inline API-key peppers,
the `docker-dev` JWT signing key, the `mxgw_…` API key) is a **dev-only placeholder** for the local
stacks — never reuse as a real secret. Production injects real secrets out-of-band (NSSM env / secret
store), per ScadaBridge `docs/operations/inbound-api-key-reissue.md` (the VD03 runbook). The full
🔒 secret inventory and the `__`-env-var override forms are in [`env_vars.md`](env_vars.md) §5.
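
The `__` override form follows the standard .NET configuration convention: each `:` in a configuration key becomes a double underscore in the environment-variable name. A minimal sketch of the conversion, using two keys that appear in this document (the function names are hypothetical):

```bash
# Convert a colon-separated .NET configuration key to its environment-variable form.
to_env_var() {
  printf '%s\n' "$1" | sed 's/:/__/g'
}

# Convert an environment-variable name back to the configuration key.
to_config_key() {
  printf '%s\n' "$1" | sed 's/__/:/g'
}

to_env_var "MxGateway:Dashboard:CookieName"
to_config_key "ScadaBridge__InboundApi__ApiKeyPepper"
```

The reverse mapping is only unambiguous as long as no key segment itself contains a double underscore.
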
## 9. Production (VD03) — pointer
`wonder-app-vd03.zmr.zimmer.com` (SSH `:2222`) runs the production single-node ScadaBridge and the
MxGateway (gRPC `:5120`). The production install is **not a scripted in-repo flow** here — the
operational procedures live in ScadaBridge `docs/operations/` (`failover-procedures.md`,
`maintenance-procedures.md`, `inbound-api-key-reissue.md`, `troubleshooting-guide.md`). Treat any
prod service/port specifics not in those runbooks as unverified.