diff --git a/mbproxy/CLAUDE.md b/mbproxy/CLAUDE.md
index 28855a2..51bc9e6 100644
--- a/mbproxy/CLAUDE.md
+++ b/mbproxy/CLAUDE.md
@@ -32,7 +32,7 @@ The full architecture is documented under **[`docs/`](docs/)** — see the `Arch
 - **Polly bounded retries** on backend connect (3 attempts at 100ms / 500ms / 2000ms). No retries on mid-request failures (FC06/FC16 are non-idempotent on BCD tags). A per-request watchdog in the multiplexer surfaces Modbus exception 0x0B to the upstream client if a backend response never arrives within `BackendRequestTimeoutMs`.
 - **Backend disconnect cascades upstream**: when the shared backend socket dies, every attached upstream pipe is closed in the same cycle (counter `BackendDisconnectCascades`); clients reconnect on their next request.
 - **Keepalive / connection monitoring** (ON by default, `Connection.Keepalive`): OS `SO_KEEPALIVE` on backend and accepted upstream sockets, plus a per-PLC application heartbeat — a synthetic FC03 qty=1 read fired on an idle backend socket (`BackendHeartbeatIdleMs`). An unanswered heartbeat proactively tears the backend down (counters `backendHeartbeatsSent/Failed`, `backendIdleDisconnects`). The DL260 has no FC08, so the probe is a real register read. See [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md).
-- **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields, Phase-10 coalescing fields, and Phase-11 cache fields `cacheHitCount`, `cacheMissCount`, `cacheInvalidations`, `cacheEntryCount`, `cacheBytes`).
+- **Read-only Kestrel admin port** (default 8080) serves a SignalR-backed web dashboard — `GET /` (filterable fleet KPI table), `GET /plc/{name}` (per-PLC grouped counters + a real-time debug view of raw PLC-side BCD vs. decoded client-side values), `/hub/status` (live feed, `Mbproxy.AdminPushIntervalMs` cadence), `/assets/*` (embedded Bootstrap/SignalR/fonts, no CDN) — plus the unchanged `GET /status.json` twin with service-wide and per-PLC counters (Phase-9 mux, Phase-10 coalescing, Phase-11 cache fields `cacheHitCount`/`cacheMissCount`/`cacheInvalidations`/`cacheEntryCount`/`cacheBytes`). The debug view's per-tag value capture (`TagValueCapture`/`TagCaptureRegistry`) is armed on-demand only while a detail page is open. Admin stays strictly read-only — no control actions.
 
 Anything beyond this short list lives in the `docs/` tree: the appsettings.json schema in [`docs/Operations/Configuration.md`](docs/Operations/Configuration.md), config propagation in [`docs/Features/HotReload.md`](docs/Features/HotReload.md), stable log event names in [`docs/Reference/LogEvents.md`](docs/Reference/LogEvents.md), the status counter catalog in [`docs/Operations/StatusPage.md`](docs/Operations/StatusPage.md), and the simulator-backed test fixture in [`docs/Testing/Simulator.md`](docs/Testing/Simulator.md). Open the relevant page before writing code; keep it in sync when decisions change.
 
diff --git a/mbproxy/README.md b/mbproxy/README.md
index 65bbf29..292864b 100644
--- a/mbproxy/README.md
+++ b/mbproxy/README.md
@@ -50,7 +50,7 @@ The `docs/` tree is organized by topic. Start with [`Architecture/Overview.md`](
 ### Operations
 
 - [`Operations/Configuration.md`](docs/Operations/Configuration.md) — Full `appsettings.json` reference: every `Mbproxy:*` key, default, and validation rule.
-- [`Operations/StatusPage.md`](docs/Operations/StatusPage.md) — Admin endpoint surface (`/`, `/status.json`) with every JSON field documented.
+- [`Operations/StatusPage.md`](docs/Operations/StatusPage.md) — Admin endpoint surface: the SignalR-backed web dashboard (`/`, `/plc/{name}`, `/hub/status`) and the `/status.json` twin, with every JSON field documented.
 - [`Operations/Troubleshooting.md`](docs/Operations/Troubleshooting.md) — Diagnosis playbook keyed to log events and status counters.
 
 ### Reference
@@ -106,7 +106,7 @@ cd src/Mbproxy
 dotnet run --configuration Debug
 ```
 
-Edit `src/Mbproxy/appsettings.json` to configure PLCs before running. The admin status page will be at `http://localhost:8080/` by default.
+Edit `src/Mbproxy/appsettings.json` to configure PLCs before running. The admin dashboard will be at `http://localhost:8080/` by default — a live SignalR-backed fleet view; click any PLC row for its per-connection detail page and real-time BCD debug view.
 
 ## Install
 
diff --git a/mbproxy/docs/Operations/Configuration.md b/mbproxy/docs/Operations/Configuration.md
index c36a34c..340e381 100644
--- a/mbproxy/docs/Operations/Configuration.md
+++ b/mbproxy/docs/Operations/Configuration.md
@@ -109,9 +109,19 @@ Port for the read-only HTTP status server. Binds to all interfaces on startup.
 
 `ReloadValidator` rejects values outside `[1, 65535]` and rejects collisions with any `Plcs[i].ListenPort`. Source: `MbproxyOptions.AdminPort`.
 
-The server exposes `GET /` (auto-refreshing HTML) and `GET /status.json`. See [`./StatusPage.md`](./StatusPage.md) for the schema.
+The server exposes the SignalR-backed web dashboard (`GET /`, `GET /plc/{name}`, `GET /assets/{path}`, `/hub/status`) and the JSON twin `GET /status.json`. See [`./StatusPage.md`](./StatusPage.md) for the endpoint surface and schema.
 
-Authentication is assumed at the network layer (trusted internal segment). The endpoint is read-only — there are no `POST` / `PUT` / `DELETE` routes — so the risk surface is limited to status disclosure. Place the admin port behind a firewall rule that allows only operator workstations.
+Authentication is assumed at the network layer (trusted internal segment). The endpoint is read-only — no admin actions are exposed — so the risk surface is limited to status disclosure. Place the admin port behind a firewall rule that allows only operator workstations.
+
+## `Mbproxy.AdminPushIntervalMs`
+
+Server-push cadence (milliseconds) for the admin dashboard's SignalR feed. Every interval `StatusBroadcaster` builds a status snapshot and pushes it to connected dashboard / detail-page clients.
+
+| Field | Type | Default | Range |
+|-------|------|---------|-------|
+| `AdminPushIntervalMs` | int | `1000` | `> 0` |
+
+`MbproxyOptionsValidator` and `ReloadValidator` both reject values `<= 0`. The broadcaster additionally floors the effective interval at 100 ms. Source: `MbproxyOptions.AdminPushIntervalMs`.
 
 ## `Mbproxy.Plcs[]`
 
diff --git a/mbproxy/docs/Operations/StatusPage.md b/mbproxy/docs/Operations/StatusPage.md
index 7bc24d6..d2f0767 100644
--- a/mbproxy/docs/Operations/StatusPage.md
+++ b/mbproxy/docs/Operations/StatusPage.md
@@ -1,17 +1,20 @@
 # Status Page
 
-The status page is the operator-facing view of the running service: an auto-refreshing HTML dashboard at `GET /` and a JSON twin at `GET /status.json` that monitoring scrapers consume. This document describes the endpoint surface, every wire-level field, and how counters map back to architecture decisions.
+The status page is the operator-facing view of the running service: a live web dashboard backed by SignalR, plus a JSON twin at `GET /status.json` that monitoring scrapers consume. This document describes the endpoint surface, every wire-level field, and how counters map back to architecture decisions.
 
 ## Endpoint Surface
 
-The admin endpoint is owned by `AdminEndpointHost` (see `src/Mbproxy/Admin/AdminEndpointHost.cs`). It exposes exactly two routes:
+The admin endpoint is owned by `AdminEndpointHost` (see `src/Mbproxy/Admin/AdminEndpointHost.cs`). It exposes:
 
-- `GET /` — a single self-contained HTML document with a `` tag. The page refreshes every five seconds by reload, not by JavaScript polling. There is no JS bundle, no external CSS, no remote fonts, and no favicon fetch.
+- `GET /` — the **fleet dashboard** SPA shell: aggregate fleet health cards and a filterable/sortable per-PLC KPI table.
+- `GET /plc/{name}` — the **connection-detail** SPA shell for one PLC: every per-PLC counter grouped into readable cards, the connected-client list, and a real-time debug view (per-tag PLC-side raw BCD vs. client-side decoded value).
+- `GET /assets/{path}` — embedded static assets: Bootstrap 5, the SignalR JS client, two vendored IBM Plex woff2 fonts, and the dashboard's own HTML/CSS/JS. Everything is embedded in the binary; nothing is fetched from a CDN, so the UI works on a firewalled network. Served with a long immutable cache header.
 - `GET /status.json` — the same in-memory snapshot serialized as JSON via the source-generated `StatusJsonContext` (camelCase property names).
+- `/hub/status` — the SignalR hub. The two SPA shells open a hub connection and subscribe: the dashboard to the `fleet` group, a detail page to its `plc:{name}` group. A `StatusBroadcaster` loop pushes a fresh snapshot every `Mbproxy.AdminPushIntervalMs` (default 1000 ms).
 
-The endpoint is **read-only**. There are no admin actions exposed — no kick-client, no force-reload, no listener restart, no log download. Reload happens automatically via `IOptionsMonitor`; listener recovery is owned by the supervisor. Authentication lives at the network layer: the service binds to `IPAddress.Any` on the admin port and assumes the deployment runs in a trusted internal segment behind a firewall.
+The endpoint is **read-only**. There are no admin actions exposed — no kick-client, no force-reload, no listener restart, no log download. The detail-page debug view is the one feature with a runtime side effect, and it is benign and read-only: a PLC's tag-value capture is *armed* (begins recording last-seen values) only while at least one detail page is subscribed to it, and *disarmed* when the last viewer leaves. Reload happens automatically via `IOptionsMonitor`; listener recovery is owned by the supervisor. Authentication lives at the network layer: the service binds to `IPAddress.Any` on the admin port and assumes the deployment runs in a trusted internal segment behind a firewall.
 
-Both routes call `StatusSnapshotBuilder.Build()` for every request. The builder reads atomic counters directly from the supervisor map and per-PLC `ProxyCounters`; it holds no locks and performs no I/O.
+`GET /status.json` and every SignalR push call `StatusSnapshotBuilder.Build()`. The builder reads atomic counters directly from the supervisor map and per-PLC `ProxyCounters`; it holds no locks and performs no I/O.
 
 ## Port and Configuration
 
@@ -291,19 +294,38 @@ A representative two-PLC deployment, ~2 hours into a run:
 }
 ```
 
-## HTML Page Layout
+## Web Dashboard
 
-The HTML renderer is `StatusHtmlRenderer.Render(StatusResponse)` in `src/Mbproxy/Admin/StatusHtmlRenderer.cs`. The page is one document, inline CSS in a `");
- sb.Append("
"); - - // ── Header ──────────────────────────────────────────────────────────── - sb.Append("No PLCs configured.
"); - } - else - { - sb.Append("| Name | Host | Port | State | "); - sb.Append("Clients | PDUs fwd | FC03 | FC04 | "); - sb.Append("FC06 | FC16 | FC? | BCD slots | "); - sb.Append("Partial BCD | Invalid BCD | Ex 01 | Ex 02 | Ex 03 | Ex 04 | Ex ? | "); - sb.Append("RTT ms | Bytes in | Bytes out | "); - // Multiplexer telemetry columns. - sb.Append("In-flight | Max in-flight | TxId wraps | "); - sb.Append("Cascades | Queue | "); - // Coalescing column. Single cell carries hit / (hit + miss) ratio as a - // percentage plus the raw hit count for context. Kept compact (one cell) to - // stay under the 50 KB page-weight budget. - sb.Append("Coal | "); - // Cache column. Single cell carries hit-ratio percent plus raw hit count; - // an em-dash when no cache-eligible reads have occurred. Page-weight budget - // assertion stays under 50 KB for the 54-PLC fleet. - sb.Append("Cache | "); - // Keepalive column — heartbeats sent, with failure / idle-disconnect counts - // shown only when non-zero. - sb.Append("Keepalive | "); - sb.Append("
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ").Append(HtmlEncode(plc.Name)).Append(" | "); - sb.Append("").Append(HtmlEncode(plc.Host)).Append(" | "); - sb.Append("").Append(plc.ListenPort).Append(" | "); - - // State cell with colour coding - string stateClass = plc.Listener.State switch - { - "bound" => "bound", - "recovering" => "recovering", - _ => "stopped", - }; - sb.Append("")
- .Append(HtmlEncode(plc.Listener.State)).Append("");
- if (plc.Listener.State == "recovering" && plc.Listener.LastBindError is { } err)
- {
- sb.Append(" ") - .Append(HtmlEncode(err)) - .Append(" (attempt ").Append(plc.Listener.RecoveryAttempts).Append(")") - .Append(""); - } - sb.Append(" | ");
-
- // Connected clients
- sb.Append("");
- sb.Append(plc.Clients.Connected);
- if (plc.Clients.RemoteEndpoints.Count > 0)
- {
- sb.Append(" "); - bool first = true; - foreach (var c in plc.Clients.RemoteEndpoints) - { - if (!first) sb.Append(", "); - sb.Append(HtmlEncode(c.Remote)) - .Append(" (").Append(c.PdusForwarded).Append(')'); - first = false; - } - } - sb.Append(" | ");
-
- // Counter cells
- sb.Append("").Append(plc.Pdus.Forwarded).Append(" | "); - sb.Append("").Append(plc.Pdus.ByFc.Fc03).Append(" | "); - sb.Append("").Append(plc.Pdus.ByFc.Fc04).Append(" | "); - sb.Append("").Append(plc.Pdus.ByFc.Fc06).Append(" | "); - sb.Append("").Append(plc.Pdus.ByFc.Fc16).Append(" | "); - sb.Append("").Append(plc.Pdus.ByFc.Other).Append(" | "); - sb.Append("").Append(plc.Pdus.RewrittenSlots).Append(" | "); - sb.Append("").Append(plc.Pdus.PartialBcdWarnings).Append(" | "); - sb.Append("").Append(plc.Pdus.InvalidBcdWarnings).Append(" | "); - sb.Append("").Append(plc.Backend.ExceptionsByCode.Code01).Append(" | "); - sb.Append("").Append(plc.Backend.ExceptionsByCode.Code02).Append(" | "); - sb.Append("").Append(plc.Backend.ExceptionsByCode.Code03).Append(" | "); - sb.Append("").Append(plc.Backend.ExceptionsByCode.Code04).Append(" | "); - sb.Append("").Append(plc.Backend.ExceptionsByCode.CodeOther).Append(" | "); - sb.Append("").Append(plc.Backend.LastRoundTripMs.ToString("F1")).Append(" | "); - sb.Append("").Append(plc.Bytes.UpstreamIn).Append(" | "); - sb.Append("").Append(plc.Bytes.UpstreamOut).Append(" | "); - // Multiplexer telemetry cells. - sb.Append("").Append(plc.Backend.InFlight).Append(" | "); - sb.Append("").Append(plc.Backend.MaxInFlight).Append(" | "); - sb.Append("").Append(plc.Backend.TxIdWraps).Append(" | "); - sb.Append("").Append(plc.Backend.DisconnectCascades).Append(" | "); - sb.Append("").Append(plc.Backend.QueueDepth).Append(" | "); - // Coalescing ratio cell — ""); - if (coalHit + coalMiss == 0) - { - sb.Append("—"); - } - else - { - int pct = (int)Math.Round(100.0 * coalHit / (coalHit + coalMiss)); - sb.Append(pct).Append("% (").Append(coalHit).Append(')'); - } - sb.Append(" | "); - // Cache ratio cell — same pattern as coalescing. 
- long cacheHit = plc.Backend.CacheHitCount; - long cacheMiss = plc.Backend.CacheMissCount; - sb.Append(""); - if (cacheHit + cacheMiss == 0) - { - sb.Append("—"); - } - else - { - int pct = (int)Math.Round(100.0 * cacheHit / (cacheHit + cacheMiss)); - sb.Append(pct).Append("% (").Append(cacheHit).Append(')'); - } - sb.Append(" | "); - // Keepalive cell — heartbeats sent; failures + idle-disconnects appended - // only when non-zero to keep the cell narrow. - long hbSent = plc.Backend.BackendHeartbeatsSent; - long hbFailed = plc.Backend.BackendHeartbeatsFailed; - long hbIdle = plc.Backend.BackendIdleDisconnects; - sb.Append(""); - if (hbSent == 0 && hbFailed == 0 && hbIdle == 0) - { - sb.Append("—"); - } - else - { - sb.Append(hbSent); - if (hbFailed > 0 || hbIdle > 0) - sb.Append(" (fail ").Append(hbFailed) - .Append(", idle-disc ").Append(hbIdle).Append(')'); - } - sb.Append(" | "); - sb.Append("