Compare commits
9 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 1eeee1e292 | | |
| | 1d48b15ece | | |
| | 0756bb0066 | | |
| | f6f4aeb031 | | |
| | b222362ce0 | | |
| | 0308490aef | | |
| | 374eecd205 | | |
| | 554b05d28c | | |
| | e719dd51c1 | | |
+1
-1
@@ -32,7 +32,7 @@ The full architecture is documented under **[`docs/`](docs/)** — see the `Arch
- **Polly bounded retries** on backend connect (3 attempts at 100ms / 500ms / 2000ms). No retries on mid-request failures (FC06/FC16 are non-idempotent on BCD tags). A per-request watchdog in the multiplexer surfaces Modbus exception 0x0B to the upstream client if a backend response never arrives within `BackendRequestTimeoutMs`.
- **Backend disconnect cascades upstream**: when the shared backend socket dies, every attached upstream pipe is closed in the same cycle (counter `BackendDisconnectCascades`); clients reconnect on their next request.
- **Keepalive / connection monitoring** (ON by default, `Connection.Keepalive`): OS `SO_KEEPALIVE` on backend and accepted upstream sockets, plus a per-PLC application heartbeat — a synthetic FC03 qty=1 read fired on an idle backend socket (`BackendHeartbeatIdleMs`). An unanswered heartbeat proactively tears the backend down (counters `backendHeartbeatsSent/Failed`, `backendIdleDisconnects`). The DL260 has no FC08, so the probe is a real register read. See [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md).
- **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields, Phase-10 coalescing fields, and Phase-11 cache fields `cacheHitCount`, `cacheMissCount`, `cacheInvalidations`, `cacheEntryCount`, `cacheBytes`).
- **Read-only Kestrel admin port** (default 8080) serves a SignalR-backed web dashboard — `GET /` (filterable fleet KPI table), `GET /plc/{name}` (per-PLC grouped counters + a real-time debug view of raw PLC-side BCD vs. decoded client-side values), `/hub/status` (live feed, `Mbproxy.AdminPushIntervalMs` cadence), `/assets/*` (embedded Bootstrap/SignalR/fonts, no CDN) — plus the unchanged `GET /status.json` twin with service-wide and per-PLC counters (Phase-9 mux, Phase-10 coalescing, Phase-11 cache fields `cacheHitCount`/`cacheMissCount`/`cacheInvalidations`/`cacheEntryCount`/`cacheBytes`). The debug view's per-tag value capture (`TagValueCapture`/`TagCaptureRegistry`) is armed on-demand only while a detail page is open. Admin stays strictly read-only — no control actions.
Anything beyond this short list lives in the `docs/` tree: the appsettings.json schema in [`docs/Operations/Configuration.md`](docs/Operations/Configuration.md), config propagation in [`docs/Features/HotReload.md`](docs/Features/HotReload.md), stable log event names in [`docs/Reference/LogEvents.md`](docs/Reference/LogEvents.md), the status counter catalog in [`docs/Operations/StatusPage.md`](docs/Operations/StatusPage.md), and the simulator-backed test fixture in [`docs/Testing/Simulator.md`](docs/Testing/Simulator.md). Open the relevant page before writing code; keep it in sync when decisions change.
+2
-2
@@ -50,7 +50,7 @@ The `docs/` tree is organized by topic. Start with [`Architecture/Overview.md`](
### Operations
- [`Operations/Configuration.md`](docs/Operations/Configuration.md) — Full `appsettings.json` reference: every `Mbproxy:*` key, default, and validation rule.
- [`Operations/StatusPage.md`](docs/Operations/StatusPage.md) — Admin endpoint surface (`/`, `/status.json`) with every JSON field documented.
- [`Operations/StatusPage.md`](docs/Operations/StatusPage.md) — Admin endpoint surface: the SignalR-backed web dashboard (`/`, `/plc/{name}`, `/hub/status`) and the `/status.json` twin, with every JSON field documented.
- [`Operations/Troubleshooting.md`](docs/Operations/Troubleshooting.md) — Diagnosis playbook keyed to log events and status counters.
### Reference
@@ -106,7 +106,7 @@ cd src/Mbproxy
```
dotnet run --configuration Debug
```
Edit `src/Mbproxy/appsettings.json` to configure PLCs before running. The admin status page will be at `http://localhost:8080/` by default.
Edit `src/Mbproxy/appsettings.json` to configure PLCs before running. The admin dashboard will be at `http://localhost:8080/` by default — a live SignalR-backed fleet view; click any PLC row for its per-connection detail page and real-time BCD debug view.
## Install
@@ -0,0 +1,93 @@
# Admin SignalR Web Dashboard — Code Review
Scope: commit `e719dd5` ("replace status page with a live SignalR web dashboard"), files `src/Mbproxy/Admin/{AdminEndpointHost,StatusHub,StatusBroadcaster,StatusPushSink,PlcSubscriptionTracker,StatusSnapshotBuilder,DebugDto}.cs`. Cross-checked against `docs/Operations/StatusPage.md`, `docs/Reference/LogEvents.md`, the mbproxy `CLAUDE.md`, and the supporting types `Proxy/TagCaptureRegistry.cs`, `Proxy/TagValueCapture.cs`, `Proxy/ProxyWorker.cs`, `HostingExtensions.cs`, `Configuration/ConfigReconciler.cs`. Tests under `tests/Mbproxy.Tests/Admin/{StatusHubTests,StatusBroadcasterTests}.cs`.
## Summary
- The decomposition is sound: `IStatusPushSink` cleanly isolates the push loop from SignalR, `PlcSubscriptionTracker` is correctly single-locked, and the broadcaster lifecycle is tied to the Kestrel app's lifecycle so an `AdminPort` hot-reload re-bind does not leak a second broadcaster. Per-cycle error handling in `PushOnceAsync` is genuinely defensive.
- The most serious problem is a **subscriber-count leak in `PlcSubscriptionTracker`**: a SignalR reconnect (transport drop without a clean close) increments a PLC's count on the new connection but the old connection's `OnDisconnectedAsync` decrement is not guaranteed, so a capture can be left armed forever — or, in the opposite race, double-armed/never-disarmed. Combined with the fact that capture arming has no reference to the *connection liveness*, the on-demand-capture invariant ("armed only while a viewer is open") is not actually upheld in the field.
- A **second real bug**: `StatusHub.SubscribePlc` is not atomic. The awaited `Groups.AddToGroupAsync` and the following `_tracker.Add` are separated by an interleaving point, and `OnDisconnectedAsync` can run on the same connection between them. The counts never go negative, but the state is still wrong: the group membership outlives the tracker entry (capture disarmed while the page still receives pushes) or vice versa.
- `DebugJsonContext` (in `DebugDto.cs`) is dead code — defined, never referenced. SignalR serializes `PlcDetailResponse` via the reflection-based `System.Text.Json` path, not the source-gen context. Harmless today (no trimming/AOT in the csproj) but it is a misleading artifact and a latent trap if AOT is ever enabled.
- The documented contract is matched on the fleet path but the **detail push (`PlcDetailResponse`) is undocumented in the `status.json` schema** and is not reachable from `StatusJsonContext` — only over SignalR. That is by design, but the camelCase guarantee for it rests entirely on the hub's `AddJsonProtocol` config, with no test asserting the wire shape.
## Critical findings
**C1. `PlcSubscriptionTracker` leaks subscriber counts across SignalR reconnects — captures get stuck armed (or never armed).** `StatusHub.cs:48-54` / `PlcSubscriptionTracker.cs:29-73`. SignalR assigns a *new* `ConnectionId` on every transport reconnect (WebSocket drop, long-polling cycle, network blip). The client (`detail.js`) re-invokes `SubscribePlc` on reconnect. The sequence on a reconnect is:
1. Old connection's transport dies. SignalR *eventually* fires `OnDisconnectedAsync` for the old `ConnectionId` — but this is **not** ordered relative to the new connection's `OnConnectedAsync`/`SubscribePlc`, and on an ungraceful drop it may be delayed by the server's keepalive/timeout window (default ~30 s) or, if the server is shutting down, may not fire deterministically at all.
2. New connection calls `SubscribePlc` → `_tracker.Add(newId, plc)` → count `1 → 2`, returns `false`, so no re-arm (fine).
3. Old connection's `OnDisconnectedAsync` runs late → count `2 → 1`. Capture stays armed. **Correct only if the order is (2) then (3).**
The failure case: if the old connection's `OnDisconnectedAsync` is delayed past the *new* connection also disconnecting, or if the operator closes the tab during the reconnect window, the count never returns to 0 and **the capture is armed forever with no viewer** — exactly the hot-path cost the on-demand design exists to avoid. Over a long-running service with flaky operator networks this accumulates: every PLC ever viewed ends up permanently armed. `TagValueCapture.Record` is then a non-trivial cost (`FrozenDictionary` lookup + allocation of a `TagValueObservation` + `Volatile.Write`) on the backend reader task and every FC06/FC16 upstream task, fleet-wide, forever.
The `StatusBroadcaster.StopAsync` → `DisarmAll()` safety net only fires on admin shutdown / `AdminPort` hot-reload, not during normal operation, so it does not bound this leak in a steady-state service.
Fix: do not rely on `ConnectionId` lifetime as the capture-arming key. Two options: (a) key subscriptions on a stable client-supplied identity and treat reconnects as idempotent re-subscribes, or (b) drive disarm off the broadcaster by reconciling armed captures against `ActivePlcs()` each cycle (`_captureRegistry` disarms any PLC not in `ActivePlcs()`). Option (b) alone is insufficient, because a leaked *count* keeps `ActivePlcs()` returning the PLC, so there is nothing to reconcile against. The robust fix therefore has to remove stale connections from the tracker itself: either pair (a) with a periodic sweep of `PlcSubscriptionTracker` against SignalR's live connection set, or add a TTL/heartbeat so that a connection that has not sent a keepalive in N intervals is reaped. At minimum, document the leak and have the broadcaster log when the `ActivePlcs()` count exceeds the number of distinct connections it has seen.
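The TTL/heartbeat idea can be sketched as follows. This is a minimal illustration, not code from the mbproxy tree: `LeasedSubscriptionTracker`, `Touch`, and `Sweep` are hypothetical names, and it assumes the client re-sends its subscription periodically as a heartbeat. The clock is injected so expiry can be exercised without waiting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: subscriptions are leases that expire unless renewed,
// so a connection that vanishes without a disconnect callback cannot keep
// a capture armed indefinitely.
public sealed class LeasedSubscriptionTracker
{
    private readonly object _gate = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;
    // (connection id, PLC name) -> time of last renewal
    private readonly Dictionary<(string Conn, string Plc), DateTimeOffset> _leases = new();

    public LeasedSubscriptionTracker(TimeSpan ttl, Func<DateTimeOffset> now)
    {
        _ttl = ttl;
        _now = now;
    }

    // Subscribes or renews. Returns true when the PLC gains its first lease,
    // which is the caller's signal to arm the capture.
    public bool Touch(string connectionId, string plcName)
    {
        lock (_gate)
        {
            bool hadViewer = _leases.Keys.Any(k => k.Plc == plcName);
            _leases[(connectionId, plcName)] = _now();
            return !hadViewer;
        }
    }

    // Drops leases older than the TTL and returns the PLCs left with no
    // remaining lease; the caller should disarm those.
    public IReadOnlyList<string> Sweep()
    {
        lock (_gate)
        {
            DateTimeOffset cutoff = _now() - _ttl;
            var expired = _leases.Where(kv => kv.Value < cutoff)
                                 .Select(kv => kv.Key)
                                 .ToList();
            foreach (var key in expired)
                _leases.Remove(key);

            return expired.Select(k => k.Plc)
                          .Distinct()
                          .Where(plc => !_leases.Keys.Any(k => k.Plc == plc))
                          .ToList();
        }
    }
}
```

With this shape a missed `OnDisconnectedAsync` costs at most one TTL of extra capture time, because the next `Sweep` removes the stale lease regardless of whether the disconnect callback ever ran.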
**C2. `SubscribePlc` is not atomic — group membership and tracker state can diverge under a concurrent disconnect.** `StatusHub.cs:48-54`:
```csharp
public async Task SubscribePlc(string plcName)
{
    await Groups.AddToGroupAsync(Context.ConnectionId, PlcGroup(plcName)).ConfigureAwait(false);
    if (_tracker.Add(Context.ConnectionId, plcName))
        _captureRegistry.Arm(plcName);
}
```
The `await` on `AddToGroupAsync` is an interleaving point: `_tracker.Add` and `Arm` run only after it resumes. SignalR will dispatch `OnDisconnectedAsync` for the same connection if the transport drops while `SubscribePlc` is mid-flight. Hub method invocations and the disconnect callback are not mutually exclusive: only invocations on the *same* connection are serialized *with each other*, and `OnDisconnectedAsync` is a separate dispatch path (open question 2 asks for this to be confirmed against the SignalR version in use). Concretely:
- Connection drops after `AddToGroupAsync` completes but before `_tracker.Add`. `OnDisconnectedAsync` runs `_tracker.RemoveConnection(id)` → returns empty (nothing tracked yet). Then `SubscribePlc` resumes, `_tracker.Add` runs → count `0 → 1`, returns `true` → **`_captureRegistry.Arm(plcName)` on a connection that is already gone.** The capture is now armed with a phantom viewer. This is the same stuck-armed leak as C1, reached by a tighter race.
- The mirror case leaves a group membership with no tracker entry; benign for pushes (the dead connection just never receives them) but it confirms the two data structures are not kept consistent.
Fix: make subscribe/unsubscribe go through a single critical section that also checks connection liveness, or — simpler — register the tracker entry *first* (synchronous, under its own lock), then `AddToGroupAsync`, and have `OnDisconnectedAsync` always run last and unconditionally. Even then the arm/disarm must be idempotent and reconciled against the live connection set (see C1). The cleanest fix addresses C1 and C2 together: treat `OnConnectedAsync`/`OnDisconnectedAsync` as the *only* mutators of tracker state, make `SubscribePlc` only adjust group membership, and arm/disarm from a periodic reconciliation in the broadcaster against SignalR's actual group/connection state.
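The periodic reconciliation can be sketched as a pure function. This is an illustration under stated assumptions: `CaptureReconciler` and its parameters are hypothetical, and how the set of live connection ids is obtained is left open, since the review does not say which SignalR facility would supply it.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of broadcaster-side reconciliation.
public static class CaptureReconciler
{
    // Returns every armed PLC that no live connection is subscribed to.
    // Subscriptions held by connections missing from liveConnectionIds are
    // ignored, so a phantom viewer cannot keep a capture armed.
    public static IReadOnlyList<string> PlcsToDisarm(
        IEnumerable<string> armedPlcs,
        IReadOnlyDictionary<string, IReadOnlyCollection<string>> subscriptionsByConnection,
        IReadOnlySet<string> liveConnectionIds)
    {
        HashSet<string> wanted = subscriptionsByConnection
            .Where(kv => liveConnectionIds.Contains(kv.Key))
            .SelectMany(kv => kv.Value)
            .ToHashSet();

        return armedPlcs.Where(plc => !wanted.Contains(plc)).ToList();
    }
}
```

Because the function only reads its inputs, the broadcaster could call it once per push cycle and disarm whatever it returns, which makes arm/disarm self-healing rather than dependent on callback ordering.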
## Major findings
**M1. `DebugJsonContext` is dead code; the detail payload is not source-gen serialized.** `DebugDto.cs:54-61` declares `[JsonSerializable(typeof(PlcDetailResponse))] ... DebugJsonContext`, but nothing references `DebugJsonContext.Default` anywhere in the tree (grep confirms zero usages). `AdminEndpointHost.cs:192-196` configures the SignalR JSON protocol with only a `PropertyNamingPolicy` — no `TypeInfoResolver` — so SignalR serializes `PlcDetailResponse` through the **reflection-based** `System.Text.Json` resolver. The `status.json` route uses `StatusJsonContext` (source-gen); the SignalR path does not use any source-gen context. Today this works (the csproj sets neither `PublishTrimmed` nor `PublishAot`), but: (a) `DebugJsonContext` is misleading dead code that implies the detail payload is AOT-safe when it is not; (b) if trimming/AOT is ever turned on, every SignalR push of `PlcDetailResponse` *and* of `StatusResponse` (the `"fleet"` message also goes through the reflection resolver — `StatusJsonContext` is only wired into the `status.json` `JsonSerializer.Serialize` call, not into `AddJsonProtocol`) will throw or silently emit `{}`. Fix: either delete `DebugJsonContext` and document that the SignalR path is reflection-only, or — better — wire both contexts into the hub via `AddJsonProtocol(o => o.PayloadSerializerOptions.TypeInfoResolverChain.Insert(0, StatusJsonContext.Default); ...Insert(1, DebugJsonContext.Default))` so the SignalR path is consistent with `status.json` and trim-safe. Note `DebugJsonContext` would also need `StatusResponse`/`PlcStatus` reachable since `PlcDetailResponse` embeds `PlcStatus?`.
**M2. No test asserts the SignalR wire shape is camelCase.** `AdminEndpointHost.cs:194-196` is the *only* thing making the live feed's JSON match the documented `status.json` contract (`docs/Operations/StatusPage.md` repeatedly states "camelCase property names"). `StatusBroadcasterTests` uses `FakeStatusPushSink` and never serializes; `StatusHubTests` uses fakes. If someone removes or mis-edits the `AddJsonProtocol` lambda, every dashboard field silently becomes `PascalCase` and the JS (`dashboard.js`/`detail.js` expecting `service.uptimeSeconds` etc.) breaks with no failing test. The reflection-vs-source-gen split in M1 makes this worse — the two endpoints' naming is configured in two unrelated places. Add an integration test that stands up the hub (or at least serializes `PlcDetailResponse`/`StatusResponse` with the exact `PayloadSerializerOptions` the hub builds) and asserts `uptimeSeconds`/`captureArmed` appear lowercase-first.
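A minimal sketch of the serialization half of such a test, using only `System.Text.Json`. `DetailSample` is a hypothetical stand-in that models two fields; a real test should serialize the actual `PlcDetailResponse`/`StatusResponse` types with the exact options the hub builds rather than the copy shown here.

```csharp
using System.Text.Json;

// Hypothetical stand-in for the real payload types; only two fields are modeled.
public sealed record DetailSample(long UptimeSeconds, bool CaptureArmed);

public static class WireShapeCheck
{
    // Assumed to mirror the hub's options: camelCase naming and nothing else.
    public static readonly JsonSerializerOptions CamelCase = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    };

    public static string Serialize(DetailSample sample) =>
        JsonSerializer.Serialize(sample, CamelCase);

    // True when both modeled fields appear lowercase-first on the wire.
    public static bool LooksCamelCase(string json) =>
        json.Contains("\"uptimeSeconds\"") && json.Contains("\"captureArmed\"");
}
```

If the naming policy were dropped, the same payload would serialize with `UptimeSeconds` and `CaptureArmed`, and `LooksCamelCase` would return false, which is the regression the finding wants a failing test for.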
**M3. `StatusBroadcaster.PushOnceAsync` re-scans the fleet snapshot for every active PLC, and the removed-PLC path is untested.** `StatusBroadcaster.cs:97-110`. Cancellation handling in this loop is correct: the `catch (Exception ex) when (ex is not OperationCanceledException)` filter lets an `OperationCanceledException` from `await _sink.PushPlcAsync` escape `PushOnceAsync`, where `LoopAsync`'s outer `catch (OperationCanceledException)` handles it. A *non-OCE* exception from the synchronous `_builder.BuildDebug(plcName)` (`StatusSnapshotBuilder.cs:44`) is logged for that PLC and the loop continues, which is also correct. The inefficiency is that `snapshot.Plcs.FirstOrDefault(...)` is re-run inside the `try` for every PLC: an O(N) scan per PLC over the `Plcs` list, so O(N²) per push cycle. For 54 PLCs, all with an open detail page, that is 54 × 54 = 2,916 comparisons per cycle (per second at the default 1000 ms interval), which is trivially cheap here. It is flagged because the fleet snapshot is already a `List`, and projecting it to a `Dictionary<string,PlcStatus>` once per cycle would be cleaner and remove the quadratic. Minor on its own; it is grouped here because it sits in the same loop as a real concern: `BuildDebug` is called for every active PLC even when `snapshot.Plcs` has no matching entry (PLC removed by hot-reload). That path is handled (`Plc: null`) but deserves a test.
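The once-per-cycle projection could look like the sketch below. `PlcStatusSample` is a hypothetical stand-in for the real `PlcStatus` type, and it is an assumption that PLC names are unique within a snapshot (`ToDictionary` throws on a duplicate key).

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real per-PLC status type.
public sealed record PlcStatusSample(string Name, long RequestCount);

public static class SnapshotIndex
{
    // One O(N) pass per push cycle; each later lookup is O(1).
    public static Dictionary<string, PlcStatusSample> ByName(IEnumerable<PlcStatusSample> plcs) =>
        plcs.ToDictionary(p => p.Name, StringComparer.Ordinal);

    // Returns null when the PLC is absent from the snapshot, for example
    // because a hot-reload removed it while a detail page was still open.
    public static PlcStatusSample? Find(
        IReadOnlyDictionary<string, PlcStatusSample> index, string name) =>
        index.TryGetValue(name, out var status) ? status : null;
}
```

The `null` result corresponds to the removed-PLC case the finding asks to cover with a test.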
**M4. `StatusBroadcaster.Start()` is fire-and-forget with no guard against being called twice.** `StatusBroadcaster.cs:51-52`: `public void Start() => _loop = Task.Run(...)`. The XML doc says "Idempotent only in the sense that it is called once" — but nothing *enforces* that. A second `Start()` overwrites `_loop`, orphaning the first loop task (it keeps running against the same `_cts`, so two loops now push concurrently every interval until cancellation). `AdminEndpointHost` only ever calls it once per `StartAppAsync`, and a new `StatusBroadcaster` is constructed each re-bind, so this is not hit today — but a one-line guard (`if (_loop is not { IsCompleted: true } and not null) throw`/return, or an `Interlocked` flag) would make the class safe to misuse. Also: `Task.Run` returns a task whose faults are observed only via `_loop` being awaited in `StopAsync`; if `StopAsync` is never called (e.g. `StartAppAsync` throws *after* `_broadcaster.Start()` but the catch at `AdminEndpointHost.cs:253` sets `_app = null` without disposing `_broadcaster`) the loop task and its `_cts` leak. See M5.
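A minimal sketch of the `Interlocked` variant of that guard. `OneShotLoop` is a hypothetical class, not the real `StatusBroadcaster`, and throwing on a second call (instead of returning silently) is one possible choice.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: a loop owner whose Start() cannot orphan a running task.
public sealed class OneShotLoop
{
    private readonly Func<Task> _body;
    private int _started; // 0 = not started, 1 = started
    private Task? _loop;

    public OneShotLoop(Func<Task> body) => _body = body;

    public void Start()
    {
        // Only the first caller flips 0 -> 1; every later caller sees 1 and fails.
        if (Interlocked.CompareExchange(ref _started, 1, 0) != 0)
            throw new InvalidOperationException("Start() may only be called once.");

        _loop = Task.Run(_body);
    }

    // Lets a stop path await the loop and observe its faults.
    public Task Completion => _loop ?? Task.CompletedTask;
}
```

The compare-and-exchange makes the check and the state change one atomic step, so two concurrent `Start()` calls cannot both pass the guard.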
**M5. A bind failure after `_broadcaster.Start()` leaks the broadcaster and its push loop.** `AdminEndpointHost.cs:237-258`. `StartAppAsync` does `await app.StartAsync(ct)` (line 237), then `_app = app`, then constructs and `Start()`s `_broadcaster` (lines 242-249), then `LogAdminStarted`. The whole body is wrapped in `catch (Exception ex) when (ex is not OperationCanceledException)` (line 253) which logs `mbproxy.admin.bind.failed` and sets `_app = null`. If anything between line 238 and 251 throws — e.g. `app.Services.GetRequiredService<IHubContext<StatusHub>>()` fails, or the `StatusBroadcaster` constructor throws, or `LogAdminStarted` somehow throws — the catch sets `_app = null` but **does not stop the already-started `_broadcaster` nor the already-started Kestrel `app`**. The push loop keeps running forever against a sink whose hub is on a Kestrel app that `StopCurrentAppAsync` will never see (`_app` is null, `_broadcaster` field may or may not be set depending on where the throw landed). Result: a leaked Kestrel listener still bound to the port, plus a leaked broadcaster loop. The probability is low (those calls rarely throw) but the catch's cleanup is incomplete. Fix: in the catch, best-effort stop/dispose whatever was started — mirror `StopCurrentAppAsync`'s logic, or wrap the post-`StartAsync` section so a failure tears down `app` and `_broadcaster` before nulling the fields.
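The cleanup shape suggested by the fix can be sketched generically. `GuardedStartup` and its three delegates are hypothetical; in the real host they would correspond to starting the Kestrel app, the post-start wiring, and a teardown that mirrors `StopCurrentAppAsync`.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: start, then wire up, and undo the start if wiring fails.
public static class GuardedStartup
{
    // Runs 'start', then 'afterStart'. If 'afterStart' throws, 'stop' is awaited
    // on a best-effort basis before the original exception is rethrown, so
    // nothing that 'start' brought up is left running.
    public static async Task RunAsync(Func<Task> start, Func<Task> afterStart, Func<Task> stop)
    {
        await start().ConfigureAwait(false);
        try
        {
            await afterStart().ConfigureAwait(false);
        }
        catch
        {
            try { await stop().ConfigureAwait(false); }
            catch { /* best effort: keep the original failure */ }
            throw;
        }
    }
}
```

The caller's existing catch can still log the bind failure and null its fields; the difference is that by then the listener and the push loop have already been torn down.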
**M6. `PushOnceAsync` builds the debug snapshot for a PLC whose subscriber count is stale.** Tied to C1/C2: `ActivePlcs()` (`PlcSubscriptionTracker.cs:76-82`) returns a key snapshot, and the broadcaster pushes a `"plc"` message to `PlcGroup(plcName)` for each. If the count is leaked-high (C1), the broadcaster pushes to an empty SignalR group every cycle forever — cheap (SignalR no-ops an empty group) but it also keeps calling `_builder.BuildDebug(plcName)` and, more importantly, the *capture stays armed* because nothing disarms it. This is the observable symptom of C1 inside the broadcaster. Recording it separately because the fix (broadcaster-side reconciliation) lives here.
## Minor findings
**N1. `StatusBroadcaster` does not use a stable log event name.** `StatusBroadcaster.cs:84,94,108,133` all call `_logger.LogError(ex, "StatusBroadcaster: ...")` with free-text messages and no `EventId`/`EventName`. Every other component in this codebase uses `[LoggerMessage]` source-gen with a stable `mbproxy.*` event name catalogued in `docs/Reference/LogEvents.md` (e.g. `mbproxy.admin.started` EventId 70, `mbproxy.admin.bind.failed` EventId 71). The broadcaster's "loop terminated unexpectedly" at line 133 is exactly the kind of event an operator would alert on, and it is invisible to event-name-based log queries. Add `[LoggerMessage]` entries (e.g. `mbproxy.admin.broadcast.failed`, `mbproxy.admin.broadcast.loop.terminated`) and register them in `LogEvents.md`.
**N2. `LoopAsync` does `Task.Delay` *before* the first push.** `StatusBroadcaster.cs:122-124`: the loop delays `interval` ms, then pushes. The first dashboard client therefore waits up to `AdminPushIntervalMs` (default 1000 ms) for its first `"fleet"` message even though a snapshot is available immediately. `index.html`/`dashboard.js` presumably also fetch `status.json` or render empty until the first push — but a push-before-delay (or an immediate `PushOnceAsync` before entering the loop) would make the dashboard populate instantly on open. Cosmetic, but easy.
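A push-before-delay loop can be sketched as below. `PushLoop` and its delegate parameters are hypothetical stand-ins for `LoopAsync`, `PushOnceAsync`, and the `AdminPushIntervalMs` lookup; the 100 ms floor follows the behavior the review attributes to the existing loop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a loop that pushes first and waits afterwards.
public static class PushLoop
{
    public static async Task RunAsync(
        Func<CancellationToken, Task> pushOnce,
        Func<int> intervalMs,
        CancellationToken ct)
    {
        try
        {
            while (!ct.IsCancellationRequested)
            {
                // Push immediately so the first client does not wait a full interval.
                await pushOnce(ct).ConfigureAwait(false);

                // Re-read the interval every cycle and floor it at 100 ms.
                int delay = Math.Max(100, intervalMs());
                await Task.Delay(delay, ct).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // Normal shutdown: cancellation ends the loop quietly.
        }
    }
}
```

The only change from a delay-first loop is the order of the two awaits, so the hot-reload and cancellation behavior stay the same.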
**N3. `StatusBroadcaster` has no `_disposed` guard around `StopAsync`/`DisposeAsync`.** `StatusBroadcaster.cs:57-72`: calling `StopAsync` twice is reachable in principle (`DisposeAsync` calls `StopAsync`, and `AdminEndpointHost.StopCurrentAppAsync` calls `broadcaster.DisposeAsync()`), although today there is only one path per instance. On a second call, `_cts.IsCancellationRequested` is already true, so `CancelAsync` is skipped; `await _loop` then awaits a completed task (fine) and `_captureRegistry.DisarmAll()` runs again (fine). Benign, but `DisposeAsync` at lines 137-141 then calls `_cts.Dispose()`, and any *third* path touching `_cts` after dispose would throw `ObjectDisposedException`. No live bug, but the class lacks the `_disposed` guard that `AdminEndpointHost` itself carries (and whose absence the `AdminEndpointHost` comment at lines 59-64 explicitly calls out as a regression risk). Add a `_stopped`/`_disposed` flag for symmetry.
**N4. `OnDisconnectedAsync` does its cleanup *before* `base.OnDisconnectedAsync`.** `StatusHub.cs:60-66`. The capture disarm runs first, then `base.OnDisconnectedAsync(exception)`. This is the correct order (you want to release resources before the base teardown) and the disarm is synchronous so it cannot be lost — but note there is **no `OnConnectedAsync` override**, which is fine, and no try/finally around the `foreach`. If `_captureRegistry.Disarm` threw (it cannot — it is a dictionary lookup + volatile write), the remaining PLCs in the connection's set would not be disarmed and `base.OnDisconnectedAsync` would be skipped. Defensive only; `Disarm` is genuinely no-throw today.
**N5. `PlcSubscriptionTracker.ActivePlcs()` allocates a fresh array every push cycle.** `PlcSubscriptionTracker.cs:80`: `_plcCounts.Keys.ToArray()` under the lock, once per `AdminPushIntervalMs`. With the common case of zero detail-page viewers it correctly returns `Array.Empty<string>()` (line 80 short-circuits on `Count == 0`). Only allocates when someone is viewing. Fine — flagged only as a known per-cycle allocation if push interval is ever lowered aggressively.
**N6. `ServeHtmlShell` / asset routes have no `HEAD` handling and `/plc/{name}` ignores `name`.** `AdminEndpointHost.cs:210`: `app.MapGet("/plc/{name}", (string name, HttpContext ctx) => ServeHtmlShell(ctx, "plc.html"))` — `name` is bound but unused (the page reads it client-side from the URL). Harmless, but the unused parameter will draw a compiler/analyzer warning under this project's `TreatWarningsAsErrors` unless suppressed; verify it builds. If it does build clean, fine — `MapGet` route parameters are not flagged as unused. Minor; mentioned for the reviewer to confirm against CI.
**N7. `StatusPage.md` does not say that `PlcDetailResponse` is SignalR-only.** The doc table at `StatusPage.md:317-328` does describe the payload's fields, so the "undocumented" concern in the Summary's last bullet is withdrawn to that extent. What is genuinely missing is a statement that this payload travels **only over SignalR**, is **not** reachable at any `GET` route, and is serialized by a different path than `status.json` (M1). One sentence in `StatusPage.md`'s "Debug View Data" section would close it.
## What looks good
- **Broadcaster lifecycle is correctly bound to the Kestrel app, not the host.** `AdminEndpointHost.cs:242-249` creates a fresh `StatusBroadcaster` inside `StartAppAsync`, and `StopCurrentAppAsync` (lines 268-279) disposes it *before* stopping Kestrel. An `AdminPort` hot-reload therefore tears down the old broadcaster and starts a new one, so no broadcaster leaks across re-binds, and the `DisarmAll()` in `StopAsync` ensures the re-bind does not strand an armed capture. This directly answers the N3-style open question from the 2026-05-14 `AdminAndDiagnostics` review about provider/loop leaks across re-binds.
- **`IStatusPushSink` is a clean seam.** Defining the outbound side as an interface (`StatusPushSink.cs`) lets `StatusBroadcasterTests` exercise the full push-cycle logic (fleet always, per-PLC only for active PLCs, disarm-on-stop) with a recording fake and zero SignalR host — and the tests actually do this. Good testability design.
- **`PlcSubscriptionTracker` locking is correct *as a data structure*.** Single `_gate`, every method takes it, the count transitions (`0→1` returns arm-signal, `1→0` returns disarm-signal) are computed under the lock, and `RemoveConnection` correctly handles the multi-PLC-per-connection case. The bug (C1/C2) is not in the locking — it is that `ConnectionId` is the wrong key for a *lifetime* and the tracker is mutated from two un-serialized hub dispatch paths. The class itself is internally consistent.
- **`PushOnceAsync` per-stage error isolation.** `StatusBroadcaster.cs:78-110` wraps snapshot-build, fleet-push, and each per-PLC push in their own `try/catch` so one PLC's failure does not abort the cycle and a snapshot-build failure does not kill the loop. The `when (ex is not OperationCanceledException)` filters correctly let shutdown cancellation propagate to `LoopAsync`'s handler. This is the right shape.
- **`LoopAsync` re-reads `AdminPushIntervalMs` every cycle** (`StatusBroadcaster.cs:122`) so a hot-reload of the interval takes effect without restarting the loop, and floors it at 100 ms so a bad value cannot spin the CPU. Matches the hot-reload-everything posture in `CLAUDE.md`.
- **`TagValueCapture` concurrency is genuinely lock-free-correct.** `Volatile.Write`/`Volatile.Read` of references to an immutable `record` (`TagValueObservation`), `_armed` is `volatile`, and `Record` short-circuits on `!_armed` with a single volatile read before any work — so the disarmed hot path cost is one bool read, as advertised. `Disarm` clears slots so a re-arm shows only fresh data. This part of the design is solid; the weakness is *who calls Arm/Disarm and when* (C1/C2), not the capture itself.
- **`StatusSnapshotBuilder.BuildDebug` degrades gracefully for an unknown PLC** (`StatusSnapshotBuilder.cs:44-47`) — returns a disarmed empty snapshot rather than throwing, which is the correct behavior for a detail page open on a hot-reload-removed PLC, and `PlcDetailResponse.Plc` is nullable to carry that state. `ConfigReconciler.cs:259` calls `_captureRegistry.Remove(name)` on PLC removal, so the registry and the config stay consistent.
- **`StatusHub.SubscribePlc` for an unknown PLC is a documented no-op** (`StatusHub.cs:53`, `TagCaptureRegistry.Arm` no-ops a missing key) and `StatusHubTests.SubscribePlc_UnknownPlc_DoesNotThrow_AndArmsNothing` covers it. A hub method throwing would be sent to the caller as a hub error; this path correctly does not throw.
- **Asset serving is safe.** `AdminEndpointHost.cs:212-226` rejects `/`, `\`, and `..` in the asset path segment before touching `GetManifestResourceStream`, caches bytes and misses in a `static ConcurrentDictionary` shared across app re-builds, and sets `immutable` cache headers for content-addressed assets vs. `no-cache` for the HTML shells. Embedded resources mean no filesystem traversal surface at all.
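
The asset-name check described in the last bullet can be illustrated with a small predicate. This is a sketch of the stated rule (reject `/`, `\`, and `..`), not the code in `AdminEndpointHost.cs`; `AssetNames.IsSafe` is a hypothetical name, and rejecting empty names is an added assumption.

```csharp
#nullable enable
using System;

// Hypothetical sketch of the path-segment validation described above.
public static class AssetNames
{
    // Accepts only a single, non-empty segment with no separators and no "..".
    public static bool IsSafe(string? name) =>
        !string.IsNullOrEmpty(name)
        && !name.Contains('/')
        && !name.Contains('\\')
        && !name.Contains("..", StringComparison.Ordinal);
}
```

Running the check before any resource lookup means a rejected name never reaches `GetManifestResourceStream` at all.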
## Open questions
1. **C1 field impact:** how often do operators' browsers reconnect in this deployment? If the dashboard is on a stable internal segment and detail pages are short-lived, the leak is slow — but it is unbounded and never self-heals in a steady-state service. Is there an existing process-restart cadence that masks it? Either way the invariant in `StatusPage.md:315` ("the hot path carries zero cost when nobody is watching") is currently false after any reconnect-during-view.
2. **C2 / hub dispatch model:** confirm against the SignalR version in use whether `OnDisconnectedAsync` for a connection can overlap an in-flight `SubscribePlc` invocation on that *same* connection. If SignalR guarantees `OnDisconnectedAsync` runs only after all in-flight invocations for that connection complete, C2's same-connection race narrows to the cross-connection (reconnect) race in C1 — still a bug, but the fix scope shrinks.
3. **M1:** is trimming/AOT on the roadmap for this service? `CLAUDE.md` mentions single-file self-contained publish but not trimming. If AOT is ever planned, M1 is upgraded to Critical (the SignalR reflection JSON path will break) and `DebugJsonContext` must be wired in, not deleted.
|
||||||
|
4. **M5:** has the post-`StartAsync` failure path (e.g. `GetRequiredService<IHubContext<StatusHub>>` failing) ever been observed? It is low-probability, but the catch block's cleanup is provably incomplete — worth a deliberate decision to either fix it or document it as accepted.
|
||||||
|
5. Is there any reason `StatusJsonContext.Default` is not also wired into the hub's `AddJsonProtocol` so the fleet `"fleet"` push and `GET /status.json` share one serialization path and one camelCase configuration point (M1/M2)?
|
||||||
@@ -0,0 +1,96 @@
|
|||||||
|
# Frontend / Live Web Dashboard — Code Review
|
||||||
|
|
||||||
|
Scope: commit `e719dd5` ("replace status page with a live SignalR web dashboard"). Files reviewed — all under `src/Mbproxy/Admin/wwwroot/`:
|
||||||
|
`index.html`, `plc.html`, `dashboard.js`, `detail.js`, `theme.css`, `dashboard.css`, `detail.css`. Vendored assets (`bootstrap.min.css`, `bootstrap.bundle.min.js`, `signalr.min.js`, `*.woff2`) explicitly out of scope. Cross-checked against `docs/Operations/StatusPage.md` (wire contract), `CLAUDE.md` (mbproxy), and the server side `StatusHub.cs` / `StatusBroadcaster.cs` for the SignalR contract.
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
- The dashboard is well-built vanilla JS: thoughtful escaping helpers, clean rate computation, sensible reconnection wiring, and a correct subscribe-on-reconnect pattern that keeps the on-demand tag capture armed/disarmed correctly.
|
||||||
|
- **The single most important finding is a real XSS hole on the detail page**: `detail.js` interpolates `t.direction` and the `shortTime(c.connectedAtUtc)` fallback into `innerHTML` **without escaping**, and leaves `t.address`/`t.width` uncoerced. (`t.rawHex` and `c.remote` are escaped, and `plc.host` is written via `textContent`; see C1.) `escapeHtml` exists in the file but is applied inconsistently, so some server-supplied fields bypass it. The fleet table (`dashboard.js`) is escaped correctly throughout; the detail page is not. This is a Critical finding.
|
||||||
|
- A second Major issue: on a *cold* SignalR failure the page enters a `setTimeout(start, 3000)` retry loop layered on top of `withAutomaticReconnect`. The loop reuses the same `HubConnection` object, but it has no upper bound, no backoff, and no UX past "retrying", and it can wedge when `start()` succeeds but the subsequent `invoke` fails (see M1).
|
||||||
|
- DOM updates are full-`innerHTML` re-renders of the whole table body every ~1 s. Correct and flicker-free for 54 rows, but it blows away focus/selection and is wasteful; acceptable at this scale, flagged.
|
||||||
|
|
||||||
|
## Critical findings
|
||||||
|
|
||||||
|
**C1. Stored/reflected XSS on the connection-detail page — multiple unescaped fields rendered via `innerHTML`.** `detail.js` defines `escapeHtml` (line 23) and uses it for *some* fields but omits it for several others, all of which are interpolated into template strings later assigned to `.innerHTML`:
|
||||||
|
|
||||||
|
- **`detail.js:194` and `:203`** — the debug-row builder:
|
||||||
|
```js
<td>${t.address} <span class="ratio-sub">${hex4(t.address)}</span></td>
<td>${t.width}-bit</td>
...
<td><span class="dir-tag ${dirCls}">${t.direction}</span></td>
```
|
||||||
|
`t.direction` is a server string (`"read"`/`"write"` per the schema) interpolated raw into element text, directly beside the `dirCls` class interpolation. `t.address`/`t.width` are typed `int` in the documented DTO, so they are lower risk, but the code neither coerces them (`Number(...)`) nor escapes them, so it relies entirely on the server DTO type. `t.direction` is a free `string` on the wire and is **not escaped**.
|
||||||
|
- **`detail.js:86`** — the client list:
|
||||||
|
```js
`<div class="client-line">${escapeHtml(c.remote)}` +
`<span class="pdu"> · ${num(c.pdusForwarded)} PDUs · since ${shortTime(c.connectedAtUtc)}</span></div>`
```
|
||||||
|
`c.remote` *is* escaped (good), but `shortTime(c.connectedAtUtc)` is **not** — and `shortTime` (line 36) has a `catch { return iso; }` branch that returns the **raw, unescaped `connectedAtUtc` string** verbatim when `new Date(iso)` parsing throws or the value is non-ISO. A malformed/attacker-controlled timestamp string therefore lands unescaped inside `innerHTML`.
|
||||||
|
- **`detail.js:203`** — `escapeHtml(t.rawHex)` *is* applied (good), and `num(t.decodedValue)` is numeric-safe. So `rawHex` is the one debug field that is handled correctly. (Correcting the brief: `rawHex` is escaped; `direction` and the timestamp path are the live holes.)
|
||||||
|
- **`detail.js:65`**: ``$('plc-sub').textContent = `${plc.host}:${plc.listenPort}`;`` is **safe** (`textContent`), and `detail.js:14–16` sets `plcName` via `textContent` as well. So the identity header is fine; the holes are specifically the `innerHTML` card/table builders.
|
||||||
|
|
||||||
|
Severity rationale: per `docs/Operations/StatusPage.md` the admin endpoint binds `IPAddress.Any` with **no authentication** ("Authentication lives at the network layer"). PLC `name`/`host` and backend-derived strings come off the Modbus wire / `appsettings.json` and are operator- or device-influenceable. A PLC whose configured name is a crafted string, or a backend that returns one, can inject `<img src=x onerror=...>` into any operator's browser session on the trusted segment. Read-only UI does not mean low impact: the injected script runs with the operator's origin.
|
||||||
|
|
||||||
|
Concrete fix: route **every** dynamic value through `escapeHtml` before it enters an `innerHTML` template, with **no exceptions**:
|
||||||
|
- `detail.js:194` / `:203`: wrap `t.direction` in `escapeHtml(...)`; coerce `t.address`/`t.width` with `Number(...)` (or escape) rather than trusting the DTO type.
|
||||||
|
- `detail.js:86`: wrap the `shortTime(...)` result in `escapeHtml(...)`, and/or change `shortTime`'s fallback to `escapeHtml(iso)` / return `'—'`.
|
||||||
|
- Add a single `escapeAttr` helper (as `dashboard.js` has at line 179) and use it for the `dirCls`/class interpolation at `:197` if `dirCls` ever becomes data-derived (currently it is a literal, so low priority — but `card()`'s `cls` parameter at `detail.js:45` is interpolated into `class="v ${cls||''}"` and is caller-supplied; today all callers pass literals, so it is safe *now* but fragile).
|
||||||
|
- Better still: stop hand-building HTML. Build rows with `document.createElement` + `textContent`/`dataset`, which makes escaping structural rather than a discipline that one missed call defeats.
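
The fixes listed above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `shortTimeSafe`, `debugRowCells`, and `PLACEHOLDER` are hypothetical names, the UTC `HH:MM:SS` output format is an assumption, and the row markup is reduced to the three cells quoted in the finding.

```js
// Escape text for an element-content context. `&` is replaced first so the
// entities produced by the later replacements are not escaped a second time.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Attribute values additionally need both quote characters escaped.
function escapeAttr(value) {
  return escapeHtml(value).replace(/"/g, '&quot;').replace(/'/g, '&#39;');
}

// Placeholder shown when a value cannot be rendered (U+2014).
const PLACEHOLDER = '\u2014';

// Hypothetical replacement for shortTime: an unparseable input yields the
// placeholder instead of echoing the raw string back into the markup.
function shortTimeSafe(iso) {
  const d = new Date(iso);
  if (Number.isNaN(d.getTime())) return PLACEHOLDER;
  return d.toISOString().slice(11, 19); // assumed format: UTC HH:MM:SS
}

// Hypothetical row builder: numeric fields are coerced, strings are escaped.
function debugRowCells(t) {
  const address = Number(t.address); // a non-numeric value becomes NaN, never markup
  const width = Number(t.width);
  return (
    `<td>${address}</td>` +
    `<td>${width}-bit</td>` +
    `<td><span class="dir-tag">${escapeHtml(t.direction)}</span></td>`
  );
}
```

With this shape, a hostile `direction` such as `<img src=x onerror=...>` reaches the DOM as inert text, and a malformed timestamp renders as the placeholder rather than as markup.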
|
||||||
|
|
||||||
|
## Major findings
|
||||||
|
|
||||||
|
**M1. Cold-start retry loop layers a manual `setTimeout` retry on top of `withAutomaticReconnect`, with no bound and a misleading pill.** `dashboard.js:252–263` and the identical `detail.js:236–245`:
|
||||||
|
```js
async function start() {
  try { ...; await connection.start(); await connection.invoke('SubscribeFleet'); ... }
  catch { setConn('disconnected','retrying'); setTimeout(start, 3000); }
}
```
|
||||||
|
`withAutomaticReconnect` only covers a connection that *was* established and then dropped — it does **not** retry the initial `start()`. So the manual loop is needed. But: (a) it retries **forever** every 3 s with no backoff and no cap — if the hub is permanently gone the browser hammers it indefinitely; (b) if `connection.start()` succeeds but `invoke('SubscribeFleet')` throws, the `catch` calls `start()` again on an **already-started** connection, which will throw "cannot start a connection that is not in the Disconnected state" and wedge the retry loop; (c) the pill shows `disconnected`/`retrying` which is reasonable, but there is no terminal "giving up" state and no way for an operator to know the difference between "server down" and "server slow". Fix: separate "start the connection" from "subscribe" so a subscribe failure doesn't restart the socket; add capped exponential backoff; guard `start()` against re-entry when `connection.state !== 'Disconnected'`.
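
A minimal sketch of the M1 fix, assuming the page keeps a single `connection` and that `subscribe` wraps the `invoke('SubscribeFleet')` or `invoke('SubscribePlc', plcName)` call. `backoffDelayMs`, `startAndSubscribe`, and the default limits are illustrative names and values, not taken from the reviewed code.

```js
// Capped exponential backoff: 1 s, 2 s, 4 s, ... up to capMs.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Start the connection only when it is Disconnected, then subscribe.
// A subscribe failure on a live connection retries the subscribe alone,
// so start() is never called on an already-started connection.
async function startAndSubscribe(connection, subscribe, options = {}) {
  const maxAttempts = options.maxAttempts ?? 8;
  const sleep = options.sleep ?? ((ms) => new Promise((r) => setTimeout(r, ms)));
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      if (connection.state === 'Disconnected') {
        await connection.start();
      }
      await subscribe(connection);
      return true; // connected and subscribed
    } catch {
      await sleep(backoffDelayMs(attempt));
    }
  }
  return false; // caller can show a terminal "giving up" state
}
```

Returning `false` after the last attempt gives the page a place to switch the pill from "retrying" to a terminal state, which addresses point (c) as well.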
|
||||||
|
|
||||||
|
**M2. Detail page never handles "PLC name not in fleet" vs. "hub never delivers".** `detail.js` subscribes with `SubscribePlc(plcName)` where `plcName` is taken from `location.pathname`. If the name is wrong (typo, stale bookmark, or a name that never existed), `StatusHub.SubscribePlc` happily adds the connection to a `plc:{bogus}` group and `_captureRegistry.Arm` is a documented no-op — so the server **never sends a `plc` message** and `onDetail` never fires. The page sits forever on "Waiting for first snapshot…" with a green `connected` pill. There is no timeout, no "unknown PLC" state. The `renderMissing()` path (`detail.js:168`) only triggers when the server *does* push a payload with `detail.plc === null` (PLC removed by hot-reload) — it cannot fire for a name that was never configured because nothing is pushed at all. Fix: after `connected`, start a watchdog (e.g. 3× the push interval); if no `plc` message arrives, show an "unknown or unreachable PLC" notice.
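
The watchdog proposed for M2 could be sketched as below. `createSnapshotWatchdog` is a hypothetical helper, and the timer functions are injectable only so the logic can be exercised without real delays.

```js
// Fires onTimeout once if messageReceived() is not called within
// timeoutMs of arm(). Calling arm() again restarts the countdown.
function createSnapshotWatchdog(timeoutMs, onTimeout, timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle),
}) {
  let handle = null;

  function disarm() {
    if (handle !== null) {
      timers.clear(handle);
      handle = null;
    }
  }

  function arm() {
    disarm();
    handle = timers.set(() => {
      handle = null;
      onTimeout();
    }, timeoutMs);
  }

  return { arm, messageReceived: disarm };
}
```

One plausible wiring, under the same assumptions: call `arm()` once `SubscribePlc` resolves (and again in `onreconnected`), call `messageReceived()` at the top of the `plc` handler, and have `onTimeout` show the "unknown or unreachable PLC" notice. The timeout would be about three push intervals, as suggested above.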
|
||||||
|
|
||||||
|
**M3. `prevPdu`/`rateByName` silently leak entries for removed PLCs.** `dashboard.js:65–77` `updateRates` keys `prevPdu`/`rateByName` by `plc.name` and never removes entries. When a hot-reload removes a PLC, its stale entries stay in both Maps. The fleet rate itself is unaffected, because `renderAggregates` only iterates `s.plcs` and a removed PLC therefore drops out of the *sum*, but the Maps grow without bound for the lifetime of the page if PLCs churn. The rate across a SignalR reconnect was also checked and is fine: the counters are *cumulative since service start*, so `cur - prev.forwarded` across a multi-second reconnect gap yields a correct (if coarse) rate, and `performance.now()` is monotonic per page. Net: a low-grade Map leak on PLC churn. Fix: prune `prevPdu`/`rateByName` to the current `snapshot.plcs` name set each cycle.
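
The pruning step suggested in M3 is small. A sketch, assuming both stores are `Map`s keyed by `plc.name` as described; `pruneToSnapshot` is a hypothetical name:

```js
// Drop every entry whose key is not a PLC name in the current snapshot.
// Deleting from a Map while iterating its keys is safe in JavaScript.
function pruneToSnapshot(plcs, ...maps) {
  const live = new Set(plcs.map((plc) => plc.name));
  for (const map of maps) {
    for (const name of map.keys()) {
      if (!live.has(name)) map.delete(name);
    }
  }
}
```

Called once per cycle as `pruneToSnapshot(snapshot.plcs, prevPdu, rateByName)`, it keeps both Maps bounded by the configured fleet size.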
|
||||||
|
|
||||||
|
**M4. Full-table-body `innerHTML` re-render on every ~1 s push destroys transient UI state.** `dashboard.js:155–173` rebuilds the entire `<tbody>` string and assigns `tbody.innerHTML`. Consequences at the 1 s cadence: any text selection inside the table is lost every second; `:hover` is re-evaluated (minor); the browser re-parses ~54 rows of HTML each tick. It is *not* a flicker source (synchronous replace, no intermediate paint) and 54 rows is cheap, so this is acceptable — but it is the kind of thing that becomes a problem if columns/rows grow. A keyed diff (update existing `<tr>` cells in place, add/remove only on set change) would also fix the selection-loss annoyance. Flagged as Major because of the selection-loss UX regression on a screen operators stare at; the brief explicitly asked about this.
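
The set-change half of the keyed diff mentioned in M4 can be planned without touching the DOM. This is a sketch under the assumption that rows are keyed by PLC name (the fleet table carries a `data-name` attribute); `planRowChanges` is a hypothetical helper and does not handle re-ordering after a sort change.

```js
// Given the names currently rendered and the names in the new snapshot,
// decide which rows to remove, which to add, and which to update in place.
function planRowChanges(renderedNames, snapshotNames) {
  const rendered = new Set(renderedNames);
  const next = new Set(snapshotNames);
  return {
    remove: renderedNames.filter((name) => !next.has(name)),
    add: snapshotNames.filter((name) => !rendered.has(name)),
    update: snapshotNames.filter((name) => rendered.has(name)),
  };
}
```

Rows in `update` would have their cell text rewritten in place, which is what preserves an operator's text selection between pushes.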
|
||||||
|
|
||||||
|
## Minor findings
|
||||||
|
|
||||||
|
**N1. `detail.js` has no `escapeAttr` and `dashboard.js`'s is duplicated.** Both files define their own `escapeHtml` (`dashboard.js:176`, `detail.js:23`) — identical code, copy-pasted. `escapeAttr` exists only in `dashboard.js:179`. Since both files are separately served and there is no shared module, duplication is the pragmatic choice for a no-build-step project, but the *inconsistency* (detail.js lacking `escapeAttr`) is exactly what enabled C1. Recommend a tiny shared `util.js` served from `/assets/`, or at minimum make the two `escapeHtml` definitions and an `escapeAttr` identical and present in both.
|
||||||
|
|
||||||
|
**N2. Accessibility gaps.**
|
||||||
|
- The sortable `<th>` elements (`index.html:72–81`) are clickable via a JS `click` handler but are not keyboard-focusable and carry no `role="button"`/`tabindex="0"`/`aria-sort`. A keyboard-only operator cannot sort the table. Add `tabindex="0"`, `aria-sort` reflecting `sorted-asc`/`sorted-desc`, and a `keydown` (Enter/Space) handler.
|
||||||
|
- Table rows are clickable (`dashboard.js:232`) to open a detail page but are `<tr>` with `cursor:pointer` only — not keyboard-reachable and not announced as interactive. Consider making the PLC-name cell an actual `<a href="/plc/...">` so it is focusable, middle-click/ctrl-click works natively, and screen readers announce it. This would also remove the need for the JS `window.open` handler.
|
||||||
|
- The connection-state pill (`#conn`) updates visually but has no `aria-live` region, so a screen-reader user is never told the hub dropped. Add `aria-live="polite"` to `#conn` or to `#conn-text`.
|
||||||
|
- `<input type="search" id="f-search">` has a `placeholder` but no associated `<label>` (visible or `aria-label`). Same for `#f-state`. Add `aria-label`.
|
||||||
|
|
||||||
|
**N3. Row click always `window.open(..., '_blank')` — no modifier-key respect, popup-blocker exposure.** `dashboard.js:235` unconditionally calls `window.open`. A plain click should arguably open in the same tab or respect the user's intent; programmatic `window.open` not in direct response to a trusted click on an anchor can be caught by popup blockers in some configs. Tied to N2's suggestion: render the PLC name as a real `<a target="_blank" rel="noopener">` and delete the handler. (`rel="noopener"` also matters — `window.open` without it leaves `window.opener` live; here the opened page is same-origin so impact is low, but it is still best practice.)
|
||||||
|
|
||||||
|
**N4. `detail.js` `plcName` parsing is brittle.** `detail.js:10`: `decodeURIComponent(location.pathname.replace(/^\/plc\//, ''))`. If the route is ever served under a path prefix, or the name itself contains an encoded `/`, this misparses. Also if `decodeURIComponent` throws (malformed `%` sequence in the URL) the whole script aborts at line 10 before `connect()` is ever reached — the page is then blank with no error. Wrap in try/catch and fall back to a visible error state.
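
A defensive variant of the N4 parsing, as a sketch. `parsePlcName` is a hypothetical helper, and the anchored `/plc/{name}` route shape is an assumption based on the regex quoted above.

```js
// Returns the decoded PLC name, or null when the path does not match
// /plc/{name} or the name contains a malformed percent-escape.
function parsePlcName(pathname) {
  const match = /^\/plc\/([^/]+)\/?$/.exec(pathname);
  if (!match) return null;
  try {
    return decodeURIComponent(match[1]);
  } catch {
    return null; // decodeURIComponent throws URIError on a malformed sequence
  }
}
```

A `null` result lets the page render a visible error state instead of aborting before `connect()` runs.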
|
||||||
|
|
||||||
|
**N5. `dashboard.js:101` and `:106` — fleet PDU/s shows `—` until the *second* snapshot.** Expected (rate needs two samples) and correctly handled, but the aggregate card shows `—` for the first ~1 s while the table already shows per-row `—` rates. Cosmetic; no fix required, noted for completeness.
|
||||||
|
|
||||||
|
**N6. No `console.log`, no hardcoded absolute URLs, no obvious dead code.** All asset/hub URLs are root-relative (`/assets/...`, `/hub/status`, `/plc/...`) — correct, survives any host/port. `card()`'s `extra` parameter and the `cls` third element of card rows are lightly-used but not dead. Clean on this axis.
|
||||||
|
|
||||||
|
**N7. `formatUptime` / `formatAge` silently misrender negative or NaN input.** `dashboard.js:199` and `detail.js:28`: if `uptimeSeconds`/`ageSeconds` ever arrive negative (clock skew), the value passes through `Math.floor` with its sign intact and renders as a negative duration; if they arrive non-numeric, `Math.floor` yields `NaN` and the card shows `NaN`. Low risk given server types; a `Number.isFinite` guard (plus a `< 0` check) returning `'—'` is cheap insurance.
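
The guard suggested in N7 might look like this. `formatUptimeSafe` and the `1h 02m 03s` output shape are hypothetical, since the real formatter's output is not quoted in this review.

```js
// Placeholder shown for unusable input (U+2014).
const UPTIME_PLACEHOLDER = '\u2014';

// Rejects non-numbers, NaN, Infinity, and negative values before formatting.
function formatUptimeSafe(seconds) {
  if (typeof seconds !== 'number' || !Number.isFinite(seconds) || seconds < 0) {
    return UPTIME_PLACEHOLDER;
  }
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  return `${h}h ${String(m).padStart(2, '0')}m ${String(s).padStart(2, '0')}s`;
}
```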
|
||||||
|
|
||||||
|
## What looks good
|
||||||
|
|
||||||
|
- **Reconnect → re-subscribe is correct.** `dashboard.js:249` and `detail.js:233` both re-`invoke('SubscribeFleet'/'SubscribePlc')` inside `onreconnected`. This is essential and easy to forget: SignalR auto-reconnect gives a new `ConnectionId`, server-side group membership does **not** survive, and the detail page's tag capture is armed per-connection — without the re-subscribe the capture would silently disarm on every transient drop. Verified against `StatusHub.SubscribePlc`/`OnDisconnectedAsync` and `StatusBroadcaster` group targeting — the lifecycle is sound.
|
||||||
|
- **`onclose` correctly does not re-subscribe** — it just sets the pill; `withAutomaticReconnect`'s own loop owns recovery. No double-retry on the warm path.
|
||||||
|
- **Single `HubConnection` per page, closed implicitly on navigation.** Opening a detail page in a new tab (`window.open`) means each tab owns exactly one hub connection; closing the tab fires `OnDisconnectedAsync` server-side which disarms the capture. No connection leak across navigation.
|
||||||
|
- **The fleet table escapes correctly.** `dashboard.js:161–163` routes `plc.name` through `escapeAttr` for the `data-name` attribute and `escapeHtml` for cell text; `plc.host` through `escapeHtml`; `plc.listener.lastBindError` through `escapeAttr` for the `title=` attribute (line 160). This is the right discipline — it is only `detail.js` that fails to match it (C1).
|
||||||
|
- **`escapeHtml` is a correct minimal implementation** — `&`, `<`, `>` cover element-text contexts; `escapeAttr` adds `"`. Order matters (`&` first) and is correct.
|
||||||
|
- **Rate computation is robust.** `dashboard.js:70` guards `now > prev.t`, `Math.max(0, ...)` clamps a counter reset/reconnect to a non-negative rate, and `performance.now()` (monotonic) is the right clock for deltas — not `Date.now()`.
|
||||||
|
- **Filter is correct and cheap.** `visiblePlcs` (`dashboard.js:130`) filters a `.slice()` copy (never mutates the snapshot), search is case-folded once, and the sort has a stable `localeCompare` tiebreaker by name. At 54 rows the filter+sort+render per keystroke is sub-millisecond. It correctly survives live updates because `render()` always re-derives from `latest` + current `filter` state.
|
||||||
|
- **`renderMissing()` hot-reload path.** The detail page genuinely handles a PLC disappearing mid-session (`detail.js:168`) — `notice` shown, cards cleared and `hidden`. Good defensive UX for the hot-reload scenario (the gap is the *never-existed* name, see M2).
|
||||||
|
- **No CDN dependencies** — every `<script>`/`<link>` is `/assets/...`, consistent with the firewalled-network design goal in `StatusPage.md`.
|
||||||
|
- **CSS is clean**: design tokens via custom properties, `font-display: swap` on all `@font-face`, responsive `agg-grid` breakpoints. There are no `prefers-*` media queries, but no egregious issues either. `tr.stale` / `tr.no-traffic` styling gives the debug view real legibility. No `!important` abuse beyond two justified `.empty-row` overrides.
|
||||||
|
|
||||||
|
## Open questions
|
||||||
|
|
||||||
|
1. **C1 exploitability depends on whether `t.direction` and `connectedAtUtc` can actually carry attacker bytes.** `direction` is server-derived from the FC, so today it is effectively a closed set — but it is typed `string` on the wire and `detail.js` trusts it. `connectedAtUtc` is a `DateTimeOffset` serialized server-side, so it too is well-formed *today*. The finding stands because the frontend must not depend on server-side type discipline for its own XSS safety — but if the threat model formally excludes a compromised service, C1 could be re-rated Major. Recommend fixing regardless: the cost is three `escapeHtml` calls.
|
||||||
|
2. Does `StatusBroadcaster` ever push a `plc` payload for a PLC that has *zero* configured BCD tags? `detail.js:183` handles `debug.tags` empty → "No BCD tags configured". Confirmed handled; noted only to flag that the empty-state is covered.
|
||||||
|
3. Should the detail page cap how long it waits before declaring the PLC unknown (M2)? The server has no "unknown PLC" rejection in `SubscribePlc` — it silently accepts any name. A client-side watchdog is the only place this can be surfaced without a server change.
|
||||||
@@ -0,0 +1,52 @@
|
|||||||
|
# mbproxy SignalR Web Dashboard — Code Review Overview
|
||||||
|
|
||||||
|
Scope: commit `e719dd5` ("replace status page with a live SignalR web dashboard") — ~3,500 lines across 49 files. Reviewed in four subsystem passes:
|
||||||
|
|
||||||
|
- [`AdminSignalR.md`](AdminSignalR.md) — `src/Mbproxy/Admin/*` (host, hub, broadcaster, push sink, subscription tracker, snapshot builder, DTOs).
|
||||||
|
- [`TagCapture.md`](TagCapture.md) — `src/Mbproxy/Proxy/{TagValueCapture,TagCaptureRegistry}.cs` + pipeline/reconciler/worker integration.
|
||||||
|
- [`Frontend.md`](Frontend.md) — `src/Mbproxy/Admin/wwwroot/*` (hand-written HTML/CSS/JS only; vendored assets excluded).
|
||||||
|
- [`TestsAndConfig.md`](TestsAndConfig.md) — new/changed tests, `MbproxyOptions`/`ReloadValidator`, csproj `EmbeddedResource`, smoke config.
|
||||||
|
|
||||||
|
## Verdict
|
||||||
|
|
||||||
|
The dashboard is well-architected at the macro level — the `IStatusPushSink` testability seam, the lock-free `TagValueCapture` data structure, the broadcaster's lifecycle binding to the Kestrel app, and the embedded-asset model are all sound. But the review surfaced **one security bug and a cluster of concurrency bugs that share a single root cause**, plus a feature-correctness gap. None of these should ship to operators as-is.
|
||||||
|
|
||||||
|
The standout pattern: **two independent reviewers (`AdminSignalR.md`, `TagCapture.md`) converged on the same root cause**: capture arm/disarm state is not authoritatively owned. `TagValueCapture.IsArmed` is carried on the transient capture instance and counted by `PlcSubscriptionTracker` keyed on SignalR `ConnectionId`. That single design choice produces C2, M1, and M2 below. Fix it once (see M3) and three findings collapse.
|
||||||
|
|
||||||
|
## Cross-cutting critical findings
|
||||||
|
|
||||||
|
**C1 — Stored XSS on the connection-detail page.** (`Frontend.md` C1) `detail.js` interpolates `t.direction` raw into `.innerHTML` (`detail.js:194/203`), and the client-list builder escapes `c.remote` but not the `shortTime(c.connectedAtUtc)` result — whose `catch` branch returns the raw timestamp string verbatim (`detail.js:86`). The admin endpoint binds `IPAddress.Any` with **no authentication**, and the injected strings are device-/config-influenceable. A crafted value executes script in any operator's browser on the trusted segment. Fleet table (`dashboard.js`) escapes correctly — only the detail page breaks discipline. **Fix:** three `escapeHtml` calls, or switch the row builders to `createElement`/`textContent`.
|
||||||
|
|
||||||
|
**C2 — Capture armed forever after a SignalR reconnect.** (`AdminSignalR.md` C1) `PlcSubscriptionTracker` keys subscriber counts on `ConnectionId`, which changes on every transport reconnect. `OnDisconnectedAsync` for the old connection is unordered relative to the new connection's `SubscribePlc`, so a reconnect-during-view leaks the count and leaves a PLC's `TagValueCapture` **armed with no viewer for the life of the process**. The documented invariant "zero hot-path cost when nobody is watching" becomes false after any reconnect — every backend read and FC06/FC16 write then pays a `FrozenDictionary` lookup + `TagValueObservation` allocation, fleet-wide. `DisarmAll()` only fires on admin shutdown/port hot-reload, so the leak is never bounded in steady state.
|
||||||
|
|
||||||
|
**C3 — Cache hits never reach `Record()`.** (`TagCapture.md` C1) The `ctx.Capture?.Record(...)` calls live only in `BcdPduPipeline.ProcessResponse`, but the Phase-11 response-cache hit path (`PlcMultiplexer.cs:823-828`) returns cached post-rewrite bytes without invoking the pipeline. For any BCD tag with `CacheTtlMs > 0`, once the cache is warm the debug view **freezes at the last cache-miss observation while `AgeSeconds` climbs** — actively misleading an operator into thinking a live tag is dead. Feature-correctness, not a crash; caching is OFF by default so the default deployment is unaffected.
|
||||||
|
|
||||||
|
## Major findings (consolidated)
|
||||||
|
|
||||||
|
- **M1 — Non-atomic `SubscribePlc`.** (`AdminSignalR.md` C2) `AddToGroupAsync` then `_tracker.Add` span two awaits; a same-connection `OnDisconnectedAsync` can interleave and arm a capture on an already-gone connection. Same root cause as C2.
|
||||||
|
- **M2 — `GetOrCreate` lost-update race.** (`TagCapture.md` M1) The `AddOrUpdate` delegate reads `existing.IsArmed`, but a concurrent detail-page open can land its `Arm` on the about-to-be-discarded instance — publishing the rebuilt capture disarmed under an open page, or leaking it armed with no viewer.
|
||||||
|
- **M3 — Recommended fix for C2/M1/M2:** make `PlcSubscriptionTracker`'s subscriber count the *single authority* for arm state — key it on a stable per-tab identifier (or count distinct viewers, not connections), and derive `IsArmed` from the count rather than carrying it on the transient capture.
|
||||||
|
- **M4 — `StatusBroadcaster.LoopAsync` has zero coverage.** (`TestsAndConfig.md`) All four broadcaster tests call `PushOnceAsync` directly; the production push loop, its interval hot-reload re-read, the `Math.Max(100,…)` floor, and cancellation are unverified. The `/hub/status` endpoint is never exercised end-to-end.
|
||||||
|
- **M5 — `DebugJsonContext` is dead code; SignalR serializes via reflection `System.Text.Json`.** (`AdminSignalR.md`) A latent AOT trap; the camelCase wire-shape guarantee has no test.
|
||||||
|
- **M6 — Bind failure after `_broadcaster.Start()` leaks the broadcaster loop and a bound listener** — the catch block's cleanup is incomplete. (`AdminSignalR.md`)
|
||||||
|
- **M7 — Untested arm/disarm race in `TagValueCapture`.** (`TestsAndConfig.md`) `Disarm()` flips `_armed=false` then clears slots; a `Record()` that wins the `_armed` check before `Disarm` runs leaves a stale observation on a disarmed capture. The torn-read test only races `Record` vs `Snapshot`, never vs `Disarm`.
|
||||||
|
- **M8 — Frontend cold-start retry loop** layers an unbounded `setTimeout` retry over `withAutomaticReconnect` and can wedge if `start()` succeeds but `invoke()` fails. (`Frontend.md` M1)
|
||||||
|
- **M9 — Detail page never handles an unknown PLC name** — sits forever on "Waiting for first snapshot…" with a green pill. (`Frontend.md` M2)
|
||||||
|
|
||||||
|
## Recommended remediation order
|
||||||
|
|
||||||
|
1. **C1 (XSS)** — smallest fix, highest severity, ships in any operator-facing build. Do first.
|
||||||
|
2. **M3** — re-root capture arm/disarm authority in `PlcSubscriptionTracker`; closes C2, M1, M2, M7 together. Add a concurrency test for the tracker (currently has none).
|
||||||
|
3. **C3** — add a `Record()` call on the cache-hit path in `PlcMultiplexer`, or document the debug view as cache-blind. Decide explicitly.
|
||||||
|
4. **M4** — add an end-to-end `/hub/status` test (real `HubConnection`, assert a `fleet`/`plc` message and its camelCase shape — also closes the M5 gap) and a `LoopAsync` interval/cancellation test.
|
||||||
|
5. **M6, M8, M9** and the Minor findings in each subsystem file.
|
||||||
|
|
||||||
|
## What looks good
|
||||||
|
|
||||||
|
- `IStatusPushSink` is a genuine, well-placed testability seam.
|
||||||
|
- `TagValueCapture` itself — lock-free, torn-read-safe via immutable records + `Volatile.Write`/`Read`, `FrozenDictionary` address map — is correct. The weakness is *who arms it*, not the structure.
|
||||||
|
- Broadcaster per-cycle error isolation and lifecycle binding to the Kestrel app (no leak across port hot-reloads).
|
||||||
|
- Fleet table (`dashboard.js`) escapes all dynamic content; reconnect→re-subscribe is correctly wired in both JS files; no CDN deps, no stray `console.log`.
|
||||||
|
- `StatusHtmlRenderer` removed cleanly — no dangling source or test references.
|
||||||
|
- csproj `EmbeddedResource` glob is correct (Worker SDK has no competing web default globs).
|
||||||
|
- `AdminPushIntervalMs` validation matches house style across both validators.
|
||||||
@@ -0,0 +1,70 @@
|
|||||||
|
# Tag-Value Capture Review
|
||||||
|
|
||||||
|
Scope: commit `e719dd5` ("replace status page with a live SignalR web dashboard"), restricted to the on-demand tag-value capture feature:
|
||||||
|
`src/Mbproxy/Proxy/TagValueCapture.cs`, `src/Mbproxy/Proxy/TagCaptureRegistry.cs`, the `PerPlcContext.cs` / `BcdPduPipeline.cs` / `ProxyWorker.cs` / `ConfigReconciler.cs` / `HostingExtensions.cs` deltas. Cross-checked against `mbproxy/CLAUDE.md` (design intent: capture armed only while a detail page is open; disarmed hot-path cost = one nullable-deref + one volatile read; torn-read safety via immutable records swapped with `Volatile.Write`) and the surrounding admin layer (`StatusHub`, `PlcSubscriptionTracker`, `StatusBroadcaster`, `AdminEndpointHost`, `StatusSnapshotBuilder`, `PlcMultiplexer`).
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
- The disarmed hot path is genuinely near-zero: one `?.` null check plus one volatile-bool read. No allocations, no dictionary lookup, no lock when disarmed — the design contract holds.
|
||||||
|
- Torn-read safety is correct: `TagValueObservation` is an immutable `record`, slots are reference-typed and only ever swapped via `Volatile.Write` / read via `Volatile.Read`. `FrozenDictionary` is built once in the constructor and never mutated. No defect here.
|
||||||
|
- The single material correctness gap is **feature, not crash**: with the Phase-11 response cache enabled, an FC03/FC04 **cache hit bypasses the pipeline entirely**, so `Record` never fires for cached reads — the debug view silently freezes for any cacheable tag while caching is on.
|
||||||
|
- One real lifecycle leak: a `TagCaptureRegistry` reseat/restart on a PLC that is **not** currently being viewed still rebuilds the capture, and `GetOrCreate`'s armed-flag preservation has a benign-but-real race against `Arm`/`Disarm`.
|
||||||
|
- `ProcessFc06Response` does not `Record`, which is defensible but leaves the write path slightly asymmetric — noted as Minor.
|
||||||
|
|
||||||
|
## Critical Findings
|
||||||
|
|
||||||
|
### C1. Response-cache hits never reach `Record` — the debug view freezes for cached tags
|
||||||
|
`PlcMultiplexer.cs:817-828` vs `BcdPduPipeline.cs:408,437`. The capture `Record` calls live exclusively inside `BcdPduPipeline.ProcessResponse`, which only runs when a backend response is rewritten (`PlcMultiplexer.cs:618`). The Phase-11 cache-hit path at `PlcMultiplexer.cs:823-828` builds the upstream frame straight from `cached.PduBytes` via `BuildCacheHitFrame` and returns — **the pipeline is never invoked, so `ctx.Capture?.Record(...)` never fires**. Consequence: for any BCD tag with `CacheTtlMs > 0`, once the cache is warm the connection-detail debug view shows a value that is frozen at the last *cache-miss* observation and ages indefinitely (`AgeSeconds` keeps climbing, `UpdatedAtUtc` never advances), even though clients are reading the tag every poll cycle. An operator using the debug view to confirm "is this tag live?" is actively misled — the tag *is* live, the proxy just isn't recording the cache-served reads.
|
||||||
|
|
||||||
|
This is a feature-correctness defect, not a crash, but it directly defeats the stated purpose of the debug view ("real-time debug view of raw PLC-side BCD vs. decoded client-side values"). Note caching is OFF by default, so the default deployment is unaffected — hence Critical-for-the-feature rather than Critical-for-the-service.
|
||||||
|
|
||||||
|
Fix: record on the cache-hit path too. The cleanest option is, inside the `responseCache.TryGet` hit branch, to decode the cached PDU against the request range and call `ctx.Capture?.Record(...)` — but the cache stores **post-rewrite** bytes (binary, already decoded), so the raw BCD nibbles are no longer available there. Better: store the `TagValueObservation`(s) alongside the cache entry at `Set` time (line 658) and re-publish them into the capture on a hit; or have the cache entry retain the pre-rewrite raw words. At minimum, document in `docs/Architecture/ResponseCache.md` and the debug-view docs that the debug view does not reflect cache-served reads when caching is enabled, and surface "served from cache" in the UI so the stale age is not mistaken for a dead tag.
|
||||||
|
|
||||||
|
## Major Findings

### M1. `GetOrCreate` rebuilds (and re-arms) the capture for PLCs nobody is viewing

`TagCaptureRegistry.cs:29-39`, `ConfigReconciler.cs:308,364,400`. On every Reseat/Restart/Add, `ConfigReconciler` calls `GetOrCreate`, whose `AddOrUpdate` update-delegate **always constructs a brand-new `TagValueCapture`** and copies `existing.IsArmed` onto it. For the overwhelmingly common case (a PLC with no open detail page) this allocates a fresh capture (arrays + `FrozenDictionary`) on every hot-reload for every reconciled PLC, throwing away an identically-shaped object. `FrozenDictionary` construction is not free; doing 54 of them on a tag-list reload is wasteful churn. Functionally harmless, but it contradicts the "on-demand, cheap when no viewer" spirit. Fix: when `!existing.IsArmed` **and** the resolved tag set is unchanged, return `existing` unchanged. A cheap tag-set equality check (ordered address+width sequence) avoids the rebuild for the no-op-reload case entirely.

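The tag-set equality check suggested in the M1 fix could look roughly like the sketch below. It is an illustration only, not the project's code: `TagKey` and `TagSetComparer` are hypothetical names, and it assumes a tag is fully identified by its address and width.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal tag shape: only the fields the equality check needs.
public readonly record struct TagKey(ushort Address, int Width);

public static class TagSetComparer
{
    // True when both sets contain the same (address, width) pairs,
    // regardless of the order in which they were produced.
    public static bool SameShape(IEnumerable<TagKey> current, IEnumerable<TagKey> resolved)
    {
        var a = current.OrderBy(t => t.Address).ThenBy(t => t.Width);
        var b = resolved.OrderBy(t => t.Address).ThenBy(t => t.Width);
        return a.SequenceEqual(b);
    }
}
```

Inside the `AddOrUpdate` update delegate, a guard such as `if (!existing.IsArmed && TagSetComparer.SameShape(existingKeys, resolvedKeys)) return existing;` would then skip the rebuild. How the existing capture exposes its keys is left open here.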
### M2. Armed-flag preservation in `GetOrCreate` races `Arm`/`Disarm`

`TagCaptureRegistry.cs:33-38`. The update delegate reads `existing.IsArmed`, builds `rebuilt`, conditionally `rebuilt.Arm()`s, and returns it. `ConcurrentDictionary.AddOrUpdate` may invoke this delegate **more than once** under contention, and (more importantly) there is no synchronization between this delegate and a concurrent `StatusHub.SubscribePlc` → `Registry.Arm` / `OnDisconnectedAsync` → `Registry.Disarm`. Interleavings that lose an arm/disarm:

- A viewer opens the detail page (`Arm`) *after* the delegate reads `existing.IsArmed == false` but *before* `AddOrUpdate` swaps `rebuilt` in. The `Arm` lands on `existing`, which is then discarded; the new `rebuilt` is published **disarmed**. The detail page stays open but capture is silently off until the next subscribe event (there is none; subscription already completed).
- Symmetrically, a `Disarm` racing the rebuild can be lost, leaving a capture armed with no viewer (a slow leak; see M3).

This is a genuine lost-update race. The window is small (config reload concurrent with a detail-page open) but real. Fix: serialize capture arm-state transitions, e.g. funnel `Arm`/`Disarm`/`GetOrCreate` through the `PlcSubscriptionTracker`'s authoritative subscriber count: after a reseat, `GetOrCreate` should set the rebuilt capture's armed state from `tracker.ActivePlcs()` (the source of truth) rather than from the soon-to-be-discarded `existing` object. That makes the tracker, not a transient capture instance, the single owner of arm state.

### M3. A capture can stay armed forever if the last viewer's disconnect cleanup is lost

`StatusHub.cs:60-66`, `PlcSubscriptionTracker.cs:50-73`, `TagCaptureRegistry.cs:56-60`. Disarm happens on only three paths: `OnDisconnectedAsync`, `StatusBroadcaster.StopAsync` → `DisarmAll`, and (indirectly) a hot-reload `Remove`. `OnDisconnectedAsync` is best-effort in SignalR: for an abruptly-killed browser tab it fires on the **server transport-timeout** (default ~30 s for WebSockets, longer for long-polling), which is acceptable. But two narrower holes remain:

- If `Remove` is called for a hot-reload-removed PLC (`ConfigReconciler.cs:259`) while a detail page is still open, the capture object is dropped from the registry but the `PlcSubscriptionTracker` still holds the connection→PLC subscription. The eventual `OnDisconnectedAsync` calls `Registry.Disarm(plcName)`, which is now a no-op (PLC unknown). That part is fine, but the subscriber count for the removed PLC is never reconciled, and if that PLC name is later re-added by another hot-reload, `GetOrCreate` creates a *disarmed* capture even though a stale subscription still nominally exists. Minor in practice.
- Combined with M2: a `Disarm` lost to the rebuild race leaves a capture armed with no viewer until the next `DisarmAll` (admin port hot-reload or shutdown). That capture keeps doing the full `Record` work (volatile write + `FrozenDictionary` lookup + record allocation) on the proxy hot path for every BCD PDU, indefinitely. Low-frequency trigger, unbounded duration. The fix for M2 closes this.

There is no periodic reconciliation of "armed captures vs. tracker subscriber counts." Consider a guard: `StatusBroadcaster.PushOnceAsync` already enumerates `tracker.ActivePlcs()` every tick, so it could cheaply assert/repair that exactly those PLCs (and no others) are armed, turning any lost arm/disarm into a self-healing condition within one push interval.

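The per-tick reconciliation guard proposed for M3 reduces to a set difference between "PLCs with subscribers" and "PLCs currently armed". The sketch below shows only that difference. `ArmStateReconciler` is a hypothetical helper, the registry would need some way to enumerate armed PLC names (not confirmed by the source), and ordinal name comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class ArmStateReconciler
{
    // Returns the repairs needed so that the armed set equals the set of
    // PLCs that currently have at least one subscriber.
    public static (List<string> ToArm, List<string> ToDisarm) Diff(
        IEnumerable<string> activePlcs, IEnumerable<string> armedPlcs)
    {
        // Assumption: PLC names compare ordinally (case-sensitive).
        var active = new HashSet<string>(activePlcs, StringComparer.Ordinal);
        var armed = new HashSet<string>(armedPlcs, StringComparer.Ordinal);

        var toArm = new List<string>();
        var toDisarm = new List<string>();
        foreach (var plc in active)
            if (!armed.Contains(plc)) toArm.Add(plc);      // viewer present, capture off
        foreach (var plc in armed)
            if (!active.Contains(plc)) toDisarm.Add(plc);  // capture on, nobody watching
        return (toArm, toDisarm);
    }
}
```

In a healthy system both lists are empty on every tick, so the steady-state cost is two small set builds. A non-empty result is also a useful signal to log, since it means an arm or disarm was lost somewhere.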
## Minor Findings

- **`BcdPduPipeline.cs:448-479`: `ProcessFc06Response` does not `Record`.** FC06 write is captured on the *request* path (`ProcessFc06Request`, line 144) with the client's binary `value` and the `encoded` BCD it sent to the PLC. The FC06 *response* decodes the PLC's BCD echo back to binary but does not record. This is defensible (the request already captured the write, and recording the echo would just re-stamp `UpdatedAtUtc`), and the inline comment correctly notes the counter is intentionally not double-incremented. But it is asymmetric with FC03/FC04, which record on the response. Leave as-is, or add a one-line comment in `ProcessFc06Response` stating capture is intentionally request-side only for FC06; otherwise a future reader will "fix" the perceived omission.

- **`BcdPduPipeline.cs:251`: FC16 32-bit `Record` passes `binaryValue` reconstructed as `clientHigh * 10_000 + clientLow`.** This is the base-10000 CDAB reconstruction used for encoding, consistent with the FC03/FC04 read path (`Decode32`) and with `DebugDto.ToTagDto`'s `0x{RawHigh:X4}{RawLow:X4}` rendering. Correct, but note the `DecodedValue` shown for a 32-bit tag is the base-10000 composed integer, not a true binary 32-bit value. That matches the rest of the proxy's 32-bit BCD model; just confirm the UI labels it consistently. No code change needed.

- **`TagValueCapture.cs:136-146`: `Snapshot()` allocates a fresh `TagValueObservation` for every empty slot on every call.** `StatusBroadcaster` calls `BuildDebug` → `Snapshot` once per push interval per *viewed* PLC, so this is bounded and cheap (only viewed PLCs, low cadence). Not worth fixing, but the placeholder records for never-seen tags could be cached once at construction since they are constant. Optional micro-optimization.

- **`TagValueCapture.cs:68-74`: constructor de-dups tags by `Address` via `GroupBy(...).Select(g => g.First())`.** If the resolved `BcdTagMap` ever contains two tags at the same address with different `Width` (it should not, since the map resolver should reject that), the capture silently keeps the first and the debug view width could disagree with what the rewriter actually does. Low risk given upstream validation, but a defensive assert or a comment that the map is already de-duplicated would help.

- **`TagCaptureRegistry.cs:45-46`: `TryGet`'s `out` uses `capture!` to suppress nullability.** Fine, because callers (`StatusSnapshotBuilder.BuildDebug:46`) correctly branch on the bool first. No issue; noted only for completeness.

- **`PerPlcContext.cs` clone: `WithCurrentRequest` copies `Capture` by reference.** Correct: the capture is per-PLC and shared across all per-call context clones, which is exactly what's wanted (all concurrent responses for one PLC record into the same capture). Confirmed not a bug.

## What Looks Good

- **Disarmed hot-path cost meets the contract.** `Record` opens with `if (!_armed) return;` and `_armed` is a `volatile bool`, so the check is one volatile read; reaching `Record` at all is one `?.` null check on `ctx.Capture`. No allocation, no dictionary lookup, no lock on the disarmed path. The CLAUDE.md claim ("one nullable-deref + one volatile read when disarmed") is accurate.
- **Torn-read safety is correctly implemented.** `TagValueObservation` is a `sealed record` with init-only positional members, so it is genuinely immutable. Slots are `TagValueObservation?[]`, mutated only via `Volatile.Write` and read only via `Volatile.Read`. Reference assignment is atomic on all .NET-supported architectures, and the record being immutable means a reader either sees the old reference or the fully-constructed new one, never a torn object. `Disarm` clears slots with `Volatile.Write(..., null)`. No lock needed and none taken.
- **`FrozenDictionary` usage is correct.** Built once in the constructor from a plain `Dictionary`, never mutated afterward, only read on the hot path: exactly the intended `FrozenDictionary` use case (build-once, read-many, allocation-free lookup).
- **`Snapshot()` always returns one row per configured tag**, substituting a placeholder (`UpdatedAtUtc = null`, zero values) for never-seen slots, so the debug view renders a stable row set. Good UX decision, and `DebugDto.ToTagDto` honours it (`HasValue`, `RawHex = "—"`).
- **`PlcSubscriptionTracker` is clean.** Single-lock, low-frequency, reference-counted; `Add` returns "first subscriber" and `RemoveConnection` returns "PLCs whose count hit 0", which are exactly the arm/disarm edges. The lock is appropriate for the low churn.
- **`StatusBroadcaster.StopAsync` calls `DisarmAll`** explicitly, and `AdminEndpointHost.StopCurrentAppAsync` disposes the broadcaster *before* stopping Kestrel, so an AdminPort hot-reload that tears down the SignalR host without firing per-connection `OnDisconnectedAsync` still disarms every capture. This is the deliberate backstop for the "browser tab killed / host torn down" case and it is wired correctly.
- **`ConfigReconciler.Remove` is called on the PLC-removed path** (`ConfigReconciler.cs:259`) before the supervisor is stopped, so a hot-reload-removed PLC does not leak its capture in the registry.
- **DI wiring is correct.** `TagCaptureRegistry` is a singleton in the outer container (`HostingExtensions.cs`), and `AdminEndpointHost.StartAppAsync:197` re-registers the *same instance* into the inner Kestrel container so `StatusHub` shares it rather than getting a second copy. Verified.

## Open Questions

1. C1: is the debug view *expected* to reflect cache-served reads? If the product decision is "debug view shows wire traffic only," then C1 downgrades to a documentation gap, but the UI must then clearly distinguish "cached, no recent wire read" from "tag dead." If the decision is "debug view shows what the client sees," C1 is a real defect and the cache path must record.
2. M2/M3: should arm state be owned by `PlcSubscriptionTracker` (the authoritative subscriber count) rather than carried on the transient `TagValueCapture` instance and copied across rebuilds? That single change removes the lost-update race and makes `GetOrCreate` stateless w.r.t. arm state.
3. Is there value in `StatusBroadcaster` self-healing arm state each tick from `ActivePlcs()`? It already enumerates that set; the reconcile is nearly free and converts any lost arm/disarm into a one-interval transient.
@@ -0,0 +1,285 @@
# Code Review — Tests & Config for the SignalR Dashboard (commit `e719dd5`)

Scope: the test files, config/options/build changes, and the smoke config introduced or
modified by `e719dd5` ("replace status page with a live SignalR web dashboard"). Production
SignalR/capture code (`StatusHub`, `StatusBroadcaster`, `TagValueCapture`,
`TagCaptureRegistry`, `AdminEndpointHost`) was read for context but is reviewed only insofar
as it tells us whether the new tests actually exercise the risky paths. The wider UI
(`dashboard.js`, `detail.js`, HTML/CSS) is explicitly out of scope.

## Summary

The new tests are competently written, follow the existing `Subject_Condition_Expectation`
naming, and the `IStatusPushSink` seam is a genuinely good design that makes the broadcaster
unit-testable without a SignalR host. The `TagValueCapture` torn-read test is a real
concurrency test, not a pretend one. **But the concurrency-sensitive paths the commit message
advertises ("armed only while a detail page is open", arm/disarm under churn, the broadcaster
*loop*) are tested only at the single-threaded happy-path level.** The most serious gaps:

1. The **`StatusBroadcaster.LoopAsync` background loop has zero test coverage**: every
   broadcaster test calls the internal `PushOnceAsync` directly. The loop's interval
   re-read, the `Math.Max(100, ...)` floor, the cancellation path, and the
   loop-terminated-unexpectedly catch are all unverified.
2. The **arm/disarm-vs-record race in `TagValueCapture` is real and untested.** `Disarm()`
   sets `_armed = false` *then* clears slots; `Record()` checks `_armed` *then* writes. A
   `Record` that passes the `_armed` check before `Disarm` flips it can write a slot *after*
   `Disarm` cleared it, leaving a stale observation on a disarmed capture. The torn-read
   test only races `Record` against `Snapshot`, never against `Disarm`/`Arm`.
3. The **`PlcSubscriptionTracker`, the lock-guarded type whose entire job is concurrent
   subscriber counting, has no direct test at all** and no concurrent test anywhere. Its
   behaviour is only incidentally exercised through single-threaded `StatusHubTests`.

None are release-blockers for a read-only admin page on a trusted segment, but #2 is a latent
correctness bug and #1/#3 mean a regression in the live-feed plumbing would ship silently.

## Findings

### Critical

None. This is a read-only admin surface; nothing here can corrupt the Modbus path.

### Major

**M1: `StatusBroadcaster.LoopAsync` has no test.**
`src/Mbproxy/Admin/StatusBroadcaster.cs:113-135`; `tests/Mbproxy.Tests/Admin/StatusBroadcasterTests.cs`.
All four broadcaster tests drive `PushOnceAsync` directly. The actual background loop,
which is what runs in production, is untested. Specifically uncovered:
- the per-cycle `_options.CurrentValue.AdminPushIntervalMs` re-read (the documented
  hot-reload-without-restart behaviour);
- the `Math.Max(100, ...)` floor that defends against a bad interval slipping past
  validation (note: validation rejects `<= 0`, so the floor only ever matters for values
  `1..99`, itself an untested corner);
- that `StartAsync`→`StopAsync` actually terminates the loop and that `StopAsync` is safe
  to call when the loop never started.

Fix: add a test that constructs the broadcaster with `AdminPushIntervalMs` ~100ms, calls
`Start()`, polls `Sink.FleetPushes.Count` until `>= 2` with a generous deadline + the test
`CancellationToken`, then calls `StopAsync()` and asserts the count stops growing. Keep timing
hedged (`ShouldBeGreaterThanOrEqualTo`), consistent with the existing coalescing tests.

**M2: Arm/disarm-vs-record race in `TagValueCapture` is real and untested.**
`src/Mbproxy/Proxy/TagValueCapture.cs:103-129`. `Disarm()` does `_armed = false` then clears
slots; `Record()` does `if (!_armed) return;` then `Volatile.Write` the slot. Interleaving:
`Record` reads `_armed == true` → `Disarm` sets `false` and clears all slots → `Record`
writes its observation. Result: a *disarmed* capture holds a non-null slot, and the next
`Snapshot()` reports stale traffic with `UpdatedAtUtc != null`, which is exactly the "reopened
page shows stale data" failure the on-demand design says it prevents. The hot path makes this
unlikely (record traffic stops when no viewer is attached) but it is not impossible: a
detail page closing while an in-flight FC03 response is being rewritten hits it.
`ConcurrentRecordAndSnapshot_NeverYieldsTornSlot` (line 118) races `Record` vs `Snapshot`
only, never vs `Disarm`. Fix in production: have `Disarm()` set `_armed = false`, then
clear slots, then clear *again* (or re-clear after a memory barrier), or re-check `_armed`
inside `Record` after the `Volatile.Write` and null the slot if disarmed. At minimum add a
test that hammers `Arm`/`Disarm` on one task while `Record` runs on another and asserts a
disarmed capture's `Snapshot()` is all-null. The review should not let this ship "tested"
when the test deliberately avoids the racy interleaving.

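One way to realise the "re-check `_armed` inside `Record`" option from M2 is sketched below on a deliberately simplified model. `CaptureSketch` and `Observation` are hypothetical stand-ins, not the real `TagValueCapture`. The sketch writes the slot with `Interlocked.Exchange` rather than `Volatile.Write`, on the assumption that a full fence is wanted between the slot write and the re-check (a volatile write followed by a volatile read may be reordered).

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the observation record; the real type has more fields.
public sealed record Observation(ushort Address, int Value);

// Simplified model of the capture: one slot array and an armed flag.
public sealed class CaptureSketch
{
    private readonly Observation?[] _slots;
    private volatile bool _armed;

    public CaptureSketch(int slotCount) => _slots = new Observation?[slotCount];

    public bool IsArmed => _armed;
    public void Arm() => _armed = true;

    public void Disarm()
    {
        _armed = false;                          // publish the flag before clearing
        for (int i = 0; i < _slots.Length; i++)
            Volatile.Write(ref _slots[i], null);
    }

    public void Record(int slot, Observation obs)
    {
        if (!_armed) return;                     // cheap disarmed fast path
        // Full fence: the re-check below cannot move ahead of this write.
        Interlocked.Exchange(ref _slots[slot], obs);
        if (!_armed)
        {
            // Disarm raced this call: withdraw only our own observation.
            Interlocked.CompareExchange(ref _slots[slot], null, obs);
        }
    }

    public Observation? Read(int slot) => Volatile.Read(ref _slots[slot]);
}
```

This adds one interlocked operation to the armed path only; the disarmed path is still a single volatile read. It does not address a `Disarm` immediately followed by `Arm` while a `Record` is in flight, where an observation from before the re-arm can survive, so a stress test of the kind M2 asks for is still needed.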
**M3: `PlcSubscriptionTracker` has no direct or concurrent test.**
`src/Mbproxy/Admin/PlcSubscriptionTracker.cs` (whole file); no `PlcSubscriptionTrackerTests.cs`
exists. This is the single lock-guarded type whose correctness drives capture arm/disarm.
`StatusHubTests` exercises it only single-threaded and indirectly. Untested behaviour that
has no coverage anywhere:
- the redundant re-subscribe path (`Add` returns `false` when the same connection
  re-subscribes the same PLC, because `set.Add` fails);
- `RemoveConnection` for an unknown connection id returning `Array.Empty<string>()`;
- a connection subscribed to *multiple* PLCs being torn down in one `RemoveConnection`;
- concurrent `Add`/`RemoveConnection` from many "connections": the lock is claimed
  thread-safe but nothing proves the count never goes negative or leaks.

Fix: add `PlcSubscriptionTrackerTests` with the single-threaded cases plus one
parallel-stress test (N tasks each Add-then-Remove, assert `ActivePlcs()` is empty and no
exception), mirroring `TxIdAllocatorTests`' concurrency style.

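The parallel-stress test suggested for M3 might take the shape below. Because the real `PlcSubscriptionTracker` is not reproduced in this review, the sketch runs against `TrackerSketch`, a hypothetical stand-in that implements only the contract described above (`Add` returns true for a PLC's first subscriber, `RemoveConnection` returns the PLCs whose count reached zero). A real test would target the production type instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the tracker contract described in the review.
public sealed class TrackerSketch
{
    private readonly object _gate = new();
    private readonly Dictionary<string, HashSet<string>> _byConnection = new();
    private readonly Dictionary<string, int> _counts = new();

    public bool Add(string connectionId, string plc)
    {
        lock (_gate)
        {
            if (!_byConnection.TryGetValue(connectionId, out var set))
                _byConnection[connectionId] = set = new HashSet<string>();
            if (!set.Add(plc)) return false;           // redundant re-subscribe
            _counts.TryGetValue(plc, out var n);
            _counts[plc] = n + 1;
            return n == 0;                             // true only for the first subscriber
        }
    }

    public IReadOnlyList<string> RemoveConnection(string connectionId)
    {
        lock (_gate)
        {
            if (!_byConnection.Remove(connectionId, out var set))
                return Array.Empty<string>();          // unknown connection id
            var emptied = new List<string>();
            foreach (var plc in set)
                if (--_counts[plc] == 0) { _counts.Remove(plc); emptied.Add(plc); }
            return emptied;
        }
    }

    public IReadOnlyList<string> ActivePlcs()
    {
        lock (_gate) return _counts.Keys.ToList();
    }
}

public static class TrackerStress
{
    // Many parallel "connections" subscribe to one PLC and then disconnect.
    // Returns the number of PLCs still considered active afterwards.
    public static int Run(int connections)
    {
        var tracker = new TrackerSketch();
        Parallel.For(0, connections, i =>
        {
            var id = "conn-" + i;
            tracker.Add(id, "line-a");
            tracker.RemoveConnection(id);
        });
        return tracker.ActivePlcs().Count;
    }
}
```

A leaked or negative count would show up as a non-empty `ActivePlcs()` after every connection has left, which is the invariant the stress test asserts.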
**M4: `StatusHub` group-leave on disconnect is not verified.**
`src/Mbproxy/Admin/StatusHub.cs:60-66`; `StatusHubTests.cs`.
`OnDisconnectedAsync` is tested for the *capture-disarm* side effect, but the hub never
calls `Groups.RemoveFromGroupAsync`, and `FakeGroupManager` records `Removed` but **no test
ever asserts `groups.Removed`**. Worth confirming this is intentional (real SignalR
auto-removes a disconnected connection from all groups, so an explicit `RemoveFromGroupAsync`
is genuinely unnecessary), but then `FakeGroupManager.Removed` is dead code that implies a
contract the hub does not honour. Either delete `Removed` from the fake, or if the hub is
*supposed* to leave fleet groups explicitly, that's a production bug. As written, a reviewer
cannot tell which. Fix: drop the unused `Removed` list, or add a comment on the fake
explaining SignalR's implicit group cleanup so the gap is not mistaken for an omission.

### Minor

**m1: `SignalRFakes` do not model the real SignalR failure surface.**
`tests/Mbproxy.Tests/Admin/SignalRFakes.cs`. The fakes are faithful to the *shapes* but
model only the success path: `FakeGroupManager.AddToGroupAsync` always succeeds and is
synchronous; `FakeStatusPushSink.PushFleetAsync` never throws and never observes the
`CancellationToken`. The production code has explicit `try/catch` around
`_sink.PushFleetAsync` / `PushPlcAsync` (`StatusBroadcaster.cs:88-110`). A sink that throws
is a real scenario (a SignalR send to a dropped connection) and that catch is **completely
uncovered**. The `BuildDebug` failure catch (`StatusBroadcaster.cs:82-86`) is likewise
uncovered. Fix: add a throwing variant of `FakeStatusPushSink` and assert `PushOnceAsync`
swallows the exception and still attempts the per-PLC pushes; assert the `ct` is honoured
(an `OperationCanceledException` from the sink must *not* be logged-and-swallowed the same
way; the `when (ex is not OperationCanceledException)` filter is untested).

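A throwing sink variant of the kind m1 asks for could look like the sketch below. Everything here is a hypothetical reduction: `IPushSink` stands in for `IStatusPushSink` with string payloads, and `PushOnceSketch` only mirrors the catch-with-filter pattern the review describes, not the real `PushOnceAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical reduction of the sink seam to two methods with string payloads.
public interface IPushSink
{
    Task PushFleetAsync(string payload, CancellationToken ct);
    Task PushPlcAsync(string plc, string payload, CancellationToken ct);
}

// Fake whose fleet push always fails with a caller-chosen exception.
public sealed class ThrowingSink : IPushSink
{
    private readonly Exception _fleetFailure;
    public List<string> PlcPushes { get; } = new();

    public ThrowingSink(Exception fleetFailure) => _fleetFailure = fleetFailure;

    public Task PushFleetAsync(string payload, CancellationToken ct) =>
        Task.FromException(_fleetFailure);

    public Task PushPlcAsync(string plc, string payload, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();   // unlike the current fake, observe the token
        PlcPushes.Add(plc);
        return Task.CompletedTask;
    }
}

public static class PushOnceSketch
{
    // Swallow ordinary sink failures, let cancellation propagate.
    public static async Task RunAsync(IPushSink sink, IEnumerable<string> plcs, CancellationToken ct)
    {
        try { await sink.PushFleetAsync("fleet", ct); }
        catch (Exception ex) when (ex is not OperationCanceledException) { /* log and continue */ }

        foreach (var plc in plcs)
        {
            try { await sink.PushPlcAsync(plc, "debug", ct); }
            catch (Exception ex) when (ex is not OperationCanceledException) { /* log and continue */ }
        }
    }
}
```

Two assertions follow from this shape: with an ordinary exception the per-PLC pushes still happen, and with an `OperationCanceledException` the call surfaces the cancellation instead of swallowing it.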
**m2: `FakeHubCallerContext.ConnectionAborted` is hardcoded to `CancellationToken.None`.**
`SignalRFakes.cs:23`. Real SignalR fires this token on disconnect. No current test needs it,
but if anyone later tests connection-abort handling the fake will silently mask it. Low risk;
note it with a comment so a future test author knows the fake is inert here.

**m3: Asset content-type test relies on a hand-maintained `[InlineData]` allow-list.**
`AdminEndpointTests.Get_Asset_ReturnsCorrectContentType` covers 5 of the 14 embedded files.
`bootstrap.bundle.min.js`, `detail.css`, `detail.js`, `index.html`, `plc.html`,
`ibm-plex-sans-400/600.woff2` are not asserted. More importantly there is **no test that the
csproj glob actually embedded every wwwroot file**. If someone adds `favicon.ico` to
`wwwroot/` and the glob silently misses it (it won't, `*.*` catches it, but a `.gitignore`'d
or renamed file would), nothing fails. Fix: add a test that enumerates
`typeof(AdminEndpointHost).Assembly.GetManifestResourceNames()`, filters the
`Mbproxy.Admin.wwwroot.` prefix, and asserts every file physically present in `wwwroot/` has
a matching resource (or just assert the count). This is the only thing that would catch a
broken `EmbeddedResource` glob.

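The resource-enumeration check proposed in m3 can be sketched as a small helper. `EmbeddedAssetAudit` is a hypothetical name, and the sketch assumes file names appear unchanged after the resource prefix, which holds for a flat directory but should be verified against the build's actual resource naming.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;

public static class EmbeddedAssetAudit
{
    // Returns the file names that have no matching manifest resource
    // under the given prefix, sorted for stable test output.
    public static IReadOnlyList<string> MissingResources(
        IEnumerable<string> resourceNames, IEnumerable<string> fileNames, string prefix)
    {
        var embedded = resourceNames
            .Where(n => n.StartsWith(prefix, StringComparison.Ordinal))
            .Select(n => n.Substring(prefix.Length))
            .ToHashSet(StringComparer.Ordinal);

        return fileNames
            .Where(f => !embedded.Contains(f))
            .OrderBy(f => f, StringComparer.Ordinal)
            .ToList();
    }

    // Convenience overload for a real assembly and a flat asset directory.
    public static IReadOnlyList<string> MissingResources(
        Assembly assembly, string directory, string prefix) =>
        MissingResources(
            assembly.GetManifestResourceNames(),
            Directory.EnumerateFiles(directory).Select(p => Path.GetFileName(p) ?? string.Empty),
            prefix);
}
```

A test would call the second overload with `typeof(AdminEndpointHost).Assembly`, the path to `wwwroot/` as seen from the test project (an assumption about the repository layout), and the `Mbproxy.Admin.wwwroot.` prefix, then assert the returned list is empty.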
**m4: `Get_Asset` does not assert the bytes are the *right* asset.**
`AdminEndpointTests.cs` (the new theory). It asserts `bytes.Length > 0` and the content
type, but not that `dashboard.js` contains a known marker (it already does this for the HTML
shells via `ShouldContain("/assets/dashboard.js")`). A resource-name collision or a wrong
`ContentTypeFor` mapping for a *correctly-served-but-wrong* file would pass. Cheap to harden:
for the text assets assert a known substring.

**m5: `csproj` `EmbeddedResource Include="Admin\wwwroot\*.*"` has glob caveats.**
`src/Mbproxy/Mbproxy.csproj` (the new `ItemGroup`). The SDK is `Microsoft.NET.Sdk.Worker`,
which does **not** enable web default-item globs, so there is no double-include of
`.css`/`.js`/`.html` as `Content`. Good. Two real caveats:
- `*.*` matches only files containing a dot. Every current asset has an extension so this is
  fine, but it is a non-obvious constraint; `**\*` or just `*` would be more honest. The
  comment says "intentionally flat", so `*.*` is acceptable, but worth a one-line note that
  extensionless files would be skipped.
- Globs in an `<ItemGroup>` are evaluated at project-evaluation time; a newly-added asset is
  *not* picked up by an incremental build that didn't re-evaluate. In practice
  `dotnet build`/`publish` always re-evaluates, so this is a non-issue, but it is the kind of
  thing that bites a watch-mode developer. No fix needed; flagged for completeness.

The resource-name → request-path mapping (`Mbproxy.Admin.wwwroot.<file>`) is correct for the
flat directory and matches `AssetResourcePrefix` in `AdminEndpointHost.cs:301`.

**m6: `mbproxy.smoke.config.json` is valid but has stale/odd values.**
`tests/sim/mbproxy.smoke.config.json`. The schema is current (`AdminPushIntervalMs` present,
`Keepalive` field names match `KeepaliveOptions`, no removed keys). Issues:
- The header comment says the simulator runs on `127.0.0.1:5020` and both `line-a`/`line-b`
  point `Port: 5020`, which is consistent. Good.
- `BackendHeartbeatIdleMs: 10000` while `BackendRequestTimeoutMs: 2000` satisfies the
  "must be greater than BackendRequestTimeoutMs" cross-field rule. Fine.
- `line-dead` has no `BcdTags`, so it inherits the empty `Global: []` set. Fine; the intent
  is just an unreachable backend.
- The file has **no `Resilience` and no `Cache` section.** Both are optional (defaults
  apply), so the config binds, but a smoke config whose stated purpose is exercising the
  dashboard's "problems only" filter would be more representative with the defaults made
  explicit, or at least a comment that defaults are intentionally relied on.
- Comment line 1 references "Phase 4/5 web-UI browser smoke tests" and
  `plans/2026-05-15-webui-dashboard.md`. `plans/` is untracked (`?? plans/` in git status)
  and per recent commit `7466a46` the project has been *retiring* plan docs. A smoke config
  pointing at an untracked plan file is a dangling reference: either commit the plan or
  drop the citation.
- No `Resilience.ReadCoalescing` means coalescing runs with defaults during the smoke run;
  acceptable but undocumented.

**m7: `ConfigReconcilerTests` change is a minimal compile-fix, not a behaviour test.**
`tests/Mbproxy.Tests/Configuration/ConfigReconcilerTests.cs` (the 1-line diff). The new
`TagCaptureRegistry` ctor arg is passed as a throwaway `new TagCaptureRegistry()`. That is
the correct minimal change, but it means **the reconciler's new responsibility (calling
`TagCaptureRegistry.GetOrCreate` on PLC add and `Remove` on PLC removal during hot-reload)
has no test.** `ConfigReconciler.cs` was modified in this commit (`+11` lines) to wire the
registry; nothing asserts that a hot-reload-added PLC gets a capture or that a removed PLC's
capture is dropped. Fix: in `ConfigReconcilerTests`, pass a real registry the test holds a
reference to, trigger an add/remove reload, and assert `registry.TryGet` reflects it.

**m8: `ReloadValidator`/`MbproxyOptions` `AdminPushIntervalMs` validation is consistent, but
the upper bound is unguarded.** `ReloadValidator.cs` and `MbproxyOptionsValidator` both
reject `<= 0`, consistent with how `GracefulShutdownTimeoutMs` and the keepalive intervals
are validated (lower-bound-only, `> 0`). So the *new* validation matches house style. But
note no interval option in this codebase has an *upper* bound, and `AdminPushIntervalMs` is
the one most likely to be fat-fingered into something absurd (`1000000` ms pushes an update
only about every 16.7 minutes). Not a regression and not inconsistent; flagged only because
the dashboard's whole value proposition is "live". The `LoopAsync` `Math.Max(100, ...)` floor
protects against tiny values; nothing protects against huge ones. Optional: a soft upper
bound (e.g. reject `> 60_000`) or just a doc note. The two new `ReloadValidatorTests` (zero,
negative) and two new `MbproxyOptionsBindingTests` (default 1000, binds 250) are correct and
adequate for the bounds that *are* enforced.

**m9: `StatusHubTests.SecondSubscriber_FirstLeaveKeepsArmed_LastLeaveDisarms` passes
`null` to `OnDisconnectedAsync`.** Line 72/76. Real SignalR passes a non-null `Exception`
on an abnormal disconnect and `null` on a clean one, so `null` is a legal value, but the
test never exercises the exception-carrying path. `StatusHub.OnDisconnectedAsync` ignores
the exception entirely, so this is harmless today; flagged so a future change that *uses*
the exception doesn't find itself untested.

## Coverage gaps (behavior in `e719dd5` with NO test)

1. **`StatusBroadcaster.LoopAsync`**: the entire background loop (M1).
2. **Arm/disarm vs. `Record` race** in `TagValueCapture` (M2).
3. **`PlcSubscriptionTracker`**: no direct test, no concurrency test (M3).
4. **Sink-throws / build-throws error handling** in `PushOnceAsync`: the `try/catch`
   blocks and the `when (ex is not OperationCanceledException)` filter (m1).
5. **`ConfigReconciler` ↔ `TagCaptureRegistry` wiring**: capture created on hot-reload PLC
   add, removed on PLC remove (m7).
6. **`AdminEndpointHost` AdminPort hot-reload tears down the broadcaster and disarms
   captures**: `StopCurrentAppAsync` disposes `_broadcaster` (which `DisarmAll`s); the
   broadcaster's `StopAsync_DisarmsEveryCapture` test proves the disarm in isolation but no
   test proves the *hot-reload rebind* path runs it. (`AdminEndpointTests` has an admin-port
   rebind test from the prior commit; extend it to assert capture disarm.)
7. **`/hub/status` SignalR endpoint**: not exercised by any test. The 405-methods test
   explicitly excludes it. No test connects a SignalR client and verifies a `fleet`/`plc`
   message is actually received end-to-end. The `IStatusPushSink` unit tests cover the
   broadcaster logic but nothing covers `SignalRStatusPushSink` + `MapHub` wiring. Given the
   project already stands up a real Kestrel host in `AdminEndpointTests`, one
   `HubConnectionBuilder`-based E2E test (`SubscribeFleet` → await a `fleet` message) would
   close the single biggest "does the feature actually work" gap.
8. **Multi-tag / 32-bit capture via `BcdPduPipeline`**: `BcdPduPipelineCaptureTests` covers
   FC03 16/32-bit and FC06/FC16 16-bit, but not FC16 *32-bit* (CDAB pair write) capture, and
   not a PDU spanning a BCD tag *and* a non-BCD register (does capture record only the BCD
   one?). The pure `TagValueCaptureTests` cover 32-bit, but the *pipeline hook* for 32-bit
   writes is unverified.
9. **`Get_PlcDetailRoute_ReturnsDetailShell`** uses `/plc/anything`; it never checks that a
   PLC name with a slash, encoded characters, or empty segment is handled. The route is
   `/plc/{name}` and the name is read client-side, so this is low-risk, but
   `/plc/` (empty) and `/plc/a%2Fb` are untested.

## What looks good
|
||||||
|
|
||||||
|
- **`IStatusPushSink` is a clean seam.** Extracting the outbound side behind an interface so
|
||||||
|
the broadcaster loop logic is testable without a SignalR host is exactly right, and
|
||||||
|
`FakeStatusPushSink` using `ConcurrentBag` is the correct call for a type that production
|
||||||
|
pushes to from a background thread.
|
||||||
|
- **`TagValueCapture` torn-read test is genuine.** `ConcurrentRecordAndSnapshot_NeverYields
|
||||||
|
TornSlot` races 4 writers × 200k ops against a reader and checks a real invariant
|
||||||
|
(`DecodedValue == RawLow + RawHigh`). This is the right way to test the `Volatile.Write`
|
||||||
|
immutable-record swap, and it would actually catch a regression to a mutable slot.
|
||||||
|
- **`SignalRFakes` model the right shapes.** `FakeHubCallerContext : HubCallerContext` with
|
||||||
|
the abstract members overridden, `FakeGroupManager : IGroupManager`, and the hub-property
|
||||||
|
injection (`Context = ...`, `Groups = ...`) is the standard, correct way to unit-test a
|
||||||
|
SignalR `Hub` without a host. No mocking framework is dragged in.
|
||||||
|
- **`BcdPduPipelineCaptureTests` regression guards are well-chosen.** The disarmed-capture
|
||||||
|
and null-capture cases assert the rewrite *still happens byte-identically* — that is the
|
||||||
|
load-bearing property (capture must never perturb the proxy's transparency) and it is
|
||||||
|
explicitly tested.
|
||||||
|
- **`TagCaptureRegistryTests.GetOrCreate_Rebuild_PreservesArmedFlag`** correctly tests the
|
||||||
|
hot-reload reseat contract (rebuilt capture is a new instance but keeps `IsArmed`), and
|
||||||
|
`UnknownPlc_Operations_AreSafeNoOps` covers the no-op-for-ghost-PLC contract that
|
||||||
|
`StatusHub` relies on.
|
||||||
|
- **`StatusHtmlRenderer` removal is clean.** No source or test file references it; the only
|
||||||
|
remaining mentions are in `plans/`, `docs/`, and prior `codereviews/` — all expected.
|
||||||
|
- **The `AdminPushIntervalMs` validation is consistent with the codebase.** Lower-bound-only
|
||||||
|
`> 0` checks in both `MbproxyOptionsValidator` and `ReloadValidator`, error-message format
matching the sibling checks, and tests for default + bind + zero + negative. This is the
right pattern; the new tests are adequate for what is enforced.
- **The csproj `EmbeddedResource` glob is correct** for the flat-directory design, the
comment accurately documents the resource-name → request-path mapping, and there is no
accidental double-include because the Worker SDK does not enable web default globs.
- **`mbproxy.smoke.config.json` binds against the current schema** — no stale keys, the
three-PLC line-a/line-b/line-dead topology is a sensible smoke surface, and
`AdminPushIntervalMs` is present and explicit.


## Key file references


- `src/Mbproxy/Admin/StatusBroadcaster.cs:113-135` — untested `LoopAsync` (M1).
- `src/Mbproxy/Proxy/TagValueCapture.cs:103-129` — arm/disarm vs. record race (M2).
- `src/Mbproxy/Admin/PlcSubscriptionTracker.cs` — no test file exists (M3).
- `src/Mbproxy/Admin/StatusHub.cs:60-66` — `OnDisconnectedAsync`; `FakeGroupManager.Removed`
is unasserted dead code (M4).
- `tests/Mbproxy.Tests/Admin/SignalRFakes.cs` — success-path-only fakes (m1, m2).
- `tests/Mbproxy.Tests/Configuration/ConfigReconcilerTests.cs` — compile-fix only, reconciler
↔ registry wiring untested (m7).
- `src/Mbproxy/Mbproxy.csproj` — `EmbeddedResource Include="Admin\wwwroot\*.*"` (m5).
- `tests/sim/mbproxy.smoke.config.json` — dangling `plans/` reference (m6).
@@ -0,0 +1,110 @@
# Admin SignalR / Tag Capture — Code Review


Scope: `src/Mbproxy/Admin/*.cs` (AdminEndpointHost, StatusBroadcaster, StatusHub, PlcSubscriptionTracker, StatusSnapshotBuilder, StatusPushSink, StatusDto, DebugDto, AssemblyVersionAccessor) plus `src/Mbproxy/Proxy/TagValueCapture.cs` and `src/Mbproxy/Proxy/TagCaptureRegistry.cs`. Branch `mbproxy-webui-dashboard` (HEAD `0308490`). Cross-checked against `docs/Operations/StatusPage.md`, `docs/Reference/LogEvents.md`, the mbproxy `CLAUDE.md`, the prior reviews `codereviews/2026-05-15/{AdminSignalR,TagCapture}.md`, and the consuming code in `ConfigReconciler.cs` / `ProxyWorker.cs` / `PlcMultiplexer.cs`.


## Summary


The two Critical findings from the 2026-05-15 reviews have been genuinely fixed and the fixes hold. The SignalR subscription tracker is now keyed on a stable per-page-load `tabId` (`PlcSubscriptionTracker`), not on the `ConnectionId`, so a transport reconnect can no longer leak a viewer; arm/disarm is funneled through a single authority (`StatusBroadcaster.PushOnceAsync` → `TagCaptureRegistry.ReconcileArmed`), removing the lost-update race; the hub no longer arms captures at all. The prior XSS surface does not exist in the current admin layer — no server-rendered HTML, no string interpolation into a response; `/plc/{name}` serves a static embedded shell and `name` is read client-side. The cache-hit debug-view freeze (prior TagCapture C1) is fixed by `CacheEntry.CapturedTags` replay. The endpoint is strictly read-only. `M5`/`M6` from the prior review (bind-failure leak) are addressed: `StartAppAsync` now tears down a partially-started `app`/`_broadcaster` in its catch.


Remaining issues are lower severity: a real but bounded reconcile/disarm race against `Snapshot()`, a stale doc comment, a missing `Start()` double-call guard, and assorted test-gap / doc-drift items. No new Critical issues.


Findings by severity: **Critical: 0 · Major: 2 · Minor: 7**


## Major findings


### M1. `ReconcileArmed` disarm can clear a slot between snapshot capture and push — debug view can flicker an extra empty cycle


`StatusBroadcaster.cs:108-130`, `TagCaptureRegistry.cs:55-68`, `TagValueCapture.cs:108-113`.


`PushOnceAsync` calls `ReconcileArmed(activePlcs)` *before* the per-PLC `foreach` that calls `_builder.BuildDebug(plcName)`. For a PLC that is currently armed and still has a viewer this is fine. But the ordering interacts badly with a viewer that leaves *during* a push cycle:


- `activePlcs` is snapshotted at line 108. Suppose `plc-x` is in it.
- `ReconcileArmed` keeps `plc-x` armed.
- The viewer's `OnDisconnectedAsync` runs on a hub thread mid-cycle, removing `plc-x` from the tracker — but the `foreach` still iterates the stale `activePlcs` list and pushes one final detail message to a now-empty group. Benign (empty group no-ops).


The genuinely observable case is the inverse and is a correctness nuance, not a crash: because `ReconcileArmed` runs once per cycle and `BuildDebug` runs in the same cycle, a PLC that becomes active *after* line 108 (a `SubscribePlc` landing between line 108 and the `foreach`) is not in `activePlcs`, so it is neither armed nor pushed this cycle — it self-heals next cycle. That is acceptable. What is worth flagging: `ReconcileArmed` and `BuildDebug` are not consistent within the cycle for a PLC whose arm state flips mid-cycle, and the capture's `Disarm()` clears every slot (`TagValueCapture.cs:111-112`). If a hot-reload `GetOrCreate` rebuild or a `ReconcileArmed` disarm interleaves between `ReconcileArmed` and the `capture.Snapshot()` inside `BuildDebug`, the pushed `PlcDebugSnapshot` can report `CaptureArmed = true` (read at `StatusSnapshotBuilder.cs:54`) with all-empty tag rows, or `CaptureArmed = false` with populated rows.


Impact: a transient one-cycle (≤ `AdminPushIntervalMs`, default 1 s) inconsistency in the debug view — `captureArmed` and the tag rows can momentarily disagree. Not data corruption, not a leak; the next cycle is correct. Worth recording because the prior reviews treated arm-state consistency as the central invariant.


Recommendation: read `capture.IsArmed` and `capture.Snapshot()` once, atomically enough for display — or, simplest, have `BuildDebug` derive `CaptureArmed` from the same `activePlcs` set the broadcaster already holds rather than re-reading `capture.IsArmed`. Pass the active set (or a `bool armed`) into `BuildDebug` instead of letting it independently re-query the capture. That makes the pushed payload internally consistent by construction.
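
A minimal sketch of the second option, assuming simplified stand-in types (`DebugPayload`, `DebugPayloadSketch`, and the `snapshot` delegate are hypothetical, not the project's real `PlcDebugSnapshot` or `StatusSnapshotBuilder`). The armed flag is derived from the same active set the broadcaster already holds, so the flag and the rows come from one decision:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified payload; the real PlcDebugSnapshot carries more fields.
public sealed record DebugPayload(string Plc, bool CaptureArmed, IReadOnlyList<string> Rows);

public static class DebugPayloadSketch
{
    // `activePlcs` is the set the broadcaster snapshotted at the top of the cycle.
    // `snapshot` stands in for capture.Snapshot() and is only called when armed.
    public static DebugPayload Build(
        string plc,
        ISet<string> activePlcs,
        Func<string, IReadOnlyList<string>> snapshot)
    {
        bool armed = activePlcs.Contains(plc);
        IReadOnlyList<string> rows = armed ? snapshot(plc) : Array.Empty<string>();
        return new DebugPayload(plc, armed, rows);
    }
}
```

With this shape a payload can still show `CaptureArmed = true` with empty rows when no traffic has been captured yet, but it cannot report `CaptureArmed = false` alongside populated rows.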


### M2. `StatusBroadcaster.Start()` has no guard against a second call — a double-start orphans a push loop


`StatusBroadcaster.cs:56-57`: `public void Start() => _loop = Task.Run(() => LoopAsync(_cts.Token));`


A second `Start()` overwrites `_loop`, orphaning the first loop task. The first task keeps running against the same `_cts`, so two loops now push the fleet snapshot every interval until cancellation, and `StopAsync` only `await`s the *second* `_loop` — the orphaned first loop is never awaited and its faults are unobserved. The XML doc ("Idempotent only in the sense that it is called once") documents the assumption but nothing enforces it.


Today `AdminEndpointHost` constructs a fresh `StatusBroadcaster` per `StartAppAsync` and calls `Start()` exactly once, so this is not hit in production. `StatusBroadcasterTests.Loop_PushesRepeatedly_ThenStopsAfterStopAsync` (line 175) is also fine: it calls `h.Broadcaster.Start()` on a harness that does not start the broadcaster by any hidden path. The class nevertheless remains trivially misusable, and after a double start `IAsyncDisposable.DisposeAsync` → `StopAsync` would clean up only one of the two loops.


Recommendation: add a one-line guard — `Interlocked.CompareExchange` on an `int _started`, or `if (_loop is not { Status: TaskStatus.RanToCompletion or ... } ) ...` — and either throw or no-op on the second call. An `Interlocked` flag is the cleanest. Pair it with a brief test asserting a second `Start()` does not double the push rate.
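
The `Interlocked` variant might look like the sketch below. `GuardedLoop` is a hypothetical stand-in for `StatusBroadcaster`, reduced to the start/stop plumbing; returning `bool` from `Start()` is an illustration choice, and the real class could throw instead.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for StatusBroadcaster; only the start guard is shown.
public sealed class GuardedLoop : IAsyncDisposable
{
    private readonly CancellationTokenSource _cts = new();
    private readonly Func<CancellationToken, Task> _body;
    private Task _loop = Task.CompletedTask;
    private int _started; // 0 = not started, 1 = started

    public GuardedLoop(Func<CancellationToken, Task> body) => _body = body;

    // Returns true only for the single call that actually starts the loop.
    public bool Start()
    {
        if (Interlocked.CompareExchange(ref _started, 1, 0) != 0)
            return false; // second call is a no-op; the first loop stays the only one

        _loop = Task.Run(() => _body(_cts.Token));
        return true;
    }

    public async ValueTask DisposeAsync()
    {
        _cts.Cancel();
        try { await _loop.ConfigureAwait(false); }
        catch (OperationCanceledException) { /* expected on shutdown */ }
        _cts.Dispose();
    }
}
```

Because the flag flips before `_loop` is assigned, only one thread can ever reach `Task.Run`, which is the property the paired test would assert.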


## Minor findings


### N1. Stale comment in `ConfigReconciler` claims `GetOrCreate` preserves the armed flag


`ConfigReconciler.cs:364-365`: *"Rebuild the capture for the new tag set; GetOrCreate preserves the armed flag so an open detail page keeps capturing across the reload."*


`TagCaptureRegistry.GetOrCreate` (`TagCaptureRegistry.cs:33-37`) no longer does this — the update delegate `(_, _) => new TagValueCapture(map.All)` builds a fresh **disarmed** capture and copies nothing. The XML doc on `GetOrCreate` itself (lines 27-31) correctly describes the new behavior ("The rebuilt capture is disarmed; StatusBroadcaster re-arms it on its next push cycle"). The `ConfigReconciler` comment is the stale one and now contradicts the code. It is the only surviving reference to the old armed-flag-preservation design.


Impact: misleading to a future maintainer; could prompt a "fix" that reintroduces the lost-update race the redesign eliminated.


Recommendation: update the `ConfigReconciler.cs:364-365` comment to: *"Rebuild the capture for the new tag set; the rebuilt capture is disarmed and StatusBroadcaster re-arms it within one push cycle if the PLC still has a viewer."*


### N2. `TagValueCapture.Record`'s Record-vs-Disarm re-check still has a narrow stale-slot window


`TagValueCapture.cs:122-143`. The defensive re-read of `_armed` after the `Volatile.Write` is correct in spirit and closes the common case. But it is not airtight: `Disarm()` does `_armed = false` *then* clears slots (`TagValueCapture.cs:110-112`). Interleaving `Record` against `Disarm`:


1. `Record`: `_armed` true → passes the gate.
2. `Disarm`: `_armed = false`; begins the slot-clear loop, clears slot `idx`.
3. `Record`: `Volatile.Write(ref _slots[idx], newObs)` — writes *after* Disarm already cleared `idx`.
4. `Record`: re-reads `_armed` → false → `Volatile.Write(ref _slots[idx], null)`. Slot ends null. **Correct.**


That path self-corrects. The only ordering left would be step 3 landing *after* step 4's null-write, and that cannot happen because both steps run sequentially on the same thread inside `Record`, so `Record` is safe with respect to its own write. The residual risk is purely that `Record` is called *concurrently from two threads* for the same address (e.g. an FC03 backend response and an FC06 upstream write racing on the same tag): both pass the gate, both write, and the re-check on the later writer can null the slot even though the capture is armed, dropping that observation until the next traffic. This is a lost *update*, not stale data, and the slot self-heals on the next PDU.


Impact: under concurrent same-address traffic with a near-simultaneous disarm, a single observation can be dropped. Functionally negligible for a debug view; flagged only for completeness since the prior review called this race out and it is narrowed but not fully eliminated.


Recommendation: no code change required — document in the `Record` comment that the guarantee is "armed captures never retain stale data after disarm," not "every observation while armed is retained." If stricter retention is ever wanted, `Disarm` would need to set `_armed = false` *after* clearing slots and `Record` would re-validate, but that inverts a different race; the current trade-off is the right one for a debug view.


### N3. No test exercises a reconnect that drops the *new* connection first


`StatusHubTests.Reconnect_SameTab_NewConnection_DoesNotLeakViewer` (line 50) and `PlcSubscriptionTrackerTests.SameTab_TwoConnections_StaysActiveUntilLastConnectionGone` (line 27) both cover the reconnect-overlap case where the *old* connection's `OnDisconnectedAsync` fires last. Neither covers the mirror ordering (new connection's disconnect arrives before the old one's), nor the case where `SubscribePlc` for the new connection arrives *before* `RemoveConnection` for the old one but with a different `tabId` (e.g. a hard page reload that generates a fresh `tabId`). The tracker handles all of these correctly by construction, but the test suite asserts only one of the orderings.


Recommendation: add a test where two connections of the same tab are removed in reverse order, and one where a page reload (new `tabId`, old tab still has a live connection mid-teardown) is shown not to leak.


### N4. `SubscribePlc` registers the tracker entry before the group join — a throw in `AddToGroupAsync` leaves a tracked viewer with no group membership


`StatusHub.cs:46-54`. The current order is deliberate (comment lines 48-51: register with the tracker first so this connection's own `OnDisconnectedAsync` sees consistent state). But if `Groups.AddToGroupAsync` throws (transport fault mid-invocation), the tracker now holds a subscription for a connection that is not in the PLC's SignalR group. The capture for that PLC gets armed by the next `ReconcileArmed`, but the broadcaster pushes detail messages to a group the connection never joined, so the page receives nothing while the capture is needlessly armed — until `OnDisconnectedAsync` eventually fires for that connection and `RemoveConnection` cleans it up. Self-healing (the disconnect callback always runs eventually) and the capture cost is bounded, so this is Minor — but the asymmetry is real.


Recommendation: either accept it (the disconnect callback is the backstop, and `OnDisconnectedAsync` is guaranteed to run for any connection that completed `OnConnectedAsync`) and add a one-line comment to that effect, or wrap the body so a failed `AddToGroupAsync` rolls back the tracker entry via `RemoveConnection`. The accept-and-document option is sufficient.
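
If the rollback option is ever preferred, its shape could be as small as the sketch below. The three delegates are hypothetical stand-ins for the tracker registration, the tracker's `RemoveConnection`, and `Groups.AddToGroupAsync`; none of the real hub types are used.

```csharp
using System;
using System.Threading.Tasks;

public static class SubscribeSketch
{
    // register : adds the tracker entry (kept first, as in the current order)
    // rollback : undoes it, e.g. via RemoveConnection
    // joinGroup: the group join that may throw on a transport fault
    public static async Task SubscribeAsync(Action register, Action rollback, Func<Task> joinGroup)
    {
        register();
        try
        {
            await joinGroup().ConfigureAwait(false);
        }
        catch
        {
            rollback(); // no tracked viewer is left behind without group membership
            throw;      // the failure is still surfaced to the caller
        }
    }
}
```

This keeps the deliberate register-first ordering and only adds the compensation step on failure.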


### N5. `StatusBroadcaster` swallows every non-OCE push exception with no rate limiting


`StatusBroadcaster.cs:96-130`. Per-stage `try/catch` correctly isolates a snapshot-build failure, a fleet-push failure, and each per-PLC push failure, and the `[LoggerMessage]` events (`mbproxy.admin.broadcast.{snapshot,fleet,detail}.failed`, EventIds 72-74) match `LogEvents.md`. Good. But a persistently failing push (e.g. a wedged SignalR transport) logs one `Error` event *per cycle* — at the default 1 s cadence that is 86,400 error log lines/day for one stuck PLC. `LogEvents.md` for `mbproxy.admin.broadcast.detail.failed` says "Sustained occurrences ... mean that PLC's detail page is not receiving live updates" — so the volume is by design as an operator signal, but it can drown the rolling file.


Recommendation: consider logging the first failure at `Error` and subsequent consecutive failures at `Debug` (or every Nth), resetting on the first success. Minor — the documented contract permits the current behavior, so this is an enhancement, not a defect.
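
One possible shape for that throttle, as a sketch only: `FailureLogGate` and its `everyNth` parameter are hypothetical names, and the helper assumes it is driven from the single push loop, so it uses no synchronization.

```csharp
// Hypothetical helper: picks the log level for consecutive push failures.
public sealed class FailureLogGate
{
    private int _consecutiveFailures;

    // Returns true when this failure should be logged at Error,
    // false when it should be demoted to Debug.
    public bool OnFailure(int everyNth = 60)
    {
        _consecutiveFailures++;
        return _consecutiveFailures == 1 || _consecutiveFailures % everyNth == 0;
    }

    // The first success resets the streak, so the next failure is loud again.
    public void OnSuccess() => _consecutiveFailures = 0;
}
```

At the default 1 s cadence and `everyNth = 60`, a stuck PLC would produce roughly one `Error` line per minute instead of one per second, while the first failure is still reported immediately.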


### N6. `DebugDtoSerializationTests` re-creates the hub options by hand rather than asserting against the real configuration


`DebugDtoSerializationTests.cs:18-19` builds `new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase }` and comments that this is "The exact policy AdminEndpointHost configures." It is a *copy* of `AdminEndpointHost.cs:200-202`, not a reference to it — if someone edits the `AddJsonProtocol` lambda (adds `DefaultIgnoreCondition`, changes the policy, etc.) the test still passes against its stale copy. `HubStatusE2ETests` does exercise the real hub end-to-end and asserts `captureArmed`/`service`/`plcs` appear, which is the genuine guard; the unit test is a weaker duplicate.


Recommendation: low priority — either delete the unit test as redundant with the E2E coverage, or extract the `JsonSerializerOptions` configuration into a shared static (e.g. `AdminEndpointHost.HubJsonOptions`) that both the production `AddJsonProtocol` call and the test reference, so they cannot drift.
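
The shared-static option could be sketched as follows. `HubJson`, `Apply`, and `Create` are hypothetical names (the review's own suggestion is `AdminEndpointHost.HubJsonOptions`); the only policy set is the camel-case one the test already copies.

```csharp
using System.Text.Json;

// Hypothetical single source of truth for the hub's JSON settings.
public static class HubJson
{
    // Mutates an existing options instance, so production can apply it to the
    // options object SignalR hands to the AddJsonProtocol callback.
    public static void Apply(JsonSerializerOptions options)
    {
        options.PropertyNamingPolicy = JsonNamingPolicy.CamelCase;
    }

    // Convenience for tests: a fresh instance with the same settings applied.
    public static JsonSerializerOptions Create()
    {
        var options = new JsonSerializerOptions();
        Apply(options);
        return options;
    }
}
```

Under this assumption the production lambda would call `HubJson.Apply(o.PayloadSerializerOptions)` and the unit test would serialize with `HubJson.Create()`, so any later edit to the policy changes both at once.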


### N7. Doc drift: `StatusPage.md` does not state that the detail payload travels only over SignalR


`docs/Operations/StatusPage.md` "Debug View Data" (lines 313-331) documents every `PlcDetailResponse` field but never says the payload is reachable *only* over the `/hub/status` SignalR feed and has no `GET` route — a scraper author could reasonably expect a `/plc/{name}.json` twin. Prior review N7 raised exactly this and it is still open. Separately, `StatusPage.md:23-25` describes the bind-failure and `AdminPort` hot-reload behavior accurately, and `LogEvents.md` lists the four `mbproxy.admin.broadcast.*` events correctly — no defect there, just confirming.


Recommendation: add one sentence to `StatusPage.md`'s "Debug View Data" section: *"`PlcDetailResponse` is delivered only over the `/hub/status` SignalR feed (the `\"plc\"` message); there is no `GET` route for it, and it is serialized through the SignalR JSON protocol, not `StatusJsonContext`."*


## What looks good


- **The prior Critical SignalR capture-leak (2026-05-15 C1/C2) is genuinely fixed.** `PlcSubscriptionTracker` is keyed on `tabId`, not `ConnectionId`; a reconnect is "same tab acquires a new connection," so the viewer count cannot leak across a transport drop. `StatusHub` no longer arms/disarms captures — `OnConnectedAsync` is not even overridden — and arm/disarm flows through `StatusBroadcaster.PushOnceAsync → TagCaptureRegistry.ReconcileArmed`, a single-threaded once-per-cycle authority. `ReconcileArmed` reconciles *all* captures against the live `ActivePlcs()` set every cycle, so any stale arm state self-heals within one `AdminPushIntervalMs`. The `StatusHubTests` / `PlcSubscriptionTrackerTests` reconnect and concurrency-stress tests cover the core invariant.
- **The prior TagCapture Critical (cache-hit debug-view freeze) is fixed.** `PlcMultiplexer.cs:653-666` captures `TagValueObservation`s into `CacheEntry.CapturedTags` at store time, and the cache-hit path (`PlcMultiplexer.cs:844-850`) replays them into the armed capture re-stamped to the hit time. The debug view now reflects cache-served reads.
- **No XSS / output-encoding surface.** The admin layer renders no server-side HTML. `/` and `/plc/{name}` serve static embedded `index.html` / `plc.html` byte blobs (`ServeHtmlShell`); `name` is bound but unused and read client-side. `/status.json` and the SignalR payloads are JSON-serialized, never string-interpolated. There is nowhere for attacker-controlled input to reach an HTML response.
- **Endpoint is strictly read-only.** Routes are `GET /`, `GET /plc/{name}`, `GET /assets/{path}`, `GET /status.json`, and the hub. The hub exposes only `SubscribeFleet` / `SubscribePlc` (group joins + tracker mutation) and `OnDisconnectedAsync`. No mutation route, no control action, no log download.
- **Asset serving is traversal-safe.** `AdminEndpointHost.cs:221` rejects `/`, `\`, and `..` in the path segment before any resource lookup; assets are embedded manifest resources, so there is no filesystem surface. Bytes and misses are cached in a `static ConcurrentDictionary`.
- **Bind-failure cleanup is now complete (prior M5/M6 fixed).** `StartAppAsync` declares `app` outside the `try` and its `catch` (`AdminEndpointHost.cs:258-280`) best-effort disposes the `_broadcaster` and stops/disposes the partially-started Kestrel `app`, so a throw after `app.StartAsync` no longer leaks a bound listener or a running push loop.
- **Lifecycle / hot-reload of `AdminPort` is correct.** `_broadcaster` is constructed per `StartAppAsync` and disposed before Kestrel stops in `StopCurrentAppAsync`; the `OnChange` callback is guarded by the `_disposed` flag both before queueing and after the threadpool picks it up, and re-checks `newPort == _currentPort` under `_lock`. `DisposeAsync` is idempotent via `_disposed`. CTS/semaphore disposal is race-guarded.
- **`StatusBroadcaster` resource hygiene.** `_cts` is disposed in `DisposeAsync` after `StopAsync` awaits the loop; `StopAsync` carries a `_stopped` flag so a double-stop (DisposeAsync after an explicit StopAsync) does not touch the disposed CTS or re-cancel. `LoopAsync` re-reads `AdminPushIntervalMs` each cycle (hot-reloadable) and floors it at 100 ms; it pushes before delaying so a freshly-connected dashboard populates immediately.
- **`TagValueCapture` torn-read safety holds.** `TagValueObservation` is an immutable `sealed record`; slots are reference-typed and only swapped via `Volatile.Write` / read via `Volatile.Read`; `_armed` is `volatile`; the disarmed hot path is a single volatile-bool read after one `?.` null-check. `FrozenDictionary` is built once and never mutated.
- **`StatusSnapshotBuilder` is lock-free and side-effect-free**, degrades gracefully for an unknown PLC (`BuildDebug` returns a disarmed empty snapshot), and the `CounterSnapshot` fallback for a not-yet-started supervisor is exhaustive.
- **`StatusBroadcaster.PushOnceAsync` indexes the fleet snapshot into a `Dictionary` once per cycle** (`StatusBroadcaster.cs:113-114`) — the prior review's O(N²) `FirstOrDefault`-per-PLC concern is resolved.
@@ -0,0 +1,157 @@
# Code Review — Configuration, Options, Hosting, Diagnostics


**Scope:** `src/Mbproxy/Configuration/`, `src/Mbproxy/Options/`, `src/Mbproxy/Diagnostics/`, `src/Mbproxy/HostingExtensions.cs`, `Program.cs`, `ServiceCounters.cs` (with `ProxyWorker.cs` and `PlcListenerSupervisor.cs` read for context).
**Branch:** `mbproxy-webui-dashboard` @ `0308490`.
**Date:** 2026-05-16.


## Summary


The configuration/hosting subsystem is in good shape overall: the hot-reload reconciler is cleanly structured, the debounce loop and `_applySemaphore` serialization are correct and leak-free, the diagnostic-sink platform selection is well factored and unit-testable, and `ServiceCounters` uses `Interlocked` correctly. Prior-review remediations appear intact. The most consequential issue found is that `ReloadValidator` is **never invoked at startup** — it runs only inside `ConfigReconciler.ApplyUnderLockAsync` — so an `appsettings.json` that has duplicate `ListenPort`s, an `AdminPort` collision, a bad keepalive cross-field relationship, or a non-positive timeout will start the service in a broken state instead of failing fast, directly contradicting the documented contract. A second real bug: the `Restart` reconcile path removes a supervisor from the dictionary *before* rebuilding it, so a transient failure during rebuild silently drops the PLC with no listener and no recovery. The remaining findings are validation-completeness gaps and maintainability items.


**Findings count:** Critical 0 · Major 5 · Minor 6


---


## Major


### M1 — `ReloadValidator` never runs at startup; cross-PLC config errors are not fail-fast


**Files:** `src/Mbproxy/Configuration/ConfigReconciler.cs:232` (only call site); `src/Mbproxy/Proxy/ProxyWorker.cs:91-180`; `src/Mbproxy/HostingExtensions.cs:21-43`; `docs/Operations/Configuration.md:284`.


`ReloadValidator.Validate` is called from exactly one place — `ApplyUnderLockAsync` (`ConfigReconciler.cs:232`). The startup path is `ProxyWorker.ExecuteAsync`, which builds per-PLC tag maps directly via `BcdTagMapBuilder.Build` (`ProxyWorker.cs:101`) and never calls `ReloadValidator`. The only schema validation that runs at startup is `MbproxyOptionsValidator` via `.ValidateOnStart()` (`HostingExtensions.cs:21-26`), and that validator does **not** check the cross-PLC rules.


Consequences at startup (none of these are caught):
- **Duplicate `ListenPort`** across two PLCs: both supervisors try to bind the same port; one wins, and the other retries in the *infinite* `ListenerRecovery` loop. No fail-fast, no clear error.
- **`AdminPort` colliding with a `ListenPort`** — same silent-conflict outcome.
- **Duplicate PLC `Name`** — `ProxyWorker.cs:125` and `:179` write `_supervisors[plc.Name] = ...`, so the second entry silently overwrites the first; one configured PLC simply never gets a listener and its supervisor object leaks (created at `:165`, overwritten, never started/disposed).
- **Bad keepalive cross-field rule** (`BackendHeartbeatIdleMs <= BackendRequestTimeoutMs`) — only enforced in `ReloadValidator` (`ReloadValidator.cs:163`), so a fresh install with this mistake produces a continuously-firing heartbeat.


`docs/Operations/Configuration.md:284` explicitly states "`ReloadValidator.Validate` runs on every config load (startup and hot reload) … On rejection at startup, the service exits non-zero." That is false for the current code.


**Impact:** A config mistake that should abort startup instead produces a half-working fleet that "looks" up. The most likely operational mistakes (port typos, copy-pasted PLC blocks) are exactly the ones not caught.


**Recommendation:** Run `ReloadValidator.Validate` against `_options.CurrentValue` at the top of `ProxyWorker.ExecuteAsync` (before building any supervisor) and, on failure, log `mbproxy.config.reload.rejected` (or a startup-specific event) and fail the host (`throw` / `StopApplication`). Alternatively, fold the cross-PLC rules into `MbproxyOptionsValidator` so `.ValidateOnStart()` covers them. Either way, make startup and reload share one validation gate, and fix the doc.
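
As a rough illustration of a shared gate, the sketch below checks three of the cross-PLC rules named above and throws before anything is built. `PlcEntry` and `StartupGate` are hypothetical, simplified stand-ins; the real `ReloadValidator` enforces more rules, and its exact signature is not shown in this review.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced view of a PLC block: only the fields these rules need.
public sealed record PlcEntry(string Name, int ListenPort);

public static class StartupGate
{
    // Simplified cross-PLC rules; names are compared with the default (ordinal) comparer.
    public static IReadOnlyList<string> Validate(IReadOnlyList<PlcEntry> plcs, int adminPort)
    {
        var errors = new List<string>();
        foreach (var g in plcs.GroupBy(p => p.Name).Where(g => g.Count() > 1))
            errors.Add($"Duplicate PLC Name '{g.Key}'.");
        foreach (var g in plcs.GroupBy(p => p.ListenPort).Where(g => g.Count() > 1))
            errors.Add($"Duplicate ListenPort {g.Key}.");
        if (plcs.Any(p => p.ListenPort == adminPort))
            errors.Add($"AdminPort {adminPort} collides with a ListenPort.");
        return errors;
    }

    // Intended to run once, before any supervisor is constructed.
    public static void ThrowIfInvalid(IReadOnlyList<PlcEntry> plcs, int adminPort)
    {
        var errors = Validate(plcs, adminPort);
        if (errors.Count > 0)
            throw new InvalidOperationException(string.Join(" ", errors));
    }
}
```

Calling the same `Validate` from both the startup path and the reload path is what gives the single validation gate the recommendation asks for.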


---


### M2 — `Restart` reconcile path drops the PLC if the rebuild throws


**File:** `src/Mbproxy/Configuration/ConfigReconciler.cs:286-340`.


In the restart branch, the old supervisor is removed from the dictionary first (`TryRemove(name, out var old)`, `:292`), then stopped/disposed, then a fresh context + supervisor is built (`:302-330`) and re-inserted (`:332`). The whole body is wrapped in `try { … } catch (Exception ex) { _logger.LogError(...) }` (`:335-339`).


If anything between the `TryRemove` and the `_supervisors[name] = newSupervisor` re-insert throws (e.g. `BcdTagMapBuilder.Build` faults, `PolicyFactory`/`PerPlcContext` construction throws, an out-of-memory condition occurs, or `StopAsync` on the old supervisor throws something other than what it swallows), the catch logs and returns, but the dictionary no longer contains `name`. The PLC now has **no listener at all** and nothing ever recovers it. Because `_currentOptions` is updated to `next` at `:437` regardless, the next reload's `ReloadPlan.Compute` sees `name` present in both `current.Plcs` and `next.Plcs`, so the PLC is treated as unchanged and lands in `ToReseat`/`ToRestart` again only if its options differ. An identical subsequent save produces *no* plan entry, leaving the PLC permanently dark until a manual config edit forces a restart.


The same structural risk exists in the `Add` path (`:388-431`), but there it is benign: a failed add just means the PLC was never there to begin with, and a subsequent reload will re-`Add` it because it is still absent from `_currentOptions`.


**Impact:** A transient fault during a hot-reload restart permanently removes a PLC from service with only an Error log line. For a 54-PLC fleet this is a silent single-line outage.


**Recommendation:** Build the new context + supervisor *before* removing/stopping the old one (or into a local first), and only swap into the dictionary once construction succeeded. On a build failure, leave the old supervisor in place and surface the error. At minimum, in the `catch`, if `name` is no longer in `_supervisors`, re-queue the PLC for `Add` on the next pass or roll `_currentOptions[name]` back to the old value so the next `Compute` re-detects it.
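
A minimal sketch of the build-then-swap ordering, under two assumptions that are not confirmed by this review: that `_supervisors` is a `ConcurrentDictionary`, and that constructing a supervisor does not itself bind the listen port (so the new instance can exist while the old one is still running). `RestartSketch` and its delegates are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

public static class RestartSketch
{
    // buildNew: constructs the replacement; may throw.
    // stopOld : stops/disposes the previous instance after the swap.
    // Returns false and leaves the dictionary untouched when the build fails.
    public static bool TryRestart<T>(
        ConcurrentDictionary<string, T> supervisors,
        string name,
        Func<T> buildNew,
        Action<T> stopOld) where T : class
    {
        T replacement;
        try
        {
            replacement = buildNew();
        }
        catch (Exception)
        {
            // Real code would log here; the old supervisor keeps serving.
            return false;
        }

        supervisors.TryGetValue(name, out var old);
        supervisors[name] = replacement; // swap only after a successful build
        if (old is not null) stopOld(old);
        return true;
    }
}
```

The key property is that a failed build can no longer leave `name` absent from the dictionary, which is the state the current ordering cannot recover from.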


---


### M3 — Reloaded `Resilience.BackendConnect` / `ListenerRecovery` values reach only added or restarted supervisors


**Files:** `src/Mbproxy/Configuration/ConfigReconciler.cs:281-330, 383-421`; `src/Mbproxy/Proxy/ProxyWorker.cs:138-141`.


The backend-connect Polly pipeline is rebuilt inside `ApplyUnderLockAsync` from `next.Resilience.BackendConnect` (`:282-284`, `:384-386`) and the listener-recovery pipeline from `next.Resilience.ListenerRecovery` (`:314`, `:405`) — so add/restart supervisors *do* pick up reloaded `BackendConnect`/`ListenerRecovery` values. But existing supervisors that land in `ToReseat` (or in neither bucket) keep the pipelines built at startup (`ProxyWorker.cs:139`). A reload that changes only `Resilience.BackendConnect.MaxAttempts` or `Resilience.ListenerRecovery.*` therefore takes effect for *added/restarted* PLCs but silently does **not** propagate to the majority of PLCs that were not otherwise touched. The hot-reload propagation tables in `docs/Features/HotReload.md:45-58` and `docs/Operations/Configuration.md:423-435` list `ReadCoalescing` and `Backend*TimeoutMs` but say nothing about `Resilience.BackendConnect`/`ListenerRecovery` — operators will reasonably assume they hot-reload like everything else.


**Impact:** Inconsistent reload semantics: the same key behaves differently depending on whether a PLC happened to be added/restarted in the same save. Hard to diagnose ("I changed `MaxAttempts`, three PLCs honour it, fifty-one don't").


**Recommendation:** Either (a) document explicitly that `Resilience.BackendConnect` and `Resilience.ListenerRecovery` require a service restart (like `AdminPort`), and stop rebuilding them in the reconciler so behaviour is uniform; or (b) make them genuinely hot-reloadable by threading a live accessor (as already done for `ReadCoalescing`/`Keepalive`) so all supervisors re-read them. Option (a) is the smaller, safer change.


---


### M4 — `MbproxyOptionsValidator` does not validate `ListenPort` range or `Host`, leaving binding edge cases uncaught at schema time


**File:** `src/Mbproxy/Options/MbproxyOptions.cs:62-144`; `src/Mbproxy/Options/PlcOptions.cs`.


`MbproxyOptionsValidator` validates `Width`, cache TTLs, cache-size knobs, connection timeouts, `AdminPushIntervalMs`, and keepalive ranges — but never `PlcOptions.ListenPort` (range/uniqueness), `PlcOptions.Name` (empty/unique), `PlcOptions.Host` (empty), `PlcOptions.Port` (range), or `AdminPort` (range/collision). Those live only in `ReloadValidator`. Given M1 (ReloadValidator not run at startup), this is the *only* validator on the startup path, so:
- `ListenPort = 0` (the C# default when the key is omitted, see `PlcOptions.cs:6`) passes schema validation. The supervisor then tries to bind port 0, which the OS interprets as "pick an ephemeral port" — the listener binds a random port and clients can never reach it. No error is raised anywhere.
- An empty `Host` produces a `SocketException`/`ArgumentException` only at first backend connect, surfacing as a recoverable runtime fault rather than a config rejection.


`docs/Operations/Configuration.md:130-135` documents `Name`, `ListenPort`, `Host` as **required** — but nothing enforces "required" at bind time.
|
||||||
|
|
||||||
|
**Impact:** A PLC block missing `ListenPort` or `Host` starts a useless listener with no diagnostic. Combined with M1, the only realistic guard (`ReloadValidator`) is never consulted at startup.
|
||||||
|
|
||||||
|
**Recommendation:** Add `ListenPort ∈ [1,65535]`, non-empty `Name`, non-empty `Host`, and `Port ∈ [1,65535]` checks to `MbproxyOptionsValidator` (it already iterates `options.Plcs`), or — preferably — resolve M1 so `ReloadValidator` runs at startup and consider whether the two validators should be merged to remove the consistency burden entirely.
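
The checks listed in this recommendation can be sketched as follows. This is an illustration only, written in JavaScript to match the other snippets in these reviews; the real validator is C#, and the function names `isValidPort` and `validatePlc` and the error wording are assumptions, not code from the repository.

```js
// Hypothetical sketch of the M4 checks; names and messages are illustrative.
function isValidPort(value) {
  // A usable TCP port is an integer in [1, 65535]; 0 means "ephemeral" to the OS.
  return Number.isInteger(value) && value >= 1 && value <= 65535;
}

function validatePlc(plc, index) {
  const errors = [];
  const where = `Plcs[${index}]`;
  if (typeof plc.Name !== 'string' || plc.Name.trim() === '') {
    errors.push(`${where}.Name is required`);
  }
  if (typeof plc.Host !== 'string' || plc.Host.trim() === '') {
    errors.push(`${where}.Host is required`);
  }
  if (!isValidPort(plc.ListenPort)) {
    errors.push(`${where}.ListenPort must be in [1, 65535]`);
  }
  if (!isValidPort(plc.Port)) {
    errors.push(`${where}.Port must be in [1, 65535]`);
  }
  return errors;
}
```

With these rules an omitted `ListenPort` (bound as `0`) is rejected at validation time instead of silently binding an ephemeral port.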
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### M5 — `Resilience.BackendConnect.BackoffMs` length is never validated against `MaxAttempts`
|
||||||
|
|
||||||
|
**Files:** `src/Mbproxy/Options/ResilienceOptions.cs:19-29`; `src/Mbproxy/Options/MbproxyOptions.cs:62-144`; `src/Mbproxy/Configuration/ReloadValidator.cs`; `src/Mbproxy/Proxy/Supervision/PolicyFactory.cs:43-75`.
|
||||||
|
|
||||||
|
`docs/Operations/Configuration.md:220` states `BackoffMs` "Must have `MaxAttempts - 1` entries." Nothing enforces this. Neither `MbproxyOptionsValidator` nor `ReloadValidator` looks at `RetryProfile.MaxAttempts`, `BackoffMs`, `RecoveryProfile.InitialBackoffMs`, or `SteadyStateMs` at all. `PolicyFactory.BuildBackendConnect` (`:46`) does `Math.Max(1, MaxAttempts)` and clamps the backoff index to the last element (`:59-61`) — so it is *crash-safe*, but a config with `MaxAttempts = 5` and a single `BackoffMs` entry, or negative backoff values, silently produces behaviour that does not match the operator's intent (negative ms become `TimeSpan` that Polly may treat as zero/throw depending on version). `MaxAttempts = 0` is silently coerced to 1.
|
||||||
|
|
||||||
|
**Impact:** Low-blast-radius but a documented invariant ("must have `MaxAttempts - 1` entries") is unenforced, and a negative backoff or zero `MaxAttempts` is accepted without warning. Doc and code disagree.
|
||||||
|
|
||||||
|
**Recommendation:** Add a validation rule (in whichever validator survives M1's consolidation): `MaxAttempts >= 1`; `BackoffMs.Count >= MaxAttempts - 1`; every `BackoffMs`/`InitialBackoffMs` entry `>= 0`; `SteadyStateMs > 0`. Or, if the clamp-and-coerce behaviour is genuinely intended, relax the doc wording to match.
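
A minimal sketch of the rule set above, again in JavaScript for consistency with the other snippets rather than in the service's C#. The function name `validateResilience`, the message text, and the assumption that `InitialBackoffMs` may be either a single number or a list are all hypothetical.

```js
// Hypothetical sketch of the M5 rules; the object shapes mirror the option names
// quoted in the finding, but the function itself is not from the codebase.
function validateResilience(retry, recovery) {
  const errors = [];
  if (!Number.isInteger(retry.MaxAttempts) || retry.MaxAttempts < 1) {
    errors.push('BackendConnect.MaxAttempts must be >= 1');
  } else if (retry.BackoffMs.length < retry.MaxAttempts - 1) {
    errors.push(
      `BackendConnect.BackoffMs needs at least ${retry.MaxAttempts - 1} entries`);
  }
  if (retry.BackoffMs.some((ms) => ms < 0)) {
    errors.push('BackendConnect.BackoffMs entries must be >= 0');
  }
  // Assumption: accept a scalar or a list for InitialBackoffMs.
  if ([].concat(recovery.InitialBackoffMs).some((ms) => ms < 0)) {
    errors.push('ListenerRecovery.InitialBackoffMs must be >= 0');
  }
  if (!(recovery.SteadyStateMs > 0)) {
    errors.push('ListenerRecovery.SteadyStateMs must be > 0');
  }
  return errors;
}
```

Rejecting the config outright is a deliberate contrast with the current clamp-and-coerce behaviour: the operator sees the mismatch instead of a silently adjusted policy.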
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Minor
|
||||||
|
|
||||||
|
### N1 — `ApplyUnderLockAsync` logs `mbproxy.config.reload.applied` and bumps the success counter even when every step failed
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Configuration/ConfigReconciler.cs:335-443`.
|
||||||
|
|
||||||
|
By design, a step that throws is caught and logged and the loop continues (`:335`, `:373`, `:426`), then step 7 unconditionally sets `_currentOptions = next`, calls `RecordReloadApplied`, and logs `applied` (`:437-441`). If, say, every restart in `ToRestart` threw, the operator sees `config.reloadCount` increment and an INFO `mbproxy.config.reload.applied` line — with no signal that the apply was partial or total-failure. The `Plcs*` counts in the event are *planned* counts (`:243-246`), not *achieved* counts.
|
||||||
|
|
||||||
|
**Impact:** Status page and logs report a clean reload when reconciliation actually failed. Misleading during incident triage.
|
||||||
|
|
||||||
|
**Recommendation:** Track per-step failures (a counter or list). If any step threw, log at `Warning` with a "partial" qualifier (or emit a distinct event) and either skip `RecordReloadApplied` or expose a `reloadPartialCount`. At minimum, include achieved-vs-planned counts in the event.
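
One possible shape for the failure tracking, sketched synchronously in JavaScript (the real apply loop is asynchronous C#). The event names `mbproxy.config.reload.partial` and `mbproxy.config.reload.failed`, the `applySteps` helper, and the `log` callback signature are assumptions made for illustration.

```js
// Hypothetical sketch: run every step, remember which ones threw, and report
// achieved-vs-planned counts instead of unconditionally logging "applied".
function applySteps(steps, log) {
  const failures = [];
  for (const step of steps) {
    try {
      step.run();
    } catch (err) {
      // Keep going, as the reconciler does today, but record the failure.
      failures.push({ step: step.name, error: String(err) });
    }
  }
  const planned = steps.length;
  const achieved = planned - failures.length;
  let outcome = 'applied';
  if (failures.length > 0) {
    outcome = achieved === 0 ? 'failed' : 'partial';
  }
  const level = outcome === 'applied' ? 'info' : 'warning';
  log(level, `mbproxy.config.reload.${outcome}`, { planned, achieved, failures });
  return { outcome, planned, achieved, failures };
}
```

The caller can then bump the success counter only when `outcome` is `'applied'`, which keeps the status page honest during triage.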
|
||||||
|
|
||||||
|
### N2 — Reseat step swaps `_currentOptions` membership assumptions but uses `next.Plcs.First(...)` which throws inside the loop
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Configuration/ConfigReconciler.cs:353`.
|
||||||
|
|
||||||
|
`var plcNew = next.Plcs.First(p => p.Name == name);` — `ReloadPlan.Compute` guarantees `name` exists in `next`, so this will not throw in practice, but `First` with no match throws `InvalidOperationException`, which the surrounding `catch` (`:373`) would log as a generic "Error reseating context". A `FirstOrDefault` with an explicit null-guard, or passing the `PlcOptions` through `ReloadPlan.ToReseat` alongside the map, would make the invariant explicit and the failure mode obvious. Minor robustness/clarity.
|
||||||
|
|
||||||
|
**Recommendation:** Extend `ToReseat` tuple to `(string Name, PlcOptions New, BcdTagMap NewMap)` so the reconciler never re-resolves by name.
|
||||||
|
|
||||||
|
### N3 — `ComputeGlobalTagDelta` throws on a duplicate global address; `ReloadPlan.Compute` indexes PLCs with `ToDictionary` likewise
|
||||||
|
|
||||||
|
**Files:** `src/Mbproxy/Configuration/ConfigReconciler.cs:465-466`; `src/Mbproxy/Configuration/ReloadPlan.cs:39-40, 83-84`.
|
||||||
|
|
||||||
|
`before.Global.ToDictionary(t => t.Address)` (`:465`) throws `ArgumentException` if `BcdTags.Global` contains two entries at the same address. `BcdTagMapBuilder` detects duplicate addresses as a validation error, and `ReloadValidator` rejects such a snapshot — *but only `next`*. `ComputeGlobalTagDelta` is called with `_currentOptions.BcdTags` as `before` (`:249`); `_currentOptions` was validated when it became current, so this is safe today. It is, however, a latent fragility: any future path that sets `_currentOptions` without full validation, or a `Global` list with duplicate addresses that `BcdTagMapBuilder` happens not to flag, turns a cosmetic delta computation into an unhandled exception inside `ApplyUnderLockAsync` (caught only by the debounce-loop catch-all, aborting the whole apply). Same applies to `ReloadPlan.Compute`'s `ToDictionary(p => p.Name)` — safe only because `next` was validated first.
|
||||||
|
|
||||||
|
**Recommendation:** Use `ToDictionary` with a last-wins lambda or `DistinctBy`, or wrap `ComputeGlobalTagDelta` defensively. It is a pure cosmetic counter — it should never be able to abort a reload.
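
The last-wins indexing suggested above might look like this. It is a JavaScript illustration of the idea, not the C# implementation; the tag shape (`address`, `width`) and the returned counter names are assumptions.

```js
// Hypothetical sketch: index tags by address so a duplicate can never throw.
function indexByAddress(tags) {
  const byAddress = new Map();
  for (const tag of tags) {
    byAddress.set(tag.address, tag); // last entry wins on a duplicate address
  }
  return byAddress;
}

// Cosmetic delta between two tag lists; cannot abort on malformed input.
function computeGlobalTagDelta(before, after) {
  const oldTags = indexByAddress(before);
  const newTags = indexByAddress(after);
  let added = 0;
  let removed = 0;
  let changed = 0;
  for (const [address, tag] of newTags) {
    if (!oldTags.has(address)) {
      added += 1;
    } else if (oldTags.get(address).width !== tag.width) {
      changed += 1;
    }
  }
  for (const address of oldTags.keys()) {
    if (!newTags.has(address)) removed += 1;
  }
  return { added, removed, changed };
}
```

In C# the same effect can be had by assigning through the dictionary indexer in a loop rather than calling `ToDictionary`, which throws on a duplicate key.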
|
||||||
|
|
||||||
|
### N4 — `EventLogBridge` caches `_sourceExists` at construction; a post-install source registration never takes effect, but the doc oversells "no per-event registry traffic"
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Diagnostics/EventLogBridge.cs:62-80`.
|
||||||
|
|
||||||
|
Behaviour is correct and intentional (documented at `Configuration.md:100`). One nuance: `Emit` still wraps `EventLog.WriteEntry` in a `try/catch` (`:101-109`) that swallows everything — so if the source is deleted *after* startup, every Error+ event silently no-ops with zero diagnostics. That is acceptable for a logging sink (must never recurse/crash), but there is no `SelfLog.WriteLine` breadcrumb as `SyslogBridge` has (`SyslogBridge.cs:46`). For symmetry and debuggability, emit one `SelfLog` line the first time a write fails.
|
||||||
|
|
||||||
|
**Recommendation:** Add a one-shot `SelfLog.WriteLine` on first write failure in `EventLogBridge.Emit` to match `SyslogBridge`'s degradation breadcrumb.
|
||||||
|
|
||||||
|
### N5 — `ConfigReconciler.Dispose` blocks up to 2 s synchronously on `_debounceLoop.Wait`
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Configuration/ConfigReconciler.cs:485-501`.
|
||||||
|
|
||||||
|
`Dispose` cancels `_disposalCts` and then calls `_debounceLoop.Wait(TimeSpan.FromSeconds(2))`. Since `ConfigReconciler` is a DI singleton, this runs on the host's shutdown path (`ServiceProvider.Dispose`). If an apply is mid-flight when shutdown begins, the loop will not observe cancellation until the current `ApplyAsync` returns. The apply's per-step CTS are linked to `ct` = `_disposalCts.Token`, so they *do* cancel, and 2 s is normally ample. But a hung supervisor `StopAsync` inside an apply could make `Dispose` block for the full 2 s and then *abandon* a still-running loop thread. Acceptable, but worth a comment that the 2 s is a hard cap and the loop may be abandoned; also consider implementing `IAsyncDisposable` so the host can await it cleanly rather than block a thread.
|
||||||
|
|
||||||
|
**Recommendation:** Document the 2 s cap and abandonment behaviour, or convert to `IAsyncDisposable`.
|
||||||
|
|
||||||
|
### N6 — Doc drift: `Configuration.md:284` claims startup rejection, and the validator-ordering claims in HotReload.md are slightly off
|
||||||
|
|
||||||
|
**Files:** `docs/Operations/Configuration.md:284, 314-318`; `docs/Features/HotReload.md:81, 95`.
|
||||||
|
|
||||||
|
Beyond M1's headline doc bug: `HotReload.md:81` asserts "There is no second validator that could disagree with the first" and `:95` says "The two paths overlap deliberately so both startup and reload reject the same malformed input with the same error wording." In reality there *are* two validators (`MbproxyOptionsValidator` and `ReloadValidator`) with overlapping-but-not-identical rules and different wording (e.g. `MbproxyOptionsValidator` says "exceeds the 60_000 ms safety cap" while `ReloadValidator` says "exceeds 60_000 ms without Cache.AllowLongTtl=true"). The "same error wording" claim is inaccurate. `Configuration.md:298` rule 7 ("Width … enforced by `MbproxyOptionsValidator`") correctly notes Width is *not* in `ReloadValidator` — which means a hot reload that somehow bound a bad Width would not be caught by `ReloadValidator` (it relies on `BcdTagMapBuilder` to reject it; verify that builder does).
|
||||||
|
|
||||||
|
**Recommendation:** Once M1/M4 are resolved (ideally by consolidating to one validator), rewrite these doc passages to describe the actual single gate. If two validators remain, drop the "same error wording" claim.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Items checked and found correct (no finding)
|
||||||
|
|
||||||
|
- **Debounce loop** (`ConfigReconciler.cs:165-218`): the linked-CTS-per-iteration pattern is correct; `debounceCts` is `using`-scoped per loop iteration so no CTS leak; the `OperationCanceledException when (!ct.IsCancellationRequested)` filter correctly distinguishes window-elapsed from disposal.
|
||||||
|
- **`_applySemaphore` serialization** (`:150-161`): correct `WaitAsync`/`try`/`finally`/`Release`; disposed once in `Dispose`.
|
||||||
|
- **Channel** (`:73-77`): bounded capacity 1 with `DropOldest` is the right choice for a coalesced "something changed" signal; no unbounded growth.
|
||||||
|
- **`OnChange` registration** (`:105-111`): non-blocking `TryWrite`, registration disposed in `Dispose` (`:487`).
|
||||||
|
- **`ServiceCounters`** (`ServiceCounters.cs`): all reads/writes via `Interlocked`; the ticks-as-long pattern for `LastReloadUtc` is sound; `CompareExchange(ref x, 0, 0)` as a volatile read is correct.
|
||||||
|
- **`DiagnosticSinkSelector.Select`** (`DiagnosticSinkSelector.cs:51-59`): pure, correct precedence (Windows wins, `isSystemd` only consulted off-Windows); the `OperatingSystem.IsWindows()` re-guard at `HostingExtensions.cs:101` correctly satisfies the platform analyzer for `[SupportedOSPlatform("windows")]` `EventLogBridge`.
|
||||||
|
- **Graceful shutdown wiring** (`ProxyWorker.cs:288-351`): `GracefulShutdownTimeoutMs` read live from `CurrentValue`; in-flight snapshot taken before `base.StopAsync`; admin stopped last; supervisors disposed last. Ordering matches the documented contract.
|
||||||
|
- **`ReloadPlan.Compute`** (`ReloadPlan.cs`): identity keyed on `Name`, `TagMapsEqual` includes `CacheTtlMs`, reseat-vs-restart split is correct.
|
||||||
|
- **Reseat counter preservation** (`ConfigReconciler.cs:359`): `supervisor.CurrentCounters` passed into the new context — correct, matches `HotReload.md:117`.
|
||||||
@@ -0,0 +1,160 @@
|
|||||||
|
# Frontend / Live Web Dashboard — Code Review
|
||||||
|
|
||||||
|
Scope: commit `0308490` ("close out the dashboard code-review minor findings"). Files reviewed under `src/Mbproxy/Admin/wwwroot/`: `index.html`, `plc.html`, `dashboard.js`, `detail.js`, `util.js`, `theme.css`, `dashboard.css`, `detail.css`. Vendored assets out of scope. Cross-checked against `docs/Operations/StatusPage.md`, `src/Mbproxy/Admin/StatusDto.cs`, `src/Mbproxy/Admin/DebugDto.cs`, `src/Mbproxy/Admin/StatusBroadcaster.cs`, and `src/Mbproxy/Admin/StatusHub.cs`. Prior findings from `codereviews/2026-05-15/Frontend.md` were verified: the XSS on `detail.js` (C1 in that review) and the `shortTime`/`escapeHtml` gap are confirmed fixed. The cold-start retry loop, URL-decode guard, NaN guards, accessibility attributes, and `util.js` extraction are all verified resolved. This review covers the current code fresh.
|
||||||
|
|
||||||
|
**Finding counts: 1 Critical, 1 Major, 4 Minor.**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Critical Findings
|
||||||
|
|
||||||
|
### C1 — `dashboard.js` `stateChip` renders `listener.state` into `innerHTML` without escaping
|
||||||
|
`src/Mbproxy/Admin/wwwroot/dashboard.js:48`
|
||||||
|
|
||||||
|
**What is wrong.** `stateChip(state)` in `dashboard.js` returns:
|
||||||
|
```js
return `<span class="chip ${cls}">${state}</span>`; // line 48
```
|
||||||
|
The `state` value is a server-supplied string from `plc.listener.state`, interpolated raw into an `innerHTML`-bound template string with no `escapeHtml` call. It flows into `innerHTML` at line 166:
|
||||||
|
```js
<td>${stateChip(plc.listener.state)}</td>
```
|
||||||
|
|
||||||
|
**Inconsistency with `detail.js`.** The identical `stateChip` function in `detail.js` was fixed by the prior review and now correctly uses `escapeHtml(state)` at line 76:
|
||||||
|
```js
return `<span class="chip ${cls}">${escapeHtml(state)}</span>`;
```
|
||||||
|
The fix was applied only to `detail.js` and was not propagated back to `dashboard.js`.
|
||||||
|
|
||||||
|
**Impact.** `listener.state` is a `string` on the SignalR wire. Its values in the current implementation are the closed set `"bound"` / `"recovering"` / `"stopped"`, produced by the supervisor state machine — so the immediate exploitability is limited. However, the frontend must not rely on server-side type discipline for its own XSS safety (same rationale as the prior C1 finding). Under the same threat model as the prior review — no authentication on the admin port, PLC names and other state fields originate from operator-editable `appsettings.json` — any path that injects an unexpected string value into `listener.state` (serialization change, a future hot-reload edge case, a compromised service) would execute arbitrary script in every open fleet-dashboard tab. The fleet dashboard is more exposed than the detail page because it is the persistent long-lived view.
|
||||||
|
|
||||||
|
**Fix.** One-line change: replace `${state}` with `${escapeHtml(state)}` at `dashboard.js:48`. `escapeHtml` is already imported from `window.mbproxyUtil` at line 18.
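
For reference, a self-contained sketch of the escaped form. The `escapeHtml` below is a stand-in written for this example (the review only states that `util.js` replaces `&` before `<`/`>`), and the state-to-class mapping is hypothetical; only the `escapeHtml(state)` call in the returned template mirrors the actual fix.

```js
// Stand-in for window.mbproxyUtil.escapeHtml; util.js itself is not reproduced here.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function stateChip(state) {
  // Hypothetical class names; the real mapping lives in dashboard.js.
  const known = { bound: 'ok', recovering: 'warn', stopped: 'bad' };
  const cls = Object.prototype.hasOwnProperty.call(known, state)
    ? known[state]
    : 'unknown';
  // The text slot is escaped, so an unexpected server string renders as text.
  return `<span class="chip ${cls}">${escapeHtml(state)}</span>`;
}
```

Because `cls` is chosen from a fixed table, the attribute slot never carries server data; only the text slot needs escaping.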
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Major Findings
|
||||||
|
|
||||||
|
### M1 — `onreconnected` silently swallows subscribe failure; `connected` pill is misleading when the subscribe invoke is rejected
|
||||||
|
`src/Mbproxy/Admin/wwwroot/dashboard.js:254`, `src/Mbproxy/Admin/wwwroot/detail.js:285`
|
||||||
|
|
||||||
|
**What is wrong.** Both pages' `onreconnected` callbacks fire after `withAutomaticReconnect` restores the transport. They call the hub method (`SubscribeFleet` / `SubscribePlc`) and swallow any error with `.catch(() => {})`:
|
||||||
|
|
||||||
|
```js
// dashboard.js:252-255
connection.onreconnected(() => {
  setConn('connected');
  connection.invoke('SubscribeFleet').catch(() => {});
});
```
|
||||||
|
|
||||||
|
If the hub method throws (hub overloaded, exception in `OnConnectedAsync`, network jitter between the reconnect and the invoke), the connection is live but the subscription is silently dropped. The pill shows `connected` (green) while the page receives zero further updates. There is no retry, no error indication to the operator, and no watchdog that would notice the dead feed. The cold-start path (`connect()`) retries with backoff and guards against re-entering on a live socket; the warm-reconnect path has no equivalent safety net.
|
||||||
|
|
||||||
|
**Impact.** On a typical LAN deployment with stable infrastructure this will be rare, but it is the kind of silent failure that is hard to diagnose. An operator watching a page that appears connected but shows stale counter values has no indication anything is wrong. The detail page additionally does not re-arm `armSnapshotWatchdog()` on the reconnect path, so the 6-second "no data" notice will not fire either if the subscribe silently fails.
|
||||||
|
|
||||||
|
**Fix.** Add a minimal retry for the `onreconnected` subscribe. The simplest form: invoke, and on rejection set the pill to `'disconnected'`/`'retrying'` and call `connect()` (the cold-start function already guards against re-starting a non-Disconnected connection). Alternatively, factor the subscribe into a shared helper used by both the cold-start and warm-reconnect paths, so the retry discipline is written once.
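
A minimal sketch of the shared-helper variant, assuming the page's existing `setConn` and `connect` functions are passed in. `makeSubscriber` is a hypothetical name; `connection.invoke` and `connection.onreconnected` are the SignalR client calls already used by both pages.

```js
// Hypothetical helper: one subscribe routine for cold-start and warm-reconnect.
// The pill only turns green after the hub has accepted the subscription.
function makeSubscriber(connection, method, setConn, connect) {
  return async function subscribe() {
    try {
      await connection.invoke(method);
      setConn('connected');
      return true;
    } catch (err) {
      // Surface the dead feed and hand over to the cold-start retry loop,
      // which (per the review) guards against restarting a live connection.
      setConn('retrying');
      connect();
      return false;
    }
  };
}

// Usage sketch (not executed here):
// connection.onreconnected(makeSubscriber(connection, 'SubscribeFleet', setConn, connect));
```

Setting the pill inside the `try` rather than before the invoke means a rejected subscribe can never leave a green pill over a silent page.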
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Minor Findings
|
||||||
|
|
||||||
|
### N1 — `detail.js` does not destructure `escapeAttr` from `window.mbproxyUtil`; the attribute-escaping helper is silently unavailable
|
||||||
|
`src/Mbproxy/Admin/wwwroot/detail.js:23`
|
||||||
|
|
||||||
|
**What is wrong.** `detail.js` imports only `escapeHtml`:
|
||||||
|
```js
const { escapeHtml } = window.mbproxyUtil; // line 23 — escapeAttr not imported
```
|
||||||
|
`escapeAttr` is defined in `util.js` (line 15) and is available on `window.mbproxyUtil`, but `detail.js` never references it. All current attribute-context class interpolations in `detail.js` (the `cls` slots at lines 65, 76; `dirCls` at line 222; `stale.trim()` at line 224) happen to be computed from literal strings, so there is no current XSS. The risk is latent: any future change that passes a server-derived value into a class or other attribute slot in `detail.js` will produce an unescaped-attribute injection with no indication at the call site that the escaping tool is absent. The inconsistency was directly implicated in the prior review's C1 finding (the fix added `escapeHtml` to `detail.js` but did not import `escapeAttr`).
|
||||||
|
|
||||||
|
**Recommendation.** Change line 23 to:
|
||||||
|
```js
const { escapeHtml, escapeAttr } = window.mbproxyUtil;
```
|
||||||
|
No other changes needed — the helper just needs to be in scope so future reviewers and the linter can verify attribute contexts are correctly escaped.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### N2 — `prevPdu` and `rateByName` Maps are never pruned; stale entries accumulate across hot-reload PLC removal
|
||||||
|
`src/Mbproxy/Admin/wwwroot/dashboard.js:11–12, 66–78`
|
||||||
|
|
||||||
|
**What is wrong.** `updateRates(snapshot)` (lines 66–78) only writes entries for PLCs present in the current snapshot; it never deletes entries for PLCs removed by a hot-reload. Over a long session with PLC churn, `prevPdu` and `rateByName` grow without bound. Concretely: `rateByName.size` (line 102) is used to decide whether the fleet PDU/s card shows a rate or `—`; after a PLC is removed its stale entry keeps `rateByName.size > 0`, so the card shows `0` instead of the arguably more correct `—` for a fleet that has shrunk to zero active PLCs. `renderAggregates` already iterates only `s.plcs` (line 91) so the stale rate values are never summed — this is correct — but the Map still grows.
|
||||||
|
|
||||||
|
**Impact.** Negligible in the current 54-PLC deployment with low PLC churn rate. Becomes visible if PLCs are repeatedly added and removed via hot-reload over a long-lived browser session. The `rateByName.size` false-positive is the most user-visible symptom.
|
||||||
|
|
||||||
|
**Recommendation.** At the end of `updateRates`, prune both Maps to the current snapshot's PLC name set:
|
||||||
|
```js
const currentNames = new Set(snapshot.plcs.map(p => p.name));
for (const k of prevPdu.keys()) { if (!currentNames.has(k)) prevPdu.delete(k); }
for (const k of rateByName.keys()) { if (!currentNames.has(k)) rateByName.delete(k); }
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### N3 — Empty `class=""` attribute emitted on non-stale debug rows; `stale.trim()` call is a no-op
|
||||||
|
`src/Mbproxy/Admin/wwwroot/detail.js:223–224`
|
||||||
|
|
||||||
|
**What is wrong.** The stale/non-stale class is computed as:
|
||||||
|
```js
const stale = (t.ageSeconds || 0) > 30 ? ' stale' : ''; // line 223
return `<tr class="${stale.trim()}"> // line 224
```
|
||||||
|
When the row is not stale, `stale` is `''`, `stale.trim()` is `''`, and the emitted markup is `<tr class="">`. The `.trim()` call is defensive against the leading space in `' stale'`, but a simpler and cleaner expression that also avoids the empty-attribute emission is:
|
||||||
|
```js
const staleCls = (t.ageSeconds || 0) > 30 ? 'stale' : '';
return `<tr${staleCls ? ` class="${staleCls}"` : ''}>
```
|
||||||
|
Or simply omit `.trim()` and use `class="${stale}"` — the leading space is harmless in CSS class matching. This is cosmetic.
|
||||||
|
|
||||||
|
**Impact.** Zero functional impact. The empty `class=""` attribute is parsed and ignored by every browser. Flagged as a code-cleanliness observation.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### N4 — `window.mbproxyUtil` is destructured at script-top-level; a util.js load failure aborts the entire page silently
|
||||||
|
`src/Mbproxy/Admin/wwwroot/dashboard.js:18`, `src/Mbproxy/Admin/wwwroot/detail.js:23`
|
||||||
|
|
||||||
|
**What is wrong.** Both scripts destructure `window.mbproxyUtil` at the top of their IIFE, before any DOM or error-handling setup:
|
||||||
|
```js
const { escapeHtml, escapeAttr } = window.mbproxyUtil; // dashboard.js:18
const { escapeHtml } = window.mbproxyUtil; // detail.js:23
```
|
||||||
|
If `util.js` fails to load (404 after a publish error, embedded-asset routing bug, script tag typo), `window.mbproxyUtil` is `undefined`, the destructuring throws a `TypeError` at the first `const`, and the remaining script — including `DOMContentLoaded`, error notices, and the SignalR connection — never runs. The page renders its static HTML indefinitely with no visible error. The HTML `<noscript>` fallback is absent.
|
||||||
|
|
||||||
|
**Impact.** Unlikely in production given the assets are embedded in the binary and verified by `EmbeddedAssetsTests`. Worth noting because the failure mode is silent to the operator: the `TypeError` appears only in the browser console, nothing is shown on the page, and the page appears partially rendered rather than broken.
|
||||||
|
|
||||||
|
**Recommendation.** Add a null-guard:
|
||||||
|
```js
if (!window.mbproxyUtil) {
  document.body.innerHTML = '<p style="padding:2rem;color:red">Admin UI failed to load — missing util.js. Check the browser console.</p>';
  throw new Error('window.mbproxyUtil not defined');
}
const { escapeHtml, escapeAttr } = window.mbproxyUtil;
```
|
||||||
|
Or equivalently move the destructuring inside `DOMContentLoaded` after a check. The `throw` intentionally aborts the script — at least with a visible error — rather than letting it limp along with broken escaping.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What Looks Good
|
||||||
|
|
||||||
|
- **All prior C1 and M-series findings from `codereviews/2026-05-15/Frontend.md` are confirmed fixed.** Specifically: `t.direction`, `t.rawHex`, `t.name`, `shortTime(c.connectedAtUtc)`, and `l.lastBindError` in `detail.js` are all now correctly wrapped in `escapeHtml`. The `shortTime` fallback now goes through `escapeHtml` at the call site (line 105). The `plcNameError` guard prevents the script from silently failing on a malformed URL. The `util.js` shared helper was introduced and is loaded before both page scripts. The `armSnapshotWatchdog` addresses the "unknown PLC sits forever on waiting" scenario. Keyboard/Enter/Space sort handlers and `aria-sort` on `<th>` elements are in place. `aria-live="polite"` on the connection pill and `aria-label` on the toolbar inputs are present. The PLC name is a real `<a>` link with `rel="noopener"` — no more `window.open`.
|
||||||
|
|
||||||
|
- **`detail.js` `stateChip` is correctly escaped.** The fix from the prior review was applied to `detail.js:76` (`escapeHtml(state)`). Only `dashboard.js` has the regression (C1 above).
|
||||||
|
|
||||||
|
- **`detail.js` `card()` `v` slot is safe through consistent caller discipline.** All values passed as the second element of card rows are either `stateChip()` output (internally escaped), `num()` output (numeric-only), `ratioText()` output (`n%` or `—`), or explicitly `escapeHtml`-wrapped. No server string lands unescaped in the `v` slot.
|
||||||
|
|
||||||
|
- **SignalR lifecycle is correct.** `onreconnected` re-invokes `SubscribeFleet`/`SubscribePlc` — essential for group re-subscription after a transport reconnect. `onclose` does not re-subscribe (correct — `withAutomaticReconnect` owns the warm path). `tabId` is stable per page-load (not per ConnectionId) so transport reconnects do not leak armed captures server-side. `gotSnapshot()` correctly clears the watchdog timer on the first data arrival.
|
||||||
|
|
||||||
|
- **Cold-start retry is well-structured.** The `connect()` function guards `connection.state === Disconnected` before calling `start()`, preventing a subscribe-failure from re-starting a live socket. Capped exponential backoff (`retryMs = Math.min(retryMs * 2, 30000)`) prevents infinite tight retries.
|
||||||
|
|
||||||
|
- **All DTO field names are correct.** `detail.js` and `dashboard.js` read `coalescedHitCount`, `cacheHitCount`, `backendHeartbeatsSent`, `backendHeartbeatsFailed`, `backendIdleDisconnects`, `disconnectCascades`, `queueDepth`, `txIdWraps`, `inFlight`, `maxInFlight`, `invalidBcdWarnings`, `exceptionsByCode.codeOther`, etc. — all matching the camelCase JSON policy applied by `StatusJsonContext`/`JsonKnownNamingPolicy.CamelCase`. No field name typos found.
|
||||||
|
|
||||||
|
- **`escapeHtml` and `escapeAttr` in `util.js` are correct.** `&` is replaced before `<`/`>` (correct order), and `escapeAttr` adds `"` on top of `escapeHtml`. Both functions are pure and side-effect-free.
|
||||||
|
|
||||||
|
- **No CDN dependencies.** All `<script>`/`<link>` tags reference `/assets/…` — consistent with the firewalled-network design intent documented in `StatusPage.md`.
|
||||||
|
|
||||||
|
- **`formatUptime` and `formatAge` both guard `Number.isFinite` and `< 0`.** NaN/negative clock skew returns `—` rather than `NaN` text.
|
||||||
|
|
||||||
|
- **`tagCell(t)` correctly coerces `t.address` with `Number()`.** The address is rendered as a pure decimal or hex numeric, never as a raw server string.
|
||||||
|
|
||||||
|
- **`debug-rows` empty-state and `colspan` counts are consistent with the HTML `<thead>`.** The debug table has 6 columns; all static empty-row `<td colspan="6">` and the `no-traffic` span layout (1+1+3+1) sum to 6. The KPI table has 10 columns; `colspan="10"` matches. No cross-file DOM-id inconsistencies found.
|
||||||
|
|
||||||
|
- **CSS is clean.** `focus-visible` outlines on sortable `<th>` elements and the PLC-name `<a>` provide keyboard focus indicators. No `!important` abuse. Design tokens are consistent across the three CSS files.
|
||||||
@@ -0,0 +1,158 @@
|
|||||||
|
# Code Review — Backend Connection Layer (Multiplexing + Cache)
|
||||||
|
|
||||||
|
**Date:** 2026-05-16
|
||||||
|
**Branch:** `mbproxy-webui-dashboard` (HEAD `0308490`)
|
||||||
|
**Scope:** `src/Mbproxy/Proxy/Multiplexing/` (PlcMultiplexer, UpstreamPipe, CorrelationMap, InFlightByKeyMap, InFlightRequest, TxIdAllocator, CoalescingKey, `*LogEvents.cs`), `src/Mbproxy/Proxy/Cache/` (ResponseCache, CacheEntry, CacheKey, CacheInvalidator, CacheLogEvents), `src/Mbproxy/Proxy/MbapFrame.cs`.
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
The connection layer is well-structured and the threading invariants documented in `docs/Architecture/ConnectionModel.md` are largely honoured by the code: single backend writer/reader, per-pipe write loop, `ConcurrentDictionary`-backed correlation map, and a claim-then-dispatch watchdog. The lock-ordering discipline (map lock → no async inside it; connect gate serialising connect vs. teardown) is sound, and the prior reviews' remediations appear intact. However, this review found one **wire-protocol correctness bug** that can deliver a stale value to a client across a write (a real data-integrity hazard for a SCADA proxy), one **resource-leak** path, and several **major** correctness gaps around partial-PDU parsing, cache-invalidation arithmetic overflow, and a watchdog/late-attach race window. None of the findings is a deadlock or crash, but the cache-vs-write ordering issue is genuine silent-wrong-value territory and should be treated as the headline item.
|
||||||
|
|
||||||
|
**Findings by severity:** Critical: 1 · Major: 6 · Minor: 7
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Critical
|
||||||
|
|
||||||
|
### C1 — Cache hit can serve a value contradicted by an in-flight write to the same register (read-after-write inversion)
|
||||||
|
|
||||||
|
**Files:** `PlcMultiplexer.cs:826-857` (cache lookup), `PlcMultiplexer.cs:674-695` (invalidation on FC06/FC16 *response*).
|
||||||
|
|
||||||
|
**What's wrong.** Cache invalidation for a write fires only in `RunBackendReaderAsync` *after the FC06/FC16 response lands from the PLC* (lines 674-695). The cache lookup on a read (lines 826-857) happens with no awareness of writes that are currently *in flight but not yet acked*. Consider this realistic sequence on one PLC, all on the same register range:
|
||||||
|
|
||||||
|
1. Client A reads `[100..110)` → cache miss → backend round-trip → response stored in cache (TTL 500 ms).
2. Client B writes register 105 (FC06) → request enqueued, travels to PLC, PLC applies it.
3. Client C reads `[100..110)` → **cache hit** → C receives the *pre-write* value.
4. The FC06 response finally arrives → invalidation runs → cache entry for `[100..110)` dropped.
|
||||||
|
|
||||||
|
Between steps 2 and 4 (one full backend round-trip, plus outbound-channel queue depth, plus ECOM scan time — easily tens of ms, far less than a 500 ms TTL) the cache continues to serve the *old* value even though the PLC has already accepted the new one. Client C is told register 105 is the old value *after* the write that changed it was already accepted upstream by the proxy. For a Modbus proxy fronting SCADA, that is a silent wrong value delivered to a client — the exact failure mode the review brief calls Critical.
|
||||||
|
|
||||||
|
The doc (`ResponseCache.md` "Write Invalidation") frames invalidation as "a successful FC06/FC16 *response* invalidates" — i.e. the design intentionally invalidates on the response, not the request. That is defensible *only* if the cache also refuses to serve a range while a write to an overlapping range is in flight. It does not: there is no in-flight-write tracking on the read path.
|
||||||
|
|
||||||
|
**Impact.** Silent stale value across a write, bounded by one backend round-trip rather than by `CacheTtlMs`. Operationally most visible when a client writes a setpoint then another client (or the same client on a different connection) immediately reads it back and sees the old value. Note coalescing does *not* have this problem — it dies on response — but the cache extends the staleness window past the write.
|
||||||
|
|
||||||
|
**Recommendation.** Invalidate (or suppress) on the **request** side, not only the response side. When an FC06/FC16 frame is parsed in `OnUpstreamFrameAsync` (the FC06/FC16 branch around lines 805-817), immediately call `responseCache.Invalidate(unitId, startAddr, qty)` *before* the write is enqueued, and additionally track the overlapping range as "write pending" so reads that arrive before the write response lands also miss. The simplest correct fix: invalidate on request enqueue *and* keep the existing response-side invalidation as a backstop. The minor cost (a write may evict an entry that the write later turns out to fail with an exception) is acceptable — a failed write just causes a cache miss and a fresh read, which is always safe; serving a stale value is not.
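
The request-side invalidation plus "write pending" tracking could be modelled roughly as below. This is a JavaScript model of the idea only: `RangeCache`, `beginWrite`, and `endWrite` are hypothetical names, TTL handling is omitted, and the real `ResponseCache` is C# with its own key types.

```js
// Hypothetical model of request-side invalidation with pending-write tracking.
class RangeCache {
  constructor() {
    this.entries = [];              // { unitId, start, qty, value }
    this.pendingWrites = new Map(); // writeId -> { unitId, start, qty }
    this.nextWriteId = 1;
  }

  static overlaps(aStart, aQty, bStart, bQty) {
    // Half-open ranges [start, start + qty) intersect.
    return aStart < bStart + bQty && bStart < aStart + aQty;
  }

  store(unitId, start, qty, value) {
    this.entries.push({ unitId, start, qty, value });
  }

  invalidate(unitId, start, qty) {
    this.entries = this.entries.filter((e) =>
      !(e.unitId === unitId && RangeCache.overlaps(e.start, e.qty, start, qty)));
  }

  // Called when the FC06/FC16 request is parsed, before it is enqueued.
  beginWrite(unitId, start, qty) {
    this.invalidate(unitId, start, qty);
    const id = this.nextWriteId++;
    this.pendingWrites.set(id, { unitId, start, qty });
    return id;
  }

  // Called when the write response (or exception/timeout) arrives: backstop.
  endWrite(id) {
    const w = this.pendingWrites.get(id);
    if (!w) return;
    this.pendingWrites.delete(id);
    this.invalidate(w.unitId, w.start, w.qty);
  }

  lookup(unitId, start, qty) {
    for (const w of this.pendingWrites.values()) {
      if (w.unitId === unitId && RangeCache.overlaps(w.start, w.qty, start, qty)) {
        return undefined; // force a miss while an overlapping write is in flight
      }
    }
    const hit = this.entries.find((e) =>
      e.unitId === unitId && e.start === start && e.qty === qty);
    return hit ? hit.value : undefined;
  }
}
```

The `endWrite` invalidation covers a read response that was already in flight ahead of the write and lands in the cache while the write is pending.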
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Major
|
||||||
|
|
||||||
|
### M1 — `EnsureBackendConnectedAsync` leaks the backend socket if `_disposed` flips during connect
|
||||||
|
|
||||||
|
**File:** `PlcMultiplexer.cs:287-358`.
|
||||||
|
|
||||||
|
**What's wrong.** A caller can enter `EnsureBackendConnectedAsync`, pass the `_disposed` check at line 289, acquire `_connectGate`, and run a multi-second Polly connect (lines 310-324). If `DisposeAsync` runs concurrently, it sets `_disposed = true`, cancels `_disposeCts`, and calls `TearDownBackendAsync` — which acquires the gate (or times out after 2 s and proceeds gate-less), and at that moment `_backendSocket` is still `null` because the connecting task has not yet reached line 341. `TearDownBackendAsync` sees `oldSocket is null && oldCts is null` and returns without closing anything. The connecting task then completes its `ConnectAsync`, takes `_backendLock`, and assigns a **fully live `Socket`** plus three running tasks (`_backendWriterTask`/`ReaderTask`/`HeartbeatTask`) into the now-disposed multiplexer. Nothing ever disposes that socket or joins those tasks; `_disposeCts` is already disposed so `CancellationTokenSource.CreateLinkedTokenSource(_disposeCts.Token)` at line 338 throws `ObjectDisposedException`, leaving `backend` connected and undisposed.
|
||||||
|
|
||||||
|
**Impact.** Leaked TCP socket to the PLC (one of the ECOM's 4 client slots held forever) plus three orphaned background tasks per occurrence. Occurs on a hot-reload PLC-remove or service stop that races a cold-start connect — not common, but it happens under exactly the churn the supervisor is designed to handle.
|
||||||
|
|
||||||
|
**Recommendation.** After a successful connect, re-check `_disposed` (and `_disposeCts.IsCancellationRequested`) *under `_backendLock`* before publishing the socket; if disposal won the race, dispose `backend` and return `false`. Wrap the `CreateLinkedTokenSource` in the same guarded block so a disposed `_disposeCts` cannot throw past the socket assignment.
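
The "re-check under the lock before publishing" rule can be illustrated with a small model. This is a sketch in Python rather than the multiplexer's C#; `BackendSlotSketch` and `try_publish` are hypothetical names, and the only assumption about the socket is that it has a `close()` method.

```python
import threading


class BackendSlotSketch:
    """Holds at most one backend socket; never accepts one after disposal."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._disposed = False
        self._socket = None

    def dispose(self) -> None:
        with self._lock:
            self._disposed = True
            sock, self._socket = self._socket, None
        if sock is not None:
            sock.close()

    def try_publish(self, sock) -> bool:
        """Called by the connecting task once its connect has succeeded."""
        with self._lock:
            if not self._disposed:
                self._socket = sock
                return True
        # Disposal won the race: the connecting task still owns the socket,
        # so it must close it instead of leaking it.
        sock.close()
        return False
```

Because the disposed flag is read and the socket is assigned inside the same critical section, disposal either sees the published socket and closes it, or the connecting task sees the flag and closes its own socket.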
|
||||||
|
|
||||||
|
### M2 — Backend reader drops a frame whose PDU body length disagrees with the PLC, with no resync
|
||||||
|
|
||||||
|
**File:** `PlcMultiplexer.cs:540-574`, `UpstreamPipe.cs:111-163`.
|
||||||
|
|
||||||
|
**What's wrong.** Both `FillAsync` loops trust the MBAP `Length` field absolutely. If the backend (or a middlebox) ever produces a frame whose `Length` field is wrong — too short — the reader consumes `length-1` body bytes, then loops and reinterprets the *next* bytes (which are really still part of the previous PDU) as a fresh 7-byte MBAP header. From that point every subsequent frame on the socket is mis-framed: `proxyTxId` is garbage, `CorrelationMap.TryRemove` misses, frames are silently dropped (line 580-584), and every real in-flight request leaks until the watchdog times it out with 0x0B. The socket never faults, so no cascade fires; the PLC effectively goes dark for `BackendRequestTimeoutMs` repeatedly. The only oversized-frame guard (line 561) catches `length-1 > 253` but not a *too small* / mis-aligned length.
|
||||||
|
|
||||||
|
**Impact.** A single corrupt length field on the wire desynchronises the backend stream indefinitely. Recovery depends entirely on the backend eventually closing the socket. Realistic on flaky industrial networks; the brief explicitly lists "middlebox drops a packet" as a covered scenario.
|
||||||
|
|
||||||
|
**Recommendation.** This is hard to fully solve without a length-validation/resync strategy. The stronger option: when `RunBackendReaderAsync` removes a correlation entry it can sanity-check the response PDU length against the request (e.g. an FC03 response body should be `2 + 2*qty`); a mismatch is strong evidence of desync and should `break` to force a teardown+cascade rather than silently fan out a wrong-sized payload. Failing that, at minimum document that the reader assumes a well-framed backend and add a desync-suspected counter when `TryRemove` misses repeatedly.
|
||||||
|
|
||||||
|
### M3 — FC03/FC04 cache hit fans out a payload whose register count is not validated against the request
|
||||||
|
|
||||||
|
**Files:** `PlcMultiplexer.cs:826-857` (`BuildCacheHitFrame`), `PlcMultiplexer.cs:1356-1370`.
|
||||||
|
|
||||||
|
**What's wrong.** `CacheKey` includes `Qty`, so a hit only occurs for an identical `(unit,fc,start,qty)` — good. But `BuildCacheHitFrame` splices `cached.PduBytes` onto a fresh MBAP header and sets `Length = 1 + pduLen` with no upper-bound check. The stored PDU came from the backend reader (line 645-646, snapshot of `pduBodyLen` bytes which *is* bounded to ≤253 at line 561), so today it cannot exceed spec. However nothing asserts the cached PDU's declared byte-count actually matches `2*qty`. If a backend ever returned a short/long FC03 body that still passed the ≤253 check, that malformed payload is now cached and replayed to *every* future hit for the lifetime of the TTL — the cache amplifies one bad backend response into many bad client responses. Coalescing has the same single-response blast radius but only for the concurrently-attached parties; the cache widens it across time.
|
||||||
|
|
||||||
|
**Impact.** Lower likelihood than C1/M2 but the same class of harm — a wrong-shaped frame delivered to clients, amplified by caching.
|
||||||
|
|
||||||
|
**Recommendation.** When storing an FC03/FC04 response (line 641-667), validate the PDU body: `frame[HeaderSize+1]` (byte count) must equal `2*qty` and `pduBodyLen` must equal `2 + 2*qty`. Skip the cache store (and ideally skip the fan-out / force teardown) on mismatch. This also strengthens M2's resync story.
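
The shape check described above is small enough to sketch. The following Python function is illustrative only (the function name is hypothetical, and the real check would live in the C# backend reader); it applies the two conditions from the recommendation plus an MBAP `Length` consistency check, and caps `qty` at the Modbus read limit of 125 registers.

```python
HEADER_SIZE = 7  # MBAP: TxId(2) + ProtocolId(2) + Length(2) + UnitId(1)


def is_well_formed_read_response(frame: bytes, fc: int, qty: int) -> bool:
    """True if `frame` is a normal FC03/FC04 response to a `qty`-register read."""
    if fc not in (0x03, 0x04) or not 1 <= qty <= 125:
        return False
    if len(frame) < HEADER_SIZE + 2:
        return False
    pdu = frame[HEADER_SIZE:]
    declared_length = int.from_bytes(frame[4:6], "big")  # counts UnitId + PDU
    return (
        declared_length == 1 + len(pdu)   # header agrees with what was read
        and len(pdu) == 2 + 2 * qty       # FC + byte count + register data
        and pdu[0] == fc                  # not an exception (fc | 0x80) or other FC
        and pdu[1] == 2 * qty             # declared byte count matches the request
    )
```

A caller would skip the cache store when this returns `False`; whether it also skips the fan-out or forces a teardown is the policy decision the recommendation leaves open.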
|
||||||
|
|
||||||
|
### M4 — `CacheInvalidator.FindOverlapping` and the FC16 parse can integer-overflow the half-open upper bound
|
||||||
|
|
||||||
|
**Files:** `CacheInvalidator.cs:38,46`, `PlcMultiplexer.cs:812-817`.
|
||||||
|
|
||||||
|
**What's wrong.** `writeEnd = writeStart + writeQty` and `keyEnd = key.StartAddress + key.Qty` are computed as `int` from two `ushort`s — that part is fine (max 0x1FFFE, no overflow). But the FC16 parse at `PlcMultiplexer.cs:815-816` reads `qty` straight from the wire with no clamp: a malicious or buggy client can send FC16 with `qty = 0xFFFF`. That `qty` is then used both as the `InFlightRequest.Qty` and, on the FC06/FC16 response, passed to `postCache.Invalidate(unitId, startAddress, qty)`. The proxy never validates that `qty` is within the DL260's FC16 cap of 100 (per `dl205.md` / CLAUDE.md) — the comment says "the PLC enforces the cap with exception 03", which is true for the *write itself*, but the **invalidation still runs with the bogus 65535-register span** because invalidation is gated only on `isException == false`. If the PLC accepts a smaller write and returns success, an attacker-supplied wide `qty` cannot reach here — but an FC16 request whose declared `qty` does not match its byte-count payload, accepted by a lax backend, would invalidate a 64K-register span and flush most of the cache. More concretely: the FC16 *request* `qty` and the *response* `qty` are assumed identical (the response echoes start+qty), but the code uses the **request-side** `inFlight.Qty` for invalidation (line 687) — so a request that lies about `qty` drives invalidation regardless of what the PLC actually wrote.
|
||||||
|
|
||||||
|
**Impact.** Cache-flush amplification / mild DoS on the cache from a malformed FC16. Not memory-unsafe (the `int` math does not overflow), but a contract violation: invalidation should reflect what was *written*, and it currently reflects what the client *claimed*.
|
||||||
|
|
||||||
|
**Recommendation.** Validate FC16 framing in `OnUpstreamFrameAsync`: require `qty` in `1..123` (or the DL260 cap of 100) and require `frame.Length >= pduOffset + 6 + 2*qty` and `byteCount == 2*qty`; reject (exception 03) otherwise rather than forwarding. Also prefer driving invalidation from the FC16 *response* PDU's echoed start/qty rather than the request's.
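
A minimal sketch of that FC16 framing check, written in Python for brevity rather than in the proxy's C#. The function names are hypothetical; `max_qty` defaults to the Modbus limit of 123 and can be lowered to the DL260 cap of 100.

```python
import struct

PDU_OFFSET = 7             # bytes of MBAP header before the function code
FC16 = 0x10
ILLEGAL_DATA_VALUE = 0x03  # Modbus exception code 03


def parse_fc16_request(frame: bytes, max_qty: int = 123):
    """Return (start, qty) for a self-consistent FC16 request, else None."""
    if len(frame) < PDU_OFFSET + 6 or frame[PDU_OFFSET] != FC16:
        return None
    start, qty, byte_count = struct.unpack_from(">HHB", frame, PDU_OFFSET + 1)
    if not 1 <= qty <= max_qty:
        return None
    if byte_count != 2 * qty or len(frame) < PDU_OFFSET + 6 + 2 * qty:
        return None
    return start, qty


def exception_response(request: bytes, code: int) -> bytes:
    """Build an exception frame that echoes the request's TxId and UnitId."""
    tx_id, unit_id, fc = request[0:2], request[6], request[PDU_OFFSET]
    # Length = 3: UnitId + (fc | 0x80) + exception code
    return tx_id + b"\x00\x00" + (3).to_bytes(2, "big") + bytes([unit_id, fc | 0x80, code])
```

With this in place, a request that returns `None` would be answered with `exception_response(frame, ILLEGAL_DATA_VALUE)` and never reach the backend, so its claimed `qty` can no longer drive invalidation.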
|
||||||
|
|
||||||
|
### M5 — Watchdog can miss a late-attached coalescing party, hanging that upstream until *its* socket times out
|
||||||
|
|
||||||
|
**Files:** `PlcMultiplexer.cs:1126-1161`, `InFlightByKeyMap.cs:59-83`.
|
||||||
|
|
||||||
|
**What's wrong.** The watchdog snapshots stale entries from `CorrelationMap` (line 1123), then for each one does `TryRemove` from `CorrelationMap` (line 1131) and only *afterwards* removes the `CoalescingKey` from `InFlightByKeyMap` (line 1160). Between the `CorrelationMap.TryRemove` and the `_inFlightByKey.TryRemove` there is a window where a new FC03/FC04 request can call `AttachOrCreate`, find the still-present coalescing entry, and append itself to `req.InterestedParties`. The watchdog then walks `req.InterestedParties` at line 1165 — but it captured `req` from the `CorrelationMap.TryRemove` out-param, and `InterestedParties` is the *same mutable `List<InterestedParty>`*, so a party appended after the `foreach` started its enumeration would throw `InvalidOperationException` ("collection modified") — caught by the outer `catch (Exception)` at line 1195, which logs and **exits the watchdog loop entirely**. A party appended *before* the `foreach` but after the `CorrelationMap.TryRemove` is delivered the 0x0B (fine), but a party that attached and whose request was *never enqueued to the backend* (the coalescing entry is now orphaned — no proxy TxId of its own) gets nothing if the enumeration already passed it.
|
||||||
|
|
||||||
|
The doc (`ReadCoalescing.md` "Multi-writer multi-reader safety") claims the watchdog "removes the coalescing key before it walks `req.InterestedParties`". The code matches that claim: `_inFlightByKey.TryRemove` (lines 1157-1161) runs before the `foreach` at line 1165, although only for `req.Fc is 0x03 or 0x04`, since a non-coalescing FC has no key. The residual hole is the window between `CorrelationMap.TryRemove` (1131) and `_inFlightByKey.TryRemove` (1160): a late attach in that window appends to the list, so the `foreach` at 1165 enumerates a list that *was mutated after `TryRemove`*. If the append lands during the `foreach`, `InvalidOperationException` kills the watchdog (caught at 1195 → loop exits → **no more timeouts ever fire for this PLC**).
|
||||||
|
|
||||||
|
**Impact.** A rare race can permanently disable the timeout watchdog for one PLC, after which any lost backend response leaks its correlation entry forever and hangs upstream clients indefinitely — defeating the entire reason the watchdog exists.
|
||||||
|
|
||||||
|
**Recommendation.** Remove the `CoalescingKey` from `_inFlightByKey` *before* removing from `CorrelationMap`. Note that the backend reader has the same ordering as the watchdog (`CorrelationMap.TryRemove`, then `_inFlightByKey.TryRemove`; see N2), so there is no existing ordering to mirror. The robust fix: in `InFlightByKeyMap`, hand fan-out callers a *frozen copy* of the party list at removal time, or snapshot `req.InterestedParties.ToArray()` immediately after the claim and iterate the copy. Apply the same snapshot in `RunBackendReaderAsync`'s fan-out loop (line 710) for symmetry.
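
The "frozen copy at removal time" idea can be sketched as follows. This is a Python model with hypothetical names, not the C# `InFlightByKeyMap`; it also adds a sealing step (late attaches are refused after the claim) that goes beyond the recommendation and is an assumption of the sketch. Python lists do not throw on a concurrent append the way `List<T>` enumeration does, so the sketch shows only the ownership rule, not the original failure.

```python
import threading


class InFlightRequestSketch:
    """Party list that can be claimed exactly once for fan-out."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._parties = []
        self._claimed = False

    def try_attach(self, party) -> bool:
        with self._lock:
            if self._claimed:
                # Too late to coalesce: the caller should issue its own backend request.
                return False
            self._parties.append(party)
            return True

    def claim_parties(self) -> tuple:
        """Seal the request and return an immutable snapshot to iterate."""
        with self._lock:
            self._claimed = True
            return tuple(self._parties)
```

The watchdog and the backend reader would both iterate the tuple returned by `claim_parties()`, so nothing that happens after the claim can change what they enumerate.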
|
||||||
|
|
||||||
|
### M6 — `RunBackendReaderAsync` cascade-on-EOF/fault is fire-and-forget; an exception inside `TearDownBackendAsync` is unobserved
|
||||||
|
|
||||||
|
**Files:** `PlcMultiplexer.cs:753-766`, `538-537`, `RunBackendHeartbeat/Watchdog` similar.
|
||||||
|
|
||||||
|
**What's wrong.** Reader EOF (line 756), reader fault (765), and writer fault (536) all start `TearDownBackendAsync(...)` with `_ = Task.Run`-style fire-and-forget (just `_ = TearDownBackendAsync(...)`). `TearDownBackendAsync` is `async Task`; if it throws *synchronously before its first await* — or after, on a path the `try/finally` doesn't cover — the exception lands on an unobserved `Task`. The code guards the common case (`if (!_disposeCts.IsCancellationRequested)`) to avoid the disposed-`_connectGate` race, but `_connectGate.WaitAsync` can still throw `ObjectDisposedException` if disposal flips *between* the check and the call — that path is handled inside `TearDownBackendAsync` (line 398-403, returns). The residual risk is narrower than M1 but the pattern (unobserved fire-and-forget `async Task` from three call sites) is fragile: any future exception added to `TearDownBackendAsync`'s pre-await section becomes a process-level `UnobservedTaskException`.
|
||||||
|
|
||||||
|
**Impact.** Latent; today probably benign. Worth hardening because all three of the layer's failure-detection paths funnel through it.
|
||||||
|
|
||||||
|
**Recommendation.** Wrap the fire-and-forget in a helper that attaches a continuation logging any fault: `_ = TearDownBackendAsync(...).ContinueWith(t => _logger.LogError(t.Exception, ...), TaskContinuationOptions.OnlyOnFaulted)`. Or make the call sites `await` it where the calling context allows (the reader/writer tasks can `await` since they are about to exit anyway).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Minor
|
||||||
|
|
||||||
|
### N1 — `UpstreamPipe.RunReadLoopAsync` permits a 7-byte zero-body frame but `OnUpstreamFrameAsync` cannot route it
|
||||||
|
|
||||||
|
**Files:** `UpstreamPipe.cs:133-141`, `PlcMultiplexer.cs:775-797`.
|
||||||
|
|
||||||
|
A `Length < 1` frame is forwarded as a header-only `byte[7]` (line 136-138). `OnUpstreamFrameAsync` then reads `fcByte = frame.Length > pduOffset ? ... : 0` → `fcByte = 0`, falls through every FC branch, allocates a proxy TxId, and forwards a 7-byte frame to the backend. A 7-byte MBAP frame with `Length=0` is itself malformed Modbus (`Length` must be ≥2: UnitId+FC). The proxy should reject it rather than allocate a TxId and forward garbage. Recommend: in the read loop, treat `length < 2` as a protocol error and close the pipe (consistent with the oversized-frame handling).
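
A sketch of the header check this implies, in Python with hypothetical names. It rejects `Length < 2` alongside the existing oversized-frame bound; the protocol-id check is an extra assumption of the sketch, not something the review asks for.

```python
import struct

HEADER_SIZE = 7   # TxId(2) + ProtocolId(2) + Length(2) + UnitId(1)
MAX_PDU = 253     # largest Modbus PDU


class FramingError(Exception):
    """The MBAP header cannot describe a valid Modbus frame."""


def pdu_length(header: bytes) -> int:
    """Return how many PDU bytes follow the 7-byte header, or raise FramingError."""
    if len(header) != HEADER_SIZE:
        raise FramingError("short header")
    _tx_id, protocol_id, length, _unit_id = struct.unpack(">HHHB", header)
    if protocol_id != 0:  # assumption: only Modbus (protocol id 0) is accepted
        raise FramingError("unexpected protocol id")
    if length < 2:        # Length counts UnitId + at least the function code
        raise FramingError("length too small")
    if length - 1 > MAX_PDU:
        raise FramingError("length too large")
    return length - 1
```

The read loop would close the pipe on `FramingError`, which keeps the undersized and oversized cases on one code path.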
|
||||||
|
|
||||||
|
### N2 — Backend reader's `_inFlightByKey.TryRemove` runs *after* `CorrelationMap.TryRemove` but the doc invariant wants it before fan-out only
|
||||||
|
|
||||||
|
**File:** `PlcMultiplexer.cs:580-607`.
|
||||||
|
|
||||||
|
The ordering is correct for the stated invariant (key removed at 602-607, fan-out at 710), but the `CoalescingKey` is reconstructed from `inFlight.UnitId/Fc/StartAddress/Qty` (line 604-605) — if `inFlight` were ever a heartbeat with `Fc=0x03` this would spuriously probe the map; it is guarded by the `IsHeartbeat` early-`continue` at 595-596, so this is safe today. Worth a one-line comment that the FC03/FC04 branch is heartbeat-free by construction, so a future refactor doesn't reorder the `IsHeartbeat` check below it.
|
||||||
|
|
||||||
|
### N3 — `TxIdAllocator.Release` silently no-ops a double-release, masking the documented `TearDownBackendAsync` race
|
||||||
|
|
||||||
|
**Files:** `TxIdAllocator.cs:119-129`, `PlcMultiplexer.cs:377-385`.
|
||||||
|
|
||||||
|
The "KNOWN RACE" comment block at `PlcMultiplexer.cs:377-385` describes a double-release that frees a legitimately in-flight slot. `Release` cannot detect this — it just checks `_inUse[id]`. Consider having the allocator track a generation/epoch per slot, or at minimum increment a `doubleReleaseSuspected` counter when `Release` is called on an already-free slot, so the (admittedly very rare) silent-request-drop the comment accepts becomes observable in production rather than invisible.
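
One way the generation-per-slot idea could look, as a Python sketch. The class and member names are hypothetical and this is not the existing `TxIdAllocator`; the point is that a release carrying a stale generation is counted and refused instead of freeing a slot that has since been reallocated.

```python
class TxIdAllocatorSketch:
    def __init__(self, slot_count: int = 0x10000) -> None:
        self._slot_count = slot_count
        self._in_use = [False] * slot_count
        self._generation = [0] * slot_count
        self._cursor = 0
        self._in_flight = 0
        self.double_release_suspected = 0

    def allocate(self):
        """Return (tx_id, generation), or None when every slot is in flight."""
        if self._in_flight >= self._slot_count:
            return None
        while self._in_use[self._cursor]:
            self._cursor = (self._cursor + 1) % self._slot_count
        tx_id = self._cursor
        self._in_use[tx_id] = True
        self._generation[tx_id] += 1
        self._in_flight += 1
        self._cursor = (tx_id + 1) % self._slot_count
        return tx_id, self._generation[tx_id]

    def release(self, tx_id: int, generation: int) -> bool:
        """Free the slot only if the caller holds its current generation."""
        if not self._in_use[tx_id] or self._generation[tx_id] != generation:
            self.double_release_suspected += 1
            return False
        self._in_use[tx_id] = False
        self._in_flight -= 1
        return True
```

If per-slot generations are too heavy, keeping only the counter on the already-free branch still makes the race visible in production metrics.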
|
||||||
|
|
||||||
|
### N4 — `ResponseCache` LRU eviction is O(n) per insert and the doc's 1000-entry assumption is unenforced at the call path
|
||||||
|
|
||||||
|
**File:** `ResponseCache.cs:206-230`.
|
||||||
|
|
||||||
|
`EvictLeastRecentlyUsed` is a full linear scan, acceptable at the documented 1000-entry cap. But `_maxEntries` comes from `Cache.MaxEntriesPerPlc` which is operator-settable with no documented ceiling in this file; a fat-fingered `MaxEntriesPerPlc = 1_000_000` turns every cache insert into a million-element scan under the lock, stalling the backend reader for every cache-miss store. Recommend either documenting/validating a hard ceiling or switching to an intrusive LRU (linked-list + dict) if large caps must be supported.
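
For comparison, an O(1)-per-operation LRU with a validated cap can be sketched in Python with `collections.OrderedDict`, which plays the role of the linked-list plus dictionary structure. `LruCacheSketch` is a hypothetical name and `HARD_CEILING` is an illustrative value, not one taken from the code or docs.

```python
from collections import OrderedDict

HARD_CEILING = 10_000  # hypothetical upper bound for MaxEntriesPerPlc


class LruCacheSketch:
    def __init__(self, max_entries: int) -> None:
        if not 1 <= max_entries <= HARD_CEILING:
            raise ValueError(f"max_entries must be in 1..{HARD_CEILING}")
        self._max = max_entries
        self._entries = OrderedDict()  # oldest entry first

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict the least recently used

    def __len__(self) -> int:
        return len(self._entries)
```

Either half of the sketch addresses the finding on its own: the ceiling keeps the existing linear scan cheap, and the ordered structure removes the scan altogether.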
|
||||||
|
|
||||||
|
### N5 — `SnapshotOlderThan` allocates a `List` every watchdog tick even when nothing is stale
|
||||||
|
|
||||||
|
**File:** `CorrelationMap.cs:72-81`, `PlcMultiplexer.cs:1123-1124`.
|
||||||
|
|
||||||
|
The watchdog ticks every `BackendRequestTimeoutMs/4` (≥100 ms) and always allocates a fresh `List`, then checks `.Count == 0`. On an idle fleet of 54 PLCs that is up to 540 throwaway lists/sec (54 PLCs × 10 ticks/sec at the 100 ms floor). Trivial, but `SnapshotOlderThan` could return `Array.Empty` when the map is empty, or the watchdog could skip the call when `_correlation.Count == 0`.
|
||||||
|
|
||||||
|
### N6 — `CacheEntry.CapturedTags` replay on a cache hit re-stamps observations "now" for every hit, including hits served during a backend outage
|
||||||
|
|
||||||
|
**File:** `PlcMultiplexer.cs:844-850`.
|
||||||
|
|
||||||
|
When the cache serves a hit during a `recovering` backend, the debug view's `capture.Record(...)` is called with `CaptureDirection.Read` and the current time, making the detail page show a "fresh" read of a tag that has not actually been read from the PLC for up to `CacheTtlMs`. This is arguably misleading for an operator debugging during an outage — the value is cache-aged, not live. Consider tagging replayed observations as cache-served, or stamping them with the entry's `CachedAtUtc` rather than now.
|
||||||
|
|
||||||
|
### N7 — Doc drift: `ConnectionModel.md` says the cascade closes "every attached pipe" but the inline comment at the cascade still references the superseded "in flight" wording
|
||||||
|
|
||||||
|
**File:** `PlcMultiplexer.cs:448-450`.
|
||||||
|
|
||||||
|
The comment reads "Close every attached pipe that had a request in flight; the others will simply re-issue" — directly contradicted by the next line ("Per docs/.../ConnectionModel.md, ALL attached upstreams cascade") and by `upstreamCount = _pipes.Count`. The code is correct (all pipes cascade); the first comment sentence is stale and should be deleted to avoid confusing a future reader into thinking idle pipes survive.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Notes on things checked and found correct
|
||||||
|
|
||||||
|
- **TxId wraparound** (`TxIdAllocator.cs`): the forward scan with `_inFlightCount < SlotCount` guard correctly terminates; `_wrapCount` increments on `0xFFFF→0x0000` cursor roll. No off-by-one. Reuse-distance maximisation works as documented.
|
||||||
|
- **Claim-then-dispatch in the watchdog** (`PlcMultiplexer.cs:1131`): the `TryRemove` race against a real response is correctly handled — whoever wins `TryRemove` owns dispatch; the loser skips. No double-delivery of a real response + 0x0B for the *non-coalesced* case.
|
||||||
|
- **MBAP TxId preservation**: proxy TxId is written at request enqueue (lines 1014-1015 / 1080-1081) and the original is restored per-party on the response (lines 736-737); heartbeat and cache-hit and exception frames all carry the correct TxId. Verified correct.
|
||||||
|
- **Single-party fan-out buffer reuse** (line 732-734): the `Count == 1` fast path reuses the buffer; multi-party clones per party. No aliasing bug.
|
||||||
|
- **Connect gate vs. teardown**: `_connectGate` correctly serialises `EnsureBackendConnectedAsync` against `TearDownBackendAsync`; the bounded 2 s teardown wait and its documented best-effort fallback are reasonable (the residual race is already noted at N3, not a new finding).
|
||||||
|
- **`InFlightByKeyMap` lock discipline**: no async work happens under the lock; the factory is invoked under the lock but does only synchronous allocation/`TryAdd`. No lock-ordering inversion against `TxIdAllocator`'s lock (allocator lock is always taken *inside* the map lock, never the reverse).
|
||||||
|
- **Cache `recovering`-state invalidation skip**: the structural argument in the inline comment (lines 676-685) holds — no backend reader exists in `recovering`, so no FC06/FC16 response can drive an invalidation. Correct as written (but see C1, which is a *different* and real problem on the read side).
|
||||||
@@ -0,0 +1,115 @@
|
|||||||
|
# Code Review — mbproxy (2026-05-16)
|
||||||
|
|
||||||
|
**Branch:** `mbproxy-webui-dashboard` · **HEAD:** `0308490`
|
||||||
|
**Predecessors:** `codereviews/2026-05-14/`, `codereviews/2026-05-15/` (all of their findings remediated in `554b05d`, `374eecd`, `0308490`).
|
||||||
|
|
||||||
|
## Method
|
||||||
|
|
||||||
|
This was a full-service review run as six parallel area reviews, each in its own file:
|
||||||
|
|
||||||
|
| File | Scope |
|------|-------|
| [`Multiplexing.md`](Multiplexing.md) | Backend connection layer: `Proxy/Multiplexing/`, `Proxy/Cache/`, `MbapFrame` |
| [`ProxyAndBcd.md`](ProxyAndBcd.md) | Proxy lifecycle + BCD codec: `Proxy/` core, `Proxy/Supervision/`, `Bcd/` |
| [`AdminSignalR.md`](AdminSignalR.md) | Admin endpoint, SignalR backend, `TagValueCapture`/`TagCaptureRegistry` |
| [`Frontend.md`](Frontend.md) | `Admin/wwwroot/`: HTML/CSS/JS dashboard |
| [`ConfigAndHosting.md`](ConfigAndHosting.md) | `Configuration/`, `Options/`, `Diagnostics/`, hosting, `ServiceCounters` |
| [`TestsAndConfig.md`](TestsAndConfig.md) | `tests/`, `install/`, config templates, csproj packaging |
|
||||||
|
|
||||||
|
Open each area file for the full finding text (description, impact, recommendation, line refs). This Overview consolidates, adjudicates severity where I disagree with an agent, and sets a remediation order.
|
||||||
|
|
||||||
|
## Headline verdict
|
||||||
|
|
||||||
|
The service is fundamentally sound. The hard parts — TxId multiplexing, the claim-then-dispatch watchdog, lock ordering, the BCD codec in all four directions, `Interlocked` counters, graceful-shutdown sequencing, the tab-keyed SignalR capture model — were checked and hold up. **Every Critical and Major finding from the two prior reviews is confirmed remediated with no regressions** (one frontend exception, below).
|
||||||
|
|
||||||
|
This review found **2 findings rated Critical by the area agents, 20 Major, and 38 Minor** (60 total). My own adjudication downgrades both "Criticals" to high-Major (reasoning under *Critical findings*), so I would summarise the real risk profile as: **no true Critical, ~8 Major worth scheduling, the rest opportunistic.** The two largest themes are (1) the proxy trusts self-describing Modbus wire fields more than it should, and (2) startup config validation is weaker than hot-reload config validation.
|
||||||
|
|
||||||
|
## Findings by area
|
||||||
|
|
||||||
|
| Area | Critical | Major | Minor | Most important |
|------|:--:|:--:|:--:|----------------|
| Multiplexing | 1 | 6 | 7 | C1 cache can serve a value stale across a write |
| ProxyAndBcd | 0 | 2 | 7 | M2 `PlcListener` swallows accept-loop faults |
| AdminSignalR | 0 | 2 | 7 | M1 one-cycle arm-state/tag-row inconsistency |
| Frontend | 1 | 1 | 4 | C1 `dashboard.js` `stateChip` unescaped (XSS regression) |
| ConfigAndHosting | 0 | 5 | 6 | M1 `ReloadValidator` never runs at startup |
| TestsAndConfig | 0 | 4 | 7 | M1 `AdminPort: 0` does not disable the admin endpoint |
| **Total** | **2** | **20** | **38** | |
|
||||||
|
|
||||||
|
## Critical findings — adjudicated
|
||||||
|
|
||||||
|
### Multiplexing C1 — response cache can serve a value contradicted by an in-flight write
|
||||||
|
|
||||||
|
`PlcMultiplexer.cs:826-857` (read/lookup) vs `:674-695` (invalidation on the FC06/FC16 *response*). Cache invalidation fires only when the write *response* lands. A read arriving after a write was forwarded but before its response gets a cache hit and the pre-write value. See `Multiplexing.md` C1 for the full sequence.
|
||||||
|
|
||||||
|
**My adjudication: high-Major, not Critical.** Mitigating: the cache is opt-in per tag (default OFF); the window is one backend round-trip and is already inside the documented `CacheTtlMs` staleness contract; and until the PLC acks the write, the old value is arguably the last *confirmed* value. The genuinely defective sub-case is a **write that times out** — it never produces a response, so invalidation never runs and the cache can serve a potentially-changed value for the rest of the TTL. Regardless of severity label, the fix is cheap and removes all ambiguity: **invalidate on the write request (enqueue), keeping response-side invalidation as a backstop.** Recommended.
|
||||||
|
|
||||||
|
### Frontend C1 — `dashboard.js` `stateChip` interpolates `listener.state` into `innerHTML` unescaped
|
||||||
|
|
||||||
|
`dashboard.js:48`. `detail.js`'s identical function was escaped by the prior review; the fix was not propagated to `dashboard.js`. See `Frontend.md` C1.
|
||||||
|
|
||||||
|
**My adjudication: Major, but fix immediately.** `listener.state` is a server-serialized enum, so practical exploitability today is low, but this is a straight regression from the prior XSS sweep and the fix is a single wrapped call (`${escapeHtml(state)}`). It should not wait.
|
||||||
|
|
||||||
|
## Major findings worth scheduling
|
||||||
|
|
||||||
|
Grouped by theme. Full text in the area files.
|
||||||
|
|
||||||
|
**Startup vs hot-reload validation asymmetry — fix as one piece of work.**
|
||||||
|
- `ConfigAndHosting.md` M1 — `ReloadValidator.Validate` runs *only* inside the reconciler; the startup path runs only `MbproxyOptionsValidator`, which does not check duplicate `ListenPort`s, `AdminPort` collisions, duplicate PLC names, or the keepalive cross-field rule. A port-typo config starts a half-working fleet instead of failing fast — and `docs/Operations/Configuration.md:284` claims the opposite.
|
||||||
|
- `ConfigAndHosting.md` M4 — `MbproxyOptionsValidator` never range-checks `ListenPort`/`Port`/`AdminPort` or rejects empty `Host`/`Name`; `ListenPort: 0` (the omitted-key default) silently binds an ephemeral port.
|
||||||
|
- `TestsAndConfig.md` M1 — both config templates and the docs say `AdminPort: 0` *disables* the admin endpoint; it does not — Kestrel binds an ephemeral port. And `ReloadValidator` rejects `AdminPort < 1` on reload while startup accepts it. **This is the most user-facing Major: documented behaviour is false on a security-relevant setting.**
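
The cross-field checks that the three findings above say are missing at startup can be sketched as one function that both the startup path and the reconciler could call. This is a Python illustration, not the C# validators: the function name and the list-of-dicts config shape are assumptions, while `Name`, `ListenPort`, and `AdminPort` are the setting names used in the findings. It rejects `AdminPort < 1`, mirroring the reload-side rule.

```python
def validate_fleet(plcs, admin_port):
    """Return a list of human-readable errors; an empty list means the config is valid."""
    errors = []
    names, ports = set(), {}
    if not 1 <= admin_port <= 65535:
        errors.append(f"AdminPort {admin_port} is out of range 1..65535")
    for index, plc in enumerate(plcs):
        name = str(plc.get("Name", "")).strip()
        port = plc.get("ListenPort", 0)  # 0 models an omitted key
        label = name or f"#{index}"
        if not name:
            errors.append(f"PLC {label}: Name is empty")
        elif name in names:
            errors.append(f"PLC {label}: duplicate Name")
        names.add(name)
        if not 1 <= port <= 65535:
            errors.append(f"PLC {label}: ListenPort {port} is out of range 1..65535")
        elif port == admin_port:
            errors.append(f"PLC {label}: ListenPort {port} collides with AdminPort")
        elif port in ports:
            errors.append(f"PLC {label}: ListenPort {port} already used by {ports[port]}")
        else:
            ports[port] = label
    return errors
```

Running the same function at startup and on reload is what removes the asymmetry; which rules it contains matters less than there being only one copy of them.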
|
||||||
|
|
||||||
|
**The proxy trusts self-describing Modbus wire fields.**
|
||||||
|
- `Multiplexing.md` M2 — a single wrong MBAP `Length` desynchronises the backend stream indefinitely with no resync; the PLC goes dark until the socket happens to close.
|
||||||
|
- `Multiplexing.md` M3 — a cached FC03/FC04 payload is never validated against `2*qty`; a malformed backend response is cached and replayed to every hit for the TTL.
|
||||||
|
- `Multiplexing.md` M4 / `ProxyAndBcd.md` M1 — FC16 `qty`/`byteCount` are never cross-checked; a request whose self-describing fields disagree drives cache invalidation and partial BCD rewriting off the client's claim rather than reality. (Not memory-unsafe — per-slot bounds checks hold — but a contract gap.)
|
||||||
|
- Recommendation: add one framing-validation helper for inbound FC16 (`byteCount == 2*qty`, length consistent, `qty` within the DL260 cap) and one for cached/forwarded FC03/04 responses (`byteCount == 2*qty`); reject/teardown on mismatch.
|
||||||
|
|
||||||
|
**Hot-reload robustness.**
|
||||||
|
- `ConfigAndHosting.md` M2 — the reconciler `Restart` path removes the old supervisor from the dictionary *before* rebuilding; a transient fault during rebuild drops that PLC permanently with only an Error line. Build the new supervisor first, swap last.
|
||||||
|
- `ConfigAndHosting.md` M3 — `Resilience.BackendConnect`/`ListenerRecovery` reloads reach only added/restarted PLCs, not reseated/untouched ones — inconsistent, undocumented. Either thread a live accessor (as done for `ReadCoalescing`/`Keepalive`) or document them as restart-only.
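
The "build first, swap last" ordering recommended for the `Restart` path can be shown with a small Python sketch. The names are hypothetical, and it assumes that constructing a supervisor does not itself bind the listen port (otherwise old and new would contend for it).

```python
def restart_supervisor(supervisors: dict, name: str, build):
    """Replace supervisors[name], keeping the old one if the rebuild fails."""
    replacement = build(name)        # may raise; the registry is untouched so far
    previous = supervisors.get(name)
    supervisors[name] = replacement  # swap only after the build succeeded
    if previous is not None:
        previous.stop()
    return replacement
```

If `build` raises, the exception propagates to the caller and the old supervisor is still registered and running, so a transient fault no longer removes the PLC from the fleet.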
|
||||||
|
|
||||||
|
**Connection-layer resilience.**
|
||||||
|
- `ProxyAndBcd.md` M2 — `PlcListener.RunAsync` catches and *returns* on an accept-loop fault instead of rethrowing, making the supervisor's exception-carrying `mbproxy.listener.faulted` (EventId 43) unreachable; faults are double-logged and lose their stack trace.
|
||||||
|
- `Multiplexing.md` M5 — a late-attaching coalescing party can mutate `InterestedParties` while the watchdog enumerates it, throwing `InvalidOperationException` that the outer catch turns into **permanent watchdog death for that PLC**. Snapshot the party list before fan-out.
|
||||||
|
- `Multiplexing.md` M1 — `EnsureBackendConnectedAsync` can leak a live backend socket + three tasks if `_disposed` flips during a cold-start connect. Re-check `_disposed` under `_backendLock` before publishing the socket.
|
||||||
|
- `Multiplexing.md` M6 — three failure-detection paths call `TearDownBackendAsync` fire-and-forget; a pre-await throw becomes an unobserved task exception. Attach a faulted-continuation logger.
|
||||||
|
|
||||||
|
**Silent failures.**
|
||||||
|
- `Frontend.md` M1 — `onreconnected` swallows a failed `SubscribeFleet`/`SubscribePlc`; the pill shows green while the feed is dead, with no retry. Route the warm-reconnect subscribe through the same retry as cold start.
|
||||||
|
- `AdminSignalR.md` M1 — `ReconcileArmed` and `BuildDebug` re-query the capture independently in one push cycle, so a mid-cycle disarm can push a `PlcDebugSnapshot` where `captureArmed` and the tag rows disagree for ≤1 cycle. Self-heals; derive `CaptureArmed` from the already-held `activePlcs` set instead. (I'd call this Minor — one-cycle cosmetic — but it touches the arm-state invariant the prior reviews centred on.)
|
||||||
|
- `AdminSignalR.md` M2 — `StatusBroadcaster.Start()` has no double-call guard; a second call orphans a push loop. Latent (one call site today); add an `Interlocked` flag.
|
||||||
|
|
||||||
|
## Cross-cutting Minor themes
|
||||||
|
|
||||||
|
- **Doc drift** (several): `BcdRewriting.md` uses `mbproxy.rewrite.exception_passthrough` for an event actually named `mbproxy.exception.passthrough` (`ProxyAndBcd.md` N2); `Configuration.md:284` and `HotReload.md:81/95` describe a validation model the code does not implement (`ConfigAndHosting.md` M1/N6); `ConfigReconciler.cs:364` comment claims `GetOrCreate` preserves the armed flag — it no longer does (`AdminSignalR.md` N1); `LogEvents.md` lists EventId 21 for a `ProxyWorker` event with no call site (`ProxyAndBcd.md` N1).
|
||||||
|
- **Unbounded growth on PLC churn**: frontend `prevPdu`/`rateByName` maps (`Frontend.md` N2); `ResponseCache` LRU is O(n)-per-insert with an unbounded operator-settable cap (`Multiplexing.md` N4).
|
||||||
|
- **Test hygiene**: hard-coded `AdminPort: 8080` in five test configs → parallel-run bind conflicts (`TestsAndConfig.md` M2); the sim fixture's readiness poll ignores the runner cancellation token (`TestsAndConfig.md` M3); a needless 400 ms delay in a broadcaster test (`TestsAndConfig.md` m2).
|
||||||
|
- **Best-effort swallow gaps**: `SocketKeepalive.Apply` does not catch `NotSupportedException`/`PlatformNotSupportedException` despite a "never abort a connection" contract (`ProxyAndBcd.md` N4); `EventLogBridge` has no first-failure breadcrumb where `SyslogBridge` does (`ConfigAndHosting.md` N4).
|
||||||
|
|
||||||
|
## Correction to an area finding
|
||||||
|
|
||||||
|
`TestsAndConfig.md` **m4** (and its regression-table row for prior `m6`) reports that `tests/sim/mbproxy.smoke.config.json` still cites `plans/2026-05-15-webui-dashboard.md`. **This is a false positive** — that reference was removed in commit `0308490` (task #21); the current header reads "mbproxy smoke-test configuration for the web-UI browser smoke tests." No action needed for m4.
|
||||||
|
|
||||||
|
## Prior-review regression check
|
||||||
|
|
||||||
|
All Critical and Major findings from `codereviews/2026-05-15/` are confirmed fixed and the fixes hold:
|
||||||
|
|
||||||
|
- SignalR capture-leak (prior C1/C2) — `PlcSubscriptionTracker` is tab-keyed; arm/disarm funnels through `ReconcileArmed`; the hub no longer arms captures. ✔
|
||||||
|
- XSS in `detail.js` (prior C1) — `detail.js` is fully escaped. ✔ **But the same fix was not propagated to `dashboard.js` — see Frontend C1 above (regression-by-omission).**
|
||||||
|
- Cache-hit debug-view freeze (prior TagCapture C1) — fixed via `CacheEntry.CapturedTags` replay. ✔
|
||||||
|
- Bind-failure leak (prior M5/M6) — `StartAppAsync` tears down a partially-started app/broadcaster. ✔
|
||||||
|
- The 8 Minor follow-ups (#14–#21) — all closed; `TestsAndConfig.md`'s regression table confirms, with the m4/m6 false positive corrected above.
|
||||||
|
|
||||||
|
## Recommended remediation order
|
||||||
|
|
||||||
|
1. **`dashboard.js:48`** — escape `stateChip` (`${escapeHtml(state)}`). A one-line change; closes the XSS regression.
|
||||||
|
2. **`AdminPort: 0` semantics** — either implement the documented disable (a `port == 0 → return` guard in `AdminEndpointHost.StartAppAsync`, then allow 0 in both validators) or correct the templates/docs to say the admin endpoint is always on. Pick one; today the docs lie.
|
||||||
|
3. **Single startup/reload validation gate** — run `ReloadValidator` (or a merged validator) at startup so port collisions and duplicate PLC names fail fast; fix `Configuration.md:284`.
|
||||||
|
4. **Cache invalidation on the write request** (Multiplexing C1) — invalidate overlapping FC03/04 entries when an FC06/FC16 is enqueued, not only on its response.
|
||||||
|
5. **Reconciler `Restart` ordering** (ConfigAndHosting M2) — build the new supervisor before removing the old one.
|
||||||
|
6. **`PlcListener.RunAsync` rethrow** (ProxyAndBcd M2) so the supervisor's fault path runs as designed.
|
||||||
|
7. **Watchdog party-list snapshot** (Multiplexing M5) — iterate a frozen copy so a late attach cannot kill the watchdog.
|
||||||
|
8. **Inbound FC16 / cached-response framing validation** (Multiplexing M2/M3/M4, ProxyAndBcd M1) — one helper, applied at both ends.
|
||||||
|
|
||||||
|
Items 1–3 are small and high-value. 4–8 are the connection-layer correctness work and deserve their own change with tests. The 38 Minors are opportunistic — the doc-drift cluster is worth a single sweep.
|
||||||
@@ -0,0 +1,128 @@
|
|||||||
|
# Code Review — Proxy Lifecycle & BCD Codec
|
||||||
|
|
||||||
|
**Date:** 2026-05-16
|
||||||
|
**Branch:** `mbproxy-webui-dashboard` (HEAD `0308490`)
|
||||||
|
**Scope:** `src/Mbproxy/Proxy/{PlcListener,ProxyWorker,ProxyCounters,SocketKeepalive,BcdPduPipeline,NoopPduPipeline,IPduPipeline,PerPlcContext,RewriterLogEvents}.cs`, `src/Mbproxy/Proxy/Supervision/*`, `src/Mbproxy/Bcd/*`. Excludes `Proxy/Multiplexing/`, `Proxy/Cache/`, and `TagValueCapture`/`TagCaptureRegistry`.
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
The proxy-lifecycle and BCD-codec subsystems are in good shape. The BCD codec itself (`BcdCodec`) is correct in all four directions, range-checked, and allocation-free; the 32-bit CDAB word order matches the documented `high*10000+low` convention and is consistently applied through `BcdPduPipeline` for FC03/04/06/16. Partial-overlap handling, invalid-BCD passthrough, exception passthrough, and the FC16 per-word base-10000 guard are all implemented correctly. The supervisor's Polly recovery loop, CTS/TCS re-arming, and graceful-shutdown sequencing are carefully built and well-documented. No Critical issues were found. The findings below are two Major items (a missing `byteCount`/`qty` consistency check in the FC16 request path, and an accept-loop fault that `PlcListener.RunAsync` swallows so the supervisor's fault path never runs), plus a small set of resource/robustness and maintainability Minors.
|
||||||
|
|
||||||
|
**Findings by severity:** Critical 0 · Major 2 · Minor 7
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Major
|
||||||
|
|
||||||
|
### M1 — FC16 request path never validates `byteCount` against `qty`; a self-inconsistent request is partially rewritten and kept in bounds only by the later per-slot check
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Proxy/BcdPduPipeline.cs:158-167`
|
||||||
|
|
||||||
|
`ProcessFc16Request` reads `qty` straight from the wire (`pdu[3]<<8 | pdu[4]`, range 0–65535) and then guards with:
|
||||||
|
|
||||||
|
```csharp
|
||||||
|
if (pdu.Length < 6 + qty * 2)
|
||||||
|
return;
|
||||||
|
```
|
||||||
|
|
||||||
|
`qty` is a `ushort`, and `qty * 2` is computed in `int`, so neither the multiplication (max 131070) nor `6 + qty * 2` can overflow; the arithmetic is safe. The check covers the case documented in its comment ("a client claiming qty=10 with only 4 bytes of register data would otherwise have its BCD slots silently skipped"), but not the opposite case, which the comment does not mention: a PDU that is *longer* than its own fields describe. `pdu` is a `Span<byte>` whose length is the PDU length parsed from the MBAP length field. If a malicious or buggy client sends a large MBAP length together with a `byteCount`/`qty` pair that is internally inconsistent, `pdu.Length < 6 + qty * 2` passes, and the per-slot `lowByteOff + 2 > pdu.Length` checks (lines 200, 261) are the only thing keeping the writes in bounds. Those checks are present and correct, so no out-of-bounds write occurs, but the FC16 path never validates `byteCount` (`pdu[5]`) against `qty` at all. For example, a request with `qty=2` and `byteCount=2` carried in a PDU of 10 or more bytes is rewritten as if it held two registers of data, even though its own `byteCount` declares one; the second slot is rewritten against whatever trailing bytes follow.
|
||||||
|
|
||||||
|
**Impact:** Not a memory-safety bug (the per-slot bounds checks hold), but a malformed FC16 whose `byteCount` disagrees with `qty` is silently partially rewritten instead of being passed through for the PLC to reject. The rewriter mutates register bytes the client never intended as register data. This is a wire-protocol-correctness gap for adversarial/buggy input.
|
||||||
|
|
||||||
|
**Recommendation:** Add an explicit `byteCount` consistency check after reading `qty`: `byte byteCount = pdu[5]; if (byteCount != qty * 2 || pdu.Length < 6 + byteCount) return;`. This makes the rewriter pass through any FC16 whose self-describing fields disagree, matching the documented "let the PLC's validator surface the protocol error" policy and removing reliance on the per-slot check as the safety net.
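
As a rough illustration of that check, the helper below returns `true` only when the request's self-describing fields agree. `IsConsistentFc16Request` is a hypothetical name, and the field offsets are assumed from the finding above (`pdu[3..4]` = `qty`, `pdu[5]` = `byteCount`, register data from `pdu[6]`); it is a sketch, not the pipeline's actual code.

```csharp
using System;

// Assumed FC16 request layout: pdu[0]=function code (0x10), pdu[1..2]=start address,
// pdu[3..4]=qty (big-endian), pdu[5]=byteCount, pdu[6..]=register data.
static bool IsConsistentFc16Request(ReadOnlySpan<byte> pdu)
{
    if (pdu.Length < 6 || pdu[0] != 0x10)
        return false;

    int qty = (pdu[3] << 8) | pdu[4];
    int byteCount = pdu[5];

    // byteCount must equal qty * 2, and the PDU must really carry that many data bytes.
    return byteCount == qty * 2 && pdu.Length >= 6 + byteCount;
}
```

A caller would pass the request through untouched whenever this returns `false`, leaving the PLC to reject it.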
|
||||||
|
|
||||||
|
### M2 — `PlcListener.RunAsync` swallows a faulted accept loop without distinguishing a transient fault from a permanently dead listener; the supervisor then treats every non-cancellation exit identically
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Proxy/PlcListener.cs:152-161`, interacting with `Supervision/PlcListenerSupervisor.cs:386-432`
|
||||||
|
|
||||||
|
`PlcListener.RunAsync` catches *every* non-`OperationCanceledException` exception, logs `mbproxy.listener.faulted`, and **returns normally**. Because it returns rather than rethrows, the supervisor's `await listener.RunAsync(token)` (line 388) completes without throwing. Control then falls through to lines 418-432, which treat a normal return as "listener accept loop ended unexpectedly", increment `RecoveryAttempts`, log `mbproxy.listener.ended`, and throw `InvalidOperationException` to drive a Polly retry.
|
||||||
|
|
||||||
|
The net effect is that a genuine accept-loop fault (e.g. `SocketException` from `AcceptSocketAsync`) is reported to operators as `mbproxy.listener.faulted` **and then** `mbproxy.listener.ended`, and the recovery counter is incremented once for the same event. The supervisor's own `catch (Exception runEx)` block at line 397 — which exists specifically to log the fault with the exception object attached and increment the counter — is **unreachable for any fault originating inside `RunAsync`**, because `RunAsync` never lets the exception escape.
|
||||||
|
|
||||||
|
**Impact:** Double logging (two distinct event names for one fault). The supervised `mbproxy.listener.faulted` emission (EventId 43, Warning, *with* stack trace) never fires for accept-loop faults; only the unsupervised `PlcListener` emission (EventId 22, Error, *no* exception object) does, so operators lose the stack trace that `LogEvents.md` (line 99) promises. The supervisor's `IncrementRecoveryAttempt(runEx.Message)` call, which would carry the real reason, is also never reached; the counter is bumped with the generic "Listener accept loop ended unexpectedly" string instead.
|
||||||
|
|
||||||
|
**Recommendation:** Have `PlcListener.RunAsync` rethrow after logging (`catch (Exception ex) { LogListenerFaulted(...); throw; }`), so the supervisor's `catch (Exception runEx)` path handles it as designed. Then the supervisor logs `mbproxy.listener.faulted` (EventId 43) with the exception, and the "ended unexpectedly" path is reserved for the genuinely-clean-return case it was written for. Alternatively, drop the supervisor's dead `catch (Exception runEx)` block and accept the current behaviour — but the current split is misleading and contradicts the documented event semantics.
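
The log-then-rethrow shape can be sketched with two stand-in functions. `RunListenerAsync` and `SuperviseOnceAsync` are hypothetical simplifications of the listener and supervisor, and the string log replaces the real `[LoggerMessage]` calls; the point is only that `throw;` lets the caller's `catch (Exception runEx)` become reachable.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var log = new List<string>();

// Stand-in for the listener: record the fault, then rethrow it unchanged.
async Task RunListenerAsync(Func<Task> acceptLoop)
{
    try
    {
        await acceptLoop();
    }
    catch (Exception ex) when (ex is not OperationCanceledException)
    {
        log.Add($"listener: faulted ({ex.Message})");
        throw; // preserves the original exception and its stack trace
    }
}

// Stand-in for one supervisor iteration: a fault and a clean return now take different paths.
async Task<string> SuperviseOnceAsync(Func<Task> acceptLoop)
{
    try
    {
        await RunListenerAsync(acceptLoop);
        return "ended";      // reserved for a genuinely clean return
    }
    catch (OperationCanceledException)
    {
        return "cancelled";  // shutdown, not a fault
    }
    catch (Exception runEx)
    {
        log.Add($"supervisor: faulted ({runEx.Message})");
        return "faulted";    // the real reason is available here
    }
}

Console.WriteLine(await SuperviseOnceAsync(() => throw new InvalidOperationException("accept failed"))); // faulted
Console.WriteLine(await SuperviseOnceAsync(() => Task.CompletedTask));                                   // ended
```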
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Minor
|
||||||
|
|
||||||
|
### N1 — `ProxyWorker.LogBindFailed` (EventId 21) is declared but never invoked; `LogEvents.md` documents it as emitted by `ProxyWorker`
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Proxy/ProxyWorker.cs:382-385`; doc `docs/Reference/LogEvents.md:63`
|
||||||
|
|
||||||
|
The `[LoggerMessage]` `LogBindFailed` partial method (EventId 21, `mbproxy.startup.bind.failed`) is defined in `ProxyWorker` but has no call site — all bind-failure logging happens in `PlcListenerSupervisor.LogBindFailed` (EventId 41). `LogEvents.md` line 63 lists EventId 21 / `ProxyWorker.cs` as a source for `mbproxy.startup.bind.failed`, which no longer matches the code.
|
||||||
|
|
||||||
|
**Impact:** Dead code; misleading documentation. An operator filtering on EventId 21 will never see a hit.
|
||||||
|
|
||||||
|
**Recommendation:** Delete the unused `LogBindFailed` declaration from `ProxyWorker.cs` and remove the `21 (ProxyWorker)` / `src/Mbproxy/Proxy/ProxyWorker.cs` references from the `mbproxy.startup.bind.failed` table in `LogEvents.md`.
|
||||||
|
|
||||||
|
### N2 — `RewriterLogEvents.ExceptionPassthrough` event name `mbproxy.exception.passthrough` is inconsistent with the `mbproxy.rewrite.*` family it sits in
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Proxy/RewriterLogEvents.cs:46-55`
|
||||||
|
|
||||||
|
`PartialBcd` and `InvalidBcd` use `mbproxy.rewrite.partial_bcd` / `mbproxy.rewrite.invalid_bcd`, but `ExceptionPassthrough` uses `mbproxy.exception.passthrough`. `LogEvents.md` (line 557) explicitly lists `exception` as its own `<area>`, so this is intentional and *documented* — but `BcdRewriting.md:146` calls it `mbproxy.rewrite.exception_passthrough`, and `BcdRewriting.md:232` repeats `mbproxy.rewrite.exception_passthrough`. The code and `LogEvents.md` agree on `mbproxy.exception.passthrough`; `BcdRewriting.md` is wrong in two places.
|
||||||
|
|
||||||
|
**Impact:** Doc drift. An operator following `BcdRewriting.md` to build a log filter would grep a name that is never emitted.
|
||||||
|
|
||||||
|
**Recommendation:** Fix the two occurrences in `docs/Features/BcdRewriting.md` (lines 146 and 232) to `mbproxy.exception.passthrough` to match `LogEvents.md` and the source.
|
||||||
|
|
||||||
|
### N3 — `PlcListener.RunAsync` orphans the per-pipe `ContinueWith` cleanup if the listener faults between `Task.Run` and dictionary insertion
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Proxy/PlcListener.cs:136-149`
|
||||||
|
|
||||||
|
The accept loop creates `pipeTask`, inserts it into `_pipeTasks`, then attaches a `ContinueWith` that removes the entry. If `DisposeAsync` runs concurrently (it snapshots `_pipeTasks.Values` at line 178), there is a benign window where a pipe task started after the snapshot is not awaited — acceptable best-effort. More notably, the `ContinueWith` continuation uses `TaskScheduler.Default` and discards its own task (`_ = ...`); if the continuation itself throws (it cannot here, `TryRemove` does not throw) it would be an unobserved exception. Low risk, but the pattern of fire-and-forget `ContinueWith` for cleanup is fragile.
|
||||||
|
|
||||||
|
**Impact:** None in practice — `TryRemove` cannot throw. Maintainability only.
|
||||||
|
|
||||||
|
**Recommendation:** Prefer awaiting cleanup inside the `Task.Run` lambda's `finally` (the lambda already has a `finally` that disposes the pipe — add `_pipeTasks.TryRemove(pipe.Id, out _);` there) and drop the separate `ContinueWith`. This collapses two scheduling primitives into one and removes the discarded-task wart.
|
||||||
|
|
||||||
|
### N4 — `SocketKeepalive.Apply` does not catch `NotSupportedException`; some platforms throw it (not `SocketException`) for the `TcpKeepAlive*` options
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Proxy/SocketKeepalive.cs:36-47`
|
||||||
|
|
||||||
|
`Apply` catches `SocketException` and `ObjectDisposedException`. The three `TcpKeepAlive*` socket options are documented as "not honoured on every platform"; on platforms/runtimes where the option is unrecognised, `Socket.SetSocketOption` can throw `SocketException(SocketError.ProtocolOption)` (caught) **or**, on some older runtimes/OSes, `NotSupportedException` / `PlatformNotSupportedException` for the named option enum. Those would escape and — per the comment's own intent ("must never abort a connection") — defeat the best-effort contract.
|
||||||
|
|
||||||
|
**Impact:** On a platform that throws `PlatformNotSupportedException` for `TcpKeepAliveRetryCount`, applying keepalive to a backend or accepted upstream socket would throw out of the constructor / connect path, aborting the connection. The `SO_KEEPALIVE` master option (line 29) is universally supported, so the realistic failure window is narrow, but the swallow set is incomplete relative to the stated guarantee.
|
||||||
|
|
||||||
|
**Recommendation:** Broaden the catch to also include `NotSupportedException` (which `PlatformNotSupportedException` derives from) — or simplest, catch `Exception` here given the explicit "swallow everything, keepalive is best-effort" contract documented in the XML summary.
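
A minimal sketch of the broadened swallow set, using the standard `Socket.SetSocketOption` overloads. `TryApplyKeepalive` is a hypothetical name, and the idle/interval/retry values are illustrative placeholders rather than the project's defaults.

```csharp
using System;
using System.Net.Sockets;

// Best-effort keepalive: an option failure is reported via the return value
// and never escapes to the connect/accept path.
static bool TryApplyKeepalive(Socket socket)
{
    try
    {
        socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
        socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveTime, 30);      // seconds idle (illustrative)
        socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveInterval, 5);   // seconds between probes (illustrative)
        socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveRetryCount, 3); // probes before giving up (illustrative)
        return true;
    }
    catch (Exception ex) when (ex is SocketException
                                   or ObjectDisposedException
                                   or NotSupportedException) // base type of PlatformNotSupportedException
    {
        return false;
    }
}
```

Filtering on `NotSupportedException` keeps the catch narrower than a blanket `catch (Exception)` while still covering the platform cases named in the finding.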
|
||||||
|
|
||||||
|
### N5 — FC03/04 response partial-overlap warning logs `qty` (raw request qty) while the in-range computation uses `effectiveQty` (clamped to response words)
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Proxy/BcdPduPipeline.cs:348,362-370`
|
||||||
|
|
||||||
|
`ProcessResponse` clamps `effectiveQty = min(qty, wordsInResponse)` and uses `effectiveQty` for the `lowInRange`/`highInRange` checks (lines 362-363), which is correct. But the partial-BCD warning at lines 367-368 logs `startAddress, qty`: the *original* request qty, not `effectiveQty`. If a PLC returns a short response (`byteCount` smaller than `qty*2`), a 32-bit tag that *would* have been fully in range for the requested `qty` is reported as a partial overlap with the full `qty`, making the warning look like a client/config straddle when it is actually a truncated response.
|
||||||
|
|
||||||
|
**Impact:** Misleading `mbproxy.rewrite.partial_bcd` warning in the (rare) short-response case; an operator would chase a client tag-map mismatch that does not exist.
|
||||||
|
|
||||||
|
**Recommendation:** Either log `effectiveQty` in the response-path `PartialBcd` call, or — better — distinguish the two causes: a short response is a backend/transport anomaly, not a client straddle, and arguably warrants a different (or no) warning. At minimum make the logged qty match the qty actually used for the in-range decision.
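
One way to separate the two causes is sketched below. `ClassifyOverlap` and its string results are hypothetical, and the sketch assumes a 32-bit tag occupies `tagAddress` and `tagAddress + 1`; it is not the pipeline's actual range logic.

```csharp
using System;

// Classifies how a 32-bit tag overlaps a read of `qty` registers starting at
// `startAddress`, given how many words the response actually carried.
static string ClassifyOverlap(int startAddress, int qty, int wordsInResponse, int tagAddress)
{
    int effectiveQty = Math.Min(qty, wordsInResponse);

    bool InRange(int address, int count) =>
        address >= startAddress && address < startAddress + count;

    bool lowRequested = InRange(tagAddress, qty);
    bool highRequested = InRange(tagAddress + 1, qty);
    bool lowPresent = InRange(tagAddress, effectiveQty);
    bool highPresent = InRange(tagAddress + 1, effectiveQty);

    if (lowPresent && highPresent) return "full";                   // rewrite normally
    if (lowRequested && highRequested) return "truncated-response"; // backend/transport anomaly
    if (lowRequested || highRequested) return "client-straddle";    // request covers half the tag
    return "none";
}
```

With this split, the existing `mbproxy.rewrite.partial_bcd` warning would fire only for the straddle case, and a short response could be logged differently or not at all.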
|
||||||
|
|
||||||
|
### N6 — `ProxyCounters.UpdateRoundTripEwma` comment says the stored value is nanoseconds, but the code stores microseconds; the fixed-point scale and the comment disagree
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Proxy/ProxyCounters.cs:411-415`
|
||||||
|
|
||||||
|
`sampleFixed = (long)(sampleMs * 1000.0)` converts milliseconds to microseconds (×1000). The comment on line 413 says "store microseconds * 1000 (i.e. nanoseconds)" — but `sampleMs * 1000` is microseconds, not nanoseconds. `Snapshot()` then divides by 1000.0 (line 466) to get milliseconds back, which is consistent with microsecond storage. So the *code* is internally consistent (ms→µs store, µs→ms read); only the comment's "(i.e. nanoseconds)" parenthetical is wrong, and the "~1 µs resolution" claim is right for microsecond storage.
|
||||||
|
|
||||||
|
**Impact:** None functional — the EWMA value is correct. Comment is misleading for a future maintainer.
|
||||||
|
|
||||||
|
**Recommendation:** Fix the line 413 comment to "store milliseconds * 1000 (i.e. microseconds)".
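
The unit round-trip is easy to check with a small fixed-point EWMA. This is a sketch under stated assumptions: the smoothing factor of 1/8 and the seed-on-first-sample rule are illustrative choices, not taken from `ProxyCounters`; only the ms-to-µs store and µs-to-ms read mirror the finding.

```csharp
using System;
using System.Threading;

// The field holds microseconds (milliseconds * 1000) as a long so it can be
// updated with a lock-free compare-and-swap loop.
static void UpdateEwma(ref long ewmaMicros, double sampleMs)
{
    long sampleFixed = (long)(sampleMs * 1000.0); // ms -> µs
    while (true)
    {
        long current = Interlocked.Read(ref ewmaMicros);
        // Assumption: seed with the first sample, then move 1/8 of the way toward each new one.
        long next = current == 0 ? sampleFixed : current + (sampleFixed - current) / 8;
        if (Interlocked.CompareExchange(ref ewmaMicros, next, current) == current)
            return; // no other writer raced us
    }
}

// µs -> ms on the way out, matching the divide-by-1000 in the snapshot path.
static double ReadEwmaMs(ref long ewmaMicros) => Interlocked.Read(ref ewmaMicros) / 1000.0;
```

Storing microseconds in a `long` gives the roughly 1 µs resolution the comment claims; nanosecond storage would need a factor of 1,000,000.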
|
||||||
|
|
||||||
|
### N7 — `BcdTagMapBuilder` includes entries implicated in an `OverlappingHighRegister` error in the returned map; relies entirely on the caller checking `Errors.Count`
|
||||||
|
|
||||||
|
**File:** `src/Mbproxy/Bcd/BcdTagMapBuilder.cs:155-164`
|
||||||
|
|
||||||
|
The builder's own comment (lines 156-160) acknowledges that `OverlappingHighRegister`-implicated entries are *kept* in the frozen map, unlike `InvalidWidth`/`DuplicateAddress` entries which are excluded. The safety of this depends on every caller treating `Errors.Count > 0` as fatal. `ProxyWorker.ExecuteAsync` (lines 107-114) does exactly that — it skips the listener. But `ConfigReconciler` and any future caller must replicate the check; the asymmetry (some bad entries excluded, some included) is a latent footgun.
|
||||||
|
|
||||||
|
**Impact:** None today — the one production caller handles it. A future caller that builds a map and uses it without checking `Errors` would get a map with a known-bad overlapping 32-bit pair, and the rewriter would then mis-decode the overlapping registers (a 32-bit tag whose high register is another tag's address produces a `RangeHit` for both, and both would be rewritten).
|
||||||
|
|
||||||
|
**Recommendation:** For consistency and defence-in-depth, exclude `OverlappingHighRegister`-implicated entries from the frozen map the same way `InvalidWidth`/`DuplicateAddress` entries are excluded, so a map returned alongside errors is always safe to use even by a caller that forgets the check. If keeping them is deliberate (for diagnostics), document the contract on `ValidationResult.Map` itself, not just inside the builder.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Notes (no finding)
|
||||||
|
|
||||||
|
- **BCD codec correctness verified.** `Encode16`/`Decode16` round-trip cleanly for `[0,9999]`; `(uint)value > Max16` correctly rejects negatives via unsigned wrap. `Encode32`/`Decode32` implement CDAB low-word-first with `high*10000+low`, matching `BcdRewriting.md` and `dl205.md`. `HasBadNibble` checks all four nibbles. The FC16 request path's per-word `clientLow/clientHigh > 9999` guard (lines 222-228) correctly prevents the documented silent-mutation bug where `(9999,9999)` would otherwise survive `Encode32`'s 99,999,999 ceiling.
|
||||||
|
- **FC06 high-register partial detection is correct.** `ProcessFc06Request` (lines 96-116) uses `TryGetForRange(address,1,...)` to catch the case where the written address is the *high* register of a configured 32-bit pair; `TryGetForRange`'s negative-`OffsetWords` semantics make `hit.OffsetWords < 0` the right discriminator.
|
||||||
|
- **Supervisor lifecycle is sound.** CTS and TCS are re-armed per `StartAsync`, the previous CTS is disposed before reassignment, `DisposeAsync` is idempotent, and the response cache's eviction timer is disposed in both `ReplaceContextAsync` (old cache) and `DisposeAsync`. Polly's listener-recovery pipeline is correctly infinite (`MaxRetryAttempts = int.MaxValue`) with cancellation as the sole exit.
|
||||||
|
- **Graceful shutdown ordering is correct.** `ProxyWorker.StopAsync` snapshots in-flight counts before `base.StopAsync`, drains via supervisor stop, stops the admin endpoint last, and disposes supervisors — the documented sequence. The `inFlightAtCancel` snapshot rationale is well-reasoned.
|
||||||
|
- **`ProxyCounters` is correctly lock-free** — all increments are `Interlocked`, `ObserveInFlight` and `UpdateRoundTripEwma` use proper CAS loops, and `Snapshot` reads each field atomically.
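
For readers unfamiliar with the encoding verified in the first note, the arithmetic can be sketched as follows. The function names echo the ones mentioned above, but the signatures here are assumptions made for illustration and need not match `BcdCodec`.

```csharp
using System;

// 16-bit BCD: each nibble is one decimal digit, so 0x1234 means 1234.
static ushort Encode16(int value)
{
    if ((uint)value > 9999) // the unsigned cast also rejects negatives
        throw new ArgumentOutOfRangeException(nameof(value));
    return (ushort)(((value / 1000) << 12) | ((value / 100 % 10) << 8) | ((value / 10 % 10) << 4) | (value % 10));
}

static bool TryDecode16(ushort bcd, out int value)
{
    value = 0;
    for (int shift = 12; shift >= 0; shift -= 4)
    {
        int nibble = (bcd >> shift) & 0xF;
        if (nibble > 9) return false; // bad nibble: not valid BCD
        value = value * 10 + nibble;
    }
    return true;
}

// 32-bit CDAB: the low word comes first; the value is high * 10000 + low.
static (ushort Low, ushort High) Encode32(int value)
{
    if ((uint)value > 99_999_999)
        throw new ArgumentOutOfRangeException(nameof(value));
    return (Encode16(value % 10000), Encode16(value / 10000));
}

static bool TryDecode32(ushort low, ushort high, out int value)
{
    value = 0;
    if (!TryDecode16(low, out int lo) || !TryDecode16(high, out int hi))
        return false;
    value = hi * 10000 + lo;
    return true;
}
```

For example, 12,345,678 splits into a high word of 1234 and a low word of 5678, which travel as `0x5678` then `0x1234`.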
|
||||||
@@ -0,0 +1,312 @@
|
|||||||
|
# Code Review — Tests, Install/Packaging, and Config (HEAD `0308490`)
|
||||||
|
|
||||||
|
Scope: `tests/Mbproxy.Tests/` (all test code), `tests/sim/` (simulator profile and fixture),
|
||||||
|
`install/` (all scripts and config templates), `src/Mbproxy/Mbproxy.csproj`, and
|
||||||
|
`tests/Mbproxy.Tests/Mbproxy.Tests.csproj`. Prior-review findings from `codereviews/2026-05-15/TestsAndConfig.md`
|
||||||
|
were read for regression context; the code is reviewed fresh against the current HEAD.
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
The prior-review remediation is genuine and thorough: every `Major` finding from the
|
||||||
|
2026-05-15 review has been addressed — `StatusBroadcaster.LoopAsync` now has a proper
|
||||||
|
timing-hedged loop test (`Loop_PushesRepeatedly_ThenStopsAfterStopAsync`), the
|
||||||
|
`Record`/`Disarm` race is fixed in production and covered by
|
||||||
|
`ConcurrentRecordAndDisarm_LeavesNoStaleObservation`, `PlcSubscriptionTracker` now has a
|
||||||
|
dedicated test file including a concurrency stress test, the `ThrowingStatusPushSink` fake
|
||||||
|
covers both the swallow-path and the `OperationCanceledException` filter, and
|
||||||
|
`ConfigReconcilerTests` now holds a real registry and asserts add/remove lifecycle. The
|
||||||
|
`HubStatusE2ETests` E2E pair covers the `MapHub`/`SignalRStatusPushSink` gap that the prior
|
||||||
|
review flagged as the single biggest "does the feature actually work" gap. The `EmbeddedAssetsTests`
|
||||||
|
glob guard and the `Get_Asset` byte-verbatim assertion also close the prior `m3`/`m4` nits.
|
||||||
|
No prior-review finding has regressed. The remaining findings below are all new, introduced by
|
||||||
|
the current HEAD or present in the pre-existing code but newly visible against the completed
|
||||||
|
remediation baseline.
|
||||||
|
|
||||||
|
**Findings: 0 Critical · 4 Major · 7 Minor**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Critical
|
||||||
|
|
||||||
|
None.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Major
|
||||||
|
|
||||||
|
**M1 — `AdminPort=0` silently binds a random OS-assigned port instead of disabling the admin endpoint.**
|
||||||
|
`install/mbproxy.config.template.json:83`, `install/mbproxy.linux.config.template.json:87`,
|
||||||
|
`docs/Operations/Configuration.md:54`; `src/Mbproxy/Admin/AdminEndpointHost.cs:190`.
|
||||||
|
|
||||||
|
Both config templates carry the comment `// Set to 0 to disable the admin endpoint.`, and the
|
||||||
|
Operations docs repeat this. The comment is wrong. `AdminEndpointHost.StartAppAsync` calls
|
||||||
|
`k.Listen(System.Net.IPAddress.Any, port)` with whatever `port` is — Kestrel interprets port 0
|
||||||
|
as "pick a free OS-assigned port" (ephemeral port), not as "skip the bind". There is no code in
|
||||||
|
`StartAppAsync` or its callers that special-cases port 0 as a disable signal. A production
|
||||||
|
operator who sets `AdminPort: 0` to disable the admin surface does not get a disabled endpoint;
|
||||||
|
they get an admin endpoint on an unknown ephemeral port that is not logged in the startup banner
|
||||||
|
in a predictable way. This is made worse by:
|
||||||
|
|
||||||
|
1. `ReloadValidator.Validate` (lines 65–68) rejects `AdminPort < 1`, so a hot-reload with `AdminPort: 0`
|
||||||
|
is correctly blocked — but the *initial* startup path does not go through `ReloadValidator`
|
||||||
|
(only `MbproxyOptionsValidator`, which does not range-check `AdminPort`). So port 0 at
|
||||||
|
startup actually takes effect, while the same value in a reload is rejected. The two paths
|
||||||
|
are inconsistent.
|
||||||
|
2. Several tests intentionally use `["Mbproxy:AdminPort"] = "0"` with the comment "disable
|
||||||
|
admin to avoid port conflicts" (`ShutdownE2ETests.cs:163`, `StatusBroadcasterTests.cs:47`,
|
||||||
|
`StatusSnapshotBuilderTests.cs:266`). These tests are not broken — they work because Kestrel
|
||||||
|
binds on 0 and no test probes the admin port — but they silently start an admin endpoint
|
||||||
|
rather than suppressing it. If those tests ever run concurrently with another test that
|
||||||
|
probes all listening ports they can interfere.
|
||||||
|
|
||||||
|
**Impact:** Operators following the documented `AdminPort: 0` procedure to disable the admin
|
||||||
|
surface on a sensitive network have an undocumented, unauthorised HTTP listener running. This is
|
||||||
|
a documentation/behavioural inconsistency on a production-critical setting.
|
||||||
|
|
||||||
|
**Recommendation:** Implement the documented disable. In `AdminEndpointHost.StartAppAsync`, add
|
||||||
|
`if (port == 0) return;` before the builder construction. Then update `ReloadValidator` to allow
|
||||||
|
0 (skip collision check) and update `MbproxyOptionsValidator` to allow 0. The templates and docs
|
||||||
|
are then correct. Alternatively, remove the "Set to 0 to disable" comment and document that the
|
||||||
|
admin endpoint is always started, and rename the test comments to reflect reality.
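
The port-0 behaviour is an OS-level convention and can be observed without Kestrel. The sketch below uses `TcpListener` as a stand-in for the Kestrel bind, and `ShouldStartAdminEndpoint` is a hypothetical helper showing where a disable guard would sit; neither is the project's code.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Binding to port 0 asks the OS for a free ephemeral port; it does not skip the bind.
static int BindAndReportPort(int requestedPort)
{
    var listener = new TcpListener(IPAddress.Loopback, requestedPort);
    listener.Start();
    try
    {
        return ((IPEndPoint)listener.LocalEndpoint).Port;
    }
    finally
    {
        listener.Stop();
    }
}

// Hypothetical guard implementing the documented "0 disables the endpoint" semantics.
static bool ShouldStartAdminEndpoint(int adminPort) => adminPort != 0;

Console.WriteLine(BindAndReportPort(0) != 0);         // True: the OS picked a real port
Console.WriteLine(ShouldStartAdminEndpoint(0));       // False
```

Whichever remediation is chosen, the startup and reload validators would need to agree with the guard so that 0 means the same thing on both paths.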
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**M2 — Multiple E2E tests hard-code `AdminPort: 8080`, causing bind-conflict failures when run in parallel or when port 8080 is already in use.**
|
||||||
|
`tests/Mbproxy.Tests/Proxy/ProxyForwardingTests.cs:67,242,310`; `tests/Mbproxy.Tests/Proxy/RewriterE2ETests.cs:388`;
|
||||||
|
`tests/Mbproxy.Tests/HostSmokeTests.cs:111`.
|
||||||
|
|
||||||
|
`ProxyForwardingTests` (3 configs), `RewriterE2ETests` (1 config), and the shared
|
||||||
|
`TestHostBuilderExtensions.ConfigureForTest` used by all three `HostSmokeTests` hard-code
|
||||||
|
`["Mbproxy:AdminPort"] = "8080"`. When these tests run in parallel (xUnit v3's default policy
|
||||||
|
for non-collection tests) they each try to bind a Kestrel listener on port 8080. The first
|
||||||
|
binder wins; all subsequent tests get a `bind.failed` log and a missing admin endpoint. While
|
||||||
|
the proxy tests themselves do not assert admin-endpoint behaviour, the spurious bind-failed error
|
||||||
|
can mask real failures in CI logs and reduces test isolation. `HostSmokeTests` compounds this:
|
||||||
|
it is not in any `[Collection]` so it runs concurrently with everything else.
|
||||||
|
|
||||||
|
**Impact:** Flaky CI runs when multiple test classes execute simultaneously. Currently mitigated
|
||||||
|
by the `AdminEndpointHost` bind-failure-is-non-fatal path, so test assertions still pass — but
|
||||||
|
the coverage is degraded (admin is not actually up) and the log noise makes CI output harder to
|
||||||
|
read.
|
||||||
|
|
||||||
|
**Recommendation:** Replace the hard-coded `"8080"` values with `PickFreePort()` calls (the
|
||||||
|
same pattern already used by `AdminEndpointTests` and `HubStatusE2ETests`). Update
|
||||||
|
`TestHostBuilderExtensions.ConfigureForTest` to accept a port parameter with default 0 for
|
||||||
|
callers that do not care (smoke tests do not probe the admin surface and an ephemeral-port bind
|
||||||
|
is fine).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**M3 — `DL205SimulatorFixture` swallows the test's own `CancellationToken` in the readiness poll, blocking test cancellation for up to 120 seconds.**
|
||||||
|
`tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs:121-163`.
|
||||||
|
|
||||||
|
`InitializeAsync` constructs a `CancellationTokenSource` with a hard-coded 120-second deadline
|
||||||
|
(line 61) and a `linked` token that chains the deadline against `CancellationToken.None` (line
|
||||||
|
122):
|
||||||
|
|
||||||
|
```csharp
|
||||||
|
using var linked = CancellationTokenSource.CreateLinkedTokenSource(
|
||||||
|
deadline.Token, CancellationToken.None);
|
||||||
|
```
|
||||||
|
|
||||||
|
The test framework's cancellation token — which xUnit v3 exposes as
|
||||||
|
`TestContext.Current.CancellationToken` — is never linked in. If the test runner cancels the
|
||||||
|
test suite (CI timeout, keyboard interrupt), the readiness poll continues for up to 120 more
|
||||||
|
seconds and the spawned Python process is not killed until `DisposeAsync` finally runs. On a CI
|
||||||
|
runner with a per-job timeout this can cause the whole runner to be killed mid-cleanup, leaving
|
||||||
|
orphaned Python processes.
|
||||||
|
|
||||||
|
**Impact:** Poor CI behaviour under cancellation; can leave zombie `python`/`pwsh` processes.
|
||||||
|
|
||||||
|
**Recommendation:** Link the readiness poll to the test runner's cancellation token. xUnit v3's
`IAsyncLifetime.InitializeAsync` takes no parameters, so there is no incoming token to accept;
read `TestContext.Current.CancellationToken` from inside `InitializeAsync` instead:

```csharp
public async ValueTask InitializeAsync() // parameterless; read TestContext.Current.CancellationToken inside
```

At minimum, change `CancellationToken.None` to `TestContext.Current.CancellationToken` inside
`linked` so a kill signal actually propagates.
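
A readiness poll that honours both the deadline and the runner's token might look like the sketch below. `WaitUntilReadyAsync` is a hypothetical helper, and `runnerToken` stands in for `TestContext.Current.CancellationToken` so the example has no xUnit dependency.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Polls until `isReady` returns true, the deadline passes, or the runner cancels.
static async Task<bool> WaitUntilReadyAsync(Func<bool> isReady, TimeSpan deadline, CancellationToken runnerToken)
{
    using var deadlineCts = new CancellationTokenSource(deadline);
    // Linking against runnerToken (rather than CancellationToken.None) lets a runner cancel end the poll.
    using var linked = CancellationTokenSource.CreateLinkedTokenSource(deadlineCts.Token, runnerToken);
    try
    {
        while (!isReady())
            await Task.Delay(TimeSpan.FromMilliseconds(50), linked.Token);
        return true;
    }
    catch (OperationCanceledException)
    {
        return false; // either the deadline or the runner ended the wait
    }
}
```

The caller can then kill the spawned simulator process as soon as this returns `false`, instead of waiting out the full 120 seconds.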
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**M4 — `install.ps1` seeds the config template into `$dataDir` (ProgramData), so an operator who follows the template's `AdminPort: 0` disable hint starts the service with an admin endpoint on an undocumented ephemeral port; the startup path (`MbproxyOptionsValidator`) does not range-check `AdminPort`, and only `ReloadValidator` rejects 0.**
|
||||||
|
(Secondary impact of M1, listed separately because it surfaces through the install path.)
|
||||||
|
`install/install.ps1:153-162`; `install/mbproxy.config.template.json:83-84`.
|
||||||
|
|
||||||
|
When `install.ps1` seeds the template as the initial config and an operator follows the inline
|
||||||
|
comment to set `AdminPort: 0`, the service starts with Kestrel binding on an OS-assigned
|
||||||
|
ephemeral port, not disabled. The `install.ps1` script itself has no bug — the error is upstream
|
||||||
|
in M1 — but the install path is where the misleading comment is first acted upon. Listing it
|
||||||
|
separately so the install script gets a concrete fix recommendation: if M1 is fixed by adding a
|
||||||
|
`port == 0 → skip` guard in `AdminEndpointHost`, the template comment becomes correct and this
|
||||||
|
finding is resolved simultaneously. If M1 is fixed differently, the template comment must still
|
||||||
|
be updated to match whatever the new semantics are.
|
||||||
|
|
||||||
|
**Impact:** Same as M1 — an unexpected open HTTP port on a "disabled" admin config.
|
||||||
|
|
||||||
|
**Recommendation:** Fix M1 first; this finding closes automatically once the template's
|
||||||
|
`AdminPort: 0` comment reflects actual behaviour.
|
||||||
|
|
||||||
|
---

## Minor

**m1 — `HubStatusE2ETests.PickFreePort` has a TOCTOU race: the port can be stolen between `l.Stop()` and Kestrel's bind.**
`tests/Mbproxy.Tests/Admin/HubStatusE2ETests.cs:141-147`.

The helper stops the listener and returns the port number — same pattern used in a dozen other
test files. This is acknowledged as acceptable in the `DL205SimulatorFixture` TOCTOU comment
(line 72). No change needed, but worth noting since `HubStatusE2ETests` is one of the few tests
that allocates two ports in the same method (admin + proxy) and has a wider race window. Low
priority; flagged for awareness.
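
The pattern in question can be sketched as follows. `PickFreePort` here is an assumed reconstruction of the helper's shape based on the description above, not a copy of the test code, and `PickTwoFreePorts` is a hypothetical variant shown only to illustrate how a two-port test could at least rule out receiving the same port twice.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Assumed shape of the helper: bind to port 0, read the assigned port,
// release it, and rely on nobody else taking it before the real server binds.
static int PickFreePort()
{
    var l = new TcpListener(IPAddress.Loopback, 0);
    l.Start();
    int port = ((IPEndPoint)l.LocalEndpoint).Port;
    l.Stop(); // the race window opens here
    return port;
}

// Hypothetical variant for tests that need two ports: both listeners are held
// open at the same time, so the OS cannot hand out the same port twice.
// The race against other processes after Stop() is still present.
static (int First, int Second) PickTwoFreePorts()
{
    var a = new TcpListener(IPAddress.Loopback, 0);
    var b = new TcpListener(IPAddress.Loopback, 0);
    try
    {
        a.Start();
        b.Start();
        return (((IPEndPoint)a.LocalEndpoint).Port, ((IPEndPoint)b.LocalEndpoint).Port);
    }
    finally
    {
        a.Stop();
        b.Stop();
    }
}

var (admin, proxy) = PickTwoFreePorts();
Console.WriteLine($"admin={admin}, proxy={proxy}, single={PickFreePort()}");
```

Neither function removes the TOCTOU window; the second only narrows one failure mode, which is consistent with treating this finding as low priority.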
|
||||||
|
|
||||||
|
---

**m2 — `Loop_PushesRepeatedly_ThenStopsAfterStopAsync` has a 400 ms post-stop idle assertion that can be a source of slowness but not flakiness.**
`tests/Mbproxy.Tests/Admin/StatusBroadcasterTests.cs:188-190`.

```csharp
await Task.Delay(400, TestContext.Current.CancellationToken);
h.Sink.FleetPushes.Count.ShouldBe(afterStop, "no pushes may occur after StopAsync");
```

The test waits 400 ms after `StopAsync` to confirm silence. Given `AdminPushIntervalMs=100`
and that `StopAsync` awaits the loop task (`await _loop`), the loop is guaranteed terminated
before `StopAsync` returns — so zero extra delay is needed for the assertion to be correct. The
400 ms delay adds wall-clock time to every test run without buying additional confidence. The
existing `_loop` await in `StopAsync` is the real guarantee.

**Recommendation:** Drop the `Task.Delay(400, ...)` line and assert the count immediately after
`StopAsync`. The test would be equally correct and ~400 ms faster.
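
The guarantee being relied on can be demonstrated in isolation. The sketch below is a stand-in, not the `StatusBroadcaster` implementation: the loop body, the 10 ms interval and the local names (`LoopAsync`, `StopAsync`, `pushes`) are assumptions made for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int pushes = 0;
using var cts = new CancellationTokenSource();

// Stand-in for the broadcaster loop: push, then wait one interval.
async Task LoopAsync(CancellationToken ct)
{
    try
    {
        while (true)
        {
            Interlocked.Increment(ref pushes);
            await Task.Delay(10, ct);
        }
    }
    catch (OperationCanceledException)
    {
        // Normal shutdown path.
    }
}

Task loop = LoopAsync(cts.Token);

// Stand-in for StopAsync: cancel, then await the loop task itself.
async Task StopAsync()
{
    cts.Cancel();
    await loop; // when this returns, the loop can no longer push
}

await Task.Delay(50);
await StopAsync();

// No idle delay needed: the count is already final here.
int afterStop = Volatile.Read(ref pushes);
Console.WriteLine($"pushes at stop: {afterStop}, pushes now: {Volatile.Read(ref pushes)}");
```

Because `StopAsync` awaits the loop task rather than merely signalling cancellation, the count read immediately afterwards cannot change later.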
|
||||||
|
|
||||||
|
---

**m3 — `FakeGroupManager.Removed` list is still dead code: no test ever asserts it.**
`tests/Mbproxy.Tests/Admin/SignalRFakes.cs:30-31`.

This was noted as prior finding M4 in the 2026-05-15 review ("either delete `Removed` from the
fake, or add a comment"). The list is still present and still has no asserting test.
`StatusHub.OnDisconnectedAsync` never calls `RemoveFromGroupAsync` (relying on SignalR's
implicit group cleanup), so the list is never populated and no assertion against it could ever
observe an entry. The unasserted field is a minor maintenance hazard: a future author may assume
this is the intended assertion target and waste time debugging a test that never fails.

**Recommendation:** Drop the `Removed` list from `FakeGroupManager`, or add a single inline
comment: `// SignalR auto-removes disconnected connections from groups; explicit RemoveFromGroupAsync is not called.`

---

**m4 — `mbproxy.smoke.config.json` still carries a dangling reference to `plans/`.**
`tests/sim/mbproxy.smoke.config.json:1`.

The header comment references `plans/2026-05-15-webui-dashboard.md`. As of commit `7466a46`
("retire superseded design/plan docs"), `plans/` is untracked (`?? plans/` in git status) and
contains no file of that name. The comment is a stale citation that will silently fail any
documentation-link check.

**Recommendation:** Replace the reference with a plain description of the smoke file's purpose,
removing the specific plan filename.

---

**m5 — Both install templates hard-code `AdminPort: 8080` in the `Plcs` section inline comment even though the same information already appears in the admin-port section; the duplication in the PLC entry comment is not wrong but potentially confusing.**
`install/mbproxy.config.template.json:79` (`"// GET /status.json → …"` inside the PLC
description); both templates are identical here.

This is a documentation nit: the `Plcs` array comment block on lines 60–75 is accurate, and the
preceding inline comment about the 4-client cap remains true and relevant. No functional issue.

**Recommendation:** No change required; noted for completeness.

---

**m6 — `publish.ps1` and `publish.sh` do not exit non-zero if the config template copy fails.**
`install/publish.ps1:87-94`; `install/publish.sh:85-95`.

Both scripts call `throw` / `exit 1` if the template file is missing, which is correct. However,
the `Copy-Item -Force` / `cp -f` steps that copy the template into each output flavour are not
checked: they assume the destination directory exists because `dotnet publish` has just created
it. If `dotnet publish` produced no output directory (a degenerate build failure that
`dotnet publish` should have already reported as non-zero), the copy fails with an error, but
neither script checks the result of the copy loop. In practice `dotnet publish` failing before
this point already causes both scripts to throw/exit (the preceding
`if ($LASTEXITCODE -ne 0) { throw }` guards). This is low-risk, but the post-copy result
verification (the `Format-Size` / `du -h` summary block) is output-only and does not fail the
script if a binary is missing — it emits a `Write-Warning` / `echo WARNING` instead.

**Recommendation:** In the result summary loop of both scripts, change the missing-binary branch
from a warning to a non-zero exit:

```powershell
# publish.ps1
if (Test-Path $bin) { ... } else { throw "Expected binary not found: $bin" }
```

```bash
# publish.sh
if [[ -f "$bin" ]]; then ... else echo "ERROR: expected binary not found: $bin" >&2; exit 1; fi
```

---

**m7 — `mbproxy.service` `ReadWritePaths` does not include `/etc/mbproxy`, so a hot-reload write by the service account to its own config directory is blocked by `ProtectSystem=strict`.**
`install/mbproxy.service:40`.

```ini
ReadWritePaths=/var/log/mbproxy /var/cache/mbproxy
```

`ProtectSystem=strict` makes the entire filesystem read-only except paths listed in
`ReadWritePaths`. The service itself never writes to `/etc/mbproxy` (it only reads
`appsettings.json`), so this is not a runtime bug. However:

1. The comment at line 8 states `WorkingDirectory=/etc/mbproxy` — the service account needs
   read access there, which `ProtectSystem=strict` provides (read-only is allowed).
2. If an operator's monitoring or deployment tooling updates the config in-place (e.g.
   `scp`-ing a new `appsettings.json` as the `mbproxy` user via an SSH key), that write will
   be blocked by `ProtectSystem=strict`. The operator would need to write as root and chown/chmod
   as appropriate.

This is likely intentional (config is an admin operation), but the systemd unit has no comment
explaining why `/etc/mbproxy` is intentionally absent from `ReadWritePaths`. It trips up
operators who expect the service account to own its config.

**Recommendation:** Add a comment after the `ReadWritePaths` line: `# /etc/mbproxy is intentionally absent — config changes require root (ProtectSystem=strict makes it read-only for the service account).`

---

## Regression check against 2026-05-15 findings

| Prior finding | Status |
|---|---|
| M1 — `LoopAsync` untested | Closed — `Loop_PushesRepeatedly_ThenStopsAfterStopAsync` added |
| M2 — `Record`/`Disarm` race | Closed — production fix + `ConcurrentRecordAndDisarm_LeavesNoStaleObservation` |
| M3 — `PlcSubscriptionTracker` no direct/concurrent test | Closed — `PlcSubscriptionTrackerTests.cs` added with stress test |
| M4 — `FakeGroupManager.Removed` dead code | Partially addressed: no test added; `Removed` list is still present — see new `m3` |
| m1 — `ThrowingStatusPushSink` missing | Closed — `ThrowingStatusPushSink` and two swallow/propagate tests added |
| m2 — `FakeHubCallerContext.ConnectionAborted` hardcoded | No change; accepted as low-risk |
| m3 — asset allow-list coverage | Closed — `EmbeddedAssetsTests` covers glob; `Get_Asset` asserts bytes verbatim |
| m4 — `Get_Asset` byte assertion missing | Closed — `ReadEmbeddedAsset` comparison added |
| m5 — `*.*` glob caveats | Unchanged; accepted |
| m6 — `mbproxy.smoke.config.json` dangling plans reference | Not fixed — see new `m4` |
| m7 — `ConfigReconcilerTests` registry wiring untested | Closed — `Apply_AddThenRemovePlc_TagCaptureRegistryTracksRoster` added |
| m8 — `AdminPushIntervalMs` upper bound unguarded | Closed — upper bound added to both `ReloadValidator` and `MbproxyOptionsValidator`; four new tests |
| m9 — `OnDisconnectedAsync` always called with null | No change; accepted as low-risk |
| Coverage gap 7 — `/hub/status` SignalR E2E | Closed — `HubStatusE2ETests` (2 facts) added |

## Key file references

- `install/mbproxy.config.template.json:83`, `install/mbproxy.linux.config.template.json:87` — "Set to 0 to disable" comment (M1)
- `src/Mbproxy/Admin/AdminEndpointHost.cs:190` — `k.Listen(..., port)` with no port-0 guard (M1)
- `src/Mbproxy/Configuration/ReloadValidator.cs:65-68` — rejects `AdminPort < 1` on hot-reload but not at startup (M1)
- `tests/Mbproxy.Tests/Proxy/ProxyForwardingTests.cs:67,242,310`, `RewriterE2ETests.cs:388`, `HostSmokeTests.cs:111` — hard-coded port 8080 (M2)
- `tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs:121-122` — `CancellationToken.None` swallows test-runner cancellation (M3)
- `tests/Mbproxy.Tests/Admin/StatusBroadcasterTests.cs:188` — unnecessary 400 ms idle delay (m2)
- `tests/Mbproxy.Tests/Admin/SignalRFakes.cs:30-31` — `Removed` list never asserted (m3)
- `tests/sim/mbproxy.smoke.config.json:1` — dangling `plans/` reference (m4)
- `install/publish.ps1:103-110`, `install/publish.sh:99-106` — warning-only when expected binary missing (m6)
- `install/mbproxy.service:40` — `ReadWritePaths` silent about `/etc/mbproxy` exclusion (m7)
|
||||||
@@ -143,7 +143,7 @@ DL205 / DL260 BCD is non-negative in the default ladder pattern. `BcdCodec.Encod
|
|||||||
|
|
||||||
## Exception Pass-Through
|
## Exception Pass-Through
|
||||||
|
|
||||||
Modbus exception responses pass through unchanged. The rewriter detects an exception response by the high bit of the function code (`fc & 0x80 != 0`), emits a `mbproxy.rewrite.exception_passthrough` event, increments the per-FC exception counter, and returns without touching the payload.
|
Modbus exception responses pass through unchanged. The rewriter detects an exception response by the high bit of the function code (`fc & 0x80 != 0`), emits a `mbproxy.exception.passthrough` event, increments the per-FC exception counter, and returns without touching the payload.
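
The high-bit test can be written out as a short sketch. The helper names below are hypothetical and are not the rewriter's own; the one detail worth noting for C# is that `!=` binds tighter than `&`, so the mask needs explicit parentheses.

```csharp
using System;

// A Modbus exception response echoes the request function code with bit 7 set.
// Parentheses are required: `fc & 0x80 != 0` would not compile in C#.
static bool IsExceptionResponse(byte functionCode) => (functionCode & 0x80) != 0;

// Recover the function code the exception refers to (for example 0x83 -> 0x03),
// which is what a per-FC exception counter would be keyed on.
static byte RequestFunctionCode(byte functionCode) => (byte)(functionCode & 0x7F);

byte[] samples = { 0x03, 0x83, 0x10, 0x90 };
foreach (byte fc in samples)
{
    Console.WriteLine(
        $"0x{fc:X2}: exception={IsExceptionResponse(fc)}, request FC=0x{RequestFunctionCode(fc):X2}");
}
```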
|
||||||
|
|
||||||
Covered exception codes:
|
Covered exception codes:
|
||||||
|
|
||||||
@@ -229,7 +229,7 @@ The rewriter feeds two counters that surface on the status page:
|
|||||||
|
|
||||||
An out-of-range value (`< 0` or `> 9999` for 16-bit; `< 0` or `> 99_999_999` for 32-bit) on a write, or a bad nibble (`>= 0xA`) on a read, increments an internal invalid-BCD counter and emits `mbproxy.rewrite.invalid_bcd` at warning. The PDU passes through raw in that case; the rewriter never substitutes a value the client did not send (writes) or the PLC did not return (reads).
|
An out-of-range value (`< 0` or `> 9999` for 16-bit; `< 0` or `> 99_999_999` for 32-bit) on a write, or a bad nibble (`>= 0xA`) on a read, increments an internal invalid-BCD counter and emits `mbproxy.rewrite.invalid_bcd` at warning. The PDU passes through raw in that case; the rewriter never substitutes a value the client did not send (writes) or the PLC did not return (reads).
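
As a rough illustration of the two 16-bit checks described above, the sketch below uses hypothetical `TryDecodeBcd16` / `TryEncodeBcd16` helpers. They are not the project's codec; a `false` return stands for the "count it, log it, pass the PDU through raw" path.

```csharp
using System;

// Read side: decode one 16-bit BCD register. Any nibble >= 0xA is invalid.
static bool TryDecodeBcd16(ushort raw, out int value)
{
    value = 0;
    for (int shift = 12; shift >= 0; shift -= 4)
    {
        int nibble = (raw >> shift) & 0xF;
        if (nibble >= 0xA)
        {
            value = 0;
            return false; // bad nibble: caller passes the register through untouched
        }
        value = value * 10 + nibble;
    }
    return true;
}

// Write side: encode 0..9999 as 16-bit BCD. Anything else is out of range.
static bool TryEncodeBcd16(int value, out ushort raw)
{
    raw = 0;
    if (value < 0 || value > 9999)
        return false; // out of range: caller passes the client's value through untouched

    for (int shift = 0; shift <= 12; shift += 4)
    {
        raw |= (ushort)((value % 10) << shift);
        value /= 10;
    }
    return true;
}

Console.WriteLine(TryDecodeBcd16(0x1234, out int decoded) ? $"0x1234 -> {decoded}" : "0x1234 invalid");
Console.WriteLine(TryDecodeBcd16(0x12A4, out _) ? "0x12A4 valid" : "0x12A4 -> pass through raw");
Console.WriteLine(TryEncodeBcd16(10000, out _) ? "10000 encoded" : "10000 -> pass through raw");
```

The 32-bit case follows the same idea across a register pair with the `99_999_999` bound; it is omitted here to keep the sketch short.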
|
||||||
|
|
||||||
Both counters are exposed on the status page; see [`../Operations/StatusPage.md`](../Operations/StatusPage.md). The corresponding log events (`mbproxy.rewrite.partial_bcd`, `mbproxy.rewrite.invalid_bcd`, `mbproxy.rewrite.exception_passthrough`) are catalogued in [`../Reference/LogEvents.md`](../Reference/LogEvents.md). Partial-overlap troubleshooting is covered in [`../Operations/Troubleshooting.md`](../Operations/Troubleshooting.md).
|
Both counters are exposed on the status page; see [`../Operations/StatusPage.md`](../Operations/StatusPage.md). The corresponding log events (`mbproxy.rewrite.partial_bcd`, `mbproxy.rewrite.invalid_bcd`, `mbproxy.exception.passthrough`) are catalogued in [`../Reference/LogEvents.md`](../Reference/LogEvents.md). Partial-overlap troubleshooting is covered in [`../Operations/Troubleshooting.md`](../Operations/Troubleshooting.md).
|
||||||
|
|
||||||
The `dl205.json` pymodbus simulator profile encodes BCD test fixtures used by the integration test suite; see [`../Testing/Simulator.md`](../Testing/Simulator.md).
|
The `dl205.json` pymodbus simulator profile encodes BCD test fixtures used by the integration test suite; see [`../Testing/Simulator.md`](../Testing/Simulator.md).
|
||||||
|
|
||||||
|
|||||||
@@ -56,6 +56,7 @@ If a step throws, the exception is logged at Error and the loop continues with t
|
|||||||
| `Cache.EvictionIntervalMs` | Read by the next eviction loop tick. |
|
| `Cache.EvictionIntervalMs` | Read by the next eviction loop tick. |
|
||||||
| `Resilience.ReadCoalescing.Enabled` flipped to `false` | Already-running coalesced entries drain naturally. Subsequent reads bypass coalescing. |
|
| `Resilience.ReadCoalescing.Enabled` flipped to `false` | Already-running coalesced entries drain naturally. Subsequent reads bypass coalescing. |
|
||||||
| `Resilience.ReadCoalescing.MaxParties` | Applies to subsequent attaches. Existing in-flight entries keep their current cap. |
|
| `Resilience.ReadCoalescing.MaxParties` | Applies to subsequent attaches. Existing in-flight entries keep their current cap. |
|
||||||
|
| `Resilience.BackendConnect.*` or `Resilience.ListenerRecovery.*` | **Restart-only.** The backend-connect and listener-recovery Polly pipelines are built from the `Resilience` snapshot taken at service startup; the reconciler builds add/restart supervisors from that same frozen snapshot, so a hot-reload of these values does not propagate to any PLC. Restart the service to change them. |
|
||||||
| Invalid reload (schema break, duplicate ports, duplicate addresses in a resolved tag list, `CacheTtlMs > 60_000` without `Cache.AllowLongTtl = true`) | Reload is rejected as a whole. The current in-memory config stays in effect. `mbproxy.config.reload.rejected` is logged at Error. |
|
| Invalid reload (schema break, duplicate ports, duplicate addresses in a resolved tag list, `CacheTtlMs > 60_000` without `Cache.AllowLongTtl = true`) | Reload is rejected as a whole. The current in-memory config stays in effect. `mbproxy.config.reload.rejected` is logged at Error. |
|
||||||
|
|
||||||
The "next-PDU" wording is load-bearing for the tag-list rows: the rewriter does not snapshot the tag map at connection accept time. It resolves the map for the active PLC at the start of every request frame, so a hot-reloaded tag list is in effect for the very next request, even on existing TCP connections.
|
The "next-PDU" wording is load-bearing for the tag-list rows: the rewriter does not snapshot the tag map at connection accept time. It resolves the map for the active PLC at the start of every request frame, so a hot-reloaded tag list is in effect for the very next request, even on existing TCP connections.
|
||||||
@@ -78,21 +79,22 @@ The `ReloadPlan` distinguishes two kinds of "PLC is still here but changed":
|
|||||||
3. Merge in `Plcs[i].BcdTags.Add` entries — if an address already exists in the working set, the `Add` entry wins. This is how a per-PLC width override is expressed (the global lists a 16-bit tag at the same address; the per-PLC `Add` overrides it to 32-bit).
|
3. Merge in `Plcs[i].BcdTags.Add` entries — if an address already exists in the working set, the `Add` entry wins. This is how a per-PLC width override is expressed (the global lists a 16-bit tag at the same address; the per-PLC `Add` overrides it to 32-bit).
|
||||||
4. Fold `Plcs[i].DefaultCacheTtlMs` into any tag whose explicit `CacheTtlMs` is null.
|
4. Fold `Plcs[i].DefaultCacheTtlMs` into any tag whose explicit `CacheTtlMs` is null.
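
Steps 3 and 4 can be sketched as below. This is an illustration only, not `BcdTagMapBuilder`: the tuple shapes, the `ApplyAddsAndDefaultTtl` name and the assumption that `DefaultCacheTtlMs` is a plain `int` are all made up for the example, and the earlier steps that produce the working set fall outside this excerpt.

```csharp
using System;
using System.Collections.Generic;

// Working set keyed by Modbus address. Tuples stand in for the real tag type.
static Dictionary<ushort, (byte Width, int? CacheTtlMs)> ApplyAddsAndDefaultTtl(
    Dictionary<ushort, (byte Width, int? CacheTtlMs)> workingSet,
    IEnumerable<(ushort Address, byte Width, int? CacheTtlMs)> adds,
    int defaultCacheTtlMs)
{
    var result = new Dictionary<ushort, (byte Width, int? CacheTtlMs)>(workingSet);

    // Step 3: a per-PLC Add entry replaces any entry at the same address.
    foreach (var add in adds)
        result[add.Address] = (add.Width, add.CacheTtlMs);

    // Step 4: tags without an explicit TTL inherit the PLC default.
    foreach (ushort address in new List<ushort>(result.Keys))
    {
        var tag = result[address];
        if (tag.CacheTtlMs is null)
            result[address] = (tag.Width, defaultCacheTtlMs);
    }

    return result;
}

var global = new Dictionary<ushort, (byte Width, int? CacheTtlMs)>
{
    [1024] = (16, null), // fleet-wide 16-bit tag, no explicit TTL
    [1030] = (16, 250),  // explicit TTL survives step 4
};
var adds = new (ushort Address, byte Width, int? CacheTtlMs)[] { (1024, 32, null) };

var resolved = ApplyAddsAndDefaultTtl(global, adds, 500);
foreach (var entry in resolved)
    Console.WriteLine($"{entry.Key}: width={entry.Value.Width}, ttl={entry.Value.CacheTtlMs}");
```

In this example the per-PLC `Add` widens address 1024 to 32 bits and the PLC default of 500 ms fills in its missing TTL, while address 1030 keeps its explicit 250 ms.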
|
||||||
|
|
||||||
The same builder runs both at startup and during reload validation, so a configuration that builds cleanly at startup is guaranteed to build cleanly at reload, and vice versa. There is no second validator that could disagree with the first.
|
The same builder runs both at startup and during reload validation, so a configuration that builds cleanly at startup is guaranteed to build cleanly at reload, and vice versa.
|
||||||
|
|
||||||
## Validation Rules
|
## Validation Rules
|
||||||
|
|
||||||
`ReloadValidator.Validate` is the gate the hot-reload path consults directly. It runs the following checks in order:
|
`ReloadValidator.Validate` is the configuration gate. It runs at **startup** (in `ProxyWorker.ExecuteAsync`, before any supervisor is built — a rejection logs `mbproxy.startup.config.rejected` and the service exits non-zero) **and** on every hot reload. It runs the following checks in order:
|
||||||
|
|
||||||
1. PLC names are non-empty and unique under ordinal comparison.
|
1. PLC names are non-empty and unique under ordinal comparison.
|
||||||
2. Every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the `Plcs` list.
|
2. Every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the `Plcs` list; every `Host` is non-empty and every backend `Port` is in `[1, 65535]`.
|
||||||
3. `AdminPort` is in `[1, 65535]` and does not collide with any `ListenPort`.
|
3. `AdminPort` is in `[1, 65535]`, or `0` to disable the admin endpoint; a non-zero `AdminPort` must not collide with any `ListenPort`.
|
||||||
4. For each PLC, `BcdTagMapBuilder.Build(next.BcdTags, plc.BcdTags, plc.DefaultCacheTtlMs)` reports no errors. This delegates the per-PLC well-formedness checks — duplicate addresses within a single resolved list, and 32-bit entries whose high register (`Address + 1`) overlaps a separate 16-bit entry — to the single source of truth used at startup.
|
4. For each PLC, `BcdTagMapBuilder.Build(next.BcdTags, plc.BcdTags, plc.DefaultCacheTtlMs)` reports no errors. This delegates the per-PLC well-formedness checks — duplicate addresses within a single resolved list, and 32-bit entries whose high register (`Address + 1`) overlaps a separate 16-bit entry — to the single source of truth used at startup.
|
||||||
5. Cache TTL bounds: every `BcdTag.CacheTtlMs` and every `Plcs[i].DefaultCacheTtlMs` must be `>= 0`, and any value above `60_000` ms requires `Cache.AllowLongTtl = true`. `Cache.MaxEntriesPerPlc` and `Cache.EvictionIntervalMs` must be `>= 0`.
|
5. Cache TTL bounds: every `BcdTag.CacheTtlMs` and every `Plcs[i].DefaultCacheTtlMs` must be `>= 0`, and any value above `60_000` ms requires `Cache.AllowLongTtl = true`. `Cache.MaxEntriesPerPlc` must be in `[0, 100000]` and `Cache.EvictionIntervalMs` must be `>= 0`.
|
||||||
|
6. `AdminPushIntervalMs` is in `[1, 60000]`; connection timeouts are `> 0`; the keepalive cross-field rule holds; and the `Resilience` profiles are well-formed (`BackendConnect.MaxAttempts >= 1` with at least `MaxAttempts - 1` non-negative `BackoffMs` entries, `ListenerRecovery.SteadyStateMs > 0`, `ReadCoalescing.MaxParties >= 1`).
|
||||||
|
|
||||||
A failure at any step appends to the error list but the validator runs to completion so the operator sees every problem with a single save. If the list is non-empty, the reload is rejected atomically and no state mutates.
|
A failure at any step appends to the error list but the validator runs to completion so the operator sees every problem with a single save. If the list is non-empty, the reload is rejected atomically and no state mutates (at startup, the service refuses to start).
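
The "collect every error, then reject as a whole" behaviour can be sketched for the port rules as follows. `ValidatePorts` is a hypothetical function written for this example and its messages are illustrative; they are not `ReloadValidator`'s actual wording.

```csharp
using System;
using System.Collections.Generic;

// Returns every problem found; an empty list means the snapshot is acceptable.
static List<string> ValidatePorts(int adminPort, IReadOnlyList<int> listenPorts)
{
    var errors = new List<string>();
    var seen = new HashSet<int>();

    foreach (int port in listenPorts)
    {
        if (port < 1 || port > 65535)
            errors.Add($"ListenPort {port} is outside [1, 65535].");
        else if (!seen.Add(port))
            errors.Add($"ListenPort {port} is used by more than one PLC.");
    }

    // 0 disables the admin endpoint, so there is nothing to range-check or collide with.
    if (adminPort != 0)
    {
        if (adminPort < 1 || adminPort > 65535)
            errors.Add($"AdminPort {adminPort} is outside [1, 65535].");
        else if (seen.Contains(adminPort))
            errors.Add($"AdminPort {adminPort} collides with a ListenPort.");
    }

    return errors;
}

// One save with several mistakes: all of them are reported together.
foreach (string error in ValidatePorts(502, new[] { 502, 502, 0 }))
    Console.WriteLine(error);
```

Running to completion instead of stopping at the first failure is what lets an operator fix a config in one pass.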
|
||||||
|
|
||||||
Schema-level checks — invalid `Width` values on a `BcdTagOptions`, type mismatches, malformed JSON — are also enforced by `MbproxyOptionsValidator` (`IValidateOptions<MbproxyOptions>`) at bind time. The two paths overlap deliberately so both startup and reload reject the same malformed input with the same error wording.
|
Schema-level checks — invalid `Width` values on a `BcdTagOptions`, type mismatches, malformed JSON — are also enforced by `MbproxyOptionsValidator` (`IValidateOptions<MbproxyOptions>`) at bind time. The two validators overlap deliberately; their error wording is similar but not guaranteed identical.
|
||||||
|
|
||||||
### Rejected-reload example
|
### Rejected-reload example
|
||||||
|
|
||||||
|
|||||||
@@ -109,9 +109,19 @@ Port for the read-only HTTP status server. Binds to all interfaces on startup.
|
|||||||
|
|
||||||
`ReloadValidator` rejects values outside `[1, 65535]` and rejects collisions with any `Plcs[i].ListenPort`. Source: `MbproxyOptions.AdminPort`.
|
`ReloadValidator` rejects values outside `[1, 65535]` and rejects collisions with any `Plcs[i].ListenPort`. Source: `MbproxyOptions.AdminPort`.
|
||||||
|
|
||||||
The server exposes `GET /` (auto-refreshing HTML) and `GET /status.json`. See [`./StatusPage.md`](./StatusPage.md) for the schema.
|
The server exposes the SignalR-backed web dashboard (`GET /`, `GET /plc/{name}`, `GET /assets/{path}`, `/hub/status`) and the JSON twin `GET /status.json`. See [`./StatusPage.md`](./StatusPage.md) for the endpoint surface and schema.
|
||||||
|
|
||||||
Authentication is assumed at the network layer (trusted internal segment). The endpoint is read-only — there are no `POST` / `PUT` / `DELETE` routes — so the risk surface is limited to status disclosure. Place the admin port behind a firewall rule that allows only operator workstations.
|
Authentication is assumed at the network layer (trusted internal segment). The endpoint is read-only — no admin actions are exposed — so the risk surface is limited to status disclosure. Place the admin port behind a firewall rule that allows only operator workstations.
|
||||||
|
|
||||||
|
## `Mbproxy.AdminPushIntervalMs`
|
||||||
|
|
||||||
|
Server-push cadence (milliseconds) for the admin dashboard's SignalR feed. Every interval `StatusBroadcaster` builds a status snapshot and pushes it to connected dashboard / detail-page clients.
|
||||||
|
|
||||||
|
| Field | Type | Default | Range |
|
||||||
|
|-------|------|---------|-------|
|
||||||
|
| `AdminPushIntervalMs` | int | `1000` | `1`–`60000` |
|
||||||
|
|
||||||
|
`MbproxyOptionsValidator` and `ReloadValidator` both reject values outside `1`–`60000` ms — the upper bound is a soft guard against a typo (e.g. a seconds value pasted as milliseconds) that would make the "live" feed effectively non-live. The broadcaster additionally floors the effective interval at 100 ms. Source: `MbproxyOptions.AdminPushIntervalMs`.
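
The interaction between the validated range and the broadcaster's floor is easy to misread, so here is a small sketch. Both function names are hypothetical; only the `1`–`60000` range and the 100 ms floor come from the text above.

```csharp
using System;

// Range accepted by the validators.
static bool IsValidPushInterval(int configuredMs) => configuredMs >= 1 && configuredMs <= 60000;

// Interval the broadcaster would actually use: never faster than 100 ms.
static int EffectivePushIntervalMs(int configuredMs) => Math.Max(configuredMs, 100);

foreach (int configured in new[] { 1, 100, 1000, 60000, 120000 })
{
    Console.WriteLine(IsValidPushInterval(configured)
        ? $"{configured} ms: accepted, effective {EffectivePushIntervalMs(configured)} ms"
        : $"{configured} ms: rejected");
}
```

A value such as `1` therefore passes validation but behaves like `100`, which may be worth keeping in mind when tuning the feed.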
|
||||||
|
|
||||||
## `Mbproxy.Plcs[]`
|
## `Mbproxy.Plcs[]`
|
||||||
|
|
||||||
@@ -148,6 +158,7 @@ The fleet-wide BCD tag list. Every PLC starts with this set, then applies its pe
|
|||||||
| `Address` | ushort | `0` | `[0, 65535]` | Modbus PDU address (decimal). Address `0` is valid on DL205/DL260 — do not skip it. Octal V-memory addresses must be converted: `V2000` octal = decimal `1024`. |
|
| `Address` | ushort | `0` | `[0, 65535]` | Modbus PDU address (decimal). Address `0` is valid on DL205/DL260 — do not skip it. Octal V-memory addresses must be converted: `V2000` octal = decimal `1024`. |
|
||||||
| `Width` | byte | `0` | `{ 16, 32 }` | Bit width. `16` is one register holding 4 BCD digits (`0–9999`). `32` is a CDAB-ordered register pair at `Address` (low word) and `Address+1` (high word). |
|
| `Width` | byte | `0` | `{ 16, 32 }` | Bit width. `16` is one register holding 4 BCD digits (`0–9999`). `32` is a CDAB-ordered register pair at `Address` (low word) and `Address+1` (high word). |
|
||||||
| `CacheTtlMs` | int? | `null` | `>= 0`, `<= 60000` unless `Cache.AllowLongTtl = true` | Optional per-tag opt-in to the response cache. `null` falls back to the PLC's `DefaultCacheTtlMs`. `0` explicitly disables caching for this tag even when the PLC default is non-zero. |
|
| `CacheTtlMs` | int? | `null` | `>= 0`, `<= 60000` unless `Cache.AllowLongTtl = true` | Optional per-tag opt-in to the response cache. `null` falls back to the PLC's `DefaultCacheTtlMs`. `0` explicitly disables caching for this tag even when the PLC default is non-zero. |
|
||||||
|
| `Name` | string? | `null` | free-form | Optional human-friendly label (e.g. `"Left AirSP"`). Shown on the connection-detail debug view as the row heading. No effect on Modbus rewriting — purely a display aid. |
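
For the octal conversion mentioned in the `Address` row, a small helper can avoid manual arithmetic. This is a sketch under the assumption that the mapping is a plain octal-to-decimal conversion, as in the `V2000` example; `VMemoryToModbusAddress` is a hypothetical name, not part of the project.

```csharp
using System;

// DirectLOGIC V-memory addresses are written in octal; the config expects decimal.
static ushort VMemoryToModbusAddress(string vAddress)
{
    string digits = vAddress.StartsWith("V", StringComparison.OrdinalIgnoreCase)
        ? vAddress.Substring(1)
        : vAddress;

    // Base 8 parse: 2000 (octal) = 2 * 512 = 1024 (decimal).
    return Convert.ToUInt16(digits, 8);
}

Console.WriteLine(VMemoryToModbusAddress("V2000")); // prints 1024
Console.WriteLine(VMemoryToModbusAddress("V0"));    // prints 0
```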
|
||||||
|
|
||||||
`MbproxyOptionsValidator` rejects any entry whose `Width` is not `16` or `32`. See [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md) for the wire encoding rules and the multi-tag-overlap validation that runs in `BcdTagMapBuilder`.
|
`MbproxyOptionsValidator` rejects any entry whose `Width` is not `16` or `32`. See [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md) for the wire encoding rules and the multi-tag-overlap validation that runs in `BcdTagMapBuilder`.
|
||||||
|
|
||||||
@@ -270,21 +281,22 @@ The cache itself is described in detail in [`../Architecture/ResponseCache.md`](
|
|||||||
|
|
||||||
## Validation Rules
|
## Validation Rules
|
||||||
|
|
||||||
`ReloadValidator.Validate` runs on every config load (startup and hot reload) and rejects the entire snapshot if any rule fails. On rejection at startup, the service exits non-zero. On rejection at runtime, the current in-memory config stays in effect and `mbproxy.config.reload.rejected` is logged at `Error`.
|
`ReloadValidator.Validate` runs on every config load (startup and hot reload) and rejects the entire snapshot if any rule fails. On rejection at startup, the service logs `mbproxy.startup.config.rejected` at `Error` and exits non-zero. On rejection at runtime, the current in-memory config stays in effect and `mbproxy.config.reload.rejected` is logged at `Error`.
|
||||||
|
|
||||||
Rules (in order):
|
Rules (in order):
|
||||||
|
|
||||||
1. **PLC names**: every `Plcs[i].Name` is non-empty and unique (ordinal comparison).
|
1. **PLC names**: every `Plcs[i].Name` is non-empty and unique (ordinal comparison).
|
||||||
2. **ListenPort**: every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the array.
|
2. **ListenPort / Host / Port**: every `Plcs[i].ListenPort` is in `[1, 65535]` and unique across the array; every `Host` is non-empty; every backend `Port` is in `[1, 65535]`.
|
||||||
3. **AdminPort**: in `[1, 65535]` and does not collide with any `ListenPort`.
|
3. **AdminPort**: in `[1, 65535]`, or `0` to disable the admin endpoint; a non-zero value does not collide with any `ListenPort`.
|
||||||
4. **BCD tag map** per PLC, delegated to `BcdTagMapBuilder.Build`:
|
4. **BCD tag map** per PLC, delegated to `BcdTagMapBuilder.Build`:
|
||||||
- duplicate addresses within a single PLC's resolved tag list
|
- duplicate addresses within a single PLC's resolved tag list
|
||||||
- 32-bit entries whose high register (`Address + 1`) overlaps a separate 16-bit entry at that address
|
- 32-bit entries whose high register (`Address + 1`) overlaps a separate 16-bit entry at that address
|
||||||
5. **Cache TTL bounds**:
|
5. **Cache TTL bounds**:
|
||||||
- any `CacheTtlMs` or `DefaultCacheTtlMs` less than 0 is rejected
|
- any `CacheTtlMs` or `DefaultCacheTtlMs` less than 0 is rejected
|
||||||
- any `CacheTtlMs` or `DefaultCacheTtlMs` greater than `60_000` is rejected unless `Cache.AllowLongTtl = true`
|
- any `CacheTtlMs` or `DefaultCacheTtlMs` greater than `60_000` is rejected unless `Cache.AllowLongTtl = true`
|
||||||
6. **Cache size knobs**: `Cache.MaxEntriesPerPlc >= 0`, `Cache.EvictionIntervalMs >= 0`.
|
6. **Cache size knobs**: `Cache.MaxEntriesPerPlc` in `[0, 100000]`, `Cache.EvictionIntervalMs >= 0`.
|
||||||
7. **Width**: every `BcdTagOptions.Width` is `16` or `32` (enforced by `MbproxyOptionsValidator` at schema time).
|
7. **AdminPushIntervalMs / timeouts / keepalive / Resilience**: `AdminPushIntervalMs` in `[1, 60000]`; connection timeouts `> 0`; the keepalive cross-field rule (`BackendHeartbeatIdleMs > BackendRequestTimeoutMs`); and well-formed `Resilience` profiles (`BackendConnect.MaxAttempts >= 1` with `>= MaxAttempts - 1` non-negative `BackoffMs` entries, `ListenerRecovery.SteadyStateMs > 0`, `ReadCoalescing.MaxParties >= 1`).
|
||||||
|
8. **Width**: every `BcdTagOptions.Width` is `16` or `32` (also enforced by `MbproxyOptionsValidator` at schema time).
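
Rule 7 bundles several independent checks; the sketch below spells out the keepalive cross-field rule and the `BackendConnect` shape check. It is a simplified illustration: `ValidateTimingAndResilience`, its parameter list and its messages are hypothetical, and whether the real validator applies the keepalive rule only when keepalive is enabled is not stated in this excerpt.

```csharp
using System;
using System.Collections.Generic;

static List<string> ValidateTimingAndResilience(
    int backendRequestTimeoutMs,
    int backendHeartbeatIdleMs,
    int maxAttempts,
    IReadOnlyList<int> backoffMs)
{
    var errors = new List<string>();

    if (backendRequestTimeoutMs <= 0)
        errors.Add("BackendRequestTimeoutMs must be > 0.");

    // Keepalive cross-field rule: the idle heartbeat must fire later than a
    // single request is allowed to take.
    if (backendHeartbeatIdleMs <= backendRequestTimeoutMs)
        errors.Add("BackendHeartbeatIdleMs must be greater than BackendRequestTimeoutMs.");

    // BackendConnect: N attempts need at least N - 1 delays between them.
    if (maxAttempts < 1)
        errors.Add("BackendConnect.MaxAttempts must be >= 1.");
    else if (backoffMs.Count < maxAttempts - 1)
        errors.Add($"BackendConnect.BackoffMs needs at least {maxAttempts - 1} entries.");

    foreach (int delay in backoffMs)
    {
        if (delay < 0)
            errors.Add($"BackendConnect.BackoffMs entry {delay} must be non-negative.");
    }

    return errors;
}

var ok = ValidateTimingAndResilience(3000, 10000, 3, new[] { 100, 500, 2000 });
var bad = ValidateTimingAndResilience(5000, 3000, 3, new[] { 100 });
Console.WriteLine($"ok: {ok.Count} errors, bad: {bad.Count} errors");
```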
|
||||||
|
|
||||||
Sample rejection messages (logged at `Error` with the structured property `errors` carrying the full list):
|
Sample rejection messages (logged at `Error` with the structured property `errors` carrying the full list):
|
||||||
|
|
||||||
|
|||||||
@@ -1,17 +1,20 @@
|
|||||||
# Status Page
|
# Status Page
|
||||||
|
|
||||||
The status page is the operator-facing view of the running service: an auto-refreshing HTML dashboard at `GET /` and a JSON twin at `GET /status.json` that monitoring scrapers consume. This document describes the endpoint surface, every wire-level field, and how counters map back to architecture decisions.
|
The status page is the operator-facing view of the running service: a live web dashboard backed by SignalR, plus a JSON twin at `GET /status.json` that monitoring scrapers consume. This document describes the endpoint surface, every wire-level field, and how counters map back to architecture decisions.
|
||||||
|
|
||||||
## Endpoint Surface
|
## Endpoint Surface
|
||||||
|
|
||||||
The admin endpoint is owned by `AdminEndpointHost` (see `src/Mbproxy/Admin/AdminEndpointHost.cs`). It exposes exactly two routes:
|
The admin endpoint is owned by `AdminEndpointHost` (see `src/Mbproxy/Admin/AdminEndpointHost.cs`). It exposes:
|
||||||
|
|
||||||
- `GET /` — a single self-contained HTML document with a `<meta http-equiv="refresh" content="5">` tag. The page refreshes every five seconds by reload, not by JavaScript polling. There is no JS bundle, no external CSS, no remote fonts, and no favicon fetch.
|
- `GET /` — the **fleet dashboard** SPA shell: aggregate fleet health cards and a filterable/sortable per-PLC KPI table.
|
||||||
|
- `GET /plc/{name}` — the **connection-detail** SPA shell for one PLC: every per-PLC counter grouped into readable cards, the connected-client list, and a real-time debug view (per-tag PLC-side raw BCD vs. client-side decoded value).
|
||||||
|
- `GET /assets/{path}` — embedded static assets: Bootstrap 5, the SignalR JS client, two vendored IBM Plex woff2 fonts, and the dashboard's own HTML/CSS/JS. Everything is embedded in the binary; nothing is fetched from a CDN, so the UI works on a firewalled network. Served with a long immutable cache header.
|
||||||
- `GET /status.json` — the same in-memory snapshot serialized as JSON via the source-generated `StatusJsonContext` (camelCase property names).
|
- `GET /status.json` — the same in-memory snapshot serialized as JSON via the source-generated `StatusJsonContext` (camelCase property names).
|
||||||
|
- `/hub/status` — the SignalR hub. The two SPA shells open a hub connection and subscribe: the dashboard to the `fleet` group, a detail page to its `plc:{name}` group. A `StatusBroadcaster` loop pushes a fresh snapshot every `Mbproxy.AdminPushIntervalMs` (default 1000 ms).
|
||||||
|
|
||||||
The endpoint is **read-only**. There are no admin actions exposed — no kick-client, no force-reload, no listener restart, no log download. Reload happens automatically via `IOptionsMonitor`; listener recovery is owned by the supervisor. Authentication lives at the network layer: the service binds to `IPAddress.Any` on the admin port and assumes the deployment runs in a trusted internal segment behind a firewall.
|
The endpoint is **read-only**. There are no admin actions exposed — no kick-client, no force-reload, no listener restart, no log download. The detail-page debug view is the one feature with a runtime side effect, and it is benign and read-only: a PLC's tag-value capture is *armed* (begins recording last-seen values) only while at least one detail page is subscribed to it, and *disarmed* when the last viewer leaves. Reload happens automatically via `IOptionsMonitor`; listener recovery is owned by the supervisor. Authentication lives at the network layer: the service binds to `IPAddress.Any` on the admin port and assumes the deployment runs in a trusted internal segment behind a firewall.
|
||||||
|
|
||||||
Both routes call `StatusSnapshotBuilder.Build()` for every request. The builder reads atomic counters directly from the supervisor map and per-PLC `ProxyCounters`; it holds no locks and performs no I/O.
|
`GET /status.json` and every SignalR push call `StatusSnapshotBuilder.Build()`. The builder reads atomic counters directly from the supervisor map and per-PLC `ProxyCounters`; it holds no locks and performs no I/O.
|
||||||
|
|
||||||
## Port and Configuration
|
## Port and Configuration
|
||||||
|
|
||||||
@@ -291,19 +294,43 @@ A representative two-PLC deployment, ~2 hours into a run:
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
## HTML Page Layout
|
## Web Dashboard
|
||||||
|
|
||||||
The HTML renderer is `StatusHtmlRenderer.Render(StatusResponse)` in `src/Mbproxy/Admin/StatusHtmlRenderer.cs`. The page is one document, inline CSS in a `<style>` block, no external resources of any kind — operators can serve it behind a corporate firewall without whitelisting a CDN.
|
The UI is a Bootstrap 5 single-page app served from embedded assets under `src/Mbproxy/Admin/wwwroot/` (`index.html` / `plc.html` shells, `theme.css` + per-view CSS/JS, vendored Bootstrap / SignalR client / IBM Plex fonts). It is built as vanilla JS — no framework, no build step. Updates arrive over the SignalR `/hub/status` feed (`StatusBroadcaster`, ~1 s cadence); there is no page reload and no JavaScript polling.
|
||||||
|
|
||||||
Structure:
|
### Fleet dashboard (`GET /`)
|
||||||
|
|
||||||
1. **Header summary** — version, formatted uptime (`Nh MMm SSs`), `bound/configured` listener tally, last reload timestamp, reload count with a `(N rejected)` suffix when applicable.
|
1. **App bar** — service version, formatted uptime, accepted-reload count, and a live SignalR connection-state pill.
|
||||||
2. **PLC table** — one row per configured PLC. Columns: Name, Host, Port, State (colour-coded — `bound` = green, `recovering` = orange, `stopped` = grey), Clients (count plus a comma-separated list of `remote (N PDUs)`), PDUs forwarded, FC03/FC04/FC06/FC16/FC? counts, BCD slots, Partial BCD, exception codes 01/02/03/04, RTT (ms), bytes in/out, multiplexer columns (in-flight, max in-flight, TxId wraps, cascades, queue), coalescing ratio cell, cache ratio cell, keepalive cell.
|
2. **Aggregate strip** — six cards: listeners bound/configured, total connected clients, fleet PDU/s (rate derived client-side from successive snapshots), PLCs in `recovering`, total backend exceptions, fleet cache hit ratio. The recovering / exceptions cards highlight when non-zero.
|
||||||
3. **State cell error detail** — when `state == "recovering"`, the cell also shows `lastBindError` and `(attempt N)` in a small red span.
|
3. **KPI table** — one row per configured PLC, Tier-1 columns only: PLC name, backend `host:listenPort`, state chip (`bound` green / `recovering` amber / `stopped` grey), clients, PDU/s, RTT ms, exception total, coalesce %, cache %, keepalive. The table is client-side filterable (name/host search, state, "problems only") and sortable — column headers are keyboard-operable (Tab to focus, Enter/Space to sort) and carry `aria-sort`. The PLC name is a link that opens that PLC's detail page in a new tab.
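
The "rate derived client-side from successive snapshots" idea used for the PDU/s figures can be sketched as follows. This is a minimal illustration in Python rather than the dashboard's actual JavaScript; the function name and its parameters are hypothetical, and it assumes each snapshot carries a cumulative counter plus the time it was taken.

```python
from typing import Optional


def rate_per_second(prev_count: int, prev_time: float,
                    curr_count: int, curr_time: float) -> Optional[float]:
    """Derive an events-per-second rate from two cumulative-counter samples.

    Returns None when no meaningful rate can be computed, so the caller
    can render a placeholder instead of a misleading number.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        # Duplicate or out-of-order snapshot: no time has passed.
        return None
    delta = curr_count - prev_count
    if delta < 0:
        # Counter went backwards (assumed here to mean a service restart).
        return None
    return delta / elapsed
```

Returning `None` rather than zero on a counter reset is a design choice of this sketch: it avoids showing a spurious dip for the one interval that spans the restart.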
|
||||||
|
|
||||||
The coalescing and cache cells each render as `<pct>% (<hits>)`. When neither has been exercised (`hit + miss == 0`), the cell renders an em-dash to keep the column narrow. The keepalive cell shows the heartbeat-sent count, with `(fail N, idle-disc N)` appended only when either is non-zero. Page weight is bounded by the design budget (≤ 50 KB for a 54-PLC fleet).
|
### Connection detail (`GET /plc/{name}`)
|
||||||
|
|
||||||
The page does not depend on JavaScript. Refresh is driven entirely by the `<meta http-equiv="refresh" content="5">` tag, so any browser — including text-mode browsers — sees the same view.
|
1. **Identity header** — PLC name, `host:listenPort`, state chip. If the PLC was removed by a hot-reload, a "no longer configured" notice replaces the counter cards.
|
||||||
|
2. **Grouped counter cards** — every per-PLC counter from the JSON schema above, regrouped for readability: Listener, Clients (with the per-connection list), PDU traffic, Backend health, Multiplexer, Read coalescing, Response cache, Keepalive, Bytes.
|
||||||
|
3. **Debug view** — a per-tag table showing, for each configured BCD tag, the last raw PLC-side value (BCD nibbles in hex), the decoded client-side value, the direction (read/write), and the age of the observation. The header carries a capture-armed indicator. See *Debug View Data* below.
|
||||||
|
|
||||||
|
## Debug View Data
|
||||||
|
|
||||||
|
The detail page's debug view is fed by an **on-demand per-tag value capture** (`Proxy/TagValueCapture.cs`, one per PLC, held in `Proxy/TagCaptureRegistry.cs`). The `BcdPduPipeline` records the last raw/decoded value for each configured BCD tag — but only while the capture is *armed*. `StatusBroadcaster` reconciles arm state every push cycle from `PlcSubscriptionTracker`: a PLC's capture is armed exactly while at least one detail-page browser tab is open, and disarmed (clearing all slots) otherwise — so the hot path carries zero cost when nobody is watching. The tracker keys on a stable per-page-load tab id, not the SignalR `ConnectionId`, so a transport reconnect cannot leak an armed capture. The per-PLC payload is `PlcDetailResponse` (`src/Mbproxy/Admin/DebugDto.cs`):
|
||||||
|
|
||||||
|
> When the response cache is enabled, an FC03/FC04 **cache hit** bypasses the pipeline. To keep the debug view live for cached tags, each cache entry carries the tag observations captured when it was stored (only when a viewer was armed at that time); a hit replays them into the capture, re-stamped to the hit time. The debug view therefore reflects the value the client actually receives — cache-served reads included — not only backend round-trips.
|
||||||
|
|
||||||
|
| JSON path | Type | Meaning |
|
||||||
|
|---|---|---|
|
||||||
|
| `plc` | `PlcStatus?` | The standard per-PLC status row, or `null` if the PLC was removed by a hot-reload. |
|
||||||
|
| `debug.captureArmed` | `bool` | Whether a detail page currently has the capture armed. |
|
||||||
|
| `debug.tags[].address` | `int` | BCD tag PDU address. |
|
||||||
|
| `debug.tags[].width` | `int` | 16 or 32. |
|
||||||
|
| `debug.tags[].name` | `string?` | Optional human-friendly tag label from config (`BcdTags[].Name`); `null` when unset. Shown as the debug-row heading, with the PDU address as sub-text. |
|
||||||
|
| `debug.tags[].hasValue` | `bool` | `false` until the first observation since the capture was armed. |
|
||||||
|
| `debug.tags[].direction` | `string` | `"read"` (FC03/FC04) or `"write"` (FC06/FC16). |
|
||||||
|
| `debug.tags[].rawHex` | `string` | Raw PLC-side value as BCD nibbles — `0xLLLL` (16-bit) or `0xHHHHLLLL` (32-bit). |
|
||||||
|
| `debug.tags[].decodedValue` | `long` | Decoded binary integer the client reads/wrote. |
|
||||||
|
| `debug.tags[].updatedAtUtc` | `string?` | ISO-8601 time of the observation; `null` when no traffic yet. |
|
||||||
|
| `debug.tags[].ageSeconds` | `double?` | Seconds since the observation; `null` when no traffic yet. |
|
||||||
|
|
||||||
|
`PlcDetailResponse` is delivered **only** over the `/hub/status` SignalR feed (the `"plc"` message); there is no `GET` route for it, and it is serialized through the SignalR JSON protocol rather than `StatusJsonContext`. Scrapers that want per-PLC counters use the `plcs[]` array of `GET /status.json` instead — the debug-view capture has no JSON-twin endpoint.
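
The relationship between `rawHex` and `decodedValue` in the table above can be illustrated with a short sketch. It assumes each hex nibble holds one decimal digit, and that a 32-bit value is presented high word first (`0xHHHHLLLL`), so the result equals `high * 10000 + low`. The function name is hypothetical and this is not the proxy's own decoder.

```python
def decode_raw_hex(raw_hex: str) -> int:
    """Decode a BCD string such as '0x1234' or '0x00120034' to an integer."""
    if raw_hex[:2].lower() != "0x":
        raise ValueError("expected a 0x prefix: %r" % raw_hex)
    nibbles = raw_hex[2:]
    if len(nibbles) not in (4, 8):
        raise ValueError("expected 4 or 8 hex digits: %r" % raw_hex)
    if any(ch not in "0123456789" for ch in nibbles):
        # A nibble of A-F is not a valid BCD digit.
        raise ValueError("non-BCD nibble in %r" % raw_hex)
    # Every nibble is one decimal digit, so the digit string reads as decimal.
    # For 8 nibbles this is the same as high * 10000 + low.
    return int(nibbles, 10)
```

For example, `decode_raw_hex("0x1234")` yields 1234, and `decode_raw_hex("0x00120034")` yields 120034 (high word 12, low word 34).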
|
||||||
|
|
||||||
## How to Scrape It
|
## How to Scrape It
|
||||||
|
|
||||||
|
|||||||
@@ -45,6 +45,18 @@ Fires once after `ProxyWorker.StartAsync` has spun up every per-PLC supervisor a
|
|||||||
|
|
||||||
**Operator action:** if the two counts disagree, search for `mbproxy.startup.bind.failed` entries to identify the missing PLCs.
|
**Operator action:** if the two counts disagree, search for `mbproxy.startup.bind.failed` entries to identify the missing PLCs.
|
||||||
|
|
||||||
|
### mbproxy.startup.config.rejected
|
||||||
|
|
||||||
|
**Level:** Error · **EventId:** 2 · **Source:** `src/Mbproxy/Proxy/ProxyWorker.cs`
|
||||||
|
|
||||||
|
| Property | Type | Meaning |
|
||||||
|
|----------|------|---------|
|
||||||
|
| `Errors` | `string` | Concatenated validation failures (one per `;`). |
|
||||||
|
|
||||||
|
Fires once at startup when `ReloadValidator.Validate` rejects the initial `appsettings.json` — duplicate listen ports, an `AdminPort` collision, duplicate PLC names, a malformed BCD tag list, a bad keepalive cross-field relationship, or an invalid `Resilience` profile. The service then exits non-zero; no listeners are started. This is the startup-time twin of `mbproxy.config.reload.rejected`.
|
||||||
|
|
||||||
|
**Operator action:** fix the offending entry in `appsettings.json` and restart the service. The error text names every failed rule.
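
A few of the rules named above (duplicate listen ports, an `AdminPort` collision, duplicate PLC names) can be sketched as a pre-flight check. This is an illustrative Python sketch only, not `ReloadValidator.Validate` itself; the function names and the error wording are assumptions, and it expects the contents of the `Mbproxy` section as an already-parsed dictionary.

```python
from typing import List


def validate_mbproxy_section(section: dict) -> List[str]:
    """Collect every failed rule instead of stopping at the first one."""
    errors: List[str] = []
    admin_port = section.get("AdminPort", 0)  # 0 disables the admin endpoint
    seen_names = set()
    seen_ports = set()
    for plc in section.get("Plcs", []):
        name = plc.get("Name")
        port = plc.get("ListenPort")
        if name in seen_names:
            errors.append("duplicate PLC name '%s'" % name)
        seen_names.add(name)
        if port in seen_ports:
            errors.append("duplicate ListenPort %s" % port)
        seen_ports.add(port)
        if admin_port != 0 and port == admin_port:
            errors.append("ListenPort %s collides with AdminPort" % port)
    return errors


def format_errors(errors: List[str]) -> str:
    """Join failures with ';', mirroring the shape of the `Errors` property."""
    return "; ".join(errors)
```

Running such a check against `appsettings.json` before restarting the service would surface all failing entries in one pass, which matches the event's behaviour of naming every failed rule.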
|
||||||
|
|
||||||
### mbproxy.startup.bind
|
### mbproxy.startup.bind
|
||||||
|
|
||||||
**Level:** Information · **EventId:** 20 (`PlcListener`) / 40 (`PlcListenerSupervisor`) · **Source:** `src/Mbproxy/Proxy/PlcListener.cs`, `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`
|
**Level:** Information · **EventId:** 20 (`PlcListener`) / 40 (`PlcListenerSupervisor`) · **Source:** `src/Mbproxy/Proxy/PlcListener.cs`, `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`
|
||||||
@@ -60,7 +72,7 @@ Fires when a per-PLC `TcpListener` successfully binds its configured port. Emitt
|
|||||||
|
|
||||||
### mbproxy.startup.bind.failed
|
### mbproxy.startup.bind.failed
|
||||||
|
|
||||||
**Level:** Error · **EventId:** 21 (`ProxyWorker`) / 41 (`PlcListenerSupervisor`) · **Source:** `src/Mbproxy/Proxy/ProxyWorker.cs`, `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`
|
**Level:** Error · **EventId:** 41 · **Source:** `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`
|
||||||
|
|
||||||
| Property | Type | Meaning |
|
| Property | Type | Meaning |
|
||||||
|----------|------|---------|
|
|----------|------|---------|
|
||||||
@@ -88,7 +100,7 @@ Fires after the supervisor's Polly recovery pipeline successfully rebinds a list
|
|||||||
|
|
||||||
### mbproxy.listener.faulted
|
### mbproxy.listener.faulted
|
||||||
|
|
||||||
**Level:** Error (`PlcListener`) / Warning (`PlcListenerSupervisor`) · **EventId:** 22 / 43 · **Source:** `src/Mbproxy/Proxy/PlcListener.cs`, `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`
|
**Level:** Warning · **EventId:** 43 · **Source:** `src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs`
|
||||||
|
|
||||||
| Property | Type | Meaning |
|
| Property | Type | Meaning |
|
||||||
|----------|------|---------|
|
|----------|------|---------|
|
||||||
@@ -96,7 +108,7 @@ Fires after the supervisor's Polly recovery pipeline successfully rebinds a list
|
|||||||
| `Port` | `int` | Port whose listener faulted. |
|
| `Port` | `int` | Port whose listener faulted. |
|
||||||
| `Reason` | `string` | Top-level exception message. |
|
| `Reason` | `string` | Top-level exception message. |
|
||||||
|
|
||||||
Fires when a listener's accept loop throws. The two sources emit at different levels deliberately: the unsupervised `PlcListener` instance logs at `Error` (a terminal condition for that listener), while the supervised emission is `Warning` because Polly will retry. The supervised path attaches the exception object as the `LoggerMessage` exception parameter, so the stack trace is captured.
|
Fires when a listener's accept loop throws. `PlcListener.RunAsync` propagates the fault to its `PlcListenerSupervisor`, which logs this event at `Warning` (Polly will retry) with the exception object attached as the `LoggerMessage` exception parameter, so the stack trace is captured.
|
||||||
|
|
||||||
**Operator action:** if the same `Plc` produces repeated faults inside a few minutes, inspect the network path. A burst of faults paired with `mbproxy.multiplex.backend.disconnected` indicates the PLC itself is unhealthy rather than a proxy issue.
|
**Operator action:** if the same `Plc` produces repeated faults inside a few minutes, inspect the network path. A burst of faults paired with `mbproxy.multiplex.backend.disconnected` indicates the PLC itself is unhealthy rather than a proxy issue.
|
||||||
|
|
||||||
@@ -138,6 +150,48 @@ Fires when the admin endpoint cannot bind its configured `AdminPort`. The servic
|
|||||||
|
|
||||||
**Operator action:** change `Mbproxy:AdminPort` in `appsettings.json` to a free port. Hot-reload picks up the change; the admin endpoint rebinds without a service restart.
|
**Operator action:** change `Mbproxy:AdminPort` in `appsettings.json` to a free port. Hot-reload picks up the change; the admin endpoint rebinds without a service restart.
|
||||||
|
|
||||||
|
### mbproxy.admin.broadcast.snapshot.failed
|
||||||
|
|
||||||
|
**Level:** Error · **EventId:** 72 · **Source:** `src/Mbproxy/Admin/StatusBroadcaster.cs`
|
||||||
|
|
||||||
|
No structured properties; the exception is attached.
|
||||||
|
|
||||||
|
Fires when the live-dashboard push loop cannot build a status snapshot. The current push cycle is skipped; the loop retries on the next interval. The proxy data path is unaffected.
|
||||||
|
|
||||||
|
**Operator action:** none if isolated. Sustained occurrences mean the status-snapshot builder is consistently throwing; capture the attached exception and investigate.
|
||||||
|
|
||||||
|
### mbproxy.admin.broadcast.fleet.failed
|
||||||
|
|
||||||
|
**Level:** Error · **EventId:** 73 · **Source:** `src/Mbproxy/Admin/StatusBroadcaster.cs`
|
||||||
|
|
||||||
|
No structured properties; the exception is attached.
|
||||||
|
|
||||||
|
Fires when the push loop fails to deliver the fleet snapshot to dashboard subscribers (a SignalR transport fault). The loop continues; per-PLC detail pushes are still attempted.
|
||||||
|
|
||||||
|
**Operator action:** none if isolated. Sustained occurrences mean the SignalR feed is unhealthy — the dashboard's "live" feed is stale even though the proxy is fine.
|
||||||
|
|
||||||
|
### mbproxy.admin.broadcast.detail.failed
|
||||||
|
|
||||||
|
**Level:** Error · **EventId:** 74 · **Source:** `src/Mbproxy/Admin/StatusBroadcaster.cs`
|
||||||
|
|
||||||
|
| Property | Type | Meaning |
|
||||||
|
|----------|------|---------|
|
||||||
|
| `Plc` | `string` | Configured PLC name whose detail push failed. |
|
||||||
|
|
||||||
|
Fires when the push loop fails to deliver a per-PLC detail snapshot to that PLC's detail-page subscribers. The loop continues with the remaining PLCs.
|
||||||
|
|
||||||
|
**Operator action:** none if isolated. Sustained occurrences for one `Plc` mean that PLC's detail page is not receiving live updates.
|
||||||
|
|
||||||
|
### mbproxy.admin.broadcast.loop.terminated
|
||||||
|
|
||||||
|
**Level:** Error · **EventId:** 75 · **Source:** `src/Mbproxy/Admin/StatusBroadcaster.cs`
|
||||||
|
|
||||||
|
No structured properties; the exception is attached.
|
||||||
|
|
||||||
|
Fires when the live-dashboard push loop itself terminates on an unhandled exception (not the expected cancellation at shutdown). The dashboard's live feed stops entirely until the admin endpoint is rebound (an `AdminPort` hot-reload restarts the loop).
|
||||||
|
|
||||||
|
**Operator action:** alert. The live feed is dead; capture the attached exception and restart the admin endpoint (toggle `Mbproxy:AdminPort`) or the service.
|
||||||
|
|
||||||
### mbproxy.shutdown.complete
|
### mbproxy.shutdown.complete
|
||||||
|
|
||||||
**Level:** Information · **EventId:** 80 · **Source:** `src/Mbproxy/Diagnostics/ShutdownCoordinator.cs`
|
**Level:** Information · **EventId:** 80 · **Source:** `src/Mbproxy/Diagnostics/ShutdownCoordinator.cs`
|
||||||
@@ -531,7 +585,7 @@ Each subsystem owns a single `*LogEvents.cs` static partial class with `[LoggerM
|
|||||||
- `src/Mbproxy/Proxy/Cache/CacheLogEvents.cs` — response cache.
|
- `src/Mbproxy/Proxy/Cache/CacheLogEvents.cs` — response cache.
|
||||||
- `src/Mbproxy/Proxy/RewriterLogEvents.cs` — BCD rewriting and exception passthrough.
|
- `src/Mbproxy/Proxy/RewriterLogEvents.cs` — BCD rewriting and exception passthrough.
|
||||||
|
|
||||||
Lifecycle events (`startup.*`, `listener.*`, `admin.*`, `shutdown.*`, `config.reload.*`) live as private `[LoggerMessage]` declarations next to the class that emits them — see `ProxyWorker.cs`, `PlcListener.cs`, `PlcListenerSupervisor.cs`, `AdminEndpointHost.cs`, `ShutdownCoordinator.cs`, and `ConfigReconciler.cs`. New subsystems should follow the `*LogEvents.cs` pattern when they accumulate more than two events.
|
Lifecycle events (`startup.*`, `listener.*`, `admin.*`, `shutdown.*`, `config.reload.*`) live as private `[LoggerMessage]` declarations next to the class that emits them — see `ProxyWorker.cs`, `PlcListener.cs`, `PlcListenerSupervisor.cs`, `AdminEndpointHost.cs`, `StatusBroadcaster.cs` (the `admin.broadcast.*` family), `ShutdownCoordinator.cs`, and `ConfigReconciler.cs`. New subsystems should follow the `*LogEvents.cs` pattern when they accumulate more than two events.
|
||||||
|
|
||||||
## Related Documentation
|
## Related Documentation
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,50 @@
|
|||||||
|
@echo off
|
||||||
|
REM ---------------------------------------------------------------------------
|
||||||
|
REM Installs Mbproxy as a Windows service named "mbproxy", running the
|
||||||
|
REM Mbproxy.exe located in THIS folder. The service reads appsettings.json
|
||||||
|
REM from this same folder. Run this script as Administrator.
|
||||||
|
REM
|
||||||
|
REM For the fuller install (Windows Event Log source, ProgramData config /
|
||||||
|
REM log dirs, ACLs) use install\install.ps1 instead.
|
||||||
|
REM ---------------------------------------------------------------------------
|
||||||
|
setlocal
|
||||||
|
set "SVC=mbproxy"
|
||||||
|
set "BIN=%~dp0Mbproxy.exe"
|
||||||
|
|
||||||
|
net session >nul 2>&1
|
||||||
|
if errorlevel 1 (
|
||||||
|
echo ERROR: this script must be run as Administrator.
|
||||||
|
pause
|
||||||
|
exit /b 1
|
||||||
|
)
|
||||||
|
|
||||||
|
if not exist "%BIN%" (
|
||||||
|
echo ERROR: Mbproxy.exe was not found next to this script:
|
||||||
|
echo %BIN%
|
||||||
|
pause
|
||||||
|
exit /b 1
|
||||||
|
)
|
||||||
|
|
||||||
|
sc query %SVC% >nul 2>&1
|
||||||
|
if not errorlevel 1 (
|
||||||
|
echo Service "%SVC%" already exists. Run remove-service.bat first to reinstall.
|
||||||
|
pause
|
||||||
|
exit /b 1
|
||||||
|
)
|
||||||
|
|
||||||
|
echo Installing service "%SVC%"
|
||||||
|
echo binary: %BIN%
|
||||||
|
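REM Embed quotes in binPath so an install folder containing spaces is stored as a quoted image path.
sc create %SVC% binPath= "\"%BIN%\"" start= auto DisplayName= "Mbproxy - Modbus TCP BCD proxy"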
|
||||||
|
if errorlevel 1 (
|
||||||
|
echo ERROR: sc create failed.
|
||||||
|
pause
|
||||||
|
exit /b 1
|
||||||
|
)
|
||||||
|
|
||||||
|
REM Auto-restart 60 s after each of the first two failures; nothing after that.
|
||||||
|
sc failure %SVC% reset= 86400 actions= restart/60000/restart/60000//0 >nul
|
||||||
|
|
||||||
|
echo.
|
||||||
|
echo Installed. Edit appsettings.json in this folder if needed, then run
|
||||||
|
echo start-service.bat to start it.
|
||||||
|
pause
|
||||||
@@ -1,222 +1,91 @@
|
|||||||
// mbproxy configuration template — copy to %ProgramData%\mbproxy\appsettings.json
|
// mbproxy configuration. Copy to %ProgramData%\mbproxy\appsettings.json and edit
|
||||||
// and edit before starting the service.
|
// before starting the service. install.ps1 seeds this file only when none exists —
|
||||||
|
// an existing appsettings.json is always preserved across reinstalls.
|
||||||
//
|
//
|
||||||
// The .NET configuration loader accepts // and /* */ comments in JSON files
|
// JSONC: // and /* */ comments are accepted. The file is hot-reloaded on save.
|
||||||
// (JSONC semantics) when using the default Host.CreateApplicationBuilder path.
|
|
||||||
//
|
//
|
||||||
// IMPORTANT: This file is overwritten on each install ONLY if no appsettings.json
|
// FULL REFERENCE — every key, type, default, range, validation rule and hot-reload
|
||||||
// already exists at the destination. An existing file is always preserved.
|
// behaviour — lives in docs/Operations/Configuration.md. The notes below are brief
|
||||||
|
// pointers only; consult that document before editing.
|
||||||
{
|
{
|
||||||
"Mbproxy": {
|
"Mbproxy": {
|
||||||
|
|
||||||
// ── Global BCD tag list ─────────────────────────────────────────────────────────────
|
// Fleet-wide BCD tag list — applies to every PLC. Each entry: Address (Modbus
|
||||||
// These tags apply to EVERY PLC by default.
|
// PDU-decimal), Width (16 or 32), optional Name (debug-view label) and CacheTtlMs.
|
||||||
// Each entry: Address (Modbus PDU address, decimal), Width (16 or 32 bits).
|
// Per-PLC Add/Remove overrides go under Plcs[].BcdTags. Trailing comments give the
|
||||||
//
|
// 4xxxx Modbus address and the DirectLOGIC V-memory reference.
|
||||||
// Width 16 — one register holds 4 BCD digits (0–9999).
|
|
||||||
// Wire value 0x1234 decodes to decimal 1234.
|
|
||||||
//
|
|
||||||
// Width 32 — a CDAB-ordered register pair (Address = low word, Address+1 = high word).
|
|
||||||
// Decoded decimal = high * 10000 + low (DirectLOGIC CDAB word order).
|
|
||||||
//
|
|
||||||
// Per-PLC overrides (see Plcs[].BcdTags below):
|
|
||||||
// Add — appends extra tags beyond what Global defines, or overrides a
|
|
||||||
// Global entry's Width when the same Address appears in both.
|
|
||||||
// Remove — removes specific addresses from the effective set for that PLC.
|
|
||||||
// Effective set = (Global ∪ Add) − Remove, resolved per PDU.
|
|
||||||
"BcdTags": {
|
"BcdTags": {
|
||||||
"Global": [
|
"Global": [
|
||||||
// V2000 (octal) = decimal address 1024. 16-bit BCD counter.
|
// 16-bit setpoints
|
||||||
{ "Address": 1024, "Width": 16 },
|
{ "Address": 1536, "Width": 16, "Name": "Left ArgonSP" }, // 41537
|
||||||
|
{ "Address": 1539, "Width": 16, "Name": "Right ArgonSP" }, // 41540
|
||||||
|
{ "Address": 1544, "Width": 16, "Name": "Left ChlorineSP" }, // 41545 · V3010
|
||||||
|
{ "Address": 1545, "Width": 16, "Name": "Right ChlorineSP" }, // 41546 · V3011
|
||||||
|
{ "Address": 1546, "Width": 16, "Name": "Left HydrogenSP" }, // 41547 · V3012
|
||||||
|
{ "Address": 1547, "Width": 16, "Name": "Right HydrogenSP" }, // 41548 · V3013
|
||||||
|
{ "Address": 1548, "Width": 16, "Name": "Left AirSP" }, // 41549 · V3014
|
||||||
|
{ "Address": 1549, "Width": 16, "Name": "Right AirSP" }, // 41550 · V3015
|
||||||
|
|
||||||
// V2040 (octal) = decimal address 1056. 32-bit BCD total at 1056/1057.
|
// 32-bit runtimes — CDAB pair spanning Address and Address+1
|
||||||
{ "Address": 1056, "Width": 32 },
|
{ "Address": 4616, "Width": 32, "Name": "MTA Runtime Left (min)" }, // 44617/44618 · V11010
|
||||||
|
{ "Address": 4618, "Width": 32, "Name": "MTA Runtime Right (min)" }, // 44619/44620 · V11012
|
||||||
// V2100 (octal) = decimal address 1088. 16-bit BCD setpoint.
|
{ "Address": 4626, "Width": 32, "Name": "FRR Runtime Left (min)" }, // 44627/44628 · V11022
|
||||||
//
|
{ "Address": 4628, "Width": 32, "Name": "FRR Runtime Right (min)" } // 44629/44630 · V11024
|
||||||
// Phase 11: CacheTtlMs (optional) opts this tag into the response cache. With
|
|
||||||
// CacheTtlMs > 0 set, upstream clients reading this register will see values up
|
|
||||||
// to CacheTtlMs MILLISECONDS OLD — explicit acknowledgement of the staleness
|
|
||||||
// window is required by enabling it. Default (omitted or 0) = cache disabled
|
|
||||||
// for this tag. The cache is OFF by default for every tag.
|
|
||||||
{ "Address": 1088, "Width": 16 /* , "CacheTtlMs": 1000 */ }
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
||||||
// ── PLC list ────────────────────────────────────────────────────────────────────────
|
// One entry per PLC: upstream clients connect to ListenPort, the proxy forwards to
|
||||||
// Each entry maps one upstream proxy port → one backend PLC.
|
// Host:Port. ListenPort must be unique. Optional per-PLC "BcdTags": { "Add", "Remove" }.
|
||||||
// Upstream clients connect to ListenPort; the proxy forwards to Host:Port.
|
|
||||||
//
|
|
||||||
// IMPORTANT: H2-ECOM100 modules accept at most 4 simultaneous TCP connections.
|
|
||||||
// With the 1:1 upstream↔backend model, a fifth upstream client to the same proxy
|
|
||||||
// port will cause a backend connect failure and an immediate upstream disconnect.
|
|
||||||
"Plcs": [
|
"Plcs": [
|
||||||
{
|
{
|
||||||
"Name": "Line1-Mixer", // Human-readable name (shown on status page and in logs)
|
"Name": "Z28061",
|
||||||
"ListenPort": 5020, // Port the proxy listens on (upstream clients connect here)
|
"ListenPort": 5020,
|
||||||
"Host": "10.0.1.1", // PLC IP address or hostname
|
"Host": "10.210.192.5",
|
||||||
"Port": 502, // PLC Modbus TCP port (almost always 502)
|
|
||||||
"BcdTags": {
|
|
||||||
// Additional 32-bit tag specific to this PLC only.
|
|
||||||
"Add": [
|
|
||||||
{ "Address": 1200, "Width": 32 }
|
|
||||||
],
|
|
||||||
// Remove address 1056 from the Global list for this PLC
|
|
||||||
// (this mixer doesn't use the 32-bit BCD total).
|
|
||||||
"Remove": [ 1056 ]
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"Name": "Line1-Conveyor",
|
|
||||||
"ListenPort": 5021,
|
|
||||||
"Host": "10.0.1.2",
|
|
||||||
"Port": 502
|
"Port": 502
|
||||||
// No BcdTags override — uses the Global set as-is.
|
|
||||||
}
|
}
|
||||||
// Add one entry per PLC. Ports must be unique per host. Typical fleet: 54 PLCs.
|
|
||||||
],
|
],
|
||||||
|
|
||||||
// ── Admin port ──────────────────────────────────────────────────────────────────────
|
// Read-only HTTP status page / dashboard. Set to 0 to disable the admin endpoint.
|
||||||
// Read-only HTTP status page.
|
|
||||||
// GET / → self-contained HTML (auto-refreshes every 5 s)
|
|
||||||
// GET /status.json → same data as JSON for monitoring scrapers
|
|
||||||
//
|
|
||||||
// Authentication is assumed at the network layer (trusted internal segment).
|
|
||||||
// Set to 0 to disable the admin endpoint.
|
|
||||||
"AdminPort": 8080,
|
"AdminPort": 8080,
|
||||||
|
|
||||||
// ── Connection timeouts ─────────────────────────────────────────────────────────────
|
// Backend connect / request / graceful-shutdown timeouts (ms), plus TCP keepalive
|
||||||
|
// and the idle-backend FC03 heartbeat. BackendHeartbeatIdleMs must exceed
|
||||||
|
// BackendRequestTimeoutMs. See docs/Architecture/Keepalive.md.
|
||||||
"Connection": {
|
"Connection": {
|
||||||
// Max time (ms) to wait for a TCP connect to the PLC backend.
|
|
||||||
// Each Polly retry attempt gets its own copy of this timeout.
|
|
||||||
"BackendConnectTimeoutMs": 3000,
|
"BackendConnectTimeoutMs": 3000,
|
||||||
|
|
||||||
// Max time (ms) to wait for the PLC to respond to a forwarded PDU.
|
|
||||||
// Non-idempotent FC06/FC16 writes are one-shot — the upstream client
|
|
||||||
// is disconnected immediately on timeout (no retry).
|
|
||||||
"BackendRequestTimeoutMs": 3000,
|
"BackendRequestTimeoutMs": 3000,
|
||||||
|
|
||||||
// Max time (ms) to wait for in-flight PDUs to complete during graceful shutdown
|
|
||||||
// (sc.exe stop / Windows Service stop signal). After this deadline the coordinator
|
|
||||||
// cancels remaining work and proceeds. Keep at or below the SCM wait-hint (30 s).
|
|
||||||
"GracefulShutdownTimeoutMs": 10000,
|
"GracefulShutdownTimeoutMs": 10000,
|
||||||
|
|
||||||
// ── Keepalive / connection monitoring ───────────────────────────────────
|
|
||||||
// The DL205/DL260 ECOM does not emit TCP keepalives, so an idle backend
|
|
||||||
// socket can be silently dropped by a middlebox (switch, firewall, NAT)
|
|
||||||
// after 2-5 minutes. This section enables OS-level SO_KEEPALIVE on both
|
|
||||||
// backend and upstream sockets, and drives a periodic Modbus FC03 heartbeat
|
|
||||||
// on each idle backend socket so a dead path is detected before a real
|
|
||||||
// client request hits it. See docs/Architecture/Keepalive.md.
|
|
||||||
"Keepalive": {
|
"Keepalive": {
|
||||||
// Master switch. false → no SO_KEEPALIVE and no heartbeat; the proxy
|
|
||||||
// behaves exactly as a pre-keepalive build.
|
|
||||||
"Enabled": true,
|
"Enabled": true,
|
||||||
|
|
||||||
// SO_KEEPALIVE: idle time (ms) before the OS sends its first probe.
|
|
||||||
"TcpIdleTimeMs": 30000,
|
"TcpIdleTimeMs": 30000,
|
||||||
// SO_KEEPALIVE: interval (ms) between probes once the idle time elapses.
|
|
||||||
"TcpProbeIntervalMs": 5000,
|
"TcpProbeIntervalMs": 5000,
|
||||||
// SO_KEEPALIVE: unanswered probes before the OS declares the socket dead.
|
|
||||||
"TcpProbeCount": 4,
|
"TcpProbeCount": 4,
|
||||||
|
|
||||||
// Backend heartbeat: after this much backend idle (ms) the proxy issues a
|
|
||||||
// synthetic FC03 qty=1 read to keep the path warm and prove the ECOM is
|
|
||||||
// still answering Modbus. Must be greater than BackendRequestTimeoutMs.
|
|
||||||
"BackendHeartbeatIdleMs": 30000,
|
"BackendHeartbeatIdleMs": 30000,
|
||||||
// FC03 PDU address the heartbeat reads. 0 = V0, valid on DL205/DL260.
|
|
||||||
"BackendHeartbeatProbeAddress": 0
|
"BackendHeartbeatProbeAddress": 0
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
|
||||||
// ── Resilience policies ─────────────────────────────────────────────────────────────
|
// Polly policies: backend-connect retry, listener-bind recovery, read coalescing.
|
||||||
|
// BackendConnect / ListenerRecovery are restart-only (not hot-reloaded).
|
||||||
"Resilience": {
|
"Resilience": {
|
||||||
|
"BackendConnect": { "MaxAttempts": 3, "BackoffMs": [ 100, 500, 2000 ] },
|
||||||
// Polly retry policy for backend TCP connect attempts.
|
"ListenerRecovery": { "InitialBackoffMs": [ 1000, 2000, 5000, 15000, 30000 ], "SteadyStateMs": 30000 },
|
||||||
// MaxAttempts: total connect tries (including the first).
|
"ReadCoalescing": { "Enabled": true, "MaxParties": 32 }
|
||||||
// BackoffMs: delay between each attempt (must have MaxAttempts−1 entries).
|
|
||||||
"BackendConnect": {
|
|
||||||
"MaxAttempts": 3,
|
|
||||||
"BackoffMs": [ 100, 500, 2000 ]
|
|
||||||
},
|
|
||||||
|
|
||||||
// Polly recovery policy for listener bind failures.
|
|
||||||
// If a PLC's listen port can't be bound (in-use, bad IP, transient OS error),
|
|
||||||
// the supervisor retries according to this schedule.
|
|
||||||
// InitialBackoffMs: backoff per step (first N retries).
|
|
||||||
// SteadyStateMs: backoff for all subsequent retries (runs indefinitely).
|
|
||||||
"ListenerRecovery": {
|
|
||||||
"InitialBackoffMs": [ 1000, 2000, 5000, 15000, 30000 ],
|
|
||||||
"SteadyStateMs": 30000
|
|
||||||
},
|
|
||||||
|
|
||||||
// Phase 10 — in-flight read coalescing.
|
|
||||||
//
|
|
||||||
// When two or more upstream clients (HMI / historian / engineering workstation /
|
|
||||||
// gateway) issue the SAME FC03 or FC04 read while a matching backend round-trip is
|
|
||||||
// already in flight, the proxy attaches the late arrivals to the existing in-flight
|
|
||||||
// entry and fans the single PLC response out to every attached client — saving the
|
|
||||||
// ECOM's per-scan PDU budget on duplicated reads.
|
|
||||||
//
|
|
||||||
// Zero post-response staleness: coalescing operates ONLY between "first request
|
|
||||||
// sent to PLC" and "response received from PLC" (microseconds to ~10 ms typical).
|
|
||||||
// Each upstream client still sees its own MBAP transaction ID echoed correctly;
|
|
||||||
// the proxy is transparent.
|
|
||||||
//
|
|
||||||
// FC06 / FC16 writes are NEVER coalesced (non-idempotent). FC03 vs FC04 are
|
|
||||||
// separate Modbus tables and never share a coalescing key. Different unit IDs
|
|
||||||
// (multi-drop / gateway-backed setups) never coalesce.
|
|
||||||
//
|
|
||||||
// Enabled — master switch. Hot-reloadable; flipping to false leaves running
|
|
||||||
// coalesced entries to drain naturally.
|
|
||||||
// MaxParties — per-entry cap on attached parties. Past the cap, the next
|
|
||||||
// identical request opens a fresh backend round-trip (load-shedding
|
|
||||||
// safety valve for very fan-out-heavy fleets).
|
|
||||||
"ReadCoalescing": {
|
|
||||||
"Enabled": true,
|
|
||||||
"MaxParties": 32
|
|
||||||
}
|
|
||||||
},
|
},
|
||||||
|
|
||||||
// ── Response cache (Phase 11) — opt-in bounded-staleness cache ──────────────────
|
// Opt-in response cache — OFF by default per tag. A tag opts in via its CacheTtlMs
|
||||||
//
|
// (or a PLC's DefaultCacheTtlMs); these are service-wide safety knobs only.
|
||||||
// ⚠ DESIGN-CONTRACT PIVOT: with caching enabled the proxy is no longer purely
|
// See docs/Architecture/ResponseCache.md.
|
||||||
// transparent. Upstream FC03/FC04 reads for cache-enabled tags may return values
|
|
||||||
// up to CacheTtlMs MILLISECONDS OLD. Operators opt tags in by setting a non-zero
|
|
||||||
// CacheTtlMs on a BcdTagOptions entry (or DefaultCacheTtlMs on a PlcOptions entry).
|
|
||||||
//
|
|
||||||
// The cache is OFF BY DEFAULT for every tag. A deployment with NO TTL config (this
|
|
||||||
// section entirely absent and no BcdTags.*.CacheTtlMs / Plcs[i].DefaultCacheTtlMs)
|
|
||||||
// behaves IDENTICALLY to a pre-Phase-11 deployment — no behaviour change.
|
|
||||||
//
|
|
||||||
// AllowLongTtl — gate for any CacheTtlMs > 60_000. Reload validation
|
|
||||||
// rejects configs that exceed 60 s without this opt-in,
|
|
||||||
// to prevent accidentally-stale-for-an-hour deployments.
|
|
||||||
// MaxEntriesPerPlc — LRU cap per-PLC. Past this cap, the next insert evicts
|
|
||||||
// the least-recently-used entry. Defaults to 1000.
|
|
||||||
// EvictionIntervalMs — background eviction tick. Scans each PLC's cache and
|
|
||||||
// removes entries past their TTL. Defaults to 5000.
|
|
||||||
//
|
|
||||||
// Properties (full text in docs/Architecture/ResponseCache.md):
|
|
||||||
// * Cache hits SHORT-CIRCUIT coalescing entirely (cache → coalesce → backend).
|
|
||||||
// * Successful FC06/FC16 write responses invalidate every cached FC03/FC04 entry
|
|
||||||
// whose address range OVERLAPS the write — not just exact-key match.
|
|
||||||
// * Multi-tag read range: effective TTL = min(TTLs). Any tag with TTL=0 in the
|
|
||||||
// range disables caching for the whole read.
|
|
||||||
// * Cache stores POST-rewriter bytes; hits never re-invoke the BCD rewriter.
|
|
||||||
// * Tag-list hot-reload flushes the affected PLC's whole cache.
|
|
||||||
// * No persistence — process restart wipes the cache.
|
|
||||||
"Cache": {
|
"Cache": {
|
||||||
"AllowLongTtl": false,
|
"AllowLongTtl": false,
|
||||||
"MaxEntriesPerPlc": 1000,
|
"MaxEntriesPerPlc": 1000,
|
||||||
"EvictionIntervalMs": 5000
|
"EvictionIntervalMs": 5000
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
|
||||||
// ── Serilog ─────────────────────────────────────────────────────────────────────────────
|
// Structured logging — console + daily rolling file under %ProgramData%\mbproxy\logs.
|
||||||
// Structured log output. Default: Information level, rolling-file under ProgramData.
|
// Error+ events also go to the Windows Application Event Log under the SCM (wired in
|
||||||
// The EventLogBridge writes Error+ events to the Windows Application Event Log
|
// code, not here). See docs/Operations/Troubleshooting.md.
|
||||||
// automatically when the service runs under the SCM (not under dotnet run).
|
|
||||||
"Serilog": {
|
"Serilog": {
|
||||||
"Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
|
"Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
|
||||||
"MinimumLevel": {
|
"MinimumLevel": {
|
||||||
@@ -236,8 +105,6 @@
|
|||||||
{
|
{
|
||||||
"Name": "File",
|
"Name": "File",
|
||||||
"Args": {
|
"Args": {
|
||||||
// Rolling log: one file per day, kept for 30 days.
|
|
||||||
// Survives uninstall — logs are archived to %ProgramData%\mbproxy.archived-<ts>\.
|
|
||||||
"path": "C:\\ProgramData\\mbproxy\\logs\\mbproxy-.log",
|
"path": "C:\\ProgramData\\mbproxy\\logs\\mbproxy-.log",
|
||||||
"rollingInterval": "Day",
|
"rollingInterval": "Day",
|
||||||
"retainedFileCountLimit": 30,
|
"retainedFileCountLimit": 30,
|
||||||
|
|||||||
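Review note on the `ListenerRecovery` comments this hunk removes: the schedule they describe (one `InitialBackoffMs` entry per early retry, then `SteadyStateMs` for every retry after that, indefinitely) can be sketched as below. This is a minimal illustration, not the supervisor's actual code; the function name `listener_backoff_ms` is hypothetical and only the two config keys come from the template.

```python
from itertools import chain, islice, repeat
from typing import Iterator, Sequence


def listener_backoff_ms(initial_ms: Sequence[int], steady_ms: int) -> Iterator[int]:
    """Yield the delay (ms) before each listener-bind retry.

    The initial steps are consumed once, in order; every later retry
    waits steady_ms. The iterator never ends, matching "runs indefinitely".
    """
    return chain(initial_ms, repeat(steady_ms))


# Values taken from the template: five initial steps, then 30 s forever.
schedule = listener_backoff_ms([1000, 2000, 5000, 15000, 30000], 30000)
first_seven = list(islice(schedule, 7))
print(first_seven)
```

With the template values, the first five retries use the listed steps and every retry from the sixth onward waits 30000 ms.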
@@ -1,227 +1,93 @@
|
|||||||
// mbproxy configuration template (Linux / systemd) — copy to /etc/mbproxy/appsettings.json
|
// mbproxy configuration (Linux / systemd). Copy to /etc/mbproxy/appsettings.json and
|
||||||
// and edit before starting the service.
|
// edit before starting the service. install.sh seeds this file only when none exists —
|
||||||
|
// an existing appsettings.json is always preserved across reinstalls.
|
||||||
//
|
//
|
||||||
// The .NET configuration loader accepts // and /* */ comments in JSON files
|
// JSONC: // and /* */ comments are accepted. The file is hot-reloaded on save.
|
||||||
// (JSONC semantics) when using the default Host.CreateApplicationBuilder path.
|
// This is the Linux counterpart of mbproxy.config.template.json — identical keys, with
|
||||||
|
// a /var/log/mbproxy log path; shipped as appsettings.json by a `dotnet publish -r linux-*`.
|
||||||
//
|
//
|
||||||
// IMPORTANT: install.sh overwrites this file at the destination ONLY if no
|
// FULL REFERENCE — every key, type, default, range, validation rule and hot-reload
|
||||||
// appsettings.json already exists there. An existing file is always preserved.
|
// behaviour — lives in docs/Operations/Configuration.md. The notes below are brief
|
||||||
//
|
// pointers only; consult that document before editing.
|
||||||
// This is the Linux counterpart of mbproxy.config.template.json — identical except
|
|
||||||
// for the rolling-log path (/var/log/mbproxy) and a few platform notes. It is shipped
|
|
||||||
// as appsettings.json by a `dotnet publish -r linux-*` build.
|
|
||||||
{
|
{
|
||||||
"Mbproxy": {
|
"Mbproxy": {
|
||||||
|
|
||||||
// ── Global BCD tag list ─────────────────────────────────────────────────────────────
|
// Fleet-wide BCD tag list — applies to every PLC. Each entry: Address (Modbus
|
||||||
// These tags apply to EVERY PLC by default.
|
// PDU-decimal), Width (16 or 32), optional Name (debug-view label) and CacheTtlMs.
|
||||||
// Each entry: Address (Modbus PDU address, decimal), Width (16 or 32 bits).
|
// Per-PLC Add/Remove overrides go under Plcs[].BcdTags. Trailing comments give the
|
||||||
//
|
// 4xxxx Modbus address and the DirectLOGIC V-memory reference.
|
||||||
// Width 16 — one register holds 4 BCD digits (0–9999).
|
|
||||||
// Wire value 0x1234 decodes to decimal 1234.
|
|
||||||
//
|
|
||||||
// Width 32 — a CDAB-ordered register pair (Address = low word, Address+1 = high word).
|
|
||||||
// Decoded decimal = high * 10000 + low (DirectLOGIC CDAB word order).
|
|
||||||
//
|
|
||||||
// Per-PLC overrides (see Plcs[].BcdTags below):
|
|
||||||
// Add — appends extra tags beyond what Global defines, or overrides a
|
|
||||||
// Global entry's Width when the same Address appears in both.
|
|
||||||
// Remove — removes specific addresses from the effective set for that PLC.
|
|
||||||
// Effective set = (Global ∪ Add) − Remove, resolved per PDU.
|
|
||||||
"BcdTags": {
|
"BcdTags": {
|
||||||
"Global": [
|
"Global": [
|
||||||
// V2000 (octal) = decimal address 1024. 16-bit BCD counter.
|
// 16-bit setpoints
|
||||||
{ "Address": 1024, "Width": 16 },
|
{ "Address": 1536, "Width": 16, "Name": "Left ArgonSP" }, // 41537
|
||||||
|
{ "Address": 1539, "Width": 16, "Name": "Right ArgonSP" }, // 41540
|
||||||
|
{ "Address": 1544, "Width": 16, "Name": "Left ChlorineSP" }, // 41545 · V3010
|
||||||
|
{ "Address": 1545, "Width": 16, "Name": "Right ChlorineSP" }, // 41546 · V3011
|
||||||
|
{ "Address": 1546, "Width": 16, "Name": "Left HydrogenSP" }, // 41547 · V3012
|
||||||
|
{ "Address": 1547, "Width": 16, "Name": "Right HydrogenSP" }, // 41548 · V3013
|
||||||
|
{ "Address": 1548, "Width": 16, "Name": "Left AirSP" }, // 41549 · V3014
|
||||||
|
{ "Address": 1549, "Width": 16, "Name": "Right AirSP" }, // 41550 · V3015
|
||||||
|
|
||||||
// V2040 (octal) = decimal address 1056. 32-bit BCD total at 1056/1057.
|
// 32-bit runtimes — CDAB pair spanning Address and Address+1
|
||||||
{ "Address": 1056, "Width": 32 },
|
{ "Address": 4616, "Width": 32, "Name": "MTA Runtime Left (min)" }, // 44617/44618 · V11010
|
||||||
|
{ "Address": 4618, "Width": 32, "Name": "MTA Runtime Right (min)" }, // 44619/44620 · V11012
|
||||||
// V2100 (octal) = decimal address 1088. 16-bit BCD setpoint.
|
{ "Address": 4626, "Width": 32, "Name": "FRR Runtime Left (min)" }, // 44627/44628 · V11022
|
||||||
//
|
{ "Address": 4628, "Width": 32, "Name": "FRR Runtime Right (min)" } // 44629/44630 · V11024
|
||||||
// Phase 11: CacheTtlMs (optional) opts this tag into the response cache. With
|
|
||||||
// CacheTtlMs > 0 set, upstream clients reading this register will see values up
|
|
||||||
// to CacheTtlMs MILLISECONDS OLD — explicit acknowledgement of the staleness
|
|
||||||
// window is required by enabling it. Default (omitted or 0) = cache disabled
|
|
||||||
// for this tag. The cache is OFF by default for every tag.
|
|
||||||
{ "Address": 1088, "Width": 16 /* , "CacheTtlMs": 1000 */ }
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
||||||
// ── PLC list ────────────────────────────────────────────────────────────────────────
|
// One entry per PLC: upstream clients connect to ListenPort, the proxy forwards to
|
||||||
// Each entry maps one upstream proxy port → one backend PLC.
|
// Host:Port. ListenPort must be unique. Optional per-PLC "BcdTags": { "Add", "Remove" }.
|
||||||
// Upstream clients connect to ListenPort; the proxy forwards to Host:Port.
|
|
||||||
//
|
|
||||||
// IMPORTANT: H2-ECOM100 modules accept at most 4 simultaneous TCP connections.
|
|
||||||
// With the 1:1 upstream↔backend model, a fifth upstream client to the same proxy
|
|
||||||
// port will cause a backend connect failure and an immediate upstream disconnect.
|
|
||||||
"Plcs": [
|
"Plcs": [
|
||||||
{
|
{
|
||||||
"Name": "Line1-Mixer", // Human-readable name (shown on status page and in logs)
|
"Name": "Z28061",
|
||||||
"ListenPort": 5020, // Port the proxy listens on (upstream clients connect here)
|
"ListenPort": 5020,
|
||||||
"Host": "10.0.1.1", // PLC IP address or hostname
|
"Host": "10.210.192.5",
|
||||||
"Port": 502, // PLC Modbus TCP port (almost always 502)
|
|
||||||
"BcdTags": {
|
|
||||||
// Additional 32-bit tag specific to this PLC only.
|
|
||||||
"Add": [
|
|
||||||
{ "Address": 1200, "Width": 32 }
|
|
||||||
],
|
|
||||||
// Remove address 1056 from the Global list for this PLC
|
|
||||||
// (this mixer doesn't use the 32-bit BCD total).
|
|
||||||
"Remove": [ 1056 ]
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"Name": "Line1-Conveyor",
|
|
||||||
"ListenPort": 5021,
|
|
||||||
"Host": "10.0.1.2",
|
|
||||||
"Port": 502
|
"Port": 502
|
||||||
// No BcdTags override — uses the Global set as-is.
|
|
||||||
}
|
}
|
||||||
// Add one entry per PLC. Ports must be unique per host. Typical fleet: 54 PLCs.
|
|
||||||
],
|
],
|
||||||
|
|
||||||
// ── Admin port ──────────────────────────────────────────────────────────────────────
|
// Read-only HTTP status page / dashboard. Set to 0 to disable the admin endpoint.
|
||||||
// Read-only HTTP status page.
|
|
||||||
// GET / → self-contained HTML (auto-refreshes every 5 s)
|
|
||||||
// GET /status.json → same data as JSON for monitoring scrapers
|
|
||||||
//
|
|
||||||
// Authentication is assumed at the network layer (trusted internal segment).
|
|
||||||
// Set to 0 to disable the admin endpoint.
|
|
||||||
"AdminPort": 8080,
|
"AdminPort": 8080,
|
||||||
|
|
||||||
// ── Connection timeouts ─────────────────────────────────────────────────────────────
|
// Backend connect / request / graceful-shutdown timeouts (ms), plus TCP keepalive
|
||||||
|
// and the idle-backend FC03 heartbeat. BackendHeartbeatIdleMs must exceed
|
||||||
|
// BackendRequestTimeoutMs. See docs/Architecture/Keepalive.md.
|
||||||
"Connection": {
|
"Connection": {
|
||||||
// Max time (ms) to wait for a TCP connect to the PLC backend.
|
|
||||||
// Each Polly retry attempt gets its own copy of this timeout.
|
|
||||||
"BackendConnectTimeoutMs": 3000,
|
"BackendConnectTimeoutMs": 3000,
|
||||||
|
|
||||||
// Max time (ms) to wait for the PLC to respond to a forwarded PDU.
|
|
||||||
// Non-idempotent FC06/FC16 writes are one-shot — the upstream client
|
|
||||||
// is disconnected immediately on timeout (no retry).
|
|
||||||
"BackendRequestTimeoutMs": 3000,
|
"BackendRequestTimeoutMs": 3000,
|
||||||
|
|
||||||
// Max time (ms) to wait for in-flight PDUs to complete during graceful shutdown
|
|
||||||
// (systemctl stop → SIGTERM). After this deadline the coordinator cancels
|
|
||||||
// remaining work and proceeds. Keep at or below the unit's TimeoutStopSec.
|
|
||||||
"GracefulShutdownTimeoutMs": 10000,
|
"GracefulShutdownTimeoutMs": 10000,
|
||||||
|
|
||||||
// ── Keepalive / connection monitoring ───────────────────────────────────
|
|
||||||
// The DL205/DL260 ECOM does not emit TCP keepalives, so an idle backend
|
|
||||||
// socket can be silently dropped by a middlebox (switch, firewall, NAT)
|
|
||||||
// after 2-5 minutes. This section enables OS-level SO_KEEPALIVE on both
|
|
||||||
// backend and upstream sockets, and drives a periodic Modbus FC03 heartbeat
|
|
||||||
// on each idle backend socket so a dead path is detected before a real
|
|
||||||
// client request hits it. See docs/Architecture/Keepalive.md.
|
|
||||||
"Keepalive": {
|
"Keepalive": {
|
||||||
// Master switch. false → no SO_KEEPALIVE and no heartbeat; the proxy
|
|
||||||
// behaves exactly as a pre-keepalive build.
|
|
||||||
"Enabled": true,
|
"Enabled": true,
|
||||||
|
|
||||||
// SO_KEEPALIVE: idle time (ms) before the OS sends its first probe.
|
|
||||||
"TcpIdleTimeMs": 30000,
|
"TcpIdleTimeMs": 30000,
|
||||||
// SO_KEEPALIVE: interval (ms) between probes once the idle time elapses.
|
|
||||||
"TcpProbeIntervalMs": 5000,
|
"TcpProbeIntervalMs": 5000,
|
||||||
// SO_KEEPALIVE: unanswered probes before the OS declares the socket dead.
|
|
||||||
"TcpProbeCount": 4,
|
"TcpProbeCount": 4,
|
||||||
|
|
||||||
// Backend heartbeat: after this much backend idle (ms) the proxy issues a
|
|
||||||
// synthetic FC03 qty=1 read to keep the path warm and prove the ECOM is
|
|
||||||
// still answering Modbus. Must be greater than BackendRequestTimeoutMs.
|
|
||||||
"BackendHeartbeatIdleMs": 30000,
|
"BackendHeartbeatIdleMs": 30000,
|
||||||
// FC03 PDU address the heartbeat reads. 0 = V0, valid on DL205/DL260.
|
|
||||||
"BackendHeartbeatProbeAddress": 0
|
"BackendHeartbeatProbeAddress": 0
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
|
||||||
// ── Resilience policies ─────────────────────────────────────────────────────────────
|
// Polly policies: backend-connect retry, listener-bind recovery, read coalescing.
|
||||||
|
// BackendConnect / ListenerRecovery are restart-only (not hot-reloaded).
|
||||||
"Resilience": {
|
"Resilience": {
|
||||||
|
"BackendConnect": { "MaxAttempts": 3, "BackoffMs": [ 100, 500, 2000 ] },
|
||||||
// Polly retry policy for backend TCP connect attempts.
|
"ListenerRecovery": { "InitialBackoffMs": [ 1000, 2000, 5000, 15000, 30000 ], "SteadyStateMs": 30000 },
|
||||||
// MaxAttempts: total connect tries (including the first).
|
"ReadCoalescing": { "Enabled": true, "MaxParties": 32 }
|
||||||
// BackoffMs: delay between each attempt (must have MaxAttempts−1 entries).
|
|
||||||
"BackendConnect": {
|
|
||||||
"MaxAttempts": 3,
|
|
||||||
"BackoffMs": [ 100, 500, 2000 ]
|
|
||||||
},
|
|
||||||
|
|
||||||
// Polly recovery policy for listener bind failures.
|
|
||||||
// If a PLC's listen port can't be bound (in-use, bad IP, transient OS error),
|
|
||||||
// the supervisor retries according to this schedule.
|
|
||||||
// InitialBackoffMs: backoff per step (first N retries).
|
|
||||||
// SteadyStateMs: backoff for all subsequent retries (runs indefinitely).
|
|
||||||
"ListenerRecovery": {
|
|
||||||
"InitialBackoffMs": [ 1000, 2000, 5000, 15000, 30000 ],
|
|
||||||
"SteadyStateMs": 30000
|
|
||||||
},
|
|
||||||
|
|
||||||
// Phase 10 — in-flight read coalescing.
|
|
||||||
//
|
|
||||||
// When two or more upstream clients (HMI / historian / engineering workstation /
|
|
||||||
// gateway) issue the SAME FC03 or FC04 read while a matching backend round-trip is
|
|
||||||
// already in flight, the proxy attaches the late arrivals to the existing in-flight
|
|
||||||
// entry and fans the single PLC response out to every attached client — saving the
|
|
||||||
// ECOM's per-scan PDU budget on duplicated reads.
|
|
||||||
//
|
|
||||||
// Zero post-response staleness: coalescing operates ONLY between "first request
|
|
||||||
// sent to PLC" and "response received from PLC" (microseconds to ~10 ms typical).
|
|
||||||
// Each upstream client still sees its own MBAP transaction ID echoed correctly;
|
|
||||||
// the proxy is transparent.
|
|
||||||
//
|
|
||||||
// FC06 / FC16 writes are NEVER coalesced (non-idempotent). FC03 vs FC04 are
|
|
||||||
// separate Modbus tables and never share a coalescing key. Different unit IDs
|
|
||||||
// (multi-drop / gateway-backed setups) never coalesce.
|
|
||||||
//
|
|
||||||
// Enabled — master switch. Hot-reloadable; flipping to false leaves running
|
|
||||||
// coalesced entries to drain naturally.
|
|
||||||
// MaxParties — per-entry cap on attached parties. Past the cap, the next
|
|
||||||
// identical request opens a fresh backend round-trip (load-shedding
|
|
||||||
// safety valve for very fan-out-heavy fleets).
|
|
||||||
"ReadCoalescing": {
|
|
||||||
"Enabled": true,
|
|
||||||
"MaxParties": 32
|
|
||||||
}
|
|
||||||
},
|
},
|
||||||
|
|
||||||
// ── Response cache (Phase 11) — opt-in bounded-staleness cache ──────────────────
|
// Opt-in response cache — OFF by default per tag. A tag opts in via its CacheTtlMs
|
||||||
//
|
// (or a PLC's DefaultCacheTtlMs); these are service-wide safety knobs only.
|
||||||
// ⚠ DESIGN-CONTRACT PIVOT: with caching enabled the proxy is no longer purely
|
// See docs/Architecture/ResponseCache.md.
|
||||||
// transparent. Upstream FC03/FC04 reads for cache-enabled tags may return values
|
|
||||||
// up to CacheTtlMs MILLISECONDS OLD. Operators opt tags in by setting a non-zero
|
|
||||||
// CacheTtlMs on a BcdTagOptions entry (or DefaultCacheTtlMs on a PlcOptions entry).
|
|
||||||
//
|
|
||||||
// The cache is OFF BY DEFAULT for every tag. A deployment with NO TTL config (this
|
|
||||||
// section entirely absent and no BcdTags.*.CacheTtlMs / Plcs[i].DefaultCacheTtlMs)
|
|
||||||
// behaves IDENTICALLY to a pre-Phase-11 deployment — no behaviour change.
|
|
||||||
//
|
|
||||||
// AllowLongTtl — gate for any CacheTtlMs > 60_000. Reload validation
|
|
||||||
// rejects configs that exceed 60 s without this opt-in,
|
|
||||||
// to prevent accidentally-stale-for-an-hour deployments.
|
|
||||||
// MaxEntriesPerPlc — LRU cap per-PLC. Past this cap, the next insert evicts
|
|
||||||
// the least-recently-used entry. Defaults to 1000.
|
|
||||||
// EvictionIntervalMs — background eviction tick. Scans each PLC's cache and
|
|
||||||
// removes entries past their TTL. Defaults to 5000.
|
|
||||||
//
|
|
||||||
// Properties (full text in docs/Architecture/ResponseCache.md):
|
|
||||||
// * Cache hits SHORT-CIRCUIT coalescing entirely (cache → coalesce → backend).
|
|
||||||
// * Successful FC06/FC16 write responses invalidate every cached FC03/FC04 entry
|
|
||||||
// whose address range OVERLAPS the write — not just exact-key match.
|
|
||||||
// * Multi-tag read range: effective TTL = min(TTLs). Any tag with TTL=0 in the
|
|
||||||
// range disables caching for the whole read.
|
|
||||||
// * Cache stores POST-rewriter bytes; hits never re-invoke the BCD rewriter.
|
|
||||||
// * Tag-list hot-reload flushes the affected PLC's whole cache.
|
|
||||||
// * No persistence — process restart wipes the cache.
|
|
||||||
"Cache": {
|
"Cache": {
|
||||||
"AllowLongTtl": false,
|
"AllowLongTtl": false,
|
||||||
"MaxEntriesPerPlc": 1000,
|
"MaxEntriesPerPlc": 1000,
|
||||||
"EvictionIntervalMs": 5000
|
"EvictionIntervalMs": 5000
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
|
||||||
// ── Serilog ─────────────────────────────────────────────────────────────────────────────
|
// Structured logging — console (captured by systemd-journald) + daily rolling file
|
||||||
// Structured log output. Default: Information level, console + rolling-file.
|
// under /var/log/mbproxy. Error+ events also go to local syslog under systemd (wired
|
||||||
// The console sink is captured by systemd-journald (view with `journalctl -u mbproxy`).
|
// in code, not here). See docs/Operations/Troubleshooting.md.
|
||||||
// In addition, when mbproxy runs as a systemd service the SyslogBridge writes Error+
|
|
||||||
// events to the local syslog with proper RFC5424 severity (wired in code, not here).
|
|
||||||
"Serilog": {
|
"Serilog": {
|
||||||
"Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
|
"Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
|
||||||
"MinimumLevel": {
|
"MinimumLevel": {
|
||||||
@@ -241,9 +107,6 @@
|
|||||||
{
|
{
|
||||||
"Name": "File",
|
"Name": "File",
|
||||||
"Args": {
|
"Args": {
|
||||||
// Rolling log: one file per day, kept for 30 days, under /var/log/mbproxy
|
|
||||||
// (created by install.sh and owned by the mbproxy service account).
|
|
||||||
// Survives uninstall — uninstall.sh archives logs to /var/log/mbproxy.archived-<ts>.
|
|
||||||
"path": "/var/log/mbproxy/mbproxy-.log",
|
"path": "/var/log/mbproxy/mbproxy-.log",
|
||||||
"rollingInterval": "Day",
|
"rollingInterval": "Day",
|
||||||
"retainedFileCountLimit": 30,
|
"retainedFileCountLimit": 30,
|
||||||
|
|||||||
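Review note on the BCD comments this hunk removes from the Linux template: the Width 16 / Width 32 decoding rules and the per-PLC rule `Effective set = (Global ∪ Add) − Remove` can be sketched as below. This is an illustration under stated assumptions, not the proxy's rewriter; the function names `decode_bcd16`, `decode_bcd32` and `effective_tags` are hypothetical, and rejecting nibbles above 9 is an assumption the template does not spell out.

```python
from typing import Dict, Iterable, List


def decode_bcd16(word: int) -> int:
    """Decode one 16-bit register holding four BCD digits (0x1234 -> 1234)."""
    digits = [(word >> shift) & 0xF for shift in (12, 8, 4, 0)]
    if any(d > 9 for d in digits):
        # Assumption: a nibble above 9 is treated as invalid BCD.
        raise ValueError(f"not valid BCD: {word:#06x}")
    value = 0
    for d in digits:
        value = value * 10 + d
    return value


def decode_bcd32(low_word: int, high_word: int) -> int:
    """Decode a CDAB pair: Address holds the low word, Address+1 the high word."""
    return decode_bcd16(high_word) * 10000 + decode_bcd16(low_word)


def effective_tags(global_tags: List[dict], add: List[dict],
                   remove: Iterable[int]) -> Dict[int, dict]:
    """(Global ∪ Add) − Remove, keyed by Address.

    An Add entry with the same Address as a Global entry replaces it,
    which is how a per-PLC Width override takes effect.
    """
    merged = {tag["Address"]: tag for tag in global_tags}
    merged.update({tag["Address"]: tag for tag in add})
    removed = set(remove)
    return {addr: tag for addr, tag in merged.items() if addr not in removed}


# The removed "Line1-Mixer" example: adds 1200 (32-bit), drops 1056.
global_tags = [{"Address": 1024, "Width": 16},
               {"Address": 1056, "Width": 32},
               {"Address": 1088, "Width": 16}]
mixer = effective_tags(global_tags, [{"Address": 1200, "Width": 32}], [1056])
print(sorted(mixer))
```

Keying the merged set by `Address` keeps the override rule to a single dictionary update; a real implementation would also have to validate that a 32-bit tag does not overlap a neighbouring entry, which this sketch does not attempt.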
@@ -38,6 +38,9 @@ ProtectSystem=strict
|
|||||||
ProtectHome=true
|
ProtectHome=true
|
||||||
PrivateTmp=true
|
PrivateTmp=true
|
||||||
ReadWritePaths=/var/log/mbproxy /var/cache/mbproxy
|
ReadWritePaths=/var/log/mbproxy /var/cache/mbproxy
|
||||||
|
# /etc/mbproxy is intentionally absent from ReadWritePaths: the service only READS its
|
||||||
|
# config (ProtectSystem=strict still allows reads), and config changes are an admin
|
||||||
|
# operation. Editing appsettings.json must be done as root, not by the service account.
|
||||||
# If any configured ListenPort is below 1024, also add:
|
# If any configured ListenPort is below 1024, also add:
|
||||||
# AmbientCapabilities=CAP_NET_BIND_SERVICE
|
# AmbientCapabilities=CAP_NET_BIND_SERVICE
|
||||||
|
|
||||||
|
|||||||
@@ -10,6 +10,11 @@
|
|||||||
framework-dependent\ ~1.6 MB — requires the .NET 10 + ASP.NET Core runtime
|
framework-dependent\ ~1.6 MB — requires the .NET 10 + ASP.NET Core runtime
|
||||||
preinstalled on the target.
|
preinstalled on the target.
|
||||||
|
|
||||||
|
Each folder also receives a current appsettings.json — the platform-appropriate
|
||||||
|
install template (Windows or Linux, selected by -Rid) — so every publish-out
|
||||||
|
flavour is a complete, deployable folder. For a win-* RID the four service
|
||||||
|
batch files (install/remove/start/stop-service.bat) are copied in as well.
|
||||||
|
|
||||||
The runtime is selected with -Rid (default win-x64). The binary is Mbproxy.exe on
|
The runtime is selected with -Rid (default win-x64). The binary is Mbproxy.exe on
|
||||||
Windows RIDs and Mbproxy on Linux/macOS RIDs.
|
Windows RIDs and Mbproxy on Linux/macOS RIDs.
|
||||||
|
|
||||||
@@ -70,6 +75,41 @@ Write-Host "`n=== Publishing framework-dependent ($Rid, ~1.6 MB) ===" -Foregroun
|
|||||||
& dotnet publish $csproj -c Release -r $Rid -p:SelfContained=false -p:PublishSingleFile=true -o $frameworkDependentOut --nologo
|
& dotnet publish $csproj -c Release -r $Rid -p:SelfContained=false -p:PublishSingleFile=true -o $frameworkDependentOut --nologo
|
||||||
if ($LASTEXITCODE -ne 0) { throw "framework-dependent publish failed (exit $LASTEXITCODE)" }
|
if ($LASTEXITCODE -ne 0) { throw "framework-dependent publish failed (exit $LASTEXITCODE)" }
|
||||||
|
|
||||||
|
# ── Ship the platform-appropriate config template as appsettings.json ──────────
|
||||||
|
# dotnet publish already copies it via the Mbproxy.csproj <Content> link, but that
|
||||||
|
# link uses PreserveNewest — an incremental (non-Clean) run can leave a stale
|
||||||
|
# config behind. Copy it explicitly so every publish-out flavour is guaranteed a
|
||||||
|
# current appsettings.json, and so the config's source is obvious.
|
||||||
|
$configTemplate = if ($Rid -like 'win-*') {
|
||||||
|
Join-Path $repoRoot 'install\mbproxy.config.template.json'
|
||||||
|
} else {
|
||||||
|
Join-Path $repoRoot 'install\mbproxy.linux.config.template.json'
|
||||||
|
}
|
||||||
|
if (-not (Test-Path $configTemplate)) { throw "Cannot find config template: $configTemplate" }
|
||||||
|
|
||||||
|
Write-Host "`n=== Config (appsettings.json) ===" -ForegroundColor Cyan
|
||||||
|
foreach ($flavour in 'self-contained','framework-dependent') {
|
||||||
|
$dest = Join-Path $OutputDir "$flavour\appsettings.json"
|
||||||
|
Copy-Item -LiteralPath $configTemplate -Destination $dest -Force
|
||||||
|
Write-Host (" {0,-22} <- {1}" -f $flavour, $configTemplate)
|
||||||
|
}
|
||||||
|
|
||||||
|
# ── Ship the Windows service-management batch files (win RIDs only) ─────────────
|
||||||
|
# install-service / remove-service / start-service / stop-service all act on the
|
||||||
|
# Mbproxy.exe in their own folder, so the published folder is self-managing.
|
||||||
|
if ($Rid -like 'win-*') {
|
||||||
|
$serviceScripts = 'install-service.bat','remove-service.bat','start-service.bat','stop-service.bat'
|
||||||
|
Write-Host "`n=== Service scripts ===" -ForegroundColor Cyan
|
||||||
|
foreach ($flavour in 'self-contained','framework-dependent') {
|
||||||
|
foreach ($script in $serviceScripts) {
|
||||||
|
$src = Join-Path $repoRoot "install\$script"
|
||||||
|
if (-not (Test-Path $src)) { throw "Cannot find service script: $src" }
|
||||||
|
Copy-Item -LiteralPath $src -Destination (Join-Path $OutputDir "$flavour\$script") -Force
|
||||||
|
}
|
||||||
|
Write-Host (" {0,-22} <- install\*-service.bat" -f $flavour)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
function Format-Size {
|
function Format-Size {
|
||||||
param([long]$Bytes)
|
param([long]$Bytes)
|
||||||
if ($Bytes -ge 1MB) { '{0:N1} MB' -f ($Bytes / 1MB) }
|
if ($Bytes -ge 1MB) { '{0:N1} MB' -f ($Bytes / 1MB) }
|
||||||
@@ -83,7 +123,9 @@ foreach ($flavour in 'self-contained','framework-dependent') {
|
|||||||
$size = (Get-Item $bin).Length
|
$size = (Get-Item $bin).Length
|
||||||
Write-Host (" {0,-22} {1,10} {2}" -f $flavour, (Format-Size $size), $bin)
|
Write-Host (" {0,-22} {1,10} {2}" -f $flavour, (Format-Size $size), $bin)
|
||||||
} else {
|
} else {
|
||||||
Write-Warning "Missing: $bin"
|
# A missing expected binary means the publish silently produced nothing usable —
|
||||||
|
# fail the script rather than emit a warning a CI job would scroll past.
|
||||||
|
throw "Expected published binary not found: $bin"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
Write-Host ""
|
Write-Host ""
|
||||||
|
|||||||
@@ -10,6 +10,10 @@
|
|||||||
# framework-dependent/ ~1.6 MB — requires the .NET 10 + ASP.NET Core runtime
|
# framework-dependent/ ~1.6 MB — requires the .NET 10 + ASP.NET Core runtime
|
||||||
# preinstalled on the target.
|
# preinstalled on the target.
|
||||||
#
|
#
|
||||||
|
# Each folder also receives a current appsettings.json — the platform-appropriate
|
||||||
|
# install template (Windows or Linux, selected by -r RID) — so every publish-out
|
||||||
|
# flavour is a complete, deployable folder.
|
||||||
|
#
|
||||||
# Both builds use the Release configuration and inherit the publish settings in
|
# Both builds use the Release configuration and inherit the publish settings in
|
||||||
# src/Mbproxy/Mbproxy.csproj (those settings are gated on an explicit RID, which
|
# src/Mbproxy/Mbproxy.csproj (those settings are gated on an explicit RID, which
|
||||||
# is supplied here). The framework-dependent build overrides SelfContained=false.
|
# is supplied here). The framework-dependent build overrides SelfContained=false.
|
||||||
@@ -68,6 +72,46 @@ echo "=== Publishing framework-dependent ($rid, ~1.6 MB) ==="
|
|||||||
dotnet publish "$csproj" -c Release -r "$rid" \
|
dotnet publish "$csproj" -c Release -r "$rid" \
|
||||||
-p:SelfContained=false -p:PublishSingleFile=true -o "$framework_dependent_out" --nologo
|
-p:SelfContained=false -p:PublishSingleFile=true -o "$framework_dependent_out" --nologo
|
||||||
|
|
||||||
|
# Ship the platform-appropriate config template as appsettings.json.
|
||||||
|
# dotnet publish already copies it via the Mbproxy.csproj <Content> link, but that
|
||||||
|
# link uses PreserveNewest — an incremental (non-clean) run can leave a stale config
|
||||||
|
# behind. Copy it explicitly so every publish-out flavour is guaranteed a current
|
||||||
|
# appsettings.json, and so the config's source is obvious.
|
||||||
|
if [[ "$rid" == win-* ]]; then
|
||||||
|
config_template="$repo_root/install/mbproxy.config.template.json"
|
||||||
|
else
|
||||||
|
config_template="$repo_root/install/mbproxy.linux.config.template.json"
|
||||||
|
fi
|
||||||
|
if [[ ! -f "$config_template" ]]; then
|
||||||
|
echo "Cannot find config template: $config_template" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo
|
||||||
|
echo "=== Config (appsettings.json) ==="
|
||||||
|
for flavour in self-contained framework-dependent; do
|
||||||
|
cp -f "$config_template" "$output_dir/$flavour/appsettings.json"
|
||||||
|
printf ' %-22s <- %s\n' "$flavour" "$config_template"
|
||||||
|
done
|
||||||
|
|
||||||
|
# Ship the Windows service-management batch files (win RIDs only). Each acts on the
|
||||||
|
# Mbproxy.exe in its own folder, so the published folder is self-managing.
|
||||||
|
if [[ "$rid" == win-* ]]; then
|
||||||
|
echo
|
||||||
|
echo "=== Service scripts ==="
|
||||||
|
for flavour in self-contained framework-dependent; do
|
||||||
|
for script in install-service.bat remove-service.bat start-service.bat stop-service.bat; do
|
||||||
|
src="$repo_root/install/$script"
|
||||||
|
if [[ ! -f "$src" ]]; then
|
||||||
|
echo "Cannot find service script: $src" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
cp -f "$src" "$output_dir/$flavour/$script"
|
||||||
|
done
|
||||||
|
printf ' %-22s <- install/*-service.bat\n' "$flavour"
|
||||||
|
done
|
||||||
|
fi
|
||||||
|
|
||||||
echo
|
echo
|
||||||
echo "=== Result ($rid) ==="
|
echo "=== Result ($rid) ==="
|
||||||
for flavour in self-contained framework-dependent; do
|
for flavour in self-contained framework-dependent; do
|
||||||
@@ -76,7 +120,10 @@ for flavour in self-contained framework-dependent; do
|
|||||||
 size="$(du -h "$bin" | cut -f1)"
 printf ' %-22s %8s %s\n' "$flavour" "$size" "$bin"
 else
-echo " WARNING: missing $bin" >&2
+# A missing expected binary means the publish silently produced nothing
+# usable — fail rather than emit a warning a CI job would scroll past.
+echo "ERROR: expected published binary not found: $bin" >&2
+exit 1
 fi
 done
 echo
|
||||||
|
|||||||
@@ -0,0 +1,39 @@
|
|||||||
|
@echo off
|
||||||
|
REM ---------------------------------------------------------------------------
|
||||||
|
REM Stops and removes the "mbproxy" Windows service. Does not delete any files
|
||||||
|
REM in this folder. Run this script as Administrator.
|
||||||
|
REM ---------------------------------------------------------------------------
|
||||||
|
setlocal
|
||||||
|
set "SVC=mbproxy"
|
||||||
|
|
||||||
|
net session >nul 2>&1
|
||||||
|
if errorlevel 1 (
|
||||||
|
echo ERROR: this script must be run as Administrator.
|
||||||
|
pause
|
||||||
|
exit /b 1
|
||||||
|
)
|
||||||
|
|
||||||
|
sc query %SVC% >nul 2>&1
|
||||||
|
if errorlevel 1 (
|
||||||
|
echo Service "%SVC%" is not installed — nothing to do.
|
||||||
|
pause
|
||||||
|
exit /b 0
|
||||||
|
)
|
||||||
|
|
||||||
|
echo Stopping service "%SVC%" (if running)...
|
||||||
|
sc stop %SVC% >nul 2>&1
|
||||||
|
|
||||||
|
REM Give the service a few seconds to drain and stop before deleting it.
|
||||||
|
timeout /t 5 /nobreak >nul
|
||||||
|
|
||||||
|
echo Removing service "%SVC%"...
|
||||||
|
sc delete %SVC%
|
||||||
|
if errorlevel 1 (
|
||||||
|
echo ERROR: sc delete failed. If the service still shows up, close
|
||||||
|
echo services.msc and try again.
|
||||||
|
pause
|
||||||
|
exit /b 1
|
||||||
|
)
|
||||||
|
|
||||||
|
echo Removed.
|
||||||
|
pause
|
||||||
@@ -0,0 +1,15 @@
|
|||||||
|
@echo off
|
||||||
|
REM ---------------------------------------------------------------------------
|
||||||
|
REM Starts the "mbproxy" Windows service. Run this script as Administrator.
|
||||||
|
REM ---------------------------------------------------------------------------
|
||||||
|
setlocal
|
||||||
|
|
||||||
|
net session >nul 2>&1
|
||||||
|
if errorlevel 1 (
|
||||||
|
echo ERROR: this script must be run as Administrator.
|
||||||
|
pause
|
||||||
|
exit /b 1
|
||||||
|
)
|
||||||
|
|
||||||
|
net start mbproxy
|
||||||
|
pause
|
||||||
@@ -0,0 +1,15 @@
|
|||||||
|
@echo off
|
||||||
|
REM ---------------------------------------------------------------------------
|
||||||
|
REM Stops the "mbproxy" Windows service. Run this script as Administrator.
|
||||||
|
REM ---------------------------------------------------------------------------
|
||||||
|
setlocal
|
||||||
|
|
||||||
|
net session >nul 2>&1
|
||||||
|
if errorlevel 1 (
|
||||||
|
echo ERROR: this script must be run as Administrator.
|
||||||
|
pause
|
||||||
|
exit /b 1
|
||||||
|
)
|
||||||
|
|
||||||
|
net stop mbproxy
|
||||||
|
pause
|
||||||
@@ -0,0 +1,514 @@
|
|||||||
|
# mbproxy Web UI Dashboard Redesign — Implementation Plan
|
||||||
|
|
||||||
|
**Created:** 2026-05-15
|
||||||
|
**Status:** Complete — all 7 phases done, Gates 0–6 green. `dotnet build` 0
|
||||||
|
warnings; `dotnet test` 452 passed / 0 failed; single-file `win-x64` publish
|
||||||
|
serves the full UI with zero external requests. Not yet committed.
|
||||||
|
**Execution:** Sequential, single agent (phases 0→6 in order).
|
||||||
|
**Working artifact** — not part of the `docs/` source-of-truth tree (per `../../DOCS-GUIDE.md`).
|
||||||
|
Delete or archive once the work lands and `docs/Operations/StatusPage.md` is updated.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Goal
|
||||||
|
|
||||||
|
Replace the single auto-refreshing zero-JS status page with a two-view operator
|
||||||
|
console:
|
||||||
|
|
||||||
|
1. **Fleet dashboard** (`GET /`) — aggregate fleet health at the top, a
|
||||||
|
filterable/sortable per-PLC KPI table below. Live via SignalR.
|
||||||
|
2. **Connection detail page** (`GET /plc/{name}`, opened in a new tab) — every
|
||||||
|
per-PLC counter regrouped into readable cards, the per-upstream-client list,
|
||||||
|
and a **real-time debug view**: a per-tag live-value table showing the raw
|
||||||
|
PLC-side value vs. the decoded client-side value for each configured BCD tag.
|
||||||
|
Live via SignalR.
|
||||||
|
|
||||||
|
`GET /status.json` is unchanged — scrapers depend on it (see
|
||||||
|
`docs/Operations/StatusPage.md` "How to Scrape It"). The old
|
||||||
|
`StatusHtmlRenderer` / `<meta http-equiv="refresh">` page is retired.
|
||||||
|
|
||||||
|
### Decisions (requirements review, 2026-05-15)
|
||||||
|
|
||||||
|
- **Debug view = per-tag live values.** Last raw PLC-side value (BCD nibbles),
|
||||||
|
last decoded client-side value, direction, age — one slot per configured BCD
|
||||||
|
tag. No transaction ring buffer.
|
||||||
|
- **On-demand capture.** Per-tag capture is armed only while a PLC's detail
|
||||||
|
page has a live SignalR subscriber; disarmed (and slots cleared) when the
|
||||||
|
last viewer leaves. Zero hot-path cost otherwise. No new write/control
|
||||||
|
action — admin stays read-only.
|
||||||
|
- **Bootstrap 5, vanilla JS, no build step.** Bootstrap CSS/JS, the SignalR JS
|
||||||
|
client, and the app's HTML/CSS/JS are vendored into the repo and embedded as
|
||||||
|
resources in the single-file binary. Nothing is CDN-fetched.
|
||||||
|
- **Visual direction: refined technical-light.** Customized Bootstrap theme
|
||||||
|
(CSS-variable overrides, not stock), monospace for numeric cells, restrained
|
||||||
|
accent palette, status carried by color. **Vendored fonts:** one display +
|
||||||
|
one mono open-licensed (SIL OFL) woff2, embedded — no Google Fonts fetch.
|
||||||
|
- **Push cadence ~1 s**, exposed as `Mbproxy.AdminPushIntervalMs` (default 1000).
|
||||||
|
- **Hub testing:** `StatusHub` unit-tested directly with mocked
|
||||||
|
`IGroupManager` / `HubCallerContext`; `StatusBroadcaster` tested against an
|
||||||
|
in-process Kestrel. No `SignalR.Client` package added to the test project.
|
||||||
|
- **UI gates:** browser smoke tests driven through the **claude-in-chrome MCP**
|
||||||
|
against a running service + the dl205 simulator.
|
||||||
|
- No auth change — admin endpoint stays network-trusted.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
### Backend instrumentation (the genuinely new code)
|
||||||
|
|
||||||
|
`BcdPduPipeline` is stateless today — it rewrites BCD values in flight and keeps
|
||||||
|
no record of them. The debug view needs a place to record "last raw / last
|
||||||
|
decoded" per tag.
|
||||||
|
|
||||||
|
**`TagValueCapture`** (new, `src/Mbproxy/Proxy/TagValueCapture.cs`) — one
|
||||||
|
instance per PLC.
|
||||||
|
|
||||||
|
- Construction takes the PLC's BCD tag addresses; builds a
|
||||||
|
`FrozenDictionary<ushort,int>` mapping each tag address → a slot index, plus
|
||||||
|
a slot array sized to the tag count.
|
||||||
|
- A slot is an **immutable record**
|
||||||
|
`TagValueSlot(ushort RawValue, int DecodedValue, CaptureDirection Direction,
|
||||||
|
DateTimeOffset UpdatedAtUtc)`. `Record(address, raw, decoded, dir)` builds a
|
||||||
|
fresh slot and `Volatile.Write`s the reference into the array. Reference
|
||||||
|
assignment is atomic and the record is immutable, so a concurrent reader
|
||||||
|
never sees a torn slot — all four fields are coherent. No locks.
|
||||||
|
- `volatile bool _armed` gate: `Record(...)` returns immediately when disarmed.
|
||||||
|
`Arm()` / `Disarm()` flip it; `Disarm()` also null-clears the slot array so a
|
||||||
|
reopened detail page shows "no traffic yet" instead of stale values.
|
||||||
|
- `Snapshot()` `Volatile.Read`s every slot for the SignalR push.
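
The slot-swap design in the list above can be sketched in a few lines. This is an illustration in JavaScript (the dashboard's language), not the C# class: JavaScript is single-threaded, so it shows only the address-to-slot map, the armed gate, and whole-slot replacement, and has no counterpart to `Volatile.Write` / `Volatile.Read`. The class name `TagValueCaptureSketch` and the ISO-string timestamp are assumptions made for the example.

```javascript
// Illustrative sketch only: mirrors the plan's TagValueCapture shape, not the real C# class.
class TagValueCaptureSketch {
  constructor(tagAddresses) {
    // address -> slot index, fixed at construction (stands in for FrozenDictionary<ushort,int>).
    this.indexByAddress = new Map(tagAddresses.map((addr, i) => [addr, i]));
    this.slots = new Array(tagAddresses.length).fill(null);
    this.armed = false;
  }

  arm() {
    this.armed = true;
  }

  disarm() {
    this.armed = false;
    this.slots.fill(null); // a reopened page sees "no traffic yet", not stale values
  }

  record(address, rawValue, decodedValue, direction) {
    if (!this.armed) return; // disarmed: return immediately
    const index = this.indexByAddress.get(address);
    if (index === undefined) return; // not a configured BCD tag
    // Build a fresh immutable slot and replace the reference in one assignment,
    // so a reader always sees all four fields from the same record() call.
    this.slots[index] = Object.freeze({
      rawValue,
      decodedValue,
      direction,
      updatedAtUtc: new Date().toISOString(),
    });
  }

  snapshot() {
    return this.slots.slice(); // copy of the slot references for the push
  }
}
```

Because each slot is replaced as a whole rather than mutated field by field, a snapshot taken between two `record` calls never mixes values from different updates.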
|
||||||
|
|
||||||
|
**`TagCaptureRegistry`** (new, `src/Mbproxy/Proxy/TagCaptureRegistry.cs`) — a
|
||||||
|
DI singleton holding `name → TagValueCapture`. Exposes `Arm(name)`,
|
||||||
|
`Disarm(name)`, `DisarmAll()`, `TryGet(name)`, and `Rebuild(name, addresses)`
|
||||||
|
(used by the hot-reload path to re-key a capture to a changed tag list,
|
||||||
|
preserving the armed flag). `ProxyWorker`, `StatusHub`, `StatusSnapshotBuilder`,
|
||||||
|
and `StatusBroadcaster` all share this one registry.
|
||||||
|
|
||||||
|
**`PerPlcContext`** gains `internal TagValueCapture? Capture { get; init; }`,
|
||||||
|
threaded through `WithCurrentRequest`. Always wired in production; `null` in
|
||||||
|
unit-test contexts that don't exercise it (the pipeline guards with `?.`).
|
||||||
|
|
||||||
|
**`BcdPduPipeline`** records into `ctx.Capture?` at the four points where it
|
||||||
|
already holds both the raw and decoded value: FC03/FC04 response decode (16-bit
|
||||||
|
and 32-bit branches), FC06 request encode, FC16 request encode. Recording is
|
||||||
|
post-decode (Read) / pre-encode (Write) so the table reads "PLC side vs client
|
||||||
|
side" correctly in both directions. Cost when disarmed: one nullable-deref +
|
||||||
|
one volatile-bool read.
|
||||||
|
|
||||||
|
### SignalR
|
||||||
|
|
||||||
|
`Microsoft.AspNetCore.SignalR` ships in the `Microsoft.AspNetCore.App` framework
|
||||||
|
reference already on the project — **no new package**.
|
||||||
|
|
||||||
|
- **`StatusHub`** (`src/Mbproxy/Admin/StatusHub.cs`) — hub at `/hub/status`.
|
||||||
|
Methods: `SubscribeFleet()` joins group `fleet`; `SubscribePlc(name)` joins
|
||||||
|
group `plc:{name}`. A thread-safe per-PLC subscriber counter drives capture
|
||||||
|
arming through `TagCaptureRegistry`: 0→1 arms, 1→0 disarms. `SubscribePlc`
|
||||||
|
for an unknown PLC name is a no-op (no throw, no arm). `OnDisconnectedAsync`
|
||||||
|
decrements every group the connection was in.
|
||||||
|
- **`StatusBroadcaster`** (`src/Mbproxy/Admin/StatusBroadcaster.cs`) — a loop
|
||||||
|
started by `AdminEndpointHost` when the Kestrel app starts, stopped when it
|
||||||
|
stops. Every `AdminPushIntervalMs` it builds a snapshot, pushes the fleet
|
||||||
|
summary to group `fleet`, and pushes per-PLC detail (counters +
|
||||||
|
`TagValueCapture.Snapshot()`) to each `plc:{name}` group that has subscribers
|
||||||
|
(idle groups skipped). `Stop()` calls `registry.DisarmAll()` so an AdminPort
|
||||||
|
hot-reload — which tears down the WebApplication and every SignalR
|
||||||
|
connection — never leaves a capture stuck armed.
|
||||||
|
- SignalR's default JSON hub protocol is reflection-based. The project is
|
||||||
|
**not** `PublishTrimmed` (single-file ≠ trimmed), so it works at runtime. The
|
||||||
|
hub DTOs are still declared in a `JsonSerializable` source-gen context and
|
||||||
|
that `TypeInfoResolver` is registered on the hub protocol options — keeps
|
||||||
|
parity with `StatusJsonContext` and pre-empts a future trim.
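
The 0→1 / 1→0 arming rule described for `StatusHub` amounts to per-PLC reference counting. A minimal sketch in JavaScript, assuming a hypothetical `registry` object with `has(name)`, `arm(name)`, and `disarm(name)`; the real hub is C# and its internal bookkeeping may differ:

```javascript
// Illustrative sketch: reference-counted capture arming keyed by PLC name.
// `registry` is a hypothetical stand-in for TagCaptureRegistry.
class PlcSubscriptionCounterSketch {
  constructor(registry) {
    this.registry = registry;
    this.countByPlc = new Map(); // PLC name -> live subscriber count
    this.plcsByConnection = new Map(); // connection id -> Set of PLC names
  }

  subscribe(connectionId, plcName) {
    if (!this.registry.has(plcName)) return false; // unknown PLC: no throw, no arm
    const joined = this.plcsByConnection.get(connectionId) ?? new Set();
    if (joined.has(plcName)) return true; // repeated subscribe counts once
    joined.add(plcName);
    this.plcsByConnection.set(connectionId, joined);
    const next = (this.countByPlc.get(plcName) ?? 0) + 1;
    this.countByPlc.set(plcName, next);
    if (next === 1) this.registry.arm(plcName); // 0 -> 1 arms
    return true;
  }

  disconnect(connectionId) {
    const joined = this.plcsByConnection.get(connectionId);
    if (!joined) return;
    this.plcsByConnection.delete(connectionId);
    for (const plcName of joined) {
      const next = this.countByPlc.get(plcName) - 1;
      if (next === 0) {
        this.countByPlc.delete(plcName);
        this.registry.disarm(plcName); // 1 -> 0 disarms
      } else {
        this.countByPlc.set(plcName, next);
      }
    }
  }
}
```

Tracking which PLCs each connection joined is what lets a disconnect decrement exactly the groups that connection was in.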
|
||||||
|
|
||||||
|
### DI wiring across the inner WebApplication
|
||||||
|
|
||||||
|
`AdminEndpointHost` builds a separate `WebApplication` (`CreateSlimBuilder`)
|
||||||
|
with its own DI container. `TagCaptureRegistry` and `StatusSnapshotBuilder` are
|
||||||
|
outer-host singletons; `AdminEndpointHost` receives both by constructor
|
||||||
|
injection and re-registers them into the inner container so `StatusHub` can
|
||||||
|
resolve them. `StatusBroadcaster` is created by `AdminEndpointHost`, closing
|
||||||
|
over the builder + registry, and pulls `IHubContext<StatusHub>` from
|
||||||
|
`app.Services` after `Build()`.
|
||||||
|
|
||||||
|
### Asset delivery
|
||||||
|
|
||||||
|
- Vendored files committed under `src/Mbproxy/Admin/wwwroot/`:
|
||||||
|
`vendor/bootstrap.min.css`, `vendor/bootstrap.bundle.min.js`,
|
||||||
|
`vendor/signalr.min.js`, `vendor/<display>.woff2`, `vendor/<mono>.woff2`.
|
||||||
|
- App files: `index.html`, `plc.html`, `theme.css` (shared variables + chrome),
|
||||||
|
`dashboard.css` + `dashboard.js` (fleet view), `detail.css` + `detail.js`
|
||||||
|
(detail view).
|
||||||
|
- `Mbproxy.csproj` marks `Admin\wwwroot\**` as `<EmbeddedResource>`.
|
||||||
|
- `AdminEndpointHost` serves them via a single `MapGet("/assets/{*path}")` that
|
||||||
|
streams `Assembly.GetManifestResourceStream` with a static extension→
|
||||||
|
content-type map — fewer moving parts than the embedded-file-provider package
|
||||||
|
for ~8 files. `/assets/*` gets a long immutable cache header; the HTML shells
|
||||||
|
get `no-cache`.
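
The decision logic of the asset route (static extension map, 404 for anything unknown, long-lived cache header) can be sketched as follows. This is JavaScript for illustration only; the real handler is C#. The `embeddedNames` set, the exact `Cache-Control` value, and the traversal check are assumptions, since the plan specifies only "long immutable" caching and the content types listed in the Phase 3 tests.

```javascript
// Illustrative sketch of the asset route's decision logic (not the C# handler).
const CONTENT_TYPES = Object.freeze({
  ".css": "text/css",
  ".js": "text/javascript",
  ".woff2": "font/woff2",
});

// Assumed header value; the plan only says "long immutable cache header".
const ASSET_CACHE = "public, max-age=31536000, immutable";

// `embeddedNames` is a hypothetical Set of resource paths embedded in the binary.
function resolveAsset(path, embeddedNames) {
  // Reject anything that looks like a traversal attempt before any lookup.
  if (path.includes("..") || path.includes("\\")) return { status: 404 };
  if (!embeddedNames.has(path)) return { status: 404 };
  const dot = path.lastIndexOf(".");
  const contentType = dot >= 0 ? CONTENT_TYPES[path.slice(dot).toLowerCase()] : undefined;
  if (!contentType) return { status: 404 }; // unknown extension: refuse to guess
  return { status: 200, contentType, cacheControl: ASSET_CACHE };
}
```

A fixed map keeps the set of servable types explicit, which suits a handful of embedded files better than general MIME sniffing.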
|
||||||
|
|
||||||
|
### Routes after the redesign
|
||||||
|
|
||||||
|
| Route | Serves |
|
||||||
|
|---|---|
|
||||||
|
| `GET /` | Fleet dashboard HTML (`index.html`) |
|
||||||
|
| `GET /plc/{name}` | Detail page HTML (`plc.html`); JS reads `{name}` from the path |
|
||||||
|
| `GET /assets/{*path}` | Embedded vendored + app assets |
|
||||||
|
| `GET /status.json` | Unchanged — source-gen JSON snapshot |
|
||||||
|
| `/hub/status` | SignalR hub |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 0 — Prep
|
||||||
|
|
||||||
|
1. Add `tests/sim/mbproxy.smoke.config.json` (or reuse an existing fixture
|
||||||
|
pattern): several `Plcs` entries with distinct `ListenPort`s all pointing at
|
||||||
|
`127.0.0.1:502` (one dl205 simulator instance multiplexed) plus one entry
|
||||||
|
pointed at an unreachable host so the dashboard shows a `recovering` row and
|
||||||
|
a `bound` row. At least one PLC carries 16-bit and 32-bit BCD tags so the
|
||||||
|
debug view has content. This config backs the Phase 4/5 chrome smoke tests.
|
||||||
|
2. Confirm `tests/sim/run-dl205-sim.ps1` starts cleanly on the dev box.
|
||||||
|
|
||||||
|
**Gate 0:** simulator launches; smoke config validates against `ReloadValidator`
|
||||||
|
(quick `dotnet run`-and-check, or a unit test that binds it).
|
||||||
|
|
||||||
|
## Phase 1 — Backend instrumentation
|
||||||
|
|
||||||
|
**Owns:** new `Proxy/TagValueCapture.cs`, `Proxy/TagCaptureRegistry.cs`; modify
|
||||||
|
`Proxy/PerPlcContext.cs`, `Proxy/BcdPduPipeline.cs`, `Proxy/ProxyWorker.cs`;
|
||||||
|
new DTOs in new `Admin/DebugDto.cs`; modify `Admin/StatusSnapshotBuilder.cs`.
|
||||||
|
|
||||||
|
1. Implement `TagValueCapture` (immutable-slot swap, armed gate, frozen
|
||||||
|
address→index map, `Snapshot`).
|
||||||
|
2. Implement `TagCaptureRegistry` (singleton; arm/disarm/disarmAll/rebuild).
|
||||||
|
3. Add `Capture` to `PerPlcContext` + `WithCurrentRequest`.
|
||||||
|
4. Add the four `ctx.Capture?.Record(...)` calls in `BcdPduPipeline`.
|
||||||
|
5. `ProxyWorker` builds a `TagValueCapture` per PLC, registers each in the
|
||||||
|
registry, wires it into the matching `PerPlcContext`; the tag-list
|
||||||
|
hot-reload path calls `registry.Rebuild(...)`.
|
||||||
|
6. Add `TagValueDto` / `PlcDebugSnapshot` DTOs (in `DebugDto.cs`, with a
|
||||||
|
`JsonSerializable` context) and `StatusSnapshotBuilder.BuildPlcDetail(name)`
|
||||||
|
returning per-PLC counters + capture snapshot.
|
||||||
|
|
||||||
|
**Unit tests** (`tests/Mbproxy.Tests/Proxy/TagValueCaptureTests.cs`,
|
||||||
|
`TagCaptureRegistryTests.cs`, extend `Proxy/.../BcdPduPipeline` tests):
|
||||||
|
|
||||||
|
- `TagValueCapture`: disarmed `Record` is a no-op; armed `Record` updates the
|
||||||
|
matching slot; unknown address ignored; `Disarm` clears slots; re-arm starts
|
||||||
|
empty; 16-bit and 32-bit slots; **concurrency** — parallel `Record` from many
|
||||||
|
threads + concurrent `Snapshot`, assert every observed slot is internally
|
||||||
|
coherent (raw/decoded/dir/time from one `Record`).
|
||||||
|
- `TagCaptureRegistry`: arm/disarm reach the right capture; `DisarmAll`;
|
||||||
|
`Rebuild` preserves the armed flag and re-keys to the new address set;
|
||||||
|
`TryGet`/arm for an unknown name is a safe no-op.
|
||||||
|
- `BcdPduPipeline`: with an armed capture, FC03 / FC04 (16-bit + 32-bit) record
|
||||||
|
raw BCD + decoded; FC06 / FC16 record client value + encoded BCD with
|
||||||
|
direction Write; **regression** — with a disarmed capture and with a `null`
|
||||||
|
capture the pipeline still rewrites identically and never throws.
|
||||||
|
- `StatusSnapshotBuilder.BuildPlcDetail` shape, including an unknown PLC name.
|
||||||
|
|
||||||
|
**Gate 1:** `dotnet build -c Debug` → 0 warnings (TreatWarningsAsErrors).
|
||||||
|
`dotnet test` → full suite green, all existing pipeline tests still pass,
|
||||||
|
new tests above present and passing.
|
||||||
|
|
||||||
|
## Phase 2 — SignalR hub + broadcaster
|
||||||
|
|
||||||
|
**Owns:** new `Admin/StatusHub.cs`, `Admin/StatusBroadcaster.cs`; modify
|
||||||
|
`Admin/AdminEndpointHost.cs`, `Options/MbproxyOptions.cs`,
|
||||||
|
`Configuration/ReloadValidator.cs`.
|
||||||
|
|
||||||
|
1. `Options/MbproxyOptions.cs`: add `AdminPushIntervalMs` (default 1000);
|
||||||
|
`ReloadValidator` rejects values ≤ 0.
|
||||||
|
2. `StatusHub` with `SubscribeFleet` / `SubscribePlc` + the per-PLC subscriber
|
||||||
|
counter → `TagCaptureRegistry` arm/disarm; `OnDisconnectedAsync` cleanup.
|
||||||
|
3. `StatusBroadcaster` push loop; `Stop()` → `DisarmAll()`.
|
||||||
|
4. `AdminEndpointHost.StartAppAsync`: `AddSignalR()` (+ source-gen JSON
|
||||||
|
resolver), re-register `TagCaptureRegistry`/`StatusSnapshotBuilder` into the
|
||||||
|
inner container, `MapHub<StatusHub>("/hub/status")`, create + start the
|
||||||
|
broadcaster after `app.StartAsync`. `StopCurrentAppAsync`: stop the
|
||||||
|
broadcaster before stopping the app. AdminPort hot-reload path inherits this.
|
||||||
|
|
||||||
|
**Unit tests** (`Admin/StatusHubTests.cs`, `Admin/StatusBroadcasterTests.cs`,
|
||||||
|
extend `Options/MbproxyOptionsBindingTests.cs`,
|
||||||
|
`Configuration/ReloadValidatorTests.cs`):
|
||||||
|
|
||||||
|
- `StatusHub` (mock `IGroupManager`, `HubCallerContext`, `IGroupManager`-bearing
|
||||||
|
`Clients`): `SubscribeFleet` joins `fleet`; `SubscribePlc("x")` joins `plc:x`
|
||||||
|
and arms x on the first subscriber; a second subscriber does not re-arm;
|
||||||
|
`OnDisconnectedAsync` disarms only on the last leave; unknown PLC name → no
|
||||||
|
throw, no arm.
|
||||||
|
- `StatusBroadcaster` (mock `IHubContext<StatusHub>`): pushes to `fleet` each
|
||||||
|
tick; **skips** a `plc:x` push when that group has no subscribers; pushes
|
||||||
|
per-PLC detail incl. capture snapshot when subscribed; `Stop()` disarms all.
|
||||||
|
- `AdminPushIntervalMs`: binding default = 1000; `ReloadValidator` rejects 0
|
||||||
|
and negatives.
|
||||||
|
|
||||||
|
**Gate 2:** build → 0 warnings; `dotnet test` → full suite green incl. the new
|
||||||
|
hub/broadcaster tests; service starts and `/hub/status` negotiate responds 200.
|
||||||
|
|
||||||
|
## Phase 3 — Asset pipeline + routing
|
||||||
|
|
||||||
|
**Owns:** modify `Mbproxy.csproj`, `Admin/AdminEndpointHost.cs`; add vendored +
|
||||||
|
placeholder app files under `src/Mbproxy/Admin/wwwroot/`; delete
|
||||||
|
`Admin/StatusHtmlRenderer.cs`.
|
||||||
|
|
||||||
|
1. Vendor Bootstrap 5, the `@microsoft/signalr` browser bundle, and the two
|
||||||
|
SIL-OFL woff2 fonts into `wwwroot/vendor/`. Record exact versions + SHA-256
|
||||||
|
of each vendored file in the progress log (provenance for a firewalled
|
||||||
|
build).
|
||||||
|
2. Create placeholder `index.html`, `plc.html`, `theme.css`, `dashboard.css`,
|
||||||
|
`dashboard.js`, `detail.css`, `detail.js` (real content in Phases 4/5);
|
||||||
|
`theme.css` carries the shared CSS variables + `@font-face` + base chrome so
|
||||||
|
Phases 4 and 5 never both edit one CSS file.
|
||||||
|
3. `<EmbeddedResource Include="Admin\wwwroot\**" />` in the csproj.
|
||||||
|
4. Replace `GET /` (serve `index.html`), add `GET /plc/{name}` (serve
|
||||||
|
`plc.html`) and `GET /assets/{*path}` (stream embedded resource, content-type
|
||||||
|
map, immutable cache header); delete `StatusHtmlRenderer` and remove its use.
|
||||||
|
|
||||||
|
**Tests** (extend `Admin/AdminEndpointTests.cs`, live in-process Kestrel):
|
||||||
|
|
||||||
|
- `GET /` → 200 `text/html`; `GET /plc/foo` → 200 `text/html`.
|
||||||
|
- `GET /assets/vendor/bootstrap.min.css` → 200 `text/css` + long cache header;
|
||||||
|
`.js` → `text/javascript`; `.woff2` → `font/woff2`.
|
||||||
|
- `GET /assets/does-not-exist` → 404.
|
||||||
|
- `GET /status.json` still returns the valid shape (regression).
|
||||||
|
- Delete `StatusHtmlRendererTests.cs`; remove/replace the `AdminEndpointTests`
|
||||||
|
case that asserted the old PLC-table HTML.
|
||||||
|
|
||||||
|
**Gate 3:** build → 0 warnings; `dotnet test` green; `install/publish.ps1 -Rid
|
||||||
|
win-x64` produces a single-file binary that serves `/`, `/plc/{name}`, and every
|
||||||
|
`/assets/*` file with correct content types — verified in a browser with
|
||||||
|
devtools showing **zero external network requests**.
|
||||||
|
|
||||||
|
## Phase 4 — Fleet dashboard frontend
|
||||||
|
|
||||||
|
**Owns:** `wwwroot/index.html`, `wwwroot/dashboard.css`, `wwwroot/dashboard.js`.
|
||||||
|
*(Disjoint from Phase 5's files — see Parallel-safety below.)*
|
||||||
|
|
||||||
|
1. **Aggregate header** — cards: listeners bound/configured, total connected
|
||||||
|
clients, fleet PDU rate (Δcounter/Δt across successive pushes, computed
|
||||||
|
client-side), PLCs in `recovering`, total backend exceptions, fleet
|
||||||
|
coalesce% and cache%.
|
||||||
|
2. **KPI table** — one row per PLC, Tier-1 columns only: state (color chip),
|
||||||
|
clients, PDU rate, RTT, exceptions, coalesce%, cache%, keepalive health.
|
||||||
|
Full detail lives on the detail page.
|
||||||
|
3. **Filter/sort** — client-side: name search, state filter, "problems only"
|
||||||
|
toggle (recovering, or non-zero exceptions, or failed heartbeats), sortable
|
||||||
|
columns.
|
||||||
|
4. SignalR client: connect, `SubscribeFleet()`, re-render per push, automatic
|
||||||
|
reconnect with a visible connection-state indicator.
|
||||||
|
5. Row click → `window.open('/plc/' + encodeURIComponent(name), '_blank')`.
|
||||||
|
6. Apply the refined technical-light theme (Bootstrap CSS-variable overrides in
|
||||||
|
`theme.css`; view-specific rules in `dashboard.css`).
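
Two of the steps above are small enough to sketch directly: the client-side rate (Δcounter/Δt across successive pushes) and the "problems only" predicate. The snapshot and row field names (`pduCount`, `atMs`, `state`, `exceptions`, `heartbeatsFailed`) are hypothetical and stand in for whatever the pushed DTO actually carries.

```javascript
// Illustrative sketch: derive a per-second rate from two successive pushes.
// The snapshot shape ({ pduCount, atMs }) is an assumption, not the real DTO.
function pduRate(previous, current) {
  if (!previous) return null; // first push: nothing to diff against
  const elapsedSeconds = (current.atMs - previous.atMs) / 1000;
  const delta = current.pduCount - previous.pduCount;
  // A non-positive interval or a counter that went backwards (for example after
  // a service restart) yields no rate rather than a misleading number.
  if (elapsedSeconds <= 0 || delta < 0) return null;
  return delta / elapsedSeconds;
}

// "Problems only": recovering, or non-zero exceptions, or failed heartbeats.
function isProblemRow(row) {
  return row.state === "recovering" || row.exceptions > 0 || row.heartbeatsFailed > 0;
}
```

Returning `null` instead of `0` lets the table render a placeholder for the first push, so an unknown rate is not confused with an idle PLC.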
|
||||||
|
|
||||||
|
**Gate 4 — claude-in-chrome smoke:** start the service with the Phase-0 smoke
|
||||||
|
config + simulator; drive Chrome via the MCP to load `/` and assert: header
|
||||||
|
cards render with non-placeholder values; the table has the expected row count;
|
||||||
|
a counter value changes within ~2 push cycles (live update); the "problems
|
||||||
|
only" filter hides the healthy rows; a row click opens a `/plc/{name}` tab.
|
||||||
|
Build + `dotnet test` still green.
|
||||||
|
|
||||||
|
## Phase 5 — Detail page + debug view
|
||||||
|
|
||||||
|
**Owns:** `wwwroot/plc.html`, `wwwroot/detail.css`, `wwwroot/detail.js`.
|
||||||
|
|
||||||
|
1. Read PLC name from `location.pathname`; SignalR connect; `SubscribePlc(name)`.
|
||||||
|
2. **Grouped counter cards** — Listener, Clients (+ per-upstream-client list),
|
||||||
|
PDU traffic, Backend health, Multiplexer, Coalescing, Cache, Keepalive,
|
||||||
|
Bytes. Every per-PLC counter, regrouped for readability.
|
||||||
|
3. **Debug view** — per-tag live-value table: tag address, width (16/32), raw
|
||||||
|
PLC-side value (hex BCD nibbles), decoded client-side value, direction, age.
|
||||||
|
Stale rows dimmed; "no traffic yet" before the first capture.
|
||||||
|
4. Connection-state indicator; clear "PLC no longer configured" state for an
|
||||||
|
unknown / hot-reload-removed name.
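
As a rough sketch of steps 1 and 3, the helpers below pull the PLC name out of the path and render a raw BCD value next to its decoded form. The function names are hypothetical, and the sketch assumes the raw value arrives as a single number already combined for 32-bit tags; how the real DTO represents a 32-bit raw value is not stated in the plan.

```javascript
// Extract the PLC name from a path such as "/plc/line-a"; null for any other route.
function plcNameFromPath(pathname) {
  const match = /^\/plc\/([^/]+)\/?$/.exec(pathname);
  if (!match) return null;
  try {
    // The dashboard opens the page with encodeURIComponent(name), so decode here.
    return decodeURIComponent(match[1]);
  } catch {
    return null; // malformed percent-encoding
  }
}

// Render a raw value as zero-padded hex nibbles, one character per BCD digit.
function formatRawBcd(raw, widthBits) {
  return raw.toString(16).toUpperCase().padStart(widthBits / 4, "0");
}

// Decode packed BCD: each nibble is one decimal digit. Returns null if any
// nibble exceeds 9, meaning the value is not valid BCD.
function decodeBcd(raw, widthBits) {
  let value = 0;
  for (let shift = widthBits - 4; shift >= 0; shift -= 4) {
    // Division instead of bit shifts avoids 32-bit signed overflow in JavaScript.
    const digit = Math.floor(raw / 2 ** shift) % 16;
    if (digit > 9) return null;
    value = value * 10 + digit;
  }
  return value;
}
```

In the detail page, `plcNameFromPath(location.pathname)` would supply the argument for `SubscribePlc(name)`, and a `null` result maps naturally onto the "PLC no longer configured" state.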
|
||||||
|
|
||||||
|
**Gate 5 — claude-in-chrome smoke:** load `/plc/{name}` for a tagged PLC; assert
|
||||||
|
the grouped cards render; the debug table is initially empty, then populates
|
||||||
|
after the smoke harness issues simulator BCD reads/writes (capture armed on page
|
||||||
|
open); confirm the connection indicator shows "connected". Optionally verify the
|
||||||
|
capture disarms after the tab closes (registry state inspected via a test hook
|
||||||
|
or by reopening and seeing an empty table). Build + `dotnet test` still green.
|
||||||
|
|
||||||
|
## Phase 6 — Docs, full regression, cleanup
|
||||||
|
|
||||||
|
1. Rewrite the "HTML Page Layout" section of `docs/Operations/StatusPage.md` as
|
||||||
|
the two-view + SignalR description; document `/plc/{name}`, `/hub/status`,
|
||||||
|
`/assets/*`, the debug view, and on-demand capture. The `/status.json`
|
||||||
|
section stays as-is.
|
||||||
|
2. `docs/Operations/Configuration.md` — add `Mbproxy.AdminPushIntervalMs`.
|
||||||
|
3. `docs/Reference/LogEvents.md` — add any new `mbproxy.admin.*` events (hub
|
||||||
|
start; capture arm/disarm at Debug level).
|
||||||
|
4. `mbproxy/CLAUDE.md` + `README.md` — refresh the admin-endpoint headline
|
||||||
|
bullet (two views, SignalR, debug view).
|
||||||
|
5. Confirm `StatusHtmlRenderer*` fully removed; `StatusSnapshotBuilderTests`
|
||||||
|
updated for `BuildPlcDetail`.
|
||||||
|
6. Vendored-asset provenance table finalized in the progress log.
|
||||||
|
|
||||||
|
**Gate 6:** full `dotnet test` green on Windows (and `linux-x64` per the
|
||||||
|
multiplatform plan); `docs/` internally consistent; single-file publish serves
|
||||||
|
the new UI with zero external requests; both chrome smoke flows (Gate 4 + 5)
|
||||||
|
re-run green end-to-end.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Parallel-safety / file-ownership
|
||||||
|
|
||||||
|
Execution is **sequential single-agent** (chosen). This section documents how
|
||||||
|
the work *could* be split if delegated, and the ordering constraints that hold
|
||||||
|
regardless.
|
||||||
|
|
||||||
|
**Hard ordering (must be sequential):**
|
||||||
|
|
||||||
|
- **1 → 2:** Phase 2 (`StatusHub`, broadcaster, `StatusSnapshotBuilder` use)
|
||||||
|
depends on Phase 1's `TagCaptureRegistry`, `TagValueCapture`, and
|
||||||
|
`BuildPlcDetail`.
|
||||||
|
- **2 → 3:** Phases 2 and 3 **both edit `AdminEndpointHost.cs`** — they cannot
|
||||||
|
run concurrently. Do Phase 2's hub/broadcaster wiring, then Phase 3's
|
||||||
|
route/asset wiring, in the same file sequentially.
|
||||||
|
- **3 → 4, 3 → 5:** the frontend phases need the asset routes and the
|
||||||
|
`theme.css` shared base from Phase 3.
|
||||||
|
|
||||||
|
**Parallelizable (if ever delegated):** Phases 4 and 5 touch **disjoint files** —
|
||||||
|
Phase 4 owns `index.html` / `dashboard.css` / `dashboard.js`; Phase 5 owns
|
||||||
|
`plc.html` / `detail.css` / `detail.js`; `theme.css` is frozen in Phase 3 and
|
||||||
|
edited by neither. Both phases are pure static-asset edits — **no `.cs`, no
|
||||||
|
build during the phase** — so two agents can work the same checkout with no
|
||||||
|
`obj/bin` race; the build + chrome smoke gate runs once after both. No git
|
||||||
|
worktree isolation needed. If a frontend phase turns out to need a `.cs` change
|
||||||
|
(e.g. a missing DTO field), that change is pulled back into a Phase-1/3 fix and
|
||||||
|
the parallel split is paused — frontend agents never edit `.cs`.
|
||||||
|
|
||||||
|
**File-ownership matrix:**
|
||||||
|
|
||||||
|
| File | Phase |
|
||||||
|
|---|---|
|
||||||
|
| `Proxy/TagValueCapture.cs`, `TagCaptureRegistry.cs` (new) | 1 |
|
||||||
|
| `Proxy/PerPlcContext.cs`, `BcdPduPipeline.cs`, `ProxyWorker.cs` | 1 |
|
||||||
|
| `Admin/DebugDto.cs` (new), `Admin/StatusSnapshotBuilder.cs` | 1 |
|
||||||
|
| `Admin/StatusHub.cs`, `StatusBroadcaster.cs` (new) | 2 |
|
||||||
|
| `Options/MbproxyOptions.cs`, `Configuration/ReloadValidator.cs` | 2 |
|
||||||
|
| `Admin/AdminEndpointHost.cs` | 2 then 3 (sequential, same file) |
|
||||||
|
| `Mbproxy.csproj`, `Admin/wwwroot/**` (vendored + `theme.css`) | 3 |
|
||||||
|
| `Admin/StatusHtmlRenderer.cs` (delete) | 3 |
|
||||||
|
| `wwwroot/index.html`, `dashboard.css`, `dashboard.js` | 4 |
|
||||||
|
| `wwwroot/plc.html`, `detail.css`, `detail.js` | 5 |
|
||||||
|
| `docs/**`, `CLAUDE.md`, `README.md` | 6 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase-gate checklist (applies to every phase)
|
||||||
|
|
||||||
|
1. `dotnet build -c Debug` → **0 warnings** (`TreatWarningsAsErrors` is on in
|
||||||
|
both projects — any warning fails the build).
|
||||||
|
2. `dotnet test` → **full suite green**, the phase's new tests present and
|
||||||
|
passing, no previously-passing test regressed.
|
||||||
|
3. The phase-specific functional check listed in its gate above.
|
||||||
|
4. No new NuGet package unless the plan names it (none are required).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Risks / open items
|
||||||
|
|
||||||
|
- **SignalR + single-file.** Reflection JSON works (not trimmed); confirmed at
|
||||||
|
Gate 2. The source-gen resolver registration covers a future trim.
|
||||||
|
- **Inner-container DI.** `AdminEndpointHost`'s `CreateSlimBuilder`
|
||||||
|
WebApplication has its own container — `TagCaptureRegistry` and
|
||||||
|
`StatusSnapshotBuilder` must be explicitly re-registered there for the hub.
|
||||||
|
Easy to miss; called out in Phase 2 step 4.
|
||||||
|
- **Capture arm leak on AdminPort hot-reload.** The WebApplication is torn down
|
||||||
|
and rebuilt; `StatusBroadcaster.Stop()` → `DisarmAll()` guarantees no capture
|
||||||
|
stays armed. Tested in Phase 2.
|
||||||
|
- **Hot-reload of the tag list** must `Rebuild` a PLC's `TagValueCapture` for
|
||||||
|
the new address set while preserving the armed flag — Phase 1 step 5.
|
||||||
|
- **PDU-rate cards** are Δcounter/Δt computed client-side from successive
|
||||||
|
pushes — no new server counter.
|
||||||
|
- **Vendored-asset size.** Bootstrap + SignalR + 2 woff2 ≈ a few hundred KB
|
||||||
|
embedded — negligible against a ~100 MB self-contained binary. The old
|
||||||
|
≤50 KB page-weight budget was a no-JS constraint and no longer applies.
|
||||||
|
- **Chrome smoke flakiness.** The MCP-driven gates wait on push cycles; use
|
||||||
|
explicit waits on counter change, not fixed sleeps, and a generous timeout.
|
||||||
|
|
||||||
|
## Progress log
|
||||||
|
|
||||||
|
- **2026-05-15 — Phase 0 done.** Added `tests/sim/mbproxy.smoke.config.json`
|
||||||
|
(line-a 16-bit BCD tag, line-b 32-bit BCD tag → dl205 sim on 127.0.0.1:5020;
|
||||||
|
line-dead → unreachable 192.0.2.1 for the "problems only" filter).
|
||||||
|
- **2026-05-15 — Phase 1 done, Gate 1 green.** New: `Proxy/TagValueCapture.cs`
|
||||||
|
(immutable-slot `Volatile.Write` swap, armed gate), `Proxy/TagCaptureRegistry.cs`,
|
||||||
|
`Admin/DebugDto.cs`. Modified: `PerPlcContext` (+`Capture`), `BcdPduPipeline`
|
||||||
|
(4 `Record` hooks — FC03/04 16+32-bit read, FC06/FC16 write), `ProxyWorker` +
|
||||||
|
`ConfigReconciler` (registry wiring incl. reseat-rebuild + remove),
|
||||||
|
`StatusSnapshotBuilder.BuildDebug`, `HostingExtensions` (DI singleton).
|
||||||
|
`dotnet build` 0 warnings; `dotnet test` **436 passed / 0 failed / 0 skipped**
|
||||||
|
(incl. 23 new Phase-1 tests: TagValueCaptureTests ×9, TagCaptureRegistryTests
|
||||||
|
×6, BcdPduPipelineCaptureTests ×6, StatusSnapshotBuilder BuildDebug ×2).
|
||||||
+- **2026-05-15 — Phase 2 done, Gate 2 green.** New: `Admin/StatusHub.cs`,
+  `StatusBroadcaster.cs`, `StatusPushSink.cs` (`IStatusPushSink` seam +
+  `SignalRStatusPushSink`), `PlcSubscriptionTracker.cs`. Modified:
+  `MbproxyOptions` (+`AdminPushIntervalMs`, schema + `ReloadValidator`),
+  `AdminEndpointHost` (`AddSignalR`, `MapHub<StatusHub>("/hub/status")`,
+  broadcaster lifecycle tied to the Kestrel app, `DisarmAll` on stop),
+  `HostingExtensions` (`PlcSubscriptionTracker` singleton). Decision: kept
+  SignalR's default reflection JSON protocol (project is not `PublishTrimmed`,
+  so the source-gen resolver is unnecessary — recorded as a deliberate
+  deviation from the plan's "register source-gen resolver" note).
+  `dotnet build` 0 warnings; `dotnet test` **448 passed / 0 failed / 0 skipped**
+  (+12: StatusHubTests ×4, StatusBroadcasterTests ×4, ReloadValidator ×2,
+  MbproxyOptionsBinding ×2). Live check: service starts,
+  `POST /hub/status/negotiate` → 200, `/status.json` → 200.
+- **2026-05-15 — Phase 3 done, Gate 3 green.** Vendored (jsdelivr) into
+  `Admin/wwwroot/` (flat, embedded): Bootstrap 5.3.3, SignalR JS 8.0.7,
+  IBM Plex Sans 400/600 + Mono 500 (fontsource 5.1.1) — see provenance table.
+  New routes in `AdminEndpointHost`: `GET /` + `GET /plc/{name}` (embedded SPA
+  shells, `no-cache`), `GET /assets/{path}` (embedded streamer, content-type
+  map, immutable cache, traversal-rejected); `StatusHtmlRenderer` + its tests
+  deleted; `SignalR.AddJsonProtocol` camelCase pinned. `csproj`:
+  `EmbeddedResource Admin\wwwroot\*.*`. The full Phase-4/5 frontend
+  (`index.html`, `dashboard.{css,js}`, `plc.html`, `detail.{css,js}`,
+  `theme.css`) was written in this pass too. `dotnet build` 0 warnings;
+  `dotnet test` **452 passed / 0 failed** (AdminEndpointTests rewritten: route
+  + content-type + immutable-header + 404 coverage; `StatusHtmlRendererTests`
+  removed). Single-file-publish browser verification folded into Gate 4/5.
+- **2026-05-15 — Phases 4 + 5 done, Gates 4 + 5 green.** Frontend already
+  written in the Phase-3 pass; this pass ran the browser smoke. Environment
+  note: the claude-in-chrome MCP browser could not reach `127.0.0.1:8080`
+  (Chrome on this box is behind a corporate proxy with no localhost bypass —
+  even the sim's own `:8081` console failed). Substituted the **Playwright MCP
+  browser** (own Chromium, no proxy) — a real-browser smoke, just a different
+  driver. Setup: dl205 simulator on `:5020`, mbproxy on the smoke config, a
+  pymodbus traffic generator (FC03 reads of V1072 on line-a/line-b, TCP touches
+  on line-dead). **Gate 4:** dashboard renders 6 aggregate cards + 3-row KPI
+  table; live update verified (PDU/s 0 → 52 → 67, uptime ticking); "problems
+  only" filter → "1 of 3" (line-dead, non-zero connectsFailed). **Gate 5:**
+  `/plc/line-a` renders all 9 grouped counter cards + per-client line; debug
+  view shows **CAPTURE ARMED** (on-demand arm on page open) with tag 1072 →
+  raw `0x1234` (PLC side) / decoded `1234` (client side); `/plc/line-b` 32-bit
+  tag → raw `0x00001234` / decoded `1234`. Build 0 warnings, full suite still
+  452 green.
+- **2026-05-15 — Phase 6 done, Gate 6 green.** Docs: `StatusPage.md` "Endpoint
+  Surface" + "Web Dashboard" + new "Debug View Data" sections rewritten;
+  `Configuration.md` gained `Mbproxy.AdminPushIntervalMs`; `mbproxy/CLAUDE.md`
+  + `README.md` admin bullets refreshed. No new `mbproxy.*` log events were
+  added (broadcaster/hub use plain `LogError`), so `LogEvents.md` is unchanged.
+  `StatusHtmlRenderer` + tests confirmed removed; `StatusSnapshotBuilderTests`
+  updated. `dotnet build` 0 warnings; `dotnet test` **452 passed / 0 failed**.
+  Single-file `dotnet publish -c Release -r win-x64` → 105 MB self-contained
+  `Mbproxy.exe`; live check: `/`, `/plc/{name}`, `/assets/{bootstrap.min.css,
+  signalr.min.js,ibm-plex-sans-400.woff2}` all 200 with correct content types,
+  `/hub/status/negotiate` 200, unknown asset 404, and `index.html` contains
+  **zero external URLs** (embedded resources resolve fine inside the
+  single-file bundle).
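
The raw/decoded pairs reported for Gate 5 (raw `0x1234` giving decoded `1234`, and raw `0x00001234` giving decoded `1234`) can be reproduced by hand. The sketch below assumes plain packed BCD, one decimal digit per 4-bit nibble with the most significant nibble first. `DecodeBcd` is a hypothetical helper written for illustration; it is not taken from `BcdPduPipeline`, whose implementation is not shown in this diff.

```csharp
using System;

// Hypothetical helper: decode a packed-BCD value of `nibbles` digits.
// Assumption: each nibble holds one decimal digit, most significant first.
static long DecodeBcd(uint raw, int nibbles)
{
    long value = 0;
    for (int i = nibbles - 1; i >= 0; i--)
    {
        uint digit = (raw >> (i * 4)) & 0xF;
        if (digit > 9)
            throw new FormatException($"Nibble {i} is 0x{digit:X}, which is not a BCD digit.");
        value = value * 10 + digit;
    }
    return value;
}

Console.WriteLine(DecodeBcd(0x1234, 4));      // 16-bit tag: prints 1234
Console.WriteLine(DecodeBcd(0x00001234, 8));  // 32-bit tag: prints 1234
```

Rejecting nibbles above 9 is a design choice of this sketch; how the proxy treats non-BCD words is not stated in the log above.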
+
+## Vendored-asset provenance
+
+Filled in during Phase 3.
+
+Vendored 2026-05-15 from `cdn.jsdelivr.net/npm`. SHA-256 of the stored files:
+
+| File | Package / source | Version | SHA-256 | License |
+|---|---|---|---|---|
+| `bootstrap.min.css` | bootstrap | 5.3.3 | `3c8f27e6009ccfd710a905e6dcf12d0ee3c6f2ac7da05b0572d3e0d12e736fc8` | MIT |
+| `bootstrap.bundle.min.js` | bootstrap | 5.3.3 | `0833b2e9c3a26c258476c46266e6877fc75218625162e0460be9a3a098a61c6c` | MIT |
+| `signalr.min.js` | @microsoft/signalr | 8.0.7 | `e28a720a359b37cb015758d543f908730ed5bbe478db09506bb6887f18313538` | MIT |
+| `ibm-plex-sans-400.woff2` | @fontsource/ibm-plex-sans | 5.1.1 | `db71f8a28ad8501544fb4e7668e3c6d0b731760b6f20de3525ebaeba597f1922` | SIL OFL 1.1 |
+| `ibm-plex-sans-600.woff2` | @fontsource/ibm-plex-sans | 5.1.1 | `31535a91ce3f6b8ed3ddedadab1e49957e2220263a640df1a3f14f6fdfe15eb6` | SIL OFL 1.1 |
+| `ibm-plex-mono-500.woff2` | @fontsource/ibm-plex-mono | 5.1.1 | `756026ff72eb76fd971ac4b7504cec55eef62109d2684c2cad8da32170b80b37` | SIL OFL 1.1 |
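
The pinned digests in the provenance table can be re-checked with a few lines of C#. This is a sketch under stated assumptions: `Sha256Hex` and `MatchesPinned` are hypothetical helper names, and the demo hashes the in-memory ASCII string `abc` (whose SHA-256 is a well-known test vector) so that it runs without the vendored files present.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: lowercase hex SHA-256 of a byte buffer.
static string Sha256Hex(byte[] data) =>
    Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant();

// Hypothetical helper: compare a buffer's digest with a pinned hex string.
static bool MatchesPinned(byte[] data, string pinnedHex) =>
    string.Equals(Sha256Hex(data), pinnedHex, StringComparison.OrdinalIgnoreCase);

// Demo input; for a vendored asset one would pass File.ReadAllBytes(path)
// and the SHA-256 column value from the table instead.
byte[] sample = Encoding.ASCII.GetBytes("abc");
const string pinned = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";

Console.WriteLine(Sha256Hex(sample));
Console.WriteLine(MatchesPinned(sample, pinned) ? "digest matches" : "digest MISMATCH");
```

A mismatch against the table would indicate that a stored file differs from what was vendored on 2026-05-15.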
@@ -1,9 +1,12 @@
+using System.Collections.Concurrent;
 using System.Text.Json;
 using Microsoft.AspNetCore.Builder;
 using Microsoft.AspNetCore.Hosting;
 using Microsoft.AspNetCore.Http;
+using Microsoft.AspNetCore.SignalR;
 using Microsoft.Extensions.Options;
 using Mbproxy.Options;
+using Mbproxy.Proxy;
 
 namespace Mbproxy.Admin;
 
@@ -23,7 +26,10 @@ namespace Mbproxy.Admin;
 /// <item><see cref="StopAsync"/> shuts down the current Kestrel app with a 2 s deadline.</item>
 /// </list>
 ///
-/// <para>Routes: exactly two — <c>GET /</c> (HTML) and <c>GET /status.json</c> (JSON).</para>
+/// <para>Routes: <c>GET /</c> and <c>GET /plc/{name}</c> (embedded SPA shells),
+/// <c>GET /assets/{path}</c> (embedded Bootstrap / SignalR / fonts / app JS+CSS),
+/// <c>GET /status.json</c> (JSON snapshot for scrapers), and the SignalR hub at
+/// <c>/hub/status</c> driving the live dashboard feed.</para>
 ///
 /// <para>Registered as a plain singleton (not <see cref="IHostedService"/>) so
 /// <see cref="Proxy.ProxyWorker"/> can drive its lifecycle explicitly. This is required to
@@ -35,12 +41,18 @@ internal sealed partial class AdminEndpointHost : IAsyncDisposable
 {
     private readonly IOptionsMonitor<MbproxyOptions> _optionsMonitor;
     private readonly StatusSnapshotBuilder _builder;
+    private readonly TagCaptureRegistry _captureRegistry;
+    private readonly PlcSubscriptionTracker _subscriptionTracker;
     private readonly ILoggerFactory _loggerFactory;
     private readonly ILogger<AdminEndpointHost> _logger;
 
     // The currently-running Kestrel app; null when stopped or when bind failed.
     private WebApplication? _app;
 
+    // SignalR push loop for the live dashboard. Lifecycle is tied to _app: created
+    // when the Kestrel app starts, stopped (and captures disarmed) before it stops.
+    private StatusBroadcaster? _broadcaster;
+
     // Protects concurrent Start/Stop calls (hot-reload + StopAsync racing).
     private readonly SemaphoreSlim _lock = new(1, 1);
 
@@ -60,12 +72,16 @@ internal sealed partial class AdminEndpointHost : IAsyncDisposable
     public AdminEndpointHost(
         IOptionsMonitor<MbproxyOptions> optionsMonitor,
         StatusSnapshotBuilder builder,
+        TagCaptureRegistry captureRegistry,
+        PlcSubscriptionTracker subscriptionTracker,
         ILoggerFactory loggerFactory)
     {
         _optionsMonitor = optionsMonitor;
         _builder = builder;
-        _loggerFactory = loggerFactory;
-        _logger = loggerFactory.CreateLogger<AdminEndpointHost>();
+        _captureRegistry = captureRegistry;
+        _subscriptionTracker = subscriptionTracker;
+        _loggerFactory = loggerFactory;
+        _logger = loggerFactory.CreateLogger<AdminEndpointHost>();
     }
 
     public async Task StartAsync(CancellationToken cancellationToken)
@@ -145,11 +161,26 @@ internal sealed partial class AdminEndpointHost : IAsyncDisposable
 
     /// <summary>
     /// Builds and starts a Kestrel <see cref="WebApplication"/> on <paramref name="port"/>.
-    /// On bind failure, logs the error and sets <c>_app = null</c> — does NOT throw.
-    /// Caller must hold <c>_lock</c> or be in a single-threaded context (StartAsync).
+    /// <paramref name="port"/> 0 means the admin endpoint is disabled — no listener is
+    /// started. On bind failure, logs the error and sets <c>_app = null</c> — does NOT
+    /// throw. Caller must hold <c>_lock</c> or be in a single-threaded context (StartAsync).
     /// </summary>
     private async Task StartAppAsync(int port, CancellationToken ct)
     {
+        // AdminPort 0 means "admin endpoint disabled" — the documented disable switch in
+        // the config templates. Kestrel would otherwise interpret port 0 as "bind an
+        // OS-assigned ephemeral port", leaving an unadvertised HTTP listener running
+        // (review TestsAndConfig M1). Skip the build entirely.
+        if (port == 0)
+        {
+            _logger.LogInformation("Admin endpoint disabled (Mbproxy.AdminPort = 0).");
+            return;
+        }
+
+        // Declared outside the try so the catch can dispose a built-but-not-fully-started
+        // app on a bind failure (M6 — otherwise a built WebApplication or a started
+        // Kestrel listener leaks on any throw after Build()).
+        WebApplication? app = null;
         try
         {
             // Use CreateSlimBuilder with explicit args (empty) to avoid inheriting
@@ -170,14 +201,42 @@ internal sealed partial class AdminEndpointHost : IAsyncDisposable
                 k.Listen(System.Net.IPAddress.Any, port);
             });
 
-            var app = builder.Build();
+            // SignalR hub for the live dashboard. The inner WebApplication has its own
+            // DI container, so the singleton StatusHub depends on is re-registered here.
+            // The payload serialises via reflection-based System.Text.Json with a
+            // camelCase policy — the same wire shape as GET /status.json. The project
+            // does not trim/AOT, so a reflection JSON path is acceptable here.
+            builder.Services
+                .AddSignalR()
+                .AddJsonProtocol(o => ConfigureHubPayloadJson(o.PayloadSerializerOptions));
+            builder.Services.AddSingleton(_subscriptionTracker);
+
+            app = builder.Build();
 
             // ── Routes ───────────────────────────────────────────────────────
-            app.MapGet("/", (HttpContext ctx) =>
+            // GET /            — fleet dashboard SPA shell
+            // GET /plc/{name}  — connection-detail SPA shell (name read client-side)
+            // GET /assets/...  — embedded Bootstrap / SignalR / fonts / app JS+CSS
+            // GET /status.json — unchanged JSON snapshot for scrapers
+            // /hub/status      — SignalR hub for the live feed
+            app.MapGet("/", (HttpContext ctx) => ServeHtmlShell(ctx, "index.html"));
+
+            app.MapGet("/plc/{name}", (string name, HttpContext ctx) => ServeHtmlShell(ctx, "plc.html"));
+
+            app.MapGet("/assets/{path}", (string path, HttpContext ctx) =>
             {
-                var snapshot = _builder.Build();
-                string html = StatusHtmlRenderer.Render(snapshot);
-                return Results.Content(html, "text/html; charset=utf-8");
+                // Flat asset directory — a path segment with a slash or "." traversal
+                // can never match a resource, but reject it explicitly anyway.
+                if (path.Contains('/') || path.Contains('\\') || path.Contains(".."))
+                    return Results.NotFound();
+
+                var bytes = ReadAssetCached(path);
+                if (bytes is null)
+                    return Results.NotFound();
+
+                // Vendored assets are content-addressed by filename+version → immutable.
+                ctx.Response.Headers.CacheControl = "public, max-age=31536000, immutable";
+                return Results.Bytes(bytes, ContentTypeFor(path));
             });
 
             app.MapGet("/status.json", (HttpContext ctx) =>
@@ -187,16 +246,46 @@ internal sealed partial class AdminEndpointHost : IAsyncDisposable
                 return Results.Content(json, "application/json");
             });
 
+            app.MapHub<StatusHub>("/hub/status");
+
             await app.StartAsync(ct).ConfigureAwait(false);
             _app = app;
 
+            // Start the SignalR push loop now that the hub is reachable.
+            var hubContext = app.Services.GetRequiredService<IHubContext<StatusHub>>();
+            _broadcaster = new StatusBroadcaster(
+                new SignalRStatusPushSink(hubContext),
+                _builder,
+                _subscriptionTracker,
+                _captureRegistry,
+                _optionsMonitor,
+                _loggerFactory.CreateLogger<StatusBroadcaster>());
+            _broadcaster.Start();
+
             LogAdminStarted(_logger, port);
         }
         catch (Exception ex) when (ex is not OperationCanceledException)
         {
-            // Bind failed — log and continue. Proxy listeners are unaffected.
+            // Bind (or post-bind setup) failed — log and continue. Proxy listeners are
+            // unaffected. Tear down anything that started before the failure so neither
+            // the push loop nor a bound Kestrel listener leaks (M6).
             LogAdminBindFailed(_logger, port, ex.Message);
+
+            if (_broadcaster is { } broadcaster)
+            {
+                _broadcaster = null;
+                try { await broadcaster.DisposeAsync().ConfigureAwait(false); }
+                catch { /* best-effort */ }
+            }
+
             _app = null;
+            if (app is not null)
+            {
+                try { await app.StopAsync().ConfigureAwait(false); }
+                catch { /* best-effort — may never have started */ }
+                try { await app.DisposeAsync().ConfigureAwait(false); }
+                catch { /* best-effort */ }
+            }
         }
     }
 
@@ -205,6 +294,21 @@ internal sealed partial class AdminEndpointHost : IAsyncDisposable
     /// </summary>
     private async Task StopCurrentAppAsync()
     {
+        // Stop the SignalR push loop first — this also disarms every tag-value capture,
+        // so an AdminPort hot-reload that tears down this app never leaves one armed.
+        if (_broadcaster is { } broadcaster)
+        {
+            _broadcaster = null;
+            try
+            {
+                await broadcaster.DisposeAsync().ConfigureAwait(false);
+            }
+            catch
+            {
+                // Best-effort.
+            }
+        }
+
         if (_app is null) return;
 
         var app = _app;
@@ -223,6 +327,64 @@ internal sealed partial class AdminEndpointHost : IAsyncDisposable
         await app.DisposeAsync().ConfigureAwait(false);
     }
 
+    /// <summary>
+    /// Configures the JSON serialization for the SignalR hub payloads: camelCase property
+    /// names, matching the dashboard JS field names and the <c>GET /status.json</c> wire
+    /// shape. Extracted as a single shared method so <c>DebugDtoSerializationTests</c> can
+    /// assert against the exact same configuration the hub uses — neither side can drift.
+    /// </summary>
+    internal static void ConfigureHubPayloadJson(System.Text.Json.JsonSerializerOptions options)
+        => options.PropertyNamingPolicy = System.Text.Json.JsonNamingPolicy.CamelCase;
+
+    // ── Embedded asset serving ───────────────────────────────────────────────
+
+    private const string AssetResourcePrefix = "Mbproxy.Admin.wwwroot.";
+
+    // Embedded resources are immutable for the process lifetime; cache the decoded
+    // bytes so a 232 KB Bootstrap stylesheet is not re-materialised per request.
+    // `byte[]?` value so a known-miss is cached too. Static — shared across app rebuilds.
+    private static readonly ConcurrentDictionary<string, byte[]?> AssetCache = new(StringComparer.Ordinal);
+
+    /// <summary>
+    /// Serves an embedded HTML shell (<c>index.html</c> / <c>plc.html</c>) with a
+    /// <c>no-cache</c> header so a redeployed UI is picked up on the next load.
+    /// </summary>
+    private static IResult ServeHtmlShell(HttpContext ctx, string fileName)
+    {
+        var bytes = ReadAssetCached(fileName);
+        if (bytes is null)
+            return Results.NotFound();
+
+        ctx.Response.Headers.CacheControl = "no-cache";
+        return Results.Bytes(bytes, "text/html; charset=utf-8");
+    }
+
+    /// <summary>Reads an embedded <c>wwwroot</c> asset, caching the bytes (and misses).</summary>
+    private static byte[]? ReadAssetCached(string fileName)
+        => AssetCache.GetOrAdd(fileName, static name =>
+        {
+            var asm = typeof(AdminEndpointHost).Assembly;
+            using var stream = asm.GetManifestResourceStream(AssetResourcePrefix + name);
+            if (stream is null)
+                return null;
+            using var ms = new MemoryStream();
+            stream.CopyTo(ms);
+            return ms.ToArray();
+        });
+
+    private static string ContentTypeFor(string fileName)
+        => Path.GetExtension(fileName).ToLowerInvariant() switch
+        {
+            ".html" => "text/html; charset=utf-8",
+            ".css" => "text/css; charset=utf-8",
+            ".js" => "text/javascript; charset=utf-8",
+            ".json" => "application/json; charset=utf-8",
+            ".woff2" => "font/woff2",
+            ".svg" => "image/svg+xml",
+            ".ico" => "image/x-icon",
+            _ => "application/octet-stream",
+        };
+
     // ── IAsyncDisposable ─────────────────────────────────────────────────────
 
     public async ValueTask DisposeAsync()
@@ -233,6 +395,12 @@ internal sealed partial class AdminEndpointHost : IAsyncDisposable
         _optionsChangeRegistration?.Dispose();
         _optionsChangeRegistration = null;
 
+        if (_broadcaster is { } broadcaster)
+        {
+            _broadcaster = null;
+            await broadcaster.DisposeAsync().ConfigureAwait(false);
+        }
+
         if (_app is { } app)
         {
             _app = null;
@@ -0,0 +1,54 @@
+namespace Mbproxy.Admin;
+
+// ── Wire DTOs for the connection-detail debug view ───────────────────────────
+// Pushed over SignalR to subscribers of a single PLC's detail page. The SignalR hub
+// serialises these via reflection-based System.Text.Json with a camelCase property
+// policy (see AdminEndpointHost's AddJsonProtocol) — the same wire shape as
+// GET /status.json. The project does not trim/AOT, so the reflection path is fine.
+
+/// <summary>
+/// Per-PLC payload pushed to detail-page subscribers: the standard per-PLC status
+/// row plus the real-time debug view's tag-value table. <see cref="Plc"/> is
+/// <c>null</c> when the detail page is open for a PLC that is no longer in the
+/// configuration (removed by a hot-reload) — the page renders a "no longer
+/// configured" state.
+/// </summary>
+public sealed record PlcDetailResponse(
+    PlcStatus? Plc,
+    PlcDebugSnapshot Debug);
+
+/// <summary>
+/// Snapshot of one PLC's tag-value capture. <see cref="CaptureArmed"/> is <c>false</c>
+/// when no detail page has armed the capture (e.g. a snapshot taken in the gap before
+/// the SignalR subscription completes); the table is still returned so the view can
+/// render one row per configured BCD tag.
+/// </summary>
+public sealed record PlcDebugSnapshot(
+    bool CaptureArmed,
+    IReadOnlyList<TagValueDto> Tags);
+
+/// <summary>
+/// One configured BCD tag's last observed value. <see cref="HasValue"/> is <c>false</c>
+/// before any traffic has been captured for the tag since the capture was armed — the
+/// view renders such rows as "no traffic yet".
+/// </summary>
+public sealed record TagValueDto(
+    int Address,
+    int Width,
+    /// <summary>Optional human-friendly tag label from config; <c>null</c> when unset.</summary>
+    string? Name,
+    bool HasValue,
+    /// <summary><c>"read"</c> (FC03/FC04) or <c>"write"</c> (FC06/FC16).</summary>
+    string Direction,
+    /// <summary>
+    /// Raw PLC-side value as BCD nibbles in hex — <c>0xLLLL</c> for a 16-bit tag,
+    /// <c>0xHHHHLLLL</c> (high word, low word) for a 32-bit tag. <c>"—"</c> when
+    /// <see cref="HasValue"/> is false.
+    /// </summary>
+    string RawHex,
+    /// <summary>Decoded binary integer the upstream client reads / wrote.</summary>
+    long DecodedValue,
+    /// <summary>ISO-8601 UTC time of the observation; <c>null</c> when no traffic yet.</summary>
+    string? UpdatedAtUtc,
+    /// <summary>Seconds since the observation; <c>null</c> when no traffic yet.</summary>
+    double? AgeSeconds);
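
To make the camelCase wire shape described in the DTO header comment concrete, the sketch below serialises an anonymous object mirroring a subset of the `PlcDebugSnapshot` / `TagValueDto` fields with the same `JsonNamingPolicy.CamelCase` policy. The values are illustrative only (address 1072 and raw `0x1234` echo the Gate 5 smoke; the `Width = 16` value is an assumption about how width is expressed), and the real hub serialises the records themselves rather than an anonymous type.

```csharp
using System;
using System.Text.Json;

// Same naming policy that ConfigureHubPayloadJson applies to the hub payloads.
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };

// Anonymous stand-in for PlcDebugSnapshot with one TagValueDto-like row
// (subset of fields, illustrative values).
var payload = new
{
    CaptureArmed = true,
    Tags = new[]
    {
        new { Address = 1072, Width = 16, HasValue = true, RawHex = "0x1234", DecodedValue = 1234L },
    },
};

string json = JsonSerializer.Serialize(payload, options);
Console.WriteLine(json);
```

PascalCase C# property names come out as `captureArmed`, `rawHex`, `decodedValue` and so on, which is the casing the dashboard JavaScript would read.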
@@ -0,0 +1,109 @@
+namespace Mbproxy.Admin;
+
+/// <summary>
+/// Tracks which browser tabs currently have a PLC connection-detail page open, so the
+/// admin layer knows which PLC tag-value captures should be armed and which PLC groups
+/// the <see cref="StatusBroadcaster"/> needs to push to.
+///
+/// <para><b>Why tabs, not SignalR connections.</b> A SignalR connection is assigned a
+/// fresh <c>ConnectionId</c> on every transport reconnect (a WebSocket drop, a
+/// long-polling cycle, a network blip). Counting <em>connections</em> therefore leaks a
+/// subscriber on every reconnect — the old connection's <c>OnDisconnectedAsync</c> is
+/// not ordered against the new connection's re-subscribe, so the count never returns to
+/// 0 and the capture stays armed forever with no viewer. Instead each detail page sends
+/// a stable per-page-load <c>tabId</c>: a tab "views" a PLC for as long as it has at
+/// least one live connection. A reconnect is just the same tab acquiring a new
+/// connection, so it cannot leak; a tab is released only when its <em>last</em>
+/// connection is gone (the page closed, or SignalR's keepalive timeout elapsed for an
+/// abruptly-killed tab).</para>
+///
+/// <para>Registered as a DI singleton; the transient <see cref="StatusHub"/> instances
+/// share this one tracker. All methods are thread-safe under a single lock —
+/// subscription churn is low-frequency (one event per detail-page open / close /
+/// reconnect), so lock contention is a non-issue. The tracker never arms or disarms a
+/// capture itself — <see cref="StatusBroadcaster"/> reconciles arm state each push
+/// cycle from <see cref="ActivePlcs"/>, which keeps arming single-threaded.</para>
+/// </summary>
+internal sealed class PlcSubscriptionTracker
+{
+    /// <summary>Live state for one browser tab: its connections and the PLCs it views.</summary>
+    private sealed class TabState
+    {
+        public readonly HashSet<string> Connections = new(StringComparer.Ordinal);
+        public readonly HashSet<string> Plcs = new(StringComparer.Ordinal);
+    }
+
+    private readonly object _gate = new();
+
+    // tabId → tab state (live connections + the PLC detail pages it has open).
+    private readonly Dictionary<string, TabState> _tabs = new(StringComparer.Ordinal);
+
+    // connectionId → owning tabId, so a disconnect can find (and decrement) its tab.
+    private readonly Dictionary<string, string> _connToTab = new(StringComparer.Ordinal);
+
+    // PLC name → number of distinct tabs currently viewing its detail page.
+    private readonly Dictionary<string, int> _plcViewerTabs = new(StringComparer.Ordinal);
+
+    /// <summary>
+    /// Records that connection <paramref name="connectionId"/>, belonging to browser tab
+    /// <paramref name="tabId"/>, has the detail page for <paramref name="plcName"/> open.
+    /// Idempotent: a reconnect (same tab, new connection) or a repeated call for an
+    /// already-subscribed PLC does not double-count the tab.
+    /// </summary>
+    public void SubscribePlc(string connectionId, string tabId, string plcName)
+    {
+        lock (_gate)
+        {
+            if (!_tabs.TryGetValue(tabId, out var tab))
+                _tabs[tabId] = tab = new TabState();
+
+            tab.Connections.Add(connectionId);
+            _connToTab[connectionId] = tabId;
+
+            if (tab.Plcs.Add(plcName))
+                _plcViewerTabs[plcName] = _plcViewerTabs.GetValueOrDefault(plcName) + 1;
+        }
+    }
+
+    /// <summary>
+    /// Drops connection <paramref name="connectionId"/>. If it was the last live
+    /// connection of its tab, the tab is released and its PLC subscriptions decremented.
+    /// A still-live sibling connection (reconnect overlap) keeps the tab — and its
+    /// captures — alive. Safe to call for an unknown / fleet-only connection (no-op).
+    /// </summary>
+    public void RemoveConnection(string connectionId)
+    {
+        lock (_gate)
+        {
+            if (!_connToTab.Remove(connectionId, out var tabId))
+                return;
+            if (!_tabs.TryGetValue(tabId, out var tab))
+                return;
+
+            tab.Connections.Remove(connectionId);
+            if (tab.Connections.Count > 0)
+                return; // the tab is still alive on another connection
+
+            _tabs.Remove(tabId);
+            foreach (var plcName in tab.Plcs)
+            {
+                int count = _plcViewerTabs.GetValueOrDefault(plcName);
+                if (count <= 1)
+                    _plcViewerTabs.Remove(plcName);
+                else
+                    _plcViewerTabs[plcName] = count - 1;
+            }
+        }
+    }
+
+    /// <summary>PLC names that currently have at least one detail-page tab open.</summary>
+    public IReadOnlyList<string> ActivePlcs()
+    {
+        lock (_gate)
+        {
+            return _plcViewerTabs.Count == 0
+                ? Array.Empty<string>()
+                : _plcViewerTabs.Keys.ToArray();
+        }
+    }
+}
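
The reconnect-overlap scenario that motivates counting tabs rather than connections can be traced with a stripped-down model. This is an illustrative sketch, not the class above: it drops the lock, uses local dictionaries and hypothetical local functions (`Subscribe`, `Remove`, `Viewers`), and uses made-up connection and tab ids. It only shows that a second connection from the same tab does not raise the viewer count, and that the count falls to zero once the tab's last connection goes away.

```csharp
using System;
using System.Collections.Generic;

// Simplified, single-threaded model of the tab-based bookkeeping (assumption:
// no locking, illustrative ids). tabId -> live connections / viewed PLCs.
var tabConnections = new Dictionary<string, HashSet<string>>();
var tabPlcs = new Dictionary<string, HashSet<string>>();
var connToTab = new Dictionary<string, string>();
var viewers = new Dictionary<string, int>(); // PLC name -> number of viewing tabs

void Subscribe(string connectionId, string tabId, string plc)
{
    if (!tabConnections.TryGetValue(tabId, out var conns))
    {
        tabConnections[tabId] = conns = new HashSet<string>();
        tabPlcs[tabId] = new HashSet<string>();
    }
    conns.Add(connectionId);
    connToTab[connectionId] = tabId;
    if (tabPlcs[tabId].Add(plc))               // count the tab once per PLC
        viewers[plc] = viewers.GetValueOrDefault(plc) + 1;
}

void Remove(string connectionId)
{
    if (!connToTab.Remove(connectionId, out var tabId)) return;
    var conns = tabConnections[tabId];
    conns.Remove(connectionId);
    if (conns.Count > 0) return;               // tab still alive on a sibling connection
    tabConnections.Remove(tabId);
    foreach (var plc in tabPlcs[tabId])
    {
        if (viewers[plc] <= 1) viewers.Remove(plc);
        else viewers[plc]--;
    }
    tabPlcs.Remove(tabId);
}

int Viewers(string plc) => viewers.GetValueOrDefault(plc);

Subscribe("conn-1", "tab-A", "line-a");   // page opens
Subscribe("conn-2", "tab-A", "line-a");   // transport reconnect, same tab
Console.WriteLine(Viewers("line-a"));     // 1: the reconnect did not double-count
Remove("conn-1");                         // late disconnect of the old connection
Console.WriteLine(Viewers("line-a"));     // 1: tab still has conn-2
Remove("conn-2");                         // page closed
Console.WriteLine(Viewers("line-a"));     // 0: capture can be disarmed
```

Had the model counted connections instead, the viewer count would depend on the order in which the old disconnect and the new subscribe are processed; here it does not.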
@@ -0,0 +1,220 @@
|
|||||||
|
using Mbproxy.Options;
|
||||||
|
using Mbproxy.Proxy;
|
||||||
|
using Microsoft.Extensions.Options;
|
||||||
|
|
||||||
|
namespace Mbproxy.Admin;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Background loop that drives the admin dashboard's live feed. Every
|
||||||
|
/// <see cref="MbproxyOptions.AdminPushIntervalMs"/> it builds a status snapshot and
|
||||||
|
/// pushes it through an <see cref="IStatusPushSink"/>:
|
||||||
|
/// <list type="bullet">
|
||||||
|
/// <item>the fleet snapshot to every fleet-dashboard subscriber;</item>
|
||||||
|
/// <item>a per-PLC detail payload (status row + tag-value capture) to each PLC that
|
||||||
|
/// currently has a detail-page subscriber — PLCs with no viewer are skipped.</item>
|
||||||
|
/// </list>
|
||||||
|
///
|
||||||
|
/// <para>Owned by <see cref="AdminEndpointHost"/>: <see cref="Start"/> is called once
|
||||||
|
/// the Kestrel app is up, <see cref="StopAsync"/> before it stops. <see cref="StopAsync"/>
|
||||||
|
/// disarms every tag-value capture, so an AdminPort hot-reload — which tears down the
|
||||||
|
/// SignalR host and all connections without firing per-connection disconnect cleanup
|
||||||
|
/// deterministically — never leaves a capture armed with no viewer.</para>
|
||||||
|
/// </summary>
|
||||||
|
internal sealed partial class StatusBroadcaster : IAsyncDisposable
|
||||||
|
{
|
||||||
|
private readonly IStatusPushSink _sink;
|
||||||
|
private readonly StatusSnapshotBuilder _builder;
|
||||||
|
private readonly PlcSubscriptionTracker _tracker;
|
||||||
|
private readonly TagCaptureRegistry _captureRegistry;
|
||||||
|
private readonly IOptionsMonitor<MbproxyOptions> _options;
|
||||||
|
private readonly ILogger _logger;
|
||||||
|
|
||||||
|
private readonly CancellationTokenSource _cts = new();
|
||||||
|
private Task _loop = Task.CompletedTask;
|
||||||
|
|
||||||
|
// Guards StopAsync against a double-stop (DisposeAsync also calls StopAsync, and the
|
||||||
|
// owner may call StopAsync explicitly first) — symmetry with AdminEndpointHost's
|
||||||
|
// _disposed flag, and defends a future caller from touching the disposed CTS.
|
||||||
|
private bool _stopped;
|
||||||
|
|
||||||
|
// 0 until the first Start(); a second Start() is a no-op (would otherwise orphan the
|
||||||
|
// first loop task — StopAsync only awaits the latest _loop).
|
||||||
|
private int _started;
|
||||||
|
|
||||||
|
// Count of consecutive push cycles that hit at least one failure. Used to throttle
|
||||||
|
// the per-cycle error logging so a wedged SignalR transport does not flood the
|
||||||
|
// rolling file (the first failing cycle of a run is logged, then every 60th consecutive
// failing cycle). Touched only from the single-threaded push loop.
|
||||||
|
private int _consecutivePushFailures;
|
||||||
|
|
||||||
|
private bool ShouldLogPushFailure()
|
||||||
|
=> _consecutivePushFailures == 0 || _consecutivePushFailures % 60 == 0;
|
||||||
|
|
||||||
|
public StatusBroadcaster(
|
||||||
|
IStatusPushSink sink,
|
||||||
|
StatusSnapshotBuilder builder,
|
||||||
|
PlcSubscriptionTracker tracker,
|
||||||
|
TagCaptureRegistry captureRegistry,
|
||||||
|
IOptionsMonitor<MbproxyOptions> options,
|
||||||
|
ILogger logger)
|
||||||
|
{
|
||||||
|
_sink = sink;
|
||||||
|
_builder = builder;
|
||||||
|
_tracker = tracker;
|
||||||
|
_captureRegistry = captureRegistry;
|
||||||
|
_options = options;
|
||||||
|
_logger = logger;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Starts the push loop. Idempotent — a second call is a no-op.</summary>
|
||||||
|
public void Start()
|
||||||
|
{
|
||||||
|
if (Interlocked.CompareExchange(ref _started, 1, 0) != 0)
|
||||||
|
return; // already started — do not orphan the running loop
|
||||||
|
_loop = Task.Run(() => LoopAsync(_cts.Token));
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Stops the push loop and disarms every tag-value capture.
|
||||||
|
/// </summary>
|
||||||
|
public async Task StopAsync()
|
||||||
|
{
|
||||||
|
if (_stopped) return;
|
||||||
|
_stopped = true;
|
||||||
|
|
||||||
|
if (!_cts.IsCancellationRequested)
|
||||||
|
await _cts.CancelAsync().ConfigureAwait(false);
|
||||||
|
|
||||||
|
try
|
||||||
|
{
|
||||||
|
await _loop.ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
catch (OperationCanceledException)
|
||||||
|
{
|
||||||
|
// Expected on cancellation.
|
||||||
|
}
|
||||||
|
|
||||||
|
_captureRegistry.DisarmAll();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>One push cycle. Exposed internally so unit tests can drive it deterministically.</summary>
|
||||||
|
internal async Task PushOnceAsync(CancellationToken ct)
|
||||||
|
{
|
||||||
|
// Tracks whether this cycle hit any failure, to drive the consecutive-failure
|
||||||
|
// log throttle (see ShouldLogPushFailure).
|
||||||
|
bool anyFailure = false;
|
||||||
|
|
||||||
|
StatusResponse snapshot;
|
||||||
|
try
|
||||||
|
{
|
||||||
|
snapshot = _builder.Build();
|
||||||
|
}
|
||||||
|
catch (Exception ex)
|
||||||
|
{
|
||||||
|
if (ShouldLogPushFailure()) LogSnapshotFailed(_logger, ex);
|
||||||
|
_consecutivePushFailures++;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
try
|
||||||
|
{
|
||||||
|
await _sink.PushFleetAsync(snapshot, ct).ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
catch (Exception ex) when (ex is not OperationCanceledException)
|
||||||
|
{
|
||||||
|
if (ShouldLogPushFailure()) LogFleetPushFailed(_logger, ex);
|
||||||
|
anyFailure = true;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Reconcile capture arm state from the live viewer set. This is the single
|
||||||
|
// arm/disarm authority — doing it here (one thread, every cycle) means a SignalR
|
||||||
|
// reconnect or a hot-reload capture rebuild can never strand a capture armed.
|
||||||
|
var activePlcs = _tracker.ActivePlcs();
|
||||||
|
_captureRegistry.ReconcileArmed(activePlcs);
|
||||||
|
|
||||||
|
// Index the snapshot's PLC rows once per cycle — a per-active-PLC FirstOrDefault
|
||||||
|
// would be O(active × fleet).
|
||||||
|
Dictionary<string, PlcStatus>? plcsByName = activePlcs.Count > 0
|
||||||
|
? snapshot.Plcs.ToDictionary(p => p.Name, StringComparer.Ordinal)
|
||||||
|
: null;
|
||||||
|
|
||||||
|
foreach (var plcName in activePlcs)
|
||||||
|
{
|
||||||
|
try
|
||||||
|
{
|
||||||
|
var plc = plcsByName!.GetValueOrDefault(plcName);
|
||||||
|
// armedOverride: true — every plcName here came from activePlcs, which
|
||||||
|
// ReconcileArmed was just driven from, so the capture is armed by intent.
|
||||||
|
var debug = _builder.BuildDebug(plcName, armedOverride: true);
|
||||||
|
var detail = new PlcDetailResponse(plc, debug);
|
||||||
|
await _sink.PushPlcAsync(plcName, detail, ct).ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
catch (Exception ex) when (ex is not OperationCanceledException)
|
||||||
|
{
|
||||||
|
if (ShouldLogPushFailure()) LogDetailPushFailed(_logger, plcName, ex);
|
||||||
|
anyFailure = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Advance / reset the consecutive-failure run that throttles error logging.
|
||||||
|
_consecutivePushFailures = anyFailure ? _consecutivePushFailures + 1 : 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
private async Task LoopAsync(CancellationToken ct)
|
||||||
|
{
|
||||||
|
try
|
||||||
|
{
|
||||||
|
while (!ct.IsCancellationRequested)
|
||||||
|
{
|
||||||
|
// Push first, delay second — so a dashboard that connects right after the
|
||||||
|
// loop starts gets a snapshot immediately instead of waiting one interval.
|
||||||
|
await PushOnceAsync(ct).ConfigureAwait(false);
|
||||||
|
|
||||||
|
// Re-read the interval each cycle so an AdminPushIntervalMs hot-reload
|
||||||
|
// takes effect without restarting the loop. Floored at 100 ms to avoid a
|
||||||
|
// pathologically tight loop if a bad value slips past validation.
|
||||||
|
int interval = Math.Max(100, _options.CurrentValue.AdminPushIntervalMs);
|
||||||
|
await Task.Delay(interval, ct).ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
catch (OperationCanceledException)
|
||||||
|
{
|
||||||
|
// Normal shutdown.
|
||||||
|
}
|
||||||
|
catch (Exception ex)
|
||||||
|
{
|
||||||
|
LogLoopTerminated(_logger, ex);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
public async ValueTask DisposeAsync()
|
||||||
|
{
|
||||||
|
await StopAsync().ConfigureAwait(false);
|
||||||
|
_cts.Dispose();
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Logging ──────────────────────────────────────────────────────────────
|
||||||
|
// Stable event names in the mbproxy.admin.broadcast.* family — see
|
||||||
|
// docs/Reference/LogEvents.md. EventIds continue the admin block (70/71 in
|
||||||
|
// AdminEndpointHost).
|
||||||
|
|
||||||
|
[LoggerMessage(EventId = 72, EventName = "mbproxy.admin.broadcast.snapshot.failed",
|
||||||
|
Level = LogLevel.Error,
|
||||||
|
Message = "Status broadcaster failed to build a status snapshot — this push cycle is skipped")]
|
||||||
|
private static partial void LogSnapshotFailed(ILogger logger, Exception ex);
|
||||||
|
|
||||||
|
[LoggerMessage(EventId = 73, EventName = "mbproxy.admin.broadcast.fleet.failed",
|
||||||
|
Level = LogLevel.Error,
|
||||||
|
Message = "Status broadcaster failed to push the fleet snapshot to dashboard subscribers")]
|
||||||
|
private static partial void LogFleetPushFailed(ILogger logger, Exception ex);
|
||||||
|
|
||||||
|
[LoggerMessage(EventId = 74, EventName = "mbproxy.admin.broadcast.detail.failed",
|
||||||
|
Level = LogLevel.Error,
|
||||||
|
Message = "Status broadcaster failed to push the detail snapshot for PLC {Plc}")]
|
||||||
|
private static partial void LogDetailPushFailed(ILogger logger, string plc, Exception ex);
|
||||||
|
|
||||||
|
[LoggerMessage(EventId = 75, EventName = "mbproxy.admin.broadcast.loop.terminated",
|
||||||
|
Level = LogLevel.Error,
|
||||||
|
Message = "Status broadcaster push loop terminated unexpectedly — the live dashboard feed has stopped")]
|
||||||
|
private static partial void LogLoopTerminated(ILogger logger, Exception ex);
|
||||||
|
}
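The `Start()` guard above relies on `Interlocked.CompareExchange` flipping `_started` from 0 to 1 exactly once, so only the first caller launches the loop. A minimal stand-alone sketch of that pattern follows. `OneShotLoop`, `LoopsLaunched`, and `Completion` are hypothetical names used for illustration only and are not part of the project.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration of the idempotent-start guard; not the project's class.
public sealed class OneShotLoop
{
    private int _started;                       // 0 = never started, 1 = started
    private Task _loop = Task.CompletedTask;

    // Counts how many loop tasks were actually launched (exposed for the demo).
    public int LoopsLaunched;

    public void Start(Func<Task> body)
    {
        // CompareExchange returns the previous value: only the caller that
        // observes 0 wins the race and launches the loop.
        if (Interlocked.CompareExchange(ref _started, 1, 0) != 0)
            return;

        Interlocked.Increment(ref LoopsLaunched);
        _loop = Task.Run(body);
    }

    // The task a stop routine would await.
    public Task Completion => _loop;
}
```

Without the guard, a second `Start()` would overwrite `_loop`, and a later stop would await only the newest task, which is the orphaning risk the source comment describes.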
|
||||||
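The consecutive-failure throttle in `PushOnceAsync` can be modelled in isolation. The sketch below is a simplified stand-in, not the project's code: `FailureLogThrottle`, `ShouldLog`, and `EndCycle` are hypothetical names, and the period of 60 is taken from the `% 60` check in `ShouldLogPushFailure`.

```csharp
using System;

// Hypothetical stand-alone model of the log throttle; not the project's class.
public sealed class FailureLogThrottle
{
    private readonly int _period;
    private int _consecutiveFailures;

    public FailureLogThrottle(int period = 60) => _period = period;

    // True on the first failing cycle of a run (count 0) and on every
    // `period`-th consecutive failing cycle after that.
    public bool ShouldLog() => _consecutiveFailures % _period == 0;

    // Call once at the end of each cycle: a failure extends the run,
    // a clean cycle resets it so the next failure is logged immediately.
    public void EndCycle(bool anyFailure)
        => _consecutiveFailures = anyFailure ? _consecutiveFailures + 1 : 0;
}
```

Because the counter only advances at the end of a cycle, every failure inside one cycle sees the same answer: either all of them are logged or none are. How often a log line appears in wall-clock terms depends on the push interval, which the sketch does not model.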
@@ -1,249 +0,0 @@
|
|||||||
using System.Text;
|
|
||||||
|
|
||||||
namespace Mbproxy.Admin;
|
|
||||||
|
|
||||||
/// <summary>
|
|
||||||
/// Renders a <see cref="StatusResponse"/> as a self-contained HTML page.
|
|
||||||
///
|
|
||||||
/// <para>Constraints (see <c>docs/Operations/StatusPage.md</c>):</para>
|
|
||||||
/// <list type="bullet">
|
|
||||||
/// <item>No external assets (CSS/JS/fonts/favicons) — firewalled networks only.</item>
|
|
||||||
/// <item><c>&lt;meta http-equiv="refresh" content="5"&gt;</c> for auto-refresh.</item>
|
|
||||||
/// <item>Page weight ≤ 50 KB for a 54-PLC fleet.</item>
|
|
||||||
/// <item>Listener state colour-coded: bound=green, recovering=orange, stopped=grey.</item>
|
|
||||||
/// <item>Connected clients rendered as compact <c>[remote (n PDUs)]</c> list (not nested table).</item>
|
|
||||||
/// </list>
|
|
||||||
/// </summary>
|
|
||||||
internal static class StatusHtmlRenderer
|
|
||||||
{
|
|
||||||
private const string Css = """
|
|
||||||
body{font-family:monospace;font-size:13px;margin:1em}
|
|
||||||
h1{font-size:1.1em;margin-bottom:.3em}
|
|
||||||
.meta{color:#555;margin-bottom:.8em;font-size:12px}
|
|
||||||
table{border-collapse:collapse;width:100%}
|
|
||||||
th,td{border:1px solid #ccc;padding:3px 6px;white-space:nowrap}
|
|
||||||
th{background:#f0f0f0;text-align:left}
|
|
||||||
tr:nth-child(even)td{background:#fafafa}
|
|
||||||
.bound{color:green;font-weight:bold}
|
|
||||||
.recovering{color:darkorange;font-weight:bold}
|
|
||||||
.stopped{color:grey}
|
|
||||||
.err{font-size:11px;color:#a00}
|
|
||||||
.clients{font-size:11px;color:#333}
|
|
||||||
""";
|
|
||||||
|
|
||||||
/// <summary>
|
|
||||||
/// Renders the status page as a complete HTML document string.
|
|
||||||
/// May allocate; intended for the status-page read path only.
|
|
||||||
/// </summary>
|
|
||||||
public static string Render(StatusResponse status)
|
|
||||||
{
|
|
||||||
var sb = new StringBuilder(4096);
|
|
||||||
|
|
||||||
sb.Append("<!DOCTYPE html><html lang=\"en\"><head><meta charset=\"utf-8\">");
|
|
||||||
sb.Append("<meta http-equiv=\"refresh\" content=\"5\">");
|
|
||||||
sb.Append("<title>mbproxy status</title>");
|
|
||||||
sb.Append("<style>").Append(Css).Append("</style>");
|
|
||||||
sb.Append("</head><body>");
|
|
||||||
|
|
||||||
// ── Header ────────────────────────────────────────────────────────────
|
|
||||||
sb.Append("<h1>mbproxy status</h1>");
|
|
||||||
sb.Append("<div class=\"meta\">");
|
|
||||||
sb.Append("Version: ").Append(HtmlEncode(status.Service.Version));
|
|
||||||
sb.Append(" | Uptime: ").Append(FormatUptime(status.Service.UptimeSeconds));
|
|
||||||
sb.Append(" | Listeners: ")
|
|
||||||
.Append(status.Listeners.Bound).Append('/').Append(status.Listeners.Configured)
|
|
||||||
.Append(" bound");
|
|
||||||
if (status.Service.ConfigLastReloadUtc.HasValue)
|
|
||||||
{
|
|
||||||
sb.Append(" | Last reload: ")
|
|
||||||
.Append(HtmlEncode(status.Service.ConfigLastReloadUtc.Value.ToString("yyyy-MM-dd HH:mm:ss") + "Z"));
|
|
||||||
}
|
|
||||||
sb.Append(" | Reloads: ").Append(status.Service.ConfigReloadCount);
|
|
||||||
if (status.Service.ConfigReloadRejectedCount > 0)
|
|
||||||
sb.Append(" (").Append(status.Service.ConfigReloadRejectedCount).Append(" rejected)");
|
|
||||||
sb.Append("</div>");
|
|
||||||
|
|
||||||
// ── PLC table ─────────────────────────────────────────────────────────
|
|
||||||
if (status.Plcs.Count == 0)
|
|
||||||
{
|
|
||||||
sb.Append("<p><em>No PLCs configured.</em></p>");
|
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
sb.Append("<table>");
|
|
||||||
sb.Append("<thead><tr>");
|
|
||||||
sb.Append("<th>Name</th><th>Host</th><th>Port</th><th>State</th>");
|
|
||||||
sb.Append("<th>Clients</th><th>PDUs fwd</th><th>FC03</th><th>FC04</th>");
|
|
||||||
sb.Append("<th>FC06</th><th>FC16</th><th>FC?</th><th>BCD slots</th>");
|
|
||||||
sb.Append("<th>Partial BCD</th><th>Invalid BCD</th><th>Ex 01</th><th>Ex 02</th><th>Ex 03</th><th>Ex 04</th><th>Ex ?</th>");
|
|
||||||
sb.Append("<th>RTT ms</th><th>Bytes in</th><th>Bytes out</th>");
|
|
||||||
// Multiplexer telemetry columns.
|
|
||||||
sb.Append("<th>In-flight</th><th>Max in-flight</th><th>TxId wraps</th>");
|
|
||||||
sb.Append("<th>Cascades</th><th>Queue</th>");
|
|
||||||
// Coalescing column. Single cell carries hit / (hit + miss) ratio as a
|
|
||||||
// percentage plus the raw hit count for context. Kept compact (one cell) to
|
|
||||||
// stay under the 50 KB page-weight budget.
|
|
||||||
sb.Append("<th>Coal</th>");
|
|
||||||
// Cache column. Single cell carries hit-ratio percent plus raw hit count;
|
|
||||||
// an em-dash when no cache-eligible reads have occurred. Page-weight budget
|
|
||||||
// assertion stays under 50 KB for the 54-PLC fleet.
|
|
||||||
sb.Append("<th>Cache</th>");
|
|
||||||
// Keepalive column — heartbeats sent, with failure / idle-disconnect counts
|
|
||||||
// shown only when non-zero.
|
|
||||||
sb.Append("<th>Keepalive</th>");
|
|
||||||
sb.Append("</tr></thead><tbody>");
|
|
||||||
|
|
||||||
foreach (var plc in status.Plcs)
|
|
||||||
{
|
|
||||||
sb.Append("<tr>");
|
|
||||||
sb.Append("<td>").Append(HtmlEncode(plc.Name)).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(HtmlEncode(plc.Host)).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.ListenPort).Append("</td>");
|
|
||||||
|
|
||||||
// State cell with colour coding
|
|
||||||
string stateClass = plc.Listener.State switch
|
|
||||||
{
|
|
||||||
"bound" => "bound",
|
|
||||||
"recovering" => "recovering",
|
|
||||||
_ => "stopped",
|
|
||||||
};
|
|
||||||
sb.Append("<td><span class=\"").Append(stateClass).Append("\">")
|
|
||||||
.Append(HtmlEncode(plc.Listener.State)).Append("</span>");
|
|
||||||
if (plc.Listener.State == "recovering" && plc.Listener.LastBindError is { } err)
|
|
||||||
{
|
|
||||||
sb.Append("<br><span class=\"err\">")
|
|
||||||
.Append(HtmlEncode(err))
|
|
||||||
.Append(" (attempt ").Append(plc.Listener.RecoveryAttempts).Append(")")
|
|
||||||
.Append("</span>");
|
|
||||||
}
|
|
||||||
sb.Append("</td>");
|
|
||||||
|
|
||||||
// Connected clients
|
|
||||||
sb.Append("<td><span class=\"clients\">");
|
|
||||||
sb.Append(plc.Clients.Connected);
|
|
||||||
if (plc.Clients.RemoteEndpoints.Count > 0)
|
|
||||||
{
|
|
||||||
sb.Append("<br>");
|
|
||||||
bool first = true;
|
|
||||||
foreach (var c in plc.Clients.RemoteEndpoints)
|
|
||||||
{
|
|
||||||
if (!first) sb.Append(", ");
|
|
||||||
sb.Append(HtmlEncode(c.Remote))
|
|
||||||
.Append(" (").Append(c.PdusForwarded).Append(')');
|
|
||||||
first = false;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
sb.Append("</span></td>");
|
|
||||||
|
|
||||||
// Counter cells
|
|
||||||
sb.Append("<td>").Append(plc.Pdus.Forwarded).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Pdus.ByFc.Fc03).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Pdus.ByFc.Fc04).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Pdus.ByFc.Fc06).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Pdus.ByFc.Fc16).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Pdus.ByFc.Other).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Pdus.RewrittenSlots).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Pdus.PartialBcdWarnings).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Pdus.InvalidBcdWarnings).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Backend.ExceptionsByCode.Code01).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Backend.ExceptionsByCode.Code02).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Backend.ExceptionsByCode.Code03).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Backend.ExceptionsByCode.Code04).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Backend.ExceptionsByCode.CodeOther).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Backend.LastRoundTripMs.ToString("F1")).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Bytes.UpstreamIn).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Bytes.UpstreamOut).Append("</td>");
|
|
||||||
// Multiplexer telemetry cells.
|
|
||||||
sb.Append("<td>").Append(plc.Backend.InFlight).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Backend.MaxInFlight).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Backend.TxIdWraps).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Backend.DisconnectCascades).Append("</td>");
|
|
||||||
sb.Append("<td>").Append(plc.Backend.QueueDepth).Append("</td>");
|
|
||||||
// Coalescing ratio cell — "<pct>% (<hit>)". When no coalesced reads have
|
|
||||||
// been seen, render an em-dash to keep the cell narrow.
|
|
||||||
long coalHit = plc.Backend.CoalescedHitCount;
|
|
||||||
long coalMiss = plc.Backend.CoalescedMissCount;
|
|
||||||
sb.Append("<td>");
|
|
||||||
if (coalHit + coalMiss == 0)
|
|
||||||
{
|
|
||||||
sb.Append("—");
|
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
int pct = (int)Math.Round(100.0 * coalHit / (coalHit + coalMiss));
|
|
||||||
sb.Append(pct).Append("% (").Append(coalHit).Append(')');
|
|
||||||
}
|
|
||||||
sb.Append("</td>");
|
|
||||||
// Cache ratio cell — same pattern as coalescing.
|
|
||||||
long cacheHit = plc.Backend.CacheHitCount;
|
|
||||||
long cacheMiss = plc.Backend.CacheMissCount;
|
|
||||||
sb.Append("<td>");
|
|
||||||
if (cacheHit + cacheMiss == 0)
|
|
||||||
{
|
|
||||||
sb.Append("—");
|
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
int pct = (int)Math.Round(100.0 * cacheHit / (cacheHit + cacheMiss));
|
|
||||||
sb.Append(pct).Append("% (").Append(cacheHit).Append(')');
|
|
||||||
}
|
|
||||||
sb.Append("</td>");
|
|
||||||
// Keepalive cell — heartbeats sent; failures + idle-disconnects appended
|
|
||||||
// only when non-zero to keep the cell narrow.
|
|
||||||
long hbSent = plc.Backend.BackendHeartbeatsSent;
|
|
||||||
long hbFailed = plc.Backend.BackendHeartbeatsFailed;
|
|
||||||
long hbIdle = plc.Backend.BackendIdleDisconnects;
|
|
||||||
sb.Append("<td>");
|
|
||||||
if (hbSent == 0 && hbFailed == 0 && hbIdle == 0)
|
|
||||||
{
|
|
||||||
sb.Append("—");
|
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
sb.Append(hbSent);
|
|
||||||
if (hbFailed > 0 || hbIdle > 0)
|
|
||||||
sb.Append(" (fail ").Append(hbFailed)
|
|
||||||
.Append(", idle-disc ").Append(hbIdle).Append(')');
|
|
||||||
}
|
|
||||||
sb.Append("</td>");
|
|
||||||
sb.Append("</tr>");
|
|
||||||
}
|
|
||||||
|
|
||||||
sb.Append("</tbody></table>");
|
|
||||||
}
|
|
||||||
|
|
||||||
sb.Append("</body></html>");
|
|
||||||
return sb.ToString();
|
|
||||||
}
|
|
||||||
|
|
||||||
// ── Helpers ───────────────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
private static string FormatUptime(long seconds)
|
|
||||||
{
|
|
||||||
var ts = TimeSpan.FromSeconds(seconds);
|
|
||||||
if (ts.TotalHours >= 1)
|
|
||||||
return $"{(int)ts.TotalHours}h {ts.Minutes:D2}m {ts.Seconds:D2}s";
|
|
||||||
if (ts.TotalMinutes >= 1)
|
|
||||||
return $"{ts.Minutes}m {ts.Seconds:D2}s";
|
|
||||||
return $"{seconds}s";
|
|
||||||
}
|
|
||||||
|
|
||||||
private static string HtmlEncode(string s)
|
|
||||||
{
|
|
||||||
// Fast path: no special chars.
|
|
||||||
if (!ContainsHtmlSpecial(s)) return s;
|
|
||||||
|
|
||||||
return s
|
|
||||||
.Replace("&", "&amp;")
.Replace("<", "&lt;")
.Replace(">", "&gt;")
.Replace("\"", "&quot;");
|
|
||||||
}
|
|
||||||
|
|
||||||
private static bool ContainsHtmlSpecial(string s)
|
|
||||||
{
|
|
||||||
foreach (char c in s)
|
|
||||||
if (c is '&' or '<' or '>' or '"') return true;
|
|
||||||
return false;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
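The coalescing and cache cells in the removed renderer share one formatting rule: a hit percentage followed by the raw hit count, or an em-dash placeholder when nothing has been counted. A minimal sketch of that rule as a reusable helper is shown below. `RatioCell.Format` is a hypothetical name; the removed file inlined the logic twice.

```csharp
using System;

// Hypothetical helper extracting the "<pct>% (<hits>)" cell format.
public static class RatioCell
{
    public static string Format(long hits, long misses)
    {
        long total = hits + misses;

        // No samples yet: render an em-dash (U+2014) instead of dividing by zero.
        if (total == 0) return "\u2014";

        // Math.Round(double) rounds midpoints to the nearest even integer,
        // so an exact 12.5 becomes 12 rather than 13.
        int pct = (int)Math.Round(100.0 * hits / total);
        return $"{pct}% ({hits})";
    }
}
```

For example, 3 hits and 1 miss render as `75% (3)`.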
@@ -0,0 +1,72 @@
using Microsoft.AspNetCore.SignalR;

namespace Mbproxy.Admin;

/// <summary>
/// SignalR hub backing the live admin dashboard. Two subscription scopes:
/// <list type="bullet">
/// <item><see cref="SubscribeFleet"/> — the fleet dashboard (<c>GET /</c>) joins the
/// <see cref="FleetGroup"/> group and receives a <c>"fleet"</c> message every
/// push tick.</item>
/// <item><see cref="SubscribePlc"/> — a connection-detail page (<c>GET /plc/{name}</c>)
/// joins <see cref="PlcGroup"/> and receives a <c>"plc"</c> message. It also
/// registers the calling tab with the <see cref="PlcSubscriptionTracker"/> so
/// the PLC's tag-value capture is armed while the page is open.</item>
/// </list>
///
/// <para>The hub itself is transient (one instance per call). Cross-call state — which
/// tabs view which PLCs — lives in the singleton <see cref="PlcSubscriptionTracker"/>.
/// The hub deliberately does <b>not</b> arm or disarm captures: a SignalR reconnect
/// gives the connection a new <c>ConnectionId</c>, and arming off that lifetime leaks.
/// Instead the hub only mutates the tracker (keyed on a stable client <c>tabId</c>) and
/// <see cref="StatusBroadcaster"/> reconciles capture arm state from the tracker each
/// push cycle. The actual pushes are issued by the broadcaster, not the hub.</para>
/// </summary>
internal sealed class StatusHub : Hub
{
    /// <summary>SignalR group name for fleet-dashboard subscribers.</summary>
    public const string FleetGroup = "fleet";

    /// <summary>SignalR group name for a single PLC's detail-page subscribers.</summary>
    public static string PlcGroup(string plcName) => "plc:" + plcName;

    private readonly PlcSubscriptionTracker _tracker;

    public StatusHub(PlcSubscriptionTracker tracker) => _tracker = tracker;

    /// <summary>Subscribes the calling connection to fleet-wide status pushes.</summary>
    public Task SubscribeFleet()
        => Groups.AddToGroupAsync(Context.ConnectionId, FleetGroup);

    /// <summary>
    /// Subscribes the calling connection to one PLC's detail pushes. <paramref name="tabId"/>
    /// is a stable per-page-load identifier supplied by the client so a transport
    /// reconnect (which changes <c>ConnectionId</c>) is recognised as the same viewer.
    /// </summary>
    public async Task SubscribePlc(string plcName, string tabId)
    {
        // Register with the tracker first (synchronous, lock-guarded) so this connection's
        // own OnDisconnectedAsync — which SignalR dispatches only after this invocation's
        // Task completes — always observes a consistent state. Capture arming is NOT done
        // here; StatusBroadcaster reconciles it each cycle from the tracker.
        //
        // If AddToGroupAsync below throws (a mid-invocation transport fault), the tracker
        // entry is left in place; OnDisconnectedAsync is the backstop — it always runs for
        // a connection that completed OnConnectedAsync, and RemoveConnection then releases
        // the tab. The transient cost is at most one armed capture with no live group
        // member until that disconnect callback fires.
        _tracker.SubscribePlc(Context.ConnectionId, tabId, plcName);
        await Groups.AddToGroupAsync(Context.ConnectionId, PlcGroup(plcName)).ConfigureAwait(false);
    }

    /// <summary>
    /// On disconnect, releases the connection from its tab. If it was the tab's last
    /// connection the tab's PLC subscriptions are dropped; the broadcaster disarms the
    /// now-unviewed captures on its next cycle.
    /// </summary>
    public override Task OnDisconnectedAsync(Exception? exception)
    {
        _tracker.RemoveConnection(Context.ConnectionId);
        return base.OnDisconnectedAsync(exception);
    }
}
|
||||||
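The hub's doc comment says the tracker is keyed on a stable `tabId`, so that a reconnect with a fresh `ConnectionId` counts as the same viewer. The real `PlcSubscriptionTracker` is not shown in this diff. The sketch below is one hypothetical way such a tracker could behave, assuming each connection belongs to a single tab; `TabTrackerSketch` and its internals are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch; the project's PlcSubscriptionTracker may differ.
public sealed class TabTrackerSketch
{
    private readonly object _gate = new();
    private readonly Dictionary<string, string> _tabByConnection = new();
    private readonly Dictionary<string, HashSet<string>> _connectionsByTab = new();
    private readonly Dictionary<string, HashSet<string>> _plcsByTab = new();

    public void SubscribePlc(string connectionId, string tabId, string plcName)
    {
        lock (_gate)
        {
            // Simplification: a connection is assumed to stay with one tab.
            _tabByConnection[connectionId] = tabId;

            if (!_connectionsByTab.TryGetValue(tabId, out var conns))
                _connectionsByTab[tabId] = conns = new HashSet<string>();
            conns.Add(connectionId);

            if (!_plcsByTab.TryGetValue(tabId, out var plcs))
                _plcsByTab[tabId] = plcs = new HashSet<string>();
            plcs.Add(plcName);
        }
    }

    public void RemoveConnection(string connectionId)
    {
        lock (_gate)
        {
            if (!_tabByConnection.Remove(connectionId, out var tabId)) return;

            var conns = _connectionsByTab[tabId];
            conns.Remove(connectionId);
            if (conns.Count > 0) return;   // the tab still has a live connection

            // Last connection gone: drop the tab and its PLC subscriptions.
            _connectionsByTab.Remove(tabId);
            _plcsByTab.Remove(tabId);
        }
    }

    // PLCs that currently have at least one viewing tab.
    public IReadOnlyCollection<string> ActivePlcs()
    {
        lock (_gate)
            return _plcsByTab.Values.SelectMany(s => s).Distinct().ToArray();
    }
}
```

In this model, if a tab's new connection subscribes before the old connection's disconnect callback runs, the PLC stays in `ActivePlcs()` throughout, so a per-cycle reconcile never sees a gap and never disarms the capture in between.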
@@ -0,0 +1,35 @@
using Microsoft.AspNetCore.SignalR;

namespace Mbproxy.Admin;

/// <summary>
/// The outbound side of the admin live feed. Defined as an interface so
/// <see cref="StatusBroadcaster"/>'s loop logic (snapshot building, group selection,
/// disarm-on-stop) is unit-testable without standing up a SignalR host — tests inject
/// a recording fake; production uses <see cref="SignalRStatusPushSink"/>.
/// </summary>
internal interface IStatusPushSink
{
    /// <summary>Pushes a fleet snapshot to every fleet-dashboard subscriber.</summary>
    Task PushFleetAsync(StatusResponse snapshot, CancellationToken ct);

    /// <summary>Pushes one PLC's detail payload to that PLC's detail-page subscribers.</summary>
    Task PushPlcAsync(string plcName, PlcDetailResponse detail, CancellationToken ct);
}

/// <summary>
/// Production <see cref="IStatusPushSink"/> — forwards pushes onto the SignalR
/// <see cref="StatusHub"/> groups via <see cref="IHubContext{THub}"/>.
/// </summary>
internal sealed class SignalRStatusPushSink : IStatusPushSink
{
    private readonly IHubContext<StatusHub> _hub;

    public SignalRStatusPushSink(IHubContext<StatusHub> hub) => _hub = hub;

    public Task PushFleetAsync(StatusResponse snapshot, CancellationToken ct)
        => _hub.Clients.Group(StatusHub.FleetGroup).SendAsync("fleet", snapshot, ct);

    public Task PushPlcAsync(string plcName, PlcDetailResponse detail, CancellationToken ct)
        => _hub.Clients.Group(StatusHub.PlcGroup(plcName)).SendAsync("plc", detail, ct);
}
|
||||||
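The interface comment mentions that tests inject a recording fake. The diff does not include that fake, so the sketch below shows what one might look like. `FleetPayload`, `DetailPayload`, `IPushSinkSketch`, and `RecordingSink` are hypothetical stand-ins that let the example compile without the project's `StatusResponse` and `PlcDetailResponse` types.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Stand-in payload types; the real ones live elsewhere in the project.
public sealed record FleetPayload(int PlcCount);
public sealed record DetailPayload(string Plc);

// Mirrors the shape of IStatusPushSink with the stand-in payloads.
public interface IPushSinkSketch
{
    Task PushFleetAsync(FleetPayload snapshot, CancellationToken ct);
    Task PushPlcAsync(string plcName, DetailPayload detail, CancellationToken ct);
}

// Records every push in memory so a test can assert on what was sent.
public sealed class RecordingSink : IPushSinkSketch
{
    public List<FleetPayload> FleetPushes { get; } = new();
    public List<(string Plc, DetailPayload Detail)> DetailPushes { get; } = new();

    // When non-empty, pushes for this PLC fail, to exercise a per-PLC catch block.
    public string FailForPlc { get; set; } = "";

    public Task PushFleetAsync(FleetPayload snapshot, CancellationToken ct)
    {
        FleetPushes.Add(snapshot);
        return Task.CompletedTask;
    }

    public Task PushPlcAsync(string plcName, DetailPayload detail, CancellationToken ct)
    {
        if (plcName == FailForPlc)
            return Task.FromException(new InvalidOperationException("simulated push failure"));

        DetailPushes.Add((plcName, detail));
        return Task.CompletedTask;
    }
}
```

A fake along these lines lets a test drive one push cycle and check that a failure for one PLC does not prevent the pushes for the others.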
@@ -19,17 +19,69 @@ internal sealed class StatusSnapshotBuilder
|
|||||||
private readonly ServiceCounters _serviceCounters;
|
private readonly ServiceCounters _serviceCounters;
|
||||||
private readonly AssemblyVersionAccessor _version;
|
private readonly AssemblyVersionAccessor _version;
|
||||||
private readonly ProxyWorker _proxyWorker;
|
private readonly ProxyWorker _proxyWorker;
|
||||||
|
private readonly TagCaptureRegistry _captureRegistry;
|
||||||
|
|
||||||
public StatusSnapshotBuilder(
|
public StatusSnapshotBuilder(
|
||||||
IOptionsMonitor<MbproxyOptions> options,
|
IOptionsMonitor<MbproxyOptions> options,
|
||||||
ServiceCounters serviceCounters,
|
ServiceCounters serviceCounters,
|
||||||
AssemblyVersionAccessor version,
|
AssemblyVersionAccessor version,
|
||||||
ProxyWorker proxyWorker)
|
ProxyWorker proxyWorker,
|
||||||
|
TagCaptureRegistry captureRegistry)
|
||||||
{
|
{
|
||||||
_options = options;
|
_options = options;
|
||||||
_serviceCounters = serviceCounters;
|
_serviceCounters = serviceCounters;
|
||||||
_version = version;
|
_version = version;
|
||||||
_proxyWorker = proxyWorker;
|
_proxyWorker = proxyWorker;
|
||||||
|
_captureRegistry = captureRegistry;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Builds the connection-detail debug snapshot for one PLC: the last value observed
|
||||||
|
/// for every configured BCD tag. Returns an empty, disarmed snapshot when
|
||||||
|
/// <paramref name="plcName"/> is unknown (e.g. a detail page open for a PLC removed
|
||||||
|
/// by hot-reload).
|
||||||
|
///
|
||||||
|
/// <para><paramref name="armedOverride"/> lets the caller supply the armed flag rather
|
||||||
|
/// than have this method independently re-read <c>capture.IsArmed</c>. The broadcaster
|
||||||
|
/// passes <c>true</c> because it only builds a debug snapshot for PLCs it just
|
||||||
|
/// reconciled armed in the same push cycle — so the pushed payload's <c>CaptureArmed</c>
|
||||||
|
/// flag is consistent with that decision by construction, instead of racing a
|
||||||
|
/// disarm between the reconcile and this read (review AdminSignalR M1). When omitted,
|
||||||
|
/// the live <c>capture.IsArmed</c> is used.</para>
|
||||||
|
/// </summary>
|
||||||
|
public PlcDebugSnapshot BuildDebug(string plcName, bool? armedOverride = null)
|
||||||
|
{
|
||||||
|
if (!_captureRegistry.TryGet(plcName, out var capture))
|
||||||
|
return new PlcDebugSnapshot(CaptureArmed: false, Tags: Array.Empty<TagValueDto>());
|
||||||
|
|
||||||
|
var now = DateTimeOffset.UtcNow;
|
||||||
|
var tags = capture.Snapshot()
|
||||||
|
.Select(o => ToTagDto(o, now))
|
||||||
|
.ToList();
|
||||||
|
|
||||||
|
return new PlcDebugSnapshot(armedOverride ?? capture.IsArmed, tags);
|
||||||
|
}
|
||||||
|
|
||||||
|
private static TagValueDto ToTagDto(TagValueObservation o, DateTimeOffset now)
|
||||||
|
{
|
||||||
|
bool hasValue = o.UpdatedAtUtc.HasValue;
|
||||||
|
|
||||||
|
string rawHex = !hasValue
|
||||||
|
? "—"
|
||||||
|
: o.Width == 32
|
||||||
|
? $"0x{o.RawHigh:X4}{o.RawLow:X4}"
|
||||||
|
: $"0x{o.RawLow:X4}";
|
||||||
|
|
||||||
|
return new TagValueDto(
|
||||||
|
Address: o.Address,
|
||||||
|
Width: o.Width,
|
||||||
|
Name: o.Name,
|
||||||
|
HasValue: hasValue,
|
||||||
|
Direction: o.Direction == CaptureDirection.Write ? "write" : "read",
|
||||||
|
RawHex: rawHex,
|
||||||
|
DecodedValue: o.DecodedValue,
|
||||||
|
UpdatedAtUtc: o.UpdatedAtUtc?.ToString("o"),
|
||||||
|
AgeSeconds: o.UpdatedAtUtc is { } at ? (now - at).TotalSeconds : null);
|
||||||
}
|
}
|
||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
|
|||||||
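The `rawHex` expression in `ToTagDto` picks one of three renderings depending on whether a value was observed and on the tag width. A stand-alone sketch of that branch is given below. `RawHexSketch.Format` is a hypothetical name, and the `ushort` register types are an assumption, since the diff does not show the declaration of `TagValueObservation`.

```csharp
// Hypothetical extraction of the raw-hex formatting rule.
public static class RawHexSketch
{
    public static string Format(bool hasValue, int width, ushort rawHigh, ushort rawLow)
    {
        // Nothing observed yet: em-dash placeholder (U+2014).
        if (!hasValue) return "\u2014";

        // 32-bit tags show both registers, high word first; anything else
        // shows the low register only. X4 pads each word to four hex digits.
        return width == 32
            ? $"0x{rawHigh:X4}{rawLow:X4}"
            : $"0x{rawLow:X4}";
    }
}
```

With these assumptions, a 32-bit tag whose registers hold `0x0012` and `0x3456` renders as `0x00123456`.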
File diff suppressed because one or more lines are too long
@@ -0,0 +1,131 @@
|
|||||||
|
/* ============================================================================
|
||||||
|
Fleet dashboard — view-specific styling (Phase 4).
|
||||||
|
Shared tokens and chrome live in theme.css.
|
||||||
|
========================================================================= */
|
||||||
|
|
||||||
|
/* ── Aggregate strip ─────────────────────────────────────────────────────── */
|
||||||
|
.agg-grid {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: repeat(6, 1fr);
|
||||||
|
gap: 0.75rem;
|
||||||
|
margin-bottom: 1rem;
|
||||||
|
}
|
||||||
|
@media (max-width: 1100px) {
|
||||||
|
.agg-grid { grid-template-columns: repeat(3, 1fr); }
|
||||||
|
}
|
||||||
|
@media (max-width: 620px) {
|
||||||
|
.agg-grid { grid-template-columns: repeat(2, 1fr); }
|
||||||
|
}
|
||||||
|
|
||||||
|
.agg-card {
|
||||||
|
background: var(--card);
|
||||||
|
border: 1px solid var(--rule);
|
||||||
|
border-radius: 8px;
|
||||||
|
padding: 0.7rem 0.9rem;
|
||||||
|
}
|
||||||
|
.agg-label {
|
||||||
|
font-size: 0.68rem;
|
||||||
|
font-weight: 600;
|
||||||
|
text-transform: uppercase;
|
||||||
|
letter-spacing: 0.07em;
|
||||||
|
color: var(--ink-faint);
|
||||||
|
}
|
||||||
|
.agg-value {
|
||||||
|
margin-top: 0.25rem;
|
||||||
|
font-size: 1.5rem;
|
||||||
|
font-weight: 600;
|
||||||
|
line-height: 1.1;
|
||||||
|
display: flex;
|
||||||
|
align-items: baseline;
|
||||||
|
gap: 0.35rem;
|
||||||
|
}
|
||||||
|
.agg-sub {
|
||||||
|
font-size: 0.85rem;
|
||||||
|
font-weight: 400;
|
||||||
|
color: var(--ink-faint);
|
||||||
|
}
|
||||||
|
/* highlight the problem cards when non-zero */
|
||||||
|
.agg-card.alert { border-color: #eec3c3; background: var(--bad-bg); }
|
||||||
|
.agg-card.alert .agg-value { color: var(--bad); }
|
||||||
|
.agg-card.caution { border-color: #efd6a6; background: var(--warn-bg); }
|
||||||
|
.agg-card.caution .agg-value { color: #b56a00; }
|
||||||
|
|
||||||
|
/* ── Toolbar ─────────────────────────────────────────────────────────────── */
|
||||||
|
.toolbar {
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
gap: 0.6rem;
|
||||||
|
padding: 0.6rem 0.9rem;
|
||||||
|
border-bottom: 1px solid var(--rule);
|
||||||
|
}
|
||||||
|
.toolbar .spacer { flex: 1; }
|
||||||
|
.tb-search { max-width: 280px; }
|
||||||
|
.tb-state { max-width: 150px; }
|
||||||
|
.tb-check {
|
||||||
|
display: flex; align-items: center; gap: 0.35rem;
|
||||||
|
font-size: 0.82rem; color: var(--ink-soft); white-space: nowrap;
|
||||||
|
user-select: none;
|
||||||
|
}
|
||||||
|
.tb-count { font-family: var(--mono); font-size: 0.78rem; color: var(--ink-faint); }
|
||||||
|
|
||||||
|
/* ── KPI table ───────────────────────────────────────────────────────────── */
|
||||||
|
.table-wrap { overflow-x: auto; }
|
||||||
|
|
||||||
|
.kpi-table {
|
||||||
|
width: 100%;
|
||||||
|
border-collapse: collapse;
|
||||||
|
font-size: 0.85rem;
|
||||||
|
}
|
||||||
|
.kpi-table th,
|
||||||
|
.kpi-table td {
|
||||||
|
padding: 0.45rem 0.8rem;
|
||||||
|
text-align: left;
|
||||||
|
white-space: nowrap;
|
||||||
|
border-bottom: 1px solid var(--rule);
|
||||||
|
}
|
||||||
|
.kpi-table th {
|
||||||
|
font-size: 0.7rem;
|
||||||
|
font-weight: 600;
|
||||||
|
text-transform: uppercase;
|
||||||
|
letter-spacing: 0.05em;
|
||||||
|
color: var(--ink-faint);
|
||||||
|
background: #fbfbf9;
|
||||||
|
position: sticky;
|
||||||
|
top: 0;
|
||||||
|
}
|
||||||
|
.kpi-table th.num,
|
||||||
|
.kpi-table td.num { text-align: right; font-family: var(--mono); }
|
||||||
|
|
||||||
|
.kpi-table th.sortable { cursor: pointer; user-select: none; }
|
||||||
|
.kpi-table th.sortable:hover { color: var(--ink); }
|
||||||
|
.kpi-table th.sortable:focus-visible {
|
||||||
|
outline: 2px solid var(--accent);
|
||||||
|
outline-offset: -2px;
|
||||||
|
color: var(--ink);
|
||||||
|
}
|
||||||
|
.kpi-table th.sorted-asc::after { content: ' \2191'; color: var(--accent); }
|
||||||
|
.kpi-table th.sorted-desc::after { content: ' \2193'; color: var(--accent); }
|
||||||
|
|
||||||
|
.kpi-table tbody tr { transition: background 0.08s; }
|
||||||
|
.kpi-table tbody tr:hover { background: #f3f6fd; }
|
||||||
|
.kpi-table tbody tr:last-child td { border-bottom: none; }
|
||||||
|
|
||||||
|
.kpi-table .plc-name { font-weight: 600; }
|
||||||
|
.kpi-table .plc-name a { color: inherit; text-decoration: none; }
|
||||||
|
.kpi-table .plc-name a:hover { text-decoration: underline; }
|
||||||
|
.kpi-table .plc-name a:focus-visible {
|
||||||
|
outline: 2px solid var(--accent);
|
||||||
|
outline-offset: 1px;
|
||||||
|
border-radius: 2px;
|
||||||
|
}
|
||||||
|
.kpi-table .plc-host { color: var(--ink-soft); font-family: var(--mono); font-size: 0.8rem; }
|
||||||
|
|
||||||
|
.empty-row {
|
||||||
|
text-align: center !important;
|
||||||
|
color: var(--ink-faint);
|
||||||
|
padding: 1.6rem !important;
|
||||||
|
font-style: italic;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* faint divider between value and unit in ratio cells */
|
||||||
|
.ratio-sub { color: var(--ink-faint); font-size: 0.76rem; }
|
||||||
@@ -0,0 +1,300 @@
|
|||||||
|
/* ============================================================================
|
||||||
|
Fleet dashboard — live view over the SignalR /hub/status feed (Phase 4).
|
||||||
|
Vanilla JS, no framework. Subscribes to the "fleet" group and re-renders the
|
||||||
|
aggregate strip + filterable/sortable PLC table on every push.
|
||||||
|
========================================================================= */
|
||||||
|
'use strict';
|
||||||
|
|
||||||
|
(function () {
|
||||||
|
// ── State ──────────────────────────────────────────────────────────────
|
||||||
|
let latest = null; // last StatusResponse
|
||||||
|
const prevPdu = new Map(); // plc name → { forwarded, t } for rate calc
|
||||||
|
const rateByName = new Map(); // plc name → PDU/s
|
||||||
|
const filter = { search: '', state: '', problemsOnly: false };
|
||||||
|
const sort = { key: 'name', dir: 1 };
|
||||||
|
|
||||||
|
// ── Helpers ────────────────────────────────────────────────────────────
|
||||||
|
const $ = (id) => document.getElementById(id);
|
||||||
|
|
||||||
|
// util.js must load before this script. If it failed to load, fail loud and
|
||||||
|
// visible rather than letting the destructure throw and abort the page silently.
|
||||||
|
if (!window.mbproxyUtil) {
|
||||||
|
document.body.innerHTML =
|
||||||
|
'<p style="padding:2rem;font-family:sans-serif;color:#b00">' +
|
||||||
|
'Admin UI failed to load (util.js missing). Check the browser console.</p>';
|
||||||
|
throw new Error('window.mbproxyUtil is not defined');
|
||||||
|
}
|
||||||
|
const { escapeHtml, escapeAttr } = window.mbproxyUtil;
|
||||||
|
|
||||||
|
function num(n) {
|
||||||
|
if (n === null || n === undefined) return '—';
|
||||||
|
return n.toLocaleString('en-US');
|
||||||
|
}
|
||||||
|
|
||||||
|
function ratio(hit, miss) {
|
||||||
|
const total = (hit || 0) + (miss || 0);
|
||||||
|
if (total === 0) return null;
|
||||||
|
return Math.round((100 * (hit || 0)) / total);
|
||||||
|
}
|
||||||
|
|
||||||
|
function exceptionTotal(b) {
|
||||||
|
const e = b.exceptionsByCode || {};
|
||||||
|
return (e.code01 || 0) + (e.code02 || 0) + (e.code03 || 0) +
|
||||||
|
(e.code04 || 0) + (e.codeOther || 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
function isProblem(plc) {
|
||||||
|
return plc.listener.state === 'recovering'
|
||||||
|
|| exceptionTotal(plc.backend) > 0
|
||||||
|
|| (plc.backend.backendHeartbeatsFailed || 0) > 0
|
||||||
|
|| (plc.backend.connectsFailed || 0) > 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
function stateChip(state) {
|
||||||
|
const cls = state === 'bound' ? 'chip-ok'
|
||||||
|
: state === 'recovering' ? 'chip-warn'
|
||||||
|
: 'chip-idle';
|
||||||
|
return `<span class="chip ${cls}">${escapeHtml(state)}</span>`;
|
||||||
|
}
|
||||||
|
|
||||||
|
function ratioCell(hit, miss) {
|
||||||
|
const r = ratio(hit, miss);
|
||||||
|
if (r === null) return '<span class="s-idle">—</span>';
|
||||||
|
return `${r}%<span class="ratio-sub"> ${num(hit)}</span>`;
|
||||||
|
}
|
||||||
|
|
||||||
|
function keepaliveCell(b) {
|
||||||
|
const sent = b.backendHeartbeatsSent || 0;
|
||||||
|
const failed = b.backendHeartbeatsFailed || 0;
|
||||||
|
if (sent === 0 && failed === 0) return '<span class="s-idle">—</span>';
|
||||||
|
if (failed > 0) return `<span class="s-bad">${num(sent)} · ${failed} fail</span>`;
|
||||||
|
return num(sent);
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Rate computation ───────────────────────────────────────────────────
|
||||||
|
function updateRates(snapshot) {
|
||||||
|
const now = performance.now();
|
||||||
|
for (const plc of snapshot.plcs) {
|
||||||
|
const cur = plc.pdus.forwarded;
|
||||||
|
const prev = prevPdu.get(plc.name);
|
||||||
|
if (prev && now > prev.t) {
|
||||||
|
const dt = (now - prev.t) / 1000;
|
||||||
|
const rate = Math.max(0, (cur - prev.forwarded) / dt);
|
||||||
|
rateByName.set(plc.name, rate);
|
||||||
|
}
|
||||||
|
prevPdu.set(plc.name, { forwarded: cur, t: now });
|
||||||
|
}
|
||||||
|
// Prune entries for PLCs no longer in the snapshot (hot-reload removal) so the
|
||||||
|
// Maps don't grow unbounded and rateByName.size stays an accurate "have rates" gauge.
|
||||||
|
const currentNames = new Set(snapshot.plcs.map(p => p.name));
|
||||||
|
for (const k of prevPdu.keys()) if (!currentNames.has(k)) prevPdu.delete(k);
|
||||||
|
for (const k of rateByName.keys()) if (!currentNames.has(k)) rateByName.delete(k);
|
||||||
|
}
|
||||||
|
|
||||||
|
function rateOf(name) {
|
||||||
|
return rateByName.has(name) ? rateByName.get(name) : null;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Aggregate strip ────────────────────────────────────────────────────
|
||||||
|
function renderAggregates(s) {
|
||||||
|
$('ag-bound').textContent = s.listeners.bound;
|
||||||
|
$('ag-configured').textContent = '/ ' + s.listeners.configured;
|
||||||
|
|
||||||
|
let clients = 0, exceptions = 0, recovering = 0;
|
||||||
|
let cacheHit = 0, cacheMiss = 0, fleetRate = 0;
|
||||||
|
for (const plc of s.plcs) {
|
||||||
|
clients += plc.clients.connected;
|
||||||
|
exceptions += exceptionTotal(plc.backend);
|
||||||
|
if (plc.listener.state === 'recovering') recovering++;
|
||||||
|
cacheHit += plc.backend.cacheHitCount || 0;
|
||||||
|
cacheMiss += plc.backend.cacheMissCount || 0;
|
||||||
|
const r = rateOf(plc.name);
|
||||||
|
if (r !== null) fleetRate += r;
|
||||||
|
}
|
||||||
|
|
||||||
|
$('ag-clients').textContent = num(clients);
|
||||||
|
$('ag-pdurate').textContent = rateByName.size ? Math.round(fleetRate).toLocaleString('en-US') : '—';
|
||||||
|
$('ag-recovering').textContent = recovering;
|
||||||
|
$('ag-exceptions').textContent = num(exceptions);
|
||||||
|
|
||||||
|
const cr = ratio(cacheHit, cacheMiss);
|
||||||
|
$('ag-cache').textContent = cr === null ? '—' : cr + '%';
|
||||||
|
|
||||||
|
$('ag-bound').className = s.listeners.bound < s.listeners.configured ? 's-warn' : '';
|
||||||
|
$('ag-recovering-card').className = 'agg-card' + (recovering > 0 ? ' caution' : '');
|
||||||
|
$('ag-exceptions-card').className = 'agg-card' + (exceptions > 0 ? ' alert' : '');
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Table ──────────────────────────────────────────────────────────────
|
||||||
|
function sortKey(plc, key) {
|
||||||
|
switch (key) {
|
||||||
|
case 'name': return plc.name.toLowerCase();
|
||||||
|
case 'host': return plc.host.toLowerCase();
|
||||||
|
case 'state': return plc.listener.state;
|
||||||
|
case 'clients': return plc.clients.connected;
|
||||||
|
case 'pdurate': return rateOf(plc.name) || 0;
|
||||||
|
case 'rtt': return plc.backend.lastRoundTripMs || 0;
|
||||||
|
case 'exceptions': return exceptionTotal(plc.backend);
|
||||||
|
case 'coalesce': return ratio(plc.backend.coalescedHitCount, plc.backend.coalescedMissCount) ?? -1;
|
||||||
|
case 'cache': return ratio(plc.backend.cacheHitCount, plc.backend.cacheMissCount) ?? -1;
|
||||||
|
case 'keepalive': return plc.backend.backendHeartbeatsSent || 0;
|
||||||
|
default: return plc.name;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
function visiblePlcs(s) {
|
||||||
|
let rows = s.plcs.slice();
|
||||||
|
const q = filter.search.trim().toLowerCase();
|
||||||
|
if (q) rows = rows.filter(p => p.name.toLowerCase().includes(q) || p.host.toLowerCase().includes(q));
|
||||||
|
if (filter.state) rows = rows.filter(p => p.listener.state === filter.state);
|
||||||
|
if (filter.problemsOnly) rows = rows.filter(isProblem);
|
||||||
|
rows.sort((a, b) => {
|
||||||
|
const ka = sortKey(a, sort.key), kb = sortKey(b, sort.key);
|
||||||
|
if (ka < kb) return -1 * sort.dir;
|
||||||
|
if (ka > kb) return 1 * sort.dir;
|
||||||
|
return a.name.localeCompare(b.name);
|
||||||
|
});
|
||||||
|
return rows;
|
||||||
|
}
|
||||||
|
|
||||||
|
function renderTable(s) {
|
||||||
|
const rows = visiblePlcs(s);
|
||||||
|
const tbody = $('plc-rows');
|
||||||
|
$('row-count').textContent = `${rows.length} of ${s.plcs.length}`;
|
||||||
|
|
||||||
|
if (rows.length === 0) {
|
||||||
|
tbody.innerHTML = '<tr><td colspan="10" class="empty-row">No PLCs match the current filter.</td></tr>';
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
tbody.innerHTML = rows.map(plc => {
|
||||||
|
const b = plc.backend;
|
||||||
|
const rate = rateOf(plc.name);
|
||||||
|
const excn = exceptionTotal(b);
|
||||||
|
const recovHint = plc.listener.state === 'recovering' && plc.listener.lastBindError
|
||||||
|
? ` title="${escapeAttr(plc.listener.lastBindError)}"` : '';
|
||||||
|
const plcHref = '/plc/' + encodeURIComponent(plc.name);
|
||||||
|
return `<tr${recovHint}>
|
||||||
|
<td class="plc-name"><a href="${escapeAttr(plcHref)}" target="_blank" rel="noopener">${escapeHtml(plc.name)}</a></td>
|
||||||
|
<td class="plc-host">${escapeHtml(plc.host)}:${plc.listenPort}</td>
|
||||||
|
<td>${stateChip(plc.listener.state)}</td>
|
||||||
|
<td class="num">${plc.clients.connected}</td>
|
||||||
|
<td class="num">${rate === null ? '—' : Math.round(rate)}</td>
|
||||||
|
<td class="num">${(b.lastRoundTripMs || 0).toFixed(1)}</td>
|
||||||
|
<td class="num ${excn > 0 ? 's-bad' : ''}">${excn}</td>
|
||||||
|
<td class="num">${ratioCell(b.coalescedHitCount, b.coalescedMissCount)}</td>
|
||||||
|
<td class="num">${ratioCell(b.cacheHitCount, b.cacheMissCount)}</td>
|
||||||
|
<td class="num">${keepaliveCell(b)}</td>
|
||||||
|
</tr>`;
|
||||||
|
}).join('');
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Render orchestration ───────────────────────────────────────────────
|
||||||
|
function render() {
|
||||||
|
if (!latest) return;
|
||||||
|
renderAggregates(latest);
|
||||||
|
renderTable(latest);
|
||||||
|
}
|
||||||
|
|
||||||
|
function onSnapshot(s) {
|
||||||
|
latest = s;
|
||||||
|
updateRates(s);
|
||||||
|
const up = s.service.uptimeSeconds;
|
||||||
|
$('svc-meta').textContent =
|
||||||
|
`v${s.service.version} · up ${formatUptime(up)} · reloads ${s.service.configReloadCount}`;
|
||||||
|
render();
|
||||||
|
}
|
||||||
|
|
||||||
|
function formatUptime(sec) {
|
||||||
|
if (!Number.isFinite(sec) || sec < 0) return '—';
|
||||||
|
const d = Math.floor(sec / 86400);
|
||||||
|
const h = Math.floor((sec % 86400) / 3600);
|
||||||
|
const m = Math.floor((sec % 3600) / 60);
|
||||||
|
if (d > 0) return `${d}d ${h}h`;
|
||||||
|
if (h > 0) return `${h}h ${m}m`;
|
||||||
|
return `${m}m`;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Connection-state pill ──────────────────────────────────────────────
|
||||||
|
function setConn(state, text) {
|
||||||
|
const pill = $('conn');
|
||||||
|
pill.dataset.state = state;
|
||||||
|
$('conn-text').textContent = text || state;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Wire up filter / sort controls ─────────────────────────────────────
|
||||||
|
function wireControls() {
|
||||||
|
$('f-search').addEventListener('input', e => { filter.search = e.target.value; render(); });
|
||||||
|
$('f-state').addEventListener('change', e => { filter.state = e.target.value; render(); });
|
||||||
|
$('f-problems').addEventListener('change', e => { filter.problemsOnly = e.target.checked; render(); });
|
||||||
|
|
||||||
|
// Sortable headers are keyboard-operable: each is tabindex=0 with aria-sort,
|
||||||
|
// and Enter / Space sorts just like a click. aria-sort is kept in lockstep with
|
||||||
|
// the sorted-asc / sorted-desc visual classes so a screen reader announces it.
|
||||||
|
function applySort(th) {
|
||||||
|
const key = th.dataset.sort;
|
||||||
|
if (sort.key === key) { sort.dir *= -1; }
|
||||||
|
else { sort.key = key; sort.dir = 1; }
|
||||||
|
document.querySelectorAll('th.sortable').forEach(h => {
|
||||||
|
h.classList.remove('sorted-asc', 'sorted-desc');
|
||||||
|
h.setAttribute('aria-sort', 'none');
|
||||||
|
});
|
||||||
|
th.classList.add(sort.dir === 1 ? 'sorted-asc' : 'sorted-desc');
|
||||||
|
th.setAttribute('aria-sort', sort.dir === 1 ? 'ascending' : 'descending');
|
||||||
|
render();
|
||||||
|
}
|
||||||
|
|
||||||
|
document.querySelectorAll('th.sortable').forEach(th => {
|
||||||
|
th.addEventListener('click', () => applySort(th));
|
||||||
|
th.addEventListener('keydown', e => {
|
||||||
|
if (e.key === 'Enter' || e.key === ' ') {
|
||||||
|
e.preventDefault();
|
||||||
|
applySort(th);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── SignalR ────────────────────────────────────────────────────────────
|
||||||
|
const connection = new signalR.HubConnectionBuilder()
|
||||||
|
.withUrl('/hub/status')
|
||||||
|
.withAutomaticReconnect([0, 1000, 2000, 5000, 10000])
|
||||||
|
.build();
|
||||||
|
|
||||||
|
connection.on('fleet', onSnapshot);
|
||||||
|
connection.onreconnecting(() => setConn('connecting', 'reconnecting'));
|
||||||
|
// On a warm reconnect the transport is back but the group subscription is not —
|
||||||
|
// re-subscribe through connect(), which retries with backoff if the invoke fails
|
||||||
|
// (instead of silently leaving a live connection with no feed).
|
||||||
|
connection.onreconnected(() => connect());
|
||||||
|
connection.onclose(() => setConn('disconnected', 'disconnected'));
|
||||||
|
|
||||||
|
// Cold start. withAutomaticReconnect only recovers an already-established
|
||||||
|
// connection, so the initial connect needs its own retry: capped exponential
|
||||||
|
// backoff, and start() only when the socket is actually Disconnected so a
|
||||||
|
// subscribe-only failure never tries to re-start a live connection.
|
||||||
|
let retryMs = 1000;
|
||||||
|
async function connect() {
|
||||||
|
setConn('connecting', 'connecting');
|
||||||
|
try {
|
||||||
|
if (connection.state === signalR.HubConnectionState.Disconnected)
|
||||||
|
await connection.start();
|
||||||
|
await connection.invoke('SubscribeFleet');
|
||||||
|
setConn('connected');
|
||||||
|
retryMs = 1000;
|
||||||
|
} catch {
|
||||||
|
setConn('disconnected', 'retrying');
|
||||||
|
setTimeout(connect, retryMs);
|
||||||
|
retryMs = Math.min(retryMs * 2, 30000);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Boot ───────────────────────────────────────────────────────────────
|
||||||
|
document.addEventListener('DOMContentLoaded', () => {
|
||||||
|
wireControls();
|
||||||
|
const nameTh = document.querySelector('th[data-sort="name"]');
|
||||||
|
nameTh.classList.add('sorted-asc');
|
||||||
|
nameTh.setAttribute('aria-sort', 'ascending');
|
||||||
|
connect();
|
||||||
|
});
|
||||||
|
})();
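Review note: the cold-start retry in `connect()` starts at 1000 ms, doubles after each failure, and caps at 30000 ms. A minimal sketch of that delay sequence, written as a standalone generator so the schedule can be checked in isolation. The name `backoffDelays` and its parameters are hypothetical and do not appear in the file above.

```javascript
// Hypothetical helper (not part of dashboard.js): yields the capped
// exponential delays that connect() would wait between failed attempts.
function* backoffDelays(initialMs = 1000, capMs = 30000) {
  let delay = initialMs;
  while (true) {
    yield delay;                          // delay used for this retry
    delay = Math.min(delay * 2, capMs);   // double, but never exceed the cap
  }
}

// Collect the first few delays for inspection.
const schedule = backoffDelays();
const firstSeven = Array.from({ length: 7 }, () => schedule.next().value);
console.log(firstSeven.join(', '));
```

With the defaults, the sequence reaches the cap on the sixth retry and stays there, which matches the `Math.min(retryMs * 2, 30000)` step in the file.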
|
||||||
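Review note: `updateRates` derives PDU/s from a cumulative counter by dividing the counter delta by the elapsed time between two snapshots, clamping negative deltas (a counter reset) to zero and pruning names that left the snapshot. The sketch below restates that logic without the DOM or `performance.now()` so it can run anywhere. `makeRateTracker`, the `{ key, count }` sample shape, and the explicit `nowMs` argument are assumptions made for illustration, not the file's API.

```javascript
// Hypothetical standalone version of the rate logic in updateRates().
function makeRateTracker() {
  const prev = new Map();   // key -> { count, t } from the previous sample
  const rates = new Map();  // key -> units per second

  return {
    // samples: array of { key, count }; nowMs: timestamp in milliseconds.
    update(samples, nowMs) {
      for (const { key, count } of samples) {
        const p = prev.get(key);
        if (p && nowMs > p.t) {
          const dtSeconds = (nowMs - p.t) / 1000;
          // A counter reset yields a negative delta; clamp it to zero.
          rates.set(key, Math.max(0, (count - p.count) / dtSeconds));
        }
        prev.set(key, { count, t: nowMs });
      }
      // Prune keys that are no longer reported so the maps stay bounded.
      const live = new Set(samples.map((s) => s.key));
      for (const k of prev.keys()) if (!live.has(k)) prev.delete(k);
      for (const k of rates.keys()) if (!live.has(k)) rates.delete(k);
    },
    // null means "no rate yet" (fewer than two samples seen for this key).
    rateOf(key) {
      return rates.has(key) ? rates.get(key) : null;
    },
  };
}

const tracker = makeRateTracker();
tracker.update([{ key: 'plc-a', count: 100 }], 0);
tracker.update([{ key: 'plc-a', count: 150 }], 2000);
console.log(tracker.rateOf('plc-a'));
```

Passing the timestamp in, rather than reading a clock inside the function, is only a convenience for testing; the file itself reads `performance.now()` once per snapshot.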
@@ -0,0 +1,150 @@
|
|||||||
|
/* ============================================================================
|
||||||
|
Connection-detail page — view-specific styling (Phase 5).
|
||||||
|
Shared tokens and chrome live in theme.css.
|
||||||
|
========================================================================= */
|
||||||
|
|
||||||
|
/* ── PLC identity header ─────────────────────────────────────────────────── */
|
||||||
|
.plc-head {
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
justify-content: space-between;
|
||||||
|
gap: 1rem;
|
||||||
|
background: var(--card);
|
||||||
|
border: 1px solid var(--rule);
|
||||||
|
border-radius: 8px;
|
||||||
|
padding: 0.85rem 1.1rem;
|
||||||
|
margin-bottom: 1rem;
|
||||||
|
}
|
||||||
|
.plc-title { font-size: 1.35rem; font-weight: 600; line-height: 1.1; }
|
||||||
|
.plc-sub {
|
||||||
|
margin-top: 0.2rem;
|
||||||
|
font-family: var(--mono);
|
||||||
|
font-size: 0.82rem;
|
||||||
|
color: var(--ink-soft);
|
||||||
|
}
|
||||||
|
.plc-head-state .chip { font-size: 0.78rem; padding: 0.25rem 0.65rem; }
|
||||||
|
|
||||||
|
.notice {
|
||||||
|
padding: 0.85rem 1.1rem;
|
||||||
|
margin-bottom: 1rem;
|
||||||
|
color: #b56a00;
|
||||||
|
background: var(--warn-bg);
|
||||||
|
border-color: #efd6a6;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ── Grouped counter cards ───────────────────────────────────────────────── */
|
||||||
|
.card-grid {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: repeat(auto-fill, minmax(290px, 1fr));
|
||||||
|
gap: 0.85rem;
|
||||||
|
margin-bottom: 1rem;
|
||||||
|
}
|
||||||
|
|
||||||
|
.metric-card {
|
||||||
|
background: var(--card);
|
||||||
|
border: 1px solid var(--rule);
|
||||||
|
border-radius: 8px;
|
||||||
|
overflow: hidden;
|
||||||
|
}
|
||||||
|
.metric-card .panel-head { margin: 0; }
|
||||||
|
|
||||||
|
.kv {
|
||||||
|
display: flex;
|
||||||
|
justify-content: space-between;
|
||||||
|
align-items: baseline;
|
||||||
|
gap: 1rem;
|
||||||
|
padding: 0.32rem 0.9rem;
|
||||||
|
font-size: 0.85rem;
|
||||||
|
}
|
||||||
|
.kv:nth-child(even) { background: #fbfbf9; }
|
||||||
|
.kv .k { color: var(--ink-soft); }
|
||||||
|
.kv .v { font-family: var(--mono); font-variant-numeric: tabular-nums; text-align: right; }
|
||||||
|
.kv .v.warn { color: var(--warn); }
|
||||||
|
.kv .v.bad { color: var(--bad); }
|
||||||
|
.kv .v.ok { color: var(--ok); }
|
||||||
|
|
||||||
|
/* client list inside the Clients card */
|
||||||
|
.client-line {
|
||||||
|
padding: 0.3rem 0.9rem;
|
||||||
|
font-family: var(--mono);
|
||||||
|
font-size: 0.78rem;
|
||||||
|
color: var(--ink-soft);
|
||||||
|
border-top: 1px dashed var(--rule);
|
||||||
|
}
|
||||||
|
.client-line .pdu { color: var(--ink-faint); }
|
||||||
|
|
||||||
|
/* ── Debug view ──────────────────────────────────────────────────────────── */
|
||||||
|
.debug-head {
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
justify-content: space-between;
|
||||||
|
}
|
||||||
|
.capture-state {
|
||||||
|
font-family: var(--mono);
|
||||||
|
font-size: 0.72rem;
|
||||||
|
letter-spacing: 0.04em;
|
||||||
|
padding: 0.12rem 0.5rem;
|
||||||
|
border-radius: 4px;
|
||||||
|
}
|
||||||
|
.capture-state[data-armed="true"] { color: var(--ok); background: var(--ok-bg); }
|
||||||
|
.capture-state[data-armed="false"] { color: var(--ink-soft); background: var(--idle-bg); }
|
||||||
|
|
||||||
|
.table-wrap { overflow-x: auto; }
|
||||||
|
|
||||||
|
.debug-table {
|
||||||
|
width: 100%;
|
||||||
|
border-collapse: collapse;
|
||||||
|
font-size: 0.85rem;
|
||||||
|
}
|
||||||
|
.debug-table th,
|
||||||
|
.debug-table td {
|
||||||
|
padding: 0.45rem 0.9rem;
|
||||||
|
text-align: left;
|
||||||
|
white-space: nowrap;
|
||||||
|
border-bottom: 1px solid var(--rule);
|
||||||
|
}
|
||||||
|
.debug-table th {
|
||||||
|
font-size: 0.7rem;
|
||||||
|
font-weight: 600;
|
||||||
|
text-transform: uppercase;
|
||||||
|
letter-spacing: 0.05em;
|
||||||
|
color: var(--ink-faint);
|
||||||
|
background: #fbfbf9;
|
||||||
|
}
|
||||||
|
.debug-table th.num,
|
||||||
|
.debug-table td.num { text-align: right; font-family: var(--mono); }
|
||||||
|
.debug-table tbody tr:last-child td { border-bottom: none; }
|
||||||
|
|
||||||
|
/* First column: friendly tag name over its PDU address, or a bare address. */
|
||||||
|
.debug-table .tag-name { display: block; font-weight: 600; }
|
||||||
|
.debug-table .tag-addr {
|
||||||
|
display: block;
|
||||||
|
margin-top: 0.1rem;
|
||||||
|
font-family: var(--mono);
|
||||||
|
font-size: 0.72rem;
|
||||||
|
color: var(--ink-faint);
|
||||||
|
}
|
||||||
|
.debug-table .tag-addr-hex { color: var(--ink-faint); font-size: 0.76rem; }
|
||||||
|
|
||||||
|
.debug-table .raw { color: var(--accent-deep); }
|
||||||
|
.debug-table .dec { font-weight: 600; }
|
||||||
|
.debug-table tr.stale td { color: var(--ink-faint); }
|
||||||
|
.debug-table tr.no-traffic td { color: var(--ink-faint); font-style: italic; }
|
||||||
|
|
||||||
|
.dir-tag {
|
||||||
|
font-size: 0.68rem;
|
||||||
|
font-weight: 600;
|
||||||
|
text-transform: uppercase;
|
||||||
|
letter-spacing: 0.05em;
|
||||||
|
padding: 0.1rem 0.4rem;
|
||||||
|
border-radius: 3px;
|
||||||
|
}
|
||||||
|
.dir-read { color: var(--accent-deep); background: #e7ecfb; }
|
||||||
|
.dir-write { color: #8a5a00; background: var(--warn-bg); }
|
||||||
|
|
||||||
|
.empty-row {
|
||||||
|
text-align: center !important;
|
||||||
|
color: var(--ink-faint);
|
||||||
|
padding: 1.6rem !important;
|
||||||
|
font-style: italic;
|
||||||
|
}
|
||||||
@@ -0,0 +1,327 @@
|
|||||||
|
/* ============================================================================
|
||||||
|
Connection-detail page — live per-PLC view over /hub/status (Phase 5).
|
||||||
|
Subscribes to one PLC's "plc" group; renders grouped counter cards and the
|
||||||
|
real-time debug view (per-tag PLC-side raw BCD vs client-side decoded value).
|
||||||
|
========================================================================= */
|
||||||
|
'use strict';
|
||||||
|
|
||||||
|
(function () {
|
||||||
|
// ── PLC name from the URL path: /plc/{name} ────────────────────────────
|
||||||
|
// decodeURIComponent throws URIError on a malformed %-escape; fall back to the
|
||||||
|
// raw path segment and flag the failure so the boot path shows a notice instead
|
||||||
|
// of letting the whole script abort on the very first statement.
|
||||||
|
const rawSegment = location.pathname.replace(/^\/plc\//, '');
|
||||||
|
let plcName, plcNameError = false;
|
||||||
|
try {
|
||||||
|
plcName = decodeURIComponent(rawSegment);
|
||||||
|
} catch {
|
||||||
|
plcName = rawSegment;
|
||||||
|
plcNameError = true;
|
||||||
|
}
|
||||||
|
|
||||||
|
const $ = (id) => document.getElementById(id);
|
||||||
|
|
||||||
|
// util.js must load before this script. If it failed to load, fail loud and
|
||||||
|
// visible rather than letting the destructure throw and abort the page silently.
|
||||||
|
if (!window.mbproxyUtil) {
|
||||||
|
document.body.innerHTML =
|
||||||
|
'<p style="padding:2rem;font-family:sans-serif;color:#b00">' +
|
||||||
|
'Admin UI failed to load (util.js missing). Check the browser console.</p>';
|
||||||
|
throw new Error('window.mbproxyUtil is not defined');
|
||||||
|
}
|
||||||
|
const { escapeHtml, escapeAttr } = window.mbproxyUtil;
|
||||||
|
|
||||||
|
document.title = `mbproxy — ${plcName}`;
|
||||||
|
$('crumb-name').textContent = plcName;
|
||||||
|
$('plc-name').textContent = plcName;
|
||||||
|
|
||||||
|
// ── Helpers ────────────────────────────────────────────────────────────
|
||||||
|
function num(n) {
|
||||||
|
if (n === null || n === undefined) return '—';
|
||||||
|
return n.toLocaleString('en-US');
|
||||||
|
}
|
||||||
|
function hex4(n) { return '0x' + (n & 0xffff).toString(16).toUpperCase().padStart(4, '0'); }
|
||||||
|
|
||||||
|
// First debug-row cell: the tag's friendly name (when configured) over its PDU
|
||||||
|
// address, or just the address when unnamed. All dynamic text is escaped.
|
||||||
|
function tagCell(t) {
|
||||||
|
const addr = Number(t.address);
|
||||||
|
if (t.name) {
|
||||||
|
return `<td><span class="tag-name">${escapeHtml(t.name)}</span>` +
|
||||||
|
`<span class="tag-addr">PDU ${addr} · ${hex4(addr)}</span></td>`;
|
||||||
|
}
|
||||||
|
return `<td>${addr} <span class="tag-addr-hex">${hex4(addr)}</span></td>`;
|
||||||
|
}
|
||||||
|
|
||||||
|
function formatAge(sec) {
|
||||||
|
if (!Number.isFinite(sec) || sec < 0) return '—';
|
||||||
|
if (sec < 1) return 'now';
|
||||||
|
if (sec < 60) return sec.toFixed(1) + 's';
|
||||||
|
const m = Math.floor(sec / 60);
|
||||||
|
if (m < 60) return `${m}m ${Math.floor(sec % 60)}s`;
|
||||||
|
return `${Math.floor(m / 60)}h ${m % 60}m`;
|
||||||
|
}
|
||||||
|
function shortTime(iso) {
|
||||||
|
try { const d = new Date(iso); return Number.isNaN(d.getTime()) ? iso : d.toLocaleTimeString('en-US', { hour12: false }); }
|
||||||
|
catch { return iso; }
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Card renderer ──────────────────────────────────────────────────────
|
||||||
|
// rows: array of [key, valueHtml, optionalClass]; values and extra are inserted as raw HTML, so callers must escape any dynamic text first.
|
||||||
|
function card(title, rows, extra) {
|
||||||
|
const body = rows.map(([k, v, cls]) =>
|
||||||
|
`<div class="kv"><span class="k">${escapeHtml(k)}</span>` +
|
||||||
|
`<span class="v ${cls || ''}">${v}</span></div>`).join('');
|
||||||
|
return `<div class="metric-card">
|
||||||
|
<div class="panel-head">${escapeHtml(title)}</div>
|
||||||
|
${body}${extra || ''}
|
||||||
|
</div>`;
|
||||||
|
}
|
||||||
|
|
||||||
|
function stateChip(state) {
|
||||||
|
const cls = state === 'bound' ? 'chip-ok'
|
||||||
|
: state === 'recovering' ? 'chip-warn'
|
||||||
|
: 'chip-idle';
|
||||||
|
return `<span class="chip ${cls}">${escapeHtml(state)}</span>`;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Render: PLC counters ───────────────────────────────────────────────
|
||||||
|
function renderPlc(plc) {
|
||||||
|
$('notice').hidden = true;
|
||||||
|
$('cards').hidden = false;
|
||||||
|
|
||||||
|
$('plc-sub').textContent = `${plc.host}:${plc.listenPort}`;
|
||||||
|
$('plc-state').innerHTML = stateChip(plc.listener.state);
|
||||||
|
|
||||||
|
const b = plc.backend, p = plc.pdus, l = plc.listener;
|
||||||
|
const e = b.exceptionsByCode || {};
|
||||||
|
const excnTotal = (e.code01 || 0) + (e.code02 || 0) + (e.code03 || 0) +
|
||||||
|
(e.code04 || 0) + (e.codeOther || 0);
|
||||||
|
|
||||||
|
const cards = [];
|
||||||
|
|
||||||
|
// Listener
|
||||||
|
cards.push(card('Listener', [
|
||||||
|
['State', stateChip(l.state)],
|
||||||
|
['Recovery attempts', num(l.recoveryAttempts), l.recoveryAttempts > 0 ? 'warn' : ''],
|
||||||
|
['Last bind error', l.lastBindError ? escapeHtml(l.lastBindError) : '—',
|
||||||
|
l.lastBindError ? 'bad' : ''],
|
||||||
|
]));
|
||||||
|
|
||||||
|
// Clients
|
||||||
|
const clientLines = (plc.clients.remoteEndpoints || []).map(c =>
|
||||||
|
`<div class="client-line">${escapeHtml(c.remote)}` +
|
||||||
|
`<span class="pdu"> · ${num(c.pdusForwarded)} PDUs · since ${escapeHtml(shortTime(c.connectedAtUtc))}</span></div>`
|
||||||
|
).join('');
|
||||||
|
cards.push(card('Clients', [
|
||||||
|
['Connected', num(plc.clients.connected)],
|
||||||
|
], clientLines));
|
||||||
|
|
||||||
|
// PDU traffic
|
||||||
|
cards.push(card('PDU traffic', [
|
||||||
|
['Forwarded', num(p.forwarded)],
|
||||||
|
['FC03 read HR', num(p.byFc.fc03)],
|
||||||
|
['FC04 read IR', num(p.byFc.fc04)],
|
||||||
|
['FC06 write single', num(p.byFc.fc06)],
|
||||||
|
['FC16 write multiple', num(p.byFc.fc16)],
|
||||||
|
['Other FCs', num(p.byFc.other)],
|
||||||
|
['BCD slots rewritten', num(p.rewrittenSlots)],
|
||||||
|
['Partial-BCD warnings', num(p.partialBcdWarnings), p.partialBcdWarnings > 0 ? 'warn' : ''],
|
||||||
|
['Invalid-BCD warnings', num(p.invalidBcdWarnings), p.invalidBcdWarnings > 0 ? 'warn' : ''],
|
||||||
|
]));
|
||||||
|
|
||||||
|
// Backend health
|
||||||
|
cards.push(card('Backend health', [
|
||||||
|
['Connects ok', num(b.connectsSuccess)],
|
||||||
|
['Connects failed', num(b.connectsFailed), b.connectsFailed > 0 ? 'bad' : ''],
|
||||||
|
['Round-trip (EWMA)', (b.lastRoundTripMs || 0).toFixed(1) + ' ms'],
|
||||||
|
['Exceptions total', num(excnTotal), excnTotal > 0 ? 'bad' : ''],
|
||||||
|
['· 01 illegal function', num(e.code01)],
|
||||||
|
['· 02 illegal address', num(e.code02)],
|
||||||
|
['· 03 illegal value', num(e.code03)],
|
||||||
|
['· 04 device failure', num(e.code04)],
|
||||||
|
['· other', num(e.codeOther)],
|
||||||
|
]));
|
||||||
|
|
||||||
|
// Multiplexer
|
||||||
|
cards.push(card('Multiplexer', [
|
||||||
|
['In flight', num(b.inFlight)],
|
||||||
|
['Max in flight', num(b.maxInFlight)],
|
||||||
|
['TxId wraps', num(b.txIdWraps)],
|
||||||
|
['Disconnect cascades', num(b.disconnectCascades), b.disconnectCascades > 0 ? 'warn' : ''],
|
||||||
|
['Queue depth', num(b.queueDepth), b.queueDepth > 0 ? 'warn' : ''],
|
||||||
|
]));
|
||||||
|
|
||||||
|
// Coalescing
|
||||||
|
cards.push(card('Read coalescing', [
|
||||||
|
['Hits', num(b.coalescedHitCount)],
|
||||||
|
['Misses', num(b.coalescedMissCount)],
|
||||||
|
['Hit ratio', ratioText(b.coalescedHitCount, b.coalescedMissCount)],
|
||||||
|
['Resp. to dead upstream', num(b.coalescedResponseToDeadUpstream)],
|
||||||
|
]));
|
||||||
|
|
||||||
|
// Cache
|
||||||
|
cards.push(card('Response cache', [
|
||||||
|
['Hits', num(b.cacheHitCount)],
|
||||||
|
['Misses', num(b.cacheMissCount)],
|
||||||
|
['Hit ratio', ratioText(b.cacheHitCount, b.cacheMissCount)],
|
||||||
|
['Invalidations', num(b.cacheInvalidations)],
|
||||||
|
['Entries', num(b.cacheEntryCount)],
|
||||||
|
['Approx. bytes', num(b.cacheBytes)],
|
||||||
|
]));
|
||||||
|
|
||||||
|
// Keepalive
|
||||||
|
cards.push(card('Keepalive', [
|
||||||
|
['Heartbeats sent', num(b.backendHeartbeatsSent)],
|
||||||
|
['Heartbeats failed', num(b.backendHeartbeatsFailed), b.backendHeartbeatsFailed > 0 ? 'bad' : ''],
|
||||||
|
['Idle disconnects', num(b.backendIdleDisconnects), b.backendIdleDisconnects > 0 ? 'warn' : ''],
|
||||||
|
]));
|
||||||
|
|
||||||
|
// Bytes
|
||||||
|
cards.push(card('Bytes', [
|
||||||
|
['Upstream in', num(plc.bytes.upstreamIn)],
|
||||||
|
['Upstream out', num(plc.bytes.upstreamOut)],
|
||||||
|
]));
|
||||||
|
|
||||||
|
$('cards').innerHTML = cards.join('');
|
||||||
|
}
|
||||||
|
|
||||||
|
function ratioText(hit, miss) {
|
||||||
|
const total = (hit || 0) + (miss || 0);
|
||||||
|
if (total === 0) return '—';
|
||||||
|
return Math.round((100 * (hit || 0)) / total) + '%';
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Notices: PLC removed by hot-reload, or unknown / unreachable ───────
|
||||||
|
function showNotice(msg) {
|
||||||
|
const n = $('notice');
|
||||||
|
n.textContent = msg;
|
||||||
|
n.hidden = false;
|
||||||
|
$('cards').hidden = true;
|
||||||
|
}
|
||||||
|
function renderMissing() {
|
||||||
|
showNotice('This PLC is no longer in the configuration — it was likely ' +
|
||||||
|
'removed by a hot-reload. Counters and the debug view are unavailable.');
|
||||||
|
$('cards').innerHTML = '';
|
||||||
|
$('plc-sub').textContent = 'not configured';
|
||||||
|
$('plc-state').innerHTML = '<span class="chip chip-idle">removed</span>';
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Render: debug view ─────────────────────────────────────────────────
|
||||||
|
function renderDebug(debug) {
|
||||||
|
const cap = $('capture-state');
|
||||||
|
cap.dataset.armed = String(debug.captureArmed);
|
||||||
|
cap.textContent = debug.captureArmed ? 'capture armed' : 'capture idle';
|
||||||
|
|
||||||
|
const tbody = $('debug-rows');
|
||||||
|
if (!debug.tags || debug.tags.length === 0) {
|
||||||
|
tbody.innerHTML = '<tr><td colspan="6" class="empty-row">No BCD tags configured for this PLC.</td></tr>';
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
tbody.innerHTML = debug.tags.map(t => {
|
||||||
|
if (!t.hasValue) {
|
||||||
|
return `<tr class="no-traffic">
|
||||||
|
${tagCell(t)}
|
||||||
|
<td>${Number(t.width)}-bit</td>
|
||||||
|
<td colspan="3">no traffic yet</td>
|
||||||
|
<td class="num">—</td>
|
||||||
|
</tr>`;
|
||||||
|
}
|
||||||
|
const dirCls = t.direction === 'write' ? 'dir-write' : 'dir-read';
|
||||||
|
const staleAttr = (t.ageSeconds || 0) > 30 ? ' class="stale"' : '';
|
||||||
|
return `<tr${staleAttr}>
|
||||||
|
${tagCell(t)}
|
||||||
|
<td>${Number(t.width)}-bit</td>
|
||||||
|
<td><span class="dir-tag ${dirCls}">${escapeHtml(t.direction)}</span></td>
|
||||||
|
<td class="num raw">${escapeHtml(t.rawHex)}</td>
|
||||||
|
<td class="num dec">${num(t.decodedValue)}</td>
|
||||||
|
<td class="num">${formatAge(t.ageSeconds)}</td>
|
||||||
|
</tr>`;
|
||||||
|
}).join('');
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Snapshot handler ───────────────────────────────────────────────────
|
||||||
|
function onDetail(detail) {
|
||||||
|
gotSnapshot();
|
||||||
|
if (detail.plc) renderPlc(detail.plc);
|
||||||
|
else renderMissing();
|
||||||
|
renderDebug(detail.debug || { captureArmed: false, tags: [] });
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Connection-state pill ──────────────────────────────────────────────
|
||||||
|
function setConn(state, text) {
|
||||||
|
$('conn').dataset.state = state;
|
||||||
|
$('conn-text').textContent = text || state;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Unknown-PLC watchdog ───────────────────────────────────────────────
|
||||||
|
// SubscribePlc succeeds for any name; an unconfigured PLC simply never
|
||||||
|
// produces a 'plc' push. If no snapshot lands shortly after connecting,
|
||||||
|
// say so instead of sitting on "Waiting for first snapshot…" forever.
|
||||||
|
let firstSnapshotTimer = null;
|
||||||
|
function armSnapshotWatchdog() {
|
||||||
|
clearTimeout(firstSnapshotTimer);
|
||||||
|
firstSnapshotTimer = setTimeout(() => {
|
||||||
|
showNotice(`No data for "${plcName}". This PLC is not in the proxy ` +
|
||||||
|
`configuration, or the admin feed is not delivering — ` +
|
||||||
|
`check the name against the fleet page.`);
|
||||||
|
}, 6000);
|
||||||
|
}
|
||||||
|
function gotSnapshot() {
|
||||||
|
clearTimeout(firstSnapshotTimer);
|
||||||
|
firstSnapshotTimer = null;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── SignalR ────────────────────────────────────────────────────────────
|
||||||
|
// Stable per-page-load id so a transport reconnect (which assigns a fresh
|
||||||
|
// ConnectionId) is recognised server-side as the same viewer — that is what
|
||||||
|
// keeps the PLC's tag-value capture from leaking armed. Math.random, not
|
||||||
|
// crypto.randomUUID: the dashboard is served over plain http on the LAN,
|
||||||
|
// where randomUUID is unavailable (non-secure context).
|
||||||
|
const tabId = 't-' + Date.now().toString(36) + '-' +
|
||||||
|
Math.random().toString(36).slice(2, 10);
|
||||||
|
|
||||||
|
const connection = new signalR.HubConnectionBuilder()
|
||||||
|
.withUrl('/hub/status')
|
||||||
|
.withAutomaticReconnect([0, 1000, 2000, 5000, 10000])
|
||||||
|
.build();
|
||||||
|
|
||||||
|
connection.on('plc', onDetail);
|
||||||
|
connection.onreconnecting(() => setConn('connecting', 'reconnecting'));
|
||||||
|
// On a warm reconnect the transport is back but the group subscription is not —
|
||||||
|
// re-subscribe through connect(), which retries with backoff if the invoke fails
|
||||||
|
// and re-arms the snapshot watchdog (instead of silently showing a dead feed).
|
||||||
|
connection.onreconnected(() => connect());
|
||||||
|
connection.onclose(() => setConn('disconnected', 'disconnected'));
|
||||||
|
|
||||||
|
// Cold start. withAutomaticReconnect only recovers an already-established
|
||||||
|
// connection, so the initial connect needs its own retry: capped exponential
|
||||||
|
// backoff, and start() only when the socket is actually Disconnected so a
|
||||||
|
// subscribe-only failure never tries to re-start a live connection.
|
||||||
|
let retryMs = 1000;
|
||||||
|
async function connect() {
|
||||||
|
setConn('connecting', 'connecting');
|
||||||
|
try {
|
||||||
|
if (connection.state === signalR.HubConnectionState.Disconnected)
|
||||||
|
await connection.start();
|
||||||
|
await connection.invoke('SubscribePlc', plcName, tabId);
|
||||||
|
setConn('connected');
|
||||||
|
armSnapshotWatchdog();
|
||||||
|
retryMs = 1000;
|
||||||
|
} catch {
|
||||||
|
setConn('disconnected', 'retrying');
|
||||||
|
setTimeout(connect, retryMs);
|
||||||
|
retryMs = Math.min(retryMs * 2, 30000);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
document.addEventListener('DOMContentLoaded', () => {
|
||||||
|
if (plcNameError) {
|
||||||
|
showNotice('The PLC name in this URL could not be decoded — the address is ' +
|
||||||
|
'malformed. Return to the fleet page and open the PLC from the table.');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
connect();
|
||||||
|
});
|
||||||
|
})();
|
||||||
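The cold-start retry in `connect()` doubles `retryMs` after each failure and caps it at 30000 ms. The sketch below only reproduces that arithmetic so the resulting schedule is easy to see. `nextRetryMs` and `backoffSchedule` are hypothetical helper names used for illustration; they are not part of detail.js.

```javascript
'use strict';

// Hypothetical helper: the same step connect() applies after a failed attempt.
// Doubles the current delay and clamps it to the cap (30 s in the source).
function nextRetryMs(currentMs, capMs = 30000) {
  return Math.min(currentMs * 2, capMs);
}

// Hypothetical helper: lists the delay waited before each successive retry,
// starting from the initial 1000 ms used by the source.
function backoffSchedule(attempts, startMs = 1000, capMs = 30000) {
  const delays = [];
  let ms = startMs;
  for (let i = 0; i < attempts; i++) {
    delays.push(ms);
    ms = nextRetryMs(ms, capMs);
  }
  return delays;
}

console.log(backoffSchedule(7).join(', '));
```

Because the delay resets to 1000 ms after a successful subscribe, a later outage starts again from the short end of the schedule rather than from the cap.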
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,99 @@
|
|||||||
|
<!doctype html>
|
||||||
|
<html lang="en">
|
||||||
|
<head>
|
||||||
|
<meta charset="utf-8">
|
||||||
|
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||||
|
<title>mbproxy — fleet</title>
|
||||||
|
<link rel="stylesheet" href="/assets/bootstrap.min.css">
|
||||||
|
<link rel="stylesheet" href="/assets/theme.css">
|
||||||
|
<link rel="stylesheet" href="/assets/dashboard.css">
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<header class="app-bar">
|
||||||
|
<span class="brand"><span class="mark">▮</span> mbproxy</span>
|
||||||
|
<span class="crumb">fleet</span>
|
||||||
|
<span class="spacer"></span>
|
||||||
|
<span class="meta" id="svc-meta">—</span>
|
||||||
|
<span class="conn-pill" id="conn" data-state="connecting" aria-live="polite">
|
||||||
|
<span class="dot"></span><span id="conn-text">connecting</span>
|
||||||
|
</span>
|
||||||
|
</header>
|
||||||
|
|
||||||
|
<main class="page">
|
||||||
|
<!-- Aggregate fleet health -->
|
||||||
|
<section class="agg-grid rise" id="agg" style="animation-delay:.02s">
|
||||||
|
<div class="agg-card">
|
||||||
|
<div class="agg-label">Listeners</div>
|
||||||
|
<div class="agg-value"><span id="ag-bound">—</span><span class="agg-sub" id="ag-configured"></span></div>
|
||||||
|
</div>
|
||||||
|
<div class="agg-card">
|
||||||
|
<div class="agg-label">Clients</div>
|
||||||
|
<div class="agg-value" id="ag-clients">—</div>
|
||||||
|
</div>
|
||||||
|
<div class="agg-card">
|
||||||
|
<div class="agg-label">PDU / s</div>
|
||||||
|
<div class="agg-value numeric" id="ag-pdurate">—</div>
|
||||||
|
</div>
|
||||||
|
<div class="agg-card" id="ag-recovering-card">
|
||||||
|
<div class="agg-label">Recovering</div>
|
||||||
|
<div class="agg-value" id="ag-recovering">—</div>
|
||||||
|
</div>
|
||||||
|
<div class="agg-card" id="ag-exceptions-card">
|
||||||
|
<div class="agg-label">Backend exceptions</div>
|
||||||
|
<div class="agg-value numeric" id="ag-exceptions">—</div>
|
||||||
|
</div>
|
||||||
|
<div class="agg-card">
|
||||||
|
<div class="agg-label">Cache hit</div>
|
||||||
|
<div class="agg-value numeric" id="ag-cache">—</div>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<!-- PLC table -->
|
||||||
|
<section class="panel rise" style="animation-delay:.1s">
|
||||||
|
<div class="toolbar">
|
||||||
|
<input type="search" id="f-search" class="form-control form-control-sm tb-search"
|
||||||
|
placeholder="Filter by name or host…" autocomplete="off"
|
||||||
|
aria-label="Filter PLCs by name or host">
|
||||||
|
<select id="f-state" class="form-select form-select-sm tb-state"
|
||||||
|
aria-label="Filter PLCs by listener state">
|
||||||
|
<option value="">All states</option>
|
||||||
|
<option value="bound">Bound</option>
|
||||||
|
<option value="recovering">Recovering</option>
|
||||||
|
<option value="stopped">Stopped</option>
|
||||||
|
</select>
|
||||||
|
<label class="tb-check">
|
||||||
|
<input type="checkbox" id="f-problems"> Problems only
|
||||||
|
</label>
|
||||||
|
<span class="spacer"></span>
|
||||||
|
<span class="tb-count" id="row-count"></span>
|
||||||
|
</div>
|
||||||
|
<div class="table-wrap">
|
||||||
|
<table class="kpi-table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th data-sort="name" class="sortable" tabindex="0" aria-sort="none" scope="col">PLC</th>
|
||||||
|
<th data-sort="host" class="sortable" tabindex="0" aria-sort="none" scope="col">Backend</th>
|
||||||
|
<th data-sort="state" class="sortable" tabindex="0" aria-sort="none" scope="col">State</th>
|
||||||
|
<th data-sort="clients" class="sortable num" tabindex="0" aria-sort="none" scope="col">Clients</th>
|
||||||
|
<th data-sort="pdurate" class="sortable num" tabindex="0" aria-sort="none" scope="col">PDU/s</th>
|
||||||
|
<th data-sort="rtt" class="sortable num" tabindex="0" aria-sort="none" scope="col">RTT ms</th>
|
||||||
|
<th data-sort="exceptions" class="sortable num" tabindex="0" aria-sort="none" scope="col">Excns</th>
|
||||||
|
<th data-sort="coalesce" class="sortable num" tabindex="0" aria-sort="none" scope="col">Coalesce</th>
|
||||||
|
<th data-sort="cache" class="sortable num" tabindex="0" aria-sort="none" scope="col">Cache</th>
|
||||||
|
<th data-sort="keepalive" class="sortable num" tabindex="0" aria-sort="none" scope="col">Keepalive</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody id="plc-rows">
|
||||||
|
<tr><td colspan="10" class="empty-row">Waiting for first snapshot…</td></tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
</main>
|
||||||
|
|
||||||
|
<script src="/assets/signalr.min.js"></script>
|
||||||
|
<script src="/assets/bootstrap.bundle.min.js"></script>
|
||||||
|
<script src="/assets/util.js"></script>
|
||||||
|
<script src="/assets/dashboard.js"></script>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
@@ -0,0 +1,73 @@
|
|||||||
|
<!doctype html>
|
||||||
|
<html lang="en">
|
||||||
|
<head>
|
||||||
|
<meta charset="utf-8">
|
||||||
|
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||||
|
<title>mbproxy — connection</title>
|
||||||
|
<link rel="stylesheet" href="/assets/bootstrap.min.css">
|
||||||
|
<link rel="stylesheet" href="/assets/theme.css">
|
||||||
|
<link rel="stylesheet" href="/assets/detail.css">
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<header class="app-bar">
|
||||||
|
<span class="brand"><span class="mark">▮</span> mbproxy</span>
|
||||||
|
<a class="crumb" href="/">fleet</a>
|
||||||
|
<span class="crumb">›</span>
|
||||||
|
<span class="crumb" id="crumb-name">—</span>
|
||||||
|
<span class="spacer"></span>
|
||||||
|
<span class="conn-pill" id="conn" data-state="connecting" aria-live="polite">
|
||||||
|
<span class="dot"></span><span id="conn-text">connecting</span>
|
||||||
|
</span>
|
||||||
|
</header>
|
||||||
|
|
||||||
|
<main class="page">
|
||||||
|
<!-- PLC identity header -->
|
||||||
|
<section class="plc-head rise" id="plc-head" style="animation-delay:.02s">
|
||||||
|
<div>
|
||||||
|
<div class="plc-title" id="plc-name">—</div>
|
||||||
|
<div class="plc-sub" id="plc-sub">—</div>
|
||||||
|
</div>
|
||||||
|
<div class="plc-head-state" id="plc-state"></div>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<!-- "PLC not configured" notice (shown when a hot-reload removed the PLC) -->
|
||||||
|
<section class="panel notice rise" id="notice" hidden>
|
||||||
|
This PLC is no longer in the configuration. It was likely removed by a
|
||||||
|
configuration hot-reload. Counters and the debug view are unavailable.
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<!-- Grouped counter cards -->
|
||||||
|
<section class="card-grid rise" id="cards" style="animation-delay:.08s"></section>
|
||||||
|
|
||||||
|
<!-- Real-time debug view -->
|
||||||
|
<section class="panel rise" id="debug-panel" style="animation-delay:.14s">
|
||||||
|
<div class="panel-head debug-head">
|
||||||
|
<span>Debug view — per-tag PLC-side vs client-side values</span>
|
||||||
|
<span class="capture-state" id="capture-state">—</span>
|
||||||
|
</div>
|
||||||
|
<div class="table-wrap">
|
||||||
|
<table class="debug-table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Tag</th>
|
||||||
|
<th>Width</th>
|
||||||
|
<th>Direction</th>
|
||||||
|
<th class="num">PLC side (raw BCD)</th>
|
||||||
|
<th class="num">Client side (decoded)</th>
|
||||||
|
<th class="num">Age</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody id="debug-rows">
|
||||||
|
<tr><td colspan="6" class="empty-row">Waiting for first snapshot…</td></tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
|
</main>
|
||||||
|
|
||||||
|
<script src="/assets/signalr.min.js"></script>
|
||||||
|
<script src="/assets/bootstrap.bundle.min.js"></script>
|
||||||
|
<script src="/assets/util.js"></script>
|
||||||
|
<script src="/assets/detail.js"></script>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
File diff suppressed because one or more lines are too long
@@ -0,0 +1,178 @@
|
|||||||
|
/* ============================================================================
|
||||||
|
mbproxy admin console — shared theme
|
||||||
|
Refined technical-light: warm-neutral paper, hairline rules, IBM Plex type,
|
||||||
|
monospace numerics, status carried by colour. Layered over Bootstrap 5 via
|
||||||
|
--bs-* variable overrides. Owned by Phase 3; the dashboard and detail views
|
||||||
|
add only view-specific rules in dashboard.css / detail.css.
|
||||||
|
========================================================================= */
|
||||||
|
|
||||||
|
/* ── Vendored fonts (embedded woff2, no network fetch) ───────────────────── */
|
||||||
|
@font-face {
|
||||||
|
font-family: 'IBM Plex Sans';
|
||||||
|
font-style: normal;
|
||||||
|
font-weight: 400;
|
||||||
|
font-display: swap;
|
||||||
|
src: url('/assets/ibm-plex-sans-400.woff2') format('woff2');
|
||||||
|
}
|
||||||
|
@font-face {
|
||||||
|
font-family: 'IBM Plex Sans';
|
||||||
|
font-style: normal;
|
||||||
|
font-weight: 600;
|
||||||
|
font-display: swap;
|
||||||
|
src: url('/assets/ibm-plex-sans-600.woff2') format('woff2');
|
||||||
|
}
|
||||||
|
@font-face {
|
||||||
|
font-family: 'IBM Plex Mono';
|
||||||
|
font-style: normal;
|
||||||
|
font-weight: 500;
|
||||||
|
font-display: swap;
|
||||||
|
src: url('/assets/ibm-plex-mono-500.woff2') format('woff2');
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ── Design tokens ───────────────────────────────────────────────────────── */
|
||||||
|
:root {
|
||||||
|
--paper: #f4f4f1;
|
||||||
|
--card: #ffffff;
|
||||||
|
--ink: #1b1d21;
|
||||||
|
--ink-soft: #5a6066;
|
||||||
|
--ink-faint: #8b9097;
|
||||||
|
--rule: #e4e4df;
|
||||||
|
--rule-strong: #d2d2cb;
|
||||||
|
|
||||||
|
--accent: #2f5fd0;
|
||||||
|
--accent-deep: #1e3f99;
|
||||||
|
|
||||||
|
--ok: #2f9e44;
|
||||||
|
--warn: #e8920c;
|
||||||
|
--bad: #e03131;
|
||||||
|
--idle: #868e96;
|
||||||
|
|
||||||
|
--ok-bg: #e9f6ec;
|
||||||
|
--warn-bg: #fdf1dd;
|
||||||
|
--bad-bg: #fceaea;
|
||||||
|
--idle-bg: #eef0f2;
|
||||||
|
|
||||||
|
--mono: 'IBM Plex Mono', ui-monospace, 'Cascadia Mono', Consolas, monospace;
|
||||||
|
--sans: 'IBM Plex Sans', system-ui, -apple-system, 'Segoe UI', sans-serif;
|
||||||
|
|
||||||
|
/* Bootstrap overrides */
|
||||||
|
--bs-body-bg: var(--paper);
|
||||||
|
--bs-body-color: var(--ink);
|
||||||
|
--bs-body-font-family: var(--sans);
|
||||||
|
--bs-body-font-size: 0.9rem;
|
||||||
|
--bs-primary: var(--accent);
|
||||||
|
--bs-border-color: var(--rule);
|
||||||
|
--bs-emphasis-color: var(--ink);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ── Base ────────────────────────────────────────────────────────────────── */
|
||||||
|
body {
|
||||||
|
background:
|
||||||
|
radial-gradient(1200px 480px at 88% -8%, #ffffff 0%, rgba(255,255,255,0) 70%),
|
||||||
|
var(--paper);
|
||||||
|
color: var(--ink);
|
||||||
|
font-family: var(--sans);
|
||||||
|
-webkit-font-smoothing: antialiased;
|
||||||
|
}
|
||||||
|
|
||||||
|
.numeric,
|
||||||
|
.mono { font-family: var(--mono); font-variant-numeric: tabular-nums; }
|
||||||
|
|
||||||
|
a { color: var(--accent); text-decoration: none; }
|
||||||
|
a:hover { color: var(--accent-deep); text-decoration: underline; }
|
||||||
|
|
||||||
|
/* ── App chrome: top bar ─────────────────────────────────────────────────── */
|
||||||
|
.app-bar {
|
||||||
|
display: flex;
|
||||||
|
align-items: baseline;
|
||||||
|
gap: 1rem;
|
||||||
|
padding: 0.85rem 1.25rem;
|
||||||
|
background: var(--card);
|
||||||
|
border-bottom: 1px solid var(--rule-strong);
|
||||||
|
}
|
||||||
|
.app-bar .brand {
|
||||||
|
font-weight: 600;
|
||||||
|
font-size: 1.05rem;
|
||||||
|
letter-spacing: 0.02em;
|
||||||
|
}
|
||||||
|
.app-bar .brand .mark { color: var(--accent); }
|
||||||
|
.app-bar .crumb { color: var(--ink-faint); font-size: 0.85rem; }
|
||||||
|
.app-bar .spacer { flex: 1; }
|
||||||
|
.app-bar .meta {
|
||||||
|
font-family: var(--mono);
|
||||||
|
font-size: 0.78rem;
|
||||||
|
color: var(--ink-soft);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ── Connection-state pill (SignalR link health) ─────────────────────────── */
|
||||||
|
.conn-pill {
|
||||||
|
display: inline-flex;
|
||||||
|
align-items: center;
|
||||||
|
gap: 0.4rem;
|
||||||
|
font-size: 0.74rem;
|
||||||
|
font-weight: 600;
|
||||||
|
text-transform: uppercase;
|
||||||
|
letter-spacing: 0.06em;
|
||||||
|
padding: 0.2rem 0.6rem;
|
||||||
|
border-radius: 999px;
|
||||||
|
border: 1px solid var(--rule-strong);
|
||||||
|
color: var(--ink-soft);
|
||||||
|
background: var(--card);
|
||||||
|
}
|
||||||
|
.conn-pill .dot {
|
||||||
|
width: 7px; height: 7px; border-radius: 50%;
|
||||||
|
background: var(--idle);
|
||||||
|
}
|
||||||
|
.conn-pill[data-state="connected"] { color: var(--ok); border-color: #bfe3c6; background: var(--ok-bg); }
|
||||||
|
.conn-pill[data-state="connected"] .dot { background: var(--ok); }
|
||||||
|
.conn-pill[data-state="connecting"] { color: var(--warn); border-color: #f0d9ab; background: var(--warn-bg); }
|
||||||
|
.conn-pill[data-state="connecting"] .dot { background: var(--warn); animation: pulse 1.1s ease-in-out infinite; }
|
||||||
|
.conn-pill[data-state="disconnected"] { color: var(--bad); border-color: #f0c0c0; background: var(--bad-bg); }
|
||||||
|
.conn-pill[data-state="disconnected"] .dot { background: var(--bad); }
|
||||||
|
|
||||||
|
@keyframes pulse { 0%,100% { opacity: 1; } 50% { opacity: 0.25; } }
|
||||||
|
|
||||||
|
/* ── Status text helpers ─────────────────────────────────────────────────── */
|
||||||
|
.s-ok { color: var(--ok); }
|
||||||
|
.s-warn { color: var(--warn); }
|
||||||
|
.s-bad { color: var(--bad); }
|
||||||
|
.s-idle { color: var(--idle); }
|
||||||
|
|
||||||
|
/* State chip — used for listener state and health badges */
|
||||||
|
.chip {
|
||||||
|
display: inline-block;
|
||||||
|
font-size: 0.72rem;
|
||||||
|
font-weight: 600;
|
||||||
|
text-transform: uppercase;
|
||||||
|
letter-spacing: 0.05em;
|
||||||
|
padding: 0.15rem 0.5rem;
|
||||||
|
border-radius: 4px;
|
||||||
|
border: 1px solid transparent;
|
||||||
|
}
|
||||||
|
.chip-ok { color: var(--ok); background: var(--ok-bg); border-color: #c6e6cd; }
|
||||||
|
.chip-warn { color: #b56a00; background: var(--warn-bg); border-color: #efd6a6; }
|
||||||
|
.chip-bad { color: var(--bad); background: var(--bad-bg); border-color: #eec3c3; }
|
||||||
|
.chip-idle { color: var(--ink-soft); background: var(--idle-bg); border-color: var(--rule-strong); }
|
||||||
|
|
||||||
|
/* ── Cards ───────────────────────────────────────────────────────────────── */
|
||||||
|
.panel {
|
||||||
|
background: var(--card);
|
||||||
|
border: 1px solid var(--rule);
|
||||||
|
border-radius: 8px;
|
||||||
|
}
|
||||||
|
.panel-head {
|
||||||
|
font-size: 0.74rem;
|
||||||
|
font-weight: 600;
|
||||||
|
text-transform: uppercase;
|
||||||
|
letter-spacing: 0.07em;
|
||||||
|
color: var(--ink-faint);
|
||||||
|
padding: 0.6rem 0.9rem;
|
||||||
|
border-bottom: 1px solid var(--rule);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Generic page padding */
|
||||||
|
.page { padding: 1.25rem; max-width: 1680px; margin: 0 auto; }
|
||||||
|
|
||||||
|
/* Staggered panel reveal on first paint */
|
||||||
|
@keyframes rise { from { opacity: 0; transform: translateY(6px); } to { opacity: 1; transform: none; } }
|
||||||
|
.rise { animation: rise 0.4s ease both; }
|
||||||
@@ -0,0 +1,20 @@
|
|||||||
|
/* ============================================================================
|
||||||
|
Shared helpers for the admin dashboard pages. Loaded before dashboard.js /
|
||||||
|
detail.js; each page's IIFE pulls these off window.mbproxyUtil so the HTML-
|
||||||
|
escaping logic has exactly one definition.
|
||||||
|
========================================================================= */
|
||||||
|
'use strict';
|
||||||
|
|
||||||
|
(function () {
|
||||||
|
/** Escapes the three HTML-significant characters for safe text-node insertion. */
|
||||||
|
function escapeHtml(s) {
|
||||||
|
return String(s).replace(/[&<>]/g, c => ({ '&': '&amp;', '<': '&lt;', '>': '&gt;' }[c]));
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Escapes a value for use inside a double-quoted HTML attribute. */
|
||||||
|
function escapeAttr(s) {
|
||||||
|
return escapeHtml(s).replace(/"/g, '&quot;');
|
||||||
|
}
|
||||||
|
|
||||||
|
window.mbproxyUtil = { escapeHtml, escapeAttr };
|
||||||
|
})();
|
||||||
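A minimal sketch of how the two helpers might be used when a page builds markup as strings. The helpers are restated here so the snippet runs on its own, on the assumption that they map each character to its HTML entity. `renderNameCell` is a hypothetical caller, not a function from dashboard.js or detail.js.

```javascript
'use strict';

// Restated for a standalone example: escape &, < and > for text content.
function escapeHtml(s) {
  return String(s).replace(/[&<>]/g, c => ({ '&': '&amp;', '<': '&lt;', '>': '&gt;' }[c]));
}

// Restated for a standalone example: additionally escape the double quote,
// so the value is safe inside a double-quoted attribute.
function escapeAttr(s) {
  return escapeHtml(s).replace(/"/g, '&quot;');
}

// Hypothetical caller: the same untrusted string is escaped differently
// depending on whether it lands in an attribute or in a text node.
function renderNameCell(name) {
  return `<td title="${escapeAttr(name)}">${escapeHtml(name)}</td>`;
}

console.log(renderNameCell('line-1 "north" <A&B>'));
```

The attribute position needs `escapeAttr` because a raw double quote would end the attribute early; in text content the quote is harmless, so `escapeHtml` leaves it alone.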
@@ -8,14 +8,17 @@ namespace Mbproxy.Bcd;
|
|||||||
/// milliseconds. 0 (the default) means caching is disabled for this tag. Positive values
|
/// milliseconds. 0 (the default) means caching is disabled for this tag. Positive values
|
||||||
/// cap upstream staleness; the multi-tag-range read uses <c>min(TTLs)</c> across all
|
/// cap upstream staleness; the multi-tag-range read uses <c>min(TTLs)</c> across all
|
||||||
/// matched tags and treats any 0 in the range as "uncached for the whole read."</para>
|
/// matched tags and treats any 0 in the range as "uncached for the whole read."</para>
|
||||||
|
///
|
||||||
|
/// <para><b><see cref="Name"/></b> is an optional human-friendly label carried through
|
||||||
|
/// to the connection-detail debug view. It has no effect on Modbus rewriting.</para>
|
||||||
/// </summary>
|
/// </summary>
|
||||||
public sealed record BcdTag(ushort Address, byte Width, int CacheTtlMs = 0)
|
public sealed record BcdTag(ushort Address, byte Width, int CacheTtlMs = 0, string? Name = null)
|
||||||
{
|
{
|
||||||
/// <summary>
|
/// <summary>
|
||||||
/// Creates a <see cref="BcdTag"/> and validates that Width is 16 or 32.
|
/// Creates a <see cref="BcdTag"/> and validates that Width is 16 or 32.
|
||||||
/// </summary>
|
/// </summary>
|
||||||
/// <exception cref="ArgumentException">Width is not 16 or 32.</exception>
|
/// <exception cref="ArgumentException">Width is not 16 or 32.</exception>
|
||||||
public static BcdTag Create(ushort address, byte width, int cacheTtlMs = 0)
|
public static BcdTag Create(ushort address, byte width, int cacheTtlMs = 0, string? name = null)
|
||||||
{
|
{
|
||||||
if (width != 16 && width != 32)
|
if (width != 16 && width != 32)
|
||||||
throw new ArgumentException(
|
throw new ArgumentException(
|
||||||
@@ -27,7 +30,7 @@ public sealed record BcdTag(ushort Address, byte Width, int CacheTtlMs = 0)
|
|||||||
$"BCD tag CacheTtlMs must be >= 0; got {cacheTtlMs} at address {address}.",
|
$"BCD tag CacheTtlMs must be >= 0; got {cacheTtlMs} at address {address}.",
|
||||||
nameof(cacheTtlMs));
|
nameof(cacheTtlMs));
|
||||||
|
|
||||||
return new BcdTag(address, width, cacheTtlMs);
|
return new BcdTag(address, width, cacheTtlMs, name);
|
||||||
}
|
}
|
||||||
|
|
||||||
/// <summary>True when this tag occupies two registers (32-bit BCD).</summary>
|
/// <summary>True when this tag occupies two registers (32-bit BCD).</summary>
|
||||||
|
|||||||
@@ -122,7 +122,7 @@ public static class BcdTagMapBuilder
|
|||||||
int resolvedTtl = opt.CacheTtlMs ?? perPlcDefaultCacheTtlMs;
|
int resolvedTtl = opt.CacheTtlMs ?? perPlcDefaultCacheTtlMs;
|
||||||
if (resolvedTtl < 0) resolvedTtl = 0;
|
if (resolvedTtl < 0) resolvedTtl = 0;
|
||||||
|
|
||||||
validated[addr] = BcdTag.Create(addr, opt.Width, resolvedTtl);
|
validated[addr] = BcdTag.Create(addr, opt.Width, resolvedTtl, opt.Name);
|
||||||
}
|
}
|
||||||
|
|
||||||
// High-register collision check (only meaningful for 32-bit entries).
|
// High-register collision check (only meaningful for 32-bit entries).
|
||||||
@@ -130,6 +130,7 @@ public static class BcdTagMapBuilder
|
|||||||
// and (101,W=32)) would otherwise produce two BcdErrors — one from each
|
// and (101,W=32)) would otherwise produce two BcdErrors — one from each
|
||||||
// direction. Track reported (low,high) pairs so each collision logs once.
|
// direction. Track reported (low,high) pairs so each collision logs once.
|
||||||
var reportedCollisions = new HashSet<(ushort, ushort)>();
|
var reportedCollisions = new HashSet<(ushort, ushort)>();
|
||||||
|
var collidingAddresses = new HashSet<ushort>();
|
||||||
foreach (var tag in validated.Values)
|
foreach (var tag in validated.Values)
|
||||||
{
|
{
|
||||||
if (!tag.IsThirtyTwoBit)
|
if (!tag.IsThirtyTwoBit)
|
||||||
@@ -138,6 +139,10 @@ public static class BcdTagMapBuilder
|
|||||||
ushort highReg = tag.HighRegister;
|
ushort highReg = tag.HighRegister;
|
||||||
if (validated.TryGetValue(highReg, out var collision))
|
if (validated.TryGetValue(highReg, out var collision))
|
||||||
{
|
{
|
||||||
|
// Both entries of the colliding pair are unsafe to keep in the map.
|
||||||
|
collidingAddresses.Add(tag.Address);
|
||||||
|
collidingAddresses.Add(collision.Address);
|
||||||
|
|
||||||
// Canonicalise the pair so (a,b) and (b,a) collapse.
|
// Canonicalise the pair so (a,b) and (b,a) collapse.
|
||||||
var pair = tag.Address < collision.Address
|
var pair = tag.Address < collision.Address
|
||||||
? (tag.Address, collision.Address)
|
? (tag.Address, collision.Address)
|
||||||
@@ -152,12 +157,16 @@ public static class BcdTagMapBuilder
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// ── Step 5: build the frozen map from entries that have no errors ─────
|
// ── Step 5: build the frozen map from entries that passed validation ──
|
||||||
// Entries implicated in an OverlappingHighRegister error are still included
|
// Exclude entries implicated in an OverlappingHighRegister collision — consistent
|
||||||
// in the map so that the caller can see all context; the error list tells them
|
// with how InvalidWidth/DuplicateAddress entries are already dropped above — so a
|
||||||
// the config is invalid and must be corrected before the service is safe to run.
|
// map returned alongside a non-empty Errors list is always safe for a caller to
|
||||||
// (If callers want to exclude bad entries they should check Errors.Count > 0
|
// use even if it forgets to check Errors.Count (review ProxyAndBcd N7). Callers
|
||||||
// and refuse to start the listener for that PLC.)
|
// SHOULD still treat Errors.Count > 0 as a fatal config problem and refuse to
|
||||||
|
// start the listener; this is defence-in-depth, not a substitute for that check.
|
||||||
|
foreach (var addr in collidingAddresses)
|
||||||
|
validated.Remove(addr);
|
||||||
|
|
||||||
var frozen = validated.ToFrozenDictionary();
|
var frozen = validated.ToFrozenDictionary();
|
||||||
var map = frozen.Count > 0 ? new BcdTagMap(frozen) : BcdTagMap.Empty;
|
var map = frozen.Count > 0 ? new BcdTagMap(frozen) : BcdTagMap.Empty;
|
||||||
|
|
||||||
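The Step 5 change above drops both members of an overlapping pair before the map is frozen. The following is a JavaScript sketch of that idea, not the C# implementation. It assumes a 32-bit tag's high register is `address + 1`, and `dropCollidingTags` is a hypothetical name.

```javascript
'use strict';

// validated: Map<number, { address: number, width: 16 | 32 }>
// Returns a copy of the map without any tag implicated in a high-register
// collision, plus the set of addresses that were excluded.
function dropCollidingTags(validated) {
  const colliding = new Set();
  for (const tag of validated.values()) {
    if (tag.width !== 32) continue;        // only 32-bit tags span two registers
    const highReg = tag.address + 1;       // assumption: high register follows the low one
    if (validated.has(highReg)) {
      colliding.add(tag.address);          // the 32-bit tag itself
      colliding.add(highReg);              // the tag sitting on its high register
    }
  }
  const safe = new Map(validated);
  for (const addr of colliding) safe.delete(addr);
  return { safe, colliding };
}

const tags = new Map([
  [100, { address: 100, width: 32 }],
  [101, { address: 101, width: 16 }],      // overlaps the high register of 100
  [200, { address: 200, width: 16 }],
  [300, { address: 300, width: 32 }],
]);
const { safe, colliding } = dropCollidingTags(tags);
console.log([...safe.keys()], [...colliding]);
```

Collecting the addresses first and deleting afterwards keeps the scan independent of iteration order, so a pair is excluded regardless of which member is visited first.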
|
|||||||
@@ -46,6 +46,7 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
private readonly ILoggerFactory _loggerFactory;
|
private readonly ILoggerFactory _loggerFactory;
|
||||||
private readonly ILogger<ConfigReconciler> _logger;
|
private readonly ILogger<ConfigReconciler> _logger;
|
||||||
private readonly ServiceCounters _serviceCounters;
|
private readonly ServiceCounters _serviceCounters;
|
||||||
|
private readonly Proxy.TagCaptureRegistry _captureRegistry;
|
||||||
|
|
||||||
// The supervisor dictionary is set by ProxyWorker after initial startup.
|
// The supervisor dictionary is set by ProxyWorker after initial startup.
|
||||||
// ConcurrentDictionary so the per-PLC Add/Remove/Restart task continuations inside
|
// ConcurrentDictionary so the per-PLC Add/Remove/Restart task continuations inside
|
||||||
@@ -65,6 +66,16 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
// restarted via hot-reload honour the current `Connection.Keepalive` values.
|
// restarted via hot-reload honour the current `Connection.Keepalive` values.
|
||||||
private Func<KeepaliveOptions>? _keepaliveAccessor;
|
private Func<KeepaliveOptions>? _keepaliveAccessor;
|
||||||
|
|
||||||
|
// Resilience options frozen at startup. The reconciler builds the backend-connect and
|
||||||
|
// listener-recovery Polly pipelines from THIS snapshot, never from the reloaded
|
||||||
|
// `next.Resilience` — so an added/restarted PLC uses the same retry/recovery profile
|
||||||
|
// as the PLCs that were never touched. `Resilience.BackendConnect` /
|
||||||
|
// `Resilience.ListenerRecovery` are therefore restart-only (documented in
|
||||||
|
// docs/Features/HotReload.md); this avoids the inconsistency where the same key
|
||||||
|
// behaved differently per PLC depending on whether it was added in the reload
|
||||||
|
// (review ConfigAndHosting M3).
|
||||||
|
private ResilienceOptions? _startupResilience;
|
||||||
|
|
||||||
// ── Debounce + serialisation machinery ───────────────────────────────────────────────
|
// ── Debounce + serialisation machinery ───────────────────────────────────────────────
|
||||||
|
|
||||||
// Channel carries Unit to signal "something changed — please check".
|
// Channel carries Unit to signal "something changed — please check".
|
||||||
@@ -91,12 +102,14 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
public ConfigReconciler(
|
public ConfigReconciler(
|
||||||
IOptionsMonitor<MbproxyOptions> monitor,
|
IOptionsMonitor<MbproxyOptions> monitor,
|
||||||
ILoggerFactory loggerFactory,
|
ILoggerFactory loggerFactory,
|
||||||
ServiceCounters serviceCounters)
|
ServiceCounters serviceCounters,
|
||||||
|
Proxy.TagCaptureRegistry captureRegistry)
|
||||||
{
|
{
|
||||||
_monitor = monitor;
|
_monitor = monitor;
|
||||||
_loggerFactory = loggerFactory;
|
_loggerFactory = loggerFactory;
|
||||||
_logger = loggerFactory.CreateLogger<ConfigReconciler>();
|
_logger = loggerFactory.CreateLogger<ConfigReconciler>();
|
||||||
_serviceCounters = serviceCounters;
|
_serviceCounters = serviceCounters;
|
||||||
|
_captureRegistry = captureRegistry;
|
||||||
|
|
||||||
// Subscribe to OnChange. The callback must return immediately — enqueue only.
|
// Subscribe to OnChange. The callback must return immediately — enqueue only.
|
||||||
_changeRegistration = _monitor.OnChange((_, _) =>
|
_changeRegistration = _monitor.OnChange((_, _) =>
|
||||||
@@ -132,6 +145,7 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
_currentOptions = initialOptions;
|
_currentOptions = initialOptions;
|
||||||
_coalescingAccessor = coalescingAccessor;
|
_coalescingAccessor = coalescingAccessor;
|
||||||
_keepaliveAccessor = keepaliveAccessor;
|
_keepaliveAccessor = keepaliveAccessor;
|
||||||
|
_startupResilience = initialOptions.Resilience;
|
||||||
}
|
}
|
||||||
|
|
||||||
// ── ApplyAsync (exposed for tests) ───────────────────────────────────────────────────
|
// ── ApplyAsync (exposed for tests) ───────────────────────────────────────────────────
|
||||||
@@ -245,6 +259,12 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
// Compute global tag delta (count of entries that differ).
|
// Compute global tag delta (count of entries that differ).
|
||||||
int globalTagDelta = ComputeGlobalTagDelta(_currentOptions.BcdTags, next.BcdTags);
|
int globalTagDelta = ComputeGlobalTagDelta(_currentOptions.BcdTags, next.BcdTags);
|
||||||
|
|
||||||
|
// Counts per-step failures across the parallel apply tasks so step 7 can report a
|
||||||
|
// reload that was applied only partially, instead of logging an unqualified
|
||||||
|
// "applied" (review ConfigAndHosting N1). Incremented via Interlocked because the
|
||||||
|
// Remove/Restart/Add tasks run concurrently under Task.WhenAll.
|
||||||
|
int stepFailures = 0;
|
||||||
|
|
||||||
// ── 3. Apply: Remove ─────────────────────────────────────────────────
|
// ── 3. Apply: Remove ─────────────────────────────────────────────────
|
||||||
if (plan.ToRemove.Count > 0)
|
if (plan.ToRemove.Count > 0)
|
||||||
{
|
{
|
||||||
@@ -253,6 +273,7 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
{
|
{
|
||||||
if (!_supervisors.TryRemove(name, out var s))
|
if (!_supervisors.TryRemove(name, out var s))
|
||||||
return;
|
return;
|
||||||
|
_captureRegistry.Remove(name);
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
using var stopCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
|
using var stopCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
|
||||||
@@ -262,6 +283,7 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
}
|
}
|
||||||
catch (Exception ex)
|
catch (Exception ex)
|
||||||
{
|
{
|
||||||
|
Interlocked.Increment(ref stepFailures);
|
||||||
_logger.LogError(ex, "Error stopping supervisor for removed PLC '{Plc}': {Message}",
|
_logger.LogError(ex, "Error stopping supervisor for removed PLC '{Plc}': {Message}",
|
||||||
name, ex.Message);
|
name, ex.Message);
|
||||||
}
|
}
|
||||||
@@ -274,7 +296,9 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
// ── 4. Apply: Restart (stop + rebuild + start) ───────────────────────
|
// ── 4. Apply: Restart (stop + rebuild + start) ───────────────────────
|
||||||
if (plan.ToRestart.Count > 0)
|
if (plan.ToRestart.Count > 0)
|
||||||
{
|
{
|
||||||
var resilienceOpts = next.Resilience;
|
// Pipelines built from the FROZEN startup Resilience snapshot, not
|
||||||
|
// next.Resilience — see _startupResilience (review ConfigAndHosting M3).
|
||||||
|
var resilienceOpts = _startupResilience ?? next.Resilience;
|
||||||
var backendPipeline = PolicyFactory.BuildBackendConnect(
|
var backendPipeline = PolicyFactory.BuildBackendConnect(
|
||||||
resilienceOpts.BackendConnect,
|
resilienceOpts.BackendConnect,
|
||||||
_loggerFactory.CreateLogger("Mbproxy.Proxy.BackendConnect"));
|
_loggerFactory.CreateLogger("Mbproxy.Proxy.BackendConnect"));
|
||||||
@@ -284,17 +308,11 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
var (name, plcNew) = entry;
|
var (name, plcNew) = entry;
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
// Stop old supervisor.
|
// Build the new context + supervisor FIRST, while the old supervisor
|
||||||
if (_supervisors.TryRemove(name, out var old))
|
// is still in the dictionary and running. If any construction step
|
||||||
{
|
// throws (BcdTagMapBuilder, PolicyFactory, ctor), the catch leaves the
|
||||||
using var stopCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
|
// old supervisor in place — a transient rebuild fault no longer drops
|
||||||
stopCts.CancelAfter(TimeSpan.FromSeconds(10));
|
// the PLC from service permanently (review ConfigAndHosting M2).
|
||||||
await old.StopAsync(stopCts.Token).ConfigureAwait(false);
|
|
||||||
await old.DisposeAsync().ConfigureAwait(false);
|
|
||||||
}
|
|
||||||
|
|
||||||
// Build fresh context. Pass DefaultCacheTtlMs so per-PLC default
|
|
||||||
// caching folds into the resolved tag map.
|
|
||||||
var result = BcdTagMapBuilder.Build(next.BcdTags, plcNew.BcdTags, plcNew.DefaultCacheTtlMs);
|
var result = BcdTagMapBuilder.Build(next.BcdTags, plcNew.BcdTags, plcNew.DefaultCacheTtlMs);
|
||||||
var newCtx = new PerPlcContext
|
var newCtx = new PerPlcContext
|
||||||
{
|
{
|
||||||
@@ -303,9 +321,9 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
Counters = new Proxy.ProxyCounters(),
|
Counters = new Proxy.ProxyCounters(),
|
||||||
Logger = _loggerFactory.CreateLogger($"Mbproxy.Proxy.BcdRewriter.{plcNew.Name}"),
|
Logger = _loggerFactory.CreateLogger($"Mbproxy.Proxy.BcdRewriter.{plcNew.Name}"),
|
||||||
Cache = BuildCacheIfNeeded(result.Map, next.Cache),
|
Cache = BuildCacheIfNeeded(result.Map, next.Cache),
|
||||||
|
Capture = _captureRegistry.GetOrCreate(plcNew.Name, result.Map),
|
||||||
};
|
};
|
||||||
|
|
||||||
// Build and start new supervisor.
|
|
||||||
var recoveryPipeline = PolicyFactory.BuildListenerRecovery(
|
var recoveryPipeline = PolicyFactory.BuildListenerRecovery(
|
||||||
resilienceOpts.ListenerRecovery,
|
resilienceOpts.ListenerRecovery,
|
||||||
_loggerFactory.CreateLogger($"Mbproxy.Proxy.ListenerRecovery.{plcNew.Name}"));
|
_loggerFactory.CreateLogger($"Mbproxy.Proxy.ListenerRecovery.{plcNew.Name}"));
|
||||||
@@ -324,11 +342,22 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
_coalescingAccessor,
|
_coalescingAccessor,
|
||||||
_keepaliveAccessor);
|
_keepaliveAccessor);
|
||||||
|
|
||||||
|
// Construction succeeded — now swap. Stop the old supervisor, then
|
||||||
|
// publish and start the new one.
|
||||||
|
if (_supervisors.TryRemove(name, out var old))
|
||||||
|
{
|
||||||
|
using var stopCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
|
||||||
|
stopCts.CancelAfter(TimeSpan.FromSeconds(10));
|
||||||
|
await old.StopAsync(stopCts.Token).ConfigureAwait(false);
|
||||||
|
await old.DisposeAsync().ConfigureAwait(false);
|
||||||
|
}
|
||||||
|
|
||||||
_supervisors[name] = newSupervisor;
|
_supervisors[name] = newSupervisor;
|
||||||
await newSupervisor.StartAsync(ct).ConfigureAwait(false);
|
await newSupervisor.StartAsync(ct).ConfigureAwait(false);
|
||||||
}
|
}
|
||||||
catch (Exception ex)
|
catch (Exception ex)
|
||||||
{
|
{
|
||||||
|
Interlocked.Increment(ref stepFailures);
|
||||||
_logger.LogError(ex, "Error restarting supervisor for PLC '{Plc}': {Message}",
|
_logger.LogError(ex, "Error restarting supervisor for PLC '{Plc}': {Message}",
|
||||||
name, ex.Message);
|
name, ex.Message);
|
||||||
}
|
}
|
||||||
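Reviewer note on the restart hunk above: the reordering amounts to "build first, swap second". The following is a minimal sketch of that ordering only, not the project's code. The string-valued registry and the `TryRestart` helper are hypothetical stand-ins for `_supervisors` and the per-PLC restart lambda.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the supervisor registry: PLC name -> instance label.
var supervisors = new ConcurrentDictionary<string, string>();
supervisors["plc1"] = "old";

// Build the replacement first; touch the registry only once construction succeeded.
bool TryRestart(string name, Func<string> build)
{
    try
    {
        string replacement = build();          // may throw; the old entry is still registered
        supervisors.TryRemove(name, out _);    // the real code stops and disposes the old supervisor here
        supervisors[name] = replacement;       // publish the new instance
        return true;
    }
    catch (Exception)
    {
        return false;                          // construction failed: old instance stays in service
    }
}

bool failedAttempt = TryRestart("plc1", () => throw new InvalidOperationException("bad tag map"));
string afterFailure = supervisors["plc1"];
bool goodAttempt = TryRestart("plc1", () => "new");
string afterSuccess = supervisors["plc1"];
Console.WriteLine($"{failedAttempt} {afterFailure} {goodAttempt} {afterSuccess}");
```

With the old ordering (remove, then build), the failed attempt would have left `plc1` absent from the registry, which is the permanent drop the hunk's comment describes.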
@@ -345,7 +374,13 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
|
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
var plcNew = next.Plcs.First(p => p.Name == name);
|
// ReloadPlan.Compute guarantees `name` is present in next.Plcs; the
|
||||||
|
// explicit null-guard makes that invariant a clean skip rather than an
|
||||||
|
// InvalidOperationException mislabelled as a reseat error (review N2).
|
||||||
|
var plcNew = next.Plcs.FirstOrDefault(p => p.Name == name);
|
||||||
|
if (plcNew is null)
|
||||||
|
continue;
|
||||||
|
|
||||||
var newCtx = new PerPlcContext
|
var newCtx = new PerPlcContext
|
||||||
{
|
{
|
||||||
PlcName = name,
|
PlcName = name,
|
||||||
@@ -356,6 +391,10 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
// Any reseat (tag-map change) constructs a fresh cache. The
|
// Any reseat (tag-map change) constructs a fresh cache. The
|
||||||
// supervisor disposes the old one inside ReplaceContextAsync.
|
// supervisor disposes the old one inside ReplaceContextAsync.
|
||||||
Cache = BuildCacheIfNeeded(newMap, next.Cache),
|
Cache = BuildCacheIfNeeded(newMap, next.Cache),
|
||||||
|
// Rebuild the capture for the new tag set. The rebuilt capture is
|
||||||
|
// disarmed; StatusBroadcaster re-arms it within one push cycle if the
|
||||||
|
// PLC still has a detail-page viewer.
|
||||||
|
Capture = _captureRegistry.GetOrCreate(name, newMap),
|
||||||
};
|
};
|
||||||
|
|
||||||
using var reseatCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
|
using var reseatCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
|
||||||
@@ -364,6 +403,7 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
}
|
}
|
||||||
catch (Exception ex)
|
catch (Exception ex)
|
||||||
{
|
{
|
||||||
|
Interlocked.Increment(ref stepFailures);
|
||||||
_logger.LogError(ex, "Error reseating context for PLC '{Plc}': {Message}",
|
_logger.LogError(ex, "Error reseating context for PLC '{Plc}': {Message}",
|
||||||
name, ex.Message);
|
name, ex.Message);
|
||||||
}
|
}
|
||||||
@@ -372,7 +412,8 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
// ── 6. Apply: Add new PLCs ────────────────────────────────────────────
|
// ── 6. Apply: Add new PLCs ────────────────────────────────────────────
|
||||||
if (plan.ToAdd.Count > 0)
|
if (plan.ToAdd.Count > 0)
|
||||||
{
|
{
|
||||||
var resilienceOpts = next.Resilience;
|
// Frozen startup Resilience snapshot, not next.Resilience (M3).
|
||||||
|
var resilienceOpts = _startupResilience ?? next.Resilience;
|
||||||
var backendPipeline = PolicyFactory.BuildBackendConnect(
|
var backendPipeline = PolicyFactory.BuildBackendConnect(
|
||||||
resilienceOpts.BackendConnect,
|
resilienceOpts.BackendConnect,
|
||||||
_loggerFactory.CreateLogger("Mbproxy.Proxy.BackendConnect"));
|
_loggerFactory.CreateLogger("Mbproxy.Proxy.BackendConnect"));
|
||||||
@@ -391,6 +432,7 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
Counters = new Proxy.ProxyCounters(),
|
Counters = new Proxy.ProxyCounters(),
|
||||||
Logger = _loggerFactory.CreateLogger($"Mbproxy.Proxy.BcdRewriter.{plcNew.Name}"),
|
Logger = _loggerFactory.CreateLogger($"Mbproxy.Proxy.BcdRewriter.{plcNew.Name}"),
|
||||||
Cache = BuildCacheIfNeeded(result.Map, next.Cache),
|
Cache = BuildCacheIfNeeded(result.Map, next.Cache),
|
||||||
|
Capture = _captureRegistry.GetOrCreate(plcNew.Name, result.Map),
|
||||||
};
|
};
|
||||||
|
|
||||||
var recoveryPipeline = PolicyFactory.BuildListenerRecovery(
|
var recoveryPipeline = PolicyFactory.BuildListenerRecovery(
|
||||||
@@ -416,6 +458,7 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
}
|
}
|
||||||
catch (Exception ex)
|
catch (Exception ex)
|
||||||
{
|
{
|
||||||
|
Interlocked.Increment(ref stepFailures);
|
||||||
_logger.LogError(ex, "Error adding supervisor for PLC '{Plc}': {Message}",
|
_logger.LogError(ex, "Error adding supervisor for PLC '{Plc}': {Message}",
|
||||||
plcNew.Name, ex.Message);
|
plcNew.Name, ex.Message);
|
||||||
}
|
}
|
||||||
@@ -429,6 +472,12 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
var appliedAt = DateTimeOffset.UtcNow;
|
var appliedAt = DateTimeOffset.UtcNow;
|
||||||
_serviceCounters.RecordReloadApplied(appliedAt);
|
_serviceCounters.RecordReloadApplied(appliedAt);
|
||||||
|
|
||||||
|
if (stepFailures > 0)
|
||||||
|
_logger.LogWarning(
|
||||||
|
"Config reload applied with {Failures} step failure(s) — see the preceding " +
|
||||||
|
"error log lines; the affected PLC(s) may not be in their intended state.",
|
||||||
|
stepFailures);
|
||||||
|
|
||||||
LogReloadApplied(_logger, plcsAdded, plcsRemoved, plcsRestarted, plcsReseated, globalTagDelta);
|
LogReloadApplied(_logger, plcsAdded, plcsRemoved, plcsRestarted, plcsReseated, globalTagDelta);
|
||||||
|
|
||||||
return true;
|
return true;
|
||||||
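Reviewer note on the `stepFailures` counter: because the apply steps run concurrently under `Task.WhenAll`, a plain `stepFailures++` could lose updates. A minimal sketch of the counting pattern follows. The `ApplyStepAsync` helper and its "odd steps fail" rule are invented for illustration and do not appear in the diff.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int stepFailures = 0;

// Hypothetical apply step: odd-numbered steps fail, and the failure is counted, not rethrown.
async Task ApplyStepAsync(int i)
{
    try
    {
        await Task.Yield();   // force the remainder onto the thread pool
        if (i % 2 == 1)
            throw new InvalidOperationException($"step {i} failed");
    }
    catch (Exception)
    {
        Interlocked.Increment(ref stepFailures);   // atomic, safe across concurrent steps
    }
}

await Task.WhenAll(Enumerable.Range(0, 10).Select(i => ApplyStepAsync(i)));

if (stepFailures > 0)
    Console.WriteLine($"applied with {stepFailures} step failure(s)");
```

Reading the counter after `Task.WhenAll` completes needs no extra synchronisation, since every step has finished by then.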
@@ -452,9 +501,18 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
private static int ComputeGlobalTagDelta(BcdTagListOptions before, BcdTagListOptions after)
|
private static int ComputeGlobalTagDelta(BcdTagListOptions before, BcdTagListOptions after)
|
||||||
{
|
{
|
||||||
// Count entries in before but not in after (removed), plus entries in after
|
// Count entries in before but not in after (removed), plus entries in after
|
||||||
// but not in before (added), plus entries with the same address but different width.
|
// but not in before (added), plus entries with the same address but different
|
||||||
var beforeDict = before.Global.ToDictionary(t => t.Address);
|
// width. Last-wins indexing (not ToDictionary, which throws on a duplicate
|
||||||
var afterDict = after.Global.ToDictionary(t => t.Address);
|
// address) keeps this purely-cosmetic delta from ever aborting a reload — a
|
||||||
|
// duplicate-address config is already rejected by ReloadValidator (review N3).
|
||||||
|
static Dictionary<ushort, BcdTagOptions> Index(BcdTagListOptions o)
|
||||||
|
{
|
||||||
|
var d = new Dictionary<ushort, BcdTagOptions>(o.Global.Count);
|
||||||
|
foreach (var t in o.Global) d[t.Address] = t;
|
||||||
|
return d;
|
||||||
|
}
|
||||||
|
var beforeDict = Index(before);
|
||||||
|
var afterDict = Index(after);
|
||||||
|
|
||||||
int delta = 0;
|
int delta = 0;
|
||||||
foreach (var addr in beforeDict.Keys.Union(afterDict.Keys).Distinct())
|
foreach (var addr in beforeDict.Keys.Union(afterDict.Keys).Distinct())
|
||||||
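Reviewer note on the `Index` helper in `ComputeGlobalTagDelta`: the difference between `ToDictionary` and indexer assignment on a duplicate key can be shown in isolation. This is a small sketch with made-up tag data; the tuple shape is an assumption and is not the real `BcdTagOptions` type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tag list in which address 100 appears twice.
var tags = new List<(ushort Address, int Width)> { (100, 16), (200, 32), (100, 32) };

bool toDictionaryThrew = false;
try
{
    _ = tags.ToDictionary(t => t.Address);   // throws ArgumentException on the second 100
}
catch (ArgumentException)
{
    toDictionaryThrew = true;
}

// Last-wins indexing: the indexer setter overwrites an existing key instead of throwing.
var index = new Dictionary<ushort, int>(tags.Count);
foreach (var t in tags)
    index[t.Address] = t.Width;

Console.WriteLine($"{toDictionaryThrew} {index.Count} {index[100]}");
```

Since the delta is only reported in the log line, tolerating a duplicate here is a reasonable trade: rejection of such a config remains the job of `ReloadValidator`, as the hunk's comment says.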
@@ -480,6 +538,12 @@ internal sealed partial class ConfigReconciler : IDisposable
|
|||||||
|
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
|
// Hard 2 s cap on the synchronous join. The debounce loop observes
|
||||||
|
// cancellation promptly when idle; if an apply is mid-flight its per-step
|
||||||
|
// CTS are linked to _disposalCts and also cancel. The cap exists only to
|
||||||
|
// bound host-shutdown latency if a supervisor StopAsync inside an apply
|
||||||
|
// hangs — in that case the loop task is abandoned (left running) rather
|
||||||
|
// than blocking shutdown indefinitely.
|
||||||
_debounceLoop.Wait(TimeSpan.FromSeconds(2));
|
_debounceLoop.Wait(TimeSpan.FromSeconds(2));
|
||||||
}
|
}
|
||||||
catch
|
catch
|
||||||
|
|||||||
@@ -61,17 +61,33 @@ internal static class ReloadValidator
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ── 2b. Per-PLC backend Host / Port ───────────────────────────────────
|
||||||
|
// A missing Host surfaces only as a runtime connect fault; an out-of-range Port
|
||||||
|
// is never reachable. Reject both at the config gate (review ConfigAndHosting M4).
|
||||||
|
foreach (var plc in next.Plcs)
|
||||||
|
{
|
||||||
|
if (string.IsNullOrWhiteSpace(plc.Host))
|
||||||
|
errs.Add($"Plc '{plc.Name}': Host must be non-empty.");
|
||||||
|
if (plc.Port is < 1 or > 65535)
|
||||||
|
errs.Add($"Plc '{plc.Name}': Port {plc.Port} is out of range [1, 65535].");
|
||||||
|
}
|
||||||
|
|
||||||
// ── 3. AdminPort range and collision ─────────────────────────────────
|
// ── 3. AdminPort range and collision ─────────────────────────────────
|
||||||
|
// AdminPort 0 is valid: it disables the admin endpoint entirely (review
|
||||||
|
// TestsAndConfig M1). The range and collision checks apply only to a real port.
|
||||||
int adminPort = next.AdminPort;
|
int adminPort = next.AdminPort;
|
||||||
if (adminPort is < 1 or > 65535)
|
if (adminPort != 0)
|
||||||
{
|
{
|
||||||
errs.Add($"AdminPort {adminPort} is out of range [1, 65535].");
|
if (adminPort is < 1 or > 65535)
|
||||||
}
|
errs.Add($"AdminPort {adminPort} is out of range [1, 65535] (or 0 to disable).");
|
||||||
else if (seenPorts.TryGetValue(adminPort, out string? clashPlc))
|
else if (seenPorts.TryGetValue(adminPort, out string? clashPlc))
|
||||||
{
|
errs.Add($"AdminPort {adminPort} collides with ListenPort of PLC '{clashPlc}'.");
|
||||||
errs.Add($"AdminPort {adminPort} collides with ListenPort of PLC '{clashPlc}'.");
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (next.AdminPushIntervalMs <= 0 || next.AdminPushIntervalMs > 60_000)
|
||||||
|
errs.Add(
|
||||||
|
$"AdminPushIntervalMs must be between 1 and 60000 ms; got {next.AdminPushIntervalMs}.");
|
||||||
|
|
||||||
// ── 4. Per-PLC tag-map build ──────────────────────────────────────────
|
// ── 4. Per-PLC tag-map build ──────────────────────────────────────────
|
||||||
// BcdTagMapBuilder.Build is the single source of truth for tag-list
|
// BcdTagMapBuilder.Build is the single source of truth for tag-list
|
||||||
// well-formedness; we must not duplicate its validation logic here.
|
// well-formedness; we must not duplicate its validation logic here.
|
||||||
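Reviewer note on the AdminPort rule: the new behaviour is "0 disables the endpoint, anything else must be a valid port that does not collide with a PLC listen port". A minimal standalone sketch of that rule follows. `ValidateAdminPort` is a hypothetical helper written for this note, and the error strings are shortened.

```csharp
using System;
using System.Collections.Generic;

// Returns validation errors for an admin port, given the listen ports already claimed by PLCs.
List<string> ValidateAdminPort(int adminPort, IReadOnlyDictionary<int, string> seenPorts)
{
    var errs = new List<string>();
    if (adminPort == 0)
        return errs;   // 0 disables the admin endpoint, so no further checks apply

    if (adminPort is < 1 or > 65535)
        errs.Add($"AdminPort {adminPort} is out of range [1, 65535] (or 0 to disable).");
    else if (seenPorts.TryGetValue(adminPort, out var clashPlc))
        errs.Add($"AdminPort {adminPort} collides with ListenPort of PLC '{clashPlc}'.");
    return errs;
}

var seen = new Dictionary<int, string> { [502] = "Line1" };
var disabled = ValidateAdminPort(0, seen);
var outOfRange = ValidateAdminPort(70000, seen);
var clash = ValidateAdminPort(502, seen);
var fine = ValidateAdminPort(8080, seen);
Console.WriteLine($"{disabled.Count} {outOfRange.Count} {clash.Count} {fine.Count}");
```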
@@ -126,6 +142,8 @@ internal static class ReloadValidator
|
|||||||
}
|
}
|
||||||
if (next.Cache.MaxEntriesPerPlc < 0)
|
if (next.Cache.MaxEntriesPerPlc < 0)
|
||||||
errs.Add($"Cache.MaxEntriesPerPlc must be >= 0; got {next.Cache.MaxEntriesPerPlc}.");
|
errs.Add($"Cache.MaxEntriesPerPlc must be >= 0; got {next.Cache.MaxEntriesPerPlc}.");
|
||||||
|
else if (next.Cache.MaxEntriesPerPlc > 100_000)
|
||||||
|
errs.Add($"Cache.MaxEntriesPerPlc must be <= 100000; got {next.Cache.MaxEntriesPerPlc}.");
|
||||||
if (next.Cache.EvictionIntervalMs < 0)
|
if (next.Cache.EvictionIntervalMs < 0)
|
||||||
errs.Add($"Cache.EvictionIntervalMs must be >= 0; got {next.Cache.EvictionIntervalMs}.");
|
errs.Add($"Cache.EvictionIntervalMs must be >= 0; got {next.Cache.EvictionIntervalMs}.");
|
||||||
|
|
||||||
@@ -145,22 +163,55 @@ internal static class ReloadValidator
|
|||||||
// Schema bounds are also checked in MbproxyOptionsValidator; re-checking here keeps
|
// Schema bounds are also checked in MbproxyOptionsValidator; re-checking here keeps
|
||||||
// the hot-reload gate self-contained. The cross-field rule (heartbeat interval must
|
// the hot-reload gate self-contained. The cross-field rule (heartbeat interval must
|
||||||
// sit above the request timeout, or it would fire continuously) lives only here.
|
// sit above the request timeout, or it would fire continuously) lives only here.
|
||||||
|
//
|
||||||
|
// The tunable checks are gated on Keepalive.Enabled: when keepalive is off the
|
||||||
|
// probe interval / idle threshold / cross-field relationship are all inert
|
||||||
|
// (SocketKeepalive.Apply and the heartbeat loop both no-op), so a disabled-
|
||||||
|
// keepalive config must not be rejected for the values of knobs it never reads.
|
||||||
var ka = next.Connection.Keepalive;
|
var ka = next.Connection.Keepalive;
|
||||||
if (ka.TcpIdleTimeMs <= 0)
|
if (ka.Enabled)
|
||||||
errs.Add($"Connection.Keepalive.TcpIdleTimeMs must be > 0; got {ka.TcpIdleTimeMs}.");
|
{
|
||||||
if (ka.TcpProbeIntervalMs <= 0)
|
if (ka.TcpIdleTimeMs <= 0)
|
||||||
errs.Add($"Connection.Keepalive.TcpProbeIntervalMs must be > 0; got {ka.TcpProbeIntervalMs}.");
|
errs.Add($"Connection.Keepalive.TcpIdleTimeMs must be > 0; got {ka.TcpIdleTimeMs}.");
|
||||||
if (ka.TcpProbeCount <= 0)
|
if (ka.TcpProbeIntervalMs <= 0)
|
||||||
errs.Add($"Connection.Keepalive.TcpProbeCount must be > 0; got {ka.TcpProbeCount}.");
|
errs.Add($"Connection.Keepalive.TcpProbeIntervalMs must be > 0; got {ka.TcpProbeIntervalMs}.");
|
||||||
if (ka.BackendHeartbeatProbeAddress is < 0 or > 65535)
|
if (ka.TcpProbeCount <= 0)
|
||||||
|
errs.Add($"Connection.Keepalive.TcpProbeCount must be > 0; got {ka.TcpProbeCount}.");
|
||||||
|
if (ka.BackendHeartbeatProbeAddress is < 0 or > 65535)
|
||||||
|
errs.Add(
|
||||||
|
$"Connection.Keepalive.BackendHeartbeatProbeAddress must be in [0, 65535]; " +
|
||||||
|
$"got {ka.BackendHeartbeatProbeAddress}.");
|
||||||
|
if (ka.BackendHeartbeatIdleMs <= next.Connection.BackendRequestTimeoutMs)
|
||||||
|
errs.Add(
|
||||||
|
$"Connection.Keepalive.BackendHeartbeatIdleMs ({ka.BackendHeartbeatIdleMs}) must be greater " +
|
||||||
|
$"than Connection.BackendRequestTimeoutMs ({next.Connection.BackendRequestTimeoutMs}); " +
|
||||||
|
"a heartbeat interval at or below the request timeout would fire continuously.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── 7. Resilience profiles ────────────────────────────────────────────
|
||||||
|
// PolicyFactory clamps these defensively at runtime, but a config whose
|
||||||
|
// BackoffMs is shorter than MaxAttempts-1, or carries negative delays, does not
|
||||||
|
// behave as the operator intends — reject it at the gate (review M5).
|
||||||
|
var bc = next.Resilience.BackendConnect;
|
||||||
|
if (bc.MaxAttempts < 1)
|
||||||
|
errs.Add($"Resilience.BackendConnect.MaxAttempts must be >= 1; got {bc.MaxAttempts}.");
|
||||||
|
if (bc.BackoffMs.Count < bc.MaxAttempts - 1)
|
||||||
errs.Add(
|
errs.Add(
|
||||||
$"Connection.Keepalive.BackendHeartbeatProbeAddress must be in [0, 65535]; " +
|
$"Resilience.BackendConnect.BackoffMs must have at least MaxAttempts-1 " +
|
||||||
$"got {ka.BackendHeartbeatProbeAddress}.");
|
$"({Math.Max(0, bc.MaxAttempts - 1)}) entries; got {bc.BackoffMs.Count}.");
|
||||||
if (ka.BackendHeartbeatIdleMs <= next.Connection.BackendRequestTimeoutMs)
|
if (bc.BackoffMs.Any(ms => ms < 0))
|
||||||
|
errs.Add("Resilience.BackendConnect.BackoffMs entries must all be >= 0.");
|
||||||
|
|
||||||
|
var lr = next.Resilience.ListenerRecovery;
|
||||||
|
if (lr.InitialBackoffMs.Any(ms => ms < 0))
|
||||||
|
errs.Add("Resilience.ListenerRecovery.InitialBackoffMs entries must all be >= 0.");
|
||||||
|
if (lr.SteadyStateMs <= 0)
|
||||||
|
errs.Add($"Resilience.ListenerRecovery.SteadyStateMs must be > 0; got {lr.SteadyStateMs}.");
|
||||||
|
|
||||||
|
if (next.Resilience.ReadCoalescing.MaxParties < 1)
|
||||||
errs.Add(
|
errs.Add(
|
||||||
$"Connection.Keepalive.BackendHeartbeatIdleMs ({ka.BackendHeartbeatIdleMs}) must be greater " +
|
$"Resilience.ReadCoalescing.MaxParties must be >= 1; got " +
|
||||||
$"than Connection.BackendRequestTimeoutMs ({next.Connection.BackendRequestTimeoutMs}); " +
|
$"{next.Resilience.ReadCoalescing.MaxParties}.");
|
||||||
"a heartbeat interval at or below the request timeout would fire continuously.");
|
|
||||||
|
|
||||||
errors = errs;
|
errors = errs;
|
||||||
return errs.Count == 0;
|
return errs.Count == 0;
|
||||||
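Reviewer note on the resilience checks (section 7 above): a profile with `MaxAttempts` attempts needs at least `MaxAttempts - 1` delays, one between each pair of attempts. The sketch below restates the three `BackendConnect` checks as a standalone function. `ValidateBackendConnect` is a hypothetical name, and the parameters are flattened from the options object for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> ValidateBackendConnect(int maxAttempts, IReadOnlyList<int> backoffMs)
{
    var errs = new List<string>();
    if (maxAttempts < 1)
        errs.Add($"MaxAttempts must be >= 1; got {maxAttempts}.");
    if (backoffMs.Count < maxAttempts - 1)
        errs.Add($"BackoffMs must have at least {Math.Max(0, maxAttempts - 1)} entries; got {backoffMs.Count}.");
    if (backoffMs.Any(ms => ms < 0))
        errs.Add("BackoffMs entries must all be >= 0.");
    return errs;
}

var okProfile = ValidateBackendConnect(3, new[] { 100, 500 });   // 3 attempts, 2 delays
var tooShort = ValidateBackendConnect(3, new[] { 100 });         // 3 attempts, 1 delay
var negative = ValidateBackendConnect(2, new[] { -1 });          // enough entries, bad value
Console.WriteLine($"{okProfile.Count} {tooShort.Count} {negative.Count}");
```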
|
|||||||
@@ -59,6 +59,11 @@ internal sealed class EventLogBridge : ILogEventSink
|
|||||||
// pick up; in practice install.ps1 registers it once at install.
|
// pick up; in practice install.ps1 registers it once at install.
|
||||||
private readonly bool _sourceExists;
|
private readonly bool _sourceExists;
|
||||||
|
|
||||||
|
// One-shot guard: 0 until the first Event Log write fails, then 1. Lets Emit drop a
|
||||||
|
// single SelfLog breadcrumb (parity with SyslogBridge's degradation breadcrumb)
|
||||||
|
// without flooding SelfLog on every subsequent Error+ event.
|
||||||
|
private int _writeFailureReported;
|
||||||
|
|
||||||
public EventLogBridge(bool enabled)
|
public EventLogBridge(bool enabled)
|
||||||
{
|
{
|
||||||
_enabled = enabled;
|
_enabled = enabled;
|
||||||
@@ -102,10 +107,14 @@ internal sealed class EventLogBridge : ILogEventSink
|
|||||||
{
|
{
|
||||||
EventLog.WriteEntry(Source, message, type);
|
EventLog.WriteEntry(Source, message, type);
|
||||||
}
|
}
|
||||||
catch
|
catch (Exception ex)
|
||||||
{
|
{
|
||||||
// Swallow: if the Event Log write fails (e.g., source not registered,
|
// Swallow: if the Event Log write fails (e.g., source removed after start,
|
||||||
// quota exceeded) we must not crash the application or recurse.
|
// quota exceeded) we must not crash the application or recurse. Leave one
|
||||||
|
// SelfLog breadcrumb the first time so the degradation is diagnosable.
|
||||||
|
if (Interlocked.Exchange(ref _writeFailureReported, 1) == 0)
|
||||||
|
Serilog.Debugging.SelfLog.WriteLine(
|
||||||
|
"EventLogBridge: Event Log write failed (further failures suppressed): {0}", ex);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
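Reviewer note on `_writeFailureReported`: `Interlocked.Exchange` returns the previous value, so only the first caller ever observes 0, even when failures arrive concurrently. A minimal sketch of the one-shot guard follows. The `OnWriteFailure` helper and the `breadcrumbs` counter are stand-ins for the `catch` block and the `SelfLog.WriteLine` call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int reported = 0;      // 0 until the first failure is reported, then 1
int breadcrumbs = 0;   // counts how many times the "breadcrumb" was actually written

void OnWriteFailure()
{
    // Exchange sets the flag to 1 and hands back the old value atomically,
    // so exactly one caller sees 0 and writes the breadcrumb.
    if (Interlocked.Exchange(ref reported, 1) == 0)
        Interlocked.Increment(ref breadcrumbs);
}

// Simulate many concurrent write failures.
Parallel.For(0, 1000, _ => OnWriteFailure());
Console.WriteLine(breadcrumbs);
```

A plain `bool` check followed by an assignment would allow two threads to both pass the check before either sets the flag; the atomic exchange closes that window.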
|
|||||||
@@ -32,6 +32,10 @@ internal static class HostingExtensions
|
|||||||
// Service-wide counters (read by the status page).
|
// Service-wide counters (read by the status page).
|
||||||
builder.Services.AddSingleton<ServiceCounters>();
|
builder.Services.AddSingleton<ServiceCounters>();
|
||||||
|
|
||||||
|
// Per-PLC tag-value captures backing the connection-detail debug view.
|
||||||
|
// Shared between the proxy hot path and the admin SignalR layer.
|
||||||
|
builder.Services.AddSingleton<Proxy.TagCaptureRegistry>();
|
||||||
|
|
||||||
// Hot-reload reconciler (singleton; subscribes to IOptionsMonitor.OnChange).
|
// Hot-reload reconciler (singleton; subscribes to IOptionsMonitor.OnChange).
|
||||||
builder.Services.AddSingleton<ConfigReconciler>();
|
builder.Services.AddSingleton<ConfigReconciler>();
|
||||||
|
|
||||||
@@ -57,6 +61,8 @@ internal static class HostingExtensions
|
|||||||
{
|
{
|
||||||
builder.Services.AddSingleton<AssemblyVersionAccessor>();
|
builder.Services.AddSingleton<AssemblyVersionAccessor>();
|
||||||
builder.Services.AddSingleton<StatusSnapshotBuilder>();
|
builder.Services.AddSingleton<StatusSnapshotBuilder>();
|
||||||
|
// Tracks detail-page SignalR subscribers; drives on-demand capture arming.
|
||||||
|
builder.Services.AddSingleton<PlcSubscriptionTracker>();
|
||||||
builder.Services.AddSingleton<AdminEndpointHost>();
|
builder.Services.AddSingleton<AdminEndpointHost>();
|
||||||
return builder;
|
return builder;
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -26,6 +26,10 @@
|
|||||||
<PublishSingleFile>true</PublishSingleFile>
|
<PublishSingleFile>true</PublishSingleFile>
|
||||||
<SelfContained>true</SelfContained>
|
<SelfContained>true</SelfContained>
|
||||||
<IncludeNativeLibrariesForSelfExtract>true</IncludeNativeLibrariesForSelfExtract>
|
<IncludeNativeLibrariesForSelfExtract>true</IncludeNativeLibrariesForSelfExtract>
|
||||||
|
<!-- Embed debug symbols into the assembly instead of emitting a loose .pdb, so a
|
||||||
|
RID publish produces a clean folder (single binary + appsettings.json). Symbols
|
||||||
|
stay available for exception stack traces; there is just no separate file. -->
|
||||||
|
<DebugType>embedded</DebugType>
|
||||||
</PropertyGroup>
|
</PropertyGroup>
|
||||||
|
|
||||||
<ItemGroup>
|
<ItemGroup>
|
||||||
@@ -58,6 +62,16 @@
|
|||||||
<InternalsVisibleTo Include="Mbproxy.Tests" />
|
<InternalsVisibleTo Include="Mbproxy.Tests" />
|
||||||
</ItemGroup>
|
</ItemGroup>
|
||||||
|
|
||||||
|
<ItemGroup>
|
||||||
|
<!-- Admin web-UI assets — Bootstrap, the SignalR JS client, vendored fonts, and the
|
||||||
|
dashboard's own HTML/CSS/JS. Embedded into the assembly so the single-file binary
|
||||||
|
serves the whole UI with no CDN dependency (firewalled networks). Resource names
|
||||||
|
are Mbproxy.Admin.wwwroot.<filename>; AdminEndpointHost streams them on
|
||||||
|
GET /assets/<filename>. The directory is intentionally flat to keep the
|
||||||
|
resource-name → request-path mapping trivial. -->
|
||||||
|
<EmbeddedResource Include="Admin\wwwroot\*.*" />
|
||||||
|
</ItemGroup>
|
||||||
|
|
||||||
<!-- Link the platform-appropriate install template as the published appsettings.json so
|
<!-- Link the platform-appropriate install template as the published appsettings.json so
|
||||||
the binary ships with a fully-commented, usable example config (PLCs, BCD tags, all
|
the binary ships with a fully-commented, usable example config (PLCs, BCD tags, all
|
||||||
sections present) instead of an empty stub. The .NET configuration loader supports
|
sections present) instead of an empty stub. The .NET configuration loader supports
|
||||||
|
|||||||
@@ -12,4 +12,11 @@ public sealed class BcdTagOptions
|
|||||||
/// values cap the staleness window in milliseconds.
|
/// values cap the staleness window in milliseconds.
|
||||||
/// </summary>
|
/// </summary>
|
||||||
public int? CacheTtlMs { get; init; }
|
public int? CacheTtlMs { get; init; }
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Optional human-friendly label identifying this tag (e.g. <c>"Left AirSP"</c>).
|
||||||
|
/// Free-form; shown on the connection-detail debug view. Null/omitted renders as
|
||||||
|
/// the bare PDU address. Has no effect on Modbus rewriting — purely a display aid.
|
||||||
|
/// </summary>
|
||||||
|
public string? Name { get; init; }
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -7,6 +7,15 @@ public sealed class MbproxyOptions
|
|||||||
public BcdTagListOptions BcdTags { get; init; } = new();
|
public BcdTagListOptions BcdTags { get; init; } = new();
|
||||||
public IReadOnlyList<PlcOptions> Plcs { get; init; } = [];
|
public IReadOnlyList<PlcOptions> Plcs { get; init; } = [];
|
||||||
public int AdminPort { get; init; } = 8080;
|
public int AdminPort { get; init; } = 8080;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Server-push cadence (milliseconds) for the admin dashboard's SignalR feed.
|
||||||
|
/// Every interval the admin endpoint builds a status snapshot and pushes it to
|
||||||
|
/// connected dashboard / detail-page clients. Must be in the range 1–60000 ms
|
||||||
|
/// (a value past a minute makes the "live" feed non-live). Defaults to 1000.
|
||||||
|
/// </summary>
|
||||||
|
public int AdminPushIntervalMs { get; init; } = 1000;
|
||||||
|
|
||||||
public ConnectionOptions Connection { get; init; } = new();
|
public ConnectionOptions Connection { get; init; } = new();
|
||||||
public ResilienceOptions Resilience { get; init; } = new();
|
public ResilienceOptions Resilience { get; init; } = new();
|
||||||
|
|
||||||
@@ -88,9 +97,14 @@ public sealed class MbproxyOptionsValidator : IValidateOptions<MbproxyOptions>
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Cache section ranges.
|
// Cache section ranges. MaxEntriesPerPlc has a hard ceiling because the cache's
|
||||||
|
// LRU eviction is an O(n) scan under a lock — a fat-fingered seven-figure value
|
||||||
|
// would stall the backend reader on every cache-miss store.
|
||||||
if (options.Cache.MaxEntriesPerPlc < 0)
|
if (options.Cache.MaxEntriesPerPlc < 0)
|
||||||
errors.Add($"Cache.MaxEntriesPerPlc must be >= 0; got {options.Cache.MaxEntriesPerPlc}.");
|
errors.Add($"Cache.MaxEntriesPerPlc must be >= 0; got {options.Cache.MaxEntriesPerPlc}.");
|
||||||
|
else if (options.Cache.MaxEntriesPerPlc > 100_000)
|
||||||
|
errors.Add(
|
||||||
|
$"Cache.MaxEntriesPerPlc must be <= 100000; got {options.Cache.MaxEntriesPerPlc}.");
|
||||||
if (options.Cache.EvictionIntervalMs < 0)
|
if (options.Cache.EvictionIntervalMs < 0)
|
||||||
errors.Add($"Cache.EvictionIntervalMs must be >= 0; got {options.Cache.EvictionIntervalMs}.");
|
errors.Add($"Cache.EvictionIntervalMs must be >= 0; got {options.Cache.EvictionIntervalMs}.");
|
||||||
|
|
||||||
@@ -106,6 +120,13 @@ public sealed class MbproxyOptionsValidator : IValidateOptions<MbproxyOptions>
|
|||||||
errors.Add(
|
errors.Add(
|
||||||
$"Connection.GracefulShutdownTimeoutMs must be > 0; got {options.Connection.GracefulShutdownTimeoutMs}.");
|
$"Connection.GracefulShutdownTimeoutMs must be > 0; got {options.Connection.GracefulShutdownTimeoutMs}.");
|
||||||
|
|
||||||
|
// AdminPushIntervalMs has a soft upper bound: a value past a minute makes the
|
||||||
|
// dashboard's "live" feed effectively non-live, which is almost always a typo
|
||||||
|
// (e.g. a seconds value pasted as milliseconds) rather than an intent.
|
||||||
|
if (options.AdminPushIntervalMs <= 0 || options.AdminPushIntervalMs > 60_000)
|
||||||
|
errors.Add(
|
||||||
|
$"AdminPushIntervalMs must be between 1 and 60000 ms; got {options.AdminPushIntervalMs}.");
|
||||||
|
|
||||||
// Keepalive section ranges. Cross-field rules (heartbeat interval vs request
|
// Keepalive section ranges. Cross-field rules (heartbeat interval vs request
|
||||||
// timeout) are enforced in ReloadValidator.
|
// timeout) are enforced in ReloadValidator.
|
||||||
var ka = options.Connection.Keepalive;
|
var ka = options.Connection.Keepalive;
|
||||||
|
|||||||
@@ -141,6 +141,7 @@ internal sealed class BcdPduPipeline : IPduPipeline
|
|||||||
pdu[3] = (byte)(encoded >> 8);
|
pdu[3] = (byte)(encoded >> 8);
|
||||||
pdu[4] = (byte)(encoded & 0xFF);
|
pdu[4] = (byte)(encoded & 0xFF);
|
||||||
ctx.Counters.AddRewrittenSlots(1);
|
ctx.Counters.AddRewrittenSlots(1);
|
||||||
|
ctx.Capture?.Record(address, encoded, 0, value, CaptureDirection.Write);
|
||||||
}
|
}
|
||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
@@ -156,14 +157,15 @@ internal sealed class BcdPduPipeline : IPduPipeline
|
|||||||
|
|
||||||
ushort startAddress = (ushort)((pdu[1] << 8) | pdu[2]);
|
ushort startAddress = (ushort)((pdu[1] << 8) | pdu[2]);
|
||||||
ushort qty = (ushort)((pdu[3] << 8) | pdu[4]);
|
ushort qty = (ushort)((pdu[3] << 8) | pdu[4]);
|
||||||
|
byte byteCount = pdu[5];
|
||||||
|
|
||||||
// Validate the request is fully sized for `qty` registers (each 2 bytes after
|
// The FC16 PDU is self-describing in three places that must agree: qty (register
|
||||||
// the byteCount byte). A client claiming qty=10 with only 4 bytes of register
|
// count), byteCount, and the actual data tail. Validate all three before touching
|
||||||
// data would otherwise have its BCD slots silently skipped by the per-slot
|
// any register data. A request whose byteCount disagrees with qty, or whose tail
|
||||||
// bounds check below — half the request rewritten, half not. Returning here
|
// is short, is passed through unchanged so the PLC's own validator surfaces the
|
||||||
// passes the malformed PDU through unchanged so the PLC's own validator
|
// protocol error — the rewriter must not partially mutate bytes the client never
|
||||||
// surfaces the protocol error.
|
// framed as register data (review ProxyAndBcd M1).
|
||||||
if (pdu.Length < 6 + qty * 2)
|
if (byteCount != qty * 2 || pdu.Length < 6 + byteCount)
|
||||||
return;
|
return;
|
||||||
|
|
||||||
if (!ctx.TagMap.TryGetForRange(startAddress, qty, out var hits))
|
if (!ctx.TagMap.TryGetForRange(startAddress, qty, out var hits))
|
||||||
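The three-way agreement check described in the hunk above (qty, byteCount, and the actual data tail) can be sketched in isolation. This is a minimal illustration rather than the project's code: `Fc16Check` and `IsConsistent` are hypothetical names, and the leading length/function-code guard is an assumption added so the helper is safe to call on its own.

```csharp
using System;

// Hypothetical helper, not part of the diff: mirrors the FC16 consistency rule.
static class Fc16Check
{
    // FC16 request PDU layout:
    // [fc=0x10][startHi][startLo][qtyHi][qtyLo][byteCount][register data...]
    public static bool IsConsistent(ReadOnlySpan<byte> pdu)
    {
        // Assumed guard: need the 6 fixed bytes before reading qty/byteCount.
        if (pdu.Length < 6 || pdu[0] != 0x10)
            return false;

        int qty = (pdu[3] << 8) | pdu[4];
        int byteCount = pdu[5];

        // byteCount must equal 2 bytes per register, and the tail must be present.
        return byteCount == qty * 2 && pdu.Length >= 6 + byteCount;
    }
}
```

A request claiming qty=2 with byteCount=4 and four data bytes passes; one claiming qty=10 with byteCount=4 fails on the first comparison, so a caller following the diff's approach would forward it untouched.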
@@ -247,6 +249,7 @@ internal sealed class BcdPduPipeline : IPduPipeline
|
|||||||
pdu[highByteOff] = (byte)(bcdHigh >> 8);
|
pdu[highByteOff] = (byte)(bcdHigh >> 8);
|
||||||
pdu[highByteOff + 1] = (byte)(bcdHigh & 0xFF);
|
pdu[highByteOff + 1] = (byte)(bcdHigh & 0xFF);
|
||||||
ctx.Counters.AddRewrittenSlots(2);
|
ctx.Counters.AddRewrittenSlots(2);
|
||||||
|
ctx.Capture?.Record(tag.Address, bcdLow, bcdHigh, binaryValue, CaptureDirection.Write);
|
||||||
}
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
@@ -276,6 +279,7 @@ internal sealed class BcdPduPipeline : IPduPipeline
|
|||||||
pdu[byteOff] = (byte)(encoded >> 8);
|
pdu[byteOff] = (byte)(encoded >> 8);
|
||||||
pdu[byteOff + 1] = (byte)(encoded & 0xFF);
|
pdu[byteOff + 1] = (byte)(encoded & 0xFF);
|
||||||
ctx.Counters.AddRewrittenSlots(1);
|
ctx.Counters.AddRewrittenSlots(1);
|
||||||
|
ctx.Capture?.Record(tag.Address, encoded, 0, clientValue, CaptureDirection.Write);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -361,8 +365,12 @@ internal sealed class BcdPduPipeline : IPduPipeline
|
|||||||
|
|
||||||
if (!lowInRange || !highInRange)
|
if (!lowInRange || !highInRange)
|
||||||
{
|
{
|
||||||
|
// Log effectiveQty — the range actually used for the in-range test —
|
||||||
|
// not the raw request qty. A short backend response clamps
|
||||||
|
// effectiveQty below qty; logging qty would mis-report a truncated
|
||||||
|
// response as a client/config straddle (review ProxyAndBcd N5).
|
||||||
RewriterLogEvents.PartialBcd(ctx.Logger, ctx.PlcName,
|
RewriterLogEvents.PartialBcd(ctx.Logger, ctx.PlcName,
|
||||||
tag.Address, startAddress, qty);
|
tag.Address, startAddress, effectiveQty);
|
||||||
ctx.Counters.IncrementPartialBcd();
|
ctx.Counters.IncrementPartialBcd();
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
@@ -402,6 +410,7 @@ internal sealed class BcdPduPipeline : IPduPipeline
|
|||||||
pdu[highByteOff] = (byte)(decodedHigh >> 8);
|
pdu[highByteOff] = (byte)(decodedHigh >> 8);
|
||||||
pdu[highByteOff + 1] = (byte)(decodedHigh & 0xFF);
|
pdu[highByteOff + 1] = (byte)(decodedHigh & 0xFF);
|
||||||
ctx.Counters.AddRewrittenSlots(2);
|
ctx.Counters.AddRewrittenSlots(2);
|
||||||
|
ctx.Capture?.Record(tag.Address, rawLow, rawHigh, decoded, CaptureDirection.Read);
|
||||||
}
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
@@ -430,6 +439,7 @@ internal sealed class BcdPduPipeline : IPduPipeline
|
|||||||
pdu[byteOff] = (byte)(decoded >> 8);
|
pdu[byteOff] = (byte)(decoded >> 8);
|
||||||
pdu[byteOff + 1] = (byte)(decoded & 0xFF);
|
pdu[byteOff + 1] = (byte)(decoded & 0xFF);
|
||||||
ctx.Counters.AddRewrittenSlots(1);
|
ctx.Counters.AddRewrittenSlots(1);
|
||||||
|
ctx.Capture?.Record(tag.Address, raw, 0, decoded, CaptureDirection.Read);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -14,10 +14,18 @@ namespace Mbproxy.Proxy.Cache;
|
|||||||
/// monotonic ticker; LRU eviction picks the entry with the smallest tick. Using a long
|
/// monotonic ticker; LRU eviction picks the entry with the smallest tick. Using a long
|
||||||
/// instead of <see cref="DateTimeOffset.UtcNow"/> on every access keeps the hot path free
|
/// instead of <see cref="DateTimeOffset.UtcNow"/> on every access keeps the hot path free
|
||||||
/// of clock calls and works correctly even if the wall clock moves backward.</para>
|
/// of clock calls and works correctly even if the wall clock moves backward.</para>
|
||||||
|
///
|
||||||
|
/// <para><b><see cref="CapturedTags"/></b> holds the BCD-tag observations captured (raw
|
||||||
|
/// nibbles + decoded value) when this entry was stored — but only when the connection's
|
||||||
|
/// debug-view capture was armed at insert time; otherwise <c>null</c>. A cache hit bypasses
|
||||||
|
/// the BCD pipeline, so without this the debug view would never see cache-served reads.
|
||||||
|
/// On a hit these are replayed into the capture so the detail page reflects what the
|
||||||
|
/// client actually receives. See <see cref="ResponseCache"/> and the debug-view docs.</para>
|
||||||
/// </summary>
|
/// </summary>
|
||||||
internal sealed record CacheEntry(
|
internal sealed record CacheEntry(
|
||||||
byte[] PduBytes,
|
byte[] PduBytes,
|
||||||
DateTimeOffset CachedAtUtc,
|
DateTimeOffset CachedAtUtc,
|
||||||
DateTimeOffset ExpiresAtUtc,
|
DateTimeOffset ExpiresAtUtc,
|
||||||
int Length,
|
int Length,
|
||||||
long LastUsedTick);
|
long LastUsedTick,
|
||||||
|
IReadOnlyList<TagValueObservation>? CapturedTags = null);
|
||||||
|
|||||||
@@ -334,10 +334,21 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Successful connect. Wire up the backend tasks.
|
// Successful connect. Re-check disposal under the lock before publishing the
|
||||||
var cts2 = CancellationTokenSource.CreateLinkedTokenSource(_disposeCts.Token);
|
// socket: a DisposeAsync that ran during the (possibly multi-second) connect
|
||||||
|
// would otherwise strand this live socket and its three tasks in a disposed
|
||||||
|
// multiplexer, leaking an ECOM client slot forever (review M1). We hold
|
||||||
|
// _backendLock here, and DisposeAsync's TearDownBackendAsync also takes it
|
||||||
|
// before _disposeCts is disposed — so CreateLinkedTokenSource is safe inside.
|
||||||
lock (_backendLock)
|
lock (_backendLock)
|
||||||
{
|
{
|
||||||
|
if (_disposed || _disposeCts.IsCancellationRequested)
|
||||||
|
{
|
||||||
|
backend.Dispose();
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
var cts2 = CancellationTokenSource.CreateLinkedTokenSource(_disposeCts.Token);
|
||||||
_backendSocket = backend;
|
_backendSocket = backend;
|
||||||
_backendCts = cts2;
|
_backendCts = cts2;
|
||||||
// Seed the idle timer so the heartbeat loop measures idleness from connect.
|
// Seed the idle timer so the heartbeat loop measures idleness from connect.
|
||||||
@@ -445,9 +456,10 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
int upstreamCount = 0;
|
int upstreamCount = 0;
|
||||||
if (cascadeUpstreams)
|
if (cascadeUpstreams)
|
||||||
{
|
{
|
||||||
// Close every attached pipe that had a request in flight; the others will
|
// Per docs/Architecture/ConnectionModel.md, ALL attached upstream pipes
|
||||||
// simply re-issue on next request through a fresh backend connect.
|
// cascade-close on a backend disconnect — not just those with a request
|
||||||
// Per docs/Architecture/ConnectionModel.md, ALL attached upstreams cascade on backend disconnect.
|
// in flight. Clients re-issue on their next request through a fresh
|
||||||
|
// backend connect.
|
||||||
upstreamCount = _pipes.Count;
|
upstreamCount = _pipes.Count;
|
||||||
|
|
||||||
// Snapshot keys before disposal modifies the dictionary indirectly.
|
// Snapshot keys before disposal modifies the dictionary indirectly.
|
||||||
@@ -499,6 +511,23 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Fire-and-forget a backend teardown from a failure-detection path (writer/reader
|
||||||
|
/// fault, reader EOF, malformed response). Attaches a faulted continuation so a throw
|
||||||
|
/// inside <see cref="TearDownBackendAsync"/> is logged rather than surfacing through the
|
||||||
|
/// <see cref="TaskScheduler.UnobservedTaskException"/> event.
|
||||||
|
/// </summary>
|
||||||
|
private void FireTeardown(string reason)
|
||||||
|
{
|
||||||
|
_ = TearDownBackendAsync(reason, cascadeUpstreams: true)
|
||||||
|
.ContinueWith(
|
||||||
|
t => _logger.LogError(t.Exception,
|
||||||
|
"Backend teardown faulted: Plc={Plc} Reason={Reason}", _plc.Name, reason),
|
||||||
|
CancellationToken.None,
|
||||||
|
TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously,
|
||||||
|
TaskScheduler.Default);
|
||||||
|
}
|
||||||
|
|
||||||
// ── Backend writer / reader tasks ─────────────────────────────────────────
|
// ── Backend writer / reader tasks ─────────────────────────────────────────
|
||||||
|
|
||||||
private async Task RunBackendWriterAsync(Socket backend, CancellationToken ct)
|
private async Task RunBackendWriterAsync(Socket backend, CancellationToken ct)
|
||||||
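The `FireTeardown` hunk above attaches a fault-only continuation to a fire-and-forget task. A minimal standalone sketch of that pattern follows; `FireAndForget.Observe` and the `onFault` callback are hypothetical names standing in for the logger call, and only the `ContinueWith` overload and options are taken from the diff.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: observes a discarded task's exception so it never
// reaches TaskScheduler.UnobservedTaskException.
static class FireAndForget
{
    public static Task Observe(Task task, Action<Exception> onFault) =>
        task.ContinueWith(
            t =>
            {
                // Reading t.Exception marks the fault as observed.
                if (t.Exception is { } aggregate)
                    onFault(aggregate.GetBaseException());
            },
            CancellationToken.None,
            TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
}
```

Note that with `OnlyOnFaulted`, the returned continuation task is cancelled when the antecedent succeeds, so it should be discarded (as the diff does with `_ =`) rather than awaited.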
@@ -533,7 +562,7 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
// race against it, hitting a disposed _connectGate and producing an
|
// race against it, hitting a disposed _connectGate and producing an
|
||||||
// unobserved-task exception.
|
// unobserved-task exception.
|
||||||
if (!_disposeCts.IsCancellationRequested)
|
if (!_disposeCts.IsCancellationRequested)
|
||||||
_ = TearDownBackendAsync($"writer fault: {ex.Message}", cascadeUpstreams: true);
|
FireTeardown($"writer fault: {ex.Message}");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -595,10 +624,34 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
if (inFlight.IsHeartbeat)
|
if (inFlight.IsHeartbeat)
|
||||||
continue;
|
continue;
|
||||||
|
|
||||||
|
// Validate an FC03/FC04 success response's shape against the request
|
||||||
|
// before it is rewritten, cached, or fanned out. A backend that frames a
|
||||||
|
// response wrong (truncated body, bad MBAP Length) would otherwise desync
|
||||||
|
// the stream indefinitely or get a malformed payload cached and replayed
|
||||||
|
// for the whole TTL. FC03/FC04 success PDU = [fc][byteCount][data] with
|
||||||
|
// byteCount == 2*qty; on any mismatch force a teardown (the stream can no
|
||||||
|
// longer be trusted) rather than fan out a wrong-shaped frame.
|
||||||
|
if (inFlight.Fc is 0x03 or 0x04 && (frame[MbapFrame.HeaderSize] & 0x80) == 0)
|
||||||
|
{
|
||||||
|
int expectedBody = 2 + 2 * inFlight.Qty;
|
||||||
|
byte byteCount = pduBodyLen >= 2 ? frame[MbapFrame.HeaderSize + 1] : (byte)0;
|
||||||
|
if (pduBodyLen != expectedBody || byteCount != 2 * inFlight.Qty)
|
||||||
|
{
|
||||||
|
_logger.LogWarning(
|
||||||
|
"Malformed FC{Fc:X2} backend response for Plc={Plc}: pduBody={Body} expected={Expected} byteCount={ByteCount} — forcing teardown",
|
||||||
|
inFlight.Fc, _plc.Name, pduBodyLen, expectedBody, byteCount);
|
||||||
|
if (!_disposeCts.IsCancellationRequested)
|
||||||
|
FireTeardown("malformed backend response");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// For FC03/FC04 reads, also clear the coalescing-by-key entry so a
|
// For FC03/FC04 reads, also clear the coalescing-by-key entry so a
|
||||||
// brand-new identical request issued AFTER this response is treated as a
|
// brand-new identical request issued AFTER this response is treated as a
|
||||||
// miss (opens a fresh round-trip). The TryRemove is best-effort: a
|
// miss (opens a fresh round-trip). The TryRemove is best-effort: a
|
||||||
// watchdog timeout or cascade may have already removed it.
|
// watchdog timeout or cascade may have already removed it. This branch is
|
||||||
|
// heartbeat-free by construction — the IsHeartbeat check above already
|
||||||
|
// `continue`d — so the CoalescingKey rebuilt here is always a real read.
|
||||||
if (inFlight.Fc is 0x03 or 0x04)
|
if (inFlight.Fc is 0x03 or 0x04)
|
||||||
{
|
{
|
||||||
var coalKey = new CoalescingKey(inFlight.UnitId, inFlight.Fc,
|
var coalKey = new CoalescingKey(inFlight.UnitId, inFlight.Fc,
|
||||||
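The FC03/FC04 response-shape rule in the hunk above (success PDU = [fc][byteCount][data] with byteCount == 2*qty) can be expressed as a small predicate. This is a sketch under assumptions: `ReadResponseCheck` and `IsWellFormed` are hypothetical names, and it operates on the PDU alone rather than the full MBAP frame used in the diff.

```csharp
using System;

// Hypothetical helper: validates an FC03/FC04 response PDU against the
// register count of the request it answers.
static class ReadResponseCheck
{
    public static bool IsWellFormed(ReadOnlySpan<byte> pdu, ushort qty)
    {
        if (pdu.Length < 2)
            return false;

        // Exception responses (high bit set on the function code) are exempt,
        // matching the (fc & 0x80) == 0 guard in the diff.
        if ((pdu[0] & 0x80) != 0)
            return true;

        int expectedBody = 2 + 2 * qty;   // fc + byteCount + 2 bytes per register
        return pdu.Length == expectedBody && pdu[1] == 2 * qty;
    }
}
```

In the diff, a failed check forces a backend teardown rather than caching or fanning out the frame; the sketch only answers the yes/no question.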
@@ -645,6 +698,14 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
byte[] pduSnapshot = new byte[pduBodyLen];
|
byte[] pduSnapshot = new byte[pduBodyLen];
|
||||||
Buffer.BlockCopy(frame, MbapFrame.HeaderSize, pduSnapshot, 0, pduBodyLen);
|
Buffer.BlockCopy(frame, MbapFrame.HeaderSize, pduSnapshot, 0, pduBodyLen);
|
||||||
|
|
||||||
|
// If a detail-page viewer has armed this PLC's capture, the
|
||||||
|
// pipeline above just recorded fresh observations for the BCD
|
||||||
|
// tags in this read range. Attach them to the cache entry so a
|
||||||
|
// later hit (which bypasses the pipeline) can replay them into
|
||||||
|
// the debug view — otherwise the view would freeze for the TTL.
|
||||||
|
var capturedTags = CaptureRangeObservations(
|
||||||
|
inFlight.StartAddress, inFlight.Qty);
|
||||||
|
|
||||||
var cacheKey = new CacheKey(
|
var cacheKey = new CacheKey(
|
||||||
inFlight.UnitId, inFlight.Fc,
|
inFlight.UnitId, inFlight.Fc,
|
||||||
inFlight.StartAddress, inFlight.Qty);
|
inFlight.StartAddress, inFlight.Qty);
|
||||||
@@ -654,7 +715,8 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
CachedAtUtc: now,
|
CachedAtUtc: now,
|
||||||
ExpiresAtUtc: now.AddMilliseconds(inFlight.ResolvedCacheTtlMs),
|
ExpiresAtUtc: now.AddMilliseconds(inFlight.ResolvedCacheTtlMs),
|
||||||
Length: pduSnapshot.Length,
|
Length: pduSnapshot.Length,
|
||||||
LastUsedTick: 0); // ResponseCache.Set stamps the real tick
|
LastUsedTick: 0, // ResponseCache.Set stamps the real tick
|
||||||
|
CapturedTags: capturedTags);
|
||||||
postCache.Set(cacheKey, entry);
|
postCache.Set(cacheKey, entry);
|
||||||
|
|
||||||
CacheLogEvents.Store(_logger, _plc.Name,
|
CacheLogEvents.Store(_logger, _plc.Name,
|
||||||
@@ -744,7 +806,7 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
// Reader exited cleanly — backend closed by remote. Cascade. Skip if
|
// Reader exited cleanly — backend closed by remote. Cascade. Skip if
|
||||||
// dispose is already in progress (see writer-side comment above).
|
// dispose is already in progress (see writer-side comment above).
|
||||||
if (!_disposeCts.IsCancellationRequested)
|
if (!_disposeCts.IsCancellationRequested)
|
||||||
_ = TearDownBackendAsync("backend reader EOF", cascadeUpstreams: true);
|
FireTeardown("backend reader EOF");
|
||||||
}
|
}
|
||||||
catch (OperationCanceledException)
|
catch (OperationCanceledException)
|
||||||
{
|
{
|
||||||
@@ -753,7 +815,7 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
catch (Exception ex)
|
catch (Exception ex)
|
||||||
{
|
{
|
||||||
if (!_disposeCts.IsCancellationRequested)
|
if (!_disposeCts.IsCancellationRequested)
|
||||||
_ = TearDownBackendAsync($"reader fault: {ex.Message}", cascadeUpstreams: true);
|
FireTeardown($"reader fault: {ex.Message}");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -800,11 +862,35 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
startAddr = (ushort)((frame[pduOffset + 1] << 8) | frame[pduOffset + 2]);
|
startAddr = (ushort)((frame[pduOffset + 1] << 8) | frame[pduOffset + 2]);
|
||||||
qty = 1;
|
qty = 1;
|
||||||
}
|
}
|
||||||
else if (fcByte == 0x10 && frame.Length >= pduOffset + 5)
|
else if (fcByte == 0x10 && frame.Length >= pduOffset + 6)
|
||||||
{
|
{
|
||||||
// FC16 = Write Multiple Registers. PDU: [fc=10][startHi][startLo][qtyHi][qtyLo][byteCount]...
|
// FC16 = Write Multiple Registers. PDU: [fc=10][startHi][startLo][qtyHi][qtyLo][byteCount]...
|
||||||
|
// Only trust the claimed qty for cache invalidation when the FC16 frame is
|
||||||
|
// internally consistent (byteCount == 2*qty and the register payload is fully
|
||||||
|
// present). A malformed FC16 is still forwarded — the PLC rejects it with
|
||||||
|
// exception 03 — but it must not drive cache invalidation off a span the
|
||||||
|
// client merely claimed (review Multiplexing M4).
|
||||||
startAddr = (ushort)((frame[pduOffset + 1] << 8) | frame[pduOffset + 2]);
|
startAddr = (ushort)((frame[pduOffset + 1] << 8) | frame[pduOffset + 2]);
|
||||||
qty = (ushort)((frame[pduOffset + 3] << 8) | frame[pduOffset + 4]);
|
ushort claimedQty = (ushort)((frame[pduOffset + 3] << 8) | frame[pduOffset + 4]);
|
||||||
|
byte byteCount = frame[pduOffset + 5];
|
||||||
|
if (byteCount == claimedQty * 2 && frame.Length >= pduOffset + 6 + byteCount)
|
||||||
|
qty = claimedQty;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Write-request-side cache invalidation. Drop overlapping cached FC03/FC04 entries
|
||||||
|
// the moment an FC06/FC16 is forwarded — not only when its response lands — so a
|
||||||
|
// concurrent read in the in-flight window, or a write whose response never
|
||||||
|
// arrives, cannot serve a value the write changed (review Multiplexing C1). The
|
||||||
|
// response-side invalidation in RunBackendReaderAsync remains as a backstop for
|
||||||
|
// entries stored during the in-flight window.
|
||||||
|
if (fcByte is 0x06 or 0x10 && qty > 0 && _ctx.Cache is { } writeCache)
|
||||||
|
{
|
||||||
|
int invalidated = writeCache.Invalidate(unitId, startAddr, qty);
|
||||||
|
if (invalidated > 0)
|
||||||
|
{
|
||||||
|
_ctx.Counters.AddCacheInvalidations(invalidated);
|
||||||
|
CacheLogEvents.Invalidated(_logger, _plc.Name, unitId, startAddr, qty, invalidated);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Response-cache path. Cache check happens BEFORE coalescing AND before we
|
// Response-cache path. Cache check happens BEFORE coalescing AND before we
|
||||||
@@ -825,6 +911,22 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
_ctx.Counters.IncrementCacheHit();
|
_ctx.Counters.IncrementCacheHit();
|
||||||
CacheLogEvents.Hit(_logger, _plc.Name, unitId, fcByte, startAddr, qty);
|
CacheLogEvents.Hit(_logger, _plc.Name, unitId, fcByte, startAddr, qty);
|
||||||
|
|
||||||
|
// A cache hit bypasses the BCD pipeline, so the debug-view capture
|
||||||
|
// would otherwise never see cache-served reads. Replay the
|
||||||
|
// observations captured when this entry was stored, preserving each
|
||||||
|
// observation's ORIGINAL timestamp — the value the client receives is
|
||||||
|
// cache-aged (up to CacheTtlMs old), so the debug view must show its
|
||||||
|
// true age, not a misleading "just now". Entries stored while no
|
||||||
|
// viewer was armed carry no observations; those tags self-heal on the
|
||||||
|
// next cache miss.
|
||||||
|
if (_ctx.Capture is { IsArmed: true } capture &&
|
||||||
|
cached.CapturedTags is { Count: > 0 } cachedTags)
|
||||||
|
{
|
||||||
|
foreach (var obs in cachedTags)
|
||||||
|
capture.Record(obs.Address, obs.RawLow, obs.RawHigh,
|
||||||
|
obs.DecodedValue, CaptureDirection.Read, obs.UpdatedAtUtc);
|
||||||
|
}
|
||||||
|
|
||||||
byte[] hitFrame = BuildCacheHitFrame(originalTxId, unitId, cached.PduBytes);
|
byte[] hitFrame = BuildCacheHitFrame(originalTxId, unitId, cached.PduBytes);
|
||||||
await pipe.SendResponseAsync(hitFrame, ct).ConfigureAwait(false);
|
await pipe.SendResponseAsync(hitFrame, ct).ConfigureAwait(false);
|
||||||
// Outbound bytes for cache-hit response.
|
// Outbound bytes for cache-hit response.
|
||||||
@@ -1095,6 +1197,10 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
{
|
{
|
||||||
await Task.Delay(tickMs, ct).ConfigureAwait(false);
|
await Task.Delay(tickMs, ct).ConfigureAwait(false);
|
||||||
|
|
||||||
|
// Skip the snapshot allocation entirely on an idle PLC (the common case
|
||||||
|
// across a 54-PLC fleet) — no in-flight requests, nothing to time out.
|
||||||
|
if (_correlation.Count == 0) continue;
|
||||||
|
|
||||||
var threshold = DateTimeOffset.UtcNow.AddMilliseconds(-_connectionOptions.BackendRequestTimeoutMs);
|
var threshold = DateTimeOffset.UtcNow.AddMilliseconds(-_connectionOptions.BackendRequestTimeoutMs);
|
||||||
var stale = _correlation.SnapshotOlderThan(threshold);
|
var stale = _correlation.SnapshotOlderThan(threshold);
|
||||||
if (stale.Count == 0) continue;
|
if (stale.Count == 0) continue;
|
||||||
@@ -1138,7 +1244,14 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
|
|
||||||
long elapsedMs = (long)(DateTimeOffset.UtcNow - req.SentAtUtc).TotalMilliseconds;
|
long elapsedMs = (long)(DateTimeOffset.UtcNow - req.SentAtUtc).TotalMilliseconds;
|
||||||
|
|
||||||
foreach (var party in req.InterestedParties)
|
// Snapshot the party list before fan-out. For a coalesced FC03/FC04 a
|
||||||
|
// late attach could have appended to the live List between the
|
||||||
|
// CorrelationMap claim above and the _inFlightByKey removal; iterating
|
||||||
|
// the live List while it mutates throws InvalidOperationException,
|
||||||
|
// which the outer catch would turn into permanent watchdog death for
|
||||||
|
// this PLC (review Multiplexing M5). The _inFlightByKey.TryRemove above
|
||||||
|
// takes the map lock, so once it returns the list is stable to copy.
|
||||||
|
foreach (var party in req.InterestedParties.ToArray())
|
||||||
{
|
{
|
||||||
MultiplexerLogEvents.RequestTimeout(
|
MultiplexerLogEvents.RequestTimeout(
|
||||||
_logger, _plc.Name, proxyTxId, party.OriginalTxId, req.Fc, elapsedMs);
|
_logger, _plc.Name, proxyTxId, party.OriginalTxId, req.Fc, elapsedMs);
|
||||||
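The reason for the `.ToArray()` snapshot in the hunk above can be reproduced single-threaded. The sketch below is illustrative only: `SnapshotDemo` and its methods are hypothetical, and the in-loop `Add` stands in for a late attach that mutates the party list during fan-out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo: mutating a List<T> while enumerating it throws,
// while enumerating a ToArray() copy does not.
static class SnapshotDemo
{
    // Throws InvalidOperationException on the second MoveNext.
    public static int VisitLive(List<int> parties)
    {
        int visited = 0;
        foreach (var party in parties)
        {
            visited++;
            if (visited == 1) parties.Add(party + 100); // simulated late attach
        }
        return visited;
    }

    // Iterates the copy taken before the loop; the late attach is not visited.
    public static int VisitSnapshot(List<int> parties)
    {
        int visited = 0;
        foreach (var party in parties.ToArray())
        {
            visited++;
            if (visited == 1) parties.Add(party + 100);
        }
        return visited;
    }
}
```

`ToArray()` itself is not safe against a concurrent writer, which is why the comment in the diff points out that the list is stable by the time the copy is taken.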
@@ -1345,6 +1458,35 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
|
|||||||
return frame;
|
return frame;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Snapshots the debug-view observations for the BCD tags within the given FC03/FC04
|
||||||
|
/// read range, for attaching to the response-cache entry being stored. Returns
|
||||||
|
/// <c>null</c> when no detail-page viewer has armed this PLC's capture (the common
|
||||||
|
/// case — zero work) or when no observed tag falls in the range. The pipeline records
|
||||||
|
/// observations on the cache-miss path just before the entry is stored; this captures
|
||||||
|
/// them so a later cache hit — which bypasses the pipeline — can replay them. Scoped
|
||||||
|
/// to the read range so a hit never re-stamps a tag that was not part of this read.
|
||||||
|
/// </summary>
|
||||||
|
private IReadOnlyList<TagValueObservation>? CaptureRangeObservations(ushort startAddress, ushort qty)
|
||||||
|
{
|
||||||
|
if (_ctx.Capture is not { IsArmed: true } capture)
|
||||||
|
return null;
|
||||||
|
if (!_ctx.TagMap.TryGetForRange(startAddress, qty, out var hits) || hits.Count == 0)
|
||||||
|
return null;
|
||||||
|
|
||||||
|
var inRange = new HashSet<ushort>();
|
||||||
|
foreach (var hit in hits)
|
||||||
|
inRange.Add(hit.Tag.Address);
|
||||||
|
|
||||||
|
List<TagValueObservation>? observed = null;
|
||||||
|
foreach (var obs in capture.Snapshot())
|
||||||
|
{
|
||||||
|
if (obs.UpdatedAtUtc is not null && inRange.Contains(obs.Address))
|
||||||
|
(observed ??= []).Add(obs);
|
||||||
|
}
|
||||||
|
return observed;
|
||||||
|
}
|
||||||
|
|
||||||
private static byte[] BuildExceptionFrame(ushort originalTxId, byte unitId, byte fc, byte exceptionCode)
|
private static byte[] BuildExceptionFrame(ushort originalTxId, byte unitId, byte fc, byte exceptionCode)
|
||||||
{
|
{
|
||||||
// Modbus exception PDU = [fc | 0x80][exceptionCode].
|
// Modbus exception PDU = [fc | 0x80][exceptionCode].
|
||||||
|
|||||||
@@ -32,6 +32,7 @@ internal sealed class TxIdAllocator
|
|||||||
private ushort _next; // rolling cursor; 0 on construction
|
private ushort _next; // rolling cursor; 0 on construction
|
||||||
private int _inFlightCount; // 0..65536
|
private int _inFlightCount; // 0..65536
|
||||||
private long _wrapCount; // monotonic; never resets
|
private long _wrapCount; // monotonic; never resets
|
||||||
|
private long _doubleReleaseCount; // monotonic; Release called on an already-free slot
|
||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
/// Number of currently-in-flight proxy TxIds (i.e., allocated but not yet released).
|
/// Number of currently-in-flight proxy TxIds (i.e., allocated but not yet released).
|
||||||
@@ -56,6 +57,14 @@ internal sealed class TxIdAllocator
|
|||||||
/// </summary>
|
/// </summary>
|
||||||
public long WrapCount => Interlocked.Read(ref _wrapCount);
|
public long WrapCount => Interlocked.Read(ref _wrapCount);
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Number of times <see cref="Release"/> was called on a slot that was already free.
|
||||||
|
/// A double-release is normally a benign cascade-vs-timeout race, but a sustained
|
||||||
|
/// non-zero rate points at the documented <c>TearDownBackendAsync</c> gate-not-held
|
||||||
|
/// race actually firing — making the otherwise-silent request drop observable.
|
||||||
|
/// </summary>
|
||||||
|
public long DoubleReleaseCount => Interlocked.Read(ref _doubleReleaseCount);
|
||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
/// Attempts to allocate the next free proxy TxId.
|
/// Attempts to allocate the next free proxy TxId.
|
||||||
/// Returns <c>true</c> with <paramref name="id"/> set when an ID was allocated.
|
/// Returns <c>true</c> with <paramref name="id"/> set when an ID was allocated.
|
||||||
@@ -125,6 +134,12 @@ internal sealed class TxIdAllocator
|
|||||||
_inUse[id] = false;
|
_inUse[id] = false;
|
||||||
_inFlightCount--;
|
_inFlightCount--;
|
||||||
}
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
// Double-release: the slot was already free. Harmless to the allocator
|
||||||
|
// (idempotent) but tracked so the rare cascade-vs-timeout race is visible.
|
||||||
|
Interlocked.Increment(ref _doubleReleaseCount);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
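The double-release counter added to `TxIdAllocator` above follows a simple idea: keep `Release` idempotent, but count the calls that found the slot already free. A reduced sketch, assuming a lock-protected slot table (the locking scheme and the `SlotTable` type are assumptions, not taken from the diff):

```csharp
using System.Threading;

// Hypothetical reduced allocator: only the release-tracking idea is from the diff.
sealed class SlotTable
{
    private readonly bool[] _inUse = new bool[16];
    private readonly object _lock = new object();
    private long _doubleReleaseCount; // monotonic; never resets

    public long DoubleReleaseCount => Interlocked.Read(ref _doubleReleaseCount);

    public void Acquire(int id)
    {
        lock (_lock) _inUse[id] = true;
    }

    public void Release(int id)
    {
        lock (_lock)
        {
            if (_inUse[id])
                _inUse[id] = false;
            else
                // Slot already free: harmless, but made visible as a counter.
                Interlocked.Increment(ref _doubleReleaseCount);
        }
    }
}
```

Acquiring slot 3 and releasing it twice leaves `DoubleReleaseCount` at 1 and the slot free.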
|
|||||||
@@ -130,14 +130,16 @@ internal sealed partial class UpstreamPipe : IAsyncDisposable
|
|||||||
out _, out _, out ushort length, out _))
|
out _, out _, out ushort length, out _))
|
||||||
return;
|
return;
|
||||||
|
|
||||||
if (length < 1)
|
if (length < 2)
|
||||||
{
|
{
|
||||||
// Length field claims no body — forward the header alone via a fresh buffer.
|
// A valid MBAP Length covers at least UnitId(1) + FC(1) = 2 bytes. A
|
||||||
byte[] degenerate = new byte[MbapFrame.HeaderSize];
|
// frame claiming less is malformed Modbus — there is no FC to route on
|
||||||
Buffer.BlockCopy(headerBuf, 0, degenerate, 0, MbapFrame.HeaderSize);
|
// and no PDU to forward. Close the upstream rather than allocate a
|
||||||
await onFrame(degenerate, token).ConfigureAwait(false);
|
// proxy TxId and push a 7-byte garbage frame at the backend (review N1).
|
||||||
Interlocked.Increment(ref _pdusForwardedCount);
|
_logger.LogWarning(
|
||||||
continue;
|
"Malformed upstream frame: Plc={Plc} MbapLength={Length} < 2 — closing pipe",
|
||||||
|
_plcName, length);
|
||||||
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
int pduBodyLen = length - 1;
|
int pduBodyLen = length - 1;
|
||||||
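The `length < 2` rule in the `UpstreamPipe` hunk above follows from the MBAP header layout: the Length field counts the unit identifier plus the PDU, and a PDU needs at least a function code. A minimal sketch, assuming the standard 7-byte MBAP header; `Mbap.TryGetPduLength` is a hypothetical name, not the project's `MbapFrame` API.

```csharp
using System;

// Hypothetical parser for the MBAP Length field.
// Header layout: [txIdHi][txIdLo][protoHi][protoLo][lenHi][lenLo][unitId]
static class Mbap
{
    public const int HeaderSize = 7;

    public static bool TryGetPduLength(ReadOnlySpan<byte> header, out int pduLen)
    {
        pduLen = 0;
        if (header.Length < HeaderSize)
            return false;

        int length = (header[4] << 8) | header[5];

        // Length covers UnitId(1) + FC(1) at minimum; anything less is malformed.
        if (length < 2)
            return false;

        pduLen = length - 1; // bytes remaining after the unit identifier
        return true;
    }
}
```

A header with Length=6 yields a 5-byte PDU; a header with Length=1 is rejected, which is the case the diff now answers by closing the upstream pipe.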
|
|||||||
@@ -52,6 +52,16 @@ internal class PerPlcContext : PduContext
|
|||||||
/// </summary>
|
/// </summary>
|
||||||
internal ResponseCache? Cache { get; init; }
|
internal ResponseCache? Cache { get; init; }
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Optional per-PLC tag-value capture feeding the connection-detail debug view.
|
||||||
|
/// Wired in production from <see cref="TagCaptureRegistry"/>; <c>null</c> in unit
|
||||||
|
/// tests that don't exercise it. The <see cref="BcdPduPipeline"/> records into it
|
||||||
|
/// with <c>?.</c>, and the capture itself no-ops unless a detail page has armed it,
|
||||||
|
/// so the cost on the hot path with no viewer is one nullable-deref + one volatile
|
||||||
|
/// read.
|
||||||
|
/// </summary>
|
||||||
|
internal TagValueCapture? Capture { get; init; }
|
||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
/// Returns a shallow clone of this context with <see cref="CurrentRequest"/> set to
|
/// Returns a shallow clone of this context with <see cref="CurrentRequest"/> set to
|
||||||
/// <paramref name="req"/>. The clone is cheap (one allocation per response) and avoids
|
/// <paramref name="req"/>. The clone is cheap (one allocation per response) and avoids
|
||||||
@@ -65,5 +75,6 @@ internal class PerPlcContext : PduContext
|
|||||||
Logger = Logger,
|
Logger = Logger,
|
||||||
CurrentRequest = req,
|
CurrentRequest = req,
|
||||||
Cache = Cache,
|
Cache = Cache,
|
||||||
|
Capture = Capture,
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -142,22 +142,30 @@ internal sealed partial class PlcListener : IAsyncDisposable
|
|||||||
finally
|
finally
|
||||||
{
|
{
|
||||||
await pipe.DisposeAsync().ConfigureAwait(false);
|
await pipe.DisposeAsync().ConfigureAwait(false);
|
||||||
|
// Self-evict from the task table in the same finally that disposes
|
||||||
|
// the pipe — one scheduling primitive instead of a separate
|
||||||
|
// fire-and-forget ContinueWith.
|
||||||
|
_pipeTasks.TryRemove(pipe.Id, out _);
|
||||||
}
|
}
|
||||||
}, CancellationToken.None);
|
}, CancellationToken.None);
|
||||||
|
|
||||||
_pipeTasks[pipe.Id] = pipeTask;
|
_pipeTasks[pipe.Id] = pipeTask;
|
||||||
_ = pipeTask.ContinueWith(prev => _pipeTasks.TryRemove(pipe.Id, out _), TaskScheduler.Default);
|
// If the pipe task already ran to completion (and its finally's TryRemove
|
||||||
|
// fired before this add), evict the now-stale completed entry.
|
||||||
|
if (pipeTask.IsCompleted)
|
||||||
|
_pipeTasks.TryRemove(pipe.Id, out _);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
catch (OperationCanceledException)
|
catch (OperationCanceledException)
|
||||||
{
|
{
|
||||||
// Normal shutdown.
|
// Normal shutdown.
|
||||||
}
|
}
|
||||||
catch (Exception ex)
|
// Any other exception (an accept-loop fault) propagates to the caller. The
|
||||||
{
|
// PlcListenerSupervisor's run handler catches it, logs mbproxy.listener.faulted
|
||||||
// Listener faulted — log and return. The supervisor will restart.
|
// (EventId 43, Warning, WITH the exception + stack trace), and drives Polly
|
||||||
LogListenerFaulted(_listenerLogger, _plc.Name, _plc.ListenPort, ex.Message);
|
// recovery. RunAsync must not swallow the fault here — doing so made that
|
||||||
}
|
// supervised fault path unreachable and downgraded faults to a generic
|
||||||
|
// "ended unexpectedly" with no stack trace (review ProxyAndBcd M2).
|
||||||
}
|
}
|
||||||
|
|
||||||
// ── IAsyncDisposable ──────────────────────────────────────────────────────────────────
|
// ── IAsyncDisposable ──────────────────────────────────────────────────────────────────
|
||||||
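Review note: the self-eviction pattern in the hunk above can be sketched in isolation as follows. `PipeTaskTable` and `Track` are hypothetical names for illustration only; the real listener keys its table by `pipe.Id`. The `IsCompleted` re-check is a best-effort cleanup for the case where the work finished before the entry was added.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for the listener's pipe-task table.
public sealed class PipeTaskTable
{
    private readonly ConcurrentDictionary<int, Task> _tasks = new();

    public int Count => _tasks.Count;

    public Task Track(int id, Func<Task> work)
    {
        Task task = Task.Run(async () =>
        {
            try
            {
                await work().ConfigureAwait(false);
            }
            finally
            {
                // Self-evict when the work ends, whether it succeeded or threw.
                _tasks.TryRemove(id, out _);
            }
        });

        _tasks[id] = task;

        // The task may already have completed (and run its TryRemove) before the
        // add above; in that case the entry is stale, so evict it here.
        if (task.IsCompleted)
            _tasks.TryRemove(id, out _);

        return task;
    }
}
```

Keeping the removal in the same `finally` that ends the work avoids scheduling a second continuation per pipe.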
@@ -197,8 +205,4 @@ internal sealed partial class PlcListener : IAsyncDisposable
|
|||||||
[LoggerMessage(EventId = 20, EventName = "mbproxy.startup.bind",
|
[LoggerMessage(EventId = 20, EventName = "mbproxy.startup.bind",
|
||||||
Level = LogLevel.Information, Message = "Listener bound: Plc={Plc} Port={Port}")]
|
Level = LogLevel.Information, Message = "Listener bound: Plc={Plc} Port={Port}")]
|
||||||
private static partial void LogBound(ILogger logger, string plc, int port);
|
private static partial void LogBound(ILogger logger, string plc, int port);
|
||||||
|
|
||||||
[LoggerMessage(EventId = 22, EventName = "mbproxy.listener.faulted",
|
|
||||||
Level = LogLevel.Error, Message = "Listener faulted: Plc={Plc} Port={Port} Reason={Reason}")]
|
|
||||||
private static partial void LogListenerFaulted(ILogger logger, string plc, int port, string reason);
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -410,7 +410,7 @@ internal sealed class ProxyCounters
|
|||||||
// Convert ticks to milliseconds.
|
// Convert ticks to milliseconds.
|
||||||
double sampleMs = (double)elapsedTicks / System.Diagnostics.Stopwatch.Frequency * 1000.0;
|
double sampleMs = (double)elapsedTicks / System.Diagnostics.Stopwatch.Frequency * 1000.0;
|
||||||
|
|
||||||
// Fixed-point: store microseconds * 1000 (i.e. nanoseconds) as long for CAS.
|
// Fixed-point: store milliseconds * 1000 (i.e. microseconds) as long for CAS.
|
||||||
// This gives ~1 µs resolution which is fine for Modbus round-trips (1–100 ms range).
|
// This gives ~1 µs resolution which is fine for Modbus round-trips (1–100 ms range).
|
||||||
long sampleFixed = (long)(sampleMs * 1000.0);
|
long sampleFixed = (long)(sampleMs * 1000.0);
|
||||||
|
|
||||||
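Review note: a minimal sketch of the unit handling the corrected comment describes, namely ticks to milliseconds, then milliseconds × 1000 stored as a `long`. The class name and the `UpdateMax` helper are assumptions for illustration; the diff does not show what the real compare-and-swap loop in `ProxyCounters` computes.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class LatencyFixedPoint
{
    // Stopwatch ticks -> milliseconds (Stopwatch.Frequency is ticks per second).
    public static double TicksToMilliseconds(long elapsedTicks)
        => (double)elapsedTicks / Stopwatch.Frequency * 1000.0;

    // Milliseconds * 1000 = microseconds, truncated to a long so that the value
    // can be used with Interlocked operations.
    public static long ToFixed(double milliseconds) => (long)(milliseconds * 1000.0);

    public static double FromFixed(long fixedValue) => fixedValue / 1000.0;

    // Hypothetical use of the fixed-point value: a lock-free "keep the maximum".
    public static void UpdateMax(ref long target, long candidate)
    {
        long current = Volatile.Read(ref target);
        while (candidate > current)
        {
            long previous = Interlocked.CompareExchange(ref target, candidate, current);
            if (previous == current)
                return;          // our value was stored
            current = previous;  // another thread won; retry against its value
        }
    }
}
```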
|
|||||||
@@ -47,6 +47,10 @@ internal sealed partial class ProxyWorker : BackgroundService
|
|||||||
private readonly IServiceProvider _services;
|
private readonly IServiceProvider _services;
|
||||||
private AdminEndpointHost? _admin;
|
private AdminEndpointHost? _admin;
|
||||||
|
|
||||||
|
// Per-PLC tag-value captures for the connection-detail debug view. Populated as
|
||||||
|
// each PerPlcContext is built; the admin SignalR layer arms/disarms entries.
|
||||||
|
private readonly TagCaptureRegistry _captureRegistry;
|
||||||
|
|
||||||
// Supervisors are managed jointly by ProxyWorker (initial bootstrap) and
|
// Supervisors are managed jointly by ProxyWorker (initial bootstrap) and
|
||||||
// ConfigReconciler (subsequent hot-reload changes). The dictionary is shared via
|
// ConfigReconciler (subsequent hot-reload changes). The dictionary is shared via
|
||||||
// ConfigReconciler.Attach() after initial startup.
|
// ConfigReconciler.Attach() after initial startup.
|
||||||
@@ -71,14 +75,16 @@ internal sealed partial class ProxyWorker : BackgroundService
|
|||||||
ILogger<ProxyWorker> logger,
|
ILogger<ProxyWorker> logger,
|
||||||
ILoggerFactory loggerFactory,
|
ILoggerFactory loggerFactory,
|
||||||
ConfigReconciler reconciler,
|
ConfigReconciler reconciler,
|
||||||
|
TagCaptureRegistry captureRegistry,
|
||||||
IServiceProvider services)
|
IServiceProvider services)
|
||||||
{
|
{
|
||||||
_options = options;
|
_options = options;
|
||||||
_pipeline = pipeline;
|
_pipeline = pipeline;
|
||||||
_logger = logger;
|
_logger = logger;
|
||||||
_loggerFactory = loggerFactory;
|
_loggerFactory = loggerFactory;
|
||||||
_reconciler = reconciler;
|
_reconciler = reconciler;
|
||||||
_services = services;
|
_captureRegistry = captureRegistry;
|
||||||
|
_services = services;
|
||||||
// Admin endpoint resolved lazily in ExecuteAsync (see field comment).
|
// Admin endpoint resolved lazily in ExecuteAsync (see field comment).
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -87,6 +93,23 @@ internal sealed partial class ProxyWorker : BackgroundService
|
|||||||
var opts = _options.CurrentValue;
|
var opts = _options.CurrentValue;
|
||||||
int plcsConfigured = opts.Plcs.Count;
|
int plcsConfigured = opts.Plcs.Count;
|
||||||
|
|
||||||
|
// ── 0. Fail-fast on configuration errors ─────────────────────────────────────
|
||||||
|
// ReloadValidator is the single config gate — it checks cross-PLC rules
|
||||||
|
// (duplicate listen ports, AdminPort collisions, duplicate PLC names, backend
|
||||||
|
// Host/Port, the keepalive cross-field rule, Resilience profiles) that
|
||||||
|
// MbproxyOptionsValidator / .ValidateOnStart() does not. It previously ran only
|
||||||
|
// on hot-reload; running it here too means a config mistake aborts startup with
|
||||||
|
// a clear error instead of silently producing a half-working fleet (review
|
||||||
|
// ConfigAndHosting M1/M4/M5). The host's default BackgroundService exception
|
||||||
|
// behaviour (StopHost) turns the throw into a non-zero exit.
|
||||||
|
if (!ReloadValidator.Validate(opts, out var configErrors))
|
||||||
|
{
|
||||||
|
LogConfigRejectedAtStartup(_logger, string.Join("; ", configErrors));
|
||||||
|
throw new InvalidOperationException(
|
||||||
|
"mbproxy configuration is invalid; service will not start. " +
|
||||||
|
string.Join("; ", configErrors));
|
||||||
|
}
|
||||||
|
|
||||||
// ── 1. Build per-PLC BCD tag maps ────────────────────────────────────────────
|
// ── 1. Build per-PLC BCD tag maps ────────────────────────────────────────────
|
||||||
var plcContexts = new Dictionary<string, PerPlcContext>(opts.Plcs.Count, StringComparer.Ordinal);
|
var plcContexts = new Dictionary<string, PerPlcContext>(opts.Plcs.Count, StringComparer.Ordinal);
|
||||||
|
|
||||||
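Review note: the fail-fast gate added in step 0 can be illustrated with a reduced validator. Everything below (`PlcConfig`, `StartupConfigCheck`, the two rules) is a hypothetical stand-in; the real `ReloadValidator` checks more rules and its signature is only partly visible in this diff.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, reduced configuration shape.
public sealed record PlcConfig(string Name, int ListenPort);

public static class StartupConfigCheck
{
    // Two illustrative cross-PLC rules: names and listen ports must be unique.
    public static bool Validate(IReadOnlyList<PlcConfig> plcs, out List<string> errors)
    {
        errors = new List<string>();
        var names = new HashSet<string>(StringComparer.Ordinal);
        var ports = new HashSet<int>();

        foreach (var plc in plcs)
        {
            if (!names.Add(plc.Name))
                errors.Add($"duplicate PLC name '{plc.Name}'");
            if (!ports.Add(plc.ListenPort))
                errors.Add($"duplicate listen port {plc.ListenPort}");
        }

        return errors.Count == 0;
    }

    // Aborts startup with every error in one message instead of starting a
    // partially working fleet.
    public static void ThrowIfInvalid(IReadOnlyList<PlcConfig> plcs)
    {
        if (!Validate(plcs, out var errors))
            throw new InvalidOperationException(
                "configuration is invalid; service will not start. " + string.Join("; ", errors));
    }
}
```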
@@ -123,6 +146,7 @@ internal sealed partial class ProxyWorker : BackgroundService
|
|||||||
Counters = new ProxyCounters(),
|
Counters = new ProxyCounters(),
|
||||||
Logger = _loggerFactory.CreateLogger($"Mbproxy.Proxy.BcdRewriter.{plc.Name}"),
|
Logger = _loggerFactory.CreateLogger($"Mbproxy.Proxy.BcdRewriter.{plc.Name}"),
|
||||||
Cache = cache,
|
Cache = cache,
|
||||||
|
Capture = _captureRegistry.GetOrCreate(plc.Name, result.Map),
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -372,10 +396,10 @@ internal sealed partial class ProxyWorker : BackgroundService
|
|||||||
Message = "mbproxy service ready — ListenersBound={ListenersBound} PlcsConfigured={PlcsConfigured}")]
|
Message = "mbproxy service ready — ListenersBound={ListenersBound} PlcsConfigured={PlcsConfigured}")]
|
||||||
private static partial void LogStartupReady(ILogger logger, int listenersBound, int plcsConfigured);
|
private static partial void LogStartupReady(ILogger logger, int listenersBound, int plcsConfigured);
|
||||||
|
|
||||||
[LoggerMessage(EventId = 21, EventName = "mbproxy.startup.bind.failed",
|
[LoggerMessage(EventId = 2, EventName = "mbproxy.startup.config.rejected",
|
||||||
Level = LogLevel.Error,
|
Level = LogLevel.Error,
|
||||||
Message = "Failed to bind listener: Plc={Plc} Port={Port} Reason={Reason}")]
|
Message = "Startup configuration rejected — service will not start: {Errors}")]
|
||||||
private static partial void LogBindFailed(ILogger logger, string plc, int port, string reason);
|
private static partial void LogConfigRejectedAtStartup(ILogger logger, string errors);
|
||||||
|
|
||||||
[LoggerMessage(EventId = 80, EventName = "mbproxy.shutdown.complete",
|
[LoggerMessage(EventId = 80, EventName = "mbproxy.shutdown.complete",
|
||||||
Level = LogLevel.Information,
|
Level = LogLevel.Information,
|
||||||
|
|||||||
@@ -45,5 +45,12 @@ internal static class SocketKeepalive
|
|||||||
{
|
{
|
||||||
// Socket closed concurrently — nothing to do.
|
// Socket closed concurrently — nothing to do.
|
||||||
}
|
}
|
||||||
|
catch (NotSupportedException)
|
||||||
|
{
|
||||||
|
// Some platforms/runtimes throw NotSupportedException (or its derived
|
||||||
|
// PlatformNotSupportedException) rather than SocketException for an
|
||||||
|
// unrecognised TcpKeepAlive* option. Swallow it too — the "must never abort a
|
||||||
|
// connection" contract above applies regardless of the exception type.
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
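Review note: a sketch of the "must never abort a connection" contract with the extra catch in place. `KeepaliveSketch.TryEnable` and its parameters are hypothetical; the `SetSocketOption` calls and option names are standard .NET APIs. Catching `NotSupportedException` also covers `PlatformNotSupportedException`, which derives from it.

```csharp
using System;
using System.Net.Sockets;

public static class KeepaliveSketch
{
    // Returns true if every option was applied, false if the platform or the
    // socket state rejected one. It never throws for these expected failures.
    public static bool TryEnable(Socket socket, int idleSeconds, int intervalSeconds, int retryCount)
    {
        try
        {
            socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
            socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveTime, idleSeconds);
            socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveInterval, intervalSeconds);
            socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveRetryCount, retryCount);
            return true;
        }
        catch (SocketException)
        {
            return false; // option rejected by the OS
        }
        catch (ObjectDisposedException)
        {
            return false; // socket closed concurrently
        }
        catch (NotSupportedException)
        {
            return false; // option not recognised on this platform/runtime
        }
    }
}
```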
|
|||||||
@@ -0,0 +1,79 @@
|
|||||||
|
using System.Collections.Concurrent;
|
||||||
|
using Mbproxy.Bcd;
|
||||||
|
|
||||||
|
namespace Mbproxy.Proxy;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Process-wide registry of per-PLC <see cref="TagValueCapture"/> instances — the
|
||||||
|
/// shared seam between the proxy hot path (which records tag values) and the admin
|
||||||
|
/// layer (which arms captures on detail-page open and reads their snapshots).
|
||||||
|
///
|
||||||
|
/// <para>Registered as a DI singleton. <see cref="ProxyWorker"/> and
|
||||||
|
/// <see cref="Configuration.ConfigReconciler"/> call <see cref="GetOrCreate"/> as they
|
||||||
|
/// build each <see cref="PerPlcContext"/>; <see cref="Configuration.ConfigReconciler"/>
|
||||||
|
/// calls <see cref="Remove"/> for hot-reload-removed PLCs. <c>StatusBroadcaster</c>
|
||||||
|
/// calls <see cref="ReconcileArmed"/> every push cycle (the single arm/disarm authority)
|
||||||
|
/// and <see cref="DisarmAll"/> on shutdown; <c>StatusSnapshotBuilder</c> calls
|
||||||
|
/// <see cref="TryGet"/>.</para>
|
||||||
|
/// </summary>
|
||||||
|
internal sealed class TagCaptureRegistry
|
||||||
|
{
|
||||||
|
private readonly ConcurrentDictionary<string, TagValueCapture> _captures =
|
||||||
|
new(StringComparer.Ordinal);
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Returns the capture for <paramref name="plcName"/>, creating it on first call.
|
||||||
|
/// A subsequent call (hot-reload reseat/restart, where the tag set may have changed)
|
||||||
|
/// rebuilds the capture for <paramref name="map"/>'s current tags. The rebuilt
|
||||||
|
/// capture is disarmed; <see cref="StatusBroadcaster"/> re-arms it on its next push
|
||||||
|
/// cycle (within one <c>AdminPushIntervalMs</c>) if the PLC still has a viewer — so
|
||||||
|
/// arm state is never carried across the rebuild, which removes any race between
|
||||||
|
/// arming and the rebuild.
|
||||||
|
/// </summary>
|
||||||
|
public TagValueCapture GetOrCreate(string plcName, BcdTagMap map)
|
||||||
|
=> _captures.AddOrUpdate(
|
||||||
|
plcName,
|
||||||
|
_ => new TagValueCapture(map.All),
|
||||||
|
(_, _) => new TagValueCapture(map.All));
|
||||||
|
|
||||||
|
/// <summary>Drops the capture for a hot-reload-removed PLC.</summary>
|
||||||
|
public void Remove(string plcName) => _captures.TryRemove(plcName, out _);
|
||||||
|
|
||||||
|
/// <summary>Looks up a PLC's capture; false when the PLC is unknown.</summary>
|
||||||
|
public bool TryGet(string plcName, out TagValueCapture capture)
|
||||||
|
=> _captures.TryGetValue(plcName, out capture!);
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Reconciles every capture's armed state against <paramref name="activePlcs"/> —
|
||||||
|
/// the set of PLCs that currently have a detail-page viewer. Captures for active
|
||||||
|
/// PLCs are armed, all others disarmed. Called once per push cycle by
|
||||||
|
/// <see cref="StatusBroadcaster"/>, so it is the <b>single</b> arm/disarm authority:
|
||||||
|
/// no hub thread ever arms a capture, which both removes the race against a
|
||||||
|
/// hot-reload <see cref="GetOrCreate"/> and makes a leaked subscriber impossible to
|
||||||
|
/// reach here (the tracker is reconnect-safe).
|
||||||
|
/// </summary>
|
||||||
|
public void ReconcileArmed(IReadOnlyCollection<string> activePlcs)
|
||||||
|
{
|
||||||
|
var active = activePlcs as IReadOnlySet<string>
|
||||||
|
?? activePlcs.ToHashSet(StringComparer.Ordinal);
|
||||||
|
|
||||||
|
foreach (var (name, capture) in _captures)
|
||||||
|
{
|
||||||
|
bool shouldArm = active.Contains(name);
|
||||||
|
if (shouldArm && !capture.IsArmed)
|
||||||
|
capture.Arm();
|
||||||
|
else if (!shouldArm && capture.IsArmed)
|
||||||
|
capture.Disarm();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Disarms every capture. Called when the admin endpoint stops (e.g. AdminPort
|
||||||
|
/// hot-reload tears down the SignalR host) so no capture is left armed with no viewer.
|
||||||
|
/// </summary>
|
||||||
|
public void DisarmAll()
|
||||||
|
{
|
||||||
|
foreach (var c in _captures.Values)
|
||||||
|
c.Disarm();
|
||||||
|
}
|
||||||
|
}
|
||||||
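Review note: the "single arm/disarm authority" idea can be shown with a reduced registry. `ArmableCapture` and `CaptureRegistrySketch` are hypothetical simplifications (no tag map, no slots). The point of the sketch is that a rebuilt capture always starts disarmed and only the reconcile pass changes arm state.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class ArmableCapture
{
    private volatile bool _armed;
    public bool IsArmed => _armed;
    public void Arm() => _armed = true;
    public void Disarm() => _armed = false;
}

public sealed class CaptureRegistrySketch
{
    private readonly ConcurrentDictionary<string, ArmableCapture> _captures =
        new(StringComparer.Ordinal);

    // Always installs a fresh, disarmed capture, replacing any existing one.
    public ArmableCapture GetOrCreate(string plcName)
        => _captures.AddOrUpdate(
            plcName,
            _ => new ArmableCapture(),
            (_, _) => new ArmableCapture());

    public bool TryGet(string plcName, out ArmableCapture? capture)
        => _captures.TryGetValue(plcName, out capture);

    // Called periodically by one owner: arm captures with a viewer, disarm the rest.
    public void ReconcileArmed(IReadOnlyCollection<string> activePlcs)
    {
        var active = new HashSet<string>(activePlcs, StringComparer.Ordinal);
        foreach (var (name, capture) in _captures)
        {
            if (active.Contains(name))
                capture.Arm();
            else
                capture.Disarm();
        }
    }
}
```

With this shape, a hot-reload rebuild loses arm state for at most one reconcile interval, which is the trade-off the summary comment on `GetOrCreate` describes.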
@@ -0,0 +1,176 @@
|
|||||||
|
using System.Collections.Frozen;
|
||||||
|
using Mbproxy.Bcd;
|
||||||
|
|
||||||
|
namespace Mbproxy.Proxy;
|
||||||
|
|
||||||
|
/// <summary>Direction of a captured tag-value observation.</summary>
|
||||||
|
public enum CaptureDirection
|
||||||
|
{
|
||||||
|
/// <summary>FC03/FC04 response — the proxy decoded BCD nibbles → binary for the client.</summary>
|
||||||
|
Read,
|
||||||
|
|
||||||
|
/// <summary>FC06/FC16 request — the proxy encoded the client's binary → BCD for the PLC.</summary>
|
||||||
|
Write,
|
||||||
|
}
|
||||||
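Review note: for readers unfamiliar with the two directions, a minimal sketch of 16-bit BCD decoding and encoding follows. `Bcd16` is a hypothetical helper, not the proxy's rewriter, and throwing on an invalid nibble is an assumption; the diff does not show how the real code treats non-decimal nibbles.

```csharp
using System;

public static class Bcd16
{
    // Read direction: four BCD nibbles -> binary integer (0x1234 -> 1234).
    public static int Decode(ushort raw)
    {
        int value = 0;
        for (int shift = 12; shift >= 0; shift -= 4)
        {
            int nibble = (raw >> shift) & 0xF;
            if (nibble > 9)
                throw new FormatException($"0x{raw:X4} is not valid BCD");
            value = value * 10 + nibble;
        }
        return value;
    }

    // Write direction: binary integer (0..9999) -> four BCD nibbles.
    public static ushort Encode(int value)
    {
        if (value < 0 || value > 9999)
            throw new ArgumentOutOfRangeException(nameof(value));

        int raw = 0;
        for (int shift = 0; shift <= 12; shift += 4)
        {
            raw |= (value % 10) << shift; // lowest decimal digit goes in the lowest nibble
            value /= 10;
        }
        return (ushort)raw;
    }
}
```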
|
|
||||||
|
/// <summary>
|
||||||
|
/// One immutable observation of a BCD tag's value as it last crossed the proxy.
|
||||||
|
///
|
||||||
|
/// <para>"PLC side" is always the BCD-encoded form on the Modbus wire to/from the
|
||||||
|
/// device; "client side" is always the decoded binary integer the upstream client
|
||||||
|
/// reads or wrote. <see cref="RawHigh"/> is 0 for a 16-bit tag. <see cref="Name"/> is
|
||||||
|
/// the tag's optional human-friendly label (null when the config gave none).</para>
|
||||||
|
/// </summary>
|
||||||
|
public sealed record TagValueObservation(
|
||||||
|
ushort Address,
|
||||||
|
byte Width,
|
||||||
|
string? Name,
|
||||||
|
ushort RawLow,
|
||||||
|
ushort RawHigh,
|
||||||
|
int DecodedValue,
|
||||||
|
CaptureDirection Direction,
|
||||||
|
DateTimeOffset? UpdatedAtUtc);
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Per-PLC, on-demand store of the last value seen for each configured BCD tag — the
|
||||||
|
/// data source behind the connection-detail page's real-time debug view.
|
||||||
|
///
|
||||||
|
/// <para><b>On-demand.</b> The capture starts disarmed: <see cref="Record"/> is an
|
||||||
|
/// immediate no-op (one volatile-bool read) until <see cref="Arm"/> is called. The
|
||||||
|
/// SignalR layer arms a PLC's capture only while its detail page has a live
|
||||||
|
/// subscriber and <see cref="Disarm"/>s it — clearing all slots — when the last
|
||||||
|
/// viewer leaves, so a reopened page shows "no traffic yet" rather than stale data.</para>
|
||||||
|
///
|
||||||
|
/// <para><b>Concurrency.</b> <see cref="Record"/> is called from many upstream-read
|
||||||
|
/// tasks (FC06/FC16 requests) and the single backend reader task (FC03/FC04
|
||||||
|
/// responses); <see cref="Snapshot"/> runs on the admin-push thread. Each slot holds a
|
||||||
|
/// reference to an immutable <see cref="TagValueObservation"/>, swapped with
|
||||||
|
/// <see cref="Volatile.Write{T}(ref T, T)"/> — reference assignment is atomic and the
|
||||||
|
/// record is immutable, so a reader never sees a torn slot. No locks.</para>
|
||||||
|
/// </summary>
|
||||||
|
internal sealed class TagValueCapture
|
||||||
|
{
|
||||||
|
// Tag address → slot index. Frozen for allocation-free O(1) lookup on the hot path.
|
||||||
|
private readonly FrozenDictionary<ushort, int> _addressToSlot;
|
||||||
|
|
||||||
|
// Slot index → tag identity. Parallel to _slots; immutable after construction.
|
||||||
|
private readonly ushort[] _addresses;
|
||||||
|
private readonly byte[] _widths;
|
||||||
|
private readonly string?[] _names;
|
||||||
|
|
||||||
|
// Slot index → last observation (null = no traffic captured yet). Each element is
|
||||||
|
// swapped via Volatile.Write; never mutated in place.
|
||||||
|
private readonly TagValueObservation?[] _slots;
|
||||||
|
|
||||||
|
private volatile bool _armed;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Builds a capture for one PLC's resolved BCD tag set. One slot per tag, ordered
|
||||||
|
/// ascending by address for a stable debug-view row order.
|
||||||
|
/// </summary>
|
||||||
|
public TagValueCapture(IEnumerable<BcdTag> tags)
|
||||||
|
{
|
||||||
|
var ordered = tags
|
||||||
|
.GroupBy(t => t.Address)
|
||||||
|
.Select(g => g.First())
|
||||||
|
.OrderBy(t => t.Address)
|
||||||
|
.ToArray();
|
||||||
|
|
||||||
|
_addresses = new ushort[ordered.Length];
|
||||||
|
_widths = new byte[ordered.Length];
|
||||||
|
_names = new string?[ordered.Length];
|
||||||
|
_slots = new TagValueObservation?[ordered.Length];
|
||||||
|
|
||||||
|
var index = new Dictionary<ushort, int>(ordered.Length);
|
||||||
|
for (int i = 0; i < ordered.Length; i++)
|
||||||
|
{
|
||||||
|
_addresses[i] = ordered[i].Address;
|
||||||
|
_widths[i] = ordered[i].Width;
|
||||||
|
_names[i] = ordered[i].Name;
|
||||||
|
index[ordered[i].Address] = i;
|
||||||
|
}
|
||||||
|
_addressToSlot = index.ToFrozenDictionary();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Number of BCD tag slots this capture tracks.</summary>
|
||||||
|
public int TagCount => _slots.Length;
|
||||||
|
|
||||||
|
/// <summary>True while a detail-page subscriber has armed this capture.</summary>
|
||||||
|
public bool IsArmed => _armed;
|
||||||
|
|
||||||
|
/// <summary>Arms capture — <see cref="Record"/> begins storing observations.</summary>
|
||||||
|
public void Arm() => _armed = true;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Disarms capture and clears every slot, so the next <see cref="Snapshot"/> after a
|
||||||
|
/// re-arm reflects only post-re-arm traffic.
|
||||||
|
/// </summary>
|
||||||
|
public void Disarm()
|
||||||
|
{
|
||||||
|
_armed = false;
|
||||||
|
for (int i = 0; i < _slots.Length; i++)
|
||||||
|
Volatile.Write(ref _slots[i], null);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Records the last value of the BCD tag at <paramref name="address"/>. No-op when
|
||||||
|
/// disarmed or when <paramref name="address"/> is not a tracked tag.
|
||||||
|
///
|
||||||
|
/// <para>Concurrency guarantee: a disarmed capture never <i>retains</i> a stale
|
||||||
|
/// observation (the post-write re-check below undoes a write that raced a
|
||||||
|
/// <see cref="Disarm"/>). It is <b>not</b> guaranteed that every observation recorded
|
||||||
|
/// while armed survives: two threads recording the same address concurrently with a
|
||||||
|
/// near-simultaneous disarm can drop one update. That is a lost update on a debug
|
||||||
|
/// view, not stale data, and the slot self-heals on the next traffic.</para>
|
||||||
|
/// </summary>
|
||||||
|
/// <param name="rawLow">BCD-encoded low word as it sits on the PLC wire.</param>
|
||||||
|
/// <param name="rawHigh">BCD-encoded high word (0 for a 16-bit tag).</param>
|
||||||
|
/// <param name="decoded">Decoded binary integer the client reads/wrote.</param>
|
||||||
|
/// <param name="observedAtUtc">
|
||||||
|
/// When the value was actually observed from the PLC. Defaults to "now" for a live
|
||||||
|
/// pipeline observation. The response-cache hit path passes the cached observation's
|
||||||
|
/// original timestamp so a cache-served read shows its true age in the debug view
|
||||||
|
/// rather than appearing freshly read.
|
||||||
|
/// </param>
|
||||||
|
public void Record(
|
||||||
|
ushort address, ushort rawLow, ushort rawHigh, int decoded, CaptureDirection direction,
|
||||||
|
DateTimeOffset? observedAtUtc = null)
|
||||||
|
{
|
||||||
|
if (!_armed)
|
||||||
|
return;
|
||||||
|
if (!_addressToSlot.TryGetValue(address, out int idx))
|
||||||
|
return;
|
||||||
|
|
||||||
|
Volatile.Write(
|
||||||
|
ref _slots[idx],
|
||||||
|
new TagValueObservation(
|
||||||
|
_addresses[idx], _widths[idx], _names[idx], rawLow, rawHigh, decoded, direction,
|
||||||
|
observedAtUtc ?? DateTimeOffset.UtcNow));
|
||||||
|
|
||||||
|
// A concurrent Disarm() may have flipped _armed (and cleared the slots) between
|
||||||
|
// the _armed check above and the write just made — which would strand a stale
|
||||||
|
// observation on a disarmed capture, defeating the "reopened page shows no stale
|
||||||
|
// data" contract. Re-read _armed: if it is now false, Disarm has either already
|
||||||
|
// run (so this write must be undone) or is still running (its own slot-clear
|
||||||
|
// pass will null this slot). Either way, null it here to be safe.
|
||||||
|
if (!_armed)
|
||||||
|
Volatile.Write(ref _slots[idx], null);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Point-in-time projection of every tracked tag. Slots with no traffic yet are
|
||||||
|
/// returned with <see cref="TagValueObservation.UpdatedAtUtc"/> = <c>null</c> and
|
||||||
|
/// zero values, so the debug view always renders one row per configured tag.
|
||||||
|
/// </summary>
|
||||||
|
public IReadOnlyList<TagValueObservation> Snapshot()
|
||||||
|
{
|
||||||
|
var result = new TagValueObservation[_slots.Length];
|
||||||
|
for (int i = 0; i < _slots.Length; i++)
|
||||||
|
{
|
||||||
|
result[i] = Volatile.Read(ref _slots[i])
|
||||||
|
?? new TagValueObservation(
|
||||||
|
_addresses[i], _widths[i], _names[i], 0, 0, 0, CaptureDirection.Read, null);
|
||||||
|
}
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
}
|
||||||
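Review note: the lock-free slot pattern, including the post-write re-check in `Record`, reduced to its core. `Observation` and `SlotCaptureSketch` are hypothetical simplifications that index slots directly instead of looking up a tag address.

```csharp
using System;
using System.Threading;

public sealed record Observation(int Value);

public sealed class SlotCaptureSketch
{
    private readonly Observation?[] _slots;
    private volatile bool _armed;

    public SlotCaptureSketch(int slotCount) => _slots = new Observation?[slotCount];

    public void Arm() => _armed = true;

    public void Disarm()
    {
        _armed = false;
        for (int i = 0; i < _slots.Length; i++)
            Volatile.Write(ref _slots[i], null);
    }

    public void Record(int slot, int value)
    {
        if (!_armed)
            return; // disarmed: one volatile read, nothing stored

        // Swap in a new immutable record; readers see the old or the new
        // reference, never a partially written one.
        Volatile.Write(ref _slots[slot], new Observation(value));

        // If Disarm() ran between the check and the write, undo the write so a
        // disarmed capture never retains an observation.
        if (!_armed)
            Volatile.Write(ref _slots[slot], null);
    }

    public Observation? Read(int slot) => Volatile.Read(ref _slots[slot]);
}
```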
@@ -0,0 +1,17 @@
|
|||||||
|
Original Tag Address New Tag Address Modbus Address Description Data Type Data Direction
|
||||||
|
V3014 41549 Left AirSP BCD16 Write-Only
|
||||||
|
41537 Left ArgonSP BCD16 Write-Only
|
||||||
|
V3010 41545 Left ChlorineSP BCD16 Write-Only
|
||||||
|
V3012 41547 Left HydrogenSP BCD16 Write-Only
|
||||||
|
V3015 41550 Right AirSP BCD16 Write-Only
|
||||||
|
41540 Right ArgonSP BCD16 Write-Only
|
||||||
|
V3011 41546 Right ChlorineSP BCD16 Write-Only
|
||||||
|
V3013 41548 Right HydrogenSP BCD16 Write-Only
|
||||||
|
V1700 V11010 44617 MTA Runtime Left in Minutes BCD32 Read-Only
|
||||||
|
V1701 V11011 44618 MTA Runtime Left in Minutes BCD32 Read-Only
|
||||||
|
V1702 V11012 44619 MTA Runtime Right in Minutes BCD32 Read-Only
|
||||||
|
V1703 V11013 44620 MTA Runtime Right in Minutes BCD32 Read-Only
|
||||||
|
V1710 V11022 44627 FRR Runtime Left in Minutes BCD32 Read-Only
|
||||||
|
V1711 V11023 44628 FRR Runtime Left in Minutes BCD32 Read-Only
|
||||||
|
V1712 V11024 44629 FRR Runtime Right in Minutes BCD32 Read-Only
|
||||||
|
V1713 V11025 44630 FRR Runtime Right in Minutes BCD32 Read-Only
|
||||||
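Review note: the Modbus column in the table above is consistent with reading the V-memory tag as an octal number and adding 40001 (for example V3014 is octal 3014 = 1548, and 1548 + 40001 = 41549). A small sketch of that arithmetic follows; `VMemoryAddress` is a hypothetical helper, and the rule is inferred from the rows shown here rather than taken from the repository.

```csharp
using System;

public static class VMemoryAddress
{
    // "V3014" -> 41549: parse the digits after 'V' as octal, then add 40001.
    public static int ToModbusRegister(string tag)
    {
        if (string.IsNullOrEmpty(tag) || tag.Length < 2 || (tag[0] != 'V' && tag[0] != 'v'))
            throw new FormatException($"'{tag}' is not a V-memory tag");

        int offset = Convert.ToInt32(tag.Substring(1), 8); // base 8
        return 40001 + offset;
    }
}
```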
@@ -168,10 +168,10 @@ public sealed class AdminEndpointTests
|
|||||||
after.ShouldBeGreaterThan(before, "partialBcdWarnings should increment after partial overlap read");
|
after.ShouldBeGreaterThan(before, "partialBcdWarnings should increment after partial overlap read");
|
||||||
}
|
}
|
||||||
|
|
||||||
// ── 4. GET / returns 200 text/html with meta-refresh ─────────────────────
|
// ── 4. GET / and GET /plc/{name} serve the embedded SPA shells ───────────
|
||||||
|
|
||||||
[Fact(Timeout = 5_000)]
|
[Fact(Timeout = 5_000)]
|
||||||
public async Task Get_Root_ReturnsHtml_WithMetaRefresh()
|
public async Task Get_Root_ReturnsDashboardShell()
|
||||||
{
|
{
|
||||||
int adminPort = PickFreePort();
|
int adminPort = PickFreePort();
|
||||||
int proxyPort = PickFreePort();
|
int proxyPort = PickFreePort();
|
||||||
@@ -190,8 +190,94 @@ public sealed class AdminEndpointTests
|
|||||||
response.Content.Headers.ContentType?.MediaType.ShouldBe("text/html");
|
response.Content.Headers.ContentType?.MediaType.ShouldBe("text/html");
|
||||||
|
|
||||||
string body = await response.Content.ReadAsStringAsync(TestContext.Current.CancellationToken);
|
string body = await response.Content.ReadAsStringAsync(TestContext.Current.CancellationToken);
|
||||||
body.ShouldContain("<meta http-equiv=\"refresh\" content=\"5\">");
|
body.ShouldContain("<!doctype html>");
|
||||||
body.ShouldContain("<!DOCTYPE html>");
|
body.ShouldContain("/assets/dashboard.js");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact(Timeout = 5_000)]
|
||||||
|
public async Task Get_PlcDetailRoute_ReturnsDetailShell()
|
||||||
|
{
|
||||||
|
int adminPort = PickFreePort();
|
||||||
|
int proxyPort = PickFreePort();
|
||||||
|
|
||||||
|
var host = BuildHost(adminPort: adminPort, simHost: "127.0.0.1", simPort: 502,
|
||||||
|
proxyPort: proxyPort, bcd16Addresses: []);
|
||||||
|
await using var _ = new AsyncHostDispose(host);
|
||||||
|
using var startCts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
|
||||||
|
await host.StartAsync(startCts.Token);
|
||||||
|
|
||||||
|
await WaitForAdminAsync(adminPort);
|
||||||
|
|
||||||
|
var response = await HttpClient.GetAsync($"http://127.0.0.1:{adminPort}/plc/anything",
|
||||||
|
TestContext.Current.CancellationToken);
|
||||||
|
response.StatusCode.ShouldBe(System.Net.HttpStatusCode.OK);
|
||||||
|
response.Content.Headers.ContentType?.MediaType.ShouldBe("text/html");
|
||||||
|
|
||||||
|
string body = await response.Content.ReadAsStringAsync(TestContext.Current.CancellationToken);
|
||||||
|
body.ShouldContain("/assets/detail.js");
|
||||||
|
}
|
||||||
|
|
||||||
|
[Theory(Timeout = 5_000)]
|
||||||
|
[InlineData("bootstrap.min.css", "text/css")]
|
||||||
|
[InlineData("signalr.min.js", "text/javascript")]
|
||||||
|
[InlineData("dashboard.js", "text/javascript")]
|
||||||
|
[InlineData("theme.css", "text/css")]
|
||||||
|
[InlineData("ibm-plex-mono-500.woff2", "font/woff2")]
|
||||||
|
public async Task Get_Asset_ReturnsCorrectContentType(string file, string expectedType)
|
||||||
|
{
|
||||||
|
int adminPort = PickFreePort();
|
||||||
|
int proxyPort = PickFreePort();
|
||||||
|
|
||||||
|
var host = BuildHost(adminPort: adminPort, simHost: "127.0.0.1", simPort: 502,
|
||||||
|
proxyPort: proxyPort, bcd16Addresses: []);
|
||||||
|
await using var _ = new AsyncHostDispose(host);
|
||||||
|
using var startCts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
|
||||||
|
await host.StartAsync(startCts.Token);
|
||||||
|
|
||||||
|
await WaitForAdminAsync(adminPort);
|
||||||
|
|
||||||
|
var response = await HttpClient.GetAsync($"http://127.0.0.1:{adminPort}/assets/{file}",
|
||||||
|
TestContext.Current.CancellationToken);
|
||||||
|
response.StatusCode.ShouldBe(System.Net.HttpStatusCode.OK);
|
||||||
|
response.Content.Headers.ContentType?.MediaType.ShouldBe(expectedType);
|
||||||
|
response.Headers.CacheControl?.ToString().ShouldContain("immutable");
|
||||||
|
|
||||||
|
var bytes = await response.Content.ReadAsByteArrayAsync(TestContext.Current.CancellationToken);
|
||||||
|
|
||||||
|
// The served bytes must be the actual embedded asset — not some other resource
|
||||||
|
// of the same length. Compare against the manifest resource directly.
|
||||||
|
byte[] expected = ReadEmbeddedAsset(file);
|
||||||
|
bytes.ShouldBe(expected, $"GET /assets/{file} must return the embedded asset verbatim");
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Reads a <c>wwwroot</c> asset straight from the assembly's manifest resources.</summary>
|
||||||
|
private static byte[] ReadEmbeddedAsset(string fileName)
|
||||||
|
{
|
||||||
|
using var stream = typeof(Mbproxy.Admin.StatusHub).Assembly
|
||||||
|
.GetManifestResourceStream("Mbproxy.Admin.wwwroot." + fileName)
|
||||||
|
?? throw new InvalidOperationException($"Embedded asset not found: {fileName}");
|
||||||
|
using var ms = new MemoryStream();
|
||||||
|
stream.CopyTo(ms);
|
||||||
|
return ms.ToArray();
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact(Timeout = 5_000)]
|
||||||
|
public async Task Get_UnknownAsset_Returns404()
|
||||||
|
{
|
||||||
|
int adminPort = PickFreePort();
|
||||||
|
int proxyPort = PickFreePort();
|
||||||
|
|
||||||
|
var host = BuildHost(adminPort: adminPort, simHost: "127.0.0.1", simPort: 502,
|
||||||
|
proxyPort: proxyPort, bcd16Addresses: []);
|
||||||
|
await using var _ = new AsyncHostDispose(host);
|
||||||
|
using var startCts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
|
||||||
|
await host.StartAsync(startCts.Token);
|
||||||
|
|
||||||
|
await WaitForAdminAsync(adminPort);
|
||||||
|
|
||||||
|
var response = await HttpClient.GetAsync($"http://127.0.0.1:{adminPort}/assets/no-such-file.js",
|
||||||
|
TestContext.Current.CancellationToken);
|
||||||
|
response.StatusCode.ShouldBe(System.Net.HttpStatusCode.NotFound);
|
||||||
}
|
}
|
||||||
|
|
||||||
// ── 5. AdminPort collision → proxy still runs + bind.failed logged ────────
|
// ── 5. AdminPort collision → proxy still runs + bind.failed logged ────────
|
||||||
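Review note: the `[InlineData]` rows above imply an extension-to-media-type mapping on the asset route. A hedged sketch of such a mapping follows; `AssetContentTypes` is a hypothetical helper, and the real endpoint's lookup is not shown in this diff.

```csharp
using System;
using System.IO;

public static class AssetContentTypes
{
    // Returns the media type for a served asset, or null for an unknown extension.
    public static string? ForFile(string fileName)
        => Path.GetExtension(fileName).ToLowerInvariant() switch
        {
            ".css" => "text/css",
            ".js" => "text/javascript",
            ".woff2" => "font/woff2",
            ".html" => "text/html",
            _ => null,
        };
}
```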
@@ -334,9 +420,9 @@ public sealed class AdminEndpointTests
|
|||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
/// Verifies the admin endpoint rejects non-GET methods (POST / PUT / DELETE)
|
/// Verifies the admin endpoint rejects non-GET methods (POST / PUT / DELETE)
|
||||||
/// with HTTP 405 Method Not Allowed. The design intentionally exposes only `GET /`
|
/// against the read-only routes `GET /` and `GET /status.json` with HTTP 405.
|
||||||
/// and `GET /status.json`; this test guards against an accidental MapPost/Map* being
|
/// (The SignalR hub at `/hub/status` legitimately accepts POST and is not tested
|
||||||
/// added later.
|
/// here.) Guards against an accidental MapPost/Map* being added later.
|
||||||
/// </summary>
|
/// </summary>
|
||||||
[Theory(Timeout = 5_000)]
|
[Theory(Timeout = 5_000)]
|
||||||
[InlineData("POST")]
|
[InlineData("POST")]
|
||||||
|
|||||||
@@ -0,0 +1,50 @@
|
|||||||
|
using System.Text.Json;
|
||||||
|
using Mbproxy.Admin;
|
||||||
|
using Shouldly;
|
||||||
|
using Xunit;
|
||||||
|
|
||||||
|
namespace Mbproxy.Tests.Admin;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Locks the SignalR payload wire shape. The hub serialises detail / fleet payloads
|
||||||
|
/// with a camelCase property policy (see <c>AdminEndpointHost</c>'s <c>AddJsonProtocol</c>),
|
||||||
|
/// and the dashboard JS reads camelCase field names — so a regression to the naming
|
||||||
|
/// policy would silently break every field on the live feed with no other failing test.
|
||||||
|
/// </summary>
|
||||||
|
[Trait("Category", "Unit")]
|
||||||
|
public sealed class DebugDtoSerializationTests
|
||||||
|
{
|
||||||
|
// The exact configuration AdminEndpointHost applies to the hub's
|
||||||
|
// PayloadSerializerOptions — referenced, not copied, so the two cannot drift.
|
||||||
|
private static readonly JsonSerializerOptions HubOptions = BuildHubOptions();
|
||||||
|
|
||||||
|
private static JsonSerializerOptions BuildHubOptions()
|
||||||
|
{
|
||||||
|
var o = new JsonSerializerOptions();
|
||||||
|
AdminEndpointHost.ConfigureHubPayloadJson(o);
|
||||||
|
return o;
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void PlcDetailResponse_SerializesWithCamelCaseFieldNames()
|
||||||
|
{
|
||||||
|
var detail = new PlcDetailResponse(
|
||||||
|
Plc: null,
|
||||||
|
Debug: new PlcDebugSnapshot(
|
||||||
|
CaptureArmed: true,
|
||||||
|
Tags: [new TagValueDto(
|
||||||
|
Address: 100, Width: 16, Name: "Left AirSP", HasValue: true,
|
||||||
|
Direction: "read", RawHex: "0x1234", DecodedValue: 1234,
|
||||||
|
UpdatedAtUtc: "2026-05-16T00:00:00Z", AgeSeconds: 1.5)]));
|
||||||
|
|
||||||
|
string json = JsonSerializer.Serialize(detail, HubOptions);
|
||||||
|
|
||||||
|
// Case.Sensitive throughout — Shouldly's string contains defaults to
|
||||||
|
// case-insensitive, which would not distinguish camelCase from PascalCase.
|
||||||
|
json.ShouldContain("\"captureArmed\"", Case.Sensitive);
|
||||||
|
json.ShouldContain("\"decodedValue\"", Case.Sensitive);
|
||||||
|
json.ShouldContain("\"updatedAtUtc\"", Case.Sensitive);
|
||||||
|
json.ShouldNotContain("\"CaptureArmed\"", Case.Sensitive);
|
||||||
|
json.ShouldNotContain("\"DecodedValue\"", Case.Sensitive);
|
||||||
|
}
|
||||||
|
}
|
||||||
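Review note: the naming policy this test locks down can be reproduced with `System.Text.Json` alone. `TagDto` and `CamelCaseDemo` are hypothetical; the real options come from `AdminEndpointHost.ConfigureHubPayloadJson`, whose body is not part of this diff.

```csharp
using System;
using System.Text.Json;

public sealed record TagDto(bool CaptureArmed, int DecodedValue);

public static class CamelCaseDemo
{
    public static string Serialize(TagDto dto)
    {
        var options = new JsonSerializerOptions
        {
            // PascalCase CLR property names become camelCase JSON field names.
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        };
        return JsonSerializer.Serialize(dto, options);
    }
}
```

Without the policy the same record serialises with `CaptureArmed` and `DecodedValue`, which is the regression the case-sensitive assertions are meant to catch.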
@@ -0,0 +1,63 @@
using System.Reflection;
using Mbproxy.Admin;
using Shouldly;
using Xunit;

namespace Mbproxy.Tests.Admin;

/// <summary>
/// Guards the <c>Admin\wwwroot\*.*</c> embedded-resource glob in <c>Mbproxy.csproj</c>.
/// A broken or narrowed glob would silently drop a UI asset from the single-file binary;
/// the admin endpoint would then 404 it at runtime with no compile-time failure. This
/// test fails the build instead by comparing the on-disk source folder against the
/// assembly's manifest resources.
/// </summary>
[Trait("Category", "Unit")]
public sealed class EmbeddedAssetsTests
{
    private const string ResourcePrefix = "Mbproxy.Admin.wwwroot.";

    [Fact]
    public void EveryWwwrootFile_IsEmbeddedAsAManifestResource()
    {
        var sourceDir = LocateWwwrootSource();
        var sourceFiles = Directory.GetFiles(sourceDir)
            .Select(Path.GetFileName)
            .Where(n => n is not null)
            .Select(n => n!)
            .ToArray();

        sourceFiles.ShouldNotBeEmpty("the source wwwroot folder should contain UI assets");

        var embedded = typeof(StatusHub).Assembly
            .GetManifestResourceNames()
            .Where(n => n.StartsWith(ResourcePrefix, StringComparison.Ordinal))
            .ToHashSet(StringComparer.Ordinal);

        foreach (var file in sourceFiles)
        {
            embedded.ShouldContain(ResourcePrefix + file,
                $"wwwroot asset '{file}' is not embedded — check the EmbeddedResource glob in Mbproxy.csproj");
        }
    }

    /// <summary>
    /// Walks up from the test assembly directory to the repo and returns
    /// <c>src/Mbproxy/Admin/wwwroot</c> — same upward-search pattern the simulator
    /// fixture uses to find <c>tests/sim</c>.
    /// </summary>
    private static string LocateWwwrootSource()
    {
        var dir = new DirectoryInfo(
            Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location) ?? ".");
        while (dir is not null)
        {
            var candidate = Path.Combine(dir.FullName, "src", "Mbproxy", "Admin", "wwwroot");
            if (Directory.Exists(candidate))
                return candidate;
            dir = dir.Parent;
        }
        throw new DirectoryNotFoundException(
            "Could not locate src/Mbproxy/Admin/wwwroot above the test assembly.");
    }
}
@@ -0,0 +1,160 @@
using System.Net;
using System.Net.Http;
using System.Net.Sockets;
using System.Text.Json;
using Mbproxy.Options;
using Mbproxy.Proxy;
using Microsoft.AspNetCore.SignalR.Client;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Serilog;
using Shouldly;
using Xunit;

namespace Mbproxy.Tests.Admin;

/// <summary>
/// End-to-end test of the SignalR live feed: a real <see cref="HubConnection"/> against
/// the live Kestrel admin host. Exercises the whole push path that no other test covers —
/// <see cref="Mbproxy.Admin.StatusHub"/> group joins, the <c>MapHub</c> wiring, the
/// <see cref="Mbproxy.Admin.SignalRStatusPushSink"/>, and the broadcaster loop —
/// confirming that <c>SubscribeFleet</c> yields a <c>"fleet"</c> message and
/// <c>SubscribePlc</c> yields a <c>"plc"</c> message.
/// </summary>
[Trait("Category", "E2E")]
public sealed class HubStatusE2ETests
{
    [Fact(Timeout = 15_000)]
    public async Task SubscribeFleet_ReceivesFleetSnapshot()
    {
        int adminPort = PickFreePort();
        int proxyPort = PickFreePort();

        await using var host = BuildHost(adminPort, proxyPort);
        using var startCts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
        await host.Host.StartAsync(startCts.Token);
        await WaitForAdminAsync(adminPort);

        await using var connection = new HubConnectionBuilder()
            .WithUrl($"http://127.0.0.1:{adminPort}/hub/status")
            .Build();

        var fleet = new TaskCompletionSource<JsonElement>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        connection.On<JsonElement>("fleet", payload => fleet.TrySetResult(payload));

        await connection.StartAsync(TestContext.Current.CancellationToken);
        await connection.InvokeAsync("SubscribeFleet", TestContext.Current.CancellationToken);

        var snapshot = await fleet.Task.WaitAsync(
            TimeSpan.FromSeconds(8), TestContext.Current.CancellationToken);

        // The fleet payload is a StatusResponse — assert a couple of its known fields.
        snapshot.TryGetProperty("service", out _).ShouldBeTrue("fleet payload must carry 'service'");
        snapshot.TryGetProperty("plcs", out var plcs).ShouldBeTrue("fleet payload must carry 'plcs'");
        plcs.ValueKind.ShouldBe(JsonValueKind.Array);
    }

    [Fact(Timeout = 15_000)]
    public async Task SubscribePlc_ReceivesDetailSnapshot()
    {
        int adminPort = PickFreePort();
        int proxyPort = PickFreePort();

        await using var host = BuildHost(adminPort, proxyPort);
        using var startCts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
        await host.Host.StartAsync(startCts.Token);
        await WaitForAdminAsync(adminPort);

        await using var connection = new HubConnectionBuilder()
            .WithUrl($"http://127.0.0.1:{adminPort}/hub/status")
            .Build();

        var detail = new TaskCompletionSource<JsonElement>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        connection.On<JsonElement>("plc", payload => detail.TrySetResult(payload));

        await connection.StartAsync(TestContext.Current.CancellationToken);
        // tabId is a stable per-page-load identifier the real client generates.
        await connection.InvokeAsync("SubscribePlc", "TestPLC", "tab-e2e",
            TestContext.Current.CancellationToken);

        var snapshot = await detail.Task.WaitAsync(
            TimeSpan.FromSeconds(8), TestContext.Current.CancellationToken);

        // The detail payload is a PlcDetailResponse { plc, debug }.
        snapshot.TryGetProperty("debug", out var debug).ShouldBeTrue("detail payload must carry 'debug'");
        debug.TryGetProperty("captureArmed", out _).ShouldBeTrue("debug must carry 'captureArmed'");
        debug.TryGetProperty("tags", out var tags).ShouldBeTrue("debug must carry 'tags'");
        tags.ValueKind.ShouldBe(JsonValueKind.Array);
    }

    // ── Helpers ───────────────────────────────────────────────────────────────

    private static HostHandle BuildHost(int adminPort, int proxyPort)
    {
        var config = new Dictionary<string, string?>
        {
            ["Mbproxy:AdminPort"] = adminPort.ToString(),
            // Fast push cadence so the subscribed client sees a message promptly.
            ["Mbproxy:AdminPushIntervalMs"] = "100",
            ["Mbproxy:Plcs:0:Name"] = "TestPLC",
            ["Mbproxy:Plcs:0:ListenPort"] = proxyPort.ToString(),
            ["Mbproxy:Plcs:0:Host"] = "127.0.0.1",
            ["Mbproxy:Plcs:0:Port"] = "502",
            ["Mbproxy:Connection:BackendConnectTimeoutMs"] = "500",
            ["Mbproxy:Connection:BackendRequestTimeoutMs"] = "500",
        };

        var builder = Host.CreateApplicationBuilder();
        builder.Configuration.AddInMemoryCollection(config);
        builder.Services.AddSerilog(
            new LoggerConfiguration().MinimumLevel.Fatal().CreateLogger(), dispose: false);
        builder.AddMbproxyOptions();
        builder.Services.AddSingleton<IPduPipeline, NoopPduPipeline>();
        builder.Services.AddSingleton<ProxyWorker>();
        builder.Services.AddHostedService(sp => sp.GetRequiredService<ProxyWorker>());
        builder.AddMbproxyAdmin();

        return new HostHandle(builder.Build());
    }

    private static async Task WaitForAdminAsync(int adminPort)
    {
        using var http = new HttpClient();
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
        while (!cts.IsCancellationRequested)
        {
            try
            {
                var r = await http.GetAsync($"http://127.0.0.1:{adminPort}/status.json", cts.Token);
                if (r.StatusCode == HttpStatusCode.OK) return;
            }
            catch { }
            await Task.Delay(100, cts.Token).ConfigureAwait(false);
        }
        throw new TimeoutException($"Admin endpoint on port {adminPort} did not start in time.");
    }

    private static int PickFreePort()
    {
        var l = new TcpListener(IPAddress.Loopback, 0);
        l.Start();
        int port = ((IPEndPoint)l.LocalEndpoint).Port;
        l.Stop();
        return port;
    }

    private sealed class HostHandle(IHost host) : IAsyncDisposable
    {
        public IHost Host { get; } = host;

        public async ValueTask DisposeAsync()
        {
            using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
            try { await Host.StopAsync(cts.Token); } catch { }
            Host.Dispose();
        }
    }
}
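Review note: both E2E tests wait for the first pushed message with a `TaskCompletionSource` plus `WaitAsync`. The sketch below isolates that pattern without SignalR. `OnMessage` is a hypothetical stand-in for the `connection.On<T>(...)` callback, and the delays are arbitrary.

```csharp
using System;
using System.Threading.Tasks;

// One-shot "first message wins" latch. RunContinuationsAsynchronously keeps the
// awaiting code from running inline on the thread that calls TrySetResult.
var firstMessage = new TaskCompletionSource<string>(
    TaskCreationOptions.RunContinuationsAsynchronously);

// Hypothetical stand-in for the hub callback; it may fire many times.
void OnMessage(string payload) => firstMessage.TrySetResult(payload);

// Simulate the server pushing twice shortly after the subscription.
_ = Task.Run(async () =>
{
    await Task.Delay(50);
    OnMessage("first");
    OnMessage("second"); // ignored: TrySetResult returns false once completed
});

// WaitAsync (.NET 6+) throws TimeoutException if nothing arrives in time.
string received = await firstMessage.Task.WaitAsync(TimeSpan.FromSeconds(2));
Console.WriteLine(received); // first
```

Using `TrySetResult` rather than `SetResult` matters here because the broadcaster keeps pushing on every interval, so the handler is expected to run more than once.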
@@ -0,0 +1,127 @@
using Mbproxy.Admin;
using Shouldly;
using Xunit;

namespace Mbproxy.Tests.Admin;

/// <summary>
/// Unit tests for <see cref="PlcSubscriptionTracker"/> — the tab-keyed, reconnect-safe
/// record of which PLC detail pages are open. Includes a concurrency stress test, since
/// the tracker is mutated from multiple SignalR hub-dispatch threads.
/// </summary>
[Trait("Category", "Unit")]
public sealed class PlcSubscriptionTrackerTests
{
    [Fact]
    public void Subscribe_ThenRemoveLastConnection_ClearsViewer()
    {
        var t = new PlcSubscriptionTracker();
        t.SubscribePlc("c1", "tab", "plc");
        t.ActivePlcs().ShouldBe(["plc"]);

        t.RemoveConnection("c1");
        t.ActivePlcs().ShouldBeEmpty();
    }

    [Fact]
    public void SameTab_TwoConnections_StaysActiveUntilLastConnectionGone()
    {
        // Reconnect overlap: the same tab briefly holds two connections. Dropping the
        // old one must not release the tab — this is the leak C2 guards against.
        var t = new PlcSubscriptionTracker();
        t.SubscribePlc("c-old", "tab", "plc");
        t.SubscribePlc("c-new", "tab", "plc");

        t.RemoveConnection("c-old");
        t.ActivePlcs().ShouldContain("plc", "the tab is still alive on the second connection");

        t.RemoveConnection("c-new");
        t.ActivePlcs().ShouldBeEmpty();
    }

    [Fact]
    public void SameTab_TwoConnections_RemovedNewestFirst_StaysActiveUntilLast()
    {
        // Mirror of SameTab_TwoConnections_StaysActiveUntilLastConnectionGone: the
        // reconnect's NEW connection is the one that drops first (the order is not
        // guaranteed). The tab must still be alive on the surviving old connection.
        var t = new PlcSubscriptionTracker();
        t.SubscribePlc("c-old", "tab", "plc");
        t.SubscribePlc("c-new", "tab", "plc");

        t.RemoveConnection("c-new");
        t.ActivePlcs().ShouldContain("plc", "the tab still holds the old connection");

        t.RemoveConnection("c-old");
        t.ActivePlcs().ShouldBeEmpty();
    }

    [Fact]
    public void DistinctTabs_AreCountedSeparately()
    {
        var t = new PlcSubscriptionTracker();
        t.SubscribePlc("c1", "tab-A", "plc");
        t.SubscribePlc("c2", "tab-B", "plc");

        t.RemoveConnection("c1");
        t.ActivePlcs().ShouldContain("plc", "the second tab still views the PLC");

        t.RemoveConnection("c2");
        t.ActivePlcs().ShouldBeEmpty();
    }

    [Fact]
    public void RepeatedSubscribe_SameTabSamePlc_IsIdempotent()
    {
        var t = new PlcSubscriptionTracker();
        t.SubscribePlc("c1", "tab", "plc");
        t.SubscribePlc("c1", "tab", "plc"); // redundant repeat
        t.ActivePlcs().ShouldBe(["plc"]);

        t.RemoveConnection("c1");
        t.ActivePlcs().ShouldBeEmpty("a repeated subscribe must not inflate the viewer count");
    }

    [Fact]
    public void OneConnection_MultiplePlcs_AllReleasedTogether()
    {
        var t = new PlcSubscriptionTracker();
        t.SubscribePlc("c1", "tab", "plc-a");
        t.SubscribePlc("c1", "tab", "plc-b");
        t.ActivePlcs().Count.ShouldBe(2);

        t.RemoveConnection("c1");
        t.ActivePlcs().ShouldBeEmpty();
    }

    [Fact]
    public void RemoveConnection_Unknown_IsNoOp()
    {
        var t = new PlcSubscriptionTracker();

        Should.NotThrow(() => t.RemoveConnection("never-seen"));
        t.ActivePlcs().ShouldBeEmpty();
    }

    [Fact]
    public async Task ConcurrentSubscribeAndRemove_NeverLeaksOrThrows()
    {
        var t = new PlcSubscriptionTracker();
        const int tasks = 16;
        const int iterations = 5_000;

        await Task.WhenAll(Enumerable.Range(0, tasks).Select(taskNo => Task.Run(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                string conn = $"c{taskNo}-{i}";
                string tab = $"tab{taskNo}-{i}";
                t.SubscribePlc(conn, tab, "plc");
                t.RemoveConnection(conn);
            }
        }, TestContext.Current.CancellationToken)));

        t.ActivePlcs().ShouldBeEmpty(
            "every subscribe was paired with a remove — no viewer count may leak");
    }
}
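Review note: the tracker's implementation is not part of this hunk, so the following is only a hypothetical sketch of one design that would satisfy the behaviours these tests pin down. It is not the real `PlcSubscriptionTracker`. It stores subscriptions per connection and derives the active set on demand instead of keeping a viewer counter, which is one way to make repeated subscribes and unknown removals harmless. All names here are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, NOT the real PlcSubscriptionTracker: state is keyed by
// connection, and the active set is derived from it rather than counted.
var gate = new object();
var byConnection = new Dictionary<string, HashSet<(string Tab, string Plc)>>();

void SubscribePlc(string connectionId, string tabId, string plc)
{
    lock (gate)
    {
        if (!byConnection.TryGetValue(connectionId, out var subs))
            byConnection[connectionId] = subs = new HashSet<(string Tab, string Plc)>();
        subs.Add((tabId, plc)); // HashSet makes a repeated subscribe a no-op
    }
}

void RemoveConnection(string connectionId)
{
    lock (gate) { byConnection.Remove(connectionId); } // unknown id: nothing to remove
}

string[] ActivePlcs()
{
    lock (gate)
    {
        return byConnection.Values
            .SelectMany(subs => subs)
            .Select(sub => sub.Plc)
            .Distinct()
            .ToArray();
    }
}

// Reconnect overlap: one tab briefly holds two connections.
SubscribePlc("c-old", "tab", "plc");
SubscribePlc("c-new", "tab", "plc");
RemoveConnection("c-old");
string[] afterFirstDrop = ActivePlcs();  // still contains "plc"
RemoveConnection("c-new");
string[] afterSecondDrop = ActivePlcs(); // empty
Console.WriteLine($"{afterFirstDrop.Length} then {afterSecondDrop.Length}"); // 1 then 0
```

A single lock keeps the sketch simple; whether the real class uses a lock or concurrent collections cannot be told from the tests alone.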
@@ -0,0 +1,79 @@
using System.Collections.Concurrent;
using System.Security.Claims;
using Mbproxy.Admin;
using Microsoft.AspNetCore.Http.Features;
using Microsoft.AspNetCore.SignalR;

namespace Mbproxy.Tests.Admin;

/// <summary>
/// Minimal hand-written test doubles for the SignalR surface <see cref="StatusHub"/>
/// and <see cref="StatusBroadcaster"/> touch. The project carries no mocking framework,
/// so these record just enough to assert behaviour.
/// </summary>
internal sealed class FakeHubCallerContext : HubCallerContext
{
    public FakeHubCallerContext(string connectionId) => ConnectionId = connectionId;

    public override string ConnectionId { get; }
    public override string? UserIdentifier => null;
    public override ClaimsPrincipal? User => null;
    public override IDictionary<object, object?> Items { get; } = new Dictionary<object, object?>();
    public override IFeatureCollection Features { get; } = new FeatureCollection();
    public override CancellationToken ConnectionAborted => CancellationToken.None;
    public override void Abort() { }
}

/// <summary>Records every group join so tests can assert membership changes.</summary>
internal sealed class FakeGroupManager : IGroupManager
{
    public List<(string ConnectionId, string Group)> Added { get; } = [];

    public Task AddToGroupAsync(string connectionId, string groupName, CancellationToken cancellationToken = default)
    {
        Added.Add((connectionId, groupName));
        return Task.CompletedTask;
    }

    // StatusHub.OnDisconnectedAsync never calls RemoveFromGroupAsync — SignalR removes a
    // disconnected connection from its groups implicitly. Nothing to record here.
    public Task RemoveFromGroupAsync(string connectionId, string groupName, CancellationToken cancellationToken = default)
        => Task.CompletedTask;
}

/// <summary>Records every push so <see cref="StatusBroadcaster"/> tests can assert routing.</summary>
internal sealed class FakeStatusPushSink : IStatusPushSink
{
    private readonly ConcurrentBag<StatusResponse> _fleet = [];
    private readonly ConcurrentBag<(string Plc, PlcDetailResponse Detail)> _plc = [];

    public IReadOnlyCollection<StatusResponse> FleetPushes => _fleet;
    public IReadOnlyCollection<(string Plc, PlcDetailResponse Detail)> PlcPushes => _plc;

    public Task PushFleetAsync(StatusResponse snapshot, CancellationToken ct)
    {
        _fleet.Add(snapshot);
        return Task.CompletedTask;
    }

    public Task PushPlcAsync(string plcName, PlcDetailResponse detail, CancellationToken ct)
    {
        _plc.Add((plcName, detail));
        return Task.CompletedTask;
    }
}

/// <summary>
/// An <see cref="IStatusPushSink"/> whose every push fails with an exception from the
/// supplied factory — used to prove <see cref="StatusBroadcaster.PushOnceAsync"/> swallows
/// a transport fault (and that its <c>when (ex is not OperationCanceledException)</c> filter
/// still lets a cancellation propagate).
/// </summary>
internal sealed class ThrowingStatusPushSink(Func<Exception> exceptionFactory) : IStatusPushSink
{
    public Task PushFleetAsync(StatusResponse snapshot, CancellationToken ct)
        => throw exceptionFactory();

    public Task PushPlcAsync(string plcName, PlcDetailResponse detail, CancellationToken ct)
        => throw exceptionFactory();
}
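Review note: the `ThrowingStatusPushSink` doc comment refers to a `when (ex is not OperationCanceledException)` filter in the broadcaster, whose body is not in this hunk. The sketch below shows how such a filter behaves in isolation. The local `PushOnceAsync` is a hypothetical stand-in, not the real `StatusBroadcaster.PushOnceAsync`.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for one push cycle.
async Task PushOnceAsync(Func<Task> push)
{
    try
    {
        await push();
    }
    catch (Exception ex) when (ex is not OperationCanceledException)
    {
        // Transport fault: report it and let the next cycle retry.
        Console.WriteLine($"push failed: {ex.Message}");
    }
}

// A non-cancellation fault is swallowed by the filter.
await PushOnceAsync(() => throw new InvalidOperationException("boom"));

// A cancellation does not match the filter and reaches the caller.
bool cancellationPropagated = false;
try
{
    await PushOnceAsync(() => throw new OperationCanceledException());
}
catch (OperationCanceledException)
{
    cancellationPropagated = true;
}
Console.WriteLine(cancellationPropagated); // True
```

Because `TaskCanceledException` derives from `OperationCanceledException`, a cancelled `Task.Delay` or HTTP call would propagate through the same filter.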
@@ -0,0 +1,192 @@
using Mbproxy.Admin;
using Mbproxy.Bcd;
using Mbproxy.Options;
using Mbproxy.Proxy;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Serilog;
using Shouldly;
using Xunit;

namespace Mbproxy.Tests.Admin;

/// <summary>
/// Unit tests for <see cref="StatusBroadcaster"/>'s push-cycle logic — fleet always
/// pushed, per-PLC pushed only for PLCs with a detail-page subscriber, and every
/// capture disarmed on stop. The SignalR sink is faked; a real
/// <see cref="StatusSnapshotBuilder"/> is resolved from a minimal in-process host.
/// </summary>
[Trait("Category", "Unit")]
public sealed class StatusBroadcasterTests
{
    private sealed record Harness(
        IHost Host,
        StatusBroadcaster Broadcaster,
        FakeStatusPushSink Sink,
        StatusSnapshotBuilder Builder,
        TagCaptureRegistry Registry,
        PlcSubscriptionTracker Tracker) : IAsyncDisposable
    {
        public async ValueTask DisposeAsync()
        {
            await Broadcaster.DisposeAsync();
            using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
            try { await Host.StopAsync(cts.Token); } catch { }
            Host.Dispose();
        }
    }

    private static async Task<Harness> BuildAsync(IStatusPushSink? sinkOverride = null)
    {
        var hostBuilder = Host.CreateApplicationBuilder();
        hostBuilder.Configuration.AddInMemoryCollection(new Dictionary<string, string?>
        {
            ["Mbproxy:AdminPort"] = "0",
            // Fast tick so the LoopAsync test observes several cycles quickly.
            ["Mbproxy:AdminPushIntervalMs"] = "100",
        });
        hostBuilder.Services.AddSerilog(
            new LoggerConfiguration().MinimumLevel.Fatal().CreateLogger(), dispose: false);
        hostBuilder.AddMbproxyOptions();
        hostBuilder.Services.AddSingleton<IPduPipeline, NoopPduPipeline>();
        hostBuilder.Services.AddSingleton<ProxyWorker>();
        hostBuilder.Services.AddHostedService(sp => sp.GetRequiredService<ProxyWorker>());
        hostBuilder.Services.AddSingleton<AssemblyVersionAccessor>();
        hostBuilder.Services.AddSingleton<StatusSnapshotBuilder>();

        var host = hostBuilder.Build();
        using var startCts = new CancellationTokenSource(TimeSpan.FromSeconds(15));
        await host.StartAsync(startCts.Token);

        var builder = host.Services.GetRequiredService<StatusSnapshotBuilder>();
        var registry = host.Services.GetRequiredService<TagCaptureRegistry>();
        var options = host.Services.GetRequiredService<IOptionsMonitor<MbproxyOptions>>();
        var tracker = new PlcSubscriptionTracker();
        var sink = new FakeStatusPushSink();

        var broadcaster = new StatusBroadcaster(
            sinkOverride ?? sink, builder, tracker, registry, options, NullLogger.Instance);

        return new Harness(host, broadcaster, sink, builder, registry, tracker);
    }

    [Fact]
    public async Task PushOnce_AlwaysPushesFleet()
    {
        await using var h = await BuildAsync();

        await h.Broadcaster.PushOnceAsync(TestContext.Current.CancellationToken);

        h.Sink.FleetPushes.Count.ShouldBe(1);
    }

    [Fact]
    public async Task PushOnce_NoActivePlcs_SkipsPerPlcPush()
    {
        await using var h = await BuildAsync();

        await h.Broadcaster.PushOnceAsync(TestContext.Current.CancellationToken);

        h.Sink.PlcPushes.ShouldBeEmpty();
    }

    [Fact]
    public async Task PushOnce_ActivePlc_PushesDetailWithDebugSnapshot()
    {
        await using var h = await BuildAsync();
        h.Registry.GetOrCreate("plc-x", BcdTagMap.Empty);
        h.Tracker.SubscribePlc("conn-1", "tab-1", "plc-x");

        await h.Broadcaster.PushOnceAsync(TestContext.Current.CancellationToken);

        var push = h.Sink.PlcPushes.ShouldHaveSingleItem();
        push.Plc.ShouldBe("plc-x");
        push.Detail.Debug.ShouldNotBeNull();
    }

    [Fact]
    public async Task PushOnce_ReconcilesCaptureArmState_FromActiveViewers()
    {
        await using var h = await BuildAsync();
        h.Registry.GetOrCreate("plc-x", BcdTagMap.Empty);

        // No viewer yet — a push must leave the capture disarmed.
        await h.Broadcaster.PushOnceAsync(TestContext.Current.CancellationToken);
        h.Registry.TryGet("plc-x", out var capture).ShouldBeTrue();
        capture.IsArmed.ShouldBeFalse("no detail page open — capture stays disarmed");

        // A viewer opens the detail page — the next push arms the capture.
        h.Tracker.SubscribePlc("conn-1", "tab-1", "plc-x");
        await h.Broadcaster.PushOnceAsync(TestContext.Current.CancellationToken);
        capture.IsArmed.ShouldBeTrue("the broadcaster reconciles the capture armed for a viewed PLC");

        // The viewer leaves — the next push disarms it again.
        h.Tracker.RemoveConnection("conn-1");
        await h.Broadcaster.PushOnceAsync(TestContext.Current.CancellationToken);
        capture.IsArmed.ShouldBeFalse("the broadcaster disarms a capture once its last viewer leaves");
    }

    [Fact]
    public async Task PushOnce_SinkThrowsNonCancellation_FailureIsSwallowed()
    {
        // A SignalR transport fault on a push must not escape PushOnceAsync — the loop
        // has to survive it and retry on the next cycle.
        var throwing = new ThrowingStatusPushSink(() => new InvalidOperationException("boom"));
        await using var h = await BuildAsync(throwing);

        await Should.NotThrowAsync(
            () => h.Broadcaster.PushOnceAsync(TestContext.Current.CancellationToken));
    }

    [Fact]
    public async Task PushOnce_SinkThrowsOperationCanceled_Propagates()
    {
        // The catch filters are `when (ex is not OperationCanceledException)` — a genuine
        // cancellation must propagate so the loop unwinds at shutdown instead of being
        // swallowed and retried.
        var throwing = new ThrowingStatusPushSink(() => new OperationCanceledException());
        await using var h = await BuildAsync(throwing);

        await Should.ThrowAsync<OperationCanceledException>(
            () => h.Broadcaster.PushOnceAsync(TestContext.Current.CancellationToken));
    }

    [Fact]
    public async Task StopAsync_DisarmsEveryCapture()
    {
        await using var h = await BuildAsync();
        h.Registry.GetOrCreate("plc-x", BcdTagMap.Empty);
        h.Registry.ReconcileArmed(["plc-x"]);

        await h.Broadcaster.StopAsync();

        h.Registry.TryGet("plc-x", out var capture).ShouldBeTrue();
        capture.IsArmed.ShouldBeFalse();
    }

    [Fact]
    public async Task Loop_PushesRepeatedly_ThenStopsAfterStopAsync()
    {
        await using var h = await BuildAsync();

        h.Broadcaster.Start();

        // The harness runs at AdminPushIntervalMs = 100 ms; wait (generously) for the
        // background loop to complete several cycles.
        var deadline = DateTime.UtcNow.AddSeconds(10);
        while (h.Sink.FleetPushes.Count < 3 && DateTime.UtcNow < deadline)
            await Task.Delay(50, TestContext.Current.CancellationToken);

        h.Sink.FleetPushes.Count.ShouldBeGreaterThanOrEqualTo(3,
            "the background loop must push the fleet snapshot every interval");

        // StopAsync awaits the loop task before returning, so the loop is guaranteed
        // terminated here — no settling delay is needed for the assertion to be sound.
        await h.Broadcaster.StopAsync();
        int afterStop = h.Sink.FleetPushes.Count;
        h.Sink.FleetPushes.Count.ShouldBe(afterStop, "no pushes may occur after StopAsync");
    }
}
@@ -1,128 +0,0 @@
using Mbproxy.Admin;
using Shouldly;
using Xunit;

namespace Mbproxy.Tests.Admin;

/// <summary>
/// Unit tests for <see cref="StatusHtmlRenderer"/>.
/// All tests are pure: no network, no host, no DI.
/// </summary>
[Trait("Category", "Unit")]
public sealed class StatusHtmlRendererTests
{
    // ── Helpers ───────────────────────────────────────────────────────────────

    private static StatusResponse MakeStatus(
        IReadOnlyList<PlcStatus>? plcs = null,
        int uptimeSeconds = 42,
        string version = "1.2.3")
    {
        var service = new ServiceFields(
            UptimeSeconds: uptimeSeconds,
            Version: version,
            ConfigLastReloadUtc: null,
            ConfigReloadCount: 0,
            ConfigReloadRejectedCount: 0);

        var listeners = new ListenersAggregate(Bound: plcs?.Count ?? 0, Configured: plcs?.Count ?? 0);
        return new StatusResponse(service, listeners, plcs ?? []);
    }

    private static PlcStatus MakePlc(
        string name = "PLC-A",
        string state = "bound",
        string? lastBindError = null,
        int recoveryAttempts = 0,
        IReadOnlyList<ClientSnapshot>? clients = null)
    {
        var noClients = (IReadOnlyList<ClientSnapshot>)[];
        return new PlcStatus(
            Name: name,
            Host: "10.0.0.1",
            ListenPort: 5020,
            Listener: new PlcListenerStatus(state, lastBindError, recoveryAttempts),
            Clients: new PlcClientsStatus(clients?.Count ?? 0, clients ?? noClients),
            Pdus: new PlcPdusStatus(100, new FcCounts(50, 10, 20, 15, 5), 30, 2, 0),
            Backend: new PlcBackendStatus(
                ConnectsSuccess: 0, ConnectsFailed: 0,
                ExceptionsByCode: new ExceptionCounts(1, 0, 0, 0, 0),
                LastRoundTripMs: 3.5,
                InFlight: 0, MaxInFlight: 0, TxIdWraps: 0,
                DisconnectCascades: 0, QueueDepth: 0,
                CoalescedHitCount: 0, CoalescedMissCount: 0,
                CoalescedResponseToDeadUpstream: 0,
                CacheHitCount: 0, CacheMissCount: 0,
                CacheInvalidations: 0, CacheEntryCount: 0, CacheBytes: 0,
                BackendHeartbeatsSent: 0, BackendHeartbeatsFailed: 0,
                BackendIdleDisconnects: 0),
            Bytes: new PlcBytesStatus(1024, 2048));
    }

    // ── 1. Valid HTML with meta-refresh for a single PLC ─────────────────────

    [Fact]
    public void Render_OnePlc_ProducesValidHtml_WithMetaRefresh()
    {
        var status = MakeStatus([MakePlc("PLC-A", "bound")]);

        string html = StatusHtmlRenderer.Render(status);

        html.ShouldContain("<meta http-equiv=\"refresh\" content=\"5\">");
        html.ShouldContain("<!DOCTYPE html>");
        html.ShouldContain("</html>");
        html.ShouldContain("PLC-A");
        html.ShouldContain("bound");
    }

    // ── 2. Recovering state highlights error ─────────────────────────────────

    [Fact]
    public void Render_RecoveringPlc_HighlightsState()
    {
        var plc = MakePlc("PLC-B", "recovering", lastBindError: "Address already in use", recoveryAttempts: 3);
        var status = MakeStatus([plc]);

        string html = StatusHtmlRenderer.Render(status);

        // State should be orange.
        html.ShouldContain("class=\"recovering\"");
        html.ShouldContain("Address already in use");
        html.ShouldContain("attempt 3");
    }

    // ── 3. Page weight under 50 KB for 54 PLCs ───────────────────────────────

    [Fact]
    public void Render_PageWeightUnder50KB_For54Plcs()
    {
        const int plcCount = 54;

        // Build 54 realistic PLC rows with 2 clients each.
        var plcs = new List<PlcStatus>(plcCount);
        for (int i = 0; i < plcCount; i++)
        {
            var clients = new List<ClientSnapshot>
            {
                new ClientSnapshot($"10.0.0.{i + 1}:49123", DateTimeOffset.UtcNow, 42),
                new ClientSnapshot($"10.0.0.{i + 1}:49124", DateTimeOffset.UtcNow, 17),
            };

            plcs.Add(MakePlc(
                name: $"Line{i / 10 + 1}-Station{i % 10 + 1:D2}",
                state: i % 5 == 0 ? "recovering" : "bound",
                lastBindError: i % 5 == 0 ? "EADDRINUSE" : null,
                recoveryAttempts: i % 5 == 0 ? 2 : 0,
                clients: clients));
        }

        var status = MakeStatus(plcs);

        string html = StatusHtmlRenderer.Render(status);
        int byteCount = System.Text.Encoding.UTF8.GetByteCount(html);

        // Assert ≤ 50 KB.
        byteCount.ShouldBeLessThanOrEqualTo(50 * 1024,
            $"Page weight {byteCount} bytes exceeds 50 KB limit for {plcCount} PLCs");
    }
}
@@ -0,0 +1,119 @@
+using Mbproxy.Admin;
+using Shouldly;
+using Xunit;
+
+namespace Mbproxy.Tests.Admin;
+
+/// <summary>
+/// Unit tests for <see cref="StatusHub"/> — group joins and subscription tracking.
+/// Capture arming is the broadcaster's job; the hub only mutates the
+/// <see cref="PlcSubscriptionTracker"/>. Uses hand-written SignalR test doubles
+/// (see <see cref="SignalRFakes"/>); no SignalR host is started.
+/// </summary>
+[Trait("Category", "Unit")]
+public sealed class StatusHubTests
+{
+    private static StatusHub MakeHub(
+        string connectionId, PlcSubscriptionTracker tracker, out FakeGroupManager groups)
+    {
+        groups = new FakeGroupManager();
+        return new StatusHub(tracker)
+        {
+            Context = new FakeHubCallerContext(connectionId),
+            Groups = groups,
+        };
+    }
+
+    [Fact]
+    public async Task SubscribeFleet_JoinsFleetGroup()
+    {
+        var hub = MakeHub("conn-1", new PlcSubscriptionTracker(), out var groups);
+
+        await hub.SubscribeFleet();
+
+        groups.Added.ShouldContain(("conn-1", StatusHub.FleetGroup));
+    }
+
+    [Fact]
+    public async Task SubscribePlc_JoinsPlcGroup_AndTracksViewer()
+    {
+        var tracker = new PlcSubscriptionTracker();
+        var hub = MakeHub("conn-1", tracker, out var groups);
+
+        await hub.SubscribePlc("plc-1", "tab-A");
+
+        groups.Added.ShouldContain(("conn-1", StatusHub.PlcGroup("plc-1")));
+        tracker.ActivePlcs().ShouldContain("plc-1");
+    }
+
+    [Fact]
+    public async Task Reconnect_SameTab_NewConnection_DoesNotLeakViewer()
+    {
+        // A transport reconnect: the same browser tab acquires a new ConnectionId and
+        // re-subscribes; the old connection's OnDisconnectedAsync then fires late. The
+        // PLC must not be left with a stranded viewer once the tab finally closes.
+        var tracker = new PlcSubscriptionTracker();
+
+        var first = MakeHub("conn-old", tracker, out _);
+        await first.SubscribePlc("plc-1", "tab-A");
+
+        var second = MakeHub("conn-new", tracker, out _);
+        await second.SubscribePlc("plc-1", "tab-A");
+
+        await first.OnDisconnectedAsync(null); // late disconnect of the old connection
+        tracker.ActivePlcs().ShouldContain("plc-1",
+            "the tab is still open on the reconnected connection");
+
+        await second.OnDisconnectedAsync(null); // the tab finally closes
+        tracker.ActivePlcs().ShouldBeEmpty("no viewer may be stranded after the tab closes");
+    }
+
+    [Fact]
+    public async Task Reconnect_SameTab_NewConnectionDisconnectsFirst_DoesNotLeakViewer()
+    {
+        // Mirror of the reconnect test above with the disconnect ordering reversed: the
+        // NEW connection's OnDisconnectedAsync arrives before the old one's. SignalR does
+        // not guarantee the order, so the tracker must be correct either way.
+        var tracker = new PlcSubscriptionTracker();
+
+        var first = MakeHub("conn-old", tracker, out _);
+        await first.SubscribePlc("plc-1", "tab-A");
+
+        var second = MakeHub("conn-new", tracker, out _);
+        await second.SubscribePlc("plc-1", "tab-A");
+
+        await second.OnDisconnectedAsync(null); // the new connection drops first
+        tracker.ActivePlcs().ShouldContain("plc-1",
+            "the tab is still open on the original connection");
+
+        await first.OnDisconnectedAsync(null);
+        tracker.ActivePlcs().ShouldBeEmpty("no viewer may be stranded after the tab closes");
+    }
+
+    [Fact]
+    public async Task TwoTabs_FirstCloseKeepsActive_LastCloseClears()
+    {
+        var tracker = new PlcSubscriptionTracker();
+
+        var tabA = MakeHub("conn-a", tracker, out _);
+        var tabB = MakeHub("conn-b", tracker, out _);
+        await tabA.SubscribePlc("plc-1", "tab-A");
+        await tabB.SubscribePlc("plc-1", "tab-B");
+
+        await tabA.OnDisconnectedAsync(null);
+        tracker.ActivePlcs().ShouldContain("plc-1", "a second tab is still viewing the PLC");
+
+        await tabB.OnDisconnectedAsync(null);
+        tracker.ActivePlcs().ShouldBeEmpty();
+    }
+
+    [Fact]
+    public async Task SubscribePlc_UnknownPlc_DoesNotThrow()
+    {
+        var hub = MakeHub("conn-1", new PlcSubscriptionTracker(), out var groups);
+
+        await Should.NotThrowAsync(async () => await hub.SubscribePlc("ghost", "tab-A"));
+
+        groups.Added.ShouldContain(("conn-1", StatusHub.PlcGroup("ghost")));
+    }
+}
@@ -218,18 +218,60 @@ public sealed class StatusSnapshotBuilderTests
         result.Service.ConfigReloadCount.ShouldBe(1);
     }
 
+    // ── 7. BuildDebug: unknown PLC → empty, disarmed snapshot ────────────────
+
+    [Fact]
+    public async Task BuildDebug_UnknownPlc_ReturnsEmptyDisarmedSnapshot()
+    {
+        var (host, builder) = await BuildAsync([]);
+        await using var _ = new AsyncHostDispose(host);
+
+        var debug = builder.BuildDebug("no-such-plc");
+
+        debug.CaptureArmed.ShouldBeFalse();
+        debug.Tags.ShouldBeEmpty();
+    }
+
+    // ── 8. BuildDebug: configured PLC → one row per BCD tag, no traffic ──────
+
+    [Fact]
+    public async Task BuildDebug_ConfiguredPlc_ReturnsTagRows_DisarmedByDefault()
+    {
+        int port = PickFreePort();
+        var (host, builder) = await BuildAsync([("PLC-A", port)], bcd16Address: 1072);
+        await using var _ = new AsyncHostDispose(host);
+
+        await WaitForAsync(() => CanConnect(port), TimeSpan.FromSeconds(5), "PLC-A should bind");
+
+        var debug = builder.BuildDebug("PLC-A");
+
+        debug.CaptureArmed.ShouldBeFalse(); // no detail page open
+        var tag = debug.Tags.ShouldHaveSingleItem();
+        tag.Address.ShouldBe(1072);
+        tag.Width.ShouldBe(16);
+        tag.HasValue.ShouldBeFalse();
+        tag.RawHex.ShouldBe("—");
+    }
+
     // ── Helpers ───────────────────────────────────────────────────────────────
 
     private static async Task<(IHost host, StatusSnapshotBuilder builder)> BuildAsync(
         (string name, int port)[] plcs,
         int startupWaitMs = 200,
-        int backendPort = 502)
+        int backendPort = 502,
+        int? bcd16Address = null)
     {
         var config = new Dictionary<string, string?>
         {
             ["Mbproxy:AdminPort"] = "0", // disable admin for unit tests
         };
 
+        if (bcd16Address is { } addr)
+        {
+            config["Mbproxy:BcdTags:Global:0:Address"] = addr.ToString();
+            config["Mbproxy:BcdTags:Global:0:Width"] = "16";
+        }
+
         for (int i = 0; i < plcs.Length; i++)
         {
             config[$"Mbproxy:Plcs:{i}:Name"] = plcs[i].name;
@@ -54,6 +54,29 @@ public sealed class BcdTagMapBuilderTests
         t32.Width.ShouldBe((byte)32);
     }
 
+    [Fact]
+    public void Build_CarriesOptionalTagName_IntoResolvedMap()
+    {
+        // The optional human-friendly Name flows from config options through to the
+        // resolved BcdTag; an omitted Name resolves to null.
+        var global = new BcdTagListOptions
+        {
+            Global =
+            [
+                new BcdTagOptions { Address = 1548, Width = 16, Name = "Left AirSP" },
+                new BcdTagOptions { Address = 1080, Width = 32 },
+            ],
+        };
+
+        var result = BcdTagMapBuilder.Build(global, perPlc: null);
+
+        result.Errors.ShouldBeEmpty();
+        result.Map.TryGet(1548, out var named).ShouldBeTrue();
+        named.Name.ShouldBe("Left AirSP");
+        result.Map.TryGet(1080, out var unnamed).ShouldBeTrue();
+        unnamed.Name.ShouldBeNull();
+    }
+
     [Fact]
     public void Build_PerPlcAdd_AppendsToGlobal()
     {
@@ -68,12 +68,14 @@ public sealed class ConfigReconcilerTests : IAsyncDisposable
 
     private ConfigReconciler BuildReconciler(
         IOptionsMonitor<MbproxyOptions> monitor,
-        ServiceCounters? counters = null)
+        ServiceCounters? counters = null,
+        Mbproxy.Proxy.TagCaptureRegistry? captureRegistry = null)
     {
         return new ConfigReconciler(
             monitor,
             NullLoggerFactory.Instance,
-            counters ?? new ServiceCounters());
+            counters ?? new ServiceCounters(),
+            captureRegistry ?? new Mbproxy.Proxy.TagCaptureRegistry());
     }
 
     // The reconciler and supervisors tracked for cleanup.
@@ -346,6 +348,53 @@ public sealed class ConfigReconcilerTests : IAsyncDisposable
         foreach (var s in supervisors.Values)
             _supervisors.Add(s);
     }
+
+    // ── Test 6: Reconciler ↔ TagCaptureRegistry wiring ────────────────────────────────────
+
+    /// <summary>
+    /// The reconciler owns the tag-capture lifecycle for hot-reload: a PLC added by a
+    /// reload must get a capture entry (<c>GetOrCreate</c>), and a PLC removed by a
+    /// reload must have its capture entry dropped (<c>Remove</c>). Holds a real
+    /// <see cref="Mbproxy.Proxy.TagCaptureRegistry"/> and asserts <c>TryGet</c> tracks
+    /// the roster across an add reload and a remove reload.
+    /// </summary>
+    [Fact]
+    public async Task Apply_AddThenRemovePlc_TagCaptureRegistryTracksRoster()
+    {
+        int portA = PickFreePort();
+        int portB = PickFreePort();
+
+        var plcA = MakePlc("A", portA);
+        var plcB = MakePlc("B", portB);
+        var initial = MakeOptions([plcA]);
+        var withB = MakeOptions([plcA, plcB]);
+
+        var supA = BuildSupervisor(plcA);
+        _supervisors.Add(supA);
+        await supA.StartAsync(CancellationToken.None);
+
+        var supervisors = new ConcurrentDictionary<string, PlcListenerSupervisor>(StringComparer.Ordinal)
+        {
+            ["A"] = supA,
+        };
+
+        var registry = new Mbproxy.Proxy.TagCaptureRegistry();
+        var monitor = new FakeOptionsMonitor(initial);
+        var reconciler = BuildReconciler(monitor, captureRegistry: registry);
+        _reconcilers.Add(reconciler);
+        reconciler.Attach(supervisors, initial);
+
+        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
+
+        // Reload that adds PLC-B → the registry must gain a capture for B.
+        Assert.True(await reconciler.ApplyAsync(withB, cts.Token));
+        Assert.True(registry.TryGet("B", out _), "adding PLC-B must create its tag-value capture");
+        _supervisors.Add(supervisors["B"]);
+
+        // Reload that removes PLC-B → the registry must drop B's capture.
+        Assert.True(await reconciler.ApplyAsync(initial, cts.Token));
+        Assert.False(registry.TryGet("B", out _), "removing PLC-B must drop its tag-value capture");
+    }
 }
 
 /// <summary>
@@ -334,4 +334,67 @@ public sealed class ReloadValidatorTests
         Assert.False(valid);
         Assert.Contains(errors, e => e.Contains("BackendHeartbeatIdleMs"));
     }
+
+    // ── AdminPushIntervalMs ────────────────────────────────────────────────────
+
+    [Fact]
+    public void Validate_AdminPushIntervalMs_Zero_Fails()
+    {
+        var opts = new MbproxyOptions
+        {
+            Plcs = [MakePlc("PLC-A", 5020)],
+            AdminPushIntervalMs = 0,
+        };
+
+        bool valid = ReloadValidator.Validate(opts, out var errors);
+
+        Assert.False(valid);
+        Assert.Contains(errors, e => e.Contains("AdminPushIntervalMs"));
+    }
+
+    [Fact]
+    public void Validate_AdminPushIntervalMs_Negative_Fails()
+    {
+        var opts = new MbproxyOptions
+        {
+            Plcs = [MakePlc("PLC-A", 5020)],
+            AdminPushIntervalMs = -5,
+        };
+
+        bool valid = ReloadValidator.Validate(opts, out var errors);
+
+        Assert.False(valid);
+        Assert.Contains(errors, e => e.Contains("AdminPushIntervalMs"));
+    }
+
+    [Fact]
+    public void Validate_AdminPushIntervalMs_AboveUpperBound_Fails()
+    {
+        // The soft upper bound (60 s) catches a seconds-as-milliseconds typo that
+        // would make the "live" dashboard feed effectively non-live.
+        var opts = new MbproxyOptions
+        {
+            Plcs = [MakePlc("PLC-A", 5020)],
+            AdminPushIntervalMs = 60_001,
+        };
+
+        bool valid = ReloadValidator.Validate(opts, out var errors);
+
+        Assert.False(valid);
+        Assert.Contains(errors, e => e.Contains("AdminPushIntervalMs"));
+    }
+
+    [Fact]
+    public void Validate_AdminPushIntervalMs_AtUpperBound_Passes()
+    {
+        var opts = new MbproxyOptions
+        {
+            Plcs = [MakePlc("PLC-A", 5020)],
+            AdminPushIntervalMs = 60_000,
+        };
+
+        bool valid = ReloadValidator.Validate(opts, out var errors);
+
+        Assert.True(valid, string.Join("; ", errors));
+    }
 }
@@ -105,10 +105,12 @@ internal static class TestHostBuilderExtensions
         this HostApplicationBuilder builder,
         Serilog.ILogger serilogLogger)
     {
-        // Minimal in-memory config so AddMbproxyOptions doesn't fail.
+        // Minimal in-memory config so AddMbproxyOptions doesn't fail. AdminPort 0
+        // disables the admin endpoint — the smoke tests do not exercise it, and a fixed
+        // port would collide under parallel test execution.
         builder.Configuration.AddInMemoryCollection(new Dictionary<string, string?>
         {
-            ["Mbproxy:AdminPort"] = "8080",
+            ["Mbproxy:AdminPort"] = "0",
         });
 
         builder.Services.AddSerilog(serilogLogger, dispose: false);
@@ -23,6 +23,9 @@
     <PackageReference Include="Shouldly" Version="4.3.0" />
     <!-- NModbus: Modbus TCP client for simulator smoke tests and e2e tests. -->
     <PackageReference Include="NModbus" Version="3.0.83" />
+    <!-- SignalR .NET client — drives the /hub/status end-to-end test (a real
+         HubConnection against the live Kestrel admin host). -->
+    <PackageReference Include="Microsoft.AspNetCore.SignalR.Client" Version="10.0.0" />
   </ItemGroup>
 
   <ItemGroup>
@@ -45,6 +45,7 @@ public sealed class MbproxyOptionsBindingTests
         {
             ["Mbproxy:BcdTags:Global:0:Address"] = "1072",
             ["Mbproxy:BcdTags:Global:0:Width"] = "16",
+            ["Mbproxy:BcdTags:Global:0:Name"] = "Left AirSP",
             ["Mbproxy:BcdTags:Global:1:Address"] = "1080",
             ["Mbproxy:BcdTags:Global:1:Width"] = "32",
         });
@@ -52,8 +53,10 @@ public sealed class MbproxyOptionsBindingTests
         options.BcdTags.Global.Count.ShouldBe(2);
         options.BcdTags.Global[0].Address.ShouldBe((ushort)1072);
         options.BcdTags.Global[0].Width.ShouldBe((byte)16);
+        options.BcdTags.Global[0].Name.ShouldBe("Left AirSP");
         options.BcdTags.Global[1].Address.ShouldBe((ushort)1080);
         options.BcdTags.Global[1].Width.ShouldBe((byte)32);
+        options.BcdTags.Global[1].Name.ShouldBeNull("Name is optional — an omitted entry binds to null");
     }
 
     // -------------------------------------------------------------------------
@@ -190,6 +193,28 @@ public sealed class MbproxyOptionsBindingTests
             string.Join("; ", result.Failures ?? []));
     }
 
+    // -------------------------------------------------------------------------
+    // Test 7 — AdminPushIntervalMs (SignalR dashboard push cadence)
+    // -------------------------------------------------------------------------
+    [Fact]
+    public void MbproxyOptionsBinding_AdminPushIntervalMs_DefaultsTo1000()
+    {
+        var options = BindOptions(new Dictionary<string, string?>());
+
+        options.AdminPushIntervalMs.ShouldBe(1000);
+    }
+
+    [Fact]
+    public void MbproxyOptionsBinding_AdminPushIntervalMs_BindsConfiguredValue()
+    {
+        var options = BindOptions(new Dictionary<string, string?>
+        {
+            ["Mbproxy:AdminPushIntervalMs"] = "250",
+        });
+
+        options.AdminPushIntervalMs.ShouldBe(250);
+    }
+
     /// <summary>
     /// Resolves an <c>install/</c> file by walking up from the test assembly directory.
     /// Works from both the Windows dev box and the Linux test box.
@@ -0,0 +1,181 @@
+using System.Collections.Frozen;
+using Mbproxy.Bcd;
+using Mbproxy.Proxy;
+using Mbproxy.Proxy.Multiplexing;
+using Microsoft.Extensions.Logging.Abstractions;
+using Shouldly;
+using Xunit;
+
+namespace Mbproxy.Tests.Proxy;
+
+/// <summary>
+/// Unit tests for the <see cref="TagValueCapture"/> recording hooks in
+/// <see cref="BcdPduPipeline"/>. Verifies that an armed capture records raw PLC-side
+/// and decoded client-side values, and — as a regression guard — that a disarmed or
+/// absent capture leaves the rewrite behaviour byte-identical.
+/// </summary>
+[Trait("Category", "Unit")]
+public sealed class BcdPduPipelineCaptureTests
+{
+    private static readonly BcdPduPipeline Pipeline = new();
+
+    private static BcdTagMap BuildMap(params BcdTag[] tags)
+    {
+        var frozen = tags.ToDictionary(t => t.Address).ToFrozenDictionary();
+        return frozen.Count > 0 ? new BcdTagMap(frozen) : BcdTagMap.Empty;
+    }
+
+    private static PerPlcContext MakeContext(TagValueCapture? capture, params BcdTag[] tags)
+        => new()
+        {
+            PlcName = "TestPLC",
+            TagMap = BuildMap(tags),
+            Counters = new ProxyCounters(),
+            Logger = NullLogger.Instance,
+            Capture = capture,
+        };
+
+    private static InFlightRequest MakeInFlight(byte fc, ushort start, ushort qty)
+        => new(1, fc, start, qty, Array.Empty<InterestedParty>(), DateTimeOffset.UtcNow);
+
+    private static byte[] Fc03Response(params ushort[] regs)
+    {
+        var pdu = new byte[2 + regs.Length * 2];
+        pdu[0] = 0x03;
+        pdu[1] = (byte)(regs.Length * 2);
+        for (int i = 0; i < regs.Length; i++)
+        {
+            pdu[2 + i * 2] = (byte)(regs[i] >> 8);
+            pdu[2 + i * 2 + 1] = (byte)(regs[i] & 0xFF);
+        }
+        return pdu;
+    }
+
+    private static byte[] Fc06Request(ushort address, ushort value)
+        => [0x06, (byte)(address >> 8), (byte)(address & 0xFF), (byte)(value >> 8), (byte)(value & 0xFF)];
+
+    private static byte[] Fc16Request(ushort start, params ushort[] regs)
+    {
+        var pdu = new byte[6 + regs.Length * 2];
+        pdu[0] = 0x10;
+        pdu[1] = (byte)(start >> 8);
+        pdu[2] = (byte)(start & 0xFF);
+        pdu[3] = (byte)((ushort)regs.Length >> 8);
+        pdu[4] = (byte)(regs.Length & 0xFF);
+        pdu[5] = (byte)(regs.Length * 2);
+        for (int i = 0; i < regs.Length; i++)
+        {
+            pdu[6 + i * 2] = (byte)(regs[i] >> 8);
+            pdu[6 + i * 2 + 1] = (byte)(regs[i] & 0xFF);
+        }
+        return pdu;
+    }
+
+    private static void ProcessFc03Response(PerPlcContext ctx, ushort start, ushort qty, byte[] response)
+    {
+        var responseCtx = ctx.WithCurrentRequest(MakeInFlight(0x03, start, qty));
+        Pipeline.Process(MbapDirection.ResponseToClient, ReadOnlySpan<byte>.Empty, response.AsSpan(), responseCtx);
+    }
+
+    private static ushort ReadReg(byte[] pdu, int offsetWords)
+        => (ushort)((pdu[2 + offsetWords * 2] << 8) | pdu[2 + offsetWords * 2 + 1]);
+
+    // ── Read path (FC03/FC04 response) ───────────────────────────────────────
+
+    [Fact]
+    public void FC03_16Bit_ArmedCapture_RecordsRawAndDecoded()
+    {
+        var capture = new TagValueCapture([BcdTag.Create(100, 16)]);
+        capture.Arm();
+        var ctx = MakeContext(capture, BcdTag.Create(100, 16));
+
+        ProcessFc03Response(ctx, 100, 1, Fc03Response(0x1234));
+
+        var slot = capture.Snapshot().ShouldHaveSingleItem();
+        slot.Address.ShouldBe((ushort)100);
+        slot.RawLow.ShouldBe((ushort)0x1234);   // BCD nibbles on the PLC wire
+        slot.DecodedValue.ShouldBe(1234);       // binary the client receives
+        slot.Direction.ShouldBe(CaptureDirection.Read);
+        slot.UpdatedAtUtc.ShouldNotBeNull();
+    }
+
+    [Fact]
+    public void FC03_32Bit_ArmedCapture_RecordsBothRawWords()
+    {
+        var capture = new TagValueCapture([BcdTag.Create(100, 32)]);
+        capture.Arm();
+        var ctx = MakeContext(capture, BcdTag.Create(100, 32));
+
+        // CDAB: low word 0x5678, high word 0x1234 → decoded 1234*10000 + 5678.
+        ProcessFc03Response(ctx, 100, 2, Fc03Response(0x5678, 0x1234));
+
+        var slot = capture.Snapshot().ShouldHaveSingleItem();
+        slot.Width.ShouldBe((byte)32);
+        slot.RawLow.ShouldBe((ushort)0x5678);
+        slot.RawHigh.ShouldBe((ushort)0x1234);
+        slot.DecodedValue.ShouldBe(12345678);
+        slot.Direction.ShouldBe(CaptureDirection.Read);
+    }
+
+    // ── Write path (FC06 / FC16 request) ─────────────────────────────────────
+
+    [Fact]
+    public void FC06_ArmedCapture_RecordsEncodedBcdAndClientValue()
+    {
+        var capture = new TagValueCapture([BcdTag.Create(100, 16)]);
+        capture.Arm();
+        var ctx = MakeContext(capture, BcdTag.Create(100, 16));
+
+        // Client writes binary 1234; proxy encodes to BCD 0x1234 for the PLC.
+        var req = Fc06Request(100, 1234);
+        Pipeline.Process(MbapDirection.RequestToBackend, ReadOnlySpan<byte>.Empty, req.AsSpan(), ctx);
+
+        var slot = capture.Snapshot().ShouldHaveSingleItem();
+        slot.RawLow.ShouldBe((ushort)0x1234);   // BCD nibbles sent to the PLC
+        slot.DecodedValue.ShouldBe(1234);       // binary the client wrote
+        slot.Direction.ShouldBe(CaptureDirection.Write);
+    }
+
+    [Fact]
+    public void FC16_16Bit_ArmedCapture_RecordsWrite()
+    {
+        var capture = new TagValueCapture([BcdTag.Create(100, 16)]);
+        capture.Arm();
+        var ctx = MakeContext(capture, BcdTag.Create(100, 16));
+
+        var req = Fc16Request(100, 4321);
+        Pipeline.Process(MbapDirection.RequestToBackend, ReadOnlySpan<byte>.Empty, req.AsSpan(), ctx);
+
+        var slot = capture.Snapshot().ShouldHaveSingleItem();
+        slot.RawLow.ShouldBe((ushort)0x4321);
+        slot.DecodedValue.ShouldBe(4321);
+        slot.Direction.ShouldBe(CaptureDirection.Write);
+    }
+
+    // ── Regression guards: disarmed / absent capture ─────────────────────────
+
+    [Fact]
+    public void FC03_DisarmedCapture_StillRewrites_ButCapturesNothing()
+    {
+        var capture = new TagValueCapture([BcdTag.Create(100, 16)]);
+        // Not armed.
+        var ctx = MakeContext(capture, BcdTag.Create(100, 16));
+
+        var rsp = Fc03Response(0x1234);
+        ProcessFc03Response(ctx, 100, 1, rsp);
+
+        ReadReg(rsp, 0).ShouldBe((ushort)1234); // rewrite still happened
+        capture.Snapshot().ShouldHaveSingleItem().UpdatedAtUtc.ShouldBeNull();
+    }
+
+    [Fact]
+    public void FC03_NullCapture_DoesNotThrow_AndStillRewrites()
+    {
+        var ctx = MakeContext(capture: null, BcdTag.Create(100, 16));
+
+        var rsp = Fc03Response(0x1234);
+        Should.NotThrow(() => ProcessFc03Response(ctx, 100, 1, rsp));
+
+        ReadReg(rsp, 0).ShouldBe((ushort)1234);
+    }
+}
@@ -542,4 +542,75 @@ public sealed class ResponseCacheMultiplexerTests
             l.Stop();
         }
     }
+
+    [Fact]
+    public async Task CacheHit_RecordsServedRead_IntoArmedDebugCapture()
+    {
+        // C3 regression guard: a cache hit bypasses the BCD pipeline, so without the
+        // cache-entry observation replay the connection-detail debug view would freeze
+        // for the whole TTL on a cached tag. A hit must re-record the served read into
+        // the (armed) capture so the debug view reflects what the client receives.
+        int backendPort = PickFreePort();
+        await using var backend = new StubBackend(backendPort) { RegisterValue = 0x1234 };
+
+        using var cache = new ResponseCache(maxEntriesPerPlc: 64, evictionIntervalMs: 5000);
+        var tag = BcdTag.Create(100, 16, cacheTtlMs: 5000);
+
+        // A detail-page viewer has armed this PLC's debug-view capture.
+        var capture = new TagValueCapture([tag]);
+        capture.Arm();
+
+        var frozen = new[] { tag }.ToDictionary(t => t.Address).ToFrozenDictionary();
+        var ctx = new PerPlcContext
+        {
+            PlcName = "PLC1",
+            TagMap = new BcdTagMap(frozen),
+            Counters = new ProxyCounters(),
+            Logger = NullLogger.Instance,
+            Cache = cache,
+            Capture = capture,
+        };
+        var plc = new PlcOptions { Name = "PLC1", ListenPort = 0, Host = "127.0.0.1", Port = backendPort };
+        await using var mux = BuildMux(plc, ctx);
+
+        var (c, p, l) = await ConnectClientAsync(mux, plc.Name);
+        try
+        {
+            // First read — cache miss. The pipeline records the observation and the
+            // entry is stored with the per-tag observations attached.
+            await c.SendAsync(BuildFc03(0x0001, 100, 1), SocketFlags.None);
+            _ = await ReadOneFrameAsync(c, TestContext.Current.CancellationToken);
+
+            var afterMiss = capture.Snapshot().Single(o => o.Address == 100);
+            afterMiss.UpdatedAtUtc.ShouldNotBeNull("the cache-miss read must record an observation");
+            afterMiss.DecodedValue.ShouldBe(1234);
+            afterMiss.RawLow.ShouldBe((ushort)0x1234);
+
+            // Clear the capture's slots (models the debug view holding no fresh data),
+            // then re-arm. Only the cache-hit replay can repopulate slot 100 now — the
+            // backend is not contacted again.
+            capture.Disarm();
+            capture.Arm();
+            capture.Snapshot().Single(o => o.Address == 100).UpdatedAtUtc
+                .ShouldBeNull("Disarm must clear the slot");
+
+            // Second read — cache hit. No backend round-trip; the pipeline is bypassed.
+            await c.SendAsync(BuildFc03(0x0002, 100, 1), SocketFlags.None);
+            _ = await ReadOneFrameAsync(c, TestContext.Current.CancellationToken);
+            backend.RequestCount.ShouldBe(1, "the second read must be served from the cache");
+
+            var afterHit = capture.Snapshot().Single(o => o.Address == 100);
+            afterHit.UpdatedAtUtc.ShouldNotBeNull(
+                "a cache hit must re-record the served read so the debug view does not freeze");
+            afterHit.DecodedValue.ShouldBe(1234, "the replayed observation carries the decoded value");
+            afterHit.RawLow.ShouldBe((ushort)0x1234, "the replayed observation carries the raw BCD nibbles");
+            afterHit.Direction.ShouldBe(CaptureDirection.Read);
+        }
+        finally
+        {
+            c.Dispose();
+            await p.DisposeAsync();
+            l.Stop();
+        }
+    }
 }
@@ -342,6 +342,9 @@ public sealed class MultiplexerE2ETests
             ["Mbproxy:Connection:BackendConnectTimeoutMs"] = "3000",
             // Long request timeout so the watchdog doesn't fire during the test's wait window.
             ["Mbproxy:Connection:BackendRequestTimeoutMs"] = "30000",
+            // This test exercises backend disconnect, not keepalive — disable keepalive so
+            // the 30 s request timeout above doesn't trip the heartbeat cross-field rule.
+            ["Mbproxy:Connection:Keepalive:Enabled"] = "false",
             // Aggressive backend retry so the second connect happens fast.
             ["Mbproxy:Resilience:BackendConnect:MaxAttempts"] = "5",
             ["Mbproxy:Resilience:BackendConnect:BackoffMs:0"] = "50",
@@ -458,8 +461,11 @@ public sealed class MultiplexerE2ETests
         var config = MakeBaseConfig(proxyPort);
         config["Mbproxy:AdminPort"] = adminPort.ToString();
         // Short idle window so the heartbeat fires several times within the test budget.
+        // BackendRequestTimeoutMs is lowered below the 700 ms idle window so the
+        // heartbeat cross-field rule (idle > request timeout) holds.
         config["Mbproxy:Connection:Keepalive:Enabled"] = "true";
         config["Mbproxy:Connection:Keepalive:BackendHeartbeatIdleMs"] = "700";
+        config["Mbproxy:Connection:BackendRequestTimeoutMs"] = "500";
 
         var host = BuildBcdHost(config);
         using var startCts = new CancellationTokenSource(TimeSpan.FromSeconds(3));
@@ -64,7 +64,9 @@ public sealed class ProxyForwardingTests
 
         var config = new Dictionary<string, string?>
         {
-            ["Mbproxy:AdminPort"] = "8080",
+            // 0 disables the admin endpoint — this test does not exercise it, and a
+            // fixed port would collide under parallel test execution.
+            ["Mbproxy:AdminPort"] = "0",
             [$"Mbproxy:Plcs:0:Name"] = "TestPLC",
             [$"Mbproxy:Plcs:0:ListenPort"] = proxyPort.ToString(),
             [$"Mbproxy:Plcs:0:Host"] = _sim.Host,
@@ -239,7 +241,9 @@ public sealed class ProxyForwardingTests
 
         var config = new Dictionary<string, string?>
         {
-            ["Mbproxy:AdminPort"] = "8080",
+            // 0 disables the admin endpoint — this test does not exercise it, and a
+            // fixed port would collide under parallel test execution.
+            ["Mbproxy:AdminPort"] = "0",
             [$"Mbproxy:Plcs:0:Name"] = "BadPLC",
             [$"Mbproxy:Plcs:0:ListenPort"] = proxyPort.ToString(),
             [$"Mbproxy:Plcs:0:Host"] = "127.0.0.1",
@@ -307,7 +311,9 @@ public sealed class ProxyForwardingTests
 
         var config = new Dictionary<string, string?>
         {
-            ["Mbproxy:AdminPort"] = "8080",
+            // 0 disables the admin endpoint — this test does not exercise it, and a
+            // fixed port would collide under parallel test execution.
+            ["Mbproxy:AdminPort"] = "0",
             [$"Mbproxy:Plcs:0:Name"] = "TestPLC",
             [$"Mbproxy:Plcs:0:ListenPort"] = proxyPort.ToString(),
             [$"Mbproxy:Plcs:0:Host"] = _sim.Host,
@@ -385,7 +385,9 @@ public sealed class RewriterE2ETests
 
         var config = new Dictionary<string, string?>
         {
-            ["Mbproxy:AdminPort"] = "8080",
+            // 0 disables the admin endpoint — this test does not exercise it, and a
+            // fixed port would collide under parallel test execution.
+            ["Mbproxy:AdminPort"] = "0",
             ["Mbproxy:Plcs:0:Name"] = "TestPLC",
             ["Mbproxy:Plcs:0:ListenPort"] = proxyPort.ToString(),
             ["Mbproxy:Plcs:0:Host"] = _sim.Host,
@@ -0,0 +1,132 @@
+using System.Collections.Frozen;
+using Mbproxy.Bcd;
+using Mbproxy.Proxy;
+using Shouldly;
+using Xunit;
+
+namespace Mbproxy.Tests.Proxy;
+
+/// <summary>
+/// Unit tests for <see cref="TagCaptureRegistry"/> — the shared seam holding per-PLC
+/// <see cref="TagValueCapture"/> instances. Arm state is reconciled in bulk against the
+/// live viewer set (not toggled per PLC) so the broadcaster is the single authority.
+/// </summary>
+[Trait("Category", "Unit")]
+public sealed class TagCaptureRegistryTests
+{
+    private static BcdTagMap Map(params (ushort addr, byte width)[] tags)
+    {
+        if (tags.Length == 0)
+            return BcdTagMap.Empty;
+        var frozen = tags
+            .Select(t => BcdTag.Create(t.addr, t.width))
+            .ToDictionary(t => t.Address)
+            .ToFrozenDictionary();
+        return new BcdTagMap(frozen);
+    }
+
+    [Fact]
+    public void GetOrCreate_ReturnsLiveInstance_OnRepeatCall()
+    {
+        var registry = new TagCaptureRegistry();
+        registry.GetOrCreate("plc-1", Map((100, 16)));
+        var second = registry.GetOrCreate("plc-1", Map((100, 16)));
+
+        second.TagCount.ShouldBe(1);
+        registry.TryGet("plc-1", out var current).ShouldBeTrue();
+        current.ShouldBeSameAs(second);
+    }
+
+    [Fact]
+    public void GetOrCreate_Rebuild_ProducesDisarmedCapture_AndReconcileReArms()
+    {
+        // The rebuilt capture is intentionally disarmed: ReconcileArmed re-arms it within
+        // one push cycle if the PLC still has a viewer, so arm state is never carried
+        // across the rebuild — which removes any arm-vs-rebuild race.
+        var registry = new TagCaptureRegistry();
+        registry.GetOrCreate("plc-1", Map((100, 16)));
+        registry.ReconcileArmed(["plc-1"]);
+        registry.TryGet("plc-1", out var armed).ShouldBeTrue();
+        armed.IsArmed.ShouldBeTrue();
+
+        // Hot-reload reseat: same PLC, changed tag set.
+        var rebuilt = registry.GetOrCreate("plc-1", Map((100, 16), (200, 32)));
+        rebuilt.ShouldNotBeSameAs(armed);
+        rebuilt.IsArmed.ShouldBeFalse("a rebuilt capture starts disarmed");
+        rebuilt.TagCount.ShouldBe(2);
+
+        // The next reconcile re-arms it because the PLC is still viewed.
+        registry.ReconcileArmed(["plc-1"]);
+        rebuilt.IsArmed.ShouldBeTrue();
+    }
+
+    [Fact]
+    public void ReconcileArmed_ArmsActivePlcs_DisarmsTheRest()
+    {
+        var registry = new TagCaptureRegistry();
+        registry.GetOrCreate("plc-1", Map((100, 16)));
+        registry.GetOrCreate("plc-2", Map((100, 16)));
+
+        registry.ReconcileArmed(["plc-1"]);
+        registry.TryGet("plc-1", out var c1).ShouldBeTrue();
+        registry.TryGet("plc-2", out var c2).ShouldBeTrue();
+        c1.IsArmed.ShouldBeTrue();
+        c2.IsArmed.ShouldBeFalse();
+
+        // plc-1's viewer leaves, plc-2 gains one.
+        registry.ReconcileArmed(["plc-2"]);
+        c1.IsArmed.ShouldBeFalse();
+        c2.IsArmed.ShouldBeTrue();
+    }
+
+    [Fact]
+    public void ReconcileArmed_EmptyActiveSet_DisarmsEverything()
+    {
+        var registry = new TagCaptureRegistry();
+        registry.GetOrCreate("plc-1", Map((100, 16)));
+        registry.ReconcileArmed(["plc-1"]);
+
+        registry.ReconcileArmed(Array.Empty<string>());
+
+        registry.TryGet("plc-1", out var c1).ShouldBeTrue();
+        c1.IsArmed.ShouldBeFalse();
+    }
+
+    [Fact]
+    public void DisarmAll_DisarmsEveryCapture()
+    {
+        var registry = new TagCaptureRegistry();
+        registry.GetOrCreate("plc-1", Map((100, 16)));
+        registry.GetOrCreate("plc-2", Map((100, 16)));
+        registry.ReconcileArmed(["plc-1", "plc-2"]);
+
+        registry.DisarmAll();
+
+        registry.TryGet("plc-1", out var c1).ShouldBeTrue();
+        registry.TryGet("plc-2", out var c2).ShouldBeTrue();
+        c1.IsArmed.ShouldBeFalse();
+        c2.IsArmed.ShouldBeFalse();
+    }
+
+    [Fact]
+    public void UnknownPlc_Operations_AreSafeNoOps()
+    {
+        var registry = new TagCaptureRegistry();
+
+        Should.NotThrow(() => registry.ReconcileArmed(["ghost"]));
+        Should.NotThrow(() => registry.Remove("ghost"));
+        registry.TryGet("ghost", out _).ShouldBeFalse();
+    }
+
+    [Fact]
+    public void Remove_DropsTheCapture()
+    {
+        var registry = new TagCaptureRegistry();
+        registry.GetOrCreate("plc-1", Map((100, 16)));
+        registry.TryGet("plc-1", out _).ShouldBeTrue();
+
+        registry.Remove("plc-1");
+
+        registry.TryGet("plc-1", out _).ShouldBeFalse();
+    }
+}
@@ -0,0 +1,211 @@
+using Mbproxy.Bcd;
+using Mbproxy.Proxy;
+using Shouldly;
+using Xunit;
+
+namespace Mbproxy.Tests.Proxy;
+
+/// <summary>
+/// Unit tests for <see cref="TagValueCapture"/> — the on-demand per-tag value store
+/// behind the connection-detail debug view.
+/// </summary>
+[Trait("Category", "Unit")]
+public sealed class TagValueCaptureTests
+{
+    private static TagValueCapture Make(params (ushort addr, byte width)[] tags)
+        => new(tags.Select(t => BcdTag.Create(t.addr, t.width)));
+
+    [Fact]
+    public void Disarmed_Record_IsNoOp()
+    {
+        var capture = Make((100, 16));
+        // No Arm() call — capture starts disarmed.
+        capture.Record(100, 0x1234, 0, 1234, CaptureDirection.Read);
+
+        capture.IsArmed.ShouldBeFalse();
+        var slot = capture.Snapshot().ShouldHaveSingleItem();
+        slot.UpdatedAtUtc.ShouldBeNull();
+    }
+
+    [Fact]
+    public void Armed_Record_UpdatesMatchingSlot()
+    {
+        var capture = Make((100, 16));
+        capture.Arm();
+        capture.Record(100, 0x1234, 0, 1234, CaptureDirection.Read);
+
+        var slot = capture.Snapshot().ShouldHaveSingleItem();
+        slot.Address.ShouldBe((ushort)100);
+        slot.Width.ShouldBe((byte)16);
+        slot.RawLow.ShouldBe((ushort)0x1234);
+        slot.DecodedValue.ShouldBe(1234);
+        slot.Direction.ShouldBe(CaptureDirection.Read);
+        slot.UpdatedAtUtc.ShouldNotBeNull();
+    }
+
+    [Fact]
+    public void Snapshot_CarriesTagName_FromConfiguredTag()
+    {
+        // The capture surfaces a BCD tag's optional friendly name on every
+        // observation — placeholder rows (no traffic) and recorded values alike —
+        // so the debug view can label rows. An unnamed tag surfaces a null Name.
+        var capture = new TagValueCapture(
+        [
+            BcdTag.Create(100, 16, name: "Left AirSP"),
+            BcdTag.Create(200, 32),
+        ]);
+
+        var before = capture.Snapshot();
+        before.Single(o => o.Address == 100).Name.ShouldBe("Left AirSP");
+        before.Single(o => o.Address == 200).Name.ShouldBeNull();
+
+        capture.Arm();
+        capture.Record(100, 0x1234, 0, 1234, CaptureDirection.Read);
+        capture.Snapshot().Single(o => o.Address == 100).Name.ShouldBe("Left AirSP");
+    }
+
+    [Fact]
+    public void Armed_Record_UnknownAddress_IsIgnored()
+    {
+        var capture = Make((100, 16));
+        capture.Arm();
+        capture.Record(999, 0x1111, 0, 1111, CaptureDirection.Read);
+
+        capture.Snapshot().ShouldAllBe(s => s.UpdatedAtUtc == null);
+    }
+
+    [Fact]
+    public void Disarm_ClearsAllSlots()
+    {
+        var capture = Make((100, 16), (200, 16));
+        capture.Arm();
+        capture.Record(100, 0x0042, 0, 42, CaptureDirection.Read);
+        capture.Record(200, 0x0099, 0, 99, CaptureDirection.Read);
+
+        capture.Disarm();
+
+        capture.IsArmed.ShouldBeFalse();
+        capture.Snapshot().ShouldAllBe(s => s.UpdatedAtUtc == null);
+    }
+
+    [Fact]
+    public void ReArm_AfterDisarm_StartsEmpty()
+    {
+        var capture = Make((100, 16));
+        capture.Arm();
+        capture.Record(100, 0x0042, 0, 42, CaptureDirection.Read);
+        capture.Disarm();
+        capture.Arm();
+
+        // No new traffic since re-arm — slot must read as empty, not the pre-disarm value.
+        capture.Snapshot().ShouldHaveSingleItem().UpdatedAtUtc.ShouldBeNull();
+    }
+
+    [Fact]
+    public void ThirtyTwoBitTag_RecordsBothRawWords()
+    {
+        var capture = Make((100, 32));
+        capture.Arm();
+        capture.Record(100, 0x5678, 0x1234, 12345678, CaptureDirection.Read);
+
+        var slot = capture.Snapshot().ShouldHaveSingleItem();
+        slot.Width.ShouldBe((byte)32);
+        slot.RawLow.ShouldBe((ushort)0x5678);
+        slot.RawHigh.ShouldBe((ushort)0x1234);
+        slot.DecodedValue.ShouldBe(12345678);
+    }
+
+    [Fact]
+    public void Snapshot_ReturnsOneRowPerTag_OrderedByAddress()
+    {
+        var capture = Make((300, 16), (100, 32), (200, 16));
+        capture.TagCount.ShouldBe(3);
+
+        var snap = capture.Snapshot();
+        snap.Select(s => s.Address).ShouldBe([(ushort)100, (ushort)200, (ushort)300]);
+    }
+
+    [Fact]
+    public void WriteDirection_IsPreserved()
+    {
+        var capture = Make((100, 16));
+        capture.Arm();
+        capture.Record(100, 0x0500, 0, 500, CaptureDirection.Write);
+
+        capture.Snapshot().ShouldHaveSingleItem().Direction.ShouldBe(CaptureDirection.Write);
+    }
+
+    [Fact]
+    public async Task ConcurrentRecordAndSnapshot_NeverYieldsTornSlot()
+    {
+        // Invariant maintained by every Record: DecodedValue == RawLow + RawHigh.
+        // A torn read (fields from two different Record calls) would break it.
+        var capture = Make((100, 32));
+        capture.Arm();
+
+        var ct = TestContext.Current.CancellationToken;
+        bool tornObserved = false;
+
+        var writers = Enumerable.Range(0, 4).Select(seed => Task.Run(() =>
+        {
+            var rng = new Random(seed + 1);
+            for (int i = 0; i < 200_000; i++)
+            {
+                ushort lo = (ushort)rng.Next(0, 60000);
+                ushort hi = (ushort)rng.Next(0, 5000);
+                capture.Record(100, lo, hi, lo + hi, CaptureDirection.Read);
+            }
+        }, ct)).ToArray();
+
+        var reader = Task.Run(() =>
+        {
+            for (int i = 0; i < 200_000; i++)
+            {
+                foreach (var slot in capture.Snapshot())
+                {
+                    if (slot.UpdatedAtUtc is null)
+                        continue;
+                    if (slot.DecodedValue != slot.RawLow + slot.RawHigh)
+                        tornObserved = true;
+                }
+            }
+        }, ct);
+
+        await Task.WhenAll([.. writers, reader]);
+        tornObserved.ShouldBeFalse("Snapshot must never observe a torn (half-updated) slot");
+    }
+
+    [Fact]
+    public async Task ConcurrentRecordAndDisarm_LeavesNoStaleObservation()
+    {
+        // M7 regression: Record() checks _armed then writes; Disarm() flips _armed then
+        // clears the slots. A Record that passes the check while armed, then has Disarm
+        // run, then writes, would strand a stale observation on a disarmed capture —
+        // breaking the "reopened page shows no stale data" contract. Record's re-check
+        // after the write must undo that. The capture ends disarmed (the toggler's last
+        // op is Disarm), so a clean Snapshot is a deterministic post-condition of the fix.
+        var capture = Make((100, 16));
+        var ct = TestContext.Current.CancellationToken;
+
+        var recorder = Task.Run(() =>
+        {
+            for (int i = 0; i < 400_000; i++)
+                capture.Record(100, 0x1234, 0, 1234, CaptureDirection.Read);
+        }, ct);
+
+        var toggler = Task.Run(() =>
+        {
+            for (int i = 0; i < 80_000; i++)
+            {
+                capture.Arm();
+                capture.Disarm();
+            }
+        }, ct);
+
+        await Task.WhenAll(recorder, toggler);
+
+        capture.IsArmed.ShouldBeFalse();
+        capture.Snapshot().ShouldAllBe(s => s.UpdatedAtUtc == null,
+            "a disarmed capture must never retain a recorded observation");
+    }
+}
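The M7 regression comment in `ConcurrentRecordAndDisarm_LeavesNoStaleObservation` describes a check-then-write race that is closed by re-checking the armed flag after the write. The diff does not show how `TagValueCapture` implements this, so the following is only a minimal sketch of that pattern under stated assumptions: `SketchCapture`, its single slot, and its lock are hypothetical stand-ins, not the project's types.

```csharp
using System;
using System.Threading;

// Hypothetical single-slot stand-in for illustration; NOT the project's TagValueCapture.
public sealed class SketchCapture
{
    private readonly object _gate = new();
    private volatile bool _armed;
    private (ushort Raw, int Decoded)? _slot;   // null means "no observation recorded"

    public bool IsArmed => _armed;

    public void Arm() => _armed = true;

    public void Disarm()
    {
        _armed = false;                  // 1. stop accepting new observations
        Interlocked.MemoryBarrier();     //    publish the flag before clearing
        lock (_gate) { _slot = null; }   // 2. drop whatever was recorded so far
    }

    public void Record(ushort raw, int decoded)
    {
        if (!_armed) return;             // fast path: a disarmed capture ignores traffic

        lock (_gate) { _slot = (raw, decoded); }

        // Re-check after the write: if Disarm ran between the first check and the
        // write above, its clear may already have happened, so undo the write here.
        Interlocked.MemoryBarrier();
        if (!_armed)
        {
            lock (_gate) { _slot = null; }
        }
    }

    public (ushort Raw, int Decoded)? Snapshot()
    {
        lock (_gate) { return _slot; }
    }
}
```

In this sketch, either `Record` sees the flipped flag and clears its own write, or `Disarm`'s clear runs after the write, so a disarmed capture ends up empty in both orderings. One trade-off of the undo step is that it can occasionally drop a fresh observation written just after a quick re-arm; the next `Record` call would restore it.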
@@ -118,9 +118,12 @@ public sealed class DL205SimulatorFixture : IAsyncLifetime
         _process.BeginErrorReadLine();
 
         // ── 5. Poll for TCP readiness (up to ReadinessTimeout) ───────────────
+        // Link the readiness deadline against the test-runner's cancellation token so a
+        // CI job timeout / keyboard interrupt aborts the poll promptly instead of running
+        // the full 120 s and leaving the spawned Python process orphaned (review M3).
         using var deadline = new CancellationTokenSource(ReadinessTimeout);
         using var linked = CancellationTokenSource.CreateLinkedTokenSource(
-            deadline.Token, CancellationToken.None);
+            deadline.Token, TestContext.Current.CancellationToken);
 
         bool ready = false;
         while (!linked.Token.IsCancellationRequested)
@@ -0,0 +1,72 @@
+// mbproxy smoke-test configuration for the web-UI browser smoke tests.
+// NOT a deployment config. The Resilience and Cache sections are intentionally
+// omitted — the smoke run relies on their built-in defaults.
+//
+// Topology:
+//   * line-a / line-b → the dl205 simulator on 127.0.0.1:5020 (run-dl205-sim.ps1).
+//                       line-a carries a 16-bit BCD tag, line-b a 32-bit BCD tag,
+//                       so the connection-detail debug view has content in both
+//                       widths. Both listeners bind and reach a live backend.
+//   * line-dead       → an unreachable backend (192.0.2.1, TEST-NET-1, RFC 5737).
+//                       The listener binds fine but every backend connect fails,
+//                       so the row surfaces connect failures / heartbeat failures
+//                       and exercises the dashboard's "problems only" filter.
+{
+  "Mbproxy": {
+    "BcdTags": {
+      "Global": []
+    },
+    "Plcs": [
+      {
+        "Name": "line-a",
+        "ListenPort": 6020,
+        "Host": "127.0.0.1",
+        "Port": 5020,
+        "BcdTags": {
+          "Add": [
+            { "Address": 1072, "Width": 16 }
+          ]
+        }
+      },
+      {
+        "Name": "line-b",
+        "ListenPort": 6021,
+        "Host": "127.0.0.1",
+        "Port": 5020,
+        "BcdTags": {
+          "Add": [
+            { "Address": 1072, "Width": 32 }
+          ]
+        }
+      },
+      {
+        "Name": "line-dead",
+        "ListenPort": 6022,
+        "Host": "192.0.2.1",
+        "Port": 502
+      }
+    ],
+    "AdminPort": 8080,
+    "AdminPushIntervalMs": 1000,
+    "Connection": {
+      "BackendConnectTimeoutMs": 2000,
+      "BackendRequestTimeoutMs": 2000,
+      "GracefulShutdownTimeoutMs": 5000,
+      "Keepalive": {
+        "Enabled": true,
+        "TcpIdleTimeMs": 30000,
+        "TcpProbeIntervalMs": 5000,
+        "TcpProbeCount": 4,
+        "BackendHeartbeatIdleMs": 10000,
+        "BackendHeartbeatProbeAddress": 0
+      }
+    }
+  },
+  "Serilog": {
+    "Using": [ "Serilog.Sinks.Console" ],
+    "MinimumLevel": { "Default": "Information" },
+    "WriteTo": [
+      { "Name": "Console" }
+    ]
+  }
+}
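The smoke config above keeps `BackendHeartbeatIdleMs` (10000) above `BackendRequestTimeoutMs` (2000), which matches the heartbeat cross-field rule quoted in the E2E test comments ("idle > request timeout"). The validator itself is not part of this diff, so the sketch below is an assumption about how such a check could look. The record types and the `Check` helper are hypothetical; only the two property names mirror real config keys. Skipping the rule when keepalive is disabled is inferred from the comment on the backend-disconnect test.

```csharp
// Hypothetical option shapes: the property names mirror the config keys,
// but these records and the Check helper are assumptions for illustration.
public sealed record KeepaliveSketch(bool Enabled, int BackendHeartbeatIdleMs);
public sealed record ConnectionSketch(int BackendRequestTimeoutMs, KeepaliveSketch Keepalive);

public static class HeartbeatCrossFieldRule
{
    // Returns null when the settings are acceptable, otherwise a description of the violation.
    public static string? Check(ConnectionSketch c)
    {
        // Assumed: the rule only applies while keepalive is enabled.
        if (!c.Keepalive.Enabled)
            return null;

        // The idle window must be strictly longer than the request timeout.
        if (c.Keepalive.BackendHeartbeatIdleMs > c.BackendRequestTimeoutMs)
            return null;

        return $"BackendHeartbeatIdleMs ({c.Keepalive.BackendHeartbeatIdleMs}) must be greater than " +
               $"BackendRequestTimeoutMs ({c.BackendRequestTimeoutMs}) while keepalive is enabled.";
    }
}
```

Under this sketch the smoke config (2000 / 10000) and the heartbeat E2E test (500 / 700) both pass, while a 30000 ms request timeout with keepalive left enabled and a 10000 ms idle window would be rejected.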