# Code Review — Tests & Config for the SignalR Dashboard (commit `e719dd5`) Scope: the test files, config/options/build changes, and the smoke config introduced or modified by `e719dd5` ("replace status page with a live SignalR web dashboard"). Production SignalR/capture code (`StatusHub`, `StatusBroadcaster`, `TagValueCapture`, `TagCaptureRegistry`, `AdminEndpointHost`) was read for context but is reviewed only insofar as it tells us whether the new tests actually exercise the risky paths. The wider UI (`dashboard.js`, `detail.js`, HTML/CSS) is explicitly out of scope. ## Summary The new tests are competently written, follow the existing `Subject_Condition_Expectation` naming, and the `IStatusPushSink` seam is a genuinely good design that makes the broadcaster unit-testable without a SignalR host. The `TagValueCapture` torn-read test is a real concurrency test, not a pretend one. **But the concurrency-sensitive paths the commit message advertises — "armed only while a detail page is open", arm/disarm under churn, the broadcaster *loop* — are tested only at the single-threaded happy-path level.** The most serious gaps: 1. The **`StatusBroadcaster.LoopAsync` background loop has zero test coverage** — every broadcaster test calls the internal `PushOnceAsync` directly. The loop's interval re-read, the `Math.Max(100, ...)` floor, the cancellation path, and the loop-terminated-unexpectedly catch are all unverified. 2. The **arm/disarm-vs-record race in `TagValueCapture` is real and untested.** `Disarm()` sets `_armed = false` *then* clears slots; `Record()` checks `_armed` *then* writes. A `Record` that passes the `_armed` check before `Disarm` flips it can write a slot *after* `Disarm` cleared it — leaving a stale observation on a disarmed capture. The torn-read test only races `Record` against `Snapshot`, never against `Disarm`/`Arm`. 3. The **`PlcSubscriptionTracker` — the lock-guarded type whose entire job is concurrent subscriber counting — has no direct test at all** and no concurrent test anywhere. Its behaviour is only incidentally exercised through single-threaded `StatusHubTests`. None are release-blockers for a read-only admin page on a trusted segment, but #2 is a latent correctness bug and #1/#3 mean a regression in the live-feed plumbing would ship silently. ## Findings ### Critical None. This is a read-only admin surface; nothing here can corrupt the Modbus path. ### Major **M1 — `StatusBroadcaster.LoopAsync` has no test.** `src/Mbproxy/Admin/StatusBroadcaster.cs:113-135`; `tests/Mbproxy.Tests/Admin/StatusBroadcasterTests.cs`. All four broadcaster tests drive `PushOnceAsync` directly. The actual background loop — which is what runs in production — is untested. Specifically uncovered: - the per-cycle `_options.CurrentValue.AdminPushIntervalMs` re-read (the documented hot-reload-without-restart behaviour); - the `Math.Max(100, ...)` floor that defends against a bad interval slipping past validation (note: validation rejects `<= 0`, so the floor only ever matters for values `1..99` — itself an untested corner); - that `StartAsync`→`StopAsync` actually terminates the loop and that `StopAsync` is safe to call when the loop never started. Fix: add a test that constructs the broadcaster with `AdminPushIntervalMs` ~100ms, calls `Start()`, polls `Sink.FleetPushes.Count` until `>= 2` with a generous deadline + the test `CancellationToken`, then `StopAsync()` and asserts the count stops growing. Keep timing hedged (`ShouldBeGreaterThanOrEqualTo`), consistent with the existing coalescing tests. **M2 — Arm/disarm-vs-record race in `TagValueCapture` is real and untested.** `src/Mbproxy/Proxy/TagValueCapture.cs:103-129`. `Disarm()` does `_armed = false` then clears slots; `Record()` does `if (!_armed) return;` then `Volatile.Write` the slot. Interleaving: `Record` reads `_armed == true` → `Disarm` sets `false` and clears all slots → `Record` writes its observation. Result: a *disarmed* capture holds a non-null slot, and the next `Snapshot()` reports stale traffic with `UpdatedAtUtc != null` — exactly the "reopened page shows stale data" failure the on-demand design says it prevents. The hot path makes this unlikely (record traffic stops when no viewer is attached) but it is not impossible: a detail page closing while an in-flight FC03 response is being rewritten hits it. `ConcurrentRecordAndSnapshot_NeverYieldsTornSlot` (line 118) races `Record` vs `Snapshot` only — never vs `Disarm`. Fix in production: have `Disarm()` set `_armed = false`, then clear slots, then clear *again* (or re-clear after a memory barrier), or re-check `_armed` inside `Record` after the `Volatile.Write` and null the slot if disarmed. At minimum add a test that hammers `Arm`/`Disarm` on one task while `Record` runs on another and asserts a disarmed capture's `Snapshot()` is all-null. The review should not let this ship "tested" when the test deliberately avoids the racy interleaving. **M3 — `PlcSubscriptionTracker` has no direct or concurrent test.** `src/Mbproxy/Admin/PlcSubscriptionTracker.cs` (whole file); no `PlcSubscriptionTrackerTests.cs` exists. This is the single lock-guarded type whose correctness drives capture arm/disarm. `StatusHubTests` exercises it only single-threaded and indirectly. Untested behaviour that has no coverage anywhere: - the redundant re-subscribe path (`Add` returns `false` when the same connection re-subscribes the same PLC — `set.Add` fails); - `RemoveConnection` for an unknown connection id returning `Array.Empty()`; - a connection subscribed to *multiple* PLCs being torn down in one `RemoveConnection`; - concurrent `Add`/`RemoveConnection` from many "connections" — the lock is claimed thread-safe but nothing proves the count never goes negative or leaks. Fix: add `PlcSubscriptionTrackerTests` with the single-threaded cases plus one parallel-stress test (N tasks each Add-then-Remove, assert `ActivePlcs()` is empty and no exception), mirroring `TxIdAllocatorTests`' concurrency style. **M4 — `StatusHub` group-leave on disconnect is not verified.** `src/Mbproxy/Admin/StatusHub.cs:60-66`; `StatusHubTests.cs`. `OnDisconnectedAsync` is tested for the *capture-disarm* side effect, but the hub never calls `Groups.RemoveFromGroupAsync` — and `FakeGroupManager` records `Removed` but **no test ever asserts `groups.Removed`**. Worth confirming this is intentional (real SignalR auto- removes a disconnected connection from all groups, so an explicit `RemoveFromGroupAsync` is genuinely unnecessary) — but then `FakeGroupManager.Removed` is dead code that implies a contract the hub does not honour. Either delete `Removed` from the fake, or if the hub is *supposed* to leave fleet groups explicitly, that's a production bug. As written, a reviewer cannot tell which. Fix: drop the unused `Removed` list, or add a comment on the fake explaining SignalR's implicit group cleanup so the gap is not mistaken for an omission. ### Minor **m1 — `SignalRFakes` do not model the real SignalR failure surface.** `tests/Mbproxy.Tests/Admin/SignalRFakes.cs`. The fakes are faithful to the *shapes* but model only the success path: `FakeGroupManager.AddToGroupAsync` always succeeds and is synchronous; `FakeStatusPushSink.PushFleetAsync` never throws and never observes the `CancellationToken`. The production code has explicit `try/catch` around `_sink.PushFleetAsync` / `PushPlcAsync` (`StatusBroadcaster.cs:88-110`) — a sink that throws is a real scenario (a SignalR send to a dropped connection) and that catch is **completely uncovered**. The `BuildDebug` failure catch (`StatusBroadcaster.cs:82-86`) is likewise uncovered. Fix: add a throwing variant of `FakeStatusPushSink` and assert `PushOnceAsync` swallows the exception and still attempts the per-PLC pushes; assert the `ct` is honoured (an `OperationCanceledException` from the sink must *not* be logged-and-swallowed the same way — the `when (ex is not OperationCanceledException)` filter is untested). **m2 — `FakeHubCallerContext.ConnectionAborted` is hardcoded to `CancellationToken.None`.** `SignalRFakes.cs:23`. Real SignalR fires this token on disconnect. No current test needs it, but if anyone later tests connection-abort handling the fake will silently mask it. Low risk; note it with a comment so a future test author knows the fake is inert here. **m3 — Asset content-type test relies on a hand-maintained `[InlineData]` allow-list.** `AdminEndpointTests.Get_Asset_ReturnsCorrectContentType` covers 5 of the 14 embedded files. `bootstrap.bundle.min.js`, `detail.css`, `detail.js`, `index.html`, `plc.html`, `ibm-plex-sans-400/600.woff2` are not asserted. More importantly there is **no test that the csproj glob actually embedded every wwwroot file** — if someone adds `favicon.ico` to `wwwroot/` and the glob silently misses it (it won't, `*.*` catches it, but a `.gitignore`'d or renamed file would), nothing fails. Fix: add a test that enumerates `typeof(AdminEndpointHost).Assembly.GetManifestResourceNames()`, filters the `Mbproxy.Admin.wwwroot.` prefix, and asserts every file physically present in `wwwroot/` has a matching resource (or just assert the count). This is the only thing that would catch a broken `EmbeddedResource` glob. **m4 — `Get_Asset` does not assert the bytes are the *right* asset.** `AdminEndpointTests.cs` (the new theory). It asserts `bytes.Length > 0` and the content type, but not that `dashboard.js` contains a known marker (it already does this for the HTML shells via `ShouldContain("/assets/dashboard.js")`). A resource-name collision or a wrong `ContentTypeFor` mapping for a *correctly-served-but-wrong* file would pass. Cheap to harden: for the text assets assert a known substring. **m5 — `csproj` `EmbeddedResource Include="Admin\wwwroot\*.*"` — glob caveats.** `src/Mbproxy/Mbproxy.csproj` (the new `ItemGroup`). The SDK is `Microsoft.NET.Sdk.Worker`, which does **not** enable web default-item globs, so there is no double-include of `.css`/ `.js`/`.html` as `Content` — good. Two real caveats: - `*.*` matches only files containing a dot. Every current asset has an extension so this is fine, but it is a non-obvious constraint; `**\*` or just `*` would be more honest. The comment says "intentionally flat", so `*.*` is acceptable, but worth a one-word note that extensionless files would be skipped. - Globs in an `` are evaluated at project-evaluation time; a newly-added asset is *not* picked up by an incremental build that didn't re-evaluate. In practice `dotnet build`/`publish` always re-evaluates, so this is a non-issue — but it is the kind of thing that bites a watch-mode developer. No fix needed; flagged for completeness. The resource-name → request-path mapping (`Mbproxy.Admin.wwwroot.`) is correct for the flat directory and matches `AssetResourcePrefix` in `AdminEndpointHost.cs:301`. **m6 — `mbproxy.smoke.config.json` — valid but with stale/odd values.** `tests/sim/mbproxy.smoke.config.json`. The schema is current (`AdminPushIntervalMs` present, `Keepalive` field names match `KeepaliveOptions`, no removed keys). Issues: - The header comment says the simulator runs on `127.0.0.1:5020` and both `line-a`/`line-b` point `Port: 5020` — consistent. Good. - `BackendHeartbeatIdleMs: 10000` while `BackendRequestTimeoutMs: 2000` — satisfies the "must be greater than BackendRequestTimeoutMs" cross-field rule. Fine. - `line-dead` has no `BcdTags`, so it inherits the empty `Global: []` set — fine, the intent is just an unreachable backend. - The file has **no `Resilience` and no `Cache` section.** Both are optional (defaults apply), so the config binds — but a smoke config whose stated purpose is exercising the dashboard's "problems only" filter would be more representative with the defaults made explicit, or at least a comment that defaults are intentionally relied on. - Comment line 1 references "Phase 4/5 web-UI browser smoke tests" and `plans/2026-05-15-webui-dashboard.md`. `plans/` is untracked (`?? plans/` in git status) and per recent commit `7466a46` the project has been *retiring* plan docs. A smoke config pointing at an untracked plan file is a dangling reference — either commit the plan or drop the citation. - No `Resilience.ReadCoalescing` means coalescing runs with defaults during the smoke run; acceptable but undocumented. **m7 — `ConfigReconcilerTests` change is a minimal compile-fix, not a behaviour test.** `tests/Mbproxy.Tests/Configuration/ConfigReconcilerTests.cs` (the 1-line diff). The new `TagCaptureRegistry` ctor arg is passed as a throwaway `new TagCaptureRegistry()`. That is the correct minimal change, but it means **the reconciler's new responsibility — calling `TagCaptureRegistry.GetOrCreate` on PLC add and `Remove` on PLC removal during hot-reload — has no test.** `ConfigReconciler.cs` was modified in this commit (`+11` lines) to wire the registry; nothing asserts that a hot-reload-added PLC gets a capture or that a removed PLC's capture is dropped. Fix: in `ConfigReconcilerTests`, pass a real registry the test holds a reference to, trigger an add/remove reload, and assert `registry.TryGet` reflects it. **m8 — `ReloadValidator`/`MbproxyOptions` `AdminPushIntervalMs` validation: consistent, but the upper bound is unguarded.** `ReloadValidator.cs` and `MbproxyOptionsValidator` both reject `<= 0` — consistent with how `GracefulShutdownTimeoutMs` and the keepalive intervals are validated (lower-bound-only, `> 0`). So the *new* validation matches house style. But note no interval option in this codebase has an *upper* bound, and `AdminPushIntervalMs` is the one most likely to be fat-fingered into something absurd (`1000000` = 16-minute feed). Not a regression and not inconsistent — flagged only because the dashboard's whole value proposition is "live". The `LoopAsync` `Math.Max(100, ...)` floor protects against tiny values; nothing protects against huge ones. Optional: a soft upper bound (e.g. reject `> 60_000`) or just a doc note. The two new `ReloadValidatorTests` (zero, negative) and two new `MbproxyOptionsBindingTests` (default 1000, binds 250) are correct and adequate for the bounds that *are* enforced. **m9 — `StatusHubTests.SecondSubscriber_FirstLeaveKeepsArmed_LastLeaveDisarms` passes `null` to `OnDisconnectedAsync`.** Line 72/76. Real SignalR passes a non-null `Exception` on an abnormal disconnect and `null` on a clean one, so `null` is a legal value — but the test never exercises the exception-carrying path. `StatusHub.OnDisconnectedAsync` ignores the exception entirely, so this is harmless today; flagged so a future change that *uses* the exception doesn't find itself untested. ## Coverage gaps (behavior in `e719dd5` with NO test) 1. **`StatusBroadcaster.LoopAsync`** — the entire background loop (M1). 2. **Arm/disarm vs. `Record` race** in `TagValueCapture` (M2). 3. **`PlcSubscriptionTracker`** — no direct test, no concurrency test (M3). 4. **Sink-throws / build-throws error handling** in `PushOnceAsync` — the `try/catch` blocks and the `when (ex is not OperationCanceledException)` filter (m1). 5. **`ConfigReconciler` ↔ `TagCaptureRegistry` wiring** — capture created on hot-reload PLC add, removed on PLC remove (m7). 6. **`AdminEndpointHost` AdminPort hot-reload tears down the broadcaster and disarms captures** — `StopCurrentAppAsync` disposes `_broadcaster` (which `DisarmAll`s); the broadcaster's `StopAsync_DisarmsEveryCapture` test proves the disarm in isolation but no test proves the *hot-reload rebind* path runs it. (`AdminEndpointTests` has an admin-port rebind test from the prior commit — extend it to assert capture disarm.) 7. **`/hub/status` SignalR endpoint** — not reachable by any test. The 405-methods test explicitly excludes it. No test connects a SignalR client and verifies a `fleet`/`plc` message is actually received end-to-end. The `IStatusPushSink` unit tests cover the broadcaster logic but nothing covers `SignalRStatusPushSink` + `MapHub` wiring. Given the project already stands up a real Kestrel host in `AdminEndpointTests`, one `HubConnectionBuilder`-based E2E test (`SubscribeFleet` → await a `fleet` message) would close the single biggest "does the feature actually work" gap. 8. **Multi-tag / 32-bit capture via `BcdPduPipeline`** — `BcdPduPipelineCaptureTests` covers FC03 16/32-bit and FC06/FC16 16-bit, but not FC16 *32-bit* (CDAB pair write) capture, and not a PDU spanning a BCD tag *and* a non-BCD register (does capture record only the BCD one?). The pure `TagValueCaptureTests` cover 32-bit, but the *pipeline hook* for 32-bit writes is unverified. 9. **`Get_PlcDetailRoute_ReturnsDetailShell`** uses `/plc/anything` — it never checks that a PLC name with a slash, encoded characters, or empty segment is handled. The route is `/plc/{name}` and the name is read client-side, so this is low-risk, but `/plc/` (empty) and `/plc/a%2Fb` are untested. ## What looks good - **`IStatusPushSink` is a clean seam.** Extracting the outbound side behind an interface so the broadcaster loop logic is testable without a SignalR host is exactly right, and `FakeStatusPushSink` using `ConcurrentBag` is the correct call for a type that production pushes to from a background thread. - **`TagValueCapture` torn-read test is genuine.** `ConcurrentRecordAndSnapshot_NeverYields TornSlot` races 4 writers × 200k ops against a reader and checks a real invariant (`DecodedValue == RawLow + RawHigh`). This is the right way to test the `Volatile.Write` immutable-record swap, and it would actually catch a regression to a mutable slot. - **`SignalRFakes` model the right shapes.** `FakeHubCallerContext : HubCallerContext` with the abstract members overridden, `FakeGroupManager : IGroupManager`, and the hub-property injection (`Context = ...`, `Groups = ...`) is the standard, correct way to unit-test a SignalR `Hub` without a host. No mocking framework is dragged in. - **`BcdPduPipelineCaptureTests` regression guards are well-chosen.** The disarmed-capture and null-capture cases assert the rewrite *still happens byte-identically* — that is the load-bearing property (capture must never perturb the proxy's transparency) and it is explicitly tested. - **`TagCaptureRegistryTests.GetOrCreate_Rebuild_PreservesArmedFlag`** correctly tests the hot-reload reseat contract (rebuilt capture is a new instance but keeps `IsArmed`), and `UnknownPlc_Operations_AreSafeNoOps` covers the no-op-for-ghost-PLC contract that `StatusHub` relies on. - **`StatusHtmlRenderer` removal is clean.** No source or test file references it; the only remaining mentions are in `plans/`, `docs/`, and prior `codereviews/` — all expected. - **The `AdminPushIntervalMs` validation is consistent with the codebase.** Lower-bound-only `> 0` checks in both `MbproxyOptionsValidator` and `ReloadValidator`, error-message format matching the sibling checks, and tests for default + bind + zero + negative. This is the right pattern; the new tests are adequate for what is enforced. - **The csproj `EmbeddedResource` glob is correct** for the flat-directory design, the comment accurately documents the resource-name → request-path mapping, and there is no accidental double-include because the Worker SDK does not enable web default globs. - **`mbproxy.smoke.config.json` binds against the current schema** — no stale keys, the three-PLC line-a/line-b/line-dead topology is a sensible smoke surface, and `AdminPushIntervalMs` is present and explicit. ## Key file references - `src/Mbproxy/Admin/StatusBroadcaster.cs:113-135` — untested `LoopAsync` (M1). - `src/Mbproxy/Proxy/TagValueCapture.cs:103-129` — arm/disarm vs. record race (M2). - `src/Mbproxy/Admin/PlcSubscriptionTracker.cs` — no test file exists (M3). - `src/Mbproxy/Admin/StatusHub.cs:60-66` — `OnDisconnectedAsync`; `FakeGroupManager.Removed` is unasserted dead code (M4). - `tests/Mbproxy.Tests/Admin/SignalRFakes.cs` — success-path-only fakes (m1, m2). - `tests/Mbproxy.Tests/Configuration/ConfigReconcilerTests.cs` — compile-fix only, reconciler ↔ registry wiring untested (m7). - `src/Mbproxy/Mbproxy.csproj` — `EmbeddedResource Include="Admin\wwwroot\*.*"` (m5). - `tests/sim/mbproxy.smoke.config.json` — dangling `plans/` reference (m6).