Reviewed the new SignalR dashboard and fixed its two top findings: a stored XSS on the connection-detail page (unescaped tag name / direction / timestamp rendered into innerHTML) and FC03/FC04 cache hits bypassing the debug-view capture, which left cached tags frozen while their age climbed. Also adds an optional human-friendly Name to BCD tags surfaced on the debug view, and loads the real fleet config from tags.txt (12 named BCD tags, PLC Z28061) so the published appsettings.json is deploy-ready. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
20 KiB
Code Review — Tests & Config for the SignalR Dashboard (commit e719dd5)
Scope: the test files, config/options/build changes, and the smoke config introduced or
modified by e719dd5 ("replace status page with a live SignalR web dashboard"). Production
SignalR/capture code (StatusHub, StatusBroadcaster, TagValueCapture,
TagCaptureRegistry, AdminEndpointHost) was read for context but is reviewed only insofar
as it tells us whether the new tests actually exercise the risky paths. The wider UI
(dashboard.js, detail.js, HTML/CSS) is explicitly out of scope.
Summary
The new tests are competently written, follow the existing Subject_Condition_Expectation
naming, and the IStatusPushSink seam is a genuinely good design that makes the broadcaster
unit-testable without a SignalR host. The TagValueCapture torn-read test is a real
concurrency test, not a pretend one. But the concurrency-sensitive paths the commit message
advertises — "armed only while a detail page is open", arm/disarm under churn, the broadcaster
loop — are tested only at the single-threaded happy-path level. The most serious gaps:
- The
StatusBroadcaster.LoopAsyncbackground loop has zero test coverage — every broadcaster test calls the internalPushOnceAsyncdirectly. The loop's interval re-read, theMath.Max(100, ...)floor, the cancellation path, and the loop-terminated-unexpectedly catch are all unverified. - The arm/disarm-vs-record race in
TagValueCaptureis real and untested.Disarm()sets_armed = falsethen clears slots;Record()checks_armedthen writes. ARecordthat passes the_armedcheck beforeDisarmflips it can write a slot afterDisarmcleared it — leaving a stale observation on a disarmed capture. The torn-read test only racesRecordagainstSnapshot, never againstDisarm/Arm. - The
PlcSubscriptionTracker— the lock-guarded type whose entire job is concurrent subscriber counting — has no direct test at all and no concurrent test anywhere. Its behaviour is only incidentally exercised through single-threadedStatusHubTests.
None are release-blockers for a read-only admin page on a trusted segment, but #2 is a latent correctness bug and #1/#3 mean a regression in the live-feed plumbing would ship silently.
Findings
Critical
None. This is a read-only admin surface; nothing here can corrupt the Modbus path.
Major
M1 — StatusBroadcaster.LoopAsync has no test.
src/Mbproxy/Admin/StatusBroadcaster.cs:113-135; tests/Mbproxy.Tests/Admin/StatusBroadcasterTests.cs.
All four broadcaster tests drive PushOnceAsync directly. The actual background loop —
which is what runs in production — is untested. Specifically uncovered:
- the per-cycle
_options.CurrentValue.AdminPushIntervalMsre-read (the documented hot-reload-without-restart behaviour); - the
Math.Max(100, ...)floor that defends against a bad interval slipping past validation (note: validation rejects<= 0, so the floor only ever matters for values1..99— itself an untested corner); - that
StartAsync→StopAsyncactually terminates the loop and thatStopAsyncis safe to call when the loop never started. Fix: add a test that constructs the broadcaster withAdminPushIntervalMs~100ms, callsStart(), pollsSink.FleetPushes.Countuntil>= 2with a generous deadline + the testCancellationToken, thenStopAsync()and asserts the count stops growing. Keep timing hedged (ShouldBeGreaterThanOrEqualTo), consistent with the existing coalescing tests.
M2 — Arm/disarm-vs-record race in TagValueCapture is real and untested.
src/Mbproxy/Proxy/TagValueCapture.cs:103-129. Disarm() does _armed = false then clears
slots; Record() does if (!_armed) return; then Volatile.Write the slot. Interleaving:
Record reads _armed == true → Disarm sets false and clears all slots → Record
writes its observation. Result: a disarmed capture holds a non-null slot, and the next
Snapshot() reports stale traffic with UpdatedAtUtc != null — exactly the "reopened page
shows stale data" failure the on-demand design says it prevents. The hot path makes this
unlikely (record traffic stops when no viewer is attached) but it is not impossible: a
detail page closing while an in-flight FC03 response is being rewritten hits it.
ConcurrentRecordAndSnapshot_NeverYieldsTornSlot (line 118) races Record vs Snapshot
only — never vs Disarm. Fix in production: have Disarm() set _armed = false, then
clear slots, then clear again (or re-clear after a memory barrier), or re-check _armed
inside Record after the Volatile.Write and null the slot if disarmed. At minimum add a
test that hammers Arm/Disarm on one task while Record runs on another and asserts a
disarmed capture's Snapshot() is all-null. The review should not let this ship "tested"
when the test deliberately avoids the racy interleaving.
M3 — PlcSubscriptionTracker has no direct or concurrent test.
src/Mbproxy/Admin/PlcSubscriptionTracker.cs (whole file); no PlcSubscriptionTrackerTests.cs
exists. This is the single lock-guarded type whose correctness drives capture arm/disarm.
StatusHubTests exercises it only single-threaded and indirectly. Untested behaviour that
has no coverage anywhere:
- the redundant re-subscribe path (
Addreturnsfalsewhen the same connection re-subscribes the same PLC —set.Addfails); RemoveConnectionfor an unknown connection id returningArray.Empty<string>();- a connection subscribed to multiple PLCs being torn down in one
RemoveConnection; - concurrent
Add/RemoveConnectionfrom many "connections" — the lock is claimed thread-safe but nothing proves the count never goes negative or leaks. Fix: addPlcSubscriptionTrackerTestswith the single-threaded cases plus one parallel-stress test (N tasks each Add-then-Remove, assertActivePlcs()is empty and no exception), mirroringTxIdAllocatorTests' concurrency style.
M4 — StatusHub group-leave on disconnect is not verified.
src/Mbproxy/Admin/StatusHub.cs:60-66; StatusHubTests.cs.
OnDisconnectedAsync is tested for the capture-disarm side effect, but the hub never
calls Groups.RemoveFromGroupAsync — and FakeGroupManager records Removed but no test
ever asserts groups.Removed. Worth confirming this is intentional (real SignalR auto-
removes a disconnected connection from all groups, so an explicit RemoveFromGroupAsync is
genuinely unnecessary) — but then FakeGroupManager.Removed is dead code that implies a
contract the hub does not honour. Either delete Removed from the fake, or if the hub is
supposed to leave fleet groups explicitly, that's a production bug. As written, a reviewer
cannot tell which. Fix: drop the unused Removed list, or add a comment on the fake
explaining SignalR's implicit group cleanup so the gap is not mistaken for an omission.
Minor
m1 — SignalRFakes do not model the real SignalR failure surface.
tests/Mbproxy.Tests/Admin/SignalRFakes.cs. The fakes are faithful to the shapes but
model only the success path: FakeGroupManager.AddToGroupAsync always succeeds and is
synchronous; FakeStatusPushSink.PushFleetAsync never throws and never observes the
CancellationToken. The production code has explicit try/catch around
_sink.PushFleetAsync / PushPlcAsync (StatusBroadcaster.cs:88-110) — a sink that throws
is a real scenario (a SignalR send to a dropped connection) and that catch is completely
uncovered. The BuildDebug failure catch (StatusBroadcaster.cs:82-86) is likewise
uncovered. Fix: add a throwing variant of FakeStatusPushSink and assert PushOnceAsync
swallows the exception and still attempts the per-PLC pushes; assert the ct is honoured
(an OperationCanceledException from the sink must not be logged-and-swallowed the same
way — the when (ex is not OperationCanceledException) filter is untested).
m2 — FakeHubCallerContext.ConnectionAborted is hardcoded to CancellationToken.None.
SignalRFakes.cs:23. Real SignalR fires this token on disconnect. No current test needs it,
but if anyone later tests connection-abort handling the fake will silently mask it. Low risk;
note it with a comment so a future test author knows the fake is inert here.
m3 — Asset content-type test relies on a hand-maintained [InlineData] allow-list.
AdminEndpointTests.Get_Asset_ReturnsCorrectContentType covers 5 of the 14 embedded files.
bootstrap.bundle.min.js, detail.css, detail.js, index.html, plc.html,
ibm-plex-sans-400/600.woff2 are not asserted. More importantly there is no test that the
csproj glob actually embedded every wwwroot file — if someone adds favicon.ico to
wwwroot/ and the glob silently misses it (it won't, *.* catches it, but a .gitignore'd
or renamed file would), nothing fails. Fix: add a test that enumerates
typeof(AdminEndpointHost).Assembly.GetManifestResourceNames(), filters the
Mbproxy.Admin.wwwroot. prefix, and asserts every file physically present in wwwroot/ has
a matching resource (or just assert the count). This is the only thing that would catch a
broken EmbeddedResource glob.
m4 — Get_Asset does not assert the bytes are the right asset.
AdminEndpointTests.cs (the new theory). It asserts bytes.Length > 0 and the content
type, but not that dashboard.js contains a known marker (it already does this for the HTML
shells via ShouldContain("/assets/dashboard.js")). A resource-name collision or a wrong
ContentTypeFor mapping for a correctly-served-but-wrong file would pass. Cheap to harden:
for the text assets assert a known substring.
m5 — csproj EmbeddedResource Include="Admin\wwwroot\*.*" — glob caveats.
src/Mbproxy/Mbproxy.csproj (the new ItemGroup). The SDK is Microsoft.NET.Sdk.Worker,
which does not enable web default-item globs, so there is no double-include of .css/
.js/.html as Content — good. Two real caveats:
*.*matches only files containing a dot. Every current asset has an extension so this is fine, but it is a non-obvious constraint;**\*or just*would be more honest. The comment says "intentionally flat", so*.*is acceptable, but worth a one-word note that extensionless files would be skipped.- Globs in an
<ItemGroup>are evaluated at project-evaluation time; a newly-added asset is not picked up by an incremental build that didn't re-evaluate. In practicedotnet build/publishalways re-evaluates, so this is a non-issue — but it is the kind of thing that bites a watch-mode developer. No fix needed; flagged for completeness. The resource-name → request-path mapping (Mbproxy.Admin.wwwroot.<file>) is correct for the flat directory and matchesAssetResourcePrefixinAdminEndpointHost.cs:301.
m6 — mbproxy.smoke.config.json — valid but with stale/odd values.
tests/sim/mbproxy.smoke.config.json. The schema is current (AdminPushIntervalMs present,
Keepalive field names match KeepaliveOptions, no removed keys). Issues:
- The header comment says the simulator runs on
127.0.0.1:5020and bothline-a/line-bpointPort: 5020— consistent. Good. BackendHeartbeatIdleMs: 10000whileBackendRequestTimeoutMs: 2000— satisfies the "must be greater than BackendRequestTimeoutMs" cross-field rule. Fine.line-deadhas noBcdTags, so it inherits the emptyGlobal: []set — fine, the intent is just an unreachable backend.- The file has no
Resilienceand noCachesection. Both are optional (defaults apply), so the config binds — but a smoke config whose stated purpose is exercising the dashboard's "problems only" filter would be more representative with the defaults made explicit, or at least a comment that defaults are intentionally relied on. - Comment line 1 references "Phase 4/5 web-UI browser smoke tests" and
plans/2026-05-15-webui-dashboard.md.plans/is untracked (?? plans/in git status) and per recent commit7466a46the project has been retiring plan docs. A smoke config pointing at an untracked plan file is a dangling reference — either commit the plan or drop the citation. - No
Resilience.ReadCoalescingmeans coalescing runs with defaults during the smoke run; acceptable but undocumented.
m7 — ConfigReconcilerTests change is a minimal compile-fix, not a behaviour test.
tests/Mbproxy.Tests/Configuration/ConfigReconcilerTests.cs (the 1-line diff). The new
TagCaptureRegistry ctor arg is passed as a throwaway new TagCaptureRegistry(). That is
the correct minimal change, but it means the reconciler's new responsibility — calling
TagCaptureRegistry.GetOrCreate on PLC add and Remove on PLC removal during hot-reload —
has no test. ConfigReconciler.cs was modified in this commit (+11 lines) to wire the
registry; nothing asserts that a hot-reload-added PLC gets a capture or that a removed PLC's
capture is dropped. Fix: in ConfigReconcilerTests, pass a real registry the test holds a
reference to, trigger an add/remove reload, and assert registry.TryGet reflects it.
m8 — ReloadValidator/MbproxyOptions AdminPushIntervalMs validation: consistent, but
the upper bound is unguarded. ReloadValidator.cs and MbproxyOptionsValidator both
reject <= 0 — consistent with how GracefulShutdownTimeoutMs and the keepalive intervals
are validated (lower-bound-only, > 0). So the new validation matches house style. But
note no interval option in this codebase has an upper bound, and AdminPushIntervalMs is
the one most likely to be fat-fingered into something absurd (1000000 = 16-minute feed).
Not a regression and not inconsistent — flagged only because the dashboard's whole value
proposition is "live". The LoopAsync Math.Max(100, ...) floor protects against tiny
values; nothing protects against huge ones. Optional: a soft upper bound (e.g. reject
> 60_000) or just a doc note. The two new ReloadValidatorTests (zero, negative) and two
new MbproxyOptionsBindingTests (default 1000, binds 250) are correct and adequate for the
bounds that are enforced.
m9 — StatusHubTests.SecondSubscriber_FirstLeaveKeepsArmed_LastLeaveDisarms passes
null to OnDisconnectedAsync. Line 72/76. Real SignalR passes a non-null Exception
on an abnormal disconnect and null on a clean one, so null is a legal value — but the
test never exercises the exception-carrying path. StatusHub.OnDisconnectedAsync ignores
the exception entirely, so this is harmless today; flagged so a future change that uses
the exception doesn't find itself untested.
Coverage gaps (behavior in e719dd5 with NO test)
StatusBroadcaster.LoopAsync— the entire background loop (M1).- Arm/disarm vs.
Recordrace inTagValueCapture(M2). PlcSubscriptionTracker— no direct test, no concurrency test (M3).- Sink-throws / build-throws error handling in
PushOnceAsync— thetry/catchblocks and thewhen (ex is not OperationCanceledException)filter (m1). ConfigReconciler↔TagCaptureRegistrywiring — capture created on hot-reload PLC add, removed on PLC remove (m7).AdminEndpointHostAdminPort hot-reload tears down the broadcaster and disarms captures —StopCurrentAppAsyncdisposes_broadcaster(whichDisarmAlls); the broadcaster'sStopAsync_DisarmsEveryCapturetest proves the disarm in isolation but no test proves the hot-reload rebind path runs it. (AdminEndpointTestshas an admin-port rebind test from the prior commit — extend it to assert capture disarm.)/hub/statusSignalR endpoint — not reachable by any test. The 405-methods test explicitly excludes it. No test connects a SignalR client and verifies afleet/plcmessage is actually received end-to-end. TheIStatusPushSinkunit tests cover the broadcaster logic but nothing coversSignalRStatusPushSink+MapHubwiring. Given the project already stands up a real Kestrel host inAdminEndpointTests, oneHubConnectionBuilder-based E2E test (SubscribeFleet→ await afleetmessage) would close the single biggest "does the feature actually work" gap.- Multi-tag / 32-bit capture via
BcdPduPipeline—BcdPduPipelineCaptureTestscovers FC03 16/32-bit and FC06/FC16 16-bit, but not FC16 32-bit (CDAB pair write) capture, and not a PDU spanning a BCD tag and a non-BCD register (does capture record only the BCD one?). The pureTagValueCaptureTestscover 32-bit, but the pipeline hook for 32-bit writes is unverified. Get_PlcDetailRoute_ReturnsDetailShelluses/plc/anything— it never checks that a PLC name with a slash, encoded characters, or empty segment is handled. The route is/plc/{name}and the name is read client-side, so this is low-risk, but/plc/(empty) and/plc/a%2Fbare untested.
What looks good
IStatusPushSinkis a clean seam. Extracting the outbound side behind an interface so the broadcaster loop logic is testable without a SignalR host is exactly right, andFakeStatusPushSinkusingConcurrentBagis the correct call for a type that production pushes to from a background thread.TagValueCapturetorn-read test is genuine.ConcurrentRecordAndSnapshot_NeverYields TornSlotraces 4 writers × 200k ops against a reader and checks a real invariant (DecodedValue == RawLow + RawHigh). This is the right way to test theVolatile.Writeimmutable-record swap, and it would actually catch a regression to a mutable slot.SignalRFakesmodel the right shapes.FakeHubCallerContext : HubCallerContextwith the abstract members overridden,FakeGroupManager : IGroupManager, and the hub-property injection (Context = ...,Groups = ...) is the standard, correct way to unit-test a SignalRHubwithout a host. No mocking framework is dragged in.BcdPduPipelineCaptureTestsregression guards are well-chosen. The disarmed-capture and null-capture cases assert the rewrite still happens byte-identically — that is the load-bearing property (capture must never perturb the proxy's transparency) and it is explicitly tested.TagCaptureRegistryTests.GetOrCreate_Rebuild_PreservesArmedFlagcorrectly tests the hot-reload reseat contract (rebuilt capture is a new instance but keepsIsArmed), andUnknownPlc_Operations_AreSafeNoOpscovers the no-op-for-ghost-PLC contract thatStatusHubrelies on.StatusHtmlRendererremoval is clean. No source or test file references it; the only remaining mentions are inplans/,docs/, and priorcodereviews/— all expected.- The
AdminPushIntervalMsvalidation is consistent with the codebase. Lower-bound-only> 0checks in bothMbproxyOptionsValidatorandReloadValidator, error-message format matching the sibling checks, and tests for default + bind + zero + negative. This is the right pattern; the new tests are adequate for what is enforced. - The csproj
EmbeddedResourceglob is correct for the flat-directory design, the comment accurately documents the resource-name → request-path mapping, and there is no accidental double-include because the Worker SDK does not enable web default globs. mbproxy.smoke.config.jsonbinds against the current schema — no stale keys, the three-PLC line-a/line-b/line-dead topology is a sensible smoke surface, andAdminPushIntervalMsis present and explicit.
Key file references
src/Mbproxy/Admin/StatusBroadcaster.cs:113-135— untestedLoopAsync(M1).src/Mbproxy/Proxy/TagValueCapture.cs:103-129— arm/disarm vs. record race (M2).src/Mbproxy/Admin/PlcSubscriptionTracker.cs— no test file exists (M3).src/Mbproxy/Admin/StatusHub.cs:60-66—OnDisconnectedAsync;FakeGroupManager.Removedis unasserted dead code (M4).tests/Mbproxy.Tests/Admin/SignalRFakes.cs— success-path-only fakes (m1, m2).tests/Mbproxy.Tests/Configuration/ConfigReconcilerTests.cs— compile-fix only, reconciler ↔ registry wiring untested (m7).src/Mbproxy/Mbproxy.csproj—EmbeddedResource Include="Admin\wwwroot\*.*"(m5).tests/sim/mbproxy.smoke.config.json— danglingplans/reference (m6).