Fixes every finding from the codereviews/2026-05-16 multi-agent review (2 Critical, 20 Major, 38 Minor) and adds that review to the repo. Highlights: dashboard XSS escape; response cache invalidated on the write request (not just the response); ReloadValidator now runs at startup so port collisions / duplicate names / malformed Resilience profiles fail fast; AdminPort 0 genuinely disables the admin endpoint; PlcListener accept-loop faults propagate to the supervisor's faulted path; reconciler Restart builds before removing; Resilience pipelines are restart-only from a frozen snapshot; multiplexer connect-race leak, watchdog party-list snapshot, backend-response and FC16 framing validation; frontend reconnect retry and util.js load guard; plus the log-event/doc drift sweep and test-port hygiene. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review — Tests, Install/Packaging, and Config (HEAD 0308490)
Scope: tests/Mbproxy.Tests/ (all test code), tests/sim/ (simulator profile and fixture),
install/ (all scripts and config templates), src/Mbproxy/Mbproxy.csproj, and
tests/Mbproxy.Tests/Mbproxy.Tests.csproj. Prior-review findings from codereviews/2026-05-15/TestsAndConfig.md
were read for regression context; the code is reviewed fresh against the current HEAD.
Summary
The prior-review remediation is genuine and thorough: every Major finding from the
2026-05-15 review has been addressed — StatusBroadcaster.LoopAsync now has a proper
timing-hedged loop test (Loop_PushesRepeatedly_ThenStopsAfterStopAsync), the
Record/Disarm race is fixed in production and covered by
ConcurrentRecordAndDisarm_LeavesNoStaleObservation, PlcSubscriptionTracker now has a
dedicated test file including a concurrency stress test, the ThrowingStatusPushSink fake
covers both the swallow-path and the OperationCanceledException filter, and
ConfigReconcilerTests now holds a real registry and asserts add/remove lifecycle. The
HubStatusE2ETests E2E pair covers the MapHub/SignalRStatusPushSink gap that the prior
review flagged as the single biggest "does the feature actually work" gap. The EmbeddedAssetsTests
glob guard and the Get_Asset byte-verbatim assertion also close the prior m3/m4 nits.
No prior-review finding has regressed. The remaining findings below are all new, introduced by
the current HEAD or present in the pre-existing code but newly visible against the completed
remediation baseline.
Findings: 0 Critical · 4 Major · 7 Minor
Critical
None.
Major
M1 — AdminPort=0 silently binds a random OS-assigned port instead of disabling the admin endpoint.
install/mbproxy.config.template.json:83, install/mbproxy.linux.config.template.json:87,
docs/Operations/Configuration.md:54; src/Mbproxy/Admin/AdminEndpointHost.cs:190.
Both config templates carry the comment // Set to 0 to disable the admin endpoint., and the
Operations docs repeat this. The comment is wrong. AdminEndpointHost.StartAppAsync calls
k.Listen(System.Net.IPAddress.Any, port) with whatever port is — Kestrel interprets port 0
as "pick a free OS-assigned port" (ephemeral port), not as "skip the bind". There is no code in
StartAppAsync or its callers that special-cases port 0 as a disable signal. A production
operator who sets AdminPort: 0 to disable the admin surface does not get a disabled endpoint;
they get an admin endpoint on an unknown ephemeral port that is not logged in the startup banner
in a predictable way. This is made worse by:
- ReloadValidator.Validate (lines 65–68) rejects AdminPort < 1, so a hot-reload with AdminPort: 0 is correctly blocked, but the initial startup path does not go through ReloadValidator (only MbproxyOptionsValidator, which does not range-check AdminPort). So port 0 at startup actually takes effect, while the same value in a reload is rejected. The two paths are inconsistent.
- Several tests intentionally use ["Mbproxy:AdminPort"] = "0" with the comment "disable admin to avoid port conflicts" (ShutdownE2ETests.cs:163, StatusBroadcasterTests.cs:47, StatusSnapshotBuilderTests.cs:266). These tests are not broken: they work because Kestrel binds on 0 and no test probes the admin port. However, they silently start an admin endpoint rather than suppressing it. If those tests ever run concurrently with another test that probes all listening ports they can interfere.
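One way to remove the startup/reload inconsistency is to route both validators through a single range check. The sketch below is only an illustration: AdminPortRule and IsValid are hypothetical names, the real shapes of ReloadValidator and MbproxyOptionsValidator are not shown in this review, and the rule assumes the "0 means disabled" semantics are adopted.

```csharp
using System;

// Hypothetical shared rule; not the project's actual validator code.
public static class AdminPortRule
{
    // Accepts 0 (assumed to mean "admin endpoint disabled") or a real TCP port.
    // Returns false and fills 'error' for anything else.
    public static bool IsValid(int adminPort, out string error)
    {
        if (adminPort == 0 || (adminPort >= 1 && adminPort <= 65535))
        {
            error = string.Empty;
            return true;
        }

        error = $"AdminPort {adminPort} is outside 1..65535 (use 0 to disable).";
        return false;
    }
}
```

If both the startup validator and the reload validator called the same method, a value could no longer be accepted on one path and rejected on the other.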
Impact: Operators following the documented AdminPort: 0 procedure to disable the admin
surface on a sensitive network have an undocumented, unauthorised HTTP listener running. This is
a documentation/behavioural inconsistency on a production-critical setting.
Recommendation: Implement the documented disable. In AdminEndpointHost.StartAppAsync, add
if (port == 0) return; before the builder construction. Then update ReloadValidator to allow
0 (skip collision check) and update MbproxyOptionsValidator to allow 0. The templates and docs
are then correct. Alternatively, remove the "Set to 0 to disable" comment and document that the
admin endpoint is always started, and rename the test comments to reflect reality.
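The port-0 behaviour is easy to reproduce outside Kestrel, because it comes from the operating system's bind semantics. The following sketch uses a plain TcpListener as a stand-in for the Kestrel listener; AdminBindSketch and its methods are hypothetical and are not taken from AdminEndpointHost.

```csharp
#nullable enable
using System.Net;
using System.Net.Sockets;

// Hypothetical stand-in for the admin host's bind step.
public static class AdminBindSketch
{
    // Binds exactly what it is given, as the unguarded path does.
    // With port 0 the OS assigns a free ephemeral port.
    public static TcpListener StartUnguarded(int port)
    {
        var listener = new TcpListener(IPAddress.Loopback, port);
        listener.Start();
        return listener;
    }

    // Guarded variant: port 0 is treated as "disabled", so nothing is bound.
    public static TcpListener? StartUnlessDisabled(int port)
        => port == 0 ? null : StartUnguarded(port);

    // Reads back the port the OS actually assigned.
    public static int BoundPort(TcpListener listener)
        => ((IPEndPoint)listener.LocalEndpoint).Port;
}
```

Calling StartUnguarded(0) yields a live listener on some non-zero port, which is the situation described in M1. StartUnlessDisabled(0) returns null without opening a socket, which is the effect the early return in the recommendation is meant to achieve.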
M2 — Multiple E2E tests hard-code AdminPort: 8080, causing bind-conflict failures when run in parallel or when port 8080 is already in use.
tests/Mbproxy.Tests/Proxy/ProxyForwardingTests.cs:67,242,310; tests/Mbproxy.Tests/Proxy/RewriterE2ETests.cs:388;
tests/Mbproxy.Tests/HostSmokeTests.cs:111.
ProxyForwardingTests (3 configs), RewriterE2ETests (1 config), and the shared
TestHostBuilderExtensions.ConfigureForTest used by all three HostSmokeTests hard-code
["Mbproxy:AdminPort"] = "8080". When these tests run in parallel (xUnit v3's default policy
for non-collection tests) they each try to bind a Kestrel listener on port 8080. The first
binder wins; all subsequent tests get a bind.failed log and a missing admin endpoint. While
the proxy tests themselves do not assert admin-endpoint behaviour, the spurious bind-failed error
can mask real failures in CI logs and reduces test isolation. HostSmokeTests compounds this:
it is not in any [Collection] so it runs concurrently with everything else.
Impact: Flaky CI runs when multiple test classes execute simultaneously. Currently mitigated
by the AdminEndpointHost bind-failure-is-non-fatal path, so test assertions still pass — but
the coverage is degraded (admin is not actually up) and the log noise makes CI output harder to
read.
Recommendation: Replace the hard-coded "8080" values with PickFreePort() calls (the
same pattern already used by AdminEndpointTests and HubStatusE2ETests). Update
TestHostBuilderExtensions.ConfigureForTest to accept a port parameter with default 8080 for
callers that do not care (smoke tests do not probe the admin surface and an ephemeral-port bind
is fine).
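A minimal sketch of the pattern, assuming PickFreePort works the way m1 describes it (bind port 0, read the assigned port, stop the listener). TestPorts and AdminPortSettings are hypothetical names; the real ConfigureForTest signature is not shown in this review.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Net.Sockets;

// Hypothetical test helpers; names are illustrative only.
public static class TestPorts
{
    // Asks the OS for a free loopback port, then releases it.
    // The port can be taken by another process before the caller binds it (see m1).
    public static int PickFreePort()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            return ((IPEndPoint)listener.LocalEndpoint).Port;
        }
        finally
        {
            listener.Stop();
        }
    }

    // Builds the admin-port setting; 8080 stays the default for callers that do not care.
    public static Dictionary<string, string> AdminPortSettings(int adminPort = 8080)
    {
        return new Dictionary<string, string>
        {
            ["Mbproxy:AdminPort"] = adminPort.ToString(CultureInfo.InvariantCulture),
        };
    }
}
```

A test that needs isolation would call TestPorts.AdminPortSettings(TestPorts.PickFreePort()), so each host gets its own admin port instead of competing for 8080.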
M3 — DL205SimulatorFixture swallows the test's own CancellationToken in the readiness poll, blocking test cancellation for up to 120 seconds.
tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs:121-163.
InitializeAsync constructs a CancellationTokenSource with a hard-coded 120-second deadline
(line 61) and a linked token that chains the deadline against CancellationToken.None (line
122):
using var linked = CancellationTokenSource.CreateLinkedTokenSource(
deadline.Token, CancellationToken.None);
The test framework's cancellation token — which xUnit v3 exposes as
TestContext.Current.CancellationToken — is never linked in. If the test runner cancels the
test suite (CI timeout, keyboard interrupt), the readiness poll continues for up to 120 more
seconds and the spawned Python process is not killed until DisposeAsync finally runs. On a CI
runner with a per-job timeout this can cause the whole runner to be killed mid-cleanup, leaving
orphaned Python processes.
Impact: Poor CI behaviour under cancellation; can leave zombie python/pwsh processes.
Recommendation: Link the fixture's readiness poll to the test run's cancellation. xUnit v3's
IAsyncLifetime.InitializeAsync takes no parameters, so the token cannot arrive as a method
argument. Use TestContext.Current.CancellationToken from inside InitializeAsync instead, after
confirming that it is populated during fixture initialization in the xUnit v3 version in use:
public async ValueTask InitializeAsync()
At minimum, change CancellationToken.None to TestContext.Current.CancellationToken inside
linked so a kill signal actually propagates.
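The linking itself can be sketched in isolation. ReadinessPoll and WaitUntilReadyAsync are hypothetical names and the 50 ms poll interval is an arbitrary choice; the only point is that the caller's token takes the place of CancellationToken.None in the linked source.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical readiness poll; not the fixture's actual code.
public static class ReadinessPoll
{
    // Polls isReady until it returns true, the deadline elapses, or the caller cancels.
    // Returns true when ready, false on deadline; caller cancellation throws.
    public static async Task<bool> WaitUntilReadyAsync(
        Func<bool> isReady, TimeSpan deadline, CancellationToken callerToken)
    {
        using var deadlineCts = new CancellationTokenSource(deadline);
        // callerToken replaces CancellationToken.None, so either source ends the wait.
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            deadlineCts.Token, callerToken);

        try
        {
            while (!isReady())
            {
                await Task.Delay(50, linked.Token);
            }
            return true;
        }
        catch (OperationCanceledException) when (!callerToken.IsCancellationRequested)
        {
            // Only the deadline fired: report "not ready" instead of throwing.
            return false;
        }
    }
}
```

In the fixture, the third argument would be TestContext.Current.CancellationToken, so a cancelled run ends the poll within one poll interval instead of waiting out the 120-second deadline.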
M4 — install.ps1 copies the config template to $dataDir (ProgramData) but the ReloadValidator AdminPort=0 / MbproxyOptionsValidator gap means an operator using the template's AdminPort: 0 disable hint starts a service with an undocumented ephemeral admin port.
(Secondary impact of M1, listed separately because the install path is independently broken.)
install/install.ps1:153-162; install/mbproxy.config.template.json:83-84.
When install.ps1 seeds the template as the initial config and an operator follows the inline
comment to set AdminPort: 0, the service starts with Kestrel binding on an OS-assigned
ephemeral port, not disabled. The install.ps1 script itself has no bug — the error is upstream
in M1 — but the install path is where the misleading comment is first acted upon. Listing it
separately so the install script gets a concrete fix recommendation: if M1 is fixed by adding a
port == 0 → skip guard in AdminEndpointHost, the template comment becomes correct and this
finding is resolved simultaneously. If M1 is fixed differently, the template comment must still
be updated to match whatever the new semantics are.
Impact: Same as M1 — an unexpected open HTTP port on a "disabled" admin config.
Recommendation: Fix M1 first; this finding closes automatically once the template's
AdminPort: 0 comment reflects actual behaviour.
Minor
m1 — HubStatusE2ETests.PickFreePort has a TOCTOU race: the port can be stolen between l.Stop() and Kestrel's bind.
tests/Mbproxy.Tests/Admin/HubStatusE2ETests.cs:141-147.
The helper stops the listener and returns the port number — same pattern used in a dozen other
test files. This is acknowledged as acceptable in the DL205SimulatorFixture TOCTOU comment
(line 72). No change needed, but worth noting since HubStatusE2ETests is one of the few tests
that allocates two ports in the same method (admin + proxy) and has a wider race window. Low
priority; flagged for awareness.
m2 — Loop_PushesRepeatedly_ThenStopsAfterStopAsync has a 400ms post-stop idle assertion that can be a source of slowness but not flakiness.
tests/Mbproxy.Tests/Admin/StatusBroadcasterTests.cs:188-190.
await Task.Delay(400, TestContext.Current.CancellationToken);
h.Sink.FleetPushes.Count.ShouldBe(afterStop, "no pushes may occur after StopAsync");
The test correctly waits 400 ms after StopAsync to confirm silence. Given AdminPushIntervalMs=100
and that StopAsync awaits the loop task (await _loop), the loop is guaranteed terminated
before StopAsync returns — so zero extra delay is needed for the assertion to be correct. The
400 ms delay adds wall-clock time to every test run without buying additional confidence. The
existing _loop await in StopAsync is the real guarantee.
Recommendation: Drop the Task.Delay(400, ...) line and assert the count immediately after
StopAsync. The test would be equally correct and ~400 ms faster.
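The guarantee can be illustrated with a reduced model of the loop. PushLoop below is a hypothetical stand-in for StatusBroadcaster that keeps only the relevant shape: StopAsync cancels and then awaits the loop task.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for StatusBroadcaster; only the stop-awaits-loop shape matters.
public sealed class PushLoop
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly Task _loop;
    private int _pushes;

    public int Pushes => Volatile.Read(ref _pushes);

    public PushLoop(TimeSpan interval)
    {
        _loop = RunAsync(interval, _cts.Token);
    }

    private async Task RunAsync(TimeSpan interval, CancellationToken ct)
    {
        try
        {
            while (true)
            {
                Interlocked.Increment(ref _pushes); // stands in for one push to the sink
                await Task.Delay(interval, ct);
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path.
        }
    }

    public async Task StopAsync()
    {
        _cts.Cancel();
        await _loop; // the loop has fully exited once this completes
    }
}
```

Because StopAsync completes only after the loop task has finished, a count read immediately after it returns cannot change later, which is why the extra delay adds no confidence.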
m3 — FakeGroupManager.Removed list is still dead code: no test ever asserts it.
tests/Mbproxy.Tests/Admin/SignalRFakes.cs:30-31.
This was noted as prior finding M4 in the 2026-05-15 review ("either delete Removed from the
fake, or add a comment"). The list is still present and still has no asserting test.
StatusHub.OnDisconnectedAsync never calls RemoveFromGroupAsync (relying on SignalR's
implicit group cleanup), so no assertion ever fires. The unasserted field is a minor maintenance
hazard: a future author may assume this is the intended assertion target and waste time debugging
a test that never fails.
Recommendation: Drop the Removed list from FakeGroupManager, or add a single inline
comment: // SignalR auto-removes disconnected connections from groups; explicit RemoveFromGroupAsync is not called.
m4 — mbproxy.smoke.config.json still carries a dangling reference to plans/.
tests/sim/mbproxy.smoke.config.json:1.
The header comment references plans/2026-05-15-webui-dashboard.md. As of commit 7466a46
("retire superseded design/plan docs"), plans/ is untracked (?? plans/ in git status) and
contains no file of that name. The comment is a stale citation that will silently fail any
documentation-link check.
Recommendation: Replace the reference with a plain description of the smoke file's purpose, removing the specific plan filename.
m5 — Both install templates have AdminPort: 8080 hard-coded in the Plcs section inline comment, but the comment already appears in the admin-port section — the duplication in the PLC entry comment is not wrong but potentially confusing.
install/mbproxy.config.template.json:79 ("// GET /status.json → …" inside the PLC
description); both templates are identical here.
This is a documentation nit: the Plcs array comment block on lines 60–75 is accurate, and the
preceding inline comment about the 4-client cap remains true and relevant. No functional issue.
Recommendation: No change required; noted for completeness.
m6 — publish.ps1 and publish.sh do not exit non-zero if the config template copy fails.
install/publish.ps1:87-94; install/publish.sh:85-95.
Both scripts call throw / exit 1 if the template file is missing, which is correct. However,
the Copy-Item -Force / cp -f steps that copy the template into each output flavour are not
individually checked. They succeed in the normal case because dotnet publish has just created
the destination directory. If dotnet publish produced no output directory (a degenerate build
failure that dotnet publish should have already reported as non-zero), the copy fails with an
error, but neither script re-checks the result after the copy loop. In practice dotnet publish
failing before this point already causes both scripts to throw/exit (the preceding
if ($LASTEXITCODE -ne 0) { throw } guards), so this part is low-risk. The more relevant gap is
that the post-copy result verification (the Format-Size / du -h summary block) is output-only
and does not fail the script if a binary is missing: it emits a Write-Warning / echo WARNING
instead.
Recommendation: In the result summary loop of both scripts, change the missing-binary branch from a warning to a non-zero exit:
# publish.ps1
if (Test-Path $bin) { ... } else { throw "Expected binary not found: $bin" }
# publish.sh
if [[ -f "$bin" ]]; then ... else echo "ERROR: expected binary not found: $bin" >&2; exit 1; fi
m7 — mbproxy.service ReadWritePaths does not include /etc/mbproxy, so a hot-reload write by the service account to its own config directory is blocked by ProtectSystem=strict.
install/mbproxy.service:40.
ReadWritePaths=/var/log/mbproxy /var/cache/mbproxy
ProtectSystem=strict makes the entire filesystem read-only except paths listed in
ReadWritePaths. The service itself never writes to /etc/mbproxy (it only reads
appsettings.json), so this is not a runtime bug. However:
- The comment at line 8 states WorkingDirectory=/etc/mbproxy; the service account needs read access there, which ProtectSystem=strict provides (read-only is allowed).
- If an operator's monitoring or deployment tooling updates the config in-place (e.g. scp-ing a new appsettings.json as the mbproxy user via an SSH key), that write will be blocked by ProtectSystem=strict. The operator would need to write as root and chown/chmod as appropriate.
This is likely intentional (config is an admin operation), but the systemd unit has no comment
explaining why /etc/mbproxy is intentionally absent from ReadWritePaths. It trips up
operators who expect the service account to own its config.
Recommendation: Add a comment after the ReadWritePaths line: # /etc/mbproxy is intentionally absent — config changes require root (ProtectSystem=strict makes it read-only for the service account).
Regression check against 2026-05-15 findings
| Prior finding | Status |
|---|---|
| M1: LoopAsync untested | Closed: Loop_PushesRepeatedly_ThenStopsAfterStopAsync added |
| M2: Record/Disarm race | Closed: production fix + ConcurrentRecordAndDisarm_LeavesNoStaleObservation |
| M3: PlcSubscriptionTracker no direct/concurrent test | Closed: PlcSubscriptionTrackerTests.cs added with stress test |
| M4: FakeGroupManager.Removed dead code | Partially addressed: no test added; Removed list is still present (see new m3) |
| m1: ThrowingStatusPushSink missing | Closed: ThrowingStatusPushSink and two swallow/propagate tests added |
| m2: FakeHubCallerContext.ConnectionAborted hardcoded | No change; accepted as low-risk |
| m3: asset allow-list coverage | Closed: EmbeddedAssetsTests covers glob; Get_Asset asserts bytes verbatim |
| m4: Get_Asset byte assertion missing | Closed: ReadEmbeddedAsset comparison added |
| m5: *.* glob caveats | Unchanged; accepted |
| m6: mbproxy.smoke.config.json dangling plans reference | Not fixed (see new m4) |
| m7: ConfigReconcilerTests registry wiring untested | Closed: Apply_AddThenRemovePlc_TagCaptureRegistryTracksRoster added |
| m8: AdminPushIntervalMs upper bound unguarded | Closed: upper bound added to both ReloadValidator and MbproxyOptionsValidator; four new tests |
| m9: OnDisconnectedAsync always called with null | No change; accepted as low-risk |
| Coverage gap 7: /hub/status SignalR E2E | Closed: HubStatusE2ETests (2 facts) added |
Key file references
- install/mbproxy.config.template.json:83, install/mbproxy.linux.config.template.json:87: "Set to 0 to disable" comment (M1)
- src/Mbproxy/Admin/AdminEndpointHost.cs:190: k.Listen(..., port) with no port-0 guard (M1)
- src/Mbproxy/Configuration/ReloadValidator.cs:65-68: rejects AdminPort < 1 on hot-reload but not at startup (M1)
- tests/Mbproxy.Tests/Proxy/ProxyForwardingTests.cs:67,242,310, RewriterE2ETests.cs:388, HostSmokeTests.cs:111: hard-coded port 8080 (M2)
- tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs:121-122: CancellationToken.None swallows test-runner cancellation (M3)
- tests/Mbproxy.Tests/Admin/StatusBroadcasterTests.cs:188: unnecessary 400 ms idle delay (m2)
- tests/Mbproxy.Tests/Admin/SignalRFakes.cs:30-31: Removed list never asserted (m3)
- tests/sim/mbproxy.smoke.config.json:1: dangling plans/ reference (m4)
- install/publish.ps1:103-110, install/publish.sh:99-106: warning-only when expected binary missing (m6)
- install/mbproxy.service:40: ReadWritePaths silent about /etc/mbproxy exclusion (m7)