# Code Review — Tests, Install/Packaging, and Config (HEAD `0308490`) Scope: `tests/Mbproxy.Tests/` (all test code), `tests/sim/` (simulator profile and fixture), `install/` (all scripts and config templates), `src/Mbproxy/Mbproxy.csproj`, and `tests/Mbproxy.Tests/Mbproxy.Tests.csproj`. Prior-review findings from `codereviews/2026-05-15/TestsAndConfig.md` were read for regression context; the code is reviewed fresh against the current HEAD. ## Summary The prior-review remediation is genuine and thorough: every `Major` finding from the 2026-05-15 review has been addressed — `StatusBroadcaster.LoopAsync` now has a proper timing-hedged loop test (`Loop_PushesRepeatedly_ThenStopsAfterStopAsync`), the `Record`/`Disarm` race is fixed in production and covered by `ConcurrentRecordAndDisarm_LeavesNoStaleObservation`, `PlcSubscriptionTracker` now has a dedicated test file including a concurrency stress test, the `ThrowingStatusPushSink` fake covers both the swallow-path and the `OperationCanceledException` filter, and `ConfigReconcilerTests` now holds a real registry and asserts add/remove lifecycle. The `HubStatusE2ETests` E2E pair covers the `MapHub`/`SignalRStatusPushSink` gap that the prior review flagged as the single biggest "does the feature actually work" gap. The `EmbeddedAssetsTests` glob guard and the `Get_Asset` byte-verbatim assertion also close the prior `m3`/`m4` nits. No prior-review finding has regressed. The remaining findings below are all new, introduced by the current HEAD or present in the pre-existing code but newly visible against the completed remediation baseline. **Findings: 0 Critical · 4 Major · 7 Minor** --- ## Critical None. --- ## Major **M1 — `AdminPort=0` silently binds a random OS-assigned port instead of disabling the admin endpoint.** `install/mbproxy.config.template.json:83`, `install/mbproxy.linux.config.template.json:87`, `docs/Operations/Configuration.md:54`; `src/Mbproxy/Admin/AdminEndpointHost.cs:190`. Both config templates carry the comment `// Set to 0 to disable the admin endpoint.`, and the Operations docs repeat this. The comment is wrong. `AdminEndpointHost.StartAppAsync` calls `k.Listen(System.Net.IPAddress.Any, port)` with whatever `port` is — Kestrel interprets port 0 as "pick a free OS-assigned port" (ephemeral port), not as "skip the bind". There is no code in `StartAppAsync` or its callers that special-cases port 0 as a disable signal. A production operator who sets `AdminPort: 0` to disable the admin surface does not get a disabled endpoint; they get an admin endpoint on an unknown ephemeral port that is not logged in the startup banner in a predictable way. This is made worse by: 1. `ReloadValidator.Validate` (line 65–68) rejects `AdminPort < 1`, so a hot-reload with `AdminPort: 0` is correctly blocked — but the *initial* startup path does not go through `ReloadValidator` (only `MbproxyOptionsValidator`, which does not range-check `AdminPort`). So port 0 at startup actually takes effect, while the same value in a reload is rejected. The two paths are inconsistent. 2. Several tests intentionally use `["Mbproxy:AdminPort"] = "0"` with the comment "disable admin to avoid port conflicts" (`ShutdownE2ETests.cs:163`, `StatusBroadcasterTests.cs:47`, `StatusSnapshotBuilderTests.cs:266`). These tests are not broken — they work because Kestrel binds on 0 and no test probes the admin port — but they silently start an admin endpoint rather than suppressing it. If those tests ever run concurrently with another test that probes all listening ports they can interfere. **Impact:** Operators following the documented `AdminPort: 0` procedure to disable the admin surface on a sensitive network have an undocumented, unauthorised HTTP listener running. This is a documentation/behavioural inconsistency on a production-critical setting. **Recommendation:** Implement the documented disable. In `AdminEndpointHost.StartAppAsync`, add `if (port == 0) return;` before the builder construction. Then update `ReloadValidator` to allow 0 (skip collision check) and update `MbproxyOptionsValidator` to allow 0. The templates and docs are then correct. Alternatively, remove the "Set to 0 to disable" comment and document that the admin endpoint is always started, and rename the test comments to reflect reality. --- **M2 — Multiple E2E tests hard-code `AdminPort: 8080`, causing bind-conflict failures when run in parallel or when port 8080 is already in use.** `tests/Mbproxy.Tests/Proxy/ProxyForwardingTests.cs:67,242,310`; `tests/Mbproxy.Tests/Proxy/RewriterE2ETests.cs:388`; `tests/Mbproxy.Tests/HostSmokeTests.cs:111`. `ProxyForwardingTests` (3 configs), `RewriterE2ETests` (1 config), and the shared `TestHostBuilderExtensions.ConfigureForTest` used by all three `HostSmokeTests` hard-code `["Mbproxy:AdminPort"] = "8080"`. When these tests run in parallel (xUnit v3's default policy for non-collection tests) they each try to bind a Kestrel listener on port 8080. The first binder wins; all subsequent tests get a `bind.failed` log and a missing admin endpoint. While the proxy tests themselves do not assert admin-endpoint behaviour, the spurious bind-failed error can mask real failures in CI logs and reduces test isolation. `HostSmokeTests` compounds this: it is not in any `[Collection]` so it runs concurrently with everything else. **Impact:** Flaky CI runs when multiple test classes execute simultaneously. Currently mitigated by the `AdminEndpointHost` bind-failure-is-non-fatal path, so test assertions still pass — but the coverage is degraded (admin is not actually up) and the log noise makes CI output harder to read. **Recommendation:** Replace the hard-coded `"8080"` values with `PickFreePort()` calls (the same pattern already used by `AdminEndpointTests` and `HubStatusE2ETests`). Update `TestHostBuilderExtensions.ConfigureForTest` to accept a port parameter with default 8080 for callers that do not care (smoke tests do not probe the admin surface and an ephemeral-port bind is fine). --- **M3 — `DL205SimulatorFixture` swallows the test's own `CancellationToken` in the readiness poll, blocking test cancellation for up to 120 seconds.** `tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs:121-163`. `InitializeAsync` constructs a `CancellationTokenSource` with a hard-coded 120-second deadline (line 61) and a `linked` token that chains the deadline against `CancellationToken.None` (line 122): ```csharp using var linked = CancellationTokenSource.CreateLinkedTokenSource( deadline.Token, CancellationToken.None); ``` The test framework's cancellation token — which xUnit v3 exposes as `TestContext.Current.CancellationToken` — is never linked in. If the test runner cancels the test suite (CI timeout, keyboard interrupt), the readiness poll continues for up to 120 more seconds and the spawned Python process is not killed until `DisposeAsync` finally runs. On a CI runner with a per-job timeout this can cause the whole runner to be killed mid-cleanup, leaving orphaned Python processes. **Impact:** Poor CI behaviour under cancellation; can leave zombie `python`/`pwsh` processes. **Recommendation:** Link the fixture against the framework's application lifetime or accept a `CancellationToken` through the `IAsyncLifetime` interface (xUnit v3 provides one via `InitializeAsync(CancellationToken)`). Alternatively, use `TestContext.Current.CancellationToken` from inside `InitializeAsync` if xUnit v3's fixture `IAsyncLifetime` overload provides it: ```csharp public async ValueTask InitializeAsync() // or the overload with CancellationToken ``` At minimum, change `CancellationToken.None` to the method's incoming token or `TestContext.Current.CancellationToken` inside `linked` so a kill signal actually propagates. --- **M4 — `install.ps1` copies the config template to `$dataDir` (ProgramData) but the `ReloadValidator` AdminPort=0 / `MbproxyOptionsValidator` gap means an operator using the template's `AdminPort: 0` disable hint starts a service with an undocumented ephemeral admin port.** (Secondary impact of M1, listed separately because the install path is independently broken.) `install/install.ps1:153-162`; `install/mbproxy.config.template.json:83-84`. When `install.ps1` seeds the template as the initial config and an operator follows the inline comment to set `AdminPort: 0`, the service starts with Kestrel binding on an OS-assigned ephemeral port, not disabled. The `install.ps1` script itself has no bug — the error is upstream in M1 — but the install path is where the misleading comment is first acted upon. Listing it separately so the install script gets a concrete fix recommendation: if M1 is fixed by adding a `port == 0 → skip` guard in `AdminEndpointHost`, the template comment becomes correct and this finding is resolved simultaneously. If M1 is fixed differently, the template comment must still be updated to match whatever the new semantics are. **Impact:** Same as M1 — an unexpected open HTTP port on a "disabled" admin config. **Recommendation:** Fix M1 first; this finding closes automatically once the template's `AdminPort: 0` comment reflects actual behaviour. --- ## Minor **m1 — `HubStatusE2ETests.PickFreePort` has a TOCTOU race: the port can be stolen between `l.Stop()` and Kestrel's bind.** `tests/Mbproxy.Tests/Admin/HubStatusE2ETests.cs:141-147`. The helper stops the listener and returns the port number — same pattern used in a dozen other test files. This is acknowledged as acceptable in the `DL205SimulatorFixture` TOCTOU comment (line 72). No change needed, but worth noting since `HubStatusE2ETests` is one of the few tests that allocates two ports in the same method (admin + proxy) and has a wider race window. Low priority; flagged for awareness. --- **m2 — `Loop_PushesRepeatedly_ThenStopsAfterStopAsync` has a 400ms post-stop idle assertion that can be a source of slowness but not flakiness.** `tests/Mbproxy.Tests/Admin/StatusBroadcasterTests.cs:188-190`. ```csharp await Task.Delay(400, TestContext.Current.CancellationToken); h.Sink.FleetPushes.Count.ShouldBe(afterStop, "no pushes may occur after StopAsync"); ``` The test correctly waits 400 ms after `StopAsync` to confirm silence. Given `AdminPushIntervalMs=100` and that `StopAsync` awaits the loop task (`await _loop`), the loop is guaranteed terminated before `StopAsync` returns — so zero extra delay is needed for the assertion to be correct. The 400 ms delay adds wall-clock time to every test run without buying additional confidence. The existing `_loop` await in `StopAsync` is the real guarantee. **Recommendation:** Drop the `Task.Delay(400, ...)` line and assert the count immediately after `StopAsync`. The test would be equally correct and ~400 ms faster. --- **m3 — `FakeGroupManager.Removed` list is still dead code: no test ever asserts it.** `tests/Mbproxy.Tests/Admin/SignalRFakes.cs:30-31`. This was noted as prior finding M4 in the 2026-05-15 review ("either delete `Removed` from the fake, or add a comment"). The list is still present and still has no asserting test. `StatusHub.OnDisconnectedAsync` never calls `RemoveFromGroupAsync` (relying on SignalR's implicit group cleanup), so no assertion ever fires. The unasserted field is a minor maintenance hazard: a future author may assume this is the intended assertion target and waste time debugging a test that never fails. **Recommendation:** Drop the `Removed` list from `FakeGroupManager`, or add a single inline comment: `// SignalR auto-removes disconnected connections from groups; explicit RemoveFromGroupAsync is not called.` --- **m4 — `mbproxy.smoke.config.json` still carries a dangling reference to `plans/`.** `tests/sim/mbproxy.smoke.config.json:1`. The header comment references `plans/2026-05-15-webui-dashboard.md`. As of commit `7466a46` ("retire superseded design/plan docs"), `plans/` is untracked (`?? plans/` in git status) and contains no file of that name. The comment is a stale citation that will silently fail any documentation-link check. **Recommendation:** Replace the reference with a plain description of the smoke file's purpose, removing the specific plan filename. --- **m5 — Both install templates have `AdminPort: 8080` hard-coded in the `Plcs` section inline comment, but the comment already appears in the admin-port section — the duplication in the PLC entry comment is not wrong but potentially confusing.** `install/mbproxy.config.template.json:79` (`"// GET /status.json → …"` inside the PLC description); both templates are identical here. This is a documentation nit: the `Plcs` array comment block on lines 60–75 is accurate, and the preceding inline comment about the 4-client cap remains true and relevant. No functional issue. **Recommendation:** No change required; noted for completeness. --- **m6 — `publish.ps1` and `publish.sh` do not exit non-zero if the config template copy fails.** `install/publish.ps1:87-94`; `install/publish.sh:85-95`. Both scripts call `throw` / `exit 1` if the template file is missing, which is correct. However the `Copy-Item -Force` / `cp -f` steps that copy the template into each output flavour silently succeed even if the destination directory does not exist (the directory was just created by `dotnet publish`). If `dotnet publish` produced no output directory (a degenerate build failure that `dotnet publish` should have already reported as non-zero), the copy also fails with an error but neither script re-checks `$LASTEXITCODE` after the copy loop. In practice `dotnet publish` failing before this point already causes both scripts to throw/exit (the preceding `if ($LASTEXITCODE -ne 0) { throw }` guards). This is low-risk but the post-copy result verification (the `Format-Size` / `du -h` summary block) is output-only and does not fail the script if a binary is missing — it emits a `Write-Warning` / `echo WARNING` instead. **Recommendation:** In the result summary loop of both scripts, change the missing-binary branch from a warning to a non-zero exit: ```powershell # publish.ps1 if (Test-Path $bin) { ... } else { throw "Expected binary not found: $bin" } ``` ```bash # publish.sh if [[ -f "$bin" ]]; then ... else echo "ERROR: expected binary not found: $bin" >&2; exit 1; fi ``` --- **m7 — `mbproxy.service` `ReadWritePaths` does not include `/etc/mbproxy`, so a hot-reload write by the service account to its own config directory is blocked by `ProtectSystem=strict`.** `install/mbproxy.service:40`. ```ini ReadWritePaths=/var/log/mbproxy /var/cache/mbproxy ``` `ProtectSystem=strict` makes the entire filesystem read-only except paths listed in `ReadWritePaths`. The service itself never writes to `/etc/mbproxy` (it only reads `appsettings.json`), so this is not a runtime bug. However: 1. The comment at line 8 states `WorkingDirectory=/etc/mbproxy` — the service account needs read access there, which `ProtectSystem=strict` provides (read-only is allowed). 2. If an operator's monitoring or deployment tooling updates the config in-place (e.g. `scp`-ing a new `appsettings.json` as the `mbproxy` user via an SSH key), that write will be blocked by `ProtectSystem=strict`. The operator would need to write as root and chown/chmod as appropriate. This is likely intentional (config is an admin operation), but the systemd unit has no comment explaining why `/etc/mbproxy` is intentionally absent from `ReadWritePaths`. It trips up operators who expect the service account to own its config. **Recommendation:** Add a comment after the `ReadWritePaths` line: `# /etc/mbproxy is intentionally absent — config changes require root (ProtectSystem=strict makes it read-only for the service account).` --- ## Regression check against 2026-05-15 findings | Prior finding | Status | |---|---| | M1 — `LoopAsync` untested | Closed — `Loop_PushesRepeatedly_ThenStopsAfterStopAsync` added | | M2 — `Record`/`Disarm` race | Closed — production fix + `ConcurrentRecordAndDisarm_LeavesNoStaleObservation` | | M3 — `PlcSubscriptionTracker` no direct/concurrent test | Closed — `PlcSubscriptionTrackerTests.cs` added with stress test | | M4 — `FakeGroupManager.Removed` dead code | Partially addressed: no test added; `Removed` list is still present — see new `m3` | | m1 — `ThrowingStatusPushSink` missing | Closed — `ThrowingStatusPushSink` and two swallow/propagate tests added | | m2 — `FakeHubCallerContext.ConnectionAborted` hardcoded | No change; accepted as low-risk | | m3 — asset allow-list coverage | Closed — `EmbeddedAssetsTests` covers glob; `Get_Asset` asserts bytes verbatim | | m4 — `Get_Asset` byte assertion missing | Closed — `ReadEmbeddedAsset` comparison added | | m5 — `*.*` glob caveats | Unchanged; accepted | | m6 — `mbproxy.smoke.config.json` dangling plans reference | Not fixed — see new `m4` | | m7 — `ConfigReconcilerTests` registry wiring untested | Closed — `Apply_AddThenRemovePlc_TagCaptureRegistryTracksRoster` added | | m8 — `AdminPushIntervalMs` upper bound unguarded | Closed — upper bound added to both `ReloadValidator` and `MbproxyOptionsValidator`; four new tests | | m9 — `OnDisconnectedAsync` always called with null | No change; accepted as low-risk | | Coverage gap 7 — `/hub/status` SignalR E2E | Closed — `HubStatusE2ETests` (2 facts) added | ## Key file references - `install/mbproxy.config.template.json:83`, `install/mbproxy.linux.config.template.json:87` — "Set to 0 to disable" comment (M1) - `src/Mbproxy/Admin/AdminEndpointHost.cs:190` — `k.Listen(..., port)` with no port-0 guard (M1) - `src/Mbproxy/Configuration/ReloadValidator.cs:65-68` — rejects `AdminPort < 1` on hot-reload but not at startup (M1) - `tests/Mbproxy.Tests/Proxy/ProxyForwardingTests.cs:67,242,310`, `RewriterE2ETests.cs:388`, `HostSmokeTests.cs:111` — hard-coded port 8080 (M2) - `tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs:121-122` — `CancellationToken.None` swallows test-runner cancellation (M3) - `tests/Mbproxy.Tests/Admin/StatusBroadcasterTests.cs:188` — unnecessary 400 ms idle delay (m2) - `tests/Mbproxy.Tests/Admin/SignalRFakes.cs:30-31` — `Removed` list never asserted (m3) - `tests/sim/mbproxy.smoke.config.json:1` — dangling `plans/` reference (m4) - `install/publish.ps1:103-110`, `install/publish.sh:99-106` — warning-only when expected binary missing (m6) - `install/mbproxy.service:40` — `ReadWritePaths` silent about `/etc/mbproxy` exclusion (m7)