Files
wwtools/mbproxy/codereviews/2026-05-14/TestSuite.md
Joseph Doherty f2c6669444 mbproxy/codereviews: 2026-05-14 in-depth review + remediation plan
Eight area-focused reviews (BCD rewriter, multiplexer, response cache,
supervisor + hot-reload, admin + diagnostics, hosting + options, test
suite) plus an Overview that prioritises findings across areas, and a
RemediationPlan that groups the work into three waves with per-item
file:line citations and regression-test sketches.

Findings call out: hot-reload tag-list/cache changes that don't reach the
running multiplexer, a coalescing factory leak that hangs late attachers,
backend-reader head-of-line block on a wedged upstream, stranded outbound
frames after cascade, and ShutdownCoordinator double-stop ordering. Plus
the unconventional 32-bit BCD wire format (two base-10000 digits in CDAB,
not standard binary), unreachable BcdValidationError.DuplicateAddress,
mbproxy.cache.flushed event that's defined but never emitted, and missing
test coverage for Cache.AllowLongTtl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 05:15:34 -04:00

13 KiB
Raw Permalink Blame History

Test Suite Assessment

Scope: every .cs file under tests/Mbproxy.Tests/ (excluding bin/obj). Cross-checked against docs/design.md Testing section, docs/plan/README.md per-phase test inventory, docs/Testing/Simulator.md. Per CLAUDE.md the suite was around 325 tests at Phase 9 baseline and has grown through Phase 11.

Summary

Counts (by [Fact] / [Theory] attribute, recursive over tests/Mbproxy.Tests/**/*.cs excluding obj/):

Bucket Files Test methods
Unit (no I/O or stub-socket only, [Trait("Category","Unit")]) 26 ~189
E2E ([Trait("Category","E2E")], simulator + real Kestrel + real host) 9 ~37
Other / fixture smoke (HostSmokeTests, SimulatorSmokeTests, fixture files) 4 7
Total method count 39 ~233

Only 2 [Theory] tests exist (both in BcdCodecTests), each with a few [InlineData] rows, so the actual executed-test count is roughly 240245, slightly below the README/CLAUDE.md baseline of "325 tests at Phase 9 (282 unit + 43 E2E)". Either CLAUDE.md is over-reporting (perhaps counting individual Theory cases that have since been refactored away, or counting per-PLC parameterized runs), or test consolidation has happened since Phase 9 — worth reconciling. Phase 11 net additions are visible (Cache/* directory: 5 files, 32 facts).

Headlines:

  • BCD codec, MBAP framing, multiplexer correlation, cache, and coalescing are all well-covered with both targeted unit tests and E2E + stub-backend integration tests; the suite has clear architectural layering (pure unit → stub-socket integration → simulator E2E).
  • The E2E suite uses a single shared DL205SimulatorFixture that gracefully Assert.Skips when Python/pymodbus is unavailable, but it leans heavily on Task.Delay polling (64 occurrences) and explicit time-window arithmetic — flake-prone on slow/loaded CI workers.
  • The biggest concrete coverage gap is the Cache.AllowLongTtl reload-rejection rule, which is documented in docs/design.md and enforced in ReloadValidator.cs (line 100, 128) and MbproxyOptions.cs but has zero corresponding test in ReloadValidatorTests.cs or anywhere in the suite (grep -rn AllowLongTtl tests/ returns nothing).

Coverage matrix

Subsystem Rating Notes
BCD codec (16/32-bit, partial, OOR) Strong BcdCodecTests.cs (19) + BcdPduPipelineTests.cs (26) cover encode/decode, single-reg, full 32-bit CDAB pair, partial low-only & high-only with PartialBcdWarnings, mixed-with-non-BCD, bad-nibble pass-through, FC06 OOR, FC16 multi-reg + partial pair, exception responses, length invariant.
MBAP frame parsing Adequate MbapFrameTests.cs (10 facts). Not read in depth — but headers in PlcMultiplexerTests exercise real wire frames end-to-end.
Multiplexer (TxId, correlation, cascade, watchdog) Strong TxIdAllocatorTests (8, incl. saturation, full-65536 sweep, wrap, parallel-stress concurrency), CorrelationMapTests (5), RewriterCorrelationTests (4), PlcMultiplexerTests (8: TxId-collision, distinct-on-wire, upstream-disconnect isolation, backend-disconnect cascade with 3 upstreams, watchdog → exception 0x0B, reconnect after cascade).
Coalescing Strong ReadCoalescingTests (8) + CoalescingKeyTests (5) + InFlightByKeyMapTests (6) + ReadCoalescingE2ETests (5). Covers leader/waiter, FC03/FC04 not coalesced, FC06 never coalesced, MaxParties cap (= 2 forces overflow), dead-upstream CoalescedResponseToDeadUpstream counter, hot-reload Enabled=false path.
Cache Strong ResponseCacheTests (10), CacheInvalidatorTests (8), ResponseCacheMultiplexerTests (8), CacheKeyTests (3), CacheEntryTests (3), ResponseCacheE2ETests (5). Covers TTL expiry, LRU + access-order, multi-tag min-TTL/zero-TTL disable, range-overlap (full/low/high/adjacent/disjoint/wrong-unit/zero-qty), write invalidation, cache→coalesce short-circuit ordering, post-rewriter byte caching, concurrent get/set stress (8 tasks × 500 ops).
Supervisor Adequate SupervisorTests (6) covers bind, port-in-use → Recovering, recovery-on-port-free, Stop-during-Recovering, RecoveryAttempts accumulation. BackendConnectRetryTests (3) and PolicyFactoryTests (5) cover Polly. Test 4 ("RuntimeFault") is a placeholder (test name says runtime fault but doesn't actually induce one — it self-admits this).
Hot reload Adequate HotReloadE2ETests (4): add PLC, remove PLC, change global tag list (reseat without restart), invalid reload (port collision) → no mutation. ConfigReconcilerTests (4) adds concurrent-reload serialisation. ReloadPlanTests (8) and ReloadValidatorTests (8) cover plan computation and validation rules except the cache-related ones.
Admin endpoint Strong AdminEndpointTests (6 E2E): JSON shape, post-FC03 PDU count, partial-BCD warning surfaced, HTML+meta-refresh, bind-failure isolation (service stays up + logs bind-failed), admin-port hot-reload rebind. StatusSnapshotBuilderTests (6) + StatusHtmlRendererTests (3).
Shutdown Strong ShutdownCoordinatorTests (4): no-conn, with-conn drain, timeout-cancel-with-count, ordering (admin stops after listeners). ShutdownE2ETests (2): graceful drain ≤ 10s, in-flight cancel after timeout.

Specific gaps (design contracts without a test)

  1. Cache.AllowLongTtl=false rejects CacheTtlMs > 60_000 — enforced at src/Mbproxy/Configuration/ReloadValidator.cs:90-128 and src/Mbproxy/Options/MbproxyOptions.cs:58-111, but no test exists in ReloadValidatorTests.cs (only 8 tests covering name dup, port dup, admin collision, BCD map error, port range, admin-port range, happy path, empty name). Both the per-tag rule and the global reload rule should each get a test.
  2. Reload-driven cache flush on tag-list changedocs/design.md and CLAUDE.md state "a tag-list reload flushes the affected PLC's response cache". HotReloadE2ETests.E2E_ChangeGlobalBcdTagList_RewriteReflectsImmediately proves the reseat happens but does not assert that any pre-existing cache entries for that PLC are dropped (e.g., set entry, mutate tag list, observe Clear increment / next read goes to backend). And per the Supervisor review's Critical 1, this is exactly the behaviour that today does NOT actually fire in the running multiplexer — a test would catch the regression directly.
  3. SupervisorTests.Supervisor_RuntimeFault_TriggersRecovery is named for the contract but explicitly admits "we verify recovery mechanics are wired … For a full runtime-fault scenario, see the E2E tests." SupervisorE2ETests only has 2 tests — needs a check that they actually fault the accept loop. No test currently forces an accept-loop exception and asserts the supervisor resumes.
  4. Backend FC03/FC04 max-qty pass-through (PLC enforces 128/123 caps with exception 03)dl205.md calls this out as a quirk. The proxy is supposed to be transparent; no test sends qty > 128 and asserts the proxy doesn't truncate / forwards as-is and surfaces exception 03 to the client.
  5. TxId allocator saturation propagation — the allocator returns false when full (TxIdAllocatorTests.Allocate_WhenSaturated_ReturnsFalse_DoesNotThrow), but there's no PlcMultiplexerTests case asserting what happens when 65,536 in-flight requests pile up — does the multiplexer surface exception 0x06 (Server Device Busy) or backpressure?
  6. Cache.AllowLongTtl=true happy path — paired with gap (1); should also have a test that the long-TTL value passes validation when the gate is set.
  7. Reload of ReadCoalescing.Enabled at runtime — there's ReadCoalescingE2ETests.E2E_CoalescingDisabledViaConfig_EveryReadIsAMiss but it appears to start with the value disabled, not flip it at runtime through the hot-reload path. Per the Supervisor review's M6, hot-reload of this knob doesn't propagate through the reconciler's add/restart paths anyway — a test would catch it.
  8. Coalescing factory leak under saturation with multiple attachers — see Multiplexer review C1. No test exercises the case where the leader's TryAdd or TryAllocate fails while late attachers have already joined the entry.
  9. Backend-reader head-of-line block by a slow upstream — see Multiplexer review C4. No test wedges one upstream's response channel and asserts other peers on the same PLC continue to receive responses.

Quality observations

  • Wall-clock arithmetic in coalescing tests. ReadCoalescingTests uses fixed sleeps to construct the in-flight window (e.g. ResponseDelayMs = 300; await Task.Delay(80)). The FiveClients_SameRequest_OneBackendRoundTrip_FiveResponses test is hedged (backend.RequestCount.ShouldBeLessThanOrEqualTo(2, "...one for the leader, one for any racy first-arrival") and CoalescedHitCount.ShouldBeGreaterThanOrEqualTo(3)) — sensible hedging that admits the timing race exists. This is a known smell for CI.
  • Cascade test polls for counter convergence (PlcMultiplexerTests.BackendDisconnect_CascadesToAllUpstreams polls 50× at 20 ms with comment explaining the inherent scheduling gap). Good defensive engineering, but indicates the underlying BackendDisconnectCascades increment is not synchronously observable from the test seam.
  • Naming is strong throughout: tests follow Subject_Condition_Expectation style (e.g. TwoClients_SameRequest_OnlyOneBackendRoundTrip, Validate_AdminPortCollidesWith_PlcListenPort_Fails). Comments above each test explain the contract being verified.
  • AAA structure is clear — almost every test has // ── Helpers ── / // ── Tests ── regions and explicit "Arrange / Act / Assert" implicit through whitespace.
  • Concurrency tests are genuine, not pretend. TxIdAllocatorTests.Concurrent_AllocateRelease_NoDuplicateIds_Under_Parallel_Stress (100 tasks × 1000 ops verifying no duplicate live IDs via ConcurrentDictionary.TryAdd) and ResponseCacheTests.Concurrent_GetSet_NoDataRace (8 × 500) actually race tasks. Multiplexer's "two upstream concurrent" tests use real sockets with simultaneous SendAsync calls.
  • Skip pattern is clean and uniform: if (_sim.SkipReason is not null) Assert.Skip(_sim.SkipReason); at the top of every E2E test method that needs the simulator. Cold-pip provisioning gets a 120-second window — reasonable for cold CI.
  • One redundant test pair: Diagnostics/ShutdownCoordinatorTests line 27, 43, 57 each define a StopAsync stub — these are nested helper classes inside the test class, fine but easy to miss when reading the test method list.
  • Some E2E timeouts are tight enough to flake (e.g. [Fact(Timeout = 5_000)] for HotReloadE2ETests.E2E_AddPlcAtRuntime_NewListenerBinds_AndIsReachable). The author already bumped one to 10s with a comment about hot-reload propagation needing headroom — others may follow.

What looks good

  • Three-tier architecture: pure unit (BcdCodecTests, MbapFrameTests, TxIdAllocatorTests, CacheKeyTests) → stub-socket integration (PlcMultiplexerTests, ReadCoalescingTests, ResponseCacheMultiplexerTests) → simulator E2E (*E2ETests). Each layer is justified in its xmldoc — e.g. ReadCoalescingTests explicitly notes "the pymodbus simulator cannot be used here — its known concurrent-MBAP-frame bug would invalidate the proxy-TxId echo path".
  • Counter-based assertions everywhere (RewrittenSlots, PartialBcdWarnings, CoalescedHitCount, CoalescedResponseToDeadUpstream, BackendDisconnectCascades, ReloadAppliedCount, ReloadRejectedCount) — gives strong observable signals beyond happy-path correctness, and validates the status-page wire format simultaneously.
  • Stub backends (StubBackend in PlcMultiplexerTests, DelayedStubBackend in ReadCoalescingTests, drain-only in watchdog test) are well-factored and reusable.
  • Negative cases are first-class: bad nibbles, OOR encode, wrong unit ID, adjacent-but-not-overlapping ranges, FC06 against partial 32-bit pair, dead-upstream fan-out, port-already-in-use, duplicate config — not just happy-path coverage.
  • HotReloadE2ETests exercises the real FileSystemWatcher + reloadOnChange: true path with rename-replace (the pattern that produces the 2-3 spurious events the debounce has to handle) — the most realistic possible test for that subsystem.

Key file paths referenced:

  • src/Mbproxy/Configuration/ReloadValidator.cs (lines 90, 100, 128 — long-TTL gate enforcement, untested)
  • src/Mbproxy/Options/MbproxyOptions.cs (lines 32, 58111 — long-TTL gate option + per-tag enforcement)
  • tests/Mbproxy.Tests/Configuration/ReloadValidatorTests.cs (8 tests, missing cache rules)
  • tests/Mbproxy.Tests/Proxy/Supervision/SupervisorTests.cs (Test 4 = placeholder)
  • tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs (skip mechanism)