Eight area-focused reviews (BCD rewriter, multiplexer, response cache, supervisor + hot-reload, admin + diagnostics, hosting + options, test suite) plus an Overview that prioritises findings across areas, and a RemediationPlan that groups the work into three waves with per-item file:line citations and regression-test sketches. Findings call out: hot-reload tag-list/cache changes that don't reach the running multiplexer, a coalescing factory leak that hangs late attachers, backend-reader head-of-line block on a wedged upstream, stranded outbound frames after cascade, and ShutdownCoordinator double-stop ordering. Plus the unconventional 32-bit BCD wire format (two base-10000 digits in CDAB, not standard binary), unreachable BcdValidationError.DuplicateAddress, mbproxy.cache.flushed event that's defined but never emitted, and missing test coverage for Cache.AllowLongTtl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 KiB
Test Suite Assessment
Scope: every .cs file under tests/Mbproxy.Tests/ (excluding bin/obj). Cross-checked against docs/design.md Testing section, docs/plan/README.md per-phase test inventory, docs/Testing/Simulator.md. Per CLAUDE.md the suite was around 325 tests at Phase 9 baseline and has grown through Phase 11.
Summary
Counts (by [Fact] / [Theory] attribute, recursive over tests/Mbproxy.Tests/**/*.cs excluding obj/):
| Bucket | Files | Test methods |
|---|---|---|
Unit (no I/O or stub-socket only, [Trait("Category","Unit")]) |
26 | ~189 |
E2E ([Trait("Category","E2E")], simulator + real Kestrel + real host) |
9 | ~37 |
Other / fixture smoke (HostSmokeTests, SimulatorSmokeTests, fixture files) |
4 | 7 |
| Total method count | 39 | ~233 |
Only 2 [Theory] tests exist (both in BcdCodecTests), each with a few [InlineData] rows, so the actual executed-test count is roughly 240–245, slightly below the README/CLAUDE.md baseline of "325 tests at Phase 9 (282 unit + 43 E2E)". Either CLAUDE.md is over-reporting (perhaps counting individual Theory cases that have since been refactored away, or counting per-PLC parameterized runs), or test consolidation has happened since Phase 9 — worth reconciling. Phase 11 net additions are visible (Cache/* directory: 5 files, 32 facts).
Headlines:
- BCD codec, MBAP framing, multiplexer correlation, cache, and coalescing are all well-covered with both targeted unit tests and E2E + stub-backend integration tests; the suite has clear architectural layering (pure unit → stub-socket integration → simulator E2E).
- The E2E suite uses a single shared
DL205SimulatorFixturethat gracefullyAssert.Skips when Python/pymodbus is unavailable, but it leans heavily onTask.Delaypolling (64 occurrences) and explicit time-window arithmetic — flake-prone on slow/loaded CI workers. - The biggest concrete coverage gap is the
Cache.AllowLongTtlreload-rejection rule, which is documented indocs/design.mdand enforced inReloadValidator.cs(line 100, 128) andMbproxyOptions.csbut has zero corresponding test inReloadValidatorTests.csor anywhere in the suite (grep -rn AllowLongTtl tests/returns nothing).
Coverage matrix
| Subsystem | Rating | Notes |
|---|---|---|
| BCD codec (16/32-bit, partial, OOR) | Strong | BcdCodecTests.cs (19) + BcdPduPipelineTests.cs (26) cover encode/decode, single-reg, full 32-bit CDAB pair, partial low-only & high-only with PartialBcdWarnings, mixed-with-non-BCD, bad-nibble pass-through, FC06 OOR, FC16 multi-reg + partial pair, exception responses, length invariant. |
| MBAP frame parsing | Adequate | MbapFrameTests.cs (10 facts). Not read in depth — but headers in PlcMultiplexerTests exercise real wire frames end-to-end. |
| Multiplexer (TxId, correlation, cascade, watchdog) | Strong | TxIdAllocatorTests (8, incl. saturation, full-65536 sweep, wrap, parallel-stress concurrency), CorrelationMapTests (5), RewriterCorrelationTests (4), PlcMultiplexerTests (8: TxId-collision, distinct-on-wire, upstream-disconnect isolation, backend-disconnect cascade with 3 upstreams, watchdog → exception 0x0B, reconnect after cascade). |
| Coalescing | Strong | ReadCoalescingTests (8) + CoalescingKeyTests (5) + InFlightByKeyMapTests (6) + ReadCoalescingE2ETests (5). Covers leader/waiter, FC03/FC04 not coalesced, FC06 never coalesced, MaxParties cap (= 2 forces overflow), dead-upstream CoalescedResponseToDeadUpstream counter, hot-reload Enabled=false path. |
| Cache | Strong | ResponseCacheTests (10), CacheInvalidatorTests (8), ResponseCacheMultiplexerTests (8), CacheKeyTests (3), CacheEntryTests (3), ResponseCacheE2ETests (5). Covers TTL expiry, LRU + access-order, multi-tag min-TTL/zero-TTL disable, range-overlap (full/low/high/adjacent/disjoint/wrong-unit/zero-qty), write invalidation, cache→coalesce short-circuit ordering, post-rewriter byte caching, concurrent get/set stress (8 tasks × 500 ops). |
| Supervisor | Adequate | SupervisorTests (6) covers bind, port-in-use → Recovering, recovery-on-port-free, Stop-during-Recovering, RecoveryAttempts accumulation. BackendConnectRetryTests (3) and PolicyFactoryTests (5) cover Polly. Test 4 ("RuntimeFault") is a placeholder (test name says runtime fault but doesn't actually induce one — it self-admits this). |
| Hot reload | Adequate | HotReloadE2ETests (4): add PLC, remove PLC, change global tag list (reseat without restart), invalid reload (port collision) → no mutation. ConfigReconcilerTests (4) adds concurrent-reload serialisation. ReloadPlanTests (8) and ReloadValidatorTests (8) cover plan computation and validation rules except the cache-related ones. |
| Admin endpoint | Strong | AdminEndpointTests (6 E2E): JSON shape, post-FC03 PDU count, partial-BCD warning surfaced, HTML+meta-refresh, bind-failure isolation (service stays up + logs bind-failed), admin-port hot-reload rebind. StatusSnapshotBuilderTests (6) + StatusHtmlRendererTests (3). |
| Shutdown | Strong | ShutdownCoordinatorTests (4): no-conn, with-conn drain, timeout-cancel-with-count, ordering (admin stops after listeners). ShutdownE2ETests (2): graceful drain ≤ 10s, in-flight cancel after timeout. |
Specific gaps (design contracts without a test)
Cache.AllowLongTtl=falserejectsCacheTtlMs > 60_000— enforced atsrc/Mbproxy/Configuration/ReloadValidator.cs:90-128andsrc/Mbproxy/Options/MbproxyOptions.cs:58-111, but no test exists inReloadValidatorTests.cs(only 8 tests covering name dup, port dup, admin collision, BCD map error, port range, admin-port range, happy path, empty name). Both the per-tag rule and the global reload rule should each get a test.- Reload-driven cache flush on tag-list change —
docs/design.mdandCLAUDE.mdstate "a tag-list reload flushes the affected PLC's response cache".HotReloadE2ETests.E2E_ChangeGlobalBcdTagList_RewriteReflectsImmediatelyproves the reseat happens but does not assert that any pre-existing cache entries for that PLC are dropped (e.g., set entry, mutate tag list, observeClearincrement / next read goes to backend). And per the Supervisor review's Critical 1, this is exactly the behaviour that today does NOT actually fire in the running multiplexer — a test would catch the regression directly. SupervisorTests.Supervisor_RuntimeFault_TriggersRecoveryis named for the contract but explicitly admits "we verify recovery mechanics are wired … For a full runtime-fault scenario, see the E2E tests."SupervisorE2ETestsonly has 2 tests — needs a check that they actually fault the accept loop. No test currently forces an accept-loop exception and asserts the supervisor resumes.- Backend FC03/FC04 max-qty pass-through (PLC enforces 128/123 caps with exception 03) —
dl205.mdcalls this out as a quirk. The proxy is supposed to be transparent; no test sendsqty > 128and asserts the proxy doesn't truncate / forwards as-is and surfaces exception 03 to the client. - TxId allocator saturation propagation — the allocator returns
falsewhen full (TxIdAllocatorTests.Allocate_WhenSaturated_ReturnsFalse_DoesNotThrow), but there's noPlcMultiplexerTestscase asserting what happens when 65,536 in-flight requests pile up — does the multiplexer surface exception 0x06 (Server Device Busy) or backpressure? Cache.AllowLongTtl=truehappy path — paired with gap (1); should also have a test that the long-TTL value passes validation when the gate is set.- Reload of
ReadCoalescing.Enabledat runtime — there'sReadCoalescingE2ETests.E2E_CoalescingDisabledViaConfig_EveryReadIsAMissbut it appears to start with the value disabled, not flip it at runtime through the hot-reload path. Per the Supervisor review's M6, hot-reload of this knob doesn't propagate through the reconciler's add/restart paths anyway — a test would catch it. - Coalescing factory leak under saturation with multiple attachers — see Multiplexer review C1. No test exercises the case where the leader's
TryAddorTryAllocatefails while late attachers have already joined the entry. - Backend-reader head-of-line block by a slow upstream — see Multiplexer review C4. No test wedges one upstream's response channel and asserts other peers on the same PLC continue to receive responses.
Quality observations
- Wall-clock arithmetic in coalescing tests.
ReadCoalescingTestsuses fixed sleeps to construct the in-flight window (e.g.ResponseDelayMs = 300; await Task.Delay(80)). TheFiveClients_SameRequest_OneBackendRoundTrip_FiveResponsestest is hedged (backend.RequestCount.ShouldBeLessThanOrEqualTo(2, "...one for the leader, one for any racy first-arrival")andCoalescedHitCount.ShouldBeGreaterThanOrEqualTo(3)) — sensible hedging that admits the timing race exists. This is a known smell for CI. - Cascade test polls for counter convergence (
PlcMultiplexerTests.BackendDisconnect_CascadesToAllUpstreamspolls 50× at 20 ms with comment explaining the inherent scheduling gap). Good defensive engineering, but indicates the underlyingBackendDisconnectCascadesincrement is not synchronously observable from the test seam. - Naming is strong throughout: tests follow
Subject_Condition_Expectationstyle (e.g.TwoClients_SameRequest_OnlyOneBackendRoundTrip,Validate_AdminPortCollidesWith_PlcListenPort_Fails). Comments above each test explain the contract being verified. - AAA structure is clear — almost every test has
// ── Helpers ──/// ── Tests ──regions and explicit "Arrange / Act / Assert" implicit through whitespace. - Concurrency tests are genuine, not pretend.
TxIdAllocatorTests.Concurrent_AllocateRelease_NoDuplicateIds_Under_Parallel_Stress(100 tasks × 1000 ops verifying no duplicate live IDs viaConcurrentDictionary.TryAdd) andResponseCacheTests.Concurrent_GetSet_NoDataRace(8 × 500) actually race tasks. Multiplexer's "two upstream concurrent" tests use real sockets with simultaneousSendAsynccalls. - Skip pattern is clean and uniform:
if (_sim.SkipReason is not null) Assert.Skip(_sim.SkipReason);at the top of every E2E test method that needs the simulator. Cold-pip provisioning gets a 120-second window — reasonable for cold CI. - One redundant test pair:
Diagnostics/ShutdownCoordinatorTestsline 27, 43, 57 each define aStopAsyncstub — these are nested helper classes inside the test class, fine but easy to miss when reading the test method list. - Some E2E timeouts are tight enough to flake (e.g.
[Fact(Timeout = 5_000)]forHotReloadE2ETests.E2E_AddPlcAtRuntime_NewListenerBinds_AndIsReachable). The author already bumped one to 10s with a comment about hot-reload propagation needing headroom — others may follow.
What looks good
- Three-tier architecture: pure unit (
BcdCodecTests,MbapFrameTests,TxIdAllocatorTests,CacheKeyTests) → stub-socket integration (PlcMultiplexerTests,ReadCoalescingTests,ResponseCacheMultiplexerTests) → simulator E2E (*E2ETests). Each layer is justified in its xmldoc — e.g.ReadCoalescingTestsexplicitly notes "the pymodbus simulator cannot be used here — its known concurrent-MBAP-frame bug would invalidate the proxy-TxId echo path". - Counter-based assertions everywhere (
RewrittenSlots,PartialBcdWarnings,CoalescedHitCount,CoalescedResponseToDeadUpstream,BackendDisconnectCascades,ReloadAppliedCount,ReloadRejectedCount) — gives strong observable signals beyond happy-path correctness, and validates the status-page wire format simultaneously. - Stub backends (
StubBackendinPlcMultiplexerTests,DelayedStubBackendinReadCoalescingTests, drain-only in watchdog test) are well-factored and reusable. - Negative cases are first-class: bad nibbles, OOR encode, wrong unit ID, adjacent-but-not-overlapping ranges, FC06 against partial 32-bit pair, dead-upstream fan-out, port-already-in-use, duplicate config — not just happy-path coverage.
HotReloadE2ETestsexercises the realFileSystemWatcher+reloadOnChange: truepath with rename-replace (the pattern that produces the 2-3 spurious events the debounce has to handle) — the most realistic possible test for that subsystem.
Key file paths referenced:
src/Mbproxy/Configuration/ReloadValidator.cs(lines 90, 100, 128 — long-TTL gate enforcement, untested)src/Mbproxy/Options/MbproxyOptions.cs(lines 32, 58–111 — long-TTL gate option + per-tag enforcement)tests/Mbproxy.Tests/Configuration/ReloadValidatorTests.cs(8 tests, missing cache rules)tests/Mbproxy.Tests/Proxy/Supervision/SupervisorTests.cs(Test 4 = placeholder)tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs(skip mechanism)