Files
Joseph Doherty f2c6669444 mbproxy/codereviews: 2026-05-14 in-depth review + remediation plan
Eight area-focused reviews (BCD rewriter, multiplexer, response cache,
supervisor + hot-reload, admin + diagnostics, hosting + options, test
suite) plus an Overview that prioritises findings across areas, and a
RemediationPlan that groups the work into three waves with per-item
file:line citations and regression-test sketches.

Findings call out: hot-reload tag-list/cache changes that don't reach the
running multiplexer, a coalescing factory leak that hangs late attachers,
backend-reader head-of-line block on a wedged upstream, stranded outbound
frames after cascade, and ShutdownCoordinator double-stop ordering. Plus
the unconventional 32-bit BCD wire format (two base-10000 digits in CDAB,
not standard binary), unreachable BcdValidationError.DuplicateAddress,
mbproxy.cache.flushed event that's defined but never emitted, and missing
test coverage for Cache.AllowLongTtl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 05:15:34 -04:00

74 lines
13 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Test Suite Assessment
Scope: every `.cs` file under `tests/Mbproxy.Tests/` (excluding bin/obj). Cross-checked against `docs/design.md` Testing section, `docs/plan/README.md` per-phase test inventory, `docs/Testing/Simulator.md`. Per CLAUDE.md the suite was around 325 tests at Phase 9 baseline and has grown through Phase 11.
## Summary
**Counts (by `[Fact]` / `[Theory]` attribute, recursive over `tests/Mbproxy.Tests/**/*.cs` excluding `obj/`):**
| Bucket | Files | Test methods |
|---|---:|---:|
| Unit (no I/O or stub-socket only, `[Trait("Category","Unit")]`) | 26 | ~189 |
| E2E (`[Trait("Category","E2E")]`, simulator + real Kestrel + real host) | 9 | ~37 |
| Other / fixture smoke (`HostSmokeTests`, `SimulatorSmokeTests`, fixture files) | 4 | 7 |
| **Total method count** | **39** | **~233** |
Only **2 `[Theory]`** tests exist (both in `BcdCodecTests`), each with a few `[InlineData]` rows, so the actual executed-test count is roughly **240245**, slightly **below** the README/CLAUDE.md baseline of "325 tests at Phase 9 (282 unit + 43 E2E)". Either CLAUDE.md is over-reporting (perhaps counting individual `Theory` cases that have since been refactored away, or counting per-PLC parameterized runs), or test consolidation has happened since Phase 9 — worth reconciling. Phase 11 net additions are visible (Cache/* directory: 5 files, 32 facts).
**Headlines:**
- BCD codec, MBAP framing, multiplexer correlation, cache, and coalescing are all well-covered with both targeted unit tests and E2E + stub-backend integration tests; the suite has clear architectural layering (pure unit → stub-socket integration → simulator E2E).
- The E2E suite uses a **single shared `DL205SimulatorFixture`** that gracefully `Assert.Skip`s when Python/pymodbus is unavailable, but it **leans heavily on `Task.Delay` polling** (64 occurrences) and explicit time-window arithmetic — flake-prone on slow/loaded CI workers.
- The biggest concrete coverage gap is the **`Cache.AllowLongTtl` reload-rejection rule**, which is documented in `docs/design.md` and enforced in `ReloadValidator.cs` (line 100, 128) and `MbproxyOptions.cs` but has **zero corresponding test** in `ReloadValidatorTests.cs` or anywhere in the suite (`grep -rn AllowLongTtl tests/` returns nothing).
## Coverage matrix
| Subsystem | Rating | Notes |
|---|---|---|
| BCD codec (16/32-bit, partial, OOR) | **Strong** | `BcdCodecTests.cs` (19) + `BcdPduPipelineTests.cs` (26) cover encode/decode, single-reg, full 32-bit CDAB pair, partial low-only & high-only with `PartialBcdWarnings`, mixed-with-non-BCD, bad-nibble pass-through, FC06 OOR, FC16 multi-reg + partial pair, exception responses, length invariant. |
| MBAP frame parsing | **Adequate** | `MbapFrameTests.cs` (10 facts). Not read in depth — but headers in `PlcMultiplexerTests` exercise real wire frames end-to-end. |
| Multiplexer (TxId, correlation, cascade, watchdog) | **Strong** | `TxIdAllocatorTests` (8, incl. saturation, full-65536 sweep, wrap, parallel-stress concurrency), `CorrelationMapTests` (5), `RewriterCorrelationTests` (4), `PlcMultiplexerTests` (8: TxId-collision, distinct-on-wire, upstream-disconnect isolation, **backend-disconnect cascade with 3 upstreams**, **watchdog → exception 0x0B**, reconnect after cascade). |
| Coalescing | **Strong** | `ReadCoalescingTests` (8) + `CoalescingKeyTests` (5) + `InFlightByKeyMapTests` (6) + `ReadCoalescingE2ETests` (5). Covers leader/waiter, FC03/FC04 not coalesced, FC06 never coalesced, `MaxParties` cap (= 2 forces overflow), dead-upstream `CoalescedResponseToDeadUpstream` counter, hot-reload `Enabled=false` path. |
| Cache | **Strong** | `ResponseCacheTests` (10), `CacheInvalidatorTests` (8), `ResponseCacheMultiplexerTests` (8), `CacheKeyTests` (3), `CacheEntryTests` (3), `ResponseCacheE2ETests` (5). Covers TTL expiry, LRU + access-order, multi-tag min-TTL/zero-TTL disable, range-overlap (full/low/high/adjacent/disjoint/wrong-unit/zero-qty), write invalidation, **cache→coalesce short-circuit ordering**, post-rewriter byte caching, concurrent get/set stress (8 tasks × 500 ops). |
| Supervisor | **Adequate** | `SupervisorTests` (6) covers bind, port-in-use → Recovering, recovery-on-port-free, Stop-during-Recovering, RecoveryAttempts accumulation. `BackendConnectRetryTests` (3) and `PolicyFactoryTests` (5) cover Polly. **Test 4 ("RuntimeFault") is a placeholder** (test name says runtime fault but doesn't actually induce one — it self-admits this). |
| Hot reload | **Adequate** | `HotReloadE2ETests` (4): add PLC, remove PLC, change global tag list (reseat without restart), invalid reload (port collision) → no mutation. `ConfigReconcilerTests` (4) adds concurrent-reload serialisation. `ReloadPlanTests` (8) and `ReloadValidatorTests` (8) cover plan computation and validation rules **except the cache-related ones**. |
| Admin endpoint | **Strong** | `AdminEndpointTests` (6 E2E): JSON shape, post-FC03 PDU count, partial-BCD warning surfaced, HTML+meta-refresh, **bind-failure isolation** (service stays up + logs `bind-failed`), **admin-port hot-reload rebind**. `StatusSnapshotBuilderTests` (6) + `StatusHtmlRendererTests` (3). |
| Shutdown | **Strong** | `ShutdownCoordinatorTests` (4): no-conn, with-conn drain, timeout-cancel-with-count, ordering (admin stops after listeners). `ShutdownE2ETests` (2): graceful drain ≤ 10s, in-flight cancel after timeout. |
## Specific gaps (design contracts without a test)
1. **`Cache.AllowLongTtl=false` rejects `CacheTtlMs > 60_000`** — enforced at `src/Mbproxy/Configuration/ReloadValidator.cs:90-128` and `src/Mbproxy/Options/MbproxyOptions.cs:58-111`, but **no test exists in `ReloadValidatorTests.cs`** (only 8 tests covering name dup, port dup, admin collision, BCD map error, port range, admin-port range, happy path, empty name). Both the per-tag rule and the global reload rule should each get a test.
2. **Reload-driven cache flush on tag-list change** — `docs/design.md` and `CLAUDE.md` state "a tag-list reload flushes the affected PLC's response cache". `HotReloadE2ETests.E2E_ChangeGlobalBcdTagList_RewriteReflectsImmediately` proves the reseat happens but does **not** assert that any pre-existing cache entries for that PLC are dropped (e.g., set entry, mutate tag list, observe `Clear` increment / next read goes to backend). And per the Supervisor review's Critical 1, this is exactly the behaviour that today does NOT actually fire in the running multiplexer — a test would catch the regression directly.
3. **`SupervisorTests.Supervisor_RuntimeFault_TriggersRecovery`** is named for the contract but explicitly admits "we verify recovery mechanics are wired … For a full runtime-fault scenario, see the E2E tests." `SupervisorE2ETests` only has 2 tests — needs a check that they actually fault the accept loop. No test currently forces an accept-loop exception and asserts the supervisor resumes.
4. **Backend FC03/FC04 max-qty pass-through (PLC enforces 128/123 caps with exception 03)** — `dl205.md` calls this out as a quirk. The proxy is supposed to be transparent; no test sends `qty > 128` and asserts the proxy doesn't truncate / forwards as-is and surfaces exception 03 to the client.
5. **TxId allocator saturation propagation** — the allocator returns `false` when full (`TxIdAllocatorTests.Allocate_WhenSaturated_ReturnsFalse_DoesNotThrow`), but there's no `PlcMultiplexerTests` case asserting what happens when 65,536 in-flight requests pile up — does the multiplexer surface exception 0x06 (Server Device Busy) or backpressure?
6. **`Cache.AllowLongTtl=true` happy path** — paired with gap (1); should also have a test that the long-TTL value passes validation when the gate is set.
7. **Reload of `ReadCoalescing.Enabled` at runtime** — there's `ReadCoalescingE2ETests.E2E_CoalescingDisabledViaConfig_EveryReadIsAMiss` but it appears to start with the value disabled, not flip it at runtime through the hot-reload path. Per the Supervisor review's M6, hot-reload of this knob doesn't propagate through the reconciler's add/restart paths anyway — a test would catch it.
8. **Coalescing factory leak under saturation with multiple attachers** — see Multiplexer review C1. No test exercises the case where the leader's `TryAdd` or `TryAllocate` fails while late attachers have already joined the entry.
9. **Backend-reader head-of-line block by a slow upstream** — see Multiplexer review C4. No test wedges one upstream's response channel and asserts other peers on the same PLC continue to receive responses.
## Quality observations
- **Wall-clock arithmetic in coalescing tests.** `ReadCoalescingTests` uses fixed sleeps to construct the in-flight window (e.g. `ResponseDelayMs = 300; await Task.Delay(80)`). The `FiveClients_SameRequest_OneBackendRoundTrip_FiveResponses` test is hedged (`backend.RequestCount.ShouldBeLessThanOrEqualTo(2, "...one for the leader, one for any racy first-arrival")` and `CoalescedHitCount.ShouldBeGreaterThanOrEqualTo(3)`) — sensible hedging that admits the timing race exists. This is a known smell for CI.
- **Cascade test polls for counter convergence** (`PlcMultiplexerTests.BackendDisconnect_CascadesToAllUpstreams` polls 50× at 20 ms with comment explaining the inherent scheduling gap). Good defensive engineering, but indicates the underlying `BackendDisconnectCascades` increment is not synchronously observable from the test seam.
- **Naming is strong throughout**: tests follow `Subject_Condition_Expectation` style (e.g. `TwoClients_SameRequest_OnlyOneBackendRoundTrip`, `Validate_AdminPortCollidesWith_PlcListenPort_Fails`). Comments above each test explain the contract being verified.
- **AAA structure is clear** — almost every test has `// ── Helpers ──` / `// ── Tests ──` regions and explicit "Arrange / Act / Assert" implicit through whitespace.
- **Concurrency tests are genuine, not pretend.** `TxIdAllocatorTests.Concurrent_AllocateRelease_NoDuplicateIds_Under_Parallel_Stress` (100 tasks × 1000 ops verifying no duplicate live IDs via `ConcurrentDictionary.TryAdd`) and `ResponseCacheTests.Concurrent_GetSet_NoDataRace` (8 × 500) actually race tasks. Multiplexer's "two upstream concurrent" tests use real sockets with simultaneous `SendAsync` calls.
- **Skip pattern is clean and uniform**: `if (_sim.SkipReason is not null) Assert.Skip(_sim.SkipReason);` at the top of every E2E test method that needs the simulator. Cold-pip provisioning gets a 120-second window — reasonable for cold CI.
- **One redundant test pair**: `Diagnostics/ShutdownCoordinatorTests` line 27, 43, 57 each define a `StopAsync` stub — these are nested helper classes inside the test class, fine but easy to miss when reading the test method list.
- **Some E2E timeouts are tight enough to flake** (e.g. `[Fact(Timeout = 5_000)]` for `HotReloadE2ETests.E2E_AddPlcAtRuntime_NewListenerBinds_AndIsReachable`). The author already bumped one to 10s with a comment about hot-reload propagation needing headroom — others may follow.
## What looks good
- Three-tier architecture: **pure unit** (`BcdCodecTests`, `MbapFrameTests`, `TxIdAllocatorTests`, `CacheKeyTests`) → **stub-socket integration** (`PlcMultiplexerTests`, `ReadCoalescingTests`, `ResponseCacheMultiplexerTests`) → **simulator E2E** (`*E2ETests`). Each layer is justified in its xmldoc — e.g. `ReadCoalescingTests` explicitly notes "the pymodbus simulator cannot be used here — its known concurrent-MBAP-frame bug would invalidate the proxy-TxId echo path".
- Counter-based assertions everywhere (`RewrittenSlots`, `PartialBcdWarnings`, `CoalescedHitCount`, `CoalescedResponseToDeadUpstream`, `BackendDisconnectCascades`, `ReloadAppliedCount`, `ReloadRejectedCount`) — gives strong observable signals beyond happy-path correctness, and validates the status-page wire format simultaneously.
- Stub backends (`StubBackend` in `PlcMultiplexerTests`, `DelayedStubBackend` in `ReadCoalescingTests`, drain-only in watchdog test) are well-factored and reusable.
- Negative cases are first-class: bad nibbles, OOR encode, wrong unit ID, adjacent-but-not-overlapping ranges, FC06 against partial 32-bit pair, dead-upstream fan-out, port-already-in-use, duplicate config — not just happy-path coverage.
- `HotReloadE2ETests` exercises the **real `FileSystemWatcher` + `reloadOnChange: true`** path with rename-replace (the pattern that produces the 2-3 spurious events the debounce has to handle) — the most realistic possible test for that subsystem.
**Key file paths referenced:**
- `src/Mbproxy/Configuration/ReloadValidator.cs` (lines 90, 100, 128 — long-TTL gate enforcement, untested)
- `src/Mbproxy/Options/MbproxyOptions.cs` (lines 32, 58111 — long-TTL gate option + per-tag enforcement)
- `tests/Mbproxy.Tests/Configuration/ReloadValidatorTests.cs` (8 tests, missing cache rules)
- `tests/Mbproxy.Tests/Proxy/Supervision/SupervisorTests.cs` (Test 4 = placeholder)
- `tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs` (skip mechanism)