f2c6669444
Eight area-focused reviews (BCD rewriter, multiplexer, response cache, supervisor + hot-reload, admin + diagnostics, hosting + options, test suite) plus an Overview that prioritises findings across areas, and a RemediationPlan that groups the work into three waves with per-item file:line citations and regression-test sketches. Findings call out: hot-reload tag-list/cache changes that don't reach the running multiplexer, a coalescing factory leak that hangs late attachers, backend-reader head-of-line block on a wedged upstream, stranded outbound frames after cascade, and ShutdownCoordinator double-stop ordering. Plus the unconventional 32-bit BCD wire format (two base-10000 digits in CDAB, not standard binary), unreachable BcdValidationError.DuplicateAddress, mbproxy.cache.flushed event that's defined but never emitted, and missing test coverage for Cache.AllowLongTtl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
74 lines
13 KiB
Markdown
74 lines
13 KiB
Markdown
# Test Suite Assessment
|
||
|
||
Scope: every `.cs` file under `tests/Mbproxy.Tests/` (excluding bin/obj). Cross-checked against `docs/design.md` Testing section, `docs/plan/README.md` per-phase test inventory, `docs/Testing/Simulator.md`. Per CLAUDE.md the suite was around 325 tests at Phase 9 baseline and has grown through Phase 11.
|
||
|
||
## Summary
|
||
|
||
**Counts (by `[Fact]` / `[Theory]` attribute, recursive over `tests/Mbproxy.Tests/**/*.cs` excluding `obj/`):**
|
||
|
||
| Bucket | Files | Test methods |
|
||
|---|---:|---:|
|
||
| Unit (no I/O or stub-socket only, `[Trait("Category","Unit")]`) | 26 | ~189 |
|
||
| E2E (`[Trait("Category","E2E")]`, simulator + real Kestrel + real host) | 9 | ~37 |
|
||
| Other / fixture smoke (`HostSmokeTests`, `SimulatorSmokeTests`, fixture files) | 4 | 7 |
|
||
| **Total method count** | **39** | **~233** |
|
||
|
||
Only **2 `[Theory]`** tests exist (both in `BcdCodecTests`), each with a few `[InlineData]` rows, so the actual executed-test count is roughly **240–245**, slightly **below** the README/CLAUDE.md baseline of "325 tests at Phase 9 (282 unit + 43 E2E)". Either CLAUDE.md is over-reporting (perhaps counting individual `Theory` cases that have since been refactored away, or counting per-PLC parameterized runs), or test consolidation has happened since Phase 9 — worth reconciling. Phase 11 net additions are visible (Cache/* directory: 5 files, 32 facts).
|
||
|
||
**Headlines:**
|
||
- BCD codec, MBAP framing, multiplexer correlation, cache, and coalescing are all well-covered with both targeted unit tests and E2E + stub-backend integration tests; the suite has clear architectural layering (pure unit → stub-socket integration → simulator E2E).
|
||
- The E2E suite uses a **single shared `DL205SimulatorFixture`** that gracefully `Assert.Skip`s when Python/pymodbus is unavailable, but it **leans heavily on `Task.Delay` polling** (64 occurrences) and explicit time-window arithmetic — flake-prone on slow/loaded CI workers.
|
||
- The biggest concrete coverage gap is the **`Cache.AllowLongTtl` reload-rejection rule**, which is documented in `docs/design.md` and enforced in `ReloadValidator.cs` (line 100, 128) and `MbproxyOptions.cs` but has **zero corresponding test** in `ReloadValidatorTests.cs` or anywhere in the suite (`grep -rn AllowLongTtl tests/` returns nothing).
|
||
|
||
## Coverage matrix
|
||
|
||
| Subsystem | Rating | Notes |
|
||
|---|---|---|
|
||
| BCD codec (16/32-bit, partial, OOR) | **Strong** | `BcdCodecTests.cs` (19) + `BcdPduPipelineTests.cs` (26) cover encode/decode, single-reg, full 32-bit CDAB pair, partial low-only & high-only with `PartialBcdWarnings`, mixed-with-non-BCD, bad-nibble pass-through, FC06 OOR, FC16 multi-reg + partial pair, exception responses, length invariant. |
|
||
| MBAP frame parsing | **Adequate** | `MbapFrameTests.cs` (10 facts). Not read in depth — but headers in `PlcMultiplexerTests` exercise real wire frames end-to-end. |
|
||
| Multiplexer (TxId, correlation, cascade, watchdog) | **Strong** | `TxIdAllocatorTests` (8, incl. saturation, full-65536 sweep, wrap, parallel-stress concurrency), `CorrelationMapTests` (5), `RewriterCorrelationTests` (4), `PlcMultiplexerTests` (8: TxId-collision, distinct-on-wire, upstream-disconnect isolation, **backend-disconnect cascade with 3 upstreams**, **watchdog → exception 0x0B**, reconnect after cascade). |
|
||
| Coalescing | **Strong** | `ReadCoalescingTests` (8) + `CoalescingKeyTests` (5) + `InFlightByKeyMapTests` (6) + `ReadCoalescingE2ETests` (5). Covers leader/waiter, FC03/FC04 not coalesced, FC06 never coalesced, `MaxParties` cap (= 2 forces overflow), dead-upstream `CoalescedResponseToDeadUpstream` counter, hot-reload `Enabled=false` path. |
|
||
| Cache | **Strong** | `ResponseCacheTests` (10), `CacheInvalidatorTests` (8), `ResponseCacheMultiplexerTests` (8), `CacheKeyTests` (3), `CacheEntryTests` (3), `ResponseCacheE2ETests` (5). Covers TTL expiry, LRU + access-order, multi-tag min-TTL/zero-TTL disable, range-overlap (full/low/high/adjacent/disjoint/wrong-unit/zero-qty), write invalidation, **cache→coalesce short-circuit ordering**, post-rewriter byte caching, concurrent get/set stress (8 tasks × 500 ops). |
|
||
| Supervisor | **Adequate** | `SupervisorTests` (6) covers bind, port-in-use → Recovering, recovery-on-port-free, Stop-during-Recovering, RecoveryAttempts accumulation. `BackendConnectRetryTests` (3) and `PolicyFactoryTests` (5) cover Polly. **Test 4 ("RuntimeFault") is a placeholder** (test name says runtime fault but doesn't actually induce one — it self-admits this). |
|
||
| Hot reload | **Adequate** | `HotReloadE2ETests` (4): add PLC, remove PLC, change global tag list (reseat without restart), invalid reload (port collision) → no mutation. `ConfigReconcilerTests` (4) adds concurrent-reload serialisation. `ReloadPlanTests` (8) and `ReloadValidatorTests` (8) cover plan computation and validation rules **except the cache-related ones**. |
|
||
| Admin endpoint | **Strong** | `AdminEndpointTests` (6 E2E): JSON shape, post-FC03 PDU count, partial-BCD warning surfaced, HTML+meta-refresh, **bind-failure isolation** (service stays up + logs `bind-failed`), **admin-port hot-reload rebind**. `StatusSnapshotBuilderTests` (6) + `StatusHtmlRendererTests` (3). |
|
||
| Shutdown | **Strong** | `ShutdownCoordinatorTests` (4): no-conn, with-conn drain, timeout-cancel-with-count, ordering (admin stops after listeners). `ShutdownE2ETests` (2): graceful drain ≤ 10s, in-flight cancel after timeout. |
|
||
|
||
## Specific gaps (design contracts without a test)
|
||
|
||
1. **`Cache.AllowLongTtl=false` rejects `CacheTtlMs > 60_000`** — enforced at `src/Mbproxy/Configuration/ReloadValidator.cs:90-128` and `src/Mbproxy/Options/MbproxyOptions.cs:58-111`, but **no test exists in `ReloadValidatorTests.cs`** (only 8 tests covering name dup, port dup, admin collision, BCD map error, port range, admin-port range, happy path, empty name). Both the per-tag rule and the global reload rule should each get a test.
|
||
2. **Reload-driven cache flush on tag-list change** — `docs/design.md` and `CLAUDE.md` state "a tag-list reload flushes the affected PLC's response cache". `HotReloadE2ETests.E2E_ChangeGlobalBcdTagList_RewriteReflectsImmediately` proves the reseat happens but does **not** assert that any pre-existing cache entries for that PLC are dropped (e.g., set entry, mutate tag list, observe `Clear` increment / next read goes to backend). And per the Supervisor review's Critical 1, this is exactly the behaviour that today does NOT actually fire in the running multiplexer — a test would catch the regression directly.
|
||
3. **`SupervisorTests.Supervisor_RuntimeFault_TriggersRecovery`** is named for the contract but explicitly admits "we verify recovery mechanics are wired … For a full runtime-fault scenario, see the E2E tests." `SupervisorE2ETests` only has 2 tests — needs a check that they actually fault the accept loop. No test currently forces an accept-loop exception and asserts the supervisor resumes.
|
||
4. **Backend FC03/FC04 max-qty pass-through (PLC enforces 128/123 caps with exception 03)** — `dl205.md` calls this out as a quirk. The proxy is supposed to be transparent; no test sends `qty > 128` and asserts the proxy doesn't truncate / forwards as-is and surfaces exception 03 to the client.
|
||
5. **TxId allocator saturation propagation** — the allocator returns `false` when full (`TxIdAllocatorTests.Allocate_WhenSaturated_ReturnsFalse_DoesNotThrow`), but there's no `PlcMultiplexerTests` case asserting what happens when 65,536 in-flight requests pile up — does the multiplexer surface exception 0x06 (Server Device Busy) or backpressure?
|
||
6. **`Cache.AllowLongTtl=true` happy path** — paired with gap (1); should also have a test that the long-TTL value passes validation when the gate is set.
|
||
7. **Reload of `ReadCoalescing.Enabled` at runtime** — there's `ReadCoalescingE2ETests.E2E_CoalescingDisabledViaConfig_EveryReadIsAMiss` but it appears to start with the value disabled, not flip it at runtime through the hot-reload path. Per the Supervisor review's M6, hot-reload of this knob doesn't propagate through the reconciler's add/restart paths anyway — a test would catch it.
|
||
8. **Coalescing factory leak under saturation with multiple attachers** — see Multiplexer review C1. No test exercises the case where the leader's `TryAdd` or `TryAllocate` fails while late attachers have already joined the entry.
|
||
9. **Backend-reader head-of-line block by a slow upstream** — see Multiplexer review C4. No test wedges one upstream's response channel and asserts other peers on the same PLC continue to receive responses.
|
||
|
||
## Quality observations
|
||
|
||
- **Wall-clock arithmetic in coalescing tests.** `ReadCoalescingTests` uses fixed sleeps to construct the in-flight window (e.g. `ResponseDelayMs = 300; await Task.Delay(80)`). The `FiveClients_SameRequest_OneBackendRoundTrip_FiveResponses` test is hedged (`backend.RequestCount.ShouldBeLessThanOrEqualTo(2, "...one for the leader, one for any racy first-arrival")` and `CoalescedHitCount.ShouldBeGreaterThanOrEqualTo(3)`) — sensible hedging that admits the timing race exists. This is a known smell for CI.
|
||
- **Cascade test polls for counter convergence** (`PlcMultiplexerTests.BackendDisconnect_CascadesToAllUpstreams` polls 50× at 20 ms with comment explaining the inherent scheduling gap). Good defensive engineering, but indicates the underlying `BackendDisconnectCascades` increment is not synchronously observable from the test seam.
|
||
- **Naming is strong throughout**: tests follow `Subject_Condition_Expectation` style (e.g. `TwoClients_SameRequest_OnlyOneBackendRoundTrip`, `Validate_AdminPortCollidesWith_PlcListenPort_Fails`). Comments above each test explain the contract being verified.
|
||
- **AAA structure is clear** — almost every test has `// ── Helpers ──` / `// ── Tests ──` regions and explicit "Arrange / Act / Assert" implicit through whitespace.
|
||
- **Concurrency tests are genuine, not pretend.** `TxIdAllocatorTests.Concurrent_AllocateRelease_NoDuplicateIds_Under_Parallel_Stress` (100 tasks × 1000 ops verifying no duplicate live IDs via `ConcurrentDictionary.TryAdd`) and `ResponseCacheTests.Concurrent_GetSet_NoDataRace` (8 × 500) actually race tasks. Multiplexer's "two upstream concurrent" tests use real sockets with simultaneous `SendAsync` calls.
|
||
- **Skip pattern is clean and uniform**: `if (_sim.SkipReason is not null) Assert.Skip(_sim.SkipReason);` at the top of every E2E test method that needs the simulator. Cold-pip provisioning gets a 120-second window — reasonable for cold CI.
|
||
- **One redundant test pair**: `Diagnostics/ShutdownCoordinatorTests` line 27, 43, 57 each define a `StopAsync` stub — these are nested helper classes inside the test class, fine but easy to miss when reading the test method list.
|
||
- **Some E2E timeouts are tight enough to flake** (e.g. `[Fact(Timeout = 5_000)]` for `HotReloadE2ETests.E2E_AddPlcAtRuntime_NewListenerBinds_AndIsReachable`). The author already bumped one to 10s with a comment about hot-reload propagation needing headroom — others may follow.
|
||
|
||
## What looks good
|
||
|
||
- Three-tier architecture: **pure unit** (`BcdCodecTests`, `MbapFrameTests`, `TxIdAllocatorTests`, `CacheKeyTests`) → **stub-socket integration** (`PlcMultiplexerTests`, `ReadCoalescingTests`, `ResponseCacheMultiplexerTests`) → **simulator E2E** (`*E2ETests`). Each layer is justified in its xmldoc — e.g. `ReadCoalescingTests` explicitly notes "the pymodbus simulator cannot be used here — its known concurrent-MBAP-frame bug would invalidate the proxy-TxId echo path".
|
||
- Counter-based assertions everywhere (`RewrittenSlots`, `PartialBcdWarnings`, `CoalescedHitCount`, `CoalescedResponseToDeadUpstream`, `BackendDisconnectCascades`, `ReloadAppliedCount`, `ReloadRejectedCount`) — gives strong observable signals beyond happy-path correctness, and validates the status-page wire format simultaneously.
|
||
- Stub backends (`StubBackend` in `PlcMultiplexerTests`, `DelayedStubBackend` in `ReadCoalescingTests`, drain-only in watchdog test) are well-factored and reusable.
|
||
- Negative cases are first-class: bad nibbles, OOR encode, wrong unit ID, adjacent-but-not-overlapping ranges, FC06 against partial 32-bit pair, dead-upstream fan-out, port-already-in-use, duplicate config — not just happy-path coverage.
|
||
- `HotReloadE2ETests` exercises the **real `FileSystemWatcher` + `reloadOnChange: true`** path with rename-replace (the pattern that produces the 2-3 spurious events the debounce has to handle) — the most realistic possible test for that subsystem.
|
||
|
||
**Key file paths referenced:**
|
||
- `src/Mbproxy/Configuration/ReloadValidator.cs` (lines 90, 100, 128 — long-TTL gate enforcement, untested)
|
||
- `src/Mbproxy/Options/MbproxyOptions.cs` (lines 32, 58–111 — long-TTL gate option + per-tag enforcement)
|
||
- `tests/Mbproxy.Tests/Configuration/ReloadValidatorTests.cs` (8 tests, missing cache rules)
|
||
- `tests/Mbproxy.Tests/Proxy/Supervision/SupervisorTests.cs` (Test 4 = placeholder)
|
||
- `tests/Mbproxy.Tests/Sim/DL205SimulatorFixture.cs` (skip mechanism)
|