# Remediation Plan β€” 2026-05-14 Code Review Findings Plan for resolving every finding raised in the 2026-05-14 review. Items are grouped into three waves by impact, with each item naming the file:line, the problem, the suggested change, and the regression test that should lock the fix in place. **Conventions:** - "Wave 1" = ship before next deploy. Production-observable correctness defects. - "Wave 2" = next sprint. Contract gaps, concurrency primitives, telemetry accuracy. - "Wave 3" = opportunistic. Cleanup, dead code, doc, missing tests. - Effort estimates assume a developer already familiar with the multiplexer's lifecycle. βš™ = code change, πŸ“ = doc change, βœ… = test addition. **Dependency map** (Wave 1 items have load-bearing relationships): ``` W1.1 (context swap) ──┬──► W1.2 (cache.flushed emission) β”œβ”€β”€β–Ί W2.1 (ConfigReconciler coalescingAccessor) β”œβ”€β”€β–Ί W2.7 (skip-invalidation-while-recovering) └──► Tests βœ… (cache flush on tag-list reload) W1.5 (ShutdownCoordinator ownership) ──► W2.4 (IOptionsMonitor switch) β””β–Ί W2.6 (drop hardcoded 5s deadline) W2.3 (ConfigReconciler thread safety) ──► W3 test (concurrent reload assertion) ``` Other items are independent and can be parallelised. --- ## Wave 1 β€” Critical Correctness (4 items) ### W1.1 β€” Plumb context updates into the running `PlcMultiplexer` βš™ βœ… **File:** `Proxy/Multiplexing/PlcMultiplexer.cs:50,101,109` and call sites at `:497-509`. Driven by `Proxy/Supervision/PlcListenerSupervisor.cs:199-216`. **Problem:** `ReplaceContextAsync` swaps `_currentContext` on the supervisor, but `PlcMultiplexer` captured `_ctx = perPlcContext` at construction and never re-reads. So tag-list reloads, the cache-flush-on-tag-list contract, and any other context-derived state are visible only on the next listener fault. Worse, the old cache is disposed while the running mux still uses it. **Change:** Make `_ctx` a `volatile` field on `PlcMultiplexer` (or wrap it in a reference type the supervisor can swap atomically). Add a public `ReplaceContext(PerPlcContext newContext)` method on the multiplexer; have `ReplaceContextAsync` call it before disposing the old context. Every place inside the multiplexer that reads `_ctx.TagMap` / `_ctx.Cache` / `_ctx.CacheInvalidator` must read the field fresh, not snapshot it into a local. **Tests:** - βœ… `PlcMultiplexerTests.ReplaceContext_NewTagMap_VisibleOnNextPdu` β€” set up a mux, send one FC03, swap the context with a different tag map, send a second FC03, assert different rewriting on the second response. - βœ… `PlcMultiplexerTests.ReplaceContext_FlushesOldCache` β€” populate the old cache, swap context, assert next read goes to backend (cache miss, not a hit on the disposed cache). - βœ… `HotReloadE2ETests.E2E_GlobalTagListReload_FlushesPlcCache` β€” pre-seed cache via FC03, save modified `appsettings.json`, assert `cacheEntryCount` for that PLC drops to 0 and the next read goes through. **Effort:** ~1 day. The mechanical change is small; the risk is making sure every read-site of `_ctx.*` is migrated and no field is hoisted into a constructor capture. --- ### W1.2 β€” Fix coalescing factory leak βš™ βœ… **File:** `Proxy/Multiplexing/PlcMultiplexer.cs:692-733`, `Proxy/Multiplexing/InFlightByKeyMap.cs:62-83`. **Problem:** When `_correlation.TryAdd` or `_allocator.TryAllocate` fails inside the `_inFlightByKey.TryAttachOrCreate` factory, the code stores the stub `InFlightRequest` under the key. Late attachers join the stub's `InterestedParties` list. The post-lock cleanup at `:724` only sends the saturation exception to the leader; late attachers wait forever. **Change:** Either (a) inside the factory, do not return / store on failure β€” let `TryAttachOrCreate` propagate the error so no other thread sees the stub; or (b) on the cleanup path, walk `inFlight.InterestedParties` and deliver the exception 0x04 frame to every attached party before removing the entry. Option (b) is the smaller diff. Also remove the dead `bool` return from `TryAttachOrCreate` (`InFlightByKeyMap.cs:62`) β€” it always returns `true`. **Tests:** - βœ… `PlcMultiplexerTests.CoalescingFactoryFails_AllAttachersGetException` β€” stub the allocator to return false, fire 5 concurrent identical FC03 reads, assert all 5 receive exception 0x04 with their original TxIds. - βœ… `InFlightByKeyMapTests.AttachOrCreate_FactoryThrows_DoesNotStoreStub` β€” drive the factory to throw and assert no entry remains in the map. **Effort:** ~half day. --- ### W1.3 β€” Decouple backend reader from per-upstream backpressure βš™ βœ… **File:** `Proxy/Multiplexing/PlcMultiplexer.cs:545`. **Problem:** `await party.Pipe.SendResponseAsync(outFrame, ct)` runs synchronously inside the single backend reader task. If one upstream's bounded response channel (capacity 16) is full, the reader stalls β€” every other client on that PLC stops receiving responses. **Change:** Replace the synchronous `await` with a non-blocking `TryWrite` (drop-on-full, with a counter `responseDropForFullUpstream`), or fan out per-party with `_ = Task.Run(() => party.Pipe.SendResponseAsync(...))`. Drop-on-full is the cleaner semantics: a wedged upstream loses responses but the rest of the fleet keeps moving; the upstream's watchdog or socket closure will collect it. Add a counter for the dropped responses so it's observable on the status page. **Tests:** - βœ… `PlcMultiplexerTests.SlowUpstream_DoesNotStallPeerResponses` β€” wedge one upstream's channel reader, fire reads from a second upstream, assert the second upstream receives responses on time and the wedged upstream's drop counter increments. **Effort:** ~half day. Decide drop-vs-fanout first; the rest is mechanical. --- ### W1.4 β€” Drain or replace `_outboundChannel` on cascade βš™ βœ… **File:** `Proxy/Multiplexing/PlcMultiplexer.cs:362-385,423-428`. **Problem:** When the backend writer faults at `:380` and triggers a cascade, frames already enqueued in `_outboundChannel` are stranded. After reconnect, a new writer task drains them into the *fresh* backend with old proxy TxIds whose correlation entries were dropped by `TearDownBackendAsync`. The PLC responds; the reader silently drops the response (`:423-428`); upstream peers wait for the watchdog to time them out β€” observable as a multi-second pause for every client whose request was in flight at the cascade moment. **Change:** On cascade, complete `_outboundChannel.Writer.Complete()` before tearing down the writer task. In `EnsureBackendConnectedAsync`, recreate the channel as part of bringing up a new backend connection. Alternative: drain-and-discard (`while (TryRead) { releaseTxId; }`) before re-running the writer. **Tests:** - βœ… `PlcMultiplexerTests.Cascade_StrandedOutboundFrames_AreDiscarded` β€” fill the outbound channel, force a cascade mid-drain, reconnect, assert the next request gets a normal response and the stranded frames did not produce orphaned TxIds (counter `responseToUnknownTxId` stays at 0 for the new connection). **Effort:** ~1 day. Care needed to not accidentally race the channel recreation against fast reconnects. --- ### W1.5 β€” Resolve `ShutdownCoordinator` double-stop βš™ βœ… **File:** `Program.cs:60-66`, `Diagnostics/ShutdownCoordinator.cs:147`, `Proxy/ProxyWorker.cs:196-217`, `Admin/AdminEndpointHost.cs`. **Problem:** `ProxyWorker` and `AdminEndpointHost` are both `IHostedService`. Their `StopAsync` runs unconditionally during host shutdown. The coordinator on `ApplicationStopping` *also* calls `StopAsync` on both β€” so each is stopped twice. The drain logic in the coordinator therefore runs against a connection set that is already empty, and `mbproxy.shutdown.complete` always reports `InFlightAtCancel = 0`. The existing `ShutdownE2ETests.cs:113` doesn't actually assert the coordinator ran. **Change:** Pick one of: - **(A)** Make `ProxyWorker` and `AdminEndpointHost` *not* implement `IHostedService` β€” register them as singletons only β€” and let the coordinator drive their entire lifecycle (start in `ApplicationStarted`, stop in `ApplicationStopping`). - **(B)** Move the drain logic into `ProxyWorker.StopAsync` and delete the coordinator entirely. Simpler if the only thing the coordinator does is order + drain + admin-stop. (B) is the smaller change and matches how worker services are typically structured. **Tests:** - βœ… Update `ShutdownE2ETests` to assert `mbproxy.shutdown.complete` fires with a positive `InFlightAtCancel` when an in-flight request is dropped at deadline. - βœ… Assert `mbproxy.shutdown.complete` fires exactly once. **Effort:** ~1 day if (B); ~2 days if (A) because every IHostedService gets unwound. --- ## Wave 2 β€” Major Fixes (20 items) Grouped by file/area for batched PRs. ### Multiplexer / Concurrency | # | File:line | Problem | Change | Test | |---|-----------|---------|--------|------| | W2.1 | `Configuration/ConfigReconciler.cs:294-304,378-388` ↔ `Proxy/ProxyWorker.cs:129-130` | Add/restart paths construct `PlcListenerSupervisor` without the `coalescingAccessor` `ProxyWorker` builds at startup. Hot-reload of `ReadCoalescing.Enabled` doesn't propagate to PLCs added or restarted via reload. | Pass the same `coalescingAccessor` into `ConfigReconciler.Attach` and use it in the add/restart constructors. | βœ… `ConfigReconcilerTests.AddedPlc_HonoursCurrentReadCoalescingFlag` β€” reload with `Enabled=false`, add a PLC, assert that PLC's reads do not coalesce. | | W2.2 | `Proxy/Multiplexing/PlcMultiplexer.cs:86,142,181,216,566`; `UpstreamPipe.cs:52,77` | `_disposed` is non-volatile but read on the hot path. On weakly-ordered platforms (ARM) stale-false reads after disposal are permitted. | Mark both fields `volatile`, or replace with `Interlocked.CompareExchange(ref _disposedFlag, 1, 0)` pattern. The whole project ships win-x64 today so this is correctness defense for future portability. | (existing dispose tests are sufficient; verify via inspection.) | | W2.3 | `Configuration/ConfigReconciler.cs:236-238,269-271,306,390` | `_supervisors` `Dictionary<,>` is mutated from concurrent `Task.Run` continuations inside `Task.WhenAll`. Outer apply is serialised by a semaphore but the inner tasks are not. | Either `ConcurrentDictionary<,>`, or collect results from `Task.WhenAll` first then mutate the dict on a single thread. | βœ… Replace existing `Concurrent_Reloads_Serialised` with one that drives genuinely-concurrent add/remove via stub supervisors and asserts no `KeyNotFoundException` / corruption. | | W2.4 | `Proxy/Multiplexing/PlcMultiplexer.cs:721-733` (counter parity); `:626` (cache miss double-count) | Counter semantics drift from the docs: saturation increments `coalescedMissCount` but produces no backend round-trip; `cacheMiss` is incremented even when the request later coalesces onto a peer's in-flight read. | Two options: (a) restructure increments to fire only on the actual outcome (move `IncrementCoalescedMiss` after the successful send; demote cache-miss to "cache-eligible request" and add a separate `cacheElevatedToBackend` counter), or (b) leave behaviour unchanged and update `docs/Reference/LogEvents.md` + `docs/design.md` to match. (b) is the lower-effort fix. | βœ… Either way, add `PlcMultiplexerTests.CounterSemantics_DocumentedContract` that asserts the chosen rule. | | W2.5 | `Proxy/Multiplexing/PlcMultiplexer.cs:209,350` | `_disposeCts` disposed while watchdog/cancel paths may still race against it. `oldCts?.Dispose()` at `:350` races with reader's `ct.IsCancellationRequested` check. | Wrap `_disposeCts.Dispose()` in a `try { ... } catch (ObjectDisposedException) { }` at `:209`; gate `oldCts?.Dispose()` at `:350` behind the join of the reader task. | (existing dispose tests cover the happy path; consider a stress test that interleaves Dispose with active requests.) | | W2.6 | `Proxy/Multiplexing/PlcMultiplexer.cs:283` | `_connectGate` `SemaphoreSlim` never disposed. | Add to `Dispose`. | (no test needed.) | | W2.7 | `Proxy/Multiplexing/PlcMultiplexer.cs:468` | Reads the FC byte from the post-rewriter buffer. Correct today (FC byte never rewritten) but fragile. | Use `inFlight.Fc` from the request side. | (existing tests cover.) | ### Cache / Hot-reload | # | File:line | Problem | Change | Test | |---|-----------|---------|--------|------| | W2.8 | `Proxy/Cache/CacheLogEvents.cs:67-76`, supervisor reseat path | `mbproxy.cache.flushed` event defined and documented but never emitted. After W1.1 the reseat path can call `cache.Clear()` instead of disposing β€” that's the natural emission point. | After W1.1, replace `cache.Dispose()` in `ReplaceContextAsync` with `cache.Clear(reason: "tag-list-reload")` and have `Clear` emit the log event. | βœ… `ResponseCacheTests.Clear_EmitsFlushedEvent_WithReason`; βœ… `HotReloadE2ETests` add an assertion that `mbproxy.cache.flushed` fires on the tag-list reload. | | W2.9 | `Proxy/Multiplexing/PlcMultiplexer.cs:497-509` (currently no gate) | "Skip cache invalidations when listener is `recovering`" (`design.md:203`) is unimplemented. Implicit gating holds today (no backend β†’ no FC06/FC16 response β†’ no invalidation), but is undocumented and will break if any new code path produces a write response off the live backend. | Either (a) add an explicit check on `SupervisorState.Recovering` before calling `_invalidator.Invalidate(...)`, or (b) accept the implicit gating and add a comment in `PlcMultiplexer.cs` near the invalidation call referencing `design.md:203`. (b) is simpler and matches reality. | βœ… Doc-only fix β†’ no test. (a) β†’ fault-injection test that synthesises a response during recovery and asserts no invalidation. | | W2.10 | `Configuration/ReloadValidator.cs:80` | `ReloadValidator` runs `BcdTagMapBuilder.Build` and discards the result; the resolved per-tag `CacheTtlMs` after the per-PLC default fold is never re-checked against `Cache.AllowLongTtl`. | After Build, walk `result.Map.Values` and re-run the long-TTL gate against resolved values. | βœ… Add to `ReloadValidatorTests`: tag with explicit null TTL inheriting a per-PLC default of 60_001 ms is rejected when `AllowLongTtl=false`. | ### BCD Rewriter | # | File:line | Problem | Change | Test | |---|-----------|---------|--------|------| | W2.11 | `Bcd/BcdTagMapBuilder.cs:84-103` | `DuplicateAddress` validation is unreachable β€” input collapses through a `Dictionary` first. Design contract "reject duplicate addresses within a single PLC's resolved list" is silently violated. | Detect duplicates *before* the dictionary collapse: walk the `Add` list with a `HashSet`, collect duplicates, return them as `DuplicateAddress` errors. Same for the merged `Global βˆͺ Add βˆ’ Remove` resolution where Add overrides Global at the same address (intentional width-override path stays valid; only collisions within Add itself are errors). | βœ… `BcdTagMapBuilderTests.Build_DuplicateAddressInAddList_ReportsError` (currently the test acknowledges the gap). | | W2.12 | `Bcd/BcdTagMapBuilder.cs:113-127` | `OverlappingHighRegister` reports each pair twice (once from each direction). | Track reported pairs in `HashSet<(ushort,ushort)>`, dedupe symmetric reports. | βœ… `BcdTagMapBuilderTests.Build_TwoOverlappingPairs_ReportsExactlyOneErrorPerPair`. | | W2.13 | `Proxy/BcdPduPipeline.cs:206-211` | FC16 32-bit write silently mutates values when `clientLow > 9999` β€” multiplication math accepts the OOR value and re-encodes a different number. | Validate `clientLow < 10_000 && clientHigh < 10_000` before the multiplication; on failure, increment `invalidBcdWarnings` and pass through raw. | βœ… `BcdPduPipelineTests.FC16_32Bit_ClientHighOrLowOOR_PassesThroughRaw_WithInvalidBcdWarning`. | | W2.14 | `Proxy/BcdPduPipeline.cs:159` | FC16 `byteCount` field at `pdu[5]` is parsed but discarded. Malformed claims of `qty=10` with 4 bytes of register data silently process half the request. | Add `if (pdu.Length < 6 + qty*2) return;` at the top of `ProcessFc16Request` (and increment a counter or log). | βœ… `BcdPduPipelineTests.FC16_TruncatedRegisterData_PassesThroughRawWithoutHalfRewrite`. | ### Supervisor | # | File:line | Problem | Change | Test | |---|-----------|---------|--------|------| | W2.15 | `Proxy/Supervision/PlcListenerSupervisor.cs:137-144` | `WaitForInitialBindAttemptAsync` busy-polls every 10 ms, races a fast `Stoppedβ†’Boundβ†’Stopped`, never exits if the supervisor task throws. | Replace polling loop with a `TaskCompletionSource` set on the first state transition out of `Stopped`. | βœ… `SupervisorTests.WaitForInitialBindAttemptAsync_ResolvesEvenWhenBindThrows` β€” drive a synthetic bind exception and assert the wait returns within ~50 ms. | | W2.16 | `Proxy/Supervision/PlcListenerSupervisor.cs:126` | `_supervisorCts` reassigned on `StartAsync` without disposing the previous instance. | Add a guard (`throw if not Stopped`) or dispose the previous CTS before reassigning. | (covered by inspection.) | | W2.17 | `Proxy/Supervision/PlcListenerSupervisor.cs:177-180` | `Snapshot()` reads three fields independently β†’ status page can show inconsistent state/lastBindError/recoveryAttempts triples. | Read into locals under a single lock, or use a single `volatile` struct that's swapped atomically. | βœ… `SupervisorTests.Snapshot_DuringTransition_IsInternallyConsistent` β€” fire many transitions in parallel and assert no inconsistent triple ever observed. | | W2.18 | `Options/ConnectionOptions.cs` and `MbproxyOptions.cs` validators | `Connection.{BackendConnectTimeoutMs,BackendRequestTimeoutMs,GracefulShutdownTimeoutMs}` are not validated for `> 0`. A misconfigured 0 produces immediate `CancelAfter(0)` failures. | Add `> 0` checks to `MbproxyOptionsValidator` and `ReloadValidator`. | βœ… `MbproxyOptionsBindingTests.Validation_RejectsZeroOrNegativeTimeouts`. | ### Hosting / Diagnostics | # | File:line | Problem | Change | Test | |---|-----------|---------|--------|------| | W2.19 | `Diagnostics/ShutdownCoordinator.cs:93,102,118`; `Program.cs:42` | Coordinator binds `IOptions` (singleton snapshot). A hot-reloaded `GracefulShutdownTimeoutMs` is silently ignored. | Switch to `IOptionsMonitor`; read `.CurrentValue.Connection.GracefulShutdownTimeoutMs` inside `ShutdownAsync`. After W1.5 verify the coordinator still exists; if it was removed, this fix migrates to wherever drain logic lives. | βœ… `ShutdownE2ETests.E2E_HotReloadedShutdownTimeout_IsHonoured`. | | W2.20 | `Diagnostics/ShutdownCoordinator.cs:147` | Hardcoded `TimeSpan.FromSeconds(5)` for supervisor stop deadline regardless of `GracefulShutdownTimeoutMs`. | Use the configured value. | (covered by W2.19's test if extended.) | | W2.21 | `src/Mbproxy/appsettings.json` ↔ `install/mbproxy.config.template.json` | Shipped `appsettings.json` is an empty stub. Fresh `Mbproxy.exe` from publish folder is a no-op service. | Either mirror the install template into `src/Mbproxy/appsettings.json`, or set `Never` in the csproj so the publish folder doesn't ship a stub. | βœ… `HostSmokeTests.Host_StartsWithCheckedInAppsettings_AndAtLeastOneListenerBinds`. | | W2.22 | `Admin/StatusDto.cs:102`, `Admin/StatusSnapshotBuilder.cs:127-131` | `pdus.invalidBcdWarnings` and `backendExceptions.codeOther` are tracked in `CounterSnapshot` but never surfaced via the JSON DTO or HTML renderer. Dead increment paths. | Add the two fields to `PlcPdusStatus` / `ExceptionCounts` and wire them through `StatusSnapshotBuilder` and `StatusHtmlRenderer`. | βœ… `AdminEndpointTests.StatusJson_IncludesInvalidBcdWarnings_AfterPipelineWarning`; βœ… analogous for `codeOther`. | | W2.23 | `Diagnostics/EventLogBridge.cs:46` | `EventLog.SourceExists` registry hit on every Error+ log line. | Cache the result in a `bool _sourceExists` field set in the constructor (one-shot). Document that source registration changes after process start require a service restart. | (covered by inspection; optional perf-sensitive E2E.) | | W2.24 | `Proxy/ProxyWorker.cs:202-217` | Hard-coded 5-second deadline in `StopAsync`. Coexists with the coordinator's deadline; reader of just `ProxyWorker` would miss the contract. | If W1.5 keeps the coordinator: add a comment pointing at it. If W1.5 deletes the coordinator: replace the 5s with `IOptionsMonitor.CurrentValue.Connection.GracefulShutdownTimeoutMs`. | (covered by W2.19's test.) | --- ## Wave 3 β€” Cleanup, Doc-only, Test Gaps (low priority) ### Cleanup (small βš™) - `Proxy/Multiplexing/PlcMultiplexer.cs:445-450` β€” dead `elapsedMs` calculation, never used. Remove. - `Proxy/Multiplexing/UpstreamPipe.cs:262-263` β€” meaningless conditional `firstRead && remaining == count ? false : false`. Replace with `return false`. - `Proxy/Multiplexing/InFlightByKeyMap.cs:62` β€” `TryAttachOrCreate` always returns `true`; drop the dead `bool`. (Resolved as part of W1.2.) - `Bcd/BcdCodec.cs:103-110` and `Proxy/BcdPduPipeline.cs:455-459` β€” duplicated `HasBadNibble`. Promote codec's helper to `internal` and call it from the pipeline. - `Proxy/Supervision/PlcListenerSupervisor.cs:268,323,346` β€” copy-pasted exception-message truncation. Extract `Truncate(string, int)` helper. - `Proxy/Multiplexing/PlcMultiplexer.cs:844-846` β€” comment says "1-second floor"; code uses 100 ms. Fix comment. - `Configuration/ConfigReconciler.cs:428` β€” duplicated tag-delta computation. Either expose a delta count from `ReloadPlan.Compute` or drop the log enrichment. - `Mbproxy.csproj:42` β€” comment describes Polly as "deferred" but it's actively used. Remove stale comment. - `Admin/StatusSnapshotBuilder.cs:69` β€” unreachable `?.Address.ToString()` fallback. Simplify to `p.RemoteEp?.ToString() ?? "?"`. ### Doc-only (πŸ“) - `README.md` (operator-facing) β€” add a callout describing the 32-bit BCD wire format as "two base-10000 digits in CDAB", not standard binary CDAB Int32. Standard Modbus clients (NModbus, FluentModbus, Wonderware DAServer) will silently corrupt values > 9999 if they assume normal CDAB-int32 semantics. *(BcdRewriter M1.)* - `docs/design.md` and `docs/Reference/LogEvents.md` β€” clarify counter semantics for cache-miss-during-coalescing per the choice made in W2.4. *(Mux M9.)* - `docs/Operations/Configuration.md` β€” note that the Event Log sink is a custom `EventLogBridge`, not `Serilog.Sinks.EventLog`, so reviewers don't add a duplicate sink. *(Hosting minor.)* - `Proxy/ResponseCache.md` (or `docs/Architecture/ResponseCache.md`) β€” document the implicit invalidation-skip-during-recovery gating. *(Cache M1 / Supervisor M1, paired with W2.9 option (b).)* - `docs/plan/README.md` β€” add a phase entry for "Phase 12: Review remediation 2026-05-14" pointing at this file, so the phase history stays complete. ### Test gaps (βœ…) The following design contracts are unverified by any current test. Add them in priority order: 1. βœ… `ReloadValidatorTests` β€” `Cache.AllowLongTtl=false` rejects `CacheTtlMs > 60_000` (per-tag and per-PLC-default forms). Paired with W2.10. *(Cache test gap; design.md.)* 2. βœ… `ReloadValidatorTests` β€” `Cache.AllowLongTtl=true` happy-path acceptance. 3. βœ… `HotReloadE2ETests` β€” tag-list reload flushes the affected PLC's cache (paired with W1.1 + W2.8). 4. βœ… `SupervisorTests` β€” replace placeholder `Supervisor_RuntimeFault_TriggersRecovery` with a test that genuinely faults the accept loop and asserts state goes `Bound β†’ Recovering β†’ Bound`. *(Supervisor test gap.)* 5. βœ… `PlcMultiplexerTests` β€” TxId allocator saturation propagates a clean Modbus exception (0x06 or equivalent) to upstream, doesn't hang. *(Mux test gap.)* 6. βœ… `PlcMultiplexerTests` β€” coalescing factory leak under saturation with multiple attachers. (Paired with W1.2.) 7. βœ… `PlcMultiplexerTests` β€” backend-reader head-of-line block by a slow upstream. (Paired with W1.3.) 8. βœ… `PlcMultiplexerTests` β€” watchdog↔response race (response arriving in the same tick as `SnapshotOlderThan`). 9. βœ… `PlcMultiplexerTests` β€” cascade racing with a brand-new accept (does the new pipe survive or land in a zombie multiplexer mid-teardown?). 10. βœ… `HotReloadE2ETests` β€” flip `ReadCoalescing.Enabled` at runtime (not start-up) and assert subsequent reads stop coalescing. (Paired with W2.1.) 11. βœ… `BcdCodecTests` β€” `Encode16(int.MinValue)` doesn't blow up. 12. βœ… `BcdPduPipelineTests` β€” write range ending exactly mid-32-bit-pair on the high address (inverse of existing low-side test). 13. βœ… `BcdPduPipelineTests` β€” mixed 16-bit + 32-bit + non-BCD slots in a single FC03 read. 14. βœ… `BcdPduPipelineTests` β€” FC16 *response* handling (currently only the request is tested). 15. βœ… `ProxyForwardingTests` or similar β€” `qty > 128` FC03 read passes through unchanged; PLC's exception 03 surfaces to the client. 16. βœ… `ConfigReconcilerTests` β€” replace the soft "concurrent reloads serialised" test with one that drives genuinely-concurrent applies and asserts no overlap (using a fake supervisor with a "claim count" assertion). 17. βœ… `AdminEndpointTests` β€” POST/PUT/DELETE/HEAD against `/` and `/status.json` return 405 (verifies no unintended methods are bound). --- ## Out of Scope Findings that this plan deliberately does **not** address: - **Mux C2 / C3** (cache-hit-during-cascade visibility model). Per the design these are acceptable behaviours; the Multiplexer review downgraded them. - Any finding that recommended adding a new feature (e.g. an `Mbproxy.AdminBindAddress` config knob) or new defense-in-depth security hardening β€” those were trimmed from the reviews per direction on 2026-05-14. --- ## Suggested Sequencing A focused remediation pass might split as: | Sprint | Items | Outcome | |--------|-------|---------| | Sprint 1 (~1 week) | W1.1, W1.2, W1.3, W1.4 | The four critical multiplexer/hot-reload defects are gone. Hot-reload finally works as documented. | | Sprint 2 (~1 week) | W1.5, W2.19, W2.20, W2.24 + the related test (ShutdownE2ETests) | Shutdown path is single-owner and respects the configured timeout. | | Sprint 3 (~1 week) | W2.1, W2.3, W2.8, W2.9, W2.10, W2.21 + cache/supervisor tests | All the major hot-reload + cache contract gaps closed. | | Sprint 4 (~1 week) | All remaining W2 items (BCD, supervisor, multiplexer minor) | Counter accuracy, dispose hygiene, validator coverage. | | Sprint 5 (~3 days) | Wave 3 cleanup + doc + tests | Suite + docs caught up. | After Sprint 1 the production risk profile drops sharply. After Sprint 3 every documented design contract is enforced. Wave 3 is opportunistic and can be folded into ongoing work.