Closes the Wave 3 (cleanup) tier of codereviews/2026-05-14/RemediationPlan.md.
Tests: 378 pass / 0 fail (baseline 370 + 8 new W3 regression tests).
Code cleanups:
* PlcMultiplexer: removed dead `elapsedMs` calculation (the actual EWMA
conversion uses Stopwatch ticks two lines below).
* UpstreamPipe.FillAsync: dropped the meaningless `firstRead && remaining
== count ? false : false` ternary; both branches were `false`.
* InFlightByKeyMap.TryAttachOrCreate (always returned `true`) renamed to
`AttachOrCreate` and made `void`. Test sites updated to drop the dead
`bool ok = ...; ok.ShouldBeTrue();` assertions.
* BcdCodec.HasBadNibble promoted from private to internal; the duplicate
copy in BcdPduPipeline removed and the call sites updated to
`BcdCodec.HasBadNibble`.
* PlcMultiplexer watchdog comment fixed: said "1-second floor", code uses
100 ms. Now both agree.
* StatusSnapshotBuilder: simplified the unreachable
`RemoteEp?.ToString() ?? RemoteEp?.Address.ToString() ?? "?"` to
`RemoteEp?.ToString() ?? "?"`.
* Mbproxy.csproj: stale "deferred" Polly comment replaced with a real
description of where Polly is used (BackendConnect + ListenerRecovery).
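For readers unfamiliar with the check that `BcdCodec.HasBadNibble` is named for: a 16-bit packed-BCD register holds four decimal digits, one per 4-bit nibble, so any nibble above 9 is invalid. The following is a minimal Python sketch of that idea, not the project's C# implementation. The function names `has_bad_nibble` and `decode_bcd16` are hypothetical and chosen only for this illustration.

```python
def has_bad_nibble(register: int) -> bool:
    """Return True if any 4-bit nibble of a 16-bit register exceeds 9."""
    if not 0 <= register <= 0xFFFF:
        raise ValueError("register must fit in 16 bits")
    return any(((register >> shift) & 0xF) > 9 for shift in (12, 8, 4, 0))


def decode_bcd16(register: int) -> int:
    """Decode a 16-bit packed-BCD register into its decimal value (0..9999)."""
    if has_bad_nibble(register):
        raise ValueError(f"invalid BCD register: 0x{register:04X}")
    value = 0
    # Walk the nibbles from most significant to least significant.
    for shift in (12, 8, 4, 0):
        value = value * 10 + ((register >> shift) & 0xF)
    return value


print(has_bad_nibble(0x1234))  # False
print(has_bad_nibble(0x12A4))  # True (nibble 0xA is not a decimal digit)
print(decode_bcd16(0x1234))    # 1234
```

Keeping the validity check in one place, as the cleanup above does for the C# code, means the decoder and any pipeline that needs to reject bad registers share the same definition of "bad".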
Doc updates:
* README: added a callout about the unconventional 32-bit BCD wire format
("two base-10000 digits in CDAB", not standard binary CDAB Int32) so
integrators using off-the-shelf clients learn about the silent-corruption
hazard before configuring writes.
* docs/design.md: clarified `cacheMissCount` and `coalescedMissCount`
semantics — "miss" means "did not find a fresh entry / did not coalesce",
NOT "produced a backend round-trip". Operators wanting actual backend
traffic should compute `miss − coalescedHit − exception04`.
* docs/Architecture/ResponseCache.md: documented the structural
"skip invalidation while recovering" gating (no backend reader during
recovery → no FC06/FC16 response → no invalidation).
* docs/Operations/Configuration.md: noted that the Event Log sink is the
custom EventLogBridge, not Serilog.Sinks.EventLog (W2.23 cached check).
* docs/plan/README.md: added a Phase 12 row pointing at the remediation
plan and linking out to codereviews/2026-05-14/.
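The silent-corruption hazard named in the README callout can be illustrated with a small Python sketch. It rests on an assumption that should be checked against the README itself: that "two base-10000 digits in CDAB" means each of the two registers carries a plain number in 0..9999, low word first, so the value is `high * 10000 + low`. All function names here are hypothetical.

```python
def encode_base10000_cdab(value: int) -> tuple[int, int]:
    """Split 0..99_999_999 into two base-10000 digits, low word first (assumed layout)."""
    if not 0 <= value <= 99_999_999:
        raise ValueError("value must be in 0..99_999_999")
    high, low = divmod(value, 10_000)
    return low, high


def decode_base10000_cdab(registers: tuple[int, int]) -> int:
    """Inverse of encode_base10000_cdab under the same assumed layout."""
    low, high = registers
    return high * 10_000 + low


def decode_binary_cdab_int32(registers: tuple[int, int]) -> int:
    """What a stock client computes for a word-swapped binary Int32 (non-negative case)."""
    low, high = registers
    return (high << 16) | low


registers = encode_base10000_cdab(12_345_678)
print(registers)                            # (5678, 1234)
print(decode_base10000_cdab(registers))     # 12345678
print(decode_binary_cdab_int32(registers))  # 80877102, a wrong but plausible-looking value
```

Under that assumption, a client configured for standard binary CDAB Int32 raises no error on either reads or writes. It just produces a different number, which is why the callout belongs ahead of any write configuration.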
Test additions (W3 high-value gaps):
* BcdPduPipelineTests:
- FC16_WriteStartsOnHighWord_Of32BitPair_PassesThroughRaw_WithPartialWarning
(symmetric inverse of the existing low-side partial-overlap test).
- FC03_Mixed_16Bit_32Bit_AndNonBcd_InOneRead_OnlyConfiguredSlotsRewritten
(mixed-slot routing in a single FC03 read).
- FC16_Response_PassesThroughUnchanged_RegardlessOfTagMap (FC16 response
carries no register data; rewriter must pass through).
* AdminEndpointTests:
- NonGetMethod_AgainstAdminRoutes_Returns405 (Theory: POST/PUT/DELETE/
PATCH against `/` and `/status.json` must return 405; guards against
an accidental MapPost being added later).
* HotReloadE2ETests:
- E2E_TagListReload_OnCacheablePlc_EmitsCacheFlushedEvent (validates the
W2.8 cache.flushed wiring end-to-end via the real FileSystemWatcher
reload path).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mbproxy — implementation plan
Phase-by-phase implementation plan for the mbproxy service. Each phase is a self-contained work spec with explicit deliverables, tests, and a gate checklist that must be green before the next phase begins. Settled against the design plan in ../design.md on 2026-05-13.
Briefing a subagent for a phase: hand it exactly three documents — the phase doc, ../design.md, and ../../DL260/dl205.md. Tell it not to read other phase docs unless its own doc lists them under "Cross-references". The phase doc IS the contract.
Phase graph
| # | Phase | Depends on | Parallel-safe with |
|---|---|---|---|
| 00 | Bootstrap — host + DI + Serilog + options POCOs | — | (must run first, alone) |
| 01 | Simulator harness — pymodbus xUnit fixture | 00 | 02 |
| 02 | BCD codec — pure encode/decode logic | 00 | 01, 03 |
| 03 | Proxy plumbing — TcpListener + 1:1 byte forwarder | 00 | 02 |
| 04 | Rewriter integration — wire codec into proxy | 02, 03 | — |
| 05 | Listener supervisor — Polly auto-recovery | 03 | — |
| 06 | Hot-reload — IOptionsMonitor reconcile | 05 | — |
| 07 | Status page — Kestrel admin endpoint | 05, 06 | — |
| 08 | Service hardening — Windows service + shutdown | 04, 07 | — |
| 09 | TxId multiplexing — single backend connection per PLC (post-1.0 follow-on) | 04, 05, 07 | — |
| 10 | Read coalescing — in-flight FC03/04 dedup (post-1.0 follow-on) | 09 | — |
| 11 | Response cache — short-TTL post-response cache, bounded staleness (post-1.0; design-contract pivot) | 10 | — |
| 12 | Code-review remediation (2026-05-14) — Wave 1 critical, Wave 2 major, Wave 3 cleanup. Plan and findings in ../../codereviews/2026-05-14/. | 11 | — |
        ┌── 01 (sim) ──┐
00 ─────┼── 02 (codec) ─┼──── 04 ───┐
        └── 03 (plumbing)┴── 05 ─── 06 ─── 07 ─── 08
                                                  │
                                                  └─────────────────→ 09 ───→ 10 ───→ 11 (post-1.0)
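The "Depends on" column of the phase table can also be read mechanically. The sketch below transcribes that column by hand and uses Python's standard `graphlib.TopologicalSorter` to group phases into waves whose dependencies are already satisfied. The names `DEPENDS_ON` and `dependency_waves` are hypothetical. The waves show dependency readiness only: the plan's own rules on parallel work and on post-1.0 gating still apply on top.

```python
from graphlib import TopologicalSorter

# "Depends on" column of the phase table, transcribed by hand.
DEPENDS_ON = {
    "00": [],
    "01": ["00"],
    "02": ["00"],
    "03": ["00"],
    "04": ["02", "03"],
    "05": ["03"],
    "06": ["05"],
    "07": ["05", "06"],
    "08": ["04", "07"],
    "09": ["04", "05", "07"],
    "10": ["09"],
    "11": ["10"],
    "12": ["11"],
}


def dependency_waves(depends_on):
    """Group phases into successive waves whose dependencies are all complete."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the table contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in dependency_waves(DEPENDS_ON):
    print(" ".join(wave))
```

A check like this is mainly useful as a guard when the table is edited: a typo that introduces a cycle or a missing dependency shows up immediately instead of at merge time.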
Phases 09, 10, and 11 are post-1.0 follow-ons, not part of the initial 1.0 release.
- Phase 09 rewires the connection layer to lift the H2-ECOM100's 4-concurrent-client cap as an operational ceiling. Pick it up only after Phase 08 has shipped and field experience confirms the 4-client cap is a real production problem (not just a theoretical one).
- Phase 10 plugs into Phase 09's `InterestedParties` seam to coalesce same-key FC03/04 reads within the in-flight window. Zero post-response staleness. Worth doing only if field telemetry shows meaningful read overlap (≥ 2× duplicate-read traffic from concurrent HMIs / historians).
- Phase 11 extends the "served without backend traffic" window from in-flight microseconds (Phase 10) to operator-configurable seconds via a per-tag TTL response cache. This is a deliberate design-contract pivot: the proxy stops being purely transparent and becomes an opt-in cache layer with bounded staleness. The cache is OFF by default; opting tags in is the operator's explicit acknowledgement of the staleness window. Pick it up only if Phase 10's coalescing ratio under real load reveals enough cross-poll overlap to justify staleness as a trade.
Working with subagents
Default: one subagent per phase, sequential
Spawn one Agent (Sonnet or Opus) per phase in order. Each agent reads exactly:
- Its own phase doc (under this directory).
- `../design.md`: architecture, the source of truth.
- `../../DL260/dl205.md`: device quirks.
That is sufficient context. The agent must NOT invent scope beyond the phase doc's "Outputs" section. If it discovers a design-affecting issue, it must STOP and surface the issue rather than improvise — designs change in ../design.md, not silently in code.
Advanced: parallel subagents within a single phase boundary
Two phases marked "Parallel-safe with" each other can be picked up by independent subagents at the same time. The only safe parallel windows in this plan are:
- Phase 01 ∥ Phase 02 (sim harness lives in `tests/sim/`, codec lives in `src/Mbproxy/Bcd/`; fully disjoint).
- Phase 02 ∥ Phase 03 (codec is pure logic in `src/Mbproxy/Bcd/`; plumbing is in `src/Mbproxy/Proxy/`; disjoint).
- Phase 01 + Phase 02 + Phase 03 all three at once is also safe (all touch different directories).
Required pattern:
- Spawn each parallel agent with `isolation: "worktree"` (Agent tool's worktree mode creates an isolated git checkout).
- Each agent gets ONE phase doc + design.md + dl205.md.
- Each agent runs its phase gate locally before its worktree is committed.
- Merge order: lower phase number first. Resolve conflicts manually if the agents drifted outside their declared output scope (which they shouldn't).
- After merge, re-run the phase 00 smoke test plus both merged phases' tests to confirm no integration regression.
Hard rules — anti-patterns that break parallel work:
- ❌ Any two phases editing the same `.csproj` PackageReference list at the same time. Phase 00 owns the initial csproj; later phases append PackageReferences atomically and a parallel pair must coordinate via separate `<ItemGroup>` blocks or sequential merges.
- ❌ Running phase 04 in parallel with anything (it integrates two prior phases, so by definition it touches their outputs).
- ❌ Running phase 06 in parallel with anything (the hot-reload reconcile inspects state from listener supervisor + rewriter + counters; it has the widest cross-cut).
- ❌ Spawning more than 3 concurrent worktree agents (review/merge overhead grows superlinearly and the value disappears).
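The parallel-work rules above are simple enough to express as a pre-flight check before spawning agents. The sketch below is one conservative reading, not part of the plan: it accepts only the three windows the plan lists explicitly, so a pair such as 01 ∥ 03, which the plan does not list on its own, is rejected. The names `SAFE_WINDOWS`, `MAX_CONCURRENT_AGENTS`, and `may_run_in_parallel` are hypothetical.

```python
# The three parallel windows the plan lists explicitly.
SAFE_WINDOWS = [
    frozenset({"01", "02"}),
    frozenset({"02", "03"}),
    frozenset({"01", "02", "03"}),
]

# Hard rule: never more than 3 concurrent worktree agents.
MAX_CONCURRENT_AGENTS = 3


def may_run_in_parallel(phases) -> bool:
    """Return True if the given phases may be worked on at the same time."""
    requested = frozenset(phases)
    if len(requested) <= 1:
        return True  # a single phase is the sequential default
    if len(requested) > MAX_CONCURRENT_AGENTS:
        return False
    # Phases 04 and 06 appear in no window, so they are rejected here too.
    return requested in SAFE_WINDOWS


print(may_run_in_parallel(["01", "02"]))  # True
print(may_run_in_parallel(["03", "04"]))  # False
print(may_run_in_parallel(["05", "06"]))  # False
```

An allow-list keeps the check aligned with the plan's wording: a new parallel window has to be added to the plan and to the list together.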
Phase gate template
Every phase MUST be green on all of these before its branch is merged:
- Build is clean. `dotnet build src/Mbproxy/Mbproxy.csproj -c Debug` with zero warnings. `<TreatWarningsAsErrors>true</TreatWarningsAsErrors>` is set in phase 00 and stays set forever.
- All unit tests pass. `dotnet test tests/Mbproxy.Tests/Mbproxy.Tests.csproj --filter Category!=E2E` is green.
- E2E tests pass when the simulator is available. `dotnet test tests/Mbproxy.Tests/Mbproxy.Tests.csproj --filter Category=E2E --blame-hang-timeout 2m` is green on a machine with Python + pymodbus installed. The `--blame-hang-timeout` is mandatory: never run E2E without it. Skipped tests (due to missing simulator) don't count as failures, but ANY test added in this phase must NOT skip when the sim IS available, and every E2E test MUST carry a `[Fact(Timeout = …)]` per the Test discipline rules below.
- No regressions in any prior phase's tests. The full suite stays green.
- No new public types beyond what the phase doc declares. Scope creep is a gate fail. If a needed type is missing from the doc, update the doc first.
- No `TODO/FIXME/HACK` comments committed. Either resolve or file in the Deferred section below.
- Design / docs are in sync. If a design decision changed during the phase, `../design.md` is updated in the same PR. Mirror the change to `../../CLAUDE.md`'s Architecture summary only if it shifts one of the headline bullets.
- Phase doc itself is updated to reflect any clarifications discovered during implementation, so the next subagent picking up the project doesn't relearn what this one learned.
Test discipline
- Framework: xUnit (v3 if available, v2 otherwise) + Shouldly for assertions. Never `Assert.Equal(x, y)`; always `y.ShouldBe(x)`. Never `Assert.True(p)`; always `p.ShouldBeTrue("reason")`.
- Categories: `[Trait("Category", "Unit")]` (default; no traits needed), `[Trait("Category", "E2E")]` (needs simulator), `[Trait("Category", "Stress")]` (slow / load-bearing; opt-in only).
- No mocks for code we own. Exercise our types directly. Mock only at the network/file/process boundary, and prefer a real local socket / real temp file over a mock when feasible.
- Test naming: `MethodOrScenario_Condition_ExpectedOutcome`. Example: `BcdCodec_Decode16_Returns1234_For0x1234`.
- One assertion per test where reasonable. Multi-assertion tests are acceptable when they assert facets of the same scenario; never when they're really separate tests glued together.
- Every `[Trait("Category","E2E")]` test MUST declare a hard timeout via `[Fact(Timeout = N)]` (xUnit v3, milliseconds). Default: `5_000` ms. Expand per-test only when the test genuinely needs longer (concurrent bursts > 100 ops, reload-propagation debounce, graceful-shutdown drain), and add a one-line comment explaining why. Start tight; raise only when a real test fails with a non-deadlock reason. Reason this matters: the existing fixtures use synchronous NModbus calls and stub TCP servers that do not honor `TestContext.Current.CancellationToken`. Without `[Fact(Timeout=…)]`, a deadlock in the proxy hangs the runner indefinitely. The same rule applies to `[Trait("Category","Stress")]`. Unit tests are exempt unless they touch real sockets or processes.
- Run E2E with a hang backstop. The phase gate's E2E command is `dotnet test ... --filter Category=E2E --blame-hang-timeout 2m`. The `--blame-hang-timeout` is a process-level safety net in case a test's individual `Timeout` somehow doesn't fire (e.g. an unmanaged thread blocking finalization).
Deferred
A running list of things explicitly NOT done in any current phase. When a phase reveals one, add it here so it isn't forgotten and so the deferral is visible at review time:
- (none yet)
Cross-references
- Architecture and load-bearing decisions: `../design.md`
- Device quirks the proxy must respect: `../../DL260/dl205.md`
- pymodbus simulator profile that backs e2e tests: `../../DL260/dl205.json`
- As-deployed PLC parameters (port 502, BCD-by-default, swap bytes, etc.): `../../DL260/mbtcp_settings.JPG`