# Phase 09 — MBAP TxId multiplexing (single backend connection per PLC) Replace the 1:1 upstream-client ↔ backend-socket model with a **single backend connection per PLC**, multiplexed across all upstream clients via MBAP transaction-ID rewriting and a correlation map. After this phase the H2-ECOM100's 4-simultaneous-TCP-client cap is no longer an operational ceiling — the proxy holds exactly one slot per PLC regardless of how many upstream clients are connected. **Status:** shipped 2026-05-14. Phases 00-08 shipped the production-ready 1:1 model; this phase swapped connection management without changing the transparent-rewrite contract. ## Implementation clarifications discovered during 2026-05-14 ship These notes capture decisions and surprises that surfaced during the actual implementation. They supplement (not replace) the Tasks section below. 1. **A per-request timeout watchdog is part of Phase 9, not deferred.** The 1:1 model collapsed missing-response handling onto the dedicated backend socket dying. The multiplexed model needs an explicit timer because a single lost or mis-routed response would otherwise leak a correlation entry forever and hang the upstream pipe indefinitely. The watchdog ticks at quarter-`BackendRequestTimeoutMs` (min 100 ms), scans the correlation map, and times out stale requests with **Modbus exception 0x0B (Gateway Target Device Failed To Respond)** delivered to the upstream party with the original TxId restored. Log event `mbproxy.multiplex.request.timeout` (Warning). 2. **PlcListener constructs a multiplexer unconditionally.** The Phase-9 draft had `PlcListener` conditionally construct the multiplexer only when a `PerPlcContext` was supplied; the no-context fallback dropped accepted upstream sockets. Tests (and any pre-Phase-6 startup path that lacked a context) hit a regression. The fix is to construct a minimal default `PerPlcContext` from the `PlcOptions` if the caller didn't supply one, and require `_multiplexer` to be non-null when `RunAsync` runs. 3. **`BackendConnectFailure_ClosesUpstreamCleanly` is now lazy.** The 1:1 model attempted a backend connect at upstream-accept time, so simply opening a TCP connection to a proxy with a bad backend triggered the close. The multiplexed model connects to the backend on the *first upstream frame*, so the test has to send a Modbus request before the proxy attempts the (failing) backend connect that causes the upstream close. Updated in-place. 4. **pymodbus 3.13.0 simulator is broken under multiplexed concurrent requests.** Its `ServerRequestHandler` keeps a single `last_pdu` per connection and schedules `handle_later` via `asyncio.call_soon`; two MBAP frames in one recv buffer overwrite `last_pdu` before the first handler runs, and both responses carry the later TxId. The real DL260 ECOM properly echoes per-request TxIds. Consequence for tests: - **Mux correctness under truly concurrent backend traffic is proven against the stub backend in `PlcMultiplexerTests`**, which models the DL260's correct TxId-echo behaviour. - **`MultiplexerE2ETests` paces requests** so pymodbus only ever sees one MBAP frame at a time on the shared backend connection. The headline test (`E2E_FiveSimultaneousClients_AllReadHR1072_AllGetDecoded_1234`) verifies the connection ceiling lift (5 simultaneous upstream connections, where Phase-08's 1:1 model would have refused the 5th) — *not* the under-concurrency multiplexing behaviour. - **The watchdog is the production defence** if any real backend (or future simulator) ever mis-echoes a TxId: stale entries time out cleanly with exception 0x0B rather than hanging upstream clients. 5. **E2E timeouts.** Per `docs/plan/README.md`'s Test discipline, all E2E tests are 5 s by default. Hot-reload tests that genuinely need 5 s + 3 s of propagation windows carry a 10 s timeout with a one-line comment; `E2E_BackendDisconnect_DuringInflight_CascadesUpstream_AndRecovers` carries 8 s for its sequential connects + Polly-paced reconnect path. 6. **`AsyncHostDispose` deadlock note.** Test fixtures that hold `IHost` via `await using` were originally written with a 5 s shutdown timeout; under Phase 9's drained-channel cleanup that occasionally exceeded the test's own `Timeout = 5000`. Reduced to 2-3 s where it doesn't materially affect the test's drain semantics. **Depends on:** Phase 04 (rewriter), Phase 05 (supervisor + Polly), Phase 07 (status page DTO surface). **Parallel-safe with:** nothing within itself. **Hard rule.** This phase deletes `PlcConnectionPair` and rewires the supervisor + rewriter correlation path simultaneously; the cross-cut is too broad for safe parallel work. The optional intra-phase slicing (below) is the closest thing to parallel. ## Goal The H2-ECOM100 accepts 4 concurrent TCP clients per PLC; today's 1:1 model means the 5th upstream client to the same proxy port fails at backend connect. This phase eliminates that ceiling by making **one persistent backend socket per PLC**, with the proxy serving as a connection multiplexer that rewrites MBAP transaction IDs to keep concurrent in-flight requests from different upstream clients distinguishable on the single wire. The wire-rate ceiling does not change — the H2-ECOM100 internally serializes requests (one per PLC scan, ~2-10 ms scan time) regardless of how many TCP connections it has. We're shifting where serialization happens (proxy outbound queue vs PLC accept queue), not adding throughput. The dashboard pay-off is that "PLC clients connected" can rise into the dozens without the proxy degrading. ## Intra-phase slicing (the closest thing to parallel-safe within this phase) The phase is one merge but can be implemented as five small commits in this order: | Slice | Output | Files touched | Hours | Parallelizable? | |-------|--------|---------------|-------|-----------------| | 9.1 | Pure data types (TxIdAllocator, CorrelationMap, InFlightRequest) + their unit tests | new files under `src/Mbproxy/Proxy/Multiplexing/` and `tests/...` | ~5 | Yes — pure logic, disjoint from rest. A second agent can write the E2E test scaffolding (slice 9.5) in parallel. | | 9.2 | `PlcMultiplexer` + `UpstreamPipe` skeleton with backend reader/writer loops | new files in `Multiplexing/` | ~10 | No — depends on 9.1's data types. | | 9.3 | Refactor `PlcListener` to own the multiplexer; delete `PlcConnectionPair`; rewire supervisor | modifies existing Proxy + Supervision files | ~8 | No — depends on 9.2. | | 9.4 | Update `BcdPduPipeline` to use correlation entries (drop `PerPlcContextWithRequest`); counter additions; status DTO + HTML updates | modifies pipeline + admin files | ~6 | No — depends on 9.3. | | 9.5 | Full E2E test suite + design.md + CLAUDE.md doc updates | new test file + doc edits | ~6 | Test-writing yes (slice 9.5 skeleton can land in parallel with 9.1); the doc edits at the end are sequential after 9.3. | **Total:** ~35 hours. With one parallel agent producing slice 9.1's data types and another sketching the e2e test fixtures during slice 9.5-prep, calendar time can compress to ~28 hours. ## Outputs (new files in this phase) ``` src/Mbproxy/Proxy/Multiplexing/PlcMultiplexer.cs # single backend conn owner; mux logic src/Mbproxy/Proxy/Multiplexing/UpstreamPipe.cs # per-upstream-client reader/writer src/Mbproxy/Proxy/Multiplexing/TxIdAllocator.cs # 16-bit allocator with wrap tracking src/Mbproxy/Proxy/Multiplexing/CorrelationMap.cs # proxyTxId → InFlightRequest src/Mbproxy/Proxy/Multiplexing/InFlightRequest.cs # the correlation record src/Mbproxy/Proxy/Multiplexing/MultiplexerLogEvents.cs # [LoggerMessage] vocab for this phase tests/Mbproxy.Tests/Proxy/Multiplexing/TxIdAllocatorTests.cs tests/Mbproxy.Tests/Proxy/Multiplexing/CorrelationMapTests.cs tests/Mbproxy.Tests/Proxy/Multiplexing/PlcMultiplexerTests.cs # integration, real sockets tests/Mbproxy.Tests/Proxy/Multiplexing/RewriterCorrelationTests.cs # rewriter w/ multiplexed paths tests/Mbproxy.Tests/Proxy/Multiplexing/MultiplexerE2ETests.cs # against pymodbus sim ``` ## Files modified (existing files in this phase) ``` src/Mbproxy/Proxy/PlcListener.cs # owns PlcMultiplexer; accept loop hands sockets to it src/Mbproxy/Proxy/PlcConnectionPair.cs # DELETED — replaced by UpstreamPipe + Multiplexer src/Mbproxy/Proxy/IPduPipeline.cs # PduContext gains in-flight correlation entry src/Mbproxy/Proxy/PerPlcContext.cs # delete PerPlcContextWithRequest; replaced by InFlightRequest passed per-call src/Mbproxy/Proxy/BcdPduPipeline.cs # FC03/04 response decodes via InFlightRequest, not last-request slot src/Mbproxy/Proxy/ProxyCounters.cs # new fields: InFlightCount, MaxInFlight, TxIdWraps, BackendDisconnectCascades, BackendQueueDepth src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs # supervises mux lifecycle alongside listener src/Mbproxy/Admin/StatusDto.cs # PlcBackendStatus gains the new mux fields src/Mbproxy/Admin/StatusSnapshotBuilder.cs # populate mux fields from counters src/Mbproxy/Admin/StatusHtmlRenderer.cs # show inFlight/max-in-flight in the per-PLC row docs/design.md # rewrite Connection model + Failure modes for multiplexed reality mbproxy/CLAUDE.md # flip Architecture summary's connection-model bullet docs/kpi.md # update operational notes referring to 4-client cap ``` ## Tasks ### 9.1 Data types (pure logic) 1. **`TxIdAllocator`** — `internal sealed class TxIdAllocator`. State: `_inUse` (`bool[65536]` for O(1) lookup; ~64 KB), `_next` (`ushort`), `_inFlightCount` (long), `_wrapCount` (long). Methods: - `bool TryAllocate(out ushort id)` — atomic via `lock` (the allocator is per-PLC, contention is low). Scans forward from `_next` for the next free slot; sets `_inUse[id] = true`; bumps `_next`. Returns `false` if `_inFlightCount == 65536` (saturated; emit `mbproxy.multiplex.saturated` Error and let caller decide to drop or queue). - `void Release(ushort id)` — clears `_inUse[id]`; decrements `_inFlightCount`. - `int InFlightCount { get; }`, `long WrapCount { get; }` — for telemetry. - **Wrap counter:** increment whenever `_next` rolls over `0xFFFF → 0x0000`. 2. **`InFlightRequest` + `InterestedParty`** — `InterestedParty` is `internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId)`. `InFlightRequest` is `internal sealed record InFlightRequest(byte UnitId, byte Fc, ushort StartAddress, ushort Qty, IReadOnlyList InterestedParties, DateTimeOffset SentAtUtc)`. Carries enough state for: (a) restoring each party's original TxId on the way back, (b) the FC03/04 correlation the rewriter needs (start/qty), (c) routing the response to each interested upstream socket, (d) round-trip-time measurement. **In Phase 9 `InterestedParties` always contains exactly one element.** The list shape is forward-compat with [Phase 10 — read coalescing](10-read-coalescing.md), which extends the same record to fan-out responses to multiple upstream clients without further refactor of the multiplexer's data model. Resist any reviewer suggestion to simplify it back to a single `UpstreamPipe Upstream` field — the list shape is the load-bearing foundation for Phase 10. 3. **`CorrelationMap`** — wraps a `ConcurrentDictionary`. Methods: `bool TryAdd(ushort, InFlightRequest)`, `bool TryRemove(ushort, out InFlightRequest)`, `int Count { get; }`, `IReadOnlyCollection Snapshot()` (for diagnostics; allocates a list). The dict is correct-by-construction for the mux's single-writer-add / single-reader-remove pattern; `ConcurrentDictionary` keeps it safe if/when we add upstream-side cancellation. ### 9.2 Multiplexer + UpstreamPipe 4. **`UpstreamPipe`** — `internal sealed class UpstreamPipe : IAsyncDisposable`. One instance per accepted upstream socket. Fields: `Socket _upstream`, `Guid _id`, `IPEndPoint _remoteEp`, `DateTimeOffset _connectedAtUtc`, `volatile bool _alive`, `Channel _responseChannel` (capacity 16). Two tasks: - **Read task**: pumps inbound MBAP frames from `_upstream` to a per-pipe `OnFrame` callback (registered by the multiplexer). - **Write task**: drains `_responseChannel` and writes each frame back to `_upstream`. On fault: sets `_alive = false`, closes the socket, the multiplexer notices on next correlation lookup and drops responses bound for this pipe. 5. **`PlcMultiplexer`** — `internal sealed class PlcMultiplexer : IAsyncDisposable`. One instance per PLC. Fields: backend `Socket`, `TxIdAllocator`, `CorrelationMap`, `Channel _outboundChannel` (cap 256), `PerPlcContext _ctx` (tag map + counters + logger), list of attached `UpstreamPipe`s. Two backend tasks plus a fan-in: - **Backend writer task**: drains `_outboundChannel` → writes to backend socket. Single writer; no synchronization on the socket needed. - **Backend reader task**: reads MBAP frames from backend → looks up `proxyTxId` in `CorrelationMap` → calls `pipeline.Process(ResponseToClient, header, pdu, ctx with InFlight)` → for each `InterestedParty` in `InFlightRequest.InterestedParties` (always exactly one in Phase 9; list-of-N once Phase 10 ships): writes a copy of the frame with that party's `OriginalTxId` restored in the MBAP header to the party's `UpstreamPipe._responseChannel` (or drops silently for that party if its pipe is `_alive = false`) → `CorrelationMap.TryRemove(proxyTxId)` + `TxIdAllocator.Release(proxyTxId)`. - **Per-upstream `OnFrame`**: invoked by each `UpstreamPipe`'s read task. Steps: 1. Parse MBAP: original TxId, length, unitId, PDU. 2. `TryAllocate` a proxyTxId. If saturated, write a Modbus exception response (Slave Device Failure, code 04) back to upstream and continue. 3. Build `InFlightRequest` (parse FC/start/qty from PDU if FC03/04 — needed for FC06 too if we want the symmetric correlation later). 4. `TryAdd` to correlation map. 5. Call `pipeline.Process(RequestToBackend, ...)` to apply BCD rewriting. 6. Overwrite MBAP TxId bytes with proxyTxId. 7. Enqueue the modified frame into `_outboundChannel`. 6. **Backend disconnect handling** — when the backend reader/writer task throws (socket closed, network reset, etc.): - Stop both tasks; close the backend socket. - Walk the correlation map; for each entry, close that entry's `UpstreamPipe` (cascade). Increment `BackendDisconnectCascades` by the upstream-pipe count. - Clear correlation map and TxIdAllocator. - The supervisor's Polly pipeline takes over for backend reconnect — when the next upstream request arrives, the multiplexer attempts a fresh backend connection through the Polly pipeline. ### 9.3 Listener + supervisor refactor 7. **`PlcListener.RunAsync`** — accept loop changes: - One `PlcMultiplexer` per listener (constructed in `PlcListenerSupervisor` and handed in). - On accept: wrap the socket in `UpstreamPipe`, register with the multiplexer via `mux.Attach(pipe)`. - On listener stop: dispose the multiplexer (which closes the backend + all attached pipes). - `ActivePairs` property → renamed `ActiveUpstreams` returning the multiplexer's list of attached `UpstreamPipe`s. Status page consumes this. 8. **Delete `PlcConnectionPair.cs`** — entire file. The replacement is `UpstreamPipe` + `PlcMultiplexer`. No backwards-compat shims; we're moving cleanly. 9. **`PlcListenerSupervisor`** — gains ownership of `PlcMultiplexer` alongside the listener. The Polly listener-recovery pipeline is unchanged; the multiplexer has its own internal Polly backend-connect pipeline (same `ResilienceOptions.BackendConnect` shape as today, just owned by the mux instead of the pair). ### 9.4 Rewriter + counters + status page 10. **`BcdPduPipeline`** — the FC03/04 response path stops reading `PerPlcContextWithRequest.LastRequestStart/Qty`. Instead, the multiplexer attaches an `InFlightRequest` to the `PduContext` for each response call: ```csharp public sealed class PerPlcContext : PduContext { public BcdTagMap TagMap { get; init; } public ProxyCounters Counters { get; init; } public ILogger Logger { get; init; } public InFlightRequest? CurrentRequest { get; init; } // NEW — non-null on response, null on request } ``` Concurrency: each backend response is handled on the backend reader task; the request path is handled by the per-upstream read task. Different `InFlightRequest` instances → no contention. 11. **Drop `PerPlcContextWithRequest`** entirely. The last-request-slot pattern was a 1:1-model workaround; the correlation map subsumes it. 12. **`ProxyCounters` additions:** - `InFlightCount` (`long` snapshot of `CorrelationMap.Count`) - `MaxInFlight` (`long`, peak observed via `Interlocked.Max`) - `TxIdWraps` (`long` from `TxIdAllocator.WrapCount`) - `BackendDisconnectCascades` (`long`) - `BackendQueueDepth` (snapshot of `_outboundChannel.Reader.Count`) 13. **Status page** — `StatusDto.PlcBackendStatus` gains `InFlight`, `MaxInFlight`, `TxIdWraps`, `DisconnectCascades`, `QueueDepth`. `StatusSnapshotBuilder` populates them. `StatusHtmlRenderer` adds a column or compact `[3/256]` indicator per PLC row. The JSON field names land in camelCase per the existing source-gen convention. ### 9.5 Tests + docs 14. **Unit + integration test suites** (see Tests required below). 15. **`docs/design.md` updates:** - **Connection model** section: rewrite. The diagram changes from "many clients → many backend sockets" to "many clients → one backend socket per PLC, multiplexed by proxy TxId rewriting." The operational consequence warning flips: instead of "5th client fails," it becomes "if backend disconnects, all attached upstream clients are cascaded closed; they reconnect on their own next request." - **Failure modes** section: amend to describe the cascade behaviour. - **Rewriter** section: amend to note the rewriter consumes `InFlightRequest` for response correlation (no architectural change, just an update to the description of how correlation flows). 16. **`mbproxy/CLAUDE.md`** Architecture summary: first bullet flips from "1:1 upstream-client ↔ backend-socket" to "single backend socket per PLC, multiplexed via MBAP TxId rewriting." 17. **`docs/kpi.md`** — the "Tier 2 → Connection-cap saturation warning" KPI loses its meaning (4-client cap no longer relevant on the upstream side). Either remove it or repurpose to track in-flight saturation against the 16-bit TxId space (which never realistically saturates but is the new equivalent ceiling). ## Public surface declared in this phase All `internal sealed` — the multiplexer types are not consumed outside the assembly. ```csharp namespace Mbproxy.Proxy.Multiplexing; internal sealed class TxIdAllocator { public bool TryAllocate(out ushort id); public void Release(ushort id); public int InFlightCount { get; } public long WrapCount { get; } } internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId); internal sealed record InFlightRequest( byte UnitId, byte Fc, ushort StartAddress, ushort Qty, IReadOnlyList InterestedParties, DateTimeOffset SentAtUtc); // Phase 9: InterestedParties.Count is always 1. // Phase 10 (read coalescing): the same record fans out to N parties without further refactor. internal sealed class CorrelationMap { public bool TryAdd(ushort proxyTxId, InFlightRequest req); public bool TryRemove(ushort proxyTxId, out InFlightRequest req); public int Count { get; } public IReadOnlyCollection Snapshot(); } internal sealed class UpstreamPipe : IAsyncDisposable { public Guid Id { get; } public IPEndPoint RemoteEp { get; } public DateTimeOffset ConnectedAtUtc { get; } public long PdusForwardedCount { get; } public bool IsAlive { get; } public Task RunReadLoopAsync(Func onFrame, CancellationToken ct); public ValueTask SendResponseAsync(byte[] frame, CancellationToken ct); public ValueTask DisposeAsync(); } internal sealed class PlcMultiplexer : IAsyncDisposable { public void Attach(UpstreamPipe pipe); public IReadOnlyCollection AttachedPipes { get; } public Task RunAsync(CancellationToken ct); public ValueTask DisposeAsync(); } ``` `PerPlcContext` gains a nullable `CurrentRequest` property. `PerPlcContextWithRequest` is removed (along with its `LastRequest*` slots). ## Tests required ### Unit (`Category = Unit`) **`TxIdAllocatorTests`** (≥ 8 tests): 1. `Allocate_FromEmpty_Returns_NextSequential` 2. `Allocate_AfterRelease_Reuses_FreedId` 3. `Allocate_AllocatesEveryUshort_BeforeWrapping` 4. `Allocate_WrapsCorrectly_After0xFFFF` 5. `Allocate_WhenSaturated_ReturnsFalse_DoesNotThrow` 6. `Release_OfNonAllocated_IsNoOp` 7. `Concurrent_AllocateRelease_NoDuplicateIds_Under_Parallel_Stress` (100 tasks, 1000 ops each) 8. `WrapCount_IncrementsOnEachFullWrap` **`CorrelationMapTests`** (≥ 5 tests): 1. `TryAdd_Then_TryRemove_RoundTrips` 2. `TryAdd_DuplicateKey_Fails` 3. `TryRemove_OfMissing_ReturnsFalse` 4. `Snapshot_ReflectsCurrentState` 5. `Concurrent_AddRemove_NoDataLoss_Under_Parallel_Stress` **`PlcMultiplexerTests`** (≥ 7 tests, real sockets, no simulator): 1. `SingleUpstream_RoundTripsFC03_Through_Multiplexer` 2. `SingleUpstream_RoundTripsFC06_Through_Multiplexer` 3. `TwoUpstreams_ConcurrentFC03_BothGetCorrectResponses` — proves TxId rewriting works end-to-end against a stub backend 4. `TwoUpstreams_ProxyTxIds_AreDistinct_OnTheWire` — sniff the backend socket; verify per-request TxIds are unique even when upstream TxIds collide 5. `UpstreamDisconnect_DoesNotAffectOtherUpstreams` — drop one client mid-flight; other client's response still arrives 6. `BackendDisconnect_CascadesToAllUpstreams` — kill backend; verify all upstream sockets close within 500 ms, `BackendDisconnectCascades` increments by N 7. `BackendReconnect_AfterCascade_NextUpstreamRequest_Succeeds` **`RewriterCorrelationTests`** (≥ 4 tests): 1. `FC03Response_DecodedViaInFlightRequest_NotPerPairSlot` 2. `ConcurrentFC03_FromTwoUpstreams_DecodeCorrectly_NoCrossTalk` — set up two `InFlightRequest`s with different start addresses, deliver responses out of order; verify each decodes against its own request 3. `ConcurrentFC06_FromTwoUpstreams_EncodeCorrectly` 4. `ResponseForDeadUpstream_IsDropped_NoExceptionPropagates` ### Integration (`Category = Unit`, no simulator) These use real `TcpListener` + `Socket` against a stub backend (a `TcpListener` that just echoes or canned-responds). They live in `PlcMultiplexerTests`. ### E2E (`Category = E2E`) **`MultiplexerE2ETests`** (≥ 5 tests, against pymodbus simulator): 1. `E2E_FiveConcurrentClients_AllReadHR1072_AllGetDecoded_1234` — the headline test. Five NModbus clients connected to the proxy in parallel; pymodbus sim has the BCD register at 1072. All five get `1234`. With Phase 08's 1:1 model, the 5th client would fail at backend connect. 2. `E2E_TwentyConcurrent_FC03_Requests_AcrossThreeClients_AllSucceed` 3. `E2E_BackendDisconnect_DuringInflight_CascadesUpstream_AndRecovers` — kill the sim mid-flight (simulate by closing on its side); verify upstream clients see clean socket close; relaunch sim; new upstream connection succeeds. 4. `E2E_RewriterStillWorks_UnderMultiplexedThreeClients` — three clients each writing different decimal values to different BCD-configured addresses via FC06; verify sim's register state. 5. `E2E_StatusPage_Shows_InFlightAndMaxInFlight` — drive 4 concurrent reads, verify `/status.json` reports `inFlight >= 1` during the burst and `maxInFlight >= 4`. ## Phase gate - [ ] `dotnet build Mbproxy.slnx -c Debug` — zero warnings, zero errors. - [ ] All 271 prior tests still green. Specifically: `Forward_FC03_HR1072_Returns_Decoded_1234`, `Forward_FC06_WriteHR200_ThenReadBack_RoundTrips`, `MbapTxId_IsPreservedEndToEnd`, and `MbapTxId_StillPreserved_AfterRewriting_20Consecutive` continue to pass against the multiplexed implementation. The MBAP-TxId-preserved tests are the **critical regression guard** — if multiplexing leaks proxy TxIds back to the client, these fail. - [ ] All new unit tests pass (≥ 24 new in slices 9.1-9.2 alone). - [ ] All new E2E tests pass (≥ 5). - [ ] `Forward_FC03_HR1072_Returns_Decoded_1234` PASSES with 5 concurrent NModbus clients connected to the same proxy port. **This is THE phase test.** - [ ] `PlcConnectionPair.cs` is gone. Grep for the type name across the solution returns zero hits. - [ ] `PerPlcContextWithRequest` is gone. Grep returns zero hits. - [ ] `docs/design.md` "Connection model" section is rewritten; the 1:1 model description is gone or moved into a "Historical: pre-Phase-09 model" footnote. - [ ] `mbproxy/CLAUDE.md` Architecture summary's connection-model bullet is updated. - [ ] Backend disconnect with N upstream clients in-flight: all N close within 500 ms; counter `BackendDisconnectCascades += N`. - [ ] `mbproxy.multiplex.saturated` Error event fires if TxId allocator hits 65,536 in-flight. (Stress-test acceptable; manufacture by holding 65,536 pending responses against a stub backend.) - [ ] Shutdown semantics still work: `ShutdownCoordinator` drains in-flight requests (now visible via `InFlightCount`, not `IsProcessing`). - [ ] Status page renders the new fields; HTML page weight remains under 50 KB for 54 PLCs. - [ ] CounterSnapshot's existing field set is preserved — only **added** fields, no renames or removals. Backwards-compat per the policy in `docs/kpi.md`. ## Out of scope - **Foundation for future caching, not caching itself.** This phase establishes the chokepoint where any future caching or coalescing layer plugs in, but implements no caching of any kind. `InFlightRequest.InterestedParties` is shaped as a list specifically to make [Phase 10 — read coalescing](10-read-coalescing.md) additive without refactor; do not infer caching behavior from the list shape alone. Tier C-2 (short-TTL response cache) and Tier C-3 (periodic poll + cache) remain explicitly out of scope until their own design discussions and `design.md` updates land. - **Per-tag read coalescing** — if two clients read the same register at the same time, Phase 9's multiplexer sends both requests. Coalescing them into one backend round-trip is the explicit goal of [Phase 10](10-read-coalescing.md), which plugs into the `InterestedParties` seam created here. - **Backend keepalive / heartbeat** — the design's current "no keepalive" position stands. An idle backend with no upstream activity will die after middlebox timeouts; the next upstream request triggers a fresh connect via Polly. Multiplexing doesn't change this. - **TxId fairness scheduling** — FIFO order in the `_outboundChannel` is the contract. No round-robin per upstream, no priority. If a single upstream client floods the channel, others queue behind. This is a stated trade-off and matches the ECOM's internal serialization anyway. - **Pipelined multi-PDU-in-flight per single upstream client** — still unsupported. One in-flight request per upstream pipe at a time. Multiplexing across DIFFERENT upstream clients works fully; multiplexing across multiple in-flight requests from the SAME upstream client does not. Document the constraint. - **Linux / cross-platform packaging** — still Windows Service only. ## Subagent briefing If you're the agent picking up this phase, here's the executive summary you need in your head: 1. **You are deleting `PlcConnectionPair`.** Everything that file did is now split between `UpstreamPipe` (the per-client half) and `PlcMultiplexer` (the per-PLC half). Read `PlcConnectionPair.cs` once before you delete it — every behavior in there has a destination in one of the two new classes. 2. **Single-writer / single-reader on the backend socket.** Two tasks share the backend socket: one writes (drained from `_outboundChannel`), one reads (decodes MBAP frames). No third task touches the socket. This invariant is what makes the channel + dictionary design correct without locks. 3. **The rewriter doesn't know about MBAP framing or correlation.** It still receives `(direction, mbapHeader span, pdu span, PerPlcContext ctx)`. The only addition is `ctx.CurrentRequest` (nullable, non-null on response). The rewriter is otherwise unchanged. Resist refactoring it. 4. **`InFlightRequest.SentAtUtc` powers `lastRoundTripMs` correctly across multiplexed clients.** Today's EWMA is per-pair; under multiplexing, the timestamp moves to per-request. The status counter stays the same. 5. **Cascade-on-backend-disconnect is the most subtle behavior.** Get the test for it right early (`BackendDisconnect_CascadesToAllUpstreams`). It's the difference between "graceful failure" and "leaked upstream sockets that hold connections open until OS timeout." 6. **TxId allocator saturation is a real-world impossibility but a stress-test reality.** Hold 65,536 responses in a stub backend; the allocator must refuse the 65,537th cleanly with an exception response code 04, not crash. 7. **Update the docs in the SAME PR as the code.** `design.md` Connection model, `mbproxy/CLAUDE.md` Architecture summary, and `docs/kpi.md` connection-cap KPI either get rewritten or removed. Doc drift is a gate fail. 8. **Do NOT introduce parallel agents within this phase.** The cross-cut is too broad. If you have spare agent budget, slice 9.1 (data types + their unit tests) can run alongside slice 9.5 (e2e test scaffolding writing against the unchanged outer-shape contract) but the middle slices are sequential. 9. **The 4 critical regression tests** that must stay green: - `Forward_FC03_HR1072_Returns_Decoded_1234` - `Forward_FC06_WriteHR200_ThenReadBack_RoundTrips` - `Forward_FC16_WriteMultipleHR201_203_ThenReadBack_RoundTrips` - `MbapTxId_IsPreservedEndToEnd` ← THIS is the one that proves multiplexing is transparent. 10. **When in doubt, re-read `BcdPduPipeline.ProcessResponse`.** The FC03/04 correlation logic there is the most subtle existing code that you're touching. Walk through it with one upstream client in mind first, then mentally replay with two; both must work without code change to the pipeline (only the way `PerPlcContext.CurrentRequest` gets populated changes). ## Cross-references - Today's 1:1 model: [`../design.md`](../design.md) → "Connection model" (will be rewritten by this phase). - DL260 4-client cap source: [`../../DL260/dl205.md`](../../DL260/dl205.md) → "Behavioral Oddities". - Existing rewriter request→response correlation: `src/Mbproxy/Proxy/BcdPduPipeline.cs` `ProcessResponse` (lines reading `PerPlcContextWithRequest.LastRequest*`). - Polly pipelines this phase reuses without modification: `src/Mbproxy/Proxy/Supervision/PolicyFactory.cs`. - Counter-snapshot backwards-compat policy: [`../kpi.md`](../kpi.md) → "Backwards-compat policy".