wwtools/mbproxy/docs/plan/09-txid-multiplexing.md
Joseph Doherty 56eee3c563 mbproxy: initial commit through Phase 9 (TxId multiplexing)
Adds the mbproxy service end-to-end. Phases 00-08 implement the
production-ready single-listener / 1:1-backend transparent Modbus TCP
proxy with bidirectional BCD rewriting for the ~54-PLC DL205/DL260
fleet. Phase 9 replaces the connection layer with a single backend
socket per PLC plus MBAP TxId rewriting, lifting the H2-ECOM100's
4-concurrent-client cap as an operational ceiling.

Phase 9 additions of note:
- PlcMultiplexer + UpstreamPipe + TxIdAllocator + CorrelationMap
- InFlightRequest with IReadOnlyList<InterestedParty> (load-bearing
  for Phase 10 read coalescing — do not collapse to a single field)
- Per-request watchdog: surfaces Modbus exception 0x0B to upstream
  on BackendRequestTimeoutMs, defending against lost responses,
  dead-PLC paths, and pymodbus 3.13.0's concurrent-multiplexed-
  request bug (its ServerRequestHandler.last_pdu state race)
- Status DTO + HTML gain inFlight / maxInFlight / txIdWraps /
  disconnectCascades / queueDepth (Tier 1.6 in docs/kpi.md)

Tests: 263 unit + 38 E2E. Multiplexer correctness under truly
concurrent backend traffic is proved against a stub backend in
PlcMultiplexerTests; MultiplexerE2ETests paces requests so pymodbus
3.13's single-PDU framer stays in known-good mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:49:35 -04:00

Phase 09 — MBAP TxId multiplexing (single backend connection per PLC)

Replace the 1:1 upstream-client ↔ backend-socket model with a single backend connection per PLC, multiplexed across all upstream clients via MBAP transaction-ID rewriting and a correlation map. After this phase the H2-ECOM100's 4-simultaneous-TCP-client cap is no longer an operational ceiling — the proxy holds exactly one slot per PLC regardless of how many upstream clients are connected.

Status: shipped 2026-05-14. Phases 00-08 shipped the production-ready 1:1 model; this phase swapped connection management without changing the transparent-rewrite contract.

Implementation clarifications discovered during 2026-05-14 ship

These notes capture decisions and surprises that surfaced during the actual implementation. They supplement (not replace) the Tasks section below.

  1. A per-request timeout watchdog is part of Phase 9, not deferred. In the 1:1 model a missing response was handled implicitly: the dedicated backend socket died and took the pair with it. The multiplexed model needs an explicit timer, because a single lost or mis-routed response would otherwise leak a correlation entry forever and hang the upstream pipe indefinitely. The watchdog ticks every BackendRequestTimeoutMs / 4 (minimum 100 ms), scans the correlation map, and times out stale requests by delivering Modbus exception 0x0B (Gateway Target Device Failed To Respond) to the upstream party with the original TxId restored. Log event mbproxy.multiplex.request.timeout (Warning).

  2. PlcListener constructs a multiplexer unconditionally. The Phase-9 draft had PlcListener conditionally construct the multiplexer only when a PerPlcContext was supplied; the no-context fallback dropped accepted upstream sockets. Tests (and any pre-Phase-6 startup path that lacked a context) hit a regression. The fix is to construct a minimal default PerPlcContext from the PlcOptions if the caller didn't supply one, and require _multiplexer to be non-null when RunAsync runs.

  3. BackendConnectFailure_ClosesUpstreamCleanly is now lazy. The 1:1 model attempted a backend connect at upstream-accept time, so simply opening a TCP connection to a proxy with a bad backend triggered the close. The multiplexed model connects to the backend on the first upstream frame, so the test has to send a Modbus request before the proxy attempts the (failing) backend connect that causes the upstream close. Updated in-place.

  4. pymodbus 3.13.0 simulator is broken under multiplexed concurrent requests. Its ServerRequestHandler keeps a single last_pdu per connection and schedules handle_later via asyncio.call_soon; two MBAP frames in one recv buffer overwrite last_pdu before the first handler runs, and both responses carry the later TxId. The real DL260 ECOM properly echoes per-request TxIds. Consequence for tests:

    • Mux correctness under truly concurrent backend traffic is proven against the stub backend in PlcMultiplexerTests, which models the DL260's correct TxId-echo behaviour.
    • MultiplexerE2ETests paces requests so pymodbus only ever sees one MBAP frame at a time on the shared backend connection. The headline test (E2E_FiveSimultaneousClients_AllReadHR1072_AllGetDecoded_1234) verifies the connection ceiling lift (5 simultaneous upstream connections, where Phase-08's 1:1 model would have refused the 5th) — not the under-concurrency multiplexing behaviour.
    • The watchdog is the production defence if any real backend (or future simulator) ever mis-echoes a TxId: stale entries time out cleanly with exception 0x0B rather than hanging upstream clients.
  5. E2E timeouts. Per the Test discipline section of docs/plan/README.md, every E2E test carries a 5 s timeout by default. Hot-reload tests that genuinely need 5 s + 3 s of propagation windows carry a 10 s timeout with a one-line comment; E2E_BackendDisconnect_DuringInflight_CascadesUpstream_AndRecovers carries 8 s for its sequential connects + Polly-paced reconnect path.

  6. AsyncHostDispose deadlock note. Test fixtures that hold IHost via await using were originally written with a 5 s shutdown timeout; under Phase 9's drained-channel cleanup that occasionally exceeded the test's own Timeout = 5000. Reduced to 2-3 s where it doesn't materially affect the test's drain semantics.
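
The watchdog timing in note 1 can be sketched as follows. This is a minimal illustration, not the shipped code: WatchdogSketch, TickInterval and FindStale are hypothetical names, and the dictionary of send timestamps stands in for the real correlation map. Only the quarter-timeout tick, the 100 ms floor and the 0x0B exception code come from the note itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; names are illustrative, not taken from the mbproxy source.
internal static class WatchdogSketch
{
    // Modbus exception code delivered upstream when a request times out.
    public const byte GatewayTargetFailedToRespond = 0x0B;

    // Tick at a quarter of BackendRequestTimeoutMs, but never more often than every 100 ms.
    public static TimeSpan TickInterval(int backendRequestTimeoutMs) =>
        TimeSpan.FromMilliseconds(Math.Max(100, backendRequestTimeoutMs / 4));

    // Returns the proxy TxIds whose requests have been in flight for at least the full timeout.
    // The caller would then remove each entry, release the TxId, and send the exception frame.
    public static List<ushort> FindStale(
        IReadOnlyDictionary<ushort, DateTimeOffset> sentAtByProxyTxId,
        DateTimeOffset nowUtc,
        int backendRequestTimeoutMs)
    {
        TimeSpan timeout = TimeSpan.FromMilliseconds(backendRequestTimeoutMs);
        var stale = new List<ushort>();
        foreach (KeyValuePair<ushort, DateTimeOffset> entry in sentAtByProxyTxId)
        {
            if (nowUtc - entry.Value >= timeout)
            {
                stale.Add(entry.Key);
            }
        }
        return stale;
    }
}
```

With a quarter-timeout tick, a lost response is detected at most 1.25 × BackendRequestTimeoutMs after it was sent, which bounds how long an upstream client can hang.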

Depends on: Phase 04 (rewriter), Phase 05 (supervisor + Polly), Phase 07 (status page DTO surface).

Parallel-safe with: nothing within itself. This is a hard rule: the phase deletes PlcConnectionPair and rewires the supervisor + rewriter correlation path simultaneously, and the cross-cut is too broad for safe parallel work. The optional intra-phase slicing (below) is the closest thing to parallel.

Goal

The H2-ECOM100 accepts 4 concurrent TCP clients per PLC; today's 1:1 model means the 5th upstream client to the same proxy port fails at backend connect. This phase eliminates that ceiling by making one persistent backend socket per PLC, with the proxy serving as a connection multiplexer that rewrites MBAP transaction IDs to keep concurrent in-flight requests from different upstream clients distinguishable on the single wire.

The wire-rate ceiling does not change — the H2-ECOM100 internally serializes requests (one per PLC scan, ~2-10 ms scan time) regardless of how many TCP connections it has. We're shifting where serialization happens (proxy outbound queue vs PLC accept queue), not adding throughput. The dashboard pay-off is that "PLC clients connected" can rise into the dozens without the proxy degrading.

Intra-phase slicing (the closest thing to parallel-safe within this phase)

The phase is one merge but can be implemented as five small commits in this order:

| Slice | Output | Files touched | Hours | Parallelizable? |
| --- | --- | --- | --- | --- |
| 9.1 | Pure data types (TxIdAllocator, CorrelationMap, InFlightRequest) + their unit tests | new files under src/Mbproxy/Proxy/Multiplexing/ and tests/... | ~5 | Yes: pure logic, disjoint from the rest. A second agent can write the E2E test scaffolding (slice 9.5) in parallel. |
| 9.2 | PlcMultiplexer + UpstreamPipe skeleton with backend reader/writer loops | new files in Multiplexing/ | ~10 | No: depends on 9.1's data types. |
| 9.3 | Refactor PlcListener to own the multiplexer; delete PlcConnectionPair; rewire supervisor | modifies existing Proxy + Supervision files | ~8 | No: depends on 9.2. |
| 9.4 | Update BcdPduPipeline to use correlation entries (drop PerPlcContextWithRequest); counter additions; status DTO + HTML updates | modifies pipeline + admin files | ~6 | No: depends on 9.3. |
| 9.5 | Full E2E test suite + design.md + CLAUDE.md doc updates | new test file + doc edits | ~6 | Test-writing yes (slice 9.5 skeleton can land in parallel with 9.1); the doc edits at the end are sequential after 9.3. |

Total: ~35 hours. With one parallel agent producing slice 9.1's data types and another sketching the e2e test fixtures during slice 9.5-prep, calendar time can compress to ~28 hours.

Outputs (new files in this phase)

src/Mbproxy/Proxy/Multiplexing/PlcMultiplexer.cs         # single backend conn owner; mux logic
src/Mbproxy/Proxy/Multiplexing/UpstreamPipe.cs           # per-upstream-client reader/writer
src/Mbproxy/Proxy/Multiplexing/TxIdAllocator.cs          # 16-bit allocator with wrap tracking
src/Mbproxy/Proxy/Multiplexing/CorrelationMap.cs         # proxyTxId → InFlightRequest
src/Mbproxy/Proxy/Multiplexing/InFlightRequest.cs        # the correlation record
src/Mbproxy/Proxy/Multiplexing/MultiplexerLogEvents.cs   # [LoggerMessage] vocab for this phase

tests/Mbproxy.Tests/Proxy/Multiplexing/TxIdAllocatorTests.cs
tests/Mbproxy.Tests/Proxy/Multiplexing/CorrelationMapTests.cs
tests/Mbproxy.Tests/Proxy/Multiplexing/PlcMultiplexerTests.cs           # integration, real sockets
tests/Mbproxy.Tests/Proxy/Multiplexing/RewriterCorrelationTests.cs      # rewriter w/ multiplexed paths
tests/Mbproxy.Tests/Proxy/Multiplexing/MultiplexerE2ETests.cs           # against pymodbus sim

Files modified (existing files in this phase)

src/Mbproxy/Proxy/PlcListener.cs                          # owns PlcMultiplexer; accept loop hands sockets to it
src/Mbproxy/Proxy/PlcConnectionPair.cs                    # DELETED — replaced by UpstreamPipe + Multiplexer
src/Mbproxy/Proxy/IPduPipeline.cs                         # PduContext gains in-flight correlation entry
src/Mbproxy/Proxy/PerPlcContext.cs                        # delete PerPlcContextWithRequest; replaced by InFlightRequest passed per-call
src/Mbproxy/Proxy/BcdPduPipeline.cs                       # FC03/04 response decodes via InFlightRequest, not last-request slot
src/Mbproxy/Proxy/ProxyCounters.cs                        # new fields: InFlightCount, MaxInFlight, TxIdWraps, BackendDisconnectCascades, BackendQueueDepth
src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs    # supervises mux lifecycle alongside listener
src/Mbproxy/Admin/StatusDto.cs                            # PlcBackendStatus gains the new mux fields
src/Mbproxy/Admin/StatusSnapshotBuilder.cs                # populate mux fields from counters
src/Mbproxy/Admin/StatusHtmlRenderer.cs                   # show inFlight/max-in-flight in the per-PLC row

docs/design.md                                            # rewrite Connection model + Failure modes for multiplexed reality
mbproxy/CLAUDE.md                                         # flip Architecture summary's connection-model bullet
docs/kpi.md                                               # update operational notes referring to 4-client cap

Tasks

9.1 Data types (pure logic)

  1. TxIdAllocator: internal sealed class TxIdAllocator. State: _inUse (bool[65536] for O(1) lookup; ~64 KB), _next (ushort), _inFlightCount (int, matching the InFlightCount getter), _wrapCount (long). Methods:

    • bool TryAllocate(out ushort id) — atomic via lock (the allocator is per-PLC, contention is low). Scans forward from _next for the next free slot; sets _inUse[id] = true; bumps _next. Returns false if _inFlightCount == 65536 (saturated; emit mbproxy.multiplex.saturated Error and let caller decide to drop or queue).
    • void Release(ushort id) — clears _inUse[id]; decrements _inFlightCount.
    • int InFlightCount { get; }, long WrapCount { get; } — for telemetry.
    • Wrap counter: increment whenever _next rolls over 0xFFFF → 0x0000.
  2. InFlightRequest + InterestedParty: InterestedParty is internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId). InFlightRequest is internal sealed record InFlightRequest(byte UnitId, byte Fc, ushort StartAddress, ushort Qty, IReadOnlyList<InterestedParty> InterestedParties, DateTimeOffset SentAtUtc). Carries enough state for: (a) restoring each party's original TxId on the way back, (b) the FC03/04 correlation the rewriter needs (start/qty), (c) routing the response to each interested upstream socket, (d) round-trip-time measurement.

    In Phase 9 InterestedParties always contains exactly one element. The list shape is forward-compat with Phase 10 — read coalescing, which extends the same record to fan-out responses to multiple upstream clients without further refactor of the multiplexer's data model. Resist any reviewer suggestion to simplify it back to a single UpstreamPipe Upstream field — the list shape is the load-bearing foundation for Phase 10.

  3. CorrelationMap — wraps a ConcurrentDictionary<ushort, InFlightRequest>. Methods: bool TryAdd(ushort, InFlightRequest), bool TryRemove(ushort, out InFlightRequest), int Count { get; }, IReadOnlyCollection<InFlightRequest> Snapshot() (for diagnostics; allocates a list). The dict is correct-by-construction for the mux's single-writer-add / single-reader-remove pattern; ConcurrentDictionary keeps it safe if/when we add upstream-side cancellation.
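
The allocator described in task 1 can be sketched roughly as below. Field names follow the task text; the method bodies are one plausible implementation under the stated design (lock-protected forward scan from _next), not the shipped code, and the _gate lock object is an assumption of this sketch.

```csharp
using System;

// Sketch only: one way to satisfy the TxIdAllocator contract described above.
internal sealed class TxIdAllocator
{
    private readonly object _gate = new();          // hypothetical lock object
    private readonly bool[] _inUse = new bool[65536];
    private ushort _next;
    private int _inFlightCount;
    private long _wrapCount;

    public int InFlightCount { get { lock (_gate) return _inFlightCount; } }
    public long WrapCount { get { lock (_gate) return _wrapCount; } }

    public bool TryAllocate(out ushort id)
    {
        lock (_gate)
        {
            if (_inFlightCount == 65536)
            {
                id = 0;
                return false; // saturated: every 16-bit id is in flight
            }

            // Not saturated, so at least one slot is free and this scan terminates.
            while (true)
            {
                ushort candidate = _next;
                _next = unchecked((ushort)(_next + 1)); // wraps 0xFFFF -> 0x0000
                if (_next == 0) _wrapCount++;           // count each rollover of _next

                if (!_inUse[candidate])
                {
                    _inUse[candidate] = true;
                    _inFlightCount++;
                    id = candidate;
                    return true;
                }
            }
        }
    }

    public void Release(ushort id)
    {
        lock (_gate)
        {
            if (!_inUse[id]) return; // releasing an unallocated id is a no-op
            _inUse[id] = false;
            _inFlightCount--;
        }
    }
}
```

Because the scan always moves forward from _next, a released id is not handed out again until the cursor comes back around to it, which keeps recently used TxIds off the wire for as long as possible.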

9.2 Multiplexer + UpstreamPipe

  1. UpstreamPipe: internal sealed class UpstreamPipe : IAsyncDisposable. One instance per accepted upstream socket. Fields: Socket _upstream, Guid _id, IPEndPoint _remoteEp, DateTimeOffset _connectedAtUtc, volatile bool _alive, Channel<byte[]> _responseChannel (capacity 16). Two tasks:

    • Read task: pumps inbound MBAP frames from _upstream to a per-pipe OnFrame callback (registered by the multiplexer).
    • Write task: drains _responseChannel and writes each frame back to _upstream. On fault it sets _alive = false and closes the socket; the multiplexer notices on its next correlation lookup and drops responses bound for this pipe.
  2. PlcMultiplexer: internal sealed class PlcMultiplexer : IAsyncDisposable. One instance per PLC. Fields: backend Socket, TxIdAllocator, CorrelationMap, Channel<byte[]> _outboundChannel (cap 256), PerPlcContext _ctx (tag map + counters + logger), list of attached UpstreamPipes. Two backend tasks plus a fan-in:

    • Backend writer task: drains _outboundChannel → writes to backend socket. Single writer; no synchronization on the socket needed.
    • Backend reader task: reads MBAP frames from the backend. Steps for each frame:
      1. Look up proxyTxId in CorrelationMap.
      2. Call pipeline.Process(ResponseToClient, header, pdu, ctx with InFlight).
      3. For each InterestedParty in InFlightRequest.InterestedParties (always exactly one in Phase 9; list-of-N once Phase 10 ships): write a copy of the frame with that party's OriginalTxId restored in the MBAP header to the party's UpstreamPipe._responseChannel (or drop silently for that party if its pipe is _alive = false).
      4. CorrelationMap.TryRemove(proxyTxId) + TxIdAllocator.Release(proxyTxId).
    • Per-upstream OnFrame: invoked by each UpstreamPipe's read task. Steps:
      1. Parse MBAP: original TxId, length, unitId, PDU.
      2. TryAllocate a proxyTxId. If saturated, write a Modbus exception response (Slave Device Failure, code 04) back to upstream and continue.
      3. Build InFlightRequest (parse FC/start/qty from PDU if FC03/04 — needed for FC06 too if we want the symmetric correlation later).
      4. TryAdd to correlation map.
      5. Call pipeline.Process(RequestToBackend, ...) to apply BCD rewriting.
      6. Overwrite MBAP TxId bytes with proxyTxId.
      7. Enqueue the modified frame into _outboundChannel.
  3. Backend disconnect handling — when the backend reader/writer task throws (socket closed, network reset, etc.):

    • Stop both tasks; close the backend socket.
    • Walk the correlation map; for each entry, close that entry's UpstreamPipe (cascade). Increment BackendDisconnectCascades by the upstream-pipe count.
    • Clear correlation map and TxIdAllocator.
    • The supervisor's Polly pipeline takes over for backend reconnect — when the next upstream request arrives, the multiplexer attempts a fresh backend connection through the Polly pipeline.
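
The MBAP byte handling behind the OnFrame steps and the reader task's TxId restore can be sketched as follows. MbapSketch and its members are hypothetical names for illustration; the header layout (big-endian TxId, protocol id, length, then unit id) and the exception-response shape (function code with the high bit set, followed by the exception code) follow the Modbus TCP framing.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helpers; not taken from the mbproxy source.
internal static class MbapSketch
{
    public const int HeaderLength = 7; // TxId(2) + ProtocolId(2) + Length(2) + UnitId(1)

    // Step 1: the transaction id is the first two bytes, big-endian.
    public static ushort ReadTxId(ReadOnlySpan<byte> frame) =>
        BinaryPrimitives.ReadUInt16BigEndian(frame);

    // Step 6 (outbound) and the restore on the return path: same operation, different id.
    // Returns a copy so the caller's buffer is left untouched.
    public static byte[] WithTxId(ReadOnlySpan<byte> frame, ushort txId)
    {
        byte[] copy = frame.ToArray();
        BinaryPrimitives.WriteUInt16BigEndian(copy, txId);
        return copy;
    }

    // Step 2 on saturation: answer the request with an exception response (e.g. code 04).
    public static byte[] BuildException(ReadOnlySpan<byte> request, byte exceptionCode)
    {
        if (request.Length < HeaderLength + 1)
        {
            throw new ArgumentException("Frame too short to hold MBAP header + function code.", nameof(request));
        }

        var response = new byte[HeaderLength + 2];
        request.Slice(0, 4).CopyTo(response);                              // echo TxId + protocol id
        BinaryPrimitives.WriteUInt16BigEndian(response.AsSpan(4, 2), 3);   // length: unit id + 2 PDU bytes
        response[6] = request[6];                                          // unit id
        response[7] = (byte)(request[7] | 0x80);                           // function code with error bit
        response[8] = exceptionCode;
        return response;
    }
}
```

A round trip then looks like `WithTxId(request, proxyTxId)` on the way out and `WithTxId(response, party.OriginalTxId)` on the way back, so the upstream client never sees the proxy's id.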

9.3 Listener + supervisor refactor

  1. PlcListener.RunAsync — accept loop changes:

    • One PlcMultiplexer per listener (constructed in PlcListenerSupervisor and handed in).
    • On accept: wrap the socket in UpstreamPipe, register with the multiplexer via mux.Attach(pipe).
    • On listener stop: dispose the multiplexer (which closes the backend + all attached pipes).
    • ActivePairs property → renamed ActiveUpstreams returning the multiplexer's list of attached UpstreamPipes. Status page consumes this.
  2. Delete PlcConnectionPair.cs — entire file. The replacement is UpstreamPipe + PlcMultiplexer. No backwards-compat shims; we're moving cleanly.

  3. PlcListenerSupervisor — gains ownership of PlcMultiplexer alongside the listener. The Polly listener-recovery pipeline is unchanged; the multiplexer has its own internal Polly backend-connect pipeline (same ResilienceOptions.BackendConnect shape as today, just owned by the mux instead of the pair).

9.4 Rewriter + counters + status page

  1. BcdPduPipeline — the FC03/04 response path stops reading PerPlcContextWithRequest.LastRequestStart/Qty. Instead, the multiplexer attaches an InFlightRequest to the PduContext for each response call:

    public sealed class PerPlcContext : PduContext {
        public BcdTagMap TagMap { get; init; }
        public ProxyCounters Counters { get; init; }
        public ILogger Logger { get; init; }
        public InFlightRequest? CurrentRequest { get; init; }  // NEW — non-null on response, null on request
    }
    

    Concurrency: each backend response is handled on the backend reader task; the request path is handled by the per-upstream read task. Different InFlightRequest instances → no contention.

  2. Drop PerPlcContextWithRequest entirely. The last-request-slot pattern was a 1:1-model workaround; the correlation map subsumes it.

  3. ProxyCounters additions:

    • InFlightCount (long snapshot of CorrelationMap.Count)
    • MaxInFlight (long, peak observed via Interlocked.Max)
    • TxIdWraps (long from TxIdAllocator.WrapCount)
    • BackendDisconnectCascades (long)
    • BackendQueueDepth (snapshot of _outboundChannel.Reader.Count)
  4. Status page: StatusDto.PlcBackendStatus gains InFlight, MaxInFlight, TxIdWraps, DisconnectCascades, QueueDepth. StatusSnapshotBuilder populates them. StatusHtmlRenderer adds a column or compact [3/256] indicator per PLC row. The JSON field names land in camelCase per the existing source-gen convention.
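
One way to track the MaxInFlight peak without a lock is a compare-and-swap loop. The sketch below assumes that "Interlocked.Max" in the counter list refers to a small project helper of this shape rather than a method on System.Threading.Interlocked; InterlockedMaxSketch is a hypothetical name.

```csharp
using System.Threading;

// Hypothetical helper; a lock-free "raise to at least this value" for a long counter.
internal static class InterlockedMaxSketch
{
    // Raises 'target' to 'candidate' if candidate is larger; returns the resulting peak.
    public static long Max(ref long target, long candidate)
    {
        long observed = Volatile.Read(ref target);
        while (candidate > observed)
        {
            // Only write if nobody changed the value since we observed it.
            long previous = Interlocked.CompareExchange(ref target, candidate, observed);
            if (previous == observed)
            {
                return candidate; // our write won
            }
            observed = previous;  // lost the race; re-check against the newer value
        }
        return observed;          // an equal or higher peak is already recorded
    }
}
```

The per-upstream OnFrame path would call this with the current correlation-map count after each TryAdd, so concurrent read tasks can update the peak without contending on a lock.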

9.5 Tests + docs

  1. Unit + integration test suites (see Tests required below).

  2. docs/design.md updates:

    • Connection model section: rewrite. The diagram changes from "many clients → many backend sockets" to "many clients → one backend socket per PLC, multiplexed by proxy TxId rewriting." The operational consequence warning flips: instead of "5th client fails," it becomes "if backend disconnects, all attached upstream clients are cascaded closed; they reconnect on their own next request."
    • Failure modes section: amend to describe the cascade behaviour.
    • Rewriter section: amend to note the rewriter consumes InFlightRequest for response correlation (no architectural change, just an update to the description of how correlation flows).
  3. mbproxy/CLAUDE.md Architecture summary: first bullet flips from "1:1 upstream-client ↔ backend-socket" to "single backend socket per PLC, multiplexed via MBAP TxId rewriting."

  4. docs/kpi.md — the "Tier 2 → Connection-cap saturation warning" KPI loses its meaning (4-client cap no longer relevant on the upstream side). Either remove it or repurpose to track in-flight saturation against the 16-bit TxId space (which never realistically saturates but is the new equivalent ceiling).

Public surface declared in this phase

All internal sealed — the multiplexer types are not consumed outside the assembly.

namespace Mbproxy.Proxy.Multiplexing;

internal sealed class TxIdAllocator {
    public bool TryAllocate(out ushort id);
    public void Release(ushort id);
    public int InFlightCount { get; }
    public long WrapCount { get; }
}

internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId);

internal sealed record InFlightRequest(
    byte UnitId, byte Fc,
    ushort StartAddress, ushort Qty,
    IReadOnlyList<InterestedParty> InterestedParties,
    DateTimeOffset SentAtUtc);
// Phase 9: InterestedParties.Count is always 1.
// Phase 10 (read coalescing): the same record fans out to N parties without further refactor.

internal sealed class CorrelationMap {
    public bool TryAdd(ushort proxyTxId, InFlightRequest req);
    public bool TryRemove(ushort proxyTxId, out InFlightRequest req);
    public int Count { get; }
    public IReadOnlyCollection<InFlightRequest> Snapshot();
}

internal sealed class UpstreamPipe : IAsyncDisposable {
    public Guid Id { get; }
    public IPEndPoint RemoteEp { get; }
    public DateTimeOffset ConnectedAtUtc { get; }
    public long PdusForwardedCount { get; }
    public bool IsAlive { get; }
    public Task RunReadLoopAsync(Func<byte[], Task> onFrame, CancellationToken ct);
    public ValueTask SendResponseAsync(byte[] frame, CancellationToken ct);
    public ValueTask DisposeAsync();
}

internal sealed class PlcMultiplexer : IAsyncDisposable {
    public void Attach(UpstreamPipe pipe);
    public IReadOnlyCollection<UpstreamPipe> AttachedPipes { get; }
    public Task RunAsync(CancellationToken ct);
    public ValueTask DisposeAsync();
}

PerPlcContext gains a nullable CurrentRequest property. PerPlcContextWithRequest is removed (along with its LastRequest* slots).
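
For orientation, the CorrelationMap surface declared above maps almost one-to-one onto ConcurrentDictionary. The sketch below is an assumption about how thin that wrapper can be, not the shipped class; InFlightRequestStub is a stand-in record so the example compiles on its own.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

// Stand-in for the real InFlightRequest; only here to keep the sketch self-contained.
internal sealed record InFlightRequestStub(byte UnitId, byte Fc, ushort StartAddress, ushort Qty);

// Hypothetical thin wrapper matching the declared TryAdd / TryRemove / Count / Snapshot surface.
internal sealed class CorrelationMapSketch
{
    private readonly ConcurrentDictionary<ushort, InFlightRequestStub> _map = new();

    // Fails (returns false) if the proxy TxId is already present.
    public bool TryAdd(ushort proxyTxId, InFlightRequestStub req) => _map.TryAdd(proxyTxId, req);

    // Removes and returns the entry; false if the TxId is unknown (late or duplicate response).
    public bool TryRemove(ushort proxyTxId, [MaybeNullWhen(false)] out InFlightRequestStub req) =>
        _map.TryRemove(proxyTxId, out req);

    public int Count => _map.Count;

    // Diagnostics only: allocates a point-in-time copy of the in-flight entries.
    public IReadOnlyCollection<InFlightRequestStub> Snapshot() =>
        new List<InFlightRequestStub>(_map.Values);
}
```

A false return from TryRemove is the signal the backend reader task can use to drop a response whose request has already been timed out by the watchdog.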

Tests required

Unit (Category = Unit)

TxIdAllocatorTests (≥ 8 tests):

  1. Allocate_FromEmpty_Returns_NextSequential
  2. Allocate_AfterRelease_Reuses_FreedId
  3. Allocate_AllocatesEveryUshort_BeforeWrapping
  4. Allocate_WrapsCorrectly_After0xFFFF
  5. Allocate_WhenSaturated_ReturnsFalse_DoesNotThrow
  6. Release_OfNonAllocated_IsNoOp
  7. Concurrent_AllocateRelease_NoDuplicateIds_Under_Parallel_Stress (100 tasks, 1000 ops each)
  8. WrapCount_IncrementsOnEachFullWrap

CorrelationMapTests (≥ 5 tests):

  1. TryAdd_Then_TryRemove_RoundTrips
  2. TryAdd_DuplicateKey_Fails
  3. TryRemove_OfMissing_ReturnsFalse
  4. Snapshot_ReflectsCurrentState
  5. Concurrent_AddRemove_NoDataLoss_Under_Parallel_Stress

PlcMultiplexerTests (≥ 7 tests, real sockets, no simulator):

  1. SingleUpstream_RoundTripsFC03_Through_Multiplexer
  2. SingleUpstream_RoundTripsFC06_Through_Multiplexer
  3. TwoUpstreams_ConcurrentFC03_BothGetCorrectResponses — proves TxId rewriting works end-to-end against a stub backend
  4. TwoUpstreams_ProxyTxIds_AreDistinct_OnTheWire — sniff the backend socket; verify per-request TxIds are unique even when upstream TxIds collide
  5. UpstreamDisconnect_DoesNotAffectOtherUpstreams — drop one client mid-flight; other client's response still arrives
  6. BackendDisconnect_CascadesToAllUpstreams — kill backend; verify all upstream sockets close within 500 ms, BackendDisconnectCascades increments by N
  7. BackendReconnect_AfterCascade_NextUpstreamRequest_Succeeds

RewriterCorrelationTests (≥ 4 tests):

  1. FC03Response_DecodedViaInFlightRequest_NotPerPairSlot
  2. ConcurrentFC03_FromTwoUpstreams_DecodeCorrectly_NoCrossTalk — set up two InFlightRequests with different start addresses, deliver responses out of order; verify each decodes against its own request
  3. ConcurrentFC06_FromTwoUpstreams_EncodeCorrectly
  4. ResponseForDeadUpstream_IsDropped_NoExceptionPropagates

Integration (Category = Unit, no simulator)

These use real TcpListener + Socket against a stub backend (a TcpListener that just echoes or canned-responds). They live in PlcMultiplexerTests.
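
A canned-response stub backend of the kind described here might look like the sketch below. It is illustrative only: StubBackendSketch and its members are hypothetical names, it handles just 12-byte FC03 requests, and it assumes .NET 7 or later for Stream.ReadExactlyAsync. The important property is that every reply echoes the TxId of its own request.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical test helper; not taken from the mbproxy test project.
internal static class StubBackendSketch
{
    // Canned FC03 reply: echoes TxId and unit id, returns 'qty' registers all set to 'value'.
    public static byte[] BuildFc03Response(ReadOnlySpan<byte> request, ushort value)
    {
        ushort qty = BinaryPrimitives.ReadUInt16BigEndian(request.Slice(10, 2));
        int byteCount = qty * 2;
        var response = new byte[9 + byteCount];
        request.Slice(0, 4).CopyTo(response);                       // TxId + protocol id echoed
        BinaryPrimitives.WriteUInt16BigEndian(
            response.AsSpan(4, 2), (ushort)(3 + byteCount));        // unit id + fc + count + data
        response[6] = request[6];                                   // unit id
        response[7] = 0x03;                                         // function code
        response[8] = (byte)byteCount;
        for (int i = 0; i < qty; i++)
        {
            BinaryPrimitives.WriteUInt16BigEndian(response.AsSpan(9 + i * 2, 2), value);
        }
        return response;
    }

    // Accepts one connection and answers FC03 requests until the peer closes or ct fires.
    public static async Task ServeOneAsync(TcpListener listener, ushort value, CancellationToken ct)
    {
        using TcpClient client = await listener.AcceptTcpClientAsync(ct);
        NetworkStream stream = client.GetStream();
        var request = new byte[12]; // MBAP(7) + fc(1) + start(2) + qty(2)
        while (!ct.IsCancellationRequested)
        {
            try
            {
                await stream.ReadExactlyAsync(request, ct);
            }
            catch (EndOfStreamException)
            {
                return; // peer closed the connection
            }
            await stream.WriteAsync(BuildFc03Response(request, value), ct);
        }
    }
}
```

A test would start a TcpListener on a loopback port, run ServeOneAsync in the background, and point the multiplexer's backend endpoint at it.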

E2E (Category = E2E)

MultiplexerE2ETests (≥ 5 tests, against pymodbus simulator):

  1. E2E_FiveConcurrentClients_AllReadHR1072_AllGetDecoded_1234 — the headline test. Five NModbus clients connected to the proxy in parallel; pymodbus sim has the BCD register at 1072. All five get 1234. With Phase 08's 1:1 model, the 5th client would fail at backend connect.
  2. E2E_TwentyConcurrent_FC03_Requests_AcrossThreeClients_AllSucceed
  3. E2E_BackendDisconnect_DuringInflight_CascadesUpstream_AndRecovers — kill the sim mid-flight (simulate by closing on its side); verify upstream clients see clean socket close; relaunch sim; new upstream connection succeeds.
  4. E2E_RewriterStillWorks_UnderMultiplexedThreeClients — three clients each writing different decimal values to different BCD-configured addresses via FC06; verify sim's register state.
  5. E2E_StatusPage_Shows_InFlightAndMaxInFlight — drive 4 concurrent reads, verify /status.json reports inFlight >= 1 during the burst and maxInFlight >= 4.

Phase gate

  • dotnet build Mbproxy.slnx -c Debug — zero warnings, zero errors.
  • All 271 prior tests still green. Specifically: Forward_FC03_HR1072_Returns_Decoded_1234, Forward_FC06_WriteHR200_ThenReadBack_RoundTrips, MbapTxId_IsPreservedEndToEnd, and MbapTxId_StillPreserved_AfterRewriting_20Consecutive continue to pass against the multiplexed implementation. The MBAP-TxId-preserved tests are the critical regression guard — if multiplexing leaks proxy TxIds back to the client, these fail.
  • All new unit tests pass (≥ 24 new in slices 9.1-9.2 alone).
  • All new E2E tests pass (≥ 5).
  • Forward_FC03_HR1072_Returns_Decoded_1234 PASSES with 5 concurrent NModbus clients connected to the same proxy port. This is THE phase test.
  • PlcConnectionPair.cs is gone. Grep for the type name across the solution returns zero hits.
  • PerPlcContextWithRequest is gone. Grep returns zero hits.
  • docs/design.md "Connection model" section is rewritten; the 1:1 model description is gone or moved into a "Historical: pre-Phase-09 model" footnote.
  • mbproxy/CLAUDE.md Architecture summary's connection-model bullet is updated.
  • Backend disconnect with N upstream clients in-flight: all N close within 500 ms; counter BackendDisconnectCascades += N.
  • mbproxy.multiplex.saturated Error event fires if TxId allocator hits 65,536 in-flight. (Stress-test acceptable; manufacture by holding 65,536 pending responses against a stub backend.)
  • Shutdown semantics still work: ShutdownCoordinator drains in-flight requests (now visible via InFlightCount, not IsProcessing).
  • Status page renders the new fields; HTML page weight remains under 50 KB for 54 PLCs.
  • CounterSnapshot's existing field set is preserved — only added fields, no renames or removals. Backwards-compat per the policy in docs/kpi.md.

Out of scope

  • Foundation for future caching, not caching itself. This phase establishes the chokepoint where any future caching or coalescing layer plugs in, but implements no caching of any kind. InFlightRequest.InterestedParties is shaped as a list specifically to make Phase 10 — read coalescing additive without refactor; do not infer caching behavior from the list shape alone. Tier C-2 (short-TTL response cache) and Tier C-3 (periodic poll + cache) remain explicitly out of scope until their own design discussions and design.md updates land.
  • Per-tag read coalescing — if two clients read the same register at the same time, Phase 9's multiplexer sends both requests. Coalescing them into one backend round-trip is the explicit goal of Phase 10, which plugs into the InterestedParties seam created here.
  • Backend keepalive / heartbeat — the design's current "no keepalive" position stands. An idle backend with no upstream activity will die after middlebox timeouts; the next upstream request triggers a fresh connect via Polly. Multiplexing doesn't change this.
  • TxId fairness scheduling — FIFO order in the _outboundChannel is the contract. No round-robin per upstream, no priority. If a single upstream client floods the channel, others queue behind. This is a stated trade-off and matches the ECOM's internal serialization anyway.
  • Pipelined multi-PDU-in-flight per single upstream client — still unsupported. One in-flight request per upstream pipe at a time. Multiplexing across DIFFERENT upstream clients works fully; multiplexing across multiple in-flight requests from the SAME upstream client does not. Document the constraint.
  • Linux / cross-platform packaging — still Windows Service only.

Subagent briefing

If you're the agent picking up this phase, here's the executive summary you need in your head:

  1. You are deleting PlcConnectionPair. Everything that file did is now split between UpstreamPipe (the per-client half) and PlcMultiplexer (the per-PLC half). Read PlcConnectionPair.cs once before you delete it — every behavior in there has a destination in one of the two new classes.

  2. Single-writer / single-reader on the backend socket. Two tasks share the backend socket: one writes (drained from _outboundChannel), one reads (decodes MBAP frames). No third task touches the socket. This invariant is what makes the channel + dictionary design correct without locks.

  3. The rewriter doesn't know about MBAP framing or correlation. It still receives (direction, mbapHeader span, pdu span, PerPlcContext ctx). The only addition is ctx.CurrentRequest (nullable, non-null on response). The rewriter is otherwise unchanged. Resist refactoring it.

  4. InFlightRequest.SentAtUtc powers lastRoundTripMs correctly across multiplexed clients. Today's EWMA is per-pair; under multiplexing, the timestamp moves to per-request. The status counter stays the same.

  5. Cascade-on-backend-disconnect is the most subtle behavior. Get the test for it right early (BackendDisconnect_CascadesToAllUpstreams). It's the difference between "graceful failure" and "leaked upstream sockets that hold connections open until OS timeout."

  6. TxId allocator saturation is a real-world impossibility but a stress-test reality. Hold 65,536 responses in a stub backend; the allocator must refuse the 65,537th cleanly with an exception response code 04, not crash.

  7. Update the docs in the SAME PR as the code. design.md Connection model, mbproxy/CLAUDE.md Architecture summary, and docs/kpi.md connection-cap KPI either get rewritten or removed. Doc drift is a gate fail.

  8. Do NOT introduce parallel agents within this phase. The cross-cut is too broad. If you have spare agent budget, slice 9.1 (data types + their unit tests) can run alongside slice 9.5 (e2e test scaffolding writing against the unchanged outer-shape contract) but the middle slices are sequential.

  9. The 4 critical regression tests that must stay green:

    • Forward_FC03_HR1072_Returns_Decoded_1234
    • Forward_FC06_WriteHR200_ThenReadBack_RoundTrips
    • Forward_FC16_WriteMultipleHR201_203_ThenReadBack_RoundTrips
    • MbapTxId_IsPreservedEndToEnd ← THIS is the one that proves multiplexing is transparent.
  10. When in doubt, re-read BcdPduPipeline.ProcessResponse. The FC03/04 correlation logic there is the most subtle existing code that you're touching. Walk through it with one upstream client in mind first, then mentally replay with two; both must work without code change to the pipeline (only the way PerPlcContext.CurrentRequest gets populated changes).
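
Briefing item 4 (per-request round-trip timing) can be illustrated with the sketch below. The existing EWMA's smoothing factor is not stated in this plan, so the 0.2 default is purely an assumption, as are the RoundTripTrackerSketch name and its members.

```csharp
using System;

// Hypothetical tracker; intended to be updated only from the backend reader task,
// so it does no locking of its own.
internal sealed class RoundTripTrackerSketch
{
    private readonly double _alpha; // smoothing factor; 0.2 is an arbitrary illustrative default
    private bool _hasSample;

    public RoundTripTrackerSketch(double alpha = 0.2) => _alpha = alpha;

    public double LastRoundTripMs { get; private set; }
    public double EwmaRoundTripMs { get; private set; }

    // Called when a response is matched to its InFlightRequest; sentAtUtc is the request's SentAtUtc.
    public void Record(DateTimeOffset sentAtUtc, DateTimeOffset receivedAtUtc)
    {
        LastRoundTripMs = (receivedAtUtc - sentAtUtc).TotalMilliseconds;
        EwmaRoundTripMs = _hasSample
            ? _alpha * LastRoundTripMs + (1 - _alpha) * EwmaRoundTripMs
            : LastRoundTripMs; // first sample seeds the average
        _hasSample = true;
    }
}
```

Because the timestamp travels with each request, two upstream clients with interleaved requests each contribute their own round-trip time instead of sharing one per-connection slot.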

Cross-references

  • Today's 1:1 model: ../design.md → "Connection model" (will be rewritten by this phase).
  • DL260 4-client cap source: ../../DL260/dl205.md → "Behavioral Oddities".
  • Existing rewriter request→response correlation: src/Mbproxy/Proxy/BcdPduPipeline.cs ProcessResponse (lines reading PerPlcContextWithRequest.LastRequest*).
  • Polly pipelines this phase reuses without modification: src/Mbproxy/Proxy/Supervision/PolicyFactory.cs.
  • Counter-snapshot backwards-compat policy: ../kpi.md → "Backwards-compat policy".