Adds the mbproxy service end-to-end.

Phases 00-08 implement the production-ready single-listener / 1:1-backend transparent Modbus TCP proxy with bidirectional BCD rewriting for the ~54-PLC DL205/DL260 fleet. Phase 9 replaces the connection layer with a single backend socket per PLC plus MBAP TxId rewriting, lifting the H2-ECOM100's 4-concurrent-client cap as an operational ceiling.

Phase 9 additions of note:
- PlcMultiplexer + UpstreamPipe + TxIdAllocator + CorrelationMap
- InFlightRequest with IReadOnlyList<InterestedParty> (load-bearing for Phase 10 read coalescing; do not collapse to a single field)
- Per-request watchdog: surfaces Modbus exception 0x0B to upstream on BackendRequestTimeoutMs, defending against lost responses, dead-PLC paths, and pymodbus 3.13.0's concurrent-multiplexed-request bug (its ServerRequestHandler.last_pdu state race)
- Status DTO + HTML gain inFlight / maxInFlight / txIdWraps / disconnectCascades / queueDepth (Tier 1.6 in docs/kpi.md)

Tests: 263 unit + 38 E2E. Multiplexer correctness under truly concurrent backend traffic is proved against a stub backend in PlcMultiplexerTests; MultiplexerE2ETests paces requests so pymodbus 3.13's single-PDU framer stays in known-good mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 09 — MBAP TxId multiplexing (single backend connection per PLC)
Replace the 1:1 upstream-client ↔ backend-socket model with a single backend connection per PLC, multiplexed across all upstream clients via MBAP transaction-ID rewriting and a correlation map. After this phase the H2-ECOM100's 4-simultaneous-TCP-client cap is no longer an operational ceiling — the proxy holds exactly one slot per PLC regardless of how many upstream clients are connected.
Status: shipped 2026-05-14. Phases 00-08 shipped the production-ready 1:1 model; this phase swapped connection management without changing the transparent-rewrite contract.
Implementation clarifications discovered during 2026-05-14 ship
These notes capture decisions and surprises that surfaced during the actual implementation. They supplement (not replace) the Tasks section below.
- A per-request timeout watchdog is part of Phase 9, not deferred. The 1:1 model collapsed missing-response handling onto the dedicated backend socket dying. The multiplexed model needs an explicit timer because a single lost or mis-routed response would otherwise leak a correlation entry forever and hang the upstream pipe indefinitely. The watchdog ticks at a quarter of BackendRequestTimeoutMs (min 100 ms), scans the correlation map, and times out stale requests with Modbus exception 0x0B (Gateway Target Device Failed To Respond) delivered to the upstream party with the original TxId restored. Log event: mbproxy.multiplex.request.timeout (Warning).
- PlcListener constructs a multiplexer unconditionally. The Phase-9 draft had PlcListener conditionally construct the multiplexer only when a PerPlcContext was supplied; the no-context fallback dropped accepted upstream sockets. Tests (and any pre-Phase-6 startup path that lacked a context) hit a regression. The fix is to construct a minimal default PerPlcContext from the PlcOptions if the caller didn't supply one, and require _multiplexer to be non-null when RunAsync runs.
- BackendConnectFailure_ClosesUpstreamCleanly is now lazy. The 1:1 model attempted a backend connect at upstream-accept time, so simply opening a TCP connection to a proxy with a bad backend triggered the close. The multiplexed model connects to the backend on the first upstream frame, so the test has to send a Modbus request before the proxy attempts the (failing) backend connect that causes the upstream close. Updated in-place.
- pymodbus 3.13.0 simulator is broken under multiplexed concurrent requests. Its ServerRequestHandler keeps a single last_pdu per connection and schedules handle_later via asyncio.call_soon; two MBAP frames in one recv buffer overwrite last_pdu before the first handler runs, and both responses carry the later TxId. The real DL260 ECOM properly echoes per-request TxIds. Consequence for tests:
  - Mux correctness under truly concurrent backend traffic is proven against the stub backend in PlcMultiplexerTests, which models the DL260's correct TxId-echo behaviour.
  - MultiplexerE2ETests paces requests so pymodbus only ever sees one MBAP frame at a time on the shared backend connection. The headline test (E2E_FiveSimultaneousClients_AllReadHR1072_AllGetDecoded_1234) verifies the connection ceiling lift (5 simultaneous upstream connections, where Phase-08's 1:1 model would have refused the 5th), not the under-concurrency multiplexing behaviour.
  - The watchdog is the production defence if any real backend (or future simulator) ever mis-echoes a TxId: stale entries time out cleanly with exception 0x0B rather than hanging upstream clients.
- E2E timeouts. Per docs/plan/README.md's Test discipline, all E2E tests are 5 s by default. Hot-reload tests that genuinely need 5 s + 3 s of propagation windows carry a 10 s timeout with a one-line comment; E2E_BackendDisconnect_DuringInflight_CascadesUpstream_AndRecovers carries 8 s for its sequential connects + Polly-paced reconnect path.
- AsyncHostDispose deadlock note. Test fixtures that hold IHost via await using were originally written with a 5 s shutdown timeout; under Phase 9's drained-channel cleanup that occasionally exceeded the test's own Timeout = 5000. Reduced to 2-3 s where it doesn't materially affect the test's drain semantics.
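The watchdog note above describes a tick interval and a synthetic exception response. A minimal sketch of both follows, assuming that "min 100 ms" means the tick interval is floored at 100 ms and that the exception frame follows the standard Modbus TCP layout (7-byte MBAP header, then function code with the high bit set, then the exception code). The class and method names here are hypothetical and not taken from the mbproxy source.

```csharp
using System;
using System.Buffers.Binary;

static class WatchdogSketch
{
    // Tick interval: a quarter of the request timeout, floored at 100 ms (assumed reading of "min 100 ms").
    public static TimeSpan TickInterval(int backendRequestTimeoutMs) =>
        TimeSpan.FromMilliseconds(Math.Max(100, backendRequestTimeoutMs / 4));

    // A request counts as stale once it has been in flight for at least the timeout.
    public static bool IsStale(DateTimeOffset sentAtUtc, DateTimeOffset nowUtc, int backendRequestTimeoutMs) =>
        (nowUtc - sentAtUtc).TotalMilliseconds >= backendRequestTimeoutMs;

    // Builds a 9-byte Modbus TCP exception frame carrying the upstream client's original TxId.
    public static byte[] BuildExceptionFrame(ushort originalTxId, byte unitId, byte fc, byte exceptionCode)
    {
        var frame = new byte[9];
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(0, 2), originalTxId); // transaction id
        // Bytes 2-3 are the protocol id, which stays 0 for Modbus.
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(4, 2), 3);            // length: unit id + 2 PDU bytes
        frame[6] = unitId;
        frame[7] = (byte)(fc | 0x80);                                            // exception flag on the function code
        frame[8] = exceptionCode;                                                // e.g. 0x0B for the watchdog
        return frame;
    }
}
```

With a 2000 ms timeout this sketch ticks every 500 ms; with a 200 ms timeout the floor applies and it ticks every 100 ms.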
Depends on: Phase 04 (rewriter), Phase 05 (supervisor + Polly), Phase 07 (status page DTO surface).
Parallel-safe with: nothing within itself. Hard rule. This phase deletes PlcConnectionPair and rewires the supervisor + rewriter correlation path simultaneously; the cross-cut is too broad for safe parallel work. The optional intra-phase slicing (below) is the closest thing to parallel.
Goal
The H2-ECOM100 accepts 4 concurrent TCP clients per PLC; today's 1:1 model means the 5th upstream client to the same proxy port fails at backend connect. This phase eliminates that ceiling by maintaining one persistent backend socket per PLC, with the proxy serving as a connection multiplexer that rewrites MBAP transaction IDs so that concurrent in-flight requests from different upstream clients stay distinguishable on the single wire.
The wire-rate ceiling does not change — the H2-ECOM100 internally serializes requests (one per PLC scan, ~2-10 ms scan time) regardless of how many TCP connections it has. We're shifting where serialization happens (proxy outbound queue vs PLC accept queue), not adding throughput. The dashboard pay-off is that "PLC clients connected" can rise into the dozens without the proxy degrading.
Intra-phase slicing (the closest thing to parallel-safe within this phase)
The phase is one merge but can be implemented as five small commits in this order:
| Slice | Output | Files touched | Hours | Parallelizable? |
|---|---|---|---|---|
| 9.1 | Pure data types (TxIdAllocator, CorrelationMap, InFlightRequest) + their unit tests | new files under src/Mbproxy/Proxy/Multiplexing/ and tests/... | ~5 | Yes: pure logic, disjoint from the rest. A second agent can write the E2E test scaffolding (slice 9.5) in parallel. |
| 9.2 | PlcMultiplexer + UpstreamPipe skeleton with backend reader/writer loops | new files in Multiplexing/ | ~10 | No: depends on 9.1's data types. |
| 9.3 | Refactor PlcListener to own the multiplexer; delete PlcConnectionPair; rewire supervisor | modifies existing Proxy + Supervision files | ~8 | No: depends on 9.2. |
| 9.4 | Update BcdPduPipeline to use correlation entries (drop PerPlcContextWithRequest); counter additions; status DTO + HTML updates | modifies pipeline + admin files | ~6 | No: depends on 9.3. |
| 9.5 | Full E2E test suite + design.md + CLAUDE.md doc updates | new test file + doc edits | ~6 | Test-writing yes (slice 9.5 skeleton can land in parallel with 9.1); the doc edits at the end are sequential after 9.3. |
Total: ~35 hours. With one parallel agent producing slice 9.1's data types and another sketching the e2e test fixtures during slice 9.5-prep, calendar time can compress to ~28 hours.
Outputs (new files in this phase)
src/Mbproxy/Proxy/Multiplexing/PlcMultiplexer.cs # single backend conn owner; mux logic
src/Mbproxy/Proxy/Multiplexing/UpstreamPipe.cs # per-upstream-client reader/writer
src/Mbproxy/Proxy/Multiplexing/TxIdAllocator.cs # 16-bit allocator with wrap tracking
src/Mbproxy/Proxy/Multiplexing/CorrelationMap.cs # proxyTxId → InFlightRequest
src/Mbproxy/Proxy/Multiplexing/InFlightRequest.cs # the correlation record
src/Mbproxy/Proxy/Multiplexing/MultiplexerLogEvents.cs # [LoggerMessage] vocab for this phase
tests/Mbproxy.Tests/Proxy/Multiplexing/TxIdAllocatorTests.cs
tests/Mbproxy.Tests/Proxy/Multiplexing/CorrelationMapTests.cs
tests/Mbproxy.Tests/Proxy/Multiplexing/PlcMultiplexerTests.cs # integration, real sockets
tests/Mbproxy.Tests/Proxy/Multiplexing/RewriterCorrelationTests.cs # rewriter w/ multiplexed paths
tests/Mbproxy.Tests/Proxy/Multiplexing/MultiplexerE2ETests.cs # against pymodbus sim
Files modified (existing files in this phase)
src/Mbproxy/Proxy/PlcListener.cs # owns PlcMultiplexer; accept loop hands sockets to it
src/Mbproxy/Proxy/PlcConnectionPair.cs # DELETED — replaced by UpstreamPipe + Multiplexer
src/Mbproxy/Proxy/IPduPipeline.cs # PduContext gains in-flight correlation entry
src/Mbproxy/Proxy/PerPlcContext.cs # delete PerPlcContextWithRequest; replaced by InFlightRequest passed per-call
src/Mbproxy/Proxy/BcdPduPipeline.cs # FC03/04 response decodes via InFlightRequest, not last-request slot
src/Mbproxy/Proxy/ProxyCounters.cs # new fields: InFlightCount, MaxInFlight, TxIdWraps, BackendDisconnectCascades, BackendQueueDepth
src/Mbproxy/Proxy/Supervision/PlcListenerSupervisor.cs # supervises mux lifecycle alongside listener
src/Mbproxy/Admin/StatusDto.cs # PlcBackendStatus gains the new mux fields
src/Mbproxy/Admin/StatusSnapshotBuilder.cs # populate mux fields from counters
src/Mbproxy/Admin/StatusHtmlRenderer.cs # show inFlight/max-in-flight in the per-PLC row
docs/design.md # rewrite Connection model + Failure modes for multiplexed reality
mbproxy/CLAUDE.md # flip Architecture summary's connection-model bullet
docs/kpi.md # update operational notes referring to 4-client cap
Tasks
9.1 Data types (pure logic)
- TxIdAllocator: internal sealed class TxIdAllocator. State: _inUse (bool[65536] for O(1) lookup; ~64 KB), _next (ushort), _inFlightCount (long), _wrapCount (long). Methods:
  - bool TryAllocate(out ushort id): atomic via lock (the allocator is per-PLC, so contention is low). Scans forward from _next for the next free slot; sets _inUse[id] = true; increments _inFlightCount; bumps _next. Returns false if _inFlightCount == 65536 (saturated; emit mbproxy.multiplex.saturated Error and let the caller decide to drop or queue).
  - void Release(ushort id): clears _inUse[id]; decrements _inFlightCount.
  - int InFlightCount { get; }, long WrapCount { get; }: for telemetry.
  - Wrap counter: increment whenever _next rolls over 0xFFFF → 0x0000.
- InFlightRequest + InterestedParty: InterestedParty is internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId). InFlightRequest is internal sealed record InFlightRequest(byte UnitId, byte Fc, ushort StartAddress, ushort Qty, IReadOnlyList<InterestedParty> InterestedParties, DateTimeOffset SentAtUtc). Carries enough state for: (a) restoring each party's original TxId on the way back, (b) the FC03/04 correlation the rewriter needs (start/qty), (c) routing the response to each interested upstream socket, (d) round-trip-time measurement.

  In Phase 9, InterestedParties always contains exactly one element. The list shape is forward-compat with Phase 10 (read coalescing), which extends the same record to fan out responses to multiple upstream clients without further refactor of the multiplexer's data model. Resist any reviewer suggestion to simplify it back to a single UpstreamPipe Upstream field: the list shape is the load-bearing foundation for Phase 10.
- CorrelationMap: wraps a ConcurrentDictionary<ushort, InFlightRequest>. Methods: bool TryAdd(ushort, InFlightRequest), bool TryRemove(ushort, out InFlightRequest), int Count { get; }, IReadOnlyCollection<InFlightRequest> Snapshot() (for diagnostics; allocates a list). The dict is correct-by-construction for the mux's single-writer-add / single-reader-remove pattern; ConcurrentDictionary keeps it safe if/when we add upstream-side cancellation.
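The allocator behaviour described in 9.1 can be sketched as below. This is an illustration under stated assumptions, not the shipped class: the name TxIdAllocatorSketch, the private Advance helper, and the choice of an int in-flight counter are all hypothetical, and logging of the saturation event is omitted.

```csharp
using System;

internal sealed class TxIdAllocatorSketch
{
    private readonly bool[] _inUse = new bool[65536];
    private readonly object _gate = new();
    private ushort _next;
    private int _inFlightCount;
    private long _wrapCount;

    public int InFlightCount { get { lock (_gate) return _inFlightCount; } }
    public long WrapCount { get { lock (_gate) return _wrapCount; } }

    public bool TryAllocate(out ushort id)
    {
        lock (_gate)
        {
            // Saturated: every 16-bit id is in flight, so refuse instead of scanning forever.
            if (_inFlightCount == _inUse.Length) { id = 0; return false; }

            // At least one slot is free here, so the forward scan terminates.
            while (_inUse[_next]) Advance();

            id = _next;
            _inUse[id] = true;
            _inFlightCount++;
            Advance();
            return true;
        }
    }

    public void Release(ushort id)
    {
        lock (_gate)
        {
            if (!_inUse[id]) return; // releasing an id that is not allocated is a no-op
            _inUse[id] = false;
            _inFlightCount--;
        }
    }

    // Moves _next forward by one and counts each 0xFFFF -> 0x0000 rollover.
    private void Advance()
    {
        if (_next == ushort.MaxValue) { _next = 0; _wrapCount++; }
        else _next++;
    }
}
```

One consequence of scanning forward from _next is that a released id is not handed out again immediately; it is reused only once the cursor comes back around, which keeps recently used TxIds off the wire for as long as possible.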
9.2 Multiplexer + UpstreamPipe
- UpstreamPipe: internal sealed class UpstreamPipe : IAsyncDisposable. One instance per accepted upstream socket. Fields: Socket _upstream, Guid _id, IPEndPoint _remoteEp, DateTimeOffset _connectedAtUtc, volatile bool _alive, Channel<byte[]> _responseChannel (capacity 16). Two tasks:
  - Read task: pumps inbound MBAP frames from _upstream to a per-pipe OnFrame callback (registered by the multiplexer).
  - Write task: drains _responseChannel and writes each frame back to _upstream. On fault: sets _alive = false and closes the socket; the multiplexer notices on the next correlation lookup and drops responses bound for this pipe.
- PlcMultiplexer: internal sealed class PlcMultiplexer : IAsyncDisposable. One instance per PLC. Fields: backend Socket, TxIdAllocator, CorrelationMap, Channel<byte[]> _outboundChannel (cap 256), PerPlcContext _ctx (tag map + counters + logger), list of attached UpstreamPipes. Two backend tasks plus a fan-in:
  - Backend writer task: drains _outboundChannel → writes to backend socket. Single writer; no synchronization on the socket needed.
  - Backend reader task: reads MBAP frames from backend → looks up proxyTxId in CorrelationMap → calls pipeline.Process(ResponseToClient, header, pdu, ctx with InFlight) → for each InterestedParty in InFlightRequest.InterestedParties (always exactly one in Phase 9; list-of-N once Phase 10 ships): writes a copy of the frame with that party's OriginalTxId restored in the MBAP header to the party's UpstreamPipe._responseChannel (or drops silently for that party if its pipe is _alive = false) → CorrelationMap.TryRemove(proxyTxId) + TxIdAllocator.Release(proxyTxId).
  - Per-upstream OnFrame: invoked by each UpstreamPipe's read task. Steps:
    - Parse MBAP: original TxId, length, unitId, PDU.
    - TryAllocate a proxyTxId. If saturated, write a Modbus exception response (Slave Device Failure, code 04) back to upstream and continue.
    - Build InFlightRequest (parse FC/start/qty from PDU if FC03/04; needed for FC06 too if we want the symmetric correlation later).
    - TryAdd to correlation map.
    - Call pipeline.Process(RequestToBackend, ...) to apply BCD rewriting.
    - Overwrite MBAP TxId bytes with proxyTxId.
    - Enqueue the modified frame into _outboundChannel.
- Backend disconnect handling: when the backend reader/writer task throws (socket closed, network reset, etc.):
  - Stop both tasks; close the backend socket.
  - Walk the correlation map; for each entry, close that entry's UpstreamPipe (cascade). Increment BackendDisconnectCascades by the upstream-pipe count.
  - Clear correlation map and TxIdAllocator.
  - The supervisor's Polly pipeline takes over for backend reconnect: when the next upstream request arrives, the multiplexer attempts a fresh backend connection through the Polly pipeline.
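The TxId handling in the OnFrame steps and the backend reader task comes down to a few byte-level operations on the MBAP header. The sketch below shows them as pure functions, assuming the standard MBAP layout (big-endian TxId in bytes 0-1, unit id in byte 6, PDU from byte 7) and the standard FC03/04 request shape (start address and quantity as two big-endian 16-bit fields). MbapSketch and its method names are hypothetical helpers for illustration only.

```csharp
using System;
using System.Buffers.Binary;

static class MbapSketch
{
    public const int HeaderLength = 7; // TxId(2) + ProtocolId(2) + Length(2) + UnitId(1)

    public static ushort ReadTxId(ReadOnlySpan<byte> frame) =>
        BinaryPrimitives.ReadUInt16BigEndian(frame);

    // Request path: overwrite the upstream TxId in place with the proxy-allocated one.
    public static void WriteTxId(Span<byte> frame, ushort txId) =>
        BinaryPrimitives.WriteUInt16BigEndian(frame, txId);

    // Response path: each interested party gets its own copy with its original TxId restored.
    public static byte[] CopyWithTxId(ReadOnlySpan<byte> frame, ushort txId)
    {
        byte[] copy = frame.ToArray();
        WriteTxId(copy, txId);
        return copy;
    }

    // Extracts the fields an in-flight record needs from an FC03/04 request frame.
    public static bool TryParseReadRequest(
        ReadOnlySpan<byte> frame, out byte unitId, out byte fc, out ushort start, out ushort qty)
    {
        unitId = 0; fc = 0; start = 0; qty = 0;
        if (frame.Length < HeaderLength + 5) return false; // fc + start(2) + qty(2)
        if (frame[7] != 0x03 && frame[7] != 0x04) return false;
        unitId = frame[6];
        fc = frame[7];
        start = BinaryPrimitives.ReadUInt16BigEndian(frame.Slice(8, 2));
        qty = BinaryPrimitives.ReadUInt16BigEndian(frame.Slice(10, 2));
        return true;
    }
}
```

Copying on the response path rather than patching in place is what lets the same backend frame go to several parties once the list holds more than one element; in Phase 9 it costs one extra allocation per response.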
9.3 Listener + supervisor refactor
- PlcListener.RunAsync: accept loop changes:
  - One PlcMultiplexer per listener (constructed in PlcListenerSupervisor and handed in).
  - On accept: wrap the socket in UpstreamPipe, register with the multiplexer via mux.Attach(pipe).
  - On listener stop: dispose the multiplexer (which closes the backend + all attached pipes).
  - ActivePairs property → renamed ActiveUpstreams, returning the multiplexer's list of attached UpstreamPipes. Status page consumes this.
- Delete PlcConnectionPair.cs: entire file. The replacement is UpstreamPipe + PlcMultiplexer. No backwards-compat shims; we're moving cleanly.
- PlcListenerSupervisor: gains ownership of PlcMultiplexer alongside the listener. The Polly listener-recovery pipeline is unchanged; the multiplexer has its own internal Polly backend-connect pipeline (same ResilienceOptions.BackendConnect shape as today, just owned by the mux instead of the pair).
9.4 Rewriter + counters + status page
- BcdPduPipeline: the FC03/04 response path stops reading PerPlcContextWithRequest.LastRequestStart/Qty. Instead, the multiplexer attaches an InFlightRequest to the PduContext for each response call:

      public sealed class PerPlcContext : PduContext
      {
          public BcdTagMap TagMap { get; init; }
          public ProxyCounters Counters { get; init; }
          public ILogger Logger { get; init; }
          public InFlightRequest? CurrentRequest { get; init; } // NEW: non-null on response, null on request
      }

  Concurrency: each backend response is handled on the backend reader task; the request path is handled by the per-upstream read task. Different InFlightRequest instances → no contention.
- Drop PerPlcContextWithRequest entirely. The last-request-slot pattern was a 1:1-model workaround; the correlation map subsumes it.
- ProxyCounters additions:
  - InFlightCount (long snapshot of CorrelationMap.Count)
  - MaxInFlight (long, peak observed via Interlocked.Max)
  - TxIdWraps (long from TxIdAllocator.WrapCount)
  - BackendDisconnectCascades (long)
  - BackendQueueDepth (snapshot of _outboundChannel.Reader.Count)
- Status page: StatusDto.PlcBackendStatus gains InFlight, MaxInFlight, TxIdWraps, DisconnectCascades, QueueDepth. StatusSnapshotBuilder populates them. StatusHtmlRenderer adds a column or compact [3/256] indicator per PLC row. The JSON field names land in camelCase per the existing source-gen convention.
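The MaxInFlight counter above is described as a peak tracked "via Interlocked.Max". To my knowledge System.Threading.Interlocked does not expose a Max method, so this sketch assumes the phrase refers to a small compare-exchange helper. PeakCounterSketch and Observe are hypothetical names; the real ProxyCounters field layout may differ.

```csharp
using System.Threading;

internal sealed class PeakCounterSketch
{
    private long _max;

    public long Max => Interlocked.Read(ref _max);

    // Raises the stored peak to candidate if candidate is larger; safe under concurrent callers.
    public void Observe(long candidate)
    {
        long seen = Interlocked.Read(ref _max);
        while (candidate > seen)
        {
            // Only swap if nobody changed _max since we read it; otherwise retry against the newer value.
            long prior = Interlocked.CompareExchange(ref _max, candidate, seen);
            if (prior == seen) return;
            seen = prior;
        }
    }
}
```

A caller would invoke Observe with the correlation map's count right after each successful TryAdd, so the peak reflects the highest number of simultaneously in-flight requests seen so far.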
9.5 Tests + docs
- Unit + integration test suites (see Tests required below).
- docs/design.md updates:
  - Connection model section: rewrite. The diagram changes from "many clients → many backend sockets" to "many clients → one backend socket per PLC, multiplexed by proxy TxId rewriting." The operational consequence warning flips: instead of "5th client fails," it becomes "if backend disconnects, all attached upstream clients are cascaded closed; they reconnect on their own next request."
  - Failure modes section: amend to describe the cascade behaviour.
  - Rewriter section: amend to note the rewriter consumes InFlightRequest for response correlation (no architectural change, just an update to the description of how correlation flows).
- mbproxy/CLAUDE.md Architecture summary: first bullet flips from "1:1 upstream-client ↔ backend-socket" to "single backend socket per PLC, multiplexed via MBAP TxId rewriting."
- docs/kpi.md: the "Tier 2 → Connection-cap saturation warning" KPI loses its meaning (4-client cap no longer relevant on the upstream side). Either remove it or repurpose it to track in-flight saturation against the 16-bit TxId space (which never realistically saturates but is the new equivalent ceiling).
Public surface declared in this phase
All internal sealed — the multiplexer types are not consumed outside the assembly.
namespace Mbproxy.Proxy.Multiplexing;
internal sealed class TxIdAllocator {
public bool TryAllocate(out ushort id);
public void Release(ushort id);
public int InFlightCount { get; }
public long WrapCount { get; }
}
internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId);
internal sealed record InFlightRequest(
byte UnitId, byte Fc,
ushort StartAddress, ushort Qty,
IReadOnlyList<InterestedParty> InterestedParties,
DateTimeOffset SentAtUtc);
// Phase 9: InterestedParties.Count is always 1.
// Phase 10 (read coalescing): the same record fans out to N parties without further refactor.
internal sealed class CorrelationMap {
public bool TryAdd(ushort proxyTxId, InFlightRequest req);
public bool TryRemove(ushort proxyTxId, out InFlightRequest req);
public int Count { get; }
public IReadOnlyCollection<InFlightRequest> Snapshot();
}
internal sealed class UpstreamPipe : IAsyncDisposable {
public Guid Id { get; }
public IPEndPoint RemoteEp { get; }
public DateTimeOffset ConnectedAtUtc { get; }
public long PdusForwardedCount { get; }
public bool IsAlive { get; }
public Task RunReadLoopAsync(Func<byte[], Task> onFrame, CancellationToken ct);
public ValueTask SendResponseAsync(byte[] frame, CancellationToken ct);
public ValueTask DisposeAsync();
}
internal sealed class PlcMultiplexer : IAsyncDisposable {
public void Attach(UpstreamPipe pipe);
public IReadOnlyCollection<UpstreamPipe> AttachedPipes { get; }
public Task RunAsync(CancellationToken ct);
public ValueTask DisposeAsync();
}
PerPlcContext gains a nullable CurrentRequest property. PerPlcContextWithRequest is removed (along with its LastRequest* slots).
Tests required
Unit (Category = Unit)
TxIdAllocatorTests (≥ 8 tests):
- Allocate_FromEmpty_Returns_NextSequential
- Allocate_AfterRelease_Reuses_FreedId
- Allocate_AllocatesEveryUshort_BeforeWrapping
- Allocate_WrapsCorrectly_After0xFFFF
- Allocate_WhenSaturated_ReturnsFalse_DoesNotThrow
- Release_OfNonAllocated_IsNoOp
- Concurrent_AllocateRelease_NoDuplicateIds_Under_Parallel_Stress (100 tasks, 1000 ops each)
- WrapCount_IncrementsOnEachFullWrap
CorrelationMapTests (≥ 5 tests):
- TryAdd_Then_TryRemove_RoundTrips
- TryAdd_DuplicateKey_Fails
- TryRemove_OfMissing_ReturnsFalse
- Snapshot_ReflectsCurrentState
- Concurrent_AddRemove_NoDataLoss_Under_Parallel_Stress
PlcMultiplexerTests (≥ 7 tests, real sockets, no simulator):
- SingleUpstream_RoundTripsFC03_Through_Multiplexer
- SingleUpstream_RoundTripsFC06_Through_Multiplexer
- TwoUpstreams_ConcurrentFC03_BothGetCorrectResponses: proves TxId rewriting works end-to-end against a stub backend
- TwoUpstreams_ProxyTxIds_AreDistinct_OnTheWire: sniff the backend socket; verify per-request TxIds are unique even when upstream TxIds collide
- UpstreamDisconnect_DoesNotAffectOtherUpstreams: drop one client mid-flight; other client's response still arrives
- BackendDisconnect_CascadesToAllUpstreams: kill backend; verify all upstream sockets close within 500 ms, BackendDisconnectCascades increments by N
- BackendReconnect_AfterCascade_NextUpstreamRequest_Succeeds
RewriterCorrelationTests (≥ 4 tests):
- FC03Response_DecodedViaInFlightRequest_NotPerPairSlot
- ConcurrentFC03_FromTwoUpstreams_DecodeCorrectly_NoCrossTalk: set up two InFlightRequests with different start addresses, deliver responses out of order; verify each decodes against its own request
- ConcurrentFC06_FromTwoUpstreams_EncodeCorrectly
- ResponseForDeadUpstream_IsDropped_NoExceptionPropagates
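The no-cross-talk test exists because an FC03/04 response PDU carries only a byte count and register data, never the address that was read, so the decoder has to take the start address from the correlated request. The sketch below illustrates that dependency. It assumes 4-digit packed BCD (0x1234 decodes to 1234) and models the tag map as a plain set of BCD-configured addresses; both are simplifications for illustration, and ResponseCorrelationSketch is a hypothetical name rather than part of BcdPduPipeline.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

static class ResponseCorrelationSketch
{
    // Packed BCD: each nibble holds one decimal digit, so 0x1234 becomes 1234.
    public static ushort DecodeBcd(ushort raw) =>
        (ushort)(((raw >> 12) & 0xF) * 1000 + ((raw >> 8) & 0xF) * 100 +
                 ((raw >> 4) & 0xF) * 10 + (raw & 0xF));

    // pdu is [fc, byteCount, data...]; requestStart comes from the in-flight record, not the response.
    public static void DecodeInPlace(Span<byte> pdu, ushort requestStart, ISet<ushort> bcdAddresses)
    {
        int registers = pdu[1] / 2;
        for (int i = 0; i < registers; i++)
        {
            ushort address = (ushort)(requestStart + i);
            if (!bcdAddresses.Contains(address)) continue; // not a BCD tag: pass through untouched

            Span<byte> slot = pdu.Slice(2 + 2 * i, 2);
            ushort raw = BinaryPrimitives.ReadUInt16BigEndian(slot);
            BinaryPrimitives.WriteUInt16BigEndian(slot, DecodeBcd(raw));
        }
    }
}
```

Feeding the same response bytes through with two different requestStart values gives different results whenever only one of the two addresses is BCD-configured, which is exactly the failure a shared last-request slot would produce once responses from different clients interleave.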
Integration (Category = Unit, no simulator)
These use real TcpListener + Socket against a stub backend (a TcpListener that just echoes or canned-responds). They live in PlcMultiplexerTests.
E2E (Category = E2E)
MultiplexerE2ETests (≥ 5 tests, against pymodbus simulator):
- E2E_FiveConcurrentClients_AllReadHR1072_AllGetDecoded_1234: the headline test. Five NModbus clients connected to the proxy in parallel; pymodbus sim has the BCD register at 1072. All five get 1234. With Phase 08's 1:1 model, the 5th client would fail at backend connect.
- E2E_TwentyConcurrent_FC03_Requests_AcrossThreeClients_AllSucceed
- E2E_BackendDisconnect_DuringInflight_CascadesUpstream_AndRecovers: kill the sim mid-flight (simulate by closing on its side); verify upstream clients see clean socket close; relaunch sim; new upstream connection succeeds.
- E2E_RewriterStillWorks_UnderMultiplexedThreeClients: three clients each writing different decimal values to different BCD-configured addresses via FC06; verify sim's register state.
- E2E_StatusPage_Shows_InFlightAndMaxInFlight: drive 4 concurrent reads, verify /status.json reports inFlight >= 1 during the burst and maxInFlight >= 4.
Phase gate
- dotnet build Mbproxy.slnx -c Debug: zero warnings, zero errors.
- All 271 prior tests still green. Specifically: Forward_FC03_HR1072_Returns_Decoded_1234, Forward_FC06_WriteHR200_ThenReadBack_RoundTrips, MbapTxId_IsPreservedEndToEnd, and MbapTxId_StillPreserved_AfterRewriting_20Consecutive continue to pass against the multiplexed implementation. The MBAP-TxId-preserved tests are the critical regression guard: if multiplexing leaks proxy TxIds back to the client, these fail.
- All new unit tests pass (≥ 24 new in slices 9.1-9.2 alone).
- All new E2E tests pass (≥ 5).
- Forward_FC03_HR1072_Returns_Decoded_1234 PASSES with 5 concurrent NModbus clients connected to the same proxy port. This is THE phase test.
- PlcConnectionPair.cs is gone. Grep for the type name across the solution returns zero hits.
- PerPlcContextWithRequest is gone. Grep returns zero hits.
- docs/design.md "Connection model" section is rewritten; the 1:1 model description is gone or moved into a "Historical: pre-Phase-09 model" footnote.
- mbproxy/CLAUDE.md Architecture summary's connection-model bullet is updated.
- Backend disconnect with N upstream clients in-flight: all N close within 500 ms; counter BackendDisconnectCascades += N.
- mbproxy.multiplex.saturated Error event fires if TxId allocator hits 65,536 in-flight. (Stress-test acceptable; manufacture by holding 65,536 pending responses against a stub backend.)
- Shutdown semantics still work: ShutdownCoordinator drains in-flight requests (now visible via InFlightCount, not IsProcessing).
- Status page renders the new fields; HTML page weight remains under 50 KB for 54 PLCs.
- CounterSnapshot's existing field set is preserved: only added fields, no renames or removals. Backwards-compat per the policy in docs/kpi.md.
Out of scope
- Foundation for future caching, not caching itself. This phase establishes the chokepoint where any future caching or coalescing layer plugs in, but implements no caching of any kind. InFlightRequest.InterestedParties is shaped as a list specifically to make Phase 10 (read coalescing) additive without refactor; do not infer caching behavior from the list shape alone. Tier C-2 (short-TTL response cache) and Tier C-3 (periodic poll + cache) remain explicitly out of scope until their own design discussions and design.md updates land.
- Per-tag read coalescing: if two clients read the same register at the same time, Phase 9's multiplexer sends both requests. Coalescing them into one backend round-trip is the explicit goal of Phase 10, which plugs into the InterestedParties seam created here.
- Backend keepalive / heartbeat: the design's current "no keepalive" position stands. An idle backend with no upstream activity will die after middlebox timeouts; the next upstream request triggers a fresh connect via Polly. Multiplexing doesn't change this.
- TxId fairness scheduling: FIFO order in the _outboundChannel is the contract. No round-robin per upstream, no priority. If a single upstream client floods the channel, others queue behind. This is a stated trade-off and matches the ECOM's internal serialization anyway.
- Pipelined multi-PDU-in-flight per single upstream client: still unsupported. One in-flight request per upstream pipe at a time. Multiplexing across DIFFERENT upstream clients works fully; multiplexing across multiple in-flight requests from the SAME upstream client does not. Document the constraint.
- Linux / cross-platform packaging: still Windows Service only.
Subagent briefing
If you're the agent picking up this phase, here's the executive summary you need in your head:
- You are deleting PlcConnectionPair. Everything that file did is now split between UpstreamPipe (the per-client half) and PlcMultiplexer (the per-PLC half). Read PlcConnectionPair.cs once before you delete it; every behavior in there has a destination in one of the two new classes.
- Single-writer / single-reader on the backend socket. Two tasks share the backend socket: one writes (drained from _outboundChannel), one reads (decodes MBAP frames). No third task touches the socket. This invariant is what makes the channel + dictionary design correct without locks.
- The rewriter doesn't know about MBAP framing or correlation. It still receives (direction, mbapHeader span, pdu span, PerPlcContext ctx). The only addition is ctx.CurrentRequest (nullable, non-null on response). The rewriter is otherwise unchanged. Resist refactoring it.
- InFlightRequest.SentAtUtc powers lastRoundTripMs correctly across multiplexed clients. Today's EWMA is per-pair; under multiplexing, the timestamp moves to per-request. The status counter stays the same.
- Cascade-on-backend-disconnect is the most subtle behavior. Get the test for it right early (BackendDisconnect_CascadesToAllUpstreams). It's the difference between "graceful failure" and "leaked upstream sockets that hold connections open until OS timeout."
- TxId allocator saturation is a real-world impossibility but a stress-test reality. Hold 65,536 responses in a stub backend; the allocator must refuse the 65,537th cleanly with an exception response code 04, not crash.
- Update the docs in the SAME PR as the code. design.md Connection model, mbproxy/CLAUDE.md Architecture summary, and docs/kpi.md connection-cap KPI either get rewritten or removed. Doc drift is a gate fail.
- Do NOT introduce parallel agents within this phase. The cross-cut is too broad. If you have spare agent budget, slice 9.1 (data types + their unit tests) can run alongside slice 9.5 (E2E test scaffolding written against the unchanged outer-shape contract), but the middle slices are sequential.
- The 4 critical regression tests that must stay green:
  - Forward_FC03_HR1072_Returns_Decoded_1234
  - Forward_FC06_WriteHR200_ThenReadBack_RoundTrips
  - Forward_FC16_WriteMultipleHR201_203_ThenReadBack_RoundTrips
  - MbapTxId_IsPreservedEndToEnd ← THIS is the one that proves multiplexing is transparent.
- When in doubt, re-read BcdPduPipeline.ProcessResponse. The FC03/04 correlation logic there is the most subtle existing code that you're touching. Walk through it with one upstream client in mind first, then mentally replay with two; both must work without code change to the pipeline (only the way PerPlcContext.CurrentRequest gets populated changes).
Cross-references
- Today's 1:1 model: ../design.md → "Connection model" (will be rewritten by this phase).
- DL260 4-client cap source: ../../DL260/dl205.md → "Behavioral Oddities".
- Existing rewriter request→response correlation: src/Mbproxy/Proxy/BcdPduPipeline.cs ProcessResponse (lines reading PerPlcContextWithRequest.LastRequest*).
- Polly pipelines this phase reuses without modification: src/Mbproxy/Proxy/Supervision/PolicyFactory.cs.
- Counter-snapshot backwards-compat policy: ../kpi.md → "Backwards-compat policy".