Adds 11 topic-focused docs under docs/{Architecture,Features,Operations,Reference,Testing}/
and links them from README.md's new "Detailed documentation" section. Existing
top-level docs (design.md, kpi.md, operations.md) remain as canonical landings.
Architecture/
- Overview.md (150 lines) — listener topology, request flow, per-PLC isolation
- ConnectionModel.md (247 lines) — TxId multiplexer, watchdog, disconnect cascade
- ReadCoalescing.md (243 lines) — in-flight FC03/04 dedup via InFlightByKeyMap
- ResponseCache.md (398 lines) — opt-in per-tag TTL cache + range-overlap invalidation
Features/
- BcdRewriting.md (252 lines) — codec, CDAB, FC scope, partial-overlap policy
- HotReload.md (189 lines) — IOptionsMonitor + per-change-kind reconcile rules
Operations/
- Configuration.md (422 lines) — every Mbproxy:* option + validation rules
- StatusPage.md (334 lines) — admin endpoint surface, every JSON field
- Troubleshooting.md (364 lines) — diagnosis playbook keyed to log events
Reference/
- LogEvents.md (499 lines) — 28 events across 7 categories, grep-verified
Testing/
- Simulator.md (235 lines) — pymodbus fixture, skip policy, 3.13 framer quirk
Each doc was written by a dedicated agent against the StyleGuide.md rules with
a per-doc phase gate (PascalCase filename, H1 Title Case, code-fence language
tags, Related Documentation section with >=3 relative links, real type names
verified against src/). Cross-references between docs use relative paths;
all 18 README->docs links and all sibling links resolve.
Known follow-up: docs/design.md lines 215-251 are stale on two log-event
property templates (config.reload.applied and config.reload.rejected) and
mention LogContext.PushProperty scoping that isn't actually used. Reference/
LogEvents.md is now the authoritative event catalog and source-of-truth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Connection Model
The proxy holds one persistent backend TCP socket per PLC and multiplexes many upstream client connections onto it by rewriting the MBAP transaction ID on every request and restoring each client's original TxId on the matching response.
Why One Backend Connection Per PLC
An earlier design opened a fresh backend socket for each accepted upstream client (1:1 pairs). That model collapsed against the AutomationDirect H2-ECOM100, which caps simultaneous TCP clients at 4 per PLC (see ../../DL260/dl205.md under "Behavioural Oddities"). The fifth upstream client to attach to a busy PLC was refused at connect, with no recourse other than waiting for an existing pair to drop.
Multiplexing replaces 1:1 upstream-to-backend pairs with N:1 upstream-to-multiplexer-to-backend:
- The proxy occupies exactly one of the ECOM's 4 TCP client slots per PLC, regardless of how many upstream clients are attached.
- Upstream-side concurrency is no longer bounded by the controller's accept queue.
- Serialisation shifts from the PLC accept queue to the proxy's outbound channel (_outboundChannel in PlcMultiplexer).
The honest trade-off: the wire-rate ceiling does not change. The ECOM serialises requests internally at roughly 2–10 ms per scan, so the multiplexer cannot deliver more PDUs per second to one PLC than the 1:1 model could. What multiplexing buys is connection headroom, plus the data structures that read coalescing and the response cache hook into.
Why TxId rewriting rather than connection pooling
The MBAP transaction ID is a 16-bit field at bytes 0–1 of every Modbus TCP frame, and the Modbus TCP specification explicitly permits clients to pipeline requests under different TxIds on a single connection. The PLC echoes each request's TxId on the matching response. The multiplexer exploits that contract: by allocating a proxy-side TxId per request and substituting it for the upstream client's TxId on the wire, many upstream clients can have concurrent requests outstanding on one backend socket without their MBAP frames ever colliding. A connection pool, by contrast, would still need either one backend socket per concurrent request (defeating the ECOM cap workaround) or a serialisation lock on each pooled socket (defeating concurrency).
Components
The load-bearing types all live in ../../src/Mbproxy/Proxy/Multiplexing/.
Type roster
| Type | File | Role |
|---|---|---|
| PlcMultiplexer | PlcMultiplexer.cs | Owns the backend socket, the outbound channel, the backend writer and reader tasks, the per-request timeout watchdog, and the set of attached upstream pipes. One instance per PLC. |
| UpstreamPipe | UpstreamPipe.cs | Per-upstream-client wrapper around an accepted Socket. Owns a read task that drives PlcMultiplexer.OnUpstreamFrameAsync, plus a write task that drains a bounded _responseChannel (capacity 16) onto the socket. |
| TxIdAllocator | TxIdAllocator.cs | Proxy-side 16-bit TxId allocator. Backed by a bool[65536] plus a rolling _next cursor under a single lock. Exposes TryAllocate, Release, InFlightCount, and WrapCount. |
| CorrelationMap | CorrelationMap.cs | ConcurrentDictionary<ushort, InFlightRequest> mapping proxy TxId to its in-flight record. Exposes TryAdd, TryRemove, DrainAll, and SnapshotOlderThan. |
| InFlightRequest | InFlightRequest.cs | Record carrying UnitId, Fc, StartAddress, Qty, IReadOnlyList<InterestedParty> InterestedParties, SentAtUtc, and ResolvedCacheTtlMs. |
| InterestedParty | InFlightRequest.cs | Record (UpstreamPipe Pipe, ushort OriginalTxId) identifying who receives the response and which TxId to restore. |
Threading invariants
The multiplexer relies on a handful of single-owner rules that keep the wire-touching code lock-free:
- One backend writer. Only RunBackendWriterAsync calls backend.SendAsync. The single-writer drain of _outboundChannel.Reader.ReadAllAsync means no socket-level send lock is needed.
- One backend reader. Only RunBackendReaderAsync calls backend.ReceiveAsync. The reader is the sole producer of CorrelationMap.TryRemove for the response path.
- Per-pipe write loop. Each UpstreamPipe has exactly one task that drains _responseChannel to its upstream socket. The multiplexer fan-out path only enqueues; it never writes to the socket directly.
- Per-pipe read loop. A single read task per pipe parses MBAP frames and calls OnUpstreamFrameAsync sequentially. A single upstream client therefore cannot multi-PDU-pipeline itself; concurrency comes from having many pipes.
TxIdAllocator holds an internal lock for TryAllocate / Release. Contention is low in practice — one PLC's wire rate is bounded by the ECOM scan time — and the lock is preferred over a lock-free approach so the saturation, cascade, and Polly-retry paths remain deterministic.
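The allocator shape described above (a bool[65536] table, a rolling cursor, one lock) can be sketched as follows. This is a minimal illustration and not the project's TxIdAllocator source: the class name TxIdAllocatorSketch, the private field names, and the choice to treat a double Release as a no-op are all assumptions made for the example.

```csharp
using System;

// Hypothetical sketch of a lock-guarded 16-bit TxId allocator.
public sealed class TxIdAllocatorSketch
{
    private readonly bool[] _inUse = new bool[65536]; // one flag per possible TxId
    private readonly object _gate = new object();
    private int _next;      // rolling cursor: next candidate TxId
    private int _inFlight;  // number of flags currently set
    private long _wraps;    // times the cursor rolled 0xFFFF -> 0x0000

    public int InFlightCount { get { lock (_gate) return _inFlight; } }
    public long WrapCount { get { lock (_gate) return _wraps; } }

    public bool TryAllocate(out ushort txId)
    {
        lock (_gate)
        {
            // Saturated: every slot is in flight, so the scan below would never end.
            if (_inFlight == _inUse.Length) { txId = 0; return false; }

            while (_inUse[_next]) Advance(); // skip ids that are still in flight
            txId = (ushort)_next;
            _inUse[_next] = true;
            _inFlight++;
            Advance();
            return true;
        }
    }

    public void Release(ushort txId)
    {
        lock (_gate)
        {
            if (!_inUse[txId]) return; // assumption: a double release is ignored
            _inUse[txId] = false;
            _inFlight--;
        }
    }

    private void Advance()
    {
        _next++;
        if (_next == _inUse.Length) { _next = 0; _wraps++; }
    }
}
```

Because the cursor only moves forward, a released TxId is not handed out again until the cursor comes back around, which keeps a late response for an old request from matching a freshly issued id.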
Why ConcurrentDictionary for the correlation map
CorrelationMap is backed by ConcurrentDictionary<ushort, InFlightRequest> even though the request-side adds and the response-side removes are nominally single-threaded each. Three independent paths can remove an entry concurrently with each other: the backend reader on a normal response, the watchdog on a timeout, and the cascade walker on a backend disconnect. Two adders (the coalescing path's factory and the non-coalescing fast path) can also race against a removal if the backend response arrives mid-add. The ConcurrentDictionary makes those TryAdd/TryRemove calls atomic, which is what the "claim then dispatch" pattern in the watchdog and reader relies on for correctness.
Upstream To Multiplexer Path
PlcListener accepts an upstream Socket and constructs an UpstreamPipe around it. PlcMultiplexer.StartPipeAsync attaches the pipe, spins up its write loop, and invokes RunReadLoopAsync with OnUpstreamFrameAsync as the per-frame callback. When the read loop returns (clean upstream EOF, socket fault, or cascade), a ContinueWith removes the pipe from _pipes; disposal itself is owned by the listener.
Frame parsing
The pipe's read loop reads a 7-byte MBAP header into a stack-buffered array, parses the Length field, allocates a fresh byte[] sized to header + (Length − 1) bytes, fills the PDU body, and hands the complete frame to the callback. Frames whose length field claims a body larger than MbapFrame.MaxPduBodySize are treated as a protocol error and close the upstream pipe; a zero-body length is permitted (the header alone is forwarded). The buffer ownership transfers to the multiplexer with each call so the multiplexer can store it in the CorrelationMap entry without coordinating buffer lifetimes back to the pipe.
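The header-then-body read can be illustrated with a small synchronous sketch over a Stream. The real read loop is asynchronous and socket-based; the names MbapFrameSketch and TryReadFrame are hypothetical, and the MaxPduBodySize value of 253 is an assumption taken from the Modbus maximum PDU size rather than from the project's MbapFrame type.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical sketch of MBAP frame parsing; not the project's UpstreamPipe code.
public static class MbapFrameSketch
{
    public const int HeaderSize = 7;       // TxId(2) + ProtocolId(2) + Length(2) + UnitId(1)
    public const int MaxPduBodySize = 253; // assumption: Modbus maximum PDU size

    // Returns false on EOF before a full header; throws on a malformed frame.
    public static bool TryReadFrame(Stream stream, out byte[] frame)
    {
        frame = Array.Empty<byte>();
        Span<byte> header = stackalloc byte[HeaderSize];
        if (!TryFill(stream, header)) return false;

        int length = BinaryPrimitives.ReadUInt16BigEndian(header.Slice(4, 2));
        int bodySize = length - 1; // Length counts UnitId (already read) plus the PDU
        if (bodySize < 0 || bodySize > MaxPduBodySize)
            throw new InvalidDataException($"MBAP length {length} is out of range");

        // Fresh buffer per frame so ownership can pass to the caller.
        var buffer = new byte[HeaderSize + bodySize];
        header.CopyTo(buffer);
        if (!TryFill(stream, buffer.AsSpan(HeaderSize)))
            throw new EndOfStreamException("stream ended mid-frame");

        frame = buffer;
        return true;
    }

    // Loops until the span is full because a single Read may return fewer bytes.
    private static bool TryFill(Stream stream, Span<byte> target)
    {
        while (target.Length > 0)
        {
            int n = stream.Read(target);
            if (n == 0) return false;
            target = target.Slice(n);
        }
        return true;
    }
}
```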
Each call to OnUpstreamFrameAsync:
- Parses the MBAP header to extract the upstream client's originalTxId and the unitId.
- For FC03, FC04, FC06, and FC16 it also pulls startAddress and qty out of the PDU; these feed the cache, the read-coalescing key, and the response BCD rewriter.
- (Response cache, FC03/FC04 only) checks _ctx.Cache via a CacheKey. A hit short-circuits the entire path, including the backend connect attempt, and returns a synthesised frame.
- Calls EnsureBackendConnectedAsync, which lazily brings up the backend socket through a Polly retry pipeline driven by Connection.BackendConnectTimeoutMs.
- (Read coalescing, FC03/FC04 only, when enabled) consults InFlightByKeyMap to either attach to an existing peer in flight or open a new entry.
- On a coalescing miss or any non-coalescing FC: calls TxIdAllocator.TryAllocate(out ushort proxyTxId). Saturation returns false and the client receives a Modbus exception code 4 (Slave Device Failure).
- Builds an InFlightRequest, registers it in CorrelationMap.TryAdd(proxyTxId, ...), and observes the new peak via ObserveInFlight.
- Runs the BCD rewriter over the request payload through _pipeline.Process(MbapDirection.RequestToBackend, ...).
- Overwrites bytes 0 and 1 of the MBAP header with the big-endian encoding of proxyTxId.
- Enqueues the frame onto _outboundChannel via _outboundChannel.Writer.WriteAsync. The channel is bounded at 256 with BoundedChannelFullMode.Wait, so a saturated outbound queue backpressures the upstream rather than dropping frames.
```csharp
// Sketch of the proxy-TxId rewrite (PlcMultiplexer.OnUpstreamFrameAsync):
if (!_allocator.TryAllocate(out ushort proxyTxIdFc)) { /* send exception 4 to the client and return */ }
_correlation.TryAdd(proxyTxIdFc, inFlightNc);
_pipeline.Process(MbapDirection.RequestToBackend, header, body, requestCtxNc);
frame[0] = (byte)(proxyTxIdFc >> 8);   // MBAP TxId high byte (big-endian)
frame[1] = (byte)(proxyTxIdFc & 0xFF); // MBAP TxId low byte
await _outboundChannel.Writer.WriteAsync(frame, ct).ConfigureAwait(false);
```
After enqueuing, the upstream read loop is free to read the next frame. There is no per-pipe in-flight gate beyond what the upstream client itself imposes by reading from a single TCP stream.
Saturation handling
TxIdAllocator.TryAllocate returns false only when all 65,536 slots are simultaneously in flight against one PLC. In that state OnUpstreamFrameAsync calls BuildExceptionFrame(originalTxId, unitId, fcByte, exceptionCode: 4) and enqueues the frame straight onto the requesting pipe's response channel — the upstream client sees a clean Modbus exception code 4 (Slave Device Failure) rather than a hung socket. The same path emits MultiplexerLogEvents.Saturated with the remote endpoint string for operator triage.
Lazy backend connect
The backend socket starts offline. EnsureBackendConnectedAsync runs under a SemaphoreSlim named _connectGate so concurrent upstream frames during a cold start serialise their connect attempts. The first caller through the gate builds a fresh Socket, sets NoDelay = true, and runs ConnectAsync under either the supplied _backendConnectPipeline (Polly resilience pipeline) or a plain CancellationToken linked to Connection.BackendConnectTimeoutMs. On failure it logs MultiplexerLogEvents.BackendFailed, increments the per-PLC connectsFailed counter, and returns false; the upstream pipe is disposed by the caller. On success it spawns the backend writer and reader tasks under a fresh CancellationTokenSource linked to _disposeCts, increments connectsSuccess, and logs MultiplexerLogEvents.BackendConnected.
A double-checked fast path before the gate avoids the semaphore acquire on the happy path: the moment _backendSocket is { Connected: true } and _backendCts is { IsCancellationRequested: false }, EnsureBackendConnectedAsync returns immediately without taking the lock. The lazy-connect contract therefore costs one volatile read per request after the first successful connect.
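The double-checked pattern around the connect gate can be sketched in isolation. This is an illustrative reduction, not the project's EnsureBackendConnectedAsync: the class LazyConnectSketch, the injected connect delegate, and the single volatile flag (standing in for the socket and CTS checks) are assumptions, and the Polly pipeline, logging, and counters are omitted.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a lazily connected resource behind a SemaphoreSlim gate.
public sealed class LazyConnectSketch
{
    private readonly SemaphoreSlim _connectGate = new SemaphoreSlim(1, 1);
    private readonly Func<CancellationToken, Task<bool>> _connect;
    private volatile bool _connected;
    private int _attempts;

    public LazyConnectSketch(Func<CancellationToken, Task<bool>> connect)
    {
        _connect = connect;
    }

    public int Attempts => Volatile.Read(ref _attempts);

    public async Task<bool> EnsureConnectedAsync(CancellationToken ct)
    {
        if (_connected) return true; // fast path: no semaphore acquire

        await _connectGate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // Second check: another caller may have connected while we waited.
            if (_connected) return true;

            Interlocked.Increment(ref _attempts);
            _connected = await _connect(ct).ConfigureAwait(false);
            return _connected;
        }
        finally
        {
            _connectGate.Release();
        }
    }
}
```

With ten concurrent callers during a cold start, only the first one through the gate runs the connect delegate; the other nine see the flag on their second check and return without connecting again.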
Multiplexer To Backend Path
The backend side is two tasks plus one bounded channel. EnsureBackendConnectedAsync launches two tasks against the backend socket on first connect, both under a single _backendCts:
- RunBackendWriterAsync: single consumer of _outboundChannel.Reader.ReadAllAsync. Writes every frame to the backend socket via SendAsync with a loop to handle short writes. Single-writer means no socket-level lock is needed for sends.
- RunBackendReaderAsync: single producer reading frames off the backend socket. For each frame:
  - Parses the MBAP header to extract proxyTxId and length.
  - Reads the PDU body into a fresh byte[].
  - Calls CorrelationMap.TryRemove(proxyTxId, out var inFlight). A miss (no entry) drops the frame silently; this is usually a stale response after a cascade.
  - Frees the allocator slot via _allocator.Release(proxyTxId).
  - Updates the per-PLC EWMA round-trip via UpdateRoundTripEwma using inFlight.SentAtUtc.
  - Runs the response-side BCD rewriter through _pipeline.Process(MbapDirection.ResponseToClient, ...). The rewriter needs inFlight.StartAddress and inFlight.Qty because the FC03/FC04 response PDU does not echo the read range.
  - (Cache write-through, post-rewriter) on a non-exception response, stores FC03/FC04 entries in _ctx.Cache or invalidates overlapping entries on FC06/FC16.
  - Walks inFlight.InterestedParties. For each party with a live pipe, copies the frame, restores party.OriginalTxId into header bytes 0–1, and calls party.Pipe.SendResponseAsync to enqueue the frame onto that pipe's response channel.
Single-reader on the backend socket plus per-pipe response channels means every cross-task hand-off goes through a Channel<byte[]> — no locks on the wire-touching code paths.
Frame fan-out
When inFlight.InterestedParties.Count == 1 — the common non-coalesced case — the reader optimises by passing the original frame buffer through to SendResponseAsync without copying. When the list has more than one party (a coalesced FC03/FC04 read), the reader clones the frame for each party before patching in its OriginalTxId, so each pipe's response channel owns an independent buffer.
A party whose pipe reports IsAlive == false is skipped. For multi-party FC03/FC04 frames the skip path also increments the per-PLC coalescedResponseToDeadUpstream counter and logs CoalescingLogEvents.DeadUpstream, so operators can correlate cascade-mid-flight events with which reads were affected.
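The clone-or-reuse rule for fan-out can be sketched with plain byte arrays. This is a simplified illustration: FanOutSketch is a hypothetical helper, parties are reduced to their original TxIds, and the liveness check, counters, and response channels are left out.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical sketch of response fan-out; not the project's reader loop.
public static class FanOutSketch
{
    // Returns one frame per party, each carrying that party's original TxId.
    public static List<byte[]> FanOut(byte[] frame, IReadOnlyList<ushort> originalTxIds)
    {
        var result = new List<byte[]>(originalTxIds.Count);
        foreach (ushort originalTxId in originalTxIds)
        {
            // One party: hand over the backend buffer as-is.
            // Several parties: clone so each response channel owns its own buffer.
            byte[] copy = originalTxIds.Count == 1 ? frame : (byte[])frame.Clone();

            // Restore the client's TxId into MBAP bytes 0-1 (big-endian).
            BinaryPrimitives.WriteUInt16BigEndian(copy.AsSpan(0, 2), originalTxId);
            result.Add(copy);
        }
        return result;
    }
}
```

Cloning before patching matters in the multi-party case: patching a shared buffer in place would let the last party's TxId overwrite the others before their write loops had sent the frame.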
Per-Request Timeout Watchdog
RunRequestTimeoutWatchdogAsync is launched from the multiplexer constructor and runs for the lifetime of the multiplexer. It ticks every BackendRequestTimeoutMs / 4, floored at 100 ms, and on each tick calls CorrelationMap.SnapshotOlderThan(DateTimeOffset.UtcNow.AddMilliseconds(-BackendRequestTimeoutMs)).
For each stale entry the watchdog:
- Tries to claim the entry via _correlation.TryRemove(proxyTxId, out var req). A failed claim means a response, cascade, or another watchdog tick already removed it, so the entry is skipped.
- Releases the proxy TxId via _allocator.Release(proxyTxId).
- For FC03/FC04, also removes the matching CoalescingKey from InFlightByKeyMap so a brand-new identical request opens a fresh round-trip rather than attaching to a corpse.
- Walks req.InterestedParties and, for each live pipe, delivers a synthesised Modbus exception frame with function code req.Fc | 0x80 and exception code 0x0B (Gateway Target Device Failed To Respond), with the party's OriginalTxId patched back into the MBAP header.
The watchdog exists because the multiplexed model has no per-pair fault-on-timeout backstop. In the 1:1 model, a lost response simply sat on a dead socket that the upstream eventually closed; in the multiplexed model, a single missing or mis-echoed response would leak its CorrelationMap entry forever and hang every upstream party attached to it. Specific failure modes the watchdog covers:
- The PLC drops a response (busy controller, scan-time excursion).
- A middlebox drops a packet on a long-idle backend socket.
- A backend mis-echoes the MBAP TxId — including pymodbus 3.13.0's deferred-handler bug noted below.
Why claim then release
The watchdog reads the stale set via SnapshotOlderThan (a non-removing scan) and only then competes for each entry via TryRemove. The two-step is deliberate: a response arriving between the snapshot and the claim wins the TryRemove race and the watchdog skips that entry. Without the claim race, the upstream party could receive both a real response and a 0x0B exception for the same request, which would corrupt clients that expect responses in TxId order.
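The snapshot-then-claim sequence can be sketched with a ConcurrentDictionary that stores only the send timestamp. This is a reduced illustration: CorrelationSketch, WatchdogSketch, and TryClaim are hypothetical names, and the real map stores InFlightRequest records rather than bare timestamps.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical, reduced stand-in for the correlation map.
public sealed class CorrelationSketch
{
    private readonly ConcurrentDictionary<ushort, DateTimeOffset> _sentAt =
        new ConcurrentDictionary<ushort, DateTimeOffset>();

    public bool TryAdd(ushort txId, DateTimeOffset sentAtUtc) => _sentAt.TryAdd(txId, sentAtUtc);

    // Atomic claim: of all racing callers, exactly one gets true for a given TxId.
    public bool TryClaim(ushort txId) => _sentAt.TryRemove(txId, out _);

    // Non-removing scan; entries may disappear between this call and the claim.
    public List<ushort> SnapshotOlderThan(DateTimeOffset cutoff)
    {
        var stale = new List<ushort>();
        foreach (var pair in _sentAt)
            if (pair.Value < cutoff) stale.Add(pair.Key);
        return stale;
    }
}

public static class WatchdogSketch
{
    // Returns the TxIds the watchdog won and must answer with exception 0x0B.
    public static List<ushort> Sweep(CorrelationSketch map, DateTimeOffset now, TimeSpan timeout)
    {
        var timedOut = new List<ushort>();
        foreach (ushort txId in map.SnapshotOlderThan(now - timeout))
        {
            // A response that arrived after the snapshot has already claimed the entry.
            if (map.TryClaim(txId)) timedOut.Add(txId);
        }
        return timedOut;
    }
}
```

The reader path would call the same TryClaim before dispatching a real response, so whichever side removes the entry first is the only one that answers the upstream party.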
Tick cadence
The 100 ms floor on tickMs keeps the watchdog from busy-waking when an operator configures BackendRequestTimeoutMs below 400 ms. With the production default of 3000 ms the watchdog ticks every 750 ms, which keeps timeout dispatch latency well under one second past the threshold.
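The cadence rule is small enough to state as one expression. The helper below is a hypothetical illustration of the quarter-of-timeout rule with its 100 ms floor, not the project's code.

```csharp
using System;

// Hypothetical helper showing the tick interval arithmetic.
public static class WatchdogCadenceSketch
{
    public const int FloorMs = 100;

    // A quarter of the request timeout, never below the floor.
    public static int TickMs(int backendRequestTimeoutMs) =>
        Math.Max(FloorMs, backendRequestTimeoutMs / 4);
}
```

With the 3000 ms default this gives 750 ms; any timeout of 400 ms or less is clamped to the 100 ms floor.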
Exception frame shape
BuildExceptionFrame produces a 9-byte synthetic response: 7-byte MBAP header plus a 2-byte exception PDU. The function code byte is OR'd with 0x80 to flag the response as an exception, and the second PDU byte carries the exception code (0x04 for allocator saturation, 0x0B for the watchdog). The Length field in the MBAP header is set to 3 (UnitId + exception FC + exception code) and the ProtocolId is zero per the Modbus TCP spec. Clients written against a real DL260 see exactly the same frame layout a controller would emit, so client libraries surface a normal ModbusException rather than a transport error.
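The 9-byte layout can be sketched directly from the field description above. The class name ExceptionFrameSketch and the parameter names are assumptions; the byte positions follow the MBAP header layout.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch of a synthetic Modbus TCP exception response.
public static class ExceptionFrameSketch
{
    public static byte[] Build(ushort txId, byte unitId, byte functionCode, byte exceptionCode)
    {
        var frame = new byte[9];                                       // 7-byte MBAP + 2-byte PDU
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(0, 2), txId);
        // Bytes 2-3 (ProtocolId) stay zero.
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(4, 2), 3); // UnitId + FC + code
        frame[6] = unitId;
        frame[7] = (byte)(functionCode | 0x80);                        // exception flag
        frame[8] = exceptionCode;                                      // e.g. 0x04 or 0x0B
        return frame;
    }
}
```

For example, a timed-out FC03 read from unit 1 with original TxId 0x1234 would be encoded as 12 34 00 00 00 03 01 83 0B.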
Backend Disconnect Cascade
When the backend socket dies — reader EOF, writer fault, PLC reboot, network partition, or middlebox idle drop — TearDownBackendAsync(reason, cascadeUpstreams: true) runs:
- Cancels _backendCts, which terminates both backend tasks.
- Shuts down and disposes the backend Socket.
- Calls CorrelationMap.DrainAll, releases every allocator slot, and collects every InterestedParty's pipe ID.
- Calls InFlightByKeyMap.DrainAll so stale coalescing entries cannot outlive the backend they were aimed at.
- Disposes every attached UpstreamPipe and clears _pipes.
- Increments BackendDisconnectCascades on the per-PLC counters by the number of upstream pipes that were attached (AddDisconnectCascades(upstreamCount)).
- Logs a MultiplexerLogEvents.BackendDisconnected event with the upstream count, drained correlation count, and a reason string.
The rationale: a backend disconnect invalidates every in-flight response, and there is no clean way to mid-flight-rebind upstream clients to a fresh backend socket without risking silent data loss. Cascading the disconnect upstream is loud (clients re-issue immediately) but unambiguous — every upstream sees its socket close, no zombie upstream sockets hold stale state. The next upstream frame after the cascade triggers a fresh Polly-driven backend connect.
Failure detection paths
Three independent paths can initiate a cascade:
- Reader EOF. RunBackendReaderAsync sees a clean zero-byte read from ReceiveAsync and falls out of the loop. It calls TearDownBackendAsync("backend reader EOF", cascadeUpstreams: true) as a fire-and-forget task.
- Reader fault or writer fault. Either backend task catches a non-cancellation exception and calls TearDownBackendAsync($"reader fault: {ex.Message}", ...) or the equivalent writer-fault path.
- Watchdog-driven indirect failure. A backend that mis-echoes TxIds will not itself fault the socket; the watchdog eventually times out the leaked correlation entries and delivers 0x0B exceptions. The socket stays up unless the backend then also stops responding to subsequent requests.
TearDownBackendAsync is idempotent against itself — the lock (_backendLock) block atomically swaps the live socket and task references to null, so a second invocation sees oldSocket is null && oldCts is null and returns without re-cascading.
Why every attached upstream cascades
An earlier sketch cascaded only upstream pipes that had a request in flight at the moment of disconnect. The current implementation cascades every attached pipe, in flight or idle. The reason: an idle upstream pipe is one that the proxy has been quietly answering from cache or that has simply not sent a request recently. After a backend disconnect, the proxy has no way to prove the PLC's state still matches what those idle clients last saw — a PLC reboot, ladder edit, or operator write between the disconnect and reconnect can have moved the values out from under them. Closing every upstream socket is the unambiguous signal that "the link to the device was lost; rebuild your state from scratch." Clients reconnect on their own next request.
Connect-on-next-frame, not eager reconnect
The cascade tears down the backend without scheduling a reconnect. The next upstream frame that arrives invokes EnsureBackendConnectedAsync, which constructs a fresh socket and runs the Polly connect pipeline. The rationale is that an eager reconnect spinner would hammer a downed PLC at the configured backoff schedule even when no clients are attached; gating reconnect on client demand avoids waste during long PLC outages without sacrificing recovery latency once clients return.
Wire-Rate Considerations
The multiplexer is not a throughput multiplier. The ECOM serialises every request it receives on its single internal scan, so PDUs-per-second to one PLC is bounded by roughly 1000 / ecom_scan_ms (about 100 to 500 per second at scan times of 2–10 ms) regardless of how many upstream clients the proxy fans in. What changes:
- Connection count. Upstream-side connection count is now limited by the OS socket budget and OutboundChannelCapacity (256), not by the ECOM's 4-client cap.
- Coalescing opportunity. Identical concurrent FC03/FC04 reads attach to the same InFlightRequest via InFlightByKeyMap, so the proxy issues one backend round-trip and fans the response out to all attached parties (see ./ReadCoalescing.md).
- Cache short-circuit. FC03/FC04 reads with a resolved per-tag TTL never reach the wire while the cached PDU is fresh (see ./ResponseCache.md).
The proxy can hand more concurrent upstream clients a result on a hot tag than the bare PLC can serve simultaneously. It cannot let those clients hammer the PLC harder than the PLC's scan time allows.
Counters exposed by the status page
PlcMultiplexer implements IMultiplexCountersProvider and registers itself with the per-PLC counters object during construction. The status page reads these values per snapshot:
| Counter | Source | Meaning |
|---|---|---|
| inFlight | TxIdAllocator.InFlightCount | Proxy TxIds currently allocated against this PLC. |
| maxInFlight | Counters.ObserveInFlight peak | High-water mark since service start. |
| txIdWraps | TxIdAllocator.WrapCount | Times the rolling cursor has rolled 0xFFFF → 0x0000. Sustained non-zero means very high churn. |
| queueDepth | _outboundChannel.Reader.Count | Frames sitting in the outbound channel waiting for the backend writer. Persistent depth means the PLC is the bottleneck. |
| disconnectCascades | Counters.AddDisconnectCascades | Cumulative count of upstream pipes cascaded by backend disconnects. Rises in chunks equal to the attached pipe count at cascade time. |
| connectsSuccess / connectsFailed | Counters.IncrementConnectSuccess / IncrementConnectFailed | Per-PLC backend connect outcomes. |
Interpreting non-zero txIdWraps
Each WrapCount increment means the allocator has issued at least 65,536 TxIds against one PLC since service start. On a steady 10 ms-per-PDU pace that takes about 11 minutes; sustained wraps therefore indicate request rates in the hundreds-per-second range, well above what an ECOM-served PLC can answer. Wraps without a matching rise in inFlight simply reflect cumulative volume and are benign. Wraps that climb alongside a high inFlight value indicate the PLC is back-pressuring; check queueDepth and the EWMA round-trip on the same snapshot.
Interpreting non-zero queueDepth
_outboundChannel is bounded at 256 with BoundedChannelFullMode.Wait. A persistent depth above zero means the backend writer is not draining as fast as upstream pipes are submitting — the PLC has become the bottleneck. A queue that climbs toward 256 means upstream pipes are starting to block on WriteAsync; that backpressure walks back up the per-pipe read loop and ultimately stalls the upstream client's send buffer, which is the correct behaviour for an overloaded PLC.
A queue depth above zero with inFlight also climbing suggests the PLC is keeping up with requests but slowly; the EWMA round-trip on the same snapshot will confirm. A queue depth above zero with inFlight flat at the allocator's saturation ceiling indicates a stuck backend (no responses arriving, no slots freeing); the watchdog will eventually clear the stuck entries via 0x0B exceptions.
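The backpressure behaviour comes from the bounded channel itself and can be reproduced with System.Threading.Channels alone. The sketch below uses a hypothetical factory, OutboundQueueSketch.Create, and a tiny capacity so the full state is easy to reach; the proxy's own channel is described above as bounded at 256.

```csharp
using System;
using System.Threading.Channels;

// Hypothetical factory for a bounded, wait-on-full frame queue.
public static class OutboundQueueSketch
{
    public static Channel<byte[]> Create(int capacity) =>
        Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity)
        {
            // Wait makes WriteAsync pend when the queue is full instead of dropping frames.
            FullMode = BoundedChannelFullMode.Wait,
            SingleReader = true // one backend writer task drains the queue
        });
}
```

With a capacity of 2, two TryWrite calls succeed and a third returns false; a WriteAsync issued at that point stays pending until the reader takes a frame, which is the stall that propagates back to the upstream pipe's read loop. Reader.Count on such a channel reports the number of queued frames.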
Memory footprint per PLC
Each PlcMultiplexer holds a bool[65536] for the TxId allocator (~64 KB), the ConcurrentDictionary for the correlation map (sized to peak in-flight, typically tens of bytes per entry plus the byte[] frame buffers referenced by the entries), the bounded outbound channel (≤ 256 frames in flight; each frame at most 260 bytes), and the per-pipe response channels (≤ 16 frames per attached pipe). With ~54 PLCs the allocator alone accounts for roughly 3.4 MB; the rest is request-rate dependent and well within the service's measured ~30 MB working set under load.
Lifecycle And Disposal
PlcMultiplexer.DisposeAsync is idempotent and runs in this order:
- Sets _disposed = true and unhooks the live IMultiplexCountersProvider registration so a concurrent status snapshot does not observe internal state mid-teardown.
- Cancels _disposeCts, which cooperatively stops the watchdog task.
- Awaits the watchdog with a 2-second timeout so its in-flight 0x0B dispatches settle before tests assert against counter values.
- Calls TearDownBackendAsync("disposing", cascadeUpstreams: true) to close the backend, drain CorrelationMap, drain InFlightByKeyMap, and dispose every attached pipe.
- Completes the outbound channel writer, then disposes any pipes that were not already cleared by the cascade walk.
- Disposes _disposeCts.
UpstreamPipe.DisposeAsync is similarly idempotent: it completes its response channel writer, cancels its internal CTS, shuts the upstream socket down both ways, and emits a MultiplexerLogEvents.ClientDisconnected event with the remote endpoint string and a reason. Disposal can be triggered by the listener (clean upstream EOF), by the read or write loop encountering a socket error, or by the cascade walk.
pymodbus 3.13.0 Simulator Quirk
The pymodbus simulator's ServerRequestHandler stores a single last_pdu field per connection and schedules deferred response handlers via asyncio.call_soon. When two MBAP frames arrive in the same recv buffer — exactly the workload the multiplexer can produce on its shared backend socket — the second frame's last_pdu overwrites the first before either deferred handler runs. Both responses then carry the second request's TxId.
Why this only matters in tests
The real H2-ECOM100 does not have this bug; it echoes per-request TxIds correctly. Multiplexer correctness under genuine backend concurrency is proven by the unit tests in PlcMultiplexerTests against a stub backend that respects MBAP TxIds, not via the simulator. The E2E suite paces requests against the pymodbus simulator to keep it in known-good single-PDU mode.
The per-request timeout watchdog described above is the production defence against any backend (real or simulated) that mis-echoes a TxId: the unanswered InFlightRequest ages past BackendRequestTimeoutMs and the upstream party receives a clean Modbus exception 0x0B rather than a hung socket.
Related Documentation
- ./Overview.md: proxy architecture entry point
- ./ReadCoalescing.md: FC03/FC04 fan-out built on InterestedParties
- ./ResponseCache.md: per-PLC FC03/FC04 cache layered in front of this multiplexer
- ../Operations/Configuration.md: Connection.BackendConnectTimeoutMs, Connection.BackendRequestTimeoutMs, retry tuning
- ../Operations/StatusPage.md: inFlight, maxInFlight, txIdWraps, queueDepth, disconnectCascades counters
- ../Reference/LogEvents.md: mbproxy.multiplex.* structured log events
- ../Testing/Simulator.md: pymodbus 3.13.0 deferred-handler quirk in detail
- ../../DL260/dl205.md: DL205/DL260 quirks including the 4-client ECOM cap