wwtools/mbproxy/docs/Architecture/ConnectionModel.md
Joseph Doherty 0868613890 mbproxy: add keepalive / connection monitoring
The DL205/DL260 ECOM emits no TCP keepalives, so an idle backend socket
can be silently dropped by a middlebox (switch, firewall, NAT) after
2-5 minutes. Enable OS SO_KEEPALIVE on backend and accepted upstream
sockets, and drive a periodic synthetic FC03 heartbeat on each idle
backend socket so a dead path is detected before a real client request
hits it. Controlled by Connection.Keepalive (ON by default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:40:54 -04:00


Connection Model

The proxy holds one persistent backend TCP socket per PLC and multiplexes many upstream client connections onto it by rewriting the MBAP transaction ID on every request and restoring each client's original TxId on the matching response.

Why One Backend Connection Per PLC

An earlier design opened a fresh backend socket for each accepted upstream client (1:1 pairs). That model collapsed against the AutomationDirect H2-ECOM100, which caps simultaneous TCP clients at 4 per PLC (see ../Reference/dl205.md under "Behavioural Oddities"). The fifth upstream client to attach to a busy PLC was refused at connect, with no recourse other than waiting for an existing pair to drop.

Multiplexing replaces 1:1 upstream-to-backend with N:1 upstream-to-multiplexer-to-backend:

  • The proxy occupies exactly one of the ECOM's 4 TCP client slots per PLC, regardless of how many upstream clients are attached.
  • Upstream-side concurrency is no longer bounded by the controller's accept queue.
  • Serialisation shifts from the PLC accept queue to the proxy's outbound channel (_outboundChannel in PlcMultiplexer).

The honest trade-off: the wire-rate ceiling does not change. The ECOM serialises requests internally at roughly 2-10 ms per scan, so the multiplexer cannot deliver more PDUs per second to one PLC than the 1:1 model could. What multiplexing buys is connection headroom, plus the data structures that read coalescing and the response cache hook into.

Why TxId rewriting rather than connection pooling

The MBAP transaction ID is a 16-bit field at bytes 0-1 of every Modbus TCP frame, and the Modbus TCP specification explicitly permits clients to pipeline requests under different TxIds on a single connection. The PLC echoes each request's TxId on the matching response. The multiplexer exploits that contract: by allocating a proxy-side TxId per request and substituting it for the upstream client's TxId on the wire, many upstream clients can have concurrent requests outstanding on one backend socket without their MBAP frames ever colliding. A connection pool, by contrast, would still need either one backend socket per concurrent request (defeating the ECOM cap workaround) or a serialisation lock on each pooled socket (defeating concurrency).
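As a minimal sketch of the rewrite-and-restore idea, the helper below reads and writes the big-endian TxId in the first two bytes of a frame. The class name MbapTxId is hypothetical and is not taken from the mbproxy source; only the byte layout comes from the text above.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration; not part of the mbproxy source.
public static class MbapTxId
{
    // The MBAP transaction ID occupies bytes 0-1 of the frame, big-endian.
    public static ushort Read(ReadOnlySpan<byte> frame) =>
        BinaryPrimitives.ReadUInt16BigEndian(frame);

    // Overwrites bytes 0-1 in place; the rest of the frame is untouched.
    public static void Write(Span<byte> frame, ushort txId) =>
        BinaryPrimitives.WriteUInt16BigEndian(frame, txId);
}
```

On the request path a caller would remember `MbapTxId.Read(frame)` as the client's original TxId and then `Write` the proxy-side TxId; on the response path it would `Write` the remembered original back before handing the frame to the client.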

Components

The load-bearing types all live in ../../src/Mbproxy/Proxy/Multiplexing/.

Type roster

| Type | File | Role |
| --- | --- | --- |
| PlcMultiplexer | PlcMultiplexer.cs | Owns the backend socket, the outbound channel, the backend writer and reader tasks, the per-request timeout watchdog, and the set of attached upstream pipes. One instance per PLC. |
| UpstreamPipe | UpstreamPipe.cs | Per-upstream-client wrapper around an accepted Socket. Owns a read task that drives PlcMultiplexer.OnUpstreamFrameAsync, plus a write task that drains a bounded _responseChannel (capacity 16) onto the socket. |
| TxIdAllocator | TxIdAllocator.cs | Proxy-side 16-bit TxId allocator. Backed by a bool[65536] plus a rolling _next cursor under a single lock. Exposes TryAllocate, Release, InFlightCount, and WrapCount. |
| CorrelationMap | CorrelationMap.cs | ConcurrentDictionary<ushort, InFlightRequest> mapping proxy TxId to its in-flight record. Exposes TryAdd, TryRemove, DrainAll, and SnapshotOlderThan. |
| InFlightRequest | InFlightRequest.cs | Record carrying UnitId, Fc, StartAddress, Qty, IReadOnlyList<InterestedParty> InterestedParties, SentAtUtc, and ResolvedCacheTtlMs. |
| InterestedParty | InFlightRequest.cs | Record (UpstreamPipe Pipe, ushort OriginalTxId) identifying who receives the response and which TxId to restore. |

Threading invariants

The multiplexer relies on a handful of single-owner rules that keep the wire-touching code lock-free:

  • One backend writer. Only RunBackendWriterAsync calls backend.SendAsync. The single-writer drain of _outboundChannel.Reader.ReadAllAsync means no socket-level send lock is needed.
  • One backend reader. Only RunBackendReaderAsync calls backend.ReceiveAsync. The reader is the only caller of CorrelationMap.TryRemove on the response path.
  • Per-pipe write loop. Each UpstreamPipe has exactly one task that drains _responseChannel to its upstream socket. The multiplexer fan-out path only enqueues; it never writes to the socket directly.
  • Per-pipe read loop. A single read task per pipe parses MBAP frames and calls OnUpstreamFrameAsync sequentially. A single upstream client therefore cannot multi-PDU-pipeline itself; concurrency comes from having many pipes.

TxIdAllocator holds an internal lock for TryAllocate / Release. Contention is low in practice — one PLC's wire rate is bounded by the ECOM scan time — and the lock is preferred over a lock-free approach so the saturation, cascade, and Polly-retry paths remain deterministic.
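The allocator shape described above (a bool[65536], a rolling cursor, one lock) can be sketched as follows. This is an illustration under stated assumptions, not the real TxIdAllocator: the class name SketchTxIdAllocator, the field names, and the choice to treat a double release as a no-op are all hypothetical.

```csharp
using System;

// Illustrative sketch only; the real TxIdAllocator may differ in detail.
public sealed class SketchTxIdAllocator
{
    private readonly bool[] _inUse = new bool[65536];
    private readonly object _gate = new object();
    private ushort _next;      // rolling cursor
    private int _inFlight;
    private long _wraps;

    public int InFlightCount { get { lock (_gate) return _inFlight; } }
    public long WrapCount { get { lock (_gate) return _wraps; } }

    public bool TryAllocate(out ushort txId)
    {
        lock (_gate)
        {
            // Probe each slot at most once, starting at the rolling cursor.
            for (int probe = 0; probe < _inUse.Length; probe++)
            {
                ushort candidate = _next;
                _next = unchecked((ushort)(_next + 1));
                if (_next == 0) _wraps++;      // cursor rolled 0xFFFF -> 0x0000
                if (_inUse[candidate]) continue;
                _inUse[candidate] = true;
                _inFlight++;
                txId = candidate;
                return true;
            }
            txId = 0;
            return false;                      // every slot is in flight: saturation
        }
    }

    public void Release(ushort txId)
    {
        lock (_gate)
        {
            if (!_inUse[txId]) return;         // assumed: double release is a no-op
            _inUse[txId] = false;
            _inFlight--;
        }
    }
}
```

Because the cursor keeps rolling forward, a freed TxId is not handed out again until the cursor comes back around, which reduces the chance that a late response for a released TxId is matched to a new request.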

Why ConcurrentDictionary for the correlation map

CorrelationMap is backed by ConcurrentDictionary<ushort, InFlightRequest> even though the request-side adds and the response-side removes are nominally single-threaded each. Three independent paths can remove an entry concurrently with each other: the backend reader on a normal response, the watchdog on a timeout, and the cascade walker on a backend disconnect. Two adders (the coalescing path's factory and the non-coalescing fast path) can also race against a removal if the backend response arrives mid-add. The ConcurrentDictionary makes those TryAdd/TryRemove calls atomic, which is what the "claim then dispatch" pattern in the watchdog and reader relies on for correctness.

Upstream To Multiplexer Path

PlcListener accepts an upstream Socket and constructs an UpstreamPipe around it. PlcMultiplexer.StartPipeAsync attaches the pipe, spins up its write loop, and invokes RunReadLoopAsync with OnUpstreamFrameAsync as the per-frame callback. When the read loop returns (clean upstream EOF, socket fault, or cascade), a ContinueWith removes the pipe from _pipes; disposal itself is owned by the listener.

Frame parsing

The pipe's read loop reads a 7-byte MBAP header into a stack-buffered array, parses the Length field, allocates a fresh byte[] sized to header + (Length - 1) bytes, fills the PDU body, and hands the complete frame to the callback. Frames whose length field claims a body larger than MbapFrame.MaxPduBodySize are treated as a protocol error and close the upstream pipe; a zero-body length is permitted (the header alone is forwarded). The buffer ownership transfers to the multiplexer with each call so the multiplexer can store it in the CorrelationMap entry without coordinating buffer lifetimes back to the pipe.
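The size calculation can be sketched as below. The MBAP Length field counts the UnitId byte plus the PDU, which is why the body that follows the 7-byte header is Length - 1 bytes. The class name and the value 253 for the body limit are assumptions for illustration (253 is the Modbus maximum PDU size); the actual value of MbapFrame.MaxPduBodySize is not stated in this document.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration; not the real MbapFrame type.
public static class MbapHeaderSketch
{
    public const int HeaderSize = 7;

    // Assumed limit: 253-byte PDU, giving a 260-byte maximum frame.
    public const int MaxPduBodySize = 253;

    // Returns false for a header the pipe should treat as a protocol error.
    public static bool TryGetFrameSize(ReadOnlySpan<byte> header, out int frameSize)
    {
        frameSize = 0;
        if (header.Length < HeaderSize) return false;

        // Length lives at bytes 4-5, big-endian, and includes the UnitId byte.
        int length = BinaryPrimitives.ReadUInt16BigEndian(header.Slice(4, 2));
        int bodySize = length - 1;
        if (bodySize < 0 || bodySize > MaxPduBodySize) return false;

        frameSize = HeaderSize + bodySize;
        return true;
    }
}
```

For a typical FC03 read request the Length field is 6, so the full frame is 7 + 5 = 12 bytes.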

Each call to OnUpstreamFrameAsync:

  1. Parses the MBAP header to extract the upstream client's originalTxId and the unitId.
  2. For FC03, FC04, FC06, and FC16 it also pulls startAddress and qty out of the PDU; these feed the cache, the read-coalescing key, and the response BCD rewriter.
  3. (Response cache, FC03/FC04 only) checks _ctx.Cache via a CacheKey. A hit short-circuits the entire path — including the backend connect attempt — and returns a synthesised frame.
  4. Calls EnsureBackendConnectedAsync, which lazily brings up the backend socket through a Polly retry pipeline driven by Connection.BackendConnectTimeoutMs.
  5. (Read coalescing, FC03/FC04 only, when enabled) consults InFlightByKeyMap to either attach to an existing peer in flight or open a new entry.
  6. On a coalescing miss or any non-coalescing FC: calls TxIdAllocator.TryAllocate(out ushort proxyTxId). Saturation returns false and the client receives a Modbus exception code 4 (Slave Device Failure).
  7. Builds an InFlightRequest, registers it in CorrelationMap.TryAdd(proxyTxId, ...), and observes the new peak via ObserveInFlight.
  8. Runs the BCD rewriter over the request payload through _pipeline.Process(MbapDirection.RequestToBackend, ...).
  9. Overwrites bytes 0 and 1 of the MBAP header with the big-endian encoding of proxyTxId.
  10. Enqueues the frame onto _outboundChannel via _outboundChannel.Writer.WriteAsync. The channel is bounded at 256 with BoundedChannelFullMode.Wait, so a saturated outbound queue backpressures the upstream rather than dropping frames.

```csharp
// Sketch of the proxy-TxId rewrite (PlcMultiplexer.OnUpstreamFrameAsync):
if (!_allocator.TryAllocate(out ushort proxyTxIdFc)) { /* exception 4 */ }
_correlation.TryAdd(proxyTxIdFc, inFlightNc);
_pipeline.Process(MbapDirection.RequestToBackend, header, body, requestCtxNc);
frame[0] = (byte)(proxyTxIdFc >> 8);
frame[1] = (byte)(proxyTxIdFc & 0xFF);
await _outboundChannel.Writer.WriteAsync(frame, ct).ConfigureAwait(false);
```

After enqueuing, the upstream read loop is free to read the next frame. There is no per-pipe in-flight gate beyond what the upstream client itself imposes by reading from a single TCP stream.

Saturation handling

TxIdAllocator.TryAllocate returns false only when all 65,536 slots are simultaneously in flight against one PLC. In that state OnUpstreamFrameAsync calls BuildExceptionFrame(originalTxId, unitId, fcByte, exceptionCode: 4) and enqueues the frame straight onto the requesting pipe's response channel — the upstream client sees a clean Modbus exception code 4 (Slave Device Failure) rather than a hung socket. The same path emits MultiplexerLogEvents.Saturated with the remote endpoint string for operator triage.

Lazy backend connect

The backend socket starts offline. EnsureBackendConnectedAsync runs under a SemaphoreSlim named _connectGate so concurrent upstream frames during a cold start serialise their connect attempts. The first caller through the gate builds a fresh Socket, sets NoDelay = true, and runs ConnectAsync under either the supplied _backendConnectPipeline (Polly resilience pipeline) or a plain CancellationToken linked to Connection.BackendConnectTimeoutMs. On failure it logs MultiplexerLogEvents.BackendFailed, increments the per-PLC connectsFailed counter, and returns false; the upstream pipe is disposed by the caller. On success it spawns the backend writer and reader tasks under a fresh CancellationTokenSource linked to _disposeCts, increments connectsSuccess, and logs MultiplexerLogEvents.BackendConnected.

A double-checked fast path before the gate avoids the semaphore acquire on the happy path: the moment _backendSocket is { Connected: true } and _backendCts is { IsCancellationRequested: false }, EnsureBackendConnectedAsync returns immediately without taking the lock. The lazy-connect contract therefore costs one volatile read per request after the first successful connect.
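The double-checked pattern can be illustrated with a generic lazily connected resource. This is a sketch, not the real EnsureBackendConnectedAsync: the LazyConnector class, its connect delegate, and the single volatile flag standing in for the socket and CTS checks are all hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch; the connect delegate stands in for socket setup.
public sealed class LazyConnector
{
    private readonly SemaphoreSlim _connectGate = new SemaphoreSlim(1, 1);
    private readonly Func<CancellationToken, Task<bool>> _connect;
    private volatile bool _connected;

    public LazyConnector(Func<CancellationToken, Task<bool>> connect) => _connect = connect;

    public async Task<bool> EnsureConnectedAsync(CancellationToken ct)
    {
        if (_connected) return true;          // fast path: no semaphore acquire

        await _connectGate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            if (_connected) return true;      // another caller connected while we waited
            _connected = await _connect(ct).ConfigureAwait(false);
            return _connected;
        }
        finally
        {
            _connectGate.Release();
        }
    }

    // Called by a teardown path so the next request reconnects.
    public void MarkDisconnected() => _connected = false;
}
```

Concurrent callers during a cold start queue on the gate, and only the first one runs the connect delegate; the rest see the flag on the second check and return immediately.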

Multiplexer To Backend Path

The backend side is two tasks plus one bounded channel. EnsureBackendConnectedAsync launches two tasks against the backend socket on first connect, both under a single _backendCts:

  • RunBackendWriterAsync — single consumer of _outboundChannel.Reader.ReadAllAsync. Writes every frame to the backend socket via SendAsync with a loop to handle short writes. Single-writer means no socket-level lock is needed for sends.
  • RunBackendReaderAsync — single producer reading frames off the backend socket. For each frame:
    1. Parses the MBAP header to extract proxyTxId and length.
    2. Reads the PDU body into a fresh byte[].
    3. Calls CorrelationMap.TryRemove(proxyTxId, out var inFlight). A miss (no entry) drops the frame silently — usually a stale response after a cascade.
    4. Frees the allocator slot via _allocator.Release(proxyTxId).
    5. Updates the per-PLC EWMA round-trip via UpdateRoundTripEwma using inFlight.SentAtUtc.
    6. Runs the response-side BCD rewriter through _pipeline.Process(MbapDirection.ResponseToClient, ...). The rewriter needs inFlight.StartAddress and inFlight.Qty because the FC03/FC04 response PDU does not echo the read range.
    7. (Cache write-through, post-rewriter) on a non-exception response, stores FC03/FC04 entries in _ctx.Cache or invalidates overlapping entries on FC06/FC16.
    8. Walks inFlight.InterestedParties. For each party with a live pipe, copies the frame, restores party.OriginalTxId into header bytes 0-1, and calls party.Pipe.SendResponseAsync to enqueue the frame onto that pipe's response channel.

Single-reader on the backend socket plus per-pipe response channels means every cross-task hand-off goes through a Channel<byte[]> — no locks on the wire-touching code paths.

Frame fan-out

When inFlight.InterestedParties.Count == 1 — the common non-coalesced case — the reader optimises by passing the original frame buffer through to SendResponseAsync without copying. When the list has more than one party (a coalesced FC03/FC04 read), the reader clones the frame for each party before patching in its OriginalTxId, so each pipe's response channel owns an independent buffer.
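A minimal sketch of the copy-or-pass-through decision follows. It models each party only by its original TxId and omits the IsAlive check and the actual SendResponseAsync call; the class and method names are hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Illustrative sketch; parties are reduced to their original TxIds.
public static class FanOutSketch
{
    // Returns one frame per party, each carrying that party's original TxId.
    public static List<byte[]> BuildPerPartyFrames(byte[] response, IReadOnlyList<ushort> originalTxIds)
    {
        var frames = new List<byte[]>(originalTxIds.Count);
        for (int i = 0; i < originalTxIds.Count; i++)
        {
            // One party: reuse the reader's buffer. Several: clone so no two pipes share bytes.
            byte[] frame = originalTxIds.Count == 1 ? response : (byte[])response.Clone();
            BinaryPrimitives.WriteUInt16BigEndian(frame, originalTxIds[i]);
            frames.Add(frame);
        }
        return frames;
    }
}
```

The clone matters because the TxId patch is an in-place write: if two parties shared one buffer, the second patch would overwrite the first party's TxId before its pipe had written the frame out.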

A party whose pipe reports IsAlive == false is skipped. For multi-party FC03/FC04 frames the skip path also increments the per-PLC coalescedResponseToDeadUpstream counter and logs CoalescingLogEvents.DeadUpstream, so operators can correlate cascade-mid-flight events with which reads were affected.

Per-Request Timeout Watchdog

RunRequestTimeoutWatchdogAsync is launched from the multiplexer constructor and runs for the lifetime of the multiplexer. It ticks every BackendRequestTimeoutMs / 4, floored at 100 ms, and on each tick calls CorrelationMap.SnapshotOlderThan(DateTimeOffset.UtcNow.AddMilliseconds(-BackendRequestTimeoutMs)).

For each stale entry the watchdog:

  1. Tries to claim the entry via _correlation.TryRemove(proxyTxId, out var req). A failed claim means a response, cascade, or another watchdog tick already removed it — skip.
  2. Releases the proxy TxId via _allocator.Release(proxyTxId).
  3. For FC03/FC04, also removes the matching CoalescingKey from InFlightByKeyMap so a brand-new identical request opens a fresh round-trip rather than attaching to a corpse.
  4. Walks req.InterestedParties and, for each live pipe, delivers a synthesised Modbus exception frame with function code req.Fc | 0x80 and exception code 0x0B (Gateway Target Device Failed To Respond), with the party's OriginalTxId patched back into the MBAP header.

The watchdog exists because the multiplexed model has no per-pair fault-on-timeout backstop. In the 1:1 model, a lost response simply sat on a dead socket that the upstream eventually closed; in the multiplexed model, a single missing or mis-echoed response would leak its CorrelationMap entry forever and hang every upstream party attached to it. Specific failure modes the watchdog covers:

  • The PLC drops a response (busy controller, scan-time excursion).
  • A middlebox drops a packet on a long-idle backend socket.
  • A backend mis-echoes the MBAP TxId — including pymodbus 3.13.0's deferred-handler bug noted below.

Why claim then release

The watchdog reads the stale set via SnapshotOlderThan (a non-removing scan) and only then competes for each entry via TryRemove. The two-step is deliberate: a response arriving between the snapshot and the claim wins the TryRemove race and the watchdog skips that entry. Without the claim race, the upstream party could receive both a real response and a 0x0B exception for the same request, which would corrupt clients that expect responses in TxId order.
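The snapshot-then-claim sequence can be sketched against a plain ConcurrentDictionary. The InFlight record and the WatchdogSketch class are simplified stand-ins invented for this example; the real CorrelationMap and InFlightRequest carry more state.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Simplified stand-in for InFlightRequest; hypothetical.
public sealed record InFlight(byte Fc, DateTimeOffset SentAtUtc);

public static class WatchdogSketch
{
    // Non-removing scan: collects the keys of entries sent before the cutoff.
    public static List<ushort> SnapshotOlderThan(
        ConcurrentDictionary<ushort, InFlight> map, DateTimeOffset cutoff)
    {
        var stale = new List<ushort>();
        foreach (var pair in map)
            if (pair.Value.SentAtUtc < cutoff) stale.Add(pair.Key);
        return stale;
    }

    // Claims each stale entry via TryRemove. An entry already removed by the
    // reader or a cascade fails the claim and is skipped, so it is never
    // dispatched twice.
    public static List<InFlight> ClaimStale(
        ConcurrentDictionary<ushort, InFlight> map, DateTimeOffset cutoff)
    {
        var claimed = new List<InFlight>();
        foreach (ushort txId in SnapshotOlderThan(map, cutoff))
            if (map.TryRemove(txId, out var req)) claimed.Add(req);
        return claimed;
    }
}
```

Only the entries returned by `ClaimStale` would go on to receive a synthesised 0x0B exception; everything else in the snapshot has already been answered or drained by another path.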

Tick cadence

The 100 ms floor on tickMs keeps the watchdog from busy-waking when an operator configures BackendRequestTimeoutMs below 400 ms. With the production default of 3000 ms the watchdog ticks every 750 ms, which keeps timeout dispatch latency well under one second past the threshold.
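The cadence rule is a one-liner; a sketch with a hypothetical helper name makes the floor explicit:

```csharp
using System;

// Hypothetical helper illustrating the tick rule described above.
public static class WatchdogCadence
{
    // A quarter of the request timeout, but never faster than every 100 ms.
    public static int TickMs(int backendRequestTimeoutMs) =>
        Math.Max(100, backendRequestTimeoutMs / 4);
}
```

With the 3000 ms default this yields 750 ms; any timeout at or below 400 ms is clamped to the 100 ms floor.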

Exception frame shape

BuildExceptionFrame produces a 9-byte synthetic response: 7-byte MBAP header plus a 2-byte exception PDU. The function code byte is OR'd with 0x80 to flag the response as an exception, and the second PDU byte carries the exception code (0x04 for allocator saturation, 0x0B for the watchdog). The Length field in the MBAP header is set to 3 (UnitId + exception FC + exception code) and the ProtocolId is zero per the Modbus TCP spec. Clients written against a real DL260 see exactly the same frame layout a controller would emit, so client libraries surface a normal ModbusException rather than a transport error.
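The 9-byte layout can be written out as a short sketch. The parameter list mirrors the BuildExceptionFrame call shown earlier in this document, but the body is an assumption reconstructed from the description, not the actual implementation.

```csharp
using System;
using System.Buffers.Binary;

// Illustrative reconstruction from the prose; not the real BuildExceptionFrame.
public static class ExceptionFrameSketch
{
    public static byte[] Build(ushort originalTxId, byte unitId, byte fc, byte exceptionCode)
    {
        var frame = new byte[9];
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(0, 2), originalTxId); // TxId
        // Bytes 2-3 are the ProtocolId and stay zero.
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(4, 2), 3);            // Length: UnitId + FC + code
        frame[6] = unitId;
        frame[7] = (byte)(fc | 0x80);   // high bit flags an exception response
        frame[8] = exceptionCode;       // e.g. 0x04 or 0x0B
        return frame;
    }
}
```

For example, a timed-out FC03 request with original TxId 7 on unit 1 would produce the bytes 00 07 00 00 00 03 01 83 0B.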

Backend Disconnect Cascade

When the backend socket dies — reader EOF, writer fault, PLC reboot, network partition, or middlebox idle drop — TearDownBackendAsync(reason, cascadeUpstreams: true) runs:

  1. Cancels _backendCts, which terminates both backend tasks.
  2. Shuts down and disposes the backend Socket.
  3. Calls CorrelationMap.DrainAll, releases every allocator slot, and collects every InterestedParty's pipe ID.
  4. Calls InFlightByKeyMap.DrainAll so stale coalescing entries cannot outlive the backend they were aimed at.
  5. Disposes every attached UpstreamPipe and clears _pipes.
  6. Increments BackendDisconnectCascades on the per-PLC counters by the number of upstream pipes that were attached (AddDisconnectCascades(upstreamCount)).
  7. Logs a MultiplexerLogEvents.BackendDisconnected event with the upstream count, drained correlation count, and a reason string.

The rationale: a backend disconnect invalidates every in-flight response, and there is no clean way to mid-flight-rebind upstream clients to a fresh backend socket without risking silent data loss. Cascading the disconnect upstream is loud (clients re-issue immediately) but unambiguous — every upstream sees its socket close, no zombie upstream sockets hold stale state. The next upstream frame after the cascade triggers a fresh Polly-driven backend connect.

Failure detection paths

Three independent paths can initiate a cascade:

  1. Reader EOF. RunBackendReaderAsync sees a clean zero-byte read from ReceiveAsync and falls out of the loop. It calls TearDownBackendAsync("backend reader EOF", cascadeUpstreams: true) as a fire-and-forget task.
  2. Reader fault or writer fault. Either backend task catches a non-cancellation exception and calls TearDownBackendAsync($"reader fault: {ex.Message}", ...) or the equivalent writer-fault path.
  3. Watchdog-driven indirect failure. A backend that mis-echoes TxIds will not itself fault the socket; the watchdog eventually times out the leaked correlation entries and delivers 0x0B exceptions. The socket stays up unless the backend then also stops responding to subsequent requests.

TearDownBackendAsync is idempotent against itself — the lock (_backendLock) block atomically swaps the live socket and task references to null, so a second invocation sees oldSocket is null && oldCts is null and returns without re-cascading.
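The swap-to-null idiom behind that idempotency can be sketched as follows. BackendHandle is a hypothetical stand-in that models the socket as a plain IDisposable and leaves out the cascade, draining, and logging steps.

```csharp
#nullable enable
using System;
using System.Threading;

// Illustrative sketch of the swap-under-lock idiom; not the real PlcMultiplexer.
public sealed class BackendHandle
{
    private readonly object _backendLock = new object();
    private IDisposable? _socket;
    private CancellationTokenSource? _cts;

    public void Attach(IDisposable socket, CancellationTokenSource cts)
    {
        lock (_backendLock) { _socket = socket; _cts = cts; }
    }

    // Returns true only for the invocation that actually tore the backend down.
    public bool TearDown()
    {
        IDisposable? oldSocket;
        CancellationTokenSource? oldCts;
        lock (_backendLock)
        {
            // Take ownership of the live references and leave nulls behind.
            oldSocket = _socket; oldCts = _cts;
            _socket = null; _cts = null;
        }
        if (oldSocket is null && oldCts is null) return false; // already torn down

        oldCts?.Cancel();
        oldSocket?.Dispose();
        oldCts?.Dispose();
        return true;
    }
}
```

The lock is held only for the swap, so the slower cancel and dispose work runs outside it, and a reader-fault teardown racing a writer-fault teardown results in exactly one of them doing the work.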

Why every attached upstream cascades

An earlier sketch cascaded only upstream pipes that had a request in flight at the moment of disconnect. The current implementation cascades every attached pipe, in flight or idle. The reason: an idle upstream pipe is one that the proxy has been quietly answering from cache or that has simply not sent a request recently. After a backend disconnect, the proxy has no way to prove the PLC's state still matches what those idle clients last saw — a PLC reboot, ladder edit, or operator write between the disconnect and reconnect can have moved the values out from under them. Closing every upstream socket is the unambiguous signal that "the link to the device was lost; rebuild your state from scratch." Clients reconnect on their own next request.

Connect-on-next-frame, not eager reconnect

The cascade tears down the backend without scheduling a reconnect. The next upstream frame that arrives invokes EnsureBackendConnectedAsync, which constructs a fresh socket and runs the Polly connect pipeline. The rationale is that an eager reconnect spinner would hammer a downed PLC at the configured backoff schedule even when no clients are attached; gating reconnect on client demand avoids waste during long PLC outages without sacrificing recovery latency once clients return.

Wire-Rate Considerations

The multiplexer is not a throughput multiplier. The ECOM serialises every request it receives on its single internal scan, so PDUs-per-second to one PLC is bounded by 1 / ecom_scan_ms regardless of how many upstream clients the proxy fans in. What changes:

  • Connection count. Upstream-side connection count is now limited by the OS socket budget and OutboundChannelCapacity (256), not by the ECOM's 4-client cap.
  • Coalescing opportunity. Identical concurrent FC03/FC04 reads attach to the same InFlightRequest via InFlightByKeyMap, so the proxy issues one backend round-trip and fans the response out to all attached parties (see ./ReadCoalescing.md).
  • Cache short-circuit. FC03/FC04 reads with a resolved per-tag TTL never reach the wire while the cached PDU is fresh (see ./ResponseCache.md).

The proxy can hand more concurrent upstream clients a result on a hot tag than the bare PLC can serve simultaneously. It cannot let those clients hammer the PLC harder than the PLC's scan time allows.

Counters exposed by the status page

PlcMultiplexer implements IMultiplexCountersProvider and registers itself with the per-PLC counters object during construction. The status page reads these values per snapshot:

| Counter | Source | Meaning |
| --- | --- | --- |
| inFlight | TxIdAllocator.InFlightCount | Proxy TxIds currently allocated against this PLC. |
| maxInFlight | Counters.ObserveInFlight peak | High-water mark since service start. |
| txIdWraps | TxIdAllocator.WrapCount | Times the rolling cursor has rolled 0xFFFF → 0x0000. Sustained non-zero means very high churn. |
| queueDepth | _outboundChannel.Reader.Count | Frames sitting in the outbound channel waiting for the backend writer. Persistent depth means the PLC is the bottleneck. |
| disconnectCascades | Counters.AddDisconnectCascades | Cumulative count of upstream pipes cascaded by backend disconnects. Rises in chunks equal to the attached pipe count at cascade time. |
| connectsSuccess / connectsFailed | Counters.IncrementConnectSuccess / IncrementConnectFailed | Per-PLC backend connect outcomes. |

Interpreting non-zero txIdWraps

Each WrapCount increment means the allocator has issued at least 65,536 TxIds against one PLC since service start. On a steady 10 ms-per-PDU pace that takes about 11 minutes; sustained wraps therefore indicate request rates in the hundreds-per-second range, well above what an ECOM-served PLC can answer. Wraps without a matching rise in inFlight simply reflect cumulative volume and are benign. Wraps that climb alongside a high inFlight value indicate the PLC is back-pressuring; check queueDepth and the EWMA round-trip on the same snapshot.

Interpreting non-zero queueDepth

_outboundChannel is bounded at 256 with BoundedChannelFullMode.Wait. A persistent depth above zero means the backend writer is not draining as fast as upstream pipes are submitting — the PLC has become the bottleneck. A queue that climbs toward 256 means upstream pipes are starting to block on WriteAsync; that backpressure walks back up the per-pipe read loop and ultimately stalls the upstream client's send buffer, which is the correct behaviour for an overloaded PLC.
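A bounded channel with this behaviour can be created with System.Threading.Channels as sketched below. The factory class is hypothetical, and the SingleReader and SingleWriter hints are assumptions inferred from the single-writer-task design rather than settings confirmed by the source.

```csharp
using System.Threading.Channels;

// Illustrative factory; option values other than capacity and FullMode are assumed.
public static class OutboundChannelSketch
{
    public static Channel<byte[]> Create(int capacity = 256) =>
        Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait, // writers wait instead of dropping frames
            SingleReader = true,                    // one backend writer task drains the channel
            SingleWriter = false,                   // many upstream pipes enqueue
        });
}
```

In Wait mode a `WriteAsync` against a full channel does not complete until the reader takes a frame, which is the mechanism that pushes backpressure up into each pipe's read loop.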

A queue depth above zero with inFlight also climbing suggests the PLC is keeping up with requests but slowly; the EWMA round-trip on the same snapshot will confirm. A queue depth above zero with inFlight flat at the allocator's saturation ceiling indicates a stuck backend (no responses arriving, no slots freeing); the watchdog will eventually clear the stuck entries via 0x0B exceptions.

Memory footprint per PLC

Each PlcMultiplexer holds a bool[65536] for the TxId allocator (~64 KB), the ConcurrentDictionary for the correlation map (sized to peak in-flight, typically tens of bytes per entry plus the byte[] frame buffers referenced by the entries), the bounded outbound channel (≤ 256 frames in flight; each frame at most 260 bytes), and the per-pipe response channels (≤ 16 frames per attached pipe). With ~54 PLCs the allocator alone accounts for roughly 3.4 MB; the rest is request-rate dependent and well within the service's measured ~30 MB working set under load.

Lifecycle And Disposal

PlcMultiplexer.DisposeAsync is idempotent and runs in this order:

  1. Sets _disposed = true and unhooks the live IMultiplexCountersProvider registration so a concurrent status snapshot does not observe internal state mid-teardown.
  2. Cancels _disposeCts, which cooperatively stops the watchdog task.
  3. Awaits the watchdog with a 2-second timeout so its in-flight 0x0B dispatches settle before tests assert against counter values.
  4. Calls TearDownBackendAsync("disposing", cascadeUpstreams: true) to close the backend, drain CorrelationMap, drain InFlightByKeyMap, and dispose every attached pipe.
  5. Completes the outbound channel writer, then disposes any pipes that were not already cleared by the cascade walk.
  6. Disposes _disposeCts.

UpstreamPipe.DisposeAsync is similarly idempotent: it completes its response channel writer, cancels its internal CTS, shuts the upstream socket down both ways, and emits a MultiplexerLogEvents.ClientDisconnected event with the remote endpoint string and a reason. Disposal can be triggered by the listener (clean upstream EOF), by the read or write loop encountering a socket error, or by the cascade walk.

pymodbus 3.13.0 Simulator Quirk

The pymodbus simulator's ServerRequestHandler stores a single last_pdu field per connection and schedules deferred response handlers via asyncio.call_soon. When two MBAP frames arrive in the same recv buffer — exactly the workload the multiplexer can produce on its shared backend socket — the second frame's last_pdu overwrites the first before either deferred handler runs. Both responses then carry the second request's TxId.

Why this only matters in tests

The real H2-ECOM100 does not have this bug; it echoes per-request TxIds correctly. Multiplexer correctness under genuine backend concurrency is proven by the unit tests in PlcMultiplexerTests against a stub backend that respects MBAP TxIds, not via the simulator. The E2E suite paces requests against the pymodbus simulator to keep it in known-good single-PDU mode.

The per-request timeout watchdog described above is the production defence against any backend (real or simulated) that mis-echoes a TxId: the unanswered InFlightRequest ages past BackendRequestTimeoutMs and the upstream party receives a clean Modbus exception 0x0B rather than a hung socket.