mbproxy/docs: split deep docs into focused PascalCase files per StyleGuide

Adds 11 topic-focused docs under docs/{Architecture,Features,Operations,Reference,Testing}/
and links them from README.md's new "Detailed documentation" section. Existing
top-level docs (design.md, kpi.md, operations.md) remain as canonical landings.

Architecture/
  - Overview.md         (150 lines) — listener topology, request flow, per-PLC isolation
  - ConnectionModel.md  (247 lines) — TxId multiplexer, watchdog, disconnect cascade
  - ReadCoalescing.md   (243 lines) — in-flight FC03/04 dedup via InFlightByKeyMap
  - ResponseCache.md    (398 lines) — opt-in per-tag TTL cache + range-overlap invalidation

Features/
  - BcdRewriting.md     (252 lines) — codec, CDAB, FC scope, partial-overlap policy
  - HotReload.md        (189 lines) — IOptionsMonitor + per-change-kind reconcile rules

Operations/
  - Configuration.md    (422 lines) — every Mbproxy:* option + validation rules
  - StatusPage.md       (334 lines) — admin endpoint surface, every JSON field
  - Troubleshooting.md  (364 lines) — diagnosis playbook keyed to log events

Reference/
  - LogEvents.md        (499 lines) — 28 events across 7 categories, grep-verified

Testing/
  - Simulator.md        (235 lines) — pymodbus fixture, skip policy, 3.13 framer quirk

Each doc was written by a dedicated agent against the StyleGuide.md rules with
a per-doc phase gate (PascalCase filename, H1 Title Case, code-fence language
tags, Related Documentation section with >=3 relative links, real type names
verified against src/). Cross-references between docs use relative paths;
all 18 README->docs links and all sibling links resolve.

Known follow-up: docs/design.md lines 215-251 are stale on two log-event
property templates (config.reload.applied and config.reload.rejected) and
mention LogContext.PushProperty scoping that isn't actually used. Reference/
LogEvents.md is now the authoritative event catalog and source-of-truth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-14 03:44:34 -04:00
parent 4fcda87ecd
commit f49e27e316
12 changed files with 3363 additions and 0 deletions
@@ -0,0 +1,247 @@
# Connection Model
The proxy holds one persistent backend TCP socket per PLC and multiplexes many upstream client connections onto it by rewriting the MBAP transaction ID on every request and restoring each client's original TxId on the matching response.
## Why One Backend Connection Per PLC
An earlier design opened a fresh backend socket for each accepted upstream client (1:1 pairs). That model collapsed against the **AutomationDirect H2-ECOM100**, which caps simultaneous TCP clients at **4 per PLC** (see [`../../DL260/dl205.md`](../../DL260/dl205.md) under "Behavioural Oddities"). The fifth upstream client to attach to a busy PLC was refused at connect, with no recourse other than waiting for an existing pair to drop.
Multiplexing replaces the 1:1 upstream-to-backend pairing with N:1 upstream-to-multiplexer-to-backend:
- The proxy occupies exactly one of the ECOM's 4 TCP client slots per PLC, regardless of how many upstream clients are attached.
- Upstream-side concurrency is no longer bounded by the controller's accept queue.
- Serialisation shifts from the PLC accept queue to the proxy's outbound channel (`_outboundChannel` in `PlcMultiplexer`).
The honest trade-off: the wire-rate ceiling does not change. The ECOM serialises requests internally at roughly 2–10 ms per scan, so the multiplexer cannot deliver more PDUs per second to one PLC than the 1:1 model could. What multiplexing buys is connection headroom, plus the data structures that read coalescing and the response cache hook into.
### Why TxId rewriting rather than connection pooling
The MBAP transaction ID is a 16-bit field at bytes 0–1 of every Modbus TCP frame, and the Modbus TCP specification explicitly permits clients to pipeline requests under different TxIds on a single connection. The PLC echoes each request's TxId on the matching response. The multiplexer exploits that contract: by allocating a proxy-side TxId per request and substituting it for the upstream client's TxId on the wire, many upstream clients can have concurrent requests outstanding on one backend socket without their MBAP frames ever colliding. A connection pool, by contrast, would still need either one backend socket per concurrent request (defeating the ECOM cap workaround) or a serialisation lock on each pooled socket (defeating concurrency).
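The substitution itself is a two-byte, big-endian overwrite. The following is a minimal sketch of that idea, not the code in `PlcMultiplexer`; the `MbapTxId` helper and its method names are hypothetical and exist only for illustration.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper (not part of the proxy): reads and swaps the MBAP
// transaction ID, which occupies bytes 0-1 of the frame in big-endian order.
public static class MbapTxId
{
    public static ushort Read(ReadOnlySpan<byte> frame) =>
        BinaryPrimitives.ReadUInt16BigEndian(frame);

    public static void Write(Span<byte> frame, ushort txId) =>
        BinaryPrimitives.WriteUInt16BigEndian(frame, txId);

    // Replaces the TxId in place and returns the value that was there before,
    // so the caller can restore it on the matching response.
    public static ushort Swap(Span<byte> frame, ushort replacement)
    {
        ushort original = Read(frame);
        Write(frame, replacement);
        return original;
    }
}
```

On the request path the caller would keep the returned original TxId alongside the in-flight record; on the response path it would call `Write` with that value before handing the frame back to the client.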
## Components
The load-bearing types all live in [`../../src/Mbproxy/Proxy/Multiplexing/`](../../src/Mbproxy/Proxy/Multiplexing).
### Type roster
| Type | File | Role |
|------|------|------|
| `PlcMultiplexer` | `PlcMultiplexer.cs` | Owns the backend socket, the outbound channel, the backend writer and reader tasks, the per-request timeout watchdog, and the set of attached upstream pipes. One instance per PLC. |
| `UpstreamPipe` | `UpstreamPipe.cs` | Per-upstream-client wrapper around an accepted `Socket`. Owns a read task that drives `PlcMultiplexer.OnUpstreamFrameAsync`, plus a write task that drains a bounded `_responseChannel` (capacity 16) onto the socket. |
| `TxIdAllocator` | `TxIdAllocator.cs` | Proxy-side 16-bit TxId allocator. Backed by a `bool[65536]` plus a rolling `_next` cursor under a single lock. Exposes `TryAllocate`, `Release`, `InFlightCount`, and `WrapCount`. |
| `CorrelationMap` | `CorrelationMap.cs` | `ConcurrentDictionary<ushort, InFlightRequest>` mapping proxy TxId to its in-flight record. Exposes `TryAdd`, `TryRemove`, `DrainAll`, and `SnapshotOlderThan`. |
| `InFlightRequest` | `InFlightRequest.cs` | Record carrying `UnitId`, `Fc`, `StartAddress`, `Qty`, `IReadOnlyList<InterestedParty> InterestedParties`, `SentAtUtc`, and `ResolvedCacheTtlMs`. |
| `InterestedParty` | `InFlightRequest.cs` | Record `(UpstreamPipe Pipe, ushort OriginalTxId)` identifying who receives the response and which TxId to restore. |
### Threading invariants
The multiplexer relies on a handful of single-owner rules that keep the wire-touching code lock-free:
- **One backend writer.** Only `RunBackendWriterAsync` calls `backend.SendAsync`. The single-writer drain of `_outboundChannel.Reader.ReadAllAsync` means no socket-level send lock is needed.
- **One backend reader.** Only `RunBackendReaderAsync` calls `backend.ReceiveAsync`. On the response path, the reader is the sole caller of `CorrelationMap.TryRemove`.
- **Per-pipe write loop.** Each `UpstreamPipe` has exactly one task that drains `_responseChannel` to its upstream socket. The multiplexer fan-out path only enqueues; it never writes to the socket directly.
- **Per-pipe read loop.** A single read task per pipe parses MBAP frames and calls `OnUpstreamFrameAsync` sequentially. A single upstream client therefore cannot multi-PDU-pipeline itself; concurrency comes from having many pipes.
`TxIdAllocator` holds an internal lock for `TryAllocate` / `Release`. Contention is low in practice — one PLC's wire rate is bounded by the ECOM scan time — and the lock is preferred over a lock-free approach so the saturation, cascade, and Polly-retry paths remain deterministic.
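A lock-based allocator of this shape can be sketched as follows. The member names (`TryAllocate`, `Release`, `InFlightCount`, `WrapCount`) come from the roster table, but the body is an assumption written for illustration and may differ from `TxIdAllocator.cs`, for example in how a double release is treated.

```csharp
using System;

// Illustrative sketch only; the class name and internals are assumptions.
public sealed class TxIdAllocatorSketch
{
    private readonly bool[] _inUse = new bool[65536];
    private readonly object _gate = new object();
    private int _next;      // rolling cursor, 0..65535
    private int _inFlight;
    private long _wraps;

    public int InFlightCount { get { lock (_gate) return _inFlight; } }
    public long WrapCount { get { lock (_gate) return _wraps; } }

    public bool TryAllocate(out ushort txId)
    {
        lock (_gate)
        {
            // Saturated: every one of the 65,536 slots is in flight.
            if (_inFlight == _inUse.Length) { txId = 0; return false; }

            // At least one slot is free, so this scan always terminates.
            while (_inUse[_next]) Advance();

            txId = (ushort)_next;
            _inUse[_next] = true;
            _inFlight++;
            Advance();
            return true;
        }
    }

    public void Release(ushort txId)
    {
        lock (_gate)
        {
            if (!_inUse[txId]) return; // assumption: double release is a no-op
            _inUse[txId] = false;
            _inFlight--;
        }
    }

    // Moves the cursor forward and counts each 0xFFFF -> 0x0000 rollover.
    private void Advance()
    {
        if (++_next == _inUse.Length) { _next = 0; _wraps++; }
    }
}
```

Because the cursor keeps rolling forward, a released TxId is not reissued until the cursor comes back around to it, which keeps late responses for a recently released TxId from matching a brand-new request.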
### Why ConcurrentDictionary for the correlation map
`CorrelationMap` is backed by `ConcurrentDictionary<ushort, InFlightRequest>` even though the request-side adds and the response-side removes are nominally single-threaded each. Three independent paths can remove an entry concurrently with each other: the backend reader on a normal response, the watchdog on a timeout, and the cascade walker on a backend disconnect. Two adders (the coalescing path's factory and the non-coalescing fast path) can also race against a removal if the backend response arrives mid-add. The `ConcurrentDictionary` makes those `TryAdd`/`TryRemove` calls atomic, which is what the "claim then dispatch" pattern in the watchdog and reader relies on for correctness.
## Upstream To Multiplexer Path
`PlcListener` accepts an upstream `Socket` and constructs an `UpstreamPipe` around it. `PlcMultiplexer.StartPipeAsync` attaches the pipe, spins up its write loop, and invokes `RunReadLoopAsync` with `OnUpstreamFrameAsync` as the per-frame callback. When the read loop returns (clean upstream EOF, socket fault, or cascade), a `ContinueWith` removes the pipe from `_pipes`; disposal itself is owned by the listener.
### Frame parsing
The pipe's read loop reads a 7-byte MBAP header into a stack-buffered array, parses the `Length` field, allocates a fresh `byte[]` sized to header + (`Length` − 1) bytes, fills the PDU body, and hands the complete frame to the callback. Frames whose length field claims a body larger than `MbapFrame.MaxPduBodySize` are treated as a protocol error and close the upstream pipe; a zero-body length is permitted (the header alone is forwarded). The buffer ownership transfers to the multiplexer with each call so the multiplexer can store it in the `CorrelationMap` entry without coordinating buffer lifetimes back to the pipe.
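The header arithmetic can be sketched as below. This is an illustration under assumptions: `MbapHeaderSketch` is a hypothetical name, the 253-byte limit is the Modbus maximum PDU size standing in for the real `MbapFrame.MaxPduBodySize`, and rejecting a `Length` of zero is a choice made for this sketch rather than documented proxy behaviour.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration; not the parser used by UpstreamPipe.
public static class MbapHeaderSketch
{
    public const int HeaderSize = 7;

    // Assumed value (Modbus maximum PDU size); the real constant is MbapFrame.MaxPduBodySize.
    public const int MaxPduBodySize = 253;

    // Returns false on a protocol error; otherwise reports how many body bytes follow the header.
    public static bool TryGetBodySize(
        ReadOnlySpan<byte> header, out ushort txId, out byte unitId, out int bodySize)
    {
        txId = 0; unitId = 0; bodySize = 0;
        if (header.Length < HeaderSize) return false;

        txId = BinaryPrimitives.ReadUInt16BigEndian(header);                   // bytes 0-1
        ushort length = BinaryPrimitives.ReadUInt16BigEndian(header.Slice(4)); // bytes 4-5
        unitId = header[6];

        // Length counts the unit ID byte plus the PDU, so the body is Length - 1.
        if (length < 1) return false; // assumption: a Length of zero is rejected here
        bodySize = length - 1;
        return bodySize <= MaxPduBodySize;
    }
}
```

A caller would allocate `HeaderSize + bodySize` bytes, copy the header in, and then read exactly `bodySize` more bytes from the socket.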
Each call to `OnUpstreamFrameAsync`:
1. Parses the MBAP header to extract the upstream client's `originalTxId` and the `unitId`.
2. For FC03, FC04, FC06, and FC16 it also pulls `startAddress` and `qty` out of the PDU; these feed the cache, the read-coalescing key, and the response BCD rewriter.
3. (Response cache, FC03/FC04 only) checks `_ctx.Cache` via a `CacheKey`. A hit short-circuits the entire path — including the backend connect attempt — and returns a synthesised frame.
4. Calls `EnsureBackendConnectedAsync`, which lazily brings up the backend socket through a Polly retry pipeline driven by `Connection.BackendConnectTimeoutMs`.
5. (Read coalescing, FC03/FC04 only, when enabled) consults `InFlightByKeyMap` to either attach to an existing peer in flight or open a new entry.
6. On a coalescing miss or any non-coalescing FC: calls `TxIdAllocator.TryAllocate(out ushort proxyTxId)`. Saturation returns false and the client receives a Modbus exception code 4 (Slave Device Failure).
7. Builds an `InFlightRequest`, registers it in `CorrelationMap.TryAdd(proxyTxId, ...)`, and observes the new peak via `ObserveInFlight`.
8. Runs the BCD rewriter over the request payload through `_pipeline.Process(MbapDirection.RequestToBackend, ...)`.
9. Overwrites bytes 0 and 1 of the MBAP header with the big-endian encoding of `proxyTxId`.
10. Enqueues the frame onto `_outboundChannel` via `_outboundChannel.Writer.WriteAsync`. The channel is bounded at 256 with `BoundedChannelFullMode.Wait`, so a saturated outbound queue backpressures the upstream rather than dropping frames.
```csharp
// Sketch of the proxy-TxId rewrite (PlcMultiplexer.OnUpstreamFrameAsync):
if (!_allocator.TryAllocate(out ushort proxyTxIdFc))
{
    /* saturated: answer with Modbus exception 4 on the requesting pipe */
    return;
}
_correlation.TryAdd(proxyTxIdFc, inFlightNc);   // register before the frame can reach the wire
_pipeline.Process(MbapDirection.RequestToBackend, header, body, requestCtxNc);
frame[0] = (byte)(proxyTxIdFc >> 8);            // MBAP TxId, high byte
frame[1] = (byte)(proxyTxIdFc & 0xFF);          // MBAP TxId, low byte
await _outboundChannel.Writer.WriteAsync(frame, ct).ConfigureAwait(false);
```
After enqueuing, the upstream read loop is free to read the next frame. There is no per-pipe in-flight gate beyond what the upstream client itself imposes by reading from a single TCP stream.
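The backpressure behaviour of a bounded channel in `Wait` mode can be seen in isolation with a small sketch. The capacity of 256 and the `Wait` mode come from the text above; the `SingleReader` setting and the `OutboundChannelSketch` name are assumptions for illustration.

```csharp
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical demo type; the real channel is a field of PlcMultiplexer.
public static class OutboundChannelSketch
{
    public static Channel<byte[]> Create(int capacity = 256) =>
        Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait, // writers wait instead of dropping frames
            SingleReader = true,                    // assumption: one backend writer task drains it
            SingleWriter = false,                   // many upstream pipes enqueue
        });

    // Returns true when a write against a full channel stayed pending until a reader freed a slot.
    public static async Task<bool> WriteBlocksWhenFullAsync()
    {
        Channel<byte[]> channel = Create(capacity: 1);
        await channel.Writer.WriteAsync(new byte[] { 1 });

        ValueTask pending = channel.Writer.WriteAsync(new byte[] { 2 }); // channel is full
        bool blocked = !pending.IsCompleted;

        await channel.Reader.ReadAsync(); // the consumer frees a slot
        await pending;                    // the waiting write now completes
        return blocked;
    }
}
```

In the proxy this waiting write is what stalls a pipe's read loop when the backend writer falls behind, so the pressure propagates back to the upstream client's TCP send buffer.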
### Saturation handling
`TxIdAllocator.TryAllocate` returns `false` only when all 65,536 slots are simultaneously in flight against one PLC. In that state `OnUpstreamFrameAsync` calls `BuildExceptionFrame(originalTxId, unitId, fcByte, exceptionCode: 4)` and enqueues the frame straight onto the requesting pipe's response channel — the upstream client sees a clean Modbus exception code 4 (Slave Device Failure) rather than a hung socket. The same path emits `MultiplexerLogEvents.Saturated` with the remote endpoint string for operator triage.
### Lazy backend connect
The backend socket starts offline. `EnsureBackendConnectedAsync` runs under a `SemaphoreSlim` named `_connectGate` so concurrent upstream frames during a cold start serialise their connect attempts. The first caller through the gate builds a fresh `Socket`, sets `NoDelay = true`, and runs `ConnectAsync` under either the supplied `_backendConnectPipeline` (Polly resilience pipeline) or a plain `CancellationToken` linked to `Connection.BackendConnectTimeoutMs`. On failure it logs `MultiplexerLogEvents.BackendFailed`, increments the per-PLC `connectsFailed` counter, and returns `false`; the upstream pipe is disposed by the caller. On success it spawns the backend writer and reader tasks under a fresh `CancellationTokenSource` linked to `_disposeCts`, increments `connectsSuccess`, and logs `MultiplexerLogEvents.BackendConnected`.
A double-checked fast path before the gate avoids the semaphore acquire on the happy path: the moment `_backendSocket is { Connected: true }` and `_backendCts is { IsCancellationRequested: false }`, `EnsureBackendConnectedAsync` returns immediately without taking the lock. The lazy-connect contract therefore costs one volatile read per request after the first successful connect.
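The double-checked gate can be sketched generically. In this illustration a delegate stands in for the real socket creation and Polly pipeline, and a single `volatile bool` stands in for the `_backendSocket` and `_backendCts` checks; `LazyConnectorSketch` and its members are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Generic sketch of the pattern; not the code in EnsureBackendConnectedAsync.
public sealed class LazyConnectorSketch
{
    private readonly SemaphoreSlim _connectGate = new SemaphoreSlim(1, 1);
    private readonly Func<CancellationToken, Task<bool>> _connect;
    private volatile bool _connected;

    public int ConnectAttempts; // incremented only while holding the gate

    public LazyConnectorSketch(Func<CancellationToken, Task<bool>> connect) => _connect = connect;

    public async Task<bool> EnsureConnectedAsync(CancellationToken ct = default)
    {
        if (_connected) return true; // fast path: one volatile read, no semaphore acquire

        await _connectGate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            if (_connected) return true; // another caller connected while this one waited
            Interlocked.Increment(ref ConnectAttempts);
            _connected = await _connect(ct).ConfigureAwait(false);
            return _connected;
        }
        finally
        {
            _connectGate.Release();
        }
    }

    // A teardown path would call this so the next frame reconnects lazily.
    public void MarkDisconnected() => _connected = false;
}
```

Several concurrent callers during a cold start queue on the gate, the first one connects, and the rest return from the second check without a connect attempt of their own.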
## Multiplexer To Backend Path
The backend side is two tasks plus one bounded channel. `EnsureBackendConnectedAsync` launches two tasks against the backend socket on first connect, both under a single `_backendCts`:
- **`RunBackendWriterAsync`** — single consumer of `_outboundChannel.Reader.ReadAllAsync`. Writes every frame to the backend socket via `SendAsync` with a loop to handle short writes. Single-writer means no socket-level lock is needed for sends.
- **`RunBackendReaderAsync`** — single producer reading frames off the backend socket. For each frame:
1. Parses the MBAP header to extract `proxyTxId` and `length`.
2. Reads the PDU body into a fresh `byte[]`.
3. Calls `CorrelationMap.TryRemove(proxyTxId, out var inFlight)`. A miss (no entry) drops the frame silently — usually a stale response after a cascade.
4. Frees the allocator slot via `_allocator.Release(proxyTxId)`.
5. Updates the per-PLC EWMA round-trip via `UpdateRoundTripEwma` using `inFlight.SentAtUtc`.
6. Runs the response-side BCD rewriter through `_pipeline.Process(MbapDirection.ResponseToClient, ...)`. The rewriter needs `inFlight.StartAddress` and `inFlight.Qty` because the FC03/FC04 response PDU does not echo the read range.
7. (Cache write-through, post-rewriter) on a non-exception response, stores FC03/FC04 entries in `_ctx.Cache` or invalidates overlapping entries on FC06/FC16.
8. Walks `inFlight.InterestedParties`. For each party with a live pipe, copies the frame, restores `party.OriginalTxId` into header bytes 0–1, and calls `party.Pipe.SendResponseAsync` to enqueue the frame onto that pipe's response channel.
Single-reader on the backend socket plus per-pipe response channels means every cross-task hand-off goes through a `Channel<byte[]>` — no locks on the wire-touching code paths.
### Frame fan-out
When `inFlight.InterestedParties.Count == 1` — the common non-coalesced case — the reader optimises by passing the original frame buffer through to `SendResponseAsync` without copying. When the list has more than one party (a coalesced FC03/FC04 read), the reader clones the frame for each party before patching in its `OriginalTxId`, so each pipe's response channel owns an independent buffer.
A party whose pipe reports `IsAlive == false` is skipped. For multi-party FC03/FC04 frames the skip path also increments the per-PLC `coalescedResponseToDeadUpstream` counter and logs `CoalescingLogEvents.DeadUpstream`, so operators can correlate cascade-mid-flight events with which reads were affected.
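The copy-or-reuse decision can be sketched as a pure function over the frame and the list of original TxIds. This is an illustration only: `FanOutSketch` is a hypothetical name, and the liveness check and the enqueue onto each pipe's response channel are left out.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical helper; the real fan-out lives in the backend reader task.
public static class FanOutSketch
{
    // Returns one frame per original TxId. A single party reuses the backend buffer;
    // several parties each receive an independent copy.
    public static List<byte[]> BuildPerPartyFrames(byte[] backendFrame, IReadOnlyList<ushort> originalTxIds)
    {
        var frames = new List<byte[]>(originalTxIds.Count);
        for (int i = 0; i < originalTxIds.Count; i++)
        {
            byte[] frame = originalTxIds.Count == 1
                ? backendFrame
                : (byte[])backendFrame.Clone();

            // Restore the party's own TxId into MBAP bytes 0-1.
            BinaryPrimitives.WriteUInt16BigEndian(frame, originalTxIds[i]);
            frames.Add(frame);
        }
        return frames;
    }
}
```

Cloning before patching matters in the multi-party case: each pipe's write loop drains its channel on its own schedule, so two parties sharing one buffer could see each other's TxId.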
## Per-Request Timeout Watchdog
`RunRequestTimeoutWatchdogAsync` is launched from the multiplexer constructor and runs for the lifetime of the multiplexer. It ticks every `BackendRequestTimeoutMs / 4`, floored at 100 ms, and on each tick calls `CorrelationMap.SnapshotOlderThan(DateTimeOffset.UtcNow.AddMilliseconds(-BackendRequestTimeoutMs))`.
For each stale entry the watchdog:
1. Tries to claim the entry via `_correlation.TryRemove(proxyTxId, out var req)`. A failed claim means a response, cascade, or another watchdog tick already removed it — skip.
2. Releases the proxy TxId via `_allocator.Release(proxyTxId)`.
3. For FC03/FC04, also removes the matching `CoalescingKey` from `InFlightByKeyMap` so a brand-new identical request opens a fresh round-trip rather than attaching to a corpse.
4. Walks `req.InterestedParties` and, for each live pipe, delivers a synthesised Modbus exception frame with function code `req.Fc | 0x80` and exception code `0x0B` (Gateway Target Device Failed To Respond), with the party's `OriginalTxId` patched back into the MBAP header.
The watchdog exists because the multiplexed model has no per-pair fault-on-timeout backstop. In the 1:1 model, a lost response simply sat on a dead socket that the upstream eventually closed; in the multiplexed model, a single missing or mis-echoed response would leak its `CorrelationMap` entry forever and hang every upstream party attached to it. Specific failure modes the watchdog covers:
- The PLC drops a response (busy controller, scan-time excursion).
- A middlebox drops a packet on a long-idle backend socket.
- A backend mis-echoes the MBAP TxId — including pymodbus 3.13.0's deferred-handler bug noted below.
### Why claim then release
The watchdog reads the stale set via `SnapshotOlderThan` (a non-removing scan) and only then competes for each entry via `TryRemove`. The two-step is deliberate: a response arriving between the snapshot and the claim wins the `TryRemove` race and the watchdog skips that entry. Without the claim race, the upstream party could receive both a real response and a 0x0B exception for the same request, which would corrupt clients that expect responses in TxId order.
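The guarantee the claim step relies on is that `ConcurrentDictionary.TryRemove` succeeds for exactly one caller per key. A small sketch, using a hypothetical `ClaimRaceSketch` type and a string in place of `InFlightRequest`, makes that observable:

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo; two contenders stand in for the backend reader and the watchdog.
public static class ClaimRaceSketch
{
    // Races two TryRemove calls on the same key and returns how many of them won.
    public static int RaceOnce()
    {
        var map = new ConcurrentDictionary<ushort, string>();
        map.TryAdd(42, "in-flight request");
        int winners = 0;

        void Contend()
        {
            // Only the caller whose TryRemove succeeds may dispatch a response.
            if (map.TryRemove(42, out _))
                Interlocked.Increment(ref winners);
        }

        Parallel.Invoke(Contend, Contend);
        return winners;
    }
}
```

Whichever path wins the removal owns the dispatch, so an upstream party receives either the real response or the 0x0B exception, never both.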
### Tick cadence
The 100 ms floor on `tickMs` keeps the watchdog from busy-waking when an operator configures `BackendRequestTimeoutMs` below 400 ms. With the production default of 3000 ms the watchdog ticks every 750 ms, which keeps timeout dispatch latency well under one second past the threshold.
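The cadence rule reduces to a one-line calculation. The sketch below restates it with hypothetical names (`WatchdogCadenceSketch`, `TickMs`, `StaleCutoff`); only the divide-by-four rule, the 100 ms floor, and the cutoff expression come from the text.

```csharp
using System;

// Hypothetical helper restating the cadence arithmetic.
public static class WatchdogCadenceSketch
{
    public const int MinTickMs = 100;

    // A quarter of the request timeout, floored at 100 ms.
    public static int TickMs(int backendRequestTimeoutMs) =>
        Math.Max(MinTickMs, backendRequestTimeoutMs / 4);

    // Entries sent at or before this instant are candidates for a timeout.
    public static DateTimeOffset StaleCutoff(DateTimeOffset now, int backendRequestTimeoutMs) =>
        now.AddMilliseconds(-backendRequestTimeoutMs);
}
```

With a quarter-timeout tick, an entry that just missed one scan is picked up on the next, so the worst-case overshoot past the threshold is about one tick interval plus dispatch time.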
### Exception frame shape
`BuildExceptionFrame` produces a 9-byte synthetic response: 7-byte MBAP header plus a 2-byte exception PDU. The function code byte is OR'd with `0x80` to flag the response as an exception, and the second PDU byte carries the exception code (`0x04` for allocator saturation, `0x0B` for the watchdog). The `Length` field in the MBAP header is set to 3 (`UnitId` + exception FC + exception code) and the `ProtocolId` is zero per the Modbus TCP spec. Clients written against a real DL260 see exactly the same frame layout a controller would emit, so client libraries surface a normal `ModbusException` rather than a transport error.
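The byte layout described above can be written out directly. This is a sketch of the frame shape, not the proxy's `BuildExceptionFrame`; the `ExceptionFrameSketch` name and parameter order are assumptions.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper showing the 9-byte exception frame layout.
public static class ExceptionFrameSketch
{
    public static byte[] Build(ushort originalTxId, byte unitId, byte functionCode, byte exceptionCode)
    {
        var frame = new byte[9];
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(0, 2), originalTxId); // bytes 0-1: TxId
        // bytes 2-3: ProtocolId, left at zero for Modbus TCP
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(4, 2), 3);            // bytes 4-5: Length
        frame[6] = unitId;                                                       // byte 6: UnitId
        frame[7] = (byte)(functionCode | 0x80);                                  // byte 7: exception FC
        frame[8] = exceptionCode;                                                // byte 8: exception code
        return frame;
    }
}
```

For example, a timed-out FC03 read for unit 1 under client TxId `0x1234` yields `12 34 00 00 00 03 01 83 0B`.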
## Backend Disconnect Cascade
When the backend socket dies — reader EOF, writer fault, PLC reboot, network partition, or middlebox idle drop — `TearDownBackendAsync(reason, cascadeUpstreams: true)` runs:
1. Cancels `_backendCts`, which terminates both backend tasks.
2. Shuts down and disposes the backend `Socket`.
3. Calls `CorrelationMap.DrainAll`, releases every allocator slot, and collects every `InterestedParty`'s pipe ID.
4. Calls `InFlightByKeyMap.DrainAll` so stale coalescing entries cannot outlive the backend they were aimed at.
5. Disposes every attached `UpstreamPipe` and clears `_pipes`.
6. Increments `BackendDisconnectCascades` on the per-PLC counters by the number of upstream pipes that were attached (`AddDisconnectCascades(upstreamCount)`).
7. Logs a `MultiplexerLogEvents.BackendDisconnected` event with the upstream count, drained correlation count, and a reason string.
The rationale: a backend disconnect invalidates every in-flight response, and there is no clean way to mid-flight-rebind upstream clients to a fresh backend socket without risking silent data loss. Cascading the disconnect upstream is loud (clients re-issue immediately) but unambiguous — every upstream sees its socket close, no zombie upstream sockets hold stale state. The next upstream frame after the cascade triggers a fresh Polly-driven backend connect.
### Failure detection paths
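Three failure-detection paths exist. The first two initiate a cascade directly; the third surfaces as timeouts and cascades only if the socket itself later fails: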
1. **Reader EOF.** `RunBackendReaderAsync` sees a clean zero-byte read from `ReceiveAsync` and falls out of the loop. It calls `TearDownBackendAsync("backend reader EOF", cascadeUpstreams: true)` as a fire-and-forget task.
2. **Reader fault or writer fault.** Either backend task catches a non-cancellation exception and calls `TearDownBackendAsync($"reader fault: {ex.Message}", ...)` or the equivalent writer-fault path.
3. **Watchdog-driven indirect failure.** A backend that mis-echoes TxIds will not itself fault the socket; the watchdog eventually times out the leaked correlation entries and delivers 0x0B exceptions. The socket stays up unless the backend then also stops responding to subsequent requests.
`TearDownBackendAsync` is idempotent against itself — the `lock (_backendLock)` block atomically swaps the live socket and task references to `null`, so a second invocation sees `oldSocket is null && oldCts is null` and returns without re-cascading.
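The swap-to-null idiom behind that idempotence can be sketched in a few lines. The type below is hypothetical and synchronous for brevity; it models only the reference swap and the early return, not the correlation drain or the upstream cascade.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical sketch of the idempotent teardown idiom.
public sealed class TeardownSketch
{
    private readonly object _backendLock = new object();
    private IDisposable? _socket;
    private CancellationTokenSource? _cts;
    private int _cascades;

    public int Cascades => Volatile.Read(ref _cascades);

    public void Attach(IDisposable socket, CancellationTokenSource cts)
    {
        lock (_backendLock) { _socket = socket; _cts = cts; }
    }

    // Returns true only for the caller that actually performed the teardown.
    public bool TearDown()
    {
        IDisposable? oldSocket;
        CancellationTokenSource? oldCts;
        lock (_backendLock)
        {
            // Take ownership of the live references and leave nulls behind.
            oldSocket = _socket; oldCts = _cts;
            _socket = null; _cts = null;
        }

        if (oldSocket is null && oldCts is null) return false; // someone else already tore down

        oldCts?.Cancel();     // stops the backend tasks
        oldSocket?.Dispose(); // closes the backend connection
        oldCts?.Dispose();
        Interlocked.Increment(ref _cascades);
        return true;
    }
}
```

Because the reader, the writer, and disposal can all call teardown at nearly the same moment, having exactly one of them observe non-null references is what prevents a double cascade.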
### Why every attached upstream cascades
An earlier sketch cascaded only upstream pipes that had a request in flight at the moment of disconnect. The current implementation cascades every attached pipe, in flight or idle. The reason: an idle upstream pipe is one that the proxy has been quietly answering from cache or that has simply not sent a request recently. After a backend disconnect, the proxy has no way to prove the PLC's state still matches what those idle clients last saw — a PLC reboot, ladder edit, or operator write between the disconnect and reconnect can have moved the values out from under them. Closing every upstream socket is the unambiguous signal that "the link to the device was lost; rebuild your state from scratch." Clients reconnect on their own next request.
### Connect-on-next-frame, not eager reconnect
The cascade tears down the backend without scheduling a reconnect. The next upstream frame that arrives invokes `EnsureBackendConnectedAsync`, which constructs a fresh socket and runs the Polly connect pipeline. The rationale is that an eager reconnect spinner would hammer a downed PLC at the configured backoff schedule even when no clients are attached; gating reconnect on client demand avoids waste during long PLC outages without sacrificing recovery latency once clients return.
## Wire-Rate Considerations
The multiplexer is not a throughput multiplier. The ECOM serialises every request it receives on its single internal scan, so PDUs-per-second to one PLC is bounded by `1000 / ecom_scan_ms` regardless of how many upstream clients the proxy fans in. What changes:
- **Connection count.** Upstream-side connection count is now limited by the OS socket budget and `OutboundChannelCapacity` (256), not by the ECOM's 4-client cap.
- **Coalescing opportunity.** Identical concurrent FC03/FC04 reads attach to the same `InFlightRequest` via `InFlightByKeyMap`, so the proxy issues one backend round-trip and fans the response out to all attached parties (see [`./ReadCoalescing.md`](./ReadCoalescing.md)).
- **Cache short-circuit.** FC03/FC04 reads with a resolved per-tag TTL never reach the wire while the cached PDU is fresh (see [`./ResponseCache.md`](./ResponseCache.md)).
The proxy can hand more concurrent upstream clients a result on a hot tag than the bare PLC can serve simultaneously. It cannot let those clients hammer the PLC harder than the PLC's scan time allows.
### Counters exposed by the status page
`PlcMultiplexer` implements `IMultiplexCountersProvider` and registers itself with the per-PLC counters object during construction. The status page reads these values per snapshot:
| Counter | Source | Meaning |
|---------|--------|---------|
| `inFlight` | `TxIdAllocator.InFlightCount` | Proxy TxIds currently allocated against this PLC. |
| `maxInFlight` | `Counters.ObserveInFlight` peak | High-water mark since service start. |
| `txIdWraps` | `TxIdAllocator.WrapCount` | Times the rolling cursor has rolled 0xFFFF → 0x0000. Sustained non-zero means very high churn. |
| `queueDepth` | `_outboundChannel.Reader.Count` | Frames sitting in the outbound channel waiting for the backend writer. Persistent depth means the PLC is the bottleneck. |
| `disconnectCascades` | `Counters.AddDisconnectCascades` | Cumulative count of upstream pipes cascaded by backend disconnects. Rises in chunks equal to the attached pipe count at cascade time. |
| `connectsSuccess` / `connectsFailed` | `Counters.IncrementConnectSuccess` / `IncrementConnectFailed` | Per-PLC backend connect outcomes. |
### Interpreting non-zero txIdWraps
Each `WrapCount` increment means the allocator has issued at least 65,536 TxIds against one PLC since service start. On a steady 10 ms-per-PDU pace that takes about 11 minutes; sustained wraps therefore indicate request rates in the hundreds-per-second range, well above what an ECOM-served PLC can answer. Wraps without a matching rise in `inFlight` simply reflect cumulative volume and are benign. Wraps that climb alongside a high `inFlight` value indicate the PLC is back-pressuring; check `queueDepth` and the EWMA round-trip on the same snapshot.
### Interpreting non-zero queueDepth
`_outboundChannel` is bounded at 256 with `BoundedChannelFullMode.Wait`. A persistent depth above zero means the backend writer is not draining as fast as upstream pipes are submitting — the PLC has become the bottleneck. A queue that climbs toward 256 means upstream pipes are starting to block on `WriteAsync`; that backpressure walks back up the per-pipe read loop and ultimately stalls the upstream client's send buffer, which is the correct behaviour for an overloaded PLC.
A queue depth above zero with `inFlight` also climbing suggests the PLC is keeping up with requests but slowly; the EWMA round-trip on the same snapshot will confirm. A queue depth above zero with `inFlight` flat at the allocator's saturation ceiling indicates a stuck backend (no responses arriving, no slots freeing); the watchdog will eventually clear the stuck entries via 0x0B exceptions.
### Memory footprint per PLC
Each `PlcMultiplexer` holds a `bool[65536]` for the TxId allocator (~64 KB), the `ConcurrentDictionary` for the correlation map (sized to peak in-flight, typically tens of bytes per entry plus the `byte[]` frame buffers referenced by the entries), the bounded outbound channel (≤ 256 frames in flight; each frame at most 260 bytes), and the per-pipe response channels (≤ 16 frames per attached pipe). With ~54 PLCs the allocator alone accounts for roughly 3.4 MB; the rest is request-rate dependent and well within the service's measured ~30 MB working set under load.
## Lifecycle And Disposal
`PlcMultiplexer.DisposeAsync` is idempotent and runs in this order:
1. Sets `_disposed = true` and unhooks the live `IMultiplexCountersProvider` registration so a concurrent status snapshot does not observe internal state mid-teardown.
2. Cancels `_disposeCts`, which cooperatively stops the watchdog task.
3. Awaits the watchdog with a 2-second timeout so its in-flight 0x0B dispatches settle before tests assert against counter values.
4. Calls `TearDownBackendAsync("disposing", cascadeUpstreams: true)` to close the backend, drain `CorrelationMap`, drain `InFlightByKeyMap`, and dispose every attached pipe.
5. Completes the outbound channel writer, then disposes any pipes that were not already cleared by the cascade walk.
6. Disposes `_disposeCts`.
`UpstreamPipe.DisposeAsync` is similarly idempotent: it completes its response channel writer, cancels its internal CTS, shuts the upstream socket down both ways, and emits a `MultiplexerLogEvents.ClientDisconnected` event with the remote endpoint string and a reason. Disposal can be triggered by the listener (clean upstream EOF), by the read or write loop encountering a socket error, or by the cascade walk.
## pymodbus 3.13.0 Simulator Quirk
The pymodbus simulator's `ServerRequestHandler` stores a single `last_pdu` field per connection and schedules deferred response handlers via the asyncio event loop's `call_soon`. When two MBAP frames arrive in the same recv buffer (exactly the workload the multiplexer can produce on its shared backend socket), the second frame's `last_pdu` overwrites the first before either deferred handler runs. Both responses then carry the second request's TxId.
### Why this only matters in tests
The real H2-ECOM100 does not have this bug; it echoes per-request TxIds correctly. Multiplexer correctness under genuine backend concurrency is proven by the unit tests in `PlcMultiplexerTests` against a stub backend that respects MBAP TxIds, not via the simulator. The E2E suite paces requests against the pymodbus simulator to keep it in known-good single-PDU mode.
The per-request timeout watchdog described above is the production defence against any backend (real or simulated) that mis-echoes a TxId: the unanswered `InFlightRequest` ages past `BackendRequestTimeoutMs` and the upstream party receives a clean Modbus exception 0x0B rather than a hung socket.
## Related Documentation
- [`./Overview.md`](./Overview.md) — proxy architecture entry point
- [`./ReadCoalescing.md`](./ReadCoalescing.md) — FC03/FC04 fan-out built on `InterestedParties`
- [`./ResponseCache.md`](./ResponseCache.md) — per-PLC FC03/FC04 cache layered in front of this multiplexer
- [`../Operations/Configuration.md`](../Operations/Configuration.md) — `Connection.BackendConnectTimeoutMs`, `Connection.BackendRequestTimeoutMs`, retry tuning
- [`../Operations/StatusPage.md`](../Operations/StatusPage.md) — `inFlight`, `maxInFlight`, `txIdWraps`, `queueDepth`, `disconnectCascades` counters
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — `mbproxy.multiplex.*` structured log events
- [`../Testing/Simulator.md`](../Testing/Simulator.md) — pymodbus 3.13.0 deferred-handler quirk in detail
- [`../../DL260/dl205.md`](../../DL260/dl205.md) — DL205/DL260 quirks including the 4-client ECOM cap
@@ -0,0 +1,150 @@
# Architecture Overview
`mbproxy` is a .NET 10 background service that sits inline between Modbus TCP clients and a fleet of AutomationDirect DL205/DL260 PLCs, rewriting BCD-encoded registers in both directions while multiplexing many upstream clients onto one persistent backend socket per PLC.
This document is the entry point for readers new to the codebase. It sketches the runtime shape, the listener topology, the per-PLC isolation model, and the path a single Modbus frame takes from accept to response, and then hands off to the per-feature documents under `docs/Architecture/`, `docs/Features/`, and `docs/Operations/`.
## Runtime Shape
The process is a single .NET 10 Generic Host worker. `Microsoft.Extensions.Hosting.WindowsServices` registers the host as a Windows Service so the same binary runs interactively (for development) or under the SCM (in production). All configuration binds from `appsettings.json` through `IOptionsMonitor<MbproxyOptions>`, which makes the tag list and PLC roster hot-reloadable without process restart. `ProxyWorker` is the long-lived `BackgroundService` that owns startup, shutdown, and the listener supervisors for every PLC. A small Kestrel admin endpoint runs in the same process to serve the read-only status page.
There is no in-process database, no message broker, and no persistent cache file: state is per-PLC, in-memory, and ephemeral. Restarting the service drops every in-flight request and every cached response. Upstream clients are expected to reconnect and reissue; the proxy never replays a request on their behalf.
## Listener Topology
The proxy opens **one `TcpListener` per PLC** on a distinct port. A client picks which PLC it is talking to by choosing which port to connect to. There is no protocol-level routing — port number is the PLC identity. This keeps the upstream surface trivial for Wonderware, Historian gateways, and generic Modbus clients that already know how to point at `host:port`, and it means no per-frame header inspection is needed to decide where a request is going.
```text
Client A ──┐
Client B ──┼──→ proxy:5020 ──→ PLC #1  (10.0.1.1:502)
           ├──→ proxy:5021 ──→ PLC #2  (10.0.1.2:502)
           │    ...
           └──→ proxy:5073 ──→ PLC #54 (10.0.1.54:502)
```
Each listener runs under a `PlcListenerSupervisor` that owns its bind lifecycle. If a bind fails at startup or the listener faults at runtime, the supervisor reattempts under a Polly retry pipeline; the same code path also brings up newly-added PLCs from hot-reload and tears down removed ones. The supervisor's state (`SupervisorState`) is observable on the status page so an operator can tell at a glance whether a port is bound, recovering, or shut down.
Because port identity is PLC identity, adding a PLC is purely a configuration change — append an entry to `Mbproxy.Plcs` with a free `ListenPort`, save, and the supervisor reconciliation loop binds the new port without touching any other PLC. Removing a PLC follows the same path in reverse.
## Per-PLC Isolation
Every PLC gets its own `PerPlcContext` carrying that PLC's `PlcMultiplexer`, `CorrelationMap`, `TxIdAllocator`, `InFlightByKeyMap`, optional `ResponseCache`, `CacheInvalidator`, and `BcdPduPipeline`. There is no shared mutable state across PLCs on the request path.
The consequence is fault containment:
- A slow or dead backend on PLC #17 cannot block the request loop for PLC #18. Each multiplexer owns its own outbound channel and its own backend reader/writer task pair.
- A flood of in-flight requests on one PLC consumes only that PLC's TxId allocator (the 16-bit space is per-PLC, not global).
- A backend disconnect on one PLC cascades only to that PLC's attached upstream pipes; the rest of the fleet is unaffected.
- Hot-reload of one PLC's tag list rewrites only that PLC's `BcdPduPipeline` view of the tag map. Other PLCs do not observe the swap.
The listener topology and the per-PLC component graph are deliberately aligned: one port, one supervisor, one multiplexer, one backend socket, one cache instance.
Cross-PLC state exists only in three places, and each is read-mostly: the bound `IOptionsMonitor<MbproxyOptions>` snapshot, the global Serilog logger, and the service-wide counter set surfaced on the status page. Counters are written via lock-free `Interlocked` operations on disjoint per-PLC fields, then summed when the status page is rendered.
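The counter pattern described above can be sketched as follows. This is a minimal illustration, not the real `ProxyCounters` code: the `PlcCounters` and `FleetTotals` types and the single `Requests` field are hypothetical names chosen for the example. Each PLC increments only its own field with `Interlocked`, and the fleet total is computed on demand when the status page renders.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical per-PLC counter block; the real field names live in ProxyCounters.
public sealed class PlcCounters
{
    private long _requests;

    // Called on the request path: lock-free, touches only this PLC's field.
    public void IncrementRequests() => Interlocked.Increment(ref _requests);

    // Interlocked.Read gives an atomic 64-bit read, including on 32-bit runtimes.
    public long Requests => Interlocked.Read(ref _requests);
}

public static class FleetTotals
{
    // Called at render time: sum the disjoint per-PLC values into one figure.
    public static long SumRequests(IEnumerable<PlcCounters> fleet)
    {
        long total = 0;
        foreach (var plc in fleet) total += plc.Requests;
        return total;
    }
}
```

Because no two PLCs write the same field, the request path never contends on a shared counter; the only cross-PLC work is the read-only summation.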
This isolation is what lets the service operate degraded without operator intervention. If three PLCs drop off the network, the supervisor for each enters `recovering`, their multiplexers tear down their backend sockets, attached upstream clients are disconnected, and the remaining 51 PLCs keep serving traffic with no measurable impact. When the dropped PLCs come back, their supervisors rebind their listeners and the next upstream request triggers a fresh backend connect through the Polly pipeline — no fleet-wide restart, no manual reconnect, no shared state to flush.
## Request Flow
The diagram below traces an FC03 read from an upstream client through the proxy and back. The cache check, the coalescing check, and the BCD rewrite all sit between the upstream parse and the backend send, so the multiplexer can short-circuit the backend entirely when it does not need to be involved. Steps the upstream client never sees are indented.
```text
Upstream client
   │  TCP connect → proxy:5020
PlcListener (PlcListener.cs) accepts the socket
UpstreamPipe wraps the socket: read loop + bounded response channel
   │  parses MBAP frames off the wire, hands each frame to:
PlcMultiplexer.OnUpstreamFrameAsync(pipe, frame, ct)
   │    1. Parse MBAP header → originalTxId, unitId
   │    2. Parse PDU → fc, startAddr, qty
   │    3. (FC03/FC04 only) ResponseCache.TryGet(CacheKey)
   │         ├─ HIT  → splice cached payload onto a fresh MBAP header
   │         │         with originalTxId, push to upstream channel, DONE.
   │         └─ MISS → fall through.
   │    4. InFlightByKeyMap coalesce check
   │         ├─ duplicate read in flight → attach as additional waiter,
   │         │         share the eventual response, DONE for this frame.
   │         └─ first-of-key → become the leader, fall through.
   │    5. BcdPduPipeline rewrites request payload (FC06/FC16) binary → BCD
   │    6. TxIdAllocator hands out a free proxyTxId
   │    7. CorrelationMap[proxyTxId] = InFlightRequest(pipe, originalTxId, ...)
   │    8. Overwrite MBAP TxId field with proxyTxId; enqueue to outbound channel
Backend writer task drains the outbound channel
   │  → single persistent socket → PLC :502
PLC responds; backend reader task picks the frame off the socket
   │    9. Look up proxyTxId in CorrelationMap; recover original requester(s)
   │   10. BcdPduPipeline rewrites response payload (FC03/FC04) BCD → binary
   │   11. ResponseCache stores the rewritten payload (if TTL > 0)
   │   12. Fan out to every waiter on the InFlightByKey entry, restoring each
   │       waiter's originalTxId before pushing into its UpstreamPipe channel
UpstreamPipe writer task drains its response channel → upstream socket
Upstream client sees a response with the TxId it originally sent.
```
Writes (FC06, FC16) take a shorter path: no cache lookup, no coalescing, but the request payload is BCD-rewritten before forwarding, and the response triggers `CacheInvalidator` to evict any overlapping cached read ranges so the next read does not serve stale data.
A few invariants are worth flagging because they shape the design:
- **Original TxId is preserved end-to-end.** The multiplexer rewrites the wire TxId for routing, but every upstream client sees the exact 16-bit value it sent. `InFlightRequest` carries the original TxId alongside the upstream pipe reference.
- **Single backend writer, single backend reader.** No socket-level synchronisation is needed because exactly one task writes to the backend socket and exactly one task reads from it. The outbound channel funnels every request through that single writer.
- **The cache check happens before backend connect.** If every read in a request is cache-served and the backend is currently disconnected, the upstream client still gets a response. The cache survives backend transitions intentionally.
- **No mid-request retries on writes.** FC06 and FC16 are non-idempotent on BCD tags (a partially applied multi-register write could leave a 32-bit BCD value mid-transition), so a backend failure during a write surfaces as Modbus exception 0x0B and the client decides how to recover.
## Component Map
The major components a reader will hit when tracing a request, with their file locations under `src/Mbproxy/`. The list is ordered by where each component sits in the request path — accept loop at the top, rewrite at the bottom.
- **`ProxyWorker`** — `Proxy/ProxyWorker.cs`. The `BackgroundService` host; reconciles the configured PLC list with the supervisor roster on startup and on `IOptionsMonitor` change events.
- **`PlcListenerSupervisor`** — `Proxy/Supervision/PlcListenerSupervisor.cs`. Owns one PLC's listener lifecycle (bind, run, recover, shut down). Uses Polly for bounded recovery.
- **`PlcListener`** — `Proxy/PlcListener.cs`. The actual `TcpListener` accept loop for one PLC; hands every accepted socket to that PLC's multiplexer as a new `UpstreamPipe`.
- **`UpstreamPipe`** — `Proxy/Multiplexing/UpstreamPipe.cs`. One per upstream socket. Frame-parses inbound bytes and pushes parsed MBAP frames into the multiplexer; drains outbound responses from a bounded channel back to the client.
- **`PlcMultiplexer`** — `Proxy/Multiplexing/PlcMultiplexer.cs`. The per-PLC fanin/fanout core. Owns the persistent backend socket, the outbound write loop, the backend read loop, the per-request watchdog, and the cascade-on-backend-disconnect contract. Entry point `OnUpstreamFrameAsync` is where every upstream frame enters the request path; it is the single function that ties cache, coalescing, BCD rewrite, TxId allocation, and correlation together.
- **`CorrelationMap`** — `Proxy/Multiplexing/CorrelationMap.cs`. Maps `proxyTxId → InFlightRequest` so backend responses can be routed back to the originating upstream pipe(s). Also the surface the watchdog scans for stale entries.
- **`TxIdAllocator`** — `Proxy/Multiplexing/TxIdAllocator.cs`. Allocates and recycles the per-PLC 16-bit proxy TxId space used by the multiplexer.
- **`InFlightByKeyMap`** — `Proxy/Multiplexing/InFlightByKeyMap.cs`. The read-coalescing seam: keys on `(unitId, fc, startAddr, qty)` so duplicate concurrent reads share one backend round-trip and one response.
- **`ResponseCache`** — `Proxy/Cache/ResponseCache.cs`. Opt-in per-tag-range TTL cache for FC03/FC04 responses. A cache hit short-circuits the backend entirely; cache lookup happens before the multiplexer even ensures the backend is connected.
- **`CacheInvalidator`** — `Proxy/Cache/CacheInvalidator.cs`. Invalidates cached read ranges that overlap with successful FC06/FC16 writes, so writes never leave stale reads behind.
- **`BcdPduPipeline`** — `Proxy/BcdPduPipeline.cs`. The actual BCD rewrite: walks request and response PDUs against the resolved tag map and re-encodes each configured register between BCD nibbles and binary integers. 32-bit BCD tags spanning the CDAB word pair are rewritten as a unit. Non-BCD registers pass through untouched, and any function code the pipeline does not own (diagnostics, exceptions, coil and discrete-input functions) is forwarded byte-for-byte.
`PerPlcContext` (`Proxy/PerPlcContext.cs`) is the container that binds these together for one PLC and is the handle the supervisor and multiplexer carry around.
Two supporting abstractions are worth knowing about even though they do not appear in the per-frame path:
- **`IPduPipeline`** — the rewrite-pipeline interface (`Proxy/IPduPipeline.cs`). `BcdPduPipeline` is the production implementation; `NoopPduPipeline` is the test/passthrough implementation used when no BCD tags are configured for a PLC.
- **`MbapFrame`** — the static helper (`Proxy/MbapFrame.cs`) that parses and serialises the 7-byte MBAP header. Every component that touches the wire goes through this helper rather than indexing raw byte arrays directly.
Counters and structured log event names emitted from these components are catalogued in `ProxyCounters` (`Proxy/ProxyCounters.cs`) and the various `*LogEvents` static classes (`MultiplexerLogEvents`, `CoalescingLogEvents`, `CacheLogEvents`, `RewriterLogEvents`). A reader following a runtime symptom back to its source should grep for the event-name constants in those files first.
## Where to Read Next
For the wire-level details of how one backend socket fans out to many upstream clients — TxId rewriting, the correlation map, the per-request watchdog, the backend disconnect cascade — read [`./ConnectionModel.md`](./ConnectionModel.md). It is the most load-bearing internal document; almost every failure-mode question routes through it.
For the read-coalescing seam (when duplicate concurrent reads collapse onto one backend request) read [`./ReadCoalescing.md`](./ReadCoalescing.md). For the opt-in TTL cache and how writes invalidate overlapping read ranges read [`./ResponseCache.md`](./ResponseCache.md). The BCD rewrite itself — what gets rewritten, what passes through, and how CDAB 32-bit values are handled — is in [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md).
Operators looking for configuration shape, hot-reload semantics, and the status page should start at [`../Operations/Configuration.md`](../Operations/Configuration.md) and [`../Operations/StatusPage.md`](../Operations/StatusPage.md). When something is misbehaving in production, [`../Operations/Troubleshooting.md`](../Operations/Troubleshooting.md) and [`../Reference/LogEvents.md`](../Reference/LogEvents.md) are the two places to look first.
The simulator used by the end-to-end test suite — a `pymodbus`-based stand-in for a real DL205 — has its own document at [`../Testing/Simulator.md`](../Testing/Simulator.md). Test-only quirks of that simulator are called out there rather than in the production docs, because the real DL260 ECOM does not share them.
## Related Documentation
- [`./ConnectionModel.md`](./ConnectionModel.md) — TxId multiplexing, correlation map, per-request watchdog.
- [`./ReadCoalescing.md`](./ReadCoalescing.md) — how `InFlightByKeyMap` collapses duplicate concurrent reads.
- [`./ResponseCache.md`](./ResponseCache.md) — `ResponseCache` and `CacheInvalidator` semantics.
- [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md) — the `BcdPduPipeline` rewrite rules.
- [`../Features/HotReload.md`](../Features/HotReload.md) — `IOptionsMonitor` propagation and supervisor reconciliation.
- [`../Operations/Configuration.md`](../Operations/Configuration.md) — `appsettings.json` schema and tag list shape.
- [`../Operations/StatusPage.md`](../Operations/StatusPage.md) — the Kestrel admin endpoint and counter catalog.
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — stable structured log event names.
- [`../design.md`](../design.md) — canonical design decisions and rationale.
- [`../Testing/Simulator.md`](../Testing/Simulator.md) — `pymodbus` DL205 simulator used by the end-to-end suite.
- [`../plan/README.md`](../plan/README.md) — phase plan with per-phase test inventory.
@@ -0,0 +1,243 @@
# Read Coalescing
In-flight read coalescing collapses identical FC03/FC04 requests that arrive
while a backend response is still in flight onto a single backend round-trip,
then fans the single response out to every attached upstream client with each
client's original MBAP transaction ID restored.
## What Coalescing Does
When two upstream clients each send `(unitId=1, FC=3, start=100, qty=10)`
within the in-flight window of a previously-routed request, the second
arrival attaches to the existing `InFlightRequest` instead of opening a new
proxy transaction ID and a second backend round-trip. The PLC's reply is
delivered to both upstream pipes; each pipe sees its own MBAP `TxId`
restored on its copy of the response.
The value each upstream sees is the value an uncoalesced request would
have returned, to within the PLC's own scan-time precision (a window of
microseconds to roughly 10 ms). Coalescing is not a cache layer: once the response
fans out, the in-flight entry dies, and a subsequent identical read opens a
fresh round-trip. Bounded-staleness caching is a separate feature; see
[`./ResponseCache.md`](./ResponseCache.md).
## The Coalescing Key
The lookup tuple is defined in `CoalescingKey.cs`:
```csharp
internal readonly record struct CoalescingKey(
    byte UnitId,
    byte Fc,
    ushort StartAddress,
    ushort Qty);
```
Record-struct value equality drives the dictionary lookup in
`InFlightByKeyMap`. Several axes never coalesce, by design:
- **Function code.** FC03 (Read Holding Registers) and FC04 (Read Input
Registers) read different Modbus tables on the device. Their responses
are not interchangeable, so they do not share a key even at the same
address.
- **Unit ID.** Distinct unit IDs behind a shared socket address different
Modbus personalities — coalescing never crosses a unit boundary.
- **Start address and quantity.** Two reads with overlapping but
non-identical ranges never coalesce. Range-overlap logic exists for cache
invalidation, not for coalescing.
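A small sketch of how record-struct value equality produces those rules. The key is redeclared here (as `public`, so the snippet compiles on its own), and `CoalescingKeyDemo.WouldCoalesce` is a hypothetical helper written for this illustration; it is not part of `InFlightByKeyMap`.

```csharp
using System.Collections.Generic;

// Same shape as the CoalescingKey shown above, redeclared so the sketch is standalone.
public readonly record struct CoalescingKey(byte UnitId, byte Fc, ushort StartAddress, ushort Qty);

public static class CoalescingKeyDemo
{
    // Returns true when a request keyed by `probe` would find the in-flight
    // entry keyed by `inFlight` in a dictionary that relies on value equality.
    public static bool WouldCoalesce(CoalescingKey inFlight, CoalescingKey probe)
    {
        var map = new Dictionary<CoalescingKey, string> { [inFlight] = "in-flight" };
        return map.ContainsKey(probe);
    }
}
```

With an in-flight key of `(1, 0x03, 100, 10)`, a probe with the identical tuple returns `true`, while a probe that differs only in function code (`0x04`), only in unit ID, or only in quantity (an overlapping `(1, 0x03, 100, 8)`) returns `false`.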
## Eligibility
Only FC03 and FC04 enter the coalescing path. The multiplexer's request
handler parses the function code from the inbound PDU and gates on
`fcByte is 0x03 or 0x04` before consulting `_inFlightByKey`.
- FC06 (Write Single Register) and FC16 (Write Multiple Registers) are
non-idempotent on BCD tags — a second send would write the value twice.
Writes bypass coalescing entirely and always take the one-round-trip path.
- Exception responses do not coalesce. Each upstream sees an exception
delivered against its own MBAP `TxId` through the normal correlation map
fan-out; there is no special exception-deduplication path.
## The InterestedParties Seam
The data shape that powers fan-out lives on `InFlightRequest`:
```csharp
internal sealed record InFlightRequest(
    byte UnitId,
    byte Fc,
    ushort StartAddress,
    ushort Qty,
    IReadOnlyList<InterestedParty> InterestedParties,
    DateTimeOffset SentAtUtc,
    int ResolvedCacheTtlMs = 0);

internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId);
```
Each `InterestedParty` records the upstream pipe to deliver the response to
and the original MBAP `TxId` that pipe sent. The backend reader iterates
this list, patches each party's `OriginalTxId` into a per-party copy of the
response frame, and hands the frame to `party.Pipe.SendResponseAsync`.
### Multi-writer multi-reader safety
The list typed as `IReadOnlyList<InterestedParty>` on the public surface is
in fact a mutable `List<InterestedParty>` underneath. `InFlightByKeyMap`
serialises every state mutation under a single `object` lock:
- `TryAttachOrCreate` looks up the key, casts the existing
`InterestedParties` back to `List<InterestedParty>`, and appends the new
party — all under the lock.
- The backend reader calls `TryRemove(coalKey, out _)` **before** it
iterates the parties list during fan-out. Once the key is gone from the
map, no future attach can find it, so no further appends can occur.
The reader's removal-before-iteration ordering is the load-bearing
invariant. By the time fan-out reads the list, the list is effectively
frozen — there is no other writer that can reach it. The watchdog timeout
path observes the same protocol: it removes the coalescing key before it
walks `req.InterestedParties` to deliver exception 0x0B.
The reverse race (reader removes first, then a late attach arrives) is
impossible by construction — `TryRemove` and `TryAttachOrCreate` both take
the same map lock, so any late attach is serialised either entirely before
the removal (and is part of the fan-out) or entirely after (and opens a
fresh entry under a new factory call).
## MaxParties Cap
`ResilienceOptions.cs` exposes the load-shedding cap:
```csharp
public sealed class ReadCoalescingOptions
{
    public bool Enabled { get; init; } = true;
    public int MaxParties { get; init; } = 32;
}
```
`Mbproxy.Resilience.ReadCoalescing.MaxParties` defaults to 32. Inside
`TryAttachOrCreate`, an existing entry is only extended when
`existingList.Count < maxParties`; once the cap is hit, the next identical
arrival falls through to the factory branch and opens a fresh in-flight
entry (which means a fresh backend round-trip).
The cap bounds two costs:
- **Fan-out cost per entry** at O(MaxParties). The backend reader's
per-party copy-and-patch loop runs at most `MaxParties` times for any
single response.
- **Backend reader latency under pile-on.** A single pathologically popular
read (every HMI hitting the same tag at the same second) cannot stretch
one fan-out arbitrarily long.
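The attach-or-create protocol and the cap can be sketched together. This is a simplified, hypothetical stand-in for `InFlightByKeyMap`, not its actual source: the generic `InFlightMapSketch` type, its signatures, and in particular the way a full entry is displaced from the map by its replacement are assumptions made for the illustration.

```csharp
using System.Collections.Generic;

// Hypothetical reduced map. In the real code TKey would be CoalescingKey and
// TParty would be InterestedParty.
public sealed class InFlightMapSketch<TKey, TParty> where TKey : notnull
{
    private readonly object _gate = new();
    private readonly Dictionary<TKey, List<TParty>> _entries = new();

    // Joins an existing entry when one exists and is below the cap; otherwise
    // opens a new entry. wasNew == true means the caller must send to the backend.
    public List<TParty> TryAttachOrCreate(TKey key, TParty party, int maxParties, out bool wasNew)
    {
        lock (_gate)
        {
            if (_entries.TryGetValue(key, out var existing) && existing.Count < maxParties)
            {
                existing.Add(party);
                wasNew = false;
                return existing;
            }

            // No entry, or the entry is full. Assumption of this sketch: the new
            // entry takes over the key, so the full list can no longer be reached.
            var fresh = new List<TParty> { party };
            _entries[key] = fresh;
            wasNew = true;
            return fresh;
        }
    }

    // Called before fan-out. Removes the key only if it still maps to `expected`;
    // either way, no later attach can append to `expected` once this returns.
    public bool TryRemove(TKey key, List<TParty> expected)
    {
        lock (_gate)
        {
            return _entries.TryGetValue(key, out var current)
                && ReferenceEquals(current, expected)
                && _entries.Remove(key);
        }
    }
}
```

Both methods take the same lock, so an attach is ordered either wholly before a removal or wholly after it. With `maxParties = 2`, the first call creates an entry, the second attaches to it, and the third opens a second entry with its own backend round-trip.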
## Hot-Reloadable On/Off
`Mbproxy.Resilience.ReadCoalescing.Enabled` defaults to `true`. The
multiplexer holds a `Func<ReadCoalescingOptions>` accessor that production
binds to `() => optionsMonitor.CurrentValue.Resilience.ReadCoalescing`, so
a hot-reload of `appsettings.json` propagates immediately on the next
inbound PDU.
Flipping `Enabled` to `false` at runtime does not disturb already-coalesced
entries: existing fan-outs drain through the backend reader naturally.
Subsequent FC03/FC04 requests skip the coalescing branch entirely and take
the one-proxy-TxId-per-upstream-request path verbatim.
The same accessor reads `MaxParties` per PDU, so an operator can raise or
lower the cap without restarting the service.
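A minimal sketch of the accessor pattern, assuming a hypothetical `CoalescingGate` class in place of the multiplexer. The point is that the consumer stores the `Func<ReadCoalescingOptions>` and evaluates it per PDU, so it never holds a stale snapshot. The options class is redeclared from the snippet above so the example compiles on its own.

```csharp
using System;

// Redeclared from the options snippet above so the sketch is standalone.
public sealed class ReadCoalescingOptions
{
    public bool Enabled { get; init; } = true;
    public int MaxParties { get; init; } = 32;
}

// Hypothetical slice of the multiplexer's request handler.
public sealed class CoalescingGate
{
    private readonly Func<ReadCoalescingOptions> _options;

    public CoalescingGate(Func<ReadCoalescingOptions> options) => _options = options;

    // Evaluated once per inbound PDU; a replaced options instance is seen
    // by the very next call.
    public bool ShouldCoalesce(byte fc) => _options().Enabled && fc is (0x03 or 0x04);

    public int CurrentMaxParties => _options().MaxParties;
}
```

In a test, the accessor can be a lambda over a local variable (`() => current`); reassigning `current` to an instance with `Enabled = false` makes `ShouldCoalesce(0x03)` return `false` on the next call, which mirrors what a configuration reload does in production.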
## Lookup Order in the Multiplexer's Read Path
`OnUpstreamFrameAsync` consults three tiers in fixed order for FC03/FC04:
1. **Cache** — if `_ctx.Cache` is wired and `_ctx.TagMap.ResolveCacheTtlMs`
returns a positive TTL for the read range, the response cache is
checked first. A hit short-circuits everything, including the
`EnsureBackendConnectedAsync` call. See
[`./ResponseCache.md`](./ResponseCache.md).
2. **Coalesce** — on a cache miss (or no cache configured), the request
consults `_inFlightByKey` via `TryAttachOrCreate`. A hit attaches the
new party to an in-flight peer and emits no backend traffic.
3. **Backend** — on a coalescing miss, the factory branch allocates a
proxy `TxId` through `TxIdAllocator`, registers the entry in
`CorrelationMap`, runs the BCD rewriter on the request PDU, and queues
the frame onto the outbound channel.
The order is load-bearing. Cache hits avoid both backend traffic **and**
any coalescing-entry housekeeping. Coalescing hits avoid the backend but
still incur a list-append and a fan-out. Backend round-trips are the most
expensive of the three.
## Counter Accounting
`PerPlcContext.Counters` exposes three coalescing-specific counters, all
surfaced on the status page:
- **`coalescedHitCount`** — increments inside `OnUpstreamFrameAsync` when
`TryAttachOrCreate` returns `wasNew == false` (the request attached to
an existing in-flight entry).
- **`coalescedMissCount`** — increments when `wasNew == true`. The
non-coalescing FC03/FC04 path also increments this counter when
coalescing is disabled, so the identity `coalescedHitCount +
coalescedMissCount == total FC03+FC04 requests since multiplexer
construction` holds regardless of `Enabled`.
- **`coalescedResponseToDeadUpstream`** — increments inside the backend
reader's fan-out loop when a coalesced party's pipe has gone dead
(`party.Pipe.IsAlive == false`) before the response landed. Only
counted when the in-flight entry had more than one party — single-party
dead-upstream skips are the normal Phase-9 behaviour and are silent.
When `ReadCoalescing.Enabled == false`, `coalescedHitCount` remains zero
and every FC03/FC04 read increments `coalescedMissCount`. Aggregate fleet
metrics (hit ratio, requests per second) read directly from these
counters; see [`../Operations/StatusPage.md`](../Operations/StatusPage.md).
The Debug-level log events `mbproxy.coalesce.hit`,
`mbproxy.coalesce.miss`, and `mbproxy.coalesce.dead_upstream` mirror each
counter increment; see [`../Reference/LogEvents.md`](../Reference/LogEvents.md).
## Transparency Contract Preserved
Each upstream client receives the same response shape it would have
received from a one-to-one proxy:
- **Original MBAP `TxId` restored.** The backend reader patches
`outFrame[0..2]` with `party.OriginalTxId` for each party in the
`InterestedParties` list. The proxy's internal TxId never reaches an
upstream socket.
- **BCD rewriter runs once.** `_pipeline.Process(ResponseToClient, ...)`
fires exactly once against the shared backend response buffer. Cached
rewriter context (start address, quantity) comes from the
`InFlightRequest` that opened the round-trip.
- **One-party fan-out reuses the buffer.** When
`inFlight.InterestedParties.Count == 1`, the backend reader assigns the
original `frame` reference to `outFrame` instead of cloning, saving the
allocation. Multi-party fan-outs clone the frame per party so each can
carry a distinct `TxId` without trampling its peers.
Coalescing is invisible at the wire-protocol layer. An upstream client
cannot tell whether its read was served by a fresh backend round-trip or
by attaching to a peer's in-flight request — only the timing distribution
changes.
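The per-party patch step can be sketched as below. `FanOutSketch.BuildPerPartyFrames` is a hypothetical helper written for this illustration; the real backend reader sends each frame through the party's pipe rather than returning a list. The only protocol fact relied on is that the MBAP transaction ID occupies the first two bytes of the frame in big-endian order.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class FanOutSketch
{
    // `frame` is a complete MBAP frame whose first two bytes currently hold the
    // proxy's internal TxId. Returns one outbound frame per original TxId.
    public static List<byte[]> BuildPerPartyFrames(byte[] frame, IReadOnlyList<ushort> originalTxIds)
    {
        var result = new List<byte[]>(originalTxIds.Count);
        foreach (ushort txId in originalTxIds)
        {
            // One party: reuse the buffer. Several parties: clone, so each copy
            // can carry its own TxId without overwriting a peer's.
            byte[] outFrame = originalTxIds.Count == 1 ? frame : (byte[])frame.Clone();

            // Restore the upstream client's own transaction ID (big-endian).
            BinaryPrimitives.WriteUInt16BigEndian(outFrame.AsSpan(0, 2), txId);
            result.Add(outFrame);
        }
        return result;
    }
}
```

For two parties with original TxIds `0x0001` and `0x1234`, the two returned frames start with `00 01` and `12 34` respectively and are otherwise byte-identical to the backend response.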
## Related Documentation
- [`./ConnectionModel.md`](./ConnectionModel.md) — multiplexer overview;
the `InterestedParties` seam, `CorrelationMap`, and `TxIdAllocator` live
here.
- [`./ResponseCache.md`](./ResponseCache.md) — bounded-staleness cache that
sits above coalescing in the lookup order; cache hits short-circuit
coalescing entirely.
- [`../Operations/StatusPage.md`](../Operations/StatusPage.md) — exposes
`coalescedHitCount`, `coalescedMissCount`, and
`coalescedResponseToDeadUpstream` per PLC and as fleet aggregates.
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — full
`mbproxy.coalesce.*` event catalogue with event IDs.
- [`../Operations/Configuration.md`](../Operations/Configuration.md) —
binding for `Mbproxy.Resilience.ReadCoalescing.Enabled` and `MaxParties`,
hot-reload semantics.
- [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md) — the
rewriter that runs once on the shared response buffer before fan-out.
@@ -0,0 +1,398 @@
# Response Cache
The response cache is an opt-in per-tag, bounded-staleness layer that serves
FC03 and FC04 reads from in-process memory. It sits above read coalescing in
the request path so a hit avoids both the coalescing entry and the backend
round-trip entirely.
## Cache Contract
The cache is **off by default for every tag**. `CacheTtlMs = 0` on every BCD
tag is the default state, and a deployment that ships without any TTL
configuration behaves identically to one compiled without the cache at all
— no in-memory entries are created, every FC03/FC04 read falls through to
the coalescing-then-backend path, and counters that track cache activity
stay at zero.
Operators opt a tag in by setting a positive `CacheTtlMs`. That positive
value is the explicit acknowledgement of the staleness window: the operator
is stating, "I am willing for upstream clients to see a value up to N
milliseconds old in exchange for taking the read off the backend." There is
no implicit cache enablement. There is no global cache toggle that turns
caching on for previously-uncached tags. Every cached tag is one whose
configuration has a positive TTL on its line.
This stance is the design-contract pivot the cache introduces: before it,
the proxy is purely transparent except for BCD rewriting. With the cache,
the proxy is transparent **by default**, with an opt-in cache layer the
operator can engage tag-by-tag.
## TTL Resolution Order
Each FC03/FC04 read range resolves to one effective TTL through three
tiers:
1. **Explicit per-tag.** `BcdTagOptions.CacheTtlMs` on the tag entry. A
non-null value wins regardless of the per-PLC default. An explicit `0`
here disables caching for that tag even when the PLC default is
positive.
2. **Per-PLC default.** `PlcOptions.DefaultCacheTtlMs` applies to any tag
whose explicit `CacheTtlMs` is `null` (unset). A `0` default means "no
caching by default at this PLC."
3. **Zero.** With nothing set at either tier, the resolved TTL is `0` and
the read is uncached.
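As a rough sketch, the three tiers reduce to a null-coalescing chain. The `CacheTtlResolution.Resolve` helper is hypothetical, and modelling the per-PLC default as a nullable `int?` is an assumption made here so that "unset" can be represented; the real options binder may represent it differently.

```csharp
public static class CacheTtlResolution
{
    // explicitTagTtlMs: the tag's own CacheTtlMs, null when unset.
    // plcDefaultTtlMs:  the PLC-level default, null when unset (assumed nullable here).
    public static int Resolve(int? explicitTagTtlMs, int? plcDefaultTtlMs)
        => explicitTagTtlMs ?? plcDefaultTtlMs ?? 0;
}
```

Note that an explicit `0` is a non-null value, so `Resolve(0, 500)` yields `0` and the tag stays uncached, while `Resolve(null, 500)` yields `500` and `Resolve(null, null)` yields `0`.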
`BcdTagMap.ResolveCacheTtlMs(startAddress, qty)` implements the per-read
resolution. It enumerates the BCD tags whose register footprints intersect
the requested range and returns the smallest TTL across the hits. It returns
`0` if any intersecting tag is uncached or if the range covers no configured tags.
```csharp
public int ResolveCacheTtlMs(ushort startAddress, ushort qty)
{
    if (!TryGetForRange(startAddress, qty, out var hits) || hits.Count == 0)
        return 0;

    int min = int.MaxValue;
    foreach (var hit in hits)
    {
        int ttl = hit.Tag.CacheTtlMs;
        if (ttl <= 0) return 0;
        if (ttl < min) min = ttl;
    }
    return min == int.MaxValue ? 0 : min;
}
```
The `hit.Tag.CacheTtlMs` value resolved on each `BcdTag` already reflects
the explicit-then-default order — the options binder resolves the per-tag
override against the per-PLC default at config build time, so the runtime
hot path sees a single integer per tag.
## Multi-Tag Range TTL Rule
When a single FC03/FC04 read covers multiple configured BCD tags, the
effective TTL is the minimum across them:
```text
range covers tags { A:TTL=500, B:TTL=2000, C:TTL=100 }  → effective TTL = 100
range covers tags { A:TTL=500, B:TTL=0 (uncached) }     → effective TTL = 0
range covers tags { A:TTL=500 }                         → effective TTL = 500
range covers no configured tags                         → effective TTL = 0
```
If any covered tag has `CacheTtlMs = 0`, the whole read is uncached. The
rationale is conservative-by-design: a multi-tag read whose narrowest TTL
is, for example, 100 ms cannot be served safely from an entry that was
stored under a tag with TTL 2 s, because that entry's freshness was only
guaranteed by the longer window. Rather than partition a range read across
heterogeneous TTLs or invent inheritance rules that an operator would have
to reason about per-deployment, the cache refuses to serve any multi-tag
read whose narrowest covered TTL is zero. Operators who want a tag cached
in isolation but uncached when read alongside an uncached neighbour get the
expected behaviour by leaving the neighbour at `CacheTtlMs = 0`.
A read whose range covers no configured BCD tags also resolves to `0`.
There is nothing to be conservative about because the cache only serves
ranges that contain rewriter-tracked tags — a read of plain non-BCD
registers does not engage the cache regardless of any per-PLC default.
## Lookup Order
The multiplexer's FC03/FC04 path consults three tiers in fixed order:
1. **Cache.** When `_ctx.Cache` is wired and `BcdTagMap.ResolveCacheTtlMs`
returns a positive TTL for the read range, `ResponseCache.TryGet` is
called against a `CacheKey(unitId, fc, startAddress, qty)`. A hit
splices the cached payload onto a fresh MBAP header carrying the
original upstream TxId, pushes the frame onto that pipe's response
channel, and **returns without engaging coalescing or the backend at
all**.
2. **Coalesce.** On a cache miss (or when the resolved TTL is zero), the
request is offered to `InFlightByKeyMap.TryAttachOrCreate`. A hit
attaches the new party to a peer's in-flight request.
3. **Backend.** On a coalescing miss, the request opens a proxy TxId,
registers a `CorrelationMap` entry, passes through the BCD rewriter
(which rewrites only FC06 and FC16 request payloads, so a read
request's PDU is unchanged), and is queued onto the outbound channel.
The cache check happens **before** the multiplexer's
`EnsureBackendConnectedAsync` call. A cache hit serves the upstream even
when the backend socket is currently disconnected or recovering. This is
not an accident — the cached payload's freshness is bounded by its TTL,
not by the liveness of the backend socket. See
[`../Operations/Troubleshooting.md`](../Operations/Troubleshooting.md) for
the operator view of cache-served reads during a backend outage.
## Storage Format: Post-Rewriter Bytes
`CacheEntry.PduBytes` holds the **post-rewriter response PDU body** — the
function code byte, the byte count, and the rewriter-decoded register
data, with no MBAP header. The backend reader task decodes the response
through `BcdPduPipeline` first and only then hands the rewritten payload
to `ResponseCache.Set`.
```csharp
internal sealed record CacheEntry(
    byte[] PduBytes,
    DateTimeOffset CachedAtUtc,
    DateTimeOffset ExpiresAtUtc,
    int Length,
    long LastUsedTick);
```
Storing post-rewriter bytes is both a CPU optimisation and a correctness
guarantee:
- **CPU.** A cache hit returns ready-to-send bytes. The rewriter does not
re-run per hit; only the MBAP header is regenerated to carry the
upstream's original TxId.
- **Correctness.** An entry decoded by an earlier rewriter version is
never retroactively re-transformed by a newer version. If the
rewriter's behaviour changes mid-process (it does not today, but the
guarantee is durable across future changes), existing cached entries
age out under their TTL and are replaced by fresh entries decoded
through the new rewriter. An already-stored entry is never re-encoded
in either direction.
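To illustrate the "only the MBAP header is regenerated" step, here is a minimal sketch of splicing a cached PDU body onto a fresh header. `CacheHitFrame.Build` is a hypothetical helper, not the proxy's actual `MbapFrame` API. The header layout it writes is the standard 7-byte MBAP header: transaction ID, protocol ID (always 0 for Modbus), a length field that counts the unit ID plus the PDU, and the unit ID.

```csharp
using System;
using System.Buffers.Binary;

public static class CacheHitFrame
{
    // Builds a complete Modbus TCP frame from a cached PDU body
    // (function code, byte count, register data).
    public static byte[] Build(ushort originalTxId, byte unitId, ReadOnlySpan<byte> cachedPdu)
    {
        var frame = new byte[7 + cachedPdu.Length];

        // Bytes 0-1: the upstream client's own transaction ID.
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(0, 2), originalTxId);
        // Bytes 2-3: protocol identifier, 0 for Modbus.
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(2, 2), 0);
        // Bytes 4-5: number of bytes that follow (unit ID + PDU).
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(4, 2), (ushort)(cachedPdu.Length + 1));
        // Byte 6: unit identifier.
        frame[6] = unitId;

        // The cached bytes are already rewriter-decoded, so they are copied verbatim.
        cachedPdu.CopyTo(frame.AsSpan(7));
        return frame;
    }
}
```

For a cached FC03 body `03 02 04 D2` (one register holding 1234), TxId `0x1234`, and unit 1, the result is the 11-byte frame `12 34 00 00 00 05 01 03 02 04 D2`.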
## Write Invalidation by Address Range Overlap
A successful (non-exception) FC06 or FC16 response invalidates every
cached FC03 or FC04 entry whose address range
`[StartAddress, StartAddress + Qty)` overlaps the write range
`[writeStart, writeStart + writeQty)`. The pure overlap math lives in
`CacheInvalidator.FindOverlapping`:
```csharp
int writeEnd = writeStart + writeQty;   // half-open upper bound
foreach (var key in haystack)
{
    if (key.UnitId != unitId) continue;
    if (key.Fc != 0x03 && key.Fc != 0x04) continue;

    int keyEnd = key.StartAddress + key.Qty;
    // Overlap iff writeStart < keyEnd AND key.StartAddress < writeEnd.
    if (writeStart < keyEnd && key.StartAddress < writeEnd)
        hits.Add(key);
}
```
Worked examples on a single unit ID:
```text
Write to register 105 (qty=1)
  └─ invalidates cached FC03 [100..110) (register 105 is inside the cached range)
  └─ leaves cached FC03 [200..210) untouched
Write to registers [10..15) (qty=5)
  └─ leaves cached FC03 [15..20) untouched (half-open intervals, 15 is not in [10..15))
Write to registers [98..108) (qty=10)
  └─ invalidates cached FC03 [100..110) (ranges overlap on [100..108))
Three properties of the invalidator deserve calling out:
- **Exception responses do not invalidate.** A Modbus exception (code 01,
02, 03, 04, or any other) means the write did not take effect on the
PLC. The cached read is still consistent with the device, so the
invalidator is not engaged.
- **Different unit IDs never invalidate each other.** Multi-drop devices
and gateway personalities behind a shared socket address expose
logically separate Modbus tables, so `CacheKey.UnitId` keeps them apart.
- **Only FC03 and FC04 entries are evicted.** The cache never stores write
responses, so the invalidator's function-code filter is defensive
rather than load-bearing.
## Bounded Capacity (LRU)
Each `ResponseCache` instance is capped at `Cache.MaxEntriesPerPlc`
(default 1000). When the dictionary is at the cap and a fresh insert
arrives, `EvictLeastRecentlyUsed` walks the entries and removes the one
with the smallest `CacheEntry.LastUsedTick`. The linear scan is
intentional — at 1000 entries the scan is cheaper than the network
round-trip the cache is saving, and a sorted secondary structure would
add complexity for no measurable win.
`LastUsedTick` is a monotonic 64-bit counter incremented on every hit and
every fresh insert. Using the counter rather than `DateTimeOffset.UtcNow`
keeps the hot path free of clock calls and survives wall-clock skew.
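
The eviction policy can be sketched in a few lines. This is a simplified, single-threaded illustration: `LruSketch`, its string keys, and its tuple-valued dictionary are hypothetical stand-ins, and the real `ResponseCache` presumably adds synchronisation that is omitted here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the capped cache; not thread-safe.
public sealed class LruSketch
{
    private readonly Dictionary<string, (byte[] Pdu, long LastUsedTick)> _entries = new();
    private readonly int _maxEntries;
    private long _tick; // monotonic counter, bumped on every hit and every insert

    public LruSketch(int maxEntries) => _maxEntries = maxEntries;

    public int Count => _entries.Count;

    public bool TryGet(string key, out byte[] pdu)
    {
        if (_entries.TryGetValue(key, out var entry))
        {
            _entries[key] = (entry.Pdu, ++_tick); // a hit refreshes recency
            pdu = entry.Pdu;
            return true;
        }
        pdu = Array.Empty<byte>();
        return false;
    }

    public void Set(string key, byte[] pdu)
    {
        // Evict only when a genuinely new key would push the dictionary past the cap.
        if (!_entries.ContainsKey(key) && _entries.Count >= _maxEntries)
            EvictLeastRecentlyUsed();
        _entries[key] = (pdu, ++_tick);
    }

    private void EvictLeastRecentlyUsed()
    {
        bool found = false;
        string victim = "";
        long oldest = long.MaxValue;

        // Linear scan for the smallest tick; no secondary sorted structure.
        foreach (var (key, entry) in _entries)
        {
            if (entry.LastUsedTick < oldest)
            {
                oldest = entry.LastUsedTick;
                victim = key;
                found = true;
            }
        }
        if (found) _entries.Remove(victim);
    }
}
```

With a cap of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because the read moved `a` ahead of `b` in tick order.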
A background task drives proactive expiry. The constructor starts a
`PeriodicTimer` at `Cache.EvictionIntervalMs` (default 5000 ms; values
under 100 ms are clamped at 100 ms to prevent tight loops) and the
eviction loop sweeps every entry whose `ExpiresAtUtc` has passed. The
loop is the safety net that keeps abandoned entries — say, those for a
PLC whose upstream clients have all dropped — from holding memory until
process exit. Lazy expiry on `TryGet` still removes entries on demand
when traffic is steady; the background loop only matters under low- or
zero-traffic conditions.
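
A rough sketch of the background sweep follows, assuming a plain dictionary of expiry timestamps. `EvictionSketch` and its members are hypothetical names, the `<=` comparison at the exact expiry instant is an assumption, and the sketch does no locking, which a cache shared with request threads would need.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class EvictionSketch
{
    // Intervals under 100 ms are raised to 100 ms to prevent a tight loop.
    public static TimeSpan ClampInterval(int evictionIntervalMs) =>
        TimeSpan.FromMilliseconds(Math.Max(100, evictionIntervalMs));

    // Removes every entry whose expiry is at or before nowUtc; returns the number removed.
    public static int SweepExpired(Dictionary<string, DateTimeOffset> expiresAtByKey, DateTimeOffset nowUtc)
    {
        var expired = new List<string>();
        foreach (var (key, expiresAtUtc) in expiresAtByKey)
        {
            if (expiresAtUtc <= nowUtc) expired.Add(key);
        }
        foreach (var key in expired) expiresAtByKey.Remove(key);
        return expired.Count;
    }

    // Runs the sweep once per tick until the token is cancelled.
    public static async Task RunAsync(
        Dictionary<string, DateTimeOffset> expiresAtByKey, int evictionIntervalMs, CancellationToken ct)
    {
        using var timer = new PeriodicTimer(ClampInterval(evictionIntervalMs));
        try
        {
            while (await timer.WaitForNextTickAsync(ct))
                SweepExpired(expiresAtByKey, DateTimeOffset.UtcNow);
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path: cancellation ends the loop.
        }
    }
}
```

Keeping the sweep as a pure function of the current time makes it testable without waiting on the timer.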
## Long-TTL Safety Gate
`MbproxyOptionsValidator.ValidateCacheTtl` rejects any explicit
`CacheTtlMs > 60_000` unless `Cache.AllowLongTtl = true`. The same gate
applies to `PlcOptions.DefaultCacheTtlMs`. The rejection runs at config
bind / hot-reload time, so a misconfigured `appsettings.json` fails fast
before the cache sees the value.
The gate exists to catch the "left at 1 hour by accident" mistake — a
deployment where a developer set `CacheTtlMs = 3_600_000` for a debugging
session and the value survived into production. Operators who legitimately
need long TTLs (slow-moving setpoints, configuration values that change
once per shift) flip `Cache.AllowLongTtl` to `true` as the explicit
acknowledgement that the long staleness window is intentional.
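
The gate reduces to a single predicate. The sketch below is illustrative only: `TtlGateSketch` and `IsTtlAllowed` are hypothetical names, and treating an unset TTL as `null` is an assumption about how "explicit" values are modelled. The real `MbproxyOptionsValidator.ValidateCacheTtl` likely reports a validation failure rather than returning a boolean.

```csharp
using System;

public static class TtlGateSketch
{
    public const int LongTtlThresholdMs = 60_000;

    // Returns true when the configured TTL passes the long-TTL gate.
    public static bool IsTtlAllowed(int? cacheTtlMs, bool allowLongTtl)
    {
        if (cacheTtlMs is null) return true;                 // nothing explicit to validate
        if (cacheTtlMs.Value <= LongTtlThresholdMs) return true;
        return allowLongTtl;                                 // long TTLs need the explicit opt-in
    }
}
```

Under this sketch a leftover `CacheTtlMs = 3_600_000` is rejected until `Cache.AllowLongTtl` is set, while a TTL of exactly 60 000 ms passes without it.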
## Cache and the Rewriter
The BCD rewriter runs **once** on the cache-miss path: the backend reader
task decodes the response through `BcdPduPipeline` and only then hands the
decoded bytes to `ResponseCache.Set`. Cache hits return the stored
post-rewriter bytes directly.
This division has two consequences worth restating:
- **The rewriter cost is amortised across hits.** A high cache hit ratio
on a tag-dense PLC drops the per-request rewriter cost from "every
response" to "every cache-miss response." For a hot register at
TTL=500 ms polled every 50 ms, for example, that is roughly one
response in ten.
- **The cached payload is decoupled from the rewriter implementation.**
An entry stored under one rewriter does not get re-transformed if the
rewriter changes. Entries age out under TTL and are replaced by fresh
entries decoded under the current rewriter — there is no in-place
recomputation pass.
## Hot-Reload Semantics
Configuration changes propagate through `IOptionsMonitor<MbproxyOptions>`.
The cache reacts to five kinds of change:
| Change | Cache behaviour |
|--------|----------------|
| Tag's `CacheTtlMs` changed (`0 → N`, `N → 0`, `N → M`) | Entire PLC cache is flushed via `ResponseCache.Clear()`; entries re-populate on demand under the new TTL. |
| New PLC added / removed | New PLC starts with an empty cache; removed PLC's `ResponseCache` is disposed with the multiplexer. |
| `Cache.AllowLongTtl` flipped | Validation runs on the next reload only; existing entries are unaffected. |
| `Cache.MaxEntriesPerPlc` changed | Existing entries are unaffected; the new cap applies to subsequent inserts. |
| `Cache.EvictionIntervalMs` changed | Existing eviction loop continues with its old period; subsequent loops use the new interval. |
Per-tag flush granularity is intentionally not implemented. The clean move
is "any tag-list change to a PLC → drop every entry for that PLC and let
the natural traffic re-populate." Tracking which keys correspond to which
tag IDs adds bookkeeping for no operational win — a tag-list reload is
already a once-in-a-while event, and the rebuild cost on the affected
PLC's hot keys is one round-trip per key under traffic.
See [`../Features/HotReload.md`](../Features/HotReload.md) for the
broader `IOptionsMonitor` propagation model.
## Cache Survives Backend Disconnects
A cached entry's data was valid when stored. A subsequent backend
disconnect does not retroactively invalidate it — the value the upstream
client sees on a hit is the value the PLC reported within the TTL
window, irrespective of whether the backend socket is up at the moment
of the hit. This is the cache's most operationally visible property
during PLC outages: upstream consumers that read hot tags within the
cache window continue to receive responses while the listener supervisor
is in `recovering` state.
The companion rule on the write side keeps the invariant consistent:
**invalidations during a `recovering` listener state are skipped**. If
the backend is down, an FC06 or FC16 write did not reach the PLC, so the
cached read is still consistent with the device's actual state. Skipping
the invalidation matches reality — the write did not take effect, so the
read is not stale.
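
Taken together with the exception-response rule, the decision to invalidate can be sketched as a small predicate. `InvalidationGateSketch` and its parameters are hypothetical; the only protocol fact relied on is that a Modbus exception response sets the high bit (0x80) of the function code.

```csharp
using System;

public static class InvalidationGateSketch
{
    // Decides whether a write response should trigger range invalidation.
    public static bool ShouldInvalidate(byte responseFunctionCode, bool listenerRecovering)
    {
        // Backend is down: the write never reached the PLC, so cached reads are still accurate.
        if (listenerRecovering) return false;

        // Exception response (e.g. 0x86 for FC06, 0x90 for FC16): the write did not take effect.
        if ((responseFunctionCode & 0x80) != 0) return false;

        // Only successful FC06 (0x06) and FC16 (0x10) responses invalidate.
        return responseFunctionCode == 0x06 || responseFunctionCode == 0x10;
    }
}
```

Both skip conditions follow from the same principle: an entry is evicted only when the device's state may actually have changed.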
## No Persistence
The cache is purely in-memory. Process restart wipes every entry. There
is no file-backed snapshot, no Redis or other external store, and no
last-known-good replay. A restarted service rebuilds its cache from
fresh backend round-trips driven by upstream traffic, exactly as it
would after a TTL-induced flush.
This is intentional, for two reasons. First, the staleness contract is bounded
by `CacheTtlMs` measured from when the data was first read, and a
persisted entry would re-emerge with an unknown wall-clock age — every
invariant the cache offers would need a freshness field, freshness
arithmetic on load, and recovery against a clock that may have jumped.
Second, the operational model is that the proxy is a stateless
transformer; treating its cache as durable state would change the
deployment story for no measurable production benefit.
## Counter Accounting
`ProxyCounters` exposes six cache metrics per PLC, surfaced on the
status page as both per-PLC and fleet-aggregate values:
- **`cacheHitCount`** — FC03/FC04 requests served from the cache. Bumped
inside `OnUpstreamFrameAsync` when `ResponseCache.TryGet` returns true.
- **`cacheMissCount`** — FC03/FC04 requests whose resolved TTL was
positive but whose key was not in the cache (or whose entry had
expired). The identity `cacheHitCount + cacheMissCount = total
cache-eligible FC03/FC04 requests` holds — reads whose effective TTL
is `0` (uncached) increment neither counter.
- **`cacheHitRatio`** — derived on the status page snapshot as
`cacheHitCount / (cacheHitCount + cacheMissCount)` when the
denominator is non-zero.
- **`cacheInvalidations`** — count of cache entries invalidated by
successful FC06/FC16 write responses, summed across writes.
- **`cacheEntryCount`** — point-in-time snapshot of
`ResponseCache.Count` (Tier-2 memory-watch KPI).
- **`cacheBytes`** — point-in-time approximation of cached PDU bytes,
computed as the running sum of `CacheEntry.Length` across entries
(Tier-2 memory-watch KPI).
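
The derived ratio can be sketched as below. `CacheRatioSketch` is a hypothetical name, and returning `null` for a zero denominator is an assumption; the behaviour of the status page snapshot in that case is not specified here.

```csharp
using System;

public static class CacheRatioSketch
{
    // hits / (hits + misses), or null when no cache-eligible request has been seen yet.
    public static double? HitRatio(long cacheHitCount, long cacheMissCount)
    {
        long eligible = cacheHitCount + cacheMissCount;
        return eligible == 0 ? (double?)null : (double)cacheHitCount / eligible;
    }
}
```

Reads with an effective TTL of `0` never enter either counter, so they cannot dilute the ratio.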
The structured log events `mbproxy.cache.hit`, `mbproxy.cache.miss`,
`mbproxy.cache.store`, `mbproxy.cache.invalidated`, and
`mbproxy.cache.flushed` (defined in `CacheLogEvents`) mirror the counter
increments at Debug level for incident-time diagnosis. Counters are the
steady-state observability surface; the events are for tracing one
request through the cache when something looks wrong. See
[`../Operations/StatusPage.md`](../Operations/StatusPage.md) and
[`../Reference/LogEvents.md`](../Reference/LogEvents.md).
## Design-Contract Note
The cache changes the proxy's posture from "purely transparent except
for BCD rewriting" to "transparent by default, with an opt-in cache
layer." The transition is deliberate and operator-driven: setting
`CacheTtlMs > 0` on a tag is the explicit consent to the staleness
window, and a deployment that ships no positive TTLs is observationally
indistinguishable from one compiled without the cache code path.
There is no global switch, no implicit warm-up, and no behavioural
divergence from the transparent baseline until the operator opts in
tag-by-tag. The cache is the only place in the proxy where an upstream
read can resolve to a value that did not just round-trip the wire, and
its engagement is gated entirely by the per-tag and per-PLC TTL
configuration described above.
## Related Documentation
- [`./ConnectionModel.md`](./ConnectionModel.md) — TxId multiplexing,
correlation map, and the backend socket the cache short-circuits on a
hit.
- [`./ReadCoalescing.md`](./ReadCoalescing.md) — sits below the cache in
the lookup order; cache hits short-circuit coalescing entirely.
- [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md) — the
`BcdPduPipeline` whose post-decode bytes the cache stores.
- [`../Features/HotReload.md`](../Features/HotReload.md) — the
`IOptionsMonitor` propagation that drives the per-PLC flush on
tag-list change.
- [`../Operations/Configuration.md`](../Operations/Configuration.md) —
binding for `BcdTagOptions.CacheTtlMs`,
`PlcOptions.DefaultCacheTtlMs`, and the `Cache` section
(`AllowLongTtl`, `MaxEntriesPerPlc`, `EvictionIntervalMs`).
- [`../Operations/StatusPage.md`](../Operations/StatusPage.md) — exposes
`cacheHitCount`, `cacheMissCount`, `cacheHitRatio`,
`cacheInvalidations`, `cacheEntryCount`, and `cacheBytes`.
- [`../Operations/Troubleshooting.md`](../Operations/Troubleshooting.md)
— the operator view of cache-served reads while a backend is in
`recovering` state.
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — full
`mbproxy.cache.*` event catalogue with event IDs.
- [`../Testing/Simulator.md`](../Testing/Simulator.md) — the
`pymodbus` DL205 stand-in used by the end-to-end cache tests.
- [`../design.md`](../design.md) — canonical design decisions and
rationale.