mbproxy: add in-flight read coalescing (Phase 10)
When two or more upstream clients send the same FC03/FC04 read while a matching request is already in flight on the same PLC's multiplexed backend socket, attach the late arrivals to the existing InFlightRequest .InterestedParties list instead of opening a second backend round-trip. The single backend response fans out to every attached party with each party's original MBAP TxId restored individually. Zero post-response staleness — coalescing operates entirely within the in-flight window (microseconds to ~10 ms typical); the proxy is NOT a cache layer. Headline mechanism: - New record struct CoalescingKey(UnitId, Fc, StartAddress, Qty) keys the per-PLC InFlightByKeyMap. FC03 and FC04 are separate Modbus tables and never share a key; different unit IDs never coalesce; writes (FC06/FC16) bypass the coalescing path entirely. - InFlightByKeyMap uses a simple lock around a Dictionary; atomic TryAttachOrCreate either appends a new party to the in-flight request's mutable List<InterestedParty> or invokes a factory to build a fresh entry. Per-entry MaxParties cap (default 32) bounds fan-out cost; past the cap, the next arrival opens a new entry. - PlcMultiplexer.OnUpstreamFrameAsync takes the coalescing path for FC03/FC04 when Mbproxy.Resilience.ReadCoalescing.Enabled. The factory closure does the Phase-9 work (allocate TxId, add to CorrelationMap); the channel send happens AFTER returning from TryAttachOrCreate so the map lock is not held across the async send. - Response fan-out in RunBackendReaderAsync removes the entry from InFlightByKeyMap before iterating InterestedParties, ensuring no concurrent attach can mutate the list during iteration. - Cascade + watchdog paths also drain the key map so a stale entry cannot outlive its backend round-trip. Counter accounting balance (per snapshot): CoalescedHitCount + CoalescedMissCount equals total FC03 + FC04 requests since startup. Even with coalescing disabled, every read still bumps Miss so dashboard math stays balanced. New surface (additive only): - src/Mbproxy/Proxy/Multiplexing/CoalescingKey.cs - src/Mbproxy/Proxy/Multiplexing/InFlightByKeyMap.cs - src/Mbproxy/Proxy/Multiplexing/CoalescingLogEvents.cs - ReadCoalescingOptions on ResilienceOptions - CoalescedHitCount / CoalescedMissCount / CoalescedResponseToDeadUpstream counters surfaced on /status.json per PLC and as a compact "Coal" cell on the HTML status page. Phase 9 test patch: TwoUpstreams_ProxyTxIds_AreDistinct_OnTheWire previously read the same register from both clients (which now coalesces). Patched to read two different addresses so the test still proves distinct backend TxIds without violating the coalescing contract. Tests added: 24 new (19 unit + 5 E2E): - CoalescingKeyTests (5) - InFlightByKeyMapTests (6, includes concurrent stress) - ReadCoalescingTests (8, stub-backend with deterministic delay) - ReadCoalescingE2ETests (5, pymodbus simulator; coalescing-active during overlap is proven against the stub, not the sim, due to pymodbus 3.13's known concurrent-frame bug) Total: 325 tests passing (282 unit + 43 E2E). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+4
-3
@@ -25,19 +25,20 @@ The full design plan is in **[`docs/design.md`](docs/design.md)** — settled 20
|
||||
|
||||
- **One `TcpListener` per PLC** (54 distinct ports). Each PLC has **one shared backend socket** owned by a `PlcMultiplexer`; many upstream clients are multiplexed onto that single backend via MBAP TxId rewriting (Phase 9). The H2-ECOM100's 4-client cap no longer caps upstream connections.
|
||||
- **Transparent pass-through** of every byte except the MBAP TxId field (rewritten by the multiplexer on each request and restored on each response) and FC03/FC04 response payloads + FC06/FC16 request payloads at configured BCD addresses (re-encoded between BCD nibbles and binary integers).
|
||||
- **In-flight FC03/FC04 read coalescing** (Phase 10): same-key reads arriving while a peer is in flight attach to the existing `InFlightRequest.InterestedParties` list; the single backend response fans out to every attached client with original TxIds restored. Zero post-response staleness — coalescing entries die when the response arrives. Hot-reload via `Mbproxy.Resilience.ReadCoalescing.Enabled`.
|
||||
- **Polly-backed listener supervisor** auto-recovers any listener that fails to bind at startup or faults at runtime; the same code path also brings up newly-added PLCs from hot-reload and tears down removed ones.
|
||||
- **`appsettings.json` is hot-reloadable** via `IOptionsMonitor<MbproxyOptions>`; tag-list changes propagate per-PDU, PLC add/remove flows through the supervisor.
|
||||
- **Polly bounded retries** on backend connect (3 attempts at 100ms / 500ms / 2000ms). No retries on mid-request failures (FC06/FC16 are non-idempotent on BCD tags). A per-request watchdog in the multiplexer surfaces Modbus exception 0x0B to the upstream client if a backend response never arrives within `BackendRequestTimeoutMs`.
|
||||
- **Backend disconnect cascades upstream**: when the shared backend socket dies, every attached upstream pipe is closed in the same cycle (counter `BackendDisconnectCascades`); clients reconnect on their next request.
|
||||
- **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields `inFlight`, `maxInFlight`, `txIdWraps`, `disconnectCascades`, `queueDepth`).
|
||||
- **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields `inFlight`, `maxInFlight`, `txIdWraps`, `disconnectCascades`, `queueDepth` and Phase-10 coalescing fields `coalescedHitCount`, `coalescedMissCount`, `coalescedResponseToDeadUpstream`).
|
||||
|
||||
Anything beyond this short list — JSON schema, propagation table, stable log event names, status counter catalog, test plan — lives in `docs/design.md`. Open that doc before writing code; keep it in sync when decisions change.
|
||||
|
||||
## Current state
|
||||
|
||||
**Implementation complete through Phase 9.** Phases 00–08 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model without changing the transparent-rewrite contract. The service is production-ready as a Windows Service:
|
||||
**Implementation complete through Phase 10.** Phases 00–08 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model without changing the transparent-rewrite contract; Phase 10 added in-flight read coalescing as an additive optimization on top of the multiplexer. The service is production-ready as a Windows Service:
|
||||
|
||||
- 301 tests passing: 263 unit tests + 38 E2E tests (against the pymodbus DL205 simulator + stub backends).
|
||||
- 325 tests passing: 282 unit tests + 43 E2E tests (against the pymodbus DL205 simulator + stub backends).
|
||||
- Single-file self-contained publish (`dotnet publish -c Release -r win-x64`).
|
||||
- PowerShell install/uninstall scripts under `install/`.
|
||||
- Graceful shutdown with configurable drain timeout (`Connection.GracefulShutdownTimeoutMs`, default 10 s).
|
||||
|
||||
Reference in New Issue
Block a user