mbproxy: add in-flight read coalescing (Phase 10)

When an upstream client sends an FC03/FC04 read that matches a request
already in flight on the same PLC's multiplexed backend socket, attach
the late arrival to the existing InFlightRequest.InterestedParties list
instead of opening a second backend round-trip. The single backend
response fans out to every attached party, with each party's original
MBAP TxId restored individually. There is zero post-response staleness:
coalescing operates entirely within the in-flight window (microseconds
to ~10 ms typical); the proxy is NOT a cache layer.

Headline mechanism:

- New record struct CoalescingKey(UnitId, Fc, StartAddress, Qty) keys
  the per-PLC InFlightByKeyMap. FC03 and FC04 are separate Modbus
  tables and never share a key; different unit IDs never coalesce;
  writes (FC06/FC16) bypass the coalescing path entirely.
- InFlightByKeyMap guards a Dictionary with a single lock;
  TryAttachOrCreate atomically either appends a new party to the
  in-flight request's mutable List<InterestedParty> or invokes a factory
  to build a fresh entry. A per-entry MaxParties cap (default 32) bounds
  fan-out cost; past the cap, the next arrival opens a new entry.
- PlcMultiplexer.OnUpstreamFrameAsync takes the coalescing path for
  FC03/FC04 when Mbproxy.Resilience.ReadCoalescing.Enabled. The
  factory closure does the Phase-9 work (allocate TxId, add to
  CorrelationMap); the channel send happens AFTER returning from
  TryAttachOrCreate so the map lock is not held across the async send.
- Response fan-out in RunBackendReaderAsync removes the entry from
  InFlightByKeyMap before iterating InterestedParties, ensuring no
  concurrent attach can mutate the list during iteration.
- Cascade + watchdog paths also drain the key map so a stale entry
  cannot outlive its backend round-trip.

Counter accounting balance (per snapshot): CoalescedHitCount +
CoalescedMissCount equals total FC03 + FC04 requests since startup.
Even with coalescing disabled, every read still bumps Miss so dashboard
math stays balanced.

New surface (additive only):
- src/Mbproxy/Proxy/Multiplexing/CoalescingKey.cs
- src/Mbproxy/Proxy/Multiplexing/InFlightByKeyMap.cs
- src/Mbproxy/Proxy/Multiplexing/CoalescingLogEvents.cs
- ReadCoalescingOptions on ResilienceOptions
- CoalescedHitCount / CoalescedMissCount /
  CoalescedResponseToDeadUpstream counters surfaced on /status.json
  per PLC and as a compact "Coal" cell on the HTML status page.

Phase 9 test patch: TwoUpstreams_ProxyTxIds_AreDistinct_OnTheWire
previously read the same register from both clients (which now
coalesces). Patched to read two different addresses so the test still
proves distinct backend TxIds without violating the coalescing
contract.

Tests added: 24 new (19 unit + 5 E2E):
- CoalescingKeyTests (5)
- InFlightByKeyMapTests (6, includes concurrent stress)
- ReadCoalescingTests (8, stub-backend with deterministic delay)
- ReadCoalescingE2ETests (5, pymodbus simulator; coalescing-active
  during overlap is proven against the stub, not the sim, due to
  pymodbus 3.13's known concurrent-frame bug)

Total: 325 tests passing (282 unit + 43 E2E).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-14 02:26:06 -04:00
parent 56eee3c563
commit a2dba4bd07
25 changed files with 1888 additions and 52 deletions
+22 -2
@@ -115,14 +115,28 @@ public sealed record BcdTag(ushort Address, byte Width); // Width ∈ { 16, 32
- **Unsigned only.** DL205/DL260 BCD is non-negative in the default ladder pattern; the proxy does not implement signed BCD.
- **Holding-register and input-register addresses share the same space.** The rewriter applies the configured tag list against both FC03 and FC04 reads.
## Read coalescing (Phase 10)
After Phase 10, FC03 / FC04 requests are additionally subject to **in-flight read coalescing** before they reach the backend. When two or more upstream clients send the same `(unitId, fc, startAddress, qty)` tuple within the in-flight window of an already-routed request, the multiplexer attaches each late arrival to the existing `InFlightRequest.InterestedParties` list instead of opening a second backend round-trip. The single backend response is fanned out to every attached party with each party's original MBAP TxId restored individually.
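The attach-or-create step can be sketched as follows. This is a minimal Python sketch, not the proxy's C# code. `CoalescingKey`, `InFlightRequest` and `InFlightByKeyMap` mirror the names above, but their fields and the `try_attach_or_create` / `remove` signatures are assumptions. The `make_entry` factory, which hands out sequential TxIds, is a hypothetical stand-in for the Phase-9 TxId allocation and `CorrelationMap` registration. As described for the real multiplexer, the caller would send the fresh request to the backend only after `try_attach_or_create` returns, so the lock is never held across I/O.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class CoalescingKey:
    # Frozen dataclass gives value equality + hashing, like a C# record struct.
    unit_id: int
    fc: int             # 3 or 4; FC03 and FC04 never share a key
    start_address: int
    qty: int


@dataclass
class InFlightRequest:
    proxy_tx_id: int
    interested_parties: List[str] = field(default_factory=list)


class InFlightByKeyMap:
    """Hypothetical per-PLC map: a single lock around a plain dict."""

    def __init__(self, max_parties: int = 32) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[CoalescingKey, InFlightRequest] = {}
        self._max_parties = max_parties

    def try_attach_or_create(
        self, key: CoalescingKey, party: str,
        factory: Callable[[], InFlightRequest],
    ) -> Tuple[InFlightRequest, bool]:
        """Return (entry, attached). attached=False means the caller owns a
        fresh entry and must send it to the backend after this returns."""
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and len(entry.interested_parties) < self._max_parties:
                entry.interested_parties.append(party)   # coalesced hit
                return entry, True
            # No entry yet, or the existing one is full: open a new round-trip.
            # A full entry stays in flight but no longer accepts attaches.
            entry = factory()
            entry.interested_parties.append(party)
            self._entries[key] = entry
            return entry, False

    def remove(self, key: CoalescingKey, entry: InFlightRequest) -> bool:
        """Detach an entry (response, cascade or watchdog) before fan-out so
        no concurrent attach can mutate its party list during iteration."""
        with self._lock:
            if self._entries.get(key) is entry:
                del self._entries[key]
                return True
            return False  # already replaced by a newer entry for the same key


# Usage: MaxParties = 2, three clients read the same FC03 block.
_next_tx = iter(range(1, 0x10000))  # hypothetical TxId allocator


def make_entry() -> InFlightRequest:
    return InFlightRequest(proxy_tx_id=next(_next_tx))


m = InFlightByKeyMap(max_parties=2)
key = CoalescingKey(unit_id=1, fc=3, start_address=100, qty=4)
e1, hit1 = m.try_attach_or_create(key, "client-A", make_entry)  # miss, TxId 1
e2, hit2 = m.try_attach_or_create(key, "client-B", make_entry)  # hit, same entry
e3, hit3 = m.try_attach_or_create(key, "client-C", make_entry)  # cap reached, TxId 2
```

Removing by identity rather than by key alone means that when a full entry's response arrives, the response cannot evict the newer entry that replaced it for the same key.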
Properties:
- **Zero post-response staleness.** Coalescing operates entirely between "first request sent to backend" and "response received from backend" (microseconds to ~10 ms typical). Once the response is fanned out, the coalescing entry dies. The proxy is NOT a cache layer — the value each upstream sees is the same value an uncoalesced request would have returned within the PLC's scan-time precision.
- **Only FC03 / FC04.** Writes (FC06 / FC16) are non-idempotent on BCD tags and never coalesced. Different function codes never share a `CoalescingKey` even at the same address (FC03 and FC04 read different Modbus tables). Different `unitId` bytes never coalesce (different PLC personalities behind a shared socket).
- **Bounded fan-out via `MaxParties`** (default 32, set by `Mbproxy.Resilience.ReadCoalescing.MaxParties`). Once an entry has `MaxParties` interested clients, the next arrival opens a fresh entry. This bounds the response fan-out cost per entry at O(MaxParties) and shields the backend reader task from a pathological pile-on.
- **Hot-reloadable on/off.** `Mbproxy.Resilience.ReadCoalescing.Enabled` defaults to `true`. Flipping it to `false` at runtime lets already in-flight coalesced entries drain naturally; subsequent FC03/04 requests take the Phase-9 path (one backend round-trip per upstream request).
- **Transparency contract preserved.** Each upstream client still sees its own original MBAP TxId on the response. The BCD rewriter runs once on the shared response buffer; per-party copies are only made when fan-out has more than one party.
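Per-party TxId restoration can be pictured with a short Python sketch. It assumes the Modbus TCP frame layout, in which bytes 0-1 of the MBAP header carry the big-endian transaction id. The `Party` shape, its `connected` flag and the `fan_out` helper are hypothetical. The counter key reuses the `coalescedResponseToDeadUpstream` name from `/status.json`. The shared buffer is assumed to have already passed through the BCD rewriter once.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Party:
    original_tx_id: int            # the TxId this upstream client sent
    connected: bool                # hypothetical liveness flag
    send: Callable[[bytes], None]


def fan_out(shared: bytearray, parties: List[Party], counters: Dict[str, int]) -> None:
    """Deliver one backend response to every attached party.

    `shared` is the backend frame after the BCD rewrite has run once on it.
    Bytes 0-1 of the MBAP header hold the big-endian transaction id.
    """
    for party in parties:
        if not party.connected:
            counters["coalescedResponseToDeadUpstream"] += 1
            continue
        # Copy only when several parties share the buffer; a lone party
        # can take the shared buffer as-is.
        frame = bytearray(shared) if len(parties) > 1 else shared
        struct.pack_into(">H", frame, 0, party.original_tx_id)
        party.send(bytes(frame))


# FC03 response: proxy TxId 0x0042, protocol 0, length 5, unit 1,
# byte count 2, one register holding 123 (already BCD-decoded).
response = bytearray(b"\x00\x42\x00\x00\x00\x05\x01\x03\x02\x00\x7b")
sent: List[bytes] = []
parties = [
    Party(0x0001, True, sent.append),
    Party(0x1234, True, sent.append),
    Party(0x0007, False, sent.append),  # upstream already disconnected
]
counters = {"coalescedResponseToDeadUpstream": 0}
fan_out(response, parties, counters)
```

Only the first two bytes differ between the frames each client receives. The rest of the MBAP header and the PDU are byte-identical, which is what keeps coalescing invisible to clients.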
Counter accounting balance (per snapshot): `coalescedHitCount + coalescedMissCount` equals the total FC03 + FC04 requests seen since the multiplexer was constructed. Both counters increment regardless of whether the coalescing feature is enabled — `coalescedHitCount` is 0 when disabled, but every read still increments `coalescedMissCount`.
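The balance rule amounts to counting every FC03/FC04 request exactly once, as either a hit or a miss. The Python sketch below shows this under stated assumptions. `CoalescingCounters` and `on_read` are hypothetical names, and a lock stands in for the C# `Interlocked` counters; the lock also lets `snapshot()` read both values at one instant. `peer_in_flight` is a simplified boolean meaning "an attach would succeed".

```python
import threading
from typing import Tuple


class CoalescingCounters:
    """Hit/miss counters; a lock stands in for C# Interlocked here."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._hit = 0
        self._miss = 0

    def record(self, attached: bool) -> None:
        with self._lock:
            if attached:
                self._hit += 1
            else:
                self._miss += 1

    def snapshot(self) -> Tuple[int, int]:
        with self._lock:
            return self._hit, self._miss


def on_read(counters: CoalescingCounters, enabled: bool, peer_in_flight: bool) -> None:
    """Count one FC03/FC04 request. With coalescing disabled the request
    always opens its own round-trip, so it is always a miss."""
    counters.record(enabled and peer_in_flight)


c = CoalescingCounters()
for enabled, peer in [(True, False), (True, True), (True, True), (False, True), (False, False)]:
    on_read(c, enabled, peer)
hit, miss = c.snapshot()  # hit == 2, miss == 3: five reads in total
```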
## Rewriter — function code scope
The rewriter inspects and rewrites payloads only for these function codes; every other FC (coils, discrete inputs, diagnostics, exception responses) passes through byte-for-byte:
| FC | Direction | Action |
|----|----------------|-----------------------------------------------------------------------|
| 03 | request + response | FC03 requests may be coalesced with peers before reaching the backend (see Phase-10 section above); response re-encodes covered BCD slots from raw nibbles → binary integer |
| 04 | request + response | Same coalescing eligibility as FC03; response re-encoding is the same as FC03 (input-register table also surfaces V-memory) |
| 06 | request | Re-encode binary integer → BCD nibbles before forwarding |
| 06 | response | Decode BCD nibbles → binary integer on the echo (clients validate that the echoed value equals the value they sent; without this, NModbus-style clients throw on the round-trip) |
| 16 | request | Per-register over the configured slots, then forward |
@@ -176,6 +190,9 @@ Stable event names (keep these stable so log queries don't churn):
| `mbproxy.multiplex.backend.disconnected` | Warning | `Plc`, `UpstreamCount`, `InFlightCount`, `Reason` |
| `mbproxy.multiplex.saturated` | Error | `Plc`, `RemoteEp` (16-bit TxId space full) |
| `mbproxy.multiplex.request.timeout` | Warning | `Plc`, `ProxyTxId`, `OriginalTxId`, `Fc`, `ElapsedMs` |
| `mbproxy.coalesce.hit` | Debug | `Plc`, `UnitId`, `Fc`, `Start`, `Qty`, `PartyCount` |
| `mbproxy.coalesce.miss` | Debug | `Plc`, `UnitId`, `Fc`, `Start`, `Qty` |
| `mbproxy.coalesce.dead_upstream` | Debug | `Plc`, `UnitId`, `Fc`, `Start`, `Qty` |
## Status page — read-only HTTP endpoint
@@ -214,6 +231,9 @@ Authentication is assumed to live at the network layer (trusted internal segment
| `backend.connects.success` / `backend.connects.failed` | Polly-final-result counters |
| `backend.exceptions.byCode` | `{ "01": n, "02": n, "03": n, "04": n }` |
| `backend.lastRoundTripMs` | EWMA of recent successful round-trip times |
| `backend.coalescedHitCount` | FC03/04 requests that attached to an already-in-flight peer (Phase 10) |
| `backend.coalescedMissCount` | FC03/04 requests that opened a fresh backend round-trip (Phase 10). `Hit + Miss` = total FC03/04 requests |
| `backend.coalescedResponseToDeadUpstream` | Coalesced fan-out responses skipped because the attached upstream had already disconnected (Phase 10) |
| `bytes.upstreamIn` / `bytes.upstreamOut` | Bytes forwarded each direction |
Counters are `long` fields accessed through `System.Threading.Interlocked`, so each is read atomically per request with no locking on the read path.