56eee3c563
Adds the mbproxy service end-to-end. Phases 00-08 implement the production-ready single-listener / 1:1-backend transparent Modbus TCP proxy with bidirectional BCD rewriting for the ~54-PLC DL205/DL260 fleet. Phase 9 replaces the connection layer with a single backend socket per PLC plus MBAP TxId rewriting, lifting the H2-ECOM100's 4-concurrent-client cap as an operational ceiling. Phase 9 additions of note: - PlcMultiplexer + UpstreamPipe + TxIdAllocator + CorrelationMap - InFlightRequest with IReadOnlyList<InterestedParty> (load-bearing for Phase 10 read coalescing — do not collapse to a single field) - Per-request watchdog: surfaces Modbus exception 0x0B to upstream on BackendRequestTimeoutMs, defending against lost responses, dead-PLC paths, and pymodbus 3.13.0's concurrent-multiplexed- request bug (its ServerRequestHandler.last_pdu state race) - Status DTO + HTML gain inFlight / maxInFlight / txIdWraps / disconnectCascades / queueDepth (Tier 1.6 in docs/kpi.md) Tests: 263 unit + 38 E2E. Multiplexer correctness under truly concurrent backend traffic is proved against a stub backend in PlcMultiplexerTests; MultiplexerE2ETests paces requests so pymodbus 3.13's single-PDU framer stays in known-good mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
309 lines
21 KiB
Markdown
309 lines
21 KiB
Markdown
# Phase 10 — Read coalescing (in-flight only, zero staleness)
|
||
|
||
When two or more upstream clients send the same FC03/FC04 request to the same PLC while a matching request is already in flight, attach the late arrivals to the existing in-flight entry and fan out the single backend response to all attached clients. Operates entirely within the in-flight window (microseconds to ~10 ms typical) — no post-response caching, no TTL, no staleness contract change.
|
||
|
||
**Status:** post-1.0 follow-on, depends on Phase 9.
|
||
**Depends on:** Phase 09 (multiplexer + `InFlightRequest` with `InterestedParties` list shape).
|
||
**Parallel-safe with:** nothing. The phase modifies `PlcMultiplexer.OnFrame` and the backend reader fan-out path; both are tightly coupled.
|
||
|
||
## Goal
|
||
|
||
Phase 9's multiplexer routes every upstream request individually, even when two upstream clients are asking for identical data. In a fleet of 54 PLCs where the HMI, historian, and engineering workstation all poll the same screen tags every second, that's up to 3× redundant backend traffic per overlapping read — and the H2-ECOM100's single-request-per-scan internal serialization means redundant traffic compounds into measurable backend latency.
|
||
|
||
Phase 10 detects same-key reads within the in-flight window and serves them from a single backend response. Coalescing operates entirely between "first request sent to backend" and "response received from backend." Once the response is fanned out, the coalescing entry dies. No values are held past the response arrival; no invalidation logic; no design-doc change to the "not a polling/cache layer" stance.
|
||
|
||
## Why this is safe — the zero-staleness argument
|
||
|
||
A coalesced response is a value the backend was going to return to the first request anyway. By the time the second client's request arrives, the first request is already on the wire to the PLC. The PLC's response represents the register values at the moment the PLC serviced the request. Even if the second request had been sent separately on its own backend round-trip, the H2-ECOM100's internal serialization would have queued it behind the first, returning the same value (or a value as old as one extra PLC scan ≈ 2-10 ms older).
|
||
|
||
In other words: the only thing Phase 10 changes is whether the proxy sends one or two requests to the PLC. The answer the upstream clients see is identical (or fresher than the "two requests" alternative, since coalescing means the second client doesn't wait for a second backend round-trip).
|
||
|
||
## Outputs (new files in this phase)
|
||
|
||
```
|
||
src/Mbproxy/Proxy/Multiplexing/CoalescingKey.cs # readonly record struct
|
||
src/Mbproxy/Proxy/Multiplexing/InFlightByKeyMap.cs # ConcurrentDictionary wrapper with atomic attach-or-create
|
||
src/Mbproxy/Proxy/Multiplexing/CoalescingLogEvents.cs # [LoggerMessage] vocab for this phase
|
||
|
||
tests/Mbproxy.Tests/Proxy/Multiplexing/CoalescingKeyTests.cs
|
||
tests/Mbproxy.Tests/Proxy/Multiplexing/InFlightByKeyMapTests.cs
|
||
tests/Mbproxy.Tests/Proxy/Multiplexing/ReadCoalescingTests.cs
|
||
tests/Mbproxy.Tests/Proxy/Multiplexing/ReadCoalescingE2ETests.cs
|
||
```
|
||
|
||
## Files modified (existing files in this phase)
|
||
|
||
```
|
||
src/Mbproxy/Proxy/Multiplexing/PlcMultiplexer.cs # OnFrame learns coalescing path; reader fans out
|
||
src/Mbproxy/Proxy/ProxyCounters.cs # new: CoalescedHitCount, CoalescedMissCount, CoalescedResponseToDeadUpstream
|
||
src/Mbproxy/Options/ResilienceOptions.cs # new: ReadCoalescing sub-options
|
||
src/Mbproxy/Admin/StatusDto.cs # PlcBackendStatus gains coalescing fields
|
||
src/Mbproxy/Admin/StatusSnapshotBuilder.cs # populate new fields
|
||
src/Mbproxy/Admin/StatusHtmlRenderer.cs # show coalescing ratio in per-PLC row
|
||
|
||
docs/design.md # Rewriter section: note FC03/04 may be coalesced before reaching backend
|
||
docs/kpi.md # graduate "coalescing ratio" KPI from future to supported
|
||
install/mbproxy.config.template.json # add the new Resilience.ReadCoalescing section with comments
|
||
```
|
||
|
||
`InFlightRequest.cs` does **not** change — the `InterestedParties` list shape was specifically introduced in Phase 9 to make this phase additive.
|
||
|
||
## Tasks
|
||
|
||
### 10.1 Data types
|
||
|
||
1. **`CoalescingKey`** — `readonly record struct CoalescingKey(byte UnitId, byte Fc, ushort StartAddress, ushort Qty)`. Hash key for the in-flight-by-key map. Auto-generated record-struct equality. Verify hashcode distribution is reasonable for typical V-memory address ranges (smoke-test in unit tests).
|
||
|
||
2. **`InFlightByKeyMap`** — wraps `ConcurrentDictionary<CoalescingKey, InFlightRequest>` plus a small lock for atomic attach-or-create. Methods:
|
||
- `bool TryAttachOrCreate(CoalescingKey key, InterestedParty party, Func<InFlightRequest> factory, int maxParties, out InFlightRequest req, out bool wasNew)` — atomic: if the key exists and `req.InterestedParties.Count < maxParties`, append the party to a freshly-built `IReadOnlyList<InterestedParty>` (since the record is immutable, we substitute a new `InFlightRequest` with the extended list in the map) and return `(wasNew=false)`; else call factory to build a new entry, store it, return `(wasNew=true)`.
|
||
- `bool TryRemove(CoalescingKey key, out InFlightRequest req)` — called by the backend reader after fan-out completes.
|
||
- The "attach to existing" path is the load-bearing concurrency primitive of this phase. The simpler implementation: small `lock` around the attach branch. The lock-free implementation uses `AddOrUpdate` with a comparand check. Pick the simpler one; document the choice in code.
|
||
|
||
### 10.2 Multiplexer integration
|
||
|
||
3. **Request path** in `PlcMultiplexer.OnFrame`:
|
||
|
||
```csharp
|
||
bool coalesceCandidate = (fc is 0x03 or 0x04)
|
||
&& resilienceOptions.CurrentValue.ReadCoalescing.Enabled;
|
||
if (coalesceCandidate)
|
||
{
|
||
var key = new CoalescingKey(unitId, fc, startAddr, qty);
|
||
var party = new InterestedParty(upstreamPipe, originalTxId);
|
||
|
||
InFlightRequest? req;
|
||
bool wasNew;
|
||
inFlightByKey.TryAttachOrCreate(
|
||
key, party,
|
||
factory: () => BuildAndRegisterNew(unitId, fc, startAddr, qty, party),
|
||
maxParties: resilienceOptions.CurrentValue.ReadCoalescing.MaxParties,
|
||
out req, out wasNew);
|
||
|
||
if (!wasNew)
|
||
{
|
||
counters.IncrementCoalescedHit();
|
||
return; // do NOT send to backend — first request will get the response
|
||
}
|
||
counters.IncrementCoalescedMiss();
|
||
// fall through: factory already allocated proxyTxId + added to correlation map + sent
|
||
return;
|
||
}
|
||
|
||
// FC06/FC16 or coalescing disabled: existing Phase 9 path (allocate, register, send).
|
||
```
|
||
|
||
The factory closure does the existing Phase 9 work (TxId allocate, correlation map add, MBAP rewrite, send to outbound channel). The new code only adds the "is this already in-flight?" check before that work.
|
||
|
||
4. **Response fan-out** in the backend reader task — already shaped correctly by Phase 9; this phase just makes sure the `CoalescingKey` matching the response is also removed from `InFlightByKeyMap` alongside the `CorrelationMap` removal:
|
||
|
||
```csharp
|
||
if (correlationMap.TryRemove(proxyTxId, out var req))
|
||
{
|
||
txIdAllocator.Release(proxyTxId);
|
||
|
||
// Also clear the coalescing key so a new identical request after this point starts fresh.
|
||
var key = new CoalescingKey(req.UnitId, req.Fc, req.StartAddress, req.Qty);
|
||
inFlightByKey.TryRemove(key, out _);
|
||
|
||
// Phase 9's fan-out loop — already iterates InterestedParties.
|
||
foreach (var party in req.InterestedParties)
|
||
{
|
||
if (!party.Pipe.IsAlive)
|
||
{
|
||
counters.IncrementCoalescedResponseToDeadUpstream();
|
||
continue;
|
||
}
|
||
var partyFrame = WithTxId(responseFrame, party.OriginalTxId);
|
||
party.Pipe.SendResponse(partyFrame);
|
||
}
|
||
}
|
||
```
|
||
|
||
### 10.3 Configuration
|
||
|
||
5. **Extend `ResilienceOptions`:**
|
||
|
||
```csharp
|
||
public sealed class ReadCoalescingOptions
|
||
{
|
||
public bool Enabled { get; init; } = true;
|
||
public int MaxParties { get; init; } = 32;
|
||
}
|
||
|
||
public sealed class ResilienceOptions
|
||
{
|
||
public RetryProfile BackendConnect { get; init; } = new();
|
||
public RecoveryProfile ListenerRecovery { get; init; } = new();
|
||
public ReadCoalescingOptions ReadCoalescing { get; init; } = new(); // ← new
|
||
}
|
||
```
|
||
|
||
Hot-reloadable via the existing `IOptionsMonitor<MbproxyOptions>` wiring. Disabling `Enabled` at runtime means new requests take the non-coalescing path; existing in-flight coalesced entries drain naturally.
|
||
|
||
6. **`mbproxy.config.template.json` update** — add a commented `ReadCoalescing` block to the install template under `Resilience` with the two new keys, default values, and a one-paragraph explanation.
|
||
|
||
### 10.4 Counters and status surfacing
|
||
|
||
7. **`ProxyCounters` additions:**
|
||
|
||
```csharp
|
||
public void IncrementCoalescedHit();
|
||
public void IncrementCoalescedMiss();
|
||
public void IncrementCoalescedResponseToDeadUpstream();
|
||
```
|
||
|
||
`CounterSnapshot` gains `CoalescedHitCount`, `CoalescedMissCount`, `CoalescedResponseToDeadUpstream` — all `long`, all Interlocked. The status page derives `coalescingRatio = Hit / (Hit + Miss)` for display; the raw counts are exposed in JSON for downstream tooling.
|
||
|
||
8. **`/status.json` per-PLC fields** — extend `PlcBackendStatus`:
|
||
|
||
```csharp
|
||
public sealed record PlcBackendStatus(
|
||
long ConnectsSuccess, long ConnectsFailed,
|
||
ExceptionCounts ExceptionsByCode,
|
||
double LastRoundTripMs,
|
||
long CoalescedHitCount, // ← new
|
||
long CoalescedMissCount, // ← new
|
||
long CoalescedResponseToDeadUpstream); // ← new
|
||
```
|
||
|
||
9. **HTML page** — extend the per-PLC row with a compact `Coal: 73%` cell (`hit / (hit+miss) * 100`, rounded). Page-weight assertion (under 50 KB for 54 PLCs) must continue to pass.
|
||
|
||
### 10.5 Documentation
|
||
|
||
10. **`docs/design.md` Rewriter section:** add a paragraph clarifying that FC03/FC04 requests may be coalesced with other in-flight requests of the same `(unitId, fc, start, qty)` before reaching the backend. Emphasize that the transparency contract holds — each client sees its own original TxId restored on the response, and the response value is identical to what an uncoalesced request would have returned (within the PLC's scan-time precision).
|
||
|
||
11. **`docs/kpi.md` Tier 1:** the new `coalescedHitCount`, `coalescedMissCount`, derived `coalescingRatio` graduate from "future" to "supported" Tier 1 fields. Mention the `coalescedResponseToDeadUpstream` counter as a low-priority Tier 2 informational metric.
|
||
|
||
## Public surface declared in this phase
|
||
|
||
```csharp
|
||
namespace Mbproxy.Proxy.Multiplexing;
|
||
|
||
internal readonly record struct CoalescingKey(
|
||
byte UnitId, byte Fc, ushort StartAddress, ushort Qty);
|
||
|
||
internal sealed class InFlightByKeyMap
|
||
{
|
||
public bool TryAttachOrCreate(
|
||
CoalescingKey key,
|
||
InterestedParty party,
|
||
Func<InFlightRequest> factory,
|
||
int maxParties,
|
||
out InFlightRequest req,
|
||
out bool wasNew);
|
||
public bool TryRemove(CoalescingKey key, out InFlightRequest req);
|
||
public int Count { get; }
|
||
}
|
||
```
|
||
|
||
```csharp
|
||
namespace Mbproxy.Options;
|
||
|
||
public sealed class ReadCoalescingOptions
|
||
{
|
||
public bool Enabled { get; init; } = true;
|
||
public int MaxParties { get; init; } = 32;
|
||
}
|
||
// Added field on existing ResilienceOptions:
|
||
public ReadCoalescingOptions ReadCoalescing { get; init; } = new();
|
||
```
|
||
|
||
`ProxyCounters` and `CounterSnapshot` gain three new `long` fields. No public-surface removals, no renames.
|
||
|
||
## Tests required
|
||
|
||
### Unit (`Category = Unit`)
|
||
|
||
**`CoalescingKeyTests`** (≥ 4 tests):
|
||
|
||
1. `Equality_OnIdenticalKeys_ReturnsTrue`
|
||
2. `Equality_OnDifferentFc_ReturnsFalse` — FC03 vs FC04 with same start/qty/unit are NOT equal (different Modbus tables).
|
||
3. `Equality_OnDifferentUnitId_ReturnsFalse`
|
||
4. `HashCode_DistributionSanity` — build 10,000 randomly-generated keys, bucket by `Key.GetHashCode() & 0xFF`, assert no bucket has > 5 % of total (rough uniformity check).
|
||
|
||
**`InFlightByKeyMapTests`** (≥ 6 tests):
|
||
|
||
1. `TryAttachOrCreate_NewKey_CallsFactory_ReturnsTrue_WasNewTrue`
|
||
2. `TryAttachOrCreate_ExistingKey_AppendsParty_ReturnsTrue_WasNewFalse`
|
||
3. `TryAttachOrCreate_ExistingKey_AtMaxParties_CreatesFreshEntry_NotAppend` — refuses to fan out beyond the cap; preserves backend-load-shedding guarantee.
|
||
4. `TryRemove_AfterAttach_AllPartiesPresent_InRetrievedEntry`
|
||
5. `TryRemove_OfMissing_ReturnsFalse`
|
||
6. `Concurrent_AttachOrCreate_From_Two_Threads_NoLostParties_AndNoDuplicateEntries` — 100 tasks × 1000 ops each.
|
||
|
||
**`ReadCoalescingTests`** (≥ 7 tests, real sockets, stub backend):
|
||
|
||
1. `TwoClients_SameRequest_OnlyOneBackendRoundTrip` — stub backend counts received requests; assert 1.
|
||
2. `TwoClients_DifferentRequests_BothHitBackend` — different start addresses; assert 2.
|
||
3. `FiveClients_SameRequest_OneBackendRoundTrip_FiveResponses` — fan-out works correctly with 5 attached parties.
|
||
4. `FC03_And_FC04_SameAddress_NOT_Coalesced` — different tables.
|
||
5. `FC06_Write_NeverCoalesced` — writes always allocate their own TxId.
|
||
6. `OneClient_DisconnectsMidFlight_OthersStillGetResponse_AndDeadUpstreamCounterIncrements`
|
||
7. `AtMaxParties_NextRequest_StartsFreshBackendRoundTrip` — verify the cap behaviour: when `MaxParties = 2` and 3 simultaneous clients send the same request, the third opens a new in-flight entry rather than joining the first.
|
||
|
||
### E2E (`Category = E2E`)
|
||
|
||
**`ReadCoalescingE2ETests`** (≥ 5 tests, against pymodbus simulator, `[Collection(nameof(DL205SimulatorCollection))]`):
|
||
|
||
1. `E2E_FiveConcurrentClients_SameReadHR1072_CoalescedHitCount_AtLeast_3` — five NModbus clients connect to the proxy, simultaneously read HR1072 (BCD-configured). Assert `coalescedHitCount >= 3` (race wiggle room — perfect coalescing would give 4 hits, but the racy first-arrivals can both miss).
|
||
2. `E2E_RewriterStillWorks_ForAllCoalescedParties` — same setup, but with BCD tag at 1072. All five clients receive decoded `1234`. Proves the rewriter sees a coalesced response correctly and the TxId restoration doesn't perturb the BCD bytes.
|
||
3. `E2E_DifferentRegisters_NotCoalesced_CoalescedHitCount_Zero` — five clients reading five different addresses; assert no coalescing happened.
|
||
4. `E2E_StatusPage_Shows_CoalescingRatio` — `/status.json` for the test PLC has populated `coalescedHitCount` and `coalescedMissCount` after the burst.
|
||
5. `E2E_DisableViaHotReload_RevertToPhase9Behaviour` — write a temp appsettings with `ReadCoalescing.Enabled = false`, hot-reload, verify subsequent identical reads each hit the backend separately (counter doesn't increment).
|
||
|
||
## Phase gate
|
||
|
||
- [ ] `dotnet build Mbproxy.slnx -c Debug` — zero warnings, zero errors.
|
||
- [ ] All prior tests still green — specifically the **4 critical Phase-9 regression guards**:
|
||
- `Forward_FC03_HR1072_Returns_Decoded_1234`
|
||
- `Forward_FC06_WriteHR200_ThenReadBack_RoundTrips`
|
||
- `Forward_FC16_WriteMultipleHR201_203_ThenReadBack_RoundTrips`
|
||
- `MbapTxId_IsPreservedEndToEnd`
|
||
- [ ] All new unit + e2e tests pass (≥ 17 new).
|
||
- [ ] **Headline assertion:** 5 concurrent FC03 reads of the same register through the proxy produce **at most 2** backend round-trips (allowing one race for the initial pair). Verifiable via stub-backend's request counter in `ReadCoalescingTests`.
|
||
- [ ] FC04 reads of the same address as a coexisting FC03 stream do NOT coalesce together. Verified by an explicit test.
|
||
- [ ] FC06 / FC16 writes are NEVER on the coalescing path. Verified by setting `MaxParties = 1` and confirming write throughput is unaffected.
|
||
- [ ] Coalescing-ratio counter ≥ 50 % under the headline stress test (5 simultaneous identical reads).
|
||
- [ ] Disabling coalescing via `Mbproxy.Resilience.ReadCoalescing.Enabled = false` hot-reloads cleanly; running coalesced entries drain naturally without errors.
|
||
- [ ] `docs/design.md` Rewriter section mentions the coalescing path; `docs/kpi.md` Tier 1 includes the new fields; `install/mbproxy.config.template.json` includes the new commented `Resilience.ReadCoalescing` block.
|
||
- [ ] HTML page weight under 50 KB for 54 PLCs (verify with the existing renderer test).
|
||
|
||
## Out of scope
|
||
|
||
- **Post-response caching** — no TTL, no staleness window beyond "while the request is in flight." This phase is strictly in-flight. A response-cache phase would be a separate plan (Phase 11+) and would require the design.md "not a cache layer" stance to be revisited and rewritten.
|
||
- **Range-overlap coalescing** — request A reading [100..110], request B reading [105..115]. Different keys; no coalescing. Range-overlap detection is a separate optimisation with its own algorithmic complexity (interval trees, etc.) and its own staleness questions (request B's response would include reg 100..104 from A's perspective, but those weren't in B's response).
|
||
- **Cross-PLC coalescing** — each PLC's multiplexer has its own key map. No optimization across PLCs (their backend connections are independent anyway).
|
||
- **Write coalescing / batching** — different problem with non-idempotency concerns. The design doc's "no mid-request retry on writes" principle extends to "no write coalescing."
|
||
- **Predictive batching** — combining a single client's likely-next read into the current request. Out of scope; speculative reads are a different optimization category.
|
||
- **Adaptive `MaxParties`** — staying at the configured value. Auto-tuning is interesting but speculative.
|
||
|
||
## Subagent briefing
|
||
|
||
If you're the agent picking up this phase:
|
||
|
||
1. **Phase 9's `InterestedParties` list is the seam.** This phase only adds the "look up the key, attach a new party to an existing entry" logic. The fan-out side already iterates the list correctly. If you find yourself rewriting Phase 9's response path, you've drifted out of scope.
|
||
|
||
2. **`CoalescingKey` includes `UnitId`.** DL260 fleets typically use unit 1, but we don't assume — different unit IDs are different PLC personalities behind the same TCP socket and must not coalesce.
|
||
|
||
3. **FC03 and FC04 are different tables.** Same register address space in DL series, but Modbus treats them separately. Different `CoalescingKey` for the same address; no coalescing across them.
|
||
|
||
4. **Coalescing is best-effort under races.** Two simultaneous identical requests can both miss the map and create separate entries — counter just shows a lower ratio. Not a bug; documented behaviour. Do not over-engineer with double-checked locking.
|
||
|
||
5. **`MaxParties` is the load-shedding safety valve.** If a thousand HMI panels all attach to one in-flight request, the response fan-out cost goes linear with attachment count and stalls the backend reader task. Cap at 32 by default. Past the cap, route through a fresh entry — fan-out cost per entry is bounded.
|
||
|
||
6. **The attach-or-create operation MUST be atomic per key.** Two simultaneous arrivals must not both create new entries for the same key (would defeat coalescing). The simpler implementation: `lock(map.SyncRoot)` around the attach branch. The lock-free implementation uses `AddOrUpdate` with the updateFactory checking the count cap. Pick whichever you can write correctly in 30 minutes; document the choice.
|
||
|
||
7. **Response fan-out must check `Pipe.IsAlive` per party.** An upstream client that disconnects between attaching and the response arriving — count it as `CoalescedResponseToDeadUpstream` and continue with the others. Do not throw, do not log per-occurrence at Information (would be too noisy under client churn).
|
||
|
||
8. **Hot-reload of `Enabled` doesn't disrupt in-flight entries.** Disabling the feature mid-flight just means subsequent requests take the non-coalescing path. Existing coalesced entries drain when their response arrives. Don't try to "flush" them on the reload event.
|
||
|
||
9. **`CoalescedHit + CoalescedMiss = total FC03+FC04 requests`.** The math has to balance per snapshot. Use `Interlocked.Increment` exclusively. Disabling coalescing means every FC03/04 request becomes a Miss (which is fine — the metric still tracks total reads).
|
||
|
||
10. **Update `design.md` AND `kpi.md` AND the install template in the same PR as the code.** Doc drift is a gate failure. The coalescing-ratio KPI specifically graduates from "future" to "Tier 1 supported" — make that promotion explicit in `kpi.md`.
|
||
|
||
## Cross-references
|
||
|
||
- Phase 9's multiplexer is the foundation. The `InterestedParty` and `InterestedParties` types live there: [`09-txid-multiplexing.md`](09-txid-multiplexing.md).
|
||
- KPI graduation target: [`../kpi.md`](../kpi.md) → Tier 1 (rates / percentiles / availability — coalescing-ratio joins this tier).
|
||
- Modbus unit-ID semantics that make coalescing-key uniqueness load-bearing: [`../../DL260/dl205.md`](../../DL260/dl205.md) → "Function Code Support" and "Coils and Discrete Inputs".
|
||
- Counter snapshot backwards-compat policy that this phase respects (additive only): [`../kpi.md`](../kpi.md) → "Backwards-compat policy".
|