a2dba4bd07
When two or more upstream clients send the same FC03/FC04 read while a matching request is already in flight on the same PLC's multiplexed backend socket, attach the late arrivals to the existing InFlightRequest .InterestedParties list instead of opening a second backend round-trip. The single backend response fans out to every attached party with each party's original MBAP TxId restored individually. Zero post-response staleness — coalescing operates entirely within the in-flight window (microseconds to ~10 ms typical); the proxy is NOT a cache layer. Headline mechanism: - New record struct CoalescingKey(UnitId, Fc, StartAddress, Qty) keys the per-PLC InFlightByKeyMap. FC03 and FC04 are separate Modbus tables and never share a key; different unit IDs never coalesce; writes (FC06/FC16) bypass the coalescing path entirely. - InFlightByKeyMap uses a simple lock around a Dictionary; atomic TryAttachOrCreate either appends a new party to the in-flight request's mutable List<InterestedParty> or invokes a factory to build a fresh entry. Per-entry MaxParties cap (default 32) bounds fan-out cost; past the cap, the next arrival opens a new entry. - PlcMultiplexer.OnUpstreamFrameAsync takes the coalescing path for FC03/FC04 when Mbproxy.Resilience.ReadCoalescing.Enabled. The factory closure does the Phase-9 work (allocate TxId, add to CorrelationMap); the channel send happens AFTER returning from TryAttachOrCreate so the map lock is not held across the async send. - Response fan-out in RunBackendReaderAsync removes the entry from InFlightByKeyMap before iterating InterestedParties, ensuring no concurrent attach can mutate the list during iteration. - Cascade + watchdog paths also drain the key map so a stale entry cannot outlive its backend round-trip. Counter accounting balance (per snapshot): CoalescedHitCount + CoalescedMissCount equals total FC03 + FC04 requests since startup. Even with coalescing disabled, every read still bumps Miss so dashboard math stays balanced. New surface (additive only): - src/Mbproxy/Proxy/Multiplexing/CoalescingKey.cs - src/Mbproxy/Proxy/Multiplexing/InFlightByKeyMap.cs - src/Mbproxy/Proxy/Multiplexing/CoalescingLogEvents.cs - ReadCoalescingOptions on ResilienceOptions - CoalescedHitCount / CoalescedMissCount / CoalescedResponseToDeadUpstream counters surfaced on /status.json per PLC and as a compact "Coal" cell on the HTML status page. Phase 9 test patch: TwoUpstreams_ProxyTxIds_AreDistinct_OnTheWire previously read the same register from both clients (which now coalesces). Patched to read two different addresses so the test still proves distinct backend TxIds without violating the coalescing contract. Tests added: 24 new (19 unit + 5 E2E): - CoalescingKeyTests (5) - InFlightByKeyMapTests (6, includes concurrent stress) - ReadCoalescingTests (8, stub-backend with deterministic delay) - ReadCoalescingE2ETests (5, pymodbus simulator; coalescing-active during overlap is proven against the stub, not the sim, due to pymodbus 3.13's known concurrent-frame bug) Total: 325 tests passing (282 unit + 43 E2E). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
327 lines
24 KiB
Markdown
327 lines
24 KiB
Markdown
# Phase 10 — Read coalescing (in-flight only, zero staleness)
|
||
|
||
When two or more upstream clients send the same FC03/FC04 request to the same PLC while a matching request is already in flight, attach the late arrivals to the existing in-flight entry and fan out the single backend response to all attached clients. Operates entirely within the in-flight window (microseconds to ~10 ms typical) — no post-response caching, no TTL, no staleness contract change.
|
||
|
||
**Status:** shipped (2026-05-14). All gate items green.
|
||
**Depends on:** Phase 09 (multiplexer + `InFlightRequest` with `InterestedParties` list shape).
|
||
**Parallel-safe with:** nothing. The phase modifies `PlcMultiplexer.OnFrame` and the backend reader fan-out path; both are tightly coupled.
|
||
|
||
## Goal
|
||
|
||
Phase 9's multiplexer routes every upstream request individually, even when two upstream clients are asking for identical data. In a fleet of 54 PLCs where the HMI, historian, and engineering workstation all poll the same screen tags every second, that's up to 3× redundant backend traffic per overlapping read — and the H2-ECOM100's single-request-per-scan internal serialization means redundant traffic compounds into measurable backend latency.
|
||
|
||
Phase 10 detects same-key reads within the in-flight window and serves them from a single backend response. Coalescing operates entirely between "first request sent to backend" and "response received from backend." Once the response is fanned out, the coalescing entry dies. No values are held past the response arrival; no invalidation logic; no design-doc change to the "not a polling/cache layer" stance.
|
||
|
||
## Why this is safe — the zero-staleness argument
|
||
|
||
A coalesced response is a value the backend was going to return to the first request anyway. By the time the second client's request arrives, the first request is already on the wire to the PLC. The PLC's response represents the register values at the moment the PLC serviced the request. Even if the second request had been sent separately on its own backend round-trip, the H2-ECOM100's internal serialization would have queued it behind the first, returning the same value (or a value as old as one extra PLC scan ≈ 2-10 ms older).
|
||
|
||
In other words: the only thing Phase 10 changes is whether the proxy sends one or two requests to the PLC. The answer the upstream clients see is identical (or fresher than the "two requests" alternative, since coalescing means the second client doesn't wait for a second backend round-trip).
|
||
|
||
## Outputs (new files in this phase)
|
||
|
||
```
|
||
src/Mbproxy/Proxy/Multiplexing/CoalescingKey.cs # readonly record struct
|
||
src/Mbproxy/Proxy/Multiplexing/InFlightByKeyMap.cs # ConcurrentDictionary wrapper with atomic attach-or-create
|
||
src/Mbproxy/Proxy/Multiplexing/CoalescingLogEvents.cs # [LoggerMessage] vocab for this phase
|
||
|
||
tests/Mbproxy.Tests/Proxy/Multiplexing/CoalescingKeyTests.cs
|
||
tests/Mbproxy.Tests/Proxy/Multiplexing/InFlightByKeyMapTests.cs
|
||
tests/Mbproxy.Tests/Proxy/Multiplexing/ReadCoalescingTests.cs
|
||
tests/Mbproxy.Tests/Proxy/Multiplexing/ReadCoalescingE2ETests.cs
|
||
```
|
||
|
||
## Files modified (existing files in this phase)
|
||
|
||
```
|
||
src/Mbproxy/Proxy/Multiplexing/PlcMultiplexer.cs # OnFrame learns coalescing path; reader fans out
|
||
src/Mbproxy/Proxy/ProxyCounters.cs # new: CoalescedHitCount, CoalescedMissCount, CoalescedResponseToDeadUpstream
|
||
src/Mbproxy/Options/ResilienceOptions.cs # new: ReadCoalescing sub-options
|
||
src/Mbproxy/Admin/StatusDto.cs # PlcBackendStatus gains coalescing fields
|
||
src/Mbproxy/Admin/StatusSnapshotBuilder.cs # populate new fields
|
||
src/Mbproxy/Admin/StatusHtmlRenderer.cs # show coalescing ratio in per-PLC row
|
||
|
||
docs/design.md # Rewriter section: note FC03/04 may be coalesced before reaching backend
|
||
docs/kpi.md # graduate "coalescing ratio" KPI from future to supported
|
||
install/mbproxy.config.template.json # add the new Resilience.ReadCoalescing section with comments
|
||
```
|
||
|
||
`InFlightRequest.cs` does **not** change — the `InterestedParties` list shape was specifically introduced in Phase 9 to make this phase additive.
|
||
|
||
## Tasks
|
||
|
||
### 10.1 Data types
|
||
|
||
1. **`CoalescingKey`** — `readonly record struct CoalescingKey(byte UnitId, byte Fc, ushort StartAddress, ushort Qty)`. Hash key for the in-flight-by-key map. Auto-generated record-struct equality. Verify hashcode distribution is reasonable for typical V-memory address ranges (smoke-test in unit tests).
|
||
|
||
2. **`InFlightByKeyMap`** — wraps `ConcurrentDictionary<CoalescingKey, InFlightRequest>` plus a small lock for atomic attach-or-create. Methods:
|
||
- `bool TryAttachOrCreate(CoalescingKey key, InterestedParty party, Func<InFlightRequest> factory, int maxParties, out InFlightRequest req, out bool wasNew)` — atomic: if the key exists and `req.InterestedParties.Count < maxParties`, append the party to a freshly-built `IReadOnlyList<InterestedParty>` (since the record is immutable, we substitute a new `InFlightRequest` with the extended list in the map) and return `(wasNew=false)`; else call factory to build a new entry, store it, return `(wasNew=true)`.
|
||
- `bool TryRemove(CoalescingKey key, out InFlightRequest req)` — called by the backend reader after fan-out completes.
|
||
- The "attach to existing" path is the load-bearing concurrency primitive of this phase. The simpler implementation: small `lock` around the attach branch. The lock-free implementation uses `AddOrUpdate` with a comparand check. Pick the simpler one; document the choice in code.
|
||
|
||
### 10.2 Multiplexer integration
|
||
|
||
3. **Request path** in `PlcMultiplexer.OnFrame`:
|
||
|
||
```csharp
|
||
bool coalesceCandidate = (fc is 0x03 or 0x04)
|
||
&& resilienceOptions.CurrentValue.ReadCoalescing.Enabled;
|
||
if (coalesceCandidate)
|
||
{
|
||
var key = new CoalescingKey(unitId, fc, startAddr, qty);
|
||
var party = new InterestedParty(upstreamPipe, originalTxId);
|
||
|
||
InFlightRequest? req;
|
||
bool wasNew;
|
||
inFlightByKey.TryAttachOrCreate(
|
||
key, party,
|
||
factory: () => BuildAndRegisterNew(unitId, fc, startAddr, qty, party),
|
||
maxParties: resilienceOptions.CurrentValue.ReadCoalescing.MaxParties,
|
||
out req, out wasNew);
|
||
|
||
if (!wasNew)
|
||
{
|
||
counters.IncrementCoalescedHit();
|
||
return; // do NOT send to backend — first request will get the response
|
||
}
|
||
counters.IncrementCoalescedMiss();
|
||
// fall through: factory already allocated proxyTxId + added to correlation map + sent
|
||
return;
|
||
}
|
||
|
||
// FC06/FC16 or coalescing disabled: existing Phase 9 path (allocate, register, send).
|
||
```
|
||
|
||
The factory closure does the existing Phase 9 work (TxId allocate, correlation map add, MBAP rewrite, send to outbound channel). The new code only adds the "is this already in-flight?" check before that work.
|
||
|
||
4. **Response fan-out** in the backend reader task — already shaped correctly by Phase 9; this phase just makes sure the `CoalescingKey` matching the response is also removed from `InFlightByKeyMap` alongside the `CorrelationMap` removal:
|
||
|
||
```csharp
|
||
if (correlationMap.TryRemove(proxyTxId, out var req))
|
||
{
|
||
txIdAllocator.Release(proxyTxId);
|
||
|
||
// Also clear the coalescing key so a new identical request after this point starts fresh.
|
||
var key = new CoalescingKey(req.UnitId, req.Fc, req.StartAddress, req.Qty);
|
||
inFlightByKey.TryRemove(key, out _);
|
||
|
||
// Phase 9's fan-out loop — already iterates InterestedParties.
|
||
foreach (var party in req.InterestedParties)
|
||
{
|
||
if (!party.Pipe.IsAlive)
|
||
{
|
||
counters.IncrementCoalescedResponseToDeadUpstream();
|
||
continue;
|
||
}
|
||
var partyFrame = WithTxId(responseFrame, party.OriginalTxId);
|
||
party.Pipe.SendResponse(partyFrame);
|
||
}
|
||
}
|
||
```
|
||
|
||
### 10.3 Configuration
|
||
|
||
5. **Extend `ResilienceOptions`:**
|
||
|
||
```csharp
|
||
public sealed class ReadCoalescingOptions
|
||
{
|
||
public bool Enabled { get; init; } = true;
|
||
public int MaxParties { get; init; } = 32;
|
||
}
|
||
|
||
public sealed class ResilienceOptions
|
||
{
|
||
public RetryProfile BackendConnect { get; init; } = new();
|
||
public RecoveryProfile ListenerRecovery { get; init; } = new();
|
||
public ReadCoalescingOptions ReadCoalescing { get; init; } = new(); // ← new
|
||
}
|
||
```
|
||
|
||
Hot-reloadable via the existing `IOptionsMonitor<MbproxyOptions>` wiring. Disabling `Enabled` at runtime means new requests take the non-coalescing path; existing in-flight coalesced entries drain naturally.
|
||
|
||
6. **`mbproxy.config.template.json` update** — add a commented `ReadCoalescing` block to the install template under `Resilience` with the two new keys, default values, and a one-paragraph explanation.
|
||
|
||
### 10.4 Counters and status surfacing
|
||
|
||
7. **`ProxyCounters` additions:**
|
||
|
||
```csharp
|
||
public void IncrementCoalescedHit();
|
||
public void IncrementCoalescedMiss();
|
||
public void IncrementCoalescedResponseToDeadUpstream();
|
||
```
|
||
|
||
`CounterSnapshot` gains `CoalescedHitCount`, `CoalescedMissCount`, `CoalescedResponseToDeadUpstream` — all `long`, all Interlocked. The status page derives `coalescingRatio = Hit / (Hit + Miss)` for display; the raw counts are exposed in JSON for downstream tooling.
|
||
|
||
8. **`/status.json` per-PLC fields** — extend `PlcBackendStatus`:
|
||
|
||
```csharp
|
||
public sealed record PlcBackendStatus(
|
||
long ConnectsSuccess, long ConnectsFailed,
|
||
ExceptionCounts ExceptionsByCode,
|
||
double LastRoundTripMs,
|
||
long CoalescedHitCount, // ← new
|
||
long CoalescedMissCount, // ← new
|
||
long CoalescedResponseToDeadUpstream); // ← new
|
||
```
|
||
|
||
9. **HTML page** — extend the per-PLC row with a compact `Coal: 73%` cell (`hit / (hit+miss) * 100`, rounded). Page-weight assertion (under 50 KB for 54 PLCs) must continue to pass.
|
||
|
||
### 10.5 Documentation
|
||
|
||
10. **`docs/design.md` Rewriter section:** add a paragraph clarifying that FC03/FC04 requests may be coalesced with other in-flight requests of the same `(unitId, fc, start, qty)` before reaching the backend. Emphasize that the transparency contract holds — each client sees its own original TxId restored on the response, and the response value is identical to what an uncoalesced request would have returned (within the PLC's scan-time precision).
|
||
|
||
11. **`docs/kpi.md` Tier 1:** the new `coalescedHitCount`, `coalescedMissCount`, derived `coalescingRatio` graduate from "future" to "supported" Tier 1 fields. Mention the `coalescedResponseToDeadUpstream` counter as a low-priority Tier 2 informational metric.
|
||
|
||
## Public surface declared in this phase
|
||
|
||
```csharp
|
||
namespace Mbproxy.Proxy.Multiplexing;
|
||
|
||
internal readonly record struct CoalescingKey(
|
||
byte UnitId, byte Fc, ushort StartAddress, ushort Qty);
|
||
|
||
internal sealed class InFlightByKeyMap
|
||
{
|
||
public bool TryAttachOrCreate(
|
||
CoalescingKey key,
|
||
InterestedParty party,
|
||
Func<InFlightRequest> factory,
|
||
int maxParties,
|
||
out InFlightRequest req,
|
||
out bool wasNew);
|
||
public bool TryRemove(CoalescingKey key, out InFlightRequest req);
|
||
public int Count { get; }
|
||
}
|
||
```
|
||
|
||
```csharp
|
||
namespace Mbproxy.Options;
|
||
|
||
public sealed class ReadCoalescingOptions
|
||
{
|
||
public bool Enabled { get; init; } = true;
|
||
public int MaxParties { get; init; } = 32;
|
||
}
|
||
// Added field on existing ResilienceOptions:
|
||
public ReadCoalescingOptions ReadCoalescing { get; init; } = new();
|
||
```
|
||
|
||
`ProxyCounters` and `CounterSnapshot` gain three new `long` fields. No public-surface removals, no renames.
|
||
|
||
## Tests required
|
||
|
||
### Unit (`Category = Unit`)
|
||
|
||
**`CoalescingKeyTests`** (≥ 4 tests):
|
||
|
||
1. `Equality_OnIdenticalKeys_ReturnsTrue`
|
||
2. `Equality_OnDifferentFc_ReturnsFalse` — FC03 vs FC04 with same start/qty/unit are NOT equal (different Modbus tables).
|
||
3. `Equality_OnDifferentUnitId_ReturnsFalse`
|
||
4. `HashCode_DistributionSanity` — build 10,000 randomly-generated keys, bucket by `Key.GetHashCode() & 0xFF`, assert no bucket has > 5 % of total (rough uniformity check).
|
||
|
||
**`InFlightByKeyMapTests`** (≥ 6 tests):
|
||
|
||
1. `TryAttachOrCreate_NewKey_CallsFactory_ReturnsTrue_WasNewTrue`
|
||
2. `TryAttachOrCreate_ExistingKey_AppendsParty_ReturnsTrue_WasNewFalse`
|
||
3. `TryAttachOrCreate_ExistingKey_AtMaxParties_CreatesFreshEntry_NotAppend` — refuses to fan out beyond the cap; preserves backend-load-shedding guarantee.
|
||
4. `TryRemove_AfterAttach_AllPartiesPresent_InRetrievedEntry`
|
||
5. `TryRemove_OfMissing_ReturnsFalse`
|
||
6. `Concurrent_AttachOrCreate_From_Two_Threads_NoLostParties_AndNoDuplicateEntries` — 100 tasks × 1000 ops each.
|
||
|
||
**`ReadCoalescingTests`** (≥ 7 tests, real sockets, stub backend):
|
||
|
||
1. `TwoClients_SameRequest_OnlyOneBackendRoundTrip` — stub backend counts received requests; assert 1.
|
||
2. `TwoClients_DifferentRequests_BothHitBackend` — different start addresses; assert 2.
|
||
3. `FiveClients_SameRequest_OneBackendRoundTrip_FiveResponses` — fan-out works correctly with 5 attached parties.
|
||
4. `FC03_And_FC04_SameAddress_NOT_Coalesced` — different tables.
|
||
5. `FC06_Write_NeverCoalesced` — writes always allocate their own TxId.
|
||
6. `OneClient_DisconnectsMidFlight_OthersStillGetResponse_AndDeadUpstreamCounterIncrements`
|
||
7. `AtMaxParties_NextRequest_StartsFreshBackendRoundTrip` — verify the cap behaviour: when `MaxParties = 2` and 3 simultaneous clients send the same request, the third opens a new in-flight entry rather than joining the first.
|
||
|
||
### E2E (`Category = E2E`)
|
||
|
||
**`ReadCoalescingE2ETests`** (≥ 5 tests, against pymodbus simulator, `[Collection(nameof(DL205SimulatorCollection))]`):
|
||
|
||
1. `E2E_FiveConcurrentClients_SameReadHR1072_CoalescedHitCount_AtLeast_3` — five NModbus clients connect to the proxy, simultaneously read HR1072 (BCD-configured). Assert `coalescedHitCount >= 3` (race wiggle room — perfect coalescing would give 4 hits, but the racy first-arrivals can both miss).
|
||
2. `E2E_RewriterStillWorks_ForAllCoalescedParties` — same setup, but with BCD tag at 1072. All five clients receive decoded `1234`. Proves the rewriter sees a coalesced response correctly and the TxId restoration doesn't perturb the BCD bytes.
|
||
3. `E2E_DifferentRegisters_NotCoalesced_CoalescedHitCount_Zero` — five clients reading five different addresses; assert no coalescing happened.
|
||
4. `E2E_StatusPage_Shows_CoalescingRatio` — `/status.json` for the test PLC has populated `coalescedHitCount` and `coalescedMissCount` after the burst.
|
||
5. `E2E_DisableViaHotReload_RevertToPhase9Behaviour` — write a temp appsettings with `ReadCoalescing.Enabled = false`, hot-reload, verify subsequent identical reads each hit the backend separately (counter doesn't increment).
|
||
|
||
## Phase gate
|
||
|
||
- [ ] `dotnet build Mbproxy.slnx -c Debug` — zero warnings, zero errors.
|
||
- [ ] All prior tests still green — specifically the **4 critical Phase-9 regression guards**:
|
||
- `Forward_FC03_HR1072_Returns_Decoded_1234`
|
||
- `Forward_FC06_WriteHR200_ThenReadBack_RoundTrips`
|
||
- `Forward_FC16_WriteMultipleHR201_203_ThenReadBack_RoundTrips`
|
||
- `MbapTxId_IsPreservedEndToEnd`
|
||
- [ ] All new unit + e2e tests pass (≥ 17 new).
|
||
- [ ] **Headline assertion:** 5 concurrent FC03 reads of the same register through the proxy produce **at most 2** backend round-trips (allowing one race for the initial pair). Verifiable via stub-backend's request counter in `ReadCoalescingTests`.
|
||
- [ ] FC04 reads of the same address as a coexisting FC03 stream do NOT coalesce together. Verified by an explicit test.
|
||
- [ ] FC06 / FC16 writes are NEVER on the coalescing path. Verified by setting `MaxParties = 1` and confirming write throughput is unaffected.
|
||
- [ ] Coalescing-ratio counter ≥ 50 % under the headline stress test (5 simultaneous identical reads).
|
||
- [ ] Disabling coalescing via `Mbproxy.Resilience.ReadCoalescing.Enabled = false` hot-reloads cleanly; running coalesced entries drain naturally without errors.
|
||
- [ ] `docs/design.md` Rewriter section mentions the coalescing path; `docs/kpi.md` Tier 1 includes the new fields; `install/mbproxy.config.template.json` includes the new commented `Resilience.ReadCoalescing` block.
|
||
- [ ] HTML page weight under 50 KB for 54 PLCs (verify with the existing renderer test).
|
||
|
||
## Out of scope
|
||
|
||
- **Post-response caching** — no TTL, no staleness window beyond "while the request is in flight." This phase is strictly in-flight. A response-cache phase would be a separate plan (Phase 11+) and would require the design.md "not a cache layer" stance to be revisited and rewritten.
|
||
- **Range-overlap coalescing** — request A reading [100..110], request B reading [105..115]. Different keys; no coalescing. Range-overlap detection is a separate optimisation with its own algorithmic complexity (interval trees, etc.) and its own staleness questions (request B's response would include reg 100..104 from A's perspective, but those weren't in B's response).
|
||
- **Cross-PLC coalescing** — each PLC's multiplexer has its own key map. No optimization across PLCs (their backend connections are independent anyway).
|
||
- **Write coalescing / batching** — different problem with non-idempotency concerns. The design doc's "no mid-request retry on writes" principle extends to "no write coalescing."
|
||
- **Predictive batching** — combining a single client's likely-next read into the current request. Out of scope; speculative reads are a different optimization category.
|
||
- **Adaptive `MaxParties`** — staying at the configured value. Auto-tuning is interesting but speculative.
|
||
|
||
## Subagent briefing
|
||
|
||
If you're the agent picking up this phase:
|
||
|
||
1. **Phase 9's `InterestedParties` list is the seam.** This phase only adds the "look up the key, attach a new party to an existing entry" logic. The fan-out side already iterates the list correctly. If you find yourself rewriting Phase 9's response path, you've drifted out of scope.
|
||
|
||
2. **`CoalescingKey` includes `UnitId`.** DL260 fleets typically use unit 1, but we don't assume — different unit IDs are different PLC personalities behind the same TCP socket and must not coalesce.
|
||
|
||
3. **FC03 and FC04 are different tables.** Same register address space in DL series, but Modbus treats them separately. Different `CoalescingKey` for the same address; no coalescing across them.
|
||
|
||
4. **Coalescing is best-effort under races.** Two simultaneous identical requests can both miss the map and create separate entries — counter just shows a lower ratio. Not a bug; documented behaviour. Do not over-engineer with double-checked locking.
|
||
|
||
5. **`MaxParties` is the load-shedding safety valve.** If a thousand HMI panels all attach to one in-flight request, the response fan-out cost goes linear with attachment count and stalls the backend reader task. Cap at 32 by default. Past the cap, route through a fresh entry — fan-out cost per entry is bounded.
|
||
|
||
6. **The attach-or-create operation MUST be atomic per key.** Two simultaneous arrivals must not both create new entries for the same key (would defeat coalescing). The simpler implementation: `lock(map.SyncRoot)` around the attach branch. The lock-free implementation uses `AddOrUpdate` with the updateFactory checking the count cap. Pick whichever you can write correctly in 30 minutes; document the choice.
|
||
|
||
7. **Response fan-out must check `Pipe.IsAlive` per party.** An upstream client that disconnects between attaching and the response arriving — count it as `CoalescedResponseToDeadUpstream` and continue with the others. Do not throw, do not log per-occurrence at Information (would be too noisy under client churn).
|
||
|
||
8. **Hot-reload of `Enabled` doesn't disrupt in-flight entries.** Disabling the feature mid-flight just means subsequent requests take the non-coalescing path. Existing coalesced entries drain when their response arrives. Don't try to "flush" them on the reload event.
|
||
|
||
9. **`CoalescedHit + CoalescedMiss = total FC03+FC04 requests`.** The math has to balance per snapshot. Use `Interlocked.Increment` exclusively. Disabling coalescing means every FC03/04 request becomes a Miss (which is fine — the metric still tracks total reads).
|
||
|
||
10. **Update `design.md` AND `kpi.md` AND the install template in the same PR as the code.** Doc drift is a gate failure. The coalescing-ratio KPI specifically graduates from "future" to "Tier 1 supported" — make that promotion explicit in `kpi.md`.
|
||
|
||
## Cross-references
|
||
|
||
- Phase 9's multiplexer is the foundation. The `InterestedParty` and `InterestedParties` types live there: [`09-txid-multiplexing.md`](09-txid-multiplexing.md).
|
||
- KPI graduation target: [`../kpi.md`](../kpi.md) → Tier 1 (rates / percentiles / availability — coalescing-ratio joins this tier).
|
||
- Modbus unit-ID semantics that make coalescing-key uniqueness load-bearing: [`../../DL260/dl205.md`](../../DL260/dl205.md) → "Function Code Support" and "Coils and Discrete Inputs".
|
||
- Counter snapshot backwards-compat policy that this phase respects (additive only): [`../kpi.md`](../kpi.md) → "Backwards-compat policy".
|
||
|
||
## Clarifications discovered during implementation
|
||
|
||
These are the implementation details that the original phase doc did not pin down; recorded here so the next reader doesn't relearn them.
|
||
|
||
1. **`InterestedParties` is a `List<InterestedParty>` cast to `IReadOnlyList`.** Phase 9 typed the field as `IReadOnlyList<InterestedParty>` to leave room for any implementation; Phase 10 specifically requires a mutable list so the map can append parties under its lock. The list is mutated only under `InFlightByKeyMap`'s lock, and the reader's fan-out iterates the list ONLY after the entry has been removed from the map — by that point no further appends are possible. There is no separate snapshot copy.
|
||
|
||
2. **The factory closure performs the Phase-9 work (allocate TxId + add to CorrelationMap) but does NOT enqueue to the outbound channel.** The channel send happens AFTER returning from `TryAttachOrCreate` so the InFlightByKey lock is not held across a potentially-async send. The factory communicates its allocated proxy TxId and InFlightRequest back to the caller through closure-captured locals. If the allocator is saturated, the factory returns a "stub" InFlightRequest with no CorrelationMap entry; the caller detects this and delivers a Modbus exception 04.
|
||
|
||
3. **`coalescedHitCount + coalescedMissCount` = total FC03/FC04 requests (always).** Even when coalescing is disabled, every FC03/04 request bumps `coalescedMissCount` from the non-coalescing path. This keeps the math balanced for dashboard consumers regardless of feature state. Writes (FC06/FC16) are NOT in this accounting — they never touch the coalescing path.
|
||
|
||
4. **Cascade and watchdog paths drain `InFlightByKeyMap` too.** On backend disconnect, `TearDownBackendAsync` calls `_inFlightByKey.DrainAll()` so a brand-new identical request through the freshly-reconnected backend is treated as a miss. On per-request watchdog timeout, `_inFlightByKey.TryRemove(key)` runs alongside the CorrelationMap removal so subsequent identical requests start fresh.
|
||
|
||
5. **Live config accessor, not `IOptionsMonitor`-by-value.** The multiplexer takes a `Func<ReadCoalescingOptions>` accessor that resolves to `optionsMonitor.CurrentValue.Resilience.ReadCoalescing` per PDU. This keeps the constructor surface lightweight (no DI on `IOptionsMonitor<MbproxyOptions>`) and gives tests a clean way to pin a fixed config. Hot-reload of `Enabled` propagates because the accessor is read on every incoming FC03/FC04 request.
|
||
|
||
6. **Phase 9's `TwoUpstreams_ProxyTxIds_AreDistinct_OnTheWire` test required a one-line edit.** It asserted ≥2 distinct backend TxIds from two identical FC03 reads — exactly the case Phase 10 now coalesces. The test was patched to use DIFFERENT start addresses so the two reads remain non-coalescable while still proving distinct proxy TxIds. The rest of Phase 9's tests are unaffected.
|
||
|
||
7. **pymodbus simulator and coalescing.** The simulator's `last_pdu`-overwrite bug (documented in design.md) means we cannot E2E-verify "five concurrent identical reads → 1 backend round-trip" against pymodbus. The headline-stress correctness claim is therefore proven against the stub backend in `ReadCoalescingTests` (real loopback sockets, deterministic 200–400 ms response delay so the in-flight window is wide enough for racing requests to actually overlap). The E2E suite verifies counter accounting, status-page surfacing, and the rewriter integration on serialised reads — i.e. the integration boundary, not the concurrency proof.
|