f49e27e316
Adds 11 topic-focused docs under docs/{Architecture,Features,Operations,Reference,Testing}/
and links them from README.md's new "Detailed documentation" section. Existing
top-level docs (design.md, kpi.md, operations.md) remain as canonical landings.
Architecture/
- Overview.md (150 lines) — listener topology, request flow, per-PLC isolation
- ConnectionModel.md (247 lines) — TxId multiplexer, watchdog, disconnect cascade
- ReadCoalescing.md (243 lines) — in-flight FC03/04 dedup via InFlightByKeyMap
- ResponseCache.md (398 lines) — opt-in per-tag TTL cache + range-overlap invalidation
Features/
- BcdRewriting.md (252 lines) — codec, CDAB, FC scope, partial-overlap policy
- HotReload.md (189 lines) — IOptionsMonitor + per-change-kind reconcile rules
Operations/
- Configuration.md (422 lines) — every Mbproxy:* option + validation rules
- StatusPage.md (334 lines) — admin endpoint surface, every JSON field
- Troubleshooting.md (364 lines) — diagnosis playbook keyed to log events
Reference/
- LogEvents.md (499 lines) — 28 events across 7 categories, grep-verified
Testing/
- Simulator.md (235 lines) — pymodbus fixture, skip policy, 3.13 framer quirk
Each doc was written by a dedicated agent against the StyleGuide.md rules with
a per-doc phase gate (PascalCase filename, H1 Title Case, code-fence language
tags, Related Documentation section with >=3 relative links, real type names
verified against src/). Cross-references between docs use relative paths;
all 18 README->docs links and all sibling links resolve.
Known follow-up: docs/design.md lines 215-251 are stale on two log-event
property templates (config.reload.applied and config.reload.rejected) and
mention LogContext.PushProperty scoping that isn't actually used. Reference/
LogEvents.md is now the authoritative event catalog and source-of-truth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
244 lines
11 KiB
Markdown
# Read Coalescing

In-flight read coalescing collapses identical FC03/FC04 requests that arrive
while a backend response is still in flight onto a single backend round-trip,
then fans the single response out to every attached upstream client with each
client's original MBAP transaction ID restored.

## What Coalescing Does

When two upstream clients each send `(unitId=1, FC=3, start=100, qty=10)`
within the in-flight window of a previously-routed request, the second
arrival attaches to the existing `InFlightRequest` instead of opening a new
proxy transaction ID and a second backend round-trip. The PLC's reply is
delivered to both upstream pipes; each pipe sees its own MBAP `TxId`
restored on its copy of the response.

The value each upstream sees is the same value an uncoalesced request would
have returned within the PLC's own scan-time precision (microseconds to
~10 ms typical window). Coalescing is not a cache layer — once the response
fans out, the in-flight entry dies, and a subsequent identical read opens a
fresh round-trip. Bounded-staleness caching is a separate feature; see
[`./ResponseCache.md`](./ResponseCache.md).

## The Coalescing Key

The lookup tuple is defined in `CoalescingKey.cs`:

```csharp
internal readonly record struct CoalescingKey(
    byte UnitId,
    byte Fc,
    ushort StartAddress,
    ushort Qty);
```

Record-struct value equality drives the dictionary lookup in
`InFlightByKeyMap`. Several axes never coalesce, by design:

- **Function code.** FC03 (Read Holding Registers) and FC04 (Read Input
  Registers) read different Modbus tables on the device. Their responses
  are not interchangeable, so they do not share a key even at the same
  address.
- **Unit ID.** Distinct unit IDs behind a shared socket address different
  Modbus personalities — coalescing never crosses a unit boundary.
- **Start address and quantity.** Two reads with overlapping but
  non-identical ranges never coalesce. Range-overlap logic exists for cache
  invalidation, not for coalescing.

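As an illustration of how record-struct value equality keeps these axes apart, the following minimal sketch uses a plain `Dictionary` as a stand-in for `InFlightByKeyMap`. The string value is a placeholder, not the real entry type, and the key declaration is copied from the excerpt above.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the map's backing dictionary; the string value is a placeholder.
var inFlight = new Dictionary<CoalescingKey, string>
{
    [new CoalescingKey(1, 0x03, 100, 10)] = "round-trip A",
};

// Identical tuple: value equality finds the existing entry.
Console.WriteLine(inFlight.ContainsKey(new CoalescingKey(1, 0x03, 100, 10))); // True

// A different function code, unit ID, or range is a distinct key.
Console.WriteLine(inFlight.ContainsKey(new CoalescingKey(1, 0x04, 100, 10))); // False
Console.WriteLine(inFlight.ContainsKey(new CoalescingKey(2, 0x03, 100, 10))); // False
// Overlapping but non-identical range (100..111 vs 100..109): still no match.
Console.WriteLine(inFlight.ContainsKey(new CoalescingKey(1, 0x03, 100, 12))); // False

// Same shape as the declaration quoted from CoalescingKey.cs.
internal readonly record struct CoalescingKey(
    byte UnitId, byte Fc, ushort StartAddress, ushort Qty);
```

Because the compiler generates `Equals` and `GetHashCode` from all four fields, no custom comparer is needed for the lookup.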
## Eligibility

Only FC03 and FC04 enter the coalescing path. The multiplexer's request
handler parses the function code from the inbound PDU and gates on
`fcByte is 0x03 or 0x04` before consulting `_inFlightByKey`.

- FC06 (Write Single Register) and FC16 (Write Multiple Registers) are
  non-idempotent on BCD tags — a second send would write the value twice.
  Writes bypass coalescing entirely and always take the one-round-trip path.
- Exception responses do not coalesce. Each upstream sees an exception
  delivered against its own MBAP `TxId` through the normal correlation map
  fan-out; there is no special exception-deduplication path.

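The gate can be pictured with a short sketch. `ReadGate` and `TryGetCoalescingKey` are hypothetical names, not the multiplexer's actual code. The byte offsets assume a standard Modbus TCP frame: a 7-byte MBAP header (transaction ID, protocol ID, length, unit ID) followed by the PDU.

```csharp
using System;
using System.Buffers.Binary;

// TxId=0x002A, unit 1, FC03, start=100, qty=10.
byte[] readFrame  = { 0x00, 0x2A, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03, 0x00, 0x64, 0x00, 0x0A };
// Same address, but FC06 (Write Single Register): must bypass coalescing.
byte[] writeFrame = { 0x00, 0x2B, 0x00, 0x00, 0x00, 0x06, 0x01, 0x06, 0x00, 0x64, 0x12, 0x34 };

Console.WriteLine(ReadGate.TryGetCoalescingKey(readFrame, out var key));  // True
Console.WriteLine($"{key.UnitId} {key.Fc} {key.StartAddress} {key.Qty}"); // 1 3 100 10
Console.WriteLine(ReadGate.TryGetCoalescingKey(writeFrame, out _));       // False

internal readonly record struct CoalescingKey(
    byte UnitId, byte Fc, ushort StartAddress, ushort Qty);

// Hypothetical helper, for illustration only.
internal static class ReadGate
{
    private const int MbapHeaderLength = 7;

    public static bool TryGetCoalescingKey(ReadOnlySpan<byte> frame, out CoalescingKey key)
    {
        key = default;
        // A read request PDU is 5 bytes: FC, start address (2), quantity (2).
        if (frame.Length < MbapHeaderLength + 5) return false;

        byte fcByte = frame[MbapHeaderLength];
        if (fcByte is not (0x03 or 0x04)) return false; // writes and others bypass

        key = new CoalescingKey(
            UnitId: frame[6],
            Fc: fcByte,
            StartAddress: BinaryPrimitives.ReadUInt16BigEndian(frame.Slice(8, 2)),
            Qty: BinaryPrimitives.ReadUInt16BigEndian(frame.Slice(10, 2)));
        return true;
    }
}
```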
## The InterestedParties Seam

The data shape that powers fan-out lives on `InFlightRequest`:

```csharp
internal sealed record InFlightRequest(
    byte UnitId,
    byte Fc,
    ushort StartAddress,
    ushort Qty,
    IReadOnlyList<InterestedParty> InterestedParties,
    DateTimeOffset SentAtUtc,
    int ResolvedCacheTtlMs = 0);

internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId);
```

Each `InterestedParty` records the upstream pipe to deliver the response to
and the original MBAP `TxId` that pipe sent. The backend reader iterates
this list, patches each party's `OriginalTxId` into a per-party copy of the
response frame, and hands the frame to `party.Pipe.SendResponseAsync`.

### Multi-writer multi-reader safety

The list typed as `IReadOnlyList<InterestedParty>` on the public surface is
in fact a mutable `List<InterestedParty>` underneath. `InFlightByKeyMap`
serialises every state mutation under a single `object` lock:

- `TryAttachOrCreate` looks up the key, casts the existing
  `InterestedParties` back to `List<InterestedParty>`, and appends the new
  party — all under the lock.
- The backend reader calls `TryRemove(coalKey, out _)` **before** it
  iterates the parties list during fan-out. Once the key is gone from the
  map, no future attach can find it, so no further appends can occur.

The reader's removal-before-iteration ordering is the load-bearing
invariant. By the time fan-out reads the list, the list is effectively
frozen — there is no other writer that can reach it. The watchdog timeout
path observes the same protocol: it removes the coalescing key before it
walks `req.InterestedParties` to deliver exception 0x0B.

The reverse race (reader removes first, then a late attach arrives) is
impossible by construction — `TryRemove` and `TryAttachOrCreate` both take
the same map lock, so any late attach is serialised either entirely before
the removal (and is part of the fan-out) or entirely after (and opens a
fresh entry under a new factory call).

## MaxParties Cap

`ResilienceOptions.cs` exposes the load-shedding cap:

```csharp
public sealed class ReadCoalescingOptions
{
    public bool Enabled { get; init; } = true;
    public int MaxParties { get; init; } = 32;
}
```

`Mbproxy.Resilience.ReadCoalescing.MaxParties` defaults to 32. Inside
`TryAttachOrCreate`, an existing entry is only extended when
`existingList.Count < maxParties`; once the cap is hit, the next identical
arrival falls through to the factory branch and opens a fresh in-flight
entry (which means a fresh backend round-trip).

The cap bounds two costs:

- **Fan-out cost per entry** at O(MaxParties). The backend reader's
  per-party copy-and-patch loop runs at most `MaxParties` times for any
  single response.
- **Backend reader latency under pile-on.** A single pathologically popular
  read (every HMI hitting the same tag at the same second) cannot stretch
  one fan-out arbitrarily long.

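The locking protocol and the cap can be seen together in a minimal sketch. It is not the real `InFlightByKeyMap`. The method signatures are guesses, the pipe is reduced to a string name, and the sketch assumes that a fresh entry replaces a saturated one under the same key.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

var map = new InFlightByKeyMap();
var key = new CoalescingKey(1, 0x03, 100, 10);
const int maxParties = 2; // tiny cap so the overflow path is visible

map.TryAttachOrCreate(key, new InterestedParty("hmi-a", 0x0011), maxParties, out bool new1);
map.TryAttachOrCreate(key, new InterestedParty("hmi-b", 0x0022), maxParties, out bool new2);
map.TryAttachOrCreate(key, new InterestedParty("hmi-c", 0x0033), maxParties, out bool new3);
Console.WriteLine($"{new1} {new2} {new3}"); // True False True (third arrival hit the cap)

// Reader side: remove the key first, then walk the now-frozen list.
if (map.TryRemove(key, out var removed))
    Console.WriteLine(removed.InterestedParties.Count); // 1 (the entry opened for hmi-c)

// A late arrival after removal cannot attach; it opens a fresh entry.
map.TryAttachOrCreate(key, new InterestedParty("hmi-d", 0x0044), maxParties, out bool new4);
Console.WriteLine(new4); // True

internal readonly record struct CoalescingKey(
    byte UnitId, byte Fc, ushort StartAddress, ushort Qty);

// Simplified stand-ins: the real records carry an UpstreamPipe and more fields.
internal sealed record InterestedParty(string PipeName, ushort OriginalTxId);
internal sealed record InFlightRequest(
    CoalescingKey Key, IReadOnlyList<InterestedParty> InterestedParties);

internal sealed class InFlightByKeyMap
{
    private readonly object _lock = new();
    private readonly Dictionary<CoalescingKey, InFlightRequest> _map = new();

    public InFlightRequest TryAttachOrCreate(
        CoalescingKey key, InterestedParty party, int maxParties, out bool wasNew)
    {
        lock (_lock)
        {
            if (_map.TryGetValue(key, out var existing))
            {
                // Read-only on the surface, mutable List underneath.
                var list = (List<InterestedParty>)existing.InterestedParties;
                if (list.Count < maxParties)
                {
                    list.Add(party);
                    wasNew = false;
                    return existing;
                }
            }

            // No entry, or entry saturated: factory branch.
            // Assumption: the fresh entry replaces the saturated one under the key.
            var fresh = new InFlightRequest(key, new List<InterestedParty> { party });
            _map[key] = fresh;
            wasNew = true;
            return fresh;
        }
    }

    public bool TryRemove(CoalescingKey key, [NotNullWhen(true)] out InFlightRequest? request)
    {
        lock (_lock) { return _map.Remove(key, out request); }
    }
}
```

Because both methods take the same lock, a caller that has completed `TryRemove` can iterate the returned list without further synchronisation.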
## Hot-Reloadable On/Off

`Mbproxy.Resilience.ReadCoalescing.Enabled` defaults to `true`. The
multiplexer holds a `Func<ReadCoalescingOptions>` accessor that production
binds to `() => optionsMonitor.CurrentValue.Resilience.ReadCoalescing`, so
a hot-reload of `appsettings.json` propagates immediately on the next
inbound PDU.

Flipping `Enabled` to `false` at runtime does not disturb already-coalesced
entries: existing fan-outs drain through the backend reader naturally.
Subsequent FC03/FC04 requests skip the coalescing branch entirely and take
the one-proxy-TxId-per-upstream-request path verbatim.

The same accessor reads `MaxParties` per PDU, so an operator can raise or
lower the cap without restarting the service.

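The accessor pattern can be sketched without the options infrastructure. Here a captured local stands in for `optionsMonitor.CurrentValue`, and `MultiplexerSketch` with its `ShouldCoalesce` method is a hypothetical name used only to show the per-PDU read.

```csharp
using System;

// Stand-in for IOptionsMonitor<T>.CurrentValue: a snapshot that a reload swaps.
var current = new ReadCoalescingOptions();
var mux = new MultiplexerSketch(() => current);

Console.WriteLine(mux.ShouldCoalesce(0x03)); // True (Enabled defaults to true)

// Simulated hot-reload: a new immutable snapshot replaces the old one.
current = new ReadCoalescingOptions { Enabled = false, MaxParties = 8 };

Console.WriteLine(mux.ShouldCoalesce(0x03)); // False (picked up on the next call)
Console.WriteLine(mux.CurrentMaxParties);    // 8

public sealed class ReadCoalescingOptions
{
    public bool Enabled { get; init; } = true;
    public int MaxParties { get; init; } = 32;
}

// Hypothetical class, for illustration only.
internal sealed class MultiplexerSketch
{
    private readonly Func<ReadCoalescingOptions> _options;

    public MultiplexerSketch(Func<ReadCoalescingOptions> options) => _options = options;

    // Invoking the accessor on every call means no cached copy can go stale.
    public bool ShouldCoalesce(byte fcByte)
    {
        var opts = _options();
        return opts.Enabled && fcByte is (0x03 or 0x04);
    }

    public int CurrentMaxParties => _options().MaxParties;
}
```

Injecting a `Func<>` rather than the monitor itself also lets a test supply fixed options without any configuration plumbing.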
## Lookup Order in the Multiplexer's Read Path

`OnUpstreamFrameAsync` consults three tiers in fixed order for FC03/FC04:

1. **Cache** — if `_ctx.Cache` is wired and `_ctx.TagMap.ResolveCacheTtlMs`
   returns a positive TTL for the read range, the response cache is
   checked first. A hit short-circuits everything, including the
   `EnsureBackendConnectedAsync` call. See
   [`./ResponseCache.md`](./ResponseCache.md).
2. **Coalesce** — on a cache miss (or no cache configured), the request
   consults `_inFlightByKey` via `TryAttachOrCreate`. A hit attaches the
   new party to an in-flight peer and emits no backend traffic.
3. **Backend** — on a coalescing miss, the factory branch allocates a
   proxy `TxId` through `TxIdAllocator`, registers the entry in
   `CorrelationMap`, runs the BCD rewriter on the request PDU, and queues
   the frame onto the outbound channel.

The order is load-bearing. Cache hits avoid both backend traffic **and**
any coalescing-entry housekeeping. Coalescing hits avoid the backend but
still incur a list-append and a fan-out. Backend round-trips are the most
expensive of the three.

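Reduced to its decision logic, the tier order looks roughly like the sketch below. `ReadRoute` and `ReadPath.Route` are hypothetical, and the three inputs are simplified to plain values rather than real cache and map lookups.

```csharp
using System;

Console.WriteLine(ReadPath.Route(cacheTtlMs: 500, cacheHasEntry: true,  attachesToPeer: true));  // CacheHit
Console.WriteLine(ReadPath.Route(cacheTtlMs: 500, cacheHasEntry: false, attachesToPeer: true));  // Coalesced
Console.WriteLine(ReadPath.Route(cacheTtlMs: 0,   cacheHasEntry: false, attachesToPeer: false)); // Backend

internal enum ReadRoute { CacheHit, Coalesced, Backend }

// Hypothetical helper: models only the order in which the tiers are consulted.
internal static class ReadPath
{
    public static ReadRoute Route(int cacheTtlMs, bool cacheHasEntry, bool attachesToPeer)
    {
        // Tier 1: the cache is consulted only when a positive TTL resolves.
        if (cacheTtlMs > 0 && cacheHasEntry) return ReadRoute.CacheHit;

        // Tier 2: attach to an identical in-flight read if one exists.
        if (attachesToPeer) return ReadRoute.Coalesced;

        // Tier 3: open a fresh backend round-trip.
        return ReadRoute.Backend;
    }
}
```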
## Counter Accounting

`PerPlcContext.Counters` exposes three coalescing-specific counters, all
surfaced on the status page:

- **`coalescedHitCount`** — increments inside `OnUpstreamFrameAsync` when
  `TryAttachOrCreate` returns `wasNew == false` (the request attached to
  an existing in-flight entry).
- **`coalescedMissCount`** — increments when `wasNew == true`. The
  non-coalescing FC03/FC04 path also increments this counter when
  coalescing is disabled, so the identity `coalescedHitCount +
  coalescedMissCount == total FC03+FC04 requests since multiplexer
  construction` holds regardless of `Enabled`.
- **`coalescedResponseToDeadUpstream`** — increments inside the backend
  reader's fan-out loop when a coalesced party's pipe has gone dead
  (`party.Pipe.IsAlive == false`) before the response landed. Only
  counted when the in-flight entry had more than one party — single-party
  dead-upstream skips are the normal Phase-9 behaviour and are silent.

When `ReadCoalescing.Enabled == false`, `coalescedHitCount` remains zero
and every FC03/FC04 read increments `coalescedMissCount`. Aggregate fleet
metrics (hit ratio, requests per second) read directly from these
counters; see [`../Operations/StatusPage.md`](../Operations/StatusPage.md).

The Debug-level log events `mbproxy.coalesce.hit`,
`mbproxy.coalesce.miss`, and `mbproxy.coalesce.dead_upstream` mirror each
counter increment; see [`../Reference/LogEvents.md`](../Reference/LogEvents.md).

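The hit/miss identity can be checked with a small model. `CoalescingCounters` and `RecordRead` are hypothetical names, and the use of `Interlocked` is an assumption about how thread-safe increments might be done, not a description of `PerPlcContext.Counters`.

```csharp
using System;
using System.Threading;

var counters = new CoalescingCounters();
counters.RecordRead(coalescingEnabled: true,  wasNew: true);  // opened a round-trip
counters.RecordRead(coalescingEnabled: true,  wasNew: false); // attached to a peer
counters.RecordRead(coalescingEnabled: false, wasNew: true);  // coalescing off: a miss

Console.WriteLine($"hits={counters.Hits} misses={counters.Misses}"); // hits=1 misses=2
Console.WriteLine(counters.Hits + counters.Misses);                  // 3 (total FC03/FC04 reads)

// Hypothetical counter holder, for illustration only.
internal sealed class CoalescingCounters
{
    private long _hits;
    private long _misses;

    public long Hits => Interlocked.Read(ref _hits);
    public long Misses => Interlocked.Read(ref _misses);

    // Every FC03/FC04 read lands in exactly one bucket, so the sum is the total.
    public void RecordRead(bool coalescingEnabled, bool wasNew)
    {
        if (coalescingEnabled && !wasNew) Interlocked.Increment(ref _hits);
        else Interlocked.Increment(ref _misses);
    }
}
```

With this accounting, a hit ratio is `Hits / (Hits + Misses)`, and it reads as zero whenever coalescing is disabled.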
## Transparency Contract Preserved

Each upstream client receives the same response shape it would have
received from a one-to-one proxy:

- **Original MBAP `TxId` restored.** The backend reader patches
  `outFrame[0..2]` with `party.OriginalTxId` for each party in the
  `InterestedParties` list. The proxy's internal TxId never reaches an
  upstream socket.
- **BCD rewriter runs once.** `_pipeline.Process(ResponseToClient, ...)`
  fires exactly once against the shared backend response buffer. Cached
  rewriter context (start address, quantity) comes from the
  `InFlightRequest` that opened the round-trip.
- **One-party fan-out reuses the buffer.** When
  `inFlight.InterestedParties.Count == 1`, the backend reader assigns the
  original `frame` reference to `outFrame` instead of cloning, saving the
  allocation. Multi-party fan-outs clone the frame per party so each can
  carry a distinct `TxId` without trampling its peers.

Coalescing is invisible at the wire-protocol layer. An upstream client
cannot tell whether its read was served by a fresh backend round-trip or
by attaching to a peer's in-flight request — only the timing distribution
changes.

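The per-party `TxId` patching and the single-party buffer reuse can be sketched as follows. `FanOut.Build` is a hypothetical helper and the pipe is reduced to a string name. The real reader sends each frame through `party.Pipe.SendResponseAsync` instead of returning a list.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Backend response carrying the proxy's internal TxId 0x7F01 (FC03, one register).
byte[] frame = { 0x7F, 0x01, 0x00, 0x00, 0x00, 0x05, 0x01, 0x03, 0x02, 0x12, 0x34 };
var parties = new List<InterestedParty> { new("hmi-a", 0x0011), new("hmi-b", 0x0022) };

foreach (var (party, outFrame) in FanOut.Build(frame, parties))
    Console.WriteLine($"{party.PipeName}: {Convert.ToHexString(outFrame)}");
// hmi-a: 0011000000050103021234
// hmi-b: 0022000000050103021234

// Simplified stand-in: the real record carries an UpstreamPipe.
internal sealed record InterestedParty(string PipeName, ushort OriginalTxId);

// Hypothetical helper, for illustration only.
internal static class FanOut
{
    public static List<(InterestedParty Party, byte[] Frame)> Build(
        byte[] frame, IReadOnlyList<InterestedParty> parties)
    {
        var result = new List<(InterestedParty, byte[])>(parties.Count);
        foreach (var party in parties)
        {
            // One party: reuse the buffer. Several: clone so TxIds do not collide.
            byte[] outFrame = parties.Count == 1 ? frame : (byte[])frame.Clone();

            // MBAP transaction ID occupies bytes 0..1, big-endian.
            BinaryPrimitives.WriteUInt16BigEndian(outFrame.AsSpan(0, 2), party.OriginalTxId);
            result.Add((party, outFrame));
        }
        return result;
    }
}
```

Only the first two bytes differ between the copies; the payload after the transaction ID is byte-for-byte identical for every party.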
## Related Documentation

- [`./ConnectionModel.md`](./ConnectionModel.md) — multiplexer overview;
  the `InterestedParties` seam, `CorrelationMap`, and `TxIdAllocator` live
  here.
- [`./ResponseCache.md`](./ResponseCache.md) — bounded-staleness cache that
  sits above coalescing in the lookup order; cache hits short-circuit
  coalescing entirely.
- [`../Operations/StatusPage.md`](../Operations/StatusPage.md) — exposes
  `coalescedHitCount`, `coalescedMissCount`, and
  `coalescedResponseToDeadUpstream` per PLC and as fleet aggregates.
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — full
  `mbproxy.coalesce.*` event catalogue with event IDs.
- [`../Operations/Configuration.md`](../Operations/Configuration.md) —
  binding for `Mbproxy.Resilience.ReadCoalescing.Enabled` and `MaxParties`,
  hot-reload semantics.
- [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md) — the
  rewriter that runs once on the shared response buffer before fan-out.