Joseph Doherty f49e27e316 mbproxy/docs: split deep docs into focused PascalCase files per StyleGuide
Adds 11 topic-focused docs under docs/{Architecture,Features,Operations,Reference,Testing}/
and links them from README.md's new "Detailed documentation" section. Existing
top-level docs (design.md, kpi.md, operations.md) remain as canonical landings.

Architecture/
  - Overview.md         (150 lines) — listener topology, request flow, per-PLC isolation
  - ConnectionModel.md  (247 lines) — TxId multiplexer, watchdog, disconnect cascade
  - ReadCoalescing.md   (243 lines) — in-flight FC03/04 dedup via InFlightByKeyMap
  - ResponseCache.md    (398 lines) — opt-in per-tag TTL cache + range-overlap invalidation

Features/
  - BcdRewriting.md     (252 lines) — codec, CDAB, FC scope, partial-overlap policy
  - HotReload.md        (189 lines) — IOptionsMonitor + per-change-kind reconcile rules

Operations/
  - Configuration.md    (422 lines) — every Mbproxy:* option + validation rules
  - StatusPage.md       (334 lines) — admin endpoint surface, every JSON field
  - Troubleshooting.md  (364 lines) — diagnosis playbook keyed to log events

Reference/
  - LogEvents.md        (499 lines) — 28 events across 7 categories, grep-verified

Testing/
  - Simulator.md        (235 lines) — pymodbus fixture, skip policy, 3.13 framer quirk

Each doc was written by a dedicated agent against the StyleGuide.md rules with
a per-doc phase gate (PascalCase filename, H1 Title Case, code-fence language
tags, Related Documentation section with >=3 relative links, real type names
verified against src/). Cross-references between docs use relative paths;
all 18 README->docs links and all sibling links resolve.

Known follow-up: docs/design.md lines 215-251 are stale on two log-event
property templates (config.reload.applied and config.reload.rejected) and
mention LogContext.PushProperty scoping that isn't actually used. Reference/
LogEvents.md is now the authoritative event catalog and source-of-truth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 03:44:34 -04:00


# Read Coalescing
In-flight read coalescing collapses identical FC03/FC04 requests that arrive
while a backend response is still in flight onto a single backend round-trip,
then fans the single response out to every attached upstream client with each
client's original MBAP transaction ID restored.
## What Coalescing Does
When two upstream clients each send `(unitId=1, FC=3, start=100, qty=10)`
within the in-flight window of a previously-routed request, the second
arrival attaches to the existing `InFlightRequest` instead of opening a new
proxy transaction ID and a second backend round-trip. The PLC's reply is
delivered to both upstream pipes; each pipe sees its own MBAP `TxId`
restored on its copy of the response.

The value each upstream sees is the value an uncoalesced request would have
returned, to within the PLC's own scan-time precision (a window of
microseconds to roughly 10 ms is typical). Coalescing is not a cache layer:
once the response
fans out, the in-flight entry dies, and a subsequent identical read opens a
fresh round-trip. Bounded-staleness caching is a separate feature; see
[`./ResponseCache.md`](./ResponseCache.md).
## The Coalescing Key
The lookup tuple is defined in `CoalescingKey.cs`:
```csharp
internal readonly record struct CoalescingKey(
    byte UnitId,
    byte Fc,
    ushort StartAddress,
    ushort Qty);
```
Record-struct value equality drives the dictionary lookup in
`InFlightByKeyMap`. Several axes never coalesce, by design:
- **Function code.** FC03 (Read Holding Registers) and FC04 (Read Input
Registers) read different Modbus tables on the device. Their responses
are not interchangeable, so they do not share a key even at the same
address.
- **Unit ID.** Distinct unit IDs behind a shared socket address different
Modbus personalities — coalescing never crosses a unit boundary.
- **Start address and quantity.** Two reads with overlapping but
non-identical ranges never coalesce. Range-overlap logic exists for cache
invalidation, not for coalescing.
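
The three non-coalescing axes above follow directly from record-struct value equality. The sketch below repeats the key shape (made `public` only so it compiles standalone) and uses a hypothetical helper, `WouldCoalesce`, that is not part of the codebase; it simply asks whether a second key would find the first in a dictionary.

```csharp
using System.Collections.Generic;

// Same shape as the key shown above; public here only for a standalone sketch.
public readonly record struct CoalescingKey(
    byte UnitId,
    byte Fc,
    ushort StartAddress,
    ushort Qty);

public static class CoalescingKeyDemo
{
    // Hypothetical helper: true when `second` would hit an entry keyed by `first`.
    public static bool WouldCoalesce(CoalescingKey first, CoalescingKey second)
    {
        var inFlight = new Dictionary<CoalescingKey, string>
        {
            [first] = "in-flight entry",
        };
        // Dictionary lookup uses the compiler-generated Equals/GetHashCode,
        // which compare all four fields by value.
        return inFlight.ContainsKey(second);
    }
}
```

Under this sketch, `(1, 0x03, 100, 10)` matches an identical tuple, while changing only the function code to `0x04`, only the unit ID, or only the start address (an overlapping range such as `105..114`) produces a different key and therefore a miss.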
## Eligibility
Only FC03 and FC04 enter the coalescing path. The multiplexer's request
handler parses the function code from the inbound PDU and gates on
`fcByte is 0x03 or 0x04` before consulting `_inFlightByKey`.
- FC06 (Write Single Register) and FC16 (Write Multiple Registers) are
non-idempotent on BCD tags — a second send would write the value twice.
Writes bypass coalescing entirely and always take the one-round-trip path.
- Exception responses do not coalesce. Each upstream sees an exception
delivered against its own MBAP `TxId` through the normal correlation map
fan-out; there is no special exception-deduplication path.
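
A minimal sketch of the eligibility gate, assuming the handler sees a complete Modbus TCP frame (7-byte MBAP header followed by the PDU, so the function code sits at offset 7). The class and method names are illustrative, not the multiplexer's actual members.

```csharp
using System;

public static class CoalescingGateSketch
{
    // MBAP header: TxId (2) + ProtocolId (2) + Length (2) + UnitId (1) = 7 bytes.
    private const int FunctionCodeOffset = 7;

    // Returns true only for FC03 / FC04 reads; writes and everything else bypass coalescing.
    public static bool IsCoalescable(ReadOnlySpan<byte> frame)
    {
        if (frame.Length <= FunctionCodeOffset)
        {
            return false; // too short to carry a function code
        }

        byte fcByte = frame[FunctionCodeOffset];
        return fcByte is 0x03 or 0x04;
    }
}
```

An FC06 or FC16 frame returns `false` here and would continue down the one-round-trip path described above.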
## The InterestedParties Seam
The data shape that powers fan-out lives on `InFlightRequest`:
```csharp
internal sealed record InFlightRequest(
    byte UnitId,
    byte Fc,
    ushort StartAddress,
    ushort Qty,
    IReadOnlyList<InterestedParty> InterestedParties,
    DateTimeOffset SentAtUtc,
    int ResolvedCacheTtlMs = 0);

internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId);
```
Each `InterestedParty` records the upstream pipe to deliver the response to
and the original MBAP `TxId` that pipe sent. The backend reader iterates
this list, patches each party's `OriginalTxId` into a per-party copy of the
response frame, and hands the frame to `party.Pipe.SendResponseAsync`.
### Multi-writer multi-reader safety
The list typed as `IReadOnlyList<InterestedParty>` on the public surface is
in fact a mutable `List<InterestedParty>` underneath. `InFlightByKeyMap`
serialises every state mutation under a single `object` lock:
- `TryAttachOrCreate` looks up the key, casts the existing
`InterestedParties` back to `List<InterestedParty>`, and appends the new
party — all under the lock.
- The backend reader calls `TryRemove(coalKey, out _)` **before** it
iterates the parties list during fan-out. Once the key is gone from the
map, no future attach can find it, so no further appends can occur.

The reader's removal-before-iteration ordering is the load-bearing
invariant. By the time fan-out reads the list, the list is effectively
frozen — there is no other writer that can reach it. The watchdog timeout
path observes the same protocol: it removes the coalescing key before it
walks `req.InterestedParties` to deliver exception 0x0B.

The reverse race (reader removes first, then a late attach arrives) is
impossible by construction — `TryRemove` and `TryAttachOrCreate` both take
the same map lock, so any late attach is serialised either entirely before
the removal (and is part of the fan-out) or entirely after (and opens a
fresh entry under a new factory call).
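
The locking protocol described in this section can be sketched as follows. This is a simplified illustration, not the real `InFlightByKeyMap`: the entry type, the method signatures, and the use of bare `ushort` TxIds in place of `InterestedParty` are all assumptions, and the `MaxParties` cap is deliberately left out.

```csharp
using System.Collections.Generic;

public readonly record struct CoalescingKey(byte UnitId, byte Fc, ushort StartAddress, ushort Qty);

// Hypothetical stand-in for InFlightRequest; only the party list matters here.
public sealed class InFlightEntrySketch
{
    public List<ushort> OriginalTxIds { get; } = new();
}

public sealed class InFlightMapSketch
{
    private readonly object _gate = new();
    private readonly Dictionary<CoalescingKey, InFlightEntrySketch> _byKey = new();

    // Attach to an existing entry or create a new one, all under the same lock.
    public InFlightEntrySketch TryAttachOrCreate(CoalescingKey key, ushort originalTxId, out bool wasNew)
    {
        lock (_gate)
        {
            if (_byKey.TryGetValue(key, out var existing))
            {
                existing.OriginalTxIds.Add(originalTxId); // append is only reachable via the map
                wasNew = false;
                return existing;
            }

            var fresh = new InFlightEntrySketch();
            fresh.OriginalTxIds.Add(originalTxId);
            _byKey[key] = fresh;
            wasNew = true;
            return fresh;
        }
    }

    // The reader calls this BEFORE iterating the party list, which freezes the list.
    public bool TryRemove(CoalescingKey key, out InFlightEntrySketch? removed)
    {
        lock (_gate)
        {
            return _byKey.Remove(key, out removed);
        }
    }
}
```

Because the only path to the list goes through the dictionary, removing the key first means the reader can iterate `OriginalTxIds` without holding the lock; a later identical request simply creates a new entry.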
## MaxParties Cap
`ResilienceOptions.cs` exposes the load-shedding cap:
```csharp
public sealed class ReadCoalescingOptions
{
    public bool Enabled { get; init; } = true;
    public int MaxParties { get; init; } = 32;
}
```
`Mbproxy.Resilience.ReadCoalescing.MaxParties` defaults to 32. Inside
`TryAttachOrCreate`, an existing entry is only extended when
`existingList.Count < maxParties`; once the cap is hit, the next identical
arrival falls through to the factory branch and opens a fresh in-flight
entry (which means a fresh backend round-trip).

The cap bounds two costs:
- **Fan-out cost per entry** at O(MaxParties). The backend reader's
per-party copy-and-patch loop runs at most `MaxParties` times for any
single response.
- **Backend reader latency under pile-on.** A single pathologically popular
read (every HMI hitting the same tag at the same second) cannot stretch
one fan-out arbitrarily long.
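
As a rough worked example of the cap's effect, the helper below estimates how many backend round-trips a burst of identical reads produces. It rests on an assumption the text above does not state: that after the cap is hit, the freshly opened entry becomes the attach target for later arrivals in the same window. The class and method are hypothetical.

```csharp
using System;

public static class MaxPartiesMath
{
    // Estimated backend round-trips for `arrivals` identical reads landing in one
    // in-flight window, ASSUMING each overflow entry absorbs the next arrivals.
    public static int EstimatedRoundTrips(int arrivals, int maxParties)
    {
        if (arrivals < 0) throw new ArgumentOutOfRangeException(nameof(arrivals));
        if (maxParties < 1) throw new ArgumentOutOfRangeException(nameof(maxParties));

        // Integer ceiling of arrivals / maxParties.
        return (arrivals + maxParties - 1) / maxParties;
    }
}
```

Under that assumption, 100 identical reads against the default cap of 32 would cost four round-trips (32 + 32 + 32 + 4 parties) rather than one, and no single fan-out loop would run more than 32 iterations.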
## Hot-Reloadable On/Off
`Mbproxy.Resilience.ReadCoalescing.Enabled` defaults to `true`. The
multiplexer holds a `Func<ReadCoalescingOptions>` accessor that production
binds to `() => optionsMonitor.CurrentValue.Resilience.ReadCoalescing`, so
a hot-reload of `appsettings.json` propagates immediately on the next
inbound PDU.

Flipping `Enabled` to `false` at runtime does not disturb already-coalesced
entries: existing fan-outs drain through the backend reader naturally.
Subsequent FC03/FC04 requests skip the coalescing branch entirely and take
the one-proxy-TxId-per-upstream-request path verbatim.

The same accessor reads `MaxParties` per PDU, so an operator can raise or
lower the cap without restarting the service.
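
The accessor pattern can be illustrated without any hosting infrastructure. In the sketch below, `HotReloadGateSketch` is a hypothetical stand-in for the multiplexer, and a plain captured variable plays the role that `optionsMonitor.CurrentValue` plays in production.

```csharp
using System;

public sealed class ReadCoalescingOptions
{
    public bool Enabled { get; init; } = true;
    public int MaxParties { get; init; } = 32;
}

// Hypothetical stand-in for the multiplexer's use of the options accessor.
public sealed class HotReloadGateSketch
{
    private readonly Func<ReadCoalescingOptions> _options;

    public HotReloadGateSketch(Func<ReadCoalescingOptions> options) => _options = options;

    // Invoked once per inbound PDU: the accessor is re-evaluated every time,
    // so a reload is observed on the very next request.
    public bool ShouldCoalesce(byte fcByte)
    {
        ReadCoalescingOptions current = _options();
        return current.Enabled && (fcByte is 0x03 or 0x04);
    }

    // The cap is read through the same accessor, never cached in a field.
    public int CurrentMaxParties => _options().MaxParties;
}
```

Swapping the options instance behind the delegate (which is what a configuration reload amounts to in this sketch) changes the result of the next `ShouldCoalesce` call without reconstructing the gate.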
## Lookup Order in the Multiplexer's Read Path
`OnUpstreamFrameAsync` consults three tiers in fixed order for FC03/FC04:
1. **Cache** — if `_ctx.Cache` is wired and `_ctx.TagMap.ResolveCacheTtlMs`
returns a positive TTL for the read range, the response cache is
checked first. A hit short-circuits everything, including the
`EnsureBackendConnectedAsync` call. See
[`./ResponseCache.md`](./ResponseCache.md).
2. **Coalesce** — on a cache miss (or no cache configured), the request
consults `_inFlightByKey` via `TryAttachOrCreate`. A hit attaches the
new party to an in-flight peer and emits no backend traffic.
3. **Backend** — on a coalescing miss, the factory branch allocates a
proxy `TxId` through `TxIdAllocator`, registers the entry in
`CorrelationMap`, runs the BCD rewriter on the request PDU, and queues
the frame onto the outbound channel.

The order is load-bearing. Cache hits avoid both backend traffic **and**
any coalescing-entry housekeeping. Coalescing hits avoid the backend but
still incur a list-append and a fan-out. Backend round-trips are the most
expensive of the three.
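
The three-tier order reduces to a short-circuiting decision. The sketch below models each tier as a delegate so the short-circuit is visible; `ReadTier`, `LookupOrderSketch`, and the delegate parameters are illustrative names, not members of `OnUpstreamFrameAsync`.

```csharp
using System;

public enum ReadTier { Cache, Coalesce, Backend }

public static class LookupOrderSketch
{
    // probeCache and tryAttachToPeer stand in for the cache lookup and
    // TryAttachOrCreate returning wasNew == false, respectively.
    public static ReadTier Resolve(
        int resolvedCacheTtlMs,
        Func<bool> probeCache,
        Func<bool> tryAttachToPeer)
    {
        // Tier 1: only consulted when the tag resolves to a positive TTL.
        if (resolvedCacheTtlMs > 0 && probeCache())
        {
            return ReadTier.Cache;
        }

        // Tier 2: attach to an in-flight peer, emitting no backend traffic.
        if (tryAttachToPeer())
        {
            return ReadTier.Coalesce;
        }

        // Tier 3: fresh proxy TxId and a backend round-trip.
        return ReadTier.Backend;
    }
}
```

A cache hit returns before the coalescing delegate is ever invoked, which mirrors the point that cache hits skip coalescing-entry housekeeping entirely.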
## Counter Accounting
`PerPlcContext.Counters` exposes three coalescing-specific counters, all
surfaced on the status page:
- **`coalescedHitCount`** — increments inside `OnUpstreamFrameAsync` when
`TryAttachOrCreate` returns `wasNew == false` (the request attached to
an existing in-flight entry).
- **`coalescedMissCount`** — increments when `wasNew == true`. The
non-coalescing FC03/FC04 path also increments this counter when
coalescing is disabled, so the identity `coalescedHitCount +
coalescedMissCount == total FC03+FC04 requests since multiplexer
construction` holds regardless of `Enabled`.
- **`coalescedResponseToDeadUpstream`** — increments inside the backend
reader's fan-out loop when a coalesced party's pipe has gone dead
(`party.Pipe.IsAlive == false`) before the response landed. Only
counted when the in-flight entry had more than one party — single-party
dead-upstream skips are the normal Phase-9 behaviour and are silent.

When `ReadCoalescing.Enabled == false`, `coalescedHitCount` remains zero
and every FC03/FC04 read increments `coalescedMissCount`. Aggregate fleet
metrics (hit ratio, requests per second) read directly from these
counters; see [`../Operations/StatusPage.md`](../Operations/StatusPage.md).

The Debug-level log events `mbproxy.coalesce.hit`,
`mbproxy.coalesce.miss`, and `mbproxy.coalesce.dead_upstream` mirror each
counter increment; see [`../Reference/LogEvents.md`](../Reference/LogEvents.md).
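
Because the hit and miss counters sum to the total FC03/FC04 request count, a hit ratio can be derived from the two values alone. The helper below is a sketch of that arithmetic; defining the ratio as hits over hits plus misses is an assumption here, and the class name is hypothetical.

```csharp
public static class CoalescingMetricsSketch
{
    // Total FC03 + FC04 requests, per the identity stated above.
    public static long TotalReads(long coalescedHitCount, long coalescedMissCount)
        => coalescedHitCount + coalescedMissCount;

    // Fraction of reads that attached to an in-flight peer; 0 when no reads were seen.
    public static double HitRatio(long coalescedHitCount, long coalescedMissCount)
    {
        long total = TotalReads(coalescedHitCount, coalescedMissCount);
        return total == 0 ? 0.0 : (double)coalescedHitCount / total;
    }
}
```

With coalescing disabled the hit counter stays at zero, so this ratio reads as exactly zero regardless of traffic volume.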
## Transparency Contract Preserved
Each upstream client receives the same response shape it would have
received from a one-to-one proxy:
- **Original MBAP `TxId` restored.** The backend reader patches
`outFrame[0..2]` with `party.OriginalTxId` for each party in the
`InterestedParties` list. The proxy's internal TxId never reaches an
upstream socket.
- **BCD rewriter runs once.** `_pipeline.Process(ResponseToClient, ...)`
fires exactly once against the shared backend response buffer. Cached
rewriter context (start address, quantity) comes from the
`InFlightRequest` that opened the round-trip.
- **One-party fan-out reuses the buffer.** When
`inFlight.InterestedParties.Count == 1`, the backend reader assigns the
original `frame` reference to `outFrame` instead of cloning, saving the
allocation. Multi-party fan-outs clone the frame per party so each can
carry a distinct `TxId` without trampling its peers.

Coalescing is invisible at the wire-protocol layer. An upstream client
cannot tell whether its read was served by a fresh backend round-trip or
by attaching to a peer's in-flight request — only the timing distribution
changes.
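
The per-party patch and the single-party buffer reuse can be sketched as below. This is an illustration under assumptions, not the backend reader's code: parties are reduced to their original TxIds, the method returns the frames instead of sending them, and the big-endian write reflects the standard MBAP layout (transaction ID in bytes 0 and 1).

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class FanOutSketch
{
    // Produces one outbound frame per party with that party's original TxId restored.
    public static List<byte[]> BuildPerPartyFrames(byte[] frame, IReadOnlyList<ushort> originalTxIds)
    {
        var frames = new List<byte[]>(originalTxIds.Count);

        foreach (ushort originalTxId in originalTxIds)
        {
            // Single party: reuse the buffer. Multiple parties: clone so that
            // patching one party's TxId cannot overwrite another's.
            byte[] outFrame = originalTxIds.Count == 1
                ? frame
                : (byte[])frame.Clone();

            // MBAP transaction ID occupies bytes 0..1, big-endian.
            BinaryPrimitives.WriteUInt16BigEndian(outFrame.AsSpan(0, 2), originalTxId);
            frames.Add(outFrame);
        }

        return frames;
    }
}
```

Everything after the first two bytes is identical across the returned frames, which is why the rewriter only needs to run once on the shared response before this step.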
## Related Documentation
- [`./ConnectionModel.md`](./ConnectionModel.md) — multiplexer overview;
the `InterestedParties` seam, `CorrelationMap`, and `TxIdAllocator` live
here.
- [`./ResponseCache.md`](./ResponseCache.md) — bounded-staleness cache that
sits above coalescing in the lookup order; cache hits short-circuit
coalescing entirely.
- [`../Operations/StatusPage.md`](../Operations/StatusPage.md) — exposes
`coalescedHitCount`, `coalescedMissCount`, and
`coalescedResponseToDeadUpstream` per PLC and as fleet aggregates.
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — full
`mbproxy.coalesce.*` event catalogue with event IDs.
- [`../Operations/Configuration.md`](../Operations/Configuration.md) —
binding for `Mbproxy.Resilience.ReadCoalescing.Enabled` and `MaxParties`,
hot-reload semantics.
- [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md) — the
rewriter that runs once on the shared response buffer before fan-out.