Joseph Doherty f49e27e316 mbproxy/docs: split deep docs into focused PascalCase files per StyleGuide
Adds 11 topic-focused docs under docs/{Architecture,Features,Operations,Reference,Testing}/
and links them from README.md's new "Detailed documentation" section. Existing
top-level docs (design.md, kpi.md, operations.md) remain as canonical landings.

Architecture/
  - Overview.md         (150 lines) — listener topology, request flow, per-PLC isolation
  - ConnectionModel.md  (247 lines) — TxId multiplexer, watchdog, disconnect cascade
  - ReadCoalescing.md   (243 lines) — in-flight FC03/04 dedup via InFlightByKeyMap
  - ResponseCache.md    (398 lines) — opt-in per-tag TTL cache + range-overlap invalidation

Features/
  - BcdRewriting.md     (252 lines) — codec, CDAB, FC scope, partial-overlap policy
  - HotReload.md        (189 lines) — IOptionsMonitor + per-change-kind reconcile rules

Operations/
  - Configuration.md    (422 lines) — every Mbproxy:* option + validation rules
  - StatusPage.md       (334 lines) — admin endpoint surface, every JSON field
  - Troubleshooting.md  (364 lines) — diagnosis playbook keyed to log events

Reference/
  - LogEvents.md        (499 lines) — 28 events across 7 categories, grep-verified

Testing/
  - Simulator.md        (235 lines) — pymodbus fixture, skip policy, 3.13 framer quirk

Each doc was written by a dedicated agent against the StyleGuide.md rules with
a per-doc phase gate (PascalCase filename, H1 Title Case, code-fence language
tags, Related Documentation section with >=3 relative links, real type names
verified against src/). Cross-references between docs use relative paths;
all 18 README->docs links and all sibling links resolve.

Known follow-up: docs/design.md lines 215-251 are stale on two log-event
property templates (config.reload.applied and config.reload.rejected) and
mention LogContext.PushProperty scoping that isn't actually used. Reference/
LogEvents.md is now the authoritative event catalog and source-of-truth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 03:44:34 -04:00


Read Coalescing

In-flight read coalescing merges identical FC03/FC04 requests onto a single backend round-trip when they arrive while that round-trip's response is still pending. The single response then fans out to every attached upstream client, with each client's original MBAP transaction ID restored.

What Coalescing Does

When two upstream clients each send (unitId=1, FC=3, start=100, qty=10) and the second request arrives while the first is still in flight, the second arrival attaches to the existing InFlightRequest instead of being allocated a new proxy transaction ID and a second backend round-trip. The PLC's reply is delivered to both upstream pipes; each pipe sees its own MBAP TxId restored on its copy of the response.

The value each upstream sees matches what an uncoalesced request would have returned, to within the PLC's own scan-time precision: the in-flight window typically lasts from microseconds to about 10 ms. Coalescing is not a cache layer. Once the response fans out, the in-flight entry is removed, and a subsequent identical read opens a fresh round-trip. Bounded-staleness caching is a separate feature; see ./ResponseCache.md.

The Coalescing Key

The lookup tuple is defined in CoalescingKey.cs:

internal readonly record struct CoalescingKey(
    byte UnitId,
    byte Fc,
    ushort StartAddress,
    ushort Qty);

Record-struct value equality drives the dictionary lookup in InFlightByKeyMap. Several axes never coalesce, by design:

  • Function code. FC03 (Read Holding Registers) and FC04 (Read Input Registers) read different Modbus tables on the device. Their responses are not interchangeable, so they do not share a key even at the same address.
  • Unit ID. Distinct unit IDs behind a shared socket address different Modbus personalities — coalescing never crosses a unit boundary.
  • Start address and quantity. Two reads with overlapping but non-identical ranges never coalesce. Range-overlap logic exists for cache invalidation, not for coalescing.
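
The effect of value equality on each axis can be checked with a small standalone sketch. The record struct below repeats the definition above (declared public only so the snippet compiles on its own); `CoalescingKeyDemo` and `WouldCoalesce` are hypothetical names introduced for illustration and are not part of the codebase.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the CoalescingKey shown above; public here so the sketch stands alone.
public readonly record struct CoalescingKey(byte UnitId, byte Fc, ushort StartAddress, ushort Qty);

public static class CoalescingKeyDemo
{
    // Hypothetical helper: two reads can share an in-flight entry only when
    // all four fields match, which is exactly record-struct value equality.
    public static bool WouldCoalesce(CoalescingKey a, CoalescingKey b) => a.Equals(b);

    // Hypothetical helper: the same equality (and the generated GetHashCode)
    // is what a Dictionary lookup relies on.
    public static bool IsInFlight(Dictionary<CoalescingKey, string> inFlight, CoalescingKey probe)
        => inFlight.ContainsKey(probe);
}
```

With a base key of `new CoalescingKey(1, 0x03, 100, 10)`, changing any one field (unit ID 2, FC04, start 101, or quantity 11) produces a key that does not match, which mirrors the three bullets above.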

Eligibility

Only FC03 and FC04 enter the coalescing path. The multiplexer's request handler parses the function code from the inbound PDU and gates on fcByte is 0x03 or 0x04 before consulting _inFlightByKey.

  • FC06 (Write Single Register) and FC16 (Write Multiple Registers) are non-idempotent on BCD tags — a second send would write the value twice. Writes bypass coalescing entirely and always take the one-round-trip path.
  • Exception responses do not coalesce. Each upstream sees an exception delivered against its own MBAP TxId through the normal correlation map fan-out; there is no special exception-deduplication path.
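
The eligibility gate can be sketched as follows. This is an illustration, not the multiplexer's actual handler: `CoalescingEligibility` and `IsCoalescable` are hypothetical names, and the sketch assumes the handler sees a full Modbus TCP frame, where the function code follows the 7-byte MBAP header.

```csharp
using System;

public static class CoalescingEligibility
{
    // MBAP header: TxId (2) + protocol ID (2) + length (2) + unit ID (1) = 7 bytes.
    // The function code is the first PDU byte, directly after the header.
    private const int FunctionCodeOffset = 7;

    // Hypothetical helper: true only for FC03/FC04 request frames.
    public static bool IsCoalescable(ReadOnlySpan<byte> frame)
    {
        if (frame.Length <= FunctionCodeOffset)
        {
            return false; // too short to carry a function code
        }

        byte fcByte = frame[FunctionCodeOffset];
        return fcByte is 0x03 or 0x04;
    }
}
```

Writes (FC06, FC16) and every other function code fall through to the non-coalescing path under this check.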

The InterestedParties Seam

The data shape that powers fan-out lives on InFlightRequest:

internal sealed record InFlightRequest(
    byte UnitId,
    byte Fc,
    ushort StartAddress,
    ushort Qty,
    IReadOnlyList<InterestedParty> InterestedParties,
    DateTimeOffset SentAtUtc,
    int ResolvedCacheTtlMs = 0);

internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId);

Each InterestedParty records the upstream pipe to deliver the response to and the original MBAP TxId that pipe sent. The backend reader iterates this list, patches each party's OriginalTxId into a per-party copy of the response frame, and hands the frame to party.Pipe.SendResponseAsync.

Multi-writer multi-reader safety

The list typed as IReadOnlyList<InterestedParty> on the public surface is in fact a mutable List<InterestedParty> underneath. InFlightByKeyMap serialises every state mutation under a single object lock:

  • TryAttachOrCreate looks up the key, casts the existing InterestedParties back to List<InterestedParty>, and appends the new party — all under the lock.
  • The backend reader calls TryRemove(coalKey, out _) before it iterates the parties list during fan-out. Once the key is gone from the map, no future attach can find it, so no further appends can occur.

The reader's removal-before-iteration ordering is the load-bearing invariant. By the time fan-out reads the list, the list is effectively frozen — there is no other writer that can reach it. The watchdog timeout path observes the same protocol: it removes the coalescing key before it walks req.InterestedParties to deliver exception 0x0B.

The reverse race (reader removes first, then a late attach arrives) is impossible by construction — TryRemove and TryAttachOrCreate both take the same map lock, so any late attach is serialised either entirely before the removal (and is part of the fan-out) or entirely after (and opens a fresh entry under a new factory call).
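
A minimal sketch of the locking protocol described above, under simplifying assumptions: the real InFlightByKeyMap takes a factory and also enforces a party cap, both omitted here. `InFlightMapSketch`, `Key`, and `Party` are hypothetical stand-ins, and the pipe is reduced to a name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public readonly record struct Key(byte UnitId, byte Fc, ushort StartAddress, ushort Qty);
public sealed record Party(string PipeName, ushort OriginalTxId);

public sealed class InFlightMapSketch
{
    private readonly object _gate = new();
    private readonly Dictionary<Key, List<Party>> _entries = new();

    // Returns true when a fresh entry was opened (the caller owes a backend send),
    // false when the party was appended to an existing in-flight entry.
    public bool TryAttachOrCreate(Key key, Party party)
    {
        lock (_gate)
        {
            if (_entries.TryGetValue(key, out var parties))
            {
                parties.Add(party); // append happens under the same lock as removal
                return false;
            }

            _entries[key] = new List<Party> { party };
            return true;
        }
    }

    // The reader calls this BEFORE iterating the list. Once the key is gone,
    // no later attach can reach the removed list, so it is safe to walk unlocked.
    public bool TryRemove(Key key, out IReadOnlyList<Party>? parties)
    {
        lock (_gate)
        {
            bool removed = _entries.Remove(key, out var list);
            parties = list;
            return removed;
        }
    }
}
```

Because both methods take `_gate`, an attach that races with removal either lands in the list the reader is about to receive or creates a new list. It can never mutate a list that fan-out is already iterating.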

MaxParties Cap

ResilienceOptions.cs exposes the load-shedding cap:

public sealed class ReadCoalescingOptions
{
    public bool Enabled { get; init; } = true;
    public int MaxParties { get; init; } = 32;
}

Mbproxy.Resilience.ReadCoalescing.MaxParties defaults to 32. Inside TryAttachOrCreate, an existing entry is only extended when existingList.Count < maxParties; once the cap is hit, the next identical arrival falls through to the factory branch and opens a fresh in-flight entry (which means a fresh backend round-trip).

The cap bounds two costs:

  • Fan-out cost per entry at O(MaxParties). The backend reader's per-party copy-and-patch loop runs at most MaxParties times for any single response.
  • Backend reader latency under pile-on. A single pathologically popular read (every HMI hitting the same tag at the same second) cannot stretch one fan-out arbitrarily long.
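
To make the cap's effect concrete, the sketch below counts how a burst of identical reads would split into backend round-trips. It rests on an assumption the text above does not confirm: that the fresh entry opened after the cap is hit becomes the attach target for later arrivals. `MaxPartiesSketch` and `PartiesPerRoundTrip` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class MaxPartiesSketch
{
    // Simulates `arrivals` identical reads landing inside one in-flight window.
    // Each list element is the party count of one in-flight entry, i.e. one
    // backend round-trip. ASSUMPTION: later arrivals attach to the newest entry.
    public static List<int> PartiesPerRoundTrip(int arrivals, int maxParties)
    {
        var entries = new List<int>();
        for (int i = 0; i < arrivals; i++)
        {
            int last = entries.Count - 1;
            if (last >= 0 && entries[last] < maxParties)
            {
                entries[last]++;   // attach: existingList.Count < maxParties
            }
            else
            {
                entries.Add(1);    // factory branch: fresh entry, fresh round-trip
            }
        }

        return entries;
    }
}
```

Under that assumption, 70 simultaneous identical reads with the default cap of 32 would split into entries of 32, 32, and 6 parties, so no single fan-out loop runs more than 32 iterations.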

Hot-Reloadable On/Off

Mbproxy.Resilience.ReadCoalescing.Enabled defaults to true. The multiplexer holds a Func<ReadCoalescingOptions> accessor that production binds to () => optionsMonitor.CurrentValue.Resilience.ReadCoalescing, so a hot-reload of appsettings.json propagates immediately on the next inbound PDU.

Flipping Enabled to false at runtime does not disturb already-coalesced entries: existing fan-outs drain through the backend reader naturally. Subsequent FC03/FC04 requests skip the coalescing branch entirely and take the one-proxy-TxId-per-upstream-request path verbatim.

The same accessor reads MaxParties per PDU, so an operator can raise or lower the cap without restarting the service.
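
The accessor pattern can be sketched without the options framework. `ReadCoalescingOptions` repeats the class shown earlier; `CoalescingGate` and `ShouldCoalesce` are hypothetical names, and the lambda passed in stands in for the `optionsMonitor.CurrentValue` binding that production uses.

```csharp
using System;

// Same shape as the ReadCoalescingOptions class shown earlier.
public sealed class ReadCoalescingOptions
{
    public bool Enabled { get; init; } = true;
    public int MaxParties { get; init; } = 32;
}

public sealed class CoalescingGate
{
    private readonly Func<ReadCoalescingOptions> _options;

    public CoalescingGate(Func<ReadCoalescingOptions> options) => _options = options;

    // Called once per inbound PDU. The accessor is invoked every time, so a
    // reload is observed on the next request without restarting anything.
    public bool ShouldCoalesce(byte fc, out int maxParties)
    {
        // Take one snapshot so Enabled and MaxParties come from the same reload.
        ReadCoalescingOptions snapshot = _options();
        maxParties = snapshot.MaxParties;
        return snapshot.Enabled && (fc is 0x03 or 0x04);
    }
}
```

A caller might construct the gate as `new CoalescingGate(() => current)`, where `current` is whatever variable holds the most recently loaded options. Swapping that variable changes the gate's answer on the very next call.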

Lookup Order in the Multiplexer's Read Path

OnUpstreamFrameAsync consults three tiers in fixed order for FC03/FC04:

  1. Cache — if _ctx.Cache is wired and _ctx.TagMap.ResolveCacheTtlMs returns a positive TTL for the read range, the response cache is checked first. A hit short-circuits everything, including the EnsureBackendConnectedAsync call. See ./ResponseCache.md.
  2. Coalesce — on a cache miss (or no cache configured), the request consults _inFlightByKey via TryAttachOrCreate. A hit attaches the new party to an in-flight peer and emits no backend traffic.
  3. Backend — on a coalescing miss, the factory branch allocates a proxy TxId through TxIdAllocator, registers the entry in CorrelationMap, runs the BCD rewriter on the request PDU, and queues the frame onto the outbound channel.

The order is load-bearing. Cache hits avoid both backend traffic and any coalescing-entry housekeeping. Coalescing hits avoid the backend but still incur a list-append and a fan-out. Backend round-trips are the most expensive of the three.
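
The three-tier decision can be condensed into a pure function. This is a simplified sketch: `ReadTier`, `ReadPathSketch`, and `Resolve` are hypothetical names, and the boolean inputs stand in for the real cache probe and the TryAttachOrCreate call.

```csharp
using System;

public enum ReadTier { Cache, Coalesce, Backend }

public static class ReadPathSketch
{
    // Hypothetical condensation of the FC03/FC04 lookup order.
    //   resolvedCacheTtlMs   : TTL resolved for the read range (0 = not cacheable)
    //   cacheHit             : whether the cache holds a fresh response
    //   coalescingEnabled    : the hot-reloadable Enabled flag
    //   identicalReadInFlight: whether an entry with the same key is pending
    public static ReadTier Resolve(
        int resolvedCacheTtlMs,
        bool cacheHit,
        bool coalescingEnabled,
        bool identicalReadInFlight)
    {
        // Tier 1: a cache hit short-circuits everything else.
        if (resolvedCacheTtlMs > 0 && cacheHit)
        {
            return ReadTier.Cache;
        }

        // Tier 2: attach to an in-flight peer; no backend traffic.
        if (coalescingEnabled && identicalReadInFlight)
        {
            return ReadTier.Coalesce;
        }

        // Tier 3: open a fresh backend round-trip.
        return ReadTier.Backend;
    }
}
```

Checking the tiers cheapest-first means a request pays for a backend round-trip only when neither shortcut applies.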

Counter Accounting

PerPlcContext.Counters exposes three coalescing-specific counters, all surfaced on the status page:

  • coalescedHitCount — increments inside OnUpstreamFrameAsync when TryAttachOrCreate returns wasNew == false (the request attached to an existing in-flight entry).
  • coalescedMissCount — increments when wasNew == true. The non-coalescing FC03/FC04 path also increments this counter when coalescing is disabled, so the identity coalescedHitCount + coalescedMissCount == total FC03+FC04 requests since multiplexer construction holds regardless of Enabled.
  • coalescedResponseToDeadUpstream — increments inside the backend reader's fan-out loop when a coalesced party's pipe has gone dead (party.Pipe.IsAlive == false) before the response landed. Only counted when the in-flight entry had more than one party — single-party dead-upstream skips are the normal Phase-9 behaviour and are silent.

When ReadCoalescing.Enabled == false, coalescedHitCount remains zero and every FC03/FC04 read increments coalescedMissCount. Aggregate fleet metrics (hit ratio, requests per second) read directly from these counters; see ../Operations/StatusPage.md.

The Debug-level log events mbproxy.coalesce.hit, mbproxy.coalesce.miss, and mbproxy.coalesce.dead_upstream mirror each counter increment; see ../Reference/LogEvents.md.
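
The counter identity described above can be illustrated with a small sketch. `CoalescingCountersSketch` is a hypothetical class. The field types and the `HitRatio` helper are assumptions, and the third counter is omitted for brevity.

```csharp
using System;
using System.Threading;

public sealed class CoalescingCountersSketch
{
    private long _hit;
    private long _miss;

    public long CoalescedHitCount => Interlocked.Read(ref _hit);
    public long CoalescedMissCount => Interlocked.Read(ref _miss);

    // Called once per FC03/FC04 request that reaches the coalescing decision.
    // wasNew mirrors the TryAttachOrCreate result; it is ignored when disabled.
    public void RecordRead(bool coalescingEnabled, bool wasNew)
    {
        if (coalescingEnabled && !wasNew)
        {
            Interlocked.Increment(ref _hit);   // attached to an existing entry
        }
        else
        {
            Interlocked.Increment(ref _miss);  // fresh entry, or coalescing disabled
        }
    }

    // Hypothetical derived metric: fraction of reads that avoided a backend send.
    public double HitRatio()
    {
        long hit = CoalescedHitCount;
        long total = hit + CoalescedMissCount;
        return total == 0 ? 0.0 : (double)hit / total;
    }
}
```

Every call increments exactly one of the two counters, so their sum always equals the number of recorded reads, whether or not coalescing is enabled.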

Transparency Contract Preserved

Each upstream client receives the same response shape it would have received from a one-to-one proxy:

  • Original MBAP TxId restored. The backend reader patches outFrame[0..2] with party.OriginalTxId for each party in the InterestedParties list. The proxy's internal TxId never reaches an upstream socket.
  • BCD rewriter runs once. _pipeline.Process(ResponseToClient, ...) fires exactly once against the shared backend response buffer. Cached rewriter context (start address, quantity) comes from the InFlightRequest that opened the round-trip.
  • One-party fan-out reuses the buffer. When inFlight.InterestedParties.Count == 1, the backend reader assigns the original frame reference to outFrame instead of cloning, saving the allocation. Multi-party fan-outs clone the frame per party so each can carry a distinct TxId without trampling its peers.

Coalescing is invisible at the wire-protocol layer. An upstream client cannot tell whether its read was served by a fresh backend round-trip or by attaching to a peer's in-flight request — only the timing distribution changes.
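
The TxId restoration and the clone-versus-reuse rule can be sketched as below. `FanOutSketch` and `BuildFrames` are hypothetical names, and the real backend reader sends each frame to its pipe rather than returning a list. The one fixed fact is the wire format: the MBAP transaction ID occupies the first two bytes of the frame in big-endian order.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class FanOutSketch
{
    // Produces one outbound frame per party, each carrying that party's
    // original MBAP TxId in bytes 0..1 (big-endian).
    public static List<byte[]> BuildFrames(byte[] response, IReadOnlyList<ushort> originalTxIds)
    {
        var frames = new List<byte[]>(originalTxIds.Count);
        foreach (ushort txId in originalTxIds)
        {
            // Single party: patch the shared buffer in place and skip the allocation.
            // Multiple parties: clone so one party's TxId cannot overwrite another's.
            byte[] outFrame = originalTxIds.Count == 1
                ? response
                : (byte[])response.Clone();

            BinaryPrimitives.WriteUInt16BigEndian(outFrame.AsSpan(0, 2), txId);
            frames.Add(outFrame);
        }

        return frames;
    }
}
```

In the multi-party case the shared response buffer is left untouched, so the proxy's internal TxId stays in that buffer and never appears in any frame handed to an upstream.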