wwtools/mbproxy/docs/plan/11-response-cache.md
Joseph Doherty 1db900edef mbproxy: add opt-in response cache (Phase 11)
Layers a per-PLC, per-tag response cache on top of Phase 10's coalescing.
Cache is OFF by default per tag (CacheTtlMs = 0); a fresh deployment with no
TTL config behaves identically to Phase 10. Operators opt tags in by setting
CacheTtlMs > 0 on a BcdTagOptions entry (or DefaultCacheTtlMs > 0 on a
PlcOptions entry), explicitly acknowledging the staleness window.

Cache lookup order: cache -> coalesce -> backend. A cache hit short-circuits
both Phase 10's coalescing path and Phase 9's backend send. Cache stores
POST-rewriter PDU bytes so hits never re-invoke the BCD rewriter. FC06/FC16
write responses invalidate every cached entry whose address range overlaps
the write (half-open interval math).

New types (Mbproxy.Proxy.Cache, all internal):
- CacheKey (record-struct, same shape as CoalescingKey but kept SEPARATE so
  the two phases evolve independently).
- CacheEntry, ResponseCache (IDisposable; LRU + PeriodicTimer eviction
  loop), CacheInvalidator (pure overlap matcher), CacheLogEvents (stable
  mbproxy.cache.* names).

Multi-tag range TTL = min(TTLs); any tag with TTL = 0 in the range disables
caching for the whole read (conservative-by-design).

Options surface:
- BcdTagOptions.CacheTtlMs (nullable int; null = fall through to PLC default)
- PlcOptions.DefaultCacheTtlMs
- MbproxyOptions.Cache.{AllowLongTtl, MaxEntriesPerPlc, EvictionIntervalMs}
- TTL > 60_000 ms requires Cache.AllowLongTtl = true (reload validation).

Admin counters (Tier 1.8 + Tier 2 cache-memory KPIs from docs/kpi.md):
- CacheHitCount, CacheMissCount, CacheInvalidations on ProxyCounters.
- CacheEntryCount, CacheBytes via a new ICacheStatsProvider snapshot path.
- /status.json and the HTML page surface a new Cache cell per PLC row.

Hot-reload: any tag-list change to a PLC reseats the per-PLC context with a
fresh cache; the old cache is disposed inside ReplaceContextAsync. Per-tag
flush granularity is intentionally not implemented in v1.

PLCs with no cache-eligible tags (every resolved tag has CacheTtlMs = 0)
get Cache = null on the context and skip the eviction timer entirely, so
the no-cache path is byte-identical to Phase 10.

Tests (32 new unit + 5 new E2E = 37 new; suite now 314 unit + 48 E2E):
- CacheKeyTests, CacheEntryTests (records + boundary semantics).
- CacheInvalidatorTests: full overlap, both partials, adjacent-not-
  overlapping, disjoint, different unit ID + auxiliary FC-filter / zero-qty.
- ResponseCacheTests: round-trip, lazy expiry, range invalidation,
  unit-id filter, LRU bound, LRU access tracking, concurrent get/set,
  dispose, clear, approximate-bytes accounting.
- ResponseCacheMultiplexerTests (stub-backend): hit short-circuits
  coalescing, BCD-decoded bytes are cached not raw, FC06 invalidates
  overlapping, non-overlapping write does not invalidate, multi-tag
  TTL=min rule, regression-cache-disabled-by-default-is-Phase-10, hit
  works even when backend unreachable.
- ResponseCacheE2ETests (pymodbus DL205 sim, sequential reads):
  * Headline: 10 reads with TTL=1000 ms -> 9 hits, 1 miss, 1 backend trip.
  * TTL expiry path with sleep > TTL.
  * Write invalidation through the proxy on a scratch register.
  * BCD-decoded bytes are cached, not raw BCD nibbles.
  * Regression: Cache disabled by default -> behaviour byte-identical to
    Phase 10.

Pre-existing flake hardened: BackendDisconnect_CascadesToAllUpstreams now
polls briefly for the cascade counter to absorb the inherent scheduling
gap between "upstream EOF observed" and "counter incremented inside
TearDownBackendAsync." Counter semantics unchanged.

Phase doc updated with implementation clarifications discovered during
this work (CacheKey kept separate from CoalescingKey, LastUsedTick is
long, FC06/FC16 startAddr/qty parsing extension, cache-pre-connect
short-circuit, write-invalidation only on successful responses).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 03:08:51 -04:00


Phase 11 — Short-TTL response cache (bounded staleness)

Cache FC03/FC04 responses with a per-tag TTL. Subsequent same-key reads within the TTL window are served from the cache without backend traffic. FC06/FC16 writes invalidate overlapping cache entries on the response side. This phase is a deliberate design-contract change — the proxy gains an opt-in cache layer with explicit bounded staleness.

Status: post-1.0 follow-on, depends on Phase 10. Architectural pivot: read the "Design pivot" section below before scoping.
Depends on: Phase 09 (multiplexer chokepoint), Phase 10 (CoalescingKey is reused as CacheKey, same shape).
Parallel-safe with: nothing.

Design pivot — do NOT skip this section

Phases 09 and 10 were additive performance optimisations that preserved the design's "transparent inline proxy" contract. Phase 11 is different. It changes the load-bearing claim in docs/design.md:

  • Today's contract (lines 12-20 of design.md): "The service is not a polling/cache layer. It is a transparent Modbus TCP proxy whose job is to rewrite the configured BCD tags in real time, in both directions, while proxying every other byte of the MBTCP connection untouched."
  • Post-Phase-11 contract: the proxy is optionally a cache layer within a bounded TTL. The TTL is per-tag, default 0 (no caching), opt-in by operator action.

Implication: Task 1 of this phase is rewriting the relevant design.md sections. The contract update is a code commit too — review, land first, then build the implementation against the new contract. Shipping cache code while design.md still says "not a cache layer" is a gate failure, not a merge-it-and-fix-later situation.

The cache is OFF by default. A fresh post-Phase-11 deployment with no TTL configuration behaves identically to a Phase-10 deployment. The opt-in shape (per-tag CacheTtlMs configuration) means a deployment can adopt Phase 11 without changing semantics until an operator explicitly opts a tag in.

Goal

Reduce backend Modbus traffic for the common SCADA case where many clients poll the same registers at near-identical cadences. Phase 10 already coalesces within the in-flight window (~10 ms). Phase 11 extends the "served without backend traffic" window from those in-flight milliseconds to operator-configurable seconds.

Concretely: with CacheTtlMs = 1000 on a frequently-read BCD tag, the backend sees at most one read of that tag per second per PLC regardless of how many upstream clients are polling.

What it does NOT do

  • No active polling. Cache entries are populated on demand by upstream reads, not by proactive polling. (Active polling is Tier C-3 from the conversation history — a separate phase if ever wanted.)
  • No predictive prefetching.
  • No SCADA-style subscription/notification model.
  • No write-back caching. Writes always go straight through to the backend; cache invalidation happens on the write-response side, not by intercepting the write.
  • No cross-PLC caching. Each PLC's cache is independent.
  • No persistence. Process restart wipes the cache. Cache survives backend disconnects (the cached data was fresh when stored; disconnects don't retroactively invalidate it).

Outputs (new files)

src/Mbproxy/Proxy/Cache/CacheKey.cs                  # reuses CoalescingKey shape; type-aliased or reflected
src/Mbproxy/Proxy/Cache/CacheEntry.cs                # response bytes + expiry + lastFetched
src/Mbproxy/Proxy/Cache/ResponseCache.cs             # the cache itself; TTL-based eviction, LRU under cap
src/Mbproxy/Proxy/Cache/CacheInvalidator.cs          # address-range-overlap matcher for write invalidation
src/Mbproxy/Proxy/Cache/CacheLogEvents.cs            # [LoggerMessage] vocab for this phase

tests/Mbproxy.Tests/Proxy/Cache/CacheKeyTests.cs
tests/Mbproxy.Tests/Proxy/Cache/CacheEntryTests.cs
tests/Mbproxy.Tests/Proxy/Cache/ResponseCacheTests.cs
tests/Mbproxy.Tests/Proxy/Cache/CacheInvalidatorTests.cs
tests/Mbproxy.Tests/Proxy/Cache/ResponseCacheE2ETests.cs

Files modified

src/Mbproxy/Proxy/Multiplexing/PlcMultiplexer.cs       # OnFrame: cache check BEFORE coalescing; OnResponse: cache store + write invalidation
src/Mbproxy/Options/BcdTagOptions.cs                   # add CacheTtlMs (default 0 = no caching)
src/Mbproxy/Options/PlcOptions.cs                      # add DefaultCacheTtlMs
src/Mbproxy/Options/MbproxyOptions.cs                  # add Cache section (AllowLongTtl, MaxEntriesPerPlc, EvictionIntervalMs)
src/Mbproxy/Bcd/BcdTag.cs                              # carry CacheTtlMs on the record
src/Mbproxy/Bcd/BcdTagMapBuilder.cs                    # resolve per-tag TTL with per-PLC default fallback
src/Mbproxy/Proxy/ProxyCounters.cs                     # new: CacheHit, CacheMiss, CacheInvalidations, CacheEntryCount, CacheBytes
src/Mbproxy/Admin/StatusDto.cs                         # surface cache KPIs in PlcBackendStatus
src/Mbproxy/Admin/StatusSnapshotBuilder.cs             # populate
src/Mbproxy/Admin/StatusHtmlRenderer.cs                # show cache-hit ratio per PLC row
src/Mbproxy/Configuration/ReloadValidator.cs           # validate CacheTtlMs bounds; require AllowLongTtl=true for > 60s

docs/design.md                                         # SUBSTANTIAL — see Task 1
docs/kpi.md                                            # graduate cache KPIs from future to Tier 1
install/mbproxy.config.template.json                   # add CacheTtlMs examples + staleness commentary
mbproxy/CLAUDE.md                                      # Architecture summary: add the cache-layer bullet

Tasks

11.1 Design contract update — DO THIS FIRST

  1. docs/design.md updates (review and land before writing implementation code):

    a. "What this is" section — add the cache disclosure paragraph:

    As of Phase 11, the proxy gains an optional per-tag response cache with a bounded staleness window (CacheTtlMs). The cache is OFF by default (CacheTtlMs = 0) and must be opt-in per tag. With caching enabled, the proxy is no longer purely transparent — upstream reads may return a value up to CacheTtlMs milliseconds old. The 1:1 read-to-backend-request guarantee no longer holds; operators opting tags into caching MUST acknowledge the staleness bound.

    b. New section "Cache contract" between "Rewriter" and "Failure modes":

    • Cache populates on demand only. No polling.
    • Cache entries carry their TTL with them. An entry found past its TTL is evicted on access and treated as a miss.
    • FC06/FC16 successful responses invalidate cache entries whose address range overlaps the write.
    • Cache survives backend disconnects (cached data was valid at cache time).
    • Cache does NOT survive process restart.
    • Multi-tag read range: effective TTL is the minimum of all configured tags in the range. Any tag with TTL = 0 in the range disables caching for the whole read.
    • Cache stores POST-rewriter bytes (BCD already decoded). Hits bypass the rewriter entirely.

    c. "Failure modes" section — add bullet on cache behaviour during backend recovery:

    • Cache hits remain valid during a recovering listener state. Data was fresh when cached; recovery only affects future requests.
    • Invalidations during recovery: writes that arrive cannot reach the backend, so the invalidation never happens. This is consistent — the write didn't take effect either. Cache entries remain valid until their TTL expires.

    d. "Rewriter" section — clarify that the rewriter runs on the cache-miss path (decode on store), and that cache hits return pre-decoded bytes without re-invoking the rewriter.

    Treat (a)-(d) as one atomic change. Get them reviewed, land them, then implement against the new contract.

11.2 Cache key

  1. CacheKey has the same shape as Phase 10's CoalescingKey: readonly record struct CacheKey(byte UnitId, byte Fc, ushort StartAddress, ushort Qty). If Phase 10 is already merged, prefer a using CacheKey = CoalescingKey; alias over a redefinition (same data, same hashing, single source of truth). If the two phases land together (Phase 10 + 11 in a coordinated release), consider renaming CoalescingKey to ReadKey to make the shared use site neutral.

11.3 Cache entry and storage

  1. CacheEntry: internal sealed record CacheEntry(byte[] PduBytes, DateTimeOffset CachedAtUtc, DateTimeOffset ExpiresAtUtc, int Length, ushort LastUsedTick). LastUsedTick is a monotonic counter for LRU ordering (avoids DateTimeOffset.UtcNow calls on every cache access).

  2. ResponseCache: internal sealed class ResponseCache : IDisposable. Methods:

    • bool TryGet(CacheKey key, out CacheEntry entry) — returns true ONLY if entry exists and entry.ExpiresAtUtc > DateTimeOffset.UtcNow. Updates LastUsedTick on hit. Expired entries removed lazily.
    • void Set(CacheKey key, CacheEntry entry) — replaces any existing entry. If Count >= MaxEntriesPerPlc, evict the LRU entry first.
    • int Invalidate(byte unitId, ushort startAddress, ushort qty) — delegates to CacheInvalidator. Returns count invalidated.
    • int Count { get; }, long ApproximateBytes { get; }
    • Background eviction loop (started in constructor, stopped in Dispose): every EvictionIntervalMs (default 5000), scans the map and removes entries past ExpiresAtUtc.
  3. CacheInvalidator — pure logic: static IEnumerable<CacheKey> FindOverlapping(IReadOnlyCollection<CacheKey> haystack, byte unitId, ushort writeStart, ushort writeQty). Returns keys whose range [StartAddress, StartAddress + Qty) intersects [writeStart, writeStart + writeQty). Limit scope to keys matching unitId and Fc in {3, 4} (we never cache writes; invalidation only applies to read entries).
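
The overlap matcher in item 3 can be sketched as below. This is a minimal illustration, not the shipped CacheInvalidator: the class name CacheInvalidatorSketch and the Overlaps helper are hypothetical, CacheKey is redeclared only to keep the block self-contained, and treating a zero-quantity range as matching nothing is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Redeclared here so the sketch compiles on its own; same shape as the phase's CacheKey.
internal readonly record struct CacheKey(byte UnitId, byte Fc, ushort StartAddress, ushort Qty);

internal static class CacheInvalidatorSketch
{
    // Half-open interval test: [s, s+n) and [w, w+q) intersect iff w < s+n && s < w+q.
    // ushort + ushort is evaluated as int in C#, so 65535 + qty cannot wrap around.
    public static bool Overlaps(ushort entryStart, ushort entryQty, ushort writeStart, ushort writeQty)
    {
        // Assumption: an empty range (qty = 0) never overlaps anything.
        if (entryQty == 0 || writeQty == 0) return false;

        int entryEnd = entryStart + entryQty;   // exclusive upper bound
        int writeEnd = writeStart + writeQty;   // exclusive upper bound
        return writeStart < entryEnd && entryStart < writeEnd;
    }

    // Returns the cached read keys (FC03/FC04 only) on the same unit whose range
    // intersects the written span.
    public static IEnumerable<CacheKey> FindOverlapping(
        IReadOnlyCollection<CacheKey> haystack, byte unitId, ushort writeStart, ushort writeQty)
    {
        return haystack.Where(k =>
            k.UnitId == unitId
            && (k.Fc is 0x03 or 0x04)
            && Overlaps(k.StartAddress, k.Qty, writeStart, writeQty));
    }
}
```

With keys starting at 100, 105 and 200 (each Qty = 10) on unit 1, a single-register write to 105 matches the first two and leaves the third alone. A write covering [10, 15) does not touch an entry covering [15, 20).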

11.4 Multiplexer integration

  1. Cache lookup in PlcMultiplexer.OnFrame — for FC03/04 requests when the read range has a non-zero resolved TTL:

    if (fc is 0x03 or 0x04 && resolvedTtlMs > 0) {
        var key = new CacheKey(unitId, fc, startAddr, qty);
        if (cache.TryGet(key, out var entry)) {
            counters.IncrementCacheHit();
            // Build a fresh MBAP wrapper for this client and send.
            var hitFrame = BuildResponseFrame(entry.PduBytes, originalTxId, unitId);
            upstreamPipe.SendResponse(hitFrame);
            return;   // no coalescing check, no backend round-trip
        }
        counters.IncrementCacheMiss();
    }
    // Fall through to Phase 10 coalescing path → Phase 9 send path
    

    Order matters: cache check FIRST, then coalescing. A cache hit short-circuits everything; only on a miss do we engage Phase 10's coalescing logic.

  2. Cache store on response — in the backend reader fan-out path, AFTER the rewriter has run on the response:

    if (req.Fc is 0x03 or 0x04 && req.ResolvedCacheTtlMs > 0) {
        var key = new CacheKey(req.UnitId, req.Fc, req.StartAddress, req.Qty);
        var now = DateTimeOffset.UtcNow;
        var entry = new CacheEntry(
            PduBytes:     rewrittenPduBytes.ToArray(),   // defensive copy
            CachedAtUtc:  now,
            ExpiresAtUtc: now.AddMilliseconds(req.ResolvedCacheTtlMs),
            Length:       rewrittenPduBytes.Length,
            LastUsedTick: NextLruTick());
        cache.Set(key, entry);
    }
    

    Note: req.ResolvedCacheTtlMs is computed at request-receive time by walking the BcdTagMap for tags in [StartAddress, StartAddress + Qty) and taking min(CacheTtlMs). If any tag has TTL = 0, ResolvedCacheTtlMs = 0 and the whole read is uncached.

  3. Cache invalidation on write response — FC06 / FC16 successful response (NOT exception response):

    // fc is the function-code byte of the backend RESPONSE; bit 0x80 set marks an exception response.
    if (req.Fc is 0x06 or 0x10 && (fc & 0x80) == 0) {
        int invalidated = cache.Invalidate(req.UnitId, req.StartAddress, req.Qty);
        if (invalidated > 0) {
            counters.AddCacheInvalidations(invalidated);
            CacheLogEvents.WriteInvalidatedEntries(logger, req.UnitId,
                req.StartAddress, req.Qty, invalidated);
        }
    }
    

    Invalidation is by ADDRESS RANGE OVERLAP, not by exact key match. A write to register 105 invalidates a cached read of [100..110] and a cached read of [105..115] but NOT a cached read of [200..210].
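
Step 1's cache-hit path calls BuildResponseFrame, which this document does not define. A minimal sketch follows, assuming the standard 7-byte MBAP header (transaction ID, protocol ID 0, length, unit ID) and a helper class name, MbapSketch, that is hypothetical. The real helper in PlcMultiplexer may differ.

```csharp
using System;
using System.Buffers.Binary;

internal static class MbapSketch
{
    public const int HeaderLength = 7;

    // Wraps a cached (post-rewriter) PDU in a fresh MBAP header for one client.
    // Only the PDU is cached; the transaction ID always comes from the current request.
    public static byte[] BuildResponseFrame(ReadOnlySpan<byte> pdu, ushort txId, byte unitId)
    {
        var frame = new byte[HeaderLength + pdu.Length];

        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(0, 2), txId);   // transaction ID
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(2, 2), 0);      // protocol ID: 0 = Modbus
        // Length field counts the unit ID byte plus the PDU.
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(4, 2), (ushort)(pdu.Length + 1));
        frame[6] = unitId;

        pdu.CopyTo(frame.AsSpan(HeaderLength));
        return frame;
    }
}
```

Because the header is rebuilt per hit, two clients with different transaction IDs can be served from the same CacheEntry.PduBytes without either seeing the other's ID.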

11.5 Per-tag TTL configuration

  1. BcdTagOptions extension:

    public sealed class BcdTagOptions {
        public ushort Address    { get; init; }
        public byte   Width      { get; init; }
        public int?   CacheTtlMs { get; init; }         // null = use PlcOptions.DefaultCacheTtlMs; 0 = no caching
    }
    
  2. PlcOptions.DefaultCacheTtlMs — applies to any tag whose explicit CacheTtlMs was not set (use a nullable int? on BcdTagOptions instead of int = 0 to distinguish "explicitly zero" from "unset"). Default for the PLC default itself is 0.

  3. MbproxyOptions.Cache section:

    public sealed class CacheOptions {
        public bool AllowLongTtl       { get; init; } = false; // gate for TTL > 60_000
        public int  MaxEntriesPerPlc   { get; init; } = 1000;
        public int  EvictionIntervalMs { get; init; } = 5000;
    }
    
  4. Validation in ReloadValidator: CacheTtlMs >= 0 always; CacheTtlMs > 60_000 requires Cache.AllowLongTtl = true. Reject reloads that violate. Prevents "left at 1 hour by accident" deployments.

  5. BcdTagMapBuilder.Build resolution: returns each BcdTag with CacheTtlMs resolved per fallback rules: explicit per-tag → per-PLC default → 0.
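
The TTL rules in this section and the min(TTL) rule from 11.4 can be sketched together as below. The class and method names (CacheTtlSketch, ResolveTagTtl, ResolveRangeTtl, IsValidTtl) are hypothetical, and resolving a read range that contains no configured tags to 0 is an assumption the document does not state.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal static class CacheTtlSketch
{
    public const int LongTtlThresholdMs = 60_000;

    // Fallback chain: explicit per-tag value -> per-PLC default (which itself defaults to 0).
    // An explicit 0 on the tag wins over a non-zero PLC default.
    public static int ResolveTagTtl(int? tagTtlMs, int plcDefaultTtlMs) =>
        tagTtlMs ?? plcDefaultTtlMs;

    // Effective TTL of one read: the minimum resolved TTL across the tags it covers.
    // Any tag at 0 therefore disables caching for the whole read.
    // Assumption: a range covering no configured tags resolves to 0 (uncached).
    public static int ResolveRangeTtl(IReadOnlyCollection<int> resolvedTagTtlsMs) =>
        resolvedTagTtlsMs.Count == 0 ? 0 : resolvedTagTtlsMs.Min();

    // Reload validation: never negative; above 60 s only with AllowLongTtl = true.
    public static bool IsValidTtl(int ttlMs, bool allowLongTtl) =>
        ttlMs >= 0 && (ttlMs <= LongTtlThresholdMs || allowLongTtl);
}
```

For example, a tag with no CacheTtlMs under a PLC with DefaultCacheTtlMs = 500 resolves to 500, while a read spanning tags resolved to 1000, 250 and 0 is not cached at all.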

11.6 Counters and status surfacing

  1. ProxyCounters additions:

    • CacheHitCount (Interlocked long)
    • CacheMissCount (Interlocked long)
    • CacheInvalidations (Interlocked long)
    • CacheEntryCount (snapshot from ResponseCache.Count — read-time)
    • CacheBytes (snapshot from ResponseCache.ApproximateBytes — read-time)
  2. StatusDto.PlcBackendStatus extension:

    public sealed record PlcBackendStatus(
        long ConnectsSuccess, long ConnectsFailed,
        ExceptionCounts ExceptionsByCode,
        double LastRoundTripMs,
        long CoalescedHitCount, long CoalescedMissCount, long CoalescedResponseToDeadUpstream,  // Phase 10
        long CacheHitCount, long CacheMissCount,                                                 // Phase 11
        long CacheInvalidations, long CacheEntryCount, long CacheBytes);                         // Phase 11
    
  3. HTML page — add a compact Cache: 73% cell per PLC row. Page-weight assertion (under 50 KB for 54 PLCs) must continue to pass.
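
The per-row cell in item 3 boils down to hits / (hits + misses). A minimal sketch of the formatting follows, assuming a hypothetical helper name (CacheCellSketch.FormatCacheCell) and an assumed "n/a" placeholder for rows that have seen no cache lookups yet. The real StatusHtmlRenderer output may differ.

```csharp
internal static class CacheCellSketch
{
    // Formats the compact per-PLC cache cell from the two lookup counters.
    public static string FormatCacheCell(long hits, long misses)
    {
        long total = hits + misses;

        // Assumption: with no lookups there is no meaningful ratio to show.
        if (total == 0) return "Cache: n/a";

        // Integer math with round-half-up keeps the output independent of culture settings.
        long pct = (hits * 100 + total / 2) / total;
        return $"Cache: {pct}%";
    }
}
```

Keeping the cell to a short fixed-format string helps the 50 KB page-weight assertion stay comfortable at 54 PLC rows.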

11.7 Documentation and template

  1. docs/kpi.md — graduate cache-hit-ratio KPIs from "deferred / future" to Tier 1 supported. Add cacheEntryCount and cacheBytes as Tier 2 memory-watch KPIs.

  2. install/mbproxy.config.template.json — add a fully-commented Mbproxy.Cache section showing AllowLongTtl, MaxEntriesPerPlc, EvictionIntervalMs. Show example per-tag CacheTtlMs: 1000 and per-PLC DefaultCacheTtlMs: 500 entries. Include a prominent comment explaining the staleness contract: "clients reading these tags will see values up to CacheTtlMs milliseconds old".

  3. mbproxy/CLAUDE.md Architecture summary — add a bullet:

    • Optional response cache with per-tag TTL (default 0 = off). Cached FC03/04 responses serve subsequent same-key reads without backend traffic; FC06/FC16 write responses invalidate overlapping entries by address range.

Public surface declared in this phase

namespace Mbproxy.Proxy.Cache;

internal readonly record struct CacheKey(
    byte UnitId, byte Fc, ushort StartAddress, ushort Qty);

internal sealed record CacheEntry(
    byte[] PduBytes,
    DateTimeOffset CachedAtUtc, DateTimeOffset ExpiresAtUtc,
    int Length, ushort LastUsedTick);

internal sealed class ResponseCache : IDisposable {
    public bool TryGet(CacheKey key, out CacheEntry entry);
    public void Set(CacheKey key, CacheEntry entry);
    public int Invalidate(byte unitId, ushort startAddress, ushort qty);
    public int Count { get; }
    public long ApproximateBytes { get; }
    public void Dispose();
}

internal static class CacheInvalidator {
    public static IEnumerable<CacheKey> FindOverlapping(
        IReadOnlyCollection<CacheKey> haystack,
        byte unitId, ushort writeStart, ushort writeQty);
}
namespace Mbproxy.Options;

public sealed class CacheOptions {
    public bool AllowLongTtl       { get; init; } = false;
    public int  MaxEntriesPerPlc   { get; init; } = 1000;
    public int  EvictionIntervalMs { get; init; } = 5000;
}
// Added field on MbproxyOptions:
public CacheOptions Cache { get; init; } = new();

// Added field on BcdTagOptions (nullable to distinguish "unset" from "explicitly 0"):
public int? CacheTtlMs { get; init; }

// Added field on PlcOptions:
public int DefaultCacheTtlMs { get; init; } = 0;

ProxyCounters and CounterSnapshot gain 5 new long fields. No public-surface removals or renames.

Tests required

Unit (Category = Unit)

CacheKeyTests (≥ 3 tests): equality across identical keys; FC03 vs FC04 differs; UnitId differs.

CacheEntryTests (≥ 3 tests): expired detection at boundary; immutability of PduBytes; LRU tick monotonicity.

CacheInvalidatorTests (≥ 5 tests, range-overlap math):

  1. FullOverlap_WriteCoversEntryRange_Invalidates
  2. PartialOverlap_WriteStartsBeforeEntry_Invalidates
  3. PartialOverlap_WriteEndsAfterEntry_Invalidates
  4. Adjacent_NotOverlapping_DoesNotInvalidate: a write to [10..15) does NOT invalidate cached [15..20) (half-open intervals: address 15 lies outside the write's range).
  5. NoOverlap_DoesNotInvalidate
  6. DifferentUnitId_DoesNotInvalidate

ResponseCacheTests (≥ 8 tests):

  1. SetThenGet_RoundTrips
  2. GetExpiredEntry_ReturnsFalse_AndRemoves — uses a small TTL + Task.Delay
  3. Invalidate_OverlappingRange_RemovesMatching — set 3 entries, invalidate a range overlapping 2 of them, verify Count drops by 2
  4. Invalidate_OnlyAffectsFc03Fc04_KeysWithFcOther_NotTouched — there shouldn't be FC06/FC16 entries in cache, but a defensive test
  5. Set_AtMaxEntries_EvictsLRU
  6. LRU_TracksAccessOrder_Across_Get_And_Set
  7. Concurrent_GetSet_NoDataRace — 100 tasks, 1000 ops each
  8. Dispose_StopsEvictionLoop

E2E (Category = E2E)

ResponseCacheE2ETests (≥ 6 tests, against pymodbus simulator):

  1. E2E_CacheHit_AfterFirstRead_NoBackendTraffic — configure tag at HR1072 with CacheTtlMs = 5000; first read goes to backend; second read within 5s hits cache. Verify via the simulator's HTTP introspection or by timing (cache hits return ~ms; backend reads return ~10ms).
  2. E2E_CacheExpires_AfterTtl_NextReadHitsBackend — short TTL (e.g., 200 ms); after delay, second read goes to backend.
  3. E2E_WriteInvalidatesOverlappingCacheEntries — read HR1072 (cache it), write to HR1072 with FC06, next read MUST miss cache and re-fetch.
  4. E2E_NonOverlappingWrite_DoesNotInvalidate — read HR1072 (cache it), write to HR1080, next read of HR1072 still hits cache.
  5. E2E_BcdDecodedBytesAreCached_NotRawBcd — cache hit returns the decoded 1234, not 0x1234. Proves the cache stores post-rewriter bytes.
  6. E2E_DisablingCache_ViaHotReload_FlushesEntries — set CacheTtlMs = 1000 on a tag, do a read (cached), hot-reload with CacheTtlMs = 0, next read must hit the backend even though the old entry is still within its TTL window.
  7. E2E_MultiTagRead_RangeWithZeroTtlTag_DisablesCaching — read [100..110] where one tag in the range has CacheTtlMs = 0; verify no caching of the whole read.

Phase gate

  • docs/design.md updates from Task 1 are merged FIRST (or in the same PR). The contract change is not optional and not deferrable. Gate fail otherwise.
  • dotnet build Mbproxy.slnx -c Debug — zero warnings, zero errors.
  • All prior tests still green — the 4 critical Phase-9 regression guards + Phase 10's coalescing tests.
  • All new unit + e2e tests pass (≥ 25 new).
  • Default TTL = 0 → no observable behavior change vs Phase 10. Verify: run the full Phase 10 test suite with the Phase 11 build; everything green.
  • Headline assertion (E2E): configure CacheTtlMs = 1000 on HR1072; issue 10 reads at 100 ms intervals; backend (stub or sim with introspection) sees exactly 1 backend round-trip.
  • Write invalidation correctly handles all 6 range-overlap cases (full, two partial, adjacent, none, different-unit-id).
  • Memory cap enforced: with MaxEntriesPerPlc = 5, 6 distinct cache inserts produce 5 entries (one LRU eviction observed).
  • Validation rejects CacheTtlMs > 60_000 unless Cache.AllowLongTtl = true.
  • Hot-reload of CacheTtlMs flushes entries for the affected tag (or, simpler: flushes the entire cache for the PLC). Pick the simpler option (PLC-wide flush) and document.
  • HTML page weight under 50 KB for 54 PLCs (verify with the existing renderer test).
  • docs/kpi.md Tier 1 includes cache-hit-ratio.
  • install/mbproxy.config.template.json includes the new Mbproxy.Cache block with the staleness commentary.

Out of scope

  • Active polling — cache populates on demand only. No background poll loop.
  • Predictive prefetching — no speculative reads.
  • Range-overlap coalescing of cache entries — if reads [100..110] and [105..115] are both cached, no attempt to merge them into one [100..115] entry. Same-key only.
  • Cross-PLC caching — each PLC's cache is independent. No optimisation across PLCs.
  • Persistence — process restart wipes the cache. No file/Redis backing store.
  • Cache warming — no pre-populating the cache from a snapshot, last-known-good file, etc.
  • TTL > 60 seconds without explicit AllowLongTtl opt-in — refused at validation.
  • Adaptive TTL — operator-configured only. No auto-tuning.

Subagent briefing

If you're the agent picking up this phase:

  1. Task 1 is design.md, not code. The contract update is the gate. Do not write the cache code until the design changes have been reviewed and merged (or are in the same PR with explicit reviewer attention). A reviewer who lands the code without the design update has failed the gate, and so have you.

  2. Default TTL = 0 means default behavior = Phase 10 unchanged. Critical for backwards-compat. Every existing test that doesn't set CacheTtlMs must continue to pass without modification.

  3. Cache stores POST-rewriter bytes. The rewriter runs once on the cache-miss path; subsequent hits return cached decoded bytes directly. Do not re-invoke the rewriter on hits — wastes CPU and changes nothing.

  4. Write-invalidation is by ADDRESS RANGE OVERLAP, not by exact key match. A write to register 105 invalidates a cached read of [100..110]. Use half-open interval math: write [w, w+q) overlaps entry [s, s+n) iff w < s+n && s < w+q.

  5. Multi-tag read range: effective TTL is min(TTLs). If any tag in the read range has TTL = 0, the whole read is uncached. Conservative-by-design.

  6. Cache lookup happens BEFORE coalescing. Order: cache check → cache miss → coalescing check (Phase 10) → backend send (Phase 9). A cache hit short-circuits everything.

  7. CacheKey is structurally identical to CoalescingKey. Prefer aliasing over redefinition. If the two phases land together, rename the shared type to ReadKey to make the joint use site neutral.

  8. MBAP TxId restoration on cache-hit responses. The cache stores the PDU bytes (post-rewriter); on hit, build a fresh MBAP wrapper with the requesting client's OriginalTxId. There's no cached MBAP — the per-request TxId is supplied by the upstream pipe's request.

  9. Hot-reload of CacheTtlMs: flush the whole PLC cache on any tag-list change. Tag-level granularity is technically possible but complicates the reload code path. The simple correctness move is "any tag-list change to this PLC → drop all cached entries for this PLC and let them re-populate." Document the choice.

  10. Eviction loop: PeriodicTimer + cancellation token. Not System.Timers.Timer. The cache is IDisposable; the loop honours Dispose.

  11. Update docs/design.md AND docs/kpi.md AND mbproxy/CLAUDE.md AND install/mbproxy.config.template.json IN THE SAME PR AS THE CODE. Doc drift is a gate fail. The architectural pivot must be visible across all reader-facing surfaces.
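
Item 10's eviction loop can be sketched as below, assuming .NET 6 or later for PeriodicTimer. The class name EvictionLoopSketch and the evictExpired callback are hypothetical stand-ins for the scan inside ResponseCache. The 100 ms floor is taken from the implementation clarifications recorded in this document.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

internal sealed class EvictionLoopSketch : IDisposable
{
    private const int MinIntervalMs = 100;   // floor so a misconfigured 0 cannot spin

    private readonly CancellationTokenSource _cts = new();
    private readonly PeriodicTimer _timer;
    private readonly Task _loop;
    private int _disposed;

    public EvictionLoopSketch(int intervalMs, Action evictExpired)
    {
        _timer = new PeriodicTimer(TimeSpan.FromMilliseconds(Math.Max(intervalMs, MinIntervalMs)));
        _loop = RunAsync(evictExpired, _cts.Token);
    }

    private async Task RunAsync(Action evictExpired, CancellationToken ct)
    {
        try
        {
            // WaitForNextTickAsync returns false once the timer is disposed
            // and throws OperationCanceledException when the token is cancelled.
            while (await _timer.WaitForNextTickAsync(ct))
            {
                evictExpired();
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path.
        }
    }

    public void Dispose()
    {
        // Guard against double disposal; Cancel() on a disposed source would throw.
        if (Interlocked.Exchange(ref _disposed, 1) != 0) return;
        _cts.Cancel();
        _timer.Dispose();
        _cts.Dispose();
    }
}
```

This sketch does not wait for the loop task inside Dispose. Whether the real cache should block until the final scan finishes is a design choice this document leaves open.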

Implementation clarifications discovered during this phase

The following clarifications were resolved while implementing Phase 11 — recorded here so the next agent doesn't re-derive them:

  • CacheKey vs CoalescingKey — kept SEPARATE (no aliasing). The two records carry the same dimensions but live in different namespaces (Mbproxy.Proxy.Cache vs Mbproxy.Proxy.Multiplexing). Aliasing them would couple the two phases' evolution; a duplicate 4-field record-struct is cheap enough to justify keeping them independent. Per-key equality is record-struct value equality; the two types are never compared.
  • CacheEntry.LastUsedTick is a long, not ushort. The phase doc proposed ushort but the LRU comparison needs to survive >65K touches in a long-running process. The signed-long ticker stamp suffices for the lifetime of any reasonable deployment.
  • No-cacheable-tag PLCs skip the cache entirely. When a PLC's resolved tag map has no entry with CacheTtlMs > 0, ProxyWorker (and ConfigReconciler on reseat/add) builds the PerPlcContext with Cache = null. The multiplexer's cache check is a no-op on a null cache, and no eviction timer is started. The "default OFF = byte-identical to Phase 10" regression test (Cache_DisabledByDefault_*) lands on this code path.
  • Cache check runs BEFORE EnsureBackendConnectedAsync. A cache hit serves the upstream client even when the backend is currently unreachable. This is intentional and matches the design contract bullet "cache survives backend disconnects." Verified by the unit-level FailedBackendConnect_OnFirstRead_DoesNotPreventLaterCacheHits_* test.
  • FC06 / FC16 invalidation requires startAddr/qty parsing. The multiplexer's request parser previously only extracted start/qty for FC03/FC04. Phase 11 extends it to FC06 (qty = 1) and FC16 (qty from request) so the InFlightRequest carries the write span; the response path then invalidates by overlap using those values.
  • Cache eviction loop uses PeriodicTimer. Per the phase doc; clamps the interval to a 100 ms floor (operator-configurable down to that) so a misconfigured EvictionIntervalMs = 0 doesn't become a tight loop.
  • Write invalidation only fires on SUCCESSFUL responses. The post-rewriter check at the backend reader inspects the response FC byte for the exception-bit (& 0x80). An exception response on FC06 / FC16 (e.g. PLC in PROGRAM mode → code 04) does NOT invalidate — consistent with "the write didn't take effect."
  • Pre-existing flake in BackendDisconnect_CascadesToAllUpstreams hardened with a poll loop. The race window between "upstream EOF observed" and "BackendDisconnectCascades counter incremented in TearDownBackendAsync" is inherent to the multiplexer's serial-pipe-dispose loop; the test now polls for up to 1 s for the counter to reach 3. Behaviour is unchanged.
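
The FC06 / FC16 span extraction described in the clarifications above can be sketched as follows. This assumes the standard Modbus request PDU layouts. The names WriteSpanSketch and TryGetWriteSpan are hypothetical, and the multiplexer's actual parser may be structured differently.

```csharp
using System;
using System.Buffers.Binary;

internal static class WriteSpanSketch
{
    // Request PDU layouts (big-endian fields):
    //   FC06: [0x06][address:2][value:2]                      -> span is one register
    //   FC16: [0x10][start:2][quantity:2][byte count:1][data] -> span is `quantity` registers
    public static bool TryGetWriteSpan(
        ReadOnlySpan<byte> requestPdu, out ushort startAddress, out ushort qty)
    {
        startAddress = 0;
        qty = 0;

        // Both layouts need at least the function code plus two 16-bit fields.
        if (requestPdu.Length < 5) return false;

        switch (requestPdu[0])
        {
            case 0x06:
                startAddress = BinaryPrimitives.ReadUInt16BigEndian(requestPdu.Slice(1, 2));
                qty = 1;
                return true;

            case 0x10:
                startAddress = BinaryPrimitives.ReadUInt16BigEndian(requestPdu.Slice(1, 2));
                qty = BinaryPrimitives.ReadUInt16BigEndian(requestPdu.Slice(3, 2));
                return true;

            default:
                return false;   // not a write this phase invalidates on
        }
    }
}
```

The parsed span would be stored on the in-flight request so that the response path can call Invalidate with it once a non-exception response arrives.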

Cross-references

  • Phase 9's multiplexer is the chokepoint that hosts the cache check: 09-txid-multiplexing.md.
  • Phase 10's CoalescingKey is the same shape as Phase 11's CacheKey: 10-read-coalescing.md.
  • The "not a polling/cache layer" stance that this phase pivots away from: ../design.md → "What this is" + "Purpose".
  • KPI graduation target: ../kpi.md → Tier 1 (cache-hit-ratio joins this tier).
  • Resolution rules for per-tag CacheTtlMs (Global Add Remove fallback + per-PLC default): ../design.md → "Hybrid tag resolution".