mbproxy: add opt-in response cache (Phase 11)

Layers a per-PLC, per-tag response cache on top of Phase 10's coalescing.
Cache is OFF by default per tag (CacheTtlMs = 0); a fresh deployment with no
TTL config behaves identically to Phase 10. Operators opt tags in by setting
CacheTtlMs > 0 on a BcdTagOptions entry (or DefaultCacheTtlMs > 0 on a
PlcOptions entry), explicitly acknowledging the staleness window.

Cache lookup order: cache -> coalesce -> backend. A cache hit short-circuits
both Phase 10's coalescing path and Phase 9's backend send. Cache stores
POST-rewriter PDU bytes so hits never re-invoke the BCD rewriter. FC06/FC16
write responses invalidate every cached entry whose address range overlaps
the write (half-open interval math).
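The half-open overlap test can be sketched as below. This is a minimal illustration, not the commit's CacheInvalidator: the class name OverlapSketch and the Overlaps signature are hypothetical, and treating a zero or negative quantity as "never overlaps" is an assumption.

```csharp
using System;

// Hypothetical helper for illustration only; not the CacheInvalidator from this commit.
internal static class OverlapSketch
{
    // Each range is treated as [start, start + qty): the end address is exclusive.
    public static bool Overlaps(int readStart, int readQty, int writeStart, int writeQty)
    {
        // Assumption: an empty range never overlaps anything.
        if (readQty <= 0 || writeQty <= 0) return false;

        int readEnd = readStart + readQty;    // exclusive end of the cached read
        int writeEnd = writeStart + writeQty; // exclusive end of the write

        // Two half-open intervals overlap when each one starts before the other ends.
        return readStart < writeEnd && writeStart < readEnd;
    }
}
```

Under this sketch a cached read of 10 registers at address 100 overlaps a single-register write at 109, but not one at 110, because the exclusive end makes adjacent ranges non-overlapping.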

New types (Mbproxy.Proxy.Cache, all internal):
- CacheKey (record-struct, same shape as CoalescingKey but kept SEPARATE so
  the two phases evolve independently).
- CacheEntry, ResponseCache (IDisposable; LRU + PeriodicTimer eviction
  loop), CacheInvalidator (pure overlap matcher), CacheLogEvents (stable
  mbproxy.cache.* names).

Multi-tag range TTL = min(TTLs); any tag with TTL = 0 in the range disables
caching for the whole read (conservative-by-design).
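A minimal sketch of the min(TTLs) rule follows. RangeTtlSketch and Resolve are hypothetical names, the PLC default is assumed to be a plain int, and returning 0 for a read that covers no tags is an assumption rather than something this commit states.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of this commit.
internal static class RangeTtlSketch
{
    // Returns the effective TTL in ms for a read spanning several tags.
    // A result of 0 means "do not cache this read".
    public static int Resolve(IEnumerable<int?> tagTtlsMs, int plcDefaultTtlMs)
    {
        int? effective = null;
        foreach (int? tagTtl in tagTtlsMs)
        {
            // A null per-tag TTL falls through to the PLC default.
            int ttl = tagTtl ?? plcDefaultTtlMs;

            // Any tag with TTL = 0 disables caching for the whole read.
            if (ttl <= 0) return 0;

            effective = effective is null ? ttl : Math.Min(effective.Value, ttl);
        }

        // Assumption: a read covering no tags is not cache-eligible.
        return effective ?? 0;
    }
}
```

For example, tags with TTLs of 1000 ms and 250 ms would resolve to 250 ms, while adding a third tag with TTL = 0 would resolve the whole read to 0.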

Options surface:
- BcdTagOptions.CacheTtlMs (nullable int; null = fall through to PLC default)
- PlcOptions.DefaultCacheTtlMs
- MbproxyOptions.Cache.{AllowLongTtl, MaxEntriesPerPlc, EvictionIntervalMs}
- TTL > 60_000 ms requires Cache.AllowLongTtl = true (reload validation).
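The long-TTL guard might look roughly like the sketch below. TtlValidationSketch, Validate, and the error wording are hypothetical; the rejection of negative values is an assumption that the commit message does not mention.

```csharp
using System;

// Hypothetical validator for illustration only; not the commit's reload validation.
internal static class TtlValidationSketch
{
    public const int LongTtlThresholdMs = 60_000;

    // Returns null when the TTL is acceptable, otherwise an error message.
    public static string? Validate(int ttlMs, bool allowLongTtl)
    {
        // Assumption: negative TTLs are rejected outright.
        if (ttlMs < 0) return "CacheTtlMs must not be negative.";

        // TTLs above the threshold need the explicit Cache.AllowLongTtl opt-in.
        if (ttlMs > LongTtlThresholdMs && !allowLongTtl)
            return $"CacheTtlMs {ttlMs} exceeds {LongTtlThresholdMs} ms; set Cache.AllowLongTtl = true.";

        return null;
    }
}
```

Note that exactly 60_000 ms passes without the opt-in in this sketch, matching the strict "greater than" wording above.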

Admin counters (Tier 1.8 + Tier 2 cache-memory KPIs from docs/kpi.md):
- CacheHitCount, CacheMissCount, CacheInvalidations on ProxyCounters.
- CacheEntryCount, CacheBytes via a new ICacheStatsProvider snapshot path.
- /status.json and the HTML page surface a new Cache cell per PLC row.

Hot-reload: any tag-list change to a PLC reseats the per-PLC context with a
fresh cache; the old cache is disposed inside ReplaceContextAsync. Per-tag
flush granularity is intentionally not implemented in v1.

PLCs with no cache-eligible tags (every resolved tag has CacheTtlMs = 0)
get Cache = null on the context and skip the eviction timer entirely, so
the no-cache path is byte-identical to Phase 10.

Tests (32 new unit + 5 new E2E = 37 new; suite now 314 unit + 48 E2E):
- CacheKeyTests, CacheEntryTests (records + boundary semantics).
- CacheInvalidatorTests: full overlap, both partials, adjacent-not-
  overlapping, disjoint, different unit ID + auxiliary FC-filter / zero-qty.
- ResponseCacheTests: round-trip, lazy expiry, range invalidation,
  unit-id filter, LRU bound, LRU access tracking, concurrent get/set,
  dispose, clear, approximate-bytes accounting.
- ResponseCacheMultiplexerTests (stub-backend): hit short-circuits
  coalescing, BCD-decoded bytes are cached not raw, FC06 invalidates
  overlapping, non-overlapping write does not invalidate, multi-tag
  TTL=min rule, regression-cache-disabled-by-default-is-Phase-10, hit
  works even when backend unreachable.
- ResponseCacheE2ETests (pymodbus DL205 sim, sequential reads):
  * Headline: 10 reads with TTL=1000 ms -> 9 hits, 1 miss, 1 backend trip.
  * TTL expiry path with sleep > TTL.
  * Write invalidation through the proxy on a scratch register.
  * BCD-decoded bytes are cached, not raw BCD nibbles.
  * Regression: Cache disabled by default -> behaviour byte-identical to
    Phase 10.

Pre-existing flake hardened: BackendDisconnect_CascadesToAllUpstreams now
polls briefly for the cascade counter to absorb the inherent scheduling
gap between "upstream EOF observed" and "counter incremented inside
TearDownBackendAsync." Counter semantics unchanged.

Phase doc updated with implementation clarifications discovered during
this work (CacheKey kept separate from CoalescingKey, LastUsedTick is
long, FC06/FC16 startAddr/qty parsing extension, cache-pre-connect
short-circuit, write-invalidation only on successful responses).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-14 03:08:51 -04:00
parent 892b10baf4
commit 1db900edef
33 changed files with 2407 additions and 38 deletions
+81 -2
@@ -97,7 +97,34 @@ public sealed record CounterSnapshot(
     /// attached upstream pipe had already disconnected. A spike is a churn indicator; the
     /// metric itself is informational (Tier 2 in <c>docs/kpi.md</c>).
     /// </summary>
-    long CoalescedResponseToDeadUpstream);
+    long CoalescedResponseToDeadUpstream,
+    /// <summary>
+    /// Phase 11 — cumulative count of FC03/FC04 requests served from the response cache.
+    /// <c>CacheHitCount + CacheMissCount</c> equals total FC03/FC04 requests whose resolved
+    /// TTL was &gt; 0 (cache-eligible). Reads against tags with TTL = 0 increment neither.
+    /// </summary>
+    long CacheHitCount,
+    /// <summary>
+    /// Phase 11 — cumulative count of cache-eligible FC03/FC04 requests that fell through
+    /// to coalescing / backend (no fresh entry was present or the entry had expired).
+    /// </summary>
+    long CacheMissCount,
+    /// <summary>
+    /// Phase 11 — cumulative count of cache entries invalidated by overlapping FC06/FC16
+    /// write responses. A high rate suggests caching is fighting writes; consider lower
+    /// TTLs on cache-overlapping tags.
+    /// </summary>
+    long CacheInvalidations,
+    /// <summary>
+    /// Phase 11 — point-in-time snapshot of the per-PLC <see cref="Cache.ResponseCache"/>
+    /// entry count. Read on the snapshot path; 0 when no cache is wired.
+    /// </summary>
+    long CacheEntryCount,
+    /// <summary>
+    /// Phase 11 — point-in-time approximation of cached PDU bytes for this PLC. Sum of
+    /// <see cref="Cache.CacheEntry.Length"/> across entries. Read on the snapshot path.
+    /// </summary>
+    long CacheBytes);

 /// <summary>
 /// Thread-safe per-PLC counters backed by <see cref="System.Threading.Interlocked"/> longs.
@@ -137,6 +164,16 @@ internal sealed class ProxyCounters
     private long _coalescedMissCount;
     private long _coalescedResponseToDeadUpstream;

+    // Phase 11 — response-cache counters. Hit + Miss = total cache-eligible FC03/FC04.
+    private long _cacheHitCount;
+    private long _cacheMissCount;
+    private long _cacheInvalidations;
+
+    // Phase 11 — live cache state pulled from a per-PLC ResponseCache on each snapshot.
+    // The multiplexer registers a single provider via SetCacheStatsProvider so the status
+    // page sees current entry-count / bytes without a separate poll.
+    private volatile ICacheStatsProvider? _cacheStatsProvider;
+
     // Phase 9: live state pulled from the multiplexer's allocator/map/queue on each
     // snapshot. The multiplexer registers a single provider via SetMultiplexProvider.
     // We use a volatile reference for lock-free read on the snapshot path.
@@ -244,6 +281,25 @@ internal sealed class ProxyCounters
     public void IncrementCoalescedResponseToDeadUpstream()
         => Interlocked.Increment(ref _coalescedResponseToDeadUpstream);

+    /// <summary>Phase 11 — records one FC03/FC04 cache hit.</summary>
+    public void IncrementCacheHit()
+        => Interlocked.Increment(ref _cacheHitCount);
+
+    /// <summary>Phase 11 — records one cache-eligible FC03/FC04 read that missed.</summary>
+    public void IncrementCacheMiss()
+        => Interlocked.Increment(ref _cacheMissCount);
+
+    /// <summary>Phase 11 — records <paramref name="n"/> cache entries invalidated by a write.</summary>
+    public void AddCacheInvalidations(int n)
+        => Interlocked.Add(ref _cacheInvalidations, n);
+
+    /// <summary>
+    /// Phase 11 — wires the per-PLC <see cref="Cache.ResponseCache"/> as the live stats
+    /// source for the snapshot path. Pass <c>null</c> to detach during disposal.
+    /// </summary>
+    internal void SetCacheStatsProvider(ICacheStatsProvider? provider)
+        => _cacheStatsProvider = provider;
+
     /// <summary>
     /// CAS-updates the peak in-flight high-water mark. Called on every successful
     /// allocation by the multiplexer. Phase 9.
@@ -328,6 +384,10 @@ internal sealed class ProxyCounters
         long txWraps = provider?.TxIdWraps ?? 0;
         long queueDepth = provider?.BackendQueueDepth ?? 0;

+        var cacheProvider = _cacheStatsProvider;
+        long cacheEntries = cacheProvider?.EntryCount ?? 0;
+        long cacheBytes = cacheProvider?.ApproximateBytes ?? 0;
+
         return new(
             PdusForwarded: Interlocked.Read(ref _pdusForwarded),
             Fc03: Interlocked.Read(ref _fc03),
@@ -357,7 +417,12 @@ internal sealed class ProxyCounters
             BackendQueueDepth: queueDepth,
             CoalescedHitCount: Interlocked.Read(ref _coalescedHitCount),
             CoalescedMissCount: Interlocked.Read(ref _coalescedMissCount),
-            CoalescedResponseToDeadUpstream: Interlocked.Read(ref _coalescedResponseToDeadUpstream));
+            CoalescedResponseToDeadUpstream: Interlocked.Read(ref _coalescedResponseToDeadUpstream),
+            CacheHitCount: Interlocked.Read(ref _cacheHitCount),
+            CacheMissCount: Interlocked.Read(ref _cacheMissCount),
+            CacheInvalidations: Interlocked.Read(ref _cacheInvalidations),
+            CacheEntryCount: cacheEntries,
+            CacheBytes: cacheBytes);
     }
 }
@@ -380,3 +445,17 @@ internal interface IMultiplexCountersProvider
     /// <summary>Current depth of the outbound channel (frames queued for the backend writer).</summary>
     long BackendQueueDepth { get; }
 }
+
+/// <summary>
+/// Phase 11 — read-only window into the per-PLC <see cref="Cache.ResponseCache"/>'s live
+/// state for the snapshot path. The multiplexer wires this on cache construction so the
+/// status page sees live counts without holding a direct reference to the cache.
+/// </summary>
+internal interface ICacheStatsProvider
+{
+    /// <summary>Current cache entry count.</summary>
+    long EntryCount { get; }
+
+    /// <summary>Approximation of cached PDU bytes (sum of <see cref="Cache.CacheEntry.Length"/>).</summary>
+    long ApproximateBytes { get; }
+}