wwtools/mbproxy/src/Mbproxy/Admin/StatusDto.cs
Joseph Doherty 1db900edef mbproxy: add opt-in response cache (Phase 11)
Layers a per-PLC, per-tag response cache on top of Phase 10's coalescing.
Cache is OFF by default per tag (CacheTtlMs = 0); a fresh deployment with no
TTL config behaves identically to Phase 10. Operators opt tags in by setting
CacheTtlMs > 0 on a BcdTagOptions entry (or DefaultCacheTtlMs > 0 on a
PlcOptions entry), explicitly acknowledging the staleness window.

Cache lookup order: cache -> coalesce -> backend. A cache hit short-circuits
both Phase 10's coalescing path and Phase 9's backend send. Cache stores
POST-rewriter PDU bytes so hits never re-invoke the BCD rewriter. FC06/FC16
write responses invalidate every cached entry whose address range overlaps
the write (half-open interval math).
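
The overlap rule can be sketched as follows. This is a minimal illustration,
not the CacheInvalidator from this commit: the class name, method name, and
parameter types are assumptions, and only the half-open interval comparison
is taken from the description above.

```csharp
using System;

// Hypothetical helper; the real CacheInvalidator signature is not shown here.
public static class OverlapSketch
{
    // Treats each range as the half-open interval [start, start + qty).
    public static bool Overlaps(
        ushort writeStart, ushort writeQty,
        ushort readStart, ushort readQty)
    {
        // Assumption: a zero-quantity range touches no registers.
        if (writeQty == 0 || readQty == 0) return false;

        // Widen to int so start + qty cannot wrap past 65535.
        int writeEnd = writeStart + writeQty;
        int readEnd = readStart + readQty;

        // Two half-open intervals overlap when each starts before the other ends.
        return writeStart < readEnd && readStart < writeEnd;
    }
}
```

With a cached read of registers 98..101 (start 98, qty 4), a write to
register 100 overlaps, while a write to register 102 is adjacent and leaves
the entry in place.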

New types (Mbproxy.Proxy.Cache, all internal):
- CacheKey (record-struct, same shape as CoalescingKey but kept SEPARATE so
  the two phases evolve independently).
- CacheEntry, ResponseCache (IDisposable; LRU + PeriodicTimer eviction
  loop), CacheInvalidator (pure overlap matcher), CacheLogEvents (stable
  mbproxy.cache.* names).

Multi-tag range TTL = min(TTLs); any tag with TTL = 0 in the range disables
caching for the whole read (conservative-by-design).
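
A minimal sketch of the min(TTLs) rule, assuming the per-tag TTLs for the
read have already been resolved to plain millisecond values. The class and
method names are hypothetical, and treating a read that covers no configured
tags as uncacheable is an assumption of this sketch, not something the commit
states.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the rule; not the proxy's actual code.
public static class RangeTtlSketch
{
    // Returns the effective TTL in ms for one read; 0 means "do not cache".
    public static int EffectiveTtlMs(IReadOnlyList<int> tagTtlsMs)
    {
        // Assumption: a read with no configured tags is not cached.
        if (tagTtlsMs.Count == 0) return 0;

        int min = int.MaxValue;
        foreach (int ttl in tagTtlsMs)
        {
            // Any tag with TTL = 0 disables caching for the whole read.
            if (ttl <= 0) return 0;
            min = Math.Min(min, ttl);
        }
        return min;
    }
}
```

For example, tags with TTLs of 1000 ms and 250 ms give an effective TTL of
250 ms, and adding a third tag with TTL 0 makes the read uncacheable.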

Options surface:
- BcdTagOptions.CacheTtlMs (nullable int; null = fall through to PLC default)
- PlcOptions.DefaultCacheTtlMs
- MbproxyOptions.Cache.{AllowLongTtl, MaxEntriesPerPlc, EvictionIntervalMs}
- TTL > 60_000 ms requires Cache.AllowLongTtl = true (reload validation).
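
The fall-through and the long-TTL guard could look roughly like this. It is a
sketch under assumptions: the helper names are hypothetical,
DefaultCacheTtlMs is assumed to be a non-nullable int, and the real reload
validator may check more than the 60_000 ms threshold.

```csharp
// Hypothetical helpers; only the option names and the 60_000 ms threshold
// come from the commit message.
public static class TtlResolutionSketch
{
    public const int LongTtlThresholdMs = 60_000;

    // A null per-tag TTL falls through to the PLC-level default.
    public static int ResolveTtlMs(int? tagCacheTtlMs, int plcDefaultCacheTtlMs)
        => tagCacheTtlMs ?? plcDefaultCacheTtlMs;

    // TTLs above the threshold are accepted only when AllowLongTtl is set.
    public static bool IsTtlAllowed(int ttlMs, bool allowLongTtl)
        => ttlMs <= LongTtlThresholdMs || allowLongTtl;
}
```

Note that an explicit CacheTtlMs = 0 on a tag is not null, so under this
reading it overrides a non-zero PLC default and keeps that tag uncached.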

Admin counters (Tier 1.8 + Tier 2 cache-memory KPIs from docs/kpi.md):
- CacheHitCount, CacheMissCount, CacheInvalidations on ProxyCounters.
- CacheEntryCount, CacheBytes via a new ICacheStatsProvider snapshot path.
- /status.json and the HTML page surface a new Cache cell per PLC row.

Hot-reload: any tag-list change to a PLC reseats the per-PLC context with a
fresh cache; the old cache is disposed inside ReplaceContextAsync. Per-tag
flush granularity is intentionally not implemented in v1.

PLCs with no cache-eligible tags (every resolved tag has CacheTtlMs = 0)
get Cache = null on the context and skip the eviction timer entirely, so
the no-cache path is byte-identical to Phase 10.

Tests (32 new unit + 5 new E2E = 37 new; suite now 314 unit + 48 E2E):
- CacheKeyTests, CacheEntryTests (records + boundary semantics).
- CacheInvalidatorTests: full overlap, both partials, adjacent-not-
  overlapping, disjoint, different unit ID + auxiliary FC-filter / zero-qty.
- ResponseCacheTests: round-trip, lazy expiry, range invalidation,
  unit-id filter, LRU bound, LRU access tracking, concurrent get/set,
  dispose, clear, approximate-bytes accounting.
- ResponseCacheMultiplexerTests (stub-backend): hit short-circuits
  coalescing, BCD-decoded bytes are cached not raw, FC06 invalidates
  overlapping, non-overlapping write does not invalidate, multi-tag
  TTL=min rule, regression-cache-disabled-by-default-is-Phase-10, hit
  works even when backend unreachable.
- ResponseCacheE2ETests (pymodbus DL205 sim, sequential reads):
  * Headline: 10 reads with TTL=1000 ms -> 9 hits, 1 miss, 1 backend trip.
  * TTL expiry path with sleep > TTL.
  * Write invalidation through the proxy on a scratch register.
  * BCD-decoded bytes are cached, not raw BCD nibbles.
  * Regression: Cache disabled by default -> behaviour byte-identical to
    Phase 10.

Pre-existing flake hardened: BackendDisconnect_CascadesToAllUpstreams now
polls briefly for the cascade counter to absorb the inherent scheduling
gap between "upstream EOF observed" and "counter incremented inside
TearDownBackendAsync." Counter semantics unchanged.
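
The polling approach can be sketched as a small helper. This is an
illustrative pattern only; the helper name, timeout, and interval are
assumptions and are not taken from the test in this commit.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical test helper: polls a condition until it holds or time runs out.
public static class PollSketch
{
    public static async Task<bool> WaitUntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (!condition())
        {
            // Give the condition one last check once the timeout has elapsed.
            if (stopwatch.Elapsed >= timeout) return condition();
            await Task.Delay(interval);
        }
        return true;
    }
}
```

A test would await something like
WaitUntilAsync(() => counterValue >= 1, timeout, interval), where
counterValue stands in for however the test reads the cascade counter, and
then assert on the result, so the counter semantics stay untouched.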

Phase doc updated with implementation clarifications discovered during
this work (CacheKey kept separate from CoalescingKey, LastUsedTick is
long, FC06/FC16 startAddr/qty parsing extension, cache-pre-connect
short-circuit, write-invalidation only on successful responses).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 03:08:51 -04:00

using System.Text.Json.Serialization;

namespace Mbproxy.Admin;

// ── Wire DTOs for GET /status.json ───────────────────────────────────────────
// Field names must match design.md "Status page" tables EXACTLY (camelCase via
// JsonKnownNamingPolicy.CamelCase on the source-gen context).

/// <summary>
/// Top-level response envelope for <c>GET /status.json</c>.
/// </summary>
public sealed record StatusResponse(
    ServiceFields Service,
    ListenersAggregate Listeners,
    IReadOnlyList<PlcStatus> Plcs);

/// <summary>Service-wide identity and reload counters.</summary>
public sealed record ServiceFields(
    long UptimeSeconds,
    string Version,
    DateTimeOffset? ConfigLastReloadUtc,
    int ConfigReloadCount,
    int ConfigReloadRejectedCount);

/// <summary>Aggregate listener state across all configured PLCs.</summary>
public sealed record ListenersAggregate(int Bound, int Configured);

/// <summary>Per-PLC status row.</summary>
public sealed record PlcStatus(
    string Name,
    string Host,
    int ListenPort,
    PlcListenerStatus Listener,
    PlcClientsStatus Clients,
    PlcPdusStatus Pdus,
    PlcBackendStatus Backend,
    PlcBytesStatus Bytes);

/// <summary>Listener state sub-object.</summary>
public sealed record PlcListenerStatus(
    string State,
    string? LastBindError,
    int RecoveryAttempts);

/// <summary>Connected-clients sub-object.</summary>
public sealed record PlcClientsStatus(
    int Connected,
    IReadOnlyList<ClientSnapshot> RemoteEndpoints);

/// <summary>Per-connection-pair snapshot for the status page.</summary>
public sealed record ClientSnapshot(
    string Remote,
    DateTimeOffset ConnectedAtUtc,
    long PdusForwarded);

/// <summary>PDU counters sub-object.</summary>
public sealed record PlcPdusStatus(
    long Forwarded,
    FcCounts ByFc,
    long RewrittenSlots,
    long PartialBcdWarnings);

/// <summary>Per-function-code request counts.</summary>
public sealed record FcCounts(
    long Fc03,
    long Fc04,
    long Fc06,
    long Fc16,
    long Other);

/// <summary>
/// Backend connect, exception, and multiplexer telemetry. Phase 9 added
/// <c>InFlight</c>, <c>MaxInFlight</c>, <c>TxIdWraps</c>, <c>DisconnectCascades</c>, and
/// <c>QueueDepth</c>. Phase 10 added the three coalescing counters
/// (<c>CoalescedHitCount</c>, <c>CoalescedMissCount</c>, <c>CoalescedResponseToDeadUpstream</c>);
/// the dashboard-side derived <c>coalescingRatio</c> is intentionally NOT carried on the wire;
/// consumers compute <c>Hit / (Hit + Miss)</c>. Phase 11 added the five cache counters
/// (<c>CacheHitCount</c>, <c>CacheMissCount</c>, <c>CacheInvalidations</c>,
/// <c>CacheEntryCount</c>, <c>CacheBytes</c>); the dashboard-side derived
/// <c>cacheHitRatio</c> is intentionally NOT carried on the wire.
/// </summary>
public sealed record PlcBackendStatus(
    long ConnectsSuccess,
    long ConnectsFailed,
    ExceptionCounts ExceptionsByCode,
    double LastRoundTripMs,
    long InFlight,
    long MaxInFlight,
    long TxIdWraps,
    long DisconnectCascades,
    long QueueDepth,
    long CoalescedHitCount,
    long CoalescedMissCount,
    long CoalescedResponseToDeadUpstream,
    long CacheHitCount,
    long CacheMissCount,
    long CacheInvalidations,
    long CacheEntryCount,
    long CacheBytes);

/// <summary>Modbus exception counts by code.</summary>
public sealed record ExceptionCounts(
    long Code01,
    long Code02,
    long Code03,
    long Code04);

/// <summary>Byte-transfer counters.</summary>
public sealed record PlcBytesStatus(
    long UpstreamIn,
    long UpstreamOut);

// ── Source-generation context ─────────────────────────────────────────────────
// TreatWarningsAsErrors is on, so the context must include every reachable type.
[JsonSerializable(typeof(StatusResponse))]
[JsonSourceGenerationOptions(
    WriteIndented = false,
    PropertyNamingPolicy = JsonKnownNamingPolicy.CamelCase)]
internal partial class StatusJsonContext : JsonSerializerContext;
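
Because the derived ratios are left to consumers, a dashboard reading the
camelCase counters (for example `cacheHitCount` and `cacheMissCount` under a
PLC's `backend` object) might compute them along these lines. This is a
consumer-side sketch, not part of this file: the class name is hypothetical,
and returning null when there have been no lookups is an assumption.

```csharp
// Hypothetical consumer-side helper; works for both coalescingRatio and
// cacheHitRatio since both are defined as Hit / (Hit + Miss).
public static class RatioSketch
{
    public static double? Ratio(long hits, long misses)
    {
        long total = hits + misses;

        // Assumption: report "no data" rather than 0 when nothing was counted.
        if (total == 0) return null;

        return (double)hits / total;
    }
}
```

Using the headline E2E figures from the commit message (9 hits, 1 miss), this
yields a cache hit ratio of 0.9.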