mbproxy/docs: pivot design contract for Phase 11 response cache

Lands the design-contract pivot ahead of any cache implementation code so
reviewers can evaluate the change to the "purely transparent proxy" stance
independently of the Phase-11 code that depends on it.

- docs/design.md: rewrite "What this is" / Read-coalescing / Failure-modes
  sections to acknowledge the opt-in cache; add new "Response cache (Phase
  11)" section covering lookup order (cache -> coalesce -> backend), multi-
  tag range TTL = min, post-rewriter storage, address-range-overlap write
  invalidation, hot-reload PLC-wide flush, no-persistence, AllowLongTtl gate,
  and LRU-bounded capacity. Extend log event table with mbproxy.cache.*
  events. Extend per-PLC status field table with cacheHitCount /
  cacheMissCount / cacheInvalidations / cacheEntryCount / cacheBytes.
  Extend hot-reload propagation table with CacheTtlMs / Cache.* rows.
- docs/kpi.md: graduate Tier 1.8 (response cache) from "requires Phase 11"
  to "shipped in Phase 11" and add Tier 2.4a cache-memory section.
- CLAUDE.md (mbproxy): update Purpose paragraph and the Architecture
  headline bullets to reflect the transparent-by-default + opt-in-cache
  contract; flip "Implementation complete through Phase 10" to "through
  Phase 11".
- install/mbproxy.config.template.json: add a fully-commented Mbproxy.Cache
  block and a CacheTtlMs example on a BcdTags.Global entry, with prominent
  staleness commentary documenting the design contract.

No code changes in this commit - implementation lands in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-14 02:35:35 -04:00
parent a2dba4bd07
commit 892b10baf4
4 changed files with 124 additions and 12 deletions
+7 -6
View File
@@ -11,7 +11,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
### Purpose: bidirectional BCD rewrite inline on the MBTCP stream
The service is **not** a polling/cache layer. It is a transparent Modbus TCP proxy whose job is to **rewrite the configured BCD tags in real time, in both directions**, while proxying every other byte of the MBTCP connection untouched:
The service is a **transparent-by-default** Modbus TCP proxy whose job is to **rewrite the configured BCD tags in real time, in both directions**, while proxying every other byte of the MBTCP connection untouched. Since Phase 11 the proxy also exposes an **opt-in per-tag response cache** (default OFF, opt-in by setting `BcdTagOptions.CacheTtlMs > 0`); with caching enabled the proxy is no longer purely transparent — upstream reads may return a value up to `CacheTtlMs` milliseconds old.
- **Upstream read path (client → PLC → client).** When a client reads a register on the BCD tag list, the proxy intercepts the PLC's response and rewrites the raw BCD nibbles (`0x1234`) into the binary integer the client expects (`0x04D2` = decimal 1234) before forwarding. 32-bit BCD values that span the CDAB word pair are rewritten as a unit.
- **Downstream write path (client → PLC).** When a client writes a register on the BCD tag list, the proxy intercepts the request and re-encodes the client's binary integer (`0x04D2`) into BCD nibbles (`0x1234`) before forwarding to the PLC, so the value the operator sees in ladder matches what the client wrote.
@@ -24,21 +24,22 @@ The integration win is that upstream consumers (Wonderware / Historian / OPC UA
The full design plan is in **[`docs/design.md`](docs/design.md)** — settled 2026-05-13, updated for Phase 9 multiplexing on 2026-05-14. Headline choices the agent should keep in mind without opening that file:
- **One `TcpListener` per PLC** (54 distinct ports). Each PLC has **one shared backend socket** owned by a `PlcMultiplexer`; many upstream clients are multiplexed onto that single backend via MBAP TxId rewriting (Phase 9). The H2-ECOM100's 4-client cap no longer caps upstream connections.
- **Transparent pass-through** of every byte except the MBAP TxId field (rewritten by the multiplexer on each request and restored on each response) and FC03/FC04 response payloads + FC06/FC16 request payloads at configured BCD addresses (re-encoded between BCD nibbles and binary integers).
- **Transparent by default; opt-in cached** (Phase 11). Every byte passes through unchanged except the MBAP TxId field (rewritten by the multiplexer on each request and restored on each response) and FC03/FC04 response payloads + FC06/FC16 request payloads at configured BCD addresses (re-encoded between BCD nibbles and binary integers). With Phase 11, FC03/FC04 reads for tags whose `CacheTtlMs > 0` may be served from a per-PLC in-process cache without backend traffic; the cache is **OFF by default** per tag.
- **In-flight FC03/FC04 read coalescing** (Phase 10): same-key reads arriving while a peer is in flight attach to the existing `InFlightRequest.InterestedParties` list; the single backend response fans out to every attached client with original TxIds restored. Zero post-response staleness — coalescing entries die when the response arrives. Hot-reload via `Mbproxy.Resilience.ReadCoalescing.Enabled`.
- **Optional response cache** (Phase 11) with per-tag TTL (default 0 = off). Lookup order is **cache → coalesce → backend**: a cache hit short-circuits Phase 10's coalescing path entirely. Multi-tag read range: effective TTL = `min(TTLs)`; any tag with `CacheTtlMs = 0` in the range disables caching for the whole read. Successful FC06/FC16 write responses invalidate cached FC03/FC04 entries whose address range overlaps the write. Cache stores POST-rewriter bytes (hits never re-invoke the rewriter). No persistence — process restart wipes the cache. `Cache.AllowLongTtl = true` is required for any `CacheTtlMs > 60_000`.
- **Polly-backed listener supervisor** auto-recovers any listener that fails to bind at startup or faults at runtime; the same code path also brings up newly-added PLCs from hot-reload and tears down removed ones.
- **`appsettings.json` is hot-reloadable** via `IOptionsMonitor<MbproxyOptions>`; tag-list changes propagate per-PDU, PLC add/remove flows through the supervisor.
- **`appsettings.json` is hot-reloadable** via `IOptionsMonitor<MbproxyOptions>`; tag-list changes propagate per-PDU, PLC add/remove flows through the supervisor. A tag-list reload flushes the affected PLC's response cache (per-tag granularity intentionally not done in v1).
- **Polly bounded retries** on backend connect (3 attempts at 100ms / 500ms / 2000ms). No retries on mid-request failures (FC06/FC16 are non-idempotent on BCD tags). A per-request watchdog in the multiplexer surfaces Modbus exception 0x0B to the upstream client if a backend response never arrives within `BackendRequestTimeoutMs`.
- **Backend disconnect cascades upstream**: when the shared backend socket dies, every attached upstream pipe is closed in the same cycle (counter `BackendDisconnectCascades`); clients reconnect on their next request.
- **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields `inFlight`, `maxInFlight`, `txIdWraps`, `disconnectCascades`, `queueDepth` and Phase-10 coalescing fields `coalescedHitCount`, `coalescedMissCount`, `coalescedResponseToDeadUpstream`).
- **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields, Phase-10 coalescing fields, and Phase-11 cache fields `cacheHitCount`, `cacheMissCount`, `cacheInvalidations`, `cacheEntryCount`, `cacheBytes`).
Anything beyond this short list — JSON schema, propagation table, stable log event names, status counter catalog, test plan — lives in `docs/design.md`. Open that doc before writing code; keep it in sync when decisions change.
## Current state
**Implementation complete through Phase 10.** Phases 0008 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model without changing the transparent-rewrite contract; Phase 10 added in-flight read coalescing as an additive optimization on top of the multiplexer. The service is production-ready as a Windows Service:
**Implementation complete through Phase 11.** Phases 0008 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model; Phase 10 added in-flight read coalescing on top; Phase 11 added an opt-in per-tag response cache (bounded staleness, OFF by default — see "Response cache" in `docs/design.md`). The service is production-ready as a Windows Service:
- 325 tests passing: 282 unit tests + 43 E2E tests (against the pymodbus DL205 simulator + stub backends).
- Test count grew through Phase 11 (see `tests/Mbproxy.Tests/` for the current suite; previous baseline was 325 = 282 unit + 43 E2E).
- Single-file self-contained publish (`dotnet publish -c Release -r win-x64`).
- PowerShell install/uninstall scripts under `install/`.
- Graceful shutdown with configurable drain timeout (`Connection.GracefulShutdownTimeoutMs`, default 10 s).