892b10baf4
Lands the design-contract pivot ahead of any cache implementation code so reviewers can evaluate the change to the "purely transparent proxy" stance independently of the Phase-11 code that depends on it. - docs/design.md: rewrite "What this is" / Read-coalescing / Failure-modes sections to acknowledge the opt-in cache; add new "Response cache (Phase 11)" section covering lookup order (cache -> coalesce -> backend), multi- tag range TTL = min, post-rewriter storage, address-range-overlap write invalidation, hot-reload PLC-wide flush, no-persistence, AllowLongTtl gate, and LRU-bounded capacity. Extend log event table with mbproxy.cache.* events. Extend per-PLC status field table with cacheHitCount / cacheMissCount / cacheInvalidations / cacheEntryCount / cacheBytes. Extend hot-reload propagation table with CacheTtlMs / Cache.* rows. - docs/kpi.md: graduate Tier 1.8 (response cache) from "requires Phase 11" to "shipped in Phase 11" and add Tier 2.4a cache-memory section. - CLAUDE.md (mbproxy): update Purpose paragraph and the Architecture headline bullets to reflect the transparent-by-default + opt-in-cache contract; flip "Implementation complete through Phase 10" to "through Phase 11". - install/mbproxy.config.template.json: add a fully-commented Mbproxy.Cache block and a CacheTtlMs example on a BcdTags.Global entry, with prominent staleness commentary documenting the design contract. No code changes in this commit - implementation lands in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
91 lines
12 KiB
Markdown
91 lines
12 KiB
Markdown
# CLAUDE.md
|
||
|
||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||
|
||
## What this is
|
||
|
||
`mbproxy` is a **C# .NET 10** background service (Windows Service) that sits **inline as a Modbus TCP proxy** in front of a fleet of **~54 AutomationDirect DirectLOGIC DL205 / DL260** equipment controllers. It is pre-configured with two pieces of static data:
|
||
|
||
1. **A list of BCD tags** — the holding/input registers (by Modbus address and bit width) that the controllers store in DirectLOGIC's native BCD encoding (`V2000 = 1234` is stored on the wire as `0x1234`, *not* `0x04D2`).
|
||
2. **A list of equipment controller IP addresses** (~54 entries) for the DL205/DL260 fleet. Each controller speaks Modbus TCP on port 502 via either the built-in DL260 Ethernet port or an H2-ECOM100 / H2-EBC100 coprocessor.
|
||
|
||
### Purpose: bidirectional BCD rewrite inline on the MBTCP stream
|
||
|
||
The service is a **transparent-by-default** Modbus TCP proxy whose job is to **rewrite the configured BCD tags in real time, in both directions**, while proxying every other byte of the MBTCP connection untouched. Since Phase 11 the proxy also exposes an **opt-in per-tag response cache** (default OFF, opt-in by setting `BcdTagOptions.CacheTtlMs > 0`); with caching enabled the proxy is no longer purely transparent — upstream reads may return a value up to `CacheTtlMs` milliseconds old.
|
||
|
||
- **Upstream read path (client → PLC → client).** When a client reads a register on the BCD tag list, the proxy intercepts the PLC's response and rewrites the raw BCD nibbles (`0x1234`) into the binary integer the client expects (`0x04D2` = decimal 1234) before forwarding. 32-bit BCD values that span the CDAB word pair are rewritten as a unit.
|
||
- **Downstream write path (client → PLC).** When a client writes a register on the BCD tag list, the proxy intercepts the request and re-encodes the client's binary integer (`0x04D2`) into BCD nibbles (`0x1234`) before forwarding to the PLC, so the value the operator sees in ladder matches what the client wrote.
|
||
- **Everything else passes through unchanged.** Non-BCD registers, coils, discrete inputs, function codes the service doesn't touch (diagnostics, exception responses, etc.) are forwarded byte-for-byte. MBAP transaction IDs and unit IDs are preserved end-to-end so the proxy is invisible to both sides.
|
||
|
||
The integration win is that upstream consumers (Wonderware / Historian / OPC UA gateways / generic Modbus clients) can read and write the configured BCD tags as if they were plain `Int16`/`Int32`, and the proxy is the only place that has to know which registers are BCD.
|
||
|
||
## Architecture
|
||
|
||
The full design plan is in **[`docs/design.md`](docs/design.md)** — settled 2026-05-13, updated for Phase 9 multiplexing on 2026-05-14. Headline choices the agent should keep in mind without opening that file:
|
||
|
||
- **One `TcpListener` per PLC** (54 distinct ports). Each PLC has **one shared backend socket** owned by a `PlcMultiplexer`; many upstream clients are multiplexed onto that single backend via MBAP TxId rewriting (Phase 9). The H2-ECOM100's 4-client cap no longer caps upstream connections.
|
||
- **Transparent by default; opt-in cached** (Phase 11). Every byte passes through unchanged except the MBAP TxId field (rewritten by the multiplexer on each request and restored on each response) and FC03/FC04 response payloads + FC06/FC16 request payloads at configured BCD addresses (re-encoded between BCD nibbles and binary integers). With Phase 11, FC03/FC04 reads for tags whose `CacheTtlMs > 0` may be served from a per-PLC in-process cache without backend traffic; the cache is **OFF by default** per tag.
|
||
- **In-flight FC03/FC04 read coalescing** (Phase 10): same-key reads arriving while a peer is in flight attach to the existing `InFlightRequest.InterestedParties` list; the single backend response fans out to every attached client with original TxIds restored. Zero post-response staleness — coalescing entries die when the response arrives. Hot-reload via `Mbproxy.Resilience.ReadCoalescing.Enabled`.
|
||
- **Optional response cache** (Phase 11) with per-tag TTL (default 0 = off). Lookup order is **cache → coalesce → backend**: a cache hit short-circuits Phase 10's coalescing path entirely. Multi-tag read range: effective TTL = `min(TTLs)`; any tag with `CacheTtlMs = 0` in the range disables caching for the whole read. Successful FC06/FC16 write responses invalidate cached FC03/FC04 entries whose address range overlaps the write. Cache stores POST-rewriter bytes (hits never re-invoke the rewriter). No persistence — process restart wipes the cache. `Cache.AllowLongTtl = true` is required for any `CacheTtlMs > 60_000`.
|
||
- **Polly-backed listener supervisor** auto-recovers any listener that fails to bind at startup or faults at runtime; the same code path also brings up newly-added PLCs from hot-reload and tears down removed ones.
|
||
- **`appsettings.json` is hot-reloadable** via `IOptionsMonitor<MbproxyOptions>`; tag-list changes propagate per-PDU, PLC add/remove flows through the supervisor. A tag-list reload flushes the affected PLC's response cache (per-tag granularity intentionally not done in v1).
|
||
- **Polly bounded retries** on backend connect (3 attempts at 100ms / 500ms / 2000ms). No retries on mid-request failures (FC06/FC16 are non-idempotent on BCD tags). A per-request watchdog in the multiplexer surfaces Modbus exception 0x0B to the upstream client if a backend response never arrives within `BackendRequestTimeoutMs`.
|
||
- **Backend disconnect cascades upstream**: when the shared backend socket dies, every attached upstream pipe is closed in the same cycle (counter `BackendDisconnectCascades`); clients reconnect on their next request.
|
||
- **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields, Phase-10 coalescing fields, and Phase-11 cache fields `cacheHitCount`, `cacheMissCount`, `cacheInvalidations`, `cacheEntryCount`, `cacheBytes`).
|
||
|
||
Anything beyond this short list — JSON schema, propagation table, stable log event names, status counter catalog, test plan — lives in `docs/design.md`. Open that doc before writing code; keep it in sync when decisions change.
|
||
|
||
## Current state
|
||
|
||
**Implementation complete through Phase 11.** Phases 00–08 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model; Phase 10 added in-flight read coalescing on top; Phase 11 added an opt-in per-tag response cache (bounded staleness, OFF by default — see "Response cache" in `docs/design.md`). The service is production-ready as a Windows Service:
|
||
|
||
- Test count grew through Phase 11 (see `tests/Mbproxy.Tests/` for the current suite; previous baseline was 325 = 282 unit + 43 E2E).
|
||
- Single-file self-contained publish (`dotnet publish -c Release -r win-x64`).
|
||
- PowerShell install/uninstall scripts under `install/`.
|
||
- Graceful shutdown with configurable drain timeout (`Connection.GracefulShutdownTimeoutMs`, default 10 s).
|
||
- Windows Event Log integration (Error+ events when running as a service).
|
||
- Read-only HTTP status page at `AdminPort` (default 8080) — surfaces Phase-9 mux fields alongside Phase-7 counters.
|
||
- `connectsSuccess` / `connectsFailed` counters wired in `PlcMultiplexer`.
|
||
- Phase 9 per-request watchdog defends against any backend that drops or mis-echoes a response (real-world packet loss; pymodbus 3.13 simulator's concurrent-multiplexed-request bug).
|
||
- `AssemblyInformationalVersion` set to `1.0.0` (CI can override via `/p:InformationalVersion=...`).
|
||
|
||
The human-facing entry point is **[`README.md`](README.md)**. All design decisions remain in [`docs/design.md`](docs/design.md).
|
||
|
||
Constraints that still apply to this codebase (do not change without updating the design doc):
|
||
- The csproj targets **.NET 10** (`net10.0`). This is the **only** tool in `wwtools/` not pinned to .NET Framework 4.8 / x86.
|
||
- The sample test `DL260/DL205BcdQuirkTests.cs` is a pattern reference only — its types are not available in this project.
|
||
|
||
## Device quirks (read before writing Modbus code)
|
||
|
||
The DL205/DL260 family is *almost* Modbus-spec-compliant, but every category below has at least one trap. The authoritative reference is **[`DL260/dl205.md`](DL260/dl205.md)** — read it end-to-end before touching the wire protocol. Highlights that bear directly on this proxy:
|
||
|
||
- **BCD-by-default numeric encoding.** `V2000 = 1234` stores `0x1234` on the wire, not `0x04D2`. This is the entire reason this service exists.
|
||
- **CDAB word order for 32-bit values.** Low word first, big-endian bytes within each word. `0xAABBCCDD` lands as `[0xCC 0xDD][0xAA 0xBB]`.
|
||
- **Octal V-memory ↔ decimal Modbus translation.** `V2000` octal = decimal 1024 = Modbus PDU `0x0400`. Config addresses are PDU-decimal, **not** octal V-memory and **not** 1-based 4xxxx.
|
||
- **FC03/FC04 max qty = 128** (above spec's 125). **FC16 max qty = 100** (below spec's 123). The proxy passes these through; the PLC enforces the cap with exception 03.
|
||
- **Max 4 concurrent TCP clients per ECOM100.** Direct constraint on this proxy's 1:1 connection model — see [`docs/design.md`](docs/design.md) → "Connection model" for the band-aid-vs-rearchitect decision tree if this becomes a real problem.
|
||
- **No TCP keepalive from the device.** Middleboxes typically drop idle sockets at 2–5 min. With the 1:1 model, backend liveness tracks upstream client liveness; if both are idle long enough, the path dies on its own and the next request reconnects.
|
||
- **Register 0 is valid** on DL205/DL260 in factory "absolute" addressing mode — don't probe-skip it.
|
||
- **As-deployed PLC parameters** (captured in `DL260/mbtcp_settings.JPG`): port 502, "Use Concept data structures (Longs/Reals)" enabled, "Swap bytes" enabled, "Use Zero Based Addressing" **unchecked**, Register type = Binary, max coil read 1976 / coil write 800 / register read 122 / register write 100. The proxy must speak Modbus as-is; these settings describe the wire it'll see.
|
||
|
||
## Resource index
|
||
|
||
| Task | Go to |
|
||
| --- | --- |
|
||
| Full architecture / design plan (decisions, schema, log events, status counters, test plan) | [`docs/design.md`](docs/design.md) |
|
||
| Phase-by-phase implementation plan (parallel-safety, phase gates, per-phase test list) | [`docs/plan/README.md`](docs/plan/README.md) |
|
||
| Dashboard KPI catalogue — what's exposed today and proposed additions (rates, percentiles, availability, fleet aggregates) | [`docs/kpi.md`](docs/kpi.md) |
|
||
| DL205/DL260 Modbus quirks (BCD, CDAB, octal V-memory, FC limits, exception codes, oddities) | [`DL260/dl205.md`](DL260/dl205.md) |
|
||
| pymodbus simulator profile that models those quirks as concrete register values | [`DL260/dl205.json`](DL260/dl205.json) |
|
||
| Example integration test pattern (xUnit + Shouldly + simulator fixture) | [`DL260/DL205BcdQuirkTests.cs`](DL260/DL205BcdQuirkTests.cs) |
|
||
| As-deployed PLC Modbus parameters screenshot | [`DL260/mbtcp_settings.JPG`](DL260/mbtcp_settings.JPG) |
|
||
|
||
## Maintenance
|
||
|
||
Documentation doctrine for `wwtools/` lives in [`../DOCS-GUIDE.md`](../DOCS-GUIDE.md). The three-layer rules apply:
|
||
|
||
- **[`README.md`](README.md)** is the canonical human entry point (Layer-2 per DOCS-GUIDE). It routes to deep docs; it does not duplicate them. Update it when the service's public surface or install steps change.
|
||
- This `CLAUDE.md` stays a router for LLM coding agents. Deep design decisions live in [`docs/design.md`](docs/design.md); device quirks live in [`DL260/dl205.md`](DL260/dl205.md). When you change a design decision, update `docs/design.md` first (it's the source of truth) and only mirror the change into the Architecture summary above if it shifts one of the headline bullets.
|
||
- When the service's task→tool mapping changes in the root index, update [`../CLAUDE.md`](../CLAUDE.md) too.
|
||
- Any further work beyond Phase 08 belongs in a new design revision (dated, in `docs/design.md`) and a new phase plan.
|