# CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## What this is `mbproxy` is a **C# .NET 10** background service — a **Windows Service** or a **Linux systemd unit** — that sits **inline as a Modbus TCP proxy** in front of a fleet of **~54 AutomationDirect DirectLOGIC DL205 / DL260** equipment controllers. It is pre-configured with two pieces of static data: 1. **A list of BCD tags** — the holding/input registers (by Modbus address and bit width) that the controllers store in DirectLOGIC's native BCD encoding (`V2000 = 1234` is stored on the wire as `0x1234`, *not* `0x04D2`). 2. **A list of equipment controller IP addresses** (~54 entries) for the DL205/DL260 fleet. Each controller speaks Modbus TCP on port 502 via either the built-in DL260 Ethernet port or an H2-ECOM100 / H2-EBC100 coprocessor. ### Purpose: bidirectional BCD rewrite inline on the MBTCP stream The service is a **transparent-by-default** Modbus TCP proxy whose job is to **rewrite the configured BCD tags in real time, in both directions**, while proxying every other byte of the MBTCP connection untouched. Since Phase 11 the proxy also exposes an **opt-in per-tag response cache** (default OFF, opt-in by setting `BcdTagOptions.CacheTtlMs > 0`); with caching enabled the proxy is no longer purely transparent — upstream reads may return a value up to `CacheTtlMs` milliseconds old. - **Upstream read path (client → PLC → client).** When a client reads a register on the BCD tag list, the proxy intercepts the PLC's response and rewrites the raw BCD nibbles (`0x1234`) into the binary integer the client expects (`0x04D2` = decimal 1234) before forwarding. 32-bit BCD values that span the CDAB word pair are rewritten as a unit. - **Downstream write path (client → PLC).** When a client writes a register on the BCD tag list, the proxy intercepts the request and re-encodes the client's binary integer (`0x04D2`) into BCD nibbles (`0x1234`) before forwarding to the PLC, so the value the operator sees in ladder matches what the client wrote. - **Everything else passes through unchanged.** Non-BCD registers, coils, discrete inputs, function codes the service doesn't touch (diagnostics, exception responses, etc.) are forwarded byte-for-byte. MBAP transaction IDs and unit IDs are preserved end-to-end so the proxy is invisible to both sides. The integration win is that upstream consumers (Wonderware / Historian / OPC UA gateways / generic Modbus clients) can read and write the configured BCD tags as if they were plain `Int16`/`Int32`, and the proxy is the only place that has to know which registers are BCD. ## Architecture The full architecture is documented under **[`docs/`](docs/)** — see the `Architecture/`, `Features/`, `Operations/`, `Reference/`, and `Testing/` pages. Headline choices the agent should keep in mind without opening those files: - **One `TcpListener` per PLC** (54 distinct ports). Each PLC has **one shared backend socket** owned by a `PlcMultiplexer`; many upstream clients are multiplexed onto that single backend via MBAP TxId rewriting (Phase 9). The H2-ECOM100's 4-client cap no longer caps upstream connections. - **Transparent by default; opt-in cached** (Phase 11). Every byte passes through unchanged except the MBAP TxId field (rewritten by the multiplexer on each request and restored on each response) and FC03/FC04 response payloads + FC06/FC16 request payloads at configured BCD addresses (re-encoded between BCD nibbles and binary integers). With Phase 11, FC03/FC04 reads for tags whose `CacheTtlMs > 0` may be served from a per-PLC in-process cache without backend traffic; the cache is **OFF by default** per tag. - **In-flight FC03/FC04 read coalescing** (Phase 10): same-key reads arriving while a peer is in flight attach to the existing `InFlightRequest.InterestedParties` list; the single backend response fans out to every attached client with original TxIds restored. Zero post-response staleness — coalescing entries die when the response arrives. Hot-reload via `Mbproxy.Resilience.ReadCoalescing.Enabled`. - **Optional response cache** (Phase 11) with per-tag TTL (default 0 = off). Lookup order is **cache → coalesce → backend**: a cache hit short-circuits Phase 10's coalescing path entirely. Multi-tag read range: effective TTL = `min(TTLs)`; any tag with `CacheTtlMs = 0` in the range disables caching for the whole read. Successful FC06/FC16 write responses invalidate cached FC03/FC04 entries whose address range overlaps the write. Cache stores POST-rewriter bytes (hits never re-invoke the rewriter). No persistence — process restart wipes the cache. `Cache.AllowLongTtl = true` is required for any `CacheTtlMs > 60_000`. - **Polly-backed listener supervisor** auto-recovers any listener that fails to bind at startup or faults at runtime; the same code path also brings up newly-added PLCs from hot-reload and tears down removed ones. - **`appsettings.json` is hot-reloadable** via `IOptionsMonitor`; tag-list changes propagate per-PDU, PLC add/remove flows through the supervisor. A tag-list reload flushes the affected PLC's response cache (per-tag granularity intentionally not done in v1). - **Polly bounded retries** on backend connect (3 attempts at 100ms / 500ms / 2000ms). No retries on mid-request failures (FC06/FC16 are non-idempotent on BCD tags). A per-request watchdog in the multiplexer surfaces Modbus exception 0x0B to the upstream client if a backend response never arrives within `BackendRequestTimeoutMs`. - **Backend disconnect cascades upstream**: when the shared backend socket dies, every attached upstream pipe is closed in the same cycle (counter `BackendDisconnectCascades`); clients reconnect on their next request. - **Keepalive / connection monitoring** (ON by default, `Connection.Keepalive`): OS `SO_KEEPALIVE` on backend and accepted upstream sockets, plus a per-PLC application heartbeat — a synthetic FC03 qty=1 read fired on an idle backend socket (`BackendHeartbeatIdleMs`). An unanswered heartbeat proactively tears the backend down (counters `backendHeartbeatsSent/Failed`, `backendIdleDisconnects`). The DL260 has no FC08, so the probe is a real register read. See [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md). - **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields, Phase-10 coalescing fields, and Phase-11 cache fields `cacheHitCount`, `cacheMissCount`, `cacheInvalidations`, `cacheEntryCount`, `cacheBytes`). Anything beyond this short list lives in the `docs/` tree: the appsettings.json schema in [`docs/Operations/Configuration.md`](docs/Operations/Configuration.md), config propagation in [`docs/Features/HotReload.md`](docs/Features/HotReload.md), stable log event names in [`docs/Reference/LogEvents.md`](docs/Reference/LogEvents.md), the status counter catalog in [`docs/Operations/StatusPage.md`](docs/Operations/StatusPage.md), and the simulator-backed test fixture in [`docs/Testing/Simulator.md`](docs/Testing/Simulator.md). Open the relevant page before writing code; keep it in sync when decisions change. ## Current state **Implementation complete through Phase 11.** Phases 00–08 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model; Phase 10 added in-flight read coalescing on top; Phase 11 added an opt-in per-tag response cache (bounded staleness, OFF by default — see [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md)). The service is production-ready as a **Windows Service or a Linux systemd unit**: - Test count grew through Phase 11 (see `tests/Mbproxy.Tests/` for the current suite; previous baseline was 325 = 282 unit + 43 E2E). - Single-file self-contained publish for `win-x64` **and** `linux-x64` (`dotnet publish -c Release -r `) — the RID is supplied per publish, never hardcoded in the csproj. - Install/uninstall scripts under `install/`: PowerShell (`install.ps1` / `uninstall.ps1`) for the Windows Service; shell (`install.sh` / `uninstall.sh` + the `mbproxy.service` unit) for systemd. - Graceful shutdown with configurable drain timeout (`Connection.GracefulShutdownTimeoutMs`, default 10 s) — driven by the Windows SCM stop signal or POSIX `SIGTERM`. - Platform diagnostic sink for Error+ events, chosen once at the composition root by `DiagnosticSinkSelector`: Windows Application Event Log under the SCM, local syslog under systemd, none for interactive/dev runs. The systemd unit is `Type=exec` (not `notify`). - Read-only HTTP status page at `AdminPort` (default 8080) — surfaces Phase-9 mux fields alongside Phase-7 counters. - `connectsSuccess` / `connectsFailed` counters wired in `PlcMultiplexer`. - Phase 9 per-request watchdog defends against any backend that drops or mis-echoes a response (real-world packet loss; pymodbus 3.13 simulator's concurrent-multiplexed-request bug). - `AssemblyInformationalVersion` set to `1.0.0` (CI can override via `/p:InformationalVersion=...`). The human-facing entry point is **[`README.md`](README.md)**. All design decisions live in the [`docs/`](docs/) tree. Constraints that still apply to this codebase (do not change without updating the relevant `docs/` page): - The csproj targets **.NET 10** (`net10.0`). This is the **only** tool in `wwtools/` not pinned to .NET Framework 4.8 / x86. ## Device quirks (read before writing Modbus code) The DL205/DL260 family is *almost* Modbus-spec-compliant, but every category below has at least one trap. The authoritative reference is **[`docs/Reference/dl205.md`](docs/Reference/dl205.md)** — read it end-to-end before touching the wire protocol. Highlights that bear directly on this proxy: - **BCD-by-default numeric encoding.** `V2000 = 1234` stores `0x1234` on the wire, not `0x04D2`. This is the entire reason this service exists. - **CDAB word order for 32-bit values.** Low word first, big-endian bytes within each word. `0xAABBCCDD` lands as `[0xCC 0xDD][0xAA 0xBB]`. - **Octal V-memory ↔ decimal Modbus translation.** `V2000` octal = decimal 1024 = Modbus PDU `0x0400`. Config addresses are PDU-decimal, **not** octal V-memory and **not** 1-based 4xxxx. - **FC03/FC04 max qty = 128** (above spec's 125). **FC16 max qty = 100** (below spec's 123). The proxy passes these through; the PLC enforces the cap with exception 03. - **Max 4 concurrent TCP clients per ECOM100.** This is why the proxy uses a single TxId-multiplexed backend socket per PLC — see [`docs/Architecture/ConnectionModel.md`](docs/Architecture/ConnectionModel.md) for how the connection model lifts this cap. - **No TCP keepalive from the device.** Middleboxes typically drop idle sockets at 2–5 min. The proxy compensates with its own keepalive — `SO_KEEPALIVE` on every socket plus an idle backend FC03 heartbeat (see the Architecture summary and [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md)). - **Register 0 is valid** on DL205/DL260 in factory "absolute" addressing mode — don't probe-skip it. - **As-deployed PLC parameters** (captured in `docs/Reference/mbtcp_settings.JPG`): port 502, "Use Concept data structures (Longs/Reals)" enabled, "Swap bytes" enabled, "Use Zero Based Addressing" **unchecked**, Register type = Binary, max coil read 1976 / coil write 800 / register read 122 / register write 100. The proxy must speak Modbus as-is; these settings describe the wire it'll see. ## Resource index | Task | Go to | | --- | --- | | Architecture — listener topology, request flow, per-PLC isolation | [`docs/Architecture/Overview.md`](docs/Architecture/Overview.md) | | Connection model — single backend socket per PLC, TxId multiplexing, request-timeout watchdog, disconnect cascade | [`docs/Architecture/ConnectionModel.md`](docs/Architecture/ConnectionModel.md) | | Keepalive / connection monitoring — TCP `SO_KEEPALIVE` + backend FC03 heartbeat | [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md) | | In-flight read coalescing / opt-in response cache | [`docs/Architecture/ReadCoalescing.md`](docs/Architecture/ReadCoalescing.md), [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md) | | BCD rewriting (codec, CDAB word order, FC03/04/06/16 scope) and config hot-reload | [`docs/Features/BcdRewriting.md`](docs/Features/BcdRewriting.md), [`docs/Features/HotReload.md`](docs/Features/HotReload.md) | | Operations — full appsettings.json reference, status page / JSON fields, troubleshooting playbook | [`docs/Operations/Configuration.md`](docs/Operations/Configuration.md), [`docs/Operations/StatusPage.md`](docs/Operations/StatusPage.md), [`docs/Operations/Troubleshooting.md`](docs/Operations/Troubleshooting.md) | | Stable `mbproxy.*` log event-name catalog | [`docs/Reference/LogEvents.md`](docs/Reference/LogEvents.md) | | DL205/DL260 Modbus quirks (BCD, CDAB, octal V-memory, FC limits, exception codes, oddities) | [`docs/Reference/dl205.md`](docs/Reference/dl205.md) | | pymodbus simulator profile that models those quirks as concrete register values | [`tests/sim/dl205.json`](tests/sim/dl205.json) | | As-deployed PLC Modbus parameters screenshot | [`docs/Reference/mbtcp_settings.JPG`](docs/Reference/mbtcp_settings.JPG) | ## Maintenance Documentation doctrine for `wwtools/` lives in [`../DOCS-GUIDE.md`](../DOCS-GUIDE.md). The three-layer rules apply: - **[`README.md`](README.md)** is the canonical human entry point (Layer-2 per DOCS-GUIDE). It routes to deep docs; it does not duplicate them. Update it when the service's public surface or install steps change. - This `CLAUDE.md` stays a router for LLM coding agents. Deep design decisions live in the [`docs/`](docs/) tree; device quirks live in [`docs/Reference/dl205.md`](docs/Reference/dl205.md). When you change a design decision, update the relevant page under `docs/` first (it's the source of truth) and only mirror the change into the Architecture summary above if it shifts one of the headline bullets. - When the service's task→tool mapping changes in the root index, update [`../CLAUDE.md`](../CLAUDE.md) too. - Any further design changes belong in the relevant `docs/` page (`Architecture/`, `Features/`, `Operations/`, `Reference/`, or `Testing/`).