Files
Joseph Doherty b330faff03 mbproxy: cross-platform support — Linux/systemd alongside Windows
Make the service build, run, and install on Linux as a first-class
target while keeping the Windows Service + Event Log behaviour intact.

- Build: drop the hardcoded win-x64 RID — single-file publish now works
  for any RID. publish.ps1 gains -Rid; new publish.sh for Linux hosts.
- Diagnostics: DiagnosticSinkSelector picks the Error+ sink per host —
  Windows Event Log under the SCM, local syslog under systemd
  (Serilog.Sinks.SyslogMessages), none for interactive runs. The
  EventLog truncation helper is extracted so it is testable cross-OS.
- Host: Program.cs registers AddSystemd() alongside AddWindowsService().
- Config: a RID-conditioned appsettings template ships Windows or Unix
  paths; both templates are schema-validated by a test.
- Install: systemd unit (Type=exec) plus install.sh / uninstall.sh.
  Also fixes two cross-platform bugs found while testing: install.ps1
  and uninstall.ps1 used New-EventLog / Remove-EventLog (absent in
  PowerShell 7), and the E2E sim launcher hardcoded Windows venv paths.
- Docs updated across README, CLAUDE.md, and docs/ for dual-platform.

413 tests pass on Windows; 374 (all non-simulator) on Linux.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:41:59 -04:00

94 lines
14 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What this is
`mbproxy` is a **C# .NET 10** background service — a **Windows Service** or a **Linux systemd unit** — that sits **inline as a Modbus TCP proxy** in front of a fleet of **~54 AutomationDirect DirectLOGIC DL205 / DL260** equipment controllers. It is pre-configured with two pieces of static data:
1. **A list of BCD tags** — the holding/input registers (by Modbus address and bit width) that the controllers store in DirectLOGIC's native BCD encoding (`V2000 = 1234` is stored on the wire as `0x1234`, *not* `0x04D2`).
2. **A list of equipment controller IP addresses** (~54 entries) for the DL205/DL260 fleet. Each controller speaks Modbus TCP on port 502 via either the built-in DL260 Ethernet port or an H2-ECOM100 / H2-EBC100 coprocessor.
### Purpose: bidirectional BCD rewrite inline on the MBTCP stream
The service is a **transparent-by-default** Modbus TCP proxy whose job is to **rewrite the configured BCD tags in real time, in both directions**, while proxying every other byte of the MBTCP connection untouched. Since Phase 11 the proxy also exposes an **opt-in per-tag response cache** (default OFF, opt-in by setting `BcdTagOptions.CacheTtlMs > 0`); with caching enabled the proxy is no longer purely transparent — upstream reads may return a value up to `CacheTtlMs` milliseconds old.
- **Upstream read path (client → PLC → client).** When a client reads a register on the BCD tag list, the proxy intercepts the PLC's response and rewrites the raw BCD nibbles (`0x1234`) into the binary integer the client expects (`0x04D2` = decimal 1234) before forwarding. 32-bit BCD values that span the CDAB word pair are rewritten as a unit.
- **Downstream write path (client → PLC).** When a client writes a register on the BCD tag list, the proxy intercepts the request and re-encodes the client's binary integer (`0x04D2`) into BCD nibbles (`0x1234`) before forwarding to the PLC, so the value the operator sees in ladder matches what the client wrote.
- **Everything else passes through unchanged.** Non-BCD registers, coils, discrete inputs, function codes the service doesn't touch (diagnostics, exception responses, etc.) are forwarded byte-for-byte. MBAP transaction IDs and unit IDs are preserved end-to-end so the proxy is invisible to both sides.
The integration win is that upstream consumers (Wonderware / Historian / OPC UA gateways / generic Modbus clients) can read and write the configured BCD tags as if they were plain `Int16`/`Int32`, and the proxy is the only place that has to know which registers are BCD.
## Architecture
The full architecture is documented under **[`docs/`](docs/)** — see the `Architecture/`, `Features/`, `Operations/`, `Reference/`, and `Testing/` pages. Headline choices the agent should keep in mind without opening those files:
- **One `TcpListener` per PLC** (54 distinct ports). Each PLC has **one shared backend socket** owned by a `PlcMultiplexer`; many upstream clients are multiplexed onto that single backend via MBAP TxId rewriting (Phase 9). The H2-ECOM100's 4-client cap no longer caps upstream connections.
- **Transparent by default; opt-in cached** (Phase 11). Every byte passes through unchanged except the MBAP TxId field (rewritten by the multiplexer on each request and restored on each response) and FC03/FC04 response payloads + FC06/FC16 request payloads at configured BCD addresses (re-encoded between BCD nibbles and binary integers). With Phase 11, FC03/FC04 reads for tags whose `CacheTtlMs > 0` may be served from a per-PLC in-process cache without backend traffic; the cache is **OFF by default** per tag.
- **In-flight FC03/FC04 read coalescing** (Phase 10): same-key reads arriving while a peer is in flight attach to the existing `InFlightRequest.InterestedParties` list; the single backend response fans out to every attached client with original TxIds restored. Zero post-response staleness — coalescing entries die when the response arrives. Hot-reload via `Mbproxy.Resilience.ReadCoalescing.Enabled`.
- **Optional response cache** (Phase 11) with per-tag TTL (default 0 = off). Lookup order is **cache → coalesce → backend**: a cache hit short-circuits Phase 10's coalescing path entirely. Multi-tag read range: effective TTL = `min(TTLs)`; any tag with `CacheTtlMs = 0` in the range disables caching for the whole read. Successful FC06/FC16 write responses invalidate cached FC03/FC04 entries whose address range overlaps the write. Cache stores POST-rewriter bytes (hits never re-invoke the rewriter). No persistence — process restart wipes the cache. `Cache.AllowLongTtl = true` is required for any `CacheTtlMs > 60_000`.
- **Polly-backed listener supervisor** auto-recovers any listener that fails to bind at startup or faults at runtime; the same code path also brings up newly-added PLCs from hot-reload and tears down removed ones.
- **`appsettings.json` is hot-reloadable** via `IOptionsMonitor<MbproxyOptions>`; tag-list changes propagate per-PDU, PLC add/remove flows through the supervisor. A tag-list reload flushes the affected PLC's response cache (per-tag granularity intentionally not done in v1).
- **Polly bounded retries** on backend connect (3 attempts at 100ms / 500ms / 2000ms). No retries on mid-request failures (FC06/FC16 are non-idempotent on BCD tags). A per-request watchdog in the multiplexer surfaces Modbus exception 0x0B to the upstream client if a backend response never arrives within `BackendRequestTimeoutMs`.
- **Backend disconnect cascades upstream**: when the shared backend socket dies, every attached upstream pipe is closed in the same cycle (counter `BackendDisconnectCascades`); clients reconnect on their next request.
- **Keepalive / connection monitoring** (ON by default, `Connection.Keepalive`): OS `SO_KEEPALIVE` on backend and accepted upstream sockets, plus a per-PLC application heartbeat — a synthetic FC03 qty=1 read fired on an idle backend socket (`BackendHeartbeatIdleMs`). An unanswered heartbeat proactively tears the backend down (counters `backendHeartbeatsSent/Failed`, `backendIdleDisconnects`). The DL260 has no FC08, so the probe is a real register read. See [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md).
- **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields, Phase-10 coalescing fields, and Phase-11 cache fields `cacheHitCount`, `cacheMissCount`, `cacheInvalidations`, `cacheEntryCount`, `cacheBytes`).
Anything beyond this short list lives in the `docs/` tree: the appsettings.json schema in [`docs/Operations/Configuration.md`](docs/Operations/Configuration.md), config propagation in [`docs/Features/HotReload.md`](docs/Features/HotReload.md), stable log event names in [`docs/Reference/LogEvents.md`](docs/Reference/LogEvents.md), the status counter catalog in [`docs/Operations/StatusPage.md`](docs/Operations/StatusPage.md), and the simulator-backed test fixture in [`docs/Testing/Simulator.md`](docs/Testing/Simulator.md). Open the relevant page before writing code; keep it in sync when decisions change.
## Current state
**Implementation complete through Phase 11.** Phases 0008 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model; Phase 10 added in-flight read coalescing on top; Phase 11 added an opt-in per-tag response cache (bounded staleness, OFF by default — see [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md)). The service is production-ready as a **Windows Service or a Linux systemd unit**:
- Test count grew through Phase 11 (see `tests/Mbproxy.Tests/` for the current suite; previous baseline was 325 = 282 unit + 43 E2E).
- Single-file self-contained publish for `win-x64` **and** `linux-x64` (`dotnet publish -c Release -r <rid>`) — the RID is supplied per publish, never hardcoded in the csproj.
- Install/uninstall scripts under `install/`: PowerShell (`install.ps1` / `uninstall.ps1`) for the Windows Service; shell (`install.sh` / `uninstall.sh` + the `mbproxy.service` unit) for systemd.
- Graceful shutdown with configurable drain timeout (`Connection.GracefulShutdownTimeoutMs`, default 10 s) — driven by the Windows SCM stop signal or POSIX `SIGTERM`.
- Platform diagnostic sink for Error+ events, chosen once at the composition root by `DiagnosticSinkSelector`: Windows Application Event Log under the SCM, local syslog under systemd, none for interactive/dev runs. The systemd unit is `Type=exec` (not `notify`).
- Read-only HTTP status page at `AdminPort` (default 8080) — surfaces Phase-9 mux fields alongside Phase-7 counters.
- `connectsSuccess` / `connectsFailed` counters wired in `PlcMultiplexer`.
- Phase 9 per-request watchdog defends against any backend that drops or mis-echoes a response (real-world packet loss; pymodbus 3.13 simulator's concurrent-multiplexed-request bug).
- `AssemblyInformationalVersion` set to `1.0.0` (CI can override via `/p:InformationalVersion=...`).
The human-facing entry point is **[`README.md`](README.md)**. All design decisions live in the [`docs/`](docs/) tree.
Constraints that still apply to this codebase (do not change without updating the relevant `docs/` page):
- The csproj targets **.NET 10** (`net10.0`). This is the **only** tool in `wwtools/` not pinned to .NET Framework 4.8 / x86.
## Device quirks (read before writing Modbus code)
The DL205/DL260 family is *almost* Modbus-spec-compliant, but every category below has at least one trap. The authoritative reference is **[`docs/Reference/dl205.md`](docs/Reference/dl205.md)** — read it end-to-end before touching the wire protocol. Highlights that bear directly on this proxy:
- **BCD-by-default numeric encoding.** `V2000 = 1234` stores `0x1234` on the wire, not `0x04D2`. This is the entire reason this service exists.
- **CDAB word order for 32-bit values.** Low word first, big-endian bytes within each word. `0xAABBCCDD` lands as `[0xCC 0xDD][0xAA 0xBB]`.
- **Octal V-memory ↔ decimal Modbus translation.** `V2000` octal = decimal 1024 = Modbus PDU `0x0400`. Config addresses are PDU-decimal, **not** octal V-memory and **not** 1-based 4xxxx.
- **FC03/FC04 max qty = 128** (above spec's 125). **FC16 max qty = 100** (below spec's 123). The proxy passes these through; the PLC enforces the cap with exception 03.
- **Max 4 concurrent TCP clients per ECOM100.** This is why the proxy uses a single TxId-multiplexed backend socket per PLC — see [`docs/Architecture/ConnectionModel.md`](docs/Architecture/ConnectionModel.md) for how the connection model lifts this cap.
- **No TCP keepalive from the device.** Middleboxes typically drop idle sockets at 25 min. The proxy compensates with its own keepalive — `SO_KEEPALIVE` on every socket plus an idle backend FC03 heartbeat (see the Architecture summary and [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md)).
- **Register 0 is valid** on DL205/DL260 in factory "absolute" addressing mode — don't probe-skip it.
- **As-deployed PLC parameters** (captured in `docs/Reference/mbtcp_settings.JPG`): port 502, "Use Concept data structures (Longs/Reals)" enabled, "Swap bytes" enabled, "Use Zero Based Addressing" **unchecked**, Register type = Binary, max coil read 1976 / coil write 800 / register read 122 / register write 100. The proxy must speak Modbus as-is; these settings describe the wire it'll see.
## Resource index
| Task | Go to |
| --- | --- |
| Architecture — listener topology, request flow, per-PLC isolation | [`docs/Architecture/Overview.md`](docs/Architecture/Overview.md) |
| Connection model — single backend socket per PLC, TxId multiplexing, request-timeout watchdog, disconnect cascade | [`docs/Architecture/ConnectionModel.md`](docs/Architecture/ConnectionModel.md) |
| Keepalive / connection monitoring — TCP `SO_KEEPALIVE` + backend FC03 heartbeat | [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md) |
| In-flight read coalescing / opt-in response cache | [`docs/Architecture/ReadCoalescing.md`](docs/Architecture/ReadCoalescing.md), [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md) |
| BCD rewriting (codec, CDAB word order, FC03/04/06/16 scope) and config hot-reload | [`docs/Features/BcdRewriting.md`](docs/Features/BcdRewriting.md), [`docs/Features/HotReload.md`](docs/Features/HotReload.md) |
| Operations — full appsettings.json reference, status page / JSON fields, troubleshooting playbook | [`docs/Operations/Configuration.md`](docs/Operations/Configuration.md), [`docs/Operations/StatusPage.md`](docs/Operations/StatusPage.md), [`docs/Operations/Troubleshooting.md`](docs/Operations/Troubleshooting.md) |
| Stable `mbproxy.*` log event-name catalog | [`docs/Reference/LogEvents.md`](docs/Reference/LogEvents.md) |
| DL205/DL260 Modbus quirks (BCD, CDAB, octal V-memory, FC limits, exception codes, oddities) | [`docs/Reference/dl205.md`](docs/Reference/dl205.md) |
| pymodbus simulator profile that models those quirks as concrete register values | [`tests/sim/dl205.json`](tests/sim/dl205.json) |
| As-deployed PLC Modbus parameters screenshot | [`docs/Reference/mbtcp_settings.JPG`](docs/Reference/mbtcp_settings.JPG) |
## Maintenance
Documentation doctrine for `wwtools/` lives in [`../DOCS-GUIDE.md`](../DOCS-GUIDE.md). The three-layer rules apply:
- **[`README.md`](README.md)** is the canonical human entry point (Layer-2 per DOCS-GUIDE). It routes to deep docs; it does not duplicate them. Update it when the service's public surface or install steps change.
- This `CLAUDE.md` stays a router for LLM coding agents. Deep design decisions live in the [`docs/`](docs/) tree; device quirks live in [`docs/Reference/dl205.md`](docs/Reference/dl205.md). When you change a design decision, update the relevant page under `docs/` first (it's the source of truth) and only mirror the change into the Architecture summary above if it shifts one of the headline bullets.
- When the service's task→tool mapping changes in the root index, update [`../CLAUDE.md`](../CLAUDE.md) too.
- Any further design changes belong in the relevant `docs/` page (`Architecture/`, `Features/`, `Operations/`, `Reference/`, or `Testing/`).