diff --git a/CLAUDE.md b/CLAUDE.md index 4ebc118..caabdf8 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -23,7 +23,7 @@ When in doubt about where content belongs, default to pushing it deeper. `DOCS-G - [`graccesscli/`](graccesscli/README.md) — `.NET Framework 4.8 / x86` CliFx-based CLI for automating Galaxy configuration through the ArchestrA GRAccess COM interop. - [`grdb/`](grdb/README.md) — SQL/DDL exploration of the Galaxy Repository SQL database (queries, schema, hierarchy/tag-name translation). - [`histdb/`](histdb/README.md) — LLM-oriented reference for AVEVA Historian retrieval (extension tables, `wwXxx` time-domain extensions, retrieval modes/options, alarm-event SQL, REST API). Distilled from the official Historian Retrieval Guide. -- [`mbproxy/`](mbproxy/README.md) — `.NET 10` Windows Service that proxies Modbus TCP for a fleet of ~54 DL205/DL260 PLCs: inline bidirectional BCD rewriting, single-backend-conn TxId multiplexing (lifts the H2-ECOM100 4-client cap), in-flight read coalescing, and opt-in per-tag response caching. +- [`mbproxy/`](mbproxy/README.md) — `.NET 10` background service (Windows Service or Linux systemd unit) that proxies Modbus TCP for a fleet of ~54 DL205/DL260 PLCs: inline bidirectional BCD rewriting, single-backend-conn TxId multiplexing (lifts the H2-ECOM100 4-client cap), in-flight read coalescing, and opt-in per-tag response caching. - [`mxaccesscli/`](mxaccesscli/README.md) — `.NET Framework 4.8 / x86` CliFx-based CLI for reading, writing, and subscribing to System Platform tags via the **MxAccess** COM proxy (`LMXProxyServerClass`). - [`secrets/`](secrets/README.md) — Self-hosted Infisical CLI + `secret` PowerShell helper for fetching credentials from `https://infisical.dohertylan.com` instead of inlining plaintext. 
diff --git a/mbproxy/CLAUDE.md b/mbproxy/CLAUDE.md index 5d06905..28855a2 100644 --- a/mbproxy/CLAUDE.md +++ b/mbproxy/CLAUDE.md @@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co ## What this is -`mbproxy` is a **C# .NET 10** background service (Windows Service) that sits **inline as a Modbus TCP proxy** in front of a fleet of **~54 AutomationDirect DirectLOGIC DL205 / DL260** equipment controllers. It is pre-configured with two pieces of static data: +`mbproxy` is a **C# .NET 10** background service — a **Windows Service** or a **Linux systemd unit** — that sits **inline as a Modbus TCP proxy** in front of a fleet of **~54 AutomationDirect DirectLOGIC DL205 / DL260** equipment controllers. It is pre-configured with two pieces of static data: 1. **A list of BCD tags** — the holding/input registers (by Modbus address and bit width) that the controllers store in DirectLOGIC's native BCD encoding (`V2000 = 1234` is stored on the wire as `0x1234`, *not* `0x04D2`). 2. **A list of equipment controller IP addresses** (~54 entries) for the DL205/DL260 fleet. Each controller speaks Modbus TCP on port 502 via either the built-in DL260 Ethernet port or an H2-ECOM100 / H2-EBC100 coprocessor. @@ -31,19 +31,20 @@ The full architecture is documented under **[`docs/`](docs/)** — see the `Arch - **`appsettings.json` is hot-reloadable** via `IOptionsMonitor`; tag-list changes propagate per-PDU, PLC add/remove flows through the supervisor. A tag-list reload flushes the affected PLC's response cache (per-tag granularity intentionally not done in v1). - **Polly bounded retries** on backend connect (3 attempts at 100ms / 500ms / 2000ms). No retries on mid-request failures (FC06/FC16 are non-idempotent on BCD tags). A per-request watchdog in the multiplexer surfaces Modbus exception 0x0B to the upstream client if a backend response never arrives within `BackendRequestTimeoutMs`. 
- **Backend disconnect cascades upstream**: when the shared backend socket dies, every attached upstream pipe is closed in the same cycle (counter `BackendDisconnectCascades`); clients reconnect on their next request. +- **Keepalive / connection monitoring** (ON by default, `Connection.Keepalive`): OS `SO_KEEPALIVE` on backend and accepted upstream sockets, plus a per-PLC application heartbeat — a synthetic FC03 qty=1 read fired on an idle backend socket (`BackendHeartbeatIdleMs`). An unanswered heartbeat proactively tears the backend down (counters `backendHeartbeatsSent/Failed`, `backendIdleDisconnects`). The DL260 has no FC08, so the probe is a real register read. See [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md). - **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields, Phase-10 coalescing fields, and Phase-11 cache fields `cacheHitCount`, `cacheMissCount`, `cacheInvalidations`, `cacheEntryCount`, `cacheBytes`). Anything beyond this short list lives in the `docs/` tree: the appsettings.json schema in [`docs/Operations/Configuration.md`](docs/Operations/Configuration.md), config propagation in [`docs/Features/HotReload.md`](docs/Features/HotReload.md), stable log event names in [`docs/Reference/LogEvents.md`](docs/Reference/LogEvents.md), the status counter catalog in [`docs/Operations/StatusPage.md`](docs/Operations/StatusPage.md), and the simulator-backed test fixture in [`docs/Testing/Simulator.md`](docs/Testing/Simulator.md). Open the relevant page before writing code; keep it in sync when decisions change. 
## Current state -**Implementation complete through Phase 11.** Phases 00–08 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model; Phase 10 added in-flight read coalescing on top; Phase 11 added an opt-in per-tag response cache (bounded staleness, OFF by default — see [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md)). The service is production-ready as a Windows Service: +**Implementation complete through Phase 11.** Phases 00–08 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model; Phase 10 added in-flight read coalescing on top; Phase 11 added an opt-in per-tag response cache (bounded staleness, OFF by default — see [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md)). The service is production-ready as a **Windows Service or a Linux systemd unit**: - Test count grew through Phase 11 (see `tests/Mbproxy.Tests/` for the current suite; previous baseline was 325 = 282 unit + 43 E2E). -- Single-file self-contained publish (`dotnet publish -c Release -r win-x64`). -- PowerShell install/uninstall scripts under `install/`. -- Graceful shutdown with configurable drain timeout (`Connection.GracefulShutdownTimeoutMs`, default 10 s). -- Windows Event Log integration (Error+ events when running as a service). +- Single-file self-contained publish for `win-x64` **and** `linux-x64` (`dotnet publish -c Release -r `) — the RID is supplied per publish, never hardcoded in the csproj. +- Install/uninstall scripts under `install/`: PowerShell (`install.ps1` / `uninstall.ps1`) for the Windows Service; shell (`install.sh` / `uninstall.sh` + the `mbproxy.service` unit) for systemd. +- Graceful shutdown with configurable drain timeout (`Connection.GracefulShutdownTimeoutMs`, default 10 s) — driven by the Windows SCM stop signal or POSIX `SIGTERM`. 
+- Platform diagnostic sink for Error+ events, chosen once at the composition root by `DiagnosticSinkSelector`: Windows Application Event Log under the SCM, local syslog under systemd, none for interactive/dev runs. The systemd unit is `Type=exec` (not `notify`). - Read-only HTTP status page at `AdminPort` (default 8080) — surfaces Phase-9 mux fields alongside Phase-7 counters. - `connectsSuccess` / `connectsFailed` counters wired in `PlcMultiplexer`. - Phase 9 per-request watchdog defends against any backend that drops or mis-echoes a response (real-world packet loss; pymodbus 3.13 simulator's concurrent-multiplexed-request bug). @@ -63,7 +64,7 @@ The DL205/DL260 family is *almost* Modbus-spec-compliant, but every category bel - **Octal V-memory ↔ decimal Modbus translation.** `V2000` octal = decimal 1024 = Modbus PDU `0x0400`. Config addresses are PDU-decimal, **not** octal V-memory and **not** 1-based 4xxxx. - **FC03/FC04 max qty = 128** (above spec's 125). **FC16 max qty = 100** (below spec's 123). The proxy passes these through; the PLC enforces the cap with exception 03. - **Max 4 concurrent TCP clients per ECOM100.** This is why the proxy uses a single TxId-multiplexed backend socket per PLC — see [`docs/Architecture/ConnectionModel.md`](docs/Architecture/ConnectionModel.md) for how the connection model lifts this cap. -- **No TCP keepalive from the device.** Middleboxes typically drop idle sockets at 2–5 min. With the 1:1 model, backend liveness tracks upstream client liveness; if both are idle long enough, the path dies on its own and the next request reconnects. +- **No TCP keepalive from the device.** Middleboxes typically drop idle sockets at 2–5 min. The proxy compensates with its own keepalive — `SO_KEEPALIVE` on every socket plus an idle backend FC03 heartbeat (see the Architecture summary and [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md)). 
- **Register 0 is valid** on DL205/DL260 in factory "absolute" addressing mode — don't probe-skip it. - **As-deployed PLC parameters** (captured in `docs/Reference/mbtcp_settings.JPG`): port 502, "Use Concept data structures (Longs/Reals)" enabled, "Swap bytes" enabled, "Use Zero Based Addressing" **unchecked**, Register type = Binary, max coil read 1976 / coil write 800 / register read 122 / register write 100. The proxy must speak Modbus as-is; these settings describe the wire it'll see. @@ -73,6 +74,7 @@ The DL205/DL260 family is *almost* Modbus-spec-compliant, but every category bel | --- | --- | | Architecture — listener topology, request flow, per-PLC isolation | [`docs/Architecture/Overview.md`](docs/Architecture/Overview.md) | | Connection model — single backend socket per PLC, TxId multiplexing, request-timeout watchdog, disconnect cascade | [`docs/Architecture/ConnectionModel.md`](docs/Architecture/ConnectionModel.md) | +| Keepalive / connection monitoring — TCP `SO_KEEPALIVE` + backend FC03 heartbeat | [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md) | | In-flight read coalescing / opt-in response cache | [`docs/Architecture/ReadCoalescing.md`](docs/Architecture/ReadCoalescing.md), [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md) | | BCD rewriting (codec, CDAB word order, FC03/04/06/16 scope) and config hot-reload | [`docs/Features/BcdRewriting.md`](docs/Features/BcdRewriting.md), [`docs/Features/HotReload.md`](docs/Features/HotReload.md) | | Operations — full appsettings.json reference, status page / JSON fields, troubleshooting playbook | [`docs/Operations/Configuration.md`](docs/Operations/Configuration.md), [`docs/Operations/StatusPage.md`](docs/Operations/StatusPage.md), [`docs/Operations/Troubleshooting.md`](docs/Operations/Troubleshooting.md) | diff --git a/mbproxy/README.md b/mbproxy/README.md index bdf916f..65bbf29 100644 --- a/mbproxy/README.md +++ b/mbproxy/README.md @@ -1,14 +1,14 @@ # mbproxy -A 
.NET 10 Windows Service that sits inline as a Modbus TCP proxy in front of a fleet of AutomationDirect DirectLOGIC DL205/DL260 controllers, rewriting BCD-encoded registers bidirectionally so upstream clients can read and write them as plain integers. The proxy also offers an opt-in per-tag response cache (default OFF) for FC03/FC04 reads with bounded operator-configured staleness — see [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md) before enabling it. +A .NET 10 background service — a **Windows Service** or a **Linux systemd unit** — that sits inline as a Modbus TCP proxy in front of a fleet of AutomationDirect DirectLOGIC DL205/DL260 controllers, rewriting BCD-encoded registers bidirectionally so upstream clients can read and write them as plain integers. The proxy also offers an opt-in per-tag response cache (default OFF) for FC03/FC04 reads with bounded operator-configured staleness — see [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md) before enabling it. > ⚠ **32-bit BCD wire format is "two base-10000 digits in CDAB", not standard CDAB binary Int32.** A 32-bit BCD tag at address `A` decodes as `decimal = high * 10_000 + low` where `low` is the register at `A` and `high` is the register at `A+1`. Each word independently must be 0–9999. Standard Modbus clients (NModbus, FluentModbus, Wonderware DAServer) that interpret CDAB as straight binary Int32 will silently corrupt any value > 9999 on writes and read garbage on reads. Configure your client to send/receive each register as a separate base-10000 BCD digit pair, not as a single binary Int32. Full details in [`docs/Features/BcdRewriting.md`](docs/Features/BcdRewriting.md). ## Hard constraints / prerequisites -- **Windows 10 / Server 2019 or later, 64-bit.** No Linux or Docker support — the service uses `Microsoft.Extensions.Hosting.WindowsServices` and the Windows Event Log. 
+- **Windows (10 / Server 2019+) or Linux (any systemd distro), 64-bit.** Ships as a Windows Service (Application Event Log integration) or a systemd unit (syslog integration); builds single-file for `win-x64` and `linux-x64`. macOS is not a deployment target — it runs only as a foreground console process. - **Modbus TCP backends reachable** from the proxy host on port 502 (or the port configured per PLC). The H2-ECOM100 module caps simultaneous connections at **4 per PLC** — a fifth upstream client will fail to connect. -- **Admin rights** to install the service (`install.ps1` requires elevation). +- **Admin / root rights** to install the service (`install.ps1` requires elevation; `install.sh` requires root). - **No COM dependency** — this is a pure .NET 10 socket-level proxy (unlike the `.NET Framework 4.8 / x86` siblings in this repo). - **Python 3.10+** on the test machine to run the pymodbus-backed E2E simulator (not needed to run the service in production). @@ -16,8 +16,8 @@ A .NET 10 Windows Service that sits inline as a Modbus TCP proxy in front of a f ``` src/Mbproxy/ Main C# project (net10.0, Microsoft.NET.Sdk.Worker) -tests/Mbproxy.Tests/ xUnit v3 test project (314 unit + 48 E2E tests) -install/ PowerShell install/uninstall scripts and config template +tests/Mbproxy.Tests/ xUnit v3 test project (unit + simulator-backed E2E tests) +install/ Install/uninstall + publish scripts (PowerShell + shell), systemd unit, config templates docs/ Architecture, features, operations, reference, and testing docs ``` @@ -40,6 +40,7 @@ The `docs/` tree is organized by topic. Start with [`Architecture/Overview.md`]( - [`Architecture/ConnectionModel.md`](docs/Architecture/ConnectionModel.md) — Single backend connection per PLC, TxId multiplexing, request-timeout watchdog, disconnect cascade. - [`Architecture/ReadCoalescing.md`](docs/Architecture/ReadCoalescing.md) — In-flight FC03/FC04 deduplication via `InFlightByKeyMap`. 
- [`Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md) — Opt-in per-tag response cache with bounded operator-configured staleness. +- [`Architecture/Keepalive.md`](docs/Architecture/Keepalive.md) — TCP `SO_KEEPALIVE` on every socket plus an idle-backend FC03 heartbeat. ### Features @@ -54,7 +55,7 @@ The `docs/` tree is organized by topic. Start with [`Architecture/Overview.md`]( ### Reference -- [`Reference/LogEvents.md`](docs/Reference/LogEvents.md) — Stable `mbproxy.*` event catalog (28 events across 7 categories). +- [`Reference/LogEvents.md`](docs/Reference/LogEvents.md) — Stable `mbproxy.*` event catalog (31 events across 8 categories). ### Testing @@ -68,20 +69,27 @@ The `docs/` tree is organized by topic. Start with [`Architecture/Overview.md`]( dotnet build Mbproxy.slnx -c Debug ``` -**Publish (Release, single-file, win-x64):** +**Publish (Release, single-file):** ```powershell -.\install\publish.ps1 -Clean +.\install\publish.ps1 -Clean # win-x64 (default) +.\install\publish.ps1 -Rid linux-x64 -Clean # cross-publish for linux-x64 ``` -Produces both flavours under `publish-out\`: +On a Linux build host, use the shell counterpart: -| Flavour | Path | Size | Target prerequisite | +```bash +./install/publish.sh --clean # linux-x64 (default) +``` + +Each run produces both flavours under `publish-out\`: + +| Flavour | Path (win-x64) | Size | Target prerequisite | |---|---|---|---| | Self-contained | `publish-out\self-contained\Mbproxy.exe` | ~100 MB | None — bundles .NET 10 + ASP.NET Core runtime | -| Framework-dependent | `publish-out\framework-dependent\Mbproxy.exe` | ~1.5 MB | .NET 10 + ASP.NET Core preinstalled | +| Framework-dependent | `publish-out\framework-dependent\Mbproxy.exe` | ~1.6 MB | .NET 10 + ASP.NET Core preinstalled | -Pass `-OutputDir ` to publish elsewhere; omit `-Clean` to skip the wipe. 
The script wraps `dotnet publish src/Mbproxy/Mbproxy.csproj -c Release -r win-x64 [-p:SelfContained=false]` — run those directly if you only need one flavour. +On `linux-x64` the binary is `Mbproxy` (no extension) and ships the Linux config template. Pass `-OutputDir`/`-o` to publish elsewhere; omit `-Clean`/`--clean` to skip the wipe. The scripts wrap `dotnet publish src/Mbproxy/Mbproxy.csproj -c Release -r [-p:SelfContained=false]` — run that directly if you only need one flavour. **Run tests:** @@ -102,21 +110,30 @@ Edit `src/Mbproxy/appsettings.json` to configure PLCs before running. The admin ## Install -The `install/` directory holds the publish, install, and uninstall scripts. Quick path: +The `install/` directory holds the publish, install, and uninstall scripts for both platforms. + +**Windows** — elevated PowerShell: ```powershell -# 1. Publish (produces publish-out\self-contained\ and publish-out\framework-dependent\) .\install\publish.ps1 -Clean - -# 2. Install (elevated PowerShell) — point at the flavour you want to deploy .\install\install.ps1 -PublishOutput .\publish-out\self-contained -Start - -# 3. Edit the config that was placed at %ProgramData%\mbproxy\appsettings.json - -# 4. Verify +# Config is placed at %ProgramData%\mbproxy\appsettings.json — edit it, then: +# Restart-Service mbproxy Invoke-WebRequest http://localhost:8080/ -UseBasicParsing ``` +**Linux** — root / `sudo` on a systemd host: + +```bash +./install/publish.sh --clean +sudo ./install/install.sh --publish-dir ./publish-out/self-contained +# Config is placed at /etc/mbproxy/appsettings.json — edit it, then: +# sudo systemctl restart mbproxy +curl http://localhost:8080/ +``` + +`uninstall.ps1` / `uninstall.sh` reverse the install; both archive log files rather than deleting them. The systemd unit runs mbproxy as `Type=exec` under a dedicated `mbproxy` service account. + ## Maintenance Documentation doctrine for this repo: [`../DOCS-GUIDE.md`](../DOCS-GUIDE.md). 
diff --git a/mbproxy/docs/Architecture/Overview.md b/mbproxy/docs/Architecture/Overview.md index ae8e37b..c737b45 100644 --- a/mbproxy/docs/Architecture/Overview.md +++ b/mbproxy/docs/Architecture/Overview.md @@ -6,7 +6,7 @@ This document is the entry point for readers new to the codebase. It sketches th ## Runtime Shape -The process is a single .NET 10 Generic Host worker. `Microsoft.Extensions.Hosting.WindowsServices` registers the host as a Windows Service so the same binary runs interactively (for development) or under the SCM (in production). All configuration binds from `appsettings.json` through `IOptionsMonitor`, which makes the tag list and PLC roster hot-reloadable without process restart. `ProxyWorker` is the long-lived `BackgroundService` that owns startup, shutdown, and the listener supervisors for every PLC. A small Kestrel admin endpoint runs in the same process to serve the read-only status page. +The process is a single .NET 10 Generic Host worker. It registers both `Microsoft.Extensions.Hosting.WindowsServices` and `Microsoft.Extensions.Hosting.Systemd` — each a no-op off its own init system — so the same binary runs interactively (for development), as a Windows Service under the SCM, or as a Linux systemd unit. All configuration binds from `appsettings.json` through `IOptionsMonitor`, which makes the tag list and PLC roster hot-reloadable without process restart. `ProxyWorker` is the long-lived `BackgroundService` that owns startup, shutdown, and the listener supervisors for every PLC. A small Kestrel admin endpoint runs in the same process to serve the read-only status page. There is no in-process database, no message broker, and no persistent cache file: state is per-PLC, in-memory, and ephemeral. Restarting the service drops every in-flight request and every cached response. Upstream clients are expected to reconnect and reissue; the proxy never replays a request on their behalf. 
diff --git a/mbproxy/docs/Features/HotReload.md b/mbproxy/docs/Features/HotReload.md index 7df3275..5ab0162 100644 --- a/mbproxy/docs/Features/HotReload.md +++ b/mbproxy/docs/Features/HotReload.md @@ -6,7 +6,7 @@ A save to `appsettings.json` propagates to a running `mbproxy` without restartin `Microsoft.Extensions.Configuration` loads `appsettings.json` with `reloadOnChange: true`. Every consumer reads its options through `IOptionsMonitor` instead of capturing a one-shot `IOptions` snapshot at construction. When the framework's `FileSystemWatcher` sees the file change, it re-parses the JSON, re-binds the option tree, and notifies subscribers through `IOptionsMonitor.OnChange`. -The chosen mechanism is deliberate. There is no custom file watcher, no IPC channel, no admin-port mutation endpoint, and no SIGHUP-style trigger. An operator edits the file in place (or a deployment tool atomically rewrites it) and the running service catches up. The reload contract is identical whether the service is running interactively or as a Windows Service under the SCM. +The chosen mechanism is deliberate. There is no custom file watcher, no IPC channel, no admin-port mutation endpoint, and no SIGHUP-style trigger. An operator edits the file in place (or a deployment tool atomically rewrites it) and the running service catches up. The reload contract is identical whether the service is running interactively, as a Windows Service under the SCM, or as a Linux systemd unit. The `OnChange` callback can fire multiple times for a single logical save because text editors on Windows commonly use a rename-and-replace pattern that produces two or three `FileSystemWatcher` events. The reconciler debounces these inside its own background loop with a 250 ms quiescent window so a single save produces a single apply. 
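The debounce rule described above can be modeled offline. The sketch below is illustrative only: it is not the reconciler's code, the function name `coalesce_change_events` is hypothetical, and it assumes an apply fires once no further change event has arrived for the full quiescent window.

```python
from typing import Iterable, List

QUIET_WINDOW_MS = 250  # quiescent window quoted in the text above


def coalesce_change_events(event_times_ms: Iterable[int],
                           quiet_window_ms: int = QUIET_WINDOW_MS) -> List[int]:
    """Return the times (ms) at which a single apply would fire.

    Hypothetical model: events closer together than quiet_window_ms belong
    to the same burst, and the apply fires quiet_window_ms after the last
    event of each burst.
    """
    apply_times: List[int] = []
    last_event = None
    for t in sorted(event_times_ms):
        # A gap of at least one full window closes the previous burst.
        if last_event is not None and t - last_event >= quiet_window_ms:
            apply_times.append(last_event + quiet_window_ms)
        last_event = t
    if last_event is not None:
        # The final burst is applied once the window elapses after it.
        apply_times.append(last_event + quiet_window_ms)
    return apply_times
```

Under this model, three watcher events at 0, 40, and 90 ms (a rename-and-replace save) collapse into one apply, while a second save a second later produces its own apply.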
diff --git a/mbproxy/docs/Operations/Configuration.md b/mbproxy/docs/Operations/Configuration.md index 74d3aba..c36a34c 100644 --- a/mbproxy/docs/Operations/Configuration.md +++ b/mbproxy/docs/Operations/Configuration.md @@ -7,8 +7,11 @@ The configuration loader resolves `appsettings.json` relative to the executable. - **Development run** (`dotnet run`): `src/Mbproxy/appsettings.json` next to the build output. -- **Single-file publish** (`dotnet publish -c Release -r win-x64`): `appsettings.json` next to `Mbproxy.exe` in the publish folder. -- **Installed as a Windows Service**: `%ProgramData%\mbproxy\appsettings.json`. The install script copies the template at `install/mbproxy.config.template.json` to this path the first time only — an existing file is preserved across reinstalls. +- **Single-file publish** (`dotnet publish -c Release -r `): `appsettings.json` next to the published binary. A `win-x64` publish ships `install/mbproxy.config.template.json`; a `linux-x64` publish ships `install/mbproxy.linux.config.template.json` (same keys, Unix log path) — each linked into the bundle as `appsettings.json`. +- **Installed as a Windows Service**: `%ProgramData%\mbproxy\appsettings.json`, seeded by `install.ps1` from `mbproxy.config.template.json`. +- **Installed as a systemd unit**: `/etc/mbproxy/appsettings.json` (the unit's `WorkingDirectory`), seeded by `install.sh` from the Linux template. + +In both installed cases the install script copies the template only when no config already exists — an existing file is preserved across reinstalls. The file is loaded with `reloadOnChange: true`. All consumers read through `IOptionsMonitor`, so a save propagates without restarting the service. See [`../Features/HotReload.md`](../Features/HotReload.md) for per-key propagation semantics. @@ -51,11 +54,19 @@ Every supported key under `Mbproxy:*`, populated to a representative default: // Read-only HTTP status page. Set to 0 to disable. 
"AdminPort": 8080, - // Backend connection / request / shutdown timeouts. + // Backend connection / request / shutdown timeouts and keepalive. "Connection": { "BackendConnectTimeoutMs": 3000, "BackendRequestTimeoutMs": 3000, - "GracefulShutdownTimeoutMs": 10000 + "GracefulShutdownTimeoutMs": 10000, + "Keepalive": { + "Enabled": true, + "TcpIdleTimeMs": 30000, + "TcpProbeIntervalMs": 5000, + "TcpProbeCount": 4, + "BackendHeartbeatIdleMs": 30000, + "BackendHeartbeatProbeAddress": 0 + } }, // Polly resilience policies. @@ -169,6 +180,21 @@ Operational sizing notes: - A 3 s request timeout is generous compared with typical DL205/DL260 scan times (a few ms to tens of ms for FC03 of 100 registers). The slack absorbs PLC scan-overlap jitter without faulting the upstream client. - `GracefulShutdownTimeoutMs` should be less than the Service Control Manager's stop deadline. The default 10 s suits a fleet of 54 PLCs; on a much larger fleet, raise both the SCM wait hint and this value in lockstep. +## `Mbproxy.Connection.Keepalive` + +TCP keepalive and backend heartbeat settings. Source: `KeepaliveOptions.cs`. Enabled by default — the DL205/DL260 ECOM never emits TCP keepalives, so an idle socket is otherwise dropped by middleboxes after 2–5 minutes. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md) for the full design. + +| Field | Type | Default | Notes | +|-------|------|---------|-------| +| `Enabled` | bool | `true` | Master switch. When `false`, neither `SO_KEEPALIVE` nor the backend heartbeat is applied and the proxy behaves exactly as a pre-keepalive build. | +| `TcpIdleTimeMs` | int | `30000` | `SO_KEEPALIVE` idle time before the OS sends its first probe. Applied to the backend socket and accepted upstream sockets. | +| `TcpProbeIntervalMs` | int | `5000` | `SO_KEEPALIVE` interval between probes once idle. | +| `TcpProbeCount` | int | `4` | `SO_KEEPALIVE` unanswered probes before the OS declares the socket dead. 
| +| `BackendHeartbeatIdleMs` | int | `30000` | After this much backend idle, the proxy issues a synthetic FC03 qty=1 read to keep the path warm and prove the ECOM still answers Modbus. Must be greater than `BackendRequestTimeoutMs`. | +| `BackendHeartbeatProbeAddress` | int | `0` | Modbus PDU address the heartbeat FC03 probe reads. Address `0` (`V0`) is valid on DL205/DL260 in factory absolute mode. Range `[0, 65535]`. | + +On hot reload, the heartbeat interval and probe address are re-read on every loop tick. The `Tcp*` socket options are applied at connect/accept time, so a reload affects only sockets opened after the change. A reload where `BackendHeartbeatIdleMs <= BackendRequestTimeoutMs` is rejected — a heartbeat interval at or below the request timeout would fire continuously. + ## `Mbproxy.Resilience` Polly retry pipelines for backend connect, listener bind, and the in-flight read coalescer. Source: `ResilienceOptions.cs`. @@ -391,6 +417,7 @@ A reduced view of [`../Features/HotReload.md`](../Features/HotReload.md), restri | `Plcs[i]` removed | Supervisor stops the listener and closes all upstream connections for that PLC. | | `Plcs[i].ListenPort` or `Host` changed | Equivalent to remove + add. | | `Connection.Backend*TimeoutMs` | Next backend connect or request uses the new value. | +| `Connection.Keepalive` heartbeat fields | Re-read on every heartbeat loop tick. `Tcp*` socket options apply to backend/upstream sockets opened after the change. | | `AdminPort` | Requires a service restart — the Kestrel admin host is built once at startup. | | `Resilience.ReadCoalescing.Enabled` | Hot-reloadable; in-flight coalesced entries drain naturally. | | `BcdTags.*.CacheTtlMs`, `Plcs[i].DefaultCacheTtlMs` | Tag-map reseat for the affected PLC drops that PLC's entire cache. 
| diff --git a/mbproxy/docs/Operations/Troubleshooting.md b/mbproxy/docs/Operations/Troubleshooting.md index 3d65716..480a7d6 100644 --- a/mbproxy/docs/Operations/Troubleshooting.md +++ b/mbproxy/docs/Operations/Troubleshooting.md @@ -2,7 +2,9 @@ Operator diagnosis playbook for mbproxy. Each entry maps an observable symptom to the log event name and status-page counter that confirms it, then lists likely causes and remediation steps. -The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-.log`. The live counters are at `http://:/status.json` (default port `8080`). Events at Error level and above are also mirrored to the Windows Application Event Log under source `mbproxy`. +The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-.log` on Windows, or `/var/log/mbproxy/mbproxy-.log` on Linux. The live counters are at `http://:/status.json` (default port `8080`). Events at Error level and above are also mirrored to the **Windows Application Event Log** (Windows Service) or the **local syslog / journal** (systemd) under source `mbproxy` — view the latter with `journalctl -t mbproxy` or `journalctl -u mbproxy`. + +Paths and service commands below are written for Windows (`%ProgramData%`, `sc.exe`); the systemd equivalents are `/etc/mbproxy` + `/var/log/mbproxy` and `systemctl start|stop|status mbproxy`. ## Service Startup Failures @@ -124,7 +126,28 @@ The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-.log`. The l 1. Verify the upstream count on the status page returns to normal as clients reconnect — `plcs[].clients.connected` should climb again within seconds. 2. If cascades fire repeatedly against the same PLC, investigate the PLC and intermediate network for stability. The proxy itself has no state to repair. -3. If cascades correlate with idle periods, the idle middlebox-drop pattern is the likeliest cause; reduce the upstream client's poll interval below the middlebox idle timeout to keep traffic flowing. +3. 
If cascades correlate with idle periods, the idle middlebox-drop pattern is the likeliest cause. Keepalive is enabled by default and should already be preventing this — confirm `Connection.Keepalive.Enabled` is `true` and that `BackendHeartbeatIdleMs` is comfortably below the middlebox idle timeout. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md). + +### Backend keepalive heartbeat failing + +**Symptom.** A PLC's backend connection is torn down while idle — no client was actively talking to it. `plcs[].backend.backendIdleDisconnects` increments and the upstream clients (if any were attached) are cascaded. + +**Where to look.** + +- Log events: `mbproxy.keepalive.heartbeat.timeout` (Warning) followed by `mbproxy.keepalive.backend.idle_disconnect` (Information). +- Status fields: `plcs[].backend.backendHeartbeatsSent`, `backendHeartbeatsFailed`, `backendIdleDisconnects`. + +**Root causes.** + +- The ECOM is reachable at the IP layer but no longer answering Modbus (firmware hang, ECOM reset mid-session). +- The path died between heartbeats and the heartbeat was the first request to discover it — this is the feature working as intended (the failure is found during idle, not on a client request). +- `BackendHeartbeatProbeAddress` points at an address the PLC rejects. The default (0 = `V0`) is safe on DL205/DL260; only an operator override could break this. + +**Remediation.** + +1. A single idle-disconnect that recovers on the next client request needs no action — the proxy reconnected the path proactively. +2. Repeated idle-disconnects on one PLC mean it keeps going dark while idle. Investigate the device and the network path; the proxy has no state to repair. +3. If `backendHeartbeatsFailed` climbs but the PLC answers real client requests fine, check that `BackendHeartbeatProbeAddress` is a register the device actually serves. 
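As a rough aid for the triage steps above, the following sketch scans an already-parsed `status.json` document for PLCs whose heartbeat counters are non-zero. It assumes the JSON nests the counters as `plcs[].backend.<field>`, which is inferred from the field paths quoted in this entry; the function name and the sample document are hypothetical, and PLCs are reported by array index because no name field is documented here.

```python
from typing import Any, Dict, List


def plcs_with_heartbeat_trouble(status: Dict[str, Any]) -> List[int]:
    """Return indexes into status["plcs"] with failed heartbeats or idle disconnects.

    Assumes the nesting plcs[].backend.<counter>; missing counters count as 0.
    """
    flagged: List[int] = []
    for index, plc in enumerate(status.get("plcs", [])):
        backend = plc.get("backend", {})
        failed = backend.get("backendHeartbeatsFailed", 0)
        idle_disconnects = backend.get("backendIdleDisconnects", 0)
        if failed > 0 or idle_disconnects > 0:
            flagged.append(index)
    return flagged


# Hand-written sample for illustration; not captured from a running service.
sample_status = {
    "plcs": [
        {"backend": {"backendHeartbeatsSent": 10,
                     "backendHeartbeatsFailed": 0,
                     "backendIdleDisconnects": 0}},
        {"backend": {"backendHeartbeatsSent": 10,
                     "backendHeartbeatsFailed": 2,
                     "backendIdleDisconnects": 1}},
    ]
}
```

A flagged PLC is only a starting point: a single idle disconnect that recovers needs no action, so compare the counters over time before investigating the device.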
### Request timeout watchdog firing diff --git a/mbproxy/docs/Reference/LogEvents.md b/mbproxy/docs/Reference/LogEvents.md index 05b468b..7840073 100644 --- a/mbproxy/docs/Reference/LogEvents.md +++ b/mbproxy/docs/Reference/LogEvents.md @@ -6,9 +6,9 @@ The stable catalog of every `mbproxy.*` event name the service emits, with its l The service uses [Serilog](https://serilog.net/) wired through the `Microsoft.Extensions.Logging` bridge. Three sinks are configured (see `src/Mbproxy/HostingExtensions.cs`): -- **Console** — written to stdout for interactive `--console` runs and for the SCM stdout capture. -- **Rolling file** — under `%ProgramData%\mbproxy\logs\` (`mbproxy-.log`). -- **Windows Event Log** — only when running as a Windows Service, and only for events at `Error` and above (see `src/Mbproxy/Diagnostics/EventLogBridge.cs`). +- **Console** — stdout; captured by the Windows SCM or by systemd-journald. +- **Rolling file** — `%ProgramData%\mbproxy\logs\` on Windows, `/var/log/mbproxy/` on Linux (`mbproxy-.log`). +- **Platform diagnostic sink** — `Error`+ events only. `DiagnosticSinkSelector` picks it once at the composition root: the **Windows Application Event Log** under the SCM (`EventLogBridge`), **local syslog** under systemd (`SyslogBridge`), or none for interactive/dev runs. Every event uses source-generated `[LoggerMessage]` definitions, so the property names below match the message template token-for-token. The default minimum level is `Information`; lower the floor for `Mbproxy.*` categories via the standard `Logging:LogLevel` configuration to surface `Debug` events such as the coalesce and cache traces. @@ -385,6 +385,51 @@ Fires whenever the entire per-PLC cache is wiped at once — primarily after a b **Operator action:** none unless flushes happen on a tight loop, which would indicate the backend connection itself is unstable. +## Keepalive + +See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md) for the backend heartbeat design. 
+ +### mbproxy.keepalive.heartbeat.sent + +**Level:** Debug · **EventId:** 150 · **Source:** `src/Mbproxy/Proxy/Multiplexing/KeepaliveLogEvents.cs` + +| Property | Type | Meaning | +|----------|------|---------| +| `Plc` | `string` | Configured PLC name. | +| `ProxyTxId` | `ushort` | Proxy-allocated TxId carrying the synthetic FC03 probe. | +| `Address` | `ushort` | Modbus address the probe reads (`BackendHeartbeatProbeAddress`). | + +Fires each time the heartbeat loop issues a probe on an idle backend socket — at most one per `BackendHeartbeatIdleMs` per idle PLC. + +**Operator action:** none. Debug-level; useful only when confirming the heartbeat is alive. + +### mbproxy.keepalive.heartbeat.timeout + +**Level:** Warning · **EventId:** 151 · **Source:** `src/Mbproxy/Proxy/Multiplexing/KeepaliveLogEvents.cs` + +| Property | Type | Meaning | +|----------|------|---------| +| `Plc` | `string` | Configured PLC name. | +| `ProxyTxId` | `ushort` | Proxy TxId of the unanswered probe. | +| `ElapsedMs` | `long` | Milliseconds from probe send to timeout. | + +Fires when a heartbeat probe is not answered within `BackendRequestTimeoutMs` — the backend is connected but no longer answering Modbus. + +**Operator action:** check the PLC and the network path. Paired with `mbproxy.keepalive.backend.idle_disconnect` for the same PLC. + +### mbproxy.keepalive.backend.idle_disconnect + +**Level:** Information · **EventId:** 152 · **Source:** `src/Mbproxy/Proxy/Multiplexing/KeepaliveLogEvents.cs` + +| Property | Type | Meaning | +|----------|------|---------| +| `Plc` | `string` | Configured PLC name. | +| `ElapsedMs` | `long` | Milliseconds the failed heartbeat waited before the teardown. | + +Fires when a failed heartbeat triggers a proactive backend teardown. Every attached upstream pipe is cascaded; clients reconnect on their next request. This is the keepalive feature doing its job — finding a dead path during idle instead of on the next real request. 
+ +**Operator action:** none if isolated. Repeated idle-disconnects on one PLC indicate it keeps going dark while idle — investigate the device or the network path. + ## BCD Rewriter ### mbproxy.rewrite.partial_bcd @@ -495,5 +540,6 @@ Lifecycle events (`startup.*`, `listener.*`, `admin.*`, `shutdown.*`, `config.re - [Response Cache](../Architecture/ResponseCache.md) — context for the `mbproxy.cache.*` events. - [Status Page](../Operations/StatusPage.md) — counter equivalents for the high-volume Debug-level events. - [Read Coalescing](../Architecture/ReadCoalescing.md) — context for the `mbproxy.coalesce.*` events. +- [Keepalive](../Architecture/Keepalive.md) — context for the `mbproxy.keepalive.*` events. - [BCD Rewriting](../Features/BcdRewriting.md) — context for the `mbproxy.rewrite.*` and `mbproxy.exception.passthrough` events. - [Hot Reload](../Features/HotReload.md) — context for the `mbproxy.config.reload.*` events. diff --git a/mbproxy/install/install.ps1 b/mbproxy/install/install.ps1 index 9ee706c..b3d2809 100644 --- a/mbproxy/install/install.ps1 +++ b/mbproxy/install/install.ps1 @@ -165,7 +165,10 @@ if (-not (Test-Path $configDest)) { if (-not [System.Diagnostics.EventLog]::SourceExists('mbproxy')) { Write-Host "Registering Windows Event Log source 'mbproxy'..." - New-EventLog -Source 'mbproxy' -LogName 'Application' + # .NET API, not New-EventLog: the *-EventLog cmdlets exist only in Windows + # PowerShell 5.1, not PowerShell 7+. This call is symmetric with the + # SourceExists check above and works on every PowerShell edition. + [System.Diagnostics.EventLog]::CreateEventSource('mbproxy', 'Application') } else { Write-Host "Windows Event Log source 'mbproxy' already registered." } diff --git a/mbproxy/install/install.sh b/mbproxy/install/install.sh new file mode 100644 index 0000000..0f5ad4c --- /dev/null +++ b/mbproxy/install/install.sh @@ -0,0 +1,134 @@ +#!/usr/bin/env bash +# +# install.sh — install the mbproxy service on a Linux / systemd host. 
+# +# The Linux counterpart of install.ps1. Copies the published binary to +# /opt/mbproxy, seeds the config at /etc/mbproxy/appsettings.json (preserving any +# existing one), creates the log and bundle-cache directories and the mbproxy +# service account, installs the systemd unit, and enables + starts the service. +# +# Re-running on an already-installed service is safe (idempotent): the binary is +# refreshed, an existing /etc/mbproxy/appsettings.json is preserved, and the +# service is restarted. +# +# Usage: +# sudo ./install.sh [--publish-dir DIR] [--no-start] +# +# --publish-dir DIR directory containing the published Mbproxy binary. +# Default: /publish-out/self-contained +# --no-start install and enable the unit but do not start it. +# +set -euo pipefail + +# ── 0. Settings ────────────────────────────────────────────────────────────── +SERVICE_NAME="mbproxy" +SERVICE_USER="mbproxy" +INSTALL_DIR="/opt/mbproxy" +CONFIG_DIR="/etc/mbproxy" +LOG_DIR="/var/log/mbproxy" +CACHE_DIR="/var/cache/mbproxy" +UNIT_DEST="/etc/systemd/system/${SERVICE_NAME}.service" + +script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +repo_root="$(dirname "$script_dir")" +publish_dir="${repo_root}/publish-out/self-contained" +start_service=1 + +while [[ $# -gt 0 ]]; do + case "$1" in + --publish-dir) publish_dir="$2"; shift 2 ;; + --no-start) start_service=0; shift ;; + *) echo "Unknown argument: $1" >&2; exit 2 ;; + esac +done + +# ── 1. Pre-flight checks ───────────────────────────────────────────────────── +if [[ "$(id -u)" -ne 0 ]]; then + echo "install.sh must run as root (use sudo)." >&2 + exit 1 +fi + +binary_src="${publish_dir}/Mbproxy" +if [[ ! -f "$binary_src" ]]; then + echo "Mbproxy binary not found at '${binary_src}'." >&2 + echo "Run install/publish.sh first, or pass --publish-dir." >&2 + exit 1 +fi + +unit_src="${script_dir}/mbproxy.service" +config_src="${publish_dir}/appsettings.json" +if [[ ! 
-f "$unit_src" ]]; then + echo "Unit file not found at '${unit_src}'." >&2 + exit 1 +fi + +echo "Installing ${SERVICE_NAME} service..." +echo " Publish dir : ${publish_dir}" +echo " Install dir : ${INSTALL_DIR}" +echo " Config dir : ${CONFIG_DIR}" + +# ── 2. Service account ─────────────────────────────────────────────────────── +if ! id -u "$SERVICE_USER" >/dev/null 2>&1; then + echo "Creating service account '${SERVICE_USER}'..." + useradd --system --no-create-home --shell /usr/sbin/nologin "$SERVICE_USER" +else + echo "Service account '${SERVICE_USER}' already exists." +fi + +# ── 3. Stop the service if running (so the binary can be replaced) ─────────── +if systemctl is-active --quiet "$SERVICE_NAME" 2>/dev/null; then + echo "Stopping running service '${SERVICE_NAME}'..." + systemctl stop "$SERVICE_NAME" +fi + +# ── 4. Directories ─────────────────────────────────────────────────────────── +install -d -m 0755 "$INSTALL_DIR" +install -d -m 0755 "$CONFIG_DIR" +install -d -m 0750 -o "$SERVICE_USER" -g "$SERVICE_USER" "$LOG_DIR" +install -d -m 0750 -o "$SERVICE_USER" -g "$SERVICE_USER" "$CACHE_DIR" + +# ── 5. Binary ──────────────────────────────────────────────────────────────── +echo "Copying binary to '${INSTALL_DIR}/Mbproxy'..." +install -m 0755 "$binary_src" "${INSTALL_DIR}/Mbproxy" + +# ── 6. Config (preserve an existing one) ───────────────────────────────────── +config_dest="${CONFIG_DIR}/appsettings.json" +if [[ -f "$config_dest" ]]; then + echo "Preserving existing config at '${config_dest}'." +elif [[ -f "$config_src" ]]; then + echo "Seeding config template to '${config_dest}'..." + install -m 0644 "$config_src" "$config_dest" +else + echo "WARNING: no appsettings.json in '${publish_dir}' — create '${config_dest}' manually." >&2 +fi + +# ── 7. systemd unit ────────────────────────────────────────────────────────── +echo "Installing systemd unit to '${UNIT_DEST}'..." 
+install -m 0644 "$unit_src" "$UNIT_DEST" +systemctl daemon-reload +systemctl enable "$SERVICE_NAME" >/dev/null + +# ── 8. Start ───────────────────────────────────────────────────────────────── +if [[ "$start_service" -eq 1 ]]; then + echo "Starting service '${SERVICE_NAME}'..." + systemctl start "$SERVICE_NAME" + sleep 1 + if systemctl is-active --quiet "$SERVICE_NAME"; then + echo "Service '${SERVICE_NAME}' is running." + else + echo "WARNING: service '${SERVICE_NAME}' did not reach active state." >&2 + echo "Check: journalctl -u ${SERVICE_NAME} -e" >&2 + fi +fi + +echo "" +echo "Install complete." +echo " Config : ${config_dest}" +echo " Logs : ${LOG_DIR}" +echo " Binary : ${INSTALL_DIR}/Mbproxy" +echo "" +echo "Next steps:" +echo " 1. Edit '${config_dest}' to configure your PLC list and BCD tags." +echo " 2. Restart: sudo systemctl restart ${SERVICE_NAME}" +echo " 3. Logs: journalctl -u ${SERVICE_NAME} -f" +echo " 4. Status: http://localhost:8080/" diff --git a/mbproxy/install/mbproxy.linux.config.template.json b/mbproxy/install/mbproxy.linux.config.template.json new file mode 100644 index 0000000..22b50d9 --- /dev/null +++ b/mbproxy/install/mbproxy.linux.config.template.json @@ -0,0 +1,255 @@ +// mbproxy configuration template (Linux / systemd) — copy to /etc/mbproxy/appsettings.json +// and edit before starting the service. +// +// The .NET configuration loader accepts // and /* */ comments in JSON files +// (JSONC semantics) when using the default Host.CreateApplicationBuilder path. +// +// IMPORTANT: install.sh overwrites this file at the destination ONLY if no +// appsettings.json already exists there. An existing file is always preserved. +// +// This is the Linux counterpart of mbproxy.config.template.json — identical except +// for the rolling-log path (/var/log/mbproxy) and a few platform notes. It is shipped +// as appsettings.json by a `dotnet publish -r linux-*` build. 
+{ + "Mbproxy": { + + // ── Global BCD tag list ───────────────────────────────────────────────────────────── + // These tags apply to EVERY PLC by default. + // Each entry: Address (Modbus PDU address, decimal), Width (16 or 32 bits). + // + // Width 16 — one register holds 4 BCD digits (0–9999). + // Wire value 0x1234 decodes to decimal 1234. + // + // Width 32 — a CDAB-ordered register pair (Address = low word, Address+1 = high word). + // Decoded decimal = high * 10000 + low (DirectLOGIC CDAB word order). + // + // Per-PLC overrides (see Plcs[].BcdTags below): + // Add — appends extra tags beyond what Global defines, or overrides a + // Global entry's Width when the same Address appears in both. + // Remove — removes specific addresses from the effective set for that PLC. + // Effective set = (Global ∪ Add) − Remove, resolved per PDU. + "BcdTags": { + "Global": [ + // V2000 (octal) = decimal address 1024. 16-bit BCD counter. + { "Address": 1024, "Width": 16 }, + + // V2040 (octal) = decimal address 1056. 32-bit BCD total at 1056/1057. + { "Address": 1056, "Width": 32 }, + + // V2100 (octal) = decimal address 1088. 16-bit BCD setpoint. + // + // Phase 11: CacheTtlMs (optional) opts this tag into the response cache. With + // CacheTtlMs > 0 set, upstream clients reading this register will see values up + // to CacheTtlMs MILLISECONDS OLD — explicit acknowledgement of the staleness + // window is required by enabling it. Default (omitted or 0) = cache disabled + // for this tag. The cache is OFF by default for every tag. + { "Address": 1088, "Width": 16 /* , "CacheTtlMs": 1000 */ } + ] + }, + + // ── PLC list ──────────────────────────────────────────────────────────────────────── + // Each entry maps one upstream proxy port → one backend PLC. + // Upstream clients connect to ListenPort; the proxy forwards to Host:Port. + // + // IMPORTANT: H2-ECOM100 modules accept at most 4 simultaneous TCP connections. 
+ // With the 1:1 upstream↔backend model, a fifth upstream client to the same proxy + // port will cause a backend connect failure and an immediate upstream disconnect. + "Plcs": [ + { + "Name": "Line1-Mixer", // Human-readable name (shown on status page and in logs) + "ListenPort": 5020, // Port the proxy listens on (upstream clients connect here) + "Host": "10.0.1.1", // PLC IP address or hostname + "Port": 502, // PLC Modbus TCP port (almost always 502) + "BcdTags": { + // Additional 32-bit tag specific to this PLC only. + "Add": [ + { "Address": 1200, "Width": 32 } + ], + // Remove address 1056 from the Global list for this PLC + // (this mixer doesn't use the 32-bit BCD total). + "Remove": [ 1056 ] + } + }, + { + "Name": "Line1-Conveyor", + "ListenPort": 5021, + "Host": "10.0.1.2", + "Port": 502 + // No BcdTags override — uses the Global set as-is. + } + // Add one entry per PLC. Ports must be unique per host. Typical fleet: 54 PLCs. + ], + + // ── Admin port ────────────────────────────────────────────────────────────────────── + // Read-only HTTP status page. + // GET / → self-contained HTML (auto-refreshes every 5 s) + // GET /status.json → same data as JSON for monitoring scrapers + // + // Authentication is assumed at the network layer (trusted internal segment). + // Set to 0 to disable the admin endpoint. + "AdminPort": 8080, + + // ── Connection timeouts ───────────────────────────────────────────────────────────── + "Connection": { + // Max time (ms) to wait for a TCP connect to the PLC backend. + // Each Polly retry attempt gets its own copy of this timeout. + "BackendConnectTimeoutMs": 3000, + + // Max time (ms) to wait for the PLC to respond to a forwarded PDU. + // Non-idempotent FC06/FC16 writes are one-shot — the upstream client + // is disconnected immediately on timeout (no retry). + "BackendRequestTimeoutMs": 3000, + + // Max time (ms) to wait for in-flight PDUs to complete during graceful shutdown + // (systemctl stop → SIGTERM). 
After this deadline the coordinator cancels + // remaining work and proceeds. Keep at or below the unit's TimeoutStopSec. + "GracefulShutdownTimeoutMs": 10000, + + // ── Keepalive / connection monitoring ─────────────────────────────────── + // The DL205/DL260 ECOM does not emit TCP keepalives, so an idle backend + // socket can be silently dropped by a middlebox (switch, firewall, NAT) + // after 2-5 minutes. This section enables OS-level SO_KEEPALIVE on both + // backend and upstream sockets, and drives a periodic Modbus FC03 heartbeat + // on each idle backend socket so a dead path is detected before a real + // client request hits it. See docs/Architecture/Keepalive.md. + "Keepalive": { + // Master switch. false → no SO_KEEPALIVE and no heartbeat; the proxy + // behaves exactly as a pre-keepalive build. + "Enabled": true, + + // SO_KEEPALIVE: idle time (ms) before the OS sends its first probe. + "TcpIdleTimeMs": 30000, + // SO_KEEPALIVE: interval (ms) between probes once the idle time elapses. + "TcpProbeIntervalMs": 5000, + // SO_KEEPALIVE: unanswered probes before the OS declares the socket dead. + "TcpProbeCount": 4, + + // Backend heartbeat: after this much backend idle (ms) the proxy issues a + // synthetic FC03 qty=1 read to keep the path warm and prove the ECOM is + // still answering Modbus. Must be greater than BackendRequestTimeoutMs. + "BackendHeartbeatIdleMs": 30000, + // FC03 PDU address the heartbeat reads. 0 = V0, valid on DL205/DL260. + "BackendHeartbeatProbeAddress": 0 + } + }, + + // ── Resilience policies ───────────────────────────────────────────────────────────── + "Resilience": { + + // Polly retry policy for backend TCP connect attempts. + // MaxAttempts: total connect tries (including the first). + // BackoffMs: delay between each attempt (must have MaxAttempts−1 entries). + "BackendConnect": { + "MaxAttempts": 3, + "BackoffMs": [ 100, 500, 2000 ] + }, + + // Polly recovery policy for listener bind failures. 
+ // If a PLC's listen port can't be bound (in-use, bad IP, transient OS error), + // the supervisor retries according to this schedule. + // InitialBackoffMs: backoff per step (first N retries). + // SteadyStateMs: backoff for all subsequent retries (runs indefinitely). + "ListenerRecovery": { + "InitialBackoffMs": [ 1000, 2000, 5000, 15000, 30000 ], + "SteadyStateMs": 30000 + }, + + // Phase 10 — in-flight read coalescing. + // + // When two or more upstream clients (HMI / historian / engineering workstation / + // gateway) issue the SAME FC03 or FC04 read while a matching backend round-trip is + // already in flight, the proxy attaches the late arrivals to the existing in-flight + // entry and fans the single PLC response out to every attached client — saving the + // ECOM's per-scan PDU budget on duplicated reads. + // + // Zero post-response staleness: coalescing operates ONLY between "first request + // sent to PLC" and "response received from PLC" (microseconds to ~10 ms typical). + // Each upstream client still sees its own MBAP transaction ID echoed correctly; + // the proxy is transparent. + // + // FC06 / FC16 writes are NEVER coalesced (non-idempotent). FC03 vs FC04 are + // separate Modbus tables and never share a coalescing key. Different unit IDs + // (multi-drop / gateway-backed setups) never coalesce. + // + // Enabled — master switch. Hot-reloadable; flipping to false leaves running + // coalesced entries to drain naturally. + // MaxParties — per-entry cap on attached parties. Past the cap, the next + // identical request opens a fresh backend round-trip (load-shedding + // safety valve for very fan-out-heavy fleets). + "ReadCoalescing": { + "Enabled": true, + "MaxParties": 32 + } + }, + + // ── Response cache (Phase 11) — opt-in bounded-staleness cache ────────────────── + // + // ⚠ DESIGN-CONTRACT PIVOT: with caching enabled the proxy is no longer purely + // transparent. 
Upstream FC03/FC04 reads for cache-enabled tags may return values + // up to CacheTtlMs MILLISECONDS OLD. Operators opt tags in by setting a non-zero + // CacheTtlMs on a BcdTagOptions entry (or DefaultCacheTtlMs on a PlcOptions entry). + // + // The cache is OFF BY DEFAULT for every tag. A deployment with NO TTL config (this + // section entirely absent and no BcdTags.*.CacheTtlMs / Plcs[i].DefaultCacheTtlMs) + // behaves IDENTICALLY to a pre-Phase-11 deployment — no behaviour change. + // + // AllowLongTtl — gate for any CacheTtlMs > 60_000. Reload validation + // rejects configs that exceed 60 s without this opt-in, + // to prevent accidentally-stale-for-an-hour deployments. + // MaxEntriesPerPlc — LRU cap per-PLC. Past this cap, the next insert evicts + // the least-recently-used entry. Defaults to 1000. + // EvictionIntervalMs — background eviction tick. Scans each PLC's cache and + // removes entries past their TTL. Defaults to 5000. + // + // Properties (full text in docs/Architecture/ResponseCache.md): + // * Cache hits SHORT-CIRCUIT coalescing entirely (cache → coalesce → backend). + // * Successful FC06/FC16 write responses invalidate every cached FC03/FC04 entry + // whose address range OVERLAPS the write — not just exact-key match. + // * Multi-tag read range: effective TTL = min(TTLs). Any tag with TTL=0 in the + // range disables caching for the whole read. + // * Cache stores POST-rewriter bytes; hits never re-invoke the BCD rewriter. + // * Tag-list hot-reload flushes the affected PLC's whole cache. + // * No persistence — process restart wipes the cache. + "Cache": { + "AllowLongTtl": false, + "MaxEntriesPerPlc": 1000, + "EvictionIntervalMs": 5000 + } + }, + + // ── Serilog ───────────────────────────────────────────────────────────────────────────── + // Structured log output. Default: Information level, console + rolling-file. + // The console sink is captured by systemd-journald (view with `journalctl -u mbproxy`). 
+ // In addition, when mbproxy runs as a systemd service the SyslogBridge writes Error+ + // events to the local syslog with proper RFC5424 severity (wired in code, not here). + "Serilog": { + "Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ], + "MinimumLevel": { + "Default": "Information", + "Override": { + "Microsoft": "Warning", + "System": "Warning" + } + }, + "WriteTo": [ + { + "Name": "Console", + "Args": { + "outputTemplate": "[{Timestamp:HH:mm:ss} {Level:u3}] {Message:lj} {Properties:j}{NewLine}{Exception}" + } + }, + { + "Name": "File", + "Args": { + // Rolling log: one file per day, kept for 30 days, under /var/log/mbproxy + // (created by install.sh and owned by the mbproxy service account). + // Survives uninstall — uninstall.sh archives logs to /var/log/mbproxy.archived-. + "path": "/var/log/mbproxy/mbproxy-.log", + "rollingInterval": "Day", + "retainedFileCountLimit": 30, + "outputTemplate": "[{Timestamp:yyyy-MM-dd HH:mm:ss.fff zzz} {Level:u3}] {Message:lj} {Properties:j}{NewLine}{Exception}" + } + } + ] + } +} diff --git a/mbproxy/install/mbproxy.service b/mbproxy/install/mbproxy.service new file mode 100644 index 0000000..db4bc73 --- /dev/null +++ b/mbproxy/install/mbproxy.service @@ -0,0 +1,45 @@ +# systemd unit for mbproxy — the Modbus TCP BCD proxy. +# +# Installed to /etc/systemd/system/mbproxy.service by install.sh. +# The Linux counterpart of the Windows Service registered by install.ps1. +# +# Type=exec (not Type=notify): mbproxy is a leaf service that nothing orders +# against, so systemd's readiness signal is unnecessary. Type=exec marks the +# unit active once the binary is exec'd; graceful stop still works because the +# .NET generic host handles SIGTERM directly (drains in-flight requests within +# Connection.GracefulShutdownTimeoutMs). 
+ +[Unit] +Description=mbproxy — Modbus TCP BCD proxy +After=network-online.target +Wants=network-online.target + +[Service] +Type=exec +ExecStart=/opt/mbproxy/Mbproxy +WorkingDirectory=/etc/mbproxy +User=mbproxy +Group=mbproxy + +# Restart on crash, but not on a clean SIGTERM stop. +Restart=on-failure +RestartSec=5 +# Keep above Connection.GracefulShutdownTimeoutMs (default 10 s) so the drain +# completes before systemd escalates to SIGKILL. +TimeoutStopSec=30 + +# Self-contained single-file publish: pin native-library extraction to a stable, +# writable directory (install.sh creates it and grants the mbproxy account access). +Environment=DOTNET_BUNDLE_EXTRACT_BASE_DIR=/var/cache/mbproxy + +# Hardening. The service only needs to write its log and bundle-cache directories. +NoNewPrivileges=true +ProtectSystem=strict +ProtectHome=true +PrivateTmp=true +ReadWritePaths=/var/log/mbproxy /var/cache/mbproxy +# If any configured ListenPort is below 1024, also add: +# AmbientCapabilities=CAP_NET_BIND_SERVICE + +[Install] +WantedBy=multi-user.target diff --git a/mbproxy/install/publish.ps1 b/mbproxy/install/publish.ps1 index ba74834..26c5c10 100644 --- a/mbproxy/install/publish.ps1 +++ b/mbproxy/install/publish.ps1 @@ -1,19 +1,27 @@ <# .SYNOPSIS - Publishes Mbproxy.exe in two flavours: self-contained and framework-dependent. + Publishes the Mbproxy binary in two flavours: self-contained and framework-dependent. .DESCRIPTION - Produces two single-file win-x64 builds under \publish-out\: + Produces two single-file builds for the requested runtime under \publish-out\: - self-contained\Mbproxy.exe ~100 MB — bundles the .NET 10 runtime; - no .NET install needed on target. - framework-dependent\Mbproxy.exe ~1.6 MB — requires .NET 10 + ASP.NET Core - runtime preinstalled on target. + self-contained\ ~100 MB — bundles the .NET 10 + ASP.NET Core runtime; + no .NET install needed on the target. 
+ framework-dependent\ ~1.6 MB — requires the .NET 10 + ASP.NET Core runtime + preinstalled on the target. - Both builds use the Release configuration and inherit the publish settings - declared in src\Mbproxy\Mbproxy.csproj (PublishSingleFile=true, - IncludeNativeLibrariesForSelfExtract=true). The framework-dependent build - overrides SelfContained=false on the command line. + The runtime is selected with -Rid (default win-x64). The binary is Mbproxy.exe on + Windows RIDs and Mbproxy on Linux/macOS RIDs. + + Both builds use the Release configuration and inherit the publish settings declared + in src\Mbproxy\Mbproxy.csproj (PublishSingleFile=true, SelfContained=true, + IncludeNativeLibrariesForSelfExtract=true; those settings are gated on an explicit + RID, which is supplied here). The framework-dependent build overrides + SelfContained=false on the command line. + +.PARAMETER Rid + .NET runtime identifier to publish for. Examples: win-x64, linux-x64. + Default: win-x64 .PARAMETER OutputDir Root output directory. Two subfolders are created beneath it. @@ -24,10 +32,12 @@ .EXAMPLE .\publish.ps1 - .\publish.ps1 -Clean + .\publish.ps1 -Rid linux-x64 + .\publish.ps1 -Rid win-x64 -Clean #> [CmdletBinding()] param( + [string]$Rid = 'win-x64', [string]$OutputDir = (Join-Path (Split-Path -Parent $PSScriptRoot) 'publish-out'), [switch]$Clean ) @@ -46,15 +56,18 @@ if ($Clean -and (Test-Path $OutputDir)) { Remove-Item -Recurse -Force $OutputDir } +# Binary name: Windows RIDs produce an .exe, every other RID produces an extensionless ELF/Mach-O. 
+$exeName = if ($Rid -like 'win-*') { 'Mbproxy.exe' } else { 'Mbproxy' } + $selfContainedOut = Join-Path $OutputDir 'self-contained' $frameworkDependentOut = Join-Path $OutputDir 'framework-dependent' -Write-Host "`n=== Publishing self-contained (~100 MB) ===" -ForegroundColor Cyan -& dotnet publish $csproj -c Release -r win-x64 -o $selfContainedOut --nologo +Write-Host "`n=== Publishing self-contained ($Rid, ~100 MB) ===" -ForegroundColor Cyan +& dotnet publish $csproj -c Release -r $Rid -o $selfContainedOut --nologo if ($LASTEXITCODE -ne 0) { throw "self-contained publish failed (exit $LASTEXITCODE)" } -Write-Host "`n=== Publishing framework-dependent (~1.6 MB) ===" -ForegroundColor Cyan -& dotnet publish $csproj -c Release -r win-x64 -p:SelfContained=false -p:PublishSingleFile=true -o $frameworkDependentOut --nologo +Write-Host "`n=== Publishing framework-dependent ($Rid, ~1.6 MB) ===" -ForegroundColor Cyan +& dotnet publish $csproj -c Release -r $Rid -p:SelfContained=false -p:PublishSingleFile=true -o $frameworkDependentOut --nologo if ($LASTEXITCODE -ne 0) { throw "framework-dependent publish failed (exit $LASTEXITCODE)" } function Format-Size { @@ -63,14 +76,14 @@ function Format-Size { else { '{0:N1} KB' -f ($Bytes / 1KB) } } -Write-Host "`n=== Result ===" -ForegroundColor Green +Write-Host "`n=== Result ($Rid) ===" -ForegroundColor Green foreach ($flavour in 'self-contained','framework-dependent') { - $exe = Join-Path $OutputDir "$flavour\Mbproxy.exe" - if (Test-Path $exe) { - $size = (Get-Item $exe).Length - Write-Host (" {0,-22} {1,10} {2}" -f $flavour, (Format-Size $size), $exe) + $bin = Join-Path $OutputDir "$flavour\$exeName" + if (Test-Path $bin) { + $size = (Get-Item $bin).Length + Write-Host (" {0,-22} {1,10} {2}" -f $flavour, (Format-Size $size), $bin) } else { - Write-Warning "Missing: $exe" + Write-Warning "Missing: $bin" } } Write-Host "" diff --git a/mbproxy/install/publish.sh b/mbproxy/install/publish.sh new file mode 100644 index 
0000000..807ff68 --- /dev/null +++ b/mbproxy/install/publish.sh @@ -0,0 +1,82 @@ +#!/usr/bin/env bash +# +# publish.sh — Linux/macOS counterpart of publish.ps1. +# +# Publishes the Mbproxy binary in two flavours for the requested runtime under +# /publish-out/: +# +# self-contained/ ~100 MB — bundles the .NET 10 + ASP.NET Core runtime; +# no .NET install needed on the target. +# framework-dependent/ ~1.6 MB — requires the .NET 10 + ASP.NET Core runtime +# preinstalled on the target. +# +# Both builds use the Release configuration and inherit the publish settings in +# src/Mbproxy/Mbproxy.csproj (those settings are gated on an explicit RID, which +# is supplied here). The framework-dependent build overrides SelfContained=false. +# +# Usage: +# ./publish.sh [-r RID] [-o OUTPUT_DIR] [--clean] +# +# -r RID .NET runtime identifier (default: linux-x64) +# -o OUTPUT_DIR root output directory (default: /publish-out) +# --clean delete OUTPUT_DIR before publishing +# +# Examples: +# ./publish.sh +# ./publish.sh -r linux-x64 --clean +# +set -euo pipefail + +rid="linux-x64" +script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +repo_root="$(dirname "$script_dir")" +output_dir="$repo_root/publish-out" +clean=0 + +while [[ $# -gt 0 ]]; do + case "$1" in + -r) rid="$2"; shift 2 ;; + -o) output_dir="$2"; shift 2 ;; + --clean) clean=1; shift ;; + *) echo "Unknown argument: $1" >&2; exit 2 ;; + esac +done + +csproj="$repo_root/src/Mbproxy/Mbproxy.csproj" +if [[ ! -f "$csproj" ]]; then + echo "Cannot find $csproj" >&2 + exit 1 +fi + +if [[ "$clean" -eq 1 && -d "$output_dir" ]]; then + echo "Cleaning $output_dir" + rm -rf "$output_dir" +fi + +# Binary name: Windows RIDs produce an .exe, every other RID an extensionless binary. 
+if [[ "$rid" == win-* ]]; then bin_name="Mbproxy.exe"; else bin_name="Mbproxy"; fi + +self_contained_out="$output_dir/self-contained" +framework_dependent_out="$output_dir/framework-dependent" + +echo +echo "=== Publishing self-contained ($rid, ~100 MB) ===" +dotnet publish "$csproj" -c Release -r "$rid" -o "$self_contained_out" --nologo + +echo +echo "=== Publishing framework-dependent ($rid, ~1.6 MB) ===" +dotnet publish "$csproj" -c Release -r "$rid" \ + -p:SelfContained=false -p:PublishSingleFile=true -o "$framework_dependent_out" --nologo + +echo +echo "=== Result ($rid) ===" +for flavour in self-contained framework-dependent; do + bin="$output_dir/$flavour/$bin_name" + if [[ -f "$bin" ]]; then + size="$(du -h "$bin" | cut -f1)" + printf ' %-22s %8s %s\n' "$flavour" "$size" "$bin" + else + echo " WARNING: missing $bin" >&2 + fi +done +echo diff --git a/mbproxy/install/uninstall.ps1 b/mbproxy/install/uninstall.ps1 index 1ed5ce6..231f3d0 100644 --- a/mbproxy/install/uninstall.ps1 +++ b/mbproxy/install/uninstall.ps1 @@ -122,7 +122,10 @@ if (Test-Path $InstallPath) { if ([System.Diagnostics.EventLog]::SourceExists('mbproxy')) { Write-Host "Removing Windows Event Log source 'mbproxy'..." try { - Remove-EventLog -Source 'mbproxy' + # .NET API, not Remove-EventLog: the *-EventLog cmdlets exist only in + # Windows PowerShell 5.1, not PowerShell 7+. Symmetric with the + # SourceExists check above. + [System.Diagnostics.EventLog]::DeleteEventSource('mbproxy') } catch { Write-Warning "Could not remove Event Log source: $_" } diff --git a/mbproxy/install/uninstall.sh b/mbproxy/install/uninstall.sh new file mode 100644 index 0000000..4b1bb0f --- /dev/null +++ b/mbproxy/install/uninstall.sh @@ -0,0 +1,85 @@ +#!/usr/bin/env bash +# +# uninstall.sh — remove the mbproxy service from a Linux / systemd host. +# +# The Linux counterpart of uninstall.ps1. 
Stops and disables the service, +# removes the systemd unit and installed files, and (unless --keep-config) +# removes the config directory. Log files are always preserved: they are moved +# to a timestamped archive so post-uninstall diagnostics remain accessible. +# +# Usage: +# sudo ./uninstall.sh [--keep-config] [--keep-user] +# +# --keep-config leave /etc/mbproxy/appsettings.json in place. +# --keep-user leave the mbproxy service account in place. +# +set -euo pipefail + +SERVICE_NAME="mbproxy" +SERVICE_USER="mbproxy" +INSTALL_DIR="/opt/mbproxy" +CONFIG_DIR="/etc/mbproxy" +LOG_DIR="/var/log/mbproxy" +CACHE_DIR="/var/cache/mbproxy" +UNIT_DEST="/etc/systemd/system/${SERVICE_NAME}.service" + +keep_config=0 +keep_user=0 +while [[ $# -gt 0 ]]; do + case "$1" in + --keep-config) keep_config=1; shift ;; + --keep-user) keep_user=1; shift ;; + *) echo "Unknown argument: $1" >&2; exit 2 ;; + esac +done + +if [[ "$(id -u)" -ne 0 ]]; then + echo "uninstall.sh must run as root (use sudo)." >&2 + exit 1 +fi + +echo "Uninstalling ${SERVICE_NAME} service..." + +# ── 1. Stop + disable the service ──────────────────────────────────────────── +if systemctl list-unit-files "${SERVICE_NAME}.service" >/dev/null 2>&1 \ + && [[ -n "$(systemctl list-unit-files "${SERVICE_NAME}.service" --no-legend 2>/dev/null)" ]]; then + echo "Stopping and disabling '${SERVICE_NAME}'..." + systemctl disable --now "$SERVICE_NAME" >/dev/null 2>&1 || true +fi + +# ── 2. Remove the systemd unit ─────────────────────────────────────────────── +if [[ -f "$UNIT_DEST" ]]; then + echo "Removing systemd unit '${UNIT_DEST}'..." + rm -f "$UNIT_DEST" +fi +systemctl daemon-reload +systemctl reset-failed "$SERVICE_NAME" >/dev/null 2>&1 || true + +# ── 3. Archive logs (always preserved, never deleted) ──────────────────────── +if [[ -d "$LOG_DIR" ]]; then + timestamp="$(date -u +%Y%m%dT%H%M%SZ)" + archive_dir="${LOG_DIR}.archived-${timestamp}" + echo "Archiving logs to '${archive_dir}'..." 
+ mv "$LOG_DIR" "$archive_dir" +fi + +# ── 4. Remove installed files ──────────────────────────────────────────────── +rm -rf "$INSTALL_DIR" "$CACHE_DIR" + +if [[ "$keep_config" -eq 1 ]]; then + echo "Keeping config at '${CONFIG_DIR}/appsettings.json' (--keep-config)." +else + rm -rf "$CONFIG_DIR" +fi + +# ── 5. Remove the service account ──────────────────────────────────────────── +if [[ "$keep_user" -eq 0 ]] && id -u "$SERVICE_USER" >/dev/null 2>&1; then + echo "Removing service account '${SERVICE_USER}'..." + userdel "$SERVICE_USER" 2>/dev/null || true +fi + +echo "" +echo "Uninstall complete." +if compgen -G "${LOG_DIR}.archived-*" >/dev/null; then + echo "Archived logs: ${LOG_DIR}.archived-*" +fi diff --git a/mbproxy/plans/2026-05-15-multiplatform.md b/mbproxy/plans/2026-05-15-multiplatform.md new file mode 100644 index 0000000..fae2b88 --- /dev/null +++ b/mbproxy/plans/2026-05-15-multiplatform.md @@ -0,0 +1,576 @@ +# mbproxy Multiplatform Implementation Plan + +**Created:** 2026-05-15 +**Status:** All six phases implemented. 413 tests green on Windows; Windows Service and +Linux systemd install E2E both green. Two findings (pymodbus-sim-on-Linux, `AddSystemd()` +notify) logged as orthogonal follow-ups. Working tree only — nothing committed. +**Working artifact** — not part of the `docs/` source-of-truth tree (per `../DOCS-GUIDE.md`). +Delete or archive once the work lands. + +### Progress log + +- **2026-05-15 — Phase 1 done, Gate 1 green.** RID removed from `csproj` + (single-file settings now gated on `'$(RuntimeIdentifier)' != ''`); + `publish.ps1` gained `-Rid`; `publish.sh` added. `dotnet build -c Debug` 0 + warnings; `dotnet test` **398 passed / 0 failed** (baseline 325 → 398, the + Keepalive feature added tests); `win-x64` → `Mbproxy.exe` 100.1 MB, + `linux-x64` → `Mbproxy` ELF 97.2 MB. ELF launch-smoked on `10.100.0.35`: + full startup, listeners bound, `mbproxy.startup.ready` + admin endpoint up, + no errors. 
Box prep done (.NET SDK 10.0.300, shellcheck 0.10.0 installed). +- **2026-05-15 — Phases 2 + 3 code done (combined integrator pass).** Packages + added: `Microsoft.Extensions.Hosting.Systemd` 10.0.8, + `Serilog.Sinks.SyslogMessages` 4.1.0 (the maintained IonxSolutions package — + the bare `Serilog.Sinks.Syslog` ID is a near-abandoned 0.2.0 package; same + approved intent). New `DiagnosticSink` enum + `DiagnosticSinkSelector` (pure); + new `SyslogBridge`; `EventLogBridge` truncation extracted to a non-annotated + `EventLogMessage` type (testable cross-OS). `AddMbproxySerilog` now selects + the sink internally; `Program.cs` calls `AddSystemd()` + `AddWindowsService()`. + 13 new tests. **411 passed / 0 failed on Windows**; on `10.100.0.35` + **372 passed / 39 skipped / 0 failed** — all 39 skips are simulator-backed + E2E (see finding below), every host/diagnostic/smoke test green on Linux. + +- **2026-05-15 — Two cross-platform bugs found and fixed in install tooling.** + (1) `tests/sim/run-dl205-sim.ps1` was Windows-only — hardcoded venv paths + `Scripts\*.exe`; now branches `Scripts`/`.exe` vs `bin`/`` on `$IsWindows` + and adds `python3` to the interpreter candidates. (2) `install.ps1` / + `uninstall.ps1` used `New-EventLog` / `Remove-EventLog`, which exist only in + Windows PowerShell 5.1 — they fail under PowerShell 7+. Switched to the .NET + API (`[EventLog]::CreateEventSource` / `DeleteEventSource`), symmetric with + the `SourceExists` calls already in those scripts. +- **2026-05-15 — Windows Service E2E green (local, admin).** Republished + `win-x64`; `install.ps1 -Start` installs + starts the service; verified + Running/Automatic, `status.json` served, listeners bound, + `mbproxy.startup.ready` logged, Event Log source registered, + `WindowsServiceLifetime` wrote "Service started successfully" (proves the + process runs under the SCM). `uninstall.ps1` stopped/deleted the service, + archived logs, removed the Event Log source. Box left clean. 
(A forced + `EventLogBridge` Error+ write was not pursued — `Emit` is unchanged code, + covered by `EventLogMessageTests`; sink selection is covered by + `DiagnosticSinkSelectorTests`.) +- **2026-05-15 — Linux systemd E2E done.** The `linux-x64` ELF runs under a + real systemd unit on `10.100.0.35`: starts, binds listeners, serves the + admin endpoint, and `systemctl stop` → graceful SIGTERM drain + (`mbproxy.shutdown.complete` in the journal). `Type=notify` does not work + (see Findings) → Phase 5 will ship `Type=exec`. Box prep this session: + `dotnet-sdk-10.0`, `shellcheck`, `python3-venv`, pwsh 7.6.1 (dotnet global + tool), pymodbus 3.13.0 venv. + +- **2026-05-15 — Phases 4–6 done.** Phase 4: new `install/mbproxy.linux.config.template.json` + (Unix log path `/var/log/mbproxy`, systemd-oriented comments); `csproj` links the + platform-correct template into the published `appsettings.json` by RID + (`win-*`/RID-less → Windows, else Unix) — verified by publishing both RIDs; + `MbproxyOptionsBindingTests` extended to load + schema-validate both templates + (now 413 tests on Windows). Phase 5: `install/mbproxy.service` (`Type=exec`, + hardened, `mbproxy` service account), `install/install.sh`, `install/uninstall.sh` + — `shellcheck` clean; install→active→`status.json` served→uninstall→clean E2E + passed on `10.100.0.35`. Phase 6: `README.md`, `mbproxy/CLAUDE.md`, + `../CLAUDE.md`, `docs/Operations/Configuration.md`, `docs/Reference/LogEvents.md`, + `docs/Operations/Troubleshooting.md`, `docs/Architecture/Overview.md`, + `docs/Features/HotReload.md` updated for the dual-platform reality. + +### Findings + +- **Linux full run: 374 passed / 37 failed / 0 skipped.** With the simulator + launcher fixed and pymodbus provisioned, the simulator-backed E2E tests now + *run* on Linux (0 skipped) but **37 fail** with `IOException: Broken pipe` + (`SocketException`) when the NModbus client writes through the proxy. 
The + failures are broad across all simulator-backed E2E (cache, forwarding, + rewriter, supervision). **Not a Phases 1–3 regression:** the multiplatform + work touches only build config, diagnostic sinks, and host registration — + none of the Modbus proxy data path. The same 37 tests pass on Windows + (411/411), and every non-E2E test — including all 13 new diagnostic tests — + passes on Linux. **Root cause isolated:** the `SimulatorSmokeTests` — which + connect *directly to the pymodbus simulator with no proxy in the path* — also + fail (TCP connect error). So the fault is the pymodbus 3.13.0 simulator + itself on this box, not mbproxy's proxy code. Likely pymodbus 3.13.0 vs + Python 3.13.5 (both very new), or the box's Docker-host networking. Treated + as a **separate investigation** (pymodbus-simulator-on-Linux), entirely + orthogonal to the multiplatform service work — see the session report. +- The `run-dl205-sim.ps1` idempotency check keys on `Test-Path $venvDir` only; + a venv left structurally broken by a killed run (no `bin/`) is not detected + and re-created. Pre-existing latent gap, not platform-specific — noted, not + fixed (out of scope; a clean run is unaffected). +- **`AddSystemd()` does not deliver `sd_notify(READY=1)` here → Phase 5 uses + `Type=exec`.** mbproxy runs correctly under systemd (starts, binds, serves, + and SIGTERM → graceful drain all work — verified in the journal), but a + `Type=notify` unit never receives `READY=1` and times out. Isolated step by + step: `SystemdHelpers.IsSystemdService()` correctly returns `True` under + systemd; a *minimal* `Host.CreateApplicationBuilder()` + `AddSystemd()` host + reproduces the failure; both a `systemd-run` transient unit and a real + `Type=notify` unit file fail identically. So it is **not an mbproxy bug** — + it is a `HostApplicationBuilder` + `Microsoft.Extensions.Hosting.Systemd` + 10.0.8 (minimal-hosting) issue. 
**Resolution:** the Phase 5 unit uses + `Type=exec` — mbproxy is a leaf service that nothing orders against, so the + readiness signal is unnecessary; `Type=exec` + the generic host's built-in + POSIX `SIGTERM` handling (independent of `SystemdLifetime`) gives a fully + working unit with `Restart=on-failure`. `AddSystemd()` stays in `Program.cs` + (correct, documented, forward-compatible, harmless). Root-causing the .NET + notify gap is logged as a separate follow-up. + +A plan to make mbproxy run on Linux (and incidentally macOS) as a first-class +target while keeping the Windows Service + Event Log behavior intact and adding +systemd + journald/syslog equivalents. + +The hosting model (`Host.CreateApplicationBuilder` + `IHostedService` + Kestrel) +is already portable, so the work is narrow: generalize the build, abstract one +diagnostic sink, add one package + one call, and add Linux tooling/docs. + +--- + +## 0. Test Environments + +Both platforms can be exercised fully — no environment is simulated or +deferred. + +### 0.1 Windows (the dev box — local) + +The dev box runs **with administrator rights**, so every Windows gate runs +locally with no separate test machine: + +- `install.ps1` (requires elevation) installs the real Windows Service. +- The Event Log source `mbproxy` can be registered and `EventLogBridge` writes + verified against the Application log. +- Install → start → stop → uninstall is a full local round-trip. + +> Windows Service E2E mutates machine state (a registered service + Event Log +> source). It is **integrator-only** and the integrator always runs +> `uninstall.ps1` to leave the box clean after each gate. + +### 0.2 Linux + +**Host:** `dohertj2@10.100.0.35` — Debian 13 (trixie), amd64, kernel 6.12, +hostname `DOCKER`. systemd 257. + +- **Access:** passwordless SSH from the Windows dev box; passwordless `sudo` + (verified 2026-05-15). +- **Reachable** on `10.100.0.35` (also `10.50.0.35`, `10.200.0.35`). 
+- **One-time prep** (run once before Wave 1 gates): + ``` + ssh dohertj2@10.100.0.35 'sudo apt-get update && \ + sudo apt-get install -y dotnet-sdk-10.0 shellcheck' + ``` + `dotnet-sdk-10.0` candidate is `10.0.203` — matches the `net10.0` target. +- **Docker is installed** on the box (the user is in the `docker` group). Use + ephemeral Debian containers to isolate per-subagent E2E runs so parallel + Wave-4 agents don't collide on the host's systemd / ports (see section 3, + rule 8). + +**How the integrator uses the box per gate:** +- Push the integration branch (or `rsync` the worktree) to the box, then run + `dotnet build` / `dotnet test` / `dotnet publish -r linux-x64` over SSH. +- Run the *actual* `linux-x64` ELF binary, the systemd unit, and `shellcheck` + here — Windows can cross-*publish* a `linux-x64` binary but cannot *run* or + service-host it. + +> The box is a **shared mutable resource**. Host-level mutations (apt installs, +> `systemctl` on the real host, privileged-port binds) are integrator-only and +> run serially between waves. Subagents that need Linux E2E use throwaway +> Docker containers, never the host's init system directly. + +--- + +## 1. Scope + +**In scope** +- Linux (`linux-x64`) as a supported runtime target alongside `win-x64`. +- systemd integration (`Type=notify`, sd_notify readiness, SIGTERM drain). +- A Linux-appropriate error-event diagnostic sink (syslog, severity-mapped). +- RID-agnostic build + dual-RID publish tooling. +- Linux install tooling (systemd unit + shell scripts). +- Docs/README/CLAUDE.md updates. + +**Out of scope (state explicitly in docs)** +- macOS `launchd` integration — mbproxy will *run* on macOS as a console + process but ships no service-manager integration. +- ARM RIDs (`linux-arm64`) — the build will not *forbid* them, but they are + untested. +- Container/Docker packaging — separate future effort. 
+ +**Locked design decisions** +- Reference `Microsoft.Extensions.Hosting.WindowsServices` *and* + `Microsoft.Extensions.Hosting.Systemd` unconditionally; both packages are + portable and both helpers self-detect their host. No conditional + ``. +- All Windows API calls (`System.Diagnostics.EventLog`) stay behind + `OperatingSystem.IsWindows()` + `[SupportedOSPlatform("windows")]`; CA1416 + (already enforced via `TreatWarningsAsErrors`) is the safety net. +- Diagnostic sink selection happens **once**, at the composition root + (`AddMbproxySerilog`). No OS branching anywhere else. +- Prefer **new files** over editing shared files, to keep parallel work + conflict-free. +- **Linux error-event sink: `Serilog.Sinks.Syslog`** (decided 2026-05-15). + Error+ events get RFC5424 severity mapping on Linux, mirroring the Windows + Event Log behavior where Error+ is surfaced distinctly. + `DiagnosticSinkSelector` returns `EventLog | Syslog | None`. + +--- + +## 2. Phase Breakdown + +Each phase lists its **owned file set** (the parallel-safety contract), +changes, tests, and a **gate** that must be green before the next phase starts. + +### Phase 1 — Build & publish generalization (foundation) + +**Objective:** Remove the hardcoded RID so the project builds/publishes for any +runtime; keep the Windows output byte-identical. + +**Owned files** +- `src/Mbproxy/Mbproxy.csproj` +- `install/publish.ps1` +- `install/publish.sh` *(new)* + +**Changes** +- `Mbproxy.csproj`: delete `win-x64` + from the Release `PropertyGroup`; keep `PublishSingleFile` / `SelfContained` + / `IncludeNativeLibrariesForSelfExtract`. RID becomes a publish-time `-r` + argument. +- `publish.ps1`: add a `-Rid` parameter (default `win-x64`), keep the + two-flavor logic. +- `publish.sh`: Linux counterpart producing `linux-x64` self-contained + + framework-dependent builds. +- (The RID-conditioned `appsettings.json` content item is Phase 4; in Phase 1 + just confirm the build works without a baked RID.) 
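The per-RID, two-flavour publish described in the changes above can be sketched as a dry run. This is a minimal sketch under stated assumptions, not the shipped `publish.sh`: the helper names `bin_name_for_rid` and `publish_commands` are hypothetical, the per-RID output subdirectory is an assumption added so both RIDs can sit side by side, and the commands are only printed, never executed. The `dotnet publish` flags mirror the ones used in `install/publish.sh`.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the publish commands for each RID without running them.
# bin_name_for_rid and publish_commands are hypothetical helper names.
set -eu

# Windows RIDs produce an .exe; every other RID an extensionless binary.
bin_name_for_rid() {
  case "$1" in
    win-*) echo "Mbproxy.exe" ;;
    *)     echo "Mbproxy" ;;
  esac
}

# Emit the self-contained and framework-dependent publish commands for one RID.
# $1 = RID, $2 = path to the csproj, $3 = root output directory.
# Assumption: outputs are grouped per RID under the root output directory.
publish_commands() {
  echo "dotnet publish $2 -c Release -r $1 -o $3/$1/self-contained --nologo"
  echo "dotnet publish $2 -c Release -r $1 -p:SelfContained=false -p:PublishSingleFile=true -o $3/$1/framework-dependent --nologo"
}

for rid in win-x64 linux-x64; do
  echo "# $rid -> $(bin_name_for_rid "$rid")"
  publish_commands "$rid" src/Mbproxy/Mbproxy.csproj publish-out
done
```

Printing the commands first makes it easy to confirm that the RID is supplied only at publish time, which is the point of removing it from the project file.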
+ +**Tests** +- No xunit tests (build-config change). Gate is publish success on both RIDs. + +**Gate 1** +- `dotnet build -c Debug` green; `dotnet test` full suite green (unchanged + count). +- `dotnet publish -c Release -r win-x64` produces a single-file `Mbproxy.exe` + (same size class as before). +- `dotnet publish -c Release -r linux-x64` produces a single-file `Mbproxy` + ELF binary. Cross-published from the Windows dev box; the ELF is then copied + to `10.100.0.35` and confirmed to launch (`./Mbproxy --version`-class smoke). +- Zero new analyzer warnings. + +--- + +### Phase 2 — Diagnostic sink abstraction + +**Objective:** Make error-event delivery a platform-selected sink. Windows +keeps `EventLogBridge`; Linux gets a syslog sink. + +**Owned files** +- `src/Mbproxy/Diagnostics/DiagnosticSinkSelector.cs` *(new — pure selection + logic)* +- `src/Mbproxy/Diagnostics/SyslogBridge.cs` *(new)* +- `src/Mbproxy/Diagnostics/EventLogBridge.cs` *(minor: extract the 32 KB + truncation helper into a testable static method)* +- `src/Mbproxy/HostingExtensions.cs` *(only `AddMbproxySerilog`)* +- `src/Mbproxy/Mbproxy.csproj` *(add `Serilog.Sinks.Syslog` package)* +- New test files (see below) + +> `HostingExtensions.cs` and `Mbproxy.csproj` are also touched by Phase 3. +> **Phases 2 and 3 must not run in parallel** (see section 3). They are +> sequential. + +**Changes** +- `DiagnosticSinkSelector` — a pure function taking + `(bool isWindows, bool isWindowsService, bool isSystemd)` and returning an + enum (`EventLog | Syslog | None`). No I/O, fully unit-testable. +- `SyslogBridge`: Serilog `ILogEventSink` wrapping `Serilog.Sinks.Syslog`, + active for Error+ only, mirroring `EventLogBridge`'s contract (silent no-op + if syslog unavailable). +- `AddMbproxySerilog`: replace the `addEventLogBridge` bool parameter with a + `DiagnosticSinkSelector` result; wire the chosen sink. Keep the + `OperatingSystem.IsWindows()` guard around `EventLogBridge`. 
+- Extract `EventLogBridge`'s message-truncation into + `internal static string TruncateToEventLogLimit(string)` so it can be tested + OS-independently. + +**Tests** (`tests/Mbproxy.Tests/Diagnostics/`) +- `DiagnosticSinkSelectorTests` — table-driven: Windows+service→`EventLog`; + Windows console→`None`; Linux+systemd→`Syslog`; Linux console→`None`; + macOS→`None`. +- `EventLogBridgeTests` — `[Trait("Category","Unit")]`, Windows-guarded facts: + source-missing → silent no-op; truncation helper caps at 32 KB and appends + `...` (this fact runs on all OSes since the helper is pure). +- `SyslogBridgeTests` — Error+ filter; no-throw when transport unavailable. + +**Gate 2** +- Full test suite green on Windows (local); full suite green on Linux — + integrator runs `dotnet test` over SSH on `10.100.0.35`. +- `EventLogBridge` emits to the Application log — verified locally via a real + Windows Service install (`install.ps1`, admin rights available), then + `uninstall.ps1` to clean up. +- CA1416: zero warnings. + +--- + +### Phase 3 — Service host integration (systemd) + +**Objective:** Register both init-system integrations; the host correctly +reports readiness to whichever launched it. + +**Owned files** +- `src/Mbproxy/Program.cs` +- `src/Mbproxy/HostingExtensions.cs` *(call-site update only)* +- `src/Mbproxy/Mbproxy.csproj` *(add `Microsoft.Extensions.Hosting.Systemd`)* + +**Changes** +- `csproj`: add + `` (pin to + the 10.0.x line matching the existing Windows-services package). +- `Program.cs`: call `builder.Services.AddSystemd();` alongside + `AddWindowsService();`. Compute `isSystemd` via + `SystemdHelpers.IsSystemdService()` and feed `DiagnosticSinkSelector` + together with `isWindowsService`. +- Confirm SIGTERM → host shutdown → existing + `Connection.GracefulShutdownTimeoutMs` drain path works (it does — POSIX + signal handling is built into the generic host; just verify). 
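The SIGTERM drain check in the last bullet above can be scripted against the journal. This is a minimal sketch, assuming the unit is named `mbproxy` (as in `install/uninstall.sh`) and that a clean drain logs `mbproxy.shutdown.complete`, as the Linux systemd E2E entry in the progress log reports. `drain_completed` is a hypothetical helper name, and the canned lines in the self-check are illustrative text, not real mbproxy output.

```shell
#!/usr/bin/env bash
# Sketch: decide from journal text whether a stop drained gracefully.
# drain_completed is a hypothetical helper; it reads log lines on stdin.
set -eu

drain_completed() {
  # -q: quiet, -F: match the event name as a fixed string, not a regex.
  grep -qF 'mbproxy.shutdown.complete'
}

# Self-check on canned lines (illustrative text, not real mbproxy output).
if printf '%s\n' 'stopping' 'mbproxy.shutdown.complete' | drain_completed; then
  echo "drain marker found"
fi
```

On a systemd host the same helper could be fed from the journal, for example `sudo systemctl stop mbproxy` followed by `journalctl -u mbproxy --no-pager -n 50 | drain_completed`; a non-zero exit status would then suggest the drain path did not complete.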
+ +**Tests** (`tests/Mbproxy.Tests/HostSmokeTests.cs` — extend existing file) +- `HostSmoke_RegistersBothServiceIntegrations_StartsAndStops` — builds the host + with both `AddWindowsService` + `AddSystemd`, asserts no throw, asserts + `mbproxy.startup.ready` still logged. +- Existing two smoke tests must remain green. + +**Gate 3** +- Full suite green on Windows (local) and Linux (`10.100.0.35` via SSH). +- Windows Service E2E, run locally with admin rights: `install.ps1` → service + starts, logs `mbproxy.startup.ready` + writes to Event Log, `Stop-Service` + drains cleanly, `uninstall.ps1` removes it. **No regression** in Windows + behavior is the hard requirement of this gate. +- Linux systemd E2E on `10.100.0.35` — **done.** The `linux-x64` binary runs + under a real systemd unit: it starts, binds listeners, serves the admin + endpoint, and `systemctl stop` (SIGTERM) drains gracefully + (`mbproxy.shutdown.complete` in the journal). `Type=notify` was found not to + deliver `READY=1` (Findings) → the Phase 5 unit uses `Type=exec`, under which + the service is fully functional. + +--- + +### Phase 4 — Config & filesystem portability + +**Objective:** No Windows-only paths in the shipped/installed config. + +**Owned files** +- `install/mbproxy.config.template.json` *(Windows — keep `C:\ProgramData\...` + path)* +- `install/mbproxy.linux.config.template.json` *(new — `/var/log/mbproxy/...`, + Linux syslog `Using` entry)* +- `src/Mbproxy/Mbproxy.csproj` *(condition the linked `appsettings.json` + content item by `$(RuntimeIdentifier)`)* + +> Touches `csproj`. Must run after Phase 3's csproj edit is merged (sequential +> w.r.t. csproj), but is otherwise independent of Phase 5/6. + +**Changes** +- New Linux template: log path `/var/log/mbproxy/mbproxy-.log`; Serilog + `Using` array includes the syslog sink; comment header points at + `/etc/mbproxy/appsettings.json`. 
+- `csproj`: link the win template for `win-*` RIDs and the linux template for + `linux-*` RIDs into the published `appsettings.json` (RID-conditioned + `` items). + +**Tests** (`tests/Mbproxy.Tests/Options/`) +- Extend `MbproxyOptionsBindingTests`: load **each** shipped template through + the config binder + `MbproxyOptionsValidator`; assert both bind and validate + cleanly. Catches a malformed Linux template at build time. + +**Gate 4** +- Both templates bind + validate (new test green). +- `dotnet publish -r linux-x64` ships the Linux template as `appsettings.json`; + `-r win-x64` ships the Windows one. Verify by inspecting publish output. + +--- + +### Phase 5 — Linux install tooling + +**Objective:** Parity with `install.ps1` for systemd hosts. + +**Owned files** (all new, fully disjoint from all other phases) +- `install/mbproxy.service` — systemd unit, **`Type=exec`** (not `Type=notify` — + see Findings: `AddSystemd()` does not deliver `READY=1` for the minimal + hosting model), `Restart=on-failure`, `User=mbproxy`, `ExecStart` pointing at + the installed binary; sets `DOTNET_BUNDLE_EXTRACT_BASE_DIR`. +- `install/install.sh` — creates `mbproxy` service account, lays down binary + + `/etc/mbproxy/appsettings.json` (preserve-if-exists, matching `install.ps1` + semantics), creates `/var/log/mbproxy`, installs + `systemctl enable --now`. +- `install/uninstall.sh` — `systemctl disable --now`, archives logs (mirror the + `.archived-` convention), removes unit. + +**Tests** +- Not xunit. Gate = `shellcheck` clean + a dry-run inside a throwaway Debian + container on `10.100.0.35`. + +**Gate 5** +- `shellcheck install/*.sh` clean — run on `10.100.0.35` (shellcheck installed + in the one-time prep). +- End-to-end on `10.100.0.35`, inside a throwaway Debian container: + `install.sh` → service active → proxy answers Modbus on a configured port → + `uninstall.sh` → service gone, logs archived. 
Container isolation keeps the + `mbproxy` service account / unit off the real host. + +--- + +### Phase 6 — Documentation + +**Objective:** Docs reflect dual-platform reality; doctrine in `DOCS-GUIDE.md` +respected. + +**Owned files** +- `README.md` — rewrite "Hard constraints / prerequisites" (drop "No Linux or + Docker support"); add Linux install path; document both publish flavors × + both RIDs. +- `docs/Operations/Configuration.md` — both config templates, log-path + differences, syslog vs Event Log. +- `docs/Operations/Troubleshooting.md` — `journalctl` guidance alongside Event + Viewer. +- `docs/Architecture/Overview.md` — note dual init-system hosting (only if it + shifts a headline bullet). +- `docs/Reference/LogEvents.md` — note Error+ events route to Event Log + (Windows) / syslog (Linux). +- `mbproxy/CLAUDE.md` — correct the implied Windows-only framing. +- `wwtools/CLAUDE.md` — broaden the mbproxy index row if the task→tool mapping + changed. + +**Tests** +- Markdown link-check across touched files. + +**Gate 6** +- All internal doc links resolve. +- README "Hard constraints" no longer contradicts the shipped tooling. + +--- + +## 3. Parallel Subagent Execution Plan + +### Dependency graph + +``` +Phase 1 (build) ──> Phase 2 (diagnostics) ──> Phase 3 (host) ──┬─> Phase 4 (config) + ├─> Phase 5 (install) + └─> Phase 6 (docs) +``` + +Phases 2 and 3 are **strictly sequential**: Phase 3 calls the new +`AddMbproxySerilog` signature Phase 2 defines, and both edit +`HostingExtensions.cs` + `csproj`. Phases 4, 5, 6 are **mutually independent** +and parallelizable once Phase 3 is merged. 
+ +### Wave plan + +| Wave | Phases | Agents | Mode | +| ---- | --------- | ------------------- | ----------------------------------------------- | +| W1 | Phase 1 | 1 agent | Single — touches `csproj` | +| W2 | Phase 2 | 1 agent | Single — touches `csproj` + `HostingExtensions` | +| W3 | Phase 3 | 1 agent | Single — touches `csproj` + `HostingExtensions` + `Program.cs` | +| W4 | 4, 5, 6 | 3 agents (parallel) | Parallel — disjoint file sets | + +> Phase 4 touches `csproj` but no other W4 phase does, so within W4 the file +> sets are still disjoint. Safe. + +### File-ownership matrix (the parallel-safety contract) + +| File | P1 | P2 | P3 | P4 | P5 | P6 | +| --------------------------------------------- | -- | -- | -- | -- | -- | -- | +| `Mbproxy.csproj` | x | x | x | x | | | +| `HostingExtensions.cs` | | x | x | | | | +| `Program.cs` | | | x | | | | +| `Diagnostics/*` (new + EventLogBridge) | | x | | | | | +| `install/publish.*` | x | | | | | | +| `install/*.config.template.json` | | | | x | | | +| `install/install.sh`, `uninstall.sh`, `.service` | | | | | x | | +| `tests/**` | | x | x | x | | | +| docs / READMEs / CLAUDE.md | | | | | | x | + +No column in W4 (P4/P5/P6) shares a row. Confirmed conflict-free. + +### Subagent rules (enforce in every dispatch prompt) + +1. **One git worktree per subagent** — dispatch each `Agent` call with + `isolation: "worktree"`. Physical isolation means even a stray edit can't + corrupt a sibling's tree. +2. **Owned-file contract** — each subagent is told its exact owned file set + from the matrix and instructed to edit nothing outside it. A subagent that + discovers it needs an out-of-set file must stop and report, not edit. +3. **No intra-wave API coupling** — subagents in the same wave may only depend + on public APIs from *already-merged* prior waves, never on a sibling's + in-progress work. (This is why P2→P3 are separate waves, not parallel.) +4. 
**Tests ship with code** — the subagent that writes a phase's code also + writes that phase's tests and runs `dotnet test` green *in its own + worktree* before reporting done. No separate "test agent." +5. **Integrator merges in declared order** — the main agent merges each + worktree, runs the full build + test suite, and only then declares the + phase gate met. A failed gate blocks the next wave. +6. **High-contention files are single-agent-only** — `csproj`, + `HostingExtensions.cs`, `Program.cs`, `CLAUDE.md` are never edited by two + agents in the same wave (the matrix guarantees this). +7. **Prefer new files** — `DiagnosticSinkSelector.cs`, `SyslogBridge.cs`, + `mbproxy.linux.config.template.json`, the shell scripts, the unit file are + all new — new files can't merge-conflict, maximizing safe parallelism. +8. **Shared test hosts are integrator-only for mutations** — subagents may run + `dotnet build` / `dotnet test` (read-mostly) but must **not** install a + Windows Service, register an Event Log source, or `systemctl` against the + real `10.100.0.35` host. Service-level E2E is the integrator's job at gate + time; if a subagent needs Linux E2E it spins an ephemeral Docker container + on the box (named per-agent, `--rm`) so parallel agents never collide on + ports, the init system, or service accounts. + +### Merge protocol per wave + +``` +for each wave: + dispatch agent(s) with isolation: worktree + owned-file list + on completion: + integrator: merge worktree(s) in matrix order + integrator: dotnet build -c Debug (must be green) + integrator: dotnet test (green, count >= prior) + integrator: dotnet publish -r win-x64 AND -r linux-x64 (must succeed) + integrator: verify phase-specific gate checklist + gate green? -> next wave. gate red? -> fix in a single-agent pass, re-gate. +``` + +--- + +## 4. Cross-Cutting Test Strategy + +- **Existing baseline (325 = 282 unit + 43 E2E) must never regress.** Every + gate re-runs the full suite. 
+- **New tests target pure logic** — `DiagnosticSinkSelector` is a pure function + precisely so platform-selection is testable without being a service. Highest- + value new test. +- **OS-conditional tests** use `[Trait]` + a runtime `OperatingSystem.IsWindows()` + skip so the suite is green on both Windows and Linux. +- **Both platforms are exercised every gate, no simulation.** Windows runs + locally (admin rights → real Windows Service install). Linux runs on + `dohertj2@10.100.0.35` (Debian 13, systemd 257) — the integrator drives + `dotnet build` / `dotnet test` / publish / systemd E2E over SSH. +- **CI** (if/when a pipeline exists): add a `linux-x64` build+test leg, ideally + pointed at the same box or an equivalent image. Until then the integrator's + per-gate SSH run on `10.100.0.35` is the Linux leg. +- **CA1416 platform analyzer** is treated as a test — `TreatWarningsAsErrors` + already fails the build if a Windows API escapes its guard. + +--- + +## 5. Risk Register + +| Risk | Phase | Mitigation | +| --------------------------------------------- | ----- | -------------------------------------------------------------------------- | +| Windows Service behavior regresses unnoticed | P3 | Gate 3 mandates a real Windows Service install/start/stop smoke check | +| `Serilog.Sinks.Syslog` version drift | P2 | Pin the version; `SyslogBridge` is isolated behind `DiagnosticSinkSelector` | +| Linux publish ships Windows config path | P4 | RID-conditioned `` item + `MbproxyOptionsBindingTests` on both templates | +| Self-extracting single-file temp-dir perms | P1/P5 | Document + set `DOTNET_BUNDLE_EXTRACT_BASE_DIR` in the systemd unit | +| Two agents racing `csproj` | all | Matrix forbids it — `csproj` edited only in single-agent waves W1–W3 + lone P4 | +| Hidden Windows path elsewhere in code | all | `Grep` sweep for `C:\\`, `ProgramData`, `\\\\` before Gate 6 | +| Parallel Wave-4 agents collide on the shared `10.100.0.35` host | W4 | Rule 8 — service-level E2E is 
integrator-only and serial; subagent E2E uses per-agent `--rm` Docker containers | +| Windows Service E2E leaves stale service/Event Log source | P2/P3 | Integrator always runs `uninstall.ps1` after each Windows gate | + +--- + +## 6. Deliverable Summary + +- **3 modified source files** (`csproj`, `HostingExtensions.cs`, `Program.cs`) + + **3 new** (`DiagnosticSinkSelector.cs`, `SyslogBridge.cs`, and the + truncation-helper extraction in `EventLogBridge.cs`). +- **2 new packages** (`Microsoft.Extensions.Hosting.Systemd`, + `Serilog.Sinks.Syslog`). +- **5 new install/tooling files** (`publish.sh`, Linux config template, + `mbproxy.service`, `install.sh`, `uninstall.sh`). +- **~6–8 new tests** across 3 new/extended test files; baseline 325 preserved. +- **7 doc files** updated. +- **4 waves**, max 3 concurrent subagents, conflict-free by construction. diff --git a/mbproxy/src/Mbproxy/Diagnostics/DiagnosticSinkSelector.cs b/mbproxy/src/Mbproxy/Diagnostics/DiagnosticSinkSelector.cs new file mode 100644 index 0000000..28869c7 --- /dev/null +++ b/mbproxy/src/Mbproxy/Diagnostics/DiagnosticSinkSelector.cs @@ -0,0 +1,60 @@ +namespace Mbproxy.Diagnostics; + +/// +/// The platform diagnostic sink to wire for Error+ events — picked once, +/// at the composition root, by . +/// +internal enum DiagnosticSink +{ + /// + /// No platform diagnostic sink — console (and rolling-file) sinks only. Used + /// for interactive / dev runs on every OS. + /// + None, + + /// + /// Windows Application Event Log, via . Selected + /// only when the process is hosted as a Windows Service. + /// + EventLog, + + /// + /// Local syslog, via . Selected only when the + /// process is hosted as a systemd service on Linux. + /// + Syslog, +} + +/// +/// Pure platform-selection logic for the Error+ diagnostic sink. Holds no +/// I/O and no host APIs so it is unit-testable for every OS / host combination; +/// the host detection itself happens in .
+/// +internal static class DiagnosticSinkSelector +{ + /// + /// Picks the diagnostic sink for the current host: + /// + /// Windows hosted as a Windows Service → . + /// Linux hosted as a systemd service → . + /// Everything else — interactive / dev runs, macOS, launches not owned + /// by an init system → . + /// + /// The managed-service gate mirrors the original + /// contract: a diagnostic sink is wired only when an init system actually owns + /// the process, so dev / console runs never need an Event Log source registered + /// or a syslog socket reachable. + /// + /// Running on Windows. + /// Hosted by the Windows Service Control Manager. + /// Hosted by systemd. + public static DiagnosticSink Select(bool isWindows, bool isWindowsService, bool isSystemd) + { + // Windows takes precedence: isSystemd is meaningless there, and on + // non-Windows isWindowsService is always false. + if (isWindows) + return isWindowsService ? DiagnosticSink.EventLog : DiagnosticSink.None; + + return isSystemd ? DiagnosticSink.Syslog : DiagnosticSink.None; + } +} diff --git a/mbproxy/src/Mbproxy/Diagnostics/EventLogBridge.cs b/mbproxy/src/Mbproxy/Diagnostics/EventLogBridge.cs index 86f7075..28d7b16 100644 --- a/mbproxy/src/Mbproxy/Diagnostics/EventLogBridge.cs +++ b/mbproxy/src/Mbproxy/Diagnostics/EventLogBridge.cs @@ -5,6 +5,32 @@ using Serilog.Events; namespace Mbproxy.Diagnostics; +/// +/// Pure message-shaping helpers for the Windows Event Log. Kept on a separate, +/// non-platform-annotated type — not on , +/// which is [SupportedOSPlatform("windows")] — so the truncation logic is +/// unit-testable on any OS without tripping the platform-compatibility analyzer. +/// +internal static class EventLogMessage +{ + /// The Windows Event Log single-entry limit, in bytes (32 KB). + public const int MaxBytes = 32 * 1024; + + /// + /// Truncates so its UTF-16 byte length stays within + /// , appending an ellipsis when truncation occurs. Shorter + /// messages are returned unchanged. 
+ /// + public static string TruncateToLimit(string message) + { + // Rough UTF-16 upper bound: 2 bytes per char. + if (message.Length * 2 <= MaxBytes) return message; + + int charLimit = MaxBytes / 2 - 3; // leave room for the "..." suffix + return message[..charLimit] + "..."; + } +} + /// /// Serilog sink that writes events at level Error and above to the Windows Event Log /// under source mbproxy. @@ -26,7 +52,6 @@ internal sealed class EventLogBridge : ILogEventSink { private const string Source = "mbproxy"; private const string LogName = "Application"; - private const int MaxMessageBytes = 32 * 1024; // 32 KB Event Log limit private readonly bool _enabled; // Cache the source-exists check at construction so Emit doesn't hit the registry on @@ -63,11 +88,7 @@ internal sealed class EventLogBridge : ILogEventSink } // Truncate to the Event Log single-entry limit. - if (message.Length * 2 > MaxMessageBytes) // rough UTF-16 upper bound - { - int charLimit = MaxMessageBytes / 2 - 3; - message = message[..charLimit] + "..."; - } + message = EventLogMessage.TruncateToLimit(message); var type = logEvent.Level switch { diff --git a/mbproxy/src/Mbproxy/Diagnostics/SyslogBridge.cs b/mbproxy/src/Mbproxy/Diagnostics/SyslogBridge.cs new file mode 100644 index 0000000..ee4dac5 --- /dev/null +++ b/mbproxy/src/Mbproxy/Diagnostics/SyslogBridge.cs @@ -0,0 +1,50 @@ +using Serilog; +using Serilog.Debugging; +using Serilog.Events; + +namespace Mbproxy.Diagnostics; + +/// +/// Wires the local-syslog sink for Error+ events when mbproxy runs as a +/// systemd service on Linux — the cross-platform counterpart of +/// . +/// +/// Events at and above are written to the +/// local syslog socket (/dev/log) under the application name +/// , with Serilog levels mapped to syslog severities by the +/// sink. 
On a systemd host the local syslog socket is provided by +/// systemd-journald, so these events land in the journal at +/// err/crit priority — distinct from the process's stdout, which +/// journald captures at info. +/// +/// If the local syslog socket is unavailable the bridge degrades silently +/// to the console (and rolling-file) sinks rather than failing logger +/// construction, mirroring 's no-op-when-unavailable +/// contract. +/// +internal static class SyslogBridge +{ + /// syslog application name — the TAG field of each entry. + internal const string AppName = "mbproxy"; + + /// + /// Attaches the Error+ local-syslog sink to and + /// returns it for fluent chaining. Never throws: a host where the syslog sink + /// cannot be configured degrades to unchanged. + /// + public static LoggerConfiguration AttachTo(LoggerConfiguration cfg) + { + try + { + return cfg.WriteTo.LocalSyslog( + appName: AppName, + restrictedToMinimumLevel: LogEventLevel.Error); + } + catch (Exception ex) + { + // Degrade to console-only rather than crash logger construction. + SelfLog.WriteLine("SyslogBridge: local syslog unavailable, console-only: {0}", ex); + return cfg; + } + } +} diff --git a/mbproxy/src/Mbproxy/HostingExtensions.cs b/mbproxy/src/Mbproxy/HostingExtensions.cs index 0540195..3d24d81 100644 --- a/mbproxy/src/Mbproxy/HostingExtensions.cs +++ b/mbproxy/src/Mbproxy/HostingExtensions.cs @@ -2,7 +2,10 @@ using Mbproxy.Admin; using Mbproxy.Configuration; using Mbproxy.Diagnostics; using Mbproxy.Options; +using Microsoft.Extensions.Hosting.Systemd; +using Microsoft.Extensions.Hosting.WindowsServices; using Serilog; +using Serilog.Events; namespace Mbproxy; @@ -62,25 +65,39 @@ internal static class HostingExtensions /// Configures Serilog from the "Serilog" configuration section, with console /// and rolling-file sinks as defaults. /// - /// When is true, the - /// is added as a sub-sink for events at - /// and above. 
This flag should only be - /// set when the service is running as a Windows Service — the bridge silently ignores - /// events when the Event Log source is not registered. + /// This is the single composition-root point where the platform diagnostic + /// sink for Error+ events is chosen. + /// picks it from the current host: + /// + /// Windows Service → (Application + /// Event Log). + /// systemd service → (local syslog). + /// interactive / dev runs (any OS) → no platform sink. + /// + /// Both bridges silently no-op when their backing facility is unavailable, so a + /// dev run never needs an Event Log source registered or a syslog socket. /// - public static IHostApplicationBuilder AddMbproxySerilog( - this IHostApplicationBuilder builder, - bool addEventLogBridge = false) + public static IHostApplicationBuilder AddMbproxySerilog(this IHostApplicationBuilder builder) { var cfg = new LoggerConfiguration() .ReadFrom.Configuration(builder.Configuration); - if (addEventLogBridge && OperatingSystem.IsWindows()) + var sink = DiagnosticSinkSelector.Select( + isWindows: OperatingSystem.IsWindows(), + isWindowsService: WindowsServiceHelpers.IsWindowsService(), + isSystemd: SystemdHelpers.IsSystemdService()); + + cfg = sink switch { - cfg = cfg.WriteTo.Sink( - new EventLogBridge(enabled: true), - Serilog.Events.LogEventLevel.Error); - } + // EventLogBridge is [SupportedOSPlatform("windows")]; the extra + // OperatingSystem.IsWindows() guard satisfies the platform analyzer + // (DiagnosticSinkSelector already guarantees Windows for this case). 
+ DiagnosticSink.EventLog when OperatingSystem.IsWindows() + => cfg.WriteTo.Sink(new EventLogBridge(enabled: true), LogEventLevel.Error), + DiagnosticSink.Syslog + => SyslogBridge.AttachTo(cfg), + _ => cfg, + }; Log.Logger = cfg.CreateLogger(); diff --git a/mbproxy/src/Mbproxy/Mbproxy.csproj b/mbproxy/src/Mbproxy/Mbproxy.csproj index ab2f7c5..b328993 100644 --- a/mbproxy/src/Mbproxy/Mbproxy.csproj +++ b/mbproxy/src/Mbproxy/Mbproxy.csproj @@ -12,16 +12,19 @@ 1.0.0 - - + + true true - win-x64 true @@ -32,12 +35,19 @@ + Microsoft.AspNetCore.App — do not re-add it explicitly. + The two init-system integration packages are both portable: each is + safe to reference and call on any OS (the helper self-detects its host + and no-ops otherwise), so no conditional reference is needed. --> + + + @@ -48,17 +58,29 @@ + - + + PreserveNewest + + + PreserveNewest + + diff --git a/mbproxy/src/Mbproxy/Program.cs b/mbproxy/src/Mbproxy/Program.cs index 119bc6a..a2331bf 100644 --- a/mbproxy/src/Mbproxy/Program.cs +++ b/mbproxy/src/Mbproxy/Program.cs @@ -1,17 +1,21 @@ using Mbproxy; using Mbproxy.Proxy; +using Microsoft.Extensions.Hosting.Systemd; using Microsoft.Extensions.Hosting.WindowsServices; var builder = Host.CreateApplicationBuilder(args); -// Windows Service support; no-op when running under dotnet run / console. +// Init-system integration. Both helpers self-detect their host and are no-ops +// otherwise, so calling both unconditionally is correct on every platform: +// - AddWindowsService(): active only when launched by the Windows SCM. +// - AddSystemd(): active only when launched by systemd (wires sd_notify +// readiness; SIGTERM shutdown is handled by the host). builder.Services.AddWindowsService(); +builder.Services.AddSystemd(); -// Wire EventLogBridge only when actually running as a Windows Service. -bool isWindowsService = WindowsServiceHelpers.IsWindowsService(); - -// Wire up structured config, Serilog, and typed options. 
-builder.AddMbproxySerilog(addEventLogBridge: isWindowsService); +// Wire up structured config, Serilog, and typed options. AddMbproxySerilog selects +// the platform diagnostic sink (Windows Event Log / syslog / none) internally. +builder.AddMbproxySerilog(); builder.AddMbproxyOptions(); // PDU pipeline: BcdPduPipeline is stateless (per-call correlation flows through diff --git a/mbproxy/tests/Mbproxy.Tests/Diagnostics/DiagnosticSinkSelectorTests.cs b/mbproxy/tests/Mbproxy.Tests/Diagnostics/DiagnosticSinkSelectorTests.cs new file mode 100644 index 0000000..ceb1df4 --- /dev/null +++ b/mbproxy/tests/Mbproxy.Tests/Diagnostics/DiagnosticSinkSelectorTests.cs @@ -0,0 +1,40 @@ +using Mbproxy.Diagnostics; +using Shouldly; +using Xunit; + +namespace Mbproxy.Tests.Diagnostics; + +/// +/// Unit tests for — the pure platform-selection +/// logic for the Error+ diagnostic sink. Covers every OS / host combination so the +/// selection contract is pinned without needing a real Windows Service or systemd host. +/// +[Trait("Category", "Unit")] +public sealed class DiagnosticSinkSelectorTests +{ + // 'expected' is the underlying int of DiagnosticSink: the enum is internal and + // cannot appear in a public (xunit-discoverable) method signature. 
+ [Theory] + [InlineData(true, true, false, (int)DiagnosticSink.EventLog)] // Windows, hosted as a Windows Service + [InlineData(true, false, false, (int)DiagnosticSink.None)] // Windows, interactive / dev run + [InlineData(false, false, true, (int)DiagnosticSink.Syslog)] // Linux, hosted as a systemd service + [InlineData(false, false, false, (int)DiagnosticSink.None)] // Linux / macOS, interactive / dev run + public void Select_PicksExpectedSink( + bool isWindows, bool isWindowsService, bool isSystemd, int expected) + => ((int)DiagnosticSinkSelector.Select(isWindows, isWindowsService, isSystemd)).ShouldBe(expected); + + [Fact] + public void Select_Windows_TakesPrecedence_OverASpuriousSystemdFlag() + => DiagnosticSinkSelector.Select(isWindows: true, isWindowsService: true, isSystemd: true) + .ShouldBe(DiagnosticSink.EventLog); + + [Fact] + public void Select_WindowsConsoleRun_GetsNoSink_EvenIfSystemdFlagSet() + => DiagnosticSinkSelector.Select(isWindows: true, isWindowsService: false, isSystemd: true) + .ShouldBe(DiagnosticSink.None); + + [Fact] + public void Select_NonWindowsWithoutSystemd_GetsNoSink() + => DiagnosticSinkSelector.Select(isWindows: false, isWindowsService: false, isSystemd: false) + .ShouldBe(DiagnosticSink.None); +} diff --git a/mbproxy/tests/Mbproxy.Tests/Diagnostics/EventLogMessageTests.cs b/mbproxy/tests/Mbproxy.Tests/Diagnostics/EventLogMessageTests.cs new file mode 100644 index 0000000..a410935 --- /dev/null +++ b/mbproxy/tests/Mbproxy.Tests/Diagnostics/EventLogMessageTests.cs @@ -0,0 +1,41 @@ +using Mbproxy.Diagnostics; +using Shouldly; +using Xunit; + +namespace Mbproxy.Tests.Diagnostics; + +/// +/// Unit tests for — the 32 KB Windows +/// Event Log truncation rule. The helper is pure and OS-agnostic, so these run on +/// every platform (the Windows-only sink itself is not +/// exercised here). 
+/// +[Trait("Category", "Unit")] +public sealed class EventLogMessageTests +{ + [Fact] + public void TruncateToLimit_ShortMessage_ReturnedUnchanged() + { + const string msg = "mbproxy backend connect failed"; + EventLogMessage.TruncateToLimit(msg).ShouldBeSameAs(msg); + } + + [Fact] + public void TruncateToLimit_MessageAtTheLimit_NotTruncated() + { + // MaxBytes / 2 chars = exactly MaxBytes at the 2-bytes-per-char upper bound. + var atLimit = new string('y', EventLogMessage.MaxBytes / 2); + EventLogMessage.TruncateToLimit(atLimit).ShouldBe(atLimit); + } + + [Fact] + public void TruncateToLimit_OversizeMessage_TruncatedWithinLimit_AndEndsWithEllipsis() + { + var huge = new string('x', EventLogMessage.MaxBytes); // well over the limit + var result = EventLogMessage.TruncateToLimit(huge); + + (result.Length * 2).ShouldBeLessThanOrEqualTo(EventLogMessage.MaxBytes); + result.ShouldEndWith("..."); + result.Length.ShouldBeLessThan(huge.Length); + } +} diff --git a/mbproxy/tests/Mbproxy.Tests/Diagnostics/SyslogBridgeTests.cs b/mbproxy/tests/Mbproxy.Tests/Diagnostics/SyslogBridgeTests.cs new file mode 100644 index 0000000..9de7aa4 --- /dev/null +++ b/mbproxy/tests/Mbproxy.Tests/Diagnostics/SyslogBridgeTests.cs @@ -0,0 +1,27 @@ +using Mbproxy.Diagnostics; +using Serilog; +using Shouldly; +using Xunit; + +namespace Mbproxy.Tests.Diagnostics; + +/// +/// Unit tests for . The bridge's fail-safe contract is that +/// attaching the local-syslog sink and building the resulting logger never throw — +/// even on a host with no /dev/log (e.g. the Windows test leg), where the sink +/// connects lazily and degrades silently. 
+/// +[Trait("Category", "Unit")] +public sealed class SyslogBridgeTests +{ + [Fact] + public void AttachTo_ReturnsAConfiguration_AndNeverThrows() + => SyslogBridge.AttachTo(new LoggerConfiguration()).ShouldNotBeNull(); + + [Fact] + public void AttachTo_ResultCreatesALogger_WithoutThrowing() + { + using var logger = SyslogBridge.AttachTo(new LoggerConfiguration()).CreateLogger(); + logger.ShouldNotBeNull(); + } +} diff --git a/mbproxy/tests/Mbproxy.Tests/HostSmokeTests.cs b/mbproxy/tests/Mbproxy.Tests/HostSmokeTests.cs index 6888d5e..5dfb02e 100644 --- a/mbproxy/tests/Mbproxy.Tests/HostSmokeTests.cs +++ b/mbproxy/tests/Mbproxy.Tests/HostSmokeTests.cs @@ -5,6 +5,8 @@ using Mbproxy.Proxy; using Microsoft.Extensions.Configuration; using Microsoft.Extensions.DependencyInjection; using Microsoft.Extensions.Hosting; +using Microsoft.Extensions.Hosting.Systemd; +using Microsoft.Extensions.Hosting.WindowsServices; using Serilog; using Serilog.Core; using Serilog.Events; @@ -71,6 +73,26 @@ public sealed class HostSmokeTests // Assert: does not throw / time out. await stopTask.ShouldCompleteWithinAsync(TimeSpan.FromSeconds(3)); } + + [Fact] + public async Task HostSmoke_BothInitSystemIntegrations_CoRegister_AndHostRunsCleanly() + { + // Arrange: register BOTH init-system integrations. Each is a no-op off its + // own init system, so on a test run (neither) the default console lifetime + // applies — they must co-register without conflict and leave the host + // startable and stoppable. + var builder = Host.CreateApplicationBuilder(); + builder.Services.AddWindowsService(); + builder.Services.AddSystemd(); + builder.ConfigureForTest(new LoggerConfiguration().CreateLogger()); + + using var host = builder.Build(); + using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5)); + + // Act + Assert: start/stop do not throw or time out. 
+ await host.StartAsync(cts.Token); + await host.StopAsync(cts.Token); + } } /// diff --git a/mbproxy/tests/Mbproxy.Tests/Options/MbproxyOptionsBindingTests.cs b/mbproxy/tests/Mbproxy.Tests/Options/MbproxyOptionsBindingTests.cs index 37e4690..c823699 100644 --- a/mbproxy/tests/Mbproxy.Tests/Options/MbproxyOptionsBindingTests.cs +++ b/mbproxy/tests/Mbproxy.Tests/Options/MbproxyOptionsBindingTests.cs @@ -102,6 +102,39 @@ public sealed class MbproxyOptionsBindingTests options.Resilience.ListenerRecovery.InitialBackoffMs.ShouldBe([1000, 2000, 5000, 15000, 30000]); options.Plcs.ShouldBeEmpty(); options.BcdTags.Global.ShouldBeEmpty(); + + // Keepalive defaults — enabled, with the documented timer values. + options.Connection.Keepalive.Enabled.ShouldBeTrue(); + options.Connection.Keepalive.TcpIdleTimeMs.ShouldBe(30000); + options.Connection.Keepalive.TcpProbeIntervalMs.ShouldBe(5000); + options.Connection.Keepalive.TcpProbeCount.ShouldBe(4); + options.Connection.Keepalive.BackendHeartbeatIdleMs.ShouldBe(30000); + options.Connection.Keepalive.BackendHeartbeatProbeAddress.ShouldBe(0); + } + + // ------------------------------------------------------------------------- + // Test 5 — the Connection:Keepalive block binds from configuration + // ------------------------------------------------------------------------- + [Fact] + public void MbproxyOptionsBinding_BindsKeepaliveBlock() + { + var options = BindOptions(new Dictionary + { + ["Mbproxy:Connection:Keepalive:Enabled"] = "false", + ["Mbproxy:Connection:Keepalive:TcpIdleTimeMs"] = "45000", + ["Mbproxy:Connection:Keepalive:TcpProbeIntervalMs"] = "7000", + ["Mbproxy:Connection:Keepalive:TcpProbeCount"] = "6", + ["Mbproxy:Connection:Keepalive:BackendHeartbeatIdleMs"] = "20000", + ["Mbproxy:Connection:Keepalive:BackendHeartbeatProbeAddress"] = "1024", + }); + + var ka = options.Connection.Keepalive; + ka.Enabled.ShouldBeFalse(); + ka.TcpIdleTimeMs.ShouldBe(45000); + ka.TcpProbeIntervalMs.ShouldBe(7000); + 
ka.TcpProbeCount.ShouldBe(6); + ka.BackendHeartbeatIdleMs.ShouldBe(20000); + ka.BackendHeartbeatProbeAddress.ShouldBe(1024); } // ------------------------------------------------------------------------- @@ -129,4 +162,47 @@ public sealed class MbproxyOptionsBindingTests result.Failed.ShouldBeTrue("Width=8 should fail schema validation"); result.Failures.ShouldNotBeEmpty(); } + + // ------------------------------------------------------------------------- + // Test 6 — every shipped install template (Windows + Linux) loads as JSONC, + // binds to MbproxyOptions, and passes schema validation. This catches + // a malformed template at build time and keeps the two platform + // variants in lockstep. + // ------------------------------------------------------------------------- + [Theory] + [InlineData("mbproxy.config.template.json")] + [InlineData("mbproxy.linux.config.template.json")] + public void MbproxyOptionsBinding_ShippedInstallTemplate_BindsAndValidates(string templateFileName) + { + var templatePath = ResolveInstallFile(templateFileName); + + // The templates are JSONC; the .NET JSON config provider skips // and /* */ + // comments and allows trailing commas, so AddJsonFile loads them directly. + var config = new ConfigurationBuilder() + .AddJsonFile(templatePath, optional: false) + .Build(); + + var options = config.GetSection("Mbproxy").Get() ?? new MbproxyOptions(); + + var result = new MbproxyOptionsValidator().Validate(null, options); + result.Succeeded.ShouldBeTrue( + $"{templateFileName} must pass schema validation — failures: " + + string.Join("; ", result.Failures ?? [])); + } + + /// + /// Resolves an install/ file by walking up from the test assembly directory. + /// Works from both the Windows dev box and the Linux test box. 
+ /// + private static string ResolveInstallFile(string fileName) + { + for (var dir = new DirectoryInfo(AppContext.BaseDirectory); dir is not null; dir = dir.Parent) + { + var candidate = Path.Combine(dir.FullName, "install", fileName); + if (File.Exists(candidate)) + return candidate; + } + throw new FileNotFoundException( + $"Could not locate install/{fileName} above {AppContext.BaseDirectory}"); + } } diff --git a/mbproxy/tests/sim/run-dl205-sim.ps1 b/mbproxy/tests/sim/run-dl205-sim.ps1 index fc9a0a6..0f296ce 100644 --- a/mbproxy/tests/sim/run-dl205-sim.ps1 +++ b/mbproxy/tests/sim/run-dl205-sim.ps1 @@ -48,9 +48,13 @@ if (-not $ProfileResolved) { } # ── 2. Locate Python ───────────────────────────────────────────────────────── -# Try 'python' first (standard PATH install), then the Windows-store launcher 'py'. +# Windows: 'python' (standard PATH install), then the 'py' launcher. +# Linux/macOS: 'python3' (the canonical name), then 'python'. +# The candidate order is platform-specific so Windows never matches the Microsoft +# Store 'python3' stub. $pythonExe = $null -foreach ($candidate in 'python', 'py') { +$pythonCandidates = $IsWindows ? @('python', 'py') : @('python3', 'python') +foreach ($candidate in $pythonCandidates) { try { $ver = & $candidate --version 2>&1 if ($LASTEXITCODE -eq 0) { @@ -77,9 +81,13 @@ or use the Windows Store launcher ('py'). $PYMODBUS_VERSION = '3.13.0' $venvDir = Join-Path $PSScriptRoot '.venv' -$venvPython = Join-Path $venvDir 'Scripts\python.exe' -$pipExe = Join-Path $venvDir 'Scripts\pip.exe' -$simulatorExe = Join-Path $venvDir 'Scripts\pymodbus.simulator.exe' # sentinel for complete install +# venv executable layout differs by OS: Windows puts them in Scripts\ with a .exe +# extension; Linux/macOS put them in bin/ with no extension. +$venvBin = $IsWindows ? 'Scripts' : 'bin' +$exeExt = $IsWindows ? 
'.exe' : '' +$venvPython = Join-Path $venvDir $venvBin "python$exeExt" +$pipExe = Join-Path $venvDir $venvBin "pip$exeExt" +$simulatorExe = Join-Path $venvDir $venvBin "pymodbus.simulator$exeExt" # sentinel for complete install # Provisioning is idempotent: we only skip it when pymodbus.simulator.exe exists. # Checking only the .venv directory is not enough — a previous run killed mid-install