Compare commits

2 Commits

Author SHA1 Message Date
Joseph Doherty b330faff03 mbproxy: cross-platform support — Linux/systemd alongside Windows
Make the service build, run, and install on Linux as a first-class
target while keeping the Windows Service + Event Log behaviour intact.

- Build: drop the hardcoded win-x64 RID — single-file publish now works
  for any RID. publish.ps1 gains -Rid; new publish.sh for Linux hosts.
- Diagnostics: DiagnosticSinkSelector picks the Error+ sink per host —
  Windows Event Log under the SCM, local syslog under systemd
  (Serilog.Sinks.SyslogMessages), none for interactive runs. The
  EventLog truncation helper is extracted so it is testable cross-OS.
- Host: Program.cs registers AddSystemd() alongside AddWindowsService().
- Config: a RID-conditioned appsettings template ships Windows or Unix
  paths; both templates are schema-validated by a test.
- Install: systemd unit (Type=exec) plus install.sh / uninstall.sh.
  Also fixes two cross-platform bugs found while testing: install.ps1
  and uninstall.ps1 used New-EventLog / Remove-EventLog (absent in
  PowerShell 7), and the E2E sim launcher hardcoded Windows venv paths.
- Docs updated across README, CLAUDE.md, and docs/ for dual-platform.

413 tests pass on Windows; 374 (all non-simulator) on Linux.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:41:59 -04:00
Joseph Doherty 0868613890 mbproxy: add keepalive / connection monitoring
The DL205/DL260 ECOM emits no TCP keepalives, so an idle backend socket
can be silently dropped by a middlebox (switch, firewall, NAT) after
2-5 minutes. Enable OS SO_KEEPALIVE on backend and accepted upstream
sockets, and drive a periodic synthetic FC03 heartbeat on each idle
backend socket so a dead path is detected before a real client request
hits it. Controlled by Connection.Keepalive (ON by default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 09:40:54 -04:00
54 changed files with 2940 additions and 131 deletions
+1 -1
@@ -23,7 +23,7 @@ When in doubt about where content belongs, default to pushing it deeper. `DOCS-G
- [`graccesscli/`](graccesscli/README.md) — `.NET Framework 4.8 / x86` CliFx-based CLI for automating Galaxy configuration through the ArchestrA GRAccess COM interop.
- [`grdb/`](grdb/README.md) — SQL/DDL exploration of the Galaxy Repository SQL database (queries, schema, hierarchy/tag-name translation).
- [`histdb/`](histdb/README.md) — LLM-oriented reference for AVEVA Historian retrieval (extension tables, `wwXxx` time-domain extensions, retrieval modes/options, alarm-event SQL, REST API). Distilled from the official Historian Retrieval Guide.
- [`mbproxy/`](mbproxy/README.md) — `.NET 10` Windows Service that proxies Modbus TCP for a fleet of ~54 DL205/DL260 PLCs: inline bidirectional BCD rewriting, single-backend-conn TxId multiplexing (lifts the H2-ECOM100 4-client cap), in-flight read coalescing, and opt-in per-tag response caching.
- [`mbproxy/`](mbproxy/README.md) — `.NET 10` background service (Windows Service or Linux systemd unit) that proxies Modbus TCP for a fleet of ~54 DL205/DL260 PLCs: inline bidirectional BCD rewriting, single-backend-conn TxId multiplexing (lifts the H2-ECOM100 4-client cap), in-flight read coalescing, and opt-in per-tag response caching.
- [`mxaccesscli/`](mxaccesscli/README.md) — `.NET Framework 4.8 / x86` CliFx-based CLI for reading, writing, and subscribing to System Platform tags via the **MxAccess** COM proxy (`LMXProxyServerClass`).
- [`secrets/`](secrets/README.md) — Self-hosted Infisical CLI + `secret` PowerShell helper for fetching credentials from `https://infisical.dohertylan.com` instead of inlining plaintext.
+9 -7
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## What this is
`mbproxy` is a **C# .NET 10** background service (Windows Service) that sits **inline as a Modbus TCP proxy** in front of a fleet of **~54 AutomationDirect DirectLOGIC DL205 / DL260** equipment controllers. It is pre-configured with two pieces of static data:
`mbproxy` is a **C# .NET 10** background service — a **Windows Service** or a **Linux systemd unit** that sits **inline as a Modbus TCP proxy** in front of a fleet of **~54 AutomationDirect DirectLOGIC DL205 / DL260** equipment controllers. It is pre-configured with two pieces of static data:
1. **A list of BCD tags** — the holding/input registers (by Modbus address and bit width) that the controllers store in DirectLOGIC's native BCD encoding (`V2000 = 1234` is stored on the wire as `0x1234`, *not* `0x04D2`).
2. **A list of equipment controller IP addresses** (~54 entries) for the DL205/DL260 fleet. Each controller speaks Modbus TCP on port 502 via either the built-in DL260 Ethernet port or an H2-ECOM100 / H2-EBC100 coprocessor.
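A minimal sketch of the 16-bit BCD encoding described in item 1, written in Python for illustration only. The function names `bcd_to_int` and `int_to_bcd` are hypothetical and are not taken from the mbproxy codebase, which is C#.

```python
def bcd_to_int(word):
    """Decode a 16-bit BCD register (four 4-bit decimal digits) to an integer."""
    value = 0
    for shift in (12, 8, 4, 0):
        digit = (word >> shift) & 0xF
        if digit > 9:
            raise ValueError("invalid BCD nibble in 0x%04X" % word)
        value = value * 10 + digit
    return value


def int_to_bcd(value):
    """Encode an integer in 0..9999 as a 16-bit BCD register."""
    if not 0 <= value <= 9999:
        raise ValueError("a 16-bit BCD register holds 0..9999")
    word = 0
    for shift in (0, 4, 8, 12):
        value, digit = divmod(value, 10)  # peel off the lowest decimal digit
        word |= digit << shift
    return word
```

With these helpers, `int_to_bcd(1234)` yields `0x1234`, whereas the plain binary encoding of 1234 is `0x04D2`.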
@@ -31,19 +31,20 @@ The full architecture is documented under **[`docs/`](docs/)** — see the `Arch
- **`appsettings.json` is hot-reloadable** via `IOptionsMonitor<MbproxyOptions>`; tag-list changes propagate per-PDU, PLC add/remove flows through the supervisor. A tag-list reload flushes the affected PLC's response cache (per-tag granularity intentionally not done in v1).
- **Polly bounded retries** on backend connect (3 attempts at 100ms / 500ms / 2000ms). No retries on mid-request failures (FC06/FC16 are non-idempotent on BCD tags). A per-request watchdog in the multiplexer surfaces Modbus exception 0x0B to the upstream client if a backend response never arrives within `BackendRequestTimeoutMs`.
- **Backend disconnect cascades upstream**: when the shared backend socket dies, every attached upstream pipe is closed in the same cycle (counter `BackendDisconnectCascades`); clients reconnect on their next request.
- **Keepalive / connection monitoring** (ON by default, `Connection.Keepalive`): OS `SO_KEEPALIVE` on backend and accepted upstream sockets, plus a per-PLC application heartbeat — a synthetic FC03 qty=1 read fired on an idle backend socket (`BackendHeartbeatIdleMs`). An unanswered heartbeat proactively tears the backend down (counters `backendHeartbeatsSent/Failed`, `backendIdleDisconnects`). The DL260 has no FC08, so the probe is a real register read. See [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md).
- **Read-only Kestrel admin port** (default 8080) exposes `GET /` (auto-refreshing HTML) and `GET /status.json` with service-wide and per-PLC counters (including Phase-9 mux fields, Phase-10 coalescing fields, and Phase-11 cache fields `cacheHitCount`, `cacheMissCount`, `cacheInvalidations`, `cacheEntryCount`, `cacheBytes`).
Anything beyond this short list lives in the `docs/` tree: the appsettings.json schema in [`docs/Operations/Configuration.md`](docs/Operations/Configuration.md), config propagation in [`docs/Features/HotReload.md`](docs/Features/HotReload.md), stable log event names in [`docs/Reference/LogEvents.md`](docs/Reference/LogEvents.md), the status counter catalog in [`docs/Operations/StatusPage.md`](docs/Operations/StatusPage.md), and the simulator-backed test fixture in [`docs/Testing/Simulator.md`](docs/Testing/Simulator.md). Open the relevant page before writing code; keep it in sync when decisions change.
## Current state
**Implementation complete through Phase 11.** Phases 00–08 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model; Phase 10 added in-flight read coalescing on top; Phase 11 added an opt-in per-tag response cache (bounded staleness, OFF by default — see [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md)). The service is production-ready as a Windows Service:
**Implementation complete through Phase 11.** Phases 00–08 shipped the production-ready 1:1-model service; Phase 9 swapped the connection layer for the TxId-multiplexed model; Phase 10 added in-flight read coalescing on top; Phase 11 added an opt-in per-tag response cache (bounded staleness, OFF by default — see [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md)). The service is production-ready as a **Windows Service or a Linux systemd unit**:
- Test count grew through Phase 11 (see `tests/Mbproxy.Tests/` for the current suite; previous baseline was 325 = 282 unit + 43 E2E).
- Single-file self-contained publish (`dotnet publish -c Release -r win-x64`).
- PowerShell install/uninstall scripts under `install/`.
- Graceful shutdown with configurable drain timeout (`Connection.GracefulShutdownTimeoutMs`, default 10 s).
- Windows Event Log integration (Error+ events when running as a service).
- Single-file self-contained publish for `win-x64` **and** `linux-x64` (`dotnet publish -c Release -r <rid>`) — the RID is supplied per publish, never hardcoded in the csproj.
- Install/uninstall scripts under `install/`: PowerShell (`install.ps1` / `uninstall.ps1`) for the Windows Service; shell (`install.sh` / `uninstall.sh` + the `mbproxy.service` unit) for systemd.
- Graceful shutdown with configurable drain timeout (`Connection.GracefulShutdownTimeoutMs`, default 10 s) — driven by the Windows SCM stop signal or POSIX `SIGTERM`.
- Platform diagnostic sink for Error+ events, chosen once at the composition root by `DiagnosticSinkSelector`: Windows Application Event Log under the SCM, local syslog under systemd, none for interactive/dev runs. The systemd unit is `Type=exec` (not `notify`).
- Read-only HTTP status page at `AdminPort` (default 8080) — surfaces Phase-9 mux fields alongside Phase-7 counters.
- `connectsSuccess` / `connectsFailed` counters wired in `PlcMultiplexer`.
- Phase 9 per-request watchdog defends against any backend that drops or mis-echoes a response (real-world packet loss; pymodbus 3.13 simulator's concurrent-multiplexed-request bug).
@@ -63,7 +64,7 @@ The DL205/DL260 family is *almost* Modbus-spec-compliant, but every category bel
- **Octal V-memory ↔ decimal Modbus translation.** `V2000` octal = decimal 1024 = Modbus PDU `0x0400`. Config addresses are PDU-decimal, **not** octal V-memory and **not** 1-based 4xxxx.
- **FC03/FC04 max qty = 128** (above spec's 125). **FC16 max qty = 100** (below spec's 123). The proxy passes these through; the PLC enforces the cap with exception 03.
- **Max 4 concurrent TCP clients per ECOM100.** This is why the proxy uses a single TxId-multiplexed backend socket per PLC — see [`docs/Architecture/ConnectionModel.md`](docs/Architecture/ConnectionModel.md) for how the connection model lifts this cap.
- **No TCP keepalive from the device.** Middleboxes typically drop idle sockets at 2–5 min. With the 1:1 model, backend liveness tracks upstream client liveness; if both are idle long enough, the path dies on its own and the next request reconnects.
- **No TCP keepalive from the device.** Middleboxes typically drop idle sockets at 2–5 min. The proxy compensates with its own keepalive — `SO_KEEPALIVE` on every socket plus an idle backend FC03 heartbeat (see the Architecture summary and [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md)).
- **Register 0 is valid** on DL205/DL260 in factory "absolute" addressing mode — don't probe-skip it.
- **As-deployed PLC parameters** (captured in `docs/Reference/mbtcp_settings.JPG`): port 502, "Use Concept data structures (Longs/Reals)" enabled, "Swap bytes" enabled, "Use Zero Based Addressing" **unchecked**, Register type = Binary, max coil read 1976 / coil write 800 / register read 122 / register write 100. The proxy must speak Modbus as-is; these settings describe the wire it'll see.
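The octal V-memory translation in the list above can be checked with a short Python sketch. The helper name `vmem_to_pdu` is hypothetical, and the sketch assumes the V-memory label is a `V` followed by octal digits.

```python
def vmem_to_pdu(v_address):
    """Convert a DirectLOGIC V-memory label such as 'V2000' to a Modbus PDU address."""
    digits = v_address.strip().upper()
    if digits.startswith("V"):
        digits = digits[1:]
    # V-memory labels are octal, so parse them in base 8.
    return int(digits, 8)
```

For example, `vmem_to_pdu("V2000")` gives 1024, which is `0x0400` on the wire; that PDU-decimal value is what goes in the config.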
@@ -73,6 +74,7 @@ The DL205/DL260 family is *almost* Modbus-spec-compliant, but every category bel
| --- | --- |
| Architecture — listener topology, request flow, per-PLC isolation | [`docs/Architecture/Overview.md`](docs/Architecture/Overview.md) |
| Connection model — single backend socket per PLC, TxId multiplexing, request-timeout watchdog, disconnect cascade | [`docs/Architecture/ConnectionModel.md`](docs/Architecture/ConnectionModel.md) |
| Keepalive / connection monitoring — TCP `SO_KEEPALIVE` + backend FC03 heartbeat | [`docs/Architecture/Keepalive.md`](docs/Architecture/Keepalive.md) |
| In-flight read coalescing / opt-in response cache | [`docs/Architecture/ReadCoalescing.md`](docs/Architecture/ReadCoalescing.md), [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md) |
| BCD rewriting (codec, CDAB word order, FC03/04/06/16 scope) and config hot-reload | [`docs/Features/BcdRewriting.md`](docs/Features/BcdRewriting.md), [`docs/Features/HotReload.md`](docs/Features/HotReload.md) |
| Operations — full appsettings.json reference, status page / JSON fields, troubleshooting playbook | [`docs/Operations/Configuration.md`](docs/Operations/Configuration.md), [`docs/Operations/StatusPage.md`](docs/Operations/StatusPage.md), [`docs/Operations/Troubleshooting.md`](docs/Operations/Troubleshooting.md) |
+37 -20
@@ -1,14 +1,14 @@
# mbproxy
A .NET 10 Windows Service that sits inline as a Modbus TCP proxy in front of a fleet of AutomationDirect DirectLOGIC DL205/DL260 controllers, rewriting BCD-encoded registers bidirectionally so upstream clients can read and write them as plain integers. The proxy also offers an opt-in per-tag response cache (default OFF) for FC03/FC04 reads with bounded operator-configured staleness — see [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md) before enabling it.
A .NET 10 background service — a **Windows Service** or a **Linux systemd unit** that sits inline as a Modbus TCP proxy in front of a fleet of AutomationDirect DirectLOGIC DL205/DL260 controllers, rewriting BCD-encoded registers bidirectionally so upstream clients can read and write them as plain integers. The proxy also offers an opt-in per-tag response cache (default OFF) for FC03/FC04 reads with bounded operator-configured staleness — see [`docs/Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md) before enabling it.
> ⚠ **32-bit BCD wire format is "two base-10000 digits in CDAB", not standard CDAB binary Int32.** A 32-bit BCD tag at address `A` decodes as `decimal = high * 10_000 + low` where `low` is the register at `A` and `high` is the register at `A+1`. Each word independently must be 0–9999. Standard Modbus clients (NModbus, FluentModbus, Wonderware DAServer) that interpret CDAB as straight binary Int32 will silently corrupt any value > 9999 on writes and read garbage on reads. Configure your client to send/receive each register as a separate base-10000 BCD digit pair, not as a single binary Int32. Full details in [`docs/Features/BcdRewriting.md`](docs/Features/BcdRewriting.md).
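A minimal client-side sketch of the rule in the warning above, in Python for illustration. The names `split_bcd32` and `join_bcd32` are hypothetical; the only behaviour assumed is what the warning states: `low` lives at address `A`, `high` at `A+1`, and each word is limited to 0..9999.

```python
def split_bcd32(value):
    """Split a decimal value into (register A, register A+1) base-10000 words."""
    if not 0 <= value <= 99999999:
        raise ValueError("a 32-bit BCD tag holds 0..99999999")
    high, low = divmod(value, 10000)
    return low, high  # low goes to address A, high to address A+1


def join_bcd32(low, high):
    """Rebuild the decimal value from the two registers read at A and A+1."""
    for word in (low, high):
        if not 0 <= word <= 9999:
            raise ValueError("each word must be 0..9999")
    return high * 10000 + low
```

Writing 12345 therefore means sending 2345 to `A` and 1 to `A+1`. A client that treats the pair as a binary Int32 would instead send 12345 in a single word, which violates the per-word limit.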
## Hard constraints / prerequisites
- **Windows 10 / Server 2019 or later, 64-bit.** No Linux or Docker support — the service uses `Microsoft.Extensions.Hosting.WindowsServices` and the Windows Event Log.
- **Windows (10 / Server 2019+) or Linux (any systemd distro), 64-bit.** Ships as a Windows Service (Application Event Log integration) or a systemd unit (syslog integration); builds single-file for `win-x64` and `linux-x64`. macOS is not a deployment target — it runs only as a foreground console process.
- **Modbus TCP backends reachable** from the proxy host on port 502 (or the port configured per PLC). The H2-ECOM100 module caps simultaneous connections at **4 per PLC** — a fifth upstream client will fail to connect.
- **Admin rights** to install the service (`install.ps1` requires elevation).
- **Admin / root rights** to install the service (`install.ps1` requires elevation; `install.sh` requires root).
- **No COM dependency** — this is a pure .NET 10 socket-level proxy (unlike the `.NET Framework 4.8 / x86` siblings in this repo).
- **Python 3.10+** on the test machine to run the pymodbus-backed E2E simulator (not needed to run the service in production).
@@ -16,8 +16,8 @@ A .NET 10 Windows Service that sits inline as a Modbus TCP proxy in front of a f
```
src/Mbproxy/ Main C# project (net10.0, Microsoft.NET.Sdk.Worker)
tests/Mbproxy.Tests/ xUnit v3 test project (314 unit + 48 E2E tests)
install/ PowerShell install/uninstall scripts and config template
tests/Mbproxy.Tests/ xUnit v3 test project (unit + simulator-backed E2E tests)
install/ Install/uninstall + publish scripts (PowerShell + shell), systemd unit, config templates
docs/ Architecture, features, operations, reference, and testing docs
```
@@ -40,6 +40,7 @@ The `docs/` tree is organized by topic. Start with [`Architecture/Overview.md`](
- [`Architecture/ConnectionModel.md`](docs/Architecture/ConnectionModel.md) — Single backend connection per PLC, TxId multiplexing, request-timeout watchdog, disconnect cascade.
- [`Architecture/ReadCoalescing.md`](docs/Architecture/ReadCoalescing.md) — In-flight FC03/FC04 deduplication via `InFlightByKeyMap`.
- [`Architecture/ResponseCache.md`](docs/Architecture/ResponseCache.md) — Opt-in per-tag response cache with bounded operator-configured staleness.
- [`Architecture/Keepalive.md`](docs/Architecture/Keepalive.md) — TCP `SO_KEEPALIVE` on every socket plus an idle-backend FC03 heartbeat.
### Features
@@ -54,7 +55,7 @@ The `docs/` tree is organized by topic. Start with [`Architecture/Overview.md`](
### Reference
- [`Reference/LogEvents.md`](docs/Reference/LogEvents.md) — Stable `mbproxy.*` event catalog (28 events across 7 categories).
- [`Reference/LogEvents.md`](docs/Reference/LogEvents.md) — Stable `mbproxy.*` event catalog (31 events across 8 categories).
### Testing
@@ -68,20 +69,27 @@ The `docs/` tree is organized by topic. Start with [`Architecture/Overview.md`](
dotnet build Mbproxy.slnx -c Debug
```
**Publish (Release, single-file, win-x64):**
**Publish (Release, single-file):**
```powershell
.\install\publish.ps1 -Clean
.\install\publish.ps1 -Clean # win-x64 (default)
.\install\publish.ps1 -Rid linux-x64 -Clean # cross-publish for linux-x64
```
Produces both flavours under `publish-out\`:
On a Linux build host, use the shell counterpart:
| Flavour | Path | Size | Target prerequisite |
```bash
./install/publish.sh --clean # linux-x64 (default)
```
Each run produces both flavours under `publish-out\`:
| Flavour | Path (win-x64) | Size | Target prerequisite |
|---|---|---|---|
| Self-contained | `publish-out\self-contained\Mbproxy.exe` | ~100 MB | None — bundles .NET 10 + ASP.NET Core runtime |
| Framework-dependent | `publish-out\framework-dependent\Mbproxy.exe` | ~1.5 MB | .NET 10 + ASP.NET Core preinstalled |
| Framework-dependent | `publish-out\framework-dependent\Mbproxy.exe` | ~1.6 MB | .NET 10 + ASP.NET Core preinstalled |
Pass `-OutputDir <path>` to publish elsewhere; omit `-Clean` to skip the wipe. The script wraps `dotnet publish src/Mbproxy/Mbproxy.csproj -c Release -r win-x64 [-p:SelfContained=false]` — run those directly if you only need one flavour.
On `linux-x64` the binary is `Mbproxy` (no extension) and ships the Linux config template. Pass `-OutputDir`/`-o` to publish elsewhere; omit `-Clean`/`--clean` to skip the wipe. The scripts wrap `dotnet publish src/Mbproxy/Mbproxy.csproj -c Release -r <rid> [-p:SelfContained=false]` — run that directly if you only need one flavour.
**Run tests:**
@@ -102,21 +110,30 @@ Edit `src/Mbproxy/appsettings.json` to configure PLCs before running. The admin
## Install
The `install/` directory holds the publish, install, and uninstall scripts. Quick path:
The `install/` directory holds the publish, install, and uninstall scripts for both platforms.
**Windows** — elevated PowerShell:
```powershell
# 1. Publish (produces publish-out\self-contained\ and publish-out\framework-dependent\)
.\install\publish.ps1 -Clean
# 2. Install (elevated PowerShell) — point at the flavour you want to deploy
.\install\install.ps1 -PublishOutput .\publish-out\self-contained -Start
# 3. Edit the config that was placed at %ProgramData%\mbproxy\appsettings.json
# 4. Verify
# Config is placed at %ProgramData%\mbproxy\appsettings.json — edit it, then:
# Restart-Service mbproxy
Invoke-WebRequest http://localhost:8080/ -UseBasicParsing
```
**Linux** — root / `sudo` on a systemd host:
```bash
./install/publish.sh --clean
sudo ./install/install.sh --publish-dir ./publish-out/self-contained
# Config is placed at /etc/mbproxy/appsettings.json — edit it, then:
# sudo systemctl restart mbproxy
curl http://localhost:8080/
```
`uninstall.ps1` / `uninstall.sh` reverse the install; both archive log files rather than deleting them. The systemd unit runs mbproxy as `Type=exec` under a dedicated `mbproxy` service account.
## Maintenance
Documentation doctrine for this repo: [`../DOCS-GUIDE.md`](../DOCS-GUIDE.md).
@@ -240,6 +240,7 @@ The per-request timeout watchdog described above is the production defence again
- [`./Overview.md`](./Overview.md) — proxy architecture entry point
- [`./ReadCoalescing.md`](./ReadCoalescing.md) — FC03/FC04 fan-out built on `InterestedParties`
- [`./ResponseCache.md`](./ResponseCache.md) — per-PLC FC03/FC04 cache layered in front of this multiplexer
- [`./Keepalive.md`](./Keepalive.md) — TCP keepalive and the backend heartbeat that keeps this socket warm
- [`../Operations/Configuration.md`](../Operations/Configuration.md) — `Connection.BackendConnectTimeoutMs`, `Connection.BackendRequestTimeoutMs`, retry tuning
- [`../Operations/StatusPage.md`](../Operations/StatusPage.md) — `inFlight`, `maxInFlight`, `txIdWraps`, `queueDepth`, `disconnectCascades` counters
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — `mbproxy.multiplex.*` structured log events
+76
@@ -0,0 +1,76 @@
# Keepalive & Connection Monitoring
The DL205/DL260 ECOM does not emit TCP keepalives (see [`../Reference/dl205.md`](../Reference/dl205.md) → "Behavioural Oddities"). An idle socket is silently dropped by middleboxes — switches, firewalls, NAT — typically after 2–5 minutes. The proxy holds one **persistent backend socket per PLC** ([`./ConnectionModel.md`](./ConnectionModel.md)) plus many accepted upstream client sockets, so it needs its own keepalive on both sides.
Keepalive is **enabled by default** and is governed by the `Connection.Keepalive` option block (see [`../Operations/Configuration.md`](../Operations/Configuration.md)). Set `Connection.Keepalive.Enabled = false` to restore pre-keepalive behaviour exactly.
## Two mechanisms
| Mechanism | Scope | Detects |
|-----------|-------|---------|
| OS TCP keepalive (`SO_KEEPALIVE`) | Backend socket **and** accepted upstream sockets | A peer whose TCP stack is gone (host down, cable pulled, half-open socket). |
| Application heartbeat (FC03 probe) | Backend socket only | The above **plus** a middlebox idle-drop and an ECOM that is connected-but-not-answering Modbus. |
The application heartbeat is the load-bearing mechanism; OS keepalive is a cheap belt-and-suspenders that also covers the window between heartbeat ticks.
## Backend: OS TCP keepalive
`SocketKeepalive.Apply` sets `SO_KEEPALIVE` plus the idle-time / probe-interval / probe-count tunables on the backend `Socket` right after it is created in `PlcMultiplexer.EnsureBackendConnectedAsync`. The tunables come from `Connection.Keepalive.Tcp*`. Socket options are applied **at connect time** — a hot-reload of the `Tcp*` values only affects backend sockets opened *after* the change.
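As a rough illustration of what setting `SO_KEEPALIVE` plus the three tunables involves, here is a Python sketch. It is not the project's `SocketKeepalive.Apply`, which is C#. The function name `apply_keepalive` is hypothetical, and the default values are taken from the `Connection.Keepalive` block in the configuration reference. The per-socket option names differ by platform, so each one is guarded.

```python
import socket


def apply_keepalive(sock, idle_ms=30000, interval_ms=5000, count=4):
    """Enable OS TCP keepalive and, where the platform exposes them, its tunables."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tunables = (
        ("TCP_KEEPIDLE", idle_ms // 1000),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_ms // 1000),  # seconds between probes
        ("TCP_KEEPCNT", count),                  # unanswered probes before the drop
    )
    for name, value in tunables:
        option = getattr(socket, name, None)
        if option is None:
            continue  # this platform does not expose the option under this name
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, max(1, value))
        except OSError:
            pass  # best effort: keepalive itself is already on
```

Because the options are set on a specific socket object, changing the configured values later cannot affect sockets that are already open, which matches the connect-time behaviour described above.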
## Backend: application heartbeat
A per-`PlcMultiplexer` background loop (`RunBackendHeartbeatAsync`) is started alongside the backend writer and reader on each successful connect, under the same `_backendCts`, and dies with them on teardown.
- The multiplexer tracks `_lastBackendActivityUtc`, updated by **both** the writer (on every send) and the reader (on every received frame). Real traffic in either direction therefore suppresses the heartbeat.
- Each tick (a quarter of `BackendHeartbeatIdleMs`, floored at 500 ms), if the socket has been idle longer than `BackendHeartbeatIdleMs`, the loop issues a **synthetic FC03 qty=1 read** at `BackendHeartbeatProbeAddress` (default 0 = `V0`, valid on DL205/DL260). FC08 (Diagnostics) is **not** supported by the DL260 ECOM, so the probe must be a real register read.
- The probe targets the unit ID of the most recent upstream request, so it reaches the same Modbus unit the real clients successfully use.
- The probe takes a real proxy TxId and a `CorrelationMap` entry flagged `InFlightRequest.IsHeartbeat`. It is enqueued straight onto the backend outbound channel, **bypassing** the read-coalescing and response-cache paths.
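The tick arithmetic and the probe frame from the bullets above can be sketched in Python. All three function names are hypothetical, and the frame layout follows the standard Modbus TCP MBAP header rather than anything read from the mbproxy source.

```python
import struct


def heartbeat_tick_ms(idle_ms):
    """Tick period: a quarter of the idle threshold, floored at 500 ms."""
    return max(idle_ms // 4, 500)


def heartbeat_due(now_ms, last_activity_ms, idle_ms):
    """True once the socket has been idle longer than the threshold."""
    return now_ms - last_activity_ms > idle_ms


def build_fc03_probe(tx_id, unit_id, address=0):
    """Build a Modbus TCP FC03 request for one register at the given address."""
    # MBAP header: TxId, protocol id 0, length 6 (unit id + 5-byte PDU), unit id.
    # PDU: function code 0x03, start address, quantity 1.
    return struct.pack(">HHHBBHH", tx_id, 0, 6, unit_id, 0x03, address, 1)
```

With the default `BackendHeartbeatIdleMs` of 30000 the loop would wake every 7500 ms, and each probe is a 12-byte frame.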
### Heartbeat response
The backend reader recognises an `IsHeartbeat` correlation entry, refreshes the idle timer (already done on frame receipt), frees the TxId, and **drops the payload** — no rewriter, no cache write-through, no fan-out, and no round-trip-EWMA sample (the synthetic probe never pollutes the client-facing RTT metric).
### Heartbeat timeout
If a probe is not answered within `BackendRequestTimeoutMs`, the per-request timeout watchdog ([`./ConnectionModel.md`](./ConnectionModel.md) → "Per-Request Timeout Watchdog") finds the stale `IsHeartbeat` entry and — instead of dispatching a 0x0B exception to a (non-existent) upstream party — calls `TearDownBackendAsync`, cascading every attached upstream pipe.
This is a **proactive** version of the existing backend-disconnect cascade: the dead path is found during idle instead of corrupting the next real client request. Reconnect stays lazy — the heartbeat keeps an *existing* backend warm, it never resurrects a dead one and adds no eager-reconnect spinner. Clients reconnect on their next request, exactly as for an organic cascade.
`BackendHeartbeatIdleMs` must be greater than `BackendRequestTimeoutMs` (enforced by the reload validator) — a heartbeat interval at or below the request timeout would fire continuously.
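The constraint above amounts to a one-line check. The following Python sketch is illustrative only; `validate_keepalive` is a hypothetical name and does not represent the actual reload validator.

```python
def validate_keepalive(heartbeat_idle_ms, request_timeout_ms):
    """Return a list of validation errors; an empty list means the values are acceptable."""
    errors = []
    if heartbeat_idle_ms <= request_timeout_ms:
        errors.append(
            "BackendHeartbeatIdleMs (%d) must be greater than "
            "BackendRequestTimeoutMs (%d)" % (heartbeat_idle_ms, request_timeout_ms)
        )
    return errors
```

The documented defaults (30000 and 3000) pass this check; equal values are rejected.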
## Upstream: OS TCP keepalive
`SocketKeepalive.Apply` is also called on each accepted client `Socket` in the `UpstreamPipe` constructor. This is the **only** standard keepalive available on the upstream side: Modbus TCP is strictly client-initiated, so the proxy — a server to its clients — cannot send an unsolicited application heartbeat to a client. OS keepalive lets the proxy's TCP stack probe each client; a dead or half-open client then faults the pipe's read loop, the pipe is disposed, and its correlation / coalescing slots are freed instead of leaking until the proxy next tries to write.
## Counters
Per-PLC, exposed on the status page (see [`../Operations/StatusPage.md`](../Operations/StatusPage.md)):
| Counter | Meaning |
|---------|---------|
| `backendHeartbeatsSent` | Heartbeat probes issued on idle backend sockets. |
| `backendHeartbeatsFailed` | Probes not answered within `BackendRequestTimeoutMs`. |
| `backendIdleDisconnects` | Backend teardowns triggered by a failed heartbeat (event count — distinct from `disconnectCascades`, which counts cascaded pipes). |
## Log events
`mbproxy.keepalive.*` — see [`../Reference/LogEvents.md`](../Reference/LogEvents.md):
- `mbproxy.keepalive.heartbeat.sent` (Debug)
- `mbproxy.keepalive.heartbeat.timeout` (Warning)
- `mbproxy.keepalive.backend.idle_disconnect` (Information)
## Hot reload
`Connection.Keepalive` is read through a live accessor (`Func<KeepaliveOptions>`), so a reload of `appsettings.json` propagates without a listener restart:
- The **heartbeat** interval and probe address are re-read on every loop tick.
- The **TCP socket options** are applied at connect/accept time, so a reload affects only sockets opened after the change.
## Related documentation
- [`./ConnectionModel.md`](./ConnectionModel.md) — backend socket lifecycle, the timeout watchdog, and the disconnect cascade this feature hooks into
- [`../Operations/Configuration.md`](../Operations/Configuration.md) — the `Connection.Keepalive` option block
- [`../Operations/StatusPage.md`](../Operations/StatusPage.md) — keepalive counters
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — `mbproxy.keepalive.*` events
- [`../Reference/dl205.md`](../Reference/dl205.md) — the device "no keepalive" oddity and FC03/FC08 support
+1 -1
@@ -6,7 +6,7 @@ This document is the entry point for readers new to the codebase. It sketches th
## Runtime Shape
The process is a single .NET 10 Generic Host worker. `Microsoft.Extensions.Hosting.WindowsServices` registers the host as a Windows Service so the same binary runs interactively (for development) or under the SCM (in production). All configuration binds from `appsettings.json` through `IOptionsMonitor<MbproxyOptions>`, which makes the tag list and PLC roster hot-reloadable without process restart. `ProxyWorker` is the long-lived `BackgroundService` that owns startup, shutdown, and the listener supervisors for every PLC. A small Kestrel admin endpoint runs in the same process to serve the read-only status page.
The process is a single .NET 10 Generic Host worker. It registers both `Microsoft.Extensions.Hosting.WindowsServices` and `Microsoft.Extensions.Hosting.Systemd` — each a no-op off its own init system — so the same binary runs interactively (for development), as a Windows Service under the SCM, or as a Linux systemd unit. All configuration binds from `appsettings.json` through `IOptionsMonitor<MbproxyOptions>`, which makes the tag list and PLC roster hot-reloadable without process restart. `ProxyWorker` is the long-lived `BackgroundService` that owns startup, shutdown, and the listener supervisors for every PLC. A small Kestrel admin endpoint runs in the same process to serve the read-only status page.
There is no in-process database, no message broker, and no persistent cache file: state is per-PLC, in-memory, and ephemeral. Restarting the service drops every in-flight request and every cached response. Upstream clients are expected to reconnect and reissue; the proxy never replays a request on their behalf.
+1 -1
@@ -6,7 +6,7 @@ A save to `appsettings.json` propagates to a running `mbproxy` without restartin
`Microsoft.Extensions.Configuration` loads `appsettings.json` with `reloadOnChange: true`. Every consumer reads its options through `IOptionsMonitor<MbproxyOptions>` instead of capturing a one-shot `IOptions<T>` snapshot at construction. When the framework's `FileSystemWatcher` sees the file change, it re-parses the JSON, re-binds the option tree, and notifies subscribers through `IOptionsMonitor.OnChange`.
The chosen mechanism is deliberate. There is no custom file watcher, no IPC channel, no admin-port mutation endpoint, and no SIGHUP-style trigger. An operator edits the file in place (or a deployment tool atomically rewrites it) and the running service catches up. The reload contract is identical whether the service is running interactively or as a Windows Service under the SCM.
The chosen mechanism is deliberate. There is no custom file watcher, no IPC channel, no admin-port mutation endpoint, and no SIGHUP-style trigger. An operator edits the file in place (or a deployment tool atomically rewrites it) and the running service catches up. The reload contract is identical whether the service is running interactively, as a Windows Service under the SCM, or as a Linux systemd unit.
The `OnChange` callback can fire multiple times for a single logical save because text editors on Windows commonly use a rename-and-replace pattern that produces two or three `FileSystemWatcher` events. The reconciler debounces these inside its own background loop with a 250 ms quiescent window so a single save produces a single apply.
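The effect of a 250 ms quiescent window can be modelled with a small pure function. This Python sketch is an assumption about the general debounce technique, not a description of the reconciler's code; `debounce_applies` is a hypothetical name, and it takes change-event timestamps in ascending order.

```python
def debounce_applies(event_times_ms, quiet_ms=250):
    """Return the times at which an apply would run for a sorted list of change events."""
    applies = []
    for index, event_time in enumerate(event_times_ms):
        is_last = index + 1 == len(event_times_ms)
        # An apply runs only if no further event arrives within the quiet window.
        if is_last or event_times_ms[index + 1] - event_time >= quiet_ms:
            applies.append(event_time + quiet_ms)
    return applies
```

Three watcher events at 0, 40, and 90 ms collapse into one apply at 340 ms, while an unrelated save a second later gets its own apply.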
+31 -4
@@ -7,8 +7,11 @@
The configuration loader resolves `appsettings.json` relative to the executable.
- **Development run** (`dotnet run`): `src/Mbproxy/appsettings.json` next to the build output.
- **Single-file publish** (`dotnet publish -c Release -r win-x64`): `appsettings.json` next to `Mbproxy.exe` in the publish folder.
- **Installed as a Windows Service**: `%ProgramData%\mbproxy\appsettings.json`. The install script copies the template at `install/mbproxy.config.template.json` to this path the first time only — an existing file is preserved across reinstalls.
- **Single-file publish** (`dotnet publish -c Release -r <rid>`): `appsettings.json` next to the published binary. A `win-x64` publish ships `install/mbproxy.config.template.json`; a `linux-x64` publish ships `install/mbproxy.linux.config.template.json` (same keys, Unix log path) — each linked into the bundle as `appsettings.json`.
- **Installed as a Windows Service**: `%ProgramData%\mbproxy\appsettings.json`, seeded by `install.ps1` from `mbproxy.config.template.json`.
- **Installed as a systemd unit**: `/etc/mbproxy/appsettings.json` (the unit's `WorkingDirectory`), seeded by `install.sh` from the Linux template.
In both installed cases the install script copies the template only when no config already exists — an existing file is preserved across reinstalls.
The file is loaded with `reloadOnChange: true`. All consumers read through `IOptionsMonitor<MbproxyOptions>`, so a save propagates without restarting the service. See [`../Features/HotReload.md`](../Features/HotReload.md) for per-key propagation semantics.
@@ -51,11 +54,19 @@ Every supported key under `Mbproxy:*`, populated to a representative default:
// Read-only HTTP status page. Set to 0 to disable.
"AdminPort": 8080,
// Backend connection / request / shutdown timeouts.
// Backend connection / request / shutdown timeouts and keepalive.
"Connection": {
"BackendConnectTimeoutMs": 3000,
"BackendRequestTimeoutMs": 3000,
"GracefulShutdownTimeoutMs": 10000
"GracefulShutdownTimeoutMs": 10000,
"Keepalive": {
"Enabled": true,
"TcpIdleTimeMs": 30000,
"TcpProbeIntervalMs": 5000,
"TcpProbeCount": 4,
"BackendHeartbeatIdleMs": 30000,
"BackendHeartbeatProbeAddress": 0
}
},
// Polly resilience policies.
@@ -169,6 +180,21 @@ Operational sizing notes:
- A 3 s request timeout is generous compared with typical DL205/DL260 scan times (a few ms to tens of ms for FC03 of 100 registers). The slack absorbs PLC scan-overlap jitter without faulting the upstream client.
- `GracefulShutdownTimeoutMs` should be less than the Service Control Manager's stop deadline. The default 10 s suits a fleet of 54 PLCs; on a much larger fleet, raise both the SCM wait hint and this value in lockstep.
## `Mbproxy.Connection.Keepalive`
TCP keepalive and backend heartbeat settings. Source: `KeepaliveOptions.cs`. Enabled by default: the DL205/DL260 ECOM never emits TCP keepalives, so an idle socket can otherwise be silently dropped by a middlebox after 2-5 minutes. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md) for the full design.
| Field | Type | Default | Notes |
|-------|------|---------|-------|
| `Enabled` | bool | `true` | Master switch. When `false`, neither `SO_KEEPALIVE` nor the backend heartbeat is applied and the proxy behaves exactly as a pre-keepalive build. |
| `TcpIdleTimeMs` | int | `30000` | `SO_KEEPALIVE` idle time before the OS sends its first probe. Applied to the backend socket and accepted upstream sockets. |
| `TcpProbeIntervalMs` | int | `5000` | `SO_KEEPALIVE` interval between probes once idle. |
| `TcpProbeCount` | int | `4` | `SO_KEEPALIVE` unanswered probes before the OS declares the socket dead. |
| `BackendHeartbeatIdleMs` | int | `30000` | After this much backend idle, the proxy issues a synthetic FC03 qty=1 read to keep the path warm and prove the ECOM still answers Modbus. Must be greater than `BackendRequestTimeoutMs`. |
| `BackendHeartbeatProbeAddress` | int | `0` | Modbus PDU address the heartbeat FC03 probe reads. Address `0` (`V0`) is valid on DL205/DL260 in factory absolute mode. Range `[0, 65535]`. |
On hot reload, the heartbeat interval and probe address are re-read on every loop tick. The `Tcp*` socket options are applied at connect/accept time, so a reload affects only sockets opened after the change. A reload where `BackendHeartbeatIdleMs <= BackendRequestTimeoutMs` is rejected — a heartbeat interval at or below the request timeout would fire continuously.
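As a rough illustration of the reload rule and of what the default `Tcp*` values imply, the sketch below uses two hypothetical shell helpers (the names are illustrative and do not exist in mbproxy). The detection-time formula is the usual approximation for OS keepalive, idle time plus one interval per unanswered probe; exact behaviour varies by operating system.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not part of mbproxy.

# Succeeds (exit 0) only when the heartbeat idle interval is strictly greater
# than the backend request timeout, mirroring the documented reload rule.
keepalive_config_valid() {
  local heartbeat_idle_ms=$1 request_timeout_ms=$2
  (( heartbeat_idle_ms > request_timeout_ms ))
}

# Approximate time for the OS to declare an idle, unresponsive socket dead:
# the idle time before the first probe plus one interval per unanswered probe.
so_keepalive_detect_ms() {
  local idle_ms=$1 interval_ms=$2 probe_count=$3
  echo $(( idle_ms + interval_ms * probe_count ))
}

# Documented defaults: heartbeat idle 30000 ms, request timeout 3000 ms.
if keepalive_config_valid 30000 3000; then
  echo "defaults accepted"
fi
# Equal values are rejected, as the reload rule requires.
if ! keepalive_config_valid 3000 3000; then
  echo "equal values rejected"
fi
# Defaults: 30000 ms idle, 5000 ms interval, 4 probes.
so_keepalive_detect_ms 30000 5000 4
```

With the defaults, the OS-level path would take on the order of 50 seconds to give up on a dead peer, which is comfortably inside the 2-5 minute middlebox window mentioned above.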
## `Mbproxy.Resilience`
Polly retry pipelines for backend connect, listener bind, and the in-flight read coalescer. Source: `ResilienceOptions.cs`.
@@ -391,6 +417,7 @@ A reduced view of [`../Features/HotReload.md`](../Features/HotReload.md), restri
| `Plcs[i]` removed | Supervisor stops the listener and closes all upstream connections for that PLC. |
| `Plcs[i].ListenPort` or `Host` changed | Equivalent to remove + add. |
| `Connection.Backend*TimeoutMs` | Next backend connect or request uses the new value. |
| `Connection.Keepalive` heartbeat fields | Re-read on every heartbeat loop tick. `Tcp*` socket options apply to backend/upstream sockets opened after the change. |
| `AdminPort` | Requires a service restart — the Kestrel admin host is built once at startup. |
| `Resilience.ReadCoalescing.Enabled` | Hot-reloadable; in-flight coalesced entries drain naturally. |
| `BcdTags.*.CacheTtlMs`, `Plcs[i].DefaultCacheTtlMs` | Tag-map reseat for the affected PLC drops that PLC's entire cache. |
+20 -4
@@ -135,6 +135,16 @@ These two fields are Tier-2 KPIs intended for memory-budget alerts. The cache is
| `backend.cacheEntryCount` | `long` | `CounterSnapshot.CacheEntryCount` | Current number of cached response entries for this PLC. |
| `backend.cacheBytes` | `long` | `CounterSnapshot.CacheBytes` | Approximate byte cost of the cache entries (response payloads plus key overhead). Used to detect runaway growth from a chatty client. |
### Keepalive counters
These fields describe the backend keepalive heartbeat. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md).
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.backendHeartbeatsSent` | `long` | `CounterSnapshot.BackendHeartbeatsSent` | Synthetic FC03 heartbeat probes issued on this PLC's idle backend socket. |
| `backend.backendHeartbeatsFailed` | `long` | `CounterSnapshot.BackendHeartbeatsFailed` | Heartbeat probes not answered within `BackendRequestTimeoutMs`. Each failure tears the backend down. |
| `backend.backendIdleDisconnects` | `long` | `CounterSnapshot.BackendIdleDisconnects` | Backend teardowns triggered by a failed heartbeat — an event count, distinct from `disconnectCascades` (which counts cascaded pipes). Sustained growth means a PLC is repeatedly going dark while idle. |
### Bytes
| JSON path | Type | Source | Meaning |
@@ -224,7 +234,10 @@ A representative two-PLC deployment, ~2 hours into a run:
"cacheMissCount": 88691,
"cacheInvalidations": 6203,
"cacheEntryCount": 47,
"cacheBytes": 18512
"cacheBytes": 18512,
"backendHeartbeatsSent": 412,
"backendHeartbeatsFailed": 0,
"backendIdleDisconnects": 0
},
"bytes": {
"upstreamIn": 4108290,
@@ -267,7 +280,10 @@ A representative two-PLC deployment, ~2 hours into a run:
"cacheMissCount": 0,
"cacheInvalidations": 0,
"cacheEntryCount": 0,
"cacheBytes": 0
"cacheBytes": 0,
"backendHeartbeatsSent": 0,
"backendHeartbeatsFailed": 0,
"backendIdleDisconnects": 0
},
"bytes": { "upstreamIn": 0, "upstreamOut": 0 }
}
@@ -282,10 +298,10 @@ The HTML renderer is `StatusHtmlRenderer.Render(StatusResponse)` in `src/Mbproxy
Structure:
1. **Header summary** — version, formatted uptime (`Nh MMm SSs`), `bound/configured` listener tally, last reload timestamp, reload count with a `(N rejected)` suffix when applicable.
2. **PLC table** — one row per configured PLC. Columns: Name, Host, Port, State (colour-coded — `bound` = green, `recovering` = orange, `stopped` = grey), Clients (count plus a comma-separated list of `remote (N PDUs)`), PDUs forwarded, FC03/FC04/FC06/FC16/FC? counts, BCD slots, Partial BCD, exception codes 01/02/03/04, RTT (ms), bytes in/out, multiplexer columns (in-flight, max in-flight, TxId wraps, cascades, queue), coalescing ratio cell, cache ratio cell.
2. **PLC table** — one row per configured PLC. Columns: Name, Host, Port, State (colour-coded — `bound` = green, `recovering` = orange, `stopped` = grey), Clients (count plus a comma-separated list of `remote (N PDUs)`), PDUs forwarded, FC03/FC04/FC06/FC16/FC? counts, BCD slots, Partial BCD, exception codes 01/02/03/04, RTT (ms), bytes in/out, multiplexer columns (in-flight, max in-flight, TxId wraps, cascades, queue), coalescing ratio cell, cache ratio cell, keepalive cell.
3. **State cell error detail** — when `state == "recovering"`, the cell also shows `lastBindError` and `(attempt N)` in a small red span.
The coalescing and cache cells each render as `<pct>% (<hits>)`. When neither has been exercised (`hit + miss == 0`), the cell renders an em-dash to keep the column narrow. Page weight is bounded by the design budget (≤ 50 KB for a 54-PLC fleet).
The coalescing and cache cells each render as `<pct>% (<hits>)`. When neither has been exercised (`hit + miss == 0`), the cell renders an em-dash to keep the column narrow. The keepalive cell shows the heartbeat-sent count, with `(fail N, idle-disc N)` appended only when either is non-zero. Page weight is bounded by the design budget (≤ 50 KB for a 54-PLC fleet).
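The keepalive cell rule can be sketched as a small function. This is a shell illustration of the formatting rule only, not the C# code in `StatusHtmlRenderer`; the function name `render_keepalive_cell` and the exact spacing are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the keepalive cell rule: show the sent count,
# and append the failure detail only when either counter is non-zero.
# Usage: render_keepalive_cell SENT FAILED IDLE_DISCONNECTS
render_keepalive_cell() {
  local sent=$1 failed=$2 idle_disc=$3
  if (( failed != 0 || idle_disc != 0 )); then
    echo "${sent} (fail ${failed}, idle-disc ${idle_disc})"
  else
    echo "${sent}"
  fi
}

# Healthy PLC: only the sent count is shown.
render_keepalive_cell 412 0 0
# PLC with failures: the detail suffix is appended.
render_keepalive_cell 412 2 1
```

Suppressing the suffix in the healthy case keeps the column narrow, in line with the em-dash convention used by the coalescing and cache cells.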
The page does not depend on JavaScript. Refresh is driven entirely by the `<meta http-equiv="refresh" content="5">` tag, so any browser — including text-mode browsers — sees the same view.
+25 -2
@@ -2,7 +2,9 @@
Operator diagnosis playbook for mbproxy. Each entry maps an observable symptom to the log event name and status-page counter that confirms it, then lists likely causes and remediation steps.
The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-<date>.log`. The live counters are at `http://<host>:<AdminPort>/status.json` (default port `8080`). Events at Error level and above are also mirrored to the Windows Application Event Log under source `mbproxy`.
The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-<date>.log` on Windows, or `/var/log/mbproxy/mbproxy-<date>.log` on Linux. The live counters are at `http://<host>:<AdminPort>/status.json` (default port `8080`). Events at Error level and above are also mirrored to the **Windows Application Event Log** (Windows Service) or the **local syslog / journal** (systemd) under source `mbproxy` — view the latter with `journalctl -t mbproxy` or `journalctl -u mbproxy`.
Paths and service commands below are written for Windows (`%ProgramData%`, `sc.exe`); the systemd equivalents are `/etc/mbproxy` + `/var/log/mbproxy` and `systemctl start|stop|status mbproxy`.
## Service Startup Failures
@@ -124,7 +126,28 @@ The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-<date>.log`. The l
1. Verify the upstream count on the status page returns to normal as clients reconnect — `plcs[].clients.connected` should climb again within seconds.
2. If cascades fire repeatedly against the same PLC, investigate the PLC and intermediate network for stability. The proxy itself has no state to repair.
3. If cascades correlate with idle periods, the idle middlebox-drop pattern is the likeliest cause; reduce the upstream client's poll interval below the middlebox idle timeout to keep traffic flowing.
3. If cascades correlate with idle periods, the idle middlebox-drop pattern is the likeliest cause. Keepalive is enabled by default and should already be preventing this — confirm `Connection.Keepalive.Enabled` is `true` and that `BackendHeartbeatIdleMs` is comfortably below the middlebox idle timeout. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md).
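One way to make "comfortably below" concrete is a simple margin check. The helper below is hypothetical, and the factor of two is an assumed safety margin, not a value taken from mbproxy; the 120000 ms figure is the low end of the 2-5 minute middlebox range quoted in the keepalive documentation.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeeds when the heartbeat fires at least twice within
# the middlebox idle timeout. The 2x margin is an assumption for illustration.
# Usage: heartbeat_within_margin HEARTBEAT_IDLE_MS MIDDLEBOX_IDLE_MS
heartbeat_within_margin() {
  local heartbeat_idle_ms=$1 middlebox_idle_ms=$2
  (( heartbeat_idle_ms * 2 <= middlebox_idle_ms ))
}

# Default heartbeat (30000 ms) against a 2-minute middlebox timeout.
if heartbeat_within_margin 30000 120000; then
  echo "heartbeat interval leaves enough margin"
fi
# A 90-second heartbeat against the same timeout does not.
if ! heartbeat_within_margin 90000 120000; then
  echo "heartbeat interval too close to the middlebox timeout"
fi
```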
### Backend keepalive heartbeat failing
**Symptom.** A PLC's backend connection is torn down while idle — no client was actively talking to it. `plcs[].backend.backendIdleDisconnects` increments and the upstream clients (if any were attached) are cascaded.
**Where to look.**
- Log events: `mbproxy.keepalive.heartbeat.timeout` (Warning) followed by `mbproxy.keepalive.backend.idle_disconnect` (Information).
- Status fields: `plcs[].backend.backendHeartbeatsSent`, `backendHeartbeatsFailed`, `backendIdleDisconnects`.
**Root causes.**
- The ECOM is reachable at the IP layer but no longer answering Modbus (firmware hang, ECOM reset mid-session).
- The path died between heartbeats and the heartbeat was the first request to discover it — this is the feature working as intended (the failure is found during idle, not on a client request).
- `BackendHeartbeatProbeAddress` points at an address the PLC rejects. The default (0 = `V0`) is safe on DL205/DL260; only an operator override could break this.
**Remediation.**
1. A single idle-disconnect that recovers on the next client request needs no action — the proxy reconnected the path proactively.
2. Repeated idle-disconnects on one PLC mean it keeps going dark while idle. Investigate the device and the network path; the proxy has no state to repair.
3. If `backendHeartbeatsFailed` climbs but the PLC answers real client requests fine, check that `BackendHeartbeatProbeAddress` is a register the device actually serves.
### Request timeout watchdog firing
+49 -3
@@ -6,9 +6,9 @@ The stable catalog of every `mbproxy.*` event name the service emits, with its l
The service uses [Serilog](https://serilog.net/) wired through the `Microsoft.Extensions.Logging` bridge. Three sinks are configured (see `src/Mbproxy/HostingExtensions.cs`):
- **Console**: written to stdout for interactive `--console` runs and for the SCM stdout capture.
- **Rolling file** under `%ProgramData%\mbproxy\logs\` (`mbproxy-<date>.log`).
- **Windows Event Log** — only when running as a Windows Service, and only for events at `Error` and above (see `src/Mbproxy/Diagnostics/EventLogBridge.cs`).
- **Console**: stdout; captured by the Windows SCM or by systemd-journald.
- **Rolling file**: `%ProgramData%\mbproxy\logs\` on Windows, `/var/log/mbproxy/` on Linux (`mbproxy-<date>.log`).
- **Platform diagnostic sink**: `Error`+ events only. `DiagnosticSinkSelector` picks it once at the composition root: the **Windows Application Event Log** under the SCM (`EventLogBridge`), **local syslog** under systemd (`SyslogBridge`), or none for interactive/dev runs.
Every event uses source-generated `[LoggerMessage]` definitions, so the property names below match the message template token-for-token. The default minimum level is `Information`; lower the floor for `Mbproxy.*` categories via the standard `Logging:LogLevel` configuration to surface `Debug` events such as the coalesce and cache traces.
@@ -385,6 +385,51 @@ Fires whenever the entire per-PLC cache is wiped at once — primarily after a b
**Operator action:** none unless flushes happen on a tight loop, which would indicate the backend connection itself is unstable.
## Keepalive
See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md) for the backend heartbeat design.
### mbproxy.keepalive.heartbeat.sent
**Level:** Debug &middot; **EventId:** 150 &middot; **Source:** `src/Mbproxy/Proxy/Multiplexing/KeepaliveLogEvents.cs`
| Property | Type | Meaning |
|----------|------|---------|
| `Plc` | `string` | Configured PLC name. |
| `ProxyTxId` | `ushort` | Proxy-allocated TxId carrying the synthetic FC03 probe. |
| `Address` | `ushort` | Modbus address the probe reads (`BackendHeartbeatProbeAddress`). |
Fires each time the heartbeat loop issues a probe on an idle backend socket — at most one per `BackendHeartbeatIdleMs` per idle PLC.
**Operator action:** none. Debug-level; useful only when confirming the heartbeat is alive.
### mbproxy.keepalive.heartbeat.timeout
**Level:** Warning &middot; **EventId:** 151 &middot; **Source:** `src/Mbproxy/Proxy/Multiplexing/KeepaliveLogEvents.cs`
| Property | Type | Meaning |
|----------|------|---------|
| `Plc` | `string` | Configured PLC name. |
| `ProxyTxId` | `ushort` | Proxy TxId of the unanswered probe. |
| `ElapsedMs` | `long` | Milliseconds from probe send to timeout. |
Fires when a heartbeat probe is not answered within `BackendRequestTimeoutMs` — the backend is connected but no longer answering Modbus.
**Operator action:** check the PLC and the network path. Paired with `mbproxy.keepalive.backend.idle_disconnect` for the same PLC.
### mbproxy.keepalive.backend.idle_disconnect
**Level:** Information &middot; **EventId:** 152 &middot; **Source:** `src/Mbproxy/Proxy/Multiplexing/KeepaliveLogEvents.cs`
| Property | Type | Meaning |
|----------|------|---------|
| `Plc` | `string` | Configured PLC name. |
| `ElapsedMs` | `long` | Milliseconds the failed heartbeat waited before the teardown. |
Fires when a failed heartbeat triggers a proactive backend teardown. Every attached upstream pipe is cascaded; clients reconnect on their next request. This is the keepalive feature doing its job — finding a dead path during idle instead of on the next real request.
**Operator action:** none if isolated. Repeated idle-disconnects on one PLC indicate it keeps going dark while idle — investigate the device or the network path.
## BCD Rewriter
### mbproxy.rewrite.partial_bcd
@@ -495,5 +540,6 @@ Lifecycle events (`startup.*`, `listener.*`, `admin.*`, `shutdown.*`, `config.re
- [Response Cache](../Architecture/ResponseCache.md) — context for the `mbproxy.cache.*` events.
- [Status Page](../Operations/StatusPage.md) — counter equivalents for the high-volume Debug-level events.
- [Read Coalescing](../Architecture/ReadCoalescing.md) — context for the `mbproxy.coalesce.*` events.
- [Keepalive](../Architecture/Keepalive.md) — context for the `mbproxy.keepalive.*` events.
- [BCD Rewriting](../Features/BcdRewriting.md) — context for the `mbproxy.rewrite.*` and `mbproxy.exception.passthrough` events.
- [Hot Reload](../Features/HotReload.md) — context for the `mbproxy.config.reload.*` events.
+4 -1
@@ -165,7 +165,10 @@ if (-not (Test-Path $configDest)) {
if (-not [System.Diagnostics.EventLog]::SourceExists('mbproxy')) {
Write-Host "Registering Windows Event Log source 'mbproxy'..."
New-EventLog -Source 'mbproxy' -LogName 'Application'
# .NET API, not New-EventLog: the *-EventLog cmdlets exist only in Windows
# PowerShell 5.1, not PowerShell 7+. This call is symmetric with the
# SourceExists check above and works on every PowerShell edition.
[System.Diagnostics.EventLog]::CreateEventSource('mbproxy', 'Application')
} else {
Write-Host "Windows Event Log source 'mbproxy' already registered."
}
+134
@@ -0,0 +1,134 @@
#!/usr/bin/env bash
#
# install.sh — install the mbproxy service on a Linux / systemd host.
#
# The Linux counterpart of install.ps1. Copies the published binary to
# /opt/mbproxy, seeds the config at /etc/mbproxy/appsettings.json (preserving any
# existing one), creates the log and bundle-cache directories and the mbproxy
# service account, installs the systemd unit, and enables + starts the service.
#
# Re-running on an already-installed service is safe (idempotent): the binary is
# refreshed, an existing /etc/mbproxy/appsettings.json is preserved, and the
# service is restarted.
#
# Usage:
# sudo ./install.sh [--publish-dir DIR] [--no-start]
#
# --publish-dir DIR directory containing the published Mbproxy binary.
# Default: <repo>/publish-out/self-contained
# --no-start install and enable the unit but do not start it.
#
set -euo pipefail
# ── 0. Settings ──────────────────────────────────────────────────────────────
SERVICE_NAME="mbproxy"
SERVICE_USER="mbproxy"
INSTALL_DIR="/opt/mbproxy"
CONFIG_DIR="/etc/mbproxy"
LOG_DIR="/var/log/mbproxy"
CACHE_DIR="/var/cache/mbproxy"
UNIT_DEST="/etc/systemd/system/${SERVICE_NAME}.service"
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
repo_root="$(dirname "$script_dir")"
publish_dir="${repo_root}/publish-out/self-contained"
start_service=1
while [[ $# -gt 0 ]]; do
case "$1" in
--publish-dir) publish_dir="$2"; shift 2 ;;
--no-start) start_service=0; shift ;;
*) echo "Unknown argument: $1" >&2; exit 2 ;;
esac
done
# ── 1. Pre-flight checks ─────────────────────────────────────────────────────
if [[ "$(id -u)" -ne 0 ]]; then
echo "install.sh must run as root (use sudo)." >&2
exit 1
fi
binary_src="${publish_dir}/Mbproxy"
if [[ ! -f "$binary_src" ]]; then
echo "Mbproxy binary not found at '${binary_src}'." >&2
echo "Run install/publish.sh first, or pass --publish-dir." >&2
exit 1
fi
unit_src="${script_dir}/mbproxy.service"
config_src="${publish_dir}/appsettings.json"
if [[ ! -f "$unit_src" ]]; then
echo "Unit file not found at '${unit_src}'." >&2
exit 1
fi
echo "Installing ${SERVICE_NAME} service..."
echo " Publish dir : ${publish_dir}"
echo " Install dir : ${INSTALL_DIR}"
echo " Config dir : ${CONFIG_DIR}"
# ── 2. Service account ───────────────────────────────────────────────────────
if ! id -u "$SERVICE_USER" >/dev/null 2>&1; then
echo "Creating service account '${SERVICE_USER}'..."
useradd --system --no-create-home --shell /usr/sbin/nologin "$SERVICE_USER"
else
echo "Service account '${SERVICE_USER}' already exists."
fi
# ── 3. Stop the service if running (so the binary can be replaced) ───────────
if systemctl is-active --quiet "$SERVICE_NAME" 2>/dev/null; then
echo "Stopping running service '${SERVICE_NAME}'..."
systemctl stop "$SERVICE_NAME"
fi
# ── 4. Directories ───────────────────────────────────────────────────────────
install -d -m 0755 "$INSTALL_DIR"
install -d -m 0755 "$CONFIG_DIR"
install -d -m 0750 -o "$SERVICE_USER" -g "$SERVICE_USER" "$LOG_DIR"
install -d -m 0750 -o "$SERVICE_USER" -g "$SERVICE_USER" "$CACHE_DIR"
# ── 5. Binary ────────────────────────────────────────────────────────────────
echo "Copying binary to '${INSTALL_DIR}/Mbproxy'..."
install -m 0755 "$binary_src" "${INSTALL_DIR}/Mbproxy"
# ── 6. Config (preserve an existing one) ─────────────────────────────────────
config_dest="${CONFIG_DIR}/appsettings.json"
if [[ -f "$config_dest" ]]; then
echo "Preserving existing config at '${config_dest}'."
elif [[ -f "$config_src" ]]; then
echo "Seeding config template to '${config_dest}'..."
install -m 0644 "$config_src" "$config_dest"
else
echo "WARNING: no appsettings.json in '${publish_dir}' — create '${config_dest}' manually." >&2
fi
# ── 7. systemd unit ──────────────────────────────────────────────────────────
echo "Installing systemd unit to '${UNIT_DEST}'..."
install -m 0644 "$unit_src" "$UNIT_DEST"
systemctl daemon-reload
systemctl enable "$SERVICE_NAME" >/dev/null
# ── 8. Start ─────────────────────────────────────────────────────────────────
if [[ "$start_service" -eq 1 ]]; then
echo "Starting service '${SERVICE_NAME}'..."
systemctl start "$SERVICE_NAME"
sleep 1
if systemctl is-active --quiet "$SERVICE_NAME"; then
echo "Service '${SERVICE_NAME}' is running."
else
echo "WARNING: service '${SERVICE_NAME}' did not reach active state." >&2
echo "Check: journalctl -u ${SERVICE_NAME} -e" >&2
fi
fi
echo ""
echo "Install complete."
echo " Config : ${config_dest}"
echo " Logs : ${LOG_DIR}"
echo " Binary : ${INSTALL_DIR}/Mbproxy"
echo ""
echo "Next steps:"
echo " 1. Edit '${config_dest}' to configure your PLC list and BCD tags."
echo " 2. Restart: sudo systemctl restart ${SERVICE_NAME}"
echo " 3. Logs: journalctl -u ${SERVICE_NAME} -f"
echo " 4. Status: http://localhost:8080/"
+28 -1
@@ -99,7 +99,34 @@
// Max time (ms) to wait for in-flight PDUs to complete during graceful shutdown
// (sc.exe stop / Windows Service stop signal). After this deadline the coordinator
// cancels remaining work and proceeds. Keep at or below the SCM wait-hint (30 s).
"GracefulShutdownTimeoutMs": 10000
"GracefulShutdownTimeoutMs": 10000,
// Keepalive / connection monitoring
// The DL205/DL260 ECOM does not emit TCP keepalives, so an idle backend
// socket can be silently dropped by a middlebox (switch, firewall, NAT)
// after 2-5 minutes. This section enables OS-level SO_KEEPALIVE on both
// backend and upstream sockets, and drives a periodic Modbus FC03 heartbeat
// on each idle backend socket so a dead path is detected before a real
// client request hits it. See docs/Architecture/Keepalive.md.
"Keepalive": {
// Master switch. false = no SO_KEEPALIVE and no heartbeat; the proxy
// behaves exactly as a pre-keepalive build.
"Enabled": true,
// SO_KEEPALIVE: idle time (ms) before the OS sends its first probe.
"TcpIdleTimeMs": 30000,
// SO_KEEPALIVE: interval (ms) between probes once the idle time elapses.
"TcpProbeIntervalMs": 5000,
// SO_KEEPALIVE: unanswered probes before the OS declares the socket dead.
"TcpProbeCount": 4,
// Backend heartbeat: after this much backend idle (ms) the proxy issues a
// synthetic FC03 qty=1 read to keep the path warm and prove the ECOM is
// still answering Modbus. Must be greater than BackendRequestTimeoutMs.
"BackendHeartbeatIdleMs": 30000,
// FC03 PDU address the heartbeat reads. 0 = V0, valid on DL205/DL260.
"BackendHeartbeatProbeAddress": 0
}
},
// Resilience policies
@@ -0,0 +1,255 @@
// mbproxy configuration template (Linux / systemd): copy to /etc/mbproxy/appsettings.json
// and edit before starting the service.
//
// The .NET configuration loader accepts // and /* */ comments in JSON files
// (JSONC semantics) when using the default Host.CreateApplicationBuilder path.
//
// IMPORTANT: install.sh overwrites this file at the destination ONLY if no
// appsettings.json already exists there. An existing file is always preserved.
//
// This is the Linux counterpart of mbproxy.config.template.json: identical except
// for the rolling-log path (/var/log/mbproxy) and a few platform notes. It is shipped
// as appsettings.json by a `dotnet publish -r linux-*` build.
{
"Mbproxy": {
// Global BCD tag list
// These tags apply to EVERY PLC by default.
// Each entry: Address (Modbus PDU address, decimal), Width (16 or 32 bits).
//
// Width 16: one register holds 4 BCD digits (0-9999).
// Wire value 0x1234 decodes to decimal 1234.
//
// Width 32: a CDAB-ordered register pair (Address = low word, Address+1 = high word).
// Decoded decimal = high * 10000 + low (DirectLOGIC CDAB word order).
//
// Per-PLC overrides (see Plcs[].BcdTags below):
// Add: appends extra tags beyond what Global defines, or overrides a
// Global entry's Width when the same Address appears in both.
// Remove: removes specific addresses from the effective set for that PLC.
// Effective set = (Global union Add) minus Remove, resolved per PDU.
"BcdTags": {
"Global": [
// V2000 (octal) = decimal address 1024. 16-bit BCD counter.
{ "Address": 1024, "Width": 16 },
// V2040 (octal) = decimal address 1056. 32-bit BCD total at 1056/1057.
{ "Address": 1056, "Width": 32 },
// V2100 (octal) = decimal address 1088. 16-bit BCD setpoint.
//
// Phase 11: CacheTtlMs (optional) opts this tag into the response cache. With
// CacheTtlMs > 0 set, upstream clients reading this register will see values up
// to CacheTtlMs MILLISECONDS OLD; explicit acknowledgement of the staleness
// window is required by enabling it. Default (omitted or 0) = cache disabled
// for this tag. The cache is OFF by default for every tag.
{ "Address": 1088, "Width": 16 /* , "CacheTtlMs": 1000 */ }
]
},
// PLC list
// Each entry maps one upstream proxy port to one backend PLC.
// Upstream clients connect to ListenPort; the proxy forwards to Host:Port.
//
// IMPORTANT: H2-ECOM100 modules accept at most 4 simultaneous TCP connections.
// With the 1:1 upstream-to-backend model, a fifth upstream client to the same proxy
// port will cause a backend connect failure and an immediate upstream disconnect.
"Plcs": [
{
"Name": "Line1-Mixer", // Human-readable name (shown on status page and in logs)
"ListenPort": 5020, // Port the proxy listens on (upstream clients connect here)
"Host": "10.0.1.1", // PLC IP address or hostname
"Port": 502, // PLC Modbus TCP port (almost always 502)
"BcdTags": {
// Additional 32-bit tag specific to this PLC only.
"Add": [
{ "Address": 1200, "Width": 32 }
],
// Remove address 1056 from the Global list for this PLC
// (this mixer doesn't use the 32-bit BCD total).
"Remove": [ 1056 ]
}
},
{
"Name": "Line1-Conveyor",
"ListenPort": 5021,
"Host": "10.0.1.2",
"Port": 502
// No BcdTags override: uses the Global set as-is.
}
// Add one entry per PLC. Ports must be unique per host. Typical fleet: 54 PLCs.
],
// Admin port
// Read-only HTTP status page.
// GET / -> self-contained HTML (auto-refreshes every 5 s)
// GET /status.json -> same data as JSON for monitoring scrapers
//
// Authentication is assumed at the network layer (trusted internal segment).
// Set to 0 to disable the admin endpoint.
"AdminPort": 8080,
// Connection timeouts
"Connection": {
// Max time (ms) to wait for a TCP connect to the PLC backend.
// Each Polly retry attempt gets its own copy of this timeout.
"BackendConnectTimeoutMs": 3000,
// Max time (ms) to wait for the PLC to respond to a forwarded PDU.
// Non-idempotent FC06/FC16 writes are one-shot: the upstream client
// is disconnected immediately on timeout (no retry).
"BackendRequestTimeoutMs": 3000,
// Max time (ms) to wait for in-flight PDUs to complete during graceful shutdown
// (systemctl stop -> SIGTERM). After this deadline the coordinator cancels
// remaining work and proceeds. Keep at or below the unit's TimeoutStopSec.
"GracefulShutdownTimeoutMs": 10000,
// Keepalive / connection monitoring
// The DL205/DL260 ECOM does not emit TCP keepalives, so an idle backend
// socket can be silently dropped by a middlebox (switch, firewall, NAT)
// after 2-5 minutes. This section enables OS-level SO_KEEPALIVE on both
// backend and upstream sockets, and drives a periodic Modbus FC03 heartbeat
// on each idle backend socket so a dead path is detected before a real
// client request hits it. See docs/Architecture/Keepalive.md.
"Keepalive": {
// Master switch. false = no SO_KEEPALIVE and no heartbeat; the proxy
// behaves exactly as a pre-keepalive build.
"Enabled": true,
// SO_KEEPALIVE: idle time (ms) before the OS sends its first probe.
"TcpIdleTimeMs": 30000,
// SO_KEEPALIVE: interval (ms) between probes once the idle time elapses.
"TcpProbeIntervalMs": 5000,
// SO_KEEPALIVE: unanswered probes before the OS declares the socket dead.
"TcpProbeCount": 4,
// Backend heartbeat: after this much backend idle (ms) the proxy issues a
// synthetic FC03 qty=1 read to keep the path warm and prove the ECOM is
// still answering Modbus. Must be greater than BackendRequestTimeoutMs.
"BackendHeartbeatIdleMs": 30000,
// FC03 PDU address the heartbeat reads. 0 = V0, valid on DL205/DL260.
"BackendHeartbeatProbeAddress": 0
}
},
// Resilience policies
"Resilience": {
// Polly retry policy for backend TCP connect attempts.
// MaxAttempts: total connect tries (including the first).
// BackoffMs: delay between each attempt (must have MaxAttempts-1 entries).
"BackendConnect": {
"MaxAttempts": 3,
"BackoffMs": [ 100, 500, 2000 ]
},
// Polly recovery policy for listener bind failures.
// If a PLC's listen port can't be bound (in-use, bad IP, transient OS error),
// the supervisor retries according to this schedule.
// InitialBackoffMs: backoff per step (first N retries).
// SteadyStateMs: backoff for all subsequent retries (runs indefinitely).
"ListenerRecovery": {
"InitialBackoffMs": [ 1000, 2000, 5000, 15000, 30000 ],
"SteadyStateMs": 30000
},
// Phase 10: in-flight read coalescing.
//
// When two or more upstream clients (HMI / historian / engineering workstation /
// gateway) issue the SAME FC03 or FC04 read while a matching backend round-trip is
// already in flight, the proxy attaches the late arrivals to the existing in-flight
// entry and fans the single PLC response out to every attached client, saving the
// ECOM's per-scan PDU budget on duplicated reads.
//
// Zero post-response staleness: coalescing operates ONLY between "first request
// sent to PLC" and "response received from PLC" (microseconds to ~10 ms typical).
// Each upstream client still sees its own MBAP transaction ID echoed correctly;
// the proxy is transparent.
//
// FC06 / FC16 writes are NEVER coalesced (non-idempotent). FC03 vs FC04 are
// separate Modbus tables and never share a coalescing key. Different unit IDs
// (multi-drop / gateway-backed setups) never coalesce.
//
// Enabled: master switch. Hot-reloadable; flipping to false leaves running
// coalesced entries to drain naturally.
// MaxParties: per-entry cap on attached parties. Past the cap, the next
// identical request opens a fresh backend round-trip (load-shedding
// safety valve for very fan-out-heavy fleets).
"ReadCoalescing": {
"Enabled": true,
"MaxParties": 32
}
},
// Response cache (Phase 11): opt-in bounded-staleness cache
//
// DESIGN-CONTRACT PIVOT: with caching enabled the proxy is no longer purely
// transparent. Upstream FC03/FC04 reads for cache-enabled tags may return values
// up to CacheTtlMs MILLISECONDS OLD. Operators opt tags in by setting a non-zero
// CacheTtlMs on a BcdTagOptions entry (or DefaultCacheTtlMs on a PlcOptions entry).
//
// The cache is OFF BY DEFAULT for every tag. A deployment with NO TTL config (this
// section entirely absent and no BcdTags.*.CacheTtlMs / Plcs[i].DefaultCacheTtlMs)
// behaves IDENTICALLY to a pre-Phase-11 deployment: no behaviour change.
//
// AllowLongTtl: gate for any CacheTtlMs > 60_000. Reload validation
// rejects configs that exceed 60 s without this opt-in,
// to prevent accidentally-stale-for-an-hour deployments.
// MaxEntriesPerPlc: LRU cap per PLC. Past this cap, the next insert evicts
// the least-recently-used entry. Defaults to 1000.
// EvictionIntervalMs: background eviction tick. Scans each PLC's cache and
// removes entries past their TTL. Defaults to 5000.
//
// Properties (full text in docs/Architecture/ResponseCache.md):
// * Cache hits SHORT-CIRCUIT coalescing entirely (cache → coalesce → backend).
// * Successful FC06/FC16 write responses invalidate every cached FC03/FC04 entry
// whose address range OVERLAPS the write, not just exact-key match.
// * Multi-tag read range: effective TTL = min(TTLs). Any tag with TTL=0 in the
// range disables caching for the whole read.
// * Cache stores POST-rewriter bytes; hits never re-invoke the BCD rewriter.
// * Tag-list hot-reload flushes the affected PLC's whole cache.
// * No persistence: a process restart wipes the cache.
"Cache": {
"AllowLongTtl": false,
"MaxEntriesPerPlc": 1000,
"EvictionIntervalMs": 5000
}
},
// Serilog
// Structured log output. Default: Information level, console + rolling-file.
// The console sink is captured by systemd-journald (view with `journalctl -u mbproxy`).
// In addition, when mbproxy runs as a systemd service the SyslogBridge writes Error+
// events to the local syslog with proper RFC5424 severity (wired in code, not here).
"Serilog": {
"Using": [ "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
"MinimumLevel": {
"Default": "Information",
"Override": {
"Microsoft": "Warning",
"System": "Warning"
}
},
"WriteTo": [
{
"Name": "Console",
"Args": {
"outputTemplate": "[{Timestamp:HH:mm:ss} {Level:u3}] {Message:lj} {Properties:j}{NewLine}{Exception}"
}
},
{
"Name": "File",
"Args": {
// Rolling log: one file per day, kept for 30 days, under /var/log/mbproxy
// (created by install.sh and owned by the mbproxy service account).
// Survives uninstall: uninstall.sh archives logs to /var/log/mbproxy.archived-<ts>.
"path": "/var/log/mbproxy/mbproxy-.log",
"rollingInterval": "Day",
"retainedFileCountLimit": 30,
"outputTemplate": "[{Timestamp:yyyy-MM-dd HH:mm:ss.fff zzz} {Level:u3}] {Message:lj} {Properties:j}{NewLine}{Exception}"
}
}
]
}
}
+45
@@ -0,0 +1,45 @@
# systemd unit for mbproxy — the Modbus TCP BCD proxy.
#
# Installed to /etc/systemd/system/mbproxy.service by install.sh.
# The Linux counterpart of the Windows Service registered by install.ps1.
#
# Type=exec (not Type=notify): mbproxy is a leaf service that nothing orders
# against, so systemd's readiness signal is unnecessary. Type=exec marks the
# unit active once the binary is exec'd; graceful stop still works because the
# .NET generic host handles SIGTERM directly (drains in-flight requests within
# Connection.GracefulShutdownTimeoutMs).
[Unit]
Description=mbproxy — Modbus TCP BCD proxy
After=network-online.target
Wants=network-online.target
[Service]
Type=exec
ExecStart=/opt/mbproxy/Mbproxy
WorkingDirectory=/etc/mbproxy
User=mbproxy
Group=mbproxy
# Restart on crash, but not on a clean SIGTERM stop.
Restart=on-failure
RestartSec=5
# Keep above Connection.GracefulShutdownTimeoutMs (default 10 s) so the drain
# completes before systemd escalates to SIGKILL.
TimeoutStopSec=30
# Self-contained single-file publish: pin native-library extraction to a stable,
# writable directory (install.sh creates it and grants the mbproxy account access).
Environment=DOTNET_BUNDLE_EXTRACT_BASE_DIR=/var/cache/mbproxy
# Hardening. The service only needs to write its log and bundle-cache directories.
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ReadWritePaths=/var/log/mbproxy /var/cache/mbproxy
# If any configured ListenPort is below 1024, also add:
# AmbientCapabilities=CAP_NET_BIND_SERVICE
[Install]
WantedBy=multi-user.target
+34 -21
@@ -1,19 +1,27 @@
<#
.SYNOPSIS
Publishes Mbproxy.exe in two flavours: self-contained and framework-dependent.
Publishes the Mbproxy binary in two flavours: self-contained and framework-dependent.
.DESCRIPTION
Produces two single-file win-x64 builds under <repo>\publish-out\:
Produces two single-file builds for the requested runtime under <repo>\publish-out\:
self-contained\Mbproxy.exe ~100 MB bundles the .NET 10 runtime;
no .NET install needed on target.
framework-dependent\Mbproxy.exe ~1.6 MB requires .NET 10 + ASP.NET Core
runtime preinstalled on target.
self-contained\ ~100 MB bundles the .NET 10 + ASP.NET Core runtime;
no .NET install needed on the target.
framework-dependent\ ~1.6 MB requires the .NET 10 + ASP.NET Core runtime
preinstalled on the target.
Both builds use the Release configuration and inherit the publish settings
declared in src\Mbproxy\Mbproxy.csproj (PublishSingleFile=true,
IncludeNativeLibrariesForSelfExtract=true). The framework-dependent build
overrides SelfContained=false on the command line.
The runtime is selected with -Rid (default win-x64). The binary is Mbproxy.exe on
Windows RIDs and Mbproxy on Linux/macOS RIDs.
Both builds use the Release configuration and inherit the publish settings declared
in src\Mbproxy\Mbproxy.csproj (PublishSingleFile=true, SelfContained=true,
IncludeNativeLibrariesForSelfExtract=true; those settings are gated on an explicit
RID, which is supplied here). The framework-dependent build overrides
SelfContained=false on the command line.
.PARAMETER Rid
.NET runtime identifier to publish for. Examples: win-x64, linux-x64.
Default: win-x64
.PARAMETER OutputDir
Root output directory. Two subfolders are created beneath it.
@@ -24,10 +32,12 @@
.EXAMPLE
.\publish.ps1
.\publish.ps1 -Clean
.\publish.ps1 -Rid linux-x64
.\publish.ps1 -Rid win-x64 -Clean
#>
[CmdletBinding()]
param(
[string]$Rid = 'win-x64',
[string]$OutputDir = (Join-Path (Split-Path -Parent $PSScriptRoot) 'publish-out'),
[switch]$Clean
)
@@ -46,15 +56,18 @@ if ($Clean -and (Test-Path $OutputDir)) {
Remove-Item -Recurse -Force $OutputDir
}
# Binary name: Windows RIDs produce an .exe, every other RID produces an extensionless ELF/Mach-O.
$exeName = if ($Rid -like 'win-*') { 'Mbproxy.exe' } else { 'Mbproxy' }
$selfContainedOut = Join-Path $OutputDir 'self-contained'
$frameworkDependentOut = Join-Path $OutputDir 'framework-dependent'
Write-Host "`n=== Publishing self-contained (~100 MB) ===" -ForegroundColor Cyan
& dotnet publish $csproj -c Release -r win-x64 -o $selfContainedOut --nologo
Write-Host "`n=== Publishing self-contained ($Rid, ~100 MB) ===" -ForegroundColor Cyan
& dotnet publish $csproj -c Release -r $Rid -o $selfContainedOut --nologo
if ($LASTEXITCODE -ne 0) { throw "self-contained publish failed (exit $LASTEXITCODE)" }
Write-Host "`n=== Publishing framework-dependent (~1.6 MB) ===" -ForegroundColor Cyan
& dotnet publish $csproj -c Release -r win-x64 -p:SelfContained=false -p:PublishSingleFile=true -o $frameworkDependentOut --nologo
Write-Host "`n=== Publishing framework-dependent ($Rid, ~1.6 MB) ===" -ForegroundColor Cyan
& dotnet publish $csproj -c Release -r $Rid -p:SelfContained=false -p:PublishSingleFile=true -o $frameworkDependentOut --nologo
if ($LASTEXITCODE -ne 0) { throw "framework-dependent publish failed (exit $LASTEXITCODE)" }
function Format-Size {
@@ -63,14 +76,14 @@ function Format-Size {
else { '{0:N1} KB' -f ($Bytes / 1KB) }
}
Write-Host "`n=== Result ===" -ForegroundColor Green
Write-Host "`n=== Result ($Rid) ===" -ForegroundColor Green
foreach ($flavour in 'self-contained','framework-dependent') {
$exe = Join-Path $OutputDir "$flavour\Mbproxy.exe"
if (Test-Path $exe) {
$size = (Get-Item $exe).Length
Write-Host (" {0,-22} {1,10} {2}" -f $flavour, (Format-Size $size), $exe)
$bin = Join-Path $OutputDir "$flavour\$exeName"
if (Test-Path $bin) {
$size = (Get-Item $bin).Length
Write-Host (" {0,-22} {1,10} {2}" -f $flavour, (Format-Size $size), $bin)
} else {
Write-Warning "Missing: $exe"
Write-Warning "Missing: $bin"
}
}
Write-Host ""
+82
@@ -0,0 +1,82 @@
#!/usr/bin/env bash
#
# publish.sh — Linux/macOS counterpart of publish.ps1.
#
# Publishes the Mbproxy binary in two flavours for the requested runtime under
# <repo>/publish-out/:
#
# self-contained/ ~100 MB — bundles the .NET 10 + ASP.NET Core runtime;
# no .NET install needed on the target.
# framework-dependent/ ~1.6 MB — requires the .NET 10 + ASP.NET Core runtime
# preinstalled on the target.
#
# Both builds use the Release configuration and inherit the publish settings in
# src/Mbproxy/Mbproxy.csproj (those settings are gated on an explicit RID, which
# is supplied here). The framework-dependent build overrides SelfContained=false.
#
# Usage:
# ./publish.sh [-r RID] [-o OUTPUT_DIR] [--clean]
#
# -r RID .NET runtime identifier (default: linux-x64)
# -o OUTPUT_DIR root output directory (default: <repo>/publish-out)
# --clean delete OUTPUT_DIR before publishing
#
# Examples:
# ./publish.sh
# ./publish.sh -r linux-x64 --clean
#
set -euo pipefail
rid="linux-x64"
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
repo_root="$(dirname "$script_dir")"
output_dir="$repo_root/publish-out"
clean=0
while [[ $# -gt 0 ]]; do
case "$1" in
-r) rid="$2"; shift 2 ;;
-o) output_dir="$2"; shift 2 ;;
--clean) clean=1; shift ;;
*) echo "Unknown argument: $1" >&2; exit 2 ;;
esac
done
csproj="$repo_root/src/Mbproxy/Mbproxy.csproj"
if [[ ! -f "$csproj" ]]; then
echo "Cannot find $csproj" >&2
exit 1
fi
if [[ "$clean" -eq 1 && -d "$output_dir" ]]; then
echo "Cleaning $output_dir"
rm -rf "$output_dir"
fi
# Binary name: Windows RIDs produce an .exe, every other RID an extensionless binary.
if [[ "$rid" == win-* ]]; then bin_name="Mbproxy.exe"; else bin_name="Mbproxy"; fi
self_contained_out="$output_dir/self-contained"
framework_dependent_out="$output_dir/framework-dependent"
echo
echo "=== Publishing self-contained ($rid, ~100 MB) ==="
dotnet publish "$csproj" -c Release -r "$rid" -o "$self_contained_out" --nologo
echo
echo "=== Publishing framework-dependent ($rid, ~1.6 MB) ==="
dotnet publish "$csproj" -c Release -r "$rid" \
-p:SelfContained=false -p:PublishSingleFile=true -o "$framework_dependent_out" --nologo
echo
echo "=== Result ($rid) ==="
for flavour in self-contained framework-dependent; do
bin="$output_dir/$flavour/$bin_name"
if [[ -f "$bin" ]]; then
size="$(du -h "$bin" | cut -f1)"
printf ' %-22s %8s %s\n' "$flavour" "$size" "$bin"
else
echo " WARNING: missing $bin" >&2
fi
done
echo
+4 -1
@@ -122,7 +122,10 @@ if (Test-Path $InstallPath) {
if ([System.Diagnostics.EventLog]::SourceExists('mbproxy')) {
Write-Host "Removing Windows Event Log source 'mbproxy'..."
try {
Remove-EventLog -Source 'mbproxy'
# .NET API, not Remove-EventLog: the *-EventLog cmdlets exist only in
# Windows PowerShell 5.1, not PowerShell 7+. Symmetric with the
# SourceExists check above.
[System.Diagnostics.EventLog]::DeleteEventSource('mbproxy')
} catch {
Write-Warning "Could not remove Event Log source: $_"
}
+85
@@ -0,0 +1,85 @@
#!/usr/bin/env bash
#
# uninstall.sh — remove the mbproxy service from a Linux / systemd host.
#
# The Linux counterpart of uninstall.ps1. Stops and disables the service,
# removes the systemd unit and installed files, and (unless --keep-config)
# removes the config directory. Log files are always preserved: they are moved
# to a timestamped archive so post-uninstall diagnostics remain accessible.
#
# Usage:
# sudo ./uninstall.sh [--keep-config] [--keep-user]
#
# --keep-config leave /etc/mbproxy/appsettings.json in place.
# --keep-user leave the mbproxy service account in place.
#
set -euo pipefail
SERVICE_NAME="mbproxy"
SERVICE_USER="mbproxy"
INSTALL_DIR="/opt/mbproxy"
CONFIG_DIR="/etc/mbproxy"
LOG_DIR="/var/log/mbproxy"
CACHE_DIR="/var/cache/mbproxy"
UNIT_DEST="/etc/systemd/system/${SERVICE_NAME}.service"
keep_config=0
keep_user=0
while [[ $# -gt 0 ]]; do
case "$1" in
--keep-config) keep_config=1; shift ;;
--keep-user) keep_user=1; shift ;;
*) echo "Unknown argument: $1" >&2; exit 2 ;;
esac
done
if [[ "$(id -u)" -ne 0 ]]; then
echo "uninstall.sh must run as root (use sudo)." >&2
exit 1
fi
echo "Uninstalling ${SERVICE_NAME} service..."
# ── 1. Stop + disable the service ────────────────────────────────────────────
if systemctl list-unit-files "${SERVICE_NAME}.service" >/dev/null 2>&1 \
&& [[ -n "$(systemctl list-unit-files "${SERVICE_NAME}.service" --no-legend 2>/dev/null)" ]]; then
echo "Stopping and disabling '${SERVICE_NAME}'..."
systemctl disable --now "$SERVICE_NAME" >/dev/null 2>&1 || true
fi
# ── 2. Remove the systemd unit ───────────────────────────────────────────────
if [[ -f "$UNIT_DEST" ]]; then
echo "Removing systemd unit '${UNIT_DEST}'..."
rm -f "$UNIT_DEST"
fi
systemctl daemon-reload
systemctl reset-failed "$SERVICE_NAME" >/dev/null 2>&1 || true
# ── 3. Archive logs (always preserved, never deleted) ────────────────────────
if [[ -d "$LOG_DIR" ]]; then
timestamp="$(date -u +%Y%m%dT%H%M%SZ)"
archive_dir="${LOG_DIR}.archived-${timestamp}"
echo "Archiving logs to '${archive_dir}'..."
mv "$LOG_DIR" "$archive_dir"
fi
# ── 4. Remove installed files ────────────────────────────────────────────────
rm -rf "$INSTALL_DIR" "$CACHE_DIR"
if [[ "$keep_config" -eq 1 ]]; then
echo "Keeping config at '${CONFIG_DIR}/appsettings.json' (--keep-config)."
else
rm -rf "$CONFIG_DIR"
fi
# ── 5. Remove the service account ────────────────────────────────────────────
if [[ "$keep_user" -eq 0 ]] && id -u "$SERVICE_USER" >/dev/null 2>&1; then
echo "Removing service account '${SERVICE_USER}'..."
userdel "$SERVICE_USER" 2>/dev/null || true
fi
echo ""
echo "Uninstall complete."
if compgen -G "${LOG_DIR}.archived-*" >/dev/null; then
echo "Archived logs: ${LOG_DIR}.archived-*"
fi
+576
@@ -0,0 +1,576 @@
# mbproxy Multiplatform Implementation Plan
**Created:** 2026-05-15
**Status:** All six phases implemented. 413 tests green on Windows; Windows Service and
Linux systemd install E2E both green. Two findings (pymodbus-sim-on-Linux, `AddSystemd()`
notify) logged as orthogonal follow-ups. Working tree only — nothing committed.
**Working artifact** — not part of the `docs/` source-of-truth tree (per `../DOCS-GUIDE.md`).
Delete or archive once the work lands.
### Progress log
- **2026-05-15 — Phase 1 done, Gate 1 green.** RID removed from `csproj`
(single-file settings now gated on `'$(RuntimeIdentifier)' != ''`);
`publish.ps1` gained `-Rid`; `publish.sh` added. `dotnet build -c Debug` 0
warnings; `dotnet test` **398 passed / 0 failed** (baseline 325 → 398, the
Keepalive feature added tests); `win-x64` → `Mbproxy.exe` 100.1 MB,
`linux-x64` → `Mbproxy` ELF 97.2 MB. ELF launch-smoked on `10.100.0.35`:
full startup, listeners bound, `mbproxy.startup.ready` + admin endpoint up,
no errors. Box prep done (.NET SDK 10.0.300, shellcheck 0.10.0 installed).
- **2026-05-15 — Phases 2 + 3 code done (combined integrator pass).** Packages
added: `Microsoft.Extensions.Hosting.Systemd` 10.0.8,
`Serilog.Sinks.SyslogMessages` 4.1.0 (the maintained IonxSolutions package —
the bare `Serilog.Sinks.Syslog` ID is a near-abandoned 0.2.0 package; same
approved intent). New `DiagnosticSink` enum + `DiagnosticSinkSelector` (pure);
new `SyslogBridge`; `EventLogBridge` truncation extracted to a non-annotated
`EventLogMessage` type (testable cross-OS). `AddMbproxySerilog` now selects
the sink internally; `Program.cs` calls `AddSystemd()` + `AddWindowsService()`.
13 new tests. **411 passed / 0 failed on Windows**; on `10.100.0.35`
**372 passed / 39 skipped / 0 failed** — all 39 skips are simulator-backed
E2E (see finding below), every host/diagnostic/smoke test green on Linux.
- **2026-05-15 — Two cross-platform bugs found and fixed in install tooling.**
(1) `tests/sim/run-dl205-sim.ps1` was Windows-only — hardcoded venv paths
`Scripts\*.exe`; now branches `Scripts`/`.exe` vs `bin`/no extension on `$IsWindows`
and adds `python3` to the interpreter candidates. (2) `install.ps1` /
`uninstall.ps1` used `New-EventLog` / `Remove-EventLog`, which exist only in
Windows PowerShell 5.1 — they fail under PowerShell 7+. Switched to the .NET
API (`[EventLog]::CreateEventSource` / `DeleteEventSource`), symmetric with
the `SourceExists` calls already in those scripts.
- **2026-05-15 — Windows Service E2E green (local, admin).** Republished
`win-x64`; `install.ps1 -Start` installs + starts the service; verified
Running/Automatic, `status.json` served, listeners bound,
`mbproxy.startup.ready` logged, Event Log source registered,
`WindowsServiceLifetime` wrote "Service started successfully" (proves the
process runs under the SCM). `uninstall.ps1` stopped/deleted the service,
archived logs, removed the Event Log source. Box left clean. (A forced
`EventLogBridge` Error+ write was not pursued — `Emit` is unchanged code,
covered by `EventLogMessageTests`; sink selection is covered by
`DiagnosticSinkSelectorTests`.)
- **2026-05-15 — Linux systemd E2E done.** The `linux-x64` ELF runs under a
real systemd unit on `10.100.0.35`: starts, binds listeners, serves the
admin endpoint, and `systemctl stop` → graceful SIGTERM drain
(`mbproxy.shutdown.complete` in the journal). `Type=notify` does not work
(see Findings) → Phase 5 will ship `Type=exec`. Box prep this session:
`dotnet-sdk-10.0`, `shellcheck`, `python3-venv`, pwsh 7.6.1 (dotnet global
tool), pymodbus 3.13.0 venv.
- **2026-05-15 — Phases 4–6 done.** Phase 4: new `install/mbproxy.linux.config.template.json`
(Unix log path `/var/log/mbproxy`, systemd-oriented comments); `csproj` links the
platform-correct template into the published `appsettings.json` by RID
(`win-*`/RID-less → Windows, else Unix) — verified by publishing both RIDs;
`MbproxyOptionsBindingTests` extended to load + schema-validate both templates
(now 413 tests on Windows). Phase 5: `install/mbproxy.service` (`Type=exec`,
hardened, `mbproxy` service account), `install/install.sh`, `install/uninstall.sh`, both
`shellcheck` clean; install→active→`status.json` served→uninstall→clean E2E
passed on `10.100.0.35`. Phase 6: `README.md`, `mbproxy/CLAUDE.md`,
`../CLAUDE.md`, `docs/Operations/Configuration.md`, `docs/Reference/LogEvents.md`,
`docs/Operations/Troubleshooting.md`, `docs/Architecture/Overview.md`,
`docs/Features/HotReload.md` updated for the dual-platform reality.
### Findings
- **Linux full run: 374 passed / 37 failed / 0 skipped.** With the simulator
launcher fixed and pymodbus provisioned, the simulator-backed E2E tests now
*run* on Linux (0 skipped) but **37 fail** with `IOException: Broken pipe`
(`SocketException`) when the NModbus client writes through the proxy. The
failures are broad across all simulator-backed E2E (cache, forwarding,
rewriter, supervision). **Not a Phases 1–3 regression:** the multiplatform
work touches only build config, diagnostic sinks, and host registration —
none of the Modbus proxy data path. The same 37 tests pass on Windows
(411/411), and every non-E2E test — including all 13 new diagnostic tests —
passes on Linux. **Root cause isolated:** the `SimulatorSmokeTests` — which
connect *directly to the pymodbus simulator with no proxy in the path* — also
fail (TCP connect error). So the fault is the pymodbus 3.13.0 simulator
itself on this box, not mbproxy's proxy code. Likely pymodbus 3.13.0 vs
Python 3.13.5 (both very new), or the box's Docker-host networking. Treated
as a **separate investigation** (pymodbus-simulator-on-Linux), entirely
orthogonal to the multiplatform service work — see the session report.
- The `run-dl205-sim.ps1` idempotency check keys on `Test-Path $venvDir` only;
a venv left structurally broken by a killed run (no `bin/`) is not detected
and re-created. Pre-existing latent gap, not platform-specific — noted, not
fixed (out of scope; a clean run is unaffected).
- **`AddSystemd()` does not deliver `sd_notify(READY=1)` here → Phase 5 uses
`Type=exec`.** mbproxy runs correctly under systemd (starts, binds, serves,
and SIGTERM → graceful drain all work — verified in the journal), but a
`Type=notify` unit never receives `READY=1` and times out. Isolated step by
step: `SystemdHelpers.IsSystemdService()` correctly returns `True` under
systemd; a *minimal* `Host.CreateApplicationBuilder()` + `AddSystemd()` host
reproduces the failure; both a `systemd-run` transient unit and a real
`Type=notify` unit file fail identically. So it is **not an mbproxy bug**:
it is a `HostApplicationBuilder` + `Microsoft.Extensions.Hosting.Systemd`
10.0.8 (minimal-hosting) issue. **Resolution:** the Phase 5 unit uses
`Type=exec` — mbproxy is a leaf service that nothing orders against, so the
readiness signal is unnecessary; `Type=exec` + the generic host's built-in
POSIX `SIGTERM` handling (independent of `SystemdLifetime`) gives a fully
working unit with `Restart=on-failure`. `AddSystemd()` stays in `Program.cs`
(correct, documented, forward-compatible, harmless). Root-causing the .NET
notify gap is logged as a separate follow-up.
A plan to make mbproxy run on Linux (and incidentally macOS) as a first-class
target while keeping the Windows Service + Event Log behavior intact and adding
systemd + journald/syslog equivalents.
The hosting model (`Host.CreateApplicationBuilder` + `IHostedService` + Kestrel)
is already portable, so the work is narrow: generalize the build, abstract one
diagnostic sink, add one package + one call, and add Linux tooling/docs.
---
## 0. Test Environments
Both platforms can be exercised fully — no environment is simulated or
deferred.
### 0.1 Windows (the dev box — local)
The dev box runs **with administrator rights**, so every Windows gate runs
locally with no separate test machine:
- `install.ps1` (requires elevation) installs the real Windows Service.
- The Event Log source `mbproxy` can be registered and `EventLogBridge` writes
verified against the Application log.
- Install → start → stop → uninstall is a full local round-trip.
> Windows Service E2E mutates machine state (a registered service + Event Log
> source). It is **integrator-only** and the integrator always runs
> `uninstall.ps1` to leave the box clean after each gate.
### 0.2 Linux
**Host:** `dohertj2@10.100.0.35` — Debian 13 (trixie), amd64, kernel 6.12,
hostname `DOCKER`. systemd 257.
- **Access:** passwordless SSH from the Windows dev box; passwordless `sudo`
(verified 2026-05-15).
- **Reachable** on `10.100.0.35` (also `10.50.0.35`, `10.200.0.35`).
- **One-time prep** (run once before Wave 1 gates):
```
ssh dohertj2@10.100.0.35 'sudo apt-get update && \
sudo apt-get install -y dotnet-sdk-10.0 shellcheck'
```
`dotnet-sdk-10.0` candidate is `10.0.203` — matches the `net10.0` target.
- **Docker is installed** on the box (the user is in the `docker` group). Use
ephemeral Debian containers to isolate per-subagent E2E runs so parallel
Wave-4 agents don't collide on the host's systemd / ports (see section 3,
rule 8).
**How the integrator uses the box per gate:**
- Push the integration branch (or `rsync` the worktree) to the box, then run
`dotnet build` / `dotnet test` / `dotnet publish -r linux-x64` over SSH.
- Run the *actual* `linux-x64` ELF binary, the systemd unit, and `shellcheck`
here — Windows can cross-*publish* a `linux-x64` binary but cannot *run* or
service-host it.
> The box is a **shared mutable resource**. Host-level mutations (apt installs,
> `systemctl` on the real host, privileged-port binds) are integrator-only and
> run serially between waves. Subagents that need Linux E2E use throwaway
> Docker containers, never the host's init system directly.
---
## 1. Scope
**In scope**
- Linux (`linux-x64`) as a supported runtime target alongside `win-x64`.
- systemd integration (`Type=notify`, sd_notify readiness, SIGTERM drain).
- A Linux-appropriate error-event diagnostic sink (syslog, severity-mapped).
- RID-agnostic build + dual-RID publish tooling.
- Linux install tooling (systemd unit + shell scripts).
- Docs/README/CLAUDE.md updates.
**Out of scope (state explicitly in docs)**
- macOS `launchd` integration — mbproxy will *run* on macOS as a console
process but ships no service-manager integration.
- ARM RIDs (`linux-arm64`) — the build will not *forbid* them, but they are
untested.
- Container/Docker packaging — separate future effort.
**Locked design decisions**
- Reference `Microsoft.Extensions.Hosting.WindowsServices` *and*
`Microsoft.Extensions.Hosting.Systemd` unconditionally; both packages are
portable and both helpers self-detect their host. No conditional
`<PackageReference>`.
- All Windows API calls (`System.Diagnostics.EventLog`) stay behind
`OperatingSystem.IsWindows()` + `[SupportedOSPlatform("windows")]`; CA1416
(already enforced via `TreatWarningsAsErrors`) is the safety net.
- Diagnostic sink selection happens **once**, at the composition root
(`AddMbproxySerilog`). No OS branching anywhere else.
- Prefer **new files** over editing shared files, to keep parallel work
conflict-free.
- **Linux error-event sink: `Serilog.Sinks.Syslog`** (decided 2026-05-15).
Error+ events get RFC5424 severity mapping on Linux, mirroring the Windows
Event Log behavior where Error+ is surfaced distinctly.
`DiagnosticSinkSelector` returns `EventLog | Syslog | None`.
---
## 2. Phase Breakdown
Each phase lists its **owned file set** (the parallel-safety contract),
changes, tests, and a **gate** that must be green before the next phase starts.
### Phase 1 — Build & publish generalization (foundation)
**Objective:** Remove the hardcoded RID so the project builds/publishes for any
runtime; keep the Windows output byte-identical.
**Owned files**
- `src/Mbproxy/Mbproxy.csproj`
- `install/publish.ps1`
- `install/publish.sh` *(new)*
**Changes**
- `Mbproxy.csproj`: delete `<RuntimeIdentifier>win-x64</RuntimeIdentifier>`
from the Release `PropertyGroup`; keep `PublishSingleFile` / `SelfContained`
/ `IncludeNativeLibrariesForSelfExtract`. RID becomes a publish-time `-r`
argument.
- `publish.ps1`: add a `-Rid` parameter (default `win-x64`), keep the
two-flavor logic.
- `publish.sh`: Linux counterpart producing `linux-x64` self-contained +
framework-dependent builds.
- (The RID-conditioned `appsettings.json` content item is Phase 4; in Phase 1
just confirm the build works without a baked RID.)
**Tests**
- No xunit tests (build-config change). Gate is publish success on both RIDs.
**Gate 1**
- `dotnet build -c Debug` green; `dotnet test` full suite green (unchanged
count).
- `dotnet publish -c Release -r win-x64` produces a single-file `Mbproxy.exe`
(same size class as before).
- `dotnet publish -c Release -r linux-x64` produces a single-file `Mbproxy`
ELF binary. Cross-published from the Windows dev box; the ELF is then copied
to `10.100.0.35` and confirmed to launch (`./Mbproxy --version`-class smoke).
- Zero new analyzer warnings.
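One way to spot-check the "produces a single-file `Mbproxy` ELF binary" condition is to look at the file's magic bytes. The helper below is a minimal sketch, not part of the repository: the function name `is_elf` is hypothetical, and it only confirms the four-byte ELF header, not that the binary launches.

```sh
# is_elf PATH: succeed if the file starts with the ELF magic bytes
# 0x7f 'E' 'L' 'F'. Hypothetical helper, for illustration only.
is_elf() {
    # od prints the first four bytes as hex; tr strips spaces and the newline.
    magic="$(od -An -tx1 -N4 "$1" | tr -d ' \n')"
    [ "$magic" = "7f454c46" ]
}
```

Assuming the default output layout of `publish.sh`, a call such as `is_elf publish-out/self-contained/Mbproxy && echo "ELF ok"` would run after the `linux-x64` publish; the launch smoke on the Linux box is still needed to prove the binary runs.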
---
### Phase 2 — Diagnostic sink abstraction
**Objective:** Make error-event delivery a platform-selected sink. Windows
keeps `EventLogBridge`; Linux gets a syslog sink.
**Owned files**
- `src/Mbproxy/Diagnostics/DiagnosticSinkSelector.cs` *(new — pure selection
logic)*
- `src/Mbproxy/Diagnostics/SyslogBridge.cs` *(new)*
- `src/Mbproxy/Diagnostics/EventLogBridge.cs` *(minor: extract the 32 KB
truncation helper into a testable static method)*
- `src/Mbproxy/HostingExtensions.cs` *(only `AddMbproxySerilog`)*
- `src/Mbproxy/Mbproxy.csproj` *(add `Serilog.Sinks.Syslog` package)*
- New test files (see below)
> `HostingExtensions.cs` and `Mbproxy.csproj` are also touched by Phase 3.
> **Phases 2 and 3 must not run in parallel** (see section 3). They are
> sequential.
**Changes**
- `DiagnosticSinkSelector` — a pure function taking
`(bool isWindows, bool isWindowsService, bool isSystemd)` and returning an
enum (`EventLog | Syslog | None`). No I/O, fully unit-testable.
- `SyslogBridge`: Serilog `ILogEventSink` wrapping `Serilog.Sinks.Syslog`,
active for Error+ only, mirroring `EventLogBridge`'s contract (silent no-op
if syslog unavailable).
- `AddMbproxySerilog`: replace the `addEventLogBridge` bool parameter with a
`DiagnosticSinkSelector` result; wire the chosen sink. Keep the
`OperatingSystem.IsWindows()` guard around `EventLogBridge`.
- Extract `EventLogBridge`'s message-truncation into
`internal static string TruncateToEventLogLimit(string)` so it can be tested
OS-independently.
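The truncation rule described above (cap the message, mark the cut with `...`) can be illustrated in shell. This is a sketch of the idea only: the real helper is C#, the name `truncate_message` is hypothetical, and counting the `...` inside the cap is an assumption the plan does not confirm. It also assumes ASCII text, since `printf` precision may count bytes rather than characters.

```sh
# truncate_message LIMIT TEXT: print TEXT unchanged when it fits in LIMIT
# characters; otherwise print the first LIMIT-3 characters followed by "...".
# Hypothetical helper; assumes the ellipsis counts toward the cap.
truncate_message() {
    limit="$1"
    text="$2"
    if [ "${#text}" -le "$limit" ]; then
        printf '%s' "$text"
    else
        keep=$((limit - 3))
        # A precision on %s keeps at most $keep characters of the argument.
        printf "%.${keep}s..." "$text"
    fi
}
```

With a cap of 10, `truncate_message 10 'abcdefghijklmnop'` keeps seven characters and appends the marker; the production cap would be the 32 KB Event Log limit named above.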
**Tests** (`tests/Mbproxy.Tests/Diagnostics/`)
- `DiagnosticSinkSelectorTests` — table-driven: Windows+service→`EventLog`;
Windows console→`None`; Linux+systemd→`Syslog`; Linux console→`None`;
macOS→`None`.
- `EventLogBridgeTests` - `[Trait("Category","Unit")]`, Windows-guarded facts:
source-missing → silent no-op; truncation helper caps at 32 KB and appends
`...` (this fact runs on all OSes since the helper is pure).
- `SyslogBridgeTests` — Error+ filter; no-throw when transport unavailable.
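The decision table exercised by `DiagnosticSinkSelectorTests` can be sketched as a small pure function. The version below is a shell rendering for illustration, not the C# implementation: the name `select_sink` and the 1/0 argument encoding are assumptions, and treating systemd as relevant only off Windows is inferred from the table above.

```sh
# select_sink IS_WINDOWS IS_WINDOWS_SERVICE IS_SYSTEMD   (each argument 1 or 0)
# Prints EventLog, Syslog, or None. Illustrative only; the real selector is C#.
select_sink() {
    if [ "$1" -eq 1 ] && [ "$2" -eq 1 ]; then
        echo "EventLog"      # Windows, running under the SCM
    elif [ "$1" -eq 0 ] && [ "$3" -eq 1 ]; then
        echo "Syslog"        # non-Windows, running under systemd
    else
        echo "None"          # interactive/console runs, including macOS
    fi
}
```

Because the function takes only booleans and performs no I/O, every row of the table can be checked on any OS, which is the property the plan relies on for cross-platform testing.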
**Gate 2**
- Full test suite green on Windows (local); full suite green on Linux —
integrator runs `dotnet test` over SSH on `10.100.0.35`.
- `EventLogBridge` emits to the Application log — verified locally via a real
Windows Service install (`install.ps1`, admin rights available), then
`uninstall.ps1` to clean up.
- CA1416: zero warnings.
---
### Phase 3 — Service host integration (systemd)
**Objective:** Register both init-system integrations; the host correctly
reports readiness to whichever launched it.
**Owned files**
- `src/Mbproxy/Program.cs`
- `src/Mbproxy/HostingExtensions.cs` *(call-site update only)*
- `src/Mbproxy/Mbproxy.csproj` *(add `Microsoft.Extensions.Hosting.Systemd`)*
**Changes**
- `csproj`: add
`<PackageReference Include="Microsoft.Extensions.Hosting.Systemd" />` (pin to
the 10.0.x line matching the existing Windows-services package).
- `Program.cs`: call `builder.Services.AddSystemd();` alongside
`AddWindowsService();`. Compute `isSystemd` via
`SystemdHelpers.IsSystemdService()` and feed `DiagnosticSinkSelector`
together with `isWindowsService`.
- Confirm SIGTERM → host shutdown → existing
`Connection.GracefulShutdownTimeoutMs` drain path works (it does — POSIX
signal handling is built into the generic host; just verify).
**Tests** (`tests/Mbproxy.Tests/HostSmokeTests.cs` — extend existing file)
- `HostSmoke_RegistersBothServiceIntegrations_StartsAndStops` — builds the host
with both `AddWindowsService` + `AddSystemd`, asserts no throw, asserts
`mbproxy.startup.ready` still logged.
- Existing two smoke tests must remain green.
**Gate 3**
- Full suite green on Windows (local) and Linux (`10.100.0.35` via SSH).
- Windows Service E2E, run locally with admin rights: `install.ps1` → service
starts, logs `mbproxy.startup.ready` + writes to Event Log, `Stop-Service`
drains cleanly, `uninstall.ps1` removes it. **No regression** in Windows
behavior is the hard requirement of this gate.
- Linux systemd E2E on `10.100.0.35`: **done.** The `linux-x64` binary runs
under a real systemd unit: it starts, binds listeners, serves the admin
endpoint, and `systemctl stop` (SIGTERM) drains gracefully
(`mbproxy.shutdown.complete` in the journal). `Type=notify` was found not to
deliver `READY=1` (Findings) → the Phase 5 unit uses `Type=exec`, under which
the service is fully functional.
---
### Phase 4 — Config & filesystem portability
**Objective:** No Windows-only paths in the shipped/installed config.
**Owned files**
- `install/mbproxy.config.template.json` *(Windows — keep `C:\ProgramData\...`
path)*
- `install/mbproxy.linux.config.template.json` *(new — `/var/log/mbproxy/...`,
Linux syslog `Using` entry)*
- `src/Mbproxy/Mbproxy.csproj` *(condition the linked `appsettings.json`
content item by `$(RuntimeIdentifier)`)*
> Touches `csproj`. Must run after Phase 3's csproj edit is merged (sequential
> w.r.t. csproj), but is otherwise independent of Phase 5/6.
**Changes**
- New Linux template: log path `/var/log/mbproxy/mbproxy-.log`; Serilog
`Using` array includes the syslog sink; comment header points at
`/etc/mbproxy/appsettings.json`.
- `csproj`: link the win template for `win-*` RIDs and the linux template for
`linux-*` RIDs into the published `appsettings.json` (RID-conditioned
`<Content>` items).
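The RID conditioning above boils down to a two-way choice. A minimal C# sketch of that choice follows, assuming the rule "an empty or `win*` RID gets the Windows template, anything else gets the Linux one". The treatment of RID-less builds and of RIDs other than `win-*` / `linux-*` is an assumption here, and `TemplateFor` is a hypothetical helper: the real selection is made by MSBuild item conditions, not by C# code.

```csharp
using System;

// Hypothetical helper mirroring an MSBuild RID condition; the real choice is
// made by <Content> item conditions in the csproj, not by C# code.
// Assumption: an empty RID (plain dev build) falls back to the Windows
// template, and any non-win RID is treated as Linux/Unix.
static string TemplateFor(string rid) =>
    string.IsNullOrEmpty(rid) || rid.StartsWith("win", StringComparison.Ordinal)
        ? "install/mbproxy.config.template.json"
        : "install/mbproxy.linux.config.template.json";

// Show which template each sample RID would publish as appsettings.json.
foreach (var r in new[] { "", "win-x64", "linux-x64", "linux-arm64" })
{
    var shown = r.Length == 0 ? "(none)" : r;
    Console.WriteLine($"{shown,-12} -> {TemplateFor(r)}");
}
```

Keeping the rule a plain prefix test makes the fallback explicit: a new RID either starts with `win` or it ships the Unix paths.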
**Tests** (`tests/Mbproxy.Tests/Options/`)
- Extend `MbproxyOptionsBindingTests`: load **each** shipped template through
the config binder + `MbproxyOptionsValidator`; assert both bind and validate
cleanly. Catches a malformed Linux template at build time.
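As a rough illustration of what "catches a malformed Linux template" can mean, the sketch below parses a JSON-with-comments fragment and rejects any string value that looks like a Windows path. It is not the real test: it uses `System.Text.Json` directly rather than the config binder and `MbproxyOptionsValidator`, the JSON is an invented fragment (not the shipped template), and `IsPortableUnixConfig` / `ContainsWindowsPath` are hypothetical names.

```csharp
using System;
using System.Text.Json;

// Illustrative JSONC fragment only; NOT the shipped template.
const string linuxTemplate = """
{
  // Installed at /etc/mbproxy/appsettings.json
  "Serilog": {
    "WriteTo": [
      { "Name": "File", "Args": { "path": "/var/log/mbproxy/mbproxy-.log" } }
    ]
  }
}
""";

// Hypothetical check: true when the text parses as JSON-with-comments and no
// string value looks like a Windows path.
static bool IsPortableUnixConfig(string json)
{
    var options = new JsonDocumentOptions
    {
        CommentHandling = JsonCommentHandling.Skip,
        AllowTrailingCommas = true,
    };
    try
    {
        using var doc = JsonDocument.Parse(json, options);
        return !ContainsWindowsPath(doc.RootElement);
    }
    catch (JsonException)
    {
        return false; // malformed template
    }
}

// Recursively walks objects and arrays looking for Windows-style paths.
static bool ContainsWindowsPath(JsonElement e)
{
    switch (e.ValueKind)
    {
        case JsonValueKind.Object:
            foreach (var p in e.EnumerateObject())
                if (ContainsWindowsPath(p.Value)) return true;
            return false;
        case JsonValueKind.Array:
            foreach (var item in e.EnumerateArray())
                if (ContainsWindowsPath(item)) return true;
            return false;
        case JsonValueKind.String:
            var s = e.GetString() ?? "";
            return s.Contains(":\\") || s.Contains("ProgramData");
        default:
            return false;
    }
}

Console.WriteLine(IsPortableUnixConfig(linuxTemplate));
```

A check like this complements the bind-and-validate test: binding proves the shape is right, while the path scan guards against a Windows default leaking into the Linux file.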
**Gate 4**
- Both templates bind + validate (new test green).
- `dotnet publish -r linux-x64` ships the Linux template as `appsettings.json`;
`-r win-x64` ships the Windows one. Verify by inspecting publish output.
---
### Phase 5 — Linux install tooling
**Objective:** Parity with `install.ps1` for systemd hosts.
**Owned files** (all new, fully disjoint from all other phases)
- `install/mbproxy.service` — systemd unit, **`Type=exec`** (not `Type=notify`
because, per Findings, `AddSystemd()` does not deliver `READY=1` for the minimal
hosting model), `Restart=on-failure`, `User=mbproxy`, `ExecStart` pointing at
the installed binary; sets `DOTNET_BUNDLE_EXTRACT_BASE_DIR`.
- `install/install.sh` — creates `mbproxy` service account, lays down binary +
`/etc/mbproxy/appsettings.json` (preserve-if-exists, matching `install.ps1`
semantics), creates `/var/log/mbproxy`, installs + `systemctl enable --now`.
- `install/uninstall.sh`: runs `systemctl disable --now`, archives logs (mirror the
`.archived-<ts>` convention), removes unit.
**Tests**
- Not xunit. Gate = `shellcheck` clean + a dry-run inside a throwaway Debian
container on `10.100.0.35`.
**Gate 5**
- `shellcheck install/*.sh` clean — run on `10.100.0.35` (shellcheck installed
in the one-time prep).
- End-to-end on `10.100.0.35`, inside a throwaway Debian container:
`install.sh` → service active → proxy answers Modbus on a configured port →
`uninstall.sh` → service gone, logs archived. Container isolation keeps the
`mbproxy` service account / unit off the real host.
---
### Phase 6 — Documentation
**Objective:** Docs reflect dual-platform reality; doctrine in `DOCS-GUIDE.md`
respected.
**Owned files**
- `README.md` — rewrite "Hard constraints / prerequisites" (drop "No Linux or
Docker support"); add Linux install path; document both publish flavors ×
both RIDs.
- `docs/Operations/Configuration.md` — both config templates, log-path
differences, syslog vs Event Log.
- `docs/Operations/Troubleshooting.md`: `journalctl` guidance alongside Event
Viewer.
- `docs/Architecture/Overview.md` — note dual init-system hosting (only if it
shifts a headline bullet).
- `docs/Reference/LogEvents.md` — note Error+ events route to Event Log
(Windows) / syslog (Linux).
- `mbproxy/CLAUDE.md` — correct the implied Windows-only framing.
- `wwtools/CLAUDE.md` — broaden the mbproxy index row if the task→tool mapping
changed.
**Tests**
- Markdown link-check across touched files.
**Gate 6**
- All internal doc links resolve.
- README "Hard constraints" no longer contradicts the shipped tooling.
---
## 3. Parallel Subagent Execution Plan
### Dependency graph
```
Phase 1 (build) ──> Phase 2 (diagnostics) ──> Phase 3 (host) ──┬─> Phase 4 (config)
├─> Phase 5 (install)
└─> Phase 6 (docs)
```
Phases 2 and 3 are **strictly sequential**: Phase 3 calls the new
`AddMbproxySerilog` signature Phase 2 defines, and both edit
`HostingExtensions.cs` + `csproj`. Phases 4, 5, 6 are **mutually independent**
and parallelizable once Phase 3 is merged.
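The wave grouping follows mechanically from the dependency graph. A small C# sketch of that derivation is below, assuming only the six edges drawn above; `PlanWaves` is a hypothetical name, and the layering is a standard Kahn-style topological pass rather than anything the tooling actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Phase -> prerequisites, transcribed from the dependency graph above.
var deps = new Dictionary<int, int[]>
{
    [1] = Array.Empty<int>(),
    [2] = new[] { 1 },
    [3] = new[] { 2 },
    [4] = new[] { 3 },
    [5] = new[] { 3 },
    [6] = new[] { 3 },
};

// Kahn-style layering: each wave holds every phase whose prerequisites are
// all already merged.
static List<List<int>> PlanWaves(Dictionary<int, int[]> graph)
{
    var done = new HashSet<int>();
    var result = new List<List<int>>();
    while (done.Count < graph.Count)
    {
        var ready = graph.Keys
            .Where(p => !done.Contains(p) && graph[p].All(done.Contains))
            .OrderBy(p => p)
            .ToList();
        if (ready.Count == 0)
            throw new InvalidOperationException("Dependency cycle detected.");
        result.Add(ready);
        done.UnionWith(ready);
    }
    return result;
}

var waves = PlanWaves(deps);
for (int i = 0; i < waves.Count; i++)
{
    var label = string.Join(", ", waves[i].Select(p => "Phase " + p));
    Console.WriteLine($"W{i + 1}: {label}");
}
```

With these edges the pass yields four waves, the last one containing Phases 4, 5 and 6 together.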
### Wave plan
| Wave | Phases | Agents | Mode |
| ---- | --------- | ------------------- | ----------------------------------------------- |
| W1 | Phase 1 | 1 agent | Single — touches `csproj` |
| W2 | Phase 2 | 1 agent | Single — touches `csproj` + `HostingExtensions` |
| W3 | Phase 3 | 1 agent | Single — touches `csproj` + `HostingExtensions` + `Program.cs` |
| W4 | 4, 5, 6 | 3 agents (parallel) | Parallel — disjoint file sets |
> Phase 4 touches `csproj` but no other W4 phase does, so within W4 the file
> sets are still disjoint. Safe.
### File-ownership matrix (the parallel-safety contract)
| File | P1 | P2 | P3 | P4 | P5 | P6 |
| --------------------------------------------- | -- | -- | -- | -- | -- | -- |
| `Mbproxy.csproj` | x | x | x | x | | |
| `HostingExtensions.cs` | | x | x | | | |
| `Program.cs` | | | x | | | |
| `Diagnostics/*` (new + EventLogBridge) | | x | | | | |
| `install/publish.*` | x | | | | | |
| `install/*.config.template.json` | | | | x | | |
| `install/install.sh`, `uninstall.sh`, `.service` | | | | | x | |
| `tests/**` | | x | x | x | | |
| docs / READMEs / CLAUDE.md | | | | | | x |
No column in W4 (P4/P5/P6) shares a row. Confirmed conflict-free.
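The "no shared row" claim can be checked mechanically. The sketch below is one way to do it in C#, assuming the matrix row labels are treated as opaque keys (no glob expansion, so `tests/**` is just a string); `FindConflicts` is a hypothetical helper, not part of the repository.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Owned-file sets for the Wave 4 phases, transcribed from the matrix above.
// Assumption: row labels are opaque keys; globs are not expanded.
var wave4 = new Dictionary<string, string[]>
{
    ["P4"] = new[] { "Mbproxy.csproj", "install/*.config.template.json", "tests/**" },
    ["P5"] = new[] { "install/install.sh", "install/uninstall.sh", "install/mbproxy.service" },
    ["P6"] = new[] { "docs", "READMEs", "CLAUDE.md" },
};

// Returns every file claimed by more than one phase, with its owners.
static List<(string File, string[] Owners)> FindConflicts(Dictionary<string, string[]> owned) =>
    owned
        .SelectMany(kv => kv.Value.Select(file => (File: file, Phase: kv.Key)))
        .GroupBy(x => x.File, x => x.Phase)
        .Where(g => g.Count() > 1)
        .Select(g => (File: g.Key, Owners: g.OrderBy(p => p).ToArray()))
        .ToList();

var conflicts = FindConflicts(wave4);
Console.WriteLine(conflicts.Count == 0
    ? "Wave 4 is conflict-free."
    : string.Join("; ", conflicts.Select(c => c.File + ": " + string.Join("+", c.Owners))));
```

Feeding the same helper the P2 and P3 columns would report `HostingExtensions.cs` and `Mbproxy.csproj` as shared, which is why those phases sit in separate waves.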
### Subagent rules (enforce in every dispatch prompt)
1. **One git worktree per subagent** — dispatch each `Agent` call with
`isolation: "worktree"`. Physical isolation means even a stray edit can't
corrupt a sibling's tree.
2. **Owned-file contract** — each subagent is told its exact owned file set
from the matrix and instructed to edit nothing outside it. A subagent that
discovers it needs an out-of-set file must stop and report, not edit.
3. **No intra-wave API coupling** — subagents in the same wave may only depend
on public APIs from *already-merged* prior waves, never on a sibling's
in-progress work. (This is why P2→P3 are separate waves, not parallel.)
4. **Tests ship with code** — the subagent that writes a phase's code also
writes that phase's tests and runs `dotnet test` green *in its own
worktree* before reporting done. No separate "test agent."
5. **Integrator merges in declared order** — the main agent merges each
worktree, runs the full build + test suite, and only then declares the
phase gate met. A failed gate blocks the next wave.
6. **High-contention files are single-agent-only**: `csproj`,
`HostingExtensions.cs`, `Program.cs`, `CLAUDE.md` are never edited by two
agents in the same wave (the matrix guarantees this).
7. **Prefer new files**: `DiagnosticSinkSelector.cs`, `SyslogBridge.cs`,
`mbproxy.linux.config.template.json`, the shell scripts, the unit file are
all new — new files can't merge-conflict, maximizing safe parallelism.
8. **Shared test hosts are integrator-only for mutations** — subagents may run
`dotnet build` / `dotnet test` (read-mostly) but must **not** install a
Windows Service, register an Event Log source, or `systemctl` against the
real `10.100.0.35` host. Service-level E2E is the integrator's job at gate
time; if a subagent needs Linux E2E it spins an ephemeral Docker container
on the box (named per-agent, `--rm`) so parallel agents never collide on
ports, the init system, or service accounts.
### Merge protocol per wave
```
for each wave:
dispatch agent(s) with isolation: worktree + owned-file list
on completion:
integrator: merge worktree(s) in matrix order
integrator: dotnet build -c Debug (must be green)
integrator: dotnet test (green, count >= prior)
integrator: dotnet publish -r win-x64 AND -r linux-x64 (must succeed)
integrator: verify phase-specific gate checklist
gate green? -> next wave. gate red? -> fix in a single-agent pass, re-gate.
```
---
## 4. Cross-Cutting Test Strategy
- **Existing baseline (325 = 282 unit + 43 E2E) must never regress.** Every
gate re-runs the full suite.
- **New tests target pure logic**: `DiagnosticSinkSelector` is a pure function
precisely so platform-selection is testable without being a service. Highest-
value new test.
- **OS-conditional tests** use `[Trait]` + a runtime `OperatingSystem.IsWindows()`
skip so the suite is green on both Windows and Linux.
- **Both platforms are exercised every gate, no simulation.** Windows runs
locally (admin rights → real Windows Service install). Linux runs on
`dohertj2@10.100.0.35` (Debian 13, systemd 257) — the integrator drives
`dotnet build` / `dotnet test` / publish / systemd E2E over SSH.
- **CI** (if/when a pipeline exists): add a `linux-x64` build+test leg, ideally
pointed at the same box or an equivalent image. Until then the integrator's
per-gate SSH run on `10.100.0.35` is the Linux leg.
- **CA1416 platform analyzer** is treated as a test — `TreatWarningsAsErrors`
already fails the build if a Windows API escapes its guard.
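Because the sink selection is a pure function of three booleans, all eight host combinations can be enumerated without a service host. The sketch below uses a stand-in `Select` that returns the sink name as a string so it runs without referencing the Mbproxy project; the branch logic follows the `DiagnosticSinkSelector.Select` shown in this diff, but the string return type is an assumption made for the example.

```csharp
using System;

// Stand-in for DiagnosticSinkSelector.Select that returns the sink name as a
// string, so this sketch runs without referencing the Mbproxy project.
static string Select(bool isWindows, bool isWindowsService, bool isSystemd)
{
    // Windows takes precedence; isSystemd is ignored there.
    if (isWindows)
        return isWindowsService ? "EventLog" : "None";
    return isSystemd ? "Syslog" : "None";
}

// Enumerate all 8 host combinations; a pure function needs no real service host.
var flags = new[] { false, true };
foreach (var win in flags)
foreach (var svc in flags)
foreach (var sysd in flags)
{
    Console.WriteLine(
        $"windows={win,-5} winService={svc,-5} systemd={sysd,-5} -> " +
        Select(win, svc, sysd));
}
```

In an xunit suite the same table would naturally become a `[Theory]` with one `[InlineData]` row per combination.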
---
## 5. Risk Register
| Risk | Phase | Mitigation |
| --------------------------------------------- | ----- | -------------------------------------------------------------------------- |
| Windows Service behavior regresses unnoticed | P3 | Gate 3 mandates a real Windows Service install/start/stop smoke check |
| `Serilog.Sinks.SyslogMessages` version drift  | P2    | Pin the version; `SyslogBridge` is isolated behind `DiagnosticSinkSelector` |
| Linux publish ships Windows config path | P4 | RID-conditioned `<Content>` item + `MbproxyOptionsBindingTests` on both templates |
| Self-extracting single-file temp-dir perms | P1/P5 | Document + set `DOTNET_BUNDLE_EXTRACT_BASE_DIR` in the systemd unit |
| Two agents racing `csproj`                    | all   | Matrix forbids it: `csproj` edited only in single-agent waves W1-W3 + lone P4 |
| Hidden Windows path elsewhere in code | all | `Grep` sweep for `C:\\`, `ProgramData`, `\\\\` before Gate 6 |
| Parallel Wave-4 agents collide on the shared `10.100.0.35` host | W4 | Rule 8 — service-level E2E is integrator-only and serial; subagent E2E uses per-agent `--rm` Docker containers |
| Windows Service E2E leaves stale service/Event Log source | P2/P3 | Integrator always runs `uninstall.ps1` after each Windows gate |
---
## 6. Deliverable Summary
- **3 modified source files** (`csproj`, `HostingExtensions.cs`, `Program.cs`)
+ **3 new** (`DiagnosticSinkSelector.cs`, `SyslogBridge.cs`, and the
truncation-helper extraction in `EventLogBridge.cs`).
- **2 new packages** (`Microsoft.Extensions.Hosting.Systemd`,
`Serilog.Sinks.SyslogMessages`).
- **5 new install/tooling files** (`publish.sh`, Linux config template,
`mbproxy.service`, `install.sh`, `uninstall.sh`).
- **~68 new tests** across 3 new/extended test files; baseline 325 preserved.
- **7 doc files** updated.
- **4 waves**, max 3 concurrent subagents, conflict-free by construction.
+7 -1
@@ -103,7 +103,13 @@ public sealed record PlcBackendStatus(
long CacheMissCount,
long CacheInvalidations,
long CacheEntryCount,
long CacheBytes);
long CacheBytes,
/// <summary>Backend keepalive heartbeat probes issued on idle backend sockets.</summary>
long BackendHeartbeatsSent,
/// <summary>Keepalive heartbeat probes that timed out (backend not answering).</summary>
long BackendHeartbeatsFailed,
/// <summary>Backend teardowns triggered by a failed keepalive heartbeat.</summary>
long BackendIdleDisconnects);
/// <summary>Modbus exception counts by code.</summary>
public sealed record ExceptionCounts(
@@ -88,6 +88,9 @@ internal static class StatusHtmlRenderer
// an em-dash when no cache-eligible reads have occurred. Page-weight budget
// assertion stays under 50 KB for the 54-PLC fleet.
sb.Append("<th>Cache</th>");
// Keepalive column — heartbeats sent, with failure / idle-disconnect counts
// shown only when non-zero.
sb.Append("<th>Keepalive</th>");
sb.Append("</tr></thead><tbody>");
foreach (var plc in status.Plcs)
@@ -185,6 +188,24 @@ internal static class StatusHtmlRenderer
sb.Append(pct).Append("% (").Append(cacheHit).Append(')');
}
sb.Append("</td>");
// Keepalive cell — heartbeats sent; failures + idle-disconnects appended
// only when non-zero to keep the cell narrow.
long hbSent = plc.Backend.BackendHeartbeatsSent;
long hbFailed = plc.Backend.BackendHeartbeatsFailed;
long hbIdle = plc.Backend.BackendIdleDisconnects;
sb.Append("<td>");
if (hbSent == 0 && hbFailed == 0 && hbIdle == 0)
{
sb.Append("&mdash;");
}
else
{
sb.Append(hbSent);
if (hbFailed > 0 || hbIdle > 0)
sb.Append(" (fail ").Append(hbFailed)
.Append(", idle-disc ").Append(hbIdle).Append(')');
}
sb.Append("</td>");
sb.Append("</tr>");
}
@@ -108,7 +108,10 @@ internal sealed class StatusSnapshotBuilder
CacheInvalidations: 0,
CacheEntryCount: 0,
CacheBytes: 0,
ResponseDropForFullUpstream: 0);
ResponseDropForFullUpstream: 0,
BackendHeartbeatsSent: 0,
BackendHeartbeatsFailed: 0,
BackendIdleDisconnects: 0);
long connectsSuccess = counters.ConnectsSuccess;
long connectsFailed = counters.ConnectsFailed;
@@ -152,7 +155,10 @@ internal sealed class StatusSnapshotBuilder
CacheMissCount: counters.CacheMissCount,
CacheInvalidations: counters.CacheInvalidations,
CacheEntryCount: counters.CacheEntryCount,
CacheBytes: counters.CacheBytes),
CacheBytes: counters.CacheBytes,
BackendHeartbeatsSent: counters.BackendHeartbeatsSent,
BackendHeartbeatsFailed: counters.BackendHeartbeatsFailed,
BackendIdleDisconnects: counters.BackendIdleDisconnects),
Bytes: new PlcBytesStatus(
UpstreamIn: counters.BytesUpstreamIn,
UpstreamOut: counters.BytesUpstreamOut)));
@@ -61,6 +61,10 @@ internal sealed partial class ConfigReconciler : IDisposable
// and a hot-reload of `Enabled = false` would not propagate to them.
private Func<ReadCoalescingOptions>? _coalescingAccessor;
// Live accessor for KeepaliveOptions, threaded through Attach so PLCs added or
// restarted via hot-reload honour the current `Connection.Keepalive` values.
private Func<KeepaliveOptions>? _keepaliveAccessor;
// ── Debounce + serialisation machinery ───────────────────────────────────────────────
// Channel carries Unit to signal "something changed — please check".
@@ -121,11 +125,13 @@ internal sealed partial class ConfigReconciler : IDisposable
public void Attach(
ConcurrentDictionary<string, PlcListenerSupervisor> supervisors,
MbproxyOptions initialOptions,
Func<ReadCoalescingOptions>? coalescingAccessor = null)
Func<ReadCoalescingOptions>? coalescingAccessor = null,
Func<KeepaliveOptions>? keepaliveAccessor = null)
{
_supervisors = supervisors;
_currentOptions = initialOptions;
_coalescingAccessor = coalescingAccessor;
_keepaliveAccessor = keepaliveAccessor;
}
// ── ApplyAsync (exposed for tests) ───────────────────────────────────────────────────
@@ -315,7 +321,8 @@ internal sealed partial class ConfigReconciler : IDisposable
recoveryPipeline,
_loggerFactory.CreateLogger<PlcListenerSupervisor>(),
backendPipeline,
_coalescingAccessor);
_coalescingAccessor,
_keepaliveAccessor);
_supervisors[name] = newSupervisor;
await newSupervisor.StartAsync(ct).ConfigureAwait(false);
@@ -401,7 +408,8 @@ internal sealed partial class ConfigReconciler : IDisposable
recoveryPipeline,
_loggerFactory.CreateLogger<PlcListenerSupervisor>(),
backendPipeline,
_coalescingAccessor);
_coalescingAccessor,
_keepaliveAccessor);
_supervisors[plcNew.Name] = newSupervisor;
await newSupervisor.StartAsync(ct).ConfigureAwait(false);
@@ -141,6 +141,27 @@ internal static class ReloadValidator
errs.Add(
$"Connection.GracefulShutdownTimeoutMs must be > 0; got {next.Connection.GracefulShutdownTimeoutMs}.");
// ── 6. Keepalive section ──────────────────────────────────────────────
// Schema bounds are also checked in MbproxyOptionsValidator; re-checking here keeps
// the hot-reload gate self-contained. The cross-field rule (heartbeat interval must
// sit above the request timeout, or it would fire continuously) lives only here.
var ka = next.Connection.Keepalive;
if (ka.TcpIdleTimeMs <= 0)
errs.Add($"Connection.Keepalive.TcpIdleTimeMs must be > 0; got {ka.TcpIdleTimeMs}.");
if (ka.TcpProbeIntervalMs <= 0)
errs.Add($"Connection.Keepalive.TcpProbeIntervalMs must be > 0; got {ka.TcpProbeIntervalMs}.");
if (ka.TcpProbeCount <= 0)
errs.Add($"Connection.Keepalive.TcpProbeCount must be > 0; got {ka.TcpProbeCount}.");
if (ka.BackendHeartbeatProbeAddress is < 0 or > 65535)
errs.Add(
$"Connection.Keepalive.BackendHeartbeatProbeAddress must be in [0, 65535]; " +
$"got {ka.BackendHeartbeatProbeAddress}.");
if (ka.BackendHeartbeatIdleMs <= next.Connection.BackendRequestTimeoutMs)
errs.Add(
$"Connection.Keepalive.BackendHeartbeatIdleMs ({ka.BackendHeartbeatIdleMs}) must be greater " +
$"than Connection.BackendRequestTimeoutMs ({next.Connection.BackendRequestTimeoutMs}); " +
"a heartbeat interval at or below the request timeout would fire continuously.");
errors = errs;
return errs.Count == 0;
}
@@ -0,0 +1,60 @@
namespace Mbproxy.Diagnostics;
/// <summary>
/// The platform diagnostic sink to wire for <c>Error</c>+ events — picked once,
/// at the composition root, by <see cref="DiagnosticSinkSelector"/>.
/// </summary>
internal enum DiagnosticSink
{
/// <summary>
/// No platform diagnostic sink — console (and rolling-file) sinks only. Used
/// for interactive / dev runs on every OS.
/// </summary>
None,
/// <summary>
/// Windows Application Event Log, via <see cref="EventLogBridge"/>. Selected
/// only when the process is hosted as a Windows Service.
/// </summary>
EventLog,
/// <summary>
/// Local syslog, via <see cref="SyslogBridge"/>. Selected only when the
/// process is hosted as a systemd service on Linux.
/// </summary>
Syslog,
}
/// <summary>
/// Pure platform-selection logic for the <c>Error</c>+ diagnostic sink. Holds no
/// I/O and no host APIs so it is unit-testable for every OS / host combination;
/// the host detection itself happens in <see cref="HostingExtensions.AddMbproxySerilog"/>.
/// </summary>
internal static class DiagnosticSinkSelector
{
/// <summary>
/// Picks the diagnostic sink for the current host:
/// <list type="bullet">
/// <item>Windows hosted as a Windows Service → <see cref="DiagnosticSink.EventLog"/>.</item>
/// <item>Linux hosted as a systemd service → <see cref="DiagnosticSink.Syslog"/>.</item>
/// <item>Everything else — interactive / dev runs, macOS, launches not owned
/// by an init system → <see cref="DiagnosticSink.None"/>.</item>
/// </list>
/// The managed-service gate mirrors the original <see cref="EventLogBridge"/>
/// contract: a diagnostic sink is wired only when an init system actually owns
/// the process, so dev / console runs never need an Event Log source registered
/// or a syslog socket reachable.
/// </summary>
/// <param name="isWindows">Running on Windows.</param>
/// <param name="isWindowsService">Hosted by the Windows Service Control Manager.</param>
/// <param name="isSystemd">Hosted by systemd.</param>
public static DiagnosticSink Select(bool isWindows, bool isWindowsService, bool isSystemd)
{
// Windows takes precedence: isSystemd is meaningless there, and on
// non-Windows isWindowsService is always false.
if (isWindows)
return isWindowsService ? DiagnosticSink.EventLog : DiagnosticSink.None;
return isSystemd ? DiagnosticSink.Syslog : DiagnosticSink.None;
}
}
@@ -5,6 +5,32 @@ using Serilog.Events;
namespace Mbproxy.Diagnostics;
/// <summary>
/// Pure message-shaping helpers for the Windows Event Log. Kept on a separate,
/// non-platform-annotated type — <em>not</em> on <see cref="EventLogBridge"/>,
/// which is <c>[SupportedOSPlatform("windows")]</c> — so the truncation logic is
/// unit-testable on any OS without tripping the platform-compatibility analyzer.
/// </summary>
internal static class EventLogMessage
{
/// <summary>The Windows Event Log single-entry limit, in bytes (32 KB).</summary>
public const int MaxBytes = 32 * 1024;
/// <summary>
/// Truncates <paramref name="message"/> so its UTF-16 byte length stays within
/// <see cref="MaxBytes"/>, appending an ellipsis when truncation occurs. Shorter
/// messages are returned unchanged.
/// </summary>
public static string TruncateToLimit(string message)
{
// Rough UTF-16 upper bound: 2 bytes per char.
if (message.Length * 2 <= MaxBytes) return message;
int charLimit = MaxBytes / 2 - 3; // leave room for the "..." suffix
return message[..charLimit] + "...";
}
}
/// <summary>
/// Serilog sink that writes events at level Error and above to the Windows Event Log
/// under source <c>mbproxy</c>.
@@ -26,7 +52,6 @@ internal sealed class EventLogBridge : ILogEventSink
{
private const string Source = "mbproxy";
private const string LogName = "Application";
private const int MaxMessageBytes = 32 * 1024; // 32 KB Event Log limit
private readonly bool _enabled;
// Cache the source-exists check at construction so Emit doesn't hit the registry on
@@ -63,11 +88,7 @@ internal sealed class EventLogBridge : ILogEventSink
}
// Truncate to the Event Log single-entry limit.
if (message.Length * 2 > MaxMessageBytes) // rough UTF-16 upper bound
{
int charLimit = MaxMessageBytes / 2 - 3;
message = message[..charLimit] + "...";
}
message = EventLogMessage.TruncateToLimit(message);
var type = logEvent.Level switch
{
@@ -0,0 +1,50 @@
using Serilog;
using Serilog.Debugging;
using Serilog.Events;
namespace Mbproxy.Diagnostics;
/// <summary>
/// Wires the local-syslog sink for <c>Error</c>+ events when mbproxy runs as a
/// systemd service on Linux — the cross-platform counterpart of
/// <see cref="EventLogBridge"/>.
///
/// <para>Events at <see cref="LogEventLevel.Error"/> and above are written to the
/// local syslog socket (<c>/dev/log</c>) under the application name
/// <see cref="AppName"/>, with Serilog levels mapped to syslog severities by the
/// sink. On a systemd host the local syslog socket is provided by
/// <c>systemd-journald</c>, so these events land in the journal at
/// <c>err</c>/<c>crit</c> priority — distinct from the process's stdout, which
/// journald captures at <c>info</c>.</para>
///
/// <para>If the local syslog socket is unavailable the bridge degrades silently
/// to the console (and rolling-file) sinks rather than failing logger
/// construction, mirroring <see cref="EventLogBridge"/>'s no-op-when-unavailable
/// contract.</para>
/// </summary>
internal static class SyslogBridge
{
/// <summary>syslog application name — the <c>TAG</c> field of each entry.</summary>
internal const string AppName = "mbproxy";
/// <summary>
/// Attaches the <c>Error</c>+ local-syslog sink to <paramref name="cfg"/> and
/// returns it for fluent chaining. Never throws: a host where the syslog sink
/// cannot be configured degrades to <paramref name="cfg"/> unchanged.
/// </summary>
public static LoggerConfiguration AttachTo(LoggerConfiguration cfg)
{
try
{
return cfg.WriteTo.LocalSyslog(
appName: AppName,
restrictedToMinimumLevel: LogEventLevel.Error);
}
catch (Exception ex)
{
// Degrade to console-only rather than crash logger construction.
SelfLog.WriteLine("SyslogBridge: local syslog unavailable, console-only: {0}", ex);
return cfg;
}
}
}
+30 -13
@@ -2,7 +2,10 @@ using Mbproxy.Admin;
using Mbproxy.Configuration;
using Mbproxy.Diagnostics;
using Mbproxy.Options;
using Microsoft.Extensions.Hosting.Systemd;
using Microsoft.Extensions.Hosting.WindowsServices;
using Serilog;
using Serilog.Events;
namespace Mbproxy;
@@ -62,25 +65,39 @@ internal static class HostingExtensions
/// Configures Serilog from the <c>"Serilog"</c> configuration section, with console
/// and rolling-file sinks as defaults.
///
/// <para>When <paramref name="addEventLogBridge"/> is <c>true</c>, the
/// <see cref="Diagnostics.EventLogBridge"/> is added as a sub-sink for events at
/// <see cref="Serilog.Events.LogEventLevel.Error"/> and above. This flag should only be
/// set when the service is running as a Windows Service — the bridge silently ignores
/// events when the Event Log source is not registered.</para>
/// <para>This is the single composition-root point where the platform diagnostic
/// sink for <c>Error</c>+ events is chosen. <see cref="DiagnosticSinkSelector"/>
/// picks it from the current host:
/// <list type="bullet">
/// <item>Windows Service → <see cref="Diagnostics.EventLogBridge"/> (Application
/// Event Log).</item>
/// <item>systemd service → <see cref="Diagnostics.SyslogBridge"/> (local syslog).</item>
/// <item>interactive / dev runs (any OS) → no platform sink.</item>
/// </list>
/// Both bridges silently no-op when their backing facility is unavailable, so a
/// dev run never needs an Event Log source registered or a syslog socket.</para>
/// </summary>
public static IHostApplicationBuilder AddMbproxySerilog(
this IHostApplicationBuilder builder,
bool addEventLogBridge = false)
public static IHostApplicationBuilder AddMbproxySerilog(this IHostApplicationBuilder builder)
{
var cfg = new LoggerConfiguration()
.ReadFrom.Configuration(builder.Configuration);
if (addEventLogBridge && OperatingSystem.IsWindows())
var sink = DiagnosticSinkSelector.Select(
isWindows: OperatingSystem.IsWindows(),
isWindowsService: WindowsServiceHelpers.IsWindowsService(),
isSystemd: SystemdHelpers.IsSystemdService());
cfg = sink switch
{
cfg = cfg.WriteTo.Sink(
new EventLogBridge(enabled: true),
Serilog.Events.LogEventLevel.Error);
}
// EventLogBridge is [SupportedOSPlatform("windows")]; the extra
// OperatingSystem.IsWindows() guard satisfies the platform analyzer
// (DiagnosticSinkSelector already guarantees Windows for this case).
DiagnosticSink.EventLog when OperatingSystem.IsWindows()
=> cfg.WriteTo.Sink(new EventLogBridge(enabled: true), LogEventLevel.Error),
DiagnosticSink.Syslog
=> SyslogBridge.AttachTo(cfg),
_ => cfg,
};
Log.Logger = cfg.CreateLogger();
+36 -14
@@ -12,16 +12,19 @@
<InformationalVersion>1.0.0</InformationalVersion>
</PropertyGroup>
<!-- Single-file self-contained publish (Release only; Debug stays normal for fast iteration).
The resulting Mbproxy.exe is ~100 MB because the self-contained publish bundles the full
.NET 10 + ASP.NET Core runtime — fixed cost of self-contained deployment on .NET 10 with
ASP.NET Core. Operators who need a smaller footprint can use a framework-dependent publish
(dotnet publish -c Release -r win-x64 -p:SelfContained=false -p:PublishSingleFile=true)
if the target machine has .NET 10 installed. -->
<PropertyGroup Condition="'$(Configuration)' == 'Release'">
<!-- Single-file publish settings — apply only to a Release publish with an explicit RID.
Publishing with -r <rid> produces a single-file binary, self-contained by default
(bundles the .NET 10 + ASP.NET Core runtime, ~100 MB) so no .NET install is needed on
the target. Override with -p:SelfContained=false for a framework-dependent build
(~1.6 MB) when the target already has the .NET 10 + ASP.NET Core runtime.
The RID is supplied per publish (win-x64, linux-x64, ...) and is deliberately NOT
hardcoded here — see install/publish.ps1 / install/publish.sh. The
'$(RuntimeIdentifier)' != '' guard means a plain `dotnet build -c Release` with no RID
stays an ordinary framework build (SelfContained without a RID is an SDK error). -->
<PropertyGroup Condition="'$(Configuration)' == 'Release' and '$(RuntimeIdentifier)' != ''">
<PublishSingleFile>true</PublishSingleFile>
<SelfContained>true</SelfContained>
<RuntimeIdentifier>win-x64</RuntimeIdentifier>
<IncludeNativeLibrariesForSelfExtract>true</IncludeNativeLibrariesForSelfExtract>
</PropertyGroup>
@@ -32,12 +35,19 @@
<ItemGroup>
<!-- Microsoft.Extensions.Hosting is already included transitively via
Microsoft.AspNetCore.App — do not re-add it explicitly. -->
Microsoft.AspNetCore.App — do not re-add it explicitly.
The two init-system integration packages are both portable: each is
safe to reference and call on any OS (the helper self-detects its host
and no-ops otherwise), so no conditional reference is needed. -->
<PackageReference Include="Microsoft.Extensions.Hosting.WindowsServices" Version="10.0.8" />
<PackageReference Include="Microsoft.Extensions.Hosting.Systemd" Version="10.0.8" />
<PackageReference Include="Serilog.Extensions.Hosting" Version="10.0.0" />
<PackageReference Include="Serilog.Settings.Configuration" Version="10.0.0" />
<PackageReference Include="Serilog.Sinks.Console" Version="6.1.1" />
<PackageReference Include="Serilog.Sinks.File" Version="7.0.0" />
<!-- Local-syslog sink for the Linux diagnostic bridge (Error+ events).
Serilog.Sinks.SyslogMessages is the maintained IonxSolutions package. -->
<PackageReference Include="Serilog.Sinks.SyslogMessages" Version="4.1.0" />
<!-- Polly: backend-connect retry pipeline (PolicyFactory.BuildBackendConnect) and
listener-recovery pipeline (PolicyFactory.BuildListenerRecovery). -->
<PackageReference Include="Polly" Version="8.6.6" />
@@ -48,17 +58,29 @@
<InternalsVisibleTo Include="Mbproxy.Tests" />
</ItemGroup>
<!-- Link the platform-appropriate install template as the published appsettings.json so
the binary ships with a fully-commented, usable example config (PLCs, BCD tags, all
sections present) instead of an empty stub. The .NET configuration loader supports
JSONC (comments) under the default Host.CreateApplicationBuilder path, so the comments
in the template are valid at runtime.
The two templates differ only in OS-specific paths (log directory) and platform
notes. A `dotnet publish -r linux-*` (or any non-win RID) ships the Linux template;
win-* and a plain RID-less dev build ship the Windows one. -->
<ItemGroup>
<!-- Link the install template as the published appsettings.json so the binary ships
with a fully-commented, usable example config (one PLC, one BCD tag, all sections
present) instead of an empty stub. The .NET configuration loader supports JSONC
(comments) under the default Host.CreateApplicationBuilder path, so the comments
in the template are valid at runtime. -->
<None Remove="appsettings.json" />
</ItemGroup>
<ItemGroup Condition="'$(RuntimeIdentifier)' == '' or $(RuntimeIdentifier.StartsWith('win'))">
<Content Include="..\..\install\mbproxy.config.template.json"
Link="appsettings.json">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</Content>
</ItemGroup>
<ItemGroup Condition="'$(RuntimeIdentifier)' != '' and !$(RuntimeIdentifier.StartsWith('win'))">
<Content Include="..\..\install\mbproxy.linux.config.template.json"
Link="appsettings.json">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</Content>
</ItemGroup>
</Project>
@@ -9,4 +9,9 @@ public sealed class ConnectionOptions
/// graceful shutdown before cancelling them. Default: 10000 (10 s).
/// </summary>
public int GracefulShutdownTimeoutMs { get; init; } = 10000;
/// <summary>
/// TCP keepalive and backend-heartbeat connection-monitoring settings. Enabled by default.
/// </summary>
public KeepaliveOptions Keepalive { get; init; } = new();
}
@@ -0,0 +1,52 @@
namespace Mbproxy.Options;
/// <summary>
/// TCP keepalive and application-level connection-monitoring settings.
///
/// <para>The DL205/DL260 ECOM does not emit TCP keepalives, so an idle backend socket can be
/// silently dropped by a middlebox (switch, firewall, NAT) after 2-5 minutes. These knobs
/// (a) enable OS-level <c>SO_KEEPALIVE</c> on both backend and accepted upstream sockets and
/// (b) drive a periodic Modbus FC03 heartbeat on each idle backend socket so the path stays
/// warm and a dead ECOM is detected before a real client request hits it.</para>
/// </summary>
public sealed class KeepaliveOptions
{
/// <summary>
/// Master switch. When <c>false</c>, neither <c>SO_KEEPALIVE</c> nor the backend heartbeat
/// is applied and the proxy behaves exactly as a pre-keepalive build. Default: <c>true</c>.
/// </summary>
public bool Enabled { get; init; } = true;
/// <summary>
/// <c>SO_KEEPALIVE</c> idle time in milliseconds — how long a socket may be idle before the
/// OS sends its first keepalive probe. Applied to backend and accepted upstream sockets.
/// Default: 30000 (30 s).
/// </summary>
public int TcpIdleTimeMs { get; init; } = 30000;
/// <summary>
/// <c>SO_KEEPALIVE</c> interval in milliseconds between keepalive probes once the idle time
/// has elapsed. Default: 5000 (5 s).
/// </summary>
public int TcpProbeIntervalMs { get; init; } = 5000;
/// <summary>
/// <c>SO_KEEPALIVE</c> probe count — unanswered probes before the OS declares the socket
/// dead. Default: 4.
/// </summary>
public int TcpProbeCount { get; init; } = 4;
/// <summary>
/// Backend application heartbeat: after this many milliseconds with no backend traffic, the
/// multiplexer issues a synthetic FC03 qty=1 read to keep the socket warm and prove the ECOM
/// is still answering Modbus. Must be greater than <see cref="ConnectionOptions.BackendRequestTimeoutMs"/>.
/// Default: 30000 (30 s).
/// </summary>
public int BackendHeartbeatIdleMs { get; init; } = 30000;
/// <summary>
/// Modbus PDU address read by the backend heartbeat FC03 probe. Address 0 (V0) is valid on
/// DL205/DL260 in factory absolute mode. Default: 0.
/// </summary>
public int BackendHeartbeatProbeAddress { get; init; } = 0;
}
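The three `SO_KEEPALIVE` defaults above combine into a rough upper bound on how long the OS needs to declare an idle, dead peer. The following is a minimal sketch of that arithmetic, assuming the usual TCP keepalive behaviour (first probe after the idle time, then one probe per interval until the probe count is exhausted). `KeepaliveTiming` and `WorstCaseDetectionMs` are hypothetical names for illustration and are not part of mbproxy.

```csharp
using System;

// Hypothetical helper, not part of mbproxy: estimates how long the OS takes to
// declare an idle, dead peer from the three SO_KEEPALIVE tunables.
public static class KeepaliveTiming
{
    // Assumption: the first probe is sent after idleMs of silence, then one probe
    // every intervalMs until probeCount probes have gone unanswered.
    public static long WorstCaseDetectionMs(int idleMs, int intervalMs, int probeCount)
        => (long)idleMs + (long)intervalMs * probeCount;

    public static void Main()
    {
        // KeepaliveOptions defaults: 30000 ms idle, 5000 ms interval, 4 probes.
        Console.WriteLine(WorstCaseDetectionMs(30000, 5000, 4)); // prints 50000
    }
}
```

With the defaults this comes to roughly 50 s, which sits below the 2-5 minute middlebox idle-drop window cited in the class summary.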
@@ -106,6 +106,22 @@ public sealed class MbproxyOptionsValidator : IValidateOptions<MbproxyOptions>
errors.Add(
$"Connection.GracefulShutdownTimeoutMs must be > 0; got {options.Connection.GracefulShutdownTimeoutMs}.");
// Keepalive section ranges. Cross-field rules (heartbeat idle threshold vs request
// timeout) are enforced in ReloadValidator.
var ka = options.Connection.Keepalive;
if (ka.TcpIdleTimeMs <= 0)
errors.Add($"Connection.Keepalive.TcpIdleTimeMs must be > 0; got {ka.TcpIdleTimeMs}.");
if (ka.TcpProbeIntervalMs <= 0)
errors.Add($"Connection.Keepalive.TcpProbeIntervalMs must be > 0; got {ka.TcpProbeIntervalMs}.");
if (ka.TcpProbeCount <= 0)
errors.Add($"Connection.Keepalive.TcpProbeCount must be > 0; got {ka.TcpProbeCount}.");
if (ka.BackendHeartbeatIdleMs <= 0)
errors.Add($"Connection.Keepalive.BackendHeartbeatIdleMs must be > 0; got {ka.BackendHeartbeatIdleMs}.");
if (ka.BackendHeartbeatProbeAddress is < 0 or > 65535)
errors.Add(
$"Connection.Keepalive.BackendHeartbeatProbeAddress must be in [0, 65535]; " +
$"got {ka.BackendHeartbeatProbeAddress}.");
return errors.Count > 0
? ValidateOptionsResult.Fail(errors)
: ValidateOptionsResult.Success;
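The per-field range checks above leave the cross-field rule to `ReloadValidator`, whose body is not part of this hunk. As a hedged illustration only, the rule documented on `BackendHeartbeatIdleMs` ("must be greater than `BackendRequestTimeoutMs`") could be expressed as below. `KeepaliveCrossFieldRule` and `Check` are hypothetical names, and the error wording is an assumption rather than the message the real validator emits.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the cross-field rule; the real ReloadValidator is not shown
// in this diff and may differ in shape and wording.
public static class KeepaliveCrossFieldRule
{
    public static IReadOnlyList<string> Check(int backendHeartbeatIdleMs, int backendRequestTimeoutMs)
    {
        var errors = new List<string>();

        // The idle threshold must sit strictly above the request timeout; equality fails.
        if (backendHeartbeatIdleMs <= backendRequestTimeoutMs)
        {
            errors.Add(
                $"Connection.Keepalive.BackendHeartbeatIdleMs ({backendHeartbeatIdleMs}) must be > " +
                $"Connection.BackendRequestTimeoutMs ({backendRequestTimeoutMs}).");
        }

        return errors;
    }

    public static void Main()
    {
        Console.WriteLine(Check(30000, 3000).Count); // prints 0 (defaults pass)
        Console.WriteLine(Check(3000, 3000).Count);  // prints 1 (equal values are rejected)
    }
}
```

The second call mirrors the `Validate_HeartbeatIdleNotAboveRequestTimeout_Fails` test later in this diff, where both values are 3000.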
+10 -6
@@ -1,17 +1,21 @@
using Mbproxy;
using Mbproxy.Proxy;
using Microsoft.Extensions.Hosting.Systemd;
using Microsoft.Extensions.Hosting.WindowsServices;
var builder = Host.CreateApplicationBuilder(args);
-// Windows Service support; no-op when running under dotnet run / console.
+// Init-system integration. Both helpers self-detect their host and are no-ops
+// otherwise, so calling both unconditionally is correct on every platform:
+// - AddWindowsService(): active only when launched by the Windows SCM.
+// - AddSystemd(): active only when launched by systemd (wires sd_notify
+// readiness; SIGTERM shutdown is handled by the host).
builder.Services.AddWindowsService();
+builder.Services.AddSystemd();
-// Wire EventLogBridge only when actually running as a Windows Service.
-bool isWindowsService = WindowsServiceHelpers.IsWindowsService();
-// Wire up structured config, Serilog, and typed options.
-builder.AddMbproxySerilog(addEventLogBridge: isWindowsService);
+// Wire up structured config, Serilog, and typed options. AddMbproxySerilog selects
+// the platform diagnostic sink (Windows Event Log / syslog / none) internally.
+builder.AddMbproxySerilog();
builder.AddMbproxyOptions();
// PDU pipeline: BcdPduPipeline is stateless (per-call correlation flows through
@@ -28,6 +28,12 @@ internal sealed record InterestedParty(UpstreamPipe Pipe, ushort OriginalTxId);
/// read coalescing uses to fan out a single PLC response to multiple upstream clients.
/// Reviewer note: do <i>not</i> simplify back to a single <c>UpstreamPipe</c> field.</para>
/// </summary>
/// <param name="IsHeartbeat">
/// <c>true</c> for the synthetic FC03 keepalive probe issued by the backend heartbeat
/// loop. Heartbeat entries carry no <see cref="InterestedParties"/>: the backend reader
/// drops the response (no fan-out, no rewriter, no cache) and the timeout watchdog tears
/// the backend down instead of dispatching a 0x0B exception. Defaults to <c>false</c>.
/// </param>
internal sealed record InFlightRequest(
byte UnitId,
byte Fc,
@@ -35,4 +41,5 @@ internal sealed record InFlightRequest(
ushort Qty,
IReadOnlyList<InterestedParty> InterestedParties,
DateTimeOffset SentAtUtc,
-int ResolvedCacheTtlMs = 0);
+int ResolvedCacheTtlMs = 0,
+bool IsHeartbeat = false);
@@ -0,0 +1,54 @@
namespace Mbproxy.Proxy.Multiplexing;
/// <summary>
/// Source-generated <see cref="LoggerMessage"/> definitions for the backend keepalive
/// heartbeat. Event names are stable — do not rename without updating
/// docs/Reference/LogEvents.md's event-name table.
/// </summary>
internal static partial class KeepaliveLogEvents
{
/// <summary>
/// Emitted each time the heartbeat loop issues a synthetic FC03 probe on an idle
/// backend socket. Debug level: at most one per <c>BackendHeartbeatIdleMs</c> window per idle PLC.
/// </summary>
[LoggerMessage(
EventId = 150,
EventName = "mbproxy.keepalive.heartbeat.sent",
Level = LogLevel.Debug,
Message = "Keepalive heartbeat sent: Plc={Plc} ProxyTxId={ProxyTxId} Address={Address}")]
public static partial void HeartbeatSent(
ILogger logger,
string plc,
ushort proxyTxId,
ushort address);
/// <summary>
/// Emitted when a keepalive heartbeat probe is not answered within
/// <c>BackendRequestTimeoutMs</c>. The backend is connected-but-not-answering; the
/// multiplexer tears it down (see <see cref="BackendIdleDisconnect"/>).
/// </summary>
[LoggerMessage(
EventId = 151,
EventName = "mbproxy.keepalive.heartbeat.timeout",
Level = LogLevel.Warning,
Message = "Keepalive heartbeat timed out: Plc={Plc} ProxyTxId={ProxyTxId} ElapsedMs={ElapsedMs}")]
public static partial void HeartbeatTimeout(
ILogger logger,
string plc,
ushort proxyTxId,
long elapsedMs);
/// <summary>
/// Emitted when a failed keepalive heartbeat triggers a proactive backend teardown.
/// Every attached upstream pipe is cascaded; clients reconnect on their next request.
/// </summary>
[LoggerMessage(
EventId = 152,
EventName = "mbproxy.keepalive.backend.idle_disconnect",
Level = LogLevel.Information,
Message = "Backend torn down by keepalive: Plc={Plc} HeartbeatElapsedMs={ElapsedMs}")]
public static partial void BackendIdleDisconnect(
ILogger logger,
string plc,
long elapsedMs);
}
@@ -61,6 +61,12 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
// `() => optionsMonitor.CurrentValue.Resilience.ReadCoalescing`. Tests default to a
// fresh `ReadCoalescingOptions()` (Enabled = true, MaxParties = 32).
private readonly Func<ReadCoalescingOptions> _coalescingOptions;
// Live keepalive config accessor. Read at backend-connect time (TCP SO_KEEPALIVE) and
// on each heartbeat-loop tick (idle threshold + probe address) so a hot-reload of
// `Connection.Keepalive` propagates without a listener restart. Production wires this
// to `() => optionsMonitor.CurrentValue.Connection.Keepalive`; the fallback reads the
// construction-time `ConnectionOptions` snapshot.
private readonly Func<KeepaliveOptions> _keepaliveOptions;
private readonly TxIdAllocator _allocator = new();
private readonly CorrelationMap _correlation = new();
@@ -86,6 +92,19 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
private CancellationTokenSource? _backendCts;
private Task? _backendWriterTask;
private Task? _backendReaderTask;
private Task? _backendHeartbeatTask;
// UTC ticks of the last backend socket activity (a send OR a received frame). Updated
// by the writer and reader tasks; read by the heartbeat loop to decide whether the
// socket has been idle long enough to warrant a probe. Interlocked for cross-task
// coherence.
private long _lastBackendActivityTicks;
// Unit ID of the most recent upstream request. The synthetic heartbeat reuses it so
// the probe targets the same Modbus unit the real clients successfully talk to.
// Defaults to 0 until the first upstream frame is seen; by the time a heartbeat can
// fire the backend socket exists, which means at least one upstream frame arrived.
private int _lastSeenUnitId;
private readonly CancellationTokenSource _disposeCts = new();
// Volatile so the disposing thread's write is observed by every hot-path reader
@@ -102,7 +121,8 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
PerPlcContext perPlcContext,
ILogger<PlcMultiplexer> logger,
ResiliencePipeline? backendConnectPipeline = null,
-Func<ReadCoalescingOptions>? coalescingOptions = null)
+Func<ReadCoalescingOptions>? coalescingOptions = null,
+Func<KeepaliveOptions>? keepaliveOptions = null)
{
_plc = plc;
_connectionOptions = connectionOptions;
@@ -111,6 +131,7 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
_logger = logger;
_backendConnectPipeline = backendConnectPipeline;
_coalescingOptions = coalescingOptions ?? (static () => new ReadCoalescingOptions());
_keepaliveOptions = keepaliveOptions ?? (() => _connectionOptions.Keepalive);
// Register the per-PLC cache as the live stats source for the snapshot path.
// Cache may be null when the per-PLC context has not been wired with one
@@ -282,6 +303,7 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
// Build a fresh backend socket and Polly-connect.
var backend = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
{ NoDelay = true };
SocketKeepalive.Apply(backend, _keepaliveOptions());
try
{
@@ -318,8 +340,11 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
{
_backendSocket = backend;
_backendCts = cts2;
// Seed the idle timer so the heartbeat loop measures idleness from connect.
Interlocked.Exchange(ref _lastBackendActivityTicks, DateTime.UtcNow.Ticks);
_backendWriterTask = Task.Run(() => RunBackendWriterAsync(backend, cts2.Token), CancellationToken.None);
_backendReaderTask = Task.Run(() => RunBackendReaderAsync(backend, cts2.Token), CancellationToken.None);
_backendHeartbeatTask = Task.Run(() => RunBackendHeartbeatAsync(cts2.Token), CancellationToken.None);
}
_ctx.Counters.IncrementConnectSuccess();
@@ -381,18 +406,20 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
{
Socket? oldSocket;
CancellationTokenSource? oldCts;
-Task? writer, reader;
+Task? writer, reader, heartbeat;
lock (_backendLock)
{
oldSocket = _backendSocket;
oldCts = _backendCts;
writer = _backendWriterTask;
reader = _backendReaderTask;
heartbeat = _backendHeartbeatTask;
_backendSocket = null;
_backendCts = null;
_backendWriterTask = null;
_backendReaderTask = null;
_backendHeartbeatTask = null;
}
if (oldSocket is null && oldCts is null) return;
@@ -454,6 +481,7 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
// Best-effort join.
try { if (writer is not null) await writer.WaitAsync(TimeSpan.FromSeconds(2)).ConfigureAwait(false); } catch { /* swallow */ }
try { if (reader is not null) await reader.WaitAsync(TimeSpan.FromSeconds(2)).ConfigureAwait(false); } catch { /* swallow */ }
try { if (heartbeat is not null) await heartbeat.WaitAsync(TimeSpan.FromSeconds(2)).ConfigureAwait(false); } catch { /* swallow */ }
oldCts?.Dispose();
@@ -489,6 +517,9 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
if (n == 0) throw new SocketException((int)SocketError.ConnectionReset);
sent += n;
}
// A send counts as backend activity — it suppresses the idle heartbeat.
Interlocked.Exchange(ref _lastBackendActivityTicks, DateTime.UtcNow.Ticks);
}
}
catch (OperationCanceledException)
@@ -542,6 +573,10 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
if (!await FillAsync(backend, frame, MbapFrame.HeaderSize, pduBodyLen, ct).ConfigureAwait(false))
break;
// A received frame counts as backend activity — it suppresses (and, for a
// heartbeat response, satisfies) the idle heartbeat.
Interlocked.Exchange(ref _lastBackendActivityTicks, DateTime.UtcNow.Ticks);
if (!_correlation.TryRemove(proxyTxId, out var inFlight))
{
// No correlation entry — either a stale response after cascade, or
@@ -552,6 +587,14 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
// Free the allocator slot immediately so it can be reused.
_allocator.Release(proxyTxId);
// Keepalive heartbeat response — the probe came back, the backend is alive.
// The activity timestamp was already refreshed above. There is no upstream
// party, no cache eligibility, and no rewriting to do: drop the payload and
// skip the EWMA update so the synthetic probe never pollutes the
// client-facing round-trip metric.
if (inFlight.IsHeartbeat)
continue;
// For FC03/FC04 reads, also clear the coalescing-by-key entry so a
// brand-new identical request issued AFTER this response is treated as a
// miss (opens a fresh round-trip). The TryRemove is best-effort: a
@@ -727,6 +770,10 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
out ushort originalTxId, out _, out _, out byte unitId))
return;
// Remember the unit ID so the backend keepalive heartbeat probes the same Modbus
// unit the real clients are known to reach successfully.
Volatile.Write(ref _lastSeenUnitId, unitId);
// Count inbound bytes from the upstream client. Surfaces in bytes.upstreamIn on
// the status page. Counted ONCE per parsed frame regardless of subsequent
// routing (cache hit, coalesce, backend round-trip, exception).
@@ -1062,6 +1109,23 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
_allocator.Release(proxyTxId);
// Keepalive heartbeat that never came back. The backend is no longer
// answering Modbus even though the socket may still look connected —
// tear it down proactively (cascading every attached pipe) so the
// failure is found here, during idle, instead of surfacing as a timeout on the
// next real client request. There is no upstream party to send a 0x0B to.
if (req.IsHeartbeat)
{
long hbElapsedMs = (long)(DateTimeOffset.UtcNow - req.SentAtUtc).TotalMilliseconds;
KeepaliveLogEvents.HeartbeatTimeout(_logger, _plc.Name, proxyTxId, hbElapsedMs);
_ctx.Counters.IncrementBackendHeartbeatFailed();
_ctx.Counters.IncrementBackendIdleDisconnect();
KeepaliveLogEvents.BackendIdleDisconnect(_logger, _plc.Name, hbElapsedMs);
if (!_disposeCts.IsCancellationRequested)
_ = TearDownBackendAsync("keepalive heartbeat timeout", cascadeUpstreams: true);
continue;
}
// Also clear the coalescing-by-key entry. A late attach that raced
// in just before the watchdog claim will still receive the 0x0B
// exception via this entry's InterestedParties list (List<T>
@@ -1110,6 +1174,124 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
}
}
// ── Backend keepalive heartbeat ───────────────────────────────────────────
/// <summary>
/// Backend keepalive heartbeat loop. Started alongside the writer/reader on each
/// successful connect and cancelled with them on teardown. While the backend socket
/// has been idle (no send or receive) for longer than
/// <see cref="KeepaliveOptions.BackendHeartbeatIdleMs"/>, it issues a synthetic FC03
/// qty=1 read so the path stays warm against middlebox idle-drop and a backend that is
/// connected-but-not-answering is detected here rather than on the next client request.
///
/// <para>The probe response is consumed by <see cref="RunBackendReaderAsync"/> (which
/// recognises <see cref="InFlightRequest.IsHeartbeat"/> and drops it); a probe that
/// never returns is timed out by <see cref="RunRequestTimeoutWatchdogAsync"/>, which
/// tears the backend down. The heartbeat keeps an <i>existing</i> backend warm — it
/// never resurrects a dead one (reconnect stays gated on the next upstream frame).</para>
/// </summary>
private async Task RunBackendHeartbeatAsync(CancellationToken ct)
{
try
{
while (!ct.IsCancellationRequested)
{
var ka = _keepaliveOptions();
int idleMs = Math.Max(1000, ka.BackendHeartbeatIdleMs);
// Tick at a quarter of the idle window so a freshly-elapsed idle period is
// noticed promptly, floored at 500 ms so the loop never busy-wakes.
int tickMs = Math.Max(500, idleMs / 4);
await Task.Delay(tickMs, ct).ConfigureAwait(false);
if (!ka.Enabled)
continue;
long lastTicks = Interlocked.Read(ref _lastBackendActivityTicks);
double idleElapsedMs =
(DateTime.UtcNow - new DateTime(lastTicks, DateTimeKind.Utc)).TotalMilliseconds;
if (idleElapsedMs < idleMs)
continue;
SendHeartbeat(ka);
}
}
catch (OperationCanceledException)
{
// Normal teardown.
}
catch (Exception ex)
{
_logger.LogError(ex, "Backend heartbeat loop faulted: Plc={Plc}", _plc.Name);
}
}
/// <summary>
/// Builds and enqueues one synthetic FC03 qty=1 heartbeat request onto the backend
/// outbound channel. The correlation entry is flagged <see cref="InFlightRequest.IsHeartbeat"/>
/// so the reader and watchdog treat it specially; it carries no interested parties and
/// bypasses the coalescing and cache paths entirely.
/// </summary>
private void SendHeartbeat(KeepaliveOptions ka)
{
// A saturated TxId space means the backend is busy (65,536 requests in flight),
// which is the opposite of idle — skip this tick rather than force a probe.
if (!_allocator.TryAllocate(out ushort proxyTxId))
return;
byte unitId = (byte)Volatile.Read(ref _lastSeenUnitId);
ushort address = (ushort)ka.BackendHeartbeatProbeAddress;
var inFlight = new InFlightRequest(
UnitId: unitId,
Fc: 0x03,
StartAddress: address,
Qty: 1,
InterestedParties: Array.Empty<InterestedParty>(),
SentAtUtc: DateTimeOffset.UtcNow,
ResolvedCacheTtlMs: 0,
IsHeartbeat: true);
if (!_correlation.TryAdd(proxyTxId, inFlight))
{
_allocator.Release(proxyTxId);
return;
}
byte[] frame = BuildHeartbeatFrame(proxyTxId, unitId, address);
// Non-blocking enqueue: if the channel is full the backend is not idle (a race), and
// if it is completed the backend is tearing down — either way, undo and skip.
if (!_outboundChannel.Writer.TryWrite(frame))
{
if (_correlation.TryRemove(proxyTxId, out _))
_allocator.Release(proxyTxId);
return;
}
_ctx.Counters.IncrementBackendHeartbeatSent();
KeepaliveLogEvents.HeartbeatSent(_logger, _plc.Name, proxyTxId, address);
}
/// <summary>
/// Builds a 12-byte MBAP-framed FC03 (Read Holding Registers) request reading one
/// register at <paramref name="address"/> — the keepalive heartbeat probe PDU.
/// </summary>
private static byte[] BuildHeartbeatFrame(ushort proxyTxId, byte unitId, ushort address)
{
// PDU = [fc=03][addrHi][addrLo][qtyHi][qtyLo]. MBAP length = UnitId(1) + PDU(5) = 6.
var frame = new byte[MbapFrame.HeaderSize + 5];
frame[0] = (byte)(proxyTxId >> 8);
frame[1] = (byte)(proxyTxId & 0xFF);
frame[2] = 0; frame[3] = 0; // ProtocolId
frame[4] = 0; frame[5] = 6; // Length
frame[6] = unitId;
frame[7] = 0x03; // FC03 Read Holding Registers
frame[8] = (byte)(address >> 8);
frame[9] = (byte)(address & 0xFF);
frame[10] = 0; frame[11] = 1; // Qty = 1
return frame;
}
// ── Helpers ───────────────────────────────────────────────────────────────
/// <summary>
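To make the probe built by `BuildHeartbeatFrame` above concrete, the sketch below reproduces the same byte layout in a standalone program and prints the resulting frame. It assumes `MbapFrame.HeaderSize` is 7 (transaction ID 2 + protocol ID 2 + length 2 + unit ID 1), which is what the byte offsets in the diff imply. `HeartbeatFrameDemo` and the sample transaction ID are hypothetical.

```csharp
using System;

// Standalone illustration of the heartbeat probe layout; not part of mbproxy.
public static class HeartbeatFrameDemo
{
    // Assumption: 7-byte MBAP header, matching the offsets used in BuildHeartbeatFrame.
    private const int MbapHeaderSize = 7;

    public static byte[] Build(ushort txId, byte unitId, ushort address)
    {
        // PDU = [fc=03][addrHi][addrLo][qtyHi][qtyLo], 5 bytes after the header.
        var frame = new byte[MbapHeaderSize + 5];
        frame[0] = (byte)(txId >> 8);      // transaction ID, big-endian
        frame[1] = (byte)(txId & 0xFF);
        // frame[2..3]: protocol ID, left at 0 for Modbus
        frame[5] = 6;                      // length = unit ID (1) + PDU (5); high byte stays 0
        frame[6] = unitId;
        frame[7] = 0x03;                   // FC03 Read Holding Registers
        frame[8] = (byte)(address >> 8);   // start address, big-endian
        frame[9] = (byte)(address & 0xFF);
        frame[11] = 1;                     // quantity = 1; high byte stays 0
        return frame;
    }

    public static void Main()
    {
        // Sample values: transaction ID 0x1234, unit 1, probe address 0 (the default).
        Console.WriteLine(BitConverter.ToString(Build(0x1234, 1, 0)));
        // prints 12-34-00-00-00-06-01-03-00-00-00-01
    }
}
```

The frame is 12 bytes in total, which agrees with the "12-byte MBAP-framed FC03" wording in the method summary.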
@@ -1,6 +1,7 @@
using System.Net;
using System.Net.Sockets;
using System.Threading.Channels;
using Mbproxy.Options;
namespace Mbproxy.Proxy.Multiplexing;
@@ -77,10 +78,15 @@ internal sealed partial class UpstreamPipe : IAsyncDisposable
/// </summary>
public bool IsAlive => !_disposed && !_cts.IsCancellationRequested;
-public UpstreamPipe(Socket upstream, string plcName, ILogger logger)
+public UpstreamPipe(Socket upstream, string plcName, ILogger logger, KeepaliveOptions? keepalive = null)
{
_upstream = upstream;
_upstream.NoDelay = true;
// Enable OS TCP keepalive on the accepted client socket so a half-open/dead
// client (gone without a TCP FIN) faults the read loop and is reaped, instead of
// leaking a pipe + correlation slots until the proxy next tries to write to it.
if (keepalive is not null)
SocketKeepalive.Apply(_upstream, keepalive);
RemoteEp = upstream.RemoteEndPoint as IPEndPoint;
_plcName = plcName;
_logger = logger;
+10 -3
@@ -30,6 +30,10 @@ internal sealed partial class PlcListener : IAsyncDisposable
private readonly PerPlcContext? _perPlcContext;
private readonly ResiliencePipeline? _backendConnectPipeline;
private readonly Func<ReadCoalescingOptions>? _coalescingOptions;
// Live keepalive accessor (TCP SO_KEEPALIVE on accepted upstream sockets + the backend
// heartbeat). Non-null after construction — falls back to the construction-time
// ConnectionOptions snapshot when no live accessor is supplied.
private readonly Func<KeepaliveOptions> _keepaliveOptions;
private TcpListener? _listener;
private PlcMultiplexer? _multiplexer;
@@ -62,7 +66,8 @@ internal sealed partial class PlcListener : IAsyncDisposable
ILogger pipeLogger,
PerPlcContext? perPlcContext = null,
ResiliencePipeline? backendConnectPipeline = null,
-Func<ReadCoalescingOptions>? coalescingOptions = null)
+Func<ReadCoalescingOptions>? coalescingOptions = null,
+Func<KeepaliveOptions>? keepaliveOptions = null)
{
_plc = plc;
_connectionOptions = connectionOptions;
@@ -73,6 +78,7 @@ internal sealed partial class PlcListener : IAsyncDisposable
_perPlcContext = perPlcContext;
_backendConnectPipeline = backendConnectPipeline;
_coalescingOptions = coalescingOptions;
_keepaliveOptions = keepaliveOptions ?? (() => _connectionOptions.Keepalive);
}
/// <summary>
@@ -103,7 +109,8 @@ internal sealed partial class PlcListener : IAsyncDisposable
ctx,
_multiplexerLogger,
_backendConnectPipeline,
-_coalescingOptions);
+_coalescingOptions,
+_keepaliveOptions);
}
/// <summary>
@@ -125,7 +132,7 @@ internal sealed partial class PlcListener : IAsyncDisposable
{
Socket upstream = await _listener.AcceptSocketAsync(ct).ConfigureAwait(false);
-var pipe = new UpstreamPipe(upstream, _plc.Name, _pipeLogger);
+var pipe = new UpstreamPipe(upstream, _plc.Name, _pipeLogger, _keepaliveOptions());
var pipeTask = Task.Run(async () =>
{
try
+39 -2
@@ -133,7 +133,24 @@ public sealed record CounterSnapshot(
/// socket fast enough to keep up with the backend; the wedged client loses its own
/// responses but its peers on the same PLC continue to receive theirs.
/// </summary>
-long ResponseDropForFullUpstream);
+long ResponseDropForFullUpstream,
/// <summary>
/// Cumulative count of backend keepalive heartbeat probes issued (synthetic FC03
/// qty=1 reads sent on an idle backend socket).
/// </summary>
long BackendHeartbeatsSent,
/// <summary>
/// Cumulative count of backend keepalive heartbeat probes that were not answered
/// within <c>BackendRequestTimeoutMs</c>. Each failure triggers a proactive backend
/// teardown (see <see cref="BackendIdleDisconnects"/>).
/// </summary>
long BackendHeartbeatsFailed,
/// <summary>
/// Cumulative count of backend teardowns triggered by a failed keepalive heartbeat.
/// Distinct from <see cref="BackendDisconnectCascades"/> (which counts cascaded
/// pipes); this counts the disconnect <i>events</i> attributed to keepalive.
/// </summary>
long BackendIdleDisconnects);
/// <summary>
/// Thread-safe per-PLC counters backed by <see cref="System.Threading.Interlocked"/> longs.
@@ -184,6 +201,11 @@ internal sealed class ProxyCounters
// and account.
private long _responseDropForFullUpstream;
// Backend keepalive heartbeat counters.
private long _backendHeartbeatsSent;
private long _backendHeartbeatsFailed;
private long _backendIdleDisconnects;
// Live cache state pulled from a per-PLC ResponseCache on each snapshot. The
// multiplexer registers a single provider via SetCacheStatsProvider so the status
// page sees current entry-count / bytes without a separate poll.
@@ -315,6 +337,18 @@ internal sealed class ProxyCounters
public void IncrementResponseDropForFullUpstream()
=> Interlocked.Increment(ref _responseDropForFullUpstream);
/// <summary>Records one backend keepalive heartbeat probe sent.</summary>
public void IncrementBackendHeartbeatSent()
=> Interlocked.Increment(ref _backendHeartbeatsSent);
/// <summary>Records one backend keepalive heartbeat probe that timed out.</summary>
public void IncrementBackendHeartbeatFailed()
=> Interlocked.Increment(ref _backendHeartbeatsFailed);
/// <summary>Records one backend teardown triggered by a failed keepalive heartbeat.</summary>
public void IncrementBackendIdleDisconnect()
=> Interlocked.Increment(ref _backendIdleDisconnects);
/// <summary>
/// Wires the per-PLC <see cref="Cache.ResponseCache"/> as the live stats source for
/// the snapshot path. Pass <c>null</c> to detach during disposal.
@@ -445,7 +479,10 @@ internal sealed class ProxyCounters
CacheInvalidations: Interlocked.Read(ref _cacheInvalidations),
CacheEntryCount: cacheEntries,
CacheBytes: cacheBytes,
-ResponseDropForFullUpstream: Interlocked.Read(ref _responseDropForFullUpstream));
+ResponseDropForFullUpstream: Interlocked.Read(ref _responseDropForFullUpstream),
+BackendHeartbeatsSent: Interlocked.Read(ref _backendHeartbeatsSent),
+BackendHeartbeatsFailed: Interlocked.Read(ref _backendHeartbeatsFailed),
+BackendIdleDisconnects: Interlocked.Read(ref _backendIdleDisconnects));
}
}
+10 -2
@@ -150,6 +150,11 @@ internal sealed partial class ProxyWorker : BackgroundService
Func<ReadCoalescingOptions> coalescingAccessor =
() => _options.CurrentValue.Resilience.ReadCoalescing;
// Live accessor for KeepaliveOptions so a hot-reload of `Connection.Keepalive`
// propagates to the backend heartbeat loop and to upstream-socket keepalive.
Func<KeepaliveOptions> keepaliveAccessor =
() => _options.CurrentValue.Connection.Keepalive;
var supervisor = new PlcListenerSupervisor(
plc,
opts.Connection,
@@ -161,7 +166,8 @@ internal sealed partial class ProxyWorker : BackgroundService
recoveryPipeline,
_loggerFactory.CreateLogger<PlcListenerSupervisor>(),
backendPipeline,
-coalescingAccessor);
+coalescingAccessor,
+keepaliveAccessor);
_supervisors[plc.Name] = supervisor;
}
@@ -175,7 +181,9 @@ internal sealed partial class ProxyWorker : BackgroundService
// (add/restart paths) honour hot-reloaded ReadCoalescing values.
Func<ReadCoalescingOptions> reconcilerCoalescingAccessor =
() => _options.CurrentValue.Resilience.ReadCoalescing;
-_reconciler.Attach(_supervisors, opts, reconcilerCoalescingAccessor);
+Func<KeepaliveOptions> reconcilerKeepaliveAccessor =
+() => _options.CurrentValue.Connection.Keepalive;
+_reconciler.Attach(_supervisors, opts, reconcilerCoalescingAccessor, reconcilerKeepaliveAccessor);
if (_supervisors.Count == 0)
{
@@ -0,0 +1,49 @@
using System.Net.Sockets;
using Mbproxy.Options;
namespace Mbproxy.Proxy;
/// <summary>
/// Applies OS-level TCP keepalive (<c>SO_KEEPALIVE</c> plus the idle-time / probe-interval /
/// probe-count tunables) to a socket. Used on both the backend socket (proxy → PLC) and
/// accepted upstream sockets (client → proxy) so the OS detects a dead peer on an
/// otherwise-idle connection — the DL205/DL260 ECOM never emits keepalives of its own.
/// </summary>
internal static class SocketKeepalive
{
/// <summary>
/// Enables TCP keepalive on <paramref name="socket"/> from <paramref name="options"/>.
/// A no-op when <see cref="KeepaliveOptions.Enabled"/> is <c>false</c>.
///
/// <para>Failures are swallowed: keepalive is a best-effort belt-and-suspenders measure
/// (the backend application heartbeat is the load-bearing mechanism) and must never
/// abort a connection. The three TCP tunables are also not honoured on every platform;
/// a refusal there is benign.</para>
/// </summary>
public static void Apply(Socket socket, KeepaliveOptions options)
{
if (!options.Enabled) return;
try
{
socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
// SocketOptionName.TcpKeepAliveTime / TcpKeepAliveInterval are specified in
// SECONDS; round the configured milliseconds up to at least one second.
int idleSec = Math.Max(1, (options.TcpIdleTimeMs + 999) / 1000);
int intervalSec = Math.Max(1, (options.TcpProbeIntervalMs + 999) / 1000);
socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveTime, idleSec);
socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveInterval, intervalSec);
socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveRetryCount, options.TcpProbeCount);
}
catch (SocketException)
{
// Platform refused a tunable — keepalive stays best-effort.
}
catch (ObjectDisposedException)
{
// Socket closed concurrently — nothing to do.
}
}
}
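`SocketKeepalive.Apply` converts the millisecond settings to whole seconds with a ceiling division floored at one second. The sketch below isolates that expression so the rounding is easy to check; `KeepaliveSeconds` and `ToSeconds` are hypothetical names, and only the arithmetic is taken from the diff.

```csharp
using System;

// Standalone illustration of the ms-to-seconds rounding used in SocketKeepalive.Apply.
public static class KeepaliveSeconds
{
    // Same expression as the diff: round up to the next whole second, never below 1 s.
    public static int ToSeconds(int ms) => Math.Max(1, (ms + 999) / 1000);

    public static void Main()
    {
        Console.WriteLine(ToSeconds(30000)); // prints 30 (TcpIdleTimeMs default)
        Console.WriteLine(ToSeconds(5000));  // prints 5  (TcpProbeIntervalMs default)
        Console.WriteLine(ToSeconds(1500));  // prints 2  (rounded up, not truncated)
        Console.WriteLine(ToSeconds(1));     // prints 1  (floor of one second)
    }
}
```

One caveat worth noting: `ms + 999` can overflow for values within 999 of `int.MaxValue`, and the validator in this diff only checks that the settings are positive. Whether an upper bound is enforced elsewhere is not visible here.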
@@ -38,6 +38,7 @@ internal sealed partial class PlcListenerSupervisor : IAsyncDisposable
private readonly ILogger<PlcListenerSupervisor> _logger;
private readonly ResiliencePipeline? _backendConnectPipeline;
private readonly Func<ReadCoalescingOptions>? _coalescingOptions;
private readonly Func<KeepaliveOptions>? _keepaliveOptions;
// ── Mutable state ────────────────────────────────────────────────────────────────────
@@ -94,7 +95,8 @@ internal sealed partial class PlcListenerSupervisor : IAsyncDisposable
ResiliencePipeline recoveryPipeline,
ILogger<PlcListenerSupervisor> logger,
ResiliencePipeline? backendConnectPipeline = null,
-Func<ReadCoalescingOptions>? coalescingOptions = null)
+Func<ReadCoalescingOptions>? coalescingOptions = null,
+Func<KeepaliveOptions>? keepaliveOptions = null)
{
_plc = plc;
_connectionOptions = connectionOptions;
@@ -108,6 +110,7 @@ internal sealed partial class PlcListenerSupervisor : IAsyncDisposable
_logger = logger;
_backendConnectPipeline = backendConnectPipeline;
_coalescingOptions = coalescingOptions;
_keepaliveOptions = keepaliveOptions;
}
/// <summary>
@@ -325,7 +328,8 @@ internal sealed partial class PlcListenerSupervisor : IAsyncDisposable
_pipeLogger,
_currentContext,
_backendConnectPipeline,
-_coalescingOptions);
+_coalescingOptions,
+_keepaliveOptions);
// Expose the current listener for status-page pair enumeration.
_currentListener = listener;
@@ -53,7 +53,9 @@ public sealed class StatusHtmlRendererTests
CoalescedHitCount: 0, CoalescedMissCount: 0,
CoalescedResponseToDeadUpstream: 0,
CacheHitCount: 0, CacheMissCount: 0,
-CacheInvalidations: 0, CacheEntryCount: 0, CacheBytes: 0),
+CacheInvalidations: 0, CacheEntryCount: 0, CacheBytes: 0,
+BackendHeartbeatsSent: 0, BackendHeartbeatsFailed: 0,
+BackendIdleDisconnects: 0),
Bytes: new PlcBytesStatus(1024, 2048));
}
@@ -264,4 +264,74 @@ public sealed class ReloadValidatorTests
Assert.False(valid);
Assert.Contains(errors, e => e.Contains("GracefulShutdownTimeoutMs"));
}
// ── Keepalive section ─────────────────────────────────────────────────────
[Fact]
public void Validate_DefaultKeepalive_Passes()
{
// Default ConnectionOptions → default KeepaliveOptions (idle 30 s, request 3 s).
var opts = MakeOptions([MakePlc("PLC-A", 5020)]);
bool valid = ReloadValidator.Validate(opts, out _);
Assert.True(valid);
}
[Fact]
public void Validate_NonPositiveTcpProbeCount_Fails()
{
var opts = new MbproxyOptions
{
Plcs = [MakePlc("PLC-A", 5020)],
Connection = new ConnectionOptions
{
Keepalive = new KeepaliveOptions { TcpProbeCount = 0 },
},
};
bool valid = ReloadValidator.Validate(opts, out var errors);
Assert.False(valid);
Assert.Contains(errors, e => e.Contains("TcpProbeCount"));
}
[Fact]
public void Validate_OutOfRangeHeartbeatProbeAddress_Fails()
{
var opts = new MbproxyOptions
{
Plcs = [MakePlc("PLC-A", 5020)],
Connection = new ConnectionOptions
{
Keepalive = new KeepaliveOptions { BackendHeartbeatProbeAddress = 70000 },
},
};
bool valid = ReloadValidator.Validate(opts, out var errors);
Assert.False(valid);
Assert.Contains(errors, e => e.Contains("BackendHeartbeatProbeAddress"));
}
[Fact]
public void Validate_HeartbeatIdleNotAboveRequestTimeout_Fails()
{
// BackendHeartbeatIdleMs must sit ABOVE BackendRequestTimeoutMs, else a heartbeat
// would be timed out as fast as it could be issued.
var opts = new MbproxyOptions
{
Plcs = [MakePlc("PLC-A", 5020)],
Connection = new ConnectionOptions
{
BackendRequestTimeoutMs = 3000,
Keepalive = new KeepaliveOptions { BackendHeartbeatIdleMs = 3000 },
},
};
bool valid = ReloadValidator.Validate(opts, out var errors);
Assert.False(valid);
Assert.Contains(errors, e => e.Contains("BackendHeartbeatIdleMs"));
}
}
@@ -0,0 +1,40 @@
using Mbproxy.Diagnostics;
using Shouldly;
using Xunit;
namespace Mbproxy.Tests.Diagnostics;
/// <summary>
/// Unit tests for <see cref="DiagnosticSinkSelector"/> — the pure platform-selection
/// logic for the Error+ diagnostic sink. Covers every OS / host combination so the
/// selection contract is pinned without needing a real Windows Service or systemd host.
/// </summary>
[Trait("Category", "Unit")]
public sealed class DiagnosticSinkSelectorTests
{
// 'expected' is the underlying int of DiagnosticSink: the enum is internal and
// cannot appear in a public (xunit-discoverable) method signature.
[Theory]
[InlineData(true, true, false, (int)DiagnosticSink.EventLog)] // Windows, hosted as a Windows Service
[InlineData(true, false, false, (int)DiagnosticSink.None)] // Windows, interactive / dev run
[InlineData(false, false, true, (int)DiagnosticSink.Syslog)] // Linux, hosted as a systemd service
[InlineData(false, false, false, (int)DiagnosticSink.None)] // Linux / macOS, interactive / dev run
public void Select_PicksExpectedSink(
bool isWindows, bool isWindowsService, bool isSystemd, int expected)
=> ((int)DiagnosticSinkSelector.Select(isWindows, isWindowsService, isSystemd)).ShouldBe(expected);
[Fact]
public void Select_Windows_TakesPrecedence_OverASpuriousSystemdFlag()
=> DiagnosticSinkSelector.Select(isWindows: true, isWindowsService: true, isSystemd: true)
.ShouldBe(DiagnosticSink.EventLog);
[Fact]
public void Select_WindowsConsoleRun_GetsNoSink_EvenIfSystemdFlagSet()
=> DiagnosticSinkSelector.Select(isWindows: true, isWindowsService: false, isSystemd: true)
.ShouldBe(DiagnosticSink.None);
[Fact]
public void Select_NonWindowsWithoutSystemd_GetsNoSink()
=> DiagnosticSinkSelector.Select(isWindows: false, isWindowsService: false, isSystemd: false)
.ShouldBe(DiagnosticSink.None);
}
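// Illustrative sketch only (hypothetical, not the shipped DiagnosticSinkSelector): a
// selection rule that would satisfy every case pinned above. The enum and class names
// here are placeholders; the real enum's member order and values are not shown in this diff.
internal enum DiagnosticSinkSketch { None, EventLog, Syslog }

internal static class DiagnosticSinkSelectorSketch
{
    public static DiagnosticSinkSketch Select(bool isWindows, bool isWindowsService, bool isSystemd)
    {
        // Windows is decided first, so a spurious systemd flag is ignored there.
        if (isWindows)
            return isWindowsService ? DiagnosticSinkSketch.EventLog : DiagnosticSinkSketch.None;

        // Off Windows, only a systemd-hosted process gets the syslog sink.
        return isSystemd ? DiagnosticSinkSketch.Syslog : DiagnosticSinkSketch.None;
    }
}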
@@ -0,0 +1,41 @@
using Mbproxy.Diagnostics;
using Shouldly;
using Xunit;
namespace Mbproxy.Tests.Diagnostics;
/// <summary>
/// Unit tests for <see cref="EventLogMessage.TruncateToLimit"/> — the 32 KB Windows
/// Event Log truncation rule. The helper is pure and OS-agnostic, so these run on
/// every platform (the Windows-only <see cref="EventLogBridge"/> sink itself is not
/// exercised here).
/// </summary>
[Trait("Category", "Unit")]
public sealed class EventLogMessageTests
{
[Fact]
public void TruncateToLimit_ShortMessage_ReturnedUnchanged()
{
const string msg = "mbproxy backend connect failed";
EventLogMessage.TruncateToLimit(msg).ShouldBeSameAs(msg);
}
[Fact]
public void TruncateToLimit_MessageAtTheLimit_NotTruncated()
{
// MaxBytes / 2 chars = exactly MaxBytes at the 2-bytes-per-char upper bound.
var atLimit = new string('y', EventLogMessage.MaxBytes / 2);
EventLogMessage.TruncateToLimit(atLimit).ShouldBe(atLimit);
}
[Fact]
public void TruncateToLimit_OversizeMessage_TruncatedWithinLimit_AndEndsWithEllipsis()
{
var huge = new string('x', EventLogMessage.MaxBytes); // MaxBytes chars = twice the byte limit at 2 bytes per char
var result = EventLogMessage.TruncateToLimit(huge);
(result.Length * 2).ShouldBeLessThanOrEqualTo(EventLogMessage.MaxBytes);
result.ShouldEndWith("...");
result.Length.ShouldBeLessThan(huge.Length);
}
}
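// Illustrative sketch only (hypothetical, not the shipped EventLogMessage): a truncation
// helper consistent with the three tests above. The MaxBytes value is an assumption taken
// from the "32 KB" wording in the summary; the real constant is not shown in this diff.
internal static class EventLogMessageSketch
{
    public const int MaxBytes = 32 * 1024;   // assumed value
    private const string Ellipsis = "...";

    public static string TruncateToLimit(string message)
    {
        // Budget in chars, using the 2-bytes-per-char upper bound the tests rely on.
        int maxChars = MaxBytes / 2;

        // Within budget: return the same instance, untouched.
        if (message.Length <= maxChars)
            return message;

        // Over budget: cut so that the text plus the ellipsis still fits the budget.
        return message.Substring(0, maxChars - Ellipsis.Length) + Ellipsis;
    }
}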
@@ -0,0 +1,27 @@
using Mbproxy.Diagnostics;
using Serilog;
using Shouldly;
using Xunit;
namespace Mbproxy.Tests.Diagnostics;
/// <summary>
/// Unit tests for <see cref="SyslogBridge"/>. The bridge's fail-safe contract is that
/// attaching the local-syslog sink and building the resulting logger never throw —
/// even on a host with no <c>/dev/log</c> (e.g. the Windows test leg), where the sink
/// connects lazily and degrades silently.
/// </summary>
[Trait("Category", "Unit")]
public sealed class SyslogBridgeTests
{
[Fact]
public void AttachTo_ReturnsAConfiguration_AndNeverThrows()
=> SyslogBridge.AttachTo(new LoggerConfiguration()).ShouldNotBeNull();
[Fact]
public void AttachTo_ResultCreatesALogger_WithoutThrowing()
{
using var logger = SyslogBridge.AttachTo(new LoggerConfiguration()).CreateLogger();
logger.ShouldNotBeNull();
}
}
@@ -5,6 +5,8 @@ using Mbproxy.Proxy;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Hosting.Systemd;
using Microsoft.Extensions.Hosting.WindowsServices;
using Serilog;
using Serilog.Core;
using Serilog.Events;
@@ -71,6 +73,26 @@ public sealed class HostSmokeTests
// Assert: does not throw / time out.
await stopTask.ShouldCompleteWithinAsync(TimeSpan.FromSeconds(3));
}
[Fact]
public async Task HostSmoke_BothInitSystemIntegrations_CoRegister_AndHostRunsCleanly()
{
// Arrange: register BOTH init-system integrations. Each is a no-op off its
// own init system, so on a test run (neither) the default console lifetime
// applies — they must co-register without conflict and leave the host
// startable and stoppable.
var builder = Host.CreateApplicationBuilder();
builder.Services.AddWindowsService();
builder.Services.AddSystemd();
builder.ConfigureForTest(new LoggerConfiguration().CreateLogger());
using var host = builder.Build();
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
// Act + Assert: start/stop do not throw or time out.
await host.StartAsync(cts.Token);
await host.StopAsync(cts.Token);
}
}
/// <summary>
@@ -102,6 +102,39 @@ public sealed class MbproxyOptionsBindingTests
options.Resilience.ListenerRecovery.InitialBackoffMs.ShouldBe([1000, 2000, 5000, 15000, 30000]);
options.Plcs.ShouldBeEmpty();
options.BcdTags.Global.ShouldBeEmpty();
// Keepalive defaults — enabled, with the documented timer values.
options.Connection.Keepalive.Enabled.ShouldBeTrue();
options.Connection.Keepalive.TcpIdleTimeMs.ShouldBe(30000);
options.Connection.Keepalive.TcpProbeIntervalMs.ShouldBe(5000);
options.Connection.Keepalive.TcpProbeCount.ShouldBe(4);
options.Connection.Keepalive.BackendHeartbeatIdleMs.ShouldBe(30000);
options.Connection.Keepalive.BackendHeartbeatProbeAddress.ShouldBe(0);
}
// -------------------------------------------------------------------------
// Test 5 — the Connection:Keepalive block binds from configuration
// -------------------------------------------------------------------------
[Fact]
public void MbproxyOptionsBinding_BindsKeepaliveBlock()
{
var options = BindOptions(new Dictionary<string, string?>
{
["Mbproxy:Connection:Keepalive:Enabled"] = "false",
["Mbproxy:Connection:Keepalive:TcpIdleTimeMs"] = "45000",
["Mbproxy:Connection:Keepalive:TcpProbeIntervalMs"] = "7000",
["Mbproxy:Connection:Keepalive:TcpProbeCount"] = "6",
["Mbproxy:Connection:Keepalive:BackendHeartbeatIdleMs"] = "20000",
["Mbproxy:Connection:Keepalive:BackendHeartbeatProbeAddress"] = "1024",
});
var ka = options.Connection.Keepalive;
ka.Enabled.ShouldBeFalse();
ka.TcpIdleTimeMs.ShouldBe(45000);
ka.TcpProbeIntervalMs.ShouldBe(7000);
ka.TcpProbeCount.ShouldBe(6);
ka.BackendHeartbeatIdleMs.ShouldBe(20000);
ka.BackendHeartbeatProbeAddress.ShouldBe(1024);
}
// -------------------------------------------------------------------------
@@ -129,4 +162,47 @@ public sealed class MbproxyOptionsBindingTests
result.Failed.ShouldBeTrue("Width=8 should fail schema validation");
result.Failures.ShouldNotBeEmpty();
}
// -------------------------------------------------------------------------
// Test 6 — every shipped install template (Windows + Linux) loads as JSONC,
// binds to MbproxyOptions, and passes schema validation. This catches
// a malformed template at build time and keeps the two platform
// variants in lockstep.
// -------------------------------------------------------------------------
[Theory]
[InlineData("mbproxy.config.template.json")]
[InlineData("mbproxy.linux.config.template.json")]
public void MbproxyOptionsBinding_ShippedInstallTemplate_BindsAndValidates(string templateFileName)
{
var templatePath = ResolveInstallFile(templateFileName);
// The templates are JSONC; the .NET JSON config provider skips // and /* */
// comments and allows trailing commas, so AddJsonFile loads them directly.
var config = new ConfigurationBuilder()
.AddJsonFile(templatePath, optional: false)
.Build();
var options = config.GetSection("Mbproxy").Get<MbproxyOptions>() ?? new MbproxyOptions();
var result = new MbproxyOptionsValidator().Validate(null, options);
result.Succeeded.ShouldBeTrue(
$"{templateFileName} must pass schema validation — failures: " +
string.Join("; ", result.Failures ?? []));
}
/// <summary>
/// Resolves an <c>install/</c> file by walking up from the test assembly directory.
/// Works from both the Windows dev box and the Linux test box.
/// </summary>
private static string ResolveInstallFile(string fileName)
{
for (var dir = new DirectoryInfo(AppContext.BaseDirectory); dir is not null; dir = dir.Parent)
{
var candidate = Path.Combine(dir.FullName, "install", fileName);
if (File.Exists(candidate))
return candidate;
}
throw new FileNotFoundException(
$"Could not locate install/{fileName} above {AppContext.BaseDirectory}");
}
}
@@ -0,0 +1,366 @@
using System.Net;
using System.Net.Sockets;
using Mbproxy.Options;
using Mbproxy.Proxy;
using Mbproxy.Proxy.Multiplexing;
using Microsoft.Extensions.Logging.Abstractions;
using Shouldly;
using Xunit;
namespace Mbproxy.Tests.Proxy.Multiplexing;
/// <summary>
/// Tests for the backend keepalive heartbeat and the <see cref="SocketKeepalive"/> helper.
/// The heartbeat tests run the real <see cref="PlcMultiplexer"/> against a stub backend
/// (real sockets, no simulator) with a deliberately short <c>BackendHeartbeatIdleMs</c>.
/// </summary>
[Trait("Category", "Unit")]
public sealed class KeepaliveTests
{
// ── Helpers ────────────────────────────────────────────────────────────────
private static int PickFreePort()
{
var l = new TcpListener(IPAddress.Loopback, 0);
l.Start();
int port = ((IPEndPoint)l.LocalEndpoint).Port;
l.Stop();
return port;
}
private static async Task<byte[]> ReadExactAsync(Socket socket, int count, CancellationToken ct)
{
var buf = new byte[count];
int read = 0;
while (read < count)
{
int n = await socket.ReceiveAsync(buf.AsMemory(read, count - read), SocketFlags.None, ct);
if (n == 0) throw new IOException("EOF");
read += n;
}
return buf;
}
private static async Task<byte[]> ReadOneFrameAsync(Socket socket, CancellationToken ct)
{
var header = await ReadExactAsync(socket, 7, ct);
ushort length = (ushort)((header[4] << 8) | header[5]);
int bodyLen = length - 1;
var body = bodyLen > 0 ? await ReadExactAsync(socket, bodyLen, ct) : Array.Empty<byte>();
var frame = new byte[7 + bodyLen];
Buffer.BlockCopy(header, 0, frame, 0, 7);
if (bodyLen > 0) Buffer.BlockCopy(body, 0, frame, 7, bodyLen);
return frame;
}
private static byte[] BuildFc03ReadFrame(ushort txId, ushort start, ushort qty, byte unitId = 1)
=>
[
(byte)(txId >> 8), (byte)(txId & 0xFF),
0x00, 0x00,
0x00, 0x06,
unitId,
0x03,
(byte)(start >> 8), (byte)(start & 0xFF),
(byte)(qty >> 8), (byte)(qty & 0xFF),
];
private static byte[] BuildFc03Response(ushort txId, byte unitId, ushort register)
{
// Body = FC(1) + byteCount(1) + data(2) = 4. MBAP length = UnitId(1) + body(4) = 5.
var frame = new byte[7 + 4];
frame[0] = (byte)(txId >> 8);
frame[1] = (byte)(txId & 0xFF);
frame[2] = 0; frame[3] = 0;
frame[4] = 0; frame[5] = 5; // length
frame[6] = unitId;
frame[7] = 0x03;
frame[8] = 2; // byte count
frame[9] = (byte)(register >> 8);
frame[10] = (byte)(register & 0xFF);
return frame;
}
private static PerPlcContext MakeContext(string name) => new()
{
PlcName = name,
TagMap = Mbproxy.Bcd.BcdTagMap.Empty,
Counters = new ProxyCounters(),
Logger = NullLogger.Instance,
};
/// <summary>
/// Stub backend that echoes FC03 responses (including the synthetic heartbeat probe,
/// which is itself an FC03). When <see cref="Silent"/> is set it reads and drains
/// requests but never responds — used to drive heartbeat timeouts.
/// </summary>
private sealed class StubBackend : IAsyncDisposable
{
public int Port { get; }
public volatile bool Silent;
private int _requestCount;
public int RequestCount => Volatile.Read(ref _requestCount);
private readonly TcpListener _listener;
private readonly CancellationTokenSource _cts = new();
private readonly List<Task> _clientTasks = new();
public StubBackend(int port, bool silent = false)
{
Port = port;
Silent = silent;
_listener = new TcpListener(IPAddress.Loopback, port);
_listener.Start();
_ = AcceptLoop();
}
private async Task AcceptLoop()
{
try
{
while (!_cts.IsCancellationRequested)
{
Socket s = await _listener.AcceptSocketAsync(_cts.Token);
var task = Task.Run(() => HandleAsync(s));
lock (_clientTasks) _clientTasks.Add(task);
}
}
catch { /* shutdown */ }
}
private async Task HandleAsync(Socket s)
{
try
{
while (!_cts.IsCancellationRequested)
{
var req = await ReadOneFrameAsync(s, _cts.Token);
if (req.Length < 8) break;
Interlocked.Increment(ref _requestCount);
if (Silent) continue;
ushort txId = (ushort)((req[0] << 8) | req[1]);
byte unitId = req[6];
byte fc = req[7];
if (fc != 0x03) break;
await s.SendAsync(BuildFc03Response(txId, unitId, 0x1234), SocketFlags.None, _cts.Token);
}
}
catch { /* normal */ }
finally { try { s.Dispose(); } catch { } }
}
public async ValueTask DisposeAsync()
{
await _cts.CancelAsync();
try { _listener.Stop(); } catch { }
Task[] snap;
lock (_clientTasks) snap = _clientTasks.ToArray();
try { await Task.WhenAll(snap).WaitAsync(TimeSpan.FromSeconds(2)); } catch { }
_cts.Dispose();
}
}
private static PlcMultiplexer BuildMux(PlcOptions plc, ConnectionOptions connOpts, PerPlcContext ctx)
=> new(
plc, connOpts,
new BcdPduPipeline(),
ctx,
NullLogger<PlcMultiplexer>.Instance,
backendConnectPipeline: null);
private static async Task<(Socket client, UpstreamPipe pipe, TcpListener listener)>
ConnectClientAsync(PlcMultiplexer mux, string plcName)
{
int proxyPort = PickFreePort();
var listener = new TcpListener(IPAddress.Loopback, proxyPort);
listener.Start();
var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
{ NoDelay = true };
await client.ConnectAsync(IPAddress.Loopback, proxyPort);
var upstream = await listener.AcceptSocketAsync();
var pipe = new UpstreamPipe(upstream, plcName, NullLogger.Instance);
_ = Task.Run(() => mux.StartPipeAsync(pipe, CancellationToken.None));
return (client, pipe, listener);
}
// ── SocketKeepalive helper ─────────────────────────────────────────────────
[Fact]
public void SocketKeepalive_Apply_Enabled_TurnsOnKeepAlive()
{
using var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
SocketKeepalive.Apply(socket, new KeepaliveOptions
{
Enabled = true,
TcpIdleTimeMs = 30000,
TcpProbeIntervalMs = 5000,
TcpProbeCount = 4,
});
int keepAlive = (int)socket.GetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive)!;
keepAlive.ShouldNotBe(0, "SO_KEEPALIVE must be enabled after Apply");
}
[Fact]
public void SocketKeepalive_Apply_Disabled_IsNoOp()
{
using var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
SocketKeepalive.Apply(socket, new KeepaliveOptions { Enabled = false });
int keepAlive = (int)socket.GetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive)!;
keepAlive.ShouldBe(0, "Apply with Enabled=false must not touch the socket");
}
// ── Backend heartbeat ──────────────────────────────────────────────────────
[Fact]
public async Task Heartbeat_FiresOnIdleBackend_AndIsAnswered_NoCascade()
{
int backendPort = PickFreePort();
await using var backend = new StubBackend(backendPort);
var ctx = MakeContext("PLC1");
var plc = new PlcOptions { Name = "PLC1", ListenPort = 0, Host = "127.0.0.1", Port = backendPort };
var connOpts = new ConnectionOptions
{
Keepalive = new KeepaliveOptions { Enabled = true, BackendHeartbeatIdleMs = 600 },
};
await using var mux = BuildMux(plc, connOpts, ctx);
var (client, pipe, listener) = await ConnectClientAsync(mux, plc.Name);
try
{
// One real round-trip brings the backend up and starts the heartbeat loop.
await client.SendAsync(BuildFc03ReadFrame(0x0001, 0, 1), SocketFlags.None);
_ = await ReadOneFrameAsync(client, TestContext.Current.CancellationToken);
// Idle the connection and poll (up to ~6 s) until at least one heartbeat has been counted.
long sent = 0;
for (int i = 0; i < 60; i++)
{
sent = ctx.Counters.Snapshot().BackendHeartbeatsSent;
if (sent >= 1) break;
await Task.Delay(100, TestContext.Current.CancellationToken);
}
sent.ShouldBeGreaterThanOrEqualTo(1, "an idle backend must receive at least one heartbeat probe");
var snap = ctx.Counters.Snapshot();
snap.BackendHeartbeatsFailed.ShouldBe(0, "an answered heartbeat must not count as failed");
snap.BackendIdleDisconnects.ShouldBe(0, "an answered heartbeat must not tear the backend down");
// The client connection survived — a fresh request still round-trips.
await client.SendAsync(BuildFc03ReadFrame(0x0002, 0, 1), SocketFlags.None);
var rsp = await ReadOneFrameAsync(client, TestContext.Current.CancellationToken)
.WaitAsync(TimeSpan.FromSeconds(3), TestContext.Current.CancellationToken);
((ushort)((rsp[0] << 8) | rsp[1])).ShouldBe((ushort)0x0002);
}
finally
{
client.Dispose();
await pipe.DisposeAsync();
listener.Stop();
}
}
[Fact]
public async Task Heartbeat_SuppressedByRealTraffic()
{
int backendPort = PickFreePort();
await using var backend = new StubBackend(backendPort);
var ctx = MakeContext("PLC1");
var plc = new PlcOptions { Name = "PLC1", ListenPort = 0, Host = "127.0.0.1", Port = backendPort };
// Idle threshold well above the request cadence below.
var connOpts = new ConnectionOptions
{
Keepalive = new KeepaliveOptions { Enabled = true, BackendHeartbeatIdleMs = 1500 },
};
await using var mux = BuildMux(plc, connOpts, ctx);
var (client, pipe, listener) = await ConnectClientAsync(mux, plc.Name);
try
{
// Steady real traffic every ~200 ms for ~2.4 s. Each round-trip refreshes the
// activity timestamp, so the 1500 ms idle threshold is never reached.
for (ushort i = 1; i <= 12; i++)
{
await client.SendAsync(BuildFc03ReadFrame(i, 0, 1), SocketFlags.None);
_ = await ReadOneFrameAsync(client, TestContext.Current.CancellationToken)
.WaitAsync(TimeSpan.FromSeconds(3), TestContext.Current.CancellationToken);
await Task.Delay(200, TestContext.Current.CancellationToken);
}
ctx.Counters.Snapshot().BackendHeartbeatsSent
.ShouldBe(0, "real traffic must keep resetting the idle timer so no heartbeat fires");
}
finally
{
client.Dispose();
await pipe.DisposeAsync();
listener.Stop();
}
}
[Fact]
public async Task Heartbeat_Timeout_TearsDownBackend_AndCascades()
{
int backendPort = PickFreePort();
// Silent from the start: the backend accepts the TCP connection and drains every
// frame (including the heartbeat) but never replies.
await using var backend = new StubBackend(backendPort, silent: true);
var ctx = MakeContext("PLC1");
var plc = new PlcOptions { Name = "PLC1", ListenPort = 0, Host = "127.0.0.1", Port = backendPort };
var connOpts = new ConnectionOptions
{
BackendRequestTimeoutMs = 500,
Keepalive = new KeepaliveOptions { Enabled = true, BackendHeartbeatIdleMs = 700 },
};
await using var mux = BuildMux(plc, connOpts, ctx);
var (client, pipe, listener) = await ConnectClientAsync(mux, plc.Name);
try
{
// First request brings the backend TCP connection up and starts the heartbeat
// loop. It will itself time out with 0x0B (the backend never answers) — drain
// and ignore that frame.
await client.SendAsync(BuildFc03ReadFrame(0x0001, 0, 1), SocketFlags.None);
try
{
_ = await ReadOneFrameAsync(client, TestContext.Current.CancellationToken)
.WaitAsync(TimeSpan.FromSeconds(2), TestContext.Current.CancellationToken);
}
catch { /* 0x0B or socket close — not what this test asserts */ }
// The heartbeat fires on the idle socket, never gets answered, and the watchdog
// times it out — which tears the backend down.
long failed = 0, idleDisc = 0;
for (int i = 0; i < 80; i++)
{
var snap = ctx.Counters.Snapshot();
failed = snap.BackendHeartbeatsFailed;
idleDisc = snap.BackendIdleDisconnects;
if (failed >= 1 && idleDisc >= 1) break;
await Task.Delay(100, TestContext.Current.CancellationToken);
}
failed.ShouldBeGreaterThanOrEqualTo(1, "an unanswered heartbeat must count as failed");
idleDisc.ShouldBeGreaterThanOrEqualTo(1, "a failed heartbeat must trigger a backend idle-disconnect");
ctx.Counters.Snapshot().BackendHeartbeatsSent
.ShouldBeGreaterThanOrEqualTo(1, "a heartbeat must have been sent before it could fail");
}
finally
{
client.Dispose();
await pipe.DisposeAsync();
listener.Stop();
}
}
}
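// Illustrative sketch only (hypothetical helper, not part of the proxy): the 7-byte MBAP
// header that ReadOneFrameAsync and BuildFc03Response above handle by hand, decoded into
// named fields. The Length field counts the unit id plus the PDU, hence PduLength = Length - 1.
internal readonly record struct MbapHeaderSketch(
    ushort TransactionId, ushort ProtocolId, ushort Length, byte UnitId)
{
    public const int Size = 7;

    // Bytes that follow the header on the wire (function code + data).
    public int PduLength => Length - 1;

    public static MbapHeaderSketch Parse(System.ReadOnlySpan<byte> header)
    {
        if (header.Length < Size)
            throw new System.ArgumentException("MBAP header needs 7 bytes", nameof(header));

        // All MBAP fields are big-endian.
        return new MbapHeaderSketch(
            System.Buffers.Binary.BinaryPrimitives.ReadUInt16BigEndian(header.Slice(0, 2)),
            System.Buffers.Binary.BinaryPrimitives.ReadUInt16BigEndian(header.Slice(2, 2)),
            System.Buffers.Binary.BinaryPrimitives.ReadUInt16BigEndian(header.Slice(4, 2)),
            header[6]);
    }
}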
@@ -438,6 +438,69 @@ public sealed class MultiplexerE2ETests
}
}
// ── E2E 6: Backend keepalive heartbeat keeps an idle connection warm ─────────────
/// <summary>
/// With keepalive enabled, an idle backend connection receives periodic FC03 heartbeat
/// probes. This test idles a simulator-backed connection past
/// <c>BackendHeartbeatIdleMs</c>, verifies <c>backendHeartbeatsSent</c> climbs on the
/// status page, and confirms a later real read still round-trips on the same
/// (un-cascaded) connection.
/// </summary>
[Fact(Timeout = 8_000)]
public async Task E2E_Keepalive_IdleBackend_ReceivesHeartbeats_AndStaysUsable()
{
if (_sim.SkipReason is not null) Assert.Skip(_sim.SkipReason);
int proxyPort = PickFreePort();
int adminPort = PickFreePort();
var config = MakeBaseConfig(proxyPort);
config["Mbproxy:AdminPort"] = adminPort.ToString();
// Short idle window so the heartbeat fires several times within the test budget.
config["Mbproxy:Connection:Keepalive:Enabled"] = "true";
config["Mbproxy:Connection:Keepalive:BackendHeartbeatIdleMs"] = "700";
var host = BuildBcdHost(config);
using var startCts = new CancellationTokenSource(TimeSpan.FromSeconds(3));
await host.StartAsync(startCts.Token);
await using var hd = new AsyncHostDispose(host);
await Task.Delay(200, TestContext.Current.CancellationToken);
using (var client = new TcpClient())
{
await client.ConnectAsync("127.0.0.1", proxyPort, TestContext.Current.CancellationToken);
var master = new ModbusFactory().CreateMaster(client);
// One read brings the backend up and starts the heartbeat loop.
_ = master.ReadHoldingRegisters(1, 0, 1);
// Idle the connection so the heartbeat loop fires repeatedly.
await Task.Delay(2500, TestContext.Current.CancellationToken);
// A later read still succeeds — the connection was never cascaded.
ushort[] regs = master.ReadHoldingRegisters(1, 0, 1);
regs.Length.ShouldBe(1, "the idle-then-active connection must still serve reads");
}
using var httpClient = new HttpClient();
var resp = await httpClient.GetStringAsync(
$"http://127.0.0.1:{adminPort}/status.json",
TestContext.Current.CancellationToken);
using var doc = JsonDocument.Parse(resp);
var backend = doc.RootElement.GetProperty("plcs")[0].GetProperty("backend");
backend.TryGetProperty("backendHeartbeatsSent", out _)
.ShouldBeTrue("status.json must expose backend.backendHeartbeatsSent");
backend.GetProperty("backendHeartbeatsSent").GetInt64()
.ShouldBeGreaterThanOrEqualTo(1, "an idle backend must have received at least one heartbeat");
backend.GetProperty("backendHeartbeatsFailed").GetInt64()
.ShouldBe(0, "every heartbeat against the live simulator must be answered");
backend.GetProperty("backendIdleDisconnects").GetInt64()
.ShouldBe(0, "an answered heartbeat must never tear the backend down");
}
// ── Helpers ──────────────────────────────────────────────────────────────────────
private Dictionary<string, string?> MakeBaseConfig(int proxyPort) => new()
+13 -5
@@ -48,9 +48,13 @@ if (-not $ProfileResolved) {
}
# ── 2. Locate Python ─────────────────────────────────────────────────────────
# Try 'python' first (standard PATH install), then the Windows-store launcher 'py'.
# Windows: 'python' (standard PATH install), then the 'py' launcher.
# Linux/macOS: 'python3' (the canonical name), then 'python'.
# The candidate order is platform-specific so Windows never matches the Microsoft
# Store 'python3' stub.
$pythonExe = $null
foreach ($candidate in 'python', 'py') {
$pythonCandidates = $IsWindows ? @('python', 'py') : @('python3', 'python')
foreach ($candidate in $pythonCandidates) {
try {
$ver = & $candidate --version 2>&1
if ($LASTEXITCODE -eq 0) {
@@ -77,9 +81,13 @@ or use the Windows Store launcher ('py').
$PYMODBUS_VERSION = '3.13.0'
$venvDir = Join-Path $PSScriptRoot '.venv'
$venvPython = Join-Path $venvDir 'Scripts\python.exe'
$pipExe = Join-Path $venvDir 'Scripts\pip.exe'
$simulatorExe = Join-Path $venvDir 'Scripts\pymodbus.simulator.exe' # sentinel for complete install
# venv executable layout differs by OS: Windows puts them in Scripts\ with a .exe
# extension; Linux/macOS put them in bin/ with no extension.
$venvBin = $IsWindows ? 'Scripts' : 'bin'
$exeExt = $IsWindows ? '.exe' : ''
$venvPython = Join-Path $venvDir $venvBin "python$exeExt"
$pipExe = Join-Path $venvDir $venvBin "pip$exeExt"
$simulatorExe = Join-Path $venvDir $venvBin "pymodbus.simulator$exeExt" # sentinel for complete install
# Provisioning is idempotent: we only skip it when the pymodbus.simulator executable exists.
# Checking only the .venv directory is not enough — a previous run killed mid-install