mbproxy: cross-platform support — Linux/systemd alongside Windows
Make the service build, run, and install on Linux as a first-class target while keeping the Windows Service + Event Log behaviour intact.

- Build: drop the hardcoded win-x64 RID — single-file publish now works for any RID. publish.ps1 gains -Rid; new publish.sh for Linux hosts.
- Diagnostics: DiagnosticSinkSelector picks the Error+ sink per host — Windows Event Log under the SCM, local syslog under systemd (Serilog.Sinks.SyslogMessages), none for interactive runs. The EventLog truncation helper is extracted so it is testable cross-OS.
- Host: Program.cs registers AddSystemd() alongside AddWindowsService().
- Config: a RID-conditioned appsettings template ships Windows or Unix paths; both templates are schema-validated by a test.
- Install: systemd unit (Type=exec) plus install.sh / uninstall.sh.

Also fixes two cross-platform bugs found while testing: install.ps1 and uninstall.ps1 used New-EventLog / Remove-EventLog (absent in PowerShell 7), and the E2E sim launcher hardcoded Windows venv paths.

- Docs updated across README, CLAUDE.md, and docs/ for dual-platform.

413 tests pass on Windows; 374 (all non-simulator) on Linux.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -6,7 +6,7 @@ This document is the entry point for readers new to the codebase. It sketches th
## Runtime Shape
The process is a single .NET 10 Generic Host worker. `Microsoft.Extensions.Hosting.WindowsServices` registers the host as a Windows Service so the same binary runs interactively (for development) or under the SCM (in production). All configuration binds from `appsettings.json` through `IOptionsMonitor<MbproxyOptions>`, which makes the tag list and PLC roster hot-reloadable without process restart. `ProxyWorker` is the long-lived `BackgroundService` that owns startup, shutdown, and the listener supervisors for every PLC. A small Kestrel admin endpoint runs in the same process to serve the read-only status page.
The process is a single .NET 10 Generic Host worker. It registers both `Microsoft.Extensions.Hosting.WindowsServices` and `Microsoft.Extensions.Hosting.Systemd` — each a no-op off its own init system — so the same binary runs interactively (for development), as a Windows Service under the SCM, or as a Linux systemd unit. All configuration binds from `appsettings.json` through `IOptionsMonitor<MbproxyOptions>`, which makes the tag list and PLC roster hot-reloadable without process restart. `ProxyWorker` is the long-lived `BackgroundService` that owns startup, shutdown, and the listener supervisors for every PLC. A small Kestrel admin endpoint runs in the same process to serve the read-only status page.
There is no in-process database, no message broker, and no persistent cache file: state is per-PLC, in-memory, and ephemeral. Restarting the service drops every in-flight request and every cached response. Upstream clients are expected to reconnect and reissue; the proxy never replays a request on their behalf.
@@ -6,7 +6,7 @@ A save to `appsettings.json` propagates to a running `mbproxy` without restartin
`Microsoft.Extensions.Configuration` loads `appsettings.json` with `reloadOnChange: true`. Every consumer reads its options through `IOptionsMonitor<MbproxyOptions>` instead of capturing a one-shot `IOptions<T>` snapshot at construction. When the framework's `FileSystemWatcher` sees the file change, it re-parses the JSON, re-binds the option tree, and notifies subscribers through `IOptionsMonitor.OnChange`.
The chosen mechanism is deliberate. There is no custom file watcher, no IPC channel, no admin-port mutation endpoint, and no SIGHUP-style trigger. An operator edits the file in place (or a deployment tool atomically rewrites it) and the running service catches up. The reload contract is identical whether the service is running interactively or as a Windows Service under the SCM.
The chosen mechanism is deliberate. There is no custom file watcher, no IPC channel, no admin-port mutation endpoint, and no SIGHUP-style trigger. An operator edits the file in place (or a deployment tool atomically rewrites it) and the running service catches up. The reload contract is identical whether the service is running interactively, as a Windows Service under the SCM, or as a Linux systemd unit.
The `OnChange` callback can fire multiple times for a single logical save because text editors on Windows commonly use a rename-and-replace pattern that produces two or three `FileSystemWatcher` events. The reconciler debounces these inside its own background loop with a 250 ms quiescent window so a single save produces a single apply.
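The quiescent-window idea can be illustrated with a small, language-neutral sketch. This is not the service's C# reconciler; the `Debouncer` class and its method names are hypothetical, and only the 250 ms window comes from the text above. Each watcher event restarts the window, and the apply fires once after the window elapses with no further events.

```python
from dataclasses import dataclass
from typing import Optional

QUIET_WINDOW_MS = 250  # quiescent window stated in the text


@dataclass
class Debouncer:
    """Hypothetical helper: collapses a burst of change events into one apply."""

    quiet_ms: int = QUIET_WINDOW_MS
    last_event_ms: Optional[int] = None

    def on_change(self, now_ms: int) -> None:
        # Every watcher event restarts the quiet window.
        self.last_event_ms = now_ms

    def should_apply(self, now_ms: int) -> bool:
        # Polled from a background loop: fire once, and only after the
        # window has passed with no new event.
        if self.last_event_ms is None:
            return False
        if now_ms - self.last_event_ms < self.quiet_ms:
            return False
        self.last_event_ms = None
        return True
```

Passing timestamps in explicitly keeps the sketch deterministic; a real loop would read a monotonic clock instead.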
@@ -7,8 +7,11 @@
The configuration loader resolves `appsettings.json` relative to the executable.
- **Development run** (`dotnet run`): `src/Mbproxy/appsettings.json` next to the build output.
- **Single-file publish** (`dotnet publish -c Release -r win-x64`): `appsettings.json` next to `Mbproxy.exe` in the publish folder.
- **Installed as a Windows Service**: `%ProgramData%\mbproxy\appsettings.json`. The install script copies the template at `install/mbproxy.config.template.json` to this path the first time only — an existing file is preserved across reinstalls.
- **Single-file publish** (`dotnet publish -c Release -r <rid>`): `appsettings.json` next to the published binary. A `win-x64` publish ships `install/mbproxy.config.template.json`; a `linux-x64` publish ships `install/mbproxy.linux.config.template.json` (same keys, Unix log path) — each linked into the bundle as `appsettings.json`.
- **Installed as a Windows Service**: `%ProgramData%\mbproxy\appsettings.json`, seeded by `install.ps1` from `mbproxy.config.template.json`.
- **Installed as a systemd unit**: `/etc/mbproxy/appsettings.json` (the unit's `WorkingDirectory`), seeded by `install.sh` from the Linux template.
In both installed cases the install script copies the template only when no config already exists — an existing file is preserved across reinstalls.
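The seed-once rule is simple enough to sketch. The real work is done by `install.ps1` and `install.sh`, whose contents are not shown here; the Python function below is only an illustration of the rule, and `seed_config` is a hypothetical name.

```python
import shutil
from pathlib import Path


def seed_config(template: Path, target: Path) -> bool:
    """Copy the template to target only if no config exists yet.

    Returns True when the file was seeded, False when an existing
    config was left untouched (the reinstall case).
    """
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, target)
    return True
```

Returning a flag rather than raising lets an installer log whether it seeded a fresh config or preserved an operator's edits.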
The file is loaded with `reloadOnChange: true`. All consumers read through `IOptionsMonitor<MbproxyOptions>`, so a save propagates without restarting the service. See [`../Features/HotReload.md`](../Features/HotReload.md) for per-key propagation semantics.
@@ -51,11 +54,19 @@ Every supported key under `Mbproxy:*`, populated to a representative default:
// Read-only HTTP status page. Set to 0 to disable.
"AdminPort": 8080,

// Backend connection / request / shutdown timeouts.
// Backend connection / request / shutdown timeouts and keepalive.
"Connection": {
  "BackendConnectTimeoutMs": 3000,
  "BackendRequestTimeoutMs": 3000,
  "GracefulShutdownTimeoutMs": 10000
  "GracefulShutdownTimeoutMs": 10000,
  "Keepalive": {
    "Enabled": true,
    "TcpIdleTimeMs": 30000,
    "TcpProbeIntervalMs": 5000,
    "TcpProbeCount": 4,
    "BackendHeartbeatIdleMs": 30000,
    "BackendHeartbeatProbeAddress": 0
  }
},
// Polly resilience policies.
@@ -169,6 +180,21 @@ Operational sizing notes:
- A 3 s request timeout is generous compared with typical DL205/DL260 scan times (a few ms to tens of ms for FC03 of 100 registers). The slack absorbs PLC scan-overlap jitter without faulting the upstream client.
- `GracefulShutdownTimeoutMs` should be less than the Service Control Manager's stop deadline. The default 10 s suits a fleet of 54 PLCs; on a much larger fleet, raise both the SCM wait hint and this value in lockstep.
## `Mbproxy.Connection.Keepalive`
TCP keepalive and backend heartbeat settings. Source: `KeepaliveOptions.cs`. Enabled by default — the DL205/DL260 ECOM never emits TCP keepalives, so an idle socket is otherwise dropped by middleboxes after 2–5 minutes. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md) for the full design.
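As a rough worked example, the `Tcp*` defaults documented for this section (30 s idle, 5 s probe interval, 4 probes) imply that the OS gives up on a silent peer well inside the 2 to 5 minute middlebox window. The formula below is the usual approximation; exact timing depends on the operating system's TCP stack, and the function name is hypothetical.

```python
def keepalive_detection_ms(idle_ms: int, interval_ms: int, probe_count: int) -> int:
    """Approximate time for the OS to declare an idle, unresponsive peer dead.

    The first probe goes out after idle_ms; each unanswered probe then
    costs another interval_ms before the socket is reported dead.
    """
    return idle_ms + interval_ms * probe_count


# Documented defaults: 30000 + 5000 * 4 = 50000 ms, i.e. about 50 seconds.
DEFAULT_DETECTION_MS = keepalive_detection_ms(30000, 5000, 4)
```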
| Field | Type | Default | Notes |
|-------|------|---------|-------|
| `Enabled` | bool | `true` | Master switch. When `false`, neither `SO_KEEPALIVE` nor the backend heartbeat is applied and the proxy behaves exactly as a pre-keepalive build. |
| `TcpIdleTimeMs` | int | `30000` | `SO_KEEPALIVE` idle time before the OS sends its first probe. Applied to the backend socket and accepted upstream sockets. |
| `TcpProbeIntervalMs` | int | `5000` | `SO_KEEPALIVE` interval between probes once idle. |
| `TcpProbeCount` | int | `4` | `SO_KEEPALIVE` unanswered probes before the OS declares the socket dead. |
| `BackendHeartbeatIdleMs` | int | `30000` | After this much backend idle, the proxy issues a synthetic FC03 qty=1 read to keep the path warm and prove the ECOM still answers Modbus. Must be greater than `BackendRequestTimeoutMs`. |
| `BackendHeartbeatProbeAddress` | int | `0` | Modbus PDU address the heartbeat FC03 probe reads. Address `0` (`V0`) is valid on DL205/DL260 in factory absolute mode. Range `[0, 65535]`. |
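For readers unfamiliar with the wire format, the sketch below builds the kind of frame an FC03 qty=1 probe corresponds to under standard Modbus TCP framing (a 7-byte MBAP header followed by the PDU). It is an illustration only, not the proxy's code: `build_heartbeat_frame` is a hypothetical name, and `unit_id=1` is an assumption because the source does not say which unit id the proxy sends.

```python
import struct

FC_READ_HOLDING_REGISTERS = 0x03


def build_heartbeat_frame(tx_id: int, probe_address: int = 0, unit_id: int = 1) -> bytes:
    """Build a Modbus TCP FC03 request that reads a single register.

    unit_id=1 is an assumed value for illustration.
    """
    if not 0 <= probe_address <= 0xFFFF:
        raise ValueError("probe address must be in [0, 65535]")
    # PDU: function code, starting address, quantity of registers (1).
    pdu = struct.pack(">BHH", FC_READ_HOLDING_REGISTERS, probe_address, 1)
    # MBAP: transaction id, protocol id (0 = Modbus),
    # length of the remaining bytes (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", tx_id & 0xFFFF, 0, len(pdu) + 1, unit_id)
    return mbap + pdu
```

With `tx_id=0x1234` and the default address, the frame is the 12 bytes `12 34 00 00 00 06 01 03 00 00 00 01`.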
On hot reload, the heartbeat interval and probe address are re-read on every loop tick. The `Tcp*` socket options are applied at connect/accept time, so a reload affects only sockets opened after the change. A reload where `BackendHeartbeatIdleMs <= BackendRequestTimeoutMs` is rejected — a heartbeat interval at or below the request timeout would fire continuously.
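The rejection rule can be expressed as a small validation step. The sketch below is a hypothetical Python rendering, not the contents of `KeepaliveOptions.cs`; the class and function names are invented for illustration, and treating an out-of-range probe address as a reload error is an assumption (the text above only states the range).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeepaliveSettings:
    # Defaults mirror the documented values.
    backend_request_timeout_ms: int = 3000
    backend_heartbeat_idle_ms: int = 30000
    backend_heartbeat_probe_address: int = 0


def validate_reload(s: KeepaliveSettings) -> List[str]:
    """Return the reasons a reloaded config should be rejected (empty = accept)."""
    errors: List[str] = []
    # A heartbeat interval at or below the request timeout would fire continuously.
    if s.backend_heartbeat_idle_ms <= s.backend_request_timeout_ms:
        errors.append("BackendHeartbeatIdleMs must be greater than BackendRequestTimeoutMs")
    # Assumption: range violations are rejected at reload time as well.
    if not 0 <= s.backend_heartbeat_probe_address <= 65535:
        errors.append("BackendHeartbeatProbeAddress must be in [0, 65535]")
    return errors
```

Collecting every violation instead of stopping at the first lets a single log entry tell the operator everything that needs fixing.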
## `Mbproxy.Resilience`
Polly retry pipelines for backend connect, listener bind, and the in-flight read coalescer. Source: `ResilienceOptions.cs`.
@@ -391,6 +417,7 @@ A reduced view of [`../Features/HotReload.md`](../Features/HotReload.md), restri
| `Plcs[i]` removed | Supervisor stops the listener and closes all upstream connections for that PLC. |
| `Plcs[i].ListenPort` or `Host` changed | Equivalent to remove + add. |
| `Connection.Backend*TimeoutMs` | Next backend connect or request uses the new value. |
| `Connection.Keepalive` heartbeat fields | Re-read on every heartbeat loop tick. `Tcp*` socket options apply to backend/upstream sockets opened after the change. |
| `AdminPort` | Requires a service restart — the Kestrel admin host is built once at startup. |
| `Resilience.ReadCoalescing.Enabled` | Hot-reloadable; in-flight coalesced entries drain naturally. |
| `BcdTags.*.CacheTtlMs`, `Plcs[i].DefaultCacheTtlMs` | Tag-map reseat for the affected PLC drops that PLC's entire cache. |
@@ -2,7 +2,9 @@
Operator diagnosis playbook for mbproxy. Each entry maps an observable symptom to the log event name and status-page counter that confirms it, then lists likely causes and remediation steps.
The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-<date>.log`. The live counters are at `http://<host>:<AdminPort>/status.json` (default port `8080`). Events at Error level and above are also mirrored to the Windows Application Event Log under source `mbproxy`.
The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-<date>.log` on Windows, or `/var/log/mbproxy/mbproxy-<date>.log` on Linux. The live counters are at `http://<host>:<AdminPort>/status.json` (default port `8080`). Events at Error level and above are also mirrored to the **Windows Application Event Log** (Windows Service) or the **local syslog / journal** (systemd) under source `mbproxy` — view the latter with `journalctl -t mbproxy` or `journalctl -u mbproxy`.
Paths and service commands below are written for Windows (`%ProgramData%`, `sc.exe`); the systemd equivalents are `/etc/mbproxy` + `/var/log/mbproxy` and `systemctl start|stop|status mbproxy`.
## Service Startup Failures
@@ -124,7 +126,28 @@ The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-<date>.log`. The l
1. Verify the upstream count on the status page returns to normal as clients reconnect — `plcs[].clients.connected` should climb again within seconds.
2. If cascades fire repeatedly against the same PLC, investigate the PLC and intermediate network for stability. The proxy itself has no state to repair.
3. If cascades correlate with idle periods, the idle middlebox-drop pattern is the likeliest cause; reduce the upstream client's poll interval below the middlebox idle timeout to keep traffic flowing.
3. If cascades correlate with idle periods, the idle middlebox-drop pattern is the likeliest cause. Keepalive is enabled by default and should already be preventing this — confirm `Connection.Keepalive.Enabled` is `true` and that `BackendHeartbeatIdleMs` is comfortably below the middlebox idle timeout. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md).
### Backend keepalive heartbeat failing
**Symptom.** A PLC's backend connection is torn down while idle — no client was actively talking to it. `plcs[].backend.backendIdleDisconnects` increments and the upstream clients (if any were attached) are cascaded.
**Where to look.**
- Log events: `mbproxy.keepalive.heartbeat.timeout` (Warning) followed by `mbproxy.keepalive.backend.idle_disconnect` (Information).
- Status fields: `plcs[].backend.backendHeartbeatsSent`, `backendHeartbeatsFailed`, `backendIdleDisconnects`.
**Root causes.**
- The ECOM is reachable at the IP layer but no longer answering Modbus (firmware hang, ECOM reset mid-session).
- The path died between heartbeats and the heartbeat was the first request to discover it — this is the feature working as intended (the failure is found during idle, not on a client request).
- `BackendHeartbeatProbeAddress` points at an address the PLC rejects. The default (0 = `V0`) is safe on DL205/DL260; only an operator override could break this.
**Remediation.**
1. A single idle-disconnect that recovers on the next client request needs no action — the proxy reconnected the path proactively.
2. Repeated idle-disconnects on one PLC mean it keeps going dark while idle. Investigate the device and the network path; the proxy has no state to repair.
3. If `backendHeartbeatsFailed` climbs but the PLC answers real client requests fine, check that `BackendHeartbeatProbeAddress` is a register the device actually serves.
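The status fields named in this entry lend themselves to a quick triage script. The sketch below assumes the `status.json` payload has already been fetched and parsed into a dictionary; the counter names come from the text above, while the per-PLC `name` key and the function name are assumptions for illustration.

```python
from typing import Dict, List


def plcs_with_heartbeat_trouble(status: Dict) -> List[str]:
    """List PLCs whose heartbeat-failure or idle-disconnect counters are non-zero."""
    flagged: List[str] = []
    for index, plc in enumerate(status.get("plcs", [])):
        backend = plc.get("backend", {})
        failed = backend.get("backendHeartbeatsFailed", 0)
        idle_disconnects = backend.get("backendIdleDisconnects", 0)
        if failed > 0 or idle_disconnects > 0:
            # "name" is an assumed key; fall back to the array position.
            flagged.append(plc.get("name", f"plcs[{index}]"))
    return flagged
```

A single hit that does not grow between polls matches remediation step 1; a counter that keeps climbing on the same PLC points to steps 2 and 3.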
### Request timeout watchdog firing
@@ -6,9 +6,9 @@ The stable catalog of every `mbproxy.*` event name the service emits, with its l
The service uses [Serilog](https://serilog.net/) wired through the `Microsoft.Extensions.Logging` bridge. Three sinks are configured (see `src/Mbproxy/HostingExtensions.cs`):
- **Console** — written to stdout for interactive `--console` runs and for the SCM stdout capture.
- **Rolling file** — under `%ProgramData%\mbproxy\logs\` (`mbproxy-<date>.log`).
- **Windows Event Log** — only when running as a Windows Service, and only for events at `Error` and above (see `src/Mbproxy/Diagnostics/EventLogBridge.cs`).
- **Console** — stdout; captured by the Windows SCM or by systemd-journald.
- **Rolling file** — `%ProgramData%\mbproxy\logs\` on Windows, `/var/log/mbproxy/` on Linux (`mbproxy-<date>.log`).
- **Platform diagnostic sink** — `Error`+ events only. `DiagnosticSinkSelector` picks it once at the composition root: the **Windows Application Event Log** under the SCM (`EventLogBridge`), **local syslog** under systemd (`SyslogBridge`), or none for interactive/dev runs.
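The per-host choice described for the platform diagnostic sink reduces to a three-way decision. The sketch below is a hypothetical Python rendering of that decision, not the C# `DiagnosticSinkSelector`; how the real selector detects the SCM or systemd is not shown in this diff, so both conditions are passed in as plain booleans.

```python
from enum import Enum


class DiagnosticSink(Enum):
    WINDOWS_EVENT_LOG = "windows-event-log"
    SYSLOG = "syslog"
    NONE = "none"


def select_diagnostic_sink(is_windows_service: bool, is_systemd_service: bool) -> DiagnosticSink:
    """Pick the Error+ sink once, at startup, from the detected host."""
    if is_windows_service:
        return DiagnosticSink.WINDOWS_EVENT_LOG
    if is_systemd_service:
        return DiagnosticSink.SYSLOG
    # Interactive and development runs get no platform sink.
    return DiagnosticSink.NONE
```

Deciding once at the composition root keeps platform checks out of the logging hot path and makes the selection easy to unit-test on either operating system.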
Every event uses source-generated `[LoggerMessage]` definitions, so the property names below match the message template token-for-token. The default minimum level is `Information`; lower the floor for `Mbproxy.*` categories via the standard `Logging:LogLevel` configuration to surface `Debug` events such as the coalesce and cache traces.
@@ -385,6 +385,51 @@ Fires whenever the entire per-PLC cache is wiped at once — primarily after a b
**Operator action:** none unless flushes happen on a tight loop, which would indicate the backend connection itself is unstable.
## Keepalive
See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md) for the backend heartbeat design.
### mbproxy.keepalive.heartbeat.sent
**Level:** Debug · **EventId:** 150 · **Source:** `src/Mbproxy/Proxy/Multiplexing/KeepaliveLogEvents.cs`
| Property | Type | Meaning |
|----------|------|---------|
| `Plc` | `string` | Configured PLC name. |
| `ProxyTxId` | `ushort` | Proxy-allocated TxId carrying the synthetic FC03 probe. |
| `Address` | `ushort` | Modbus address the probe reads (`BackendHeartbeatProbeAddress`). |
Fires each time the heartbeat loop issues a probe on an idle backend socket — at most one per `BackendHeartbeatIdleMs` per idle PLC.
**Operator action:** none. Debug-level; useful only when confirming the heartbeat is alive.
### mbproxy.keepalive.heartbeat.timeout
**Level:** Warning · **EventId:** 151 · **Source:** `src/Mbproxy/Proxy/Multiplexing/KeepaliveLogEvents.cs`
| Property | Type | Meaning |
|----------|------|---------|
| `Plc` | `string` | Configured PLC name. |
| `ProxyTxId` | `ushort` | Proxy TxId of the unanswered probe. |
| `ElapsedMs` | `long` | Milliseconds from probe send to timeout. |
Fires when a heartbeat probe is not answered within `BackendRequestTimeoutMs` — the backend is connected but no longer answering Modbus.
**Operator action:** check the PLC and the network path. Paired with `mbproxy.keepalive.backend.idle_disconnect` for the same PLC.
### mbproxy.keepalive.backend.idle_disconnect
**Level:** Information · **EventId:** 152 · **Source:** `src/Mbproxy/Proxy/Multiplexing/KeepaliveLogEvents.cs`
| Property | Type | Meaning |
|----------|------|---------|
| `Plc` | `string` | Configured PLC name. |
| `ElapsedMs` | `long` | Milliseconds the failed heartbeat waited before the teardown. |
Fires when a failed heartbeat triggers a proactive backend teardown. Every attached upstream pipe is cascaded; clients reconnect on their next request. This is the keepalive feature doing its job — finding a dead path during idle instead of on the next real request.
**Operator action:** none if isolated. Repeated idle-disconnects on one PLC indicate it keeps going dark while idle — investigate the device or the network path.
## BCD Rewriter
### mbproxy.rewrite.partial_bcd
@@ -495,5 +540,6 @@ Lifecycle events (`startup.*`, `listener.*`, `admin.*`, `shutdown.*`, `config.re
- [Response Cache](../Architecture/ResponseCache.md) — context for the `mbproxy.cache.*` events.
- [Status Page](../Operations/StatusPage.md) — counter equivalents for the high-volume Debug-level events.
- [Read Coalescing](../Architecture/ReadCoalescing.md) — context for the `mbproxy.coalesce.*` events.
- [Keepalive](../Architecture/Keepalive.md) — context for the `mbproxy.keepalive.*` events.
- [BCD Rewriting](../Features/BcdRewriting.md) — context for the `mbproxy.rewrite.*` and `mbproxy.exception.passthrough` events.
- [Hot Reload](../Features/HotReload.md) — context for the `mbproxy.config.reload.*` events.