mbproxy: cross-platform support — Linux/systemd alongside Windows
Make the service build, run, and install on Linux as a first-class target while keeping the Windows Service + Event Log behaviour intact. - Build: drop the hardcoded win-x64 RID — single-file publish now works for any RID. publish.ps1 gains -Rid; new publish.sh for Linux hosts. - Diagnostics: DiagnosticSinkSelector picks the Error+ sink per host — Windows Event Log under the SCM, local syslog under systemd (Serilog.Sinks.SyslogMessages), none for interactive runs. The EventLog truncation helper is extracted so it is testable cross-OS. - Host: Program.cs registers AddSystemd() alongside AddWindowsService(). - Config: a RID-conditioned appsettings template ships Windows or Unix paths; both templates are schema-validated by a test. - Install: systemd unit (Type=exec) plus install.sh / uninstall.sh. Also fixes two cross-platform bugs found while testing: install.ps1 and uninstall.ps1 used New-EventLog / Remove-EventLog (absent in PowerShell 7), and the E2E sim launcher hardcoded Windows venv paths. - Docs updated across README, CLAUDE.md, and docs/ for dual-platform. 413 tests pass on Windows; 374 (all non-simulator) on Linux. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -7,8 +7,11 @@
|
||||
The configuration loader resolves `appsettings.json` relative to the executable.
|
||||
|
||||
- **Development run** (`dotnet run`): `src/Mbproxy/appsettings.json` next to the build output.
|
||||
- **Single-file publish** (`dotnet publish -c Release -r win-x64`): `appsettings.json` next to `Mbproxy.exe` in the publish folder.
|
||||
- **Installed as a Windows Service**: `%ProgramData%\mbproxy\appsettings.json`. The install script copies the template at `install/mbproxy.config.template.json` to this path the first time only — an existing file is preserved across reinstalls.
|
||||
- **Single-file publish** (`dotnet publish -c Release -r <rid>`): `appsettings.json` next to the published binary. A `win-x64` publish ships `install/mbproxy.config.template.json`; a `linux-x64` publish ships `install/mbproxy.linux.config.template.json` (same keys, Unix log path) — each linked into the bundle as `appsettings.json`.
|
||||
- **Installed as a Windows Service**: `%ProgramData%\mbproxy\appsettings.json`, seeded by `install.ps1` from `mbproxy.config.template.json`.
|
||||
- **Installed as a systemd unit**: `/etc/mbproxy/appsettings.json` (the unit's `WorkingDirectory`), seeded by `install.sh` from the Linux template.
|
||||
|
||||
In both installed cases the install script copies the template only when no config already exists — an existing file is preserved across reinstalls.
|
||||
|
||||
The file is loaded with `reloadOnChange: true`. All consumers read through `IOptionsMonitor<MbproxyOptions>`, so a save propagates without restarting the service. See [`../Features/HotReload.md`](../Features/HotReload.md) for per-key propagation semantics.
|
||||
|
||||
@@ -51,11 +54,19 @@ Every supported key under `Mbproxy:*`, populated to a representative default:
|
||||
// Read-only HTTP status page. Set to 0 to disable.
|
||||
"AdminPort": 8080,
|
||||
|
||||
// Backend connection / request / shutdown timeouts.
|
||||
// Backend connection / request / shutdown timeouts and keepalive.
|
||||
"Connection": {
|
||||
"BackendConnectTimeoutMs": 3000,
|
||||
"BackendRequestTimeoutMs": 3000,
|
||||
"GracefulShutdownTimeoutMs": 10000
|
||||
"GracefulShutdownTimeoutMs": 10000,
|
||||
"Keepalive": {
|
||||
"Enabled": true,
|
||||
"TcpIdleTimeMs": 30000,
|
||||
"TcpProbeIntervalMs": 5000,
|
||||
"TcpProbeCount": 4,
|
||||
"BackendHeartbeatIdleMs": 30000,
|
||||
"BackendHeartbeatProbeAddress": 0
|
||||
}
|
||||
},
|
||||
|
||||
// Polly resilience policies.
|
||||
@@ -169,6 +180,21 @@ Operational sizing notes:
|
||||
- A 3 s request timeout is generous compared with typical DL205/DL260 scan times (a few ms to tens of ms for FC03 of 100 registers). The slack absorbs PLC scan-overlap jitter without faulting the upstream client.
|
||||
- `GracefulShutdownTimeoutMs` should be less than the Service Control Manager's stop deadline. The default 10 s suits a fleet of 54 PLCs; on a much larger fleet, raise both the SCM wait hint and this value in lockstep.
|
||||
|
||||
## `Mbproxy.Connection.Keepalive`
|
||||
|
||||
TCP keepalive and backend heartbeat settings. Source: `KeepaliveOptions.cs`. Enabled by default — the DL205/DL260 ECOM never emits TCP keepalives, so an idle socket is otherwise dropped by middleboxes after 2–5 minutes. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md) for the full design.
|
||||
|
||||
| Field | Type | Default | Notes |
|
||||
|-------|------|---------|-------|
|
||||
| `Enabled` | bool | `true` | Master switch. When `false`, neither `SO_KEEPALIVE` nor the backend heartbeat is applied and the proxy behaves exactly as a pre-keepalive build. |
|
||||
| `TcpIdleTimeMs` | int | `30000` | `SO_KEEPALIVE` idle time before the OS sends its first probe. Applied to the backend socket and accepted upstream sockets. |
|
||||
| `TcpProbeIntervalMs` | int | `5000` | `SO_KEEPALIVE` interval between probes once idle. |
|
||||
| `TcpProbeCount` | int | `4` | `SO_KEEPALIVE` unanswered probes before the OS declares the socket dead. |
|
||||
| `BackendHeartbeatIdleMs` | int | `30000` | After this much backend idle, the proxy issues a synthetic FC03 qty=1 read to keep the path warm and prove the ECOM still answers Modbus. Must be greater than `BackendRequestTimeoutMs`. |
|
||||
| `BackendHeartbeatProbeAddress` | int | `0` | Modbus PDU address the heartbeat FC03 probe reads. Address `0` (`V0`) is valid on DL205/DL260 in factory absolute mode. Range `[0, 65535]`. |
|
||||
|
||||
On hot reload, the heartbeat interval and probe address are re-read on every loop tick. The `Tcp*` socket options are applied at connect/accept time, so a reload affects only sockets opened after the change. A reload where `BackendHeartbeatIdleMs <= BackendRequestTimeoutMs` is rejected — a heartbeat interval at or below the request timeout would fire continuously.
|
||||
|
||||
## `Mbproxy.Resilience`
|
||||
|
||||
Polly retry pipelines for backend connect, listener bind, and the in-flight read coalescer. Source: `ResilienceOptions.cs`.
|
||||
@@ -391,6 +417,7 @@ A reduced view of [`../Features/HotReload.md`](../Features/HotReload.md), restri
|
||||
| `Plcs[i]` removed | Supervisor stops the listener and closes all upstream connections for that PLC. |
|
||||
| `Plcs[i].ListenPort` or `Host` changed | Equivalent to remove + add. |
|
||||
| `Connection.Backend*TimeoutMs` | Next backend connect or request uses the new value. |
|
||||
| `Connection.Keepalive` heartbeat fields | Re-read on every heartbeat loop tick. `Tcp*` socket options apply to backend/upstream sockets opened after the change. |
|
||||
| `AdminPort` | Requires a service restart — the Kestrel admin host is built once at startup. |
|
||||
| `Resilience.ReadCoalescing.Enabled` | Hot-reloadable; in-flight coalesced entries drain naturally. |
|
||||
| `BcdTags.*.CacheTtlMs`, `Plcs[i].DefaultCacheTtlMs` | Tag-map reseat for the affected PLC drops that PLC's entire cache. |
|
||||
|
||||
@@ -2,7 +2,9 @@
|
||||
|
||||
Operator diagnosis playbook for mbproxy. Each entry maps an observable symptom to the log event name and status-page counter that confirms it, then lists likely causes and remediation steps.
|
||||
|
||||
The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-<date>.log`. The live counters are at `http://<host>:<AdminPort>/status.json` (default port `8080`). Events at Error level and above are also mirrored to the Windows Application Event Log under source `mbproxy`.
|
||||
The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-<date>.log` on Windows, or `/var/log/mbproxy/mbproxy-<date>.log` on Linux. The live counters are at `http://<host>:<AdminPort>/status.json` (default port `8080`). Events at Error level and above are also mirrored to the **Windows Application Event Log** (Windows Service) or the **local syslog / journal** (systemd) under source `mbproxy` — view the latter with `journalctl -t mbproxy` or `journalctl -u mbproxy`.
|
||||
|
||||
Paths and service commands below are written for Windows (`%ProgramData%`, `sc.exe`); the systemd equivalents are `/etc/mbproxy` + `/var/log/mbproxy` and `systemctl start|stop|status mbproxy`.
|
||||
|
||||
## Service Startup Failures
|
||||
|
||||
@@ -124,7 +126,28 @@ The rolling log lives at `C:\ProgramData\mbproxy\logs\mbproxy-<date>.log`. The l
|
||||
|
||||
1. Verify the upstream count on the status page returns to normal as clients reconnect — `plcs[].clients.connected` should climb again within seconds.
|
||||
2. If cascades fire repeatedly against the same PLC, investigate the PLC and intermediate network for stability. The proxy itself has no state to repair.
|
||||
3. If cascades correlate with idle periods, the idle middlebox-drop pattern is the likeliest cause; reduce the upstream client's poll interval below the middlebox idle timeout to keep traffic flowing.
|
||||
3. If cascades correlate with idle periods, the idle middlebox-drop pattern is the likeliest cause. Keepalive is enabled by default and should already be preventing this — confirm `Connection.Keepalive.Enabled` is `true` and that `BackendHeartbeatIdleMs` is comfortably below the middlebox idle timeout. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md).
|
||||
|
||||
### Backend keepalive heartbeat failing
|
||||
|
||||
**Symptom.** A PLC's backend connection is torn down while idle — no client was actively talking to it. `plcs[].backend.backendIdleDisconnects` increments and the upstream clients (if any were attached) are cascaded.
|
||||
|
||||
**Where to look.**
|
||||
|
||||
- Log events: `mbproxy.keepalive.heartbeat.timeout` (Warning) followed by `mbproxy.keepalive.backend.idle_disconnect` (Information).
|
||||
- Status fields: `plcs[].backend.backendHeartbeatsSent`, `backendHeartbeatsFailed`, `backendIdleDisconnects`.
|
||||
|
||||
**Root causes.**
|
||||
|
||||
- The ECOM is reachable at the IP layer but no longer answering Modbus (firmware hang, ECOM reset mid-session).
|
||||
- The path died between heartbeats and the heartbeat was the first request to discover it — this is the feature working as intended (the failure is found during idle, not on a client request).
|
||||
- `BackendHeartbeatProbeAddress` points at an address the PLC rejects. The default (0 = `V0`) is safe on DL205/DL260; only an operator override could break this.
|
||||
|
||||
**Remediation.**
|
||||
|
||||
1. A single idle-disconnect that recovers on the next client request needs no action — the proxy reconnected the path proactively.
|
||||
2. Repeated idle-disconnects on one PLC mean it keeps going dark while idle. Investigate the device and the network path; the proxy has no state to repair.
|
||||
3. If `backendHeartbeatsFailed` climbs but the PLC answers real client requests fine, check that `BackendHeartbeatProbeAddress` is a register the device actually serves.
|
||||
|
||||
### Request timeout watchdog firing
|
||||
|
||||
|
||||
Reference in New Issue
Block a user