# mbproxy operations runbook

Day-two operations reference for the mbproxy Windows Service: install, upgrade, configuration, logs, and troubleshooting.

## Install

### Prerequisites

- Windows 10 / Server 2019 or later (64-bit).
- PowerShell 5.1+ run as Administrator (the install script uses `#Requires -RunAsAdministrator`).
- The compiled publish output from `dotnet publish` (see [README.md](../README.md) for the exact command).
- Modbus TCP reachable from the proxy host to the PLCs on port 502.
- Port 8080 (or whatever `AdminPort` is set to) available for the status page.

### Steps

1. Publish the binaries on the build machine:

```powershell
dotnet publish src/Mbproxy/Mbproxy.csproj -c Release -r win-x64 --self-contained true -o C:\build\mbproxy-publish
```

2. Copy the publish output to the target server (or run the install script locally if you built on the server).

3. Open an elevated PowerShell prompt and run the install script:

```powershell
.\install\install.ps1 -PublishOutput C:\build\mbproxy-publish -Start
```

The script:
- Copies binaries to `C:\Program Files\Mbproxy\` (configurable via `-InstallPath`).
- Registers the service with `sc.exe create`.
- Sets failure-recovery: restart after 60 s on first/second failure, no action on third.
- Creates `%ProgramData%\mbproxy\logs\` and sets ACLs if needed.
- Copies `mbproxy.config.template.json` → `%ProgramData%\mbproxy\appsettings.json` **only if no config exists**.
- Registers the Windows Event Log source `mbproxy`.
- With `-Start`, starts the service and waits up to 30 s for `RUNNING` state.

4. Edit `%ProgramData%\mbproxy\appsettings.json` to configure your PLC list and BCD tags. See the template for inline comments on every field.

5. If you ran the install script without `-Start` so that you could edit the config first, start the service now:

```powershell
sc.exe start mbproxy
```

6. Verify the deployment with the [first-install smoke checklist](#first-install-smoke-checklist) below.

### Re-running install on an existing installation

The install script is idempotent. Re-running it:
- Stops the service if running.
- Overwrites the binaries.
- Updates the service config via `sc.exe config` (not `sc.exe create`).
- Preserves `%ProgramData%\mbproxy\appsettings.json` (never overwritten on update).
- Skips Event Log source creation if already registered.

## Upgrade procedure

1. Publish new binaries on the build machine (same command as install step 1).

2. Stop the service:

```powershell
sc.exe stop mbproxy
```

Wait for the service to reach `STOPPED` state; graceful shutdown drains in-flight PDUs (up to `Connection.GracefulShutdownTimeoutMs`, default 10 s).

3. Copy new binaries to `C:\Program Files\Mbproxy\` (or run `install.ps1 -PublishOutput ...` with `-Start` to automate steps 2–4):

```powershell
Copy-Item -Path C:\build\mbproxy-publish\* -Destination 'C:\Program Files\Mbproxy\' -Force
```

4. Start the service:

```powershell
sc.exe start mbproxy
```

5. Check the status page to confirm the new version:

```powershell
Invoke-RestMethod http://localhost:8080/status.json | Select-Object -ExpandProperty service
```

The `version` field should show the new build.

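The wait in step 2 can be scripted rather than watched by hand. The following is a minimal sketch, not part of the shipped install scripts: `Wait-Until` is a hypothetical helper name, and the 15-second default timeout is an assumption chosen to sit above the 10 s default of `Connection.GracefulShutdownTimeoutMs`.

```powershell
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,   # returns $true once the wait is over
        [int] $TimeoutSeconds = 15,                         # assumed ceiling, not a proxy setting
        [int] $PollMilliseconds = 500
    )

    $deadline = [DateTime]::UtcNow.AddSeconds($TimeoutSeconds)
    while ([DateTime]::UtcNow -lt $deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Milliseconds $PollMilliseconds
    }

    # One last check at the deadline so a late success is not reported as a timeout.
    return [bool](& $Condition)
}

# Usage on the proxy host (not run here):
#   sc.exe stop mbproxy
#   if (-not (Wait-Until { (Get-Service mbproxy).Status -eq 'Stopped' })) {
#       throw 'mbproxy did not reach STOPPED in time; do not copy binaries yet.'
#   }
```

Failing loudly on timeout keeps the copy in step 3 from racing a service that still holds the binaries open.
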
## Uninstall

```powershell
.\install\uninstall.ps1
```

Options:
- `-KeepConfig` — preserves `%ProgramData%\mbproxy\appsettings.json` for re-install.
- Log files are **always archived** to `%ProgramData%\mbproxy.archived-<timestamp>\logs\` regardless of `-KeepConfig`. They are never deleted.

## Configuration

The service reads `%ProgramData%\mbproxy\appsettings.json` at startup and watches it for changes while running. Most settings are hot-reloadable; a few require a restart.

### Hot-reload vs. restart

| Setting | Behaviour on file save |
|---|---|
| `BcdTags.Global` add/remove/width | Next PDU uses the new map; in-flight PDUs complete with the old map. |
| `Plcs[].BcdTags.{Add,Remove}` | Same per-PDU propagation. |
| `Plcs[].Name` or `.Host` or `.ListenPort` changed | Treated as remove + add: old listener stops, new one starts. |
| New `Plcs[]` entry | New listener binds immediately (subject to port availability). |
| `Plcs[]` entry removed | Supervisor stops the listener; all connected clients for that PLC are disconnected. |
| `Connection.Backend*TimeoutMs` | Next connect/request uses the new value. |
| `Connection.GracefulShutdownTimeoutMs` | Picked up on the next `ApplicationStopping` event. |
| `AdminPort` | Admin endpoint re-binds on the new port; old port released. |
| Invalid reload (schema error, duplicate ports/addresses) | Rejected as a whole. Current in-memory config stays; `mbproxy.config.reload.rejected` logged at Error. |

For more detail on the hot-reload propagation model, see [`design.md`](design.md) → "Configuration hot-reload".

### Editing appsettings.json

The service picks up changes automatically, so editing the file does not require a restart. Two cases behave differently: a new `Connection.GracefulShutdownTimeoutMs` value only takes effect the next time the service stops, and updating the binaries always requires a stop and start (see [Upgrade procedure](#upgrade-procedure)).

If a reload is rejected (`mbproxy.config.reload.rejected` in the log), the service continues running with the previous config. Fix the reported error and save again; the next valid file write will be accepted.

## Logs

### Location

Rolling log files live at: `C:\ProgramData\mbproxy\logs\mbproxy-<date>.log`

One file per day, retained for 30 days by default (controlled by `retainedFileCountLimit` in the Serilog config section).

### Windows Event Log

When running as a Windows Service, the `EventLogBridge` sink writes events at Error level and above to the Windows Application Event Log under source `mbproxy`. View with:

```powershell
Get-EventLog -LogName Application -Source mbproxy -Newest 20
```

Or open Event Viewer → Windows Logs → Application, filter by source `mbproxy`.

`Get-EventLog` exists only in Windows PowerShell 5.1; under PowerShell 7, query the same log with `Get-WinEvent` instead.

### Log survival after uninstall

`uninstall.ps1` **never deletes log files**. It moves `logs\` to a timestamped archive at `%ProgramData%\mbproxy.archived-<timestamp>\logs\` so post-crash diagnostics remain accessible.

## Status page

**URL:** `http://<proxy-host>:<AdminPort>/`

Default port: 8080. Change with `Mbproxy.AdminPort` in `appsettings.json`.

Routes:
- `GET /` — HTML table, auto-refreshes every 5 s. No external assets.
- `GET /status.json` — same data as JSON for monitoring scrapers.

Key fields on `/status.json`:

| Field | Meaning |
|---|---|
| `service.version` | Assembly informational version (set at publish time). |
| `service.uptimeSeconds` | Seconds since service start. |
| `service.config.lastReloadUtc` | Last accepted hot-reload timestamp. |
| `listeners.bound` / `listeners.configured` | Bound count vs. configured PLC count. |
| `plcs[].listener.state` | `bound` / `recovering` / `stopped`. |
| `plcs[].backend.connectsSuccess` | Successful backend TCP connects since start. |
| `plcs[].backend.connectsFailed` | Failed backend connects (all retries exhausted). |
| `plcs[].pdus.forwarded` | Total PDUs forwarded through this PLC's proxy. |

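For monitoring scrapers, the fields above can be reduced to a pass/fail check. The sketch below is one possible reading, not a shipped tool: `Get-MbproxyHealthIssue` is a hypothetical name, the `name` property on each `plcs[]` entry is an assumption (only `listener`, `backend`, and `pdus` are documented above), and treating every listener state other than `bound` as a problem is a policy choice.

```powershell
function Get-MbproxyHealthIssue {
    param(
        # Parsed /status.json document, e.g. the output of Invoke-RestMethod.
        [Parameter(Mandatory)] $Status
    )

    # Fewer bound listeners than configured PLCs means at least one port failed to bind.
    if ($Status.listeners.bound -lt $Status.listeners.configured) {
        "listeners: $($Status.listeners.bound) of $($Status.listeners.configured) bound"
    }

    foreach ($plc in $Status.plcs) {
        if ($plc.listener.state -ne 'bound') {
            # 'name' is an assumed field; adjust to the real status DTO.
            "plc '$($plc.name)': listener.state = $($plc.listener.state)"
        }
    }
}

# Usage against a live proxy (not run here):
#   $issues = @(Get-MbproxyHealthIssue -Status (Invoke-RestMethod http://localhost:8080/status.json))
#   if ($issues.Count -gt 0) { $issues | Write-Warning }
```

The function emits one string per problem and nothing when healthy, so wrapping the call in `@( )` gives a count that a scheduler can turn into an exit code.
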
## Common failure modes

### `mbproxy.startup.bind.failed` — port in use

**Symptom:** The service starts but one or more PLCs show `listener.state = recovering`.

**Cause:** Another process is bound to the configured `ListenPort`.

**Remediation:**

```powershell
netstat -ano | findstr :<port>   # find PID holding the port
Get-Process -Id <pid>            # identify the process
```

Release the port or change `Plcs[].ListenPort` in `appsettings.json`. The supervisor will retry automatically — watch for `mbproxy.listener.recovered` in the log.

### `mbproxy.listener.recovered` — no action needed

A previously failing listener has bound successfully. The service is self-healing, and this event is informational.

### `mbproxy.backend.failed` — PLC unreachable

**Symptom:** Upstream clients cannot connect through the proxy, or connections are immediately dropped.

**Cause:** The PLC backend (`Plcs[].Host:Port`) is unreachable — network issue, PLC power cycle, or H2-ECOM100 firmware issue.

**Remediation:** Check network path to the PLC. Verify the PLC Modbus port is responding:

```powershell
Test-NetConnection -ComputerName <plc-ip> -Port 502
```

Note: the H2-ECOM100 module caps connections at 4 simultaneous TCP clients. If the proxy already has 4 upstream clients connected to one PLC port, a fifth will trigger `mbproxy.backend.failed`.

### `mbproxy.config.reload.rejected` — bad config

**Symptom:** The log shows a rejection event after a file save; the current config is unchanged.

**Cause:** The saved `appsettings.json` has a schema error, duplicate port, or conflicting BCD address.

**Remediation:** Check the log for the joined error list immediately following the rejection event. Fix the JSON and save again.

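To pull the rejection event together with the lines that follow it, a search along these lines may help. This is a sketch under assumptions: `Get-MbproxyLogEvent` is a hypothetical helper, the event name is assumed to appear literally in the log line, and five trailing lines is a guess at how much context the joined error list needs.

```powershell
function Get-MbproxyLogEvent {
    param(
        [Parameter(Mandatory)] [string] $Path,        # log file path; wildcards allowed
        [Parameter(Mandatory)] [string] $EventName,   # e.g. 'mbproxy.config.reload.rejected'
        [int] $After = 5                              # lines of trailing context to keep (assumed)
    )

    # -SimpleMatch treats the dots in the event name literally rather than as regex wildcards.
    Select-String -Path $Path -Pattern $EventName -SimpleMatch -Context 0, $After
}

# Usage on the proxy host (not run here):
#   Get-MbproxyLogEvent -Path 'C:\ProgramData\mbproxy\logs\mbproxy-*.log' `
#       -EventName 'mbproxy.config.reload.rejected' | Select-Object -Last 1
```

Each result carries the matching line plus its trailing context in `.Context.PostContext`, which is where the error list should land if it is written on the following lines.
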
### `mbproxy.admin.bind.failed` — admin port in use

**Symptom:** The status page is unreachable.

**Cause:** Another process is using `AdminPort`.

**Remediation:** The proxy continues to forward Modbus traffic — only the status page is affected. Change `AdminPort` in `appsettings.json` (hot-reload applies).

### `mbproxy.rewrite.partial_bcd` — client reading half a 32-bit BCD pair

**Symptom:** Warning in the log; the value passes through raw (no rewrite).

**Cause:** The upstream client is reading only one register of a configured 32-bit BCD pair (e.g., quantity = 1 at the low address, or any read at the high address alone). This is almost always a client-side tag-definition bug.

**Remediation:** Verify the client's tag definition specifies quantity = 2 for 32-bit BCD addresses.

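The condition behind this warning can be stated as a small predicate. The sketch below is illustrative only and is not the proxy's implementation: `Test-PartialBcdRead` is a hypothetical name, and it assumes the pair occupies two adjacent registers (the low address and the low address + 1).

```powershell
function Test-PartialBcdRead {
    param(
        [Parameter(Mandatory)] [int] $StartAddress,    # first register in the read request
        [Parameter(Mandatory)] [int] $Quantity,        # number of registers requested
        [Parameter(Mandatory)] [int] $PairLowAddress   # low register of the 32-bit BCD pair
    )

    $lastAddress = $StartAddress + $Quantity - 1
    $high        = $PairLowAddress + 1                 # assumed: high half is the next register

    $coversLow  = ($PairLowAddress -ge $StartAddress) -and ($PairLowAddress -le $lastAddress)
    $coversHigh = ($high -ge $StartAddress) -and ($high -le $lastAddress)

    # Partial means exactly one half of the pair falls inside the requested range.
    return ($coversLow -xor $coversHigh)
}

Test-PartialBcdRead -StartAddress 100 -Quantity 1 -PairLowAddress 100   # True: low register only
Test-PartialBcdRead -StartAddress 101 -Quantity 1 -PairLowAddress 100   # True: high register only
Test-PartialBcdRead -StartAddress 100 -Quantity 2 -PairLowAddress 100   # False: whole pair read
```

Running a client's tag list through a check like this before deployment is one way to catch the tag-definition bug without waiting for the warning.
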
### `mbproxy.rewrite.invalid_bcd` — non-BCD value from PLC

**Symptom:** Warning in the log; the value passes through raw.

**Cause:** The PLC returned a register value that contains non-BCD nibbles (e.g., `0xA123` — the nibble `A` is invalid BCD). This usually indicates the ladder program wrote a non-BCD value to a register configured as a BCD tag.

**Remediation:** Investigate the PLC ladder program. The proxy cannot decode non-BCD data — passing it through is safer than guessing.

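For reference, packed BCD stores one decimal digit per 4-bit nibble, so any nibble above 9 is invalid. The sketch below shows that check and the decode for a single 16-bit register. It illustrates the general encoding, not the proxy's own code; `Test-BcdWord` and `ConvertFrom-BcdWord` are hypothetical names.

```powershell
function Test-BcdWord {
    param([Parameter(Mandatory)] [int] $Word)   # 16-bit register value, 0..0xFFFF

    foreach ($shift in 0, 4, 8, 12) {
        # Each nibble must be a decimal digit (0-9).
        if ((($Word -shr $shift) -band 0xF) -gt 9) { return $false }
    }
    return $true
}

function ConvertFrom-BcdWord {
    param([Parameter(Mandatory)] [int] $Word)

    if (-not (Test-BcdWord -Word $Word)) {
        throw ('0x{0:X4} contains a non-BCD nibble' -f $Word)
    }

    $value = 0
    foreach ($shift in 12, 8, 4, 0) {
        # Most significant digit first.
        $value = $value * 10 + (($Word -shr $shift) -band 0xF)
    }
    return $value
}

ConvertFrom-BcdWord -Word 0x1234   # 1234
Test-BcdWord -Word 0xA123          # False: the top nibble is 0xA
```

A register dump from the PLC can be run through `Test-BcdWord` to find which addresses the ladder program is writing with binary rather than BCD values.
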
## First-install smoke checklist

Run these commands after `install.ps1 -Start` to verify the deployment:

```powershell
# 1. Service is running
Get-Service mbproxy | Select-Object Status, DisplayName

# 2. Status page is reachable
Invoke-WebRequest http://localhost:8080/ -UseBasicParsing | Select-Object StatusCode

# 3. JSON endpoint returns expected fields
$status = Invoke-RestMethod http://localhost:8080/status.json
$status.service | Select-Object version, uptimeSeconds
$status.listeners

# 4. Log file exists and is recent
Get-Item "C:\ProgramData\mbproxy\logs\mbproxy-*.log" | Sort-Object LastWriteTime -Descending | Select-Object -First 1

# 5. No Error events in the Event Log (a "No matches found" error here is the healthy result)
Get-EventLog -LogName Application -Source mbproxy -EntryType Error -Newest 5

# 6. Stop the service cleanly (graceful shutdown within 10 s)
$sw = [System.Diagnostics.Stopwatch]::StartNew()
sc.exe stop mbproxy
$deadline = [DateTime]::UtcNow.AddSeconds(15)
do { Start-Sleep 1 } until ((Get-Service mbproxy).Status -eq 'Stopped' -or [DateTime]::UtcNow -gt $deadline)
$sw.Stop()
Write-Host "Stop elapsed: $($sw.ElapsedMilliseconds) ms"
(Get-Service mbproxy).Status  # Should be Stopped; restart afterwards with: sc.exe start mbproxy
```

**Note:** This checklist documents the expected steps. It was not executed on a dedicated clean VM (the proxy was developed and unit/E2E tested in-process). Run this checklist on first deployment to a production host.