# mbproxy operations runbook

Day-two operations reference for the mbproxy Windows Service: install, upgrade, configuration, logs, and troubleshooting.

## Install

### Prerequisites

- Windows 10 / Server 2019 or later (64-bit).
- PowerShell 5.1+ run as Administrator (the install script uses `#Requires -RunAsAdministrator`).
- The compiled publish output from `dotnet publish` (see [README.md](../README.md) for the exact command).
- Modbus TCP connectivity from the proxy host to each PLC on port 502.
- Port 8080 (or whatever `AdminPort` is set to) available for the status page.

### Steps

1. Publish the binaries on the build machine:

   ```powershell
   dotnet publish src/Mbproxy/Mbproxy.csproj -c Release -r win-x64 --self-contained true -o C:\build\mbproxy-publish
   ```

2. Copy the publish output to the target server (or, if you built on the server, run the install script there directly).

3. Open an elevated PowerShell prompt and run the install script:

   ```powershell
   .\install\install.ps1 -PublishOutput C:\build\mbproxy-publish -Start
   ```

   The script:

   - Copies binaries to `C:\Program Files\Mbproxy\` (configurable via `-InstallPath`).
   - Registers the service with `sc.exe create`.
   - Configures failure recovery: restart after 60 s on the first and second failures, no action on the third.
   - Creates `%ProgramData%\mbproxy\logs\` and sets ACLs if needed.
   - Copies `mbproxy.config.template.json` → `%ProgramData%\mbproxy\appsettings.json` **only if no config exists**.
   - Registers the Windows Event Log source `mbproxy`.
   - With `-Start`, starts the service and waits up to 30 s for it to reach the `RUNNING` state.

4. Edit `%ProgramData%\mbproxy\appsettings.json` to configure your PLC list and BCD tags. The template has inline comments for every field.

5. If you ran the install script without `-Start` (for example, to edit the config before the first start), start the service now:

   ```powershell
   sc.exe start mbproxy
   ```

6. Verify the deployment with the [first-install smoke checklist](#first-install-smoke-checklist) below.

### Re-running install on an existing installation

The install script is idempotent.
Re-running it:

- Stops the service if it is running.
- Overwrites the binaries.
- Updates the service config via `sc.exe config` (not `sc.exe create`).
- Preserves `%ProgramData%\mbproxy\appsettings.json` (never overwritten on update).
- Skips Event Log source creation if the source is already registered.

## Upgrade procedure

1. Publish new binaries on the build machine (same command as install step 1).

2. Stop the service:

   ```powershell
   sc.exe stop mbproxy
   ```

   Wait for the service to reach the `STOPPED` state. Graceful shutdown drains in-flight PDUs for up to `Connection.GracefulShutdownTimeoutMs` (default 10 s).

3. Copy the new binaries to `C:\Program Files\Mbproxy\` (or run `install.ps1 -PublishOutput ... -Start` to automate steps 2–4):

   ```powershell
   # -Recurse also copies any subdirectories in the publish output
   Copy-Item -Path C:\build\mbproxy-publish\* -Destination 'C:\Program Files\Mbproxy\' -Recurse -Force
   ```

4. Start the service:

   ```powershell
   sc.exe start mbproxy
   ```

5. Check the status page to confirm the new version:

   ```powershell
   Invoke-RestMethod http://localhost:8080/status.json | Select-Object -ExpandProperty service
   ```

   The `version` field should show the new build.

## Uninstall

```powershell
.\install\uninstall.ps1
```

Options:

- `-KeepConfig`: preserves `%ProgramData%\mbproxy\appsettings.json` for re-install.

Log files are **always archived** to `%ProgramData%\mbproxy.archived-\logs\`, regardless of `-KeepConfig`. They are never deleted.

## Configuration

The service reads `%ProgramData%\mbproxy\appsettings.json` at startup and watches it for changes while running. Most settings are hot-reloadable; a few require a restart.

### Hot-reload vs. restart

| Setting | Behaviour on file save |
|---|---|
| `BcdTags.Global` add/remove/width | Next PDU uses the new map; in-flight PDUs complete with the old map. |
| `Plcs[].BcdTags.{Add,Remove}` | Same per-PDU propagation. |
| `Plcs[].Name` or `.Host` or `.ListenPort` changed | Treated as remove + add: old listener stops, new one starts. |
| New `Plcs[]` entry | New listener binds immediately (subject to port availability). |
| `Plcs[]` entry removed | Supervisor stops the listener; all connected clients for that PLC are disconnected. |
| `Connection.Backend*TimeoutMs` | Next connect/request uses the new value. |
| `Connection.GracefulShutdownTimeoutMs` | Picked up on the next `ApplicationStopping` event. |
| `AdminPort` | Admin endpoint re-binds on the new port; old port released. |
| Invalid reload (schema error, duplicate ports/addresses) | Rejected as a whole. Current in-memory config stays; `mbproxy.config.reload.rejected` logged at Error. |

For more detail on the hot-reload propagation model, see [`design.md`](design.md) → "Configuration hot-reload".

### Editing appsettings.json

The service picks up changes automatically, so a restart is needed only when updating the binaries. `Connection.GracefulShutdownTimeoutMs` is also picked up without a restart, but it is consulted only when the service stops, so a new value takes effect at the next stop.

If a reload is rejected (`mbproxy.config.reload.rejected` in the log), the service continues running with the previous config. Fix the JSON error and save again; the next valid file write will be accepted.

## Logs

### Location

Rolling log files live at:

`C:\ProgramData\mbproxy\logs\mbproxy-.log`

One file is written per day. By default the 30 most recent files (about 30 days) are retained, controlled by `retainedFileCountLimit` in the Serilog config section.

### Windows Event Log

When running as a Windows Service, the `EventLogBridge` sink writes events at Error level and above to the Windows Application Event Log under source `mbproxy`. View with:

```powershell
Get-EventLog -LogName Application -Source mbproxy -Newest 20
```

Or open Event Viewer → Windows Logs → Application and filter by source `mbproxy`. `Get-EventLog` is available only in Windows PowerShell 5.1; in PowerShell 7, use `Get-WinEvent -FilterHashtable @{ LogName = 'Application'; ProviderName = 'mbproxy' } -MaxEvents 20` instead.

### Log survival after uninstall

`uninstall.ps1` **never deletes log files**. It moves `logs\` to a timestamped archive at `%ProgramData%\mbproxy.archived-\logs\` so post-crash diagnostics remain accessible.

## Status page

**URL:** `http://:/`

Default port: 8080.
Change it with `Mbproxy.AdminPort` in `appsettings.json`.

Routes:

- `GET /`: HTML table, auto-refreshes every 5 s. No external assets.
- `GET /status.json`: same data as JSON for monitoring scrapers.

Key fields on `/status.json`:

| Field | Meaning |
|---|---|
| `service.version` | Assembly informational version (set at publish time). |
| `service.uptimeSeconds` | Seconds since service start. |
| `service.config.lastReloadUtc` | Last accepted hot-reload timestamp. |
| `listeners.bound` / `listeners.configured` | Bound count vs. configured PLC count. |
| `plcs[].listener.state` | `bound` / `recovering` / `stopped`. |
| `plcs[].backend.connectsSuccess` | Successful backend TCP connects since start. |
| `plcs[].backend.connectsFailed` | Failed backend connects (all retries exhausted). |
| `plcs[].pdus.forwarded` | Total PDUs forwarded through this PLC's proxy. |

## Common failure modes

### `mbproxy.startup.bind.failed` — port in use

**Symptom:** The service starts, but one or more PLCs show `listener.state = recovering`.

**Cause:** Another process is bound to the configured `ListenPort`.

**Remediation:**

```powershell
netstat -ano | findstr :     # find PID holding the port
Get-Process -Id              # identify the process
```

Release the port or change `Plcs[].ListenPort` in `appsettings.json`. The supervisor retries automatically; watch for `mbproxy.listener.recovered` in the log.

### `mbproxy.listener.recovered` — no action needed

A previously failing listener has bound successfully. The service is self-healing, so this event is informational.

### `mbproxy.backend.failed` — PLC unreachable

**Symptom:** Upstream clients cannot connect through the proxy, or their connections are dropped immediately.

**Cause:** The PLC backend (`Plcs[].Host:Port`) is unreachable because of a network issue, a PLC power cycle, or an H2-ECOM100 firmware issue.

**Remediation:** Check the network path to the PLC.
Verify that the PLC's Modbus port is responding:

```powershell
Test-NetConnection -ComputerName -Port 502
```

Note: the H2-ECOM100 module caps connections at 4 simultaneous TCP clients. If the proxy already has 4 upstream clients connected to one PLC port, a fifth triggers `mbproxy.backend.failed`.

### `mbproxy.config.reload.rejected` — bad config

**Symptom:** The log shows a rejection event after a file save; the current config is unchanged.

**Cause:** The saved `appsettings.json` has a schema error, a duplicate port, or a conflicting BCD address.

**Remediation:** Check the log for the combined list of validation errors immediately following the rejection event. Fix the JSON and save again.

### `mbproxy.admin.bind.failed` — admin port in use

**Symptom:** The status page is unreachable.

**Cause:** Another process is using `AdminPort`.

**Remediation:** The proxy continues to forward Modbus traffic; only the status page is affected. Change `AdminPort` in `appsettings.json` (hot-reload applies).

### `mbproxy.rewrite.partial_bcd` — client reading half a 32-bit BCD pair

**Symptom:** Warning in the log; the value passes through raw (no rewrite).

**Cause:** The upstream client is reading only one register of a configured 32-bit BCD pair (e.g., quantity = 1 at the low address, or a read of the high address alone). This is almost always a client-side tag-definition bug.

**Remediation:** Verify that the client's tag definition specifies quantity = 2 for 32-bit BCD addresses.

### `mbproxy.rewrite.invalid_bcd` — non-BCD value from PLC

**Symptom:** Warning in the log; the value passes through raw.

**Cause:** The PLC returned a register value containing non-BCD nibbles (e.g., `0xA123`, where the nibble `A` is not a valid BCD digit). This usually means the ladder program wrote a non-BCD value to a register configured as a BCD tag.

**Remediation:** Investigate the PLC ladder program. The proxy cannot decode non-BCD data, and passing it through is safer than guessing.
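The two BCD warnings above follow from how packed BCD is decoded: every 4-bit nibble must hold a digit from 0 to 9, and a 32-bit value needs both registers of the pair. The PowerShell sketch below illustrates that check and decode. It is not mbproxy's implementation. `Test-BcdWord`, `ConvertFrom-BcdWord`, and `ConvertFrom-Bcd32` are hypothetical helper names, and the sketch assumes the register at the lower address of a 32-bit pair holds the four low-order digits. Confirm that word order against your PLC.

```powershell
# Hypothetical helpers illustrating packed-BCD checks; not mbproxy's implementation.

function Test-BcdWord {
    param([uint16]$Value)
    $v = [int]$Value
    # Each of the four 4-bit nibbles must hold a decimal digit (0-9).
    for ($shift = 0; $shift -lt 16; $shift += 4) {
        if ((($v -shr $shift) -band 0xF) -gt 9) { return $false }
    }
    return $true
}

function ConvertFrom-BcdWord {
    param([uint16]$Value)
    if (-not (Test-BcdWord $Value)) {
        # Mirrors the invalid_bcd case: there is no sensible decimal value to return.
        throw ('0x{0:X4} contains a non-BCD nibble' -f $Value)
    }
    $v = [int]$Value
    $result = 0
    # Walk the nibbles from most to least significant, accumulating decimal digits.
    for ($shift = 12; $shift -ge 0; $shift -= 4) {
        $result = $result * 10 + (($v -shr $shift) -band 0xF)
    }
    return $result
}

function ConvertFrom-Bcd32 {
    # ASSUMPTION: the register at the lower Modbus address ($LowWord) holds the
    # four low-order digits; confirm this against the PLC's word order.
    param([uint16]$LowWord, [uint16]$HighWord)
    return (ConvertFrom-BcdWord $HighWord) * 10000 + (ConvertFrom-BcdWord $LowWord)
}
```

For example, `ConvertFrom-BcdWord 0x1234` returns `1234` and `Test-BcdWord 0xA123` returns `False` (the `invalid_bcd` case). Under the stated word-order assumption, `ConvertFrom-Bcd32 -LowWord 0x5678 -HighWord 0x0012` returns `125678`. With only one register of the pair, the eight-digit value cannot be reconstructed, which is consistent with passing a `partial_bcd` read through raw.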
## First-install smoke checklist

Run these commands after `install.ps1 -Start` to verify the deployment:

```powershell
# 1. Service is running
Get-Service mbproxy | Select-Object Status, DisplayName

# 2. Status page is reachable
Invoke-WebRequest http://localhost:8080/ -UseBasicParsing | Select-Object StatusCode

# 3. JSON endpoint returns expected fields
$status = Invoke-RestMethod http://localhost:8080/status.json
$status.service | Select-Object version, uptimeSeconds
$status.listeners

# 4. Log file exists and is recent
Get-Item "C:\ProgramData\mbproxy\logs\mbproxy-*.log" |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1

# 5. No Error events in the Event Log (Get-EventLog exists only in Windows PowerShell 5.1).
#    A "No matches found" error here means there are no Error events, which is the desired result.
Get-EventLog -LogName Application -Source mbproxy -EntryType Error -Newest 5

# 6. Stop the service cleanly (graceful shutdown should finish within 10 s)
$sw = [System.Diagnostics.Stopwatch]::StartNew()
sc.exe stop mbproxy
$deadline = [DateTime]::UtcNow.AddSeconds(15)   # 5 s of slack beyond the 10 s drain timeout
do {
    Start-Sleep -Milliseconds 250              # poll 4x per second for a tighter elapsed-time reading
} until ((Get-Service mbproxy).Status -eq 'Stopped' -or [DateTime]::UtcNow -gt $deadline)
$sw.Stop()
Write-Host "Stop elapsed: $($sw.ElapsedMilliseconds) ms"
(Get-Service mbproxy).Status   # Should be Stopped
```

**Note:** This checklist documents the expected steps. It has not been run on a dedicated clean VM (the proxy was developed and unit/E2E tested in-process), so run it on the first deployment to a production host.
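For unattended monitoring beyond the one-off checklist, the listener fields from `/status.json` can be turned into a pass/fail signal. The sketch below is a hypothetical helper: `Get-MbproxyHealthProblem` is not part of mbproxy. It assumes the JSON nests `listeners.bound`, `listeners.configured`, and `plcs[].listener.state` exactly as named in the status-page field table, and it emits one message per problem found.

```powershell
# Hypothetical helper, not part of mbproxy. Assumes /status.json nests the fields
# listeners.bound, listeners.configured and plcs[].listener.state as documented.
function Get-MbproxyHealthProblem {
    param(
        [Parameter(Mandatory)]
        [psobject]$Status   # e.g. the object returned by Invoke-RestMethod for /status.json
    )

    # Fewer bound listeners than configured PLCs means at least one port is not bound.
    if ($Status.listeners.bound -lt $Status.listeners.configured) {
        "only $($Status.listeners.bound) of $($Status.listeners.configured) listeners are bound"
    }

    # Report every listener that is not 'bound' (i.e. 'recovering' or 'stopped').
    $i = 0
    foreach ($plc in $Status.plcs) {
        if ($plc.listener.state -ne 'bound') {
            "plcs[$i]: listener state is '$($plc.listener.state)'"
        }
        $i++
    }
}
```

A scheduled task could run `$problems = @(Get-MbproxyHealthProblem (Invoke-RestMethod http://localhost:8080/status.json))` and exit non-zero when `$problems.Count -gt 0`. The function writes strings to the pipeline instead of returning a list object, so a healthy status yields an empty `@(...)` array and the count check stays reliable.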