mbproxy/docs: slim operations.md to runbook content + pointers

Three sections in operations.md duplicated the new focused docs:
  - "Configuration" → Operations/Configuration.md + Features/HotReload.md
  - "Status page"   → Operations/StatusPage.md
  - "Common failure modes" → Operations/Troubleshooting.md + Reference/LogEvents.md

Replaced each with a short pointer block. The runbook now keeps only
content unique to day-two ops: install steps, upgrade procedure, uninstall,
log file locations / retention / archival, and the first-install smoke
checklist. 271 -> 176 lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-14 03:49:34 -04:00
parent 0eb560a7f6
commit b4c82bf379
+8 -103
@@ -100,29 +100,12 @@ Options:
## Configuration
The service reads `%ProgramData%\mbproxy\appsettings.json` at startup and watches it for changes while running. Most settings are hot-reloadable; a few require a restart.
The service reads `%ProgramData%\mbproxy\appsettings.json` at startup and watches it for changes while running. Most settings are hot-reloadable; a save triggers a re-bind of `IOptionsMonitor<MbproxyOptions>` and a per-change-kind reconcile.
### Hot-reload vs. restart
- Full schema (every `Mbproxy:*` key, defaults, validation rules, examples): [`Operations/Configuration.md`](Operations/Configuration.md).
- Per-change-kind reconcile semantics (what propagates instantly vs. what requires a restart): [`Features/HotReload.md`](Features/HotReload.md).
| Setting | Behaviour on file save |
|---|---|
| `BcdTags.Global` add/remove/width | Next PDU uses the new map; in-flight PDUs complete with the old map. |
| `Plcs[].BcdTags.{Add,Remove}` | Same per-PDU propagation. |
| `Plcs[].Name` or `.Host` or `.ListenPort` changed | Treated as remove + add: old listener stops, new one starts. |
| New `Plcs[]` entry | New listener binds immediately (subject to port availability). |
| `Plcs[]` entry removed | Supervisor stops the listener; all connected clients for that PLC are disconnected. |
| `Connection.Backend*TimeoutMs` | Next connect/request uses the new value. |
| `Connection.GracefulShutdownTimeoutMs` | Picked up on the next `ApplicationStopping` event. |
| `AdminPort` | Admin endpoint re-binds on the new port; old port released. |
| Invalid reload (schema error, duplicate ports/addresses) | Rejected as a whole. Current in-memory config stays; `mbproxy.config.reload.rejected` logged at Error. |
For more detail on the hot-reload propagation model, see [`design.md`](design.md) → "Configuration hot-reload".
### Editing appsettings.json
The service picks up changes automatically. There is no need to restart unless you are changing the `Connection.GracefulShutdownTimeoutMs` (applies only on next stop) or updating the binary.
If a reload is rejected (`mbproxy.config.reload.rejected` in the log), the service continues running with the previous config. Fix the JSON error and save again — the next valid file write will be accepted.
If a reload is rejected (`mbproxy.config.reload.rejected` in the log), the service continues running with the previous config. Fix the JSON and save again — the next valid file write is accepted.
## Logs
@@ -148,93 +131,15 @@ Or open Event Viewer → Windows Logs → Application, filter by source `mbproxy
## Status page
**URL:** `http://<proxy-host>:<AdminPort>/`
**URL:** `http://<proxy-host>:<AdminPort>/` (default port 8080; change via `Mbproxy.AdminPort` in `appsettings.json`).
Default port: 8080. Change with `Mbproxy.AdminPort` in `appsettings.json`.
Routes: `GET /` (auto-refreshing HTML, no external assets) and `GET /status.json` (same data as JSON for monitoring scrapers).
Routes:
- `GET /` — HTML table, auto-refreshes every 5 s. No external assets.
- `GET /status.json` — same data as JSON for monitoring scrapers.
Key fields on `/status.json`:
| Field | Meaning |
|---|---|
| `service.version` | Assembly informational version (set at publish time). |
| `service.uptimeSeconds` | Seconds since service start. |
| `service.config.lastReloadUtc` | Last accepted hot-reload timestamp. |
| `listeners.bound` / `listeners.configured` | Bound count vs. configured PLC count. |
| `plcs[].listener.state` | `bound` / `recovering` / `stopped`. |
| `plcs[].backend.connectsSuccess` | Successful backend TCP connects since start. |
| `plcs[].backend.connectsFailed` | Failed backend connects (all retries exhausted). |
| `plcs[].pdus.forwarded` | Total PDUs forwarded through this PLC's proxy. |
The full endpoint shape, every JSON field, counter semantics, and scraping examples live in [`Operations/StatusPage.md`](Operations/StatusPage.md). KPI catalog and dashboard guidance: [`kpi.md`](kpi.md).
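As a rough illustration of what a monitoring scraper might do with `/status.json`, the sketch below checks the listener fields named above. It assumes the dotted field names map onto nested JSON objects (`listeners.bound` as `{"listeners": {"bound": ...}}`) and that `plcs` is an array; the authoritative shape is in `Operations/StatusPage.md`. The function name `unhealthy_listeners` is hypothetical.

```python
from typing import Any, Dict, List


def unhealthy_listeners(status: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed /status.json document.

    Assumption: dotted field names in the docs correspond to nested objects.
    """
    problems: List[str] = []

    # Compare the bound listener count against the configured PLC count.
    listeners = status.get("listeners", {})
    bound = listeners.get("bound", 0)
    configured = listeners.get("configured", 0)
    if bound < configured:
        problems.append(f"{bound}/{configured} listeners bound")

    # Flag every PLC whose listener is not in the `bound` state
    # (the documented states are bound / recovering / stopped).
    for index, plc in enumerate(status.get("plcs", [])):
        state = plc.get("listener", {}).get("state")
        if state != "bound":
            problems.append(f"plcs[{index}] listener state is {state!r}")

    return problems
```

A scraper would typically fetch the document with `urllib.request.urlopen("http://<proxy-host>:<AdminPort>/status.json")`, parse it with `json.load`, and alert when the returned list is non-empty.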
## Common failure modes
### `mbproxy.startup.bind.failed` — port in use
**Symptom:** The service starts but one or more PLCs show `listener.state = recovering`.
**Cause:** Another process is bound to the configured `ListenPort`.
**Remediation:**
```powershell
netstat -ano | findstr :<port> # find PID holding the port
Get-Process -Id <pid> # identify the process
```
Release the port or change `Plcs[].ListenPort` in `appsettings.json`. The supervisor will retry automatically — watch for `mbproxy.listener.recovered` in the log.
### `mbproxy.listener.recovered` — no action needed
A previously-failing listener successfully bound. The service is self-healing. This is informational.
### `mbproxy.backend.failed` — PLC unreachable
**Symptom:** Upstream clients cannot connect through the proxy, or connections are immediately dropped.
**Cause:** The PLC backend (`Plcs[].Host:Port`) is unreachable — network issue, PLC power cycle, or H2-ECOM100 firmware issue.
**Remediation:** Check network path to the PLC. Verify the PLC Modbus port is responding:
```powershell
Test-NetConnection -ComputerName <plc-ip> -Port 502
```
Note: the H2-ECOM100 module caps connections at 4 simultaneous TCP clients. If the proxy already has 4 upstream clients connected to one PLC port, a fifth will trigger `mbproxy.backend.failed`.
### `mbproxy.config.reload.rejected` — bad config
**Symptom:** The log shows a rejection event after a file save; the current config is unchanged.
**Cause:** The saved `appsettings.json` has a schema error, duplicate port, or conflicting BCD address.
**Remediation:** Check the log for the joined error list immediately following the rejection event. Fix the JSON and save again.
### `mbproxy.admin.bind.failed` — admin port in use
**Symptom:** The status page is unreachable.
**Cause:** Another process is using `AdminPort`.
**Remediation:** The proxy continues to forward Modbus traffic — only the status page is affected. Change `AdminPort` in `appsettings.json` (hot-reload applies).
### `mbproxy.rewrite.partial_bcd` — client reading half a 32-bit BCD pair
**Symptom:** Warning in the log; the value passes through raw (no rewrite).
**Cause:** The upstream client is reading only one register of a configured 32-bit BCD pair (e.g., quantity = 1 at the low address, or any read at the high address alone). This is almost always a client-side tag-definition bug.
**Remediation:** Verify the client's tag definition specifies quantity = 2 for 32-bit BCD addresses.
### `mbproxy.rewrite.invalid_bcd` — non-BCD value from PLC
**Symptom:** Warning in the log; the value passes through raw.
**Cause:** The PLC returned a register value that contains non-BCD nibbles (e.g., `0xA123` — the nibble `A` is invalid BCD). This usually indicates the ladder program wrote a non-BCD value to a register configured as a BCD tag.
**Remediation:** Investigate the PLC ladder program. The proxy cannot decode non-BCD data — passing it through is safer than guessing.
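To make the "non-BCD nibble" condition concrete, here is a minimal sketch of a 16-bit BCD validity check and decode. It is not the proxy's implementation; the function names are hypothetical, and it only illustrates why a value such as `0xA123` cannot be decoded.

```python
from typing import Optional


def is_valid_bcd16(register: int) -> bool:
    """True when all four nibbles of a 16-bit register are decimal digits (0-9)."""
    if not 0 <= register <= 0xFFFF:
        raise ValueError("register must fit in 16 bits")
    return all(((register >> shift) & 0xF) <= 9 for shift in (12, 8, 4, 0))


def decode_bcd16(register: int) -> Optional[int]:
    """Decode a BCD register to its decimal value, or None if any nibble is invalid."""
    if not is_valid_bcd16(register):
        return None
    value = 0
    # Walk the nibbles from most to least significant, one decimal digit each.
    for shift in (12, 8, 4, 0):
        value = value * 10 + ((register >> shift) & 0xF)
    return value
```

With these definitions, `decode_bcd16(0x1234)` yields `1234`, while `decode_bcd16(0xA123)` yields `None` because the top nibble `A` is not a decimal digit.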
The full diagnosis playbook — startup bind conflicts, backend connectivity, hot-reload validation errors, BCD rewrite anomalies, performance and queue-depth issues, response-cache anomalies, and graceful-shutdown problems — is keyed to log events and status counters in [`Operations/Troubleshooting.md`](Operations/Troubleshooting.md). The complete `mbproxy.*` event catalog with levels, properties, and operator implications is in [`Reference/LogEvents.md`](Reference/LogEvents.md).
## First-install smoke checklist