From b4c82bf379f40b3028647160768bd091c8eeb44b Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Thu, 14 May 2026 03:49:34 -0400
Subject: [PATCH] mbproxy/docs: slim operations.md to runbook content + pointers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three sections in operations.md duplicated the new focused docs:
- "Configuration" → Operations/Configuration.md + Features/HotReload.md
- "Status page" → Operations/StatusPage.md
- "Common failure modes" → Operations/Troubleshooting.md + Reference/LogEvents.md

Replaced each with a short pointer block. The runbook now keeps only content unique to day-two ops: install steps, upgrade procedure, uninstall, log file locations / retention / archival, and the first-install smoke checklist.

271 -> 176 lines.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 mbproxy/docs/operations.md | 111 +++---------------------------------
 1 file changed, 8 insertions(+), 103 deletions(-)

diff --git a/mbproxy/docs/operations.md b/mbproxy/docs/operations.md
index 4879bb6..ccb6744 100644
--- a/mbproxy/docs/operations.md
+++ b/mbproxy/docs/operations.md
@@ -100,29 +100,12 @@ Options:
 
 ## Configuration
 
-The service reads `%ProgramData%\mbproxy\appsettings.json` at startup and watches it for changes while running. Most settings are hot-reloadable; a few require a restart.
+The service reads `%ProgramData%\mbproxy\appsettings.json` at startup and watches it for changes while running. Most settings are hot-reloadable; a save triggers a re-bind of `IOptionsMonitor` and a per-change-kind reconcile.
 
-### Hot-reload vs. restart
+- Full schema (every `Mbproxy:*` key, defaults, validation rules, examples): [`Operations/Configuration.md`](Operations/Configuration.md).
+- Per-change-kind reconcile semantics (what propagates instantly vs. what requires a restart): [`Features/HotReload.md`](Features/HotReload.md).
 
-| Setting | Behaviour on file save |
-|---|---|
-| `BcdTags.Global` add/remove/width | Next PDU uses the new map; in-flight PDUs complete with the old map. |
-| `Plcs[].BcdTags.{Add,Remove}` | Same per-PDU propagation. |
-| `Plcs[].Name` or `.Host` or `.ListenPort` changed | Treated as remove + add: old listener stops, new one starts. |
-| New `Plcs[]` entry | New listener binds immediately (subject to port availability). |
-| `Plcs[]` entry removed | Supervisor stops the listener; all connected clients for that PLC are disconnected. |
-| `Connection.Backend*TimeoutMs` | Next connect/request uses the new value. |
-| `Connection.GracefulShutdownTimeoutMs` | Picked up on the next `ApplicationStopping` event. |
-| `AdminPort` | Admin endpoint re-binds on the new port; old port released. |
-| Invalid reload (schema error, duplicate ports/addresses) | Rejected as a whole. Current in-memory config stays; `mbproxy.config.reload.rejected` logged at Error. |
-
-For more detail on the hot-reload propagation model, see [`design.md`](design.md) → "Configuration hot-reload".
-
-### Editing appsettings.json
-
-The service picks up changes automatically. There is no need to restart unless you are changing the `Connection.GracefulShutdownTimeoutMs` (applies only on next stop) or updating the binary.
-
-If a reload is rejected (`mbproxy.config.reload.rejected` in the log), the service continues running with the previous config. Fix the JSON error and save again — the next valid file write will be accepted.
+If a reload is rejected (`mbproxy.config.reload.rejected` in the log), the service continues running with the previous config. Fix the JSON and save again — the next valid file write is accepted.
 
 ## Logs
 
@@ -148,93 +131,15 @@ Or open Event Viewer → Windows Logs → Application, filter by source `mbproxy
 
 ## Status page
 
-**URL:** `http://:/`
+**URL:** `http://:/` (default port 8080; change via `Mbproxy.AdminPort` in `appsettings.json`).
 
-Default port: 8080. Change with `Mbproxy.AdminPort` in `appsettings.json`.
+Routes: `GET /` (auto-refreshing HTML, no external assets) and `GET /status.json` (same data as JSON for monitoring scrapers).
 
-Routes:
-- `GET /` — HTML table, auto-refreshes every 5 s. No external assets.
-- `GET /status.json` — same data as JSON for monitoring scrapers.
-
-Key fields on `/status.json`:
-
-| Field | Meaning |
-|---|---|
-| `service.version` | Assembly informational version (set at publish time). |
-| `service.uptimeSeconds` | Seconds since service start. |
-| `service.config.lastReloadUtc` | Last accepted hot-reload timestamp. |
-| `listeners.bound` / `listeners.configured` | Bound count vs. configured PLC count. |
-| `plcs[].listener.state` | `bound` / `recovering` / `stopped`. |
-| `plcs[].backend.connectsSuccess` | Successful backend TCP connects since start. |
-| `plcs[].backend.connectsFailed` | Failed backend connects (all retries exhausted). |
-| `plcs[].pdus.forwarded` | Total PDUs forwarded through this PLC's proxy. |
+The full endpoint shape, every JSON field, counter semantics, and scraping examples live in [`Operations/StatusPage.md`](Operations/StatusPage.md). KPI catalog and dashboard guidance: [`kpi.md`](kpi.md).
 
 ## Common failure modes
 
-### `mbproxy.startup.bind.failed` — port in use
-
-**Symptom:** The service starts but one or more PLCs show `listener.state = recovering`.
-
-**Cause:** Another process is bound to the configured `ListenPort`.
-
-**Remediation:**
-
-```powershell
-netstat -ano | findstr : # find PID holding the port
-Get-Process -Id # identify the process
-```
-
-Release the port or change `Plcs[].ListenPort` in `appsettings.json`. The supervisor will retry automatically — watch for `mbproxy.listener.recovered` in the log.
-
-### `mbproxy.listener.recovered` — no action needed
-
-A previously-failing listener successfully bound. The service is self-healing. This is informational.
-
-### `mbproxy.backend.failed` — PLC unreachable
-
-**Symptom:** Upstream clients cannot connect through the proxy, or connections are immediately dropped.
-
-**Cause:** The PLC backend (`Plcs[].Host:Port`) is unreachable — network issue, PLC power cycle, or H2-ECOM100 firmware issue.
-
-**Remediation:** Check network path to the PLC. Verify the PLC Modbus port is responding:
-
-```powershell
-Test-NetConnection -ComputerName -Port 502
-```
-
-Note: the H2-ECOM100 module caps connections at 4 simultaneous TCP clients. If the proxy already has 4 upstream clients connected to one PLC port, a fifth will trigger `mbproxy.backend.failed`.
-
-### `mbproxy.config.reload.rejected` — bad config
-
-**Symptom:** The log shows a rejection event after a file save; the current config is unchanged.
-
-**Cause:** The saved `appsettings.json` has a schema error, duplicate port, or conflicting BCD address.
-
-**Remediation:** Check the log for the joined error list immediately following the rejection event. Fix the JSON and save again.
-
-### `mbproxy.admin.bind.failed` — admin port in use
-
-**Symptom:** The status page is unreachable.
-
-**Cause:** Another process is using `AdminPort`.
-
-**Remediation:** The proxy continues to forward Modbus traffic — only the status page is affected. Change `AdminPort` in `appsettings.json` (hot-reload applies).
-
-### `mbproxy.rewrite.partial_bcd` — client reading half a 32-bit BCD pair
-
-**Symptom:** Warning in the log; the value passes through raw (no rewrite).
-
-**Cause:** The upstream client is reading only one register of a configured 32-bit BCD pair (e.g., quantity = 1 at the low address, or any read at the high address alone). This is almost always a client-side tag-definition bug.
-
-**Remediation:** Verify the client's tag definition specifies quantity = 2 for 32-bit BCD addresses.
-
-### `mbproxy.rewrite.invalid_bcd` — non-BCD value from PLC
-
-**Symptom:** Warning in the log; the value passes through raw.
-
-**Cause:** The PLC returned a register value that contains non-BCD nibbles (e.g., `0xA123` — the nibble `A` is invalid BCD). This usually indicates the ladder program wrote a non-BCD value to a register configured as a BCD tag.
-
-**Remediation:** Investigate the PLC ladder program. The proxy cannot decode non-BCD data — passing it through is safer than guessing.
+The full diagnosis playbook — startup bind conflicts, backend connectivity, hot-reload validation errors, BCD rewrite anomalies, performance and queue-depth issues, response-cache anomalies, and graceful-shutdown problems — is keyed to log events and status counters in [`Operations/Troubleshooting.md`](Operations/Troubleshooting.md). The complete `mbproxy.*` event catalog with levels, properties, and operator implications is in [`Reference/LogEvents.md`](Reference/LogEvents.md).
 
 ## First-install smoke checklist