Joseph Doherty 56eee3c563 mbproxy: initial commit through Phase 9 (TxId multiplexing)
Adds the mbproxy service end-to-end. Phases 00-08 implement the
production-ready single-listener / 1:1-backend transparent Modbus TCP
proxy with bidirectional BCD rewriting for the ~54-PLC DL205/DL260
fleet. Phase 9 replaces the connection layer with a single backend
socket per PLC plus MBAP TxId rewriting, lifting the H2-ECOM100's
4-concurrent-client cap as an operational ceiling.

Phase 9 additions of note:
- PlcMultiplexer + UpstreamPipe + TxIdAllocator + CorrelationMap
- InFlightRequest with IReadOnlyList<InterestedParty> (load-bearing
  for Phase 10 read coalescing — do not collapse to a single field)
- Per-request watchdog: surfaces Modbus exception 0x0B to upstream
  on BackendRequestTimeoutMs, defending against lost responses,
  dead-PLC paths, and pymodbus 3.13.0's concurrent-multiplexed-
  request bug (its ServerRequestHandler.last_pdu state race)
- Status DTO + HTML gain inFlight / maxInFlight / txIdWraps /
  disconnectCascades / queueDepth (Tier 1.6 in docs/kpi.md)

Tests: 263 unit + 38 E2E. Multiplexer correctness under truly
concurrent backend traffic is proved against a stub backend in
PlcMultiplexerTests; MultiplexerE2ETests paces requests so pymodbus
3.13's single-PDU framer stays in known-good mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:49:35 -04:00


mbproxy operations runbook

Day-two operations reference for the mbproxy Windows Service: install, upgrade, configuration, logs, and troubleshooting.

Install

Prerequisites

  • Windows 10 / Server 2019 or later (64-bit).
  • PowerShell 5.1+ run as Administrator (the install script uses #Requires -RunAsAdministrator).
  • The compiled publish output from dotnet publish (see README.md for the exact command).
  • Modbus TCP reachable from the proxy host to the PLCs on port 502.
  • Port 8080 (or whatever AdminPort is set to) available for the status page.

Steps

  1. Publish the binaries on the build machine:

    dotnet publish src/Mbproxy/Mbproxy.csproj -c Release -r win-x64 --self-contained true -o C:\build\mbproxy-publish
    
  2. Copy the publish output to the target server (or run the install script locally if you built on the server).

  3. Open an elevated PowerShell prompt and run the install script:

    .\install\install.ps1 -PublishOutput C:\build\mbproxy-publish -Start
    

    The script:

    • Copies binaries to C:\Program Files\Mbproxy\ (configurable via -InstallPath).
    • Registers the service with sc.exe create.
    • Sets failure-recovery: restart after 60 s on first/second failure, no action on third.
    • Creates %ProgramData%\mbproxy\logs\ and sets ACLs if needed.
    • Copies mbproxy.config.template.json → %ProgramData%\mbproxy\appsettings.json only if no config exists.
    • Registers the Windows Event Log source mbproxy.
    • With -Start, starts the service and waits up to 30 s for RUNNING state.
  4. Edit %ProgramData%\mbproxy\appsettings.json to configure your PLC list and BCD tags. See the template for inline comments on every field.

  5. If you ran the install script without -Start (for example, to edit the config first), start the service now:

    sc.exe start mbproxy
    
  6. Verify the deployment with the First-install smoke checklist below.

Re-running install on an existing installation

The install script is idempotent. Re-running it:

  • Stops the service if running.
  • Overwrites the binaries.
  • Updates the service config via sc.exe config (not sc.exe create).
  • Preserves %ProgramData%\mbproxy\appsettings.json (never overwritten on update).
  • Skips Event Log source creation if already registered.

Upgrade procedure

  1. Publish new binaries on the build machine (same command as install step 1).

  2. Stop the service:

    sc.exe stop mbproxy
    

    Wait for the service to reach STOPPED state — graceful shutdown drains in-flight PDUs (up to Connection.GracefulShutdownTimeoutMs, default 10 s).

  3. Copy new binaries to C:\Program Files\Mbproxy\ (or run install.ps1 -PublishOutput ... to automate steps 2–4):

    Copy-Item -Path C:\build\mbproxy-publish\* -Destination 'C:\Program Files\Mbproxy\' -Recurse -Force
    
  4. Start the service:

    sc.exe start mbproxy
    
  5. Check the status page to confirm the new version:

    Invoke-RestMethod http://localhost:8080/status.json | Select-Object -ExpandProperty service
    

    The version field should show the new build.

Uninstall

.\install\uninstall.ps1

Options:

  • -KeepConfig — preserves %ProgramData%\mbproxy\appsettings.json for re-install.
  • Log files are always archived to %ProgramData%\mbproxy.archived-<timestamp>\logs\ regardless of -KeepConfig. They are never deleted.

Configuration

The service reads %ProgramData%\mbproxy\appsettings.json at startup and watches it for changes while running. Most settings are hot-reloadable; a few require a restart.

Hot-reload vs. restart

| Setting | Behaviour on file save |
|---|---|
| BcdTags.Global add/remove/width | Next PDU uses the new map; in-flight PDUs complete with the old map. |
| Plcs[].BcdTags.{Add,Remove} | Same per-PDU propagation. |
| Plcs[].Name, .Host, or .ListenPort changed | Treated as remove + add: old listener stops, new one starts. |
| New Plcs[] entry | New listener binds immediately (subject to port availability). |
| Plcs[] entry removed | Supervisor stops the listener; all connected clients for that PLC are disconnected. |
| Connection.Backend*TimeoutMs | Next connect/request uses the new value. |
| Connection.GracefulShutdownTimeoutMs | Picked up on the next ApplicationStopping event. |
| AdminPort | Admin endpoint re-binds on the new port; old port released. |
| Invalid reload (schema error, duplicate ports/addresses) | Rejected as a whole. Current in-memory config stays; mbproxy.config.reload.rejected logged at Error. |

For more detail on the hot-reload propagation model, see design.md → "Configuration hot-reload".
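"Next PDU uses the new map; in-flight PDUs complete with the old map" is the behaviour you get when each PDU captures an immutable snapshot of the map and a reload swaps the shared reference. The Python sketch below illustrates that idea. It is not mbproxy's implementation (which is .NET). BcdMapHolder and rewrite_registers are hypothetical names, and the map shape (register address → width in bits) is an assumption.

```python
import threading
from types import MappingProxyType


class BcdMapHolder:
    """Hypothetical holder: publishes a read-only BCD map that a reload swaps atomically."""

    def __init__(self, initial):
        self._reload_lock = threading.Lock()  # serialises concurrent reloads only
        self._current = MappingProxyType(dict(initial))

    def snapshot(self):
        # A PDU grabs the current map once, when it starts, and uses it to completion.
        return self._current

    def reload(self, new_map):
        # Build a fresh read-only map, then replace the reference.
        # Snapshots already held by in-flight PDUs keep pointing at the old map.
        fresh = MappingProxyType(dict(new_map))
        with self._reload_lock:
            self._current = fresh


def rewrite_registers(snapshot, start, registers):
    """Hypothetical per-PDU step: list the addresses the captured map treats as BCD."""
    return [start + i for i in range(len(registers)) if (start + i) in snapshot]
```

Usage: an in-flight PDU that captured its snapshot before the reload does not see address 200. The next PDU does.

```python
holder = BcdMapHolder({100: 16})
in_flight = holder.snapshot()
holder.reload({100: 16, 200: 32})
rewrite_registers(in_flight, 199, [0, 0])          # old map: no BCD addresses in 199..200
rewrite_registers(holder.snapshot(), 199, [0, 0])  # new map: address 200 is BCD
```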

Editing appsettings.json

The service picks up changes automatically, so no restart is needed for the settings in the table above. There are two caveats. Connection.GracefulShutdownTimeoutMs only takes effect at the next service stop, and updating the binaries always requires a restart (see Upgrade procedure).

If a reload is rejected (mbproxy.config.reload.rejected in the log), the service continues running with the previous config. Fix the JSON error and save again — the next valid file write will be accepted.
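A rejected reload is all-or-nothing: one error anywhere keeps the whole previous config. The Python sketch below shows that pattern together with the kinds of checks the runbook names (duplicate ports, conflicting BCD addresses). It is hypothetical. The config shape is simplified to a per-PLC BcdTags dict of address → width (16 or 32), which is not the real Global/Add/Remove schema, and the error strings are invented rather than mbproxy's actual log text.

```python
from collections import Counter


def validate_config(cfg):
    """Return a list of error strings; an empty list means the candidate may be applied.

    Assumed shape: {"AdminPort": int, "Plcs": [{"Name", "ListenPort", "BcdTags": {addr: width}}]}.
    """
    errors = []
    plcs = cfg.get("Plcs", [])

    # Two listeners cannot share a port, and the admin endpoint cannot reuse one either.
    ports = Counter(p.get("ListenPort") for p in plcs)
    for port, count in ports.items():
        if count > 1:
            errors.append(f"ListenPort {port} used by {count} PLC entries")
    if cfg.get("AdminPort") in ports:
        errors.append(f"AdminPort {cfg['AdminPort']} collides with a PLC ListenPort")

    # Within one PLC, a register may belong to at most one BCD tag.
    # A 32-bit tag at address A claims both A and A + 1.
    for p in plcs:
        claimed = {}
        for addr, width in sorted(p.get("BcdTags", {}).items()):
            span = [addr] if width == 16 else [addr, addr + 1]
            for a in span:
                if a in claimed:
                    errors.append(f"{p.get('Name')}: BCD address {a} overlaps tag at {claimed[a]}")
                else:
                    claimed[a] = addr
    return errors


def apply_reload(current, candidate):
    """Keep `current` unless `candidate` is entirely valid; return (config, joined errors or None)."""
    errors = validate_config(candidate)
    if errors:
        return current, "; ".join(errors)
    return candidate, None
```

Joining every error into one message mirrors the runbook's advice to read "the joined error list" after a rejection. The operator then fixes all the problems in a single save rather than one save per problem.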

Logs

Location

Rolling log files live at: C:\ProgramData\mbproxy\logs\mbproxy-<date>.log

One file per day, retained for 30 days by default (controlled by retainedFileCountLimit in the Serilog config section).

Windows Event Log

When running as a Windows Service, the EventLogBridge sink writes events at Error level and above to the Windows Application Event Log under source mbproxy. View with:

Get-EventLog -LogName Application -Source mbproxy -Newest 20

Or open Event Viewer → Windows Logs → Application, filter by source mbproxy.

Log survival after uninstall

uninstall.ps1 never deletes log files. It moves logs\ to a timestamped archive at %ProgramData%\mbproxy.archived-<timestamp>\logs\ so post-crash diagnostics remain accessible.

Status page

URL: http://<proxy-host>:<AdminPort>/

Default port: 8080. Change with Mbproxy.AdminPort in appsettings.json.

Routes:

  • GET / — HTML table, auto-refreshes every 5 s. No external assets.
  • GET /status.json — same data as JSON for monitoring scrapers.

Key fields on /status.json:

| Field | Meaning |
|---|---|
| service.version | Assembly informational version (set at publish time). |
| service.uptimeSeconds | Seconds since service start. |
| service.config.lastReloadUtc | Last accepted hot-reload timestamp. |
| listeners.bound / listeners.configured | Bound count vs. configured PLC count. |
| plcs[].listener.state | bound / recovering / stopped. |
| plcs[].backend.connectsSuccess | Successful backend TCP connects since start. |
| plcs[].backend.connectsFailed | Failed backend connects (all retries exhausted). |
| plcs[].pdus.forwarded | Total PDUs forwarded through this PLC's proxy. |
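For monitoring scrapers, a minimal Python sketch of a /status.json health check built on the fields listed above. It assumes listeners is an object with bound and configured members and that plcs is an array. The URL uses the default AdminPort, and fetch_status and find_problems are hypothetical helper names. The alert conditions are one reasonable choice, not thresholds mbproxy defines.

```python
import json
import urllib.request


def fetch_status(url="http://localhost:8080/status.json", timeout=5.0):
    """Download and parse the status document (default AdminPort assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def find_problems(status):
    """Return human-readable alerts derived from the documented status fields."""
    problems = []
    listeners = status.get("listeners", {})
    bound, configured = listeners.get("bound", 0), listeners.get("configured", 0)
    if bound < configured:
        problems.append(f"only {bound} of {configured} listeners bound")
    for i, plc in enumerate(status.get("plcs", [])):
        state = plc.get("listener", {}).get("state")
        if state != "bound":
            problems.append(f"plcs[{i}] listener state is {state}")
        backend = plc.get("backend", {})
        # Failures with zero successes suggest the PLC has never been reachable.
        if backend.get("connectsFailed", 0) > 0 and backend.get("connectsSuccess", 0) == 0:
            problems.append(f"plcs[{i}] has never connected to its backend")
    return problems
```

Usage: run find_problems(fetch_status()) from the proxy host on a schedule and alert when the returned list is non-empty.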

Common failure modes

mbproxy.startup.bind.failed — port in use

Symptom: The service starts but one or more PLCs show listener.state = recovering.

Cause: Another process is bound to the configured ListenPort.

Remediation:

netstat -ano | findstr :<port>      # find PID holding the port
Get-Process -Id <pid>               # identify the process

Release the port or change Plcs[].ListenPort in appsettings.json. The supervisor will retry automatically — watch for mbproxy.listener.recovered in the log.
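To confirm that a port is actually free before saving a new ListenPort, you can attempt the same kind of bind the listener needs. The Python sketch below does this. It only answers "could a listener bind here right now". It does not identify the owning process (use netstat and Get-Process for that). It assumes the service binds all interfaces (0.0.0.0); results can differ if the service binds a specific address or uses different socket options. port_is_free is a hypothetical helper.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket could bind host:port at this moment."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
        return True
```

The probe socket is closed as soon as the with block exits, so running the check does not itself hold the port.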

mbproxy.listener.recovered — no action needed

A previously-failing listener successfully bound. The service is self-healing. This is informational.

mbproxy.backend.failed — PLC unreachable

Symptom: Upstream clients cannot connect through the proxy, or connections are immediately dropped.

Cause: The PLC backend (Plcs[].Host:Port) is unreachable — network issue, PLC power cycle, or H2-ECOM100 firmware issue.

Remediation: Check network path to the PLC. Verify the PLC Modbus port is responding:

Test-NetConnection -ComputerName <plc-ip> -Port 502

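Note: the H2-ECOM100 module caps connections at 4 simultaneous TCP clients. Since Phase 9, mbproxy holds a single multiplexed backend socket per PLC, so upstream clients no longer consume PLC connection slots. A backend connect can still fail if other clients connected directly to the PLC (bypassing the proxy) already hold all 4 slots.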
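Test-NetConnection only proves that the TCP port accepts a connection. It does not prove that the PLC answers Modbus requests. The Python sketch below goes one step further. It sends a Modbus TCP Read Holding Registers (function 0x03) request framed with the standard MBAP header (transaction id, protocol id 0, length, unit id) and decodes the reply or exception. Exception code 0x0B ("gateway target device failed to respond") is the code the proxy's watchdog surfaces upstream on a backend timeout. Unit id 1 and register address 0 are assumptions; pick a register the PLC actually maps. The helper names are hypothetical.

```python
import socket
import struct


def build_read_holding(tx_id, unit_id, address, count):
    """MBAP header (TxId, protocol 0, length, unit id) + FC 0x03 PDU, big-endian."""
    pdu = struct.pack(">BHH", 0x03, address, count)
    return struct.pack(">HHHB", tx_id, 0, len(pdu) + 1, unit_id) + pdu


def parse_response(frame, expected_tx_id):
    """Return register values; raise on TxId mismatch or a Modbus exception response."""
    tx_id, proto, _length, _unit = struct.unpack(">HHHB", frame[:7])
    if tx_id != expected_tx_id or proto != 0:
        raise ValueError(f"unexpected MBAP header tx={tx_id} proto={proto}")
    fc = frame[7]
    if fc & 0x80:  # exception responses set the high bit of the function code
        raise RuntimeError(f"Modbus exception 0x{frame[8]:02X} for function 0x{fc & 0x7F:02X}")
    byte_count = frame[8]
    return list(struct.unpack(f">{byte_count // 2}H", frame[9:9 + byte_count]))


def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-frame")
        buf += chunk
    return buf


def probe(host, port=502, unit_id=1, address=0, count=1, timeout=3.0):
    """Read `count` holding registers; returning normally means Modbus is answering."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_read_holding(1, unit_id, address, count))
        header = _recv_exact(sock, 7)
        length = struct.unpack(">H", header[4:6])[0]
        body = _recv_exact(sock, length - 1)  # MBAP length includes the unit id already read
        return parse_response(header + body, expected_tx_id=1)
```

Pointing probe() at the PLC's address tests the PLC directly. Pointing it at the proxy host and the PLC's ListenPort tests the full path through mbproxy.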

mbproxy.config.reload.rejected — bad config

Symptom: The log shows a rejection event after a file save; the current config is unchanged.

Cause: The saved appsettings.json has a schema error, duplicate port, or conflicting BCD address.

Remediation: Check the log for the joined error list immediately following the rejection event. Fix the JSON and save again.

mbproxy.admin.bind.failed — admin port in use

Symptom: The status page is unreachable.

Cause: Another process is using AdminPort.

Remediation: The proxy continues to forward Modbus traffic — only the status page is affected. Change AdminPort in appsettings.json (hot-reload applies).

mbproxy.rewrite.partial_bcd — client reading half a 32-bit BCD pair

Symptom: Warning in the log; the value passes through raw (no rewrite).

Cause: The upstream client is reading only one register of a configured 32-bit BCD pair (e.g., quantity = 1 at the low address, or any read at the high address alone). This is almost always a client-side tag-definition bug.

Remediation: Verify the client's tag definition specifies quantity = 2 for 32-bit BCD addresses.
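To see why a given read trips this warning, the Python sketch below checks one read (start address, quantity) against configured 32-bit pairs. Each pair is assumed to occupy its low address and low + 1. A pair is rewritable only when the read covers both registers, and covering exactly one is the partial case. classify_read is a hypothetical helper, not mbproxy code.

```python
def classify_read(start, quantity, pair_lows):
    """Split configured 32-bit BCD pairs into (fully covered, partially covered) by one read.

    pair_lows: low register address of each configured pair; a pair spans (low, low + 1).
    """
    end = start + quantity  # exclusive upper bound of the read
    full, partial = [], []
    for low in sorted(pair_lows):
        covered = [a for a in (low, low + 1) if start <= a < end]
        if len(covered) == 2:
            full.append(low)
        elif len(covered) == 1:
            partial.append(low)  # would log partial_bcd and pass the value through raw
    return full, partial
```

For example, classify_read(100, 1, [100]) flags the pair at 100 as partial. So does a read that starts on the high word, such as classify_read(101, 4, [100, 102]), which fully covers the pair at 102 but only half of the pair at 100.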

mbproxy.rewrite.invalid_bcd — non-BCD value from PLC

Symptom: Warning in the log; the value passes through raw.

Cause: The PLC returned a register value that contains non-BCD nibbles (e.g., 0xA123 — the nibble A is invalid BCD). This usually indicates the ladder program wrote a non-BCD value to a register configured as a BCD tag.

Remediation: Investigate the PLC ladder program. The proxy cannot decode non-BCD data — passing it through is safer than guessing.
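The nibble check behind this warning is simple. Each 16-bit register holds four BCD digits, and any nibble above 9 makes the whole value undecodable. The Python sketch below shows decode with validation, an 8-digit decode for a 32-bit pair, and the encode direction a write would need. It assumes the low register holds the four least-significant digits. That word order is an assumption to verify against the DL205/DL260 documentation, and the function names are hypothetical.

```python
def decode_bcd16(word):
    """Decode four BCD digits from one register; return None if any nibble is above 9."""
    value = 0
    for shift in (12, 8, 4, 0):
        nibble = (word >> shift) & 0xF
        if nibble > 9:
            return None  # e.g. the A in 0xA123: caller should pass the raw value through
        value = value * 10 + nibble
    return value


def decode_bcd32(low_word, high_word):
    """Decode an 8-digit pair, ASSUMING the low register holds the least-significant digits."""
    low, high = decode_bcd16(low_word), decode_bcd16(high_word)
    if low is None or high is None:
        return None
    return high * 10_000 + low


def encode_bcd16(value):
    """Encode 0..9999 as four BCD digits (the write direction)."""
    if not 0 <= value <= 9999:
        raise ValueError("value does not fit in four BCD digits")
    word = 0
    for shift in (0, 4, 8, 12):
        word |= (value % 10) << shift
        value //= 10
    return word
```

Returning None instead of guessing matches the runbook's policy: a register with an invalid nibble is surfaced as-is rather than turned into a plausible but wrong decimal.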

First-install smoke checklist

Run these commands after install.ps1 -Start to verify the deployment:

# 1. Service is running
Get-Service mbproxy | Select-Object Status, DisplayName

# 2. Status page is reachable
Invoke-WebRequest http://localhost:8080/ -UseBasicParsing | Select-Object StatusCode

# 3. JSON endpoint returns expected fields
$status = Invoke-RestMethod http://localhost:8080/status.json
$status.service | Select-Object version, uptimeSeconds
$status.listeners

# 4. Log file exists and is recent
Get-Item "C:\ProgramData\mbproxy\logs\mbproxy-*.log" | Sort-Object LastWriteTime -Descending | Select-Object -First 1

# 5. No Error events in the Event Log (no output means pass; "No matches found" is suppressed)
Get-EventLog -LogName Application -Source mbproxy -EntryType Error -Newest 5 -ErrorAction SilentlyContinue

# 6. Stop the service cleanly (graceful shutdown within 10 s)
$sw = [System.Diagnostics.Stopwatch]::StartNew()
sc.exe stop mbproxy
$deadline = [DateTime]::UtcNow.AddSeconds(15)
do { Start-Sleep -Milliseconds 250 } until ((Get-Service mbproxy).Status -eq 'Stopped' -or [DateTime]::UtcNow -gt $deadline)
$sw.Stop()
Write-Host "Stop elapsed: $($sw.ElapsedMilliseconds) ms"
(Get-Service mbproxy).Status   # Should be Stopped

Note: This checklist documents the expected steps. It was not executed on a dedicated clean VM (the proxy was developed and unit/E2E tested in-process). Run this checklist on first deployment to a production host.