Phase 08 — Windows service hardening
Install / uninstall scripts, graceful shutdown, Windows Event Log integration, and the public-facing README.md that the root wwtools/CLAUDE.md index points at. This is the "ship it" phase.
Depends on: Phase 04 (rewriter), Phase 07 (status page). Parallel-safe with: nothing.
Goal
After this phase, an operator can:
- `dotnet publish` the service into a self-contained folder.
- Run `install.ps1` to register it as a Windows service.
- See it appear in `services.msc` running as `Local System` (the default; overridable to a managed service account).
- Stop it cleanly via `sc.exe stop mbproxy`; the service finishes all in-flight PDUs and exits within 10 s.
- Read crash reasons from the Windows Event Log alongside the Serilog rolling-file output.
- Read `../../mbproxy/README.md` to figure all of this out without needing to talk to a developer.
Outputs
mbproxy/README.md # tool-level human entry point (per DOCS-GUIDE Layer 2)
mbproxy/install/install.ps1 # registers the service
mbproxy/install/uninstall.ps1 # removes it
mbproxy/install/mbproxy.config.template.json # commented appsettings.json for ops
mbproxy/docs/operations.md # ops runbook (install, upgrade, troubleshooting)
src/Mbproxy/Diagnostics/ShutdownCoordinator.cs # graceful-shutdown helper
src/Mbproxy/Diagnostics/EventLogBridge.cs # logs critical events to Windows Event Log
tests/Mbproxy.Tests/Diagnostics/ShutdownCoordinatorTests.cs
Modifications:
- `src/Mbproxy/Program.cs`: wire `ShutdownCoordinator` into the host-stop signal. Wire `EventLogBridge` as a Serilog sub-sink for events at Error and above when running under Windows Service (`WindowsServiceHelpers.IsWindowsService()` true).
- `mbproxy/Mbproxy.csproj`: `<PublishSingleFile>true</PublishSingleFile>` and `<SelfContained>true</SelfContained>` for the publish profile.
- `../CLAUDE.md` (the root `wwtools/CLAUDE.md`): update the `mbproxy` index row to point at the new `mbproxy/README.md` (per the maintenance note in `mbproxy/CLAUDE.md`).
- `mbproxy/CLAUDE.md`: update the "Current state" section to reflect the post-implementation state (no longer "no code yet"), and the Maintenance section to note that the README is now the canonical human entry point.
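The `Program.cs` wiring described above could look roughly like the following. This is a sketch under assumptions, not the project's `EventLogBridge`: it assumes the `Serilog`, `Serilog.Sinks.EventLog`, and `Microsoft.Extensions.Hosting.WindowsServices` packages are referenced, and `LoggingSetupSketch` is a hypothetical helper name.

```csharp
using Microsoft.Extensions.Hosting.WindowsServices;
using Serilog;
using Serilog.Events;

// Hypothetical helper; the real wiring lives in Program.cs / EventLogBridge.
public static class LoggingSetupSketch
{
    public static LoggerConfiguration AddEventLogWhenService(LoggerConfiguration config)
    {
        // Only attach the Event Log sink when hosted by the Service Control Manager.
        if (WindowsServiceHelpers.IsWindowsService())
        {
            config = config.WriteTo.EventLog(
                source: "mbproxy",
                // The install script owns the event source, so do not create it here.
                manageEventSource: false,
                // Error and above only; lower levels stay in the rolling file.
                restrictedToMinimumLevel: LogEventLevel.Error);
        }

        return config;
    }
}
```

Keeping `manageEventSource` off matches the plan's rule that the source is created at install time, because a service running under a non-admin account cannot create it.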
Tasks
- `mbproxy/README.md`: follows the DOCS-GUIDE Layer-2 template exactly. Required sections in order: one-sentence identification, hard constraints / prerequisites, layout, resource index, build & run, install. Cross-link to `docs/design.md`, `docs/plan/README.md`, `docs/operations.md`, `CLAUDE.md`. No deep prose tutorials; the README routes.
- `mbproxy/install/install.ps1`: parameters are `-InstallPath <path>` (default `C:\Program Files\Mbproxy`), `-ServiceName <name>` (default `mbproxy`), `-DisplayName <text>`, `-Account <managed-service-account>` (default `LocalSystem`). Behaviour:
  - Verifies admin rights; fails with a clear message if not elevated.
  - Copies the publish output (passed via `-PublishOutput <path>`) to `InstallPath`.
  - Runs `sc.exe create <ServiceName> binPath= "<InstallPath>\Mbproxy.exe" start= auto displayName= "<DisplayName>" obj= <Account>`.
  - Sets the failure-action policy: restart after 60 s on first/second failure, no restart on subsequent (`sc.exe failure ...`).
  - Creates `%ProgramData%\mbproxy\logs\` with appropriate ACLs.
  - Copies `mbproxy.config.template.json` to `%ProgramData%\mbproxy\appsettings.json` if no config exists.
  - Optionally starts the service if the `-Start` flag is passed.
- `mbproxy/install/uninstall.ps1`: stops the service if running, runs `sc.exe delete <ServiceName>`, removes `InstallPath` (with a `-KeepConfig` flag to preserve `%ProgramData%\mbproxy\appsettings.json`).
- `mbproxy/install/mbproxy.config.template.json`: a fully commented `appsettings.json` showing the full schema with example values and inline `//` comments describing every field. (Use `appsettings.jsonc` semantics; .NET's configuration loader tolerates `//` comments when configured to.)
- `ShutdownCoordinator.cs`: orchestrates graceful shutdown on `IHostApplicationLifetime.ApplicationStopping`:
  - Stop accepting new upstream connections on all `PlcListenerSupervisor`s.
  - Wait for in-flight PDUs to complete with a `10 s` deadline (configurable via `Connection.GracefulShutdownTimeoutMs`, default 10000).
  - Stop the admin endpoint.
  - Cancel all remaining work. Log `mbproxy.shutdown.complete` with `InFlightAtCancel` count.
- `EventLogBridge.cs`: adds a Serilog sub-sink that writes events with level >= Error to the Windows Event Log under source `mbproxy`. Only enabled when running as a Windows Service. The install script creates the event source.
- `mbproxy/docs/operations.md`: operations runbook:
  - Install / uninstall steps (mirror to `README.md`).
  - Upgrade procedure (stop service, copy new binaries, start).
  - Where logs live, how to roll them, retention defaults.
  - Common failure modes (port already in use, PLC unreachable, BCD validation reject) with the relevant log event names and what to check.
  - The `services.msc` / `sc.exe` / `Get-Service` commands operators will actually use.
  - How to safely edit `appsettings.json` for hot-reload (with the rejection-keeps-old-config promise).
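The comment tolerance that the `mbproxy.config.template.json` task relies on can be illustrated at the parser level. The sketch below uses `System.Text.Json` directly rather than the configuration loader. `CommentedConfigSketch` and the sample document are hypothetical; only the `Connection.GracefulShutdownTimeoutMs` key and its default of 10000 come from this plan.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper showing that // comments parse once comment handling is set to Skip.
public static class CommentedConfigSketch
{
    // Illustrative fragment only; the real template documents the full schema.
    public const string Sample = @"{
  // Upper bound for draining in-flight PDUs at service stop, in milliseconds.
  ""Connection"": {
    ""GracefulShutdownTimeoutMs"": 10000 // default per this plan
  }
}";

    public static int ReadGracefulShutdownTimeoutMs(string json)
    {
        var options = new JsonDocumentOptions
        {
            // Without this, the parser rejects // comments with a JsonException.
            CommentHandling = JsonCommentHandling.Skip,
        };

        using JsonDocument doc = JsonDocument.Parse(json, options);
        return doc.RootElement
            .GetProperty("Connection")
            .GetProperty("GracefulShutdownTimeoutMs")
            .GetInt32();
    }
}
```

Calling `CommentedConfigSketch.ReadGracefulShutdownTimeoutMs(CommentedConfigSketch.Sample)` returns the configured timeout; parsing the same text with default `JsonDocumentOptions` throws instead.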
Public surface declared in this phase
namespace Mbproxy.Diagnostics;
internal sealed class ShutdownCoordinator {
public Task ShutdownAsync(int timeoutMs, CancellationToken hostCt);
}
internal sealed class EventLogBridge { /* Serilog sub-sink */ }
No additional public types are needed; all surfaces from previous phases remain stable.
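The shutdown sequence behind `ShutdownAsync` can be sketched as follows. This is a minimal illustration, not the actual implementation: the delegate parameters are hypothetical seams standing in for the listener supervisors, in-flight tracking, and admin endpoint from earlier phases, and `Console.WriteLine` stands in for the Serilog event.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: every constructor parameter is a hypothetical seam.
public sealed class ShutdownCoordinatorSketch
{
    private readonly Func<Task> _stopListeners;
    private readonly Func<CancellationToken, Task> _waitForDrain;
    private readonly Func<int> _inFlightCount;
    private readonly Func<Task> _stopAdmin;
    private readonly Action _cancelRemaining;

    public ShutdownCoordinatorSketch(
        Func<Task> stopListeners,
        Func<CancellationToken, Task> waitForDrain,
        Func<int> inFlightCount,
        Func<Task> stopAdmin,
        Action cancelRemaining)
    {
        _stopListeners = stopListeners;
        _waitForDrain = waitForDrain;
        _inFlightCount = inFlightCount;
        _stopAdmin = stopAdmin;
        _cancelRemaining = cancelRemaining;
    }

    // Requests still in flight at the moment remaining work was cancelled.
    public int InFlightAtCancel { get; private set; }

    public async Task ShutdownAsync(int timeoutMs, CancellationToken hostCt)
    {
        // Stop accepting new upstream connections first.
        await _stopListeners();

        // Wait for in-flight PDUs, bounded by the deadline and the host token.
        using var deadline = CancellationTokenSource.CreateLinkedTokenSource(hostCt);
        deadline.CancelAfter(timeoutMs);
        try
        {
            await _waitForDrain(deadline.Token);
        }
        catch (OperationCanceledException)
        {
            // Deadline reached or host aborted: fall through and cancel what is left.
        }

        // The admin endpoint stops only after the listeners have stopped.
        await _stopAdmin();

        // Record the count, cancel remaining work, and emit the completion event.
        InFlightAtCancel = _inFlightCount();
        _cancelRemaining();
        Console.WriteLine($"mbproxy.shutdown.complete InFlightAtCancel={InFlightAtCancel}");
    }
}
```

Passing the collaborators as delegates keeps the ordering and timeout behaviour testable without sockets, which is what the unit tests in the next section exercise.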
Tests required
Unit (Category = Unit)
ShutdownCoordinatorTests (≥ 4 tests):
- `Shutdown_NoActiveConnections_CompletesImmediately`
- `Shutdown_OneActiveConnection_WaitsForCompletion`
- `Shutdown_TimeoutExceeded_CancelsRemainingWork_AndReportsCount`
- `Shutdown_AdminEndpointStopped_AfterListenersStopped` (ordering test).
E2E (Category = E2E)
ShutdownE2ETests (≥ 2 tests, against simulator):
- `E2E_StopHost_WithConnectedClient_DrainsCleanlyWithin10s`: start host, connect NModbus, issue 5 back-to-back FC03 reads, signal host stop, assert all 5 complete and the client's TCP socket is closed cleanly.
- `E2E_StopHost_DuringInFlightRequest_CancelsAfterTimeout`: same, but with a `Connection.BackendRequestTimeoutMs` that exceeds the shutdown deadline; assert shutdown completes within the deadline and the in-flight request was cancelled.
Manual / smoke
- Install the service via `install.ps1` on a clean test VM; confirm it appears in `services.msc` with `Local System` identity.
- `sc.exe start mbproxy`: service starts, admin endpoint at `http://localhost:8080/` shows the proxy is up.
- Send `sc.exe stop mbproxy`: service stops within 10 s.
- Trigger a crash by killing the process with Task Manager (corrupting `appsettings.json` while running and reloading does not work, because the bad config is rejected gracefully); confirm an entry appears in Windows Event Log under source `mbproxy`.
- `uninstall.ps1`: service removed cleanly; `%ProgramData%\mbproxy\` is preserved only if `-KeepConfig` was passed.
The manual smoke results go into docs/operations.md as a "first install" verification checklist.
Phase gate
- Zero-warnings build.
- All phase 00–07 tests still green.
- All new unit tests green.
- All e2e shutdown tests green.
- `mbproxy/README.md` exists, follows the DOCS-GUIDE Layer-2 template, and routes into deep docs without duplicating their content.
- Root `wwtools/CLAUDE.md` index row for `mbproxy` points at `mbproxy/README.md` (was previously pointing into the design plan or the bare folder).
- `install.ps1` and `uninstall.ps1` are idempotent: re-running install when the service already exists is a clean no-op or update, not a hard error.
- Windows Event Log source is created during install and removed during uninstall.
- `dotnet publish src/Mbproxy/Mbproxy.csproj -c Release -r win-x64 --self-contained true /p:PublishSingleFile=true` produces a single executable under 50 MB.
- Manual smoke checklist in `docs/operations.md` has been executed on at least one fresh VM and the result documented.
Out of scope
- Linux / Docker packaging. The design fixes Windows Service as the deployment target.
- Centralised log aggregation (Splunk forwarder config, Elastic agent, etc.). Document where the logs are; let ops integrate.
- A signed installer (MSI / setup.exe). PowerShell-driven install is the contract; an MSI can be added later if procurement demands it.
- Metric exposition for Prometheus / OpenTelemetry. The status page's `/status.json` is sufficient for the operational needs declared in the design.
Notes for the subagent
- The Windows Event Log source creation requires admin rights; that's already a precondition for `install.ps1`. Do not try to create the source at runtime from the service itself (it would fail when the service runs as a non-admin account).
- Single-file publish makes `Assembly.GetExecutingAssembly().Location` empty. If `AssemblyVersionAccessor` (phase 07) used that, swap to `Assembly.GetExecutingAssembly().GetCustomAttribute<AssemblyInformationalVersionAttribute>()`.
- The `mbproxy/README.md` is what an operator reads first. Be ruthless about length; aim for under 100 lines. The DOCS-GUIDE says routes, not tutorials.
- After this phase merges, the project is feature-complete against `../design.md`. Any further work belongs in a NEW design revision (dated, in the same doc) and a new phase plan.
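For the single-file note above, a version lookup that does not depend on `Assembly.Location` might look like the sketch below. `AssemblyVersionSketch` is a hypothetical name; how the phase 07 `AssemblyVersionAccessor` is actually structured is not shown in this plan, and the fallback chain is an assumption.

```csharp
using System.Reflection;

// Hypothetical stand-in for the phase 07 accessor; reads metadata, not the file path.
public static class AssemblyVersionSketch
{
    public static string GetInformationalVersion(Assembly assembly)
    {
        // The attribute is embedded in assembly metadata, so it survives single-file publish.
        var attribute = assembly.GetCustomAttribute<AssemblyInformationalVersionAttribute>();

        // Fall back to the assembly version, then to a fixed marker (assumed policy).
        return attribute?.InformationalVersion
            ?? assembly.GetName().Version?.ToString()
            ?? "unknown";
    }
}
```

A caller would typically pass `Assembly.GetExecutingAssembly()` or `typeof(SomeType).Assembly`.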