# Phase 08 — Windows service hardening

Install / uninstall scripts, graceful shutdown, Windows Event Log integration, and the public-facing `README.md` that the root `wwtools/CLAUDE.md` index points at. This is the "ship it" phase.

**Depends on:** Phase 04 (rewriter), Phase 07 (status page).

**Parallel-safe with:** nothing.

## Goal

After this phase, an operator can:

1. `dotnet publish` the service into a self-contained folder.
2. Run `install.ps1` to register it as a Windows service.
3. See it appear in `services.msc` running as `Local System` (default — overridable to a managed service account).
4. Stop it cleanly via `sc.exe stop mbproxy`; the service finishes all in-flight PDUs and exits within 10 s.
5. Read crash reasons from the Windows Event Log alongside the Serilog rolling-file output.
6. Read [`../../mbproxy/README.md`](../../mbproxy/README.md) to figure all of this out without needing to talk to a developer.

## Outputs

```
mbproxy/README.md                                          # tool-level human entry point (per DOCS-GUIDE Layer 2)
mbproxy/install/install.ps1                                # registers the service
mbproxy/install/uninstall.ps1                              # removes it
mbproxy/install/mbproxy.config.template.json               # commented appsettings.json for ops
mbproxy/docs/operations.md                                 # ops runbook (install, upgrade, troubleshooting)
src/Mbproxy/Diagnostics/ShutdownCoordinator.cs             # graceful-shutdown helper
src/Mbproxy/Diagnostics/EventLogBridge.cs                  # logs critical events to Windows Event Log
tests/Mbproxy.Tests/Diagnostics/ShutdownCoordinatorTests.cs
```

Modifications:

- `src/Mbproxy/Program.cs` — wire `ShutdownCoordinator` into the host-stop signal. Wire `EventLogBridge` as a Serilog sub-sink for events at Error and above when running under Windows Service (`WindowsServiceHelpers.IsWindowsService()` true).
- `mbproxy/Mbproxy.csproj` — `true` and `true` for the publish profile.
- `../CLAUDE.md` (the root `wwtools/CLAUDE.md`) — update the `mbproxy` index row to point at the new `mbproxy/README.md` (per the maintenance note in `mbproxy/CLAUDE.md`).
- `mbproxy/CLAUDE.md` — update the "Current state" section to reflect the post-implementation state (no longer "no code yet"), and the Maintenance section to note that the README is now the canonical human entry point.

## Tasks

1. **`mbproxy/README.md`** — follows the DOCS-GUIDE Layer-2 template exactly. Required sections in order: one-sentence identification, hard constraints / prerequisites, layout, resource index, build & run, install. Cross-link to `docs/design.md`, `docs/plan/README.md`, `docs/operations.md`, `CLAUDE.md`. No deep prose tutorials; the README routes.
2. **`mbproxy/install/install.ps1`** — parameters: `-InstallPath ` (default `C:\Program Files\Mbproxy`), `-ServiceName ` (default `mbproxy`), `-DisplayName `, `-Account ` (default `LocalSystem`). Behaviour:
   - Verifies admin rights; fails with a clear message if not elevated.
   - Copies the publish output (passed via `-PublishOutput `) to `InstallPath`.
   - Runs `sc.exe create binPath= "\Mbproxy.exe" start= auto displayName= "" obj= `.
   - Sets the failure-action policy: restart after 60 s on first/second failure, no restart on subsequent (`sc.exe failure ...`).
   - Creates `%ProgramData%\mbproxy\logs\` with appropriate ACLs.
   - Copies `mbproxy.config.template.json` to `%ProgramData%\mbproxy\appsettings.json` if no config exists.
   - Optionally starts the service if `-Start` flag is passed.
3. **`mbproxy/install/uninstall.ps1`** — stops the service if running, `sc.exe delete `, removes `InstallPath` (with `-KeepConfig` flag to preserve `%ProgramData%\mbproxy\appsettings.json`).
4. **`mbproxy/install/mbproxy.config.template.json`** — a fully commented `appsettings.json` showing the full schema with example values and inline `//` comments describing every field.
   (Use `appsettings.jsonc` semantics; .NET's configuration loader tolerates `//` comments when configured to.)
5. **`ShutdownCoordinator.cs`** — orchestrates graceful shutdown on `IHostApplicationLifetime.ApplicationStopping`:
   - Stop accepting new upstream connections on all `PlcListenerSupervisor`s.
   - Wait for in-flight PDUs to complete with a `10 s` deadline (configurable via `Connection.GracefulShutdownTimeoutMs`, default 10000).
   - Stop the admin endpoint.
   - Cancel all remaining work. Log `mbproxy.shutdown.complete` with `InFlightAtCancel` count.
6. **`EventLogBridge.cs`** — adds a Serilog sub-sink that writes events with level >= Error to the Windows Event Log under source `mbproxy`. Only enabled when running as a Windows Service. The install script creates the event source.
7. **`mbproxy/docs/operations.md`** — operations runbook:
   - Install / uninstall steps (mirror to `README.md`).
   - Upgrade procedure (stop service, copy new binaries, start).
   - Where logs live, how to roll them, retention defaults.
   - Common failure modes (port already in use, PLC unreachable, BCD validation reject) with the relevant log event names and what to check.
   - The `services.msc` / `sc.exe` / `Get-Service` commands operators will actually use.
   - How to safely edit `appsettings.json` for hot-reload (with the rejection-keeps-old-config promise).

## Public surface declared in this phase

```csharp
namespace Mbproxy.Diagnostics;

internal sealed class ShutdownCoordinator
{
    public Task ShutdownAsync(int timeoutMs, CancellationToken hostCt);
}

internal sealed class EventLogBridge { /* Serilog sub-sink */ }
```

No additional public types are needed; all surfaces from previous phases remain stable.

## Tests required

### Unit (`Category = Unit`)

`ShutdownCoordinatorTests` (≥ 4 tests):

1. `Shutdown_NoActiveConnections_CompletesImmediately`
2. `Shutdown_OneActiveConnection_WaitsForCompletion`
3. `Shutdown_TimeoutExceeded_CancelsRemainingWork_AndReportsCount`
4. `Shutdown_AdminEndpointStopped_AfterListenersStopped` — ordering test.

### E2E (`Category = E2E`)

`ShutdownE2ETests` (≥ 2 tests, against simulator):

1. `E2E_StopHost_WithConnectedClient_DrainsCleanlyWithin10s` — start host, connect NModbus, issue 5 back-to-back FC03 reads, signal host stop, assert all 5 complete and the client's TCP socket is closed cleanly.
2. `E2E_StopHost_DuringInFlightRequest_CancelsAfterTimeout` — same but with a `Connection.BackendRequestTimeoutMs` that exceeds the shutdown deadline; assert shutdown completes within the deadline and the in-flight request was cancelled.

### Manual / smoke

- Install the service via `install.ps1` on a clean test VM; confirm it appears in `services.msc` with `Local System` identity.
- `sc.exe start mbproxy` — service starts, admin endpoint at `http://localhost:8080/` shows the proxy is up.
- Send `sc.exe stop mbproxy` — service stops within 10 s.
- Trigger a crash by killing the process with Task Manager, then confirm an entry appears in Windows Event Log under source `mbproxy`. (Corrupting `appsettings.json` while running and reloading is not a usable trigger: the bad config is rejected gracefully.)
- `uninstall.ps1` — service removed cleanly; `%ProgramData%\mbproxy\` is preserved only when `-KeepConfig` is passed.

The manual smoke results go into `docs/operations.md` as a "first install" verification checklist.

## Phase gate

- [ ] Zero-warnings build.
- [ ] All phase 00–07 tests still green.
- [ ] All new unit tests green.
- [ ] All e2e shutdown tests green.
- [ ] `mbproxy/README.md` exists, follows the DOCS-GUIDE Layer-2 template, and routes into deep docs without duplicating their content.
- [ ] Root `wwtools/CLAUDE.md` index row for `mbproxy` points at `mbproxy/README.md` (was previously pointing into the design plan or the bare folder).
- [ ] `install.ps1` and `uninstall.ps1` are idempotent — re-running install when the service already exists is a clean no-op or update, not a hard error.
- [ ] Windows Event Log source is created during install and removed during uninstall.
- [ ] `dotnet publish src/Mbproxy/Mbproxy.csproj -c Release -r win-x64 --self-contained true /p:PublishSingleFile=true` produces a single executable under 50 MB.
- [ ] Manual smoke checklist in `docs/operations.md` has been executed on at least one fresh VM and the result documented.

## Out of scope

- Linux / Docker packaging. The design fixes Windows Service as the deployment target.
- Centralised log aggregation (Splunk forwarder config, Elastic agent, etc.). Document where the logs are; let ops integrate.
- A signed installer (MSI / setup.exe). PowerShell-driven install is the contract; an MSI can be added later if procurement demands it.
- Metric exposition for Prometheus / OpenTelemetry. The status page's `/status.json` is sufficient for the operational needs declared in the design.

## Notes for the subagent

- The Windows Event Log source creation requires admin rights — that's already a precondition for `install.ps1`. Do not try to create the source at runtime from the service itself (it would fail when the service runs as a non-admin account).
- Single-file publish makes `Assembly.GetExecutingAssembly().Location` empty. If `AssemblyVersionAccessor` (phase 07) used that, swap to `Assembly.GetExecutingAssembly().GetCustomAttribute()`.
- The `mbproxy/README.md` is what an operator reads first. Be ruthless about length — aim for under 100 lines. The DOCS-GUIDE says routes, not tutorials.
- After this phase merges, the project is feature-complete against [`../design.md`](../design.md). Any further work belongs in a NEW design revision (dated, in the same doc) and a new phase plan.
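
The single-file publish note above can be illustrated with a minimal sketch. It rests on two assumptions: the attribute type is taken to be `AssemblyInformationalVersionAttribute` (the plan does not name it), and the class shape shown for `AssemblyVersionAccessor` is hypothetical, since the phase 07 type is not reproduced in this document.

```csharp
#nullable enable
using System;
using System.Reflection;

// Hypothetical stand-in for the phase 07 AssemblyVersionAccessor;
// the real type's members are not shown in this plan.
internal static class AssemblyVersionAccessor
{
    public static string GetVersion()
    {
        Assembly assembly = Assembly.GetExecutingAssembly();

        // Attribute metadata lives inside the assembly itself, so it remains
        // readable in a single-file publish, where Assembly.Location is "".
        // Assumption: the informational-version attribute is the one intended.
        string? informational = assembly
            .GetCustomAttribute<AssemblyInformationalVersionAttribute>()
            ?.InformationalVersion;

        // Fall back to the plain assembly version if the attribute is absent.
        return informational
            ?? assembly.GetName().Version?.ToString()
            ?? "unknown";
    }
}
```

Nothing here touches the file system, which is the point: code that derived the version from the path in `Assembly.Location` would break under `PublishSingleFile`, while a metadata lookup behaves the same in both publish modes.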