f2c6669444
Eight area-focused reviews (BCD rewriter, multiplexer, response cache, supervisor + hot-reload, admin + diagnostics, hosting + options, test suite) plus an Overview that prioritises findings across areas, and a RemediationPlan that groups the work into three waves with per-item file:line citations and regression-test sketches. Findings call out: hot-reload tag-list/cache changes that don't reach the running multiplexer, a coalescing factory leak that hangs late attachers, backend-reader head-of-line block on a wedged upstream, stranded outbound frames after cascade, and ShutdownCoordinator double-stop ordering. Plus the unconventional 32-bit BCD wire format (two base-10000 digits in CDAB, not standard binary), unreachable BcdValidationError.DuplicateAddress, mbproxy.cache.flushed event that's defined but never emitted, and missing test coverage for Cache.AllowLongTtl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
58 lines
9.9 KiB
Markdown
58 lines
9.9 KiB
Markdown
# Hosting / DI / Options / Windows Service Audit
|
|
|
|
Scope: `Program.cs`, `HostingExtensions.cs`, `Mbproxy.csproj`, `appsettings.json`, `Options/*.cs`, `install/`, `Mbproxy.slnx`. Cross-checked against `docs/design.md` (Configuration, Logging, Startup posture), `docs/Architecture/Overview.md`, `docs/Operations/Configuration.md`. Tests under `tests/Mbproxy.Tests/Options/MbproxyOptionsBindingTests.cs` + `tests/Mbproxy.Tests/HostSmokeTests.cs`.
|
|
|
|
## Summary
|
|
|
|
- The host pipeline is small, idiomatic for .NET 10 `Host.CreateApplicationBuilder`, and the Windows Service / console-fallback story is correctly handled via `AddWindowsService()` + `WindowsServiceHelpers.IsWindowsService()`.
|
|
- Hot-reload posture is mostly clean (`IOptionsMonitor<MbproxyOptions>` everywhere it matters), with one regression: `ShutdownCoordinator` consumes `IOptions<MbproxyOptions>` and will freeze `GracefulShutdownTimeoutMs` at startup-time value.
|
|
- `appsettings.json` shipped in `src/Mbproxy/` is an empty template (no PLCs, no BCD tags, no `Cache` section). The richer working example lives only in `install/mbproxy.config.template.json` — a maintenance-time hazard.
|
|
- Single-file self-contained publish is wired correctly for net10.0; package versions are pinned and GA (no preview/RC strings present).
|
|
|
|
## Critical findings
|
|
|
|
None. The service starts, reaches `mbproxy.startup.ready`, and shuts down cleanly per `HostSmokeTests`.
|
|
|
|
## Major findings
|
|
|
|
- **`ShutdownCoordinator` uses `IOptions<MbproxyOptions>`, defeating hot-reload of `GracefulShutdownTimeoutMs`.** `src/Mbproxy/Diagnostics/ShutdownCoordinator.cs:93,102,118` and the DI factory at `src/Mbproxy/Program.cs:42`. `IOptions<T>` is a singleton snapshot: any operator who edits `Connection.GracefulShutdownTimeoutMs` after start will silently see the original value at shutdown. Per `docs/Operations/Configuration.md:160` "`GracefulShutdownTimeoutMs` is sampled only at shutdown" — so the contract is "current config wins at stop time". Switch to `IOptionsMonitor<MbproxyOptions>` and read `_options.CurrentValue.Connection.GracefulShutdownTimeoutMs` inside `ShutdownAsync`.
|
|
|
|
- **`appsettings.json` shipped with the binary is a near-empty stub.** `src/Mbproxy/appsettings.json:1` ships `BcdTags.Global = []`, `Plcs = []`, no `Cache` section, and no example tag. The fully-commented working example lives only in `install/mbproxy.config.template.json`. Copy-on-build (`Mbproxy.csproj:52`) will publish the empty stub next to `Mbproxy.exe`, and `install.ps1:99-105` then copies *that* into `InstallPath`. The Production binary therefore carries an `appsettings.json` that only the install script's separate copy of the template will ever fix at `%ProgramData%\mbproxy\…`. A fresh `Mbproxy.exe` run from the publish folder yields a no-op service. Recommend either marking `src/Mbproxy/appsettings.json` `CopyToOutputDirectory=Never` so the install template is the single source of truth, or making the shipped file mirror the template.
|
|
|
|
- **`ConfigReconciler.Attach` happens between supervisor construction and `StartAsync` — not "BEFORE accepting hot-reload events".** `src/Mbproxy/Proxy/ProxyWorker.cs:153`. The comment claims this is safe because the channel "only enqueues" and apply runs async. That is true, but it does mean a config save that lands during the very early window between `host.RunAsync()` and `_reconciler.Attach` is observed by `IOptionsMonitor.OnChange` before the reconciler has its supervisor map. Whether that is benign depends on `ConfigReconciler` internals (it does buffer); worth a quick review to confirm the buffer is unbounded or that the missed event is replayed from `_options.CurrentValue` after `Attach`.
|
|
|
|
- **`IHostApplicationLifetime.ApplicationStopping` callback runs `coordinator.ShutdownAsync().GetAwaiter().GetResult()` synchronously on the lifetime thread.** `src/Mbproxy/Program.cs:60-66`. The comment acknowledges async isn't supported and notes the coordinator manages its own deadline, which is correct. But the drain loop in `ShutdownCoordinator.ShutdownAsync` (line 167) `Task.Delay(10, drainCts.Token)` ContinueWith chains will resume on the threadpool — fine — yet the synchronous `GetResult` blocks one IHost lifetime thread for up to `GracefulShutdownTimeoutMs + ~7 s` (5 s stop + drain + 2 s admin). That is by design, but worth documenting as the reason the `GracefulShutdownTimeoutMs` default (10 s) is bounded well below the SCM 30 s wait hint.
|
|
|
|
## Minor findings
|
|
|
|
- **Bare-config-only Serilog wiring; the design's "Event Log sink for Error+ in service mode" is satisfied via a custom sink, not the standard `Serilog.Sinks.EventLog` package.** `src/Mbproxy/HostingExtensions.cs:79-84` + `src/Mbproxy/Diagnostics/EventLogBridge.cs:25`. This is intentional (the bridge silently no-ops if the source isn't registered, avoiding admin-required `EventLog.CreateEventSource` at startup). Just call this out in `docs/Operations/Configuration.md` so reviewers don't add a duplicate sink.
|
|
|
|
- **`ProxyWorker` registered as singleton + hosted service via the `sp.GetRequiredService` factory pattern.** `src/Mbproxy/Program.cs:29-30` and the same trick at `src/Mbproxy/HostingExtensions.cs:57-58` for `AdminEndpointHost`. The pattern is correct and idiomatic; just notice that `ProxyWorker` holds a private mutable dictionary `_supervisors` (line 41) — singleton lifetime is correct because that dict is per-process, not per-PLC, and `Supervisors` is exposed as `IReadOnlyDictionary` for the status page.
|
|
|
|
- **`MbproxyOptions.Plcs` typed as `IReadOnlyList<PlcOptions>`.** `src/Mbproxy/Options/MbproxyOptions.cs:8`. Configuration binder support for `IReadOnlyList<T>` works on .NET 8+ but historically had quirks; `MbproxyOptionsBindingTests.cs:75` confirms binding works. No action needed.
|
|
|
|
- **`AddJsonFile` is implicit via `Host.CreateApplicationBuilder`.** Not called manually in `Program.cs`. The default loader uses `optional: true, reloadOnChange: true` for `appsettings.json` and `appsettings.{Environment}.json`. Per `docs/Operations/Configuration.md:13` the design states `reloadOnChange: true` — satisfied by the default. Environment overrides and command-line are also picked up automatically by `CreateApplicationBuilder(args)` at `Program.cs:9`.
|
|
|
|
- **`MbproxyOptionsValidator` is registered manually as `IValidateOptions<>` (line 28-30) AND `.ValidateOnStart()` is called (line 26).** Correct combination, but note `ValidateOnStart()` only gates startup — runtime reload validation is delegated to `ReloadValidator` (per `Configuration.md:245`). The schema-level validator does NOT run on reload via `IOptionsMonitor.OnChange` because options binding rebinds without re-invoking `IValidateOptions<>`. This is the design (the validation entry-point on reload is the reconciler), but it's a sharp edge — a width-of-8 sneaked in via hot-reload would NOT be caught by `MbproxyOptionsValidator`; it has to be re-applied by `ReloadValidator`.
|
|
|
|
- **`Mbproxy.csproj:42` describes Polly as "deferred" — but Polly is now used in `PolicyFactory.BuildBackendConnect/ListenerRecovery`.** Comment is stale; remove it.
|
|
|
|
- **No `appsettings.Development.json` exists** — fine, but the host will still attempt to load it (optional). If you want to suppress dev-only Serilog overrides in published artifacts, OK as-is.
|
|
|
|
## What looks good
|
|
|
|
- TFM `net10.0`, GA package versions `Microsoft.Extensions.Hosting.WindowsServices 10.0.8`, `Serilog.Extensions.Hosting 10.0.0`, `Polly 8.6.6` — all production releases, no `*-preview*` / `*-rc*` strings (`Mbproxy.csproj:37-43`).
|
|
- `OutputType=Exe` + `Sdk="Microsoft.NET.Sdk.Worker"` is the right combo for a Worker Service that also hosts ASP.NET Core (Kestrel admin) via `<FrameworkReference Include="Microsoft.AspNetCore.App" />` (line 31).
|
|
- Single-file publish gated to Release only (`Mbproxy.csproj:22-27`) — Debug iteration stays fast. `IncludeNativeLibrariesForSelfExtract=true` is the correct hint for self-extracting native deps.
|
|
- `AddWindowsService()` is called unconditionally (`Program.cs:12`); `EventLogBridge` wiring is gated by `WindowsServiceHelpers.IsWindowsService()` so console runs don't depend on Event Log source registration (`Program.cs:15`, `HostingExtensions.cs:79-84`).
|
|
- Cancellation flows correctly: `BackgroundService.ExecuteAsync(stoppingToken)` is plumbed into supervisor `StartAsync(stoppingToken)` and `WaitForInitialBindAttemptAsync(readyLinked.Token)` (`ProxyWorker.cs:163-177`).
|
|
- Install/uninstall scripts are idempotent: `install.ps1:71-84` stops gracefully before rebuilding, `:112-117` updates an existing service in-place rather than recreating, `:166-171` checks Event Log source existence before creating, `:154-162` preserves an existing operator config. `uninstall.ps1:75-93` archives logs to a timestamped folder rather than deleting — excellent operational instinct.
|
|
- Failure-recovery actions are configured (`install.ps1:127`): `restart/60000/restart/60000/""/0` with a 24 h reset window — restart twice then back off, which is the right policy for a fleet proxy that should not flap-restart.
|
|
- `Mbproxy.slnx` (the new XML solution format) is correctly minimal and references only the two projects.
|
|
|
|
## Open questions
|
|
|
|
1. Is the empty `src/Mbproxy/appsettings.json` shipping intentionally so operators are forced into `%ProgramData%`? If so, document it explicitly in `docs/Operations/Configuration.md`. If not, mirror the install template into the source tree.
|
|
2. Should `ShutdownCoordinator` switch to `IOptionsMonitor<T>` to honour a hot-reloaded `GracefulShutdownTimeoutMs`, or is the doc claim ("sampled only at shutdown") meant to mean "as of the last reload **before** shutdown was requested"? Either intent is fine, but the code currently means "as of process start" and that contradicts both readings.
|
|
3. Is the missing per-PLC reload-validation re-run of `MbproxyOptionsValidator` (Width != 16/32) covered by `ReloadValidator` test coverage? The width check appears to live only in `MbproxyOptionsValidator` (`MbproxyOptions.cs:62-63`) — worth confirming `ReloadValidator` re-checks it on hot-reload, or it could be skipped on a runtime config change.
|