Files
wwtools/mbproxy/codereviews/2026-05-14/HostingAndOptions.md
Joseph Doherty f2c6669444 mbproxy/codereviews: 2026-05-14 in-depth review + remediation plan
Eight area-focused reviews (BCD rewriter, multiplexer, response cache,
supervisor + hot-reload, admin + diagnostics, hosting + options, test
suite) plus an Overview that prioritises findings across areas, and a
RemediationPlan that groups the work into three waves with per-item
file:line citations and regression-test sketches.

Findings call out: hot-reload tag-list/cache changes that don't reach the
running multiplexer, a coalescing factory leak that hangs late attachers,
backend-reader head-of-line block on a wedged upstream, stranded outbound
frames after cascade, and ShutdownCoordinator double-stop ordering. Plus
the unconventional 32-bit BCD wire format (two base-10000 digits in CDAB,
not standard binary), unreachable BcdValidationError.DuplicateAddress,
mbproxy.cache.flushed event that's defined but never emitted, and missing
test coverage for Cache.AllowLongTtl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 05:15:34 -04:00

9.9 KiB

Hosting / DI / Options / Windows Service Audit

Scope: Program.cs, HostingExtensions.cs, Mbproxy.csproj, appsettings.json, Options/*.cs, install/, Mbproxy.slnx. Cross-checked against docs/design.md (Configuration, Logging, Startup posture), docs/Architecture/Overview.md, docs/Operations/Configuration.md. Tests under tests/Mbproxy.Tests/Options/MbproxyOptionsBindingTests.cs + tests/Mbproxy.Tests/HostSmokeTests.cs.

Summary

  • The host pipeline is small, idiomatic for .NET 10 Host.CreateApplicationBuilder, and the Windows Service / console-fallback story is correctly handled via AddWindowsService() + WindowsServiceHelpers.IsWindowsService().
  • Hot-reload posture is mostly clean (IOptionsMonitor<MbproxyOptions> everywhere it matters), with one regression: ShutdownCoordinator consumes IOptions<MbproxyOptions> and will freeze GracefulShutdownTimeoutMs at startup-time value.
  • appsettings.json shipped in src/Mbproxy/ is an empty template (no PLCs, no BCD tags, no Cache section). The richer working example lives only in install/mbproxy.config.template.json — a maintenance-time hazard.
  • Single-file self-contained publish is wired correctly for net10.0; package versions are pinned and GA (no preview/RC strings present).

Critical findings

None. The service starts, reaches mbproxy.startup.ready, and shuts down cleanly per HostSmokeTests.

Major findings

  • ShutdownCoordinator uses IOptions<MbproxyOptions>, defeating hot-reload of GracefulShutdownTimeoutMs. src/Mbproxy/Diagnostics/ShutdownCoordinator.cs:93,102,118 and the DI factory at src/Mbproxy/Program.cs:42. IOptions<T> is a singleton snapshot: any operator who edits Connection.GracefulShutdownTimeoutMs after start will silently see the original value at shutdown. Per docs/Operations/Configuration.md:160 "GracefulShutdownTimeoutMs is sampled only at shutdown" — so the contract is "current config wins at stop time". Switch to IOptionsMonitor<MbproxyOptions> and read _options.CurrentValue.Connection.GracefulShutdownTimeoutMs inside ShutdownAsync.

  • appsettings.json shipped with the binary is a near-empty stub. src/Mbproxy/appsettings.json:1 ships BcdTags.Global = [], Plcs = [], no Cache section, and no example tag. The fully-commented working example lives only in install/mbproxy.config.template.json. Copy-on-build (Mbproxy.csproj:52) will publish the empty stub next to Mbproxy.exe, and install.ps1:99-105 then copies that into InstallPath. The Production binary therefore carries an appsettings.json that only the install script's separate copy of the template will ever fix at %ProgramData%\mbproxy\…. A fresh Mbproxy.exe run from the publish folder yields a no-op service. Recommend either marking src/Mbproxy/appsettings.json CopyToOutputDirectory=Never so the install template is the single source of truth, or making the shipped file mirror the template.

  • ConfigReconciler.Attach happens between supervisor construction and StartAsync — not "BEFORE accepting hot-reload events". src/Mbproxy/Proxy/ProxyWorker.cs:153. The comment claims this is safe because the channel "only enqueues" and apply runs async. That is true, but it does mean a config save that lands during the very early window between host.RunAsync() and _reconciler.Attach is observed by IOptionsMonitor.OnChange before the reconciler has its supervisor map. Whether that is benign depends on ConfigReconciler internals (it does buffer); worth a quick review to confirm the buffer is unbounded or that the missed event is replayed from _options.CurrentValue after Attach.

  • IHostApplicationLifetime.ApplicationStopping callback runs coordinator.ShutdownAsync().GetAwaiter().GetResult() synchronously on the lifetime thread. src/Mbproxy/Program.cs:60-66. The comment acknowledges async isn't supported and notes the coordinator manages its own deadline, which is correct. But the drain loop in ShutdownCoordinator.ShutdownAsync (line 167) Task.Delay(10, drainCts.Token) ContinueWith chains will resume on the threadpool — fine — yet the synchronous GetResult blocks one IHost lifetime thread for up to GracefulShutdownTimeoutMs + ~7 s (5 s stop + drain + 2 s admin). That is by design, but worth documenting as the reason the GracefulShutdownTimeoutMs default (10 s) is bounded well below the SCM 30 s wait hint.

Minor findings

  • Bare-config-only Serilog wiring; the design's "Event Log sink for Error+ in service mode" is satisfied via a custom sink, not the standard Serilog.Sinks.EventLog package. src/Mbproxy/HostingExtensions.cs:79-84 + src/Mbproxy/Diagnostics/EventLogBridge.cs:25. This is intentional (the bridge silently no-ops if the source isn't registered, avoiding admin-required EventLog.CreateEventSource at startup). Just call this out in docs/Operations/Configuration.md so reviewers don't add a duplicate sink.

  • ProxyWorker registered as singleton + hosted service via the sp.GetRequiredService factory pattern. src/Mbproxy/Program.cs:29-30 and the same trick at src/Mbproxy/HostingExtensions.cs:57-58 for AdminEndpointHost. The pattern is correct and idiomatic; just notice that ProxyWorker holds a private mutable dictionary _supervisors (line 41) — singleton lifetime is correct because that dict is per-process, not per-PLC, and Supervisors is exposed as IReadOnlyDictionary for the status page.

  • MbproxyOptions.Plcs typed as IReadOnlyList<PlcOptions>. src/Mbproxy/Options/MbproxyOptions.cs:8. Configuration binder support for IReadOnlyList<T> works on .NET 8+ but historically had quirks; MbproxyOptionsBindingTests.cs:75 confirms binding works. No action needed.

  • AddJsonFile is implicit via Host.CreateApplicationBuilder. Not called manually in Program.cs. The default loader uses optional: true, reloadOnChange: true for appsettings.json and appsettings.{Environment}.json. Per docs/Operations/Configuration.md:13 the design states reloadOnChange: true — satisfied by the default. Environment overrides and command-line are also picked up automatically by CreateApplicationBuilder(args) at Program.cs:9.

  • MbproxyOptionsValidator is registered manually as IValidateOptions<> (line 28-30) AND .ValidateOnStart() is called (line 26). Correct combination, but note ValidateOnStart() only gates startup — runtime reload validation is delegated to ReloadValidator (per Configuration.md:245). The schema-level validator does NOT run on reload via IOptionsMonitor.OnChange because options binding rebinds without re-invoking IValidateOptions<>. This is the design (the validation entry-point on reload is the reconciler), but it's a sharp edge — a width-of-8 sneaked in via hot-reload would NOT be caught by MbproxyOptionsValidator; it has to be re-applied by ReloadValidator.

  • Mbproxy.csproj:42 describes Polly as "deferred" — but Polly is now used in PolicyFactory.BuildBackendConnect/ListenerRecovery. Comment is stale; remove it.

  • No appsettings.Development.json exists — fine, but the host will still attempt to load it (optional). If you want to suppress dev-only Serilog overrides in published artifacts, OK as-is.

What looks good

  • TFM net10.0, GA package versions Microsoft.Extensions.Hosting.WindowsServices 10.0.8, Serilog.Extensions.Hosting 10.0.0, Polly 8.6.6 — all production releases, no *-preview* / *-rc* strings (Mbproxy.csproj:37-43).
  • OutputType=Exe + Sdk="Microsoft.NET.Sdk.Worker" is the right combo for a Worker Service that also hosts ASP.NET Core (Kestrel admin) via <FrameworkReference Include="Microsoft.AspNetCore.App" /> (line 31).
  • Single-file publish gated to Release only (Mbproxy.csproj:22-27) — Debug iteration stays fast. IncludeNativeLibrariesForSelfExtract=true is the correct hint for self-extracting native deps.
  • AddWindowsService() is called unconditionally (Program.cs:12); EventLogBridge wiring is gated by WindowsServiceHelpers.IsWindowsService() so console runs don't depend on Event Log source registration (Program.cs:15, HostingExtensions.cs:79-84).
  • Cancellation flows correctly: BackgroundService.ExecuteAsync(stoppingToken) is plumbed into supervisor StartAsync(stoppingToken) and WaitForInitialBindAttemptAsync(readyLinked.Token) (ProxyWorker.cs:163-177).
  • Install/uninstall scripts are idempotent: install.ps1:71-84 stops gracefully before rebuilding, :112-117 updates an existing service in-place rather than recreating, :166-171 checks Event Log source existence before creating, :154-162 preserves an existing operator config. uninstall.ps1:75-93 archives logs to a timestamped folder rather than deleting — excellent operational instinct.
  • Failure-recovery actions are configured (install.ps1:127): restart/60000/restart/60000/""/0 with a 24 h reset window — restart twice then back off, which is the right policy for a fleet proxy that should not flap-restart.
  • Mbproxy.slnx (the new XML solution format) is correctly minimal and references only the two projects.

Open questions

  1. Is the empty src/Mbproxy/appsettings.json shipping intentionally so operators are forced into %ProgramData%? If so, document it explicitly in docs/Operations/Configuration.md. If not, mirror the install template into the source tree.
  2. Should ShutdownCoordinator switch to IOptionsMonitor<T> to honour a hot-reloaded GracefulShutdownTimeoutMs, or is the doc claim ("sampled only at shutdown") meant to mean "as of the last reload before shutdown was requested"? Either intent is fine, but the code currently means "as of process start" and that contradicts both readings.
  3. Is the missing per-PLC reload-validation re-run of MbproxyOptionsValidator (Width != 16/32) covered by ReloadValidator test coverage? The width check appears to live only in MbproxyOptionsValidator (MbproxyOptions.cs:62-63) — worth confirming ReloadValidator re-checks it on hot-reload, or it could be skipped on a runtime config change.