fix(host): resolve Host-012..015 — consume DownIfAlone in HOCON, sub-second timing precision, config-driven Serilog sinks, transient-only startup retry
This commit is contained in:
@@ -8,7 +8,7 @@
|
||||
| Last reviewed | 2026-05-17 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `39d737e` |
|
||||
| Open findings | 4 |
|
||||
| Open findings | 0 |
|
||||
|
||||
## Summary
|
||||
|
||||
@@ -585,7 +585,7 @@ key). Full Host suite green (175 passed).
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Design-document adherence |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:146-148` |
|
||||
|
||||
**Description**
|
||||
@@ -618,7 +618,15 @@ declared-but-dead.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-17 (commit `<pending>`). Root cause confirmed against
|
||||
`AkkaHostedService.cs`: `BuildHocon` emitted `keep-oldest { down-if-alone = on }` as
|
||||
a literal, and `ClusterOptions.DownIfAlone` was referenced nowhere outside its own
|
||||
declaration, so the bound option was dead. Fix: `BuildHocon` now emits
|
||||
`down-if-alone = {(clusterOptions.DownIfAlone ? "on" : "off")}`, consuming the bound
|
||||
property. Regression tests in `HoconBuilderTests`: `BuildHocon_DownIfAloneTrue_EmitsOn`
|
||||
and `BuildHocon_DownIfAloneFalse_EmitsOff` parse the resulting HOCON and assert the
|
||||
`keep-oldest.down-if-alone` token tracks the property; the `false` case failed before
|
||||
the fix (token was still `on`) and passes after. Full Host suite green (182 passed).
|
||||
|
||||
### Host-013 — `:F0` rounding of cluster timing values silently degrades sub-second configuration
|
||||
|
||||
@@ -626,7 +634,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:135-136,145,151-152` |
|
||||
|
||||
**Description**
|
||||
@@ -653,7 +661,17 @@ number of seconds and fail fast otherwise. Either way, do not silently round.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-17 (commit `<pending>`). Root cause confirmed: `BuildHocon`
|
||||
rendered every duration via `TotalSeconds:F0`, rounding sub-second values — a 750ms
|
||||
heartbeat became `1s` and a 500ms value would collapse to a degenerate `0s`. Fix:
|
||||
durations are now rendered by a new `DurationHocon` helper that emits whole
|
||||
milliseconds with an `ms` suffix (Akka's HOCON parser accepts it), so sub-second
|
||||
configuration survives exactly; `BuildHocon` takes the transport heartbeat/failure
|
||||
as `TimeSpan` rather than pre-rounded `double` seconds. Regression test
|
||||
`HoconBuilderTests.BuildHocon_SubSecondTimings_PreservedWithoutRounding` configures
|
||||
750ms/2600ms/500ms/2.5s/7.5s timings and asserts each round-trips through the parsed
|
||||
HOCON unchanged; it failed before the fix (750ms parsed as `00:00:01`) and passes
|
||||
after. Full Host suite green (182 passed).
|
||||
|
||||
### Host-014 — Serilog sinks are hard-coded in `Program.cs`, not configuration-driven (REQ-HOST-8)
|
||||
|
||||
@@ -661,7 +679,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Design-document adherence |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.Host/Program.cs:43-48`, `src/ScadaLink.Host/appsettings.json:1-7` |
|
||||
|
||||
**Description**
|
||||
@@ -690,7 +708,22 @@ state the sinks are intentionally code-defined — but the design doc currently
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-17 (commit `<pending>`). Root cause confirmed: `LoggerConfigurationFactory.Build`
|
||||
called `ReadFrom.Configuration` but neither `appsettings.json` nor either
|
||||
role-specific file contained a `Serilog` section, so the two sinks that actually ran
|
||||
were hard-coded `.WriteTo.Console(...)` / `.WriteTo.File(...)` calls in `Program.cs`
|
||||
— uneditable without recompiling, contrary to REQ-HOST-8. Fix: added a `Serilog`
|
||||
section to `appsettings.json` with `WriteTo` entries for the Console and File sinks
|
||||
(carrying the output template, file path `logs/scadalink-.log` and `Day` rolling
|
||||
interval as args) and removed the hard-coded `.WriteTo` calls from `Program.cs`; the
|
||||
factory's `ReadFrom.Configuration` (backed by the transitive
|
||||
`Serilog.Settings.Configuration` package) now drives the sinks. Regression tests in
|
||||
new `SerilogSinkConfigTests`: `ShippedAppSettings_HasSerilogSection_WithConsoleAndFileSinks`
|
||||
asserts the shipped `appsettings.json` declares both sinks in a `Serilog:WriteTo`
|
||||
array (fails against the pre-fix file, which had no `Serilog` section), and
|
||||
`LoggerConfigurationFactory_AppliesConfiguredFileSink` proves a configuration-supplied
|
||||
File sink reaches the built logger and writes to the configured path. Full Host suite
|
||||
green (182 passed).
|
||||
|
||||
### Host-015 — `StartupRetry` retries on every exception type, including permanent failures
|
||||
|
||||
@@ -698,7 +731,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.Host/StartupRetry.cs:36-45` |
|
||||
|
||||
**Description**
|
||||
@@ -727,4 +760,20 @@ single attempt.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-17 (commit `<pending>`). Root cause confirmed: `ExecuteWithRetryAsync`
|
||||
caught `Exception` with only the `when (attempt < maxAttempts)` guard, so it retried
|
||||
any exception — including a permanent schema-version mismatch raised by the
|
||||
migration call site, delaying the inevitable fatal exit by ~2 minutes. Fix: added an
|
||||
optional `Func<Exception, bool> isTransient` parameter (default treats all
|
||||
exceptions as transient, preserving prior behaviour for unclassified callers); the
|
||||
catch guard is now `when (attempt < maxAttempts && isTransient(ex))`, so a
|
||||
non-transient fault propagates after a single attempt. Added a
|
||||
`StartupRetry.IsTransientDatabaseFault` classifier that returns `true` only for
|
||||
connection-class faults (`TimeoutException`, `SocketException`, `SqlException` /
|
||||
`DbException` by type-name match) and `false` for everything else — notably
|
||||
schema-validation `InvalidOperationException`s — and the `Program.cs` migration call
|
||||
site now passes it. Regression tests in `StartupRetryTests`:
|
||||
`ExecuteWithRetry_NonTransientFailure_RethrowsAfterSingleAttempt` (runs exactly once
|
||||
when `isTransient` returns false) and `ExecuteWithRetry_TransientThenPermanent_StopsAtPermanent`
|
||||
(retries a `TimeoutException` then stops at a permanent `InvalidOperationException`).
|
||||
Full Host suite green (182 passed).
|
||||
|
||||
Reference in New Issue
Block a user