diff --git a/code-reviews/Host/findings.md b/code-reviews/Host/findings.md index 73e3f65..8e31a14 100644 --- a/code-reviews/Host/findings.md +++ b/code-reviews/Host/findings.md @@ -8,7 +8,7 @@ | Last reviewed | 2026-05-17 | | Reviewer | claude-agent | | Commit reviewed | `39d737e` | -| Open findings | 4 | +| Open findings | 0 | ## Summary @@ -585,7 +585,7 @@ key). Full Host suite green (175 passed). |--|--| | Severity | Medium | | Category | Design-document adherence | -| Status | Open | +| Status | Resolved | | Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:146-148` | **Description** @@ -618,7 +618,15 @@ declared-but-dead. **Resolution** -_Unresolved._ +Resolved 2026-05-17 (commit ``). Root cause confirmed against +`AkkaHostedService.cs`: `BuildHocon` emitted `keep-oldest { down-if-alone = on }` as +a literal, and `ClusterOptions.DownIfAlone` was referenced nowhere outside its own +declaration, so the bound option was dead. Fix: `BuildHocon` now emits +`down-if-alone = {(clusterOptions.DownIfAlone ? "on" : "off")}`, consuming the bound +property. Regression tests in `HoconBuilderTests`: `BuildHocon_DownIfAloneTrue_EmitsOn` +and `BuildHocon_DownIfAloneFalse_EmitsOff` parse the resulting HOCON and assert the +`keep-oldest.down-if-alone` token tracks the property; the `false` case failed before +the fix (token was still `on`) and passes after. Full Host suite green (182 passed). ### Host-013 — `:F0` rounding of cluster timing values silently degrades sub-second configuration @@ -626,7 +634,7 @@ _Unresolved._ |--|--| | Severity | Low | | Category | Correctness & logic bugs | -| Status | Open | +| Status | Resolved | | Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:135-136,145,151-152` | **Description** @@ -653,7 +661,17 @@ number of seconds and fail fast otherwise. Either way, do not silently round. **Resolution** -_Unresolved._ +Resolved 2026-05-17 (commit ``). Root cause confirmed: `BuildHocon` +rendered every duration via `TotalSeconds:F0`, rounding sub-second values — a 750ms +heartbeat became `1s` and a 500ms value would collapse to a degenerate `0s`. Fix: +durations are now rendered by a new `DurationHocon` helper that emits whole +milliseconds with an `ms` suffix (Akka's HOCON parser accepts it), so sub-second +configuration survives exactly; `BuildHocon` takes the transport heartbeat/failure +as `TimeSpan` rather than pre-rounded `double` seconds. Regression test +`HoconBuilderTests.BuildHocon_SubSecondTimings_PreservedWithoutRounding` configures +750ms/2600ms/500ms/2.5s/7.5s timings and asserts each round-trips through the parsed +HOCON unchanged; it failed before the fix (750ms parsed as `00:00:01`) and passes +after. Full Host suite green (182 passed). ### Host-014 — Serilog sinks are hard-coded in `Program.cs`, not configuration-driven (REQ-HOST-8) @@ -661,7 +679,7 @@ _Unresolved._ |--|--| | Severity | Low | | Category | Design-document adherence | -| Status | Open | +| Status | Resolved | | Location | `src/ScadaLink.Host/Program.cs:43-48`, `src/ScadaLink.Host/appsettings.json:1-7` | **Description** @@ -690,7 +708,22 @@ state the sinks are intentionally code-defined — but the design doc currently **Resolution** -_Unresolved._ +Resolved 2026-05-17 (commit ``). Root cause confirmed: `LoggerConfigurationFactory.Build` +called `ReadFrom.Configuration` but neither `appsettings.json` nor either +role-specific file contained a `Serilog` section, so the two sinks that actually ran +were hard-coded `.WriteTo.Console(...)` / `.WriteTo.File(...)` calls in `Program.cs` +— uneditable without recompiling, contrary to REQ-HOST-8. Fix: added a `Serilog` +section to `appsettings.json` with `WriteTo` entries for the Console and File sinks +(carrying the output template, file path `logs/scadalink-.log` and `Day` rolling +interval as args) and removed the hard-coded `.WriteTo` calls from `Program.cs`; the +factory's `ReadFrom.Configuration` (backed by the transitive +`Serilog.Settings.Configuration` package) now drives the sinks. Regression tests in +new `SerilogSinkConfigTests`: `ShippedAppSettings_HasSerilogSection_WithConsoleAndFileSinks` +asserts the shipped `appsettings.json` declares both sinks in a `Serilog:WriteTo` +array (fails against the pre-fix file, which had no `Serilog` section), and +`LoggerConfigurationFactory_AppliesConfiguredFileSink` proves a configuration-supplied +File sink reaches the built logger and writes to the configured path. Full Host suite +green (182 passed). ### Host-015 — `StartupRetry` retries on every exception type, including permanent failures @@ -698,7 +731,7 @@ _Unresolved._ |--|--| | Severity | Low | | Category | Error handling & resilience | -| Status | Open | +| Status | Resolved | | Location | `src/ScadaLink.Host/StartupRetry.cs:36-45` | **Description** @@ -727,4 +760,20 @@ single attempt. **Resolution** -_Unresolved._ +Resolved 2026-05-17 (commit ``). Root cause confirmed: `ExecuteWithRetryAsync` +caught `Exception` with only the `when (attempt < maxAttempts)` guard, so it retried +any exception — including a permanent schema-version mismatch raised by the +migration call site, delaying the inevitable fatal exit by ~2 minutes. Fix: added an +optional `Func isTransient` parameter (default treats all +exceptions as transient, preserving prior behaviour for unclassified callers); the +catch guard is now `when (attempt < maxAttempts && isTransient(ex))`, so a +non-transient fault propagates after a single attempt. Added a +`StartupRetry.IsTransientDatabaseFault` classifier that returns `true` only for +connection-class faults (`TimeoutException`, `SocketException`, `SqlException` / +`DbException` by type-name match) and `false` for everything else — notably +schema-validation `InvalidOperationException`s — and the `Program.cs` migration call +site now passes it. Regression tests in `StartupRetryTests`: +`ExecuteWithRetry_NonTransientFailure_RethrowsAfterSingleAttempt` (runs exactly once +when `isTransient` returns false) and `ExecuteWithRetry_TransientThenPermanent_StopsAtPermanent` +(retries a `TimeoutException` then stops at a permanent `InvalidOperationException`). +Full Host suite green (182 passed). diff --git a/src/ScadaLink.Host/Actors/AkkaHostedService.cs b/src/ScadaLink.Host/Actors/AkkaHostedService.cs index 9f313e3..d1e07f6 100644 --- a/src/ScadaLink.Host/Actors/AkkaHostedService.cs +++ b/src/ScadaLink.Host/Actors/AkkaHostedService.cs @@ -67,7 +67,8 @@ public class AkkaHostedService : IHostedService // interpolated value, so a hostname, seed node or strategy containing a quote, // backslash or whitespace cannot corrupt the configuration document. var hocon = BuildHocon(_nodeOptions, _clusterOptions, roles, - transportHeartbeatSec, transportFailureSec); + _communicationOptions.TransportHeartbeatInterval, + _communicationOptions.TransportFailureThreshold); var config = ConfigurationFactory.ParseString(hocon); _actorSystem = ActorSystem.Create("scadalink", config); @@ -106,13 +107,21 @@ public class AkkaHostedService : IHostedService /// routed through (string values) so a hostname, /// seed-node URI, role or split-brain strategy containing a quote, backslash or /// whitespace cannot corrupt the document or be silently misparsed (Host-006). + /// + /// Host-012: the keep-oldest down-if-alone flag is emitted from + /// rather than hard-coded, so the bound + /// configuration value is actually consumed. + /// + /// Host-013: every duration is rendered via in + /// milliseconds, so sub-second cluster timing values (e.g. a 750ms heartbeat) are + /// preserved exactly instead of being rounded to whole seconds. /// public static string BuildHocon( NodeOptions nodeOptions, ClusterOptions clusterOptions, IEnumerable roles, - double transportHeartbeatSec, - double transportFailureSec) + TimeSpan transportHeartbeat, + TimeSpan transportFailure) { var seedNodesStr = string.Join(",", clusterOptions.SeedNodes.Select(QuoteHocon)); @@ -132,8 +141,8 @@ akka {{ port = {nodeOptions.RemotingPort} }} transport-failure-detector {{ - heartbeat-interval = {transportHeartbeatSec:F0}s - acceptable-heartbeat-pause = {transportFailureSec:F0}s + heartbeat-interval = {DurationHocon(transportHeartbeat)} + acceptable-heartbeat-pause = {DurationHocon(transportFailure)} }} }} cluster {{ @@ -142,14 +151,14 @@ akka {{ min-nr-of-members = {clusterOptions.MinNrOfMembers} split-brain-resolver {{ active-strategy = {QuoteHocon(clusterOptions.SplitBrainResolverStrategy)} - stable-after = {clusterOptions.StableAfter.TotalSeconds:F0}s + stable-after = {DurationHocon(clusterOptions.StableAfter)} keep-oldest {{ - down-if-alone = on + down-if-alone = {(clusterOptions.DownIfAlone ? "on" : "off")} }} }} failure-detector {{ - heartbeat-interval = {clusterOptions.HeartbeatInterval.TotalSeconds:F0}s - acceptable-heartbeat-pause = {clusterOptions.FailureDetectionThreshold.TotalSeconds:F0}s + heartbeat-interval = {DurationHocon(clusterOptions.HeartbeatInterval)} + acceptable-heartbeat-pause = {DurationHocon(clusterOptions.FailureDetectionThreshold)} }} run-coordinated-shutdown-when-down = on }} @@ -159,6 +168,18 @@ akka {{ }}"; } + /// + /// Renders a as a HOCON duration in milliseconds. Akka's + /// HOCON parser accepts a ms suffix, so emitting whole milliseconds + /// preserves sub-second configuration exactly — a 750ms heartbeat stays 750ms + /// rather than being rounded to 1s (or, for sub-half-second values, + /// silently collapsing to a degenerate 0s) — Host-013. + /// + private static string DurationHocon(TimeSpan duration) + { + return $"{(long)Math.Round(duration.TotalMilliseconds)}ms"; + } + /// /// Renders a value as a HOCON double-quoted string, escaping backslashes and /// double quotes so the resulting token cannot break out of its string literal. diff --git a/src/ScadaLink.Host/LoggerConfigurationFactory.cs b/src/ScadaLink.Host/LoggerConfigurationFactory.cs index ee87825..d04f52c 100644 --- a/src/ScadaLink.Host/LoggerConfigurationFactory.cs +++ b/src/ScadaLink.Host/LoggerConfigurationFactory.cs @@ -8,11 +8,13 @@ namespace ScadaLink.Host; /// /// REQ-HOST-8 / Host-011: the configured minimum level comes from /// ScadaLink:Logging:MinimumLevel (bound to ) so an -/// operator editing that key changes the effective log level. The standard -/// Serilog configuration section is still read (via -/// ) -/// for sink/override customisation; the explicit MinimumLevel.Is below pins -/// the floor from . +/// operator editing that key changes the effective log level. +/// +/// REQ-HOST-8 / Host-014: the console and file sinks are read from the standard +/// Serilog configuration section via ReadFrom.Configuration — the sink +/// set, console output template, file path and rolling interval are all +/// configuration-driven (defined in appsettings.json), not hard-coded. The +/// explicit MinimumLevel.Is below pins the floor from . /// public static class LoggerConfigurationFactory { diff --git a/src/ScadaLink.Host/Program.cs b/src/ScadaLink.Host/Program.cs index a54803c..0dd4b2c 100644 --- a/src/ScadaLink.Host/Program.cs +++ b/src/ScadaLink.Host/Program.cs @@ -40,11 +40,12 @@ var siteId = configuration["ScadaLink:Node:SiteId"] ?? "central"; // WP-14: Serilog structured logging. // Host-011: minimum level is driven by ScadaLink:Logging:MinimumLevel (LoggingOptions). +// Host-014: console and file sinks are defined in the `Serilog` configuration +// section (appsettings.json) and applied via ReadFrom.Configuration inside the +// factory — the sink set, output template, file path and rolling interval are all +// configuration-driven per REQ-HOST-8, not hard-coded here. Log.Logger = ScadaLink.Host.LoggerConfigurationFactory .Build(configuration, nodeRole, siteId, nodeHostname) - .WriteTo.Console(outputTemplate: - "[{Timestamp:HH:mm:ss} {Level:u3}] [{NodeRole}/{NodeHostname}] {Message:lj}{NewLine}{Exception}") - .WriteTo.File("logs/scadalink-.log", rollingInterval: Serilog.RollingInterval.Day) .CreateLogger(); try @@ -121,6 +122,8 @@ try // Host-010: tolerate a database that is briefly unreachable at boot // (e.g. app and DB containers starting together) with a bounded // exponential backoff before failing fatally. + // Host-015: only connection-class (transient) faults are retried — a + // schema-version mismatch is permanent and must fail fast on attempt 1. await StartupRetry.ExecuteWithRetryAsync( "database-migration", async () => @@ -131,7 +134,8 @@ try }, maxAttempts: 8, initialDelay: TimeSpan.FromSeconds(2), - migrationLogger); + migrationLogger, + isTransient: StartupRetry.IsTransientDatabaseFault); } // Middleware pipeline diff --git a/src/ScadaLink.Host/StartupRetry.cs b/src/ScadaLink.Host/StartupRetry.cs index 3e9e710..0faa313 100644 --- a/src/ScadaLink.Host/StartupRetry.cs +++ b/src/ScadaLink.Host/StartupRetry.cs @@ -9,6 +9,12 @@ namespace ScadaLink.Host; /// unreachable. Rather than crashing the process on the first connection failure, /// the migration step is wrapped in this bounded exponential backoff: it tolerates a /// short outage and only fails fatally once attempts are exhausted. +/// +/// Host-015: only transient faults are retried. The optional +/// isTransient predicate classifies each exception; a permanent failure +/// (e.g. a database schema-version mismatch — which no amount of waiting can fix) +/// is rethrown immediately rather than being retried for minutes before the +/// inevitable fatal exit. /// public static class StartupRetry { @@ -18,8 +24,13 @@ public static class StartupRetry int maxAttempts, TimeSpan initialDelay, ILogger logger, + Func? isTransient = null, CancellationToken cancellationToken = default) { + // Default: treat every exception as transient (preserves the pre-Host-015 + // behaviour for callers that do not classify faults). + isTransient ??= static _ => true; + var delay = initialDelay; for (var attempt = 1; ; attempt++) { @@ -33,10 +44,10 @@ public static class StartupRetry operationName, attempt); return; } - catch (Exception ex) when (attempt < maxAttempts) + catch (Exception ex) when (attempt < maxAttempts && isTransient(ex)) { logger.LogWarning(ex, - "Startup operation '{Operation}' failed on attempt {Attempt}/{MaxAttempts}; " + + "Startup operation '{Operation}' failed (transient) on attempt {Attempt}/{MaxAttempts}; " + "retrying in {Delay}.", operationName, attempt, maxAttempts, delay); await Task.Delay(delay, cancellationToken); @@ -45,4 +56,39 @@ public static class StartupRetry } } } + + /// + /// Transient-fault classifier for the database-migration startup step (Host-015). + /// Returns true only for connection-class faults that a brief wait can + /// resolve — a SQL connection/transport error or a timeout — and false + /// for everything else (notably schema-validation s + /// raised by MigrationHelper.ApplyOrValidateMigrationsAsync, which are + /// permanent and must fail fast). + /// + public static bool IsTransientDatabaseFault(Exception ex) + { + // Unwrap a single layer of aggregation so a faulted Task surfaces correctly. + if (ex is AggregateException agg && agg.InnerException != null) + ex = agg.InnerException; + + if (ex is TimeoutException) + return true; + + // Socket / network errors raised while opening the connection. + if (ex is System.Net.Sockets.SocketException) + return true; + + // Microsoft.Data.SqlClient throws SqlException; matching by type name keeps + // the Host free of a direct SqlClient package reference. A SqlException at + // the migration stage is, in practice, a connection failure (the server is + // not yet reachable) rather than a schema fault — schema mismatches surface + // as InvalidOperationException from the migration helper. + var typeName = ex.GetType().FullName; + if (typeName != null && + (typeName.EndsWith("SqlException", StringComparison.Ordinal) || + typeName.EndsWith("DbException", StringComparison.Ordinal))) + return true; + + return false; + } } diff --git a/src/ScadaLink.Host/appsettings.json b/src/ScadaLink.Host/appsettings.json index 960be50..8de97a3 100644 --- a/src/ScadaLink.Host/appsettings.json +++ b/src/ScadaLink.Host/appsettings.json @@ -3,5 +3,26 @@ "LogLevel": { "Default": "Information" } + }, + "Serilog": { + "Using": [ + "Serilog.Sinks.Console", + "Serilog.Sinks.File" + ], + "WriteTo": [ + { + "Name": "Console", + "Args": { + "outputTemplate": "[{Timestamp:HH:mm:ss} {Level:u3}] [{NodeRole}/{NodeHostname}] {Message:lj}{NewLine}{Exception}" + } + }, + { + "Name": "File", + "Args": { + "path": "logs/scadalink-.log", + "rollingInterval": "Day" + } + } + ] } } diff --git a/tests/ScadaLink.Host.Tests/HoconBuilderTests.cs b/tests/ScadaLink.Host.Tests/HoconBuilderTests.cs index df9468b..ebb7b10 100644 --- a/tests/ScadaLink.Host.Tests/HoconBuilderTests.cs +++ b/tests/ScadaLink.Host.Tests/HoconBuilderTests.cs @@ -37,7 +37,8 @@ public class HoconBuilderTests }; var hocon = AkkaHostedService.BuildHocon( - node, DefaultCluster(), new[] { "Central" }, 5, 15); + node, DefaultCluster(), new[] { "Central" }, + TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(15)); var config = ConfigurationFactory.ParseString(hocon); Assert.Equal("central-node1", config.GetString("akka.remote.dot-netty.tcp.hostname")); @@ -57,7 +58,8 @@ public class HoconBuilderTests }; var hocon = AkkaHostedService.BuildHocon( - node, DefaultCluster(), new[] { "Central" }, 5, 15); + node, DefaultCluster(), new[] { "Central" }, + TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(15)); // Must still parse, and the keys after the hostname must remain intact. var config = ConfigurationFactory.ParseString(hocon); @@ -79,9 +81,83 @@ public class HoconBuilderTests }; var hocon = AkkaHostedService.BuildHocon( - node, cluster, new[] { "Central" }, 5, 15); + node, cluster, new[] { "Central" }, + TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(15)); var config = ConfigurationFactory.ParseString(hocon); Assert.Equal("keep oldest", config.GetString("akka.cluster.split-brain-resolver.active-strategy")); } + + private static NodeOptions DefaultNode() => new() + { + Role = "Central", + NodeHostname = "node1", + RemotingPort = 8081, + }; + + [Fact] + public void BuildHocon_DownIfAloneTrue_EmitsOn() + { + // Host-012: BuildHocon must consume ClusterOptions.DownIfAlone, not + // hard-code the down-if-alone token. + var cluster = DefaultCluster(); + cluster.DownIfAlone = true; + + var hocon = AkkaHostedService.BuildHocon( + DefaultNode(), cluster, new[] { "Central" }, + TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(15)); + + var config = ConfigurationFactory.ParseString(hocon); + Assert.True(config.GetBoolean( + "akka.cluster.split-brain-resolver.keep-oldest.down-if-alone")); + } + + [Fact] + public void BuildHocon_DownIfAloneFalse_EmitsOff() + { + // Host-012: setting DownIfAlone=false must actually flip the emitted token. + var cluster = DefaultCluster(); + cluster.DownIfAlone = false; + + var hocon = AkkaHostedService.BuildHocon( + DefaultNode(), cluster, new[] { "Central" }, + TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(15)); + + var config = ConfigurationFactory.ParseString(hocon); + Assert.False(config.GetBoolean( + "akka.cluster.split-brain-resolver.keep-oldest.down-if-alone")); + } + + [Fact] + public void BuildHocon_SubSecondTimings_PreservedWithoutRounding() + { + // Host-013: cluster timing values below one second must not be rounded to + // whole seconds. A 750ms heartbeat must survive as 750ms, not collapse to + // 1s — and a 500ms value must not collapse to a degenerate 0s. + var cluster = DefaultCluster(); + cluster.HeartbeatInterval = TimeSpan.FromMilliseconds(750); + cluster.FailureDetectionThreshold = TimeSpan.FromMilliseconds(2600); + cluster.StableAfter = TimeSpan.FromMilliseconds(500); + + var hocon = AkkaHostedService.BuildHocon( + DefaultNode(), cluster, new[] { "Central" }, + TimeSpan.FromMilliseconds(2500), TimeSpan.FromMilliseconds(7500)); + + var config = ConfigurationFactory.ParseString(hocon); + Assert.Equal( + TimeSpan.FromMilliseconds(750), + config.GetTimeSpan("akka.cluster.failure-detector.heartbeat-interval")); + Assert.Equal( + TimeSpan.FromMilliseconds(2600), + config.GetTimeSpan("akka.cluster.failure-detector.acceptable-heartbeat-pause")); + Assert.Equal( + TimeSpan.FromMilliseconds(500), + config.GetTimeSpan("akka.cluster.split-brain-resolver.stable-after")); + Assert.Equal( + TimeSpan.FromMilliseconds(2500), + config.GetTimeSpan("akka.remote.transport-failure-detector.heartbeat-interval")); + Assert.Equal( + TimeSpan.FromMilliseconds(7500), + config.GetTimeSpan("akka.remote.transport-failure-detector.acceptable-heartbeat-pause")); + } } diff --git a/tests/ScadaLink.Host.Tests/SerilogSinkConfigTests.cs b/tests/ScadaLink.Host.Tests/SerilogSinkConfigTests.cs new file mode 100644 index 0000000..7a08d27 --- /dev/null +++ b/tests/ScadaLink.Host.Tests/SerilogSinkConfigTests.cs @@ -0,0 +1,93 @@ +using System.Reflection; +using System.Text.Json; +using Microsoft.Extensions.Configuration; + +namespace ScadaLink.Host.Tests; + +/// +/// Host-014 regression: REQ-HOST-8 requires the Host's Serilog sinks +/// (console and file at minimum) to be configuration-driven. The sinks +/// must be defined in a Serilog section in appsettings.json and +/// applied via ReadFrom.Configuration — they must not be hard-coded in +/// Program.cs, so an operator can change the file path, rolling interval or +/// output template without recompiling. +/// +public class SerilogSinkConfigTests +{ + private static string FindHostProjectDirectory() + { + var assemblyDir = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location)!; + var dir = new DirectoryInfo(assemblyDir); + while (dir != null) + { + var hostPath = Path.Combine(dir.FullName, "src", "ScadaLink.Host"); + if (Directory.Exists(hostPath)) + return hostPath; + dir = dir.Parent; + } + throw new DirectoryNotFoundException("Could not locate src/ScadaLink.Host"); + } + + [Fact] + public void ShippedAppSettings_HasSerilogSection_WithConsoleAndFileSinks() + { + var path = Path.Combine(FindHostProjectDirectory(), "appsettings.json"); + using var doc = JsonDocument.Parse(File.ReadAllText(path)); + + Assert.True( + doc.RootElement.TryGetProperty("Serilog", out var serilog), + "appsettings.json must contain a `Serilog` section so ReadFrom.Configuration " + + "drives the sinks (REQ-HOST-8 / Host-014)."); + + Assert.True( + serilog.TryGetProperty("WriteTo", out var writeTo), + "the `Serilog` section must contain a `WriteTo` array defining the sinks."); + + var sinkNames = writeTo.EnumerateArray() + .Select(e => e.TryGetProperty("Name", out var n) ? n.GetString() : null) + .ToList(); + + Assert.Contains("Console", sinkNames); + Assert.Contains("File", sinkNames); + } + + [Fact] + public void LoggerConfigurationFactory_AppliesConfiguredFileSink() + { + // A `Serilog` section in configuration must actually reach the built logger + // via ReadFrom.Configuration — proving the sink set is configuration-driven. + var logDir = Path.Combine(Path.GetTempPath(), "scadalink-host014-" + Guid.NewGuid().ToString("N")); + var logPath = Path.Combine(logDir, "test-.log"); + try + { + var configuration = new ConfigurationBuilder() + .AddInMemoryCollection(new Dictionary + { + ["Serilog:Using:0"] = "Serilog.Sinks.File", + ["Serilog:WriteTo:0:Name"] = "File", + ["Serilog:WriteTo:0:Args:path"] = logPath, + ["Serilog:WriteTo:0:Args:rollingInterval"] = "Day", + }) + .Build(); + + var logger = LoggerConfigurationFactory + .Build(configuration, "Central", "central", "node1") + .CreateLogger(); + + logger.Information("host-014 configured-sink probe"); + logger.Dispose(); + + Assert.True(Directory.Exists(logDir), "the configured file sink must have created its directory."); + var written = Directory.GetFiles(logDir, "test-*.log"); + Assert.NotEmpty(written); + Assert.Contains( + "host-014 configured-sink probe", + File.ReadAllText(written[0])); + } + finally + { + if (Directory.Exists(logDir)) + Directory.Delete(logDir, recursive: true); + } + } +} diff --git a/tests/ScadaLink.Host.Tests/StartupRetryTests.cs b/tests/ScadaLink.Host.Tests/StartupRetryTests.cs index 7454a03..e164c09 100644 --- a/tests/ScadaLink.Host.Tests/StartupRetryTests.cs +++ b/tests/ScadaLink.Host.Tests/StartupRetryTests.cs @@ -57,9 +57,59 @@ public class StartupRetryTests }, maxAttempts: 3, initialDelay: TimeSpan.FromMilliseconds(1), - NullLogger.Instance)); + NullLogger.Instance, + isTransient: _ => true)); Assert.Equal(3, attempts); Assert.Equal("failure 3", ex.Message); } + + [Fact] + public async Task ExecuteWithRetry_NonTransientFailure_RethrowsAfterSingleAttempt() + { + // Host-015: a permanent failure (e.g. a schema-version mismatch) must NOT be + // retried — retrying it cannot succeed and only delays the fatal exit by + // minutes. The isTransient predicate classifies it as non-retryable, so the + // operation runs exactly once before the exception propagates. + var attempts = 0; + var ex = await Assert.ThrowsAsync(() => + StartupRetry.ExecuteWithRetryAsync( + "test-op", + () => + { + attempts++; + throw new InvalidOperationException("permanent schema mismatch"); + }, + maxAttempts: 8, + initialDelay: TimeSpan.FromMilliseconds(1), + NullLogger.Instance, + isTransient: _ => false)); + + Assert.Equal(1, attempts); + Assert.Equal("permanent schema mismatch", ex.Message); + } + + [Fact] + public async Task ExecuteWithRetry_TransientThenPermanent_StopsAtPermanent() + { + // A transient fault is retried; a subsequent permanent fault is not. + var attempts = 0; + await Assert.ThrowsAsync(() => + StartupRetry.ExecuteWithRetryAsync( + "test-op", + () => + { + attempts++; + if (attempts == 1) + throw new TimeoutException("transient"); + throw new InvalidOperationException("permanent"); + }, + maxAttempts: 8, + initialDelay: TimeSpan.FromMilliseconds(1), + NullLogger.Instance, + isTransient: e => e is TimeoutException)); + + // 1 transient (retried) + 1 permanent (not retried) = 2. + Assert.Equal(2, attempts); + } }