fix(api-surface): close Theme 9 — 27 naming / dead-code / config / hygiene findings

The largest themed batch — small mechanical fixes across 11 modules.

API / message hygiene:
- Comm-020: SiteAddressCacheLoaded now carries IReadOnlyDictionary /
  IReadOnlyList — Akka messages must be immutable.
- Commons-016: BundleSession.MaxUnlockAttempts named constant replaces
  the magic number 3.
- Commons-018: IOperationTrackingStore + IPartitionMaintenance moved from
  Interfaces/ root to Interfaces/Services/ (namespace preserved — 9
  consumers exceeded the in-prompt move threshold).
- Commons-023: TrackingStatusSnapshot.SourceNode now consistent with the
  trailing-optional-with-default pattern used elsewhere.
- SR-022: AuditingDbCommand.DbConnection.set no longer uses reflection —
  exposes AuditingDbConnection.Inner via internal API surface.
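
The Comm-020 idea can be sketched as follows. This is a minimal illustration, not the committed code: the payload shape (`AddressesBySite`) and the `From` factory are assumptions, since the real fields of SiteAddressCacheLoaded are not shown in this commit. The message exposes only read-only interfaces and is built from a private copy, so later mutation by the sender cannot reach the receiving actor.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape; the real message's fields are not shown in this commit.
public sealed record SiteAddressCacheLoaded(
    IReadOnlyDictionary<string, IReadOnlyList<string>> AddressesBySite)
{
    // Hypothetical factory: copies the caller's collections so the message
    // never aliases mutable state owned by the sender.
    public static SiteAddressCacheLoaded From(
        IDictionary<string, List<string>> source)
    {
        var copy = source.ToDictionary(
            pair => pair.Key,
            pair => (IReadOnlyList<string>)pair.Value.ToArray());
        return new SiteAddressCacheLoaded(copy);
    }
}
```

Read-only interfaces only restrict the view; the defensive copy is what keeps the sender's later edits out of the message.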

Dead code / config cleanup:
- ClusterInfra-011: decorative SectionName constant deleted.
- ClusterInfra-014: dead AddClusterInfrastructureActors method + its
  "throws-when-called" test deleted.
- Host-021: Microsoft Logging:LogLevel block deleted from appsettings.json
  (dead under Serilog).

Fail-loud over fail-silent:
- DM-021: ResolveSiteIdentifierAsync throws on missing site (was silently
  substituting a DB id).
- DM-022: dropped transient Pending write — record now lands directly in
  InProgress (no UI flicker, one fewer DB write).
- Host-020: LoggerConfigurationFactory emits a Console.Error warning when
  both Serilog:MinimumLevel and ScadaLink:Logging:MinimumLevel are set
  (ScadaLink remains truth per Host-011).
- SnF-022: NotifyCachedCallObserverAsync logs Warning on unparseable
  TrackedOperationId (was silently dropping).
- SnF-023: empty siteId default replaced with $unknown-site sentinel
  + constructor normalisation.
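
The DM-021 change from silent substitution to a thrown error might look roughly like this. It is a sketch under stated assumptions: `SiteResolver` and the `lookup` delegate are hypothetical stand-ins for the repository call, and only the exception type and the "names the unresolved id" behaviour come from the regression test in this commit.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

public static class SiteResolver
{
    // `lookup` is a hypothetical stand-in for the repository call; it yields
    // null when no site row exists for the id.
    public static async Task<string> ResolveSiteIdentifierAsync(
        int siteId, Func<int, Task<string?>> lookup)
    {
        var identifier = await lookup(siteId);
        if (identifier is null)
        {
            // Fail loud: name the unresolved id instead of returning
            // siteId.ToString() and letting routing fail later.
            throw new InvalidOperationException(
                $"Site with id {siteId} not found.");
        }
        return identifier;
    }
}
```

Throwing at the point of the missing row keeps the error message tied to the real cause rather than to a downstream routing failure.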

Correctness:
- SCA-001: SupervisorStrategy XML doc rewritten to match actual
  DefaultDecider/Restart semantics (was claiming Resume).
- SCA-003: OnUpsertAsync now restamps IngestedAtUtc on every upsert.
- SR-021: HandleDeployArtifacts now dispatches an internal
  ApplyArtifactDataConnectionsToDcl message after the SQLite write so
  system-wide artifact-deploy data-connection changes go live
  immediately (was requiring a site restart).
- SnF-020: RetryParkedMessageAsync captures the parked row BEFORE the
  local write so a concurrent delete can't skip standby replication.

Sentinels / naming collisions:
- HM-021: CentralSiteId changed from "central" to "$central"
  (cannot collide: a leading $ is forbidden in real SiteIdentifiers).
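
A small sketch of why the `$`-prefixed sentinels (HM-021, SnF-023) are safe. The class and method names are hypothetical, and the validation rule is reduced to the one fact stated above (real identifiers never start with `$`); the actual validator may enforce more.

```csharp
using System;

// Hypothetical helper, not the project's actual type.
public static class SiteIdentifiers
{
    public const string Central = "$central";          // HM-021 sentinel
    public const string UnknownSite = "$unknown-site"; // SnF-023 sentinel

    // Assumed rule: a real identifier is non-empty and never starts with '$',
    // so no user-created site can equal a sentinel.
    public static bool IsValidRealIdentifier(string value) =>
        !string.IsNullOrWhiteSpace(value)
        && !value.StartsWith("$", StringComparison.Ordinal);

    // Constructor-style normalisation: an empty siteId becomes the sentinel.
    public static string Normalise(string siteId) =>
        string.IsNullOrWhiteSpace(siteId) ? UnknownSite : siteId;
}
```

Under this rule the old value "central" would have passed as a real identifier, which is the collision the rename removes.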

Doc / surface cleanups:
- SEL-018: FailedWriteCount promoted to ISiteEventLogger; XML doc softened
  to "Available for future Health Monitoring integration".
- SnF-019: VERIFY outcome — documented parking-after-DefaultMaxRetries
  in Component-StoreAndForward.md + DefaultMaxRetries XML (uniform
  cap; maxRetries:0 is the unbounded escape hatch).
- SnF-021: Component-StoreAndForward.md no longer claims the tracking
  table lives in SnF — it's in SiteRuntime, the interface is in Commons.
- CLI-020: bundle export response parse guarded with try/catch on
  JsonException / KeyNotFoundException / FormatException — emits a
  clean INVALID_RESPONSE exit instead of a stack trace.
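
The CLI-020 guard can be illustrated with System.Text.Json. This is a sketch only: the `bundle` property name, the base64 encoding, and the exit-code value are assumptions, since the real response shape is not shown in this commit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not the CLI's actual type.
public static class BundleExportResponse
{
    public const int Ok = 0;
    public const int InvalidResponse = 2; // hypothetical exit code

    public static int TryParse(string body, out byte[] bundle)
    {
        bundle = Array.Empty<byte>();
        try
        {
            // JsonException on malformed JSON.
            using var doc = JsonDocument.Parse(body);
            // KeyNotFoundException when the (assumed) property is absent.
            var element = doc.RootElement.GetProperty("bundle");
            // FormatException when the string is not valid base64.
            bundle = element.GetBytesFromBase64();
            return Ok;
        }
        catch (Exception ex) when (
            ex is JsonException or KeyNotFoundException or FormatException)
        {
            // One clean line on stderr instead of a stack trace.
            Console.Error.WriteLine($"INVALID_RESPONSE: {ex.Message}");
            return InvalidResponse;
        }
    }
}
```

The filter deliberately lists only the three parse-related exception types, so anything else (for example an InvalidOperationException from a non-object root) still surfaces as a genuine bug.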

Config:
- ClusterInfra-013: intent comment added to "catastrophic config" test.
- Host-016: appsettings.Site.json second CentralContactPoints entry
  removed (was pointing at the SITE's own port); doc-key explains how
  to extend.
- Host-018: NodeName added to both shipped per-role configs (its absence
  left SourceNode null on audit rows).

UI:
- CentralUI-029: replaced JS.InvokeAsync<int>("eval", …) with an ES
  module import (new wwwroot/js/browser-time.js).
- CentralUI-032: AuditResultsGrid gains a Previous button backed by a
  cursor stack.
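
One way a cursor stack can back a Previous button (CentralUI-032) is sketched below. `CursorPager` is a hypothetical helper, not the component's actual state; it assumes keyset-style paging where each page is fetched from an opaque cursor and `null` means the first page.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical helper; the real AuditResultsGrid state is not shown here.
public sealed class CursorPager
{
    // Cursors of the pages visited before the current one, most recent on top.
    private readonly Stack<string?> _previous = new Stack<string?>();

    // Cursor that produced the page on screen; null means the first page.
    public string? Current { get; private set; }

    public bool CanGoPrevious => _previous.Count > 0;

    // Next: remember where we are, then advance to the server-supplied cursor.
    public void MoveNext(string nextCursor)
    {
        _previous.Push(Current);
        Current = nextCursor;
    }

    // Previous: pop back to the cursor of the page shown before this one.
    public bool MovePrevious()
    {
        if (_previous.Count == 0) return false;
        Current = _previous.Pop();
        return true;
    }
}
```

Forward-only cursors cannot be reversed by the server, so the client has to remember the path it took; a stack is the smallest structure that does that.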

10+ new regression tests across the affected projects. Build clean;
all suites green. README regenerated: 6 open (was 33).

Session-to-date: 130 of 136 originally-open Theme findings closed.
Joseph Doherty
2026-05-28 08:39:01 -04:00
parent d190345ef0
commit 77cb0ad0e2
46 changed files with 966 additions and 278 deletions
@@ -35,17 +35,26 @@ public class ClusterOptionsTests
Assert.Empty(options.SeedNodes);
}
-[Fact]
-public void SectionName_IsTheExpectedAppSettingsSection()
-{
-// CI-005: ClusterOptions must expose a single-source-of-truth constant for
-// its appsettings.json section so binding sites do not hard-code the string.
-Assert.Equal("ScadaLink:Cluster", ClusterOptions.SectionName);
-}
+// ClusterInfra-011: SectionName constant deleted — the previous test
+// `SectionName_IsTheExpectedAppSettingsSection` is removed alongside it.
+// The Host's SiteServiceRegistration / StartupValidator continue to
+// reference the `"ScadaLink:Cluster"` literal directly; reinstating the
+// constant should happen when those Host binding sites can be updated in
+// the same change.
[Fact]
public void Properties_CanBeSetToCustomValues()
{
+// ClusterInfra-013: this test exercises the POCO property setters only —
+// `SplitBrainResolverStrategy = "keep-majority"` and `MinNrOfMembers = 2`
+// are values the design doc explicitly forbids in production
+// (`keep-majority` causes total shutdown on a two-node partition;
+// `MinNrOfMembers = 2` blocks the cluster singleton after failover).
+// The POCO accepts any value by design; rejection lives in
+// `ClusterOptionsValidator` and is covered by
+// `ClusterOptionsValidatorTests.UnsupportedSplitBrainStrategy_FailsValidation`
+// and `ClusterOptionsValidatorTests.MinNrOfMembers_NotOne_FailsValidation`.
+// Do NOT read these values as endorsed runtime configuration.
var options = new ClusterOptions
{
SeedNodes = new List<string> { "akka.tcp://system@node1:2551", "akka.tcp://system@node2:2551" },
@@ -4,11 +4,11 @@ using Microsoft.Extensions.Options;
namespace ScadaLink.ClusterInfrastructure.Tests;
/// <summary>
-/// CI-002: Tests that the DI extension methods do real work rather than
-/// silently returning success. <see cref="ServiceCollectionExtensions.AddClusterInfrastructure"/>
-/// must register the <see cref="ClusterOptionsValidator"/> so misconfiguration
-/// fails fast, and the unimplemented actor-registration placeholder must fail
-/// loudly rather than masquerade as a completed registration.
+/// CI-002: Tests that <see cref="ServiceCollectionExtensions.AddClusterInfrastructure"/>
+/// does real work rather than silently returning success — it must register
+/// the <see cref="ClusterOptionsValidator"/> so misconfiguration fails fast.
+/// (The companion actor-registration test was removed alongside the deleted
+/// `AddClusterInfrastructureActors` extension method — see ClusterInfra-014.)
/// </summary>
public class ServiceCollectionExtensionsTests
{
@@ -48,11 +48,10 @@ public class ServiceCollectionExtensionsTests
Assert.Contains("MinNrOfMembers", ex.Message);
}
-[Fact]
-public void AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding()
-{
-var services = new ServiceCollection();
-Assert.Throws<NotImplementedException>(() => services.AddClusterInfrastructureActors());
-}
+// ClusterInfra-014: `AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding`
+// was removed alongside the now-deleted `AddClusterInfrastructureActors`
+// extension method. The Akka.NET actor wiring legitimately lives in
+// `ScadaLink.Host` (AkkaHostedService) per the
+// Component-ClusterInfrastructure.md "Implementation Note — Code Placement"
+// section; this project no longer exposes an actor-registration extension.
}
@@ -5,6 +5,7 @@ using Microsoft.Extensions.Options;
using NSubstitute;
using ScadaLink.Commons.Entities.Deployment;
using ScadaLink.Commons.Entities.Instances;
+using ScadaLink.Commons.Entities.Sites;
using ScadaLink.Commons.Interfaces.Repositories;
using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.Commons.Messages.Deployment;
@@ -44,7 +45,7 @@ public class DeploymentServiceTests : TestKit
OperationLockTimeout = TimeSpan.FromSeconds(5)
});
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
_service = new DeploymentService(
_repo, siteRepo, _pipeline, _comms, _lockManager, _audit,
new DiffService(),
@@ -101,6 +102,91 @@ public class DeploymentServiceTests : TestKit
Assert.Contains("Validation failed", result.Error);
}
+// ── DeploymentManager-021: missing Site row -> hard failure, no silent fabrication ──
+[Fact]
+public async Task DeployInstanceAsync_SiteRowMissing_FailsLoudlyInsteadOfSilentlySubstituting()
+{
+// DeploymentManager-021 regression: previously ResolveSiteIdentifierAsync
+// silently returned the numeric siteId rendered as a string when the
+// site row was missing (FK was deleted, race with admin delete, DB
+// inconsistency). That bogus identifier then surfaced downstream as a
+// confusing "unknown site" routing error that hid the real cause.
+//
+// After the fix the resolver throws InvalidOperationException naming
+// the unresolved id; on the deploy path the existing try/catch turns
+// it into a Failed deployment record whose error message reflects the
+// actual problem.
+var instance = new Instance("OrphanInst")
+{
+Id = 99, SiteId = 42, State = InstanceState.NotDeployed
+};
+_repo.GetInstanceByIdAsync(99, Arg.Any<CancellationToken>()).Returns(instance);
+SetupValidPipeline(99, "OrphanInst", "sha256:abc");
+// Build a fresh service whose siteRepo explicitly returns null for the
+// instance's SiteId (the helper above seeds every id, so we shadow it
+// for SiteId=42 only).
+var siteRepo = CreateSiteRepoStub();
+siteRepo.GetSiteByIdAsync(42, Arg.Any<CancellationToken>()).Returns((Site?)null);
+var service = new DeploymentService(
+_repo, siteRepo, _pipeline, _comms, _lockManager, _audit,
+new DiffService(),
+new DeploymentStatusNotifier(NullLogger<DeploymentStatusNotifier>.Instance),
+Options.Create(new DeploymentManagerOptions { OperationLockTimeout = TimeSpan.FromSeconds(5) }),
+NullLogger<DeploymentService>.Instance);
+var result = await service.DeployInstanceAsync(99, "admin");
+Assert.True(result.IsFailure);
+// The descriptive message names the unresolved id so the operator sees
+// the actual problem (missing site row), not a downstream routing error.
+Assert.Contains("42", result.Error);
+Assert.Contains("not found", result.Error, StringComparison.OrdinalIgnoreCase);
+}
+// ── DeploymentManager-022: no transient Pending -> single InProgress insert ──
+[Fact]
+public async Task DeployInstanceAsync_NoTransientPendingWrite_RecordCreatedDirectlyInProgress()
+{
+// DeploymentManager-022 regression: previously the deploy path wrote
+// the record as Pending, then immediately updated it to InProgress
+// with no work in between — an extra SaveChangesAsync round-trip, an
+// extra notifier push, and a Pending->InProgress flicker in the
+// CentralUI deployment-status page. After the fix the record is
+// inserted directly in InProgress (one Add + one notify); no
+// intermediate Pending row is ever persisted or notified.
+var instance = new Instance("DirectInProgressInst")
+{
+Id = 200, SiteId = 1, State = InstanceState.NotDeployed
+};
+_repo.GetInstanceByIdAsync(200, Arg.Any<CancellationToken>()).Returns(instance);
+SetupValidPipeline(200, "DirectInProgressInst", "sha256:dp22");
+// The catch path later flips the same record reference to Failed, so
+// snapshot the Status at insert time rather than reading the live
+// reference at assertion time.
+DeploymentStatus? statusAtInsert = null;
+await _repo.AddDeploymentRecordAsync(
+Arg.Do<DeploymentRecord>(r => statusAtInsert = r.Status), Arg.Any<CancellationToken>());
+// The communication actor is unset so the call throws after the insert;
+// we only care about the status the insert was made with.
+await _service.DeployInstanceAsync(200, "admin");
+// The single Add happens with the record already in InProgress.
+Assert.NotNull(statusAtInsert);
+Assert.Equal(DeploymentStatus.InProgress, statusAtInsert!.Value);
+// No Pending update was issued: the deploy path never wrote the
+// intermediate Pending row.
+await _repo.DidNotReceive().UpdateDeploymentRecordAsync(
+Arg.Is<DeploymentRecord>(r => r.Status == DeploymentStatus.Pending),
+Arg.Any<CancellationToken>());
+}
// ── WP-2: Deployment identity ──
[Fact]
@@ -581,7 +667,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor);
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
return new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(),
@@ -598,6 +684,30 @@ public class DeploymentServiceTests : TestKit
new FlatteningPipelineResult(config, revisionHash, ValidationResult.Success())));
}
+/// <summary>
+/// DeploymentManager-021 test helper: returns an <see cref="ISiteRepository"/>
+/// substitute that resolves <see cref="ISiteRepository.GetSiteByIdAsync"/>
+/// for ANY integer id to a stub <see cref="Site"/> whose
+/// <c>SiteIdentifier</c> is <c>"site-{id}"</c>. Prior to the
+/// DeploymentManager-021 fix the production `ResolveSiteIdentifierAsync`
+/// silently substituted the numeric id when the site row was missing, so
+/// these tests passed without seeding any Sites. After the fix a missing
+/// site throws — every test that drives a deploy/lifecycle path needs a
+/// real-shaped <see cref="Site"/> back, and this helper centralises that
+/// arrangement so individual tests don't repeat the boilerplate.
+/// </summary>
+private static ISiteRepository CreateSiteRepoStub()
+{
+var siteRepo = Substitute.For<ISiteRepository>();
+siteRepo.GetSiteByIdAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
+.Returns(callInfo =>
+{
+var id = callInfo.ArgAt<int>(0);
+return new Site($"Test Site {id}", $"site-{id}") { Id = id };
+});
+return siteRepo;
+}
[Fact]
public async Task DeployInstanceAsync_PriorInProgressRecord_SiteHasTargetHash_MarksSuccessWithoutRedeploy()
{
@@ -994,7 +1104,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor);
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(),
@@ -1049,7 +1159,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor);
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
var deadline = TimeSpan.FromMilliseconds(300);
var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit,
@@ -1098,7 +1208,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor);
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(),
@@ -1143,7 +1253,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor);
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(),
@@ -4,6 +4,7 @@ using Microsoft.Extensions.Options;
using NSubstitute;
using ScadaLink.Commons.Entities.Deployment;
using ScadaLink.Commons.Entities.Instances;
+using ScadaLink.Commons.Entities.Sites;
using ScadaLink.Commons.Interfaces.Repositories;
using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.Commons.Types;
@@ -46,7 +47,17 @@ public class DeploymentStatusNotifierTests : TestKit
OperationLockTimeout = TimeSpan.FromSeconds(5)
});
+// DeploymentManager-021: the resolver now throws when the site row
+// is missing, so seed the substitute to return a real-shaped Site for
+// any id these tests touch.
var siteRepo = Substitute.For<ISiteRepository>();
+siteRepo.GetSiteByIdAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
+.Returns(callInfo =>
+{
+var id = callInfo.ArgAt<int>(0);
+return new Site($"Test Site {id}", $"site-{id}") { Id = id };
+});
_service = new DeploymentService(
_repo, siteRepo, _pipeline, _comms, _lockManager, _audit,
new DiffService(), _notifier, options,
@@ -68,14 +79,20 @@ public class DeploymentStatusNotifierTests : TestKit
_notifier.StatusChanged += c => changes.Add(c);
// _comms has no actor set, so the deploy reaches the catch block and
-// the record ends Failed. The notifier must fire for the Pending,
-// InProgress and Failed writes — not be silent (the pre-fix behaviour).
+// the record ends Failed. The notifier must fire for the InProgress
+// and Failed writes — not be silent (the pre-fix behaviour).
+//
+// DeploymentManager-022: the transient Pending write was dropped from
+// the deploy path (the record is now created directly in InProgress),
+// so there is no Pending notification any more. The remaining two
+// writes — the initial InProgress insert and the catch-block Failed
+// update — must each raise a status-change.
var result = await _service.DeployInstanceAsync(7, "admin");
Assert.True(result.IsFailure);
Assert.NotEmpty(changes);
Assert.All(changes, c => Assert.Equal(7, c.InstanceId));
-Assert.Contains(changes, c => c.Status == DeploymentStatus.Pending);
+Assert.DoesNotContain(changes, c => c.Status == DeploymentStatus.Pending);
Assert.Contains(changes, c => c.Status == DeploymentStatus.InProgress);
Assert.Contains(changes, c => c.Status == DeploymentStatus.Failed);
@@ -108,4 +108,54 @@ public class LoggerConfigurationTests
Assert.Equal(LogEventLevel.Warning, result);
Assert.Empty(writer.ToString());
}
+/// <summary>
+/// Host-020: <c>ScadaLink:Logging:MinimumLevel</c> is the documented source
+/// of truth for the Serilog floor, and the explicit <c>MinimumLevel.Is</c>
+/// call deliberately runs after <c>ReadFrom.Configuration(...)</c> so a
+/// <c>Serilog:MinimumLevel</c> entry is overridden. To make that precedence
+/// visible — instead of silently swallowed — <see cref="LoggerConfigurationFactory.Build(IConfiguration,string,string,string,TextWriter)"/>
+/// writes a one-shot warning when both keys are present. The warning must
+/// name both values and point the operator at the documented key. When the
+/// Serilog key is absent, no warning is written.
+/// </summary>
+[Fact]
+public void Build_BothMinimumLevelKeysSet_WarnsAboutOverride()
+{
+var writer = new StringWriter();
+var configuration = new ConfigurationBuilder()
+.AddInMemoryCollection(new Dictionary<string, string?>
+{
+["ScadaLink:Logging:MinimumLevel"] = "Warning",
+["Serilog:MinimumLevel"] = "Debug",
+})
+.Build();
+LoggerConfigurationFactory.Build(configuration, "Central", "central", "node1", writer);
+var warning = writer.ToString();
+Assert.Contains("warning", warning, StringComparison.OrdinalIgnoreCase);
+Assert.Contains("Serilog:MinimumLevel", warning);
+Assert.Contains("ScadaLink:Logging:MinimumLevel", warning);
+Assert.Contains("Debug", warning);
+Assert.Contains("Warning", warning);
+}
+[Fact]
+public void Build_OnlyScadaLinkMinimumLevelSet_NoOverrideWarning()
+{
+var writer = new StringWriter();
+var configuration = new ConfigurationBuilder()
+.AddInMemoryCollection(new Dictionary<string, string?>
+{
+["ScadaLink:Logging:MinimumLevel"] = "Warning",
+})
+.Build();
+LoggerConfigurationFactory.Build(configuration, "Central", "central", "node1", writer);
+// No Serilog override -> no override-warning. (The ScadaLink value is
+// a recognised level, so ParseLevel is silent too.)
+Assert.Empty(writer.ToString());
+}
}