fix(api-surface): close Theme 9 — 27 naming / dead-code / config / hygiene findings

The largest themed batch — small mechanical fixes across 11 modules.

API / message hygiene:
- Comm-020: SiteAddressCacheLoaded now carries IReadOnlyDictionary /
  IReadOnlyList — Akka messages must be immutable.
- Commons-016: BundleSession.MaxUnlockAttempts named constant replaces
  magic 3.
- Commons-018: IOperationTrackingStore + IPartitionMaintenance moved from
  Interfaces/ root to Interfaces/Services/ (namespace preserved — 9
  consumers exceeded the in-prompt move threshold).
- Commons-023: TrackingStatusSnapshot.SourceNode now consistent with the
  trailing-optional-with-default pattern used elsewhere.
- SR-022: AuditingDbCommand.DbConnection.set no longer uses reflection —
  exposes AuditingDbConnection.Inner via internal API surface.
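
A minimal sketch of the Comm-020 idea, not the real message type: the
record name, its single member and the From helper below are all
hypothetical. It shows why the read-only interfaces are paired with a
defensive copy at construction time.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for SiteAddressCacheLoaded; the member name is an assumption.
public sealed record SiteAddressCacheLoadedSketch(
    IReadOnlyDictionary<string, IReadOnlyList<string>> AddressesBySite)
{
    // Copies the caller's collections so later mutation by the sender cannot
    // change a message that is already in flight.
    public static SiteAddressCacheLoadedSketch From(
        IDictionary<string, List<string>> source)
    {
        var copy = new Dictionary<string, IReadOnlyList<string>>();
        foreach (var (site, addresses) in source)
            copy[site] = addresses.ToArray();
        return new SiteAddressCacheLoadedSketch(copy);
    }
}
```

The read-only interfaces only hide the mutating members; it is the copy
that isolates the message from the sender's own collections.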

Dead code / config cleanup:
- ClusterInfra-011: decorative SectionName constant deleted.
- ClusterInfra-014: dead AddClusterInfrastructureActors method + its
  "throws-when-called" test deleted.
- Host-021: Microsoft Logging:LogLevel block deleted from appsettings.json
  (dead under Serilog).

Fail-loud over fail-silent:
- DM-021: ResolveSiteIdentifierAsync throws on missing site (was silently
  substituting a DB id).
- DM-022: dropped transient Pending write — record now lands directly in
  InProgress (no UI flicker, one fewer DB write).
- Host-020: LoggerConfigurationFactory emits a Console.Error warning when
  both Serilog:MinimumLevel and ScadaLink:Logging:MinimumLevel are set
  (ScadaLink remains truth per Host-011).
- SnF-022: NotifyCachedCallObserverAsync logs Warning on unparseable
  TrackedOperationId (was silently dropping).
- SnF-023: empty siteId default replaced with $unknown-site sentinel
  + constructor normalisation.
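
A rough sketch of the DM-021 change in isolation. The in-memory
dictionary stands in for the site repository, and the method shape and
exception message are assumptions; only the exception type and the fact
that the message names the unresolved id come from this commit's tests.

```csharp
using System;
using System.Collections.Generic;

public static class SiteResolverSketch
{
    // Hypothetical in-memory lookup standing in for the site repository.
    public static string ResolveSiteIdentifier(
        IReadOnlyDictionary<int, string> sites, int siteId)
    {
        if (sites.TryGetValue(siteId, out var identifier))
            return identifier;

        // Fail loud: name the unresolved id instead of returning siteId.ToString().
        throw new InvalidOperationException($"Site with id {siteId} not found.");
    }
}
```

Returning the numeric id as a string would let the bad value travel
until routing rejected it; throwing here keeps the error next to its
cause.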

Correctness:
- SCA-001: SupervisorStrategy XML doc comment rewritten to match actual
  DefaultDecider/Restart semantics (was claiming Resume).
- SCA-003: OnUpsertAsync now restamps IngestedAtUtc on every upsert.
- SR-021: HandleDeployArtifacts now dispatches an internal
  ApplyArtifactDataConnectionsToDcl message after the SQLite write so
  system-wide artifact-deploy data-connection changes go live
  immediately (was requiring a site restart).
- SnF-020: RetryParkedMessageAsync captures the parked row BEFORE the
  local write so a concurrent delete can't skip standby replication.

Sentinels / naming collisions:
- HM-021: CentralSiteId changed from "central" to "$central"
  (cannot collide: a leading $ is forbidden in real SiteIdentifiers).
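
Why the "$" prefix rules out a collision, as a small sketch. The
validator is hypothetical, and its whitespace check is an added
assumption; only the constant's value and the leading-$ rule come from
this commit.

```csharp
public static class SiteIdentifierSketch
{
    public const string CentralSiteId = "$central";

    // Assumed rule from the commit text: real identifiers never start with '$'.
    // The null/whitespace rejection is an extra assumption for the sketch.
    public static bool IsValidRealSiteIdentifier(string value) =>
        !string.IsNullOrWhiteSpace(value) && value[0] != '$';
}
```

Any value that passes the validator differs from CentralSiteId in its
first character, so a real site named "central" no longer aliases the
sentinel.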

Doc / surface cleanups:
- SEL-018: FailedWriteCount promoted to ISiteEventLogger; XML softened
  to "Available for future Health Monitoring integration".
- SnF-019: VERIFY outcome — documented parking-after-DefaultMaxRetries
  in Component-StoreAndForward.md + DefaultMaxRetries XML (uniform
  cap; maxRetries:0 is the unbounded escape hatch).
- SnF-021: Component-StoreAndForward.md no longer claims the tracking
  table lives in SnF — it's in SiteRuntime, the interface is in Commons.
- CLI-020: bundle export response parse guarded with try/catch on
  JsonException / KeyNotFoundException / FormatException — emits a
  clean INVALID_RESPONSE exit instead of a stack trace.
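
The CLI-020 guard could look roughly like the sketch below. The
"bundleBase64" field name, the numeric exit codes and the method shape
are assumptions for illustration; only the three exception types and
the INVALID_RESPONSE label come from this commit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class BundleExportResponseSketch
{
    public const int ExitOk = 0;
    public const int ExitInvalidResponse = 2; // hypothetical exit code value

    public static int TryReadBundle(string responseBody, out byte[] bundle)
    {
        bundle = Array.Empty<byte>();
        try
        {
            // JsonException on malformed JSON.
            using var doc = JsonDocument.Parse(responseBody);
            // KeyNotFoundException when the property is absent.
            var field = doc.RootElement.GetProperty("bundleBase64");
            // FormatException when the text is not valid base64.
            bundle = Convert.FromBase64String(field.GetString() ?? "");
            return ExitOk;
        }
        catch (Exception ex) when (ex is JsonException or KeyNotFoundException or FormatException)
        {
            Console.Error.WriteLine($"INVALID_RESPONSE: {ex.Message}");
            return ExitInvalidResponse;
        }
    }
}
```

The filter is deliberately narrow: a root that is not a JSON object, or
a non-string field, raises InvalidOperationException and still
propagates in this sketch.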

Config:
- ClusterInfra-013: intent comment added to "catastrophic config" test.
- Host-016: appsettings.Site.json second CentralContactPoints entry
  removed (was pointing at the SITE's own port); doc-key explains how
  to extend.
- Host-018: NodeName added to both shipped per-role configs (was
  causing SourceNode to be null on audit rows).

UI:
- CentralUI-029: replaced JS.InvokeAsync<int>("eval", …) with an ES
  module import (new wwwroot/js/browser-time.js).
- CentralUI-032: AuditResultsGrid gains a Previous button backed by a
  cursor stack.
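
One way a cursor stack can back a Previous button, as a sketch. The
class and its members are hypothetical and are not taken from
AuditResultsGrid; cursors are treated as opaque strings, with null
meaning the first page.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical pager: keyset cursors are opaque strings; null means "first page".
public sealed class CursorPagerSketch
{
    private readonly Stack<string?> _previous = new();

    public string? Current { get; private set; }
    public bool CanGoPrevious => _previous.Count > 0;

    // Call when the user clicks Next, passing the cursor the server returned.
    public void MoveNext(string nextCursor)
    {
        _previous.Push(Current);
        Current = nextCursor;
    }

    // Call when the user clicks Previous; returns the cursor to re-query with.
    public string? MovePrevious()
    {
        if (_previous.Count == 0) return Current;
        Current = _previous.Pop();
        return Current;
    }
}
```

Keyset cursors only point forward, so the grid has to remember the
cursors it has already visited; popping the stack replays them in
reverse order.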

10+ new regression tests across the affected projects. Build clean;
all suites green. README regenerated: 6 open (was 33).

Session-to-date: 130 of 136 originally-open Theme findings closed.
Joseph Doherty
2026-05-28 08:39:01 -04:00
parent d190345ef0
commit 77cb0ad0e2
46 changed files with 966 additions and 278 deletions
@@ -5,6 +5,7 @@ using Microsoft.Extensions.Options;
using NSubstitute;
using ScadaLink.Commons.Entities.Deployment;
using ScadaLink.Commons.Entities.Instances;
using ScadaLink.Commons.Entities.Sites;
using ScadaLink.Commons.Interfaces.Repositories;
using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.Commons.Messages.Deployment;
@@ -44,7 +45,7 @@ public class DeploymentServiceTests : TestKit
OperationLockTimeout = TimeSpan.FromSeconds(5)
});
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
_service = new DeploymentService(
_repo, siteRepo, _pipeline, _comms, _lockManager, _audit,
new DiffService(),
@@ -101,6 +102,91 @@ public class DeploymentServiceTests : TestKit
Assert.Contains("Validation failed", result.Error);
}
// ── DeploymentManager-021: missing Site row -> hard failure, no silent fabrication ──
[Fact]
public async Task DeployInstanceAsync_SiteRowMissing_FailsLoudlyInsteadOfSilentlySubstituting()
{
// DeploymentManager-021 regression: previously ResolveSiteIdentifierAsync
// silently returned the numeric siteId rendered as a string when the
// site row was missing (FK was deleted, race with admin delete, DB
// inconsistency). That bogus identifier then surfaced downstream as a
// confusing "unknown site" routing error that hid the real cause.
//
// After the fix the resolver throws InvalidOperationException naming
// the unresolved id; on the deploy path the existing try/catch turns
// it into a Failed deployment record whose error message reflects the
// actual problem.
var instance = new Instance("OrphanInst")
{
Id = 99, SiteId = 42, State = InstanceState.NotDeployed
};
_repo.GetInstanceByIdAsync(99, Arg.Any<CancellationToken>()).Returns(instance);
SetupValidPipeline(99, "OrphanInst", "sha256:abc");
// Build a fresh service whose siteRepo explicitly returns null for the
// instance's SiteId (the helper above seeds every id, so we shadow it
// for SiteId=42 only).
var siteRepo = CreateSiteRepoStub();
siteRepo.GetSiteByIdAsync(42, Arg.Any<CancellationToken>()).Returns((Site?)null);
var service = new DeploymentService(
_repo, siteRepo, _pipeline, _comms, _lockManager, _audit,
new DiffService(),
new DeploymentStatusNotifier(NullLogger<DeploymentStatusNotifier>.Instance),
Options.Create(new DeploymentManagerOptions { OperationLockTimeout = TimeSpan.FromSeconds(5) }),
NullLogger<DeploymentService>.Instance);
var result = await service.DeployInstanceAsync(99, "admin");
Assert.True(result.IsFailure);
// The descriptive message names the unresolved id so the operator sees
// the actual problem (missing site row), not a downstream routing error.
Assert.Contains("42", result.Error);
Assert.Contains("not found", result.Error, StringComparison.OrdinalIgnoreCase);
}
// ── DeploymentManager-022: no transient Pending -> single InProgress insert ──
[Fact]
public async Task DeployInstanceAsync_NoTransientPendingWrite_RecordCreatedDirectlyInProgress()
{
// DeploymentManager-022 regression: previously the deploy path wrote
// the record as Pending, then immediately updated it to InProgress
// with no work in between — an extra SaveChangesAsync round-trip, an
// extra notifier push, and a Pending->InProgress flicker in the
// CentralUI deployment-status page. After the fix the record is
// inserted directly in InProgress (one Add + one notify); no
// intermediate Pending row is ever persisted or notified.
var instance = new Instance("DirectInProgressInst")
{
Id = 200, SiteId = 1, State = InstanceState.NotDeployed
};
_repo.GetInstanceByIdAsync(200, Arg.Any<CancellationToken>()).Returns(instance);
SetupValidPipeline(200, "DirectInProgressInst", "sha256:dp22");
// The catch path later flips the same record reference to Failed, so
// snapshot the Status at insert time rather than reading the live
// reference at assertion time.
DeploymentStatus? statusAtInsert = null;
await _repo.AddDeploymentRecordAsync(
Arg.Do<DeploymentRecord>(r => statusAtInsert = r.Status), Arg.Any<CancellationToken>());
// The communication actor is unset so the call throws after the insert;
// we only care about the status the insert was made with.
await _service.DeployInstanceAsync(200, "admin");
// The single Add happens with the record already in InProgress.
Assert.NotNull(statusAtInsert);
Assert.Equal(DeploymentStatus.InProgress, statusAtInsert!.Value);
// No Pending update was issued: the deploy path never wrote the
// intermediate Pending row.
await _repo.DidNotReceive().UpdateDeploymentRecordAsync(
Arg.Is<DeploymentRecord>(r => r.Status == DeploymentStatus.Pending),
Arg.Any<CancellationToken>());
}
// ── WP-2: Deployment identity ──
[Fact]
@@ -581,7 +667,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor);
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
return new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(),
@@ -598,6 +684,30 @@ public class DeploymentServiceTests : TestKit
new FlatteningPipelineResult(config, revisionHash, ValidationResult.Success())));
}
/// <summary>
/// DeploymentManager-021 test helper: returns an <see cref="ISiteRepository"/>
/// substitute that resolves <see cref="ISiteRepository.GetSiteByIdAsync"/>
/// for ANY integer id to a stub <see cref="Site"/> whose
/// <c>SiteIdentifier</c> is <c>"site-{id}"</c>. Prior to the
/// DeploymentManager-021 fix the production <c>ResolveSiteIdentifierAsync</c>
/// silently substituted the numeric id when the site row was missing, so
/// these tests passed without seeding any Sites. After the fix a missing
/// site throws — every test that drives a deploy/lifecycle path needs a
/// real-shaped <see cref="Site"/> back, and this helper centralises that
/// arrangement so individual tests don't repeat the boilerplate.
/// </summary>
private static ISiteRepository CreateSiteRepoStub()
{
var siteRepo = Substitute.For<ISiteRepository>();
siteRepo.GetSiteByIdAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
.Returns(callInfo =>
{
var id = callInfo.ArgAt<int>(0);
return new Site($"Test Site {id}", $"site-{id}") { Id = id };
});
return siteRepo;
}
[Fact]
public async Task DeployInstanceAsync_PriorInProgressRecord_SiteHasTargetHash_MarksSuccessWithoutRedeploy()
{
@@ -994,7 +1104,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor);
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(),
@@ -1049,7 +1159,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor);
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
var deadline = TimeSpan.FromMilliseconds(300);
var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit,
@@ -1098,7 +1208,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor);
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(),
@@ -1143,7 +1253,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor);
-var siteRepo = Substitute.For<ISiteRepository>();
+var siteRepo = CreateSiteRepoStub();
var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(),
@@ -4,6 +4,7 @@ using Microsoft.Extensions.Options;
using NSubstitute;
using ScadaLink.Commons.Entities.Deployment;
using ScadaLink.Commons.Entities.Instances;
using ScadaLink.Commons.Entities.Sites;
using ScadaLink.Commons.Interfaces.Repositories;
using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.Commons.Types;
@@ -46,7 +47,17 @@ public class DeploymentStatusNotifierTests : TestKit
OperationLockTimeout = TimeSpan.FromSeconds(5)
});
// DeploymentManager-021: the resolver now throws when the site row
// is missing, so seed the substitute to return a real-shaped Site for
// any id these tests touch.
var siteRepo = Substitute.For<ISiteRepository>();
siteRepo.GetSiteByIdAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
.Returns(callInfo =>
{
var id = callInfo.ArgAt<int>(0);
return new Site($"Test Site {id}", $"site-{id}") { Id = id };
});
_service = new DeploymentService(
_repo, siteRepo, _pipeline, _comms, _lockManager, _audit,
new DiffService(), _notifier, options,
@@ -68,14 +79,20 @@ public class DeploymentStatusNotifierTests : TestKit
_notifier.StatusChanged += c => changes.Add(c);
// _comms has no actor set, so the deploy reaches the catch block and
-// the record ends Failed. The notifier must fire for the Pending,
-// InProgress and Failed writes — not be silent (the pre-fix behaviour).
+// the record ends Failed. The notifier must fire for the InProgress
+// and Failed writes — not be silent (the pre-fix behaviour).
//
// DeploymentManager-022: the transient Pending write was dropped from
// the deploy path (the record is now created directly in InProgress),
// so there is no Pending notification any more. The remaining two
// writes — the initial InProgress insert and the catch-block Failed
// update — must each raise a status-change.
var result = await _service.DeployInstanceAsync(7, "admin");
Assert.True(result.IsFailure);
Assert.NotEmpty(changes);
Assert.All(changes, c => Assert.Equal(7, c.InstanceId));
-Assert.Contains(changes, c => c.Status == DeploymentStatus.Pending);
+Assert.DoesNotContain(changes, c => c.Status == DeploymentStatus.Pending);
Assert.Contains(changes, c => c.Status == DeploymentStatus.InProgress);
Assert.Contains(changes, c => c.Status == DeploymentStatus.Failed);