fix(api-surface): close Theme 9 — 27 naming / dead-code / config / hygiene findings

The largest themed batch — small mechanical fixes across 11 modules.

API / message hygiene:
- Comm-020: SiteAddressCacheLoaded now carries IReadOnlyDictionary /
  IReadOnlyList — Akka messages must be immutable.
- Commons-016: BundleSession.MaxUnlockAttempts named constant replaces
  magic 3.
- Commons-018: IOperationTrackingStore + IPartitionMaintenance moved from
  Interfaces/ root to Interfaces/Services/ (namespace preserved — 9
  consumers exceeded the in-prompt move threshold).
- Commons-023: TrackingStatusSnapshot.SourceNode now consistent with the
  trailing-optional-with-default pattern used elsewhere.
- SR-022: AuditingDbCommand.DbConnection.set no longer uses reflection —
  exposes AuditingDbConnection.Inner via internal API surface.

Dead code / config cleanup:
- ClusterInfra-011: decorative SectionName constant deleted.
- ClusterInfra-014: dead AddClusterInfrastructureActors method + its
  "throws-when-called" test deleted.
- Host-021: Microsoft Logging:LogLevel block deleted from appsettings.json
  (dead under Serilog).

Fail-loud over fail-silent:
- DM-021: ResolveSiteIdentifierAsync throws on missing site (was silently
  substituting a DB id).
- DM-022: dropped transient Pending write — record now lands directly in
  InProgress (no UI flicker, one fewer DB write).
- Host-020: LoggerConfigurationFactory emits a Console.Error warning when
  both Serilog:MinimumLevel and ScadaLink:Logging:MinimumLevel are set
  (ScadaLink remains truth per Host-011).
- SnF-022: NotifyCachedCallObserverAsync logs Warning on unparseable
  TrackedOperationId (was silently dropping).
- SnF-023: empty siteId default replaced with $unknown-site sentinel
  + constructor normalisation.

Correctness:
- SCA-001: SupervisorStrategy XML rewritten to match actual
  DefaultDecider/Restart semantics (was claiming Resume).
- SCA-003: OnUpsertAsync now restamps IngestedAtUtc on every upsert.
- SR-021: HandleDeployArtifacts now dispatches an internal
  ApplyArtifactDataConnectionsToDcl message after the SQLite write so
  system-wide artifact-deploy data-connection changes go live
  immediately (was requiring a site restart).
- SnF-020: RetryParkedMessageAsync captures the parked row BEFORE the
  local write so a concurrent delete can't skip standby replication.

Sentinels / naming collisions:
- HM-021: CentralSiteId changed from "central" to "$central"
  (uncollideable — leading $ is forbidden in real SiteIdentifiers).

Doc / surface cleanups:
- SEL-018: FailedWriteCount promoted to ISiteEventLogger; XML softened
  to "Available for future Health Monitoring integration".
- SnF-019: VERIFY outcome — documented parking-after-DefaultMaxRetries
  in Component-StoreAndForward.md + DefaultMaxRetries XML (uniform
  cap; maxRetries:0 is the unbounded escape hatch).
- SnF-021: Component-StoreAndForward.md no longer claims the tracking
  table lives in SnF — it's in SiteRuntime, the interface is in Commons.
- CLI-020: bundle export response parse guarded with try/catch on
  JsonException / KeyNotFoundException / FormatException — emits a
  clean INVALID_RESPONSE exit instead of a stack trace.

Config:
- ClusterInfra-013: intent comment added to "catastrophic config" test.
- Host-016: appsettings.Site.json second CentralContactPoints entry
  removed (was pointing at the SITE's own port); doc-key explains how
  to extend.
- Host-018: NodeName added to both shipped per-role configs (was
  causing SourceNode to be null on audit rows).

UI:
- CentralUI-029: replaced JS.InvokeAsync<int>("eval", …) with an ES
  module import (new wwwroot/js/browser-time.js).
- CentralUI-032: AuditResultsGrid gains a Previous button backed by a
  cursor stack.

10+ new regression tests across the affected projects. Build clean;
all suites green. README regenerated: 6 open (was 33).

Session-to-date: 130 of 136 originally-open Theme findings closed.
This commit is contained in:
Joseph Doherty
2026-05-28 08:39:01 -04:00
parent d190345ef0
commit 77cb0ad0e2
46 changed files with 966 additions and 278 deletions
+4 -6
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 2 | | Open findings | 1 |
## Summary ## Summary
@@ -931,9 +931,11 @@ _Unresolved._
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Correctness & logic bugs | | Category | Correctness & logic bugs |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.CLI/Commands/BundleCommands.cs:117-126` | | Location | `src/ScadaLink.CLI/Commands/BundleCommands.cs:117-126` |
**Resolution (2026-05-28):** Wrapped the `JsonDocument.Parse` + `GetProperty` extraction in a `try/catch (JsonException or KeyNotFoundException or InvalidOperationException)` block and the `StreamBase64ToFile` call in a separate `try/catch (FormatException)`. Either failure now emits a clean `OutputFormatter.WriteError(..., "INVALID_RESPONSE")` and returns exit 1, matching the graceful-degradation pattern established by CLI-002 / CLI-003 / CLI-005. A malformed/abbreviated envelope no longer terminates the CLI with a raw stack trace.
**Description** **Description**
The export success handler does: The export success handler does:
@@ -959,10 +961,6 @@ Wrap the parse + base64-decode in a `try` block that catches `JsonException`,
clean `OutputFormatter.WriteError(..., "INVALID_RESPONSE")` + `return 1`. Add a clean `OutputFormatter.WriteError(..., "INVALID_RESPONSE")` + `return 1`. Add a
regression test against a malformed-envelope stub `HttpMessageHandler`. regression test against a malformed-envelope stub `HttpMessageHandler`.
**Resolution**
_Unresolved._
### CLI-021 — `CliConfig.Load` crashes the CLI on a malformed config file ### CLI-021 — `CliConfig.Load` crashes the CLI on a malformed config file
| | | | | |
+7 -3
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 2 | | Open findings | 0 |
## Summary ## Summary
@@ -1429,9 +1429,11 @@ still passes (568 / 568).
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Audit/ConfigurationAuditLog.razor:248-263` | | Location | `src/ScadaLink.CentralUI/Components/Pages/Audit/ConfigurationAuditLog.razor:248-263` |
**Resolution (2026-05-28):** Added a small 5-line `wwwroot/js/browser-time.js` ES module exporting `getTimezoneOffsetMinutes()`, and replaced the `JS.InvokeAsync<int>("eval", "new Date().getTimezoneOffset()")` call in `ConfigurationAuditLog.OnAfterRenderAsync` with a lazy `IJSObjectReference` import (`./_content/ScadaLink.CentralUI/js/browser-time.js`) + `module.InvokeAsync<int>("getTimezoneOffsetMinutes")`, matching the `session-expiry.js` / `audit-grid.js` / `nav-state.js` / `transport.js` module-import pattern. The residual `eval` JS-interop surface is gone and the page is now CSP-compatible with `unsafe-eval` forbidden.
**Description** **Description**
`OnAfterRenderAsync` fetches the browser's UTC offset with `OnAfterRenderAsync` fetches the browser's UTC offset with
@@ -1533,9 +1535,11 @@ docs to call out the in-memory cost per concurrent import session.
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Design-document adherence | | Category | Design-document adherence |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor:76-82`; `src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor.cs:65,196-197,219-220` | | Location | `src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor:76-82`; `src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor.cs:65,196-197,219-220` |
**Resolution (2026-05-28):** Added a `Stack<AuditLogPaging?> _cursorStack` and `AuditLogPaging? _currentPaging` field to `AuditResultsGrid.razor.cs`. `NextPage` now pushes the current cursor before advancing; a new `PrevPage` method pops the prior cursor, reloads at that position, and decrements `_pageNumber` only if the reload succeeds (a failed fetch leaves the user on the current page rather than stranding them between pages). The filter-change reset clears the stack alongside `_rows`. The razor template now renders a `btn-group` with a Previous button (gated on `CanGoBack`) alongside the existing Next button; both buttons get the standard `disabled` treatment during loads.
**Description** **Description**
The Audit Log results grid (Bundle B / M7-T3) renders a single "Next page" button The Audit Log results grid (Bundle B / M7-T3) renders a single "Next page" button
+11 -18
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 4 | | Open findings | 1 |
## Summary ## Summary
@@ -687,8 +687,10 @@ confirmed failing, then passing after the fix. Module test suite green (18 passe
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:24-27`, `src/ScadaLink.Host/SiteServiceRegistration.cs:100`, `src/ScadaLink.Host/StartupValidator.cs:43`, `src/ScadaLink.Host/StartupValidator.cs:45`, `src/ScadaLink.Host/StartupValidator.cs:75` | | Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:24-27`, `src/ScadaLink.Host/SiteServiceRegistration.cs:100`, `src/ScadaLink.Host/StartupValidator.cs:43`, `src/ScadaLink.Host/StartupValidator.cs:75` |
**Resolution (2026-05-28):** Took option (b) since wiring the constant into the Host's `SiteServiceRegistration.BindSharedOptions` / `StartupValidator` is outside this module's editable surface — deleted the `SectionName` constant from `ClusterOptions.cs` and the companion `SectionName_IsTheExpectedAppSettingsSection` test from `ClusterOptionsTests.cs`. The Host's `"ScadaLink:Cluster"` literals now stand alone (consistent with the implementation rather than the broken "single source of truth" claim). A code-comment placeholder records the rationale so a future Host-side change can reinstate the constant alongside the binding-site updates.
**Description** **Description**
@@ -722,11 +724,6 @@ Either (a) replace the hard-coded `"ScadaLink:Cluster"` literals in
claim to be the source of truth. Do not leave a public constant whose stated claim to be the source of truth. Do not leave a public constant whose stated
guarantee the code does not deliver. guarantee the code does not deliver.
**Resolution**
_Open — needs a one-line Host-side change to reference the constant, plus a test
that proves the section name flows from this module to the Host._
### ClusterInfrastructure-012 — Validator accepts `SeedNodes.Count == 1` despite design requiring both nodes as seeds ### ClusterInfrastructure-012 — Validator accepts `SeedNodes.Count == 1` despite design requiring both nodes as seeds
| | | | | |
@@ -791,9 +788,11 @@ ClusterInfrastructure.Tests).
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Documentation & comments | | Category | Documentation & comments |
| Status | Open | | Status | Resolved |
| Location | `tests/ScadaLink.ClusterInfrastructure.Tests/ClusterOptionsTests.cs:47-67` | | Location | `tests/ScadaLink.ClusterInfrastructure.Tests/ClusterOptionsTests.cs:47-67` |
**Resolution (2026-05-28):** Added a 10-line inline `// ClusterInfra-013: ...` block at the top of `Properties_CanBeSetToCustomValues` explicitly recording that this test exercises the POCO property setters only — the `keep-majority` strategy and `MinNrOfMembers = 2` values are explicitly forbidden in production by `ClusterOptionsValidator`, and the comment cross-references `UnsupportedSplitBrainStrategy_FailsValidation` and `MinNrOfMembers_NotOne_FailsValidation` so a future reader cannot misread the test as endorsing those values.
**Description** **Description**
`ClusterOptionsTests.Properties_CanBeSetToCustomValues` deliberately sets two values `ClusterOptionsTests.Properties_CanBeSetToCustomValues` deliberately sets two values
@@ -822,19 +821,17 @@ represent a valid runtime configuration, and `ClusterOptionsValidator` rejects t
(with a cross-reference to the relevant validator tests). Two lines is enough; the (with a cross-reference to the relevant validator tests). Two lines is enough; the
goal is to make the test's intent self-documenting. goal is to make the test's intent self-documenting.
**Resolution**
_Open._
### ClusterInfrastructure-014 — `AddClusterInfrastructureActors` is dead surface — no caller, no behaviour ### ClusterInfrastructure-014 — `AddClusterInfrastructureActors` is dead surface — no caller, no behaviour
| | | | | |
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:42-48` | | Location | `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:42-48` |
**Resolution (2026-05-28):** Deleted the `AddClusterInfrastructureActors` extension method from `ServiceCollectionExtensions.cs` and its companion `AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding` test from `ServiceCollectionExtensionsTests.cs`. Verified no production caller existed before deletion via `grep -rn`. A code comment records the rationale (CI-001 ownership question now permanently settled; method served only to throw and was IDE-auto-complete noise). The class-level XML doc on the test file was updated to drop the stale reference to the removed test.
**Description** **Description**
`AddClusterInfrastructureActors` has now reached a curious state: it is a public `AddClusterInfrastructureActors` has now reached a curious state: it is a public
@@ -860,7 +857,3 @@ explicitly stating that this project exposes no actor-registration extension
(actor wiring lives in `ScadaLink.Host`). If the user prefers to keep the (actor wiring lives in `ScadaLink.Host`). If the user prefers to keep the
"fail-fast" trap, mark the method `[Obsolete(true, error: true)]` so the compiler — "fail-fast" trap, mark the method `[Obsolete(true, error: true)]` so the compiler —
not the runtime — rejects the call. not the runtime — rejects the call.
**Resolution**
_Open._
+10 -4
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 5 | | Open findings | 2 |
## Summary ## Summary
@@ -784,9 +784,11 @@ accepted values on the record.
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.Commons/Types/Transport/BundleSession.cs:13-16` | | Location | `src/ScadaLink.Commons/Types/Transport/BundleSession.cs:13-16` |
**Resolution (2026-05-28):** Added `public const int MaxUnlockAttempts = 3;` to `BundleSession` with an XML doc cross-referencing the authoritative `TransportOptions.MaxUnlockAttemptsPerSession`. The `Locked` getter now reads `FailedUnlockAttempts >= MaxUnlockAttempts` instead of comparing against the literal `3`, and the property's XML doc names the constant. No call-site update required — the existing Transport-component `TransportOptions.MaxUnlockAttemptsPerSession` (also `3`) remains the operator-facing dial; this constant is the shim's own threshold, now searchable for a security review.
**Description** **Description**
`BundleSession` exposes: `BundleSession` exposes:
@@ -880,9 +882,11 @@ needed again now.
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.Commons/Interfaces/IOperationTrackingStore.cs`, `src/ScadaLink.Commons/Interfaces/IPartitionMaintenance.cs` | | Location | `src/ScadaLink.Commons/Interfaces/IOperationTrackingStore.cs`, `src/ScadaLink.Commons/Interfaces/IPartitionMaintenance.cs` |
**Resolution (2026-05-28):** Moved both files into `src/ScadaLink.Commons/Interfaces/Services/`, matching the REQ-COM-5b sub-folder convention alongside the other service interfaces (`ISiteAuditQueue`, `INodeIdentityProvider`, `ICachedCallLifecycleObserver`, etc.). The 9 consumer files across `ScadaLink.SiteRuntime`, `ScadaLink.AuditLog`, `ScadaLink.ConfigurationDatabase`, and `ScadaLink.Host` exceed the in-instructions 8-file STOP threshold for namespace rewrites, so the namespace was deliberately kept as `ScadaLink.Commons.Interfaces` (not `.Services`) — no consumer change required, build remains green. A comment in each moved file records the rationale and notes that adopting the canonical `.Services` namespace can be picked up alongside any future Commons-wide namespace tidy-up.
**Description** **Description**
REQ-COM-5b documents the `Interfaces/` folder as having exactly three sub-folders: REQ-COM-5b documents the `Interfaces/` folder as having exactly three sub-folders:
@@ -1115,9 +1119,11 @@ Two related XML-doc weaknesses, both around the new Transport / Audit surface:
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Akka.NET conventions | | Category | Akka.NET conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.Commons/Messages/Audit/SiteCallQueries.cs:53-66`, `:110-123`, `src/ScadaLink.Commons/Messages/Notification/NotificationOutboxQueries.cs:26-39`, `:104-123`, `src/ScadaLink.Commons/Types/SiteCallOperational.cs:42-54`, `src/ScadaLink.Commons/Types/TrackingStatusSnapshot.cs:33-46` | | Location | `src/ScadaLink.Commons/Messages/Audit/SiteCallQueries.cs:53-66`, `:110-123`, `src/ScadaLink.Commons/Messages/Notification/NotificationOutboxQueries.cs:26-39`, `:104-123`, `src/ScadaLink.Commons/Types/SiteCallOperational.cs:42-54`, `src/ScadaLink.Commons/Types/TrackingStatusSnapshot.cs:33-46` |
**Resolution (2026-05-28):** Read all six locations and confirmed the dominant pattern is "trailing-optional with `= null` default" (`SiteCallSummary`, `SiteCallDetail`, `NotificationSummary`, `NotificationDetail`, `NotificationOutboxQueryRequest.SourceNodeFilter`, `SiteCallQueryRequest.SourceNodeFilter` all already use this form). The single odd-one-out was `TrackingStatusSnapshot.SourceNode`, declared as `string? SourceNode` with no default — added the `= null` default to unify it with the rest. Verified both existing callers (`OperationTrackingStore.cs` and `TrackingApiTests.cs`) use named arguments, so the change is purely additive. `SiteCallOperational.SourceNode` sits in the middle of its positional parameter list rather than the trailing slot — that's a separate positional-record concern outside the "trailing-optional" pattern the finding called out, and moving it would touch many telemetry/proto consumers, so it was deliberately not touched here.
**Description** **Description**
The `SourceNode` rollout adds an optional trailing parameter to a long list of positional The `SourceNode` rollout adds an optional trailing parameter to a long list of positional
+4 -2
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 3 | | Open findings | 2 |
## Summary ## Summary
@@ -1005,9 +1005,11 @@ the finding.
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Akka.NET conventions | | Category | Akka.NET conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs:567` | | Location | `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs:567` |
**Resolution (2026-05-28):** `SiteAddressCacheLoaded`'s `SiteContacts` payload is now typed as `IReadOnlyDictionary<string, IReadOnlyList<string>>`, enforcing the Akka.NET message-immutability convention at the type level rather than relying on producer discipline. The producer (`LoadSiteAddressesFromDb`) builds the working buckets as before and wraps each inner `List<string>` with `AsReadOnly()` before constructing the message — the freeze is local to the single refresh tick and the cost is negligible. The consumer (`HandleSiteAddressCacheLoaded`) only ever read via `Keys`, foreach-deconstruct, `Select`, `Count` and `ToImmutableHashSet`, all of which are supported by the new read-only types, so no consumer changes were needed. The existing `MalformedSiteAddress_DoesNotAbortRefresh_OtherSitesStillRegistered` and `ClusterClientRouting_RoutesToConfiguredSite` regression tests exercise the producer→consumer flow and continue to pass under the read-only types.
**Description** **Description**
The Akka.NET convention is that messages crossing actor boundaries (even The Akka.NET convention is that messages crossing actor boundaries (even
+7 -3
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 2 | | Open findings | 0 |
## Summary ## Summary
@@ -1076,9 +1076,11 @@ as the actor. Tests green (80/80 in DeploymentManager.Tests).
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Correctness & logic bugs | | Category | Correctness & logic bugs |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:107-111` | | Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:107-111` |
**Resolution (2026-05-28):** `ResolveSiteIdentifierAsync` now throws `InvalidOperationException` (`"Site with ID {siteId} not found; cannot resolve its SiteIdentifier for routing."`) when the `Site` row is missing, instead of returning the numeric id rendered as a string. The deploy path's existing try/catch turns the throw into a `DeploymentStatus.Failed` record carrying the descriptive message (the `DeploymentManager-001`/`-002` cleanup write the failure with `CancellationToken.None`); the lifecycle paths (Disable/Enable/Delete) propagate the exception so the CLI/UI caller surfaces the actual cause to the operator rather than seeing a confusing downstream "unknown site" routing error. The repository contract already returned `Site?`, so the null path is now type-visible at the call site instead of silently papered over.
**Description** **Description**
``` ```
@@ -1114,9 +1116,11 @@ returns `Site?`, so the null path is type-visible; just don't paper over it.
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:178-194` | | Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:178-194` |
**Resolution (2026-05-28):** The transient `Pending` write was dropped — the deployment record is now created directly in `DeploymentStatus.InProgress`, which collapses the start of the deploy into a single `AddDeploymentRecordAsync` + `SaveChangesAsync` + `NotifyStatusChange` (instead of two writes back-to-back). The flattening, validation, and `TryReconcileWithSiteAsync` round-trip have all completed before the insert, and the deploy command is sent immediately after, so `Pending` carried no operational meaning between the two writes. `InProgress` retains its documented "sent to site, awaiting response" semantics. Eliminating the extra `SaveChangesAsync` round-trip also removes the `Pending``InProgress` flicker the CentralUI-006 deployment-status page used to render via the second `IDeploymentStatusNotifier.NotifyStatusChanged` invocation.
**Description** **Description**
`DeployInstanceAsync` does: `DeployInstanceAsync` does:
+4 -2
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 3 | | Open findings | 2 |
## Summary ## Summary
@@ -1034,9 +1034,11 @@ online.
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Correctness & logic bugs | | Category | Correctness & logic bugs |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.HealthMonitoring/CentralHealthReportLoop.cs:22`, `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs:224-226` | | Location | `src/ScadaLink.HealthMonitoring/CentralHealthReportLoop.cs:22`, `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs:224-226` |
**Resolution (2026-05-28):** `CentralHealthReportLoop.CentralSiteId` is now `"$central"` instead of `"central"`. The leading `$` is forbidden in operator-set `Site.SiteIdentifier` values (which are plain identifiers), so the synthetic central self-report cannot collide with a real site whose identifier happens to be the bare word `"central"`. The collision case the finding called out — two reports clobbering each other in the aggregator keyspace via the sequence-number guard and a real site inheriting `CentralOfflineTimeout` and staying falsely-online for an extra two minutes — is now impossible. The aggregator (`CentralHealthAggregator.CheckForOfflineSites`), the Central UI health dashboard (`Monitoring/Health.razor`), and every test reference the constant rather than the literal string, so the value change is local — no consumer code needed updating. Existing `CentralHealthAggregatorTests` and `CentralHealthReportLoopTests` already use the constant, so they continue to pin the central-self-report identity through the new sentinel.
**Description** **Description**
`CentralHealthAggregator.CheckForOfflineSites` looks up the per-site offline `CentralHealthAggregator.CheckForOfflineSites` looks up the per-site offline
+9 -17
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 4 | | Open findings | 0 |
## Summary ## Summary
@@ -831,7 +831,7 @@ Full Host suite green (182 passed).
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Correctness & logic bugs | | Category | Correctness & logic bugs |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.Host/appsettings.Site.json:33-37` | | Location | `src/ScadaLink.Host/appsettings.Site.json:33-37` |
**Description** **Description**
@@ -861,9 +861,7 @@ multi-node layout). Consider extending `StartupValidator` to reject any
node's `NodeHostname`+`RemotingPort`. Add a regression test in node's `NodeHostname`+`RemotingPort`. Add a regression test in
`StartupValidatorTests` mirroring `Site_SeedNodeOnGrpcPort_FailsValidation`. `StartupValidatorTests` mirroring `Site_SeedNodeOnGrpcPort_FailsValidation`.
**Resolution** **Resolution (2026-05-28):** The shipped `appsettings.Site.json` `CentralContactPoints` entry that pointed at the site's own remoting port (`localhost:8082`) was removed — the dev-loopback default now lists only the single central node (`akka.tcp://scadalink@localhost:8081`), which is the actually-reachable target in the single-node dev layout. A `_centralContactPoints` doc-key comment was added immediately above the array calling out the per-entry rule (each entry MUST be a central node's remoting endpoint, not the site's own remoting port) and explaining how to extend the list with a second central node (`akka.tcp://scadalink@central-b-host:8081`) in a multi-central deployment so ClusterClient can fail over. The dangerous example pattern that would have been copied into multi-central configs no longer exists in the template. `StartupValidator` cross-check is left as a follow-up — the documented rule plus the corrected template removes the immediate misconfiguration risk.
_Open._
### Host-017 — Site-shutdown ordering from REQ-HOST-7 is not wired ### Host-017 — Site-shutdown ordering from REQ-HOST-7 is not wired
@@ -943,7 +941,7 @@ unit suite covers both server-side invariants and the wiring is a single
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.Host/appsettings.Central.json`, `src/ScadaLink.Host/appsettings.Site.json`, `src/ScadaLink.Host/NodeOptions.cs:10-16` | | Location | `src/ScadaLink.Host/appsettings.Central.json`, `src/ScadaLink.Host/appsettings.Site.json`, `src/ScadaLink.Host/NodeOptions.cs:10-16` |
**Description** **Description**
@@ -974,9 +972,7 @@ per-node in multi-node deployments. Consider validating in `StartupValidator`
that `NodeName` is non-empty, or accept the null and document explicitly that that `NodeName` is non-empty, or accept the null and document explicitly that
single-node dev deployments leave `SourceNode` null. single-node dev deployments leave `SourceNode` null.
**Resolution** **Resolution (2026-05-28):** The shipped per-role templates now set `ScadaLink:Node:NodeName``central-a` in `appsettings.Central.json` and `node-a` in `appsettings.Site.json` — so dev audit rows are stamped with a real `SourceNode` value (instead of `NodeIdentityProvider` normalising the missing key to `null`) and the indexed `IX_AuditLog_Node_Occurred` lookup actually narrows. A `_nodeName` doc-key comment was added beside each `Node` section explaining the convention (`central-a`/`central-b` for central, `node-a`/`node-b` for site), pointing at the docker per-node configs (which already overrode the field), and noting that the value must be overridden per-node in multi-node deployments and that an empty value still normalises to a `NULL` SourceNode. The shipped dev templates now match the per-node docker examples — a developer running the binary directly no longer sees a null `SourceNode`.
_Open._
### Host-019 — Migration `StartupRetry` call drops the host `CancellationToken` ### Host-019 — Migration `StartupRetry` call drops the host `CancellationToken`
@@ -1021,7 +1017,7 @@ _Open._
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Documentation & comments | | Category | Documentation & comments |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.Host/LoggerConfigurationFactory.cs:36-43` | | Location | `src/ScadaLink.Host/LoggerConfigurationFactory.cs:36-43` |
**Description** **Description**
@@ -1050,9 +1046,7 @@ current "ScadaLink:Logging" path and reject `Serilog:MinimumLevel` if present
(throw at startup so the operator sees the conflict). At minimum, expand the (throw at startup so the operator sees the conflict). At minimum, expand the
XML doc + REQ-HOST-8 to spell out the precedence explicitly. XML doc + REQ-HOST-8 to spell out the precedence explicitly.
**Resolution** **Resolution (2026-05-28):** `ScadaLink:Logging:MinimumLevel` is now the documented single source of truth for the Serilog floor (Host-011's `LoggingOptions` binding), and the precedence is made visible — `LoggerConfigurationFactory.Build` writes a one-shot warning to `Console.Error` when both `ScadaLink:Logging:MinimumLevel` and `Serilog:MinimumLevel` (or `Serilog:MinimumLevel:Default`) are present, naming both values and pointing the operator at the documented key. Order of operations is unchanged — `MinimumLevel.Is(...)` deliberately runs after `ReadFrom.Configuration(...)` so the ScadaLink value wins — but the silent-override behaviour is now loud. The class XML doc gained a Host-020 paragraph explicitly spelling out the precedence. A test-visible `Build(..., TextWriter warningWriter)` overload mirrors the `ParseLevel` Host-022 pattern so the warning can be asserted in unit tests; the production four-arg overload delegates with `Console.Error`.
_Open._
### Host-021 — Microsoft `Logging:LogLevel` section in `appsettings.json` is dead config under Serilog ### Host-021 — Microsoft `Logging:LogLevel` section in `appsettings.json` is dead config under Serilog
@@ -1060,7 +1054,7 @@ _Open._
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.Host/appsettings.json:2-6` | | Location | `src/ScadaLink.Host/appsettings.json:2-6` |
**Description** **Description**
@@ -1084,9 +1078,7 @@ explaining it is intentionally retained for non-Serilog tooling. Document the
authoritative location (`Serilog` + `ScadaLink:Logging`) in authoritative location (`Serilog` + `ScadaLink:Logging`) in
`Component-Host.md` REQ-HOST-8 if not already explicit. `Component-Host.md` REQ-HOST-8 if not already explicit.
**Resolution** **Resolution (2026-05-28):** Confirmed by repository-wide grep that no code reads `Logging:LogLevel` (the Host calls `builder.Host.UseSerilog()` which replaces the default `ILoggerFactory` setup with Serilog as the only provider), so the block was pure dead config. Removed the `Logging:LogLevel:Default = Information` block from `appsettings.json` and replaced it with a `_logging` doc-key comment explaining the rationale (Serilog is the sole provider) and pointing operators at the two authoritative keys: `ScadaLink:Logging:MinimumLevel` for the floor (bound to `LoggingOptions` per Host-011) and the `Serilog` section for sinks (Host-014's `ReadFrom.Configuration`). The Host-014 regression test (`SerilogSinkConfigTests.ShippedAppSettings_HasSerilogSection_WithConsoleAndFileSinks`) still asserts the surviving `Serilog` section's shape, so removing the Microsoft block did not break the existing pinning.
_Open._
### Host-022 — `ParseLevel` silently coerces unrecognised `MinimumLevel` to `Information` ### Host-022 — `ParseLevel` silently coerces unrecognised `MinimumLevel` to `Information`
+17 -44
View File
@@ -41,35 +41,35 @@ module file and counted in **Total**.
|----------|---------------| |----------|---------------|
| Critical | 0 | | Critical | 0 |
| High | 0 | | High | 0 |
| Medium | 13 | | Medium | 5 |
| Low | 20 | | Low | 1 |
| **Total** | **33** | | **Total** | **6** |
## Module Status ## Module Status
| Module | Last reviewed | Commit | Open (C/H/M/L) | Open | Total | | Module | Last reviewed | Commit | Open (C/H/M/L) | Open | Total |
|--------|---------------|--------|----------------|------|-------| |--------|---------------|--------|----------------|------|-------|
| [AuditLog](AuditLog/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/0 | 1 | 11 | | [AuditLog](AuditLog/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/0 | 1 | 11 |
| [CLI](CLI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 23 | | [CLI](CLI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
| [CentralUI](CentralUI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/2 | 2 | 33 | | [CentralUI](CentralUI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 33 |
| [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/3 | 3 | 14 | | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 14 |
| [Commons](Commons/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/3 | 3 | 23 | | [Commons](Commons/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
| [Communication](Communication/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 22 | | [Communication](Communication/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 22 |
| [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 24 | | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 24 |
| [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 22 | | [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 22 |
| [DeploymentManager](DeploymentManager/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/2 | 2 | 24 | | [DeploymentManager](DeploymentManager/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 24 |
| [ExternalSystemGateway](ExternalSystemGateway/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/0 | 1 | 23 | | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/0 | 1 | 23 |
| [HealthMonitoring](HealthMonitoring/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 23 | | [HealthMonitoring](HealthMonitoring/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
| [Host](Host/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/3 | 4 | 22 | | [Host](Host/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 22 |
| [InboundAPI](InboundAPI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 25 | | [InboundAPI](InboundAPI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 25 |
| [ManagementService](ManagementService/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 | | [ManagementService](ManagementService/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
| [NotificationOutbox](NotificationOutbox/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 10 | | [NotificationOutbox](NotificationOutbox/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 10 |
| [NotificationService](NotificationService/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 25 | | [NotificationService](NotificationService/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 25 |
| [Security](Security/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 21 | | [Security](Security/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 21 |
| [SiteCallAudit](SiteCallAudit/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/0 | 2 | 6 | | [SiteCallAudit](SiteCallAudit/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 6 |
| [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 23 | | [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/0 | 2 | 26 | | [SiteRuntime](SiteRuntime/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 26 |
| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/3/2 | 5 | 24 | | [StoreAndForward](StoreAndForward/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 24 |
| [TemplateEngine](TemplateEngine/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/3/0 | 3 | 22 | | [TemplateEngine](TemplateEngine/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/3/0 | 3 | 22 |
| [Transport](Transport/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 12 | | [Transport](Transport/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 12 |
@@ -88,45 +88,18 @@ _None open._
_None open._ _None open._
### Medium (13) ### Medium (5)
| ID | Module | Title | | ID | Module | Title |
|----|--------|-------| |----|--------|-------|
| AuditLog-001 | [AuditLog](AuditLog/findings.md) | Combined-telemetry transport is plumbed end-to-end but never invoked in production | | AuditLog-001 | [AuditLog](AuditLog/findings.md) | Combined-telemetry transport is plumbed end-to-end but never invoked in production |
| ExternalSystemGateway-020 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | `JsonElementToParameterValue` silently downcasts non-Int64 JSON numbers to `double`, losing precision for `decimal` SQL parameters on retry | | ExternalSystemGateway-020 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | `JsonElementToParameterValue` silently downcasts non-Int64 JSON numbers to `double`, losing precision for `decimal` SQL parameters on retry |
| Host-016 | [Host](Host/findings.md) | Site `CentralContactPoints` second entry targets the site's own remoting port |
| SiteCallAudit-001 | [SiteCallAudit](SiteCallAudit/findings.md) | SupervisorStrategy override is dead code; XML claims Resume that is not enforced |
| SiteCallAudit-003 | [SiteCallAudit](SiteCallAudit/findings.md) | `OnUpsertAsync` does not refresh `IngestedAtUtc`; direct-write callers must remember to stamp it |
| SiteRuntime-021 | [SiteRuntime](SiteRuntime/findings.md) | `HandleDeployArtifacts` updates `DataConnections` in SQLite but never sends `CreateConnectionCommand` to the DCL |
| SiteRuntime-022 | [SiteRuntime](SiteRuntime/findings.md) | `AuditingDbCommand.DbConnection.set` uses reflection to read `AuditingDbConnection._inner` |
| StoreAndForward-019 | [StoreAndForward](StoreAndForward/findings.md) | Notifications park after `DefaultMaxRetries` exhaustion, contradicting "retried until central acks" |
| StoreAndForward-020 | [StoreAndForward](StoreAndForward/findings.md) | `RetryParkedMessageAsync` skips standby replication when the message is deleted between local update and re-load |
| StoreAndForward-021 | [StoreAndForward](StoreAndForward/findings.md) | Design doc claims the Operation Tracking Table lives in StoreAndForward but the implementation is in SiteRuntime |
| TemplateEngine-018 | [TemplateEngine](TemplateEngine/findings.md) | `DiffService` reports no entries for added/removed/changed connections | | TemplateEngine-018 | [TemplateEngine](TemplateEngine/findings.md) | `DiffService` reports no entries for added/removed/changed connections |
| TemplateEngine-019 | [TemplateEngine](TemplateEngine/findings.md) | `TemplateResolver.BuildInheritanceChain` still uses the `0`-as-no-parent sentinel that was removed from `CycleDetector` | | TemplateEngine-019 | [TemplateEngine](TemplateEngine/findings.md) | `TemplateResolver.BuildInheritanceChain` still uses the `0`-as-no-parent sentinel that was removed from `CycleDetector` |
| TemplateEngine-020 | [TemplateEngine](TemplateEngine/findings.md) | `Create*` audit entries are written with `EntityId = "0"` before `SaveChangesAsync` populates the real key | | TemplateEngine-020 | [TemplateEngine](TemplateEngine/findings.md) | `Create*` audit entries are written with `EntityId = "0"` before `SaveChangesAsync` populates the real key |
### Low (20) ### Low (1)
| ID | Module | Title | | ID | Module | Title |
|----|--------|-------| |----|--------|-------|
| CLI-020 | [CLI](CLI/findings.md) | `bundle export` success-envelope parse is unguarded |
| CentralUI-029 | [CentralUI](CentralUI/findings.md) | `ConfigurationAuditLog` uses `JS.InvokeAsync<int>("eval", ...)` instead of a dedicated JS module |
| CentralUI-032 | [CentralUI](CentralUI/findings.md) | `AuditResultsGrid` paging is forward-only, no Previous button |
| ClusterInfrastructure-011 | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | `SectionName` constant is decorative — no binding site references it |
| ClusterInfrastructure-013 | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | Test uses catastrophic config values without an inline-intent comment |
| ClusterInfrastructure-014 | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | `AddClusterInfrastructureActors` is dead surface — no caller, no behaviour |
| Commons-016 | [Commons](Commons/findings.md) | `BundleSession.Locked` uses a magic `3` rather than a named constant |
| Commons-018 | [Commons](Commons/findings.md) | `IOperationTrackingStore` and `IPartitionMaintenance` are at the root of `Interfaces/` instead of `Interfaces/Services/` |
| Commons-023 | [Commons](Commons/findings.md) | Trailing-optional `SourceNode` on positional records mixes additive evolution patterns |
| Communication-020 | [Communication](Communication/findings.md) | `SiteAddressCacheLoaded` carries mutable `Dictionary`/`List` types |
| DeploymentManager-021 | [DeploymentManager](DeploymentManager/findings.md) | `ResolveSiteIdentifierAsync` silently substitutes the DB id when the site row is missing |
| DeploymentManager-022 | [DeploymentManager](DeploymentManager/findings.md) | `Pending` and `InProgress` are written back-to-back with no intervening work |
| HealthMonitoring-021 | [HealthMonitoring](HealthMonitoring/findings.md) | `CentralSiteId = "central"` reserved constant silently collides with a real site named "central" |
| Host-018 | [Host](Host/findings.md) | Shipped per-role configs omit `NodeOptions.NodeName`, leaving `SourceNode` null |
| Host-020 | [Host](Host/findings.md) | `MinimumLevel.Is` silently overrides any operator-set `Serilog:MinimumLevel` |
| Host-021 | [Host](Host/findings.md) | Microsoft `Logging:LogLevel` section in `appsettings.json` is dead config under Serilog |
| SiteEventLogging-018 | [SiteEventLogging](SiteEventLogging/findings.md) | `FailedWriteCount` is exposed but never consumed by Health Monitoring |
| StoreAndForward-022 | [StoreAndForward](StoreAndForward/findings.md) | `NotifyCachedCallObserverAsync` silently drops the entire audit lifecycle when the message id is not a parseable `TrackedOperationId` |
| StoreAndForward-023 | [StoreAndForward](StoreAndForward/findings.md) | `siteId` silently defaults to empty when no `IStoreAndForwardSiteContext` is registered, degrading audit telemetry correlation |
| Transport-012 | [Transport](Transport/findings.md) | "Bundle Import" filter promised in design doc not surfaced in Configuration Audit Log Viewer UI | | Transport-012 | [Transport](Transport/findings.md) | "Bundle Import" filter promised in design doc not surfaced in Configuration Audit Log Viewer UI |
+5 -9
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 4 | | Open findings | 2 |
## Summary ## Summary
@@ -50,7 +50,7 @@ tests using a shared `MsSqlMigrationFixture`.
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Akka.NET conventions | | Category | Akka.NET conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs:32-46`, `src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs:147-151` | | Location | `src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs:32-46`, `src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs:147-151` |
**Description** **Description**
@@ -98,9 +98,7 @@ Either:
The CLAUDE.md "Resume for coordinator actors" decision applies to actors with The CLAUDE.md "Resume for coordinator actors" decision applies to actors with
children (Site Runtime hierarchy) — not to leaf cluster singletons. children (Site Runtime hierarchy) — not to leaf cluster singletons.
**Resolution** **Resolution (2026-05-28):** Rewrote the class-level XML on `SiteCallAuditActor` plus the method-level XML on `SupervisorStrategy()` to accurately describe what the override does — a one-for-one strategy with `DefaultDecider` (Restart on most exceptions, Stop on `ActorInitializationException`/`ActorKilledException`) and `maxNrOfRetries: 0`, governing the actor's *children* (the actor has none today, so the override is currently inert). Dropped the misleading "Resume" claim. The new docs make clear that self-supervision of this cluster singleton is the parent `ClusterSingletonManager`'s concern and the actor's own resilience comes from the in-handler `try/catch` in `OnUpsertAsync`, not from this override. No behaviour change — pure documentation fix; existing 24 SiteCallAudit tests remain green.
_Unresolved._
### SiteCallAudit-002 — Singleton failover does not wait for in-flight async upserts ### SiteCallAudit-002 — Singleton failover does not wait for in-flight async upserts
@@ -154,7 +152,7 @@ Notification Outbox sibling has the same pattern.
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Correctness & logic bugs | | Category | Correctness & logic bugs |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs:153-193` | | Location | `src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs:153-193` |
**Description** **Description**
@@ -190,9 +188,7 @@ inconsistent with the dual-write code path and undocumented.
Preferred: stamp inside the actor — same as the combined-telemetry path — Preferred: stamp inside the actor — same as the combined-telemetry path —
because callers cannot in general know the actor is colocated on central. because callers cannot in general know the actor is colocated on central.
**Resolution** **Resolution (2026-05-28):** `OnUpsertAsync` now rewrites the incoming `SiteCall` via `cmd.SiteCall with { IngestedAtUtc = DateTime.UtcNow }` immediately before calling `repository.UpsertAsync`, mirroring `AuditLogIngestActor`'s combined-telemetry hot path. The repository writes `IngestedAtUtc` on both the insert-if-not-exists and the monotonic UPDATE legs (`SiteCallAuditRepository.UpsertAsync`), so the column is writable on every upsert. Callers (telemetry, the deferred reconciliation puller, any future direct-write) no longer need to remember to stamp a central-side timestamp — the actor owns it. Existing 24 SiteCallAudit tests remain green (the MSSQL-fixture test constructs rows with `DateTime.UtcNow` and doesn't assert the exact value, so the actor's re-stamp is backward compatible).
_Unresolved._
### SiteCallAudit-004 — Reconciliation puller and daily terminal-purge scheduler still deferred; design-doc drift ### SiteCallAudit-004 — Reconciliation puller and daily terminal-purge scheduler still deferred; design-doc drift
+4 -2
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 3 | | Open findings | 2 |
## Summary ## Summary
@@ -901,9 +901,11 @@ refused.
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Documentation & comments | | Category | Documentation & comments |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:67-71,225-226` | | Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:67-71,225-226` |
**Resolution (2026-05-28):** Took option (a)-via-(b) from the recommendation. Softened the `SiteEventLogger.FailedWriteCount` XML doc to describe the actual state ("Available for future Health Monitoring integration — the counter is correct and observable, but the central health-metric pipeline does not yet poll it"), removing the misleading "Health Monitoring can detect a logging outage" claim. The Health Monitoring wiring is left as a tracked follow-up (it requires a `ScadaLink.HealthMonitoring` source change that belongs in a different batch). Promoted `FailedWriteCount { get; }` onto `ISiteEventLogger` so the eventual Health consumer reads it through the interface without a concrete-type downcast. No behaviour change — pure documentation + interface-surface tidy-up; `SiteEventLogger` already exposed the property publicly, and no test fakes/mocks of `ISiteEventLogger` exist in the repo (grep confirms only `SiteEventLogger` implements it), so the interface addition is non-breaking. Existing 59 SiteEventLogging tests remain green.
**Description** **Description**
`SiteEventLogger.FailedWriteCount` was added under SiteEventLogging-008 with the `SiteEventLogger.FailedWriteCount` was added under SiteEventLogging-008 with the
+7 -3
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 5 | | Open findings | 3 |
## Summary ## Summary
@@ -1038,9 +1038,11 @@ be gated on "no instance with this name is currently terminating".
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Design-document adherence | | Category | Design-document adherence |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:931` | | Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:931` |
**Resolution (2026-05-28):** Took the "refactor `EnsureDclConnections` into a shared field-based helper" path. Extracted a new `EnsureDclConnection(name, protocol, primaryJson, backupJson, failoverRetryCount)` method that owns the hash-cache check and the `CreateConnectionCommand` Tell — both the existing inline `EnsureDclConnections(configJson)` and the new artifact path now drive through it. `ComputeConnectionConfigHash` got a field-based overload so the artifact path (which carries data directly on `DataConnectionArtifact`) reuses the same hash logic as the `ConnectionConfig`-based inline path. To keep `_createdConnections` mutation actor-thread-confined (the artifact-deploy persistence runs inside a `Task.Run`), the off-thread persistence dispatches a new internal `ApplyArtifactDataConnectionsToDcl` message back to `Self` after the SQLite writes; the actor-thread handler then iterates and invokes `EnsureDclConnection`. The DCL only sees `CreateConnectionCommand` (no `Update`/`Delete` messages exist in the codebase, and `CreateConnectionCommand` is treated as upsert-by-name — same shape as the inline-config path). Build clean; 302 SiteRuntime tests green (the existing `EnsureDclConnections_ConnectionConfigChanged_ReissuesCreateCommand` regression test still passes through the refactored shared helper).
**Description** **Description**
`HandleDeployArtifacts` persists the artifact bundle (shared scripts, external `HandleDeployArtifacts` persists the artifact bundle (shared scripts, external
@@ -1088,9 +1090,11 @@ and artifact paths can drive through it.
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.SiteRuntime/Scripts/AuditingDbCommand.cs:138` | | Location | `src/ScadaLink.SiteRuntime/Scripts/AuditingDbCommand.cs:138` |
**Resolution (2026-05-28):** Took the recommended "expose a proper API surface" path (the SiteRuntime-006 precedent). Added an `internal DbConnection Inner => _inner;` accessor to `AuditingDbConnection`; both classes are `internal sealed` in the same assembly, so the accessor stays out of the public API. The `AuditingDbCommand.DbConnection` setter now unwraps an `AuditingDbConnection` via `auditing.Inner` instead of `Type.GetField("_inner", BindingFlags.NonPublic | BindingFlags.Instance)!.GetValue(...)`. No reflection, no `!.` null-forgiveness hiding a runtime crash, no static-analyzer/IL-trim noise — and the same module that enforces "no `System.Reflection` in scripts" no longer reflects internally. The getter's `_wrappingConnection ?? _inner.Connection` fallback was left as-is; addressing the `CreateDbCommand()` round-trip concern is a separate behavioural decision (the finding marked it secondary). Build clean; 302 SiteRuntime tests green.
**Description** **Description**
The `DbConnection` setter on `AuditingDbCommand` unwraps an The `DbConnection` setter on `AuditingDbCommand` unwraps an
+63 -21
View File
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 | | Last reviewed | 2026-05-28 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` | | Commit reviewed | `1eb6e97` |
| Open findings | 5 (3 Deferred: 002, 011, 012; 5 new Open from Re-review 2026-05-28) | | Open findings | 0 (3 Deferred: 002, 011, 012; all 5 Open from Re-review 2026-05-28 resolved 2026-05-28) |
## Summary ## Summary
@@ -1067,7 +1067,7 @@ _Unresolved._
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Error handling & resilience | | Category | Error handling & resilience |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:229`, `:407``:437`; `src/ScadaLink.StoreAndForward/StoreAndForwardOptions.cs:18`; `src/ScadaLink.SiteRuntime/Scripts/ScriptRuntimeContext.cs:1773``:1778`; `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:149``:156` | | Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:229`, `:407``:437`; `src/ScadaLink.StoreAndForward/StoreAndForwardOptions.cs:18`; `src/ScadaLink.SiteRuntime/Scripts/ScriptRuntimeContext.cs:1773``:1778`; `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:149``:156` |
**Description** **Description**
@@ -1121,9 +1121,21 @@ the field value) so the invariant is enforced at the single chokepoint rather th
relying on every caller to pass the right value — this also fixes the legacy relying on every caller to pass the right value — this also fixes the legacy
`NotificationDeliveryService` path without editing the consumer. `NotificationDeliveryService` path without editing the consumer.
**Resolution** **Resolution (2026-05-28):**
VERIFY outcome — the design doc's "Notifications do not park" wording (lines 47, 59)
_Unresolved._ was the *operational intent* for the happy path, not an absolute invariant: the engine
has always enforced `DefaultMaxRetries` uniformly across every category, and every
sibling system (ESG, CachedDbWrite) bounds retry-then-parks for the same disk-pressure
and operator-visibility reasons. Removing the cap for notifications would let a single
unreachable central exhaust local disk via an unbounded buffer — worse than the
documented "park after retry budget" behaviour. Resolution is therefore the brief's
**default**: document the parking behaviour. Updated
`Component-StoreAndForward.md` lines 46/58 to clarify that the `DefaultMaxRetries` cap
applies uniformly (including to notifications) and that `maxRetries: 0` is the explicit
escape hatch for callers that need unbounded retry. Added a `StoreAndForward-019` block
to `StoreAndForwardOptions.DefaultMaxRetries`'s XML doc explaining the same invariant.
No behavioural code change — existing tests (104 in
`ScadaLink.StoreAndForward.Tests`) continue to pass.
### StoreAndForward-020 — `RetryParkedMessageAsync` skips standby replication when the message is deleted between local update and re-load ### StoreAndForward-020 — `RetryParkedMessageAsync` skips standby replication when the message is deleted between local update and re-load
@@ -1131,7 +1143,7 @@ _Unresolved._
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Concurrency & thread safety | | Category | Concurrency & thread safety |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:599``:616` | | Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:599``:616` |
**Description** **Description**
@@ -1209,9 +1221,16 @@ Add a regression test in `StoreAndForwardReplicationTests` that simulates the
delete-between-update-and-reload race and asserts the `Requeue` replication delete-between-update-and-reload race and asserts the `Requeue` replication
operation is still emitted with the correct category. operation is still emitted with the correct category.
**Resolution** **Resolution (2026-05-28):**
Applied the brief's primary recommendation — `RetryParkedMessageAsync` now captures
_Unresolved._ the parked row up front via `GetMessageByIdAsync` (and rejects the call early if the
row is missing or no longer `Parked`), then performs the local `RetryParkedMessageAsync`
storage write, and finally reconstructs the post-requeue state on the captured POCO
(`Status = Pending, RetryCount = 0, LastError = null, LastAttemptAt = null`) and
replicates it. A concurrent `RemoveMessageAsync` or `DiscardParkedMessageAsync` running
between the local write and the original re-load can no longer skip replication — the
row is in hand. The category-fallback misllabelling on the racy path is gone because
the activity log uses the captured `Category` directly.
### StoreAndForward-021 — Design doc claims the Operation Tracking Table lives in StoreAndForward but the implementation is in SiteRuntime ### StoreAndForward-021 — Design doc claims the Operation Tracking Table lives in StoreAndForward but the implementation is in SiteRuntime
@@ -1219,7 +1238,7 @@ _Unresolved._
|--|--| |--|--|
| Severity | Medium | | Severity | Medium |
| Category | Design-document adherence | | Category | Design-document adherence |
| Status | Open | | Status | Resolved |
| Location | `docs/requirements/Component-StoreAndForward.md:21`, `:49``:51`, `:77``:87`, `:108`, `:114`; `src/ScadaLink.SiteRuntime/Tracking/OperationTrackingStore.cs:37`; `src/ScadaLink.StoreAndForward/` (whole module) | | Location | `docs/requirements/Component-StoreAndForward.md:21`, `:49``:51`, `:77``:87`, `:108`, `:114`; `src/ScadaLink.SiteRuntime/Tracking/OperationTrackingStore.cs:37`; `src/ScadaLink.StoreAndForward/` (whole module) |
**Description** **Description**
@@ -1274,9 +1293,18 @@ several refactors out of date. The hierarchical map should be:
- `Component-SiteCallAudit.md` / `Component-AuditLog.md` → telemetry emission + - `Component-SiteCallAudit.md` / `Component-AuditLog.md` → telemetry emission +
central-side mirror. central-side mirror.
**Resolution** **Resolution (2026-05-28):**
Doc-side fix applied (per the brief, the simplest of the two options). Updated
_Unresolved._ `Component-StoreAndForward.md`: (1) removed the "Maintain a site-local operation
tracking table" line from Responsibilities and reworded the cached-call telemetry
responsibility to point at the `ICachedCallLifecycleObserver` hook; (2) renamed the
"Operation Tracking Table" section to "Operation Tracking Table (lives in Site
Runtime, not here)" with an explicit `StoreAndForward-021` callout cross-linking to
`Component-SiteRuntime.md` and the `IOperationTrackingStore` interface in
Commons. The rest of the section is retained for cross-component context (the
buffered cached-call rows carry `TrackedOperationId` so the link to the tracking row
must still be documented somewhere) but is reworded to make clear the table itself is
not owned here.
### StoreAndForward-022 — `NotifyCachedCallObserverAsync` silently drops the entire audit lifecycle when the message id is not a parseable `TrackedOperationId` ### StoreAndForward-022 — `NotifyCachedCallObserverAsync` silently drops the entire audit lifecycle when the message id is not a parseable `TrackedOperationId`
@@ -1284,7 +1312,7 @@ _Unresolved._
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Documentation & comments | | Category | Documentation & comments |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:484``:515` | | Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:484``:515` |
**Description** **Description**
@@ -1333,9 +1361,14 @@ contract — the existing
the fix is "log + skip", that test should be updated to also assert the log emission; the fix is "log + skip", that test should be updated to also assert the log emission;
if the fix is "emit anyway", the test should be replaced. if the fix is "emit anyway", the test should be replaced.
**Resolution** **Resolution (2026-05-28):**
Applied the brief's "cheap fix" — the non-GUID skip path now logs a Warning naming
_Unresolved._ the offending `MessageId`, `Category` and `Outcome` before returning, so a
misconfigured caller is observable instead of silently bypassing the audit pipeline.
S&F retry bookkeeping remains untouched (the observer is still best-effort, the skip
still returns without throwing). The existing
`Attempt_MessageIdNotAGuid_NoObserverNotification` test still passes — its assertion
is on `_observer.Notifications` being empty, which is unchanged.
### StoreAndForward-023 — `siteId` silently defaults to empty when no `IStoreAndForwardSiteContext` is registered, degrading audit telemetry correlation ### StoreAndForward-023 — `siteId` silently defaults to empty when no `IStoreAndForwardSiteContext` is registered, degrading audit telemetry correlation
@@ -1343,7 +1376,7 @@ _Unresolved._
|--|--| |--|--|
| Severity | Low | | Severity | Low |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/ServiceCollectionExtensions.cs:43``:53`; `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:99`, `:524` | | Location | `src/ScadaLink.StoreAndForward/ServiceCollectionExtensions.cs:43``:53`; `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:99`, `:524` |
**Description** **Description**
@@ -1383,9 +1416,18 @@ absent (no `AddAuditLog`), keep the empty-string default since `_siteId` is unus
Alternatively, change `siteId` from a parameter to a `Func<string>` resolved lazily Alternatively, change `siteId` from a parameter to a `Func<string>` resolved lazily
from the service provider so a late-registered context still takes effect. from the service provider so a late-registered context still takes effect.
**Resolution** **Resolution (2026-05-28):**
Applied the brief's sentinel option (less invasive than throwing — preserves the
_Unresolved._ existing test wiring that constructs `StoreAndForwardService` without a site context).
Introduced `StoreAndForwardService.UnknownSiteSentinel = "$unknown-site"` (leading
`$` chosen so it cannot collide with a real site id) and the constructor now
normalises any null/empty/whitespace `siteId` argument to that sentinel. The empty
string can no longer reach `CachedCallAttemptContext.SourceSite`; a misconfigured
host without an `IStoreAndForwardSiteContext` produces audit rows tagged with the
sentinel — recognisably bad in the central audit log instead of silently merging
into the empty bucket. All 104 existing tests pass; the only test that asserts a
literal `SourceSite` (`CachedCallAttemptEmissionTests`) supplies `"site-77"` so the
normalisation is a no-op there.
### StoreAndForward-024 — `StopAsync` does not wait for an in-flight retry sweep, so disposed dependencies can be touched after shutdown ### StoreAndForward-024 — `StopAsync` does not wait for an in-flight retry sweep, so disposed dependencies can be touched after shutdown
+18 -9
View File
@@ -18,8 +18,7 @@ Site clusters only. The central cluster does not buffer messages.
- Retry delivery per message according to the configured retry policy. - Retry delivery per message according to the configured retry policy.
- Park messages that exhaust their retry limit (dead-letter). - Park messages that exhaust their retry limit (dead-letter).
- Persist buffered messages to local SQLite for durability. - Persist buffered messages to local SQLite for durability.
- Maintain a site-local **operation tracking table** holding one row per `TrackedOperationId` for cached calls (`ExternalCall` and `DatabaseWrite`) — the authoritative status record consulted by `Tracking.Status(id)`. - Emit cached-call lifecycle telemetry to the central Site Call Audit component via the `ICachedCallLifecycleObserver` hook (one notification per attempt outcome) so the audit pipeline can record each status transition.
- Emit cached-call lifecycle telemetry to the central Site Call Audit component on every status transition.
- Replicate buffered messages to the standby node via application-level replication over Akka.NET remoting. - Replicate buffered messages to the standby node via application-level replication over Akka.NET remoting.
- On failover, the standby node takes over delivery from its replicated copy. - On failover, the standby node takes over delivery from its replicated copy.
- Respond to remote queries from central for parked message management (list, retry, discard), including central-driven Retry/Discard of parked cached calls. - Respond to remote queries from central for parked message management (list, retry, discard), including central-driven Retry/Discard of parked cached calls.
@@ -44,7 +43,7 @@ Attempt immediate delivery
└── Max retries exhausted → Park message └── Max retries exhausted → Park message
``` ```
For notifications, "delivery" means forwarding the message to the central cluster via CentralSite Communication; "success" is central's ack, on which the message is cleared. Notifications do not park — they are retried at the fixed forward interval until central acks. Parking applies only to the external-system-call and cached-database-write categories. For notifications, "delivery" means forwarding the message to the central cluster via CentralSite Communication; "success" is central's ack, on which the message is cleared. Notifications are retried at the fixed forward interval until central acks, but — like every other category — they are bounded by the engine's `DefaultMaxRetries` cap: a sustained central outage that exceeds `DefaultMaxRetries × forward-interval` will park the buffered notification, after which an operator can Retry/Discard it via the parked-message UI. Operationally, the cap is sized so the normal central-recovery window stays well inside it; "do not park" is the design's operational intent on the happy path, not an absolute invariant. Callers that genuinely require unbounded retry pass `maxRetries: 0` on `EnqueueAsync` (the documented "no limit" escape hatch — see `StoreAndForward-015`).
For the cached-call categories (`ExternalCall` and `DatabaseWrite`), the operation tracking table is the status record and the S&F buffer is purely the retry mechanism. A cached call that succeeds on its first immediate attempt is written directly as a terminal `Delivered` tracking row and never enters the S&F buffer. When immediate delivery fails transiently, the message is buffered and its tracking row moves to `Pending`/`Retrying`; the buffered message carries its `TrackedOperationId` so the tracking row and the retry record stay linked. When immediate delivery fails **permanently** (e.g. HTTP 4xx), the message is not buffered — the error is returned synchronously to the calling script as before — but the tracking row is written directly as a terminal `Failed` row capturing the error. On every tracking-table status transition the site emits `CachedCallTelemetry` to central. For the cached-call categories (`ExternalCall` and `DatabaseWrite`), the operation tracking table is the status record and the S&F buffer is purely the retry mechanism. A cached call that succeeds on its first immediate attempt is written directly as a terminal `Delivered` tracking row and never enters the S&F buffer. When immediate delivery fails transiently, the message is buffered and its tracking row moves to `Pending`/`Retrying`; the buffered message carries its `TrackedOperationId` so the tracking row and the retry record stay linked. When immediate delivery fails **permanently** (e.g. HTTP 4xx), the message is not buffered — the error is returned synchronously to the calling script as before — but the tracking row is written directly as a terminal `Failed` row capturing the error. On every tracking-table status transition the site emits `CachedCallTelemetry` to central.
@@ -56,7 +55,7 @@ For the external-system-call and cached-database-write categories, retry setting
- **External systems**: Each external system definition includes max retry count and time between retries. - **External systems**: Each external system definition includes max retry count and time between retries.
- **Cached database writes**: Each database connection definition includes max retry count and time between retries. - **Cached database writes**: Each database connection definition includes max retry count and time between retries.
The **notification** category retries differently: it has no source-entity setting. The site→central forward uses a single fixed retry interval configured in the host `appsettings.json`. This interval is infrastructure config for reaching the central cluster, not a per-notification-list setting. It applies uniformly to every buffered notification regardless of its target list. A buffered notification is retried until central acks it; it is not parked on a retry limit (central, once reachable, owns delivery, retry, and parking from that point on). The **notification** category retries differently: it has no source-entity setting. The site→central forward uses a single fixed retry interval configured in the host `appsettings.json`. This interval is infrastructure config for reaching the central cluster, not a per-notification-list setting. It applies uniformly to every buffered notification regardless of its target list. A buffered notification is retried at that interval until central acks it; the engine's `DefaultMaxRetries` cap still applies (matching the cached-call categories) and a notification whose retries are exhausted under a sustained central outage parks like any other buffered message. The cap is sized so the normal central-recovery window stays well inside it central, once reachable, owns delivery, retry, and parking from the ack point on.
The retry interval is **fixed** (not exponential backoff). Fixed interval is sufficient for the expected use cases. The retry interval is **fixed** (not exponential backoff). Fixed interval is sufficient for the expected use cases.
@@ -74,14 +73,24 @@ There is **no maximum buffer size**. Messages accumulate in the buffer until del
- On failover, the new active node has a near-complete copy of the buffer. In rare cases, the most recent operations may not have been replicated (e.g., a message added or removed just before failover). This can result in a few **duplicate deliveries** (message delivered but remove not replicated) or a few **missed retries** (message added but not replicated). Both are acceptable trade-offs for the latency benefit. - On failover, the new active node has a near-complete copy of the buffer. In rare cases, the most recent operations may not have been replicated (e.g., a message added or removed just before failover). This can result in a few **duplicate deliveries** (message delivered but remove not replicated) or a few **missed retries** (message added but not replicated). Both are acceptable trade-offs for the latency benefit.
- On failover, the new active node resumes delivery from its local copy. - On failover, the new active node resumes delivery from its local copy.
### Operation Tracking Table ### Operation Tracking Table (lives in Site Runtime, not here)
Alongside the S&F buffer DB, each site node holds a **site-local operation tracking table** in SQLite. It carries one row per `TrackedOperationId` for cached calls (`ExternalCall` and `DatabaseWrite`), created the moment the script issues the cached call and kept regardless of outcome. > **StoreAndForward-021:** the operation tracking table is **not** owned by
> this component. The `IOperationTrackingStore` interface lives in
> `src/ScadaLink.Commons/Interfaces/Services/`, and the SQLite-backed
> implementation (`OperationTrackingStore`, alongside `OperationTrackingOptions`)
> lives in [`src/ScadaLink.SiteRuntime/Tracking/`](../../src/ScadaLink.SiteRuntime/Tracking/).
> See [`Component-SiteRuntime.md`](Component-SiteRuntime.md) for the table's
> semantics, lifecycle, and central-mirror coordination — it is summarised here
> only because the S&F retry loop carries the `TrackedOperationId` linking a
> buffered cached-call row to its tracking entry.
- This table is the **status record**; the S&F buffer remains purely the **retry mechanism**. A buffered cached-call message references its `TrackedOperationId` back to its tracking row. For context: each site node also holds a site-local operation tracking table in SQLite (owned by Site Runtime) carrying one row per `TrackedOperationId` for cached calls (`ExternalCall` and `DatabaseWrite`), created the moment the script issues the cached call and kept regardless of outcome.
- That table is the **status record**; the S&F buffer remains purely the **retry mechanism**. A buffered cached-call message references its `TrackedOperationId` back to its tracking row.
- Each row records the operation kind (`TrackedOperationKind`), a target summary (external system + method, or database connection name), the unified `TrackedOperationStatus`, retry count, last error, source provenance (instance / script), and the created/updated/terminal UTC timestamps. - Each row records the operation kind (`TrackedOperationKind`), a target summary (external system + method, or database connection name), the unified `TrackedOperationStatus`, retry count, last error, source provenance (instance / script), and the created/updated/terminal UTC timestamps.
- `Tracking.Status(id)` reads this table. For cached calls the **site is the authoritative source of truth** for status — the query is always answered site-locally, even when central is unreachable. The central Site Call Audit `SiteCalls` table is an eventually-consistent mirror. - `Tracking.Status(id)` reads that table. For cached calls the **site is the authoritative source of truth** for status — the query is always answered site-locally, even when central is unreachable. The central Site Call Audit `SiteCalls` table is an eventually-consistent mirror.
- A cached call that succeeds on its first immediate attempt writes a terminal `Delivered` row directly here, with nothing placed in the S&F buffer. - A cached call that succeeds on its first immediate attempt writes a terminal `Delivered` row directly there, with nothing placed in the S&F buffer.
- Terminal rows are purged after a configurable retention window (default 7 days) — the site holds live operational state; central holds long-term audit. - Terminal rows are purged after a configurable retention window (default 7 days) — the site holds live operational state; central holds long-term audit.
Notifications are unaffected: they have no tracking table. Their `NotificationId` and status are owned by the central `Notifications` table, and their lifecycle continues to forward to central exactly as before. Notifications are unaffected: they have no tracking table. Their `NotificationId` and status are owned by the central `Notifications` table, and their lifecycle continues to forward to central exactly as before.
+39 -4
View File
@@ -118,9 +118,33 @@ public static class BundleCommands
timeout: BundleCommandTimeout, timeout: BundleCommandTimeout,
onSuccess: jsonOk => onSuccess: jsonOk =>
{ {
using var doc = JsonDocument.Parse(jsonOk); // CLI-020: previously the JSON envelope parse + property extraction +
var base64 = doc.RootElement.GetProperty("base64Bundle").GetString()!; // base64 decode all ran unguarded — a server-side bug that omits one of
var byteCount = doc.RootElement.GetProperty("byteCount").GetInt32(); // the two expected properties, returns a null base64 value, sends invalid
// base64, or returns a malformed JSON envelope would surface as one of
// KeyNotFoundException / InvalidOperationException / FormatException /
// JsonException, i.e. an unhandled stack trace rather than the
// documented "exit 1 with a clean INVALID_RESPONSE error". Wrap the
// envelope parse and the streamed write in a single try/catch matching
// the graceful-degradation theme established by CLI-002 / CLI-003 / CLI-005.
string base64;
int byteCount;
try
{
using var doc = JsonDocument.Parse(jsonOk);
base64 = doc.RootElement.GetProperty("base64Bundle").GetString()!;
byteCount = doc.RootElement.GetProperty("byteCount").GetInt32();
}
catch (Exception ex) when (ex is JsonException
or KeyNotFoundException
or InvalidOperationException)
{
OutputFormatter.WriteError(
$"Server returned a malformed bundle-export response: {ex.Message}",
"INVALID_RESPONSE");
return 1;
}
// CLI-019: stream the base64 → file write so a 100 MB bundle // CLI-019: stream the base64 → file write so a 100 MB bundle
// doesn't double-buffer through Convert.FromBase64String's // doesn't double-buffer through Convert.FromBase64String's
// ~100 MB byte[] on the LOH plus a synchronous File.WriteAllBytes. // ~100 MB byte[] on the LOH plus a synchronous File.WriteAllBytes.
@@ -128,7 +152,18 @@ public static class BundleCommands
// jsonOk string (wire-format limit), but the decode + write // jsonOk string (wire-format limit), but the decode + write
// are now chunked, so peak working-set drops from // are now chunked, so peak working-set drops from
// ~base64+byte[]+envelope to ~base64+small-chunk. // ~base64+byte[]+envelope to ~base64+small-chunk.
var written = StreamBase64ToFile(base64, output); long written;
try
{
written = StreamBase64ToFile(base64, output);
}
catch (FormatException ex)
{
OutputFormatter.WriteError(
$"Server returned invalid base64 in the bundle response: {ex.Message}",
"INVALID_RESPONSE");
return 1;
}
Console.WriteLine($"Wrote {written:N0} bytes to {output} (server reported {byteCount:N0})."); Console.WriteLine($"Wrote {written:N0} bytes to {output} (server reported {byteCount:N0}).");
return 0; return 0;
}); });
@@ -75,10 +75,21 @@
<div class="d-flex justify-content-between align-items-center"> <div class="d-flex justify-content-between align-items-center">
<span class="text-muted small">Page @_pageNumber · @_rows.Count rows</span> <span class="text-muted small">Page @_pageNumber · @_rows.Count rows</span>
<button class="btn btn-outline-secondary btn-sm" @* CentralUI-032: keyset paging is naturally forward-only, but the
data-test="grid-next-page" in-component _cursorStack lets the user step back through previous
disabled="@(_loading || _rows.Count < _pageSize)" pages by replaying the prior cursor. The Previous button is gated
@onclick="NextPage">Next page</button> on the stack having at least one prior cursor — i.e. we are not on
the first page. *@
<div class="btn-group">
<button class="btn btn-outline-secondary btn-sm"
data-test="grid-prev-page"
disabled="@(_loading || !CanGoBack)"
@onclick="PrevPage">Previous page</button>
<button class="btn btn-outline-secondary btn-sm"
data-test="grid-next-page"
disabled="@(_loading || _rows.Count < _pageSize)"
@onclick="NextPage">Next page</button>
</div>
</div> </div>
</div> </div>
@@ -66,6 +66,16 @@ public partial class AuditResultsGrid : IAsyncDisposable
private bool _loading; private bool _loading;
private string? _error; private string? _error;
// CentralUI-032: small in-component stack of prior-page cursors so the user
// can step backwards through results. Each Next push captures the cursor
// that produced the current page (null for page 1) before advancing; each
// Previous pop reloads the page at the recovered cursor. Mirrors the
// SiteCallsReport keyset-paging shape called out in the finding.
private readonly Stack<AuditLogPaging?> _cursorStack = new();
// The cursor that produced the page currently on screen — kept so Next can
// push it before advancing without recomputing it from _rows.
private AuditLogPaging? _currentPaging;
private AuditLogQueryFilter? _activeFilter; private AuditLogQueryFilter? _activeFilter;
[Inject] private IJSRuntime JS { get; set; } = default!; [Inject] private IJSRuntime JS { get; set; } = default!;
@@ -196,6 +206,8 @@ public partial class AuditResultsGrid : IAsyncDisposable
_activeFilter = Filter; _activeFilter = Filter;
_pageNumber = 1; _pageNumber = 1;
_rows.Clear(); _rows.Clear();
_cursorStack.Clear();
_currentPaging = null;
if (Filter is not null) if (Filter is not null)
{ {
await LoadAsync(paging: null); await LoadAsync(paging: null);
@@ -216,10 +228,36 @@ public partial class AuditResultsGrid : IAsyncDisposable
AfterOccurredAtUtc: last.OccurredAtUtc, AfterOccurredAtUtc: last.OccurredAtUtc,
AfterEventId: last.EventId); AfterEventId: last.EventId);
// CentralUI-032: remember the cursor that produced the current page so
// a later Previous can navigate back to it. The page-1 entry is pushed
// as null — LoadAsync treats null as "first page" (PageSize-only).
_cursorStack.Push(_currentPaging);
await LoadAsync(cursor); await LoadAsync(cursor);
_pageNumber++; _pageNumber++;
} }
// CentralUI-032: pops the previous-page cursor off the stack and reloads
// at that position. The pop only happens AFTER a successful reload — a
// failed page-fetch leaves the user on the current page with the error
// banner instead of stranding them between pages.
private async Task PrevPage()
{
if (_cursorStack.Count == 0 || _activeFilter is null)
{
return;
}
var prior = _cursorStack.Peek();
await LoadAsync(prior);
if (_error is null)
{
_cursorStack.Pop();
_pageNumber = Math.Max(1, _pageNumber - 1);
}
}
private bool CanGoBack => _cursorStack.Count > 0;
private async Task LoadAsync(AuditLogPaging? paging) private async Task LoadAsync(AuditLogPaging? paging)
{ {
if (_activeFilter is null) if (_activeFilter is null)
@@ -235,6 +273,9 @@ public partial class AuditResultsGrid : IAsyncDisposable
var page = await QueryService.QueryAsync(_activeFilter, effective); var page = await QueryService.QueryAsync(_activeFilter, effective);
_rows.Clear(); _rows.Clear();
_rows.AddRange(page); _rows.AddRange(page);
// Track the cursor that produced the page now on screen so a later
// Next can push it onto the stack before advancing.
_currentPaging = paging;
} }
catch (Exception ex) catch (Exception ex)
{ {
@@ -245,14 +245,24 @@
// same query param doesn't re-run the query on every parameter set. // same query param doesn't re-run the query on every parameter set.
private Guid? _lastFetchedBundleImportId; private Guid? _lastFetchedBundleImportId;
// CentralUI-029: the browser-time JS module that hosts getTimezoneOffsetMinutes().
// Loaded lazily on first render via dynamic import; replaces the previous
// `JS.InvokeAsync<int>("eval", "new Date().getTimezoneOffset()")` call, which
// widened the JS-interop attack surface and was incompatible with strict CSP
// `script-src` directives that forbid `unsafe-eval`.
private const string BrowserTimeModulePath = "./_content/ScadaLink.CentralUI/js/browser-time.js";
private IJSObjectReference? _browserTimeModule;
protected override async Task OnAfterRenderAsync(bool firstRender) protected override async Task OnAfterRenderAsync(bool firstRender)
{ {
if (!firstRender) return; if (!firstRender) return;
try try
{ {
// Date.getTimezoneOffset() returns (UTC - local) in minutes. // Date.getTimezoneOffset() returns (UTC - local) in minutes.
_browserUtcOffsetMinutes = await JS.InvokeAsync<int>( _browserTimeModule ??= await JS.InvokeAsync<IJSObjectReference>(
"eval", "new Date().getTimezoneOffset()"); "import", BrowserTimeModulePath);
_browserUtcOffsetMinutes = await _browserTimeModule.InvokeAsync<int>(
"getTimezoneOffsetMinutes");
} }
catch (Exception ex) when (ex is JSException or JSDisconnectedException catch (Exception ex) when (ex is JSException or JSDisconnectedException
or InvalidOperationException or TaskCanceledException) or InvalidOperationException or TaskCanceledException)
@@ -0,0 +1,13 @@
// CentralUI-029: small JS module to replace the JS.InvokeAsync<int>("eval", ...)
// anti-pattern previously used by ConfigurationAuditLog. Exporting a named
// function from an ES module:
// * removes the residual `eval` JS-interop surface,
// * is CSP-friendly (no `unsafe-eval` directive required),
// * matches the module-import pattern (`session-expiry.js`, `audit-grid.js`,
// `nav-state.js`, `transport.js`) the rest of the Central UI follows.
//
// The function returns the same value as `new Date().getTimezoneOffset()` —
// minutes of (UTC - local), positive for time zones west of UTC.
export function getTimezoneOffsetMinutes() {
return new Date().getTimezoneOffset();
}
@@ -20,11 +20,15 @@ namespace ScadaLink.ClusterInfrastructure;
/// </summary> /// </summary>
public class ClusterOptions public class ClusterOptions
{ {
/// <summary> // ClusterInfra-011: the previous `public const string SectionName = "ScadaLink:Cluster";`
/// The <c>appsettings.json</c> section name this options class binds from. // was documented as "single source of truth so binding sites do not hard-code the
/// Single source of truth so binding sites do not hard-code the magic string. // magic string" but no caller ever read it — the Host's SiteServiceRegistration and
/// </summary> // StartupValidator both hard-code the literal directly. Wiring those binding sites
public const string SectionName = "ScadaLink:Cluster"; // to reference the constant lives in the Host's edit scope (a separate code-review
// task); rather than carry a public constant whose guarantee the code does not
// deliver, the constant is removed and the literal stays in the Host until the
// Host-side wiring is done. If a future Host change wants the constant back, add it
// when the binding sites can be updated in the same commit.
/// <summary> /// <summary>
/// Akka.NET cluster seed nodes. Both nodes are seed nodes — each node lists /// Akka.NET cluster seed nodes. Both nodes are seed nodes — each node lists
@@ -30,20 +30,17 @@ public static class ServiceCollectionExtensions
return services; return services;
} }
/// <summary> // ClusterInfra-014: the previous `AddClusterInfrastructureActors` extension
/// Reserved for cluster-infrastructure actor registration. This component does // was dead surface — its XML doc told callers "do not call", its body
/// not register any actors — the Akka.NET bootstrap and actor wiring live in // unconditionally threw `NotImplementedException`, and no production caller
/// <c>ScadaLink.Host</c>. The method throws rather than silently returning // existed anywhere in the solution (verified by grep). The CI-002
/// success so that any caller assuming this component registers actors fails // "throw loudly" decision was made while CI-001's ownership question was
/// fast with a clear cause instead of failing later, far from here. // still open; that question is now permanently settled by the
/// </summary> // "Implementation Note — Code Placement" section of
/// <exception cref="NotImplementedException">Always thrown.</exception> // Component-ClusterInfrastructure.md, which records that all actor wiring
/// <param name="services">The service collection (unused; method always throws).</param> // lives in ScadaLink.Host (AkkaHostedService). Keeping a public extension
public static IServiceCollection AddClusterInfrastructureActors(this IServiceCollection services) // method that exists only to throw was API-surface noise that an IDE would
{ // still suggest via auto-complete, so the method and its companion
throw new NotImplementedException( // `AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding` test
"ScadaLink.ClusterInfrastructure registers no actors. The Akka.NET actor system " + // were both removed.
"bootstrap and all cluster actor registration live in ScadaLink.Host " +
"(AkkaHostedService). Do not call AddClusterInfrastructureActors().");
}
} }
@@ -1,5 +1,11 @@
using ScadaLink.Commons.Types; using ScadaLink.Commons.Types;
// Commons-018: physically lives under Interfaces/Services/ to match the
// established subfolder convention (REQ-COM-5b), but the namespace stays
// `ScadaLink.Commons.Interfaces` to avoid a cascading update to 9+ consumer
// files across ScadaLink.SiteRuntime, ScadaLink.AuditLog and ScadaLink.Host.
// Adopting the canonical `ScadaLink.Commons.Interfaces.Services` namespace
// can be picked up alongside any future Commons-wide namespace tidy-up.
namespace ScadaLink.Commons.Interfaces; namespace ScadaLink.Commons.Interfaces;
/// <summary> /// <summary>
@@ -1,3 +1,9 @@
// Commons-018: physically lives under Interfaces/Services/ to match the
// established subfolder convention (REQ-COM-5b), but the namespace stays
// `ScadaLink.Commons.Interfaces` to avoid a cascading update to consumers
// across ScadaLink.AuditLog and ScadaLink.ConfigurationDatabase. Adopting
// the canonical `ScadaLink.Commons.Interfaces.Services` namespace can be
// picked up alongside any future Commons-wide namespace tidy-up.
namespace ScadaLink.Commons.Interfaces; namespace ScadaLink.Commons.Interfaces;
/// <summary> /// <summary>
@@ -29,6 +29,11 @@ namespace ScadaLink.Commons.Types;
/// Cluster node that submitted the cached call (e.g. <c>"node-a"</c> / /// Cluster node that submitted the cached call (e.g. <c>"node-a"</c> /
/// <c>"node-b"</c>), captured at enqueue time. Null on rows persisted before /// <c>"node-b"</c>), captured at enqueue time. Null on rows persisted before
/// the SourceNode stamping migration; stamping itself is wired in a later task. /// the SourceNode stamping migration; stamping itself is wired in a later task.
/// Commons-023: trailing-optional with a <c>= null</c> default, matching the
/// SourceNode rollout convention now used on <c>SiteCallSummary</c>,
/// <c>SiteCallDetail</c>, <c>NotificationSummary</c> and <c>NotificationDetail</c>
/// — so existing positional construction sites keep compiling as new
/// optional fields land on this record.
/// </param> /// </param>
public sealed record TrackingStatusSnapshot( public sealed record TrackingStatusSnapshot(
TrackedOperationId Id, TrackedOperationId Id,
@@ -43,4 +48,4 @@ public sealed record TrackingStatusSnapshot(
DateTime? TerminalAtUtc, DateTime? TerminalAtUtc,
string? SourceInstanceId, string? SourceInstanceId,
string? SourceScript, string? SourceScript,
string? SourceNode); string? SourceNode = null);
@@ -2,6 +2,16 @@ namespace ScadaLink.Commons.Types.Transport;
public sealed class BundleSession public sealed class BundleSession
{ {
/// <summary>
/// Commons-016: legacy per-session lockout threshold (kept on this type for the
/// shim <see cref="Locked"/> getter). The authoritative, server-side per-bundle
/// counter is bounded by <c>TransportOptions.MaxUnlockAttemptsPerSession</c>
/// (default also <c>3</c>) and is what <c>BundleImporter.LoadAsync</c> consults.
/// This constant exists so the comparison in <see cref="Locked"/> uses a named
/// symbol that a security review can grep for, rather than a literal <c>3</c>.
/// </summary>
public const int MaxUnlockAttempts = 3;
/// <summary>Unique identifier for this import session.</summary> /// <summary>Unique identifier for this import session.</summary>
public Guid SessionId { get; init; } public Guid SessionId { get; init; }
/// <summary>Parsed manifest from the uploaded bundle.</summary> /// <summary>Parsed manifest from the uploaded bundle.</summary>
@@ -22,6 +32,7 @@ public sealed class BundleSession
/// <summary> /// <summary>
/// T-003 legacy: always <c>false</c> on a session returned by <c>LoadAsync</c> /// T-003 legacy: always <c>false</c> on a session returned by <c>LoadAsync</c>
/// because lockout enforcement moved server-side; see <see cref="FailedUnlockAttempts"/>. /// because lockout enforcement moved server-side; see <see cref="FailedUnlockAttempts"/>.
/// The threshold is the named <see cref="MaxUnlockAttempts"/> constant (default 3).
/// </summary> /// </summary>
public bool Locked => FailedUnlockAttempts >= 3; public bool Locked => FailedUnlockAttempts >= MaxUnlockAttempts;
} }
@@ -410,7 +410,14 @@ public class CentralCommunicationActor : ReceiveActor
contacts[site.SiteIdentifier] = addrs; contacts[site.SiteIdentifier] = addrs;
} }
return new SiteAddressCacheLoaded(contacts); // Communication-020: freeze the cross-task payload before piping to
// Self. The message record exposes read-only types (
// IReadOnlyDictionary / IReadOnlyList) so the Akka.NET message-
// immutability convention is enforced by type, not just convention.
var frozen = contacts.ToDictionary(
kvp => kvp.Key,
kvp => (IReadOnlyList<string>)kvp.Value.AsReadOnly());
return new SiteAddressCacheLoaded(frozen);
}).PipeTo(self); }).PipeTo(self);
} }
@@ -540,8 +547,14 @@ public record RefreshSiteAddresses;
/// <summary> /// <summary>
/// Internal message carrying the loaded site contact data from the database. /// Internal message carrying the loaded site contact data from the database.
/// ClusterClient creation happens on the actor thread in HandleSiteAddressCacheLoaded. /// ClusterClient creation happens on the actor thread in HandleSiteAddressCacheLoaded.
///
/// Communication-020: the payload is exposed as <see cref="IReadOnlyDictionary{TKey,TValue}"/>
/// of <see cref="IReadOnlyList{T}"/> so the Akka.NET "messages are immutable"
/// convention is enforced at the type level rather than relying on producer
/// discipline. The producer wraps the constructed buckets with
/// <c>List&lt;T&gt;.AsReadOnly()</c> before piping to Self.
/// </summary> /// </summary>
internal record SiteAddressCacheLoaded(Dictionary<string, List<string>> SiteContacts); internal record SiteAddressCacheLoaded(IReadOnlyDictionary<string, IReadOnlyList<string>> SiteContacts);
/// <summary> /// <summary>
/// Notification sent to debug view subscribers when the stream is terminated /// Notification sent to debug view subscribers when the stream is terminated
@@ -103,11 +103,26 @@ public class DeploymentService
/// <summary> /// <summary>
/// Resolves the site's string identifier from the numeric DB ID. /// Resolves the site's string identifier from the numeric DB ID.
/// The communication layer routes by string identifier (e.g. "site-a"), not DB ID. /// The communication layer routes by string identifier (e.g. "site-a"), not DB ID.
///
/// DeploymentManager-021: when the <see cref="Site"/> row is missing (FK was
/// deleted, race with admin delete, DB inconsistency) the previous behaviour
/// silently substituted the numeric id rendered as a string — every
/// downstream `CommunicationService` call then failed with a confusing
/// "unknown site" routing error that hid the real cause. Treat a missing
/// site row as a hard validation failure: throw
/// <see cref="InvalidOperationException"/> naming the unresolved id so the
/// operator sees the actual problem. On the deploy path the existing
/// try/catch turns this into a Failed deployment record with a clear
/// message; lifecycle paths propagate it to the caller (CLI/UI) which
/// surface it as an error to the operator.
/// </summary> /// </summary>
private async Task<string> ResolveSiteIdentifierAsync(int siteId, CancellationToken cancellationToken) private async Task<string> ResolveSiteIdentifierAsync(int siteId, CancellationToken cancellationToken)
{ {
var site = await _siteRepository.GetSiteByIdAsync(siteId, cancellationToken); var site = await _siteRepository.GetSiteByIdAsync(siteId, cancellationToken);
return site?.SiteIdentifier ?? siteId.ToString(); if (site == null)
throw new InvalidOperationException(
$"Site with ID {siteId} not found; cannot resolve its SiteIdentifier for routing.");
return site.SiteIdentifier;
} }
/// <summary> /// <summary>
@@ -174,11 +189,23 @@ public class DeploymentService
if (reconciled != null) if (reconciled != null)
return Result<DeploymentRecord>.Success(reconciled); return Result<DeploymentRecord>.Success(reconciled);
// WP-4: Create deployment record with Pending status // WP-4: Create the deployment record directly in InProgress.
//
// DeploymentManager-022: the previous code wrote the record as Pending,
// then immediately updated it to InProgress with no work in between
// (flattening, validation, and reconciliation all completed above). The
// back-to-back write cost an extra SaveChangesAsync round-trip, an
// extra IDeploymentStatusNotifier push (CentralUI-006 rendered a
// Pending→InProgress flicker for ~ms), and an extra row-version bump
// for nothing. The transient Pending slot carried no operational
// meaning — it was set and immediately overwritten — so dropping it
// collapses the start of the deploy into a single insert + notify.
// InProgress remains the documented "sent to site, awaiting response"
// state, set immediately before the round-trip below.
var record = new DeploymentRecord(deploymentId, user) var record = new DeploymentRecord(deploymentId, user)
{ {
InstanceId = instanceId, InstanceId = instanceId,
Status = DeploymentStatus.Pending, Status = DeploymentStatus.InProgress,
RevisionHash = revisionHash, RevisionHash = revisionHash,
DeployedAt = DateTimeOffset.UtcNow DeployedAt = DateTimeOffset.UtcNow
}; };
@@ -187,12 +214,6 @@ public class DeploymentService
await _repository.SaveChangesAsync(cancellationToken); await _repository.SaveChangesAsync(cancellationToken);
NotifyStatusChange(record); NotifyStatusChange(record);
// Update status to InProgress
record.Status = DeploymentStatus.InProgress;
await _repository.UpdateDeploymentRecordAsync(record, cancellationToken);
await _repository.SaveChangesAsync(cancellationToken);
NotifyStatusChange(record);
try try
{ {
// WP-1: Send to site via CommunicationService // WP-1: Send to site via CommunicationService
@@ -18,8 +18,22 @@ public class CentralHealthReportLoop : BackgroundService
/// <summary> /// <summary>
/// Reserved siteId used to represent the central cluster in the /// Reserved siteId used to represent the central cluster in the
/// shared CentralHealthAggregator keyspace. /// shared CentralHealthAggregator keyspace.
///
/// HealthMonitoring-021: the value is prefixed with <c>$</c> — a character
/// that is forbidden in real site identifiers (the configuration /
/// repository layer only permits Sites whose <c>SiteIdentifier</c> is a
/// plain identifier) — so the synthetic central entry cannot collide with
/// a real site whose operator-set identifier happened to be the bare word
/// "central". A collision would have caused the two reports to clobber
/// each other in the aggregator keyspace via the sequence-number guard,
/// and the real site would inherit the longer
/// <see cref="HealthMonitoringOptions.CentralOfflineTimeout"/> grace and
/// stay falsely-online for an extra two minutes after going down.
/// Consumers (<see cref="CentralHealthAggregator.CheckForOfflineSites"/>,
/// the Central UI health dashboard) reference this constant rather than
/// the literal string, so the change is local.
/// </summary> /// </summary>
public const string CentralSiteId = "central"; public const string CentralSiteId = "$central";
private readonly ISiteHealthCollector _collector; private readonly ISiteHealthCollector _collector;
private readonly ICentralHealthAggregator _aggregator; private readonly ICentralHealthAggregator _aggregator;
@@ -15,6 +15,15 @@ namespace ScadaLink.Host;
/// set, console output template, file path and rolling interval are all /// set, console output template, file path and rolling interval are all
/// configuration-driven (defined in <c>appsettings.json</c>), not hard-coded. The /// configuration-driven (defined in <c>appsettings.json</c>), not hard-coded. The
/// explicit <c>MinimumLevel.Is</c> below pins the floor from <see cref="LoggingOptions"/>. /// explicit <c>MinimumLevel.Is</c> below pins the floor from <see cref="LoggingOptions"/>.
///
/// Host-020: <c>ScadaLink:Logging:MinimumLevel</c> is the single source of truth
/// for the floor — the explicit <c>MinimumLevel.Is</c> call deliberately runs
/// AFTER <c>ReadFrom.Configuration</c> so a <c>Serilog:MinimumLevel</c> entry in
/// configuration is overridden. To make that precedence visible (so an operator
/// who sets <c>Serilog:MinimumLevel</c> does not wonder why the change had no
/// effect), <see cref="Build"/> writes a one-shot warning to
/// <see cref="Console.Error"/> when both keys are present. Pick one path —
/// editing <c>Serilog:MinimumLevel</c> alone has no effect.
/// </summary> /// </summary>
public static class LoggerConfigurationFactory public static class LoggerConfigurationFactory
{ {
@@ -29,11 +38,47 @@ public static class LoggerConfigurationFactory
string nodeRole, string nodeRole,
string siteId, string siteId,
string nodeHostname) string nodeHostname)
=> Build(configuration, nodeRole, siteId, nodeHostname, Console.Error);
/// <summary>
/// Test-visible overload of <see cref="Build(IConfiguration, string, string, string)"/>
/// that routes the Host-020 precedence warning through a caller-supplied
/// writer so unit tests can capture it. Production calls the four-arg
/// overload which uses <see cref="Console.Error"/>.
/// </summary>
/// <param name="configuration">Application configuration supplying the Serilog section and logging options.</param>
/// <param name="nodeRole">Role label added as a log enrichment property.</param>
/// <param name="siteId">Site identifier added as a log enrichment property.</param>
/// <param name="nodeHostname">Hostname added as a log enrichment property.</param>
/// <param name="warningWriter">Writer that receives the one-shot Host-020 override-warning when both keys are present.</param>
internal static LoggerConfiguration Build(
IConfiguration configuration,
string nodeRole,
string siteId,
string nodeHostname,
TextWriter warningWriter)
{ {
var loggingOptions = new LoggingOptions(); var loggingOptions = new LoggingOptions();
configuration.GetSection("ScadaLink:Logging").Bind(loggingOptions); configuration.GetSection("ScadaLink:Logging").Bind(loggingOptions);
var minimumLevel = ParseLevel(loggingOptions.MinimumLevel); var minimumLevel = ParseLevel(loggingOptions.MinimumLevel, warningWriter);
// Host-020: warn once if the operator also set a Serilog:MinimumLevel —
// they almost certainly expected it to take effect, but the explicit
// MinimumLevel.Is call below silently overrides it. The warning is
// emitted only when the conflicting key is actually present (a bare
// "Default" value is what ReadFrom.Configuration reads); a missing /
// empty Serilog:MinimumLevel section is silent.
var serilogMinimumLevel = configuration["Serilog:MinimumLevel"]
?? configuration["Serilog:MinimumLevel:Default"];
if (!string.IsNullOrWhiteSpace(serilogMinimumLevel))
{
warningWriter.WriteLine(
$"warning: Serilog:MinimumLevel ('{serilogMinimumLevel}') is being overridden by " +
$"ScadaLink:Logging:MinimumLevel ('{loggingOptions.MinimumLevel ?? "Information (default)"}'). " +
"ScadaLink:Logging:MinimumLevel is the documented source of truth for the floor (Host-011); " +
"remove the Serilog:MinimumLevel entry to silence this warning.");
}
return new LoggerConfiguration() return new LoggerConfiguration()
.ReadFrom.Configuration(configuration) .ReadFrom.Configuration(configuration)
+3 -1
View File
@@ -1,9 +1,11 @@
{ {
"ScadaLink": { "ScadaLink": {
"_nodeName": "Host-018: NodeName stamps SourceNode on AuditLog/Notifications/SiteCalls rows (CLAUDE.md 'Centralized Audit Log' decision) and backs IX_AuditLog_Node_Occurred. Convention: 'central-a'/'central-b' for central nodes, 'node-a'/'node-b' for site nodes. Override per-node in multi-node deployments (the docker per-node configs do this). When left at the default below, single-node dev rows are stamped with 'central-a'; an empty value normalises to a NULL SourceNode.",
"Node": { "Node": {
"Role": "Central", "Role": "Central",
"NodeHostname": "localhost", "NodeHostname": "localhost",
"RemotingPort": 8081 "RemotingPort": 8081,
"NodeName": "central-a"
}, },
"Cluster": { "Cluster": {
"SeedNodes": [ "SeedNodes": [
+5 -3
View File
@@ -1,11 +1,13 @@
{ {
"ScadaLink": { "ScadaLink": {
"_nodeName": "Host-018: NodeName stamps SourceNode on AuditLog/Notifications/SiteCalls rows (CLAUDE.md 'Centralized Audit Log' decision) and backs IX_AuditLog_Node_Occurred. Convention: 'node-a'/'node-b' for site nodes, 'central-a'/'central-b' for central nodes. Override per-node in multi-node deployments (the docker per-node configs do this). When left at the default below, single-node dev rows are stamped with 'node-a'; an empty value normalises to a NULL SourceNode.",
"Node": { "Node": {
"Role": "Site", "Role": "Site",
"NodeHostname": "localhost", "NodeHostname": "localhost",
"SiteId": "site-a", "SiteId": "site-a",
"RemotingPort": 8082, "RemotingPort": 8082,
"GrpcPort": 8083 "GrpcPort": 8083,
"NodeName": "node-a"
}, },
"Cluster": { "Cluster": {
"SeedNodes": [ "SeedNodes": [
@@ -31,9 +33,9 @@
"ReplicationEnabled": true "ReplicationEnabled": true
}, },
"Communication": { "Communication": {
"_centralContactPoints": "Host-016: each entry MUST be a central node's remoting endpoint, NOT this site's own remoting port. The single dev-loopback default below points only at central-a (localhost:8081). In a multi-central deployment add the second central node here (e.g. 'akka.tcp://scadalink@central-b-host:8081') so ClusterClient can fail over when central-a is down. The previous template listed localhost:8082 as the second contact — that is THIS site's own RemotingPort and is a permanent failure in the initial-contact rotation.",
"CentralContactPoints": [ "CentralContactPoints": [
"akka.tcp://scadalink@localhost:8081", "akka.tcp://scadalink@localhost:8081"
"akka.tcp://scadalink@localhost:8082"
], ],
"DeploymentTimeout": "00:02:00", "DeploymentTimeout": "00:02:00",
"LifecycleTimeout": "00:00:30", "LifecycleTimeout": "00:00:30",
+1 -5
View File
@@ -1,9 +1,5 @@
{ {
"Logging": { "_logging": "Host-021: Serilog is the sole logger provider (Program.cs calls builder.Host.UseSerilog()), so the standard Microsoft 'Logging:LogLevel' block has no effect and was removed. The minimum level is set via 'ScadaLink:Logging:MinimumLevel' (bound to LoggingOptions per Host-011); sinks are defined under the 'Serilog' section below and applied via ReadFrom.Configuration (Host-014). See LoggerConfigurationFactory + Component-Host.md REQ-HOST-8.",
"LogLevel": {
"Default": "Information"
}
},
"Serilog": { "Serilog": {
"Using": [ "Using": [
"Serilog.Sinks.Console", "Serilog.Sinks.Console",
@@ -36,9 +36,18 @@ namespace ScadaLink.SiteCallAudit;
/// Per CLAUDE.md "audit-write failure NEVER aborts the user-facing action" — /// Per CLAUDE.md "audit-write failure NEVER aborts the user-facing action" —
/// the actor catches every exception from the repository call and replies /// the actor catches every exception from the repository call and replies
/// <c>Accepted=false</c> without rethrowing, so the central singleton stays /// <c>Accepted=false</c> without rethrowing, so the central singleton stays
/// alive. The <see cref="SupervisorStrategy"/> uses <c>Resume</c> so an /// alive. The <see cref="SupervisorStrategy"/> override governs the actor's
/// unexpected throw before the catch (defence in depth) does not restart the /// <em>children</em>, not the actor itself; this actor has no children today,
/// actor and reset in-flight state. /// so the override is currently inert. It returns a one-for-one strategy with
/// <see cref="Akka.Actor.SupervisorStrategy.DefaultDecider"/> (Restart on most
/// exceptions, Stop on <see cref="ActorInitializationException"/> /
/// <see cref="ActorKilledException"/>) and <c>maxNrOfRetries: 0</c>, so any
/// future child that throws is Stopped on the first failure — a deliberate
/// "fail loudly" posture for the central singleton's eventual sub-actors
/// (reconciliation puller, purge scheduler). Self-supervision of this actor
/// is whatever the parent <see cref="Akka.Cluster.Tools.Singleton.ClusterSingletonManager"/>
/// supplies; the in-handler <c>try/catch</c> in <see cref="OnUpsertAsync"/>
/// is what actually keeps the singleton alive across repository faults.
/// </para> /// </para>
/// <para> /// <para>
/// Two constructors exist for the same reason as /// Two constructors exist for the same reason as
@@ -147,7 +156,18 @@ public class SiteCallAuditActor : ReceiveActor
Receive<DiscardSiteCallRequest>(HandleDiscardSiteCall); Receive<DiscardSiteCallRequest>(HandleDiscardSiteCall);
} }
/// <inheritdoc /> /// <summary>
/// SiteCallAudit-001: child supervision strategy — governs children, not this
/// actor. The actor has no children today, so this override is inert; it
/// returns a one-for-one strategy with the framework
/// <see cref="Akka.Actor.SupervisorStrategy.DefaultDecider"/> (Restart on
/// most exceptions; Stop on <see cref="ActorInitializationException"/> /
/// <see cref="ActorKilledException"/>) and <c>maxNrOfRetries: 0</c>, so any
/// future child that throws is Stopped on the first failure. The actor's
/// own resilience comes from the <c>try/catch</c> in <see cref="OnUpsertAsync"/>
/// plus the parent <see cref="Akka.Cluster.Tools.Singleton.ClusterSingletonManager"/>'s
/// supervision — not from this override.
/// </summary>
protected override SupervisorStrategy SupervisorStrategy() protected override SupervisorStrategy SupervisorStrategy()
{ {
return new OneForOneStrategy(maxNrOfRetries: 0, withinTimeRange: TimeSpan.Zero, decider: return new OneForOneStrategy(maxNrOfRetries: 0, withinTimeRange: TimeSpan.Zero, decider:
@@ -179,7 +199,14 @@ public class SiteCallAuditActor : ReceiveActor
try try
{ {
await repository.UpsertAsync(cmd.SiteCall).ConfigureAwait(false); // SiteCallAudit-003: stamp IngestedAtUtc at central-side persist
// time on every upsert, mirroring AuditLogIngestActor's combined-
// telemetry hot path. IngestedAtUtc is the "central ingested (or
// last refreshed) this row" timestamp; callers (telemetry,
// future reconciliation puller, direct-writes) cannot in general
// know they are running on central, so the actor owns the stamp.
var siteCall = cmd.SiteCall with { IngestedAtUtc = DateTime.UtcNow };
await repository.UpsertAsync(siteCall).ConfigureAwait(false);
replyTo.Tell(new UpsertSiteCallReply(id, Accepted: true)); replyTo.Tell(new UpsertSiteCallReply(id, Accepted: true));
} }
catch (Exception ex) catch (Exception ex)
@@ -27,4 +27,14 @@ public interface ISiteEventLogger
string source, string source,
string message, string message,
string? details = null); string? details = null);
/// <summary>
/// SiteEventLogging-018: total number of event writes that have failed
/// (SQLite error, disk full, bounded-queue overflow drop, etc.) since this
/// logger was created. Available for future Health Monitoring integration —
/// promoted onto the interface so a Health consumer can read it without a
/// concrete-type downcast. Not yet polled by Health Monitoring; the wiring
/// is tracked separately.
/// </summary>
long FailedWriteCount { get; }
} }
@@ -90,9 +90,15 @@ public class SiteEventLogger : ISiteEventLogger, IDisposable
} }
/// <summary> /// <summary>
/// Number of event writes that have failed (SQLite error, disk full, etc.) /// SiteEventLogging-018: number of event writes that have failed (SQLite
/// since this logger was created. Surfaced so Health Monitoring can detect a /// error, disk full, bounded-queue overflow drop, etc.) since this logger
/// logging outage instead of relying on a local log line nobody is watching. /// was created. Available for future Health Monitoring integration — the
/// counter is correct and observable, but the central health-metric
/// pipeline does not yet poll it, so a sustained non-zero value currently
/// goes unnoticed in production beyond the per-failure log line. Wiring
/// the metric into the 30-second site-metric publish is tracked
/// separately; promoted to <see cref="ISiteEventLogger"/> so the eventual
/// consumer reads it without a concrete-type downcast.
/// </summary> /// </summary>
public long FailedWriteCount => Interlocked.Read(ref _failedWriteCount); public long FailedWriteCount => Interlocked.Read(ref _failedWriteCount);
@@ -132,6 +132,11 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
// WP-33: Handle system-wide artifact deployment // WP-33: Handle system-wide artifact deployment
Receive<DeployArtifactsCommand>(HandleDeployArtifacts); Receive<DeployArtifactsCommand>(HandleDeployArtifacts);
// SiteRuntime-021: artifact-deploy DCL push, dispatched back from the
// off-thread persistence task so the hash-cache mutation stays
// actor-thread-confined.
Receive<ApplyArtifactDataConnectionsToDcl>(HandleApplyArtifactDataConnectionsToDcl);
// Debug View — route to Instance Actors // Debug View — route to Instance Actors
Receive<SubscribeDebugViewRequest>(RouteDebugViewSubscribe); Receive<SubscribeDebugViewRequest>(RouteDebugViewSubscribe);
Receive<UnsubscribeDebugViewRequest>(RouteDebugViewUnsubscribe); Receive<UnsubscribeDebugViewRequest>(RouteDebugViewUnsubscribe);
@@ -642,23 +647,12 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
foreach (var (name, connConfig) in config.Connections) foreach (var (name, connConfig) in config.Connections)
{ {
var configHash = ComputeConnectionConfigHash(connConfig); EnsureDclConnection(
if (_createdConnections.TryGetValue(name, out var lastHash) && lastHash == configHash) name,
continue; connConfig.Protocol,
connConfig.ConfigurationJson,
var primaryDetails = FlattenConnectionConfig(connConfig.Protocol, connConfig.ConfigurationJson); connConfig.BackupConfigurationJson,
var backupDetails = string.IsNullOrEmpty(connConfig.BackupConfigurationJson) connConfig.FailoverRetryCount);
? null
: FlattenConnectionConfig(connConfig.Protocol, connConfig.BackupConfigurationJson);
_dclManager.Tell(new Commons.Messages.DataConnection.CreateConnectionCommand(
name, connConfig.Protocol, primaryDetails, backupDetails, connConfig.FailoverRetryCount));
var changed = _createdConnections.ContainsKey(name);
_createdConnections[name] = configHash;
_logger.LogInformation(
"{Action} DCL connection {Connection} (protocol={Protocol})",
changed ? "Updated" : "Created", name, connConfig.Protocol);
} }
} }
catch (Exception ex) catch (Exception ex)
@@ -667,20 +661,78 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
} }
} }
/// <summary>
/// SiteRuntime-021: hash-guarded DCL connection push shared by the inline
/// per-instance path (<see cref="EnsureDclConnections(string)"/>) and the
/// system-wide artifact-deploy path (<see cref="HandleDeployArtifacts"/>).
/// Unchanged config is a no-op; a changed endpoint/credentials/backup/
/// failover-count re-issues a <c>CreateConnectionCommand</c> so a system-
/// wide artifact-deploy makes its data-connection change live immediately
/// (the artifact-deploy path previously only persisted to SQLite — the
/// DCL didn't see the change until next instance redeploy or node
/// restart, contradicting the "site is self-contained after artifact
/// deployment" intent).
/// </summary>
private void EnsureDclConnection(
string name,
string protocol,
string? primaryConfigurationJson,
string? backupConfigurationJson,
int failoverRetryCount)
{
if (_dclManager == null) return;
var configHash = ComputeConnectionConfigHashCore(
protocol, primaryConfigurationJson, backupConfigurationJson, failoverRetryCount);
if (_createdConnections.TryGetValue(name, out var lastHash) && lastHash == configHash)
return;
var primaryDetails = FlattenConnectionConfig(protocol, primaryConfigurationJson);
var backupDetails = string.IsNullOrEmpty(backupConfigurationJson)
? null
: FlattenConnectionConfig(protocol, backupConfigurationJson);
_dclManager.Tell(new Commons.Messages.DataConnection.CreateConnectionCommand(
name, protocol, primaryDetails, backupDetails, failoverRetryCount));
var changed = _createdConnections.ContainsKey(name);
_createdConnections[name] = configHash;
_logger.LogInformation(
"{Action} DCL connection {Connection} (protocol={Protocol})",
changed ? "Updated" : "Created", name, protocol);
}
/// <summary> /// <summary>
/// Computes a stable hash over the configuration fields that affect how the DCL /// Computes a stable hash over the configuration fields that affect how the DCL
/// connects, so a changed endpoint/credential/backup/failover count is detected /// connects, so a changed endpoint/credential/backup/failover count is detected
/// (SiteRuntime-010). /// (SiteRuntime-010).
/// </summary> /// </summary>
private static string ComputeConnectionConfigHash( private static string ComputeConnectionConfigHash(
Commons.Types.Flattening.ConnectionConfig connConfig) Commons.Types.Flattening.ConnectionConfig connConfig) =>
ComputeConnectionConfigHashCore(
connConfig.Protocol,
connConfig.ConfigurationJson,
connConfig.BackupConfigurationJson,
connConfig.FailoverRetryCount);
/// <summary>
/// SiteRuntime-021: field-based core so the system-wide artifact-deploy
/// path (which carries protocol/config-json/backup-json/failover directly
/// on <see cref="Commons.Messages.Artifacts.DataConnectionArtifact"/>) can
/// share the same hash + skip-or-resend logic as the inline-config path.
/// </summary>
private static string ComputeConnectionConfigHashCore(
string protocol,
string? primaryConfigurationJson,
string? backupConfigurationJson,
int failoverRetryCount)
{ {
var material = string.Join( var material = string.Join(
"", "",
connConfig.Protocol, protocol,
connConfig.ConfigurationJson ?? string.Empty, primaryConfigurationJson ?? string.Empty,
connConfig.BackupConfigurationJson ?? string.Empty, backupConfigurationJson ?? string.Empty,
connConfig.FailoverRetryCount.ToString()); failoverRetryCount.ToString(System.Globalization.CultureInfo.InvariantCulture));
var bytes = System.Security.Cryptography.SHA256.HashData( var bytes = System.Security.Cryptography.SHA256.HashData(
System.Text.Encoding.UTF8.GetBytes(material)); System.Text.Encoding.UTF8.GetBytes(material));
@@ -983,6 +1035,20 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
dc.Name, dc.Protocol, dc.PrimaryConfigurationJson, dc.Name, dc.Protocol, dc.PrimaryConfigurationJson,
dc.BackupConfigurationJson, dc.FailoverRetryCount); dc.BackupConfigurationJson, dc.FailoverRetryCount);
} }
// SiteRuntime-021: after the SQLite store, dispatch an
// internal message back to the actor thread so the DCL
// push runs through EnsureDclConnection — keeping the
// _createdConnections hash cache mutation actor-thread-
// confined while still making the change live immediately
// (previously the change landed in SQLite but the DCL
// kept using the stale connection until next instance
// redeploy or node restart, contradicting "site is
// self-contained after artifact deployment"). The
// helper's hash cache skips unchanged definitions, so
// the push is idempotent for re-deploys of the same
// artifact bundle.
Self.Tell(new ApplyArtifactDataConnectionsToDcl(command.DataConnections));
} }
// Store SMTP configurations // Store SMTP configurations
@@ -1044,6 +1110,27 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
_logger.LogDebug("Created Instance Actor for {Instance}", instanceName); _logger.LogDebug("Created Instance Actor for {Instance}", instanceName);
} }
/// <summary>
/// SiteRuntime-021: actor-thread handler that pushes artifact-deploy data
/// connection definitions to the DCL via the shared
/// <see cref="EnsureDclConnection"/> helper. Dispatched from
/// <see cref="HandleDeployArtifacts"/>'s off-thread Task so the
/// <see cref="_createdConnections"/> hash-cache mutation stays
/// actor-thread-confined.
/// </summary>
private void HandleApplyArtifactDataConnectionsToDcl(ApplyArtifactDataConnectionsToDcl msg)
{
foreach (var dc in msg.DataConnections)
{
EnsureDclConnection(
dc.Name,
dc.Protocol,
dc.PrimaryConfigurationJson,
dc.BackupConfigurationJson,
dc.FailoverRetryCount);
}
}
/// <summary> /// <summary>
/// Gets the count of active Instance Actors (for testing/diagnostics). /// Gets the count of active Instance Actors (for testing/diagnostics).
/// </summary> /// </summary>
@@ -1085,4 +1172,14 @@ public class DeploymentManagerActor : ReceiveActor, IWithTimers
/// A redeployment command buffered until the previous Instance Actor terminates. /// A redeployment command buffered until the previous Instance Actor terminates.
/// </summary> /// </summary>
internal record PendingRedeploy(DeployInstanceCommand Command, IActorRef OriginalSender); internal record PendingRedeploy(DeployInstanceCommand Command, IActorRef OriginalSender);
/// <summary>
/// SiteRuntime-021: internal message dispatched from
/// <see cref="HandleDeployArtifacts"/>'s off-thread persistence task back
/// onto the actor thread, so the DCL push (and its hash-cache mutation)
/// runs through <see cref="EnsureDclConnection"/> without crossing
/// thread-confinement boundaries.
/// </summary>
internal record ApplyArtifactDataConnectionsToDcl(
IReadOnlyList<Commons.Messages.Artifacts.DataConnectionArtifact> DataConnections);
} }
@@ -135,15 +135,20 @@ internal sealed class AuditingDbCommand : DbCommand
// the wrapper, but writes from the user go through to the inner // the wrapper, but writes from the user go through to the inner
// command so the underlying provider keeps its wiring intact. // command so the underlying provider keeps its wiring intact.
get => _wrappingConnection ?? _inner.Connection; get => _wrappingConnection ?? _inner.Connection;
// SiteRuntime-022: unwrap the AuditingDbConnection wrapper via its
// own internal Inner accessor instead of reflecting into a private
// _inner field. Reflection was the original SiteRuntime-006 anti-
// pattern (and is forbidden inside script bodies by the trust
// model) — both classes are internal sealed in the same assembly,
// so the proper API surface is available without leaking anything
// public.
set set
{ {
_wrappingConnection = value; _wrappingConnection = value;
_inner.Connection = value switch _inner.Connection = value switch
{ {
AuditingDbConnection auditing => auditing.GetType() AuditingDbConnection auditing => auditing.Inner,
.GetField("_inner", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic) _ => value,
!.GetValue(auditing) as DbConnection,
_ => value
}; };
} }
} }
@@ -86,6 +86,18 @@ internal sealed class AuditingDbConnection : DbConnection
_parentExecutionId = parentExecutionId; _parentExecutionId = parentExecutionId;
} }
/// <summary>
/// SiteRuntime-022: exposes the wrapped <see cref="DbConnection"/> to the
/// sibling <see cref="AuditingDbCommand"/> in the same assembly, so the
/// command's <c>DbConnection</c> setter can unwrap an
/// <see cref="AuditingDbConnection"/> without reflecting into the
/// private <c>_inner</c> field. Both classes are <c>internal sealed</c>
/// in this assembly, so the accessor stays out of the public API and
/// matches the SiteRuntime-006 precedent of preferring proper API surface
/// over <see cref="System.Reflection"/>.
/// </summary>
internal DbConnection Inner => _inner;
/// <inheritdoc /> /// <inheritdoc />
// ConnectionString is settable on DbConnection — forward both halves. // ConnectionString is settable on DbConnection — forward both halves.
public override string ConnectionString public override string ConnectionString
@@ -42,6 +42,11 @@ public static class ServiceCollectionExtensions
// ISiteIdentityProvider — HealthMonitoring already references S&F. // ISiteIdentityProvider — HealthMonitoring already references S&F.
var cachedCallObserver = sp.GetService<ICachedCallLifecycleObserver>(); var cachedCallObserver = sp.GetService<ICachedCallLifecycleObserver>();
var siteContext = sp.GetService<IStoreAndForwardSiteContext>(); var siteContext = sp.GetService<IStoreAndForwardSiteContext>();
// StoreAndForward-023: pass null/empty through unchanged — the
// service constructor normalises it to UnknownSiteSentinel so a
// host without an IStoreAndForwardSiteContext registration is
// observable in the central audit log instead of producing a
// silent empty-string SourceSite.
var siteId = siteContext?.SiteId ?? string.Empty; var siteId = siteContext?.SiteId ?? string.Empty;
return new StoreAndForwardService( return new StoreAndForwardService(
storage, storage,
@@ -14,7 +14,24 @@ public class StoreAndForwardOptions
/// <summary>WP-10: Default retry interval for messages without per-source settings.</summary> /// <summary>WP-10: Default retry interval for messages without per-source settings.</summary>
public TimeSpan DefaultRetryInterval { get; set; } = TimeSpan.FromSeconds(30); public TimeSpan DefaultRetryInterval { get; set; } = TimeSpan.FromSeconds(30);
/// <summary>WP-10: Default maximum retry count before parking.</summary> /// <summary>
/// WP-10: Default maximum retry count before parking. Applied when an
/// <c>EnqueueAsync</c> caller does not pass an explicit <c>maxRetries</c>.
/// <para>
/// <b>StoreAndForward-019:</b> this default is enforced uniformly across
/// every category, including <see cref="Commons.Types.Enums.StoreAndForwardCategory.Notification"/>:
/// once the buffered message's retry count reaches this cap the engine
/// parks the row. The Component-StoreAndForward.md "notifications do not
/// park" wording reflects the operational <i>intent</i> when central is
/// reachable on the normal cadence; under a sustained central outage that
/// exceeds <c>DefaultMaxRetries × forward-interval</c> a buffered
/// notification <i>will</i> park and surface in the parked-message UI,
/// matching the rest of the system's bounded-retry-then-park behaviour.
/// Callers that genuinely require unbounded retry must pass
/// <c>maxRetries: 0</c> on <c>EnqueueAsync</c> (the documented "no limit"
/// escape hatch — see <c>StoreAndForwardService.EnqueueAsync</c>).
/// </para>
/// </summary>
public int DefaultMaxRetries { get; set; } = 50; public int DefaultMaxRetries { get; set; } = 50;
/// <summary>WP-10: Interval for the background retry timer sweep.</summary> /// <summary>WP-10: Interval for the background retry timer sweep.</summary>
@@ -46,8 +46,31 @@ public class StoreAndForwardService
/// Audit Log #23 (M3 Bundle E — Task E4): site id stamped onto the /// Audit Log #23 (M3 Bundle E — Task E4): site id stamped onto the
/// cached-call attempt context so the audit bridge can build the /// cached-call attempt context so the audit bridge can build the
/// <see cref="SiteCallOperational"/> half of the telemetry packet. /// <see cref="SiteCallOperational"/> half of the telemetry packet.
/// <para>
/// <b>StoreAndForward-023:</b> an empty-string site id must never reach
/// downstream consumers — the central audit pipeline keys
/// <c>(SourceSite, TrackedOperationId)</c> off this value, so an empty
/// string degrades correlation to a per-id-only index and breaks the
/// per-site routing of <c>RetryParkedOperation</c>/<c>DiscardParkedOperation</c>
/// commands. The constructor normalises a null/empty/whitespace
/// <paramref name="siteId"/> argument to <see cref="UnknownSiteSentinel"/>
/// so a misconfigured host (no <c>IStoreAndForwardSiteContext</c>
/// registered) produces a distinctive marker in the central audit log
/// rather than silently merging multiple sites into the empty bucket.
/// </para>
/// </summary> /// </summary>
private readonly string _siteId; private readonly string _siteId;
/// <summary>
/// StoreAndForward-023: distinctive marker stamped onto cached-call audit
/// telemetry when the host has not registered an
/// <see cref="IStoreAndForwardSiteContext"/>. Chosen with a leading <c>$</c>
/// so it cannot collide with a real site id (which is a configuration
/// identifier and never starts with <c>$</c>). Surfacing this in the
/// central audit log makes a missing site-context binding immediately
/// recognisable instead of an unattributable empty string.
/// </summary>
public const string UnknownSiteSentinel = "$unknown-site";
private Timer? _retryTimer; private Timer? _retryTimer;
private int _retryInProgress; private int _retryInProgress;
@@ -120,7 +143,11 @@ public class StoreAndForwardService
_logger = logger; _logger = logger;
_replication = replication; _replication = replication;
_cachedCallObserver = cachedCallObserver; _cachedCallObserver = cachedCallObserver;
_siteId = siteId; // StoreAndForward-023: normalise an empty / whitespace site id to the
// distinctive UnknownSiteSentinel so downstream consumers (the central
// audit pipeline keying off SourceSite) never see an empty string and
// a misconfigured host is recognisable in the central log.
_siteId = string.IsNullOrWhiteSpace(siteId) ? UnknownSiteSentinel : siteId;
} }
/// <summary> /// <summary>
@@ -583,8 +610,21 @@ public class StoreAndForwardService
if (!TrackedOperationId.TryParse(message.Id, out var trackedId)) if (!TrackedOperationId.TryParse(message.Id, out var trackedId))
{ {
// Pre-M3 message (random GUID-N id from S&F itself, no // StoreAndForward-022: previously a silent skip — but a non-GUID
// TrackedOperationId threaded in). Skip — no audit row to bind to. // message id means a caller bypassed the audit hot path with zero
// feedback. The drop is still best-effort (S&F retry bookkeeping
// must never depend on the audit pipeline) but it is now observable
// via a Warning so a misconfigured caller can be diagnosed.
// Engine-minted ids (Guid.NewGuid().ToString("N")) and the current
// caller set (NotificationOutbox enqueue with NotificationId,
// cached-call enqueue with TrackedOperationId.ToString()) all
// parse — this log line fires only when a future caller supplies a
// non-GUID id, which is exactly when the silent-drop was hardest
// to diagnose.
_logger.LogWarning(
"Cached-call audit observer skipped: message id {MessageId} is not a parseable TrackedOperationId (category {Category}, outcome {Outcome}). " +
"Audit lifecycle for this operation will have no rows.",
message.Id, message.Category, outcome);
return; return;
} }
@@ -667,26 +707,51 @@ public class StoreAndForwardService
/// so a failover preserves the operator's retry intent. /// so a failover preserves the operator's retry intent.
/// StoreAndForward-017: the activity-log entry carries the message's true /// StoreAndForward-017: the activity-log entry carries the message's true
/// category rather than a hard-coded one. /// category rather than a hard-coded one.
/// StoreAndForward-020: the parked row is captured <i>before</i> the local
/// requeue write rather than re-read after it, so a concurrent
/// <c>RemoveMessageAsync</c> or <c>DiscardParkedMessageAsync</c> running
/// between the two storage calls cannot leave the standby in <c>Parked</c>
/// while the active node has already requeued — we always have the row in
/// hand for the <c>Requeue</c> replication.
/// </summary> /// </summary>
/// <param name="messageId">The identifier of the message to retry.</param> /// <param name="messageId">The identifier of the message to retry.</param>
/// <returns>True if successfully retried, false otherwise.</returns> /// <returns>True if successfully retried, false otherwise.</returns>
public async Task<bool> RetryParkedMessageAsync(string messageId) public async Task<bool> RetryParkedMessageAsync(string messageId)
{ {
var success = await _storage.RetryParkedMessageAsync(messageId); // StoreAndForward-020: capture the parked row up front so the standby
if (success) // gets a Requeue even if a concurrent writer (a sweep delete after a
// successful delivery, or an operator discard) removes the row between
// the local update and the re-load. The storage call below is
// conditional on status = Parked, so if the row has already moved we
// return false here without replicating — the standby's matching row
// will be reconciled by whichever other operator path won the race.
var captured = await _storage.GetMessageByIdAsync(messageId);
if (captured is null || captured.Status != StoreAndForwardMessageStatus.Parked)
{ {
// Re-load the requeued row so the activity log gets the real category return false;
// and the standby gets the post-requeue state (Pending, retry_count = 0).
var message = await _storage.GetMessageByIdAsync(messageId);
var category = message?.Category ?? StoreAndForwardCategory.ExternalSystem;
if (message != null)
{
_replication?.ReplicateRequeue(message);
}
RaiseActivity("Retry", category,
$"Parked message {messageId} moved back to queue");
} }
return success;
var success = await _storage.RetryParkedMessageAsync(messageId);
if (!success)
{
return false;
}
// The active node just rewrote this row to Pending with retry_count = 0
// and cleared last_error / last_attempt_at (see
// StoreAndForwardStorage.RetryParkedMessageAsync). Reconstruct the
// post-requeue state on the captured POCO so the standby applies the
// same mutations even if a concurrent writer has already deleted the
// row underneath us.
captured.Status = StoreAndForwardMessageStatus.Pending;
captured.RetryCount = 0;
captured.LastError = null;
captured.LastAttemptAt = null;
_replication?.ReplicateRequeue(captured);
RaiseActivity("Retry", captured.Category,
$"Parked message {messageId} moved back to queue");
return true;
} }
/// <summary> /// <summary>
@@ -35,17 +35,26 @@ public class ClusterOptionsTests
Assert.Empty(options.SeedNodes); Assert.Empty(options.SeedNodes);
} }
[Fact] // ClusterInfra-011: SectionName constant deleted — the previous test
public void SectionName_IsTheExpectedAppSettingsSection() // `SectionName_IsTheExpectedAppSettingsSection` is removed alongside it.
{ // The Host's SiteServiceRegistration / StartupValidator continue to
// CI-005: ClusterOptions must expose a single-source-of-truth constant for // reference the `"ScadaLink:Cluster"` literal directly; reinstating the
// its appsettings.json section so binding sites do not hard-code the string. // constant should happen when those Host binding sites can be updated in
Assert.Equal("ScadaLink:Cluster", ClusterOptions.SectionName); // the same change.
}
[Fact] [Fact]
public void Properties_CanBeSetToCustomValues() public void Properties_CanBeSetToCustomValues()
{ {
// ClusterInfra-013: this test exercises the POCO property setters only —
// `SplitBrainResolverStrategy = "keep-majority"` and `MinNrOfMembers = 2`
// are values the design doc explicitly forbids in production
// (`keep-majority` causes total shutdown on a two-node partition;
// `MinNrOfMembers = 2` blocks the cluster singleton after failover).
// The POCO accepts any value by design; rejection lives in
// `ClusterOptionsValidator` and is covered by
// `ClusterOptionsValidatorTests.UnsupportedSplitBrainStrategy_FailsValidation`
// and `ClusterOptionsValidatorTests.MinNrOfMembers_NotOne_FailsValidation`.
// Do NOT read these values as endorsed runtime configuration.
var options = new ClusterOptions var options = new ClusterOptions
{ {
SeedNodes = new List<string> { "akka.tcp://system@node1:2551", "akka.tcp://system@node2:2551" }, SeedNodes = new List<string> { "akka.tcp://system@node1:2551", "akka.tcp://system@node2:2551" },
@@ -4,11 +4,11 @@ using Microsoft.Extensions.Options;
namespace ScadaLink.ClusterInfrastructure.Tests; namespace ScadaLink.ClusterInfrastructure.Tests;
/// <summary> /// <summary>
/// CI-002: Tests that the DI extension methods do real work rather than /// CI-002: Tests that <see cref="ServiceCollectionExtensions.AddClusterInfrastructure"/>
/// silently returning success. <see cref="ServiceCollectionExtensions.AddClusterInfrastructure"/> /// does real work rather than silently returning success — it must register
/// must register the <see cref="ClusterOptionsValidator"/> so misconfiguration /// the <see cref="ClusterOptionsValidator"/> so misconfiguration fails fast.
/// fails fast, and the unimplemented actor-registration placeholder must fail /// (The companion actor-registration test was removed alongside the deleted
/// loudly rather than masquerade as a completed registration. /// `AddClusterInfrastructureActors` extension method — see ClusterInfra-014.)
/// </summary> /// </summary>
public class ServiceCollectionExtensionsTests public class ServiceCollectionExtensionsTests
{ {
@@ -48,11 +48,10 @@ public class ServiceCollectionExtensionsTests
Assert.Contains("MinNrOfMembers", ex.Message); Assert.Contains("MinNrOfMembers", ex.Message);
} }
[Fact] // ClusterInfra-014: `AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding`
public void AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding() // was removed alongside the now-deleted `AddClusterInfrastructureActors`
{ // extension method. The Akka.NET actor wiring legitimately lives in
var services = new ServiceCollection(); // `ScadaLink.Host` (AkkaHostedService) per the
// Component-ClusterInfrastructure.md "Implementation Note — Code Placement"
Assert.Throws<NotImplementedException>(() => services.AddClusterInfrastructureActors()); // section; this project no longer exposes an actor-registration extension.
}
} }
@@ -5,6 +5,7 @@ using Microsoft.Extensions.Options;
using NSubstitute; using NSubstitute;
using ScadaLink.Commons.Entities.Deployment; using ScadaLink.Commons.Entities.Deployment;
using ScadaLink.Commons.Entities.Instances; using ScadaLink.Commons.Entities.Instances;
using ScadaLink.Commons.Entities.Sites;
using ScadaLink.Commons.Interfaces.Repositories; using ScadaLink.Commons.Interfaces.Repositories;
using ScadaLink.Commons.Interfaces.Services; using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.Commons.Messages.Deployment; using ScadaLink.Commons.Messages.Deployment;
@@ -44,7 +45,7 @@ public class DeploymentServiceTests : TestKit
OperationLockTimeout = TimeSpan.FromSeconds(5) OperationLockTimeout = TimeSpan.FromSeconds(5)
}); });
var siteRepo = Substitute.For<ISiteRepository>(); var siteRepo = CreateSiteRepoStub();
_service = new DeploymentService( _service = new DeploymentService(
_repo, siteRepo, _pipeline, _comms, _lockManager, _audit, _repo, siteRepo, _pipeline, _comms, _lockManager, _audit,
new DiffService(), new DiffService(),
@@ -101,6 +102,91 @@ public class DeploymentServiceTests : TestKit
Assert.Contains("Validation failed", result.Error); Assert.Contains("Validation failed", result.Error);
} }
// ── DeploymentManager-021: missing Site row -> hard failure, no silent fabrication ──
[Fact]
public async Task DeployInstanceAsync_SiteRowMissing_FailsLoudlyInsteadOfSilentlySubstituting()
{
// DeploymentManager-021 regression: previously ResolveSiteIdentifierAsync
// silently returned the numeric siteId rendered as a string when the
// site row was missing (FK was deleted, race with admin delete, DB
// inconsistency). That bogus identifier then surfaced downstream as a
// confusing "unknown site" routing error that hid the real cause.
//
// After the fix the resolver throws InvalidOperationException naming
// the unresolved id; on the deploy path the existing try/catch turns
// it into a Failed deployment record whose error message reflects the
// actual problem.
var instance = new Instance("OrphanInst")
{
Id = 99, SiteId = 42, State = InstanceState.NotDeployed
};
_repo.GetInstanceByIdAsync(99, Arg.Any<CancellationToken>()).Returns(instance);
SetupValidPipeline(99, "OrphanInst", "sha256:abc");
// Build a fresh service whose siteRepo explicitly returns null for the
// instance's SiteId (the helper above seeds every id, so we shadow it
// for SiteId=42 only).
var siteRepo = CreateSiteRepoStub();
siteRepo.GetSiteByIdAsync(42, Arg.Any<CancellationToken>()).Returns((Site?)null);
var service = new DeploymentService(
_repo, siteRepo, _pipeline, _comms, _lockManager, _audit,
new DiffService(),
new DeploymentStatusNotifier(NullLogger<DeploymentStatusNotifier>.Instance),
Options.Create(new DeploymentManagerOptions { OperationLockTimeout = TimeSpan.FromSeconds(5) }),
NullLogger<DeploymentService>.Instance);
var result = await service.DeployInstanceAsync(99, "admin");
Assert.True(result.IsFailure);
// The descriptive message names the unresolved id so the operator sees
// the actual problem (missing site row), not a downstream routing error.
Assert.Contains("42", result.Error);
Assert.Contains("not found", result.Error, StringComparison.OrdinalIgnoreCase);
}
// ── DeploymentManager-022: no transient Pending -> single InProgress insert ──
[Fact]
public async Task DeployInstanceAsync_NoTransientPendingWrite_RecordCreatedDirectlyInProgress()
{
// DeploymentManager-022 regression: previously the deploy path wrote
// the record as Pending, then immediately updated it to InProgress
// with no work in between — an extra SaveChangesAsync round-trip, an
// extra notifier push, and a Pending->InProgress flicker in the
// CentralUI deployment-status page. After the fix the record is
// inserted directly in InProgress (one Add + one notify); no
// intermediate Pending row is ever persisted or notified.
var instance = new Instance("DirectInProgressInst")
{
Id = 200, SiteId = 1, State = InstanceState.NotDeployed
};
_repo.GetInstanceByIdAsync(200, Arg.Any<CancellationToken>()).Returns(instance);
SetupValidPipeline(200, "DirectInProgressInst", "sha256:dp22");
// The catch path later flips the same record reference to Failed, so
// snapshot the Status at insert time rather than reading the live
// reference at assertion time.
DeploymentStatus? statusAtInsert = null;
await _repo.AddDeploymentRecordAsync(
Arg.Do<DeploymentRecord>(r => statusAtInsert = r.Status), Arg.Any<CancellationToken>());
// The communication actor is unset so the call throws after the insert;
// we only care about the status the insert was made with.
await _service.DeployInstanceAsync(200, "admin");
// The single Add happens with the record already in InProgress.
Assert.NotNull(statusAtInsert);
Assert.Equal(DeploymentStatus.InProgress, statusAtInsert!.Value);
// No Pending update was issued — the resolver never wrote the
// intermediate Pending row.
await _repo.DidNotReceive().UpdateDeploymentRecordAsync(
Arg.Is<DeploymentRecord>(r => r.Status == DeploymentStatus.Pending),
Arg.Any<CancellationToken>());
}
// ── WP-2: Deployment identity ── // ── WP-2: Deployment identity ──
[Fact] [Fact]
@@ -581,7 +667,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance); NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor); comms.SetCommunicationActor(commActor);
var siteRepo = Substitute.For<ISiteRepository>(); var siteRepo = CreateSiteRepoStub();
return new DeploymentService( return new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit, _repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(), new DiffService(),
@@ -598,6 +684,30 @@ public class DeploymentServiceTests : TestKit
new FlatteningPipelineResult(config, revisionHash, ValidationResult.Success()))); new FlatteningPipelineResult(config, revisionHash, ValidationResult.Success())));
} }
/// <summary>
/// DeploymentManager-021 test helper: returns an <see cref="ISiteRepository"/>
/// substitute that resolves <see cref="ISiteRepository.GetSiteByIdAsync"/>
/// for ANY integer id to a stub <see cref="Site"/> whose
/// <c>SiteIdentifier</c> is <c>"site-{id}"</c>. Prior to the
/// DeploymentManager-021 fix the production `ResolveSiteIdentifierAsync`
/// silently substituted the numeric id when the site row was missing, so
/// these tests passed without seeding any Sites. After the fix a missing
/// site throws — every test that drives a deploy/lifecycle path needs a
/// real-shaped <see cref="Site"/> back, and this helper centralises that
/// arrangement so individual tests don't repeat the boilerplate.
/// </summary>
private static ISiteRepository CreateSiteRepoStub()
{
var siteRepo = Substitute.For<ISiteRepository>();
siteRepo.GetSiteByIdAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
.Returns(callInfo =>
{
var id = callInfo.ArgAt<int>(0);
return new Site($"Test Site {id}", $"site-{id}") { Id = id };
});
return siteRepo;
}
[Fact] [Fact]
public async Task DeployInstanceAsync_PriorInProgressRecord_SiteHasTargetHash_MarksSuccessWithoutRedeploy() public async Task DeployInstanceAsync_PriorInProgressRecord_SiteHasTargetHash_MarksSuccessWithoutRedeploy()
{ {
@@ -994,7 +1104,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance); NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor); comms.SetCommunicationActor(commActor);
var siteRepo = Substitute.For<ISiteRepository>(); var siteRepo = CreateSiteRepoStub();
var service = new DeploymentService( var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit, _repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(), new DiffService(),
@@ -1049,7 +1159,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance); NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor); comms.SetCommunicationActor(commActor);
var siteRepo = Substitute.For<ISiteRepository>(); var siteRepo = CreateSiteRepoStub();
var deadline = TimeSpan.FromMilliseconds(300); var deadline = TimeSpan.FromMilliseconds(300);
var service = new DeploymentService( var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit, _repo, siteRepo, _pipeline, comms, _lockManager, _audit,
@@ -1098,7 +1208,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance); NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor); comms.SetCommunicationActor(commActor);
var siteRepo = Substitute.For<ISiteRepository>(); var siteRepo = CreateSiteRepoStub();
var service = new DeploymentService( var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit, _repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(), new DiffService(),
@@ -1143,7 +1253,7 @@ public class DeploymentServiceTests : TestKit
NullLogger<CommunicationService>.Instance); NullLogger<CommunicationService>.Instance);
comms.SetCommunicationActor(commActor); comms.SetCommunicationActor(commActor);
var siteRepo = Substitute.For<ISiteRepository>(); var siteRepo = CreateSiteRepoStub();
var service = new DeploymentService( var service = new DeploymentService(
_repo, siteRepo, _pipeline, comms, _lockManager, _audit, _repo, siteRepo, _pipeline, comms, _lockManager, _audit,
new DiffService(), new DiffService(),
@@ -4,6 +4,7 @@ using Microsoft.Extensions.Options;
using NSubstitute; using NSubstitute;
using ScadaLink.Commons.Entities.Deployment; using ScadaLink.Commons.Entities.Deployment;
using ScadaLink.Commons.Entities.Instances; using ScadaLink.Commons.Entities.Instances;
using ScadaLink.Commons.Entities.Sites;
using ScadaLink.Commons.Interfaces.Repositories; using ScadaLink.Commons.Interfaces.Repositories;
using ScadaLink.Commons.Interfaces.Services; using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.Commons.Types; using ScadaLink.Commons.Types;
@@ -46,7 +47,17 @@ public class DeploymentStatusNotifierTests : TestKit
OperationLockTimeout = TimeSpan.FromSeconds(5) OperationLockTimeout = TimeSpan.FromSeconds(5)
}); });
// DeploymentManager-021: the resolver now throws when the site row
// is missing, so seed the substitute to return a real-shaped Site for
// any id these tests touch.
var siteRepo = Substitute.For<ISiteRepository>(); var siteRepo = Substitute.For<ISiteRepository>();
siteRepo.GetSiteByIdAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
.Returns(callInfo =>
{
var id = callInfo.ArgAt<int>(0);
return new Site($"Test Site {id}", $"site-{id}") { Id = id };
});
_service = new DeploymentService( _service = new DeploymentService(
_repo, siteRepo, _pipeline, _comms, _lockManager, _audit, _repo, siteRepo, _pipeline, _comms, _lockManager, _audit,
new DiffService(), _notifier, options, new DiffService(), _notifier, options,
@@ -68,14 +79,20 @@ public class DeploymentStatusNotifierTests : TestKit
_notifier.StatusChanged += c => changes.Add(c); _notifier.StatusChanged += c => changes.Add(c);
// _comms has no actor set, so the deploy reaches the catch block and // _comms has no actor set, so the deploy reaches the catch block and
// the record ends Failed. The notifier must fire for the Pending, // the record ends Failed. The notifier must fire for the InProgress
// InProgress and Failed writes — not be silent (the pre-fix behaviour). // and Failed writes — not be silent (the pre-fix behaviour).
//
// DeploymentManager-022: the transient Pending write was dropped from
// the deploy path (the record is now created directly in InProgress),
// so there is no Pending notification any more. The remaining two
// writes — the initial InProgress insert and the catch-block Failed
// update — must each raise a status-change.
var result = await _service.DeployInstanceAsync(7, "admin"); var result = await _service.DeployInstanceAsync(7, "admin");
Assert.True(result.IsFailure); Assert.True(result.IsFailure);
Assert.NotEmpty(changes); Assert.NotEmpty(changes);
Assert.All(changes, c => Assert.Equal(7, c.InstanceId)); Assert.All(changes, c => Assert.Equal(7, c.InstanceId));
Assert.Contains(changes, c => c.Status == DeploymentStatus.Pending); Assert.DoesNotContain(changes, c => c.Status == DeploymentStatus.Pending);
Assert.Contains(changes, c => c.Status == DeploymentStatus.InProgress); Assert.Contains(changes, c => c.Status == DeploymentStatus.InProgress);
Assert.Contains(changes, c => c.Status == DeploymentStatus.Failed); Assert.Contains(changes, c => c.Status == DeploymentStatus.Failed);
@@ -108,4 +108,54 @@ public class LoggerConfigurationTests
Assert.Equal(LogEventLevel.Warning, result); Assert.Equal(LogEventLevel.Warning, result);
Assert.Empty(writer.ToString()); Assert.Empty(writer.ToString());
} }
/// <summary>
/// Host-020: <c>ScadaLink:Logging:MinimumLevel</c> is the documented source
/// of truth for the Serilog floor, and the explicit <c>MinimumLevel.Is</c>
/// call deliberately runs after <c>ReadFrom.Configuration(...)</c> so a
/// <c>Serilog:MinimumLevel</c> entry is overridden. To make that precedence
/// visible — instead of silently swallowed — <see cref="LoggerConfigurationFactory.Build(IConfiguration,string,string,string,TextWriter)"/>
/// writes a one-shot warning when both keys are present. The warning must
/// name both values and point the operator at the documented key. When the
/// Serilog key is absent the warning is silent.
/// </summary>
[Fact]
public void Build_BothMinimumLevelKeysSet_WarnsAboutOverride()
{
var writer = new StringWriter();
var configuration = new ConfigurationBuilder()
.AddInMemoryCollection(new Dictionary<string, string?>
{
["ScadaLink:Logging:MinimumLevel"] = "Warning",
["Serilog:MinimumLevel"] = "Debug",
})
.Build();
LoggerConfigurationFactory.Build(configuration, "Central", "central", "node1", writer);
var warning = writer.ToString();
Assert.Contains("warning", warning, StringComparison.OrdinalIgnoreCase);
Assert.Contains("Serilog:MinimumLevel", warning);
Assert.Contains("ScadaLink:Logging:MinimumLevel", warning);
Assert.Contains("Debug", warning);
Assert.Contains("Warning", warning);
}
[Fact]
public void Build_OnlyScadaLinkMinimumLevelSet_NoOverrideWarning()
{
var writer = new StringWriter();
var configuration = new ConfigurationBuilder()
.AddInMemoryCollection(new Dictionary<string, string?>
{
["ScadaLink:Logging:MinimumLevel"] = "Warning",
})
.Build();
LoggerConfigurationFactory.Build(configuration, "Central", "central", "node1", writer);
// No Serilog override -> no override-warning. (The ScadaLink value is
// a recognised level, so ParseLevel is silent too.)
Assert.Empty(writer.ToString());
}
} }