fix(api-surface): close Theme 9 — 27 naming / dead-code / config / hygiene findings

The largest themed batch: small mechanical fixes across 12 modules.

API / message hygiene:
- Comm-020: SiteAddressCacheLoaded now carries IReadOnlyDictionary /
  IReadOnlyList — Akka messages must be immutable.
- Commons-016: BundleSession.MaxUnlockAttempts named constant replaces
  magic 3.
- Commons-018: IOperationTrackingStore + IPartitionMaintenance moved from
  Interfaces/ root to Interfaces/Services/ (namespace preserved — 9
  consumers exceeded the in-prompt move threshold).
- Commons-023: TrackingStatusSnapshot.SourceNode now consistent with the
  trailing-optional-with-default pattern used elsewhere.
- SR-022: AuditingDbCommand.DbConnection.set no longer uses reflection —
  exposes AuditingDbConnection.Inner via internal API surface.

Dead code / config cleanup:
- ClusterInfra-011: decorative SectionName constant deleted.
- ClusterInfra-014: dead AddClusterInfrastructureActors method + its
  "throws-when-called" test deleted.
- Host-021: Microsoft Logging:LogLevel block deleted from appsettings.json
  (dead under Serilog).

Fail-loud over fail-silent:
- DM-021: ResolveSiteIdentifierAsync throws on missing site (was silently
  substituting a DB id).
- DM-022: dropped transient Pending write — record now lands directly in
  InProgress (no UI flicker, one fewer DB write).
- Host-020: LoggerConfigurationFactory emits a Console.Error warning when
  both Serilog:MinimumLevel and ScadaLink:Logging:MinimumLevel are set
  (ScadaLink remains truth per Host-011).
- SnF-022: NotifyCachedCallObserverAsync logs Warning on unparseable
  TrackedOperationId (was silently dropping).
- SnF-023: empty siteId default replaced with $unknown-site sentinel
  + constructor normalisation.

Correctness:
- SCA-001: SupervisorStrategy XML rewritten to match actual
  DefaultDecider/Restart semantics (was claiming Resume).
- SCA-003: OnUpsertAsync now restamps IngestedAtUtc on every upsert.
- SR-021: HandleDeployArtifacts now dispatches an internal
  ApplyArtifactDataConnectionsToDcl message after the SQLite write so
  system-wide artifact-deploy data-connection changes go live
  immediately (was requiring a site restart).
- SnF-020: RetryParkedMessageAsync captures the parked row BEFORE the
  local write so a concurrent delete can't skip standby replication.

Sentinels / naming collisions:
- HM-021: CentralSiteId changed from "central" to "$central"
  (uncollideable — leading $ is forbidden in real SiteIdentifiers).

Doc / surface cleanups:
- SEL-018: FailedWriteCount promoted to ISiteEventLogger; XML softened
  to "Available for future Health Monitoring integration".
- SnF-019: VERIFY outcome — documented parking-after-DefaultMaxRetries
  in Component-StoreAndForward.md + DefaultMaxRetries XML (uniform
  cap; maxRetries:0 is the unbounded escape hatch).
- SnF-021: Component-StoreAndForward.md no longer claims the tracking
  table lives in SnF — it's in SiteRuntime, the interface is in Commons.
- CLI-020: bundle export response parse guarded with try/catch on
  JsonException / KeyNotFoundException / InvalidOperationException /
  FormatException; emits a clean INVALID_RESPONSE exit instead of a
  stack trace.

Config:
- ClusterInfra-013: intent comment added to "catastrophic config" test.
- Host-016: appsettings.Site.json second CentralContactPoints entry
  removed (was pointing at the SITE's own port); doc-key explains how
  to extend.
- Host-018: NodeName added to both shipped per-role configs (was
  causing SourceNode to be null on audit rows).

UI:
- CentralUI-029: replaced JS.InvokeAsync<int>("eval", …) with an ES
  module import (new wwwroot/js/browser-time.js).
- CentralUI-032: AuditResultsGrid gains a Previous button backed by a
  cursor stack.

10+ new regression tests across the affected projects. Build clean;
all suites green. README regenerated: 6 open (was 33).

Session-to-date: 130 of 136 originally-open Theme findings closed.
Joseph Doherty
2026-05-28 08:39:01 -04:00
parent d190345ef0
commit 77cb0ad0e2
46 changed files with 966 additions and 278 deletions
+4 -6
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 2 |
| Open findings | 1 |
## Summary
@@ -931,9 +931,11 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.CLI/Commands/BundleCommands.cs:117-126` |
**Resolution (2026-05-28):** Wrapped the `JsonDocument.Parse` + `GetProperty` extraction in a `try/catch (JsonException or KeyNotFoundException or InvalidOperationException)` block and the `StreamBase64ToFile` call in a separate `try/catch (FormatException)`. Either failure now emits a clean `OutputFormatter.WriteError(..., "INVALID_RESPONSE")` and returns exit 1, matching the graceful-degradation pattern established by CLI-002 / CLI-003 / CLI-005. A malformed/abbreviated envelope no longer terminates the CLI with a raw stack trace.
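The guarded parse described above can be sketched roughly as follows. This is an illustration only: the helper name `ParseExportEnvelope` and the envelope property names (`data`, `contentBase64`) are assumptions, and the real code reports failures through `OutputFormatter.WriteError` rather than writing to `Console.Error` directly.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: returns 0 with the decoded bytes on success, or 1 after
// reporting INVALID_RESPONSE when the envelope is malformed.
static int ParseExportEnvelope(string json, out byte[] payload)
{
    payload = Array.Empty<byte>();
    string base64;
    try
    {
        using var doc = JsonDocument.Parse(json);
        // Property names are assumptions made for this sketch.
        base64 = doc.RootElement
            .GetProperty("data")
            .GetProperty("contentBase64")
            .GetString() ?? string.Empty;
    }
    catch (Exception ex) when (ex is JsonException or KeyNotFoundException or InvalidOperationException)
    {
        // Invalid JSON, a missing property, or a value of the wrong kind.
        Console.Error.WriteLine($"INVALID_RESPONSE: {ex.Message}");
        return 1;
    }

    try
    {
        payload = Convert.FromBase64String(base64);
    }
    catch (FormatException ex)
    {
        // The payload was present but is not valid base64.
        Console.Error.WriteLine($"INVALID_RESPONSE: {ex.Message}");
        return 1;
    }

    return 0;
}

// An envelope with no payload property exits cleanly instead of throwing.
int exit = ParseExportEnvelope("{\"data\":{}}", out _);
Console.WriteLine(exit); // prints 1
```

Keeping the two `try` blocks separate mirrors the resolution text: a parse failure and a decode failure are distinguishable in the message, but both map to the same exit code.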
**Description**
The export success handler does:
@@ -959,10 +961,6 @@ Wrap the parse + base64-decode in a `try` block that catches `JsonException`,
clean `OutputFormatter.WriteError(..., "INVALID_RESPONSE")` + `return 1`. Add a
regression test against a malformed-envelope stub `HttpMessageHandler`.
**Resolution**
_Unresolved._
### CLI-021 — `CliConfig.Load` crashes the CLI on a malformed config file
| | |
+7 -3
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 2 |
| Open findings | 0 |
## Summary
@@ -1429,9 +1429,11 @@ still passes (568 / 568).
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Audit/ConfigurationAuditLog.razor:248-263` |
**Resolution (2026-05-28):** Added a small 5-line `wwwroot/js/browser-time.js` ES module exporting `getTimezoneOffsetMinutes()`, and replaced the `JS.InvokeAsync<int>("eval", "new Date().getTimezoneOffset()")` call in `ConfigurationAuditLog.OnAfterRenderAsync` with a lazy `IJSObjectReference` import (`./_content/ScadaLink.CentralUI/js/browser-time.js`) + `module.InvokeAsync<int>("getTimezoneOffsetMinutes")`, matching the `session-expiry.js` / `audit-grid.js` / `nav-state.js` / `transport.js` module-import pattern. The residual `eval` JS-interop surface is gone and the page is now CSP-compatible with `unsafe-eval` forbidden.
**Description**
`OnAfterRenderAsync` fetches the browser's UTC offset with
@@ -1533,9 +1535,11 @@ docs to call out the in-memory cost per concurrent import session.
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor:76-82`; `src/ScadaLink.CentralUI/Components/Audit/AuditResultsGrid.razor.cs:65,196-197,219-220` |
**Resolution (2026-05-28):** Added a `Stack<AuditLogPaging?> _cursorStack` and `AuditLogPaging? _currentPaging` field to `AuditResultsGrid.razor.cs`. `NextPage` now pushes the current cursor before advancing; a new `PrevPage` method pops the prior cursor, reloads at that position, and decrements `_pageNumber` only if the reload succeeds (a failed fetch leaves the user on the current page rather than stranding them between pages). The filter-change reset clears the stack alongside `_rows`. The razor template now renders a `btn-group` with a Previous button (gated on `CanGoBack`) alongside the existing Next button; both buttons get the standard `disabled` treatment during loads.
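A minimal sketch of the cursor-stack idea, under stated assumptions: the cursor here is a plain `int?` (the last Id seen) rather than `AuditLogPaging`, the data source is an in-memory array, and top-level locals stand in for the component's fields. The exact ordering of peek, reload, and pop is one way to keep state unchanged on a failed fetch and is not confirmed by the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical keyset-paged source: a null cursor means "first page".
int[] allIds = Enumerable.Range(1, 7).ToArray();
const int PageSize = 3;

// These locals stand in for the component's fields.
var cursorStack = new Stack<int?>(); // cursors of the pages we came from
int? currentCursor = null;           // cursor that produced the page on screen
int pageNumber = 1;
int[] rows = Fetch(currentCursor);

int[] Fetch(int? afterId) =>
    allIds.Where(id => afterId is null || id > afterId).Take(PageSize).ToArray();

bool CanGoBack() => cursorStack.Count > 0;

void NextPage()
{
    if (rows.Length == 0) return;
    int? next = rows[^1];
    int[] loaded = Fetch(next);
    if (loaded.Length == 0) return;  // nothing further; stay on this page
    cursorStack.Push(currentCursor); // remember how to get back
    currentCursor = next;
    rows = loaded;
    pageNumber++;
}

void PrevPage()
{
    if (!CanGoBack()) return;
    int? previous = cursorStack.Peek();
    int[] loaded = Fetch(previous);  // if this throws, no state has changed yet
    cursorStack.Pop();
    currentCursor = previous;
    rows = loaded;
    pageNumber--;
}

NextPage();
NextPage();
PrevPage();
Console.WriteLine($"{pageNumber}: {string.Join(",", rows)}"); // prints "2: 4,5,6"
```

A filter change would clear `cursorStack` and reset `currentCursor` and `pageNumber` together, since stored cursors are only meaningful for the query that produced them.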
**Description**
The Audit Log results grid (Bundle B / M7-T3) renders a single "Next page" button
+11 -18
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 4 |
| Open findings | 1 |
## Summary
@@ -687,8 +687,10 @@ confirmed failing, then passing after the fix. Module test suite green (18 passe
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:24-27`, `src/ScadaLink.Host/SiteServiceRegistration.cs:100`, `src/ScadaLink.Host/StartupValidator.cs:43`, `src/ScadaLink.Host/StartupValidator.cs:45`, `src/ScadaLink.Host/StartupValidator.cs:75` |
| Status | Resolved |
| Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:24-27`, `src/ScadaLink.Host/SiteServiceRegistration.cs:100`, `src/ScadaLink.Host/StartupValidator.cs:43`, `src/ScadaLink.Host/StartupValidator.cs:75` |
**Resolution (2026-05-28):** Took option (b) since wiring the constant into the Host's `SiteServiceRegistration.BindSharedOptions` / `StartupValidator` is outside this module's editable surface — deleted the `SectionName` constant from `ClusterOptions.cs` and the companion `SectionName_IsTheExpectedAppSettingsSection` test from `ClusterOptionsTests.cs`. The Host's `"ScadaLink:Cluster"` literals now stand alone (consistent with the implementation rather than the broken "single source of truth" claim). A code-comment placeholder records the rationale so a future Host-side change can reinstate the constant alongside the binding-site updates.
**Description**
@@ -722,11 +724,6 @@ Either (a) replace the hard-coded `"ScadaLink:Cluster"` literals in
claim to be the source of truth. Do not leave a public constant whose stated
guarantee the code does not deliver.
**Resolution**
_Open — needs a one-line Host-side change to reference the constant, plus a test
that proves the section name flows from this module to the Host._
### ClusterInfrastructure-012 — Validator accepts `SeedNodes.Count == 1` despite design requiring both nodes as seeds
| | |
@@ -791,9 +788,11 @@ ClusterInfrastructure.Tests).
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Status | Resolved |
| Location | `tests/ScadaLink.ClusterInfrastructure.Tests/ClusterOptionsTests.cs:47-67` |
**Resolution (2026-05-28):** Added a 10-line inline `// ClusterInfra-013: ...` block at the top of `Properties_CanBeSetToCustomValues` recording that this test exercises the POCO property setters only: the `keep-majority` strategy and `MinNrOfMembers = 2` values are forbidden in production by `ClusterOptionsValidator`, and the comment cross-references `UnsupportedSplitBrainStrategy_FailsValidation` and `MinNrOfMembers_NotOne_FailsValidation` so a future reader cannot misread the test as endorsing those values.
**Description**
`ClusterOptionsTests.Properties_CanBeSetToCustomValues` deliberately sets two values
@@ -822,19 +821,17 @@ represent a valid runtime configuration, and `ClusterOptionsValidator` rejects t
(with a cross-reference to the relevant validator tests). Two lines is enough; the
goal is to make the test's intent self-documenting.
**Resolution**
_Open._
### ClusterInfrastructure-014 — `AddClusterInfrastructureActors` is dead surface — no caller, no behaviour
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:42-48` |
**Resolution (2026-05-28):** Deleted the `AddClusterInfrastructureActors` extension method from `ServiceCollectionExtensions.cs` and its companion `AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding` test from `ServiceCollectionExtensionsTests.cs`. Verified no production caller existed before deletion via `grep -rn`. A code comment records the rationale (CI-001 ownership question now permanently settled; method served only to throw and was IDE-auto-complete noise). The class-level XML doc on the test file was updated to drop the stale reference to the removed test.
**Description**
`AddClusterInfrastructureActors` has now reached a curious state: it is a public
@@ -860,7 +857,3 @@ explicitly stating that this project exposes no actor-registration extension
(actor wiring lives in `ScadaLink.Host`). If the user prefers to keep the
"fail-fast" trap, mark the method `[Obsolete(true, error: true)]` so the compiler —
not the runtime — rejects the call.
**Resolution**
_Open._
+10 -4
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 5 |
| Open findings | 2 |
## Summary
@@ -784,9 +784,11 @@ accepted values on the record.
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Commons/Types/Transport/BundleSession.cs:13-16` |
**Resolution (2026-05-28):** Added `public const int MaxUnlockAttempts = 3;` to `BundleSession` with an XML doc cross-referencing the authoritative `TransportOptions.MaxUnlockAttemptsPerSession`. The `Locked` getter now reads `FailedUnlockAttempts >= MaxUnlockAttempts` instead of comparing against the literal `3`, and the property's XML doc names the constant. No call-site update required — the existing Transport-component `TransportOptions.MaxUnlockAttemptsPerSession` (also `3`) remains the operator-facing dial; this constant is the shim's own threshold, now searchable for a security review.
**Description**
`BundleSession` exposes:
@@ -880,9 +882,11 @@ needed again now.
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Commons/Interfaces/IOperationTrackingStore.cs`, `src/ScadaLink.Commons/Interfaces/IPartitionMaintenance.cs` |
**Resolution (2026-05-28):** Moved both files into `src/ScadaLink.Commons/Interfaces/Services/`, matching the REQ-COM-5b sub-folder convention alongside the other service interfaces (`ISiteAuditQueue`, `INodeIdentityProvider`, `ICachedCallLifecycleObserver`, etc.). The 9 consumer files across `ScadaLink.SiteRuntime`, `ScadaLink.AuditLog`, `ScadaLink.ConfigurationDatabase`, and `ScadaLink.Host` exceed the in-instructions 8-file STOP threshold for namespace rewrites, so the namespace was deliberately kept as `ScadaLink.Commons.Interfaces` (not `.Services`) — no consumer change required, build remains green. A comment in each moved file records the rationale and notes that adopting the canonical `.Services` namespace can be picked up alongside any future Commons-wide namespace tidy-up.
**Description**
REQ-COM-5b documents the `Interfaces/` folder as having exactly three sub-folders:
@@ -1115,9 +1119,11 @@ Two related XML-doc weaknesses, both around the new Transport / Audit surface:
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Commons/Messages/Audit/SiteCallQueries.cs:53-66`, `:110-123`, `src/ScadaLink.Commons/Messages/Notification/NotificationOutboxQueries.cs:26-39`, `:104-123`, `src/ScadaLink.Commons/Types/SiteCallOperational.cs:42-54`, `src/ScadaLink.Commons/Types/TrackingStatusSnapshot.cs:33-46` |
**Resolution (2026-05-28):** Read all six locations and confirmed the dominant pattern is "trailing-optional with `= null` default" (`SiteCallSummary`, `SiteCallDetail`, `NotificationSummary`, `NotificationDetail`, `NotificationOutboxQueryRequest.SourceNodeFilter`, `SiteCallQueryRequest.SourceNodeFilter` all already use this form). The single odd-one-out was `TrackingStatusSnapshot.SourceNode`, declared as `string? SourceNode` with no default — added the `= null` default to unify it with the rest. Verified both existing callers (`OperationTrackingStore.cs` and `TrackingApiTests.cs`) use named arguments, so the change is purely additive. `SiteCallOperational.SourceNode` sits in the middle of its positional parameter list rather than the trailing slot — that's a separate positional-record concern outside the "trailing-optional" pattern the finding called out, and moving it would touch many telemetry/proto consumers, so it was deliberately not touched here.
**Description**
The `SourceNode` rollout adds an optional trailing parameter to a long list of positional
+4 -2
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 3 |
| Open findings | 2 |
## Summary
@@ -1005,9 +1005,11 @@ the finding.
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Communication/Actors/CentralCommunicationActor.cs:567` |
**Resolution (2026-05-28):** `SiteAddressCacheLoaded`'s `SiteContacts` payload is now typed as `IReadOnlyDictionary<string, IReadOnlyList<string>>`, enforcing the Akka.NET message-immutability convention at the type level rather than relying on producer discipline. The producer (`LoadSiteAddressesFromDb`) builds the working buckets as before and wraps each inner `List<string>` with `AsReadOnly()` before constructing the message — the freeze is local to the single refresh tick and the cost is negligible. The consumer (`HandleSiteAddressCacheLoaded`) only ever read via `Keys`, foreach-deconstruct, `Select`, `Count` and `ToImmutableHashSet`, all of which are supported by the new read-only types, so no consumer changes were needed. The existing `MalformedSiteAddress_DoesNotAbortRefresh_OtherSitesStillRegistered` and `ClusterClientRouting_RoutesToConfiguredSite` regression tests exercise the producer→consumer flow and continue to pass under the read-only types.
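The build-then-freeze step can be illustrated with a short sketch. The row data and variable names below are hypothetical; only the target type and the use of `AsReadOnly()` come from the resolution text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the site-address query result.
var addressRows = new (string SiteId, string Address)[]
{
    ("site-a", "akka.tcp://scadalink@a1:8082"),
    ("site-a", "akka.tcp://scadalink@a2:8082"),
    ("site-b", "akka.tcp://scadalink@b1:8082"),
};

// Build mutable working buckets first, as a producer normally would.
var buckets = new Dictionary<string, List<string>>();
foreach (var (siteId, address) in addressRows)
{
    if (!buckets.TryGetValue(siteId, out var list))
        buckets[siteId] = list = new List<string>();
    list.Add(address);
}

// Freeze: wrap each inner list, then expose the map through read-only interfaces.
IReadOnlyDictionary<string, IReadOnlyList<string>> siteContacts =
    buckets.ToDictionary(
        kv => kv.Key,
        kv => (IReadOnlyList<string>)kv.Value.AsReadOnly());

Console.WriteLine(siteContacts["site-a"].Count); // prints 2
```

Note that `AsReadOnly()` returns a wrapper over the original list rather than a copy, so the guarantee holds only if the producer discards `buckets` after constructing the message.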
**Description**
The Akka.NET convention is that messages crossing actor boundaries (even
+7 -3
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 2 |
| Open findings | 0 |
## Summary
@@ -1076,9 +1076,11 @@ as the actor. Tests green (80/80 in DeploymentManager.Tests).
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:107-111` |
**Resolution (2026-05-28):** `ResolveSiteIdentifierAsync` now throws `InvalidOperationException` (`"Site with ID {siteId} not found; cannot resolve its SiteIdentifier for routing."`) when the `Site` row is missing, instead of returning the numeric id rendered as a string. The deploy path's existing try/catch turns the throw into a `DeploymentStatus.Failed` record carrying the descriptive message (the `DeploymentManager-001`/`-002` cleanup write the failure with `CancellationToken.None`); the lifecycle paths (Disable/Enable/Delete) propagate the exception so the CLI/UI caller surfaces the actual cause to the operator rather than seeing a confusing downstream "unknown site" routing error. The repository contract already returned `Site?`, so the null path is now type-visible at the call site instead of silently papered over.
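As a rough illustration of the fail-loud behaviour, the sketch below uses a synchronous local function and an in-memory dictionary in place of the async repository call; both are assumptions for brevity. The exception type and message text are taken from the resolution above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the Site repository (id -> SiteIdentifier).
var sites = new Dictionary<int, string> { [1] = "plant-north" };

string ResolveSiteIdentifier(int siteId)
{
    if (!sites.TryGetValue(siteId, out var identifier))
    {
        // The old behaviour amounted to returning siteId.ToString() here, which
        // only surfaced later as an unrelated-looking routing error.
        throw new InvalidOperationException(
            $"Site with ID {siteId} not found; cannot resolve its SiteIdentifier for routing.");
    }
    return identifier;
}

try
{
    ResolveSiteIdentifier(42);
}
catch (InvalidOperationException ex)
{
    // The caller now sees the actual cause at the point of failure.
    Console.WriteLine(ex.Message);
}
```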
**Description**
```
@@ -1114,9 +1116,11 @@ returns `Site?`, so the null path is type-visible; just don't paper over it.
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.DeploymentManager/DeploymentService.cs:178-194` |
**Resolution (2026-05-28):** The transient `Pending` write was dropped: the deployment record is now created directly in `DeploymentStatus.InProgress`, which collapses the start of the deploy into a single `AddDeploymentRecordAsync` + `SaveChangesAsync` + `NotifyStatusChange` (instead of two writes back-to-back). The flattening, validation, and `TryReconcileWithSiteAsync` round-trip have all completed before the insert, and the deploy command is sent immediately after, so `Pending` carried no operational meaning between the two writes. `InProgress` retains its documented "sent to site, awaiting response" semantics. Eliminating the extra `SaveChangesAsync` round-trip also removes the `Pending` → `InProgress` flicker the CentralUI-006 deployment-status page used to render via the second `IDeploymentStatusNotifier.NotifyStatusChanged` invocation.
**Description**
`DeployInstanceAsync` does:
+4 -2
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 3 |
| Open findings | 2 |
## Summary
@@ -1034,9 +1034,11 @@ online.
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.HealthMonitoring/CentralHealthReportLoop.cs:22`, `src/ScadaLink.HealthMonitoring/CentralHealthAggregator.cs:224-226` |
**Resolution (2026-05-28):** `CentralHealthReportLoop.CentralSiteId` is now `"$central"` instead of `"central"`. The leading `$` is forbidden in operator-set `Site.SiteIdentifier` values (which are plain identifiers), so the synthetic central self-report cannot collide with a real site whose identifier happens to be the bare word `"central"`. The collision case the finding called out — two reports clobbering each other in the aggregator keyspace via the sequence-number guard and a real site inheriting `CentralOfflineTimeout` and staying falsely-online for an extra two minutes — is now impossible. The aggregator (`CentralHealthAggregator.CheckForOfflineSites`), the Central UI health dashboard (`Monitoring/Health.razor`), and every test reference the constant rather than the literal string, so the value change is local — no consumer code needed updating. Existing `CentralHealthAggregatorTests` and `CentralHealthReportLoopTests` already use the constant, so they continue to pin the central-self-report identity through the new sentinel.
**Description**
`CentralHealthAggregator.CheckForOfflineSites` looks up the per-site offline
+9 -17
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 4 |
| Open findings | 0 |
## Summary
@@ -831,7 +831,7 @@ Full Host suite green (182 passed).
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Host/appsettings.Site.json:33-37` |
**Description**
@@ -861,9 +861,7 @@ multi-node layout). Consider extending `StartupValidator` to reject any
node's `NodeHostname`+`RemotingPort`. Add a regression test in
`StartupValidatorTests` mirroring `Site_SeedNodeOnGrpcPort_FailsValidation`.
**Resolution**
_Open._
**Resolution (2026-05-28):** The shipped `appsettings.Site.json` `CentralContactPoints` entry that pointed at the site's own remoting port (`localhost:8082`) was removed — the dev-loopback default now lists only the single central node (`akka.tcp://scadalink@localhost:8081`), which is the actually-reachable target in the single-node dev layout. A `_centralContactPoints` doc-key comment was added immediately above the array calling out the per-entry rule (each entry MUST be a central node's remoting endpoint, not the site's own remoting port) and explaining how to extend the list with a second central node (`akka.tcp://scadalink@central-b-host:8081`) in a multi-central deployment so ClusterClient can fail over. The dangerous example pattern that would have been copied into multi-central configs no longer exists in the template. `StartupValidator` cross-check is left as a follow-up — the documented rule plus the corrected template removes the immediate misconfiguration risk.
### Host-017 — Site-shutdown ordering from REQ-HOST-7 is not wired
@@ -943,7 +941,7 @@ unit suite covers both server-side invariants and the wiring is a single
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Host/appsettings.Central.json`, `src/ScadaLink.Host/appsettings.Site.json`, `src/ScadaLink.Host/NodeOptions.cs:10-16` |
**Description**
@@ -974,9 +972,7 @@ per-node in multi-node deployments. Consider validating in `StartupValidator`
that `NodeName` is non-empty, or accept the null and document explicitly that
single-node dev deployments leave `SourceNode` null.
**Resolution**
_Open._
**Resolution (2026-05-28):** The shipped per-role templates now set `ScadaLink:Node:NodeName` (`central-a` in `appsettings.Central.json` and `node-a` in `appsettings.Site.json`), so dev audit rows are stamped with a real `SourceNode` value (instead of `NodeIdentityProvider` normalising the missing key to `null`) and the indexed `IX_AuditLog_Node_Occurred` lookup actually narrows. A `_nodeName` doc-key comment was added beside each `Node` section explaining the convention (`central-a`/`central-b` for central, `node-a`/`node-b` for site), pointing at the docker per-node configs (which already overrode the field), and noting that the value must be overridden per-node in multi-node deployments and that an empty value still normalises to a `NULL` SourceNode. The shipped dev templates now match the per-node docker examples, so a developer running the binary directly no longer sees a null `SourceNode`.
### Host-019 — Migration `StartupRetry` call drops the host `CancellationToken`
@@ -1021,7 +1017,7 @@ _Open._
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Host/LoggerConfigurationFactory.cs:36-43` |
**Description**
@@ -1050,9 +1046,7 @@ current "ScadaLink:Logging" path and reject `Serilog:MinimumLevel` if present
(throw at startup so the operator sees the conflict). At minimum, expand the
XML doc + REQ-HOST-8 to spell out the precedence explicitly.
**Resolution**
_Open._
**Resolution (2026-05-28):** `ScadaLink:Logging:MinimumLevel` is now the documented single source of truth for the Serilog floor (Host-011's `LoggingOptions` binding), and the precedence is made visible — `LoggerConfigurationFactory.Build` writes a one-shot warning to `Console.Error` when both `ScadaLink:Logging:MinimumLevel` and `Serilog:MinimumLevel` (or `Serilog:MinimumLevel:Default`) are present, naming both values and pointing the operator at the documented key. Order of operations is unchanged — `MinimumLevel.Is(...)` deliberately runs after `ReadFrom.Configuration(...)` so the ScadaLink value wins — but the silent-override behaviour is now loud. The class XML doc gained a Host-020 paragraph explicitly spelling out the precedence. A test-visible `Build(..., TextWriter warningWriter)` overload mirrors the `ParseLevel` Host-022 pattern so the warning can be asserted in unit tests; the production four-arg overload delegates with `Console.Error`.
### Host-021 — Microsoft `Logging:LogLevel` section in `appsettings.json` is dead config under Serilog
@@ -1060,7 +1054,7 @@ _Open._
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Host/appsettings.json:2-6` |
**Description**
@@ -1084,9 +1078,7 @@ explaining it is intentionally retained for non-Serilog tooling. Document the
authoritative location (`Serilog` + `ScadaLink:Logging`) in
`Component-Host.md` REQ-HOST-8 if not already explicit.
**Resolution**
_Open._
**Resolution (2026-05-28):** Confirmed by repository-wide grep that no code reads `Logging:LogLevel` (the Host calls `builder.Host.UseSerilog()` which replaces the default `ILoggerFactory` setup with Serilog as the only provider), so the block was pure dead config. Removed the `Logging:LogLevel:Default = Information` block from `appsettings.json` and replaced it with a `_logging` doc-key comment explaining the rationale (Serilog is the sole provider) and pointing operators at the two authoritative keys: `ScadaLink:Logging:MinimumLevel` for the floor (bound to `LoggingOptions` per Host-011) and the `Serilog` section for sinks (Host-014's `ReadFrom.Configuration`). The Host-014 regression test (`SerilogSinkConfigTests.ShippedAppSettings_HasSerilogSection_WithConsoleAndFileSinks`) still asserts the surviving `Serilog` section's shape, so removing the Microsoft block did not break the existing pinning.
### Host-022 — `ParseLevel` silently coerces unrecognised `MinimumLevel` to `Information`
+17 -44
@@ -41,35 +41,35 @@ module file and counted in **Total**.
|----------|---------------|
| Critical | 0 |
| High | 0 |
| Medium | 13 |
| Low | 20 |
| **Total** | **33** |
| Medium | 5 |
| Low | 1 |
| **Total** | **6** |
## Module Status
| Module | Last reviewed | Commit | Open (C/H/M/L) | Open | Total |
|--------|---------------|--------|----------------|------|-------|
| [AuditLog](AuditLog/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/0 | 1 | 11 |
| [CLI](CLI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 23 |
| [CentralUI](CentralUI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/2 | 2 | 33 |
| [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/3 | 3 | 14 |
| [Commons](Commons/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/3 | 3 | 23 |
| [Communication](Communication/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 22 |
| [CLI](CLI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
| [CentralUI](CentralUI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 33 |
| [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 14 |
| [Commons](Commons/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
| [Communication](Communication/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 22 |
| [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 24 |
| [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 22 |
| [DeploymentManager](DeploymentManager/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/2 | 2 | 24 |
| [DeploymentManager](DeploymentManager/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 24 |
| [ExternalSystemGateway](ExternalSystemGateway/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/0 | 1 | 23 |
| [HealthMonitoring](HealthMonitoring/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 23 |
| [Host](Host/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/3 | 4 | 22 |
| [HealthMonitoring](HealthMonitoring/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
| [Host](Host/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 22 |
| [InboundAPI](InboundAPI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 25 |
| [ManagementService](ManagementService/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
| [NotificationOutbox](NotificationOutbox/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 10 |
| [NotificationService](NotificationService/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 25 |
| [Security](Security/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 21 |
| [SiteCallAudit](SiteCallAudit/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/0 | 2 | 6 |
| [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 23 |
| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/0 | 2 | 26 |
| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/3/2 | 5 | 24 |
| [SiteCallAudit](SiteCallAudit/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 6 |
| [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 23 |
| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 26 |
| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 24 |
| [TemplateEngine](TemplateEngine/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/3/0 | 3 | 22 |
| [Transport](Transport/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 12 |
@@ -88,45 +88,18 @@ _None open._
_None open._
### Medium (13)
### Medium (5)
| ID | Module | Title |
|----|--------|-------|
| AuditLog-001 | [AuditLog](AuditLog/findings.md) | Combined-telemetry transport is plumbed end-to-end but never invoked in production |
| ExternalSystemGateway-020 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | `JsonElementToParameterValue` silently downcasts non-Int64 JSON numbers to `double`, losing precision for `decimal` SQL parameters on retry |
| Host-016 | [Host](Host/findings.md) | Site `CentralContactPoints` second entry targets the site's own remoting port |
| SiteCallAudit-001 | [SiteCallAudit](SiteCallAudit/findings.md) | SupervisorStrategy override is dead code; XML claims Resume that is not enforced |
| SiteCallAudit-003 | [SiteCallAudit](SiteCallAudit/findings.md) | `OnUpsertAsync` does not refresh `IngestedAtUtc`; direct-write callers must remember to stamp it |
| SiteRuntime-021 | [SiteRuntime](SiteRuntime/findings.md) | `HandleDeployArtifacts` updates `DataConnections` in SQLite but never sends `CreateConnectionCommand` to the DCL |
| SiteRuntime-022 | [SiteRuntime](SiteRuntime/findings.md) | `AuditingDbCommand.DbConnection.set` uses reflection to read `AuditingDbConnection._inner` |
| StoreAndForward-019 | [StoreAndForward](StoreAndForward/findings.md) | Notifications park after `DefaultMaxRetries` exhaustion, contradicting "retried until central acks" |
| StoreAndForward-020 | [StoreAndForward](StoreAndForward/findings.md) | `RetryParkedMessageAsync` skips standby replication when the message is deleted between local update and re-load |
| StoreAndForward-021 | [StoreAndForward](StoreAndForward/findings.md) | Design doc claims the Operation Tracking Table lives in StoreAndForward but the implementation is in SiteRuntime |
| TemplateEngine-018 | [TemplateEngine](TemplateEngine/findings.md) | `DiffService` reports no entries for added/removed/changed connections |
| TemplateEngine-019 | [TemplateEngine](TemplateEngine/findings.md) | `TemplateResolver.BuildInheritanceChain` still uses the `0`-as-no-parent sentinel that was removed from `CycleDetector` |
| TemplateEngine-020 | [TemplateEngine](TemplateEngine/findings.md) | `Create*` audit entries are written with `EntityId = "0"` before `SaveChangesAsync` populates the real key |
### Low (20)
### Low (1)
| ID | Module | Title |
|----|--------|-------|
| CLI-020 | [CLI](CLI/findings.md) | `bundle export` success-envelope parse is unguarded |
| CentralUI-029 | [CentralUI](CentralUI/findings.md) | `ConfigurationAuditLog` uses `JS.InvokeAsync<int>("eval", ...)` instead of a dedicated JS module |
| CentralUI-032 | [CentralUI](CentralUI/findings.md) | `AuditResultsGrid` paging is forward-only, no Previous button |
| ClusterInfrastructure-011 | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | `SectionName` constant is decorative — no binding site references it |
| ClusterInfrastructure-013 | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | Test uses catastrophic config values without an inline-intent comment |
| ClusterInfrastructure-014 | [ClusterInfrastructure](ClusterInfrastructure/findings.md) | `AddClusterInfrastructureActors` is dead surface — no caller, no behaviour |
| Commons-016 | [Commons](Commons/findings.md) | `BundleSession.Locked` uses a magic `3` rather than a named constant |
| Commons-018 | [Commons](Commons/findings.md) | `IOperationTrackingStore` and `IPartitionMaintenance` are at the root of `Interfaces/` instead of `Interfaces/Services/` |
| Commons-023 | [Commons](Commons/findings.md) | Trailing-optional `SourceNode` on positional records mixes additive evolution patterns |
| Communication-020 | [Communication](Communication/findings.md) | `SiteAddressCacheLoaded` carries mutable `Dictionary`/`List` types |
| DeploymentManager-021 | [DeploymentManager](DeploymentManager/findings.md) | `ResolveSiteIdentifierAsync` silently substitutes the DB id when the site row is missing |
| DeploymentManager-022 | [DeploymentManager](DeploymentManager/findings.md) | `Pending` and `InProgress` are written back-to-back with no intervening work |
| HealthMonitoring-021 | [HealthMonitoring](HealthMonitoring/findings.md) | `CentralSiteId = "central"` reserved constant silently collides with a real site named "central" |
| Host-018 | [Host](Host/findings.md) | Shipped per-role configs omit `NodeOptions.NodeName`, leaving `SourceNode` null |
| Host-020 | [Host](Host/findings.md) | `MinimumLevel.Is` silently overrides any operator-set `Serilog:MinimumLevel` |
| Host-021 | [Host](Host/findings.md) | Microsoft `Logging:LogLevel` section in `appsettings.json` is dead config under Serilog |
| SiteEventLogging-018 | [SiteEventLogging](SiteEventLogging/findings.md) | `FailedWriteCount` is exposed but never consumed by Health Monitoring |
| StoreAndForward-022 | [StoreAndForward](StoreAndForward/findings.md) | `NotifyCachedCallObserverAsync` silently drops the entire audit lifecycle when the message id is not a parseable `TrackedOperationId` |
| StoreAndForward-023 | [StoreAndForward](StoreAndForward/findings.md) | `siteId` silently defaults to empty when no `IStoreAndForwardSiteContext` is registered, degrading audit telemetry correlation |
| Transport-012 | [Transport](Transport/findings.md) | "Bundle Import" filter promised in design doc not surfaced in Configuration Audit Log Viewer UI |
+5 -9
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 4 |
| Open findings | 2 |
## Summary
@@ -50,7 +50,7 @@ tests using a shared `MsSqlMigrationFixture`.
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs:32-46`, `src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs:147-151` |
**Description**
@@ -98,9 +98,7 @@ Either:
The CLAUDE.md "Resume for coordinator actors" decision applies to actors with
children (Site Runtime hierarchy) — not to leaf cluster singletons.
**Resolution**
_Unresolved._
**Resolution (2026-05-28):** Rewrote the class-level XML on `SiteCallAuditActor` plus the method-level XML on `SupervisorStrategy()` to accurately describe what the override does — a one-for-one strategy with `DefaultDecider` (Restart on most exceptions, Stop on `ActorInitializationException`/`ActorKilledException`) and `maxNrOfRetries: 0`, governing the actor's *children* (the actor has none today, so the override is currently inert). Dropped the misleading "Resume" claim. The new docs make clear that self-supervision of this cluster singleton is the parent `ClusterSingletonManager`'s concern and the actor's own resilience comes from the in-handler `try/catch` in `OnUpsertAsync`, not from this override. No behaviour change — pure documentation fix; existing 24 SiteCallAudit tests remain green.
### SiteCallAudit-002 — Singleton failover does not wait for in-flight async upserts
@@ -154,7 +152,7 @@ Notification Outbox sibling has the same pattern.
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs:153-193` |
**Description**
@@ -190,9 +188,7 @@ inconsistent with the dual-write code path and undocumented.
Preferred: stamp inside the actor — same as the combined-telemetry path —
because callers cannot in general know the actor is colocated on central.
**Resolution**
_Unresolved._
**Resolution (2026-05-28):** `OnUpsertAsync` now rewrites the incoming `SiteCall` via `cmd.SiteCall with { IngestedAtUtc = DateTime.UtcNow }` immediately before calling `repository.UpsertAsync`, mirroring `AuditLogIngestActor`'s combined-telemetry hot path. The repository writes `IngestedAtUtc` on both the insert-if-not-exists and the monotonic UPDATE legs (`SiteCallAuditRepository.UpsertAsync`), so the column is writable on every upsert. Callers (telemetry, the deferred reconciliation puller, any future direct-write) no longer need to remember to stamp a central-side timestamp — the actor owns it. Existing 24 SiteCallAudit tests remain green (the MSSQL-fixture test constructs rows with `DateTime.UtcNow` and doesn't assert the exact value, so the actor's re-stamp is backward compatible).
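A minimal sketch of the restamp described above, assuming a trimmed-down record shape. `SiteCallSketch` and `IngestStamp` are hypothetical stand-ins for illustration, not the real `SiteCall` or actor code; only the `with { IngestedAtUtc = ... }` pattern is taken from the resolution text.

```csharp
using System;

// Hypothetical, reduced stand-in for the real SiteCall record.
public sealed record SiteCallSketch(string TrackedOperationId, string Status, DateTime IngestedAtUtc);

public static class IngestStamp
{
    // Returns a copy of the incoming row carrying the supplied central-side
    // timestamp. Whatever the caller put in IngestedAtUtc is discarded, and
    // the original instance is left untouched (records are copied by `with`).
    public static SiteCallSketch Restamp(SiteCallSketch incoming, DateTime nowUtc) =>
        incoming with { IngestedAtUtc = nowUtc };
}
```

Passing the clock value in as a parameter is a choice made for this sketch so the behaviour can be asserted deterministically; the resolution itself describes the actor calling `DateTime.UtcNow` directly.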
### SiteCallAudit-004 — Reconciliation puller and daily terminal-purge scheduler still deferred; design-doc drift
+4 -2
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 3 |
| Open findings | 2 |
## Summary
@@ -901,9 +901,11 @@ refused.
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:67-71,225-226` |
**Resolution (2026-05-28):** Took option (a)-via-(b) from the recommendation. Softened the `SiteEventLogger.FailedWriteCount` XML doc to describe the actual state ("Available for future Health Monitoring integration — the counter is correct and observable, but the central health-metric pipeline does not yet poll it"), removing the misleading "Health Monitoring can detect a logging outage" claim. The Health Monitoring wiring is left as a tracked follow-up (it requires a `ScadaLink.HealthMonitoring` source change that belongs in a different batch). Promoted `FailedWriteCount { get; }` onto `ISiteEventLogger` so the eventual Health consumer reads it through the interface without a concrete-type downcast. No behaviour change — pure documentation + interface-surface tidy-up; `SiteEventLogger` already exposed the property publicly, and no test fakes/mocks of `ISiteEventLogger` exist in the repo (grep confirms only `SiteEventLogger` implements it), so the interface addition is non-breaking. Existing 59 SiteEventLogging tests remain green.
**Description**
`SiteEventLogger.FailedWriteCount` was added under SiteEventLogging-008 with the
+7 -3
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 5 |
| Open findings | 3 |
## Summary
@@ -1038,9 +1038,11 @@ be gated on "no instance with this name is currently terminating".
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.SiteRuntime/Actors/DeploymentManagerActor.cs:931` |
**Resolution (2026-05-28):** Took the "refactor `EnsureDclConnections` into a shared field-based helper" path. Extracted a new `EnsureDclConnection(name, protocol, primaryJson, backupJson, failoverRetryCount)` method that owns the hash-cache check and the `CreateConnectionCommand` Tell — both the existing inline `EnsureDclConnections(configJson)` and the new artifact path now drive through it. `ComputeConnectionConfigHash` got a field-based overload so the artifact path (which carries data directly on `DataConnectionArtifact`) reuses the same hash logic as the `ConnectionConfig`-based inline path. To keep `_createdConnections` mutation actor-thread-confined (the artifact-deploy persistence runs inside a `Task.Run`), the off-thread persistence dispatches a new internal `ApplyArtifactDataConnectionsToDcl` message back to `Self` after the SQLite writes; the actor-thread handler then iterates and invokes `EnsureDclConnection`. The DCL only sees `CreateConnectionCommand` (no `Update`/`Delete` messages exist in the codebase, and `CreateConnectionCommand` is treated as upsert-by-name — same shape as the inline-config path). Build clean; 302 SiteRuntime tests green (the existing `EnsureDclConnections_ConnectionConfigChanged_ReissuesCreateCommand` regression test still passes through the refactored shared helper).
**Description**
`HandleDeployArtifacts` persists the artifact bundle (shared scripts, external
@@ -1088,9 +1090,11 @@ and artifact paths can drive through it.
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.SiteRuntime/Scripts/AuditingDbCommand.cs:138` |
**Resolution (2026-05-28):** Took the recommended "expose a proper API surface" path (the SiteRuntime-006 precedent). Added an `internal DbConnection Inner => _inner;` accessor to `AuditingDbConnection`; both classes are `internal sealed` in the same assembly, so the accessor stays out of the public API. The `AuditingDbCommand.DbConnection` setter now unwraps an `AuditingDbConnection` via `auditing.Inner` instead of `Type.GetField("_inner", BindingFlags.NonPublic | BindingFlags.Instance)!.GetValue(...)`. No reflection, no `!.` null-forgiveness hiding a runtime crash, no static-analyzer/IL-trim noise — and the same module that enforces "no `System.Reflection` in scripts" no longer reflects internally. The getter's `_wrappingConnection ?? _inner.Connection` fallback was left as-is; addressing the `CreateDbCommand()` round-trip concern is a separate behavioural decision (the finding marked it secondary). Build clean; 302 SiteRuntime tests green.
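The accessor-instead-of-reflection shape described in this resolution can be sketched as follows. All three types are hypothetical stand-ins (the real classes derive from the ADO.NET base types, which are omitted here to keep the sketch short); only the idea of an `internal` `Inner` accessor replacing a private-field lookup comes from the text.

```csharp
using System;

#nullable enable

// Hypothetical stand-in for the wrapped ADO.NET connection.
internal sealed class InnerConnectionSketch { }

// Hypothetical stand-in for AuditingDbConnection.
internal sealed class AuditingConnectionSketch
{
    private readonly InnerConnectionSketch _inner;

    public AuditingConnectionSketch(InnerConnectionSketch inner) => _inner = inner;

    // Internal accessor: visible to sibling types in the same assembly,
    // invisible to external consumers, and checked at compile time.
    internal InnerConnectionSketch Inner => _inner;
}

internal static class ConnectionUnwrapper
{
    // Unwraps the auditing wrapper when present; otherwise passes the value
    // through unchanged. No reflection and no null-forgiving operator.
    public static object? Unwrap(object? value) =>
        value is AuditingConnectionSketch auditing ? auditing.Inner : value;
}
```

If the private field is ever renamed, this version fails at compile time, whereas a string-based field lookup would only fail at runtime.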
**Description**
The `DbConnection` setter on `AuditingDbCommand` unwraps an
+63 -21
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 5 (3 Deferred: 002, 011, 012; 5 new Open from Re-review 2026-05-28) |
| Open findings | 0 (3 Deferred: 002, 011, 012; all 5 Open from Re-review 2026-05-28 resolved 2026-05-28) |
## Summary
@@ -1067,7 +1067,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:229`, `:407``:437`; `src/ScadaLink.StoreAndForward/StoreAndForwardOptions.cs:18`; `src/ScadaLink.SiteRuntime/Scripts/ScriptRuntimeContext.cs:1773``:1778`; `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:149``:156` |
**Description**
@@ -1121,9 +1121,21 @@ the field value) so the invariant is enforced at the single chokepoint rather th
relying on every caller to pass the right value — this also fixes the legacy
`NotificationDeliveryService` path without editing the consumer.
**Resolution**
_Unresolved._
**Resolution (2026-05-28):**
VERIFY outcome — the design doc's "Notifications do not park" wording (lines 47, 59)
was the *operational intent* for the happy path, not an absolute invariant: the engine
has always enforced `DefaultMaxRetries` uniformly across every category, and every
sibling system (ESG, CachedDbWrite) bounds retry-then-parks for the same disk-pressure
and operator-visibility reasons. Removing the cap for notifications would let a single
unreachable central exhaust local disk via an unbounded buffer — worse than the
documented "park after retry budget" behaviour. Resolution is therefore the brief's
**default**: document the parking behaviour. Updated
`Component-StoreAndForward.md` lines 46/58 to clarify that the `DefaultMaxRetries` cap
applies uniformly (including to notifications) and that `maxRetries: 0` is the explicit
escape hatch for callers that need unbounded retry. Added a `StoreAndForward-019` block
to `StoreAndForwardOptions.DefaultMaxRetries`'s XML doc explaining the same invariant.
No behavioural code change — existing tests (104 in
`ScadaLink.StoreAndForward.Tests`) continue to pass.
### StoreAndForward-020 — `RetryParkedMessageAsync` skips standby replication when the message is deleted between local update and re-load
@@ -1131,7 +1143,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:599``:616` |
**Description**
@@ -1209,9 +1221,16 @@ Add a regression test in `StoreAndForwardReplicationTests` that simulates the
delete-between-update-and-reload race and asserts the `Requeue` replication
operation is still emitted with the correct category.
**Resolution**
_Unresolved._
**Resolution (2026-05-28):**
Applied the brief's primary recommendation — `RetryParkedMessageAsync` now captures
the parked row up front via `GetMessageByIdAsync` (and rejects the call early if the
row is missing or no longer `Parked`), then performs the local `RetryParkedMessageAsync`
storage write, and finally reconstructs the post-requeue state on the captured POCO
(`Status = Pending, RetryCount = 0, LastError = null, LastAttemptAt = null`) and
replicates it. A concurrent `RemoveMessageAsync` or `DiscardParkedMessageAsync` running
between the local write and the original re-load can no longer skip replication — the
row is in hand. The category-fallback mislabelling on the racy path is gone because
the activity log uses the captured `Category` directly.
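The capture-up-front ordering can be sketched with an in-memory store. Everything here is hypothetical scaffolding (`BufferedMessageSketch`, `InMemoryStoreSketch`, `ParkedRetrySketch`, and the `AfterRequeueWrite` hook used to simulate the race); it assumes only the sequence the resolution describes: read the row, reject if missing or not parked, do the local write, then replicate the reconstructed state from the captured copy.

```csharp
using System;
using System.Collections.Generic;

#nullable enable

public enum MessageStatusSketch { Pending, Parked }

// Hypothetical, reduced stand-in for the buffered message POCO.
public sealed record BufferedMessageSketch(
    string Id, string Category, MessageStatusSketch Status, int RetryCount, string? LastError);

public sealed class InMemoryStoreSketch
{
    private readonly Dictionary<string, BufferedMessageSketch> _rows = new();

    // Test hook: runs right after the requeue write, to simulate a concurrent delete.
    public Action? AfterRequeueWrite { get; set; }

    public void Put(BufferedMessageSketch m) => _rows[m.Id] = m;
    public BufferedMessageSketch? Get(string id) => _rows.TryGetValue(id, out var m) ? m : null;
    public void Remove(string id) => _rows.Remove(id);

    public void Requeue(string id)
    {
        if (_rows.TryGetValue(id, out var m))
            _rows[id] = m with { Status = MessageStatusSketch.Pending, RetryCount = 0, LastError = null };
        AfterRequeueWrite?.Invoke();
    }
}

public static class ParkedRetrySketch
{
    public static bool Retry(InMemoryStoreSketch store, string id, Action<BufferedMessageSketch> replicate)
    {
        // 1. Capture the row before touching storage; reject early if absent or not parked.
        var captured = store.Get(id);
        if (captured is null || captured.Status != MessageStatusSketch.Parked) return false;

        // 2. Local storage write.
        store.Requeue(id);

        // 3. Replicate the post-requeue state rebuilt from the captured copy,
        //    so a delete racing with step 2 cannot suppress replication.
        replicate(captured with { Status = MessageStatusSketch.Pending, RetryCount = 0, LastError = null });
        return true;
    }
}
```

Because step 3 never re-reads the store, the replicated row keeps the captured `Category` even when the local row has already disappeared.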
### StoreAndForward-021 — Design doc claims the Operation Tracking Table lives in StoreAndForward but the implementation is in SiteRuntime
@@ -1219,7 +1238,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `docs/requirements/Component-StoreAndForward.md:21`, `:49``:51`, `:77``:87`, `:108`, `:114`; `src/ScadaLink.SiteRuntime/Tracking/OperationTrackingStore.cs:37`; `src/ScadaLink.StoreAndForward/` (whole module) |
**Description**
@@ -1274,9 +1293,18 @@ several refactors out of date. The hierarchical map should be:
- `Component-SiteCallAudit.md` / `Component-AuditLog.md` → telemetry emission +
central-side mirror.
**Resolution**
_Unresolved._
**Resolution (2026-05-28):**
Doc-side fix applied (per the brief, the simplest of the two options). Updated
`Component-StoreAndForward.md`: (1) removed the "Maintain a site-local operation
tracking table" line from Responsibilities and reworded the cached-call telemetry
responsibility to point at the `ICachedCallLifecycleObserver` hook; (2) renamed the
"Operation Tracking Table" section to "Operation Tracking Table (lives in Site
Runtime, not here)" with an explicit `StoreAndForward-021` callout cross-linking to
`Component-SiteRuntime.md` and the `IOperationTrackingStore` interface in
Commons. The rest of the section is retained for cross-component context (the
buffered cached-call rows carry `TrackedOperationId` so the link to the tracking row
must still be documented somewhere) but is reworded to make clear the table itself is
not owned here.
### StoreAndForward-022 — `NotifyCachedCallObserverAsync` silently drops the entire audit lifecycle when the message id is not a parseable `TrackedOperationId`
@@ -1284,7 +1312,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:484``:515` |
**Description**
@@ -1333,9 +1361,14 @@ contract — the existing
the fix is "log + skip", that test should be updated to also assert the log emission;
if the fix is "emit anyway", the test should be replaced.
**Resolution**
_Unresolved._
**Resolution (2026-05-28):**
Applied the brief's "cheap fix" — the non-GUID skip path now logs a Warning naming
the offending `MessageId`, `Category` and `Outcome` before returning, so a
misconfigured caller is observable instead of silently bypassing the audit pipeline.
S&F retry bookkeeping remains untouched (the observer is still best-effort, the skip
still returns without throwing). The existing
`Attempt_MessageIdNotAGuid_NoObserverNotification` test still passes — its assertion
is on `_observer.Notifications` being empty, which is unchanged.
### StoreAndForward-023 — `siteId` silently defaults to empty when no `IStoreAndForwardSiteContext` is registered, degrading audit telemetry correlation
@@ -1343,7 +1376,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/ServiceCollectionExtensions.cs:43``:53`; `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:99`, `:524` |
**Description**
@@ -1383,9 +1416,18 @@ absent (no `AddAuditLog`), keep the empty-string default since `_siteId` is unus
Alternatively, change `siteId` from a parameter to a `Func<string>` resolved lazily
from the service provider so a late-registered context still takes effect.
**Resolution**
_Unresolved._
**Resolution (2026-05-28):**
Applied the brief's sentinel option (less invasive than throwing — preserves the
existing test wiring that constructs `StoreAndForwardService` without a site context).
Introduced `StoreAndForwardService.UnknownSiteSentinel = "$unknown-site"` (leading
`$` chosen so it cannot collide with a real site id) and the constructor now
normalises any null/empty/whitespace `siteId` argument to that sentinel. The empty
string can no longer reach `CachedCallAttemptContext.SourceSite`; a misconfigured
host without an `IStoreAndForwardSiteContext` produces audit rows tagged with the
sentinel — recognisably bad in the central audit log instead of silently merging
into the empty bucket. All 104 existing tests pass; the only test that asserts a
literal `SourceSite` (`CachedCallAttemptEmissionTests`) supplies `"site-77"` so the
normalisation is a no-op there.
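A minimal sketch of the constructor normalisation, assuming a reduced holder class. `SiteIdHolderSketch` is hypothetical; the sentinel value `"$unknown-site"` and the null/empty/whitespace rule are taken from the resolution text above.

```csharp
using System;

#nullable enable

// Hypothetical stand-in showing only the siteId normalisation.
public sealed class SiteIdHolderSketch
{
    // Leading '$' keeps the sentinel from colliding with a real site id.
    public const string UnknownSiteSentinel = "$unknown-site";

    public string SiteId { get; }

    public SiteIdHolderSketch(string? siteId) =>
        // Null, empty, and whitespace-only values all collapse to the sentinel;
        // any other value is kept exactly as supplied.
        SiteId = string.IsNullOrWhiteSpace(siteId) ? UnknownSiteSentinel : siteId;
}
```

A real site id such as `"site-77"` passes through unchanged, which matches the note that the normalisation is a no-op for the existing emission test.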
### StoreAndForward-024 — `StopAsync` does not wait for an in-flight retry sweep, so disposed dependencies can be touched after shutdown