fix(api-surface): close Theme 9 — 27 naming / dead-code / config / hygiene findings
The largest themed batch — small mechanical fixes across 11 modules.
API / message hygiene:
- Comm-020: SiteAddressCacheLoaded now carries IReadOnlyDictionary /
IReadOnlyList — Akka messages must be immutable.
- Commons-016: BundleSession.MaxUnlockAttempts named constant replaces
magic 3.
- Commons-018: IOperationTrackingStore + IPartitionMaintenance moved from
Interfaces/ root to Interfaces/Services/ (namespace preserved — 9
consumers exceeded the in-prompt move threshold).
- Commons-023: TrackingStatusSnapshot.SourceNode now consistent with the
trailing-optional-with-default pattern used elsewhere.
- SR-022: AuditingDbCommand.DbConnection.set no longer uses reflection —
exposes AuditingDbConnection.Inner via internal API surface.
Dead code / config cleanup:
- ClusterInfra-011: decorative SectionName constant deleted.
- ClusterInfra-014: dead AddClusterInfrastructureActors method + its
"throws-when-called" test deleted.
- Host-021: Microsoft Logging:LogLevel block deleted from appsettings.json
(dead under Serilog).
Fail-loud over fail-silent:
- DM-021: ResolveSiteIdentifierAsync throws on missing site (was silently
substituting a DB id).
- DM-022: dropped transient Pending write — record now lands directly in
InProgress (no UI flicker, one fewer DB write).
- Host-020: LoggerConfigurationFactory emits a Console.Error warning when
both Serilog:MinimumLevel and ScadaLink:Logging:MinimumLevel are set
(ScadaLink remains truth per Host-011).
- SnF-022: NotifyCachedCallObserverAsync logs Warning on unparseable
TrackedOperationId (was silently dropping).
- SnF-023: empty siteId default replaced with $unknown-site sentinel
+ constructor normalisation.
Correctness:
- SCA-001: SupervisorStrategy XML rewritten to match actual
DefaultDecider/Restart semantics (was claiming Resume).
- SCA-003: OnUpsertAsync now restamps IngestedAtUtc on every upsert.
- SR-021: HandleDeployArtifacts now dispatches an internal
ApplyArtifactDataConnectionsToDcl message after the SQLite write so
system-wide artifact-deploy data-connection changes go live
immediately (was requiring a site restart).
- SnF-020: RetryParkedMessageAsync captures the parked row BEFORE the
local write so a concurrent delete can't skip standby replication.
Sentinels / naming collisions:
- HM-021: CentralSiteId changed from "central" to "$central"
(uncollideable — leading $ is forbidden in real SiteIdentifiers).
Doc / surface cleanups:
- SEL-018: FailedWriteCount promoted to ISiteEventLogger; XML softened
to "Available for future Health Monitoring integration".
- SnF-019: VERIFY outcome — documented parking-after-DefaultMaxRetries
in Component-StoreAndForward.md + DefaultMaxRetries XML (uniform
cap; maxRetries:0 is the unbounded escape hatch).
- SnF-021: Component-StoreAndForward.md no longer claims the tracking
table lives in SnF — it's in SiteRuntime, the interface is in Commons.
- CLI-020: bundle export response parse guarded with try/catch on
JsonException / KeyNotFoundException / FormatException — emits a
clean INVALID_RESPONSE exit instead of a stack trace.
Config:
- ClusterInfra-013: intent comment added to "catastrophic config" test.
- Host-016: appsettings.Site.json second CentralContactPoints entry
removed (was pointing at the SITE's own port); doc-key explains how
to extend.
- Host-018: NodeName added to both shipped per-role configs (was
causing SourceNode to be null on audit rows).
UI:
- CentralUI-029: replaced JS.InvokeAsync<int>("eval", …) with an ES
module import (new wwwroot/js/browser-time.js).
- CentralUI-032: AuditResultsGrid gains a Previous button backed by a
cursor stack.
10+ new regression tests across the affected projects. Build clean;
all suites green. README regenerated: 6 open (was 33).
Session-to-date: 130 of 136 originally-open Theme findings closed.
This commit is contained in:
@@ -8,7 +8,7 @@
|
||||
| Last reviewed | 2026-05-28 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `1eb6e97` |
|
||||
| Open findings | 5 (3 Deferred: 002, 011, 012; 5 new Open from Re-review 2026-05-28) |
|
||||
| Open findings | 0 (3 Deferred: 002, 011, 012; all 5 Open from Re-review 2026-05-28 resolved 2026-05-28) |
|
||||
|
||||
## Summary
|
||||
|
||||
@@ -1067,7 +1067,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:229`, `:407`–`:437`; `src/ScadaLink.StoreAndForward/StoreAndForwardOptions.cs:18`; `src/ScadaLink.SiteRuntime/Scripts/ScriptRuntimeContext.cs:1773`–`:1778`; `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:149`–`:156` |
|
||||
|
||||
**Description**
|
||||
@@ -1121,9 +1121,21 @@ the field value) so the invariant is enforced at the single chokepoint rather th
|
||||
relying on every caller to pass the right value — this also fixes the legacy
|
||||
`NotificationDeliveryService` path without editing the consumer.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
**Resolution (2026-05-28):**
|
||||
VERIFY outcome — the design doc's "Notifications do not park" wording (lines 47, 59)
|
||||
was the *operational intent* for the happy path, not an absolute invariant: the engine
|
||||
has always enforced `DefaultMaxRetries` uniformly across every category, and every
|
||||
sibling system (ESG, CachedDbWrite) bounds retry-then-parks for the same disk-pressure
|
||||
and operator-visibility reasons. Removing the cap for notifications would let a single
|
||||
unreachable central exhaust local disk via an unbounded buffer — worse than the
|
||||
documented "park after retry budget" behaviour. Resolution is therefore the brief's
|
||||
**default**: document the parking behaviour. Updated
|
||||
`Component-StoreAndForward.md` lines 46/58 to clarify that the `DefaultMaxRetries` cap
|
||||
applies uniformly (including to notifications) and that `maxRetries: 0` is the explicit
|
||||
escape hatch for callers that need unbounded retry. Added a `StoreAndForward-019` block
|
||||
to `StoreAndForwardOptions.DefaultMaxRetries`'s XML doc explaining the same invariant.
|
||||
No behavioural code change — existing tests (104 in
|
||||
`ScadaLink.StoreAndForward.Tests`) continue to pass.
|
||||
|
||||
### StoreAndForward-020 — `RetryParkedMessageAsync` skips standby replication when the message is deleted between local update and re-load
|
||||
|
||||
@@ -1131,7 +1143,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:599`–`:616` |
|
||||
|
||||
**Description**
|
||||
@@ -1209,9 +1221,16 @@ Add a regression test in `StoreAndForwardReplicationTests` that simulates the
|
||||
delete-between-update-and-reload race and asserts the `Requeue` replication
|
||||
operation is still emitted with the correct category.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
**Resolution (2026-05-28):**
|
||||
Applied the brief's primary recommendation — `RetryParkedMessageAsync` now captures
|
||||
the parked row up front via `GetMessageByIdAsync` (and rejects the call early if the
|
||||
row is missing or no longer `Parked`), then performs the local `RetryParkedMessageAsync`
|
||||
storage write, and finally reconstructs the post-requeue state on the captured POCO
|
||||
(`Status = Pending, RetryCount = 0, LastError = null, LastAttemptAt = null`) and
|
||||
replicates it. A concurrent `RemoveMessageAsync` or `DiscardParkedMessageAsync` running
|
||||
between the local write and the original re-load can no longer skip replication — the
|
||||
row is in hand. The category-fallback misllabelling on the racy path is gone because
|
||||
the activity log uses the captured `Category` directly.
|
||||
|
||||
### StoreAndForward-021 — Design doc claims the Operation Tracking Table lives in StoreAndForward but the implementation is in SiteRuntime
|
||||
|
||||
@@ -1219,7 +1238,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Design-document adherence |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `docs/requirements/Component-StoreAndForward.md:21`, `:49`–`:51`, `:77`–`:87`, `:108`, `:114`; `src/ScadaLink.SiteRuntime/Tracking/OperationTrackingStore.cs:37`; `src/ScadaLink.StoreAndForward/` (whole module) |
|
||||
|
||||
**Description**
|
||||
@@ -1274,9 +1293,18 @@ several refactors out of date. The hierarchical map should be:
|
||||
- `Component-SiteCallAudit.md` / `Component-AuditLog.md` → telemetry emission +
|
||||
central-side mirror.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
**Resolution (2026-05-28):**
|
||||
Doc-side fix applied (per the brief, the simplest of the two options). Updated
|
||||
`Component-StoreAndForward.md`: (1) removed the "Maintain a site-local operation
|
||||
tracking table" line from Responsibilities and reworded the cached-call telemetry
|
||||
responsibility to point at the `ICachedCallLifecycleObserver` hook; (2) renamed the
|
||||
"Operation Tracking Table" section to "Operation Tracking Table (lives in Site
|
||||
Runtime, not here)" with an explicit `StoreAndForward-021` callout cross-linking to
|
||||
`Component-SiteRuntime.md` and the `IOperationTrackingStore` interface in
|
||||
Commons. The rest of the section is retained for cross-component context (the
|
||||
buffered cached-call rows carry `TrackedOperationId` so the link to the tracking row
|
||||
must still be documented somewhere) but is reworded to make clear the table itself is
|
||||
not owned here.
|
||||
|
||||
### StoreAndForward-022 — `NotifyCachedCallObserverAsync` silently drops the entire audit lifecycle when the message id is not a parseable `TrackedOperationId`
|
||||
|
||||
@@ -1284,7 +1312,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:484`–`:515` |
|
||||
|
||||
**Description**
|
||||
@@ -1333,9 +1361,14 @@ contract — the existing
|
||||
the fix is "log + skip", that test should be updated to also assert the log emission;
|
||||
if the fix is "emit anyway", the test should be replaced.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
**Resolution (2026-05-28):**
|
||||
Applied the brief's "cheap fix" — the non-GUID skip path now logs a Warning naming
|
||||
the offending `MessageId`, `Category` and `Outcome` before returning, so a
|
||||
misconfigured caller is observable instead of silently bypassing the audit pipeline.
|
||||
S&F retry bookkeeping remains untouched (the observer is still best-effort, the skip
|
||||
still returns without throwing). The existing
|
||||
`Attempt_MessageIdNotAGuid_NoObserverNotification` test still passes — its assertion
|
||||
is on `_observer.Notifications` being empty, which is unchanged.
|
||||
|
||||
### StoreAndForward-023 — `siteId` silently defaults to empty when no `IStoreAndForwardSiteContext` is registered, degrading audit telemetry correlation
|
||||
|
||||
@@ -1343,7 +1376,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Code organization & conventions |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.StoreAndForward/ServiceCollectionExtensions.cs:43`–`:53`; `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:99`, `:524` |
|
||||
|
||||
**Description**
|
||||
@@ -1383,9 +1416,18 @@ absent (no `AddAuditLog`), keep the empty-string default since `_siteId` is unus
|
||||
Alternatively, change `siteId` from a parameter to a `Func<string>` resolved lazily
|
||||
from the service provider so a late-registered context still takes effect.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
**Resolution (2026-05-28):**
|
||||
Applied the brief's sentinel option (less invasive than throwing — preserves the
|
||||
existing test wiring that constructs `StoreAndForwardService` without a site context).
|
||||
Introduced `StoreAndForwardService.UnknownSiteSentinel = "$unknown-site"` (leading
|
||||
`$` chosen so it cannot collide with a real site id) and the constructor now
|
||||
normalises any null/empty/whitespace `siteId` argument to that sentinel. The empty
|
||||
string can no longer reach `CachedCallAttemptContext.SourceSite`; a misconfigured
|
||||
host without an `IStoreAndForwardSiteContext` produces audit rows tagged with the
|
||||
sentinel — recognisably bad in the central audit log instead of silently merging
|
||||
into the empty bucket. All 104 existing tests pass; the only test that asserts a
|
||||
literal `SourceSite` (`CachedCallAttemptEmissionTests`) supplies `"site-77"` so the
|
||||
normalisation is a no-op there.
|
||||
|
||||
### StoreAndForward-024 — `StopAsync` does not wait for an in-flight retry sweep, so disposed dependencies can be touched after shutdown
|
||||
|
||||
|
||||
Reference in New Issue
Block a user