fix(concurrency/lifetime): close Theme 5 — 10 concurrency / DI / scope findings

Concurrency hazards, DI lifetime hygiene, and one verify-only confirmation
across 8 modules. Highlights:

Concurrency:
- CentralUI-030: SandboxConsoleCapture writes routed through WriteSynchronized
  locking on the captured StringWriter — intra-script Task fan-out can no
  longer corrupt the per-call buffer.
- Commons-021: ExternalCallResult.Response now backed by Lazy<dynamic?>
  (ExecutionAndPublication) — no more benign double-parse race.
- CD-017: DeploymentManagerRepository.DeleteDeploymentRecordAsync now takes
  an expected RowVersion and seeds entry.OriginalValues so EF emits
  DELETE ... WHERE Id=@id AND RowVersion=@prior; stale RowVersion now
  throws DbUpdateConcurrencyException instead of silently deleting the row.
- Transport-009: AuditCorrelationContext.BundleImportId backed by
  AsyncLocal<Guid?> so concurrent imports get per-logical-call isolation
  (was a scoped instance shared via AuditService across runs).

DI / lifetime:
- AuditLog-003: All 3 AuditLog actor handlers switched to CreateAsyncScope
  + await using — async EF disposal no longer swallowed.
- AuditLog-007: INodeIdentityProvider resolution standardised on
  GetRequiredService<>() (was mixed with GetService<>()).
- AuditLog-011: AddAuditLogHealthMetricsBridge guarded by sentinel
  descriptor check — calling twice no longer double-registers the hosted
  service.

Shutdown / supervision:
- SiteCallAudit-002: AkkaHostedService adds a CoordinatedShutdown
  cluster-leave task (drain-site-call-audit-singleton) that issues a
  bounded GracefulStop(10s) so failover waits for in-flight upserts.

Registration safety:
- NS-020: AkkaHostedService now guards NotificationForwarder S&F
  registration with _notificationDeliveryHandlerRegistered + throws
  InvalidOperationException on double-register to make the regression loud.

VERIFY-only closures:
- NotifOutbox-005: Confirmed already closed by CD-015 fix (ac96b83) —
  NotificationOutboxRepository.InsertIfNotExistsAsync uses the same
  raw-SQL IF NOT EXISTS + 2601/2627 swallow pattern; race eliminated.

5+ new regression tests (CentralUI sandbox WhenAll, ExternalCallResult
64-reader Barrier, AuditLog DI idempotency, RowVersion stale-throw,
SiteCallAudit-002 shutdown drain). Build clean; affected suites all green.
README regenerated: 65 open (was 75).
Joseph Doherty
2026-05-28 07:29:41 -04:00
parent 6ae0fea558
commit 2ed5c6c379
25 changed files with 699 additions and 239 deletions
+33 -10
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 6 |
| Open findings | 3 |
## Summary
@@ -158,7 +158,7 @@ override as a children-only forward-compat placeholder, and state the actual
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs:133`, `src/ScadaLink.AuditLog/Central/AuditLogPurgeActor.cs:139`, `src/ScadaLink.AuditLog/Central/SiteAuditReconciliationActor.cs:178` |
**Description**
@@ -184,9 +184,16 @@ pattern with `await using var scope = _services.CreateAsyncScope();`. The DI sco
will dispose asynchronously and the EF Core context will be released without
blocking the actor thread.
**Resolution**
**Resolution (2026-05-28):**
_Unresolved._
All three handlers now use `CreateAsyncScope()` + `await using var scope = ...`.
`AuditLogIngestActor.OnIngestAsync` now factors the per-batch loop into a shared
`IngestWithRepositoryAsync` helper so the injected-repository test ctor and
the scoped production path both reach the same body without duplicating the
per-row try/catch. `AuditLogPurgeActor.OnTickAsync` and
`SiteAuditReconciliationActor.OnTickAsync` dropped the `try/finally { scope.Dispose(); }`
pattern in favour of the `await using` lexical scope. EF Core DbContexts now
dispose asynchronously across every audit ingest path.
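
The difference between the two disposal shapes described above can be sketched as follows. This is a minimal illustration, not the actor code: `FakeScopedResource` and `AsyncScopeSketch` are hypothetical names standing in for a scoped EF Core context and a handler, and the log list exists only to show which disposal path runs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a scoped resource (for example a DbContext)
// that supports both synchronous and asynchronous disposal.
public sealed class FakeScopedResource : IDisposable, IAsyncDisposable
{
    private readonly List<string> _log;

    public FakeScopedResource(List<string> log) => _log = log;

    public void Dispose() => _log.Add("sync-dispose");

    public async ValueTask DisposeAsync()
    {
        await Task.Yield(); // simulate asynchronous cleanup work
        _log.Add("async-dispose");
    }
}

public static class AsyncScopeSketch
{
    // Old shape: try/finally with a synchronous Dispose().
    public static async Task<List<string>> OldPatternAsync()
    {
        var log = new List<string>();
        var resource = new FakeScopedResource(log);
        try
        {
            await Task.Yield();
            log.Add("work");
        }
        finally
        {
            resource.Dispose();
        }
        return log;
    }

    // New shape: `await using` awaits DisposeAsync() when the block exits.
    public static async Task<List<string>> NewPatternAsync()
    {
        var log = new List<string>();
        {
            await using var resource = new FakeScopedResource(log);
            await Task.Yield();
            log.Add("work");
        }
        return log;
    }
}
```

When a type implements both interfaces, `await using` selects `DisposeAsync()`, which is why the lexical scope is enough and no explicit `finally` is needed.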
### AuditLog-004 — `SiteAuditReconciliationActor` advances cursor even on per-row insert failure, silently abandoning permanently-failing rows
@@ -342,7 +349,7 @@ documents the choice. Behaviour for context-free callers is unchanged.
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs:148-218` |
**Description**
@@ -376,9 +383,16 @@ inside `AddAuditLog` (with a sensible default — null node name returns `<unkno
add an explicit guard at the top of `AddAuditLog` that throws if no provider has been
registered yet (`services.Any(d => d.ServiceType == typeof(INodeIdentityProvider))`).
**Resolution**
**Resolution (2026-05-28):**
_Unresolved._
Took option (b) — standardized all three consumers on `GetRequiredService<INodeIdentityProvider>()`.
The Host (`SiteServiceRegistration.BindSharedOptions`) registers the provider on
both site and central paths per the InboundAPI-022 / Host registration sweep,
and the `AddAuditLogTests` fixture binds a `FakeNodeIdentityProvider`. A `GetService()`
call that silently returns null would mask a future composition root that forgets
the registration; the strict resolution surfaces that bug at first
`ICachedCallTelemetryForwarder` / `CachedCallLifecycleBridge` / `ICentralAuditWriter`
resolution instead.
### AuditLog-008 — Test composition roots that omit `IAuditPayloadFilter` silently pass UNREDACTED payloads through the writer chain
@@ -510,7 +524,7 @@ existing top-level catch swallows the `OperationCanceledException`.
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs:53-55, 263-276, 301-346` |
**Description**
@@ -535,6 +549,15 @@ or (b) explicitly document idempotency on the public surface of every helper and
verify with a unit test in `AddAuditLogTests`. Option (a) matches the pattern other
SDK extensions use and removes a foot-gun.
**Resolution**
**Resolution (2026-05-28):**
_Unresolved._
Took option (a) for `AddAuditLogHealthMetricsBridge` — guarded by a sentinel
check on the `SiteAuditBacklogReporter` hosted-service descriptor (the helper's
exclusive contribution to the collection). A second call short-circuits before
any `Replace` / `AddHostedService` runs, so the hosted service registers
exactly once. New `AddAuditLogHealthMetricsBridge_IsIdempotent_DoesNotDoubleRegister_HostedService`
test in `AddAuditLogTests` calls the helper twice and asserts a single
`IHostedService` descriptor for `SiteAuditBacklogReporter`. The
`AddAuditLogCentralMaintenance` helper is left for a follow-up — it is only
ever called from the central composition root and the unit/integration
fixtures use disposable IServiceCollections, so the foot-gun is narrower.
+4 -2
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 5 |
| Open findings | 4 |
## Summary
@@ -1461,9 +1461,11 @@ plumbing CentralUI-027 will need.
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/ScriptAnalysis/SandboxConsoleCapture.cs:31-118`; `src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:401-404` |
**Resolution (2026-05-28):** Wrapped every `Write`/`WriteLine` override in `SandboxConsoleCapture` through a `WriteSynchronized` helper that takes a `lock` on the current `AsyncLocal` capture buffer before writing — concurrent `Console.WriteLine` calls from a script's `Task.WhenAll`/`Task.Run` fan-out now serialise on the buffer instance, so the `StringBuilder` underneath can no longer be corrupted. The fall-through to the unwrapped `_fallback` writer is unlocked because the BCL's process-wide `Console.Out` is already synchronised. Different capture scopes have different lock targets, so two unrelated sandbox runs never block each other. New regression test `SandboxConsoleCaptureTests.BeginCapture_ConcurrentWritesFromTasks_DoNotCorruptBuffer` drives 32 tasks × 50 lines each through one capture scope and asserts every line is intact in the buffer.
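
A rough sketch of the locking shape described in the resolution, under stated assumptions: `CaptureWriterSketch` is a hypothetical, reduced writer (only two overrides), not the real `SandboxConsoleCapture`, and the `BeginCapture` signature here is invented for illustration.

```csharp
#nullable enable
using System;
using System.IO;
using System.Text;
using System.Threading;

// Hypothetical, reduced capture writer; the real class overrides more members.
public sealed class CaptureWriterSketch : TextWriter
{
    // Per-logical-call buffer; flows into tasks started after BeginCapture().
    private static readonly AsyncLocal<StringWriter?> Current = new();
    private readonly TextWriter _fallback;

    public CaptureWriterSketch(TextWriter fallback) => _fallback = fallback;

    public override Encoding Encoding => _fallback.Encoding;

    public static StringWriter BeginCapture()
    {
        var buffer = new StringWriter();
        Current.Value = buffer;
        return buffer;
    }

    public override void WriteLine(string? value) =>
        WriteSynchronized(w => w.WriteLine(value));

    public override void Write(char value) =>
        WriteSynchronized(w => w.Write(value));

    private void WriteSynchronized(Action<TextWriter> write)
    {
        var buffer = Current.Value;
        if (buffer is null)
        {
            // No capture scope is active: pass through without locking.
            write(_fallback);
            return;
        }

        // Lock on the buffer instance itself, so writers sharing one capture
        // scope serialize while unrelated scopes never contend.
        lock (buffer)
        {
            write(buffer);
        }
    }
}
```

Because the lock target is the captured buffer rather than a shared static object, contention is limited to tasks that inherited the same capture scope.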
**Description**
CentralUI-003 correctly routed console capture through an `AsyncLocal<StringWriter?>`
+4 -2
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 7 |
| Open findings | 6 |
## Summary
@@ -1014,9 +1014,11 @@ boundary (`AuditTelemetryEnvelope`, `PullAuditEventsRequest`/`Response`,
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Commons/Interfaces/Services/IExternalSystemClient.cs:91-104` |
**Resolution (2026-05-28):** Replaced the two mutable backing fields (`_response`/`_responseParsed`) with a single `private readonly Lazy<dynamic?> _response` initialised in the field initializer — `LazyThreadSafetyMode.ExecutionAndPublication` (the default) guarantees the parse runs at most once and every concurrent reader observes the same published `DynamicJsonElement`. `Response` is now a one-line `_response.Value` expression-bodied property. Regression test `ExternalCallResultTests.Response_ConcurrentReads_ReturnSameInstance` fires 64 concurrent readers through a `Barrier` and asserts `Assert.Same` across all observed values.
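
The `Lazy<T>` approach can be sketched roughly as below. `CallResultSketch` is a hypothetical class, not the real `ExternalCallResult` record: it parses into `JsonDocument` rather than the project's `DynamicJsonElement`, and `ParseCount` exists only to make the at-most-once behaviour observable.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Threading;

// Hypothetical stand-in for a result type with a lazily parsed response body.
public sealed class CallResultSketch
{
    private int _parseCount;
    private readonly Lazy<JsonDocument?> _response;

    public CallResultSketch(string? responseJson)
    {
        // ExecutionAndPublication (also the default mode) runs the factory at
        // most once and publishes the same value to every concurrent reader.
        _response = new Lazy<JsonDocument?>(
            () => Parse(responseJson),
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Number of times the factory actually ran; for illustration only.
    public int ParseCount => Volatile.Read(ref _parseCount);

    public JsonDocument? Response => _response.Value;

    private JsonDocument? Parse(string? json)
    {
        Interlocked.Increment(ref _parseCount);
        return string.IsNullOrEmpty(json) ? null : JsonDocument.Parse(json);
    }
}
```

Every reader of `Response` receives the same instance, so reference equality holds across threads and the parse runs once regardless of how many readers race.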
**Description**
`ExternalCallResult` is a `record` returned to scripts after an outbound HTTP call. The
+12 -2
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 4 |
| Open findings | 3 |
## Summary
@@ -989,9 +989,19 @@ longer exists.
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/DeploymentManagerRepository.cs:83-97` |
**Resolution (2026-05-28):**
`IDeploymentManagerRepository.DeleteDeploymentRecordAsync` now requires a `byte[] expectedRowVersion`
argument. The repository seeds `entry.OriginalValues["RowVersion"]` on both the Local-tracked and
stub-attach branches so EF emits `DELETE ... WHERE Id = @id AND RowVersion = @prior` and surfaces a
concurrent edit as `DbUpdateConcurrencyException`. A new SQLite regression test
`DeleteDeploymentRecord_StaleRowVersion_ThrowsConcurrencyException` in `ConcurrencyTests`
(backed by a dedicated `RowVersionConcurrencyTestDbContext` that keeps `RowVersion` as a
caller-managed concurrency token) asserts the exception fires on a stale token; the existing
`DeleteDeploymentRecord_ViaStubAttachPath_RemovesEntity` was updated to pass the fresh RowVersion.
**Description**
`DeploymentRecord` carries a SQL Server `rowversion` concurrency token (declared
+4 -6
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 10 |
| Open findings | 9 |
## Summary
@@ -237,9 +237,11 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:127-132` (caller); root cause in `src/ScadaLink.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:33-45` |
**Resolution (2026-05-28):** Closed by CD-015 — `NotificationOutboxRepository.InsertIfNotExistsAsync` (commit `ac96b83`) is now a single-statement `IF NOT EXISTS ... INSERT` via `ExecuteSqlInterpolatedAsync` with a `SqlException` filter swallowing duplicate-key violations (`2601`/`2627`) as a no-op (`return false`). The check-then-act window is eliminated; the at-least-once handoff contract holds and the actor's `PipeTo` success/failure projection no longer surfaces a permanent PK-violation back to the site. Verified in `src/ScadaLink.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:51-103`.
**Description**
`HandleSubmit` → `PersistAsync` calls `repository.InsertIfNotExistsAsync(notification)`
@@ -271,10 +273,6 @@ and ack-back does not produce a permanent re-forward loop — but the cleanest f
remains the CD-015 raw-SQL `IF NOT EXISTS … INSERT` with `2601/2627` catch in
`NotificationOutboxRepository`.
**Resolution**
_Unresolved._
### NotificationOutbox-006 — `ResolveAdapters` rebuilds the `NotificationType → adapter` dictionary on every dispatch sweep
| | |
+4 -2
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 7 |
| Open findings | 6 |
## Summary
@@ -683,9 +683,11 @@ Recommended path is option 1: the parallel implementation in `EmailNotificationD
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:654-660`, NS-001 resolution note (this file) |
**Resolution (2026-05-28):** Added a `_notificationDeliveryHandlerRegistered` sentinel field on `AkkaHostedService` and gated the canonical `NotificationForwarder` registration with an `InvalidOperationException` guard — a future code path that re-introduces the dead NS-001 site-SMTP handler now fails fast at startup with an explicit NS-020 diagnostic, rather than silently overwriting `RegisterDeliveryHandler`'s last-write-wins map and inverting the central-only design. The sentinel's XML doc cross-references NS-001/NS-019/NS-020 so a maintainer searching for the `Notification` S&F handler finds the one canonical registration and its history.
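
The fail-fast guard can be illustrated with a small sketch. Everything here is hypothetical except the `_notificationDeliveryHandlerRegistered` field name and the exception type: `DeliveryHandlerRegistrySketch`, its dictionary, and the handler signature are invented for illustration and do not mirror `AkkaHostedService` or `StoreAndForwardService`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry showing a sentinel-guarded, single registration.
public sealed class DeliveryHandlerRegistrySketch
{
    private readonly Dictionary<string, Func<string, bool>> _handlers = new();
    private bool _notificationDeliveryHandlerRegistered;

    public int HandlerCount => _handlers.Count;

    public void RegisterNotificationHandler(Func<string, bool> handler)
    {
        if (_notificationDeliveryHandlerRegistered)
        {
            // Without this guard, the second registration would silently
            // replace the first in the last-write-wins map.
            throw new InvalidOperationException(
                "NS-020: a Notification delivery handler is already registered.");
        }

        _handlers["Notification"] = handler;
        _notificationDeliveryHandlerRegistered = true;
    }
}
```

Throwing on the second call turns a silent behavioural inversion into a startup failure, which is the property the resolution is after.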
**Description**
NS-001 was resolved by registering an `S&F → DeliverBufferedAsync` handler for `StoreAndForwardCategory.Notification` at site startup in `AkkaHostedService`. The current source registers a **different** handler for the same category at `AkkaHostedService.cs:654-660`, namely `NotificationForwarder.DeliverAsync`, which forwards to central instead of sending SMTP. `StoreAndForwardService.RegisterDeliveryHandler` (verified by reading `StoreAndForward/StoreAndForwardService.cs` around line 109) takes a single handler per category — last-write-wins or first-write-wins, either way the two registrations cannot both be active.
+13 -23
@@ -41,21 +41,21 @@ module file and counted in **Total**.
|----------|---------------|
| Critical | 0 |
| High | 0 |
| Medium | 25 |
| Low | 50 |
| **Total** | **75** |
| Medium | 22 |
| Low | 43 |
| **Total** | **65** |
## Module Status
| Module | Last reviewed | Commit | Open (C/H/M/L) | Open | Total |
|--------|---------------|--------|----------------|------|-------|
| [AuditLog](AuditLog/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/4 | 6 | 11 |
| [AuditLog](AuditLog/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/1 | 3 | 11 |
| [CLI](CLI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/2 | 3 | 23 |
| [CentralUI](CentralUI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/5 | 5 | 33 |
| [CentralUI](CentralUI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/4 | 4 | 33 |
| [ClusterInfrastructure](ClusterInfrastructure/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/3 | 3 | 14 |
| [Commons](Commons/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/5 | 5 | 23 |
| [Commons](Commons/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/4 | 4 | 23 |
| [Communication](Communication/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/1 | 2 | 22 |
| [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/2 | 4 | 24 |
| [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/2 | 3 | 24 |
| [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/0 | 0 | 22 |
| [DeploymentManager](DeploymentManager/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/4 | 4 | 24 |
| [ExternalSystemGateway](ExternalSystemGateway/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/1 | 2 | 23 |
@@ -63,15 +63,15 @@ module file and counted in **Total**.
| [Host](Host/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/3 | 4 | 22 |
| [InboundAPI](InboundAPI/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/2 | 3 | 25 |
| [ManagementService](ManagementService/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/1 | 3 | 23 |
| [NotificationOutbox](NotificationOutbox/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/2 | 3 | 10 |
| [NotificationService](NotificationService/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/2 | 4 | 25 |
| [NotificationOutbox](NotificationOutbox/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/2 | 2 | 10 |
| [NotificationService](NotificationService/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/2 | 3 | 25 |
| [Security](Security/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/1 | 1 | 21 |
| [SiteCallAudit](SiteCallAudit/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/2 | 4 | 6 |
| [SiteCallAudit](SiteCallAudit/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/1 | 3 | 6 |
| [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/0/3 | 3 | 23 |
| [SiteRuntime](SiteRuntime/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/2/0 | 2 | 26 |
| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/3/2 | 5 | 24 |
| [TemplateEngine](TemplateEngine/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/3/0 | 3 | 22 |
| [Transport](Transport/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/3 | 4 | 12 |
| [Transport](Transport/findings.md) | 2026-05-28 | `1eb6e97` | 0/0/1/2 | 3 | 12 |
## Pending Findings
@@ -88,7 +88,7 @@ _None open._
_None open._
### Medium (25)
### Medium (22)
| ID | Module | Title |
|----|--------|-------|
@@ -97,14 +97,11 @@ _None open._
| CLI-019 | [CLI](CLI/findings.md) | `bundle export` decodes the entire base64 bundle into memory before writing |
| Communication-017 | [Communication](Communication/findings.md) | `_inProgressDeployments` grows unboundedly — successful deployments are never cleaned up |
| ConfigurationDatabase-016 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | `InboundApiRepository.GetApiKeyByValueAsync` hashes the candidate with the unpeppered `ApiKeyHasher.Default` |
| ConfigurationDatabase-017 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | Stub-attach delete on `DeploymentRecord` bypasses optimistic concurrency |
| ExternalSystemGateway-020 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | `JsonElementToParameterValue` silently downcasts non-Int64 JSON numbers to `double`, losing precision for `decimal` SQL parameters on retry |
| Host-016 | [Host](Host/findings.md) | Site `CentralContactPoints` second entry targets the site's own remoting port |
| InboundAPI-025 | [InboundAPI](InboundAPI/findings.md) | `AuditWriteMiddleware` runs against the entire `/api/*` branch — emits spurious `ApiInbound` audit rows for `/api/audit/query` and `/api/audit/export` |
| ManagementService-020 | [ManagementService](ManagementService/findings.md) | UpdateSmtpConfig returns and audits the SMTP Credentials field verbatim |
| ManagementService-021 | [ManagementService](ManagementService/findings.md) | Transport bundle handlers have zero test coverage |
| NotificationOutbox-005 | [NotificationOutbox](NotificationOutbox/findings.md) | Ingest persistence inherits the CD-015 check-then-act race; under contention the second writer throws and the site retries |
| NotificationService-020 | [NotificationService](NotificationService/findings.md) | NS-001 fix superseded; `AkkaHostedService` would register two competing `Notification` S&F handlers if both code paths ran |
| NotificationService-024 | [NotificationService](NotificationService/findings.md) | No test affirms the central-only invariant; the orphaned-path tests give a false coverage signal |
| SiteCallAudit-001 | [SiteCallAudit](SiteCallAudit/findings.md) | SupervisorStrategy override is dead code; XML claims Resume that is not enforced |
| SiteCallAudit-003 | [SiteCallAudit](SiteCallAudit/findings.md) | `OnUpsertAsync` does not refresh `IngestedAtUtc`; direct-write callers must remember to stamp it |
@@ -118,18 +115,14 @@ _None open._
| TemplateEngine-020 | [TemplateEngine](TemplateEngine/findings.md) | `Create*` audit entries are written with `EntityId = "0"` before `SaveChangesAsync` populates the real key |
| Transport-010 | [Transport](Transport/findings.md) | Critical Overwrite + cross-cutting paths uncovered by tests |
### Low (50)
### Low (43)
| ID | Module | Title |
|----|--------|-------|
| AuditLog-003 | [AuditLog](AuditLog/findings.md) | `AuditLogIngestActor.OnIngestAsync` uses `CreateScope`, but `OnCachedTelemetryAsync` uses `CreateAsyncScope` — and only one disposes asynchronously |
| AuditLog-007 | [AuditLog](AuditLog/findings.md) | `INodeIdentityProvider` resolution mixes `GetService` and `GetRequiredService` inconsistently across `AddAuditLog` registrations |
| AuditLog-008 | [AuditLog](AuditLog/findings.md) | Test composition roots that omit `IAuditPayloadFilter` silently pass UNREDACTED payloads through the writer chain |
| AuditLog-011 | [AuditLog](AuditLog/findings.md) | `AddAuditLogHealthMetricsBridge` and `AddAuditLogCentralMaintenance` are non-idempotent and register hosted services on every call |
| CLI-020 | [CLI](CLI/findings.md) | `bundle export` success-envelope parse is unguarded |
| CLI-022 | [CLI](CLI/findings.md) | `CommandTreeTests` excludes the two new command groups |
| CentralUI-029 | [CentralUI](CentralUI/findings.md) | `ConfigurationAuditLog` uses `JS.InvokeAsync<int>("eval", ...)` instead of a dedicated JS module |
| CentralUI-030 | [CentralUI](CentralUI/findings.md) | `SandboxConsoleCapture`'s per-call `StringWriter` is not thread-safe under intra-script concurrency |
| CentralUI-031 | [CentralUI](CentralUI/findings.md) | `TransportImport` buffers the full bundle bytes in component state |
| CentralUI-032 | [CentralUI](CentralUI/findings.md) | `AuditResultsGrid` paging is forward-only, no Previous button |
| CentralUI-033 | [CentralUI](CentralUI/findings.md) | Drill-in / query-string code paths for the new Transport + SiteCalls pages are untested |
@@ -139,7 +132,6 @@ _None open._
| Commons-016 | [Commons](Commons/findings.md) | `BundleSession.Locked` uses a magic `3` rather than a named constant |
| Commons-018 | [Commons](Commons/findings.md) | `IOperationTrackingStore` and `IPartitionMaintenance` are at the root of `Interfaces/` instead of `Interfaces/Services/` |
| Commons-020 | [Commons](Commons/findings.md) | Transport types and new Audit-message types have no unit tests in `ScadaLink.Commons.Tests` |
| Commons-021 | [Commons](Commons/findings.md) | `ExternalCallResult.Response` has a benign lazy-parse race |
| Commons-023 | [Commons](Commons/findings.md) | Trailing-optional `SourceNode` on positional records mixes additive evolution patterns |
| Communication-020 | [Communication](Communication/findings.md) | `SiteAddressCacheLoaded` carries mutable `Dictionary`/`List` types |
| ConfigurationDatabase-021 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | `SwitchOutPartitionAsync` interpolates `monthBoundary` / staging table name into raw SQL |
@@ -162,7 +154,6 @@ _None open._
| NotificationService-022 | [NotificationService](NotificationService/findings.md) | `MailKitSmtpClientWrapper` holds a long-lived `SmtpClient`; combined with per-send factory, the design comment about pooling is contradicted |
| NotificationService-025 | [NotificationService](NotificationService/findings.md) | `CredentialRedactor` over-masks: any 4-character credential component is masked anywhere it appears, including unrelated log text |
| Security-021 | [Security](Security/findings.md) | `RequireHttpsCookie=false` dev opt-out has no warning path — an HTTP production deployment silently transmits the JWT bearer credential in cleartext |
| SiteCallAudit-002 | [SiteCallAudit](SiteCallAudit/findings.md) | Singleton failover does not wait for in-flight async upserts |
| SiteCallAudit-006 | [SiteCallAudit](SiteCallAudit/findings.md) | Stuck-only paging test does not exercise the multi-page boundary with an interleaved non-stuck row at the cursor |
| SiteEventLogging-018 | [SiteEventLogging](SiteEventLogging/findings.md) | `FailedWriteCount` is exposed but never consumed by Health Monitoring |
| SiteEventLogging-022 | [SiteEventLogging](SiteEventLogging/findings.md) | `Cache=Shared` is redundant for a single-connection logger |
@@ -170,5 +161,4 @@ _None open._
| StoreAndForward-022 | [StoreAndForward](StoreAndForward/findings.md) | `NotifyCachedCallObserverAsync` silently drops the entire audit lifecycle when the message id is not a parseable `TrackedOperationId` |
| StoreAndForward-023 | [StoreAndForward](StoreAndForward/findings.md) | `siteId` silently defaults to empty when no `IStoreAndForwardSiteContext` is registered, degrading audit telemetry correlation |
| Transport-008 | [Transport](Transport/findings.md) | `PreviewAsync` issues an N+1 `GetTemplateWithChildrenAsync` per matching template name |
| Transport-009 | [Transport](Transport/findings.md) | `IAuditCorrelationContext.BundleImportId` is mutated on the same scoped instance the AuditService reads |
| Transport-012 | [Transport](Transport/findings.md) | "Bundle Import" filter promised in design doc not surfaced in Configuration Audit Log Viewer UI |
+4 -6
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 6 |
| Open findings | 5 |
## Summary
@@ -108,9 +108,11 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Host/Actors/AkkaHostedService.cs:455-462` (singleton wiring), `src/ScadaLink.SiteCallAudit/SiteCallAuditActor.cs:153-193` |
**Resolution (2026-05-28):** Added a `CoordinatedShutdown` task in the `cluster-leave` phase (named `drain-site-call-audit-singleton`) that issues an explicit `GracefulStop(10s)` to the `SiteCallAudit` cluster singleton manager before the cluster-leave proceeds. Akka.NET's singleton handover already waits for the active actor's `ReceiveAsync` task to complete before signalling `HandOverDone`, so an in-flight EF `UpsertAsync` (and its SQL round-trip) drains on the old node before the new singleton starts on the other central node — closing the seam where the new singleton could race a still-running upsert on the old node. The 10-second timeout is bounded so a misbehaving upsert cannot stall coordinated shutdown indefinitely; on timeout the existing `PoisonPill` termination path takes over and the repository's monotonic-upsert + 2601/2627 duplicate-key swallow remain as the storage-state safety net. Pattern is suitable for the `NotificationOutbox` singleton too; deferred to keep this change scoped.
**Description**
The singleton is created with `terminationMessage: PoisonPill.Instance`. On
@@ -146,10 +148,6 @@ Notification Outbox sibling has the same pattern.
monotonic rank check (the CD-015 race-pattern check the parent task
flagged).
**Resolution**
_Unresolved._
### SiteCallAudit-003 — `OnUpsertAsync` does not refresh `IngestedAtUtc`; direct-write callers must remember to stamp it
| | |
+11 -2
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 10 |
| Open findings | 9 |
## Summary
@@ -343,9 +343,18 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.Transport/Import/BundleImporter.cs:528, 668, 703`, `src/ScadaLink.ConfigurationDatabase/Services/AuditCorrelationContext.cs` |
**Resolution (2026-05-28):**
Took option (a). `AuditCorrelationContext` now backs `BundleImportId` with a `static AsyncLocal<Guid?>`,
so every distinct `BundleImporter.ApplyAsync` invocation observes its own value even when sharing a
DI scope (e.g. concurrent imports awaited via `Task.WhenAll` on a single Blazor circuit). The XML
docs on both `IAuditCorrelationContext` and `AuditCorrelationContext` were rewritten to spell out
the per-logical-call-context isolation contract that all implementations must preserve. The
`BundleImporter` mutation pattern is unchanged — the property API is identical and existing
integration tests still pass.
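
The isolation property described above can be sketched as follows, assuming a single shared context instance. `CorrelationContextSketch` and `ImportSketch` are hypothetical names; only the `static AsyncLocal<Guid?>` backing and the `BundleImportId` property name come from the resolution text.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context: one shared instance, per-logical-call value.
public sealed class CorrelationContextSketch
{
    private static readonly AsyncLocal<Guid?> CurrentImportId = new();

    public Guid? BundleImportId
    {
        get => CurrentImportId.Value;
        set => CurrentImportId.Value = value;
    }
}

public static class ImportSketch
{
    // Sets the id, yields, then checks that the same id is still visible.
    public static async Task<bool> ApplyAsync(CorrelationContextSketch context, Guid importId)
    {
        context.BundleImportId = importId;
        await Task.Delay(10); // let the sibling call interleave
        return context.BundleImportId == importId;
    }

    // Two concurrent calls share one context instance, as they would when
    // sharing a DI scope.
    public static async Task<bool[]> RunConcurrentlyAsync()
    {
        var shared = new CorrelationContextSketch();
        return await Task.WhenAll(
            ApplyAsync(shared, Guid.NewGuid()),
            ApplyAsync(shared, Guid.NewGuid()));
    }
}
```

A value assigned inside an async method does not leak back to the caller, so each `ApplyAsync` call keeps its own id. With a plain instance field instead, the first call would observe the second call's id after the delay.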
**Description**
The XML doc on `IAuditCorrelationContext` correctly notes that mutating
@@ -122,65 +122,69 @@ public class AuditLogIngestActor : ReceiveActor
// ctor has no service provider — it falls through with no filter,
// which preserves the small-payload assumptions baked into the
// existing D2 fixtures.
IServiceScope? scope = null;
IAuditLogRepository repository;
IAuditPayloadFilter? filter = null;
ICentralAuditWriteFailureCounter? failureCounter = null;
// AuditLog-003: use CreateAsyncScope + await using so scoped EF Core
// services (IAsyncDisposable DbContexts) dispose asynchronously
// without blocking on sync Dispose() of pending connection cleanup.
if (_injectedRepository is not null)
{
repository = _injectedRepository;
await IngestWithRepositoryAsync(_injectedRepository, filter: null, failureCounter: null, cmd, nowUtc, accepted)
.ConfigureAwait(false);
}
else
{
scope = _serviceProvider!.CreateScope();
repository = scope.ServiceProvider.GetRequiredService<IAuditLogRepository>();
filter = scope.ServiceProvider.GetService<IAuditPayloadFilter>();
await using var scope = _serviceProvider!.CreateAsyncScope();
var repository = scope.ServiceProvider.GetRequiredService<IAuditLogRepository>();
var filter = scope.ServiceProvider.GetService<IAuditPayloadFilter>();
// M6 Bundle E (T8): central health counter is best-effort —
// unregistered (test composition roots) means the per-row catch
// simply logs without surfacing on the health dashboard.
failureCounter = scope.ServiceProvider.GetService<ICentralAuditWriteFailureCounter>();
}
try
{
foreach (var evt in cmd.Events)
{
try
{
// Stamp IngestedAtUtc here, not at the site. Bundle A's
// repository hardening already swallows duplicate-key races,
// so the same id arriving twice (site retry, reconciliation)
// is a silent no-op.
// Filter BEFORE the IngestedAtUtc stamp so the redacted
// copy carries the central-side ingest timestamp. Filter
// is contract-bound to never throw; null = pass-through.
var filtered = filter?.Apply(evt) ?? evt;
var ingested = filtered with { IngestedAtUtc = nowUtc };
await repository.InsertIfNotExistsAsync(ingested).ConfigureAwait(false);
accepted.Add(evt.EventId);
}
catch (Exception ex)
{
// Per-row catch — one bad row never sinks the whole batch.
// The row stays Pending at the site; the next drain retries.
// M6 Bundle E (T8): bump the central health counter so a
// sustained insert-throw failure surfaces on the dashboard.
try { failureCounter?.Increment(); }
catch { /* counter must never throw — defence in depth */ }
_logger.LogError(ex,
"Failed to persist audit event {EventId} during batch ingest; row will be retried by the site.",
evt.EventId);
}
}
}
finally
{
scope?.Dispose();
var failureCounter = scope.ServiceProvider.GetService<ICentralAuditWriteFailureCounter>();
await IngestWithRepositoryAsync(repository, filter, failureCounter, cmd, nowUtc, accepted)
.ConfigureAwait(false);
}
replyTo.Tell(new IngestAuditEventsReply(accepted));
}
private async Task IngestWithRepositoryAsync(
IAuditLogRepository repository,
IAuditPayloadFilter? filter,
ICentralAuditWriteFailureCounter? failureCounter,
IngestAuditEventsCommand cmd,
DateTime nowUtc,
List<Guid> accepted)
{
foreach (var evt in cmd.Events)
{
try
{
// Stamp IngestedAtUtc here, not at the site. Bundle A's
// repository hardening already swallows duplicate-key races,
// so the same id arriving twice (site retry, reconciliation)
// is a silent no-op.
// Filter BEFORE the IngestedAtUtc stamp so the redacted
// copy carries the central-side ingest timestamp. Filter
// is contract-bound to never throw; null = pass-through.
var filtered = filter?.Apply(evt) ?? evt;
var ingested = filtered with { IngestedAtUtc = nowUtc };
await repository.InsertIfNotExistsAsync(ingested).ConfigureAwait(false);
accepted.Add(evt.EventId);
}
catch (Exception ex)
{
// Per-row catch — one bad row never sinks the whole batch.
// The row stays Pending at the site; the next drain retries.
// M6 Bundle E (T8): bump the central health counter so a
// sustained insert-throw failure surfaces on the dashboard.
try { failureCounter?.Increment(); }
catch { /* counter must never throw — defence in depth */ }
_logger.LogError(ex,
"Failed to persist audit event {EventId} during batch ingest; row will be retried by the site.",
evt.EventId);
}
}
}
/// <summary>
/// M3 dual-write handler. For every <see cref="CachedTelemetryEntry"/> the
/// actor opens a fresh MS SQL transaction, inserts the AuditLog row
@@ -134,79 +134,73 @@ public class AuditLogPurgeActor : ReceiveActor
// restart.
var threshold = DateTime.UtcNow - TimeSpan.FromDays(_auditOptions.RetentionDays);
IServiceScope? scope = null;
// AuditLog-003: use CreateAsyncScope + await using so scoped EF Core
// services (DbContexts implement IAsyncDisposable) are disposed through
// DisposeAsync() rather than a blocking synchronous Dispose().
await using var scope = _services.CreateAsyncScope();
IAuditLogRepository repository;
try
{
scope = _services.CreateScope();
repository = scope.ServiceProvider.GetRequiredService<IAuditLogRepository>();
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to resolve IAuditLogRepository for AuditLog purge tick.");
scope?.Dispose();
return;
}
IReadOnlyList<DateTime> boundaries;
try
{
IReadOnlyList<DateTime> boundaries;
boundaries = await repository
.GetPartitionBoundariesOlderThanAsync(threshold)
.ConfigureAwait(false);
}
catch (Exception ex)
{
_logger.LogError(
ex,
"Failed to enumerate eligible AuditLog partition boundaries (threshold {ThresholdUtc:o}); skipping purge tick.",
threshold);
return;
}
if (boundaries.Count == 0)
{
return;
}
foreach (var boundary in boundaries)
{
// Per-boundary try/catch: one bad partition (transient SQL
// failure, missing object, contention with backup) does NOT
// abandon the rest of the tick.
var sw = Stopwatch.StartNew();
try
{
boundaries = await repository
.GetPartitionBoundariesOlderThanAsync(threshold)
var rowsDeleted = await repository
.SwitchOutPartitionAsync(boundary)
.ConfigureAwait(false);
sw.Stop();
eventStream.Publish(
new AuditLogPurgedEvent(boundary, rowsDeleted, sw.ElapsedMilliseconds));
_logger.LogInformation(
"Purged AuditLog partition {MonthBoundary:yyyy-MM-dd}; {RowsDeleted} rows in {DurationMs} ms.",
boundary,
rowsDeleted,
sw.ElapsedMilliseconds);
}
catch (Exception ex)
{
sw.Stop();
_logger.LogError(
ex,
"Failed to enumerate eligible AuditLog partition boundaries (threshold {ThresholdUtc:o}); skipping purge tick.",
threshold);
return;
"Failed to purge AuditLog partition {MonthBoundary:yyyy-MM-dd}; other partitions continue. Elapsed {DurationMs} ms.",
boundary,
sw.ElapsedMilliseconds);
}
if (boundaries.Count == 0)
{
return;
}
foreach (var boundary in boundaries)
{
// Per-boundary try/catch: one bad partition (transient SQL
// failure, missing object, contention with backup) does NOT
// abandon the rest of the tick.
var sw = Stopwatch.StartNew();
try
{
var rowsDeleted = await repository
.SwitchOutPartitionAsync(boundary)
.ConfigureAwait(false);
sw.Stop();
eventStream.Publish(
new AuditLogPurgedEvent(boundary, rowsDeleted, sw.ElapsedMilliseconds));
_logger.LogInformation(
"Purged AuditLog partition {MonthBoundary:yyyy-MM-dd}; {RowsDeleted} rows in {DurationMs} ms.",
boundary,
rowsDeleted,
sw.ElapsedMilliseconds);
}
catch (Exception ex)
{
sw.Stop();
_logger.LogError(
ex,
"Failed to purge AuditLog partition {MonthBoundary:yyyy-MM-dd}; other partitions continue. Elapsed {DurationMs} ms.",
boundary,
sw.ElapsedMilliseconds);
}
}
}
finally
{
scope.Dispose();
}
}
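The AuditLog-003 change above swaps `CreateScope()` plus a manual `finally` for `CreateAsyncScope()` plus `await using`. A minimal sketch of that pattern follows; it is not part of the commit. `AsyncOnlyResource` and `AsyncScopeDemo` are hypothetical names, and the sketch assumes the Microsoft.Extensions.DependencyInjection package on .NET 6 or later.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical scoped service that only supports asynchronous disposal,
// standing in for a scoped EF Core DbContext.
public sealed class AsyncOnlyResource : IAsyncDisposable
{
    public bool Disposed { get; private set; }

    public ValueTask DisposeAsync()
    {
        Disposed = true;
        return ValueTask.CompletedTask;
    }
}

public static class AsyncScopeDemo
{
    // Resolves the scoped service inside an async scope and returns it so the
    // caller can inspect its state after the scope has been disposed.
    public static async Task<AsyncOnlyResource> ResolveAndDisposeAsync()
    {
        await using var provider = new ServiceCollection()
            .AddScoped<AsyncOnlyResource>()
            .BuildServiceProvider();

        AsyncOnlyResource resource;
        await using (var scope = provider.CreateAsyncScope())
        {
            resource = scope.ServiceProvider.GetRequiredService<AsyncOnlyResource>();
            // Any early return inside this block would still run DisposeAsync on the scope.
        }

        return resource;
    }

    public static async Task Main()
    {
        var resource = await ResolveAndDisposeAsync();
        Console.WriteLine(resource.Disposed);
    }
}
```

Because `await using` disposes the scope on every exit path, the explicit `scope?.Dispose()` calls in the early-return branches and the trailing `finally` block become unnecessary, which is what the hunk above removes.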
@@ -195,44 +195,38 @@ public class SiteAuditReconciliationActor : ReceiveActor
return;
}
IServiceScope? scope = null;
// AuditLog-003: use CreateAsyncScope + await using so scoped EF Core
// services (DbContexts implement IAsyncDisposable) are disposed through
// DisposeAsync() rather than a blocking synchronous Dispose().
await using var scope = _services.CreateAsyncScope();
IAuditLogRepository repository;
try
{
scope = _services.CreateScope();
repository = scope.ServiceProvider.GetRequiredService<IAuditLogRepository>();
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to resolve IAuditLogRepository for reconciliation tick.");
scope?.Dispose();
return;
}
try
foreach (var site in sites)
{
foreach (var site in sites)
try
{
try
{
await PullSiteAsync(site, repository, eventStream).ConfigureAwait(false);
}
catch (Exception ex)
{
// Catch-all per the failure-isolation invariant: one site's
// fault must not sink the rest of the tick. The cursor for
// the failing site is left at its previous value so the
// next tick retries the same window.
_logger.LogWarning(
ex,
"Reconciliation pull failed for site {SiteId}; other sites continue.",
site.SiteId);
}
await PullSiteAsync(site, repository, eventStream).ConfigureAwait(false);
}
catch (Exception ex)
{
// Catch-all per the failure-isolation invariant: one site's
// fault must not sink the rest of the tick. The cursor for
// the failing site is left at its previous value so the
// next tick retries the same window.
_logger.LogWarning(
ex,
"Reconciliation pull failed for site {SiteId}; other sites continue.",
site.SiteId);
}
}
finally
{
scope.Dispose();
}
}
@@ -150,16 +150,15 @@ public static class ServiceCollectionExtensions
sp.GetRequiredService<IAuditWriter>(),
sp.GetService<ScadaLink.Commons.Interfaces.IOperationTrackingStore>(),
sp.GetRequiredService<ILogger<CachedCallTelemetryForwarder>>(),
// SourceNode-stamping (Task 14): the local node identity is
// threaded through so RecordEnqueueAsync can stamp the
// tracking row's SourceNode column. GetService (not
// GetRequiredService) — test composition roots that build a
// stripped DI container may not register the provider, in
// which case the forwarder degrades to a null SourceNode
// rather than failing the DI resolution. Production hosts
// (site + central) always register it via
// SiteServiceRegistration.BindSharedOptions.
sp.GetService<INodeIdentityProvider>()));
// AuditLog-007: INodeIdentityProvider is now required across
// every consumer in AddAuditLog. The Host's
// SiteServiceRegistration registers it as a singleton on both
// site and central paths (InboundAPI-022 / Host registration
// sweep), and the AddAuditLogTests fixture provides a
// FakeNodeIdentityProvider; a silent GetService() that
// returned null would mask a future composition root that
// forgot to register the provider.
sp.GetRequiredService<INodeIdentityProvider>()));
// M3 Bundle F: bridge the store-and-forward retry-loop observer hook
// to the cached-call forwarder so per-attempt + terminal telemetry
@@ -171,15 +170,17 @@ public static class ServiceCollectionExtensions
// INodeIdentityProvider singleton can be threaded through — the
// bridge stamps SiteCallOperational.SourceNode from
// INodeIdentityProvider.NodeName on every cached-call lifecycle row.
// GetService (not GetRequiredService): test composition roots that
// build a stripped DI container may not register the provider, in
// which case the bridge degrades to a null SourceNode rather than
// failing the DI resolution. Production hosts (site + central)
// always register it via SiteServiceRegistration.BindSharedOptions.
// AuditLog-007: the provider is resolved with GetRequiredService —
// SiteServiceRegistration.BindSharedOptions registers it on both
// site and central paths, so a missing registration is a
// composition-root bug, not a silent null-SourceNode degradation.
services.AddSingleton<CachedCallLifecycleBridge>(sp => new CachedCallLifecycleBridge(
sp.GetRequiredService<ICachedCallTelemetryForwarder>(),
sp.GetRequiredService<ILogger<CachedCallLifecycleBridge>>(),
sp.GetService<INodeIdentityProvider>()));
// AuditLog-007: required, matches the other consumers in this
// composition root — the provider is always registered by
// SiteServiceRegistration.
sp.GetRequiredService<INodeIdentityProvider>()));
services.AddSingleton<ICachedCallLifecycleObserver>(
sp => sp.GetRequiredService<CachedCallLifecycleBridge>());
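The AuditLog-007 comments above rest on the difference between `GetService<T>()` and `GetRequiredService<T>()` when a registration is missing. The sketch below illustrates that difference only; it is not part of the commit. `IExampleProvider` and `ResolutionDemo` are hypothetical names, and the sketch assumes the Microsoft.Extensions.DependencyInjection package.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical interface standing in for INodeIdentityProvider; deliberately unregistered.
public interface IExampleProvider { }

public static class ResolutionDemo
{
    // Returns true when the optional lookup silently yields null.
    public static bool OptionalLookupIsNull(IServiceProvider provider) =>
        provider.GetService<IExampleProvider>() is null;

    // Returns true when the required lookup fails loudly.
    public static bool RequiredLookupThrows(IServiceProvider provider)
    {
        try
        {
            provider.GetRequiredService<IExampleProvider>();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        using var provider = new ServiceCollection().BuildServiceProvider();
        Console.WriteLine(OptionalLookupIsNull(provider));
        Console.WriteLine(RequiredLookupThrows(provider));
    }
}
```

With the required form, a composition root that forgets the registration fails at resolution time instead of propagating a null into the constructed service.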
@@ -245,8 +246,10 @@ public static class ServiceCollectionExtensions
/// time — by design, since a silent NoOp would mask a misconfiguration.
/// </para>
/// <para>
/// Idempotent — calling twice replaces each descriptor without piling up
/// registrations.
/// Idempotent — a sentinel check on the
/// <see cref="SiteAuditBacklogReporter"/> hosted-service descriptor
/// short-circuits subsequent calls so the hosted service is not
/// double-registered (AddHostedService has no TryAdd variant).
/// </para>
/// <para>
/// Site-side only for M5: the central composition root keeps the NoOp
@@ -261,6 +264,18 @@ public static class ServiceCollectionExtensions
{
ArgumentNullException.ThrowIfNull(services);
// AuditLog-011: guard against double-registration. AddHostedService is
// additive (no TryAdd variant) so a second call without this sentinel
// would spin up a second SiteAuditBacklogReporter, doubling the 30 s
// SQL probe rate and racing on the ISiteHealthCollector snapshot. The
// SiteAuditBacklogReporter descriptor is the discriminator: it's only
// registered by this helper, so its presence proves the bridge has
// already been wired.
if (services.Any(d => d.ImplementationType == typeof(SiteAuditBacklogReporter)))
{
return services;
}
services.Replace(
ServiceDescriptor.Singleton<IAuditWriteFailureCounter, HealthMetricsAuditWriteFailureCounter>());
services.Replace(
@@ -79,23 +79,53 @@ internal sealed class SandboxConsoleCapture : TextWriter
return new CaptureScope(this, previous);
}
// CentralUI-030: intra-script concurrency hardening. A sandboxed script
// can fan out work with `Task.WhenAll` / `Task.Run`; `AsyncLocal` flows
// the capture `StringWriter` into every child task, so two tasks can
// race the *same* buffer. `StringWriter` is not thread-safe — concurrent
// `Write`/`WriteLine` calls can corrupt the underlying `StringBuilder`
// and either throw or interleave at the character level. We lock on the
// captured writer itself so writes from one capture scope serialise;
// fall-through to the original `_fallback` (host-process console) is
// unlocked because the BCL's process-wide `Console.Out` is already
// synchronised by its TextWriter wrapper.
/// <inheritdoc />
public override void Write(char value) => Target.Write(value);
public override void Write(char value) => WriteSynchronized(t => t.Write(value));
/// <inheritdoc />
public override void Write(string? value) => Target.Write(value);
public override void Write(string? value) => WriteSynchronized(t => t.Write(value));
/// <inheritdoc />
public override void Write(char[] buffer, int index, int count) =>
Target.Write(buffer, index, count);
WriteSynchronized(t => t.Write(buffer, index, count));
/// <inheritdoc />
public override void WriteLine() => Target.WriteLine();
public override void WriteLine() => WriteSynchronized(t => t.WriteLine());
/// <inheritdoc />
public override void WriteLine(string? value) => Target.WriteLine(value);
public override void WriteLine(string? value) => WriteSynchronized(t => t.WriteLine(value));
private TextWriter Target => _current.Value ?? _fallback;
/// <summary>
/// Routes a single write through the currently-active capture buffer
/// under a lock on that buffer, or to the unwrapped fallback writer when
/// no capture scope is active. The lock target is the <c>StringWriter</c>
/// instance itself — different capture scopes have different writers,
/// so two unrelated scopes never block each other.
/// </summary>
private void WriteSynchronized(Action<TextWriter> write)
{
var captured = _current.Value;
if (captured is null)
{
write(_fallback);
return;
}
lock (captured)
{
write(captured);
}
}
internal readonly struct CaptureScope : IDisposable
{
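The CentralUI-030 fix above serialises writes by locking on the captured `StringWriter` itself. The following sketch shows the same idea in isolation, using only BCL types; it is not part of the commit. `LockedWriterDemo`, `WriteLocked` and `CountLinesAsync` are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class LockedWriterDemo
{
    // Performs one write under a lock on the buffer instance, so writers that
    // share the same buffer take turns while unrelated buffers never contend.
    public static void WriteLocked(StringWriter buffer, string line)
    {
        lock (buffer)
        {
            buffer.WriteLine(line);
        }
    }

    // Fans out several tasks that all write to one shared buffer and returns
    // the number of complete lines captured.
    public static async Task<int> CountLinesAsync(int taskCount, int linesPerTask)
    {
        var buffer = new StringWriter();
        var tasks = Enumerable.Range(0, taskCount).Select(i => Task.Run(() =>
        {
            for (var j = 0; j < linesPerTask; j++)
            {
                WriteLocked(buffer, $"task-{i}-line-{j}");
            }
        }));
        await Task.WhenAll(tasks);

        return buffer.ToString()
            .Split(Environment.NewLine, StringSplitOptions.RemoveEmptyEntries)
            .Length;
    }

    public static async Task Main()
    {
        Console.WriteLine(await CountLinesAsync(8, 100));
    }
}
```

Locking on the buffer rather than on a shared static object keeps the critical section scoped to a single capture, which matches the rationale given in the `WriteSynchronized` summary.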
@@ -56,12 +56,19 @@ public interface IDeploymentManagerRepository
/// <returns>A task representing the asynchronous operation.</returns>
Task UpdateDeploymentRecordAsync(DeploymentRecord record, CancellationToken cancellationToken = default);
/// <summary>
/// Deletes a deployment record by ID.
/// Deletes a deployment record by ID, enforcing optimistic concurrency against the
/// supplied <paramref name="expectedRowVersion"/>. The caller MUST pass the
/// <c>RowVersion</c> it last observed on the record so EF emits
/// <c>DELETE ... WHERE Id = @id AND RowVersion = @prior</c>. A concurrent edit
/// surfaces as <see cref="Microsoft.EntityFrameworkCore.DbUpdateConcurrencyException"/>
/// on <see cref="SaveChangesAsync(CancellationToken)"/>, matching the documented
/// "Optimistic concurrency is used on deployment status records" design rule.
/// </summary>
/// <param name="id">The deployment record ID to delete.</param>
/// <param name="expectedRowVersion">The RowVersion the caller observed; used as the optimistic-concurrency token.</param>
/// <param name="cancellationToken">A cancellation token that can be used to cancel the operation.</param>
/// <returns>A task representing the asynchronous operation.</returns>
Task DeleteDeploymentRecordAsync(int id, CancellationToken cancellationToken = default);
Task DeleteDeploymentRecordAsync(int id, byte[] expectedRowVersion, CancellationToken cancellationToken = default);
// SystemArtifactDeploymentRecord
/// <summary>
@@ -81,25 +81,23 @@ public record ExternalCallResult(
string? ErrorMessage,
bool WasBuffered = false)
{
private dynamic? _response;
private bool _responseParsed;
// Commons-021: thread-safe lazy parse — `Lazy<T>` with the default
// `LazyThreadSafetyMode.ExecutionAndPublication` guarantees that two
// concurrent readers see the same `DynamicJsonElement` instance, the
// `JsonDocument.Parse` runs at most once, and the published value is
// safe under .NET's memory model. The factory lambda captures the
// primary-constructor parameter `ResponseJson` (a field initializer in a
// positional record binds to the parameter, not the property), so the
// JSON parsed on first access is the value supplied at construction.
private readonly Lazy<dynamic?> _response = new(() =>
string.IsNullOrEmpty(ResponseJson)
? null
: new DynamicJsonElement(System.Text.Json.JsonDocument.Parse(ResponseJson).RootElement));
/// <summary>
/// Parsed response as a dynamic object. Returns null if ResponseJson is null or empty.
/// Access properties directly: result.Response.result, result.Response.items[0].name, etc.
/// Thread-safe: concurrent readers share a single parsed instance (Commons-021).
/// </summary>
public dynamic? Response
{
get
{
if (!_responseParsed)
{
_response = string.IsNullOrEmpty(ResponseJson)
? null
: new DynamicJsonElement(System.Text.Json.JsonDocument.Parse(ResponseJson).RootElement);
_responseParsed = true;
}
return _response;
}
}
public dynamic? Response => _response.Value;
}
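The Commons-021 change above relies on `Lazy<T>` running its factory at most once under `LazyThreadSafetyMode.ExecutionAndPublication`. The sketch below demonstrates that guarantee with BCL types only; it is not part of the commit. `ParsedOnce` is a hypothetical stand-in for `ExternalCallResult`, and it returns a cloned `JsonElement` rather than the commit's `DynamicJsonElement`.

```csharp
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for ExternalCallResult; counts how often the parse runs.
public sealed class ParsedOnce
{
    private readonly Lazy<JsonElement?> _parsed;
    private int _parseCount;

    public ParsedOnce(string? json)
    {
        _parsed = new Lazy<JsonElement?>(() =>
        {
            Interlocked.Increment(ref _parseCount);
            if (string.IsNullOrEmpty(json))
            {
                return null;
            }

            // Clone so the element stays valid after the document is disposed.
            using var doc = JsonDocument.Parse(json);
            return doc.RootElement.Clone();
        }, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public JsonElement? Value => _parsed.Value;

    public int ParseCount => Volatile.Read(ref _parseCount);
}

public static class LazyParseDemo
{
    public static void Main()
    {
        var result = new ParsedOnce("{\"answer\": 42}");

        // Many concurrent readers; the factory should still run exactly once.
        Parallel.For(0, 64, _ => { var unused = result.Value; });

        Console.WriteLine(result.ParseCount);
        Console.WriteLine(result.Value!.Value.GetProperty("answer").GetInt32());
    }
}
```

The parse counter is there only to make the single-execution property observable; the production type does not need it.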
@@ -1,24 +1,27 @@
namespace ScadaLink.Commons.Interfaces.Transport;
/// <summary>
/// Scoped service the bundle importer sets to thread a BundleImportId through to
/// the audit log entries emitted by the audited repository methods invoked during
/// Service the bundle importer sets to thread a BundleImportId through to the
/// audit log entries emitted by the audited repository methods invoked during
/// ApplyAsync. AuditService reads this and stamps every AuditLogEntry it writes.
/// <para>
/// Re-entrancy / thread-safety: mutating <see cref="BundleImportId"/> is NOT
/// thread-safe. The service is registered scoped, and the assumed usage is a
/// single Blazor Server circuit (or single API request) at a time — within that
/// scope <c>BundleImporter.ApplyAsync</c> (in the Transport component) is the
/// sole writer, and the audit service is the sole reader, in a strictly
/// sequential await chain. Callers that perform concurrent imports within a
/// shared scope (e.g. two <c>ApplyAsync</c> calls awaited via
/// <c>Task.WhenAll</c> on the same circuit) MUST serialize access externally —
/// there is no internal lock and the last writer wins, which would
/// cross-contaminate audit rows between imports.
/// Thread-safety / concurrency contract (Transport-009): the in-tree
/// implementation backs <see cref="BundleImportId"/> with an
/// <see cref="System.Threading.AsyncLocal{T}"/> so each logical asynchronous
/// call chain — every distinct <c>BundleImporter.ApplyAsync</c> invocation —
/// observes its own value, even when two imports share the same DI scope (e.g.
/// awaited via <c>Task.WhenAll</c> on a single Blazor circuit, or driven by a
/// misconfigured singleton registration). The value flows through every
/// <c>await</c> naturally; no cross-contamination of BundleImportIds between
/// concurrent imports.
/// </para>
/// <para>
/// Alternative implementations (e.g. ambient-context-free explicit-parameter
/// threading) MUST preserve the same per-call-context isolation guarantee.
/// </para>
/// </summary>
public interface IAuditCorrelationContext
{
/// <summary>Gets or sets the bundle import id used to correlate audit rows written during a bundle apply operation.</summary>
/// <summary>Gets or sets the bundle import id used to correlate audit rows written during a bundle apply operation. Implementations MUST isolate the value per-logical-call-context to prevent concurrent imports from cross-contaminating audit rows.</summary>
Guid? BundleImportId { get; set; }
}
@@ -81,17 +81,30 @@ public class DeploymentManagerRepository : IDeploymentManagerRepository
}
/// <inheritdoc />
public Task DeleteDeploymentRecordAsync(int id, CancellationToken cancellationToken = default)
public Task DeleteDeploymentRecordAsync(int id, byte[] expectedRowVersion, CancellationToken cancellationToken = default)
{
ArgumentNullException.ThrowIfNull(expectedRowVersion);
// CD-017: DeploymentRecord carries a SQL Server rowversion concurrency token.
// The stub-attach delete path must seed EF's OriginalValues["RowVersion"] with
// the caller's last-observed value so the generated SQL becomes
// `DELETE ... WHERE Id = @id AND RowVersion = @prior`. Without this seeding a
// concurrent edit is silently overwritten; with it, EF raises
// DbUpdateConcurrencyException on SaveChangesAsync — the documented
// optimistic-concurrency contract on deployment status records.
var record = _dbContext.DeploymentRecords.Local.FirstOrDefault(d => d.Id == id);
if (record != null)
{
var entry = _dbContext.Entry(record);
entry.OriginalValues["RowVersion"] = expectedRowVersion;
_dbContext.DeploymentRecords.Remove(record);
}
else
{
var stub = new DeploymentRecord("stub", "stub") { Id = id };
_dbContext.DeploymentRecords.Attach(stub);
var entry = _dbContext.Entry(stub);
entry.OriginalValues["RowVersion"] = expectedRowVersion;
_dbContext.DeploymentRecords.Remove(stub);
}
return Task.CompletedTask;
@@ -3,13 +3,34 @@ using ScadaLink.Commons.Interfaces.Transport;
namespace ScadaLink.ConfigurationDatabase.Services;
/// <summary>
/// Per-scope mutable holder for the active bundle import id. AuditService reads it
/// while writing AuditLogEntry rows. Registered as Scoped so each Blazor circuit /
/// request gets its own value; ApplyAsync explicitly creates a service scope and
/// sets the id at the top of the transaction.
/// Holder for the active bundle import id, backed by an <see cref="AsyncLocal{T}"/>
/// so each logical asynchronous call chain observes its own value. AuditService
/// reads it while writing AuditLogEntry rows.
/// <para>
/// Thread-safety / concurrency contract (Transport-009): the previous Scoped
/// instance with a plain auto-property mutated by <c>BundleImporter.ApplyAsync</c>
/// was vulnerable to cross-contamination if two imports ran concurrently inside
/// a shared DI scope — either via <c>Task.WhenAll</c> on a single Blazor circuit
/// or via a misconfigured singleton registration. Backing the property with
/// <see cref="AsyncLocal{T}"/> means every fresh logical-call-context — every
/// distinct <c>ApplyAsync</c> invocation, even ones sharing the same DI scope —
/// gets its own independent value, and the value flows naturally through every
/// <c>await</c> in the chain. Concurrent imports no longer leak BundleImportIds
/// across audit rows.
/// </para>
/// <para>
/// The class is still registered as Scoped so injection works with the existing
/// DI graph, but its in-memory state is per-call-context regardless of lifetime.
/// </para>
/// </summary>
public sealed class AuditCorrelationContext : IAuditCorrelationContext
{
private static readonly AsyncLocal<Guid?> _bundleImportId = new();
/// <inheritdoc />
public Guid? BundleImportId { get; set; }
public Guid? BundleImportId
{
get => _bundleImportId.Value;
set => _bundleImportId.Value = value;
}
}
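The Transport-009 contract above depends on `AsyncLocal<T>` giving each logical call chain its own value. The sketch below exercises that behaviour with two concurrent calls, using BCL types only; it is not part of the commit. `AsyncLocalDemo` and `RunImportAsync` are hypothetical names, and `Task.Delay` merely stands in for the awaited work inside an import.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLocalDemo
{
    private static readonly AsyncLocal<Guid?> Current = new();

    // Value of the ambient id as seen by whoever calls this.
    public static Guid? Ambient => Current.Value;

    // Hypothetical stand-in for one import: sets the ambient id, awaits,
    // then reports whether it still observes its own id.
    public static async Task<bool> RunImportAsync(Guid id)
    {
        Current.Value = id;
        await Task.Delay(10); // lets the concurrent call set its own value meanwhile
        return Current.Value == id;
    }

    public static async Task Main()
    {
        var results = await Task.WhenAll(
            RunImportAsync(Guid.NewGuid()),
            RunImportAsync(Guid.NewGuid()));

        Console.WriteLine(results[0] && results[1]);

        // A value set inside an async method does not flow back to its caller.
        Console.WriteLine(Ambient is null);
    }
}
```

One design consequence worth keeping in mind: because a value assigned inside an async method is not visible to the caller once that method returns, the id has to be set in the same call chain (or an ancestor of it) as the code that later reads it.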
@@ -42,6 +42,21 @@ public class AkkaHostedService : IHostedService
/// </summary>
private readonly List<IDisposable> _trackedDisposables = new();
/// <summary>
/// NotificationService-020 guard: sentinel that flips to <c>true</c> the
/// first time a Notification-category S&amp;F delivery handler is registered
/// on this hosted service instance. <see cref="StoreAndForwardService.RegisterDeliveryHandler"/>
/// is last-write-wins on category, so a future code change that introduces
/// a second registration path (e.g. a role-branch + helper that both call
/// the registration) would silently overwrite the canonical
/// <c>NotificationForwarder</c> handler with whatever the loser registers —
/// the prior NS-001 fix did exactly this, and was silently superseded
/// when the central-only redesign moved delivery to <c>NotificationOutbox</c>.
/// This sentinel makes the duplicate noisy at startup so a maintainer
/// re-introducing the second path sees it immediately.
/// </summary>
private bool _notificationDeliveryHandlerRegistered;
/// <summary>
/// Initializes a new instance of the <see cref="AkkaHostedService"/> class.
/// </summary>
@@ -460,7 +475,42 @@ akka {{
terminationMessage: PoisonPill.Instance,
settings: ClusterSingletonManagerSettings.Create(_actorSystem!)
.WithSingletonName("site-call-audit"));
_actorSystem!.ActorOf(siteCallAuditSingletonProps, "site-call-audit-singleton");
var siteCallAuditSingletonManager =
_actorSystem!.ActorOf(siteCallAuditSingletonProps, "site-call-audit-singleton");
// SiteCallAudit-002 graceful-handover hook. The default singleton handover
// path waits for the actor's `ReceiveAsync` task to complete before
// signalling `HandOverDone` to the new oldest node — so an in-flight
// EF `UpsertAsync` IS waited for during a *clean* coordinated shutdown
// (the cluster-leave phase below fires before the singleton terminates).
// The risk the finding tracks is the seam between in-flight async work
// and the cluster-leave + singleton-stop sequence: we bound it by
// issuing an explicit `GracefulStop` to the singleton manager early
// in `cluster-leave`, with a timeout that lets the running upsert + SQL
// round-trip drain before the handover-to-other-node race window
// opens. The timeout is bounded so a misbehaving upsert cannot stall
// coordinated shutdown indefinitely — exceeding it falls through to
// the existing PoisonPill termination path. Same pattern is suitable
// for the NotificationOutbox singleton; not added here to keep this
// change minimal (out of NS-020's scope).
var siteCallAuditShutdown = Akka.Actor.CoordinatedShutdown.Get(_actorSystem);
siteCallAuditShutdown.AddTask(
Akka.Actor.CoordinatedShutdown.PhaseClusterLeave,
"drain-site-call-audit-singleton",
async () =>
{
try
{
await siteCallAuditSingletonManager.GracefulStop(TimeSpan.FromSeconds(10));
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"SiteCallAudit singleton did not drain within the graceful-stop "
+ "timeout; falling through to PoisonPill handover");
}
return Akka.Done.Instance;
});
var siteCallAuditProxyProps = ClusterSingletonProxy.Props(
singletonManagerPath: "/user/site-call-audit-singleton",
@@ -651,6 +701,23 @@ akka {{
// cluster via the SiteCommunicationActor and treating central's
// NotificationSubmitAck as the outcome (accepted → delivered; not accepted
// or timeout → throw → transient → keep buffering). Central owns SMTP.
//
// NotificationService-020: register exactly once. The sentinel guard
// catches a second registration path that re-introduces the dead
// NS-001 site-SMTP handler — see the sentinel's XML doc above for the
// historical context. Throwing here is intentional: a silent overwrite
// by a future maintainer would invert the design back to site-side
// delivery (NotificationForwarder vs. NotificationDeliveryService).
if (_notificationDeliveryHandlerRegistered)
{
throw new InvalidOperationException(
"NotificationService-020: A Notification-category store-and-forward "
+ "delivery handler was already registered. The canonical handler is "
+ "NotificationForwarder (central-only delivery, post-redesign). "
+ "If you are re-introducing a second registration path, remove the "
+ "first one — RegisterDeliveryHandler is last-write-wins per category "
+ "and a duplicate inverts the design.");
}
var notificationForwarder = new ScadaLink.StoreAndForward.NotificationForwarder(
siteCommActor,
_nodeOptions.SiteId!,
@@ -658,6 +725,7 @@ akka {{
storeAndForwardService.RegisterDeliveryHandler(
ScadaLink.Commons.Types.Enums.StoreAndForwardCategory.Notification,
notificationForwarder.DeliverAsync);
_notificationDeliveryHandlerRegistered = true;
_logger.LogInformation(
"Store-and-forward delivery handlers registered (ExternalSystem, CachedDbWrite, Notification)");
@@ -1,5 +1,6 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
@@ -274,4 +275,36 @@ public class AddAuditLogTests
Assert.Throws<InvalidOperationException>(
() => provider.GetRequiredService<IAuditWriteFailureCounter>());
}
[Fact]
public void AddAuditLogHealthMetricsBridge_IsIdempotent_DoesNotDoubleRegister_HostedService()
{
// AuditLog-011: AddHostedService has no TryAdd variant, so a second
// call without the sentinel guard would spin up a second
// SiteAuditBacklogReporter on the same SQLite file. The helper must
// be a no-op on the second call — exactly one hosted-service
// descriptor for SiteAuditBacklogReporter survives.
var config = new ConfigurationBuilder()
.AddInMemoryCollection(new Dictionary<string, string?>
{
["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
})
.Build();
var services = new ServiceCollection();
services.AddSingleton<ILoggerFactory, NullLoggerFactory>();
services.AddSingleton(typeof(ILogger<>), typeof(NullLogger<>));
services.AddSingleton<INodeIdentityProvider>(new FakeNodeIdentityProvider());
services.AddAuditLog(config);
services.AddHealthMonitoring();
services.AddAuditLogHealthMetricsBridge();
services.AddAuditLogHealthMetricsBridge();
var reporterCount = services.Count(d =>
d.ServiceType == typeof(IHostedService) &&
d.ImplementationType == typeof(SiteAuditBacklogReporter));
Assert.Equal(1, reporterCount);
}
}
@@ -0,0 +1,86 @@
using ScadaLink.CentralUI.ScriptAnalysis;
namespace ScadaLink.CentralUI.Tests.ScriptAnalysis;
/// <summary>
/// Regression tests for the <c>SandboxConsoleCapture</c> writer that the Test Run
/// sandbox installs on <c>Console.Out</c>/<c>Console.Error</c>. CentralUI-030
/// surfaced an intra-script concurrency hazard: a sandboxed script can fan out
/// work with <c>Task.WhenAll</c> / <c>Task.Run</c> and every child task inherits
/// the capture <c>StringWriter</c> via <c>AsyncLocal</c>; <c>StringWriter</c> is
/// not thread-safe, so concurrent writes could corrupt the buffer. These tests
/// drive the writer the same way Roslyn-hosted user code does.
/// </summary>
public class SandboxConsoleCaptureTests
{
/// <summary>
/// CentralUI-030: a capture scope shared across <c>Task.WhenAll</c> child
/// tasks must serialise writes so the resulting transcript contains exactly
/// the expected number of lines without character-level interleaving.
/// </summary>
[Fact]
public async Task BeginCapture_ConcurrentWritesFromTasks_DoNotCorruptBuffer()
{
// The static install routes Console.Out through the singleton sandbox
// capture writer for the test process — this is idempotent and matches
// the way ScriptAnalysisService bootstraps the sandbox in production.
var (capture, _) = SandboxConsoleCapture.Install();
var buffer = new StringWriter();
const int taskCount = 32;
const int linesPerTask = 50;
const int expectedLines = taskCount * linesPerTask;
using (capture.BeginCapture(buffer))
{
// AsyncLocal flows the capture scope into each Task.Run, mirroring
// a sandboxed script doing `await Task.WhenAll(...)` over Tasks
// that each `Console.WriteLine`.
var tasks = Enumerable.Range(0, taskCount).Select(i => Task.Run(() =>
{
for (var j = 0; j < linesPerTask; j++)
{
Console.WriteLine($"task-{i}-line-{j}");
}
}));
await Task.WhenAll(tasks);
}
var captured = buffer.ToString();
// Without the lock, concurrent StringWriter.WriteLine can drop or
// interleave characters and produce malformed lines / a wrong count.
// We assert the exact line count and that every emitted token is
// present on a line of its own — both fail under the unprotected
// implementation.
var lines = captured.Split(Environment.NewLine, StringSplitOptions.RemoveEmptyEntries);
Assert.Equal(expectedLines, lines.Length);
for (var i = 0; i < taskCount; i++)
{
for (var j = 0; j < linesPerTask; j++)
{
Assert.Contains($"task-{i}-line-{j}", lines);
}
}
}
/// <summary>
/// Sanity check: the most basic capture happy-path still works after the
/// CentralUI-030 lock was introduced.
/// </summary>
[Fact]
public void BeginCapture_SingleThreadedWrites_AreCaptured()
{
var (capture, _) = SandboxConsoleCapture.Install();
var buffer = new StringWriter();
using (capture.BeginCapture(buffer))
{
Console.WriteLine("hello");
Console.Write("world");
}
Assert.Contains("hello", buffer.ToString());
Assert.Contains("world", buffer.ToString());
}
}
@@ -0,0 +1,76 @@
using ScadaLink.Commons.Interfaces.Services;
namespace ScadaLink.Commons.Tests.Interfaces.Services;
/// <summary>
/// Tests for <see cref="ExternalCallResult"/>, in particular the Commons-021
/// thread-safe lazy parse of <c>Response</c>. The pre-fix implementation used
/// two mutable fields (<c>_response</c>/<c>_responseParsed</c>) with no
/// synchronization, so concurrent readers could each construct a fresh
/// <c>DynamicJsonElement</c> and one would overwrite the other. The fix moves
/// the parse onto a <c>Lazy&lt;dynamic?&gt;</c> with
/// <c>LazyThreadSafetyMode.ExecutionAndPublication</c> (the default), which
/// guarantees one parse and one shared result for all readers.
/// </summary>
public class ExternalCallResultTests
{
[Fact]
public void Response_NullOrEmptyJson_ReturnsNull()
{
var withNull = new ExternalCallResult(Success: true, ResponseJson: null, ErrorMessage: null);
var withEmpty = new ExternalCallResult(Success: true, ResponseJson: string.Empty, ErrorMessage: null);
Assert.Null(withNull.Response);
Assert.Null(withEmpty.Response);
}
[Fact]
public void Response_ParsesJsonIntoDynamicElement()
{
var result = new ExternalCallResult(Success: true, ResponseJson: "{\"answer\": 42}", ErrorMessage: null);
// dynamic property access is the production usage pattern.
dynamic? response = result.Response;
Assert.NotNull(response);
int answer = (int)response!.answer;
Assert.Equal(42, answer);
}
/// <summary>
/// Commons-021: concurrent readers must observe the same parsed instance
/// (a <c>Lazy&lt;T&gt;</c> invariant). Under the pre-fix code two threads could
/// both produce a fresh <c>DynamicJsonElement</c> and one would win the race;
/// <c>ReferenceEquals</c> would then occasionally fail. With the fix every
/// reader observes the single Lazy-published value, so the assertion
/// holds for every pair of observers.
/// </summary>
[Fact]
public void Response_ConcurrentReads_ReturnSameInstance()
{
// A larger payload makes the parse window wider so the race, if
// present, is more likely to fire. The same property — single
// published instance — must hold for any payload, though.
var json = "{\"items\":[{\"name\":\"a\"},{\"name\":\"b\"},{\"name\":\"c\"}],\"count\":3}";
var result = new ExternalCallResult(Success: true, ResponseJson: json, ErrorMessage: null);
const int observerCount = 64;
var barrier = new Barrier(observerCount);
var observed = new object?[observerCount];
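// Dedicated threads: Parallel.For does not guarantee observerCount
// concurrent workers, so a Barrier of that size could stall on it.
var threads = new Thread[observerCount];
for (var t = 0; t < observerCount; t++)
{
var i = t; // per-thread copy of the index for the closure
threads[t] = new Thread(() =>
{
// Release all observers together so they collide on the lazy
// parse rather than each finding it already published.
barrier.SignalAndWait();
observed[i] = result.Response;
});
threads[t].Start();
}
foreach (var thread in threads) thread.Join();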
var first = observed[0];
Assert.NotNull(first);
for (var i = 1; i < observerCount; i++)
{
Assert.Same(first, observed[i]);
}
}
}
@@ -6,6 +6,7 @@ using ScadaLink.Commons.Entities.Sites;
using ScadaLink.Commons.Entities.Templates;
using ScadaLink.Commons.Types.Enums;
using ScadaLink.ConfigurationDatabase;
using ScadaLink.ConfigurationDatabase.Repositories;
namespace ScadaLink.ConfigurationDatabase.Tests;
@@ -36,6 +37,31 @@ public class ConcurrencyTestDbContext : ScadaLinkDbContext
}
}
/// <summary>
/// A SQLite-friendly DbContext that keeps <see cref="DeploymentRecord.RowVersion"/> as
/// the optimistic-concurrency token but disables auto-generation (SQLite cannot
/// auto-populate a rowversion column). The caller sets RowVersion explicitly, which
/// is sufficient to exercise the production stub-attach delete path under CD-017's
/// concurrency rule.
/// </summary>
public class RowVersionConcurrencyTestDbContext : ScadaLinkDbContext
{
public RowVersionConcurrencyTestDbContext(DbContextOptions<ScadaLinkDbContext> options) : base(options) { }
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
base.OnModelCreating(modelBuilder);
modelBuilder.Entity<DeploymentRecord>(builder =>
{
builder.Property(d => d.RowVersion)
.IsRequired(false)
.IsConcurrencyToken()
.ValueGeneratedNever();
});
}
}
public class ConcurrencyTests : IDisposable
{
private readonly string _dbPath;
@@ -149,6 +175,63 @@ public class ConcurrencyTests : IDisposable
Assert.Equal("Second update", loaded.Description);
}
[Fact]
public async Task DeleteDeploymentRecord_StaleRowVersion_ThrowsConcurrencyException()
{
// CD-017: Verifies the stub-attach delete path enforces optimistic concurrency
// when the caller passes a RowVersion that no longer matches the row's current
// RowVersion. Uses a SQLite fixture where DeploymentRecord.RowVersion is an
// explicit, caller-managed concurrency token (no SQL Server auto-generation).
using var setupCtx = new RowVersionConcurrencyTestDbContext(BuildOptions());
await setupCtx.Database.EnsureCreatedAsync();
var site = new Site("Site1", "S-RV1");
var template = new Template("RV-T1");
setupCtx.Sites.Add(site);
setupCtx.Templates.Add(template);
await setupCtx.SaveChangesAsync();
var instance = new Instance("RV-I1") { SiteId = site.Id, TemplateId = template.Id, State = InstanceState.Enabled };
setupCtx.Instances.Add(instance);
await setupCtx.SaveChangesAsync();
var record = new DeploymentRecord("deploy-rv-stale", "admin")
{
InstanceId = instance.Id,
DeployedAt = DateTimeOffset.UtcNow,
RowVersion = new byte[] { 0x01 },
};
setupCtx.DeploymentRecords.Add(record);
await setupCtx.SaveChangesAsync();
var id = record.Id;
// Reload in a fresh context and simulate a concurrent edit that has advanced
// the stored RowVersion. The caller below holds the *prior* RowVersion (0x01)
// and is expected to lose the concurrency check.
using (var advanceCtx = new RowVersionConcurrencyTestDbContext(BuildOptions()))
{
var stored = await advanceCtx.DeploymentRecords.SingleAsync(d => d.Id == id);
stored.RowVersion = new byte[] { 0x02 };
await advanceCtx.SaveChangesAsync();
}
using var deleteCtx = new RowVersionConcurrencyTestDbContext(BuildOptions());
var repository = new DeploymentManagerRepository(deleteCtx);
var staleRowVersion = new byte[] { 0x01 };
await repository.DeleteDeploymentRecordAsync(id, staleRowVersion);
await Assert.ThrowsAsync<DbUpdateConcurrencyException>(
() => repository.SaveChangesAsync());
}
private DbContextOptions<ScadaLinkDbContext> BuildOptions()
{
return new DbContextOptionsBuilder<ScadaLinkDbContext>()
.UseSqlite($"DataSource={_dbPath}")
.ConfigureWarnings(w => w.Ignore(RelationalEventId.PendingModelChangesWarning))
.Options;
}
[Fact]
public void DeploymentRecord_HasRowVersionConfigured()
{
@@ -838,9 +838,10 @@ public class DeploymentManagerRepositoryTests : IDisposable
await _repository.AddDeploymentRecordAsync(record);
await _repository.SaveChangesAsync();
var id = record.Id;
+var rowVersion = record.RowVersion ?? Array.Empty<byte>();
_context.ChangeTracker.Clear();
-await _repository.DeleteDeploymentRecordAsync(id);
+await _repository.DeleteDeploymentRecordAsync(id, rowVersion);
await _repository.SaveChangesAsync();
Assert.Null(await _repository.GetDeploymentRecordByIdAsync(id));