Async cancellation hygiene, fire-and-forget observability, retry/shutdown
semantics, and audit-row coverage across 9 modules. Highlights:
Cancellation & lifecycle:
- AuditLog-006: SqliteAuditWriter.Dispose hops to thread pool, escaping the
captured SyncContext that risked sync-over-async deadlock.
- AuditLog-010: SiteAuditTelemetryActor owns a private lifecycle CTS,
threaded through drain paths instead of CancellationToken.None.
- Comm-019: CentralCommunicationActor adds lifecycle CTS for repo calls.
- Host-019: Migration StartupRetry forwards ApplicationStopping so SIGTERM
during the bounded-retry window aborts cleanly.
Cursor / retry / counter correctness:
- AuditLog-004: SiteAuditReconciliationActor's cursor now holds at `since`
when any row's idempotent insert is still being retried (per-EventId
retry counter, MaxPermanentInsertAttempts=5 escape valve with LogCritical
abandon). No more silent abandonment of permanently-failing rows.
- ConfigDB-019: Dropped the catch-and-continue on EnsureLookaheadAsync's
SPLIT loop — by class-doc construction the catch could only mask real
failures and let the next iteration create permanent partition holes.
- HM-017/018: HealthReportSender + CentralHealthReportLoop snapshot
per-interval counters before sending, restore via new
ISiteHealthCollector.AddIntervalCounters on transport failure so counts
aren't silently lost.
Fire-and-forget / shutdown waits:
- InboundAPI-018: AuditWriteMiddleware observes faulted audit-write tasks
via OnlyOnFaulted continuation (Warning log; response unchanged).
- SnF-024: StoreAndForwardService.StopAsync awaits in-flight retry sweep
with a bounded SweepShutdownWaitTimeout (10s).
Leak / refactor:
- Comm-021: SiteStreamGrpcServer.SubscribeInstance wraps Subscribe in its
own try/catch so a throw doesn't leak the relay actor or _activeStreams
entry.
- Comm-022: VERIFIED already-closed by Comm-016's dead-code purge.
- CLI-017: BundleCommands' three subcommands delegate to ExecuteCommandAsync
(auth-failure exit-code contract unified).
Defensive / validation:
- CLI-021: CliConfig.Load wraps file-read/JSON parse so malformed config
prints a warning and returns defaults instead of crashing the CLI.
- Host-022: ParseLevel emits stderr one-shot warning for unrecognised
MinimumLevel instead of silently coercing to Information.
- ESG-019: ExternalSystemClient sets HttpClient.Timeout=Infinite so the
per-call CTS is the sole timeout source (was clipped to 100s by .NET).
- Security-020: New SecurityOptionsValidator (IValidateOptions) rejects
empty LdapServer/LdapSearchBase with ValidateOnStart.
- DM-019: Lifecycle command timeouts now emit DisableTimedOut/EnableTimedOut/
DeleteTimedOut audit entries (mirrors DeployFailed pattern).
Plus reconciled stale per-module Open-findings counters that had drifted
from prior sessions.
20+ new regression tests across 11 test projects; build clean; affected
suites all green. README regenerated: 75 open (was 93).
Resolves the auth-theme batch from the 2026-05-28 baseline review (8 findings
across Security/CentralUI/ManagementService/CLI). The most consequential gaps:
NotificationReport + SiteCallsReport now route through SiteScopeService so a
site-scoped Deployment user cannot see or act on other sites' rows (CUI-028);
QueryAuditLogCommand is no longer "any authenticated user" — gated Admin-only
to match /api/audit/query's strictness (MS-018); RoleMapper preserves the
broader grant when a user is in both an unscoped and scoped Deployment LDAP
group, instead of silently narrowing to the scoped set (Sec-016); and the
dead SiteScopeRequirement/Handler are deleted so SiteScopeService is
unambiguously the sole site-scoping mechanism (Sec-017). Pending findings:
172 → 164.
Bundle G (#23 M7-T15): replace the temporary Admin-only gate on the Audit
Log surface with two new permission policies — OperationalAudit (read) and
AuditExport (bulk-export) — so the read path and the forensic-export path
can be delegated independently.
ScadaLink.Security
- AuthorizationPolicies: add OperationalAudit + AuditExport policy
constants; register them via RequireClaim with an explicit role allow-list
(OperationalAuditRoles, AuditExportRoles) so the role-to-permission
mapping is documented in one place.
- Default mapping: Admin and Audit roles grant both policies; AuditReadOnly
grants OperationalAudit only (read access without bulk export); Design
and Deployment grant neither.
ScadaLink.CentralUI
- AuditLogPage: switch the page-level [Authorize] to the OperationalAudit
policy and wrap the Export-CSV button in an AuthorizeView gated on
AuditExport so an OperationalAudit-only operator still sees the page +
filters but cannot trigger the CSV pull.
- ConfigurationAuditLog: switch from RequireAdmin to OperationalAudit so
both pages under the Audit nav group share the same gate.
- NavMenu: the Audit nav group now gates on OperationalAudit so the
section header + both child links match the per-page policies.
- AuditExportEndpoints: switch RequireAuthorization from RequireAdmin to
AuditExport — this is the authoritative gate; the AuthorizeView on the
button is just a UX affordance.
Tests
- New AuditLogPagePermissionTests covers the 5 brief-mandated cases plus
defence-in-depth for Admin-alone and AuditReadOnly users on the endpoint.
- SecurityTests: add policy-level coverage for the new role→permission
matrix (Theory rows pin every role/policy combination).
- AuditExportEndpointsTests: switch to AddScadaLinkAuthorization() so the
test host exercises the real production wiring under the new gate.
- AuditLogPageScaffoldTests: wrap the page render in a
CascadingAuthenticationState so the new in-page AuthorizeView resolves
the principal.
Move all package versions into Directory.Packages.props so every project
resolves a single consistent version. Consolidates the Roslyn packages
(Microsoft.CodeAnalysis.CSharp.Scripting/Workspaces) onto 5.0.0, which
resolves the pre-existing NU1608 version-skew error in the test projects.
Restore inside the docker build was failing because TreatWarningsAsErrors
promotes NU1902/NU1903/NU1904 (vulnerable package warnings) to errors.
Bump the flagged packages to advisory-free versions:
- MailKit 4.15.1 -> 4.16.0 (GHSA-9j88-vvj5-vhgr)
- Microsoft.AspNetCore.DataProtection.EFCore 10.0.5 -> 10.0.7 (GHSA-9mv3-2cwr-p262, transitively pulls fixed System.Security.Cryptography.Xml — GHSA-37gx-xxp4-5rgx, GHSA-w3x6-4m5h-cxqf)
- OpenTelemetry.Api (transitive via Akka.Hosting) 1.9.0 -> 1.15.3 (GHSA-g94r-2vxg-569j, GHSA-8785-wc3w-h8q6) — added as a direct PackageReference in ScadaLink.Host to override the Akka.Hosting pin
To resolve the NU1605 downgrade chain triggered by DataProtection.EFCore
10.0.7 (which transitively requires Microsoft.EntityFrameworkCore >= 10.0.7
and friends), bump every Microsoft.* 10.0.5 reference across src/ and
tests/ to 10.0.7 in lockstep.
17 source projects (Commons + Host + 15 components) and 17 xUnit test projects.
SLNX format, net10.0, nullable enabled, warnings as errors. All components
reference Commons; Host references all components. Builds and tests clean.