The largest themed batch — small mechanical fixes across 11 modules.
API / message hygiene:
- Comm-020: SiteAddressCacheLoaded now carries IReadOnlyDictionary /
IReadOnlyList — Akka messages must be immutable.
- Commons-016: BundleSession.MaxUnlockAttempts named constant replaces
magic 3.
- Commons-018: IOperationTrackingStore + IPartitionMaintenance moved from
Interfaces/ root to Interfaces/Services/ (namespace preserved — 9
consumers exceeded the in-prompt move threshold).
- Commons-023: TrackingStatusSnapshot.SourceNode now consistent with the
trailing-optional-with-default pattern used elsewhere.
- SR-022: AuditingDbCommand.DbConnection.set no longer uses reflection —
exposes AuditingDbConnection.Inner via internal API surface.
Dead code / config cleanup:
- ClusterInfra-011: decorative SectionName constant deleted.
- ClusterInfra-014: dead AddClusterInfrastructureActors method + its
"throws-when-called" test deleted.
- Host-021: Microsoft Logging:LogLevel block deleted from appsettings.json
(dead under Serilog).
Fail-loud over fail-silent:
- DM-021: ResolveSiteIdentifierAsync throws on missing site (was silently
substituting a DB id).
- DM-022: dropped transient Pending write — record now lands directly in
InProgress (no UI flicker, one fewer DB write).
- Host-020: LoggerConfigurationFactory emits a Console.Error warning when
both Serilog:MinimumLevel and ScadaLink:Logging:MinimumLevel are set
(ScadaLink remains truth per Host-011).
- SnF-022: NotifyCachedCallObserverAsync logs Warning on unparseable
TrackedOperationId (was silently dropping).
- SnF-023: empty siteId default replaced with $unknown-site sentinel
+ constructor normalisation.
Correctness:
- SCA-001: SupervisorStrategy XML rewritten to match actual
DefaultDecider/Restart semantics (was claiming Resume).
- SCA-003: OnUpsertAsync now restamps IngestedAtUtc on every upsert.
- SR-021: HandleDeployArtifacts now dispatches an internal
ApplyArtifactDataConnectionsToDcl message after the SQLite write so
system-wide artifact-deploy data-connection changes go live
immediately (was requiring a site restart).
- SnF-020: RetryParkedMessageAsync captures the parked row BEFORE the
local write so a concurrent delete can't skip standby replication.
Sentinels / naming collisions:
- HM-021: CentralSiteId changed from "central" to "$central"
(cannot collide: a leading $ is forbidden in real SiteIdentifiers).
Doc / surface cleanups:
- SEL-018: FailedWriteCount promoted to ISiteEventLogger; XML softened
to "Available for future Health Monitoring integration".
- SnF-019: VERIFY outcome — documented parking-after-DefaultMaxRetries
in Component-StoreAndForward.md + DefaultMaxRetries XML (uniform
cap; maxRetries:0 is the unbounded escape hatch).
- SnF-021: Component-StoreAndForward.md no longer claims the tracking
table lives in SnF — it's in SiteRuntime, the interface is in Commons.
- CLI-020: bundle export response parse guarded with try/catch on
JsonException / KeyNotFoundException / FormatException — emits a
clean INVALID_RESPONSE exit instead of a stack trace.
Config:
- ClusterInfra-013: intent comment added to "catastrophic config" test.
- Host-016: appsettings.Site.json second CentralContactPoints entry
removed (was pointing at the SITE's own port); doc-key explains how
to extend.
- Host-018: NodeName added to both shipped per-role configs (was
causing SourceNode to be null on audit rows).
UI:
- CentralUI-029: replaced JS.InvokeAsync<int>("eval", …) with an ES
module import (new wwwroot/js/browser-time.js).
- CentralUI-032: AuditResultsGrid gains a Previous button backed by a
cursor stack.
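One way to picture the cursor stack behind CentralUI-032's Previous button
is the sketch below. It is Python rather than the project's C#, and
CursorPager, FetchPage and make_fetcher are hypothetical names invented for
illustration, not the AuditResultsGrid API.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None = first page) and returns
# (rows, next_cursor). next_cursor is None when there are no more pages.
FetchPage = Callable[[Optional[int]], Tuple[List[int], Optional[int]]]


class CursorPager:
    """Keyset pager that remembers the cursor used to open each earlier page."""

    def __init__(self, fetch_page: FetchPage) -> None:
        self._fetch = fetch_page
        self._history: List[Optional[int]] = []  # cursors of pages behind us
        self._current: Optional[int] = None      # cursor that opened this page
        self.rows, self._next = self._fetch(None)

    @property
    def has_previous(self) -> bool:
        return bool(self._history)

    def next(self) -> None:
        if self._next is None:
            return  # already on the last page
        self._history.append(self._current)
        self._current = self._next
        self.rows, self._next = self._fetch(self._current)

    def previous(self) -> None:
        if not self._history:
            return  # already on the first page
        self._current = self._history.pop()
        self.rows, self._next = self._fetch(self._current)


def make_fetcher(data: List[int], page_size: int) -> FetchPage:
    """In-memory stand-in for a keyset query: rows with value > cursor."""
    def fetch(cursor: Optional[int]) -> Tuple[List[int], Optional[int]]:
        remaining = [v for v in data if cursor is None or v > cursor]
        page = remaining[:page_size]
        more = len(remaining) > page_size
        return page, (page[-1] if more else None)
    return fetch
```

Pushing the opening cursor on every forward step means Previous never has
to run a reversed query; it simply replays the cursor that produced the
earlier page.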
10+ new regression tests across the affected projects. Build clean;
all suites green. README regenerated: 6 open (was 33).
Session-to-date: 130 of 136 originally-open Theme findings closed.
13 well-bounded test-coverage gaps closed across 11 test projects.
Net +35 regression tests; no production code changes, and the
SiteEventLogger src reference is unchanged (W3 edited only test code).
Test additions:
- CLI-022: CommandTreeTests pinned-count assertion bumped 14→16 and
3 InlineData rows added for the audit + bundle command groups.
- Commons-020: new TransportRecordsTests covers BundleManifest /
ExportSelection / ImportPreview / ImportResolution / ImportResult —
ctor + System.Text.Json round-trip + record-equality (14 tests).
- CD-024: SPLIT-RANGE failure-continuation now under
EnsureLookahead_SecondSplitThrows_LoopAborts_FirstBoundaryStillCommitted
(Skippable MS-SQL fixture); production-shape rowversion delete
asserted by DeleteDeploymentRecord_CurrentRowVersion_StubAttachPath_DeleteSucceeds.
- CentralUI-033: new QueryStringDrillInTests with 4 bUnit cases for
Transport + SiteCalls drill-in / query-string handling.
- DM-024: probe actors (ReconcileProbeActor, SerializationProbeActor,
ArtifactProbeActor) refactored from static fields to per-test instances
(Interlocked on counter) — all 31 callers updated; no production
changes required.
- HM-022: real-time PeriodicTimer test flake fixed by replacing
fixed-budget Task.Delay with a RunLoopUntil poll-until-condition
helper (5s/25ms). Production loop untouched.
- InboundAPI-023: new EndpointExtensionsTests covers the
POST /api/{methodName} composition wiring via TestServer (7 cases:
happy path, missing key 401, unknown method 403, invalid JSON 400,
missing param 400, script-throws 500 sanitised, AuditActorItemKey
stash invariant).
- MgmtSvc-021: 6 new ManagementActorTests cover the Transport bundle
handlers (role gate for Export/Preview/Import, unknown-name
ManagementCommandException, blocker-rejection, dedupe last-write-wins).
- SCA-006: SiteCallQueryRequest_StuckOnly_CursorAtNonStuckBoundary_SkipsToNextStuckRow
pins the missing boundary case.
- SEL-023: stress-test `bool stop` promoted to `volatile bool` for
cross-thread visibility under release/JIT.
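The poll-until-condition idea behind HM-022 can be sketched as follows.
This is Python, not the project's C#; run_until is a hypothetical stand-in
for the RunLoopUntil helper, and only the 5s budget and 25ms interval are
taken from the entry above.

```python
import time
from typing import Callable


def run_until(condition: Callable[[], bool],
              timeout_s: float = 5.0,
              interval_s: float = 0.025) -> bool:
    """Poll `condition` until it holds or the timeout budget is spent.

    Returns True as soon as the condition is observed, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Compared with a fixed delay, the test finishes as soon as the condition is
met and only pays the full budget when something is actually wrong.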
Verify-only resolutions:
- NS-024: closed by NS-019 (commit ac96b83 deletion of
NotificationDeliveryService + its test file). No edits needed.
- NotifOutbox-008: FallbackMaxRetries/FallbackRetryDelay are private
forward-compat constants returned only when no SMTP-config row exists
(in which case EmailNotificationDeliveryAdapter returns Permanent,
bypassing the values entirely). Marked Resolved with note.
- Transport-010: Overwrite child-collection sync covered by the T-001/
T-002 tests added in commit e3ca9af; per-IP throttle by
BundleUnlockRateLimiterTests; failed-session retention by
BundleSessionStoreTests; T-009 closed structurally via AsyncLocal.
Marked Resolved by reference.
Build clean; all 11 affected test suites green. README regenerated:
33 open (was 46).
Security-sensitive batch, handled on the main thread for careful judgment
on secret-leak and pepper-bypass paths.
Secret leak / pepper bypass:
- CD-016 (pepper bypass): InboundApiRepository's GetApiKeyByValueAsync no
longer hashes the candidate with the unpeppered ApiKeyHasher.Default —
ctor takes a lazy Func<IApiKeyHasher> accessor (lazy so test composition
roots without a pepper still bring up the repository), and the DI
registration wires sp.GetService<IApiKeyHasher>() so the production
peppered hasher matches the stored KeyHash. Regression test asserts
positive (peppered roundtrip) AND negative (Default hasher misses the
same key — proving the lookup uses the injected hasher).
- MgmtSvc-020 (SMTP credential leak): UpdateSmtpConfig/ListSmtpConfigs
now project through SmtpConfigPublicShape so the response payload and
audit-row afterState never carry the Credentials field — only a
HasCredentials bool. The SMTP password / OAuth2 client secret no
longer leaves the Admin-only UpdateSmtpConfig boundary the caller
already supplied it to.
Redaction:
- AuditLog-008 (test-fixture under-redact): new
SafeDefaultAuditPayloadFilter (stateless singleton) does HTTP header
redaction for the always-sensitive defaults (Authorization, X-Api-Key,
Cookie, Set-Cookie). FallbackAuditWriter, CentralAuditWriter, and
AuditLogIngestActor (both ingest paths) default to it instead of null
— composition roots that bypass AddAuditLog can no longer write
unredacted auth headers to the audit store.
- NotifService-025 (over-mask): CredentialRedactor.Scrub now only masks
the last colon-separated component (password / clientSecret) AND only
if it's >= 12 chars (typical password heuristic). Short user names
like "root" no longer become global redaction tokens that eat unrelated
diagnostic text. The full packed string is always masked regardless of
length. 3 new negative tests pin the no-over-mask contract.
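The NotifService-025 masking rule can be illustrated with a small sketch.
It is Python rather than the project's C#; the scrub function, the "***"
mask and the exact packed-string layout are assumptions made for
illustration, and only the last-component rule and the 12-character
threshold come from the entry above.

```python
MASK = "***"            # assumed mask text
MIN_SECRET_LENGTH = 12  # threshold quoted in the entry above


def scrub(text: str, packed_credentials: str) -> str:
    """Mask the packed credential string and, if long enough, its secret part."""
    if not packed_credentials:
        return text
    # The full packed string is always masked, whatever its length.
    scrubbed = text.replace(packed_credentials, MASK)
    # Only the last colon-separated component is treated as the secret,
    # and only when it is long enough to be unlikely in ordinary text.
    secret = packed_credentials.rsplit(":", 1)[-1]
    if len(secret) >= MIN_SECRET_LENGTH:
        scrubbed = scrubbed.replace(secret, MASK)
    return scrubbed
```

With this shape a user name such as "root" is never used as a redaction
token on its own, so unrelated diagnostic text that mentions it survives.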
Audit-row correctness / fail-loud:
- InboundAPI-025: Program.cs UseWhen predicate now excludes /api/audit,
/api/management, /api/centralui, /api/script-analysis AND requires POST
— the AuditWriteMiddleware no longer emits spurious ApiInbound rows
for audit-log query/export endpoints (write-on-read recursion broken).
- ESG-021: ApplyAuth now logs Warning (not silent) on empty
AuthConfiguration for apikey/basic, unknown AuthType, and malformed
Basic config. AuthConfiguration value NEVER logged. AuthType=none
remains silent (documented unauthenticated sentinel).
- Security-021: AddSecurity now logs a startup Warning when
RequireHttpsCookie=false — an HTTP-only deployment that previously
transmitted the cookie-embedded JWT silently in cleartext is now
audible in the log.
Defensive:
- CD-021: SwitchOutPartitionAsync's monthBoundary format string now
yyyy-MM-dd HH:mm:ss.fffffff (datetime2(7) precision) so a future
sub-second / non-midnight boundary doesn't silently round to the
wrong partition.
Plus reconciled stale per-module Open-findings counters that had drifted
from earlier sessions (AuditLog, CD, ESG, IAPI, MgmtSvc, NotifService,
Security).
Build clean; all affected test projects green (Host 208, ConfigDB 242,
ESG 69, IAPI 151, MgmtSvc 100, NotifService 55, Security 85, AuditLog
247/248 — 1 pre-existing date-sensitive integration test flake on
PartitionPurgeTests, unrelated). README regenerated: 46 open (was 54).
Well-localised perf fixes across 8 modules.
Lock decoupling / SQL streaming:
- AuditLog-005: SqliteAuditWriter gains dedicated read-only _readConnection
(+ _readLock) backed by WAL journal mode. GetBacklogStatsAsync,
ReadPendingAsync, ReadPendingSinceAsync, ReadForwardedAsync no longer
contend with the hot-path INSERT lock — backlog probes on a 30s timer
can't stall the writer under multi-hundred-K Pending backlog.
- SEL-022: dropped Cache=Shared from SiteEventLogger's default connection
string (single-connection logger; mode was dormant config).
Memory / streaming:
- CLI-019: bundle export streams base64 in 1 MB-aligned chunks via
Convert.TryFromBase64Chars straight into the FileStream — no more
full-bundle byte[] allocation.
- CentralUI-031: TransportImport now stages the upload to a per-session
temp file under Path.GetTempPath() (replaces in-memory byte[] field);
page implements IDisposable to delete the temp file on reset / new
upload / dispose. Per-circuit working set drops from ~100 MB to ~80 KB.
N+1 hoisting:
- Transport-008: added ITemplateEngineRepository.GetTemplatesWithChildrenAsync
bulk method; BundleImporter.PreviewAsync calls it once instead of per-
template-name. Single query with .Include(...).AsSplitQuery().
- DM-023: BuildDeployArtifactsCommandAsync's per-site loop now references
a pre-fetched GlobalArtifactSnapshot (shared scripts, external systems,
DB connections, notification lists, SMTP) instead of re-querying per site.
- MgmtSvc-023: HandleQueryDeployments unfiltered branch uses one
GetAllInstancesAsync bulk load + Dictionary<int,int?> lookup (was a
GetInstanceByIdAsync per record).
Small allocations / per-tick rebuilds:
- InboundAPI-019: AuditWriteMiddleware gates EnableBuffering() on
RequestHasBody() so GET/HEAD/DELETE/TRACE/OPTIONS and Content-Length:0
requests skip the FileBufferingReadStream allocation.
- NotifOutbox-006: ResolveAdapters dictionary now cached on
_adaptersCache (built lazily on first sweep) + actor-lifetime
_adaptersScope; ResolveAdapters no longer rebuilds per dispatch tick.
Verify-only:
- Comm-017: Confirmed _inProgressDeployments was deleted by Comm-016 in
commit ac96b83 — marked Resolved with that attribution. No code change.
Doc-correction:
- NS-022: Updated MailKitSmtpClientWrapper XML doc to spell out single-
connection / per-delivery-factory contract (option (b) — transient
client per Send — rejected because it re-handshakes TLS per email).
10+ new regression tests across 8 test projects. Build clean; affected
suites all green. README regenerated: 54 open (was 65).
Concurrency hazards, DI lifetime hygiene, and one verify-only confirmation
across 8 modules. Highlights:
Concurrency:
- CentralUI-030: SandboxConsoleCapture writes routed through WriteSynchronized
locking on the captured StringWriter — intra-script Task fan-out can no
longer corrupt the per-call buffer.
- Commons-021: ExternalCallResult.Response now backed by Lazy<dynamic?>
(ExecutionAndPublication) — no more benign double-parse race.
- CD-017: DeploymentManagerRepository.DeleteDeploymentRecordAsync now takes
an expected RowVersion and seeds entry.OriginalValues so EF emits
DELETE ... WHERE Id=@id AND RowVersion=@prior; a stale RowVersion now
throws DbUpdateConcurrencyException instead of silently deleting the row.
- Transport-009: AuditCorrelationContext.BundleImportId backed by
AsyncLocal<Guid?> so concurrent imports get per-logical-call isolation
(was a scoped instance shared via AuditService across runs).
DI / lifetime:
- AuditLog-003: All 3 AuditLog actor handlers switched to CreateAsyncScope
+ await using — async EF disposal no longer swallowed.
- AuditLog-007: INodeIdentityProvider resolution standardised on
GetRequiredService<>() (was mixed with GetService<>()).
- AuditLog-011: AddAuditLogHealthMetricsBridge guarded by sentinel
descriptor check — calling twice no longer double-registers the hosted
service.
Shutdown / supervision:
- SiteCallAudit-002: AkkaHostedService adds a CoordinatedShutdown
cluster-leave task (drain-site-call-audit-singleton) that issues a
bounded GracefulStop(10s) so failover waits for in-flight upserts.
Registration safety:
- NS-020: AkkaHostedService now guards NotificationForwarder S&F
registration with _notificationDeliveryHandlerRegistered + throws
InvalidOperationException on double-register to make the regression loud.
VERIFY-only closures:
- NotifOutbox-005: Confirmed already closed by CD-015 fix (ac96b83) —
NotificationOutboxRepository.InsertIfNotExistsAsync uses the same
raw-SQL IF NOT EXISTS + 2601/2627 swallow pattern; race eliminated.
5+ new regression tests (CentralUI sandbox WhenAll, ExternalCallResult
64-reader Barrier, AuditLog DI idempotency, RowVersion stale-throw,
SiteCallAudit-002 shutdown drain). Build clean; affected suites all green.
README regenerated: 65 open (was 75).
Async cancellation hygiene, fire-and-forget observability, retry/shutdown
semantics, and audit-row coverage across 9 modules. Highlights:
Cancellation & lifecycle:
- AuditLog-006: SqliteAuditWriter.Dispose hops to thread pool, escaping the
captured SyncContext that risked sync-over-async deadlock.
- AuditLog-010: SiteAuditTelemetryActor owns a private lifecycle CTS,
threaded through drain paths instead of CancellationToken.None.
- Comm-019: CentralCommunicationActor adds lifecycle CTS for repo calls.
- Host-019: Migration StartupRetry forwards ApplicationStopping so SIGTERM
during the bounded-retry window aborts cleanly.
Cursor / retry / counter correctness:
- AuditLog-004: SiteAuditReconciliationActor's cursor now holds at `since`
when any row's idempotent insert is still being retried (per-EventId
retry counter, MaxPermanentInsertAttempts=5 escape valve with LogCritical
abandon). No more silent abandonment of permanently-failing rows.
- ConfigDB-019: Dropped the catch-and-continue on EnsureLookaheadAsync's
SPLIT loop — by class-doc construction the catch could only mask real
failures and let the next iteration create permanent partition holes.
- HM-017/018: HealthReportSender + CentralHealthReportLoop snapshot
per-interval counters before sending, restore via new
ISiteHealthCollector.AddIntervalCounters on transport failure so counts
aren't silently lost.
Fire-and-forget / shutdown waits:
- InboundAPI-018: AuditWriteMiddleware observes faulted audit-write tasks
via OnlyOnFaulted continuation (Warning log; response unchanged).
- SnF-024: StoreAndForwardService.StopAsync awaits in-flight retry sweep
with a bounded SweepShutdownWaitTimeout (10s).
Leak / refactor:
- Comm-021: SiteStreamGrpcServer.SubscribeInstance wraps Subscribe in its
own try/catch so a throw doesn't leak the relay actor or _activeStreams
entry.
- Comm-022: VERIFIED already-closed by Comm-016's dead-code purge.
- CLI-017: BundleCommands' three subcommands delegate to ExecuteCommandAsync
(auth-failure exit-code contract unified).
Defensive / validation:
- CLI-021: CliConfig.Load wraps file-read/JSON parse so malformed config
prints a warning and returns defaults instead of crashing the CLI.
- Host-022: ParseLevel emits stderr one-shot warning for unrecognised
MinimumLevel instead of silently coercing to Information.
- ESG-019: ExternalSystemClient sets HttpClient.Timeout=Infinite so the
per-call CTS is the sole timeout source (was clipped to 100s by .NET).
- Security-020: New SecurityOptionsValidator (IValidateOptions) rejects
empty LdapServer/LdapSearchBase with ValidateOnStart.
- DM-019: Lifecycle command timeouts now emit DisableTimedOut/EnableTimedOut/
DeleteTimedOut audit entries (mirrors DeployFailed pattern).
Plus reconciled stale per-module Open-findings counters that had drifted
from prior sessions.
20+ new regression tests across 11 test projects; build clean; affected
suites all green. README regenerated: 75 open (was 93).
Each finding is a focused validation guard or upper bound at a trust boundary.
Highlights:
- Commons-015: EncryptionMetadata ctor now validates Algorithm (AES-256-GCM
only), Kdf (PBKDF2-SHA256 only), Iterations ([100k, 10M]), non-null Salt/IV.
- Transport-004: new BundleUnlockRateLimiter (sliding-window, per-key,
singleton) wired into BundleImporter.LoadAsync; over-budget callers see
BundleUnlockRateLimitedException. Per-bundle 3-strike + per-window cap.
- ESG-022: ExternalSystemClient.InvokeHttpAsync allow-lists the documented
GET/POST/PUT/PATCH/DELETE set (case-insensitive); unknown verbs throw.
- SEL-015: SiteEventLogger queue now bounded (10k cap, DropOldest); dropped
events fault their Task and increment FailedWriteCount so a drop is
observable and memory growth stays bounded.
- SEL-017: EventLogQueryService clamps caller-supplied PageSize to a new
MaxQueryPageSize cap (default 500) so int.MaxValue can't OOM the host.
- SEL-020: LogEventAsync rejects severities outside {Info, Warning, Error}
(matches SQLite BINARY-collation query filter).
- InboundAPI-020: ContentType "json" check now case-insensitive
(application/JSON no longer slips through as not-json).
- InboundAPI-024: _knownBadMethods capped at 1000 entries (drops new entries
once full); per-request DB lookup remains the correctness path.
- SR-025: HandleSetStaticAttribute validates the attribute name against the
deployed config; unknown names now return Success=false instead of
leaking orphan override rows into the SQLite store.
- TE-021: MoveTemplateAsync runs the sibling-name-collision check at the
destination, mirroring TemplateFolderService.MoveFolderAsync.
- TE-022: LockEnforcer's once-locked-stays-locked rule now also covers
LockedInDerived (was previously only IsLocked).
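As an illustration of the sliding-window, per-key limiter named in
Transport-004, here is a minimal sketch. It is Python rather than the
project's C#; SlidingWindowLimiter and try_acquire are hypothetical names,
the clock is passed in explicitly to keep the example deterministic, and
the locking a shared singleton would need is left out.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` attempts per key within any `window_s` seconds."""

    def __init__(self, limit: int, window_s: float) -> None:
        self._limit = limit
        self._window = window_s
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def try_acquire(self, key: str, now: float) -> bool:
        attempts = self._attempts[key]
        # Drop timestamps that have slid out of the window.
        while attempts and now - attempts[0] >= self._window:
            attempts.popleft()
        if len(attempts) >= self._limit:
            return False  # over budget; a caller would raise here
        attempts.append(now)
        return True
```

Rejected attempts are not recorded in this sketch, so a caller that keeps
hammering the limiter does not extend its own lockout; whether the real
limiter behaves that way is not stated in the entry.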
New regression tests across 8 test projects (EncryptionMetadata, rate
limiter, ESG client allow-list, SEL bounded channel / PageSize clamp /
severity validation, InboundAPI ContentType + bad-methods cap, SiteRT
unknown-attribute, TemplateEngine MoveTemplate + LockedInDerived).
Build clean; affected suites all green. README regenerated: 93 open (was 104).
Note: a separate manual re-run was needed for the SiteEventLogging hunk
because its initial subagent's source edits never landed on disk despite
reporting success (file-collision-style failure mode).
UTC invariant + culture-safety fixes across UI form binding, audit entity
hydrate, and locale-dependent parses. Highlights:
- CentralUI-026/027: AuditFilterBar / SiteCallsReport / NotificationReport /
EventLogs now apply SpecifyKind(Local) + ToUniversalTime() at form submit
so browser-local datetime-local inputs aren't silently treated as UTC.
- Commons-019: AuditEvent.OccurredAtUtc / IngestedAtUtc init-setters
re-tag any incoming DateTime as Kind=Utc, documenting the invariant.
- CD-018: AuditLogEntityTypeConfiguration adds UTC ValueConverters on the
*Utc DateTime columns so EF hydrate yields Kind=Utc (SQL Server's
datetime2 has no Kind metadata, so reads were returning Unspecified).
- CD-020: GetPartitionBoundariesOlderThanAsync now SpecifyKind(Utc) on the
raw-ADO read, matching the existing defence in AuditLogPartitionMaintenance.
- SEL-021: EventLogQueryService.DateTimeOffset.Parse now uses
InvariantCulture + AssumeUniversal | AdjustToUniversal.
- SR-023: Convert.ToDouble in ScriptActor + AlarmActor (4 sites) now
passes InvariantCulture so non-US locales don't mis-parse string values.
- HM-020: CentralHealthAggregator.MarkHeartbeat anchors LastHeartbeatAt to
max(receivedAt, now) on offline→online so a stale receivedAt can't
leave a recovered site one tick away from going offline again.
3 new tests added (AuditLog UTC converter, AuditFilterBar/EventLogs/
NotificationReport-touching CentralUI tests already cover Apply paths,
heartbeat offline→online). Build clean; ConfigurationDatabase 236,
Commons 330, HealthMonitoring 71, SiteRuntime 301, SiteEventLogging 50,
CentralUI 50 — all green. README regenerated: 104 open (was 112).
Transport-001: template Overwrite now diff-and-merges the bundle's
Attributes / Alarms / Scripts onto the target template via three private
helpers (SyncTemplateAttributesAsync / SyncTemplateAlarmsAsync /
SyncTemplateScriptsAsync). Each helper emits one audit row per detected
add / update / delete and feeds the post-merge state into the existing
ResolveAlarmScriptLinks and ResolveCompositionEdges passes.
Transport-002: external-system Overwrite now syncs the Methods collection
via a parallel SyncExternalSystemMethodsAsync helper mirroring the T-001
shape, with ExternalSystemMethodAdded / Updated / Deleted audit rows.
Both fixes are covered by new integration tests in BundleImporterApplyTests.
README regenerated — open findings dropped from 146 to 136; all 10 open
High findings are now closed (0 Critical, 0 High, 46 Medium, 90 Low
remaining).
Comm-016: delete dead HandleConnectionStateChanged + _debugSubscriptions /
_inProgressDeployments tracking + ConnectionStateChanged message record.
Disconnect detection is owned by the transport layers (gRPC keepalive PING
~25s; Ask-timeout at CommunicationService). Updates the
Component-Communication.md design doc to make that explicit.
SnF-018: NotificationForwarder.DeliverAsync now discards a corrupt buffered
payload (Warning log + return true) instead of returning false and parking
the row — honoring the design's "notifications do not park" invariant.
DM-018: reconciliation no longer force-sets Enabled, preserving an
intentional Disabled state after central failover.
ESG-018: DeliverBufferedAsync (both ExternalSystemClient + DatabaseGateway)
catches JsonException and returns false, turning a corrupt buffered row
into a parked operation instead of a retry-forever poison message.
InboundAPI-022: register ActiveNodeGate as IActiveNodeGate in the Central
DI branch so standby-node gating is actually wired up in production.
NS-019: remove orphaned NotificationDeliveryService /
INotificationDeliveryService / NotificationResult; central notification
delivery now lives entirely in NotificationOutbox.
SEL-016: normalise From/To filters to UTC before ISO-string compare so
non-UTC DateTimeOffset clients no longer get spuriously excluded events.
TE-017: include Description on attributes/alarms and a HashableConnections
projection (protocol, endpoint JSON, failover count) in the revision hash
and DiffService; staleness detection now catches description-only and
connection-endpoint edits.
Transport-001 and Transport-002 (also High) remain Open — they're being
handled in a follow-up batch because both touch BundleImporter.cs and
must serialise.
CD-015: rewrite NotificationOutboxRepository.InsertIfNotExistsAsync as raw-SQL
IF NOT EXISTS … INSERT with SqlException 2601/2627 catch, ending the
at-least-once livelock on the site→central notification handoff.
DCL-018/019/020/021/022: add _subscribesInFlight guard so concurrent
same-tag subscribes don't orphan an adapter handle; delete the latent
dead _subscriptionHandles dictionary; stop double-counting
_totalSubscribed when an unresolved tag is promoted via another instance;
release adapter handles on mid-flight unsubscribe; gate the
tag-resolution retry timer with IsTimerActive so subscribe bursts don't
reset it into starvation.
SR-020: add _terminatingActorsByName shadow so a third deploy arriving
during a pending redeploy doesn't crash on InvalidActorNameException —
displaced senders get a Failed/superseded response and the latest
command wins on Terminated.
SR-024: split OperationTrackingStore reads from writes (fresh
SqliteConnection per GetStatusAsync) so long writes don't block status
queries; rewrite Dispose to drop the sync-over-async bridge that could
deadlock on a non-reentrant SyncContext; Interlocked.Exchange makes the
dispose-once flag race-safe across both paths.
T-003: move the unlock lockout server-side. The 3-strike counter used to
live in the Razor page only — a second tab / CLI caller could re-upload
the same bytes and grind PBKDF2 indefinitely. The counter now lives in
IBundleSessionStore, keyed by ContentHash, so retries against identical
bundle bytes are throttled regardless of client. BundleLockedException
surfaces the new typed error path.
T-005: bind the manifest's non-derivative fields into AES-GCM AAD. A
SHA-256 of the manifest (with ContentHash + Encryption normalised to
sentinels) is now passed to AesGcm.Encrypt / .Decrypt, so a tampered
SourceEnvironment / ExportedBy / CreatedAtUtc on a stolen bundle yields
an authentication-tag mismatch instead of slipping past the Step-4
typo-resistant confirmation gate.
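The digest that T-005 feeds into AES-GCM as associated data can be pictured
like this. The sketch is Python rather than the project's C#; manifest_aad,
the sentinel value and the JSON canonicalisation are assumptions made for
illustration, and the actual encrypt/decrypt call is omitted.

```python
import hashlib
import json
from typing import Any, Dict

SENTINEL = "<sentinel>"  # hypothetical placeholder value


def manifest_aad(manifest: Dict[str, Any]) -> bytes:
    """SHA-256 over the manifest with two fields replaced by sentinels."""
    normalised = dict(manifest)  # shallow copy; the input is not mutated
    # The entry above says ContentHash and Encryption are normalised to
    # sentinels before hashing; every other field is bound as-is.
    normalised["ContentHash"] = SENTINEL
    normalised["Encryption"] = SENTINEL
    canonical = json.dumps(normalised, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()
```

Because the same digest must be supplied at decrypt time, any edit to a
bound field such as ExportedBy changes the associated data and the
authentication tag no longer verifies.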
T-006: cap zip entry count, decompressed length, and compression ratio
in LoadAsync's envelope validator BEFORE any payload is decompressed,
using ZipArchiveEntry.Length / .CompressedLength. New TransportOptions
fields default to 4 entries / 200 MB / 50x ratio.
T-007: clear decrypted plaintext on the ApplyAsync failure path and zero
the buffer on success before removing the session, so a 100 MB
DecryptedContent doesn't sit in memory for the 30-min TTL after a failed
apply. A BundleSessionEvictionService BackgroundService now also drives
EvictExpired periodically so abandoned sessions clear without needing a
fresh Get() call to trigger lazy eviction.
Also resolves NO-010 — the misleading "writer never throws" XML doc was
the same code+comment my prior NO-004 await-the-writer fix already
rewrote.
NS-021/NO-001: thread FromAddress into XOAUTH2 so M365 stops rejecting
sends with 535 5.7.3. Added an additive oauth2UserName parameter on
ISmtpClientWrapper.AuthenticateAsync; both NotificationService and
NotificationOutbox now pass config.FromAddress.
NO-002: clamp non-positive SmtpConfiguration.MaxRetries/RetryDelay to the
10-attempt / 1-min fallback with a Warning so a misconfigured row no
longer parks transient failures on the first attempt or burn-loops.
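The NO-002 clamp amounts to the following, sketched in Python rather than
the project's C#. effective_retry_policy is a hypothetical name, and
clamping the two settings independently is an assumption; the fallback
values are the ones stated above.

```python
import logging
from datetime import timedelta
from typing import Tuple

log = logging.getLogger("notification-outbox")  # hypothetical logger name

FALLBACK_MAX_RETRIES = 10
FALLBACK_RETRY_DELAY = timedelta(minutes=1)


def effective_retry_policy(max_retries: int,
                           retry_delay: timedelta) -> Tuple[int, timedelta]:
    """Replace non-positive settings with the fallbacks, warning on each."""
    if max_retries <= 0:
        # A cap of zero or less would park a transient failure at once.
        log.warning("MaxRetries=%s is not positive; using %s",
                    max_retries, FALLBACK_MAX_RETRIES)
        max_retries = FALLBACK_MAX_RETRIES
    if retry_delay <= timedelta(0):
        # A zero or negative delay would retry in a tight loop.
        log.warning("RetryDelay=%s is not positive; using %s",
                    retry_delay, FALLBACK_RETRY_DELAY)
        retry_delay = FALLBACK_RETRY_DELAY
    return max_retries, retry_delay
```

A 1 ms delay is positive and passes through untouched, while a zero delay
is replaced by the fallback.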
NO-003: route a lifecycle-scoped CancellationToken from the
NotificationOutboxActor through the dispatch sweep into the adapter so
in-flight SMTP sends abort on PostStop instead of blocking
CoordinatedShutdown for the full SMTP timeout per row.
NO-004: await the central audit writer inside the existing try/catch
instead of fire-and-forget so the audit task can't outlive the per-sweep
DI scope and writer faults reach the operator log instead of being
silently dropped.
Two AuditLog integration tests seeded RetryDelay = TimeSpan.Zero to force
immediate re-claim on the second tick; updated them to 1 ms so they keep
the same intent without tripping the NO-002 clamp.
Resolves the auth-theme batch from the 2026-05-28 baseline review (8 findings
across Security/CentralUI/ManagementService/CLI). The most consequential gaps:
NotificationReport + SiteCallsReport now route through SiteScopeService so a
site-scoped Deployment user cannot see or act on other sites' rows (CUI-028);
QueryAuditLogCommand is no longer "any authenticated user" — gated Admin-only
to match /api/audit/query's strictness (MS-018); RoleMapper preserves the
broader grant when a user is in both an unscoped and scoped Deployment LDAP
group, instead of silently narrowing to the scoped set (Sec-016); and the
dead SiteScopeRequirement/Handler are deleted so SiteScopeService is
unambiguously the sole site-scoping mechanism (Sec-017). Pending findings:
172 → 164.
DetectBlockersAsync was feeding TemplateAttribute.DataSourceReference
into the identifier scanner alongside script bodies, but that field is
an OPC UA node-address path (e.g. "ns=3;s=Tank.Level") owned by the
device, not script source. The dot delimiter inside the path tripped
the heuristic into flagging the address segment ("Tank", "Sensor",
"TestChildObject", "DevAppEngine") as a missing SharedScript or
ExternalSystem reference -- a 100% false-positive class on any
template catalog with OPC-UA-mapped attributes.
Drop the DataSourceReference scan entirely. Attribute.Value is still
scanned because it can carry a design-time default expression that
calls into runtime APIs. Add a regression test pinning the new behavior.
The DetectBlockersAsync heuristic was catching every PascalCase
"Identifier(" or "Identifier." token in script bodies and treating it
as a candidate SharedScript or ExternalSystem reference. On a normal
template catalog this surfaced 30+ blocker rows for .NET stdlib
(DateTimeOffset, Convert, ToString, Dispose, UtcNow...), ScadaLink
runtime API roots (Notify, Database, ExternalSystem, Scripts...), and
SQL keywords inside string literals (COUNT), blocking the import.
Two surgical fixes:
1. Skip identifiers preceded by `.` so `obj.Method()` no longer flags
`Method` as a top-level reference.
2. Maintain a `KnownNonReferenceNames` denylist for the small set of
well-known stdlib / runtime / SQL tokens that can never be
user-defined SharedScripts or ExternalSystems.
The documented use case -- a top-level free-standing call to a missing
SharedScript or ExternalSystem (e.g. `MissingHelper()` at the start of
an expression, or `ErpSystem.Call(...)` where ErpSystem is the
external-system identifier) -- still produces a blocker row, pinned
by the existing test plus a new noise-filter regression test.
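The two filters can be sketched with a single regular expression plus a
set lookup. This is Python rather than the project's C#; the pattern,
candidate_references and the handful of denylist entries are assumptions
made for illustration (the real KnownNonReferenceNames contents are not
listed here).

```python
import re
from typing import List

# Hypothetical subset of the denylist, using names quoted above.
KNOWN_NON_REFERENCE_NAMES = {
    "DateTimeOffset", "Convert", "Notify", "Database", "COUNT",
}

# A capitalised identifier followed by "(" or ".", and not preceded by "."
# or an identifier character, so member accesses such as obj.Method()
# never match.
_CANDIDATE = re.compile(r"(?<![.\w])([A-Z][A-Za-z0-9_]*)(?=\s*[.(])")


def candidate_references(script: str) -> List[str]:
    """Return distinct top-level names that might be missing references."""
    found: List[str] = []
    for name in _CANDIDATE.findall(script):
        if name not in KNOWN_NON_REFERENCE_NAMES and name not in found:
            found.append(name)
    return found
```

The lookbehind implements fix 1 and the set lookup implements fix 2. The
sketch does not parse string literals, so SQL keywords inside strings are
handled only by the denylist.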
The asserted 'LDAP credentials' tagline was deliberately removed from
Login.razor in f973f49 but the test was not updated alongside. Drop
the test — it asserts on UI text that no longer exists by design.
Add TimeSpan? MinTimeBetweenRuns to TemplateScriptDto and int MaxRetries /
TimeSpan RetryDelay to ExternalSystemDto; wire both directions in
EntitySerializer. Extends the existing script round-trip assertion and adds
Roundtrip_external_system_preserves_retry_config.
- NavMenu: move Import Bundle out of the nested RequireDesign/RequireAdmin
double-gate into the top-level Admin section so an Admin-only user sees it
without needing the Design role; Export Bundle stays in the Design section.
- TransportImport: inject IAuditService + ScadaLinkDbContext; emit a
BundleImportUnlockFailed audit row (best-effort, swallowed on failure) on
every wrong-passphrase attempt in SubmitPassphraseAsync, with attempt
number and error reason in afterState.
- docker central-node-a/b appsettings: add ScadaLink:Transport section with
SourceEnvironment = "docker-cluster" so the importer picks up a non-null
environment name in the audit trail.
- CentralUI.Tests: register IAuditService mock + SQLite in-memory
ScadaLinkDbContext in TransportImportPageTests to satisfy the two new injects.
Implements Task T21 of the Transport feature. A four-step Blazor wizard
(Select → Review → Encrypt → Download) under /design/transport/export,
gated on AuthorizationPolicies.RequireDesign:
1. Select — TemplateFolderTree (checkbox-mode) plus flat checkbox
lists for shared scripts, external systems, DB connections,
notification lists, SMTP configs, API keys, API methods.
2. Review — runs DependencyResolver, surfaces seed vs auto-included.
"Include all dependencies" toggle re-resolves on flip.
3. Encrypt — passphrase + confirm with strength meter, secret-count
warning over the resolved closure, explicit unencrypted
opt-out path (calls BundleExporter with passphrase=null
so the audit row tags UnencryptedBundleExport).
4. Download — calls IBundleExporter.ExportAsync, streams bytes to the
browser via JS interop (wwwroot/js/transport.js), displays
filename + size + SHA-256 + encryption status.
Source environment is sourced from new TransportOptions.SourceEnvironment
(bound from ScadaLink:Transport:SourceEnvironment, defaults "scadalink"),
filename pattern scadabundle-{env}-{yyyy-MM-dd-HHmmss}.scadabundle.
Tests (bUnit + policy): step 1 group rendering, step 2 dependency
expansion (Pump composes Motor), step 4 full walkthrough verifying
ExportAsync receives the selected ids + authenticated identity, and a
RequireDesign policy-deny test for users without the Design role. Also
unit-pins the filename-sanitisation contract.
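As an illustration of the filename pattern above, a small Python sketch follows. The sanitisation rule (collapse anything outside `[A-Za-z0-9_-]` to `-`, fall back to "scadalink" when nothing survives) is an assumption, as is the helper name `bundle_filename`; the actual contract is whatever the unit test pins.

```python
import re
from datetime import datetime


def bundle_filename(env, now):
    """Build scadabundle-{env}-{yyyy-MM-dd-HHmmss}.scadabundle."""
    # Assumed rule: replace runs of unsafe characters with a single "-".
    safe_env = re.sub(r"[^A-Za-z0-9_-]+", "-", env).strip("-")
    # Assumed fallback: reuse the documented default environment name.
    safe_env = safe_env or "scadalink"
    return f"scadabundle-{safe_env}-{now:%Y-%m-%d-%H%M%S}.scadabundle"
```

For example, `bundle_filename("docker-cluster", datetime(2025, 1, 2, 3, 4, 5))` gives `scadabundle-docker-cluster-2025-01-02-030405.scadabundle`. Whether the timestamp is UTC or local is not stated in this log, so the caller supplies `now`.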
Address one Blocker and three Important findings from code review of
2c34f12 (BundleImporter.ApplyAsync):
- BLOCKER: wrap RollbackAsync in nested try/catch so a rollback fault
does not swallow the BundleImportFailed audit row. Dispose the
failed transaction before the audit-write so the new SaveChangesAsync
uses a fresh implicit transaction instead of enlisting in the broken
one. Surface the rollback exception's message on the failure row
alongside the original cause, and swallow audit-write faults per the
design's best-effort-audit invariant. Add regression integration
test using a SQLite transaction interceptor that throws on rollback.
- Document re-entrancy assumption on IAuditCorrelationContext: scoped
lifetime, single circuit; concurrent imports within a shared scope
must serialize externally.
- Document repository audit responsibility on BundleImporter: repos
are thin EF wrappers; ApplyAsync writes audit rows explicitly. If
repos ever start emitting audit rows, the explicit calls here must
be removed to avoid double-logging.
- Document BundleSessionStore thread-safety: ConcurrentDictionary
primitives are safe under concurrent callers; BundleSession itself
is not thread-safe.
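The control flow behind the BLOCKER fix can be sketched as below. This is a language-neutral illustration in Python, not the ApplyAsync code: `apply_with_audit`, the `tx` object with `commit` / `rollback` / `dispose`, and the `write_audit` callback are all hypothetical, and returning the failure reason (rather than rethrowing) is an assumption.

```python
def apply_with_audit(apply, tx, write_audit):
    """Run apply(tx); on failure, always attempt the failure audit row.

    Returns None on success, or the failure reason string on failure.
    """
    try:
        apply(tx)
        tx.commit()
        return None
    except Exception as cause:
        rollback_error = None
        try:
            tx.rollback()
        except Exception as exc:
            # A rollback fault must not prevent the audit write below.
            rollback_error = exc
        finally:
            # Dispose first so the audit write does not reuse the broken
            # transaction.
            tx.dispose()

        reason = str(cause)
        if rollback_error is not None:
            reason += f" (rollback also failed: {rollback_error})"

        try:
            write_audit("BundleImportFailed", reason)
        except Exception:
            # Best-effort audit: a failed audit write is swallowed.
            pass
        return reason
```

The ordering is the point: rollback, dispose, then audit, with each step isolated so that a fault in one does not skip the next.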
The audit log drilldown drawer (and the execution-tree node-detail modal,
which shares this component) now renders the SourceNode field directly
under SourceSiteId so provenance reads 'site → node → instance → script'
in declared order. Two focused tests pin the field's presence in both
populated and null cases plus the inter-field ordering.
The Site Calls and Notifications detail modals were reading SourceNode from
the summary record (d.SourceNode) while every other field read from the
detail record (det.X). The pattern works today because the modal always
opens via a row click that pre-loads the summary, but a future drill-in
from a deep link or refresh path could leave the summary stale or null and
the field would render blank or wrong.
Add SourceNode to both detail records, project it through the actor's
ToDetail mapping, and switch the razor markup to read det.SourceNode. Now
the modal binds uniformly to the detail record across all fields.
Two follow-ups from the T13/T14 code review:
- M1: Add CachedWrite_StampsSourceNode_OnSubmitTelemetryRow and
CachedWrite_NoSourceNodeWired_LeavesSourceNodeNull to DatabaseCachedWriteEmissionTests,
mirroring the existing ApiOutbound SourceNode tests in
ExternalSystemCachedCallEmissionTests. Site-emitter coverage now symmetric
across both cached-call channels.
- M2: Clarify the GetService(INodeIdentityProvider) DI comments on the
CachedCallTelemetryForwarder and CachedCallLifecycleBridge factories:
the optional lookup exists for test composition roots that may not
register the provider, not for central production. Both site and central
hosts always register it via SiteServiceRegistration.BindSharedOptions.
Site: site emitters of SiteCallOperational (ExternalSystemClient, the script-API
cached call path in ScriptRuntimeContext, CachedCallLifecycleBridge) inject
INodeIdentityProvider and stamp SourceNode = NodeName at construction.
The OperationTrackingStore call site in CachedCallTelemetryForwarder now
stamps SourceNode too.
Central: SiteCallAuditRepository.UpsertAsync INSERT includes SourceNode in the
column list; conditional monotonic UPDATE uses
COALESCE(@SourceNode, SourceNode) so later packets cannot blank a previously-
stamped value. After this commit every SiteCalls row carries node-a/node-b in
SourceNode (subject to monotonic preservation).
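The COALESCE preservation can be demonstrated with a small SQLite sketch in Python. The table layout, the `TrackedOperationId` and `Status` columns, and the update-then-insert ordering are assumptions for illustration; the status-monotonicity condition of the real UPDATE is omitted.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical minimal table; the real SiteCalls schema is larger.
    conn.execute(
        "CREATE TABLE SiteCalls ("
        "TrackedOperationId TEXT PRIMARY KEY, Status TEXT, SourceNode TEXT)"
    )


def upsert_site_call(conn, op_id, status, source_node):
    params = {"id": op_id, "status": status, "node": source_node}
    # COALESCE keeps the stored SourceNode when the packet carries NULL.
    cur = conn.execute(
        "UPDATE SiteCalls SET Status = :status, "
        "SourceNode = COALESCE(:node, SourceNode) "
        "WHERE TrackedOperationId = :id",
        params,
    )
    if cur.rowcount == 0:
        # First packet for this operation: plain INSERT with SourceNode.
        conn.execute(
            "INSERT INTO SiteCalls (TrackedOperationId, Status, SourceNode) "
            "VALUES (:id, :status, :node)",
            params,
        )
```

A first packet stamped `node-a` followed by a later packet with no SourceNode leaves `node-a` in place, which is the "cannot blank a previously-stamped value" behaviour described above.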
Site: inject INodeIdentityProvider where NotificationSubmit is built; stamp
SourceNode = NodeName at construction.
Central: NotificationOutboxActor.HandleSubmit copies submit.SourceNode onto
the Notification row; the repository INSERT persists it (an EF tracked-entity
insert flows it through automatically; a raw-SQL insert needs its column list
extended).
After this commit, every Notifications row carries the originating site
node (node-a/node-b) in SourceNode. Notifications submitted before this
feature keep SourceNode NULL.