542 Commits

Author SHA1 Message Date
Joseph Doherty add7210d9e fix(dcl): route native alarm subscribe/unsubscribe through DataConnectionManagerActor
The NativeAlarmActor sends SubscribeAlarmsRequest to the DCL manager, but the
manager only routed tag/write/browse messages to the per-connection
DataConnectionActor — alarm subscribe/unsubscribe were unhandled and dead-lettered,
so native alarms never subscribed at runtime. Caught by live T28 deployment.
Mirrors the existing HandleRoute forwarding.
2026-05-31 03:25:28 -04:00
Joseph Doherty 27d5701d99 test(dcl): OPC UA A&C live smoke (skippable) + test-infra A&C note 2026-05-31 03:05:44 -04:00
Joseph Doherty 046797e699 feat(ui): instance configure native alarm source override panel 2026-05-31 02:46:54 -04:00
Joseph Doherty 60f8e2c9a7 feat(ui): template editor Native Alarm Sources subsection 2026-05-31 02:40:52 -04:00
Joseph Doherty 1f6c4207df feat(ui): enrich DebugView alarm table with severity + condition state + native metadata 2026-05-31 02:34:12 -04:00
Joseph Doherty a6dcbf62cd feat(cli): native-alarm-source commands (template add/list/remove + instance set/clear) 2026-05-31 02:30:05 -04:00
Joseph Doherty 3bf1d26d79 feat(management): handlers for native alarm source CRUD 2026-05-31 02:23:17 -04:00
Joseph Doherty b1df6d5beb feat(commons): management command contracts for native alarm sources 2026-05-31 02:18:37 -04:00
Joseph Doherty 0c6f9a9cff feat(communication): map enriched alarm fields across gRPC (server + client) 2026-05-31 02:16:43 -04:00
Joseph Doherty bca21ffb95 test(siteruntime): assert computed alarms carry unified condition state 2026-05-31 02:07:54 -04:00
Joseph Doherty 6d318586d1 feat(siteruntime): InstanceActor spawns NativeAlarmActors + enriched alarm snapshot; clear native state on redeploy/undeploy 2026-05-31 02:06:39 -04:00
Joseph Doherty fda7ac9c50 feat(siteruntime): NativeAlarmActor mirrors source alarms (snapshot swap, retention, persistence) 2026-05-31 01:49:28 -04:00
Joseph Doherty 24fd7bee53 feat(siteruntime): site SQLite native_alarm_state store 2026-05-31 01:44:40 -04:00
Joseph Doherty c7411700dc feat(dcl): MxGateway StreamAlarms adapter (snapshot + live transitions, reconnecting)
Adds IAlarmSubscribableConnection to MxGatewayDataConnection (shared session-less
feed, ref-counted), IMxGatewayClient.RunAlarmStreamAsync over the package
StreamAlarmsAsync with internal reconnect, and MxGatewayAlarmMapper
(AlarmFeedMessage/OnAlarmTransitionEvent -> NativeAlarmTransition). Behavior
verified against a live gateway in Task 28; mapper unit-tested.
2026-05-29 16:49:25 -04:00
Joseph Doherty 1fbb814daa feat(dcl): OPC UA A&C field mapper (Task 11 part 1 — pure, unit-tested) 2026-05-29 16:13:02 -04:00
Joseph Doherty d3b3d15018 feat(dcl): DataConnectionActor native alarm subscribe + source-ref routing + unavailable signal 2026-05-29 16:09:31 -04:00
Joseph Doherty ba278736af feat(templateengine): validate native alarm source connection + source reference 2026-05-29 16:04:01 -04:00
Joseph Doherty e5392d2c7b feat(templateengine): flatten native alarm sources (inherit/compose/override) 2026-05-29 16:00:10 -04:00
Joseph Doherty aedd17ca7f feat(configdb): native alarm source repository CRUD + eager-load includes 2026-05-29 15:56:35 -04:00
Joseph Doherty 63f1ec282f feat(configdb): EF mappings + DbSets for native alarm source entities 2026-05-29 15:52:33 -04:00
Joseph Doherty 913441972e feat(commons): native alarm source entities + ResolvedNativeAlarmSource 2026-05-29 15:43:24 -04:00
Joseph Doherty ea14ace150 feat(commons): IAlarmSubscribableConnection seam + DCL native alarm messages 2026-05-29 15:41:10 -04:00
Joseph Doherty edc2dacf6c feat(commons): enrich AlarmStateChanged with unified condition state (additive) 2026-05-29 15:40:20 -04:00
Joseph Doherty 696da92c3a feat(commons): native alarm core types (AlarmConditionState, NativeAlarmTransition, enums) 2026-05-29 15:39:20 -04:00
Joseph Doherty 4881f9c23c fix(centralui): enable Test Bindings for MxGateway connections
The Test Bindings button was disabled (greyed out) for any attribute bound
to a non-OPC-UA connection. BuildTestableRows() filtered to protocol ==
"OpcUa", a stale gate left over from when OPC UA was the only protocol.
ReadTagValuesCommand is protocol-agnostic (routes through
IDataConnection.ReadBatchAsync, which MxGatewayDataConnection implements),
so the filter only blocked the UI — mirroring the already-fixed IsBrowsable.

Remove the OPC-UA-only filter and update the stale comments. Add a bUnit
regression test (theory over MxGateway + OpcUa) asserting the button is
enabled for a readable-protocol binding.

Verified live: dialog opens for an MxGateway binding and returns a
Good-quality read.
2026-05-29 12:26:46 -04:00
Joseph Doherty 4b6ff49822 fix(dcl+centralui): MxGateway tag browse — lazy attributes, frame-size cap, wider scrollable picker
Expanding a Galaxy object in the tag picker hung on "loading…": the browse
reply inlined every child's full attribute set (~152 KB), exceeding Akka's
128 KB remote frame, and remoting silently discarded the oversized reply.

Browse path (DataConnectionLayer):
- RealMxGatewayClient: navigation now uses BrowseChildren(include_attributes=
  false) — child objects only — and an object's own attributes load lazily via
  DiscoverHierarchy(root, max_depth=0) when it's expanded. Payload drops from
  ~152 KB/level to a few KB. Seam contract unchanged.
- DataConnectionActor.CapBrowseChildren: protocol-agnostic byte-budget cap
  (~100 KB) on every BrowseNodeResult before it crosses the site→central
  frame, OR-ing the adapter's own Truncated flag. Byte budget, not a count —
  the only bound that holds regardless of NodeId/attribute-name length.
- RealOpcUaClient: requestedMaxReferencesPerNode 1000 → 500 to narrow the
  window before the byte budget applies.
- Graceful gRPC Unimplemented handling → NotSupportedException →
  BrowseFailureKind.NotBrowsable with an actionable message (older gateway
  builds lacking BrowseChildren).
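
The byte-budget cap described for DataConnectionActor.CapBrowseChildren can be sketched roughly as follows. This is an illustrative Python sketch, not the project's C# code: `BrowseResult`, `cap_browse_children`, the JSON-based size estimate, and the exact 100 KB constant are assumptions made for the example.

```python
import json
from dataclasses import dataclass

BUDGET_BYTES = 100 * 1024  # assumed ~100 KB budget, kept below the 128 KB frame


@dataclass
class BrowseResult:
    children: list   # one dict per child node (hypothetical shape)
    truncated: bool  # the adapter's own truncation flag


def cap_browse_children(result: BrowseResult, budget: int = BUDGET_BYTES) -> BrowseResult:
    """Keep children in order until their encoded size would exceed the budget."""
    kept, used = [], 0
    for child in result.children:
        # Size is measured in encoded bytes, not in number of children.
        size = len(json.dumps(child).encode("utf-8"))
        if used + size > budget:
            # Budget exhausted: drop this child and everything after it.
            return BrowseResult(kept, True)
        kept.append(child)
        used += size
    # Nothing was dropped here, so only the adapter's flag can mark truncation.
    return BrowseResult(kept, result.truncated)
```

Measuring bytes rather than counting children is what keeps the bound valid when NodeIds or attribute names are unusually long.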

Picker UI (CentralUI):
- NodeBrowserDialog: modal-lg → modal-xl; new scoped .razor.css caps the tree
  at 55vh with its own scrollbar so manual entry + Select/Cancel stay visible.
- Protocol-agnostic failure messages (was hardcoded "OPC UA …"); renamed the
  leftover opcua-browser-tree class to node-browser-tree.

Tests: new frame-budget cap test + NotSupported=>NotBrowsable mapping test;
DCL suite 88/88. Doc: Component-DataConnectionLayer.md records the lazy
attribute-light browse and the frame-size guard.
2026-05-29 09:53:19 -04:00
Joseph Doherty be32e4a7ff feat(centralui): protocol selector + MxGateway editor in DataConnectionForm
Adds an OPC UA | MxGateway protocol dropdown (create-time; locked read-only on
edit), branches the primary/backup endpoint editors, serializer, and validator
by protocol, and persists DataConnection.Protocol accordingly. Updates form
tests: protocol dropdown present on create + MxGateway save round-trips typed
JSON with Protocol=MxGateway.
2026-05-29 08:02:44 -04:00
Joseph Doherty cb0d17dabd refactor(browse): rename OPC-UA browse service + dialog to protocol-agnostic
IOpcUaBrowseService/OpcUaBrowseService -> IBrowseService/BrowseService,
OpcUaBrowserDialog -> NodeBrowserDialog, and neutralize 'Browse OPC UA' UI
strings to 'Browse'. Updates DI, InstanceConfigure, TestBindingsDialog, TreeRow,
BindingTester, and tests. 574 CentralUI tests green.
2026-05-29 07:59:56 -04:00
Joseph Doherty 5461e4968e feat(dcl): register MxGateway protocol in factory + config flatten + options
DataConnectionFactory registers 'MxGateway' -> MxGatewayDataConnection over the
real client; AddDataConnectionLayer binds MxGatewayGlobalOptions; DeploymentManager
FlattenConnectionConfig gains an MxGateway arm using the typed serializer. Factory
test confirms Create("MxGateway") returns the adapter.
2026-05-29 07:58:51 -04:00
Joseph Doherty 9b7916bb2e refactor(browse): rename BrowseOpcUaNode* to protocol-agnostic BrowseNode*
Renames BrowseOpcUaNodeCommand/Result -> BrowseNodeCommand/Result and
CommunicationService.BrowseOpcUaNodeAsync -> BrowseNodeAsync across Commons,
Communication, SiteRuntime, DCL actors, and CentralUI. Wire manifest name
follows (BrowseOpcUaNode -> BrowseNode). Browse regression tests green.
2026-05-29 07:57:36 -04:00
Joseph Doherty 0a693e0be9 feat(dcl): MxGatewayDataConnection adapter (connect/subscribe/read/write/wait/browse)
Implements IDataConnection + IBrowsableDataConnection over the IMxGatewayClient
seam: connect/disconnect with once-only Disconnected guard + background event
loop, subscribe/unsubscribe with tag routing, read/write batch with per-tag
error classification, WriteBatchAndWait, and Galaxy browse mapping. Covers plan
Tasks 6-10. Full unit coverage via FakeMxGatewayClient (12 tests).
2026-05-29 07:50:16 -04:00
Joseph Doherty 19223a08cf feat(commons): MxGatewayEndpointConfig validator + tests 2026-05-29 07:46:28 -04:00
Joseph Doherty f0aad74311 feat(commons): MxGatewayEndpointConfig serializer + tests 2026-05-29 07:46:28 -04:00
Joseph Doherty 2a7dee4afa feat(centralui+dcl): Test Bindings popup — one-shot live read of bound tags
Adds a Test Bindings button to the Connection Bindings table on the Configure
Instance page that opens a modal showing the live current value of every bound
attribute. Reuses the routing path that the OPC UA tag browser landed on:

  Central:  TestBindingsDialog → IBindingTester → CommunicationService
            → ReadTagValuesCommand → SiteEnvelope (Ask)
  Site:     SiteCommunicationActor → DeploymentManagerActor singleton
            → DataConnectionManagerActor → child DataConnectionActor
            → _adapter.ReadBatchAsync

Split mirrors the browse handler:
  • Manager owns ConnectionNotFound (only it sees the per-site connection set).
  • Child owns ConnectionNotConnected (pre-call status check, never stash —
    read is interactive design-time), Timeout (OperationCanceledException),
    ServerError (any other exception). Per-tag failures from ReadBatchAsync
    become failure TagReadOutcomes without aborting the batch.
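
The child-side failure split above might look roughly like this. It is a hedged Python sketch rather than the actor's C# code: `handle_read`, the `adapter.read_batch` coroutine, the dict-shaped outcomes, and the use of `asyncio.TimeoutError` in place of OperationCanceledException are all assumptions for illustration.

```python
import asyncio
from enum import Enum


class ReadFailure(Enum):
    CONNECTION_NOT_CONNECTED = "ConnectionNotConnected"
    TIMEOUT = "Timeout"
    SERVER_ERROR = "ServerError"


async def handle_read(adapter, connected, tags, timeout=5.0):
    """Return (failure, outcomes); exactly one of the two is None."""
    if not connected:
        # Pre-call status check: fail fast and never queue the request.
        return ReadFailure.CONNECTION_NOT_CONNECTED, None
    try:
        raw = await asyncio.wait_for(adapter.read_batch(tags), timeout)
    except asyncio.TimeoutError:
        return ReadFailure.TIMEOUT, None
    except Exception:
        # Any other whole-call failure is reported as a server error.
        return ReadFailure.SERVER_ERROR, None

    outcomes = {}
    for tag in tags:
        value = raw.get(tag)
        if isinstance(value, Exception):
            # A per-tag failure becomes a failed outcome; the batch continues.
            outcomes[tag] = {"ok": False, "error": str(value)}
        else:
            outcomes[tag] = {"ok": True, "value": value}
    return None, outcomes
```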

CentralUI:
  • IBindingTester / BindingTester — Design-role guard via HasClaim against
    JwtTokenService.RoleClaimType (not IsInRole — see c1e16cf), typed
    transport-failure translation.
  • TestBindingsDialog — ShowAsync(siteId, rows, instanceLabel) method-arg
    pattern (no Razor parameter race; see 2c138b6), groups rows by connection
    and issues one ReadAsync per connection in parallel, per-row error subline
    + per-connection banner, Refresh button re-issues the reads.
  • InstanceConfigure.razor — Test Bindings button next to Save Bindings,
    disabled when no testable rows. OPC UA only today (other protocols have
    no ReadTagValuesCommand wiring yet).
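
The "group rows by connection, one read per connection in parallel" behaviour of TestBindingsDialog could be sketched as below. This is illustrative Python under stated assumptions, not the Blazor component: `read_all` and the `read_connection(name, tags)` coroutine are hypothetical names.

```python
import asyncio
from collections import defaultdict


async def read_all(rows, read_connection):
    """rows: iterable of (connection_name, tag). Returns {connection_name: result}."""
    by_connection = defaultdict(list)
    for connection, tag in rows:
        by_connection[connection].append(tag)

    names = list(by_connection)
    # One read per connection, all issued concurrently.
    results = await asyncio.gather(
        *(read_connection(name, by_connection[name]) for name in names),
        return_exceptions=True,  # one failing connection must not hide the others
    )
    return dict(zip(names, results))
```

Returning exceptions as values lets the caller render a per-connection banner for the failed connection while still showing values for the rest.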

Tests:
  • Commons: ReadTagValuesCommand discovered by ManagementCommandRegistry.
  • DataConnectionLayer: unknown connection → ConnectionNotFound,
    not-connected adapter → ConnectionNotConnected (ReadBatchAsync NOT called),
    success-path mapping (Good/Bad + per-tag error), cancellation → Timeout.
  • CentralUI: register IBindingTester (and the previously-missing
    IOpcUaBrowseService) on the existing InstanceConfigureAuditDrillinTests
    Bunit container so the page renders cleanly with the new dialog.
2026-05-28 13:25:48 -04:00
Joseph Doherty d285174597 feat(dcl+ui): rename BrowseOpcUaNode -> ConnectionName-keyed; implement site handler + dialog failure mapping
- BrowseOpcUaNodeCommand: int DataConnectionId -> string ConnectionName
  (site DataConnectionManagerActor indexes children by name; CentralUI
  already has the connection name in scope via the dropdown — no extra
  plumbing across the trust boundary).
- IOpcUaBrowseService / OpcUaBrowseService: parameter renamed accordingly.
- OpcUaBrowserDialog: collapse the duplicate ConnectionName parameters
  (display label and routing key are the same string).
- Task 10: DataConnectionManagerActor forwards BrowseOpcUaNodeCommand to
  its child by name (owns ConnectionNotFound); DataConnectionActor adds
  the receive across all three lifecycle states (Connecting / Connected
  / Reconnecting) and maps adapter outcomes to BrowseFailureKind
  (NotBrowsable / ConnectionNotConnected / Timeout / ServerError).
- Task 17: SetFailure in OpcUaBrowserDialog implements the full
  BrowseFailureKind switch with friendly UI messages.
- Tests: DataConnectionManagerBrowseHandlerTests covers ConnectionNotFound,
  NotBrowsable, success, and ConnectionNotConnectedException paths.
2026-05-28 12:09:43 -04:00
Joseph Doherty 6999aedc60 feat(dcl): implement BrowseChildrenAsync on RealOpcUaClient 2026-05-28 11:59:03 -04:00
Joseph Doherty 0b4b4c02f6 feat(dcl): implement IBrowsableDataConnection on OpcUaDataConnection 2026-05-28 11:58:08 -04:00
Joseph Doherty 545a22e014 test(templates): override changes drive revision hash forward 2026-05-28 11:55:57 -04:00
Joseph Doherty aff1323896 feat(commons): carry DataSourceReferenceOverride on ConnectionBinding (additive) 2026-05-28 11:53:24 -04:00
Joseph Doherty 2ff138f1e8 feat(templates): apply InstanceConnectionBinding override during flattening 2026-05-28 11:52:28 -04:00
Joseph Doherty d727a6925b feat(commons): add BrowseOpcUaNodeCommand + result + failure types 2026-05-28 11:49:53 -04:00
Joseph Doherty 7b0b9c7365 refactor: rename ScadaLink → ZB.MOM.WW.ScadaBridge (code + projects + namespaces)
Solution + 23 src projects + 26 test projects renamed; folders, csproj,
namespaces, and ScadaLinkDbContext → ScadaBridgeDbContext class updated.
ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated.
SQL roles/logins, LDAP domains, CLI command name, and CLI config dir
(~/.scadalink → ~/.scadabridge) also renamed.

Build green; 5 Host.Tests fail awaiting SQL login rename in next commit.
Pre-existing StaleTagMonitor timing flakes unchanged.

Rename script committed at tools/rename-to-scadabridge.sh.
2026-05-28 09:37:45 -04:00
Joseph Doherty c1fe1c4f83 feat(audit): close AuditLog-001 — wire combined-telemetry dual-write transport
Closes the last open code-review finding. The unreachable
IngestCachedTelemetryAsync path now carries production cached-call
lifecycle traffic, delivering the design's "AuditLog + SiteCalls in one
MS SQL transaction" guarantee. Before this commit, the SiteCalls
operational half had NO production transport at all — central's
SiteCallAuditActor.OnUpsertAsync had zero producers, so cached-call
operational state never reached the central mirror.

Site-side partition (so neither path double-emits):
- ISiteAuditQueue.ReadPendingCachedTelemetryAsync — new method returning
  rows where Kind ∈ {CachedSubmit, ApiCallCached, DbWriteCached,
  CachedResolve} AND ForwardState = Pending.
- ISiteAuditQueue.ReadPendingAsync — XML doc updated, SQLite impl now
  filters Kind NOT IN the cached set so cached rows no longer ride the
  audit-only drain.
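
The site-side partition can be illustrated with two complementary filters. This is a minimal Python sketch over in-memory dict rows, assuming only the Kind and ForwardState fields named above; the real implementation is a SQLite query in C#.

```python
CACHED_KINDS = frozenset(
    {"CachedSubmit", "ApiCallCached", "DbWriteCached", "CachedResolve"}
)


def read_pending_cached_telemetry(rows):
    """Pending rows that ride the combined-telemetry (cached) drain."""
    return [
        r for r in rows
        if r["ForwardState"] == "Pending" and r["Kind"] in CACHED_KINDS
    ]


def read_pending(rows):
    """Pending rows that ride the audit-only drain: everything NOT in the cached set."""
    return [
        r for r in rows
        if r["ForwardState"] == "Pending" and r["Kind"] not in CACHED_KINDS
    ]
```

Because the two predicates differ only in `in` versus `not in`, every Pending row lands in exactly one drain, which is what prevents double emission.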

New cached-drain in SiteAuditTelemetryActor:
- Optional IOperationTrackingStore? ctor param (null on central
  composition roots — the cached scheduler is never armed there).
- Independent CachedDrain message + scheduler tick parallel to the
  existing Drain — a stall on one path can't block the other; shared
  lifecycle CTS gates both.
- OnCachedDrainAsync: reads cached audit rows, joins each with its
  matching SiteCallOperational snapshot via CorrelationId →
  TrackedOperationId from the tracking store, builds CachedTelemetryBatch,
  pushes via IngestCachedTelemetryAsync, marks ack'd rows Forwarded.
- Orphan rows (no tracking snapshot, thrown tracking-store call,
  missing CorrelationId) logged at Warning + skipped — they stay
  Pending so reconciliation/retry picks them up later. Best-effort
  contract preserved.
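
The join-and-skip logic of the cached drain might be sketched like this. It is an illustrative Python sketch, not OnCachedDrainAsync itself: `build_cached_batch` and the `get_snapshot(correlation_id)` callable standing in for the tracking store are hypothetical.

```python
import logging

log = logging.getLogger("cached_drain")


def build_cached_batch(cached_rows, get_snapshot):
    """Pair each cached audit row with its operational snapshot; skip orphans."""
    batch = []
    for row in cached_rows:
        correlation_id = row.get("CorrelationId")
        if correlation_id is None:
            log.warning("row %s has no CorrelationId; skipped", row.get("EventId"))
            continue
        try:
            snapshot = get_snapshot(correlation_id)
        except Exception:
            log.warning("tracking store failed for row %s; skipped", row.get("EventId"))
            continue
        if snapshot is None:
            log.warning("row %s has no tracking snapshot; skipped", row.get("EventId"))
            continue
        batch.append((row, snapshot))
    # Skipped rows are not in the batch, so they are never marked Forwarded
    # and stay Pending for a later retry.
    return batch
```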

Central side: AuditLogIngestActor.OnCachedTelemetryAsync was already
implemented (M3 Bundle G dead code today, alive after this commit) —
performs InsertIfNotExists for AuditLog + UpsertAsync for SiteCalls
inside a BeginTransactionAsync. The handler is idempotent on EventId,
so any duplicate arrivals from concurrent push + reconciliation are
silent no-ops.

Composition root: AkkaHostedService now resolves IOperationTrackingStore
via GetService<>() (site-only) and threads it through the actor's
Props.Create.

Tests added (+3 in SiteAuditTelemetryActorTests):
- Cached rows route through the new transport, not the audit-only drain.
- Orphan cached row (no tracking match) is logged + skipped, drain
  doesn't crash.
- Ordinary audit rows still flow through the audit-only drain unchanged.
- ParentExecutionIdCorrelationTests now unions both queues to assert
  all expected Kinds remain covered after the partition.

Build clean; AuditLog.Tests 250/251 (the 1 fail is the pre-existing
date-sensitive PartitionPurgeTests integration flake explicitly accepted
across the session); SiteRuntime.Tests 302/302.

README regenerated: 0 pending of 481 total.

Session-final totals: 136 of 136 originally-open Theme findings closed
across 11 commits (10 themed batches + this architectural close).
2026-05-28 09:08:43 -04:00
Joseph Doherty 11950b0a8e fix(correctness): close Theme 10 — 5 data-integrity / serialisation findings
Final themed batch. 5 well-localised correctness fixes.

Serialisation precision:
- ESG-020: DatabaseGateway.JsonElementToParameterValue probes
  TryGetInt64 → TryGetDecimal → GetDouble, so a script's high-precision
  decimal SQL parameter survives the cached-write retry round-trip
  without silent precision loss. 3 new regression tests.
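
The integer, then decimal, then double probe order can be illustrated in Python. This sketch is an analogue of the idea, not the C# JsonElement API: `json_number_to_parameter` is a hypothetical helper, and the range checks only approximate where TryGetInt64 and TryGetDecimal would refuse a value.

```python
from decimal import Decimal, InvalidOperation

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1
DECIMAL_MAX = Decimal("79228162514264337593543950335")  # System.Decimal.MaxValue


def json_number_to_parameter(raw: str):
    """Probe the raw JSON number text: integer first, then decimal, then float."""
    try:
        as_int = int(raw)
        if INT64_MIN <= as_int <= INT64_MAX:
            return as_int
    except ValueError:
        pass  # not plain integer text, e.g. "1.5" or "1e3"
    try:
        as_decimal = Decimal(raw)
        if abs(as_decimal) <= DECIMAL_MAX:
            return as_decimal  # keeps every digit the script supplied
    except InvalidOperation:
        pass
    # Last resort: binary floating point, which may lose precision.
    return float(raw)
```

Trying the decimal representation before falling back to a double is what keeps a value such as `0.1234567890123456789` intact across the retry round-trip.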

Template engine correctness:
- TE-018: DiffService gains ComputeConnectionsDiff over
  FlattenedConfiguration.Connections, mirroring the existing entity-diff
  shape and pairing with the Theme 1 TE-017 hash-coverage fix. A
  ConfigurationDiff record extension in Commons is flagged as a follow-up.
- TE-019: TemplateResolver.BuildInheritanceChain now walks via the
  int? ParentTemplateId directly — only null means "no parent". A real
  Id of 0 (the prior special-cased sentinel) now walks the chain like
  any other node, matching the TemplateEngine-013 CycleDetector fix.
  Regression of TE-013 closed.
- TE-020: All 5 Create* paths in TemplateService + SharedScriptService
  re-ordered to save-first → log-with-real-Id → save-audit (matching
  the InstanceService pattern). Create* audit rows no longer carry a
  literal "0" EntityId.
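
The TE-019 rule that only null means "no parent" can be sketched as a chain walk. This is a minimal Python illustration, assuming a simple `{template_id: parent_id_or_None}` map; the cycle guard is an addition for safety in the example and is not claimed to be part of TemplateResolver.

```python
def build_inheritance_chain(parents, template_id):
    """parents: {id: parent_id or None}. Returns ids from the template up to the root."""
    chain, seen = [], set()
    current = template_id
    while current is not None:  # only None ends the walk; an id of 0 is ordinary
        if current in seen:
            raise ValueError(f"cycle detected at template {current}")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    return chain
```

With `parents = {5: 0, 0: None}`, the call `build_inheritance_chain(parents, 5)` walks through the template whose Id is 0 instead of stopping in front of it.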

Doc deferral:
- Transport-012: Component-Transport.md §Audit Trail now spells out that
  the BundleImportId repository filter IS wired (in CentralUiRepository),
  but the Audit-Log-Viewer UI dropdown + summary-row hyperlink are a
  deferred CentralUI follow-up. CLI workaround documented
  (audit query --bundle-import-id).

11+ new regression tests (3 ESG, 4 DiffService, 3 TemplateResolver, 4
TemplateService, 1 SharedScriptService). Build clean; ESG 72/72,
TemplateEngine 324/324. README regenerated: 1 pending of 481 total.

Session-to-date: 135 of 136 originally-open Theme findings closed
across 10 themes in 10 commits.
2026-05-28 08:48:44 -04:00
Joseph Doherty 77cb0ad0e2 fix(api-surface): close Theme 9 — 27 naming / dead-code / config / hygiene findings
The largest themed batch — small mechanical fixes across 11 modules.

API / message hygiene:
- Comm-020: SiteAddressCacheLoaded now carries IReadOnlyDictionary /
  IReadOnlyList — Akka messages must be immutable.
- Commons-016: BundleSession.MaxUnlockAttempts named constant replaces
  magic 3.
- Commons-018: IOperationTrackingStore + IPartitionMaintenance moved from
  Interfaces/ root to Interfaces/Services/ (namespace preserved — 9
  consumers exceeded the in-prompt move threshold).
- Commons-023: TrackingStatusSnapshot.SourceNode now consistent with the
  trailing-optional-with-default pattern used elsewhere.
- SR-022: AuditingDbCommand.DbConnection.set no longer uses reflection —
  exposes AuditingDbConnection.Inner via internal API surface.

Dead code / config cleanup:
- ClusterInfra-011: decorative SectionName constant deleted.
- ClusterInfra-014: dead AddClusterInfrastructureActors method + its
  "throws-when-called" test deleted.
- Host-021: Microsoft Logging:LogLevel block deleted from appsettings.json
  (dead under Serilog).

Fail-loud over fail-silent:
- DM-021: ResolveSiteIdentifierAsync throws on missing site (was silently
  substituting a DB id).
- DM-022: dropped transient Pending write — record now lands directly in
  InProgress (no UI flicker, one fewer DB write).
- Host-020: LoggerConfigurationFactory emits a Console.Error warning when
  both Serilog:MinimumLevel and ScadaLink:Logging:MinimumLevel are set
  (ScadaLink remains truth per Host-011).
- SnF-022: NotifyCachedCallObserverAsync logs Warning on unparseable
  TrackedOperationId (was silently dropping).
- SnF-023: empty siteId default replaced with $unknown-site sentinel
  + constructor normalisation.

Correctness:
- SCA-001: SupervisorStrategy XML rewritten to match actual
  DefaultDecider/Restart semantics (was claiming Resume).
- SCA-003: OnUpsertAsync now restamps IngestedAtUtc on every upsert.
- SR-021: HandleDeployArtifacts now dispatches an internal
  ApplyArtifactDataConnectionsToDcl message after the SQLite write so
  system-wide artifact-deploy data-connection changes go live
  immediately (was requiring a site restart).
- SnF-020: RetryParkedMessageAsync captures the parked row BEFORE the
  local write so a concurrent delete can't skip standby replication.

Sentinels / naming collisions:
- HM-021: CentralSiteId changed from "central" to "$central"
  (uncollideable — leading $ is forbidden in real SiteIdentifiers).

Doc / surface cleanups:
- SEL-018: FailedWriteCount promoted to ISiteEventLogger; XML softened
  to "Available for future Health Monitoring integration".
- SnF-019: VERIFY outcome — documented parking-after-DefaultMaxRetries
  in Component-StoreAndForward.md + DefaultMaxRetries XML (uniform
  cap; maxRetries:0 is the unbounded escape hatch).
- SnF-021: Component-StoreAndForward.md no longer claims the tracking
  table lives in SnF — it's in SiteRuntime, the interface is in Commons.
- CLI-020: bundle export response parse guarded with try/catch on
  JsonException / KeyNotFoundException / FormatException — emits a
  clean INVALID_RESPONSE exit instead of a stack trace.

Config:
- ClusterInfra-013: intent comment added to "catastrophic config" test.
- Host-016: appsettings.Site.json second CentralContactPoints entry
  removed (was pointing at the SITE's own port); doc-key explains how
  to extend.
- Host-018: NodeName added to both shipped per-role configs (was
  causing SourceNode to be null on audit rows).

UI:
- CentralUI-029: replaced JS.InvokeAsync<int>("eval", …) with an ES
  module import (new wwwroot/js/browser-time.js).
- CentralUI-032: AuditResultsGrid gains a Previous button backed by a
  cursor stack.
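
A Previous button backed by a cursor stack, as in CentralUI-032, could work roughly as follows. This is a hedged Python sketch, not the AuditResultsGrid component: `CursorPager` and the `fetch_page(cursor)` callable returning `(rows, next_cursor)` are assumptions for illustration.

```python
class CursorPager:
    """Forward-only cursor paging plus a stack of earlier cursors for stepping back."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # cursor -> (rows, next_cursor or None)
        self._history = []             # cursors of the pages before the current one
        self._current = None           # cursor that produced the current page
        self._next = None
        self.rows = []
        self._load(None)               # None requests the first page

    def _load(self, cursor):
        self._current = cursor
        self.rows, self._next = self._fetch_page(cursor)

    @property
    def has_previous(self):
        return bool(self._history)

    @property
    def has_next(self):
        return self._next is not None

    def next_page(self):
        if self._next is None:
            return
        self._history.append(self._current)  # remember how to get back here
        self._load(self._next)

    def previous_page(self):
        if not self._history:
            return
        self._load(self._history.pop())
```

The stack is needed because a keyset cursor only knows how to move forward; stepping back means replaying the cursor that produced the earlier page.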

10+ new regression tests across the affected projects. Build clean;
all suites green. README regenerated: 6 open (was 33).

Session-to-date: 130 of 136 originally-open Theme findings closed.
2026-05-28 08:39:01 -04:00
Joseph Doherty d190345ef0 test(coverage): close Theme 8 — 13 test-coverage findings, +35 tests
13 well-bounded test-coverage gaps closed across 11 test projects.
Net +35 regression tests; no production code changes (the
SiteEventLogger src reference is unchanged; W3 touched only test code).

Test additions:
- CLI-022: CommandTreeTests pinned-count assertion bumped 14→16 and
  3 InlineData rows added for the audit + bundle command groups.
- Commons-020: new TransportRecordsTests covers BundleManifest /
  ExportSelection / ImportPreview / ImportResolution / ImportResult —
  ctor + System.Text.Json round-trip + record-equality (14 tests).
- CD-024: SPLIT-RANGE failure-continuation now under
  EnsureLookahead_SecondSplitThrows_LoopAborts_FirstBoundaryStillCommitted
  (Skippable MS-SQL fixture); production-shape rowversion delete
  asserted by DeleteDeploymentRecord_CurrentRowVersion_StubAttachPath_DeleteSucceeds.
- CentralUI-033: new QueryStringDrillInTests with 4 bUnit cases for
  Transport + SiteCalls drill-in / query-string handling.
- DM-024: probe actors (ReconcileProbeActor, SerializationProbeActor,
  ArtifactProbeActor) refactored from static fields to per-test instances
  (Interlocked on counter) — all 31 callers updated; no production
  changes required.
- HM-022: real-time PeriodicTimer test flake fixed by replacing
  fixed-budget Task.Delay with a RunLoopUntil poll-until-condition
  helper (5s/25ms). Production loop untouched.
- InboundAPI-023: new EndpointExtensionsTests covers the
  POST /api/{methodName} composition wiring via TestServer (7 cases:
  happy path, missing key 401, unknown method 403, invalid JSON 400,
  missing param 400, script-throws 500 sanitised, AuditActorItemKey
  stash invariant).
- MgmtSvc-021: 6 new ManagementActorTests cover the Transport bundle
  handlers (role gate for Export/Preview/Import, unknown-name
  ManagementCommandException, blocker-rejection, dedupe last-write-wins).
- SCA-006: SiteCallQueryRequest_StuckOnly_CursorAtNonStuckBoundary_SkipsToNextStuckRow
  pins the missing boundary case.
- SEL-023: stress-test `bool stop` promoted to `volatile bool` for
  cross-thread visibility under release/JIT.
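
The poll-until-condition helper mentioned under HM-022 can be sketched as below. This is an illustrative Python version using the 5 s / 25 ms figures from the entry; the name `run_loop_until` mirrors the commit text, but the signature is an assumption.

```python
import time


def run_loop_until(condition, timeout=5.0, interval=0.025):
    """Poll `condition` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Compared with a single fixed-length delay, the poll returns as soon as the condition holds and only spends the full budget when the test is genuinely failing.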

Verify-only resolutions:
- NS-024: closed by NS-019 (commit ac96b83 deletion of
  NotificationDeliveryService + its test file). No edits needed.
- NotifOutbox-008: FallbackMaxRetries/FallbackRetryDelay are private
  forward-compat constants returned only when no SMTP-config row exists
  (in which case EmailNotificationDeliveryAdapter returns Permanent,
  bypassing the values entirely). Marked Resolved with note.
- Transport-010: Overwrite child-collection sync covered by the T-001/
  T-002 tests added in commit e3ca9af; per-IP throttle by
  BundleUnlockRateLimiterTests; failed-session retention by
  BundleSessionStoreTests; T-009 closed structurally via AsyncLocal.
  Marked Resolved by reference.

Build clean; all 11 affected test suites green. README regenerated:
33 open (was 46).
2026-05-28 08:21:03 -04:00
Joseph Doherty 46cb6965ac fix(security): close Theme 7 — 8 secrets / redaction / append-only findings
Security-sensitive batch, handled main-thread for careful judgment on
secret-leak and pepper-bypass paths.

Secret leak / pepper bypass:
- CD-016 (pepper bypass): InboundApiRepository's GetApiKeyByValueAsync no
  longer hashes the candidate with the unpeppered ApiKeyHasher.Default —
  ctor takes a lazy Func<IApiKeyHasher> accessor (lazy so test composition
  roots without a pepper still bring up the repository), and the DI
  registration wires sp.GetService<IApiKeyHasher>() so the production
  peppered hasher matches the stored KeyHash. Regression test asserts
  positive (peppered roundtrip) AND negative (Default hasher misses the
  same key — proving the lookup uses the injected hasher).
- MgmtSvc-020 (SMTP credential leak): UpdateSmtpConfig/ListSmtpConfigs
  now project through SmtpConfigPublicShape so the response payload and
  audit-row afterState never carry the Credentials field — only a
  HasCredentials bool. The SMTP password / OAuth2 client secret is no
  longer echoed back out of the Admin-only UpdateSmtpConfig call that
  received it.

Redaction:
- AuditLog-008 (test-fixture under-redact): new
  SafeDefaultAuditPayloadFilter (stateless singleton) does HTTP header
  redaction for the always-sensitive defaults (Authorization, X-Api-Key,
  Cookie, Set-Cookie). FallbackAuditWriter, CentralAuditWriter, and
  AuditLogIngestActor (both ingest paths) default to it instead of null
  — composition roots that bypass AddAuditLog can no longer write
  unredacted auth headers to the audit store.
- NotifService-025 (over-mask): CredentialRedactor.Scrub now only masks
  the last colon-separated component (password / clientSecret) AND only
  if it's >= 12 chars (typical password heuristic). Short user names
  like "root" no longer become global redaction tokens that eat unrelated
  diagnostic text. The full packed string is always masked regardless of
  length. 3 new negative tests pin the no-over-mask contract.
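
One reading of the NotifService-025 masking rule is sketched below. This is a hypothetical Python interpretation of the entry, not CredentialRedactor.Scrub itself: the `scrub` signature, the `***` mask, and the treatment of the packed string as `user:secret` are assumptions.

```python
MASK = "***"
MIN_COMPONENT_LENGTH = 12  # threshold taken from the entry above


def scrub(text: str, packed_credential: str) -> str:
    """Mask a packed 'user:secret' credential inside diagnostic text."""
    if not packed_credential:
        return text
    # The full packed string is always masked, whatever its length.
    text = text.replace(packed_credential, MASK)
    if ":" in packed_credential:
        secret = packed_credential.rsplit(":", 1)[-1]
        # Only the last component is masked on its own, and only when it is
        # long enough that it is unlikely to collide with ordinary words.
        if len(secret) >= MIN_COMPONENT_LENGTH:
            text = text.replace(secret, MASK)
    return text
```

Under this reading, `scrub("login root failed", "root:pw")` leaves the text untouched, so a short user name such as "root" no longer erases unrelated diagnostics.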

Audit-row correctness / fail-loud:
- InboundAPI-025: Program.cs UseWhen predicate now excludes /api/audit,
  /api/management, /api/centralui, /api/script-analysis AND requires POST
  — the AuditWriteMiddleware no longer emits spurious ApiInbound rows
  for audit-log query/export endpoints (write-on-read recursion broken).
- ESG-021: ApplyAuth now logs Warning (not silent) on empty
  AuthConfiguration for apikey/basic, unknown AuthType, and malformed
  Basic config. AuthConfiguration value NEVER logged. AuthType=none
  remains silent (documented unauthenticated sentinel).
- Security-021: AddSecurity now logs a startup Warning when
  RequireHttpsCookie=false — an HTTP-only deployment that previously
  transmitted the cookie-embedded JWT silently in cleartext is now
  audible in the log.
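
The InboundAPI-025 predicate can be approximated as a pure function. This Python sketch is an assumption-laden illustration, not the Program.cs UseWhen lambda: `should_audit`, the `/api/` base prefix, and the segment-style prefix match are hypothetical choices.

```python
EXCLUDED_PREFIXES = (
    "/api/audit",
    "/api/management",
    "/api/centralui",
    "/api/script-analysis",
)


def should_audit(method: str, path: str) -> bool:
    """True only for POSTs to inbound API methods, never for the excluded endpoints."""
    if method.upper() != "POST":
        return False
    path = path.lower()
    if not path.startswith("/api/"):
        return False
    # Match whole path segments so "/api/auditor" is not mistaken for "/api/audit".
    return not any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in EXCLUDED_PREFIXES
    )
```

Excluding the audit endpoints is what breaks the recursion in which reading the audit log wrote a new audit row.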

Defensive:
- CD-021: SwitchOutPartitionAsync's monthBoundary format string now
  yyyy-MM-dd HH:mm:ss.fffffff (datetime2(7) precision) so a future
  sub-second / non-midnight boundary doesn't silently round to the
  wrong partition.

Plus reconciled stale per-module Open-findings counters that had drifted
from earlier sessions (AuditLog, CD, ESG, IAPI, MgmtSvc, NotifService,
Security).

Build clean; all affected test projects green (Host 208, ConfigDB 242,
ESG 69, IAPI 151, MgmtSvc 100, NotifService 55, Security 85, AuditLog
247/248 — 1 pre-existing date-sensitive integration test flake on
PartitionPurgeTests, unrelated). README regenerated: 46 open (was 54).
2026-05-28 08:04:10 -04:00
Joseph Doherty 55f46e7c92 perf: close Theme 6 — 11 allocation / N+1 / lock-contention findings
Well-localised perf fixes across 8 modules.

Lock decoupling / SQL streaming:
- AuditLog-005: SqliteAuditWriter gains dedicated read-only _readConnection
  (+ _readLock) backed by WAL journal mode. GetBacklogStatsAsync,
  ReadPendingAsync, ReadPendingSinceAsync, ReadForwardedAsync no longer
  contend with the hot-path INSERT lock — backlog probes on a 30s timer
  can't stall the writer under multi-hundred-K Pending backlog.
- SEL-022: dropped Cache=Shared from SiteEventLogger's default connection
  string (single-connection logger; mode was dormant config).

Memory / streaming:
- CLI-019: bundle export streams base64 in 1 MB-aligned chunks via
  Convert.TryFromBase64Chars straight into the FileStream — no more
  full-bundle byte[] allocation.
- CentralUI-031: TransportImport now stages the upload to a per-session
  temp file under Path.GetTempPath() (replaces in-memory byte[] field);
  page implements IDisposable to delete the temp file on reset / new
  upload / dispose. Per-circuit working set drops from ~100 MB to ~80 KB.
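
Chunked base64 decoding of the kind described for CLI-019 can be sketched as follows. This is an illustrative Python analogue, not the CLI's Convert.TryFromBase64Chars code: `write_base64_to_file` and its parameters are hypothetical, and the input is assumed to contain no line breaks.

```python
import base64

CHUNK_CHARS = 1024 * 1024  # a multiple of 4, so every chunk decodes on its own


def write_base64_to_file(encoded: str, out, chunk_chars: int = CHUNK_CHARS) -> int:
    """Decode `encoded` chunk by chunk into the binary file object `out`."""
    if chunk_chars % 4 != 0:
        raise ValueError("chunk size must be a multiple of 4 base64 characters")
    written = 0
    for start in range(0, len(encoded), chunk_chars):
        chunk = encoded[start:start + chunk_chars]
        # Only the final chunk can carry '=' padding; earlier chunks are whole quanta.
        written += out.write(base64.b64decode(chunk, validate=True))
    return written
```

Aligning chunks to four characters matters because base64 maps every 4 characters to 3 bytes; a chunk cut mid-quantum could not be decoded independently.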

N+1 hoisting:
- Transport-008: added ITemplateEngineRepository.GetTemplatesWithChildrenAsync
  bulk method; BundleImporter.PreviewAsync calls it once instead of per-
  template-name. Single query with .Include(...).AsSplitQuery().
- DM-023: BuildDeployArtifactsCommandAsync's per-site loop now references
  a pre-fetched GlobalArtifactSnapshot (shared scripts, external systems,
  DB connections, notification lists, SMTP) instead of re-querying per site.
- MgmtSvc-023: HandleQueryDeployments unfiltered branch uses one
  GetAllInstancesAsync bulk load + Dictionary<int,int?> lookup (was a
  GetInstanceByIdAsync per record).
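
The bulk-load-plus-dictionary shape used for MgmtSvc-023 might be sketched like this. It is a hedged Python illustration: `resolve_site_ids`, the `get_all_instances` callable, and the `Id` / `SiteId` / `InstanceId` field names are assumptions, since the entry only states that the lookup is a Dictionary<int,int?>.

```python
def resolve_site_ids(records, get_all_instances):
    """Attach a (possibly missing) site id to each record using one bulk load."""
    # One query for every instance, then constant-time lookups per record.
    site_by_instance = {
        inst["Id"]: inst.get("SiteId") for inst in get_all_instances()
    }
    return [
        (rec["InstanceId"], site_by_instance.get(rec["InstanceId"]))
        for rec in records
    ]
```

The number of repository calls stays at one regardless of how many deployment records are returned, which is the point of hoisting the query out of the loop.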

Small allocations / per-tick rebuilds:
- InboundAPI-019: AuditWriteMiddleware gates EnableBuffering() on
  RequestHasBody() so GET/HEAD/DELETE/TRACE/OPTIONS and Content-Length:0
  requests skip the FileBufferingReadStream allocation.
- NotifOutbox-006: ResolveAdapters dictionary now cached on
  _adaptersCache (built lazily on first sweep) + actor-lifetime
  _adaptersScope; ResolveAdapters no longer rebuilds per dispatch tick.

Verify-only:
- Comm-017: Confirmed _inProgressDeployments was deleted by Comm-016 in
  commit ac96b83 — marked Resolved with that attribution. No code change.

Doc-correction:
- NS-022: Updated MailKitSmtpClientWrapper XML doc to spell out single-
  connection / per-delivery-factory contract (option (b) — transient
  client per Send — rejected because it re-handshakes TLS per email).

10+ new regression tests across 8 test projects. Build clean; affected
suites all green. README regenerated: 54 open (was 65).
2026-05-28 07:47:24 -04:00
Joseph Doherty 2ed5c6c379 fix(concurrency/lifetime): close Theme 5 — 10 concurrency / DI / scope findings
Concurrency hazards, DI lifetime hygiene, and one verify-only confirmation
across 8 modules. Highlights:

Concurrency:
- CentralUI-030: SandboxConsoleCapture writes routed through WriteSynchronized
  locking on the captured StringWriter — intra-script Task fan-out can no
  longer corrupt the per-call buffer.
- Commons-021: ExternalCallResult.Response now backed by Lazy<dynamic?>
  (ExecutionAndPublication) — no more benign double-parse race.
- CD-017: DeploymentManagerRepository.DeleteDeploymentRecordAsync now takes
  an expected RowVersion and seeds entry.OriginalValues so EF emits
  DELETE ... WHERE Id=@id AND RowVersion=@prior; stale RowVersion now
  throws DbUpdateConcurrencyException instead of silent overwrite.
- Transport-009: AuditCorrelationContext.BundleImportId backed by
  AsyncLocal<Guid?> so concurrent imports get per-logical-call isolation
  (was a scoped instance shared via AuditService across runs).
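
The per-logical-call isolation described for Transport-009 has a close Python analogue in `contextvars`, which plays roughly the role AsyncLocal<T> plays in .NET. The sketch below is an analogy under that assumption, not the project's code, and the function names are hypothetical.

```python
import asyncio
import contextvars
import uuid

# Ambient id for the import currently running in this logical call.
bundle_import_id = contextvars.ContextVar("bundle_import_id", default=None)


async def run_import(name, results):
    mine = uuid.uuid4()
    bundle_import_id.set(mine)
    await asyncio.sleep(0)  # yield so the other import interleaves
    # Each task runs in its own copy of the context, so the value read
    # back here is the one this task set, not the other task's.
    results[name] = (mine, bundle_import_id.get())


async def main():
    results = {}
    await asyncio.gather(run_import("a", results), run_import("b", results))
    return results
```

A single shared field in place of the context variable would let the second import overwrite the first import's id while both are in flight.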

DI / lifetime:
- AuditLog-003: All 3 AuditLog actor handlers switched to CreateAsyncScope
  + await using — async EF disposal no longer swallowed.
- AuditLog-007: INodeIdentityProvider resolution standardised on
  GetRequiredService<>() (was mixed with GetService<>()).
- AuditLog-011: AddAuditLogHealthMetricsBridge guarded by sentinel
  descriptor check — calling twice no longer double-registers the hosted
  service.

Shutdown / supervision:
- SiteCallAudit-002: AkkaHostedService adds a CoordinatedShutdown
  cluster-leave task (drain-site-call-audit-singleton) that issues a
  bounded GracefulStop(10s) so failover waits for in-flight upserts.

Registration safety:
- NS-020: AkkaHostedService now guards NotificationForwarder S&F
  registration with _notificationDeliveryHandlerRegistered + throws
  InvalidOperationException on double-register to make the regression loud.

Verify-only closures:
- NotifOutbox-005: Confirmed already closed by CD-015 fix (ac96b83) —
  NotificationOutboxRepository.InsertIfNotExistsAsync uses the same
  raw-SQL IF NOT EXISTS + 2601/2627 swallow pattern; race eliminated.

5+ new regression tests (CentralUI sandbox WhenAll, ExternalCallResult
64-reader Barrier, AuditLog DI idempotency, RowVersion stale-throw,
SiteCallAudit-002 shutdown drain). Build clean; affected suites all green.
README regenerated: 65 open (was 75).
2026-05-28 07:29:41 -04:00
Joseph Doherty 6ae0fea558 fix(error-handling): close Theme 4 — 18 cancellation / fire-and-forget findings
Async cancellation hygiene, fire-and-forget observability, retry/shutdown
semantics, and audit-row coverage across 9 modules. Highlights:

Cancellation & lifecycle:
- AuditLog-006: SqliteAuditWriter.Dispose hops to thread pool, escaping the
  captured SyncContext that risked sync-over-async deadlock.
- AuditLog-010: SiteAuditTelemetryActor owns a private lifecycle CTS,
  threaded through drain paths instead of CancellationToken.None.
- Comm-019: CentralCommunicationActor adds lifecycle CTS for repo calls.
- Host-019: Migration StartupRetry forwards ApplicationStopping so SIGTERM
  during the bounded-retry window aborts cleanly.

Cursor / retry / counter correctness:
- AuditLog-004: SiteAuditReconciliationActor's cursor now holds at `since`
  when any row's idempotent insert is still being retried (per-EventId
  retry counter, MaxPermanentInsertAttempts=5 escape valve with LogCritical
  abandon). No more silent abandonment of permanently-failing rows.
- ConfigDB-019: Dropped the catch-and-continue on EnsureLookaheadAsync's
  SPLIT loop — by class-doc construction the catch could only mask real
  failures and let the next iteration create permanent partition holes.
- HM-017/018: HealthReportSender + CentralHealthReportLoop snapshot
  per-interval counters before sending, restore via new
  ISiteHealthCollector.AddIntervalCounters on transport failure so counts
  aren't silently lost.
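
The snapshot-then-restore idea in HM-017/018 can be sketched in Python (an analogy, not the project's C# code). The class and method names are hypothetical, and the sketch assumes that taking a snapshot also resets the live counters.

```python
import threading
from collections import Counter


class IntervalCounters:
    """Hypothetical per-interval counter store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def increment(self, name, amount=1):
        with self._lock:
            self._counts[name] += amount

    def take_snapshot(self):
        # Atomically hand back the current counts and start a fresh interval.
        with self._lock:
            snapshot, self._counts = self._counts, Counter()
            return snapshot

    def add_back(self, snapshot):
        # Merge a failed report's counts into whatever has accrued since.
        with self._lock:
            self._counts.update(snapshot)

    def current(self):
        with self._lock:
            return dict(self._counts)


def send_report(counters, transport):
    snapshot = counters.take_snapshot()
    try:
        transport(dict(snapshot))
        return True
    except Exception:  # sketch only: treat any failure as a transport failure
        counters.add_back(snapshot)
        return False
```

Counts recorded between a failed send and the next attempt are added to the restored counts, so nothing is dropped and nothing is counted twice.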

Fire-and-forget / shutdown waits:
- InboundAPI-018: AuditWriteMiddleware observes faulted audit-write tasks
  via OnlyOnFaulted continuation (Warning log; response unchanged).
- SnF-024: StoreAndForwardService.StopAsync awaits in-flight retry sweep
  with a bounded SweepShutdownWaitTimeout (10s).

Leak / refactor:
- Comm-021: SiteStreamGrpcServer.SubscribeInstance wraps Subscribe in its
  own try/catch so a throw doesn't leak the relay actor or _activeStreams
  entry.
- Comm-022: Verified as already closed by Comm-016's dead-code purge.
- CLI-017: BundleCommands' three subcommands delegate to ExecuteCommandAsync
  (auth-failure exit-code contract unified).

Defensive / validation:
- CLI-021: CliConfig.Load wraps file-read/JSON parse so malformed config
  prints a warning and returns defaults instead of crashing the CLI.
- Host-022: ParseLevel emits stderr one-shot warning for unrecognised
  MinimumLevel instead of silently coercing to Information.
- ESG-019: ExternalSystemClient sets HttpClient.Timeout to
  Timeout.InfiniteTimeSpan so the per-call CTS is the sole timeout source
  (was clipped by HttpClient's default 100s timeout).
- Security-020: New SecurityOptionsValidator (IValidateOptions) rejects
  empty LdapServer/LdapSearchBase with ValidateOnStart.
- DM-019: Lifecycle command timeouts now emit DisableTimedOut/EnableTimedOut/
  DeleteTimedOut audit entries (mirrors DeployFailed pattern).
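
The warn-and-fall-back behaviour described for CLI-021 can be sketched in Python (an analogy, not the project's C# code). The default keys are hypothetical because the log does not show the real CliConfig fields. Treating a missing file the same way as a malformed one is also an assumption.

```python
import json
import sys
from pathlib import Path

# Hypothetical defaults; the real CliConfig fields are not shown in the log.
DEFAULTS = {"server_url": "http://localhost:5000", "timeout_seconds": 30}


def load_config(path):
    """Return DEFAULTS overlaid with the JSON object stored at `path`.

    An unreadable or malformed file produces a warning on stderr and the
    defaults, rather than an unhandled exception.
    """
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            raise ValueError("top-level JSON value must be an object")
    except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
        print(f"warning: ignoring config {path}: {exc}", file=sys.stderr)
        return dict(DEFAULTS)
    return {**DEFAULTS, **data}
```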

Plus reconciled stale per-module Open-findings counters that had drifted
from prior sessions.

20+ new regression tests across 11 test projects; build clean; affected
suites all green. README regenerated: 75 open (was 93).
2026-05-28 07:13:28 -04:00