Commit Graph

1045 Commits

Author SHA1 Message Date
Joseph Doherty 9b7916bb2e refactor(browse): rename BrowseOpcUaNode* to protocol-agnostic BrowseNode*
Renames BrowseOpcUaNodeCommand/Result -> BrowseNodeCommand/Result and
CommunicationService.BrowseOpcUaNodeAsync -> BrowseNodeAsync across Commons,
Communication, SiteRuntime, DCL actors, and CentralUI. Wire manifest name
follows (BrowseOpcUaNode -> BrowseNode). Browse regression tests green.
2026-05-29 07:57:36 -04:00
Joseph Doherty 20c24ef260 feat(dcl): RealMxGatewayClient over ZB.MOM.WW.MxGateway.Client
Seam implementation wrapping the gateway client + GalaxyRepositoryClient:
OpenSession/Register, AddItem+Advise subscribe, ReadBulk/WriteBulk with handle
tracking, StreamEvents loop with worker_sequence resume + OPC-style quality
mapping, and Galaxy BrowseChildren mapping (objects keyed by gobject id,
attributes by full tag reference). Only type touching the generated contracts.
2026-05-29 07:55:32 -04:00
Joseph Doherty 0a693e0be9 feat(dcl): MxGatewayDataConnection adapter (connect/subscribe/read/write/wait/browse)
Implements IDataConnection + IBrowsableDataConnection over the IMxGatewayClient
seam: connect/disconnect with once-only Disconnected guard + background event
loop, subscribe/unsubscribe with tag routing, read/write batch with per-tag
error classification, WriteBatchAndWait, and Galaxy browse mapping. Covers plan
Tasks 6-10. Full unit coverage via FakeMxGatewayClient (12 tests).
2026-05-29 07:50:16 -04:00
Joseph Doherty 19223a08cf feat(commons): MxGatewayEndpointConfig validator + tests 2026-05-29 07:46:28 -04:00
Joseph Doherty f0aad74311 feat(commons): MxGatewayEndpointConfig serializer + tests 2026-05-29 07:46:28 -04:00
Joseph Doherty fe02ec5664 feat(dcl): MxGateway client seam interfaces + global options 2026-05-29 07:44:07 -04:00
Joseph Doherty f626ece66a feat(commons): add MxGatewayEndpointConfig type 2026-05-29 07:44:07 -04:00
Joseph Doherty d695ab2492 build(dcl): add Gitea feed + ZB.MOM.WW.MxGateway.Client package reference
Central package management requires package-source mapping with >1 feed
(NU1507 as error), so nuget.config scopes ZB.MOM.WW.MxGateway.* to the Gitea
feed and everything else to nuget.org. Credentials are not committed.
2026-05-29 07:43:17 -04:00
Joseph Doherty 2044023bdd docs(dcl): implementation plan for MxGateway data connection
19 bite-sized tasks across adapter (TDD), config serializer/validator,
browse generalization rename, Central UI protocol selector/editor, packaging,
and integration. Co-located task persistence for resumable execution.
2026-05-29 07:39:44 -04:00
Joseph Doherty 8730c6e30a docs(dcl): design for MxGateway data connection (2nd protocol)
Add design doc for a second data-connection protocol, MxGateway, alongside
the OPC UA client. New IDataConnection adapter behind the existing
DataConnectionFactory extension point; tag pipe (read/subscribe/write) plus
Galaxy hierarchy browse, optional 2nd endpoint for failover. Generalizes the
OPC-UA-named browse plumbing to protocol-agnostic browse via
IBrowsableDataConnection. No entity/schema changes.
2026-05-29 07:28:21 -04:00
Joseph Doherty 5c98d23800 Merge feat/opcua-tag-browser: OPC UA tag browser + Test Bindings popup
- Per-instance DataSourceReferenceOverride on InstanceConnectionBinding
- IBrowsableDataConnection capability + RealOpcUaClient.BrowseChildrenAsync
- BrowseOpcUaNodeCommand routed via DeploymentManagerActor singleton (HA-safe)
- <OpcUaBrowserDialog/> on InstanceConfigure with lazy-loaded tree + manual paste
- Test Bindings popup: one-shot live read of all bound tags via ReadTagValuesCommand
- ConfigurationDatabase migration AddInstanceConnectionBindingOverride
- Doc updates: Component-DataConnectionLayer, Component-TemplateEngine, Component-CentralUI
2026-05-28 14:05:55 -04:00
Joseph Doherty 2a7dee4afa feat(centralui+dcl): Test Bindings popup — one-shot live read of bound tags
Adds a Test Bindings button to the Connection Bindings table on the Configure
Instance page that opens a modal showing the live current value of every bound
attribute. Reuses the routing path that the OPC UA tag browser landed on:

  Central:  TestBindingsDialog → IBindingTester → CommunicationService
            → ReadTagValuesCommand → SiteEnvelope (Ask)
  Site:     SiteCommunicationActor → DeploymentManagerActor singleton
            → DataConnectionManagerActor → child DataConnectionActor
            → _adapter.ReadBatchAsync

Split mirrors the browse handler:
  • Manager owns ConnectionNotFound (only it sees the per-site connection set).
  • Child owns ConnectionNotConnected (pre-call status check, never stash —
    read is interactive design-time), Timeout (OperationCanceledException),
    ServerError (any other exception). Per-tag failures from ReadBatchAsync
    become failure TagReadOutcomes without aborting the batch.

CentralUI:
  • IBindingTester / BindingTester — Design-role guard via HasClaim against
    JwtTokenService.RoleClaimType (not IsInRole — see c1e16cf), typed
    transport-failure translation.
  • TestBindingsDialog — ShowAsync(siteId, rows, instanceLabel) method-arg
    pattern (no Razor parameter race; see 2c138b6), groups rows by connection
    and issues one ReadAsync per connection in parallel, per-row error subline
    + per-connection banner, Refresh button re-issues the reads.
  • InstanceConfigure.razor — Test Bindings button next to Save Bindings,
    disabled when no testable rows. OPC UA only today (other protocols have
    no ReadTagValuesCommand wiring yet).

Tests:
  • Commons: ReadTagValuesCommand discovered by ManagementCommandRegistry.
  • DataConnectionLayer: unknown connection → ConnectionNotFound,
    not-connected adapter → ConnectionNotConnected (ReadBatchAsync NOT called),
    success-path mapping (Good/Bad + per-tag error), cancellation → Timeout.
  • CentralUI: register IBindingTester (and the previously-missing
    IOpcUaBrowseService) on the existing InstanceConfigureAuditDrillinTests
    Bunit container so the page renders cleanly with the new dialog.
2026-05-28 13:25:48 -04:00
Joseph Doherty f401a9ea0e fix(comm+site): route BrowseOpcUaNodeCommand via DeploymentManagerActor singleton 2026-05-28 12:51:45 -04:00
Joseph Doherty 2c138b6a25 fix(centralui): pass siteId+connectionName into ShowAsync explicitly
Razor parameter binding propagates on the next render, so reading SiteId
inside LoadRootAsync raced against the parent's "set field, then call
ShowAsync()" pattern — central received an empty siteId and rejected
with "No ClusterClient for site ,". Take the values as args instead.
2026-05-28 12:40:35 -04:00
Joseph Doherty c1e16cf9ff fix(centralui): role guard uses RoleClaimType, not IsInRole
ClaimsIdentity is built without an explicit roleType, so IsInRole("Design")
checks ClaimTypes.Role while actual claims use "Role" — the guard always
returned not-authorized. Switch to HasClaim(RoleClaimType, "Design").
2026-05-28 12:36:46 -04:00
Joseph Doherty c2919c2c38 docs(centralui): document OPC UA browse popup + override column 2026-05-28 12:15:39 -04:00
Joseph Doherty 3162370a8f feat(centralui): add OPC UA browse button + override column to InstanceConfigure 2026-05-28 12:14:26 -04:00
Joseph Doherty e6f9f91bb3 feat(comm): route BrowseOpcUaNodeCommand from central to site DCL manager
Wires the OPC UA Tag Browser cross-cluster path: central UI Asks via
CommunicationService.BrowseOpcUaNodeAsync -> ClusterClient -> site
SiteCommunicationActor -> /user/dcl-manager (Task 10 handler).

Uses ActorSelection.Tell(msg, Sender) since DataConnectionManagerActor
is not a child of DeploymentManagerActor and ActorSelection has no
Forward() helper; preserving Sender keeps the BrowseOpcUaNodeResult
routing back to the original Ask.

Integration test deferred: tests/ZB.MOM.WW.ScadaBridge.IntegrationTests
has no ClusterFixture (only ScadaBridgeWebApplicationFactory, which
does not expose a Communication service nor a seeded site OPC UA
connection). Round-trip will be exercised manually under Task 19.
2026-05-28 12:12:29 -04:00
Joseph Doherty d285174597 feat(dcl+ui): rename BrowseOpcUaNode -> ConnectionName-keyed; implement site handler + dialog failure mapping
- BrowseOpcUaNodeCommand: int DataConnectionId -> string ConnectionName
  (site DataConnectionManagerActor indexes children by name; CentralUI
  already has the connection name in scope via the dropdown — no extra
  plumbing across the trust boundary).
- IOpcUaBrowseService / OpcUaBrowseService: parameter renamed accordingly.
- OpcUaBrowserDialog: collapse the duplicate ConnectionName parameters
  (display label and routing key are the same string).
- Task 10: DataConnectionManagerActor forwards BrowseOpcUaNodeCommand to
  its child by name (owns ConnectionNotFound); DataConnectionActor adds
  the receive across all three lifecycle states (Connecting / Connected
  / Reconnecting) and maps adapter outcomes to BrowseFailureKind
  (NotBrowsable / ConnectionNotConnected / Timeout / ServerError).
- Task 17: SetFailure in OpcUaBrowserDialog implements the full
  BrowseFailureKind switch with friendly UI messages.
- Tests: DataConnectionManagerBrowseHandlerTests covers ConnectionNotFound,
  NotBrowsable, success, and ConnectionNotConnectedException paths.
2026-05-28 12:09:43 -04:00
Joseph Doherty 6999aedc60 feat(dcl): implement BrowseChildrenAsync on RealOpcUaClient 2026-05-28 11:59:03 -04:00
Joseph Doherty 1d2e2c1614 feat(centralui): tree rendering + lazy load + selection in OpcUaBrowserDialog 2026-05-28 11:58:59 -04:00
Joseph Doherty 0b4b4c02f6 feat(dcl): implement IBrowsableDataConnection on OpcUaDataConnection 2026-05-28 11:58:08 -04:00
Joseph Doherty d79d7fdf71 feat(configdb): migration AddInstanceConnectionBindingOverride 2026-05-28 11:56:56 -04:00
Joseph Doherty 41c78f7700 feat(centralui+comm): IOpcUaBrowseService + typed BrowseOpcUaNodeAsync on CommunicationService 2026-05-28 11:56:04 -04:00
Joseph Doherty 545a22e014 test(templates): override changes drive revision hash forward 2026-05-28 11:55:57 -04:00
Joseph Doherty c852979835 docs(dcl): document browse capability + BrowseOpcUaNodeCommand 2026-05-28 11:53:48 -04:00
Joseph Doherty 8d42a9b208 docs(templates): document per-instance DataSourceReference override 2026-05-28 11:53:48 -04:00
Joseph Doherty aff1323896 feat(commons): carry DataSourceReferenceOverride on ConnectionBinding (additive) 2026-05-28 11:53:24 -04:00
Joseph Doherty 7fc1f752f8 feat(dcl): add BrowseChildrenAsync to IOpcUaClient (NotImplementedException stubs) 2026-05-28 11:53:10 -04:00
Joseph Doherty 2ff138f1e8 feat(templates): apply InstanceConnectionBinding override during flattening 2026-05-28 11:52:28 -04:00
Joseph Doherty 18130a6937 feat(configdb): map InstanceConnectionBinding.DataSourceReferenceOverride 2026-05-28 11:51:05 -04:00
Joseph Doherty 4fc546383f feat(centralui): scaffold <OpcUaBrowserDialog/> modal 2026-05-28 11:49:59 -04:00
Joseph Doherty d727a6925b feat(commons): add BrowseOpcUaNodeCommand + result + failure types 2026-05-28 11:49:53 -04:00
Joseph Doherty 5645eb61a3 feat(commons): add IBrowsableDataConnection capability interface 2026-05-28 11:49:03 -04:00
Joseph Doherty 28f685965c feat(commons): add DataSourceReferenceOverride to InstanceConnectionBinding 2026-05-28 11:48:59 -04:00
Joseph Doherty 2aad9b533a plan: implementation plan for OPC UA tag browser popup (22 tasks)
Five phases, PR-shippable per phase: schema/contracts, DCL browse capability,
flattening uses override, Central UI popup + integration, docs. Per-task
classification, time estimates, and parallelism declared.
2026-05-28 11:43:04 -04:00
Joseph Doherty 8632c098b9 plan: design for OPC UA tag browser popup on instance config page
Per-instance address override + live ClusterClient-based browse via a new
IBrowsableDataConnection capability on RealOpcUaClient. Lazy-loaded tree
with manual-paste fallback; offline-safe.
2026-05-28 11:33:12 -04:00
Joseph Doherty de05c65992 fix(seed): seed Engineering Alerts notification list on both stacks
Test instances persistently emit Notify.To("Engineering Alerts").Send;
without the list at central a fresh cutover parks every notification
(observed 42k+ parked in a 3.5-min S&F drain after the rename).
Mirror the seed across docker/seed-sites.sh and docker-env2/seed-sites.sh.
2026-05-28 10:20:02 -04:00
Joseph Doherty d73f1b103a fix(seed): grant Design + Deployment to multi-role in primary seed-sites.sh
A fresh ScadaBridgeConfig has only the Admin LdapGroupMappings row
(InitialSchema migration ships one row, SecurityConfiguration.HasData
declares four). docker-env2/seed-sites.sh already inserts the missing
three idempotently; docker/seed-sites.sh did not, so multi-role got
Admin only on a primary cutover. Mirror the env2 insert block.
2026-05-28 10:11:21 -04:00
Joseph Doherty 1aa5da4eca refactor: add docker/rename-databases.sh for in-place MS SQL cutover
Renames ScadaLink* databases + scadalink_app login on an existing
running scadabridge-mssql container. For users preserving seeded
test data through the rename. Fresh deployments use the already-
updated infra/mssql/setup.sql directly.
2026-05-28 09:39:33 -04:00
Joseph Doherty 7b0b9c7365 refactor: rename ScadaLink → ZB.MOM.WW.ScadaBridge (code + projects + namespaces)
Solution + 23 src projects + 26 test projects renamed; folders, csproj,
namespaces, and ScadaLinkDbContext/ScadaBridgeDbContext class updated.
ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated.
SQL roles/logins, LDAP domains, CLI command name, and CLI config dir
(~/.scadalink → ~/.scadabridge) also renamed.

Build green; 5 Host.Tests fail awaiting SQL login rename in next commit.
Pre-existing StaleTagMonitor timing flakes unchanged.

Rename script committed at tools/rename-to-scadabridge.sh.
2026-05-28 09:37:45 -04:00
Joseph Doherty 6d87ee3c3b docs: add deployments/ catalog with per-deployment markdown
One file per local Docker cluster (docker-cluster, docker-cluster-env2)
keyed by Transport.SourceEnvironment. README indexes the set.
2026-05-28 09:27:43 -04:00
Joseph Doherty d8eda2f508 plan: design for ScadaLink → ZB.MOM.WW.ScadaBridge rename
Decisions: full prefix in csproj names + namespaces, full runtime
artifact rename (containers/network/DBs), staged commits on main,
in-place MS SQL DB rename, wipe site SQLite on cutover.
2026-05-28 09:27:31 -04:00
Joseph Doherty c1fe1c4f83 feat(audit): close AuditLog-001 — wire combined-telemetry dual-write transport
Closes the last open code-review finding. The unreachable
IngestCachedTelemetryAsync path now carries production cached-call
lifecycle traffic, delivering the design's "AuditLog + SiteCalls in one
MS SQL transaction" guarantee. Before this commit, the SiteCalls
operational half had NO production transport at all — central's
SiteCallAuditActor.OnUpsertAsync had zero producers, so cached-call
operational state never reached the central mirror.

Site-side partition (so neither path double-emits):
- ISiteAuditQueue.ReadPendingCachedTelemetryAsync — new method returning
  rows where Kind ∈ {CachedSubmit, ApiCallCached, DbWriteCached,
  CachedResolve} AND ForwardState = Pending.
- ISiteAuditQueue.ReadPendingAsync — XML doc updated, SQLite impl now
  filters Kind NOT IN the cached set so cached rows no longer ride the
  audit-only drain.

New cached-drain in SiteAuditTelemetryActor:
- Optional IOperationTrackingStore? ctor param (null on central
  composition roots — the cached scheduler is never armed there).
- Independent CachedDrain message + scheduler tick parallel to the
  existing Drain — a stall on one path can't block the other; shared
  lifecycle CTS gates both.
- OnCachedDrainAsync: reads cached audit rows, joins each with its
  matching SiteCallOperational snapshot via CorrelationId →
  TrackedOperationId from the tracking store, builds CachedTelemetryBatch,
  pushes via IngestCachedTelemetryAsync, marks ack'd rows Forwarded.
- Orphan rows (no tracking snapshot, thrown tracking-store call,
  missing CorrelationId) logged at Warning + skipped — they stay
  Pending so reconciliation/retry picks them up later. Best-effort
  contract preserved.

Central side: AuditLogIngestActor.OnCachedTelemetryAsync was already
implemented (M3 Bundle G dead code today, alive after this commit) —
performs InsertIfNotExists for AuditLog + UpsertAsync for SiteCalls
inside a BeginTransactionAsync. The handler is idempotent on EventId,
so any duplicate arrivals from concurrent push + reconciliation are
silent no-ops.

Composition root: AkkaHostedService now resolves IOperationTrackingStore
via GetService<>() (site-only) and threads it through the actor's
Props.Create.

Tests added (+3 in SiteAuditTelemetryActorTests):
- Cached rows route through the new transport, not the audit-only drain.
- Orphan cached row (no tracking match) is logged + skipped, drain
  doesn't crash.
- Ordinary audit rows still flow through the audit-only drain unchanged.
- ParentExecutionIdCorrelationTests now unions both queues to assert
  all expected Kinds remain covered after the partition.

Build clean; AuditLog.Tests 250/251 (the 1 fail is the pre-existing
date-sensitive PartitionPurgeTests integration flake explicitly accepted
across the session); SiteRuntime.Tests 302/302.

README regenerated: 0 pending of 481 total.

Session-final totals: 136 of 136 originally-open Theme findings closed
across 11 commits (10 themed batches + this architectural close).
2026-05-28 09:08:43 -04:00
Joseph Doherty 11950b0a8e fix(correctness): close Theme 10 — 5 data-integrity / serialisation findings
Final themed batch. 5 well-localised correctness fixes.

Serialisation precision:
- ESG-020: DatabaseGateway.JsonElementToParameterValue probes
  TryGetInt64 → TryGetDecimal → GetDouble, so a script's high-precision
  decimal SQL parameter survives the cached-write retry round-trip
  without silent precision loss. 3 new regression tests.

Template engine correctness:
- TE-018: DiffService gains ComputeConnectionsDiff over
  FlattenedConfiguration.Connections, mirroring the existing entity-diff
  shape and pairing with the Theme 1 TE-017 hash-coverage fix. A
  ConfigurationDiff record extension in Commons is flagged as a follow-up.
- TE-019: TemplateResolver.BuildInheritanceChain now walks via the
  int? ParentTemplateId directly — only null means "no parent". A real
  Id of 0 (the prior special-cased sentinel) now walks the chain like
  any other node, matching the TemplateEngine-013 CycleDetector fix.
  Regression of TE-013 closed.
- TE-020: All 5 Create* paths in TemplateService + SharedScriptService
  re-ordered to save-first → log-with-real-Id → save-audit (matching
  the InstanceService pattern). Create* audit rows no longer carry a
  literal "0" EntityId.

Doc deferral:
- Transport-012: Component-Transport.md §Audit Trail now spells out that
  the BundleImportId repository filter IS wired (in CentralUiRepository),
  but the Audit-Log-Viewer UI dropdown + summary-row hyperlink are a
  deferred CentralUI follow-up. CLI workaround documented
  (audit query --bundle-import-id).

11+ new regression tests (3 ESG, 4 DiffService, 3 TemplateResolver, 4
TemplateService, 1 SharedScriptService). Build clean; ESG 72/72,
TemplateEngine 324/324. README regenerated: 1 pending of 481 total.

Session-to-date: 135 of 136 originally-open Theme findings closed
across 10 themes in 10 commits.
2026-05-28 08:48:44 -04:00
Joseph Doherty 77cb0ad0e2 fix(api-surface): close Theme 9 — 27 naming / dead-code / config / hygiene findings
The largest themed batch — small mechanical fixes across 11 modules.

API / message hygiene:
- Comm-020: SiteAddressCacheLoaded now carries IReadOnlyDictionary /
  IReadOnlyList — Akka messages must be immutable.
- Commons-016: BundleSession.MaxUnlockAttempts named constant replaces
  magic 3.
- Commons-018: IOperationTrackingStore + IPartitionMaintenance moved from
  Interfaces/ root to Interfaces/Services/ (namespace preserved — 9
  consumers exceeded the in-prompt move threshold).
- Commons-023: TrackingStatusSnapshot.SourceNode now consistent with the
  trailing-optional-with-default pattern used elsewhere.
- SR-022: AuditingDbCommand.DbConnection.set no longer uses reflection —
  exposes AuditingDbConnection.Inner via internal API surface.

Dead code / config cleanup:
- ClusterInfra-011: decorative SectionName constant deleted.
- ClusterInfra-014: dead AddClusterInfrastructureActors method + its
  "throws-when-called" test deleted.
- Host-021: Microsoft Logging:LogLevel block deleted from appsettings.json
  (dead under Serilog).

Fail-loud over fail-silent:
- DM-021: ResolveSiteIdentifierAsync throws on missing site (was silently
  substituting a DB id).
- DM-022: dropped transient Pending write — record now lands directly in
  InProgress (no UI flicker, one fewer DB write).
- Host-020: LoggerConfigurationFactory emits a Console.Error warning when
  both Serilog:MinimumLevel and ScadaLink:Logging:MinimumLevel are set
  (ScadaLink remains truth per Host-011).
- SnF-022: NotifyCachedCallObserverAsync logs Warning on unparseable
  TrackedOperationId (was silently dropping).
- SnF-023: empty siteId default replaced with $unknown-site sentinel
  + constructor normalisation.

Correctness:
- SCA-001: SupervisorStrategy XML rewritten to match actual
  DefaultDecider/Restart semantics (was claiming Resume).
- SCA-003: OnUpsertAsync now restamps IngestedAtUtc on every upsert.
- SR-021: HandleDeployArtifacts now dispatches an internal
  ApplyArtifactDataConnectionsToDcl message after the SQLite write so
  system-wide artifact-deploy data-connection changes go live
  immediately (was requiring a site restart).
- SnF-020: RetryParkedMessageAsync captures the parked row BEFORE the
  local write so a concurrent delete can't skip standby replication.

Sentinels / naming collisions:
- HM-021: CentralSiteId changed from "central" to "$central"
  (uncollideable — leading $ is forbidden in real SiteIdentifiers).

Doc / surface cleanups:
- SEL-018: FailedWriteCount promoted to ISiteEventLogger; XML softened
  to "Available for future Health Monitoring integration".
- SnF-019: VERIFY outcome — documented parking-after-DefaultMaxRetries
  in Component-StoreAndForward.md + DefaultMaxRetries XML (uniform
  cap; maxRetries:0 is the unbounded escape hatch).
- SnF-021: Component-StoreAndForward.md no longer claims the tracking
  table lives in SnF — it's in SiteRuntime, the interface is in Commons.
- CLI-020: bundle export response parse guarded with try/catch on
  JsonException / KeyNotFoundException / FormatException — emits a
  clean INVALID_RESPONSE exit instead of a stack trace.

Config:
- ClusterInfra-013: intent comment added to "catastrophic config" test.
- Host-016: appsettings.Site.json second CentralContactPoints entry
  removed (was pointing at the SITE's own port); doc-key explains how
  to extend.
- Host-018: NodeName added to both shipped per-role configs (was
  causing SourceNode to be null on audit rows).

UI:
- CentralUI-029: replaced JS.InvokeAsync<int>("eval", …) with an ES
  module import (new wwwroot/js/browser-time.js).
- CentralUI-032: AuditResultsGrid gains a Previous button backed by a
  cursor stack.

10+ new regression tests across the affected projects. Build clean;
all suites green. README regenerated: 6 open (was 33).

Session-to-date: 130 of 136 originally-open Theme findings closed.
2026-05-28 08:39:01 -04:00
Joseph Doherty d190345ef0 test(coverage): close Theme 8 — 13 test-coverage findings, +35 tests
13 well-bounded test-coverage gaps closed across 11 test projects.
Net +35 regression tests; no production code changes except the
SiteEventLogger src reference unchanged (W3 redacted only test code).

Test additions:
- CLI-022: CommandTreeTests pinned-count assertion bumped 14→16 and
  3 InlineData rows added for the audit + bundle command groups.
- Commons-020: new TransportRecordsTests covers BundleManifest /
  ExportSelection / ImportPreview / ImportResolution / ImportResult —
  ctor + System.Text.Json round-trip + record-equality (14 tests).
- CD-024: SPLIT-RANGE failure-continuation now under
  EnsureLookahead_SecondSplitThrows_LoopAborts_FirstBoundaryStillCommitted
  (Skippable MS-SQL fixture); production-shape rowversion delete
  asserted by DeleteDeploymentRecord_CurrentRowVersion_StubAttachPath_DeleteSucceeds.
- CentralUI-033: new QueryStringDrillInTests with 4 bUnit cases for
  Transport + SiteCalls drill-in / query-string handling.
- DM-024: probe actors (ReconcileProbeActor, SerializationProbeActor,
  ArtifactProbeActor) refactored from static fields to per-test instances
  (Interlocked on counter) — all 31 callers updated; no production
  changes required.
- HM-022: real-time PeriodicTimer test flake fixed by replacing
  fixed-budget Task.Delay with a RunLoopUntil poll-until-condition
  helper (5s/25ms). Production loop untouched.
- InboundAPI-023: new EndpointExtensionsTests covers the
  POST /api/{methodName} composition wiring via TestServer (7 cases:
  happy path, missing key 401, unknown method 403, invalid JSON 400,
  missing param 400, script-throws 500 sanitised, AuditActorItemKey
  stash invariant).
- MgmtSvc-021: 6 new ManagementActorTests cover the Transport bundle
  handlers (role gate for Export/Preview/Import, unknown-name
  ManagementCommandException, blocker-rejection, dedupe last-write-wins).
- SCA-006: SiteCallQueryRequest_StuckOnly_CursorAtNonStuckBoundary_SkipsToNextStuckRow
  pins the missing boundary case.
- SEL-023: stress-test `bool stop` promoted to `volatile bool` for
  cross-thread visibility under release/JIT.

Verify-only resolutions:
- NS-024: closed by NS-019 (commit ac96b83 deletion of
  NotificationDeliveryService + its test file). No edits needed.
- NotifOutbox-008: FallbackMaxRetries/FallbackRetryDelay are private
  forward-compat constants returned only when no SMTP-config row exists
  (in which case EmailNotificationDeliveryAdapter returns Permanent,
  bypassing the values entirely). Marked Resolved with note.
- Transport-010: Overwrite child-collection sync covered by the T-001/
  T-002 tests added in commit e3ca9af; per-IP throttle by
  BundleUnlockRateLimiterTests; failed-session retention by
  BundleSessionStoreTests; T-009 closed structurally via AsyncLocal.
  Marked Resolved by reference.

Build clean; all 11 affected test suites green. README regenerated:
33 open (was 46).
2026-05-28 08:21:03 -04:00
Joseph Doherty 46cb6965ac fix(security): close Theme 7 — 8 secrets / redaction / append-only findings
Security-sensitive batch, handled main-thread for careful judgment on
secret-leak and pepper-bypass paths.

Secret leak / pepper bypass:
- CD-016 (pepper bypass): InboundApiRepository's GetApiKeyByValueAsync no
  longer hashes the candidate with the unpeppered ApiKeyHasher.Default —
  ctor takes a lazy Func<IApiKeyHasher> accessor (lazy so test composition
  roots without a pepper still bring up the repository), and the DI
  registration wires sp.GetService<IApiKeyHasher>() so the production
  peppered hasher matches the stored KeyHash. Regression test asserts
  positive (peppered roundtrip) AND negative (Default hasher misses the
  same key — proving the lookup uses the injected hasher).
- MgmtSvc-020 (SMTP credential leak): UpdateSmtpConfig/ListSmtpConfigs
  now project through SmtpConfigPublicShape so the response payload and
  audit-row afterState never carry the Credentials field — only a
  HasCredentials bool. The SMTP password / OAuth2 client secret no
  longer leaves the Admin-only UpdateSmtpConfig boundary the caller
  already supplied it to.

Redaction:
- AuditLog-008 (test-fixture under-redact): new
  SafeDefaultAuditPayloadFilter (stateless singleton) does HTTP header
  redaction for the always-sensitive defaults (Authorization, X-Api-Key,
  Cookie, Set-Cookie). FallbackAuditWriter, CentralAuditWriter, and
  AuditLogIngestActor (both ingest paths) default to it instead of null
  — composition roots that bypass AddAuditLog can no longer write
  unredacted auth headers to the audit store.
- NotifService-025 (over-mask): CredentialRedactor.Scrub now only masks
  the last colon-separated component (password / clientSecret) AND only
  if it's >= 12 chars (typical password heuristic). Short user names
  like "root" no longer become global redaction tokens that eat unrelated
  diagnostic text. The full packed string is always masked regardless of
  length. 3 new negative tests pin the no-over-mask contract.

Audit-row correctness / fail-loud:
- InboundAPI-025: Program.cs UseWhen predicate now excludes /api/audit,
  /api/management, /api/centralui, /api/script-analysis AND requires POST
  — the AuditWriteMiddleware no longer emits spurious ApiInbound rows
  for audit-log query/export endpoints (write-on-read recursion broken).
- ESG-021: ApplyAuth now logs Warning (not silent) on empty
  AuthConfiguration for apikey/basic, unknown AuthType, and malformed
  Basic config. AuthConfiguration value NEVER logged. AuthType=none
  remains silent (documented unauthenticated sentinel).
- Security-021: AddSecurity now logs a startup Warning when
  RequireHttpsCookie=false — an HTTP-only deployment that previously
  transmitted the cookie-embedded JWT silently in cleartext is now
  audible in the log.

Defensive:
- CD-021: SwitchOutPartitionAsync's monthBoundary format string now
  yyyy-MM-dd HH:mm:ss.fffffff (datetime2(7) precision) so a future
  sub-second / non-midnight boundary doesn't silently round to the
  wrong partition.

Plus reconciled stale per-module Open-findings counters that had drifted
from earlier sessions (AuditLog, CD, ESG, IAPI, MgmtSvc, NotifService,
Security).

Build clean; all affected test projects green (Host 208, ConfigDB 242,
ESG 69, IAPI 151, MgmtSvc 100, NotifService 55, Security 85, AuditLog
247/248 — 1 pre-existing date-sensitive integration test flake on
PartitionPurgeTests, unrelated). README regenerated: 46 open (was 54).
2026-05-28 08:04:10 -04:00
Joseph Doherty 55f46e7c92 perf: close Theme 6 — 11 allocation / N+1 / lock-contention findings
Well-localised perf fixes across 8 modules.

Lock decoupling / SQL streaming:
- AuditLog-005: SqliteAuditWriter gains dedicated read-only _readConnection
  (+ _readLock) backed by WAL journal mode. GetBacklogStatsAsync,
  ReadPendingAsync, ReadPendingSinceAsync, ReadForwardedAsync no longer
  contend with the hot-path INSERT lock — backlog probes on a 30s timer
  can't stall the writer under multi-hundred-K Pending backlog.
- SEL-022: dropped Cache=Shared from SiteEventLogger's default connection
  string (single-connection logger; mode was dormant config).

Memory / streaming:
- CLI-019: bundle export streams base64 in 1 MB-aligned chunks via
  Convert.TryFromBase64Chars straight into the FileStream — no more
  full-bundle byte[] allocation.
- CentralUI-031: TransportImport now stages the upload to a per-session
  temp file under Path.GetTempPath() (replaces in-memory byte[] field);
  page implements IDisposable to delete the temp file on reset / new
  upload / dispose. Per-circuit working set drops from ~100 MB to ~80 KB.

N+1 hoisting:
- Transport-008: added ITemplateEngineRepository.GetTemplatesWithChildrenAsync
  bulk method; BundleImporter.PreviewAsync calls it once instead of per-
  template-name. Single query with .Include(...).AsSplitQuery().
- DM-023: BuildDeployArtifactsCommandAsync's per-site loop now references
  a pre-fetched GlobalArtifactSnapshot (shared scripts, external systems,
  DB connections, notification lists, SMTP) instead of re-querying per site.
- MgmtSvc-023: HandleQueryDeployments unfiltered branch uses one
  GetAllInstancesAsync bulk load + Dictionary<int,int?> lookup (was a
  GetInstanceByIdAsync per record).

Small allocations / per-tick rebuilds:
- InboundAPI-019: AuditWriteMiddleware gates EnableBuffering() on
  RequestHasBody() so GET/HEAD/DELETE/TRACE/OPTIONS and Content-Length:0
  requests skip the FileBufferingReadStream allocation.
- NotifOutbox-006: ResolveAdapters dictionary now cached on
  _adaptersCache (built lazily on first sweep) + actor-lifetime
  _adaptersScope; ResolveAdapters no longer rebuilds per dispatch tick.

Verify-only:
- Comm-017: Confirmed _inProgressDeployments was deleted by Comm-016 in
  commit ac96b83 — marked Resolved with that attribution. No code change.

Doc-correction:
- NS-022: Updated MailKitSmtpClientWrapper XML doc to spell out single-
  connection / per-delivery-factory contract (option (b) — transient
  client per Send — rejected because it re-handshakes TLS per email).

10+ new regression tests across 8 test projects. Build clean; affected
suites all green. README regenerated: 54 open (was 65).
2026-05-28 07:47:24 -04:00
Joseph Doherty 2ed5c6c379 fix(concurrency/lifetime): close Theme 5 — 10 concurrency / DI / scope findings
Concurrency hazards, DI lifetime hygiene, and one verify-only confirmation
across 8 modules. Highlights:

Concurrency:
- CentralUI-030: SandboxConsoleCapture writes routed through WriteSynchronized
  locking on the captured StringWriter — intra-script Task fan-out can no
  longer corrupt the per-call buffer.
- Commons-021: ExternalCallResult.Response now backed by Lazy<dynamic?>
  (ExecutionAndPublication) — no more benign double-parse race.
- CD-017: DeploymentManagerRepository.DeleteDeploymentRecordAsync now takes
  an expected RowVersion and seeds entry.OriginalValues so EF emits
  DELETE ... WHERE Id=@id AND RowVersion=@prior; stale RowVersion now
  throws DbUpdateConcurrencyException instead of silent overwrite.
- Transport-009: AuditCorrelationContext.BundleImportId backed by
  AsyncLocal<Guid?> so concurrent imports get per-logical-call isolation
  (was a scoped instance shared via AuditService across runs).

DI / lifetime:
- AuditLog-003: All 3 AuditLog actor handlers switched to CreateAsyncScope
  + await using — async EF disposal no longer swallowed.
- AuditLog-007: INodeIdentityProvider resolution standardised on
  GetRequiredService<>() (was mixed with GetService<>()).
- AuditLog-011: AddAuditLogHealthMetricsBridge guarded by sentinel
  descriptor check — calling twice no longer double-registers the hosted
  service.

Shutdown / supervision:
- SiteCallAudit-002: AkkaHostedService adds a CoordinatedShutdown
  cluster-leave task (drain-site-call-audit-singleton) that issues a
  bounded GracefulStop(10s) so failover waits for in-flight upserts.

Registration safety:
- NS-020: AkkaHostedService now guards NotificationForwarder S&F
  registration with _notificationDeliveryHandlerRegistered + throws
  InvalidOperationException on double-register to make the regression loud.

VERIFY-only closures:
- NotifOutbox-005: Confirmed already closed by CD-015 fix (ac96b83) —
  NotificationOutboxRepository.InsertIfNotExistsAsync uses the same
  raw-SQL IF NOT EXISTS + 2601/2627 swallow pattern; race eliminated.

5+ new regression tests (CentralUI sandbox WhenAll, ExternalCallResult
64-reader Barrier, AuditLog DI idempotency, RowVersion stale-throw,
SiteCallAudit-002 shutdown drain). Build clean; affected suites all green.
README regenerated: 65 open (was 75).
2026-05-28 07:29:41 -04:00