Commit Graph

20 Commits

Author SHA1 Message Date
Joseph Doherty
c00603e2a4 feat(auditlog): thread ParentExecutionId through S&F for retry-loop cached rows
The store-and-forward retry loop emits the per-attempt and terminal cached
audit rows (ApiCallCached/DbWriteCached Attempted, CachedResolve) via
CachedCallLifecycleBridge from a CachedCallAttemptContext, not from the
script context. The ExecutionId rollout (Task 4) already threaded ExecutionId
and SourceScript through this path; ParentExecutionId — the spawning
inbound-API request's ExecutionId — was not, so those retry-loop rows had
ParentExecutionId = null even for an inbound-API-routed run.

Thread it additively as a sibling at every carry point ExecutionId passes
through:

- StoreAndForwardMessage gains ParentExecutionId (Guid?).
- StoreAndForwardStorage adds a nullable parent_execution_id column via the
  same idempotent PRAGMA-probed ALTER TABLE migration; rows persisted by an
  older build read back null (back-compat). The defensive Guid.TryParse read
  helper (ParseExecutionId) is renamed ParseGuidColumn and reused for both
  columns so a corrupt value cannot abort the retry sweep.
- StoreAndForwardService.EnqueueAsync gains an optional parentExecutionId
  param, stamped onto the buffered message and surfaced on the
  CachedCallAttemptContext built in the retry loop.
- CachedCallAttemptContext gains ParentExecutionId.
- CachedCallLifecycleBridge.BuildPacket sets AuditEvent.ParentExecutionId
  from the context, beside the existing ExecutionId.
- IExternalSystemClient.CachedCallAsync / IDatabaseGateway.CachedWriteAsync
  gain an optional parentExecutionId param; ScriptRuntimeContext's CachedCall
  / CachedWrite helpers pass _parentExecutionId.

All threading is additive — ParentExecutionId is Guid? everywhere, null for
non-routed runs, and old buffered S&F rows still deserialize with the new
field null.
2026-05-21 17:58:11 -04:00
Joseph Doherty
6aac4c8ed7 test(auditlog): pin OriginExecutionId preservation in forwarder + Parked NotifyDeliver 2026-05-21 15:42:45 -04:00
Joseph Doherty
705ae95404 test(auditlog): assert ExecutionId threading hops; defensive Guid parse on S&F read 2026-05-21 15:27:58 -04:00
Joseph Doherty
6f5a35f222 feat(auditlog): thread ExecutionId through S&F for retry-loop cached rows
The store-and-forward retry loop emits the per-attempt and terminal cached
audit rows (ApiCallCached/DbWriteCached Attempted, CachedResolve) via
CachedCallLifecycleBridge from a CachedCallAttemptContext, not from the
script context. ExecutionId (and SourceScript) were not threaded through the
S&F buffer, so those rows had ExecutionId = null and SourceScript = null.

Thread both, additively, from the cached-call enqueue path:

- StoreAndForwardMessage gains ExecutionId (Guid?) / SourceScript (string?).
- StoreAndForwardStorage adds nullable execution_id / source_script columns
  via an idempotent PRAGMA-probed ALTER TABLE migration; rows persisted by
  an older build read back null (back-compat).
- StoreAndForwardService.EnqueueAsync gains optional executionId /
  sourceScript params, stamped onto the buffered message and surfaced on the
  CachedCallAttemptContext built in the retry loop.
- CachedCallAttemptContext gains ExecutionId / SourceScript.
- CachedCallLifecycleBridge.BuildPacket sets AuditEvent.ExecutionId and
  AuditEvent.SourceScript from the context (replacing the hard-coded
  SourceScript = null and its now-stale comment).
- IExternalSystemClient.CachedCallAsync / IDatabaseGateway.CachedWriteAsync
  gain optional executionId / sourceScript params; ScriptRuntimeContext's
  CachedCall / CachedWrite helpers pass _executionId / _sourceScript.

Script-side cached rows (CachedSubmit, immediate Attempted+Resolve) are
unchanged. All threading is additive — old buffered S&F rows still
deserialize and process with the new fields null.
2026-05-21 15:18:35 -04:00
Joseph Doherty
7816b840c1 feat(sitecallaudit): central→site Retry/Discard relay for parked operations 2026-05-21 04:36:04 -04:00
Joseph Doherty
63eb1f4225 feat(snf): per-attempt and terminal cached-call lifecycle observer (#23 M3)
Hook the store-and-forward retry loop so the audit pipeline can emit
per-attempt + terminal telemetry under the original TrackedOperationId
(Bundle E Tasks E4 + E5).

New seam:

* ICachedCallLifecycleObserver + CachedCallAttemptContext in
  Commons.Interfaces.Services. Outcome enum
  (Delivered / TransientFailure / PermanentFailure / ParkedMaxRetries)
  is S&F-vocabulary; the bridge living in ScadaLink.AuditLog (Bundle F)
  will map it to the AuditKind/AuditStatus pair when building the
  CachedCallTelemetry packet.

* StoreAndForwardService gains an optional cachedCallObserver
  constructor parameter + siteId. RetryMessageAsync fires the observer
  exactly once per attempt with the appropriate outcome:
    - handler returns true               -> Delivered
    - handler returns false              -> PermanentFailure (and parks)
    - handler throws + retries remaining -> TransientFailure
    - handler throws + max retries hit   -> ParkedMaxRetries (and parks)

Hook is best-effort: a thrown observer is logged + swallowed so a
failing audit pipeline can never be misclassified as a transient
delivery failure or corrupt the retry-count bookkeeping (alog.md §7).

Only cached-call categories (ExternalSystem, CachedDbWrite) generate
notifications — Notification category has its own central-side
audit pipeline (Notification Outbox / #21).

Pre-M3 callers that didn't thread a TrackedOperationId into the S&F
message id are silently skipped — the observer requires a parseable id
by contract. New S&F callers stamp the id as messageId (Bundle E3).

Bundle E tasks E4 + E5.
2026-05-20 14:52:34 -04:00
Joseph Doherty
3326bddeb0 feat(notification-outbox): async Notify.Send with status handle
Notify.To(list).Send(subject,body) now generates a NotificationId GUID,
enqueues a Notification-category message into the site Store-and-Forward
Engine, and returns the NotificationId immediately (Task<string>). The
NotificationId is the single idempotency key end-to-end: it is the S&F
message Id, it is carried inside the buffered NotificationSubmit payload,
and it is the id the forwarder submits to central.

NotificationForwarder now deserializes the buffered payload as a
NotificationSubmit and reads NotificationId from it (re-stamping only the
site-owned SourceSiteId / SourceInstanceId), instead of deriving the id
from StoreAndForwardMessage.Id.

Adds NotifyHelper.Status(id): queries central via the site communication
actor; reports the site-local Forwarding state while the notification is
still buffered at the site, maps central's response when found, and
Unknown otherwise. Adds a NotificationDeliveryStatus record.

SiteCommunicationActor gains a NotificationStatusQuery forwarding handler
mirroring NotificationSubmit. StoreAndForwardService.EnqueueAsync gains an
optional messageId parameter and exposes GetMessageByIdAsync.
2026-05-19 02:30:51 -04:00
Joseph Doherty
05614e037a fix(notification-outbox): fall back to Target for empty notification list name 2026-05-19 02:21:22 -04:00
Joseph Doherty
6a77c12735 feat(notification-outbox): forward site S&F notifications to central 2026-05-19 02:16:27 -04:00
Joseph Doherty
0135a6b2a6 fix(store-and-forward): resolve StoreAndForward-015..017 — document maxRetries=0 contract, replicate operator retry/discard, real category in activity log 2026-05-17 03:18:41 -04:00
Joseph Doherty
9e2416b34c fix(store-and-forward): resolve StoreAndForward-006,007,008,009 — transactional parked reads, PipeTo, fault-isolated activity events; 002/011/012 deferred 2026-05-16 22:32:30 -04:00
Joseph Doherty
5672502d83 fix(store-and-forward): resolve StoreAndForward-004,005,010,013 — accurate handler-contract doc, conditional sweep writes, reset LastAttemptAt on parked retry, test coverage 2026-05-16 21:44:10 -04:00
Joseph Doherty
71c0564ec0 fix(store-and-forward): resolve StoreAndForward-003, re-triage 002 — fix retry-count off-by-one 2026-05-16 19:57:28 -04:00
Joseph Doherty
91438dcc1b fix(store-and-forward): create the SQLite database directory on init (StoreAndForward-014)
StoreAndForwardStorage.InitializeAsync opened a SqliteConnection against the
configured SqliteDbPath (default ./data/store-and-forward.db) without ensuring
the parent directory exists. SQLite creates the database file but not its
directory, so when data/ was absent the connection failed with
"SQLite Error 14: unable to open database file" — aborting the site host's
RegisterSiteActors at StoreAndForwardService.StartAsync.

This was the root cause of the six failing SiteActorPathTests. Production
masked it because the Docker image / deployment creates data/.

InitializeAsync now calls EnsureDatabaseDirectoryExists, which parses the
connection string and creates the parent directory of a file-backed database
(in-memory databases and bare filenames are skipped).

Regression test InitializeAsync_FileInMissingDirectory_CreatesDirectory fails
against the pre-fix code. Host suite now 155/155 green (was 149/155).
2026-05-16 19:13:00 -04:00
Joseph Doherty
61253e3269 fix(store-and-forward): resolve S&F delivery + replication wiring (3 Critical findings)
Resolves StoreAndForward-001, ExternalSystemGateway-001, NotificationService-001
— one systemic gap where buffered messages were persisted but never delivered,
and the active node never replicated its buffer to the standby.

Delivery handlers (ExternalSystemGateway-001 / NotificationService-001):
- AkkaHostedService registers delivery handlers for the ExternalSystem,
  CachedDbWrite and Notification categories after StoreAndForwardService starts;
  each resolves its scoped consumer in a fresh DI scope.
- ExternalSystemClient, DatabaseGateway and NotificationDeliveryService each
  gain a DeliverBufferedAsync method: re-resolve the target and re-attempt
  delivery, returning true/false/throwing per the transient-vs-permanent contract.
- EnqueueAsync gains an attemptImmediateDelivery flag; CachedCallAsync and
  NotificationDeliveryService.SendAsync pass false (they already attempted
  delivery themselves) so registering a handler does not dispatch twice.

Replication (StoreAndForward-001):
- ReplicationService is injected into StoreAndForwardService; a new BufferAsync
  helper replicates every enqueue, and successful-retry removes and parks are
  replicated too. Fire-and-forget, no-op when replication is disabled.

Tests: StoreAndForwardReplicationTests (Add/Remove/Park observed),
attemptImmediateDelivery behaviour, and DeliverBufferedAsync paths for each
consumer. Full solution builds; StoreAndForward/ExternalSystemGateway/
NotificationService suites green.
2026-05-16 18:58:11 -04:00
Joseph Doherty
9c60592632 build: adopt NuGet Central Package Management
Move all package versions into Directory.Packages.props so every project
resolves a single consistent version. Consolidates the Roslyn packages
(Microsoft.CodeAnalysis.CSharp.Scripting/Workspaces) onto 5.0.0, which
resolves the pre-existing NU1608 version-skew error in the test projects.
2026-05-16 15:56:30 -04:00
Joseph Doherty
ec1d8f1393 chore(deps): bump packages flagged by NU190x advisories
Restore inside the docker build was failing because TreatWarningsAsErrors
promotes NU1902/NU1903/NU1904 (vulnerable package warnings) to errors.
Bump the flagged packages to advisory-free versions:

- MailKit                                          4.15.1 -> 4.16.0    (GHSA-9j88-vvj5-vhgr)
- Microsoft.AspNetCore.DataProtection.EFCore       10.0.5 -> 10.0.7    (GHSA-9mv3-2cwr-p262, transitively pulls fixed System.Security.Cryptography.Xml — GHSA-37gx-xxp4-5rgx, GHSA-w3x6-4m5h-cxqf)
- OpenTelemetry.Api  (transitive via Akka.Hosting) 1.9.0  -> 1.15.3    (GHSA-g94r-2vxg-569j, GHSA-8785-wc3w-h8q6) — added as a direct PackageReference in ScadaLink.Host to override the Akka.Hosting pin

To resolve the NU1605 downgrade chain triggered by DataProtection.EFCore
10.0.7 (which transitively requires Microsoft.EntityFrameworkCore >= 10.0.7
and friends), bump every Microsoft.* 10.0.5 reference across src/ and
tests/ to 10.0.7 in lockstep.
2026-05-08 09:34:17 -04:00
Joseph Doherty
6ea38faa6f Phase 3C: Deployment pipeline & Store-and-Forward engine
Deployment Manager (WP-1–8, WP-16):
- DeploymentService: full pipeline (flatten→validate→send→track→audit)
- OperationLockManager: per-instance concurrency control
- StateTransitionValidator: Enabled/Disabled/NotDeployed transition matrix
- ArtifactDeploymentService: broadcast to all sites with per-site results
- Deployment identity (GUID + revision hash), idempotency, staleness detection
- Instance lifecycle commands (disable/enable/delete) with deduplication

Store-and-Forward (WP-9–15):
- StoreAndForwardStorage: SQLite persistence, 3 categories, no max buffer
- StoreAndForwardService: fixed-interval retry, transient-only buffering, parking
- ReplicationService: async best-effort to standby (fire-and-forget)
- Parked message management (query/retry/discard from central)
- Messages survive instance deletion, S&F drains on disable

620 tests pass (+79 new), zero warnings.
2026-03-16 21:27:18 -04:00
Joseph Doherty
fed5f5a82c Add .gitignore and remove tracked build artifacts (bin/obj) 2026-03-16 18:38:00 -04:00
Joseph Doherty
34190e1347 Phase 0 WP-0.1: Create .NET 10 solution structure with all 17 component projects
17 source projects (Commons + Host + 15 components) and 17 xUnit test projects.
SLNX format, net10.0, nullable enabled, warnings as errors. All components
reference Commons; Host references all components. Builds and tests clean.
2026-03-16 18:37:36 -04:00