The store-and-forward retry loop emits the per-attempt and terminal cached
audit rows (ApiCallCached/DbWriteCached Attempted, CachedResolve) via
CachedCallLifecycleBridge from a CachedCallAttemptContext, not from the
script context. The ExecutionId rollout (Task 4) already threaded ExecutionId
and SourceScript through this path; ParentExecutionId — the spawning
inbound-API request's ExecutionId — was not, so those retry-loop rows had
ParentExecutionId = null even for an inbound-API-routed run.
Thread it additively as a sibling at every carry point ExecutionId passes
through:
- StoreAndForwardMessage gains ParentExecutionId (Guid?).
- StoreAndForwardStorage adds a nullable parent_execution_id column via the
same idempotent PRAGMA-probed ALTER TABLE migration; rows persisted by an
older build read back null (back-compat). The defensive Guid.TryParse read
helper (ParseExecutionId) is renamed ParseGuidColumn and reused for both
columns so a corrupt value cannot abort the retry sweep.
- StoreAndForwardService.EnqueueAsync gains an optional parentExecutionId
param, stamped onto the buffered message and surfaced on the
CachedCallAttemptContext built in the retry loop.
- CachedCallAttemptContext gains ParentExecutionId.
- CachedCallLifecycleBridge.BuildPacket sets AuditEvent.ParentExecutionId
from the context, beside the existing ExecutionId.
- IExternalSystemClient.CachedCallAsync / IDatabaseGateway.CachedWriteAsync
gain an optional parentExecutionId param; ScriptRuntimeContext's CachedCall
/ CachedWrite helpers pass _parentExecutionId.
All threading is additive — ParentExecutionId is Guid? everywhere, null for
non-routed runs, and old buffered S&F rows still deserialize with the new
field null.
The store-and-forward retry loop emits the per-attempt and terminal cached
audit rows (ApiCallCached/DbWriteCached Attempted, CachedResolve) via
CachedCallLifecycleBridge from a CachedCallAttemptContext, not from the
script context. ExecutionId (and SourceScript) were not threaded through the
S&F buffer, so those rows had ExecutionId = null and SourceScript = null.
Thread both, additively, from the cached-call enqueue path:
- StoreAndForwardMessage gains ExecutionId (Guid?) / SourceScript (string?).
- StoreAndForwardStorage adds nullable execution_id / source_script columns
via an idempotent PRAGMA-probed ALTER TABLE migration; rows persisted by
an older build read back null (back-compat).
- StoreAndForwardService.EnqueueAsync gains optional executionId /
sourceScript params, stamped onto the buffered message and surfaced on the
CachedCallAttemptContext built in the retry loop.
- CachedCallAttemptContext gains ExecutionId / SourceScript.
- CachedCallLifecycleBridge.BuildPacket sets AuditEvent.ExecutionId and
AuditEvent.SourceScript from the context (replacing the hard-coded
SourceScript = null and its now-stale comment).
- IExternalSystemClient.CachedCallAsync / IDatabaseGateway.CachedWriteAsync
gain optional executionId / sourceScript params; ScriptRuntimeContext's
CachedCall / CachedWrite helpers pass _executionId / _sourceScript.
Script-side cached rows (CachedSubmit, immediate Attempted+Resolve) are
unchanged. All threading is additive — old buffered S&F rows still
deserialize and process with the new fields null.
M3 Bundle F (Task F1) wires the cached-call audit pipeline through the
composition roots:
- Central: register SiteCallAuditActor as a cluster singleton + proxy
(mirrors AuditLogIngestActor and NotificationOutboxActor). Program.cs
calls .AddSiteCallAudit() on the central role.
- Site: register ICachedCallTelemetryForwarder + CachedCallLifecycleBridge
in AddAuditLog (lazy factory — Central nodes degrade to audit-only
emission because IOperationTrackingStore is site-only).
- Site: bind CachedCallLifecycleBridge to ICachedCallLifecycleObserver so
StoreAndForwardService picks it up via DI.
- Site: introduce IStoreAndForwardSiteContext + Host adapter to surface the
site id to StoreAndForwardService without creating a
StoreAndForward -> HealthMonitoring project-reference cycle.
- ScriptExecutionActor resolves ICachedCallTelemetryForwarder per script
scope and threads it into ScriptRuntimeContext.
CachedCallTelemetryForwarder's IOperationTrackingStore dependency is now
nullable so Central DI validation succeeds with the lazy registration; the
forwarder's tracking-half emission is a no-op when the store is absent.
Tests:
- AkkaHostedServiceAuditWiringTests: Central host builds with
AddSiteCallAudit and resolves ICachedCallTelemetryForwarder; Site
resolves the forwarder + bridge + observer + IStoreAndForwardSiteContext.
- Full solution: 194 Host tests green, 241 SiteRuntime tests green, every
other suite unchanged.
Hook the store-and-forward retry loop so the audit pipeline can emit
per-attempt + terminal telemetry under the original TrackedOperationId
(Bundle E Tasks E4 + E5).
New seam:
* ICachedCallLifecycleObserver + CachedCallAttemptContext in
Commons.Interfaces.Services. Outcome enum
(Delivered / TransientFailure / PermanentFailure / ParkedMaxRetries)
is S&F-vocabulary; the bridge living in ScadaLink.AuditLog (Bundle F)
will map it to the AuditKind/AuditStatus pair when building the
CachedCallTelemetry packet.
* StoreAndForwardService gains an optional cachedCallObserver
constructor parameter + siteId. RetryMessageAsync fires the observer
exactly once per attempt with the appropriate outcome:
- handler returns true -> Delivered
- handler returns false -> PermanentFailure (and parks)
- handler throws + retries remaining -> TransientFailure
- handler throws + max retries hit -> ParkedMaxRetries (and parks)
Hook is best-effort: a thrown observer is logged + swallowed so a
failing audit pipeline can never be misclassified as a transient
delivery failure or corrupt the retry-count bookkeeping (alog.md §7).
Only cached-call categories (ExternalSystem, CachedDbWrite) generate
notifications — Notification category has its own central-side
audit pipeline (Notification Outbox / #21).
Pre-M3 callers that didn't thread a TrackedOperationId into the S&F
message id are silently skipped — the observer requires a parseable id
by contract. New S&F callers stamp the id as messageId (Bundle E3).
Bundle E tasks E4 + E5.
Notify.To(list).Send(subject,body) now generates a NotificationId GUID,
enqueues a Notification-category message into the site Store-and-Forward
Engine, and returns the NotificationId immediately (Task<string>). The
NotificationId is the single idempotency key end-to-end: it is the S&F
message Id, it is carried inside the buffered NotificationSubmit payload,
and it is the id the forwarder submits to central.
NotificationForwarder now deserializes the buffered payload as a
NotificationSubmit and reads NotificationId from it (re-stamping only the
site-owned SourceSiteId / SourceInstanceId), instead of deriving the id
from StoreAndForwardMessage.Id.
Adds NotifyHelper.Status(id): queries central via the site communication
actor; reports the site-local Forwarding state while the notification is
still buffered at the site, maps central's response when found, and
Unknown otherwise. Adds a NotificationDeliveryStatus record.
SiteCommunicationActor gains a NotificationStatusQuery forwarding handler
mirroring NotificationSubmit. StoreAndForwardService.EnqueueAsync gains an
optional messageId parameter and exposes GetMessageByIdAsync.
StoreAndForwardStorage.InitializeAsync opened a SqliteConnection against the
configured SqliteDbPath (default ./data/store-and-forward.db) without ensuring
the parent directory exists. SQLite creates the database file but not its
directory, so when data/ was absent the connection failed with
"SQLite Error 14: unable to open database file" — aborting the site host's
RegisterSiteActors at StoreAndForwardService.StartAsync.
This was the root cause of the six failing SiteActorPathTests. Production
masked it because the Docker image / deployment creates data/.
InitializeAsync now calls EnsureDatabaseDirectoryExists, which parses the
connection string and creates the parent directory of a file-backed database
(in-memory databases and bare filenames are skipped).
Regression test InitializeAsync_FileInMissingDirectory_CreatesDirectory fails
against the pre-fix code. Host suite now 155/155 green (was 149/155).
Resolves StoreAndForward-001, ExternalSystemGateway-001, NotificationService-001
— one systemic gap where buffered messages were persisted but never delivered,
and the active node never replicated its buffer to the standby.
Delivery handlers (ExternalSystemGateway-001 / NotificationService-001):
- AkkaHostedService registers delivery handlers for the ExternalSystem,
CachedDbWrite and Notification categories after StoreAndForwardService starts;
each resolves its scoped consumer in a fresh DI scope.
- ExternalSystemClient, DatabaseGateway and NotificationDeliveryService each
gain a DeliverBufferedAsync method: re-resolve the target and re-attempt
delivery, returning true/false/throwing per the transient-vs-permanent contract.
- EnqueueAsync gains an attemptImmediateDelivery flag; CachedCallAsync and
NotificationDeliveryService.SendAsync pass false (they already attempted
delivery themselves) so registering a handler does not dispatch twice.
Replication (StoreAndForward-001):
- ReplicationService is injected into StoreAndForwardService; a new BufferAsync
helper replicates every enqueue, and successful-retry removes and parks are
replicated too. Fire-and-forget, no-op when replication is disabled.
Tests: StoreAndForwardReplicationTests (Add/Remove/Park observed),
attemptImmediateDelivery behaviour, and DeliverBufferedAsync paths for each
consumer. Full solution builds; StoreAndForward/ExternalSystemGateway/
NotificationService suites green.
Move all package versions into Directory.Packages.props so every project
resolves a single consistent version. Consolidates the Roslyn packages
(Microsoft.CodeAnalysis.CSharp.Scripting/Workspaces) onto 5.0.0, which
resolves the pre-existing NU1608 version-skew error in the test projects.
Triage was painful on the old layout: a lone Site dropdown sat on a sparse
row, errors were truncated mid-sentence with a per-row View/Hide toggle
that on expand pushed an unwrapped <pre> through the table and shoved the
Actions column off-screen, all rows looked the same regardless of age or
attempt count, and OriginInstance — which tells you which instance
produced the failure — wasn't displayed at all even though the data was
on the entity.
This pass:
- Adds a real filter bar: Site, Category, Target system, Origin instance,
Age window, free-text search. Category/Target/Origin/Age/Search filter
the loaded page client-side; Site still drives the server query (and
changing site now auto-queries — one fewer click).
- Replaces the in-table expansion with an Offcanvas detail drawer.
Clicking a row slides in a side panel with full message ID + copy,
category label, origin, attempts, both timestamps in relative + absolute
form, the complete error (pre-wrap, scrollable), and big Retry / Discard
buttons. The table never overflows.
- Stacks Target + Method into one column (target in semibold, method
small/muted below) and surfaces Origin as a code-styled chip in a new
column ("—" muted when null).
- Severity left-border on each row, derived client-side from
AttemptCount/MaxAttempts and age of the last attempt: red when retries
are exhausted and last attempt was in the past hour, amber when
exhausted but stale, muted grey otherwise.
- Mini attempt progress bar under the n/max count, red when fully
exhausted and amber while partial.
- Relative timestamps ("5m ago", "1h ago", "2d ago") with absolute UTC on
hover via the title attribute — applies in both the table and the drawer.
- Bulk select: header checkbox selects the filtered set, per-row
checkboxes. When ≥1 selected, a sticky action strip slides in below the
filter bar offering Retry selected / Discard selected with the usual
confirm dialog. Toast reports per-item success/failure counts.
- Summary line next to the title: "N parked · K target systems · oldest
Xh ago" (and "(showing M of N)" when filters are active).
- ParkedMessageEntry contract extended additively with MaxAttempts,
Category, and OriginInstance so the UI has the data it needs for
severity, the category filter, and the new column.
- Bumped page size from 25 to 50 to better match the dense layout.
The Parked Messages page returned "Parked message handler not available"
because no actor was ever registered for ParkedMessages, and Retry/Discard
requests had no Receive at all (would have hit deadletters). On top of
that, StoreAndForwardService.StartAsync() was never called anywhere, so
the sf_messages SQLite table was never created and the retry timer never
ran — silently breaking all of S&F.
- New ParkedMessageHandlerActor bridges StoreAndForwardService.{Get,Retry,Discard}
using the Sender→Task→PipeTo pattern already used in DeploymentManagerActor.
- SiteCommunicationActor now routes ParkedMessageRetryRequest and
ParkedMessageDiscardRequest the same way as the existing Query handler.
- AkkaHostedService.RegisterSiteActors() resolves StoreAndForwardService,
calls StartAsync() to create the schema and start the timer, then
creates and registers the handler actor.
Restore inside the docker build was failing because TreatWarningsAsErrors
promotes NU1902/NU1903/NU1904 (vulnerable package warnings) to errors.
Bump the flagged packages to advisory-free versions:
- MailKit 4.15.1 -> 4.16.0 (GHSA-9j88-vvj5-vhgr)
- Microsoft.AspNetCore.DataProtection.EFCore 10.0.5 -> 10.0.7 (GHSA-9mv3-2cwr-p262, transitively pulls fixed System.Security.Cryptography.Xml — GHSA-37gx-xxp4-5rgx, GHSA-w3x6-4m5h-cxqf)
- OpenTelemetry.Api (transitive via Akka.Hosting) 1.9.0 -> 1.15.3 (GHSA-g94r-2vxg-569j, GHSA-8785-wc3w-h8q6) — added as a direct PackageReference in ScadaLink.Host to override the Akka.Hosting pin
To resolve the NU1605 downgrade chain triggered by DataProtection.EFCore
10.0.7 (which transitively requires Microsoft.EntityFrameworkCore >= 10.0.7
and friends), bump every Microsoft.* 10.0.5 reference across src/ and
tests/ to 10.0.7 in lockstep.
- AkkaHostedService: SetNodeHostname from NodeOptions
- DataConnectionActor: UpdateConnectionEndpoint on state transitions,
track per-tag quality counts and UpdateTagQuality on value changes
- HealthReportSender: query StoreAndForwardStorage for parked message count
- StoreAndForwardStorage: add GetParkedMessageCountAsync()
- WP-0.10: Role-based Host startup (Central=WebApplication, Site=generic Host),
15 component AddXxx() extension methods, MapCentralUI/MapInboundAPI stubs
- WP-0.11: 12 per-component options classes with config binding
- WP-0.12: Sample appsettings for central and site topologies
- Add execution procedure and checklist template to generate_plans.md
- Add phase-0-checklist.md for execution tracking
- Resolve all 21 open questions from plan generation
- Update IDataConnection with batch ops and IAsyncDisposable
57 tests pass, zero warnings.
17 source projects (Commons + Host + 15 components) and 17 xUnit test projects.
SLNX format, net10.0, nullable enabled, warnings as errors. All components
reference Commons; Host references all components. Builds and tests clean.