scadalink-design

Author	SHA1	Message	Date
Joseph Doherty	eb5fa8f2bc	test(auditlog): partition maintenance roll-forward end-to-end (#23 M6)	2026-05-20 19:38:07 -04:00
Joseph Doherty	2138534581	test(auditlog): partition-switch purge end-to-end (#23 M6)	2026-05-20 19:36:17 -04:00
Joseph Doherty	66f6724c5d	test(auditlog): outage + reconciliation recovery end-to-end (#23 M6)	2026-05-20 19:32:01 -04:00
Joseph Doherty	ef49b55cf6	fix(health): decouple AuditCentralHealthSnapshot from ActorSystem (#23 M6) The snapshot's per-site stalled latch now lives on the snapshot itself and is fed by SiteAuditTelemetryStalledTracker via ApplyStalled, removing the chain that required ActorSystem at DI composition time. The tracker is now constructed by AkkaHostedService once ActorSystem.Create returns, with a lock-guarded auxiliary-disposable list so concurrent host start/stop in tests cannot race the enumeration.	2026-05-20 19:25:28 -04:00
Joseph Doherty	2744011ce9	feat(health): surface AuditRedactionFailure in central snapshot (#23 M6)	2026-05-20 19:13:19 -04:00
Joseph Doherty	70ed8d4557	feat(health): CentralAuditWriteFailures + AuditCentralHealthSnapshot (#23 M6)	2026-05-20 19:11:52 -04:00
Joseph Doherty	42333a72ed	feat(health): SiteAuditTelemetryStalledTracker subscribes to EventStream (#23 M6)	2026-05-20 19:07:44 -04:00
Joseph Doherty	e93f655ce4	feat(health): SiteAuditBacklog metric (count + age + bytes) (#23 M6)	2026-05-20 19:02:01 -04:00
Joseph Doherty	75b060e0a8	feat(auditlog): AuditLogPartitionMaintenanceService monthly roll-forward (#23 M6)	2026-05-20 18:51:43 -04:00
Joseph Doherty	660fdc4e93	feat(auditlog): AuditLogPurgeActor daily partition-switch purge (#23 M6) Central singleton (M6-T4 Bundle C) that drives the daily AuditLog partition purge. On a configurable timer (default 24 hours) the actor: 1. Queries IAuditLogRepository.GetPartitionBoundariesOlderThanAsync for monthly boundaries whose latest OccurredAtUtc is older than DateTime.UtcNow - AuditLogOptions.RetentionDays. 2. For each eligible boundary calls SwitchOutPartitionAsync, which runs the drop-and-rebuild dance around UX_AuditLog_EventId. 3. Publishes AuditLogPurgedEvent(boundary, rowsDeleted, durationMs) on the actor-system EventStream so the Bundle E central health collector and ops surfaces can subscribe without coupling to this actor. Co-changes: * SwitchOutPartitionAsync returns long (rows deleted) — sampled BEFORE the switch via COUNT_BIG over the per-partition filter so the count reflects what the switch removed, not a post-purge scan of a table that no longer exists. All stub implementations updated. * AuditLogPurgeOptions: IntervalHours (default 24), IntervalOverride for tests, Interval property resolving either. * AuditLogPurgedEvent: record with MonthBoundary, RowsDeleted, DurationMs. Behavior: * Continue-on-error per boundary — one partition that throws does NOT abandon the rest of the tick. * DI scope opened per tick (IAuditLogRepository is a SCOPED EF Core service); mirrors SiteAuditReconciliationActor and AuditLogIngestActor. * SupervisorStrategy Resume keeps the singleton alive across leaked exceptions. * EventStream capture BEFORE the first await — Context is unsafe after await in async receive handlers (same pattern as Sender-capture in AuditLogIngestActor.OnIngestAsync). Tests: * Tick_Fires_OnDailyInterval — visible timer side effect. * Tick_OldPartitions_SwitchedOut — both seeded boundaries purged. * Tick_NewerPartitions_Untouched — empty enumerator → no switches. * Tick_PublishesPurgedEvent_WithRowCount — AuditLogPurgedEvent carries RowsDeleted and DurationMs. * Tick_SwitchThrows_OtherPartitionsStillProcessed — continue-on-error. * Threshold_UsesAuditLogOptionsRetentionDays — non-default 30-day window computed from UtcNow - RetentionDays. * EndToEnd_RealPartition_RowsRemoved_PurgedEventPublished — TestKit + MsSqlMigrationFixture: real partitioned table, Jan-2026 row purged, Apr-2026 row kept, AuditLogPurgedEvent observed via probe.	2026-05-20 18:36:31 -04:00
Joseph Doherty	6069a20e0f	fix(configdb): replace SwitchOutPartitionAsync stub with drop-and-rebuild dance (#23 M6) Replaces M1's NotSupportedException stub with the production DROP-INDEX → CREATE-staging → SWITCH PARTITION → DROP-staging → CREATE-INDEX dance documented in alog.md §4. UX_AuditLog_EventId is intentionally non-aligned with ps_AuditLog_Month so single-column EventId uniqueness can be enforced cheaply for InsertIfNotExistsAsync; SQL Server rejects ALTER TABLE SWITCH while a non-aligned unique index is present, so the implementation drops it, switches the partition data into a GUID-suffixed staging table on [PRIMARY], drops staging (discarding the rows), and rebuilds the unique index — all inside an explicit transaction with a CATCH that guarantees the unique index is rebuilt regardless of failure point. Also adds GetPartitionBoundariesOlderThanAsync to IAuditLogRepository: a CROSS APPLY over sys.partition_range_values + per-partition MAX(OccurredAtUtc) to enumerate retention-eligible months for the M6 purge actor (next commit). Tests verify: * Old partition's rows are removed; other months untouched * UX_AuditLog_EventId is rebuilt after a successful switch * InsertIfNotExistsAsync's first-write-wins idempotency still holds after switch * On engineered SWITCH failure (inbound FK from a probe table), SqlException propagates AND UX_AuditLog_EventId is still present (CATCH branch ran) * GetPartitionBoundariesOlderThanAsync returns only boundaries whose partition's MAX(OccurredAtUtc) is strictly older than the threshold; empty partitions excluded	2026-05-20 18:20:55 -04:00
Joseph Doherty	c763bd9a04	feat(auditlog): SiteAuditReconciliationActor central singleton (#23 M6)	2026-05-20 18:10:42 -04:00
Joseph Doherty	640fd07454	feat(comms): site-side PullAuditEvents handler (#23 M6)	2026-05-20 17:58:43 -04:00
Joseph Doherty	25d9acbce3	feat(comms): PullAuditEvents RPC for audit reconciliation (#23 M6)	2026-05-20 17:48:30 -04:00
Joseph Doherty	1856b63f0c	test(auditlog): redaction safety net edge cases (#23 M5)	2026-05-20 17:38:59 -04:00
Joseph Doherty	4eeda45f0e	test(auditlog): hot-path latency budget for IAuditPayloadFilter (#23 M5)	2026-05-20 17:36:29 -04:00
Joseph Doherty	b409afda2e	feat(auditlog): hot-reloadable AuditLogOptions + regex cache invalidation (#23 M5)	2026-05-20 17:35:15 -04:00
Joseph Doherty	23c0fd417e	feat(health): AuditRedactionFailure counter + bridge (#23 M5) Bundle C task M5-T7 — surface DefaultAuditPayloadFilter redactor over-redactions as a Site Health metric so a misconfigured / catastrophic regex shows up on /monitoring/health rather than disappearing into a NoOp sink. - SiteHealthReport: new 'AuditRedactionFailure' int field (defaulted to 0 for back-compat with existing producers/tests). - ISiteHealthCollector / SiteHealthCollector: new IncrementAuditRedactionFailure() — per-interval atomic counter with Interlocked, reset on CollectReport, mirroring the M2 Bundle G SiteAuditWriteFailures pattern. - HealthMetricsAuditRedactionFailureCounter: new bridge in ScadaLink.AuditLog.Site that forwards IAuditRedactionFailureCounter increments to ISiteHealthCollector — mirrors HealthMetricsAuditWriteFailureCounter one-for-one. - AddAuditLogHealthMetricsBridge: now ALSO Replaces the NoOpAuditRedactionFailureCounter binding with the health-metrics bridge, so a single AddAuditLogHealthMetricsBridge() call wires both the M2 Bundle G write-failure counter and the M5 Bundle C redaction-failure counter into the health report. Site-side only for M5 — the filter also runs on CentralAuditWriter and AuditLogIngestActor (where it just keeps the NoOp default), but a central-side health-metric surface for AuditRedactionFailure is deferred to M6 alongside the rest of the central health collector work. Tests: - AuditRedactionFailureMetricTests (HealthMonitoring) covers the SiteHealthCollector increment/report/reset shape (3 tests). - HealthMetricsAuditRedactionFailureCounterTests (AuditLog) covers the AuditLog → HealthMonitoring bridge (3 tests). - Existing CountCapturingHealthCollector stub in DeploymentManagerRedeployTests extended with the new no-op interface method. Verified: dotnet build clean, all 24 test projects green (the only failure on the first ScadaLink.SiteRuntime.Tests run was the known-flaky InstanceActorChildAttributeRaceTests, which passes on re-run both in isolation and in the full suite and is unrelated to these changes).	2026-05-20 17:28:33 -04:00
Joseph Doherty	9b1379ed9b	feat(auditlog): wire IAuditPayloadFilter into all writer paths (#23 M5) Bundle C task M5-T6 — plugs the IAuditPayloadFilter singleton into the three audit writer entry points so every event is truncated + redacted before persistence, regardless of which path it took to disk: - FallbackAuditWriter (site hot path): filter runs before the primary SQLite write AND the ring-buffer enqueue, so a recovery drain replays rows that are already capped/redacted. - CentralAuditWriter (central direct-write): filter runs before the per-call IAuditLogRepository.InsertIfNotExistsAsync. - AuditLogIngestActor (site→central telemetry): - OnIngestAsync resolves the filter from the per-message scope and applies it to each row before IngestedAtUtc stamping. - OnCachedTelemetryAsync (M3 dual-write) applies the filter to the audit half of every CachedTelemetryEntry before the audit-insert + site-call-upsert transaction. Filter parameter is optional (nullable) on each constructor so the existing test composition roots that don't pass one keep working unchanged — production DI wiring in AddAuditLog always passes the real filter through. ICentralAuditWriter registration switched from the open-ctor form to a factory so the filter flows through it. Tests: FilterIntegrationTests covers all three writer paths end-to-end (4 tests). Full ScadaLink.AuditLog.Tests suite: 146 passed, 0 failed, 0 skipped.	2026-05-20 17:21:57 -04:00
Joseph Doherty	5a7f3e8bf6	feat(auditlog): per-connection SQL parameter redaction opt-in (#23 M5)	2026-05-20 17:11:53 -04:00
Joseph Doherty	37f17dc4a8	feat(auditlog): body regex redaction with over-redaction safety net (#23 M5)	2026-05-20 17:09:36 -04:00
Joseph Doherty	ad7b330f43	feat(auditlog): HTTP header redaction stage (#23 M5)	2026-05-20 17:07:01 -04:00
Joseph Doherty	bba2ef1b4d	feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety (#23 M5)	2026-05-20 17:01:13 -04:00
Joseph Doherty	25cdf857c9	feat(auditlog): IAuditPayloadFilter contract (#23 M5)	2026-05-20 16:59:10 -04:00
Joseph Doherty	065c8259ae	test(auditlog): audit failures never abort user-facing actions (#23 M4)	2026-05-20 16:50:48 -04:00
Joseph Doherty	a7eea0a795	test(auditlog): Inbound API request audit end-to-end (#23 M4)	2026-05-20 16:48:27 -04:00
Joseph Doherty	02727b3a66	test(auditlog): Notify dispatcher audit trail end-to-end (#23 M4)	2026-05-20 16:47:09 -04:00
Joseph Doherty	56b26339ca	test(auditlog): DB sync emission end-to-end (#23 M4)	2026-05-20 16:43:55 -04:00
Joseph Doherty	1c862989b4	feat(inbound): register AuditWriteMiddleware in pipeline (#23 M4)	2026-05-20 16:35:13 -04:00
Joseph Doherty	3c3f7770c1	feat(inbound): AuditWriteMiddleware emitting InboundRequest/InboundAuthFailure (#23 M4)	2026-05-20 16:35:03 -04:00
Joseph Doherty	855df759b5	feat(siteruntime): emit NotifySend(Submitted) on site-side Notify.To().Send (#23 M4) Audit Log #23 M4 Bundle C — Task C1: every script-initiated Notify.To(list).Send(...) now emits exactly one Notification/NotifySend audit row via the IAuditWriter wired through ScriptRuntimeContext. The row carries Status=Submitted, Target=list name, RequestSummary={subject,body} JSON (M5 will redact), CorrelationId=NotificationId (parsed as Guid), provenance from context, ForwardState=Pending. Emission is best-effort per alog.md §7: a thrown audit writer is logged and swallowed inside the helper; the original NotificationId still flows back to the script and the underlying S&F enqueue still happened. Mirrors the M2 Bundle F ExternalSystem.Call wrapper pattern. Tests: 7 new tests in NotifySendAuditEmissionTests covering submitted-status, list-name target, request-summary JSON shape, writer-throws fail-safe, provenance, NotificationId/CorrelationId round-trip, and the null-writer degrade path.	2026-05-20 16:18:46 -04:00
Joseph Doherty	6de377a39e	feat(notif): emit NotifyDeliver(terminal) on terminal transitions (#23 M4) M4 Bundle B (B3) — NotificationOutboxActor emits a second NotifyDeliver audit row carrying the terminal AuditStatus whenever a notification transitions to a terminal state (Delivered, Parked, Discarded). - Dispatcher: after the B2 Attempted row, a Delivered or Parked row is emitted when the post-outcome status is terminal. Discarded is never produced by the dispatcher — only by the manual discard path. - Missing-adapter park: now emits both Attempted and terminal Parked, both carrying the same explanatory error. - Manual discard (DiscardAsync): after the row update, emits a terminal Discarded NotifyDeliver row with no error message (operator-driven cancellation, not a delivery error). - MapNotificationStatusToAuditStatus + IsTerminal helpers added; terminal emission shares BuildNotifyDeliverEvent with the B2 Attempted path so the two rows carry identical correlation/provenance fields. Audit failure NEVER aborts the user-facing action: every emission is wrapped in try/catch (defensive — the CentralAuditWriter itself swallows).	2026-05-20 16:12:44 -04:00
Joseph Doherty	1dfd67a90d	feat(notif): emit NotifyDeliver(Attempted) per dispatcher attempt (#23 M4) M4 Bundle B (B2) — NotificationOutboxActor's dispatcher loop emits a single AuditChannel.Notification / AuditKind.NotifyDeliver row with AuditStatus.Attempted for every delivery attempt (success, transient failure, permanent failure, and the missing-adapter park). - BuildNotifyDeliverEvent helper populates correlation id (parsed from the string NotificationId — sites generate Guid.NewGuid().ToString("N"), non-Guid ids fall through as null), list-name target, source site/instance/script provenance, and Actor=null (central dispatch has no authenticated end-user). - Attempt duration is measured around the adapter call and recorded as DurationMs so KPIs can compute per-attempt latency. - Emission is fire-and-forget (the writer swallows internally) and wrapped in try/catch — audit failure NEVER aborts the user-facing dispatch. Terminal-state emission lands separately in B3.	2026-05-20 16:08:06 -04:00
Joseph Doherty	b31747a632	feat(notif): NotificationOutboxActor + CentralAuditWriter wired (#23 M4) M4 Bundle B (B1) — add the central-only ICentralAuditWriter implementation and inject it into NotificationOutboxActor so subsequent tasks (B2/B3) can route attempt + terminal lifecycle events through the direct-write audit path. - CentralAuditWriter: thin wrapper around IAuditLogRepository.InsertIfNotExistsAsync; scope-per-call (matches AuditLogIngestActor / NotificationOutboxActor pattern); stamps IngestedAtUtc; swallows all internal failures (alog.md §13). - Registered as a singleton in AddAuditLog. - NotificationOutboxActor ctor takes ICentralAuditWriter (validated non-null). - Host wiring resolves the writer once from the root provider and passes it into the singleton's Props.Create call. - Existing TestKit fixtures updated with a NoOpCentralAuditWriter helper so tests that don't exercise audit emission still compile and pass.	2026-05-20 16:04:01 -04:00
Joseph Doherty	e4d902753b	feat(siteruntime): emit DbOutbound.DbWrite on sync Database.Execute/ExecuteReader (#23 M4) Audit Log #23 — M4 Bundle A (Tasks A1+A2): every script-initiated synchronous DB call routed through Database.Connection(name) now emits exactly one DbOutbound/DbWrite audit row. Implementation — three thin ADO.NET decorators in src/ScadaLink.SiteRuntime/Scripts/: - AuditingDbConnection: wraps the gateway-returned DbConnection so CreateDbCommand() hands the script an AuditingDbCommand. All other ADO.NET surface forwards unchanged. - AuditingDbCommand: intercepts ExecuteNonQuery / ExecuteScalar / ExecuteReader (sync + async). On terminal: Channel = DbOutbound, Kind = DbWrite, Status = Delivered|Failed, Extra = {"op":"write","rowsAffected":N} (Execute), {"op":"read","rowsReturned":N} (ExecuteReader), RequestSummary = JSON of SQL + parameter values (default capture; redaction in M5), Target = "<connection>.<first 60 chars of SQL>", DurationMs captured via Stopwatch, Provenance from ScriptRuntimeContext (SourceSiteId, SourceInstanceId, SourceScript). - AuditingDbDataReader: counts rows on Read/ReadAsync and fires the audit emission exactly once on Close/CloseAsync/Dispose. DatabaseHelper now takes an IAuditWriter; ScriptRuntimeContext.Database threads through _auditWriter. When the writer is null (tests / minimal hosts) Connection() returns the raw inner DbConnection unchanged. Best-effort emission (alog.md §7): mirrors M2 Bundle F's 3-layer fail-safe — build, write, continuation. Audit-build, audit-write, and audit-continuation faults are logged + swallowed; the original ADO.NET result (or original exception) flows back to the script untouched. The SiteAuditWriteFailures counter increments automatically through the existing FallbackAuditWriter (Bundle G). Tests — tests/ScadaLink.SiteRuntime.Tests/Scripts/DatabaseSyncEmissionTests.cs (7 new, all passing): 1. Execute / INSERT success — one DbWrite row, op=write, rowsAffected=1. 2. ExecuteScalar success — one DbWrite row, op=write. 3. Execute throws — Status=Failed, ErrorMessage + ErrorDetail set. 4. ExecuteReader success — op=read, rowsReturned counts rows pulled. 5. AuditWriter throws — original ADO.NET rowsAffected returned, no events captured, no exception propagates. 6. Provenance populated from context. 7. DurationMs recorded non-zero. Tests use Microsoft.Data.Sqlite in-memory (already transitively available via SiteRuntime). Total SiteRuntime test suite: 251 passing (244 baseline + 7 new). Full solution test suite passes.	2026-05-20 15:54:54 -04:00
Joseph Doherty	c3d4e6b1e0	test(auditlog): combined telemetry idempotency on retried packets (#23 M3)	2026-05-20 15:27:14 -04:00
Joseph Doherty	f063b35633	test(auditlog): cached DB write combined telemetry end-to-end (#23 M3)	2026-05-20 15:26:04 -04:00
Joseph Doherty	f4a7be4929	test(auditlog): cached call combined telemetry end-to-end (#23 M3)	2026-05-20 15:25:10 -04:00
Joseph Doherty	a3b0fb7f08	refactor(auditlog-tests): extract DirectActorSiteStreamAuditClient + add IngestCachedTelemetry support (#23 M3)	2026-05-20 15:21:44 -04:00
Joseph Doherty	f81750b2aa	fix(siteruntime): immediate-success CachedCall emits terminal telemetry (#23 M3) Bundle E left a gap in ExternalSystem.CachedCall: when the underlying HTTP call succeeds immediately (WasBuffered=false), the store-and-forward retry loop is never engaged and the ICachedCallLifecycleObserver hook never fires. As a result Tracking.Status(id) would stay in Submitted forever and the audit log would be missing the Attempted + CachedResolve pair the M3 contract requires. Fix: capture the ExternalCallResult returned by IExternalSystemClient.CachedCallAsync. When WasBuffered=false, emit the two missing telemetry packets from the helper itself: - ApiCallCached / Attempted (per-attempt mechanics row, HttpStatus + ErrorMessage extracted via the same regex the synchronous Call() audit row uses) - CachedResolve / Delivered on Success, or - CachedResolve / Failed on Success=false (immediate permanent failure or transient failure without S&F). The terminal CachedResolve row carries TerminalAtUtc so SiteCallAudit can recognise the row as eligible for purge. The WasBuffered=true path is unaffected — the S&F retry loop owns the Attempted + Resolve emissions there via the CachedCallLifecycleBridge. Database.CachedWrite is unaffected too because IDatabaseGateway.CachedWriteAsync always enqueues into S&F (no immediate-success path). Both new emissions are best-effort: a throwing forwarder is logged and swallowed (alog.md §7) and each row is independently try/catch-wrapped so a single fault cannot drop both halves of the terminal pair. Tests in ExternalSystemCachedCallEmissionTests: - CachedCall_ImmediateSuccess_EmitsAttemptedAndCachedResolve - CachedCall_ImmediateFailure_EmitsAttemptedAndCachedResolveFailed - CachedCall_BufferedPath_DoesNotEmitTerminalTelemetryFromHelper Full suite: 244 SiteRuntime tests (3 new), 200 Host tests, all green.	2026-05-20 15:15:11 -04:00
Joseph Doherty	6fe23a4d9b	feat(host): register SiteCallAuditActor + CachedCallTelemetry forwarder/bridge (#22 , #23 M3) M3 Bundle F (Task F1) wires the cached-call audit pipeline through the composition roots: - Central: register SiteCallAuditActor as a cluster singleton + proxy (mirrors AuditLogIngestActor and NotificationOutboxActor). Program.cs calls .AddSiteCallAudit() on the central role. - Site: register ICachedCallTelemetryForwarder + CachedCallLifecycleBridge in AddAuditLog (lazy factory — Central nodes degrade to audit-only emission because IOperationTrackingStore is site-only). - Site: bind CachedCallLifecycleBridge to ICachedCallLifecycleObserver so StoreAndForwardService picks it up via DI. - Site: introduce IStoreAndForwardSiteContext + Host adapter to surface the site id to StoreAndForwardService without creating a StoreAndForward -> HealthMonitoring project-reference cycle. - ScriptExecutionActor resolves ICachedCallTelemetryForwarder per script scope and threads it into ScriptRuntimeContext. CachedCallTelemetryForwarder's IOperationTrackingStore dependency is now nullable so Central DI validation succeeds with the lazy registration; the forwarder's tracking-half emission is a no-op when the store is absent. Tests: - AkkaHostedServiceAuditWiringTests: Central host builds with AddSiteCallAudit and resolves ICachedCallTelemetryForwarder; Site resolves the forwarder + bridge + observer + IStoreAndForwardSiteContext. - Full solution: 194 Host tests green, 241 SiteRuntime tests green, every other suite unchanged.	2026-05-20 15:10:47 -04:00
Joseph Doherty	047988e4c8	feat(siteruntime): Database.CachedWrite emits combined telemetry + S&F audit bridge (#23 M3) Wire the M3 cached-call audit pipeline end-to-end for the database channel and close the loop between the S&F lifecycle observer and the site-side dual emitter. * DatabaseCachedWriteEmissionTests covers Database.CachedWrite (set up in Bundle E3): mints a TrackedOperationId, emits one CachedSubmit packet on DbOutbound, threads the id into IDatabaseGateway, and is best-effort on a thrown forwarder. Mirrors ExternalSystem.CachedCall coverage from E3. * CachedCallLifecycleBridge (new) implements ICachedCallLifecycleObserver and lives alongside CachedCallTelemetryForwarder. The bridge ingests per-attempt notifications from the S&F retry loop and fans them out to the forwarder: - TransientFailure -> 1 Attempted row - Delivered -> Attempted + CachedResolve(Delivered) - PermanentFailure -> Attempted + CachedResolve(Parked) - ParkedMaxRetries -> Attempted + CachedResolve(Parked) Channel string -> AuditKind mapping (ApiOutbound->ApiCallCached, DbOutbound->DbWriteCached). Best-effort top-level catch swallows any unexpected throw so the S&F retry bookkeeping is never disturbed. * Bridge tests (7) cover all four outcomes, channel mapping, provenance propagation, and the no-throw-on-forwarder-failure contract. Bundle F (Host registration) will instantiate the bridge and inject it into StoreAndForwardService.cachedCallObserver, closing the wiring path end-to-end. Bundle E task E6.	2026-05-20 14:55:17 -04:00
Joseph Doherty	63eb1f4225	feat(snf): per-attempt and terminal cached-call lifecycle observer (#23 M3) Hook the store-and-forward retry loop so the audit pipeline can emit per-attempt + terminal telemetry under the original TrackedOperationId (Bundle E Tasks E4 + E5). New seam: * ICachedCallLifecycleObserver + CachedCallAttemptContext in Commons.Interfaces.Services. Outcome enum (Delivered / TransientFailure / PermanentFailure / ParkedMaxRetries) is S&F-vocabulary; the bridge living in ScadaLink.AuditLog (Bundle F) will map it to the AuditKind/AuditStatus pair when building the CachedCallTelemetry packet. * StoreAndForwardService gains an optional cachedCallObserver constructor parameter + siteId. RetryMessageAsync fires the observer exactly once per attempt with the appropriate outcome: - handler returns true -> Delivered - handler returns false -> PermanentFailure (and parks) - handler throws + retries remaining -> TransientFailure - handler throws + max retries hit -> ParkedMaxRetries (and parks) Hook is best-effort: a thrown observer is logged + swallowed so a failing audit pipeline can never be misclassified as a transient delivery failure or corrupt the retry-count bookkeeping (alog.md §7). Only cached-call categories (ExternalSystem, CachedDbWrite) generate notifications — Notification category has its own central-side audit pipeline (Notification Outbox / #21). Pre-M3 callers that didn't thread a TrackedOperationId into the S&F message id are silently skipped — the observer requires a parseable id by contract. New S&F callers stamp the id as messageId (Bundle E3). Bundle E tasks E4 + E5.	2026-05-20 14:52:34 -04:00
Joseph Doherty	42430dd10a	feat(siteruntime): ExternalSystem.CachedCall emits CachedSubmit telemetry (#23 M3) Rework ScriptRuntimeContext.ExternalSystem.CachedCall to fit the M3 combined-telemetry model: * Mints a fresh TrackedOperationId and emits one CachedSubmit packet via ICachedCallTelemetryForwarder BEFORE handing the call off — the SiteCalls row is materialised before the first delivery attempt so Tracking.Status(id) can observe a Submitted row even if immediate delivery resolves before the helper returns. * Threads the TrackedOperationId into IExternalSystemClient.CachedCallAsync as a new optional parameter (and into IDatabaseGateway.CachedWriteAsync for the Database mirror set up here for E6). The gateway uses the id as the StoreAndForward messageId so the retry loop (Tasks E4/E5) can recover it from StoreAndForwardMessage.Id. * Returns the TrackedOperationId rather than ExternalCallResult — the script's contract is now "get a tracking handle, observe outcome via Tracking.Status". Best-effort emission: a thrown forwarder is logged + swallowed; the original call still runs and the id is still returned. DatabaseHelper gets the matching siteId / sourceScript / forwarder fields and a parallel CachedSubmit emitter (Channel=DbOutbound) so Task E6's Database.CachedWrite mirror plugs in without further runtime wiring. New ICachedCallTelemetryForwarder seam in Commons.Interfaces.Services so SiteRuntime depends on Commons (existing arrow) rather than ScadaLink.AuditLog (would have introduced a new dependency). Bundle E task E3 (and helper-shape work for E6).	2026-05-20 14:48:05 -04:00
Joseph Doherty	2145b29d4d	feat(auditlog): CachedCallTelemetryForwarder for site-side dual emission (#23 M3) Sister to SiteAuditTelemetryActor: takes a combined CachedCallTelemetry packet and fans it out to the two site-local stores. * AuditEvent half writes through IAuditWriter (the M2 FallbackAuditWriter + SqliteAuditWriter chain — same site SQLite hot-path as sync calls). * SiteCallOperational half maps Audit.Kind to the matching IOperationTrackingStore method: - CachedSubmit -> RecordEnqueueAsync (insert-if-not-exists) - ApiCallCached / DbWriteCached -> RecordAttemptAsync (monotonic) - CachedResolve -> RecordTerminalAsync (first-write-wins) Best-effort contract (alog.md §7): independent try/catch per half so a thrown writer cannot starve the tracking row (and vice-versa); both failures are logged at warning level and swallowed — the calling script never sees them. Wire push deferred to M6 — the NoOp ISiteStreamAuditClient binding stays in effect; the forwarder writes only to the local stores in M3. The existing SiteAuditTelemetryActor drain loop will sweep the audit rows once a real gRPC client lands. Bundle E task E2.	2026-05-20 14:41:15 -04:00
Joseph Doherty	73719ee066	feat(auditlog): extend ISiteStreamAuditClient with IngestCachedTelemetryAsync (#23 M3) Add the second site→central RPC seam alongside the existing M2 IngestAuditEventsAsync. The Bundle D proto already lit up IngestCachedTelemetry (CachedTelemetryBatch / IngestAck) so this commit just plumbs the client-side abstraction: * ISiteStreamAuditClient gains IngestCachedTelemetryAsync(batch, ct). * NoOpSiteStreamAuditClient implements it returning an empty IngestAck (same shape as M2 — production gRPC client lands in M6). * SyncCallEmissionEndToEndTests' DirectActorSiteStreamAuditClient stub throws NotSupportedException from the new method so a regression that accidentally routes a cached packet through the sync stub fails loudly. * New NoOpSiteStreamAuditClientTests cover the null-guard + empty-ack contract for both batch shapes. Bundle E task E1.	2026-05-20 14:39:24 -04:00
Joseph Doherty	0a97fff906	feat(auditlog): combined telemetry dual-write transaction (#23 M3)	2026-05-20 14:33:14 -04:00
Joseph Doherty	2b54290c7f	feat(comms): IngestCachedTelemetry RPC + CachedTelemetryPacket proto (#23 M3)	2026-05-20 14:24:13 -04:00
Joseph Doherty	de110f8b42	feat(scaudit): SiteCallAuditActor minimum surface (#22 , #23 M3) Bundle C of Audit Log #23 M3. Adds the ScadaLink.SiteCallAudit project + matching tests project, mirroring the ScadaLink.AuditLog scaffolding pattern (net10.0, central package management, InternalsVisibleTo to the tests assembly). SiteCallAuditActor is the central singleton entry point for Site Call Audit (#22): it receives UpsertSiteCallCommand and persists the SiteCall via ISiteCallAuditRepository.UpsertAsync (monotonic, idempotent — out-of-order or duplicate updates are silent no-ops at the repo). Audit-write failures NEVER abort the user-facing action (CLAUDE.md): repository throws are caught + logged, the actor replies Accepted=false, and the singleton stays alive (Resume supervisor strategy as defence in depth). Two constructors mirror AuditLogIngestActor: - IServiceProvider production constructor resolves the scoped EF repository from a fresh DI scope per message. - ISiteCallAuditRepository test constructor injects a concrete repository so the TestKit tests exercise the real monotonic-upsert SQL end to end. UpsertSiteCallCommand + UpsertSiteCallReply live in ScadaLink.Commons (same home as IngestAuditEventsCommand) so Bundle D's gRPC server can construct them without taking a project reference on the actor's host project. AddSiteCallAudit() is a placeholder for symmetry with AddAuditLog / AddNotificationOutbox; Bundle F will populate it with the actor's Props factory + options bindings. Tests (Akka.TestKit.Xunit2 + MsSqlMigrationFixture via project ref to ScadaLink.ConfigurationDatabase.Tests, mirroring Bundle D2): - Receive_UpsertSiteCallCommand_Persists_Replies_Accepted - Receive_DuplicateUpsert_OlderStatus_NoOp_StillRepliesAccepted (idempotency) - Receive_RepoThrowsTransient_RepliesAccepted_False_ActorStaysAlive Reconciliation, KPIs, and the central->site Retry/Discard relay are deferred per CLAUDE.md scope discipline. ScadaLink.slnx updated to include both new projects. All 3 new tests pass against the running infra/mssql container; full suite (2683 tests across 27 projects) passes with no regressions.	2026-05-20 14:18:49 -04:00
Joseph Doherty	bedfa6b8f3	feat(configdb): ISiteCallAuditRepository + EF impl, monotonic upsert (#22 , #23 M3) Bundle B3 of Audit Log #23 M3: data-access layer for the central SiteCalls table introduced in B1+B2. UpsertAsync is insert-if-not-exists then monotonic-status update so out-of-order telemetry, duplicate gRPC packets, and reconciliation pulls all converge on the same row without rolling state backward. - src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs: UpsertAsync (monotonic), GetAsync, QueryAsync, PurgeTerminalAsync. - src/ScadaLink.Commons/Types/Audit/SiteCallQueryFilter.cs + SiteCallPaging.cs: filter (Channel/SourceSite/Status/Target/time range) and keyset paging cursor on (CreatedAtUtc DESC, TrackedOperationId DESC), mirrored on M1's AuditLog* equivalents. - src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs: raw-SQL InsertIfNotExists + conditional UPDATE with inline CASE rank compare (Submitted=0, Forwarded=1, Attempted/Skipped=2, terminal=3 — terminal statuses are mutually exclusive so e.g. Delivered cannot overwrite Parked). Duplicate-key violations (SQL 2601/2627) are swallowed at Debug, identical to AuditLogRepository's race-fix. QueryAsync uses FromSqlInterpolated because EF Core 10 cannot translate string.Compare against the value-converted TrackedOperationId column inside an expression tree. - ServiceCollectionExtensions wires the repository (scoped, after IAuditLogRepository). - 12 integration tests in tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/ (MsSqlMigrationFixture + [SkippableFact]): fresh insert, monotonic advance, older-status no-op, same-status no-op, terminal-over-terminal no-op, 50-way concurrent-insert race produces exactly one row, Get known/unknown, filter by site, keyset paging no overlap, purge terminal-and-old, purge keeps non-terminal-and-recent.	2026-05-20 14:10:24 -04:00
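
The monotonic upsert described in commit bedfa6b8f3 (insert-if-not-exists, then advance the status only when its rank is strictly higher) can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's code: the real implementation is raw SQL with an inline CASE rank compare, whereas this sketch uses an in-memory dict. The names `RANK` and `upsert` are hypothetical, and the set of terminal statuses shown (Delivered, Parked, Failed) is assumed from statuses mentioned elsewhere in the log.

```python
# Illustrative sketch only: an in-memory stand-in for the SiteCalls table.
# Rank values follow the commit message; the terminal set is an assumption.
RANK = {
    "Submitted": 0,
    "Forwarded": 1,
    "Attempted": 2,
    "Skipped": 2,
    # Terminal statuses share rank 3, so one terminal cannot overwrite another.
    "Delivered": 3,
    "Parked": 3,
    "Failed": 3,
}


def upsert(store: dict, op_id: str, status: str) -> bool:
    """Insert the row if missing; otherwise advance only to a strictly higher rank.

    Returns True when the stored row changed, False for a silent no-op.
    """
    current = store.get(op_id)
    if current is None:
        store[op_id] = status  # insert-if-not-exists
        return True
    if RANK[status] > RANK[current]:
        store[op_id] = status  # monotonic advance
        return True
    return False  # older, same-rank, or terminal-over-terminal update


calls = {}
upsert(calls, "op-1", "Submitted")
upsert(calls, "op-1", "Attempted")
upsert(calls, "op-1", "Submitted")  # out-of-order packet: ignored
upsert(calls, "op-1", "Parked")
upsert(calls, "op-1", "Delivered")  # terminal over terminal: ignored
print(calls["op-1"])  # Parked
```

Because the comparison is strict, replaying any packet in any order converges on the same row, which is the property the commit's duplicate and out-of-order tests exercise.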


340 Commits