scadalink-design

Author	SHA1	Message	Date
Joseph Doherty	42333a72ed	feat(health): SiteAuditTelemetryStalledTracker subscribes to EventStream (#23 M6)	2026-05-20 19:07:44 -04:00
Joseph Doherty	e93f655ce4	feat(health): SiteAuditBacklog metric (count + age + bytes) (#23 M6)	2026-05-20 19:02:01 -04:00
Joseph Doherty	75b060e0a8	feat(auditlog): AuditLogPartitionMaintenanceService monthly roll-forward (#23 M6)	2026-05-20 18:51:43 -04:00
Joseph Doherty	cc2d6e91f1	fix(auditlog): SiteAuditReconciliationActor captures EventStream before await (#23 M6)	2026-05-20 18:39:19 -04:00
Joseph Doherty	660fdc4e93	feat(auditlog): AuditLogPurgeActor daily partition-switch purge (#23 M6) Central singleton (M6-T4 Bundle C) that drives the daily AuditLog partition purge. On a configurable timer (default 24 hours) the actor: 1. Queries IAuditLogRepository.GetPartitionBoundariesOlderThanAsync for monthly boundaries whose latest OccurredAtUtc is older than DateTime.UtcNow - AuditLogOptions.RetentionDays. 2. For each eligible boundary calls SwitchOutPartitionAsync, which runs the drop-and-rebuild dance around UX_AuditLog_EventId. 3. Publishes AuditLogPurgedEvent(boundary, rowsDeleted, durationMs) on the actor-system EventStream so the Bundle E central health collector and ops surfaces can subscribe without coupling to this actor. Co-changes: * SwitchOutPartitionAsync returns long (rows deleted) — sampled BEFORE the switch via COUNT_BIG over the per-partition filter so the count reflects what the switch removed, not a post-purge scan of a table that no longer exists. All stub implementations updated. * AuditLogPurgeOptions: IntervalHours (default 24), IntervalOverride for tests, Interval property resolving either. * AuditLogPurgedEvent: record with MonthBoundary, RowsDeleted, DurationMs. Behavior: * Continue-on-error per boundary — one partition that throws does NOT abandon the rest of the tick. * DI scope opened per tick (IAuditLogRepository is a SCOPED EF Core service); mirrors SiteAuditReconciliationActor and AuditLogIngestActor. * SupervisorStrategy Resume keeps the singleton alive across leaked exceptions. * EventStream capture BEFORE the first await — Context is unsafe after await in async receive handlers (same pattern as Sender-capture in AuditLogIngestActor.OnIngestAsync). Tests: * Tick_Fires_OnDailyInterval — visible timer side effect. * Tick_OldPartitions_SwitchedOut — both seeded boundaries purged. * Tick_NewerPartitions_Untouched — empty enumerator → no switches. 
* Tick_PublishesPurgedEvent_WithRowCount — AuditLogPurgedEvent carries RowsDeleted and DurationMs. * Tick_SwitchThrows_OtherPartitionsStillProcessed — continue-on-error. * Threshold_UsesAuditLogOptionsRetentionDays — non-default 30-day window computed from UtcNow - RetentionDays. * EndToEnd_RealPartition_RowsRemoved_PurgedEventPublished — TestKit + MsSqlMigrationFixture: real partitioned table, Jan-2026 row purged, Apr-2026 row kept, AuditLogPurgedEvent observed via probe.	2026-05-20 18:36:31 -04:00
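The purge tick described in commit 660fdc4e93 can be sketched roughly as follows. This is a Python analogue, not the project's C# actor: `run_purge_tick`, `InMemoryRepo`, and the `publish` callback are hypothetical names, and the in-memory repository only imitates the two repository calls named in the message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import time


@dataclass(frozen=True)
class AuditLogPurgedEvent:
    month_boundary: datetime
    rows_deleted: int
    duration_ms: int


class InMemoryRepo:
    """Hypothetical stand-in for IAuditLogRepository (assumption, not the real API)."""

    def __init__(self, partitions, broken=()):
        # boundary -> (latest OccurredAtUtc in the partition, row count)
        self.partitions = dict(partitions)
        self.broken = set(broken)

    def get_partition_boundaries_older_than(self, threshold):
        # Only partitions whose newest row is strictly older than the threshold.
        return sorted(b for b, (latest, _) in self.partitions.items() if latest < threshold)

    def switch_out_partition(self, boundary):
        if boundary in self.broken:
            raise RuntimeError("switch failed")
        _, rows = self.partitions.pop(boundary)
        return rows  # row count sampled before the partition disappears


def run_purge_tick(repo, publish, retention_days, now=None):
    """One tick: enumerate eligible boundaries, purge each, publish one event per purge."""
    now = now or datetime.now(timezone.utc)
    threshold = now - timedelta(days=retention_days)
    failures = []
    for boundary in repo.get_partition_boundaries_older_than(threshold):
        started = time.monotonic()
        try:
            rows_deleted = repo.switch_out_partition(boundary)
        except Exception as exc:
            # Continue-on-error: one failing partition does not abandon the rest of the tick.
            failures.append((boundary, exc))
            continue
        duration_ms = int((time.monotonic() - started) * 1000)
        publish(AuditLogPurgedEvent(boundary, rows_deleted, duration_ms))
    return failures
```

Passing `now` explicitly keeps the retention threshold deterministic in tests, which is the same reason the commit exposes an interval override for its own tests.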
Joseph Doherty	6069a20e0f	fix(configdb): replace SwitchOutPartitionAsync stub with drop-and-rebuild dance (#23 M6) Replaces M1's NotSupportedException stub with the production DROP-INDEX → CREATE-staging → SWITCH PARTITION → DROP-staging → CREATE-INDEX dance documented in alog.md §4. UX_AuditLog_EventId is intentionally non-aligned with ps_AuditLog_Month so single-column EventId uniqueness can be enforced cheaply for InsertIfNotExistsAsync; SQL Server rejects ALTER TABLE SWITCH while a non-aligned unique index is present, so the implementation drops it, switches the partition data into a GUID-suffixed staging table on [PRIMARY], drops staging (discarding the rows), and rebuilds the unique index — all inside an explicit transaction with a CATCH that guarantees the unique index is rebuilt regardless of failure point. Also adds GetPartitionBoundariesOlderThanAsync to IAuditLogRepository: a CROSS APPLY over sys.partition_range_values + per-partition MAX(OccurredAtUtc) to enumerate retention-eligible months for the M6 purge actor (next commit). Tests verify: * Old partition's rows are removed; other months untouched * UX_AuditLog_EventId is rebuilt after a successful switch * InsertIfNotExistsAsync's first-write-wins idempotency still holds after switch * On engineered SWITCH failure (inbound FK from a probe table), SqlException propagates AND UX_AuditLog_EventId is still present (CATCH branch ran) * GetPartitionBoundariesOlderThanAsync returns only boundaries whose partition's MAX(OccurredAtUtc) is strictly older than the threshold; empty partitions excluded	2026-05-20 18:20:55 -04:00
Joseph Doherty	c763bd9a04	feat(auditlog): SiteAuditReconciliationActor central singleton (#23 M6)	2026-05-20 18:10:42 -04:00
Joseph Doherty	640fd07454	feat(comms): site-side PullAuditEvents handler (#23 M6)	2026-05-20 17:58:43 -04:00
Joseph Doherty	25d9acbce3	feat(comms): PullAuditEvents RPC for audit reconciliation (#23 M6)	2026-05-20 17:48:30 -04:00
Joseph Doherty	b0584f7a08	docs(audit): add M6 reconciliation+purge+partition+health plan (#23) 6 bundles: proto+site handler, reconciliation actor, purge actor with drop-and-rebuild around UX index, partition maintenance, four health metrics, integration tests. M5 realities baked in.	2026-05-20 17:44:12 -04:00
Joseph Doherty	db05af897e	docs(audit): roadmap corrections after M5 M6 head records M5 realities: - IOptionsMonitor hot-reload pattern verified; M6 retention config can reuse. - AuditRedactionFailure counter site-only in M5; M6 wires central side. - Filter integration is at 3 writer entry points; purge actor doesn't emit so no filter integration needed. - SwitchOutPartitionAsync drop-and-rebuild dance required (M1 reality + M6-T4 already documents it). - M6 should land the real ISiteStreamAuditClient (Option A) so push telemetry leaves NoOp behind.	2026-05-20 17:43:44 -04:00
Joseph Doherty	adc490b690	Merge branch 'feature/audit-log-m5-payload-redaction': Audit Log #23 M5 Payload + Redaction M5 ships the payload filter pipeline. IAuditPayloadFilter runs between event construction and writer call: - Stage 1: HTTP header redaction (Authorization/Cookie/Set-Cookie/X-Api-Key default list from M1-T9; case-insensitive name match against JSON {headers,body} shape). - Stage 2: Body regex redaction (global + per-target). Patterns compiled at startup with 100ms budget; runtime 50ms timeout guard against catastrophic backtracking. Over-redact on exception + increment counter. - Stage 3: SQL parameter redaction (Channel=DbOutbound, per-connection opt-in via PerTargetOverrides[connection].RedactSqlParamsMatching). - Stage 4: UTF-8 boundary-safe truncation. Default cap 8 KB; error cap 64 KB on Status NOT IN (Delivered/Submitted/Forwarded). PayloadTruncated set to true when applied. Filter wired into all three writer entry points: - FallbackAuditWriter (site chain) — filter before SqliteAuditWriter. - CentralAuditWriter (central direct-write) — filter before IAuditLogRepository.InsertIfNotExistsAsync (NotificationOutbox dispatcher, AuditWriteMiddleware). - AuditLogIngestActor — filter before dual-write transaction. Health metric SiteAuditRedactionFailureCounter wired through the existing M2 Bundle G + M4 Bundle B health-bridge pattern; central-side counter deferred to M6 (the milestone that ships the full central health surface). Hot-reload via IOptionsMonitor + per-call CurrentValue read. Regex cache keyed by pattern string so changing the config naturally invalidates old patterns. Shipped: 11 commits, ~49 net new tests across AuditLog.Tests, HealthMonitoring.Tests, PerformanceTests. Full solution 24/24 test projects green. infra/* untouched on any branch commit.	2026-05-20 17:43:19 -04:00
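A minimal sketch of three of the M5 filter stages summarized in the merge above (header redaction, body regex redaction with over-redaction on error, and UTF-8 boundary-safe truncation). It is written in Python rather than the project's C#, the function names and the `[REDACTED]` marker are hypothetical, and the compile budget and 50ms runtime timeout are not modeled because Python's `re` has no timeout parameter.

```python
import json
import re

REDACTED = "[REDACTED]"  # hypothetical marker text, not taken from the source
DEFAULT_REDACTED_HEADERS = frozenset({"authorization", "cookie", "set-cookie", "x-api-key"})
SUCCESS_STATUSES = frozenset({"Delivered", "Submitted", "Forwarded"})


def redact_headers(payload_json, names=DEFAULT_REDACTED_HEADERS):
    """Stage 1: case-insensitive header-name match against a {headers, body} JSON shape."""
    doc = json.loads(payload_json)
    headers = doc.get("headers", {})
    doc["headers"] = {k: (REDACTED if k.lower() in names else v) for k, v in headers.items()}
    return json.dumps(doc)


def redact_body(body, patterns):
    """Stage 2: apply each compiled pattern; on any exception, over-redact the whole body."""
    try:
        for pattern in patterns:
            body = pattern.sub(REDACTED, body)
        return body
    except Exception:
        return REDACTED


def cap_for(status):
    """Default cap 8 KB; error cap 64 KB for statuses outside Delivered/Submitted/Forwarded."""
    return 8 * 1024 if status in SUCCESS_STATUSES else 64 * 1024


def truncate_utf8(text, cap_bytes):
    """Stage 4: cut at cap_bytes without splitting a multi-byte character.

    Returns (text, payload_truncated).
    """
    raw = text.encode("utf-8")
    if len(raw) <= cap_bytes:
        return text, False
    # errors="ignore" drops the incomplete trailing sequence left by the byte cut.
    return raw[:cap_bytes].decode("utf-8", errors="ignore"), True
```

For example, `truncate_utf8("héllo", 2)` cuts in the middle of the two-byte `é`, so the result keeps only `h` and reports that truncation was applied.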
Joseph Doherty	1856b63f0c	test(auditlog): redaction safety net edge cases (#23 M5)	2026-05-20 17:38:59 -04:00
Joseph Doherty	4eeda45f0e	test(auditlog): hot-path latency budget for IAuditPayloadFilter (#23 M5)	2026-05-20 17:36:29 -04:00
Joseph Doherty	b409afda2e	feat(auditlog): hot-reloadable AuditLogOptions + regex cache invalidation (#23 M5)	2026-05-20 17:35:15 -04:00
Joseph Doherty	23c0fd417e	feat(health): AuditRedactionFailure counter + bridge (#23 M5) Bundle C task M5-T7 — surface DefaultAuditPayloadFilter redactor over-redactions as a Site Health metric so a misconfigured / catastrophic regex shows up on /monitoring/health rather than disappearing into a NoOp sink. - SiteHealthReport: new 'AuditRedactionFailure' int field (defaulted to 0 for back-compat with existing producers/tests). - ISiteHealthCollector / SiteHealthCollector: new IncrementAuditRedactionFailure() — per-interval atomic counter with Interlocked, reset on CollectReport, mirroring the M2 Bundle G SiteAuditWriteFailures pattern. - HealthMetricsAuditRedactionFailureCounter: new bridge in ScadaLink.AuditLog.Site that forwards IAuditRedactionFailureCounter increments to ISiteHealthCollector — mirrors HealthMetricsAuditWriteFailureCounter one-for-one. - AddAuditLogHealthMetricsBridge: now ALSO Replaces the NoOpAuditRedactionFailureCounter binding with the health-metrics bridge, so a single AddAuditLogHealthMetricsBridge() call wires both the M2 Bundle G write-failure counter and the M5 Bundle C redaction-failure counter into the health report. Site-side only for M5 — the filter also runs on CentralAuditWriter and AuditLogIngestActor (where it just keeps the NoOp default), but a central-side health-metric surface for AuditRedactionFailure is deferred to M6 alongside the rest of the central health collector work. Tests: - AuditRedactionFailureMetricTests (HealthMonitoring) covers the SiteHealthCollector increment/report/reset shape (3 tests). - HealthMetricsAuditRedactionFailureCounterTests (AuditLog) covers the AuditLog → HealthMonitoring bridge (3 tests). - Existing CountCapturingHealthCollector stub in DeploymentManagerRedeployTests extended with the new no-op interface method. 
Verified: dotnet build clean, all 24 test projects green (the only Failed at first ScadaLink.SiteRuntime.Tests run was the known-flaky InstanceActorChildAttributeRaceTests; passes on re-run in isolation and full suite, unrelated to these changes).	2026-05-20 17:28:33 -04:00
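The per-interval counter and bridge from commit 23c0fd417e can be illustrated with a small Python analogue. The class and method names below are hypothetical, and a lock stands in for the `Interlocked` operations the commit mentions.

```python
import threading


class SiteHealthCollectorSketch:
    """Hypothetical analogue of the collector's per-interval redaction-failure counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._audit_redaction_failure = 0

    def increment_audit_redaction_failure(self):
        with self._lock:
            self._audit_redaction_failure += 1

    def collect_report(self):
        # Read and reset in one critical section so no increment is lost or double-counted.
        with self._lock:
            count, self._audit_redaction_failure = self._audit_redaction_failure, 0
        return {"AuditRedactionFailure": count}


class RedactionFailureBridgeSketch:
    """Forwards filter-side increments to the health collector (replaces a no-op counter)."""

    def __init__(self, collector):
        self._collector = collector

    def increment(self):
        self._collector.increment_audit_redaction_failure()
```

Each report therefore carries only the failures seen since the previous report, which matches the "reset on CollectReport" behaviour described in the message.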
Joseph Doherty	9b1379ed9b	feat(auditlog): wire IAuditPayloadFilter into all writer paths (#23 M5) Bundle C task M5-T6 — plugs the IAuditPayloadFilter singleton into the three audit writer entry points so every event is truncated + redacted before persistence, regardless of which path it took to disk: - FallbackAuditWriter (site hot path): filter runs before the primary SQLite write AND the ring-buffer enqueue, so a recovery drain replays rows that are already capped/redacted. - CentralAuditWriter (central direct-write): filter runs before the per-call IAuditLogRepository.InsertIfNotExistsAsync. - AuditLogIngestActor (site→central telemetry): - OnIngestAsync resolves the filter from the per-message scope and applies it to each row before IngestedAtUtc stamping. - OnCachedTelemetryAsync (M3 dual-write) applies the filter to the audit half of every CachedTelemetryEntry before the audit-insert + site-call-upsert transaction. Filter parameter is optional (nullable) on each constructor so the existing test composition roots that don't pass one keep working unchanged — production DI wiring in AddAuditLog always passes the real filter through. ICentralAuditWriter registration switched from the open-ctor form to a factory so the filter flows through it. Tests: FilterIntegrationTests covers all three writer paths end-to-end (4 tests). Full ScadaLink.AuditLog.Tests suite: 146 passed, 0 failed, 0 skipped.	2026-05-20 17:21:57 -04:00
Joseph Doherty	5a7f3e8bf6	feat(auditlog): per-connection SQL parameter redaction opt-in (#23 M5)	2026-05-20 17:11:53 -04:00
Joseph Doherty	37f17dc4a8	feat(auditlog): body regex redaction with over-redaction safety net (#23 M5)	2026-05-20 17:09:36 -04:00
Joseph Doherty	ad7b330f43	feat(auditlog): HTTP header redaction stage (#23 M5)	2026-05-20 17:07:01 -04:00
Joseph Doherty	bba2ef1b4d	feat(auditlog): DefaultAuditPayloadFilter truncation with UTF-8 boundary safety (#23 M5)	2026-05-20 17:01:13 -04:00
Joseph Doherty	25cdf857c9	feat(auditlog): IAuditPayloadFilter contract (#23 M5)	2026-05-20 16:59:10 -04:00
Joseph Doherty	e7b40c1c50	docs(audit): add M5 payload+redaction implementation plan (#23) 4 bundles: filter+truncation, redactors (header/body/SQL-param), wire into all emission paths + health metric, config+perf+safety-net. Vocabulary translation locked: error-row cap (64 KB) on Status NOT IN (Delivered, Submitted, Forwarded). Filter integration point in each writer (FallbackAuditWriter, CentralAuditWriter, AuditLogIngestActor) BEFORE storage call.	2026-05-20 16:56:56 -04:00
Joseph Doherty	dae6de2c48	docs(audit): roadmap corrections after M4 M5 head records M4 realities: - AuditingDbConnection/Command/DataReader decorators need filter plug-in at WriteAsync emission point. - CentralAuditWriter + FallbackAuditWriter are both filter integration points for the direct-write + chained-write paths. - InboundAPI middleware RequestSummary populated, ResponseSummary=null pending response-body buffering decision in M5. - UseWhen(/api/) path-scoped middleware gives natural per-target redaction hook. - Error-row cap raised on Status IN (Failed, Parked, Discarded, Attempted, Skipped) per M1 vocab reconciliation.	2026-05-20 16:56:18 -04:00
Joseph Doherty	ac7fc9ce4d	Merge branch 'feature/audit-log-m4-remaining-boundaries': Audit Log #23 M4 Remaining Boundary Emission M4 closes the script-trust-boundary emission gaps: - Sync DB writes/reads via AuditingDbConnection decorator (Channel=DbOutbound, Kind=DbWrite; Extra carries op + rowsAffected/rowsReturned). - Notification Outbox dispatcher: NotifyDeliver(Attempted) per attempt; NotifyDeliver(Delivered/Parked/Discarded) on terminal. Direct-write via new ICentralAuditWriter (CentralAuditWriter implementation wraps IAuditLogRepository.InsertIfNotExistsAsync with scope-per-call). - Site Notify.To().Send() emits NotifySend(Submitted) via the existing IAuditWriter site path; correlation via NotificationId. - Inbound API AuditWriteMiddleware emits InboundRequest on success, InboundAuthFailure on 401/403; Actor = API key NAME (never material); registered via UseWhen(/api/) AFTER UseAuthentication/UseAuthorization; audit failure NEVER changes HTTP response. Audit-write-failure-never-aborts-action proven end-to-end across all five new code paths via AuditWriteFailureSafetyTests (broken ICentralAuditWriter + broken IAuditWriter scenarios all green). Shipped: 12 commits, ~62 net new tests across SiteRuntime / NotificationOutbox / AuditLog / InboundAPI tests. Full solution 2763 tests passing. No regressions. infra/* untouched on any branch commit.	2026-05-20 16:55:45 -04:00
Joseph Doherty	065c8259ae	test(auditlog): audit failures never abort user-facing actions (#23 M4)	2026-05-20 16:50:48 -04:00
Joseph Doherty	a7eea0a795	test(auditlog): Inbound API request audit end-to-end (#23 M4)	2026-05-20 16:48:27 -04:00
Joseph Doherty	02727b3a66	test(auditlog): Notify dispatcher audit trail end-to-end (#23 M4)	2026-05-20 16:47:09 -04:00
Joseph Doherty	56b26339ca	test(auditlog): DB sync emission end-to-end (#23 M4)	2026-05-20 16:43:55 -04:00
Joseph Doherty	1c862989b4	feat(inbound): register AuditWriteMiddleware in pipeline (#23 M4)	2026-05-20 16:35:13 -04:00
Joseph Doherty	3c3f7770c1	feat(inbound): AuditWriteMiddleware emitting InboundRequest/InboundAuthFailure (#23 M4)	2026-05-20 16:35:03 -04:00
Joseph Doherty	855df759b5	feat(siteruntime): emit NotifySend(Submitted) on site-side Notify.To().Send (#23 M4) Audit Log #23 M4 Bundle C — Task C1: every script-initiated Notify.To(list).Send(...) now emits exactly one Notification/NotifySend audit row via the IAuditWriter wired through ScriptRuntimeContext. The row carries Status=Submitted, Target=list name, RequestSummary={subject,body} JSON (M5 will redact), CorrelationId=NotificationId (parsed as Guid), provenance from context, ForwardState=Pending. Emission is best-effort per alog.md §7: a thrown audit writer is logged and swallowed inside the helper; the original NotificationId still flows back to the script and the underlying S&F enqueue still happened. Mirrors the M2 Bundle F ExternalSystem.Call wrapper pattern. Tests: 7 new tests in NotifySendAuditEmissionTests covering submitted-status, list-name target, request-summary JSON shape, writer-throws fail-safe, provenance, NotificationId/CorrelationId round-trip, and the null-writer degrade path.	2026-05-20 16:18:46 -04:00
Joseph Doherty	6de377a39e	feat(notif): emit NotifyDeliver(terminal) on terminal transitions (#23 M4) M4 Bundle B (B3) — NotificationOutboxActor emits a second NotifyDeliver audit row carrying the terminal AuditStatus whenever a notification transitions to a terminal state (Delivered, Parked, Discarded). - Dispatcher: after the B2 Attempted row, a Delivered or Parked row is emitted when the post-outcome status is terminal. Discarded is never produced by the dispatcher — only by the manual discard path. - Missing-adapter park: now emits both Attempted and terminal Parked, both carrying the same explanatory error. - Manual discard (DiscardAsync): after the row update, emits a terminal Discarded NotifyDeliver row with no error message (operator-driven cancellation, not a delivery error). - MapNotificationStatusToAuditStatus + IsTerminal helpers added; terminal emission shares BuildNotifyDeliverEvent with the B2 Attempted path so the two rows carry identical correlation/provenance fields. Audit failure NEVER aborts the user-facing action: every emission is wrapped in try/catch (defensive — the CentralAuditWriter itself swallows).	2026-05-20 16:12:44 -04:00
Joseph Doherty	1dfd67a90d	feat(notif): emit NotifyDeliver(Attempted) per dispatcher attempt (#23 M4) M4 Bundle B (B2) — NotificationOutboxActor's dispatcher loop emits a single AuditChannel.Notification / AuditKind.NotifyDeliver row with AuditStatus.Attempted for every delivery attempt (success, transient failure, permanent failure, and the missing-adapter park). - BuildNotifyDeliverEvent helper populates correlation id (parsed from the string NotificationId — sites generate Guid.NewGuid().ToString("N"), non-Guid ids fall through as null), list-name target, source site/instance/script provenance, and Actor=null (central dispatch has no authenticated end-user). - Attempt duration is measured around the adapter call and recorded as DurationMs so KPIs can compute per-attempt latency. - Emission is fire-and-forget (the writer swallows internally) and wrapped in try/catch — audit failure NEVER aborts the user-facing dispatch. Terminal-state emission lands separately in B3.	2026-05-20 16:08:06 -04:00
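The correlation-id rule in commit 1dfd67a90d (ids generated as `Guid.NewGuid().ToString("N")` parse, anything else falls through as null) might look like this in Python. `parse_correlation_id` is a hypothetical helper name; Python's `uuid.UUID` accepts the same 32-hex-digit form that the "N" format produces.

```python
import uuid


def parse_correlation_id(notification_id):
    """Return a UUID for a parseable id, or None for non-Guid ids."""
    try:
        return uuid.UUID(notification_id)
    except (ValueError, TypeError, AttributeError):
        # Malformed strings, None, and non-string values all fall through as null.
        return None
```

Returning `None` instead of raising keeps the audit row buildable even when an older caller supplied a free-form id.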
Joseph Doherty	b31747a632	feat(notif): NotificationOutboxActor + CentralAuditWriter wired (#23 M4) M4 Bundle B (B1) — add the central-only ICentralAuditWriter implementation and inject it into NotificationOutboxActor so subsequent tasks (B2/B3) can route attempt + terminal lifecycle events through the direct-write audit path. - CentralAuditWriter: thin wrapper around IAuditLogRepository.InsertIfNotExistsAsync; scope-per-call (matches AuditLogIngestActor / NotificationOutboxActor pattern); stamps IngestedAtUtc; swallows all internal failures (alog.md §13). - Registered as a singleton in AddAuditLog. - NotificationOutboxActor ctor takes ICentralAuditWriter (validated non-null). - Host wiring resolves the writer once from the root provider and passes it into the singleton's Props.Create call. - Existing TestKit fixtures updated with a NoOpCentralAuditWriter helper so tests that don't exercise audit emission still compile and pass.	2026-05-20 16:04:01 -04:00
Joseph Doherty	e4d902753b	feat(siteruntime): emit DbOutbound.DbWrite on sync Database.Execute/ExecuteReader (#23 M4) Audit Log #23 — M4 Bundle A (Tasks A1+A2): every script-initiated synchronous DB call routed through Database.Connection(name) now emits exactly one DbOutbound/DbWrite audit row. Implementation — three thin ADO.NET decorators in src/ScadaLink.SiteRuntime/Scripts/: - AuditingDbConnection: wraps the gateway-returned DbConnection so CreateDbCommand() hands the script an AuditingDbCommand. All other ADO.NET surface forwards unchanged. - AuditingDbCommand: intercepts ExecuteNonQuery / ExecuteScalar / ExecuteReader (sync + async). On terminal: Channel = DbOutbound, Kind = DbWrite, Status = Delivered|Failed, Extra = {"op":"write","rowsAffected":N} (Execute), {"op":"read","rowsReturned":N} (ExecuteReader), RequestSummary = JSON of SQL + parameter values (default capture; redaction in M5), Target = "<connection>.<first 60 chars of SQL>", DurationMs captured via Stopwatch, Provenance from ScriptRuntimeContext (SourceSiteId, SourceInstanceId, SourceScript). - AuditingDbDataReader: counts rows on Read/ReadAsync and fires the audit emission exactly once on Close/CloseAsync/Dispose. DatabaseHelper now takes an IAuditWriter; ScriptRuntimeContext.Database threads through _auditWriter. When the writer is null (tests / minimal hosts) Connection() returns the raw inner DbConnection unchanged. Best-effort emission (alog.md §7): mirrors M2 Bundle F's 3-layer fail-safe — build, write, continuation. Audit-build, audit-write, and audit-continuation faults are logged + swallowed; the original ADO.NET result (or original exception) flows back to the script untouched. The SiteAuditWriteFailures counter increments automatically through the existing FallbackAuditWriter (Bundle G). Tests — tests/ScadaLink.SiteRuntime.Tests/Scripts/DatabaseSyncEmissionTests.cs (7 new, all passing): 1. Execute / INSERT success — one DbWrite row, op=write, rowsAffected=1. 2.
ExecuteScalar success — one DbWrite row, op=write. 3. Execute throws — Status=Failed, ErrorMessage + ErrorDetail set. 4. ExecuteReader success — op=read, rowsReturned counts rows pulled. 5. AuditWriter throws — original ADO.NET rowsAffected returned, no events captured, no exception propagates. 6. Provenance populated from context. 7. DurationMs recorded non-zero. Tests use Microsoft.Data.Sqlite in-memory (already transitively available via SiteRuntime). Total SiteRuntime test suite: 251 passing (244 baseline + 7 new). Full solution test suite passes.	2026-05-20 15:54:54 -04:00
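The decorator behaviour in commit e4d902753b can be approximated with Python's built-in `sqlite3`. This is an illustrative analogue only: `AuditingConnection` and the dictionary event shape are hypothetical, and the real implementation wraps ADO.NET types and also covers the reader path.

```python
import json
import sqlite3
import time


class AuditingConnection:
    """Hypothetical analogue of the AuditingDbConnection / AuditingDbCommand pair."""

    def __init__(self, inner, name, emit):
        self._inner = inner  # the raw sqlite3 connection
        self._name = name    # logical connection name used in Target
        self._emit = emit    # audit writer callback (may throw)

    def execute(self, sql, params=()):
        started = time.perf_counter()
        status, error, rows = "Delivered", None, None
        try:
            rows = self._inner.execute(sql, params).rowcount
            return rows
        except Exception as exc:
            status, error = "Failed", str(exc)
            raise  # the original exception flows back to the caller untouched
        finally:
            self._emit_safely(sql, params, status, error, rows, started)

    def _emit_safely(self, sql, params, status, error, rows, started):
        try:
            self._emit({
                "Channel": "DbOutbound",
                "Kind": "DbWrite",
                "Status": status,
                "Target": f"{self._name}.{sql[:60]}",
                "Extra": json.dumps({"op": "write", "rowsAffected": rows}),
                "RequestSummary": json.dumps({"sql": sql, "params": list(params)}),
                "ErrorMessage": error,
                "DurationMs": int((time.perf_counter() - started) * 1000),
            })
        except Exception:
            # Best-effort: an audit failure never changes the database result.
            pass
```

Because `_emit_safely` never raises, the `finally` block cannot mask either the return value or the original database exception.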
Joseph Doherty	c410fc6d43	docs(audit): add M4 remaining-boundaries implementation plan (#23) 5 bundles: DB sync emissions, NotificationOutbox central, site Notify.Send, Inbound API middleware, integration tests. M3-reality vocab baked in (DbWrite/NotifyDeliver/NotifySend/InboundRequest/InboundAuthFailure).	2026-05-20 15:41:13 -04:00
Joseph Doherty	f48efa7ca8	docs(audit): roadmap corrections after M3 M4 head now records M3 realities: - Vocabulary translation table from pre-M1 spec strings to M1-aligned enum values (DbWrite vs SyncWrite/SyncRead; NotifyDeliver vs Notification.Attempt/Terminal; InboundRequest/InboundAuthFailure vs ApiInbound.Completed; Failed vs PermanentFailure). - Mapper consolidation: 4 DTO mappers exist; extract single helper before M4 adds more channels. - OnCachedTelemetryWithoutDualWriteAsync test-mode fallback may be deprecated in M4. - Site SQLite drain for OperationTrackingStore: only dual-write transaction writes central today; plan drain if M4 needs in-flight tracking visibility. - SiteCallAuditActor wired but unused on M3 hot path; M4/M6 natural first direct caller.	2026-05-20 15:40:33 -04:00
Joseph Doherty	d20e8f4e9d	Merge branch 'feature/audit-log-m3-cached-operations': Audit Log #23 M3 Cached Operations + Dual-Write M3 ships the cached-call lifecycle: ExternalSystem.CachedCall and Database.CachedWrite each produce 3-5 audit rows + 1 SiteCalls row sharing the same TrackedOperationId. Site emits the combined packet (AuditEvent + SiteCallOperational); central writes both rows in one MS SQL transaction. Inlines the minimum-viable Site Call Audit (#22) surface: SiteCalls table + ISiteCallAuditRepository + SiteCallAuditActor. Reconciliation, KPIs, central->site Retry/Discard relay deferred. Shipped (23 commits, ~120 net new tests, 24/24 test projects green): - TrackedOperationId strong type + OperationTrackingStore site-local SQLite + Tracking.Status script API. - CachedCallTelemetry combined operational+audit packet (additive per Commons REQ-COM-5a — never renamed CachedOperationTelemetry). - SiteCalls MS SQL table + monotonic upsert repository (operational state, no partitioning) + migration. - ScadaLink.SiteCallAudit new project + SiteCallAuditActor cluster singleton. - sitestream.proto extended with IngestCachedTelemetry RPC + SiteCallOperationalDto + CachedTelemetryPacket/Batch. - AuditLogIngestActor combined-telemetry handler with per-entry BeginTransactionAsync; rollback on either-throw; per-entry try/catch isolates failures; central singleton stays alive (Resume). - ScriptRuntimeContext.ExternalSystem.CachedCall + Database.CachedWrite wrappers emit CachedSubmit on enqueue + handle immediate-success path (no S&F retry) with direct Attempted+CachedResolve emission. - StoreAndForward observer hook (ICachedCallLifecycleObserver) + CachedCallLifecycleBridge translates S&F outcomes to combined telemetry; per-attempt rows carry Kind=ApiCallCached/DbWriteCached, Status=Attempted (HttpStatus/ErrorMessage capture success/failure); terminal carries Kind=CachedResolve, Status=Delivered/Failed/Parked/Discarded.
- Component-level e2e via TestKit + MsSqlMigrationFixture + DirectActorSiteStreamAuditClient extracted to shared Integration/Infrastructure/ + CombinedTelemetryHarness/Dispatcher helpers. - Health metric SiteAuditWriteFailures still wired (M2). Bridge from ICachedCallTelemetryForwarder to AuditWriter chain. Invariants honored: append-only AuditLog (writer role DENY UPDATE/DELETE from M1); audit-failure-never-aborts-script (three-layer fail-safe preserved); central singleton supervisor=Resume; idempotent at central on EventId (M2 race-fix from Bundle A) + monotonic at central on TrackedOperationId. infra/* never touched on any branch commit (verified empty via 'git log main..feature/audit-log-m3-cached-operations -- infra/'). Site->central gRPC client still NoOpSiteStreamAuditClient in production until M6; cached telemetry rows accumulate at site as Pending in production.	2026-05-20 15:39:41 -04:00
Joseph Doherty	73a19c6f02	refactor(auditlog): remove dead-code ternary in CachedCallLifecycleBridge (#23 M3 final-review fix)	2026-05-20 15:39:10 -04:00
Joseph Doherty	c3d4e6b1e0	test(auditlog): combined telemetry idempotency on retried packets (#23 M3)	2026-05-20 15:27:14 -04:00
Joseph Doherty	f063b35633	test(auditlog): cached DB write combined telemetry end-to-end (#23 M3)	2026-05-20 15:26:04 -04:00
Joseph Doherty	f4a7be4929	test(auditlog): cached call combined telemetry end-to-end (#23 M3)	2026-05-20 15:25:10 -04:00
Joseph Doherty	a3b0fb7f08	refactor(auditlog-tests): extract DirectActorSiteStreamAuditClient + add IngestCachedTelemetry support (#23 M3)	2026-05-20 15:21:44 -04:00
Joseph Doherty	f81750b2aa	fix(siteruntime): immediate-success CachedCall emits terminal telemetry (#23 M3) Bundle E left a gap in ExternalSystem.CachedCall: when the underlying HTTP call succeeds immediately (WasBuffered=false), the store-and-forward retry loop is never engaged and the ICachedCallLifecycleObserver hook never fires. As a result Tracking.Status(id) would stay in Submitted forever and the audit log would be missing the Attempted + CachedResolve pair the M3 contract requires. Fix: capture the ExternalCallResult returned by IExternalSystemClient.CachedCallAsync. When WasBuffered=false, emit the two missing telemetry packets from the helper itself: - ApiCallCached / Attempted (per-attempt mechanics row, HttpStatus + ErrorMessage extracted via the same regex the synchronous Call() audit row uses) - CachedResolve / Delivered on Success, or - CachedResolve / Failed on Success=false (immediate permanent failure or transient failure without S&F). The terminal CachedResolve row carries TerminalAtUtc so SiteCallAudit can recognise the row as eligible for purge. The WasBuffered=true path is unaffected — the S&F retry loop owns the Attempted + Resolve emissions there via the CachedCallLifecycleBridge. Database.CachedWrite is unaffected too because IDatabaseGateway.CachedWriteAsync always enqueues into S&F (no immediate-success path). Both new emissions are best-effort: a throwing forwarder is logged and swallowed (alog.md §7) and each row is independently try/catch-wrapped so a single fault cannot drop both halves of the terminal pair. Tests in ExternalSystemCachedCallEmissionTests: - CachedCall_ImmediateSuccess_EmitsAttemptedAndCachedResolve - CachedCall_ImmediateFailure_EmitsAttemptedAndCachedResolveFailed - CachedCall_BufferedPath_DoesNotEmitTerminalTelemetryFromHelper Full suite: 244 SiteRuntime tests (3 new), 200 Host tests, all green.	2026-05-20 15:15:11 -04:00
Joseph Doherty	6fe23a4d9b	feat(host): register SiteCallAuditActor + CachedCallTelemetry forwarder/bridge (#22, #23 M3) M3 Bundle F (Task F1) wires the cached-call audit pipeline through the composition roots: - Central: register SiteCallAuditActor as a cluster singleton + proxy (mirrors AuditLogIngestActor and NotificationOutboxActor). Program.cs calls .AddSiteCallAudit() on the central role. - Site: register ICachedCallTelemetryForwarder + CachedCallLifecycleBridge in AddAuditLog (lazy factory — Central nodes degrade to audit-only emission because IOperationTrackingStore is site-only). - Site: bind CachedCallLifecycleBridge to ICachedCallLifecycleObserver so StoreAndForwardService picks it up via DI. - Site: introduce IStoreAndForwardSiteContext + Host adapter to surface the site id to StoreAndForwardService without creating a StoreAndForward -> HealthMonitoring project-reference cycle. - ScriptExecutionActor resolves ICachedCallTelemetryForwarder per script scope and threads it into ScriptRuntimeContext. CachedCallTelemetryForwarder's IOperationTrackingStore dependency is now nullable so Central DI validation succeeds with the lazy registration; the forwarder's tracking-half emission is a no-op when the store is absent. Tests: - AkkaHostedServiceAuditWiringTests: Central host builds with AddSiteCallAudit and resolves ICachedCallTelemetryForwarder; Site resolves the forwarder + bridge + observer + IStoreAndForwardSiteContext. - Full solution: 194 Host tests green, 241 SiteRuntime tests green, every other suite unchanged.	2026-05-20 15:10:47 -04:00
Joseph Doherty	047988e4c8	feat(siteruntime): Database.CachedWrite emits combined telemetry + S&F audit bridge (#23 M3) Wire the M3 cached-call audit pipeline end-to-end for the database channel and close the loop between the S&F lifecycle observer and the site-side dual emitter. * DatabaseCachedWriteEmissionTests covers Database.CachedWrite (set up in Bundle E3): mints a TrackedOperationId, emits one CachedSubmit packet on DbOutbound, threads the id into IDatabaseGateway, and is best-effort on a thrown forwarder. Mirrors ExternalSystem.CachedCall coverage from E3. * CachedCallLifecycleBridge (new) implements ICachedCallLifecycleObserver and lives alongside CachedCallTelemetryForwarder. The bridge ingests per-attempt notifications from the S&F retry loop and fans them out to the forwarder: - TransientFailure -> 1 Attempted row - Delivered -> Attempted + CachedResolve(Delivered) - PermanentFailure -> Attempted + CachedResolve(Parked) - ParkedMaxRetries -> Attempted + CachedResolve(Parked) Channel string -> AuditKind mapping (ApiOutbound->ApiCallCached, DbOutbound->DbWriteCached). Best-effort top-level catch swallows any unexpected throw so the S&F retry bookkeeping is never disturbed. * Bridge tests (7) cover all four outcomes, channel mapping, provenance propagation, and the no-throw-on-forwarder-failure contract. Bundle F (Host registration) will instantiate the bridge and inject it into StoreAndForwardService.cachedCallObserver, closing the wiring path end-to-end. Bundle E task E6.	2026-05-20 14:55:17 -04:00
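The outcome fan-out listed in commit 047988e4c8 reduces to a small lookup. The sketch below is a Python analogue with a hypothetical function name; the string values are taken from the commit message, and the `(Kind, Status)` tuple shape is an assumption made for illustration.

```python
KIND_BY_CHANNEL = {"ApiOutbound": "ApiCallCached", "DbOutbound": "DbWriteCached"}

# Outcomes that close the operation, mapped to the terminal CachedResolve status.
TERMINAL_STATUS_BY_OUTCOME = {
    "Delivered": "Delivered",
    "PermanentFailure": "Parked",
    "ParkedMaxRetries": "Parked",
}

KNOWN_OUTCOMES = {"TransientFailure", *TERMINAL_STATUS_BY_OUTCOME}


def rows_for_outcome(channel, outcome):
    """Return the (Kind, Status) audit rows the bridge emits for one S&F attempt."""
    if outcome not in KNOWN_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    rows = [(KIND_BY_CHANNEL[channel], "Attempted")]  # every attempt yields one row
    terminal = TERMINAL_STATUS_BY_OUTCOME.get(outcome)
    if terminal is not None:
        rows.append(("CachedResolve", terminal))
    return rows
```

A transient failure therefore produces a single Attempted row, while every closing outcome produces the Attempted row plus one CachedResolve row.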
Joseph Doherty	63eb1f4225	feat(snf): per-attempt and terminal cached-call lifecycle observer (#23 M3) Hook the store-and-forward retry loop so the audit pipeline can emit per-attempt + terminal telemetry under the original TrackedOperationId (Bundle E Tasks E4 + E5). New seam: * ICachedCallLifecycleObserver + CachedCallAttemptContext in Commons.Interfaces.Services. Outcome enum (Delivered / TransientFailure / PermanentFailure / ParkedMaxRetries) is S&F-vocabulary; the bridge living in ScadaLink.AuditLog (Bundle F) will map it to the AuditKind/AuditStatus pair when building the CachedCallTelemetry packet. * StoreAndForwardService gains an optional cachedCallObserver constructor parameter + siteId. RetryMessageAsync fires the observer exactly once per attempt with the appropriate outcome: - handler returns true -> Delivered - handler returns false -> PermanentFailure (and parks) - handler throws + retries remaining -> TransientFailure - handler throws + max retries hit -> ParkedMaxRetries (and parks) Hook is best-effort: a thrown observer is logged + swallowed so a failing audit pipeline can never be misclassified as a transient delivery failure or corrupt the retry-count bookkeeping (alog.md §7). Only cached-call categories (ExternalSystem, CachedDbWrite) generate notifications — Notification category has its own central-side audit pipeline (Notification Outbox / #21). Pre-M3 callers that didn't thread a TrackedOperationId into the S&F message id are silently skipped — the observer requires a parseable id by contract. New S&F callers stamp the id as messageId (Bundle E3). Bundle E tasks E4 + E5.	2026-05-20 14:52:34 -04:00
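The per-attempt classification in commit 63eb1f4225 can be sketched as below. This is a Python analogue with hypothetical names; in particular, treating `retry_count + 1 >= max_retries` as "max retries hit" is an assumption about the bookkeeping, not something the commit message specifies.

```python
def classify_attempt(handler, retry_count, max_retries):
    """Map one delivery attempt to the S&F outcome vocabulary."""
    try:
        delivered = handler()
    except Exception:
        # Assumption: the current attempt counts toward the retry limit.
        if retry_count + 1 >= max_retries:
            return "ParkedMaxRetries"
        return "TransientFailure"
    return "Delivered" if delivered else "PermanentFailure"


def retry_message(handler, observer, retry_count, max_retries):
    """Run one attempt and notify the observer exactly once, best-effort."""
    outcome = classify_attempt(handler, retry_count, max_retries)
    try:
        observer(outcome)
    except Exception:
        # A failing observer must not be mistaken for a delivery failure,
        # so the outcome computed above is returned unchanged.
        pass
    return outcome
```

Keeping the observer call outside the handler's `try` block is what prevents an audit fault from being counted as a transient delivery failure.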
Joseph Doherty	42430dd10a	feat(siteruntime): ExternalSystem.CachedCall emits CachedSubmit telemetry (#23 M3) Rework ScriptRuntimeContext.ExternalSystem.CachedCall to fit the M3 combined-telemetry model: * Mints a fresh TrackedOperationId and emits one CachedSubmit packet via ICachedCallTelemetryForwarder BEFORE handing the call off — the SiteCalls row is materialised before the first delivery attempt so Tracking.Status(id) can observe a Submitted row even if immediate delivery resolves before the helper returns. * Threads the TrackedOperationId into IExternalSystemClient.CachedCallAsync as a new optional parameter (and into IDatabaseGateway.CachedWriteAsync for the Database mirror set up here for E6). The gateway uses the id as the StoreAndForward messageId so the retry loop (Tasks E4/E5) can recover it from StoreAndForwardMessage.Id. * Returns the TrackedOperationId rather than ExternalCallResult — the script's contract is now "get a tracking handle, observe outcome via Tracking.Status". Best-effort emission: a thrown forwarder is logged + swallowed; the original call still runs and the id is still returned. DatabaseHelper gets the matching siteId / sourceScript / forwarder fields and a parallel CachedSubmit emitter (Channel=DbOutbound) so Task E6's Database.CachedWrite mirror plugs in without further runtime wiring. New ICachedCallTelemetryForwarder seam in Commons.Interfaces.Services so SiteRuntime depends on Commons (existing arrow) rather than ScadaLink.AuditLog (would have introduced a new dependency). Bundle E task E3 (and helper-shape work for E6).	2026-05-20 14:48:05 -04:00
Joseph Doherty	2145b29d4d	feat(auditlog): CachedCallTelemetryForwarder for site-side dual emission (#23 M3) Sister to SiteAuditTelemetryActor: takes a combined CachedCallTelemetry packet and fans it out to the two site-local stores. * AuditEvent half writes through IAuditWriter (the M2 FallbackAuditWriter + SqliteAuditWriter chain — same site SQLite hot-path as sync calls). * SiteCallOperational half maps Audit.Kind to the matching IOperationTrackingStore method: - CachedSubmit -> RecordEnqueueAsync (insert-if-not-exists) - ApiCallCached / DbWriteCached -> RecordAttemptAsync (monotonic) - CachedResolve -> RecordTerminalAsync (first-write-wins) Best-effort contract (alog.md §7): independent try/catch per half so a thrown writer cannot starve the tracking row (and vice-versa); both failures are logged at warning level and swallowed — the calling script never sees them. Wire push deferred to M6 — the NoOp ISiteStreamAuditClient binding stays in effect; the forwarder writes only to the local stores in M3. The existing SiteAuditTelemetryActor drain loop will sweep the audit rows once a real gRPC client lands. Bundle E task E2.	2026-05-20 14:41:15 -04:00
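
Illustrative sketch (not code from the repository): the messages for 63eb1f4225 and 047988e4c8 above describe how one store-and-forward attempt is classified into an outcome and how the bridge then fans that outcome out into audit rows. The Python below restates that mapping in a language-neutral form. The function names `classify_attempt`, `rows_for`, and `notify` are hypothetical, as is the `(kind, status)` tuple shape; in particular, pairing the per-attempt kind with an "Attempted" status is an assumption drawn from the wording "1 Attempted row", not a confirmed schema.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Outcome(Enum):
    # Outcome names as listed in commit 63eb1f4225.
    DELIVERED = "Delivered"
    TRANSIENT_FAILURE = "TransientFailure"
    PERMANENT_FAILURE = "PermanentFailure"
    PARKED_MAX_RETRIES = "ParkedMaxRetries"


# Channel string -> per-attempt AuditKind, as listed in commit 047988e4c8.
CHANNEL_TO_KIND = {"ApiOutbound": "ApiCallCached", "DbOutbound": "DbWriteCached"}


def classify_attempt(handler: Callable[[], bool], retries_remaining: int) -> Outcome:
    """Classify one delivery attempt (hypothetical helper, assumed signature)."""
    try:
        delivered = handler()
    except Exception:
        # A throw is transient while retries remain, otherwise the message parks.
        if retries_remaining > 0:
            return Outcome.TRANSIENT_FAILURE
        return Outcome.PARKED_MAX_RETRIES
    # A clean return of False is treated as a permanent failure.
    return Outcome.DELIVERED if delivered else Outcome.PERMANENT_FAILURE


def rows_for(channel: str, outcome: Outcome) -> List[Tuple[str, str]]:
    """Fan one outcome out into (kind, status) rows (assumed row shape)."""
    rows = [(CHANNEL_TO_KIND[channel], "Attempted")]  # every outcome records the attempt
    if outcome is Outcome.DELIVERED:
        rows.append(("CachedResolve", "Delivered"))
    elif outcome in (Outcome.PERMANENT_FAILURE, Outcome.PARKED_MAX_RETRIES):
        rows.append(("CachedResolve", "Parked"))
    return rows


def notify(observer: Callable[[List[Tuple[str, str]]], None],
           channel: str, outcome: Outcome) -> None:
    """Best-effort hook: a throwing observer is swallowed, never re-raised."""
    try:
        observer(rows_for(channel, outcome))
    except Exception:
        pass  # the commit says the real hook logs here; this sketch only swallows
```

The point of wrapping the observer call in its own try/except is the one the commit messages stress: a failing audit pipeline must not change how the attempt itself was classified or disturb the retry count.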

794 Commits