Audit Log — ExecutionId Universal Correlation (Design)

Date: 2026-05-21
Status: Validated, ready for implementation planning.

Problem

The audit CorrelationId column is overloaded with four incompatible meanings: TrackedOperationId for cached calls, NotificationId for notifications, the script-execution id for sync calls (added 2026-05-21), and request-local ids for inbound requests. It is NULL for sync one-shot calls. No single value ties together everything one script run (or inbound request) did: a run that makes a sync API call, a cached call, and a notification produces three unrelated correlation ids, and nothing links the cached call's lifecycle rows back to the run that launched them.

A single CorrelationId column cannot serve both scopes — the operation lifecycle (a cached call's Submit→Attempted→Resolve; a notification's Send→Deliver, which the Site Calls / Notifications "View audit history" drill-ins depend on) and the execution trace (all operations of one run).

Decision

Add a dedicated, nullable ExecutionId column to the audit row. It identifies the originating script execution or inbound API request. Every audit row that execution produces carries the same ExecutionId. CorrelationId is left exactly as it is — it keeps the per-operation lifecycle meaning, so the existing operation drill-ins are unaffected.

Result: WHERE ExecutionId = X returns every audit row of one run — sync ApiCall/DbWrite, the whole cached-call lifecycle, NotifySend, NotifyDeliver, and the inbound row — across both the site and central tables.
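The two scopes can be illustrated with a small sketch. It uses Python's sqlite3 and a single in-memory table as a stand-in for the real site and central stores; the `Kind` column name, the helper names, and the choice to leave CorrelationId NULL on the sync row are assumptions made for the illustration, not part of the design.

```python
import sqlite3
import uuid


def make_audit_table(conn):
    conn.execute("""
        CREATE TABLE AuditLog (
            Kind          TEXT NOT NULL,
            CorrelationId TEXT NULL,  -- per-operation lifecycle id (unchanged)
            ExecutionId   TEXT NULL   -- per-run id (the new column)
        )""")


def record_run(conn):
    """Insert the audit rows of one illustrative run and return its ids."""
    execution_id = str(uuid.uuid4())
    tracked_operation_id = str(uuid.uuid4())
    notification_id = str(uuid.uuid4())
    rows = [
        # CorrelationId of the sync row is not the point here; left NULL.
        ("ApiCall", None, execution_id),
        ("CachedSubmit", tracked_operation_id, execution_id),
        ("Attempted", tracked_operation_id, execution_id),
        ("CachedResolve", tracked_operation_id, execution_id),
        ("NotifySend", notification_id, execution_id),
        ("NotifyDeliver", notification_id, execution_id),
    ]
    conn.executemany("INSERT INTO AuditLog VALUES (?, ?, ?)", rows)
    return execution_id, tracked_operation_id, notification_id


def kinds_where(conn, column, value):
    """Return the Kind of every row whose `column` equals `value`."""
    if column not in ("ExecutionId", "CorrelationId"):
        raise ValueError(f"unsupported filter column: {column}")
    cursor = conn.execute(
        f"SELECT Kind FROM AuditLog WHERE {column} = ? ORDER BY rowid", (value,)
    )
    return [kind for (kind,) in cursor]


conn = sqlite3.connect(":memory:")
make_audit_table(conn)
execution_id, tracked_operation_id, notification_id = record_run(conn)
record_run(conn)  # a second, unrelated run that must not leak into the results

whole_run = kinds_where(conn, "ExecutionId", execution_id)
one_operation = kinds_where(conn, "CorrelationId", tracked_operation_id)
print(whole_run)
print(one_operation)
```

Filtering by ExecutionId yields the whole run, while filtering by CorrelationId still yields only the one cached call's lifecycle, which is the behavior the existing drill-ins rely on.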

ScriptRuntimeContext already holds a per-execution id (_auditCorrelationId, added 2026-05-21). That id becomes the ExecutionId; this work stamps it into the new column from every emitter and threads it to the two paths where the script context is not in scope.

Considered and rejected

Overload CorrelationId with the execution id everywhere — breaks the cached-call / notification "View audit history" drill-ins (they filter CorrelationId by TrackedOperationId / NotificationId), or forces them to show the whole run instead of the one operation.
Stash the execution id in Extra JSON — no schema change, but Extra is unindexed; filtering an audit table of this volume by it is unworkable.
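The indexing argument against the Extra JSON option can be seen in miniature with SQLite's query planner. This is only an analogy: the central store is MS SQL, the table below is reduced to three columns, and the `LIKE` predicate is an assumed stand-in for whatever JSON filter would have been used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AuditLog (Id INTEGER PRIMARY KEY, ExecutionId TEXT NULL, Extra TEXT NULL)"
)
conn.execute("CREATE INDEX IX_AuditLog_Execution ON AuditLog (ExecutionId)")


def plan(sql, params):
    """Return the human-readable detail text of each query-plan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the detail string


some_id = "3f2b8c1e-0000-0000-0000-000000000000"  # arbitrary illustrative value

# Dedicated column: the planner can seek through the index.
column_plan = plan("SELECT * FROM AuditLog WHERE ExecutionId = ?", (some_id,))

# Id buried in Extra: no index applies, so every row has to be examined.
extra_plan = plan("SELECT * FROM AuditLog WHERE Extra LIKE ?", (f"%{some_id}%",))

print(column_plan)
print(extra_plan)
```

The exact plan wording differs between SQLite versions, so the sketch prints the plans rather than asserting a specific string.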

Schema changes (all additive, nullable — no backfill; pre-existing rows stay `NULL`)

| Where | Change |
| --- | --- |
| `ScadaLink.Commons` | `AuditEvent` record (and the site-local variant) gains `Guid? ExecutionId`. |
| Central MS SQL `AuditLog` | New `ExecutionId uniqueidentifier NULL` column + index `IX_AuditLog_Execution (ExecutionId)`. EF migration: an additive nullable column is a metadata-only `ALTER`, fast even on the monthly-partitioned table. |
| Site SQLite `auditlog.db` `AuditLog` | New `ExecutionId TEXT NULL` column (`SqliteAuditWriter` schema + `MapRow`). |
| gRPC `AuditEventDto` (`sitestream.proto`) | Additive `execution_id` field; `AuditEventDtoMapper` maps it in both directions. |
| Central MS SQL `Notifications` | New `OriginExecutionId uniqueidentifier NULL` column. It carries the originating run's id so the dispatcher can echo it onto `NotifyDeliver` audit rows. EF migration. |

SiteCalls needs no new column — the cached telemetry packet already carries the audit half, which now has ExecutionId directly.
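For the site SQLite side, the additive change might look roughly like the following. This is a sketch in Python's sqlite3, not the actual `SqliteAuditWriter` code; the helper names and the two-column starting schema are assumptions.

```python
import sqlite3


def column_names(conn, table):
    """List a table's column names via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def add_execution_id_column(conn):
    """Add the nullable ExecutionId column if it is not there yet."""
    if "ExecutionId" not in column_names(conn, "AuditLog"):
        conn.execute("ALTER TABLE AuditLog ADD COLUMN ExecutionId TEXT NULL")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AuditLog (Kind TEXT NOT NULL, CorrelationId TEXT NULL)")
conn.execute("INSERT INTO AuditLog VALUES ('ApiCall', NULL)")  # pre-existing row

add_execution_id_column(conn)
add_execution_id_column(conn)  # safe to run again: the second call is a no-op

legacy_values = conn.execute("SELECT ExecutionId FROM AuditLog").fetchall()
print(column_names(conn, "AuditLog"))
print(legacy_values)
```

Because the column is nullable and has no default, the row written before the change simply reads back as NULL, which matches the no-backfill decision.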

Emitter coverage — every audit row carries `ExecutionId`

| Emitter | `ExecutionId` source |
| --- | --- |
| Sync `ApiCall`, sync `DbWrite` | `ScriptRuntimeContext` execution id (in scope today) |
| Cached call script-side rows (`CachedSubmit`, immediate `Attempted`/`CachedResolve`) | `ScriptRuntimeContext` execution id |
| Cached call S&F retry-loop rows (`CachedCallLifecycleBridge`) | Threaded through the store-and-forward buffered message → `CachedCallAttemptContext` → the bridge. This same threading also fixes the pre-existing `SourceScript = NULL` gap on those rows (identical boundary). |
| `NotifySend` (site, script-side) | `ScriptRuntimeContext` execution id |
| `NotifyDeliver` (central dispatch) | `Notifications.OriginExecutionId`: the id rides on `NotificationSubmit`, is persisted on the `Notifications` row, and the dispatcher stamps it on every `NotifyDeliver` row |
| Inbound `InboundRequest` / `InboundAuthFailure` | Request id minted once in `AuditWriteMiddleware` |

Data flow

Site script run — ScriptRuntimeContext generates the execution id (or is given one); every emitter it owns stamps ExecutionId.
Buffered cached call — the execution id rides on the S&F buffered message; the retry loop reconstructs it into CachedCallAttemptContext; CachedCallLifecycleBridge stamps it on the retry-loop audit rows.
Notification — the NotifySend row stamps it site-side; the id travels on NotificationSubmit, is stored as Notifications.OriginExecutionId, and the dispatcher stamps every NotifyDeliver row it emits.
Inbound API request — AuditWriteMiddleware mints a request id and stamps the inbound audit row.
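The buffered cached-call path is the one where the script context is no longer in scope, so the id has to survive a serialization boundary. The sketch below models that hop in Python; every class, field, and function name is hypothetical, and JSON is assumed as the buffer format purely for illustration.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class BufferedCall:
    """Hypothetical stand-in for the S&F buffered message."""
    tracked_operation_id: str
    source_script: Optional[str]
    execution_id: Optional[str]

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(payload: str) -> "BufferedCall":
        data = json.loads(payload)
        return BufferedCall(
            tracked_operation_id=data["tracked_operation_id"],
            # Messages buffered before the change lack these keys; they map to None.
            source_script=data.get("source_script"),
            execution_id=data.get("execution_id"),
        )


@dataclass(frozen=True)
class AttemptContext:
    """Hypothetical stand-in for CachedCallAttemptContext."""
    tracked_operation_id: str
    source_script: Optional[str]
    execution_id: Optional[str]


@dataclass(frozen=True)
class AuditRow:
    kind: str
    correlation_id: str
    execution_id: Optional[str]
    source_script: Optional[str]


def bridge_emit(ctx: AttemptContext, kind: str) -> AuditRow:
    """Stamp a retry-loop audit row from the attempt context alone."""
    return AuditRow(kind, ctx.tracked_operation_id, ctx.execution_id, ctx.source_script)


def retry_attempt(payload: str) -> AuditRow:
    """Rebuild the context from the buffered payload, then emit through the bridge."""
    msg = BufferedCall.from_json(payload)
    ctx = AttemptContext(msg.tracked_operation_id, msg.source_script, msg.execution_id)
    return bridge_emit(ctx, "Attempted")


run_id = str(uuid.uuid4())
operation_id = str(uuid.uuid4())
buffered = BufferedCall(operation_id, "PumpControl", run_id).to_json()
row = retry_attempt(buffered)

old_payload = json.dumps({"tracked_operation_id": operation_id})
old_row = retry_attempt(old_payload)
print(row)
print(old_row)
```

Carrying the script name on the same message is what lets one threading change close both the ExecutionId and the SourceScript gap; a payload buffered before the change still deserializes and just produces NULLs. The notification path has the same shape, with the Notifications row playing the role of the buffered message.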

UI / CLI surface

Central UI Audit Log page — ExecutionId added as a results-grid column (the grid already supports resize/reorder); an ExecutionId paste-filter in the filter bar; the page accepts ?executionId=<guid>; a row drill-in "View this execution" → /audit/log?executionId=<guid>.
CLI — scadalink audit query --execution-id <guid>.
ManagementService — /api/audit/query and the export endpoint accept an executionId filter parameter.
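A minimal sketch of the URL contract shared by the drill-in link and the page's query parameter, written in Python for illustration. The function names are hypothetical, and treating a malformed value as "no filter" is an assumption; the design does not say how invalid input should be handled.

```python
import uuid
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def drill_in_url(execution_id: uuid.UUID) -> str:
    """Build the "View this execution" link for an audit row."""
    return f"/audit/log?executionId={execution_id}"


def execution_id_from_url(url: str) -> Optional[uuid.UUID]:
    """Extract the executionId filter, or None if it is absent or malformed."""
    values = parse_qs(urlsplit(url).query).get("executionId")
    if not values:
        return None
    try:
        return uuid.UUID(values[0].strip())
    except ValueError:
        # Assumed behavior: ignore a value that is not a GUID.
        return None


example_id = uuid.uuid4()
link = drill_in_url(example_id)
print(link)
print(execution_id_from_url(link) == example_id)
```

Parsing into a GUID type before querying keeps the pasted filter, the query-string parameter, and the CLI flag on one validation path.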

Compatibility

Three additive nullable columns (AuditLog.ExecutionId on central and on site, Notifications.OriginExecutionId); additive proto field; additive message-contract fields. All are version-compatible. No data backfill; historical rows keep ExecutionId = NULL.
CorrelationId semantics unchanged — every existing drill-in keeps working.

Testing

Repository: query-by-ExecutionId; migration smoke test.
Emitter unit tests: each emitter stamps ExecutionId; the cached-call lifecycle rows from one run share it; NotifyDeliver echoes Notifications.OriginExecutionId.
Integration: a script run that does a sync call + a cached call + a notification → all resulting audit rows share one ExecutionId end-to-end.
Central UI: bUnit (grid column, filter, drill-in) + Playwright.

Out of scope

Bridging the inbound request id into the routed site script's execution (cross-cluster threading) — a separate future change.
Backfilling ExecutionId on historical audit rows.
