6.4 KiB
Audit Log — ExecutionId Universal Correlation (Design)
Date: 2026-05-21 Status: Validated — ready for implementation planning.
Problem
The audit CorrelationId column is overloaded with three incompatible meanings —
TrackedOperationId for cached calls, NotificationId for notifications, the
script-execution id for sync calls (added 2026-05-21), and request-local ids for
inbound. It is NULL for sync one-shot calls. There is no single value that ties
together everything one script run (or inbound request) did: a run that makes a
sync API call, a cached call and a notification produces three unrelated
correlation ids, and nothing links the cached call's lifecycle rows back to the
run that launched them.
A single CorrelationId column cannot serve both scopes — the operation
lifecycle (a cached call's Submit→Attempted→Resolve; a notification's
Send→Deliver, which the Site Calls / Notifications "View audit history"
drill-ins depend on) and the execution trace (all operations of one run).
Decision
Add a dedicated, nullable ExecutionId column to the audit row. It identifies
the originating script execution or inbound API request. Every audit row
that execution produces carries the same ExecutionId. CorrelationId is left
exactly as it is — it keeps the per-operation lifecycle meaning, so the existing
operation drill-ins are unaffected.
Result: WHERE ExecutionId = X returns every audit row of one run — sync
ApiCall/DbWrite, the whole cached-call lifecycle, NotifySend,
NotifyDeliver, and the inbound row — across both the site and central tables.
ScriptRuntimeContext already holds a per-execution id (_auditCorrelationId,
added 2026-05-21). That id becomes the ExecutionId; this work stamps it into the
new column from every emitter and threads it to the two paths where the script
context is not in scope.
Considered and rejected
- Overload
CorrelationIdwith the execution id everywhere — breaks the cached-call / notification "View audit history" drill-ins (they filterCorrelationIdbyTrackedOperationId/NotificationId), or forces them to show the whole run instead of the one operation. - Stash the execution id in
ExtraJSON — no schema change, butExtrais unindexed; filtering an audit table of this volume by it is unworkable.
Schema changes (all additive, nullable — no backfill; pre-existing rows stay NULL)
| Where | Change |
|---|---|
ScadaLink.Commons |
AuditEvent record (and the site-local variant) gains Guid? ExecutionId. |
Central MS SQL AuditLog |
new ExecutionId uniqueidentifier NULL column + index IX_AuditLog_Execution (ExecutionId). EF migration — additive nullable column is a metadata-only ALTER, fast even on the monthly-partitioned table. |
Site SQLite auditlog.db AuditLog |
new ExecutionId TEXT NULL column (SqliteAuditWriter schema + MapRow). |
gRPC AuditEventDto (sitestream.proto) |
additive execution_id field; AuditEventDtoMapper maps it both directions. |
Central MS SQL Notifications |
new OriginExecutionId uniqueidentifier NULL column — carries the originating run's id so the dispatcher can echo it onto NotifyDeliver audit rows. EF migration. |
SiteCalls needs no new column — the cached telemetry packet already carries the
audit half, which now has ExecutionId directly.
Emitter coverage — every audit row carries ExecutionId
| Emitter | ExecutionId source |
|---|---|
Sync ApiCall, sync DbWrite |
ScriptRuntimeContext execution id (in scope today) |
Cached call script-side rows (CachedSubmit, immediate Attempted/CachedResolve) |
ScriptRuntimeContext execution id |
Cached call S&F retry-loop rows (CachedCallLifecycleBridge) |
threaded through the store-and-forward buffered message → CachedCallAttemptContext → the bridge. This same threading also fixes the pre-existing SourceScript = NULL gap on those rows (identical boundary). |
NotifySend (site, script-side) |
ScriptRuntimeContext execution id |
NotifyDeliver (central dispatch) |
Notifications.OriginExecutionId — the id rides on NotificationSubmit, is persisted on the Notifications row, and the dispatcher stamps it on every NotifyDeliver row |
Inbound InboundRequest / InboundAuthFailure |
request id minted once in AuditWriteMiddleware |
Data flow
- Site script run —
ScriptRuntimeContextgenerates the execution id (or is given one); every emitter it owns stampsExecutionId. - Buffered cached call — the execution id rides on the S&F buffered message;
the retry loop reconstructs it into
CachedCallAttemptContext;CachedCallLifecycleBridgestamps it on the retry-loop audit rows. - Notification — the
NotifySendrow stamps it site-side; the id travels onNotificationSubmit, is stored asNotifications.OriginExecutionId, and the dispatcher stamps everyNotifyDeliverrow it emits. - Inbound API request —
AuditWriteMiddlewaremints a request id and stamps the inbound audit row.
UI / CLI surface
- Central UI Audit Log page —
ExecutionIdadded as a results-grid column (the grid already supports resize/reorder); anExecutionIdpaste-filter in the filter bar; the page accepts?executionId=<guid>; a row drill-in "View this execution" →/audit/log?executionId=<guid>. - CLI —
scadalink audit query --execution-id <guid>. - ManagementService —
/api/audit/queryand the export endpoint accept anexecutionIdfilter parameter.
Compatibility
- Two additive nullable columns; additive proto field; additive message-contract
fields — all version-compatible. No data backfill; historical rows keep
ExecutionId = NULL. CorrelationIdsemantics unchanged — every existing drill-in keeps working.
Testing
- Repository: query-by-
ExecutionId; migration smoke test. - Emitter unit tests: each emitter stamps
ExecutionId; the cached-call lifecycle rows from one run share it;NotifyDeliverechoesNotifications.OriginExecutionId. - Integration: a script run that does a sync call + a cached call + a notification
→ all resulting audit rows share one
ExecutionIdend-to-end. - Central UI: bUnit (grid column, filter, drill-in) + Playwright.
Out of scope
- Bridging the inbound request id into the routed site script's execution (cross-cluster threading) — a separate future change.
- Backfilling
ExecutionIdon historical audit rows.