Audit Log — ExecutionId Universal Correlation (Design)

Date: 2026-05-21
Status: Validated, ready for implementation planning.

Problem

The audit CorrelationId column is overloaded with four incompatible meanings: TrackedOperationId for cached calls, NotificationId for notifications, the script-execution id for sync calls (added 2026-05-21), and request-local ids for inbound. It is NULL for sync one-shot calls. There is no single value that ties together everything one script run (or inbound request) did: a run that makes a sync API call, a cached call and a notification produces three unrelated correlation ids, and nothing links the cached call's lifecycle rows back to the run that launched them.

A single CorrelationId column cannot serve both scopes — the operation lifecycle (a cached call's Submit→Attempted→Resolve; a notification's Send→Deliver, which the Site Calls / Notifications "View audit history" drill-ins depend on) and the execution trace (all operations of one run).

Decision

Add a dedicated, nullable ExecutionId column to the audit row. It identifies the originating script execution or inbound API request. Every audit row that execution produces carries the same ExecutionId. CorrelationId is left exactly as it is — it keeps the per-operation lifecycle meaning, so the existing operation drill-ins are unaffected.

Result: WHERE ExecutionId = X returns every audit row of one run — sync ApiCall/DbWrite, the whole cached-call lifecycle, NotifySend, NotifyDeliver, and the inbound row — across both the site and central tables.

ScriptRuntimeContext already holds a per-execution id (_auditCorrelationId, added 2026-05-21). That id becomes the ExecutionId; this work stamps it into the new column from every emitter and threads it to the two paths where the script context is not in scope.
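
The two scopes can be illustrated with a small in-memory model. This is a sketch only, not the ScadaLink implementation: the `AuditRow` type, the helper functions and the sample rows are hypothetical, and the sync ApiCall rows are shown with a NULL CorrelationId purely for illustration. It shows that filtering by CorrelationId still returns one operation's lifecycle, while filtering by ExecutionId returns the whole run.

```python
from dataclasses import dataclass
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class AuditRow:
    """Hypothetical, simplified audit row; only the two id columns matter here."""
    kind: str
    correlation_id: Optional[UUID]  # per-operation lifecycle id (unchanged)
    execution_id: Optional[UUID]    # originating script run or inbound request


def rows_for_operation(rows: List[AuditRow], correlation_id: UUID) -> List[AuditRow]:
    """Operation drill-in: one cached call's or one notification's lifecycle."""
    return [r for r in rows if r.correlation_id == correlation_id]


def rows_for_execution(rows: List[AuditRow], execution_id: UUID) -> List[AuditRow]:
    """Execution trace: everything one run produced."""
    return [r for r in rows if r.execution_id == execution_id]


run_id = uuid4()          # the run's ExecutionId
other_run_id = uuid4()    # an unrelated run
tracked_op_id = uuid4()   # stands in for TrackedOperationId
notification_id = uuid4() # stands in for NotificationId

rows = [
    AuditRow("ApiCall", None, run_id),
    AuditRow("CachedSubmit", tracked_op_id, run_id),
    AuditRow("Attempted", tracked_op_id, run_id),
    AuditRow("CachedResolve", tracked_op_id, run_id),
    AuditRow("NotifySend", notification_id, run_id),
    AuditRow("NotifyDeliver", notification_id, run_id),
    AuditRow("ApiCall", None, other_run_id),
]

operation_kinds = [r.kind for r in rows_for_operation(rows, tracked_op_id)]
execution_kinds = [r.kind for r in rows_for_execution(rows, run_id)]
```

Here `operation_kinds` holds only the three cached-call lifecycle rows, while `execution_kinds` holds all six rows of the run and excludes the unrelated run's row.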

Considered and rejected

  • Overload CorrelationId with the execution id everywhere — breaks the cached-call / notification "View audit history" drill-ins (they filter CorrelationId by TrackedOperationId / NotificationId), or forces them to show the whole run instead of the one operation.
  • Stash the execution id in Extra JSON — no schema change, but Extra is unindexed; filtering an audit table of this volume by it is unworkable.

Schema changes (all additive, nullable — no backfill; pre-existing rows stay NULL)

| Where | Change |
| --- | --- |
| ScadaLink.Commons | AuditEvent record (and the site-local variant) gains Guid? ExecutionId. |
| Central MS SQL AuditLog | New ExecutionId uniqueidentifier NULL column + index IX_AuditLog_Execution (ExecutionId). EF migration; an additive nullable column is a metadata-only ALTER, fast even on the monthly-partitioned table. |
| Site SQLite auditlog.db AuditLog | New ExecutionId TEXT NULL column (SqliteAuditWriter schema + MapRow). |
| gRPC AuditEventDto (sitestream.proto) | Additive execution_id field; AuditEventDtoMapper maps it both directions. |
| Central MS SQL Notifications | New OriginExecutionId uniqueidentifier NULL column. It carries the originating run's id so the dispatcher can echo it onto NotifyDeliver audit rows. EF migration. |

SiteCalls needs no new column — the cached telemetry packet already carries the audit half, which now has ExecutionId directly.
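
The additive, no-backfill behaviour of the site-side change can be sketched against an in-memory SQLite database. This is an illustration under assumptions, not the SqliteAuditWriter schema: the Id, Kind and CorrelationId columns are a hypothetical minimal stand-in for the real table, and storing the id as a lowercase string is a choice made for this example only.

```python
import sqlite3
from uuid import uuid4

conn = sqlite3.connect(":memory:")

# Hypothetical minimal stand-in for the existing site AuditLog table.
conn.execute(
    "CREATE TABLE AuditLog ("
    " Id INTEGER PRIMARY KEY,"
    " Kind TEXT NOT NULL,"
    " CorrelationId TEXT NULL)"
)
# A row written before the schema change.
conn.execute("INSERT INTO AuditLog (Kind, CorrelationId) VALUES ('ApiCall', NULL)")

# The additive change: a nullable column, no default, no backfill.
conn.execute("ALTER TABLE AuditLog ADD COLUMN ExecutionId TEXT NULL")

# Rows written after the change carry the run's id.
run_id = str(uuid4())
conn.executemany(
    "INSERT INTO AuditLog (Kind, CorrelationId, ExecutionId) VALUES (?, ?, ?)",
    [("ApiCall", None, run_id), ("NotifySend", str(uuid4()), run_id)],
)

# The pre-existing row stays NULL.
legacy_execution_id = conn.execute(
    "SELECT ExecutionId FROM AuditLog WHERE Id = 1"
).fetchone()[0]

# WHERE ExecutionId = X returns only the rows of that run.
kinds_for_run = [
    row[0]
    for row in conn.execute(
        "SELECT Kind FROM AuditLog WHERE ExecutionId = ? ORDER BY Id", (run_id,)
    )
]
```

The pre-existing row reads back with `legacy_execution_id` as `None`, and the equality filter never matches it because a NULL never compares equal to a value.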

Emitter coverage — every audit row carries ExecutionId

| Emitter | ExecutionId source |
| --- | --- |
| Sync ApiCall, sync DbWrite | ScriptRuntimeContext execution id (in scope today) |
| Cached call script-side rows (CachedSubmit, immediate Attempted/CachedResolve) | ScriptRuntimeContext execution id |
| Cached call S&F retry-loop rows (CachedCallLifecycleBridge) | Threaded through the store-and-forward buffered message → CachedCallAttemptContext → the bridge. This same threading also fixes the pre-existing SourceScript = NULL gap on those rows (identical boundary). |
| NotifySend (site, script-side) | ScriptRuntimeContext execution id |
| NotifyDeliver (central dispatch) | Notifications.OriginExecutionId: the id rides on NotificationSubmit, is persisted on the Notifications row, and the dispatcher stamps it on every NotifyDeliver row |
| Inbound InboundRequest / InboundAuthFailure | Request id minted once in AuditWriteMiddleware |
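
The retry-loop threading (buffered message → attempt context → bridge) might look roughly like the following. Everything here is a hypothetical model: the `BufferedMessage` and `AttemptContext` shapes, the JSON encoding and the function names are assumptions made for illustration and do not describe the actual store-and-forward format. The point is that the execution id and source script survive the buffer boundary, and that a message buffered before the change simply yields NULL.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class BufferedMessage:
    """Hypothetical persisted form of a buffered cached call."""
    tracked_operation_id: str
    execution_id: Optional[str]
    source_script: Optional[str]


@dataclass(frozen=True)
class AttemptContext:
    """Hypothetical stand-in for CachedCallAttemptContext."""
    tracked_operation_id: UUID
    execution_id: Optional[UUID]
    source_script: Optional[str]


def serialize(message: BufferedMessage) -> str:
    return json.dumps(asdict(message))


def rebuild_context(payload: str) -> AttemptContext:
    """Reconstruct the attempt context; fields missing from older payloads stay None."""
    data = json.loads(payload)
    raw_execution_id = data.get("execution_id")
    return AttemptContext(
        tracked_operation_id=UUID(data["tracked_operation_id"]),
        execution_id=UUID(raw_execution_id) if raw_execution_id else None,
        source_script=data.get("source_script"),
    )


def bridge_audit_row(ctx: AttemptContext, kind: str) -> dict:
    """What a lifecycle bridge would stamp on a retry-loop audit row."""
    return {
        "Kind": kind,
        "CorrelationId": ctx.tracked_operation_id,  # lifecycle scope, unchanged
        "ExecutionId": ctx.execution_id,            # execution scope, new
        "SourceScript": ctx.source_script,
    }


run_id = uuid4()
tracked_op_id = uuid4()
payload = serialize(BufferedMessage(str(tracked_op_id), str(run_id), "PumpControl"))
retry_row = bridge_audit_row(rebuild_context(payload), "Attempted")

# A payload buffered before the change has no execution_id or source_script.
old_payload = json.dumps({"tracked_operation_id": str(tracked_op_id)})
old_row = bridge_audit_row(rebuild_context(old_payload), "Attempted")
```

Treating the new fields as optional on read is what keeps messages already sitting in the buffer processable after the upgrade.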

Data flow

  • Site script run: ScriptRuntimeContext generates the execution id (or is given one); every emitter it owns stamps ExecutionId.
  • Buffered cached call: the execution id rides on the S&F buffered message; the retry loop reconstructs it into CachedCallAttemptContext; CachedCallLifecycleBridge stamps it on the retry-loop audit rows.
  • Notification: the NotifySend row stamps it site-side; the id travels on NotificationSubmit, is stored as Notifications.OriginExecutionId, and the dispatcher stamps every NotifyDeliver row it emits.
  • Inbound API request: AuditWriteMiddleware mints a request id and stamps the inbound audit row.
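
The notification path in the list above can be sketched as a small in-memory model. The class names, the dictionary standing in for the Notifications table and the `Attempt` field are hypothetical, chosen only to show the id being persisted at submit time and echoed onto every delivery row.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class NotificationSubmit:
    """Hypothetical submit message carrying the originating run's id."""
    notification_id: UUID
    origin_execution_id: Optional[UUID]


class CentralModel:
    """Toy stand-in for the central Notifications table and dispatcher."""

    def __init__(self) -> None:
        # notification id -> OriginExecutionId (None for pre-change submits)
        self.notifications: Dict[UUID, Optional[UUID]] = {}
        self.audit_rows: List[dict] = []

    def accept(self, submit: NotificationSubmit) -> None:
        # Persist the origin id alongside the notification.
        self.notifications[submit.notification_id] = submit.origin_execution_id

    def deliver(self, notification_id: UUID, attempts: int) -> None:
        # The dispatcher has no script context; it reads the stored origin id.
        origin = self.notifications[notification_id]
        for attempt in range(1, attempts + 1):
            self.audit_rows.append(
                {
                    "Kind": "NotifyDeliver",
                    "CorrelationId": notification_id,
                    "ExecutionId": origin,
                    "Attempt": attempt,  # illustrative field only
                }
            )


run_id = uuid4()
notification_id = uuid4()

central = CentralModel()
central.accept(NotificationSubmit(notification_id, run_id))
central.deliver(notification_id, attempts=2)

deliver_execution_ids = {row["ExecutionId"] for row in central.audit_rows}
```

After two delivery attempts, `deliver_execution_ids` contains exactly one value, the originating run's id, while CorrelationId on those rows remains the notification id.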

UI / CLI surface

  • Central UI Audit Log page: ExecutionId added as a results-grid column (the grid already supports resize/reorder); an ExecutionId paste-filter in the filter bar; the page accepts ?executionId=<guid>; a row drill-in "View this execution" → /audit/log?executionId=<guid>.
  • CLI: scadalink audit query --execution-id <guid>.
  • ManagementService: /api/audit/query and the export endpoint accept an executionId filter parameter.
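
Handling of the `executionId` parameter could follow the pattern below. This is a sketch under assumptions: the function name is hypothetical, and ignoring a malformed value (rather than returning an error) is one possible policy that the design does not specify.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse
from uuid import UUID


def execution_id_from_url(url: str) -> Optional[UUID]:
    """Return the executionId filter from a URL, or None if absent or malformed."""
    values = parse_qs(urlparse(url).query).get("executionId")
    if not values:
        return None
    try:
        return UUID(values[0])
    except ValueError:
        # Assumed policy: an unparseable id means "no filter".
        return None


valid = execution_id_from_url(
    "/audit/log?executionId=3f2504e0-4f89-11d3-9a0c-0305e82c3301"
)
missing = execution_id_from_url("/audit/log")
malformed = execution_id_from_url("/audit/log?executionId=not-a-guid")
```

Parsing to a typed id at the boundary means the repository query only ever receives a well-formed value or no filter at all.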

Compatibility

  • Two additive nullable columns; additive proto field; additive message-contract fields — all version-compatible. No data backfill; historical rows keep ExecutionId = NULL.
  • CorrelationId semantics unchanged — every existing drill-in keeps working.

Testing

  • Repository: query-by-ExecutionId; migration smoke test.
  • Emitter unit tests: each emitter stamps ExecutionId; the cached-call lifecycle rows from one run share it; NotifyDeliver echoes Notifications.OriginExecutionId.
  • Integration: a script run that does a sync call + a cached call + a notification → all resulting audit rows share one ExecutionId end-to-end.
  • Central UI: bUnit (grid column, filter, drill-in) + Playwright.

Out of scope

  • Bridging the inbound request id into the routed site script's execution (cross-cluster threading) — a separate future change.
  • Backfilling ExecutionId on historical audit rows.