Audit Log — Cross-Execution Correlation (ParentExecutionId) Design
Date: 2026-05-21
Status: Validated, ready for implementation planning.
Problem
The Audit Log carries ExecutionId (Guid?) — a universal per-run correlation
value stamped on every audit row, identifying the originating script execution
or inbound API request. It is per-execution and flat: WHERE ExecutionId = X
returns everything one run did, but nothing links an execution to the
execution that spawned it. A call chain cannot be traced across the execution
boundary.
Two cross-execution cases exist:
- Inbound API request → routed site script. An inbound HTTP request runs an
  inbound method script (InboundScriptExecutor, central) which calls
  Route.Call(scriptName, params); that sends a RouteToCallRequest to a site
  instance, which runs scriptName as a fresh site-side execution. The inbound
  request and the routed site script get two unrelated ExecutionIds.
- Tag cascade. Script A writes an attribute; the attribute change triggers
  script B as a separate execution. A and B are unrelated.
Decision
Add a dedicated, nullable ParentExecutionId (Guid?) column to the audit
row. Every execution still gets its own fresh ExecutionId (unchanged). An
execution spawned by another carries the spawner's ExecutionId in its
ParentExecutionId; a top-level (tag/timer/inbound/un-bridged) execution leaves
it null. Walking ParentExecutionId → ExecutionId recursively reconstructs the
chain as a tree.
First cut — in scope: case 1 only, the inbound → routed-site-script
bridge. It is the most concrete case and the spawn point is an explicit,
threadable RPC (RouteToCallRequest).
Out of scope: case 2 (tag cascade) — the trigger is data-driven and
decoupled; "which execution wrote the tag that triggered me" is not tracked
anywhere today. Deferred as a follow-up. The ParentExecutionId model
generalises to it with no schema change if that data is ever threaded.
Considered and rejected
- Reuse ExecutionId: the routed script adopts the inbound request's
  ExecutionId instead of generating its own. Cheaper (no new column) but
  conflates two genuinely separate executions on two clusters, breaks the
  invariant "one ExecutionId = one ScriptRuntimeContext run", and does not
  generalise to tag cascade.
- Point ParentExecutionId at the root (flatten the chain to two levels)
  instead of the immediate spawner: simpler queries but loses intermediate
  hops, needs a separately threaded root id, and does not generalise. Rejected
  in favour of the immediate-spawner tree.
Architecture & data flow
The id propagated is the inbound API request's ExecutionId. The chain:
- Mint the inbound request id once, early. Today AuditWriteMiddleware mints a
  Guid.NewGuid() late, only for the inbound row's ExecutionId. Move the mint to
  the HTTP entry and stash it on HttpContext.Items, so both the middleware
  (writes the InboundRequest row at request end) and InboundScriptExecutor
  (needs it before the script runs) read the same id.
- Carry it on the routing RPC. RouteHelper.Call builds a RouteToCallRequest; an
  additive ParentExecutionId field is set from the stashed inbound id.
  (RouteHelper's own per-op GUID is a separate concern and is left alone.)
- Site side: thread it into the routed script's context. The site handler for
  RouteToCallRequest passes it to a new optional parentExecutionId ctor param
  on ScriptRuntimeContext (sibling to the existing executionId param). The
  routed script still generates its own fresh ExecutionId.
- Every emitter stamps ParentExecutionId alongside ExecutionId.
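The "mint once, read twice" step can be sketched as follows. This is an illustrative Python sketch, not the project's C#: the plain dict stands in for HttpContext.Items, and the key name and the helper get_or_mint_inbound_execution_id are hypothetical. The design mints at the HTTP entry; a get-or-mint helper is just one way to guarantee a single mint per request.

```python
import uuid

# Hypothetical key; stands in for whatever key the real code uses on HttpContext.Items.
INBOUND_EXECUTION_ID_KEY = "InboundExecutionId"


def get_or_mint_inbound_execution_id(items: dict) -> uuid.UUID:
    """Return the request's ExecutionId, minting it on first access only."""
    if INBOUND_EXECUTION_ID_KEY not in items:
        items[INBOUND_EXECUTION_ID_KEY] = uuid.uuid4()
    return items[INBOUND_EXECUTION_ID_KEY]


# One request-scoped bag, two readers.
request_items: dict = {}
id_seen_by_executor = get_or_mint_inbound_execution_id(request_items)    # before the script runs
id_seen_by_middleware = get_or_mint_inbound_execution_id(request_items)  # at request end
```

Both readers observe the same id, which is what lets the InboundRequest row's ExecutionId match the ParentExecutionId carried on the routing RPC.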
Recursion (immediate-spawner tree). A routed script that itself calls
Route.Call threads its own ExecutionId onward, so a grandchild's
ParentExecutionId points at its immediate spawner, not the root. Walk the tree
recursively to reconstruct any depth.
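A minimal sketch of the immediate-spawner rule, in Python for illustration only. ScriptContext is a hypothetical stand-in for ScriptRuntimeContext, and route_call models Route.Call without any of the real RPC plumbing.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ScriptContext:
    """Stand-in for ScriptRuntimeContext: fresh ExecutionId, optional parent."""
    parent_execution_id: Optional[uuid.UUID] = None
    execution_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def route_call(self) -> "ScriptContext":
        """Model Route.Call: the spawned run points at this run, not at the root."""
        return ScriptContext(parent_execution_id=self.execution_id)


inbound = ScriptContext()        # top-level: parent stays None
child = inbound.route_call()     # routed site script
grandchild = child.route_call()  # routed script that routed again
```

Here grandchild.parent_execution_id equals child.execution_id, so the intermediate hop survives; a root-pointing scheme would have lost it.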
The inbound request's own row (InboundRequest / InboundAuthFailure) is
top-level → ParentExecutionId = NULL. Only the routed site script and every
row it produces carry the pointer.
Schema changes (all additive, nullable — no backfill; pre-existing rows stay NULL)
| Where | Change |
|---|---|
| ScadaLink.Commons | AuditEvent.ParentExecutionId (Guid?); RouteToCallRequest.ParentExecutionId (Guid?); Notification.OriginParentExecutionId (Guid?); NotificationSubmit.OriginParentExecutionId (Guid?). |
| Central MS SQL AuditLog | ParentExecutionId uniqueidentifier NULL column + partition-aligned index IX_AuditLog_ParentExecution (ParentExecutionId) (mirror AddAuditLogExecutionId). EF migration; an additive nullable column is a metadata-only ALTER. |
| Central MS SQL Notifications | OriginParentExecutionId uniqueidentifier NULL column + EF migration (mirror AddNotificationOriginExecutionId). |
| Site SQLite auditlog.db AuditLog | ParentExecutionId TEXT NULL, added via the idempotent ALTER-if-missing upgrade path (per commit 5198b11), never relying on CREATE TABLE IF NOT EXISTS. |
| gRPC AuditEventDto (sitestream.proto) | additive parent_execution_id field (next free number); AuditEventDtoMapper maps it both directions (Guid ↔ string; empty string ↔ null). |
| ScriptRuntimeContext | optional parentExecutionId ctor param + stored _parentExecutionId field. |
IX_AuditLog_ParentExecution is load-bearing: the tree view's downward
recursive join seeks on it, and it backs the parentExecutionId filter.
SiteCalls needs no new column — the cached telemetry packet carries the audit
half, which now has ParentExecutionId directly.
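The Guid ↔ string mapping for the proto field (empty string ↔ null) can be sketched like this. It is an illustrative Python sketch, not AuditEventDtoMapper itself; the function names are hypothetical.

```python
import uuid
from typing import Optional


def guid_to_wire(value: Optional[uuid.UUID]) -> str:
    """Null parent becomes the empty string on the wire."""
    return "" if value is None else str(value)


def guid_from_wire(text: str) -> Optional[uuid.UUID]:
    """Empty string on the wire becomes a null parent."""
    return None if text == "" else uuid.UUID(text)


parent = uuid.uuid4()
round_tripped = guid_from_wire(guid_to_wire(parent))
```

Because an unset proto3 string field reads as the empty string, a message from a sender that predates the field would map to a null parent under this scheme.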
Emitter coverage — full (mirrors the ExecutionId rollout)
Every audit row a routed-script run produces carries ParentExecutionId, so
WHERE ParentExecutionId = X returns the routed run's complete trust-boundary
footprint.
| Emitter | ParentExecutionId source |
|---|---|
| Sync ApiCall, sync DbWrite | ScriptRuntimeContext._parentExecutionId (in scope) |
| Cached call script-side rows (CachedSubmit, immediate Attempted/CachedResolve) | ScriptRuntimeContext._parentExecutionId |
| Cached call S&F retry-loop rows (CachedCallLifecycleBridge) | threaded through the S&F buffered message → CachedCallAttemptContext → the bridge, as a sibling to the ExecutionId already threaded there |
| NotifySend (site, script-side) | ScriptRuntimeContext._parentExecutionId |
| NotifyDeliver (central dispatch) | Notifications.OriginParentExecutionId: rides on NotificationSubmit, is persisted on the Notifications row, and the dispatcher stamps it on every NotifyDeliver row |
| Inbound InboundRequest / InboundAuthFailure | NULL (inbound is top-level) |
The threading reuses the carry points the ExecutionId rollout already opened
(S&F buffer, NotificationSubmit → Notifications); ParentExecutionId is a
sibling field at each, not a new boundary.
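As one example of a carry point, the notification path (NotificationSubmit → Notifications row → NotifyDeliver rows) might look like the sketch below. It is Python for illustration only; the dataclasses, persist, and dispatch are hypothetical, and stamping each NotifyDeliver row's ExecutionId from the origin execution id is an assumption carried over from the ExecutionId rollout, not something this document states.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NotificationSubmit:
    origin_execution_id: uuid.UUID
    origin_parent_execution_id: Optional[uuid.UUID]


@dataclass(frozen=True)
class NotificationRow:
    origin_execution_id: uuid.UUID
    origin_parent_execution_id: Optional[uuid.UUID]


@dataclass(frozen=True)
class AuditRow:
    kind: str
    execution_id: uuid.UUID
    parent_execution_id: Optional[uuid.UUID]


def persist(submit: NotificationSubmit) -> NotificationRow:
    """Copy both origin ids from the submit message onto the stored row."""
    return NotificationRow(submit.origin_execution_id, submit.origin_parent_execution_id)


def dispatch(row: NotificationRow, attempts: int) -> List[AuditRow]:
    """Every delivery attempt emits a NotifyDeliver row stamped from the stored row."""
    return [
        AuditRow("NotifyDeliver", row.origin_execution_id, row.origin_parent_execution_id)
        for _ in range(attempts)
    ]


inbound_id, routed_id = uuid.uuid4(), uuid.uuid4()
deliver_rows = dispatch(persist(NotificationSubmit(routed_id, inbound_id)), attempts=3)
```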
Recursive chain/tree view
A new repository method GetExecutionTreeAsync(Guid executionId):
- Walk up to the root: iterative single-parent follow
  (SELECT TOP 1 ParentExecutionId FROM AuditLog WHERE ExecutionId = current
  AND ParentExecutionId IS NOT NULL) until null. Cheap, because each execution
  has at most one parent.
- Walk down from the root: recursive CTE joining
  ParentExecutionId = ancestor.ExecutionId, seeking on
  IX_AuditLog_ParentExecution. MAXRECURSION is capped (e.g. 32): chains are
  shallow, and the cap guards against corrupt/pathological data.
- Returns a flat list of execution nodes: ExecutionId, ParentExecutionId, row
  count, channels/statuses present, SourceSiteId/SourceInstanceId, first/last
  OccurredAtUtc. The UI assembles the tree from the flat list.
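The two walks can be tried out against an in-memory SQLite database. This is a sketch under stated assumptions, not the repository method: it uses Python's sqlite3 rather than MS SQL, so LIMIT 1 replaces TOP 1 and an explicit Depth column replaces the MAXRECURSION hint; the two-column table, the short string ids, and the function names are all illustrative.

```python
import sqlite3
from typing import List, Optional, Tuple

MAX_DEPTH = 32  # plays the role of the MAXRECURSION cap


def find_root(db: sqlite3.Connection, execution_id: str) -> str:
    """Walk up: follow the single parent pointer until none is found."""
    current = execution_id
    for _ in range(MAX_DEPTH):
        row = db.execute(
            "SELECT ParentExecutionId FROM AuditLog "
            "WHERE ExecutionId = ? AND ParentExecutionId IS NOT NULL LIMIT 1",
            (current,),
        ).fetchone()
        if row is None:
            break
        current = row[0]
    return current


def walk_down(db: sqlite3.Connection, root: str) -> List[Tuple[str, Optional[str]]]:
    """Walk down: recursive CTE from the root, bounded by MAX_DEPTH."""
    return db.execute(
        """
        WITH RECURSIVE tree(ExecutionId, ParentExecutionId, Depth) AS (
            SELECT ?, NULL, 0
            UNION  -- UNION (not UNION ALL) collapses the many rows of one execution
            SELECT a.ExecutionId, a.ParentExecutionId, t.Depth + 1
            FROM AuditLog a JOIN tree t ON a.ParentExecutionId = t.ExecutionId
            WHERE t.Depth < ?
        )
        SELECT ExecutionId, ParentExecutionId FROM tree ORDER BY Depth, ExecutionId
        """,
        (root, MAX_DEPTH),
    ).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE AuditLog (ExecutionId TEXT, ParentExecutionId TEXT)")
db.execute("CREATE INDEX IX_AuditLog_ParentExecution ON AuditLog (ParentExecutionId)")
db.executemany(
    "INSERT INTO AuditLog VALUES (?, ?)",
    [("root", None), ("child", "root"), ("child", "root"), ("grand", "child")],
)
```

Calling find_root(db, "grand") and then walk_down on the result yields one node per execution regardless of which row the reader entered from. The per-node aggregates (row count, channels, timestamps) are omitted here.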
UI. New route /audit/execution-tree?executionId=<guid>, reached via a
"View execution chain" drill-in from any audit row and from the ExecutionId
column. Renders an expandable custom Blazor tree (no component frameworks); each
node shows the execution summary; clicking a node filters the Audit Log grid to
?executionId=<node>. The tree is always rooted at the topmost ancestor, so the
reader sees the full chain regardless of which row they entered from.
Plus the cheaper navigation affordances: ParentExecutionId grid column (short
form / monospace), a ParentExecutionId paste-filter, a ?parentExecutionId=
query param, and a "View parent execution" drill-in (links
?executionId=<parentId>).
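Assembling the tree from the flat node list could look like the sketch below. It is illustrative Python rather than the Blazor component; assemble_tree and the dict shape are hypothetical, and each node is reduced to an (ExecutionId, ParentExecutionId) pair.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def assemble_tree(nodes: List[Tuple[str, Optional[str]]]) -> List[Dict]:
    """Group nodes by parent, then build nested dicts starting from the roots."""
    known = {execution_id for execution_id, _ in nodes}
    children: Dict[str, List[str]] = defaultdict(list)
    roots: List[str] = []
    for execution_id, parent_id in nodes:
        if parent_id is None or parent_id not in known:
            roots.append(execution_id)  # no resolvable parent in this list
        else:
            children[parent_id].append(execution_id)

    def build(execution_id: str) -> Dict:
        return {
            "executionId": execution_id,
            "children": [build(child) for child in children[execution_id]],
        }

    return [build(root) for root in roots]


tree = assemble_tree(
    [("root", None), ("child", "root"), ("grand", "child"), ("child2", "root")]
)
```

Keeping the repository result flat and nesting it in the UI means the query stays a plain row set and the expand/collapse state lives entirely client-side.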
Edge cases
- Parent with no rows of its own. An execution that performed no
  trust-boundary action emits no audit rows, yet a child still references it
  via ParentExecutionId. The upward walk resolves the GUID but finds no rows
  for that node → render it as a stub node ("execution with no audited
  actions").
- Purged parent. A parent execution older than the 365-day central retention
  has no rows → the upward walk stops there; the chain renders as far as it
  resolves.
- Cycle guard. The ParentExecutionId graph is acyclic by construction (each
  execution is minted fresh and its parent always pre-exists), but
  MAXRECURSION bounds the downward CTE against corrupt data.
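One way to produce stub nodes is to scan the flat list for parent ids that have no node of their own. The sketch below is illustrative Python; add_stub_parents, the dict keys, and the "stub" flag are hypothetical. From the rows alone, a parent that never emitted audit rows and a purged parent look identical in this sketch.

```python
from typing import Dict, List


def add_stub_parents(nodes: List[Dict]) -> List[Dict]:
    """Prepend a zero-row stub for every referenced parent that has no node."""
    known = {node["executionId"] for node in nodes}
    stubs: List[Dict] = []
    for node in nodes:
        parent_id = node["parentExecutionId"]
        if parent_id is not None and parent_id not in known:
            known.add(parent_id)  # emit at most one stub per missing parent
            stubs.append(
                {"executionId": parent_id, "parentExecutionId": None,
                 "rowCount": 0, "stub": True}
            )
    return stubs + nodes


flat = [
    {"executionId": "child-a", "parentExecutionId": "gone", "rowCount": 3},
    {"executionId": "child-b", "parentExecutionId": "gone", "rowCount": 1},
]
with_stubs = add_stub_parents(flat)
```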
CLI / ManagementService
- CLI: scadalink audit query --parent-execution-id <guid>; AuditLogQueryFilter
  gains a ParentExecutionId single-value filter dimension (mirror ExecutionId).
- ManagementService /api/audit/query + export endpoint and the CentralUI
  export endpoints parse a parentExecutionId query param (lax-parse:
  unparseable values are dropped).
- The tree view's data path: GetExecutionTreeAsync is exposed however the
  existing Audit Log page sources its grid data. Mirror that path; add a
  ManagementService endpoint only if the page goes through it.
- No CLI audit tree command in the first cut: the tree is a UI forensic
  affordance, and the --parent-execution-id filter covers scripted use. Noted
  as a possible follow-up.
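Lax parsing of the parentExecutionId query param (an unparseable value is dropped rather than rejected) can be sketched as follows. This is illustrative Python, not the endpoint code; parse_guid_lax is a hypothetical name, and Python's uuid.UUID accepts a slightly different set of textual forms than .NET's Guid parsing.

```python
import uuid
from typing import Mapping, Optional


def parse_guid_lax(query: Mapping[str, str], name: str) -> Optional[uuid.UUID]:
    """Return the parsed id, or None when the param is missing or malformed."""
    raw = query.get(name)
    if raw is None:
        return None
    try:
        return uuid.UUID(raw.strip())
    except ValueError:
        return None  # unparseable: drop the filter instead of failing the request


good = parse_guid_lax(
    {"parentExecutionId": "3f2504e0-4f89-11d3-9a0c-0305e82c3301"}, "parentExecutionId"
)
bad = parse_guid_lax({"parentExecutionId": "not-a-guid"}, "parentExecutionId")
missing = parse_guid_lax({}, "parentExecutionId")
```

With this behaviour a malformed value simply leaves the query unfiltered on that dimension.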
Compatibility
- Additive nullable columns; additive proto field; additive message-contract
  fields: all version-compatible. No backfill; historical rows keep
  ParentExecutionId = NULL.
- ExecutionId and CorrelationId semantics unchanged; every existing drill-in
  keeps working.
Failure handling
- Audit-write failure NEVER aborts the user-facing action (unchanged
  invariant); ParentExecutionId is just another field on the row.
- Site auditlog.db schema change MUST use the idempotent ALTER-if-missing path
  (commit 5198b11); do not repeat the original CREATE TABLE IF NOT EXISTS
  mistake.
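The difference between the two upgrade approaches can be shown against an in-memory SQLite database. This is a minimal Python sketch, not the code from commit 5198b11; ensure_column is a hypothetical helper, and the one-column starting schema is invented for the demonstration.

```python
import sqlite3


def ensure_column(db: sqlite3.Connection, table: str, column: str, declaration: str) -> bool:
    """Add the column only if it is missing. Returns True when it was added.

    Table and column names are trusted constants here, never user input.
    """
    existing = {row[1].lower() for row in db.execute(f"PRAGMA table_info({table})")}
    if column.lower() in existing:
        return False
    db.execute(f"ALTER TABLE {table} ADD COLUMN {column} {declaration}")
    return True


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE AuditLog (ExecutionId TEXT)")  # a pre-existing site database

# The mistake: the table already exists, so this statement changes nothing.
db.execute("CREATE TABLE IF NOT EXISTS AuditLog (ExecutionId TEXT, ParentExecutionId TEXT)")
columns_after_create = [row[1] for row in db.execute("PRAGMA table_info(AuditLog)")]

# The idempotent path: safe to run on every startup. SQLite columns are nullable by default.
added_first_time = ensure_column(db, "AuditLog", "ParentExecutionId", "TEXT")
added_second_time = ensure_column(db, "AuditLog", "ParentExecutionId", "TEXT")
columns_after_alter = [row[1] for row in db.execute("PRAGMA table_info(AuditLog)")]
```

The CREATE TABLE IF NOT EXISTS statement leaves an existing table untouched, which is why a new column declared only there never reaches upgraded sites.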
Testing
- Repository: query-by-ParentExecutionId; GetExecutionTreeAsync (multi-level
  tree, stub-parent node, MAXRECURSION cap); migration smoke test.
- Emitter unit tests: each emitter stamps ParentExecutionId; the cached-call
  lifecycle rows from one routed run share it; NotifyDeliver echoes
  Notifications.OriginParentExecutionId.
- Headline integration test: an inbound API request that calls Route.Call →
  the routed site script does a sync ExternalSystem.Call, a cached call, and a
  Notify.Send → every resulting audit row (site + central) carries
  ParentExecutionId = the inbound request's ExecutionId, while each has its own
  distinct ExecutionId.
- Central UI: bUnit (column renders, filter maps, query param parsed, tree
  assembled from the flat list) + Playwright (drill-in → tree → node click
  filters the grid).
Out of scope / follow-ups
- Tag cascade (case 2): deferred. If the attribute-write path ever carries
  the writing execution's id into the triggered script's ScriptRuntimeContext,
  the same ParentExecutionId column and tree view cover it with no schema
  change.
- CLI audit tree command: possible follow-up.
- Backfilling ParentExecutionId on historical audit rows: not done.
Constraints
- Additive everywhere — nullable columns, additive proto/message fields, no backfill.
- Never touch infra/*; alog.md is the locked v1 spec and must not be modified.
- Site auditlog.db schema change MUST use the idempotent ALTER-if-missing path
  (commit 5198b11).