# Audit Log — Cross-Execution Correlation (`ParentExecutionId`) Design

**Date:** 2026-05-21

**Status:** Validated — ready for implementation planning.

## Problem

The Audit Log carries `ExecutionId` (`Guid?`) — a universal per-run correlation value stamped on every audit row, identifying the originating script execution or inbound API request. It is **per-execution and flat**: `WHERE ExecutionId = X` returns everything *one* run did, but nothing links an execution to the execution that *spawned* it. A call chain cannot be traced across the execution boundary.

Two cross-execution cases exist:

1. **Inbound API request → routed site script.** An inbound HTTP request runs an inbound method script (`InboundScriptExecutor`, central) which calls `Route.Call(scriptName, params)`; that sends a `RouteToCallRequest` to a site instance, which runs `scriptName` as a fresh site-side execution. The inbound request and the routed site script get two unrelated `ExecutionId`s.
2. **Tag cascade.** Script A writes an attribute; the attribute change triggers script B as a separate execution. A and B are unrelated.

## Decision

Add a dedicated, nullable **`ParentExecutionId`** (`Guid?`) column to the audit row. Every execution still gets its own fresh `ExecutionId` (unchanged). An execution *spawned by* another carries the spawner's `ExecutionId` in its `ParentExecutionId`; a top-level (tag/timer/inbound/un-bridged) execution leaves it null. Walking `ParentExecutionId → ExecutionId` recursively reconstructs the chain as a tree.

**First cut — in scope:** case 1 only, the **inbound → routed-site-script bridge**. It is the most concrete case and the spawn point is an explicit, threadable RPC (`RouteToCallRequest`).

**Out of scope:** case 2 (tag cascade) — the trigger is data-driven and decoupled; "which execution wrote the tag that triggered me" is not tracked anywhere today. Deferred as a follow-up.
The `ParentExecutionId` model generalises to it with no schema change if that data is ever threaded.

### Considered and rejected

- **Reuse `ExecutionId`** — the routed script *adopts* the inbound request's `ExecutionId` instead of generating its own. Cheaper (no new column) but conflates two genuinely separate executions on two clusters, breaks the invariant "one `ExecutionId` = one `ScriptRuntimeContext` run", and does not generalise to tag cascade.
- **Point `ParentExecutionId` at the root** (flatten the chain to two levels) instead of the immediate spawner — simpler queries but loses intermediate hops, needs a separately threaded root id, and does not generalise. Rejected in favour of the immediate-spawner tree.

## Architecture & data flow

The id propagated is the **inbound API request's `ExecutionId`**. The chain:

1. **Mint the inbound request id once, early.** Today `AuditWriteMiddleware` mints a `Guid.NewGuid()` late, only for the inbound row's `ExecutionId`. Move the mint to the HTTP entry and stash it on `HttpContext.Items`, so both the middleware (writes the `InboundRequest` row at request end) and `InboundScriptExecutor` (needs it *before* the script runs) read the same id.
2. **Carry it on the routing RPC.** `RouteHelper.Call` builds a `RouteToCallRequest`; an additive `ParentExecutionId` field is set from the stashed inbound id. (`RouteHelper`'s own per-op GUID is a separate concern — left alone.)
3. **Site side: thread it into the routed script's context.** The site handler for `RouteToCallRequest` passes it to a new optional `parentExecutionId` ctor param on `ScriptRuntimeContext` (sibling to the existing `executionId` param). The routed script still generates its **own** fresh `ExecutionId`.
4. **Every emitter stamps `ParentExecutionId`** alongside `ExecutionId`.
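
Steps 1 to 4 can be checked in isolation with a minimal Python sketch that models the C# types as plain dataclasses. Every name here (`begin_inbound_request`, `handle_route_to_call`, `route_call`, the `InboundExecutionId` key) is a hypothetical stand-in, not the real `AuditWriteMiddleware`, `RouteHelper` or `ScriptRuntimeContext` API, and the sketch assumes the inbound script's context reuses the id stashed on `HttpContext.Items`.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RouteToCallRequest:
    # Hypothetical stand-in for the C# message; parent_execution_id is the additive field.
    script_name: str
    params: dict
    parent_execution_id: Optional[uuid.UUID] = None


@dataclass
class ScriptRuntimeContext:
    # Hypothetical stand-in: the parent is optional, the own id is always freshly minted.
    parent_execution_id: Optional[uuid.UUID] = None
    execution_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def audit_row(self, channel: str) -> dict:
        # Step 4: every emitter stamps both ids side by side.
        return {
            "Channel": channel,
            "ExecutionId": self.execution_id,
            "ParentExecutionId": self.parent_execution_id,
        }

    def route_call(self, script_name: str, params: dict) -> RouteToCallRequest:
        # Step 2: the routing RPC carries the caller's own ExecutionId.
        return RouteToCallRequest(script_name, params, self.execution_id)


def begin_inbound_request(http_items: dict) -> uuid.UUID:
    # Step 1: mint once at the HTTP entry; the dict stands in for HttpContext.Items.
    http_items["InboundExecutionId"] = uuid.uuid4()
    return http_items["InboundExecutionId"]


def handle_route_to_call(request: RouteToCallRequest) -> ScriptRuntimeContext:
    # Step 3: the routed script gets a fresh ExecutionId plus the parent pointer.
    return ScriptRuntimeContext(parent_execution_id=request.parent_execution_id)


# Usage: inbound request -> routed script -> nested routed script.
http_items: dict = {}
inbound_id = begin_inbound_request(http_items)
inbound = ScriptRuntimeContext(execution_id=http_items["InboundExecutionId"])
child = handle_route_to_call(inbound.route_call("routedScript", {}))
grandchild = handle_route_to_call(child.route_call("nestedScript", {}))
```

The nested call shows the immediate-spawner rule: the grandchild's parent is the routed script, not the inbound request, while the inbound context itself stays top-level with no parent.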

**Recursion (immediate-spawner tree).** A routed script that itself calls `Route.Call` threads its own `ExecutionId` onward, so a grandchild's `ParentExecutionId` points at its immediate spawner, not the root. Walk the tree recursively to reconstruct any depth.

**The inbound request's own row** (`InboundRequest` / `InboundAuthFailure`) is top-level → `ParentExecutionId = NULL`. Only the routed site script and every row it produces carry the pointer.

## Schema changes (all additive, nullable — no backfill; pre-existing rows stay `NULL`)

| Where | Change |
|---|---|
| `ScadaLink.Commons` | `AuditEvent.ParentExecutionId` (`Guid?`); `RouteToCallRequest.ParentExecutionId` (`Guid?`); `Notification.OriginParentExecutionId` (`Guid?`); `NotificationSubmit.OriginParentExecutionId` (`Guid?`). |
| Central MS SQL `AuditLog` | `ParentExecutionId uniqueidentifier NULL` column + partition-aligned index `IX_AuditLog_ParentExecution (ParentExecutionId)` (mirror `AddAuditLogExecutionId`). EF migration — additive nullable column is a metadata-only `ALTER`. |
| Central MS SQL `Notifications` | `OriginParentExecutionId uniqueidentifier NULL` column + EF migration (mirror `AddNotificationOriginExecutionId`). |
| Site SQLite `auditlog.db` `AuditLog` | `ParentExecutionId TEXT NULL` — added **via the idempotent `ALTER`-if-missing upgrade path** (per commit `5198b11`), never relying on `CREATE TABLE IF NOT EXISTS`. |
| gRPC `AuditEventDto` (`sitestream.proto`) | additive `parent_execution_id` field (next free number); `AuditEventDtoMapper` maps it both directions (Guid ↔ string; empty string ↔ null). |
| `ScriptRuntimeContext` | optional `parentExecutionId` ctor param + stored `_parentExecutionId` field. |

`IX_AuditLog_ParentExecution` is load-bearing: the tree view's downward recursive join seeks on it, and it backs the `parentExecutionId` filter.

`SiteCalls` needs no new column — the cached telemetry packet carries the audit half, which now has `ParentExecutionId` directly.
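
The `AuditEventDtoMapper` row in the table above only needs a null/empty-string convention, because proto3 `string` fields cannot be null and default to `""`. A minimal Python sketch of that round trip follows; the function names are hypothetical and the real mapper is C#.

```python
import uuid
from typing import Optional


def parent_execution_id_to_dto(value: Optional[uuid.UUID]) -> str:
    # proto3 strings cannot be null, so "no parent" travels as the default "".
    return "" if value is None else str(value)


def parent_execution_id_from_dto(value: str) -> Optional[uuid.UUID]:
    # "" maps back to None; any other value must be a well-formed GUID
    # (uuid.UUID raises ValueError otherwise).
    return uuid.UUID(value) if value else None
```

The same convention keeps the field version-compatible: a message from a sender that predates `parent_execution_id` decodes with the default `""`, which maps to `NULL`.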

## Emitter coverage — full (mirrors the `ExecutionId` rollout)

Every audit row a routed-script run produces carries `ParentExecutionId`, so `WHERE ParentExecutionId = X` returns the routed run's complete trust-boundary footprint.

| Emitter | `ParentExecutionId` source |
|---|---|
| Sync `ApiCall`, sync `DbWrite` | `ScriptRuntimeContext._parentExecutionId` (in scope) |
| Cached call script-side rows (`CachedSubmit`, immediate `Attempted`/`CachedResolve`) | `ScriptRuntimeContext._parentExecutionId` |
| Cached call **S&F retry-loop** rows (`CachedCallLifecycleBridge`) | threaded through the S&F buffered message → `CachedCallAttemptContext` → the bridge, as a sibling to the `ExecutionId` already threaded there |
| `NotifySend` (site, script-side) | `ScriptRuntimeContext._parentExecutionId` |
| `NotifyDeliver` (central dispatch) | `Notifications.OriginParentExecutionId` — rides on `NotificationSubmit`, persisted on the `Notifications` row, dispatcher stamps every `NotifyDeliver` row |
| Inbound `InboundRequest` / `InboundAuthFailure` | `NULL` — inbound is top-level |

The threading reuses the carry points the `ExecutionId` rollout already opened (S&F buffer, `NotificationSubmit` → `Notifications`); `ParentExecutionId` is a sibling field at each, not a new boundary.

## Recursive chain/tree view

A new repository method `GetExecutionTreeAsync(Guid executionId)`:

- **Walk up** to the root: iterative single-parent follow (`SELECT TOP 1 ParentExecutionId FROM AuditLog WHERE ExecutionId = @current AND ParentExecutionId IS NOT NULL`) until null. Cheap, because each execution has at most one parent.
- **Walk down** from the root: recursive CTE joining `ParentExecutionId = ancestor.ExecutionId`, seeking on `IX_AuditLog_ParentExecution`. `MAXRECURSION` capped (e.g. 32) — chains are shallow; the cap guards against corrupt/pathological data.
- Returns a flat list of execution nodes: `ExecutionId`, `ParentExecutionId`, row count, channels/statuses present, `SourceSiteId`/`SourceInstanceId`, first/last `OccurredAtUtc`. The UI assembles the tree from the flat list.

**UI.** New route `/audit/execution-tree?executionId=`, reached via a "View execution chain" drill-in from any audit row and from the `ExecutionId` column. Renders an expandable custom Blazor tree (no component frameworks); each node shows the execution summary; clicking a node filters the Audit Log grid to `?executionId=`. The tree is always rooted at the topmost ancestor, so the reader sees the full chain regardless of which row they entered from.

Plus the cheaper navigation affordances: `ParentExecutionId` grid column (short form / monospace), a `ParentExecutionId` paste-filter, a `?parentExecutionId=` query param, and a "View parent execution" drill-in (links `?executionId=`).

### Edge cases

- **Parent with no rows of its own.** An execution that performed no trust-boundary action emits no audit rows, yet a child still references it via `ParentExecutionId`. The upward walk resolves the GUID but finds no rows for that node → render it as a stub node ("execution with no audited actions").
- **Purged parent.** A parent execution older than the 365-day central retention has no rows → the upward walk stops there; the chain renders as far as it resolves.
- **Cycle guard.** The `ParentExecutionId` graph is acyclic by construction (each execution is minted fresh and its parent always pre-exists), but `MAXRECURSION` bounds the downward CTE against corrupt data.

## CLI / ManagementService

- CLI: `scadalink audit query --parent-execution-id `; `AuditLogQueryFilter` gains a `ParentExecutionId` single-value filter dimension (mirror `ExecutionId`).
- ManagementService `/api/audit/query` + export endpoint and the CentralUI export endpoints parse a `parentExecutionId` query param (lax-parse — unparseable dropped).
- The tree view's data path: `GetExecutionTreeAsync` is exposed however the existing Audit Log page sources its grid data — mirror that path; add a ManagementService endpoint only if the page goes through it.
- **No CLI `audit tree` command in the first cut** — the tree is a UI forensic affordance; the `--parent-execution-id` filter covers scripted use. Noted as a possible follow-up.

## Compatibility

- Additive nullable columns; additive proto field; additive message-contract fields — all version-compatible. No backfill; historical rows keep `ParentExecutionId = NULL`.
- `ExecutionId` and `CorrelationId` semantics unchanged — every existing drill-in keeps working.

## Failure handling

- Audit-write failure NEVER aborts the user-facing action — unchanged invariant; `ParentExecutionId` is just another field on the row.
- Site `auditlog.db` schema change MUST use the idempotent `ALTER`-if-missing path (commit `5198b11`); do not repeat the original `CREATE TABLE IF NOT EXISTS` mistake.

## Testing

- Repository: query-by-`ParentExecutionId`; `GetExecutionTreeAsync` (multi-level tree, stub-parent node, `MAXRECURSION` cap); migration smoke test.
- Emitter unit tests: each emitter stamps `ParentExecutionId`; the cached-call lifecycle rows from one routed run share it; `NotifyDeliver` echoes `Notifications.OriginParentExecutionId`.
- **Headline integration test:** an inbound API request that calls `Route.Call` → the routed site script does a sync `ExternalSystem.Call`, a cached call, and a `Notify.Send` → every resulting audit row (site + central) carries `ParentExecutionId` = the inbound request's `ExecutionId`, while each has its own distinct `ExecutionId`.
- Central UI: bUnit (column renders, filter maps, query param parsed, tree assembled from the flat list) + Playwright (drill-in → tree → node click filters the grid).

## Out of scope / follow-ups

- **Tag cascade (case 2)** — deferred.
  If the attribute-write path ever carries the writing execution's id into the triggered script's `ScriptRuntimeContext`, the same `ParentExecutionId` column and tree view cover it with no schema change.
- CLI `audit tree` command — possible follow-up.
- Backfilling `ParentExecutionId` on historical audit rows — not done.

## Constraints

- Additive everywhere — nullable columns, additive proto/message fields, no backfill.
- Never touch `infra/*`; `alog.md` is the locked v1 spec — do not modify it.
- Site `auditlog.db` schema change MUST use the idempotent `ALTER`-if-missing path (commit `5198b11`).
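
The `ALTER`-if-missing pattern can be illustrated with a minimal Python `sqlite3` sketch. It assumes only that an `AuditLog` table already exists; the function name is hypothetical and the real upgrade path (commit `5198b11`) is C#. It reads the live schema with `PRAGMA table_info` and adds the column only when absent, so it is safe to run on every startup.

```python
import sqlite3


def ensure_parent_execution_id_column(conn: sqlite3.Connection) -> bool:
    """Add AuditLog.ParentExecutionId if missing; return True if it was added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(AuditLog)")}
    if "ParentExecutionId" in existing:
        return False  # already upgraded, so a re-run is a no-op
    # SQLite columns are nullable unless declared NOT NULL; no backfill is done,
    # so pre-existing rows read back as NULL.
    conn.execute("ALTER TABLE AuditLog ADD COLUMN ParentExecutionId TEXT")
    conn.commit()
    return True
```

`CREATE TABLE IF NOT EXISTS` cannot do this job: on an existing database it is a no-op, so a column added only to the `CREATE` statement never reaches sites that already have the table.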