# Audit Log — Cross-Execution Correlation (`ParentExecutionId`) Design

**Date:** 2026-05-21
**Status:** Validated — ready for implementation planning.

## Problem

The Audit Log carries `ExecutionId` (`Guid?`) — a universal per-run correlation
value stamped on every audit row, identifying the originating script execution
or inbound API request. It is **per-execution and flat**: `WHERE ExecutionId = X`
returns everything *one* run did, but nothing links an execution to the
execution that *spawned* it. A call chain cannot be traced across the execution
boundary.

Two cross-execution cases exist:

1. **Inbound API request → routed site script.** An inbound HTTP request runs an
   inbound method script (`InboundScriptExecutor`, central) which calls
   `Route.Call(scriptName, params)`; that sends a `RouteToCallRequest` to a site
   instance, which runs `scriptName` as a fresh site-side execution. The inbound
   request and the routed site script get two unrelated `ExecutionId`s.
2. **Tag cascade.** Script A writes an attribute; the attribute change triggers
   script B as a separate execution. A and B carry unrelated `ExecutionId`s.

## Decision

Add a dedicated, nullable **`ParentExecutionId`** (`Guid?`) column to the audit
row. Every execution still gets its own fresh `ExecutionId` (unchanged). An
execution *spawned by* another carries the spawner's `ExecutionId` in its
`ParentExecutionId`; a top-level (tag/timer/inbound/un-bridged) execution leaves
it null. Walking `ParentExecutionId → ExecutionId` recursively reconstructs the
chain as a tree.

**First cut — in scope:** case 1 only, the **inbound → routed-site-script
bridge**. It is the most concrete case and the spawn point is an explicit,
threadable RPC (`RouteToCallRequest`).

**Out of scope:** case 2 (tag cascade) — the trigger is data-driven and
decoupled; "which execution wrote the tag that triggered me" is not tracked
anywhere today. Deferred as a follow-up. The `ParentExecutionId` model
generalises to it with no schema change if that data is ever threaded.

### Considered and rejected

- **Reuse `ExecutionId`** — the routed script *adopts* the inbound request's
  `ExecutionId` instead of generating its own. Cheaper (no new column) but
  conflates two genuinely separate executions on two clusters, breaks the
  invariant "one `ExecutionId` = one `ScriptRuntimeContext` run", and does not
  generalise to tag cascade.
- **Point `ParentExecutionId` at the root** (flatten the chain to two levels)
  instead of the immediate spawner — simpler queries but loses intermediate
  hops, needs a separately threaded root id, and does not generalise. Rejected
  in favour of the immediate-spawner tree.

## Architecture & data flow

The id propagated is the **inbound API request's `ExecutionId`**. The chain:

1. **Mint the inbound request id once, early.** Today `AuditWriteMiddleware`
   mints a `Guid.NewGuid()` late, only for the inbound row's `ExecutionId`. Move
   the mint to the HTTP entry and stash it on `HttpContext.Items`, so both the
   middleware (writes the `InboundRequest` row at request end) and
   `InboundScriptExecutor` (needs it *before* the script runs) read the same id.
2. **Carry it on the routing RPC.** `RouteHelper.Call` builds a
   `RouteToCallRequest`; an additive `ParentExecutionId` field is set from the
   stashed inbound id. (`RouteHelper`'s own per-op GUID is a separate concern —
   left alone.)
3. **Site side: thread it into the routed script's context.** The site handler
   for `RouteToCallRequest` passes it to a new optional `parentExecutionId` ctor
   param on `ScriptRuntimeContext` (sibling to the existing `executionId`
   param). The routed script still generates its **own** fresh `ExecutionId`.
4. **Every emitter stamps `ParentExecutionId`** alongside `ExecutionId`.

**Recursion (immediate-spawner tree).** A routed script that itself calls
`Route.Call` threads its own `ExecutionId` onward, so a grandchild's
`ParentExecutionId` points at its immediate spawner, not the root. Walk the tree
recursively to reconstruct any depth.
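
The four steps and the recursion rule can be sketched in a language-neutral way. The Python below is an illustration only, not the C# implementation: `route_call` and `handle_route_to_call` are hypothetical stand-ins for `RouteHelper.Call` and the site handler, and the inbound request is modelled as a plain context even though its id is really minted at the HTTP entry.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RouteToCallRequest:
    """Stand-in for the routing RPC; only the fields relevant here."""
    script_name: str
    parent_execution_id: Optional[uuid.UUID] = None  # additive, nullable


@dataclass
class ScriptRuntimeContext:
    """Stand-in for the runtime context of one execution."""
    parent_execution_id: Optional[uuid.UUID] = None
    # Every execution mints its own fresh id, whether spawned or top-level.
    execution_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def route_call(self, script_name: str) -> RouteToCallRequest:
        # The caller's own ExecutionId becomes the child's parent pointer.
        return RouteToCallRequest(script_name, parent_execution_id=self.execution_id)


def handle_route_to_call(request: RouteToCallRequest) -> ScriptRuntimeContext:
    """Site-side handler (hypothetical): thread the parent id into a fresh context."""
    return ScriptRuntimeContext(parent_execution_id=request.parent_execution_id)


# Inbound request (top-level) -> routed script -> script routed by that script.
inbound = ScriptRuntimeContext()
child = handle_route_to_call(inbound.route_call("SiteScriptA"))
grandchild = handle_route_to_call(child.route_call("SiteScriptB"))
```

Here `grandchild.parent_execution_id` equals `child.execution_id`, not `inbound.execution_id`, which is the immediate-spawner behaviour described above.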

**The inbound request's own row** (`InboundRequest` / `InboundAuthFailure`) is
top-level → `ParentExecutionId = NULL`. Only the routed site script and every
row it produces carry the pointer.

## Schema changes (all additive, nullable — no backfill; pre-existing rows stay `NULL`)

| Where | Change |
|---|---|
| `ScadaLink.Commons` | `AuditEvent.ParentExecutionId` (`Guid?`); `RouteToCallRequest.ParentExecutionId` (`Guid?`); `Notification.OriginParentExecutionId` (`Guid?`); `NotificationSubmit.OriginParentExecutionId` (`Guid?`). |
| Central MS SQL `AuditLog` | `ParentExecutionId uniqueidentifier NULL` column + partition-aligned index `IX_AuditLog_ParentExecution (ParentExecutionId)` (mirror `AddAuditLogExecutionId`). EF migration — additive nullable column is a metadata-only `ALTER`. |
| Central MS SQL `Notifications` | `OriginParentExecutionId uniqueidentifier NULL` column + EF migration (mirror `AddNotificationOriginExecutionId`). |
| Site SQLite `auditlog.db` `AuditLog` | `ParentExecutionId TEXT NULL` — added **via the idempotent `ALTER`-if-missing upgrade path** (per commit `5198b11`), never relying on `CREATE TABLE IF NOT EXISTS`. |
| gRPC `AuditEventDto` (`sitestream.proto`) | additive `parent_execution_id` field (next free number); `AuditEventDtoMapper` maps it both directions (Guid ↔ string; empty string ↔ null). |
| `ScriptRuntimeContext` | optional `parentExecutionId` ctor param + stored `_parentExecutionId` field. |

`IX_AuditLog_ParentExecution` is load-bearing: the tree view's downward
recursive join seeks on it, and it backs the `parentExecutionId` filter.

`SiteCalls` needs no new column — the cached telemetry packet carries the audit
half, which now has `ParentExecutionId` directly.
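
The Guid ↔ string rule listed for `AuditEventDtoMapper` exists because a proto3 string field cannot be absent on the wire; its default is the empty string. A minimal sketch of the two directions, in Python for illustration (the function names are hypothetical, and the real mapper is C#):

```python
import uuid
from typing import Optional


def guid_to_wire(value: Optional[uuid.UUID]) -> str:
    # null on the domain side becomes the proto3 default, the empty string.
    return "" if value is None else str(value)


def guid_from_wire(text: str) -> Optional[uuid.UUID]:
    # The empty string means "no parent"; anything else must be a GUID.
    return uuid.UUID(text) if text else None


parent = uuid.UUID("11111111-2222-3333-4444-555555555555")
round_tripped = guid_from_wire(guid_to_wire(parent))
missing = guid_from_wire(guid_to_wire(None))
```

An event written by an older site that does not know the field arrives with the empty string and therefore maps to null, which matches the "historical rows stay `NULL`" rule.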

## Emitter coverage — full (mirrors the `ExecutionId` rollout)

Every audit row a routed-script run produces carries `ParentExecutionId`, so
`WHERE ParentExecutionId = X`, with X the spawner's `ExecutionId`, returns the
complete trust-boundary footprint of every run X spawned directly.

| Emitter | `ParentExecutionId` source |
|---|---|
| Sync `ApiCall`, sync `DbWrite` | `ScriptRuntimeContext._parentExecutionId` (in scope) |
| Cached call script-side rows (`CachedSubmit`, immediate `Attempted`/`CachedResolve`) | `ScriptRuntimeContext._parentExecutionId` |
| Cached call **S&F retry-loop** rows (`CachedCallLifecycleBridge`) | threaded through the S&F buffered message → `CachedCallAttemptContext` → the bridge, as a sibling to the `ExecutionId` already threaded there |
| `NotifySend` (site, script-side) | `ScriptRuntimeContext._parentExecutionId` |
| `NotifyDeliver` (central dispatch) | `Notifications.OriginParentExecutionId` — rides on `NotificationSubmit`, persisted on the `Notifications` row, dispatcher stamps every `NotifyDeliver` row |
| Inbound `InboundRequest` / `InboundAuthFailure` | `NULL` — inbound is top-level |

The threading reuses the carry points the `ExecutionId` rollout already opened
(S&F buffer, `NotificationSubmit` → `Notifications`); `ParentExecutionId` is a
sibling field at each, not a new boundary.
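
As an illustration of one such carry point, the `NotifyDeliver` path can be sketched as below. This is Python for readability, not the C# code. The types are reduced to the two id fields, `persist` and `dispatch` are hypothetical names, and stamping `ExecutionId` from the stored origin id is an assumption based on the existing `ExecutionId` rollout.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NotificationSubmit:
    """Site -> central message; both origin ids travel as sibling fields."""
    origin_execution_id: Optional[uuid.UUID]
    origin_parent_execution_id: Optional[uuid.UUID]


@dataclass(frozen=True)
class NotificationRow:
    """Persisted central Notifications row (only the relevant columns)."""
    origin_execution_id: Optional[uuid.UUID]
    origin_parent_execution_id: Optional[uuid.UUID]


@dataclass(frozen=True)
class AuditEvent:
    kind: str
    execution_id: Optional[uuid.UUID]
    parent_execution_id: Optional[uuid.UUID]


def persist(submit: NotificationSubmit) -> NotificationRow:
    # Copy the ids onto the row so they survive until dispatch time.
    return NotificationRow(submit.origin_execution_id, submit.origin_parent_execution_id)


def dispatch(row: NotificationRow, attempts: int) -> List[AuditEvent]:
    # Each delivery attempt emits a NotifyDeliver row stamped from the stored ids.
    return [
        AuditEvent("NotifyDeliver", row.origin_execution_id, row.origin_parent_execution_id)
        for _ in range(attempts)
    ]


inbound_id, routed_id = uuid.uuid4(), uuid.uuid4()
events = dispatch(persist(NotificationSubmit(routed_id, inbound_id)), attempts=2)
```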

## Recursive chain/tree view

A new repository method `GetExecutionTreeAsync(Guid executionId)`:

- **Walk up** to the root: iterative single-parent follow
  (`SELECT TOP 1 ParentExecutionId FROM AuditLog WHERE ExecutionId = current
  AND ParentExecutionId IS NOT NULL`) until no row comes back. Cheap: each
  execution has at most one parent.
- **Walk down** from the root: recursive CTE joining
  `ParentExecutionId = ancestor.ExecutionId`, seeking on
  `IX_AuditLog_ParentExecution`. `MAXRECURSION` capped (e.g. 32) — chains are
  shallow; the cap guards against corrupt/pathological data.
- Returns a flat list of execution nodes: `ExecutionId`, `ParentExecutionId`,
  row count, channels/statuses present, `SourceSiteId`/`SourceInstanceId`,
  first/last `OccurredAtUtc`. The UI assembles the tree from the flat list.
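
The two walks can be tried out against SQLite as a stand-in for the central SQL Server table. This is a sketch under assumptions, not the repository method: the table is reduced to three columns, short strings replace GUIDs, `LIMIT 1` replaces `TOP 1`, and a `Depth` predicate plays the role of `OPTION (MAXRECURSION n)`. One difference to keep in mind: SQL Server fails the statement when the cap is exceeded, whereas the depth predicate here simply stops descending.

```python
import sqlite3

MAX_DEPTH = 32  # plays the role of the MAXRECURSION cap

DOWNWARD_WALK = """
WITH RECURSIVE tree(ExecutionId, ParentExecutionId, Depth) AS (
    SELECT ?, NULL, 0
    UNION
    SELECT a.ExecutionId, a.ParentExecutionId, t.Depth + 1
    FROM AuditLog AS a
    JOIN tree AS t ON a.ParentExecutionId = t.ExecutionId
    WHERE t.Depth < ?
)
SELECT t.ExecutionId, t.ParentExecutionId, COUNT(a.ExecutionId),
       MIN(a.OccurredAtUtc), MAX(a.OccurredAtUtc)
FROM tree AS t
LEFT JOIN AuditLog AS a ON a.ExecutionId = t.ExecutionId
GROUP BY t.ExecutionId, t.ParentExecutionId, t.Depth
ORDER BY t.Depth, t.ExecutionId
"""


def find_root(conn, execution_id):
    """Walk up: follow the single parent pointer until none resolves."""
    current, seen = execution_id, {execution_id}
    while len(seen) <= MAX_DEPTH:
        row = conn.execute(
            "SELECT ParentExecutionId FROM AuditLog "
            "WHERE ExecutionId = ? AND ParentExecutionId IS NOT NULL LIMIT 1",
            (current,),
        ).fetchone()
        if row is None or row[0] in seen:  # top reached, or corrupt cycle
            break
        current = row[0]
        seen.add(current)
    return current


def get_execution_tree(conn, execution_id):
    """Return flat nodes: (id, parent id, row count, first seen, last seen)."""
    root = find_root(conn, execution_id)
    return conn.execute(DOWNWARD_WALK, (root, MAX_DEPTH)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AuditLog (ExecutionId TEXT, ParentExecutionId TEXT, OccurredAtUtc TEXT)"
)
conn.executemany("INSERT INTO AuditLog VALUES (?, ?, ?)", [
    ("inbound", None, "2026-05-21T10:00:00Z"),
    ("child", "inbound", "2026-05-21T10:00:01Z"),
    ("child", "inbound", "2026-05-21T10:00:02Z"),
    ("grandchild", "child", "2026-05-21T10:00:03Z"),
])
nodes = get_execution_tree(conn, "grandchild")
```

Entering from `grandchild` still yields the whole chain from `inbound` down. The `UNION` (rather than `UNION ALL`) collapses the many audit rows of one execution into a single tree node, and the `LEFT JOIN` keeps a node with zero rows, which is what a stub parent looks like.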

**UI.** New route `/audit/execution-tree?executionId=<guid>`, reached via a
"View execution chain" drill-in from any audit row and from the `ExecutionId`
column. Renders an expandable custom Blazor tree (no component frameworks); each
node shows the execution summary; clicking a node filters the Audit Log grid to
`?executionId=<node>`. The tree is always rooted at the topmost ancestor the
upward walk can resolve, so the reader sees the full surviving chain regardless
of which row they entered from.
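
Assembling the tree from the flat node list is a plain parent-to-children grouping. A minimal sketch, in Python for illustration only (the real UI is Blazor; `assemble_tree` and the dictionary shape are hypothetical):

```python
from typing import Dict, List, Optional


def assemble_tree(nodes: List[dict]) -> List[dict]:
    known = {n["execution_id"] for n in nodes}
    children: Dict[str, List[str]] = {}
    roots: List[str] = []
    for n in nodes:
        parent: Optional[str] = n["parent_execution_id"]
        if parent is None or parent not in known:
            # No parent, or a parent that is not in the result: treat as a root.
            roots.append(n["execution_id"])
        else:
            children.setdefault(parent, []).append(n["execution_id"])

    def build(execution_id: str) -> dict:
        return {
            "execution_id": execution_id,
            "children": [build(c) for c in children.get(execution_id, [])],
        }

    return [build(r) for r in roots]


flat = [
    {"execution_id": "inbound", "parent_execution_id": None},
    {"execution_id": "child-a", "parent_execution_id": "inbound"},
    {"execution_id": "child-b", "parent_execution_id": "inbound"},
    {"execution_id": "grandchild", "parent_execution_id": "child-a"},
]
tree = assemble_tree(flat)
```

Because the build starts only from roots and each node has a single parent, a corrupt cycle in the input cannot cause unbounded recursion; its nodes are simply never reached.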

Plus the cheaper navigation affordances: `ParentExecutionId` grid column (short
form / monospace), a `ParentExecutionId` paste-filter, a `?parentExecutionId=`
query param, and a "View parent execution" drill-in (links
`?executionId=<parentId>`).

### Edge cases

- **Parent with no rows of its own.** An execution that performed no
  trust-boundary action emits no audit rows, yet a child still references it via
  `ParentExecutionId`. The upward walk resolves the GUID but finds no rows for
  that node → render it as a stub node ("execution with no audited actions").
- **Purged parent.** A parent execution older than the 365-day central
  retention has no rows → the upward walk stops there; the chain renders as far
  as it resolves.
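- **Cycle guard.** The `ParentExecutionId` graph is acyclic by construction
  (each execution is minted fresh and its parent always pre-exists), but
  `MAXRECURSION` bounds the downward CTE against corrupt data. SQL Server fails
  the statement with an error when the cap is exceeded rather than returning a
  truncated result, so the repository method has to handle that error.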

## CLI / ManagementService

- CLI: `scadalink audit query --parent-execution-id <guid>`;
  `AuditLogQueryFilter` gains a `ParentExecutionId` single-value filter
  dimension (mirror `ExecutionId`).
- ManagementService `/api/audit/query` + export endpoint and the CentralUI
  export endpoints parse a `parentExecutionId` query param (lax parse: an
  unparseable value is dropped rather than rejected).
- The tree view's data path: expose `GetExecutionTreeAsync` the same way the
  existing Audit Log page sources its grid data; add a ManagementService
  endpoint only if the page goes through it.
- **No CLI `audit tree` command in the first cut** — the tree is a UI forensic
  affordance; the `--parent-execution-id` filter covers scripted use. Noted as a
  possible follow-up.
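
The lax-parse rule for the `parentExecutionId` query param can be illustrated as follows. This is a Python sketch of the behaviour only; `parse_guid_lax` and `build_filter` are hypothetical names, and the accepted GUID spellings are those of Python's `uuid.UUID`, which may differ from the .NET parser.

```python
import uuid
from typing import Dict, Optional


def parse_guid_lax(raw: Optional[str]) -> Optional[uuid.UUID]:
    """Return the parsed GUID, or None when the value is missing or malformed."""
    if raw is None:
        return None
    try:
        return uuid.UUID(raw.strip())
    except ValueError:
        return None


def build_filter(query: Dict[str, str]) -> Dict[str, uuid.UUID]:
    """Keep only the id dimensions that parsed; drop the rest silently."""
    result: Dict[str, uuid.UUID] = {}
    for key in ("executionId", "parentExecutionId"):
        parsed = parse_guid_lax(query.get(key))
        if parsed is not None:
            result[key] = parsed
    return result


good = build_filter({"parentExecutionId": "11111111-2222-3333-4444-555555555555"})
bad = build_filter({"parentExecutionId": "not-a-guid"})
```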

## Compatibility

- Additive nullable columns; additive proto field; additive message-contract
  fields — all version-compatible. No backfill; historical rows keep
  `ParentExecutionId = NULL`.
- `ExecutionId` and `CorrelationId` semantics unchanged — every existing
  drill-in keeps working.

## Failure handling

- Audit-write failure NEVER aborts the user-facing action — unchanged invariant;
  `ParentExecutionId` is just another field on the row.
- Site `auditlog.db` schema change MUST use the idempotent `ALTER`-if-missing
  path (commit `5198b11`); do not repeat the original `CREATE TABLE IF NOT
  EXISTS` mistake.
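
Why `CREATE TABLE IF NOT EXISTS` is not enough, and what an `ALTER`-if-missing step looks like, can be shown directly against SQLite. The sketch below is Python with a hypothetical `ensure_column` helper and a reduced table; it illustrates the idea, not the upgrade code from commit `5198b11`.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add the column only when it is missing. Returns True if it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1].lower() for row in conn.execute(f"PRAGMA table_info({table})")}
    if column.lower() in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


def column_names(conn: sqlite3.Connection, table: str) -> list:
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
# A database created by an older build: no ParentExecutionId column yet.
conn.execute("CREATE TABLE AuditLog (EventId TEXT, ExecutionId TEXT)")

# The new build's CREATE statement is a no-op because the table already exists.
conn.execute(
    "CREATE TABLE IF NOT EXISTS AuditLog "
    "(EventId TEXT, ExecutionId TEXT, ParentExecutionId TEXT)"
)
after_create = column_names(conn, "AuditLog")

# Columns are nullable by default in SQLite, so a bare TEXT declaration suffices.
first_run = ensure_column(conn, "AuditLog", "ParentExecutionId", "TEXT")
second_run = ensure_column(conn, "AuditLog", "ParentExecutionId", "TEXT")
after_upgrade = column_names(conn, "AuditLog")
```

The second call is a no-op, which is what makes the step safe to run on every startup.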

## Testing

- Repository: query-by-`ParentExecutionId`; `GetExecutionTreeAsync` (multi-level
  tree, stub-parent node, `MAXRECURSION` cap); migration smoke test.
- Emitter unit tests: each emitter stamps `ParentExecutionId`; the cached-call
  lifecycle rows from one routed run share it; `NotifyDeliver` echoes
  `Notifications.OriginParentExecutionId`.
- **Headline integration test:** an inbound API request that calls `Route.Call`
  → the routed site script does a sync `ExternalSystem.Call`, a cached call, and
  a `Notify.Send` → every resulting audit row (site + central) carries
  `ParentExecutionId` = the inbound request's `ExecutionId`, while the
  `ExecutionId` on those rows is the routed run's own fresh id, distinct from
  the inbound request's.
- Central UI: bUnit (column renders, filter maps, query param parsed, tree
  assembled from the flat list) + Playwright (drill-in → tree → node click
  filters the grid).

## Out of scope / follow-ups

- **Tag cascade (case 2)** — deferred. If the attribute-write path ever carries
  the writing execution's id into the triggered script's `ScriptRuntimeContext`,
  the same `ParentExecutionId` column and tree view cover it with no schema
  change.
- CLI `audit tree` command — possible follow-up.
- Backfilling `ParentExecutionId` on historical audit rows — not done.

## Constraints

- Additive everywhere — nullable columns, additive proto/message fields, no
  backfill.
- Never touch `infra/*`; `alog.md` is the locked v1 spec — do not modify it.
- Site `auditlog.db` schema change MUST use the idempotent `ALTER`-if-missing
  path (commit `5198b11`).