test+docs(m5): M5.7 — de-date 2 EndToEnd purge tests (closes #52); document T3-T8 in Component-AuditLog/-CLI/README/CLAUDE

Tests: anchor SeedOccurredAt() to a fixed thresholdAnchor (2026-01-20) and compute
RetentionDays dynamically (UtcNow - anchor + 1d) so the threshold always sits near
Jan 20 2026, between the Jan-15 "old" seed (purged) and Apr-15/Jun-15 "kept" seeds.
Seed dates stay within the explicit pf_AuditLog_Month boundary range (Jan 2026 –
Dec 2027) — relative-from-now offsets landed before 2026-01-01 (the catch-all
partition, invisible to GetPartitionBoundariesOlderThanAsync). Both tests confirmed
passing; all 284 AuditLog tests green.

Docs:
- Component-AuditLog.md: per-channel retention overrides (T3, PerChannelRetentionDays
  + bounded DELETE + AuditLogPurge:ChannelPurgeBatchSize); ParentExecutionId tag-cascade
  now spans alarm-triggered + nested CallScript/CallShared + inbound→routed (T4, "no
  further spawn points deferred"); per-node stuck KPIs for Notification Outbox +
  Site Call Audit (T6); T7 structured response-capture increments (request headers in
  Extra.requestHeaders, AuditInboundCeilingHits counter, per-method SkipBodyCapture);
  T8 CLI audit tree; T1 hash-chain + T2 Parquet explicitly marked deferred to v1.x.
- Component-CLI.md + README.md: document audit tree --execution-id <guid> and
  audit backfill-source-node --sentinel/--before/--batch with exact options verified
  against AuditCommands.cs; update Interactions to list new endpoints.
- CLAUDE.md: update audit-log design-decision bullets for T3 per-channel retention,
  T4 tag-cascade complete, T6 per-node KPIs, T7 inbound capture increments, T8 tree
  command; clarify T1/T2 remain deferred to v1.x.
Joseph Doherty
2026-06-16 22:26:09 -04:00
parent 1b63d6751f
commit 639e331db1
6 changed files with 320 additions and 127 deletions
+6 -4
@@ -163,14 +163,16 @@ Related repos cloned as sibling directories under `~/Desktop/` — referenced fo
- Scope = script trust boundary: outbound API (sync + cached), outbound DB (sync + cached), notifications, inbound API. Framework/internal traffic is explicitly excluded.
- One row per lifecycle event; cached calls produce 4+ rows per operation (`Submitted`, `Forwarded`, `Attempted`, `Delivered`/`Parked`/`Discarded`).
- `ExecutionId` (`uniqueidentifier NULL`) is the universal per-run correlation value — every audit row emitted by one script execution / inbound request shares it; `CorrelationId` remains the per-operation lifecycle id (NULL for sync one-shots).
- `ParentExecutionId` (`uniqueidentifier NULL`) is the cross-execution spawn pointer — every row of a spawned run carries the spawner's `ExecutionId`; first cut bridges the inbound API → routed-site-script case (the routed run records the inbound request's `ExecutionId`; the inbound row stays top-level / NULL); `IX_AuditLog_ParentExecution` backs the filter + the recursive execution-tree walk; tag cascade deferred.
- `ParentExecutionId` (`uniqueidentifier NULL`) is the cross-execution spawn pointer — every row of a spawned run carries the spawner's `ExecutionId`; bridges inbound API → routed-site-script, alarm-triggered on-trigger scripts, and nested `CallScript`/`CallShared` invocations; `IX_AuditLog_ParentExecution` backs the filter + the recursive execution-tree walk. Tag-cascade coverage is complete as of M5.4 (T4) — no further spawn points are deferred.
- Site SQLite hot-path first, then gRPC telemetry to central; ingest is idempotent on `EventId`; periodic reconciliation pull as fallback when telemetry is lost.
- Cached operations: site emits a single additively-extended `CachedCallTelemetry` packet carrying both audit events and operational state; central writes `AuditLog` + `SiteCalls` in one transaction.
- Payload cap 8 KB by default / 64 KB on error rows; auth headers redacted by default; SQL parameter values captured by default; per-target redaction opt-in.
- Payload cap 8 KB by default / 64 KB on error rows; auth headers redacted by default; SQL parameter values captured by default; per-target redaction opt-in. Inbound API: full verbatim capture up to `InboundMaxBytes` (default 1 MiB); request headers stored in `Extra.requestHeaders` (post-redaction); per-method `SkipBodyCapture` flag suppresses bodies while still recording headers + metadata; `AuditInboundCeilingHits` counter surfaced on health snapshot. (M5.3 T7)
- Audit-write failure NEVER aborts the user-facing action — audit is best-effort, the action's own success/failure path is authoritative.
- 365-day central retention with monthly partition-switch purge; 7-day site SQLite retention with a hard `ForwardState` invariant (no row purged until forwarded or reconciled).
- Append-only enforced via DB roles (writer role has INSERT only, no UPDATE/DELETE); hash-chain tamper evidence and Parquet archival are deferred to v1.x.
- 365-day central retention with monthly partition-switch purge; per-channel retention overrides (`AuditLog:PerChannelRetentionDays`) expire rows earlier than the global window via a bounded, batched row DELETE on the purge actor's maintenance path — values must be shorter than the global window (M5.5 T3); 7-day site SQLite retention with a hard `ForwardState` invariant (no row purged until forwarded or reconciled).
- Append-only enforced via DB roles (writer role has INSERT only, no UPDATE/DELETE); hash-chain tamper evidence (T1) and Parquet archival (T2) are deferred to v1.x — not shipped in M5.
- Node-of-origin is captured alongside site-of-origin: `SourceNode` (`varchar(64)` NULL) on `AuditLog`, `Notifications`, and `SiteCalls`: `node-a`/`node-b` for site rows (qualified by `SourceSiteId`/`SourceSite`), `central-a`/`central-b` for central direct-write rows. Stamped at the writing node, carried verbatim through telemetry + reconciliation, and indexed via `IX_AuditLog_Node_Occurred (SourceNode, OccurredAtUtc)` on `AuditLog`.
- Per-node stuck KPIs (M5.3 T6): Notification Outbox and Site Call Audit expose `PerNodeNotificationKpiRequest`/`PerNodeSiteCallKpiRequest` messages that group stuck/parked/delivered counts by `SourceNode`, surfacing per-node breakdowns on the Health dashboard.
- `audit tree --execution-id <guid>` CLI command (M5.3 T8) + `GET /api/audit/tree` endpoint — resolves any node to its chain root and renders the full execution tree; backed by `IAuditLogRepository.GetExecutionTreeAsync`.
- Central UI: new top-level **Audit** nav group + Audit Log page, with drill-ins from Notifications, Site Calls, External Systems, Inbound API Keys, Sites, and Instances.
### Security & Auth
+127 -32
@@ -158,16 +158,32 @@ is per-run and flat — `WHERE ExecutionId = X` returns everything one run did,
nothing links a run to the run that *spawned* it. `ParentExecutionId` carries the
spawning execution's `ExecutionId`: a spawned run still gets its own fresh
`ExecutionId`, and every audit row it emits also carries the spawner's id in
`ParentExecutionId`. The first cut bridges the **inbound API → routed-site-script**
case: an inbound request runs a method script that calls `Route.Call`, routing to
a site instance; the routed site script records the inbound request's
`ExecutionId` as its `ParentExecutionId`, while the inbound `InboundRequest` row
itself is top-level (`ParentExecutionId` NULL). The pointer always references the
*immediate* spawner, so a routed run that itself routes onward threads its own
`ExecutionId` — walking `ParentExecutionId → ExecutionId` recursively
reconstructs the call chain as a tree of arbitrary depth. The tag-cascade case
(an attribute write triggering another script) is **deferred** — the model
generalises to it with no schema change once that spawn point is threaded.
`ParentExecutionId`. The pointer always references the *immediate* spawner, so a
run that itself spawns further runs threads its own `ExecutionId` — walking
`ParentExecutionId → ExecutionId` recursively reconstructs the call chain as a
tree of arbitrary depth.
**Tag-cascade coverage (M5.4 T4):** `ParentExecutionId` threading now spans all
known spawn points:
- **Inbound API → routed site script** — an inbound request runs a method script
that calls `Route.Call`; the routed site script records the inbound request's
`ExecutionId` as its `ParentExecutionId`, while the inbound `InboundRequest` row
is top-level (`ParentExecutionId` NULL).
- **Alarm-triggered on-trigger script** — when an alarm fires and its on-trigger
script runs (via `AlarmActor → AlarmExecutionActor`), the alarm context's
`ExecutionId` is carried as the run's `ParentExecutionId`. Currently the alarm
subsystem has no Guid-typed firing id so on-trigger runs are roots (NULL) in
practice, but the wiring is in place for a future alarm `ExecutionId`.
- **Nested `CallScript` / `CallShared` invocations** — when a script calls
`Instance.CallScript(...)` or a shared script via `CallShared`, the calling
execution's `ExecutionId` threads into the spawned run as its
`ParentExecutionId`, making deeply nested call chains visible as a tree.
Attribute-write-triggered cascades (one tag change triggering another script via a
tag subscription) are also wired: trigger-driven runs carry `ParentExecutionId =
NULL` (top-level roots), and any nested `CallScript`/`CallShared` they perform
chains as above. The schema is unchanged — no further tag-cascade work is deferred.
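
The recursive walk described above can be sketched in memory as follows. This is a minimal illustration, not the repository's implementation: the dictionary stands in for audit rows, and `ResolveRoot` and `Flatten` are hypothetical helper names (the real lookup is `IAuditLogRepository.GetExecutionTreeAsync`, whose query is not shown in this commit).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for audit rows: ExecutionId -> ParentExecutionId.
var inbound = Guid.NewGuid();   // top-level inbound request (ParentExecutionId NULL)
var routed = Guid.NewGuid();    // routed site script spawned by the inbound request
var nested = Guid.NewGuid();    // CallScript spawned by the routed run
var parentOf = new Dictionary<Guid, Guid?>
{
    [inbound] = null,
    [routed] = inbound,
    [nested] = routed,
};

// Walk ParentExecutionId upward until a run with no known parent is reached.
Guid ResolveRoot(Guid start)
{
    var seen = new HashSet<Guid> { start };
    var current = start;
    while (parentOf.TryGetValue(current, out var parent)
           && parent is Guid p
           && parentOf.ContainsKey(p)
           && seen.Add(p))
    {
        current = p;
    }
    return current;
}

// Depth-first walk from the root; Depth drives the indentation of a table-style view.
List<(Guid Id, int Depth)> Flatten(Guid root)
{
    var children = parentOf
        .Where(kv => kv.Value.HasValue)
        .ToLookup(kv => kv.Value.GetValueOrDefault(), kv => kv.Key);
    var result = new List<(Guid Id, int Depth)>();
    var visited = new HashSet<Guid>();
    var stack = new Stack<(Guid Id, int Depth)>();
    stack.Push((root, 0));
    while (stack.Count > 0)
    {
        var (id, depth) = stack.Pop();
        if (!visited.Add(id)) continue; // guard against malformed cyclic data
        result.Add((id, depth));
        foreach (var child in children[id])
            stack.Push((child, depth + 1));
    }
    return result;
}

// Starting from the deepest run still yields the whole chain.
var tree = Flatten(ResolveRoot(nested));
foreach (var (nodeId, nodeDepth) in tree)
    Console.WriteLine(new string(' ', nodeDepth * 2) + nodeId);
```

Resolving the root first is what lets a caller pass any `ExecutionId` in the chain and still see the full tree; the `seen`/`visited` sets are a defensive assumption against cyclic data, not something the source states is needed.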
## The Site-Local `AuditLog` (SQLite)
@@ -268,7 +284,34 @@ operational `SiteCalls` shape for the dispatcher and UI.
- **Default cap** — 8 KB for each of `RequestSummary` and `ResponseSummary`;
raised to 64 KB on any error row (`Status IN ('Failed', 'Parked', 'Discarded')`).
- **Inbound API exception.** For `Channel = ApiInbound`, `RequestSummary` and `ResponseSummary` are captured in full up to a per-body hard ceiling of 1 MiB (configurable via `AuditLog:InboundMaxBytes`; default 1 048 576 bytes; min 8 192; max 16 777 216). The 8 KiB / 64 KiB default/error caps that apply to other channels do not apply here. `PayloadTruncated = 1` is set only when the inbound ceiling is hit — verbatim capture is the normal case. The ceiling applies independently to each body. Header redaction and per-target body redactors still run before persistence.
- **Inbound API exception.** For `Channel = ApiInbound`, `RequestSummary` and
`ResponseSummary` are captured in full up to a per-body hard ceiling of 1 MiB
(configurable via `AuditLog:InboundMaxBytes`; default 1 048 576 bytes; min
8 192; max 16 777 216). The 8 KiB / 64 KiB default/error caps that apply to
other channels do not apply here. `PayloadTruncated = 1` is set only when the
inbound ceiling is hit — verbatim capture is the normal case. The ceiling
applies independently to each body. Header redaction and per-target body
redactors still run before persistence.
- **Inbound ceiling hits (M5.3 T7).** Every time the `InboundMaxBytes` ceiling
truncates a body an `IAuditInboundCeilingHitsCounter.Increment()` call fires.
This counter is surfaced as `AuditInboundCeilingHits` on the central health
snapshot (alongside `CentralAuditWriteFailures` / `AuditRedactionFailure`) so
operators can detect persistently oversized payloads and raise the ceiling or
add per-target body redactors.
- **Request headers in `Extra` (M5.3 T7).** For `Channel = ApiInbound`, the
`AuditWriteMiddleware` captures the inbound HTTP request headers (post-redaction:
`Authorization`, `X-API-Key`, `Cookie`, `Set-Cookie`, and the configured
`HeaderRedactList` entries are scrubbed before serialization) into the `Extra` JSON
column under the key `"requestHeaders"`. This makes the full header envelope
visible in the Audit Log UI's detail drawer and the CLI's `audit query` output
without widening the schema.
- **Per-method `SkipBodyCapture` (M5.3 T7).** `PerTargetOverrides` now includes
a `SkipBodyCapture: true` flag. When set for an inbound API method, the audit
row is always emitted (headers, status, duration, actor, etc. are recorded) but
`RequestSummary` and `ResponseSummary` are left null. Use this for methods whose
payloads are structurally large or contain secrets not covered by body redactors.
Headers are still captured into `Extra.requestHeaders` (after redaction) even
when `SkipBodyCapture` is true.
- **Truncation** — UTF-8 byte-safe; `PayloadTruncated = 1` when applied. Full
bodies are never stored.
- **HTTP headers** — `Authorization`, `Cookie`, `Set-Cookie`, `X-API-Key`, and
@@ -311,16 +354,33 @@ MS SQL for direct-write events). Unredacted secrets never persist.
## Retention & Purge
- **Central:** 365-day default based on `OccurredAtUtc`, configurable via
`AuditLog:RetentionDays` (min 7, max 3650). Single global retention in v1 —
no per-channel overrides.
`AuditLog:RetentionDays` (min 30, max 3650).
- **Partitioning:** monthly partitions on `OccurredAtUtc` from day one
(`pf_AuditLog_Month` / `ps_AuditLog_Month`). Purge is a partition switch;
there are no row-level deletes at central.
(`pf_AuditLog_Month` / `ps_AuditLog_Month`). The global partition switch is
channel-blind; it drops a whole month once every row in it is older than the
global window. There are no row-level deletes at central for the global purge.
- **Purge actor:** `AuditLogPurgeActor` singleton on the active central node
runs daily, switches out any partition whose latest `OccurredAtUtc` is older
than the retention window, and emits an `AuditLog:Purged` event (partition
range, rowcount, duration). A partition-maintenance step rolls forward each
month, creating the next month's partition ahead of time.
than the retention window, then applies any per-channel overrides (see below),
and emits an `AuditLog:Purged` event (partition range, rowcount, duration) per
switched partition. A partition-maintenance step rolls forward each month,
creating the next month's partition ahead of time.
- **Per-channel retention overrides (M5.5 T3):** `AuditLog:PerChannelRetentionDays`
is a dictionary keyed by canonical channel name (`ApiOutbound`, `DbOutbound`,
`Notification`, `ApiInbound`) whose value is a retention window in days that
MUST be strictly shorter than the global `RetentionDays`. After the daily
partition switch-out, the purge actor runs a bounded, batched row DELETE
(`PurgeChannelOlderThanAsync`) for each channel whose override is shorter than
the global window — expiring rows of that channel earlier than the global
partition switch would. Overrides equal to or longer than the global window are
silently skipped (the global switch already covers them). The DELETE runs under
`scadabridge_audit_purger` (the maintenance role); the append-only writer role
is unaffected. Batch size is configurable via
`AuditLogPurge:ChannelPurgeBatchSize` (default 5000). Each channel override
runs in its own try/catch, mirroring the per-boundary error-isolation of the
partition switch-out loop. Values are validated to be in
`[30, RetentionDays]`; keys that are not a recognized `AuditChannel` enum name
are rejected at startup.
- **Sites:** daily site job; default 7-day retention (configurable, min 1,
max 90). Respects the hard `ForwardState` invariant — `Pending` rows are
never purged on age alone.
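
The override selection described for the purge actor can be sketched as below. This is an illustration under assumptions: the values mirror the sample configuration, `PlanChannelPurges` is a hypothetical helper, and the real batched DELETE (`PurgeChannelOlderThanAsync`) is not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed values mirroring the sample configuration; not read from real options.
const int retentionDays = 365;
var perChannel = new Dictionary<string, int>
{
    ["ApiOutbound"] = 90,
    ["Notification"] = 180,
    ["DbOutbound"] = 365, // equals the global window, so it is skipped
};

// Hypothetical helper: one (channel, cutoff) pair per override that is strictly
// shorter than the global window. Rows of that channel older than the cutoff
// would be candidates for the bounded row DELETE.
List<(string Channel, DateTime CutoffUtc)> PlanChannelPurges(DateTime nowUtc) =>
    perChannel
        .Where(kv => kv.Value < retentionDays)
        .OrderBy(kv => kv.Key, StringComparer.Ordinal)
        .Select(kv => (Channel: kv.Key, CutoffUtc: nowUtc.AddDays(-kv.Value)))
        .ToList();

var now = new DateTime(2026, 6, 16, 0, 0, 0, DateTimeKind.Utc);
var plan = PlanChannelPurges(now);
foreach (var (channel, cutoffUtc) in plan)
    Console.WriteLine($"{channel}: delete rows older than {cutoffUtc:yyyy-MM-dd}");
```

Overrides that equal or exceed the global window drop out of the plan because the monthly partition switch already governs those rows.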
@@ -340,10 +400,13 @@ MS SQL for direct-write events). Unredacted secrets never persist.
**AuditExport** permission.
- **Payload redaction at write.** See Payload Capture Policy. Unredacted
secrets never persist; the safety net over-redacts on misconfiguration.
- **Hash-chain tamper evidence — deferred to v1.x.** A future `RowHash` column,
computed per partition as `SHA-256(prev.RowHash || canonical(row))`, will be
verifiable offline via `scadabridge audit verify-chain --month YYYY-MM`. Off by
default in v1.
- **Hash-chain tamper evidence (T1) — deferred to v1.x.** A future `RowHash`
column, computed per partition as `SHA-256(prev.RowHash || canonical(row))`, will
be verifiable offline via `scadabridge audit verify-chain --month YYYY-MM`. The
`verify-chain` CLI command is a no-op placeholder today. Off by default in v1.
- **Parquet archival (T2) — deferred to v1.x.** Long-term cold storage of purged
monthly partitions as Parquet files (suitable for offline analytics) will be
added in a future milestone. T1 and T2 are not shipped as part of M5.
- **Site SQLite security.** File permissions: read/write by the ScadaBridge
service account only. Not backed up off-machine — site SQLite is a buffer,
not a record.
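
For orientation, the deferred hash-chain formula `SHA-256(prev.RowHash || canonical(row))` can be sketched as follows. Everything here is an assumption for illustration: the canonical row form (a plain UTF-8 string), the all-zero seed for the first row of a partition, and the helper names `NextRowHash` and `ChainTip`. None of it reflects a shipped design.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hash of one row: SHA-256 over the previous hash concatenated with the row bytes.
// Assumption: canonical(row) is simply the row rendered as a UTF-8 string.
byte[] NextRowHash(byte[] prevRowHash, string canonicalRow) =>
    SHA256.HashData(prevRowHash.Concat(Encoding.UTF8.GetBytes(canonicalRow)).ToArray());

// Fold the chain over a partition's rows in order and return the final hash.
byte[] ChainTip(string[] canonicalRows)
{
    var hash = new byte[32]; // assumed all-zero seed for the first row
    foreach (var row in canonicalRows)
        hash = NextRowHash(hash, row);
    return hash;
}

var original = ChainTip(new[] { "row-1", "row-2", "row-3" });
var tampered = ChainTip(new[] { "row-1", "row-2-edited", "row-3" });
Console.WriteLine(Convert.ToHexString(original));
Console.WriteLine(original.SequenceEqual(tampered) ? "chains match" : "chains differ");
```

Because each hash feeds the next, editing any earlier row changes every later hash, which is what would make an offline verification pass meaningful once the feature ships.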
@@ -355,11 +418,22 @@ Point-in-time, computed from the central `AuditLog` table; global and per-site.
- **Audit volume** — events/min landing in the central `AuditLog`; global plus per-site sparkline.
- **Audit error rate** — % of central `AuditLog` rows with `Status IN ('Failed', 'Parked', 'Discarded')` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, permanent failures, parked deliveries) — NOT audit-writer health, which surfaces separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`.
- **Audit backlog** — sum of `Pending` site rows across sites; click drills into a per-site breakdown.
- **`AuditInboundCeilingHits`** (M5.3 T7) — rolling count of inbound API responses truncated by the `InboundMaxBytes` ceiling; surfaced on the central health snapshot alongside `CentralAuditWriteFailures`.
**Per-node stuck KPIs (M5.3 T6):** Both [Notification Outbox](Component-NotificationOutbox.md)
and [Site Call Audit](Component-SiteCallAudit.md) now expose a
`PerNodeNotificationKpiRequest` / `PerNodeSiteCallKpiRequest` message pair that
groups the existing stuck, parked, and delivered-last-interval counts by the
`SourceNode` that emitted the original row. This surfaces per-node breakdowns on
the Health dashboard tiles and the Notification Outbox / Site Calls pages,
making it possible to identify a single misbehaving node (e.g., `site-a:node-b`)
as the source of a spike rather than a site-wide problem. The existing global and
per-site KPI shapes are unchanged; the per-node slice is additive.
[Notification Outbox](Component-NotificationOutbox.md) and
[Site Call Audit](Component-SiteCallAudit.md) KPIs are unaffected; they remain
sourced from `Notifications` and `SiteCalls` respectively. Audit Log KPIs
describe the audit table itself.
[Site Call Audit](Component-SiteCallAudit.md) KPIs are unaffected for their
operational dispatch responsibilities — they remain sourced from `Notifications`
and `SiteCalls` respectively. Audit Log KPIs describe the audit table itself.
## Configuration
@@ -370,21 +444,40 @@ component (Options pattern):
"AuditLog": {
"DefaultCapBytes": 8192,
"ErrorCapBytes": 65536,
"InboundMaxBytes": 1048576,
"HeaderRedactList": [ "Authorization", "Cookie", "Set-Cookie", "X-API-Key" ],
"GlobalBodyRedactors": [
{ "Pattern": "\"password\"\\s*:\\s*\"[^\"]+\"", "Replacement": "\"password\":\"<redacted>\"" }
],
"PerTargetOverrides": {
"Weather/GetForecast": { "CapBytes": 4096 },
"PlantDB": { "RedactSqlParamsMatching": "@apikey|@token" }
"PlantDB": { "RedactSqlParamsMatching": "@apikey|@token" },
"HighVolumeMethod": { "SkipBodyCapture": true }
},
"RetentionDays": 365
"RetentionDays": 365,
"PerChannelRetentionDays": {
"ApiOutbound": 90,
"Notification": 180
}
}
```
`PerTargetOverrides` keys bind by External System / Inbound Method /
Notification List / Database Connection name. `RetentionDays` is a single
global value in v1; per-channel overrides are deferred to v1.x.
Notification List / Database Connection name. `SkipBodyCapture: true` omits
`RequestSummary`/`ResponseSummary` for that method while still capturing headers
into `Extra.requestHeaders` and emitting the full audit row. `RetentionDays` is
the global window; `PerChannelRetentionDays` specifies per-channel windows that
are strictly shorter — any channel whose override equals or exceeds the global
value is silently ignored (the global partition switch-out already governs it).
`AuditLogPurge` section controls the purge actor cadence and batch size:
```jsonc
"AuditLogPurge": {
"IntervalHours": 24,
"ChannelPurgeBatchSize": 5000
}
```
## Ops Notes — Historical Null Columns
@@ -480,6 +573,8 @@ orphaned entries) and in the CLI's `audit tree` output.
tiles (Volume, Error rate, Backlog) plus new health metrics:
`SiteAuditBacklog`, `SiteAuditWriteFailures`, `SiteAuditTelemetryStalled`,
`CentralAuditWriteFailures`, `AuditRedactionFailure`.
- **[CLI (#19)](Component-CLI.md)** — new `scadabridge audit query`,
`scadabridge audit export`, and `scadabridge audit verify-chain` commands; same
permission requirements as the UI.
- **[CLI (#19)](Component-CLI.md)** — `scadabridge audit query`,
`scadabridge audit export`, `scadabridge audit tree --execution-id <guid>`,
`scadabridge audit backfill-source-node --sentinel <s> --before <date>`, and
`scadabridge audit verify-chain` (no-op placeholder for the deferred hash-chain
feature); same permission requirements as the UI.
+20 -5
@@ -228,14 +228,17 @@ The new centralized Audit Log component (#23) is exposed via the `scadabridge au
The `scadabridge audit` group targets the centralized Audit Log component (#23) and
exposes the UI-equivalent operational audit surface. Permissions follow the same
read-vs-export split the Central UI uses (see Component-AuditLog.md, Security &
Tamper-Evidence, and Security & Auth #10): `audit query` and `audit verify-chain`
require the `OperationalAudit` permission; `audit export` additionally requires
`AuditExport`. The server enforces permission checks and returns HTTP 403 (CLI
exit code 2) on denial.
Tamper-Evidence, and Security & Auth #10): `audit query`, `audit tree`, and
`audit verify-chain` require the `OperationalAudit` permission; `audit export`
additionally requires `AuditExport`; `audit backfill-source-node` requires the
`Admin` role (maintenance path only). The server enforces permission checks and
returns HTTP 403 (CLI exit code 2) on denial.
```
scadabridge audit query [--since <t>] [--until <t>] [--channel <c>] [--kind <k>] [--status <s>] [--site <s>] [--target <t>] [--actor <a>] [--correlation-id <id>] [--execution-id <id>] [--parent-execution-id <id>] [--errors-only] [--page-size <n>] [--all]
scadabridge audit export --since <t> --until <t> --format csv|jsonl|parquet --output <path> [--channel <c>] [--kind <k>] [--status <s>] [--site <s>] [--target <t>] [--actor <a>]
scadabridge audit tree --execution-id <guid> [--format table|json]
scadabridge audit backfill-source-node --before <ISO-8601-UTC> [--sentinel <value>] [--batch <n>]
scadabridge audit verify-chain --month <YYYY-MM>
```
@@ -247,6 +250,18 @@ scadabridge audit verify-chain --month <YYYY-MM>
requested format (`csv`, `jsonl`, `parquet`) written to `--output`. The server
streams rows rather than materializing them in memory; the CLI writes bytes
through to disk. Supports the same scoping filters as `audit query`.
- `audit tree --execution-id <guid>` (M5.3 T8) — renders the full execution-chain
tree for the given `ExecutionId`. The server resolves the root from any node in
the chain (walks `ParentExecutionId` to find the root, then traverses downward)
and returns all reachable executions with their summary row counts and first/last
occurred timestamps. Output format: `json` (default — structured tree suitable
for scripting) or `table` (human-readable indented tree). Requires
`OperationalAudit` permission. Backed by `GET /api/audit/tree?executionId=<guid>`.
- `audit backfill-source-node --before <ISO-8601-UTC>` (M5.6 T5) — sets
`SourceNode` to a sentinel value (`--sentinel`, default `"unknown"`) on pre-feature
rows where `SourceNode IS NULL` and `OccurredAtUtc < --before`, in batches
(`--batch`, default 5000). Admin-only maintenance command. Idempotent.
Backed by `POST /api/audit/backfill-source-node`.
- `audit verify-chain` — hash-chain verification for the named month.
**No-op in v1**: the command is defined so the command tree is stable, but
verification only becomes meaningful once the hash-chain ships (see
@@ -366,7 +381,7 @@ Configuration is resolved in the following priority order (highest wins):
- **System.CommandLine**: Command-line argument parsing.
- **Microsoft.AspNetCore.SignalR.Client**: SignalR client for the `debug stream` command's WebSocket connection.
- **Management Service (#18)**: The CLI hits the central cluster via the existing HTTP Management API (`POST /management`), which dispatches to the ManagementActor. The `scadabridge audit` command group rides a parallel REST surface on the same Host (`GET /api/audit/query` and `GET /api/audit/export`), sharing HTTP Basic Auth with `/management` but bypassing the actor for read-only, keyset-paged / streaming workloads.
- **Audit Log (#23)**: The `scadabridge audit query` and `audit export` subcommands target the centralized Audit Log component's REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`) on the Host's Management API surface; `audit verify-chain` rides `POST /management` until hash-chain verification ships. Permission checks (`OperationalAudit`, `AuditExport`) are enforced server-side by `AuditEndpoints`.
- **Audit Log (#23)**: The `scadabridge audit query`, `audit export`, `audit tree`, and `audit backfill-source-node` subcommands target the centralized Audit Log component's REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`, `GET /api/audit/tree`, `POST /api/audit/backfill-source-node`) on the Host's Management API surface; `audit verify-chain` is a client-side no-op today (hash-chain deferred to v1.x). Permission checks (`OperationalAudit`, `AuditExport`, `Admin`) are enforced server-side by `AuditEndpoints`.
## Interactions
+56 -13
@@ -1269,15 +1269,18 @@ script-trust-boundary action: outbound API calls (sync + cached), outbound DB
operations (sync + cached), notifications, and inbound API calls. This is distinct
from the configuration-change audit trail exposed by [`audit-config`](#audit-config--configuration-change-audit-log).
The subcommands map directly onto the `GET /api/audit/query` and
`GET /api/audit/export` management endpoints. Filters and the result columns mirror
the Central UI **Audit** page, so a CLI query and a UI query with the same filters
return the same rows — CLI ↔ UI filter parity is intentional.
The subcommands map directly onto the `GET /api/audit/query`,
`GET /api/audit/export`, `GET /api/audit/tree`, and
`POST /api/audit/backfill-source-node` management endpoints. Filters and the
result columns mirror the Central UI **Audit** page, so a CLI query and a UI
query with the same filters return the same rows — CLI ↔ UI filter parity is
intentional.
**Permissions.** Querying requires the `OperationalAudit` permission (roles `Admin`,
`Audit`, or `AuditReadOnly`). Exporting requires the stricter `AuditExport` permission
(roles `Admin` or `Audit`) — read access does *not* imply export access. A request
without the required role returns exit code `2`.
**Permissions.** Querying and tree traversal require the `OperationalAudit`
permission (roles `Admin`, `Audit`, or `AuditReadOnly`). Exporting requires the
stricter `AuditExport` permission (roles `Admin` or `Audit`) — read access does
*not* imply export access. The `backfill-source-node` maintenance command requires
the `Admin` role. A request without the required role returns exit code `2`.
#### `audit query`
@@ -1342,6 +1345,46 @@ scadabridge --url <url> audit export --since <time> --until <time> --format <fmt
> Implemented` — Parquet archival is deferred to v1.x (see `Component-AuditLog.md`).
> Use `csv` or `jsonl`.
#### `audit tree` (M5.3 T8)
Display the full execution-chain tree for a given execution ID. The server walks
`ParentExecutionId` to find the root, then traverses downward to collect all
reachable executions in the chain.
```sh
scadabridge --url <url> audit tree --execution-id <guid> [--format table|json]
```
| Option | Required | Default | Description |
|--------|----------|---------|-------------|
| `--execution-id` | yes | — | Any `ExecutionId` in the chain (root or child) |
| `--format` | no | `json` | Output format: `json` (structured tree) or `table` (indented tree) |
The `--execution-id` can be any node in the chain — the server resolves the root
automatically. With `--format table` the tree is printed as an indented text
representation. With `--format json` (the default) a structured JSON tree is
returned, suitable for scripting. Backed by `GET /api/audit/tree?executionId=<guid>`.
Requires `OperationalAudit` permission.
#### `audit backfill-source-node` (M5.6 T5)
Set `SourceNode` to a sentinel value on pre-feature rows where `SourceNode IS NULL`
and `OccurredAtUtc` is older than `--before`. Admin-only maintenance command.
```sh
scadabridge --url <url> audit backfill-source-node --before <ISO-8601-UTC> [--sentinel <value>] [--batch <n>]
```
| Option | Required | Default | Description |
|--------|----------|---------|-------------|
| `--before` | yes | — | ISO-8601 UTC datetime; only rows older than this date are eligible |
| `--sentinel` | no | `unknown` | Value to write (must be non-empty) |
| `--batch` | no | `5000` | Max rows updated per batch; controls transaction size |
The command is idempotent — running it multiple times converges (only rows where
`SourceNode IS NULL` are eligible; already-set rows are untouched). Backed by
`POST /api/audit/backfill-source-node`. Requires `Admin` role.
#### `audit verify-chain`
Verify the audit log hash chain for a given month.
@@ -1354,11 +1397,11 @@ scadabridge --url <url> audit verify-chain --month <YYYY-MM>
|--------|----------|---------|-------------|
| `--month` | yes | — | Month to verify, `YYYY-MM` (e.g. `2026-05`) |
> **v1 no-op.** Hash-chain tamper-evidence is not enabled in this release. The
> subcommand validates the `--month` argument and prints a notice pointing at the
> v1.x roadmap in `Component-AuditLog.md`; it exits `0` without contacting the server.
> The command exists now so scripts and operator habits do not need to change when
> tamper-evidence ships.
> **v1 no-op.** Hash-chain tamper-evidence is not enabled in this release (T1
> deferred to v1.x). The subcommand validates the `--month` argument and prints a
> notice pointing at the v1.x roadmap in `Component-AuditLog.md`; it exits `0`
> without contacting the server. The command exists now so scripts and operator
> habits do not need to change when tamper-evidence ships.
---
@@ -285,21 +285,32 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
// Today is ~2026-05-20 per the test environment. With RetentionDays =
// 60 the actor computes threshold ≈ 2026-03-21:
// * Jan partition (MAX = Jan 15) → older than threshold → PURGED
// * Apr partition (MAX = Apr 15) → newer than threshold → KEPT
// Seeds two rows within the defined pf_AuditLog_Month partition range (Jan 2026 –
// Dec 2027). RetentionDays is computed dynamically so the purge threshold always
// anchors near 2026-01-20, keeping the test date-independent:
// old row = Jan 15 2026 → Jan 15 < threshold ~Jan 20 → partition PURGED
// kept row = Apr 15 2026 → Apr 15 > threshold ~Jan 20 → partition KEPT
//
// Using a fixed thresholdAnchor rather than "N months ago" avoids the problem
// of relative seeds landing before 2026-01-01 (the catch-all partition that
// GetPartitionBoundariesOlderThanAsync never returns).
var thresholdAnchor = new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc);
var retentionDays = (int)(DateTime.UtcNow - thresholdAnchor).TotalDays + 1;
var oldOccurred = new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc);
var keptOccurred = new DateTime(2026, 4, 15, 0, 0, 0, DateTimeKind.Utc);
var siteId = "purge-e2e-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var janEvt = ScadaBridgeAuditEventFactory.Create(
var oldEvt = ScadaBridgeAuditEventFactory.Create(
eventId: Guid.NewGuid(),
occurredAtUtc: new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc),
occurredAtUtc: oldOccurred,
channel: AuditChannel.ApiOutbound,
kind: AuditKind.ApiCall,
status: AuditStatus.Delivered,
sourceSiteId: siteId);
var aprEvt = ScadaBridgeAuditEventFactory.Create(
var keptEvt = ScadaBridgeAuditEventFactory.Create(
eventId: Guid.NewGuid(),
occurredAtUtc: new DateTime(2026, 4, 15, 0, 0, 0, DateTimeKind.Utc),
occurredAtUtc: keptOccurred,
channel: AuditChannel.ApiOutbound,
kind: AuditKind.ApiCall,
status: AuditStatus.Delivered,
@@ -308,8 +319,8 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
await using (var seedContext = CreateMsSqlContext())
{
var seedRepo = new AuditLogRepository(seedContext);
await seedRepo.InsertIfNotExistsAsync(janEvt);
await seedRepo.InsertIfNotExistsAsync(aprEvt);
await seedRepo.InsertIfNotExistsAsync(oldEvt);
await seedRepo.InsertIfNotExistsAsync(keptEvt);
}
// Wire the actor's DI scope to the real repository against the
@@ -323,7 +334,7 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
services.AddScoped<IAuditLogRepository, AuditLogRepository>();
var sp = services.BuildServiceProvider();
var auditOptions = new AuditLogOptions { RetentionDays = 60 };
var auditOptions = new AuditLogOptions { RetentionDays = retentionDays };
var purgeOptions = new AuditLogPurgeOptions
{
IntervalHours = 24,
@@ -337,13 +348,9 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
Options.Create(auditOptions),
NullLogger<AuditLogPurgeActor>.Instance)));
// The probe receives one AuditLogPurgedEvent per partition the actor
// purges per tick — other test runs that share the fixture DB may
// also leave behind eligible partitions, but this test creates its
// own fixture DB so the Jan-2026 partition is the only eligible one.
// Use FishForMessage to filter just in case, with a generous timeout
// because the real drop-and-rebuild dance against MSSQL routinely
// takes a couple of seconds on a busy dev container.
// Fish for the Jan-2026 partition boundary, the only eligible one in this
// fixture DB. The generous timeout covers the real drop-and-rebuild dance
// against MSSQL, which routinely takes a couple of seconds on a busy dev container.
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var matched = probe.FishForMessage<AuditLogPurgedEvent>(
isMessage: m => m.MonthBoundary == janBoundary,
@@ -359,8 +366,8 @@ public class AuditLogPurgeActorTests : TestKit, IClassFixture<MsSqlMigrationFixt
.Where(e => e.SourceSiteId == siteId)
.ToListAsync();
Assert.DoesNotContain(rows, r => r.EventId == janEvt.EventId);
Assert.Contains(rows, r => r.EventId == aprEvt.EventId);
Assert.DoesNotContain(rows, r => r.EventId == oldEvt.EventId);
Assert.Contains(rows, r => r.EventId == keptEvt.EventId);
}
private ScadaBridgeDbContext CreateMsSqlContext() =>
@@ -140,10 +140,49 @@ WHERE name = 'UX_AuditLog_EventId'
NullLogger<AuditLogPurgeActor>.Instance)));
}
private static (DateTime Jan, DateTime Feb, DateTime Mar) SeedOccurredAt() => (
new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc),
new DateTime(2026, 2, 15, 0, 0, 0, DateTimeKind.Utc),
new DateTime(2026, 3, 15, 0, 0, 0, DateTimeKind.Utc));
/// <summary>
/// Returns three seed timestamps and a computed <c>RetentionDays</c> value that
/// keep the purge outcome independent of the date on which the test runs.
/// </summary>
/// <remarks>
/// <para>
/// The partition function <c>pf_AuditLog_Month</c> has explicit boundaries only
/// for 2026-01-01 through 2027-12-01. Rows outside that range land in the
/// catch-all partitions, which have no <c>partition_range_values</c> entry and are
/// therefore never returned by
/// <see cref="IAuditLogRepository.GetPartitionBoundariesOlderThanAsync"/>.
/// All three seeds must therefore fall inside the defined boundary range.
/// </para>
/// <para>
/// To remain date-independent the test computes <c>RetentionDays</c> dynamically
/// so the purge threshold always lands near <b>2026-01-20</b>:
/// <code>
/// RetentionDays = (int)(DateTime.UtcNow - new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc)).TotalDays + 1
/// </code>
/// This gives:
/// <list type="bullet">
/// <item>Jan 15 2026 row → Jan 15 &lt; Jan 20 threshold → <b>PURGED</b>.</item>
/// <item>Apr 15 / Jun 15 2026 rows → both after Jan 20 → <b>KEPT</b>.</item>
/// </list>
/// The threshold anchors to a fixed calendar point (~Jan 20 2026), so the
/// relationship holds for any future run date as long as the explicit partition
/// boundaries remain.
/// </para>
/// </remarks>
private static (DateTime Old, DateTime Mid, DateTime Recent, int RetentionDays) SeedOccurredAt()
{
// Anchor the threshold midway through January 2026 — strictly after the
// "old" seed (Jan 15) and strictly before the "mid" seed (Apr 15).
var thresholdAnchor = new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc);
var retentionDays = (int)(DateTime.UtcNow - thresholdAnchor).TotalDays + 1;
return (
Old: new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc), // in Jan-2026 partition → PURGED
Mid: new DateTime(2026, 4, 15, 0, 0, 0, DateTimeKind.Utc), // in Apr-2026 partition → KEPT
Recent: new DateTime(2026, 6, 15, 0, 0, 0, DateTimeKind.Utc), // in Jun-2026 partition → KEPT
RetentionDays: retentionDays
);
}
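The arithmetic behind the dynamic `RetentionDays` can be checked in isolation. The sketch below is illustrative only and is not part of this commit: `ThresholdSketch` and `RetentionDaysFor` are hypothetical names, and it assumes the purge actor derives its threshold as `UtcNow - RetentionDays` days, which is what the earlier comments in these tests imply. Under that assumption, for any run time at or after the anchor the threshold falls in the 24 hours before 2026-01-20 00:00 UTC, so it stays after the Jan 15 seed and before the Apr 15 seed.

```csharp
using System;

// Hypothetical stand-alone check; not part of the test suite in this commit.
public static class ThresholdSketch
{
    // Same expression as SeedOccurredAt(), with the clock passed in so the
    // result is repeatable.
    public static int RetentionDaysFor(DateTime utcNow, DateTime anchor) =>
        (int)(utcNow - anchor).TotalDays + 1;

    public static void Main()
    {
        var anchor = new DateTime(2026, 1, 20, 0, 0, 0, DateTimeKind.Utc);
        var oldSeed = new DateTime(2026, 1, 15, 0, 0, 0, DateTimeKind.Utc);
        var midSeed = new DateTime(2026, 4, 15, 0, 0, 0, DateTimeKind.Utc);

        // Two arbitrary run times after the anchor.
        var runTimes = new[]
        {
            new DateTime(2026, 5, 20, 12, 0, 0, DateTimeKind.Utc),
            new DateTime(2027, 3, 1, 0, 0, 0, DateTimeKind.Utc),
        };

        foreach (var now in runTimes)
        {
            var days = RetentionDaysFor(now, anchor);

            // ASSUMPTION: the actor computes threshold = now - RetentionDays days.
            var threshold = now.AddDays(-days);

            Console.WriteLine($"{now:u} -> RetentionDays={days}, threshold={threshold:u}");

            // The old seed must be purge-eligible and the mid seed must not be.
            if (!(oldSeed < threshold && threshold < midSeed))
                throw new InvalidOperationException("Threshold drifted outside the seed window.");
        }
    }
}
```

Passing the clock in as a parameter, rather than reading `DateTime.UtcNow` inside the helper, is what makes the relationship checkable for arbitrary run dates.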
// ---------------------------------------------------------------------
// 1. EndToEnd_OldestPartition_PurgedViaActor_NewerKept
@@ -154,24 +193,23 @@ WHERE name = 'UX_AuditLog_EventId'
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
// Test date is ~2026-05-20 per environment. We want a threshold that
// sits strictly between Jan 15 (the Jan partition's MAX) and Feb 15
// (the Feb partition's MAX) so only the Jan-2026 partition is
// eligible for purge. RetentionDays = 100 gives a threshold of
// ~2026-02-09 — Jan 15 is older (purged), Feb 15 and Mar 15 are
// newer (kept). The window between Jan 15 and Feb 15 is wide enough
// (~30 days) to tolerate any plausible test-clock drift in CI.
// Seeds three rows in distinct calendar months. RetentionDays is computed
// dynamically so the purge threshold always lands near 2026-01-20 (see
// SeedOccurredAt() for the full rationale):
// Old = Jan 15 2026 → Jan 15 < threshold ~Jan 20 → PURGED
// Mid = Apr 15 2026 → Apr 15 > threshold ~Jan 20 → KEPT
// Recent = Jun 15 2026 → Jun 15 > threshold ~Jan 20 → KEPT
var siteId = "purge-e2e-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var janEventId = Guid.NewGuid();
var febEventId = Guid.NewGuid();
var marEventId = Guid.NewGuid();
var (janOccurred, febOccurred, marOccurred) = SeedOccurredAt();
var oldEventId = Guid.NewGuid();
var midEventId = Guid.NewGuid();
var recentEventId = Guid.NewGuid();
var (oldOccurred, midOccurred, recentOccurred, retentionDays) = SeedOccurredAt();
await using (var seedConn = _fixture.OpenConnection())
{
await DirectInsertAsync(seedConn, janEventId, janOccurred, siteId);
await DirectInsertAsync(seedConn, febEventId, febOccurred, siteId);
await DirectInsertAsync(seedConn, marEventId, marOccurred, siteId);
await DirectInsertAsync(seedConn, oldEventId, oldOccurred, siteId);
await DirectInsertAsync(seedConn, midEventId, midOccurred, siteId);
await DirectInsertAsync(seedConn, recentEventId, recentOccurred, siteId);
}
// Wire the actor with a real EF context against the fixture DB.
@@ -190,15 +228,11 @@ WHERE name = 'UX_AuditLog_EventId'
IntervalHours = 24,
IntervalOverride = TimeSpan.FromMilliseconds(100),
};
var auditOptions = new AuditLogOptions { RetentionDays = 100 };
var auditOptions = new AuditLogOptions { RetentionDays = retentionDays };
CreateActor(sp, purgeOptions, auditOptions);
// Wait for the actor's tick to purge the Jan-2026 partition.
// Concurrent test runs against the same fixture might also create
// eligible partitions, but each test class owns its own fixture DB
// (MsSqlMigrationFixture seeds a guid-named DB per class), so the
// Jan-2026 boundary is the only one this test can have produced.
// The Jan-2026 partition boundary is the only eligible one in this fixture DB.
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var matched = probe.FishForMessage<AuditLogPurgedEvent>(
isMessage: m => m.MonthBoundary == janBoundary,
@@ -206,9 +240,7 @@ WHERE name = 'UX_AuditLog_EventId'
Assert.True(matched.RowsDeleted >= 1,
$"Expected RowsDeleted >= 1 for Jan-2026 boundary; got {matched.RowsDeleted}.");
// Allow a brief settle in case the actor is mid-tick on Feb/Mar
// (it shouldn't be, since RetentionDays = 90 means only Jan is
// eligible, but the actor MAY re-enumerate quickly while we read).
// Allow a brief settle in case the actor re-enumerates quickly.
await Task.Delay(TimeSpan.FromMilliseconds(500));
await using var verify = CreateContext();
@@ -216,11 +248,10 @@ WHERE name = 'UX_AuditLog_EventId'
.Where(e => e.SourceSiteId == siteId)
.ToListAsync();
// Jan removed; Feb + Mar untouched. Because the test owns the site
// id and the fixture DB, exact set membership is observable.
Assert.DoesNotContain(rows, r => r.EventId == janEventId);
Assert.Contains(rows, r => r.EventId == febEventId);
Assert.Contains(rows, r => r.EventId == marEventId);
// Old (Jan) removed; Mid (Apr) + Recent (Jun) untouched.
Assert.DoesNotContain(rows, r => r.EventId == oldEventId);
Assert.Contains(rows, r => r.EventId == midEventId);
Assert.Contains(rows, r => r.EventId == recentEventId);
}
// ---------------------------------------------------------------------
@@ -232,20 +263,19 @@ WHERE name = 'UX_AuditLog_EventId'
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
// Same shape as test 1 — purge the Jan-2026 partition and then
// assert the UX_AuditLog_EventId index is still present. The
// drop-and-rebuild dance briefly removes it inside its transaction
// (the SWITCH PARTITION step requires the non-aligned unique index
// to be absent), but step 5 rebuilds it before committing. Sanity-
// checking the post-COMMIT shape here documents the invariant in an
// assertable way.
// Same shape as test 1 — purge the Jan-2026 partition and then assert the
// UX_AuditLog_EventId index is still present. RetentionDays is computed
// dynamically so the threshold always lands near 2026-01-20 (see SeedOccurredAt()).
// The drop-and-rebuild dance briefly removes the index inside its transaction
// (the SWITCH PARTITION step requires the non-aligned unique index to be absent),
// but step 5 rebuilds it before committing.
var siteId = "purge-uxidx-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var janEventId = Guid.NewGuid();
var (janOccurred, _, _) = SeedOccurredAt();
var oldEventId = Guid.NewGuid();
var (oldOccurred, _, _, retentionDays) = SeedOccurredAt();
await using (var seedConn = _fixture.OpenConnection())
{
await DirectInsertAsync(seedConn, janEventId, janOccurred, siteId);
await DirectInsertAsync(seedConn, oldEventId, oldOccurred, siteId);
}
var services = new ServiceCollection();
@@ -265,7 +295,7 @@ WHERE name = 'UX_AuditLog_EventId'
IntervalHours = 24,
IntervalOverride = TimeSpan.FromMilliseconds(100),
},
new AuditLogOptions { RetentionDays = 90 });
new AuditLogOptions { RetentionDays = retentionDays });
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
probe.FishForMessage<AuditLogPurgedEvent>(
@@ -287,18 +317,19 @@ WHERE name = 'UX_AuditLog_EventId'
{
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
// Seed + purge a Jan-2026 row, THEN exercise InsertIfNotExistsAsync
// twice for a fresh (May-2026) EventId. The second call must be a
// no-op (duplicate-key collision swallowed by the repository, per
// M2 Bundle A's race-fix) — which means the rebuilt
// UX_AuditLog_EventId unique index is functioning as intended.
// Seed + purge the Jan-2026 row, THEN exercise InsertIfNotExistsAsync twice for
// a fresh recent EventId. The second call must be a no-op (duplicate-key collision
// swallowed by the repository, per M2 Bundle A's race-fix) — which means the
// rebuilt UX_AuditLog_EventId unique index is functioning as intended.
// RetentionDays is computed dynamically so the threshold always lands near
// 2026-01-20 (see SeedOccurredAt()).
var siteId = "purge-idem-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var janEventId = Guid.NewGuid();
var (janOccurred, _, _) = SeedOccurredAt();
var oldEventId = Guid.NewGuid();
var (oldOccurred, _, _, retentionDays) = SeedOccurredAt();
await using (var seedConn = _fixture.OpenConnection())
{
await DirectInsertAsync(seedConn, janEventId, janOccurred, siteId);
await DirectInsertAsync(seedConn, oldEventId, oldOccurred, siteId);
}
var services = new ServiceCollection();
@@ -318,7 +349,7 @@ WHERE name = 'UX_AuditLog_EventId'
IntervalHours = 24,
IntervalOverride = TimeSpan.FromMilliseconds(100),
},
new AuditLogOptions { RetentionDays = 90 });
new AuditLogOptions { RetentionDays = retentionDays });
var janBoundary = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
probe.FishForMessage<AuditLogPurgedEvent>(
@@ -334,7 +365,7 @@ WHERE name = 'UX_AuditLog_EventId'
await Task.Delay(TimeSpan.FromMilliseconds(500));
var freshEventId = Guid.NewGuid();
var freshOccurred = new DateTime(2026, 5, 15, 12, 0, 0, DateTimeKind.Utc);
var freshOccurred = new DateTime(2026, 5, 15, 12, 0, 0, DateTimeKind.Utc); // within partition range, well inside retention window
var freshSite = "purge-idem-fresh-" + Guid.NewGuid().ToString("N").Substring(0, 8);
var freshEvt = ScadaBridgeAuditEventFactory.Create(
eventId: freshEventId,