diff --git a/docs/plans/2026-06-16-m5-audit-hardening.md b/docs/plans/2026-06-16-m5-audit-hardening.md
new file mode 100644
index 00000000..cba5766a
--- /dev/null
+++ b/docs/plans/2026-06-16-m5-audit-hardening.md
@@ -0,0 +1,92 @@
+# M5 — Audit Hardening (T3–T8) Implementation Plan
+
+> **For Claude:** executed via superpowers-extended-cc:subagent-driven-development in this session.
+
+**Goal:** Ship six independent audit-log hardening items (per-channel retention, ParentExecutionId tag-cascade, SourceNode backfill, per-node stuck KPIs, structured response-capture increments, CLI `audit tree`) without an AuditLog schema change.
+
+**Architecture:** Each item extends an existing seam identified in the survey. No new infra dependency (T1 hash-chain + T2 Parquet stay deferred to v1.x). Design: `docs/plans/2026-06-16-m5-audit-hardening-design.md`.
+
+**Tech Stack:** C#/.NET 10, EF Core (MS SQL), Akka.NET, Blazor Server, System.CommandLine, xUnit.
+
+**Conventions:** targeted builds/tests per task (`dotnet build `, `dotnet test --filter`); full-solution build only at integration (M5.7). Implementers do NOT create worktrees (already in `worktree-m5-audit-hardening`) and commit with pathspec form `git commit -m "..." -- ` (retry on index.lock). Append-only invariant holds for writer/ingest paths; the only sanctioned mutations are T3's purge-role channel delete and T5's purge-role sentinel UPDATE, both reflected in the M2.10 CI-guard allow-list.
+
+---
+
+# Wave A — leverage-existing-infra (parallel; disjoint projects)
+
+### Task M5.1 (T8): CLI `audit tree` + tree endpoint
+**Classification:** standard · **~5 min** · **Parallelizable with:** M5.2, M5.3
+**Files:**
+- Modify: `src/ZB.MOM.WW.ScadaBridge.ManagementService/AuditEndpoints.cs` (`MapAuditAPI`, ~line 97) — add `GET /api/audit/tree?executionId=` → `IAuditLogRepository.GetExecutionTreeAsync(executionId)` → JSON `ExecutionTreeNode[]`; 400 on missing/invalid guid, empty array when no rows.
+- Create: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditTreeHelpers.cs` — render `ExecutionTreeNode[]` as an indented ASCII tree (table) and as raw JSON (`--format json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
+- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` (`Build`, ~line 28) — add `BuildTree()`: `audit tree --execution-id [--format table|json]`, calls the new endpoint via the existing `ManagementHttpClient` pattern.
+- Test: ManagementService tests for the endpoint (multi-level tree + not-found); CLI tests for `AuditTreeHelpers` rendering.
+**AC:** `audit tree --execution-id ` prints the execution tree (root→children, indented); `--format json` emits the node array; the server walk reuses the existing `GetExecutionTreeAsync` (no new SQL). No schema change.
+
+### Task M5.2 (T6): Per-node stuck-count KPIs
+**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.3
+**Files:**
+- Modify: `NotificationOutboxRepository` — add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to `ComputePerSiteKpisAsync`.
+- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteCallAudit/...Repository` — same `ComputePerNodeKpisAsync`.
+- Modify: `NotificationOutboxActor.cs` (~line 1054) + `SiteCallAuditActor.cs` (~line 781) — add a `PerNode…KpiRequest`/`Response` message pair (in Commons messages) and a `Receive<>`/handler each.
+- Modify: CentralUI `AuditKpiTiles.razor` / `SiteCallKpiTiles.razor` (or the per-site KPI panel) — add an additive per-node breakdown.
+- Test: repository per-node grouping returns correct stuck/parked/queue-depth counts; actor message round-trip.
+**AC:** per-node stuck/parked counts available + surfaced; `SourceNode` already on both tables (no migration). Per-site KPIs unchanged.
+
+### Task M5.3 (T7): Structured response-capture increments
+**Classification:** standard · **~5 min** · **Parallelizable with:** M5.1, M5.2
+**Files:**
+- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/...AuditWriteMiddleware.cs` (`EmitInboundAudit`, ~line 246) — capture inbound **request headers** into the existing `Extra` JSON (through the existing header redactor; auth headers redacted by default).
+- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditCentralHealthSnapshot.cs` — add an `AuditInboundCeilingHits` counter (+ its interface), incremented from the middleware when an inbound row truncates (`requestTruncated || responseTruncated`).
+- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/PerTargetRedactionOverride.cs` — add a `SkipBodyCapture` flag; honor it in the capture pipeline (suppress body, keep headers + metadata + the row).
+- Test: request headers land in `Extra` and are redacted; ceiling-hit increments the counter; `SkipBodyCapture` suppresses body but still writes the row.
+**AC:** no schema change (uses `Extra` JSON + health snapshot); existing redaction behavior preserved.
+
+---
+
+# Wave B — actor model + maintenance (parallel; T5 after M5.1's CLI edits)
+
+### Task M5.4 (T4): ParentExecutionId tag-cascade
+**Classification:** high-risk (actor model + correlation) · **~5 min** · **Parallelizable with:** M5.5 (and M5.6)
+**Files:**
+- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/AlarmActor.cs` (`SpawnAlarmExecutionActor`, ~line 578) + `AlarmExecutionActor.cs` (ctor, ~line 90) — thread a `Guid? parentExecutionId` so alarm-triggered scripts chain to the firing context; pass it into the `ScriptRuntimeContext` (currently `null`).
+- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (`CallScript` ~line 394, `CallShared`) — pass **the current run's `_executionId`** (not the inherited `_parentExecutionId`) as the child invocation's `ParentExecutionId`, forming a true multi-level tree.
+- Test (`tests/.../SiteRuntime.Tests/`): an alarm-triggered script row carries the expected parent; a 2-level nested `CallScript` (A→B→C) is walkable via `GetExecutionTreeAsync` (or assert the emitted `ParentExecutionId` chain).
+**AC:** alarm/trigger-spawned and nested-call runs form a correct execution tree; top-level timer/expression-trigger runs stay roots; no regression to the inbound-API→routed-script path.
+
+### Task M5.5 (T3): Per-channel retention overrides
+**Classification:** high-risk (purge/deletion + CI guard) · **~5 min** · **Parallelizable with:** M5.4, M5.6
+**Files:**
+- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/AuditLogOptions.cs` — add `Dictionary PerChannelRetentionDays` (keyed by `Action`/channel name); validate in `AuditLogOptionsValidator.cs` (each override in `[30, global]`, shorter-than-global only).
+- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditLogPurgeActor.cs` (`HandlePurgeTickAsync`, ~line 135) — after the global partition switch-out, for each channel with a shorter override, run a **bounded batched DELETE** (`WHERE Action=@channel AND OccurredAtUtc<@threshold`) via the purge/maintenance path.
+- Modify: the M2.10 CI grep-guard script — add an allow-list entry for the purge actor's single audited DELETE call site (do NOT blanket-exempt; the guard must still reject all other UPDATE/DELETE on AuditLog).
+- Test: a channel with a shorter override is purged earlier than global; un-overridden channels follow global; the CI guard still fails on a stray DELETE elsewhere.
+**AC:** per-channel retention works without violating writer-role append-only; the guard remains effective.
+
+### Task M5.6 (T5): SourceNode sentinel backfill + runbook
+**Classification:** small · **~4 min** · **Parallelizable with:** M5.4, M5.5 · **Depends on:** M5.1 (shares `AuditCommands.cs`)
+**Files:**
+- Create: a one-shot maintenance backfill (purge/maintenance path) that sets `SourceNode` to a configurable sentinel (default `"unknown"`) on `NULL` rows within a bounded `OccurredAtUtc` range; idempotent.
+- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` — add `audit backfill-source-node [--sentinel ] [--before ]` invoking it (after M5.1's `audit tree` is in, to avoid a concurrent edit to this file).
+- Modify/Create: a runbook note (`deploy/.../RUNBOOK.md` or the AuditLog component doc) documenting that `ExecutionId`/`ParentExecutionId` are computed from `DetailsJson` and CANNOT be backfilled under append-only (pre-feature rows stay NULL) — no false precision.
+- Test: backfill sets the sentinel only on NULL rows in range, is idempotent, and does not touch non-NULL rows.
+**AC:** SourceNode backfill is sanctioned maintenance (CI-guard allow-listed if it does UPDATE); the computed-id limitation is documented, not coded.
+
+---
+
+# Wave C — integration + docs
+
+### Task M5.7: Integration verification + docs
+**Classification:** high-risk (final integration reviewer) · **~5 min** · **Depends on:** M5.1–M5.6
+**Steps:**
+1. `dotnet build ZB.MOM.WW.ScadaBridge.slnx` (full solution).
+2. Targeted tests across AuditLog, ManagementService, CLI, NotificationOutbox/SiteCallAudit, SiteRuntime, CentralUI; run the CI grep-guard to confirm it still blocks stray UPDATE/DELETE.
+3. Docs: `docs/requirements/Component-AuditLog.md` (per-channel retention, per-node KPIs, response-capture increments, tag-cascade, `audit tree`), `Component-CLI.md` + CLI README (`audit tree`, `audit backfill-source-node`), CLAUDE.md audit notes (per-channel retention; tag-cascade now beyond inbound; per-node KPIs), and the runbook computed-id limitation.
+4. Commit; final integration review of the whole `1b7600f..HEAD` diff.
+**AC:** full build green; all targeted suites + CI guard green; docs reflect the six shipped items; no doc claims a deferred item shipped (T1/T2 remain deferred).
+
+---
+
+## Native tasks & dependencies
+
+Sub-tasks created as native tasks under umbrella #16 (M5). Edges: M5.6 ⟵ M5.1 (shared CLI file); M5.7 ⟵ M5.1–M5.6. Waves: A = {M5.1, M5.2, M5.3} parallel; B = {M5.4, M5.5, M5.6} parallel (M5.6 after M5.1); C = M5.7.
diff --git a/docs/plans/2026-06-16-m5-audit-hardening.md.tasks.json b/docs/plans/2026-06-16-m5-audit-hardening.md.tasks.json
new file mode 100644
index 00000000..f3f9a514
--- /dev/null
+++ b/docs/plans/2026-06-16-m5-audit-hardening.md.tasks.json
@@ -0,0 +1,13 @@
+{
+  "planPath": "docs/plans/2026-06-16-m5-audit-hardening.md",
+  "tasks": [
+    {"id": 119, "subject": "M5.1 (T8): CLI audit tree + tree endpoint", "status": "pending"},
+    {"id": 120, "subject": "M5.2 (T6): Per-node stuck-count KPIs", "status": "pending"},
+    {"id": 121, "subject": "M5.3 (T7): Structured response-capture increments", "status": "pending"},
+    {"id": 122, "subject": "M5.4 (T4): ParentExecutionId tag-cascade", "status": "pending"},
+    {"id": 123, "subject": "M5.5 (T3): Per-channel retention overrides", "status": "pending"},
+    {"id": 124, "subject": "M5.6 (T5): SourceNode sentinel backfill + runbook", "status": "pending", "blockedBy": [119]},
+    {"id": 125, "subject": "M5.7: M5 integration verification + docs", "status": "pending", "blockedBy": [119, 120, 121, 122, 123, 124]}
+  ],
+  "lastUpdated": "2026-06-16"
+}