M5 — Audit Hardening (T3–T8) Implementation Plan
For Claude: executed via superpowers-extended-cc:subagent-driven-development in this session.
Goal: Ship six independent audit-log hardening items (per-channel retention, ParentExecutionId tag-cascade, SourceNode backfill, per-node stuck KPIs, structured response-capture increments, CLI audit tree) without an AuditLog schema change.
Architecture: Each item extends an existing seam identified in the survey. No new infra dependency (T1 hash-chain + T2 Parquet stay deferred to v1.x). Design: docs/plans/2026-06-16-m5-audit-hardening-design.md.
Tech Stack: C#/.NET 10, EF Core (MS SQL), Akka.NET, Blazor Server, System.CommandLine, xUnit.
Conventions: targeted builds/tests per task (`dotnet build <proj>`, `dotnet test --filter`); full-solution build only at integration (M5.7). Implementers do NOT create worktrees (already in `worktree-m5-audit-hardening`) and commit with pathspec form `git commit -m "..." -- <paths>` (retry on `index.lock`). Append-only invariant holds for writer/ingest paths; the only sanctioned mutations are T3's purge-role channel delete and T5's purge-role sentinel UPDATE, both reflected in the M2.10 CI-guard allow-list.
Wave A — leverage-existing-infra (parallel; disjoint projects)
Task M5.1 (T8): CLI audit tree + tree endpoint
Classification: standard · ~5 min · Parallelizable with: M5.2, M5.3
Files:
- Modify: `src/ZB.MOM.WW.ScadaBridge.ManagementService/AuditEndpoints.cs` (`MapAuditAPI`, ~line 97). Add `GET /api/audit/tree?executionId=<guid>` → `IAuditLogRepository.GetExecutionTreeAsync(executionId)` → JSON `ExecutionTreeNode[]`; 400 on missing/invalid guid, empty array when no rows.
- Create: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditTreeHelpers.cs`. Render `ExecutionTreeNode[]` as an indented ASCII tree (`table`) and as raw JSON (`--format json`), mirroring `AuditQueryHelpers`/`AuditExportHelpers`.
- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs` (`Build`, ~line 28). Add `BuildTree()`: `audit tree --execution-id <guid> [--format table|json]`, which calls the new endpoint via the existing `ManagementHttpClient` pattern.
- Test: ManagementService tests for the endpoint (multi-level tree + not-found); CLI tests for `AuditTreeHelpers` rendering.
AC: `audit tree --execution-id <id>` prints the execution tree (root→children, indented); `--format json` emits the node array; the server walk reuses the existing `GetExecutionTreeAsync` (no new SQL). No schema change.
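
The indented rendering for `audit tree` could look roughly like the sketch below. It is illustrative only: `TreeNodeSketch` is a hypothetical stand-in for `ExecutionTreeNode` (whose real fields are not listed in this plan), and the two-space indent is an assumed layout, not the agreed output format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for ExecutionTreeNode; the real type's shape is an assumption.
public sealed record TreeNodeSketch(Guid ExecutionId, Guid? ParentExecutionId, string Label);

public static class AuditTreeRenderSketch
{
    // Renders a flat node array as an indented tree, two spaces per depth level.
    public static string Render(IReadOnlyList<TreeNodeSketch> nodes)
    {
        var ids = nodes.Select(n => n.ExecutionId).ToHashSet();

        // A node is a root when it has no parent, or its parent is not in the result set.
        bool IsRoot(TreeNodeSketch n) =>
            n.ParentExecutionId is null || !ids.Contains(n.ParentExecutionId.Value);

        // Index the remaining nodes by parent id so each level is a single lookup.
        var children = nodes
            .Where(n => !IsRoot(n))
            .ToLookup(n => n.ParentExecutionId!.Value);

        var sb = new StringBuilder();
        foreach (var root in nodes.Where(IsRoot))
        {
            Append(root, 0);
        }
        return sb.ToString();

        void Append(TreeNodeSketch node, int depth)
        {
            sb.Append(' ', depth * 2).Append(node.Label).Append('\n');
            foreach (var child in children[node.ExecutionId])
            {
                Append(child, depth + 1);
            }
        }
    }
}
```

Treating nodes with an absent parent as roots keeps the output usable when the server returns only part of a tree.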
Task M5.2 (T6): Per-node stuck-count KPIs
Classification: standard · ~5 min · Parallelizable with: M5.1, M5.3
Files:
- Modify: `NotificationOutboxRepository`. Add `ComputePerNodeKpisAsync` (group by `SourceNode`) parallel to `ComputePerSiteKpisAsync`.
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteCallAudit/...Repository`. Add the same `ComputePerNodeKpisAsync`.
- Modify: `NotificationOutboxActor.cs` (~line 1054) + `SiteCallAuditActor.cs` (~line 781). Add a `PerNode…KpiRequest`/`Response` message pair (in Commons messages) and a `Receive<>`/handler each.
- Modify: CentralUI `AuditKpiTiles.razor`/`SiteCallKpiTiles.razor` (or the per-site KPI panel). Add an additive per-node breakdown.
- Test: repository per-node grouping returns correct stuck/parked/queue-depth counts; actor message round-trip.
AC: per-node stuck/parked counts available + surfaced; `SourceNode` already on both tables (no migration). Per-site KPIs unchanged.
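
As a rough illustration of the per-node grouping, the in-memory sketch below groups rows by `SourceNode` and counts states. All type names are hypothetical, as is the state enum; the real repository presumably runs an equivalent grouped query through EF Core, and how "stuck" and "queue depth" are actually defined is not specified here. Mapping NULL `SourceNode` to `"unknown"` is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row/state/result shapes, for illustration only.
public enum RowStateSketch { Queued, Stuck, Parked }
public sealed record OutboxRowSketch(string? SourceNode, RowStateSketch State);
public sealed record PerNodeKpiSketch(string SourceNode, int Stuck, int Parked, int QueueDepth);

public static class PerNodeKpiCalcSketch
{
    // Groups rows by SourceNode and counts each state per node.
    // Assumption: QueueDepth means rows still in the Queued state.
    public static IReadOnlyList<PerNodeKpiSketch> Compute(IEnumerable<OutboxRowSketch> rows) =>
        rows.GroupBy(r => r.SourceNode ?? "unknown")
            .Select(g => new PerNodeKpiSketch(
                g.Key,
                g.Count(r => r.State == RowStateSketch.Stuck),
                g.Count(r => r.State == RowStateSketch.Parked),
                g.Count(r => r.State == RowStateSketch.Queued)))
            .OrderBy(k => k.SourceNode, StringComparer.Ordinal)
            .ToList();
}
```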
Task M5.3 (T7): Structured response-capture increments
Classification: standard · ~5 min · Parallelizable with: M5.1, M5.2
Files:
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/...AuditWriteMiddleware.cs` (`EmitInboundAudit`, ~line 246). Capture inbound request headers into the existing `Extra` JSON (through the existing header redactor; auth headers redacted by default).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditCentralHealthSnapshot.cs`. Add an `AuditInboundCeilingHits` counter (+ its interface), incremented from the middleware when an inbound row truncates (`requestTruncated || responseTruncated`).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/PerTargetRedactionOverride.cs`. Add a `SkipBodyCapture` flag; honor it in the capture pipeline (suppress body, keep headers + metadata + the row).
- Test: request headers land in `Extra` and are redacted; ceiling-hit increments the counter; `SkipBodyCapture` suppresses body but still writes the row.
AC: no schema change (uses `Extra` JSON + health snapshot); existing redaction behavior preserved.
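
One way the truncation counter and the `SkipBodyCapture` flag might interact is sketched below. Every type here is hypothetical, and the ceiling is assumed to be a character count; the real middleware may measure bytes and will have a different signature. The point is only that the counter increments once per row when either side truncates, and that skipping the body does not skip the row.

```csharp
using System;
using System.Threading;

// Hypothetical counter; the real one would live on the health snapshot.
public sealed class CeilingHitCounterSketch
{
    private long _hits;
    public long AuditInboundCeilingHits => Interlocked.Read(ref _hits);
    public void Increment() => Interlocked.Increment(ref _hits);
}

// Hypothetical capture result; null bodies mean "suppressed", the row is still written.
public sealed record CapturedBodySketch(string? Request, string? Response);

public static class InboundCaptureSketch
{
    public static CapturedBodySketch Capture(
        string request, string response, int ceiling,
        bool skipBodyCapture, CeilingHitCounterSketch counter)
    {
        // SkipBodyCapture suppresses bodies only; headers and metadata are handled elsewhere.
        if (skipBodyCapture)
        {
            return new CapturedBodySketch(null, null);
        }

        bool requestTruncated = request.Length > ceiling;
        bool responseTruncated = response.Length > ceiling;

        // One increment per row, even when both sides truncate.
        if (requestTruncated || responseTruncated)
        {
            counter.Increment();
        }

        return new CapturedBodySketch(
            requestTruncated ? request[..ceiling] : request,
            responseTruncated ? response[..ceiling] : response);
    }
}
```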
Wave B — actor model + maintenance (parallel; T5 after M5.1's CLI edits)
Task M5.4 (T4): ParentExecutionId tag-cascade
Classification: high-risk (actor model + correlation) · ~5 min · Parallelizable with: M5.5 (and M5.6)
Files:
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/AlarmActor.cs` (`SpawnAlarmExecutionActor`, ~line 578) + `AlarmExecutionActor.cs` (ctor, ~line 90). Thread a `Guid? parentExecutionId` so alarm-triggered scripts chain to the firing context; pass it into the `ScriptRuntimeContext` (currently `null`).
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Scripts/ScriptRuntimeContext.cs` (`CallScript` ~line 394, `CallShared`). Pass the current run's `_executionId` (not the inherited `_parentExecutionId`) as the child invocation's `ParentExecutionId`, forming a true multi-level tree.
- Test (`tests/.../SiteRuntime.Tests/`): an alarm-triggered script row carries the expected parent; a 2-level nested `CallScript` (A→B→C) is walkable via `GetExecutionTreeAsync` (or assert the emitted `ParentExecutionId` chain).
AC: alarm/trigger-spawned and nested-call runs form a correct execution tree; top-level timer/expression-trigger runs stay roots; no regression to the inbound-API→routed-script path.
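
The difference between passing the current run's id and passing the inherited parent is easiest to see in a toy model. The sketch below is not `ScriptRuntimeContext`; `RunContextSketch` is a hypothetical minimal class that keeps only the two ids, with a `CallScript` that takes no script argument.

```csharp
using System;

// Hypothetical minimal model of a script run's correlation ids.
public sealed class RunContextSketch
{
    public Guid ExecutionId { get; } = Guid.NewGuid();
    public Guid? ParentExecutionId { get; }

    // A null parent marks a root run (for example a timer-triggered script).
    public RunContextSketch(Guid? parentExecutionId = null)
    {
        ParentExecutionId = parentExecutionId;
    }

    // The child's parent is THIS run's ExecutionId, not the inherited ParentExecutionId.
    // Passing ParentExecutionId instead would make every descendant a sibling under the root.
    public RunContextSketch CallScript() => new RunContextSketch(ExecutionId);
}
```

With this rule an A→B→C call chain yields `C.ParentExecutionId == B.ExecutionId` and `B.ParentExecutionId == A.ExecutionId`, which is the chain the nested-call test is meant to assert.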
Task M5.5 (T3): Per-channel retention overrides
Classification: high-risk (purge/deletion + CI guard) · ~5 min · Parallelizable with: M5.4, M5.6
Files:
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/AuditLogOptions.cs`. Add `Dictionary<string,int> PerChannelRetentionDays` (keyed by `Action`/channel name); validate in `AuditLogOptionsValidator.cs` (each override in `[30, global]`, shorter-than-global only).
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditLogPurgeActor.cs` (`HandlePurgeTickAsync`, ~line 135). After the global partition switch-out, for each channel with a shorter override, run a bounded batched DELETE (`WHERE Action=@channel AND OccurredAtUtc<@threshold`) via the purge/maintenance path.
- Modify: the M2.10 CI grep-guard script. Add an allow-list entry for the purge actor's single audited DELETE call site (do NOT blanket-exempt; the guard must still reject all other UPDATE/DELETE on AuditLog).
- Test: a channel with a shorter override is purged earlier than global; un-overridden channels follow global; the CI guard still fails on a stray DELETE elsewhere.
AC: per-channel retention works without violating writer-role append-only; the guard remains effective.
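
The range rule for overrides can be sketched as a plain function, shown below. This is an assumption-laden illustration: the class and method names are hypothetical, the real `AuditLogOptionsValidator.cs` may use a different validation mechanism, and the upper bound is treated as inclusive per the `[30, global]` notation (an override equal to global would simply never trigger the extra DELETE pass).

```csharp
using System;
using System.Collections.Generic;

public static class RetentionValidationSketch
{
    // Lower bound taken from the plan's [30, global] range.
    public const int MinOverrideDays = 30;

    // Returns one error message per override that falls outside [30, global].
    public static IReadOnlyList<string> Validate(
        int globalRetentionDays,
        IReadOnlyDictionary<string, int> perChannelRetentionDays)
    {
        var errors = new List<string>();
        foreach (var (channel, days) in perChannelRetentionDays)
        {
            if (days < MinOverrideDays || days > globalRetentionDays)
            {
                errors.Add(
                    $"PerChannelRetentionDays['{channel}'] = {days} must be within " +
                    $"[{MinOverrideDays}, {globalRetentionDays}].");
            }
        }
        return errors;
    }

    // Rows older than this instant qualify for the per-channel DELETE.
    public static DateTime PurgeThresholdUtc(DateTime nowUtc, int retentionDays) =>
        nowUtc.AddDays(-retentionDays);
}
```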
Task M5.6 (T5): SourceNode sentinel backfill + runbook
Classification: small · ~4 min · Parallelizable with: M5.4, M5.5 · Depends on: M5.1 (shares AuditCommands.cs)
Files:
- Create: a one-shot maintenance backfill (purge/maintenance path) that sets `SourceNode` to a configurable sentinel (default `"unknown"`) on `NULL` rows within a bounded `OccurredAtUtc` range; idempotent.
- Modify: `src/ZB.MOM.WW.ScadaBridge.CLI/Commands/AuditCommands.cs`. Add `audit backfill-source-node [--sentinel <s>] [--before <date>]` invoking it (after M5.1's `audit tree` is in, to avoid a concurrent edit to this file).
- Modify/Create: a runbook note (`deploy/.../RUNBOOK.md` or the AuditLog component doc) documenting that `ExecutionId`/`ParentExecutionId` are computed from `DetailsJson` and CANNOT be backfilled under append-only (pre-feature rows stay NULL); no false precision.
- Test: backfill sets the sentinel only on NULL rows in range, is idempotent, and does not touch non-NULL rows.
AC: SourceNode backfill is sanctioned maintenance (CI-guard allow-listed if it does UPDATE); the computed-id limitation is documented, not coded.
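
The backfill predicate and its idempotency can be illustrated in memory, as below. This is only a model of the row filter: `AuditRowSketch` and `Backfill` are hypothetical names, and the real maintenance job would issue a bounded UPDATE against the database rather than mutate objects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row shape with just the two columns the backfill touches or reads.
public sealed class AuditRowSketch
{
    public string? SourceNode { get; set; }
    public DateTime OccurredAtUtc { get; init; }
}

public static class SourceNodeBackfillSketch
{
    // Sets the sentinel on NULL rows older than beforeUtc and returns how many rows changed.
    // A second run over the same rows changes nothing, because the NULL filter no longer matches.
    public static int Backfill(
        IEnumerable<AuditRowSketch> rows, DateTime beforeUtc, string sentinel = "unknown")
    {
        int updated = 0;
        foreach (var row in rows)
        {
            if (row.SourceNode is null && row.OccurredAtUtc < beforeUtc)
            {
                row.SourceNode = sentinel;
                updated++;
            }
        }
        return updated;
    }
}
```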
Wave C — integration + docs
Task M5.7: Integration verification + docs
Classification: high-risk (final integration reviewer) · ~5 min · Depends on: M5.1–M5.6
Steps:
- `dotnet build ZB.MOM.WW.ScadaBridge.slnx` (full solution).
- Targeted tests across AuditLog, ManagementService, CLI, NotificationOutbox/SiteCallAudit, SiteRuntime, CentralUI; run the CI grep-guard to confirm it still blocks stray UPDATE/DELETE.
- Docs: `docs/requirements/Component-AuditLog.md` (per-channel retention, per-node KPIs, response-capture increments, tag-cascade, `audit tree`), `Component-CLI.md` + CLI README (`audit tree`, `audit backfill-source-node`), CLAUDE.md audit notes (per-channel retention; tag-cascade now beyond inbound; per-node KPIs), and the runbook computed-id limitation.
- Commit; final integration review of the whole `1b7600f..HEAD` diff.
AC: full build green; all targeted suites + CI guard green; docs reflect the six shipped items; no doc claims a deferred item shipped (T1/T2 remain deferred).
Native tasks & dependencies
Sub-tasks created as native tasks under umbrella #16 (M5). Edges: M5.6 ⟵ M5.1 (shared CLI file); M5.7 ⟵ M5.1–M5.6. Waves: A = {M5.1, M5.2, M5.3} parallel; B = {M5.4, M5.5, M5.6} parallel (M5.6 after M5.1); C = M5.7.