diff --git a/docs/plans/2026-05-20-centralized-audit-log.md b/docs/plans/2026-05-20-centralized-audit-log.md
new file mode 100644
index 0000000..a6becf7
--- /dev/null
+++ b/docs/plans/2026-05-20-centralized-audit-log.md
@@ -0,0 +1,787 @@
+# Centralized Audit Log Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
+>
+> **Repo nature:** Design-documentation only. No code, no tests. Each task is a documentation change. "Verify" = re-read the diff + grep for stale cross-references. Commit after each task.
+
+**Goal:** Document the new **#23 Audit Log** component and propagate its cross-references across every affected component design, the README, HighLevelReqs, and CLAUDE.md — exactly as specified in `alog.md` (committed `fec0bb1`).
+
+**Architecture:** Layered, append-only `AuditLog` table at central, alongside existing `Notifications` (#21) and `SiteCalls` (#22) operational stores. Site SQLite writes on the hot path; gRPC telemetry forwards to central; site purge requires `ForwardState ∈ {Forwarded, Reconciled}`. Cached calls send a single telemetry packet that drives both the immutable `AuditLog` insert and the operational `SiteCalls` upsert. Central-originated events (Inbound API, Notification dispatch attempts) write directly. Monthly partitioning at central, 365-day default retention.
+
+**Tech Stack:** Markdown only. No code in v1 of this plan.
+
+**Spec:** `/Users/dohertj2/Desktop/scadalink-design/alog.md` (see commit `fec0bb1`). All task content below cites sections of that file.
+
+---
+
+## Task 0: Prepare branch
+
+**Files:**
+- None — git operation only.
+
+**Step 1: Confirm working tree state**
+
+Run: `git status --short`
+Expected: three unstaged `infra/` modifications (unrelated; leave them alone), nothing else.
+
+**Step 2: Create feature branch off `main`**
+
+Run: `git switch -c feature/audit-log-docs`
+Expected: switched to a new branch.
+
+**Step 3: Verify branch**
+
+Run: `git rev-parse --abbrev-ref HEAD`
+Expected: `feature/audit-log-docs`.
+
+**No commit at this task — just branch prep.**
+
+---
+
+## Task 1: Author `Component-AuditLog.md`
+
+**Files:**
+- Create: `docs/requirements/Component-AuditLog.md`
+
+**Step 1: Read context**
+
+Read `alog.md` §1–§16. Read the structural style of `docs/requirements/Component-SiteCallAudit.md` and `docs/requirements/Component-NotificationOutbox.md` — mirror their section ordering (Purpose / Location / Responsibilities / Tables / Lifecycle / Ingest & Idempotency / Reconciliation / Retention & Purge / KPIs / Configuration / Dependencies / Interactions).
+
+**Step 2: Write the skeleton**
+
+Create the file with these top-level headings (verbatim, in order):
+
+```
+# Component: Audit Log
+
+## Purpose
+## Location
+## Responsibilities
+## Scope — the script trust boundary
+## The `AuditLog` Table (central)
+## The Site-Local `AuditLog` (SQLite)
+## Ingestion Paths
+## Cached Operations — Combined Telemetry
+## Payload Capture Policy
+## Failure Handling & Idempotency
+## Retention & Purge
+## Security & Tamper-Evidence
+## KPIs
+## Configuration
+## Dependencies
+## Interactions
+```
+
+**Step 3: Fill `Purpose`**
+
+Two-paragraph version of `alog.md` §1. Lead sentence: "Provides a single, append-only, forensic + operational record of every integration action initiated by, or terminating in, a script — across outbound API, outbound DB, notifications, and inbound API." Second paragraph: not a dispatcher, observes Notification Outbox (#21) and Site Call Audit (#22), adds coverage where they are silent.
+
+**Step 4: Fill `Location`**
+
+Central cluster + site cluster. Central: `AuditLog` table in MS SQL plus three singleton actors on the active central node — `AuditLogIngestActor` (telemetry receiver), `SiteAuditReconciliationActor`, `AuditLogPurgeActor`.
+Sites: `AuditLog` SQLite database file alongside the S&F buffer plus `SiteAuditTelemetryActor` singleton on the active site node. Registered as component #23 in the Host role configuration.
+
+**Step 5: Fill `Responsibilities`**
+
+Bullet list mirroring `alog.md` §1–§3 commitments. Six bullets:
+- Accept site-local hot-path audit writes from script-trust-boundary call paths.
+- Forward site audit rows to central via gRPC telemetry with at-least-once + idempotency on `EventId`.
+- Run periodic reconciliation pulls per site to self-heal missed telemetry.
+- Accept central-originated audit writes (Inbound API, Notification dispatch attempts).
+- Compute point-in-time KPIs (global + per-site) from the central `AuditLog` table.
+- Purge expired rows by monthly partition switch.
+
+**Step 6: Fill `Scope — the script trust boundary`**
+
+Reproduce the table from `alog.md` §2 verbatim (the six rows). Add the "Out of scope" bullet list. Add the DB-reads note.
+
+**Step 7: Fill `The AuditLog Table (central)`**
+
+Reproduce the column table from `alog.md` §4. Then the index list. Then the `Kind`-per-channel table (with the inbound API simplification — only `Completed`).
+
+**Step 8: Fill `The Site-Local AuditLog (SQLite)`**
+
+State same schema as central minus `IngestedAtUtc`, plus `ForwardState` (`Pending | Forwarded | Reconciled`). Reproduce the **hard purge invariant** from `alog.md` §4 verbatim:
+
+> A row is eligible for purge only when both `OccurredAtUtc < retention threshold` AND `ForwardState IN ('Forwarded', 'Reconciled')`. Pending rows are never purged.
+
+Mention the `SiteAuditBacklog` health metric.
+
+**Step 9: Fill `Ingestion Paths`**
+
+Four subsections mirroring `alog.md` §6.1, §6.2, §6.3, §6.4. Keep concise — full pseudo-code lives in `alog.md`; the component doc captures the contract.
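
The hard purge invariant quoted in Step 8 can be read as a two-part predicate. The following is only an illustrative shell sketch of that predicate, not the component's implementation. The function name `purge_eligible` is hypothetical, and it assumes timestamps are ISO-8601 UTC strings in one fixed format so that string comparison matches chronological order.

```shell
# purge_eligible OCCURRED_AT_UTC THRESHOLD_UTC FORWARD_STATE
# Hypothetical helper: succeeds only when BOTH halves of the invariant hold.
purge_eligible() {
  case $3 in
    Forwarded|Reconciled) ;;   # already at central, so the state allows purge
    *) return 1 ;;             # Pending (or any unknown state) is never purged
  esac
  # Assumption: both timestamps share one ISO-8601 UTC layout, so a plain
  # string comparison orders them chronologically.
  expr "$1" \< "$2" >/dev/null
}
```

A row older than the threshold but still `Pending` fails the check, which is the property that keeps a central outage from causing audit loss at a site.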
+
+**Step 10: Fill `Cached Operations — Combined Telemetry`**
+
+Capture `alog.md` §6.5 — site is source of truth, one telemetry packet carries both the audit row and the SiteCalls operational update; central ingest performs both writes in a single transaction.
+
+**Step 11: Fill `Payload Capture Policy`**
+
+Compress `alog.md` §8 into 8–12 lines: defaults (8 KB / 64 KB on error), header redaction, body-redactor regex hook, SQL captures values by default with per-connection opt-out, never-captured list (API keys, LDAP creds, secrets), safety-net over-redacts on misconfiguration.
+
+**Step 12: Fill `Failure Handling & Idempotency`**
+
+Compress `alog.md` §9: EventId is the PK and dedup key; never-fail-the-action principle; ring buffer for transient SQLite write failures; reconciliation as fallback when telemetry actor wedges; central-direct-write failure handling.
+
+**Step 13: Fill `Retention & Purge`**
+
+Compress `alog.md` §12: 365-day default central retention; monthly partition switch; no row-level deletes at central; site 7-day default; site purge respects `ForwardState`.
+
+**Step 14: Fill `Security & Tamper-Evidence`**
+
+Compress `alog.md` §11: dedicated `scadalink_audit_writer` (INSERT+SELECT) and `scadalink_audit_purger` (partition-switch only) DB roles; CI grep guard against `UPDATE`/`DELETE` of `AuditLog`; Audit + OperationalAudit + AuditExport permissions; hash-chain tamper evidence deferred to v1.x.
+
+**Step 15: Fill `KPIs`**
+
+List the five KPIs from `alog.md` §14: Volume, Error rate, Backlog, Top inbound callers, Top outbound 5xx. Note that Notification Outbox and Site Call Audit KPIs are unaffected.
+
+**Step 16: Fill `Configuration`**
+
+Show the `AuditLog` `appsettings.json` shape from `alog.md` §8.4. Include `DefaultCapBytes`, `ErrorCapBytes`, `HeaderRedactList`, `GlobalBodyRedactors`, `PerTargetOverrides`, and `RetentionDays` (global only in v1).
+
+**Step 17: Fill `Dependencies`**
+
+Cross-references to:
+- **Commons (#16)** — `AuditEvent`, `IAuditWriter`, `ICentralAuditWriter`, `AuditChannel`, `AuditKind`, `AuditStatus` types and interfaces.
+- **Configuration Database (#17)** — `AuditLog` table schema, partition function/scheme, DB roles, retention options.
+- **Cluster Infrastructure (#13)** — singleton placement and supervision (`AuditLogIngestActor`, `SiteAuditTelemetryActor`, `SiteAuditReconciliationActor`, `AuditLogPurgeActor`).
+- **Communication (#5)** — gRPC telemetry message types added to the existing site-stream proto additively.
+- **Site Runtime (#3)** — script trust boundary touchpoints invoke `IAuditWriter`.
+- **Host (#15)** — registers the new component under the central + site roles.
+
+**Step 18: Fill `Interactions`**
+
+Edges to:
+- **External System Gateway (#7)** — emits `ApiOutbound.SyncCall` rows; for `CachedCall` emits combined telemetry (audit + operational).
+- **Site Runtime (#3) / Database layer** — emits `DbOutbound.SyncWrite`, `DbOutbound.SyncRead`, and cached variants similarly.
+- **Inbound API (#14)** — emits `ApiInbound.Completed` rows from request middleware.
+- **Notification Outbox (#21)** — site-emitted `Notification.Enqueued` flows via audit telemetry; central dispatcher writes `Notification.Attempt` and `Notification.Terminal` rows directly via `ICentralAuditWriter`.
+- **Site Call Audit (#22)** — shares the cached-call telemetry packet; central ingest of that packet performs both `AuditLog` insert and `SiteCalls` upsert in one transaction.
+- **Central UI (#9)** — new Audit nav group + Audit Log page; drill-in links from Notifications, Site Calls, External Systems, Inbound API key, Sites, Instances detail pages.
+- **Health Monitoring (#11)** — three new tiles (Volume, Error rate, Backlog) plus new metrics (`SiteAuditBacklog`, `SiteAuditWriteFailures`, `SiteAuditTelemetryStalled`, `CentralAuditWriteFailures`, `AuditRedactionFailure`).
+- **CLI (#19)** — `scadalink audit query|export|verify-chain` commands.
+
+**Step 19: Verify**
+
+Run: `grep -n "Component-AuditLog.md\|#23" docs/requirements/Component-AuditLog.md`
+Expected: at least one hit, including the `#23` registration note from the `Location` section.
+
+Run: `wc -l docs/requirements/Component-AuditLog.md`
+Expected: ~250–400 lines (sanity check; not exact).
+
+**Step 20: Commit**
+
+```bash
+git add docs/requirements/Component-AuditLog.md
+git commit -m "docs(audit): add Component-AuditLog (#23) design document"
+```
+
+---
+
+## Task 2: Update `Component-Commons.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-Commons.md`
+
+**Step 1: Read existing structure**
+
+Read the file to find the right sections — likely "Types", "Interfaces", "Messages", "Entities". Note which subsections audit-related additions belong in.
+
+**Step 2: Add to `Types/`**
+
+Under the Types section, add:
+
+- `AuditChannel` enum: `ApiOutbound | DbOutbound | Notification | ApiInbound`.
+- `AuditKind` enum: union of channel-specific values from `alog.md` §4 table.
+- `AuditStatus` enum: `Success | TransientFailure | PermanentFailure | Enqueued | Retrying | Delivered | Parked | Discarded`.
+- `AuditEvent` POCO record carrying every column from `alog.md` §4 (central schema), plus a `ForwardState` for site SQLite.
+
+**Step 3: Add to `Interfaces/`**
+
+- `IAuditWriter` — site-local hot-path interface: `Task WriteAsync(AuditEvent evt, CancellationToken ct)`. Implementation lives in Audit Log (#23) component.
+- `ICentralAuditWriter` — central direct-write interface: `Task WriteAsync(AuditEvent evt, CancellationToken ct)` with insert-if-not-exists semantics on `EventId`.
+
+**Step 4: Add to `Messages/`**
+
+- `AuditTelemetryEnvelope` — gRPC message wrapping a batch of `AuditEvent` rows for telemetry forwarding.
+- `CachedOperationTelemetry` — additively-evolved version of the existing SiteCalls telemetry message: now also carries `AuditEvent` content alongside the operational `SiteCalls` upsert fields.
+Note additive-only evolution rule (per existing project convention).
+
+**Step 5: Verify**
+
+Run: `grep -n "AuditEvent\|IAuditWriter\|AuditChannel" docs/requirements/Component-Commons.md`
+Expected: all three identifiers appear in the right sections.
+
+**Step 6: Commit**
+
+```bash
+git add docs/requirements/Component-Commons.md
+git commit -m "docs(audit): register AuditEvent, IAuditWriter, AuditTelemetry types in Commons"
+```
+
+---
+
+## Task 3: Update `Component-ConfigurationDatabase.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-ConfigurationDatabase.md`
+
+**Step 1: Read existing structure**
+
+Find the "Tables" and "Roles" / "Permissions" / "Migrations" sections.
+
+**Step 2: Add `AuditLog` table description**
+
+Under Tables, add a new subsection mirroring how `Notifications` and `SiteCalls` are documented. Include:
+- Full column list from `alog.md` §4 (central table).
+- Index list from `alog.md` §4.
+- Monthly partitioning: partition function `pf_AuditLog_Month`, scheme `ps_AuditLog_Month`, filegroup-per-month rollover.
+- PK on `EventId` for idempotency.
+
+**Step 3: Add `AuditLog` DB roles**
+
+Under Roles/Permissions, add `scadalink_audit_writer` (INSERT+SELECT only) and `scadalink_audit_purger` (partition-switch only). Note the CI grep guard against `UPDATE … AuditLog` / `DELETE … AuditLog`.
+
+**Step 4: Add `AuditLog` migration note**
+
+Under Migrations, note that the initial migration creates the partition function/scheme and the table aligned to the scheme; partition-maintenance job is owned by the Audit Log component, not the Configuration DB.
+
+**Step 5: Add retention config note**
+
+Mention `AuditLog:RetentionDays` (global only in v1) as an Audit Log options key consumed by the purge actor.
+
+**Step 6: Verify cross-reference**
+
+Run: `grep -n "AuditLog\|Audit Log" docs/requirements/Component-ConfigurationDatabase.md`
+Expected: new table appears in the Tables section, roles in Roles section.
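
A plain `grep -n` shows where a term appears but not which section it falls under, which is what the Task 3 verify step expects. The following is a minimal sketch of a section-aware check. The helper name `in_section` is hypothetical, and the heading titles passed to it (for example `Tables` or `Roles`) are assumptions about how the target document names its sections.

```shell
# in_section FILE HEADING PATTERN
# Hypothetical helper: succeeds when PATTERN (an awk regex) matches a line
# under the Markdown heading titled HEADING, including its subsections.
in_section() {
  awk -v want="$2" -v pat="$3" '
    /^[#]+ / {
      level = index($0, " ") - 1        # number of leading "#" characters
      title = substr($0, level + 2)     # heading text after "#... "
      if (title == want) { inside = 1; depth = level; next }
      if (inside && level <= depth) inside = 0   # a sibling or parent heading ends the section
    }
    inside && $0 ~ pat { found = 1 }
    END { exit found ? 0 : 1 }
  ' "$1"
}
```

For example, `in_section docs/requirements/Component-ConfigurationDatabase.md "Tables" "AuditLog"` would succeed only if the table is documented under that heading. The sketch treats any line starting with `#` and a space as a heading, so shell comments inside fenced code blocks can confuse it.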
+
+**Step 7: Commit**
+
+```bash
+git add docs/requirements/Component-ConfigurationDatabase.md
+git commit -m "docs(audit): add AuditLog table, partitioning, and DB roles to Config DB"
+```
+
+---
+
+## Task 4: Update `Component-ClusterInfrastructure.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-ClusterInfrastructure.md`
+
+**Step 1: Read singleton-placement section**
+
+Find where Notification Outbox / Site Call Audit singletons are documented (active-central placement model).
+
+**Step 2: Register central singletons**
+
+Add to the central-singleton list:
+- `AuditLogIngestActor` — receives gRPC telemetry batches, performs insert-if-not-exists on `EventId`; for cached telemetry, performs both `AuditLog` insert and `SiteCalls` upsert in one transaction.
+- `SiteAuditReconciliationActor` — periodic per-site pull, default every 5 minutes.
+- `AuditLogPurgeActor` — daily partition-switch purge.
+
+**Step 3: Register site singletons**
+
+Add to the site-singleton list:
+- `SiteAuditTelemetryActor` — drains the local `AuditLog` SQLite's `Pending` rows to central in batches; short interval (5s) when busy, longer (30s) when idle.
+
+**Step 4: Note dedicated dispatcher**
+
+Add a one-liner: `SiteAuditTelemetryActor` runs on a dedicated dispatcher so it doesn't compete with the script blocking-I/O dispatcher (per `alog.md` §6.2).
+
+**Step 5: Verify**
+
+Run: `grep -n "AuditLogIngestActor\|SiteAuditTelemetryActor\|AuditLogPurgeActor\|SiteAuditReconciliationActor" docs/requirements/Component-ClusterInfrastructure.md`
+Expected: all four singletons listed.
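
The alternation grep in the Task 4 verify step succeeds as soon as any one of the four names is present, so "all four singletons listed" still has to be confirmed by eye. A stricter check can be sketched as below. The helper name `require_all` is hypothetical and not part of the plan.

```shell
# require_all FILE NAME...
# Hypothetical helper: fails unless every NAME occurs in FILE as a fixed string,
# and reports each missing name on stderr.
require_all() {
  file=$1
  shift
  missing=0
  for name in "$@"; do
    if ! grep -qF -- "$name" "$file"; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Called as `require_all docs/requirements/Component-ClusterInfrastructure.md AuditLogIngestActor SiteAuditTelemetryActor SiteAuditReconciliationActor AuditLogPurgeActor`, it returns non-zero and names whichever singleton was left out.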
+
+**Step 6: Commit**
+
+```bash
+git add docs/requirements/Component-ClusterInfrastructure.md
+git commit -m "docs(audit): register AuditLog singletons in Cluster Infrastructure"
+```
+
+---
+
+## Task 5: Update `Component-SiteRuntime.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-SiteRuntime.md`
+
+**Step 1: Find script-trust-boundary section**
+
+Locate the section listing what scripts can/cannot do and how their boundary-crossing calls are mediated.
+
+**Step 2: Note audit hook**
+
+Add: "Every script-trust-boundary call (External System Gateway, Database layer, Notify) emits an `AuditEvent` to `IAuditWriter` (site-local SQLite append). Hot path; never fails the calling action; failures logged via the `SiteAuditWriteFailures` health metric (see Health Monitoring #11)."
+
+**Step 3: Note site SQLite footprint**
+
+Find the section discussing site storage (SQLite for deployed configs, S&F buffer, event log, operation tracking). Add the `AuditLog` SQLite database file as a peer with the 7-day-purge-respecting-ForwardState invariant; cross-reference to Component-AuditLog.md.
+
+**Step 4: Verify**
+
+Run: `grep -n "IAuditWriter\|AuditLog\|Audit Log" docs/requirements/Component-SiteRuntime.md`
+Expected: hook documented, SQLite file mentioned.
+
+**Step 5: Commit**
+
+```bash
+git add docs/requirements/Component-SiteRuntime.md
+git commit -m "docs(audit): note IAuditWriter hook and site SQLite in Site Runtime"
+```
+
+---
+
+## Task 6: Update `Component-ExternalSystemGateway.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-ExternalSystemGateway.md`
+
+**Step 1: Find Call/CachedCall sections**
+
+Locate the dual-call-modes documentation.
+
+**Step 2: Note audit emission on sync calls**
+
+Under `ExternalSystem.Call`, add: "Emits an `ApiOutbound.SyncCall` row to `IAuditWriter` at call completion (success or failure). Payload captured per the Audit Log policy (#23 §Payload Capture Policy). Audit-write failure never aborts the script."
+
+**Step 3: Note audit emission on cached calls**
+
+Under `ExternalSystem.CachedCall`, add: "Each lifecycle transition (`CachedEnqueued`, `CachedAttempt`, `CachedTerminal`) emits an audit row via the combined cached-operation telemetry packet — one packet carries both the audit row and the SiteCalls upsert (see Audit Log #23 §Cached Operations and Site Call Audit #22)."
+
+**Step 4: Note audit emission on DB writes**
+
+Under `Database.Connection()` (synchronous), add: "Script-initiated `Execute`/`ExecuteScalar` calls emit `DbOutbound.SyncWrite` rows; `ExecuteReader` emits `DbOutbound.SyncRead`. SQL parameter values are captured by default; per-connection redaction opt-in via the Audit Log configuration (#23 §Payload Capture Policy §8.2)."
+
+**Step 5: Note audit emission on cached DB writes**
+
+Under `Database.CachedWrite`, add: same combined-telemetry pattern as cached external calls.
+
+**Step 6: Verify**
+
+Run: `grep -n "AuditLog\|Audit Log\|ApiOutbound\|DbOutbound\|IAuditWriter" docs/requirements/Component-ExternalSystemGateway.md`
+Expected: hooks documented in all four call-mode subsections.
+
+**Step 7: Commit**
+
+```bash
+git add docs/requirements/Component-ExternalSystemGateway.md
+git commit -m "docs(audit): emit AuditLog rows from External System Gateway call paths"
+```
+
+---
+
+## Task 7: Update `Component-SiteCallAudit.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-SiteCallAudit.md`
+
+**Step 1: Find Ingest & Idempotency section**
+
+Locate the "Ingest & Idempotency" section (around line 69 in current file).
+
+**Step 2: Note combined telemetry**
+
+Add a new paragraph: "From v1.x onward, the cached-operation telemetry packet additively carries the `AuditEvent` content alongside the existing operational fields. Central's `AuditLogIngestActor` (Audit Log #23) performs both the immutable `AuditLog` insert and the `SiteCalls` upsert in a single transaction.
+Idempotency keys remain `EventId` (for AuditLog) and `TrackedOperationId` (for SiteCalls)."
+
+**Step 3: Cross-reference Audit Log**
+
+Find the Dependencies / Interactions sections (typically near the end). Add an edge to **Audit Log (#23)** noting the shared telemetry packet and dual-write ingest.
+
+**Step 4: Verify**
+
+Run: `grep -n "Audit Log\|AuditLog\|AuditEvent\|#23" docs/requirements/Component-SiteCallAudit.md`
+Expected: combined-telemetry paragraph + Dependencies edge present.
+
+**Step 5: Commit**
+
+```bash
+git add docs/requirements/Component-SiteCallAudit.md
+git commit -m "docs(audit): note shared cached-operation telemetry with Audit Log"
+```
+
+---
+
+## Task 8: Update `Component-NotificationOutbox.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-NotificationOutbox.md`
+
+**Step 1: Find dispatcher section**
+
+Locate the section describing the central dispatcher's delivery attempt loop.
+
+**Step 2: Note central direct-write of attempt/terminal**
+
+Add: "Each delivery attempt writes a `Notification.Attempt` row to the `AuditLog` via `ICentralAuditWriter`; transition to a terminal status (`Delivered` / `Parked` / `Discarded`) writes a `Notification.Terminal` row. Audit writes are direct (no telemetry — the dispatcher runs at central). The site-emitted `Notification.Enqueued` row arrives via the standard audit telemetry channel."
+
+**Step 3: Cross-reference Audit Log**
+
+Add to Dependencies / Interactions: edge to **Audit Log (#23)** noting central direct-write of dispatch lifecycle events.
+
+**Step 4: Note status independence**
+
+Add a clarifying sentence: "The operational `Notifications` table remains the source of truth for the dispatcher and for Retry/Discard actions; the `AuditLog` rows are immutable shadows."
+
+**Step 5: Verify**
+
+Run: `grep -n "Audit Log\|ICentralAuditWriter\|Notification.Attempt\|#23" docs/requirements/Component-NotificationOutbox.md`
+Expected: dispatcher hook + Dependencies edge present.
+
+**Step 6: Commit**
+
+```bash
+git add docs/requirements/Component-NotificationOutbox.md
+git commit -m "docs(audit): central direct-write of notification dispatch events to AuditLog"
+```
+
+---
+
+## Task 9: Update `Component-InboundAPI.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-InboundAPI.md`
+
+**Step 1: Find request-completion / logging section**
+
+Locate the section describing how requests are processed and what gets logged today (failures only, per the brainstorm exploration).
+
+**Step 2: Replace failures-only stance**
+
+Edit the "failures-only logging" claim so it now reads: "Every request (success or failure) emits one `ApiInbound.Completed` row to `ICentralAuditWriter` from request middleware before the HTTP response is flushed. The row captures the API key *name* (never the key material), remote IP, user-agent, response status, duration, and truncated request/response bodies per the Audit Log capture policy (#23 §Payload Capture Policy)."
+
+**Step 3: Cross-reference Audit Log**
+
+Add Dependencies edge to **Audit Log (#23)**.
+
+**Step 4: Note non-blocking semantics**
+
+Add: "Middleware audit-write failures are logged and metricked (see Health Monitoring #11) but never affect the HTTP response."
+
+**Step 5: Verify**
+
+Run: `grep -n "Audit Log\|ApiInbound\|ICentralAuditWriter\|#23" docs/requirements/Component-InboundAPI.md`
+Expected: middleware hook + Dependencies edge present.
+
+**Step 6: Commit**
+
+```bash
+git add docs/requirements/Component-InboundAPI.md
+git commit -m "docs(audit): emit ApiInbound.Completed audit row per request"
+```
+
+---
+
+## Task 10: Update `Component-CentralUI.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-CentralUI.md`
+
+**Step 1: Find navigation / page list**
+
+Locate the section enumerating top-level nav groups and pages.
+
+**Step 2: Add Audit nav group**
+
+Add a new top-level group **Audit** with one page in v1:
+- **Audit Log** — global query/filter/drilldown over the central `AuditLog` table.
+
+Document the filter bar and results grid columns from `alog.md` §10.1.
+
+**Step 3: Add drill-in links**
+
+In the existing Notifications, Site Calls, External Systems, Inbound API Keys, Sites, and Instances detail-page documentation, add a "View audit history" / "Recent activity" / "Audit feed" entry that opens the Audit Log page pre-filtered (per `alog.md` §10.2).
+
+**Step 4: Add Health dashboard tiles**
+
+In the Health dashboard documentation, add three tiles under a new "Audit" KPI group: Audit volume, Audit error rate, Audit backlog (per `alog.md` §10.3 / §14).
+
+**Step 5: Note UI rules already covered**
+
+No new framework choices — sticks to Blazor Server + Bootstrap + custom components per the existing project rules (per memory note `feedback_central_ui.md`).
+
+**Step 6: Verify**
+
+Run: `grep -n "Audit Log\|Audit nav\|Audit feed\|Audit volume\|#23" docs/requirements/Component-CentralUI.md`
+Expected: nav group, page, drill-ins, tiles all documented.
+
+**Step 7: Commit**
+
+```bash
+git add docs/requirements/Component-CentralUI.md
+git commit -m "docs(audit): add Audit nav group, Audit Log page, drill-ins, and KPI tiles to Central UI"
+```
+
+---
+
+## Task 11: Update `Component-HealthMonitoring.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-HealthMonitoring.md`
+
+**Step 1: Find metrics list**
+
+Locate where existing site + central metrics are enumerated.
+
+**Step 2: Add new site metrics**
+
+- `SiteAuditBacklog` — count of `Pending` rows in site-local `AuditLog` plus oldest-pending-age plus on-disk bytes. Threshold drives a Health dashboard warning on the affected site tile.
+- `SiteAuditWriteFailures` — count of failed hot-path appends since last report.
+- `SiteAuditTelemetryStalled` — boolean flag set when reconciliation reports a non-draining backlog over two cycles.
+
+**Step 3: Add new central metrics**
+
+- `CentralAuditWriteFailures` — central direct-write failures (Inbound API middleware, Notification Outbox dispatcher).
+- `AuditRedactionFailure` — payload redactor errors (over-redacted, safety-net hit).
+
+**Step 4: Add new tiles**
+
+Three new dashboard tiles under an "Audit" group: Audit volume, Audit error rate, Audit backlog.
+
+**Step 5: Cross-reference Audit Log**
+
+Dependencies edge to **Audit Log (#23)**.
+
+**Step 6: Verify**
+
+Run: `grep -n "SiteAuditBacklog\|SiteAuditWriteFailures\|SiteAuditTelemetryStalled\|CentralAuditWriteFailures\|AuditRedactionFailure\|Audit volume" docs/requirements/Component-HealthMonitoring.md`
+Expected: all five metrics + three tiles listed.
+
+**Step 7: Commit**
+
+```bash
+git add docs/requirements/Component-HealthMonitoring.md
+git commit -m "docs(audit): add Audit Log health metrics and dashboard tiles"
+```
+
+---
+
+## Task 12: Update `Component-CLI.md`
+
+**Files:**
+- Modify: `docs/requirements/Component-CLI.md`
+
+**Step 1: Find command-group list**
+
+Locate the section enumerating top-level CLI command groups.
+
+**Step 2: Add `scadalink audit` group**
+
+Three subcommands per `alog.md` §15.1:
+- `audit query --site --since --kind [...]` — UI-equivalent filter set.
+- `audit export --since --until --format csv|jsonl|parquet --output ` — server-side streaming export.
+- `audit verify-chain --month ` — hash-chain verification (no-op in v1; available once §11.4 ships).
+
+Note: requires `OperationalAudit` + `AuditExport` permissions (Security & Auth #10).
+
+**Step 3: Cross-reference Audit Log and Management Service**
+
+Dependencies edges to **Audit Log (#23)** and **Management Service (#18)** (the CLI hits central via the existing HTTP Management API).
+
+**Step 4: Verify**
+
+Run: `grep -n "scadalink audit\|audit query\|audit export\|audit verify-chain\|#23" docs/requirements/Component-CLI.md`
+Expected: command group documented with all three subcommands.
+
+**Step 5: Commit**
+
+```bash
+git add docs/requirements/Component-CLI.md
+git commit -m "docs(audit): add scadalink audit command group to CLI"
+```
+
+---
+
+## Task 13: Update `README.md`
+
+**Files:**
+- Modify: `README.md`
+
+**Step 1: Find component table**
+
+Locate the markdown table containing rows #1–#22 (currently around lines 36–58).
+
+**Step 2: Add row #23**
+
+Append a row after `Site Call Audit`:
+
+```
+| 23 | Audit Log | [docs/requirements/Component-AuditLog.md](docs/requirements/Component-AuditLog.md) | New central append-only AuditLog spanning every script-trust-boundary action (outbound API sync+cached, outbound DB sync+cached, notifications, inbound API). Site-local SQLite hot-path append + gRPC telemetry + central reconciliation; combined telemetry packet with Site Call Audit; central direct-write for Notification Outbox dispatch + Inbound API middleware; monthly partitioning, 365-day default retention. |
+```
+
+**Step 3: Update architecture diagram (logical)**
+
+In the architecture diagram, add an `AuditLog` box under the central cluster's "Audit Log" / observability cluster (parallel to Notification Outbox and Site Call Audit). Add a thin arrow from each affected component into it.
+
+**Step 4: Verify**
+
+Run: `grep -n "Audit Log\|Component-AuditLog.md\|| 23 |" README.md`
+Expected: new row + diagram entry present.
+
+**Step 5: Commit**
+
+```bash
+git add README.md
+git commit -m "docs(audit): register Audit Log (#23) in the README component table"
+```
+
+---
+
+## Task 14: Update `docs/requirements/HighLevelReqs.md`
+
+**Files:**
+- Modify: `docs/requirements/HighLevelReqs.md`
+
+**Step 1: Find functional-area sections**
+
+Locate the section that currently contains requirements for Notification Outbox and Site Call Audit (likely under "Observability" or "Audit & Reporting").
+
+**Step 2: Add Audit Log requirements section**
+
+Add a new subsection **"Centralized Audit Log"** with numbered requirements covering:
+- AL-1: Append-only central record of every script-trust-boundary action.
+- AL-2: One row per lifecycle event for cached calls and notifications.
+- AL-3: Site-local hot-path append; gRPC telemetry to central; idempotent on `EventId`.
+- AL-4: Reconciliation pull self-heals missed telemetry.
+- AL-5: Payload metadata + truncated bodies (8 KB default, 64 KB on errors).
+- AL-6: Headers redacted by default; SQL parameter values captured by default; per-target redaction opt-in.
+- AL-7: Audit-write failure never aborts the user-facing action.
+- AL-8: 365-day default central retention; monthly partition switch purge.
+- AL-9: Site SQLite purge requires `ForwardState ∈ {Forwarded, Reconciled}`; central outage cannot cause audit loss at sites.
+- AL-10: Central UI Audit Log page with cross-channel filter and drill-ins from existing operational pages.
+- AL-11: Append-only enforced via DB roles; tamper-evidence hash chain deferred to v1.x.
+- AL-12: CLI `scadalink audit` command group.
+
+**Step 3: Cross-reference Audit Log component**
+
+Add a "See Component-AuditLog.md (#23)" pointer at the top of the subsection.
+
+**Step 4: Verify**
+
+Run: `grep -n "AL-1\|AL-12\|Centralized Audit Log\|Component-AuditLog.md" docs/requirements/HighLevelReqs.md`
+Expected: section header and all twelve requirements present.
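
The Task 14 verify grep only looks for `AL-1` and `AL-12`, and the `AL-1` pattern also matches `AL-10` through `AL-12`, so it cannot prove that all twelve requirements are present. A per-ID check can be sketched as follows. The helper name `missing_requirements` is hypothetical.

```shell
# missing_requirements FILE FIRST LAST
# Hypothetical helper: prints each requirement ID AL-<n> (FIRST..LAST) that
# does not occur in FILE; prints nothing when all are present.
missing_requirements() {
  file=$1
  first=$2
  last=$3
  i=$first
  while [ "$i" -le "$last" ]; do
    # Require a non-digit (or end of line) after the number so that
    # AL-1 is not satisfied by AL-10, AL-11, or AL-12.
    grep -Eq "AL-$i([^0-9]|\$)" "$file" || echo "AL-$i"
    i=$((i + 1))
  done
}
```

Running `missing_requirements docs/requirements/HighLevelReqs.md 1 12` with no output would then correspond to "all twelve requirements present".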
+
+**Step 5: Commit**
+
+```bash
+git add docs/requirements/HighLevelReqs.md
+git commit -m "docs(audit): add Centralized Audit Log requirements (AL-1..AL-12) to HighLevelReqs"
+```
+
+---
+
+## Task 15: Update `CLAUDE.md`
+
+**Files:**
+- Modify: `CLAUDE.md`
+
+**Step 1: Update Current Component List**
+
+Change the heading from `## Current Component List (22 components)` to `## Current Component List (23 components)`. Append a new line at the end of the numbered list:
+
+```
+23. Audit Log — Central append-only AuditLog table spanning every script-trust-boundary action (outbound API sync+cached, outbound DB sync+cached, notifications, inbound API). Site SQLite hot-path + gRPC telemetry + reconciliation; combined telemetry with Site Call Audit; central direct-write for Notification Outbox dispatch + Inbound API; monthly partitioning, 365-day retention.
+```
+
+**Step 2: Add Key Design Decisions block**
+
+In the **Key Design Decisions** section, add a new subsection **`### Centralized Audit Log`** with bulleted decisions mirroring `alog.md` §1–§15 highlights:
+
+- Layered design — append-only AuditLog alongside operational Notifications (#21) and SiteCalls (#22), not replacing them.
+- Scope = script trust boundary; framework traffic explicitly excluded.
+- One row per lifecycle event; cached calls produce 4+ rows per operation.
+- Site SQLite hot-path first; gRPC telemetry to central; idempotent on `EventId`; reconciliation pull as fallback.
+- Cached operations: site emits, one telemetry packet carries audit + operational state; central writes both in one transaction.
+- Payload cap 8 KB default / 64 KB on errors; headers redacted by default; SQL parameter values captured by default; per-target redaction opt-in.
+- Audit-write failure never aborts the user-facing action.
+- 365-day central retention with monthly partition-switch purge; 7-day site SQLite with hard `ForwardState` invariant.
+- Append-only enforced via DB roles; hash-chain tamper evidence and Parquet archival deferred to v1.x. +- New top-level **Audit** nav group + Audit Log page + drill-ins from Notifications / Site Calls / External Systems / Inbound API Keys / Sites / Instances. + +**Step 3: Verify** + +Run: `grep -n "Centralized Audit Log\|Audit Log\|23 components\|23\\. Audit Log" CLAUDE.md` +Expected: count updated, list extended, Key Design Decisions block present. + +**Step 4: Commit** + +```bash +git add CLAUDE.md +git commit -m "docs(audit): register Audit Log (#23) in CLAUDE.md component list and key decisions" +``` + +--- + +## Task 16: Final cross-reference verification + +**Files:** +- None — verification only. + +**Step 1: Grep for stale references** + +Run: `grep -rn "22 components\|Currently 22\|22\\. Site Call Audit\\s*$" docs/ README.md CLAUDE.md` +Expected: no hits — all updated to 23. + +**Step 2: Grep for orphan references** + +Run: `grep -rn "Component-AuditLog.md" docs/ README.md CLAUDE.md` +Expected: hits in README, CLAUDE.md, and each affected component doc. Confirm the file exists at the referenced path. + +**Step 3: Verify all eleven affected component docs cross-reference Audit Log** + +Run: `for f in docs/requirements/Component-{ExternalSystemGateway,InboundAPI,NotificationOutbox,SiteCallAudit,SiteRuntime,Commons,CentralUI,ConfigurationDatabase,ClusterInfrastructure,HealthMonitoring,CLI}.md; do echo "--- $f"; grep -c "Audit Log\|AuditLog\|#23" "$f"; done` +Expected: each file shows count ≥ 1. + +**Step 4: Verify alog.md still matches the design canonically** + +Run: `git diff fec0bb1 -- alog.md` +Expected: no diff — alog.md is unchanged from the validated commit. + +**Step 5: Skim the new file once more end-to-end** + +Read: `docs/requirements/Component-AuditLog.md`. Verify section ordering, completeness, no contradictions with `alog.md`. 
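The section-ordering part of Step 5 could be checked mechanically as well as by reading. The sketch below assumes the expected headings (the Task 1 skeleton, without the `## ` prefix) have been saved one per line in a file; the helper names `list_h2` and `check_h2_order` and that file are hypothetical.

```bash
# Hypothetical helper: print the level-2 headings of a Markdown file, in order.
# Caveat: a "## " line inside a fenced code block is counted as a heading too.
list_h2() {
  grep '^## ' "$1" | sed 's/^## //'
}

# Hypothetical helper: compare the actual headings with an expected list.
# Prints a unified diff and returns non-zero when order or content differs.
check_h2_order() {
  local file="$1" expected="$2"
  list_h2 "$file" | diff -u "$expected" -
}
```

Usage, assuming a hypothetical `expected-headings.txt`: `check_h2_order docs/requirements/Component-AuditLog.md expected-headings.txt`. An empty diff would suggest the headings match the skeleton; completeness and consistency with `alog.md` still need a read-through.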
+ +**Step 6: Review the commit graph** + +Run: `git log --oneline feature/audit-log-docs ^main` +Expected: 15 commits, one for each of Tasks 1–15 (Task 0 is branch prep and creates none). + +**Step 7: Final commit (only if any fix-ups needed)** + +If grep finds any issue, fix it and commit with `docs(audit): cross-reference cleanup`. Otherwise no commit at this task. + +--- + +## Task 17: Merge to main (optional, on user request only) + +**Files:** +- None — git operation only. + +**Step 1: Confirm with user** + +Per CLAUDE.md and harness policy, do not push or merge to main without explicit user instruction. This task documents the option but does not execute automatically. + +**Step 2: If user requests merge** + +```bash +git switch main +git merge --no-ff feature/audit-log-docs -m "Merge feature/audit-log-docs: centralized audit log design" +``` + +**Step 3: If user requests push** + +```bash +git push origin main +``` + +(or push the feature branch instead — operator's call). + +--- + +## Execution Notes + +- **Tasks 2–14 are mostly independent of each other** once Task 1 is done. Suitable for parallel execution via the **subagent-driven-development** sub-skill — one fresh subagent per task, review between commits. +- **Tasks 15 and 16** must run last (Task 15 is the CLAUDE.md rollup; Task 16 is verification). +- **Task 0** must run first (branch prep). +- Total: 18 tasks (Task 0–17), ~15 commits, ~250–400 lines of new prose in `Component-AuditLog.md` plus smaller per-component additions. +- Spec is `alog.md` (commit `fec0bb1`); every task cites the relevant section. 
diff --git a/docs/plans/2026-05-20-centralized-audit-log.md.tasks.json b/docs/plans/2026-05-20-centralized-audit-log.md.tasks.json new file mode 100644 index 0000000..9b5ecac --- /dev/null +++ b/docs/plans/2026-05-20-centralized-audit-log.md.tasks.json @@ -0,0 +1,26 @@ +{ + "planPath": "docs/plans/2026-05-20-centralized-audit-log.md", + "spec": "alog.md (commit fec0bb1)", + "repoNature": "design-documentation-only", + "tasks": [ + {"id": 0, "subject": "Task 0: Prepare branch", "status": "pending", "blockedBy": []}, + {"id": 1, "subject": "Task 1: Author Component-AuditLog.md", "status": "pending", "blockedBy": [0]}, + {"id": 2, "subject": "Task 2: Update Component-Commons.md", "status": "pending", "blockedBy": [0]}, + {"id": 3, "subject": "Task 3: Update Component-ConfigurationDatabase.md", "status": "pending", "blockedBy": [1]}, + {"id": 4, "subject": "Task 4: Update Component-ClusterInfrastructure.md", "status": "pending", "blockedBy": [1]}, + {"id": 5, "subject": "Task 5: Update Component-SiteRuntime.md", "status": "pending", "blockedBy": [1]}, + {"id": 6, "subject": "Task 6: Update Component-ExternalSystemGateway.md", "status": "pending", "blockedBy": [1]}, + {"id": 7, "subject": "Task 7: Update Component-SiteCallAudit.md", "status": "pending", "blockedBy": [1]}, + {"id": 8, "subject": "Task 8: Update Component-NotificationOutbox.md", "status": "pending", "blockedBy": [1]}, + {"id": 9, "subject": "Task 9: Update Component-InboundAPI.md", "status": "pending", "blockedBy": [1]}, + {"id": 10, "subject": "Task 10: Update Component-CentralUI.md", "status": "pending", "blockedBy": [1]}, + {"id": 11, "subject": "Task 11: Update Component-HealthMonitoring.md", "status": "pending", "blockedBy": [1]}, + {"id": 12, "subject": "Task 12: Update Component-CLI.md", "status": "pending", "blockedBy": [1]}, + {"id": 13, "subject": "Task 13: Update README.md", "status": "pending", "blockedBy": [1]}, + {"id": 14, "subject": "Task 14: Update HighLevelReqs.md", "status": "pending", 
"blockedBy": [1]}, + {"id": 15, "subject": "Task 15: Update CLAUDE.md", "status": "pending", "blockedBy": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]}, + {"id": 16, "subject": "Task 16: Final cross-reference verification", "status": "pending", "blockedBy": [15]}, + {"id": 17, "subject": "Task 17: Merge to main (user-gated)", "status": "pending", "blockedBy": [16]} + ], + "lastUpdated": "2026-05-20T00:00:00Z" +}