scadalink-design/docs/plans/2026-05-20-centralized-audit-log.md
Joseph Doherty d93ca4c56e docs(audit): add implementation plan for centralized audit log
See docs/plans/2026-05-20-centralized-audit-log.md and peer .tasks.json.
17 tasks covering Component-AuditLog.md plus cross-references across
11 affected component docs, README, HighLevelReqs, and CLAUDE.md.
Spec is alog.md at commit fec0bb1.
2026-05-20 07:32:47 -04:00

# Centralized Audit Log Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
>
> **Repo nature:** Design-documentation only. No code, no tests. Each task is a documentation change. "Verify" = re-read the diff + grep for stale cross-references. Commit after each task.
**Goal:** Document the new **#23 Audit Log** component and propagate its cross-references across every affected component design, the README, HighLevelReqs, and CLAUDE.md — exactly as specified in `alog.md` (committed `fec0bb1`).
**Architecture:** Layered, append-only `AuditLog` table at central, alongside existing `Notifications` (#21) and `SiteCalls` (#22) operational stores. Site SQLite writes on the hot path; gRPC telemetry forwards to central; site purge requires `ForwardState ∈ {Forwarded, Reconciled}`. Cached calls send a single telemetry packet that drives both the immutable `AuditLog` insert and the operational `SiteCalls` upsert. Central-originated events (Inbound API, Notification dispatch attempts) write directly. Monthly partitioning at central, 365-day default retention.
**Tech Stack:** Markdown only. No code in v1 of this plan.
**Spec:** `/Users/dohertj2/Desktop/scadalink-design/alog.md` (see commit `fec0bb1`). All task content below cites sections of that file.
---
## Task 0: Prepare branch
**Files:**
- None — git operation only.
**Step 1: Confirm working tree state**
Run: `git status --short`
Expected: three unstaged `infra/` modifications (unrelated; leave them alone), nothing else.
**Step 2: Create feature branch off `main`**
Run: `git switch -c feature/audit-log-docs`
Expected: switched to a new branch.
**Step 3: Verify branch**
Run: `git rev-parse --abbrev-ref HEAD`
Expected: `feature/audit-log-docs`.
**No commit at this task — just branch prep.**
---
## Task 1: Author `Component-AuditLog.md`
**Files:**
- Create: `docs/requirements/Component-AuditLog.md`
**Step 1: Read context**
Read `alog.md` §1–§16. Read the structural style of `docs/requirements/Component-SiteCallAudit.md` and `docs/requirements/Component-NotificationOutbox.md`, and mirror their section ordering (Purpose / Location / Responsibilities / Tables / Lifecycle / Ingest & Idempotency / Reconciliation / Retention & Purge / KPIs / Configuration / Dependencies / Interactions).
**Step 2: Write the skeleton**
Create the file with these top-level headings (verbatim, in order):
```
# Component: Audit Log
## Purpose
## Location
## Responsibilities
## Scope — the script trust boundary
## The `AuditLog` Table (central)
## The Site-Local `AuditLog` (SQLite)
## Ingestion Paths
## Cached Operations — Combined Telemetry
## Payload Capture Policy
## Failure Handling & Idempotency
## Retention & Purge
## Security & Tamper-Evidence
## KPIs
## Configuration
## Dependencies
## Interactions
```
**Step 3: Fill `Purpose`**
Two-paragraph version of `alog.md` §1. Lead sentence: "Provides a single, append-only, forensic + operational record of every integration action initiated by, or terminating in, a script — across outbound API, outbound DB, notifications, and inbound API." Second paragraph: not a dispatcher, observes Notification Outbox (#21) and Site Call Audit (#22), adds coverage where they are silent.
**Step 4: Fill `Location`**
Central cluster + site cluster. Central: `AuditLog` table in MS SQL plus three singleton actors on the active central node — `AuditLogIngestActor` (telemetry receiver), `SiteAuditReconciliationActor`, `AuditLogPurgeActor`. Sites: `AuditLog` SQLite database file alongside the S&F buffer plus `SiteAuditTelemetryActor` singleton on the active site node. Registered as component #23 in the Host role configuration.
**Step 5: Fill `Responsibilities`**
Bullet list mirroring `alog.md` §1–§3 commitments. Six bullets:
- Accept site-local hot-path audit writes from script-trust-boundary call paths.
- Forward site audit rows to central via gRPC telemetry with at-least-once + idempotency on `EventId`.
- Run periodic reconciliation pulls per site to self-heal missed telemetry.
- Accept central-originated audit writes (Inbound API, Notification dispatch attempts).
- Compute point-in-time KPIs (global + per-site) from the central `AuditLog` table.
- Purge expired rows by monthly partition switch.
**Step 6: Fill `Scope — the script trust boundary`**
Reproduce the table from `alog.md` §2 verbatim (the six rows). Add the "Out of scope" bullet list. Add the DB-reads note.
**Step 7: Fill `The AuditLog Table (central)`**
Reproduce the column table from `alog.md` §4. Then the index list. Then the `Kind`-per-channel table (with the inbound API simplification — only `Completed`).
**Step 8: Fill `The Site-Local AuditLog (SQLite)`**
State same schema as central minus `IngestedAtUtc`, plus `ForwardState` (`Pending | Forwarded | Reconciled`). Reproduce the **hard purge invariant** from `alog.md` §4 verbatim:
> A row is eligible for purge only when both `OccurredAtUtc < retention threshold` AND `ForwardState IN ('Forwarded', 'Reconciled')`. Pending rows are never purged.
Mention the `SiteAuditBacklog` health metric.
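As a reference while writing this section, the purge invariant can be sketched against an in-memory SQLite database. This is an illustration in Python under stated assumptions, not the component's implementation: the three-column table is a trimmed stand-in for the schema in `alog.md` §4, and `iso`, `make_site_db`, `purge_site_audit`, and `site_audit_backlog` are hypothetical names.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def iso(dt: datetime) -> str:
    """Fixed-width UTC timestamp so string comparison matches time order."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def make_site_db() -> sqlite3.Connection:
    # Assumption: only the columns the invariant needs; the real schema is wider.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE AuditLog (
            EventId       TEXT PRIMARY KEY,
            OccurredAtUtc TEXT NOT NULL,
            ForwardState  TEXT NOT NULL
                CHECK (ForwardState IN ('Pending', 'Forwarded', 'Reconciled'))
        )
        """
    )
    return conn


def purge_site_audit(conn: sqlite3.Connection, now: datetime,
                     retention_days: int = 7) -> int:
    """Delete rows that are past retention AND already forwarded or reconciled."""
    threshold = iso(now - timedelta(days=retention_days))
    with conn:
        cur = conn.execute(
            "DELETE FROM AuditLog "
            "WHERE OccurredAtUtc < ? "
            "AND ForwardState IN ('Forwarded', 'Reconciled')",
            (threshold,),
        )
    return cur.rowcount


def site_audit_backlog(conn: sqlite3.Connection) -> int:
    """Count of Pending rows, the core number behind SiteAuditBacklog."""
    return conn.execute(
        "SELECT COUNT(*) FROM AuditLog WHERE ForwardState = 'Pending'"
    ).fetchone()[0]
```

The point of the sketch is that an old `Pending` row survives every purge, so a long central outage grows the backlog rather than losing audit data.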
**Step 9: Fill `Ingestion Paths`**
Subsections mirroring `alog.md` §6.1, §6.2, §6.3, and §6.4. Keep concise: full pseudo-code lives in `alog.md`; the component doc captures the contract.
**Step 10: Fill `Cached Operations — Combined Telemetry`**
Capture `alog.md` §6.5 — site is source of truth, one telemetry packet carries both the audit row and the SiteCalls operational update; central ingest performs both writes in a single transaction.
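The dual write described above can be sketched as follows. This is a minimal Python illustration under assumptions: SQLite stands in for the central MS SQL database, both tables are trimmed to two columns, and `make_central_db`, `ingest_cached_packet`, and the packet field layout are hypothetical.

```python
import sqlite3


def make_central_db() -> sqlite3.Connection:
    # Assumption: SQLite stands in for central MS SQL; tables are trimmed.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AuditLog (EventId TEXT PRIMARY KEY, Kind TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE SiteCalls ("
        "TrackedOperationId TEXT PRIMARY KEY, Status TEXT NOT NULL)"
    )
    return conn


def ingest_cached_packet(conn: sqlite3.Connection, packet: dict) -> None:
    """Apply one combined packet: audit insert plus SiteCalls upsert, atomically."""
    with conn:  # commits on success, rolls back both writes on any exception
        # Immutable audit row: a redelivered EventId is silently ignored.
        conn.execute(
            "INSERT OR IGNORE INTO AuditLog (EventId, Kind) VALUES (?, ?)",
            (packet["EventId"], packet["Kind"]),
        )
        # Operational row: latest state wins, keyed by TrackedOperationId.
        conn.execute(
            "INSERT INTO SiteCalls (TrackedOperationId, Status) VALUES (?, ?) "
            "ON CONFLICT(TrackedOperationId) DO UPDATE SET Status = excluded.Status",
            (packet["TrackedOperationId"], packet["Status"]),
        )
```

Because both statements share one transaction, a failure in the `SiteCalls` upsert also discards the audit insert, and the packet can simply be redelivered.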
**Step 11: Fill `Payload Capture Policy`**
Compress `alog.md` §8 into 8–12 lines: defaults (8 KB / 64 KB on error), header redaction, body-redactor regex hook, SQL captures values by default with per-connection opt-out, never-captured list (API keys, LDAP creds, secrets), safety-net over-redacts on misconfiguration.
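A rough Python sketch of the capture rules listed above, for orientation only. The cap sizes come from this plan; everything else is an assumption: the `[REDACTED]` placeholder, the function names, and the reading of "safety-net" as "redact the whole body when a redactor pattern is invalid".

```python
import re

DEFAULT_CAP_BYTES = 8 * 1024    # default cap from the plan
ERROR_CAP_BYTES = 64 * 1024     # larger cap applied to failed calls
REDACTED = "[REDACTED]"         # hypothetical placeholder value


def capture_body(body: bytes, is_error: bool) -> tuple[bytes, bool]:
    """Return the stored body and whether it was truncated."""
    cap = ERROR_CAP_BYTES if is_error else DEFAULT_CAP_BYTES
    return body[:cap], len(body) > cap


def redact_headers(headers: dict[str, str], redact_list: set[str]) -> dict[str, str]:
    """Replace values of listed headers; names compare case-insensitively."""
    lowered = {name.lower() for name in redact_list}
    return {
        name: (REDACTED if name.lower() in lowered else value)
        for name, value in headers.items()
    }


def redact_body(text: str, patterns: list[str]) -> str:
    """Apply regex redactors; on a bad pattern, over-redact instead of leaking."""
    try:
        for pattern in patterns:
            text = re.sub(pattern, REDACTED, text)
    except re.error:
        return REDACTED  # assumed safety net: misconfiguration hides everything
    return text
```

For example, a 10 KB successful response is stored as its first 8 KB with the truncation flag set, while the same body on a failed call is stored whole.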
**Step 12: Fill `Failure Handling & Idempotency`**
Compress `alog.md` §9: EventId is the PK and dedup key; never-fail-the-action principle; ring buffer for transient SQLite write failures; reconciliation as fallback when telemetry actor wedges; central-direct-write failure handling.
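The never-fail-the-action principle and the ring buffer can be pictured with a small Python sketch. It is an assumption-heavy illustration: `BestEffortAuditWriter` is a hypothetical name, and dropping the oldest buffered event when the buffer is full is one possible policy, not necessarily the one in `alog.md` §9.

```python
from collections import deque
from typing import Callable


class BestEffortAuditWriter:
    """Wraps a sink so audit failures never propagate to the calling action."""

    def __init__(self, sink: Callable[[dict], None], buffer_size: int = 1024):
        self._sink = sink
        # Bounded ring buffer: when full, the oldest buffered event is dropped.
        self._buffer: deque = deque(maxlen=buffer_size)
        self.write_failures = 0  # would feed a metric such as SiteAuditWriteFailures

    def write(self, event: dict) -> None:
        """Queue the event and try to flush; never raises."""
        self._buffer.append(event)
        self.flush()

    def flush(self) -> None:
        """Write buffered events in order, stopping at the first failure."""
        while self._buffer:
            event = self._buffer[0]
            try:
                self._sink(event)
            except Exception:
                self.write_failures += 1
                return  # keep the event buffered; retry on the next write/flush
            self._buffer.popleft()

    @property
    def buffered(self) -> int:
        return len(self._buffer)
```

The caller only ever sees `write` return normally; a transient SQLite failure shows up as a rising failure count and a non-empty buffer instead of an exception in the script.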
**Step 13: Fill `Retention & Purge`**
Compress `alog.md` §12: 365-day default central retention; monthly partition switch; no row-level deletes at central; site 7-day default; site purge respects `ForwardState`.
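One consequence of purging by whole monthly partition is worth keeping in mind when writing this section, and a short Python sketch makes it concrete. It assumes that a partition may be switched out only once its entire month is older than the retention threshold; `next_month` and `purgeable_partitions` are hypothetical helper names.

```python
from datetime import date, timedelta


def next_month(first_of_month: date) -> date:
    """First day of the month after the given first-of-month date."""
    if first_of_month.month == 12:
        return date(first_of_month.year + 1, 1, 1)
    return date(first_of_month.year, first_of_month.month + 1, 1)


def purgeable_partitions(partitions: list[date], today: date,
                         retention_days: int = 365) -> list[date]:
    """Monthly partitions (keyed by first day) whose every row is past retention."""
    threshold = today - timedelta(days=retention_days)
    # A partition switch removes the whole month, so the month's exclusive
    # upper boundary must already be at or before the threshold.
    return [p for p in partitions if next_month(p) <= threshold]
```

Under this reading, a row can outlive the 365-day default by up to about a month, because its partition is not eligible until the newest row in it has expired.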
**Step 14: Fill `Security & Tamper-Evidence`**
Compress `alog.md` §11: dedicated `scadalink_audit_writer` (INSERT+SELECT) and `scadalink_audit_purger` (partition-switch only) DB roles; CI grep guard against `UPDATE`/`DELETE` of `AuditLog`; Audit + OperationalAudit + AuditExport permissions; hash-chain tamper evidence deferred to v1.x.
**Step 15: Fill `KPIs`**
List the five KPIs from `alog.md` §14: Volume, Error rate, Backlog, Top inbound callers, Top outbound 5xx. Note that Notification Outbox and Site Call Audit KPIs are unaffected.
**Step 16: Fill `Configuration`**
Show the `AuditLog` `appsettings.json` shape from `alog.md` §8.4. Include `DefaultCapBytes`, `ErrorCapBytes`, `HeaderRedactList`, `GlobalBodyRedactors`, `PerTargetOverrides`, and `RetentionDays` (global only in v1).
**Step 17: Fill `Dependencies`**
Cross-references to:
- **Commons (#16)** — `AuditEvent`, `IAuditWriter`, `ICentralAuditWriter`, `AuditChannel`, `AuditKind`, `AuditStatus` types and interfaces.
- **Configuration Database (#17)** — `AuditLog` table schema, partition function/scheme, DB roles, retention options.
- **Cluster Infrastructure (#13)** — singleton placement and supervision (`AuditLogIngestActor`, `SiteAuditTelemetryActor`, `SiteAuditReconciliationActor`, `AuditLogPurgeActor`).
- **Communication (#5)** — gRPC telemetry message types added to the existing site-stream proto additively.
- **Site Runtime (#3)** — script trust boundary touchpoints invoke `IAuditWriter`.
- **Host (#15)** — registers the new component under the central + site roles.
**Step 18: Fill `Interactions`**
Edges to:
- **External System Gateway (#7)** — emits `ApiOutbound.SyncCall` rows; for `CachedCall` emits combined telemetry (audit + operational).
- **Site Runtime (#3) / Database layer** — emits `DbOutbound.SyncWrite`, `DbOutbound.SyncRead`, and cached variants similarly.
- **Inbound API (#14)** — emits `ApiInbound.Completed` rows from request middleware.
- **Notification Outbox (#21)** — site-emitted `Notification.Enqueued` flows via audit telemetry; central dispatcher writes `Notification.Attempt` and `Notification.Terminal` rows directly via `ICentralAuditWriter`.
- **Site Call Audit (#22)** — shares the cached-call telemetry packet; central ingest of that packet performs both `AuditLog` insert and `SiteCalls` upsert in one transaction.
- **Central UI (#9)** — new Audit nav group + Audit Log page; drill-in links from Notifications, Site Calls, External Systems, Inbound API key, Sites, Instances detail pages.
- **Health Monitoring (#11)** — three new tiles (Volume, Error rate, Backlog) plus new metrics (`SiteAuditBacklog`, `SiteAuditWriteFailures`, `SiteAuditTelemetryStalled`, `CentralAuditWriteFailures`, `AuditRedactionFailure`).
- **CLI (#19)** — `scadalink audit query|export|verify-chain` commands.
**Step 19: Verify**
Run: `grep -n "Component-AuditLog.md\|#23" docs/requirements/Component-AuditLog.md`
Expected: file references itself sensibly.
Run: `wc -l docs/requirements/Component-AuditLog.md`
Expected: ~250–400 lines (sanity check; not exact).
**Step 20: Commit**
```bash
git add docs/requirements/Component-AuditLog.md
git commit -m "docs(audit): add Component-AuditLog (#23) design document"
```
---
## Task 2: Update `Component-Commons.md`
**Files:**
- Modify: `docs/requirements/Component-Commons.md`
**Step 1: Read existing structure**
Read the file to find the right sections — likely "Types", "Interfaces", "Messages", "Entities". Note which subsections audit-related additions belong in.
**Step 2: Add to `Types/`**
Under the Types section, add:
- `AuditChannel` enum: `ApiOutbound | DbOutbound | Notification | ApiInbound`.
- `AuditKind` enum: union of channel-specific values from `alog.md` §4 table.
- `AuditStatus` enum: `Success | TransientFailure | PermanentFailure | Enqueued | Retrying | Delivered | Parked | Discarded`.
- `AuditEvent` POCO record carrying every column from `alog.md` §4 (central schema), plus a `ForwardState` for site SQLite.
**Step 3: Add to `Interfaces/`**
- `IAuditWriter` — site-local hot-path interface: `Task WriteAsync(AuditEvent evt, CancellationToken ct)`. Implementation lives in Audit Log (#23) component.
- `ICentralAuditWriter` — central direct-write interface: `Task WriteAsync(AuditEvent evt, CancellationToken ct)` with insert-if-not-exists semantics on `EventId`.
**Step 4: Add to `Messages/`**
- `AuditTelemetryEnvelope` — gRPC message wrapping a batch of `AuditEvent` rows for telemetry forwarding.
- `CachedOperationTelemetry` — additively-evolved version of the existing SiteCalls telemetry message: now also carries `AuditEvent` content alongside the operational `SiteCalls` upsert fields. Note additive-only evolution rule (per existing project convention).
**Step 5: Verify**
Run: `grep -n "AuditEvent\|IAuditWriter\|AuditChannel" docs/requirements/Component-Commons.md`
Expected: all three grepped identifiers appear in the right sections.
**Step 6: Commit**
```bash
git add docs/requirements/Component-Commons.md
git commit -m "docs(audit): register AuditEvent, IAuditWriter, AuditTelemetry types in Commons"
```
---
## Task 3: Update `Component-ConfigurationDatabase.md`
**Files:**
- Modify: `docs/requirements/Component-ConfigurationDatabase.md`
**Step 1: Read existing structure**
Find the "Tables" and "Roles" / "Permissions" / "Migrations" sections.
**Step 2: Add `AuditLog` table description**
Under Tables, add a new subsection mirroring how `Notifications` and `SiteCalls` are documented. Include:
- Full column list from `alog.md` §4 (central table).
- Index list from `alog.md` §4.
- Monthly partitioning: partition function `pf_AuditLog_Month`, scheme `ps_AuditLog_Month`, filegroup-per-month rollover.
- PK on `EventId` for idempotency.
**Step 3: Add `AuditLog` DB roles**
Under Roles/Permissions, add `scadalink_audit_writer` (INSERT+SELECT only) and `scadalink_audit_purger` (partition-switch only). Note the CI grep guard against `UPDATE … AuditLog` / `DELETE … AuditLog`.
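For readers unfamiliar with the idea, a grep-style guard of this kind could look roughly like the Python sketch below. The pattern is an assumption (a conservative single-line match), and `FORBIDDEN` and `find_violations` are hypothetical names; the project's real CI guard may be written differently.

```python
import re

# Assumption: a conservative single-line pattern; the real guard may differ.
# Matches UPDATE/DELETE statements whose target table is AuditLog, with an
# optional schema prefix and optional square brackets.
FORBIDDEN = re.compile(
    r"\b(?:UPDATE|DELETE(?:\s+FROM)?)\s+(?:\[?\w+\]?\.)?\[?AuditLog\b",
    re.IGNORECASE,
)


def find_violations(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that mutate the AuditLog table."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if FORBIDDEN.search(line)
    ]
```

A CI job would run this over the relevant source files and fail the build when the returned list is non-empty; inserts and selects pass untouched.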
**Step 4: Add `AuditLog` migration note**
Under Migrations, note that the initial migration creates the partition function/scheme and the table aligned to the scheme; partition-maintenance job is owned by the Audit Log component, not the Configuration DB.
**Step 5: Add retention config note**
Mention `AuditLog:RetentionDays` (global only in v1) as an Audit Log options key consumed by the purge actor.
**Step 6: Verify cross-reference**
Run: `grep -n "AuditLog\|Audit Log" docs/requirements/Component-ConfigurationDatabase.md`
Expected: new table appears in the Tables section, roles in Roles section.
**Step 7: Commit**
```bash
git add docs/requirements/Component-ConfigurationDatabase.md
git commit -m "docs(audit): add AuditLog table, partitioning, and DB roles to Config DB"
```
---
## Task 4: Update `Component-ClusterInfrastructure.md`
**Files:**
- Modify: `docs/requirements/Component-ClusterInfrastructure.md`
**Step 1: Read singleton-placement section**
Find where Notification Outbox / Site Call Audit singletons are documented (active-central placement model).
**Step 2: Register central singletons**
Add to the central-singleton list:
- `AuditLogIngestActor` — receives gRPC telemetry batches, performs insert-if-not-exists on `EventId`; for cached telemetry, performs both `AuditLog` insert and `SiteCalls` upsert in one transaction.
- `SiteAuditReconciliationActor` — periodic per-site pull, default every 5 minutes.
- `AuditLogPurgeActor` — daily partition-switch purge.
**Step 3: Register site singletons**
Add to the site-singleton list:
- `SiteAuditTelemetryActor` — drains the local `AuditLog` SQLite's `Pending` rows to central in batches; short interval (5s) when busy, longer (30s) when idle.
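The drain behaviour can be sketched in Python as a single function that forwards one batch and says how long to wait before the next one. This is illustrative only: the 5 s and 30 s intervals come from this plan, while `drain_once`, the `send` callback contract, and the batch size of 100 are assumptions. The sketch updates `ForwardState` on the site-local table, which it assumes is permitted there because the append-only database roles are defined for the central table.

```python
import sqlite3
from typing import Callable

BUSY_INTERVAL_S = 5    # short interval while rows are waiting
IDLE_INTERVAL_S = 30   # longer interval once the backlog is empty


def drain_once(conn: sqlite3.Connection, send: Callable[[list], bool],
               batch_size: int = 100) -> int:
    """Forward one batch of Pending rows; return seconds until the next drain."""
    rows = conn.execute(
        "SELECT EventId FROM AuditLog WHERE ForwardState = 'Pending' "
        "ORDER BY OccurredAtUtc LIMIT ?",
        (batch_size,),
    ).fetchall()
    if not rows:
        return IDLE_INTERVAL_S
    ids = [row[0] for row in rows]
    if send(ids):  # True means central acknowledged the batch
        with conn:
            conn.executemany(
                "UPDATE AuditLog SET ForwardState = 'Forwarded' WHERE EventId = ?",
                [(event_id,) for event_id in ids],
            )
    # On a failed send the rows stay Pending and are retried (at-least-once).
    return BUSY_INTERVAL_S
```

Rows are marked `Forwarded` only after the acknowledgement, so a crash between send and update results in a duplicate delivery that central deduplicates on `EventId`.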
**Step 4: Note dedicated dispatcher**
Add a one-liner: `SiteAuditTelemetryActor` runs on a dedicated dispatcher so it doesn't compete with the script blocking-I/O dispatcher (per `alog.md` §6.2).
**Step 5: Verify**
Run: `grep -n "AuditLogIngestActor\|SiteAuditTelemetryActor\|AuditLogPurgeActor\|SiteAuditReconciliationActor" docs/requirements/Component-ClusterInfrastructure.md`
Expected: all four singletons listed.
**Step 6: Commit**
```bash
git add docs/requirements/Component-ClusterInfrastructure.md
git commit -m "docs(audit): register AuditLog singletons in Cluster Infrastructure"
```
---
## Task 5: Update `Component-SiteRuntime.md`
**Files:**
- Modify: `docs/requirements/Component-SiteRuntime.md`
**Step 1: Find script-trust-boundary section**
Locate the section listing what scripts can/cannot do and how their boundary-crossing calls are mediated.
**Step 2: Note audit hook**
Add: "Every script-trust-boundary call (External System Gateway, Database layer, Notify) emits an `AuditEvent` to `IAuditWriter` (site-local SQLite append). Hot path; never fails the calling action; failures logged via the `SiteAuditWriteFailures` health metric (see Health Monitoring #11)."
**Step 3: Note site SQLite footprint**
Find the section discussing site storage (SQLite for deployed configs, S&F buffer, event log, operation tracking). Add the `AuditLog` SQLite database file as a peer with the 7-day-purge-respecting-ForwardState invariant; cross-reference to Component-AuditLog.md.
**Step 4: Verify**
Run: `grep -n "IAuditWriter\|AuditLog\|Audit Log" docs/requirements/Component-SiteRuntime.md`
Expected: hook documented, SQLite file mentioned.
**Step 5: Commit**
```bash
git add docs/requirements/Component-SiteRuntime.md
git commit -m "docs(audit): note IAuditWriter hook and site SQLite in Site Runtime"
```
---
## Task 6: Update `Component-ExternalSystemGateway.md`
**Files:**
- Modify: `docs/requirements/Component-ExternalSystemGateway.md`
**Step 1: Find Call/CachedCall sections**
Locate the dual-call-modes documentation.
**Step 2: Note audit emission on sync calls**
Under `ExternalSystem.Call`, add: "Emits an `ApiOutbound.SyncCall` row to `IAuditWriter` at call completion (success or failure). Payload captured per the Audit Log policy (#23 §Payload Capture Policy). Audit-write failure never aborts the script."
**Step 3: Note audit emission on cached calls**
Under `ExternalSystem.CachedCall`, add: "Each lifecycle transition (`CachedEnqueued`, `CachedAttempt`, `CachedTerminal`) emits an audit row via the combined cached-operation telemetry packet — one packet carries both the audit row and the SiteCalls upsert (see Audit Log #23 §Cached Operations and Site Call Audit #22)."
**Step 4: Note audit emission on DB writes**
Under `Database.Connection()` (synchronous), add: "Script-initiated `Execute`/`ExecuteScalar` calls emit `DbOutbound.SyncWrite` rows; `ExecuteReader` emits `DbOutbound.SyncRead`. SQL parameter values are captured by default; per-connection redaction opt-in via the Audit Log configuration (#23 §Payload Capture Policy §8.2)."
**Step 5: Note audit emission on cached DB writes**
Under `Database.CachedWrite`, add: same combined-telemetry pattern as cached external calls.
**Step 6: Verify**
Run: `grep -n "AuditLog\|Audit Log\|ApiOutbound\|DbOutbound\|IAuditWriter" docs/requirements/Component-ExternalSystemGateway.md`
Expected: hooks documented in all four call-mode subsections.
**Step 7: Commit**
```bash
git add docs/requirements/Component-ExternalSystemGateway.md
git commit -m "docs(audit): emit AuditLog rows from External System Gateway call paths"
```
---
## Task 7: Update `Component-SiteCallAudit.md`
**Files:**
- Modify: `docs/requirements/Component-SiteCallAudit.md`
**Step 1: Find Ingest & Idempotency section**
Locate the "Ingest & Idempotency" section (around line 69 in current file).
**Step 2: Note combined telemetry**
Add a new paragraph: "From v1.x onward, the cached-operation telemetry packet additively carries the `AuditEvent` content alongside the existing operational fields. Central's `AuditLogIngestActor` (Audit Log #23) performs both the immutable `AuditLog` insert and the `SiteCalls` upsert in a single transaction. Idempotency keys remain `EventId` (for AuditLog) and `TrackedOperationId` (for SiteCalls)."
**Step 3: Cross-reference Audit Log**
Find the Dependencies / Interactions sections (typically near the end). Add an edge to **Audit Log (#23)** noting the shared telemetry packet and dual-write ingest.
**Step 4: Verify**
Run: `grep -n "Audit Log\|AuditLog\|AuditEvent\|#23" docs/requirements/Component-SiteCallAudit.md`
Expected: combined-telemetry paragraph + Dependencies edge present.
**Step 5: Commit**
```bash
git add docs/requirements/Component-SiteCallAudit.md
git commit -m "docs(audit): note shared cached-operation telemetry with Audit Log"
```
---
## Task 8: Update `Component-NotificationOutbox.md`
**Files:**
- Modify: `docs/requirements/Component-NotificationOutbox.md`
**Step 1: Find dispatcher section**
Locate the section describing the central dispatcher's delivery attempt loop.
**Step 2: Note central direct-write of attempt/terminal**
Add: "Each delivery attempt writes a `Notification.Attempt` row to the `AuditLog` via `ICentralAuditWriter`; transition to a terminal status (`Delivered` / `Parked` / `Discarded`) writes a `Notification.Terminal` row. Audit writes are direct (no telemetry — the dispatcher runs at central). The site-emitted `Notification.Enqueued` row arrives via the standard audit telemetry channel."
**Step 3: Cross-reference Audit Log**
Add to Dependencies / Interactions: edge to **Audit Log (#23)** noting central direct-write of dispatch lifecycle events.
**Step 4: Note status independence**
Add a clarifying sentence: "The operational `Notifications` table remains the source of truth for the dispatcher and for Retry/Discard actions; the `AuditLog` rows are immutable shadows."
**Step 5: Verify**
Run: `grep -n "Audit Log\|ICentralAuditWriter\|Notification.Attempt\|#23" docs/requirements/Component-NotificationOutbox.md`
Expected: dispatcher hook + Dependencies edge present.
**Step 6: Commit**
```bash
git add docs/requirements/Component-NotificationOutbox.md
git commit -m "docs(audit): central direct-write of notification dispatch events to AuditLog"
```
---
## Task 9: Update `Component-InboundAPI.md`
**Files:**
- Modify: `docs/requirements/Component-InboundAPI.md`
**Step 1: Find request-completion / logging section**
Locate the section describing how requests are processed and what gets logged today (today: failures only, per the brainstorm exploration).
**Step 2: Replace failures-only stance**
Edit the "failures-only logging" claim so it now reads: "Every request (success or failure) emits one `ApiInbound.Completed` row to `ICentralAuditWriter` from request middleware before the HTTP response is flushed. The row captures the API key *name* (never the key material), remote IP, user-agent, response status, duration, and truncated request/response bodies per the Audit Log capture policy (#23 §Payload Capture Policy)."
**Step 3: Cross-reference Audit Log**
Add Dependencies edge to **Audit Log (#23)**.
**Step 4: Note non-blocking semantics**
Add: "Middleware audit-write failures are logged and metricked (see Health Monitoring #11) but never affect the HTTP response."
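The middleware contract in this task (one row per request, written before the response is returned, and never able to change that response) can be illustrated with a framework-free Python sketch. All names here are hypothetical, including `audited`, the dict-shaped request and response, and the event field names; the real middleware and column names are defined elsewhere.

```python
import time
from typing import Callable


def audited(handler: Callable[[dict], dict],
            write_audit: Callable[[dict], None],
            on_audit_failure: Callable[[Exception], None]) -> Callable[[dict], dict]:
    """Wrap a request handler so each request yields one ApiInbound.Completed row."""

    def wrapped(request: dict) -> dict:
        started = time.monotonic()
        response = handler(request)
        event = {
            "Channel": "ApiInbound",
            "Kind": "Completed",
            "ApiKeyName": request.get("api_key_name"),  # the name, never the key
            "RemoteIp": request.get("remote_ip"),
            "ResponseStatus": response.get("status"),
            "DurationMs": int((time.monotonic() - started) * 1000),
        }
        try:
            write_audit(event)  # written before the response leaves the wrapper
        except Exception as exc:
            on_audit_failure(exc)  # log and count; the response is unaffected
        return response

    return wrapped
```

The sketch does not cover a handler that raises; it assumes the surrounding framework has already turned such errors into a response with a failure status.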
**Step 5: Verify**
Run: `grep -n "Audit Log\|ApiInbound\|ICentralAuditWriter\|#23" docs/requirements/Component-InboundAPI.md`
Expected: middleware hook + Dependencies edge present.
**Step 6: Commit**
```bash
git add docs/requirements/Component-InboundAPI.md
git commit -m "docs(audit): emit ApiInbound.Completed audit row per request"
```
---
## Task 10: Update `Component-CentralUI.md`
**Files:**
- Modify: `docs/requirements/Component-CentralUI.md`
**Step 1: Find navigation / page list**
Locate the section enumerating top-level nav groups and pages.
**Step 2: Add Audit nav group**
Add a new top-level group **Audit** with one page in v1:
- **Audit Log** — global query/filter/drilldown over the central `AuditLog` table.
Document the filter bar and results grid columns from `alog.md` §10.1.
**Step 3: Add drill-in links**
In the existing Notifications, Site Calls, External Systems, Inbound API Keys, Sites, and Instances detail-page documentation, add a "View audit history" / "Recent activity" / "Audit feed" entry that opens the Audit Log page pre-filtered (per `alog.md` §10.2).
**Step 4: Add Health dashboard tiles**
In the Health dashboard documentation, add three tiles under a new "Audit" KPI group: Audit volume, Audit error rate, Audit backlog (per `alog.md` §10.3 / §14).
**Step 5: Note UI rules already covered**
No new framework choices — sticks to Blazor Server + Bootstrap + custom components per the existing project rules (per memory note `feedback_central_ui.md`).
**Step 6: Verify**
Run: `grep -n "Audit Log\|Audit nav\|Audit feed\|Audit volume\|#23" docs/requirements/Component-CentralUI.md`
Expected: nav group, page, drill-ins, tiles all documented.
**Step 7: Commit**
```bash
git add docs/requirements/Component-CentralUI.md
git commit -m "docs(audit): add Audit nav group, Audit Log page, drill-ins, and KPI tiles to Central UI"
```
---
## Task 11: Update `Component-HealthMonitoring.md`
**Files:**
- Modify: `docs/requirements/Component-HealthMonitoring.md`
**Step 1: Find metrics list**
Locate where existing site + central metrics are enumerated.
**Step 2: Add new site metrics**
- `SiteAuditBacklog` — count of `Pending` rows in site-local `AuditLog` plus oldest-pending-age plus on-disk bytes. Threshold drives a Health dashboard warning on the affected site tile.
- `SiteAuditWriteFailures` — count of failed hot-path appends since last report.
- `SiteAuditTelemetryStalled` — boolean flag set when reconciliation reports a non-draining backlog over two cycles.
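One possible reading of "non-draining backlog over two cycles" is sketched below in Python, to help pin down the wording of the metric. It assumes "non-draining" means the pending count is above zero and has not decreased since the previous reconciliation cycle; `StallDetector` is a hypothetical name.

```python
class StallDetector:
    """Flags a site when its backlog fails to shrink for N consecutive cycles."""

    def __init__(self, cycles_to_flag: int = 2):
        self._cycles_to_flag = cycles_to_flag
        self._previous = None      # pending count seen in the previous cycle
        self._non_draining = 0     # consecutive non-draining cycles so far

    def observe(self, pending_count: int) -> bool:
        """Record one reconciliation cycle; return the stalled flag."""
        non_draining = (
            pending_count > 0
            and self._previous is not None
            and pending_count >= self._previous
        )
        self._non_draining = self._non_draining + 1 if non_draining else 0
        self._previous = pending_count
        return self._non_draining >= self._cycles_to_flag
```

With the default of two cycles, a backlog reported as 10, 10, 12 raises the flag on the third report, and any decrease clears it again.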
**Step 3: Add new central metrics**
- `CentralAuditWriteFailures` — central direct-write failures (Inbound API middleware, Notification Outbox dispatcher).
- `AuditRedactionFailure` — payload redactor errors (over-redacted, safety-net hit).
**Step 4: Add new tiles**
Three new dashboard tiles under an "Audit" group: Audit volume, Audit error rate, Audit backlog.
**Step 5: Cross-reference Audit Log**
Dependencies edge to **Audit Log (#23)**.
**Step 6: Verify**
Run: `grep -n "SiteAuditBacklog\|SiteAuditWriteFailures\|SiteAuditTelemetryStalled\|CentralAuditWriteFailures\|AuditRedactionFailure\|Audit volume" docs/requirements/Component-HealthMonitoring.md`
Expected: all five metrics + three tiles listed.
**Step 7: Commit**
```bash
git add docs/requirements/Component-HealthMonitoring.md
git commit -m "docs(audit): add Audit Log health metrics and dashboard tiles"
```
---
## Task 12: Update `Component-CLI.md`
**Files:**
- Modify: `docs/requirements/Component-CLI.md`
**Step 1: Find command-group list**
Locate the section enumerating top-level CLI command groups.
**Step 2: Add `scadalink audit` group**
Three subcommands per `alog.md` §15.1:
- `audit query --site <s> --since <t> --kind <k> [...]` — UI-equivalent filter set.
- `audit export --since <t> --until <t> --format csv|jsonl|parquet --output <path>` — server-side streaming export.
- `audit verify-chain --month <YYYY-MM>` — hash-chain verification (no-op in v1; available once §11.4 ships).
Note: requires `OperationalAudit` + `AuditExport` permissions (Security & Auth #10).
**Step 3: Cross-reference Audit Log and Management Service**
Dependencies edges to **Audit Log (#23)** and **Management Service (#18)** (the CLI hits central via the existing HTTP Management API).
**Step 4: Verify**
Run: `grep -n "scadalink audit\|audit query\|audit export\|audit verify-chain\|#23" docs/requirements/Component-CLI.md`
Expected: command group documented with all three subcommands.
**Step 5: Commit**
```bash
git add docs/requirements/Component-CLI.md
git commit -m "docs(audit): add scadalink audit command group to CLI"
```
---
## Task 13: Update `README.md`
**Files:**
- Modify: `README.md`
**Step 1: Find component table**
Locate the markdown table containing rows #1–#22 (currently around lines 36–58).
**Step 2: Add row #23**
Append a row after `Site Call Audit`:
```
| 23 | Audit Log | [docs/requirements/Component-AuditLog.md](docs/requirements/Component-AuditLog.md) | New central append-only AuditLog spanning every script-trust-boundary action (outbound API sync+cached, outbound DB sync+cached, notifications, inbound API). Site-local SQLite hot-path append + gRPC telemetry + central reconciliation; combined telemetry packet with Site Call Audit; central direct-write for Notification Outbox dispatch + Inbound API middleware; monthly partitioning, 365-day default retention. |
```
**Step 3: Update architecture diagram (logical)**
In the architecture diagram, add an `AuditLog` box under the central cluster's "Audit Log" / observability cluster (parallel to Notification Outbox and Site Call Audit). Add a thin arrow from each affected component into it.
**Step 4: Verify**
Run: `grep -n "Audit Log\|Component-AuditLog.md\|| 23 |" README.md`
Expected: new row + diagram entry present.
**Step 5: Commit**
```bash
git add README.md
git commit -m "docs(audit): register Audit Log (#23) in the README component table"
```
---
## Task 14: Update `docs/requirements/HighLevelReqs.md`
**Files:**
- Modify: `docs/requirements/HighLevelReqs.md`
**Step 1: Find functional-area sections**
Locate the section that currently contains requirements for Notification Outbox and Site Call Audit (likely under "Observability" or "Audit & Reporting").
**Step 2: Add Audit Log requirements section**
Add a new subsection **"Centralized Audit Log"** with numbered requirements covering:
- AL-1: Append-only central record of every script-trust-boundary action.
- AL-2: One row per lifecycle event for cached calls and notifications.
- AL-3: Site-local hot-path append; gRPC telemetry to central; idempotent on `EventId`.
- AL-4: Reconciliation pull self-heals missed telemetry.
- AL-5: Payload metadata + truncated bodies (8 KB default, 64 KB on errors).
- AL-6: Headers redacted by default; SQL parameter values captured by default; per-target redaction opt-in.
- AL-7: Audit-write failure never aborts the user-facing action.
- AL-8: 365-day default central retention; monthly partition switch purge.
- AL-9: Site SQLite purge requires `ForwardState ∈ {Forwarded, Reconciled}`; central outage cannot cause audit loss at sites.
- AL-10: Central UI Audit Log page with cross-channel filter and drill-ins from existing operational pages.
- AL-11: Append-only enforced via DB roles; tamper-evidence hash chain deferred to v1.x.
- AL-12: CLI `scadalink audit` command group.
**Step 3: Cross-reference Audit Log component**
Add a "See Component-AuditLog.md (#23)" pointer at the top of the subsection.
**Step 4: Verify**
Run: `grep -n "AL-1\|AL-12\|Centralized Audit Log\|Component-AuditLog.md" docs/requirements/HighLevelReqs.md`
Expected: section header and all twelve requirements present.
**Step 5: Commit**
```bash
git add docs/requirements/HighLevelReqs.md
git commit -m "docs(audit): add Centralized Audit Log requirements (AL-1..AL-12) to HighLevelReqs"
```
---
## Task 15: Update `CLAUDE.md`
**Files:**
- Modify: `CLAUDE.md`
**Step 1: Update Current Component List**
Change the heading from `## Current Component List (22 components)` to `## Current Component List (23 components)`. Append a new line at the end of the numbered list:
```
23. Audit Log — Central append-only AuditLog table spanning every script-trust-boundary action (outbound API sync+cached, outbound DB sync+cached, notifications, inbound API). Site SQLite hot-path + gRPC telemetry + reconciliation; combined telemetry with Site Call Audit; central direct-write for Notification Outbox dispatch + Inbound API; monthly partitioning, 365-day retention.
```
**Step 2: Add Key Design Decisions block**
In the **Key Design Decisions** section, add a new subsection **`### Centralized Audit Log`** with bulleted decisions mirroring `alog.md` §1–§15 highlights:
- Layered design — append-only AuditLog alongside operational Notifications (#21) and SiteCalls (#22), not replacing them.
- Scope = script trust boundary; framework traffic explicitly excluded.
- One row per lifecycle event; cached calls produce 4+ rows per operation.
- Site SQLite hot-path first; gRPC telemetry to central; idempotent on `EventId`; reconciliation pull as fallback.
- Cached operations: site emits, one telemetry packet carries audit + operational state; central writes both in one transaction.
- Payload cap 8 KB default / 64 KB on errors; headers redacted by default; SQL parameter values captured by default; per-target redaction opt-in.
- Audit-write failure never aborts the user-facing action.
- 365-day central retention with monthly partition-switch purge; 7-day site SQLite with hard `ForwardState` invariant.
- Append-only enforced via DB roles; hash-chain tamper evidence and Parquet archival deferred to v1.x.
- New top-level **Audit** nav group + Audit Log page + drill-ins from Notifications / Site Calls / External Systems / Inbound API Keys / Sites / Instances.
**Step 3: Verify**
Run: `grep -n "Centralized Audit Log\|Audit Log\|23 components\|23\\. Audit Log" CLAUDE.md`
Expected: count updated, list extended, Key Design Decisions block present.
**Step 4: Commit**
```bash
git add CLAUDE.md
git commit -m "docs(audit): register Audit Log (#23) in CLAUDE.md component list and key decisions"
```
---
## Task 16: Final cross-reference verification
**Files:**
- None — verification only.
**Step 1: Grep for stale references**
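Run: `grep -rn --exclude-dir=plans "22 components\|Currently 22\|22\\. Site Call Audit\\s*$" docs/ README.md CLAUDE.md`
Expected: no hits; all updated to 23. (`--exclude-dir=plans` keeps this plan, which itself quotes the old "22 components" heading, out of the results.)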
**Step 2: Grep for orphan references**
Run: `grep -rn "Component-AuditLog.md" docs/ README.md CLAUDE.md`
Expected: hits in README, CLAUDE.md, and each affected component doc. Confirm the file exists at the referenced path.
**Step 3: Verify all eleven affected component docs cross-reference Audit Log**
Run: `for f in docs/requirements/Component-{ExternalSystemGateway,InboundAPI,NotificationOutbox,SiteCallAudit,SiteRuntime,Commons,CentralUI,ConfigurationDatabase,ClusterInfrastructure,HealthMonitoring,CLI}.md; do echo "--- $f"; grep -c "Audit Log\|AuditLog\|#23" "$f"; done`
Expected: each file shows count ≥ 1.
**Step 4: Verify alog.md still matches the design canonically**
Run: `git diff fec0bb1 -- alog.md`
Expected: no diff — alog.md is unchanged from the validated commit.
**Step 5: Skim the new file once more end-to-end**
Read: `docs/requirements/Component-AuditLog.md`. Verify section ordering, completeness, no contradictions with `alog.md`.
**Step 6: Review the commit graph**
Run: `git log --oneline feature/audit-log-docs ^main`
Expected: 15 commits, one per task for Tasks 1–15. Tasks 0 and 16 add no commit, unless Task 16 needs a fix-up, which would make 16.
**Step 7: Final commit (only if any fix-ups needed)**
If grep finds any issue, fix it and commit with `docs(audit): cross-reference cleanup`. Otherwise no commit at this task.
---
## Task 17: Merge to main (optional, on user request only)
**Files:**
- None — git operation only.
**Step 1: Confirm with user**
Per CLAUDE.md and harness policy, do not push or merge to main without explicit user instruction. This task documents the option but does not execute automatically.
**Step 2: If user requests merge**
```bash
git switch main
git merge --no-ff feature/audit-log-docs -m "Merge feature/audit-log-docs: centralized audit log design"
```
**Step 3: If user requests push**
```bash
git push origin main
```
(or push the feature branch instead — operator's call).
---
## Execution Notes
- **Tasks 2–14 are mostly independent of each other** once Task 1 is done. Suitable for parallel execution via the **subagent-driven-development** sub-skill: one fresh subagent per task, review between commits.
- **Tasks 15 and 16** must run last (Task 15 is the CLAUDE.md rollup; Task 16 is verification).
- **Task 0** must run first (branch prep).
- Total: 17 tasks, ~15 commits, ~250–400 lines of new prose in `Component-AuditLog.md` plus smaller per-component additions.
- Spec is `alog.md` (commit `fec0bb1`); every task cites the relevant section.