docs(audit): apply cross-bundle review fixes before merge

Final cross-bundle reviewer identified 7 inconsistencies that the per-bundle
reviewers couldn't see; all fixed in one logical commit.

Critical:
- HighLevelReqs AL-3: drop 'then upsert-on-newer-status' — AuditLog is
  strictly append-only (correct for SiteCalls/Notifications, wrong for
  the immutable AuditLog shadow).
- Component-AuditLog Error rate KPI: align with HealthMonitoring's
  exclusion list (Success/Delivered/Enqueued) rather than just non-Success;
  otherwise every Delivered notification or Enqueued cached call would be
  counted as an error.

Important:
- Component-AuditLog line 154: ISiteAuditWriter -> IAuditWriter (canonical
  name per Commons and the rest of this doc).
- Component-AuditLog Central direct-write paragraph: convert remaining
  slash notation (ApiInbound/Completed, Notification/Attempt,
  Notification/Terminal) to dot notation used everywhere else.
- Component-ClusterInfrastructure: scope SiteCallAuditActor to
  reconciliation + KPIs + Retry/Discard relay; cached-telemetry ingest is
  AuditLogIngestActor's role per Combined Telemetry contract.
- Component-CentralUI Audit Log page: state the OperationalAudit read
  permission and the read-vs-export split (matching CLI doc).
- Component-NotificationOutbox: add never-fail-the-action invariant for
  dispatcher audit writes.

Minor:
- Component-InboundAPI: 'Non-blocking semantics' was ambiguous (could be
  read as async); reword to 'Fail-soft' — the write is still synchronous
  before flush, but failures are caught and don't change the response.
- Component-CLI: realign audit-query/audit-export flags to actually match
  the Central UI Audit Log filter set (channel, kind, status, site,
  instance, target, actor, correlation-id, errors-only); drop --user and
  --entity-id which are IAuditService concepts, not Audit Log columns.
- Component-AuditLog KPI tile names: 'Volume/Error rate/Backlog' ->
  'Audit volume/Audit error rate/Audit backlog' (matches Central UI and
  Health Monitoring); drop the two orphan KPIs (Top inbound callers, Top
  outbound 5xx) that were never surfaced anywhere.
- Component-AuditLog Interactions: re-attribute DbOutbound emissions to
  ESG (where Database.* lives) with a note that Site Runtime is the API
  surface for scripts.
- HighLevelReqs AL-12: drop 'and reconciliation operations' (CLI has no
  reconcile command; reconciliation is an internal self-healing pull).
  Add note that verify-chain becomes operational once AL-11's hash chain
  ships.
This commit is contained in:
Joseph Doherty
2026-05-20 09:00:11 -04:00
parent 34ea97bae9
commit c929562e41
7 changed files with 20 additions and 21 deletions

View File

@@ -448,7 +448,7 @@ Sections 10.110.4 cover **configuration-database audit** (config-mutating use
- **AL-1**: The system maintains an **append-only** central Audit Log recording every script-trust-boundary action — outbound external system calls (sync `Call` and `CachedCall`), outbound database operations (sync `Connection` access and `CachedWrite`), notifications, and inbound API method invocations.
- **AL-2**: For cached calls and notifications, the Audit Log captures **one row per lifecycle event** (e.g., enqueued, retrying, delivered, parked, discarded), not a single mutable row per operation.
- **AL-3**: Site-originated events are appended to a **site-local SQLite hot-path** synchronously with the action, then **forwarded to central via gRPC telemetry**; central ingest is **idempotent on `EventId`** (insert-if-not-exists then upsert-on-newer-status).
- **AL-3**: Site-originated events are appended to a **site-local SQLite hot-path** synchronously with the action, then **forwarded to central via gRPC telemetry**; central ingest is **idempotent on `EventId`** (insert-if-not-exists; the `AuditLog` table is strictly append-only, so rows are never updated after insert).
- **AL-4**: A periodic **central→site reconciliation pull** detects and replays any telemetry events that were missed (e.g., during a central outage), making the central Audit Log eventually consistent with sites.
- **AL-5**: Each row captures **payload metadata** (target, method, status, timings, correlation IDs) plus a **truncated request/response body****8 KB default**, expanded to **64 KB on error** outcomes.
- **AL-6**: **HTTP headers are redacted by default**; **SQL parameter values are captured by default**. Per-target **redaction opt-in** is configurable on external systems, database connections, and inbound API methods.
@@ -457,7 +457,7 @@ Sections 10.110.4 cover **configuration-database audit** (config-mutating use
- **AL-9**: The site SQLite Audit Log is purged only when `ForwardState ∈ {Forwarded, Reconciled}` — i.e., a row must be either confirmed-forwarded *or* confirmed-reconciled before it can be removed. A central outage therefore **cannot cause audit loss at sites**.
- **AL-10**: The Central UI exposes an **Audit Log page** with a cross-channel filter (by site, target, status, time range, correlation ID), plus **drill-ins from existing operational pages** (Site Calls, Notification Outbox, Inbound API).
- **AL-11**: Append-only semantics are **enforced via DB roles** (no UPDATE/DELETE granted on the `AuditLog` table to application accounts); a **tamper-evidence hash chain is deferred to v1.x**.
- **AL-12**: The CLI provides a `scadalink audit` command group for query, export, and reconciliation operations against the central Audit Log.
- **AL-12**: The CLI provides a `scadalink audit` command group for query, export, and hash-chain verification (verify-chain becomes operational once AL-11's hash chain ships) against the central Audit Log.
## 11. Health Monitoring