From 72388a76168ccd3a592f2c2219c56ac3dd362484 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Wed, 20 May 2026 08:30:42 -0400 Subject: [PATCH] docs(audit): fix Audit error rate semantics and CLI permission split MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bundle D code-review feedback on 0ae1a25 and e6f7a7f: - Audit error rate (HealthMonitoring tile) was described as a combined view of CentralAuditWriteFailures + AuditRedactionFailure (writer health). Per alog.md §10.3 / §14.1 it is the operational error rate of audited operations: % of central AuditLog rows with Status not in (Success/Delivered/Enqueued) over a rolling 5-min window. Audit writer issues surface separately via the dedicated metrics. - Audit volume description gains the spec-mandated 'events/min, global + per-site sparkline' shape. - CLI: scadalink audit was claiming all three subcommands need both OperationalAudit and AuditExport. Per alog.md §11.2 / §15.1, read (query, verify-chain) needs OperationalAudit; bulk export additionally requires AuditExport. Restored the spec's split. --- docs/requirements/Component-CLI.md | 10 ++++++---- docs/requirements/Component-HealthMonitoring.md | 6 +++--- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/docs/requirements/Component-CLI.md b/docs/requirements/Component-CLI.md index f70a651..6f04cdc 100644 --- a/docs/requirements/Component-CLI.md +++ b/docs/requirements/Component-CLI.md @@ -179,10 +179,12 @@ the `scadalink audit` group below. ### Centralized Audit Commands The `scadalink audit` group targets the centralized Audit Log component (#23) and -exposes the UI-equivalent operational audit surface. All three subcommands require -both the `OperationalAudit` and `AuditExport` permissions (see Security & Auth #10); -the server enforces permission checks and returns HTTP 403 (CLI exit code 2) on -denial. +exposes the UI-equivalent operational audit surface. Permissions follow the same +read-vs-export split the Central UI uses (see Component-AuditLog.md, Security & +Tamper-Evidence, and Security & Auth #10): `audit query` and `audit verify-chain` +require the `OperationalAudit` permission; `audit export` additionally requires +`AuditExport`. The server enforces permission checks and returns HTTP 403 (CLI +exit code 2) on denial. ``` scadalink audit query --site --since [--until ] [--kind ] [--user ] [--entity-id ] [--correlation-id ] [--status ] [--page ] [--page-size ] diff --git a/docs/requirements/Component-HealthMonitoring.md b/docs/requirements/Component-HealthMonitoring.md index 190d33b..e02349a 100644 --- a/docs/requirements/Component-HealthMonitoring.md +++ b/docs/requirements/Component-HealthMonitoring.md @@ -85,9 +85,9 @@ Unlike the Notification Outbox, the Site Call Audit is **not a dispatcher** — The Audit Log spans both sites (hot-path append + telemetry forward) and central (direct-write + ingest + redaction). Its operational health surfaces as three new dashboard tiles grouped under **Audit**: -- **Audit volume** — rate of audit rows landing in the central `AuditLog` table over the last interval, sourced from the Audit Log component on the active central node. -- **Audit error rate** — combined view of `CentralAuditWriteFailures` and `AuditRedactionFailure` over the last interval; non-zero values warrant a check of the Audit Log component's own logs. -- **Audit backlog** — global aggregate of `SiteAuditBacklog` across reporting sites (count of `Pending` site-local audit rows, oldest pending age, on-disk bytes). The per-site tile also surfaces a warning badge when its `SiteAuditBacklog` crosses the configurable threshold or when `SiteAuditTelemetryStalled` is set. +- **Audit volume** — events/min landing in the central `AuditLog` table, shown global plus per-site sparkline; sourced from the Audit Log component on the active central node. +- **Audit error rate** — percent of central `AuditLog` rows with `Status` other than `Success` / `Delivered` / `Enqueued` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, transient failures, parked deliveries, etc.) — NOT the audit writer's own health. Audit-writer issues surface separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`. +- **Audit backlog** — global aggregate of `SiteAuditBacklog` across reporting sites (count of `Pending` site-local audit rows, oldest pending age, on-disk bytes); click drills into a per-site breakdown. The per-site tile surfaces a warning badge when its `SiteAuditBacklog` crosses the configurable threshold or when `SiteAuditTelemetryStalled` is set. These tiles are **point-in-time** like the Notification Outbox and Site Call Audit KPI tiles — no time-series store; consistent with Health Monitoring's "current status only" philosophy. The site-scoped `SiteAuditBacklog` / `SiteAuditWriteFailures` / `SiteAuditTelemetryStalled` metrics arrive in the existing site health report; the central-scoped `CentralAuditWriteFailures` / `AuditRedactionFailure` metrics are central-computed alongside the existing central KPIs.