docs(audit): fix Audit error rate semantics and CLI permission split

Bundle D code-review feedback on 0ae1a25 and e6f7a7f:

- Audit error rate (HealthMonitoring tile) was described as a combined
  view of CentralAuditWriteFailures + AuditRedactionFailure (writer
  health). Per alog.md §10.3 / §14.1 it is the operational error rate
  of audited operations: % of central AuditLog rows with Status not
  in (Success/Delivered/Enqueued) over a rolling 5-min window. Audit
  writer issues surface separately via the dedicated metrics.

- Audit volume description gains the spec-mandated 'events/min, global
  + per-site sparkline' shape.

- CLI: scadalink audit was claiming all three subcommands need both
  OperationalAudit and AuditExport. Per alog.md §11.2 / §15.1, read
  (query, verify-chain) needs OperationalAudit; bulk export
  additionally requires AuditExport. Restored the spec's split.
This commit is contained in:
Joseph Doherty
2026-05-20 08:30:42 -04:00
parent e6f7a7ff79
commit 72388a7616
2 changed files with 9 additions and 7 deletions

View File

@@ -179,10 +179,12 @@ the `scadalink audit` group below.
### Centralized Audit Commands
The `scadalink audit` group targets the centralized Audit Log component (#23) and
exposes the UI-equivalent operational audit surface. All three subcommands require
both the `OperationalAudit` and `AuditExport` permissions (see Security & Auth #10);
the server enforces permission checks and returns HTTP 403 (CLI exit code 2) on
denial.
exposes the UI-equivalent operational audit surface. Permissions follow the same
read-vs-export split the Central UI uses (see Component-AuditLog.md, Security &
Tamper-Evidence, and Security & Auth #10): `audit query` and `audit verify-chain`
require the `OperationalAudit` permission; `audit export` additionally requires
`AuditExport`. The server enforces permission checks and returns HTTP 403 (CLI
exit code 2) on denial.
```
scadalink audit query --site <s> --since <t> [--until <t>] [--kind <k>] [--user <u>] [--entity-id <id>] [--correlation-id <id>] [--status <s>] [--page <n>] [--page-size <n>]

View File

@@ -85,9 +85,9 @@ Unlike the Notification Outbox, the Site Call Audit is **not a dispatcher** —
The Audit Log spans both sites (hot-path append + telemetry forward) and central (direct-write + ingest + redaction). Its operational health surfaces as three new dashboard tiles grouped under **Audit**:
- **Audit volume** — rate of audit rows landing in the central `AuditLog` table over the last interval, sourced from the Audit Log component on the active central node.
- **Audit error rate** — combined view of `CentralAuditWriteFailures` and `AuditRedactionFailure` over the last interval; non-zero values warrant a check of the Audit Log component's own logs.
- **Audit backlog** — global aggregate of `SiteAuditBacklog` across reporting sites (count of `Pending` site-local audit rows, oldest pending age, on-disk bytes). The per-site tile also surfaces a warning badge when its `SiteAuditBacklog` crosses the configurable threshold or when `SiteAuditTelemetryStalled` is set.
- **Audit volume** — events/min landing in the central `AuditLog` table, shown global plus per-site sparkline; sourced from the Audit Log component on the active central node.
- **Audit error rate** — percent of central `AuditLog` rows with `Status` other than `Success` / `Delivered` / `Enqueued` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, transient failures, parked deliveries, etc.) — NOT the audit writer's own health. Audit-writer issues surface separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`.
- **Audit backlog** — global aggregate of `SiteAuditBacklog` across reporting sites (count of `Pending` site-local audit rows, oldest pending age, on-disk bytes); click drills into a per-site breakdown. The per-site tile surfaces a warning badge when its `SiteAuditBacklog` crosses the configurable threshold or when `SiteAuditTelemetryStalled` is set.
These tiles are **point-in-time** like the Notification Outbox and Site Call Audit KPI tiles — no time-series store; consistent with Health Monitoring's "current status only" philosophy. The site-scoped `SiteAuditBacklog` / `SiteAuditWriteFailures` / `SiteAuditTelemetryStalled` metrics arrive in the existing site health report; the central-scoped `CentralAuditWriteFailures` / `AuditRedactionFailure` metrics are central-computed alongside the existing central KPIs.