docs(audit): fix Audit error rate semantics and CLI permission split
Bundle D code-review feedback on0ae1a25ande6f7a7f: - Audit error rate (HealthMonitoring tile) was described as a combined view of CentralAuditWriteFailures + AuditRedactionFailure (writer health). Per alog.md §10.3 / §14.1 it is the operational error rate of audited operations: % of central AuditLog rows with Status not in (Success/Delivered/Enqueued) over a rolling 5-min window. Audit writer issues surface separately via the dedicated metrics. - Audit volume description gains the spec-mandated 'events/min, global + per-site sparkline' shape. - CLI: scadalink audit was claiming all three subcommands need both OperationalAudit and AuditExport. Per alog.md §11.2 / §15.1, read (query, verify-chain) needs OperationalAudit; bulk export additionally requires AuditExport. Restored the spec's split.
This commit is contained in:
@@ -85,9 +85,9 @@ Unlike the Notification Outbox, the Site Call Audit is **not a dispatcher** —
|
||||
|
||||
The Audit Log spans both sites (hot-path append + telemetry forward) and central (direct-write + ingest + redaction). Its operational health surfaces as three new dashboard tiles grouped under **Audit**:
|
||||
|
||||
- **Audit volume** — rate of audit rows landing in the central `AuditLog` table over the last interval, sourced from the Audit Log component on the active central node.
|
||||
- **Audit error rate** — combined view of `CentralAuditWriteFailures` and `AuditRedactionFailure` over the last interval; non-zero values warrant a check of the Audit Log component's own logs.
|
||||
- **Audit backlog** — global aggregate of `SiteAuditBacklog` across reporting sites (count of `Pending` site-local audit rows, oldest pending age, on-disk bytes). The per-site tile also surfaces a warning badge when its `SiteAuditBacklog` crosses the configurable threshold or when `SiteAuditTelemetryStalled` is set.
|
||||
- **Audit volume** — events/min landing in the central `AuditLog` table, shown global plus per-site sparkline; sourced from the Audit Log component on the active central node.
|
||||
- **Audit error rate** — percent of central `AuditLog` rows with `Status` other than `Success` / `Delivered` / `Enqueued` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, transient failures, parked deliveries, etc.) — NOT the audit writer's own health. Audit-writer issues surface separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`.
|
||||
- **Audit backlog** — global aggregate of `SiteAuditBacklog` across reporting sites (count of `Pending` site-local audit rows, oldest pending age, on-disk bytes); click drills into a per-site breakdown. The per-site tile surfaces a warning badge when its `SiteAuditBacklog` crosses the configurable threshold or when `SiteAuditTelemetryStalled` is set.
|
||||
|
||||
These tiles are **point-in-time** like the Notification Outbox and Site Call Audit KPI tiles — no time-series store; consistent with Health Monitoring's "current status only" philosophy. The site-scoped `SiteAuditBacklog` / `SiteAuditWriteFailures` / `SiteAuditTelemetryStalled` metrics arrive in the existing site health report; the central-scoped `CentralAuditWriteFailures` / `AuditRedactionFailure` metrics are central-computed alongside the existing central KPIs.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user