docs(audit): fix Audit error rate semantics and CLI permission split
Bundle D code-review feedback on0ae1a25ande6f7a7f: - Audit error rate (HealthMonitoring tile) was described as a combined view of CentralAuditWriteFailures + AuditRedactionFailure (writer health). Per alog.md §10.3 / §14.1 it is the operational error rate of audited operations: % of central AuditLog rows with Status not in (Success/Delivered/Enqueued) over a rolling 5-min window. Audit writer issues surface separately via the dedicated metrics. - Audit volume description gains the spec-mandated 'events/min, global + per-site sparkline' shape. - CLI: scadalink audit was claiming all three subcommands need both OperationalAudit and AuditExport. Per alog.md §11.2 / §15.1, read (query, verify-chain) needs OperationalAudit; bulk export additionally requires AuditExport. Restored the spec's split.
This commit is contained in:
@@ -179,10 +179,12 @@ the `scadalink audit` group below.
|
||||
### Centralized Audit Commands
|
||||
|
||||
The `scadalink audit` group targets the centralized Audit Log component (#23) and
|
||||
exposes the UI-equivalent operational audit surface. All three subcommands require
|
||||
both the `OperationalAudit` and `AuditExport` permissions (see Security & Auth #10);
|
||||
the server enforces permission checks and returns HTTP 403 (CLI exit code 2) on
|
||||
denial.
|
||||
exposes the UI-equivalent operational audit surface. Permissions follow the same
|
||||
read-vs-export split the Central UI uses (see Component-AuditLog.md, Security &
|
||||
Tamper-Evidence, and Security & Auth #10): `audit query` and `audit verify-chain`
|
||||
require the `OperationalAudit` permission; `audit export` additionally requires
|
||||
`AuditExport`. The server enforces permission checks and returns HTTP 403 (CLI
|
||||
exit code 2) on denial.
|
||||
|
||||
```
|
||||
scadalink audit query --site <s> --since <t> [--until <t>] [--kind <k>] [--user <u>] [--entity-id <id>] [--correlation-id <id>] [--status <s>] [--page <n>] [--page-size <n>]
|
||||
|
||||
@@ -85,9 +85,9 @@ Unlike the Notification Outbox, the Site Call Audit is **not a dispatcher** —
|
||||
|
||||
The Audit Log spans both sites (hot-path append + telemetry forward) and central (direct-write + ingest + redaction). Its operational health surfaces as three new dashboard tiles grouped under **Audit**:
|
||||
|
||||
- **Audit volume** — rate of audit rows landing in the central `AuditLog` table over the last interval, sourced from the Audit Log component on the active central node.
|
||||
- **Audit error rate** — combined view of `CentralAuditWriteFailures` and `AuditRedactionFailure` over the last interval; non-zero values warrant a check of the Audit Log component's own logs.
|
||||
- **Audit backlog** — global aggregate of `SiteAuditBacklog` across reporting sites (count of `Pending` site-local audit rows, oldest pending age, on-disk bytes). The per-site tile also surfaces a warning badge when its `SiteAuditBacklog` crosses the configurable threshold or when `SiteAuditTelemetryStalled` is set.
|
||||
- **Audit volume** — events/min landing in the central `AuditLog` table, shown global plus per-site sparkline; sourced from the Audit Log component on the active central node.
|
||||
- **Audit error rate** — percent of central `AuditLog` rows with `Status` other than `Success` / `Delivered` / `Enqueued` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, transient failures, parked deliveries, etc.) — NOT the audit writer's own health. Audit-writer issues surface separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`.
|
||||
- **Audit backlog** — global aggregate of `SiteAuditBacklog` across reporting sites (count of `Pending` site-local audit rows, oldest pending age, on-disk bytes); click drills into a per-site breakdown. The per-site tile surfaces a warning badge when its `SiteAuditBacklog` crosses the configurable threshold or when `SiteAuditTelemetryStalled` is set.
|
||||
|
||||
These tiles are **point-in-time** like the Notification Outbox and Site Call Audit KPI tiles — no time-series store; consistent with Health Monitoring's "current status only" philosophy. The site-scoped `SiteAuditBacklog` / `SiteAuditWriteFailures` / `SiteAuditTelemetryStalled` metrics arrive in the existing site health report; the central-scoped `CentralAuditWriteFailures` / `AuditRedactionFailure` metrics are central-computed alongside the existing central KPIs.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user