docs(audit): add Audit nav group, Audit Log page, drill-ins, and KPI tiles to Central UI

Joseph Doherty
2026-05-20 08:34:28 -04:00
parent 72388a7616
commit 8d922391b8


@@ -58,6 +58,7 @@ Central cluster only. Sites have no user interface.
### External System Management (Design Role)
- Define external system contracts: connection details, API method definitions (parameters, return types).
- Define retry settings per external system (max retry count, fixed time between retries).
- The external system detail page includes a **"Recent activity"** link that opens the Audit Log page pre-filtered to `Channel = ApiOutbound` with `Target` starting with the system name, surfacing the system's recent outbound API audit history.
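The design does not say how a drill-in hands its pre-filter to the Audit Log page. One plausible approach, sketched here purely as an assumption, is to encode the filter as query-string parameters on the page route. The route `/audit/log` and the parameter names `channel` and `target_prefix` are hypothetical. `matches` shows the row predicate that the "Recent activity" pre-filter is meant to express.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical route and parameter names; the design does not specify them.
AUDIT_LOG_ROUTE = "/audit/log"


def recent_activity_link(system_name: str) -> str:
    """Build an external system's "Recent activity" drill-in link.

    Pre-filters the Audit Log page to Channel = ApiOutbound and to Target
    values that start with the system name.
    """
    query = urlencode({"channel": "ApiOutbound", "target_prefix": system_name})
    return f"{AUDIT_LOG_ROUTE}?{query}"


def parse_prefilter(link: str) -> dict[str, str]:
    """Recover the pre-filter on the Audit Log page side (single-valued params)."""
    return {k: v[0] for k, v in parse_qs(urlsplit(link).query).items()}


def matches(row: dict, channel: str, target_prefix: str) -> bool:
    """Row predicate equivalent to the pre-filter."""
    return row["Channel"] == channel and row["Target"].startswith(target_prefix)
```

For example, `recent_activity_link("Weather API")` produces `/audit/log?channel=ApiOutbound&target_prefix=Weather+API`, and `parse_prefilter` decodes the `+` back to a space. Note that a bare prefix such as `ERP` also matches a system named `ERPLegacy`; appending whatever separator `Target` uses between system and method (not specified here) would tighten the match.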
### Database Connection Management (Design Role)
- Define named database connections: server, database, credentials.
@@ -74,6 +75,11 @@ Central cluster only. Sites have no user interface.
- Define data connections and assign them to sites (name, protocol type, connection details).
- **Data connection form**: "Primary Endpoint Configuration" (required JSON text area) and optional "Backup Endpoint Configuration" (collapsible section, hidden by default, revealed via "Add Backup Endpoint" button; "Remove Backup" button when editing an existing backup). "Failover Retry Count" numeric input (default 3, min 1, max 20) is visible only when a backup endpoint is configured.
- **Data connection list page**: Shows Primary Config and Backup Config columns. Active Endpoint column populated from health reports.
- The site detail page exposes a new **"Audit feed"** tab that hosts the Audit Log page pre-filtered to `Site = <site>` — an in-context view of every operational audit event for that site.
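The data connection form rules above (required primary JSON, an optional backup, and a Failover Retry Count that defaults to 3, is bounded to 1..20, and only applies when a backup is configured) can be sketched as a small form model. The class and method names are hypothetical, and checking that each text area parses as JSON is an assumption: the design only describes the fields as JSON text areas.

```python
import json
from dataclasses import dataclass
from typing import Optional

FAILOVER_RETRY_MIN, FAILOVER_RETRY_MAX = 1, 20


@dataclass
class DataConnectionForm:
    """Hypothetical model of the data connection form."""

    primary_endpoint_json: str                  # required
    backup_endpoint_json: Optional[str] = None  # None until "Add Backup Endpoint"
    failover_retry_count: int = 3               # default 3

    def failover_field_visible(self) -> bool:
        # The retry count input is shown only when a backup endpoint exists.
        return self.backup_endpoint_json is not None

    def remove_backup(self) -> None:
        # "Remove Backup" drops the backup section, which also hides the retry count.
        self.backup_endpoint_json = None

    def validate(self) -> list[str]:
        errors = []
        if not self.primary_endpoint_json.strip():
            errors.append("Primary Endpoint Configuration is required.")
        elif not _is_json(self.primary_endpoint_json):
            errors.append("Primary Endpoint Configuration must be valid JSON.")
        if self.failover_field_visible():
            if not _is_json(self.backup_endpoint_json):
                errors.append("Backup Endpoint Configuration must be valid JSON.")
            if not FAILOVER_RETRY_MIN <= self.failover_retry_count <= FAILOVER_RETRY_MAX:
                errors.append("Failover Retry Count must be between 1 and 20.")
        return errors


def _is_json(text: str) -> bool:
    """Assumption: a JSON text area is valid only if it parses."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Keeping the retry-count check inside the `failover_field_visible()` branch means an out-of-range value left over after "Remove Backup" cannot block saving a primary-only connection.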
### Inbound API Management (Admin Role for keys, Design Role for methods)
- Manage inbound API keys (create, enable / disable, delete) and define API methods (name, parameters, return values, approved keys, implementation script).
- The API key detail page includes a **"Recent calls"** link that opens the Audit Log page pre-filtered to `Actor = <key name>` and `Channel = ApiInbound` — surfacing the key's recent inbound-call audit history.
### Area Management (Admin Role)
- Define hierarchical area structures per site.
@@ -89,6 +95,7 @@ Central cluster only. Sites have no user interface.
- **Disable** instances — stops data collection, script triggers, and alarm evaluation at the site while retaining the deployed configuration.
- **Enable** instances — re-activates a disabled instance.
- **Delete** instances — removes the running configuration from the site. Blocked if the site is unreachable. Store-and-forward messages are not cleared.
- The instance detail page exposes a new **"Audit feed"** tab that hosts the Audit Log page pre-filtered to the instance (`Site = <site>` and the `Instance / Script` filter set to the instance's unique name), giving an in-context view of every operational audit event involving that instance.
### Deployment (Deployment Role)
- View list of instances with staleness indicators (deployed config differs from template-derived config).
@@ -124,6 +131,7 @@ Central cluster only. Sites have no user interface.
- **KPI tiles** at the top of the page: queue depth (`Pending` + `Retrying`), stuck count, parked count, delivered in the last interval, and oldest pending age. The KPIs are central-computed on demand from the `Notifications` table.
- A **queryable notification list** filterable by status, type, source site, notification list, and time range, with a **stuck-only toggle** and keyword search on subject. Each row shows the notification's status, retry count, last error, and key timestamps.
- **Retry** and **Discard** actions are available on parked notifications: Retry returns the notification to `Pending` and resets `RetryCount` / `NextAttemptAt`; Discard moves it to `Discarded`. The row is retained either way so the table stays a complete audit record.
- Each row exposes a **"View audit history"** action that opens the Audit Log page pre-filtered to `CorrelationId = NotificationId`, surfacing every operational audit event recorded for that notification.
- **Stuck rows are visually badged** — a notification is stuck if it is `Pending` or `Retrying` and older than the configurable stuck-age threshold. Stuck detection is display-only; there is no automated escalation or alerting.
- All queries are served from the central `Notifications` table — no remote per-site queries are needed, unlike the Parked Message Management page.
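The KPI, stuck-row, and Retry/Discard rules above reduce to simple predicates and state changes over `Notifications` rows. The in-memory sketch below mirrors them; a real implementation would presumably run equivalent SQL aggregates against the central table. Several details are assumptions the design does not pin down: age is measured from a `created_at` timestamp, "oldest pending age" counts only `Pending` rows, and Retry sets `NextAttemptAt` to the current time so the row is eligible immediately.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

IN_QUEUE = ("Pending", "Retrying")


@dataclass
class Notification:
    status: str                  # e.g. Pending, Retrying, Parked, Delivered, Discarded
    created_at: datetime         # assumption: basis for "age"
    retry_count: int = 0
    next_attempt_at: Optional[datetime] = None
    delivered_at: Optional[datetime] = None


def is_stuck(n: Notification, now: datetime, threshold: timedelta) -> bool:
    # Display-only badge: Pending/Retrying and older than the threshold.
    return n.status in IN_QUEUE and now - n.created_at > threshold


def outbox_kpis(rows, now: datetime, threshold: timedelta, interval: timedelta) -> dict:
    pending_ages = [now - n.created_at for n in rows if n.status == "Pending"]
    return {
        "queue_depth": sum(n.status in IN_QUEUE for n in rows),
        "stuck": sum(is_stuck(n, now, threshold) for n in rows),
        "parked": sum(n.status == "Parked" for n in rows),
        "delivered_last_interval": sum(
            n.status == "Delivered"
            and n.delivered_at is not None
            and now - n.delivered_at <= interval
            for n in rows
        ),
        "oldest_pending_age": max(pending_ages, default=None),
    }


def retry(n: Notification, now: datetime) -> None:
    _require_parked(n)
    n.status = "Pending"
    n.retry_count = 0
    n.next_attempt_at = now  # assumption: eligible on the next dispatch pass


def discard(n: Notification) -> None:
    _require_parked(n)
    n.status = "Discarded"  # the row is kept, never deleted


def _require_parked(n: Notification) -> None:
    if n.status != "Parked":
        raise ValueError(f"only Parked notifications are actionable, got {n.status}")
```

Both actions update the row in place rather than deleting it, matching the requirement that the table remain a complete audit record.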
@@ -131,6 +139,7 @@ Central cluster only. Sites have no user interface.
- Monitor cached calls store-and-forwarded from sites — `ExternalSystem.CachedCall()` and `Database.CachedWrite()` operations. Scoped to the `ExternalCall` and `DatabaseWrite` kinds only; notifications keep their separate Notification Outbox page and are not merged here.
- A **queryable cached-call list** filterable by site, kind, status, and time range. Each row shows the call's timestamp, site, kind, target summary, status badge, retry count, and last error.
- **Retry** and **Discard** actions are available on `Parked` rows only — `Failed` rows are not actionable, since a permanent failure would simply fail again and its error was already returned synchronously to the calling script. The actions issue central→site commands to the owning site; if the site is offline the UI surfaces a "site unreachable" message.
- Each row exposes a **"View audit history"** action that opens the Audit Log page pre-filtered to `CorrelationId = TrackedOperationId`, showing every operational audit event recorded for that cached call.
- Data is served from the central Site Call Audit component's `SiteCalls` table. The page is **read-mostly** — an eventually-consistent mirror of site state; the site remains the source of truth.
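The Site Calls action rules (Retry and Discard offered only on `Parked` rows, relayed to the owning site, with a "site unreachable" message when the site is offline) can be sketched as below. `SiteUnreachableError`, the `send_to_site` callback, and the row field names are hypothetical stand-ins for the central-to-site command channel.

```python
from typing import Callable

# Statuses on which the UI offers Retry / Discard. Failed rows are excluded:
# a permanent failure would fail again, and its error already reached the script.
ACTIONABLE_STATUSES = frozenset({"Parked"})


class SiteUnreachableError(Exception):
    """Hypothetical error raised when the owning site cannot be reached."""


def available_actions(status: str) -> tuple[str, ...]:
    return ("Retry", "Discard") if status in ACTIONABLE_STATUSES else ()


def perform_action(
    call: dict,
    action: str,
    send_to_site: Callable[[str, str, str], None],
) -> str:
    """Relay Retry/Discard to the owning site and return a UI message."""
    if action not in available_actions(call["status"]):
        return f"{action} is not available for {call['status']} rows."
    try:
        send_to_site(call["site"], call["tracked_operation_id"], action)
    except SiteUnreachableError:
        return "Site unreachable; retry once the site is back online."
    # The central SiteCalls row is not edited here: the site stays the source of
    # truth, and the mirror updates when the site reports the new state.
    return f"{action} requested from site {call['site']}."
```

Leaving the central row untouched after sending the command keeps the page consistent with its read-mostly, eventually-consistent role: the status changes only once the site confirms it.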
### Health Monitoring Dashboard (All Roles)
@@ -138,14 +147,42 @@ Central cluster only. Sites have no user interface.
- Per-site detail: active/standby node status, data connection health, script error rates, alarm evaluation error rates, store-and-forward buffer depths.
- Headline **Notification Outbox KPI tiles** — queue depth, stuck count, and parked count. These are central-computed by the Notification Outbox from the central `Notifications` table (not part of any site health report). The full outbox view is on the dedicated Notification Outbox page.
- Headline **Site Call Audit KPI tiles** — buffered count, parked count, and failed-last-interval. These are central-computed by the Site Call Audit component from the central `SiteCalls` table (not part of any site health report). The full cached-call view is on the dedicated Site Calls page.
- Headline **Audit KPI tiles** — three tiles in a new "Audit" KPI group: **Audit volume**, **Audit error rate**, and **Audit backlog**. These are sourced from the Audit Log component (#23) and Health Monitoring per the metric definitions in Component-HealthMonitoring.md; the dashboard simply surfaces them. The full audit query view is on the dedicated Audit Log page.
### Site Event Log Viewer (Deployment Role)
- Query site event logs remotely.
- Filter by event type, time range, instance.
- View script executions, alarm events (activations, clears, evaluation errors), deployment events (including script compilation results), connection status changes, store-and-forward activity, instance lifecycle events (enable, disable, delete).
### Audit Log (Admin / Audit Role)
- Lives under a **new top-level "Audit" nav group** (sibling to Notifications). In v1 this Audit Log page is the only new page in the Audit nav group; the pre-existing Configuration Audit Log Viewer sits in the same group as a separate page (described below).
- Global query / filter / drilldown over the central `AuditLog` table maintained by the Audit Log component (#23). Read-only — the table is append-only, so there are no edit actions on rows.
- Per-site row scoping reuses the existing site-permission model from Security & Auth: a user sees only rows for sites they are authorized to operate. Bulk export (see below) requires the additional `AuditExport` permission.
- **Filter bar** (top of page, collapses to a single row when not focused):
  - Time range: relative (15m / 1h / 24h / 7d) or custom.
  - Channel: multi-select of `ApiOutbound`, `DbOutbound`, `Notification`, `ApiInbound`.
  - Kind: multi-select; the available options are filtered by the selected Channels.
  - Status: multi-select.
  - Site: multi-select, scoped to the user's authorized sites.
  - Instance / Script: text search with autocomplete.
  - Target: text search (system + method, DB connection, list name).
  - Actor: text search (inbound API key name).
  - CorrelationId: paste a `TrackedOperationId` / `NotificationId` / request-id to see the full event sequence for one operation.
  - "Errors only" toggle: shorthand for `Status NOT IN (Success, Delivered, Enqueued)`.
- **Results grid** (custom Blazor + Bootstrap component, consistent with the rest of the UI; no third-party grid):
  - Columns, all resizable and reorderable, persisted per user: `OccurredAtUtc`, `Site`, `Channel`, `Kind`, `Status`, `Target`, `Actor`, `DurationMs`, `HttpStatus`, `ErrorMessage`.
  - Keyset pagination ordered by `(OccurredAtUtc desc, EventId desc)`. Default page size 100.
  - Clicking a row opens the drilldown drawer.
- **Drilldown drawer**:
  - Pretty-prints `RequestSummary` / `ResponseSummary`: JSON is auto-detected and syntax-highlighted; SQL is syntax-highlighted.
  - Surfaces **redaction indicators** wherever headers or fields were stripped at write time, per the Audit Log component's "Payload Capture Policy".
  - **"Copy as cURL"** action on `ApiOutbound` and `ApiInbound` rows.
  - **"Show all events for this operation"** link: reloads the current view filtered to the row's `CorrelationId`.
- **Export** button on the page header streams a server-side CSV of the current filter (default cap 100k rows; larger exports go through the CLI). Requires the `AuditExport` permission.
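The "Errors only" toggle, the Channel filter, per-site scoping, and keyset pagination described above can be illustrated with the in-memory sketch below. It is not the Audit Log component's implementation: `EventId` is modelled as an `int` for illustration (any totally ordered column works as the tie-breaker), and only a few of the filters are shown. The cursor is the `(OccurredAtUtc, EventId)` pair of the last row on the page; the next page takes rows strictly below it in descending order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import AbstractSet, Iterable, Optional

NON_ERROR_STATUSES = frozenset({"Success", "Delivered", "Enqueued"})
Cursor = tuple[datetime, int]  # (OccurredAtUtc, EventId) of the last row shown


@dataclass(frozen=True)
class AuditRow:
    occurred_at_utc: datetime
    event_id: int  # assumption: modelled as int for illustration
    site: str
    channel: str
    status: str


def _key(row: AuditRow) -> Cursor:
    return (row.occurred_at_utc, row.event_id)


def visible(row: AuditRow, authorized_sites: AbstractSet[str],
            channels: AbstractSet[str], errors_only: bool) -> bool:
    if row.site not in authorized_sites:  # per-site authorization scoping
        return False
    if channels and row.channel not in channels:  # empty selection = all channels
        return False
    # "Errors only" == Status NOT IN (Success, Delivered, Enqueued)
    return not (errors_only and row.status in NON_ERROR_STATUSES)


def page(
    rows: Iterable[AuditRow],
    authorized_sites: AbstractSet[str],
    channels: AbstractSet[str] = frozenset(),
    errors_only: bool = False,
    after: Optional[Cursor] = None,
    size: int = 100,
) -> tuple[list[AuditRow], Optional[Cursor]]:
    # Keyset step: keep rows strictly below the cursor in (OccurredAtUtc desc, EventId desc).
    # SQL shape: WHERE OccurredAtUtc < @t OR (OccurredAtUtc = @t AND EventId < @id)
    #            ORDER BY OccurredAtUtc DESC, EventId DESC
    candidates = [
        r for r in rows
        if visible(r, authorized_sites, channels, errors_only)
        and (after is None or _key(r) < after)
    ]
    candidates.sort(key=_key, reverse=True)
    result = candidates[:size]
    # Only hand back a cursor when another page exists.
    next_cursor = _key(result[-1]) if len(candidates) > size else None
    return result, next_cursor
```

Unlike offset paging, a keyset cursor stays stable while new events are appended at the head of the append-only log, because later pages are anchored to a row key rather than to a position.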
### Configuration Audit Log Viewer (Admin Role)
- Pre-existing viewer for the `IAuditService` configuration-change log (before/after edits to templates, instances, sites, and other entities). Lives under the same **Audit** nav group as the operational Audit Log above.
- Query the central configuration audit log.
- Filter by user, entity type, action type, time range.
- View before/after state for each change.
@@ -163,3 +200,4 @@ Central cluster only. Sites have no user interface.
- **Health Monitoring**: Provides site health data for the dashboard.
- **Notification Outbox**: Provides notification delivery KPIs and serves the `Notifications` table queries and Retry/Discard actions for the Notification Outbox page.
- **Site Call Audit**: Serves the `SiteCalls` table queries and relays Retry/Discard actions to sites for the Site Calls page.
- **Audit Log (#23)**: Serves all `AuditLog` table queries (filter / grid / drilldown / CSV export) for the new Audit Log page and the drill-in surfaces on Notifications, Site Calls, External Systems, Inbound API keys, Sites, and Instances. Payload capture, redaction, and per-site authorization follow the Audit Log component's "Payload Capture Policy" and "Security & Tamper-Evidence" sections.