From 241a792e7bed61e919083e9dfcc2c6d6062cff31 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Wed, 17 Jun 2026 20:52:12 -0400 Subject: [PATCH] =?UTF-8?q?docs(kpi):=20K17=20=E2=80=94=20#26=20KpiHistory?= =?UTF-8?q?=20component=20doc=20+=20README/CLAUDE=20+=20cross-component=20?= =?UTF-8?q?interactions=20+=20completion-design=20update?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- CLAUDE.md | 4 +- README.md | 1 + ...26-06-15-stillpending-completion-design.md | 11 +- docs/requirements/Component-AuditLog.md | 5 + docs/requirements/Component-CentralUI.md | 1 + .../Component-HealthMonitoring.md | 1 + docs/requirements/Component-KpiHistory.md | 168 ++++++++++++++++++ .../Component-NotificationOutbox.md | 1 + docs/requirements/Component-SiteCallAudit.md | 6 + 9 files changed, 193 insertions(+), 5 deletions(-) create mode 100644 docs/requirements/Component-KpiHistory.md diff --git a/CLAUDE.md b/CLAUDE.md index 62db57ea..4f1fe4cc 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -63,7 +63,7 @@ Related repos cloned as sibling directories under `~/Desktop/` — referenced fo - Commit related changes together with a descriptive message summarizing the design decision and the implementation slice. - After non-trivial code changes, build (`dotnet build ZB.MOM.WW.ScadaBridge.slnx`) and run relevant tests before declaring done; for cluster-runtime changes, rebuild the image with `bash docker/deploy.sh`. -## Current Component List (25 components) +## Current Component List (26 components) 1. Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs. 2. Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle. @@ -90,6 +90,7 @@ Related repos cloned as sibling directories under `~/Desktop/` — referenced fo 23. 
Audit Log — Central append-only AuditLog table spanning every script-trust-boundary action (outbound API sync+cached, outbound DB sync+cached, notifications, inbound API). Site SQLite hot-path + gRPC telemetry + reconciliation; combined telemetry with Site Call Audit; central direct-write for Notification Outbox dispatch + Inbound API; monthly partitioning, 365-day retention. 24. Transport — File-based, encrypted bundle export/import via Central UI. Templates, system artifacts, central-only configuration. Per-conflict resolution. Correlated audit via `BundleImportId`. No site involvement. 25. Script Analysis — Shared authoritative script-trust analyzer: unified forbidden-API deny-list (`ScriptTrustPolicy`), fused semantic+syntactic validator (`ScriptTrustValidator`), Roslyn compile wrapper (`RoslynScriptCompiler`), and compile-only globals stubs (`ScriptCompileSurface`/`TriggerCompileSurface`); consumed by Template Engine, Site Runtime, Inbound API, and Central UI. +26. KPI History — Reusable central KPI-history backbone: tall/EAV `KpiSample` store in central MS SQL, `KpiHistoryRecorderActor` cluster singleton (`kpi-history-recorder`, not readiness-gated) sampling DI-registered `IKpiSampleSource`s every minute, bucketed query (`GetRawSeriesAsync` + `KpiSeriesBucketer`) + scoped `KpiHistoryQueryService`, and a reusable custom-SVG `KpiTrendChart`; ships trends for Notification Outbox, Site Call Audit, Audit Log, and Site Health. ## Key Design Decisions (for context across sessions) @@ -199,6 +200,7 @@ Related repos cloned as sibling directories under `~/Desktop/` — referenced fo - Stuck = `Pending`/`Retrying` older than a configurable age threshold (default 10 min) — display-only (KPI count + row badge), no escalation/alerting. - Headline KPI tiles surface on the Health dashboard; a new Central UI Notification Outbox page offers a queryable list with Retry/Discard actions on parked notifications. 
- Site Call Audit KPIs are central-computed point-in-time from the `SiteCalls` table (global + per-site), mirroring the Notification Outbox KPI shape; tiles surface on the Health dashboard alongside a queryable Central UI Site Calls page with Retry/Discard on parked rows. +- KPI History & Trends (#26, M6): a reusable central KPI-history backbone — supersedes the prior "point-in-time only, no time-series store" stance — backed by a tall/EAV `KpiSample` table in central MS SQL (no new infra). A `KpiHistoryRecorderActor` cluster singleton (`kpi-history-recorder`, **not readiness-gated**, best-effort with per-source isolation) samples every minute by enumerating DI-registered `IKpiSampleSource`s (each lives with its owner, registered via `TryAddEnumerable`, reusing existing KPI/aggregator reads); daily purge after `RetentionDays` (default 90). Querying is `IKpiHistoryRepository.GetRawSeriesAsync` → `KpiSeriesBucketer` (last-value-per-bucket) → scoped dual-ctor `KpiHistoryQueryService` → a reusable **custom-SVG** `KpiTrendChart` (no third-party charting lib). Trends ship on four surfaces: Notification Outbox, Site Calls, Audit Log pages + a per-site Health-dashboard panel. `KpiHistoryOptions` (`ScadaBridge:KpiHistory`): SampleInterval 60s, RetentionDays 90, PurgeInterval 1d, DefaultMaxSeriesPoints 200; validated. M6's T9 (Teams + other non-Email delivery adapters) and T10 (`NotificationType` enum values + Central UI list "Type" selector) are deferred to the next major version. ### Code Organization - Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database. 
diff --git a/README.md b/README.md index 0265f781..440454a2 100644 --- a/README.md +++ b/README.md @@ -101,6 +101,7 @@ Both stacks share the infrastructure services in [`infra/`](infra/) (MS SQL, LDA | 23 | Audit Log | [docs/requirements/Component-AuditLog.md](docs/requirements/Component-AuditLog.md) | New central append-only AuditLog spanning every script-trust-boundary action (outbound API sync+cached, outbound DB sync+cached, notifications, inbound API). Site-local SQLite hot-path append + gRPC telemetry + central reconciliation; combined telemetry packet with Site Call Audit; central direct-write for Notification Outbox dispatch + Inbound API middleware; monthly partitioning, 365-day default retention. | | 24 | Transport | [docs/requirements/Component-Transport.md](docs/requirements/Component-Transport.md) | Bundle export/import for templates, shared scripts, external systems, central-only artifacts. AES-256-GCM encryption; per-conflict resolution on import; correlated audit trail. | | 25 | Script Analysis | [docs/requirements/Component-ScriptAnalysis.md](docs/requirements/Component-ScriptAnalysis.md) | Shared authoritative script-trust analyzer: unified forbidden-API deny-list (`ScriptTrustPolicy`), fused semantic+syntactic validator (`ScriptTrustValidator`), Roslyn compile wrapper (`RoslynScriptCompiler`), and compile-only globals stubs (`ScriptCompileSurface`/`TriggerCompileSurface`); consumed by Template Engine, Site Runtime, Inbound API, and Central UI. | +| 26 | KPI History | [docs/requirements/Component-KpiHistory.md](docs/requirements/Component-KpiHistory.md) | Reusable central KPI-history backbone: tall/EAV `KpiSample` store (central MS SQL), `KpiHistoryRecorderActor` cluster singleton (`kpi-history-recorder`, not readiness-gated) sampling DI-registered `IKpiSampleSource`s each minute, bucketed `GetRawSeriesAsync` + `KpiSeriesBucketer` query, and a reusable custom-SVG `KpiTrendChart`. 
Ships trends for Notification Outbox, Site Call Audit, Audit Log, and Site Health. | **Shared UI sub-component** (not a top-level component): [TreeView](docs/requirements/Component-TreeView.md) — reusable hierarchical tree/grid Blazor component used by the Central UI (#9) for the templates folder hierarchy, data-connection browse, and tag pickers. diff --git a/docs/plans/2026-06-15-stillpending-completion-design.md b/docs/plans/2026-06-15-stillpending-completion-design.md index a40a7820..d08a0ba4 100644 --- a/docs/plans/2026-06-15-stillpending-completion-design.md +++ b/docs/plans/2026-06-15-stillpending-completion-design.md @@ -78,8 +78,11 @@ Wire up behavior that exists in code but is never started, and fill the event-lo #### M5 — Audit hardening (T1–T8) Hash-chain tamper evidence (off by default, `verify-chain` made real); Parquet export/archival (replace the 501); per-channel retention overrides; tag-cascade for `ParentExecutionId` (thread writing-execution id through trigger-driven runs); ExecutionId/ParentExecutionId + SourceNode backfill on historical rows; per-node stuck-count KPIs; structured response capture (headers/content-type, inbound request headers, per-method opt-out, `AuditInboundCeilingHits` metric); CLI `audit tree`. -#### M6 — Notifications (T9–T11) -Teams + other non-Email delivery adapters behind the existing `INotificationDeliveryAdapter` seam; `NotificationType` enum values; Central UI notification-list `Type` selector; historical/trend KPI charts (introduce a time-series store). +#### M6 — KPI History & Trends (T11 delivered; T9/T10 deferred) +Reshaped during the 2026-06-17 brainstorm (see `docs/plans/2026-06-17-m6-kpi-history-design.md`): +- **T11 — DELIVERED** as the reusable **KPI-history backbone** (#26 KpiHistory), promoted from a notifications-only feature. 
A tall/EAV `KpiSample` store in **central MS SQL** (no new infra — supersedes the original "point-in-time only, no time-series store" stance), a `KpiHistoryRecorderActor` cluster singleton (`kpi-history-recorder`, not readiness-gated, best-effort with per-source isolation) sampling DI-registered `IKpiSampleSource`s every minute, a bucketed `GetRawSeriesAsync` + `KpiSeriesBucketer` query + scoped `KpiHistoryQueryService`, and a reusable custom-SVG `KpiTrendChart` (no third-party charting lib). Trends shipped for **all** current KPI sources — Notification Outbox, Site Call Audit, Audit Log, and Site Health — across four UI surfaces. +- **T9 (Teams + other non-Email delivery adapters behind `INotificationDeliveryAdapter`) — DEFERRED to the next major version.** The seam exists; no code now. Transport choice (Incoming Webhook vs Microsoft Graph) and the Teams list-targeting model remain to be designed. +- **T10 (`NotificationType` enum values + Central UI notification-list `Type` selector) — DEFERRED with T9.** A Type selector has no purpose until a second delivery type exists. #### M7 — OPC UA / MxGateway UX (T13–T17) Dedicated operator Alarm Summary page; MxGateway secured writes (operator+verifier); OPC UA address-space search + `BrowseNext` paging; type-info surfacing + bulk override CSV import; "Verify endpoint" connectivity button + cert-management UI. @@ -96,7 +99,7 @@ Template tree search/filter; folder drag-drop + sibling reorder + root context m ## Dependencies & sequencing - **M1 → M5** — audit hardening builds on the wired purge/reconciliation. -- **M6/T11** — depends on introducing a time-series store (new infra; size carefully). +- **M6/T11** — delivered as the #26 KpiHistory backbone; reused **central MS SQL** (a tall/EAV `KpiSample` table) rather than introducing new infra. T9/T10 deferred to the next major version. - **M9/T26** — base-template versioning is the largest authoring item; may split. 
- **M4** — runs anytime; cheap and high-clarity, good to interleave. - **M3** — independent; can run in parallel with M1/M2. @@ -121,7 +124,7 @@ Template tree search/filter; folder drag-drop + sibling reorder + root context m ## Open items / risks - M3 real-compile may surface latent invalid scripts in existing templates/fixtures — budget for fixture cleanup. -- M6 time-series store is the one genuinely-new piece of infrastructure; scope it deliberately (could reuse MS SQL with a rollup table rather than a new dependency). +- M6 KPI history (resolved): reused **central MS SQL** with a tall/EAV `KpiSample` table rather than a new dependency, so no genuinely-new infrastructure was introduced. - The Phase 2 roadmap is large; treat each milestone as a separate planning + implementation pass, not a single mega-effort. ## Next step diff --git a/docs/requirements/Component-AuditLog.md b/docs/requirements/Component-AuditLog.md index 5b159ca0..3467b665 100644 --- a/docs/requirements/Component-AuditLog.md +++ b/docs/requirements/Component-AuditLog.md @@ -578,3 +578,8 @@ orphaned entries) and in the CLI's `audit tree` output. `scadabridge audit backfill-source-node --sentinel --before `, and `scadabridge audit verify-chain` (no-op placeholder for the deferred hash-chain feature); same permission requirements as the UI. +- **[KPI History (#26)](Component-KpiHistory.md)** — emits `IKpiSampleSource` + (`AuditLogKpiSampleSource`, Global) consumed by the KpiHistory recorder (#26), + reusing the existing audit-KPI reads; the resulting `totalEventsLastHour` / + `errorEventsLastHour` / `backlogTotal` series render as trends on the Audit Log + page via `KpiTrendChart`. 
diff --git a/docs/requirements/Component-CentralUI.md b/docs/requirements/Component-CentralUI.md index 06d4b7e5..9fa76127 100644 --- a/docs/requirements/Component-CentralUI.md +++ b/docs/requirements/Component-CentralUI.md @@ -249,3 +249,4 @@ Per-leaf alarm rendering (leaf nodes are individual conditions for native alarms - **Notification Outbox**: Provides notification delivery KPIs and serves the `Notifications` table queries and Retry/Discard actions for the Notification Outbox page. - **Site Call Audit**: Serves the `SiteCalls` table queries and relays Retry/Discard actions to sites for the Site Calls page. - **Audit Log (#23)**: Serves all `AuditLog` table queries (filter / grid / drilldown / CSV export) for the new Audit Log page and the drill-in surfaces on Notifications, Site Calls, External Systems, Inbound API keys, Sites, and Instances. Payload capture, redaction, and per-site authorization follow the Audit Log component's "Payload Capture Policy" and "Security & Tamper-Evidence" sections. +- **KPI History (#26)**: The Central UI hosts the `KpiHistoryQueryService` (scoped-repository read over `IKpiHistoryRepository.GetRawSeriesAsync` + `KpiSeriesBucketer`, dual-ctor test seam) and renders the reusable custom-SVG `KpiTrendChart` fed by it. Trend sections appear on the Notification Outbox, Site Calls, and Audit Log pages and in a per-site panel on the Health Monitoring dashboard; a query failure degrades to an unavailable-chart placeholder rather than breaking the page. See [Component-KpiHistory.md](Component-KpiHistory.md). diff --git a/docs/requirements/Component-HealthMonitoring.md b/docs/requirements/Component-HealthMonitoring.md index de2ec2a8..e4277e0f 100644 --- a/docs/requirements/Component-HealthMonitoring.md +++ b/docs/requirements/Component-HealthMonitoring.md @@ -116,3 +116,4 @@ These tiles are **point-in-time** like the Notification Outbox and Site Call Aud - **Central UI**: Health Monitoring Dashboard displays aggregated metrics. 
- **Communication Layer**: Health reports flow as periodic messages. +- **KPI History (#26)**: emits `IKpiSampleSource` (`SiteHealthKpiSampleSource`, per-Site) consumed by the KpiHistory recorder (#26). It reads the in-memory `ICentralHealthAggregator.GetAllSiteStates()` (no DB), turning the per-site snapshot — previously sequence-numbered every 30s but discarded — into trends (`connectionsUp`/`connectionsDown`, `scriptErrors`, `alarmEvalErrors`, `sfBufferDepth`, `deadLetters`, `parkedMessages`, `deployedInstances`/`enabledInstances`/`disabledInstances`, `auditBacklogPending`, `eventLogWriteFailures`) rendered in the dashboard's per-site `KpiTrendChart` panel. See [Component-KpiHistory.md](Component-KpiHistory.md). diff --git a/docs/requirements/Component-KpiHistory.md b/docs/requirements/Component-KpiHistory.md new file mode 100644 index 00000000..ee55c51c --- /dev/null +++ b/docs/requirements/Component-KpiHistory.md @@ -0,0 +1,168 @@ +# Component: KPI History + +## Purpose + +The KPI History component is the central, reusable **KPI-history backbone** — a tall / EAV time-series store, a periodic recorder singleton, a bucketed query API, and a reusable custom SVG trend-chart component. It turns the system's existing point-in-time KPIs into trends, and ships those trends for the **Notification Outbox (#21)**, **Site Call Audit (#22)**, **Audit Log (#23)**, and **Site Health (#11)** sources. + +This supersedes the earlier "KPI history — point-in-time only, no separate time-series store is added" stance carried by the Notification Outbox and Site Call Audit KPI sections. M6 explicitly introduces a store. It lives in **central MS SQL** — the existing HA store — so it adds **no new infrastructure dependency**: a single `KpiSample` table, an EF mapping + migration, and a central cluster singleton that samples every minute. + +The backbone is deliberately source-agnostic. 
Each owning component contributes an `IKpiSampleSource` registered into DI; the recorder enumerates them. KPI History therefore does **not** reference every component, and every source reuses the KPI reads its owner already computes — no per-source schema or storage work. + +## Location + +- `src/ZB.MOM.WW.ScadaBridge.KpiHistory` — the component project: the `KpiHistoryRecorderActor`, `KpiHistoryOptions` + validator, and the DI/options wiring (`ServiceCollectionExtensions`). It owns the recorder, the options, and consumes the `IKpiSampleSource` abstraction (defined in Commons). +- **`IKpiSampleSource` implementations live with their owners**, not here — `NotificationOutboxKpiSampleSource` (in NotificationOutbox), `SiteCallAuditKpiSampleSource` (in SiteCallAudit), `AuditLogKpiSampleSource` (in AuditLog), `SiteHealthKpiSampleSource` (in HealthMonitoring). Each registers itself via `TryAddEnumerable`. +- **Commons** — the `KpiSample` POCO entity (`Entities/Kpi`), the `IKpiSampleSource` and `IKpiHistoryRepository` interfaces (`Interfaces/Kpi`), and the `KpiSources` / `KpiScopes` constant catalogs + `KpiSeriesPoint` / `KpiSeriesBucketer` types (`Types/Kpi`). +- **Configuration Database** — the EF mapping (`KpiSampleEntityTypeConfiguration`), the migration that creates the `KpiSample` table + indexes, and the `KpiHistoryRepository` implementation. +- **Central UI** — the `KpiHistoryQueryService` query service and the reusable `KpiTrendChart.razor` component, plus the trend sections embedded on four surfaces. + +The recorder is a **singleton on the active central node**, consistent with the other central singletons (Notification Outbox, Site Call Audit, purge actors). + +## Responsibilities + +- Own the `KpiSample` table — the central tall / EAV KPI-history store in MS SQL. +- Run the recorder loop: every `SampleInterval`, enumerate all registered `IKpiSampleSource`s and persist their samples stamped with one shared tick timestamp. 
+- Isolate sources from one another and from the store: a failure in any one source (or in the write) is logged, skipped for that tick, and never disrupts the source component or the rest of the tick (best-effort observability). +- Purge rows older than `RetentionDays` on a daily cadence (`PurgeInterval`). +- Provide a bucketed series-query API (`IKpiHistoryRepository.GetRawSeriesAsync` + `KpiSeriesBucketer`) and the Central UI query service + reusable trend chart that consume it. + +KPI History is **observability, never a user-facing critical path** — neither recording nor querying may ever break a hosting page or disrupt a source component. + +## Schema — `KpiSample` (tall / EAV) + +A persistence-ignorant POCO in Commons; EF mapping + migration in Configuration Database; one table in central MS SQL. One row per `(Source, Metric, Scope, ScopeKey)` per recorder tick: + +| Column | Type | Notes | +|---|---|---| +| `Id` | `bigint` PK identity | Surrogate key assigned by the store. | +| `Source` | `varchar(64)` | Owning source — a `KpiSources` constant: `NotificationOutbox` / `SiteCallAudit` / `AuditLog` / `SiteHealth`. | +| `Metric` | `varchar(64)` | Per-source metric name, e.g. `queueDepth`, `parkedCount`, `deadLetters` — drawn from each source's own metric catalog. | +| `Scope` | `varchar(16)` | A `KpiScopes` constant: `Global` / `Site` / `Node`. | +| `ScopeKey` | `varchar(64)` NULL | Site id (for `Site`) or node name (for `Node`); `NULL` for `Global`. | +| `Value` | `float` (`double`) | Counts carried exactly within range; ages stored as **seconds**. | +| `CapturedAtUtc` | `datetime2` | The recorder tick timestamp (UTC), shared across every sample in one tick. | + +All timestamps are UTC, consistent with the system-wide convention. + +Two named indexes back the access paths: + +- **`IX_KpiSample_Series` (`Source`, `Metric`, `Scope`, `ScopeKey`, `CapturedAtUtc`)** — the per-series range query (one series scanned in time order). 
+- **`IX_KpiSample_Captured` (`CapturedAtUtc`)** — the retention purge. + +## Recorder — `KpiHistoryRecorderActor` + +The recorder is the Akka.NET cluster singleton **`kpi-history-recorder`** (singleton-manager actor `kpi-history-recorder-singleton`), running on the active central node. It is **not readiness-gated** — the recorder is pure observability and must never gate `/health/ready`, so it is started outside the readiness barrier (unlike the operational singletons). On graceful shutdown it drains via a `CoordinatedShutdown` task for clean singleton handover. + +A timer fires every `SampleInterval` (default 60s; an immediate first tick primes the series, then it settles into the periodic cadence). On each tick the recorder: + +1. Opens a **per-tick DI scope** (scoped `DbContext`/repository — the same scope-per-sweep pattern as the `NotificationOutboxActor`). +2. Enumerates the registered `IEnumerable<IKpiSampleSource>`. Each source returns an `IReadOnlyList<KpiSample>` stamped with the tick's single `CapturedAtUtc`. +3. Writes all collected samples via `IKpiHistoryRepository.RecordSamplesAsync`. + +**Best-effort, per-source isolation.** Each source call and the write are individually guarded. A throwing source is logged and its samples skipped for that tick; it never aborts the tick, the other sources, or the source component itself. This is the same `IEnumerable<>`-of-adapters decoupling pattern used by `INotificationDeliveryAdapter`. + +**Retention.** A daily purge timer (`PurgeInterval`, default 24h) deletes rows older than `RetentionDays` (default 90) via `IKpiHistoryRepository.PurgeOlderThanAsync`, reusing the existing purge-scheduler shape. Hourly/longer-range downsampling is deferred (YAGNI). + +## Sample Sources + +Each `IKpiSampleSource` lives in its owning component and is registered into DI with `TryAddEnumerable` (idempotent, additive). 
Each reuses the KPI reads its owner already performs — the Notification Outbox / Site Call Audit / Audit Log sources call their owners' existing `Compute…KpisAsync` aggregator reads; the Site Health source reads the in-memory `ICentralHealthAggregator` (no DB read). `Value` carries counts exactly and ages as seconds; all metric names below are the exact shipped strings. + +### `NotificationOutboxKpiSampleSource` (in NotificationOutbox) + +Scopes: **Global + per-Site + per-Node** (the per-node breakdown reuses the M5 `ComputePerNodeKpisAsync`). + +- `queueDepth` +- `stuckCount` +- `parkedCount` +- `deliveredLastInterval` +- `oldestPendingAgeSeconds` + +### `SiteCallAuditKpiSampleSource` (in SiteCallAudit) + +Scopes: **Global + per-Site + per-Node**. + +- `buffered` +- `parked` +- `failedLastInterval` +- `deliveredLastInterval` +- `stuck` +- `oldestPendingAgeSeconds` + +### `AuditLogKpiSampleSource` (in AuditLog) + +Scope: **Global**. + +- `totalEventsLastHour` +- `errorEventsLastHour` +- `backlogTotal` + +### `SiteHealthKpiSampleSource` (in HealthMonitoring) + +Reads `ICentralHealthAggregator.GetAllSiteStates()` (in-memory, no DB). Scope: **per-Site** — the largest latent win, since Site Health was previously sequence-numbered every 30s but its history discarded. + +- `connectionsUp` +- `connectionsDown` +- `scriptErrors` +- `alarmEvalErrors` +- `sfBufferDepth` +- `deadLetters` +- `parkedMessages` +- `deployedInstances` +- `enabledInstances` +- `disabledInstances` +- `auditBacklogPending` +- `eventLogWriteFailures` + +## Query + UI + +### Bucketed query + +`IKpiHistoryRepository.GetRawSeriesAsync(source, metric, scope, scopeKey, fromUtc, toUtc, …)` returns the raw points for one series over `[fromUtc, toUtc]`. `KpiSeriesBucketer.Bucket(raw, fromUtc, toUtc, maxPoints)` then partitions the window into ≤ `maxPoints` time buckets and returns the **last value per bucket** as `KpiSeriesPoint(BucketStartUtc, Value)`. 
Last-value is correct for gauge metrics; v1 ships exactly one aggregation — avg / min / max are deferred. + +### `KpiHistoryQueryService` (Central UI) + +A scoped-repository direct read with a **dual-constructor test seam** (one ctor resolves a scoped `IKpiHistoryRepository` per call; the other accepts an injected repository for tests) — the same shape as `AuditLogQueryService`. `GetSeriesAsync` resolves the effective point cap (caller override or `KpiHistoryOptions.DefaultMaxSeriesPoints`), fetches the raw series, and reduces it via `KpiSeriesBucketer`. A query failure surfaces as an unavailable chart (em-dash / message), mirroring how the existing KPI tiles surface transient failures — it never breaks the hosting page. + +### `KpiTrendChart.razor` (Central UI) + +A reusable **custom inline-SVG** line/area chart — a polyline path with min/max + time-range axis labels, a responsive `viewBox`, and clean corporate styling. There is **no third-party charting library** (per the CLAUDE.md no-third-party-component-framework rule). The time window (e.g. 24h / 7d) is owned by the parent page. + +### Surfaces + +Trend sections render on four pages, each feeding `KpiTrendChart` from `KpiHistoryQueryService`: + +- **Notification Outbox** page — outbox KPI trends. +- **Site Calls** page — cached-call KPI trends. +- **Audit Log** page — audit volume / error / backlog trends. +- **Health dashboard** — a per-site Site Health trend panel. + +## Configuration — `KpiHistoryOptions` + +Bound from the `ScadaBridge:KpiHistory` section on the central host (Options pattern), validated on startup by `KpiHistoryOptionsValidator`: + +| Option | Default | Notes | +|---|---|---| +| `SampleInterval` | `60s` | Recorder tick cadence. Must be `> 0`. | +| `RetentionDays` | `90` | Rows older than this are purged. Bounded to `[1, 3650]` days. | +| `PurgeInterval` | `1d` | Daily purge cadence. Must be `> 0`. 
| +| `DefaultMaxSeriesPoints` | `200` | Default bucket cap for a series query when the caller does not override it. Bounded to `[2, 5000]`. | + +Validation fails fast at startup on a non-positive `SampleInterval` / `PurgeInterval` (which would stall the recorder / purge), an out-of-range `RetentionDays` (too short loses history; too long defeats retention), or an out-of-range `DefaultMaxSeriesPoints`. + +## Dependencies + +- **Commons**: defines the `KpiSample` entity, the `IKpiSampleSource` and `IKpiHistoryRepository` interfaces, the `KpiSources` / `KpiScopes` catalogs, and the `KpiSeriesPoint` / `KpiSeriesBucketer` query types. +- **Configuration Database**: hosts the `KpiSample` table, its EF mapping, the migration, and the `KpiHistoryRepository` implementation. +- **Cluster Infrastructure**: hosts the `kpi-history-recorder` cluster singleton with active/standby failover. +- **Host**: binds `KpiHistoryOptions`, registers the component on the central role, and starts the recorder singleton **outside** the readiness barrier. +- **Notification Outbox / Site Call Audit / Audit Log / Health Monitoring**: each contributes an `IKpiSampleSource` and the KPI/aggregator reads it reuses. KPI History depends on the `IKpiSampleSource` abstraction, not on these components directly. +- **Central UI**: hosts `KpiHistoryQueryService` and the `KpiTrendChart` component. + +## Interactions + +- **Notification Outbox (#21)**: registers `NotificationOutboxKpiSampleSource` (Global / Site / Node), sampled each recorder tick; its trends render on the Notification Outbox page. +- **Site Call Audit (#22)**: registers `SiteCallAuditKpiSampleSource` (Global / Site / Node); its trends render on the Site Calls page. +- **Audit Log (#23)**: registers `AuditLogKpiSampleSource` (Global); its trends render on the Audit Log page. 
+- **Health Monitoring (#11)**: registers `SiteHealthKpiSampleSource` (per-Site), reading the in-memory central health aggregator; its trends render in the Health dashboard's per-site panel. +- **Central UI (#9)**: renders the reusable `KpiTrendChart` fed by `KpiHistoryQueryService` across the four trend surfaces; a query failure degrades to an unavailable-chart placeholder rather than breaking the page. +- **Cluster Infrastructure (#13)**: provides the active/standby singleton hosting for the recorder, which drains on `CoordinatedShutdown` for clean handover. diff --git a/docs/requirements/Component-NotificationOutbox.md b/docs/requirements/Component-NotificationOutbox.md index 3b05dbd9..9e476bed 100644 --- a/docs/requirements/Component-NotificationOutbox.md +++ b/docs/requirements/Component-NotificationOutbox.md @@ -190,3 +190,4 @@ Delivery max-retry-count and retry interval are not part of `NotificationOutboxO - **Notification Service**: Supplies delivery adapters and resolves notification lists at delivery time. - **Central UI**: Queries the `Notifications` table for the Notification Outbox page and issues operator Retry/Discard actions on parked notifications. - **Health Monitoring**: Polls the outbox for KPI tiles on the health dashboard. +- **KPI History (#26)**: Emits `IKpiSampleSource` (`NotificationOutboxKpiSampleSource`, Global + per-Site + per-Node) consumed by the KpiHistory recorder (#26), reusing the existing `Compute…KpisAsync` reads; the resulting `queueDepth` / `stuckCount` / `parkedCount` / `deliveredLastInterval` / `oldestPendingAgeSeconds` series render as trends on the Notification Outbox page via `KpiTrendChart`. See [Component-KpiHistory.md](Component-KpiHistory.md). 
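Reviewer note: the Recorder section of `Component-KpiHistory.md` describes each tick as best-effort, with every source call and the final write guarded individually. The sketch below is a language-neutral illustration in Python, not the shipped C# actor. The names `run_tick`, `sources`, and `record_samples` are hypothetical stand-ins for the recorder loop, the DI-registered `IKpiSampleSource` set, and `IKpiHistoryRepository.RecordSamplesAsync`, and the tuple shape of a sample is an assumption made for brevity.

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger("kpi-history-recorder")


def run_tick(sources, record_samples, now_utc=None):
    """Run one recorder tick; return the number of samples persisted.

    sources: iterable of callables (hypothetical stand-ins for sample
        sources); each takes the tick timestamp and returns a list of
        (source, metric, value, captured_at_utc) tuples.
    record_samples: callable that persists a list of samples.
    """
    # One shared timestamp for every sample collected in this tick.
    captured_at = now_utc or datetime.now(timezone.utc)
    collected = []
    for source in sources:
        try:
            # Materialize first so a failing source contributes nothing.
            samples = list(source(captured_at))
        except Exception:
            # A throwing source is logged and skipped for this tick only.
            log.exception("KPI source failed; skipping it for this tick")
            continue
        collected.extend(samples)
    if not collected:
        return 0
    try:
        record_samples(collected)
    except Exception:
        # The write is guarded too; a store failure loses only this tick.
        log.exception("KPI sample write failed; dropping this tick")
        return 0
    return len(collected)
```

Under these assumptions, a source that raises costs only its own samples for that tick, and a failed write costs only that tick's samples, which matches the "observability, never a critical path" intent stated in the component doc.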
diff --git a/docs/requirements/Component-SiteCallAudit.md b/docs/requirements/Component-SiteCallAudit.md index 06980f55..5c8d96e1 100644 --- a/docs/requirements/Component-SiteCallAudit.md +++ b/docs/requirements/Component-SiteCallAudit.md @@ -146,3 +146,9 @@ configurable window (default 365 days), matching the `Notifications` purge. - **Health Monitoring**: surfaces Site Call Audit KPI tiles on the dashboard. - **Cluster Infrastructure**: hosts the `SiteCallAuditActor` singleton with active/standby failover. +- **KPI History (#26)**: emits `IKpiSampleSource` + (`SiteCallAuditKpiSampleSource`, Global + per-Site + per-Node) consumed by the + KpiHistory recorder (#26), reusing the existing KPI reads; the resulting + `buffered` / `parked` / `failedLastInterval` / `deliveredLastInterval` / + `stuck` / `oldestPendingAgeSeconds` series render as trends on the Site Calls + page via `KpiTrendChart`. See [Component-KpiHistory.md](Component-KpiHistory.md).
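Reviewer note: the "Bucketed query" section of `Component-KpiHistory.md` says `KpiSeriesBucketer.Bucket(raw, fromUtc, toUtc, maxPoints)` partitions the window into at most `maxPoints` buckets and keeps the last value per bucket. The following is a minimal Python sketch of that reduction, not the shipped C# implementation. `RawPoint`, `SeriesPoint`, and `bucket_last_value` are hypothetical names, and two behaviors are assumptions the doc does not state: empty buckets are omitted, and a point exactly at `to_utc` falls into the final bucket.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RawPoint:
    captured_at_utc: datetime
    value: float


@dataclass(frozen=True)
class SeriesPoint:
    bucket_start_utc: datetime
    value: float


def bucket_last_value(raw, from_utc, to_utc, max_points):
    """Reduce raw points to at most max_points, last value per bucket."""
    if max_points < 1 or to_utc <= from_utc:
        return []
    width = (to_utc - from_utc) / max_points  # equal-width time buckets
    last_in_bucket = {}
    # Process in time order so later samples overwrite earlier ones.
    for point in sorted(raw, key=lambda p: p.captured_at_utc):
        if not (from_utc <= point.captured_at_utc <= to_utc):
            continue  # outside the requested window
        offset = point.captured_at_utc - from_utc
        # Assumption: clamp so a point at to_utc lands in the last bucket.
        index = min(int(offset / width), max_points - 1)
        last_in_bucket[index] = point.value
    # Assumption: buckets with no samples are simply omitted.
    return [
        SeriesPoint(from_utc + index * width, value)
        for index, value in sorted(last_in_bucket.items())
    ]
```

Last-value suits gauge-style metrics such as `queueDepth`, because the most recent reading in a bucket is the state at the end of that interval; interval counters such as `deliveredLastInterval` would lose intermediate readings under the same reduction, which is worth keeping in mind when choosing `maxPoints` for long windows.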