# Component: KPI History

## Purpose

The KPI History component is the central, reusable **KPI-history backbone**: a tall / EAV time-series store, a periodic recorder singleton, a bucketed query API, and a reusable custom SVG trend-chart component. It turns the system's existing point-in-time KPIs into trends, and ships those trends for the **Notification Outbox (#21)**, **Site Call Audit (#22)**, **Audit Log (#23)**, and **Site Health (#11)** sources.

This supersedes the earlier "KPI history: point-in-time only, no separate time-series store is added" stance carried by the Notification Outbox and Site Call Audit KPI sections. M6 explicitly introduces a store. It lives in **central MS SQL** (the existing HA store), so it adds **no new infrastructure dependency**: a single `KpiSample` table, an EF mapping + migration, and a central cluster singleton that samples every minute.

The backbone is deliberately source-agnostic. Each owning component contributes an `IKpiSampleSource` registered into DI; the recorder enumerates them. KPI History therefore does **not** reference every component, and every source reuses the KPI reads its owner already computes, with no per-source schema or storage work.

## Location

- `src/ZB.MOM.WW.ScadaBridge.KpiHistory`: the component project, containing the `KpiHistoryRecorderActor`, `KpiHistoryOptions` + validator, and the DI/options wiring (`ServiceCollectionExtensions`). It owns the recorder and the options, and consumes the `IKpiSampleSource` abstraction (defined in Commons).
- **`IKpiSampleSource` implementations live with their owners**, not here: `NotificationOutboxKpiSampleSource` (in NotificationOutbox), `SiteCallAuditKpiSampleSource` (in SiteCallAudit), `AuditLogKpiSampleSource` (in AuditLog), `SiteHealthKpiSampleSource` (in HealthMonitoring). Each registers itself via `TryAddEnumerable`.
- **Commons**: the `KpiSample` POCO entity (`Entities/Kpi`), the `IKpiSampleSource` and `IKpiHistoryRepository` interfaces (`Interfaces/Kpi`), and the `KpiSources` / `KpiScopes` constant catalogs + `KpiSeriesPoint` / `KpiSeriesBucketer` types (`Types/Kpi`).
- **Configuration Database**: the EF mapping (`KpiSampleEntityTypeConfiguration`), the migration that creates the `KpiSample` table + indexes, and the `KpiHistoryRepository` implementation.
- **Central UI**: the `KpiHistoryQueryService` query service and the reusable `KpiTrendChart.razor` component, plus the trend sections embedded on four surfaces.

The recorder is a **singleton on the active central node**, consistent with the other central singletons (Notification Outbox, Site Call Audit, purge actors).

## Responsibilities

- Own the `KpiSample` table, the central tall / EAV KPI-history store in MS SQL.
- Run the recorder loop: every `SampleInterval`, enumerate all registered `IKpiSampleSource`s and persist their samples stamped with one shared tick timestamp.
- Isolate sources from one another and from the store: a failure in any one source (or in the write) is logged and skipped for that tick and never disrupts the source component or the rest of the tick (best-effort observability).
- On a daily cadence (`PurgeInterval`), purge rows older than `RetentionDays`.
- Provide a bucketed series-query API (`IKpiHistoryRepository.GetRawSeriesAsync` + `KpiSeriesBucketer`) and the Central UI query service + reusable trend chart that consume it.

KPI History is **observability, never a user-facing critical path**: neither recording nor querying may ever break a hosting page or disrupt a source component.

## Schema: `KpiSample` (tall / EAV)

A persistence-ignorant POCO in Commons; EF mapping + migration in Configuration Database; one table in central MS SQL.
One row per `(Source, Metric, Scope, ScopeKey)` per recorder tick:

| Column | Type | Notes |
|---|---|---|
| `Id` | `bigint` PK identity | Surrogate key assigned by the store. |
| `Source` | `varchar(64)` | Owning source, a `KpiSources` constant: `NotificationOutbox` / `SiteCallAudit` / `AuditLog` / `SiteHealth`. |
| `Metric` | `varchar(64)` | Per-source metric name, e.g. `queueDepth`, `parkedCount`, `deadLetters`, drawn from each source's own metric catalog. |
| `Scope` | `varchar(16)` | A `KpiScopes` constant: `Global` / `Site` / `Node`. |
| `ScopeKey` | `varchar(64)` NULL | Site id (for `Site`) or node name (for `Node`); `NULL` for `Global`. |
| `Value` | `float` (`double`) | Counts carried exactly within range; ages stored as **seconds**. |
| `CapturedAtUtc` | `datetime2` | The recorder tick timestamp (UTC), shared across every sample in one tick. |

All timestamps are UTC, consistent with the system-wide convention. Two named indexes back the access paths:

- **`IX_KpiSample_Series` (`Source`, `Metric`, `Scope`, `ScopeKey`, `CapturedAtUtc`)**: the per-series range query (one series scanned in time order).
- **`IX_KpiSample_Captured` (`CapturedAtUtc`)**: the retention purge.

## Recorder: `KpiHistoryRecorderActor`

The recorder is the Akka.NET cluster singleton **`kpi-history-recorder`** (singleton-manager actor `kpi-history-recorder-singleton`), running on the active central node. It is **not readiness-gated**: the recorder is pure observability and must never gate `/health/ready`, so it is started outside the readiness barrier (unlike the operational singletons). On graceful shutdown it drains via a `CoordinatedShutdown` task for clean singleton handover.

A timer fires every `SampleInterval` (default 60s; an immediate first tick primes the series, then it settles into the periodic cadence). On each tick the recorder:

1. Opens a **per-tick DI scope** (scoped `DbContext`/repository, the same scope-per-sweep pattern as the `NotificationOutboxActor`).
2. Enumerates the registered `IEnumerable<IKpiSampleSource>`. Each source returns an `IReadOnlyList<KpiSample>` stamped with the tick's single `CapturedAtUtc`.
3. Writes all collected samples via `IKpiHistoryRepository.RecordSamplesAsync`.

**Best-effort, per-source isolation.** Each source call and the write are individually guarded. A throwing source is logged and its samples skipped for that tick; it never aborts the tick, the other sources, or the source component itself. This is the same `IEnumerable<>`-of-adapters decoupling pattern used by `INotificationDeliveryAdapter`.

**Retention.** A daily purge timer (`PurgeInterval`, default 24h) deletes rows older than `RetentionDays` (default 90) via `IKpiHistoryRepository.PurgeOlderThanAsync`, reusing the existing purge-scheduler shape. Hourly/longer-range downsampling is deferred (YAGNI).

## Sample Sources

Each `IKpiSampleSource` lives in its owning component and is registered into DI with `TryAddEnumerable` (idempotent, additive). Each reuses the KPI reads its owner already performs: the Notification Outbox / Site Call Audit / Audit Log sources call their owners' existing `Compute…KpisAsync` aggregator reads; the Site Health source reads the in-memory `ICentralHealthAggregator` (no DB read). `Value` carries counts exactly and ages as seconds; all metric names below are the exact shipped strings.

### `NotificationOutboxKpiSampleSource` (in NotificationOutbox)

Scopes: **Global + per-Site + per-Node** (the per-node breakdown reuses the M5 `ComputePerNodeKpisAsync`).

- `queueDepth`
- `stuckCount`
- `parkedCount`
- `deliveredLastInterval`
- `oldestPendingAgeSeconds`

### `SiteCallAuditKpiSampleSource` (in SiteCallAudit)

Scopes: **Global + per-Site + per-Node**.

- `buffered`
- `parked`
- `failedLastInterval`
- `deliveredLastInterval`
- `stuck`
- `oldestPendingAgeSeconds`

### `AuditLogKpiSampleSource` (in AuditLog)

Scope: **Global**.
- `totalEventsLastHour`
- `errorEventsLastHour`
- `backlogTotal`

### `SiteHealthKpiSampleSource` (in HealthMonitoring)

Reads `ICentralHealthAggregator.GetAllSiteStates()` (in-memory, no DB). Scope: **per-Site**. This is the largest latent win, since Site Health was previously sequence-numbered every 30s but its history was discarded.

- `connectionsUp`
- `connectionsDown`
- `scriptErrors`
- `alarmEvalErrors`
- `sfBufferDepth`
- `deadLetters`
- `parkedMessages`
- `deployedInstances`
- `enabledInstances`
- `disabledInstances`
- `auditBacklogPending`
- `eventLogWriteFailures`

## Query + UI

### Bucketed query

`IKpiHistoryRepository.GetRawSeriesAsync(source, metric, scope, scopeKey, fromUtc, toUtc, …)` returns the raw points for one series over `[fromUtc, toUtc]`. `KpiSeriesBucketer.Bucket(raw, fromUtc, toUtc, maxPoints)` then partitions the window into ≤ `maxPoints` time buckets and returns the **last value per bucket** as `KpiSeriesPoint(BucketStartUtc, Value)`. Last-value is correct for gauge metrics; v1 ships exactly one aggregation, and avg / min / max are deferred.

### `KpiHistoryQueryService` (Central UI)

A scoped-repository direct read with a **dual-constructor test seam** (one ctor resolves a scoped `IKpiHistoryRepository` per call; the other accepts an injected repository for tests), the same shape as `AuditLogQueryService`. `GetSeriesAsync` resolves the effective point cap (caller override or `KpiHistoryOptions.DefaultMaxSeriesPoints`), fetches the raw series, and reduces it via `KpiSeriesBucketer`. A query failure surfaces as an unavailable chart (em-dash / message), mirroring how the existing KPI tiles surface transient failures; it never breaks the hosting page.

### `KpiTrendChart.razor` (Central UI)

A reusable **custom inline-SVG** line/area chart: a polyline path with min/max + time-range axis labels, a responsive `viewBox`, and clean corporate styling. There is **no third-party charting library** (per the CLAUDE.md no-third-party-component-framework rule).
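
Before the chart draws anything, the series is reduced by the last-value bucketing described under "Bucketed query". The following is a minimal, language-neutral sketch of that reduction in Python, not the shipped C# `KpiSeriesBucketer`. The names `SeriesPoint` and `bucket_last_value` are hypothetical, and three behaviours are assumptions the source does not confirm: equal-width buckets, omission of empty buckets, and dropping samples outside the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SeriesPoint:
    """Hypothetical stand-in for KpiSeriesPoint(BucketStartUtc, Value)."""
    bucket_start_utc: datetime
    value: float


def bucket_last_value(raw, from_utc, to_utc, max_points):
    """Reduce (captured_at_utc, value) pairs to at most max_points points.

    Assumed, not confirmed by the source: equal-width buckets, empty buckets
    omitted, samples outside [from_utc, to_utc] dropped.
    """
    if max_points < 1 or to_utc <= from_utc:
        return []
    width = (to_utc - from_utc) / max_points
    if width <= timedelta(0):
        return []  # window too narrow to split at microsecond resolution
    latest = {}  # bucket index -> (captured_at_utc, value) of the newest sample
    for captured_at, value in raw:
        if not (from_utc <= captured_at <= to_utc):
            continue
        # A sample exactly on the window's right edge goes to the final bucket.
        index = min(int((captured_at - from_utc) / width), max_points - 1)
        if index not in latest or captured_at >= latest[index][0]:
            latest[index] = (captured_at, value)
    return [
        SeriesPoint(from_utc + index * width, latest[index][1])
        for index in sorted(latest)
    ]


# Six samples, ten minutes apart, reduced to three 20-minute buckets.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
raw = [(start + timedelta(minutes=m), float(m)) for m in range(0, 60, 10)]
points = bucket_last_value(raw, start, start + timedelta(hours=1), 3)
print([p.value for p in points])  # [10.0, 30.0, 50.0]
```

Keeping only the newest sample per bucket suits gauge-style metrics such as `queueDepth`, where the latest reading in a bucket is the most representative one; a rate or counter metric would call for a different reduction.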
The time window (e.g. 24h / 7d) is owned by the parent page.

### Surfaces

Trend sections render on four pages, each feeding `KpiTrendChart` from `KpiHistoryQueryService`:

- **Notification Outbox** page: outbox KPI trends.
- **Site Calls** page: cached-call KPI trends.
- **Audit Log** page: audit volume / error / backlog trends.
- **Health dashboard**: a per-site Site Health trend panel.

## Configuration: `KpiHistoryOptions`

Bound from the `ScadaBridge:KpiHistory` section on the central host (Options pattern), validated on startup by `KpiHistoryOptionsValidator`:

| Option | Default | Notes |
|---|---|---|
| `SampleInterval` | `60s` | Recorder tick cadence. Must be `> 0`. |
| `RetentionDays` | `90` | Rows older than this are purged. Bounded to `[1, 3650]` days. |
| `PurgeInterval` | `1d` | Daily purge cadence. Must be `> 0`. |
| `DefaultMaxSeriesPoints` | `200` | Default bucket cap for a series query when the caller does not override it. Bounded to `[2, 5000]`. |

Validation fails fast at startup on a non-positive `SampleInterval` / `PurgeInterval` (which would stall the recorder / purge), an out-of-range `RetentionDays` (too short loses history; too long defeats retention), or an out-of-range `DefaultMaxSeriesPoints`.

## Dependencies

- **Commons**: defines the `KpiSample` entity, the `IKpiSampleSource` and `IKpiHistoryRepository` interfaces, the `KpiSources` / `KpiScopes` catalogs, and the `KpiSeriesPoint` / `KpiSeriesBucketer` query types.
- **Configuration Database**: hosts the `KpiSample` table, its EF mapping, the migration, and the `KpiHistoryRepository` implementation.
- **Cluster Infrastructure**: hosts the `kpi-history-recorder` cluster singleton with active/standby failover.
- **Host**: binds `KpiHistoryOptions`, registers the component on the central role, and starts the recorder singleton **outside** the readiness barrier.
- **Notification Outbox / Site Call Audit / Audit Log / Health Monitoring**: each contributes an `IKpiSampleSource` and the KPI/aggregator reads it reuses. KPI History depends on the `IKpiSampleSource` abstraction, not on these components directly.
- **Central UI**: hosts `KpiHistoryQueryService` and the `KpiTrendChart` component.

## Interactions

- **Notification Outbox (#21)**: registers `NotificationOutboxKpiSampleSource` (Global / Site / Node), sampled each recorder tick; its trends render on the Notification Outbox page.
- **Site Call Audit (#22)**: registers `SiteCallAuditKpiSampleSource` (Global / Site / Node); its trends render on the Site Calls page.
- **Audit Log (#23)**: registers `AuditLogKpiSampleSource` (Global); its trends render on the Audit Log page.
- **Health Monitoring (#11)**: registers `SiteHealthKpiSampleSource` (per-Site), reading the in-memory central health aggregator; its trends render in the Health dashboard's per-site panel.
- **Central UI (#9)**: renders the reusable `KpiTrendChart` fed by `KpiHistoryQueryService` across the four trend surfaces; a query failure degrades to an unavailable-chart placeholder rather than breaking the page.
- **Cluster Infrastructure (#13)**: provides the active/standby singleton hosting for the recorder, which drains on `CoordinatedShutdown` for clean handover.
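
As a closing illustration, the recorder tick and its best-effort, per-source isolation (see the Recorder section) can be sketched as follows. This is a language-neutral sketch in Python, not the shipped C# actor: `run_tick`, the callable sources, and the tuple sample layout are hypothetical stand-ins for `IKpiSampleSource`, `KpiSample`, and `IKpiHistoryRepository.RecordSamplesAsync`. Skipping the write when no samples were collected is an assumption as well.

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger("kpi_history")


def run_tick(sources, record_samples, now_utc=None):
    """Collect samples from every source and persist them, best-effort.

    `sources` are callables that take the tick timestamp and return samples;
    `record_samples` persists a list of samples. Both are hypothetical.
    Returns the number of samples written; never raises to the caller.
    """
    captured_at_utc = now_utc or datetime.now(timezone.utc)  # one stamp per tick
    collected = []
    for source in sources:
        try:
            samples = list(source(captured_at_utc))
        except Exception:
            # One failing source loses only its own samples for this tick.
            log.exception("KPI source %r failed; skipped this tick", source)
            continue
        collected.extend(samples)
    if not collected:
        return 0  # assumption: nothing to write, so the write is skipped
    try:
        record_samples(collected)
    except Exception:
        # A failed write drops the tick but is not propagated.
        log.exception("KPI sample write failed; tick dropped")
        return 0
    return len(collected)


# Sample layout mirrors the table columns:
# (Source, Metric, Scope, ScopeKey, Value, CapturedAtUtc)
def outbox(ts):
    return [("NotificationOutbox", "queueDepth", "Global", None, 7.0, ts)]


def broken(ts):
    raise RuntimeError("aggregator unavailable")


written = []
count = run_tick([broken, outbox], written.extend)
print(count)  # 1
```

The failing source is logged and skipped, while the healthy source's sample is still written with the shared tick timestamp, which is the isolation property the recorder relies on.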