docs(kpi): K17 — #26 KpiHistory component doc + README/CLAUDE + cross-component interactions + completion-design update

Joseph Doherty
2026-06-17 20:52:12 -04:00
parent 3f1f4ed7c6
commit 241a792e7b
9 changed files with 193 additions and 5 deletions
+3 -1
@@ -63,7 +63,7 @@ Related repos cloned as sibling directories under `~/Desktop/` — referenced fo
- Commit related changes together with a descriptive message summarizing the design decision and the implementation slice.
- After non-trivial code changes, build (`dotnet build ZB.MOM.WW.ScadaBridge.slnx`) and run relevant tests before declaring done; for cluster-runtime changes, rebuild the image with `bash docker/deploy.sh`.
## Current Component List (25 components)
## Current Component List (26 components)
1. Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs.
2. Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle.
@@ -90,6 +90,7 @@ Related repos cloned as sibling directories under `~/Desktop/` — referenced fo
23. Audit Log — Central append-only AuditLog table spanning every script-trust-boundary action (outbound API sync+cached, outbound DB sync+cached, notifications, inbound API). Site SQLite hot-path + gRPC telemetry + reconciliation; combined telemetry with Site Call Audit; central direct-write for Notification Outbox dispatch + Inbound API; monthly partitioning, 365-day retention.
24. Transport — File-based, encrypted bundle export/import via Central UI. Templates, system artifacts, central-only configuration. Per-conflict resolution. Correlated audit via `BundleImportId`. No site involvement.
25. Script Analysis — Shared authoritative script-trust analyzer: unified forbidden-API deny-list (`ScriptTrustPolicy`), fused semantic+syntactic validator (`ScriptTrustValidator`), Roslyn compile wrapper (`RoslynScriptCompiler`), and compile-only globals stubs (`ScriptCompileSurface`/`TriggerCompileSurface`); consumed by Template Engine, Site Runtime, Inbound API, and Central UI.
26. KPI History — Reusable central KPI-history backbone: tall/EAV `KpiSample` store in central MS SQL, `KpiHistoryRecorderActor` cluster singleton (`kpi-history-recorder`, not readiness-gated) sampling DI-registered `IKpiSampleSource`s every minute, bucketed query (`GetRawSeriesAsync` + `KpiSeriesBucketer`) + scoped `KpiHistoryQueryService`, and a reusable custom-SVG `KpiTrendChart`; ships trends for Notification Outbox, Site Call Audit, Audit Log, and Site Health.
## Key Design Decisions (for context across sessions)
@@ -199,6 +200,7 @@ Related repos cloned as sibling directories under `~/Desktop/` — referenced fo
- Stuck = `Pending`/`Retrying` older than a configurable age threshold (default 10 min) — display-only (KPI count + row badge), no escalation/alerting.
- Headline KPI tiles surface on the Health dashboard; a new Central UI Notification Outbox page offers a queryable list with Retry/Discard actions on parked notifications.
- Site Call Audit KPIs are central-computed point-in-time from the `SiteCalls` table (global + per-site), mirroring the Notification Outbox KPI shape; tiles surface on the Health dashboard alongside a queryable Central UI Site Calls page with Retry/Discard on parked rows.
- KPI History & Trends (#26, M6): a reusable central KPI-history backbone (superseding the prior "point-in-time only, no time-series store" stance) backed by a tall/EAV `KpiSample` table in central MS SQL (no new infra). A `KpiHistoryRecorderActor` cluster singleton (`kpi-history-recorder`, **not readiness-gated**, best-effort with per-source isolation) samples every minute by enumerating DI-registered `IKpiSampleSource`s (each lives with its owner, registered via `TryAddEnumerable`, reusing existing KPI/aggregator reads); a daily purge removes rows older than `RetentionDays` (default 90). Querying is `IKpiHistoryRepository.GetRawSeriesAsync` → `KpiSeriesBucketer` (last-value-per-bucket) → scoped dual-ctor `KpiHistoryQueryService` → a reusable **custom-SVG** `KpiTrendChart` (no third-party charting lib). Trends ship on four surfaces: the Notification Outbox, Site Calls, and Audit Log pages plus a per-site Health-dashboard panel. `KpiHistoryOptions` (`ScadaBridge:KpiHistory`): SampleInterval 60s, RetentionDays 90, PurgeInterval 1d, DefaultMaxSeriesPoints 200; validated at startup. M6's T9 (Teams + other non-Email delivery adapters) and T10 (`NotificationType` enum values + Central UI list "Type" selector) are deferred to the next major version.
### Code Organization
- Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database.
+1
@@ -101,6 +101,7 @@ Both stacks share the infrastructure services in [`infra/`](infra/) (MS SQL, LDA
| 23 | Audit Log | [docs/requirements/Component-AuditLog.md](docs/requirements/Component-AuditLog.md) | New central append-only AuditLog spanning every script-trust-boundary action (outbound API sync+cached, outbound DB sync+cached, notifications, inbound API). Site-local SQLite hot-path append + gRPC telemetry + central reconciliation; combined telemetry packet with Site Call Audit; central direct-write for Notification Outbox dispatch + Inbound API middleware; monthly partitioning, 365-day default retention. |
| 24 | Transport | [docs/requirements/Component-Transport.md](docs/requirements/Component-Transport.md) | Bundle export/import for templates, shared scripts, external systems, central-only artifacts. AES-256-GCM encryption; per-conflict resolution on import; correlated audit trail. |
| 25 | Script Analysis | [docs/requirements/Component-ScriptAnalysis.md](docs/requirements/Component-ScriptAnalysis.md) | Shared authoritative script-trust analyzer: unified forbidden-API deny-list (`ScriptTrustPolicy`), fused semantic+syntactic validator (`ScriptTrustValidator`), Roslyn compile wrapper (`RoslynScriptCompiler`), and compile-only globals stubs (`ScriptCompileSurface`/`TriggerCompileSurface`); consumed by Template Engine, Site Runtime, Inbound API, and Central UI. |
| 26 | KPI History | [docs/requirements/Component-KpiHistory.md](docs/requirements/Component-KpiHistory.md) | Reusable central KPI-history backbone: tall/EAV `KpiSample` store (central MS SQL), `KpiHistoryRecorderActor` cluster singleton (`kpi-history-recorder`, not readiness-gated) sampling DI-registered `IKpiSampleSource`s each minute, bucketed `GetRawSeriesAsync` + `KpiSeriesBucketer` query, and a reusable custom-SVG `KpiTrendChart`. Ships trends for Notification Outbox, Site Call Audit, Audit Log, and Site Health. |
**Shared UI sub-component** (not a top-level component): [TreeView](docs/requirements/Component-TreeView.md) — reusable hierarchical tree/grid Blazor component used by the Central UI (#9) for the templates folder hierarchy, data-connection browse, and tag pickers.
@@ -78,8 +78,11 @@ Wire up behavior that exists in code but is never started, and fill the event-lo
#### M5 — Audit hardening (T1–T8)
Hash-chain tamper evidence (off by default, `verify-chain` made real); Parquet export/archival (replace the 501); per-channel retention overrides; tag-cascade for `ParentExecutionId` (thread writing-execution id through trigger-driven runs); ExecutionId/ParentExecutionId + SourceNode backfill on historical rows; per-node stuck-count KPIs; structured response capture (headers/content-type, inbound request headers, per-method opt-out, `AuditInboundCeilingHits` metric); CLI `audit tree`.
#### M6 — Notifications (T9–T11)
Teams + other non-Email delivery adapters behind the existing `INotificationDeliveryAdapter` seam; `NotificationType` enum values; Central UI notification-list `Type` selector; historical/trend KPI charts (introduce a time-series store).
#### M6 — KPI History & Trends (T11 delivered; T9/T10 deferred)
Reshaped during the 2026-06-17 brainstorm (see `docs/plans/2026-06-17-m6-kpi-history-design.md`):
- **T11 — DELIVERED** as the reusable **KPI-history backbone** (#26 KpiHistory), promoted from a notifications-only feature. A tall/EAV `KpiSample` store in **central MS SQL** (no new infra — supersedes the original "point-in-time only, no time-series store" stance), a `KpiHistoryRecorderActor` cluster singleton (`kpi-history-recorder`, not readiness-gated, best-effort with per-source isolation) sampling DI-registered `IKpiSampleSource`s every minute, a bucketed `GetRawSeriesAsync` + `KpiSeriesBucketer` query + scoped `KpiHistoryQueryService`, and a reusable custom-SVG `KpiTrendChart` (no third-party charting lib). Trends shipped for **all** current KPI sources — Notification Outbox, Site Call Audit, Audit Log, and Site Health — across four UI surfaces.
- **T9 (Teams + other non-Email delivery adapters behind `INotificationDeliveryAdapter`) — DEFERRED to the next major version.** The seam exists; no code now. Transport choice (Incoming Webhook vs Microsoft Graph) and the Teams list-targeting model remain to be designed.
- **T10 (`NotificationType` enum values + Central UI notification-list `Type` selector) — DEFERRED with T9.** A Type selector has no purpose until a second delivery type exists.
#### M7 — OPC UA / MxGateway UX (T13–T17)
Dedicated operator Alarm Summary page; MxGateway secured writes (operator+verifier); OPC UA address-space search + `BrowseNext` paging; type-info surfacing + bulk override CSV import; "Verify endpoint" connectivity button + cert-management UI.
@@ -96,7 +99,7 @@ Template tree search/filter; folder drag-drop + sibling reorder + root context m
## Dependencies & sequencing
- **M1 → M5** — audit hardening builds on the wired purge/reconciliation.
- **M6/T11** — depends on introducing a time-series store (new infra; size carefully).
- **M6/T11** — delivered as the #26 KpiHistory backbone; reused **central MS SQL** (a tall/EAV `KpiSample` table) rather than introducing new infra. T9/T10 deferred to the next major version.
- **M9/T26** — base-template versioning is the largest authoring item; may split.
- **M4** — runs anytime; cheap and high-clarity, good to interleave.
- **M3** — independent; can run in parallel with M1/M2.
@@ -121,7 +124,7 @@ Template tree search/filter; folder drag-drop + sibling reorder + root context m
## Open items / risks
- M3 real-compile may surface latent invalid scripts in existing templates/fixtures — budget for fixture cleanup.
- M6 time-series store is the one genuinely-new piece of infrastructure; scope it deliberately (could reuse MS SQL with a rollup table rather than a new dependency).
- M6 KPI history (resolved): reused **central MS SQL** with a tall/EAV `KpiSample` table rather than a new dependency, so no genuinely-new infrastructure was introduced.
- The Phase 2 roadmap is large; treat each milestone as a separate planning + implementation pass, not a single mega-effort.
## Next step
+5
@@ -578,3 +578,8 @@ orphaned entries) and in the CLI's `audit tree` output.
`scadabridge audit backfill-source-node --sentinel <s> --before <date>`, and
`scadabridge audit verify-chain` (no-op placeholder for the deferred hash-chain
feature); same permission requirements as the UI.
- **[KPI History (#26)](Component-KpiHistory.md)** — emits `IKpiSampleSource`
(`AuditLogKpiSampleSource`, Global) consumed by the KpiHistory recorder (#26),
reusing the existing audit-KPI reads; the resulting `totalEventsLastHour` /
`errorEventsLastHour` / `backlogTotal` series render as trends on the Audit Log
page via `KpiTrendChart`.
+1
@@ -249,3 +249,4 @@ Per-leaf alarm rendering (leaf nodes are individual conditions for native alarms
- **Notification Outbox**: Provides notification delivery KPIs and serves the `Notifications` table queries and Retry/Discard actions for the Notification Outbox page.
- **Site Call Audit**: Serves the `SiteCalls` table queries and relays Retry/Discard actions to sites for the Site Calls page.
- **Audit Log (#23)**: Serves all `AuditLog` table queries (filter / grid / drilldown / CSV export) for the new Audit Log page and the drill-in surfaces on Notifications, Site Calls, External Systems, Inbound API keys, Sites, and Instances. Payload capture, redaction, and per-site authorization follow the Audit Log component's "Payload Capture Policy" and "Security & Tamper-Evidence" sections.
- **KPI History (#26)**: The Central UI hosts the `KpiHistoryQueryService` (scoped-repository read over `IKpiHistoryRepository.GetRawSeriesAsync` + `KpiSeriesBucketer`, dual-ctor test seam) and renders the reusable custom-SVG `KpiTrendChart` fed by it. Trend sections appear on the Notification Outbox, Site Calls, and Audit Log pages and in a per-site panel on the Health Monitoring dashboard; a query failure degrades to an unavailable-chart placeholder rather than breaking the page. See [Component-KpiHistory.md](Component-KpiHistory.md).
@@ -116,3 +116,4 @@ These tiles are **point-in-time** like the Notification Outbox and Site Call Aud
- **Central UI**: Health Monitoring Dashboard displays aggregated metrics.
- **Communication Layer**: Health reports flow as periodic messages.
- **KPI History (#26)**: emits `IKpiSampleSource` (`SiteHealthKpiSampleSource`, per-Site) consumed by the KpiHistory recorder (#26). It reads the in-memory `ICentralHealthAggregator.GetAllSiteStates()` (no DB), turning the per-site snapshot — previously sequence-numbered every 30s but discarded — into trends (`connectionsUp`/`connectionsDown`, `scriptErrors`, `alarmEvalErrors`, `sfBufferDepth`, `deadLetters`, `parkedMessages`, `deployedInstances`/`enabledInstances`/`disabledInstances`, `auditBacklogPending`, `eventLogWriteFailures`) rendered in the dashboard's per-site `KpiTrendChart` panel. See [Component-KpiHistory.md](Component-KpiHistory.md).
+168
@@ -0,0 +1,168 @@
# Component: KPI History
## Purpose
The KPI History component is the central, reusable **KPI-history backbone** — a tall / EAV time-series store, a periodic recorder singleton, a bucketed query API, and a reusable custom SVG trend-chart component. It turns the system's existing point-in-time KPIs into trends, and ships those trends for the **Notification Outbox (#21)**, **Site Call Audit (#22)**, **Audit Log (#23)**, and **Site Health (#11)** sources.
This supersedes the earlier "KPI history — point-in-time only, no separate time-series store is added" stance carried by the Notification Outbox and Site Call Audit KPI sections. M6 explicitly introduces a store. It lives in **central MS SQL** — the existing HA store — so it adds **no new infrastructure dependency**: a single `KpiSample` table, an EF mapping + migration, and a central cluster singleton that samples every minute.
The backbone is deliberately source-agnostic. Each owning component contributes an `IKpiSampleSource` registered into DI; the recorder enumerates them. KPI History therefore does **not** reference every component, and every source reuses the KPI reads its owner already computes — no per-source schema or storage work.
## Location
- `src/ZB.MOM.WW.ScadaBridge.KpiHistory` — the component project: the `KpiHistoryRecorderActor`, `KpiHistoryOptions` + validator, and the DI/options wiring (`ServiceCollectionExtensions`). It owns the recorder, the options, and consumes the `IKpiSampleSource` abstraction (defined in Commons).
- **`IKpiSampleSource` implementations live with their owners**, not here — `NotificationOutboxKpiSampleSource` (in NotificationOutbox), `SiteCallAuditKpiSampleSource` (in SiteCallAudit), `AuditLogKpiSampleSource` (in AuditLog), `SiteHealthKpiSampleSource` (in HealthMonitoring). Each registers itself via `TryAddEnumerable`.
- **Commons** — the `KpiSample` POCO entity (`Entities/Kpi`), the `IKpiSampleSource` and `IKpiHistoryRepository` interfaces (`Interfaces/Kpi`), and the `KpiSources` / `KpiScopes` constant catalogs + `KpiSeriesPoint` / `KpiSeriesBucketer` types (`Types/Kpi`).
- **Configuration Database** — the EF mapping (`KpiSampleEntityTypeConfiguration`), the migration that creates the `KpiSample` table + indexes, and the `KpiHistoryRepository` implementation.
- **Central UI** — the `KpiHistoryQueryService` query service and the reusable `KpiTrendChart.razor` component, plus the trend sections embedded on four surfaces.
The recorder is a **singleton on the active central node**, consistent with the other central singletons (Notification Outbox, Site Call Audit, purge actors).
## Responsibilities
- Own the `KpiSample` table — the central tall / EAV KPI-history store in MS SQL.
- Run the recorder loop: every `SampleInterval`, enumerate all registered `IKpiSampleSource`s and persist their samples stamped with one shared tick timestamp.
- Isolate sources from one another and from the store: a failure in any one source (or in the write) is logged and skipped for that tick and never disrupts the source component or the rest of the tick (best-effort observability).
- Purge rows older than `RetentionDays` on a daily cadence (`PurgeInterval`).
- Provide a bucketed series-query API (`IKpiHistoryRepository.GetRawSeriesAsync` + `KpiSeriesBucketer`) and the Central UI query service + reusable trend chart that consume it.
KPI History is **observability, never a user-facing critical path** — neither recording nor querying may ever break a hosting page or disrupt a source component.
## Schema — `KpiSample` (tall / EAV)
A persistence-ignorant POCO in Commons; EF mapping + migration in Configuration Database; one table in central MS SQL. One row per `(Source, Metric, Scope, ScopeKey)` per recorder tick:
| Column | Type | Notes |
|---|---|---|
| `Id` | `bigint` PK identity | Surrogate key assigned by the store. |
| `Source` | `varchar(64)` | Owning source — a `KpiSources` constant: `NotificationOutbox` / `SiteCallAudit` / `AuditLog` / `SiteHealth`. |
| `Metric` | `varchar(64)` | Per-source metric name, e.g. `queueDepth`, `parkedCount`, `deadLetters` — drawn from each source's own metric catalog. |
| `Scope` | `varchar(16)` | A `KpiScopes` constant: `Global` / `Site` / `Node`. |
| `ScopeKey` | `varchar(64)` NULL | Site id (for `Site`) or node name (for `Node`); `NULL` for `Global`. |
| `Value` | `float` (`double`) | Counts are carried exactly (a `double` represents every integer up to 2^53 exactly); ages are stored as **seconds**. |
| `CapturedAtUtc` | `datetime2` | The recorder tick timestamp (UTC), shared across every sample in one tick. |
All timestamps are UTC, consistent with the system-wide convention.
Two named indexes back the access paths:
- **`IX_KpiSample_Series` (`Source`, `Metric`, `Scope`, `ScopeKey`, `CapturedAtUtc`)** — the per-series range query (one series scanned in time order).
- **`IX_KpiSample_Captured` (`CapturedAtUtc`)** — the retention purge.
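The column table and the two indexes above could map to a POCO and an EF Core configuration roughly as follows. This is a sketch under assumptions, not the shipped code: the CLR property types are inferred from the column types, `KpiSampleMappingSketch` is a hypothetical stand-in for the real `KpiSampleEntityTypeConfiguration`, and it assumes the EF Core relational package is referenced.

```csharp
#nullable enable
using System;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;

// Persistence-ignorant entity; property types are inferred from the table above.
public sealed class KpiSample
{
    public long Id { get; set; }                 // bigint identity, assigned by the store
    public string Source { get; set; } = "";     // a KpiSources constant, e.g. "AuditLog"
    public string Metric { get; set; } = "";     // e.g. "queueDepth"
    public string Scope { get; set; } = "";      // "Global" / "Site" / "Node"
    public string? ScopeKey { get; set; }        // null for Global scope
    public double Value { get; set; }            // a count, or an age in seconds
    public DateTime CapturedAtUtc { get; set; }  // shared recorder tick timestamp (UTC)
}

// Hypothetical mapping; the shipped configuration may differ in detail.
public sealed class KpiSampleMappingSketch : IEntityTypeConfiguration<KpiSample>
{
    public void Configure(EntityTypeBuilder<KpiSample> builder)
    {
        builder.ToTable("KpiSample");
        builder.HasKey(x => x.Id);

        // varchar(n) columns: bounded length, non-Unicode.
        builder.Property(x => x.Source).HasMaxLength(64).IsUnicode(false).IsRequired();
        builder.Property(x => x.Metric).HasMaxLength(64).IsUnicode(false).IsRequired();
        builder.Property(x => x.Scope).HasMaxLength(16).IsUnicode(false).IsRequired();
        builder.Property(x => x.ScopeKey).HasMaxLength(64).IsUnicode(false);

        // Per-series range query: one series scanned in time order.
        builder.HasIndex(x => new { x.Source, x.Metric, x.Scope, x.ScopeKey, x.CapturedAtUtc })
               .HasDatabaseName("IX_KpiSample_Series");

        // Retention purge by capture time.
        builder.HasIndex(x => x.CapturedAtUtc)
               .HasDatabaseName("IX_KpiSample_Captured");
    }
}
```

Putting `CapturedAtUtc` last in the series index lets a single-series window query seek on the four identifying columns and then range-scan by time.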
## Recorder — `KpiHistoryRecorderActor`
The recorder is the Akka.NET cluster singleton **`kpi-history-recorder`** (singleton-manager actor `kpi-history-recorder-singleton`), running on the active central node. It is **not readiness-gated** — the recorder is pure observability and must never gate `/health/ready`, so it is started outside the readiness barrier (unlike the operational singletons). On graceful shutdown it drains via a `CoordinatedShutdown` task for clean singleton handover.
A timer fires every `SampleInterval` (default 60s; an immediate first tick primes the series, then it settles into the periodic cadence). On each tick the recorder:
1. Opens a **per-tick DI scope** (scoped `DbContext`/repository — the same scope-per-sweep pattern as the `NotificationOutboxActor`).
2. Enumerates the registered `IEnumerable<IKpiSampleSource>`. Each source returns an `IReadOnlyList<KpiSample>` stamped with the tick's single `CapturedAtUtc`.
3. Writes all collected samples via `IKpiHistoryRepository.RecordSamplesAsync`.
**Best-effort, per-source isolation.** Each source call and the write are individually guarded. A throwing source is logged and its samples skipped for that tick; it never aborts the tick, the other sources, or the source component itself. This is the same `IEnumerable<>`-of-adapters decoupling pattern used by `INotificationDeliveryAdapter`.
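The tick steps and the isolation rule above can be sketched as a plain async method. Everything here is illustrative: the interface members `CollectAsync` and the exact `RecordSamplesAsync` signature are assumptions (the document names the types but not their signatures), the record stands in for the real `KpiSample` entity, and the actor, timer, and DI-scope plumbing are left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Stand-in types so the sketch compiles on its own; signatures are hypothetical.
public sealed record KpiSample(
    string Source, string Metric, string Scope, string? ScopeKey,
    double Value, DateTime CapturedAtUtc);

public interface IKpiSampleSource
{
    Task<IReadOnlyList<KpiSample>> CollectAsync(DateTime capturedAtUtc, CancellationToken ct);
}

public interface IKpiHistoryRepository
{
    Task RecordSamplesAsync(IReadOnlyList<KpiSample> samples, CancellationToken ct);
}

public static class RecorderTickSketch
{
    // Runs one tick and returns the number of samples written.
    public static async Task<int> RunTickAsync(
        IEnumerable<IKpiSampleSource> sources,
        IKpiHistoryRepository repository,
        DateTime tickUtc,
        Action<Exception> logFailure,
        CancellationToken ct = default)
    {
        var collected = new List<KpiSample>();

        foreach (var source in sources)
        {
            try
            {
                // Every source stamps its samples with the same tick timestamp.
                collected.AddRange(await source.CollectAsync(tickUtc, ct));
            }
            catch (Exception ex)
            {
                // A failing source is logged and skipped; the tick continues.
                logFailure(ex);
            }
        }

        if (collected.Count == 0) return 0;

        try
        {
            await repository.RecordSamplesAsync(collected, ct);
            return collected.Count;
        }
        catch (Exception ex)
        {
            // A failed write is also best-effort: log it and wait for the next tick.
            logFailure(ex);
            return 0;
        }
    }
}
```

Guarding each source call separately from the write is what keeps one misbehaving source from costing the other sources their samples for that tick.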
**Retention.** A daily purge timer (`PurgeInterval`, default 24h) deletes rows older than `RetentionDays` (default 90) via `IKpiHistoryRepository.PurgeOlderThanAsync`, reusing the existing purge-scheduler shape. Hourly/longer-range downsampling is deferred (YAGNI).
## Sample Sources
Each `IKpiSampleSource` lives in its owning component and is registered into DI with `TryAddEnumerable` (idempotent, additive). Each reuses the KPI reads its owner already performs — the Notification Outbox / Site Call Audit / Audit Log sources call their owners' existing `Compute…KpisAsync` aggregator reads; the Site Health source reads the in-memory `ICentralHealthAggregator` (no DB read). `Value` carries counts exactly and ages as seconds; all metric names below are the exact shipped strings.
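A registration along these lines would give the idempotent, additive behavior described above. It is a sketch only: the extension-method name `AddAuditLogKpiSampleSource`, the scoped lifetime, and the empty stand-in types are assumptions. `TryAddEnumerable` and `ServiceDescriptor.Scoped` are the standard `Microsoft.Extensions.DependencyInjection` APIs.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Empty stand-ins so the sketch compiles on its own; the real types carry members.
public interface IKpiSampleSource { }
public sealed class AuditLogKpiSampleSource : IKpiSampleSource { }

public static class AuditLogKpiRegistrationSketch
{
    // Hypothetical extension method living in the owning component (here: AuditLog).
    public static IServiceCollection AddAuditLogKpiSampleSource(this IServiceCollection services)
    {
        // TryAddEnumerable adds the descriptor only if this exact
        // (service, implementation) pair is not already registered, so calling
        // the method twice does not produce a duplicate source.
        // The scoped lifetime is an assumption.
        services.TryAddEnumerable(
            ServiceDescriptor.Scoped<IKpiSampleSource, AuditLogKpiSampleSource>());
        return services;
    }
}
```

Because each owner registers against the shared `IKpiSampleSource` service type, the recorder can resolve `IEnumerable<IKpiSampleSource>` without referencing any owning component.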
### `NotificationOutboxKpiSampleSource` (in NotificationOutbox)
Scopes: **Global + per-Site + per-Node** (the per-node breakdown reuses the M5 `ComputePerNodeKpisAsync`).
- `queueDepth`
- `stuckCount`
- `parkedCount`
- `deliveredLastInterval`
- `oldestPendingAgeSeconds`
### `SiteCallAuditKpiSampleSource` (in SiteCallAudit)
Scopes: **Global + per-Site + per-Node**.
- `buffered`
- `parked`
- `failedLastInterval`
- `deliveredLastInterval`
- `stuck`
- `oldestPendingAgeSeconds`
### `AuditLogKpiSampleSource` (in AuditLog)
Scope: **Global**.
- `totalEventsLastHour`
- `errorEventsLastHour`
- `backlogTotal`
### `SiteHealthKpiSampleSource` (in HealthMonitoring)
Reads `ICentralHealthAggregator.GetAllSiteStates()` (in-memory, no DB). Scope: **per-Site** — the largest latent win, since Site Health was previously sequence-numbered every 30s but its history discarded.
- `connectionsUp`
- `connectionsDown`
- `scriptErrors`
- `alarmEvalErrors`
- `sfBufferDepth`
- `deadLetters`
- `parkedMessages`
- `deployedInstances`
- `enabledInstances`
- `disabledInstances`
- `auditBacklogPending`
- `eventLogWriteFailures`
## Query + UI
### Bucketed query
`IKpiHistoryRepository.GetRawSeriesAsync(source, metric, scope, scopeKey, fromUtc, toUtc, …)` returns the raw points for one series over `[fromUtc, toUtc]`. `KpiSeriesBucketer.Bucket(raw, fromUtc, toUtc, maxPoints)` then partitions the window into ≤ `maxPoints` time buckets and returns the **last value per bucket** as `KpiSeriesPoint(BucketStartUtc, Value)`. Last-value is correct for gauge metrics; v1 ships exactly one aggregation — avg / min / max are deferred.
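A last-value-per-bucket reduction can be sketched as below. This illustrates the technique and is not the shipped `KpiSeriesBucketer`: the tuple input type, the choice to emit no point for an empty bucket, and the clamping of a sample at exactly `toUtc` into the final bucket are all assumptions.

```csharp
using System;
using System.Collections.Generic;

public readonly record struct KpiSeriesPoint(DateTime BucketStartUtc, double Value);

public static class KpiSeriesBucketerSketch
{
    public static IReadOnlyList<KpiSeriesPoint> Bucket(
        IReadOnlyList<(DateTime CapturedAtUtc, double Value)> raw,
        DateTime fromUtc, DateTime toUtc, int maxPoints)
    {
        if (maxPoints < 1) throw new ArgumentOutOfRangeException(nameof(maxPoints));

        var result = new List<KpiSeriesPoint>();
        long windowTicks = (toUtc - fromUtc).Ticks;
        if (raw.Count == 0 || windowTicks <= 0) return result;

        // Ceiling division: maxPoints buckets of this width cover the whole window.
        long bucketTicks = Math.Max(1, (windowTicks + maxPoints - 1) / maxPoints);

        // Bucket index -> latest sample seen in that bucket, kept in index order.
        var lastPerBucket = new SortedDictionary<long, (DateTime At, double Value)>();
        foreach (var (at, value) in raw)
        {
            if (at < fromUtc || at > toUtc) continue; // outside the window

            // A sample exactly at toUtc is clamped into the final bucket.
            long index = Math.Min((at - fromUtc).Ticks / bucketTicks, maxPoints - 1);

            if (!lastPerBucket.TryGetValue(index, out var current) || at >= current.At)
                lastPerBucket[index] = (at, value);
        }

        foreach (var (index, entry) in lastPerBucket)
            result.Add(new KpiSeriesPoint(fromUtc.AddTicks(index * bucketTicks), entry.Value));

        return result; // at most maxPoints points, in time order
    }
}
```

For example, four one-minute samples with values 1, 2, 3, 4 over a four-minute window and `maxPoints = 2` reduce to two points carrying 2 and 4, the last value in each two-minute bucket.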
### `KpiHistoryQueryService` (Central UI)
A scoped-repository direct read with a **dual-constructor test seam** (one ctor resolves a scoped `IKpiHistoryRepository` per call; the other accepts an injected repository for tests) — the same shape as `AuditLogQueryService`. `GetSeriesAsync` resolves the effective point cap (caller override or `KpiHistoryOptions.DefaultMaxSeriesPoints`), fetches the raw series, and reduces it via `KpiSeriesBucketer`. A query failure surfaces as an unavailable chart (em-dash / message), mirroring how the existing KPI tiles surface transient failures — it never breaks the hosting page.
### `KpiTrendChart.razor` (Central UI)
A reusable **custom inline-SVG** line/area chart — a polyline path with min/max + time-range axis labels, a responsive `viewBox`, and clean corporate styling. There is **no third-party charting library** (per the CLAUDE.md no-third-party-component-framework rule). The time window (e.g. 24h / 7d) is owned by the parent page.
### Surfaces
Trend sections render on four pages, each feeding `KpiTrendChart` from `KpiHistoryQueryService`:
- **Notification Outbox** page — outbox KPI trends.
- **Site Calls** page — cached-call KPI trends.
- **Audit Log** page — audit volume / error / backlog trends.
- **Health dashboard** — a per-site Site Health trend panel.
## Configuration — `KpiHistoryOptions`
Bound from the `ScadaBridge:KpiHistory` section on the central host (Options pattern), validated on startup by `KpiHistoryOptionsValidator`:
| Option | Default | Notes |
|---|---|---|
| `SampleInterval` | `60s` | Recorder tick cadence. Must be `> 0`. |
| `RetentionDays` | `90` | Rows older than this are purged. Bounded to `[1, 3650]` days. |
| `PurgeInterval` | `1d` | Daily purge cadence. Must be `> 0`. |
| `DefaultMaxSeriesPoints` | `200` | Default bucket cap for a series query when the caller does not override it. Bounded to `[2, 5000]`. |
Validation fails fast at startup on a non-positive `SampleInterval` / `PurgeInterval` (which would stall the recorder / purge), an out-of-range `RetentionDays` (too short loses history; too long defeats retention), or an out-of-range `DefaultMaxSeriesPoints`.
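The bounds in the table could be enforced with an `IValidateOptions<T>` implementation like the one below. This is a sketch under assumptions: the property types (`TimeSpan` for the intervals, `int` for the counts) and the error messages are guesses, and `KpiHistoryOptionsValidatorSketch` is a hypothetical stand-in for the shipped `KpiHistoryOptionsValidator`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Options;

// Defaults match the table above; the property types are assumptions.
public sealed class KpiHistoryOptions
{
    public TimeSpan SampleInterval { get; set; } = TimeSpan.FromSeconds(60);
    public int RetentionDays { get; set; } = 90;
    public TimeSpan PurgeInterval { get; set; } = TimeSpan.FromDays(1);
    public int DefaultMaxSeriesPoints { get; set; } = 200;
}

public sealed class KpiHistoryOptionsValidatorSketch : IValidateOptions<KpiHistoryOptions>
{
    public ValidateOptionsResult Validate(string? name, KpiHistoryOptions options)
    {
        var errors = new List<string>();

        // Non-positive intervals would stall the recorder or the purge.
        if (options.SampleInterval <= TimeSpan.Zero)
            errors.Add("SampleInterval must be > 0.");
        if (options.PurgeInterval <= TimeSpan.Zero)
            errors.Add("PurgeInterval must be > 0.");

        // Retention bounded to [1, 3650] days.
        if (options.RetentionDays < 1 || options.RetentionDays > 3650)
            errors.Add("RetentionDays must be within [1, 3650].");

        // Default bucket cap bounded to [2, 5000].
        if (options.DefaultMaxSeriesPoints < 2 || options.DefaultMaxSeriesPoints > 5000)
            errors.Add("DefaultMaxSeriesPoints must be within [2, 5000].");

        return errors.Count == 0
            ? ValidateOptionsResult.Success
            : ValidateOptionsResult.Fail(errors);
    }
}
```

Collecting every violation before failing means a misconfigured host reports all bad values in one startup error rather than one per restart.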
## Dependencies
- **Commons**: defines the `KpiSample` entity, the `IKpiSampleSource` and `IKpiHistoryRepository` interfaces, the `KpiSources` / `KpiScopes` catalogs, and the `KpiSeriesPoint` / `KpiSeriesBucketer` query types.
- **Configuration Database**: hosts the `KpiSample` table, its EF mapping, the migration, and the `KpiHistoryRepository` implementation.
- **Cluster Infrastructure**: hosts the `kpi-history-recorder` cluster singleton with active/standby failover.
- **Host**: binds `KpiHistoryOptions`, registers the component on the central role, and starts the recorder singleton **outside** the readiness barrier.
- **Notification Outbox / Site Call Audit / Audit Log / Health Monitoring**: each contributes an `IKpiSampleSource` and the KPI/aggregator reads it reuses. KPI History depends on the `IKpiSampleSource` abstraction, not on these components directly.
- **Central UI**: hosts `KpiHistoryQueryService` and the `KpiTrendChart` component.
## Interactions
- **Notification Outbox (#21)**: registers `NotificationOutboxKpiSampleSource` (Global / Site / Node), sampled each recorder tick; its trends render on the Notification Outbox page.
- **Site Call Audit (#22)**: registers `SiteCallAuditKpiSampleSource` (Global / Site / Node); its trends render on the Site Calls page.
- **Audit Log (#23)**: registers `AuditLogKpiSampleSource` (Global); its trends render on the Audit Log page.
- **Health Monitoring (#11)**: registers `SiteHealthKpiSampleSource` (per-Site), reading the in-memory central health aggregator; its trends render in the Health dashboard's per-site panel.
- **Central UI (#9)**: renders the reusable `KpiTrendChart` fed by `KpiHistoryQueryService` across the four trend surfaces; a query failure degrades to an unavailable-chart placeholder rather than breaking the page.
- **Cluster Infrastructure (#13)**: provides the active/standby singleton hosting for the recorder, which drains on `CoordinatedShutdown` for clean handover.
@@ -190,3 +190,4 @@ Delivery max-retry-count and retry interval are not part of `NotificationOutboxO
- **Notification Service**: Supplies delivery adapters and resolves notification lists at delivery time.
- **Central UI**: Queries the `Notifications` table for the Notification Outbox page and issues operator Retry/Discard actions on parked notifications.
- **Health Monitoring**: Polls the outbox for KPI tiles on the health dashboard.
- **KPI History (#26)**: Emits `IKpiSampleSource` (`NotificationOutboxKpiSampleSource`, Global + per-Site + per-Node) consumed by the KpiHistory recorder (#26), reusing the existing `Compute…KpisAsync` reads; the resulting `queueDepth` / `stuckCount` / `parkedCount` / `deliveredLastInterval` / `oldestPendingAgeSeconds` series render as trends on the Notification Outbox page via `KpiTrendChart`. See [Component-KpiHistory.md](Component-KpiHistory.md).
@@ -146,3 +146,9 @@ configurable window (default 365 days), matching the `Notifications` purge.
- **Health Monitoring**: surfaces Site Call Audit KPI tiles on the dashboard.
- **Cluster Infrastructure**: hosts the `SiteCallAuditActor` singleton with
active/standby failover.
- **KPI History (#26)**: emits `IKpiSampleSource`
(`SiteCallAuditKpiSampleSource`, Global + per-Site + per-Node) consumed by the
KpiHistory recorder (#26), reusing the existing KPI reads; the resulting
`buffered` / `parked` / `failedLastInterval` / `deliveredLastInterval` /
`stuck` / `oldestPendingAgeSeconds` series render as trends on the Site Calls
page via `KpiTrendChart`. See [Component-KpiHistory.md](Component-KpiHistory.md).