Component: KPI History
Purpose
The KPI History component is the central, reusable KPI-history backbone — a tall / EAV time-series store, a periodic recorder singleton, a bucketed query API, and a reusable custom SVG trend-chart component. It turns the system's existing point-in-time KPIs into trends, and ships those trends for the Notification Outbox (#21), Site Call Audit (#22), Audit Log (#23), and Site Health (#11) sources.
This supersedes the earlier "KPI history — point-in-time only, no separate time-series store is added" stance carried by the Notification Outbox and Site Call Audit KPI sections. M6 explicitly introduces a store. It lives in central MS SQL — the existing HA store — so it adds no new infrastructure dependency: a single KpiSample table, an EF mapping + migration, and a central cluster singleton that samples every minute.
The backbone is deliberately source-agnostic. Each owning component contributes an IKpiSampleSource registered into DI; the recorder enumerates them. KPI History therefore does not reference every component, and every source reuses the KPI reads its owner already computes — no per-source schema or storage work.
Location
- src/ZB.MOM.WW.ScadaBridge.KpiHistory: the component project, containing the KpiHistoryRecorderActor, KpiHistoryOptions + validator, and the DI/options wiring (ServiceCollectionExtensions). It owns the recorder and the options, and consumes the IKpiSampleSource abstraction (defined in Commons). IKpiSampleSource implementations live with their owners, not here: NotificationOutboxKpiSampleSource (in NotificationOutbox), SiteCallAuditKpiSampleSource (in SiteCallAudit), AuditLogKpiSampleSource (in AuditLog), SiteHealthKpiSampleSource (in HealthMonitoring). Each registers itself via TryAddEnumerable.
- Commons: the KpiSample POCO entity (Entities/Kpi), the IKpiSampleSource and IKpiHistoryRepository interfaces (Interfaces/Kpi), and the KpiSources / KpiScopes constant catalogs + KpiSeriesPoint / KpiSeriesBucketer types (Types/Kpi).
- Configuration Database: the EF mapping (KpiSampleEntityTypeConfiguration), the migration that creates the KpiSample table + indexes, and the KpiHistoryRepository implementation.
- Central UI: the KpiHistoryQueryService query service and the reusable KpiTrendChart.razor component, plus the trend sections embedded on four surfaces.
The recorder is a singleton on the active central node, consistent with the other central singletons (Notification Outbox, Site Call Audit, purge actors).
Responsibilities
- Own the KpiSample table: the central tall / EAV KPI-history store in MS SQL.
- Run the recorder loop: every SampleInterval, enumerate all registered IKpiSampleSources and persist their samples stamped with one shared tick timestamp.
- Isolate sources from one another and from the store: a failure in any one source (or in the write) is logged and skipped for that tick and never disrupts the source component or the rest of the tick (best-effort observability).
- Purge rows older than RetentionDays on a daily cadence (PurgeInterval).
- Provide a bucketed series-query API (IKpiHistoryRepository.GetRawSeriesAsync + KpiSeriesBucketer) and the Central UI query service + reusable trend chart that consume it.
KPI History is observability, never a user-facing critical path — neither recording nor querying may ever break a hosting page or disrupt a source component.
Schema — KpiSample (tall / EAV)
A persistence-ignorant POCO in Commons; EF mapping + migration in Configuration Database; one table in central MS SQL. One row per (Source, Metric, Scope, ScopeKey) per recorder tick:
| Column | Type | Notes |
|---|---|---|
| Id | bigint PK identity | Surrogate key assigned by the store. |
| Source | varchar(64) | Owning source, a KpiSources constant: NotificationOutbox / SiteCallAudit / AuditLog / SiteHealth. |
| Metric | varchar(64) | Per-source metric name, e.g. queueDepth, parkedCount, deadLetters, drawn from each source's own metric catalog. |
| Scope | varchar(16) | A KpiScopes constant: Global / Site / Node. |
| ScopeKey | varchar(64) NULL | Site id (for Site) or node name (for Node); NULL for Global. |
| Value | float (double) | Counts carried exactly within range; ages stored as seconds. |
| CapturedAtUtc | datetime2 | The recorder tick timestamp (UTC), shared across every sample in one tick. |
All timestamps are UTC, consistent with the system-wide convention.
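As a rough illustration of the row shape, the entity might look like the POCO below. The property names follow the column table; everything else (mutable setters, default values, nullability annotations) is an assumption rather than the shipped definition.

```csharp
#nullable enable
using System;

// Hypothetical shape of the persistence-ignorant entity. Property names mirror
// the KpiSample columns; the class layout itself is an assumption.
public sealed class KpiSample
{
    public long Id { get; set; }                 // bigint identity, assigned by the store
    public string Source { get; set; } = "";     // a KpiSources constant
    public string Metric { get; set; } = "";     // per-source metric name
    public string Scope { get; set; } = "";      // a KpiScopes constant
    public string? ScopeKey { get; set; }        // site id or node name; null for Global
    public double Value { get; set; }            // a count, or an age in seconds
    public DateTime CapturedAtUtc { get; set; }  // shared recorder tick timestamp (UTC)
}
```

A double represents integers exactly up to 2^53, which is the "within range" bound for counts stored in Value.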
Two named indexes back the access paths:
- IX_KpiSample_Series (Source, Metric, Scope, ScopeKey, CapturedAtUtc): the per-series range query (one series scanned in time order).
- IX_KpiSample_Captured (CapturedAtUtc): the retention purge.
Recorder — KpiHistoryRecorderActor
The recorder is the Akka.NET cluster singleton kpi-history-recorder (singleton-manager actor kpi-history-recorder-singleton), running on the active central node. It is not readiness-gated — the recorder is pure observability and must never gate /health/ready, so it is started outside the readiness barrier (unlike the operational singletons). On graceful shutdown it drains via a CoordinatedShutdown task for clean singleton handover.
A timer fires every SampleInterval (default 60s; an immediate first tick primes the series, then it settles into the periodic cadence). On each tick the recorder:
- Opens a per-tick DI scope (scoped DbContext/repository, the same scope-per-sweep pattern as the NotificationOutboxActor).
- Enumerates the registered IEnumerable<IKpiSampleSource>. Each source returns an IReadOnlyList<KpiSample> stamped with the tick's single CapturedAtUtc.
- Writes all collected samples via IKpiHistoryRepository.RecordSamplesAsync.
Best-effort, per-source isolation. Each source call and the write are individually guarded. A throwing source is logged and its samples skipped for that tick; it never aborts the tick, the other sources, or the source component itself. This is the same IEnumerable<>-of-adapters decoupling pattern used by INotificationDeliveryAdapter.
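The isolation rule can be sketched in plain C# without the actor or DI plumbing. This is a minimal sketch, not the recorder's actual code: the CollectAsync method name and signature, the RecorderTick and DelegateSource helpers, and the write / logFailure delegates (standing in for RecordSamplesAsync and the logger) are all assumptions made for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Trimmed stand-ins for the Commons types; member signatures are assumptions.
public sealed record KpiSample(string Source, string Metric, string Scope,
                               string? ScopeKey, double Value, DateTime CapturedAtUtc);

public interface IKpiSampleSource
{
    Task<IReadOnlyList<KpiSample>> CollectAsync(DateTime capturedAtUtc, CancellationToken ct);
}

// Hypothetical adapter so a source can be expressed as a lambda in examples.
public sealed class DelegateSource : IKpiSampleSource
{
    private readonly Func<DateTime, IReadOnlyList<KpiSample>> _read;
    public DelegateSource(Func<DateTime, IReadOnlyList<KpiSample>> read) => _read = read;
    public Task<IReadOnlyList<KpiSample>> CollectAsync(DateTime capturedAtUtc, CancellationToken ct)
        => Task.FromResult(_read(capturedAtUtc));
}

public static class RecorderTick
{
    // Runs one tick. Each source call and the final write are guarded separately,
    // so a failure costs only that source's samples (or that tick's write).
    // Returns the number of samples handed to a successful write.
    public static async Task<int> RunAsync(
        IEnumerable<IKpiSampleSource> sources,
        Func<IReadOnlyList<KpiSample>, CancellationToken, Task> write,
        Action<Exception> logFailure,
        DateTime tickUtc,
        CancellationToken ct = default)
    {
        var collected = new List<KpiSample>();
        foreach (var source in sources)
        {
            try
            {
                collected.AddRange(await source.CollectAsync(tickUtc, ct));
            }
            catch (Exception ex)
            {
                logFailure(ex); // skip this source for this tick only
            }
        }

        if (collected.Count == 0) return 0;

        try
        {
            await write(collected, ct);
            return collected.Count;
        }
        catch (Exception ex)
        {
            logFailure(ex); // a failed write drops this tick's samples, nothing more
            return 0;
        }
    }
}
```

With one throwing source and one healthy source, the healthy source's samples are still written and the failure is logged once; the next tick starts from a clean slate.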
Retention. A daily purge timer (PurgeInterval, default 24h) deletes rows older than RetentionDays (default 90) via IKpiHistoryRepository.PurgeOlderThanAsync, reusing the existing purge-scheduler shape. Hourly/longer-range downsampling is deferred (YAGNI).
Sample Sources
Each IKpiSampleSource lives in its owning component and is registered into DI with TryAddEnumerable (idempotent, additive). Each reuses the KPI reads its owner already performs — the Notification Outbox / Site Call Audit / Audit Log sources call their owners' existing Compute…KpisAsync aggregator reads; the Site Health source reads the in-memory ICentralHealthAggregator (no DB read). Value carries counts exactly and ages as seconds; all metric names below are the exact shipped strings.
NotificationOutboxKpiSampleSource (in NotificationOutbox)
Scopes: Global + per-Site + per-Node (the per-node breakdown reuses the M5 ComputePerNodeKpisAsync).
- queueDepth
- stuckCount
- parkedCount
- deliveredLastInterval
- oldestPendingAgeSeconds
SiteCallAuditKpiSampleSource (in SiteCallAudit)
Scopes: Global + per-Site + per-Node.
- buffered
- parked
- failedLastInterval
- deliveredLastInterval
- stuck
- oldestPendingAgeSeconds
AuditLogKpiSampleSource (in AuditLog)
Scope: Global.
- totalEventsLastHour
- errorEventsLastHour
- backlogTotal
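To make the "reuse the owner's existing KPI read" idea concrete, the sketch below maps an already-computed Audit Log KPI snapshot onto three Global samples. The AuditLogKpis record and the AuditLogKpiSamples helper are hypothetical; only the metric strings and the Global scope come from this section.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Trimmed stand-in for the Commons entity (shape is an assumption).
public sealed record KpiSample(string Source, string Metric, string Scope,
                               string? ScopeKey, double Value, DateTime CapturedAtUtc);

// Hypothetical snapshot of what the owner's existing KPI read already returns.
public sealed record AuditLogKpis(long TotalEventsLastHour, long ErrorEventsLastHour, long BacklogTotal);

public static class AuditLogKpiSamples
{
    // Turns one KPI snapshot into Global-scope samples, all stamped with the
    // same tick timestamp. No extra query is issued; the snapshot is reused.
    public static IReadOnlyList<KpiSample> FromKpis(AuditLogKpis kpis, DateTime capturedAtUtc)
    {
        KpiSample Global(string metric, long count) =>
            new("AuditLog", metric, "Global", null, count, capturedAtUtc);

        return new[]
        {
            Global("totalEventsLastHour", kpis.TotalEventsLastHour),
            Global("errorEventsLastHour", kpis.ErrorEventsLastHour),
            Global("backlogTotal", kpis.BacklogTotal),
        };
    }
}
```

The per-Site and per-Node sources would follow the same pattern, setting Scope and ScopeKey per row instead of leaving ScopeKey null.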
SiteHealthKpiSampleSource (in HealthMonitoring)
Reads ICentralHealthAggregator.GetAllSiteStates() (in-memory, no DB). Scope: per-Site. This is the largest latent win, since Site Health was previously sequence-numbered every 30s but its history was discarded.
- connectionsUp
- connectionsDown
- scriptErrors
- alarmEvalErrors
- sfBufferDepth
- deadLetters
- parkedMessages
- deployedInstances
- enabledInstances
- disabledInstances
- auditBacklogPending
- eventLogWriteFailures
Query + UI
Bucketed query
IKpiHistoryRepository.GetRawSeriesAsync(source, metric, scope, scopeKey, fromUtc, toUtc, …) returns the raw points for one series over [fromUtc, toUtc]. KpiSeriesBucketer.Bucket(raw, fromUtc, toUtc, maxPoints) then partitions the window into ≤ maxPoints time buckets and returns the last value per bucket as KpiSeriesPoint(BucketStartUtc, Value). Last-value is correct for gauge metrics; v1 ships exactly one aggregation — avg / min / max are deferred.
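Last-value bucketing can be sketched as follows. This is an illustrative stand-in, not the shipped KpiSeriesBucketer: the RawPoint type, the equal-width bucket layout, the clamping of a point exactly at toUtc into the final bucket, and the choice to omit empty buckets are all assumptions.

```csharp
using System;
using System.Collections.Generic;

public readonly record struct RawPoint(DateTime CapturedAtUtc, double Value); // hypothetical raw row
public readonly record struct KpiSeriesPoint(DateTime BucketStartUtc, double Value);

public static class LastValueBucketer
{
    // Splits [fromUtc, toUtc] into at most maxPoints equal-width buckets and keeps
    // the latest raw value in each non-empty bucket. Assumes raw is sorted
    // ascending by CapturedAtUtc, as a time-ordered index scan would return it.
    public static IReadOnlyList<KpiSeriesPoint> Bucket(
        IReadOnlyList<RawPoint> raw, DateTime fromUtc, DateTime toUtc, int maxPoints)
    {
        if (maxPoints < 1) throw new ArgumentOutOfRangeException(nameof(maxPoints));
        if (toUtc <= fromUtc) throw new ArgumentException("Window must be non-empty.", nameof(toUtc));

        long windowTicks = (toUtc - fromUtc).Ticks;
        // Ceiling division, so maxPoints buckets always cover the whole window.
        long widthTicks = Math.Max(1, (windowTicks + maxPoints - 1) / maxPoints);

        var lastPerBucket = new SortedDictionary<long, double>();
        foreach (var p in raw)
        {
            if (p.CapturedAtUtc < fromUtc || p.CapturedAtUtc > toUtc) continue;
            // A point exactly at toUtc is clamped into the final bucket.
            long index = Math.Min((p.CapturedAtUtc - fromUtc).Ticks / widthTicks, maxPoints - 1);
            lastPerBucket[index] = p.Value; // later points overwrite earlier ones
        }

        var result = new List<KpiSeriesPoint>(lastPerBucket.Count);
        foreach (var (index, value) in lastPerBucket)
            result.Add(new KpiSeriesPoint(fromUtc.AddTicks(index * widthTicks), value));
        return result;
    }
}
```

For a one-hour window with maxPoints = 4, each bucket is 15 minutes wide; two samples at 00:01 and 00:10 collapse into a single point carrying the 00:10 value.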
KpiHistoryQueryService (Central UI)
A scoped-repository direct read with a dual-constructor test seam (one ctor resolves a scoped IKpiHistoryRepository per call; the other accepts an injected repository for tests) — the same shape as AuditLogQueryService. GetSeriesAsync resolves the effective point cap (caller override or KpiHistoryOptions.DefaultMaxSeriesPoints), fetches the raw series, and reduces it via KpiSeriesBucketer. A query failure surfaces as an unavailable chart (em-dash / message), mirroring how the existing KPI tiles surface transient failures — it never breaks the hosting page.
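The cap resolution and the degrade-on-failure behaviour might be sketched like this. The SeriesQuery helper and its fetch delegate are hypothetical stand-ins for the repository read plus bucketing; returning null to signal "unavailable" is an assumption about how the failure is represented.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SeriesQuery
{
    // Resolves the effective point cap (caller override, else the configured
    // default), runs the read, and returns null instead of throwing so the
    // chart can render an "unavailable" placeholder.
    public static async Task<IReadOnlyList<T>?> TryGetAsync<T>(
        Func<int, Task<IReadOnlyList<T>>> fetch,
        int? maxPointsOverride,
        int defaultMaxPoints)
    {
        int cap = maxPointsOverride ?? defaultMaxPoints;
        try
        {
            return await fetch(cap);
        }
        catch (Exception)
        {
            return null; // the hosting page keeps rendering; only the chart degrades
        }
    }
}
```

A real service would also log the swallowed exception; that is left out here to keep the sketch short.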
KpiTrendChart.razor (Central UI)
A reusable custom inline-SVG line/area chart — a polyline path with min/max + time-range axis labels, a responsive viewBox, and clean corporate styling. There is no third-party charting library (per the CLAUDE.md no-third-party-component-framework rule). The time window (e.g. 24h / 7d) is owned by the parent page.
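The core of such a chart is mapping (time, value) pairs into viewBox coordinates. The helper below is a sketch of that mapping only; the TrendPath name, the tuple-based input, and the flat-series handling are assumptions, and the actual Razor component may compute its path differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TrendPath
{
    // Maps (time, value) points onto a width x height viewBox and returns the
    // "x,y x,y ..." string for an SVG <polyline points="...">.
    public static string ToPolylinePoints(
        IReadOnlyList<(DateTime T, double V)> points,
        DateTime fromUtc, DateTime toUtc, double width, double height)
    {
        if (points.Count == 0) return string.Empty;

        double min = points.Min(p => p.V);
        double max = points.Max(p => p.V);
        double range = max - min;
        double span = (toUtc - fromUtc).TotalSeconds;

        return string.Join(" ", points.Select(p =>
        {
            double x = span <= 0 ? 0 : (p.T - fromUtc).TotalSeconds / span * width;
            // SVG y grows downward, so the maximum value maps to y = 0.
            // A flat series (range 0) is drawn along the vertical middle.
            double y = range == 0 ? height / 2 : (max - p.V) / range * height;
            // Invariant culture keeps "." as the decimal separator in the markup.
            return FormattableString.Invariant($"{x:0.##},{y:0.##}");
        }));
    }
}
```

Because the output is expressed in viewBox units, the same string scales with the container when the SVG element is sized responsively.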
Surfaces
Trend sections render on four pages, each feeding KpiTrendChart from KpiHistoryQueryService:
- Notification Outbox page — outbox KPI trends.
- Site Calls page — cached-call KPI trends.
- Audit Log page — audit volume / error / backlog trends.
- Health dashboard — a per-site Site Health trend panel.
Configuration — KpiHistoryOptions
Bound from the ScadaBridge:KpiHistory section on the central host (Options pattern), validated on startup by KpiHistoryOptionsValidator:
| Option | Default | Notes |
|---|---|---|
| SampleInterval | 60s | Recorder tick cadence. Must be > 0. |
| RetentionDays | 90 | Rows older than this are purged. Bounded to [1, 3650] days. |
| PurgeInterval | 1d | Daily purge cadence. Must be > 0. |
| DefaultMaxSeriesPoints | 200 | Default bucket cap for a series query when the caller does not override it. Bounded to [2, 5000]. |
Validation fails fast at startup on a non-positive SampleInterval / PurgeInterval (which would stall the recorder / purge), an out-of-range RetentionDays (too short loses history; too long defeats retention), or an out-of-range DefaultMaxSeriesPoints.
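The rules in the table can be written down as a plain validation function. This is a sketch under assumptions: the property types (TimeSpan and int) and the KpiHistoryOptionsRules helper are illustrative, and the shipped KpiHistoryOptionsValidator presumably plugs into the Options pattern rather than returning a list of strings.

```csharp
using System;
using System.Collections.Generic;

// Assumed option shape; defaults follow the table above.
public sealed class KpiHistoryOptions
{
    public TimeSpan SampleInterval { get; set; } = TimeSpan.FromSeconds(60);
    public int RetentionDays { get; set; } = 90;
    public TimeSpan PurgeInterval { get; set; } = TimeSpan.FromDays(1);
    public int DefaultMaxSeriesPoints { get; set; } = 200;
}

public static class KpiHistoryOptionsRules
{
    // Returns one message per violated rule; an empty list means the options are valid.
    public static IReadOnlyList<string> Validate(KpiHistoryOptions o)
    {
        var errors = new List<string>();
        if (o.SampleInterval <= TimeSpan.Zero)
            errors.Add("SampleInterval must be > 0.");
        if (o.PurgeInterval <= TimeSpan.Zero)
            errors.Add("PurgeInterval must be > 0.");
        if (o.RetentionDays < 1 || o.RetentionDays > 3650)
            errors.Add("RetentionDays must be within [1, 3650].");
        if (o.DefaultMaxSeriesPoints < 2 || o.DefaultMaxSeriesPoints > 5000)
            errors.Add("DefaultMaxSeriesPoints must be within [2, 5000].");
        return errors;
    }
}
```

Failing startup when this list is non-empty gives the fail-fast behaviour described above, instead of a recorder that silently never ticks.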
Dependencies
- Commons: defines the KpiSample entity, the IKpiSampleSource and IKpiHistoryRepository interfaces, the KpiSources / KpiScopes catalogs, and the KpiSeriesPoint / KpiSeriesBucketer query types.
- Configuration Database: hosts the KpiSample table, its EF mapping, the migration, and the KpiHistoryRepository implementation.
- Cluster Infrastructure: hosts the kpi-history-recorder cluster singleton with active/standby failover.
- Host: binds KpiHistoryOptions, registers the component on the central role, and starts the recorder singleton outside the readiness barrier.
- Notification Outbox / Site Call Audit / Audit Log / Health Monitoring: each contributes an IKpiSampleSource and the KPI/aggregator reads it reuses. KPI History depends on the IKpiSampleSource abstraction, not on these components directly.
- Central UI: hosts KpiHistoryQueryService and the KpiTrendChart component.
Interactions
- Notification Outbox (#21): registers NotificationOutboxKpiSampleSource (Global / Site / Node), sampled each recorder tick; its trends render on the Notification Outbox page.
- Site Call Audit (#22): registers SiteCallAuditKpiSampleSource (Global / Site / Node); its trends render on the Site Calls page.
- Audit Log (#23): registers AuditLogKpiSampleSource (Global); its trends render on the Audit Log page.
- Health Monitoring (#11): registers SiteHealthKpiSampleSource (per-Site), reading the in-memory central health aggregator; its trends render in the Health dashboard's per-site panel.
- Central UI (#9): renders the reusable KpiTrendChart fed by KpiHistoryQueryService across the four trend surfaces; a query failure degrades to an unavailable-chart placeholder rather than breaking the page.
- Cluster Infrastructure (#13): provides the active/standby singleton hosting for the recorder, which drains on CoordinatedShutdown for clean handover.