Component: KPI History

Purpose

The KPI History component is the central, reusable KPI-history backbone — a tall / EAV time-series store, a periodic recorder singleton, a bucketed query API, and a reusable custom SVG trend-chart component. It turns the system's existing point-in-time KPIs into trends, and ships those trends for the Notification Outbox (#21), Site Call Audit (#22), Audit Log (#23), and Site Health (#11) sources.

This supersedes the earlier "KPI history — point-in-time only, no separate time-series store is added" stance carried by the Notification Outbox and Site Call Audit KPI sections. M6 explicitly introduces a store. It lives in central MS SQL — the existing HA store — so it adds no new infrastructure dependency: a single KpiSample table, an EF mapping + migration, and a central cluster singleton that samples every minute.

The backbone is deliberately source-agnostic. Each owning component contributes an IKpiSampleSource registered into DI, and the recorder enumerates them. KPI History therefore references only the IKpiSampleSource abstraction rather than the owning components themselves, and every source reuses the KPI reads its owner already computes, so no per-source schema or storage work is needed.

Location

  • src/ZB.MOM.WW.ScadaBridge.KpiHistory: the component project, containing the KpiHistoryRecorderActor, KpiHistoryOptions + validator, and the DI/options wiring (ServiceCollectionExtensions). It owns the recorder and the options, and consumes the IKpiSampleSource abstraction (defined in Commons).
  • IKpiSampleSource implementations live with their owners, not here — NotificationOutboxKpiSampleSource (in NotificationOutbox), SiteCallAuditKpiSampleSource (in SiteCallAudit), AuditLogKpiSampleSource (in AuditLog), SiteHealthKpiSampleSource (in HealthMonitoring). Each registers itself via TryAddEnumerable.
  • Commons — the KpiSample POCO entity (Entities/Kpi), the IKpiSampleSource and IKpiHistoryRepository interfaces (Interfaces/Kpi), and the KpiSources / KpiScopes constant catalogs + KpiSeriesPoint / KpiSeriesBucketer types (Types/Kpi).
  • Configuration Database — the EF mapping (KpiSampleEntityTypeConfiguration), the migration that creates the KpiSample table + indexes, and the KpiHistoryRepository implementation.
  • Central UI: the KpiHistoryQueryService and the reusable KpiTrendChart.razor component, plus the trend sections embedded on four surfaces.

The recorder is a singleton on the active central node, consistent with the other central singletons (Notification Outbox, Site Call Audit, purge actors).

Responsibilities

  • Own the KpiSample table — the central tall / EAV KPI-history store in MS SQL.
  • Run the recorder loop: every SampleInterval, enumerate all registered IKpiSampleSources and persist their samples stamped with one shared tick timestamp.
  • Isolate sources from one another and from the store: a failure in any one source, or in the write, is logged and skipped for that tick. It never disrupts the source component or the other sources in the tick (best-effort observability).
  • Purge rows older than RetentionDays on a daily cadence (PurgeInterval).
  • Provide a bucketed series-query API (IKpiHistoryRepository.GetRawSeriesAsync + KpiSeriesBucketer) and the Central UI query service + reusable trend chart that consume it.

KPI History is observability, never a user-facing critical path — neither recording nor querying may ever break a hosting page or disrupt a source component.

Schema — KpiSample (tall / EAV)

A persistence-ignorant POCO in Commons; EF mapping + migration in Configuration Database; one table in central MS SQL. One row per (Source, Metric, Scope, ScopeKey) per recorder tick:

| Column | Type | Notes |
|---|---|---|
| Id | bigint, PK, identity | Surrogate key assigned by the store. |
| Source | varchar(64) | Owning source, a KpiSources constant: NotificationOutbox / SiteCallAudit / AuditLog / SiteHealth. |
| Metric | varchar(64) | Per-source metric name, e.g. queueDepth, parkedCount, deadLetters, drawn from each source's own metric catalog. |
| Scope | varchar(16) | A KpiScopes constant: Global / Site / Node. |
| ScopeKey | varchar(64) NULL | Site id (for Site) or node name (for Node); NULL for Global. |
| Value | float (double) | Counts are carried exactly (a double represents integers exactly up to 2^53); ages are stored as seconds. |
| CapturedAtUtc | datetime2 | The recorder tick timestamp (UTC), shared across every sample in one tick. |

All timestamps are UTC, consistent with the system-wide convention.

Two named indexes back the access paths:

  • IX_KpiSample_Series (Source, Metric, Scope, ScopeKey, CapturedAtUtc) — the per-series range query (one series scanned in time order).
  • IX_KpiSample_Captured (CapturedAtUtc) — the retention purge.
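To make the tall / EAV shape concrete, the following is a minimal sketch of one row, written in Python purely as a language-neutral illustration (the shipped entity is a .NET POCO). Field names follow the column table. The check in `__post_init__` is an assumption inferred from the ScopeKey note rather than a documented constraint, and `site-a` / `site-b` are hypothetical site ids. Id is omitted because the store assigns it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

SCOPES = ("Global", "Site", "Node")  # mirrors the KpiScopes catalog


@dataclass(frozen=True)
class KpiSample:
    source: str                # e.g. "NotificationOutbox"
    metric: str                # e.g. "queueDepth"
    scope: str                 # one of SCOPES
    scope_key: Optional[str]   # site id / node name; None for Global
    value: float               # counts, or ages in seconds
    captured_at_utc: datetime  # shared recorder tick timestamp

    def __post_init__(self) -> None:
        # Assumed invariant: Global rows carry no key, Site/Node rows must.
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope}")
        if (self.scope == "Global") != (self.scope_key is None):
            raise ValueError("scope_key must be None exactly when scope is Global")


# One metric at one tick fans out into one row per scope instance.
tick = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    KpiSample("NotificationOutbox", "queueDepth", "Global", None, 42.0, tick),
    KpiSample("NotificationOutbox", "queueDepth", "Site", "site-a", 30.0, tick),
    KpiSample("NotificationOutbox", "queueDepth", "Site", "site-b", 12.0, tick),
]
```

Adding a new metric or scope instance adds rows, not columns, which is why no per-source schema work is needed.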

Recorder — KpiHistoryRecorderActor

The recorder is the Akka.NET cluster singleton kpi-history-recorder (singleton-manager actor kpi-history-recorder-singleton), running on the active central node. It is not readiness-gated — the recorder is pure observability and must never gate /health/ready, so it is started outside the readiness barrier (unlike the operational singletons). On graceful shutdown it drains via a CoordinatedShutdown task for clean singleton handover.

A timer fires every SampleInterval (default 60s; an immediate first tick primes the series, then it settles into the periodic cadence). On each tick the recorder:

  1. Opens a per-tick DI scope (scoped DbContext/repository — the same scope-per-sweep pattern as the NotificationOutboxActor).
  2. Enumerates the registered IEnumerable<IKpiSampleSource>. Each source returns an IReadOnlyList<KpiSample> stamped with the tick's single CapturedAtUtc.
  3. Writes all collected samples via IKpiHistoryRepository.RecordSamplesAsync.

Best-effort, per-source isolation. Each source call and the write are individually guarded. A throwing source is logged and its samples skipped for that tick; it never aborts the tick, the other sources, or the source component itself. This is the same IEnumerable<>-of-adapters decoupling pattern used by INotificationDeliveryAdapter.
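The tick and its isolation rules can be sketched as follows. This is a language-neutral Python illustration, not the Akka.NET actor: `run_tick`, the `Sample` tuple, and the callable-based source shape are hypothetical stand-ins for the recorder, KpiSample, and IKpiSampleSource, and the DI scope is left out.

```python
import logging
from datetime import datetime, timezone
from typing import Callable, List, NamedTuple, Optional

log = logging.getLogger("kpi-history")


class Sample(NamedTuple):
    source: str
    metric: str
    value: float
    captured_at_utc: datetime


# A source is modelled as a callable: tick timestamp -> samples (hypothetical shape).
Source = Callable[[datetime], List[Sample]]


def run_tick(sources: List[Source],
             write: Callable[[List[Sample]], None],
             now: Optional[datetime] = None) -> int:
    """Run one recorder tick; return how many samples were written."""
    tick = now or datetime.now(timezone.utc)  # one timestamp for the whole tick
    collected: List[Sample] = []
    for source in sources:
        try:
            collected.extend(source(tick))
        except Exception:
            # A throwing source loses only its own samples for this tick.
            log.exception("KPI source failed; skipping it for this tick")
    try:
        write(collected)
    except Exception:
        # A failed write drops the tick but never propagates to the caller.
        log.exception("KPI write failed; tick dropped")
        return 0
    return len(collected)
```

With one failing and one healthy source, the healthy source's samples are still written, and a failing `write` yields 0 without raising.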

Retention. A daily purge timer (PurgeInterval, default 24h) deletes rows older than RetentionDays (default 90) via IKpiHistoryRepository.PurgeOlderThanAsync, reusing the existing purge-scheduler shape. Hourly/longer-range downsampling is deferred (YAGNI).
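The retention rule reduces to a cutoff computation, sketched below in Python over an in-memory list. The function names are hypothetical, and treating the boundary as strict (rows exactly at the cutoff are kept) is an assumption; the source only says "older than RetentionDays".

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

Row = Tuple[datetime, float]  # (captured_at_utc, value)


def purge_cutoff(now_utc: datetime, retention_days: int) -> datetime:
    """Rows captured strictly before this instant are eligible for purge."""
    return now_utc - timedelta(days=retention_days)


def purge_older_than(rows: List[Row], cutoff: datetime) -> List[Row]:
    # In SQL terms this corresponds to a delete on CapturedAtUtc < cutoff,
    # the range that the IX_KpiSample_Captured index is there to serve.
    return [r for r in rows if r[0] >= cutoff]
```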

Sample Sources

Each IKpiSampleSource lives in its owning component and is registered into DI with TryAddEnumerable (idempotent, additive). Each reuses the KPI reads its owner already performs — the Notification Outbox / Site Call Audit / Audit Log sources call their owners' existing Compute…KpisAsync aggregator reads; the Site Health source reads the in-memory ICentralHealthAggregator (no DB read). Value carries counts exactly and ages as seconds; all metric names below are the exact shipped strings.
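As a rough illustration of how a source flattens KPIs its owner has already computed into tall rows, here is a Python sketch. `OutboxKpis` is a hypothetical stand-in for the owner's KPI read result, `outbox_samples` is an invented name, and only three of the Notification Outbox metrics are shown; the metric strings themselves are the shipped ones.

```python
from datetime import datetime, timezone
from typing import Dict, List, NamedTuple, Optional


class Sample(NamedTuple):
    source: str
    metric: str
    scope: str
    scope_key: Optional[str]
    value: float
    captured_at_utc: datetime


class OutboxKpis(NamedTuple):
    """Hypothetical stand-in for the owner's existing KPI read result."""
    queue_depth: int
    parked_count: int
    oldest_pending_age_seconds: float


def outbox_samples(global_kpis: OutboxKpis,
                   per_site: Dict[str, OutboxKpis],
                   tick: datetime) -> List[Sample]:
    """Flatten already-computed KPIs into tall rows; no new queries are issued."""
    def rows(k: OutboxKpis, scope: str, key: Optional[str]) -> List[Sample]:
        metrics = {
            "queueDepth": float(k.queue_depth),          # count
            "parkedCount": float(k.parked_count),        # count
            "oldestPendingAgeSeconds": k.oldest_pending_age_seconds,  # age in s
        }
        return [Sample("NotificationOutbox", m, scope, key, v, tick)
                for m, v in metrics.items()]

    out = rows(global_kpis, "Global", None)
    for site_id, kpis in per_site.items():
        out.extend(rows(kpis, "Site", site_id))
    return out
```

Every row carries the tick timestamp passed in by the recorder, so all samples from one tick line up on the same instant.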

NotificationOutboxKpiSampleSource (in NotificationOutbox)

Scopes: Global + per-Site + per-Node (the per-node breakdown reuses the M5 ComputePerNodeKpisAsync).

  • queueDepth
  • stuckCount
  • parkedCount
  • deliveredLastInterval
  • oldestPendingAgeSeconds

SiteCallAuditKpiSampleSource (in SiteCallAudit)

Scopes: Global + per-Site + per-Node.

  • buffered
  • parked
  • failedLastInterval
  • deliveredLastInterval
  • stuck
  • oldestPendingAgeSeconds

AuditLogKpiSampleSource (in AuditLog)

Scope: Global.

  • totalEventsLastHour
  • errorEventsLastHour
  • backlogTotal

SiteHealthKpiSampleSource (in HealthMonitoring)

Reads ICentralHealthAggregator.GetAllSiteStates() (in-memory, no DB). Scope: per-Site. This is the largest latent win, since Site Health was previously sequence-numbered every 30s but its history was discarded.

  • connectionsUp
  • connectionsDown
  • scriptErrors
  • alarmEvalErrors
  • sfBufferDepth
  • deadLetters
  • parkedMessages
  • deployedInstances
  • enabledInstances
  • disabledInstances
  • auditBacklogPending
  • eventLogWriteFailures

Query + UI

Bucketed query

IKpiHistoryRepository.GetRawSeriesAsync(source, metric, scope, scopeKey, fromUtc, toUtc, …) returns the raw points for one series over [fromUtc, toUtc]. KpiSeriesBucketer.Bucket(raw, fromUtc, toUtc, maxPoints) then partitions the window into ≤ maxPoints time buckets and returns the last value per bucket as KpiSeriesPoint(BucketStartUtc, Value). Last-value is correct for gauge metrics; v1 ships exactly one aggregation — avg / min / max are deferred.
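A last-value bucketer along these lines might look like the Python sketch below. It is an illustration under stated assumptions, not the KpiSeriesBucketer implementation: equal-width buckets, a point exactly at toUtc counted in the final bucket, and empty buckets omitted are all choices the source does not specify.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

Point = Tuple[datetime, float]  # (timestamp, value)


def bucket_last_value(raw: List[Point], from_utc: datetime, to_utc: datetime,
                      max_points: int) -> List[Point]:
    """Reduce raw points to at most max_points (bucket_start, last_value) pairs."""
    if max_points < 1 or to_utc <= from_utc:
        return []
    width = (to_utc - from_utc) / max_points
    if width <= timedelta(0):  # window too narrow to split this finely
        return []
    last: Dict[int, Point] = {}  # bucket index -> latest point seen in it
    for ts, value in raw:
        if not (from_utc <= ts <= to_utc):
            continue
        # Clamp so a point exactly at to_utc lands in the final bucket.
        idx = min(int((ts - from_utc) / width), max_points - 1)
        if idx not in last or ts >= last[idx][0]:
            last[idx] = (ts, value)
    # Assumption: empty buckets are omitted rather than filled or interpolated.
    return [(from_utc + i * width, last[i][1]) for i in sorted(last)]
```

For a four-hour window with `max_points=4`, points at 00:10 and 00:50 collapse into the 00:00 bucket and only the 00:50 value survives, which is the behaviour a gauge such as queueDepth wants.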

KpiHistoryQueryService (Central UI)

A scoped-repository direct read with a dual-constructor test seam (one ctor resolves a scoped IKpiHistoryRepository per call; the other accepts an injected repository for tests) — the same shape as AuditLogQueryService. GetSeriesAsync resolves the effective point cap (caller override or KpiHistoryOptions.DefaultMaxSeriesPoints), fetches the raw series, and reduces it via KpiSeriesBucketer. A query failure surfaces as an unavailable chart (em-dash / message), mirroring how the existing KPI tiles surface transient failures — it never breaks the hosting page.
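The cap resolution and the degrade-to-unavailable behaviour can be sketched in a few lines of Python. The function names and the use of `None` to signal "unavailable" are hypothetical, and the sketch passes an override through unchanged because the source does not say how an out-of-range override is handled.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (bucket start as epoch seconds, value)

DEFAULT_MAX_SERIES_POINTS = 200  # KpiHistoryOptions.DefaultMaxSeriesPoints default


def effective_max_points(override: Optional[int],
                         default: int = DEFAULT_MAX_SERIES_POINTS) -> int:
    """Caller override wins when present; otherwise use the configured default."""
    return override if override is not None else default


def get_series(fetch_and_bucket: Callable[[int], List[Point]],
               max_points: Optional[int] = None) -> Optional[List[Point]]:
    """Return the bucketed series, or None so the chart renders as unavailable."""
    try:
        return fetch_and_bucket(effective_max_points(max_points))
    except Exception:
        # Swallowed on purpose: a failed query must not break the hosting page.
        return None
```

A caller would render the placeholder whenever `get_series` returns `None` and an empty chart when it returns an empty list, keeping "no data yet" distinct from "query failed".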

KpiTrendChart.razor (Central UI)

A reusable custom inline-SVG line/area chart — a polyline path with min/max + time-range axis labels, a responsive viewBox, and clean corporate styling. There is no third-party charting library (per the CLAUDE.md no-third-party-component-framework rule). The time window (e.g. 24h / 7d) is owned by the parent page.
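The core of such a chart is mapping series points onto viewBox coordinates. The Python sketch below shows one way to do that mapping; it is not the Razor component, the 600 x 200 viewBox is an arbitrary assumption, and the input is assumed to be sorted by time.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (time as epoch seconds, value)


def polyline_points(series: List[Point], width: float = 600.0,
                    height: float = 200.0) -> str:
    """Map a time-sorted series onto viewBox coordinates for an SVG polyline."""
    if not series:
        return ""
    t0, t1 = series[0][0], series[-1][0]
    lo = min(v for _, v in series)
    hi = max(v for _, v in series)
    t_span = (t1 - t0) or 1.0   # avoid dividing by zero for a single point
    v_span = (hi - lo) or 1.0   # a flat series is drawn along the bottom edge
    coords = []
    for t, v in series:
        x = (t - t0) / t_span * width
        y = height - (v - lo) / v_span * height  # SVG y grows downward
        coords.append(f"{x:.1f},{y:.1f}")
    return " ".join(coords)
```

The returned string is what would go into the `points` attribute of a `<polyline>`, with `lo` and `hi` doubling as the min/max axis labels.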

Surfaces

Trend sections render on four pages, each feeding KpiTrendChart from KpiHistoryQueryService:

  • Notification Outbox page — outbox KPI trends.
  • Site Calls page — cached-call KPI trends.
  • Audit Log page — audit volume / error / backlog trends.
  • Health dashboard — a per-site Site Health trend panel.

Configuration — KpiHistoryOptions

Bound from the ScadaBridge:KpiHistory section on the central host (Options pattern), validated on startup by KpiHistoryOptionsValidator:

| Option | Default | Notes |
|---|---|---|
| SampleInterval | 60s | Recorder tick cadence. Must be > 0. |
| RetentionDays | 90 | Rows older than this are purged. Bounded to [1, 3650] days. |
| PurgeInterval | 1d | Daily purge cadence. Must be > 0. |
| DefaultMaxSeriesPoints | 200 | Default bucket cap for a series query when the caller does not override it. Bounded to [2, 5000]. |

Validation fails fast at startup on a non-positive SampleInterval / PurgeInterval (which would stall the recorder / purge), an out-of-range RetentionDays (too short loses history; too long defeats retention), or an out-of-range DefaultMaxSeriesPoints.
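The validation rules translate directly into a handful of range checks. The Python sketch below mirrors the documented defaults and bounds; the class and function shapes are illustrative only and do not reproduce KpiHistoryOptionsValidator.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class KpiHistoryOptions:
    sample_interval: timedelta = timedelta(seconds=60)
    retention_days: int = 90
    purge_interval: timedelta = timedelta(days=1)
    default_max_series_points: int = 200


def validate(o: KpiHistoryOptions) -> List[str]:
    """Return every violated rule; an empty list means the options are valid."""
    errors = []
    if o.sample_interval <= timedelta(0):
        errors.append("SampleInterval must be > 0")
    if o.purge_interval <= timedelta(0):
        errors.append("PurgeInterval must be > 0")
    if not 1 <= o.retention_days <= 3650:
        errors.append("RetentionDays must be in [1, 3650]")
    if not 2 <= o.default_max_series_points <= 5000:
        errors.append("DefaultMaxSeriesPoints must be in [2, 5000]")
    return errors
```

Collecting all violations before failing lets a single startup error report every misconfigured option at once.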

Dependencies

  • Commons: defines the KpiSample entity, the IKpiSampleSource and IKpiHistoryRepository interfaces, the KpiSources / KpiScopes catalogs, and the KpiSeriesPoint / KpiSeriesBucketer query types.
  • Configuration Database: hosts the KpiSample table, its EF mapping, the migration, and the KpiHistoryRepository implementation.
  • Cluster Infrastructure: hosts the kpi-history-recorder cluster singleton with active/standby failover.
  • Host: binds KpiHistoryOptions, registers the component on the central role, and starts the recorder singleton outside the readiness barrier.
  • Notification Outbox / Site Call Audit / Audit Log / Health Monitoring: each contributes an IKpiSampleSource and the KPI/aggregator reads it reuses. KPI History depends on the IKpiSampleSource abstraction, not on these components directly.
  • Central UI: hosts KpiHistoryQueryService and the KpiTrendChart component.

Interactions

  • Notification Outbox (#21): registers NotificationOutboxKpiSampleSource (Global / Site / Node), sampled each recorder tick; its trends render on the Notification Outbox page.
  • Site Call Audit (#22): registers SiteCallAuditKpiSampleSource (Global / Site / Node); its trends render on the Site Calls page.
  • Audit Log (#23): registers AuditLogKpiSampleSource (Global); its trends render on the Audit Log page.
  • Health Monitoring (#11): registers SiteHealthKpiSampleSource (per-Site), reading the in-memory central health aggregator; its trends render in the Health dashboard's per-site panel.
  • Central UI (#9): renders the reusable KpiTrendChart fed by KpiHistoryQueryService across the four trend surfaces; a query failure degrades to an unavailable-chart placeholder rather than breaking the page.
  • Cluster Infrastructure (#13): provides the active/standby singleton hosting for the recorder, which drains on CoordinatedShutdown for clean handover.