Commit Graph

17 Commits

Author SHA1 Message Date
Joseph Doherty e14433cd64 feat(kpi): K5 — Host central wiring + KpiHistoryRecorder cluster singleton + appsettings (not readiness-gated)
Wire the M6 KPI History recorder into the central composition path:
- Program.cs: call services.AddKpiHistory(configuration) on the central-only
  branch alongside AddNotificationOutbox/AddAuditLog/AddSiteCallAudit.
- AkkaHostedService.cs: register KpiHistoryRecorderActor as a central,
  non-role-scoped ClusterSingletonManager + ClusterSingletonProxy + a
  PhaseClusterLeave CoordinatedShutdown graceful-stop drain (singleton name
  'kpi-history-recorder'), copied/adapted from the audit-log-purge block.
- appsettings.Central.json (Host + docker + docker-env2 central nodes): add a
  ScadaBridge:KpiHistory section (SampleInterval 00:01:00, RetentionDays 90,
  PurgeInterval 1.00:00:00, DefaultMaxSeriesPoints 200).

KPI history is observability/best-effort and MUST NOT gate readiness: the
recorder is deliberately NOT added to RequiredSingletonsHealthCheck or any
other readiness gate.
2026-06-17 20:20:34 -04:00
Joseph Doherty e89604298d feat(security): wire DisableLogin flag — auto-login scheme + startup warning 2026-06-16 08:47:19 -04:00
Joseph Doherty 253bec5a52 feat(host): readiness gates on required cluster singletons (#28, M2.14)
REQ-HOST-4a lists "required cluster singletons running (if applicable)" as a
readiness criterion, but /health/ready only checked database + akka-cluster.
Add a third Ready-tagged check, RequiredSingletonsHealthCheck, registered in the
Central-role AddHealthChecks() chain (so it is naturally role-scoped — site nodes
never run it).

Probe: for each required central singleton, Ask its local ClusterSingletonProxy
an Identify with a short bounded per-singleton timeout (~2s, probes run
concurrently via Task.WhenAll). A non-null ActorIdentity.Subject within the
timeout means the singleton is running and reachable through the proxy; a null
subject or a timeout means unreachable → Unhealthy, naming the unreachable
singleton(s). The check never throws (catch-all → Unhealthy) and resolves
ActorSystem lazily from DI per probe (Unhealthy if Akka not yet up).

Required-always set = the five singleton proxies created unconditionally in
AkkaHostedService.RegisterCentralActors: notification-outbox, audit-log-ingest,
site-call-audit, audit-log-purge, site-audit-reconciliation. There are no
feature/config-gated central singletons today; any future gated singleton is the
"if applicable" case and must NOT be added to the required set.

Leadership-agnostic: the proxy reaches the singleton from either central node, so
a ready standby still reports ready (readiness must not require cluster
leadership — that is the Active tier's job). During a brief singleton handover the
probe may time out and the node flaps to not-ready, which is correct (a node
mid-handover is legitimately not fully ready); no retries, to keep the probe fast.

Tests (TDD): RequiredSingletonsHealthCheckTests exercises the probe against a
TestKit ActorSystem — all proxies present+reachable → Healthy; one missing →
Unhealthy naming it; ActorSystem absent → Unhealthy, no throw. HealthCheckTests
regression-guards the Ready tag + absence of the Active tag on the new check.
2026-06-16 06:49:18 -04:00
Joseph Doherty 36a08a4145 feat(audit): start purge + reconciliation singletons; production ISiteEnumerator 2026-06-15 10:00:44 -04:00
Joseph Doherty d33617d65d fix(host): register ActorSystem as DI singleton so health-probe scopes don't dispose it (HOST-021)
Per-probe health-check child scopes were disposing the AddTransient-bridged
ActorSystem (IDisposable), terminating the live cluster node ~4s after boot and
leaving every singleton-proxy Ask to hang the full 30s QueryTimeout — the central
report pages (/notifications, /site-calls, /monitoring/health) loaded in ~30s.
Bridge it as a singleton via a new lazy AkkaHostedService.GetOrCreateActorSystem()
so child-scope disposal never touches it. Verified: 0 post-startup terminates,
healthy active/standby, report pages ~0.05s, Playwright 68 passed / 0 failed.
2026-06-05 08:26:09 -04:00
Joseph Doherty afa55981d5 feat(auth)!: ScadaBridge retire SQL Server ApiKey entity + ApprovedApiKeyIds + legacy hashing; EF migration RetireInboundApiKeyStore; re-issue runbook + CHANGELOG (re-arch C5/E) — BREAKING: X-API-Key -> Bearer sbk_, keys re-issued 2026-06-02 05:39:59 -04:00
Joseph Doherty 55099b19f6 fix(auth): move AddZbLdapAuth to Host composition root so component-lib AddSecurity() drops IConfiguration param (satisfy OptionsTests arch rule; fix pre-existing ac34dac red); behaviour-preserving 2026-06-02 03:50:16 -04:00
Joseph Doherty d09def2be0 feat(auth): ScadaBridge re-pin Auth 0.1.3 + add IInboundApiKeyAdmin seam over library admin facade (re-arch C1, additive) 2026-06-02 03:32:25 -04:00
Joseph Doherty a94558c289 feat(auth): ScadaBridge inbound API — adopt ZB.MOM.WW.Auth.ApiKeys verifier + Bearer + scope=method (re-arch A+B); additive, old path retired later 2026-06-02 02:40:18 -04:00
Joseph Doherty ac34dac479 feat(auth): cut ScadaBridge over to ZB.MOM.WW.Auth.Ldap; nest+rename Ldap config; roles+sitescope via IGroupRoleMapper (Task 1.2/1.4) 2026-06-02 01:04:34 -04:00
Joseph Doherty c41cb41c7b fix(scadabridge): default MetricsPort to 8084 (avoid site RemotingPort collision) + validate port distinctness 2026-06-01 16:46:59 -04:00
Joseph Doherty bbc9f09268 feat(scadabridge): add HTTP/1.1 metrics listener on site nodes (NodeOptions.MetricsPort=8082) 2026-06-01 16:36:59 -04:00
Joseph Doherty b3070c0bda feat(scadabridge): wire AddZbTelemetry + /metrics in both composition roots 2026-06-01 15:36:55 -04:00
Joseph Doherty bbff1d19b5 feat: adopt shared ZB.MOM.WW.Health probes; add /healthz; canonical writer 2026-06-01 13:46:49 -04:00
Joseph Doherty 2a7ff03718 feat: bridge ActorSystem into DI (transient) for shared health checks 2026-06-01 13:37:21 -04:00
Joseph Doherty c899cb162c refactor: scrub residual ScadaLink refs → ScadaBridge (env vars, config keys, assembly name, SQL login)
Renames the 13 SCADALINK_* runtime env vars → SCADABRIDGE_*, the ScadaLink__
.NET config keys → ScadaBridge__, the stale ScadaLink.Host.exe assembly name
→ ZB.MOM.WW.ScadaBridge.Host.exe, the scadalink_app SQL login → scadabridge_app,
and residual identifiers/comments/docs. Migration records (prior rename
tooling/design, DB-rename helper, this scrub script) carved out.

Adds tools/scrub-scadalink-refs.sh.
2026-05-31 21:50:38 -04:00
Joseph Doherty 7b0b9c7365 refactor: rename ScadaLink → ZB.MOM.WW.ScadaBridge (code + projects + namespaces)
Solution + 23 src projects + 26 test projects renamed; folders, csproj,
namespaces, and ScadaLinkDbContext/ScadaBridgeDbContext class updated.
ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated.
SQL roles/logins, LDAP domains, CLI command name, and CLI config dir
(~/.scadalink → ~/.scadabridge) also renamed.

Build green; 5 Host.Tests fail awaiting SQL login rename in next commit.
Pre-existing StaleTagMonitor timing flakes unchanged.

Rename script committed at tools/rename-to-scadabridge.sh.
2026-05-28 09:37:45 -04:00