ScadaBridge/docs/plans/2026-06-15-stillpending-m2-implementation.md.tasks.json
Joseph Doherty 253bec5a52 feat(host): readiness gates on required cluster singletons (#28, M2.14)
REQ-HOST-4a lists "required cluster singletons running (if applicable)" as a
readiness criterion, but /health/ready only checked database + akka-cluster.
Add a third Ready-tagged check, RequiredSingletonsHealthCheck, registered in the
Central-role AddHealthChecks() chain (so it is naturally role-scoped — site nodes
never run it).

Probe: for each required central singleton, send an Identify to its local
ClusterSingletonProxy via Ask, with a short bounded per-singleton timeout (~2s;
probes run concurrently via Task.WhenAll). A non-null ActorIdentity.Subject within
the timeout means the singleton is running and reachable through the proxy; a null
subject or a timeout means unreachable → Unhealthy, naming the unreachable
singleton(s). The check never throws (catch-all → Unhealthy) and resolves the
ActorSystem lazily from DI on each probe (Unhealthy if Akka is not yet up).
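
The probe logic described above can be modelled in a few lines. This is a language-neutral sketch in Python, not the Akka.NET implementation: `identify` is a hypothetical async callable standing in for "Ask the proxy an Identify", `asyncio.gather` plays the role of Task.WhenAll, and the result is a plain `(status, description)` tuple rather than a real health-check result type. The singleton names and the ~2s timeout come from the commit message.

```python
import asyncio

# Required-always set, as listed in the commit message.
REQUIRED_SINGLETONS = [
    "notification-outbox",
    "audit-log-ingest",
    "site-call-audit",
    "audit-log-purge",
    "site-audit-reconciliation",
]

PROBE_TIMEOUT_SECONDS = 2.0  # "~2s" per the commit message


async def probe_one(name, identify, timeout):
    """Return True only if identify(name) yields a non-None subject in time.

    `identify` is a hypothetical stand-in for asking the singleton proxy
    for its identity; it is not a real Akka.NET API.
    """
    try:
        subject = await asyncio.wait_for(identify(name), timeout)
        return subject is not None
    except Exception:
        # A timeout or any other failure counts as unreachable; never raise.
        return False


async def check_required_singletons(
    identify, names=REQUIRED_SINGLETONS, timeout=PROBE_TIMEOUT_SECONDS
):
    """Probe all singletons concurrently and name the unreachable ones."""
    if identify is None:
        # Models "ActorSystem absent": report Unhealthy instead of throwing.
        return ("Unhealthy", "actor system not available")
    results = await asyncio.gather(
        *(probe_one(n, identify, timeout) for n in names)
    )
    unreachable = [n for n, ok in zip(names, results) if not ok]
    if unreachable:
        return ("Unhealthy", "unreachable singletons: " + ", ".join(unreachable))
    return ("Healthy", "all required singletons reachable")
```

Because every probe has its own timeout and the probes run concurrently, the whole check is bounded by roughly one timeout interval rather than five, which is what keeps the readiness endpoint fast with no retries.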

Required-always set = the five singleton proxies created unconditionally in
AkkaHostedService.RegisterCentralActors: notification-outbox, audit-log-ingest,
site-call-audit, audit-log-purge, site-audit-reconciliation. There are no
feature/config-gated central singletons today; any future gated singleton is the
"if applicable" case and must NOT be added to the required set.

Leadership-agnostic: the proxy reaches the singleton from either central node, so
a ready standby still reports ready (readiness must not require cluster
leadership — that is the Active tier's job). During a brief singleton handover the
probe may time out and the node flaps to not-ready, which is correct (a node
mid-handover is legitimately not fully ready); no retries, to keep the probe fast.

Tests (TDD): RequiredSingletonsHealthCheckTests exercises the probe against a
TestKit ActorSystem — all proxies present+reachable → Healthy; one missing →
Unhealthy naming it; ActorSystem absent → Unhealthy, no throw. HealthCheckTests
regression-guards the Ready tag + absence of the Active tag on the new check.
2026-06-16 06:49:18 -04:00

{
  "planPath": "docs/plans/2026-06-15-stillpending-m2-implementation.md",
  "tasks": [
    {"id": 32, "ref": "M2.0", "subject": "M2.0 #32: EF model/snapshot drift (PendingModelChangesWarning)", "class": "high-risk", "status": "completed", "commits": ["2fb608f"]},
    {"id": 33, "ref": "M2.1", "subject": "M2.1 #22: native-alarm capability validation wired into deploy pipeline", "class": "standard", "status": "completed", "commits": ["d690920", "41d828e"]},
    {"id": 34, "ref": "M2.2", "subject": "M2.2 #10: connection-level diff surfaced in deployment diff", "class": "standard", "status": "completed", "commits": ["e9a84ba", "198770f"]},
    {"id": 35, "ref": "M2.3", "subject": "M2.3 #7: Database.CachedWrite transient/permanent SQL classification", "class": "high-risk", "status": "completed", "commits": ["d052706", "de375ff"]},
    {"id": 36, "ref": "M2.4", "subject": "M2.4 #8: alarm conditionFilter applied (OPC UA WhereClause + client routing)", "class": "high-risk", "status": "completed", "commits": ["8825df5", "00304a2"]},
    {"id": 37, "ref": "M2.5", "subject": "M2.5 #9: per-script execution timeout (entity+migration+flatten+actor)", "class": "standard", "status": "completed", "blockedBy": [32], "commits": ["3edef09", "3032faa"]},
    {"id": 38, "ref": "M2.6", "subject": "M2.6 #13: nested Object/List extended-type validation", "class": "standard", "status": "completed", "commits": ["4b6187c", "411d0c0"]},
    {"id": 39, "ref": "M2.7", "subject": "M2.7 #20+#21: return-type + argument-type compatibility checks", "class": "standard", "status": "completed", "commits": ["958229e", "a8e9e99"]},
    {"id": 40, "ref": "M2.8", "subject": "M2.8 #23: binding-completeness Error + name-exists-at-site", "class": "standard", "status": "completed", "commits": ["7c14a69", "21b801b"]},
    {"id": 41, "ref": "M2.9", "subject": "M2.9 #17: MachineDataDb fail-fast (reverts Host-008)", "class": "small", "status": "completed", "commits": ["76198b3"]},
    {"id": 42, "ref": "M2.10", "subject": "M2.10 #18: CI grep-guard against UPDATE/DELETE on AuditLog", "class": "small", "status": "completed", "commits": ["e7b6fe3", "9cd62aa"]},
    {"id": 43, "ref": "M2.11", "subject": "M2.11 #24: debug snapshot unknown-instance returns error", "class": "small", "status": "completed", "commits": ["dbf44b9", "d160c7f"]},
    {"id": 44, "ref": "M2.12", "subject": "M2.12 #25: recursion-limit error to site event log", "class": "small", "status": "completed", "commits": ["f08038d", "e2b31a9"]},
    {"id": 45, "ref": "M2.13", "subject": "M2.13 #27: populate obtainable OPC UA/MxGateway transition fields", "class": "small", "status": "completed", "commits": ["722b866", "3945789"]},
    {"id": 46, "ref": "M2.14", "subject": "M2.14 #28: readiness gate checks required cluster singletons", "class": "standard", "status": "completed", "commits": ["a4d81fa"]},
    {"id": 47, "ref": "M2.15", "subject": "M2.15 #29: register site active-node purge gate (DI)", "class": "small", "status": "pending"},
    {"id": 48, "ref": "M2.16", "subject": "M2.16 #30: Health Monitoring consumes FailedWriteCount", "class": "small", "status": "pending"},
    {"id": 49, "ref": "M2.17", "subject": "M2.17 #31: reconcile StateTransitionValidator delete-from-NotDeployed", "class": "small", "status": "pending"},
    {"id": 50, "ref": "M2.18", "subject": "M2.18 #26: debug-stream stream-first ordering + replay/dedup", "class": "high-risk", "status": "pending"},
    {"id": 51, "ref": "M2.19", "subject": "M2.19 #15: LDAP periodic re-query for interactive sessions (spike+impl)", "class": "high-risk", "status": "pending"}
  ],
  "deferred": [
    {"ref": "#16", "subject": "Transport stale-instance enumeration", "to": "M8 (Transport)"},
    {"ref": "#19", "subject": "script started/completed events", "status": "done in M1.8"}
  ],
  "followups": [
    {"id": 52, "subject": "Investigate 2 partition-purge E2E test failures (AuditLogPurgeActor/PartitionPurge)", "from": "M2.0", "status": "pending"},
    {"id": 53, "subject": "Dedup alarm-capable protocol predicate (3 copies → AlarmCapableProtocols)", "from": "M2.1", "status": "pending"},
    {"id": 54, "subject": "Expose ExecutionTimeoutSeconds (+ MinTimeBetweenRuns) in CLI + UI script authoring", "from": "M2.5", "status": "pending"}
  ],
  "lastUpdated": "2026-06-15"
}
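
A file of this shape is easy to query from a script. The following is a minimal sketch, assuming that `blockedBy` lists the `id`s of tasks that must be completed first (the file does not state this explicitly); `summarize` and `SAMPLE` are hypothetical names, and `SAMPLE` is a trimmed three-task excerpt of the data above, not the full file.

```python
import json
from collections import Counter


def summarize(plan):
    """Return (status counts, refs of pending tasks with no open blockers).

    Assumes `blockedBy` holds ids of tasks that must reach "completed"
    before the blocked task can start.
    """
    tasks = plan["tasks"]
    status_by_id = {t["id"]: t["status"] for t in tasks}
    counts = Counter(t["status"] for t in tasks)
    ready = [
        t["ref"]
        for t in tasks
        if t["status"] == "pending"
        and all(
            status_by_id.get(dep) == "completed"
            for dep in t.get("blockedBy", [])
        )
    ]
    return dict(counts), ready


# Trimmed excerpt of the tasks file, with subjects and commits omitted.
SAMPLE = json.loads("""
{
  "tasks": [
    {"id": 32, "ref": "M2.0", "class": "high-risk", "status": "completed"},
    {"id": 37, "ref": "M2.5", "class": "standard", "status": "completed", "blockedBy": [32]},
    {"id": 47, "ref": "M2.15", "class": "small", "status": "pending"}
  ]
}
""")

counts, ready = summarize(SAMPLE)
print(counts, ready)
```

Running the same function over the full file would list every pending task in the M2 plan, since none of the pending entries carries a `blockedBy` field.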