Joseph Doherty 07d5907258 docs(health): resolve spec/contract/gaps consistency (review fixes)
Applies canonical resolutions for eight settled decisions:
- GAPS: remove three stale "Decisions still open" bullets (#1 IActiveNodeGate placement, #2 GrpcChannel type, #3 OtOpcUaCompat named constant)
- Shared contract: AkkaClusterHealthCheck, ActiveNodeHealthCheck constructors take IServiceProvider (lazy ActorSystem, Degraded-when-not-ready)
- Shared contract: AkkaActiveNodeGate takes IServiceProvider; reads SelfMember+leader directly, null-guarded; does not proxy ActiveNodeHealthCheck
- Shared contract: DatabaseHealthCheckOptions.Probe renamed to ProbeQuery; consumer matrix updated
- Shared contract: settled AddZbHealthChecks open question removed (spec §5 is per-project AddHealthChecks)
- SPEC §2.2: OtOpcUaCompat Leaving/Exiting cell updated from — to Degraded + footnote; §2.3 startup-safety note added
- README: status line corrected from "built and tested" to "scaffolded … implementation is follow-on (task #7)"; IActiveNodeGate "left per-project" bullet removed
- OtOpcUa current-state: AddZbHealthChecks → AddHealthChecks().AddCheck<...>(); IClusterRoleInfo note reframed as accepted trade-off
- ScadaBridge current-state: IActiveNodeGate bullet rewritten — interface moves to ZB.MOM.WW.Health on adoption, InboundApiEndpointFilter references shared interface
2026-06-01 06:33:42 -04:00


# Health (readiness / liveness / active-node)
Second normalized component under the operability cluster. **Goal: path to shared code** — converge
the three sister projects onto a common three-tier health endpoint convention and a set of shared
probe implementations, proposed as the `ZB.MOM.WW.Health` library set (3 packages), while each
project keeps its own probe registration and orchestrator wiring.
- The one target: [`spec/SPEC.md`](spec/SPEC.md)
- The proposed shared library: [`shared-contract/ZB.MOM.WW.Health.md`](shared-contract/ZB.MOM.WW.Health.md)
- Divergences + backlog: [`GAPS.md`](GAPS.md)
- Current state, per project: [`current-state/`](current-state/)
## Why health is a strong normalization candidate
Both OtOpcUa and ScadaBridge trace their health-check structure to the same "ScadaLink three-tier
pattern" (`HealthEndpoints.cs:13` says so explicitly) but have already diverged in probe logic,
status semantics, response writer, and endpoint registration style. MxAccessGateway has no shared
ancestry here — it has a single hardcoded `/health/live` endpoint with no real probes at all.
The common core (three tiers, database probe, Akka cluster probe, active-node probe) is implemented
separately in OtOpcUa and ScadaBridge and is missing from MxAccessGateway. Shared probe
implementations with configurable policies close the gap without forcing identical behavior onto
projects with legitimately different cluster semantics.
## Status by project
| Project | Endpoints today | Probes today | Response writer | `/healthz` | `IActiveNodeGate` | Adoption status |
|---|---|---|---|---|---|---|
| **OtOpcUa** | `/health/ready`, `/health/active`, `/healthz` | Database (query), AkkaCluster (2-way), AdminRoleLeader (role-filtered) | Default (plain-text/JSON) | ✅ present | — | Not started |
| **MxAccessGateway** | `/health/live` only (raw `MapGet`; hardcoded `"Healthy"`) | **None** (`AddHealthChecks()` called but unused) | Bespoke `GatewayHealthReply` JSON | ⛔ absent | — | Not started |
| **ScadaBridge** | `/health/ready`, `/health/active` | Database (`CanConnectAsync`), AkkaCluster (3-way), ActiveNode (role-less) | `HealthChecks.UI.Client` JSON | ⛔ absent | `ActiveNodeGate` (backs Inbound API 503 gate) | Not started |

See each project's [`current-state/<project>/CURRENT-STATE.md`](current-state/) for the
code-verified detail and its adoption plan.
## Normalized vs. left per-project
**Normalized (the shared target):**
- Three-tier endpoint convention: `/health/ready` (tag `ready`), `/health/active` (tag `active`),
`/healthz` (bare liveness). Mapped by `app.MapZbHealth()` from `ZB.MOM.WW.Health`.
- Canonical JSON response writer (lifted from `HealthChecks.UI.Client` style; no per-project
writer wiring needed).
- `IActiveNodeGate` seam — generalized from ScadaBridge's `ActiveNodeGate`; wired into `MapZbHealth`
for automatic active-tier response.
- `GrpcDependencyHealthCheck` — reachability probe for a downstream gRPC dependency (covers
OtOpcUa → MxAccessGateway channel and MxAccessGateway → worker IPC).
- `AkkaClusterHealthCheck` (in `ZB.MOM.WW.Health.Akka`) with a configurable status policy.
Default = ScadaBridge's three-way policy; `OtOpcUaCompat` preset preserves OtOpcUa's two-way
self-Up-among-members scan.
- `ActiveNodeHealthCheck` (in `ZB.MOM.WW.Health.Akka`) with an optional role filter. Role-less =
ScadaBridge's behavior (Up + cluster leader); role-filtered = OtOpcUa's `AdminRoleLeader`
behavior.
- `DatabaseHealthCheck<TContext>` (in `ZB.MOM.WW.Health.EntityFrameworkCore`) with default
`CanConnectAsync` and an optional `ProbeQuery` delegate.
- `AllowAnonymous` on all three tiers by default (consistent across all three projects today).
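Since `MapZbHealth` is not implemented yet, the three-tier convention above can be approximated with the stock ASP.NET Core health-check APIs. The following is a minimal sketch under that assumption, not the library's implementation: `PlaceholderCheck` and the probe names `placeholder-ready` and `placeholder-active` are hypothetical stand-ins, and no response writer is wired up.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Per-project part: each app chooses its own probes, names, and tags.
builder.Services.AddHealthChecks()
    .AddCheck<PlaceholderCheck>("placeholder-ready", tags: new[] { "ready" })
    .AddCheck<PlaceholderCheck>("placeholder-active", tags: new[] { "active" });

var app = builder.Build();

// Readiness tier: runs only the probes tagged "ready".
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("ready"),
}).AllowAnonymous();

// Active-node tier: runs only the probes tagged "active".
app.MapHealthChecks("/health/active", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("active"),
}).AllowAnonymous();

// Bare liveness: the predicate excludes every probe, so the endpoint
// reports Healthy whenever the process can serve HTTP.
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    Predicate = _ => false,
}).AllowAnonymous();

app.Run();

// Hypothetical stand-in for a real probe; always reports Healthy.
public sealed class PlaceholderCheck : IHealthCheck
{
    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
        => Task.FromResult(HealthCheckResult.Healthy());
}
```

In this sketch the tag predicates are what separate the tiers: a probe registered with the `ready` tag never affects `/health/active`, and `/healthz` stays independent of every dependency.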

**Left per-project (not forced together):**
- Which probes each app registers, their names, and which tags they carry.
- Orchestrator / Traefik wiring (sidecars, route rules, upstreams).
- ScadaBridge's `HealthMonitoring/` distributed aggregation pipeline (`SiteHealthCollector`,
`CentralHealthAggregator`, `HealthReportSender`, etc.) — domain-specific, no shared-library
equivalent.
- MxAccessGateway's `GatewayHealthReply` metadata (`DefaultBackend`, `WorkerProtocolVersion`) —
keep as a bespoke `/info` endpoint.
- The x86 worker process — out of process and out of scope; the gateway-side
`GrpcDependencyHealthCheck` observes it indirectly.
## Package structure
`ZB.MOM.WW.Health` ships as three dependency-split packages:

| Package | Contents | Consumers |
|---|---|---|
| `ZB.MOM.WW.Health` | Core tiers, `MapZbHealth`, canonical writer, `IActiveNodeGate`, `GrpcDependencyHealthCheck` | All three |
| `ZB.MOM.WW.Health.Akka` | `AkkaClusterHealthCheck` + status presets, `ActiveNodeHealthCheck` + role filter | OtOpcUa, ScadaBridge |
| `ZB.MOM.WW.Health.EntityFrameworkCore` | `DatabaseHealthCheck<TContext>` + optional probe delegate | OtOpcUa, ScadaBridge |

MxAccessGateway consumes the core package only (no Akka, no EF). OtOpcUa and ScadaBridge consume
all three.
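To make the EF Core row of the table concrete, here is a rough sketch of a database probe that defaults to `CanConnectAsync` and accepts an optional probe delegate. It is illustrative only: the class name `SketchDatabaseHealthCheck<TContext>`, the delegate signature, and the constructor shape are assumptions, and the real `DatabaseHealthCheck<TContext>` and its `ProbeQuery` option are defined by the shared contract, not by this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical sketch; not the shipped DatabaseHealthCheck<TContext>.
public sealed class SketchDatabaseHealthCheck<TContext> : IHealthCheck
    where TContext : DbContext
{
    private readonly TContext _db;

    // Assumed delegate shape: returns true when the custom probe succeeds.
    private readonly Func<TContext, CancellationToken, Task<bool>>? _probeQuery;

    public SketchDatabaseHealthCheck(
        TContext db,
        Func<TContext, CancellationToken, Task<bool>>? probeQuery = null)
    {
        _db = db;
        _probeQuery = probeQuery;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            // Default path: connectivity check only. Custom path: caller-supplied query.
            var ok = _probeQuery is null
                ? await _db.Database.CanConnectAsync(cancellationToken)
                : await _probeQuery(_db, cancellationToken);

            return ok
                ? HealthCheckResult.Healthy()
                : new HealthCheckResult(
                    context.Registration.FailureStatus, "Database probe failed.");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Report probe exceptions using the registration's configured failure status.
            return new HealthCheckResult(
                context.Registration.FailureStatus, "Database probe threw.", ex);
        }
    }
}
```

Using `context.Registration.FailureStatus` rather than a hardcoded `Unhealthy` leaves the severity of a database failure to each project's registration, in line with keeping probe registration per-project.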
## Component status
**Status: Draft.** Spec and shared-contract written; current-state docs verified; GAPS backlog
populated. Library scaffolded at [`../../ZB.MOM.WW.Health/`](../../ZB.MOM.WW.Health/); source
implementation is a follow-on (task #7 in the adoption backlog). Adoption by the three apps is
a further follow-on tracked in [`GAPS.md`](GAPS.md).