# Health (readiness / liveness / active-node)

Second normalized component under the operability cluster. **Goal: path to shared code.** Converge the three sister projects onto a common three-tier health endpoint convention and a set of shared probe implementations, proposed as the `ZB.MOM.WW.Health` library set (3 packages), while each project keeps its own probe registration and orchestrator wiring.

- The one target: [`spec/SPEC.md`](spec/SPEC.md)
- The proposed shared library: [`shared-contract/ZB.MOM.WW.Health.md`](shared-contract/ZB.MOM.WW.Health.md)
- Divergences + backlog: [`GAPS.md`](GAPS.md)
- Current state, per project: [`current-state/`](current-state/)

## Why health is a strong normalization candidate

Both OtOpcUa and ScadaBridge trace their health-check structure to the same "ScadaLink three-tier pattern" (`HealthEndpoints.cs:13` says so explicitly) but have already diverged in probe logic, status semantics, response writer, and endpoint registration style. MxAccessGateway has no shared ancestry here: it has a single hardcoded `/health/live` endpoint with no real probes at all. The common core (three tiers, database probe, Akka cluster probe, active-node probe) is re-implemented twice and absent once. Shared probe implementations with configurable policies close the gap without forcing identical behavior onto projects with legitimately different cluster semantics.

## Status by project

| Project | Endpoints today | Probes today | Response writer | `/healthz` | `IActiveNodeGate` | Adoption status |
|---|---|---|---|---|---|---|
| **OtOpcUa** | `/health/ready`, `/health/active`, `/healthz` | Database (query), AkkaCluster (2-way), AdminRoleLeader (role-filtered) | Default (plain-text/JSON) | ✅ present | none | Not started |
| **MxAccessGateway** | `/health/live` only (raw `MapGet`; hardcoded `"Healthy"`) | **None** (`AddHealthChecks()` called but unused) | Bespoke `GatewayHealthReply` JSON | ⛔ absent | none | Not started |
| **ScadaBridge** | `/health/ready`, `/health/active` | Database (`CanConnectAsync`), AkkaCluster (3-way), ActiveNode (role-less) | `HealthChecks.UI.Client` JSON | ⛔ absent | `ActiveNodeGate` (backs Inbound API 503 gate) | Not started |

See each project's [`current-state//CURRENT-STATE.md`](current-state/) for the code-verified detail and its adoption plan.

## Normalized vs. left per-project

**Normalized (the shared target):**

- Three-tier endpoint convention: `/health/ready` (tag `ready`), `/health/active` (tag `active`), `/healthz` (bare liveness). Mapped by `app.MapZbHealth()` from `ZB.MOM.WW.Health`.
- Canonical JSON response writer (lifted from `HealthChecks.UI.Client` style; no per-project writer wiring needed).
- `IActiveNodeGate` seam: generalized from ScadaBridge's `ActiveNodeGate`; wired into `MapZbHealth` for automatic active-tier response.
- `GrpcDependencyHealthCheck`: reachability probe for a downstream gRPC dependency (covers OtOpcUa → MxAccessGateway channel and MxAccessGateway → worker IPC).
- `AkkaClusterHealthCheck` (in `ZB.MOM.WW.Health.Akka`) with a configurable status policy. Default = ScadaBridge's three-way policy; `OtOpcUaCompat` preset preserves OtOpcUa's two-way self-Up-among-members scan.
- `ActiveNodeHealthCheck` (in `ZB.MOM.WW.Health.Akka`) with an optional role filter. Role-less = ScadaBridge's behavior (Up + cluster leader); role-filtered = OtOpcUa's `AdminRoleLeader` behavior.
- `DatabaseHealthCheck` (in `ZB.MOM.WW.Health.EntityFrameworkCore`) with default `CanConnectAsync` and an optional `ProbeQuery` delegate.
- `AllowAnonymous` on all three tiers by default (consistent across all three projects today).

**Left per-project (not forced together):**

- Which probes each app registers, their names, and which tags they carry.
- Orchestrator / Traefik wiring (sidecars, route rules, upstreams).
- ScadaBridge's `HealthMonitoring/` distributed aggregation pipeline (`SiteHealthCollector`, `CentralHealthAggregator`, `HealthReportSender`, etc.): domain-specific, no shared-library equivalent.
- MxAccessGateway's `GatewayHealthReply` metadata (`DefaultBackend`, `WorkerProtocolVersion`): keep as a bespoke `/info` endpoint.
- The x86 worker process: out of process and out of scope; the gateway-side `GrpcDependencyHealthCheck` observes it indirectly.

## Package structure

`ZB.MOM.WW.Health` ships as three dependency-split packages:

| Package | Contents | Consumers |
|---|---|---|
| `ZB.MOM.WW.Health` | Core tiers, `MapZbHealth`, canonical writer, `IActiveNodeGate`, `GrpcDependencyHealthCheck` | All three |
| `ZB.MOM.WW.Health.Akka` | `AkkaClusterHealthCheck` + status presets, `ActiveNodeHealthCheck` + role filter | OtOpcUa, ScadaBridge |
| `ZB.MOM.WW.Health.EntityFrameworkCore` | `DatabaseHealthCheck` + optional probe delegate | OtOpcUa, ScadaBridge |

MxAccessGateway consumes the core package only (no Akka, no EF). OtOpcUa and ScadaBridge consume all three.

## Component status

**Status: Draft, library built at 0.1.0.** Spec and shared-contract written; current-state docs verified; GAPS backlog populated. Library implemented and packed at [`../../ZB.MOM.WW.Health/`](../../ZB.MOM.WW.Health/) (3 packages, 58 tests; `ZB.MOM.WW.Health`, `ZB.MOM.WW.Health.Akka`, `ZB.MOM.WW.Health.EntityFrameworkCore`). Adoption by the three apps is the next follow-on tracked in [`GAPS.md`](GAPS.md).
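
For readers unfamiliar with tag-filtered health endpoints, the three-tier convention listed under "Normalized vs. left per-project" can be approximated with stock ASP.NET Core health-check APIs. This is a minimal sketch under stated assumptions, not the `MapZbHealth` implementation (its signature is not shown in this document). `ThreeTierHealthSketch`, `ForTag`, and `MapThreeTierHealth` are hypothetical names introduced for illustration. The sketch uses the framework's default response writer rather than the canonical JSON writer, and it does not model the `IActiveNodeGate` seam.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper for illustration only; NOT the real MapZbHealth
// from ZB.MOM.WW.Health.
public static class ThreeTierHealthSketch
{
    // Builds a predicate that selects only checks registered with the given tag.
    public static Func<HealthCheckRegistration, bool> ForTag(string tag) =>
        registration => registration.Tags.Contains(tag);

    public static IEndpointRouteBuilder MapThreeTierHealth(
        this IEndpointRouteBuilder endpoints)
    {
        // Readiness tier: runs only checks tagged "ready".
        endpoints.MapHealthChecks("/health/ready",
            new HealthCheckOptions { Predicate = ForTag("ready") })
            .AllowAnonymous();

        // Active-node tier: runs only checks tagged "active".
        endpoints.MapHealthChecks("/health/active",
            new HealthCheckOptions { Predicate = ForTag("active") })
            .AllowAnonymous();

        // Bare liveness: the predicate excludes every check, so the endpoint
        // reports Healthy whenever the process can serve the request.
        endpoints.MapHealthChecks("/healthz",
            new HealthCheckOptions { Predicate = _ => false })
            .AllowAnonymous();

        return endpoints;
    }
}
```

Under these assumptions, an app would register its probes with `builder.Services.AddHealthChecks()`, tagging each one `ready` and/or `active`, and then call `app.MapThreeTierHealth()` after `builder.Build()`. Which probes carry which tags stays a per-project decision, as described above.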