# Adopt `ZB.MOM.WW.Health` across the three sister apps: design

**Date:** 2026-06-01

**Status:** Approved (design); implementation plan to follow via writing-plans.

**Scope:** Integrate the built-but-unadopted `ZB.MOM.WW.Health` shared library into all three sister apps (**OtOpcUa**, **MxAccessGateway**, **ScadaBridge**), replacing each app's bespoke health-check wiring with the shared probes, tiers, and writer.

This is the first full cross-fleet adoption of one of the six shared `ZB.MOM.WW.*` libraries. It follows the adoption backlog in [`components/health/GAPS.md`](../../components/health/GAPS.md), re-verified against current code on 2026-06-01.

---

## 1. Goal & scope

Replace each app's bespoke health-check wiring with `ZB.MOM.WW.Health`, **preserving each app's existing health policy**: the library ships presets precisely so that no app's Healthy / Degraded / Unhealthy classifications change.

Outcome:

- All three apps expose the canonical tiers `/health/ready`, `/health/active`, `/healthz` with the canonical JSON writer (`ZbHealthWriter`).
- **MxAccessGateway gains real health checks for the first time** (today its `/health/live` is a hardcoded `"Healthy"` lambda that bypasses the ASP.NET Core health-check pipeline, and its `AddHealthChecks()` call is dead code).
- No breaking external contract; no metric, dashboard, or wire-format change; no ops coordination.

**Out of scope:**

- OtOpcUa's actor-based `Runtime/Health/*` *driver* health (a different concern: OPC UA driver connectivity, not the ASP.NET health-endpoint tier).
- ScadaBridge's distributed health-monitoring pipeline beyond the endpoint probes.

### Library public surface this design depends on (code-verified)

| API | Package | Use |
|---|---|---|
| `IEndpointRouteBuilder.MapZbHealth(ZbHealthEndpointOptions?)` | `ZB.MOM.WW.Health` | Maps `ready`/`active`/`live` tiers by tag. Does **not** call `AddHealthChecks()`; the caller registers probes + tags. |
| `ZbHealthTags.Ready / Active / Live` | `ZB.MOM.WW.Health` | Tag each probe so `MapZbHealth` routes it to the right tier. |
| `ZbHealthWriter` | `ZB.MOM.WW.Health` | Canonical JSON response writer. |
| `GrpcDependencyHealthCheck` + `GrpcDependencyOptions { Probe, DependencyName, Timeout }` | `ZB.MOM.WW.Health` | Probe a downstream gRPC channel. |
| `IActiveNodeGate` (+ `AkkaActiveNodeGate`) | `ZB.MOM.WW.Health` / `.Akka` | Active-node seam, replacing duplicated leader logic. |
| `AkkaClusterStatusPolicy.Default` / `.OtOpcUaCompat` → `AkkaClusterHealthCheck(sp, policy)` | `ZB.MOM.WW.Health.Akka` | Cluster-membership probe with per-app preset. |
| `ActiveNodeHealthCheck(sp)` / `(sp, string role)` | `ZB.MOM.WW.Health.Akka` | Active/leader probe, role-filtered overload. |
| `DatabaseHealthCheck` + `DatabaseHealthCheckOptions { ProbeQuery, Timeout }` | `ZB.MOM.WW.Health.EntityFrameworkCore` | DB probe; default `CanConnectAsync`, optional stricter `ProbeQuery`. |

**Consumer matrix:** MxAccessGateway → `ZB.MOM.WW.Health` (core) only; OtOpcUa & ScadaBridge → all three.

---

## 2. Distribution & referencing: Gitea registry (chosen)

The family is already inconsistent in how it distributes shared `ZB.MOM.WW.*` packages: OtOpcUa uses a committed local folder feed (`./nuget-packages/`), ScadaBridge uses the Gitea NuGet registry + package-source-mapping, and MxAccessGateway has no `nuget.config` (it is the *producer* of `MxGateway.*`).

We standardize Health distribution on the **Gitea NuGet registry**: the only mechanism that gives a single versioned source of truth, commits no binaries, and is already proven in this family (ScadaBridge consumes `MxGateway.*` exactly this way).

### Step 0: publish (one-time per version, prerequisite for all repos)

From `scadaproj`:

1. `dotnet pack` the three Health projects (already emit `0.1.0` nupkgs).
2. `dotnet nuget push` the three packages to the `dohertj2-gitea` feed (`https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json`).
3. Credentials (push token / per-dev feed creds) supplied via env or `dotnet nuget add source`, **never committed**: the same posture ScadaBridge already documents.

### Per-repo reference wiring

| Repo | Change | Notes |
|---|---|---|
| **ScadaBridge** | Extend existing `packageSourceMapping` to route `ZB.MOM.WW.Health.*` → `dohertj2-gitea`; add 3 CPM `<PackageVersion>` entries; add `<PackageReference>` (no version) to the Host csproj. | Smallest change: already wired for the Gitea feed + CPM. |
| **OtOpcUa** | Add `dohertj2-gitea` source to `NuGet.config` (keep `local-mxgw` folder feed for `MxGateway.*`); add source-mapping (`MxGateway.*`→local, `Health.*`→gitea, `*`→nuget.org) for determinism; add 3 CPM `<PackageVersion>` entries + `<PackageReference>`s. | Keeps its existing folder-feed arrangement untouched. |
| **MxAccessGateway** | Create its **first** `nuget.config` (nuget.org + gitea sources + source-mapping); add a direct versioned `<PackageReference>`. | No CPM in this repo, so a direct versioned reference is correct; introducing CPM for one package is deliberately avoided. |

Existing `MxGateway.*` distribution arrangements are untouched; only `ZB.MOM.WW.Health.*` is added.

---

## 3. Per-repo integration

### 3a. MxAccessGateway: highest delta (no health infra today)

- Delete the `/health/live` `MapGet` lambda (`GatewayApplication.cs:173`) and the dead `AddHealthChecks()` (`:66`).
- Re-add `AddHealthChecks()` **with real probes**: register a `GrpcDependencyHealthCheck` (tag `Ready`) whose `Probe` exercises the **x86 worker IPC gRPC channel** the gateway already owns; `DependencyName = "mxworker"`, explicit `Timeout`.
- `app.MapZbHealth()` → `/health/ready` (worker reachable), `/health/active`, `/healthz`.
- Update `GatewayApplicationTests` (currently asserts `/health/live` exists) to assert the three new tier routes; add a worker-down test asserting `ready` = Unhealthy.

### 3b. OtOpcUa: all three packages

- `Host/Health/AkkaClusterHealthCheck.cs` → shared `AkkaClusterHealthCheck` with **`AkkaClusterStatusPolicy.OtOpcUaCompat`** (preserves self-Up-among-members semantics).
- `AdminRoleLeaderHealthCheck.cs` → shared `ActiveNodeHealthCheck(sp, role: "admin")`.
- `DatabaseHealthCheck.cs` → shared `DatabaseHealthCheck` with `ProbeQuery` = its existing `Deployments.AsNoTracking().Take(1)` query (keeps stricter schema-touch semantics).
- `HealthEndpoints.cs` → `MapZbHealth()` (same tier semantics, canonical writer); register each probe with the matching `ZbHealthTags`.
- Add a downstream `GrpcDependencyHealthCheck` probing the **MxAccessGateway channel** (tag `Ready`), which closes the silent-gateway-down gap.
- `Runtime/Health/*` (actor-based driver health) left untouched.

### 3c. ScadaBridge: all three packages

- Three bespoke checks → shared `AkkaClusterHealthCheck` (**`Default`** policy), role-less `ActiveNodeHealthCheck(sp)`, `DatabaseHealthCheck` (default `CanConnectAsync`).
- Switch the DB probe from injected `DbContext` to `IDbContextFactory` (background-safe).
- Replace bespoke `ActiveNodeGate.cs` with the shared `IActiveNodeGate` seam + `AkkaActiveNodeGate` backing (removes duplicated leader logic).
- Add `/healthz` (free via `MapZbHealth()`); swap `UIResponseWriter` for `ZbHealthWriter`.

---

## 4. Cross-cutting conventions

- **Tags drive tiers:** every probe is registered with `tags: [ZbHealthTags.Ready|Active|Live]`; `MapZbHealth()` routes by tag. This is the one mechanical convention each repo must follow.
- **Canonical writer** (`ZbHealthWriter`) everywhere: replaces three different writers (gateway `GatewayHealthReply`, ScadaBridge `UIResponseWriter`, OtOpcUa default).
- **Auth:** all tiers stay `AllowAnonymous` (matches all three apps today).

---

## 5. Sequencing: one PR per repo

The publish-to-Gitea step (§2 Step 0) is a shared prerequisite. After that, each repo PR is independent. Recommended order:

1. **MxAccessGateway:** highest delta, smallest surface; validates the publish→consume loop and the canonical writer end-to-end in the simplest app.
2. **OtOpcUa:** exercises all three packages + the `OtOpcUaCompat`/role-filter presets + the downstream gRPC probe.
3. **ScadaBridge:** heaviest (the `IActiveNodeGate` / `IDbContextFactory` cleanups); done last with the pattern proven twice.

---

## 6. Behaviour-preservation & error handling

- **No policy change:** presets (`OtOpcUaCompat` vs `Default`) and `RoleFilter="admin"` vs role-less are chosen so each app's Healthy/Degraded/Unhealthy classifications are unchanged.
- **Fail-soft:** a probe that throws maps to `Unhealthy`, never crashes the host; gRPC/DB probes carry explicit `Timeout`s.
- **Credentials:** Gitea push token + per-dev feed creds handled out-of-band (env / `dotnet nuget add source`), never committed; verified by a "no secrets in diff" check per PR.

---

## 7. Testing & verification gates (per repo)

- `dotnet build` + `dotnet test` green **in the sister repo** after adoption (not just scadaproj).
- **MxAccessGateway:** retarget the route-assertion test to the three tiers; add a worker-down → `ready` = Unhealthy test.
- **OtOpcUa / ScadaBridge:** existing health tests retargeted to the shared types; assert tier→tag routing and that the preset preserves prior classification (ScadaBridge `Joining` = Healthy; OtOpcUa self-not-Up = Degraded).
- Check off the corresponding `components/health/GAPS.md` items and update that file to reflect adoption.

---

## 8. Risks & open questions

- **MxAccessGateway worker-IPC probe shape:** the exact `Probe` delegate depends on how the gateway holds the per-session worker channel. Implementation detail; the plan pins it against `GatewayApplication`'s worker-client wiring.
- **Gitea availability / credentials** in this environment: if the registry is unreachable when implementation starts, the fallback is the **local folder feed** without changing any per-repo code, only the `nuget.config` source. This is flagged explicitly rather than switched silently.
- **CPM in MxAccessGateway:** none today; this design uses a direct versioned `PackageReference` rather than introducing CPM for one package. Standardizing MxAccessGateway onto CPM is a possible follow-up, out of scope here.

---

## Next step

Hand off to the **writing-plans** skill to turn this design into a detailed, step-by-step implementation plan (per-repo tasks, exact edit sites, test changes, commit/PR structure).
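
---

## Appendix: illustrative `nuget.config` source mapping

As a concrete reading of the §2 wiring for MxAccessGateway's first `nuget.config`, the file might look roughly like the sketch below. It is an assumption-laden illustration, not the committed file: the `nuget.org` source key and the exact package patterns are choices made here for the example, while the `dohertj2-gitea` key and feed URL are taken from §2. The exact-ID pattern is listed alongside the prefix pattern because, as far as NuGet's source-mapping rules are understood here, `ZB.MOM.WW.Health.*` matches only IDs that start with `ZB.MOM.WW.Health.` and so may not cover the core `ZB.MOM.WW.Health` package on its own.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: source keys other than dohertj2-gitea are assumptions. -->
<configuration>
  <packageSources>
    <!-- Start from a clean list so machine-level sources do not leak in. -->
    <clear />
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
    <add key="dohertj2-gitea"
         value="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" />
  </packageSources>

  <packageSourceMapping>
    <!-- Shared Health packages resolve only from the Gitea registry. -->
    <packageSource key="dohertj2-gitea">
      <package pattern="ZB.MOM.WW.Health" />
      <package pattern="ZB.MOM.WW.Health.*" />
    </packageSource>
    <!-- Everything else falls through to nuget.org. -->
    <packageSource key="nuget.org">
      <package pattern="*" />
    </packageSource>
  </packageSourceMapping>

  <!-- No credentials here: feed creds are supplied out-of-band (see §2 Step 0). -->
</configuration>
```

OtOpcUa's variant would additionally keep its `local-mxgw` folder feed and map `MxGateway.*` to it, per the table in §2. If the Gitea registry is unreachable, the fallback in §8 amounts to pointing the Health patterns at a local folder source in this file, leaving project files unchanged.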