Plan to integrate the built-but-unadopted Health library into OtOpcUa, MxAccessGateway, and ScadaBridge: Gitea-registry distribution, per-repo behaviour-preserving probe swaps (preset-based), canonical tiers + writer, MxGateway-first sequencing.
Adopt ZB.MOM.WW.Health across the three sister apps — design
Date: 2026-06-01
Status: Approved (design); implementation plan to follow via writing-plans.
Scope: Integrate the built-but-unadopted ZB.MOM.WW.Health shared library into all three
sister apps — OtOpcUa, MxAccessGateway, ScadaBridge — replacing each app's bespoke
health-check wiring with the shared probes, tiers, and writer.
This is the first full cross-fleet adoption of one of the six shared ZB.MOM.WW.* libraries.
It follows the adoption backlog in components/health/GAPS.md,
re-verified against current code on 2026-06-01.
1. Goal & scope
Replace each app's bespoke health-check wiring with ZB.MOM.WW.Health, preserving each app's
existing health policy: the library ships presets precisely so that no app's Healthy / Degraded
/ Unhealthy classifications change. Outcome:
- All three apps expose the canonical tiers /health/ready, /health/active, /healthz with the canonical JSON writer (ZbHealthWriter).
- MxAccessGateway gains real health checks for the first time (today its /health/live is a hardcoded "Healthy" lambda that bypasses the ASP.NET Core health-check pipeline, and its AddHealthChecks() call is dead code).
- No breaking external contract; no metric, dashboard, or wire-format change; no ops coordination.
Out of scope: OtOpcUa's actor-based Runtime/Health/* driver health (a different concern:
OPC UA driver connectivity, not the ASP.NET health-endpoint tier), and ScadaBridge's distributed
health-monitoring pipeline beyond the endpoint probes.
Library public surface this design depends on (code-verified)
| API | Package | Use |
|---|---|---|
| IEndpointRouteBuilder.MapZbHealth(ZbHealthEndpointOptions?) | ZB.MOM.WW.Health | Maps ready/active/live tiers by tag. Does not call AddHealthChecks(); the caller registers probes + tags. |
| ZbHealthTags.Ready / Active / Live | ZB.MOM.WW.Health | Tag each probe so MapZbHealth routes it to the right tier. |
| ZbHealthWriter | ZB.MOM.WW.Health | Canonical JSON response writer. |
| GrpcDependencyHealthCheck + GrpcDependencyOptions { Probe, DependencyName, Timeout } | ZB.MOM.WW.Health | Probe a downstream gRPC channel. |
| IActiveNodeGate (+ AkkaActiveNodeGate) | ZB.MOM.WW.Health / .Akka | Active-node seam, replacing duplicated leader logic. |
| AkkaClusterStatusPolicy.Default / .OtOpcUaCompat → AkkaClusterHealthCheck(sp, policy) | ZB.MOM.WW.Health.Akka | Cluster-membership probe with per-app preset. |
| ActiveNodeHealthCheck(sp) / (sp, string role) | ZB.MOM.WW.Health.Akka | Active/leader probe, role-filtered overload. |
| DatabaseHealthCheck<TContext> + DatabaseHealthCheckOptions<TContext> { ProbeQuery, Timeout } | ZB.MOM.WW.Health.EntityFrameworkCore | DB probe; default CanConnectAsync, optional stricter ProbeQuery. |
Consumer matrix: MxGateway → ZB.MOM.WW.Health (core) only; OtOpcUa & ScadaBridge → all three.
2. Distribution & referencing — Gitea registry (chosen)
The family is already inconsistent in how it distributes shared ZB.MOM.WW.* packages:
OtOpcUa uses a committed local folder feed (./nuget-packages/), ScadaBridge uses the Gitea NuGet
registry + package-source-mapping, MxAccessGateway has no nuget.config (it is the producer of
MxGateway.*). We standardize Health distribution on the Gitea NuGet registry — the only
mechanism that gives a single versioned source of truth, commits no binaries, and is already proven
in this family (ScadaBridge consumes MxGateway.* exactly this way).
Step 0 — publish (one-time per version, prerequisite for all repos)
From scadaproj:
- dotnet pack the three Health projects (already emit 0.1.0 nupkgs).
- dotnet nuget push the three packages to the dohertj2-gitea feed (https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json).
- Credentials (push token / per-dev feed creds) supplied via env or dotnet nuget add source, never committed (the same posture ScadaBridge already documents).
Per-repo reference wiring
| Repo | Change | Notes |
|---|---|---|
| ScadaBridge | Extend existing packageSourceMapping to route ZB.MOM.WW.Health.* → dohertj2-gitea; add 3 CPM <PackageVersion> entries; add <PackageReference> (no version) to the Host csproj. | Smallest change: already wired for the Gitea feed + CPM. |
| OtOpcUa | Add dohertj2-gitea source to NuGet.config (keep local-mxgw folder feed for MxGateway.*); add source-mapping (MxGateway.*→local, Health.*→gitea, *→nuget.org) for determinism; add 3 CPM <PackageVersion> entries + <PackageReference>s. | Keeps its existing folder-feed arrangement untouched. |
| MxAccessGateway | Create its first nuget.config (nuget.org + gitea sources + source-mapping); add a direct <PackageReference Include="ZB.MOM.WW.Health" Version="0.1.0" />. | No CPM in this repo, so a direct versioned reference is correct; introducing CPM for one package is deliberately avoided. |
Existing MxGateway.* distribution arrangements are untouched; only ZB.MOM.WW.Health.* is added.
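
As an illustration of the OtOpcUa row above, a three-way source mapping could look roughly like the following NuGet.config sketch. The feed keys and the Gitea URL are taken from this document; the assumption that the local-mxgw key points at ./nuget-packages/ is ours and should be checked against the repo. Both an exact pattern and a prefix pattern are listed for the Health packages so that the core package and its sub-packages resolve from the same feed.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
    <!-- Assumption: the existing committed folder feed lives at ./nuget-packages/ -->
    <add key="local-mxgw" value="./nuget-packages/" />
    <add key="dohertj2-gitea" value="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" />
  </packageSources>
  <packageSourceMapping>
    <!-- Everything not matched by a more specific pattern comes from nuget.org -->
    <packageSource key="nuget.org">
      <package pattern="*" />
    </packageSource>
    <!-- MxGateway.* stays on the local folder feed, unchanged -->
    <packageSource key="local-mxgw">
      <package pattern="MxGateway.*" />
    </packageSource>
    <!-- Health packages resolve only from the Gitea registry -->
    <packageSource key="dohertj2-gitea">
      <package pattern="ZB.MOM.WW.Health" />
      <package pattern="ZB.MOM.WW.Health.*" />
    </packageSource>
  </packageSourceMapping>
</configuration>
```

No credentials appear in the file; per Step 0 they are supplied out-of-band.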
3. Per-repo integration
3a. MxAccessGateway — highest delta (no health infra today)
- Delete the /health/live MapGet lambda (GatewayApplication.cs:173) and the dead AddHealthChecks() (:66).
- Re-add AddHealthChecks() with real probes: register a GrpcDependencyHealthCheck (tag Ready) whose Probe exercises the x86 worker IPC gRPC channel the gateway already owns; DependencyName = "mxworker", explicit Timeout.
- app.MapZbHealth() → /health/ready (worker reachable), /health/active, /healthz.
- Update GatewayApplicationTests (currently asserts /health/live exists) to assert the three new tier routes; add a worker-down test asserting ready = Unhealthy.
3b. OtOpcUa — all three packages
- Host/Health/AkkaClusterHealthCheck.cs → shared AkkaClusterHealthCheck with AkkaClusterStatusPolicy.OtOpcUaCompat (preserves self-Up-among-members semantics).
- AdminRoleLeaderHealthCheck.cs → shared ActiveNodeHealthCheck(sp, role: "admin").
- DatabaseHealthCheck.cs → shared DatabaseHealthCheck<TContext> with ProbeQuery = its existing Deployments.AsNoTracking().Take(1) query (keeps stricter schema-touch semantics).
- HealthEndpoints.cs → MapZbHealth() (same tier semantics, canonical writer); register each probe with the matching ZbHealthTags.
- Add a downstream GrpcDependencyHealthCheck probing the MxAccessGateway channel (tag Ready), which closes the silent-gateway-down gap.
- Runtime/Health/* (actor-based driver health) left untouched.
3c. ScadaBridge — all three packages
- Three bespoke checks → shared AkkaClusterHealthCheck (Default policy), role-less ActiveNodeHealthCheck(sp), DatabaseHealthCheck<TContext> (default CanConnectAsync).
- Switch the DB probe from injected DbContext to IDbContextFactory<TContext> (background-safe).
- Replace bespoke ActiveNodeGate.cs with the shared IActiveNodeGate seam + AkkaActiveNodeGate backing (removes duplicated leader logic).
- Add /healthz (free via MapZbHealth()); swap UIResponseWriter for ZbHealthWriter.
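
To show why a factory-based DB probe is background-safe, here is a minimal sketch built only on stock EF Core and ASP.NET Core health-check types. It is not the shared DatabaseHealthCheck<TContext> implementation; the class name FactoryDbHealthCheck is hypothetical. The point is that each probe run creates and disposes its own short-lived context instead of holding a scoped DbContext.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical illustration, not the library's type.
public sealed class FactoryDbHealthCheck<TContext> : IHealthCheck
    where TContext : DbContext
{
    private readonly IDbContextFactory<TContext> _factory;

    public FactoryDbHealthCheck(IDbContextFactory<TContext> factory) => _factory = factory;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // A fresh context per probe: no dependency on a request scope,
        // so the check can run from a background publisher as well.
        await using var db = await _factory.CreateDbContextAsync(cancellationToken);

        // Default semantics described above: connectivity only, no schema touch.
        return await db.Database.CanConnectAsync(cancellationToken)
            ? HealthCheckResult.Healthy()
            : HealthCheckResult.Unhealthy("database unreachable");
    }
}
```

This assumes the host registers the context with AddDbContextFactory<TContext>(); a stricter probe would run a query instead of CanConnectAsync.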
4. Cross-cutting conventions
- Tags drive tiers: every probe is registered with tags: [ZbHealthTags.Ready|Active|Live]; MapZbHealth() routes by tag. This is the one mechanical convention each repo must follow.
- Canonical writer (ZbHealthWriter) everywhere, replacing three different writers (gateway GatewayHealthReply, ScadaBridge UIResponseWriter, OtOpcUa default).
- Auth: all tiers stay AllowAnonymous (matches all three apps today).
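
The tag-to-tier convention can be pictured with stock ASP.NET Core alone. The sketch below is an approximation of what MapZbHealth() is described as doing, not its implementation: the tag strings "ready", "active", and "live" are placeholders for the ZbHealthTags values, the mapping of the live tag to /healthz is inferred from the tier list in this document, and the default response writer stands in for ZbHealthWriter.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Each probe is registered once and tagged with the tier it belongs to.
// Placeholder probes: real ones would check the worker channel, cluster, DB, etc.
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: new[] { "live" })
    .AddCheck("worker", () => HealthCheckResult.Healthy("worker reachable"), tags: new[] { "ready" });

var app = builder.Build();

// One endpoint per tier; the predicate selects only probes carrying that tier's tag.
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("ready")
}).AllowAnonymous();

app.MapHealthChecks("/health/active", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("active")
}).AllowAnonymous();

app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("live")
}).AllowAnonymous();

app.Run();
```

A probe registered without a tier tag is served by none of these endpoints, which is why tagging is the one convention every repo has to follow.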
5. Sequencing — one PR per repo
The publish-to-Gitea step (§2 Step 0) is a shared prerequisite. After that, each repo PR is independent. Recommended order:
- MxAccessGateway: highest delta, smallest surface; validates the publish→consume loop and the canonical writer end-to-end in the simplest app.
- OtOpcUa: exercises all three packages + the OtOpcUaCompat / role-filter presets + the downstream gRPC probe.
- ScadaBridge: heaviest (the IActiveNodeGate / IDbContextFactory cleanups); done last with the pattern proven twice.
6. Behaviour-preservation & error handling
- No policy change: presets (OtOpcUaCompat vs Default) and RoleFilter="admin" vs role-less are chosen so each app's Healthy/Degraded/Unhealthy classifications are unchanged.
- Fail-soft: a probe that throws maps to Unhealthy, never crashes the host; gRPC/DB probes carry explicit Timeouts.
- Credentials: Gitea push token + per-dev feed creds handled out-of-band (env / dotnet nuget add source), never committed; verified by a "no secrets in diff" check per PR.
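
The fail-soft rule (a throwing or hanging probe becomes Unhealthy rather than an unhandled exception) can be sketched with the stock IHealthCheck interface. This is an illustration under assumptions, not the library's GrpcDependencyHealthCheck: the class name TimeoutProbeHealthCheck and the Func<CancellationToken, Task> probe shape are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical illustration of "explicit Timeout + fail-soft".
public sealed class TimeoutProbeHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task> _probe;
    private readonly TimeSpan _timeout;

    public TimeoutProbeHealthCheck(Func<CancellationToken, Task> probe, TimeSpan timeout)
    {
        _probe = probe;
        _timeout = timeout;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // Link the caller's token with our own deadline.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(_timeout);

        try
        {
            // WaitAsync enforces the deadline even if the probe ignores its token.
            await _probe(cts.Token).WaitAsync(cts.Token);
            return HealthCheckResult.Healthy();
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // Our deadline fired, not the caller's cancellation: report, do not throw.
            return HealthCheckResult.Unhealthy($"probe timed out after {_timeout.TotalMilliseconds:0} ms");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Any probe failure maps to Unhealthy; the host keeps running.
            return HealthCheckResult.Unhealthy("probe threw", ex);
        }
    }
}
```

Cancellation requested by the caller still propagates as OperationCanceledException, so host shutdown is not misreported as a dependency failure.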
7. Testing & verification gates (per repo)
- dotnet build + dotnet test green in the sister repo after adoption (not just scadaproj).
- MxGateway: retarget the route-assertion test to the three tiers; add a worker-down → ready = Unhealthy test.
- OtOpcUa / ScadaBridge: existing health tests retargeted to the shared types; assert tier→tag routing and that the preset preserves prior classification (ScadaBridge Joining = Healthy; OtOpcUa self-not-Up = Degraded).
- Check off the corresponding components/health/GAPS.md items and update that file to reflect adoption.
8. Risks & open questions
- MxGateway worker-IPC probe shape: the exact Probe delegate depends on how the gateway holds the per-session worker channel. Implementation detail; the plan pins it against GatewayApplication's worker-client wiring.
- Gitea availability / credentials in this environment: if the registry is unreachable when implementation starts, the fallback is the local folder feed without changing any per-repo code, only the nuget.config source. This is flagged explicitly rather than switched silently.
- CPM in MxGateway: none today; this design uses a direct versioned PackageReference rather than introducing CPM for one package. Standardizing MxGateway onto CPM is a possible follow-up, out of scope here.
Next step
Hand off to the writing-plans skill to turn this design into a detailed, step-by-step implementation plan (per-repo tasks, exact edit sites, test changes, commit/PR structure).