Health — gaps & adoption backlog
Divergence of each project from spec/SPEC.md, and the ordered backlog to
reach the shared ZB.MOM.WW.Health library. Status legend: ⛔ gap · 🟡 partial · ✅ matches.
Divergence vs spec
§1 Endpoint tiers
| Spec tier | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| /health/ready (tag ready) | ✅ present | ⛔ absent | ✅ present (name-predicate) |
| /health/active (tag active) | ✅ present | ⛔ absent | ✅ present (name-predicate) |
| /healthz (bare process liveness) | ✅ present | ⛔ absent | ⛔ absent |
| /health/live (non-standard) | n/a | ⛔ present (hardcoded "Healthy", bypasses health-check pipeline) | n/a |
→ Gap T1 (P1): MxAccessGateway has no standard health tiers. The existing /health/live
MapGet lambda must be replaced by app.MapZbHealth() + real probes.
→ Gap T2: ScadaBridge lacks /healthz. MapZbHealth() adds it automatically.
→ Gap T3: MxAccessGateway's /health/live uses a raw MapGet that bypasses the ASP.NET Core
health-check middleware — it does not participate in IHealthCheckPublisher, HealthReport, or
UI integration. Must be removed.
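The tiers in the table above can be sketched with stock ASP.NET Core endpoints. This is only an illustration of the tier layout: the signature of MapZbHealth() is not given in this document, so the sketch uses the framework's own MapHealthChecks, and the check name placeholder-ready is hypothetical.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Hypothetical placeholder probe tagged "ready"; a real service would
// register its actual probes (database, gRPC dependency, ...) here.
builder.Services.AddHealthChecks()
    .AddCheck("placeholder-ready", () => HealthCheckResult.Healthy(), tags: new[] { "ready" });

var app = builder.Build();

// Bare process liveness: the predicate selects no checks, so the endpoint
// reports Healthy whenever the process is able to serve the request.
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    Predicate = _ => false
}).AllowAnonymous();

// Readiness tier: only checks tagged "ready" are evaluated.
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("ready")
}).AllowAnonymous();

// Active tier: only checks tagged "active" are evaluated.
app.MapHealthChecks("/health/active", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("active")
}).AllowAnonymous();

app.Run();
```

Unlike a raw MapGet lambda, every endpoint here goes through the health-check middleware, so results appear in HealthReport and are visible to any registered IHealthCheckPublisher.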
§2 Probe coverage
| Probe | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Database connectivity | ✅ DatabaseHealthCheck (query probe) | ⛔ none | ✅ DatabaseHealthCheck (CanConnectAsync) |
| Akka cluster membership | ✅ AkkaClusterHealthCheck (2-way) | n/a (no Akka) | ✅ AkkaClusterHealthCheck (3-way) |
| Active / leader node | ✅ AdminRoleLeaderHealthCheck (role-filtered) | n/a | ✅ ActiveNodeHealthCheck (role-less) |
| Downstream gRPC dependency | ⛔ none | ⛔ none | ⛔ none |
→ Gap P1 (P1): MxAccessGateway has zero probes — AddHealthChecks() at
GatewayApplication.cs:61 is dead code. Minimum viable: a GrpcDependencyHealthCheck
targeting the x86 worker IPC channel.
→ Gap P2: No project probes its downstream gRPC dependency. OtOpcUa should probe the
MxAccessGateway channel; MxAccessGateway should probe the worker IPC.
→ Gap P3: Dead AddHealthChecks() in MxAccessGateway (GatewayApplication.cs:61) should be
removed or replaced — it currently implies health checks are configured when they are not.
§3 Akka status-policy divergence
| Aspect | OtOpcUa | ScadaBridge |
|---|---|---|
| Probe implementation | Scans State.Members for self by address | Reads SelfMember.Status directly |
| Joining status | Degraded (not in Members as Up) | Healthy |
| Leaving/Exiting status | Degraded | Degraded |
| Other (Removed, Down…) | Degraded | Unhealthy |
| ActorSystem null guard | none (ActorSystem injected directly) | ✅ Degraded if null |
The two implementations classify Joining differently: ScadaBridge reports Healthy, while
OtOpcUa reports Degraded, because a member whose status is Joining is not found as Up in the
member scan. They also differ on Removed/Down: ScadaBridge reports Unhealthy, OtOpcUa reports
Degraded.
The shared ZB.MOM.WW.Health.Akka.AkkaClusterHealthCheck ships two presets to preserve both
behaviors rather than forcing one onto the other:
- Default: ScadaBridge's three-way policy (Up/Joining = Healthy, Leaving/Exiting = Degraded, else Unhealthy)
- OtOpcUaCompat: OtOpcUa's self-Up-among-members scan (found Up = Healthy, not found = Degraded)
→ Gap A1: OtOpcUa adopts the OtOpcUaCompat preset; ScadaBridge adopts the Default preset.
Both preserve existing behavior without forcing convergence on a single policy.
→ Gap A2: OtOpcUa's AkkaClusterHealthCheck injects ActorSystem directly (no null guard).
The shared implementation injects via AkkaHostedService for startup safety.
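The two presets can be expressed as a pure mapping from the node's own member status to a health status. The sketch below is a minimal illustration under stated assumptions, not the shared library's code: the enums are local stand-ins so it compiles without Akka.NET or ASP.NET Core, and the names AkkaStatusPolicy and AkkaStatusPolicies.Evaluate are hypothetical. A null status stands for "self not found" or "ActorSystem not ready".

```csharp
using System;

// Local stand-ins (assumptions) covering only the statuses named in the table above.
public enum MemberStatus { Joining, Up, Leaving, Exiting, Down, Removed }
public enum HealthStatus { Unhealthy, Degraded, Healthy }
public enum AkkaStatusPolicy { Default, OtOpcUaCompat }

public static class AkkaStatusPolicies
{
    // selfStatus is null when the node cannot be resolved (for example the
    // ActorSystem is not available yet, or self is absent from the member scan).
    public static HealthStatus Evaluate(AkkaStatusPolicy policy, MemberStatus? selfStatus)
    {
        if (selfStatus is null)
            return HealthStatus.Degraded;

        if (policy == AkkaStatusPolicy.OtOpcUaCompat)
        {
            // Two-way: self found as Up is Healthy, anything else is Degraded.
            return selfStatus == MemberStatus.Up ? HealthStatus.Healthy : HealthStatus.Degraded;
        }

        // Three-way default policy.
        return selfStatus switch
        {
            MemberStatus.Up or MemberStatus.Joining => HealthStatus.Healthy,
            MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
            _ => HealthStatus.Unhealthy,
        };
    }
}
```

Keeping the policy as a pure function makes the divergence easy to pin down in unit tests: Joining and Removed/Down are the only inputs on which the two presets disagree.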
§4 Database probe technique
| Aspect | OtOpcUa | ScadaBridge |
|---|---|---|
| Probe method | db.Deployments.AsNoTracking().Take(1).ToListAsync() (query) | _dbContext.Database.CanConnectAsync() (connection only) |
| Injection style | IDbContextFactory<T> (pooled, safe for concurrent probes) | DbContext directly (scoped, requires care in background use) |
| Schema verification | ✅ implies schema is applied | ⛔ connection only |
→ Gap D1: ZB.MOM.WW.Health.EntityFrameworkCore.DatabaseHealthCheck<TContext> uses
CanConnectAsync as the default (ScadaBridge behavior). An optional ProbeQuery delegate covers
OtOpcUa's stricter approach. Both apps retain their existing probe semantics; neither is forced
to change unless desired.
→ Gap D2: ScadaBridge injects DbContext directly; the shared probe should use
IDbContextFactory<TContext> for safe reuse from a background-service health-check context.
ScadaBridge's DI registration will need updating on adoption.
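A probe with a CanConnectAsync default and an optional query override, built on IDbContextFactory<TContext>, might look roughly like the sketch below. It is an assumption about shape only: the class name DatabaseHealthCheckSketch and the constructor-level probeQuery parameter are hypothetical, and the shared library exposes the delegate through its options type rather than as shown here. It needs the Microsoft.EntityFrameworkCore and health-check packages.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class DatabaseHealthCheckSketch<TContext> : IHealthCheck
    where TContext : DbContext
{
    private readonly IDbContextFactory<TContext> _factory;
    private readonly Func<TContext, CancellationToken, Task>? _probeQuery;

    public DatabaseHealthCheckSketch(
        IDbContextFactory<TContext> factory,
        Func<TContext, CancellationToken, Task>? probeQuery = null)
    {
        _factory = factory;
        _probeQuery = probeQuery;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            // A fresh context per probe avoids sharing a scoped DbContext
            // across concurrent or background health-check executions.
            await using var db = await _factory.CreateDbContextAsync(cancellationToken);

            if (_probeQuery is not null)
            {
                // Stricter mode: a real query also exercises the schema.
                await _probeQuery(db, cancellationToken);
                return HealthCheckResult.Healthy("Probe query succeeded.");
            }

            // Default mode: connection-only check.
            return await db.Database.CanConnectAsync(cancellationToken)
                ? HealthCheckResult.Healthy("Database reachable.")
                : HealthCheckResult.Unhealthy("Database unreachable.");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return HealthCheckResult.Unhealthy("Database probe failed.", ex);
        }
    }
}
```

Under this sketch, OtOpcUa would pass a delegate wrapping its existing db.Deployments.AsNoTracking().Take(1).ToListAsync(ct) query, while ScadaBridge would pass nothing and keep connection-only semantics.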
§5 Active-node / leader check
| Aspect | OtOpcUa | ScadaBridge |
|---|---|---|
| Probe type | AdminRoleLeaderHealthCheck (role-filtered: "admin") | ActiveNodeHealthCheck (role-less; Up + leader) |
| Non-role-bearing node | Healthy immediately | n/a (all central nodes have no role filter) |
| Leader status | Healthy | Healthy |
| Non-leader (standby) | Degraded | Unhealthy |
| IActiveNodeGate backing | Not present | ActiveNodeGate (separate type, duplicated logic) |
→ Gap L1: ZB.MOM.WW.Health.Akka.ActiveNodeHealthCheck with an optional RoleFilter
parameter unifies both behaviors. OtOpcUa passes RoleFilter = "admin" (role-filtered);
ScadaBridge uses no role filter.
→ Gap L2: ScadaBridge's ActiveNodeGate duplicates ActiveNodeHealthCheck logic. The shared
IActiveNodeGate seam + a backing singleton eliminates the duplication.
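The role-filtered and role-less behaviors in the table above reduce to one decision function. The sketch below is illustrative only and uses local stand-in types; NodeView, ActiveNodePolicy, and the standbyStatus parameter are hypothetical. In particular, standbyStatus is an assumption about how the Degraded versus Unhealthy standby difference could be preserved, and the null guard returning Degraded is an assumption mirroring the cluster probe.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum HealthStatus { Unhealthy, Degraded, Healthy }

// Hypothetical snapshot of what the probe reads about the local node.
public sealed record NodeView(bool IsUp, bool IsLeader, IReadOnlySet<string> Roles);

public static class ActiveNodePolicy
{
    public static HealthStatus Evaluate(NodeView? self, string? roleFilter, HealthStatus standbyStatus)
    {
        // Assumption: cluster state not available yet is reported as Degraded.
        if (self is null)
            return HealthStatus.Degraded;

        // Role-filtered mode: nodes without the role are not candidates,
        // so they report Healthy immediately.
        if (roleFilter is not null && !self.Roles.Contains(roleFilter))
            return HealthStatus.Healthy;

        // Active means Up and leader; everything else is a standby.
        return self.IsUp && self.IsLeader ? HealthStatus.Healthy : standbyStatus;
    }
}
```

With these assumptions, OtOpcUa's current behavior corresponds to Evaluate(self, "admin", HealthStatus.Degraded) and ScadaBridge's to Evaluate(self, null, HealthStatus.Unhealthy). In ASP.NET Core the standby status could instead be carried by the check registration's failureStatus.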
§6 Response writer
| Aspect | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Writer | Default (plain-text/JSON) | Bespoke GatewayHealthReply JSON | UIResponseWriter.WriteHealthCheckUIResponse |
→ Gap W1: the shared ZB.MOM.WW.Health package ships a canonical JSON response writer
(making the HealthChecks.UI.Client style the default). All three projects adopt it by calling
MapZbHealth(); no per-project writer wiring is needed.
§7 Endpoint authentication
Both OtOpcUa and ScadaBridge expose health endpoints without authentication (AllowAnonymous or
open by default). MxAccessGateway's /health/live has no authentication requirement. The spec
canonizes this: health tiers are AllowAnonymous; MapZbHealth() applies AllowAnonymous by
default.
No gap — consistent across all three. MapZbHealth() should document and enforce this default.
Adoption backlog (ordered)
| # | Item | Projects | Priority | Effort | Risk | Notes |
|---|---|---|---|---|---|---|
| 1 | MxAccessGateway: remove dead /health/live + AddHealthChecks(), add GrpcDependencyHealthCheck (worker IPC) + MapZbHealth() | MxAccessGateway | P1 | S | Low | Gap T1, T3, P1, P3: no probes/tiers today; highest delta |
| 2 | OtOpcUa: replace 3 bespoke checks with shared probes (AkkaClusterHealthCheck OtOpcUaCompat + ActiveNodeHealthCheck role-filtered + DatabaseHealthCheck<T> ProbeQuery) | OtOpcUa | P2 | S | Low | Gap A1, D1, L1 |
| 3 | ScadaBridge: replace 3 bespoke checks with shared probes (Default policy + role-less Active + CanConnectAsync) + add /healthz + unify ActiveNodeGate with IActiveNodeGate seam | ScadaBridge | P2 | S | Low | Gap T2, A1, D2, L1, L2 |
| 4 | OtOpcUa + MxAccessGateway: add GrpcDependencyHealthCheck for downstream gRPC channel | OtOpcUa, MxAccessGateway | P2 | S | Low | Gap P2: closes the silent-gateway-down scenario |
| 5 | All: adopt canonical response writer (switch from per-project writers to MapZbHealth default) | all 3 | P3 | XS | Low | Gap W1: mechanical; bundled with #1–3 |
| 6 | DB injection style: switch ScadaBridge from injected DbContext to IDbContextFactory<T> | ScadaBridge | P3 | XS | Low | Gap D2: background-service safety |
Note: adoption items #1–6 are all follow-on tasks. They are tracked here as the backlog for
after ZB.MOM.WW.Health @ 0.1.0 is published. The library build itself (nupkgs, tests) is a
separate task. This is consistent with how ZB.MOM.WW.Auth and ZB.MOM.WW.Theme are structured:
the library is built first; adoption by the three apps is the next step.