# Health: gaps & adoption backlog

Divergence of each project from [`spec/SPEC.md`](spec/SPEC.md), and the ordered backlog to reach the shared `ZB.MOM.WW.Health` library.

Status legend: ⛔ gap · 🟡 partial · ✅ matches.

## Divergence vs spec

### §1 Endpoint tiers

| Spec tier | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| `/health/ready` (tag `ready`) | ✅ present | ⛔ absent | ✅ present (name-predicate) |
| `/health/active` (tag `active`) | ✅ present | ⛔ absent | ✅ present (name-predicate) |
| `/healthz` (bare process liveness) | ✅ present | ⛔ absent | ⛔ absent |
| `/health/live` (non-standard) | - | ⛔ present (hardcoded `"Healthy"`, bypasses health-check pipeline) | - |

→ **Gap T1 (P1):** MxAccessGateway has no standard health tiers. The existing `/health/live` `MapGet` lambda must be replaced by `app.MapZbHealth()` + real probes.

→ **Gap T2:** ScadaBridge lacks `/healthz`. `MapZbHealth()` adds it automatically.

→ **Gap T3:** MxAccessGateway's `/health/live` uses a raw `MapGet` that bypasses the ASP.NET Core health-check middleware, so it does not participate in `IHealthCheckPublisher`, `HealthReport`, or UI integration. It must be removed.

### §2 Probe coverage

| Probe | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Database connectivity | ✅ `DatabaseHealthCheck` (query probe) | ⛔ none | ✅ `DatabaseHealthCheck` (`CanConnectAsync`) |
| Akka cluster membership | ✅ `AkkaClusterHealthCheck` (2-way) | n/a (no Akka) | ✅ `AkkaClusterHealthCheck` (3-way) |
| Active / leader node | ✅ `AdminRoleLeaderHealthCheck` (role-filtered) | n/a | ✅ `ActiveNodeHealthCheck` (role-less) |
| Downstream gRPC dependency | ⛔ none | ⛔ none | ⛔ none |

→ **Gap P1 (P1):** MxAccessGateway has zero probes; `AddHealthChecks()` at `GatewayApplication.cs:61` is dead code. Minimum viable: a `GrpcDependencyHealthCheck` targeting the x86 worker IPC channel.


→ **Gap P2:** No project probes its downstream gRPC dependency. OtOpcUa should probe the MxAccessGateway channel; MxAccessGateway should probe the worker IPC.

→ **Gap P3:** Dead `AddHealthChecks()` in MxAccessGateway (`GatewayApplication.cs:61`) should be removed or replaced, because it currently implies health checks are configured when they are not.

### §3 Akka status-policy divergence

| Aspect | OtOpcUa | ScadaBridge |
|---|---|---|
| Probe implementation | Scans `State.Members` for self by address | Reads `SelfMember.Status` directly |
| Joining status | Degraded (not in Members as Up) | Healthy |
| Leaving/Exiting status | Degraded | Degraded |
| Other (Removed, Down…) | Degraded | Unhealthy |
| ActorSystem null guard | none (`ActorSystem` injected directly) | ✅ Degraded if null |

The two implementations diverge in how they classify `Joining` (ScadaBridge calls it Healthy; OtOpcUa would see it as Degraded because `SelfMember` with status `Joining` would not appear as `Up` in the member scan). They also diverge in the Removed/Down classification (ScadaBridge Unhealthy, OtOpcUa Degraded). The shared `ZB.MOM.WW.Health.Akka.AkkaClusterHealthCheck` ships two presets to preserve both behaviors rather than forcing one onto the other:

- **Default**: ScadaBridge's three-way policy (`Up`/`Joining`=Healthy, `Leaving`/`Exiting`=Degraded, else Unhealthy)
- **OtOpcUaCompat**: OtOpcUa's self-Up-among-members scan (found Up=Healthy, not found=Degraded)

→ **Gap A1:** OtOpcUa adopts the `OtOpcUaCompat` preset; ScadaBridge adopts the `Default` preset. Both preserve existing behavior without forcing convergence on a single policy.

→ **Gap A2:** OtOpcUa's `AkkaClusterHealthCheck` injects `ActorSystem` directly (no null guard). The shared implementation injects via `AkkaHostedService` for startup safety.
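
The two Akka presets described in §3 can be written down as pure status mappings. The following is a minimal sketch, not the shared library's code: `NodeStatus` is a hypothetical stand-in for Akka.NET's member status so the block compiles without Akka, and the class and method names are invented for illustration. Only `HealthStatus` comes from `Microsoft.Extensions.Diagnostics.HealthChecks`.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical stand-in for the Akka.NET member status enum (assumption, illustration only).
public enum NodeStatus { Joining, Up, Leaving, Exiting, Down, Removed }

public static class AkkaStatusPolicySketch
{
    // "Default" preset: three-way policy applied to the node's own status.
    public static HealthStatus Default(NodeStatus self) => self switch
    {
        NodeStatus.Up or NodeStatus.Joining => HealthStatus.Healthy,
        NodeStatus.Leaving or NodeStatus.Exiting => HealthStatus.Degraded,
        _ => HealthStatus.Unhealthy, // Removed, Down, anything else
    };

    // "OtOpcUaCompat" preset: Healthy only when self is found as Up in the member list;
    // every other outcome (including Joining, or self missing) is Degraded.
    public static HealthStatus OtOpcUaCompat(
        IEnumerable<(string Address, NodeStatus Status)> members,
        string selfAddress) =>
        members.Any(m => m.Address == selfAddress && m.Status == NodeStatus.Up)
            ? HealthStatus.Healthy
            : HealthStatus.Degraded;
}
```

Under this sketch a node in `Joining` is Healthy with `Default` and Degraded with `OtOpcUaCompat`, which is exactly the divergence the table records.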
### §4 Database probe technique

| Aspect | OtOpcUa | ScadaBridge |
|---|---|---|
| Probe method | `db.Deployments.AsNoTracking().Take(1).ToListAsync()` (query) | `_dbContext.Database.CanConnectAsync()` (connection only) |
| Injection style | `IDbContextFactory` (pooled, safe for concurrent probes) | `DbContext` directly (scoped, requires care in background use) |
| Schema verification | ✅ implies schema is applied | ⛔ connection only |

→ **Gap D1:** `ZB.MOM.WW.Health.EntityFrameworkCore.DatabaseHealthCheck` uses `CanConnectAsync` as the default (ScadaBridge behavior). An optional `ProbeQuery` delegate covers OtOpcUa's stricter approach. Both apps retain their existing probe semantics; neither is forced to change unless desired.

→ **Gap D2:** ScadaBridge injects `DbContext` directly; the shared probe should use `IDbContextFactory` for safe reuse from a background-service health-check context. ScadaBridge's DI registration will need updating on adoption.

### §5 Active-node / leader check

| Aspect | OtOpcUa | ScadaBridge |
|---|---|---|
| Probe type | `AdminRoleLeaderHealthCheck` (role-filtered: `"admin"`) | `ActiveNodeHealthCheck` (role-less; Up + leader) |
| Non-role-bearing node | Healthy immediately | n/a (all central nodes have no role filter) |
| Leader status | Healthy | Healthy |
| Non-leader (standby) | Degraded | Unhealthy |
| `IActiveNodeGate` backing | Not present | `ActiveNodeGate` (separate type, duplicated logic) |

→ **Gap L1:** `ZB.MOM.WW.Health.Akka.ActiveNodeHealthCheck` with an optional `RoleFilter` parameter unifies both behaviors. OtOpcUa passes `RoleFilter = "admin"` (role-filtered); ScadaBridge uses no role filter.

→ **Gap L2:** ScadaBridge's `ActiveNodeGate` duplicates `ActiveNodeHealthCheck` logic. The shared `IActiveNodeGate` seam + a backing singleton eliminates the duplication.
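
To make the §5 unification concrete, here is a rough sketch of how one evaluation function could cover both the role-filtered and the role-less behavior. Everything in it is an assumption for illustration: `NodeSnapshot`, `ActiveNodePolicySketch`, and the `standbyStatus` parameter are hypothetical names, and the document itself only names the `RoleFilter` option. The standby parameter is included because the table shows the two apps reporting a non-leader differently (Degraded vs Unhealthy).

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical snapshot of the local node; a real check would read this from cluster state.
// IsLeader is assumed to mean "leader for the filtered role" when a role filter is in use.
public sealed record NodeSnapshot(IReadOnlySet<string> Roles, bool IsUp, bool IsLeader);

public static class ActiveNodePolicySketch
{
    public static HealthStatus Evaluate(
        NodeSnapshot node,
        string? roleFilter = null,                            // e.g. "admin" for OtOpcUa
        HealthStatus standbyStatus = HealthStatus.Unhealthy)  // hypothetical knob, not in the spec text
    {
        // Role-filtered mode: a node that does not carry the role is not a leader
        // candidate, so it reports Healthy immediately.
        if (roleFilter is not null && !node.Roles.Contains(roleFilter))
            return HealthStatus.Healthy;

        // Otherwise the node is Healthy only when it is Up and the leader.
        return node.IsUp && node.IsLeader ? HealthStatus.Healthy : standbyStatus;
    }
}
```

With these assumptions, OtOpcUa's current behavior corresponds to `Evaluate(node, "admin", HealthStatus.Degraded)` and ScadaBridge's to `Evaluate(node)`.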
### §6 Response writer

| Aspect | OtOpcUa | MxAccessGateway | ScadaBridge |
|---|---|---|---|
| Writer | Default (plain-text/JSON) | Bespoke `GatewayHealthReply` JSON | `UIResponseWriter.WriteHealthCheckUIResponse` |

→ **Gap W1:** The shared `ZB.MOM.WW.Health` package ships a canonical JSON response writer (lifting `HealthChecks.UI.Client` style to the default). All three projects adopt it on the `MapZbHealth()` call; no per-project writer wiring is needed.

### §7 Endpoint authentication

Both OtOpcUa and ScadaBridge expose health endpoints without authentication (`AllowAnonymous` or open by default). MxAccessGateway's `/health/live` has no authentication requirement. The spec canonizes this: health tiers are `AllowAnonymous`; `MapZbHealth()` applies `AllowAnonymous` by default.

No gap: this is consistent across all three. `MapZbHealth()` should document and enforce this default.

## Adoption backlog (ordered)

| # | Item | Projects | Priority | Effort | Risk | Notes |
|---|---|---|---|---|---|---|
| 1 | MxAccessGateway: remove dead `/health/live` + `AddHealthChecks()`, add `GrpcDependencyHealthCheck` (worker IPC) + `MapZbHealth()` | MxGateway | P1 | S | Low | Gap T1, T3, P1, P3: no probes/tiers today; highest delta |
| 2 | OtOpcUa: replace 3 bespoke checks with shared probes (`AkkaClusterHealthCheck` OtOpcUaCompat + `ActiveNodeHealthCheck` role-filtered + `DatabaseHealthCheck` ProbeQuery) | OtOpcUa | P2 | S | Low | Gap A1, D1, L1 |
| 3 | ScadaBridge: replace 3 bespoke checks with shared probes (Default policy + role-less Active + `CanConnectAsync`) + add `/healthz` + unify `ActiveNodeGate` with `IActiveNodeGate` seam | ScadaBridge | P2 | S | Low | Gap T2, A1, D2, L1, L2 |
| 4 | OtOpcUa + MxAccessGateway: add `GrpcDependencyHealthCheck` for downstream gRPC channel | OtOpcUa, MxGateway | P2 | S | Low | Gap P2: closes the silent-gateway-down scenario |
| 5 | All: adopt canonical response writer (switch from per-project writers to `MapZbHealth` default) | all 3 | P3 | XS | Low | Gap W1: mechanical; bundled with #1–3 |
| 6 | DB injection style: switch ScadaBridge from injected `DbContext` to `IDbContextFactory` | ScadaBridge | P3 | XS | Low | Gap D2: background-service safety |

**Note: adoption items #1–6 are all follow-on tasks.** They are tracked here as the backlog for after `ZB.MOM.WW.Health` @ 0.1.0 is published. The library build itself (nupkgs, tests) is a separate task. This is consistent with how `ZB.MOM.WW.Auth` and `ZB.MOM.WW.Theme` are structured: the library is built first; adoption by the three apps is the next step.
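
For orientation when picking up items #1 to #3, the three spec tiers from §1 and the anonymous-access default from §7 could be wired with stock ASP.NET Core health-check endpoints roughly as below. This is a sketch under assumptions, not the library's implementation: `MapZbHealthSketch` is a hypothetical name standing in for `MapZbHealth()`, whose real signature and response writer are defined by the shared package. The calls used (`MapHealthChecks`, `HealthCheckOptions.Predicate`, `AllowAnonymous`) are standard ASP.NET Core APIs.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;

public static class ZbHealthEndpointSketch
{
    // Hypothetical stand-in for MapZbHealth(); the shared extension may differ.
    public static IEndpointRouteBuilder MapZbHealthSketch(this IEndpointRouteBuilder endpoints)
    {
        // Readiness tier: runs only checks registered with the "ready" tag.
        endpoints.MapHealthChecks("/health/ready", new HealthCheckOptions
        {
            Predicate = registration => registration.Tags.Contains("ready"),
        }).AllowAnonymous();

        // Active tier: runs only checks registered with the "active" tag.
        endpoints.MapHealthChecks("/health/active", new HealthCheckOptions
        {
            Predicate = registration => registration.Tags.Contains("active"),
        }).AllowAnonymous();

        // Bare process liveness: the predicate excludes every check, so the endpoint
        // reports Healthy whenever the process is able to answer the request.
        endpoints.MapHealthChecks("/healthz", new HealthCheckOptions
        {
            Predicate = _ => false,
        }).AllowAnonymous();

        return endpoints;
    }
}
```

Unlike the raw `MapGet` lambda called out in Gap T3, endpoints mapped this way go through the health-check middleware, so tagged probes registered via `AddHealthChecks()` take part in the result.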