# Health (readiness / liveness / active-node)

Second normalized component under the operability cluster. **Goal: path to shared code**: converge
the three sister projects onto a common three-tier health endpoint convention and a set of shared
probe implementations, proposed as the `ZB.MOM.WW.Health` library set (3 packages), while each
project keeps its own probe registration and orchestrator wiring.

- The one target: [`spec/SPEC.md`](spec/SPEC.md)
- The proposed shared library: [`shared-contract/ZB.MOM.WW.Health.md`](shared-contract/ZB.MOM.WW.Health.md)
- Divergences + backlog: [`GAPS.md`](GAPS.md)
- Current state, per project: [`current-state/`](current-state/)

## Why health is a strong normalization candidate

Both OtOpcUa and ScadaBridge trace their health-check structure to the same "ScadaLink three-tier
pattern" (`HealthEndpoints.cs:13` says so explicitly) but have already diverged in probe logic,
status semantics, response writer, and endpoint registration style. MxAccessGateway has no shared
ancestry here: it has a single hardcoded `/health/live` endpoint with no real probes at all.
The common core (three tiers, database probe, Akka cluster probe, active-node probe) is therefore
implemented separately in OtOpcUa and ScadaBridge and missing from MxAccessGateway. Shared probe
implementations with configurable policies close the gap without forcing identical behavior onto
projects with legitimately different cluster semantics.

## Status by project

| Project | Endpoints today | Probes today | Response writer | `/healthz` | `IActiveNodeGate` | Adoption status |
|---|---|---|---|---|---|---|
| **OtOpcUa** | `/health/ready`, `/health/active`, `/healthz` | Database (query), AkkaCluster (2-way), AdminRoleLeader (role-filtered) | Default (plain-text/JSON) | ✅ present | none | Not started |
| **MxAccessGateway** | `/health/live` only (raw `MapGet`; hardcoded `"Healthy"`) | **None** (`AddHealthChecks()` called but unused) | Bespoke `GatewayHealthReply` JSON | ⛔ absent | none | Not started |
| **ScadaBridge** | `/health/ready`, `/health/active` | Database (`CanConnectAsync`), AkkaCluster (3-way), ActiveNode (role-less) | `HealthChecks.UI.Client` JSON | ⛔ absent | `ActiveNodeGate` (backs Inbound API 503 gate) | Not started |

See each project's [`current-state/<project>/CURRENT-STATE.md`](current-state/) for the
code-verified detail and its adoption plan.

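The ScadaBridge row notes that its `ActiveNodeGate` backs a 503 gate on the Inbound API. As a rough illustration of that idea, here is a minimal sketch built only from stock ASP.NET Core middleware calls. The interface `ISketchActiveNodeGate`, its `IsActive` member, the helper names, and the guarded path prefix are all assumptions made for this example; the real `ActiveNodeGate` and `IActiveNodeGate` shapes are defined in the project sources and the shared contract, not here.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-in for the gate seam; the member name is an assumption.
public interface ISketchActiveNodeGate
{
    bool IsActive { get; }
}

public static class SketchActiveNodeGateMiddleware
{
    // Pure decision: reject only guarded paths, and only while this node is not active.
    public static bool ShouldReject(PathString path, PathString guardedPrefix, bool isActive) =>
        !isActive && path.StartsWithSegments(guardedPrefix);

    // Short-circuits guarded requests with 503 Service Unavailable on a standby node.
    public static IApplicationBuilder UseSketchActiveNodeGate(
        this IApplicationBuilder app, PathString guardedPrefix) =>
        app.Use(async (context, next) =>
        {
            var gate = context.RequestServices.GetRequiredService<ISketchActiveNodeGate>();
            if (ShouldReject(context.Request.Path, guardedPrefix, gate.IsActive))
            {
                context.Response.StatusCode = StatusCodes.Status503ServiceUnavailable;
                return;
            }

            await next(context);
        });
}
```

Keeping the decision in a pure `ShouldReject` helper means the gating rule can be unit-tested without hosting a web application. The health endpoints themselves stay outside the guarded prefix, so a standby node can still report its state.
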
## Normalized vs. left per-project

**Normalized (the shared target):**

- Three-tier endpoint convention: `/health/ready` (tag `ready`), `/health/active` (tag `active`),
  `/healthz` (bare liveness). Mapped by `app.MapZbHealth()` from `ZB.MOM.WW.Health`.
- Canonical JSON response writer (lifted from `HealthChecks.UI.Client` style; no per-project
  writer wiring needed).
- `IActiveNodeGate` seam, generalized from ScadaBridge's `ActiveNodeGate`; wired into `MapZbHealth`
  for automatic active-tier response.
- `GrpcDependencyHealthCheck`: reachability probe for a downstream gRPC dependency (covers
  OtOpcUa → MxAccessGateway channel and MxAccessGateway → worker IPC).
- `AkkaClusterHealthCheck` (in `ZB.MOM.WW.Health.Akka`) with a configurable status policy.
  Default = ScadaBridge's three-way policy; `OtOpcUaCompat` preset preserves OtOpcUa's two-way
  self-Up-among-members scan.
- `ActiveNodeHealthCheck` (in `ZB.MOM.WW.Health.Akka`) with an optional role filter. Role-less =
  ScadaBridge's behavior (Up + cluster leader); role-filtered = OtOpcUa's `AdminRoleLeader`
  behavior.
- `DatabaseHealthCheck<TContext>` (in `ZB.MOM.WW.Health.EntityFrameworkCore`) with default
  `CanConnectAsync` and an optional `ProbeQuery` delegate.
- `AllowAnonymous` on all three tiers by default (consistent across all three projects today).

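To make the three-tier convention concrete, the sketch below shows roughly what such a mapping amounts to when written against stock ASP.NET Core health-check APIs. It is not the implementation of `MapZbHealth`; the names `SketchHealthEndpoints`, `ForTag`, and `MapSketchHealthTiers` are hypothetical, and the canonical JSON writer and `IActiveNodeGate` wiring are deliberately left out.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class SketchHealthEndpoints
{
    // Options that run only the registered checks carrying the given tag.
    public static HealthCheckOptions ForTag(string tag) =>
        new() { Predicate = registration => registration.Tags.Contains(tag) };

    // Hypothetical stand-in for a three-tier mapper, using only framework calls.
    public static IEndpointRouteBuilder MapSketchHealthTiers(this IEndpointRouteBuilder endpoints)
    {
        endpoints.MapHealthChecks("/health/ready", ForTag("ready")).AllowAnonymous();
        endpoints.MapHealthChecks("/health/active", ForTag("active")).AllowAnonymous();

        // Bare liveness: the predicate excludes every probe, so the endpoint
        // only proves that the process can answer HTTP requests.
        endpoints.MapHealthChecks("/healthz", new HealthCheckOptions { Predicate = _ => false })
                 .AllowAnonymous();

        return endpoints;
    }
}
```

With an extension of this shape, a single `app.MapSketchHealthTiers();` call in `Program.cs` would expose all three tiers, and each application would still decide which probes to register and which of the `ready` and `active` tags each probe carries.
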
**Left per-project (not forced together):**

- Which probes each app registers, their names, and which tags they carry.
- Orchestrator / Traefik wiring (sidecars, route rules, upstreams).
- ScadaBridge's `HealthMonitoring/` distributed aggregation pipeline (`SiteHealthCollector`,
  `CentralHealthAggregator`, `HealthReportSender`, etc.): domain-specific, no shared-library
  equivalent.
- MxAccessGateway's `GatewayHealthReply` metadata (`DefaultBackend`, `WorkerProtocolVersion`):
  keep as a bespoke `/info` endpoint.
- The x86 worker process: out of process and out of scope; the gateway-side
  `GrpcDependencyHealthCheck` observes it indirectly.

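The "configurable status policy" listed for `AkkaClusterHealthCheck` among the normalized items can be pictured as a function from a cluster snapshot to a health status. The sketch below is illustrative only and does not use Akka.NET types. It assumes that "two-way" means Healthy or Unhealthy depending on whether this node appears among the members as Up, and that "three-way" adds a Degraded result when some other member is unreachable. The second assumption in particular is a guess; the authoritative rules are in the spec and the per-project current-state documents. All type and member names here are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified cluster model; these are not Akka.NET types.
public enum SketchMemberStatus { Joining, Up, Leaving, Down }
public enum SketchHealth { Healthy, Degraded, Unhealthy }
public sealed record SketchMember(string Address, SketchMemberStatus Status, bool Reachable);

public static class SketchClusterPolicies
{
    // Two-way: Healthy only if this node is listed among the members as Up.
    public static SketchHealth TwoWay(string self, IReadOnlyCollection<SketchMember> members) =>
        members.Any(m => m.Address == self && m.Status == SketchMemberStatus.Up)
            ? SketchHealth.Healthy
            : SketchHealth.Unhealthy;

    // Three-way (assumed semantics): same Unhealthy rule as two-way, plus
    // Degraded when this node is Up but at least one member is unreachable.
    public static SketchHealth ThreeWay(string self, IReadOnlyCollection<SketchMember> members)
    {
        if (TwoWay(self, members) == SketchHealth.Unhealthy)
        {
            return SketchHealth.Unhealthy;
        }

        return members.All(m => m.Reachable) ? SketchHealth.Healthy : SketchHealth.Degraded;
    }
}
```

Expressing each policy as a plain function is one way a shared probe could stay identical across projects while the status rule is selected per application.
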
## Package structure

`ZB.MOM.WW.Health` ships as three dependency-split packages:

| Package | Contents | Consumers |
|---|---|---|
| `ZB.MOM.WW.Health` | Core tiers, `MapZbHealth`, canonical writer, `IActiveNodeGate`, `GrpcDependencyHealthCheck` | All three |
| `ZB.MOM.WW.Health.Akka` | `AkkaClusterHealthCheck` + status presets, `ActiveNodeHealthCheck` + role filter | OtOpcUa, ScadaBridge |
| `ZB.MOM.WW.Health.EntityFrameworkCore` | `DatabaseHealthCheck<TContext>` + optional probe delegate | OtOpcUa, ScadaBridge |

MxAccessGateway consumes the core package only (no Akka, no EF). OtOpcUa and ScadaBridge consume
all three.

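The EF Core package is described as a database probe that defaults to `CanConnectAsync` and accepts an optional probe delegate. A minimal sketch of that pattern follows, assuming a constructor-supplied delegate of type `Func<TContext, CancellationToken, Task<bool>>`. The class name `SketchDatabaseHealthCheck`, the delegate shape, and the result descriptions are assumptions for illustration and may differ from the shipped `DatabaseHealthCheck<TContext>`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical illustration; not the shipped DatabaseHealthCheck<TContext>.
public sealed class SketchDatabaseHealthCheck<TContext> : IHealthCheck
    where TContext : DbContext
{
    private readonly TContext _db;
    private readonly Func<TContext, CancellationToken, Task<bool>>? _probeQuery;

    public SketchDatabaseHealthCheck(
        TContext db,
        Func<TContext, CancellationToken, Task<bool>>? probeQuery = null)
    {
        _db = db;
        _probeQuery = probeQuery;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            // Default probe is a plain connectivity test; the delegate can
            // replace it with a real query when a project needs one.
            var ok = _probeQuery is null
                ? await _db.Database.CanConnectAsync(cancellationToken)
                : await _probeQuery(_db, cancellationToken);

            return ok
                ? HealthCheckResult.Healthy("database reachable")
                : new HealthCheckResult(context.Registration.FailureStatus, "database probe failed");
        }
        catch (Exception ex)
        {
            // Report the registration's configured failure status instead of throwing.
            return new HealthCheckResult(context.Registration.FailureStatus, "database probe threw", ex);
        }
    }
}
```

In this sketch a project that only needs connectivity passes no delegate, while one that probes with a query (as the OtOpcUa row in the status table suggests) supplies its own.
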
## Component status

**Status: Draft, library built at 0.1.0.** Spec and shared-contract written; current-state docs
verified; GAPS backlog populated. Library implemented and packed at
[`../../ZB.MOM.WW.Health/`](../../ZB.MOM.WW.Health/) (3 packages, 58 tests;
`ZB.MOM.WW.Health`, `ZB.MOM.WW.Health.Akka`, `ZB.MOM.WW.Health.EntityFrameworkCore`).
Adoption by the three apps is the next follow-on tracked in [`GAPS.md`](GAPS.md).