scadaproj/docs/plans/2026-06-01-zb-mom-ww-health-shared-library.md
Joseph Doherty c77df2a2cd docs: implementation plans for ZB.MOM.WW.Health + ZB.MOM.WW.Telemetry
Two TDD plans (one per library, per house precedent) derived from the approved
design, with co-located .tasks.json execution-persistence:

- Health: components/health docs + 3 dependency-split packages (11 tasks)
- Telemetry: components/observability docs + 2 packages (3 OTel signals +
  Serilog) + the MxGateway MEL->Serilog migration (12 tasks)

Each task carries classification / est-time / parallelizable metadata for the
executing-plans workflow.
2026-06-01 06:15:22 -04:00

# ZB.MOM.WW.Health Shared Library Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
**Goal:** Author the `components/health/` normalization docs and build the `ZB.MOM.WW.Health` shared library set (3 dependency-split NuGet packages) that normalizes the three-tier health pattern (`/health/ready` · `/health/active` · `/healthz`) and its reusable probes, so OtOpcUa, MxAccessGateway, and ScadaBridge can stop re-implementing health checks and MxGateway can gain readiness/active tiers it lacks today.
**Architecture:** A new standalone nested repo (`~/Desktop/scadaproj/ZB.MOM.WW.Health`), .NET 10, three library projects — `ZB.MOM.WW.Health` (core: tier mapping, JSON writer, `IActiveNodeGate`, gRPC-dependency probe; depends only on ASP.NET Core HealthChecks abstractions), `ZB.MOM.WW.Health.Akka` (Akka cluster + leader/active probes; opt-in Akka dep), `ZB.MOM.WW.Health.EntityFrameworkCore` (`DatabaseHealthCheck<TContext>`; opt-in EF dep). Heavy deps live in satellites so MxGateway (non-Akka, non-EF) references core only. Reference implementations are lifted and generalized from OtOpcUa and ScadaBridge. **Consumer adoption (wiring into the three apps) is a separate follow-on `GAPS.md` item — this plan delivers docs + library + tests + packages only**, matching where Auth/Theme sit.
**Tech Stack:** .NET 10, C#; xUnit + coverlet; `Microsoft.AspNetCore.Mvc.Testing` (WebApplicationFactory); `Microsoft.Extensions.Diagnostics.HealthChecks`; `AspNetCore.HealthChecks.UI.Client` (response-writer style); `Akka.Cluster` 1.5.62; `Microsoft.EntityFrameworkCore` 10.0.7 (+ `.Sqlite`/`.InMemory` for tests); `Grpc.Net.Client`; central package management (`Directory.Packages.props`); `.slnx` solution; `Version` 0.1.0 lockstep.
**Source references (read-only, to port/generalize from):**
- Three-tier + probes: OtOpcUa `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/{HealthEndpoints,DatabaseHealthCheck,AkkaClusterHealthCheck,AdminRoleLeaderHealthCheck}.cs`
- Probes + gate + JSON writer: ScadaBridge `~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Health/{DatabaseHealthCheck,AkkaClusterHealthCheck,ActiveNodeHealthCheck,ActiveNodeGate}.cs` and `Program.cs:114-117,222-233`
- The gap to fill: MxGateway `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs:61,139-145`
- Design: `~/Desktop/scadaproj/docs/plans/2026-06-01-health-observability-components-design.md`
**Conventions for every task:** TDD (@superpowers-extended-cc:test-driven-development) — failing test first, minimal impl, green, commit. File-scoped namespaces, `sealed` by default, `Async` suffix on Task-returning methods. Commit after each green task. The `Files:` block of each task is the `files_to_edit` contract — touch exactly those paths; if more are needed, that's a plan defect to surface.
---
## Phase 0 — Normalization docs (spec drives the API)
### Task 1: components/health spec + shared-contract
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 2
**Files:**
- Create: `components/health/spec/SPEC.md`
- Create: `components/health/shared-contract/ZB.MOM.WW.Health.md`
**Steps:**
1. `spec/SPEC.md` — follow `components/auth/spec/SPEC.md` shape. Section 0 **Scope**: normalized = tier semantics (`ready`/`active`/`live`), canonical tag set, JSON response shape, configurable Akka status policy, role-filtered active/leader probe, `DatabaseHealthCheck<T>`, gRPC-dependency probe, `IActiveNodeGate`. Explicitly NOT normalized = which probes each app registers, Traefik/orchestrator wiring, ScadaBridge's `HealthMonitoring/` domain aggregation pipeline. Then a section per: tier table, probe catalog (with the two real Akka-policy variants + the role-filter unification), response-writer contract.
2. `shared-contract/ZB.MOM.WW.Health.md` — the paper public API of the 3 packages: `MapZbHealth`, `AddZbHealthChecks`, `ZbHealthTags`, `IActiveNodeGate`, `GrpcDependencyHealthCheck`(options), `AkkaClusterHealthCheck`(+`AkkaClusterStatusPolicy`), `ActiveNodeHealthCheck`(+role option), `DatabaseHealthCheck<TContext>`(+probe delegate). Valid C# signatures, package-by-package, like `components/auth/shared-contract/ZB.MOM.WW.Auth.md`.
**Acceptance:** Both files exist; `SPEC.md` has an explicit normalized-vs-per-project Section 0; the shared-contract compiles as C# signatures in your head (no behavior, just the surface). No tests (docs).
---
### Task 2: components/health current-state ×3 + GAPS + README
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 1
**Files:**
- Create: `components/health/current-state/otopcua/CURRENT-STATE.md`
- Create: `components/health/current-state/mxaccessgw/CURRENT-STATE.md`
- Create: `components/health/current-state/scadabridge/CURRENT-STATE.md`
- Create: `components/health/GAPS.md`
- Create: `components/health/README.md`
**Steps:**
1. Transcribe the code-verified scan from the design doc's "Health" current-state table + key refs into the three `CURRENT-STATE.md` files at full `file:line` depth (re-verify each ref against the live repo). Each ends in an **Adoption plan** (what that app deletes/replaces to reach the spec). MxGateway's plan = "register the tier mapping + a worker gRPC-dependency probe; today `AddHealthChecks()` at `GatewayApplication.cs:61` is unused."
2. `GAPS.md` — per-project divergences + prioritized extraction/adoption backlog. Top entries: MxGateway has no probes/tiers (P1); Akka status-policy divergence (OtOpcUa vs ScadaBridge); DB-probe technique (`CanConnectAsync` vs `Deployments` query); `/healthz` present in OtOpcUa, absent in ScadaBridge.
3. `README.md` — overview + per-project status table (like `components/auth/README.md`).
**Acceptance:** Five files exist; each current-state cites real `file:line` refs; `GAPS.md` lists adoption as a follow-on. No tests (docs).
---
## Phase 1 — Scaffold
### Task 3: Create repo, solution, and project shells
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** none (gates all impl tasks)
**Files:**
- Create: `ZB.MOM.WW.Health/ZB.MOM.WW.Health.slnx`
- Create: `ZB.MOM.WW.Health/Directory.Build.props`
- Create: `ZB.MOM.WW.Health/Directory.Packages.props`
- Create: `ZB.MOM.WW.Health/.gitignore`
- Create: `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/ZB.MOM.WW.Health.csproj`
- Create: `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.Akka/ZB.MOM.WW.Health.Akka.csproj`
- Create: `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.EntityFrameworkCore/ZB.MOM.WW.Health.EntityFrameworkCore.csproj`
- Create: `ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.Tests/…csproj`
- Create: `ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.Akka.Tests/…csproj`
- Create: `ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.EntityFrameworkCore.Tests/…csproj`
**Steps:**
1. `cd ~/Desktop/scadaproj && mkdir ZB.MOM.WW.Health && cd ZB.MOM.WW.Health && git init && dotnet new gitignore`
2. `dotnet new sln -n ZB.MOM.WW.Health --format slnx` (fallback to `.sln` if the SDK rejects `.slnx`).
3. `dotnet new classlib -f net10.0 -o src/<Name>` for the 3 libs; `dotnet new xunit -f net10.0 -o tests/<Name>.Tests` for the 3 test projects. Delete default `Class1.cs`/`UnitTest1.cs`.
4. Project refs: `.Akka` & `.EntityFrameworkCore` → core `ZB.MOM.WW.Health`; each test project → its lib (core test also refs `Microsoft.AspNetCore.Mvc.Testing`).
5. Copy `Directory.Build.props` verbatim from `ZB.MOM.WW.Auth/Directory.Build.props` (net10.0, Nullable, ImplicitUsings, LangVersion latest, Version 0.1.0, central PM).
6. `Directory.Packages.props` — pin: `Microsoft.Extensions.Diagnostics.HealthChecks` 10.0.7, `Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions` 10.0.7, `AspNetCore.HealthChecks.UI.Client` 9.0.0, `Akka.Cluster` 1.5.62, `Akka.TestKit.Xunit2` 1.5.62, `Microsoft.EntityFrameworkCore` 10.0.7, `Microsoft.EntityFrameworkCore.Sqlite` 10.0.7, `Microsoft.EntityFrameworkCore.InMemory` 10.0.7, `Grpc.Net.Client` (latest 2.x), test pkgs (`Microsoft.NET.Test.Sdk` 17.14.1, `xunit` 2.9.3, `xunit.runner.visualstudio` 3.1.4, `coverlet.collector` 6.0.4, `Microsoft.AspNetCore.Mvc.Testing` 10.0.7). `Microsoft.AspNetCore.Http.Abstractions` and `Microsoft.AspNetCore.Routing` are not pinned: the core lib gets them through `<FrameworkReference Include="Microsoft.AspNetCore.App"/>` rather than individual ASP.NET packages.
7. `dotnet sln add` all 6 projects; `dotnet build`.
8. **Commit:** `git add -A && git commit -m "chore: scaffold ZB.MOM.WW.Health solution and projects"`
**Acceptance:** `dotnet build` green; 6 projects in the solution.
---
## Phase 2 — Core package (`ZB.MOM.WW.Health`)
### Task 4: Canonical tags + `MapZbHealth` tier mapping
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (Tasks 5-7 build on this)
**Files:**
- Create: `src/ZB.MOM.WW.Health/ZbHealthTags.cs`
- Create: `src/ZB.MOM.WW.Health/ZbHealthEndpointExtensions.cs`
- Test: `tests/ZB.MOM.WW.Health.Tests/TierMappingTests.cs`
**Step 1 — failing test:** `WebApplicationFactory`-based test boots a minimal app that registers a fake check tagged `ready`, another tagged `active`, calls `app.MapZbHealth()`, then asserts: `GET /health/ready` runs ready-tagged only, `/health/active` runs active-tagged only, `/healthz` returns 200 with no checks executed (predicate `_ => false`). Use a counter check to prove which tiers ran which checks.
**Step 2 — run, expect FAIL** (`MapZbHealth` undefined). `dotnet test --filter TierMappingTests`.
**Step 3 — implement:** `ZbHealthTags` = `public const string Ready="ready", Active="active", Live="live";`. `ZbHealthEndpointExtensions.MapZbHealth(this IEndpointRouteBuilder, ZbHealthEndpointOptions? = null)` maps the three endpoints with predicates `c => c.Tags.Contains(Ready)`, `…Active`, and `_ => false`; each `AllowAnonymous()`; response writer = the Task 5 writer (stub to default `WriteMinimalPlaintext` for now, replaced in Task 5).
**Step 4 — run, expect PASS.**
**Step 5 — commit:** `feat(health): canonical tags + three-tier MapZbHealth`
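
The Step 3 mapping could look roughly like the sketch below. It is a minimal illustration, not the final API: the members of `ZbHealthEndpointOptions` (the three path properties) and the `IEndpointRouteBuilder` return type are assumptions, since the plan names the options type without describing it. The response writer is left at the framework default, as Step 3 prescribes until Task 5 lands.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;

namespace ZB.MOM.WW.Health;

public static class ZbHealthTags
{
    public const string Ready = "ready", Active = "active", Live = "live";
}

// Hypothetical options shape: the path properties are assumptions for illustration.
public sealed class ZbHealthEndpointOptions
{
    public string ReadyPath { get; init; } = "/health/ready";
    public string ActivePath { get; init; } = "/health/active";
    public string LivePath { get; init; } = "/healthz";
}

public static class ZbHealthEndpointExtensions
{
    public static IEndpointRouteBuilder MapZbHealth(
        this IEndpointRouteBuilder endpoints,
        ZbHealthEndpointOptions? options = null)
    {
        options ??= new ZbHealthEndpointOptions();

        // Readiness tier: only checks tagged "ready" run.
        endpoints.MapHealthChecks(options.ReadyPath, new HealthCheckOptions
        {
            Predicate = c => c.Tags.Contains(ZbHealthTags.Ready),
        }).AllowAnonymous();

        // Active tier: only checks tagged "active" run.
        endpoints.MapHealthChecks(options.ActivePath, new HealthCheckOptions
        {
            Predicate = c => c.Tags.Contains(ZbHealthTags.Active),
        }).AllowAnonymous();

        // Liveness tier: the predicate rejects every check, so the empty
        // report is Healthy and the endpoint answers 200 while the process serves HTTP.
        endpoints.MapHealthChecks(options.LivePath, new HealthCheckOptions
        {
            Predicate = _ => false,
        }).AllowAnonymous();

        return endpoints;
    }
}
```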
---
### Task 5: Canonical JSON response writer
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (slots into Task 4's mapper)
**Files:**
- Create: `src/ZB.MOM.WW.Health/ZbHealthResponseWriter.cs`
- Modify: `src/ZB.MOM.WW.Health/ZbHealthEndpointExtensions.cs` (wire writer as default)
- Test: `tests/ZB.MOM.WW.Health.Tests/ResponseWriterTests.cs`
**Step 1 — failing test:** assert `/health/ready` body is JSON with `status`, per-check `name`/`status`/`description`, and `totalDurationMs`; content-type `application/json`; HTTP 200 when Healthy, 503 when any check Unhealthy, 200 when Degraded. Port the shape from ScadaBridge's `UIResponseWriter.WriteHealthCheckUIResponse` usage (`Program.cs:222-233`) but as our own writer (no UI dependency required at runtime — keep `AspNetCore.HealthChecks.UI.Client` optional).
**Step 2 — run, expect FAIL.**
**Step 3 — implement** `ZbHealthResponseWriter.WriteJsonAsync(HttpContext, HealthReport)`; set it as the default `ResponseWriter` in `MapZbHealth`.
**Step 4 — run, expect PASS.**
**Step 5 — commit:** `feat(health): canonical JSON health response writer`
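
One way the writer in Step 3 might be shaped is sketched below. The field names `status`, `name`, `description`, and `totalDurationMs` come from Step 1; the enclosing `checks` array name is an assumption. The writer does not set the status code itself: the health-check middleware applies `HealthCheckOptions.ResultStatusCodes` before invoking the writer, and its defaults (200 for Healthy and Degraded, 503 for Unhealthy) already match the expectations in Step 1.

```csharp
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health;

public static class ZbHealthResponseWriter
{
    private static readonly JsonSerializerOptions JsonOptions = new(JsonSerializerDefaults.Web);

    // Signature matches HealthCheckOptions.ResponseWriter (Func<HttpContext, HealthReport, Task>).
    public static Task WriteJsonAsync(HttpContext context, HealthReport report)
    {
        var payload = new
        {
            status = report.Status.ToString(),
            totalDurationMs = report.TotalDuration.TotalMilliseconds,
            // "checks" is an assumed property name for the per-check array.
            checks = report.Entries.Select(e => new
            {
                name = e.Key,
                status = e.Value.Status.ToString(),
                description = e.Value.Description,
            }),
        };

        context.Response.ContentType = "application/json";
        return JsonSerializer.SerializeAsync(
            context.Response.Body, payload, JsonOptions, context.RequestAborted);
    }
}
```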
---
### Task 6: `IActiveNodeGate` seam + route gating
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 7
**Files:**
- Create: `src/ZB.MOM.WW.Health/IActiveNodeGate.cs`
- Create: `src/ZB.MOM.WW.Health/ActiveNodeGateEndpointFilter.cs`
- Test: `tests/ZB.MOM.WW.Health.Tests/ActiveNodeGateTests.cs`
**Step 1 — failing test:** register a fake `IActiveNodeGate` whose `IsActiveNode` toggles; a route guarded by `.RequireActiveNode()` returns 200 when active, 503 (`Retry-After`) when standby. Port the gate contract from ScadaBridge `ActiveNodeGate.cs:24-57` (generalized — no Akka in core; the implementation is provided by `.Akka` or the consumer).
**Step 2 — FAIL. Step 3 — implement** `IActiveNodeGate { bool IsActiveNode { get; } }` + an `IEndpointConventionBuilder RequireActiveNode(this …)` filter that 503s when not active. **Step 4 — PASS. Step 5 — commit:** `feat(health): IActiveNodeGate seam + RequireActiveNode filter`
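
A minimal sketch of the seam and the filter follows, assuming the guarded routes are minimal-API route handlers (endpoint filters do not run for other endpoint kinds). The generic `TBuilder` signature is a variation on the `IEndpointConventionBuilder` extension named above, chosen so chaining keeps the caller's builder type, and the `retryAfterSeconds` parameter with its default of 5 is hypothetical.

```csharp
using System.Globalization;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

namespace ZB.MOM.WW.Health;

public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

public static class ActiveNodeGateEndpointFilter
{
    // retryAfterSeconds is an assumed knob; the plan only requires a Retry-After header.
    public static TBuilder RequireActiveNode<TBuilder>(this TBuilder builder, int retryAfterSeconds = 5)
        where TBuilder : IEndpointConventionBuilder
    {
        return builder.AddEndpointFilter(async (invocation, next) =>
        {
            var http = invocation.HttpContext;
            var gate = http.RequestServices.GetRequiredService<IActiveNodeGate>();

            // Active node: run the guarded handler unchanged.
            if (gate.IsActiveNode)
            {
                return await next(invocation);
            }

            // Standby node: short-circuit with 503 and tell the caller when to retry.
            http.Response.Headers.RetryAfter = retryAfterSeconds.ToString(CultureInfo.InvariantCulture);
            return Results.StatusCode(StatusCodes.Status503ServiceUnavailable);
        });
    }
}
```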
---
### Task 7: `GrpcDependencyHealthCheck`
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 6
**Files:**
- Create: `src/ZB.MOM.WW.Health/GrpcDependencyHealthCheck.cs`
- Create: `src/ZB.MOM.WW.Health/GrpcDependencyOptions.cs`
- Test: `tests/ZB.MOM.WW.Health.Tests/GrpcDependencyHealthCheckTests.cs`
**Step 1 — failing test:** with a stub probe delegate that resolves a `GrpcChannel`/health RPC, `CheckHealthAsync` returns Healthy on success, Unhealthy on `RpcException`/timeout. Make the actual probe a `Func<GrpcChannel, CancellationToken, Task<bool>>` injected via options so it's testable without a live server (default uses the standard gRPC Health Checking `Health.Check`).
**Step 2 — FAIL. Step 3 — implement.** Tags default `[Ready, Active]`. **Step 4 — PASS. Step 5 — commit:** `feat(health): gRPC dependency health check`
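
The injected-probe design from Step 1 might be sketched as below. The property names on `GrpcDependencyOptions` (`Channel`, `Probe`, `Timeout`) and the 2-second default are assumptions. The sketch also makes `Probe` required instead of supplying the default `Health.Check` probe, so it stays free of any generated health-service client.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Grpc.Core;
using Grpc.Net.Client;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health;

// Hypothetical options shape; only the probe delegate type comes from the plan.
public sealed class GrpcDependencyOptions
{
    public required GrpcChannel Channel { get; init; }
    public required Func<GrpcChannel, CancellationToken, Task<bool>> Probe { get; init; }
    public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(2);
}

public sealed class GrpcDependencyHealthCheck(GrpcDependencyOptions options) : IHealthCheck
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Bound the probe so a hung dependency cannot stall the health endpoint.
        using var timeout = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeout.CancelAfter(options.Timeout);

        try
        {
            var serving = await options.Probe(options.Channel, timeout.Token).ConfigureAwait(false);
            return serving
                ? HealthCheckResult.Healthy("gRPC dependency reachable")
                : HealthCheckResult.Unhealthy("gRPC dependency reported not serving");
        }
        catch (Exception ex) when (
            ex is RpcException ||
            (ex is OperationCanceledException && !cancellationToken.IsCancellationRequested))
        {
            // RPC failure or our own timeout; caller-initiated cancellation still propagates.
            return HealthCheckResult.Unhealthy("gRPC dependency probe failed", ex);
        }
    }
}
```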
---
## Phase 3 — `ZB.MOM.WW.Health.Akka`
### Task 8: `AkkaClusterHealthCheck` + configurable status policy
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 10
**Files:**
- Create: `src/ZB.MOM.WW.Health.Akka/AkkaClusterStatusPolicy.cs`
- Create: `src/ZB.MOM.WW.Health.Akka/AkkaClusterHealthCheck.cs`
- Test: `tests/ZB.MOM.WW.Health.Akka.Tests/AkkaClusterStatusPolicyTests.cs`
**Step 1 — failing test:** table-driven over `MemberStatus` → `HealthStatus` for both presets. `Default` (ScadaBridge `AkkaClusterHealthCheck.cs:29-51`): `Up`/`Joining`→Healthy, `Leaving`/`Exiting`→Degraded, else Unhealthy. `OtOpcUaCompat` (OtOpcUa `AkkaClusterHealthCheck.cs:25-34`): self-`Up`-among-members→Healthy else Degraded. Test the **policy function** directly (pure), no live cluster.
**Step 2 — FAIL. Step 3 — implement** `AkkaClusterStatusPolicy` as a `Func<MemberStatus, HealthStatus>` with `Default`/`OtOpcUaCompat` static presets; `AkkaClusterHealthCheck` reads `Cluster.Get(system).SelfMember.Status` and applies the configured policy. Tags `[Ready, Active]`.
**Step 4 — PASS. Step 5 — commit:** `feat(health.akka): cluster health check with configurable status policy`
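
The two presets can be written as pure functions, which is what makes the table-driven test in Step 1 possible without a live cluster. The sketch below holds them in a static class of `Func<MemberStatus, HealthStatus>` fields, which is one assumed reading of Step 3. `OtOpcUaCompat` is simplified here to "self status is `Up`"; whether the original also inspects the member list is not captured.

```csharp
using System;
using Akka.Cluster;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health.Akka;

public static class AkkaClusterStatusPolicy
{
    // ScadaBridge-style mapping: joining or up is healthy, a graceful exit is degraded.
    public static readonly Func<MemberStatus, HealthStatus> Default = status => status switch
    {
        MemberStatus.Up or MemberStatus.Joining => HealthStatus.Healthy,
        MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
        _ => HealthStatus.Unhealthy,
    };

    // OtOpcUa-style mapping (simplified assumption): only Up is healthy, and
    // every other state is degraded, never unhealthy.
    public static readonly Func<MemberStatus, HealthStatus> OtOpcUaCompat = status =>
        status == MemberStatus.Up ? HealthStatus.Healthy : HealthStatus.Degraded;
}
```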
---
### Task 9: `ActiveNodeHealthCheck` with optional role filter
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 10
**Files:**
- Create: `src/ZB.MOM.WW.Health.Akka/ActiveNodeHealthCheck.cs`
- Create: `src/ZB.MOM.WW.Health.Akka/AkkaActiveNodeGate.cs` (implements core `IActiveNodeGate`)
- Test: `tests/ZB.MOM.WW.Health.Akka.Tests/ActiveNodeHealthCheckTests.cs`
**Step 1 — failing test:** with faked cluster state, role-less → Healthy iff `SelfMember.Status==Up && Leader==self` (ScadaBridge `ActiveNodeHealthCheck.cs:25-44`); with `role="admin"` → Healthy when node lacks the role (OtOpcUa `AdminRoleLeaderHealthCheck.cs:30`), Healthy when admin-leader, Degraded when admin-but-not-leader. Tags `[Active]` only.
**Step 2 — FAIL. Step 3 — implement** both the check and `AkkaActiveNodeGate : IActiveNodeGate` (so `RequireActiveNode` works in Akka apps). **Step 4 — PASS. Step 5 — commit:** `feat(health.akka): active/leader check with role filter + IActiveNodeGate impl`
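
The role-filter unification in Step 1 reduces to a small decision table, sketched below as a pure function over a snapshot of cluster facts. `ActiveNodeSnapshot` and `ActiveNodeDecision` are hypothetical helper names, not part of the planned file list; the real check would fill the snapshot from the Akka cluster state. Returning Unhealthy for a role-less node that is not the active leader is an assumption, since Step 1 only states when that case is Healthy.

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health.Akka;

// Hypothetical snapshot of the facts the check reads from the cluster.
// With a role configured, SelfIsLeader means "leader among nodes with that role".
public readonly record struct ActiveNodeSnapshot(bool SelfIsUp, bool SelfIsLeader, bool SelfHasRole);

public static class ActiveNodeDecision
{
    public static HealthStatus Evaluate(ActiveNodeSnapshot node, string? role)
    {
        if (role is null)
        {
            // Role-less variant: healthy only on the Up leader.
            // Unhealthy for every other node is an assumption.
            return node.SelfIsUp && node.SelfIsLeader
                ? HealthStatus.Healthy
                : HealthStatus.Unhealthy;
        }

        // Role-filtered variant: nodes without the role are not candidates,
        // so the check does not penalize them.
        if (!node.SelfHasRole)
        {
            return HealthStatus.Healthy;
        }

        return node.SelfIsLeader ? HealthStatus.Healthy : HealthStatus.Degraded;
    }
}
```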
---
## Phase 4 — `ZB.MOM.WW.Health.EntityFrameworkCore`
### Task 10: `DatabaseHealthCheck<TContext>`
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 8, Task 9
**Files:**
- Create: `src/ZB.MOM.WW.Health.EntityFrameworkCore/DatabaseHealthCheck.cs`
- Create: `src/ZB.MOM.WW.Health.EntityFrameworkCore/DatabaseHealthCheckOptions.cs`
- Test: `tests/ZB.MOM.WW.Health.EntityFrameworkCore.Tests/DatabaseHealthCheckTests.cs`
**Step 1 — failing test:** SQLite in-memory `DbContext` → Healthy (default probe `CanConnectAsync`, ScadaBridge `DatabaseHealthCheck.cs:27-42`); a deliberately broken context → Unhealthy; a custom `Func<TContext,CancellationToken,Task>` probe delegate (OtOpcUa's "query `Deployments`" style, `DatabaseHealthCheck.cs:25-37`) runs, and the check returns Unhealthy if it throws. Resolve `TContext` via `IDbContextFactory<TContext>` (matches OtOpcUa) with a fallback to scoped `TContext`.
**Step 2 — FAIL. Step 3 — implement** generic check + options carrying the optional probe delegate. Tags `[Ready, Active]`. **Step 4 — PASS. Step 5 — commit:** `feat(health.ef): generic DatabaseHealthCheck<TContext>`
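
A minimal sketch of the generic check follows, covering only the `IDbContextFactory<TContext>` path; the scoped-`TContext` fallback is omitted. Making the options type generic and naming its delegate property `Probe` are assumptions, as the plan fixes only the delegate's type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health.EntityFrameworkCore;

// Hypothetical options shape: a null Probe selects the default CanConnectAsync probe.
public sealed class DatabaseHealthCheckOptions<TContext> where TContext : DbContext
{
    public Func<TContext, CancellationToken, Task>? Probe { get; init; }
}

public sealed class DatabaseHealthCheck<TContext>(
    IDbContextFactory<TContext> factory,
    DatabaseHealthCheckOptions<TContext>? options = null) : IHealthCheck
    where TContext : DbContext
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            await using var db = await factory.CreateDbContextAsync(cancellationToken);

            if (options?.Probe is { } probe)
            {
                // Custom probe (for example a cheap query): success means it did not throw.
                await probe(db, cancellationToken);
            }
            else if (!await db.Database.CanConnectAsync(cancellationToken))
            {
                return HealthCheckResult.Unhealthy("Database connection could not be established");
            }

            return HealthCheckResult.Healthy("Database reachable");
        }
        catch (Exception ex)
        {
            // Any failure while creating the context or probing counts as Unhealthy.
            return HealthCheckResult.Unhealthy("Database probe failed", ex);
        }
    }
}
```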
---
## Phase 5 — Package & register
### Task 11: Pack, README, register in indexes
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** none (final)
**Files:**
- Create: `ZB.MOM.WW.Health/README.md`
- Modify: each `.csproj` (PackageId/Description/Authors metadata)
- Modify: `components/README.md` (registry row)
- Modify: `CLAUDE.md` (Component-normalization table row)
- Modify: `upcoming.md` (check off Health in "Suggested order")
**Steps:**
1. Add NuGet metadata to the 3 lib `.csproj`s (`<PackageId>`, `<Description>`, `<Authors>`, `<PackageTags>`); leave `GeneratePackageOnBuild` off (pack explicitly, like Auth).
2. `dotnet test` (all 3 test projects green) — record counts.
3. `dotnet pack -c Release -o ./artifacts` → confirm 3 `*.0.1.0.nupkg`.
4. `README.md` — packages, consumer matrix (MxGateway → core only; OtOpcUa/ScadaBridge → all three), `dotnet test` instructions, the "built, not yet adopted" note.
5. Register: `components/README.md` registry row (status `Draft`), `CLAUDE.md` table row, tick Health in `upcoming.md`.
6. **Commit:** `git -C ZB.MOM.WW.Health add -A && git -C ZB.MOM.WW.Health commit -m "docs: README + pack metadata"`, then in scadaproj `git add components/health CLAUDE.md components/README.md upcoming.md docs/plans && git commit -m "feat(health): ZB.MOM.WW.Health library + health normalization component"`
**Acceptance:** 3 nupkgs @ 0.1.0; all tests green (counts recorded); indexes updated; design-doc build-order step for Health complete.
---
## Summary of parallelism
- **Phase 0** docs: Task 1 ∥ Task 2.
- **Phase 1** scaffold: Task 3 (barrier — everything below needs it).
- **Phase 2** core: Task 4 → Task 5 (sequential, same mapper); Task 6 ∥ Task 7.
- **Phase 3/4**: Task 8 ∥ Task 9 ∥ Task 10 (different packages/files).
- **Phase 5**: Task 11 (barrier — needs all above green).