scadaproj/docs/plans/2026-06-01-zb-mom-ww-health-shared-library.md
Joseph Doherty c77df2a2cd docs: implementation plans for ZB.MOM.WW.Health + ZB.MOM.WW.Telemetry
Two TDD plans (one per library, per house precedent) derived from the approved
design, with co-located .tasks.json execution-persistence:

- Health: components/health docs + 3 dependency-split packages (11 tasks)
- Telemetry: components/observability docs + 2 packages (3 OTel signals +
  Serilog) + the MxGateway MEL->Serilog migration (12 tasks)

Each task carries classification / est-time / parallelizable metadata for the
executing-plans workflow.
2026-06-01 06:15:22 -04:00


ZB.MOM.WW.Health Shared Library Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

Goal: Author the components/health/ normalization docs and build the ZB.MOM.WW.Health shared library set (3 dependency-split NuGet packages) that normalizes the three-tier health pattern (/health/ready · /health/active · /healthz) and its reusable probes, so OtOpcUa, MxAccessGateway, and ScadaBridge can stop re-implementing health checks and MxGateway can gain readiness/active tiers it lacks today.

Architecture: A new standalone nested repo (~/Desktop/scadaproj/ZB.MOM.WW.Health), .NET 10, three library projects — ZB.MOM.WW.Health (core: tier mapping, JSON writer, IActiveNodeGate, gRPC-dependency probe; depends only on ASP.NET Core HealthChecks abstractions), ZB.MOM.WW.Health.Akka (Akka cluster + leader/active probes; opt-in Akka dep), ZB.MOM.WW.Health.EntityFrameworkCore (DatabaseHealthCheck<TContext>; opt-in EF dep). Heavy deps live in satellites so MxGateway (non-Akka, non-EF) references core only. Reference implementations are lifted and generalized from OtOpcUa and ScadaBridge. Consumer adoption (wiring into the three apps) is a separate follow-on GAPS.md item — this plan delivers docs + library + tests + packages only, matching where Auth/Theme sit.

Tech Stack: .NET 10, C#; xUnit + coverlet; Microsoft.AspNetCore.Mvc.Testing (WebApplicationFactory); Microsoft.Extensions.Diagnostics.HealthChecks; AspNetCore.HealthChecks.UI.Client (response-writer style); Akka.Cluster 1.5.62; Microsoft.EntityFrameworkCore 10.0.7 (+ .Sqlite/.InMemory for tests); Grpc.Net.Client; central package management (Directory.Packages.props); .slnx solution; Version 0.1.0 lockstep.

Source references (read-only, to port/generalize from):

  • Three-tier + probes: OtOpcUa ~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/{HealthEndpoints,DatabaseHealthCheck,AkkaClusterHealthCheck,AdminRoleLeaderHealthCheck}.cs
  • Probes + gate + JSON writer: ScadaBridge ~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Health/{DatabaseHealthCheck,AkkaClusterHealthCheck,ActiveNodeHealthCheck,ActiveNodeGate}.cs and Program.cs:114-117,222-233
  • The gap to fill: MxGateway ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs:61,139-145
  • Design: ~/Desktop/scadaproj/docs/plans/2026-06-01-health-observability-components-design.md

Conventions for every task: TDD (@superpowers-extended-cc:test-driven-development) — failing test first, minimal impl, green, commit. File-scoped namespaces, sealed by default, Async suffix on Task-returning methods. Commit after each green task. The Files: block of each task is the files_to_edit contract — touch exactly those paths; if more are needed, that's a plan defect to surface.


Phase 0 — Normalization docs (spec drives the API)

Task 1: components/health spec + shared-contract

Classification: small · Estimated implement time: ~5 min · Parallelizable with: Task 2

Files:

  • Create: components/health/spec/SPEC.md
  • Create: components/health/shared-contract/ZB.MOM.WW.Health.md

Steps:

  1. spec/SPEC.md — follow components/auth/spec/SPEC.md shape. Section 0 Scope: normalized = tier semantics (ready/active/live), canonical tag set, JSON response shape, configurable Akka status policy, role-filtered active/leader probe, DatabaseHealthCheck<T>, gRPC-dependency probe, IActiveNodeGate. Explicitly NOT normalized = which probes each app registers, Traefik/orchestrator wiring, ScadaBridge's HealthMonitoring/ domain aggregation pipeline. Then a section per: tier table, probe catalog (with the two real Akka-policy variants + the role-filter unification), response-writer contract.
  2. shared-contract/ZB.MOM.WW.Health.md — the paper public API of the 3 packages: MapZbHealth, AddZbHealthChecks, ZbHealthTags, IActiveNodeGate, GrpcDependencyHealthCheck(options), AkkaClusterHealthCheck(+AkkaClusterStatusPolicy), ActiveNodeHealthCheck(+role option), DatabaseHealthCheck<TContext>(+probe delegate). Valid C# signatures, package-by-package, like components/auth/shared-contract/ZB.MOM.WW.Auth.md.

Acceptance: Both files exist; SPEC.md has an explicit normalized-vs-per-project Section 0; the shared-contract compiles as C# signatures in your head (no behavior, just the surface). No tests (docs).


Task 2: components/health current-state ×3 + GAPS + README

Classification: small · Estimated implement time: ~5 min · Parallelizable with: Task 1

Files:

  • Create: components/health/current-state/otopcua/CURRENT-STATE.md
  • Create: components/health/current-state/mxaccessgw/CURRENT-STATE.md
  • Create: components/health/current-state/scadabridge/CURRENT-STATE.md
  • Create: components/health/GAPS.md
  • Create: components/health/README.md

Steps:

  1. Transcribe the code-verified scan from the design doc's "Health" current-state table + key refs into the three CURRENT-STATE.md files at full file:line depth (re-verify each ref against the live repo). Each ends in an Adoption plan (what that app deletes/replaces to reach the spec). MxGateway's plan = "register the tier mapping + a worker gRPC-dependency probe; today AddHealthChecks() at GatewayApplication.cs:61 is unused."
  2. GAPS.md — per-project divergences + prioritized extraction/adoption backlog. Top entries: MxGateway has no probes/tiers (P1); Akka status-policy divergence (OtOpcUa vs ScadaBridge); DB-probe technique (CanConnectAsync vs Deployments query); /healthz present in OtOpcUa, absent in ScadaBridge.
  3. README.md — overview + per-project status table (like components/auth/README.md).

Acceptance: Five files exist; each current-state cites real file:line refs; GAPS.md lists adoption as a follow-on. No tests (docs).


Phase 1 — Scaffold

Task 3: Create repo, solution, and project shells

Classification: small · Estimated implement time: ~5 min · Parallelizable with: none (gates all impl tasks)

Files:

  • Create: ZB.MOM.WW.Health/ZB.MOM.WW.Health.slnx
  • Create: ZB.MOM.WW.Health/Directory.Build.props
  • Create: ZB.MOM.WW.Health/Directory.Packages.props
  • Create: ZB.MOM.WW.Health/.gitignore
  • Create: ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/ZB.MOM.WW.Health.csproj
  • Create: ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.Akka/ZB.MOM.WW.Health.Akka.csproj
  • Create: ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.EntityFrameworkCore/ZB.MOM.WW.Health.EntityFrameworkCore.csproj
  • Create: ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.Tests/…csproj
  • Create: ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.Akka.Tests/…csproj
  • Create: ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.EntityFrameworkCore.Tests/…csproj

Steps:

  1. cd ~/Desktop/scadaproj && mkdir ZB.MOM.WW.Health && cd ZB.MOM.WW.Health && git init && dotnet new gitignore
  2. dotnet new sln -n ZB.MOM.WW.Health --format slnx (fallback to .sln if the SDK rejects .slnx).
  3. dotnet new classlib -f net10.0 -o src/<Name> for the 3 libs; dotnet new xunit -f net10.0 -o tests/<Name>.Tests for the 3 test projects. Delete default Class1.cs/UnitTest1.cs.
  4. Project refs: .Akka & .EntityFrameworkCore → core ZB.MOM.WW.Health; each test project → its lib (core test also refs Microsoft.AspNetCore.Mvc.Testing).
  5. Copy Directory.Build.props verbatim from ZB.MOM.WW.Auth/Directory.Build.props (net10.0, Nullable, ImplicitUsings, LangVersion latest, Version 0.1.0, central PM).
  6. Directory.Packages.props — pin: Microsoft.Extensions.Diagnostics.HealthChecks 10.0.7, Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions 10.0.7, AspNetCore.HealthChecks.UI.Client 9.0.0, Akka.Cluster 1.5.62, Akka.TestKit.Xunit2 1.5.62, Microsoft.EntityFrameworkCore 10.0.7, Microsoft.EntityFrameworkCore.Sqlite 10.0.7, Microsoft.EntityFrameworkCore.InMemory 10.0.7, Grpc.Net.Client (latest 2.x), and the test packages (Microsoft.NET.Test.Sdk 17.14.1, xunit 2.9.3, xunit.runner.visualstudio 3.1.4, coverlet.collector 6.0.4, Microsoft.AspNetCore.Mvc.Testing 10.0.7). Do not pin Microsoft.AspNetCore.Http.Abstractions or Microsoft.AspNetCore.Routing: the core lib gets them through <FrameworkReference Include="Microsoft.AspNetCore.App"/> rather than individual ASP.NET packages.
  7. dotnet sln add all 6 projects; dotnet build.
  8. Commit: git add -A && git commit -m "chore: scaffold ZB.MOM.WW.Health solution and projects"

Acceptance: dotnet build green; 6 projects in the solution.


Phase 2 — Core package (ZB.MOM.WW.Health)

Task 4: Canonical tags + MapZbHealth tier mapping

Classification: standard · Estimated implement time: ~5 min · Parallelizable with: none (Tasks 5-7 build on this)

Files:

  • Create: src/ZB.MOM.WW.Health/ZbHealthTags.cs
  • Create: src/ZB.MOM.WW.Health/ZbHealthEndpointExtensions.cs
  • Test: tests/ZB.MOM.WW.Health.Tests/TierMappingTests.cs

Step 1 — failing test: WebApplicationFactory-based test boots a minimal app that registers a fake check tagged ready, another tagged active, calls app.MapZbHealth(), then asserts: GET /health/ready runs ready-tagged only, /health/active runs active-tagged only, /healthz returns 200 with no checks executed (predicate _ => false). Use a counter check to prove which tiers ran which checks.

Step 2 — run, expect FAIL (MapZbHealth undefined). dotnet test --filter TierMappingTests.

Step 3 — implement: ZbHealthTags = public const string Ready="ready", Active="active", Live="live";. ZbHealthEndpointExtensions.MapZbHealth(this IEndpointRouteBuilder, ZbHealthEndpointOptions? = null) maps the three endpoints with predicates c => c.Tags.Contains(Ready), …Active, and _ => false; each AllowAnonymous(); response writer = the Task 5 writer (stub to default WriteMinimalPlaintext for now, replaced in Task 5).

Step 4 — run, expect PASS.

Step 5 — commit: feat(health): canonical tags + three-tier MapZbHealth
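Step 3 above could be sketched roughly as follows. This is a minimal sketch, not the final implementation: it omits the optional ZbHealthEndpointOptions parameter, leaves the framework's default response writer in place, and the IEndpointRouteBuilder return type is an assumption made here so the call can be chained.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;

namespace ZB.MOM.WW.Health;

/// <summary>Canonical tier tags, as listed in Step 3.</summary>
public static class ZbHealthTags
{
    public const string Ready = "ready";
    public const string Active = "active";
    public const string Live = "live";
}

public static class ZbHealthEndpointExtensions
{
    // Assumption: options parameter omitted and return type chosen for this sketch only.
    public static IEndpointRouteBuilder MapZbHealth(this IEndpointRouteBuilder endpoints)
    {
        // Readiness tier: run only checks tagged "ready".
        endpoints.MapHealthChecks("/health/ready", new HealthCheckOptions
        {
            Predicate = check => check.Tags.Contains(ZbHealthTags.Ready),
        }).AllowAnonymous();

        // Active tier: run only checks tagged "active".
        endpoints.MapHealthChecks("/health/active", new HealthCheckOptions
        {
            Predicate = check => check.Tags.Contains(ZbHealthTags.Active),
        }).AllowAnonymous();

        // Liveness tier: the predicate rejects every check, so the empty report
        // is Healthy and the endpoint answers 200 while the process is up.
        endpoints.MapHealthChecks("/healthz", new HealthCheckOptions
        {
            Predicate = _ => false,
        }).AllowAnonymous();

        return endpoints;
    }
}
```

A consumer would call app.MapZbHealth() after registering its tagged checks with AddHealthChecks().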


Task 5: Canonical JSON response writer

Classification: standard · Estimated implement time: ~5 min · Parallelizable with: none (slots into Task 4's mapper)

Files:

  • Create: src/ZB.MOM.WW.Health/ZbHealthResponseWriter.cs
  • Modify: src/ZB.MOM.WW.Health/ZbHealthEndpointExtensions.cs (wire writer as default)
  • Test: tests/ZB.MOM.WW.Health.Tests/ResponseWriterTests.cs

Step 1 — failing test: assert /health/ready body is JSON with status, per-check name/status/description, and totalDurationMs; content-type application/json; HTTP 200 when Healthy, 503 when any check Unhealthy, 200 when Degraded. Port the shape from ScadaBridge's UIResponseWriter.WriteHealthCheckUIResponse usage (Program.cs:222-233) but as our own writer (no UI dependency required at runtime — keep AspNetCore.HealthChecks.UI.Client optional).

Step 2 — run, expect FAIL. Step 3 — implement ZbHealthResponseWriter.WriteJsonAsync(HttpContext, HealthReport); set it as the default ResponseWriter in MapZbHealth. Step 4 — run, expect PASS. Step 5 — commit: feat(health): canonical JSON health response writer
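One possible shape for the writer is sketched below. The field names status, name, description, and totalDurationMs come from Step 1; the name of the array property ("checks") and the use of System.Text.Json with an anonymous payload are assumptions of this sketch. The writer does not set the status code itself: the health-check middleware applies HealthCheckOptions.ResultStatusCodes before invoking the writer, and its defaults already map Healthy and Degraded to 200 and Unhealthy to 503.

```csharp
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health;

public static class ZbHealthResponseWriter
{
    // Signature matches HealthCheckOptions.ResponseWriter (Func<HttpContext, HealthReport, Task>).
    public static Task WriteJsonAsync(HttpContext context, HealthReport report)
    {
        context.Response.ContentType = "application/json";

        var payload = new
        {
            status = report.Status.ToString(),
            totalDurationMs = report.TotalDuration.TotalMilliseconds,
            // Assumption: the per-check array is called "checks" in this sketch.
            checks = report.Entries.Select(entry => new
            {
                name = entry.Key,
                status = entry.Value.Status.ToString(),
                description = entry.Value.Description,
            }),
        };

        return JsonSerializer.SerializeAsync(
            context.Response.Body,
            payload,
            cancellationToken: context.RequestAborted);
    }
}
```

Wiring it in is a single property on each mapped endpoint, for example ResponseWriter = ZbHealthResponseWriter.WriteJsonAsync.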


Task 6: IActiveNodeGate seam + route gating

Classification: standard · Estimated implement time: ~5 min · Parallelizable with: Task 7

Files:

  • Create: src/ZB.MOM.WW.Health/IActiveNodeGate.cs
  • Create: src/ZB.MOM.WW.Health/ActiveNodeGateEndpointFilter.cs
  • Test: tests/ZB.MOM.WW.Health.Tests/ActiveNodeGateTests.cs

Step 1 — failing test: register a fake IActiveNodeGate whose IsActiveNode toggles; a route guarded by .RequireActiveNode() returns 200 when active, 503 (Retry-After) when standby. Port the gate contract from ScadaBridge ActiveNodeGate.cs:24-57 (generalized — no Akka in core; the implementation is provided by .Akka or the consumer).

Step 2 — FAIL. Step 3 — implement IActiveNodeGate { bool IsActiveNode { get; } } + an IEndpointConventionBuilder RequireActiveNode(this …) filter that 503s when not active. Step 4 — PASS. Step 5 — commit: feat(health): IActiveNodeGate seam + RequireActiveNode filter
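The gate and filter might look like the sketch below. The interface is taken from Step 3; the generic TBuilder signature, the extension class name, and the retryAfterSeconds parameter with its default of 5 are assumptions introduced for illustration. The filter uses the ASP.NET Core endpoint-filter mechanism, so in this form it applies to route-handler (minimal API) endpoints.

```csharp
using System.Globalization;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

namespace ZB.MOM.WW.Health;

/// <summary>Seam from Step 3: implemented by .Akka or by the consumer.</summary>
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

// Hypothetical class name; the plan only fixes the file name.
public static class ActiveNodeGateEndpointFilterExtensions
{
    // Assumption: retryAfterSeconds and its default are illustrative only.
    public static TBuilder RequireActiveNode<TBuilder>(this TBuilder builder, int retryAfterSeconds = 5)
        where TBuilder : IEndpointConventionBuilder
    {
        return builder.AddEndpointFilter(async (invocation, next) =>
        {
            var gate = invocation.HttpContext.RequestServices.GetRequiredService<IActiveNodeGate>();
            if (gate.IsActiveNode)
            {
                // Active node: let the request through to the handler.
                return await next(invocation);
            }

            // Standby node: short-circuit with 503 and a Retry-After hint.
            invocation.HttpContext.Response.Headers.RetryAfter =
                retryAfterSeconds.ToString(CultureInfo.InvariantCulture);
            return Results.StatusCode(StatusCodes.Status503ServiceUnavailable);
        });
    }
}
```

Usage would be along the lines of app.MapPost("/commands", handler).RequireActiveNode(), with an IActiveNodeGate registered as a singleton.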


Task 7: GrpcDependencyHealthCheck

Classification: standard · Estimated implement time: ~5 min · Parallelizable with: Task 6

Files:

  • Create: src/ZB.MOM.WW.Health/GrpcDependencyHealthCheck.cs
  • Create: src/ZB.MOM.WW.Health/GrpcDependencyOptions.cs
  • Test: tests/ZB.MOM.WW.Health.Tests/GrpcDependencyHealthCheckTests.cs

Step 1 — failing test: with a stub probe delegate standing in for the GrpcChannel health RPC, CheckHealthAsync returns Healthy on success and Unhealthy on RpcException or timeout. Make the actual probe a Func<GrpcChannel, CancellationToken, Task<bool>> injected via options so it is testable without a live server (the default uses the standard gRPC Health Checking Health.Check RPC).

Step 2 — FAIL. Step 3 — implement. Tags default [Ready, Active]. Step 4 — PASS. Step 5 — commit: feat(health): gRPC dependency health check
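A sketch of the injectable-probe design follows. The Func<GrpcChannel, CancellationToken, Task<bool>> delegate is from Step 1; the option property names (Channel, Probe, Timeout) and the 2-second default timeout are hypothetical. To stay self-contained the sketch makes the probe required instead of supplying the default Health.Check probe the plan calls for.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Grpc.Core;
using Grpc.Net.Client;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health;

public sealed class GrpcDependencyOptions
{
    // Hypothetical property names; only the probe delegate shape comes from the plan.
    public required GrpcChannel Channel { get; init; }
    public required Func<GrpcChannel, CancellationToken, Task<bool>> Probe { get; init; }
    public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(2);
}

public sealed class GrpcDependencyHealthCheck(GrpcDependencyOptions options) : IHealthCheck
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Link the caller's token with a timeout so a hung dependency cannot stall the endpoint.
        using var timeout = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeout.CancelAfter(options.Timeout);

        try
        {
            var serving = await options.Probe(options.Channel, timeout.Token).ConfigureAwait(false);
            return serving
                ? HealthCheckResult.Healthy()
                : HealthCheckResult.Unhealthy("gRPC dependency reported not serving.");
        }
        catch (RpcException ex)
        {
            return HealthCheckResult.Unhealthy("gRPC dependency call failed.", ex);
        }
        catch (OperationCanceledException ex) when (!cancellationToken.IsCancellationRequested)
        {
            // Only the internal timeout fired; a caller-initiated cancel propagates instead.
            return HealthCheckResult.Unhealthy("gRPC dependency probe timed out.", ex);
        }
    }
}
```

Because the probe is a delegate, a test can pass a lambda that returns true, returns false, or throws RpcException without starting a server.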


Phase 3 — ZB.MOM.WW.Health.Akka

Task 8: AkkaClusterHealthCheck + configurable status policy

Classification: standard · Estimated implement time: ~5 min · Parallelizable with: Task 10

Files:

  • Create: src/ZB.MOM.WW.Health.Akka/AkkaClusterStatusPolicy.cs
  • Create: src/ZB.MOM.WW.Health.Akka/AkkaClusterHealthCheck.cs
  • Test: tests/ZB.MOM.WW.Health.Akka.Tests/AkkaClusterStatusPolicyTests.cs

Step 1 — failing test: table-driven over MemberStatus→HealthStatus for both presets. Default (ScadaBridge AkkaClusterHealthCheck.cs:29-51): Up/Joining→Healthy, Leaving/Exiting→Degraded, else Unhealthy. OtOpcUaCompat (OtOpcUa AkkaClusterHealthCheck.cs:25-34): self-Up-among-members→Healthy else Degraded. Test the policy function directly (pure), no live cluster.

Step 2 — FAIL. Step 3 — implement AkkaClusterStatusPolicy as a Func<MemberStatus, HealthStatus> with Default/OtOpcUaCompat static presets; AkkaClusterHealthCheck reads Cluster.Get(system).SelfMember.Status and applies the configured policy. Tags [Ready, Active]. Step 4 — PASS. Step 5 — commit: feat(health.akka): cluster health check with configurable status policy
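The two presets can be written as pure functions, which is what makes the table-driven test possible. The sketch below follows the mappings given in Step 1; holding the presets as static readonly fields on a static class is an assumption, and OtOpcUaCompat is reduced here to "self status is Up", which is the only part of the OtOpcUa rule expressible over a single MemberStatus.

```csharp
using System;
using Akka.Cluster;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health.Akka;

public static class AkkaClusterStatusPolicy
{
    // Default preset (ScadaBridge behaviour per Step 1).
    public static readonly Func<MemberStatus, HealthStatus> Default = status => status switch
    {
        MemberStatus.Up or MemberStatus.Joining => HealthStatus.Healthy,
        MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
        _ => HealthStatus.Unhealthy, // Down, Removed, WeaklyUp
    };

    // OtOpcUa-compatible preset: anything other than Up is Degraded, never Unhealthy.
    public static readonly Func<MemberStatus, HealthStatus> OtOpcUaCompat = status =>
        status == MemberStatus.Up ? HealthStatus.Healthy : HealthStatus.Degraded;
}
```

The visible difference between the presets is at the edges: a Joining node is Healthy under Default but Degraded under OtOpcUaCompat, and a Down node is Unhealthy under Default but only Degraded under OtOpcUaCompat.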


Task 9: ActiveNodeHealthCheck with optional role filter

Classification: standard · Estimated implement time: ~4 min · Parallelizable with: Task 10

Files:

  • Create: src/ZB.MOM.WW.Health.Akka/ActiveNodeHealthCheck.cs
  • Create: src/ZB.MOM.WW.Health.Akka/AkkaActiveNodeGate.cs (implements core IActiveNodeGate)
  • Test: tests/ZB.MOM.WW.Health.Akka.Tests/ActiveNodeHealthCheckTests.cs

Step 1 — failing test: with faked cluster state, role-less → Healthy iff SelfMember.Status==Up && Leader==self (ScadaBridge ActiveNodeHealthCheck.cs:25-44); with role="admin" → Healthy when node lacks the role (OtOpcUa AdminRoleLeaderHealthCheck.cs:30), Healthy when admin-leader, Degraded when admin-but-not-leader. Tags [Active] only.

Step 2 — FAIL. Step 3 — implement both the check and AkkaActiveNodeGate : IActiveNodeGate (so RequireActiveNode works in Akka apps). Step 4 — PASS. Step 5 — commit: feat(health.akka): active/leader check with role filter + IActiveNodeGate impl
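The decision table from Step 1 can be isolated from Akka entirely, which keeps the "faked cluster state" test trivial. In the sketch below, ActiveNodeSnapshot and ActiveNodeDecision are hypothetical names, and the snapshot is assumed to be filled from the cluster by the real check. Step 1 only says the role-less case is Healthy "iff" Up and leader; returning Unhealthy otherwise is an assumption of this sketch.

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health.Akka;

/// <summary>
/// Hypothetical value snapshot of the cluster facts the check needs.
/// SelfIsLeader means cluster leader when no role is configured,
/// and leader among members carrying the role when one is.
/// </summary>
public readonly record struct ActiveNodeSnapshot(bool SelfIsUp, bool SelfIsLeader, bool SelfHasRole);

public static class ActiveNodeDecision
{
    public static HealthStatus Evaluate(ActiveNodeSnapshot snapshot, string? role)
    {
        if (role is null)
        {
            // Role-less (ScadaBridge style): Healthy iff self is Up and is the leader.
            // Assumption: the non-active outcome is Unhealthy.
            return snapshot.SelfIsUp && snapshot.SelfIsLeader
                ? HealthStatus.Healthy
                : HealthStatus.Unhealthy;
        }

        // Role-filtered (OtOpcUa style): nodes without the role are not candidates,
        // so they report Healthy rather than failing the tier.
        if (!snapshot.SelfHasRole)
        {
            return HealthStatus.Healthy;
        }

        return snapshot.SelfIsLeader ? HealthStatus.Healthy : HealthStatus.Degraded;
    }
}
```

The same snapshot could back AkkaActiveNodeGate.IsActiveNode, so the gate and the health check cannot disagree about which node is active.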


Phase 4 — ZB.MOM.WW.Health.EntityFrameworkCore

Task 10: DatabaseHealthCheck<TContext>

Classification: standard · Estimated implement time: ~4 min · Parallelizable with: Task 8, Task 9

Files:

  • Create: src/ZB.MOM.WW.Health.EntityFrameworkCore/DatabaseHealthCheck.cs
  • Create: src/ZB.MOM.WW.Health.EntityFrameworkCore/DatabaseHealthCheckOptions.cs
  • Test: tests/ZB.MOM.WW.Health.EntityFrameworkCore.Tests/DatabaseHealthCheckTests.cs

Step 1 — failing test: SQLite in-memory DbContext → Healthy (default probe CanConnectAsync, ScadaBridge DatabaseHealthCheck.cs:27-42); a deliberately broken context → Unhealthy; a custom Func<TContext, CancellationToken, Task> probe delegate (OtOpcUa's "query Deployments" style, DatabaseHealthCheck.cs:25-37) runs, and the check returns Unhealthy if it throws. Resolve TContext via IDbContextFactory<TContext> (matches OtOpcUa), falling back to a scoped TContext.

Step 2 — FAIL. Step 3 — implement generic check + options carrying the optional probe delegate. Tags [Ready, Active]. Step 4 — PASS. Step 5 — commit: feat(health.ef): generic DatabaseHealthCheck<TContext>
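A minimal sketch of the generic check is shown below, covering only the IDbContextFactory<TContext> path; the scoped-TContext fallback is left out. Making the options type generic and naming its property Probe are assumptions of this sketch.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Diagnostics.HealthChecks;

namespace ZB.MOM.WW.Health.EntityFrameworkCore;

public sealed class DatabaseHealthCheckOptions<TContext> where TContext : DbContext
{
    // Optional custom probe (for example a cheap query); null means use CanConnectAsync.
    public Func<TContext, CancellationToken, Task>? Probe { get; init; }
}

public sealed class DatabaseHealthCheck<TContext>(
    IDbContextFactory<TContext> factory,
    DatabaseHealthCheckOptions<TContext> options) : IHealthCheck
    where TContext : DbContext
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            await using var db = await factory.CreateDbContextAsync(cancellationToken).ConfigureAwait(false);

            if (options.Probe is not null)
            {
                // Custom probe: completing without throwing counts as healthy.
                await options.Probe(db, cancellationToken).ConfigureAwait(false);
                return HealthCheckResult.Healthy();
            }

            // Default probe: connectivity only.
            return await db.Database.CanConnectAsync(cancellationToken).ConfigureAwait(false)
                ? HealthCheckResult.Healthy()
                : HealthCheckResult.Unhealthy("Database is not reachable.");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return HealthCheckResult.Unhealthy("Database probe failed.", ex);
        }
    }
}
```

The custom-probe branch is where an "OtOpcUa style" query would go, for instance a delegate that runs a small query against one table and lets any exception surface.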


Phase 5 — Package & register

Task 11: Pack, README, register in indexes

Classification: small · Estimated implement time: ~5 min · Parallelizable with: none (final)

Files:

  • Create: ZB.MOM.WW.Health/README.md
  • Modify: each .csproj (PackageId/Description/Authors metadata)
  • Modify: components/README.md (registry row)
  • Modify: CLAUDE.md (Component-normalization table row)
  • Modify: upcoming.md (check off Health in "Suggested order")

Steps:

  1. Add NuGet metadata to the 3 lib .csprojs (<PackageId>, <Description>, <Authors>, <PackageTags>); leave GeneratePackageOnBuild off (pack explicitly, like Auth).
  2. dotnet test (all 3 test projects green) — record counts.
  3. dotnet pack -c Release -o ./artifacts → confirm 3 *.0.1.0.nupkg.
  4. README.md — packages, consumer matrix (MxGateway → core only; OtOpcUa/ScadaBridge → all three), dotnet test instructions, the "built, not yet adopted" note.
  5. Register: components/README.md registry row (status Draft), CLAUDE.md table row, tick Health in upcoming.md.
  6. Commit: git -C ZB.MOM.WW.Health add -A && git -C ZB.MOM.WW.Health commit -m "docs: README + pack metadata", then in scadaproj git add components/health CLAUDE.md components/README.md upcoming.md docs/plans && git commit -m "feat(health): ZB.MOM.WW.Health library + health normalization component"

Acceptance: 3 nupkgs @ 0.1.0; all tests green (counts recorded); indexes updated; design-doc build-order step for Health complete.


Summary of parallelism

  • Phase 0 docs: Task 1 ∥ Task 2.
  • Phase 1 scaffold: Task 3 (barrier — everything below needs it).
  • Phase 2 core: Task 4 → Task 5 (sequential, same mapper); Task 6 ∥ Task 7.
  • Phase 3/4: Task 8 ∥ Task 9 ∥ Task 10 (different packages/files).
  • Phase 5: Task 11 (barrier — needs all above green).