Two TDD plans (one per library, per house precedent) derived from the approved design, with co-located .tasks.json execution-persistence:
- Health: components/health docs + 3 dependency-split packages (11 tasks)
- Telemetry: components/observability docs + 2 packages (3 OTel signals + Serilog) + the MxGateway MEL->Serilog migration (12 tasks)

Each task carries classification / est-time / parallelizable metadata for the executing-plans workflow.
ZB.MOM.WW.Health Shared Library Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
Goal: Author the components/health/ normalization docs and build the ZB.MOM.WW.Health shared library set (3 dependency-split NuGet packages) that normalizes the three-tier health pattern (/health/ready · /health/active · /healthz) and its reusable probes, so OtOpcUa, MxAccessGateway, and ScadaBridge can stop re-implementing health checks and MxGateway can gain readiness/active tiers it lacks today.
Architecture: A new standalone nested repo (~/Desktop/scadaproj/ZB.MOM.WW.Health), .NET 10, three library projects — ZB.MOM.WW.Health (core: tier mapping, JSON writer, IActiveNodeGate, gRPC-dependency probe; depends only on ASP.NET Core HealthChecks abstractions), ZB.MOM.WW.Health.Akka (Akka cluster + leader/active probes; opt-in Akka dep), ZB.MOM.WW.Health.EntityFrameworkCore (DatabaseHealthCheck<TContext>; opt-in EF dep). Heavy deps live in satellites so MxGateway (non-Akka, non-EF) references core only. Reference implementations are lifted and generalized from OtOpcUa and ScadaBridge. Consumer adoption (wiring into the three apps) is a separate follow-on GAPS.md item — this plan delivers docs + library + tests + packages only, matching where Auth/Theme sit.
Tech Stack: .NET 10, C#; xUnit + coverlet; Microsoft.AspNetCore.Mvc.Testing (WebApplicationFactory); Microsoft.Extensions.Diagnostics.HealthChecks; AspNetCore.HealthChecks.UI.Client (response-writer style); Akka.Cluster 1.5.62; Microsoft.EntityFrameworkCore 10.0.7 (+ .Sqlite/.InMemory for tests); Grpc.Net.Client; central package management (Directory.Packages.props); .slnx solution; Version 0.1.0 lockstep.
Source references (read-only, to port/generalize from):
- Three-tier + probes: OtOpcUa ~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/{HealthEndpoints,DatabaseHealthCheck,AkkaClusterHealthCheck,AdminRoleLeaderHealthCheck}.cs
- Probes + gate + JSON writer: ScadaBridge ~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Health/{DatabaseHealthCheck,AkkaClusterHealthCheck,ActiveNodeHealthCheck,ActiveNodeGate}.cs and Program.cs:114-117,222-233
- The gap to fill: MxGateway ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs:61,139-145
- Design: ~/Desktop/scadaproj/docs/plans/2026-06-01-health-observability-components-design.md
Conventions for every task: TDD (@superpowers-extended-cc:test-driven-development) — failing test first, minimal impl, green, commit. File-scoped namespaces, sealed by default, Async suffix on Task-returning methods. Commit after each green task. The Files: block of each task is the files_to_edit contract — touch exactly those paths; if more are needed, that's a plan defect to surface.
Phase 0 — Normalization docs (spec drives the API)
Task 1: components/health spec + shared-contract
Classification: small Estimated implement time: ~5 min Parallelizable with: Task 2
Files:
- Create: components/health/spec/SPEC.md
- Create: components/health/shared-contract/ZB.MOM.WW.Health.md
Steps:
- spec/SPEC.md: follow the components/auth/spec/SPEC.md shape. Section 0 Scope: normalized = tier semantics (ready/active/live), canonical tag set, JSON response shape, configurable Akka status policy, role-filtered active/leader probe, DatabaseHealthCheck<T>, gRPC-dependency probe, IActiveNodeGate. Explicitly NOT normalized = which probes each app registers, Traefik/orchestrator wiring, ScadaBridge's HealthMonitoring/domain aggregation pipeline. Then a section per: tier table, probe catalog (with the two real Akka-policy variants + the role-filter unification), response-writer contract.
- shared-contract/ZB.MOM.WW.Health.md: the paper public API of the 3 packages: MapZbHealth, AddZbHealthChecks, ZbHealthTags, IActiveNodeGate, GrpcDependencyHealthCheck (options), AkkaClusterHealthCheck (+ AkkaClusterStatusPolicy), ActiveNodeHealthCheck (+ role option), DatabaseHealthCheck<TContext> (+ probe delegate). Valid C# signatures, package-by-package, like components/auth/shared-contract/ZB.MOM.WW.Auth.md.
Acceptance: Both files exist; SPEC.md has an explicit normalized-vs-per-project Section 0; the shared-contract compiles as C# signatures in your head (no behavior, just the surface). No tests (docs).
Task 2: components/health current-state ×3 + GAPS + README
Classification: small Estimated implement time: ~5 min Parallelizable with: Task 1
Files:
- Create: components/health/current-state/otopcua/CURRENT-STATE.md
- Create: components/health/current-state/mxaccessgw/CURRENT-STATE.md
- Create: components/health/current-state/scadabridge/CURRENT-STATE.md
- Create: components/health/GAPS.md
- Create: components/health/README.md
Steps:
- Transcribe the code-verified scan from the design doc's "Health" current-state table + key refs into the three CURRENT-STATE.md files at full file:line depth (re-verify each ref against the live repo). Each ends in an Adoption plan (what that app deletes/replaces to reach the spec). MxGateway's plan = "register the tier mapping + a worker gRPC-dependency probe; today AddHealthChecks() at GatewayApplication.cs:61 is unused."
- GAPS.md: per-project divergences + prioritized extraction/adoption backlog. Top entries: MxGateway has no probes/tiers (P1); Akka status-policy divergence (OtOpcUa vs ScadaBridge); DB-probe technique (CanConnectAsync vs Deployments query); /healthz present in OtOpcUa, absent in ScadaBridge.
- README.md: overview + per-project status table (like components/auth/README.md).
Acceptance: Five files exist; each current-state cites real file:line refs; GAPS.md lists adoption as a follow-on. No tests (docs).
Phase 1 — Scaffold
Task 3: Create repo, solution, and project shells
Classification: small Estimated implement time: ~5 min Parallelizable with: none (gates all impl tasks)
Files:
- Create: ZB.MOM.WW.Health/ZB.MOM.WW.Health.slnx
- Create: ZB.MOM.WW.Health/Directory.Build.props
- Create: ZB.MOM.WW.Health/Directory.Packages.props
- Create: ZB.MOM.WW.Health/.gitignore
- Create: ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/ZB.MOM.WW.Health.csproj
- Create: ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.Akka/ZB.MOM.WW.Health.Akka.csproj
- Create: ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.EntityFrameworkCore/ZB.MOM.WW.Health.EntityFrameworkCore.csproj
- Create: ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.Tests/…csproj
- Create: ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.Akka.Tests/…csproj
- Create: ZB.MOM.WW.Health/tests/ZB.MOM.WW.Health.EntityFrameworkCore.Tests/…csproj
Steps:
- cd ~/Desktop/scadaproj && mkdir ZB.MOM.WW.Health && cd ZB.MOM.WW.Health && git init && dotnet new gitignore
- dotnet new sln -n ZB.MOM.WW.Health --format slnx (fallback to .sln if the SDK rejects .slnx).
- dotnet new classlib -f net10.0 -o src/<Name> for the 3 libs; dotnet new xunit -f net10.0 -o tests/<Name>.Tests for the 3 test projects. Delete default Class1.cs / UnitTest1.cs.
- Project refs: .Akka & .EntityFrameworkCore → core ZB.MOM.WW.Health; each test project → its lib (core test also refs Microsoft.AspNetCore.Mvc.Testing).
- Copy Directory.Build.props verbatim from ZB.MOM.WW.Auth/Directory.Build.props (net10.0, Nullable, ImplicitUsings, LangVersion latest, Version 0.1.0, central PM).
- Directory.Packages.props: pin Microsoft.Extensions.Diagnostics.HealthChecks 10.0.7, Microsoft.Extensions.Diagnostics.HealthChecks.Abstractions 10.0.7, Microsoft.AspNetCore.Http.Abstractions / Microsoft.AspNetCore.Routing (framework refs via <FrameworkReference Include="Microsoft.AspNetCore.App"/> in core), AspNetCore.HealthChecks.UI.Client 9.0.0, Akka.Cluster 1.5.62, Akka.TestKit.Xunit2 1.5.62, Microsoft.EntityFrameworkCore 10.0.7, Microsoft.EntityFrameworkCore.Sqlite 10.0.7, Microsoft.EntityFrameworkCore.InMemory 10.0.7, Grpc.Net.Client (latest 2.x), test pkgs (Microsoft.NET.Test.Sdk 17.14.1, xunit 2.9.3, xunit.runner.visualstudio 3.1.4, coverlet.collector 6.0.4, Microsoft.AspNetCore.Mvc.Testing 10.0.7). Core lib uses <FrameworkReference Include="Microsoft.AspNetCore.App"/> rather than individual ASP.NET packages.
- dotnet sln add all 6 projects; dotnet build.
- Commit: git add -A && git commit -m "chore: scaffold ZB.MOM.WW.Health solution and projects"
Acceptance: dotnet build green; 6 projects in the solution.
Phase 2 — Core package (ZB.MOM.WW.Health)
Task 4: Canonical tags + MapZbHealth tier mapping
Classification: standard Estimated implement time: ~5 min Parallelizable with: none (Tasks 5-7 build on this)
Files:
- Create: src/ZB.MOM.WW.Health/ZbHealthTags.cs
- Create: src/ZB.MOM.WW.Health/ZbHealthEndpointExtensions.cs
- Test: tests/ZB.MOM.WW.Health.Tests/TierMappingTests.cs
Step 1 — failing test: WebApplicationFactory-based test boots a minimal app that registers a fake check tagged ready, another tagged active, calls app.MapZbHealth(), then asserts: GET /health/ready runs ready-tagged only, /health/active runs active-tagged only, /healthz returns 200 with no checks executed (predicate _ => false). Use a counter check to prove which tiers ran which checks.
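The counter check mentioned in Step 1 could look roughly like the sketch below. CountingHealthCheck is a hypothetical test helper (the plan does not name it); it only records how many times the health-check middleware invoked it, so a test can tell which tier ran which check.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical test helper: counts executions and always reports Healthy.
public sealed class CountingHealthCheck : IHealthCheck
{
    private int _count;

    // Number of times CheckHealthAsync has been invoked so far.
    public int Count => Volatile.Read(ref _count);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        Interlocked.Increment(ref _count);
        return Task.FromResult(HealthCheckResult.Healthy());
    }
}
```

In the test, one instance would be registered with the ready tag and another with the active tag (for example via AddHealthChecks().AddCheck(name, instance, tags: ...)); after a GET to /health/ready only the ready-tagged instance's Count should have increased.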
Step 2 — run, expect FAIL (MapZbHealth undefined). dotnet test --filter TierMappingTests.
Step 3 — implement: ZbHealthTags = public const string Ready="ready", Active="active", Live="live";. ZbHealthEndpointExtensions.MapZbHealth(this IEndpointRouteBuilder, ZbHealthEndpointOptions? = null) maps the three endpoints with predicates c => c.Tags.Contains(Ready), …Active, and _ => false; each AllowAnonymous(); response writer = the Task 5 writer (stub to default WriteMinimalPlaintext for now, replaced in Task 5).
Step 4 — run, expect PASS.
Step 5 — commit: feat(health): canonical tags + three-tier MapZbHealth
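A minimal sketch of the Step 3 implementation, built on the standard MapHealthChecks and HealthCheckOptions.Predicate APIs. It omits the optional ZbHealthEndpointOptions parameter and the custom response writer, and the IEndpointRouteBuilder return type is an assumption; the plan does not fix either.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;

public static class ZbHealthTags
{
    public const string Ready = "ready";
    public const string Active = "active";
    public const string Live = "live";
}

public static class ZbHealthEndpointExtensions
{
    // Maps the three tiers; each endpoint selects registrations by tag.
    // Return type is an assumption made for this sketch.
    public static IEndpointRouteBuilder MapZbHealth(this IEndpointRouteBuilder endpoints)
    {
        endpoints.MapHealthChecks("/health/ready", new HealthCheckOptions
        {
            Predicate = r => r.Tags.Contains(ZbHealthTags.Ready),
        }).AllowAnonymous();

        endpoints.MapHealthChecks("/health/active", new HealthCheckOptions
        {
            Predicate = r => r.Tags.Contains(ZbHealthTags.Active),
        }).AllowAnonymous();

        // Liveness: the predicate rejects every registration, so no check runs
        // and the empty report is Healthy (HTTP 200).
        endpoints.MapHealthChecks("/healthz", new HealthCheckOptions
        {
            Predicate = _ => false,
        }).AllowAnonymous();

        return endpoints;
    }
}
```

Because no ResponseWriter is set here, the framework's default plaintext writer is used, which matches the "stub for now" note in Step 3.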
Task 5: Canonical JSON response writer
Classification: standard Estimated implement time: ~5 min Parallelizable with: none (slots into Task 4's mapper)
Files:
- Create: src/ZB.MOM.WW.Health/ZbHealthResponseWriter.cs
- Modify: src/ZB.MOM.WW.Health/ZbHealthEndpointExtensions.cs (wire writer as default)
- Test: tests/ZB.MOM.WW.Health.Tests/ResponseWriterTests.cs
Step 1 — failing test: assert /health/ready body is JSON with status, per-check name/status/description, and totalDurationMs; content-type application/json; HTTP 200 when Healthy, 503 when any check Unhealthy, 200 when Degraded. Port the shape from ScadaBridge's UIResponseWriter.WriteHealthCheckUIResponse usage (Program.cs:222-233) but as our own writer (no UI dependency required at runtime — keep AspNetCore.HealthChecks.UI.Client optional).
Step 2 — run, expect FAIL.
Step 3 — implement ZbHealthResponseWriter.WriteJsonAsync(HttpContext, HealthReport); set it as the default ResponseWriter in MapZbHealth.
Step 4 — run, expect PASS.
Step 5 — commit: feat(health): canonical JSON health response writer
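One possible shape for the writer, sketched with System.Text.Json and no UI-client dependency. The status, name, description, and totalDurationMs fields come from Step 1; the checks array name is an assumption. The HTTP status code (200/503) is not set here, because the health-check middleware applies its result-to-status-code mapping before invoking the writer.

```csharp
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class ZbHealthResponseWriter
{
    // Signature matches HealthCheckOptions.ResponseWriter (Func<HttpContext, HealthReport, Task>).
    public static Task WriteJsonAsync(HttpContext context, HealthReport report)
    {
        var payload = new
        {
            status = report.Status.ToString(),
            totalDurationMs = report.TotalDuration.TotalMilliseconds,
            // "checks" is an assumed property name for the per-check list.
            checks = report.Entries.Select(e => new
            {
                name = e.Key,
                status = e.Value.Status.ToString(),
                description = e.Value.Description,
            }),
        };

        context.Response.ContentType = "application/json";
        return JsonSerializer.SerializeAsync(
            context.Response.Body, payload, cancellationToken: context.RequestAborted);
    }
}
```

Wiring it in would then be a matter of setting ResponseWriter = ZbHealthResponseWriter.WriteJsonAsync on each HealthCheckOptions instance that MapZbHealth creates.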
Task 6: IActiveNodeGate seam + route gating
Classification: standard Estimated implement time: ~5 min Parallelizable with: Task 7
Files:
- Create: src/ZB.MOM.WW.Health/IActiveNodeGate.cs
- Create: src/ZB.MOM.WW.Health/ActiveNodeGateEndpointFilter.cs
- Test: tests/ZB.MOM.WW.Health.Tests/ActiveNodeGateTests.cs
Step 1 — failing test: register a fake IActiveNodeGate whose IsActiveNode toggles; a route guarded by .RequireActiveNode() returns 200 when active, 503 (Retry-After) when standby. Port the gate contract from ScadaBridge ActiveNodeGate.cs:24-57 (generalized — no Akka in core; the implementation is provided by .Akka or the consumer).
Step 2 — FAIL. Step 3 — implement IActiveNodeGate { bool IsActiveNode { get; } } + an IEndpointConventionBuilder RequireActiveNode(this …) filter that 503s when not active. Step 4 — PASS. Step 5 — commit: feat(health): IActiveNodeGate seam + RequireActiveNode filter
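The gate and filter could be sketched as follows, using the ASP.NET Core endpoint-filter API. The generic RequireActiveNode<TBuilder> shape, the 5-second Retry-After value, and the ToggleActiveNodeGate test double are assumptions made for illustration; the ScadaBridge original may differ.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

// Hypothetical test double whose state the test can flip.
public sealed class ToggleActiveNodeGate : IActiveNodeGate
{
    public bool IsActiveNode { get; set; }
}

public sealed class ActiveNodeGateEndpointFilter : IEndpointFilter
{
    public ValueTask<object?> InvokeAsync(
        EndpointFilterInvocationContext context, EndpointFilterDelegate next)
    {
        var gate = context.HttpContext.RequestServices.GetRequiredService<IActiveNodeGate>();
        if (gate.IsActiveNode)
        {
            return next(context);
        }

        // Standby node: short-circuit with 503 and a retry hint.
        // The 5-second value is a placeholder assumption.
        context.HttpContext.Response.Headers.RetryAfter = "5";
        return ValueTask.FromResult<object?>(
            Results.StatusCode(StatusCodes.Status503ServiceUnavailable));
    }
}

public static class ActiveNodeGateEndpointExtensions
{
    // Attaches the gate filter to any endpoint builder that supports filters.
    public static TBuilder RequireActiveNode<TBuilder>(this TBuilder builder)
        where TBuilder : IEndpointConventionBuilder
        => builder.AddEndpointFilter(new ActiveNodeGateEndpointFilter());
}
```

A guarded route would then read like app.MapGet("/work", handler).RequireActiveNode(), with the gate registered as a singleton by the .Akka package or by the consumer.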
Task 7: GrpcDependencyHealthCheck
Classification: standard Estimated implement time: ~5 min Parallelizable with: Task 6
Files:
- Create: src/ZB.MOM.WW.Health/GrpcDependencyHealthCheck.cs
- Create: src/ZB.MOM.WW.Health/GrpcDependencyOptions.cs
- Test: tests/ZB.MOM.WW.Health.Tests/GrpcDependencyHealthCheckTests.cs
Step 1 — failing test: with a stub probe delegate that resolves a GrpcChannel/health RPC, CheckHealthAsync returns Healthy on success, Unhealthy on RpcException/timeout. Make the actual probe a Func<GrpcChannel, CancellationToken, Task<bool>> injected via options so it's testable without a live server (default uses the standard gRPC Health Checking Health.Check).
Step 2 — FAIL. Step 3 — implement. Tags default [Ready, Active]. Step 4 — PASS. Step 5 — commit: feat(health): gRPC dependency health check
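A rough sketch of the check with the probe injected through options, so tests never need a live server. The Address and Timeout option names are assumptions; only the probe delegate's shape comes from the plan. The default probe based on the standard gRPC health-checking service is left out here, so the sketch requires the caller to supply one.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Grpc.Core;
using Grpc.Net.Client;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class GrpcDependencyOptions
{
    // Hypothetical option names; the plan only fixes the Probe delegate's shape.
    public required string Address { get; init; }
    public TimeSpan Timeout { get; init; } = TimeSpan.FromSeconds(2);
    public required Func<GrpcChannel, CancellationToken, Task<bool>> Probe { get; init; }
}

public sealed class GrpcDependencyHealthCheck(GrpcDependencyOptions options) : IHealthCheck
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Creating a channel does not connect; the probe performs the actual call.
        using var channel = GrpcChannel.ForAddress(options.Address);
        using var timeout = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeout.CancelAfter(options.Timeout);

        try
        {
            return await options.Probe(channel, timeout.Token)
                ? HealthCheckResult.Healthy()
                : HealthCheckResult.Unhealthy("gRPC dependency reported not serving.");
        }
        catch (RpcException ex)
        {
            return HealthCheckResult.Unhealthy("gRPC dependency call failed.", ex);
        }
        catch (OperationCanceledException ex) when (!cancellationToken.IsCancellationRequested)
        {
            // The linked token fired because of the timeout, not the caller.
            return HealthCheckResult.Unhealthy("gRPC dependency probe timed out.", ex);
        }
    }
}
```

A production version would probably reuse one channel instead of creating one per check; the per-call channel keeps the sketch short.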
Phase 3 — ZB.MOM.WW.Health.Akka
Task 8: AkkaClusterHealthCheck + configurable status policy
Classification: standard Estimated implement time: ~5 min Parallelizable with: Task 10
Files:
- Create: src/ZB.MOM.WW.Health.Akka/AkkaClusterStatusPolicy.cs
- Create: src/ZB.MOM.WW.Health.Akka/AkkaClusterHealthCheck.cs
- Test: tests/ZB.MOM.WW.Health.Akka.Tests/AkkaClusterStatusPolicyTests.cs
Step 1 — failing test: table-driven over MemberStatus → HealthStatus for both presets. Default (ScadaBridge AkkaClusterHealthCheck.cs:29-51): Up/Joining→Healthy, Leaving/Exiting→Degraded, else Unhealthy. OtOpcUaCompat (OtOpcUa AkkaClusterHealthCheck.cs:25-34): self-Up-among-members→Healthy else Degraded. Test the policy function directly (pure), no live cluster.
Step 2 — FAIL. Step 3 — implement AkkaClusterStatusPolicy as a Func<MemberStatus, HealthStatus> with Default/OtOpcUaCompat static presets; AkkaClusterHealthCheck reads Cluster.Get(system).SelfMember.Status and applies the configured policy. Tags [Ready, Active].
Step 4 — PASS. Step 5 — commit: feat(health.akka): cluster health check with configurable status policy
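The two presets from Step 1 can be written as pure functions over Akka's MemberStatus enum. Modelling them as static Func fields on a static class is one assumed reading of "Func<MemberStatus, HealthStatus> with static presets"; the mapping itself follows the table in Step 1.

```csharp
using System;
using Akka.Cluster;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class AkkaClusterStatusPolicy
{
    // ScadaBridge-style default: Up/Joining healthy, Leaving/Exiting degraded,
    // everything else (Down, Removed, WeaklyUp) unhealthy.
    public static readonly Func<MemberStatus, HealthStatus> Default = status => status switch
    {
        MemberStatus.Up or MemberStatus.Joining => HealthStatus.Healthy,
        MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
        _ => HealthStatus.Unhealthy,
    };

    // OtOpcUa-compatible: only Up is healthy; any other status is degraded.
    public static readonly Func<MemberStatus, HealthStatus> OtOpcUaCompat = status =>
        status == MemberStatus.Up ? HealthStatus.Healthy : HealthStatus.Degraded;
}
```

Because the policies take only a MemberStatus, the table-driven test can call them directly, for example AkkaClusterStatusPolicy.Default(MemberStatus.Leaving), without starting an ActorSystem.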
Task 9: ActiveNodeHealthCheck with optional role filter
Classification: standard Estimated implement time: ~4 min Parallelizable with: Task 10
Files:
- Create: src/ZB.MOM.WW.Health.Akka/ActiveNodeHealthCheck.cs
- Create: src/ZB.MOM.WW.Health.Akka/AkkaActiveNodeGate.cs (implements core IActiveNodeGate)
- Test: tests/ZB.MOM.WW.Health.Akka.Tests/ActiveNodeHealthCheckTests.cs
Step 1 — failing test: with faked cluster state, role-less → Healthy iff SelfMember.Status==Up && Leader==self (ScadaBridge ActiveNodeHealthCheck.cs:25-44); with role="admin" → Healthy when node lacks the role (OtOpcUa AdminRoleLeaderHealthCheck.cs:30), Healthy when admin-leader, Degraded when admin-but-not-leader. Tags [Active] only.
Step 2 — FAIL. Step 3 — implement both the check and AkkaActiveNodeGate : IActiveNodeGate (so RequireActiveNode works in Akka apps). Step 4 — PASS. Step 5 — commit: feat(health.akka): active/leader check with role filter + IActiveNodeGate impl
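The decision table in Step 1 can be isolated from Akka as a pure function, which is one way to test it against faked cluster state. ActiveNodeSnapshot and ActiveNodeRules are hypothetical names for this sketch. The plan states only when the role-less case is Healthy; returning Unhealthy otherwise is an assumption.

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical snapshot of the cluster facts the check would read.
public sealed record ActiveNodeSnapshot(
    bool SelfIsUp, bool SelfIsLeader, bool SelfHasRole, bool SelfIsRoleLeader);

public static class ActiveNodeRules
{
    public static HealthStatus Evaluate(ActiveNodeSnapshot snapshot, string? role)
    {
        if (role is null)
        {
            // Role-less: healthy only when this node is Up and is the cluster leader.
            // Unhealthy for the standby case is an assumption, not stated in the plan.
            return snapshot.SelfIsUp && snapshot.SelfIsLeader
                ? HealthStatus.Healthy
                : HealthStatus.Unhealthy;
        }

        // Role filter: a node without the role is not a candidate, so it reports Healthy.
        if (!snapshot.SelfHasRole)
        {
            return HealthStatus.Healthy;
        }

        // Node carries the role: Healthy when it leads that role, Degraded otherwise.
        return snapshot.SelfIsRoleLeader ? HealthStatus.Healthy : HealthStatus.Degraded;
    }
}
```

The real check would populate the snapshot from the cluster extension (self member status, leader address, role membership) and pass the configured role, keeping the branching logic testable on its own.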
Phase 4 — ZB.MOM.WW.Health.EntityFrameworkCore
Task 10: DatabaseHealthCheck<TContext>
Classification: standard Estimated implement time: ~4 min Parallelizable with: Task 8, Task 9
Files:
- Create: src/ZB.MOM.WW.Health.EntityFrameworkCore/DatabaseHealthCheck.cs
- Create: src/ZB.MOM.WW.Health.EntityFrameworkCore/DatabaseHealthCheckOptions.cs
- Test: tests/ZB.MOM.WW.Health.EntityFrameworkCore.Tests/DatabaseHealthCheckTests.cs
Step 1 — failing test: SQLite in-memory DbContext → Healthy (default probe CanConnectAsync, ScadaBridge DatabaseHealthCheck.cs:27-42); a deliberately broken context → Unhealthy; a custom Func<TContext,CancellationToken,Task> probe delegate (OtOpcUa's "query Deployments" style, DatabaseHealthCheck.cs:25-37) runs, and the check returns Unhealthy if the delegate throws. Resolve TContext via IDbContextFactory<TContext> (matches OtOpcUa) with a fallback to scoped TContext.
Step 2 — FAIL. Step 3 — implement generic check + options carrying the optional probe delegate. Tags [Ready, Active]. Step 4 — PASS. Step 5 — commit: feat(health.ef): generic DatabaseHealthCheck<TContext>
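A sketch of the generic check using IDbContextFactory<TContext> and EF Core's CanConnectAsync as the default probe. Making the options type generic and naming its property Probe are assumptions for this example, and the fallback to a scoped TContext described in Step 1 is not shown.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class DatabaseHealthCheckOptions<TContext> where TContext : DbContext
{
    // Optional custom probe; when null the check falls back to CanConnectAsync.
    public Func<TContext, CancellationToken, Task>? Probe { get; init; }
}

public sealed class DatabaseHealthCheck<TContext>(
    IDbContextFactory<TContext> factory,
    DatabaseHealthCheckOptions<TContext> options) : IHealthCheck
    where TContext : DbContext
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            await using var db = await factory.CreateDbContextAsync(cancellationToken);

            if (options.Probe is not null)
            {
                // Custom probe signals failure by throwing.
                await options.Probe(db, cancellationToken);
                return HealthCheckResult.Healthy();
            }

            return await db.Database.CanConnectAsync(cancellationToken)
                ? HealthCheckResult.Healthy()
                : HealthCheckResult.Unhealthy("Database is not reachable.");
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Database probe failed.", ex);
        }
    }
}
```

An OtOpcUa-style consumer would supply a Probe that runs a cheap query against a known table, while a ScadaBridge-style consumer would leave Probe null and rely on the connectivity check.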
Phase 5 — Package & register
Task 11: Pack, README, register in indexes
Classification: small Estimated implement time: ~5 min Parallelizable with: none (final)
Files:
- Create: ZB.MOM.WW.Health/README.md
- Modify: each .csproj (PackageId/Description/Authors metadata)
- Modify: components/README.md (registry row)
- Modify: CLAUDE.md (Component-normalization table row)
- Modify: upcoming.md (check off Health in "Suggested order")
Steps:
- Add NuGet metadata to the 3 lib .csprojs (<PackageId>, <Description>, <Authors>, <PackageTags>); leave GeneratePackageOnBuild off (pack explicitly, like Auth).
- dotnet test (all 3 test projects green); record counts.
- dotnet pack -c Release -o ./artifacts → confirm 3 *.0.1.0.nupkg.
- README.md: packages, consumer matrix (MxGateway → core only; OtOpcUa/ScadaBridge → all three), dotnet test instructions, the "built, not yet adopted" note.
- Register: components/README.md registry row (status Draft), CLAUDE.md table row, tick Health in upcoming.md.
- Commit: git -C ZB.MOM.WW.Health add -A && git -C ZB.MOM.WW.Health commit -m "docs: README + pack metadata", then in scadaproj: git add components/health CLAUDE.md components/README.md upcoming.md docs/plans && git commit -m "feat(health): ZB.MOM.WW.Health library + health normalization component"
Acceptance: 3 nupkgs @ 0.1.0; all tests green (counts recorded); indexes updated; design-doc build-order step for Health complete.
Summary of parallelism
- Phase 0 docs: Task 1 ∥ Task 2.
- Phase 1 scaffold: Task 3 (barrier — everything below needs it).
- Phase 2 core: Task 4 → Task 5 (sequential, same mapper); Task 6 ∥ Task 7.
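- Phase 3/4: Task 10 runs in parallel with Task 8 and with Task 9 (different packages/files); per the task metadata above, Tasks 8 and 9 (both in the .Akka package) are not marked parallelizable with each other.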
- Phase 5: Task 11 (barrier — needs all above green).