From 5a965639f987d0512fc94dbfe9224ee90008e59e Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Mon, 1 Jun 2026 13:15:48 -0400 Subject: [PATCH] docs: implementation plan for ZB.MOM.WW.Health adoption across the 3 sister apps Detailed task-by-task plan (publish to Gitea, then per-repo behaviour-preserving probe swaps) incorporating recon findings that revised the design: MxGateway worker IPC is named pipes (custom SQLite readiness probe instead of gRPC), ScadaBridge ActorSystem is not in DI (transient bridge), downstream gRPC probes + IDbContextFactory switch + ScadaBridge seam unification deferred. --- .../2026-06-01-health-library-adoption.md | 837 ++++++++++++++++++ ...6-01-health-library-adoption.md.tasks.json | 16 + 2 files changed, 853 insertions(+) create mode 100644 docs/plans/2026-06-01-health-library-adoption.md create mode 100644 docs/plans/2026-06-01-health-library-adoption.md.tasks.json diff --git a/docs/plans/2026-06-01-health-library-adoption.md b/docs/plans/2026-06-01-health-library-adoption.md new file mode 100644 index 0000000..4b37faf --- /dev/null +++ b/docs/plans/2026-06-01-health-library-adoption.md @@ -0,0 +1,837 @@ +# ZB.MOM.WW.Health Adoption Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task. + +**Goal:** Adopt the shared `ZB.MOM.WW.Health` library into all three sister apps (OtOpcUa, +MxAccessGateway, ScadaBridge), replacing each app's bespoke health-check wiring with the shared +probes, canonical three-tier endpoints (`/health/ready`, `/health/active`, `/healthz`), and JSON +writer — behaviour-preserving. + +**Architecture:** Distribution is via the Gitea NuGet registry (`dohertj2-gitea` feed). The shared +checks are registered with `AddTypeActivatedCheck` (DI supplies `IServiceProvider`; extra +constructor args — policy / role / options — passed positionally) and tagged with `ZbHealthTags`; +`MapZbHealth()` routes each tier by tag. 
Each sister repo is its **own git repo** — branch, commit, +and (optionally) PR happen inside that repo, not in scadaproj. The three repo phases are mutually +independent after publish and may proceed in parallel. + +**Tech Stack:** .NET 10, ASP.NET Core health checks (`Microsoft.Extensions.Diagnostics.HealthChecks`), +Akka.NET cluster, EF Core, `Microsoft.Data.Sqlite`, NuGet Central Package Management, Gitea NuGet feed. + +--- + +## Context the executor MUST know + +**This plan edits FOUR repos:** +- `~/Desktop/scadaproj` — only Phase 0 (verify publish) and Phase 4 (GAPS bookkeeping). +- `~/Desktop/MxAccessGateway` — Phase 1 (core package only). +- `~/Desktop/OtOpcUa` — Phase 2 (all three packages). +- `~/Desktop/ScadaBridge` — Phase 3 (all three packages). + +**Per-repo git discipline:** each sister repo is independent. Before editing a sister repo, create a +branch `feat/adopt-zb-health`. Commit inside that repo. Never commit sister-repo changes from +scadaproj. Never skip hooks; never force-push. + +**Shared registration idiom (used in every phase).** The shared checks need constructor args DI +can't supply alone, so register them with `AddTypeActivatedCheck`: + +```csharp +using Microsoft.Extensions.DependencyInjection; // AddTypeActivatedCheck +using Microsoft.Extensions.Diagnostics.HealthChecks; // HealthStatus +using ZB.MOM.WW.Health; // ZbHealthTags, MapZbHealth, ZbHealthWriter +// + ZB.MOM.WW.Health.Akka / .EntityFrameworkCore where used +``` +`AddTypeActivatedCheck(name, failureStatus, tags, params object[] args)` builds the check via +`ActivatorUtilities.CreateInstance`: `IServiceProvider` constructor params are satisfied from DI; +anything else (an `AkkaClusterStatusPolicy`, a role string, a `DatabaseHealthCheckOptions`) is +taken from `args` by type. This is the canonical way to wire the shared checks. + +**Library public API (verified, do not re-derive):** +- `endpoints.MapZbHealth(ZbHealthEndpointOptions? 
= null)` — maps ready/active/live; defaults
+  `/health/ready`, `/health/active`, `/healthz`; ready+active use `ZbHealthWriter.WriteJsonAsync`;
+  all anonymous. Does NOT call `AddHealthChecks()`.
+- `ZbHealthTags.Ready` = `"ready"`, `.Active` = `"active"`, `.Live` = `"live"`.
+- `DatabaseHealthCheck<TContext>(IServiceProvider, DatabaseHealthCheckOptions<TContext>?)` —
+  default probe `CanConnectAsync`; `options.ProbeQuery` (a delegate over the context and a
+  `CancellationToken`) for the stricter query probe; resolves an `IDbContextFactory<TContext>` if
+  registered, else a scoped `TContext` from a fresh scope (pool-safe).
+- `AkkaClusterHealthCheck(IServiceProvider, AkkaClusterStatusPolicy)` — presets
+  `AkkaClusterStatusPolicy.Default` and `.OtOpcUaCompat`. Resolves `ActorSystem` from DI.
+- `ActiveNodeHealthCheck(IServiceProvider)` (role-less) / `(IServiceProvider, string role)`.
+  Resolves `ActorSystem` from DI lazily; Degraded if not yet available.
+- `AkkaActiveNodeGate(IServiceProvider) : IActiveNodeGate` — not used in this plan (ScadaBridge seam
+  unification is deferred).
+
+**Scope deferrals (settled — do NOT implement here):** downstream gRPC dependency probes (no
+host-level `GrpcChannel` exists in OtOpcUa or MxGateway); ScadaBridge `IDbContextFactory` switch
+(the shared check self-scopes); ScadaBridge `IActiveNodeGate` seam unification (its interface is
+`...InboundAPI.IActiveNodeGate`, wired into inbound-API gating — out of scope). These are recorded
+as follow-ups in Phase 4.
+
+---
+
+## Phase 0 — Publish the Health packages (prerequisite)
+
+### Task 0: Verify the three Health nupkgs are on the Gitea feed (publish if absent)
+
+**Classification:** small
+**Estimated implement time:** ~3 min
+**Parallelizable with:** none (gates all other phases)
+
+**Files:**
+- Read: `~/Desktop/scadaproj/ZB.MOM.WW.Health/ZB.MOM.WW.Health.slnx`
+- (No source edits — this is a pack/push/verify task.)
+ +**Step 1: Check whether the packages already resolve from Gitea** + +The library CLAUDE.md claims they are "published to the Gitea NuGet feed." Verify: + +```bash +curl -s "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/v3/registration/ZB.MOM.WW.Health/index.json" -o /dev/null -w "%{http_code}\n" +``` +Expected: `200` if already published. If `404`/`401`, publish (Steps 2–3). If credentials are +needed for the query, skip to Step 2 and rely on the push result. + +**Step 2: Pack (only if not already published)** + +```bash +cd ~/Desktop/scadaproj/ZB.MOM.WW.Health +dotnet pack ZB.MOM.WW.Health.slnx -c Release -o ./artifacts +ls artifacts/*.nupkg +``` +Expected: `ZB.MOM.WW.Health.0.1.0.nupkg`, `ZB.MOM.WW.Health.Akka.0.1.0.nupkg`, +`ZB.MOM.WW.Health.EntityFrameworkCore.0.1.0.nupkg`. + +**Step 3: Push to the Gitea feed** + +Credentials are NOT in the repo. The developer/CI provides them. Push each package: + +```bash +dotnet nuget push "artifacts/ZB.MOM.WW.Health*.0.1.0.nupkg" \ + --source "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" \ + --api-key "$GITEA_NUGET_TOKEN" +``` +Expected: `Your package was pushed.` for each (or `409 Conflict` = already present = fine). + +**Fallback (if Gitea is unreachable):** STOP and surface it. Do not silently switch mechanisms — +the fallback (local folder feed) changes only each repo's `nuget.config` source line, but that is a +plan amendment the user should approve. + +**Step 4: Commit (none in scadaproj for this task)** — no source changed; proceed to Phase 1. + +--- + +## Phase 1 — MxAccessGateway (core package only) + +Repo: `~/Desktop/MxAccessGateway`. Branch: `feat/adopt-zb-health`. This repo has **no CPM and no +`nuget.config`** today. Readiness probe = a custom `AuthStoreHealthCheck` over the SQLite auth store +(the gateway authenticates every gRPC call against it). 
+ +### Task 1: Reference wiring — create `nuget.config`, add the package reference + +**Classification:** small +**Estimated implement time:** ~3 min +**Parallelizable with:** Task 4, Task 7 (different repos) + +**Files:** +- Create: `~/Desktop/MxAccessGateway/nuget.config` +- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj` (ItemGroup, after line 13) + +**Step 1: Create `nuget.config`** (this repo's first; nuget.org for everything, Gitea for Health) + +```xml + + + + + + + + + + + + + + + + + +``` + +**Step 2: Add the package reference** to the Server `.csproj`. Insert into the first `` +(the one ending at the current line 14): + +```xml + +``` +(Direct versioned reference — this repo has no CPM. Do not introduce CPM.) + +**Step 3: Restore to verify the feed resolves** + +```bash +cd ~/Desktop/MxAccessGateway +dotnet restore src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj +``` +Expected: restore succeeds and pulls `ZB.MOM.WW.Health 0.1.0` from `dohertj2-gitea`. If it 401s, +the developer must add the Gitea source credentials (`dotnet nuget add source … -u … -p … --store-password-in-clear-text`). 
+ +**Step 4: Commit** + +```bash +cd ~/Desktop/MxAccessGateway && git checkout -b feat/adopt-zb-health +git add nuget.config src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj +git commit -m "build: reference ZB.MOM.WW.Health from the Gitea feed" +``` + +### Task 2: Write the custom `AuthStoreHealthCheck` (TDD) + +**Classification:** standard +**Estimated implement time:** ~5 min +**Parallelizable with:** Task 4, Task 7 + +**Files:** +- Create: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/AuthStoreHealthCheck.cs` +- Test: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/AuthStoreHealthCheckTests.cs` + +**Step 1: Write the failing tests** + +```csharp +using Microsoft.Extensions.Diagnostics.HealthChecks; +using Microsoft.Extensions.Options; +using ZB.MOM.WW.MxGateway.Server.Configuration; +using ZB.MOM.WW.MxGateway.Server.Diagnostics; +using ZB.MOM.WW.MxGateway.Server.Security.Authentication; + +namespace ZB.MOM.WW.MxGateway.Tests.Diagnostics; + +public sealed class AuthStoreHealthCheckTests +{ + private static AuthSqliteConnectionFactory FactoryFor(string sqlitePath) + { + var options = new GatewayOptions(); + options.Authentication.SqlitePath = sqlitePath; + return new AuthSqliteConnectionFactory(Options.Create(options)); + } + + [Fact] + public async Task Healthy_WhenStoreReachable() + { + var path = Path.Combine(Path.GetTempPath(), $"authcheck-{Guid.NewGuid():N}.db"); + try + { + var check = new AuthStoreHealthCheck(FactoryFor(path)); + var result = await check.CheckHealthAsync(new HealthCheckContext()); + Assert.Equal(HealthStatus.Healthy, result.Status); + } + finally { if (File.Exists(path)) File.Delete(path); } + } + + [Fact] + public async Task Unhealthy_WhenPathUnusable() + { + // A path whose parent cannot be created (a file used as a directory) forces open to fail. 
+ var bogus = Path.Combine(Path.GetTempPath(), $"authcheck-{Guid.NewGuid():N}"); + await File.WriteAllTextAsync(bogus, "x"); + try + { + var check = new AuthStoreHealthCheck(FactoryFor(Path.Combine(bogus, "store.db"))); + var result = await check.CheckHealthAsync(new HealthCheckContext()); + Assert.Equal(HealthStatus.Unhealthy, result.Status); + } + finally { if (File.Exists(bogus)) File.Delete(bogus); } + } +} +``` + +**Step 2: Run, expect failure** (type does not exist) + +```bash +cd ~/Desktop/MxAccessGateway +dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~AuthStoreHealthCheckTests" +``` +Expected: COMPILE ERROR / FAIL — `AuthStoreHealthCheck` not found. + +**Step 3: Implement the check** + +```csharp +using Microsoft.Data.Sqlite; +using Microsoft.Extensions.Diagnostics.HealthChecks; +using ZB.MOM.WW.MxGateway.Server.Security.Authentication; + +namespace ZB.MOM.WW.MxGateway.Server.Diagnostics; + +/// +/// Readiness probe: verifies the SQLite authentication store is reachable. The gateway +/// authenticates every gRPC call against this store, so its reachability gates readiness. +/// +public sealed class AuthStoreHealthCheck : IHealthCheck +{ + private readonly AuthSqliteConnectionFactory _connectionFactory; + + public AuthStoreHealthCheck(AuthSqliteConnectionFactory connectionFactory) => + _connectionFactory = connectionFactory ?? 
throw new ArgumentNullException(nameof(connectionFactory));
+
+    public async Task<HealthCheckResult> CheckHealthAsync(
+        HealthCheckContext context,
+        CancellationToken cancellationToken = default)
+    {
+        try
+        {
+            await using SqliteConnection connection =
+                await _connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
+            await using SqliteCommand command = connection.CreateCommand();
+            command.CommandText = "SELECT 1;";
+            await command.ExecuteScalarAsync(cancellationToken).ConfigureAwait(false);
+            return HealthCheckResult.Healthy("Auth store is reachable.");
+        }
+        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
+        {
+            // Caller-requested cancellation is not a store failure; let it propagate.
+            throw;
+        }
+        catch (Exception ex)
+        {
+            return HealthCheckResult.Unhealthy("Auth store is unreachable.", ex);
+        }
+    }
+}
+```
+
+**Step 4: Run, expect pass**
+
+```bash
+dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~AuthStoreHealthCheckTests"
+```
+Expected: PASS (2 tests). If the `GatewayOptions.Authentication.SqlitePath` accessor differs, adjust
+the test helper to match the real options shape (read `Configuration/GatewayOptions.cs` first).
+
+**Step 5: Commit**
+
+```bash
+git add src/ZB.MOM.WW.MxGateway.Server/Diagnostics/AuthStoreHealthCheck.cs \
+  src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/AuthStoreHealthCheckTests.cs
+git commit -m "feat: add AuthStoreHealthCheck readiness probe"
+```
+
+### Task 3: Rewire `GatewayApplication` to the canonical tiers; fix the route test
+
+**Classification:** high-risk
+**Estimated implement time:** ~5 min
+**Parallelizable with:** Task 4, Task 7
+
+**Files:**
+- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs:63-66` (the `AddHealthChecks()` line) and `:172-178` (the `/health/live` block)
+- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs:14-27`
+
+**Step 1: Replace the bare `AddHealthChecks()` (line 66) with the tagged readiness probe**
+
+```csharp
+        builder.Services.AddHealthChecks()
+            .AddTypeActivatedCheck<AuthStoreHealthCheck>(
+                "auth-store",
+                failureStatus: null,
+                tags: new[] { ZbHealthTags.Ready });
+```
+Add `using ZB.MOM.WW.Health;`. The `using ZB.MOM.WW.MxGateway.Server.Diagnostics;` that the check
+needs is already imported at line 9.
+
+**Step 2: Delete the `/health/live` block (lines 172-178) and map the canonical tiers**
+
+Remove:
+```csharp
+        endpoints.MapGet(
+            "/health/live",
+            () => Results.Ok(new GatewayHealthReply(
+                Status: "Healthy",
+                DefaultBackend: GatewayContractInfo.DefaultBackendName,
+                WorkerProtocolVersion: GatewayContractInfo.WorkerProtocolVersion)))
+            .WithName("LiveHealth");
+```
+Replace with:
+```csharp
+        endpoints.MapZbHealth();
+```
+(`/health/ready` runs `auth-store`; `/health/active` runs no checks → 200; `/healthz` is bare
+liveness. The `GatewayHealthReply` type may now be unused. The C# compiler does not flag an unused
+type, so leave it unless a "remove dead code" reviewer asks; that keeps this change tight.)
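
The tier behaviour described in Step 2 can be sketched with the stock ASP.NET Core API. This is only an approximation of what `MapZbHealth()` might do, not the library's code: it assumes each tier is a plain `MapHealthChecks` call with a tag predicate, that the literal tags mirror `ZbHealthTags`, and that the inline `auth-store` delegate stands in for `AuthStoreHealthCheck`.

```csharp
using System.Linq;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Stand-in for the real readiness probe; only the "ready" tag matters here.
builder.Services.AddHealthChecks()
    .AddCheck("auth-store", () => HealthCheckResult.Healthy(), tags: new[] { "ready" });

var app = builder.Build();

// Each tier runs only the registrations carrying its tag (assumed routing rule).
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("ready"),
});
app.MapHealthChecks("/health/active", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("active"),
});

// Bare liveness: select no checks at all, so the endpoint only proves the process answers.
app.MapHealthChecks("/healthz", new HealthCheckOptions { Predicate = _ => false });

app.Run();
```

Because nothing in MxGateway carries the `active` tag, the `/health/active` predicate selects no registrations and the aggregate report is Healthy, which is why that tier answers 200 without any extra wiring.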
+
+**Step 3: Update the route test** (`GatewayApplicationTests.cs:14-27`) to assert the three tiers
+instead of `/health/live`:
+
+```csharp
+    /// <summary>Verifies that Build maps the canonical three health tiers.</summary>
+    [Fact]
+    public async Task Build_MapsCanonicalHealthEndpoints()
+    {
+        await using WebApplication app = GatewayApplication.Build([]);
+
+        var paths = ((IEndpointRouteBuilder)app).DataSources
+            .SelectMany(dataSource => dataSource.Endpoints)
+            .OfType<RouteEndpoint>()
+            .Select(e => e.RoutePattern.RawText)
+            .ToHashSet();
+
+        Assert.Contains("/health/ready", paths);
+        Assert.Contains("/health/active", paths);
+        Assert.Contains("/healthz", paths);
+        Assert.DoesNotContain("/health/live", paths);
+    }
+```
+
+**Step 4: Build + test the whole gateway**
+
+```bash
+cd ~/Desktop/MxAccessGateway
+dotnet build src/MxGateway.sln
+dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj
+```
+Expected: build clean; all tests pass (the old `Build_MapsLiveHealthEndpoint` is replaced). If any
+other test references `/health/live` or `LiveHealth`, update it the same way.
+
+**Step 5: Commit**
+
+```bash
+git add src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs \
+  src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs
+git commit -m "feat: map canonical ZB health tiers; replace bypassing /health/live"
+```
+
+---
+
+## Phase 2 — OtOpcUa (all three packages)
+
+Repo: `~/Desktop/OtOpcUa`. Branch: `feat/adopt-zb-health`. CPM present; `NuGet.config` has nuget.org
++ `local-mxgw` folder feed, NO source mapping. `ActorSystem` IS in DI (the bespoke
+`AkkaClusterHealthCheck` injects it directly). This is the cleanest of the three.
+ +### Task 4: Reference wiring — add Gitea source + mapping + CPM versions + package refs + +**Classification:** small +**Estimated implement time:** ~4 min +**Parallelizable with:** Task 1, Task 7 + +**Files:** +- Modify: `~/Desktop/OtOpcUa/NuGet.config` +- Modify: `~/Desktop/OtOpcUa/Directory.Packages.props` (near line 99-100) +- Modify: `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj` (ItemGroup lines 16-30) + +**Step 1: Add the Gitea source + source mapping** to `NuGet.config`. Because adding a mapping makes +ALL sources mapped explicitly, map the existing feeds too: + +```xml + + + + + + + + + + + + + + + + + + + +``` + +**Step 2: Add CPM versions** to `Directory.Packages.props` next to the existing `ZB.MOM.WW.*` lines: + +```xml + + + +``` + +**Step 3: Add package references** (no version — CPM) to the Host `.csproj` ItemGroup: + +```xml + + + +``` + +**Step 4: Restore** + +```bash +cd ~/Desktop/OtOpcUa && git checkout -b feat/adopt-zb-health +dotnet restore ZB.MOM.WW.OtOpcUa.slnx +``` +Expected: restore succeeds; the three Health packages come from `dohertj2-gitea`, MxGateway stays on +`local-mxgw`. 
+ +**Step 5: Commit** + +```bash +git add NuGet.config Directory.Packages.props src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj +git commit -m "build: reference ZB.MOM.WW.Health packages from the Gitea feed" +``` + +### Task 5: Swap the three checks to shared probes; map tiers via `MapZbHealth` + +**Classification:** high-risk +**Estimated implement time:** ~5 min +**Parallelizable with:** Task 1, Task 7 + +**Files:** +- Rewrite: `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/HealthEndpoints.cs` +- Delete: `Health/DatabaseHealthCheck.cs`, `Health/AkkaClusterHealthCheck.cs`, `Health/AdminRoleLeaderHealthCheck.cs` +- Verify call sites unchanged: `Program.cs:137` (`AddOtOpcUaHealth`), `Program.cs:159` (`MapOtOpcUaHealth`) + +**Step 1: Rewrite `HealthEndpoints.cs`** to register the shared checks (preserving names + tags) and +map via `MapZbHealth()`: + +```csharp +using Microsoft.AspNetCore.Routing; +using Microsoft.EntityFrameworkCore; +using Microsoft.Extensions.DependencyInjection; +using ZB.MOM.WW.Health; +using ZB.MOM.WW.Health.Akka; +using ZB.MOM.WW.Health.EntityFrameworkCore; +using ZB.MOM.WW.OtOpcUa.Configuration; + +namespace ZB.MOM.WW.OtOpcUa.Host.Health; + +public static class HealthEndpoints +{ + /// + /// Registers the shared ZB.MOM.WW health probes. Tier semantics preserved from the bespoke + /// implementation: configdb + akka on ready+active; admin-leader on active only. + /// + public static IServiceCollection AddOtOpcUaHealth(this IServiceCollection services) + { + services.AddHealthChecks() + .AddTypeActivatedCheck>( + "configdb", + failureStatus: null, + tags: new[] { ZbHealthTags.Ready, ZbHealthTags.Active }, + args: new DatabaseHealthCheckOptions + { + // Preserve OtOpcUa's stricter schema-touching probe. 
+                    ProbeQuery = static (db, ct) => db.Deployments.AsNoTracking().Take(1).ToListAsync(ct),
+                })
+            .AddTypeActivatedCheck<AkkaClusterHealthCheck>(
+                "akka",
+                failureStatus: null,
+                tags: new[] { ZbHealthTags.Ready, ZbHealthTags.Active },
+                args: AkkaClusterStatusPolicy.OtOpcUaCompat)
+            .AddTypeActivatedCheck<ActiveNodeHealthCheck>(
+                "admin-leader",
+                failureStatus: null,
+                tags: new[] { ZbHealthTags.Active },
+                args: "admin");
+        return services;
+    }
+
+    /// <summary>Maps the canonical three-tier health endpoints.</summary>
+    public static IEndpointRouteBuilder MapOtOpcUaHealth(this IEndpointRouteBuilder app)
+    {
+        app.MapZbHealth(); // /health/ready, /health/active, /healthz — all AllowAnonymous
+        return app;
+    }
+}
+```
+
+Note: `args:` binds to the `params object[]` parameter; pass a single options object, policy, or
+string. If the compiler does not accept the named single value, wrap it as
+`args: new object[] { … }`.
+
+**Step 2: Delete the three bespoke check files**
+
+```bash
+cd ~/Desktop/OtOpcUa
+git rm src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/DatabaseHealthCheck.cs \
+  src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/AkkaClusterHealthCheck.cs \
+  src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/AdminRoleLeaderHealthCheck.cs
+```
+(`IClusterRoleInfo` may now be unused by Health; leave its definition — it may be used elsewhere.)
+
+**Step 3: Build**
+
+```bash
+dotnet build ZB.MOM.WW.OtOpcUa.slnx
+```
+Expected: clean. Fix any now-dangling `using ...Host.Health` references to the deleted types.
+
+**Step 4: Run health-related tests**
+
+```bash
+dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~Health"
+```
+Expected: pass. **Behaviour-parity checks the executor must confirm** (add/keep tests if missing):
+- akka tier: self `Up` → Healthy; self not Up → Degraded (the `OtOpcUaCompat` preset reproduces the
+  self-Up scan).
+- admin-leader: node without `admin` role → Healthy; admin member non-leader → Degraded; admin
+  leader → Healthy. 
(Shared check reads `Cluster.Get(system).SelfMember` + `RoleLeader("admin")`, + vs the old `IClusterRoleInfo`; verify equivalence on a formed test cluster or via the library's + own `ActiveNodeDecision` table — already covered in the library’s tests.) + +**Step 5: Commit** + +```bash +git add src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/HealthEndpoints.cs +git commit -m "feat: adopt shared ZB.MOM.WW.Health probes (preserve tiers + OtOpcUaCompat policy)" +``` + +--- + +## Phase 3 — ScadaBridge (all three packages) + +Repo: `~/Desktop/ScadaBridge`. Branch: `feat/adopt-zb-health`. CPM + Gitea feed already wired (just +extend mapping). **`ActorSystem` is NOT in DI** (owned by `AkkaHostedService`) — add a transient DI +bridge so the shared checks can resolve it. Keep the existing `ActiveNodeGate` (seam unification +deferred). No `IDbContextFactory` switch (shared check self-scopes). + +### Task 6: Reference wiring — extend mapping + CPM versions + package refs + +**Classification:** small +**Estimated implement time:** ~4 min +**Parallelizable with:** Task 1, Task 4 + +**Files:** +- Modify: `~/Desktop/ScadaBridge/nuget.config` (source-mapping block, lines 13-20) +- Modify: `~/Desktop/ScadaBridge/Directory.Packages.props` (near lines 76-77) +- Modify: `~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj` (ItemGroup lines 14-31) + +**Step 1: Extend the Gitea source mapping** — add a second pattern under `dohertj2-gitea`: + +```xml + + + + +``` + +**Step 2: Add CPM versions** next to the existing `ZB.MOM.WW.*` lines in `Directory.Packages.props`: + +```xml + + + +``` + +**Step 3: Add package references** to the Host `.csproj` ItemGroup (no version — CPM): + +```xml + + + +``` + +**Step 4: Restore + commit** + +```bash +cd ~/Desktop/ScadaBridge && git checkout -b feat/adopt-zb-health +dotnet restore ZB.MOM.WW.ScadaBridge.slnx +git add nuget.config Directory.Packages.props src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj +git 
commit -m "build: reference ZB.MOM.WW.Health packages from the Gitea feed"
+```
+
+### Task 7: Add the transient `ActorSystem` DI bridge (TDD)
+
+**Classification:** standard
+**Estimated implement time:** ~4 min
+**Parallelizable with:** Task 1, Task 4
+
+**Files:**
+- Modify: `~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Program.cs` (near the Akka registration)
+- Test: `~/Desktop/ScadaBridge/tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ActorSystemBridgeTests.cs`
+
+**Why transient:** the shared checks call `sp.GetService<ActorSystem>()` **per probe** and treat
+`null` as "not ready yet" (Degraded). A transient factory re-reads `AkkaHostedService.ActorSystem`
+each resolve, returning `null` before startup and the live system after. A singleton would cache the
+startup `null` forever.
+
+**Step 1: Write the failing test**
+
+```csharp
+using Akka.Actor;
+using Microsoft.Extensions.DependencyInjection;
+using ZB.MOM.WW.ScadaBridge.Host.Actors;
+
+namespace ZB.MOM.WW.ScadaBridge.Host.Tests;
+
+public sealed class ActorSystemBridgeTests
+{
+    [Fact]
+    public void ActorSystem_ResolvesNull_BeforeHostedServiceStarts()
+    {
+        var services = new ServiceCollection();
+        services.AddSingleton<AkkaHostedService>(); // ActorSystem property is null pre-start
+        services.AddTransient<ActorSystem>(sp => sp.GetRequiredService<AkkaHostedService>().ActorSystem!);
+
+        using var provider = services.BuildServiceProvider();
+        Assert.Null(provider.GetService<ActorSystem>()); // transient re-reads → null, not cached
+    }
+}
+```
+If `AkkaHostedService` cannot be constructed without dependencies, register a minimal stub instead;
+the assertion that matters is "transient bridge yields null before start." Read
+`Actors/AkkaHostedService.cs` constructor first and adapt.
+
+**Step 2: Run, expect failure**
+
+```bash
+cd ~/Desktop/ScadaBridge
+dotnet test tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ZB.MOM.WW.ScadaBridge.Host.Tests.csproj --filter "FullyQualifiedName~ActorSystemBridgeTests"
+```
+Expected: FAIL (a compile gap, or an assertion that does not yet exercise the bridge). With no
+`ActorSystem` registration at all, `GetService<ActorSystem>()` already returns null, so adjust the
+test until it meaningfully exercises the bridge registration you add in Step 3.
+
+**Step 3: Register the bridge in `Program.cs`** (right after `AkkaHostedService` is registered).
+A first draft that throws while the system has not started looks like this; do NOT use it:
+
+```csharp
+// Draft only: throws before startup. See the caution below.
+builder.Services.AddTransient<ActorSystem>(sp =>
+    sp.GetRequiredService<AkkaHostedService>().ActorSystem
+        ?? throw new InvalidOperationException("ActorSystem not yet started."));
+```
+**Caution:** the shared checks call `GetService<ActorSystem>()`, NOT `GetRequiredService`.
+`GetService` returns null only when the service is unregistered or the factory itself returns
+null; an exception thrown inside the factory propagates to the caller. The factory must therefore
+NOT throw; return null instead. Use:
+
+```csharp
+// The shared ZB.MOM.WW.Health Akka checks resolve ActorSystem from DI. ScadaBridge owns the
+// ActorSystem inside AkkaHostedService (not a DI singleton), so bridge it as TRANSIENT: each
+// resolve re-reads the current value — null while warming up (checks → Degraded), live afterwards.
+builder.Services.AddTransient<ActorSystem>(sp =>
+    sp.GetRequiredService<AkkaHostedService>().ActorSystem!); // null before start; '!' is a hint only
+```
+`GetService<ActorSystem>()` then returns `null` pre-start (Degraded) and the live system post-start.
+Make the Step-1 test assert exactly this.
+
+**Step 4: Run, expect pass**
+
+```bash
+dotnet test tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ZB.MOM.WW.ScadaBridge.Host.Tests.csproj --filter "FullyQualifiedName~ActorSystemBridgeTests"
+```
+Expected: PASS.
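
The transient re-read behaviour that Task 7 relies on can be tried in isolation with nothing but `Microsoft.Extensions.DependencyInjection`. The sketch below is illustrative only: `FakeSystem` and `FakeHost` are hypothetical stand-ins for `ActorSystem` and `AkkaHostedService`, so it says nothing about Akka itself.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddSingleton<FakeHost>();

// Transient factory: invoked on every resolve, so it always reads the host's current value.
services.AddTransient<FakeSystem>(sp => sp.GetRequiredService<FakeHost>().Current!);

using var provider = services.BuildServiceProvider();
var host = provider.GetRequiredService<FakeHost>();

// Before "startup" the factory returns null and GetService hands that null back.
Console.WriteLine(provider.GetService<FakeSystem>() is null ? "not ready" : "ready");

// Simulate the hosted service finishing startup.
host.Current = new FakeSystem();

// The next resolve runs the factory again and sees the live instance.
Console.WriteLine(provider.GetService<FakeSystem>() is null ? "not ready" : "ready");

// Hypothetical stand-ins for ActorSystem and AkkaHostedService.
public sealed class FakeSystem { }

public sealed class FakeHost
{
    public FakeSystem? Current { get; set; }
}
```

The same shape is what the bridge test should pin down: a null result before the host has started and a non-null result afterwards, both obtained through `GetService` rather than `GetRequiredService`.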
+
+**Step 5: Commit**
+
+```bash
+git add src/ZB.MOM.WW.ScadaBridge.Host/Program.cs \
+  tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ActorSystemBridgeTests.cs
+git commit -m "feat: bridge ActorSystem into DI (transient) for shared health checks"
+```
+
+### Task 8: Swap checks to shared probes; add `/healthz`; canonical writer
+
+**Classification:** high-risk
+**Estimated implement time:** ~5 min
+**Parallelizable with:** none (depends on Task 6 + Task 7)
+
+**Files:**
+- Modify: `~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:114-117` (registration) and `:222-233` (endpoint mapping)
+- Delete: `Health/DatabaseHealthCheck.cs`, `Health/AkkaClusterHealthCheck.cs`, `Health/ActiveNodeHealthCheck.cs`
+- Keep: `Health/ActiveNodeGate.cs` (unchanged — seam unification deferred)
+- Adjust: `tests/ZB.MOM.WW.ScadaBridge.Host.Tests/HealthCheckTests.cs`
+
+**Step 1: Replace the registration block** (Program.cs lines 114-117):
+
+```csharp
+builder.Services.AddHealthChecks()
+    .AddTypeActivatedCheck<DatabaseHealthCheck<ScadaBridgeDbContext>>(
+        "database",
+        failureStatus: null,
+        tags: new[] { ZbHealthTags.Ready })            // default CanConnectAsync probe; self-scopes
+    .AddTypeActivatedCheck<AkkaClusterHealthCheck>(
+        "akka-cluster",
+        failureStatus: null,
+        tags: new[] { ZbHealthTags.Ready },
+        args: AkkaClusterStatusPolicy.Default)          // Up/Joining=Healthy, Leaving/Exiting=Degraded
+    .AddTypeActivatedCheck<ActiveNodeHealthCheck>(
+        "active-node",
+        failureStatus: null,
+        tags: new[] { ZbHealthTags.Active });           // role-less leader check
+```
+Add usings: `ZB.MOM.WW.Health`, `ZB.MOM.WW.Health.Akka`, `ZB.MOM.WW.Health.EntityFrameworkCore`,
+`ZB.MOM.WW.ScadaBridge.ConfigurationDatabase` (for `ScadaBridgeDbContext`). Tag mapping preserves
+the prior split: `database` + `akka-cluster` on ready; `active-node` on active.
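
How the `args:` values in a type-activated registration reach a check's constructor can be seen with `ActivatorUtilities.CreateInstance` directly. This is a minimal sketch under stated assumptions: `ClusterView` and `RoleCheck` are hypothetical types invented for the illustration, not part of `ZB.MOM.WW.Health`.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddSingleton<ClusterView>();
using var provider = services.BuildServiceProvider();

// "admin" is not a DI service, so it is matched to the string constructor parameter by type.
// IServiceProvider and ClusterView are supplied by the container.
var check = ActivatorUtilities.CreateInstance<RoleCheck>(provider, "admin");
Console.WriteLine(check.Role);

// Hypothetical dependency that lives in DI.
public sealed class ClusterView { }

// Hypothetical check whose constructor mixes DI-supplied and caller-supplied parameters.
public sealed class RoleCheck
{
    public RoleCheck(IServiceProvider services, ClusterView view, string role)
    {
        ArgumentNullException.ThrowIfNull(services);
        ArgumentNullException.ThrowIfNull(view);
        Role = role;
    }

    public string Role { get; }
}
```

Matching by type is also why each shared check takes at most one extra argument of a given type: a policy, a role string, or an options object can be told apart without naming them.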
+ +**Step 2: Replace the endpoint mapping** (Program.cs lines 222-233 — the two `MapHealthChecks` +blocks using `UIResponseWriter`) with a single call: + +```csharp +app.MapZbHealth(); // /health/ready (database+akka-cluster), /health/active (active-node), /healthz +``` +This adds the previously-missing `/healthz` and switches both tiers to the canonical +`ZbHealthWriter`. Remove the now-unused `using` for `HealthChecks.UI.Client` / +`UIResponseWriter` if it becomes dead. + +**Step 3: Delete the three bespoke checks** + +```bash +cd ~/Desktop/ScadaBridge +git rm src/ZB.MOM.WW.ScadaBridge.Host/Health/DatabaseHealthCheck.cs \ + src/ZB.MOM.WW.ScadaBridge.Host/Health/AkkaClusterHealthCheck.cs \ + src/ZB.MOM.WW.ScadaBridge.Host/Health/ActiveNodeHealthCheck.cs +``` + +**Step 4: Build + test** + +```bash +dotnet build ZB.MOM.WW.ScadaBridge.slnx +dotnet test tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ZB.MOM.WW.ScadaBridge.Host.Tests.csproj +``` +Expected: build clean; tests pass. `HealthCheckTests.cs` likely references the deleted concrete +types or the old endpoint shape — retarget it to assert: `/health/ready`, `/health/active`, AND the +new `/healthz` are mapped; `database`+`akka-cluster` are tagged `ready`; `active-node` is tagged +`active`. The `Default` policy preserves ScadaBridge's `Joining`=Healthy classification — keep any +test asserting that. 
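
One way to retarget the tag assertions is to read the registrations back from `HealthCheckServiceOptions` instead of referencing concrete check types. The sketch below assumes inline delegate checks as stand-ins for the shared probes and literal tag strings in place of `ZbHealthTags`; in the real test the registrations would come from the host's own service collection and the final comparison would be xUnit assertions.

```csharp
using System;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Options;

var services = new ServiceCollection();

// Stand-in registrations that mirror the names and tags from Step 1.
services.AddHealthChecks()
    .AddCheck("database", () => HealthCheckResult.Healthy(), tags: new[] { "ready" })
    .AddCheck("akka-cluster", () => HealthCheckResult.Healthy(), tags: new[] { "ready" })
    .AddCheck("active-node", () => HealthCheckResult.Healthy(), tags: new[] { "active" });

using var provider = services.BuildServiceProvider();

// Every AddCheck call appends a HealthCheckRegistration to these options.
var tagsByName = provider
    .GetRequiredService<IOptions<HealthCheckServiceOptions>>()
    .Value.Registrations
    .ToDictionary(registration => registration.Name, registration => registration.Tags);

bool splitPreserved =
    tagsByName["database"].Contains("ready") &&
    tagsByName["akka-cluster"].Contains("ready") &&
    tagsByName["active-node"].Contains("active") &&
    !tagsByName["active-node"].Contains("ready");

Console.WriteLine(splitPreserved ? "tag split preserved" : "tag split changed");
```

Asserting on names and tags keeps the test independent of which assembly the check types live in, so it survives the deletion of the bespoke classes.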
+ +**Step 5: Commit** + +```bash +git add src/ZB.MOM.WW.ScadaBridge.Host/Program.cs \ + tests/ZB.MOM.WW.ScadaBridge.Host.Tests/HealthCheckTests.cs +git commit -m "feat: adopt shared ZB.MOM.WW.Health probes; add /healthz; canonical writer" +``` + +--- + +## Phase 4 — Bookkeeping (scadaproj) + +### Task 9: Update the Health GAPS backlog to reflect adoption + deferrals + +**Classification:** trivial +**Estimated implement time:** ~3 min +**Parallelizable with:** none (do last) + +**Files:** +- Modify: `~/Desktop/scadaproj/components/health/GAPS.md` (adoption backlog table + a deferrals note) + +**Step 1:** In `components/health/GAPS.md`, annotate the adoption-backlog rows as done for what +shipped (MxGateway tiers + `AuthStoreHealthCheck`; OtOpcUa shared probes; ScadaBridge shared probes ++ `/healthz` + canonical writer + ActorSystem bridge), and add a short "Deferred (verified +ill-fitting on adoption)" subsection capturing: downstream gRPC probes (no host-level channel), +ScadaBridge `IDbContextFactory` switch (shared check self-scopes), ScadaBridge `IActiveNodeGate` +seam unification (different InboundAPI interface), and MxGateway worker probe (named-pipe transport). + +**Step 2: Commit (scadaproj)** + +```bash +cd ~/Desktop/scadaproj +git add components/health/GAPS.md +git commit -m "docs(health): mark ZB.MOM.WW.Health adoption done; record verified deferrals" +``` + +--- + +## Execution notes + +- **Order:** Task 0 first (gates everything). Then the three repo phases are independent — Tasks + 1-3 (MxGateway), 4-5 (OtOpcUa), 6-8 (ScadaBridge) can run in parallel across repos; within a repo + they are sequential. Task 9 (scadaproj) last. +- **Per-repo green gate:** a phase is "done" only when that sister repo's full `dotnet build` + + `dotnet test` are green — not just the changed area. 
+- **Behaviour preservation is the acceptance bar:** the presets (`OtOpcUaCompat` / `Default`) and + the role filter (`"admin"` / role-less) exist to keep each app's Healthy/Degraded/Unhealthy + classifications identical. Any classification change is a defect, not an improvement. +- **No secrets in any diff** — the Gitea token / feed credentials are provided out-of-band; verify + no `nuget.config` or csproj change embeds them. diff --git a/docs/plans/2026-06-01-health-library-adoption.md.tasks.json b/docs/plans/2026-06-01-health-library-adoption.md.tasks.json new file mode 100644 index 0000000..6c7ca38 --- /dev/null +++ b/docs/plans/2026-06-01-health-library-adoption.md.tasks.json @@ -0,0 +1,16 @@ +{ + "planPath": "docs/plans/2026-06-01-health-library-adoption.md", + "tasks": [ + {"id": 0, "subject": "Task 0: Verify/publish Health nupkgs to Gitea", "status": "pending"}, + {"id": 1, "subject": "Task 1: MxGateway reference wiring", "status": "pending", "blockedBy": [0]}, + {"id": 2, "subject": "Task 2: MxGateway AuthStoreHealthCheck (TDD)", "status": "pending", "blockedBy": [1]}, + {"id": 3, "subject": "Task 3: MxGateway rewire to canonical tiers", "status": "pending", "blockedBy": [2]}, + {"id": 4, "subject": "Task 4: OtOpcUa reference wiring", "status": "pending", "blockedBy": [0]}, + {"id": 5, "subject": "Task 5: OtOpcUa swap to shared probes", "status": "pending", "blockedBy": [4]}, + {"id": 6, "subject": "Task 6: ScadaBridge reference wiring", "status": "pending", "blockedBy": [0]}, + {"id": 7, "subject": "Task 7: ScadaBridge ActorSystem DI bridge (TDD)", "status": "pending", "blockedBy": [6]}, + {"id": 8, "subject": "Task 8: ScadaBridge swap to shared probes", "status": "pending", "blockedBy": [6, 7]}, + {"id": 9, "subject": "Task 9: Update Health GAPS bookkeeping", "status": "pending", "blockedBy": [3, 5, 8]} + ], + "lastUpdated": "2026-06-01" +}