scadaproj/docs/plans/2026-06-01-health-library-adoption.md

ZB.MOM.WW.Health Adoption Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

Goal: Adopt the shared ZB.MOM.WW.Health library into all three sister apps (OtOpcUa, MxAccessGateway, ScadaBridge), replacing each app's bespoke health-check wiring with the shared probes, canonical three-tier endpoints (/health/ready, /health/active, /healthz), and JSON writer — behaviour-preserving.

Architecture: Distribution is via the Gitea NuGet registry (dohertj2-gitea feed). The shared checks are registered with AddTypeActivatedCheck<T> (DI supplies IServiceProvider; the extra constructor args, i.e. policy / role / options, are passed through args and matched to constructor parameters by type) and tagged with ZbHealthTags; MapZbHealth() routes each tier by tag. Each sister repo is its own git repo: branch, commit, and (optionally) PR happen inside that repo, not in scadaproj. The three repo phases are mutually independent after publish and may proceed in parallel.

Tech Stack: .NET 10, ASP.NET Core health checks (Microsoft.Extensions.Diagnostics.HealthChecks), Akka.NET cluster, EF Core, Microsoft.Data.Sqlite, NuGet Central Package Management, Gitea NuGet feed.


Context the executor MUST know

This plan edits FOUR repos:

  • ~/Desktop/scadaproj — only Phase 0 (verify publish) and Phase 4 (GAPS bookkeeping).
  • ~/Desktop/MxAccessGateway — Phase 1 (core package only).
  • ~/Desktop/OtOpcUa — Phase 2 (all three packages).
  • ~/Desktop/ScadaBridge — Phase 3 (all three packages).

Per-repo git discipline: each sister repo is independent. Before editing a sister repo, create a branch feat/adopt-zb-health. Commit inside that repo. Never commit sister-repo changes from scadaproj. Never skip hooks; never force-push.

Distribution status (Task 0 already done): the three ZB.MOM.WW.Health 0.1.0 packages are published to the dohertj2-gitea feed, and authenticated read credentials are configured at the user level (~/.nuget/NuGet/NuGet.Config) — anonymous read is OFF, so restore needs them, and they are already in place for every subagent. NEVER put the token in a repo file.

Source-mapping gotcha (verified): a ZB.MOM.WW.Health.* pattern does NOT match the core package id ZB.MOM.WW.Health (no trailing dot). Every repo's packageSourceMapping for the Gitea feed MUST list BOTH <package pattern="ZB.MOM.WW.Health" /> and <package pattern="ZB.MOM.WW.Health.*" />.
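The matching rule behind this gotcha can be sketched as a small predicate. This is a simplified model for illustration, not NuGet's implementation: SourceMappingSketch and Matches are hypothetical names, and the sketch assumes a trailing * acts as a prefix wildcard while any other pattern must equal the package id (case-insensitively).

```csharp
using System;

// Simplified, illustrative model of packageSourceMapping pattern matching.
// Assumption: a trailing '*' is a prefix wildcard; any other pattern is an
// exact, case-insensitive package-id match. Not NuGet's actual code.
public static class SourceMappingSketch
{
    public static bool Matches(string pattern, string packageId)
    {
        if (pattern.EndsWith("*", StringComparison.Ordinal))
        {
            // "ZB.MOM.WW.Health.*" becomes the prefix "ZB.MOM.WW.Health." (note the dot).
            string prefix = pattern.Substring(0, pattern.Length - 1);
            return packageId.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
        }

        return string.Equals(pattern, packageId, StringComparison.OrdinalIgnoreCase);
    }
}
```

Under this model, Matches("ZB.MOM.WW.Health.*", "ZB.MOM.WW.Health") is false because the core id has no trailing dot to satisfy the prefix, while the same pattern does match ZB.MOM.WW.Health.Akka. Listing both patterns covers the core package and its satellites.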

Shared registration idiom (used in every phase). The shared checks need constructor args DI can't supply alone, so register them with AddTypeActivatedCheck<T>:

using Microsoft.Extensions.DependencyInjection;          // AddTypeActivatedCheck
using Microsoft.Extensions.Diagnostics.HealthChecks;     // HealthStatus
using ZB.MOM.WW.Health;                                   // ZbHealthTags, MapZbHealth, ZbHealthWriter
// + ZB.MOM.WW.Health.Akka / .EntityFrameworkCore where used

AddTypeActivatedCheck<T>(name, failureStatus, tags, params object[] args) builds the check via ActivatorUtilities.CreateInstance: IServiceProvider constructor params are satisfied from DI; anything else (an AkkaClusterStatusPolicy, a role string, a DatabaseHealthCheckOptions<T>) is taken from args by type. This is the canonical way to wire the shared checks.
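The activation mechanism described above can be tried in isolation. The following is a minimal sketch, assuming the Microsoft.Extensions.DependencyInjection package is referenced; DemoCheck and DemoPolicy are hypothetical stand-ins, not the library's types. It shows ActivatorUtilities.CreateInstance filling the IServiceProvider parameter from DI and the policy parameter from the explicit argument.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-ins for illustration only; not the shared library's types.
public sealed record DemoPolicy(string Name);

public sealed class DemoCheck
{
    public DemoCheck(IServiceProvider services, DemoPolicy policy)
    {
        Services = services;
        Policy = policy;
    }

    public IServiceProvider Services { get; }
    public DemoPolicy Policy { get; }
}

public static class TypeActivationSketch
{
    // Returns the policy name seen by the activated instance.
    public static string ActivateAndReadPolicy()
    {
        // DemoPolicy is deliberately NOT registered: it can only come from the explicit args.
        using ServiceProvider provider = new ServiceCollection().BuildServiceProvider();

        // IServiceProvider is resolved from the container; DemoPolicy is taken from the
        // supplied arguments because its type matches the constructor parameter.
        DemoCheck check = ActivatorUtilities.CreateInstance<DemoCheck>(
            provider, new DemoPolicy("compat"));

        return check.Policy.Name;
    }
}
```

AddTypeActivatedCheck<T> forwards its args to the same activator, which is why a policy, role string, or options object can be supplied at registration time without registering it in DI.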

Library public API (verified, do not re-derive):

  • endpoints.MapZbHealth(ZbHealthEndpointOptions? = null) — maps ready/active/live; defaults /health/ready, /health/active, /healthz; ready+active use ZbHealthWriter.WriteJsonAsync; all anonymous. Does NOT call AddHealthChecks().
  • ZbHealthTags.Ready = "ready", .Active = "active", .Live = "live".
  • DatabaseHealthCheck<TContext>(IServiceProvider, DatabaseHealthCheckOptions<TContext>? ) — default probe CanConnectAsync; options.ProbeQuery = Func<TContext,CancellationToken,Task> for the stricter query probe; resolves an IDbContextFactory<TContext> if registered, else a scoped TContext from a fresh scope (pool-safe).
  • AkkaClusterHealthCheck(IServiceProvider, AkkaClusterStatusPolicy) — presets AkkaClusterStatusPolicy.Default and .OtOpcUaCompat. Resolves ActorSystem from DI.
  • ActiveNodeHealthCheck(IServiceProvider) (role-less) / (IServiceProvider, string role). Resolves ActorSystem from DI lazily; Degraded if not yet available.
  • AkkaActiveNodeGate(IServiceProvider) : IActiveNodeGate — not used in this plan (ScadaBridge seam unification is deferred).
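For orientation, tag-routed tiers can be expressed with the stock ASP.NET Core health-check middleware. The sketch below is an assumption about one plausible shape, not the library's implementation: TierMappingSketch and MapTiersByTag are hypothetical names, the tag strings mirror the ZbHealthTags values listed above, and the sketch uses the default response writer rather than ZbHealthWriter.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;

// Illustrative only: one plausible shape for tag-routed tiers. The real MapZbHealth may differ.
public static class TierMappingSketch
{
    // Like MapZbHealth, this does not call AddHealthChecks(); the host must do that first.
    public static IEndpointRouteBuilder MapTiersByTag(this IEndpointRouteBuilder endpoints)
    {
        Map(endpoints, "/health/ready", "ready");
        Map(endpoints, "/health/active", "active");
        // With no check tagged "live", this tier runs nothing and reports Healthy.
        Map(endpoints, "/healthz", "live");
        return endpoints;
    }

    private static void Map(IEndpointRouteBuilder endpoints, string path, string tag)
    {
        endpoints
            .MapHealthChecks(path, new HealthCheckOptions
            {
                // Only registrations carrying this tier's tag are executed.
                Predicate = registration => registration.Tags.Contains(tag),
            })
            .AllowAnonymous();
    }
}
```

A tier whose predicate matches no registration produces an empty report, which aggregates to Healthy. That is why an app with only ready-tagged checks still answers 200 on the other two tiers.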

Scope deferrals (settled — do NOT implement here): downstream gRPC dependency probes (no host-level GrpcChannel exists in OtOpcUa or MxGateway); ScadaBridge IDbContextFactory switch (the shared check self-scopes); ScadaBridge IActiveNodeGate seam unification (its interface is ...InboundAPI.IActiveNodeGate, wired into inbound-API gating — out of scope). These are recorded as follow-ups in Phase 4.


Phase 0 — Publish the Health packages (prerequisite)

Task 0: Verify the three Health nupkgs are on the Gitea feed (publish if absent)

Classification: small Estimated implement time: ~3 min Parallelizable with: none (gates all other phases)

Files:

  • Read: ~/Desktop/scadaproj/ZB.MOM.WW.Health/ZB.MOM.WW.Health.slnx
  • (No source edits — this is a pack/push/verify task.)

Step 1: Check whether the packages already resolve from Gitea

The library CLAUDE.md claims they are "published to the Gitea NuGet feed." Verify:

curl -s "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/v3/registration/ZB.MOM.WW.Health/index.json" -o /dev/null -w "%{http_code}\n"

Expected: 200 if already published. If 404/401, publish (Steps 2-3). If credentials are needed for the query, skip to Step 2 and rely on the push result.

Step 2: Pack (only if not already published)

cd ~/Desktop/scadaproj/ZB.MOM.WW.Health
dotnet pack ZB.MOM.WW.Health.slnx -c Release -o ./artifacts
ls artifacts/*.nupkg

Expected: ZB.MOM.WW.Health.0.1.0.nupkg, ZB.MOM.WW.Health.Akka.0.1.0.nupkg, ZB.MOM.WW.Health.EntityFrameworkCore.0.1.0.nupkg.

Step 3: Push to the Gitea feed

Credentials are NOT in the repo. The developer/CI provides them. Push each package:

dotnet nuget push "artifacts/ZB.MOM.WW.Health*.0.1.0.nupkg" \
  --source "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" \
  --api-key "$GITEA_NUGET_TOKEN"

Expected: "Your package was pushed." for each package (a 409 Conflict means that version is already present, which is fine).

Fallback (if Gitea is unreachable): STOP and surface it. Do not silently switch mechanisms — the fallback (local folder feed) changes only each repo's nuget.config source line, but that is a plan amendment the user should approve.

Step 4: Commit (none in scadaproj for this task) — no source changed; proceed to Phase 1.


Phase 1 — MxAccessGateway (core package only)

Repo: ~/Desktop/MxAccessGateway. Branch: feat/adopt-zb-health. This repo has no CPM and no nuget.config today. Readiness probe = a custom AuthStoreHealthCheck over the SQLite auth store (the gateway authenticates every gRPC call against it).

Task 1: Reference wiring — create nuget.config, add the package reference

Classification: small Estimated implement time: ~3 min Parallelizable with: Task 4, Task 7 (different repos)

Files:

  • Create: ~/Desktop/MxAccessGateway/nuget.config
  • Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj (ItemGroup, after line 13)

Step 1: Create nuget.config (this repo's first; nuget.org for everything, Gitea for Health)

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
    <add key="dohertj2-gitea" value="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" />
  </packageSources>
  <!-- nuget.org serves everything; the Gitea feed serves only the ZB.MOM.WW.* shared libs.
       Credentials are NOT committed: provide them per-developer via `dotnet nuget add source`
       (username + access token) or NuGet credential env vars in CI. -->
  <packageSourceMapping>
    <packageSource key="nuget.org">
      <package pattern="*" />
    </packageSource>
    <packageSource key="dohertj2-gitea">
      <package pattern="ZB.MOM.WW.Health" />
      <package pattern="ZB.MOM.WW.Health.*" />
    </packageSource>
  </packageSourceMapping>
</configuration>

Step 2: Add the package reference to the Server .csproj. Insert into the first <ItemGroup> (the one ending at the current line 14):

    <PackageReference Include="ZB.MOM.WW.Health" Version="0.1.0" />

(Direct versioned reference — this repo has no CPM. Do not introduce CPM.)

Step 3: Restore to verify the feed resolves

cd ~/Desktop/MxAccessGateway
dotnet restore src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj

Expected: restore succeeds and pulls ZB.MOM.WW.Health 0.1.0 from dohertj2-gitea. If it 401s, the developer must add the Gitea source credentials (dotnet nuget add source … -u … -p … --store-password-in-clear-text).

Step 4: Commit

cd ~/Desktop/MxAccessGateway && git checkout -b feat/adopt-zb-health
git add nuget.config src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj
git commit -m "build: reference ZB.MOM.WW.Health from the Gitea feed"

Task 2: Write the custom AuthStoreHealthCheck (TDD)

Classification: standard Estimated implement time: ~5 min Parallelizable with: Task 4, Task 7

Files:

  • Create: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/AuthStoreHealthCheck.cs
  • Test: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/AuthStoreHealthCheckTests.cs

Step 1: Write the failing tests

using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.MxGateway.Server.Configuration;
using ZB.MOM.WW.MxGateway.Server.Diagnostics;
using ZB.MOM.WW.MxGateway.Server.Security.Authentication;

namespace ZB.MOM.WW.MxGateway.Tests.Diagnostics;

public sealed class AuthStoreHealthCheckTests
{
    private static AuthSqliteConnectionFactory FactoryFor(string sqlitePath)
    {
        var options = new GatewayOptions();
        options.Authentication.SqlitePath = sqlitePath;
        return new AuthSqliteConnectionFactory(Options.Create(options));
    }

    [Fact]
    public async Task Healthy_WhenStoreReachable()
    {
        var path = Path.Combine(Path.GetTempPath(), $"authcheck-{Guid.NewGuid():N}.db");
        try
        {
            var check = new AuthStoreHealthCheck(FactoryFor(path));
            var result = await check.CheckHealthAsync(new HealthCheckContext());
            Assert.Equal(HealthStatus.Healthy, result.Status);
        }
        finally { if (File.Exists(path)) File.Delete(path); }
    }

    [Fact]
    public async Task Unhealthy_WhenPathUnusable()
    {
        // A path whose parent cannot be created (a file used as a directory) forces open to fail.
        var bogus = Path.Combine(Path.GetTempPath(), $"authcheck-{Guid.NewGuid():N}");
        await File.WriteAllTextAsync(bogus, "x");
        try
        {
            var check = new AuthStoreHealthCheck(FactoryFor(Path.Combine(bogus, "store.db")));
            var result = await check.CheckHealthAsync(new HealthCheckContext());
            Assert.Equal(HealthStatus.Unhealthy, result.Status);
        }
        finally { if (File.Exists(bogus)) File.Delete(bogus); }
    }
}

Step 2: Run, expect failure (type does not exist)

cd ~/Desktop/MxAccessGateway
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~AuthStoreHealthCheckTests"

Expected: COMPILE ERROR / FAIL — AuthStoreHealthCheck not found.

Step 3: Implement the check

using Microsoft.Data.Sqlite;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.MxGateway.Server.Security.Authentication;

namespace ZB.MOM.WW.MxGateway.Server.Diagnostics;

/// <summary>
/// Readiness probe: verifies the SQLite authentication store is reachable. The gateway
/// authenticates every gRPC call against this store, so its reachability gates readiness.
/// </summary>
public sealed class AuthStoreHealthCheck : IHealthCheck
{
    private readonly AuthSqliteConnectionFactory _connectionFactory;

    public AuthStoreHealthCheck(AuthSqliteConnectionFactory connectionFactory) =>
        _connectionFactory = connectionFactory ?? throw new ArgumentNullException(nameof(connectionFactory));

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            await using SqliteConnection connection =
                await _connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
            await using SqliteCommand command = connection.CreateCommand();
            command.CommandText = "SELECT 1;";
            await command.ExecuteScalarAsync(cancellationToken).ConfigureAwait(false);
            return HealthCheckResult.Healthy("Auth store is reachable.");
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            throw;
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Auth store is unreachable.", ex);
        }
    }
}

Step 4: Run, expect pass

dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~AuthStoreHealthCheckTests"

Expected: PASS (2 tests). If the GatewayOptions.Authentication.SqlitePath accessor differs, adjust the test helper to match the real options shape (read Configuration/GatewayOptions.cs first).

Step 5: Commit

git add src/ZB.MOM.WW.MxGateway.Server/Diagnostics/AuthStoreHealthCheck.cs \
        src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/AuthStoreHealthCheckTests.cs
git commit -m "feat: add AuthStoreHealthCheck readiness probe"

Task 3: Rewire GatewayApplication to the canonical tiers; fix the route test

Classification: high-risk Estimated implement time: ~5 min Parallelizable with: Task 4, Task 7

Files:

  • Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs:63-66 (the AddHealthChecks() line) and :172-178 (the /health/live block)
  • Modify: ~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs:14-27

Step 1: Replace the bare AddHealthChecks() (line 66) with the tagged readiness probe

        builder.Services.AddHealthChecks()
            .AddTypeActivatedCheck<AuthStoreHealthCheck>(
                "auth-store",
                failureStatus: null,
                tags: new[] { ZbHealthTags.Ready });

Add using ZB.MOM.WW.Health; for ZbHealthTags and MapZbHealth. AuthStoreHealthCheck lives in ZB.MOM.WW.MxGateway.Server.Diagnostics, which is already imported at line 9, so no second using is needed.

Step 2: Delete the /health/live block (lines 172-178) and map the canonical tiers

Remove:

        endpoints.MapGet(
                "/health/live",
                () => Results.Ok(new GatewayHealthReply(
                    Status: "Healthy",
                    DefaultBackend: GatewayContractInfo.DefaultBackendName,
                    WorkerProtocolVersion: GatewayContractInfo.WorkerProtocolVersion)))
            .WithName("LiveHealth");

Replace with:

        endpoints.MapZbHealth();

(/health/ready runs auth-store; /health/active runs no checks → 200; /healthz is bare liveness. The GatewayHealthReply type may now be unused — if so, the C# compiler won't flag it; leave it unless a "remove dead code" reviewer asks, to keep this change tight.)
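The tier isolation claimed above (a failing ready check does not affect the active tier) can be checked without a web host by calling HealthCheckService directly. This is a minimal sketch under stated assumptions: EmptyTierSketch is a hypothetical name, the "auth-store" check is a stub that always fails, and the literal tags stand in for ZbHealthTags.Ready and .Active.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public static class EmptyTierSketch
{
    // Runs only the checks carrying the given tag and returns the aggregate status.
    public static async Task<HealthStatus> StatusForTagAsync(string tag)
    {
        var services = new ServiceCollection();
        services.AddLogging(); // the default HealthCheckService needs an ILogger
        services.AddHealthChecks()
            // Stub standing in for the real auth-store probe; it always reports an outage.
            .AddCheck("auth-store",
                () => HealthCheckResult.Unhealthy("simulated outage"),
                tags: new[] { "ready" });

        await using ServiceProvider provider = services.BuildServiceProvider();
        HealthCheckService health = provider.GetRequiredService<HealthCheckService>();

        HealthReport report = await health.CheckHealthAsync(
            registration => registration.Tags.Contains(tag));
        return report.Status;
    }
}
```

With this setup, the "ready" tag yields Unhealthy while the "active" tag matches no registration and yields Healthy, which the health-check middleware reports as 200 by default.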

Step 3: Update the route test (GatewayApplicationTests.cs:14-27) to assert the three tiers instead of /health/live:

    /// <summary>Verifies that Build maps the canonical three health tiers.</summary>
    [Fact]
    public async Task Build_MapsCanonicalHealthEndpoints()
    {
        await using WebApplication app = GatewayApplication.Build([]);

        var paths = ((IEndpointRouteBuilder)app).DataSources
            .SelectMany(dataSource => dataSource.Endpoints)
            .OfType<RouteEndpoint>()
            .Select(e => e.RoutePattern.RawText)
            .ToHashSet();

        Assert.Contains("/health/ready", paths);
        Assert.Contains("/health/active", paths);
        Assert.Contains("/healthz", paths);
        Assert.DoesNotContain("/health/live", paths);
    }

Step 4: Build + test the whole gateway

cd ~/Desktop/MxAccessGateway
dotnet build src/MxGateway.sln
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj

Expected: build clean; all tests pass (the old Build_MapsLiveHealthEndpoint is replaced). If any other test references /health/live or LiveHealth, update it the same way.

Step 5: Commit

git add src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs \
        src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs
git commit -m "feat: map canonical ZB health tiers; replace bypassing /health/live"

Phase 2 — OtOpcUa (all three packages)

Repo: ~/Desktop/OtOpcUa. Branch: feat/adopt-zb-health. CPM present; NuGet.config has nuget.org + local-mxgw folder feed, NO source mapping. ActorSystem IS in DI (the bespoke AkkaClusterHealthCheck injects it directly). This is the cleanest of the three.

Task 4: Reference wiring — add Gitea source + mapping + CPM versions + package refs

Classification: small Estimated implement time: ~4 min Parallelizable with: Task 1, Task 7

Files:

  • Modify: ~/Desktop/OtOpcUa/NuGet.config
  • Modify: ~/Desktop/OtOpcUa/Directory.Packages.props (near line 99-100)
  • Modify: ~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj (ItemGroup lines 16-30)

Step 1: Add the Gitea source + source mapping to NuGet.config. Once a packageSourceMapping section exists, a package restores only from sources whose patterns match its id, so map the existing feeds too:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
    <add key="local-mxgw" value="./nuget-packages" />
    <add key="dohertj2-gitea" value="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" />
  </packageSources>
  <packageSourceMapping>
    <packageSource key="nuget.org">
      <package pattern="*" />
    </packageSource>
    <packageSource key="local-mxgw">
      <package pattern="ZB.MOM.WW.MxGateway.*" />
    </packageSource>
    <packageSource key="dohertj2-gitea">
      <package pattern="ZB.MOM.WW.Health" />
      <package pattern="ZB.MOM.WW.Health.*" />
    </packageSource>
  </packageSourceMapping>
</configuration>

Step 2: Add CPM versions to Directory.Packages.props next to the existing ZB.MOM.WW.* lines:

    <PackageVersion Include="ZB.MOM.WW.Health" Version="0.1.0" />
    <PackageVersion Include="ZB.MOM.WW.Health.Akka" Version="0.1.0" />
    <PackageVersion Include="ZB.MOM.WW.Health.EntityFrameworkCore" Version="0.1.0" />

Step 3: Add package references (no version — CPM) to the Host .csproj ItemGroup:

    <PackageReference Include="ZB.MOM.WW.Health" />
    <PackageReference Include="ZB.MOM.WW.Health.Akka" />
    <PackageReference Include="ZB.MOM.WW.Health.EntityFrameworkCore" />

Step 4: Restore

cd ~/Desktop/OtOpcUa && git checkout -b feat/adopt-zb-health
dotnet restore ZB.MOM.WW.OtOpcUa.slnx

Expected: restore succeeds; the three Health packages come from dohertj2-gitea, MxGateway stays on local-mxgw.

Step 5: Commit

git add NuGet.config Directory.Packages.props src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj
git commit -m "build: reference ZB.MOM.WW.Health packages from the Gitea feed"

Task 5: Swap the three checks to shared probes; map tiers via MapZbHealth

Classification: high-risk Estimated implement time: ~5 min Parallelizable with: Task 1, Task 7

Files:

  • Rewrite: ~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/HealthEndpoints.cs
  • Delete: Health/DatabaseHealthCheck.cs, Health/AkkaClusterHealthCheck.cs, Health/AdminRoleLeaderHealthCheck.cs
  • Verify call sites unchanged: Program.cs:137 (AddOtOpcUaHealth), Program.cs:159 (MapOtOpcUaHealth)

Step 1: Rewrite HealthEndpoints.cs to register the shared checks (preserving names + tags) and map via MapZbHealth():

using Microsoft.AspNetCore.Routing;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.Health;
using ZB.MOM.WW.Health.Akka;
using ZB.MOM.WW.Health.EntityFrameworkCore;
using ZB.MOM.WW.OtOpcUa.Configuration;

namespace ZB.MOM.WW.OtOpcUa.Host.Health;

public static class HealthEndpoints
{
    /// <summary>
    /// Registers the shared ZB.MOM.WW health probes. Tier semantics preserved from the bespoke
    /// implementation: configdb + akka on ready+active; admin-leader on active only.
    /// </summary>
    public static IServiceCollection AddOtOpcUaHealth(this IServiceCollection services)
    {
        services.AddHealthChecks()
            .AddTypeActivatedCheck<DatabaseHealthCheck<OtOpcUaConfigDbContext>>(
                "configdb",
                failureStatus: null,
                tags: new[] { ZbHealthTags.Ready, ZbHealthTags.Active },
                args: new DatabaseHealthCheckOptions<OtOpcUaConfigDbContext>
                {
                    // Preserve OtOpcUa's stricter schema-touching probe.
                    ProbeQuery = static (db, ct) => db.Deployments.AsNoTracking().Take(1).ToListAsync(ct),
                })
            .AddTypeActivatedCheck<AkkaClusterHealthCheck>(
                "akka",
                failureStatus: null,
                tags: new[] { ZbHealthTags.Ready, ZbHealthTags.Active },
                args: AkkaClusterStatusPolicy.OtOpcUaCompat)
            .AddTypeActivatedCheck<ActiveNodeHealthCheck>(
                "admin-leader",
                failureStatus: null,
                tags: new[] { ZbHealthTags.Active },
                args: "admin");
        return services;
    }

    /// <summary>Maps the canonical three-tier health endpoints.</summary>
    public static IEndpointRouteBuilder MapOtOpcUaHealth(this IEndpointRouteBuilder app)
    {
        app.MapZbHealth();   // /health/ready, /health/active, /healthz — all AllowAnonymous
        return app;
    }
}

Note: args: is the params object[] — pass a single options object / policy / string. If the compiler binds the single-array overload oddly, wrap as args: new object[] { … }.

Step 2: Delete the three bespoke check files

cd ~/Desktop/OtOpcUa
git rm src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/DatabaseHealthCheck.cs \
       src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/AkkaClusterHealthCheck.cs \
       src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/AdminRoleLeaderHealthCheck.cs

(IClusterRoleInfo may now be unused by Health; leave its definition — it may be used elsewhere.)

Step 3: Build

dotnet build ZB.MOM.WW.OtOpcUa.slnx

Expected: clean. Fix any now-dangling using ...Host.Health references to the deleted types.

Step 4: Run health-related tests

dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~Health"

Expected: pass. Behaviour-parity checks the executor must confirm (add/keep tests if missing):

  • akka tier: self Up → Healthy; self not Up → Degraded (the OtOpcUaCompat preset reproduces the self-Up scan).
  • admin-leader: node without admin role → Healthy; admin member non-leader → Degraded; admin leader → Healthy. (Shared check reads Cluster.Get(system).SelfMember + RoleLeader("admin"), vs the old IClusterRoleInfo; verify equivalence on a formed test cluster or via the library's own ActiveNodeDecision table, which is already covered in the library's tests.)
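The two parity rules above can be written down as a pure decision table and used as an oracle in tests. This is a sketch of the expected classifications only, with hypothetical names (ParityTableSketch, AkkaTier, AdminLeaderTier); it is not the library's ActiveNodeDecision code and does not touch a real cluster.

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Expected classifications from the parity list, expressed as pure functions.
// Hypothetical helper for tests; the shared library's own decision logic is not shown here.
public static class ParityTableSketch
{
    // akka tier: self Up is Healthy, anything else is Degraded.
    public static HealthStatus AkkaTier(bool selfIsUp) =>
        selfIsUp ? HealthStatus.Healthy : HealthStatus.Degraded;

    // admin-leader: nodes without the role are Healthy; role members are
    // Healthy only when they are the role leader, otherwise Degraded.
    public static HealthStatus AdminLeaderTier(bool hasAdminRole, bool isAdminRoleLeader)
    {
        if (!hasAdminRole)
        {
            return HealthStatus.Healthy;
        }

        return isAdminRoleLeader ? HealthStatus.Healthy : HealthStatus.Degraded;
    }
}
```

A parity test can feed the same inputs to the old and new checks and compare both against this table, so any classification drift shows up as a failing row rather than a production surprise.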

Step 5: Commit

git add src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/HealthEndpoints.cs
git commit -m "feat: adopt shared ZB.MOM.WW.Health probes (preserve tiers + OtOpcUaCompat policy)"

Phase 3 — ScadaBridge (all three packages)

Repo: ~/Desktop/ScadaBridge. Branch: feat/adopt-zb-health. CPM + Gitea feed already wired (just extend mapping). ActorSystem is NOT in DI (owned by AkkaHostedService) — add a transient DI bridge so the shared checks can resolve it. Keep the existing ActiveNodeGate (seam unification deferred). No IDbContextFactory switch (shared check self-scopes).

Task 6: Reference wiring — extend mapping + CPM versions + package refs

Classification: small Estimated implement time: ~4 min Parallelizable with: Task 1, Task 4

Files:

  • Modify: ~/Desktop/ScadaBridge/nuget.config (source-mapping block, lines 13-20)
  • Modify: ~/Desktop/ScadaBridge/Directory.Packages.props (near lines 76-77)
  • Modify: ~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj (ItemGroup lines 14-31)

Step 1: Extend the Gitea source mapping by adding both Health patterns under dohertj2-gitea, alongside the existing MxGateway pattern:

    <packageSource key="dohertj2-gitea">
      <package pattern="ZB.MOM.WW.MxGateway.*" />
      <package pattern="ZB.MOM.WW.Health" />
      <package pattern="ZB.MOM.WW.Health.*" />
    </packageSource>

Step 2: Add CPM versions next to the existing ZB.MOM.WW.* lines in Directory.Packages.props:

    <PackageVersion Include="ZB.MOM.WW.Health" Version="0.1.0" />
    <PackageVersion Include="ZB.MOM.WW.Health.Akka" Version="0.1.0" />
    <PackageVersion Include="ZB.MOM.WW.Health.EntityFrameworkCore" Version="0.1.0" />

Step 3: Add package references to the Host .csproj ItemGroup (no version — CPM):

    <PackageReference Include="ZB.MOM.WW.Health" />
    <PackageReference Include="ZB.MOM.WW.Health.Akka" />
    <PackageReference Include="ZB.MOM.WW.Health.EntityFrameworkCore" />

Step 4: Restore + commit

cd ~/Desktop/ScadaBridge && git checkout -b feat/adopt-zb-health
dotnet restore ZB.MOM.WW.ScadaBridge.slnx
git add nuget.config Directory.Packages.props src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj
git commit -m "build: reference ZB.MOM.WW.Health packages from the Gitea feed"

Task 7: Add the transient ActorSystem DI bridge (TDD)

Classification: standard Estimated implement time: ~4 min Parallelizable with: Task 1, Task 4

Files:

  • Modify: ~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Program.cs (near the Akka registration)
  • Test: ~/Desktop/ScadaBridge/tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ActorSystemBridgeTests.cs

Why transient: the shared checks call sp.GetService<ActorSystem>() per probe and treat null as "not ready yet" (Degraded). A transient factory re-reads AkkaHostedService.ActorSystem each resolve, returning null before startup and the live system after. A singleton would cache the startup null forever.
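The transient re-read can be demonstrated without Akka. In this minimal sketch, FakeSystem stands in for ActorSystem and SystemHolder for AkkaHostedService; both are hypothetical types introduced for illustration, and the only assumption about the real service is that it exposes a property that is null until startup completes.

```csharp
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-ins: FakeSystem plays ActorSystem, SystemHolder plays AkkaHostedService.
public sealed class FakeSystem { }

public sealed class SystemHolder
{
    public FakeSystem? Current { get; set; } // null until "startup" assigns it
}

public static class TransientBridgeSketch
{
    public static (bool NullBeforeStart, bool LiveAfterStart) Run()
    {
        var services = new ServiceCollection();
        services.AddSingleton<SystemHolder>();

        // Transient factory: evaluated on every resolve, so it always reads the holder's
        // current value. It returns null rather than throwing while nothing is started.
        services.AddTransient<FakeSystem>(sp => sp.GetRequiredService<SystemHolder>().Current!);

        using ServiceProvider provider = services.BuildServiceProvider();

        bool nullBeforeStart = provider.GetService<FakeSystem>() is null;

        // Simulate the hosted service finishing startup.
        provider.GetRequiredService<SystemHolder>().Current = new FakeSystem();

        bool liveAfterStart = provider.GetService<FakeSystem>() is not null;
        return (nullBeforeStart, liveAfterStart);
    }
}
```

Both tuple members are expected to be true: the first resolve sees null, and a later resolve sees the instance assigned in between, because nothing caches the factory result.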

Step 1: Write the failing test

using Akka.Actor;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.ScadaBridge.Host.Actors;

namespace ZB.MOM.WW.ScadaBridge.Host.Tests;

public sealed class ActorSystemBridgeTests
{
    [Fact]
    public void ActorSystem_ResolvesNull_BeforeHostedServiceStarts()
    {
        var services = new ServiceCollection();
        services.AddSingleton<AkkaHostedService>();              // ActorSystem property is null pre-start
        services.AddTransient(sp => sp.GetRequiredService<AkkaHostedService>().ActorSystem!);

        using var provider = services.BuildServiceProvider();
        Assert.Null(provider.GetService<ActorSystem>());         // transient re-reads → null, not cached
    }
}

If AkkaHostedService cannot be constructed without dependencies, register a minimal stub instead; the assertion that matters is "transient bridge yields null before start." Read Actors/AkkaHostedService.cs constructor first and adapt.

Step 2: Run, expect failure

cd ~/Desktop/ScadaBridge
dotnet test tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ZB.MOM.WW.ScadaBridge.Host.Tests.csproj --filter "FullyQualifiedName~ActorSystemBridgeTests"

Expected: FAIL or a compile gap. Caveat: with no ActorSystem registration at all, GetService already returns null, so a null-only assertion can pass vacuously. Adjust the test so it meaningfully exercises the bridge registration you add in Step 3.

Step 3: Register the bridge in Program.cs (right after AkkaHostedService is registered):

// The shared ZB.MOM.WW.Health Akka checks resolve ActorSystem from DI. ScadaBridge owns the
// ActorSystem inside AkkaHostedService (not a DI singleton), so bridge it as TRANSIENT: each
// resolve re-reads the current value, which is null while warming up (checks → Degraded) and live afterwards.
builder.Services.AddTransient<ActorSystem>(sp =>
    sp.GetRequiredService<AkkaHostedService>().ActorSystem!);   // null before start; '!' is a hint only

Caution: the factory must NOT throw. The shared checks call GetService<ActorSystem>(), not GetRequiredService, and read a null result as "not ready yet". GetService does not turn factory exceptions into null; it propagates them. A factory written as sp.GetRequiredService<AkkaHostedService>().ActorSystem ?? throw new InvalidOperationException("ActorSystem not yet started.") would therefore throw inside the probe during warm-up instead of yielding Degraded.

With the null-returning factory, GetService<ActorSystem>() returns null pre-start (Degraded) and the live system post-start. Make the Step-1 test assert exactly this.

Step 4: Run, expect pass

dotnet test tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ZB.MOM.WW.ScadaBridge.Host.Tests.csproj --filter "FullyQualifiedName~ActorSystemBridgeTests"

Expected: PASS.

Step 5: Commit

git add src/ZB.MOM.WW.ScadaBridge.Host/Program.cs \
        tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ActorSystemBridgeTests.cs
git commit -m "feat: bridge ActorSystem into DI (transient) for shared health checks"

Task 8: Swap checks to shared probes; add /healthz; canonical writer

Classification: high-risk Estimated implement time: ~5 min Parallelizable with: none (depends on Task 6 + Task 7)

Files:

  • Modify: ~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:114-117 (registration) and :222-233 (endpoint mapping)
  • Delete: Health/DatabaseHealthCheck.cs, Health/AkkaClusterHealthCheck.cs, Health/ActiveNodeHealthCheck.cs
  • Keep: Health/ActiveNodeGate.cs (unchanged — seam unification deferred)
  • Adjust: tests/ZB.MOM.WW.ScadaBridge.Host.Tests/HealthCheckTests.cs

Step 1: Replace the registration block (Program.cs lines 114-117):

builder.Services.AddHealthChecks()
    .AddTypeActivatedCheck<DatabaseHealthCheck<ScadaBridgeDbContext>>(
        "database",
        failureStatus: null,
        tags: new[] { ZbHealthTags.Ready })           // default CanConnectAsync probe; self-scopes
    .AddTypeActivatedCheck<AkkaClusterHealthCheck>(
        "akka-cluster",
        failureStatus: null,
        tags: new[] { ZbHealthTags.Ready },
        args: AkkaClusterStatusPolicy.Default)        // Up/Joining=Healthy, Leaving/Exiting=Degraded
    .AddTypeActivatedCheck<ActiveNodeHealthCheck>(
        "active-node",
        failureStatus: null,
        tags: new[] { ZbHealthTags.Active });         // role-less leader check

Add usings: ZB.MOM.WW.Health, ZB.MOM.WW.Health.Akka, ZB.MOM.WW.Health.EntityFrameworkCore, ZB.MOM.WW.ScadaBridge.ConfigurationDatabase (for ScadaBridgeDbContext). Tag mapping preserves the prior split: database + akka-cluster on ready; active-node on active.

Step 2: Replace the endpoint mapping (Program.cs lines 222-233 — the two MapHealthChecks blocks using UIResponseWriter) with a single call:

app.MapZbHealth();   // /health/ready (database+akka-cluster), /health/active (active-node), /healthz

This adds the previously-missing /healthz and switches both tiers to the canonical ZbHealthWriter. Remove the now-unused using for HealthChecks.UI.Client / UIResponseWriter if it becomes dead.

Step 3: Delete the three bespoke checks

cd ~/Desktop/ScadaBridge
git rm src/ZB.MOM.WW.ScadaBridge.Host/Health/DatabaseHealthCheck.cs \
       src/ZB.MOM.WW.ScadaBridge.Host/Health/AkkaClusterHealthCheck.cs \
       src/ZB.MOM.WW.ScadaBridge.Host/Health/ActiveNodeHealthCheck.cs

Step 4: Build + test

dotnet build ZB.MOM.WW.ScadaBridge.slnx
dotnet test tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ZB.MOM.WW.ScadaBridge.Host.Tests.csproj

Expected: build clean; tests pass. HealthCheckTests.cs likely references the deleted concrete types or the old endpoint shape — retarget it to assert: /health/ready, /health/active, AND the new /healthz are mapped; database+akka-cluster are tagged ready; active-node is tagged active. The Default policy preserves ScadaBridge's Joining=Healthy classification — keep any test asserting that.
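One way to assert the tag split without depending on the concrete check types is to read the registrations back from HealthCheckServiceOptions. The helper below is a hedged sketch: RegistrationTagsSketch and TagsByCheck are hypothetical names, and it assumes the test can obtain the host's IServiceProvider (or build an equivalent ServiceCollection).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Options;

public static class RegistrationTagsSketch
{
    // Maps each registered check name to its tags (sorted for stable assertions).
    public static IReadOnlyDictionary<string, string[]> TagsByCheck(IServiceProvider services)
    {
        HealthCheckServiceOptions options =
            services.GetRequiredService<IOptions<HealthCheckServiceOptions>>().Value;

        return options.Registrations.ToDictionary(
            registration => registration.Name,
            registration => registration.Tags
                .OrderBy(tag => tag, StringComparer.Ordinal)
                .ToArray());
    }
}
```

A retargeted HealthCheckTests case could then assert that "database" and "akka-cluster" each carry only the ready tag and "active-node" only the active tag, which keeps the test valid even though the bespoke check classes are gone.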

Step 5: Commit

git add src/ZB.MOM.WW.ScadaBridge.Host/Program.cs \
        tests/ZB.MOM.WW.ScadaBridge.Host.Tests/HealthCheckTests.cs
git commit -m "feat: adopt shared ZB.MOM.WW.Health probes; add /healthz; canonical writer"

Phase 4 — Bookkeeping (scadaproj)

Task 9: Update the Health GAPS backlog to reflect adoption + deferrals

Classification: trivial Estimated implement time: ~3 min Parallelizable with: none (do last)

Files:

  • Modify: ~/Desktop/scadaproj/components/health/GAPS.md (adoption backlog table + a deferrals note)

Step 1: In components/health/GAPS.md, annotate the adoption-backlog rows as done for what shipped (MxGateway tiers + AuthStoreHealthCheck; OtOpcUa shared probes; ScadaBridge shared probes + /healthz + canonical writer + ActorSystem bridge), and add a short "Deferred (verified ill-fitting on adoption)" subsection capturing: downstream gRPC probes (no host-level channel), ScadaBridge IDbContextFactory switch (shared check self-scopes), ScadaBridge IActiveNodeGate seam unification (different InboundAPI interface), and MxGateway worker probe (named-pipe transport).

Step 2: Commit (scadaproj)

cd ~/Desktop/scadaproj
git add components/health/GAPS.md
git commit -m "docs(health): mark ZB.MOM.WW.Health adoption done; record verified deferrals"

Execution notes

  • Order: Task 0 first (gates everything). Then the three repo phases are independent — Tasks 1-3 (MxGateway), 4-5 (OtOpcUa), 6-8 (ScadaBridge) can run in parallel across repos; within a repo they are sequential. Task 9 (scadaproj) last.
  • Per-repo green gate: a phase is "done" only when that sister repo's full dotnet build + dotnet test are green — not just the changed area.
  • Behaviour preservation is the acceptance bar: the presets (OtOpcUaCompat / Default) and the role filter ("admin" / role-less) exist to keep each app's Healthy/Degraded/Unhealthy classifications identical. Any classification change is a defect, not an improvement.
  • No secrets in any diff — the Gitea token / feed credentials are provided out-of-band; verify no nuget.config or csproj change embeds them.