# ZB.MOM.WW.Health Adoption Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

**Goal:** Adopt the shared `ZB.MOM.WW.Health` library into all three sister apps (OtOpcUa,
MxAccessGateway, ScadaBridge), replacing each app's bespoke health-check wiring with the shared
probes, the canonical three-tier endpoints (`/health/ready`, `/health/active`, `/healthz`), and the
shared JSON writer, without changing behaviour.

**Architecture:** Distribution is via the Gitea NuGet registry (`dohertj2-gitea` feed). The shared
checks are registered with `AddTypeActivatedCheck<T>` (DI supplies `IServiceProvider`; extra
constructor args such as a policy, role, or options object are passed positionally) and tagged with
`ZbHealthTags`; `MapZbHealth()` routes each tier by tag. Each sister repo is its **own git repo**:
branch, commit, and (optionally) PR happen inside that repo, not in scadaproj. After publishing, the
three repo phases are mutually independent and may proceed in parallel.

**Tech Stack:** .NET 10, ASP.NET Core health checks (`Microsoft.Extensions.Diagnostics.HealthChecks`),
Akka.NET cluster, EF Core, `Microsoft.Data.Sqlite`, NuGet Central Package Management, Gitea NuGet feed.

---
## Context the executor MUST know
**This plan edits FOUR repos:**
- `~/Desktop/scadaproj`: only Phase 0 (verify publish) and Phase 4 (GAPS bookkeeping).
- `~/Desktop/MxAccessGateway`: Phase 1 (core package only).
- `~/Desktop/OtOpcUa`: Phase 2 (all three packages).
- `~/Desktop/ScadaBridge`: Phase 3 (all three packages).

**Per-repo git discipline:** each sister repo is independent. Before editing a sister repo, create
the branch `feat/adopt-zb-health` (`git checkout -b feat/adopt-zb-health`). The first task of each
phase repeats that command at its restore or commit step. If the branch already exists, run
`git checkout feat/adopt-zb-health` there instead, because `-b` fails on an existing branch. Commit
inside that repo. Never commit sister-repo changes from scadaproj. Never skip hooks; never force-push.
**Distribution status (Task 0 already done):** the three `ZB.MOM.WW.Health` 0.1.0 packages are
published to the `dohertj2-gitea` feed, and authenticated read credentials are configured at the
**user level** (`~/.nuget/NuGet/NuGet.Config`) — anonymous read is OFF, so restore needs them, and
they are already in place for every subagent. NEVER put the token in a repo file.
**Source-mapping gotcha (verified):** a `ZB.MOM.WW.Health.*` pattern does NOT match the core package
id `ZB.MOM.WW.Health` (no trailing dot). Every repo's `packageSourceMapping` for the Gitea feed MUST
list BOTH `<package pattern="ZB.MOM.WW.Health" />` and `<package pattern="ZB.MOM.WW.Health.*" />`.
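The matching rule above can be made concrete with a small model. This is an illustrative sketch, NOT NuGet's implementation. It assumes the precedence rules from NuGet's package source mapping documentation:

- A pattern ending in `*` is a prefix match on everything before the `*`.
- Any other pattern is an exact, case-insensitive id match.
- An exact id pattern beats any prefix pattern, and among prefixes the longest wins.

Under those rules, with only the prefix pattern the core id falls through to `*` on nuget.org, where it does not exist. That is why both patterns are required.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative model of NuGet package source mapping (NOT NuGet's code).
// Assumed rules: "Foo.*" is a prefix match on "Foo."; other patterns are exact, case-insensitive ids;
// exact-id patterns beat prefix patterns; among prefix patterns the longest wins ("*" is the fallback).
public static class SourceMappingModel
{
    public static bool Matches(string pattern, string packageId)
    {
        if (pattern.EndsWith('*'))
        {
            string prefix = pattern[..^1]; // "ZB.MOM.WW.Health.*" -> "ZB.MOM.WW.Health."
            return packageId.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
        }

        return string.Equals(pattern, packageId, StringComparison.OrdinalIgnoreCase);
    }

    // Returns the source the model would restore packageId from, or null if no pattern matches.
    public static string? ResolveSource(IReadOnlyDictionary<string, string[]> mapping, string packageId) =>
        mapping
            .SelectMany(entry => entry.Value
                .Where(pattern => Matches(pattern, packageId))
                .Select(pattern => (Source: entry.Key, Pattern: pattern)))
            .OrderByDescending(m => !m.Pattern.EndsWith('*')) // exact-id patterns first
            .ThenByDescending(m => m.Pattern.Length)          // then the longest prefix
            .Select(m => m.Source)
            .FirstOrDefault();
}
```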
**Shared registration idiom (used in every phase).** The shared checks need constructor args DI
can't supply alone, so register them with `AddTypeActivatedCheck<T>`:
```csharp
using Microsoft.Extensions.DependencyInjection; // AddTypeActivatedCheck
using Microsoft.Extensions.Diagnostics.HealthChecks; // HealthStatus
using ZB.MOM.WW.Health; // ZbHealthTags, MapZbHealth, ZbHealthWriter
// + ZB.MOM.WW.Health.Akka / .EntityFrameworkCore where used
```
`AddTypeActivatedCheck<T>(name, failureStatus, tags, params object[] args)` builds the check via
`ActivatorUtilities.CreateInstance`: `IServiceProvider` constructor params are satisfied from DI;
anything else (an `AkkaClusterStatusPolicy`, a role string, a `DatabaseHealthCheckOptions<T>`) is
taken from `args` by type. This is the canonical way to wire the shared checks.
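The sketch below shows the binding rule in isolation. It calls `ActivatorUtilities.CreateInstance` directly on a hypothetical check-like type, `RoleScopedProbe`, which is not a library type. Its constructor mixes two kinds of parameter:

- an `IServiceProvider`, which DI supplies;
- a role string, which DI cannot supply.

Passing `"admin"` fills the string. Omitting it makes activation fail with `InvalidOperationException`, which is the failure to expect if a registration forgets an `args` value.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical check-like type: IServiceProvider comes from DI, the role string does not.
public sealed class RoleScopedProbe
{
    public RoleScopedProbe(IServiceProvider services, string role)
    {
        Services = services;
        Role = role;
    }

    public IServiceProvider Services { get; }
    public string Role { get; }
}

public static class TypeActivationDemo
{
    // The same activation call AddTypeActivatedCheck makes with its args array: extra values are
    // matched to constructor parameters by type; the remaining parameters come from the provider.
    public static RoleScopedProbe Create(IServiceProvider provider, params object[] args) =>
        ActivatorUtilities.CreateInstance<RoleScopedProbe>(provider, args);
}
```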
**Library public API (verified, do not re-derive):**
- `endpoints.MapZbHealth(ZbHealthEndpointOptions? = null)` — maps ready/active/live; defaults
`/health/ready`, `/health/active`, `/healthz`; ready+active use `ZbHealthWriter.WriteJsonAsync`;
all anonymous. Does NOT call `AddHealthChecks()`.
- `ZbHealthTags.Ready` = `"ready"`, `.Active` = `"active"`, `.Live` = `"live"`.
- `DatabaseHealthCheck<TContext>(IServiceProvider, DatabaseHealthCheckOptions<TContext>?)`: the
default probe is `CanConnectAsync`; set `options.ProbeQuery` (a `Func<TContext, CancellationToken, Task>`)
for the stricter query probe. It resolves an `IDbContextFactory<TContext>` if one is registered, else
a scoped `TContext` from a fresh scope (pool-safe).
- `AkkaClusterHealthCheck(IServiceProvider, AkkaClusterStatusPolicy)` — presets
`AkkaClusterStatusPolicy.Default` and `.OtOpcUaCompat`. Resolves `ActorSystem` from DI.
- `ActiveNodeHealthCheck(IServiceProvider)` (role-less) / `(IServiceProvider, string role)`.
Resolves `ActorSystem` from DI lazily; Degraded if not yet available.
- `AkkaActiveNodeGate(IServiceProvider) : IActiveNodeGate`: not used in this plan (ScadaBridge seam
unification is deferred).

**Scope deferrals (settled; do NOT implement here):**

- Downstream gRPC dependency probes: no host-level `GrpcChannel` exists in OtOpcUa or MxGateway.
- The ScadaBridge `IDbContextFactory` switch: the shared check self-scopes.
- ScadaBridge `IActiveNodeGate` seam unification: its interface is `...InboundAPI.IActiveNodeGate`,
which is wired into inbound-API gating, so it is out of scope.
- An MxGateway worker probe: named-pipe transport.

All are recorded as follow-ups in Phase 4.

---
## Phase 0 — Publish the Health packages (prerequisite)
### Task 0: Verify the three Health nupkgs are on the Gitea feed (publish if absent)
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** none (gates all other phases)
**Files:**
- Read: `~/Desktop/scadaproj/ZB.MOM.WW.Health/ZB.MOM.WW.Health.slnx`
- (No source edits: this is a pack/push/verify task.)

**Step 1: Check whether the packages already resolve from Gitea**

The library's CLAUDE.md says they are "published to the Gitea NuGet feed." Verify:
```bash
curl -s "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/v3/registration/ZB.MOM.WW.Health/index.json" -o /dev/null -w "%{http_code}\n"
```
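Expected: `200` if already published; on `404`, publish (Steps 2-3). Anonymous read is off, so an
unauthenticated request may return `401` even when the packages exist. In that case treat this check
as inconclusive, go to Step 2, and rely on the push result.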
**Step 2: Pack (only if not already published)**
```bash
cd ~/Desktop/scadaproj/ZB.MOM.WW.Health
dotnet pack ZB.MOM.WW.Health.slnx -c Release -o ./artifacts
ls artifacts/*.nupkg
```
Expected: `ZB.MOM.WW.Health.0.1.0.nupkg`, `ZB.MOM.WW.Health.Akka.0.1.0.nupkg`,
`ZB.MOM.WW.Health.EntityFrameworkCore.0.1.0.nupkg`.
**Step 3: Push to the Gitea feed**
Credentials are NOT in the repo. The developer/CI provides them. Push each package:
```bash
dotnet nuget push "artifacts/ZB.MOM.WW.Health*.0.1.0.nupkg" \
--source "https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" \
--api-key "$GITEA_NUGET_TOKEN"
```
Expected: `Your package was pushed.` for each (or `409 Conflict` = already present = fine).
**Fallback (if Gitea is unreachable):** STOP and surface it. Do not silently switch mechanisms —
the fallback (local folder feed) changes only each repo's `nuget.config` source line, but that is a
plan amendment the user should approve.
**Step 4: Commit (none in scadaproj for this task):** no source changed; proceed to Phase 1.

---
## Phase 1 — MxAccessGateway (core package only)
Repo: `~/Desktop/MxAccessGateway`. Branch: `feat/adopt-zb-health`. This repo has **no CPM and no
`nuget.config`** today. Readiness probe = a custom `AuthStoreHealthCheck` over the SQLite auth store
(the gateway authenticates every gRPC call against it).
### Task 1: Reference wiring — create `nuget.config`, add the package reference
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 4, Task 7 (different repos)
**Files:**
- Create: `~/Desktop/MxAccessGateway/nuget.config`
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj` (ItemGroup, after line 13)

**Step 1: Create `nuget.config`** (this repo's first; nuget.org for everything, Gitea for Health)
```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
    <add key="dohertj2-gitea" value="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" />
  </packageSources>
  <!-- nuget.org serves everything; the Gitea feed serves only the ZB.MOM.WW.* shared libs.
       Credentials are NOT committed: provide them per-developer via `dotnet nuget add source`
       (username + access token) or NuGet credential env vars in CI. -->
  <packageSourceMapping>
    <packageSource key="nuget.org">
      <package pattern="*" />
    </packageSource>
    <packageSource key="dohertj2-gitea">
      <package pattern="ZB.MOM.WW.Health" />
      <package pattern="ZB.MOM.WW.Health.*" />
    </packageSource>
  </packageSourceMapping>
</configuration>
```
**Step 2: Add the package reference** to the Server `.csproj`. Insert into the first `<ItemGroup>`
(the one ending at the current line 14):
```xml
<PackageReference Include="ZB.MOM.WW.Health" Version="0.1.0" />
```
(Direct versioned reference — this repo has no CPM. Do not introduce CPM.)
**Step 3: Restore to verify the feed resolves**
```bash
cd ~/Desktop/MxAccessGateway
dotnet restore src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj
```
Expected: restore succeeds and pulls `ZB.MOM.WW.Health 0.1.0` from `dohertj2-gitea`. If it 401s,
the developer must add the Gitea source credentials (`dotnet nuget add source … -u … -p … --store-password-in-clear-text`).
**Step 4: Commit**
```bash
cd ~/Desktop/MxAccessGateway && git checkout -b feat/adopt-zb-health
git add nuget.config src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj
git commit -m "build: reference ZB.MOM.WW.Health from the Gitea feed"
```
### Task 2: Write the custom `AuthStoreHealthCheck` (TDD)
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 4, Task 7
**Files:**
- Create: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/Diagnostics/AuthStoreHealthCheck.cs`
- Test: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/AuthStoreHealthCheckTests.cs`

**Step 1: Write the failing tests**
```csharp
using Microsoft.Data.Sqlite;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.MxGateway.Server.Configuration;
using ZB.MOM.WW.MxGateway.Server.Diagnostics;
using ZB.MOM.WW.MxGateway.Server.Security.Authentication;

namespace ZB.MOM.WW.MxGateway.Tests.Diagnostics;

public sealed class AuthStoreHealthCheckTests
{
    private static AuthSqliteConnectionFactory FactoryFor(string sqlitePath)
    {
        var options = new GatewayOptions();
        options.Authentication.SqlitePath = sqlitePath;
        return new AuthSqliteConnectionFactory(Options.Create(options));
    }

    [Fact]
    public async Task Healthy_WhenStoreReachable()
    {
        var path = Path.Combine(Path.GetTempPath(), $"authcheck-{Guid.NewGuid():N}.db");
        try
        {
            var check = new AuthStoreHealthCheck(FactoryFor(path));
            var result = await check.CheckHealthAsync(new HealthCheckContext());
            Assert.Equal(HealthStatus.Healthy, result.Status);
        }
        finally
        {
            // Pooled SQLite connections keep the file open after dispose (undeletable on Windows).
            SqliteConnection.ClearAllPools();
            if (File.Exists(path)) File.Delete(path);
        }
    }

    [Fact]
    public async Task Unhealthy_WhenPathUnusable()
    {
        // A path whose parent is a regular file (used as a directory) forces the open to fail.
        var bogus = Path.Combine(Path.GetTempPath(), $"authcheck-{Guid.NewGuid():N}");
        await File.WriteAllTextAsync(bogus, "x");
        try
        {
            var check = new AuthStoreHealthCheck(FactoryFor(Path.Combine(bogus, "store.db")));
            var result = await check.CheckHealthAsync(new HealthCheckContext());
            Assert.Equal(HealthStatus.Unhealthy, result.Status);
        }
        finally { if (File.Exists(bogus)) File.Delete(bogus); }
    }
}
```
**Step 2: Run, expect failure** (type does not exist)
```bash
cd ~/Desktop/MxAccessGateway
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~AuthStoreHealthCheckTests"
```
Expected: COMPILE ERROR / FAIL — `AuthStoreHealthCheck` not found.
**Step 3: Implement the check**
```csharp
using Microsoft.Data.Sqlite;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.MxGateway.Server.Security.Authentication;

namespace ZB.MOM.WW.MxGateway.Server.Diagnostics;

/// <summary>
/// Readiness probe: verifies the SQLite authentication store is reachable. The gateway
/// authenticates every gRPC call against this store, so its reachability gates readiness.
/// </summary>
public sealed class AuthStoreHealthCheck : IHealthCheck
{
    private readonly AuthSqliteConnectionFactory _connectionFactory;

    public AuthStoreHealthCheck(AuthSqliteConnectionFactory connectionFactory) =>
        _connectionFactory = connectionFactory ?? throw new ArgumentNullException(nameof(connectionFactory));

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            await using SqliteConnection connection =
                await _connectionFactory.OpenConnectionAsync(cancellationToken).ConfigureAwait(false);
            await using SqliteCommand command = connection.CreateCommand();
            command.CommandText = "SELECT 1;";
            await command.ExecuteScalarAsync(cancellationToken).ConfigureAwait(false);
            return HealthCheckResult.Healthy("Auth store is reachable.");
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            throw;
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Auth store is unreachable.", ex);
        }
    }
}
```
**Step 4: Run, expect pass**
```bash
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter "FullyQualifiedName~AuthStoreHealthCheckTests"
```
Expected: PASS (2 tests). If the `GatewayOptions.Authentication.SqlitePath` accessor differs, adjust
the test helper to match the real options shape (read `Configuration/GatewayOptions.cs` first).
**Step 5: Commit**
```bash
git add src/ZB.MOM.WW.MxGateway.Server/Diagnostics/AuthStoreHealthCheck.cs \
src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/AuthStoreHealthCheckTests.cs
git commit -m "feat: add AuthStoreHealthCheck readiness probe"
```
### Task 3: Rewire `GatewayApplication` to the canonical tiers; fix the route test
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 4, Task 7
**Files:**
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs:63-66` (the `AddHealthChecks()` line) and `:172-178` (the `/health/live` block)
- Modify: `~/Desktop/MxAccessGateway/src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs:14-27`

**Step 1: Replace the bare `AddHealthChecks()` (line 66) with the tagged readiness probe**
```csharp
builder.Services.AddHealthChecks()
    .AddTypeActivatedCheck<AuthStoreHealthCheck>(
        "auth-store",
        failureStatus: null,
        tags: new[] { ZbHealthTags.Ready });
```
Add `using ZB.MOM.WW.Health;` (for `ZbHealthTags` and `MapZbHealth`). The namespace of
`AuthStoreHealthCheck`, `ZB.MOM.WW.MxGateway.Server.Diagnostics`, is already imported at line 9.
Add that `using` only if it is missing.
**Step 2: Delete the `/health/live` block (lines 172-178) and map the canonical tiers**
Remove:
```csharp
endpoints.MapGet(
        "/health/live",
        () => Results.Ok(new GatewayHealthReply(
            Status: "Healthy",
            DefaultBackend: GatewayContractInfo.DefaultBackendName,
            WorkerProtocolVersion: GatewayContractInfo.WorkerProtocolVersion)))
    .WithName("LiveHealth");
```
Replace with:
```csharp
endpoints.MapZbHealth();
```
(`/health/ready` runs `auth-store`; `/health/active` runs no checks → 200; `/healthz` is bare
liveness. The `GatewayHealthReply` type may now be unused — if so, the C# compiler won't flag it;
leave it unless a "remove dead code" reviewer asks, to keep this change tight.)
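The "`/health/active` runs no checks → 200" behaviour follows from stock ASP.NET Core health checks. A tier whose predicate selects no registrations produces an empty report, and an empty report is `Healthy`.

This sketch checks that claim. Plain `MapHealthChecks` calls with tag predicates stand in for `MapZbHealth`; that substitution is an assumption, consistent with the tag routing the library uses. The app runs on an in-memory `TestServer` from the `Microsoft.AspNetCore.TestHost` package. The failing `auth-store` check is a simulated outage. With it, `/health/ready` answers 503 while `/health/active` still answers 200.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.TestHost;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Stand-in for the tier mapping (NOT MapZbHealth's source): each tier runs only checks with its tag.
public static class EmptyTierDemo
{
    public static async Task<(int Ready, int Active)> RunAsync()
    {
        var builder = WebApplication.CreateBuilder();
        builder.WebHost.UseTestServer(); // in-memory server, no real port
        builder.Services.AddHealthChecks()
            .AddCheck("auth-store", () => HealthCheckResult.Unhealthy("simulated outage"), tags: new[] { "ready" });

        await using WebApplication app = builder.Build();
        app.MapHealthChecks("/health/ready", new HealthCheckOptions { Predicate = r => r.Tags.Contains("ready") });
        // Nothing is tagged "active", so this tier evaluates an empty (Healthy) report.
        app.MapHealthChecks("/health/active", new HealthCheckOptions { Predicate = r => r.Tags.Contains("active") });
        await app.StartAsync();

        using var client = app.GetTestClient();
        var ready = await client.GetAsync("/health/ready");
        var active = await client.GetAsync("/health/active");
        await app.StopAsync();
        return ((int)ready.StatusCode, (int)active.StatusCode);
    }
}
```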
**Step 3: Update the route test** (`GatewayApplicationTests.cs:14-27`) to assert the three tiers
instead of `/health/live`:
```csharp
/// <summary>Verifies that Build maps the canonical three health tiers.</summary>
[Fact]
public async Task Build_MapsCanonicalHealthEndpoints()
{
    await using WebApplication app = GatewayApplication.Build([]);

    var paths = ((IEndpointRouteBuilder)app).DataSources
        .SelectMany(dataSource => dataSource.Endpoints)
        .OfType<RouteEndpoint>()
        .Select(e => e.RoutePattern.RawText)
        .ToHashSet();

    Assert.Contains("/health/ready", paths);
    Assert.Contains("/health/active", paths);
    Assert.Contains("/healthz", paths);
    Assert.DoesNotContain("/health/live", paths);
}
```
**Step 4: Build + test the whole gateway**
```bash
cd ~/Desktop/MxAccessGateway
dotnet build src/MxGateway.sln
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj
```
Expected: build clean; all tests pass (the old `Build_MapsLiveHealthEndpoint` is replaced). If any
other test references `/health/live` or `LiveHealth`, update it the same way.
**Step 5: Commit**
```bash
git add src/ZB.MOM.WW.MxGateway.Server/GatewayApplication.cs \
src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayApplicationTests.cs
git commit -m "feat: map canonical ZB health tiers; replace bypassing /health/live"
```
---
## Phase 2 — OtOpcUa (all three packages)
Repo: `~/Desktop/OtOpcUa`. Branch: `feat/adopt-zb-health`. CPM present; `NuGet.config` has nuget.org
+ `local-mxgw` folder feed, NO source mapping. `ActorSystem` IS in DI (the bespoke
`AkkaClusterHealthCheck` injects it directly). This is the cleanest of the three.
### Task 4: Reference wiring — add Gitea source + mapping + CPM versions + package refs
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 1, Task 7
**Files:**
- Modify: `~/Desktop/OtOpcUa/NuGet.config`
- Modify: `~/Desktop/OtOpcUa/Directory.Packages.props` (near lines 99-100)
- Modify: `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj` (ItemGroup lines 16-30)

**Step 1: Add the Gitea source + source mapping** to `NuGet.config`. Once a `packageSourceMapping`
section exists, NuGet restores each package only from a source whose patterns match it. The existing
feeds must therefore be mapped too:
```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
    <add key="local-mxgw" value="./nuget-packages" />
    <add key="dohertj2-gitea" value="https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json" />
  </packageSources>
  <packageSourceMapping>
    <packageSource key="nuget.org">
      <package pattern="*" />
    </packageSource>
    <packageSource key="local-mxgw">
      <package pattern="ZB.MOM.WW.MxGateway.*" />
    </packageSource>
    <packageSource key="dohertj2-gitea">
      <package pattern="ZB.MOM.WW.Health" />
      <package pattern="ZB.MOM.WW.Health.*" />
    </packageSource>
  </packageSourceMapping>
</configuration>
```
**Step 2: Add CPM versions** to `Directory.Packages.props` next to the existing `ZB.MOM.WW.*` lines:
```xml
<PackageVersion Include="ZB.MOM.WW.Health" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Health.Akka" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Health.EntityFrameworkCore" Version="0.1.0" />
```
**Step 3: Add package references** (no version — CPM) to the Host `.csproj` ItemGroup:
```xml
<PackageReference Include="ZB.MOM.WW.Health" />
<PackageReference Include="ZB.MOM.WW.Health.Akka" />
<PackageReference Include="ZB.MOM.WW.Health.EntityFrameworkCore" />
```
**Step 4: Restore**
```bash
cd ~/Desktop/OtOpcUa && git checkout -b feat/adopt-zb-health
dotnet restore ZB.MOM.WW.OtOpcUa.slnx
```
Expected: restore succeeds; the three Health packages come from `dohertj2-gitea`, MxGateway stays on
`local-mxgw`.
**Step 5: Commit**
```bash
git add NuGet.config Directory.Packages.props src/Server/ZB.MOM.WW.OtOpcUa.Host/ZB.MOM.WW.OtOpcUa.Host.csproj
git commit -m "build: reference ZB.MOM.WW.Health packages from the Gitea feed"
```
### Task 5: Swap the three checks to shared probes; map tiers via `MapZbHealth`
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 1, Task 7
**Files:**
- Rewrite: `~/Desktop/OtOpcUa/src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/HealthEndpoints.cs`
- Delete: `Health/DatabaseHealthCheck.cs`, `Health/AkkaClusterHealthCheck.cs`, `Health/AdminRoleLeaderHealthCheck.cs`
- Verify call sites unchanged: `Program.cs:137` (`AddOtOpcUaHealth`), `Program.cs:159` (`MapOtOpcUaHealth`)

**Step 1: Rewrite `HealthEndpoints.cs`** to register the shared checks (preserving names + tags) and
map via `MapZbHealth()`:
```csharp
using Microsoft.AspNetCore.Routing;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.Health;
using ZB.MOM.WW.Health.Akka;
using ZB.MOM.WW.Health.EntityFrameworkCore;
using ZB.MOM.WW.OtOpcUa.Configuration;

namespace ZB.MOM.WW.OtOpcUa.Host.Health;

public static class HealthEndpoints
{
    /// <summary>
    /// Registers the shared ZB.MOM.WW health probes. Tier semantics preserved from the bespoke
    /// implementation: configdb + akka on ready+active; admin-leader on active only.
    /// </summary>
    public static IServiceCollection AddOtOpcUaHealth(this IServiceCollection services)
    {
        services.AddHealthChecks()
            .AddTypeActivatedCheck<DatabaseHealthCheck<OtOpcUaConfigDbContext>>(
                "configdb",
                failureStatus: null,
                tags: new[] { ZbHealthTags.Ready, ZbHealthTags.Active },
                args: new DatabaseHealthCheckOptions<OtOpcUaConfigDbContext>
                {
                    // Preserve OtOpcUa's stricter schema-touching probe.
                    ProbeQuery = static (db, ct) => db.Deployments.AsNoTracking().Take(1).ToListAsync(ct),
                })
            .AddTypeActivatedCheck<AkkaClusterHealthCheck>(
                "akka",
                failureStatus: null,
                tags: new[] { ZbHealthTags.Ready, ZbHealthTags.Active },
                args: AkkaClusterStatusPolicy.OtOpcUaCompat)
            .AddTypeActivatedCheck<ActiveNodeHealthCheck>(
                "admin-leader",
                failureStatus: null,
                tags: new[] { ZbHealthTags.Active },
                args: "admin");
        return services;
    }

    /// <summary>Maps the canonical three-tier health endpoints.</summary>
    public static IEndpointRouteBuilder MapOtOpcUaHealth(this IEndpointRouteBuilder app)
    {
        app.MapZbHealth(); // /health/ready, /health/active, /healthz; all AllowAnonymous
        return app;
    }
}
```
Note: `args:` is the `params object[]` — pass a single options object / policy / string. If the
compiler binds the single-array overload oddly, wrap as `args: new object[] { … }`.
**Step 2: Delete the three bespoke check files**
```bash
cd ~/Desktop/OtOpcUa
git rm src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/DatabaseHealthCheck.cs \
src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/AkkaClusterHealthCheck.cs \
src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/AdminRoleLeaderHealthCheck.cs
```
(`IClusterRoleInfo` may now be unused by Health; leave its definition — it may be used elsewhere.)
**Step 3: Build**
```bash
dotnet build ZB.MOM.WW.OtOpcUa.slnx
```
Expected: clean. Fix any now-dangling `using ...Host.Health` references to the deleted types.
**Step 4: Run health-related tests**
```bash
dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~Health"
```
Expected: pass. **Behaviour-parity checks the executor must confirm** (add/keep tests if missing):
- akka tier: self `Up` → Healthy; self not Up → Degraded (the `OtOpcUaCompat` preset reproduces the
self-Up scan).
- admin-leader: node without `admin` role → Healthy; admin member non-leader → Degraded; admin
leader → Healthy. The shared check reads `Cluster.Get(system).SelfMember` + `RoleLeader("admin")`,
whereas the old check used `IClusterRoleInfo`. Verify equivalence on a formed test cluster or via the
library's own `ActiveNodeDecision` table, which the library's tests already cover.
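The parity rules above fit in a small table. A hypothetical helper like the one below (not part of `ZB.MOM.WW.Health`) can drive an xUnit `[Theory]`, so each case's expected classification is written down once.

The "ActorSystem not yet available → Degraded" row comes from the library API notes. The role-less check (ScadaBridge's `active-node`) is modelled by passing `hasRole: true`. Treating a role-less non-leader as Degraded is an assumption to confirm against `ActiveNodeDecision`.

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical parity table for the active tier; NOT the library's ActiveNodeDecision API.
public static class ActiveNodeParity
{
    /// <param name="actorSystemAvailable">False while ActorSystem cannot yet be resolved.</param>
    /// <param name="hasRole">Whether this node carries the filtered role (pass true for the role-less check).</param>
    /// <param name="isLeader">Whether this node is the (role) leader.</param>
    public static HealthStatus Expected(bool actorSystemAvailable, bool hasRole, bool isLeader)
    {
        if (!actorSystemAvailable) return HealthStatus.Degraded; // still warming up
        if (!hasRole) return HealthStatus.Healthy;               // node outside the role is not gated
        return isLeader ? HealthStatus.Healthy : HealthStatus.Degraded; // non-leader: assumed Degraded
    }
}
```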
**Step 5: Commit**
```bash
git add src/Server/ZB.MOM.WW.OtOpcUa.Host/Health/HealthEndpoints.cs
git commit -m "feat: adopt shared ZB.MOM.WW.Health probes (preserve tiers + OtOpcUaCompat policy)"
```
---
## Phase 3 — ScadaBridge (all three packages)
Repo: `~/Desktop/ScadaBridge`. Branch: `feat/adopt-zb-health`. CPM + Gitea feed already wired (just
extend mapping). **`ActorSystem` is NOT in DI** (owned by `AkkaHostedService`) — add a transient DI
bridge so the shared checks can resolve it. Keep the existing `ActiveNodeGate` (seam unification
deferred). No `IDbContextFactory` switch (shared check self-scopes).
### Task 6: Reference wiring — extend mapping + CPM versions + package refs
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 1, Task 4
**Files:**
- Modify: `~/Desktop/ScadaBridge/nuget.config` (source-mapping block, lines 13-20)
- Modify: `~/Desktop/ScadaBridge/Directory.Packages.props` (near lines 76-77)
- Modify: `~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj` (ItemGroup lines 14-31)

**Step 1: Extend the Gitea source mapping**: add the two Health patterns under `dohertj2-gitea`, next
to the existing MxGateway pattern:
```xml
<packageSource key="dohertj2-gitea">
  <package pattern="ZB.MOM.WW.MxGateway.*" />
  <package pattern="ZB.MOM.WW.Health" />
  <package pattern="ZB.MOM.WW.Health.*" />
</packageSource>
```
**Step 2: Add CPM versions** next to the existing `ZB.MOM.WW.*` lines in `Directory.Packages.props`:
```xml
<PackageVersion Include="ZB.MOM.WW.Health" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Health.Akka" Version="0.1.0" />
<PackageVersion Include="ZB.MOM.WW.Health.EntityFrameworkCore" Version="0.1.0" />
```
**Step 3: Add package references** to the Host `.csproj` ItemGroup (no version — CPM):
```xml
<PackageReference Include="ZB.MOM.WW.Health" />
<PackageReference Include="ZB.MOM.WW.Health.Akka" />
<PackageReference Include="ZB.MOM.WW.Health.EntityFrameworkCore" />
```
**Step 4: Restore + commit**
```bash
cd ~/Desktop/ScadaBridge && git checkout -b feat/adopt-zb-health
dotnet restore ZB.MOM.WW.ScadaBridge.slnx
git add nuget.config Directory.Packages.props src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj
git commit -m "build: reference ZB.MOM.WW.Health packages from the Gitea feed"
```
### Task 7: Add the transient `ActorSystem` DI bridge (TDD)
**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 1, Task 4
**Files:**
- Modify: `~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Program.cs` (near the Akka registration)
- Test: `~/Desktop/ScadaBridge/tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ActorSystemBridgeTests.cs`

**Why transient:** the shared checks call `sp.GetService<ActorSystem>()` **per probe** and treat
`null` as "not ready yet" (Degraded). A transient factory re-reads `AkkaHostedService.ActorSystem`
on each resolve, returning `null` before startup and the live system after. A singleton is meant to
be resolved once, so it could pin the startup `null`.
**Step 1: Write the failing test**
```csharp
using Akka.Actor;
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.ScadaBridge.Host.Actors;

namespace ZB.MOM.WW.ScadaBridge.Host.Tests;

public sealed class ActorSystemBridgeTests
{
    [Fact]
    public void ActorSystem_ResolvesNull_BeforeHostedServiceStarts()
    {
        var services = new ServiceCollection();
        services.AddSingleton<AkkaHostedService>(); // ActorSystem property is null pre-start
        services.AddTransient(sp => sp.GetRequiredService<AkkaHostedService>().ActorSystem!);
        using var provider = services.BuildServiceProvider();

        Assert.Null(provider.GetService<ActorSystem>()); // transient re-reads → null, not cached
    }
}
```
If `AkkaHostedService` cannot be constructed without dependencies, register a minimal stub instead;
the assertion that matters is "transient bridge yields null before start." Read
`Actors/AkkaHostedService.cs` constructor first and adapt.
**Step 2: Run, expect failure**
```bash
cd ~/Desktop/ScadaBridge
dotnet test tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ZB.MOM.WW.ScadaBridge.Host.Tests.csproj --filter "FullyQualifiedName~ActorSystemBridgeTests"
```
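Expected: FAIL, but only if the test exercises the registration you add in Step 3. As written, the
test registers the bridge itself, so it can pass before Step 3. With no registration at all,
`GetService` also returns `null`. For a meaningful red-green cycle, have the test call the same
registration code `Program.cs` uses, for example by moving the bridge into a small extension method
that both call.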
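**Step 3: Register the bridge in `Program.cs`** (right after `AkkaHostedService` is registered). The
factory must return `null` before startup rather than throw. The shared checks resolve with
`GetService<ActorSystem>()`, which returns whatever the factory returns. A `null` therefore reads as
"not ready yet" (Degraded), while an exception thrown by the factory propagates into the check.
```csharp
// The shared ZB.MOM.WW.Health Akka checks resolve ActorSystem from DI. ScadaBridge owns the
// ActorSystem inside AkkaHostedService (not a DI singleton), so bridge it as TRANSIENT: each
// resolve re-reads the current value, null while warming up (checks → Degraded), live afterwards.
// Never throw from this factory: GetService propagates factory exceptions.
builder.Services.AddTransient<ActorSystem>(sp =>
    sp.GetRequiredService<AkkaHostedService>().ActorSystem!); // null before start; '!' only silences the nullable warning
```
`GetService<ActorSystem>()` then returns `null` pre-start (Degraded) and the live system post-start.
Make the Step-1 test assert exactly this.

**Caution (verify before merging):** the bridge may shut down the live `ActorSystem` after the first
post-startup probe. The chain:

- Microsoft.Extensions.DependencyInjection disposes `IDisposable` objects returned by a transient
factory when the resolving scope is disposed, even if the factory returned an existing instance.
- `ActorSystem` is `IDisposable`.
- ASP.NET Core creates a DI scope for health-check runs.

So if the shared checks resolve `ActorSystem` through that scoped provider, the scope's disposal at
the end of the probe disposes the system. To check, add a test that resolves the bridge from a scope
after startup, disposes the scope, and asserts the system is still running. If it is not, STOP and
surface it: the bridge needs a different shape, which is a plan amendment the user should approve.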
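Below is a dependency-light sketch of the bridge's DI behaviour. It uses hypothetical stand-ins, `FakeActorSystem` and `FakeHostedService`, instead of the real Akka types, so it runs without a cluster. It registers the bridge in the same transient, non-throwing shape and shows three things:

- the bridge resolves to `null` before "startup";
- it returns the live instance afterwards;
- disposing the scope that resolved it also disposes the object the factory returned, which marks the stand-in as disposed.

```csharp
#nullable enable
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-in for Akka's ActorSystem: records whether it has been disposed.
public sealed class FakeActorSystem : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

// Hypothetical stand-in for AkkaHostedService: Current stays null until "startup" assigns it.
public sealed class FakeHostedService
{
    public FakeActorSystem? Current { get; set; }
}

public static class TransientBridgeDemo
{
    public static ServiceProvider BuildProvider()
    {
        var services = new ServiceCollection();
        services.AddSingleton<FakeHostedService>();
        // Same shape as the ScadaBridge bridge: transient, re-reads the property, never throws.
        services.AddTransient<FakeActorSystem>(sp => sp.GetRequiredService<FakeHostedService>().Current!);
        return services.BuildServiceProvider();
    }
}
```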
**Step 4: Run, expect pass**
```bash
dotnet test tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ZB.MOM.WW.ScadaBridge.Host.Tests.csproj --filter "FullyQualifiedName~ActorSystemBridgeTests"
```
Expected: PASS.
**Step 5: Commit**
```bash
git add src/ZB.MOM.WW.ScadaBridge.Host/Program.cs \
tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ActorSystemBridgeTests.cs
git commit -m "feat: bridge ActorSystem into DI (transient) for shared health checks"
```
### Task 8: Swap checks to shared probes; add `/healthz`; canonical writer
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 6 + Task 7)
**Files:**
- Modify: `~/Desktop/ScadaBridge/src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:114-117` (registration) and `:222-233` (endpoint mapping)
- Delete: `Health/DatabaseHealthCheck.cs`, `Health/AkkaClusterHealthCheck.cs`, `Health/ActiveNodeHealthCheck.cs`
- Keep: `Health/ActiveNodeGate.cs` (unchanged — seam unification deferred)
- Adjust: `tests/ZB.MOM.WW.ScadaBridge.Host.Tests/HealthCheckTests.cs`

**Step 1: Replace the registration block** (Program.cs lines 114-117):
```csharp
builder.Services.AddHealthChecks()
    .AddTypeActivatedCheck<DatabaseHealthCheck<ScadaBridgeDbContext>>(
        "database",
        failureStatus: null,
        tags: new[] { ZbHealthTags.Ready }) // default CanConnectAsync probe; self-scopes
    .AddTypeActivatedCheck<AkkaClusterHealthCheck>(
        "akka-cluster",
        failureStatus: null,
        tags: new[] { ZbHealthTags.Ready },
        args: AkkaClusterStatusPolicy.Default) // Up/Joining=Healthy, Leaving/Exiting=Degraded
    .AddTypeActivatedCheck<ActiveNodeHealthCheck>(
        "active-node",
        failureStatus: null,
        tags: new[] { ZbHealthTags.Active }); // role-less leader check
```
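Add usings: `ZB.MOM.WW.Health`, `ZB.MOM.WW.Health.Akka`, `ZB.MOM.WW.Health.EntityFrameworkCore`,
`ZB.MOM.WW.ScadaBridge.ConfigurationDatabase` (for `ScadaBridgeDbContext`). The tag mapping preserves
the prior split: `database` + `akka-cluster` on ready; `active-node` on active.

The `database` check passes no `args`. `ActivatorUtilities` falls back to a parameter's default value
only if the constructor declares one. If activation fails on the
`DatabaseHealthCheckOptions<ScadaBridgeDbContext>` parameter, pass
`args: new DatabaseHealthCheckOptions<ScadaBridgeDbContext>()`, which keeps the default
`CanConnectAsync` probe.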
**Step 2: Replace the endpoint mapping** (Program.cs lines 222-233 — the two `MapHealthChecks`
blocks using `UIResponseWriter`) with a single call:
```csharp
app.MapZbHealth(); // /health/ready (database+akka-cluster), /health/active (active-node), /healthz
```
This adds the previously-missing `/healthz` and switches both tiers to the canonical
`ZbHealthWriter`. Remove the now-unused `using` for `HealthChecks.UI.Client` /
`UIResponseWriter` if it becomes dead.
**Step 3: Delete the three bespoke checks**
```bash
cd ~/Desktop/ScadaBridge
git rm src/ZB.MOM.WW.ScadaBridge.Host/Health/DatabaseHealthCheck.cs \
src/ZB.MOM.WW.ScadaBridge.Host/Health/AkkaClusterHealthCheck.cs \
src/ZB.MOM.WW.ScadaBridge.Host/Health/ActiveNodeHealthCheck.cs
```
**Step 4: Build + test**
```bash
dotnet build ZB.MOM.WW.ScadaBridge.slnx
dotnet test tests/ZB.MOM.WW.ScadaBridge.Host.Tests/ZB.MOM.WW.ScadaBridge.Host.Tests.csproj
```
Expected: build clean; tests pass. `HealthCheckTests.cs` likely references the deleted concrete
types or the old endpoint shape — retarget it to assert: `/health/ready`, `/health/active`, AND the
new `/healthz` are mapped; `database`+`akka-cluster` are tagged `ready`; `active-node` is tagged
`active`. The `Default` policy preserves ScadaBridge's `Joining`=Healthy classification — keep any
test asserting that.
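One way to assert the tag split without making HTTP calls is to read the registrations that `AddHealthChecks()` records in `HealthCheckServiceOptions`. Each `HealthCheckRegistration` carries its `Name` and `Tags`.

The sketch below assumes the test can build the ScadaBridge service collection, or the slice of it that registers health checks. The inspector helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Options;

// Hypothetical test helper: maps each registered health check's name to its tags.
// AddHealthChecks()/AddCheck/AddTypeActivatedCheck store registrations in HealthCheckServiceOptions.
public static class HealthRegistrationInspector
{
    public static IReadOnlyDictionary<string, ISet<string>> TagsByName(IServiceProvider services) =>
        services.GetRequiredService<IOptions<HealthCheckServiceOptions>>().Value.Registrations
            .ToDictionary(registration => registration.Name, registration => registration.Tags);
}
```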
**Step 5: Commit**
```bash
git add src/ZB.MOM.WW.ScadaBridge.Host/Program.cs \
tests/ZB.MOM.WW.ScadaBridge.Host.Tests/HealthCheckTests.cs
git commit -m "feat: adopt shared ZB.MOM.WW.Health probes; add /healthz; canonical writer"
```
---
## Phase 4 — Bookkeeping (scadaproj)
### Task 9: Update the Health GAPS backlog to reflect adoption + deferrals
**Classification:** trivial
**Estimated implement time:** ~3 min
**Parallelizable with:** none (do last)
**Files:**
- Modify: `~/Desktop/scadaproj/components/health/GAPS.md` (adoption backlog table + a deferrals note)

**Step 1:** In `components/health/GAPS.md`, annotate the adoption-backlog rows as done for what
shipped (MxGateway tiers + `AuthStoreHealthCheck`; OtOpcUa shared probes; ScadaBridge shared probes
+ `/healthz` + canonical writer + ActorSystem bridge), and add a short "Deferred (verified
ill-fitting on adoption)" subsection capturing: downstream gRPC probes (no host-level channel),
ScadaBridge `IDbContextFactory` switch (shared check self-scopes), ScadaBridge `IActiveNodeGate`
seam unification (different InboundAPI interface), and MxGateway worker probe (named-pipe transport).
**Step 2: Commit (scadaproj)**
```bash
cd ~/Desktop/scadaproj
git add components/health/GAPS.md
git commit -m "docs(health): mark ZB.MOM.WW.Health adoption done; record verified deferrals"
```
---
## Execution notes
- **Order:** Task 0 first (gates everything). Then the three repo phases are independent — Tasks
1-3 (MxGateway), 4-5 (OtOpcUa), 6-8 (ScadaBridge) can run in parallel across repos; within a repo
they are sequential. Task 9 (scadaproj) last.
- **Per-repo green gate:** a phase is "done" only when that sister repo's full `dotnet build` +
`dotnet test` are green — not just the changed area.
- **Behaviour preservation is the acceptance bar:** the presets (`OtOpcUaCompat` / `Default`) and
the role filter (`"admin"` / role-less) exist to keep each app's Healthy/Degraded/Unhealthy
classifications identical. Any classification change is a defect, not an improvement.
- **No secrets in any diff** — the Gitea token / feed credentials are provided out-of-band; verify
no `nuget.config` or csproj change embeds them.