Proposed shared library: ZB.MOM.WW.Health
A contract on paper — the public surface to extract so the three projects stop re-implementing
health-check tiers, probe logic, and the active-node gating seam. Realizes
../spec/SPEC.md. Not yet created. Reference implementations already
exist: OtOpcUa Health/ (three-tier + probes), ScadaBridge Health/ (inline probes +
ActiveNodeGate).
Packages (.NET 10)
ZB.MOM.WW.Health # core: tier convention, response writer, IActiveNodeGate, GrpcDependencyHealthCheck
ZB.MOM.WW.Health.Akka # AkkaClusterHealthCheck, ActiveNodeHealthCheck, AkkaActiveNodeGate
ZB.MOM.WW.Health.EntityFrameworkCore # DatabaseHealthCheck<TContext>
All three are .NET 10. The split keeps Akka.Cluster and EF Core out of MxGateway's dependency graph: MxGateway pulls only the core package. Packages are published to the Gitea NuGet feed under SemVer and versioned in lockstep to start. The x86 net48 mxaccessgw worker has no HTTP surface, so net48 multi-targeting is not required.
Packaging & distribution
Three NuGet packages, one DLL each, on the Gitea NuGet feed. These are libraries linked into each app — there is no central health service. Consumers reference only what they need:
| Package (→ DLL) | Transitive deps | MxGateway | OtOpcUa | ScadaBridge |
|---|---|---|---|---|
| …Health | Microsoft.Extensions.Diagnostics.HealthChecks, ASP.NET Core abstractions | ✅ | ✅ | ✅ |
| …Health.Akka | Akka.Cluster | — | ✅ | ✅ |
| …Health.EntityFrameworkCore | EF Core | — | ✅ | ✅ |
Why MxGateway takes only core: it is not Akka-based and does not use EF Core. The
GrpcDependencyHealthCheck in the core package covers its only probe need (worker channel
reachability), so it avoids the Akka and EF transitive trees entirely.
ZB.MOM.WW.Health
namespace ZB.MOM.WW.Health;

/// Canonical tag constants. Use these when calling AddCheck(..., tags: [ZbHealthTags.Ready]).
public static class ZbHealthTags
{
    public const string Ready = "ready";
    public const string Active = "active";
    public const string Live = "live";
}

/// Options for MapZbHealth(). All paths and the response writer are overridable.
public sealed class ZbHealthEndpointOptions
{
    public string ReadyPath { get; set; } = "/health/ready";
    public string ActivePath { get; set; } = "/health/active";
    public string LivePath { get; set; } = "/healthz";

    /// Defaults to ZbHealthWriter.WriteJsonAsync.
    public Func<HttpContext, HealthReport, Task>? ResponseWriter { get; set; }
}

/// Extension that maps all three health tiers in one call.
public static class ZbHealthEndpointExtensions
{
    /// Maps /health/ready (tag "ready"), /health/active (tag "active"), /healthz (tag "live").
    /// Does NOT call services.AddHealthChecks(); the caller is responsible for probe registration.
    public static IEndpointConventionBuilder MapZbHealth(
        this IEndpointRouteBuilder endpoints,
        ZbHealthEndpointOptions? options = null);

    /// Same mapping as above, with the options supplied through a configuration callback.
    public static IEndpointConventionBuilder MapZbHealth(
        this IEndpointRouteBuilder endpoints,
        Action<ZbHealthEndpointOptions> configure);
}

/// Canonical JSON response writer. Shape: { status, totalDurationMs, entries: { name: { status, description, durationMs } } }.
public static class ZbHealthWriter
{
    public static Task WriteJsonAsync(HttpContext context, HealthReport report);
}

/// Single-property seam: is this node the active/leader node?
/// Attach to route groups via RequireActiveNode(). Implement with AkkaActiveNodeGate (Health.Akka)
/// or a project-specific implementation for non-Akka nodes.
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

/// Route convention that returns 503 on standby nodes. DI-resolves IActiveNodeGate.
public static class ActiveNodeGateExtensions
{
    public static IEndpointConventionBuilder RequireActiveNode(
        this IEndpointConventionBuilder builder,
        int retryAfterSeconds = 5);
}

/// Checks that a downstream gRPC channel is reachable.
public sealed class GrpcDependencyHealthCheck : IHealthCheck
{
    public GrpcDependencyHealthCheck(GrpcChannel channel, GrpcDependencyOptions? options = null);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default);
}

/// Options for GrpcDependencyHealthCheck.
public sealed class GrpcDependencyOptions
{
    /// Override the default probe (GrpcChannel.ConnectAsync).
    /// Return true = reachable, false = unreachable.
    public Func<GrpcChannel, CancellationToken, Task<bool>>? Probe { get; set; }

    /// Human-readable name of the dependency, used in the HealthCheckResult description.
    public string? DependencyName { get; set; }

    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(5);
}
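
The tier mapping and the JSON shape described above can be sketched on top of the stock ASP.NET Core health-check middleware. This is a minimal sketch, not the library's implementation: the `...Sketch` type and method names are hypothetical, the empty-prefix route group is one assumed way to return a single `IEndpointConventionBuilder` for three endpoints, and the paths are passed as plain parameters instead of `ZbHealthEndpointOptions` to keep the example short.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical name: a stand-in for ZbHealthWriter, not the shipped type.
public static class ZbHealthWriterSketch
{
    // Emits { status, totalDurationMs, entries: { name: { status, description, durationMs } } }.
    public static Task WriteJsonAsync(HttpContext context, HealthReport report)
    {
        var payload = new
        {
            status = report.Status.ToString(),
            totalDurationMs = report.TotalDuration.TotalMilliseconds,
            entries = report.Entries.ToDictionary(
                e => e.Key,
                e => new
                {
                    status = e.Value.Status.ToString(),
                    description = e.Value.Description,
                    durationMs = e.Value.Duration.TotalMilliseconds,
                }),
        };
        return context.Response.WriteAsJsonAsync(payload);
    }
}

// Hypothetical name: a stand-in for ZbHealthEndpointExtensions.MapZbHealth.
public static class ZbHealthEndpointSketch
{
    public static IEndpointConventionBuilder MapZbHealthSketch(
        this IEndpointRouteBuilder endpoints,
        string readyPath = "/health/ready",
        string activePath = "/health/active",
        string livePath = "/healthz")
    {
        // Assumption: an empty-prefix group yields one convention builder for all three tiers.
        RouteGroupBuilder group = endpoints.MapGroup(string.Empty);
        group.MapHealthChecks(readyPath, ForTag("ready"));
        group.MapHealthChecks(activePath, ForTag("active"));
        group.MapHealthChecks(livePath, ForTag("live"));
        return group;
    }

    // Each tier runs only the registrations carrying its tag.
    private static HealthCheckOptions ForTag(string tag) => new()
    {
        Predicate = registration => registration.Tags.Contains(tag),
        ResponseWriter = ZbHealthWriterSketch.WriteJsonAsync,
    };
}
```

Because the sketch, like the contract, does not call `AddHealthChecks()`, a consumer would still register its probes with the matching tags before calling the mapping method.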
ZB.MOM.WW.Health.Akka
namespace ZB.MOM.WW.Health.Akka;

/// Checks the local node's Akka cluster membership status.
/// Register with tag ZbHealthTags.Ready.
/// <remarks>
/// The ActorSystem is resolved lazily from the service provider. If the ActorSystem is not yet
/// available (e.g. during startup before Akka is initialised), the check returns Degraded rather
/// than throwing. This makes the check safe to register before Akka is fully up.
/// </remarks>
public sealed class AkkaClusterHealthCheck : IHealthCheck
{
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public AkkaClusterHealthCheck(
        IServiceProvider serviceProvider,
        AkkaClusterStatusPolicy policy);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default);
}

/// Maps Akka MemberStatus values to HealthStatus.
/// Two named presets cover the two existing implementations; construct a custom instance for
/// project-specific overrides.
public sealed class AkkaClusterStatusPolicy
{
    public AkkaClusterStatusPolicy(Func<MemberStatus, HealthStatus> evaluate);

    /// ScadaBridge origin: Up/Joining→Healthy, Leaving/Exiting→Degraded, else Unhealthy.
    /// Convergence target for all projects.
    public static AkkaClusterStatusPolicy Default { get; }

    /// OtOpcUa origin: self-Up-among-reachable-members→Healthy, else Degraded.
    /// Provided for backward compatibility during OtOpcUa migration.
    public static AkkaClusterStatusPolicy OtOpcUaCompat { get; }
}

/// Checks whether this node is the designated leader / active node.
/// Optional role parameter scopes the check to nodes carrying that role.
/// Register with tag ZbHealthTags.Active.
/// <remarks>
/// The ActorSystem is resolved lazily from the service provider. If the ActorSystem is not yet
/// available (e.g. during startup before Akka is initialised), the check returns Degraded rather
/// than throwing. This makes the check startup-safe.
/// </remarks>
public sealed class ActiveNodeHealthCheck : IHealthCheck
{
    /// Role-less constructor: Healthy = node is Up AND cluster leader (ScadaBridge ActiveNode pattern).
    /// Returns Degraded when ActorSystem/cluster is not yet ready.
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public ActiveNodeHealthCheck(IServiceProvider serviceProvider);

    /// Role-filtered constructor: Healthy = (node lacks the role) OR (node carries role AND is role-singleton leader).
    /// Degraded = node carries role but is not the role-singleton leader (OtOpcUa AdminRoleLeader pattern).
    /// Returns Degraded when ActorSystem/cluster is not yet ready.
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public ActiveNodeHealthCheck(IServiceProvider serviceProvider, string role);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default);
}

/// IActiveNodeGate implementation that computes IsActiveNode directly from the ActorSystem
/// (SelfMember Up + cluster leader), null-guarded for startup safety.
/// Register as a singleton. Does NOT resolve ActiveNodeHealthCheck from DI.
public sealed class AkkaActiveNodeGate : IActiveNodeGate
{
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily; if not yet available
    /// IsActiveNode returns false (safe default during startup).
    /// The gate checks SelfMember.Status == Up AND cluster.State.Leader == self.Address directly.
    /// </param>
    public AkkaActiveNodeGate(IServiceProvider serviceProvider);

    public bool IsActiveNode { get; }
}
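
The documented Default mapping and the gate's leader test can be sketched directly against Akka.NET's cluster API. This is an illustration under stated assumptions, not the reference implementation: `ClusterPolicySketch` and `ActiveNodeGateSketch` are hypothetical names, and the sketch assumes the `ActorSystem` is registered in DI as a singleton once Akka has started.

```csharp
using System;
using Akka.Actor;
using Akka.Cluster;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper: a function of this shape is what the
// AkkaClusterStatusPolicy(Func<MemberStatus, HealthStatus>) constructor accepts.
public static class ClusterPolicySketch
{
    // Mirrors the documented Default preset:
    // Up/Joining -> Healthy, Leaving/Exiting -> Degraded, anything else -> Unhealthy.
    public static HealthStatus EvaluateDefault(MemberStatus status) => status switch
    {
        MemberStatus.Up or MemberStatus.Joining => HealthStatus.Healthy,
        MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
        _ => HealthStatus.Unhealthy,
    };
}

// Hypothetical stand-in for AkkaActiveNodeGate, following its doc comment.
public sealed class ActiveNodeGateSketch
{
    private readonly IServiceProvider _serviceProvider;

    public ActiveNodeGateSketch(IServiceProvider serviceProvider) =>
        _serviceProvider = serviceProvider;

    public bool IsActiveNode
    {
        get
        {
            // Lazy lookup: no ActorSystem registered yet reads as "not active".
            var system = _serviceProvider.GetService<ActorSystem>();
            if (system is null)
            {
                return false;
            }

            // Fully qualified because the namespace and the extension class share a name.
            var cluster = Akka.Cluster.Cluster.Get(system);

            // Active = this member is Up AND the current leader address is our own.
            // State.Leader can be null before convergence; Equals(null) is then false.
            return cluster.SelfMember.Status == MemberStatus.Up
                && cluster.SelfAddress.Equals(cluster.State.Leader);
        }
    }
}
```

The sketch reads cluster state on every property access, which is the "read directly on every call" option raised in the open contract questions.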
ZB.MOM.WW.Health.EntityFrameworkCore
namespace ZB.MOM.WW.Health.EntityFrameworkCore;

/// Checks database reachability via an EF Core DbContext.
/// Default probe: context.Database.CanConnectAsync() (ScadaBridge pattern).
/// Supply a custom probe delegate for query-based validation (OtOpcUa "query Deployments" pattern).
/// Register with tag ZbHealthTags.Ready.
public sealed class DatabaseHealthCheck<TContext> : IHealthCheck
    where TContext : DbContext
{
    /// Resolves IDbContextFactory<TContext> when registered, else a scoped TContext; pool-safe.
    public DatabaseHealthCheck(
        IServiceProvider serviceProvider,
        DatabaseHealthCheckOptions<TContext>? options = null);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default);
}

/// Options for DatabaseHealthCheck<TContext>.
public sealed class DatabaseHealthCheckOptions<TContext>
    where TContext : DbContext
{
    /// Override the default CanConnectAsync() probe with a custom query-based probe.
    /// Throw to signal failure; return normally to signal success.
    /// Example: <c>(db, ct) => db.Deployments.AsNoTracking().Take(1).ToListAsync(ct)</c>
    public Func<TContext, CancellationToken, Task>? ProbeQuery { get; set; }

    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(10);
}
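
The "factory when registered, else a scoped context" resolution and the timeout handling can be sketched as a single helper. This is a minimal sketch under assumptions, not the library's code: `DatabaseProbeSketch` and `ProbeAsync` are hypothetical names, and mapping every failure (including a timeout) to Unhealthy is an assumed policy that the contract does not specify.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper showing what DatabaseHealthCheck<TContext>.CheckHealthAsync might do.
public static class DatabaseProbeSketch
{
    public static async Task<HealthCheckResult> ProbeAsync<TContext>(
        IServiceProvider serviceProvider,
        Func<TContext, CancellationToken, Task>? probeQuery,
        TimeSpan timeout,
        CancellationToken cancellationToken)
        where TContext : DbContext
    {
        // Enforce the configured timeout on top of the caller's token.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(timeout);

        try
        {
            // Prefer the factory: it hands out a fresh (or pooled) context per probe.
            var factory = serviceProvider.GetService<IDbContextFactory<TContext>>();
            if (factory is not null)
            {
                await using var db = await factory.CreateDbContextAsync(cts.Token);
                return await RunAsync(db, probeQuery, cts.Token);
            }

            // Fallback: resolve a scoped context inside a dedicated scope so that
            // the context is disposed (or returned to its pool) when the probe ends.
            await using var scope = serviceProvider.CreateAsyncScope();
            var scoped = scope.ServiceProvider.GetRequiredService<TContext>();
            return await RunAsync(scoped, probeQuery, cts.Token);
        }
        catch (Exception ex)
        {
            // Assumed policy: any exception, including a timeout, reports Unhealthy.
            return HealthCheckResult.Unhealthy("Database probe failed.", ex);
        }
    }

    private static async Task<HealthCheckResult> RunAsync<TContext>(
        TContext db,
        Func<TContext, CancellationToken, Task>? probeQuery,
        CancellationToken ct)
        where TContext : DbContext
    {
        if (probeQuery is not null)
        {
            // Custom probe: throwing signals failure, returning normally signals success.
            await probeQuery(db, ct);
            return HealthCheckResult.Healthy();
        }

        return await db.Database.CanConnectAsync(ct)
            ? HealthCheckResult.Healthy()
            : HealthCheckResult.Unhealthy("Database is not reachable.");
    }
}
```

Creating a scope per probe is what keeps the fallback path safe for a singleton-registered health check: the check never holds on to a scoped `TContext` between invocations.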
Consumer matrix summary
| Consumer | Packages | Notes |
|---|---|---|
| MxGateway | ZB.MOM.WW.Health (core only) | GrpcDependencyHealthCheck on the worker channel; all three tiers via MapZbHealth(); IActiveNodeGate not needed (not Akka-based) |
| OtOpcUa | All three | AkkaClusterHealthCheck + OtOpcUaCompat preset → Default on convergence; ActiveNodeHealthCheck(role: "admin"); DatabaseHealthCheck<T> with ProbeQuery delegate |
| ScadaBridge | All three | AkkaClusterHealthCheck + Default policy; ActiveNodeHealthCheck (role-less); DatabaseHealthCheck<T> default probe; AkkaActiveNodeGate replaces inline ActiveNodeGate |
Open contract questions
- IActiveNodeGate for non-Akka nodes: MxGateway does not need active-node gating today. If a future MxGateway cluster requires it, the interface is in the core package and can be implemented without an Akka dependency. Validate whether a stub AlwaysActiveGate (returns true) should ship in core for single-node deployments.
- AkkaActiveNodeGate caching: IsActiveNode is a synchronous property; the underlying cluster-state read is synchronous but the ActorSystem lookup is lazy. Validate whether the gate should cache the computed value on a short TTL (e.g. 5 s) to reduce Akka.Cluster API overhead on high-frequency API routing checks, or whether reading SelfMember / State.Leader directly on every call is acceptable.
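
Both open questions can be made concrete with a few lines of plain C#. The sketch below is only an aid for the discussion: `CachedActiveNodeGate` and its injectable millisecond clock are hypothetical, `AlwaysActiveGate` follows the stub named above, and the interface is re-declared locally so the block compiles on its own.

```csharp
using System;

// Local copy of the proposed interface, so the sketch is self-contained.
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

// Stub for single-node deployments: the only node is always the active one.
public sealed class AlwaysActiveGate : IActiveNodeGate
{
    public bool IsActiveNode => true;
}

// Hypothetical decorator: re-reads the inner gate at most once per TTL window.
public sealed class CachedActiveNodeGate : IActiveNodeGate
{
    private readonly IActiveNodeGate _inner;
    private readonly long _ttlMs;
    private readonly Func<long> _clockMs;
    private readonly object _sync = new();
    private long _expiresAtMs = long.MinValue; // forces a read on first access
    private bool _cached;

    // clockMs is injectable for tests; it defaults to the monotonic tick counter.
    public CachedActiveNodeGate(IActiveNodeGate inner, TimeSpan ttl, Func<long>? clockMs = null)
    {
        _inner = inner;
        _ttlMs = (long)ttl.TotalMilliseconds;
        _clockMs = clockMs ?? (() => Environment.TickCount64);
    }

    public bool IsActiveNode
    {
        get
        {
            lock (_sync)
            {
                long now = _clockMs();
                if (now >= _expiresAtMs)
                {
                    // Window expired: refresh from the inner gate and start a new window.
                    _cached = _inner.IsActiveNode;
                    _expiresAtMs = now + _ttlMs;
                }
                return _cached;
            }
        }
    }
}
```

The trade-off the second question asks about is visible here: with a 5 s TTL, a node that loses leadership can keep reporting itself as active for up to 5 s, so the cache buys fewer cluster-state reads at the cost of slower failover detection on gated routes.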
See ../GAPS.md for the adoption order and effort/risk.