scadaproj/components/health/shared-contract/ZB.MOM.WW.Health.md
Joseph Doherty 76295695ee docs(health): align shared-contract to shipped API + per-lib CLAUDE.md + cleanup
- Contract: DatabaseHealthCheck<TContext> ctor now shows IServiceProvider (resolves
  IDbContextFactory<TContext> when registered, else a scoped TContext; pool-safe)
- Contract: RequireActiveNode gains retryAfterSeconds = 5 default parameter
- Packages: remove dangling AspNetCore.HealthChecks.UI.Client PackageVersion (no
  csproj referenced it)
- Tests: fix CS8625 in RoleLessCases — use object?[] so null role rows compile
  warning-free under Nullable=enable
- Add ZB.MOM.WW.Health/CLAUDE.md (packages, responsibilities, consumer matrix,
  build/test/pack commands, status + pointer to components/health/)
2026-06-01 07:17:18 -04:00

Proposed shared library: ZB.MOM.WW.Health

A contract on paper — the public surface to extract so the three projects stop re-implementing health-check tiers, probe logic, and the active-node gating seam. Realizes ../spec/SPEC.md. Not yet created. Reference implementations already exist: OtOpcUa Health/ (three-tier + probes), ScadaBridge Health/ (inline probes + ActiveNodeGate).

Packages (.NET 10)

ZB.MOM.WW.Health                        # core: tier convention, response writer, IActiveNodeGate, GrpcDependencyHealthCheck
ZB.MOM.WW.Health.Akka                   # AkkaClusterHealthCheck, ActiveNodeHealthCheck, AkkaActiveNodeGate
ZB.MOM.WW.Health.EntityFrameworkCore    # DatabaseHealthCheck<TContext>

All three packages target .NET 10. The split keeps Akka.Cluster and EF Core out of MxGateway's dependency graph: MxGateway pulls only the core package. Packages are published to the Gitea NuGet feed under SemVer and versioned in lockstep to start. The x86 net48 mxaccessgw worker has no HTTP surface, so net48 multi-targeting is not required.

Packaging & distribution

Three NuGet packages, one DLL each, on the Gitea NuGet feed. These are libraries linked into each app — there is no central health service. Consumers reference only what they need:

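| Package (→ DLL) | Transitive deps | MxGateway | OtOpcUa | ScadaBridge |
|---|---|---|---|---|
| …Health | Microsoft.Extensions.Diagnostics.HealthChecks, ASP.NET Core abstractions | yes | yes | yes |
| …Health.Akka | Akka.Cluster | no | yes | yes |
| …Health.EntityFrameworkCore | EF Core | no | yes | yes |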

Why MxGateway takes only core: it is not Akka-based and does not use EF Core. The GrpcDependencyHealthCheck in the core package covers its only probe need (worker channel reachability), so it avoids the Akka and EF transitive trees entirely.

ZB.MOM.WW.Health

namespace ZB.MOM.WW.Health;

/// Canonical tag constants — use these when calling AddCheck(..., tags: [ZbHealthTags.Ready]).
public static class ZbHealthTags
{
    public const string Ready  = "ready";
    public const string Active = "active";
    public const string Live   = "live";
}

/// Options for MapZbHealth(). All paths and the response writer are overridable.
public sealed class ZbHealthEndpointOptions
{
    public string ReadyPath  { get; set; } = "/health/ready";
    public string ActivePath { get; set; } = "/health/active";
    public string LivePath   { get; set; } = "/healthz";

    /// Defaults to ZbHealthWriter.WriteJsonAsync.
    public Func<HttpContext, HealthReport, Task>? ResponseWriter { get; set; }
}

/// Extension that maps all three health tiers in one call.
public static class ZbHealthEndpointExtensions
{
    /// Maps /health/ready (tag "ready"), /health/active (tag "active"), /healthz (tag "live").
    /// Does NOT call services.AddHealthChecks() — caller is responsible for probe registration.
    public static IEndpointConventionBuilder MapZbHealth(
        this IEndpointRouteBuilder endpoints,
        ZbHealthEndpointOptions?   options = null);

    /// Same mapping as the overload above, with options supplied through a configuration delegate.
    public static IEndpointConventionBuilder MapZbHealth(
        this IEndpointRouteBuilder    endpoints,
        Action<ZbHealthEndpointOptions> configure);
}

/// Canonical JSON response writer. Shape: { status, totalDurationMs, entries: { name: { status, description, durationMs } } }.
public static class ZbHealthWriter
{
    public static Task WriteJsonAsync(HttpContext context, HealthReport report);
}

/// Single-property seam: is this node the active/leader node?
/// Attach to route groups via RequireActiveNode(). Implement with AkkaActiveNodeGate (Health.Akka)
/// or a project-specific implementation for non-Akka nodes.
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

/// Route convention that returns 503 on standby nodes. DI-resolves IActiveNodeGate.
public static class ActiveNodeGateExtensions
{
    public static IEndpointConventionBuilder RequireActiveNode(
        this IEndpointConventionBuilder builder,
        int retryAfterSeconds = 5);
}

/// Checks that a downstream gRPC channel is reachable.
public sealed class GrpcDependencyHealthCheck : IHealthCheck
{
    public GrpcDependencyHealthCheck(GrpcChannel channel, GrpcDependencyOptions? options = null);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken  cancellationToken = default);
}

/// Options for GrpcDependencyHealthCheck.
public sealed class GrpcDependencyOptions
{
    /// Override the default probe (GrpcChannel.ConnectAsync).
    /// Return true = reachable, false = unreachable.
    public Func<GrpcChannel, CancellationToken, Task<bool>>? Probe { get; set; }

    /// Human-readable name of the dependency, used in the HealthCheckResult description.
    public string? DependencyName { get; set; }

    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(5);
}
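
The response shape documented on ZbHealthWriter can be illustrated with a small, self-contained sketch. It serializes a hand-built report with System.Text.Json. The types ReportSketch, EntrySketch, and ShapeSketch are hypothetical stand-ins for HealthReport and its entries, and the camelCase naming and string status values are assumptions read off the documented shape; the real writer may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Build a report with one healthy probe and print it in the documented shape.
var report = new ReportSketch(
    Status: "Healthy",
    TotalDurationMs: 12.5,
    Entries: new Dictionary<string, EntrySketch>
    {
        ["database"] = new("Healthy", "CanConnectAsync succeeded", 11.0),
    });

Console.WriteLine(ShapeSketch.ToJson(report));

// Hypothetical stand-ins for HealthReport and its entries (not the real types).
public sealed record EntrySketch(string Status, string? Description, double DurationMs);

public sealed record ReportSketch(
    string Status,
    double TotalDurationMs,
    IReadOnlyDictionary<string, EntrySketch> Entries);

public static class ShapeSketch
{
    // camelCase property names are an assumption based on the documented shape.
    // Dictionary keys (the probe names) are left exactly as registered.
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    };

    public static string ToJson(ReportSketch report) =>
        JsonSerializer.Serialize(report, Options);
}
```

Keying entries by probe name, rather than emitting an array, lets a consumer look up a single probe without scanning the payload.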

ZB.MOM.WW.Health.Akka

namespace ZB.MOM.WW.Health.Akka;

/// Checks the local node's Akka cluster membership status.
/// Register with the tag ZbHealthTags.Ready.
/// <remarks>
/// The ActorSystem is resolved lazily from the service provider. If the ActorSystem is not yet
/// available (e.g. during startup before Akka is initialised), the check returns Degraded rather
/// than throwing. This makes the check safe to register before Akka is fully up.
/// </remarks>
public sealed class AkkaClusterHealthCheck : IHealthCheck
{
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public AkkaClusterHealthCheck(
        IServiceProvider        serviceProvider,
        AkkaClusterStatusPolicy policy);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken  cancellationToken = default);
}

/// Maps Akka MemberStatus values to HealthStatus.
/// Two named presets cover the two existing implementations; construct a custom instance for
/// project-specific overrides.
public sealed class AkkaClusterStatusPolicy
{
    public AkkaClusterStatusPolicy(Func<MemberStatus, HealthStatus> evaluate);

    /// ScadaBridge origin: Up/Joining→Healthy, Leaving/Exiting→Degraded, else Unhealthy.
    /// Convergence target for all projects.
    public static AkkaClusterStatusPolicy Default { get; }

    /// OtOpcUa origin: self-Up-among-reachable-members→Healthy, else Degraded.
    /// Provided for backward compatibility during OtOpcUa migration.
    public static AkkaClusterStatusPolicy OtOpcUaCompat { get; }
}

/// Checks whether this node is the designated leader / active node.
/// Optional role parameter scopes the check to nodes carrying that role.
/// Register with the tag ZbHealthTags.Active.
/// <remarks>
/// The ActorSystem is resolved lazily from the service provider. If the ActorSystem is not yet
/// available (e.g. during startup before Akka is initialised), the check returns Degraded rather
/// than throwing. This makes the check startup-safe.
/// </remarks>
public sealed class ActiveNodeHealthCheck : IHealthCheck
{
    /// Role-less constructor: Healthy = node is Up AND cluster leader (ScadaBridge ActiveNode pattern).
    /// Returns Degraded when ActorSystem/cluster is not yet ready.
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public ActiveNodeHealthCheck(IServiceProvider serviceProvider);

    /// Role-filtered constructor: Healthy = (node lacks the role) OR (node carries role AND is role-singleton leader).
    /// Degraded = node carries role but is not the role-singleton leader (OtOpcUa AdminRoleLeader pattern).
    /// Returns Degraded when ActorSystem/cluster is not yet ready.
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public ActiveNodeHealthCheck(IServiceProvider serviceProvider, string role);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken  cancellationToken = default);
}

/// IActiveNodeGate implementation that computes IsActiveNode directly from the ActorSystem
/// (SelfMember Up + cluster leader), null-guarded for startup safety.
/// Register as a singleton. Does NOT resolve ActiveNodeHealthCheck from DI.
public sealed class AkkaActiveNodeGate : IActiveNodeGate
{
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily; if not yet available
    /// IsActiveNode returns false (safe default during startup).
    /// The gate checks SelfMember.Status == Up AND cluster.State.Leader == self.Address directly.
    /// </param>
    public AkkaActiveNodeGate(IServiceProvider serviceProvider);

    public bool IsActiveNode { get; }
}
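
The mapping described for AkkaClusterStatusPolicy.Default can be sketched as a plain switch. To stay runnable without Akka or ASP.NET Core, the sketch uses two stand-in enums, MemberStatusSketch and HealthStatusSketch; both names and the DefaultPolicySketch class are hypothetical. The member list mirrors Akka.Cluster's MemberStatus as an assumption.

```csharp
using System;

// Print the mapping for every status in the stand-in enum.
foreach (var status in Enum.GetValues<MemberStatusSketch>())
{
    Console.WriteLine($"{status} -> {DefaultPolicySketch.Evaluate(status)}");
}

// Stand-in for Akka.Cluster.MemberStatus so the sketch compiles without Akka.
public enum MemberStatusSketch { Joining, Up, Leaving, Exiting, Down, Removed, WeaklyUp }

// Stand-in for Microsoft.Extensions.Diagnostics.HealthChecks.HealthStatus.
public enum HealthStatusSketch { Unhealthy, Degraded, Healthy }

public static class DefaultPolicySketch
{
    // Up/Joining -> Healthy, Leaving/Exiting -> Degraded, anything else -> Unhealthy.
    public static HealthStatusSketch Evaluate(MemberStatusSketch status) => status switch
    {
        MemberStatusSketch.Up or MemberStatusSketch.Joining => HealthStatusSketch.Healthy,
        MemberStatusSketch.Leaving or MemberStatusSketch.Exiting => HealthStatusSketch.Degraded,
        _ => HealthStatusSketch.Unhealthy,
    };
}
```

Read literally, the "else Unhealthy" rule also covers WeaklyUp. Whether that is the intended behaviour is worth confirming against the ScadaBridge reference implementation.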

ZB.MOM.WW.Health.EntityFrameworkCore

namespace ZB.MOM.WW.Health.EntityFrameworkCore;

/// Checks database reachability via an EF Core DbContext.
/// Default probe: context.Database.CanConnectAsync() (ScadaBridge pattern).
/// Supply a custom probe delegate for query-based validation (OtOpcUa "query Deployments" pattern).
/// Register with the tag ZbHealthTags.Ready.
public sealed class DatabaseHealthCheck<TContext> : IHealthCheck
    where TContext : DbContext
{
    /// Resolves IDbContextFactory<TContext> when registered, else a scoped TContext; pool-safe.
    public DatabaseHealthCheck(
        IServiceProvider                      serviceProvider,
        DatabaseHealthCheckOptions<TContext>? options = null);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken  cancellationToken = default);
}

/// Options for DatabaseHealthCheck<TContext>.
public sealed class DatabaseHealthCheckOptions<TContext>
    where TContext : DbContext
{
    /// Override the default CanConnectAsync() probe with a custom query-based probe.
    /// Throw to signal failure; return normally to signal success.
    /// Example: <c>(db, ct) => db.Deployments.AsNoTracking().Take(1).ToListAsync(ct)</c>
    public Func<TContext, CancellationToken, Task>? ProbeQuery { get; set; }

    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(10);
}
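
The ProbeQuery convention (throw to signal failure, return normally to signal success, bounded by Timeout) can be sketched without EF Core. ProbeRunnerSketch is a hypothetical helper, not part of the contract, and the probe takes only a CancellationToken because no DbContext is involved. How the real check maps outcomes to HealthCheckResult is not shown here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A probe that returns normally counts as success; one that throws counts as failure.
Console.WriteLine(await ProbeRunnerSketch.RunAsync(
    ct => Task.CompletedTask, TimeSpan.FromSeconds(1)));
Console.WriteLine(await ProbeRunnerSketch.RunAsync(
    ct => throw new InvalidOperationException("query failed"), TimeSpan.FromSeconds(1)));

public static class ProbeRunnerSketch
{
    // Hypothetical helper: runs the probe under a timeout and reports true (reachable)
    // or false (probe threw or timed out).
    public static async Task<bool> RunAsync(
        Func<CancellationToken, Task> probe,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(timeout);
        try
        {
            // WaitAsync enforces the timeout even if the probe ignores its token.
            await probe(cts.Token).WaitAsync(timeout, cancellationToken);
            return true;
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            throw; // the caller cancelled; do not report this as a probe failure
        }
        catch (Exception)
        {
            return false; // the probe threw or exceeded the timeout
        }
    }
}
```

Separating caller cancellation from probe failure keeps a host shutdown from being reported as an unreachable database.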

Consumer matrix summary

| Consumer | Packages | Notes |
|---|---|---|
| MxGateway | ZB.MOM.WW.Health (core only) | GrpcDependencyHealthCheck on the worker channel; all three tiers via MapZbHealth(); IActiveNodeGate not needed (not Akka-based) |
| OtOpcUa | All three | AkkaClusterHealthCheck + OtOpcUaCompat preset → Default on convergence; ActiveNodeHealthCheck(role: "admin"); DatabaseHealthCheck<T> with ProbeQuery delegate |
| ScadaBridge | All three | AkkaClusterHealthCheck + Default policy; ActiveNodeHealthCheck (role-less); DatabaseHealthCheck<T> default probe; AkkaActiveNodeGate replaces inline ActiveNodeGate |

Open contract questions

  1. IActiveNodeGate for non-Akka nodes: MxGateway does not need active-node gating today. If a future MxGateway cluster requires it, the interface is in the core package and can be implemented without an Akka dependency. Validate whether a stub AlwaysActiveGate (returns true) should ship in core for single-node deployments.
  2. AkkaActiveNodeGate caching: IsActiveNode is a synchronous property; the underlying cluster-state read is synchronous but the ActorSystem lookup is lazy. Validate whether the gate should cache the computed value on a short TTL (e.g. 5 s) to reduce Akka.Cluster API overhead on high-frequency API routing checks, or whether reading SelfMember/State.Leader directly on every call is acceptable.
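
Both open questions can be made concrete with a minimal sketch, offered as input to the discussion rather than a decided design. AlwaysActiveGate follows the stub named in question 1. CachedActiveNodeGateSketch is a hypothetical decorator for question 2 that re-reads the wrapped gate at most once per TTL. IActiveNodeGate is redeclared here only so the block is self-contained.

```csharp
using System;
using System.Diagnostics;

// Wrap a gate with a 5 s cache; AlwaysActiveGate is the single-node stub.
IActiveNodeGate gate = new CachedActiveNodeGateSketch(
    new AlwaysActiveGate(), TimeSpan.FromSeconds(5));
Console.WriteLine(gate.IsActiveNode);

// Same shape as the contract's interface, redeclared so the sketch is self-contained.
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

// Question 1: stub for single-node deployments; always reports active.
public sealed class AlwaysActiveGate : IActiveNodeGate
{
    public bool IsActiveNode => true;
}

// Question 2: hypothetical decorator that refreshes from the inner gate at most once per TTL.
public sealed class CachedActiveNodeGateSketch : IActiveNodeGate
{
    private readonly IActiveNodeGate _inner;
    private readonly long _ttlTicks;
    private readonly object _sync = new();
    private long _readAt;
    private bool _hasValue;
    private bool _value;

    public CachedActiveNodeGateSketch(IActiveNodeGate inner, TimeSpan ttl)
    {
        _inner = inner;
        // Convert the TTL to Stopwatch ticks (a monotonic clock, immune to wall-clock jumps).
        _ttlTicks = (long)(ttl.TotalSeconds * Stopwatch.Frequency);
    }

    public bool IsActiveNode
    {
        get
        {
            lock (_sync)
            {
                var now = Stopwatch.GetTimestamp();
                if (!_hasValue || now - _readAt >= _ttlTicks)
                {
                    _value = _inner.IsActiveNode; // refresh from the wrapped gate
                    _readAt = now;
                    _hasValue = true;
                }
                return _value;
            }
        }
    }
}
```

The trade-off to weigh is staleness: with a cache, a node that has just lost leadership can keep answering as active for up to one TTL, whereas reading cluster state on every call avoids that window at the cost of more frequent reads.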

See ../GAPS.md for the adoption order and effort/risk.