scadaproj/components/health/shared-contract/ZB.MOM.WW.Health.md
Joseph Doherty 1dc35a8c43 docs(health): spec + ZB.MOM.WW.Health shared contract
Authors components/health/spec/SPEC.md (normalized three-tier endpoint
convention, probe catalog, response-writer contract, migration notes) and
components/health/shared-contract/ZB.MOM.WW.Health.md (paper API for the
3-package library: core, Akka, EntityFrameworkCore).
2026-06-01 06:20:19 -04:00


Proposed shared library: ZB.MOM.WW.Health

A contract on paper — the public surface to extract so the three projects stop re-implementing health-check tiers, probe logic, and the active-node gating seam. Realizes ../spec/SPEC.md. Not yet created. Reference implementations already exist: OtOpcUa Health/ (three-tier + probes), ScadaBridge Health/ (inline probes + ActiveNodeGate).

Packages (.NET 10)

ZB.MOM.WW.Health                        # core: tier convention, response writer, IActiveNodeGate, GrpcDependencyHealthCheck
ZB.MOM.WW.Health.Akka                   # AkkaClusterHealthCheck, ActiveNodeHealthCheck, AkkaActiveNodeGate
ZB.MOM.WW.Health.EntityFrameworkCore    # DatabaseHealthCheck<TContext>

All three target .NET 10. The split keeps Akka.Cluster and EF Core out of MxGateway's dependency graph: MxGateway pulls only the core package. The packages are published to the Gitea NuGet feed under SemVer and are versioned in lockstep to start. The x86 net48 mxaccessgw worker has no HTTP surface, so net48 multi-targeting is not required.

Packaging & distribution

Three NuGet packages, one DLL each, on the Gitea NuGet feed. These are libraries linked into each app — there is no central health service. Consumers reference only what they need:

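| Package (→ DLL) | Transitive deps | MxGateway | OtOpcUa | ScadaBridge |
| --- | --- | --- | --- | --- |
| …Health | Microsoft.Extensions.Diagnostics.HealthChecks, ASP.NET Core abstractions | yes | yes | yes |
| …Health.Akka | Akka.Cluster | no | yes | yes |
| …Health.EntityFrameworkCore | EF Core | no | yes | yes |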

Why MxGateway takes only core: it is not Akka-based and does not use EF Core. The GrpcDependencyHealthCheck in the core package covers its only probe need (worker channel reachability), so it avoids the Akka and EF transitive trees entirely.
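The worker-channel probe described above can be sketched with nothing beyond Grpc.Net.Client and the health-check abstractions. This is a minimal sketch under assumptions, not the proposed GrpcDependencyHealthCheck itself: the `GrpcReachabilitySketch` class, its parameters, and the result descriptions are hypothetical, and mapping every failure to Unhealthy is a choice made here, not something the spec states.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Grpc.Net.Client;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper; not part of the proposed contract.
public static class GrpcReachabilitySketch
{
    public static async Task<HealthCheckResult> CheckAsync(
        GrpcChannel channel,
        string dependencyName,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        // ConnectAsync may keep waiting while the channel retries, so bound it with a timeout.
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeoutCts.CancelAfter(timeout);
        try
        {
            await channel.ConnectAsync(timeoutCts.Token);
            return HealthCheckResult.Healthy($"{dependencyName} is reachable.");
        }
        catch (Exception ex) when (!cancellationToken.IsCancellationRequested)
        {
            // Timeout or connection failure. Caller-initiated cancellation propagates instead.
            return HealthCheckResult.Unhealthy($"{dependencyName} is unreachable.", ex);
        }
    }
}
```

Because the helper depends only on the gRPC client and Microsoft.Extensions.Diagnostics.HealthChecks, it is consistent with MxGateway avoiding the Akka and EF Core dependency trees.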

ZB.MOM.WW.Health

namespace ZB.MOM.WW.Health;

/// Canonical tag constants — use these when calling AddCheck(..., tags: [ZbHealthTags.Ready]).
public static class ZbHealthTags
{
    public const string Ready  = "ready";
    public const string Active = "active";
    public const string Live   = "live";
}

/// Options for MapZbHealth(). All paths and the response writer are overridable.
public sealed class ZbHealthEndpointOptions
{
    public string ReadyPath  { get; set; } = "/health/ready";
    public string ActivePath { get; set; } = "/health/active";
    public string LivePath   { get; set; } = "/healthz";

    /// Defaults to ZbHealthWriter.WriteJsonAsync.
    public Func<HttpContext, HealthReport, Task>? ResponseWriter { get; set; }
}

/// Extension that maps all three health tiers in one call.
public static class ZbHealthEndpointExtensions
{
    /// Maps /health/ready (tag "ready"), /health/active (tag "active"), /healthz (tag "live").
    /// Does NOT call services.AddHealthChecks() — caller is responsible for probe registration.
    public static IEndpointConventionBuilder MapZbHealth(
        this IEndpointRouteBuilder endpoints,
        ZbHealthEndpointOptions?   options = null);

    /// Same mapping as the overload above; options are supplied through a configuration delegate.
    public static IEndpointConventionBuilder MapZbHealth(
        this IEndpointRouteBuilder    endpoints,
        Action<ZbHealthEndpointOptions> configure);
}

/// Canonical JSON response writer. Shape: { status, totalDurationMs, entries: { name: { status, description, duration } } }.
public static class ZbHealthWriter
{
    public static Task WriteJsonAsync(HttpContext context, HealthReport report);
}

/// Single-property seam: is this node the active/leader node?
/// Attach to route groups via RequireActiveNode(). Implement with AkkaActiveNodeGate (Health.Akka)
/// or a project-specific implementation for non-Akka nodes.
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

/// Route convention that returns 503 on standby nodes. DI-resolves IActiveNodeGate.
public static class ActiveNodeGateExtensions
{
    public static IEndpointConventionBuilder RequireActiveNode(
        this IEndpointConventionBuilder builder);
}

/// Checks that a downstream gRPC channel is reachable.
public sealed class GrpcDependencyHealthCheck : IHealthCheck
{
    public GrpcDependencyHealthCheck(GrpcChannel channel, GrpcDependencyOptions? options = null);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken  cancellationToken = default);
}

/// Options for GrpcDependencyHealthCheck.
public sealed class GrpcDependencyOptions
{
    /// Override the default probe (GrpcChannel.ConnectAsync).
    /// Return true = reachable, false = unreachable.
    public Func<GrpcChannel, CancellationToken, Task<bool>>? Probe { get; set; }

    /// Human-readable name of the dependency, used in the HealthCheckResult description.
    public string? DependencyName { get; set; }

    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(5);
}
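The tier mapping and response-writer contracts above could be built on the stock ASP.NET Core health-check middleware. The following is a minimal sketch, not the library's implementation. The `ZbHealthSketch` class, the `MapZbHealthSketch` name and its path parameters, the empty-prefix route group used to return a single convention builder, and the choice to emit `duration` in milliseconds are all assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical sketch; names ending in "Sketch" are not part of the contract.
public static class ZbHealthSketch
{
    // Shapes a report as { status, totalDurationMs, entries: { name: { status, description, duration } } }.
    // Assumption: "duration" is in milliseconds; the spec may choose another unit or format.
    public static object BuildPayload(HealthReport report) => new
    {
        status = report.Status.ToString(),
        totalDurationMs = report.TotalDuration.TotalMilliseconds,
        entries = report.Entries.ToDictionary(
            e => e.Key,
            e => new
            {
                status = e.Value.Status.ToString(),
                description = e.Value.Description,
                duration = e.Value.Duration.TotalMilliseconds,
            }),
    };

    public static Task WriteJsonAsync(HttpContext context, HealthReport report)
        => context.Response.WriteAsJsonAsync(BuildPayload(report));

    // Maps the three tiers, each filtered to the checks that carry its tag.
    // Like the contract, this does not register any probes.
    public static IEndpointConventionBuilder MapZbHealthSketch(
        this IEndpointRouteBuilder endpoints,
        string readyPath = "/health/ready",
        string activePath = "/health/active",
        string livePath = "/healthz")
    {
        // An empty-prefix group lets one builder apply conventions to all three endpoints.
        RouteGroupBuilder group = endpoints.MapGroup(string.Empty);
        group.MapHealthChecks(readyPath, ForTag("ready"));
        group.MapHealthChecks(activePath, ForTag("active"));
        group.MapHealthChecks(livePath, ForTag("live"));
        return group;
    }

    private static HealthCheckOptions ForTag(string tag) => new()
    {
        Predicate = registration => registration.Tags.Contains(tag),
        ResponseWriter = WriteJsonAsync,
    };
}
```

Under these assumptions a host would call `services.AddHealthChecks()` with tagged probes during registration and `app.MapZbHealthSketch()` during endpoint mapping. A probe registered without one of the three tags would not appear on any tier.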

ZB.MOM.WW.Health.Akka

namespace ZB.MOM.WW.Health.Akka;

/// Checks the local node's Akka cluster membership status.
/// Register with the ZbHealthTags.Ready tag.
public sealed class AkkaClusterHealthCheck : IHealthCheck
{
    public AkkaClusterHealthCheck(
        ActorSystem             system,
        AkkaClusterStatusPolicy policy);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken  cancellationToken = default);
}

/// Maps Akka MemberStatus values to HealthStatus.
/// Two named presets cover the two existing implementations; construct a custom instance for
/// project-specific overrides.
public sealed class AkkaClusterStatusPolicy
{
    public AkkaClusterStatusPolicy(Func<MemberStatus, HealthStatus> evaluate);

    /// ScadaBridge origin: Up/Joining→Healthy, Leaving/Exiting→Degraded, else Unhealthy.
    /// Convergence target for all projects.
    public static AkkaClusterStatusPolicy Default { get; }

    /// OtOpcUa origin: self-Up-among-reachable-members→Healthy, else Degraded.
    /// Provided for backward compatibility during OtOpcUa migration.
    public static AkkaClusterStatusPolicy OtOpcUaCompat { get; }
}

/// Checks whether this node is the designated leader / active node.
/// Optional role parameter scopes the check to nodes carrying that role.
/// Register with the ZbHealthTags.Active tag.
public sealed class ActiveNodeHealthCheck : IHealthCheck
{
    /// Role-less constructor: Healthy = node is Up AND cluster leader (ScadaBridge ActiveNode pattern).
    public ActiveNodeHealthCheck(ActorSystem system);

    /// Role-filtered constructor: Healthy = (node lacks the role) OR (node carries role AND is role-singleton leader).
    /// Degraded = node carries role but is not the role-singleton leader (OtOpcUa AdminRoleLeader pattern).
    public ActiveNodeHealthCheck(ActorSystem system, string role);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken  cancellationToken = default);
}

/// IActiveNodeGate implementation backed by ActiveNodeHealthCheck.
/// Register as a singleton; resolves ActiveNodeHealthCheck from DI.
public sealed class AkkaActiveNodeGate : IActiveNodeGate
{
    public AkkaActiveNodeGate(ActiveNodeHealthCheck check);

    public bool IsActiveNode { get; }
}
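The status mapping documented for the Default preset can be written as a plain function over Akka.NET's MemberStatus enum, which is also the shape the proposed AkkaClusterStatusPolicy constructor accepts. This is a sketch under assumptions: the `ClusterStatusMappingSketch` class and the `EvaluateStrict` variant are hypothetical and exist only to illustrate a project-specific override.

```csharp
using Akka.Cluster;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper; not part of the proposed contract.
public static class ClusterStatusMappingSketch
{
    // Mirrors the documented Default preset:
    // Up/Joining -> Healthy, Leaving/Exiting -> Degraded, anything else -> Unhealthy.
    public static HealthStatus EvaluateDefault(MemberStatus status) => status switch
    {
        MemberStatus.Up or MemberStatus.Joining => HealthStatus.Healthy,
        MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
        _ => HealthStatus.Unhealthy,
    };

    // Hypothetical stricter override: only a fully Up member counts as ready.
    public static HealthStatus EvaluateStrict(MemberStatus status) =>
        status == MemberStatus.Up ? HealthStatus.Healthy : HealthStatus.Unhealthy;
}
```

With the contract as proposed, a custom policy would then be constructed as `new AkkaClusterStatusPolicy(ClusterStatusMappingSketch.EvaluateStrict)`.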

ZB.MOM.WW.Health.EntityFrameworkCore

namespace ZB.MOM.WW.Health.EntityFrameworkCore;

/// Checks database reachability via an EF Core DbContext.
/// Default probe: context.Database.CanConnectAsync() (ScadaBridge pattern).
/// Supply a custom probe delegate for query-based validation (OtOpcUa "query Deployments" pattern).
/// Register with the ZbHealthTags.Ready tag.
public sealed class DatabaseHealthCheck<TContext> : IHealthCheck
    where TContext : DbContext
{
    public DatabaseHealthCheck(
        IDbContextFactory<TContext>      factory,
        DatabaseHealthCheckOptions<TContext>? options = null);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken  cancellationToken = default);
}

/// Options for DatabaseHealthCheck<TContext>.
public sealed class DatabaseHealthCheckOptions<TContext>
    where TContext : DbContext
{
    /// Override the default CanConnectAsync() probe.
    /// Throw to signal failure; return normally to signal success.
    public Func<TContext, CancellationToken, Task>? Probe { get; set; }

    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(10);
}
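The probe semantics above (default CanConnectAsync, or a custom delegate that throws on failure, both bounded by Timeout) could be combined roughly as follows. This is a sketch, not the proposed DatabaseHealthCheck<TContext>: the `DatabaseProbeSketch` helper, its parameter list, and the result descriptions are hypothetical, and it assumes every failure maps to Unhealthy.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper; not part of the proposed contract.
public static class DatabaseProbeSketch
{
    public static async Task<HealthCheckResult> CheckAsync<TContext>(
        IDbContextFactory<TContext> factory,
        Func<TContext, CancellationToken, Task>? probe,
        TimeSpan timeout,
        CancellationToken cancellationToken)
        where TContext : DbContext
    {
        // Link the caller's token with the probe timeout.
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeoutCts.CancelAfter(timeout);
        try
        {
            await using TContext db = await factory.CreateDbContextAsync(timeoutCts.Token);
            if (probe is not null)
            {
                // Custom probe: throwing signals failure, returning signals success.
                await probe(db, timeoutCts.Token);
            }
            else if (!await db.Database.CanConnectAsync(timeoutCts.Token))
            {
                return HealthCheckResult.Unhealthy("Database is not reachable.");
            }
            return HealthCheckResult.Healthy();
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // Only the timeout fired; the caller did not cancel.
            return HealthCheckResult.Unhealthy(
                $"Database probe timed out after {timeout.TotalSeconds:0.#} s.");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return HealthCheckResult.Unhealthy("Database probe failed.", ex);
        }
    }
}
```

A query-based probe in the OtOpcUa style would be passed as the `probe` delegate. The sketch takes an IDbContextFactory, as the contract does, so each check uses a short-lived context rather than sharing a scoped one.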

Consumer matrix summary

| Consumer | Packages | Notes |
| --- | --- | --- |
| MxGateway | ZB.MOM.WW.Health (core only) | GrpcDependencyHealthCheck on the worker channel; all three tiers via MapZbHealth(); IActiveNodeGate not needed (not Akka-based) |
| OtOpcUa | All three | AkkaClusterHealthCheck + OtOpcUaCompat preset → Default on convergence; ActiveNodeHealthCheck(role: "admin"); DatabaseHealthCheck<T> with custom probe delegate |
| ScadaBridge | All three | AkkaClusterHealthCheck + Default policy; ActiveNodeHealthCheck (role-less); DatabaseHealthCheck<T> default probe; AkkaActiveNodeGate replaces inline ActiveNodeGate |

Open contract questions

  1. IActiveNodeGate for non-Akka nodes: MxGateway does not need active-node gating today. If a future MxGateway cluster requires it, the interface is in the core package and can be implemented without an Akka dependency. Validate whether a stub AlwaysActiveGate (returns true) should ship in core for single-node deployments.
  2. DI helpers: decide whether services.AddZbHealthChecks() (a DI-registered convenience that pre-registers gRPC + DB + Akka probes via options) is worth adding, or whether explicit services.AddHealthChecks().AddCheck<...>() calls per project are clearer. The spec currently leaves probe registration entirely per-project.
  3. AkkaActiveNodeGate caching: IsActiveNode is a synchronous property; the underlying ActiveNodeHealthCheck.CheckHealthAsync is async. Validate whether the gate should cache the last probe result on a short TTL (e.g. 5 s) or drive a background refresh, to avoid blocking synchronous callers.
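Questions 1 and 3 can be made concrete with a small sketch. It assumes the leadership read can be exposed as a synchronous delegate; `AlwaysActiveGate` is the stub named in question 1, while `CachedActiveNodeGate`, its constructor parameters, and the injectable clock are hypothetical. IActiveNodeGate is restated only so the block compiles on its own.

```csharp
using System;

// Restated from the contract so the sketch is self-contained.
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

// Question 1: stub for single-node deployments, which are always active.
public sealed class AlwaysActiveGate : IActiveNodeGate
{
    public bool IsActiveNode => true;
}

// Question 3: hypothetical short-TTL cache around a leadership probe.
public sealed class CachedActiveNodeGate : IActiveNodeGate
{
    private readonly Func<bool> _probe;
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _clock;
    private readonly object _sync = new();
    private DateTimeOffset _expiresAt = DateTimeOffset.MinValue;
    private bool _cached;

    public CachedActiveNodeGate(Func<bool> probe, TimeSpan ttl, Func<DateTimeOffset>? clock = null)
    {
        _probe = probe ?? throw new ArgumentNullException(nameof(probe));
        _ttl = ttl;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    public bool IsActiveNode
    {
        get
        {
            lock (_sync)
            {
                DateTimeOffset now = _clock();
                if (now >= _expiresAt)
                {
                    // Cache expired (or never filled): re-run the probe and restart the TTL.
                    _cached = _probe();
                    _expiresAt = now + _ttl;
                }
                return _cached;
            }
        }
    }
}
```

With a 5 s TTL, a standby node that becomes leader can keep returning 503 for up to 5 s, and a demoted node can keep serving for the same window. The background-refresh alternative from question 3 would keep the probe off the request path, at the cost of a timer to own and dispose.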

See ../GAPS.md for the adoption order and effort/risk.