# Proposed shared library: `ZB.MOM.WW.Health`

A contract on paper: the public surface to extract so the three projects stop re-implementing health-check tiers, probe logic, and the active-node gating seam. Realizes [`../spec/SPEC.md`](../spec/SPEC.md).

**Not yet created.** Reference implementations already exist: OtOpcUa `Health/` (three-tier + probes), ScadaBridge `Health/` (inline probes + `ActiveNodeGate`).

## Packages (.NET 10)

```
ZB.MOM.WW.Health                      # core: tier convention, response writer, IActiveNodeGate, GrpcDependencyHealthCheck
ZB.MOM.WW.Health.Akka                 # AkkaClusterHealthCheck, ActiveNodeHealthCheck, AkkaActiveNodeGate
ZB.MOM.WW.Health.EntityFrameworkCore  # DatabaseHealthCheck
```

All three are .NET 10. The split keeps Akka.Cluster and EF Core out of MxGateway's dependency graph; MxGateway pulls only the core package. Published to the Gitea NuGet feed; SemVer; lockstep to start.

The x86 net48 mxaccessgw worker has no HTTP surface, so net48 multi-targeting is **not** required.

## Packaging & distribution

**Three NuGet packages, one DLL each**, on the Gitea NuGet feed. These are **libraries** linked into each app; there is no central health service. Consumers reference only what they need:

| Package (→ DLL) | Transitive deps | MxGateway | OtOpcUa | ScadaBridge |
|---|---|---|---|---|
| `…Health` | `Microsoft.Extensions.Diagnostics.HealthChecks`, ASP.NET Core abstractions | ✅ | ✅ | ✅ |
| `…Health.Akka` | Akka.Cluster | – | ✅ | ✅ |
| `…Health.EntityFrameworkCore` | EF Core | – | ✅ | ✅ |

**Why MxGateway takes only core:** it is not Akka-based and does not use EF Core. The `GrpcDependencyHealthCheck` in the core package covers its only probe need (worker channel reachability), so it avoids the Akka and EF transitive trees entirely.

## `ZB.MOM.WW.Health`

```csharp
namespace ZB.MOM.WW.Health;

/// <summary>Canonical tag constants; use these when calling AddCheck(..., tags: [ZbHealthTags.Ready]).</summary>
public static class ZbHealthTags
{
    public const string Ready  = "ready";
    public const string Active = "active";
    public const string Live   = "live";
}

/// <summary>Options for MapZbHealth(). All paths and the response writer are overridable.</summary>
public sealed class ZbHealthEndpointOptions
{
    public string ReadyPath  { get; set; } = "/health/ready";
    public string ActivePath { get; set; } = "/health/active";
    public string LivePath   { get; set; } = "/healthz";

    /// <summary>Defaults to ZbHealthWriter.WriteJsonAsync.</summary>
    public Func<HttpContext, HealthReport, Task>? ResponseWriter { get; set; }
}

/// <summary>Extension that maps all three health tiers in one call.</summary>
public static class ZbHealthEndpointExtensions
{
    /// <summary>Maps /health/ready (tag "ready"), /health/active (tag "active"), /healthz (tag "live").
    /// Does NOT call services.AddHealthChecks(); the caller is responsible for probe registration.</summary>
    public static IEndpointConventionBuilder MapZbHealth(
        this IEndpointRouteBuilder endpoints,
        ZbHealthEndpointOptions? options = null);

    /// <summary>Maps /health/ready (tag "ready"), /health/active (tag "active"), /healthz (tag "live").</summary>
    public static IEndpointConventionBuilder MapZbHealth(
        this IEndpointRouteBuilder endpoints,
        Action<ZbHealthEndpointOptions> configure);
}

/// <summary>Canonical JSON response writer.
/// Shape: { status, totalDurationMs, entries: { name: { status, description, durationMs } } }.</summary>
public static class ZbHealthWriter
{
    public static Task WriteJsonAsync(HttpContext context, HealthReport report);
}

/// <summary>Single-property seam: is this node the active/leader node?
/// Attach to route groups via RequireActiveNode(). Implement with AkkaActiveNodeGate (Health.Akka)
/// or a project-specific implementation for non-Akka nodes.</summary>
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

/// <summary>Route convention that returns 503 on standby nodes. DI-resolves IActiveNodeGate.</summary>
public static class ActiveNodeGateExtensions
{
    public static IEndpointConventionBuilder RequireActiveNode(
        this IEndpointConventionBuilder builder, int retryAfterSeconds = 5);
}

/// <summary>Checks that a downstream gRPC channel is reachable.</summary>
public sealed class GrpcDependencyHealthCheck : IHealthCheck
{
    public GrpcDependencyHealthCheck(GrpcChannel channel, GrpcDependencyOptions? options = null);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default);
}

/// <summary>Options for GrpcDependencyHealthCheck.</summary>
public sealed class GrpcDependencyOptions
{
    /// <summary>Override the default probe (GrpcChannel.ConnectAsync).
    /// Return true = reachable, false = unreachable.</summary>
    public Func<GrpcChannel, CancellationToken, Task<bool>>? Probe { get; set; }

    /// <summary>Human-readable name of the dependency, used in the HealthCheckResult description.</summary>
    public string? DependencyName { get; set; }

    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(5);
}
```

## `ZB.MOM.WW.Health.Akka`

```csharp
namespace ZB.MOM.WW.Health.Akka;

/// <summary>Checks the local node's Akka cluster membership status.
/// Register to tag ZbHealthTags.Ready.</summary>
/// <remarks>
/// The ActorSystem is resolved lazily from the service provider. If the ActorSystem is not yet
/// available (e.g. during startup before Akka is initialised), the check returns Degraded rather
/// than throwing. This makes the check safe to register before Akka is fully up.
/// </remarks>
public sealed class AkkaClusterHealthCheck : IHealthCheck
{
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public AkkaClusterHealthCheck(
        IServiceProvider serviceProvider,
        AkkaClusterStatusPolicy policy);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default);
}

/// <summary>Maps Akka MemberStatus values to HealthStatus.
/// Two named presets cover the two existing implementations; construct a custom instance for
/// project-specific overrides.</summary>
public sealed class AkkaClusterStatusPolicy
{
    public AkkaClusterStatusPolicy(Func<MemberStatus, HealthStatus> evaluate);

    /// <summary>ScadaBridge origin: Up/Joining→Healthy, Leaving/Exiting→Degraded, else Unhealthy.
    /// Convergence target for all projects.</summary>
    public static AkkaClusterStatusPolicy Default { get; }

    /// <summary>OtOpcUa origin: self-Up-among-reachable-members→Healthy, else Degraded.
    /// Provided for backward compatibility during OtOpcUa migration.</summary>
    public static AkkaClusterStatusPolicy OtOpcUaCompat { get; }
}

/// <summary>Checks whether this node is the designated leader / active node.
/// Optional role parameter scopes the check to nodes carrying that role.
/// Register to tag ZbHealthTags.Active.</summary>
/// <remarks>
/// The ActorSystem is resolved lazily from the service provider. If the ActorSystem is not yet
/// available (e.g. during startup before Akka is initialised), the check returns Degraded rather
/// than throwing. This makes the check startup-safe.
/// </remarks>
public sealed class ActiveNodeHealthCheck : IHealthCheck
{
    /// <summary>Role-less constructor: Healthy = node is Up AND cluster leader (ScadaBridge ActiveNode pattern).
    /// Returns Degraded when ActorSystem/cluster is not yet ready.</summary>
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public ActiveNodeHealthCheck(IServiceProvider serviceProvider);

    /// <summary>Role-filtered constructor: Healthy = (node lacks the role) OR (node carries role AND is role-singleton leader).
    /// Degraded = node carries role but is not the role-singleton leader (OtOpcUa AdminRoleLeader pattern).
    /// Returns Degraded when ActorSystem/cluster is not yet ready.</summary>
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public ActiveNodeHealthCheck(IServiceProvider serviceProvider, string role);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default);
}

/// <summary>IActiveNodeGate implementation that computes IsActiveNode directly from the ActorSystem
/// (SelfMember Up + cluster leader), null-guarded for startup safety.
/// Register as a singleton. Does NOT resolve ActiveNodeHealthCheck from DI.</summary>
public sealed class AkkaActiveNodeGate : IActiveNodeGate
{
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily; if not yet available
    /// IsActiveNode returns false (safe default during startup).
    /// The gate checks SelfMember.Status == Up AND cluster.State.Leader == self.Address directly.
    /// </param>
    public AkkaActiveNodeGate(IServiceProvider serviceProvider);

    public bool IsActiveNode { get; }
}
```

## `ZB.MOM.WW.Health.EntityFrameworkCore`

```csharp
namespace ZB.MOM.WW.Health.EntityFrameworkCore;

/// <summary>Checks database reachability via an EF Core DbContext.
/// Default probe: context.Database.CanConnectAsync() (ScadaBridge pattern).
/// Supply a custom probe delegate for query-based validation (OtOpcUa "query Deployments" pattern).
/// Register to tag ZbHealthTags.Ready.</summary>
public sealed class DatabaseHealthCheck<TContext> : IHealthCheck
    where TContext : DbContext
{
    // Resolves IDbContextFactory<TContext> when registered, else a scoped TContext; pool-safe.
    public DatabaseHealthCheck(
        IServiceProvider serviceProvider,
        DatabaseHealthCheckOptions<TContext>? options = null);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default);
}

/// <summary>Options for DatabaseHealthCheck.</summary>
public sealed class DatabaseHealthCheckOptions<TContext>
    where TContext : DbContext
{
    /// <summary>Override the default CanConnectAsync() probe with a custom query-based probe.
    /// Throw to signal failure; return normally to signal success.
    /// Example: db => db.Deployments.AsNoTracking().Take(1).ToListAsync()</summary>
    public Func<TContext, Task>? ProbeQuery { get; set; }

    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(10);
}
```

## Consumer matrix summary

| Consumer | Packages | Notes |
|---|---|---|
| **MxGateway** | `ZB.MOM.WW.Health` (core only) | `GrpcDependencyHealthCheck` on the worker channel; all three tiers via `MapZbHealth()`; `IActiveNodeGate` not needed (not Akka-based) |
| **OtOpcUa** | All three | `AkkaClusterHealthCheck` + `OtOpcUaCompat` preset → `Default` on convergence; `ActiveNodeHealthCheck(role: "admin")`; `DatabaseHealthCheck` with `ProbeQuery` delegate |
| **ScadaBridge** | All three | `AkkaClusterHealthCheck` + `Default` policy; `ActiveNodeHealthCheck` (role-less); `DatabaseHealthCheck` default probe; `AkkaActiveNodeGate` replaces inline `ActiveNodeGate` |

## Open contract questions

1. **`IActiveNodeGate` for non-Akka nodes:** MxGateway does not need active-node gating today. If a future MxGateway cluster requires it, the interface is in the core package and can be implemented without an Akka dependency. Validate whether a stub `AlwaysActiveGate` (returns `true`) should ship in core for single-node deployments.

2. **`AkkaActiveNodeGate` caching:** `IsActiveNode` is a synchronous property; the underlying cluster-state read is synchronous but the ActorSystem lookup is lazy. Validate whether the gate should cache the computed value on a short TTL (e.g. 5 s) to reduce Akka.Cluster API overhead on high-frequency API routing checks, or whether reading `SelfMember`/`State.Leader` directly on every call is acceptable.

See [`../GAPS.md`](../GAPS.md) for the adoption order and effort/risk.
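
To make the two open contract questions above concrete, here is a minimal sketch of what each option could look like. It is an illustration under stated assumptions, not part of the contract: `IActiveNodeGate` is redeclared locally so the block stands alone, `AlwaysActiveGate` follows the name floated in question 1, and `CachingActiveNodeGate` with its `Func<bool>` read delegate is a hypothetical type that does not exist in any of the three projects. In a real `AkkaActiveNodeGate` the delegate would stand in for the direct `SelfMember`/`State.Leader` read.

```csharp
using System;

// Local mirror of the proposed core interface so this sketch compiles on its own.
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

// Question 1 (hypothetical): stub gate for single-node deployments.
public sealed class AlwaysActiveGate : IActiveNodeGate
{
    public bool IsActiveNode => true;
}

// Question 2 (hypothetical): wraps an uncached read and reuses its result for a short TTL.
public sealed class CachingActiveNodeGate : IActiveNodeGate
{
    private readonly Func<bool> _readActive;   // the uncached read, e.g. the cluster-state check
    private readonly TimeSpan _ttl;
    private readonly TimeProvider _time;
    private readonly object _sync = new();
    private bool _hasValue;
    private bool _cached;
    private long _readAt;

    public CachingActiveNodeGate(Func<bool> readActive, TimeSpan ttl, TimeProvider? time = null)
    {
        _readActive = readActive ?? throw new ArgumentNullException(nameof(readActive));
        _ttl = ttl;
        _time = time ?? TimeProvider.System;
    }

    public bool IsActiveNode
    {
        get
        {
            lock (_sync)
            {
                // Re-read when nothing is cached yet or the cached value is older than the TTL.
                if (!_hasValue || _time.GetElapsedTime(_readAt) >= _ttl)
                {
                    _cached = _readActive();
                    _readAt = _time.GetTimestamp();
                    _hasValue = true;
                }
                return _cached;
            }
        }
    }
}
```

The trade-off this makes visible: with a 5 s TTL, a node that has just lost leadership can keep answering as active for up to 5 s, and a newly promoted node can keep returning 503 for the same window. Reading cluster state on every call avoids that staleness at the cost of the per-request overhead question 2 asks about; a TTL of `TimeSpan.Zero` in this sketch degenerates to the uncached behaviour.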