# Proposed shared library: `ZB.MOM.WW.Health`

A contract on paper: the public surface to extract so the three projects stop re-implementing
health-check tiers, probe logic, and the active-node gating seam. Realizes
[`../spec/SPEC.md`](../spec/SPEC.md). **Not yet created.** Reference implementations already
exist: OtOpcUa `Health/` (three-tier + probes), ScadaBridge `Health/` (inline probes +
`ActiveNodeGate`).

## Packages (.NET 10)

```
ZB.MOM.WW.Health                      # core: tier convention, response writer, IActiveNodeGate, GrpcDependencyHealthCheck
ZB.MOM.WW.Health.Akka                 # AkkaClusterHealthCheck, ActiveNodeHealthCheck, AkkaActiveNodeGate
ZB.MOM.WW.Health.EntityFrameworkCore  # DatabaseHealthCheck<TContext>
```

All three target .NET 10. The split keeps Akka.Cluster and EF Core out of MxGateway's dependency
graph: MxGateway pulls only the core package. Published to the Gitea NuGet feed; SemVer; versioned
in lockstep to start. The x86 net48 mxaccessgw worker has no HTTP surface, so net48 multi-targeting
is **not** required.

## Packaging & distribution

**Three NuGet packages, one DLL each**, on the Gitea NuGet feed. These are **libraries** linked
into each app; there is no central health service. Consumers reference only what they need:

| Package (→ DLL) | Transitive deps | MxGateway | OtOpcUa | ScadaBridge |
|---|---|---|---|---|
| `…Health` | `Microsoft.Extensions.Diagnostics.HealthChecks`, ASP.NET Core abstractions | ✅ | ✅ | ✅ |
| `…Health.Akka` | Akka.Cluster | - | ✅ | ✅ |
| `…Health.EntityFrameworkCore` | EF Core | - | ✅ | ✅ |

**Why MxGateway takes only core:** it is not Akka-based and does not use EF Core. The
`GrpcDependencyHealthCheck` in the core package covers its only probe need (worker channel
reachability), so it avoids the Akka and EF transitive trees entirely.

## `ZB.MOM.WW.Health`

```csharp
namespace ZB.MOM.WW.Health;

/// Canonical tag constants. Use these when calling AddCheck(..., tags: [ZbHealthTags.Ready]).
public static class ZbHealthTags
{
    public const string Ready  = "ready";
    public const string Active = "active";
    public const string Live   = "live";
}

/// Options for MapZbHealth(). All paths and the response writer are overridable.
public sealed class ZbHealthEndpointOptions
{
    public string ReadyPath  { get; set; } = "/health/ready";
    public string ActivePath { get; set; } = "/health/active";
    public string LivePath   { get; set; } = "/healthz";

    /// Defaults to ZbHealthWriter.WriteJsonAsync.
    public Func<HttpContext, HealthReport, Task>? ResponseWriter { get; set; }
}

/// Extension that maps all three health tiers in one call.
public static class ZbHealthEndpointExtensions
{
    /// Maps /health/ready (tag "ready"), /health/active (tag "active"), /healthz (tag "live").
    /// Does NOT call services.AddHealthChecks(). The caller is responsible for probe registration.
    public static IEndpointConventionBuilder MapZbHealth(
        this IEndpointRouteBuilder endpoints,
        ZbHealthEndpointOptions? options = null);

    /// Maps /health/ready (tag "ready"), /health/active (tag "active"), /healthz (tag "live").
    public static IEndpointConventionBuilder MapZbHealth(
        this IEndpointRouteBuilder endpoints,
        Action<ZbHealthEndpointOptions> configure);
}

/// Canonical JSON response writer. Shape: { status, totalDurationMs, entries: { name: { status, description, durationMs } } }.
public static class ZbHealthWriter
{
    public static Task WriteJsonAsync(HttpContext context, HealthReport report);
}

/// Single-property seam: is this node the active/leader node?
/// Attach to route groups via RequireActiveNode(). Implement with AkkaActiveNodeGate (Health.Akka)
/// or a project-specific implementation for non-Akka nodes.
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

/// Route convention that returns 503 on standby nodes. DI-resolves IActiveNodeGate.
public static class ActiveNodeGateExtensions
{
    public static IEndpointConventionBuilder RequireActiveNode(
        this IEndpointConventionBuilder builder,
        int retryAfterSeconds = 5);
}

/// Checks that a downstream gRPC channel is reachable.
public sealed class GrpcDependencyHealthCheck : IHealthCheck
{
    public GrpcDependencyHealthCheck(GrpcChannel channel, GrpcDependencyOptions? options = null);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default);
}

/// Options for GrpcDependencyHealthCheck.
public sealed class GrpcDependencyOptions
{
    /// Override the default probe (GrpcChannel.ConnectAsync).
    /// Return true = reachable, false = unreachable.
    public Func<GrpcChannel, CancellationToken, Task<bool>>? Probe { get; set; }

    /// Human-readable name of the dependency, used in the HealthCheckResult description.
    public string? DependencyName { get; set; }

    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(5);
}
```

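The `RequireActiveNode` contract above (503 on standby nodes, with a `retryAfterSeconds` default of 5) could plausibly be realized as an ASP.NET Core endpoint filter. The following is a minimal sketch under that assumption, not the library's implementation: the names `ActiveNodeFilterSketch`, `RequireActiveNodeSketch`, and `RejectIfStandby` are hypothetical, and the use of a `Retry-After` header is inferred from the parameter name. `IActiveNodeGate` is repeated here only so the block compiles on its own.

```csharp
using System.Globalization;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

// Repeated from the core contract so this sketch is self-contained.
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

// Hypothetical sketch; the real convention may use a different mechanism.
public static class ActiveNodeFilterSketch
{
    // Returns null when the request may proceed, or a 503 result on a standby node.
    public static IResult? RejectIfStandby(HttpContext http, int retryAfterSeconds)
    {
        var gate = http.RequestServices.GetRequiredService<IActiveNodeGate>();
        if (gate.IsActiveNode)
        {
            return null;
        }

        // Assumption: retryAfterSeconds is surfaced as a Retry-After header.
        http.Response.Headers.RetryAfter =
            retryAfterSeconds.ToString(CultureInfo.InvariantCulture);
        return Results.StatusCode(StatusCodes.Status503ServiceUnavailable);
    }

    // Attaches the gate check as an endpoint filter on the given builder.
    public static TBuilder RequireActiveNodeSketch<TBuilder>(
        this TBuilder builder,
        int retryAfterSeconds = 5)
        where TBuilder : IEndpointConventionBuilder =>
        builder.AddEndpointFilter(async (context, next) =>
        {
            var rejection = RejectIfStandby(context.HttpContext, retryAfterSeconds);
            if (rejection is not null)
            {
                return rejection;
            }

            return await next(context);
        });
}
```

Resolving the gate per request from `RequestServices` keeps the filter itself stateless, so the gate's lifetime (singleton in the Akka case) stays a registration concern.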
## `ZB.MOM.WW.Health.Akka`

```csharp
namespace ZB.MOM.WW.Health.Akka;

/// Checks the local node's Akka cluster membership status.
/// Register with tag ZbHealthTags.Ready.
/// <remarks>
/// The ActorSystem is resolved lazily from the service provider. If the ActorSystem is not yet
/// available (e.g. during startup before Akka is initialised), the check returns Degraded rather
/// than throwing. This makes the check safe to register before Akka is fully up.
/// </remarks>
public sealed class AkkaClusterHealthCheck : IHealthCheck
{
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public AkkaClusterHealthCheck(
        IServiceProvider serviceProvider,
        AkkaClusterStatusPolicy policy);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default);
}

/// Maps Akka MemberStatus values to HealthStatus.
/// Two named presets cover the two existing implementations; construct a custom instance for
/// project-specific overrides.
public sealed class AkkaClusterStatusPolicy
{
    public AkkaClusterStatusPolicy(Func<MemberStatus, HealthStatus> evaluate);

    /// ScadaBridge origin: Up/Joining→Healthy, Leaving/Exiting→Degraded, else Unhealthy.
    /// Convergence target for all projects.
    public static AkkaClusterStatusPolicy Default { get; }

    /// OtOpcUa origin: self-Up-among-reachable-members→Healthy, else Degraded.
    /// Provided for backward compatibility during OtOpcUa migration.
    public static AkkaClusterStatusPolicy OtOpcUaCompat { get; }
}

/// Checks whether this node is the designated leader / active node.
/// Optional role parameter scopes the check to nodes carrying that role.
/// Register with tag ZbHealthTags.Active.
/// <remarks>
/// The ActorSystem is resolved lazily from the service provider. If the ActorSystem is not yet
/// available (e.g. during startup before Akka is initialised), the check returns Degraded rather
/// than throwing. This makes the check startup-safe.
/// </remarks>
public sealed class ActiveNodeHealthCheck : IHealthCheck
{
    /// Role-less constructor: Healthy = node is Up AND cluster leader (ScadaBridge ActiveNode pattern).
    /// Returns Degraded when ActorSystem/cluster is not yet ready.
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public ActiveNodeHealthCheck(IServiceProvider serviceProvider);

    /// Role-filtered constructor: Healthy = (node lacks the role) OR (node carries role AND is role-singleton leader).
    /// Degraded = node carries role but is not the role-singleton leader (OtOpcUa AdminRoleLeader pattern).
    /// Returns Degraded when ActorSystem/cluster is not yet ready.
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily so the check is
    /// startup-safe: if no ActorSystem is registered yet the result is Degraded.
    /// </param>
    public ActiveNodeHealthCheck(IServiceProvider serviceProvider, string role);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default);
}

/// IActiveNodeGate implementation that computes IsActiveNode directly from the ActorSystem
/// (SelfMember Up + cluster leader), null-guarded for startup safety.
/// Register as a singleton. Does NOT resolve ActiveNodeHealthCheck from DI.
public sealed class AkkaActiveNodeGate : IActiveNodeGate
{
    /// <param name="serviceProvider">
    /// The application service provider. ActorSystem is resolved lazily; if not yet available
    /// IsActiveNode returns false (safe default during startup).
    /// The gate checks SelfMember.Status == Up AND cluster.State.Leader == self.Address directly.
    /// </param>
    public AkkaActiveNodeGate(IServiceProvider serviceProvider);

    public bool IsActiveNode { get; }
}
```

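The `Default` preset's mapping (Up/Joining→Healthy, Leaving/Exiting→Degraded, else Unhealthy) is compact enough to write out. The sketch below shows one way the delegate passed to `AkkaClusterStatusPolicy` might look; `ClusterPolicySketch` and `EvaluateDefault` are hypothetical names, and only the `Akka.Cluster.MemberStatus` and `HealthStatus` enums are real types.

```csharp
using Akka.Cluster;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical sketch of the delegate behind the Default preset.
public static class ClusterPolicySketch
{
    // Mirrors the documented rule: Up/Joining are healthy, Leaving/Exiting are
    // degraded, and every other status (for example Down or Removed) is unhealthy.
    public static HealthStatus EvaluateDefault(MemberStatus status) => status switch
    {
        MemberStatus.Up or MemberStatus.Joining => HealthStatus.Healthy,
        MemberStatus.Leaving or MemberStatus.Exiting => HealthStatus.Degraded,
        _ => HealthStatus.Unhealthy,
    };
}
```

Under the contract above, a project-specific override would presumably be constructed the same way, for example `new AkkaClusterStatusPolicy(ClusterPolicySketch.EvaluateDefault)`.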
## `ZB.MOM.WW.Health.EntityFrameworkCore`

```csharp
namespace ZB.MOM.WW.Health.EntityFrameworkCore;

/// Checks database reachability via an EF Core DbContext.
/// Default probe: context.Database.CanConnectAsync() (ScadaBridge pattern).
/// Supply a custom probe delegate for query-based validation (OtOpcUa "query Deployments" pattern).
/// Register with tag ZbHealthTags.Ready.
public sealed class DatabaseHealthCheck<TContext> : IHealthCheck
    where TContext : DbContext
{
    // Resolves IDbContextFactory<TContext> when registered, else a scoped TContext; pool-safe.
    public DatabaseHealthCheck(
        IServiceProvider serviceProvider,
        DatabaseHealthCheckOptions<TContext>? options = null);

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default);
}

/// Options for DatabaseHealthCheck<TContext>.
public sealed class DatabaseHealthCheckOptions<TContext>
    where TContext : DbContext
{
    /// Override the default CanConnectAsync() probe with a custom query-based probe.
    /// Throw to signal failure; return normally to signal success.
    /// Example: <c>db => db.Deployments.AsNoTracking().Take(1).ToListAsync()</c>
    public Func<TContext, CancellationToken, Task>? ProbeQuery { get; set; }

    public TimeSpan Timeout { get; set; } = TimeSpan.FromSeconds(10);
}
```

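The constructor comment above describes a two-step resolution: prefer `IDbContextFactory<TContext>` when one is registered, otherwise create a DI scope and resolve `TContext` from it. The sketch below illustrates that order together with the default `CanConnectAsync()` probe and the 10-second timeout. It is an illustration under assumptions, not the proposed implementation: `DatabaseProbeSketch` is a hypothetical name, the custom `ProbeQuery` path is left out, and mapping failures to `context.Registration.FailureStatus` is a guess at the intended behaviour.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical sketch of the factory-first, scope-fallback resolution.
public sealed class DatabaseProbeSketch<TContext> : IHealthCheck
    where TContext : DbContext
{
    private readonly IServiceProvider _services;
    private readonly TimeSpan _timeout;

    public DatabaseProbeSketch(IServiceProvider services, TimeSpan? timeout = null)
    {
        _services = services;
        _timeout = timeout ?? TimeSpan.FromSeconds(10);
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // Combine the caller's token with the probe timeout.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(_timeout);

        try
        {
            // Preferred path: a factory yields a short-lived context owned by this probe.
            var factory = _services.GetService<IDbContextFactory<TContext>>();
            if (factory is not null)
            {
                await using var db = await factory.CreateDbContextAsync(cts.Token);
                return await ProbeAsync(db, context, cts.Token);
            }

            // Fallback: resolve the context inside its own scope so it is released afterwards.
            await using var scope = _services.CreateAsyncScope();
            var scoped = scope.ServiceProvider.GetRequiredService<TContext>();
            return await ProbeAsync(scoped, context, cts.Token);
        }
        catch (Exception ex)
        {
            // Covers missing registrations, provider errors, and timeouts alike.
            return new HealthCheckResult(
                context.Registration.FailureStatus, "Database probe failed.", ex);
        }
    }

    private static async Task<HealthCheckResult> ProbeAsync(
        TContext db, HealthCheckContext context, CancellationToken cancellationToken) =>
        await db.Database.CanConnectAsync(cancellationToken)
            ? HealthCheckResult.Healthy()
            : new HealthCheckResult(
                context.Registration.FailureStatus, "Database is unreachable.");
}
```

Creating a scope per probe is what lets a health check registered as a singleton use a scoped `TContext` without holding on to it between checks.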
## Consumer matrix summary

| Consumer | Packages | Notes |
|---|---|---|
| **MxGateway** | `ZB.MOM.WW.Health` (core only) | `GrpcDependencyHealthCheck` on the worker channel; all three tiers via `MapZbHealth()`; `IActiveNodeGate` not needed (not Akka-based) |
| **OtOpcUa** | All three | `AkkaClusterHealthCheck` + `OtOpcUaCompat` preset → `Default` on convergence; `ActiveNodeHealthCheck(role: "admin")`; `DatabaseHealthCheck<T>` with `ProbeQuery` delegate |
| **ScadaBridge** | All three | `AkkaClusterHealthCheck` + `Default` policy; `ActiveNodeHealthCheck` (role-less); `DatabaseHealthCheck<T>` default probe; `AkkaActiveNodeGate` replaces inline `ActiveNodeGate` |

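Every consumer in the matrix relies on the same tier convention: a check is registered with a tag, and each endpoint runs only the checks carrying its tag. Since `MapZbHealth()` does not exist yet, the sketch below shows the equivalent wiring with stock ASP.NET Core APIs only (`MapHealthChecks` with a tag predicate). `TierMappingSketch`, `ForTag`, and `MapThreeTiers` are hypothetical names; the paths and tag strings are the defaults from the contract.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Routing;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper built on stock APIs; not the library's implementation.
public static class TierMappingSketch
{
    // Selects only the registrations that carry the given tag.
    public static Func<HealthCheckRegistration, bool> ForTag(string tag) =>
        registration => registration.Tags.Contains(tag);

    // Maps one endpoint per tier; each endpoint runs only its own tagged checks.
    // The caller must already have called services.AddHealthChecks().
    public static void MapThreeTiers(this IEndpointRouteBuilder endpoints)
    {
        endpoints.MapHealthChecks(
            "/health/ready", new HealthCheckOptions { Predicate = ForTag("ready") });
        endpoints.MapHealthChecks(
            "/health/active", new HealthCheckOptions { Predicate = ForTag("active") });
        endpoints.MapHealthChecks(
            "/healthz", new HealthCheckOptions { Predicate = ForTag("live") });
    }
}
```

On the registration side, a consumer would tag each probe when adding it, for example `builder.Services.AddHealthChecks().AddCheck("self", () => HealthCheckResult.Healthy(), tags: new[] { "live" })`.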
## Open contract questions

1. **`IActiveNodeGate` for non-Akka nodes:** MxGateway does not need active-node gating today.
   If a future MxGateway cluster requires it, the interface is in the core package and can be
   implemented without an Akka dependency. Validate whether a stub `AlwaysActiveGate` (returns
   `true`) should ship in core for single-node deployments.
2. **`AkkaActiveNodeGate` caching:** `IsActiveNode` is a synchronous property; the underlying
   cluster-state read is synchronous but the ActorSystem lookup is lazy. Validate whether the
   gate should cache the computed value on a short TTL (e.g. 5 s) to reduce Akka.Cluster API
   overhead on high-frequency API routing checks, or whether reading `SelfMember`/`State.Leader`
   directly on every call is acceptable.

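To make both questions concrete, here is a minimal sketch of what the two candidates could look like: the `AlwaysActiveGate` stub from question 1 and a TTL-caching wrapper for question 2. `CachedActiveNodeGate` is a hypothetical name, and taking the underlying read as a `Func<bool>` is an illustrative choice rather than part of the contract. `IActiveNodeGate` is repeated so the block stands alone.

```csharp
using System;

// Repeated from the core contract so this sketch is self-contained.
public interface IActiveNodeGate
{
    bool IsActiveNode { get; }
}

// Question 1: stub for single-node deployments, where the node is always active.
public sealed class AlwaysActiveGate : IActiveNodeGate
{
    public bool IsActiveNode => true;
}

// Question 2: wraps an underlying read and reuses its result for a short TTL.
public sealed class CachedActiveNodeGate : IActiveNodeGate
{
    private readonly Func<bool> _read;
    private readonly TimeSpan _ttl;
    private readonly TimeProvider _time;
    private readonly object _sync = new();
    private long _lastReadTimestamp;
    private bool _hasValue;
    private bool _value;

    public CachedActiveNodeGate(Func<bool> read, TimeSpan ttl, TimeProvider? time = null)
    {
        _read = read;
        _ttl = ttl;
        _time = time ?? TimeProvider.System;
    }

    public bool IsActiveNode
    {
        get
        {
            lock (_sync)
            {
                // Refresh on first use and whenever the cached value is older than the TTL.
                if (!_hasValue || _time.GetElapsedTime(_lastReadTimestamp) >= _ttl)
                {
                    _value = _read();
                    _lastReadTimestamp = _time.GetTimestamp();
                    _hasValue = true;
                }

                return _value;
            }
        }
    }
}
```

A wrapper like this would be composed as `new CachedActiveNodeGate(() => inner.IsActiveNode, TimeSpan.FromSeconds(5))`. The cost of caching is staleness: after a failover, a node can keep reporting its previous state for up to one TTL, which is the trade-off question 2 needs to weigh against the per-call read.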
See [`../GAPS.md`](../GAPS.md) for the adoption order and effort/risk.