Phase 6.1 Stream C — health endpoints on :4841 + LogContextEnricher + Serilog JSON sink + CapabilityInvoker enrichment
Closes Stream C per docs/v2/implementation/phase-6-1-resilience-and-observability.md. Core.Observability (new namespace): - DriverHealthReport — pure-function aggregation over DriverHealthSnapshot list. Empty fleet = Healthy. Any Faulted = Faulted. Any Unknown/Initializing (no Faulted) = NotReady. Any Degraded or Reconnecting (no Faulted, no NotReady) = Degraded. Else Healthy. HttpStatus(verdict) maps to the Stream C.1 state matrix: Healthy/Degraded → 200, NotReady/Faulted → 503. - LogContextEnricher — Serilog LogContext wrapper. Push(id, type, capability, correlationId) returns an IDisposable scope; inner log calls carry DriverInstanceId / DriverType / CapabilityName / CorrelationId structured properties automatically. NewCorrelationId = 12-hex-char GUID slice for cases where no OPC UA RequestHeader.RequestHandle is in flight. CapabilityInvoker — now threads LogContextEnricher around every ExecuteAsync / ExecuteWriteAsync call site. OtOpcUaServer passes driver.DriverType through so logs correlate to the driver type too. Every capability call emits structured fields per the Stream C.4 compliance check. Server.Observability: - HealthEndpointsHost — standalone HttpListener on http://localhost:4841/ (loopback avoids Windows URL-ACL elevation; remote probing via reverse proxy or explicit netsh urlacl grant). Routes: /healthz → 200 when (configDbReachable OR usingStaleConfig); 503 otherwise. Body: status, uptimeSeconds, configDbReachable, usingStaleConfig. /readyz → DriverHealthReport.Aggregate + HttpStatus mapping. Body: verdict, drivers[], degradedDrivers[], uptimeSeconds. anything else → 404. Disposal cooperative with the HttpListener shutdown. - OpcUaApplicationHost starts the health host after the OPC UA server comes up and disposes it on shutdown. New OpcUaServerOptions knobs: HealthEndpointsEnabled (default true), HealthEndpointsPrefix (default http://localhost:4841/). Program.cs: - Serilog pipeline adds Enrich.FromLogContext + opt-in JSON file sink via `Serilog:WriteJson = true` appsetting. Uses Serilog.Formatting.Compact's CompactJsonFormatter (one JSON object per line — SIEMs like Splunk, Datadog, Graylog ingest without a regex parser). Server.Tests: - Existing 3 OpcUaApplicationHost integration tests now set HealthEndpointsEnabled=false to avoid port :4841 collisions under parallel execution. - New HealthEndpointsHostTests (9): /healthz healthy empty fleet; stale-config returns 200 with flag; unreachable+no-cache returns 503; /readyz empty/ Healthy/Faulted/Degraded/Initializing drivers return correct status and bodies; unknown path → 404. Uses ephemeral ports via Interlocked counter. Core.Tests: - DriverHealthReportTests (8): empty fleet, all-healthy, any-Faulted trumps, any-NotReady without Faulted, Degraded without Faulted/NotReady, HttpStatus per-verdict theory. - LogContextEnricherTests (8): all 4 properties attach; scope disposes cleanly; NewCorrelationId shape; null/whitespace driverInstanceId throws. - CapabilityInvokerEnrichmentTests (2): inner logs carry structured properties; no context leak outside the call site. Full solution dotnet test: 1016 passing (baseline 906, +110 for Phase 6.1 so far across Streams A+B+C). Pre-existing Client.CLI Subscribe flake unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,86 @@
|
||||
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
|
||||
|
||||
namespace ZB.MOM.WW.OtOpcUa.Core.Observability;
|
||||
|
||||
/// <summary>
|
||||
/// Domain-layer health aggregation for Phase 6.1 Stream C. Pure functions over the driver
|
||||
/// fleet — given each driver's <see cref="DriverState"/>, produce a <see cref="ReadinessVerdict"/>
|
||||
/// that maps to HTTP status codes at the endpoint layer.
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// State matrix per <c>docs/v2/implementation/phase-6-1-resilience-and-observability.md</c>
|
||||
/// §Stream C.1:
|
||||
/// <list type="bullet">
|
||||
/// <item><see cref="DriverState.Unknown"/> / <see cref="DriverState.Initializing"/>
|
||||
/// → /readyz 503 (not yet ready).</item>
|
||||
/// <item><see cref="DriverState.Healthy"/> → /readyz 200.</item>
|
||||
/// <item><see cref="DriverState.Degraded"/> → /readyz 200 with flagged driver IDs.</item>
|
||||
/// <item><see cref="DriverState.Faulted"/> → /readyz 503.</item>
|
||||
/// </list>
|
||||
/// The overall verdict is computed across the fleet: any Faulted → Faulted; any
|
||||
/// Unknown/Initializing → NotReady; any Degraded → Degraded; else Healthy. An empty fleet
|
||||
/// is Healthy (nothing to degrade).
|
||||
/// </remarks>
|
||||
public static class DriverHealthReport
|
||||
{
|
||||
/// <summary>Compute the fleet-wide readiness verdict from per-driver states.</summary>
|
||||
public static ReadinessVerdict Aggregate(IReadOnlyList<DriverHealthSnapshot> drivers)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(drivers);
|
||||
if (drivers.Count == 0) return ReadinessVerdict.Healthy;
|
||||
|
||||
var anyFaulted = drivers.Any(d => d.State == DriverState.Faulted);
|
||||
if (anyFaulted) return ReadinessVerdict.Faulted;
|
||||
|
||||
var anyInitializing = drivers.Any(d =>
|
||||
d.State == DriverState.Unknown || d.State == DriverState.Initializing);
|
||||
if (anyInitializing) return ReadinessVerdict.NotReady;
|
||||
|
||||
// Reconnecting = driver alive but not serving live data; report as Degraded so /readyz
|
||||
// stays 200 (the fleet can still serve cached / last-good data) while operators see the
|
||||
// affected driver in the body.
|
||||
var anyDegraded = drivers.Any(d =>
|
||||
d.State == DriverState.Degraded || d.State == DriverState.Reconnecting);
|
||||
if (anyDegraded) return ReadinessVerdict.Degraded;
|
||||
|
||||
return ReadinessVerdict.Healthy;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Map a <see cref="ReadinessVerdict"/> to the HTTP status the /readyz endpoint should
|
||||
/// return per the Stream C.1 state matrix.
|
||||
/// </summary>
|
||||
public static int HttpStatus(ReadinessVerdict verdict) => verdict switch
|
||||
{
|
||||
ReadinessVerdict.Healthy => 200,
|
||||
ReadinessVerdict.Degraded => 200,
|
||||
ReadinessVerdict.NotReady => 503,
|
||||
ReadinessVerdict.Faulted => 503,
|
||||
_ => 500,
|
||||
};
|
||||
}
|
||||
|
||||
/// <summary>Per-driver snapshot fed into <see cref="DriverHealthReport.Aggregate"/>.</summary>
|
||||
/// <param name="DriverInstanceId">Driver instance identifier (from <c>IDriver.DriverInstanceId</c>).</param>
|
||||
/// <param name="State">Current <see cref="DriverState"/> from <c>IDriver.GetHealth</c>.</param>
|
||||
/// <param name="DetailMessage">Optional driver-supplied detail (e.g. "primary PLC unreachable").</param>
|
||||
public sealed record DriverHealthSnapshot(
|
||||
string DriverInstanceId,
|
||||
DriverState State,
|
||||
string? DetailMessage = null);
|
||||
|
||||
/// <summary>Overall fleet readiness — derived from driver states by <see cref="DriverHealthReport.Aggregate"/>.</summary>
|
||||
public enum ReadinessVerdict
|
||||
{
|
||||
/// <summary>All drivers Healthy (or fleet is empty).</summary>
|
||||
Healthy,
|
||||
|
||||
/// <summary>At least one driver Degraded; none Faulted / NotReady.</summary>
|
||||
Degraded,
|
||||
|
||||
/// <summary>At least one driver Unknown / Initializing; none Faulted.</summary>
|
||||
NotReady,
|
||||
|
||||
/// <summary>At least one driver Faulted.</summary>
|
||||
Faulted,
|
||||
}
|
||||
Reference in New Issue
Block a user