Closes Stream C per docs/v2/implementation/phase-6-1-resilience-and-observability.md.
Core.Observability (new namespace):
- DriverHealthReport — pure-function aggregation over DriverHealthSnapshot list.
Empty fleet = Healthy. Any Faulted = Faulted. Any Unknown/Initializing (no
Faulted) = NotReady. Any Degraded or Reconnecting (no Faulted, no NotReady)
= Degraded. Else Healthy. HttpStatus(verdict) maps to the Stream C.1 state
matrix: Healthy/Degraded → 200, NotReady/Faulted → 503.
- LogContextEnricher — Serilog LogContext wrapper. Push(id, type, capability,
correlationId) returns an IDisposable scope; inner log calls carry
DriverInstanceId / DriverType / CapabilityName / CorrelationId structured
properties automatically. NewCorrelationId = 12-hex-char GUID slice for
cases where no OPC UA RequestHeader.RequestHandle is in flight.
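
The precedence rules for DriverHealthReport can be sketched as below. This is a minimal illustration, not the shipped code: the DriverState and Verdict enums and the HealthSketch class are hypothetical stand-ins, and the real aggregation works over DriverHealthSnapshot objects rather than bare states.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: one Faulted driver outweighs everything else in the fleet.
var fleet = new[] { DriverState.Healthy, DriverState.Reconnecting, DriverState.Faulted };
var verdict = HealthSketch.Aggregate(fleet);
Console.WriteLine($"{verdict} -> {HealthSketch.HttpStatus(verdict)}"); // Faulted -> 503

// Hypothetical enums; names mirror the states listed in the commit message.
enum DriverState { Unknown, Initializing, Healthy, Degraded, Reconnecting, Faulted }
enum Verdict { Healthy, Degraded, NotReady, Faulted }

static class HealthSketch
{
    // Pure function: the verdict depends only on the states passed in.
    public static Verdict Aggregate(IReadOnlyCollection<DriverState> states)
    {
        if (states.Count == 0) return Verdict.Healthy;                    // empty fleet
        if (states.Contains(DriverState.Faulted)) return Verdict.Faulted; // Faulted trumps all
        if (states.Any(s => s is DriverState.Unknown or DriverState.Initializing))
            return Verdict.NotReady;
        if (states.Any(s => s is DriverState.Degraded or DriverState.Reconnecting))
            return Verdict.Degraded;
        return Verdict.Healthy;
    }

    // Healthy/Degraded map to 200, NotReady/Faulted map to 503.
    public static int HttpStatus(Verdict verdict) => verdict switch
    {
        Verdict.Healthy or Verdict.Degraded => 200,
        _ => 503,
    };
}
```

Checking the conditions from most to least severe keeps each rule a single line and makes the "no Faulted, no NotReady" qualifiers implicit in the ordering.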
CapabilityInvoker — now threads LogContextEnricher around every ExecuteAsync /
ExecuteWriteAsync call site. OtOpcUaServer passes driver.DriverType through
so logs correlate to the driver type too. Every capability call emits
structured fields per the Stream C.4 compliance check.
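
For call sites that have no RequestHandle to reuse, the 12-hex-char fallback id could look roughly like this. The CorrelationSketch class is hypothetical, and taking the first 12 characters (rather than some other slice of the GUID) is an assumption.

```csharp
using System;

string id = CorrelationSketch.NewCorrelationId();
Console.WriteLine(id.Length); // 12

static class CorrelationSketch
{
    // "N" formats the GUID as 32 lowercase hex digits with no hyphens;
    // the range operator keeps the first 12 of them (assumed slice).
    public static string NewCorrelationId() => Guid.NewGuid().ToString("N")[..12];
}
```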
Server.Observability:
- HealthEndpointsHost — standalone HttpListener on http://localhost:4841/
(loopback avoids Windows URL-ACL elevation; remote probing via reverse
proxy or explicit netsh urlacl grant). Routes:
/healthz → 200 when (configDbReachable OR usingStaleConfig); 503 otherwise.
Body: status, uptimeSeconds, configDbReachable, usingStaleConfig.
/readyz → DriverHealthReport.Aggregate + HttpStatus mapping.
Body: verdict, drivers[], degradedDrivers[], uptimeSeconds.
anything else → 404.
Disposal is cooperative with HttpListener shutdown.
- OpcUaApplicationHost starts the health host after the OPC UA server comes up
and disposes it on shutdown. New OpcUaServerOptions knobs:
HealthEndpointsEnabled (default true), HealthEndpointsPrefix (default
http://localhost:4841/).
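
The route table above reduces to a small status-code decision, sketched here without the HttpListener plumbing. The HealthRoutes class and its parameters are hypothetical; readyzStatus stands in for the result of DriverHealthReport.Aggregate followed by HttpStatus.

```csharp
using System;

// Usage: the config DB is down, but a stale cached config keeps /healthz at 200.
Console.WriteLine(HealthRoutes.StatusFor("/healthz", configDbReachable: false,
    usingStaleConfig: true, readyzStatus: 503)); // 200

static class HealthRoutes
{
    public static int StatusFor(string path, bool configDbReachable,
        bool usingStaleConfig, int readyzStatus) => path switch
    {
        // Live when the config DB is reachable OR a stale config is in use.
        "/healthz" => (configDbReachable || usingStaleConfig) ? 200 : 503,
        // Ready status is whatever the driver aggregation already decided.
        "/readyz" => readyzStatus,
        // Anything else is not a health route.
        _ => 404,
    };
}
```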
Program.cs:
- Serilog pipeline adds Enrich.FromLogContext + opt-in JSON file sink via
`Serilog:WriteJson = true` appsetting. Uses Serilog.Formatting.Compact's
CompactJsonFormatter (one JSON object per line — SIEMs like Splunk,
Datadog, Graylog ingest without a regex parser).
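
In appsettings.json, the colon-separated key `Serilog:WriteJson` corresponds to a nested section. A minimal fragment enabling the JSON sink would look roughly like this; any other keys in the Serilog section are omitted here.

```json
{
  "Serilog": {
    "WriteJson": true
  }
}
```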
Server.Tests:
- The 3 existing OpcUaApplicationHost integration tests now set
HealthEndpointsEnabled=false to avoid collisions on port 4841 under parallel
execution.
- New HealthEndpointsHostTests (9): /healthz healthy empty fleet; stale-config
returns 200 with flag; unreachable+no-cache returns 503; /readyz empty/
Healthy/Faulted/Degraded/Initializing drivers return correct status and
bodies; unknown path → 404. Uses ephemeral ports via Interlocked counter.
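
Handing each test its own listener prefix from an Interlocked counter might look like the sketch below. The TestPorts class and the 48410 base value are hypothetical. The counter only guarantees that tests in the same process get distinct ports, not that the operating system has the port free.

```csharp
using System;
using System.Threading;

// Usage: each call yields a distinct prefix, even when tests run in parallel.
Console.WriteLine(TestPorts.NextPrefix()); // http://localhost:48411/
Console.WriteLine(TestPorts.NextPrefix()); // http://localhost:48412/

static class TestPorts
{
    // Hypothetical base; chosen here only to stay clear of the default 4841.
    private static int _next = 48410;

    // Interlocked.Increment is atomic and returns the incremented value,
    // so two concurrent callers can never receive the same port number.
    public static string NextPrefix() =>
        $"http://localhost:{Interlocked.Increment(ref _next)}/";
}
```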
Core.Tests:
- DriverHealthReportTests (8): empty fleet, all-healthy, any-Faulted trumps,
any-NotReady without Faulted, Degraded without Faulted/NotReady, HttpStatus
per-verdict theory.
- LogContextEnricherTests (8): all 4 properties attach; scope disposes cleanly;
NewCorrelationId shape; null/whitespace driverInstanceId throws.
- CapabilityInvokerEnrichmentTests (2): inner logs carry structured
properties; no context leak outside the call site.
Full solution dotnet test: 1016 passing (baseline 906, +110 for Phase 6.1 so
far across Streams A+B+C). Pre-existing Client.CLI Subscribe flake unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No production code changes; the tests are purely additive. Server.Tests
Integration: 3 new tests pass; existing OpcUaServerIntegrationTests stays green
(the single-driver case is still exercised there). Full Server.Tests Unit still
43 / 0.
Deferred: multi-driver alarm-event case (two drivers each raising a
GalaxyAlarmEvent, asserting that each condition lands on its owning instance's
condition node). It needs a stub IAlarmSource and is worth its own focused PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>