Closes task #116 (GA hardening backlog). Before this commit the
RedundancyStatePublisher saw PeerReachability.Unknown for every peer
because nothing wrote to the reachability tracker, so even fully
reachable peers were degraded to the Isolated-Primary band (230).
This was not release-blocking (Unknown is a safe default), but it fell
short of the full non-transparent-redundancy UX.
Two-layer probe model per docs/v2/implementation/phase-6-3-redundancy-runtime.md
§Stream B:
- PeerHttpProbeLoop (Stream B.1): fast-fail layer, 2 s interval / 1 s
timeout. Hits each peer's http://{Host}:{DashboardPort}/healthz via an
injected IHttpClientFactory. Writes the HTTP bit of PeerReachability
while preserving the UA bit from the last UA probe, so a transient HTTP
blip doesn't clobber the authoritative UA reading.
- PeerUaProbeLoop (Stream B.2): authoritative layer, 10 s interval / 5 s
timeout. Calls DiscoveryClient.GetEndpoints against
opc.tcp://{Host}:{OpcUaPort}, which is cheap compared to a full
Session.Create and requires no certificate trust. Short-circuits when
the HTTP probe last reported the peer unhealthy (no wasted handshakes
on a known-dead endpoint), clearing the stale UaHealthy bit in that case.
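A minimal Python sketch of how the two layers can share one reachability
record, assuming PeerReachability is modelled as two tri-state bits
(None meaning never probed, i.e. Unknown). Reachability,
apply_http_result and apply_ua_probe are hypothetical names, not the C#
types in this commit; probe_ua stands in for the GetEndpoints call.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Reachability:
    # Hypothetical stand-in for PeerReachability; None = never probed (Unknown).
    http_healthy: Optional[bool] = None
    ua_healthy: Optional[bool] = None


def apply_http_result(prev: Reachability, http_ok: bool) -> Reachability:
    # Stream B.1: overwrite only the HTTP bit; the last UA reading survives a blip.
    return replace(prev, http_healthy=http_ok)


def apply_ua_probe(prev: Reachability, probe_ua: Callable[[], bool]) -> Reachability:
    # Stream B.2: if HTTP last said the peer is down, skip the handshake
    # and clear the now-stale UA bit instead of trusting it.
    if prev.http_healthy is False:
        return replace(prev, ua_healthy=False)
    return replace(prev, ua_healthy=probe_ua())
```

The asymmetry is deliberate in this sketch: the HTTP layer never touches
the UA bit, but the UA layer clears it when HTTP reports the peer down.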
Both loops inherit from BackgroundService, follow the tick/delay/catch
pattern established by RedundancyPublisherHostedService and
ResilienceStatusPublisherHostedService, and expose an internal
TickAsync() so tests can drive a single probe cycle directly.
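The tick/delay/catch shape can be sketched with Python asyncio as a
rough analogue of a BackgroundService loop. ProbeLoop, tick() and run()
are hypothetical names; tick() plays the role of the internal
TickAsync(), and the per-tick timeout mirrors the 1 s / 5 s probe
timeouts.

```python
import asyncio
import logging

log = logging.getLogger("peer-probe")


class ProbeLoop:
    """Hypothetical asyncio analogue of a BackgroundService probe loop."""

    def __init__(self, interval_s: float, timeout_s: float, probe):
        self.interval_s = interval_s
        self.timeout_s = timeout_s
        self._probe = probe  # async callable performing one probe cycle
        self.ticks = 0

    async def tick(self) -> None:
        # One probe cycle, callable directly from tests (like TickAsync()).
        self.ticks += 1
        await self._probe()

    async def run(self, stop: asyncio.Event) -> None:
        while not stop.is_set():
            try:
                # Tick: bound each probe by its timeout.
                await asyncio.wait_for(self.tick(), timeout=self.timeout_s)
            except Exception:
                # Catch: a failed or timed-out probe must not kill the loop.
                # CancelledError derives from BaseException (3.8+), so
                # shutdown cancellation still propagates.
                log.exception("probe tick failed")
            try:
                # Delay: sleep one interval, waking early if stop is requested.
                await asyncio.wait_for(stop.wait(), timeout=self.interval_s)
            except asyncio.TimeoutError:
                pass
```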
A new PeerProbeOptions class carries the two intervals and two timeouts
so operators can tune probe cadence per site; it is registered as a
singleton in Program.cs. The HTTP client is registered by name so the
OtOpcUa handler chain (Serilog enrichers, possibly OpenTelemetry
instrumentation later) isn't bypassed.
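For illustration, a Python stand-in for PeerProbeOptions. The field
names are hypothetical, the defaults are assumed to match the 2 s / 1 s
and 10 s / 5 s cadences above, and the validation rule is an assumption
rather than something this commit states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerProbeOptions:
    # Hypothetical field names; defaults assumed from the B.1 / B.2 cadences.
    http_interval_s: float = 2.0
    http_timeout_s: float = 1.0
    ua_interval_s: float = 10.0
    ua_timeout_s: float = 5.0

    def __post_init__(self) -> None:
        # Assumed sanity rule: each timeout is positive and shorter than its
        # interval, so a hung peer cannot stall a loop for more than one tick.
        for name, interval, timeout in (
            ("http", self.http_interval_s, self.http_timeout_s),
            ("ua", self.ua_interval_s, self.ua_timeout_s),
        ):
            if not 0 < timeout < interval:
                raise ValueError(
                    f"{name}: need 0 < timeout ({timeout}) < interval ({interval})"
                )
```

Validating at construction means a misconfigured site fails at startup
instead of silently probing with a timeout longer than its interval.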
Tests: 9 new unit tests across PeerHttpProbeLoopTests (5) and
PeerUaProbeLoopTests (4), all passing. Server.Tests total goes from
243 to 252. The full solution builds clean.
Docs: in v2-release-readiness.md, the peer-probe bullet in the Phase 6.3
follow-ups list is now struck through, with a close-out note.
Still deferred in Phase 6.3:
- OPC UA variable-node binding (task #117 — ServiceLevel + ServerUriArray)
- sp_PublishGeneration lease wrap (task #118)
- Client interop matrix (task #119)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>