Phase 3 PR 74 -- OPC UA Client transparent reconnect via SessionReconnectHandler #73

Merged
dohertj2 merged 1 commit from phase-3-pr74-opcua-client-session-reconnect into v2 2026-04-19 02:06:49 -04:00

Summary

Before this PR a keep-alive failure flipped HostState to Stopped and stayed there. Now the driver auto-retries via the SDK's SessionReconnectHandler.

  • OnKeepAlive on bad status spins up _reconnectHandler (lazy, single-instance) + calls BeginReconnect with ReconnectPeriod (default 5s).
  • OnReconnectComplete reads handler.Session, unwires keep-alive from the dead session, rewires to the new one (without this the next drop wouldn't trigger another reconnect), disposes the handler, flips back to Running.
  • Session.TransferSubscriptionsOnReconnect=true (SDK default) handles subscription migration automatically — local MonitoredItem handles stay live across the bounce.
  • ShutdownAsync now calls CancelReconnect() + Dispose before Session.CloseAsync to prevent the retry loop fighting the close.
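The keep-alive/reconnect wiring above can be sketched as follows. This is a minimal sketch assuming the OPC Foundation .NET Standard SDK (`Opc.Ua.Client`); the field and option names come from the PR text, but the class shape, the `_reconnectPeriodMs` field, and the exact handler constructor overload are illustrative assumptions, not the PR's actual code.

```csharp
using System;
using Opc.Ua;
using Opc.Ua.Client;

public sealed partial class OpcUaClientDriver
{
    private ISession _session = null!;           // set during startup (not shown)
    private SessionReconnectHandler? _reconnectHandler;
    private readonly int _reconnectPeriodMs = 5_000;  // driver option, default 5s

    private void OnKeepAlive(ISession session, KeepAliveEventArgs e)
    {
        if (!ServiceResult.IsBad(e.Status))
            return;

        // Lazy, single-instance: later bad keep-alives during the same
        // outage no-op instead of stacking handlers.
        if (_reconnectHandler == null)
        {
            _reconnectHandler = new SessionReconnectHandler(
                reconnectAbort: false, maxReconnectPeriod: 120_000);
            _reconnectHandler.BeginReconnect(
                session, _reconnectPeriodMs, OnReconnectComplete);
        }
    }

    private void OnReconnectComplete(object? sender, EventArgs e)
    {
        // Ignore callbacks from an already-replaced handler.
        if (!ReferenceEquals(sender, _reconnectHandler) || _reconnectHandler == null)
            return;

        var newSession = (ISession)_reconnectHandler.Session;

        _session.KeepAlive -= OnKeepAlive;   // unwire the dead session
        newSession.KeepAlive += OnKeepAlive; // rewire, or the next drop is silent
        _session = newSession;

        _reconnectHandler.Dispose();
        _reconnectHandler = null;
    }
}
```

The null-check in `OnKeepAlive` is what makes the handler single-instance: a long outage produces many bad keep-alives, and only the first one may start a reconnect cycle.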

Validation

  • 54/54 OpcUaClient.Tests pass (3 new reconnect tests)
  • dotnet build: 0 errors

Scope

Live disconnect-revive wire coverage deferred to the in-process-fixture PR.

Test plan

  • Default ReconnectPeriod = 5s
  • Override tested
  • Pre-init handler null (lazy instantiation)
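The first two test-plan items might look roughly like the sketch below, assuming xUnit and an `OpcUaClientDriverOptions` type whose `ReconnectPeriod` is a `TimeSpan` (the option name is from the PR text; its exact type and shape are assumptions for illustration).

```csharp
using System;
using Xunit;

public class OpcUaClientReconnectTests
{
    [Fact]
    public void Default_ReconnectPeriod_matches_driver_specs_5_seconds()
    {
        // Sanity check on the options default (driver-specs.md §8).
        var options = new OpcUaClientDriverOptions();
        Assert.Equal(TimeSpan.FromSeconds(5), options.ReconnectPeriod);
    }

    [Fact]
    public void Options_ReconnectPeriod_is_configurable_for_aggressive_retry()
    {
        // Aggressive 500ms override for fast-failover deployments.
        var options = new OpcUaClientDriverOptions
        {
            ReconnectPeriod = TimeSpan.FromMilliseconds(500)
        };
        Assert.Equal(500, options.ReconnectPeriod.TotalMilliseconds);
    }
}
```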
dohertj2 added 1 commit 2026-04-19 02:06:45 -04:00
Phase 3 PR 74 -- OPC UA Client transparent reconnect via SessionReconnectHandler.

Before this PR a session keep-alive failure flipped HostState to Stopped and stayed there until operator intervention. PR 74 wires the SDK's SessionReconnectHandler so the driver automatically retries + swaps in a new session when the upstream server comes back.

New _reconnectHandler field lazily instantiated inside OnKeepAlive on a bad status; subsequent bad keep-alives during the same outage no-op (null-check prevents stacked handlers). Constructor uses (telemetry:null, reconnectAbort:false, maxReconnectPeriod:2min) -- reconnectAbort=false so the handler keeps trying across many retry cycles; the 2min cap prevents pathological back-off from starving operator visibility. BeginReconnect takes the current ISession + ReconnectPeriod (from OpcUaClientDriverOptions, default 5s per driver-specs.md §8) + our OnReconnectComplete callback.

OnReconnectComplete reads handler.Session for the new session, unwires keep-alive from the dead session, rewires to the new session (without this the NEXT drop wouldn't trigger another reconnect -- subtle and critical), swaps Session, disposes the handler. The SDK's Session.TransferSubscriptionsOnReconnect default=true handles subscription migration internally, so local MonitoredItem handles stay live across the reconnect; no driver-side manual transfer needed.

The shutdown path now aborts any in-flight reconnect via _reconnectHandler.CancelReconnect() + Dispose BEFORE touching Session.CloseAsync -- without this the handler's retry loop holds a reference to the about-to-close session and fights the close, producing BadSessionIdInvalid noise in the upstream log and potential disposal-race exceptions. Cancel-first is the documented SDK pattern.

Kept the driver's own HostState/OnHostStatusChanged flow: bad keep-alive -> Stopped transition + reconnect kicks off; OnReconnectComplete -> Running transition + Healthy status. Downstream consumers see the bounce as Stopped->Running without needing to know about the reconnect handler internals.

Unit tests (OpcUaClientReconnectTests, 3 facts): Default_ReconnectPeriod_matches_driver_specs_5_seconds (sanity check on the options default), Options_ReconnectPeriod_is_configurable_for_aggressive_or_relaxed_retry (500ms override works), Driver_starts_with_no_reconnect_handler_active_pre_init (lazy instantiation -- indirectly via lifecycle). Wire-level disconnect-reconnect-resume coverage against a live upstream server is deferred to the in-process-fixture PR -- testing the reconnect path needs a server we can kill + revive mid-test, non-trivial to scaffold in xUnit. 54/54 OpcUaClient.Tests pass (51 prior + 3 reconnect). dotnet build clean. ba3a5598e1
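The cancel-first shutdown ordering described above might look like this sketch. The method name `ShutdownAsync` and the `_reconnectHandler`/`_session` fields come from the PR text; the exact signature and surrounding state handling are assumptions.

```csharp
public async Task ShutdownAsync()
{
    // Cancel-first: abort any in-flight reconnect before touching the
    // session, so the handler's retry loop can't race the close and
    // spray BadSessionIdInvalid into the upstream log.
    if (_reconnectHandler != null)
    {
        _reconnectHandler.CancelReconnect();
        _reconnectHandler.Dispose();
        _reconnectHandler = null;
    }

    if (_session != null)
    {
        _session.KeepAlive -= OnKeepAlive;  // no reconnect kick during close
        await _session.CloseAsync();
        _session = null!;
    }
}
```

Unwiring the keep-alive handler before `CloseAsync` is belt-and-braces: even if the close itself produces a bad keep-alive, nothing is listening to restart the reconnect loop.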
dohertj2 merged commit 17f901bb65 into v2 2026-04-19 02:06:49 -04:00

Reference: dohertj2/lmxopcua#73