lmxopcua/docs/plans/2026-06-18-driver-pages-reconnect-e2e.md

Driver-pages Phase 10 — reconnect-transition E2E + close-out Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task.

Goal: Add the one genuinely-missing driver-pages E2E test — a deployed driver transitioning Healthy → Reconnecting → Healthy on ReconnectDriver — fix the harness-fidelity gap behind it, prove the suite green, and reconcile the stale trackers.

Architecture: Extend TwoNodeClusterHarness to match production DI (AddOtOpcUaRuntime, which binds the real AkkaDriverHealthPublisher) and to accept an opt-in test IDriverFactory. A controllable fake driver lets a deployed driver reach Connected deterministically; the real DriverStatusSignalRBridge + a capturing mock IHubContext record the full health-transition sequence through the real cluster wiring.

Tech Stack: xUnit + Shouldly, Akka.NET TestKit/Hosting, Moq (for IHubContext), EF InMemory. No bUnit, no EF migration, no Commons/proto/interface change.

Design: docs/plans/2026-06-18-driver-pages-reconnect-e2e-design.md (committed 482418c8).


Task 1: Harness fidelity fix + controllable fake driver factory

Classification: standard | Estimated implement time: ~5 min | Parallelizable with: none

Files:

  • Create: tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs
  • Modify: tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/TwoNodeClusterHarness.cs
  • Read for contract (do NOT edit): src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriver.cs, src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriverFactory.cs, tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorTests.cs (existing fake IDriver/IDriverFactory double to template from), src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs (the AddOtOpcUaRuntime method + the resolver.GetService<IDriverHealthPublisher>() site).

Context (scene-setting): Today TwoNodeClusterHarness.BuildNodeAsync calls WithOtOpcUaRuntimeActors() (Akka actor spawn) but not AddOtOpcUaRuntime() (the DI registration). So IDriverHealthPublisher resolves to NullDriverHealthPublisher and IDriverFactory to NullDriverFactory → deployed drivers never publish health and reach only Stubbed. Production (Program.cs:87 + :199) calls both. This task brings the harness to production fidelity and adds an opt-in fake factory so a test can drive a real Connected state.

Step 1: Build the fake driver double. Create FakeReconnectDriverFactory implementing IDriverFactory whose TryCreate(driverType, instanceId, configJson) returns a fake IDriver (for driverType == "Modbus"; SupportedTypes => ["Modbus"]). Mirror the existing fake IDriver in DriverInstanceActorTests.cs for the full member surface (DriverType, connect/initialize, read/write/subscribe, dispose). The fake's initialize/connect path must succeed so the DriverInstanceActor reaches InitializeSucceeded → Become(Connected). Keep read/subscribe as benign success/no-ops. (No fault-injection needed: ReconnectDriver drives ForceReconnect → Reconnecting → re-initialize → Connected on its own.)

Step 2: Wire the harness. In TwoNodeClusterHarness:

  • Add builder.Services.AddOtOpcUaRuntime(); before the AddAkka(...) call in BuildNodeAsync (match production ordering — "Call this BEFORE AddAkka").
  • Add an optional IDriverFactory? driverFactory = null parameter to StartAsync and thread it into BuildNodeAsync. When non-null, register it with builder.Services.AddSingleton<IDriverFactory>(driverFactory), placed after AddOtOpcUaRuntime so it replaces the Null default. Confirm the runtime resolves the last (replacement) registration rather than the TryAdd default; if the default would still win, use Replace or register the override before AddOtOpcUaRuntime.
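Whether the override sticks depends on how AddOtOpcUaRuntime registers its default, which Step 2 asks you to confirm. The sketch below assumes the harness uses the stock Microsoft.Extensions.DependencyInjection container and compares the three options. IDriverFactory, NullDriverFactory and FakeDriverFactory here are minimal local stand-ins, not the real Core.Abstractions types. AddRuntimeDefaults is a hypothetical proxy for AddOtOpcUaRuntime that is assumed to register its default with TryAdd.

```csharp
using System;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Hypothetical stand-ins; the real types live in Core.Abstractions / Runtime.
public interface IDriverFactory { string Name { get; } }
public sealed class NullDriverFactory : IDriverFactory { public string Name => "null"; }
public sealed class FakeDriverFactory : IDriverFactory { public string Name => "fake"; }

public static class RegistrationOrder
{
    // Hypothetical proxy for AddOtOpcUaRuntime, ASSUMED to register its default via TryAdd.
    private static void AddRuntimeDefaults(IServiceCollection services) =>
        services.TryAddSingleton<IDriverFactory, NullDriverFactory>();

    // Returns the factory GetService resolves and how many IDriverFactory registrations exist.
    public static (string Resolved, int Registrations) Resolve(string strategy)
    {
        var services = new ServiceCollection();
        var fake = new FakeDriverFactory();
        switch (strategy)
        {
            case "after": // default first, override appended afterwards
                AddRuntimeDefaults(services);
                services.AddSingleton<IDriverFactory>(fake);
                break;
            case "before": // override first, so TryAdd skips the default
                services.AddSingleton<IDriverFactory>(fake);
                AddRuntimeDefaults(services);
                break;
            case "replace": // default first, then swapped out in place
                AddRuntimeDefaults(services);
                services.Replace(new ServiceDescriptor(typeof(IDriverFactory), fake));
                break;
            default:
                throw new ArgumentOutOfRangeException(nameof(strategy));
        }

        using var provider = services.BuildServiceProvider();
        return (provider.GetRequiredService<IDriverFactory>().Name,
                provider.GetServices<IDriverFactory>().Count());
    }
}
```

With the stock container, GetService returns the last registration. Appending the override therefore works even if the default was added with plain AddSingleton. Only consumers that enumerate IEnumerable<IDriverFactory> would still see the Null factory, because the "after" case leaves two registrations. Replace leaves exactly one descriptor, which makes it the least ambiguous option.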

Step 3: Build + regression-check existing suite. Run dotnet build tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests (clean), then dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests — the existing tests (DeployApi, DriverReconnect, DriverStatusHub, DriverTestConnect, etc.) must stay green with the added AddOtOpcUaRuntime (the Null sinks are inert; nothing existing subscribes to driver-health). Skipped fixture-gated tests staying skipped is expected.

Step 4: Commit.

git add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs \
        tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/TwoNodeClusterHarness.cs
git commit -m "test(harness): production-fidelity DI (AddOtOpcUaRuntime) + opt-in fake driver factory"

Task 2: Reconnect health-transition E2E test

Classification: standard | Estimated implement time: ~5 min | Parallelizable with: none (depends on Task 1)

Files:

  • Modify: tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs (add the new test method; keep the existing two)
  • Modify: tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs (Step 0: controllable health + InitializeCount + created-driver accessor)
  • Read for pattern (do NOT edit): tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/MultiClusterScopingTests.cs (the GREEN seed-ServerCluster/Namespace/ClusterNode/DriverInstance + StartDeploymentAsyncAccepted + per-node driver spawn precedent), tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverStatusHubE2eTests.cs (the mock IHubContext capture + manual bridge-spawn pattern).

Context: The existing Reconnect_RoundTrip_ReturnsOk only asserts command ingestion. This new test proves the actual health transition of a deployed driver, end-to-end through the real cluster wiring: ReconnectDriver → AdminOperationsActor → DriverHostActor → DriverInstanceActor FSM → PublishHealthSnapshot → driver-health DPS topic → DriverStatusSignalRBridge → snapshot store / hub push.

Key findings from exploration (these shape the design — do not skip):

  • Published State = _driver.GetHealth().State (DriverInstanceActor.PublishHealthSnapshot, line 750). The actor FSM (Become(Reconnecting)) does NOT set the published state directly — on ForceReconnect it does DetachSubscription(); Become(Reconnecting); PublishHealthSnapshot(), which polls the driver's GetHealth(). So the always-Healthy Task 1 fake can never surface Reconnecting. The fake must report Reconnecting at that poll. The realistic, deterministic way: the fake reports Reconnecting (simulating a dropped connection — exactly what prompts an operator to click Reconnect), the ForceReconnect poll publishes it, and the retry's InitializeAsync clears it back to Healthy.
  • Validator-clean seed (from the GREEN MultiClusterScopingTests): ServerCluster + Namespace + ClusterNode(NodeId = NodeANodeId) + DriverInstance(Enabled, DriverType "Modbus", DriverConfig "{}"). No equipment/tags — equipment/tags trip DraftValidator and the deploy is Rejected (this is why the stale EquipmentNamespaceMaterializationTests fails — pre-existing, unrelated).
  • The bridge is NOT auto-spawned by the harness; spawn it manually (as DriverStatusHubE2eTests does). DPS is fire-and-forget (no replay) and the driver's repeat publishes are deduped. Spawn the bridge and await its DPS subscription (~2 s) before deploying, otherwise it misses the initial Healthy.
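The ordering constraint in the last finding can be reproduced without Akka. This is a hypothetical in-memory model: FireAndForgetTopic and DedupingPublisher are illustrative names, not DPS or runtime types. The topic delivers messages only to current subscribers, and the publisher drops any value equal to the last one it sent.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a fire-and-forget topic: messages reach current subscribers only, no replay.
public sealed class FireAndForgetTopic<T>
{
    private readonly List<Action<T>> _subscribers = new();

    public void Subscribe(Action<T> handler) => _subscribers.Add(handler);

    public void Publish(T message)
    {
        foreach (var handler in _subscribers) handler(message);
    }
}

// Hypothetical publisher that suppresses a value equal to the last one it published.
public sealed class DedupingPublisher<T>
{
    private readonly FireAndForgetTopic<T> _topic;
    private bool _hasLast;
    private T _last = default!;

    public DedupingPublisher(FireAndForgetTopic<T> topic) => _topic = topic;

    public void PublishIfChanged(T value)
    {
        if (_hasLast && EqualityComparer<T>.Default.Equals(_last, value)) return; // deduped
        _last = value;
        _hasLast = true;
        _topic.Publish(value);
    }
}

public static class SubscribeOrderDemo
{
    // Returns what one subscriber observed, depending on whether it subscribed before the first publish.
    public static List<string> Run(bool subscribeFirst)
    {
        var topic = new FireAndForgetTopic<string>();
        var publisher = new DedupingPublisher<string>(topic);
        var seen = new List<string>();

        if (subscribeFirst) topic.Subscribe(seen.Add);
        publisher.PublishIfChanged("Healthy");  // initial health after deploy
        if (!subscribeFirst) topic.Subscribe(seen.Add);
        publisher.PublishIfChanged("Healthy");  // later repeat of the same state: suppressed

        return seen;
    }
}
```

A subscriber that attaches after the first publish sees nothing: the repeat is suppressed and nothing is replayed. This is why the bridge must subscribe, and its SubscribeAck must arrive, before the deployment starts.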

Step 0: Enhance FakeReconnectDriver / FakeReconnectDriverFactory (same file from Task 1):

  • FakeReconnectDriver: add a volatile/locked controllable health — GetHealth() returns DriverState.Reconnecting when a _reconnecting flag is set, else Healthy; a public ReportReconnecting() sets the flag; InitializeAsync clears it (and bumps a public InitializeCount). (DriverHealth ctor = new(state, lastSuccessfulRead, lastError).)
  • FakeReconnectDriverFactory: record created drivers so the test can retrieve the one for a given driverInstanceId (e.g. a ConcurrentDictionary<string, FakeReconnectDriver> Created or TryGetCreated(id)).
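Here is a minimal sketch of the Step 0 state, using the member names given above. DriverState, DriverHealth and both fake classes are local stand-ins that model only the health-related members. The real fake must implement the full IDriver/IDriverFactory surface mirrored from DriverInstanceActorTests.cs. The DateTime?/string? types for lastSuccessfulRead/lastError and the nullable return of TryCreate are assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the real Core.Abstractions types (health members only).
public enum DriverState { Healthy, Reconnecting }
public sealed record DriverHealth(DriverState State, DateTime? LastSuccessfulRead, string? LastError);

public sealed class FakeReconnectDriver
{
    private volatile bool _reconnecting; // read by the actor's health poll, written by the test
    private int _initializeCount;

    public string DriverType => "Modbus";
    public int InitializeCount => Volatile.Read(ref _initializeCount);

    // Simulate a dropped connection; the next GetHealth() poll reports Reconnecting.
    public void ReportReconnecting() => _reconnecting = true;

    // Every (re)initialise clears the simulated fault and counts the call.
    public Task InitializeAsync(CancellationToken ct = default)
    {
        _reconnecting = false;
        Interlocked.Increment(ref _initializeCount);
        return Task.CompletedTask;
    }

    public DriverHealth GetHealth() => new(
        _reconnecting ? DriverState.Reconnecting : DriverState.Healthy,
        LastSuccessfulRead: null,
        LastError: _reconnecting ? "simulated connection loss" : null);
}

public sealed class FakeReconnectDriverFactory
{
    public ConcurrentDictionary<string, FakeReconnectDriver> Created { get; } = new();
    public string[] SupportedTypes => new[] { "Modbus" };

    // Returns null for unsupported driver types (assumed contract).
    public FakeReconnectDriver? TryCreate(string driverType, string instanceId, string configJson)
    {
        if (driverType != "Modbus") return null;
        var driver = new FakeReconnectDriver();
        Created[instanceId] = driver; // keeps the most recent driver per instance id
        return driver;
    }

    public FakeReconnectDriver? TryGetCreated(string instanceId) =>
        Created.TryGetValue(instanceId, out var driver) ? driver : null;
}
```

The actor only polls GetHealth, so ReportReconnecting has no visible effect until the ForceReconnect path calls PublishHealthSnapshot. The retry's InitializeAsync then flips the state back to Healthy.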

Step 1: Write the test Reconnect_DeployedDriver_TransitionsThroughReconnectingBackToHealthy:

  1. var factory = new FakeReconnectDriverFactory(); await TwoNodeClusterHarness.StartAsync(driverFactory: factory).
  2. Resolve the DI IDriverStatusSnapshotStore; spawn the real DriverStatusSignalRBridge over it with a capturing mock IHubContext<DriverStatusHub> recording every pushed DriverHealthChanged (reuse the DriverStatusHubE2eTests mock pattern — records every SendCoreAsync). Wait ~2 s for the DPS SubscribeAck.
  3. Seed ServerCluster + Namespace + ClusterNode(NodeANodeId) + one DriverInstance ("Modbus", Enabled, "{}", no tags) via CreateConfigDbContextAsync (mirror MultiClusterScopingTests.SeedTwoClusterConfigAsync but a single cluster bound to NodeANodeId).
  4. StartDeploymentAsync(createdBy: ...) → assert Accepted. Condition-poll (≤20 s) until the store reports the instance Healthy (and factory.TryGetCreated(instanceId) is non-null).
  5. factory.Created[instanceId].ReportReconnecting() — simulate the driver having lost its connection (the realistic trigger for an operator Reconnect).
  6. Dispatch ReconnectDriver(clusterId, instanceId, "e2e", Guid.NewGuid()) via IAdminOperationsClient.AskAsync<ReconnectDriverResult>; assert Ok.
  7. Condition-poll the captured push list until it contains a Reconnecting entry followed by a later Healthy. Assert that the sequence shows Reconnecting → Healthy, that the store's final state is Healthy, and that InitializeCount >= 2. The last check proves the command genuinely re-initialised the deployed driver through the full cluster path, not just a health poke.
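Steps 4 and 7 both rely on condition-polling. Step 7 also needs an ordered check that cannot mistake the initial Healthy for the post-reconnect one. The sketch below shows BCL-only versions of both helpers. PollUntilAsync, ContainsInOrder and SawReconnectThenHealthy are hypothetical names, not existing test utilities, and the captured pushes are modelled as a ConcurrentQueue<string> of state names rather than real DriverHealthChanged payloads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TransitionAssertions
{
    // Re-evaluates the condition every `interval` until it holds (true) or `timeout` elapses (false).
    public static async Task<bool> PollUntilAsync(Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            if (condition()) return true;
            if (clock.Elapsed >= timeout) return false;
            await Task.Delay(interval);
        }
    }

    // True when `expected` occurs in order (not necessarily adjacent) within `observed`.
    public static bool ContainsInOrder<T>(IReadOnlyList<T> observed, params T[] expected)
    {
        var next = 0;
        var comparer = EqualityComparer<T>.Default;
        foreach (var item in observed)
        {
            if (next < expected.Length && comparer.Equals(item, expected[next])) next++;
        }
        return next == expected.Length;
    }

    // Snapshots the concurrently-appended push log before checking the order.
    public static bool SawReconnectThenHealthy(ConcurrentQueue<string> pushes) =>
        ContainsInOrder(pushes.ToArray(), "Reconnecting", "Healthy");
}
```

ContainsInOrder looks for Reconnecting first and only then for a Healthy. A Healthy that precedes the reconnect therefore cannot satisfy the assertion. Taking a ToArray() snapshot keeps the check safe while the bridge is still appending.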

This test runs without any Docker fixture (the fake driver is in-process) — NOT skip-gated.

Step 2: Run. dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests --filter "FullyQualifiedName~DriverReconnectE2eTests" — all green (the new test executes, not skips).

Step 3: Commit.

git add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs
git commit -m "test(adminui): E2E deployed-driver Healthy→Reconnecting→Healthy transition on Reconnect"

Task 3: Full driver E2E suite live run + verification

Classification: small | Estimated implement time: ~5 min | Parallelizable with: none (depends on Task 2)

Files: none (verification only)

Step 1: Bring up the Modbus sim so the skip-gated 10.1 tests execute (not skip): lmxopcua-fix up modbus standard (sim at 10.100.0.35:5020). Verify reachability.

Step 2: Run the full driver E2E suite: dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests — confirm the DriverTestConnectE2eTests (now executing against the live sim, not skipped), DriverReconnectE2eTests (incl. the new transition test), and DriverStatusHubE2eTests all pass. Record pass counts + which previously-skipped tests now executed.

Step 3: If any sim-gated test cannot run (sim unreachable from this host), record that honestly; the new in-process transition test must pass regardless. No commit (verification).


Task 4: Reconcile stale trackers + finish

Classification: small | Estimated implement time: ~4 min | Parallelizable with: none (depends on Task 3)

Files:

  • Modify: docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json (mark Phases 6-10 completed with real commits / a "shipped — reconciled 2026-06-18" note; bump lastUpdated)
  • Modify: docs/plans/2026-05-28-adminui-driver-pages-design.md (§8.3: ModbusTcp → Modbus)
  • Modify: stillpending.md §A.9 (mark Phase 6/8/10 SHIPPED; record the new reconnect-transition test; keep the full-stack hub test as a documented deferred follow-up) — NEVER STAGE this file (local working file)
  • Modify: memory project_stillpending_backlog.md + MEMORY.md

Step 1: Reconcile the .tasks.json (Phases 6-10 → completed, with commit refs from the brainstorming finding) and fix the §8.3 ModbusTcp string.

Step 2: Stage only the two docs/plans/... files (the tasks.json and the design md), by explicit path. Do NOT run git add . and do NOT stage stillpending.md.

git add docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json \
        docs/plans/2026-05-28-adminui-driver-pages-design.md
git commit -m "docs(plans): reconcile driver-pages tasks (Phases 6-10 shipped) + fix smoke checklist"

Step 3: Update stillpending.md §A.9 (unstaged) + memory files.

Step 4: Finish. Use superpowers-extended-cc:finishing-a-development-branch — verify the suite green, then merge feat/driver-pages-reconnect-e2e → master (ff/merge) + push. Bookkeep this plan's .tasks.json (executionState COMPLETE) on master.


Cross-cutting verification (before merge)

  1. dotnet build tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests — clean.
  2. dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests — green (new test executes).
  3. git diff --stat master.. — only the expected harness/test/docs files; no surprise changes, no never-stage files staged.