Driver-pages Phase 10 — reconnect-transition E2E + close-out Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task.
Goal: Add the one genuinely missing driver-pages E2E test (a deployed driver
transitioning Healthy → Reconnecting → Healthy on ReconnectDriver), fix the
harness-fidelity gap behind it, prove the suite green, and reconcile the stale trackers.
Architecture: Extend TwoNodeClusterHarness to match production DI (AddOtOpcUaRuntime,
which binds the real AkkaDriverHealthPublisher) and to accept an opt-in test
IDriverFactory. A controllable fake driver lets a deployed driver reach Connected
deterministically; the real DriverStatusSignalRBridge + a capturing mock IHubContext
record the full health-transition sequence through the real cluster wiring.
Tech Stack: xUnit + Shouldly, Akka.NET TestKit/Hosting, Moq (for IHubContext), EF
InMemory. No bUnit, no EF migration, no Commons/proto/interface change.
Design: docs/plans/2026-06-18-driver-pages-reconnect-e2e-design.md (committed 482418c8).
Task 1: Harness fidelity fix + controllable fake driver factory
Classification: standard | Estimated implement time: ~5 min | Parallelizable with: none
Files:
- Create: tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs
- Modify: tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/TwoNodeClusterHarness.cs
- Read for contract (do NOT edit):
  - src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriver.cs
  - src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriverFactory.cs
  - tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorTests.cs (the existing fake IDriver/IDriverFactory double to template from)
  - src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs (the AddOtOpcUaRuntime method + the resolver.GetService<IDriverHealthPublisher>() site)
Context (scene-setting): Today TwoNodeClusterHarness.BuildNodeAsync calls
WithOtOpcUaRuntimeActors() (Akka actor spawn) but not AddOtOpcUaRuntime() (the DI
registration). As a result, IDriverHealthPublisher resolves to NullDriverHealthPublisher and
IDriverFactory to NullDriverFactory, so deployed drivers never publish health and never get
past Stubbed. Production (Program.cs:87 + :199) calls both. This task brings the
harness to production fidelity and adds an opt-in fake factory so a test can drive a real
Connected state.
Step 1: Build the fake driver double. Create FakeReconnectDriverFactory implementing
IDriverFactory whose TryCreate(driverType, instanceId, configJson) returns a fake
IDriver (for driverType == "Modbus"; SupportedTypes => ["Modbus"]). Mirror the
existing fake IDriver in DriverInstanceActorTests.cs for the full member surface
(DriverType, connect/initialize, read/write/subscribe, dispose). The fake's
initialize/connect path must succeed so the DriverInstanceActor reaches
InitializeSucceeded → Become(Connected). Keep read/subscribe as benign success/no-ops.
(No fault-injection needed: ReconnectDriver drives ForceReconnect → Reconnecting → re-initialize → Connected on its own.)
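A minimal sketch of the Step 1 factory, assuming trimmed shapes: the IDriver/IDriverFactory interfaces and the DriverHealth record below are hypothetical stand-ins (the real contracts live in Core.Abstractions and have a larger member surface, which should be copied from the DriverInstanceActorTests fake). What it illustrates is only the "Modbus"-gated TryCreate/SupportedTypes pair and an initialize path that always succeeds.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// HYPOTHETICAL trimmed stand-ins for Core.Abstractions' IDriver/IDriverFactory.
// The real fake must implement the full member surface (read/write/subscribe,
// dispose, ...) copied from the existing fake in DriverInstanceActorTests.cs.
public enum DriverState { Healthy, Reconnecting }

public sealed record DriverHealth(DriverState State, DateTimeOffset? LastSuccessfulRead, string? LastError);

public interface IDriver
{
    string DriverType { get; }
    Task InitializeAsync(CancellationToken ct);
    DriverHealth GetHealth();
}

public interface IDriverFactory
{
    IReadOnlyCollection<string> SupportedTypes { get; }
    IDriver? TryCreate(string driverType, string instanceId, string configJson);
}

// Always-healthy fake: InitializeAsync completes immediately with no I/O, so the
// actor can take the InitializeSucceeded -> Become(Connected) path deterministically.
public sealed class FakeReconnectDriver : IDriver
{
    public string DriverType => "Modbus";
    public Task InitializeAsync(CancellationToken ct) => Task.CompletedTask;
    public DriverHealth GetHealth() => new(DriverState.Healthy, DateTimeOffset.UtcNow, null);
}

public sealed class FakeReconnectDriverFactory : IDriverFactory
{
    public IReadOnlyCollection<string> SupportedTypes { get; } = new[] { "Modbus" };

    // Only "Modbus" is recognised; any other driver type returns null.
    public IDriver? TryCreate(string driverType, string instanceId, string configJson) =>
        string.Equals(driverType, "Modbus", StringComparison.Ordinal)
            ? new FakeReconnectDriver()
            : null;
}
```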
Step 2: Wire the harness. In TwoNodeClusterHarness:
- Add builder.Services.AddOtOpcUaRuntime(); before the AddAkka(...) call in BuildNodeAsync (matching the production ordering: "Call this BEFORE AddAkka").
- Add an optional IDriverFactory? driverFactory = null parameter to StartAsync and thread it into BuildNodeAsync. When non-null, register it with builder.Services.AddSingleton<IDriverFactory>(driverFactory); placed after AddOtOpcUaRuntime so it replaces the Null default. Confirm that the runtime resolves the last (replacement) registration rather than the TryAdd default; if the TryAdd default would win, use Replace or register the override before AddOtOpcUaRuntime.
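The registration-order question in Step 2 can be settled with a throwaway check like the one below. It assumes AddOtOpcUaRuntime registers its NullDriverFactory default with TryAddSingleton (verify in ServiceCollectionExtensions.cs); AddRuntimeDefaults and FakeFactory are hypothetical stand-ins, and the sketch needs the Microsoft.Extensions.DependencyInjection package. It relies on standard container behaviour: GetRequiredService returns the last registration for a service type, TryAdd* is a no-op once any registration exists, and Replace removes the first existing registration before adding the new one.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// HYPOTHETICAL stand-ins; only the registration pattern matters here.
public interface IDriverFactory { }
public sealed class NullDriverFactory : IDriverFactory { }
public sealed class FakeFactory : IDriverFactory { }

public static class DiOrderingDemo
{
    // Stand-in for AddOtOpcUaRuntime, ASSUMING it uses TryAdd for the default.
    public static IServiceCollection AddRuntimeDefaults(this IServiceCollection services)
    {
        services.TryAddSingleton<IDriverFactory, NullDriverFactory>();
        return services;
    }

    // Builds a provider from the given registrations and resolves the single
    // IDriverFactory, the same way constructor injection would pick it.
    public static IDriverFactory Resolve(Action<IServiceCollection> configure)
    {
        var services = new ServiceCollection();
        configure(services);
        using var provider = services.BuildServiceProvider();
        return provider.GetRequiredService<IDriverFactory>();
    }
}
```

If AddOtOpcUaRuntime instead uses a plain AddSingleton, only the "register after" ordering or Replace lets the override win; and if the runtime enumerates GetServices<IDriverFactory>() and takes the first entry, the default would win regardless of ordering, which is the case Step 2 asks to rule out.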
Step 3: Build + regression-check existing suite. Run
dotnet build tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests (clean), then
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests. The existing tests
(DeployApi, DriverReconnect, DriverStatusHub, DriverTestConnect, etc.) must stay green with
the added AddOtOpcUaRuntime (the Null sinks are inert; nothing existing subscribes to
driver-health). Fixture-gated tests that were skipped before are expected to stay skipped.
Step 4: Commit.
git add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs \
tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/TwoNodeClusterHarness.cs
git commit -m "test(harness): production-fidelity DI (AddOtOpcUaRuntime) + opt-in fake driver factory"
Task 2: Reconnect health-transition E2E test
Classification: standard | Estimated implement time: ~5 min | Parallelizable with: none (depends on Task 1)
Files:
- Modify: tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs (add the new test method; keep the existing two)
- Modify: tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs (Step 0: controllable health + InitializeCount + created-driver accessor)
- Read for pattern (do NOT edit):
  - tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/MultiClusterScopingTests.cs (the GREEN precedent for seeding ServerCluster/Namespace/ClusterNode/DriverInstance, StartDeploymentAsync → Accepted, and per-node driver spawn)
  - tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverStatusHubE2eTests.cs (the mock IHubContext capture + manual bridge-spawn pattern)
Context: The existing Reconnect_RoundTrip_ReturnsOk only asserts command ingestion.
This new test proves the actual health transition of a deployed driver, end-to-end through
the real cluster wiring: ReconnectDriver → AdminOperationsActor → DriverHostActor → DriverInstanceActor FSM → PublishHealthSnapshot → driver-health DPS topic → DriverStatusSignalRBridge → snapshot store / hub push.
Key findings from exploration (these shape the design — do not skip):
- Published State = _driver.GetHealth().State (DriverInstanceActor.PublishHealthSnapshot, line 750). The actor FSM (Become(Reconnecting)) does NOT set the published state directly: on ForceReconnect it does DetachSubscription(); Become(Reconnecting); PublishHealthSnapshot(), which polls the driver's GetHealth(). So the always-Healthy Task 1 fake can never surface Reconnecting; the fake must report Reconnecting at that poll. The realistic, deterministic way: the fake reports Reconnecting (simulating a dropped connection, exactly what prompts an operator to click Reconnect), the ForceReconnect poll publishes it, and the retry's InitializeAsync clears it back to Healthy.
- Validator-clean seed (from the GREEN MultiClusterScopingTests): ServerCluster + Namespace + ClusterNode (NodeId = NodeANodeId) + DriverInstance (Enabled, DriverType "Modbus", DriverConfig "{}"). No equipment/tags: equipment/tags trip DraftValidator and the deploy is Rejected (this is why the stale EquipmentNamespaceMaterializationTests fails; pre-existing and unrelated).
- The bridge is NOT auto-spawned by the harness; spawn it manually (as DriverStatusHubE2eTests does). DPS is fire-and-forget (no replay), and the driver's repeat-publish is deduped, so spawn the bridge and await its DPS subscription (~2 s) before deploying so it catches the initial Healthy.
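Why the bridge must subscribe before deploying can be modelled in a few lines. FireAndForgetTopic and DedupingHealthPublisher below are hypothetical toy simplifications of the DPS topic and the driver's deduped repeat-publish, not the Akka.NET DistributedPubSub API: a subscriber that joins after the initial Healthy is published never receives it, because nothing is replayed and the repeat is suppressed.

```csharp
using System;
using System.Collections.Generic;

public enum DriverState { Healthy, Reconnecting }

// Toy fire-and-forget topic: a message reaches only the subscribers present at
// publish time; nothing is buffered or replayed for late subscribers.
public sealed class FireAndForgetTopic
{
    private readonly List<Action<DriverState>> _subscribers = new();

    public void Subscribe(Action<DriverState> handler) => _subscribers.Add(handler);

    public void Publish(DriverState state)
    {
        foreach (var handler in _subscribers) handler(state);
    }
}

// Toy publisher that suppresses a repeat of the last published state.
public sealed class DedupingHealthPublisher
{
    private readonly FireAndForgetTopic _topic;
    private DriverState? _last;

    public DedupingHealthPublisher(FireAndForgetTopic topic) => _topic = topic;

    public void Publish(DriverState state)
    {
        if (_last == state) return; // unchanged state: not re-sent
        _last = state;
        _topic.Publish(state);
    }
}
```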
Step 0: Enhance FakeReconnectDriver / FakeReconnectDriverFactory (same file from Task 1):
- FakeReconnectDriver: add volatile/locked controllable health. GetHealth() returns DriverState.Reconnecting when a _reconnecting flag is set, else Healthy; a public ReportReconnecting() sets the flag; InitializeAsync clears it (and bumps a public InitializeCount). (DriverHealth ctor = new(state, lastSuccessfulRead, lastError).)
- FakeReconnectDriverFactory: record created drivers so the test can retrieve the one for a given driverInstanceId (e.g. a ConcurrentDictionary<string, FakeReconnectDriver> Created or TryGetCreated(id)).
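A minimal sketch of the Step 0 fake, assuming trimmed shapes: the real FakeReconnectDriver must still implement the full IDriver surface from Task 1, and DriverHealth's parameter types here are guesses (only the (state, lastSuccessfulRead, lastError) order comes from this plan). A lock guards the flag so that GetHealth() polled by the actor and ReportReconnecting() called from the test thread see a consistent value.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public enum DriverState { Healthy, Reconnecting }

// HYPOTHETICAL shape; only the (state, lastSuccessfulRead, lastError) order is from the plan.
public sealed record DriverHealth(DriverState State, DateTimeOffset? LastSuccessfulRead, string? LastError);

public sealed class FakeReconnectDriver
{
    private readonly object _gate = new();
    private bool _reconnecting;
    private int _initializeCount;

    public FakeReconnectDriver(string instanceId) => InstanceId = instanceId;

    public string InstanceId { get; }
    public int InitializeCount => Volatile.Read(ref _initializeCount);

    // Each (re-)initialise clears the simulated connection loss and is counted.
    public Task InitializeAsync(CancellationToken ct)
    {
        lock (_gate) _reconnecting = false;
        Interlocked.Increment(ref _initializeCount);
        return Task.CompletedTask;
    }

    // Test hook: simulate a dropped connection until the next InitializeAsync.
    public void ReportReconnecting()
    {
        lock (_gate) _reconnecting = true;
    }

    public DriverHealth GetHealth()
    {
        lock (_gate)
            return _reconnecting
                ? new DriverHealth(DriverState.Reconnecting, null, "simulated connection loss")
                : new DriverHealth(DriverState.Healthy, DateTimeOffset.UtcNow, null);
    }
}

public sealed class FakeReconnectDriverFactory
{
    public ConcurrentDictionary<string, FakeReconnectDriver> Created { get; } = new();

    public FakeReconnectDriver? TryCreate(string driverType, string instanceId, string configJson)
    {
        if (driverType != "Modbus") return null;
        var driver = new FakeReconnectDriver(instanceId);
        Created[instanceId] = driver; // a re-created instance overwrites the earlier one
        return driver;
    }

    public FakeReconnectDriver? TryGetCreated(string instanceId) =>
        Created.TryGetValue(instanceId, out var driver) ? driver : null;
}
```

This sketch assumes the reconnect path re-initialises the same driver object; if the actor asks the factory for a fresh instance instead, the InitializeCount >= 2 check would need to move to the factory.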
Step 1: Write the test Reconnect_DeployedDriver_TransitionsThroughReconnectingBackToHealthy:
- var factory = new FakeReconnectDriverFactory(); await TwoNodeClusterHarness.StartAsync(driverFactory: factory).
- Resolve the DI IDriverStatusSnapshotStore; spawn the real DriverStatusSignalRBridge over it with a capturing mock IHubContext<DriverStatusHub> that records every pushed DriverHealthChanged (reuse the DriverStatusHubE2eTests mock pattern, which records every SendCoreAsync). Wait ~2 s for the DPS SubscribeAck.
- Seed ServerCluster + Namespace + ClusterNode (NodeANodeId) + one DriverInstance ("Modbus", Enabled, "{}", no tags) via CreateConfigDbContextAsync (mirror MultiClusterScopingTests.SeedTwoClusterConfigAsync, but with a single cluster bound to NodeANodeId).
- StartDeploymentAsync(createdBy: ...) → assert Accepted.
- Condition-poll (≤20 s) until the store reports the instance Healthy (and factory.TryGetCreated(instanceId) is non-null).
- factory.Created[instanceId].ReportReconnecting(): simulate the driver having lost its connection (the realistic trigger for an operator Reconnect).
- Dispatch ReconnectDriver(clusterId, instanceId, "e2e", Guid.NewGuid()) via IAdminOperationsClient.AskAsync<ReconnectDriverResult>; assert Ok.
- Condition-poll the captured push list until it contains a Reconnecting entry followed by a later Healthy. Assert: the sequence shows Reconnecting → Healthy, the store's final state is Healthy, and InitializeCount >= 2 (proving the command genuinely re-initialised the deployed driver through the full cluster path, not just a health poke).
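The two condition-polls and the ordering assertion in Step 1 could share small helpers such as the hypothetical E2eWait below (the names and the 100 ms default interval are assumptions). ContainsInOrder checks for an in-order subsequence, so duplicate or interleaved pushes (for example a repeated Reconnecting) do not break the Reconnecting → Healthy assertion.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class E2eWait
{
    // Re-evaluates the condition every pollInterval until it returns true or the
    // timeout elapses; returns false instead of throwing so the caller can assert
    // with a descriptive message.
    public static async Task<bool> UntilAsync(Func<bool> condition, TimeSpan timeout, TimeSpan? pollInterval = null)
    {
        var interval = pollInterval ?? TimeSpan.FromMilliseconds(100);
        var sw = Stopwatch.StartNew();
        while (true)
        {
            if (condition()) return true;
            if (sw.Elapsed >= timeout) return false;
            await Task.Delay(interval);
        }
    }

    // True if every element of `expected` appears in `observed` in the same order,
    // with any number of other elements in between (greedy subsequence match).
    public static bool ContainsInOrder<T>(IReadOnlyList<T> observed, params T[] expected)
    {
        var next = 0;
        var comparer = EqualityComparer<T>.Default;
        foreach (var item in observed)
        {
            if (next < expected.Length && comparer.Equals(item, expected[next])) next++;
        }
        return next == expected.Length;
    }
}
```

Because the mock IHubContext callback runs on other threads, the condition passed to UntilAsync should read a snapshot of the captured pushes taken under a lock (or via ConcurrentQueue.ToArray()).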
This test runs without any Docker fixture (the fake driver is in-process) — NOT skip-gated.
Step 2: Run. dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests --filter "FullyQualifiedName~DriverReconnectE2eTests": all green, with the new test executing rather than being skipped.
Step 3: Commit.
git add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs
git commit -m "test(adminui): E2E deployed-driver Healthy→Reconnecting→Healthy transition on Reconnect"
Task 3: Full driver E2E suite live run + verification
Classification: small | Estimated implement time: ~5 min | Parallelizable with: none (depends on Task 2)
Files: none (verification only)
Step 1: Bring up the Modbus sim so the skip-gated 10.1 tests execute instead of skipping:
lmxopcua-fix up modbus standard (sim at 10.100.0.35:5020). Verify reachability.
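For "Verify reachability", a plain TCP connect with a timeout is enough to tell whether anything is listening at 10.100.0.35:5020 from this host. The sketch assumes .NET 5 or later (for the TcpClient.ConnectAsync overload that takes a CancellationToken); Reachability.CanConnectAsync is a hypothetical helper, and a successful connect only shows that something is listening, not that it speaks Modbus.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class Reachability
{
    // Returns true if a TCP connection to host:port completes within the timeout.
    public static async Task<bool> CanConnectAsync(string host, int port, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        using var client = new TcpClient();
        try
        {
            await client.ConnectAsync(host, port, cts.Token);
            return true;
        }
        catch (OperationCanceledException) { return false; } // timed out
        catch (SocketException) { return false; }            // refused or unreachable
    }
}
```

Usage: await Reachability.CanConnectAsync("10.100.0.35", 5020, TimeSpan.FromSeconds(3)).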
Step 2: Run the full driver E2E suite:
dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests — confirm the
DriverTestConnectE2eTests (now executing against the live sim, not skipped),
DriverReconnectE2eTests (incl. the new transition test), and DriverStatusHubE2eTests
all pass. Record pass counts + which previously-skipped tests now executed.
Step 3: If any sim-gated test cannot run (sim unreachable from this host), record that honestly; the new in-process transition test must pass regardless. No commit (verification).
Task 4: Reconcile stale trackers + finish
Classification: small | Estimated implement time: ~4 min | Parallelizable with: none (depends on Task 3)
Files:
- Modify: docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json (mark Phases 6–10 completed with real commits / a "shipped — reconciled 2026-06-18" note; bump lastUpdated)
- Modify: docs/plans/2026-05-28-adminui-driver-pages-design.md (§8.3: ModbusTcp → Modbus)
- Modify: stillpending.md §A.9 (mark Phase 6/8/10 SHIPPED; record the new reconnect-transition test; keep the full-stack hub test as a documented deferred follow-up). NEVER STAGE this file (local working file).
- Modify: memory files project_stillpending_backlog.md + MEMORY.md
Step 1: Reconcile the .tasks.json (Phases 6–10 → completed, with commit refs from the
brainstorming finding) and fix the §8.3 ModbusTcp string.
Step 2: Stage only the two docs/plans/... files (the tasks.json and the design md),
by explicit path. Do NOT run git add . and do NOT stage stillpending.md.
git add docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json \
docs/plans/2026-05-28-adminui-driver-pages-design.md
git commit -m "docs(plans): reconcile driver-pages tasks (Phases 6-10 shipped) + fix smoke checklist"
Step 3: Update stillpending.md §A.9 (unstaged) + memory files.
Step 4: Finish. Use superpowers-extended-cc:finishing-a-development-branch. Verify the
suite is green, then merge feat/driver-pages-reconnect-e2e → master (ff/merge) and push. Bookkeep
this plan's .tasks.json (executionState COMPLETE) on master.
Cross-cutting verification (before merge)
- dotnet build tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests: clean.
- dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests: green (the new test executes).
- git diff --stat master.. shows only the expected harness/test/docs files: no surprise changes, no never-stage files staged.