Driver-pages Phase 10 — reconnect-transition E2E + plan close-out — Design
Date: 2026-06-18
Status: Approved
Base: master 08c7a2bd
Branch: feat/driver-pages-reconnect-e2e
Context
The 2026-05-28-adminui-driver-pages plan is fully shipped (Phases 0–10), but its
.tasks.json is comprehensively stale (it still marks Phases 6–10 "pending" while the
code + tests all exist and are real). A brainstorming pass on 2026-06-18 verified, seam
by seam, that:
- Phase 6 (live DriverStatusPanel) is built end-to-end: the DriverHealthChanged contract, AkkaDriverHealthPublisher (DI-bound in prod, invoked at DriverInstanceActor.PublishHealthSnapshot), DriverStatusSignalRBridge (spawned admin-gated at Program.cs:196), the shared-singleton snapshot store, the hub (MapHub<DriverStatusHub>), and the panel wired into all 9 driver pages. Page DriverInstanceId and actor _driverInstanceId key on the same EF value, so there is no mismatch.
- Phase 8 (Reconnect/Restart) is built: messages, AdminOperationsActor + DriverHostActor handlers, DriverOperator-gated buttons in the panel.
- Phase 10 automated E2E tests (DriverTestConnectE2eTests, DriverReconnectE2eTests, DriverStatusHubE2eTests) are real, skip-gated, and honest about their scope.
Two genuine remnants remain, both flagged by the DriverReconnectE2eTests /
DriverStatusHubE2eTests scope-notes:
- The one real coverage gap: no test proves a deployed driver actually transitions Healthy → Reconnecting → Healthy in response to ReconnectDriver. The existing 10.2 test only asserts command ingestion (the round-trip reply), not the resulting actor health transition.
- A harness-fidelity gap behind it: TwoNodeClusterHarness.BuildNodeAsync calls WithOtOpcUaRuntimeActors() (the Akka actor spawn) but not AddOtOpcUaRuntime() (the DI registration that binds IDriverHealthPublisher → AkkaDriverHealthPublisher). Consequently, in every current Host.IntegrationTests run, deployed driver actors fall back to NullDriverHealthPublisher and emit no health to the driver-health DPS topic. The harness also leaves IDriverFactory at NullDriverFactory, so deployed drivers reach Stubbed, never Connected.
The plan's stale trackers caused this item to be (wrongly) re-listed as OPEN in
stillpending.md §A.9. This phase closes both remnants and reconciles the trackers.
Goal
Close the genuine ReconnectDriver health-transition E2E gap, fix the harness-fidelity gap
behind it, prove the full Phase 10 driver suite green, and reconcile the stale trackers so
this fully-shipped plan stops re-triggering as backlog.
Design
1. Harness fidelity fix
In TwoNodeClusterHarness.BuildNodeAsync:
- Add builder.Services.AddOtOpcUaRuntime() before AddAkka (matching production Program.cs:87). This binds the real AkkaDriverHealthPublisher so deployed drivers publish health to the driver-health DPS topic. It also seeds the Null* runtime defaults (IHistorianDataSource, IAlarmHistorianSink, IHistoryWriter, IDriverFactory, …); all are harmless no-ops that don't change existing test behavior (nothing in the current suite subscribes to driver-health, and the Null sinks are inert).
- Add an opt-in seam to inject a test IDriverFactory for tests that need a connecting driver. Default (no factory supplied) leaves the existing behavior untouched. Mechanism: a StartAsync parameter (e.g. IDriverFactory? driverFactory = null) threaded into BuildNodeAsync; when supplied, register it as a singleton after AddOtOpcUaRuntime so it wins over the Null default (last-registration-wins / replace).
This change is fidelity-improving for the whole suite: the existing DriverStatusHubE2eTests
keeps spawning its own bridge, but real driver health now flows in tests that deploy drivers.
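The opt-in seam above relies on Microsoft.Extensions.DependencyInjection resolving the last registration for a service type. A minimal sketch of that behavior follows. The interface and class bodies, AddRuntimeDefaults, and BuildNode are hypothetical stand-ins (only the type names come from this design), and it assumes AddOtOpcUaRuntime registers the Null default with a plain Add rather than TryAdd or Replace.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-ins for the repo's real types; the real contracts will differ.
public interface IDriverFactory { string Name { get; } }
public sealed class NullDriverFactory : IDriverFactory { public string Name => "Null"; }
public sealed class FakeDriverFactory : IDriverFactory { public string Name => "Fake"; }

public static class HarnessRegistrationSketch
{
    // Stand-in for AddOtOpcUaRuntime(): assumed here to register the Null default.
    public static IServiceCollection AddRuntimeDefaults(this IServiceCollection services)
        => services.AddSingleton<IDriverFactory, NullDriverFactory>();

    // Mirrors the proposed seam: null keeps the default, non-null overrides it.
    public static IServiceProvider BuildNode(IDriverFactory? driverFactory = null)
    {
        var services = new ServiceCollection();
        services.AddRuntimeDefaults();
        if (driverFactory is not null)
        {
            // Registered after the defaults, so single-service resolution returns this one.
            services.AddSingleton<IDriverFactory>(driverFactory);
        }
        return services.BuildServiceProvider();
    }
}
```

With no argument, GetRequiredService<IDriverFactory>() on the returned provider yields the Null default; passing a FakeDriverFactory yields the fake. If the real extension method turns out to use TryAdd, ordering alone would still work when the test factory is added afterwards, but an explicit Replace may state the intent more clearly.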
2. The reconnect-transition E2E test (the real gap)
A new test (in DriverReconnectE2eTests.cs, or a focused new file) that:
- Starts the harness with a controllable fake IDriverFactory (see decision below).
- Seeds a driver row + minimal equipment/tag using the existing SeedDeploymentWithEquipmentTags precedent (from DriverHostActorLiveValueTests), bound to NodeANodeId.
- Triggers a deploy (DispatchDeployment, or POST /api/deployments with HarnessDeployApiKey).
- Spawns the real DriverStatusSignalRBridge over the real DI snapshot store (the store is the observation surface; a mock IHubContext captures the hub push the same way the existing hub test does).
- Waits (condition-poll, generous timeout) for the snapshot store to report the deployed instance as Healthy.
- Dispatches ReconnectDriver via IAdminOperationsClient (the real cluster-singleton path the AdminUI button uses).
- Asserts the store observes the transition to Reconnecting and then returns to Healthy within a timeout, proving the full wiring: ReconnectDriver → AdminOperationsActor → DriverHostActor.HandleReconnectDriver → DriverInstanceActor FSM (ForceReconnect → Become(Reconnecting) → Become(Connected)) → PublishHealthSnapshot → driver-health DPS topic → DriverStatusSignalRBridge → store.
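The wait and assert steps above could be built on a small condition-poll helper plus an ordered-subsequence check. The sketch below is one possible shape, not the repo's existing helper: PollSketch, WaitUntilAsync, and ContainsInOrder are hypothetical names, and the state strings are assumed to match the health values the store reports.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PollSketch
{
    // Polls `condition` until it returns true or `timeout` elapses; returns whether it was met.
    public static async Task<bool> WaitUntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan? interval = null)
    {
        var delay = interval ?? TimeSpan.FromMilliseconds(50);
        var clock = Stopwatch.StartNew();
        while (clock.Elapsed < timeout)
        {
            if (condition()) return true;
            await Task.Delay(delay);
        }
        return condition(); // one last check at the deadline
    }

    // True when `expected` occurs as an in-order subsequence of `observed`.
    public static bool ContainsInOrder(IReadOnlyList<string> observed, params string[] expected)
    {
        var next = 0;
        foreach (var state in observed)
        {
            if (next < expected.Length && state == expected[next]) next++;
        }
        return next == expected.Length;
    }
}
```

A test might poll `ContainsInOrder(captured, "Healthy", "Reconnecting", "Healthy")` against a snapshot of the states captured from the mock hub pushes. Checking the recorded sequence rather than the store's current value is a deliberate choice here: with a fake driver that reconnects immediately, Reconnecting may be too short-lived for a poll of the current state to catch. The capture itself needs to be thread-safe (for example, a ConcurrentQueue snapshotted with ToArray before each check).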
Decision: controllable fake driver factory (not the real Modbus sim)
Recommended and approved: observe the transition via a deterministic, controllable fake
IDriver / IDriverFactory test double rather than a real Modbus sim connection.
Rationale:
- Determinism, no flakiness. A fake driver whose connect succeeds drives the actor to Connected immediately; ReconnectDriver re-enters Reconnecting then Connected deterministically. No sim timing, no skip-gate, runs everywhere.
- Smaller blast radius. The real-sim path additionally needs the full driver-factory bootstrap (all 9 driver factories + deps) wired into the shared harness.
- The wiring is what matters. This gap is about the health-transition + command wiring, not the Modbus protocol. The real Modbus TCP connect/reconnect is already covered by the Modbus.IntegrationTests and the 10.1 TestConnect E2E (against the sim).
The fake IDriver exposes a minimal controllable surface (succeed-connect, optionally
signal a fault) sufficient to walk the FSM through Connected → Reconnecting → Connected.
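As a rough illustration of that controllable surface, the sketch below shows a fake whose connect succeeds by default and can be told to fail once. The IDriver interface shown is a hypothetical, cut-down contract invented for this example; the repo's real IDriver will have a different shape, and the fake would need to implement that instead.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, cut-down driver contract for illustration only.
public interface IDriver
{
    Task ConnectAsync(CancellationToken ct);
}

public sealed class ControllableFakeDriver : IDriver
{
    private int _connectCalls;
    private volatile bool _failNextConnect;

    // Number of connect attempts seen so far (initial connect plus reconnects).
    public int ConnectCalls => Volatile.Read(ref _connectCalls);

    // Arms a single simulated fault: the next connect attempt fails, later ones succeed.
    public void FailNextConnect() => _failNextConnect = true;

    public Task ConnectAsync(CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        Interlocked.Increment(ref _connectCalls);
        if (_failNextConnect)
        {
            _failNextConnect = false;
            return Task.FromException(new InvalidOperationException("simulated connect fault"));
        }
        return Task.CompletedTask; // succeed-connect: completes immediately
    }
}
```

Exposing ConnectCalls gives the test a second, driver-side signal (the count should rise after ReconnectDriver) alongside the health transition observed through the store.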
3. Live suite run
Run the full Host.IntegrationTests driver E2E suite. Bring up the Modbus sim
(lmxopcua-fix up modbus standard, endpoint 10.100.0.35:5020 / MODBUS_SIM_ENDPOINT) so
the skip-gated 10.1 DriverTestConnectE2eTests actually execute green (not skipped),
alongside the new deterministic reconnect test (which runs regardless of the sim).
4. Reconcile the stale trackers
- docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json: mark Phases 6–10 tasks completed with their real commits / a "shipped — reconciled 2026-06-18" note; flip executionState / lastUpdated.
- stillpending.md §A.9: mark Phase 6/8/10 SHIPPED (note the new reconnect-transition test; keep the deferred full-stack hub test as a documented follow-up). (Never staged; it is a local working file.)
- docs/plans/2026-05-28-adminui-driver-pages-design.md §8.3: fix the stale ModbusTcp → Modbus reference in the smoke checklist.
- Memory: update project_stillpending_backlog.md + MEMORY.md.
5. Explicitly deferred (documented follow-up)
The full-stack WebSocket + JWT DriverStatusHub connection test (a real HubConnection
to /hubs/driverstatus with a minted bearer token, JoinDriver, assert client receipt).
It has no repo precedent (no test mints a JWT or opens a real HubConnection), is flaky-prone, and
only re-covers transport that the mock-hub test + the §8.3 manual runbook already handle.
Out of scope
- The 10.4 manual browser smoke (driving the AdminUI on docker-dev). Foldable later; the automated reconnect test + green suite is the higher-value core.
- Any production code change. This phase is test + harness + docs only.
Constraints
- xUnit + Shouldly. No bUnit.
- No EF migration, no Commons wire/proto contract change, no Core.Abstractions / interface contract change.
- Stage by explicit path; never git add .; never stage sql_login.txt, src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/, pending.md, current.md, docker-dev/docker-compose.yml, stillpending.md.
- No force-push, no --no-verify.
Finish
Merge to master + push.