diff --git a/docs/plans/2026-06-18-driver-pages-reconnect-e2e-design.md b/docs/plans/2026-06-18-driver-pages-reconnect-e2e-design.md
new file mode 100644
index 00000000..b066b5ea
--- /dev/null
+++ b/docs/plans/2026-06-18-driver-pages-reconnect-e2e-design.md
@@ -0,0 +1,157 @@
+# Driver-pages Phase 10 — reconnect-transition E2E + plan close-out — Design
+
+**Date:** 2026-06-18
+**Status:** Approved
+**Base:** master `08c7a2bd`
+**Branch:** `feat/driver-pages-reconnect-e2e`
+
+## Context
+
+The `2026-05-28-adminui-driver-pages` plan is **fully shipped** (Phases 0–10), but its
+`.tasks.json` is comprehensively stale (it still marks Phases 6–10 "pending" while the
+code + tests all exist and are real). A brainstorming pass on 2026-06-18 verified, seam
+by seam, that:
+
+- **Phase 6** (live `DriverStatusPanel`) is built end-to-end: `DriverHealthChanged`
+  contract, `AkkaDriverHealthPublisher` (DI-bound in prod, invoked at
+  `DriverInstanceActor.PublishHealthSnapshot`), `DriverStatusSignalRBridge` (spawned
+  admin-gated at `Program.cs:196`), the shared-singleton snapshot store, the hub
+  (`MapHub`), and the panel wired into all 9 driver pages. Page
+  `DriverInstanceId` and actor `_driverInstanceId` key on the same EF value — no mismatch.
+- **Phase 8** (Reconnect/Restart) is built: messages, `AdminOperationsActor` +
+  `DriverHostActor` handlers, DriverOperator-gated buttons in the panel.
+- **Phase 10** automated E2E tests (`DriverTestConnectE2eTests`, `DriverReconnectE2eTests`,
+  `DriverStatusHubE2eTests`) are real, skip-gated, and honest about their scope.
+
+Two genuine remnants remain, both flagged by the `DriverReconnectE2eTests` /
+`DriverStatusHubE2eTests` scope-notes:
+
+1. **The one real coverage gap:** no test proves a *deployed* driver actually transitions
+   **Healthy → Reconnecting → Healthy** in response to `ReconnectDriver`. The existing
+   10.2 test only asserts command *ingestion* (the round-trip reply), not the resulting
+   actor health transition.
+2. **A harness-fidelity gap behind it:** `TwoNodeClusterHarness.BuildNodeAsync` calls
+   `WithOtOpcUaRuntimeActors()` (the Akka actor spawn) but **not** `AddOtOpcUaRuntime()`
+   (the DI registration that binds `IDriverHealthPublisher → AkkaDriverHealthPublisher`).
+   Consequently, in *every* current `Host.IntegrationTests`, deployed driver actors fall
+   back to `NullDriverHealthPublisher` and emit no health to the `driver-health` DPS topic.
+   The harness also leaves `IDriverFactory` at `NullDriverFactory`, so deployed drivers
+   reach `Stubbed`, never `Connected`.
+
+The plan's stale trackers caused this item to be (wrongly) re-listed as OPEN in
+`stillpending.md` §A.9. This phase closes both remnants and reconciles the trackers.
+
+## Goal
+
+Close the genuine `ReconnectDriver` health-transition E2E gap, fix the harness-fidelity gap
+behind it, prove the full Phase 10 driver suite green, and reconcile the stale trackers so
+this fully-shipped plan stops re-triggering as backlog.
+
+## Design
+
+### 1. Harness fidelity fix
+
+In `TwoNodeClusterHarness.BuildNodeAsync`:
+
+- Add `builder.Services.AddOtOpcUaRuntime()` **before** `AddAkka` (matching production
+  `Program.cs:87`). This binds the real `AkkaDriverHealthPublisher` so deployed drivers
+  publish health to the `driver-health` DPS topic. It also seeds the `Null*` runtime
+  defaults (`IHistorianDataSource`, `IAlarmHistorianSink`, `IHistoryWriter`,
+  `IDriverFactory`, …) — all harmless no-ops that don't change existing test behavior
+  (nothing in the current suite subscribes to driver-health, and the Null sinks are inert).
+- Add an **opt-in** seam to inject a test `IDriverFactory` for tests that need a connecting
+  driver. Default (no factory supplied) leaves the existing behavior untouched. Mechanism:
+  a `StartAsync` parameter (e.g. `IDriverFactory? driverFactory = null`) threaded into
+  `BuildNodeAsync`; when supplied, register it as a singleton **after** `AddOtOpcUaRuntime`
+  so it wins over the `Null` default (last-registration-wins / replace).
+
+This change is fidelity-improving for the whole suite: the existing `DriverStatusHubE2eTests`
+keeps spawning its own bridge, but real driver health now flows in tests that deploy drivers.
+
+### 2. The reconnect-transition E2E test (the real gap)
+
+A new test (in `DriverReconnectE2eTests.cs`, or a focused new file) that:
+
+1. Starts the harness with a **controllable fake `IDriverFactory`** (see decision below).
+2. Seeds a driver row + minimal equipment/tag using the existing
+   `SeedDeploymentWithEquipmentTags` precedent (from `DriverHostActorLiveValueTests`),
+   bound to `NodeANodeId`.
+3. Triggers a deploy (`DispatchDeployment`, or `POST /api/deployments` with
+   `HarnessDeployApiKey`).
+4. Spawns the **real** `DriverStatusSignalRBridge` over the real DI snapshot store (the
+   store is the observation surface; a mock `IHubContext` captures the hub push the same
+   way the existing hub test does).
+5. Waits (condition-poll, generous timeout) for the snapshot store to report the deployed
+   instance as `Healthy`.
+6. Dispatches `ReconnectDriver` via `IAdminOperationsClient` (the real cluster-singleton
+   path the AdminUI button uses).
+7. Asserts the store observes the transition **`Reconnecting`** and then returns to
+   **`Healthy`** within a timeout — proving the full wiring:
+   `ReconnectDriver → AdminOperationsActor → DriverHostActor.HandleReconnectDriver →
+   DriverInstanceActor FSM (ForceReconnect → Become(Reconnecting) → Become(Connected)) →
+   PublishHealthSnapshot → driver-health DPS topic → DriverStatusSignalRBridge → store`.
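+
+A rough sketch of how the step 7 assertion could be written. It assumes every published
+state is recorded as it arrives (for example from the mock `IHubContext` callback) instead
+of sampling only the store's current value, because a fake driver that reconnects
+immediately may pass through `Reconnecting` between two polls. `HealthStateRecorder` and
+`TransitionAssert` are hypothetical names, not types from this repo, and states are plain
+strings purely for illustration.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Threading.Tasks;
+
+// Hypothetical helper (not a repo type): a thread-safe, append-only history of
+// observed health states, fed by whatever surface the test observes.
+public sealed class HealthStateRecorder
+{
+    private readonly object _gate = new object();
+    private readonly List<string> _states = new List<string>();
+
+    public void Record(string state)
+    {
+        lock (_gate) { _states.Add(state); }
+    }
+
+    // Returns a copy so callers can inspect it without holding the lock.
+    public IReadOnlyList<string> Snapshot()
+    {
+        lock (_gate) { return _states.ToArray(); }
+    }
+}
+
+// Hypothetical helper (not a repo type).
+public static class TransitionAssert
+{
+    // True when `expected` appears within `observed` in order; other states
+    // may sit in between (an ordered-subsequence check).
+    public static bool ContainsInOrder(
+        IReadOnlyList<string> observed, IReadOnlyList<string> expected)
+    {
+        var next = 0;
+        foreach (var state in observed)
+        {
+            if (next < expected.Count && state == expected[next]) next++;
+        }
+        return next == expected.Count;
+    }
+
+    // Condition-poll: re-check the recorded history until the expected
+    // sequence has been seen or the timeout elapses.
+    public static async Task<bool> WaitForSequenceAsync(
+        HealthStateRecorder recorder,
+        IReadOnlyList<string> expected,
+        TimeSpan timeout,
+        TimeSpan pollInterval)
+    {
+        var deadline = DateTime.UtcNow + timeout;
+        while (true)
+        {
+            if (ContainsInOrder(recorder.Snapshot(), expected)) return true;
+            if (DateTime.UtcNow >= deadline) return false;
+            await Task.Delay(pollInterval);
+        }
+    }
+}
+```
+
+A test could then await `TransitionAssert.WaitForSequenceAsync(recorder,
+new[] { "Healthy", "Reconnecting", "Healthy" }, TimeSpan.FromSeconds(30),
+TimeSpan.FromMilliseconds(100))` and check the result with Shouldly's `ShouldBeTrue()`.
+The timeout and poll interval here are placeholders, not values taken from the harness.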
+
+#### Decision: controllable fake driver factory (not the real Modbus sim)
+
+**Recommended and approved:** observe the transition via a deterministic, controllable fake
+`IDriver` / `IDriverFactory` test double rather than a real Modbus sim connection.
+
+Rationale:
+
+- **Determinism, no flakiness.** A fake driver whose connect succeeds drives the actor to
+  `Connected` immediately; `ReconnectDriver` re-enters `Reconnecting` then `Connected`
+  deterministically. No sim timing, no skip-gate, runs everywhere.
+- **Smaller blast radius.** The real-sim path additionally needs the full driver-factory
+  bootstrap (all 9 driver factories + deps) wired into the shared harness.
+- **The wiring is what matters.** This gap is about the *health-transition + command
+  wiring*, not the Modbus protocol. The real Modbus TCP connect/reconnect is already
+  covered by the `Modbus.IntegrationTests` and the 10.1 `TestConnect` E2E (against the sim).
+
+The fake `IDriver` exposes a minimal controllable surface (succeed-connect, optionally
+signal a fault) sufficient to walk the FSM through `Connected → Reconnecting → Connected`.
+
+### 3. Live suite run
+
+Run the full `Host.IntegrationTests` driver E2E suite. Bring up the Modbus sim
+(`lmxopcua-fix up modbus standard`, endpoint `10.100.0.35:5020` / `MODBUS_SIM_ENDPOINT`) so
+the skip-gated 10.1 `DriverTestConnectE2eTests` actually execute green (not skipped),
+alongside the new deterministic reconnect test (which runs regardless of the sim).
+
+### 4. Reconcile the stale trackers
+
+- `docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json`: mark Phases 6–10 tasks
+  `completed` with their real commits / a "shipped — reconciled 2026-06-18" note; flip
+  `executionState`/`lastUpdated`.
+- `stillpending.md` §A.9: mark Phase 6/8/10 SHIPPED (note the new reconnect-transition test;
+  keep the deferred full-stack hub test as a documented follow-up). *(Never staged — local
+  working file.)*
+- `docs/plans/2026-05-28-adminui-driver-pages-design.md` §8.3: fix the stale `ModbusTcp` →
+  `Modbus` reference in the smoke checklist.
+- Memory: update `project_stillpending_backlog.md` + `MEMORY.md`.
+
+### 5. Explicitly deferred (documented follow-up)
+
+The **full-stack WebSocket + JWT `DriverStatusHub` connection test** (a real `HubConnection`
+to `/hubs/driverstatus` with a minted bearer token, `JoinDriver`, assert client receipt).
+No repo precedent (no test mints a JWT or opens a real `HubConnection`), flaky-prone, and it
+only re-covers transport the mock-hub test + the §8.3 manual runbook already handle.
+
+## Out of scope
+
+- The 10.4 manual browser smoke (driving the AdminUI on docker-dev). Foldable later; the
+  automated reconnect test + green suite is the higher-value core.
+- Any production code change. This phase is test + harness + docs only.
+
+## Constraints
+
+- xUnit + Shouldly. **No bUnit.**
+- **No** EF migration, **no** Commons wire/proto contract change, **no**
+  Core.Abstractions / interface contract change.
+- Stage by explicit path; never `git add .`; never stage `sql_login.txt`,
+  `src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/`, `pending.md`, `current.md`,
+  `docker-dev/docker-compose.yml`, `stillpending.md`.
+- No force-push, no `--no-verify`.
+
+## Finish
+
+Merge to master + push.