# Driver-pages Phase 10 — reconnect-transition E2E + plan close-out — Design
**Date:** 2026-06-18
**Status:** Approved
**Base:** master `08c7a2bd`
**Branch:** `feat/driver-pages-reconnect-e2e`
## Context
The `2026-05-28-adminui-driver-pages` plan is **fully shipped** (Phases 0-10), but its
`.tasks.json` is comprehensively stale (it still marks Phases 6-10 "pending" while the
code + tests all exist and are real). A brainstorming pass on 2026-06-18 verified, seam
by seam, that:
- **Phase 6** (live `DriverStatusPanel`) is built end-to-end: `DriverHealthChanged`
contract, `AkkaDriverHealthPublisher` (DI-bound in prod, invoked at
`DriverInstanceActor.PublishHealthSnapshot`), `DriverStatusSignalRBridge` (spawned
admin-gated at `Program.cs:196`), the shared-singleton snapshot store, the hub
(`MapHub<DriverStatusHub>`), and the panel wired into all 9 driver pages. Page
`DriverInstanceId` and actor `_driverInstanceId` key on the same EF value — no mismatch.
- **Phase 8** (Reconnect/Restart) is built: messages, `AdminOperationsActor` +
`DriverHostActor` handlers, DriverOperator-gated buttons in the panel.
- **Phase 10** automated E2E tests (`DriverTestConnectE2eTests`, `DriverReconnectE2eTests`,
`DriverStatusHubE2eTests`) are real, skip-gated, and honest about their scope.
Two genuine remnants remain, both flagged by the `DriverReconnectE2eTests` /
`DriverStatusHubE2eTests` scope-notes:
1. **The one real coverage gap:** no test proves a *deployed* driver actually transitions
**Healthy → Reconnecting → Healthy** in response to `ReconnectDriver`. The existing
10.2 test only asserts command *ingestion* (the round-trip reply), not the resulting
actor health transition.
2. **A harness-fidelity gap behind it:** `TwoNodeClusterHarness.BuildNodeAsync` calls
`WithOtOpcUaRuntimeActors()` (the Akka actor spawn) but **not** `AddOtOpcUaRuntime()`
(the DI registration that binds `IDriverHealthPublisher → AkkaDriverHealthPublisher`).
Consequently, in *every* current `Host.IntegrationTests` test, deployed driver actors fall
back to `NullDriverHealthPublisher` and emit no health to the `driver-health` DPS topic.
The harness also leaves `IDriverFactory` at `NullDriverFactory`, so deployed drivers
reach `Stubbed`, never `Connected`.
The plan's stale trackers caused this item to be (wrongly) re-listed as OPEN in
`stillpending.md` §A.9. This phase closes both remnants and reconciles the trackers.
## Goal
Close the genuine `ReconnectDriver` health-transition E2E gap, fix the harness-fidelity gap
behind it, prove the full Phase 10 driver suite green, and reconcile the stale trackers so
this fully-shipped plan stops re-triggering as backlog.
## Design
### 1. Harness fidelity fix
In `TwoNodeClusterHarness.BuildNodeAsync`:
- Add `builder.Services.AddOtOpcUaRuntime()` **before** `AddAkka` (matching production
`Program.cs:87`). This binds the real `AkkaDriverHealthPublisher` so deployed drivers
publish health to the `driver-health` DPS topic. It also seeds the `Null*` runtime
defaults (`IHistorianDataSource`, `IAlarmHistorianSink`, `IHistoryWriter`,
`IDriverFactory`, …) — all harmless no-ops that don't change existing test behavior
(nothing in the current suite subscribes to driver-health, and the Null sinks are inert).
- Add an **opt-in** seam to inject a test `IDriverFactory` for tests that need a connecting
driver. Default (no factory supplied) leaves the existing behavior untouched. Mechanism:
a `StartAsync` parameter (e.g. `IDriverFactory? driverFactory = null`) threaded into
`BuildNodeAsync`; when supplied, register it as a singleton **after** `AddOtOpcUaRuntime`
so it wins over the `Null` default (last-registration-wins / replace).
This change is fidelity-improving for the whole suite: the existing `DriverStatusHubE2eTests`
keeps spawning its own bridge, but real driver health now flows in tests that deploy drivers.
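
The opt-in seam can be sketched as follows. This is a minimal illustration, not the harness code: it assumes the `Microsoft.Extensions.DependencyInjection` package, and `AddRuntimeDefaults`, `BuildNode`, `FakeDriverFactory`, and the one-property `IDriverFactory` shape are hypothetical stand-ins for `AddOtOpcUaRuntime`, `BuildNodeAsync`, and the real contracts. It relies only on the container's documented behavior that resolving a single service returns the most recently added registration.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-ins: the real IDriverFactory contract is not shown in this document.
public interface IDriverFactory { string Name { get; } }
public sealed class NullDriverFactory : IDriverFactory { public string Name => "null"; }
public sealed class FakeDriverFactory : IDriverFactory { public string Name => "fake"; }

public static class HarnessRegistration
{
    // Stand-in for AddOtOpcUaRuntime(): registers only the Null default this sketch needs.
    public static IServiceCollection AddRuntimeDefaults(this IServiceCollection services)
        => services.AddSingleton<IDriverFactory, NullDriverFactory>();

    // Mirrors the proposed seam: runtime defaults first, optional test override second.
    public static IServiceProvider BuildNode(IDriverFactory? driverFactory = null)
    {
        var services = new ServiceCollection();
        services.AddRuntimeDefaults();

        if (driverFactory is not null)
        {
            // Added after the default, so single-service resolution returns this instance.
            services.AddSingleton<IDriverFactory>(driverFactory);
        }

        return services.BuildServiceProvider();
    }
}
```

Under these assumptions, `BuildNode()` resolves the `Null` default and `BuildNode(new FakeDriverFactory())` resolves the fake. If the real `AddOtOpcUaRuntime` used `TryAdd*` registrations instead, ordering would behave differently, so the "register after" rule is worth verifying against the actual extension method.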
### 2. The reconnect-transition E2E test (the real gap)
A new test (in `DriverReconnectE2eTests.cs`, or a focused new file) that:
1. Starts the harness with a **controllable fake `IDriverFactory`** (see decision below).
2. Seeds a driver row + minimal equipment/tag using the existing
`SeedDeploymentWithEquipmentTags` precedent (from `DriverHostActorLiveValueTests`),
bound to `NodeANodeId`.
3. Triggers a deploy (`DispatchDeployment`, or `POST /api/deployments` with
`HarnessDeployApiKey`).
4. Spawns the **real** `DriverStatusSignalRBridge` over the real DI snapshot store (the
store is the observation surface; a mock `IHubContext` captures the hub push the same
way the existing hub test does).
5. Waits (condition-poll, generous timeout) for the snapshot store to report the deployed
instance as `Healthy`.
6. Dispatches `ReconnectDriver` via `IAdminOperationsClient` (the real cluster-singleton
path the AdminUI button uses).
7. Asserts the store observes the transition to **`Reconnecting`** and then the return to
**`Healthy`** within a timeout — proving the full wiring:
`ReconnectDriver → AdminOperationsActor → DriverHostActor.HandleReconnectDriver →
DriverInstanceActor FSM (ForceReconnect → Become(Reconnecting) → Become(Connected)) →
PublishHealthSnapshot → driver-health DPS topic → DriverStatusSignalRBridge → store`.
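
Steps 5 and 7 can be sketched with two small helpers: a condition-poll with a timeout, and an ordered record of observed states. Both types are hypothetical illustrations, not repo code. `StateRecorder` stands in for whatever captures each health update (for example the mock `IHubContext` capture), and states are plain strings here for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Poll
{
    // Re-evaluates the condition until it holds or the timeout elapses.
    public static async Task<bool> UntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan? interval = null)
    {
        var delay = interval ?? TimeSpan.FromMilliseconds(50);
        var clock = Stopwatch.StartNew();
        while (clock.Elapsed < timeout)
        {
            if (condition()) return true;
            await Task.Delay(delay);
        }
        return condition(); // one final check at the deadline
    }
}

// Hypothetical observation surface: records every reported state, in arrival order.
public sealed class StateRecorder
{
    private readonly object _gate = new();
    private readonly List<string> _states = new();

    public void Record(string state)
    {
        lock (_gate) _states.Add(state);
    }

    // True when 'expected' appears as an ordered (not necessarily adjacent) subsequence.
    public bool SawInOrder(params string[] expected)
    {
        lock (_gate)
        {
            var next = 0;
            foreach (var s in _states)
            {
                if (next < expected.Length && s == expected[next]) next++;
            }
            return next == expected.Length;
        }
    }
}
```

A test would then await something like `Poll.UntilAsync(() => recorder.SawInOrder("Healthy", "Reconnecting", "Healthy"), TimeSpan.FromSeconds(30))` and assert the result is `true`. Recording every update, rather than polling only the latest snapshot, is a deliberate choice in this sketch: with a fake driver that reconnects immediately, the `Reconnecting` state may be too brief for a latest-value poll to catch.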
#### Decision: controllable fake driver factory (not the real Modbus sim)
**Recommended and approved:** observe the transition via a deterministic, controllable fake
`IDriver` / `IDriverFactory` test double rather than a real Modbus sim connection.
Rationale:
- **Determinism, no flakiness.** A fake driver whose connect succeeds drives the actor to
`Connected` immediately; `ReconnectDriver` re-enters `Reconnecting` then `Connected`
deterministically. No sim timing, no skip-gate, runs everywhere.
- **Smaller blast radius.** The real-sim path additionally needs the full driver-factory
bootstrap (all 9 driver factories + deps) wired into the shared harness.
- **The wiring is what matters.** This gap is about the *health-transition + command
wiring*, not the Modbus protocol. The real Modbus TCP connect/reconnect is already
covered by the `Modbus.IntegrationTests` and the 10.1 `TestConnect` E2E (against the sim).
The fake `IDriver` exposes a minimal controllable surface (succeed-connect, optionally
signal a fault) sufficient to walk the FSM through `Connected → Reconnecting → Connected`.
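
A minimal sketch of such a test double follows. The `IConnectableDriver` interface is a hypothetical reduction used only for illustration; the repo's real `IDriver` surface is not shown in this document and is likely larger, so the fake would need to implement whatever members that contract actually requires.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal contract, NOT the repo's real IDriver interface.
public interface IConnectableDriver
{
    Task ConnectAsync(CancellationToken ct);
    Task DisconnectAsync(CancellationToken ct);
}

public sealed class ControllableFakeDriver : IConnectableDriver
{
    private int _connectCalls;
    private volatile bool _failNextConnect;

    // Number of connect attempts so far, including failed ones.
    public int ConnectCalls => Volatile.Read(ref _connectCalls);

    public bool IsConnected { get; private set; }

    // Arms a one-shot failure: the next ConnectAsync call returns a faulted task.
    public void FailNextConnect() => _failNextConnect = true;

    public Task ConnectAsync(CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        Interlocked.Increment(ref _connectCalls);

        if (_failNextConnect)
        {
            _failNextConnect = false;
            IsConnected = false;
            return Task.FromException(
                new InvalidOperationException("Simulated connect fault."));
        }

        // Default behavior: connect succeeds immediately, with no I/O or timing.
        IsConnected = true;
        return Task.CompletedTask;
    }

    public Task DisconnectAsync(CancellationToken ct)
    {
        IsConnected = false;
        return Task.CompletedTask;
    }
}
```

The `ConnectCalls` counter gives a test a second, driver-side signal that a reconnect actually re-entered connect, independent of the health states observed through the store.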
### 3. Live suite run
Run the full `Host.IntegrationTests` driver E2E suite. Bring up the Modbus sim
(`lmxopcua-fix up modbus standard`, endpoint `10.100.0.35:5020` / `MODBUS_SIM_ENDPOINT`) so
the skip-gated 10.1 `DriverTestConnectE2eTests` actually execute green (not skipped),
alongside the new deterministic reconnect test (which runs regardless of the sim).
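
Before the run, it can help to confirm the sim is actually reachable, so a green result is not a silent skip. The helper below is a hypothetical pre-flight sketch, not the suite's real skip-gate, whose mechanism is not described here. It assumes `MODBUS_SIM_ENDPOINT` holds a `host:port` string and falls back to the endpoint named above when the variable is unset.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class ModbusSimProbe
{
    // Fallback taken from this plan; assumed to apply when the variable is unset.
    public const string DefaultEndpoint = "10.100.0.35:5020";

    // Splits "host:port" and validates the port range.
    public static (string Host, int Port) Parse(string endpoint)
    {
        var idx = endpoint.LastIndexOf(':');
        if (idx <= 0
            || !int.TryParse(endpoint.Substring(idx + 1), out var port)
            || port < 1 || port > 65535)
        {
            throw new FormatException($"Expected host:port, got '{endpoint}'.");
        }
        return (endpoint.Substring(0, idx), port);
    }

    public static (string Host, int Port) Resolve()
        => Parse(Environment.GetEnvironmentVariable("MODBUS_SIM_ENDPOINT") ?? DefaultEndpoint);

    // Attempts a plain TCP connect; returns false on timeout or socket error.
    public static async Task<bool> IsReachableAsync(string host, int port, TimeSpan timeout)
    {
        using var client = new TcpClient();
        try
        {
            var connect = client.ConnectAsync(host, port);
            var finished = await Task.WhenAny(connect, Task.Delay(timeout));
            if (finished != connect) return false; // timed out
            await connect; // rethrows a connect failure, if any
            return true;
        }
        catch (SocketException)
        {
            return false;
        }
    }
}
```

A TCP connect only shows that something is listening on the port; it does not prove the listener speaks Modbus, so the executed (not skipped) status of the 10.1 tests remains the real evidence.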
### 4. Reconcile the stale trackers
- `docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json`: mark Phases 6-10 tasks
`completed` with their real commits / a "shipped — reconciled 2026-06-18" note; flip
`executionState`/`lastUpdated`.
- `stillpending.md` §A.9: mark Phase 6/8/10 SHIPPED (note the new reconnect-transition test;
keep the deferred full-stack hub test as a documented follow-up). *(Never staged — local
working file.)*
- `docs/plans/2026-05-28-adminui-driver-pages-design.md` §8.3: fix the stale `ModbusTcp`
`Modbus` reference in the smoke checklist.
- Memory: update `project_stillpending_backlog.md` + `MEMORY.md`.
### 5. Explicitly deferred (documented follow-up)
The **full-stack WebSocket + JWT `DriverStatusHub` connection test** (a real `HubConnection`
to `/hubs/driverstatus` with a minted bearer token, `JoinDriver`, assert client receipt) is
deferred. It has no repo precedent (no test mints a JWT or opens a real `HubConnection`), is
flake-prone, and only re-covers transport that the mock-hub test and the §8.3 manual runbook
already handle.
## Out of scope
- The 10.4 manual browser smoke (driving the AdminUI on docker-dev). Foldable later; the
automated reconnect test + green suite is the higher-value core.
- Any production code change. This phase is test + harness + docs only.
## Constraints
- xUnit + Shouldly. **No bUnit.**
- **No** EF migration, **no** Commons wire/proto contract change, **no**
Core.Abstractions / interface contract change.
- Stage by explicit path; never `git add .`; never stage `sql_login.txt`,
`src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/`, `pending.md`, `current.md`,
`docker-dev/docker-compose.yml`, `stillpending.md`.
- No force-push, no `--no-verify`.
## Finish
Merge to master + push.