docs(plans): driver-pages Phase 10 reconnect-transition E2E + close-out design
@@ -0,0 +1,157 @@
# Driver-pages Phase 10 — reconnect-transition E2E + plan close-out — Design

**Date:** 2026-06-18
**Status:** Approved
**Base:** master `08c7a2bd`
**Branch:** `feat/driver-pages-reconnect-e2e`

## Context

The `2026-05-28-adminui-driver-pages` plan is **fully shipped** (Phases 0–10), but its
`.tasks.json` is comprehensively stale (it still marks Phases 6–10 "pending" while the
code + tests all exist and are real). A brainstorming pass on 2026-06-18 verified, seam
by seam, that:

- **Phase 6** (live `DriverStatusPanel`) is built end-to-end: `DriverHealthChanged`
  contract, `AkkaDriverHealthPublisher` (DI-bound in prod, invoked at
  `DriverInstanceActor.PublishHealthSnapshot`), `DriverStatusSignalRBridge` (spawned
  admin-gated at `Program.cs:196`), the shared-singleton snapshot store, the hub
  (`MapHub<DriverStatusHub>`), and the panel wired into all 9 driver pages. Page
  `DriverInstanceId` and actor `_driverInstanceId` key on the same EF value, so there is
  no mismatch.
- **Phase 8** (Reconnect/Restart) is built: messages, `AdminOperationsActor` +
  `DriverHostActor` handlers, DriverOperator-gated buttons in the panel.
- **Phase 10** automated E2E tests (`DriverTestConnectE2eTests`, `DriverReconnectE2eTests`,
  `DriverStatusHubE2eTests`) are real, skip-gated, and honest about their scope.

Two genuine remnants remain, both flagged by the `DriverReconnectE2eTests` /
`DriverStatusHubE2eTests` scope notes:

1. **The one real coverage gap:** no test proves a *deployed* driver actually transitions
   **Healthy → Reconnecting → Healthy** in response to `ReconnectDriver`. The existing
   10.2 test only asserts command *ingestion* (the round-trip reply), not the resulting
   actor health transition.
2. **A harness-fidelity gap behind it:** `TwoNodeClusterHarness.BuildNodeAsync` calls
   `WithOtOpcUaRuntimeActors()` (the Akka actor spawn) but **not** `AddOtOpcUaRuntime()`
   (the DI registration that binds `IDriverHealthPublisher → AkkaDriverHealthPublisher`).
   Consequently, in *every* current `Host.IntegrationTests` test, deployed driver actors
   fall back to `NullDriverHealthPublisher` and emit no health to the `driver-health` DPS
   topic. The harness also leaves `IDriverFactory` at `NullDriverFactory`, so deployed
   drivers reach `Stubbed`, never `Connected`.

The plan's stale trackers caused this item to be (wrongly) re-listed as OPEN in
`stillpending.md` §A.9. This phase closes both remnants and reconciles the trackers.

## Goal

Close the genuine `ReconnectDriver` health-transition E2E gap, fix the harness-fidelity gap
behind it, prove the full Phase 10 driver suite green, and reconcile the stale trackers so
this fully-shipped plan stops re-triggering as backlog.

## Design

### 1. Harness fidelity fix

In `TwoNodeClusterHarness.BuildNodeAsync`:

- Add `builder.Services.AddOtOpcUaRuntime()` **before** `AddAkka` (matching production
  `Program.cs:87`). This binds the real `AkkaDriverHealthPublisher` so deployed drivers
  publish health to the `driver-health` DPS topic. It also seeds the `Null*` runtime
  defaults (`IHistorianDataSource`, `IAlarmHistorianSink`, `IHistoryWriter`,
  `IDriverFactory`, …), all harmless no-ops that don't change existing test behavior
  (nothing in the current suite subscribes to driver-health, and the Null sinks are inert).
- Add an **opt-in** seam to inject a test `IDriverFactory` for tests that need a connecting
  driver. Default (no factory supplied) leaves the existing behavior untouched. Mechanism:
  a `StartAsync` parameter (e.g. `IDriverFactory? driverFactory = null`) threaded into
  `BuildNodeAsync`; when supplied, register it as a singleton **after** `AddOtOpcUaRuntime`
  so it wins over the `Null` default (last-registration-wins / replace).
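
The opt-in seam described above could look roughly like the following sketch. It assumes `Microsoft.Extensions.DependencyInjection`; `IDriverFactory`, `NullDriverFactory` and `FakeDriverFactory` are empty stand-in declarations, `AddRuntimeDefaults` is a hypothetical placeholder for whatever `AddOtOpcUaRuntime()` actually registers, and `BuildNode` only imitates the shape of `BuildNodeAsync`. The real signatures in the repo may differ.

```csharp
#nullable enable
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Stand-in declarations so the sketch compiles on its own; the real types live in the repo.
public interface IDriverFactory { }
public sealed class NullDriverFactory : IDriverFactory { }
public sealed class FakeDriverFactory : IDriverFactory { }

public static class HarnessRegistrationSketch
{
    // Hypothetical placeholder for the Null* defaults that AddOtOpcUaRuntime() seeds.
    public static IServiceCollection AddRuntimeDefaults(this IServiceCollection services)
    {
        services.AddSingleton<IDriverFactory, NullDriverFactory>();
        return services;
    }

    // Imitates the proposed seam: passing no factory leaves the defaults untouched.
    public static ServiceProvider BuildNode(IDriverFactory? driverFactory = null)
    {
        var services = new ServiceCollection();
        services.AddRuntimeDefaults();

        if (driverFactory is not null)
        {
            // Replace removes the first existing IDriverFactory descriptor and adds this one,
            // so the override must run after the defaults have been registered.
            services.Replace(ServiceDescriptor.Singleton(driverFactory));
        }

        return services.BuildServiceProvider();
    }
}
```

A plain second `AddSingleton` call would also give the supplied factory precedence for single-service resolution, since the last registration wins. `Replace` additionally keeps `GetServices<IDriverFactory>()` from returning both the default and the override.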

This change is fidelity-improving for the whole suite: the existing `DriverStatusHubE2eTests`
keeps spawning its own bridge, but real driver health now flows in tests that deploy drivers.

### 2. The reconnect-transition E2E test (the real gap)

A new test (in `DriverReconnectE2eTests.cs`, or a focused new file) that:

1. Starts the harness with a **controllable fake `IDriverFactory`** (see decision below).
2. Seeds a driver row + minimal equipment/tag using the existing
   `SeedDeploymentWithEquipmentTags` precedent (from `DriverHostActorLiveValueTests`),
   bound to `NodeANodeId`.
3. Triggers a deploy (`DispatchDeployment`, or `POST /api/deployments` with
   `HarnessDeployApiKey`).
4. Spawns the **real** `DriverStatusSignalRBridge` over the real DI snapshot store (the
   store is the observation surface; a mock `IHubContext` captures the hub push the same
   way the existing hub test does).
5. Waits (condition-poll, generous timeout) for the snapshot store to report the deployed
   instance as `Healthy`.
6. Dispatches `ReconnectDriver` via `IAdminOperationsClient` (the real cluster-singleton
   path the AdminUI button uses).
7. Asserts the store observes the transition to **`Reconnecting`** and then the return to
   **`Healthy`** within a timeout, proving the full wiring:
   `ReconnectDriver → AdminOperationsActor → DriverHostActor.HandleReconnectDriver →
   DriverInstanceActor FSM (ForceReconnect → Become(Reconnecting) → Become(Connected)) →
   PublishHealthSnapshot → driver-health DPS topic → DriverStatusSignalRBridge → store`.
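
The waiting and asserting in steps 5 and 7 above can be sketched with plain BCL types. Both `HealthTransitionRecorder` and `Poll` are hypothetical helpers, not existing repo code; in the real test the recorder's role would be played by whatever captures the bridge output (the mock `IHubContext` or the snapshot store), and the state names are plain strings here only for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Records every health state handed to it, in arrival order (hypothetical helper).
public sealed class HealthTransitionRecorder
{
    private readonly object _gate = new();
    private readonly List<string> _states = new();

    public void Record(string state)
    {
        lock (_gate) { _states.Add(state); }
    }

    // True when `expected` appears as an ordered, not necessarily contiguous, subsequence.
    public bool SawInOrder(params string[] expected)
    {
        lock (_gate)
        {
            var next = 0;
            foreach (var state in _states)
            {
                if (next < expected.Length && state == expected[next]) next++;
            }
            return next == expected.Length;
        }
    }
}

public static class Poll
{
    // Re-checks `condition` until it holds or `timeout` elapses; returns the final result.
    public static async Task<bool> UntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan? interval = null)
    {
        var delay = interval ?? TimeSpan.FromMilliseconds(50);
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            if (condition()) return true;
            await Task.Delay(delay);
        }
        return condition();
    }
}
```

Under these assumptions, step 5 becomes `await Poll.UntilAsync(() => recorder.SawInOrder("Healthy"), timeout)` and step 7 becomes the same call with `SawInOrder("Healthy", "Reconnecting", "Healthy")`. Recording the history, rather than polling only the latest snapshot, is a deliberate choice in this sketch: with an instantly connecting fake driver the `Reconnecting` state may be too brief for a latest-value poll to catch.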

#### Decision: controllable fake driver factory (not the real Modbus sim)

**Recommended and approved:** observe the transition via a deterministic, controllable fake
`IDriver` / `IDriverFactory` test double rather than a real Modbus sim connection.

Rationale:

- **Determinism, no flakiness.** A fake driver whose connect succeeds drives the actor to
  `Connected` immediately; `ReconnectDriver` re-enters `Reconnecting` then `Connected`
  deterministically. No sim timing, no skip-gate, runs everywhere.
- **Smaller blast radius.** The real-sim path additionally needs the full driver-factory
  bootstrap (all 9 driver factories + deps) wired into the shared harness.
- **The wiring is what matters.** This gap is about the *health-transition + command
  wiring*, not the Modbus protocol. The real Modbus TCP connect/reconnect is already
  covered by `Modbus.IntegrationTests` and the 10.1 `TestConnect` E2E (against the sim).

The fake `IDriver` exposes a minimal controllable surface (succeed-connect, optionally
signal a fault) sufficient to walk the FSM through `Connected → Reconnecting → Connected`.
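
A minimal sketch of such a test double follows. The `IDriver` and `IDriverFactory` shown are hypothetical, cut-down stand-ins declared only so the example compiles; the real contracts are larger and are not changed by this phase. `FailNextConnect` and `ConnectCalls` are illustrative names.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, cut-down stand-ins for the real driver contracts.
public interface IDriver
{
    Task ConnectAsync(CancellationToken ct);
}

public interface IDriverFactory
{
    IDriver Create(string driverInstanceId);
}

public sealed class ControllableFakeDriver : IDriver
{
    private int _connectCalls;

    // When set, the next ConnectAsync faults once and the flag clears itself.
    public bool FailNextConnect { get; set; }

    // Number of connect attempts seen so far, including faulted ones.
    public int ConnectCalls => Volatile.Read(ref _connectCalls);

    public Task ConnectAsync(CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        Interlocked.Increment(ref _connectCalls);
        if (FailNextConnect)
        {
            FailNextConnect = false;
            return Task.FromException(new InvalidOperationException("simulated connect fault"));
        }
        return Task.CompletedTask; // succeed-connect: completes immediately
    }
}

public sealed class ControllableFakeDriverFactory : IDriverFactory
{
    // The test keeps a handle on the single driver so it can inspect or fault it.
    public ControllableFakeDriver Driver { get; } = new();

    public IDriver Create(string driverInstanceId) => Driver;
}
```

Handing out one shared driver instance keeps the sketch small; a test that deploys more than one driver instance would presumably key the fakes by `driverInstanceId` instead. How the actor actually invokes the driver during a forced reconnect is not modelled here.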

### 3. Live suite run

Run the full `Host.IntegrationTests` driver E2E suite. Bring up the Modbus sim
(`lmxopcua-fix up modbus standard`, endpoint `10.100.0.35:5020` / `MODBUS_SIM_ENDPOINT`) so
the skip-gated 10.1 `DriverTestConnectE2eTests` actually execute green (not skipped),
alongside the new deterministic reconnect test (which runs regardless of the sim).
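
Because a skipped test is easy to mistake for a green one, a pre-flight reachability probe can help confirm the sim is up before the run. The sketch below is a hypothetical helper, not the repo's actual skip gate. It assumes `MODBUS_SIM_ENDPOINT` holds a `host:port` string and falls back to the endpoint quoted above when the variable is unset; both points are assumptions.

```csharp
#nullable enable
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class ModbusSimProbe
{
    // Assumed fallback, taken from the endpoint quoted in this section.
    public const string DefaultEndpoint = "10.100.0.35:5020";

    // Reads MODBUS_SIM_ENDPOINT (assumed "host:port") or uses the fallback.
    public static (string Host, int Port) ResolveEndpoint()
    {
        var raw = Environment.GetEnvironmentVariable("MODBUS_SIM_ENDPOINT");
        return Parse(string.IsNullOrWhiteSpace(raw) ? DefaultEndpoint : raw);
    }

    public static (string Host, int Port) Parse(string endpoint)
    {
        var idx = endpoint.LastIndexOf(':');
        if (idx <= 0 || !int.TryParse(endpoint[(idx + 1)..], out var port))
        {
            throw new FormatException($"Expected host:port, got '{endpoint}'.");
        }
        return (endpoint[..idx], port);
    }

    // Attempts a TCP connect and reports whether it succeeded within the timeout.
    public static async Task<bool> IsReachableAsync(string host, int port, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        using var client = new TcpClient();
        try
        {
            await client.ConnectAsync(host, port, cts.Token);
            return true;
        }
        catch (Exception ex) when (ex is SocketException or OperationCanceledException)
        {
            return false;
        }
    }
}
```

A TCP connect only shows that something is listening on the port; it does not validate Modbus framing, which the 10.1 tests themselves exercise.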

### 4. Reconcile the stale trackers

- `docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json`: mark Phases 6–10 tasks
`completed` with their real commits / a "shipped — reconciled 2026-06-18" note; flip
  `executionState`/`lastUpdated`.
- `stillpending.md` §A.9: mark Phase 6/8/10 SHIPPED (note the new reconnect-transition test;
  keep the deferred full-stack hub test as a documented follow-up). *(Never staged: local
  working file.)*
- `docs/plans/2026-05-28-adminui-driver-pages-design.md` §8.3: fix the stale `ModbusTcp` →
  `Modbus` reference in the smoke checklist.
- Memory: update `project_stillpending_backlog.md` + `MEMORY.md`.

### 5. Explicitly deferred (documented follow-up)

The **full-stack WebSocket + JWT `DriverStatusHub` connection test** (a real `HubConnection`
to `/hubs/driverstatus` with a minted bearer token, `JoinDriver`, assert client receipt) is
deferred. It has no repo precedent (no test mints a JWT or opens a real `HubConnection`), is
flaky-prone, and only re-covers transport that the mock-hub test + the §8.3 manual runbook
already handle.

## Out of scope

- The 10.4 manual browser smoke (driving the AdminUI on docker-dev). It can be folded in
  later; the automated reconnect test + green suite is the higher-value core.
- Any production code change. This phase is test + harness + docs only.

## Constraints

- xUnit + Shouldly. **No bUnit.**
- **No** EF migration, **no** Commons wire/proto contract change, **no**
  Core.Abstractions / interface contract change.
- Stage by explicit path; never `git add .`; never stage `sql_login.txt`,
  `src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/`, `pending.md`, `current.md`,
  `docker-dev/docker-compose.yml`, `stillpending.md`.
- No force-push, no `--no-verify`.

## Finish

Merge to master + push.