# Driver-pages Phase 10 — reconnect-transition E2E + close-out Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task.

**Goal:** Add the one genuinely missing driver-pages E2E test (a *deployed* driver
transitioning **Healthy → Reconnecting → Healthy** on `ReconnectDriver`), fix the
harness-fidelity gap behind it, prove the suite green, and reconcile the stale trackers.

**Architecture:** Extend `TwoNodeClusterHarness` to match production DI (`AddOtOpcUaRuntime`,
which binds the real `AkkaDriverHealthPublisher`) and to accept an opt-in test
`IDriverFactory`. A controllable fake driver lets a deployed driver reach `Connected`
deterministically; the real `DriverStatusSignalRBridge` + a capturing mock `IHubContext`
record the full health-transition sequence through the real cluster wiring.

**Tech Stack:** xUnit + Shouldly, Akka.NET TestKit/Hosting, Moq (for `IHubContext`), EF
InMemory. No bUnit, no EF migration, no Commons/proto/interface change.

**Design:** `docs/plans/2026-06-18-driver-pages-reconnect-e2e-design.md` (committed `482418c8`).

---
### Task 1: Harness fidelity fix + controllable fake driver factory
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none
**Files:**
- Create: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs`
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/TwoNodeClusterHarness.cs`
- Read for contract (do NOT edit): `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriver.cs`,
`src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriverFactory.cs`,
`tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorTests.cs`
(existing fake `IDriver`/`IDriverFactory` double to template from),
`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs` (the
`AddOtOpcUaRuntime` method + the `resolver.GetService<IDriverHealthPublisher>()` site).

**Context (scene-setting):** Today `TwoNodeClusterHarness.BuildNodeAsync` calls
`WithOtOpcUaRuntimeActors()` (Akka actor spawn) but **not** `AddOtOpcUaRuntime()` (the DI
registration). So `IDriverHealthPublisher` resolves to `NullDriverHealthPublisher` and
`IDriverFactory` to `NullDriverFactory` → deployed drivers never publish health and reach
only `Stubbed`. Production (`Program.cs:87` + `:199`) calls both. This task brings the
harness to production fidelity and adds an opt-in fake factory so a test can drive a real
`Connected` state.
**Step 1: Build the fake driver double.** Create `FakeReconnectDriverFactory` implementing
`IDriverFactory` whose `TryCreate(driverType, instanceId, configJson)` returns a fake
`IDriver` (for `driverType == "Modbus"`; `SupportedTypes => ["Modbus"]`). Mirror the
existing fake `IDriver` in `DriverInstanceActorTests.cs` for the full member surface
(`DriverType`, connect/initialize, read/write/subscribe, dispose). The fake's
initialize/connect path must **succeed** so the `DriverInstanceActor` reaches
`InitializeSucceeded → Become(Connected)`. Keep read/subscribe as benign success/no-ops.
(No fault-injection needed: `ReconnectDriver` drives `ForceReconnect → Reconnecting →
re-initialize → Connected` on its own.)
**Step 2: Wire the harness.** In `TwoNodeClusterHarness`:
- Add `builder.Services.AddOtOpcUaRuntime();` **before** the `AddAkka(...)` call in
`BuildNodeAsync` (match production ordering — "Call this BEFORE AddAkka").
- Add an optional `IDriverFactory? driverFactory = null` parameter to `StartAsync` and
thread it into `BuildNodeAsync`; when non-null, register it
(`builder.Services.AddSingleton<IDriverFactory>(driverFactory);` placed **after**
`AddOtOpcUaRuntime` so it replaces the `Null` default — confirm the runtime resolves the
last/replacement registration, not the `TryAdd` default; if `TryAdd` would win, use
`Replace` or register the override before `AddOtOpcUaRuntime`).
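
The ordering question in Step 2 comes down to how `Microsoft.Extensions.DependencyInjection` treats `TryAdd`, `Add`, and `Replace`. The following is a minimal sketch, assuming `AddOtOpcUaRuntime` registers its `Null` default with `TryAddSingleton` (not confirmed here) and that consumers resolve `IDriverFactory` from the container at runtime. `IDriverFactory`, `NullDriverFactory`, `FakeDriverFactory`, and `AddRuntimeDefaults` are simplified, hypothetical stand-ins, not the project's types. It needs the `Microsoft.Extensions.DependencyInjection` package.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Hypothetical stand-ins; the real IDriverFactory has a different surface.
public interface IDriverFactory { string Name { get; } }
public sealed class NullDriverFactory : IDriverFactory { public string Name => "Null"; }
public sealed class FakeDriverFactory : IDriverFactory { public string Name => "Fake"; }

public static class RegistrationOrderDemo
{
    // Stand-in for AddOtOpcUaRuntime. ASSUMPTION: the default is added via TryAdd.
    public static IServiceCollection AddRuntimeDefaults(this IServiceCollection services)
    {
        services.TryAddSingleton<IDriverFactory, NullDriverFactory>();
        return services;
    }

    // Override registered AFTER the defaults: both descriptors exist,
    // and single-service resolution returns the last one registered.
    public static string OverrideAfter()
    {
        var services = new ServiceCollection();
        services.AddRuntimeDefaults();
        services.AddSingleton<IDriverFactory>(new FakeDriverFactory());
        using var provider = services.BuildServiceProvider();
        return provider.GetRequiredService<IDriverFactory>().Name; // "Fake"
    }

    // Override registered BEFORE the defaults: TryAdd sees an existing
    // registration for IDriverFactory and does nothing.
    public static string OverrideBefore()
    {
        var services = new ServiceCollection();
        services.AddSingleton<IDriverFactory>(new FakeDriverFactory());
        services.AddRuntimeDefaults();
        using var provider = services.BuildServiceProvider();
        return provider.GetRequiredService<IDriverFactory>().Name; // "Fake"
    }

    // Replace removes the first descriptor for the service type and adds the
    // new one, leaving exactly one IDriverFactory registration.
    public static string OverrideWithReplace()
    {
        var services = new ServiceCollection();
        services.AddRuntimeDefaults();
        services.Replace(ServiceDescriptor.Singleton<IDriverFactory>(new FakeDriverFactory()));
        using var provider = services.BuildServiceProvider();
        return provider.GetRequiredService<IDriverFactory>().Name; // "Fake"
    }
}
```

Under these assumptions all three orderings resolve the fake. The differences matter only if something enumerates `IEnumerable<IDriverFactory>` (the "after" variant leaves two registrations) or captures the factory while `AddOtOpcUaRuntime` itself is running, which is the case the "confirm" note in Step 2 is guarding against.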
**Step 3: Build + regression-check existing suite.** Run
`dotnet build tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests` (clean), then
`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests` — the existing tests
(DeployApi, DriverReconnect, DriverStatusHub, DriverTestConnect, etc.) must stay green with
the added `AddOtOpcUaRuntime` (the Null sinks are inert; nothing existing subscribes to
driver-health). Skipped fixture-gated tests staying skipped is expected.
**Step 4: Commit.**
```bash
git add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs \
tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/TwoNodeClusterHarness.cs
git commit -m "test(harness): production-fidelity DI (AddOtOpcUaRuntime) + opt-in fake driver factory"
```
---
### Task 2: Reconnect health-transition E2E test
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 1)
**Files:**
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs`
(add the new test method; keep the existing two)
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs`
(Step 0: controllable health + `InitializeCount` + created-driver accessor)
- Read for pattern (do NOT edit):
`tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/MultiClusterScopingTests.cs`
(the GREEN seed-`ServerCluster`/`Namespace`/`ClusterNode`/`DriverInstance` + `StartDeploymentAsync`
`Accepted` + per-node driver spawn precedent),
`tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverStatusHubE2eTests.cs`
(the mock `IHubContext` capture + manual bridge-spawn pattern).

**Context:** The existing `Reconnect_RoundTrip_ReturnsOk` only asserts command ingestion.
This new test proves the *actual health transition* of a deployed driver, end-to-end through
the real cluster wiring: `ReconnectDriver → AdminOperationsActor → DriverHostActor →
DriverInstanceActor FSM → PublishHealthSnapshot → driver-health DPS topic →
DriverStatusSignalRBridge → snapshot store / hub push`.
**Key findings from exploration (these shape the design — do not skip):**
- **Published `State` = `_driver.GetHealth().State`** (`DriverInstanceActor.PublishHealthSnapshot`,
line 750). The actor FSM (`Become(Reconnecting)`) does NOT set the published state directly —
on `ForceReconnect` it does `DetachSubscription(); Become(Reconnecting); PublishHealthSnapshot()`,
which *polls the driver's `GetHealth()`*. So the **always-`Healthy` Task 1 fake can never surface
`Reconnecting`.** The fake must report `Reconnecting` at that poll. The realistic, deterministic
way: the fake reports `Reconnecting` (simulating a dropped connection — exactly what prompts an
operator to click Reconnect), the `ForceReconnect` poll publishes it, and the retry's
`InitializeAsync` clears it back to `Healthy`.
- **Validator-clean seed** (from the GREEN `MultiClusterScopingTests`): `ServerCluster` +
`Namespace` + `ClusterNode`(NodeId = `NodeANodeId`) + `DriverInstance`(Enabled, DriverType
`"Modbus"`, DriverConfig `"{}"`). **No equipment/tags** — equipment/tags trip `DraftValidator`
and the deploy is `Rejected` (this is why the stale `EquipmentNamespaceMaterializationTests`
fails — pre-existing, unrelated).
- **Bridge is NOT auto-spawned** by the harness — spawn it manually (as `DriverStatusHubE2eTests`
does). DPS is fire-and-forget (no replay), and the driver's repeat-publish is deduped, so spawn
the bridge + await its DPS subscription (~2 s) **before** deploying so it catches the initial
`Healthy`.
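
The first finding is easier to see in a toy model. The sketch below is not the `DriverInstanceActor` code; it only assumes what the findings state: the published state is whatever the driver's `GetHealth()` returns at poll time, repeated states are deduped, and `ForceReconnect` polls once before and (assumed here) once after re-initializing. `HealthPublisherModel` and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public enum DriverState { Healthy, Reconnecting }

// Toy model only: mimics "publish what the driver reports, skip repeats".
public sealed class HealthPublisherModel
{
    private readonly Func<DriverState> _getHealth;
    private DriverState? _lastPublished;

    public List<DriverState> Published { get; } = new();

    public HealthPublisherModel(Func<DriverState> getHealth) => _getHealth = getHealth;

    // Polls the driver; records the state unless it equals the last published one.
    public void PublishHealthSnapshot()
    {
        var state = _getHealth();
        if (state == _lastPublished) return; // dedup of repeat publishes
        _lastPublished = state;
        Published.Add(state);
    }

    // Models the reconnect path: poll, re-initialize, poll again.
    public void ForceReconnect(Action reinitialize)
    {
        PublishHealthSnapshot();
        reinitialize();
        PublishHealthSnapshot();
    }
}
```

With a driver that always reports `Healthy`, an initial publish followed by `ForceReconnect` leaves `Published` at a single `Healthy` entry, so a subscriber never sees `Reconnecting`. With a driver whose reported state flips to `Reconnecting` before the command and back to `Healthy` inside `reinitialize`, the same calls yield `Healthy`, `Reconnecting`, `Healthy`.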
**Step 0: Enhance `FakeReconnectDriver` / `FakeReconnectDriverFactory`** (same file from Task 1):
- `FakeReconnectDriver`: add a `volatile`/locked controllable health — `GetHealth()` returns
`DriverState.Reconnecting` when a `_reconnecting` flag is set, else `Healthy`; a public
`ReportReconnecting()` sets the flag; `InitializeAsync` clears it (and bumps a public
`InitializeCount`). (`DriverHealth` ctor = `new(state, lastSuccessfulRead, lastError)`.)
- `FakeReconnectDriverFactory`: record created drivers so the test can retrieve the one for a
given `driverInstanceId` (e.g. a `ConcurrentDictionary<string, FakeReconnectDriver> Created`
or `TryGetCreated(id)`).
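
A minimal sketch of the Step 0 shape follows. It does not implement the real `IDriver` / `IDriverFactory` contracts (those must be read from the files listed above); `DriverState`, `DriverHealth`, and the method signatures here are simplified assumptions that only illustrate the controllable flag, the `InitializeCount` counter, and the created-driver lookup.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the real contract types.
public enum DriverState { Healthy, Reconnecting }
public sealed record DriverHealth(DriverState State, DateTime? LastSuccessfulRead, string? LastError);

public sealed class FakeReconnectDriver
{
    private volatile bool _reconnecting;
    private int _initializeCount;

    // Number of InitializeAsync calls so far (deploy plus any re-initialize).
    public int InitializeCount => Volatile.Read(ref _initializeCount);

    // Test hook: simulate a dropped connection.
    public void ReportReconnecting() => _reconnecting = true;

    // Always succeeds; clears the simulated fault and bumps the counter.
    public Task InitializeAsync(CancellationToken ct = default)
    {
        Interlocked.Increment(ref _initializeCount);
        _reconnecting = false;
        return Task.CompletedTask;
    }

    public DriverHealth GetHealth() =>
        new(_reconnecting ? DriverState.Reconnecting : DriverState.Healthy, DateTime.UtcNow, null);
}

public sealed class FakeReconnectDriverFactory
{
    // Drivers created so far, keyed by driver instance id.
    public ConcurrentDictionary<string, FakeReconnectDriver> Created { get; } = new();

    // Returns a fake for "Modbus" only; null for any other driver type.
    public FakeReconnectDriver? TryCreate(string driverType, string instanceId, string configJson)
    {
        if (driverType != "Modbus") return null;
        var driver = new FakeReconnectDriver();
        Created[instanceId] = driver;
        return driver;
    }

    public FakeReconnectDriver? TryGetCreated(string instanceId) =>
        Created.TryGetValue(instanceId, out var driver) ? driver : null;
}
```

A `volatile` flag plus `Interlocked` is enough here because the test thread only sets the flag and reads the counter while the actor thread does the rest; a lock would work equally well.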
**Step 1: Write the test** `Reconnect_DeployedDriver_TransitionsThroughReconnectingBackToHealthy`:
1. `var factory = new FakeReconnectDriverFactory(); await TwoNodeClusterHarness.StartAsync(driverFactory: factory)`.
2. Resolve the DI `IDriverStatusSnapshotStore`; spawn the real `DriverStatusSignalRBridge` over it
with a **capturing** mock `IHubContext<DriverStatusHub>` recording every pushed
`DriverHealthChanged` (reuse the `DriverStatusHubE2eTests` mock pattern — records every
`SendCoreAsync`). Wait ~2 s for the DPS `SubscribeAck`.
3. Seed `ServerCluster` + `Namespace` + `ClusterNode`(`NodeANodeId`) + one `DriverInstance`
(`"Modbus"`, Enabled, `"{}"`, **no tags**) via `CreateConfigDbContextAsync` (mirror
`MultiClusterScopingTests.SeedTwoClusterConfigAsync` but a single cluster bound to `NodeANodeId`).
4. `StartDeploymentAsync(createdBy: ...)` → assert `Accepted`. Condition-poll (≤20 s) until the
store reports the instance `Healthy` (and `factory.TryGetCreated(instanceId)` is non-null).
5. `factory.Created[instanceId].ReportReconnecting()` — simulate the driver having lost its
connection (the realistic trigger for an operator Reconnect).
6. Dispatch `ReconnectDriver(clusterId, instanceId, "e2e", Guid.NewGuid())` via
`IAdminOperationsClient.AskAsync<ReconnectDriverResult>`; assert `Ok`.
7. Condition-poll the captured push list until it contains a `Reconnecting` entry followed by a
later `Healthy`. Assert: the sequence shows `Reconnecting` → `Healthy`, the store's final
state is `Healthy`, and `InitializeCount >= 2` (one initialize at deploy plus at least one from
the reconnect, which proves the command genuinely re-initialised the deployed driver through
the full cluster path, not just a health poke).

This test runs **without** any Docker fixture (the fake driver is in-process), so it is NOT skip-gated.
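
The condition-polls and the ordered-sequence assertion in the steps above can be sketched with two small helpers. `TransitionAssert`, `PollUntilAsync`, and `HasReconnectThenHealthy` are hypothetical names, and the states are modelled as plain strings rather than the project's `DriverState` type.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TransitionAssert
{
    // True when the list contains "Reconnecting" and a "Healthy" at a later index.
    public static bool HasReconnectThenHealthy(IReadOnlyList<string> states)
    {
        var sawReconnecting = false;
        foreach (var state in states)
        {
            if (state == "Reconnecting") sawReconnecting = true;
            else if (state == "Healthy" && sawReconnecting) return true;
        }
        return false;
    }

    // Re-evaluates `condition` every `interval` (default 100 ms) until it is
    // true or `timeout` elapses; returns the final evaluation.
    public static async Task<bool> PollUntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan? interval = null)
    {
        var delay = interval ?? TimeSpan.FromMilliseconds(100);
        var stopwatch = Stopwatch.StartNew();
        while (stopwatch.Elapsed < timeout)
        {
            if (condition()) return true;
            await Task.Delay(delay);
        }
        return condition(); // one last check at the deadline
    }
}
```

In the test this would be used along the lines of `await TransitionAssert.PollUntilAsync(() => TransitionAssert.HasReconnectThenHealthy(SnapshotOfPushes()), TimeSpan.FromSeconds(20))`, where `SnapshotOfPushes()` is a hypothetical helper that copies the captured list under a lock, since the mock callback and the polling loop run on different threads.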
**Step 2: Run.** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests --filter "FullyQualifiedName~DriverReconnectE2eTests"` — all green (the new test executes, not skips).
**Step 3: Commit.**
```bash
git add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs \
  tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs
git commit -m "test(adminui): E2E deployed-driver Healthy→Reconnecting→Healthy transition on Reconnect"
```
---
### Task 3: Full driver E2E suite live run + verification
**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 2)
**Files:** none (verification only)
**Step 1: Bring up the Modbus sim** so the skip-gated 10.1 tests execute (not skip):
`lmxopcua-fix up modbus standard` (sim at `10.100.0.35:5020`). Verify reachability.
**Step 2: Run the full driver E2E suite:**
`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests` — confirm the
`DriverTestConnectE2eTests` (now executing against the live sim, not skipped),
`DriverReconnectE2eTests` (incl. the new transition test), and `DriverStatusHubE2eTests`
all pass. Record pass counts + which previously-skipped tests now executed.
**Step 3:** If any sim-gated test cannot run (sim unreachable from this host), record that
honestly; the new in-process transition test must pass regardless. No commit (verification).

---
### Task 4: Reconcile stale trackers + finish
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (depends on Task 3)
**Files:**
- Modify: `docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json` (mark Phases 6-10
`completed` with real commits / a "shipped — reconciled 2026-06-18" note; bump `lastUpdated`)
- Modify: `docs/plans/2026-05-28-adminui-driver-pages-design.md` (§8.3: `ModbusTcp` → `Modbus`)
- Modify: `stillpending.md` §A.9 (mark Phase 6/8/10 SHIPPED; record the new reconnect-transition
test; keep the full-stack hub test as a documented deferred follow-up). **NEVER STAGE this
file** (local working file)
- Modify: memory `project_stillpending_backlog.md` + `MEMORY.md`
**Step 1:** Reconcile the `.tasks.json` (Phases 6-10 → completed, with commit refs from the
brainstorming finding) and fix the §8.3 `ModbusTcp` string (it should read `Modbus`).
**Step 2:** Stage **only** the two `docs/plans/...` files (the tasks.json + the design md) —
by explicit path. Do NOT `git add .`. Do NOT stage `stillpending.md`.
```bash
git add docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json \
docs/plans/2026-05-28-adminui-driver-pages-design.md
git commit -m "docs(plans): reconcile driver-pages tasks (Phases 6-10 shipped) + fix smoke checklist"
```
**Step 3:** Update `stillpending.md` §A.9 (unstaged) + memory files.
**Step 4: Finish.** Use superpowers-extended-cc:finishing-a-development-branch: verify the
suite green, then merge `feat/driver-pages-reconnect-e2e` → master (ff/merge) + push. Bookkeep
this plan's `.tasks.json` (executionState COMPLETE) on master.

---
## Cross-cutting verification (before merge)
1. `dotnet build tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests` — clean.
2. `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests` — green (new test executes).
3. `git diff --stat master..` — only the expected harness/test/docs files; no surprise changes,
no never-stage files staged.