# Driver-pages Phase 10 — reconnect-transition E2E + close-out Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to implement this plan task-by-task.

**Goal:** Add the one genuinely missing driver-pages E2E test (a *deployed* driver
transitioning **Healthy → Reconnecting → Healthy** on `ReconnectDriver`), fix the
harness-fidelity gap behind it, prove the suite green, and reconcile the stale trackers.

**Architecture:** Extend `TwoNodeClusterHarness` to match production DI (`AddOtOpcUaRuntime`,
which binds the real `AkkaDriverHealthPublisher`) and to accept an opt-in test
`IDriverFactory`. A controllable fake driver lets a deployed driver reach `Connected`
deterministically; the real `DriverStatusSignalRBridge` plus a capturing mock `IHubContext`
record the full health-transition sequence through the real cluster wiring.

**Tech Stack:** xUnit + Shouldly, Akka.NET TestKit/Hosting, Moq (for `IHubContext`), EF
InMemory. No bUnit, no EF migration, no Commons/proto/interface change.

**Design:** `docs/plans/2026-06-18-driver-pages-reconnect-e2e-design.md` (committed `482418c8`).

---

### Task 1: Harness fidelity fix + controllable fake driver factory

**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none

**Files:**
- Create: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs`
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/TwoNodeClusterHarness.cs`
- Read for contract (do NOT edit):
  - `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriver.cs`
  - `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IDriverFactory.cs`
  - `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorTests.cs`
    (the existing fake `IDriver`/`IDriverFactory` double to template from)
  - `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs` (the
    `AddOtOpcUaRuntime` method and the `resolver.GetService<IDriverHealthPublisher>()` site)

**Context (scene-setting):** Today `TwoNodeClusterHarness.BuildNodeAsync` calls
`WithOtOpcUaRuntimeActors()` (Akka actor spawn) but **not** `AddOtOpcUaRuntime()` (the DI
registration). As a result, `IDriverHealthPublisher` resolves to `NullDriverHealthPublisher` and
`IDriverFactory` to `NullDriverFactory`, so deployed drivers never publish health and only ever
reach `Stubbed`. Production (`Program.cs:87` and `:199`) calls both. This task brings the
harness to production fidelity and adds an opt-in fake factory so a test can drive a real
`Connected` state.

**Step 1: Build the fake driver double.** Create `FakeReconnectDriverFactory` implementing
`IDriverFactory`, whose `TryCreate(driverType, instanceId, configJson)` returns a fake
`IDriver` for `driverType == "Modbus"` (`SupportedTypes => ["Modbus"]`). Mirror the
existing fake `IDriver` in `DriverInstanceActorTests.cs` for the full member surface
(`DriverType`, connect/initialize, read/write/subscribe, dispose). The fake's
initialize/connect path must **succeed** so that the `DriverInstanceActor` reaches
`InitializeSucceeded → Become(Connected)`. Keep read/subscribe as benign successes/no-ops.
(No fault injection is needed to drive the FSM:
`ReconnectDriver` drives `ForceReconnect → Reconnecting → re-initialize → Connected` on its own.
Making the *published* health show `Reconnecting` is a separate concern, handled in Task 2, Step 0.)

**Step 2: Wire the harness.** In `TwoNodeClusterHarness`:

- Add `builder.Services.AddOtOpcUaRuntime();` **before** the `AddAkka(...)` call in
  `BuildNodeAsync` (matching production ordering: "Call this BEFORE AddAkka").
- Add an optional `IDriverFactory? driverFactory = null` parameter to `StartAsync` and
  thread it into `BuildNodeAsync`. When it is non-null, register it with
  `builder.Services.AddSingleton<IDriverFactory>(driverFactory);` placed **after**
  `AddOtOpcUaRuntime` so that it overrides the `Null` default. Confirm the runtime resolves
  that last registration: Microsoft.Extensions.DependencyInjection returns the *last*
  registration for a single-service lookup, so a later `AddSingleton` wins whether the default
  was added with `Add` or `TryAdd`. If anything resolves `IEnumerable<IDriverFactory>`, or the
  override itself is added with `TryAdd`, use `Replace` instead (registering the override
  before `AddOtOpcUaRuntime` also works, but only if the default is a `TryAdd`).

**Step 3: Build + regression-check the existing suite.** Run
`dotnet build tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests` (must be clean), then
`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests`. The existing tests
(DeployApi, DriverReconnect, DriverStatusHub, DriverTestConnect, etc.) must stay green with
`AddOtOpcUaRuntime` added (the Null sinks are inert; nothing existing subscribes to
driver health). Fixture-gated tests that were skipped are expected to stay skipped.

**Step 4: Commit.**

```bash
git add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs \
        tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/TwoNodeClusterHarness.cs
git commit -m "test(harness): production-fidelity DI (AddOtOpcUaRuntime) + opt-in fake driver factory"
```

---

### Task 2: Reconnect health-transition E2E test

**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 1)

**Files:**
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs`
  (add the new test method; keep the existing two)
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs`
  (Step 0: controllable health + `InitializeCount` + created-driver accessor)
- Read for pattern (do NOT edit):
  - `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/MultiClusterScopingTests.cs`
    (the GREEN seed of `ServerCluster`/`Namespace`/`ClusterNode`/`DriverInstance` +
    `StartDeploymentAsync` → `Accepted` + the per-node driver-spawn precedent)
  - `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverStatusHubE2eTests.cs`
    (the mock `IHubContext` capture + manual bridge-spawn pattern)

**Context:** The existing `Reconnect_RoundTrip_ReturnsOk` only asserts that the command is
ingested. This new test proves the *actual health transition* of a deployed driver,
end-to-end through the real cluster wiring:
`ReconnectDriver → AdminOperationsActor → DriverHostActor → DriverInstanceActor FSM → PublishHealthSnapshot → driver-health DPS topic → DriverStatusSignalRBridge → snapshot store / hub push`.

**Key findings from exploration (these shape the design; do not skip):**

- **Published `State` = `_driver.GetHealth().State`** (`DriverInstanceActor.PublishHealthSnapshot`,
  line 750). The actor FSM's `Become(Reconnecting)` does NOT set the published state directly:
  on `ForceReconnect` it runs `DetachSubscription(); Become(Reconnecting); PublishHealthSnapshot()`,
  and that last call *polls the driver's `GetHealth()`*. So the **always-`Healthy` Task 1 fake
  can never surface `Reconnecting`**; the fake must report `Reconnecting` at that poll. The
  realistic, deterministic approach: the fake reports `Reconnecting` (simulating a dropped
  connection, which is exactly what prompts an operator to click Reconnect), the
  `ForceReconnect` poll publishes it, and the retry's `InitializeAsync` clears it back to
  `Healthy`.
- **Validator-clean seed** (from the GREEN `MultiClusterScopingTests`): `ServerCluster` +
  `Namespace` + `ClusterNode` (NodeId = `NodeANodeId`) + `DriverInstance` (Enabled, DriverType
  `"Modbus"`, DriverConfig `"{}"`). **No equipment/tags**: equipment/tags trip `DraftValidator`
  and the deploy is `Rejected`. (This is why the stale `EquipmentNamespaceMaterializationTests`
  fails; that failure is pre-existing and unrelated.)
- **The bridge is NOT auto-spawned** by the harness, so spawn it manually (as
  `DriverStatusHubE2eTests` does). DPS is fire-and-forget (no replay) and the driver's repeat
  publishes are deduplicated, so spawn the bridge and await its DPS subscription (~2 s)
  **before** deploying; otherwise it misses the initial `Healthy`.
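
To see why the always-`Healthy` fake can never surface `Reconnecting`, and why a late subscriber misses the initial `Healthy`, here is a deliberately simplified, single-threaded model of the publish path described in these findings. It is a sketch, not the real actor: `HealthTopic`, `ModelDriverActor` and the two-value `State` enum are hypothetical stand-ins, and the model assumes only what the findings state (the published state is whatever `GetHealth()` returns when polled, unchanged states are not re-published, and nothing is replayed to late subscribers).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical two-value stand-in for the real DriverState enum.
public enum State { Healthy, Reconnecting }

// Stand-in for the driver-health DPS topic: fire-and-forget fan-out to the
// subscribers present at publish time; nothing is replayed to late joiners.
public sealed class HealthTopic
{
    private readonly List<List<State>> _subscribers = new();

    public List<State> Subscribe()
    {
        var received = new List<State>();
        _subscribers.Add(received);
        return received;
    }

    public void Publish(State state)
    {
        foreach (var inbox in _subscribers) inbox.Add(state);
    }
}

// Simplified model of the publish path: the published state is whatever the
// driver reports when polled, and an unchanged state is not re-published.
public sealed class ModelDriverActor
{
    private readonly Func<State> _getHealth;   // models _driver.GetHealth().State
    private readonly HealthTopic _topic;
    private State? _lastPublished;

    public ModelDriverActor(Func<State> getHealth, HealthTopic topic)
    {
        _getHealth = getHealth;
        _topic = topic;
    }

    public void PublishHealthSnapshot()
    {
        var state = _getHealth();
        if (state == _lastPublished) return;   // dedupe repeat publishes
        _lastPublished = state;
        _topic.Publish(state);
    }

    // Models ForceReconnect: Become(Reconnecting) changes nothing that is
    // published; only the polled driver health does.
    public void ForceReconnect(Action reinitialize)
    {
        PublishHealthSnapshot();   // poll while in Reconnecting
        reinitialize();            // the retry's InitializeAsync
        PublishHealthSnapshot();   // poll after re-initialising
    }
}
```

In this model, a subscriber that joined before the first publish sees `Healthy, Reconnecting, Healthy`, one that joined after it sees only `Reconnecting, Healthy`, and with an always-`Healthy` driver the reconnect publishes nothing new at all.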

**Step 0: Enhance `FakeReconnectDriver` / `FakeReconnectDriverFactory`** (same file as Task 1):

- `FakeReconnectDriver`: add a controllable health flag (`volatile` or lock-guarded).
  `GetHealth()` returns `DriverState.Reconnecting` while a `_reconnecting` flag is set, else
  `Healthy`; a public `ReportReconnecting()` sets the flag; `InitializeAsync` clears it and
  increments a public `InitializeCount` (use `Interlocked`, since the actor increments it while
  the test reads it). (`DriverHealth` ctor = `new(state, lastSuccessfulRead, lastError)`.)
- `FakeReconnectDriverFactory`: record created drivers so the test can retrieve the one for a
  given `driverInstanceId`. Step 1 uses both a `ConcurrentDictionary<string, FakeReconnectDriver> Created`
  and a `TryGetCreated(id)` accessor, so expose both.
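
The Step 0 changes might look roughly like the sketch below. It deliberately does not implement the real `IDriver`/`IDriverFactory` (their full member surface is in `IDriver.cs`/`IDriverFactory.cs` and the `DriverInstanceActorTests.cs` double). The `DriverState` enum and `DriverHealth` record here are minimal stand-ins, and the `TryCreate`/`InitializeAsync` signatures and the `DriverHealth` parameter types are assumptions to align with the real contracts.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Minimal stand-ins for the real Core.Abstractions types (assumed shapes).
public enum DriverState { Healthy, Reconnecting }
public sealed record DriverHealth(DriverState State, DateTime? LastSuccessfulRead, string? LastError);

public sealed class FakeReconnectDriver
{
    private volatile bool _reconnecting;
    private int _initializeCount;

    public string DriverType => "Modbus";
    public int InitializeCount => Volatile.Read(ref _initializeCount);

    // Test hook: simulate a dropped connection until the next initialise.
    public void ReportReconnecting() => _reconnecting = true;

    // Always succeeds; a successful (re-)initialise clears the simulated drop.
    public Task InitializeAsync(CancellationToken cancellationToken = default)
    {
        _reconnecting = false;
        Interlocked.Increment(ref _initializeCount);
        return Task.CompletedTask;
    }

    // Polled by the actor's PublishHealthSnapshot.
    public DriverHealth GetHealth() =>
        new(_reconnecting ? DriverState.Reconnecting : DriverState.Healthy, null, null);
}

public sealed class FakeReconnectDriverFactory
{
    public string[] SupportedTypes { get; } = new[] { "Modbus" };

    // Latest driver created per driverInstanceId, for the test to grab.
    public ConcurrentDictionary<string, FakeReconnectDriver> Created { get; } = new();

    public FakeReconnectDriver? TryCreate(string driverType, string instanceId, string configJson)
    {
        if (driverType != "Modbus") return null;
        var driver = new FakeReconnectDriver();
        Created[instanceId] = driver;   // a redeploy replaces the earlier instance
        return driver;
    }

    public FakeReconnectDriver? TryGetCreated(string instanceId) =>
        Created.TryGetValue(instanceId, out var driver) ? driver : null;
}
```

Keeping only the latest driver per id means that if a redeploy re-creates the driver, the test gets the live instance rather than a stale one.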

**Step 1: Write the test** `Reconnect_DeployedDriver_TransitionsThroughReconnectingBackToHealthy`:

1. `var factory = new FakeReconnectDriverFactory(); await TwoNodeClusterHarness.StartAsync(driverFactory: factory);`
2. Resolve the DI `IDriverStatusSnapshotStore`; spawn the real `DriverStatusSignalRBridge` over it
   with a **capturing** mock `IHubContext<DriverStatusHub>` that records every pushed
   `DriverHealthChanged` (reuse the `DriverStatusHubE2eTests` mock pattern, which records every
   `SendCoreAsync`). Wait ~2 s for the DPS `SubscribeAck`.
3. Seed `ServerCluster` + `Namespace` + `ClusterNode` (`NodeANodeId`) + one `DriverInstance`
   (`"Modbus"`, Enabled, `"{}"`, **no tags**) via `CreateConfigDbContextAsync` (mirror
   `MultiClusterScopingTests.SeedTwoClusterConfigAsync`, but with a single cluster bound to `NodeANodeId`).
4. `StartDeploymentAsync(createdBy: ...)` → assert `Accepted`. Condition-poll (≤ 20 s) until the
   store reports the instance `Healthy` and `factory.TryGetCreated(instanceId)` is non-null.
5. `factory.Created[instanceId].ReportReconnecting()`: simulate the driver having lost its
   connection (the realistic trigger for an operator Reconnect).
6. Dispatch `ReconnectDriver(clusterId, instanceId, "e2e", Guid.NewGuid())` via
   `IAdminOperationsClient.AskAsync<ReconnectDriverResult>`; assert `Ok`.
7. Condition-poll the captured push list until it contains a `Reconnecting` entry followed by a
   later `Healthy`. Assert that the sequence shows `Reconnecting` → `Healthy`, that the store's
   final state is `Healthy`, and that `InitializeCount >= 2` (proving the command genuinely
   re-initialised the deployed driver through the full cluster path, rather than just poking
   its health).
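
Steps 4 and 7 both condition-poll rather than sleep for a fixed time. A minimal sketch of the two helpers that needs follows; `E2eWait`, `UntilAsync` and `ContainsInOrder` are hypothetical names, not existing harness APIs, and the 100 ms default interval is an arbitrary choice. Because the capturing mock is appended to from the bridge actor's thread while the test polls, the check should run against a snapshot of the captured list (for example `ConcurrentQueue<T>.ToArray()`) rather than the live collection.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class E2eWait
{
    // Re-evaluates `condition` every `interval` until it holds or `timeout`
    // elapses; returns whether it held.
    public static async Task<bool> UntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan? interval = null)
    {
        var step = interval ?? TimeSpan.FromMilliseconds(100);
        var clock = Stopwatch.StartNew();
        while (true)
        {
            if (condition()) return true;
            if (clock.Elapsed >= timeout) return false;
            await Task.Delay(step);
        }
    }

    // True when `sequence` contains `first` at some index and `then` at a
    // strictly later index (other entries may appear in between).
    public static bool ContainsInOrder<T>(IReadOnlyList<T> sequence, T first, T then)
    {
        var comparer = EqualityComparer<T>.Default;
        var firstIndex = -1;
        for (var i = 0; i < sequence.Count; i++)
        {
            if (firstIndex < 0 && comparer.Equals(sequence[i], first)) firstIndex = i;
            else if (firstIndex >= 0 && comparer.Equals(sequence[i], then)) return true;
        }
        return false;
    }
}
```

In the test, Step 7 would then read something like `(await E2eWait.UntilAsync(() => E2eWait.ContainsInOrder(captured.ToArray(), DriverState.Reconnecting, DriverState.Healthy), TimeSpan.FromSeconds(20))).ShouldBeTrue();`, where `captured` is a hypothetical queue of the pushed states.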

This test runs **without** any Docker fixture (the fake driver is in-process), so it is NOT skip-gated.

**Step 2: Run.** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests --filter "FullyQualifiedName~DriverReconnectE2eTests"`: all green, with the new test executing rather than skipping.

**Step 3: Commit.** Include the Step 0 changes to the fake alongside the test:

```bash
git add tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs \
        tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs
git commit -m "test(adminui): E2E deployed-driver Healthy→Reconnecting→Healthy transition on Reconnect"
```

---

### Task 3: Full driver E2E suite live run + verification

**Classification:** small
**Estimated implement time:** ~5 min
**Parallelizable with:** none (depends on Task 2)

**Files:** none (verification only)

**Step 1: Bring up the Modbus sim** so the skip-gated 10.1 tests execute instead of skipping:
`lmxopcua-fix up modbus standard` (sim at `10.100.0.35:5020`). Verify that the sim is reachable.
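
One way to check reachability before the run is a plain TCP connect to the sim's port. This only proves that the port accepts connections, not that the sim is serving the `standard` profile. `TcpProbe` is a hypothetical helper, not part of the harness, and the timeout is the caller's choice.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class TcpProbe
{
    // Returns true if a TCP connection to host:port completes within `timeout`.
    public static async Task<bool> IsReachableAsync(string host, int port, TimeSpan timeout)
    {
        using var client = new TcpClient();
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            await client.ConnectAsync(host, port, cts.Token);
            return true;
        }
        catch (Exception ex) when (ex is SocketException or OperationCanceledException)
        {
            // Refused, unroutable, or timed out.
            return false;
        }
    }
}
```

For the sim this would be `await TcpProbe.IsReachableAsync("10.100.0.35", 5020, TimeSpan.FromSeconds(2))`; a `false` here is the "sim unreachable from this host" case that Step 3 asks you to record.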

**Step 2: Run the full driver E2E suite:**
`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests`. Confirm that
`DriverTestConnectE2eTests` (now executing against the live sim, not skipped),
`DriverReconnectE2eTests` (including the new transition test), and `DriverStatusHubE2eTests`
all pass. Record the pass counts and which previously skipped tests now executed.

**Step 3:** If any sim-gated test cannot run (for example, the sim is unreachable from this host),
record that honestly; the new in-process transition test must pass regardless. No commit (verification only).

---

### Task 4: Reconcile stale trackers + finish

**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** none (depends on Task 3)

**Files:**
- Modify: `docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json` (mark Phases 6–10
  `completed` with real commits or a "shipped — reconciled 2026-06-18" note; bump `lastUpdated`)
- Modify: `docs/plans/2026-05-28-adminui-driver-pages-design.md` (§8.3: `ModbusTcp` → `Modbus`)
- Modify: `stillpending.md` §A.9 (mark Phases 6/8/10 SHIPPED; record the new reconnect-transition
  test; keep the full-stack hub test as a documented deferred follow-up). **NEVER STAGE this
  file** (local working file).
- Modify: memory `project_stillpending_backlog.md` + `MEMORY.md`

**Step 1:** Reconcile the `.tasks.json` (Phases 6–10 → completed, with commit refs from the
brainstorming finding) and fix the §8.3 `ModbusTcp` string.

**Step 2:** Stage **only** the two `docs/plans/...` files (the tasks.json and the design md),
by explicit path. Do NOT `git add .`. Do NOT stage `stillpending.md`.

```bash
git add docs/plans/2026-05-28-adminui-driver-pages-plan.md.tasks.json \
        docs/plans/2026-05-28-adminui-driver-pages-design.md
git commit -m "docs(plans): reconcile driver-pages tasks (Phases 6-10 shipped) + fix smoke checklist"
```

**Step 3:** Update `stillpending.md` §A.9 (leave it unstaged) and the memory files.

**Step 4: Finish.** Use superpowers-extended-cc:finishing-a-development-branch: verify the
suite is green, then merge `feat/driver-pages-reconnect-e2e` → master (ff/merge) and push.
Bookkeep this plan's `.tasks.json` (executionState COMPLETE) on master.

---

## Cross-cutting verification (before merge)

1. `dotnet build tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests`: clean.
2. `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests`: green (the new test executes).
3. `git diff --stat master..`: only the expected harness/test/docs files; no surprise changes,
   and no never-stage files committed.
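
Item 3 is a manual inspection. If you want to script it, a sketch along these lines would flag anything outside an allow-list. It reads `git diff --name-only master..` output (the same range as `--stat`, one path per line). `DiffScopeCheck` is a hypothetical helper, and the allow-list is assumed to be the files named in Tasks 1, 2 and 4.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DiffScopeCheck
{
    // Returns every changed path that is on the never-stage list or missing
    // from the expected list, in the order git printed them.
    public static IReadOnlyList<string> Violations(
        string changedPaths, IEnumerable<string> expected, IEnumerable<string> neverStage)
    {
        var allowed = new HashSet<string>(expected, StringComparer.Ordinal);
        var forbidden = new HashSet<string>(neverStage, StringComparer.Ordinal);
        return changedPaths
            // TrimEntries also strips a trailing '\r' from CRLF output.
            .Split('\n', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Where(path => forbidden.Contains(path) || !allowed.Contains(path))
            .ToList();
    }
}
```

The never-stage check runs first in the predicate, so `stillpending.md` is flagged even if it is accidentally listed as expected; an empty result means the branch touches only what the plan intends.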