From 4e649151ac4a35318c2a96d7939aed3f646e48c1 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Thu, 18 Jun 2026 09:07:33 -0400 Subject: [PATCH] docs(plans): refine Task 2 with GetHealth/seed/bridge-ordering findings from exploration --- .../2026-06-18-driver-pages-reconnect-e2e.md | 79 +++++++++++++------ 1 file changed, 54 insertions(+), 25 deletions(-) diff --git a/docs/plans/2026-06-18-driver-pages-reconnect-e2e.md b/docs/plans/2026-06-18-driver-pages-reconnect-e2e.md index 962c987a..2e2a16df 100644 --- a/docs/plans/2026-06-18-driver-pages-reconnect-e2e.md +++ b/docs/plans/2026-06-18-driver-pages-reconnect-e2e.md @@ -88,11 +88,14 @@ git commit -m "test(harness): production-fidelity DI (AddOtOpcUaRuntime) + opt-i **Files:** - Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs` (add the new test method; keep the existing two) +- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs` + (Step 0: controllable health + `InitializeCount` + created-driver accessor) - Read for pattern (do NOT edit): - `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorLiveValueTests.cs` - (the `SeedDeploymentWithEquipmentTags` + `DispatchDeployment` deploy precedent), + `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/MultiClusterScopingTests.cs` + (the GREEN seed-`ServerCluster`/`Namespace`/`ClusterNode`/`DriverInstance` + `StartDeploymentAsync` + → `Accepted` + per-node driver spawn precedent), `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverStatusHubE2eTests.cs` - (the mock `IHubContext` capture + bridge-spawn pattern). + (the mock `IHubContext` capture + manual bridge-spawn pattern). **Context:** The existing `Reconnect_RoundTrip_ReturnsOk` only asserts command ingestion. This new test proves the *actual health transition* of a deployed driver, end-to-end through @@ -100,29 +103,55 @@ the real cluster wiring: `ReconnectDriver → AdminOperationsActor → DriverHos DriverInstanceActor FSM → PublishHealthSnapshot → driver-health DPS topic → DriverStatusSignalRBridge → snapshot store / hub push`. -**Step 1: Write the test** `Reconnect_DeployedDriver_TransitionsThroughReconnectingBackToHealthy`: -1. `await TwoNodeClusterHarness.StartAsync(driverFactory: new FakeReconnectDriverFactory())`. -2. Seed a deployment with one `Modbus` driver instance (+ one equipment/tag so the artifact - projects) bound to `NodeANodeId`, using the `SeedDeploymentWithEquipmentTags` approach - adapted from `DriverHostActorLiveValueTests` (seed via `CreateConfigDbContextAsync`), then - trigger the deploy (`DispatchDeployment` to the deploy coordinator, or - `POST /api/deployments` with `HarnessDeployApiKey` as in `DeployApiE2eTests`). -3. Resolve the real DI `IDriverStatusSnapshotStore`; spawn the real - `DriverStatusSignalRBridge` over it with a **capturing** mock `IHubContext` - that appends every pushed `DriverHealthChanged.State` to a list (reuse the - `DriverStatusHubE2eTests` mock pattern — note it records every `SendCoreAsync`, giving the - full transition sequence, not just last-write). -4. Condition-poll (generous timeout, e.g. 20 s) until the store reports the instance - `Healthy` (confirm the exact `DriverState` string the Connected state publishes by reading - the health computation; it is the same value the panel's `ChipClass` maps as `"Healthy"`). -5. Dispatch `ReconnectDriver(ClusterId, instanceId, "e2e", Guid.NewGuid())` via - `IAdminOperationsClient.AskAsync`; assert `Ok`. -6. Condition-poll the captured push list until it contains a `Reconnecting` entry **after** - the initial Healthy, followed by a return to `Healthy`. Assert the store's final state is - `Healthy`. +**Key findings from exploration (these shape the design — do not skip):** +- **Published `State` = `_driver.GetHealth().State`** (`DriverInstanceActor.PublishHealthSnapshot`, + line 750). The actor FSM (`Become(Reconnecting)`) does NOT set the published state directly — + on `ForceReconnect` it does `DetachSubscription(); Become(Reconnecting); PublishHealthSnapshot()`, + which *polls the driver's `GetHealth()`*. So the **always-`Healthy` Task 1 fake can never surface + `Reconnecting`.** The fake must report `Reconnecting` at that poll. The realistic, deterministic + way: the fake reports `Reconnecting` (simulating a dropped connection — exactly what prompts an + operator to click Reconnect), the `ForceReconnect` poll publishes it, and the retry's + `InitializeAsync` clears it back to `Healthy`. +- **Validator-clean seed** (from the GREEN `MultiClusterScopingTests`): `ServerCluster` + + `Namespace` + `ClusterNode`(NodeId = `NodeANodeId`) + `DriverInstance`(Enabled, DriverType + `"Modbus"`, DriverConfig `"{}"`). **No equipment/tags** — equipment/tags trip `DraftValidator` + and the deploy is `Rejected` (this is why the stale `EquipmentNamespaceMaterializationTests` + fails — pre-existing, unrelated). +- **Bridge is NOT auto-spawned** by the harness — spawn it manually (as `DriverStatusHubE2eTests` + does). DPS is fire-and-forget (no replay), and the driver's repeat-publish is deduped, so spawn + the bridge + await its DPS subscription (~2 s) **before** deploying so it catches the initial + `Healthy`. -This test must run **without** any Docker fixture (the fake driver is in-process) — it is -NOT skip-gated. +**Step 0: Enhance `FakeReconnectDriver` / `FakeReconnectDriverFactory`** (same file from Task 1): +- `FakeReconnectDriver`: add a `volatile`/locked controllable health — `GetHealth()` returns + `DriverState.Reconnecting` when a `_reconnecting` flag is set, else `Healthy`; a public + `ReportReconnecting()` sets the flag; `InitializeAsync` clears it (and bumps a public + `InitializeCount`). (`DriverHealth` ctor = `new(state, lastSuccessfulRead, lastError)`.) +- `FakeReconnectDriverFactory`: record created drivers so the test can retrieve the one for a + given `driverInstanceId` (e.g. a `ConcurrentDictionary Created` + or `TryGetCreated(id)`). + +**Step 1: Write the test** `Reconnect_DeployedDriver_TransitionsThroughReconnectingBackToHealthy`: +1. `var factory = new FakeReconnectDriverFactory(); await TwoNodeClusterHarness.StartAsync(driverFactory: factory)`. +2. Resolve the DI `IDriverStatusSnapshotStore`; spawn the real `DriverStatusSignalRBridge` over it + with a **capturing** mock `IHubContext` recording every pushed + `DriverHealthChanged` (reuse the `DriverStatusHubE2eTests` mock pattern — records every + `SendCoreAsync`). Wait ~2 s for the DPS `SubscribeAck`. +3. Seed `ServerCluster` + `Namespace` + `ClusterNode`(`NodeANodeId`) + one `DriverInstance` + (`"Modbus"`, Enabled, `"{}"`, **no tags**) via `CreateConfigDbContextAsync` (mirror + `MultiClusterScopingTests.SeedTwoClusterConfigAsync` but a single cluster bound to `NodeANodeId`). +4. `StartDeploymentAsync(createdBy: ...)` → assert `Accepted`. Condition-poll (≤20 s) until the + store reports the instance `Healthy` (and `factory.TryGetCreated(instanceId)` is non-null). +5. `factory.Created[instanceId].ReportReconnecting()` — simulate the driver having lost its + connection (the realistic trigger for an operator Reconnect). +6. Dispatch `ReconnectDriver(clusterId, instanceId, "e2e", Guid.NewGuid())` via + `IAdminOperationsClient.AskAsync`; assert `Ok`. +7. Condition-poll the captured push list until it contains a `Reconnecting` entry followed by a + later `Healthy`. Assert: the sequence shows `Reconnecting` → `Healthy`, the store's final + state is `Healthy`, and `InitializeCount >= 2` (proves the command genuinely re-initialised + the deployed driver through the full cluster path — not just a health poke). + +This test runs **without** any Docker fixture (the fake driver is in-process) — NOT skip-gated. **Step 2: Run.** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests --filter "FullyQualifiedName~DriverReconnectE2eTests"` — all green (the new test executes, not skips).