docs(plans): refine Task 2 with GetHealth/seed/bridge-ordering findings from exploration

Joseph Doherty
2026-06-18 09:07:33 -04:00
parent ffb725e4c1
commit 4e649151ac
@@ -88,11 +88,14 @@ git commit -m "test(harness): production-fidelity DI (AddOtOpcUaRuntime) + opt-i
**Files:**
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs`
(add the new test method; keep the existing two)
- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs`
(Step 0: controllable health + `InitializeCount` + created-driver accessor)
- Read for pattern (do NOT edit):
`tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorLiveValueTests.cs`
(the `SeedDeploymentWithEquipmentTags` + `DispatchDeployment` deploy precedent),
`tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/MultiClusterScopingTests.cs`
(the GREEN seed-`ServerCluster`/`Namespace`/`ClusterNode`/`DriverInstance` + `StartDeploymentAsync`
`Accepted` + per-node driver spawn precedent),
`tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverStatusHubE2eTests.cs`
(the mock `IHubContext` capture + bridge-spawn pattern).
(the mock `IHubContext` capture + manual bridge-spawn pattern).
**Context:** The existing `Reconnect_RoundTrip_ReturnsOk` only asserts command ingestion.
This new test proves the *actual health transition* of a deployed driver, end-to-end through
@@ -100,29 +103,55 @@ the real cluster wiring: `ReconnectDriver → AdminOperationsActor → DriverHos
DriverInstanceActor FSM → PublishHealthSnapshot → driver-health DPS topic →
DriverStatusSignalRBridge → snapshot store / hub push`.
**Step 1: Write the test** `Reconnect_DeployedDriver_TransitionsThroughReconnectingBackToHealthy`:
1. `await TwoNodeClusterHarness.StartAsync(driverFactory: new FakeReconnectDriverFactory())`.
2. Seed a deployment with one `Modbus` driver instance (+ one equipment/tag so the artifact
projects) bound to `NodeANodeId`, using the `SeedDeploymentWithEquipmentTags` approach
adapted from `DriverHostActorLiveValueTests` (seed via `CreateConfigDbContextAsync`), then
trigger the deploy (`DispatchDeployment` to the deploy coordinator, or
`POST /api/deployments` with `HarnessDeployApiKey` as in `DeployApiE2eTests`).
3. Resolve the real DI `IDriverStatusSnapshotStore`; spawn the real
`DriverStatusSignalRBridge` over it with a **capturing** mock `IHubContext<DriverStatusHub>`
that appends every pushed `DriverHealthChanged.State` to a list (reuse the
`DriverStatusHubE2eTests` mock pattern — note it records every `SendCoreAsync`, giving the
full transition sequence, not just last-write).
4. Condition-poll (generous timeout, e.g. 20 s) until the store reports the instance
`Healthy` (confirm the exact `DriverState` string the Connected state publishes by reading
the health computation; it is the same value the panel's `ChipClass` maps as `"Healthy"`).
5. Dispatch `ReconnectDriver(ClusterId, instanceId, "e2e", Guid.NewGuid())` via
`IAdminOperationsClient.AskAsync<ReconnectDriverResult>`; assert `Ok`.
6. Condition-poll the captured push list until it contains a `Reconnecting` entry **after**
the initial Healthy, followed by a return to `Healthy`. Assert the store's final state is
`Healthy`.
**Key findings from exploration (these shape the design — do not skip):**
- **Published `State` = `_driver.GetHealth().State`** (`DriverInstanceActor.PublishHealthSnapshot`,
line 750). The actor FSM (`Become(Reconnecting)`) does NOT set the published state directly —
on `ForceReconnect` it does `DetachSubscription(); Become(Reconnecting); PublishHealthSnapshot()`,
which *polls the driver's `GetHealth()`*. So the **always-`Healthy` Task 1 fake can never surface
`Reconnecting`.** The fake must report `Reconnecting` at that poll. The realistic, deterministic
way: the fake reports `Reconnecting` (simulating a dropped connection — exactly what prompts an
operator to click Reconnect), the `ForceReconnect` poll publishes it, and the retry's
`InitializeAsync` clears it back to `Healthy`.
- **Validator-clean seed** (from the GREEN `MultiClusterScopingTests`): `ServerCluster` +
`Namespace` + `ClusterNode`(NodeId = `NodeANodeId`) + `DriverInstance`(Enabled, DriverType
`"Modbus"`, DriverConfig `"{}"`). **No equipment/tags** — equipment/tags trip `DraftValidator`
and the deploy is `Rejected` (this is why the stale `EquipmentNamespaceMaterializationTests`
fails — pre-existing, unrelated).
- **Bridge is NOT auto-spawned** by the harness — spawn it manually (as `DriverStatusHubE2eTests`
does). DPS is fire-and-forget (no replay), and the driver's repeat-publish is deduped, so spawn
the bridge + await its DPS subscription (~2 s) **before** deploying so it catches the initial
`Healthy`.
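
The ordering constraint in the last finding can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `FireAndForgetTopic` and `DedupingPublisher` are hypothetical stand-ins, not the real DPS mediator or `DriverInstanceActor`. They model just the two properties named above, no replay for late subscribers and suppression of repeated identical states.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a pub/sub topic with no replay:
// a message reaches only the handlers subscribed at publish time.
public sealed class FireAndForgetTopic<T>
{
    private readonly List<Action<T>> _subscribers = new();

    public void Subscribe(Action<T> handler) => _subscribers.Add(handler);

    public void Publish(T message)
    {
        foreach (var handler in _subscribers) handler(message);
    }
}

// Hypothetical stand-in for a publisher that drops a state
// identical to the one it published last.
public sealed class DedupingPublisher
{
    private readonly FireAndForgetTopic<string> _topic;
    private string? _last;

    public DedupingPublisher(FireAndForgetTopic<string> topic) => _topic = topic;

    public void PublishState(string state)
    {
        if (state == _last) return; // repeat publish is deduped
        _last = state;
        _topic.Publish(state);
    }
}
```

In this model, a subscriber attached after the first `PublishState("Healthy")` never sees `Healthy`, because the second identical publish is suppressed and nothing is replayed. That is the failure mode the spawn-bridge-before-deploy ordering avoids.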
This test must run **without** any Docker fixture (the fake driver is in-process) — it is
NOT skip-gated.
**Step 0: Enhance `FakeReconnectDriver` / `FakeReconnectDriverFactory`** (same file from Task 1):
- `FakeReconnectDriver`: add a controllable health state backed by a `volatile` or lock-guarded
`_reconnecting` flag. `GetHealth()` returns `DriverState.Reconnecting` while the flag is set,
else `Healthy`; a public `ReportReconnecting()` sets the flag; `InitializeAsync` clears it (and
bumps a public `InitializeCount`). (`DriverHealth` ctor = `new(state, lastSuccessfulRead, lastError)`.)
- `FakeReconnectDriverFactory`: record created drivers so the test can retrieve the one for a
given `driverInstanceId` (e.g. a `ConcurrentDictionary<string, FakeReconnectDriver> Created`
or `TryGetCreated(id)`).
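
A minimal sketch of the Step 0 shape, assuming simplified types. The `DriverState` enum and `DriverHealth` record below are local stand-ins that only mirror the ctor order quoted above. The `InitializeAsync(CancellationToken)` signature and the factory's `Create(string)` method are assumptions for illustration; the real fake must implement whatever driver and factory interfaces the project defines.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Stand-ins for the project's real types (assumed shapes, illustration only).
public enum DriverState { Healthy, Reconnecting }
public sealed record DriverHealth(DriverState State, DateTime? LastSuccessfulRead, string? LastError);

public sealed class FakeReconnectDriver
{
    private int _reconnecting;     // 0 = healthy, 1 = reconnecting
    private int _initializeCount;

    public int InitializeCount => Volatile.Read(ref _initializeCount);

    // Test hook: simulate a dropped connection.
    public void ReportReconnecting() => Volatile.Write(ref _reconnecting, 1);

    // Assumed signature; each (re)initialise clears the flag and bumps the counter.
    public Task InitializeAsync(CancellationToken ct = default)
    {
        Interlocked.Increment(ref _initializeCount);
        Volatile.Write(ref _reconnecting, 0);
        return Task.CompletedTask;
    }

    public DriverHealth GetHealth() =>
        Volatile.Read(ref _reconnecting) == 1
            ? new DriverHealth(DriverState.Reconnecting, null, "simulated connection drop")
            : new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null);
}

public sealed class FakeReconnectDriverFactory
{
    public ConcurrentDictionary<string, FakeReconnectDriver> Created { get; } = new();

    // Hypothetical creation entry point; records the driver per instance id.
    public FakeReconnectDriver Create(string driverInstanceId) =>
        Created.GetOrAdd(driverInstanceId, _ => new FakeReconnectDriver());

    public FakeReconnectDriver? TryGetCreated(string driverInstanceId) =>
        Created.TryGetValue(driverInstanceId, out var driver) ? driver : null;
}
```

`Volatile`/`Interlocked` are used here because the actor thread polls `GetHealth()` while the test thread calls `ReportReconnecting()`; a plain lock would serve equally well.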
**Step 1: Write the test** `Reconnect_DeployedDriver_TransitionsThroughReconnectingBackToHealthy`:
1. `var factory = new FakeReconnectDriverFactory(); await TwoNodeClusterHarness.StartAsync(driverFactory: factory)`.
2. Resolve the DI `IDriverStatusSnapshotStore`; spawn the real `DriverStatusSignalRBridge` over it
with a **capturing** mock `IHubContext<DriverStatusHub>` recording every pushed
`DriverHealthChanged` (reuse the `DriverStatusHubE2eTests` mock pattern — records every
`SendCoreAsync`). Wait ~2 s for the DPS `SubscribeAck`.
3. Seed `ServerCluster` + `Namespace` + `ClusterNode`(`NodeANodeId`) + one `DriverInstance`
(`"Modbus"`, Enabled, `"{}"`, **no tags**) via `CreateConfigDbContextAsync` (mirror
`MultiClusterScopingTests.SeedTwoClusterConfigAsync` but a single cluster bound to `NodeANodeId`).
4. `StartDeploymentAsync(createdBy: ...)` → assert `Accepted`. Condition-poll (≤20 s) until the
store reports the instance `Healthy` (and `factory.TryGetCreated(instanceId)` is non-null).
5. `factory.Created[instanceId].ReportReconnecting()` — simulate the driver having lost its
connection (the realistic trigger for an operator Reconnect).
6. Dispatch `ReconnectDriver(clusterId, instanceId, "e2e", Guid.NewGuid())` via
`IAdminOperationsClient.AskAsync<ReconnectDriverResult>`; assert `Ok`.
7. Condition-poll the captured push list until it contains a `Reconnecting` entry followed by a
later `Healthy`. Assert: the sequence shows `Reconnecting` → `Healthy`, the store's final
state is `Healthy`, and `InitializeCount >= 2` (proves the command genuinely re-initialised
the deployed driver through the full cluster path — not just a health poke).
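
The condition-poll and ordered-sequence check used in steps 4 and 7 could be sketched as below. `TransitionAssert`, `ContainsInOrder`, and `PollUntilAsync` are hypothetical helper names, not existing harness APIs, and states are modelled as plain values. Because the capturing mock appends from another thread, the test should pass a snapshot of the list taken under a lock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TransitionAssert
{
    // True when `states` contains `first` and, at a strictly later index, `then`.
    public static bool ContainsInOrder<T>(IReadOnlyList<T> states, T first, T then)
    {
        var comparer = EqualityComparer<T>.Default;
        var seenFirst = false;
        foreach (var state in states)
        {
            if (!seenFirst)
            {
                seenFirst = comparer.Equals(state, first);
                continue; // `then` must come after this element
            }
            if (comparer.Equals(state, then)) return true;
        }
        return false;
    }

    // Re-evaluates `condition` every `interval` until it holds or `timeout` elapses.
    public static async Task<bool> PollUntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            if (condition()) return true;
            await Task.Delay(interval);
        }
        return condition(); // one final check at the deadline
    }
}
```

Checking for an ordered pair rather than an exact sequence keeps the assertion tolerant of extra pushes (for example, an additional `Healthy` before the reconnect) while still failing if `Reconnecting` never surfaces.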
This test runs **without** any Docker fixture (the fake driver is in-process) — NOT skip-gated.
**Step 2: Run.** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests --filter "FullyQualifiedName~DriverReconnectE2eTests"` — all green (the new test executes, not skips).