docs(plans): refine Task 2 with GetHealth/seed/bridge-ordering findings from exploration

Joseph Doherty
2026-06-18 09:07:33 -04:00
parent ffb725e4c1
commit 4e649151ac
@@ -88,11 +88,14 @@ git commit -m "test(harness): production-fidelity DI (AddOtOpcUaRuntime) + opt-i
 **Files:**
 - Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverReconnectE2eTests.cs`
 (add the new test method; keep the existing two)
+- Modify: `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/Fakes/FakeReconnectDriverFactory.cs`
+(Step 0: controllable health + `InitializeCount` + created-driver accessor)
 - Read for pattern (do NOT edit):
-`tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorLiveValueTests.cs`
-(the `SeedDeploymentWithEquipmentTags` + `DispatchDeployment` deploy precedent),
+`tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/MultiClusterScopingTests.cs`
+(the GREEN seed-`ServerCluster`/`Namespace`/`ClusterNode`/`DriverInstance` + `StartDeploymentAsync`
+`Accepted` + per-node driver spawn precedent),
 `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DriverStatusHubE2eTests.cs`
-(the mock `IHubContext` capture + bridge-spawn pattern).
+(the mock `IHubContext` capture + manual bridge-spawn pattern).
 **Context:** The existing `Reconnect_RoundTrip_ReturnsOk` only asserts command ingestion.
 This new test proves the *actual health transition* of a deployed driver, end-to-end through
@@ -100,29 +103,55 @@ the real cluster wiring: `ReconnectDriver → AdminOperationsActor → DriverHos
 DriverInstanceActor FSM → PublishHealthSnapshot → driver-health DPS topic →
 DriverStatusSignalRBridge → snapshot store / hub push`.
-**Step 1: Write the test** `Reconnect_DeployedDriver_TransitionsThroughReconnectingBackToHealthy`:
-1. `await TwoNodeClusterHarness.StartAsync(driverFactory: new FakeReconnectDriverFactory())`.
-2. Seed a deployment with one `Modbus` driver instance (+ one equipment/tag so the artifact
-projects) bound to `NodeANodeId`, using the `SeedDeploymentWithEquipmentTags` approach
-adapted from `DriverHostActorLiveValueTests` (seed via `CreateConfigDbContextAsync`), then
-trigger the deploy (`DispatchDeployment` to the deploy coordinator, or
-`POST /api/deployments` with `HarnessDeployApiKey` as in `DeployApiE2eTests`).
-3. Resolve the real DI `IDriverStatusSnapshotStore`; spawn the real
-`DriverStatusSignalRBridge` over it with a **capturing** mock `IHubContext<DriverStatusHub>`
-that appends every pushed `DriverHealthChanged.State` to a list (reuse the
-`DriverStatusHubE2eTests` mock pattern — note it records every `SendCoreAsync`, giving the
-full transition sequence, not just last-write).
-4. Condition-poll (generous timeout, e.g. 20 s) until the store reports the instance
-`Healthy` (confirm the exact `DriverState` string the Connected state publishes by reading
-the health computation; it is the same value the panel's `ChipClass` maps as `"Healthy"`).
-5. Dispatch `ReconnectDriver(ClusterId, instanceId, "e2e", Guid.NewGuid())` via
-`IAdminOperationsClient.AskAsync<ReconnectDriverResult>`; assert `Ok`.
-6. Condition-poll the captured push list until it contains a `Reconnecting` entry **after**
-the initial Healthy, followed by a return to `Healthy`. Assert the store's final state is
-`Healthy`.
-This test must run **without** any Docker fixture (the fake driver is in-process) — it is
-NOT skip-gated.
+**Key findings from exploration (these shape the design — do not skip):**
+- **Published `State` = `_driver.GetHealth().State`** (`DriverInstanceActor.PublishHealthSnapshot`,
+line 750). The actor FSM (`Become(Reconnecting)`) does NOT set the published state directly —
+on `ForceReconnect` it does `DetachSubscription(); Become(Reconnecting); PublishHealthSnapshot()`,
+which *polls the driver's `GetHealth()`*. So the **always-`Healthy` Task 1 fake can never surface
+`Reconnecting`.** The fake must report `Reconnecting` at that poll. The realistic, deterministic
+way: the fake reports `Reconnecting` (simulating a dropped connection — exactly what prompts an
+operator to click Reconnect), the `ForceReconnect` poll publishes it, and the retry's
+`InitializeAsync` clears it back to `Healthy`.
+- **Validator-clean seed** (from the GREEN `MultiClusterScopingTests`): `ServerCluster` +
+`Namespace` + `ClusterNode`(NodeId = `NodeANodeId`) + `DriverInstance`(Enabled, DriverType
+`"Modbus"`, DriverConfig `"{}"`). **No equipment/tags** — equipment/tags trip `DraftValidator`
+and the deploy is `Rejected` (this is why the stale `EquipmentNamespaceMaterializationTests`
+fails — pre-existing, unrelated).
+- **Bridge is NOT auto-spawned** by the harness — spawn it manually (as `DriverStatusHubE2eTests`
+does). DPS is fire-and-forget (no replay), and the driver's repeat-publish is deduped, so spawn
+the bridge + await its DPS subscription (~2 s) **before** deploying so it catches the initial
+`Healthy`.
+**Step 0: Enhance `FakeReconnectDriver` / `FakeReconnectDriverFactory`** (same file from Task 1):
+- `FakeReconnectDriver`: add a `volatile`/locked controllable health — `GetHealth()` returns
+`DriverState.Reconnecting` when a `_reconnecting` flag is set, else `Healthy`; a public
+`ReportReconnecting()` sets the flag; `InitializeAsync` clears it (and bumps a public
+`InitializeCount`). (`DriverHealth` ctor = `new(state, lastSuccessfulRead, lastError)`.)
+- `FakeReconnectDriverFactory`: record created drivers so the test can retrieve the one for a
+given `driverInstanceId` (e.g. a `ConcurrentDictionary<string, FakeReconnectDriver> Created`
+or `TryGetCreated(id)`).
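The Step 0 changes could look roughly like the sketch below. It is only an illustration under stated assumptions: `DriverState` and `DriverHealth` are local stand-ins for the project's real types (only the `new(state, lastSuccessfulRead, lastError)` constructor shape comes from the plan; the parameter types are guesses), the real driver and factory interfaces are omitted, and the `Create` method name is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Stand-ins for the project's real types; shapes are assumed for illustration only.
public enum DriverState { Healthy, Reconnecting }
public sealed record DriverHealth(DriverState State, DateTime? LastSuccessfulRead, string LastError);

public sealed class FakeReconnectDriver
{
    private volatile bool _reconnecting;   // read by the actor's health poll, set by the test thread
    private int _initializeCount;

    public int InitializeCount => Volatile.Read(ref _initializeCount);

    // Simulates a dropped connection: the next GetHealth() poll reports Reconnecting.
    public void ReportReconnecting() => _reconnecting = true;

    // Every (re)initialisation is counted and clears the simulated fault.
    public Task InitializeAsync(CancellationToken ct = default)
    {
        Interlocked.Increment(ref _initializeCount);
        _reconnecting = false;
        return Task.CompletedTask;
    }

    public DriverHealth GetHealth() => _reconnecting
        ? new DriverHealth(DriverState.Reconnecting, null, "simulated connection drop")
        : new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null);
}

public sealed class FakeReconnectDriverFactory
{
    public ConcurrentDictionary<string, FakeReconnectDriver> Created { get; } = new();

    // Hypothetical creation hook; the real factory interface may take more arguments.
    public FakeReconnectDriver Create(string driverInstanceId)
    {
        var driver = new FakeReconnectDriver();
        Created[driverInstanceId] = driver;   // keep the most recent driver per instance id
        return driver;
    }

    public FakeReconnectDriver TryGetCreated(string driverInstanceId)
        => Created.TryGetValue(driverInstanceId, out var driver) ? driver : null;
}
```

Because the published state is whatever `GetHealth()` returns at the moment of the poll, a plain `volatile` flag is enough here: the test thread sets it, the actor thread reads it, and `InitializeAsync` resets it.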
+**Step 1: Write the test** `Reconnect_DeployedDriver_TransitionsThroughReconnectingBackToHealthy`:
+1. `var factory = new FakeReconnectDriverFactory(); await TwoNodeClusterHarness.StartAsync(driverFactory: factory)`.
+2. Resolve the DI `IDriverStatusSnapshotStore`; spawn the real `DriverStatusSignalRBridge` over it
+with a **capturing** mock `IHubContext<DriverStatusHub>` recording every pushed
+`DriverHealthChanged` (reuse the `DriverStatusHubE2eTests` mock pattern — records every
+`SendCoreAsync`). Wait ~2 s for the DPS `SubscribeAck`.
+3. Seed `ServerCluster` + `Namespace` + `ClusterNode`(`NodeANodeId`) + one `DriverInstance`
+(`"Modbus"`, Enabled, `"{}"`, **no tags**) via `CreateConfigDbContextAsync` (mirror
+`MultiClusterScopingTests.SeedTwoClusterConfigAsync` but a single cluster bound to `NodeANodeId`).
+4. `StartDeploymentAsync(createdBy: ...)` → assert `Accepted`. Condition-poll (≤20 s) until the
+store reports the instance `Healthy` (and `factory.TryGetCreated(instanceId)` is non-null).
+5. `factory.Created[instanceId].ReportReconnecting()` — simulate the driver having lost its
+connection (the realistic trigger for an operator Reconnect).
+6. Dispatch `ReconnectDriver(clusterId, instanceId, "e2e", Guid.NewGuid())` via
+`IAdminOperationsClient.AskAsync<ReconnectDriverResult>`; assert `Ok`.
+7. Condition-poll the captured push list until it contains a `Reconnecting` entry followed by a
+later `Healthy`. Assert: the sequence shows `Reconnecting` → `Healthy`, the store's final
+state is `Healthy`, and `InitializeCount >= 2` (proves the command genuinely re-initialised
+the deployed driver through the full cluster path — not just a health poke).
+This test runs **without** any Docker fixture (the fake driver is in-process) — NOT skip-gated.
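Steps 4 and 7 both rely on condition-polling, and step 7 also needs an ordered check over the captured pushes. A minimal sketch of those two helpers follows; `TransitionAssert`, `ContainsInOrder` and `WaitUntilAsync` are hypothetical names, the harness may already ship an equivalent poll helper, and states are compared as plain strings purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TransitionAssert
{
    // True when `states` contains `first` and, at a strictly later index, `then`.
    public static bool ContainsInOrder(IReadOnlyList<string> states, string first, string then)
    {
        var seenFirst = false;
        foreach (var state in states)
        {
            if (!seenFirst) seenFirst = state == first;
            else if (state == then) return true;
        }
        return false;
    }

    // Re-evaluates `condition` every `interval` (default 100 ms) until it holds or `timeout` elapses.
    public static async Task<bool> WaitUntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan? interval = null)
    {
        var delay = interval ?? TimeSpan.FromMilliseconds(100);
        var clock = Stopwatch.StartNew();
        while (clock.Elapsed < timeout)
        {
            if (condition()) return true;
            await Task.Delay(delay);
        }
        return condition();   // one final check at the deadline
    }
}
```

In the test, the condition passed to `WaitUntilAsync` would take a snapshot of the captured push list under the same lock the mock `IHubContext` callback uses, then call `ContainsInOrder(snapshot, "Reconnecting", "Healthy")`. Polling a snapshot avoids enumerating a list that the hub callback may still be appending to.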
 **Step 2: Run.** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests --filter "FullyQualifiedName~DriverReconnectE2eTests"` — all green (the new test executes, not skips).