docs(plan): driver-less equipment namespace implementation plan (#143-147)

This commit is contained in:
Joseph Doherty
2026-06-08 06:40:14 -04:00
parent 064adb0bd0
commit a94d03a194
2 changed files with 249 additions and 0 deletions
@@ -0,0 +1,238 @@
# Driver-less Equipment Namespace — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
**Goal:** Let VirtualTag-only equipment reference no field driver — make `Equipment.DriverInstanceId` nullable, delete the misleading `Modbus` placeholder driver, and stop the per-deploy stub/exception noise, while preserving cluster-attribution correctness.
**Architecture:** The data path is unchanged (equipment values stream from the GalaxyMxGateway driver via the galaxy-mirror → VirtualTag indirection). Almost nothing reads `Equipment.DriverInstanceId`, so the change is: (1) make the column nullable (entity + one EF migration), (2) fix the single cluster-attribution site (`DeploymentArtifact.BuildClusterSets`) to anchor driver-less equipment on its UNS line's area cluster instead of its driver, (3) update the loader to stop creating the placeholder and NULL the FK, (4) live-verify on docker-dev.
**Tech Stack:** .NET 10, EF Core (MSSQL), Akka.NET, xUnit + Shouldly. Solution `ZB.MOM.WW.OtOpcUa.slnx`, Central Package Management. Branch `feat/driverless-equipment-namespace` (off `master` `446a456`). Loader: Python (`scadaproj/otopcua-uns-loader/otopcua_uns.py`). Design: `docs/plans/2026-06-08-driverless-equipment-namespace-design.md`.
**Sequencing:** T1 (entity + migration) first — it's the schema foundation. T2 (BuildClusterSets) and T3 (validator test) build on T1; T4 (loader) is a different repo and runs anytime. T5 (live verify) needs T1+T2+T4.
---
### Task 1: Make `Equipment.DriverInstanceId` nullable + EF migration
**Classification:** high-risk
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 4 (different repo)
**Files:**
- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/Equipment.cs` (line ~23)
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/<timestamp>_NullableEquipmentDriverInstanceId.cs` (generated by `dotnet ef`)
- Auto-modify: `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/OtOpcUaConfigDbContextModelSnapshot.cs` (regenerated by `dotnet ef`)
**Context:** `DriverInstanceId` is currently `public required string DriverInstanceId { get; set; }`. The EF config `OtOpcUaConfigDbContext.ConfigureEquipment` has **no `.IsRequired()`** call and **no FK relationship** for it (it's a logical FK only) — so EF infers NOT NULL purely from the C# `required string` type. Changing the type to `string?` flips the column to nullable; no fluent-config change is needed. The index `IX_Equipment_Driver` is a plain non-unique index and stays valid on a nullable column.
**Step 1 — Change the entity property.** In `Equipment.cs`, change:
```csharp
public required string DriverInstanceId { get; set; }
```
to:
```csharp
public string? DriverInstanceId { get; set; }
```
(Drop the `required` modifier and make the type nullable. Update the property's XML-doc comment if it asserts non-null — note it is now optional: null = VirtualTag-only / driver-less equipment.)
**Step 2 — Build to confirm no compile break from the nullability change.**
Run: `dotnet build ZB.MOM.WW.OtOpcUa.slnx`
Expected: build succeeds. If any production code dereferences `Equipment.DriverInstanceId` with a non-null assumption (`!` or implicit), the nullable-reference analyzer will flag it under `TreatWarningsAsErrors` — the investigation found NONE in production code, so expect 0 new warnings. If one appears, STOP and report the file:line (it's an undocumented deref the design missed).
**Step 3 — Author the migration.** Confirm the EF tool is available (`dotnet ef --version`; if missing, `dotnet tool restore` or `dotnet tool install --global dotnet-ef`). Set the design-time connection env (no DB connection is made during `migrations add` — it diffs the model), then scaffold:
```bash
export OTOPCUA_CONFIG_CONNECTION="Server=localhost,14330;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
dotnet ef migrations add NullableEquipmentDriverInstanceId --project src/Core/ZB.MOM.WW.OtOpcUa.Configuration
```
**Inspect the generated migration.** It MUST be a single `AlterColumn` and its inverse — nothing else:
```csharp
migrationBuilder.AlterColumn<string>(
name: "DriverInstanceId", table: "Equipment",
type: "nvarchar(64)", maxLength: 64, nullable: true,
oldClrType: typeof(string), oldType: "nvarchar(64)", oldMaxLength: 64);
// Down(): the same AlterColumn with nullable: false / oldNullable: true
```
If the migration contains anything beyond this `AlterColumn` (e.g. unrelated model drift), STOP and report — the model snapshot may be out of sync with master and that must be resolved first.
**Step 4 — Build + run the Configuration test suite (behaviour-preserving).**
Run: `dotnet build ZB.MOM.WW.OtOpcUa.slnx` then `dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/` (glob `tests` for the exact path if it differs).
Expected: green. Nothing asserts `DriverInstanceId` non-null, so all existing tests pass.
**Step 5 — Commit.**
```bash
git add -A && git commit -m "feat(config): make Equipment.DriverInstanceId nullable (driver-less equipment) + migration"
```
**Acceptance:** entity is `string?`; the new migration is a single `AlterColumn(... nullable: true)`; the model snapshot drops `.IsRequired()` on `DriverInstanceId`; solution builds; Configuration tests green. **Do NOT apply the migration to any DB here** — that happens in Task 5.
---
### Task 2: `BuildClusterSets` — attribute driver-less equipment via its UNS line's area cluster
**Classification:** standard
**Estimated implement time:** ~5 min
**Parallelizable with:** Task 3 (different file), Task 4 (different repo)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs` (`BuildClusterSets`, ~lines 271297, and its `ClusterSets` doc ~260267)
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/...` (find the existing DeploymentArtifact test file by glob, e.g. `*DeploymentArtifact*Tests.cs` or `*ParseComposition*`/`*ClusterScope*`)
**Context:** Equipment carries no `ClusterId`. Today `BuildClusterSets` attributes equipment to a cluster **via its driver**:
```csharp
// Equipment carries no ClusterId — include it when its DriverInstanceId is in-cluster.
var di = el.TryGetProperty("DriverInstanceId", out var diEl) ? diEl.GetString() : null;
var id = el.TryGetProperty("EquipmentId", out var idEl) ? idEl.GetString() : null;
if (!string.IsNullOrWhiteSpace(id) && di is not null && driverIds.Contains(di))
equipmentIds.Add(id!);
```
A null driver ⇒ equipment (and its VirtualTags) silently dropped in multi-cluster (`ScopeTo`) mode. The fix anchors driver-less equipment on its UNS placement: `UnsArea` carries `ClusterId` (collected into `areaIds`), and `Equipment → UnsLine.UnsAreaId → UnsArea`. So a driver-less equipment is in-cluster iff its line's area is in-cluster. Single-cluster (`ClusterFilterMode.None`) never calls `BuildClusterSets`, so this only affects multi-cluster — but it's a correctness fix worth a test.
**Step 1 — Write the failing test.** First READ the existing DeploymentArtifact/cluster-scope test file to learn the harness: how a deployment-artifact JSON blob is built (the helper that assembles `DriverInstances`/`UnsAreas`/`UnsLines`/`Equipment`/`EquipmentVirtualTags` arrays), and how `ParseComposition(blob, scope)` / `BuildClusterSets` results are asserted. Mirror that harness exactly. Add a test like:
- `Driverless_equipment_is_kept_when_its_line_area_is_in_cluster`: blob with an in-cluster `UnsArea` (ClusterId = "C1"), a `UnsLine` in that area, an `Equipment` with **`DriverInstanceId: null`** on that line, plus an `EquipmentVirtualTag` for it. Call the cluster-scoped parse for cluster "C1". Assert the equipment node AND its VirtualTag are present in the result.
- `Driverless_equipment_is_excluded_when_its_line_area_is_in_another_cluster`: same but the area's ClusterId is "C2"; scope to "C1" → equipment + its vtag absent.
- (Keep/confirm an existing driver-bound case still works — driver-bound equipment attribution must be unchanged.)
If the existing tests assert via `ParseComposition(blob, ClusterScope)`, use that public entry; if they unit-test `BuildClusterSets` directly via an internal-visible hook, follow that. Do NOT invent a new harness.
**Step 2 — Run, confirm fail.** `dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ --filter "FullyQualifiedName~Driverless"` → the new tests fail (driver-less equipment currently dropped).
**Step 3 — Implement.** In `BuildClusterSets`, before the Equipment loop, build a `UnsLineId → UnsAreaId` map from the artifact's `UnsLines` array; then attribute driver-less equipment by line-area:
```csharp
// Map each UnsLine to its area so driver-less equipment can be cluster-attributed by its UNS placement.
var lineToArea = new Dictionary<string, string>(StringComparer.Ordinal);
if (root.TryGetProperty("UnsLines", out var lines) && lines.ValueKind == JsonValueKind.Array)
{
foreach (var el in lines.EnumerateArray())
{
if (el.ValueKind != JsonValueKind.Object) continue;
var lineId = el.TryGetProperty("UnsLineId", out var lEl) ? lEl.GetString() : null;
var areaId = el.TryGetProperty("UnsAreaId", out var aEl) ? aEl.GetString() : null;
if (!string.IsNullOrWhiteSpace(lineId) && !string.IsNullOrWhiteSpace(areaId))
lineToArea[lineId!] = areaId!;
}
}
// ... in the Equipment loop, replace the driver-only attribution with:
var di = el.TryGetProperty("DriverInstanceId", out var diEl) ? diEl.GetString() : null;
var id = el.TryGetProperty("EquipmentId", out var idEl) ? idEl.GetString() : null;
var lineId = el.TryGetProperty("UnsLineId", out var luEl) ? luEl.GetString() : null;
if (string.IsNullOrWhiteSpace(id)) continue;
var inClusterByDriver = di is not null && driverIds.Contains(di);
var inClusterByLine = di is null && lineId is not null
&& lineToArea.TryGetValue(lineId, out var areaOfLine) && areaIds.Contains(areaOfLine);
if (inClusterByDriver || inClusterByLine)
equipmentIds.Add(id!);
```
Update the `// Equipment carries no ClusterId — …` comment and the `ClusterSets.EquipmentIds` doc (~line 263) to state: driver-bound equipment is attributed by its driver's cluster; driver-less equipment by its UNS line's area cluster. Keep `driverIds`/`areaIds` collection unchanged. **Note:** `areaIds` is already populated (the `CollectIdsWhereCluster(root, "UnsAreas", …)` call) before the Equipment loop — confirm ordering so `areaIds` is filled first.
**Step 4 — Run, confirm pass** + the broader DeploymentArtifact/Runtime suite stays green:
`dotnet test tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ --filter "FullyQualifiedName~DeploymentArtifact"` (and the `Driverless` filter). Also `dotnet build ZB.MOM.WW.OtOpcUa.slnx`.
**Step 5 — Commit.**
```bash
git add -A && git commit -m "fix(deploy): cluster-attribute driver-less equipment via its UNS line area (BuildClusterSets)"
```
**Acceptance:** driver-less equipment + its VirtualTags are kept in `ScopeTo` mode when the line's area is in-cluster, excluded otherwise; driver-bound attribution unchanged; suite green.
---
### Task 3: `DraftValidator` accepts a driver-less-equipment draft
**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** Task 2 (different file), Task 4 (different repo)
**Files:**
- Test: `tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/...` (find the existing `DraftValidator` test file by glob, e.g. `*DraftValidator*Tests.cs`)
**Context:** No validator rule dereferences `Equipment.DriverInstanceId` or requires an Equipment namespace to have a driver. This task LOCKS that in with a regression test (no production change). It proves the design's core safety claim: a draft with driver-less equipment + an Equipment-kind namespace that has zero `DriverInstance` rows validates clean.
**Step 1 — Write the test.** READ the existing `DraftValidator` tests to learn how a `DraftSnapshot` is constructed in this suite (the builder/fixture for Equipment, Namespaces, DriverInstances, UnsAreas/Lines, ServerCluster/ClusterNode). Then add:
- `Validate_accepts_driverless_equipment_in_driverless_equipment_namespace`: build a minimal valid draft with an Equipment-kind namespace that has **zero `DriverInstance` rows**, and one `Equipment` with **`DriverInstanceId = null`** (canonical `EQ-…` id, valid Name/UnsLineId, with the UnsLine/UnsArea/cluster wiring the other rules need), plus its VirtualTag. Assert `DraftValidator.Validate(snapshot)` returns **zero errors** (`result.Errors.ShouldBeEmpty()` or the suite's idiom).
Reuse whatever canonical-equipment helper the suite already has (the EquipmentId must satisfy `ValidateEquipmentIdDerivation` = `"EQ-" + uuid.ToString("N")[:12].ToLowerInvariant()`); copy that derivation from an existing passing test so the test isn't tripped by an unrelated rule.
**Step 2 — Run, confirm pass.** `dotnet test tests/Core/ZB.MOM.WW.OtOpcUa.Configuration.Tests/ --filter "FullyQualifiedName~DraftValidator"` → green (this is a characterization test; it should pass immediately on top of Task 1's nullable entity). If it FAILS, that means a rule DOES implicitly require a driver — STOP and report which error code fired (the design would need revisiting).
**Step 3 — Commit.**
```bash
git add -A && git commit -m "test(config): DraftValidator accepts driver-less equipment + driverless equipment namespace"
```
**Acceptance:** the new test passes; existing DraftValidator tests stay green.
---
### Task 4: Loader — stop creating the placeholder, NULL the equipment FK
**Classification:** small
**Estimated implement time:** ~4 min
**Parallelizable with:** Task 1, Task 2, Task 3 (different repo)
**Files:**
- Modify: `/Users/dohertj2/Desktop/scadaproj/otopcua-uns-loader/otopcua_uns.py` (`cmd_populate_equipment`, `cmd_clean`, the `EQ_DRIVER` constant + its comment ~lines 8691)
**Context:** The loader invents a placeholder `Modbus` `DriverInstance` (`nw-uns-modbus`) and points all 40 equipment at it. With the column now nullable (Task 1), equipment should reference NO driver.
**Step 1 — Edit `cmd_populate_equipment`.** READ the function first. Then:
- **Remove** the placeholder `DriverInstance` INSERT (the `... VALUES (NEWID(), %s, %s, %s, 'Northwind UNS placeholder', 'Modbus', 1, '{}')` block, ~lines 275278) and its preceding "Placeholder driver…" comment.
- **Equipment INSERT** (~lines 296299): drop `DriverInstanceId` from the column list and its value param entirely, so the row inserts with a NULL driver. E.g.:
```python
cur.execute(
"INSERT INTO dbo.Equipment (EquipmentRowId, EquipmentId, EquipmentUuid, UnsLineId, "
"Name, MachineCode, Manufacturer, Model, Enabled) VALUES (NEWID(), %s, %s, %s, %s, %s, %s, %s, 1)",
(eq_id, eq_uuid, "nw-" + e["unsLineId"], e["name"], e["machineCode"],
e.get("manufacturer"), e.get("model")))
```
(Verify the exact column/param alignment against the current code — remove exactly the `DriverInstanceId` column and the `EQ_DRIVER` arg, keep everything else.)
- In the teardown block at the top of `cmd_populate_equipment` (the DELETEs), **remove** the now-dead `DELETE FROM dbo.Tag WHERE DriverInstanceId=%s, (EQ_DRIVER,)` and `DELETE FROM dbo.DriverInstance WHERE DriverInstanceId=%s, (EQ_DRIVER,)` lines (no such rows are created anymore). Keep the VirtualTag/Script/Equipment/UnsLine/UnsArea/Namespace deletes.
**Step 2 — Edit `cmd_clean`** the same way: remove the `DELETE … Tag WHERE DriverInstanceId=%s` and `DELETE … DriverInstance WHERE DriverInstanceId=%s` lines that reference `EQ_DRIVER`.
**Step 3 — Retire the constant + fix the comment.** Remove the `EQ_DRIVER = "nw-uns-modbus"` constant and the stale comment block (~lines 8691) that explains the placeholder ("A placeholder non-Galaxy driver kept ONLY to satisfy 'an Equipment namespace has a driver' expectations…"). If any other reference to `EQ_DRIVER` remains, grep `EQ_DRIVER` across the file and remove/adjust each.
**Step 4 — Smoke-check the script parses.** `python3 -c "import ast; ast.parse(open('/Users/dohertj2/Desktop/scadaproj/otopcua-uns-loader/otopcua_uns.py').read())"` → no SyntaxError. (Do NOT run populate against the DB here — that's Task 5, after the migration is applied.) Confirm no lingering `EQ_DRIVER` references: `grep -n EQ_DRIVER /Users/dohertj2/Desktop/scadaproj/otopcua-uns-loader/otopcua_uns.py` → no output.
**Step 5 — Commit (in the scadaproj repo).**
```bash
cd /Users/dohertj2/Desktop/scadaproj && git add otopcua-uns-loader/otopcua_uns.py \
&& git commit -m "loader: equipment is driver-less (drop Modbus placeholder, NULL DriverInstanceId)"
```
(Commit on `scadaproj` `main` — the loader lives there; do NOT push.)
**Acceptance:** no `EQ_DRIVER` references remain; the placeholder DriverInstance INSERT is gone; the Equipment INSERT omits `DriverInstanceId`; the script parses; teardown still cleans the overlay.
---
### Task 5: Live docker-dev verification
**Classification:** standard
**Estimated implement time:** ~6 min (+ migration/deploy/settle)
**Parallelizable with:** none (needs T1, T2, T4)
**Steps (no new code — the proof the wart is gone and nothing regressed):**
1. **Build the docker-dev image off this branch** (carries the nullable entity + BuildClusterSets fix):
`cd docker-dev && docker compose build central-1`.
2. **Apply the migration to the docker-dev config DB.** The DB is NOT auto-migrated. Run EF update against the live config DB (port 14330):
```bash
export OTOPCUA_CONFIG_CONNECTION="Server=localhost,14330;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;"
dotnet ef database update --project src/Core/ZB.MOM.WW.OtOpcUa.Configuration
```
Confirm: `Applied migration '<...>_NullableEquipmentDriverInstanceId'`. Sanity-check the column is now nullable:
`docker exec otopcua-dev-sql-1 /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P 'OtOpcUa!Dev123' -C -d OtOpcUa -Q "SELECT is_nullable FROM sys.columns WHERE object_id=OBJECT_ID('Equipment') AND name='DriverInstanceId';"` → `1`.
3. **Re-run the loader driver-less** (clears + reloads the equipment overlay with NULL driver, no placeholder):
`cd /Users/dohertj2/Desktop/scadaproj/otopcua-uns-loader && .venv/bin/python otopcua_uns.py populate-equipment` (use the same invocation the project README/memory documents). Verify in SQL: `SELECT COUNT(*) FROM Equipment WHERE DriverInstanceId IS NULL` = 40; `SELECT COUNT(*) FROM DriverInstance WHERE DriverInstanceId='nw-uns-modbus'` = 0.
4. **Recreate the admin nodes on the new image** (sites untouched):
`cd docker-dev && docker compose up -d --no-deps --force-recreate central-1 central-2`. Wait for `:9200` to answer (302).
5. **Redeploy headless:** `curl -s -X POST http://localhost:9200/api/deployments -H 'X-Api-Key: docker-dev-deploy-key'` → expect **202 Accepted**.
6. **Verify values still flow:** `cd /Users/dohertj2/Desktop/scadaproj/otopcua-uns-loader && .venv/bin/python otopcua_uns.py verify-equipment --require-good 396 --wait` → **`VERIFY-EQUIPMENT: PASS`**, 396 Good.
7. **The wart is gone — confirm the Modbus stub/exception no longer appears:**
`docker logs otopcua-dev-central-1-1 2>&1 | grep -iE "nw-uns-modbus|missing required Host|spawned Modbus driver"` → **no output** (previously this showed `factory for Modbus threw on nw-uns-modbus` + `spawned Modbus driver nw-uns-modbus (stub=True)`).
**Acceptance:** migration applied (column nullable); 40 equipment rows have NULL driver and the `nw-uns-modbus` DriverInstance is gone; deploy 202 Accepted; `verify-equipment` PASS (396 Good); central-1 log clean of the Modbus stub/exception. **This task changes the running docker-dev stack (the user's active env)** — coordinate: recreate only the admin nodes; don't disrupt the site nodes.
---
## After all tasks
Run the affected suites (`Configuration`, `Runtime` DeploymentArtifact) + build the solution, then use **superpowers-extended-cc:finishing-a-development-branch**: verify green, present merge/PR/keep/discard for `feat/driverless-equipment-namespace` (OtOpcUa) and the `scadaproj` loader commit. Merge/push only on the user's explicit go (the user manages this repo's integration).
@@ -0,0 +1,11 @@
{
"planPath": "docs/plans/2026-06-08-driverless-equipment-namespace.md",
"tasks": [
{"id": 1, "nativeId": 143, "subject": "Task 1: Equipment.DriverInstanceId nullable + EF migration", "status": "pending", "classification": "high-risk"},
{"id": 2, "nativeId": 144, "subject": "Task 2: BuildClusterSets driver-less equipment attribution", "status": "pending", "classification": "standard", "blockedBy": [1]},
{"id": 3, "nativeId": 145, "subject": "Task 3: DraftValidator accepts driver-less equipment (test)", "status": "pending", "classification": "small", "blockedBy": [1]},
{"id": 4, "nativeId": 146, "subject": "Task 4: Loader — driver-less equipment", "status": "pending", "classification": "small"},
{"id": 5, "nativeId": 147, "subject": "Task 5: Live docker-dev verification (driver-less equipment)", "status": "pending", "classification": "standard", "blockedBy": [1, 2, 4]}
],
"lastUpdated": "2026-06-08"
}