docs(design): driver-less equipment namespace (nullable Equipment.DriverInstanceId, drop Modbus placeholder)

Joseph Doherty
2026-06-08 06:36:40 -04:00
parent 446a45686f
commit 064adb0bd0
@@ -0,0 +1,151 @@
# Driver-less Equipment Namespace — Design (2026-06-08)
## Problem
In the OtOpcUa config model, `Equipment.DriverInstanceId` is a **non-null** logical FK — every
equipment row must reference a `DriverInstance`. But the Northwind company overlay's equipment
signals are **VirtualTags** (driverless — they link via `EquipmentId` + a `Script`, with no driver
binding); their live values come from the galaxy mirror over the **GalaxyMxGateway** driver via the
script `return ctx.GetTag("TestMachine_NNN.Sig").Value;`.
To satisfy the non-null FK *and* the validator rule that **forbids a GalaxyMxGateway driver in an
Equipment-kind namespace** (`DraftValidator.ValidateDriverNamespaceCompatibility`:
`NamespaceKind.Equipment => DriverType != "GalaxyMxGateway"`), the loader was cornered into inventing
a **placeholder `Modbus` driver** (`nw-uns-modbus`, 0 tags bound) and pointing all 40 equipment rows
at it.
That placeholder is misleading ("equipment tied to Modbus") **and not inert** — on every deploy the
runtime tries to instantiate a real Modbus driver for it and fails:
```
WARNING DriverHost: factory for Modbus threw on nw-uns-modbus; stubbing
Cause: Modbus driver config for 'nw-uns-modbus' missing required Host
WARNING DriverHost: no factory for driver type Modbus (instance nw-uns-modbus); falling back to stub
INFO DriverHost: spawned Modbus driver nw-uns-modbus (stub=True)
```
The data path is correct (the MxGateway *is* the source, via the VirtualTag indirection — `verify-equipment`
shows 396 Good), but the driver association is a lie and generates per-deploy exception/stub noise.
## Decision
**Option B — make equipment driver-less.** VirtualTag-only equipment genuinely has no field driver, so
it should reference none: make `Equipment.DriverInstanceId` **nullable**, **delete the placeholder
driver**, and teach the one cluster-attribution site to tolerate a null driver. This keeps the
VirtualTag route intact (so overlay-building, hierarchy reorganization, and value transforms all stay
available — those are the route's strength), removes the misleading Modbus FK, and eliminates the
stub/exception noise.
Chosen over: **Option A** (a no-op "Virtual" driver type, which still keeps a placeholder), and **Option 4** (relax the
validator so a GalaxyMxGateway driver can serve equipment directly as galaxy tags — rejected because a
galaxy tag's subscribe ref is `FolderPath.Name`, i.e. tree placement is coupled to galaxy source, so a
direct-tag overlay could not freely reorganize the company hierarchy without further driver
re-architecture, and it would also drop transforms/ScriptedAlarms and blur the SystemPlatform/Equipment
layering).
## Impact map (investigated)
The change is small because almost nothing depends on `Equipment.DriverInstanceId`.
### Schema / entity — the core change
- `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/Equipment.cs:23`
`public required string DriverInstanceId` → `public string? DriverInstanceId`.
- EF infers nullability from the C# type: `OtOpcUaConfigDbContext.ConfigureEquipment` has **no
`.IsRequired()`** call, so only the property type changes. One EF migration emits a single
`AlterColumn(DriverInstanceId, nullable: true)`; the model snapshot loses its `.IsRequired()`.
- **No FK to drop.** Equipment→DriverInstance is a *logical* FK only — no EF `HasOne/WithMany`, no DB
`FK_…` constraint, no `ON DELETE`. Deleting the placeholder driver while equipment still references it
is already schema-legal.
- `IX_Equipment_Driver` is a plain non-unique, non-filtered index — valid on a nullable column, kept
as-is (no filtered-index change; YAGNI).
### Validator / composer / applier / runtime — no change required
- **`DraftValidator`** (all 9 rules read): none dereference `Equipment.DriverInstanceId`; none require an
Equipment namespace to have a driver. `ValidateDriverNamespaceCompatibility` iterates
`draft.DriverInstances` — with the placeholder gone, that namespace simply has no driver row to check.
- **`DraftSnapshotFactory.FromConfigDbAsync`** loads the full `Equipment` entity → naturally carries
`null`. No projection change.
- **`Phase7Composer`** — `EquipmentNode` uses `EquipmentId`/`Name`/`UnsLineId`; equipment *Tags* use
`Tag.DriverInstanceId` (a separate field), not Equipment's; `EquipmentVirtualTag` uses
`EquipmentId`/`Name`/`ScriptId`. `Equipment.DriverInstanceId` is never read.
- **`Phase7Applier`** — `MaterialiseHierarchy` / `MaterialiseEquipmentVirtualTags` build folders +
variable nodes from projected fields only; never reads the driver.
- **`DriverHostActor`** — spawns one driver actor *per DriverInstance* in the artifact. Deleting the
placeholder DriverInstance ⇒ no Modbus spec ⇒ **no stub spawn, no `missing required Host` exception**.
The VirtualTag host is spawned unconditionally and streams regardless of driver children.
- **`VirtualTagHostActor`** — driver-agnostic; depends only on the dependency mux publishing the
galaxy-mirror values (from the GalaxyMxGateway driver, untouched).
### The one real code change — `DeploymentArtifact.BuildClusterSets`
`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs` (~lines 271–297). Equipment carries
no `ClusterId`; today it is cluster-attributed **via its driver** (`equipmentIds.Add(id)` when
`di is not null && driverIds.Contains(di)`). With a null driver, equipment — and its VirtualTags — would
be **silently dropped in multi-cluster (`ScopeTo`) mode**.
Fix using the UNS hierarchy, which already carries cluster identity (`UnsArea.ClusterId` exists, and
`Equipment → UnsLine.UnsAreaId → UnsArea`):
- Build a `UnsLineId → UnsAreaId` map from the artifact's `UnsLines` array.
- **Driver-bound** equipment: unchanged (in-cluster when its driver is in-cluster).
- **Driver-less** equipment: in-cluster **iff its line's area is in-cluster**
(`areaIds.Contains(lineToArea[UnsLineId])`).
This is *more* correct than the driver path: it anchors on the equipment's actual UNS placement, which is
exactly the same-cluster invariant decision #122 already enforces (driver-cluster must equal
line-cluster). The existing cross-cluster consistency warning (lines 237–244) is phrased "in cluster by
its driver"; for driver-less equipment the attribution is by line, so it is consistent by construction
and the warning won't fire. **Single-cluster (`ClusterFilterMode.None`, the docker-dev topology) never
calls `BuildClusterSets`**, so this is a correctness fix for multi-cluster, not a dev-path requirement.
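
The attribution rule described in this section can be sketched as follows. This is an illustrative Python model only, not the C# in `DeploymentArtifact.cs`: the `Equipment` dataclass, the `in_cluster_equipment` function, and its parameter names are hypothetical, and treating an unknown `UnsLineId` as out-of-cluster is an assumption of the sketch rather than a documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Equipment:
    # Hypothetical, trimmed-down stand-in for the real Equipment entity.
    equipment_id: str
    uns_line_id: str
    driver_instance_id: Optional[str] = None  # None models a driver-less row


def in_cluster_equipment(equipment, line_to_area, area_ids, driver_ids):
    """Return the ids of equipment attributed to the scoped cluster.

    line_to_area: UnsLineId -> UnsAreaId map built from the UnsLines array.
    area_ids:     UnsArea ids that belong to the scoped cluster.
    driver_ids:   DriverInstance ids that belong to the scoped cluster.
    """
    kept = set()
    for eq in equipment:
        if eq.driver_instance_id is not None:
            # Driver-bound: unchanged, in-cluster when its driver is in-cluster.
            if eq.driver_instance_id in driver_ids:
                kept.add(eq.equipment_id)
        else:
            # Driver-less: in-cluster iff its line's area is in-cluster.
            # Assumption: an unknown line maps to None and is excluded.
            if line_to_area.get(eq.uns_line_id) in area_ids:
                kept.add(eq.equipment_id)
    return kept


line_to_area = {"line-a": "area-1", "line-b": "area-2"}
equipment = [
    Equipment("eq-1", "line-a"),           # driver-less, line in the scoped cluster
    Equipment("eq-2", "line-b"),           # driver-less, line in another cluster
    Equipment("eq-3", "line-a", "drv-1"),  # driver-bound, driver in the scoped cluster
]
kept = in_cluster_equipment(equipment, line_to_area, {"area-1"}, {"drv-1"})
print(sorted(kept))  # ['eq-1', 'eq-3']
```

The driver-less branch is purely additive: rows that already carry a driver take the same path as before, which is why the single-cluster and driver-bound cases are unaffected.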
### Loader — `scadaproj/otopcua-uns-loader/otopcua_uns.py`
In `cmd_populate_equipment`:
- **Remove** the placeholder `DriverInstance` INSERT (`'Northwind UNS placeholder', 'Modbus', …`) and its
comment.
- **Equipment INSERT**: drop `DriverInstanceId` from the column list (→ NULL).
- Retire the `EQ_DRIVER = "nw-uns-modbus"` constant and the now-dead `DELETE … Tag WHERE DriverInstanceId`
/ `DELETE … DriverInstance` teardown lines (in both `cmd_populate_equipment` and `cmd_clean`) — they
become no-ops; remove for clarity. Keep the `Namespace` INSERT/teardown.
- Fix the stale `# … "an Equipment namespace has a driver" expectations` comment.
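
As a rough illustration of the Equipment INSERT change, the sketch below omits `DriverInstanceId` from the column list and confirms the stored value is NULL. It uses an in-memory SQLite table purely so the example is self-contained; the table shape, the `insert_equipment` helper, and the sample ids are hypothetical and do not reflect the real config DB schema or how `otopcua_uns.py` connects to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down Equipment table; the real schema has more columns.
conn.execute(
    "CREATE TABLE Equipment ("
    " EquipmentId TEXT PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " UnsLineId TEXT NOT NULL,"
    " DriverInstanceId TEXT NULL)"  # nullable, as after the migration
)


def insert_equipment(conn, equipment_id, name, uns_line_id):
    # DriverInstanceId is deliberately absent from the column list,
    # so the database stores NULL for it.
    conn.execute(
        "INSERT INTO Equipment (EquipmentId, Name, UnsLineId) VALUES (?, ?, ?)",
        (equipment_id, name, uns_line_id),
    )


insert_equipment(conn, "eq-001", "TestMachine_001", "line-a")
row = conn.execute(
    "SELECT DriverInstanceId FROM Equipment WHERE EquipmentId = ?", ("eq-001",)
).fetchone()
print(row[0] is None)  # True
```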
### Noted, not changing (YAGNI)
- `sp_ComputeGenerationDiff` includes `DriverInstanceId` in a `CHECKSUM(...)`. It is NULL-tolerant for
this one-time transition and sits on the **dormant** generation-diff path (the active deploy gate is
the C# `DraftValidator` + `ConfigComposer.SnapshotAndFlattenAsync`, not the SP). Flagged verify-only;
if it is ever reactivated, wrap with `ISNULL(DriverInstanceId, '')` as a follow-on.
## Testing & verification
1. **Unit (Configuration tests):** a `DraftValidator` test that a draft with driver-less equipment
(`Equipment.DriverInstanceId == null`, Equipment namespace with zero drivers) validates clean.
2. **Unit (Runtime tests):** a `BuildClusterSets` / scoped-`ParseComposition` test proving a null-driver
equipment is attributed to the cluster of its line's area in `ScopeTo` mode (and its VirtualTags are
kept), while a wrong-cluster line excludes it.
3. Existing `Configuration`, `OpcUaServer` (Phase7), and `Runtime` (DeploymentArtifact) suites stay green.
4. **Migration** authored + applied to the docker-dev config DB.
5. **Live docker-dev:** re-run the loader (`populate-equipment`, now driver-less) → redeploy → confirm
**396 Good still flow** on `:4840` (`VERIFY-EQUIPMENT: PASS`) **and** central-1 logs **no longer
contain** `spawned Modbus driver nw-uns-modbus` or `missing required Host`.
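
The log half of step 5 could be checked with something like the sketch below, which scans captured log text for the two strings named above. The `find_placeholder_noise` helper is hypothetical, and how the central-1 logs are captured is left open; the sample text is taken from the log excerpt in the Problem section.

```python
FORBIDDEN = (
    "spawned Modbus driver nw-uns-modbus",
    "missing required Host",
)


def find_placeholder_noise(log_text):
    """Return the log lines that still mention the placeholder driver."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in FORBIDDEN)
    ]


before = """\
WARNING DriverHost: factory for Modbus threw on nw-uns-modbus; stubbing
  Cause: Modbus driver config for 'nw-uns-modbus' missing required Host
INFO DriverHost: spawned Modbus driver nw-uns-modbus (stub=True)
"""
print(len(find_placeholder_noise(before)))  # 2
```

After the change, the same check on fresh logs should return an empty list. It complements `VERIFY-EQUIPMENT: PASS` rather than replacing it, since it says nothing about whether values still flow.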
## Sequencing & risk
| Step | Risk | Notes |
|---|---|---|
| Entity nullable + EF migration | medium — schema change | single `AlterColumn`; no FK/constraint to drop; reversible |
| `BuildClusterSets` null-driver attribution | low–medium — multi-cluster scoping | additive branch; single-cluster path unaffected; covered by a new test |
| Loader edits | low | drop placeholder + NULL the FK; teardown lines become no-ops and are removed |
| Live redeploy on docker-dev | low | recreate admin nodes only; sites untouched; the proof the wart is gone |
## Branches
- OtOpcUa: `feat/driverless-equipment-namespace` (off `master` `446a456`).
- Loader: `scadaproj` (`otopcua_uns.py`).
- Migration authored in the Configuration project; applied to docker-dev. Merge/push only on the user's
explicit go (the user manages this repo's integration).
## Related context
- Investigation: this design is grounded in a three-front impact sweep (EF/schema, validator/artifact,
runtime/loader) — see the per-layer findings folded into the Impact map above.
- Decision #122 (same-cluster invariant: equipment's driver-cluster must equal its UNS-line cluster) —
the anchor that makes line-based attribution for driver-less equipment correct.
- `DraftValidator.ValidateDriverNamespaceCompatibility` — the rule that forbids GalaxyMxGateway in
Equipment namespaces and forced the original placeholder.