# OtOpcUa — dynamic injection of driver-discovered FixedTree nodes into the Equipment projection (design)
**Date:** 2026-06-26
**Status:** ✅ Implemented (2026-06-26) — 11 tasks, offline-complete on branch `feat/focas-fixedtree-equipment-injection` (solution build 0 errors / 0 warnings; Runtime.Tests 312, OpcUaServer.Tests 304, FOCAS 247 + an end-to-end injection+value-flow test, all green). Live wonder validation pending.
**Follow-ups surfaced during the review chain — ✅ ALL RESOLVED 2026-06-26** (design
[`2026-06-26-otopcua-fixedtree-followups-design.md`](2026-06-26-otopcua-fixedtree-followups-design.md),
plan [`2026-06-26-otopcua-fixedtree-followups.md`](2026-06-26-otopcua-fixedtree-followups.md);
16 commits `c2c368dc`..`0074f37a` on this branch, every task spec+code reviewed; full offline suite green):
- ✅ Config-unchanged driver→equipment **rebind** now **re-triggers discovery** (follow-up C): the redeploy re-inject tail drops the stale plan AND `Tell`s the driver child a new `DriverInstanceActor.TriggerRediscovery` (a discovery action — not lifecycle control — idempotent, child no-ops if not Connected), so the FixedTree re-grafts under the new equipment on the next pass instead of waiting for the next natural reconnect.
- ✅ **Multi-device-per-driver** mapping **implemented** (follow-up E): `EquipmentNode` now carries `DriverInstanceId`/`DeviceId`/`DeviceHost` (projection-only — the columns + the `Devices` array were already in the artifact, no DB migration / no wire change), so equipment resolves via the driver binding **without** authored tags (≥1-tag requirement lifted), and a driver bound to multiple devices partitions its discovered tree by normalized device-host folder, grafting each device's subtree under the equipment whose `DeviceHost` matches (unmatched hosts warn-skip, never mis-graft).
- ✅ Per-(re)connect re-discovery is now **policy-gated** (follow-up B): `ITagDiscovery.RediscoverPolicy` (`UntilStable`/`Once`/`Never`, default `UntilStable`) — FOCAS stays `UntilStable` (its FixedTree cache fills asynchronously after connect); the synchronous-discovery drivers (OpcUaClient/TwinCAT/AbCip/AbLegacy/Modbus/S7/Galaxy) are `Once`, dropping the wasteful 15× retry. The hardcoded 30 s per-pass discovery timeout is now injectable too (follow-up A).
- ✅ The OPC-node-layer seed→serve gap (recording-sink-only e2e) was closed by the **live wonder deploy** of the base feature (validated 2026-06-26; see the deployment record).
**Companion to:** [`2026-06-25-otopcua-equipment-dataplane-investigation.md`](2026-06-25-otopcua-equipment-dataplane-investigation.md) (symptom #1 — live FOCAS values — FIXED + deployed; this design addresses **symptom #2**).
**Base branch:** `fix/focas-poll-io-serialization` (this feature builds on the now-deployed driver-host bootstrap re-spawn + FOCAS I/O fixes; that branch is ahead of `master` and not yet merged).
---
## Problem
Deployed FOCAS equipment serves only its **authored** config tags (`parts-count`/`parts-required`). The driver's
**FixedTree** (Identity / Axes / Spindle / Program / Timers — the auto-discovered CNC structure) **never appears** under
the served Equipment/UNS address space.
**Root cause (confirmed in the investigation, H2):** the served Equipment tree is built **purely from Config-DB entities**
(`AddressSpaceComposer.Compose` → `AddressSpaceApplier` → node manager). The only code that emits FixedTree nodes is
`ITagDiscovery.DiscoverAsync` (each driver implements it), reachable **only** through `GenericDriverNodeManager.BuildAddressSpaceAsync`
— which has **no runtime caller** (its referenced host method `OpcUaApplicationHost.PopulateAddressSpaces` no longer exists).
So `DiscoverAsync`/`ITagDiscovery` is **dead for serving**: every served node is config-driven, and nothing surfaces a
driver's discovered hierarchy.
Surfacing FixedTree under the Equipment node is therefore a **new dynamic-node-injection capability**, and it must solve a
**timing problem**: composition runs at deploy/apply time (before the driver connects), but the FixedTree shape
(axis count, spindle presence, which sections exist) is **capability-discovered ~0–2 s after the driver connects**
(`FocasDriver` populates `state.FixedTreeCache` in its bootstrap loop).
## Goal
After a driver connects, dynamically graft its discovered FixedTree nodes into the served Equipment projection under a
driver-named subfolder, e.g.:
```
ns=2;s=EQ-3686c0272279   (equipment "z-34184")
├── parts-count          (authored config tag — unchanged)
├── parts-required       (authored config tag — unchanged)
└── FOCAS                (NEW — driver-named discovered subfolder)
    ├── Identity/{SeriesNumber, Version, MaxAxes, CncType, MtType, AxisCount}
    ├── Axes/{<axis>/{AbsolutePosition, MachinePosition, RelativePosition, DistanceToGo}, FeedRate/Actual, SpindleSpeed/Actual}
    ├── Spindle/{<name>/{Load, MaxRpm}}  (capability-gated)
    ├── Program/{Name, ONumber, Number, MainNumber, Sequence, BlockCount}  (capability-gated)
    ├── OperationMode/{Mode, ModeText}  (capability-gated)
    └── Timers/{PowerOnSeconds, OperatingSeconds, CuttingSeconds, CycleSeconds}  (capability-gated)
```
Read-only value nodes carrying live values (e.g. `EQ-…/FOCAS/Axes/X/AbsolutePosition` reads Good).
## Decisions (locked with the user 2026-06-26)
| Decision | Choice |
|---|---|
| Driver scope | **Generic** — keyed off the shared `ITagDiscovery` interface (FOCAS, Galaxy, Modbus all implement it). FOCAS is the first/test consumer; others get it for free. **Zero per-driver code changes.** |
| Tree placement | **Under a driver-named subfolder** → `EQ-…/FOCAS/…` (collision-safe vs. authored tags; self-describing). |
| Device-host folder | **Collapse** the single device-host level → `EQ-…/FOCAS/Identity/…` (not `EQ-…/FOCAS/10.201.31.5:8193/Identity/…`), valid because today's deployment is strictly 1:1 driver↔equipment↔device. |
| Model-change notification | **Emit `GeneralModelChangeEvent`** after a runtime add so already-connected OPC UA clients can refresh their browse. |
| Multi-device-per-driver | **Deferred** at base-feature time; ✅ **implemented as follow-up E** (2026-06-26) — `EquipmentNode.DeviceHost` partition. |
| Discovered alarms | **Out of scope** — this feature surfaces value nodes only; alarms continue to come via the config path. |
| Writable discovered nodes | **Out of scope** — FixedTree is read-only CNC state. |
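
The multi-device row in the table above partitions a driver's discovered tree by device-host folder and grafts each partition under the equipment whose `DeviceHost` matches. A minimal C# sketch of that partitioning step follows. The `Partition` class, the `Normalize` rule (trim plus lower-case), the sample hosts and the `EQ-A`/`EQ-B` ids are all hypothetical; the real normalization rule is not stated in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical bindings: normalized device host -> equipment node id.
var equipmentByHost = new Dictionary<string, string>
{
    [Partition.Normalize("10.201.31.5:8193")] = "EQ-A",
    [Partition.Normalize("10.201.31.6:8193")] = "EQ-B",
};

// Discovered folder paths, each shaped [driver, deviceHost, ...].
var paths = new[]
{
    new[] { "FOCAS", "10.201.31.5:8193", "Identity" },
    new[] { "FOCAS", " 10.201.31.6:8193 ", "Axes" },
    new[] { "FOCAS", "10.201.31.9:8193", "Identity" }, // no equipment bound to this host
};

var (grafts, skipped) = Partition.ByDeviceHost(paths, equipmentByHost);
foreach (var g in grafts) Console.WriteLine($"{g.Key}: {g.Value.Count} folder(s)");
foreach (var host in skipped) Console.WriteLine($"warn: no equipment for host {host}");

public static class Partition
{
    // Assumed normalization: trim whitespace, compare case-insensitively.
    public static string Normalize(string host) => host.Trim().ToLowerInvariant();

    public static (Dictionary<string, List<string[]>> Grafts, List<string> Skipped) ByDeviceHost(
        IEnumerable<string[]> folderPaths, IReadOnlyDictionary<string, string> equipmentByHost)
    {
        var grafts = new Dictionary<string, List<string[]>>();
        var skipped = new List<string>();
        foreach (var group in folderPaths.GroupBy(p => Normalize(p[1])))
        {
            if (!equipmentByHost.TryGetValue(group.Key, out var equipmentId))
            {
                skipped.Add(group.Key); // warn-skip: never graft under the wrong equipment
                continue;
            }
            grafts[equipmentId] = group.ToList();
        }
        return (grafts, skipped);
    }
}
```

An unmatched host ends up only in `skipped`, which mirrors the "warn-skip, never mis-graft" rule.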
## Approach (chosen): runtime post-connect injection via the actor pipeline
Treat discovered FixedTree nodes as **"synthetic equipment tags" injected at runtime**, reusing the existing
materialize → subscribe → poll → push pipeline end-to-end. Only three new pieces; **no driver changes** (each driver's
existing `DiscoverAsync` is reused verbatim via a capturing builder).
**Rejected alternatives:**
- *Composition-time pre-projection* — can't author the right nodes before the driver discovers capabilities; defeats the purpose.
- *Resurrect `GenericDriverNodeManager` as a 2nd namespace (ns=3)* — puts FixedTree in a separate tree (not **under** the equipment node), and that namespace's value-routing is also dead; more dead code to revive, wrong location.
- *Cheap baseline: author a Config-DB Tag row per FixedTree signal* — no new code, but static (can't adapt to per-CNC capabilities) and per-signal × per-machine manual authoring. User chose to build the dynamic feature instead.
## Components
### 1. `CapturingAddressSpaceBuilder` (new — runtime)
An `IAddressSpaceBuilder` implementation that **records** the streamed tree instead of creating OPC UA nodes. After a
driver's `DiscoverAsync(builder)` returns, it exposes a flat `IReadOnlyList<DiscoveredNode>`:
```
DiscoveredNode {
    IReadOnlyList<string> FolderPathSegments,   // e.g. ["FOCAS", "<deviceHost>", "Identity"]
    string BrowseName, string DisplayName,
    string FullReference,                       // == DriverAttributeInfo.FullName (the driver ref + routing key)
    DriverDataType DataType, bool IsArray, uint? ArrayDim,
    bool Writable, bool IsHistorized
}
```
- `Folder(browse, display)` returns a child capturing scope; `Variable(...)` records a node and returns an
`IVariableHandle` whose `FullReference` is `DriverAttributeInfo.FullName`.
- `MarkAsAlarmCondition(...)` returns a **no-op** sink; `AddProperty(...)` is **ignored** — value nodes only.
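
The recording behaviour described above can be sketched in a few lines of C#. This is an illustration only: `ISketchBuilder` is a simplified, hypothetical stand-in for the real `IAddressSpaceBuilder` (whose actual signatures are not reproduced here), `Variable` returns nothing instead of an `IVariableHandle`, and `CapturedNode` carries only a subset of the `DiscoveredNode` fields. The reference string format after the host prefix is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: a fake discovery pass streams a small tree into the capturing builder.
var root = new CapturingBuilder();
var identity = root.Folder("FOCAS", "FOCAS")
                   .Folder("10.201.31.5:8193", "10.201.31.5:8193")
                   .Folder("Identity", "Identity");
identity.Variable("SeriesNumber", "SeriesNumber", "10.201.31.5:8193/Identity/SeriesNumber");

foreach (var n in root.Nodes)
    Console.WriteLine($"{string.Join("/", n.FolderPathSegments)}/{n.BrowseName} -> {n.FullReference}");

// Hypothetical, reduced builder contract (not the real interface).
public interface ISketchBuilder
{
    ISketchBuilder Folder(string browseName, string displayName);
    void Variable(string browseName, string displayName, string fullReference);
}

public sealed record CapturedNode(
    IReadOnlyList<string> FolderPathSegments, string BrowseName, string DisplayName, string FullReference);

public sealed class CapturingBuilder : ISketchBuilder
{
    private readonly List<CapturedNode> _sink;     // shared by the root and every child scope
    private readonly IReadOnlyList<string> _path;  // folder path of this scope

    public CapturingBuilder() : this(new List<CapturedNode>(), Array.Empty<string>()) { }

    private CapturingBuilder(List<CapturedNode> sink, IReadOnlyList<string> path)
    {
        _sink = sink;
        _path = path;
    }

    // Flat list of everything recorded so far, in streaming order.
    public IReadOnlyList<CapturedNode> Nodes => _sink;

    // A folder creates no node; it only returns a child scope with a longer path.
    public ISketchBuilder Folder(string browseName, string displayName) =>
        new CapturingBuilder(_sink, _path.Append(browseName).ToArray());

    // A variable is recorded together with the folder path of its scope.
    public void Variable(string browseName, string displayName, string fullReference) =>
        _sink.Add(new CapturedNode(_path, browseName, displayName, fullReference));
}
```

Sharing one sink across child scopes is what turns the streamed, nested calls into the flat list the injector consumes.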
### 2. `DriverInstanceActor` — post-connect discovery (bounded retry)
On entering `Connected`, kick a bounded re-discovery:
1. Run `DiscoverAsync(capturingBuilder)` against the live `IDriver` it owns.
2. `Tell` the parent `DriverHostActor` a new message `DiscoveredNodesReady(DriverInstanceId, IReadOnlyList<DiscoveredNode>)`.
3. Because FOCAS suppresses FixedTree until `FixedTreeCache` populates (~0–2 s), **retry** every ~2 s up to a cap
(~30 s) **or until the captured set stops growing**, then stop. `DiscoverAsync` reads the in-memory cache (no extra
wire I/O), so retries are cheap. Re-runs on every reconnect (downstream is idempotent).
*(Drivers whose discovery is ready immediately — e.g. Galaxy/Modbus — satisfy this on the first attempt.)*
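
The retry rule above can be sketched as a plain async loop. This is a hedged illustration, not the actor implementation: `BoundedDiscovery.RunAsync` is a hypothetical helper, it uses `Task.Delay` rather than an actor scheduler, and it assumes "stops growing" means "non-empty and the same size as the previous pass" so that an initially empty cache does not end the loop early.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Usage: a fake discovery that is empty for two passes, then returns a stable set.
var pass = 0;
Task<IReadOnlyList<string>> FakeDiscover()
{
    pass++;
    IReadOnlyList<string> result = pass < 3
        ? Array.Empty<string>()
        : new[] { "Identity/SeriesNumber", "Axes/X/AbsolutePosition" };
    return Task.FromResult(result);
}

var published = new List<int>();
await BoundedDiscovery.RunAsync(FakeDiscover, found => published.Add(found.Count),
    interval: TimeSpan.FromMilliseconds(10), cap: TimeSpan.FromSeconds(1));
Console.WriteLine($"passes={pass}, published=[{string.Join(",", published)}]");

public static class BoundedDiscovery
{
    public static async Task RunAsync(
        Func<Task<IReadOnlyList<string>>> discover,
        Action<IReadOnlyList<string>> publish,
        TimeSpan interval,
        TimeSpan cap)
    {
        var deadline = DateTime.UtcNow + cap;
        var previousCount = -1;
        while (true)
        {
            var refs = await discover();

            // Stable: non-empty and unchanged since the previous pass.
            if (refs.Count > 0 && refs.Count == previousCount) return;

            // Changed and non-empty: hand the set to the parent (DiscoveredNodesReady in the design).
            if (refs.Count > 0) publish(refs);
            previousCount = refs.Count;

            // Cap reached: stop quietly.
            if (DateTime.UtcNow + interval > deadline) return;
            await Task.Delay(interval);
        }
    }
}
```

With the fake above, the loop runs four passes (two empty, one that publishes, one that confirms stability) and publishes once.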
### 3. `DriverHostActor` — injection handler
On `DiscoveredNodesReady(id, nodes)`:
1. Find the equipment bound to the driver instance: `composition.EquipmentNodes` where `DriverInstanceId == id`.
   - 0 matches → log Info, skip. >1 match → log Warning, skip (multi-device follow-up).
2. **Dedup** discovered `FullReference`s against authored `EquipmentTags` for that driver (never double-create
`parts-count`, etc.).
3. Map each remaining node to a NodeId `EQ-…/FOCAS/<collapsed-path>/<name>` via `EquipmentNodeIds.Variable(...)`
(collapse the single device-host folder level).
4. **Cache** the mapped result in `_discoveredByDriver[id]` (survives redeploys — see Lifecycle).
5. Update `_nodeIdByDriverRef[(id, FullReference)]` for each.
6. `Tell` `OpcUaPublishActor` a new `MaterialiseDiscoveredNodes(equipmentId, "FOCAS", nodes)`.
7. Merge the new refs into the driver's desired set and re-`Tell`
`DriverInstanceActor.SetDesiredSubscriptions(union, interval, alarmRefs)` — the existing **live path** immediately
re-subscribes (the actor self-`Tell`s `Subscribe` when already `Connected`).
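
Steps 2 and 3 above (dedup against authored refs, then map with the device-host level collapsed) can be sketched as a pure function. The names `Injection`, `MapToNodeIds` and `Discovered` are hypothetical, the NodeId is built by a plain `/` join rather than the real `EquipmentNodeIds.Variable(...)`, and the authored ref string is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var discovered = new[]
{
    new Discovered(new[] { "FOCAS", "10.201.31.5:8193", "Identity" }, "SeriesNumber",
                   "10.201.31.5:8193/Identity/SeriesNumber"),
    // Hypothetical: a discovered ref that is also an authored tag.
    new Discovered(new[] { "FOCAS", "10.201.31.5:8193" }, "parts-count",
                   "10.201.31.5:8193/parts-count"),
};
var authoredRefs = new HashSet<string> { "10.201.31.5:8193/parts-count" };

var map = Injection.MapToNodeIds("EQ-3686c0272279", "FOCAS", discovered, authoredRefs);
foreach (var kv in map) Console.WriteLine($"{kv.Key} -> {kv.Value}");

public sealed record Discovered(
    IReadOnlyList<string> FolderPathSegments, string BrowseName, string FullReference);

public static class Injection
{
    // Returns FullReference -> NodeId string for every node that is not already authored.
    public static Dictionary<string, string> MapToNodeIds(
        string equipmentNodeId, string driverSubfolder,
        IEnumerable<Discovered> nodes, ISet<string> authoredRefs)
    {
        var result = new Dictionary<string, string>();
        foreach (var node in nodes)
        {
            // Dedup: an authored tag already owns this ref, so never create it twice.
            if (authoredRefs.Contains(node.FullReference)) continue;

            // Captured path is [driver, deviceHost, ...rest]; drop both leading levels
            // and re-root the remainder under <equipment>/<driverSubfolder>.
            var rest = node.FolderPathSegments.Skip(2);
            var segments = new[] { equipmentNodeId, driverSubfolder }
                .Concat(rest)
                .Append(node.BrowseName);
            result[node.FullReference] = string.Join("/", segments);
        }
        return result;
    }
}
```

Keying the result by `FullReference` keeps the mapping directly usable for both the subscription merge and the value-routing table.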
### 4. `OpcUaPublishActor` / node manager — incremental materialize
New message `MaterialiseDiscoveredNodes(equipmentId, driverSubfolder, nodes)`:
- Idempotent `EnsureFolder` / `EnsureVariable` calls (the node manager already supports incremental add under `Lock`
via `AddChild` + `AddPredefinedNode`; `EnsureVariable` early-returns if the node exists).
- Variables materialize **read-only** (no `OnWriteValue`).
- After adding, emit a `GeneralModelChangeEvent` so connected clients can refresh their browse (the full-rebuild path
does not emit one; runtime adds should).
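
The idempotency contract can be illustrated without the OPC UA SDK. In the sketch below, `SketchAddressSpace` is a hypothetical in-memory stand-in for the node manager: a set of NodeId strings replaces real nodes, and a counter replaces the `GeneralModelChangeEvent`. Emitting one event per batch, and only when something was actually added, is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

var space = new SketchAddressSpace("EQ-1");
var ids = new[] { "EQ-1/FOCAS/Identity/SeriesNumber", "EQ-1/FOCAS/Identity/Version" };
var firstPass = space.Materialise(ids);  // adds 2 folders + 2 variables
var replay = space.Materialise(ids);     // everything already exists
Console.WriteLine($"first={firstPass}, replay={replay}, events={space.ModelChangeEvents}");

public sealed class SketchAddressSpace
{
    private readonly object _lock = new object();
    private readonly HashSet<string> _nodes = new HashSet<string>();

    // The equipment node is authored, so it exists before any injection.
    public SketchAddressSpace(string equipmentNodeId) => _nodes.Add(equipmentNodeId);

    public int ModelChangeEvents { get; private set; }

    // Returns how many nodes were newly created by this batch.
    public int Materialise(IEnumerable<string> variableNodeIds)
    {
        lock (_lock)
        {
            var added = 0;
            foreach (var id in variableNodeIds)
            {
                // Ensure every ancestor folder, then the variable itself.
                // HashSet.Add returns false for an existing entry, which is the early return.
                for (var slash = id.IndexOf('/'); slash >= 0; slash = id.IndexOf('/', slash + 1))
                    if (_nodes.Add(id.Substring(0, slash))) added++;
                if (_nodes.Add(id)) added++;
            }

            // Stand-in for the model-change notification: at most one per batch.
            if (added > 0) ModelChangeEvents++;
            return added;
        }
    }
}
```

Replaying the same batch adds nothing and raises no further notification, which is the property the reconnect and redeploy paths rely on.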
## Data flow (value path — fully reused)
```
SetDesiredSubscriptions(union) → DriverInstanceActor subscribes the FixedTree refs
→ PollGroupEngine polls each ref via FocasDriver.ReadAsync
→ TryReadFixedTree (cache lookup, NO extra wire I/O)
→ onChange → AttributeValuePublished(FullReference)
→ DriverHostActor.ForwardToMux
→ _nodeIdByDriverRef[(driverId, ref)] → AttributeValueUpdate(nodeId, value, quality, ts)
→ OtOpcUaNodeManager writes the node value
```
The routing key is **consistent by construction**: the capturing builder records `handle.FullReference`, which is exactly
the ref the driver publishes (`AttributeValuePublished.FullReference`) and the ref `TryReadFixedTree` matches
(`reference.StartsWith(state.Options.HostAddress + "/")`).
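
A small sketch of why one string can serve as both read key and routing key. The dictionary mirrors `_nodeIdByDriverRef`; the driver id `focas-1` and the reference layout after the host prefix are hypothetical.

```csharp
using System;
using System.Collections.Generic;

const string host = "10.201.31.5:8193";
const string reference = host + "/Axes/X/AbsolutePosition";

// Routing table keyed by (driver instance, driver reference).
var nodeIdByDriverRef = new Dictionary<(string DriverId, string Reference), string>
{
    [("focas-1", reference)] = "EQ-3686c0272279/FOCAS/Axes/X/AbsolutePosition",
};

// Read side: the driver recognises the ref by its host prefix.
var isFixedTreeRef = reference.StartsWith(host + "/", StringComparison.Ordinal);

// Publish side: the very same string resolves the served NodeId.
var routed = nodeIdByDriverRef.TryGetValue(("focas-1", reference), out var nodeId);

Console.WriteLine($"fixedTree={isFixedTreeRef}, routed={routed}, nodeId={nodeId}");
```

Because the captured `FullReference` is never transformed between capture, subscription and publish, no translation table between key spaces is needed.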
## Lifecycle / re-injection robustness (the timing problem, solved)
- **First connect:** driver connects → ~0–2 s later `FixedTreeCache` populates → bounded re-discovery catches it → inject.
- **Redeploy with a structural `RebuildAddressSpace`:** the full teardown wipes injected nodes and `PushDesiredSubscriptions`
rebuilds `_nodeIdByDriverRef` from authored tags only. **Fix:** after every `PushDesiredSubscriptions`, `DriverHostActor`
**re-applies its cached `_discoveredByDriver`** (re-materialize + re-map + re-merge refs) — so FixedTree survives
redeploys without re-querying the driver.
- **Process restart:** `_discoveredByDriver` is lost, but `RestoreApplied` re-spawns drivers → each reconnects →
post-connect re-discovery re-injects (same ~0–2 s delay). Consistent with the symptom-#1 restore behavior already
deployed.
- **Idempotent throughout:** `EnsureFolder`/`EnsureVariable` early-return if present; `_nodeIdByDriverRef` is set-based;
`SetDesiredSubscriptions` is idempotent.
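
The redeploy case can be sketched as a rebuild followed by a re-inject tail. `SketchHost` below is a hypothetical, actor-free stand-in for `DriverHostActor`: it keeps only the routing table and the `_discoveredByDriver` cache, and omits re-materialization and the subscription merge.

```csharp
using System;
using System.Collections.Generic;

var host = new SketchHost();
host.Authored["focas-1"] = new Dictionary<string, string>
{
    ["10.201.31.5:8193/parts-count"] = "EQ-1/parts-count",
};
host.OnDiscovered("focas-1", new Dictionary<string, string>
{
    ["10.201.31.5:8193/Identity/Version"] = "EQ-1/FOCAS/Identity/Version",
});

host.PushDesiredSubscriptions(); // simulated redeploy: rebuild, then re-inject
Console.WriteLine($"routes after redeploy: {host.Routes.Count}");

public sealed class SketchHost
{
    // driverId -> (FullReference -> NodeId) for authored tags.
    public Dictionary<string, Dictionary<string, string>> Authored { get; } =
        new Dictionary<string, Dictionary<string, string>>();

    // Cache of already-mapped discovered nodes; survives redeploys, not process restarts.
    private readonly Dictionary<string, Dictionary<string, string>> _discoveredByDriver =
        new Dictionary<string, Dictionary<string, string>>();

    public Dictionary<(string DriverId, string Reference), string> Routes { get; } =
        new Dictionary<(string DriverId, string Reference), string>();

    public void OnDiscovered(string driverId, Dictionary<string, string> mapped)
    {
        _discoveredByDriver[driverId] = mapped;
        Merge(driverId, mapped);
    }

    public void PushDesiredSubscriptions()
    {
        // The rebuild starts from authored tags only...
        Routes.Clear();
        foreach (var kv in Authored) Merge(kv.Key, kv.Value);

        // ...so the tail re-applies the cache without asking the driver again.
        foreach (var kv in _discoveredByDriver) Merge(kv.Key, kv.Value);
    }

    private void Merge(string driverId, Dictionary<string, string> refs)
    {
        foreach (var kv in refs) Routes[(driverId, kv.Key)] = kv.Value;
    }
}
```

Indexer assignment in `Merge` keeps the operation idempotent, so running the tail after every push is harmless.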
## Error handling
- Discovery throws / driver not ready → bounded retry, then give up quietly (Info); authored tags unaffected.
- No equipment bound to the driver instance, or ambiguous (multi-equipment) → Warning, skip injection.
- A FixedTree ref that fails to read at poll time → flows the same recoverable `BadCommunicationError` push as any
equipment tag (the symptom-#1 fix) — observable, not silent.
## Testing
- **Unit:**
  - `CapturingAddressSpaceBuilder` records the tree + refs from a fake `ITagDiscovery` (folders, nested variables,
    no-op alarm sink, ignored properties).
  - Injector mapping: discovered nodes → `EQ-…/FOCAS/…` NodeIds; dedup against authored tags; device-host-folder collapse.
  - `DriverInstanceActor` bounded post-connect re-discovery (set becomes non-empty on the Nth attempt; stops on cap / no-growth).
  - `DriverHostActor` `DiscoveredNodesReady` handling + re-inject-after-`PushDesiredSubscriptions`.
  - Read-only materialization (no write callback).
- **Integration (docker-dev):** a fake `ITagDiscovery` driver exposing a *delayed* discovery set → assert nodes appear
under the equipment and carry values; verify survival across a redeploy + a process restart.
- **Live (wonder, following the symptom-#1 pattern):** deploy the current Host + this change, browse
`EQ-3686c0272279/FOCAS/Identity/SeriesNumber` and `…/Axes/X/AbsolutePosition`, confirm Good values. The live deploy is
**not** blocking for the build (macro/axes values may be 0 on the idle machine — assert status, not magnitude); confirm
the live-deploy step with the user at execution time.
## Scope / non-goals
- **In:** read-only value nodes for any `ITagDiscovery` driver; 1:1 driver↔equipment; survives redeploy/restart; generic
mechanism with FOCAS as the first consumer.
- **Out (documented follow-ups):** discovered **alarms** injection; multi-device-per-driver-instance mapping; writable
discovered nodes.
## Touched code (anticipated)
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` — `DiscoveredNodesReady` handler, `_discoveredByDriver`
cache, re-inject after `PushDesiredSubscriptions`, desired-set merge.
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` — post-connect bounded re-discovery + new message.
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs` — `MaterialiseDiscoveredNodes` receive.
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs` — `GeneralModelChangeEvent` emit on runtime add (verify
existing helper).
- New: `CapturingAddressSpaceBuilder` + `DiscoveredNode` DTO (runtime), `EquipmentNodeIds` reuse for mapping.
- Tests under `tests/...Runtime.Tests` / `tests/...OpcUaServer.Tests` and a fake `ITagDiscovery` test double.
## Task tracking
Umbrella native task **#14** (FixedTree feature). Implementation tasks to be generated by writing-plans from this design.