# OtOpcUa — dynamic injection of driver-discovered FixedTree nodes into the Equipment projection (design)

**Date:** 2026-06-26
**Status:** ✅ Implemented (2026-06-26) — 11 tasks, offline-complete on branch `feat/focas-fixedtree-equipment-injection` (solution build 0 errors / 0 warnings; Runtime.Tests 312, OpcUaServer.Tests 304, FOCAS 247 + an end-to-end injection+value-flow test, all green). Live wonder validation pending.

**Follow-ups surfaced during the review chain — ✅ ALL RESOLVED 2026-06-26** (design
[`2026-06-26-otopcua-fixedtree-followups-design.md`](2026-06-26-otopcua-fixedtree-followups-design.md),
plan [`2026-06-26-otopcua-fixedtree-followups.md`](2026-06-26-otopcua-fixedtree-followups.md);
16 commits `c2c368dc`..`0074f37a` on this branch, every task spec+code reviewed; full offline suite green):
- ✅ Config-unchanged driver→equipment **rebind** now **re-triggers discovery** (follow-up C): the redeploy re-inject tail drops the stale plan AND `Tell`s the driver child a new `DriverInstanceActor.TriggerRediscovery` (a discovery action — not lifecycle control — idempotent, child no-ops if not Connected), so the FixedTree re-grafts under the new equipment on the next pass instead of waiting for the next natural reconnect.
- ✅ **Multi-device-per-driver** mapping **implemented** (follow-up E): `EquipmentNode` now carries `DriverInstanceId`/`DeviceId`/`DeviceHost` (projection-only — the columns + the `Devices` array were already in the artifact, no DB migration / no wire change), so equipment resolves via the driver binding **without** authored tags (≥1-tag requirement lifted), and a driver bound to multiple devices partitions its discovered tree by normalized device-host folder, grafting each device's subtree under the equipment whose `DeviceHost` matches (unmatched hosts warn-skip, never mis-graft).
- ✅ Per-(re)connect re-discovery is now **policy-gated** (follow-up B): `ITagDiscovery.RediscoverPolicy` (`UntilStable`/`Once`/`Never`, default `UntilStable`) — FOCAS stays `UntilStable` (its FixedTree cache fills asynchronously after connect); the synchronous-discovery drivers (OpcUaClient/TwinCAT/AbCip/AbLegacy/Modbus/S7/Galaxy) are `Once`, dropping the wasteful 15× retry. The hardcoded 30 s per-pass discovery timeout is now injectable too (follow-up A).
- ✅ The OPC-node-layer seed→serve gap (recording-sink-only e2e) was closed by the **live wonder deploy** of the base feature (validated 2026-06-26; see the deployment record).

**Companion to:** [`2026-06-25-otopcua-equipment-dataplane-investigation.md`](2026-06-25-otopcua-equipment-dataplane-investigation.md) (symptom #1 — live FOCAS values — FIXED + deployed; this design addresses **symptom #2**).
**Base branch:** `fix/focas-poll-io-serialization` (this feature builds on the now-deployed driver-host bootstrap re-spawn + FOCAS I/O fixes; that branch is ahead of `master` and not yet merged).

---

## Problem

Deployed FOCAS equipment serves only its **authored** config tags (`parts-count`/`parts-required`). The driver's
**FixedTree** (Identity / Axes / Spindle / Program / Timers — the auto-discovered CNC structure) **never appears** under
the served Equipment/UNS address space.

**Root cause (confirmed in the investigation, H2):** the served Equipment tree is built **purely from Config-DB entities**
(`AddressSpaceComposer.Compose` → `AddressSpaceApplier` → node manager). The only code that emits FixedTree nodes is
`ITagDiscovery.DiscoverAsync` (each driver implements it), reachable **only** through `GenericDriverNodeManager.BuildAddressSpaceAsync`
— which has **no runtime caller** (its referenced host method `OpcUaApplicationHost.PopulateAddressSpaces` no longer exists).
So `DiscoverAsync`/`ITagDiscovery` is **dead for serving**: every served node is config-driven, and nothing surfaces a
driver's discovered hierarchy.

Surfacing FixedTree under the Equipment node is therefore a **new dynamic-node-injection capability**, and it must solve a
**timing problem**: composition runs at deploy/apply time (before the driver connects), but the FixedTree shape
(axis count, spindle presence, which sections exist) is **capability-discovered ~0–2 s after the driver connects**
(`FocasDriver` populates `state.FixedTreeCache` in its bootstrap loop).

## Goal

After a driver connects, dynamically graft its discovered FixedTree nodes into the served Equipment projection under a
driver-named subfolder, e.g.:

```
ns=2;s=EQ-3686c0272279  (equipment "z-34184")
├── parts-count  (authored config tag — unchanged)
├── parts-required  (authored config tag — unchanged)
└── FOCAS  (NEW — driver-named discovered subfolder)
    ├── Identity/{SeriesNumber, Version, MaxAxes, CncType, MtType, AxisCount}
    ├── Axes/{<axis>/{AbsolutePosition, MachinePosition, RelativePosition, DistanceToGo}, FeedRate/Actual, SpindleSpeed/Actual}
    ├── Spindle/{<name>/{Load, MaxRpm}}  (capability-gated)
    ├── Program/{Name, ONumber, Number, MainNumber, Sequence, BlockCount}  (capability-gated)
    ├── OperationMode/{Mode, ModeText}  (capability-gated)
    └── Timers/{PowerOnSeconds, OperatingSeconds, CuttingSeconds, CycleSeconds}  (capability-gated)
```

The injected nodes are read-only value nodes carrying live values (e.g. `EQ-…/FOCAS/Axes/X/AbsolutePosition` reads Good).

## Decisions (locked with the user 2026-06-26)

| Decision | Choice |
|---|---|
| Driver scope | **Generic** — keyed off the shared `ITagDiscovery` interface (FOCAS, Galaxy, Modbus all implement it). FOCAS is the first/test consumer; others get it for free. **Zero per-driver code changes.** |
| Tree placement | **Under a driver-named subfolder** — `EQ-…/FOCAS/…` (collision-safe vs. authored tags; self-describing). |
| Device-host folder | **Collapse** the single device-host level → `EQ-…/FOCAS/Identity/…` (not `EQ-…/FOCAS/10.201.31.5:8193/Identity/…`), valid because today's deployment is strictly 1:1 driver↔equipment↔device. |
| Model-change notification | **Emit `GeneralModelChangeEvent`** after a runtime add so already-connected OPC UA clients can refresh their browse. |
| Multi-device-per-driver | **Deferred** at base-feature time; ✅ **implemented as follow-up E** (2026-06-26) — `EquipmentNode.DeviceHost` partition. |
| Discovered alarms | **Out of scope** — this feature surfaces value nodes only; alarms continue to come via the config path. |
| Writable discovered nodes | **Out of scope** — FixedTree is read-only CNC state. |

## Approach (chosen): runtime post-connect injection via the actor pipeline

Treat discovered FixedTree nodes as **"synthetic equipment tags" injected at runtime**, reusing the existing
materialize → subscribe → poll → push pipeline end-to-end. Only three new pieces; **no driver changes** (each driver's
existing `DiscoverAsync` is reused verbatim via a capturing builder).

**Rejected alternatives:**
- *Composition-time pre-projection* — can't author the right nodes before the driver discovers capabilities; defeats the purpose.
- *Resurrect `GenericDriverNodeManager` as a 2nd namespace (ns=3)* — puts FixedTree in a separate tree (not **under** the equipment node), and that namespace's value-routing is also dead; more dead code to revive, wrong location.
- *Cheap baseline: author a Config-DB Tag row per FixedTree signal* — no new code, but static (can't adapt to per-CNC capabilities) and per-signal × per-machine manual authoring. User chose to build the dynamic feature instead.

## Components

### 1. `CapturingAddressSpaceBuilder` (new — runtime)
An `IAddressSpaceBuilder` implementation that **records** the streamed tree instead of creating OPC UA nodes. After a
driver's `DiscoverAsync(builder)` returns, it exposes a flat `IReadOnlyList<DiscoveredNode>`:

```
DiscoveredNode {
  IReadOnlyList<string> FolderPathSegments,   // e.g. ["FOCAS", "<deviceHost>", "Identity"]
  string BrowseName, string DisplayName,
  string FullReference,                       // == DriverAttributeInfo.FullName (the driver ref + routing key)
  DriverDataType DataType, bool IsArray, uint? ArrayDim,
  bool Writable, bool IsHistorized
}
```

- `Folder(browse, display)` returns a child capturing scope; `Variable(...)` records a node and returns an
  `IVariableHandle` whose `FullReference` is `DriverAttributeInfo.FullName`.
- `MarkAsAlarmCondition(...)` returns a **no-op** sink; `AddProperty(...)` is **ignored** — value nodes only.

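The capture idea can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `IAddressSpaceBuilder` interface below is a cut-down, hypothetical stand-in with only `Folder` and `Variable`, `Variable` returns nothing instead of an `IVariableHandle`, and `DiscoveredNode` carries only the path and reference fields. What it shows is that every child scope shares one flat list while tracking its own folder path.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the project's real types; the actual
// IAddressSpaceBuilder and DiscoveredNode shapes carry more members.
public sealed record DiscoveredNode(
    IReadOnlyList<string> FolderPathSegments,
    string BrowseName,
    string DisplayName,
    string FullReference);

public interface IAddressSpaceBuilder
{
    IAddressSpaceBuilder Folder(string browseName, string displayName);
    void Variable(string browseName, string displayName, string fullName);
}

public sealed class CapturingAddressSpaceBuilder : IAddressSpaceBuilder
{
    private readonly List<DiscoveredNode> _nodes;   // one list shared by every scope
    private readonly IReadOnlyList<string> _path;   // folder path of this scope

    public CapturingAddressSpaceBuilder()
        : this(new List<DiscoveredNode>(), new string[0]) { }

    private CapturingAddressSpaceBuilder(List<DiscoveredNode> nodes, IReadOnlyList<string> path)
    {
        _nodes = nodes;
        _path = path;
    }

    // Flat view of everything recorded so far, across all child scopes.
    public IReadOnlyList<DiscoveredNode> Nodes => _nodes;

    // A child scope extends the path but records into the same list.
    public IAddressSpaceBuilder Folder(string browseName, string displayName) =>
        new CapturingAddressSpaceBuilder(_nodes, _path.Append(browseName).ToArray());

    // Records the variable instead of creating an OPC UA node.
    public void Variable(string browseName, string displayName, string fullName) =>
        _nodes.Add(new DiscoveredNode(_path, browseName, displayName, fullName));
}
```

A driver that calls `Folder("FOCAS", …).Folder("<deviceHost>", …).Folder("Identity", …).Variable("SeriesNumber", …)` on the root builder leaves one entry in `Nodes` whose `FolderPathSegments` holds those three folder names in order.
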
### 2. `DriverInstanceActor` — post-connect discovery (bounded retry)
On entering `Connected`, kick a bounded re-discovery:
1. Run `DiscoverAsync(capturingBuilder)` against the live `IDriver` it owns.
2. `Tell` the parent `DriverHostActor` a new message `DiscoveredNodesReady(DriverInstanceId, IReadOnlyList<DiscoveredNode>)`.
3. Because FOCAS suppresses FixedTree until `FixedTreeCache` populates (~0–2 s), **retry** every ~2 s up to a cap
   (~30 s) **or until the captured set stops growing**, then stop. `DiscoverAsync` reads the in-memory cache (no extra
   wire I/O), so retries are cheap. Re-runs on every reconnect (downstream is idempotent).

*(Drivers whose discovery is ready immediately — e.g. Galaxy/Modbus — satisfy this on the first attempt.)*

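The retry rule in step 3 could look something like the loop below. It is a sketch under stated assumptions: the helper name `BoundedRediscovery` and its delegate parameters are hypothetical, discovery is reduced to "how many nodes were captured", and "stops growing" is interpreted as a pass that adds nothing after at least one non-empty pass. The real actor schedules passes with messages rather than an `await` loop.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper; not part of the OtOpcUa code base.
public static class BoundedRediscovery
{
    // Repeats discovery until the captured set is non-empty and has stopped
    // growing, or until maxAttempts passes have run. Returns the final count.
    public static async Task<int> RunUntilStableAsync(
        Func<Task<int>> discoverCountAsync,  // one discovery pass; returns the captured node count
        Func<int, Task> onGrownAsync,        // called whenever a pass captured more than before
        Func<Task> delayAsync,               // wait between passes (about 2 s in the design)
        int maxAttempts)                     // cap (about 15 passes for a 30 s budget)
    {
        int last = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            int count = await discoverCountAsync();
            if (count > last)
            {
                // The set grew: publish it (DiscoveredNodesReady in the design).
                await onGrownAsync(count);
                last = count;
            }
            else if (last > 0)
            {
                // Non-empty and no growth since the previous pass: stable, stop.
                break;
            }

            if (attempt < maxAttempts)
                await delayAsync();
        }
        return last;
    }
}
```

With pass results of 0, 0, 3, 5, 5 the loop publishes twice (3, then 5) and stops on the fifth pass; a driver whose discovery is ready immediately publishes once and stops on the second pass.
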
### 3. `DriverHostActor` — injection handler
On `DiscoveredNodesReady(id, nodes)`:
1. Find the equipment bound to the driver instance: `composition.EquipmentNodes` where `DriverInstanceId == id`.
   - 0 matches → log Info, skip. >1 match → log Warning, skip (multi-device follow-up).
2. **Dedup** discovered `FullReference`s against authored `EquipmentTags` for that driver (never double-create
   `parts-count`, etc.).
3. Map each remaining node to a NodeId `EQ-…/FOCAS/<collapsed-path>/<name>` via `EquipmentNodeIds.Variable(...)`
   (collapse the single device-host folder level).
4. **Cache** the mapped result in `_discoveredByDriver[id]` (survives redeploys — see Lifecycle).
5. Update `_nodeIdByDriverRef[(id, FullReference)]` for each.
6. `Tell` `OpcUaPublishActor` a new `MaterialiseDiscoveredNodes(equipmentId, "FOCAS", nodes)`.
7. Merge the new refs into the driver's desired set and re-`Tell`
   `DriverInstanceActor.SetDesiredSubscriptions(union, interval, alarmRefs)` — the existing **live path** immediately
   re-subscribes (the actor self-`Tell`s `Subscribe` when already `Connected`).

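Steps 2 and 3 (dedup and device-host collapse) can be illustrated with the sketch below. Several things here are assumptions: the `Discovered` record and `DiscoveredNodeMapper` class are hypothetical, the device-host folder is assumed to be the second path segment (as in the `["FOCAS", "<deviceHost>", "Identity"]` example), and the node id is built as a plain `/`-joined string because the actual format produced by `EquipmentNodeIds.Variable(...)` is not shown in this document.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced view of a discovered node.
public sealed record Discovered(IReadOnlyList<string> Folders, string Name, string FullReference);

public static class DiscoveredNodeMapper
{
    // Returns FullReference -> node id for every discovered node that is not
    // already served as an authored tag.
    public static Dictionary<string, string> Map(
        string equipmentNodeId,               // e.g. "EQ-3686c0272279"
        IEnumerable<Discovered> discovered,
        ISet<string> authoredReferences)
    {
        var result = new Dictionary<string, string>();
        foreach (var node in discovered)
        {
            // Step 2: never double-create a node that an authored tag already owns.
            if (authoredReferences.Contains(node.FullReference))
                continue;

            // Step 3: keep the driver folder, drop the device-host level (index 1),
            // keep the remaining section folders. Assumes [driver, deviceHost, ...].
            var collapsed = node.Folders.Take(1).Concat(node.Folders.Skip(2));

            // Assumption: a '/'-joined id; the real id comes from EquipmentNodeIds.
            var path = string.Join("/", collapsed.Append(node.Name));
            result[node.FullReference] = equipmentNodeId + "/" + path;
        }
        return result;
    }
}
```

Under these assumptions, a node at `["FOCAS", "10.201.31.5:8193", "Identity"]` named `SeriesNumber` maps to `EQ-3686c0272279/FOCAS/Identity/SeriesNumber`, while a discovered reference that matches an authored tag is skipped.
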
### 4. `OpcUaPublishActor` / node manager — incremental materialize
New message `MaterialiseDiscoveredNodes(equipmentId, driverSubfolder, nodes)`:
- Idempotent `EnsureFolder` / `EnsureVariable` calls (the node manager already supports incremental add under `Lock`
  via `AddChild` + `AddPredefinedNode`; `EnsureVariable` early-returns if the node exists).
- Variables materialize **read-only** (no `OnWriteValue`).
- After adding, emit a `GeneralModelChangeEvent` so connected clients can refresh their browse (the full-rebuild path
  does not emit one; runtime adds should).

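The idempotency property can be shown with a small in-memory model. This is only a sketch: `NodeStoreSketch` is a hypothetical stand-in for the node manager, a counter stands in for the `GeneralModelChangeEvent`, and raising the event only when a pass actually added something is an assumption of this sketch rather than something the design specifies.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory model of "ensure" semantics; the real node manager
// creates OPC UA nodes via AddChild + AddPredefinedNode under its Lock.
public sealed class NodeStoreSketch
{
    private readonly object _lock = new object();
    private readonly HashSet<string> _nodeIds = new HashSet<string>();

    // Stand-in for emitted GeneralModelChangeEvents.
    public int ModelChangeEvents { get; private set; }

    // Adds every node id that is not present yet and returns how many were added.
    public int Materialise(IEnumerable<string> nodeIds)
    {
        int added = 0;
        lock (_lock)
        {
            foreach (var id in nodeIds)
            {
                // HashSet.Add returns false when the id already exists,
                // which mirrors EnsureVariable's early return.
                if (_nodeIds.Add(id))
                    added++;
            }

            // Assumption: notify clients once per pass, and only if the
            // address space actually changed.
            if (added > 0)
                ModelChangeEvents++;
        }
        return added;
    }
}
```

Replaying the same `MaterialiseDiscoveredNodes` payload (as happens on reconnect or redeploy) then adds nothing the second time and raises no further notification.
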
## Data flow (value path — fully reused)

```
SetDesiredSubscriptions(union) → DriverInstanceActor subscribes the FixedTree refs
  → PollGroupEngine polls each ref via FocasDriver.ReadAsync
  → TryReadFixedTree (cache lookup, NO extra wire I/O)
  → onChange → AttributeValuePublished(FullReference)
  → DriverHostActor.ForwardToMux
  → _nodeIdByDriverRef[(driverId, ref)] → AttributeValueUpdate(nodeId, value, quality, ts)
  → OtOpcUaNodeManager writes the node value
```

The routing key is **consistent by construction**: the capturing builder records `handle.FullReference`, which is exactly
the ref the driver publishes (`AttributeValuePublished.FullReference`) and the ref `TryReadFixedTree` matches
(`reference.StartsWith(state.Options.HostAddress + "/")`).

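The lookup at the centre of this path can be sketched as below. The record and class names mirror the messages named above but are simplified, hypothetical stand-ins (the real `AttributeValueUpdate` also carries quality and timestamp), and the explicit `StringComparison.Ordinal` is a choice made for this sketch; the original expression passes no comparison argument.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-ins for the real messages.
public sealed record AttributeValuePublished(string FullReference, object Value);
public sealed record AttributeValueUpdate(string NodeId, object Value);

public sealed class ForwardToMuxSketch
{
    // (driver instance id, driver full reference) -> served node id.
    private readonly Dictionary<(string DriverId, string FullReference), string> _nodeIdByDriverRef =
        new Dictionary<(string DriverId, string FullReference), string>();

    public void Register(string driverId, string fullReference, string nodeId) =>
        _nodeIdByDriverRef[(driverId, fullReference)] = nodeId;

    // Translates a driver-side publish into a node-side update. Returns null
    // when the reference has no mapped node, so unknown refs are dropped.
    public AttributeValueUpdate Forward(string driverId, AttributeValuePublished message) =>
        _nodeIdByDriverRef.TryGetValue((driverId, message.FullReference), out var nodeId)
            ? new AttributeValueUpdate(nodeId, message.Value)
            : null;

    // Mirrors the prefix test quoted above: a FixedTree ref starts with "<host>/".
    public static bool IsFixedTreeReference(string hostAddress, string reference) =>
        reference.StartsWith(hostAddress + "/", StringComparison.Ordinal);
}
```

Because the same `FullReference` string is used when registering the node and when the driver publishes, no translation table between "discovered name" and "published name" is needed.
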
## Lifecycle / re-injection robustness (the timing problem, solved)

- **First connect:** driver connects → ~0–2 s later `FixedTreeCache` populates → bounded re-discovery catches it → inject.
- **Redeploy with a structural `RebuildAddressSpace`:** the full teardown wipes injected nodes and `PushDesiredSubscriptions`
  rebuilds `_nodeIdByDriverRef` from authored tags only. **Fix:** after every `PushDesiredSubscriptions`, `DriverHostActor`
  **re-applies its cached `_discoveredByDriver`** (re-materialize + re-map + re-merge refs) — so FixedTree survives
  redeploys without re-querying the driver.
- **Process restart:** `_discoveredByDriver` is lost, but `RestoreApplied` re-spawns drivers → each reconnects →
  post-connect re-discovery re-injects (same ~0–2 s delay). Consistent with the symptom-#1 restore behavior already
  deployed.
- **Idempotent throughout:** `EnsureFolder`/`EnsureVariable` early-return if present; `_nodeIdByDriverRef` is set-based;
  `SetDesiredSubscriptions` is idempotent.

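The redeploy case is the subtle one, so here is a rough model of it. Everything in this sketch is hypothetical: `HostRoutingSketch` is not the real `DriverHostActor`, the method signatures are invented for illustration, and only the routing map is modelled (re-materializing nodes and re-merging subscriptions are left out).

```csharp
using System.Collections.Generic;

// Hypothetical model of the two maps involved; names follow the design text.
public sealed class HostRoutingSketch
{
    // driverId -> (FullReference -> nodeId). Kept across redeploys.
    private readonly Dictionary<string, Dictionary<string, string>> _discoveredByDriver =
        new Dictionary<string, Dictionary<string, string>>();

    // Rebuilt from authored tags on every push.
    private Dictionary<(string DriverId, string FullReference), string> _nodeIdByDriverRef =
        new Dictionary<(string DriverId, string FullReference), string>();

    // Injection: cache the mapped result, then apply it to the routing map.
    public void OnDiscoveredNodesReady(string driverId, Dictionary<string, string> mapped)
    {
        _discoveredByDriver[driverId] = mapped;
        Reapply(driverId);
    }

    // Redeploy: the routing map is rebuilt from authored tags only, and the
    // cached discovered entries are re-applied as the final step.
    public void PushDesiredSubscriptions(
        IEnumerable<(string DriverId, string FullReference, string NodeId)> authored)
    {
        _nodeIdByDriverRef = new Dictionary<(string DriverId, string FullReference), string>();
        foreach (var tag in authored)
            _nodeIdByDriverRef[(tag.DriverId, tag.FullReference)] = tag.NodeId;

        foreach (var driverId in _discoveredByDriver.Keys)
            Reapply(driverId);
    }

    public bool TryResolve(string driverId, string fullReference, out string nodeId) =>
        _nodeIdByDriverRef.TryGetValue((driverId, fullReference), out nodeId);

    private void Reapply(string driverId)
    {
        foreach (var pair in _discoveredByDriver[driverId])
            _nodeIdByDriverRef[(driverId, pair.Key)] = pair.Value;
    }
}
```

In this model a discovered reference still resolves after `PushDesiredSubscriptions` runs, without another discovery pass. A fresh instance (the process-restart case) has an empty cache and resolves discovered references only after the next `OnDiscoveredNodesReady`.
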
## Error handling

- Discovery throws / driver not ready → bounded retry, then give up quietly (Info); authored tags unaffected.
- No equipment bound to the driver instance, or ambiguous (multi-equipment) → Warning, skip injection.
- A FixedTree ref that fails to read at poll time → flows the same recoverable `BadCommunicationError` push as any
  equipment tag (the symptom-#1 fix) — observable, not silent.

## Testing

- **Unit:**
  - `CapturingAddressSpaceBuilder` records the tree + refs from a fake `ITagDiscovery` (folders, nested variables,
    no-op alarm sink, ignored properties).
  - Injector mapping: discovered nodes → `EQ-…/FOCAS/…` NodeIds; dedup against authored tags; device-host-folder collapse.
  - `DriverInstanceActor` bounded post-connect re-discovery (set becomes non-empty on the Nth attempt; stops on cap / no-growth).
  - `DriverHostActor` `DiscoveredNodesReady` handling + re-inject-after-`PushDesiredSubscriptions`.
  - Read-only materialization (no write callback).
- **Integration (docker-dev):** a fake `ITagDiscovery` driver exposing a *delayed* discovery set → assert nodes appear
  under the equipment and carry values; verify survival across a redeploy + a process restart.
- **Live (wonder, following the symptom-#1 pattern):** deploy the current Host + this change, browse
  `EQ-3686c0272279/FOCAS/Identity/SeriesNumber` and `…/Axes/X/AbsolutePosition`, confirm Good values. The live deploy is
  **not** blocking for the build (macro/axes values may be 0 on the idle machine — assert status, not magnitude); confirm
  the live-deploy step with the user at execution time.

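A delayed-discovery test double of the kind mentioned above might look like this. It is a sketch only: `DelayedFakeDiscovery` is a hypothetical name, it does not implement the project's real `ITagDiscovery` interface (whose signature is not reproduced in this document), and it returns bare reference strings instead of streaming nodes into a builder.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical test double: reports an empty set for the first few passes,
// then the full set, imitating a cache that fills shortly after connect.
public sealed class DelayedFakeDiscovery
{
    private readonly int _emptyPasses;
    private readonly IReadOnlyList<string> _references;

    public DelayedFakeDiscovery(int emptyPasses, IReadOnlyList<string> references)
    {
        _emptyPasses = emptyPasses;
        _references = references;
    }

    // Number of discovery passes requested so far.
    public int Calls { get; private set; }

    public Task<IReadOnlyList<string>> DiscoverAsync()
    {
        Calls++;
        IReadOnlyList<string> result =
            Calls <= _emptyPasses ? new string[0] : _references;
        return Task.FromResult(result);
    }
}
```

Constructed with `emptyPasses: 2`, the double yields nothing on the first two passes and the full set from the third pass onward, which is enough to exercise the "non-empty on the Nth attempt" and "stops on no-growth" cases.
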
## Scope / non-goals

- **In:** read-only value nodes for any `ITagDiscovery` driver; 1:1 driver↔equipment; survives redeploy/restart; generic
  mechanism with FOCAS as the first consumer.
- **Out (documented follow-ups):** discovered **alarms** injection; multi-device-per-driver-instance mapping; writable
  discovered nodes.

## Touched code (anticipated)

- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` — `DiscoveredNodesReady` handler, `_discoveredByDriver`
  cache, re-inject after `PushDesiredSubscriptions`, desired-set merge.
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` — post-connect bounded re-discovery + new message.
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs` — `MaterialiseDiscoveredNodes` receive.
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs` — `GeneralModelChangeEvent` emit on runtime add (verify
  existing helper).
- New: `CapturingAddressSpaceBuilder` + `DiscoveredNode` DTO (runtime), `EquipmentNodeIds` reuse for mapping.
- Tests under `tests/...Runtime.Tests` / `tests/...OpcUaServer.Tests` and a fake `ITagDiscovery` test double.

## Task tracking

Umbrella native task **#14** (FixedTree feature). Implementation tasks to be generated by writing-plans from this design.