From 1058542d808702030ee0f81af33bec2d5c03b063 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Fri, 26 Jun 2026 15:19:46 -0400 Subject: [PATCH] docs(otopcua): record FixedTree follow-ups A-E as implemented (design, plan, RESUME) --- ...ua-fixedtree-equipment-injection-RESUME.md | 110 ++++++ ...ua-fixedtree-equipment-injection-design.md | 15 +- ...6-26-otopcua-fixedtree-followups-design.md | 234 ++++++++++++ .../2026-06-26-otopcua-fixedtree-followups.md | 358 ++++++++++++++++++ ...-otopcua-fixedtree-followups.md.tasks.json | 23 ++ 5 files changed, 734 insertions(+), 6 deletions(-) create mode 100644 docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-RESUME.md create mode 100644 docs/plans/2026-06-26-otopcua-fixedtree-followups-design.md create mode 100644 docs/plans/2026-06-26-otopcua-fixedtree-followups.md create mode 100644 docs/plans/2026-06-26-otopcua-fixedtree-followups.md.tasks.json diff --git a/docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-RESUME.md b/docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-RESUME.md new file mode 100644 index 00000000..bb1fa52a --- /dev/null +++ b/docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-RESUME.md @@ -0,0 +1,110 @@ +# FixedTree → Equipment injection — RESUME / work-left handoff + +**Date:** 2026-06-26 +**Purpose:** survive a context compaction; let a fresh session continue without re-deriving state. + +--- + +## TL;DR + +The **FixedTree-under-Equipment dynamic-injection feature is BUILT, offline-complete, AND +✅ LIVE-VALIDATED on wonder (2026-06-26)** — 11 tasks, all reviewed, full offline suite green, final +integration review = ready to merge, and the real OPC injection confirmed on `wonder-app-vd03` (57 nodes +grafted under `EQ-3686c0272279`, all reading Good live values). It lives on a **local, unpushed** branch. +The only substantive thing left is the user's decision on push/PR/merge (§1). A few documented non-blocking +follow-ups remain (§3). + +## Git state (exact) + +- **Branch:** `feat/focas-fixedtree-equipment-injection` (in the main working dir `/Users/dohertj2/Desktop/OtOpcUa`, NOT a worktree). +- **Base:** branched off `fix/focas-poll-io-serialization` (the symptom-#1 data-plane fix — itself ahead of `master`, pushed to gitea with its own open PR, NOT merged). So this feature **stacks on an unmerged branch**. +- **Commits:** 14, range `da55c69`..`37cac5de` (10 task commits + 4 review-fix/docs commits). All **local — nothing pushed.** +- **User decision (2026-06-26):** finishing-a-development-branch → **"Keep as-is."** Do NOT push/merge/discard without an explicit new go-ahead. Standing rule: **commit/push only when asked.** +- **Untouched pre-existing working-tree edits** (leave alone; never stage): `CLAUDE.md`, `docker-dev/docker-compose.yml`, `pending.md`, `stillpending.md`, `docs/plans/2026-06-19-followups-batch.md.tasks.json`. +- This RESUME doc itself is currently **uncommitted** (a working artifact). + +## What the feature does + +Generic post-connect `ITagDiscovery` injection (NOT FOCAS-special-cased). On driver Connect: +`DriverInstanceActor` runs bounded re-discovery (Timers single-tick, generation-guarded, stop-on-stable + +attempt cap, re-kicks on reconnect) into a capturing `IAddressSpaceBuilder` → ships `DiscoveredNodesReady` +→ `DriverHostActor` resolves the equipment via authored `EquipmentTags`, maps the nodes under +`EQ-…/FOCAS/…` (read-only; single device-host folder collapsed) via `DiscoveredNodeMapper`, extends +`_nodeIdByDriverRef`, caches the plan, Tells `OpcUaPublishActor.MaterialiseDiscoveredNodes` → +`AddressSpaceApplier` → sink `EnsureFolder`/`EnsureVariable` + `RaiseNodesAddedModelChange` (NodeAdded), and +re-sends `SetDesiredSubscriptions(authored ∪ FixedTree refs)` so values flow through the existing +poll→push path. Survives redeploys (re-applied at the tail of `PushDesiredSubscriptions` from the cache) +and restarts (re-discovered on reconnect). + +## Verification (offline) — all green as of 2026-06-26 + +- `dotnet build ZB.MOM.WW.OtOpcUa.slnx` → **0 errors, 0 warnings** (TreatWarningsAsErrors on). +- `dotnet test … --filter "FullyQualifiedName~Runtime.Tests"` → **312 passed**. +- `dotnet test … --filter "FullyQualifiedName~OpcUaServer.Tests"` → **304 passed**. +- `dotnet test … --filter "FullyQualifiedName~FOCAS"` → **324 passed, 10 skipped** (the skips are live-wire integration tests needing the physical CNC — expected). +- Final integration review: **ready to merge** (3 non-blocking Minors — see Follow-ups). +- Known env limitation (not a failure): the net48 `Driver.Historian.Wonderware.Tests` can't run its testhost on macOS — run the **filtered** suites above, not a full-solution `dotnet test`. + +## Key files / anchors + +- Design: `docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-design.md` (status = Implemented; has the follow-ups). +- Plan + task journal: `docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection.md` (+ `.md.tasks.json`, all tasks completed). +- Investigation plan (symptom #2 marked BUILT): `docs/plans/2026-06-25-otopcua-equipment-dataplane-investigation.md`. +- Deployment doc (FixedTree section added): `docs/deployments/wonder-app-vd03-makino-z-34184.md`. +- New code: + - `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DiscoveredNode.cs`, `CapturingAddressSpaceBuilder.cs`, `DiscoveredNodeMapper.cs` + - `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/DiscoveredInjection.cs` (DTOs) + - modified: `DriverInstanceActor.cs`, `DriverHostActor.cs`, `OpcUaPublishActor.cs`, `AddressSpaceApplier.cs`, `OtOpcUaNodeManager.cs`, `IOpcUaAddressSpaceSink.cs` (+ `SdkAddressSpaceSink.cs`, `DeferredAddressSpaceSink.cs`) + - tests: `tests/Server/…Runtime.Tests/Drivers/{CapturingAddressSpaceBuilderTests,DiscoveredNodeMapperTests,DriverInstanceActorDiscoveryTests,DriverHostActorDiscoveryTests,DiscoveryInjectionEndToEndTests}.cs`, `…OpcUaServer.Tests/NodeManagerModelChangeOnAddTests.cs`, edits to `AddressSpaceApplierTests.cs`/`OpcUaPublishActorTests.cs`. +- Memory: `…/memory/wonder-otopcua-focas-and-akka-roles.md` (RESUME-ANCHOR bullet updated to record this feature; read it for the broader wonder/FOCAS context + box-access recipe). + +## WORK LEFT (prioritized) + +### 1. Decide the git endgame (user-gated) +Pick one, only on explicit user go-ahead: +- **Push + PR** — `git push -u origin feat/focas-fixedtree-equipment-injection`; PR base is `fix/focas-poll-io-serialization` (stacked) or `master` (will show both features' commits). gitea repo: `lmxopcua`. +- **Merge locally** into `fix/focas-poll-io-serialization` (folds both features onto one branch/PR). +- Keep waiting until after live validation (current state). + +### 2. Live wonder validation — ✅ DONE 2026-06-26 +**Validated live on `wonder-app-vd03`.** Built a full self-contained Host overlay from this branch @ +`37cac5de`, deployed to `E:\ApiInstall\OtOpcUa` (stop → backup `E:\ApiInstall\OtOpcUa_bak-20260626111416` +→ robocopy overlay preserving `appsettings*.json` + `pki\` → restart). Baseline before deploy: only +`parts-count`/`parts-required` under `EQ-3686c0272279`. After deploy + FOCAS reconnect: the host log +recorded `injected 57 discovered node(s) … under EQ-3686c0272279` / `materialised … (folders=14, vars=57)`, +no exceptions. CLI browse showed the full `FOCAS/` subtree (Identity/Axes X-Y-Z-B-C-AA+Actual/Spindle/ +Program/OperationMode/Timers), idempotent across repeats, device-host folder collapsed. Sample reads all +Good: `Identity/SeriesNumber=G431`, `CncType=31`, `AxisCount=7`, `Axes/X/AbsolutePosition=2801574` (live), +`OperationMode/ModeText=TJOG`; authored tags still Good (no regression). `/healthz` 200 Healthy throughout. +Result recorded in `docs/deployments/wonder-app-vd03-makino-z-34184.md`. **The substantive remaining work +is now the git endgame (§1) only.** Original recipe retained below for reference: + +The offline e2e asserts the recording-sink contract, NOT the real `OtOpcUaNodeManager` seed→overwrite at +the OPC node layer. Live validation closes that gap. Recipe (mirrors the symptom-#1 deploy): +1. Build the current Host self-contained: `dotnet publish src/…/ZB.MOM.WW.OtOpcUa.Host…csproj -c Release -r win-x64 --self-contained true -p:PublishSingleFile=false`. **Must be a full self-contained publish-overlay, NOT a DLL swap** — the box is self-contained (DLL swaps crashed: FileNotFound / "Could not resolve CoreCLR path"). Note: deploying the current Host already happened for symptom #1; if the box is at the symptom-#1 build, this feature's DLLs (Runtime + OpcUaServer + Commons + the new Runtime/Drivers files) must be included in the overlay — so a fresh full overlay from THIS branch is the safe path. +2. Box access: servecli `:2222`, key `~/.ssh/servecli_wonder`, user `dohertj2`; drive via `scratchpad/wonder-ps.sh` (base64 PS over cmd PTY); SFTP root `C:\Users\dohertj2\Desktop\win64`. Service `OtOpcUaHost`. Overlay onto `E:\ApiInstall\OtOpcUa` **preserving `pki\` + `appsettings*.json` + `data\`**; back up first; auto-rollback if unhealthy. +3. Restart `OtOpcUaHost`; confirm member Up w/ ADMIN+DRIVER (roles env already set), `/healthz` Healthy, OPC `:4840` listening. +4. The FOCAS driver connects → ~0–2 s later FixedTree populates → injection fires. Validate via the OtOpcUa CLI (`src/Client/…Client.CLI`) against `opc.tcp://wonder-app-vd03.zmr.zimmer.com:4840/OtOpcUa` (Security None, anonymous): + - `browse --recursive` → expect a `FOCAS` subfolder under `ns=2;s=EQ-3686c0272279` with `Identity/`, `Axes/`, etc. + - `read ns=2;s=EQ-3686c0272279/FOCAS/Identity/SeriesNumber` → expect Good (a real string). + - `read ns=2;s=EQ-3686c0272279/FOCAS/Axes/X/AbsolutePosition` → expect Good (value may be 0 on idle machine — assert STATUS, not magnitude). + - The authored `parts-count`/`parts-required` should remain Good (symptom #1 fix). +5. If a value reads Bad, the symptom-#1 self-healing applies (recoverable `BadCommunicationError`, observable in Serilog at `C:\Windows\System32\logs\otopcua-.log`). The Akka→Serilog bridge (from symptom #1) makes `DriverHost`/`DriverInstance`/discovery logs visible. + +### 3. Non-blocking follow-ups +**✅ ALL FIXEDTREE FOLLOW-UPS (A–E) IMPLEMENTED 2026-06-26** — design+plan +`2026-06-26-otopcua-fixedtree-followups{-design,}.md`; 16 commits `c2c368dc`..`0074f37a` on this branch +(every task spec+code reviewed; offline suites green). Resolved: +- ✅ Config-unchanged rebind now re-triggers discovery (`TriggerRediscovery`) — follow-up C. +- ✅ Multi-device-per-driver implemented via `EquipmentNode.DeviceHost` partition; ≥1-authored-tag requirement lifted (driver-binding resolution) — follow-up E (projection-only, no migration / no artifact wire change). +- ✅ Per-(re)connect re-discovery policy-gated (`ITagDiscovery.RediscoverPolicy` UntilStable/Once/Never; synchronous drivers → Once) — follow-up B. +- ✅ Double `SetDesiredSubscriptions` per redeploy de-duped (one send per driver) — follow-up D. +- ✅ Per-pass `DiscoverAsync` timeout made injectable — follow-up A. + +**Still open (out of scope for the FixedTree follow-ups — separate cross-cutting work):** +- Cross-cutting (from symptom #1, all 3 apps): shared `AddZbSerilog` doesn't set the static `Serilog.Log.Logger`; AdminUI persists FOCAS config in formats (series-as-number, scheme-less host) the driver only now tolerates — reconcile at the AdminUI source. + +## Context that's easy to lose +- 3 real defects were caught + fixed by the review chain during the build: `DriverDataType.ToString()` ≠ OPC type string (`Float64`→`"Double"`); `Server.ReportEvent` under the node `Lock` (deadlock); `ConfigureAwait(false)` in the discovery handler (off-actor-context crash for async drivers like Galaxy sharing the node). All have regression tests. +- The plan's Task-3 instruction "keep ReportEvent inside lock" was itself a defect; the plan doc was corrected. +- The execution used subagent-driven-development (fresh implementer per task + spec/code reviews; high-risk tasks got Opus reviews, serial). Single-writer discipline was enforced (no concurrent `dotnet` builds → no obj/bin or git-index races). diff --git a/docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-design.md b/docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-design.md index 7bccce5f..2927476b 100644 --- a/docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-design.md +++ b/docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-design.md @@ -3,11 +3,14 @@ **Date:** 2026-06-26 **Status:** ✅ Implemented (2026-06-26) — 11 tasks, offline-complete on branch `feat/focas-fixedtree-equipment-injection` (solution build 0 errors / 0 warnings; Runtime.Tests 312, OpcUaServer.Tests 304, FOCAS 247 + an end-to-end injection+value-flow test, all green). Live wonder validation pending. -**Follow-ups surfaced during the review chain (not blocking):** -- Config-unchanged driver→equipment **rebind** drops the cached plan but does not itself re-trigger discovery (`ReconcileDrivers` only restarts a child on a `DriverConfig` change) → the FixedTree subtree is absent under the new equipment until the driver's next reconnect/restart re-discovers it. -- **Multi-device-per-driver** equipment mapping is deferred (today strictly 1:1; the equipment is resolved from authored `EquipmentTags`, so a driver needs ≥1 authored tag for its FixedTree to graft — `EquipmentNode` carries no `DriverInstanceId`). -- Per-(re)connect re-discovery runs for **every `ITagDiscovery` driver** (Galaxy/OpcUaClient/TwinCAT too), bounded by stop-on-stable + an attempt cap; narrowing/opt-in for heavy network drivers is a follow-up. -- The end-to-end test asserts the recording-sink contract, not the real `OtOpcUaNodeManager` `BadWaitingForInitialData`→Good seed-overwrite at the OPC node layer — that is covered by the live wonder deploy. +**Follow-ups surfaced during the review chain — ✅ ALL RESOLVED 2026-06-26** (design +[`2026-06-26-otopcua-fixedtree-followups-design.md`](2026-06-26-otopcua-fixedtree-followups-design.md), +plan [`2026-06-26-otopcua-fixedtree-followups.md`](2026-06-26-otopcua-fixedtree-followups.md); +16 commits `c2c368dc`..`0074f37a` on this branch, every task spec+code reviewed; full offline suite green): +- ✅ Config-unchanged driver→equipment **rebind** now **re-triggers discovery** (follow-up C): the redeploy re-inject tail drops the stale plan AND `Tell`s the driver child a new `DriverInstanceActor.TriggerRediscovery` (a discovery action — not lifecycle control — idempotent, child no-ops if not Connected), so the FixedTree re-grafts under the new equipment on the next pass instead of waiting for the next natural reconnect. +- ✅ **Multi-device-per-driver** mapping **implemented** (follow-up E): `EquipmentNode` now carries `DriverInstanceId`/`DeviceId`/`DeviceHost` (projection-only — the columns + the `Devices` array were already in the artifact, no DB migration / no wire change), so equipment resolves via the driver binding **without** authored tags (≥1-tag requirement lifted), and a driver bound to multiple devices partitions its discovered tree by normalized device-host folder, grafting each device's subtree under the equipment whose `DeviceHost` matches (unmatched hosts warn-skip, never mis-graft). +- ✅ Per-(re)connect re-discovery is now **policy-gated** (follow-up B): `ITagDiscovery.RediscoverPolicy` (`UntilStable`/`Once`/`Never`, default `UntilStable`) — FOCAS stays `UntilStable` (its FixedTree cache fills asynchronously after connect); the synchronous-discovery drivers (OpcUaClient/TwinCAT/AbCip/AbLegacy/Modbus/S7/Galaxy) are `Once`, dropping the wasteful 15× retry. The hardcoded 30 s per-pass discovery timeout is now injectable too (follow-up A). +- ✅ The OPC-node-layer seed→serve gap (recording-sink-only e2e) was closed by the **live wonder deploy** of the base feature (validated 2026-06-26; see the deployment record). **Companion to:** [`2026-06-25-otopcua-equipment-dataplane-investigation.md`](2026-06-25-otopcua-equipment-dataplane-investigation.md) (symptom #1 — live FOCAS values — FIXED + deployed; this design addresses **symptom #2**). **Base branch:** `fix/focas-poll-io-serialization` (this feature builds on the now-deployed driver-host bootstrap re-spawn + FOCAS I/O fixes; that branch is ahead of `master` and not yet merged). @@ -59,7 +62,7 @@ Read-only value nodes carrying live values (e.g. `EQ-…/FOCAS/Axes/X/AbsolutePo | Tree placement | **Under a driver-named subfolder** — `EQ-…/FOCAS/…` (collision-safe vs. authored tags; self-describing). | | Device-host folder | **Collapse** the single device-host level → `EQ-…/FOCAS/Identity/…` (not `EQ-…/FOCAS/10.201.31.5:8193/Identity/…`), valid because today's deployment is strictly 1:1 driver↔equipment↔device. | | Model-change notification | **Emit `GeneralModelChangeEvent`** after a runtime add so already-connected OPC UA clients can refresh their browse. | -| Multi-device-per-driver | **Deferred** (documented follow-up) — today is 1:1. | +| Multi-device-per-driver | **Deferred** at base-feature time; ✅ **implemented as follow-up E** (2026-06-26) — `EquipmentNode.DeviceHost` partition. | | Discovered alarms | **Out of scope** — this feature surfaces value nodes only; alarms continue to come via the config path. | | Writable discovered nodes | **Out of scope** — FixedTree is read-only CNC state. | diff --git a/docs/plans/2026-06-26-otopcua-fixedtree-followups-design.md b/docs/plans/2026-06-26-otopcua-fixedtree-followups-design.md new file mode 100644 index 00000000..4185dbce --- /dev/null +++ b/docs/plans/2026-06-26-otopcua-fixedtree-followups-design.md @@ -0,0 +1,234 @@ +# OtOpcUa — FixedTree-injection follow-ups (design) + +**Date:** 2026-06-26 +**Status:** ✅ Implemented (2026-06-26) — all five follow-ups (A–E) built via subagent-driven development +(16 commits `c2c368dc`..`0074f37a` on `feat/focas-fixedtree-equipment-injection`; every task spec+code +reviewed, high-risk tasks with serial Opus reviews). Offline suites green: Runtime.Tests 331, OpcUaServer.Tests 319, +FOCAS 248 + AbLegacy/composer additions; `dotnet build` 0 errors, production `src/` 0 warnings (TreatWarningsAsErrors). +No DB migration and no deployment-artifact wire-format change were needed (E is projection-only — the columns + +`Devices` array were already serialized). Live wonder re-validation of the single-device FOCAS path is optional/user-gated +(the base feature's live path is unchanged by these follow-ups). +**Companion to:** [`2026-06-26-otopcua-fixedtree-equipment-injection-design.md`](2026-06-26-otopcua-fixedtree-equipment-injection-design.md) +(the base feature — ✅ built + live-validated on `wonder-app-vd03`). This design works through the +follow-ups that feature's review chain surfaced. +**Branch:** continue on `feat/focas-fixedtree-equipment-injection` (stacked on `fix/focas-poll-io-serialization`, +local/unpushed — standing rule is "commit/push only when asked"). + +--- + +## Scope (locked with the user 2026-06-26) + +The user selected **all five** items below. The base feature's explicit non-goals — discovered-**alarm** +injection and **writable** discovered nodes — remain out of scope (locked design decisions, untouched). + +| # | Follow-up | Size | Notes | +|---|---|---|---| +| A | Hardcoded 30 s discovery timeout → injectable | trivial | behavior-preserving | +| B | Re-discovery opt-in/policy gate per driver | moderate | back-compat default | +| C | Config-unchanged driver→equipment rebind re-triggers discovery | moderate | reverses a deliberate `won't-fix` | +| D | De-dup the double `SetDesiredSubscriptions` during redeploy | small | one extra unsub/resub blip today | +| E | Lift the ≥1-authored-tag requirement + multi-device-per-driver | largest | **projection-only, no DB migration** | + +## Key discovery that shapes E + +The Config-DB **already** models the equipment→driver(→device) association as first-class data — no schema +change is needed for E: + +- `Equipment.DriverInstanceId` (`string?`, made nullable by migration `20260608104706_NullableEquipmentDriverInstanceId`) +- `Equipment.DeviceId` (`string?`) — FK to a multi-device driver's device +- `Device` is a first-class entity (`DeviceId`, `DriverInstanceId`, schemaless `DeviceConfig` JSON with host) +- The AdminUI equipment editor already exposes an optional driver pick (`EquipmentInput.DriverInstanceId`) + +The **only** gap is the runtime projection: `EquipmentNode` is `(EquipmentId, DisplayName, UnsLineId)` and drops +`DriverInstanceId`/`DeviceId`, so the injector (`DriverHostActor.HandleDiscoveredNodes`) can only resolve the +equipment by inferring it from authored `EquipmentTags` — hence the ≥1-tag requirement. E closes that gap in the +projection + resolver, not the schema. + +--- + +## A. Discovery timeout → injectable + +`DriverInstanceActor.HandleRediscoverAsync` hardcodes `new CancellationTokenSource(TimeSpan.FromSeconds(30))` +while the rediscover interval + attempt-cap are already constructor parameters. Add a +`rediscoverDiscoverTimeout` (`TimeSpan`, default `TimeSpan.FromSeconds(30)`) to the ctor and the `Props` +factory; use the field instead of the literal. Pure consistency fix; default preserves behavior. + +## B. Re-discovery opt-in / policy gate + +**Problem:** `StartDiscovery()` runs the bounded retry loop for **every** `ITagDiscovery` driver on every +(re)connect. FOCAS needs it (its `FixedTreeCache` fills ~0–2 s *after* connect, so a single early pass would +capture an empty/partial tree). A driver that browses its full shape **synchronously** inside `DiscoverAsync` +(OpcUaClient, TwinCAT, AB) needs at most **one** pass — the 15×2 s retry is wasted (potentially heavy) network I/O. + +**Decision:** a per-driver **policy**, declared in code (driver "heaviness" is a property of the driver *type*, +needs no DB/AdminUI plumbing, lowest risk). Add a default-implemented member to `ITagDiscovery`: + +```csharp +public enum DiscoveryRediscoverPolicy { UntilStable, Once, Never } + +public interface ITagDiscovery +{ + DiscoveryRediscoverPolicy RediscoverPolicy => DiscoveryRediscoverPolicy.UntilStable; // default = today's behavior + Task DiscoverAsync(IAddressSpaceBuilder builder, CancellationToken cancellationToken); +} +``` + +- **`UntilStable`** (default, unchanged) — today's loop: retry every `_rediscoverInterval` up to + `_rediscoverMaxAttempts` or until the captured signature is non-empty and stable. +- **`Once`** — kick exactly one discovery pass on connect, emit one `DiscoveredNodesReady`, then stop. +- **`Never`** — no post-connect discovery kick at all. + +**Driver assignments:** +- **FOCAS** → `UntilStable` (explicit; it genuinely needs the retry). +- **OpcUaClient, TwinCAT, AbCip, AbLegacy** → `Once` (they discover synchronously in `DiscoverAsync`; one pass + on connect injects their tree, the retry loop only added cost). Any driver not overriding the default keeps + `UntilStable`, so this is a strict no-regression change. + +**Mechanism:** `DriverInstanceActor.StartDiscovery` reads `((ITagDiscovery)_driver).RediscoverPolicy`. `Never` +→ return without scheduling. `Once`/`UntilStable` → schedule the first `RediscoverTick`; `HandleRediscoverAsync` +stops after the first pass when the policy is `Once` (instead of evaluating stop-on-stable). + +**Alternative considered (rejected):** a per-instance JSON flag parsed by the host + AdminUI. More flexible but +adds artifact/AdminUI plumbing for a knob whose correct value is type-uniform. + +## C. Config-unchanged rebind re-triggers discovery + +**Problem:** when a redeploy rebinds a driver to a new equipment **without** a `DriverConfig` change, +`PushDesiredSubscriptions`' re-inject tail correctly **drops** the stale cached plan (a stale `EQ-1`-scoped graft +under `EQ-2` would be worse), but `ReconcileDrivers` only restarts a child on a `DriverConfig` change — so a +config-unchanged child is never reconnected and the FixedTree stays absent under the new equipment until the +driver's next natural reconnect/restart. + +The base feature deliberately did **not** add a re-trigger here, to avoid coupling the subscription pass to +driver-lifecycle control. This follow-up reverses that — but cleanly, because the trigger is a **discovery** +action, not lifecycle control (no stop/restart), and it is idempotent. + +**Decision:** add a `DriverInstanceActor.TriggerRediscovery` message. In the re-inject tail, the two branches +that `Remove` a cached plan because of a rebind/loss also `Tell` that driver's child `TriggerRediscovery`. The +child kicks a fresh `RediscoverTick` (current `_initGeneration`) **iff it is in the `Connected` state**; +otherwise it no-ops (its eventual reconnect re-discovers anyway). The discovery pass re-emits +`DiscoveredNodesReady`, which resolves against the **new** composition (`_lastComposition`) and grafts under the +new equipment. The re-trigger honors B's policy (`Never` drivers do not re-discover; `Once`/`UntilStable` run +their normal pass(es)). + +Update the inline comment at the drop site and the follow-up note in the base design doc to record the new +behavior. + +## D. De-dup the double `SetDesiredSubscriptions` + +**Problem:** during an in-process redeploy, a cached driver receives two `SetDesiredSubscriptions`: +the bulk authored-only send in `PushDesiredSubscriptions`, then the authored∪discovered union from +`ApplyDiscoveredPlan` (the re-inject tail). The first send forces the child to unsubscribe the whole handle +(authored tags included) then the second re-subscribes — one extra blip per cached driver per redeploy. + +**Decision:** in the bulk loop, **skip** the send for any driver that has a `_discoveredByDriver` entry — the +re-inject tail sends their complete union. **Critical fallback:** the re-inject tail can still *drop* a cached +plan (rebind/loss, see C); when it does, it must send the **authored-only** set for that driver so its authored +subscriptions are not lost. Net invariant: every driver receives exactly **one** `SetDesiredSubscriptions` per +redeploy. + +## E. Lift the ≥1-authored-tag requirement + multi-device + +No DB migration — projection + resolver only. + +**E1 — projection (`AddressSpaceComposer`).** Extend `EquipmentNode`: + +```csharp +public sealed record EquipmentNode( + string EquipmentId, + string DisplayName, + string UnsLineId, + string? DriverInstanceId = null, // from Equipment.DriverInstanceId + string? DeviceId = null, // from Equipment.DeviceId + string? DeviceHost = null); // resolved at projection time: Equipment.DeviceId -> Device.DeviceConfig host +``` + +`DriverInstanceId`/`DeviceId` are copied straight off the `Equipment` row. `DeviceHost` is resolved by joining +`Equipment.DeviceId → Device` and parsing the host out of that `Device`'s schemaless `DeviceConfig` JSON, so the +resolver can match it against a discovered device-host folder without re-reading the DB. All three are nullable; +existing single-equipment behavior is unaffected when they're null. + +**E2 — resolver (`DriverHostActor.HandleDiscoveredNodes`, and the redeploy re-inject tail).** Replace tag-only +equipment resolution with: + +> **candidates** = { equipments where `EquipmentNode.DriverInstanceId == driverId` } **∪** +> { equipments inferred from authored `EquipmentTags` for `driverId` } (keeps today's path working) + +- **0 candidates** → log Info, skip (unchanged). +- **1 candidate** → graft all discovered nodes under it (today's single-device behavior, now also works with + **zero** authored tags because the equipment-level `DriverInstanceId` resolves it). The device-host folder is + still collapsed (single device). +- **>1 candidates (multi-device)** → partition discovered nodes by their device-host folder segment + (`DiscoveredNode.FolderPathSegments[1]`) and graft each device's subtree under the equipment whose + `DeviceHost` matches that segment. A device-host with no matching equipment is **warn-skipped** (its subtree is + not grafted) rather than mis-grafted. The mapper's existing device-host collapse already disables itself when + ≥2 distinct device-host segments are present, so multi-device paths retain the device-host level and don't + collide. + +**⚠️ Implementation risk (E2 multi-device only):** the partition join is a **host string** match — the driver's +emitted device-host folder segment (FOCAS uses `device.HostAddress`) must equal the equipment's projected +`DeviceHost` (parsed from `DeviceConfig`). Both ultimately derive from the same device configuration, but the +string forms must be normalized to match (e.g. `host:port`). The warn-skip fallback makes a mismatch *safe* (no +mis-graft, authored tags + single-device paths unaffected); a normalization helper + a unit test pin the formats +together. Single-device deployments (the validated FOCAS `z-34184` case) take the "1 candidate" path and are +**not** exposed to this risk. + +--- + +## Data flow (unchanged) + +E changes only *which equipment* a discovered node is grafted under and *whether* an equipment with no authored +tags participates. Once the NodeId is assigned, the materialize → subscribe → poll → push value path is exactly +the base feature's path; B/C/D change *when/how often* discovery runs and *how many* subscription pushes occur, +not the value path. + +## Error handling + +- B `Never` driver → no discovery, authored tags unaffected. +- C re-trigger on a non-`Connected` child → no-op (safe; reconnect re-discovers). +- D dropped-plan fallback → authored-only send, so a rebind/loss never strands a driver's authored subscriptions. +- E multi-device unmatched device-host → Warning + skip that device's subtree; other devices + authored tags + unaffected. >1 candidate with no `DeviceHost` data anywhere → falls back to the base feature's warn+skip + (no regression). + +## Testing + +- **A:** ctor/`Props` wires the timeout; default is 30 s (assert via a short injected timeout in an existing + rediscover test). +- **B:** `Never` → no `DiscoveredNodesReady`; `Once` → exactly one even when the captured set would keep growing; + `UntilStable` → today's loop (regression). FOCAS reports `UntilStable`; the four network drivers report `Once`. +- **C:** rebind drop branch `Tell`s `TriggerRediscovery`; `Connected` child re-discovers and re-emits; + non-`Connected` child no-ops; re-trigger respects a `Never` policy. +- **D:** single-send invariant — a cached driver gets exactly one `SetDesiredSubscriptions` on redeploy + (union when applied; authored-only when the plan is dropped). +- **E1:** `EquipmentNode` projection carries `DriverInstanceId`/`DeviceId`/`DeviceHost`; `DeviceHost` resolves via + the `Device` join + `DeviceConfig` host parse; nulls when unset. +- **E2:** tag-less graft (driver-level link, 0 authored tags); single-candidate unchanged (collapse retained); + multi-device partition maps each device-host to the right equipment; unmatched device-host → warn-skip; + host-string normalization. +- **Regression:** Runtime.Tests, OpcUaServer.Tests, and the FOCAS suite stay green; the validated single-device + FOCAS injection path is unchanged. +- Live wonder re-validation of the single-device FOCAS path is **optional** and user-gated (the base feature is + already live-validated; these follow-ups don't alter that path's runtime behavior). + +## Scope / non-goals + +- **In:** A–E above. +- **Out (still locked):** discovered-**alarm** injection; **writable** discovered nodes. + +## Touched code (anticipated) + +- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ITagDiscovery.cs` — `DiscoveryRediscoverPolicy` enum + default member. +- Driver classes (`FocasDriver`, `OpcUaClientDriver`, `TwinCATDriver`, `AbCipDriver`, `AbLegacyDriver`) — override + `RediscoverPolicy`. +- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` — injectable timeout (A); policy-gated + `StartDiscovery`/`HandleRediscoverAsync` (B); `TriggerRediscovery` message (C). +- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` — resolver union + multi-device partition (E2); + re-trigger on rebind drop (C); bulk-send skip + dropped-plan fallback (D). +- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/AddressSpaceComposer.cs` — `EquipmentNode` projection (E1). +- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DiscoveredNodeMapper.cs` — multi-device partition support (E2). +- Tests under `tests/.../Runtime.Tests` and `tests/.../OpcUaServer.Tests`. + +## Task tracking + +Implementation tasks to be generated by writing-plans from this design. diff --git a/docs/plans/2026-06-26-otopcua-fixedtree-followups.md b/docs/plans/2026-06-26-otopcua-fixedtree-followups.md new file mode 100644 index 00000000..9a4aaff5 --- /dev/null +++ b/docs/plans/2026-06-26-otopcua-fixedtree-followups.md @@ -0,0 +1,358 @@ +# FixedTree-injection follow-ups — Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development (or executing-plans) to implement this plan task-by-task. + +**Goal:** Implement the five approved follow-ups to the FixedTree-under-Equipment dynamic-injection feature: (A) injectable discovery timeout, (B) per-driver re-discovery policy gate, (C) re-trigger discovery on a config-unchanged rebind, (D) de-dup the double `SetDesiredSubscriptions`, and (E) lift the ≥1-authored-tag requirement + support multi-device-per-driver. + +**Architecture:** Akka.NET actor pipeline. `DriverInstanceActor` runs post-connect discovery and publishes `DiscoveredNodesReady`; `DriverHostActor` resolves the bound equipment, maps discovered nodes via `DiscoveredNodeMapper`, caches a plan, materialises via `OpcUaPublishActor`, and merges subscription refs. Composition is built by `AddressSpaceComposer.Compose` (pure, from entities) and mirrored by `DeploymentArtifact` (decode, from the sealed JSON artifact) — the two MUST stay byte-parity-equal. The deployment artifact already serialises full `Equipment` + `Device` entities, so E needs **no DB migration and no artifact wire-format change** — only decode/projection reads. + +**Tech Stack:** .NET 10, C# (default interface members, collection expressions), Akka.NET, xUnit. Build: `dotnet build ZB.MOM.WW.OtOpcUa.slnx` (TreatWarningsAsErrors). Test (macOS — run filtered, NOT full-solution; the net48 Wonderware testhost can't run on macOS): +- `dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~Runtime.Tests"` +- `dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~OpcUaServer.Tests"` +- `dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~FOCAS"` + +**Design:** [`2026-06-26-otopcua-fixedtree-followups-design.md`](2026-06-26-otopcua-fixedtree-followups-design.md). Branch: `feat/focas-fixedtree-equipment-injection` (continue on it; commit per task; do NOT push/merge — standing rule). + +**Out of scope (locked):** discovered-alarm injection; writable discovered nodes. + +--- + +## Execution order & parallelism + +Two files are each touched by multiple tasks and MUST be edited serially: +- `DriverInstanceActor.cs`: **Task 1 → Task 3 → Task 4** +- `DriverHostActor.cs`: **Task 6 → Task 7 → Task 8 → Task 9** + +Independent file sets that can run concurrently with the above: **Task 2** (`ITagDiscovery` + 5 driver files) and **Task 5** (`AddressSpaceComposer.cs` + `DeploymentArtifact.cs`). + +Dependency summary: T3 ⟵{T1,T2}; T4 ⟵T3; T6 ⟵T5; T7 ⟵{T4,T6}; T8 ⟵T7; T9 ⟵{T5,T8}; T10 ⟵T9; T11 ⟵{T2,T4,T9,T10}. + +--- + +### Task 1: Injectable discovery timeout (follow-up A) + +**Classification:** small +**Estimated implement time:** ~3 min +**Parallelizable with:** Task 2, Task 5 + +**Files:** +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` (ctor ~244-259, `Props` ~195-210, fields ~133-137, `HandleRediscoverAsync` ~765) +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorDiscoveryTests.cs` + +**Context:** `HandleRediscoverAsync` hardcodes `using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));` (line 765). The rediscover interval + attempt-cap are already ctor params (`_rediscoverInterval`, `_rediscoverMaxAttempts`). Add a sibling param for the per-pass discovery timeout, default-preserving. + +**Step 1 — Failing test:** add a test asserting that when constructed with a very short discovery timeout and an `ITagDiscovery` whose `DiscoverAsync` blocks, the pass cancels by the injected timeout (e.g. `DiscoveredNodesReady` carries an empty set within the short window) rather than waiting 30 s. Reuse the existing fake `ITagDiscovery` driver in this test file (search it for the existing discovery-actor fake; mirror that pattern). If a fully deterministic timeout test is too flaky, instead assert the wiring: a new public `DefaultRediscoverDiscoverTimeout` constant exists and equals 30 s, and the ctor/`Props` accept the param. + +**Step 2 — Verify it fails:** `dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~DriverInstanceActorDiscoveryTests"` → fails to compile / fails assertion. + +**Step 3 — Implement:** +- Add `public static readonly TimeSpan DefaultRediscoverDiscoverTimeout = TimeSpan.FromSeconds(30);` next to the other discovery defaults (~line 36-39). +- Add a field `private readonly TimeSpan _rediscoverDiscoverTimeout;` (~133-137). +- Add ctor param `TimeSpan? rediscoverDiscoverTimeout = null` (after `rediscoverMaxAttempts`); assign `_rediscoverDiscoverTimeout = rediscoverDiscoverTimeout ?? DefaultRediscoverDiscoverTimeout;`. +- Add the matching optional param to `Props` and forward it. +- In `HandleRediscoverAsync`, replace `TimeSpan.FromSeconds(30)` with `_rediscoverDiscoverTimeout`. + +**Step 4 — Verify:** test passes; `dotnet build ZB.MOM.WW.OtOpcUa.slnx` → 0 warnings. + +**Step 5 — Commit:** `git commit -m "feat(otopcua): make FixedTree re-discovery per-pass timeout injectable (follow-up A)"` + +--- + +### Task 2: Re-discovery policy enum + ITagDiscovery member + driver overrides (follow-up B, part 1) + +**Classification:** standard +**Estimated implement time:** ~5 min +**Parallelizable with:** Task 1, Task 5 + +**Files:** +- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ITagDiscovery.cs` +- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasDriver.cs` +- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriver.cs` +- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs` +- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipDriver.cs` +- Modify: `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs` +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorDiscoveryTests.cs` (or a small new test next to the FOCAS driver tests asserting `FocasDriver` reports `UntilStable`) + +**Context:** `ITagDiscovery` (Core.Abstractions) currently has only `DiscoverAsync`. Add a policy the actor (Task 3) honors. Default = today's behavior so any non-overriding driver is unchanged. + +**Step 1 — Failing test:** assert `new FocasDriver(...).RediscoverPolicy == DiscoveryRediscoverPolicy.UntilStable` and that one network driver (e.g. `OpcUaClientDriver`) reports `Once`. (Construct via the simplest available ctor/fake; if drivers are hard to construct standalone, assert the enum + default member exist and compile, plus a focused test on FOCAS.) + +**Step 2 — Verify it fails:** compile failure (enum/member absent). + +**Step 3 — Implement:** +- In `ITagDiscovery.cs`, add the enum + a default-implemented member: +```csharp +/// How aggressively the host re-runs post-connect discovery for this driver. +public enum DiscoveryRediscoverPolicy +{ + /// Retry every interval up to the cap or until the captured set is non-empty and stable + /// (for drivers whose discovered shape fills in asynchronously after connect, e.g. FOCAS FixedTree). + UntilStable, + /// Run exactly one discovery pass on connect (drivers that discover synchronously in DiscoverAsync). + Once, + /// Never run post-connect discovery. + Never, +} + +public interface ITagDiscovery +{ + /// Post-connect re-discovery policy. Default preserves the original retry-until-stable behavior. + DiscoveryRediscoverPolicy RediscoverPolicy => DiscoveryRediscoverPolicy.UntilStable; + + Task DiscoverAsync(IAddressSpaceBuilder builder, CancellationToken cancellationToken); +} +``` +- `FocasDriver`: add `public DiscoveryRediscoverPolicy RediscoverPolicy => DiscoveryRediscoverPolicy.UntilStable;` (explicit — it genuinely needs the retry loop). +- `OpcUaClientDriver`, `TwinCATDriver`, `AbCipDriver`, `AbLegacyDriver`: add `public DiscoveryRediscoverPolicy RediscoverPolicy => DiscoveryRediscoverPolicy.Once;` — these discover synchronously inside `DiscoverAsync`, so one pass on connect suffices; the 15× retry was wasted (potentially heavy) work. **Before setting `Once`, confirm each driver's `DiscoverAsync` returns its complete set synchronously** (read each `DiscoverAsync`); if any populates a cache asynchronously after connect like FOCAS, leave it `UntilStable` and note why in a comment. + +**Step 4 — Verify:** test passes; build 0 warnings. + +**Step 5 — Commit:** `git commit -m "feat(otopcua): add ITagDiscovery.RediscoverPolicy + per-driver assignments (follow-up B)"` + +--- + +### Task 3: DriverInstanceActor honors RediscoverPolicy (follow-up B, part 2) + +**Classification:** standard +**Estimated implement time:** ~5 min +**Parallelizable with:** none (serial after Task 1 on the same file; needs Task 2's enum) + +**Files:** +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` (`StartDiscovery` ~736-740, `HandleRediscoverAsync` ~754-795) +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorDiscoveryTests.cs` + +**Context:** `StartDiscovery()` currently kicks the loop for every `ITagDiscovery` driver. `HandleRediscoverAsync` schedules the next tick unless stable/capped. Gate both on the driver's `RediscoverPolicy`. + +**Step 1 — Failing tests (3):** +1. A fake `ITagDiscovery` driver reporting `Never` → no `DiscoveredNodesReady` is ever published after connect. +2. A fake reporting `Once` whose captured set would keep GROWING across passes → exactly ONE `DiscoveredNodesReady` and no further tick scheduled. +3. A fake reporting `UntilStable` → existing behavior (retries until stable/cap) — keep/extend the current passing test. + +**Step 2 — Verify they fail:** the `Never`/`Once` tests fail (today everything retries-until-stable). + +**Step 3 — Implement:** +- In `StartDiscovery()`: after the `if (_driver is not ITagDiscovery discovery) return;` guard, read the policy; `if (discovery.RediscoverPolicy == DiscoveryRediscoverPolicy.Never) return;` before scheduling the first `RediscoverTick`. +- In `HandleRediscoverAsync`: after publishing `DiscoveredNodesReady`, when the policy is `Once`, do NOT schedule another tick (log Debug "policy=Once, single pass" and return). When `UntilStable`, keep today's stop-on-stable + cap logic. (Read the live policy via `((ITagDiscovery)_driver).RediscoverPolicy`.) +- Keep the generation guard intact. + +**Step 4 — Verify:** the 3 tests pass; the full `DriverInstanceActorDiscoveryTests` + `Runtime.Tests` suite stays green; build 0 warnings. + +**Step 5 — Commit:** `git commit -m "feat(otopcua): DriverInstanceActor honors RediscoverPolicy (Never/Once/UntilStable) (follow-up B)"` + +--- + +### Task 4: TriggerRediscovery message + handler (follow-up C, part 1) + +**Classification:** standard +**Estimated implement time:** ~4 min +**Parallelizable with:** none (serial after Task 3 on the same file) + +**Files:** +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` (message decls near `RediscoverTick` ~110-115; add a `Connected`-state receive) +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverInstanceActorDiscoveryTests.cs` + +**Context:** Task 7 (`DriverHostActor`) will `Tell` a driver child to re-run discovery after a rebind. The child must accept that message and only act when `Connected`. + +**Step 1 — Failing tests (2):** +1. Send `TriggerRediscovery` to an actor whose driver is `Connected` → it runs a discovery pass and publishes `DiscoveredNodesReady` (respecting policy: a `Never` driver does NOT). +2. Send `TriggerRediscovery` before connect / while not `Connected` → no `DiscoveredNodesReady`, no crash (no-op). + +**Step 2 — Verify they fail:** message type doesn't exist. + +**Step 3 — Implement:** +- Add `public sealed record TriggerRediscovery();` near the other public messages. +- In the `Connected` state, add a receive for `TriggerRediscovery` that calls `StartDiscovery()` (which already honors policy + the `ITagDiscovery` guard, and uses the current `_initGeneration`). +- In other states, either don't register the receive (so it's unhandled = no-op) or register a no-op. Prefer registering only in `Connected` so a non-connected child silently ignores it (verify the actor's state-machine style — match how other state-scoped messages are handled). Ensure no `Unhandled`-logging noise; if the actor logs unhandled messages, add an explicit ignore in the relevant states. + +**Step 4 — Verify:** both tests pass; suite green; build 0 warnings. + +**Step 5 — Commit:** `git commit -m "feat(otopcua): DriverInstanceActor.TriggerRediscovery message (follow-up C)"` + +--- + +### Task 5: EquipmentNode carries DriverInstanceId/DeviceId/DeviceHost (follow-up E, projection) + +**Classification:** high-risk +**Estimated implement time:** ~5 min +**Parallelizable with:** Task 1, Task 2 + +**Files:** +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/AddressSpaceComposer.cs` (`EquipmentNode` record line 61; projection ~326-332; `Compose` signatures ~281-312) +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs` (`ReadEquipmentNode` ~810-820; the equipment decode call ~204; `Empty()` ~362-367; add a `Devices`-array → `DeviceId`→host map) +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/` (composer projection test) and the existing artifact-decode/parity test for `EquipmentNode` (search `tests/` for `ReadEquipmentNode`/`EquipmentNodes`/`DeploymentArtifact` coverage; if a Compose-vs-decode parity test exists, extend it) + +**Context:** The artifact already serialises full `Equipment` rows (incl. nullable `DriverInstanceId`, `DeviceId`) and a full `Devices` array (each `Device` has `DeviceId` + schemaless `DeviceConfig` JSON containing FOCAS's `HostAddress`). `Compose` (pure) and `DeploymentArtifact` (decode) MUST produce identical `EquipmentNode`s. `_lastComposition` (used by the resolver) always comes from decode, but parity is still required by tests. + +**Step 1 — Failing tests:** +- Composer: given an `Equipment` with `DriverInstanceId="d1"`, `DeviceId="dev1"`, and a `Device{DeviceId="dev1", DeviceConfig={"HostAddress":"10.0.0.5:8193"}}`, `Compose(...)` yields `EquipmentNode` with `DriverInstanceId=="d1"`, `DeviceId=="dev1"`, `DeviceHost=="10.0.0.5:8193"`; with no device assigned → all three null. +- Decode: an artifact JSON whose `Equipment` element has those fields + a matching `Devices` element decodes to the same `EquipmentNode`. + +**Step 2 — Verify they fail:** `EquipmentNode` has no such fields. + +**Step 3 — Implement:** +- Extend the record (defaulted params keep all existing call sites compiling): +```csharp +public sealed record EquipmentNode( + string EquipmentId, + string DisplayName, + string UnsLineId, + string? DriverInstanceId = null, + string? DeviceId = null, + string? DeviceHost = null); +``` +- Add a shared host-extraction helper usable by BOTH sides (place it where both can call it without a new project dependency — e.g. a `public static string? TryExtractDeviceHost(string? deviceConfigJson)` on `AddressSpaceComposer`, parsing the top-level `"HostAddress"` string from the `DeviceConfig` JSON; return null if absent/unparseable). Add a normalization step (trim; lower-case host) and DOCUMENT that the discovered device-host folder segment must be normalized the same way in Task 9. +- `Compose`: add an optional `IReadOnlyList? devices = null` param to BOTH overloads (forward from the 5-arg overload as empty). Build `deviceHostById = devices.ToDictionary(d => d.DeviceId, d => TryExtractDeviceHost(d.DeviceConfig))`. In the equipment projection, set `DriverInstanceId: e.DriverInstanceId`, `DeviceId: e.DeviceId`, `DeviceHost: e.DeviceId is null ? null : deviceHostById.GetValueOrDefault(e.DeviceId)`. +- `DeploymentArtifact`: read the `Devices` array (decode `DeviceId` + `DeviceConfig`) into a `DeviceId`→host map using the SAME `TryExtractDeviceHost` helper; thread it into `ReadEquipmentNode` (change its signature to accept the map, or do a post-pass) so it reads `DriverInstanceId`/`DeviceId` from the element and resolves `DeviceHost` from the map. Update `Empty()` only if its arity changed (it won't — record params are defaulted). +- **Parity:** ensure the decode-side host normalization is byte-identical to `Compose`'s (same helper). If a Compose-vs-decode parity test exists, pass the same `Devices` to `Compose` in that test. + +**Step 4 — Verify:** new tests pass; `OpcUaServer.Tests` + `Runtime.Tests` green; build 0 warnings. **Run the existing artifact-parity test** — it MUST stay green. + +**Step 5 — Commit:** `git commit -m "feat(otopcua): EquipmentNode carries DriverInstanceId/DeviceId/DeviceHost (follow-up E projection)"` + +--- + +### Task 6: DriverHostActor — cache-as-dict + driver-level equipment resolution (follow-up E, part 1) + +**Classification:** high-risk +**Estimated implement time:** ~5 min +**Parallelizable with:** none (serial: first DriverHostActor task; needs Task 5) + +**Files:** +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` (`_discoveredByDriver` field ~168; `HandleDiscoveredNodes` ~580-639; `ApplyDiscoveredPlan` ~658-701; `RoutingEquals`; redeploy re-inject tail ~1247-1290) +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorDiscoveryTests.cs` + +**Context:** Today `_discoveredByDriver` is `Dictionary` (one plan per driver) and equipment is resolved ONLY from authored `EquipmentTags`. This task (1) changes the cache value to a per-equipment map so Task 9 can add multiple equipments, and (2) makes resolution also use the equipment-level driver link so a driver with an assigned equipment but ZERO authored tags still grafts. **Still requires exactly one resolved equipment here** (multi-device is Task 9) — >1 keeps the current warn+skip. + +**Step 1 — Failing tests:** +- Tag-less graft: composition has an `EquipmentNode{DriverInstanceId="d1"}` with NO authored `EquipmentTags` for `d1`; `DiscoveredNodesReady("d1", nodes)` → nodes graft under that equipment (today: skipped with "no equipment/authored tags"). +- Regression: the existing single-equipment-with-authored-tags test still grafts identically (collapse retained). + +**Step 2 — Verify it fails:** tag-less case is skipped today. + +**Step 3 — Implement:** +- Change `_discoveredByDriver` to `Dictionary>` (driverId → (equipmentId → plan)). Update ALL readers: `HandleDiscoveredNodes` short-circuit, `ApplyDiscoveredPlan`, and the redeploy re-inject tail must iterate the inner map. +- New resolution in `HandleDiscoveredNodes`: candidate equipments = + `_lastComposition.EquipmentNodes.Where(e => e.DriverInstanceId == driverId).Select(e => e.EquipmentId)` + **∪** the existing authored-tag-derived set. Distinct. + - 0 → log Info, skip (unchanged message). + - 1 → resolve `equipmentId`; authoredRefs for that driver as today; `DiscoveredNodeMapper.Map(equipmentId, nodes, authoredRefs)`; cache as a 1-entry inner map; apply. + - >1 → for THIS task, keep `_log.Warning(... "multi-equipment-per-driver is handled in the multi-device path")` + skip. (Task 9 replaces this branch.) +- `ApplyDiscoveredPlan`: keep applying a single `(equipmentId, plan)`; callers now iterate the inner map and call it per entry. The subscription-merge union must include ALL discovered routing keys across the driver's plans (so a multi-plan driver subscribes every device's refs). Keep the authored value/alarm ref computation. +- `RoutingEquals` short-circuit: compare the FULL new inner-map routing against the cached inner-map routing (skip re-apply only when every equipment's routing is unchanged). +- Redeploy re-inject tail: iterate `_discoveredByDriver`; for each driver, re-resolve candidates from the CURRENT composition; per cached `(equipmentId, plan)` entry, keep the existing drop rules (equipment no longer resolves / plan NodeIds not scoped to `equipmentId`) but applied per-entry; re-apply surviving entries. (Task 7 will add the re-trigger on drop; Task 8 the de-dup.) + +**Step 4 — Verify:** new + existing `DriverHostActorDiscoveryTests` green; `Runtime.Tests` green; build 0 warnings. + +**Step 5 — Commit:** `git commit -m "feat(otopcua): driver-level equipment resolution + per-equipment discovered-plan cache (follow-up E)"` + +--- + +### Task 7: DriverHostActor — re-trigger discovery on rebind drop (follow-up C, part 2) + +**Classification:** high-risk +**Estimated implement time:** ~4 min +**Parallelizable with:** none (serial after Task 6; needs Task 4's message) + +**Files:** +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` (redeploy re-inject tail drop branches ~1264-1288; update the deliberate-`won't-fix` comment) +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorDiscoveryTests.cs` + +**Context:** When the re-inject tail DROPS a cached plan because the equipment rebound/no-longer-resolves, the FixedTree stays absent under the new equipment until the driver's next natural reconnect. Re-trigger discovery so it re-grafts promptly. + +**Step 1 — Failing test:** simulate a redeploy where a driver's equipment changed (cached plan scoped to old `EQ-1`, new composition binds the driver to `EQ-2`). Assert the driver child receives `DriverInstanceActor.TriggerRediscovery` after the drop. (Use the test harness's child-probe/TestProbe pattern already used in this file for asserting messages to driver children.) + +**Step 2 — Verify it fails:** no re-trigger today. + +**Step 3 — Implement:** in each drop branch (the two `Remove` sites), after removing the entry, `Tell` that driver's child actor `new DriverInstanceActor.TriggerRediscovery()` (guard: only if the child exists in `_children`). Update the inline comment: the previous "we deliberately do NOT add re-trigger logic" note becomes a description of the new re-trigger (discovery-only, idempotent, child no-ops if not `Connected`). If a driver maps to MULTIPLE cached equipment entries and only one drops, still send a single `TriggerRediscovery` (discovery re-resolves all of them) — de-dupe so a driver is told at most once per re-inject pass. + +**Step 4 — Verify:** test passes; suite green; build 0 warnings. + +**Step 5 — Commit:** `git commit -m "feat(otopcua): re-trigger discovery on config-unchanged rebind (follow-up C)"` + +--- + +### Task 8: DriverHostActor — single SetDesiredSubscriptions per redeploy (follow-up D) + +**Classification:** high-risk +**Estimated implement time:** ~5 min +**Parallelizable with:** none (serial after Task 7) + +**Files:** +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` (`PushDesiredSubscriptions` bulk loop ~1204; re-inject tail interaction) +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorDiscoveryTests.cs` + +**Context:** During an in-process redeploy a cached driver gets the bulk authored-only `SetDesiredSubscriptions` (line 1204) AND then the union from `ApplyDiscoveredPlan` (line 697) — one extra unsub/resub blip. Make it exactly one send per driver. + +**Step 1 — Failing test:** redeploy with one driver that has a cached discovered plan; assert the driver child receives `SetDesiredSubscriptions` EXACTLY ONCE during the redeploy, and that the single payload is the authored∪discovered UNION. Add a second test: a driver whose cached plan is DROPPED in the re-inject tail (rebind) still receives exactly one `SetDesiredSubscriptions` carrying the AUTHORED-ONLY set (fallback) — its authored subscriptions must not be lost. + +**Step 2 — Verify it fails:** today the cached-driver case sends twice. + +**Step 3 — Implement:** +- In the bulk loop, SKIP the send for any `driverId` present in `_discoveredByDriver` (capture the key set BEFORE the re-inject tail runs). +- Re-inject tail: when a cached plan is APPLIED, `ApplyDiscoveredPlan` already sends the union (covers authored). When a cached plan is DROPPED (all entries for the driver removed → the driver no longer has any cached plan), send the authored-only `SetDesiredSubscriptions` for that driver as a fallback (mirror the bulk-loop payload: authored value refs + alarm refs, `SubscriptionPublishingInterval`). +- Ensure the invariant holds for drivers WITHOUT a cached plan (unchanged: single bulk send) and drivers added/removed by the reconcile. + +**Step 4 — Verify:** both tests pass; the existing redeploy/restore tests stay green (watch for any test asserting the old double-send count); build 0 warnings. + +**Step 5 — Commit:** `git commit -m "perf(otopcua): one SetDesiredSubscriptions per driver per redeploy (follow-up D)"` + +--- + +### Task 9: DriverHostActor — multi-device-per-driver partition (follow-up E, part 2) + +**Classification:** high-risk +**Estimated implement time:** ~5 min +**Parallelizable with:** none (serial after Task 8; needs Task 5's DeviceHost) + +**Files:** +- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` (`HandleDiscoveredNodes` >1-candidate branch from Task 6) +- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/Drivers/DriverHostActorDiscoveryTests.cs` + +**Context:** Replace Task 6's ">1 candidate → warn+skip" with a real partition. Each candidate equipment has `EquipmentNode.DeviceHost` (from Task 5). The discovered nodes carry a device-host folder segment at `FolderPathSegments[1]` (FOCAS uses `device.HostAddress`). Partition nodes by that segment, normalize it the SAME way Task 5 normalized `DeviceHost`, and map each device's subset under the matching equipment via the existing `DiscoveredNodeMapper.Map` (a single-device subset → collapse kicks in per equipment → clean `EQ-n/FOCAS/Identity/...`). + +**Step 1 — Failing tests:** +- Multi-device: driver `d1` resolves to `EQ-A{DeviceHost=h1}` and `EQ-B{DeviceHost=h2}`; discovered nodes split across folder segments `h1`/`h2`; assert `h1`'s subtree grafts under `EQ-A` and `h2`'s under `EQ-B`, each routing-keyed correctly, and `_discoveredByDriver["d1"]` has two entries. +- Unmatched device-host → warn-skip: a discovered segment `h3` with no matching equipment is NOT grafted (logged Warning), while `h1`/`h2` still graft. +- Degenerate: >1 candidate but NO `DeviceHost` data anywhere → falls back to warn+skip (no crash, no mis-graft). + +**Step 2 — Verify it fails:** Task 6 left this as warn+skip. + +**Step 3 — Implement:** in the >1-candidate branch, build `hostToEquipment = candidates.Where(e => e.DeviceHost != null).ToDictionary(Normalize(e.DeviceHost), e.EquipmentId)` (guard duplicate hosts → warn+skip the ambiguous host). Partition `nodes` by `Normalize(FolderPathSegments.Count >= 2 ? FolderPathSegments[1] : null)`. For each partition with a matching equipment: compute that equipment's authoredRefs, `Map(equipmentId, partitionNodes, authoredRefs)`, collect into the inner `(equipmentId → plan)` map. Unmatched partitions → `_log.Warning` + skip. Cache the multi-entry inner map and apply every entry (Task 6 made apply per-entry). Use the SAME normalization helper from Task 5 (factor it so both call it). + +**Step 4 — Verify:** all three tests pass; single-device + tag-less tests from Task 6 still green; `Runtime.Tests` + `OpcUaServer.Tests` + FOCAS suites green; build 0 warnings. + +**Step 5 — Commit:** `git commit -m "feat(otopcua): multi-device-per-driver FixedTree partition (follow-up E)"` + +--- + +### Task 10: Docs — update follow-up notes + design statuses + +**Classification:** trivial +**Estimated implement time:** ~3 min +**Parallelizable with:** none (after Task 9) + +**Files:** +- Modify: `docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-design.md` (the "Follow-ups surfaced during the review chain" section + the decisions-table multi-device row — mark A–E DONE, note the rebind re-trigger now exists) +- Modify: `docs/plans/2026-06-26-otopcua-fixedtree-followups-design.md` (Status → Implemented) +- Modify: `docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-RESUME.md` (§3 — strike the now-closed follow-ups) + +**Steps:** update the prose to reflect what shipped (each follow-up + the fact that E required no migration / no artifact change; the rebind re-trigger reversed the earlier `won't-fix`, cleanly). Commit: `git commit -m "docs(otopcua): record FixedTree follow-ups A-E as implemented"` + +--- + +### Task 11: Build + full offline suite + regression gate + +**Classification:** standard +**Estimated implement time:** ~4 min (mostly test wall-time) +**Parallelizable with:** none (final; after Tasks 2, 4, 9, 10) + +**Files:** none (verification only) + +**Steps:** +1. `dotnet build ZB.MOM.WW.OtOpcUa.slnx` → **0 errors, 0 warnings**. +2. `dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~Runtime.Tests"` → all green. +3. `dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~OpcUaServer.Tests"` → all green. +4. `dotnet test ZB.MOM.WW.OtOpcUa.slnx --filter "FullyQualifiedName~FOCAS"` → all green (live-wire integration tests skip without the CNC — expected). +5. Confirm the validated single-device FOCAS injection path is unchanged (the relevant `DriverHostActorDiscoveryTests`/end-to-end test passes untouched). Report counts. Do NOT run a full-solution `dotnet test` (net48 Wonderware testhost can't run on macOS). + +No commit (verification). Live wonder re-validation is optional + user-gated. diff --git a/docs/plans/2026-06-26-otopcua-fixedtree-followups.md.tasks.json b/docs/plans/2026-06-26-otopcua-fixedtree-followups.md.tasks.json new file mode 100644 index 00000000..2311cc88 --- /dev/null +++ b/docs/plans/2026-06-26-otopcua-fixedtree-followups.md.tasks.json @@ -0,0 +1,23 @@ +{ + "planPath": "docs/plans/2026-06-26-otopcua-fixedtree-followups.md", + "tasks": [ + {"id": 1, "subject": "Task 1: Injectable discovery timeout (A)", "status": "completed", "nativeId": 38, "commits": ["c2c368dc"]}, + {"id": 2, "subject": "Task 2: RediscoverPolicy enum + driver overrides (B1)", "status": "completed", "nativeId": 39, "commits": ["a378b572", "efbdaf85"]}, + {"id": 3, "subject": "Task 3: DriverInstanceActor honors policy (B2)", "status": "completed", "blockedBy": [1, 2], "nativeId": 40, "commits": ["ce34816a", "a1a655e6"]}, + {"id": 4, "subject": "Task 4: TriggerRediscovery message + handler (C1)", "status": "completed", "blockedBy": [3], "nativeId": 41, "commits": ["f7358bf4", "e7d5ebe9"]}, + {"id": 5, "subject": "Task 5: EquipmentNode DriverInstanceId/DeviceId/DeviceHost (E projection)", "status": "completed", "nativeId": 42, "commits": ["cb7ce7f1", "915492a7"]}, + {"id": 6, "subject": "Task 6: DriverHostActor cache-as-dict + driver-level resolution (E1)", "status": "completed", "blockedBy": [5], "nativeId": 43, "commits": ["adcd7b57"]}, + {"id": 7, "subject": "Task 7: Re-trigger discovery on rebind drop (C2)", "status": "completed", "blockedBy": [4, 6], "nativeId": 44, "commits": ["53367148", "cde16063"]}, + {"id": 8, "subject": "Task 8: Single SetDesiredSubscriptions per redeploy (D)", "status": "completed", "blockedBy": [7], "nativeId": 45, "commits": ["05c82079", "51721df5"]}, + {"id": 9, "subject": "Task 9: Multi-device-per-driver partition (E2)", "status": "completed", "blockedBy": [5, 8], "nativeId": 46, "commits": ["50f08635", "0074f37a"]}, + {"id": 10, "subject": "Task 10: Docs — follow-up notes + statuses", "status": "completed", "blockedBy": [9], "nativeId": 47}, + {"id": 11, "subject": "Task 11: Build + offline suite + regression gate", "status": "pending", "blockedBy": [2, 4, 9, 10], "nativeId": 48} + ], + "nativeTaskIds": { + "1": 38, "2": 39, "3": 40, "4": 41, "5": 42, "6": 43, + "7": 44, "8": 45, "9": 46, "10": 47, "11": 48 + }, + "lastUpdated": "2026-06-26T00:00:00Z", + "status": "code+docs complete; final build/suite gate pending", + "branch": "feat/focas-fixedtree-equipment-injection" +}