# FixedTree → Equipment injection — RESUME / work-left handoff
**Date:** 2026-06-26
**Purpose:** survive a context compaction; let a fresh session continue without re-deriving state.
---
## TL;DR
The **FixedTree-under-Equipment dynamic-injection feature is BUILT, offline-complete, AND
✅ LIVE-VALIDATED on wonder (2026-06-26)** — 11 tasks, all reviewed, full offline suite green, final
integration review = ready to merge, and the real OPC injection confirmed on `wonder-app-vd03` (57 nodes
grafted under `EQ-3686c0272279`, all reading Good live values). It lives on a **local, unpushed** branch.
The only substantive thing left is the user's decision on push/PR/merge (§1). A few documented non-blocking
follow-ups remain (§3).
## Git state (exact)
- **Branch:** `feat/focas-fixedtree-equipment-injection` (in the main working dir `/Users/dohertj2/Desktop/OtOpcUa`, NOT a worktree).
- **Base:** branched off `fix/focas-poll-io-serialization` (the symptom-#1 data-plane fix — itself ahead of `master`, pushed to gitea with its own open PR, NOT merged). So this feature **stacks on an unmerged branch**.
- **Commits:** 14, range `da55c69`..`37cac5de` (10 task commits + 4 review-fix/docs commits). All **local — nothing pushed.**
- **User decision (2026-06-26):** finishing-a-development-branch → **"Keep as-is."** Do NOT push/merge/discard without an explicit new go-ahead. Standing rule: **commit/push only when asked.**
- **Untouched pre-existing working-tree edits** (leave alone; never stage): `CLAUDE.md`, `docker-dev/docker-compose.yml`, `pending.md`, `stillpending.md`, `docs/plans/2026-06-19-followups-batch.md.tasks.json`.
- This RESUME doc itself is currently **uncommitted** (a working artifact).
## What the feature does
Generic post-connect `ITagDiscovery` injection (NOT FOCAS-special-cased). On driver Connect:
`DriverInstanceActor` runs bounded re-discovery (Timers single-tick, generation-guarded, stop-on-stable +
attempt cap, re-kicks on reconnect) into a capturing `IAddressSpaceBuilder` → ships `DiscoveredNodesReady` →
`DriverHostActor` resolves the equipment via authored `EquipmentTags`, maps the nodes under
`EQ-…/FOCAS/…` (read-only; single device-host folder collapsed) via `DiscoveredNodeMapper`, extends
`_nodeIdByDriverRef`, caches the plan, Tells `OpcUaPublishActor.MaterialiseDiscoveredNodes` →
`AddressSpaceApplier` → sink `EnsureFolder`/`EnsureVariable` + `RaiseNodesAddedModelChange` (NodeAdded), and
re-sends `SetDesiredSubscriptions(authored FixedTree refs)` so values flow through the existing
poll→push path. Survives redeploys (re-applied at the tail of `PushDesiredSubscriptions` from the cache)
and restarts (re-discovered on reconnect).
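
The mapping step can be pictured with a minimal sketch. It assumes discovered nodes arrive as slash-separated paths whose first segment is the device-host folder; `DiscoveredNodeMapperSketch`, `MapUnderEquipment`, and its parameters are hypothetical names for illustration, not the actual `DiscoveredNodeMapper` API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DiscoveredNodeMapperSketch
{
    // Maps driver-relative paths (e.g. "host1/Identity/SeriesNumber") to
    // equipment-relative node IDs (e.g. "EQ-1/FOCAS/Identity/SeriesNumber").
    // Assumption: paths are '/'-separated and the first segment is the
    // device-host folder.
    public static IReadOnlyList<string> MapUnderEquipment(
        string equipmentId, string driverFolder, IReadOnlyList<string> discoveredPaths)
    {
        var split = discoveredPaths
            .Select(p => p.Split('/', StringSplitOptions.RemoveEmptyEntries))
            .Where(s => s.Length > 0)
            .ToList();

        // Collapse only when there is exactly one distinct top-level folder
        // and every path has at least one segment beneath it.
        bool collapse = split.Count > 0
            && split.All(s => s.Length > 1)
            && split.Select(s => s[0]).Distinct(StringComparer.Ordinal).Count() == 1;

        return split
            .Select(s => collapse ? s.Skip(1) : s)
            .Select(s => $"{equipmentId}/{driverFolder}/{string.Join("/", s)}")
            .ToList();
    }
}
```

Under these assumptions, `MapUnderEquipment("EQ-1", "FOCAS", new[] { "host1/Identity/SeriesNumber" })` yields `EQ-1/FOCAS/Identity/SeriesNumber`, while paths from two different host folders keep their host segment.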
## Verification (offline) — all green as of 2026-06-26
- `dotnet build ZB.MOM.WW.OtOpcUa.slnx` → **0 errors, 0 warnings** (TreatWarningsAsErrors on).
- `dotnet test … --filter "FullyQualifiedName~Runtime.Tests"` → **312 passed**.
- `dotnet test … --filter "FullyQualifiedName~OpcUaServer.Tests"` → **304 passed**.
- `dotnet test … --filter "FullyQualifiedName~FOCAS"` → **324 passed, 10 skipped** (the skips are live-wire integration tests needing the physical CNC; expected).
- Final integration review: **ready to merge** (3 non-blocking Minors — see Follow-ups).
- Known env limitation (not a failure): the net48 `Driver.Historian.Wonderware.Tests` can't run its testhost on macOS — run the **filtered** suites above, not a full-solution `dotnet test`.
## Key files / anchors
- Design: `docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-design.md` (status = Implemented; has the follow-ups).
- Plan + task journal: `docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection.md` (+ `.md.tasks.json`, all tasks completed).
- Investigation plan (symptom #2 marked BUILT): `docs/plans/2026-06-25-otopcua-equipment-dataplane-investigation.md`.
- Deployment doc (FixedTree section added): `docs/deployments/wonder-app-vd03-makino-z-34184.md`.
- New code:
  - `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DiscoveredNode.cs`, `CapturingAddressSpaceBuilder.cs`, `DiscoveredNodeMapper.cs`
  - `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/DiscoveredInjection.cs` (DTOs)
  - modified: `DriverInstanceActor.cs`, `DriverHostActor.cs`, `OpcUaPublishActor.cs`, `AddressSpaceApplier.cs`, `OtOpcUaNodeManager.cs`, `IOpcUaAddressSpaceSink.cs` (+ `SdkAddressSpaceSink.cs`, `DeferredAddressSpaceSink.cs`)
  - tests: `tests/Server/…Runtime.Tests/Drivers/{CapturingAddressSpaceBuilderTests,DiscoveredNodeMapperTests,DriverInstanceActorDiscoveryTests,DriverHostActorDiscoveryTests,DiscoveryInjectionEndToEndTests}.cs`, `…OpcUaServer.Tests/NodeManagerModelChangeOnAddTests.cs`, edits to `AddressSpaceApplierTests.cs`/`OpcUaPublishActorTests.cs`.
- Memory: `…/memory/wonder-otopcua-focas-and-akka-roles.md` (RESUME-ANCHOR bullet updated to record this feature; read it for the broader wonder/FOCAS context + box-access recipe).
## WORK LEFT (prioritized)
### 1. Decide the git endgame (user-gated)
Pick one, only on explicit user go-ahead:
- **Push + PR** — `git push -u origin feat/focas-fixedtree-equipment-injection`; PR base is `fix/focas-poll-io-serialization` (stacked) or `master` (will show both features' commits). gitea repo: `lmxopcua`.
- **Merge locally** into `fix/focas-poll-io-serialization` (folds both features onto one branch/PR).
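- **Keep as-is** (current state). This option originally meant waiting for live validation, which is now done (§2), so nothing technical blocks the other two options.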
### 2. Live wonder validation — ✅ DONE 2026-06-26
**Validated live on `wonder-app-vd03`.** Built a full self-contained Host overlay from this branch @
`37cac5de`, deployed to `E:\ApiInstall\OtOpcUa` (stop → backup `E:\ApiInstall\OtOpcUa_bak-20260626111416`
→ robocopy overlay preserving `appsettings*.json` + `pki\` → restart). Baseline before deploy: only
`parts-count`/`parts-required` under `EQ-3686c0272279`. After deploy + FOCAS reconnect: the host log
recorded `injected 57 discovered node(s) … under EQ-3686c0272279` / `materialised … (folders=14, vars=57)`,
no exceptions. CLI browse showed the full `FOCAS/` subtree (Identity/Axes X-Y-Z-B-C-AA+Actual/Spindle/
Program/OperationMode/Timers), idempotent across repeats, device-host folder collapsed. Sample reads all
Good: `Identity/SeriesNumber=G431`, `CncType=31`, `AxisCount=7`, `Axes/X/AbsolutePosition=2801574` (live),
`OperationMode/ModeText=TJOG`; authored tags still Good (no regression). `/healthz` 200 Healthy throughout.
Result recorded in `docs/deployments/wonder-app-vd03-makino-z-34184.md`. **The substantive remaining work
is now the git endgame (§1) only.** Original recipe retained below for reference:
The offline e2e asserts the recording-sink contract, NOT the real `OtOpcUaNodeManager` seed→overwrite at
the OPC node layer. Live validation closes that gap. Recipe (mirrors the symptom-#1 deploy):
1. Build the current Host self-contained: `dotnet publish src/…/ZB.MOM.WW.OtOpcUa.Host…csproj -c Release -r win-x64 --self-contained true -p:PublishSingleFile=false`. **Must be a full self-contained publish-overlay, NOT a DLL swap** — the box is self-contained (DLL swaps crashed: FileNotFound / "Could not resolve CoreCLR path"). Note: deploying the current Host already happened for symptom #1; if the box is at the symptom-#1 build, this feature's DLLs (Runtime + OpcUaServer + Commons + the new Runtime/Drivers files) must be included in the overlay — so a fresh full overlay from THIS branch is the safe path.
2. Box access: servecli `:2222`, key `~/.ssh/servecli_wonder`, user `dohertj2`; drive via `scratchpad/wonder-ps.sh` (base64 PS over cmd PTY); SFTP root `C:\Users\dohertj2\Desktop\win64`. Service `OtOpcUaHost`. Overlay onto `E:\ApiInstall\OtOpcUa` **preserving `pki\` + `appsettings*.json` + `data\`**; back up first; auto-rollback if unhealthy.
3. Restart `OtOpcUaHost`; confirm member Up w/ ADMIN+DRIVER (roles env already set), `/healthz` Healthy, OPC `:4840` listening.
4. The FOCAS driver connects → ~0–2 s later FixedTree populates → injection fires. Validate via the OtOpcUa CLI (`src/Client/…Client.CLI`) against `opc.tcp://wonder-app-vd03.zmr.zimmer.com:4840/OtOpcUa` (Security None, anonymous):
   - `browse --recursive` → expect a `FOCAS` subfolder under `ns=2;s=EQ-3686c0272279` with `Identity/`, `Axes/`, etc.
   - `read ns=2;s=EQ-3686c0272279/FOCAS/Identity/SeriesNumber` → expect Good (a real string).
   - `read ns=2;s=EQ-3686c0272279/FOCAS/Axes/X/AbsolutePosition` → expect Good (value may be 0 on an idle machine; assert STATUS, not magnitude).
   - The authored `parts-count`/`parts-required` should remain Good (symptom #1 fix).
5. If a value reads Bad, the symptom-#1 self-healing applies (recoverable `BadCommunicationError`, observable in Serilog at `C:\Windows\System32\logs\otopcua-<date>.log`). The Akka→Serilog bridge (from symptom #1) makes `DriverHost`/`DriverInstance`/discovery logs visible.
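
The "assert STATUS, not magnitude" rule from step 4 can be written down as a small check. This is a sketch only: `ReadResult`, `LiveValidationSketch`, and `FindNotGood` are hypothetical names, and it assumes the caller has already turned each CLI read into a node ID, a status text such as `Good`, and a value string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for one read outcome; not a type from the OtOpcUa CLI.
public sealed record ReadResult(string NodeId, string Status, string Value);

public static class LiveValidationSketch
{
    // Returns the node IDs whose status is not Good. The value is deliberately
    // ignored: an idle machine can legitimately report 0.
    public static IReadOnlyList<string> FindNotGood(IEnumerable<ReadResult> results) =>
        results
            .Where(r => !string.Equals(r.Status, "Good", StringComparison.OrdinalIgnoreCase))
            .Select(r => r.NodeId)
            .ToList();
}
```

An empty list means every sampled node passed; any returned ID is a candidate for the step 5 log check.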
### 3. Non-blocking follow-ups
**✅ ALL FIXEDTREE FOLLOW-UPS (A–E) IMPLEMENTED 2026-06-26.** Design+plan:
`2026-06-26-otopcua-fixedtree-followups{-design,}.md`; 16 commits `c2c368dc`..`0074f37a` on this branch
(every task spec+code reviewed; offline suites green). Resolved:
- ✅ Config-unchanged rebind now re-triggers discovery (`TriggerRediscovery`) — follow-up C.
- ✅ Multi-device-per-driver implemented via `EquipmentNode.DeviceHost` partition; ≥1-authored-tag requirement lifted (driver-binding resolution) — follow-up E (projection-only, no migration / no artifact wire change).
- ✅ Per-(re)connect re-discovery policy-gated (`ITagDiscovery.RediscoverPolicy` UntilStable/Once/Never; synchronous drivers → Once) — follow-up B.
- ✅ Double `SetDesiredSubscriptions` per redeploy de-duped (one send per driver) — follow-up D.
- ✅ Per-pass `DiscoverAsync` timeout made injectable — follow-up A.
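
Follow-ups A and B together amount to a policy-gated, bounded discovery loop with an injectable per-pass timeout. The sketch below shows that shape as a plain async loop; it is not the actor code (which is timer-driven and generation-guarded). The `RediscoverPolicy` member names come from the list above, while `BoundedRediscoverySketch`, `RunAsync`, its parameters, and the definition of "stable" (two consecutive passes returning the same non-zero node count) are assumptions made for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum RediscoverPolicy { UntilStable, Once, Never }

public static class BoundedRediscoverySketch
{
    // Runs discovery passes according to the policy and returns the last node
    // count seen, or -1 if no pass ran. `discover` stands in for one discovery
    // pass and returns the number of nodes it found.
    public static async Task<int> RunAsync(
        RediscoverPolicy policy,
        Func<CancellationToken, Task<int>> discover,
        int maxAttempts,
        TimeSpan perPassTimeout,
        TimeSpan delayBetweenPasses)
    {
        if (policy == RediscoverPolicy.Never) return -1;

        // Once means a single pass; UntilStable is capped by maxAttempts.
        int attempts = policy == RediscoverPolicy.Once ? 1 : Math.Max(1, maxAttempts);
        int last = -1;

        for (int i = 0; i < attempts; i++)
        {
            // Each pass gets its own timeout; the token is cancelled when it
            // elapses, and it is up to `discover` to observe it.
            using var cts = new CancellationTokenSource(perPassTimeout);
            int count = await discover(cts.Token);

            // Assumed stability rule: same non-zero count twice in a row.
            bool stable = count > 0 && count == last;
            last = count;
            if (stable) break;

            if (i < attempts - 1) await Task.Delay(delayBetweenPasses);
        }
        return last;
    }
}
```

Passing the timeout and delay as parameters is what makes the loop testable offline: a test can hand in a fake `discover` delegate and `TimeSpan.Zero` delays.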
**Still open (out of scope for the FixedTree follow-ups — separate cross-cutting work):**
- Cross-cutting (from symptom #1, all 3 apps): shared `AddZbSerilog` doesn't set the static `Serilog.Log.Logger`; AdminUI persists FOCAS config in formats (series-as-number, scheme-less host) the driver only now tolerates — reconcile at the AdminUI source.
## Context that's easy to lose
- 3 real defects were caught + fixed by the review chain during the build: `DriverDataType.ToString()` ≠ OPC type string (`Float64` → `"Double"`); `Server.ReportEvent` under the node `Lock` (deadlock); `ConfigureAwait(false)` in the discovery handler (off-actor-context crash for async drivers like Galaxy sharing the node). All have regression tests.
- The plan's Task-3 instruction "keep ReportEvent inside lock" was itself a defect; the plan doc was corrected.
- The execution used subagent-driven-development (fresh implementer per task + spec/code reviews; high-risk tasks got Opus reviews, serial). Single-writer discipline was enforced (no concurrent `dotnet` builds → no obj/bin or git-index races).
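
Two of the defects above follow general patterns that are easy to reintroduce, so a minimal sketch of the corrected shapes may help a fresh session. Everything here is hypothetical: `DriverDataTypeSketch` is an illustrative subset rather than the real `DriverDataType`, and `NodeStoreSketch` stands in for the node manager with a plain callback in place of `Server.ReportEvent`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subset of driver data types, for illustration only.
public enum DriverDataTypeSketch { Boolean, Int32, Float32, Float64, String }

public static class OpcTypeNames
{
    // Explicit mapping: enum member names and OPC UA built-in type names
    // differ (Float64 is "Double", Float32 is "Float"), so ToString() on the
    // enum is not a valid type string.
    public static string ToOpcTypeName(DriverDataTypeSketch t) => t switch
    {
        DriverDataTypeSketch.Boolean => "Boolean",
        DriverDataTypeSketch.Int32 => "Int32",
        DriverDataTypeSketch.Float32 => "Float",
        DriverDataTypeSketch.Float64 => "Double",
        DriverDataTypeSketch.String => "String",
        _ => throw new ArgumentOutOfRangeException(nameof(t), t, "unmapped driver data type"),
    };
}

public sealed class NodeStoreSketch
{
    private readonly object _lock = new object();
    private readonly HashSet<string> _nodes = new HashSet<string>();
    private readonly Action<IReadOnlyList<string>> _reportAdded;

    public NodeStoreSketch(Action<IReadOnlyList<string>> reportAdded) => _reportAdded = reportAdded;

    // Idempotent: node IDs that already exist are skipped and not re-reported.
    public void EnsureNodes(IEnumerable<string> nodeIds)
    {
        var added = new List<string>();
        lock (_lock)
        {
            // Mutate state under the lock, but only record what changed.
            foreach (var id in nodeIds)
                if (_nodes.Add(id)) added.Add(id);
        }
        // Report after the lock is released, so the callback never runs while
        // this lock is held and cannot be part of a lock-ordering deadlock.
        if (added.Count > 0) _reportAdded(added);
    }
}
```

The design choice in `EnsureNodes` is to collect the change set inside the critical section and notify outside it, which keeps the lock hold time short and independent of whatever the notification path does.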