lmxopcua/docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-RESUME.md

FixedTree → Equipment injection — RESUME / work-left handoff

Date: 2026-06-26

Purpose: survive a context compaction; let a fresh session continue without re-deriving state.


TL;DR

The FixedTree-under-Equipment dynamic-injection feature is BUILT, offline-complete, AND LIVE-VALIDATED on wonder (2026-06-26) — 11 tasks, all reviewed, full offline suite green, final integration review = ready to merge, and the real OPC injection confirmed on wonder-app-vd03 (57 nodes grafted under EQ-3686c0272279, all reading Good live values). It lives on a local, unpushed branch. The only substantive thing left is the user's decision on push/PR/merge (§1). A few documented non-blocking follow-ups remain (§3).

Git state (exact)

  • Branch: feat/focas-fixedtree-equipment-injection (in the main working dir /Users/dohertj2/Desktop/OtOpcUa, NOT a worktree).
  • Base: branched off fix/focas-poll-io-serialization (the symptom-#1 data-plane fix — itself ahead of master, pushed to gitea with its own open PR, NOT merged). So this feature stacks on an unmerged branch.
  • Commits: 14, range da55c69..37cac5de (10 task commits + 4 review-fix/docs commits). All local — nothing pushed.
  • User decision (2026-06-26): finishing-a-development-branch → "Keep as-is." Do NOT push/merge/discard without an explicit new go-ahead. Standing rule: commit/push only when asked.
  • Untouched pre-existing working-tree edits (leave alone; never stage): CLAUDE.md, docker-dev/docker-compose.yml, pending.md, stillpending.md, docs/plans/2026-06-19-followups-batch.md.tasks.json.
  • This RESUME doc itself is currently uncommitted (a working artifact).

What the feature does

Generic post-connect ITagDiscovery injection (NOT FOCAS-special-cased). On driver Connect:

  1. DriverInstanceActor runs bounded re-discovery (Timers single-tick, generation-guarded, stop-on-stable + attempt cap, re-kicks on reconnect) into a capturing IAddressSpaceBuilder → ships DiscoveredNodesReady.
  2. DriverHostActor resolves the equipment via authored EquipmentTags, maps the nodes under EQ-…/FOCAS/… (read-only; single device-host folder collapsed) via DiscoveredNodeMapper, extends _nodeIdByDriverRef, caches the plan, and Tells OpcUaPublishActor.MaterialiseDiscoveredNodes → AddressSpaceApplier → sink EnsureFolder/EnsureVariable + RaiseNodesAddedModelChange (NodeAdded).
  3. DriverHostActor re-sends SetDesiredSubscriptions(authored FixedTree refs) so values flow through the existing poll→push path.

Survives redeploys (re-applied at the tail of PushDesiredSubscriptions from the cache) and restarts (re-discovered on reconnect).
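The re-rooting step can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the project's DiscoveredNodeMapper: the DiscoveredNode shape, the map_under_equipment name, and the slash-joined node-id format are assumptions. Only the EQ-…/FOCAS/… layout and the collapse of a single device-host folder come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class DiscoveredNode:
    # Hypothetical shape: folder path as reported by the driver, plus leaf name.
    path: Tuple[str, ...]  # e.g. ("device-host-a", "Identity")
    name: str              # e.g. "SeriesNumber"


def map_under_equipment(
    equipment_id: str, driver_folder: str, nodes: Iterable[DiscoveredNode]
) -> Dict[str, DiscoveredNode]:
    """Re-root discovered nodes under <equipment>/<driver_folder>/..."""
    nodes = list(nodes)
    # Collapse the top-level device-host folder only when every node shares it.
    roots = {n.path[0] for n in nodes if n.path}
    collapse = len(roots) == 1 and all(n.path for n in nodes)
    mapped: Dict[str, DiscoveredNode] = {}
    for n in nodes:
        rel = n.path[1:] if collapse else n.path
        mapped["/".join((equipment_id, driver_folder, *rel, n.name))] = n
    return mapped


single_host = [
    DiscoveredNode(("device-host-a", "Identity"), "SeriesNumber"),
    DiscoveredNode(("device-host-a", "Axes", "X"), "AbsolutePosition"),
]
mapped_single = map_under_equipment("EQ-3686c0272279", "FOCAS", single_host)
```

With one device host, the host folder disappears from the ids (for example EQ-3686c0272279/FOCAS/Identity/SeriesNumber). With two distinct hosts the sketch keeps the host folders so the subtrees stay distinguishable.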

Verification (offline) — all green as of 2026-06-26

  • dotnet build ZB.MOM.WW.OtOpcUa.slnx → 0 errors, 0 warnings (TreatWarningsAsErrors on).
  • dotnet test … --filter "FullyQualifiedName~Runtime.Tests" → 312 passed.
  • dotnet test … --filter "FullyQualifiedName~OpcUaServer.Tests" → 304 passed.
  • dotnet test … --filter "FullyQualifiedName~FOCAS" → 324 passed, 10 skipped (the skips are live-wire integration tests needing the physical CNC; expected).
  • Final integration review: ready to merge (3 non-blocking Minors — see Follow-ups).
  • Known env limitation (not a failure): the net48 Driver.Historian.Wonderware.Tests can't run its testhost on macOS — run the filtered suites above, not a full-solution dotnet test.

Key files / anchors

  • Design: docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection-design.md (status = Implemented; has the follow-ups).
  • Plan + task journal: docs/plans/2026-06-26-otopcua-fixedtree-equipment-injection.md (+ .md.tasks.json, all tasks completed).
  • Investigation plan (symptom #2 marked BUILT): docs/plans/2026-06-25-otopcua-equipment-dataplane-investigation.md.
  • Deployment doc (FixedTree section added): docs/deployments/wonder-app-vd03-makino-z-34184.md.
  • New code:
    • src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DiscoveredNode.cs, CapturingAddressSpaceBuilder.cs, DiscoveredNodeMapper.cs
    • src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/DiscoveredInjection.cs (DTOs)
    • modified: DriverInstanceActor.cs, DriverHostActor.cs, OpcUaPublishActor.cs, AddressSpaceApplier.cs, OtOpcUaNodeManager.cs, IOpcUaAddressSpaceSink.cs (+ SdkAddressSpaceSink.cs, DeferredAddressSpaceSink.cs)
    • tests: tests/Server/…Runtime.Tests/Drivers/{CapturingAddressSpaceBuilderTests,DiscoveredNodeMapperTests,DriverInstanceActorDiscoveryTests,DriverHostActorDiscoveryTests,DiscoveryInjectionEndToEndTests}.cs, …OpcUaServer.Tests/NodeManagerModelChangeOnAddTests.cs, edits to AddressSpaceApplierTests.cs/OpcUaPublishActorTests.cs.
  • Memory: …/memory/wonder-otopcua-focas-and-akka-roles.md (RESUME-ANCHOR bullet updated to record this feature; read it for the broader wonder/FOCAS context + box-access recipe).

WORK LEFT (prioritized)

1. Decide the git endgame (user-gated)

Pick one, only on explicit user go-ahead:

  • Push + PR: git push -u origin feat/focas-fixedtree-equipment-injection; PR base is fix/focas-poll-io-serialization (stacked) or master (will show both features' commits). gitea repo: lmxopcua.
  • Merge locally into fix/focas-poll-io-serialization (folds both features onto one branch/PR).
  • Keep waiting until after live validation (current state).

2. Live wonder validation — DONE 2026-06-26

Validated live on wonder-app-vd03.

  • Deploy: built a full self-contained Host overlay from this branch @ 37cac5de, deployed to E:\ApiInstall\OtOpcUa (stop → backup E:\ApiInstall\OtOpcUa_bak-20260626111416 → robocopy overlay preserving appsettings*.json + pki\ → restart).
  • Baseline before deploy: only parts-count/parts-required under EQ-3686c0272279.
  • After deploy + FOCAS reconnect: the host log recorded injected 57 discovered node(s) … under EQ-3686c0272279 / materialised … (folders=14, vars=57), no exceptions.
  • CLI browse: showed the full FOCAS/ subtree (Identity/Axes X-Y-Z-B-C-AA+Actual/Spindle/Program/OperationMode/Timers), idempotent across repeats, device-host folder collapsed.
  • Sample reads all Good: Identity/SeriesNumber=G431, CncType=31, AxisCount=7, Axes/X/AbsolutePosition=2801574 (live), OperationMode/ModeText=TJOG; authored tags still Good (no regression).
  • /healthz: 200 Healthy throughout.

Result recorded in docs/deployments/wonder-app-vd03-makino-z-34184.md. The substantive remaining work is now the git endgame (§1) only. Original recipe retained below for reference:

The offline e2e asserts the recording-sink contract, NOT the real OtOpcUaNodeManager seed→overwrite at the OPC node layer. Live validation closes that gap. Recipe (mirrors the symptom-#1 deploy):

  1. Build the current Host self-contained: dotnet publish src/…/ZB.MOM.WW.OtOpcUa.Host…csproj -c Release -r win-x64 --self-contained true -p:PublishSingleFile=false. Must be a full self-contained publish-overlay, NOT a DLL swap — the box is self-contained (DLL swaps crashed: FileNotFound / "Could not resolve CoreCLR path"). Note: deploying the current Host already happened for symptom #1; if the box is at the symptom-#1 build, this feature's DLLs (Runtime + OpcUaServer + Commons + the new Runtime/Drivers files) must be included in the overlay — so a fresh full overlay from THIS branch is the safe path.
  2. Box access: servecli :2222, key ~/.ssh/servecli_wonder, user dohertj2; drive via scratchpad/wonder-ps.sh (base64 PS over cmd PTY); SFTP root C:\Users\dohertj2\Desktop\win64. Service OtOpcUaHost. Overlay onto E:\ApiInstall\OtOpcUa preserving pki\ + appsettings*.json + data\; back up first; auto-rollback if unhealthy.
  3. Restart OtOpcUaHost; confirm member Up w/ ADMIN+DRIVER (roles env already set), /healthz Healthy, OPC :4840 listening.
  4. The FOCAS driver connects → ~0–2 s later FixedTree populates → injection fires. Validate via the OtOpcUa CLI (src/Client/…Client.CLI) against opc.tcp://wonder-app-vd03.zmr.zimmer.com:4840/OtOpcUa (Security None, anonymous):
    • browse --recursive → expect a FOCAS subfolder under ns=2;s=EQ-3686c0272279 with Identity/, Axes/, etc.
    • read ns=2;s=EQ-3686c0272279/FOCAS/Identity/SeriesNumber → expect Good (a real string).
    • read ns=2;s=EQ-3686c0272279/FOCAS/Axes/X/AbsolutePosition → expect Good (value may be 0 on idle machine — assert STATUS, not magnitude).
    • The authored parts-count/parts-required should remain Good (symptom #1 fix).
  5. If a value reads Bad, the symptom-#1 self-healing applies (recoverable BadCommunicationError, observable in Serilog at C:\Windows\System32\logs\otopcua-<date>.log). The Akka→Serilog bridge (from symptom #1) makes DriverHost/DriverInstance/discovery logs visible.
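Step 4 asks for assertions on status rather than magnitude. A minimal sketch of that check follows, in Python for illustration only. The results dictionary is a hypothetical, hand-built structure: the CLI's actual output format is not described here, so no parsing is shown, and the failed_reads helper is an invented name.

```python
from typing import Dict, List, Tuple


def failed_reads(results: Dict[str, Tuple[str, object]]) -> List[str]:
    """Return the node ids whose status is not Good, ignoring the values."""
    return sorted(
        node_id
        for node_id, (status, _value) in results.items()
        if status != "Good"
    )


# Hypothetical read results: a zero position on an idle machine still passes,
# because only the status is checked.
sample = {
    "ns=2;s=EQ-3686c0272279/FOCAS/Identity/SeriesNumber": ("Good", "G431"),
    "ns=2;s=EQ-3686c0272279/FOCAS/Axes/X/AbsolutePosition": ("Good", 0),
}
problems = failed_reads(sample)
```

An empty list means every sampled node read Good; any id in the list is a candidate for the step 5 log check.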

3. Non-blocking follow-ups

ALL FIXEDTREE FOLLOW-UPS (A–E) IMPLEMENTED 2026-06-26: design+plan 2026-06-26-otopcua-fixedtree-followups{-design,}.md; 16 commits c2c368dc..0074f37a on this branch (every task spec+code reviewed; offline suites green). Resolved:

  • Config-unchanged rebind now re-triggers discovery (TriggerRediscovery) — follow-up C.
  • Multi-device-per-driver implemented via EquipmentNode.DeviceHost partition; ≥1-authored-tag requirement lifted (driver-binding resolution) — follow-up E (projection-only, no migration / no artifact wire change).
  • Per-(re)connect re-discovery policy-gated (ITagDiscovery.RediscoverPolicy UntilStable/Once/Never; synchronous drivers → Once) — follow-up B.
  • Double SetDesiredSubscriptions per redeploy de-duped (one send per driver) — follow-up D.
  • Per-pass DiscoverAsync timeout made injectable — follow-up A.
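The policy gate from follow-up B, combined with the stop-on-stable, attempt-cap, and generation-guard ideas from the feature description, can be sketched as below. This is an illustrative Python model, not the actor code: run_discovery, GenerationGuard, and max_attempts are hypothetical names, and reading Never as "skip discovery entirely" is an assumption about the policy's semantics.

```python
from enum import Enum
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


class RediscoverPolicy(Enum):
    UNTIL_STABLE = "UntilStable"
    ONCE = "Once"
    NEVER = "Never"


class GenerationGuard:
    """Lets a handler drop timer ticks that belong to an older connection."""

    def __init__(self) -> None:
        self._generation = 0

    def bump(self) -> int:
        # Called on every (re)connect; returns the token for new ticks.
        self._generation += 1
        return self._generation

    def is_current(self, tick_generation: int) -> bool:
        return tick_generation == self._generation


def run_discovery(
    discover: Callable[[], T], policy: RediscoverPolicy, max_attempts: int = 5
) -> Tuple[Optional[T], int]:
    """Return (last result, number of passes) under the given policy."""
    if policy is RediscoverPolicy.NEVER:
        return None, 0  # assumption: Never means no discovery pass at all
    previous = discover()
    attempts = 1
    if policy is RediscoverPolicy.ONCE:
        return previous, attempts
    while attempts < max_attempts:
        current = discover()
        attempts += 1
        if current == previous:  # stop-on-stable: two equal passes in a row
            return current, attempts
        previous = current
    return previous, attempts  # attempt cap reached without stabilising
```

A driver whose tree fills in a moment after connect would stabilise on the pass that repeats the previous result, while a synchronous driver mapped to Once pays for a single pass.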

Still open (out of scope for the FixedTree follow-ups — separate cross-cutting work):

  • Cross-cutting (from symptom #1, all 3 apps): shared AddZbSerilog doesn't set the static Serilog.Log.Logger; AdminUI persists FOCAS config in formats (series-as-number, scheme-less host) the driver only now tolerates — reconcile at the AdminUI source.

Context that's easy to lose

  • 3 real defects were caught + fixed by the review chain during the build: DriverDataType.ToString() ≠ OPC type string (Float64 → "Double"); Server.ReportEvent under the node Lock (deadlock); ConfigureAwait(false) in the discovery handler (off-actor-context crash for async drivers like Galaxy sharing the node). All have regression tests.
  • The plan's Task-3 instruction "keep ReportEvent inside lock" was itself a defect; the plan doc was corrected.
  • The execution used subagent-driven-development (fresh implementer per task + spec/code reviews; high-risk tasks got Opus reviews, serial). Single-writer discipline was enforced (no concurrent dotnet builds → no obj/bin or git-index races).
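The first defect in the list above (an enum's own name standing in for the OPC type string) is easy to reintroduce, so a small illustration may help. The sketch below is Python, not the project's code: the enum members other than Float64 and the opc_type_name helper are assumptions, and only the Float64 to "Double" pairing comes from the defect description.

```python
from enum import Enum


class DriverDataType(Enum):
    # Hypothetical subset of driver-side type names.
    BOOLEAN = "Boolean"
    INT32 = "Int32"
    FLOAT64 = "Float64"
    STRING = "String"


# Explicit table: the OPC UA built-in type name is looked up, never derived
# from the enum's own name.
OPC_TYPE_NAME = {
    DriverDataType.BOOLEAN: "Boolean",
    DriverDataType.INT32: "Int32",
    DriverDataType.FLOAT64: "Double",  # the case the review chain caught
    DriverDataType.STRING: "String",
}


def opc_type_name(data_type: DriverDataType) -> str:
    """Map a driver data type to its OPC type string, failing loudly on gaps."""
    try:
        return OPC_TYPE_NAME[data_type]
    except KeyError:
        raise ValueError(f"no OPC type mapping for {data_type!r}") from None
```

Failing on an unmapped member, rather than falling back to the enum name, keeps a newly added driver type from silently producing an invalid OPC type string.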