From 53f1147bfc929d8167513a74ec8057d67802a2bb Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Thu, 18 Jun 2026 12:59:51 -0400 Subject: [PATCH] docs(plan): design for F10b vtag rebuild-skip + Client.UI Galaxy comment limitation --- ...vtag-skip-rebuild-galaxy-comment-design.md | 171 ++++++++++++++++++ 1 file changed, 171 insertions(+) create mode 100644 docs/plans/2026-06-18-f10b-vtag-skip-rebuild-galaxy-comment-design.md diff --git a/docs/plans/2026-06-18-f10b-vtag-skip-rebuild-galaxy-comment-design.md b/docs/plans/2026-06-18-f10b-vtag-skip-rebuild-galaxy-comment-design.md new file mode 100644 index 00000000..39453d21 --- /dev/null +++ b/docs/plans/2026-06-18-f10b-vtag-skip-rebuild-galaxy-comment-design.md @@ -0,0 +1,171 @@ +# F10b vtag-node-irrelevant rebuild skip + Client.UI Galaxy operator-comment limitation — Design + +**Date:** 2026-06-18 +**Branch:** `feat/f10b-vtag-skip-rebuild-galaxy-comment` (off master `be703a30`) +**Backlog items:** `stillpending.md` §A #11 (F10b in-place vs full rebuild) + #12 (Client.UI Galaxy operator-comment fallback) + +Two independent, low-risk backlog items. F10b is a *bounded* address-space-rebuild optimization (the narrow, provably-safe slice — NOT the broad surgical-write rewrite). The Client.UI item is an accepted-limitation close (doc/comment + a pinning test). They touch disjoint projects (OpcUaServer / Client.Shared). + +--- + +## Standing constraints (in force) + +- NO Commons/proto change · NO Core.Abstractions / breaking-interface change · NO EF migration · NO bUnit + (no Razor touched this phase). +- Stage by explicit path (never `git add .`); never stage `sql_login.txt` / `pki/` / `pending.md` / + `current.md` / `stillpending.md` / `docker-dev/docker-compose.yml`. No force-push / no `--no-verify`. +- Finish = merge to master + push. `dangerouslyDisableSandbox: true` for all build/test commands. + +--- + +## Component A — F10b: skip the address-space rebuild for vtag-node-irrelevant edits (backlog #11) + +### What "full rebuild" costs, and why H1a made it unconditional + +`OpcUaPublishActor.HandleRebuild` runs: `outcome = _applier.Apply(plan)` (which calls +`_sink.RebuildAddressSpace()` — a full tear-down of every folder/variable/alarm — iff `needsRebuild`), +then **unconditionally** re-runs the idempotent `Materialise*` passes. H1a/H1b (`1e95856b`/`ada01e1a`) +made `needsRebuild` fire on **any** Changed* (not just Added/Removed) to kill a silent stale-node bug — but +that means **editing a single VirtualTag's expression tears down and rebuilds the entire server address +space**, dropping every client's live values + monitored-item subscriptions server-wide. That blast radius +is the cost F10b targets. + +### The provably-safe slice (the only zero-node-mutation case) + +A VirtualTag's **materialised OPC UA node** (`Phase7Applier.MaterialiseEquipmentVirtualTags`, lines 264-272) +is derived **only** from `{EquipmentId, FolderPath, Name, DataType}` → +`EnsureVariable(parent/Name, Name, DataType, writable:false)`. `Expression`, `DependencyRefs`, and +`Historize` do **not** touch the node — `Historize` is explicitly write-side only (the code comment at +lines 260-263 says so; vtags are not materialised as SDK Historizing/HistoryRead variables). + +The VirtualTag **engine** picks up `Expression`/`DependencyRefs`/`Historize` changes through a **completely +independent path**: `DriverHostActor` (`:1062`) tells `VirtualTagHostActor` `ApplyVirtualTags(...)`, and +`VirtualTagHostActor.OnApply` (lines 97-109) stop+respawns any child "whose plan changed in place (edited +Expression/DependencyRefs, or a toggled flag)". This is **not** routed through `Phase7Applier`. + +Therefore, when a deploy's *only* changes are vtag deltas that differ *solely* in +`Expression`/`DependencyRefs`/`Historize`: +- the materialised node is byte-identical → `RebuildAddressSpace` would tear it down and rebuild an + identical node, dropping subscriptions for nothing; +- the idempotent `Materialise*` passes already short-circuit on an existing node + (`OtOpcUaNodeManager.EnsureVariable:1324` early-returns `if (_variables.ContainsKey(...))`), so they + no-op and preserve the live value + subscriptions; +- the engine respawns regardless via `VirtualTagHostActor`. + +So **skipping `RebuildAddressSpace` for this case is correct and preserves the H1a invariant** (no stale +node — because the node never needed to change). + +### Approach (Phase7Applier-only; NO node-manager / SDK / interface change) + +In `Phase7Applier.Apply`, refine `needsRebuild`. A vtag delta is **node-irrelevant** iff overriding the +three engine/write-side fields to the new values makes it equal to the new plan (a whitelist-of-may-differ +that **defaults to rebuild** for any other/unknown field, because `EquipmentVirtualTagPlan` has a custom +`Equals` comparing all 8 logical fields): + +```csharp +static bool VtagDeltaIsNodeIrrelevant(Phase7Plan.EquipmentVirtualTagDelta d) => + (d.Previous with + { + Expression = d.Current.Expression, + DependencyRefs = d.Current.DependencyRefs, + Historize = d.Current.Historize, + }).Equals(d.Current); +``` + +`needsRebuild` becomes true iff there is any structural change OR any vtag delta that is **not** +node-irrelevant: + +```csharp +var needsRebuild = + plan.AddedEquipment.Count > 0 || plan.RemovedEquipment.Count > 0 || plan.ChangedEquipment.Count > 0 || + plan.AddedAlarms.Count > 0 || plan.RemovedAlarms.Count > 0 || plan.ChangedAlarms.Count > 0 || + plan.AddedEquipmentTags.Count > 0 || plan.RemovedEquipmentTags.Count > 0 || plan.ChangedEquipmentTags.Count > 0 || + plan.AddedEquipmentVirtualTags.Count > 0 || plan.RemovedEquipmentVirtualTags.Count > 0 || + plan.ChangedEquipmentVirtualTags.Any(d => !VtagDeltaIsNodeIrrelevant(d)); +``` + +i.e. `needsRebuild` is false **only** when every effective change is a node-irrelevant vtag edit and +nothing else changed. `ChangedNodes` (the outcome count) stays as-is (it still counts the vtag change as a +change — it just doesn't force a rebuild). Update the lines 84-91 comment to document the new skip + the +safety reasoning. The downstream `Materialise*` passes and the `VirtualTagHostActor` respawn are unchanged. + +### Out of scope (deferred, documented) + +Surgical *in-place attribute writes* for equipment-tag property changes (`DataType`, `Writable`, +`IsHistorized`) and alarm-severity-only changes (`SetSeverity`) genuinely mutate node attributes and carry +the H1a stale-node risk + SDK handler/historian-lifecycle complexity. They are NOT in this slice — F10b here +is strictly the zero-node-mutation vtag skip. Record them as follow-ups. + +### Testing (offline, no SDK) + +`Phase7ApplierTests` already uses a `RecordingSink` that captures `RebuildAddressSpace` calls. Add: +- vtag delta differing **only in Expression** ⇒ `outcome.RebuildCalled == false`, `RecordingSink` saw **no** + `RebuildAddressSpace`, `outcome.ChangedNodes == 1`. +- same for a **DependencyRefs-only** and a **Historize-only** delta ⇒ no rebuild. +- vtag delta where **DataType also changed** ⇒ `RebuildCalled == true` (the safety boundary). +- vtag delta where **Name (or FolderPath) changed** ⇒ `RebuildCalled == true`. +- a node-irrelevant vtag delta **mixed with** any other change (e.g. a ChangedEquipmentTag, or an added + equipment) ⇒ `RebuildCalled == true` (the skip requires the vtag edit to be the *sole* change). + +### Live `/run` (optional, confirmatory) + +On docker-dev (rebuild BOTH central nodes per the `:9200` round-robin fact): subscribe a client to a vtag, +edit only its expression in `/uns`, Deploy → server log shows `rebuild=False` for that deploy and the +client subscription is NOT dropped (vs `rebuild=True` when a tag's DataType changes). This is confirmatory; +the unit tests pin the logic. + +--- + +## Component B — Client.UI Galaxy operator-comment fallback (backlog #12) + +### Verdict: accepted limitation, not a code-fixable bug + +The operator comment is **genuinely unrecoverable** in the Galaxy sub-attribute fallback path +(`OpcUaClientService.cs:769-820`): +- the standard OPC UA event SelectClause (`CreateAlarmEventFilter`, ~13 fields) does **not** request an + AckComment/Comment field — Part 9 conditions don't carry the operator comment in the event payload here; +- Galaxy's sub-attribute convention (`AlarmRefBuilder.cs:15-20`) exposes `AckMsg` as **write-only** — there + is no readable attribute to recover the last operator comment; +- the comment **is** available via the gateway's **native** alarm feed + (`GatewayGalaxyAlarmFeed` → `GalaxyAlarmTransition.OperatorComment`), which the Galaxy *driver* already + uses — a different path from this OPC UA *client* fallback. + +So a real fix would require a server-side change (out of scope); the backlog already labels this "accepted +limitation." `AlarmEventArgs.cs:82-87` partly documents it, but the client-side extraction omission is +unexplained, and the behavior is untested. + +### Approach (doc/comment + a pinning test; NO behavioral change) + +- **Inline comment** at the fallback `AlarmEvent?.Invoke(...)` site in `OpcUaClientService.cs` (~:817): + explain that `OperatorComment` is intentionally left null here — Galaxy `AckMsg` is write-only and the + OPC UA event SelectClause carries no comment field, so the comment is only available via the gateway's + native alarm feed (the driver path), not this client fallback. +- **Tighten the `AlarmEventArgs.cs:82-87` doc** to name the two concrete reasons (write-only `AckMsg` + + no event-field), so it reads as an intentional limitation, not a TODO. +- **Pinning test** in `OpcUaClientServiceTests` (mirror the existing + `OnAlarmEvent_MissingAckedActiveButHasConditionNode_FallbackReadsAndRaisesEvent` fixture): drive the + fallback path and assert the raised `AlarmEventArgs.OperatorComment` is null — converting the gap into a + tested, intentional limitation. + +### Testing + +Fully offline — the existing `OpcUaClientServiceTests` fakes (`FakeSessionAdapter`/`FakeSubscriptionAdapter`) +already drive the fallback path; the new test asserts the documented behavior. + +--- + +## Task slicing (independent → parallelizable) + +| Task | Component | Project | Class | Parallel with | +|---|---|---|---|---| +| T1 | F10b vtag-node-irrelevant rebuild skip + tests | OpcUaServer (+ .Tests) | small | T2 | +| T2 | Client.UI Galaxy comment: inline comment + doc + pinning test | Client.Shared (+ .Tests) | trivial→small | T1 | +| T3 | Reconcile stillpending #11/#12 + memory + finish (build, tests, merge+push) | docs (never-staged) | small | none | + +T1/T2 touch disjoint projects → dispatchable concurrently (worktree isolation or serial-on-shared-tree per +the git-index-race lesson). T3 runs last. + +## Done = + +Build clean + `dotnet test` green (OpcUaServer.Tests + Client.Shared.Tests) + backlog/memory reconciled + +merged to master + pushed.