9.9 KiB
F10b vtag-node-irrelevant rebuild skip + Client.UI Galaxy operator-comment limitation — Design
Date: 2026-06-18
Branch: feat/f10b-vtag-skip-rebuild-galaxy-comment (off master be703a30)
Backlog items: stillpending.md §A #11 (F10b in-place vs full rebuild) + #12 (Client.UI Galaxy operator-comment fallback)
Two independent, low-risk backlog items. F10b is a bounded address-space-rebuild optimization (the narrow, provably-safe slice — NOT the broad surgical-write rewrite). The Client.UI item is an accepted-limitation close (doc/comment + a pinning test). They touch disjoint projects (OpcUaServer / Client.Shared).
Standing constraints (in force)
- NO Commons/proto change · NO Core.Abstractions / breaking-interface change · NO EF migration · NO bUnit (no Razor touched this phase).
- Stage by explicit path (never
git add .); never stagesql_login.txt/pki//pending.md/current.md/stillpending.md/docker-dev/docker-compose.yml. No force-push / no--no-verify. - Finish = merge to master + push.
dangerouslyDisableSandbox: truefor all build/test commands.
Component A — F10b: skip the address-space rebuild for vtag-node-irrelevant edits (backlog #11)
What "full rebuild" costs, and why H1a made it unconditional
OpcUaPublishActor.HandleRebuild runs: outcome = _applier.Apply(plan) (which calls
_sink.RebuildAddressSpace() — a full tear-down of every folder/variable/alarm — iff needsRebuild),
then unconditionally re-runs the idempotent Materialise* passes. H1a/H1b (1e95856b/ada01e1a)
made needsRebuild fire on any Changed* (not just Added/Removed) to kill a silent stale-node bug — but
that means editing a single VirtualTag's expression tears down and rebuilds the entire server address
space, dropping every client's live values + monitored-item subscriptions server-wide. That blast radius
is the cost F10b targets.
The provably-safe slice (the only zero-node-mutation case)
A VirtualTag's materialised OPC UA node (Phase7Applier.MaterialiseEquipmentVirtualTags, lines 264-272)
is derived only from {EquipmentId, FolderPath, Name, DataType} →
EnsureVariable(parent/Name, Name, DataType, writable:false). Expression, DependencyRefs, and
Historize do not touch the node — Historize is explicitly write-side only (the code comment at
lines 260-263 says so; vtags are not materialised as SDK Historizing/HistoryRead variables).
The VirtualTag engine picks up Expression/DependencyRefs/Historize changes through a completely
independent path: DriverHostActor (:1062) tells VirtualTagHostActor ApplyVirtualTags(...), and
VirtualTagHostActor.OnApply (lines 97-109) stop+respawns any child "whose plan changed in place (edited
Expression/DependencyRefs, or a toggled flag)". This is not routed through Phase7Applier.
Therefore, when a deploy's only changes are vtag deltas that differ solely in
Expression/DependencyRefs/Historize:
- the materialised node is byte-identical →
RebuildAddressSpacewould tear it down and rebuild an identical node, dropping subscriptions for nothing; - the idempotent
Materialise*passes already short-circuit on an existing node (OtOpcUaNodeManager.EnsureVariable:1324early-returnsif (_variables.ContainsKey(...))), so they no-op and preserve the live value + subscriptions; - the engine respawns regardless via
VirtualTagHostActor.
So skipping RebuildAddressSpace for this case is correct and preserves the H1a invariant (no stale
node — because the node never needed to change).
Approach (Phase7Applier-only; NO node-manager / SDK / interface change)
In Phase7Applier.Apply, refine needsRebuild. A vtag delta is node-irrelevant iff overriding the
three engine/write-side fields to the new values makes it equal to the new plan (a whitelist-of-may-differ
that defaults to rebuild for any other/unknown field, because EquipmentVirtualTagPlan has a custom
Equals comparing all 8 logical fields):
static bool VtagDeltaIsNodeIrrelevant(Phase7Plan.EquipmentVirtualTagDelta d) =>
(d.Previous with
{
Expression = d.Current.Expression,
DependencyRefs = d.Current.DependencyRefs,
Historize = d.Current.Historize,
}).Equals(d.Current);
needsRebuild becomes true iff there is any structural change OR any vtag delta that is not
node-irrelevant:
var needsRebuild =
plan.AddedEquipment.Count > 0 || plan.RemovedEquipment.Count > 0 || plan.ChangedEquipment.Count > 0 ||
plan.AddedAlarms.Count > 0 || plan.RemovedAlarms.Count > 0 || plan.ChangedAlarms.Count > 0 ||
plan.AddedEquipmentTags.Count > 0 || plan.RemovedEquipmentTags.Count > 0 || plan.ChangedEquipmentTags.Count > 0 ||
plan.AddedEquipmentVirtualTags.Count > 0 || plan.RemovedEquipmentVirtualTags.Count > 0 ||
plan.ChangedEquipmentVirtualTags.Any(d => !VtagDeltaIsNodeIrrelevant(d));
i.e. needsRebuild is false only when every effective change is a node-irrelevant vtag edit and
nothing else changed. ChangedNodes (the outcome count) stays as-is (it still counts the vtag change as a
change — it just doesn't force a rebuild). Update the lines 84-91 comment to document the new skip + the
safety reasoning. The downstream Materialise* passes and the VirtualTagHostActor respawn are unchanged.
Out of scope (deferred, documented)
Surgical in-place attribute writes for equipment-tag property changes (DataType, Writable,
IsHistorized) and alarm-severity-only changes (SetSeverity) genuinely mutate node attributes and carry
the H1a stale-node risk + SDK handler/historian-lifecycle complexity. They are NOT in this slice — F10b here
is strictly the zero-node-mutation vtag skip. Record them as follow-ups.
Testing (offline, no SDK)
Phase7ApplierTests already uses a RecordingSink that captures RebuildAddressSpace calls. Add:
- vtag delta differing only in Expression ⇒
outcome.RebuildCalled == false,RecordingSinksaw noRebuildAddressSpace,outcome.ChangedNodes == 1. - same for a DependencyRefs-only and a Historize-only delta ⇒ no rebuild.
- vtag delta where DataType also changed ⇒
RebuildCalled == true(the safety boundary). - vtag delta where Name (or FolderPath) changed ⇒
RebuildCalled == true. - a node-irrelevant vtag delta mixed with any other change (e.g. a ChangedEquipmentTag, or an added
equipment) ⇒
RebuildCalled == true(the skip requires the vtag edit to be the sole change).
Live /run (optional, confirmatory)
On docker-dev (rebuild BOTH central nodes per the :9200 round-robin fact): subscribe a client to a vtag,
edit only its expression in /uns, Deploy → server log shows rebuild=False for that deploy and the
client subscription is NOT dropped (vs rebuild=True when a tag's DataType changes). This is confirmatory;
the unit tests pin the logic.
Component B — Client.UI Galaxy operator-comment fallback (backlog #12)
Verdict: accepted limitation, not a code-fixable bug
The operator comment is genuinely unrecoverable in the Galaxy sub-attribute fallback path
(OpcUaClientService.cs:769-820):
- the standard OPC UA event SelectClause (
CreateAlarmEventFilter, ~13 fields) does not request an AckComment/Comment field — Part 9 conditions don't carry the operator comment in the event payload here; - Galaxy's sub-attribute convention (
AlarmRefBuilder.cs:15-20) exposesAckMsgas write-only — there is no readable attribute to recover the last operator comment; - the comment is available via the gateway's native alarm feed
(
GatewayGalaxyAlarmFeed→GalaxyAlarmTransition.OperatorComment), which the Galaxy driver already uses — a different path from this OPC UA client fallback.
So a real fix would require a server-side change (out of scope); the backlog already labels this "accepted
limitation." AlarmEventArgs.cs:82-87 partly documents it, but the client-side extraction omission is
unexplained, and the behavior is untested.
Approach (doc/comment + a pinning test; NO behavioral change)
- Inline comment at the fallback
AlarmEvent?.Invoke(...)site inOpcUaClientService.cs(~:817): explain thatOperatorCommentis intentionally left null here — GalaxyAckMsgis write-only and the OPC UA event SelectClause carries no comment field, so the comment is only available via the gateway's native alarm feed (the driver path), not this client fallback. - Tighten the
AlarmEventArgs.cs:82-87doc to name the two concrete reasons (write-onlyAckMsg+ no event-field), so it reads as an intentional limitation, not a TODO. - Pinning test in
OpcUaClientServiceTests(mirror the existingOnAlarmEvent_MissingAckedActiveButHasConditionNode_FallbackReadsAndRaisesEventfixture): drive the fallback path and assert the raisedAlarmEventArgs.OperatorCommentis null — converting the gap into a tested, intentional limitation.
Testing
Fully offline — the existing OpcUaClientServiceTests fakes (FakeSessionAdapter/FakeSubscriptionAdapter)
already drive the fallback path; the new test asserts the documented behavior.
Task slicing (independent → parallelizable)
| Task | Component | Project | Class | Parallel with |
|---|---|---|---|---|
| T1 | F10b vtag-node-irrelevant rebuild skip + tests | OpcUaServer (+ .Tests) | small | T2 |
| T2 | Client.UI Galaxy comment: inline comment + doc + pinning test | Client.Shared (+ .Tests) | trivial→small | T1 |
| T3 | Reconcile stillpending #11/#12 + memory + finish (build, tests, merge+push) | docs (never-staged) | small | none |
T1/T2 touch disjoint projects → dispatchable concurrently (worktree isolation or serial-on-shared-tree per the git-index-race lesson). T3 runs last.
Done =
Build clean + dotnet test green (OpcUaServer.Tests + Client.Shared.Tests) + backlog/memory reconciled +
merged to master + pushed.