docs(plan): design for F10b vtag rebuild-skip + Client.UI Galaxy comment limitation

This commit is contained in:
Joseph Doherty
2026-06-18 12:59:51 -04:00
parent be703a3026
commit 53f1147bfc
@@ -0,0 +1,171 @@
# F10b vtag-node-irrelevant rebuild skip + Client.UI Galaxy operator-comment limitation — Design
**Date:** 2026-06-18
**Branch:** `feat/f10b-vtag-skip-rebuild-galaxy-comment` (off master `be703a30`)
**Backlog items:** `stillpending.md` §A #11 (F10b in-place vs full rebuild) + #12 (Client.UI Galaxy operator-comment fallback)
Two independent, low-risk backlog items. F10b is a *bounded* address-space-rebuild optimization (the narrow, provably-safe slice — NOT the broad surgical-write rewrite). The Client.UI item is an accepted-limitation close (doc/comment + a pinning test). They touch disjoint projects (OpcUaServer / Client.Shared).
---
## Standing constraints (in force)
- NO Commons/proto change · NO Core.Abstractions / breaking-interface change · NO EF migration · NO bUnit
(no Razor touched this phase).
- Stage by explicit path (never `git add .`); never stage `sql_login.txt` / `pki/` / `pending.md` /
`current.md` / `stillpending.md` / `docker-dev/docker-compose.yml`. No force-push / no `--no-verify`.
- Finish = merge to master + push. `dangerouslyDisableSandbox: true` for all build/test commands.
---
## Component A — F10b: skip the address-space rebuild for vtag-node-irrelevant edits (backlog #11)
### What "full rebuild" costs, and why H1a made it unconditional
`OpcUaPublishActor.HandleRebuild` runs: `outcome = _applier.Apply(plan)` (which calls
`_sink.RebuildAddressSpace()` — a full tear-down of every folder/variable/alarm — iff `needsRebuild`),
then **unconditionally** re-runs the idempotent `Materialise*` passes. H1a/H1b (`1e95856b`/`ada01e1a`)
made `needsRebuild` fire on **any** Changed* (not just Added/Removed) to kill a silent stale-node bug — but
that means **editing a single VirtualTag's expression tears down and rebuilds the entire server address
space**, dropping every client's live values + monitored-item subscriptions server-wide. That blast radius
is the cost F10b targets.
### The provably-safe slice (the only zero-node-mutation case)
A VirtualTag's **materialised OPC UA node** (`Phase7Applier.MaterialiseEquipmentVirtualTags`, lines 264-272)
is derived **only** from `{EquipmentId, FolderPath, Name, DataType}`
`EnsureVariable(parent/Name, Name, DataType, writable:false)`. `Expression`, `DependencyRefs`, and
`Historize` do **not** touch the node — `Historize` is explicitly write-side only (the code comment at
lines 260-263 says so; vtags are not materialised as SDK Historizing/HistoryRead variables).
The VirtualTag **engine** picks up `Expression`/`DependencyRefs`/`Historize` changes through a **completely
independent path**: `DriverHostActor` (`:1062`) tells `VirtualTagHostActor` `ApplyVirtualTags(...)`, and
`VirtualTagHostActor.OnApply` (lines 97-109) stop+respawns any child "whose plan changed in place (edited
Expression/DependencyRefs, or a toggled flag)". This is **not** routed through `Phase7Applier`.
Therefore, when a deploy's *only* changes are vtag deltas that differ *solely* in
`Expression`/`DependencyRefs`/`Historize`:
- the materialised node is byte-identical → `RebuildAddressSpace` would tear it down and rebuild an
identical node, dropping subscriptions for nothing;
- the idempotent `Materialise*` passes already short-circuit on an existing node
(`OtOpcUaNodeManager.EnsureVariable:1324` early-returns `if (_variables.ContainsKey(...))`), so they
no-op and preserve the live value + subscriptions;
- the engine respawns regardless via `VirtualTagHostActor`.
So **skipping `RebuildAddressSpace` for this case is correct and preserves the H1a invariant** (no stale
node — because the node never needed to change).
### Approach (Phase7Applier-only; NO node-manager / SDK / interface change)
In `Phase7Applier.Apply`, refine `needsRebuild`. A vtag delta is **node-irrelevant** iff overriding the
three engine/write-side fields to the new values makes it equal to the new plan (a whitelist-of-may-differ
that **defaults to rebuild** for any other/unknown field, because `EquipmentVirtualTagPlan` has a custom
`Equals` comparing all 8 logical fields):
```csharp
static bool VtagDeltaIsNodeIrrelevant(Phase7Plan.EquipmentVirtualTagDelta d) =>
(d.Previous with
{
Expression = d.Current.Expression,
DependencyRefs = d.Current.DependencyRefs,
Historize = d.Current.Historize,
}).Equals(d.Current);
```
`needsRebuild` becomes true iff there is any structural change OR any vtag delta that is **not**
node-irrelevant:
```csharp
var needsRebuild =
plan.AddedEquipment.Count > 0 || plan.RemovedEquipment.Count > 0 || plan.ChangedEquipment.Count > 0 ||
plan.AddedAlarms.Count > 0 || plan.RemovedAlarms.Count > 0 || plan.ChangedAlarms.Count > 0 ||
plan.AddedEquipmentTags.Count > 0 || plan.RemovedEquipmentTags.Count > 0 || plan.ChangedEquipmentTags.Count > 0 ||
plan.AddedEquipmentVirtualTags.Count > 0 || plan.RemovedEquipmentVirtualTags.Count > 0 ||
plan.ChangedEquipmentVirtualTags.Any(d => !VtagDeltaIsNodeIrrelevant(d));
```
i.e. `needsRebuild` is false **only** when every effective change is a node-irrelevant vtag edit and
nothing else changed. `ChangedNodes` (the outcome count) stays as-is (it still counts the vtag change as a
change — it just doesn't force a rebuild). Update the lines 84-91 comment to document the new skip + the
safety reasoning. The downstream `Materialise*` passes and the `VirtualTagHostActor` respawn are unchanged.
### Out of scope (deferred, documented)
Surgical *in-place attribute writes* for equipment-tag property changes (`DataType`, `Writable`,
`IsHistorized`) and alarm-severity-only changes (`SetSeverity`) genuinely mutate node attributes and carry
the H1a stale-node risk + SDK handler/historian-lifecycle complexity. They are NOT in this slice — F10b here
is strictly the zero-node-mutation vtag skip. Record them as follow-ups.
### Testing (offline, no SDK)
`Phase7ApplierTests` already uses a `RecordingSink` that captures `RebuildAddressSpace` calls. Add:
- vtag delta differing **only in Expression**`outcome.RebuildCalled == false`, `RecordingSink` saw **no**
`RebuildAddressSpace`, `outcome.ChangedNodes == 1`.
- same for a **DependencyRefs-only** and a **Historize-only** delta ⇒ no rebuild.
- vtag delta where **DataType also changed**`RebuildCalled == true` (the safety boundary).
- vtag delta where **Name (or FolderPath) changed**`RebuildCalled == true`.
- a node-irrelevant vtag delta **mixed with** any other change (e.g. a ChangedEquipmentTag, or an added
equipment) ⇒ `RebuildCalled == true` (the skip requires the vtag edit to be the *sole* change).
### Live `/run` (optional, confirmatory)
On docker-dev (rebuild BOTH central nodes per the `:9200` round-robin fact): subscribe a client to a vtag,
edit only its expression in `/uns`, Deploy → server log shows `rebuild=False` for that deploy and the
client subscription is NOT dropped (vs `rebuild=True` when a tag's DataType changes). This is confirmatory;
the unit tests pin the logic.
---
## Component B — Client.UI Galaxy operator-comment fallback (backlog #12)
### Verdict: accepted limitation, not a code-fixable bug
The operator comment is **genuinely unrecoverable** in the Galaxy sub-attribute fallback path
(`OpcUaClientService.cs:769-820`):
- the standard OPC UA event SelectClause (`CreateAlarmEventFilter`, ~13 fields) does **not** request an
AckComment/Comment field — Part 9 conditions don't carry the operator comment in the event payload here;
- Galaxy's sub-attribute convention (`AlarmRefBuilder.cs:15-20`) exposes `AckMsg` as **write-only** — there
is no readable attribute to recover the last operator comment;
- the comment **is** available via the gateway's **native** alarm feed
(`GatewayGalaxyAlarmFeed``GalaxyAlarmTransition.OperatorComment`), which the Galaxy *driver* already
uses — a different path from this OPC UA *client* fallback.
So a real fix would require a server-side change (out of scope); the backlog already labels this "accepted
limitation." `AlarmEventArgs.cs:82-87` partly documents it, but the client-side extraction omission is
unexplained, and the behavior is untested.
### Approach (doc/comment + a pinning test; NO behavioral change)
- **Inline comment** at the fallback `AlarmEvent?.Invoke(...)` site in `OpcUaClientService.cs` (~:817):
explain that `OperatorComment` is intentionally left null here — Galaxy `AckMsg` is write-only and the
OPC UA event SelectClause carries no comment field, so the comment is only available via the gateway's
native alarm feed (the driver path), not this client fallback.
- **Tighten the `AlarmEventArgs.cs:82-87` doc** to name the two concrete reasons (write-only `AckMsg` +
no event-field), so it reads as an intentional limitation, not a TODO.
- **Pinning test** in `OpcUaClientServiceTests` (mirror the existing
`OnAlarmEvent_MissingAckedActiveButHasConditionNode_FallbackReadsAndRaisesEvent` fixture): drive the
fallback path and assert the raised `AlarmEventArgs.OperatorComment` is null — converting the gap into a
tested, intentional limitation.
### Testing
Fully offline — the existing `OpcUaClientServiceTests` fakes (`FakeSessionAdapter`/`FakeSubscriptionAdapter`)
already drive the fallback path; the new test asserts the documented behavior.
---
## Task slicing (independent → parallelizable)
| Task | Component | Project | Class | Parallel with |
|---|---|---|---|---|
| T1 | F10b vtag-node-irrelevant rebuild skip + tests | OpcUaServer (+ .Tests) | small | T2 |
| T2 | Client.UI Galaxy comment: inline comment + doc + pinning test | Client.Shared (+ .Tests) | trivial→small | T1 |
| T3 | Reconcile stillpending #11/#12 + memory + finish (build, tests, merge+push) | docs (never-staged) | small | none |
T1/T2 touch disjoint projects → dispatchable concurrently (worktree isolation or serial-on-shared-tree per
the git-index-race lesson). T3 runs last.
## Done =
Build clean + `dotnet test` green (OpcUaServer.Tests + Client.Shared.Tests) + backlog/memory reconciled +
merged to master + pushed.