docs(plan): design for F10b vtag rebuild-skip + Client.UI Galaxy comment limitation
This commit is contained in:
@@ -0,0 +1,171 @@
|
||||
# F10b vtag-node-irrelevant rebuild skip + Client.UI Galaxy operator-comment limitation — Design
|
||||
|
||||
**Date:** 2026-06-18
|
||||
**Branch:** `feat/f10b-vtag-skip-rebuild-galaxy-comment` (off master `be703a30`)
|
||||
**Backlog items:** `stillpending.md` §A #11 (F10b in-place vs full rebuild) + #12 (Client.UI Galaxy operator-comment fallback)
|
||||
|
||||
Two independent, low-risk backlog items. F10b is a *bounded* address-space-rebuild optimization (the narrow, provably-safe slice — NOT the broad surgical-write rewrite). The Client.UI item is an accepted-limitation close (doc/comment + a pinning test). They touch disjoint projects (OpcUaServer / Client.Shared).
|
||||
|
||||
---
|
||||
|
||||
## Standing constraints (in force)
|
||||
|
||||
- NO Commons/proto change · NO Core.Abstractions / breaking-interface change · NO EF migration · NO bUnit
|
||||
(no Razor touched this phase).
|
||||
- Stage by explicit path (never `git add .`); never stage `sql_login.txt` / `pki/` / `pending.md` /
|
||||
`current.md` / `stillpending.md` / `docker-dev/docker-compose.yml`. No force-push / no `--no-verify`.
|
||||
- Finish = merge to master + push. `dangerouslyDisableSandbox: true` for all build/test commands.
|
||||
|
||||
---
|
||||
|
||||
## Component A — F10b: skip the address-space rebuild for vtag-node-irrelevant edits (backlog #11)
|
||||
|
||||
### What "full rebuild" costs, and why H1a made it unconditional
|
||||
|
||||
`OpcUaPublishActor.HandleRebuild` runs: `outcome = _applier.Apply(plan)` (which calls
|
||||
`_sink.RebuildAddressSpace()` — a full tear-down of every folder/variable/alarm — iff `needsRebuild`),
|
||||
then **unconditionally** re-runs the idempotent `Materialise*` passes. H1a/H1b (`1e95856b`/`ada01e1a`)
|
||||
made `needsRebuild` fire on **any** Changed* (not just Added/Removed) to kill a silent stale-node bug — but
|
||||
that means **editing a single VirtualTag's expression tears down and rebuilds the entire server address
|
||||
space**, dropping every client's live values + monitored-item subscriptions server-wide. That blast radius
|
||||
is the cost F10b targets.
|
||||
|
||||
### The provably-safe slice (the only zero-node-mutation case)
|
||||
|
||||
A VirtualTag's **materialised OPC UA node** (`Phase7Applier.MaterialiseEquipmentVirtualTags`, lines 264-272)
|
||||
is derived **only** from `{EquipmentId, FolderPath, Name, DataType}` →
|
||||
`EnsureVariable(parent/Name, Name, DataType, writable:false)`. `Expression`, `DependencyRefs`, and
|
||||
`Historize` do **not** touch the node — `Historize` is explicitly write-side only (the code comment at
|
||||
lines 260-263 says so; vtags are not materialised as SDK Historizing/HistoryRead variables).
|
||||
|
||||
The VirtualTag **engine** picks up `Expression`/`DependencyRefs`/`Historize` changes through a **completely
|
||||
independent path**: `DriverHostActor` (`:1062`) tells `VirtualTagHostActor` `ApplyVirtualTags(...)`, and
|
||||
`VirtualTagHostActor.OnApply` (lines 97-109) stop+respawns any child "whose plan changed in place (edited
|
||||
Expression/DependencyRefs, or a toggled flag)". This is **not** routed through `Phase7Applier`.
|
||||
|
||||
Therefore, when a deploy's *only* changes are vtag deltas that differ *solely* in
|
||||
`Expression`/`DependencyRefs`/`Historize`:
|
||||
- the materialised node is byte-identical → `RebuildAddressSpace` would tear it down and rebuild an
|
||||
identical node, dropping subscriptions for nothing;
|
||||
- the idempotent `Materialise*` passes already short-circuit on an existing node
|
||||
(`OtOpcUaNodeManager.EnsureVariable:1324` early-returns `if (_variables.ContainsKey(...))`), so they
|
||||
no-op and preserve the live value + subscriptions;
|
||||
- the engine respawns regardless via `VirtualTagHostActor`.
|
||||
|
||||
So **skipping `RebuildAddressSpace` for this case is correct and preserves the H1a invariant** (no stale
|
||||
node — because the node never needed to change).
|
||||
|
||||
### Approach (Phase7Applier-only; NO node-manager / SDK / interface change)
|
||||
|
||||
In `Phase7Applier.Apply`, refine `needsRebuild`. A vtag delta is **node-irrelevant** iff overriding the
|
||||
three engine/write-side fields to the new values makes it equal to the new plan (a whitelist-of-may-differ
|
||||
that **defaults to rebuild** for any other/unknown field, because `EquipmentVirtualTagPlan` has a custom
|
||||
`Equals` comparing all 8 logical fields):
|
||||
|
||||
```csharp
|
||||
static bool VtagDeltaIsNodeIrrelevant(Phase7Plan.EquipmentVirtualTagDelta d) =>
|
||||
(d.Previous with
|
||||
{
|
||||
Expression = d.Current.Expression,
|
||||
DependencyRefs = d.Current.DependencyRefs,
|
||||
Historize = d.Current.Historize,
|
||||
}).Equals(d.Current);
|
||||
```
|
||||
|
||||
`needsRebuild` becomes true iff there is any structural change OR any vtag delta that is **not**
|
||||
node-irrelevant:
|
||||
|
||||
```csharp
|
||||
var needsRebuild =
|
||||
plan.AddedEquipment.Count > 0 || plan.RemovedEquipment.Count > 0 || plan.ChangedEquipment.Count > 0 ||
|
||||
plan.AddedAlarms.Count > 0 || plan.RemovedAlarms.Count > 0 || plan.ChangedAlarms.Count > 0 ||
|
||||
plan.AddedEquipmentTags.Count > 0 || plan.RemovedEquipmentTags.Count > 0 || plan.ChangedEquipmentTags.Count > 0 ||
|
||||
plan.AddedEquipmentVirtualTags.Count > 0 || plan.RemovedEquipmentVirtualTags.Count > 0 ||
|
||||
plan.ChangedEquipmentVirtualTags.Any(d => !VtagDeltaIsNodeIrrelevant(d));
|
||||
```
|
||||
|
||||
i.e. `needsRebuild` is false **only** when every effective change is a node-irrelevant vtag edit and
|
||||
nothing else changed. `ChangedNodes` (the outcome count) stays as-is (it still counts the vtag change as a
|
||||
change — it just doesn't force a rebuild). Update the lines 84-91 comment to document the new skip + the
|
||||
safety reasoning. The downstream `Materialise*` passes and the `VirtualTagHostActor` respawn are unchanged.
|
||||
|
||||
### Out of scope (deferred, documented)
|
||||
|
||||
Surgical *in-place attribute writes* for equipment-tag property changes (`DataType`, `Writable`,
|
||||
`IsHistorized`) and alarm-severity-only changes (`SetSeverity`) genuinely mutate node attributes and carry
|
||||
the H1a stale-node risk + SDK handler/historian-lifecycle complexity. They are NOT in this slice — F10b here
|
||||
is strictly the zero-node-mutation vtag skip. Record them as follow-ups.
|
||||
|
||||
### Testing (offline, no SDK)
|
||||
|
||||
`Phase7ApplierTests` already uses a `RecordingSink` that captures `RebuildAddressSpace` calls. Add:
|
||||
- vtag delta differing **only in Expression** ⇒ `outcome.RebuildCalled == false`, `RecordingSink` saw **no**
|
||||
`RebuildAddressSpace`, `outcome.ChangedNodes == 1`.
|
||||
- same for a **DependencyRefs-only** and a **Historize-only** delta ⇒ no rebuild.
|
||||
- vtag delta where **DataType also changed** ⇒ `RebuildCalled == true` (the safety boundary).
|
||||
- vtag delta where **Name (or FolderPath) changed** ⇒ `RebuildCalled == true`.
|
||||
- a node-irrelevant vtag delta **mixed with** any other change (e.g. a ChangedEquipmentTag, or an added
|
||||
equipment) ⇒ `RebuildCalled == true` (the skip requires the vtag edit to be the *sole* change).
|
||||
|
||||
### Live `/run` (optional, confirmatory)
|
||||
|
||||
On docker-dev (rebuild BOTH central nodes per the `:9200` round-robin fact): subscribe a client to a vtag,
|
||||
edit only its expression in `/uns`, Deploy → server log shows `rebuild=False` for that deploy and the
|
||||
client subscription is NOT dropped (vs `rebuild=True` when a tag's DataType changes). This is confirmatory;
|
||||
the unit tests pin the logic.
|
||||
|
||||
---
|
||||
|
||||
## Component B — Client.UI Galaxy operator-comment fallback (backlog #12)
|
||||
|
||||
### Verdict: accepted limitation, not a code-fixable bug
|
||||
|
||||
The operator comment is **genuinely unrecoverable** in the Galaxy sub-attribute fallback path
|
||||
(`OpcUaClientService.cs:769-820`):
|
||||
- the standard OPC UA event SelectClause (`CreateAlarmEventFilter`, ~13 fields) does **not** request an
|
||||
AckComment/Comment field — Part 9 conditions don't carry the operator comment in the event payload here;
|
||||
- Galaxy's sub-attribute convention (`AlarmRefBuilder.cs:15-20`) exposes `AckMsg` as **write-only** — there
|
||||
is no readable attribute to recover the last operator comment;
|
||||
- the comment **is** available via the gateway's **native** alarm feed
|
||||
(`GatewayGalaxyAlarmFeed` → `GalaxyAlarmTransition.OperatorComment`), which the Galaxy *driver* already
|
||||
uses — a different path from this OPC UA *client* fallback.
|
||||
|
||||
So a real fix would require a server-side change (out of scope); the backlog already labels this "accepted
|
||||
limitation." `AlarmEventArgs.cs:82-87` partly documents it, but the client-side extraction omission is
|
||||
unexplained, and the behavior is untested.
|
||||
|
||||
### Approach (doc/comment + a pinning test; NO behavioral change)
|
||||
|
||||
- **Inline comment** at the fallback `AlarmEvent?.Invoke(...)` site in `OpcUaClientService.cs` (~:817):
|
||||
explain that `OperatorComment` is intentionally left null here — Galaxy `AckMsg` is write-only and the
|
||||
OPC UA event SelectClause carries no comment field, so the comment is only available via the gateway's
|
||||
native alarm feed (the driver path), not this client fallback.
|
||||
- **Tighten the `AlarmEventArgs.cs:82-87` doc** to name the two concrete reasons (write-only `AckMsg` +
|
||||
no event-field), so it reads as an intentional limitation, not a TODO.
|
||||
- **Pinning test** in `OpcUaClientServiceTests` (mirror the existing
|
||||
`OnAlarmEvent_MissingAckedActiveButHasConditionNode_FallbackReadsAndRaisesEvent` fixture): drive the
|
||||
fallback path and assert the raised `AlarmEventArgs.OperatorComment` is null — converting the gap into a
|
||||
tested, intentional limitation.
|
||||
|
||||
### Testing
|
||||
|
||||
Fully offline — the existing `OpcUaClientServiceTests` fakes (`FakeSessionAdapter`/`FakeSubscriptionAdapter`)
|
||||
already drive the fallback path; the new test asserts the documented behavior.
|
||||
|
||||
---
|
||||
|
||||
## Task slicing (independent → parallelizable)
|
||||
|
||||
| Task | Component | Project | Class | Parallel with |
|
||||
|---|---|---|---|---|
|
||||
| T1 | F10b vtag-node-irrelevant rebuild skip + tests | OpcUaServer (+ .Tests) | small | T2 |
|
||||
| T2 | Client.UI Galaxy comment: inline comment + doc + pinning test | Client.Shared (+ .Tests) | trivial→small | T1 |
|
||||
| T3 | Reconcile stillpending #11/#12 + memory + finish (build, tests, merge+push) | docs (never-staged) | small | none |
|
||||
|
||||
T1/T2 touch disjoint projects → dispatchable concurrently (worktree isolation or serial-on-shared-tree per
|
||||
the git-index-race lesson). T3 runs last.
|
||||
|
||||
## Done =
|
||||
|
||||
Build clean + `dotnet test` green (OpcUaServer.Tests + Client.Shared.Tests) + backlog/memory reconciled +
|
||||
merged to master + pushed.
|
||||
Reference in New Issue
Block a user