A rename-only deploy produced an IsEmpty plan that short-circuited before MaterialiseHierarchy, leaving the OPC UA folder DisplayName stale. AddressSpacePlanner now diffs UnsAreas/UnsLines by stable id into a RenamedFolders set (counted in IsEmpty); the applier refreshes the folder in place via a new UpdateFolderDisplayName on ISurgicalAddressSpaceSink (forwarded through DeferredAddressSpaceSink so it is NOT inert on driver hosts; falls back to rebuild when the sink is non-surgical). DeploymentArtifact byte-parity untouched (rename rides the existing Name round-trip). No EF migration, no serialized wire/proto contract change. +13 OpcUaServer tests, Runtime rebuild test.
23 KiB
Code Review — OpcUaServer
| Field | Value |
|---|---|
| Module | src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer |
| Reviewer | Claude Code |
| Review date | 2026-06-19 |
| Commit reviewed | 7286d320 |
| Status | Reviewed |
| Open findings | 0 |
Checklist coverage
A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | OpcUaServer-001, -002, -003, -007 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found (Lock discipline + fire-and-forget dispatch verified correct) |
| 4 | Error handling & resilience | OpcUaServer-004 |
| 5 | Security | No issues found (WriteOperate / AlarmAck gates fail closed; anonymous → BadUserAccessDenied; HistoryRead AccessLevel bits correct) |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found (ConditionId=dotted-FullName routing, MapSeverity buckets, fire-and-forget dispatch all match CLAUDE.md / docs) |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found (HistoryRead paging incl. oversized-tie-cluster + backstop is covered end-to-end; OpcUaServer-005 re-triaged to Won't Fix) |
| 10 | Documentation & comments | OpcUaServer-006 (Resolved) |
Findings
OpcUaServer-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | AddressSpacePlan.cs:56 (AddressSpacePlan.IsEmpty), AddressSpacePlan.cs:80 (AddressSpacePlanner.Compute) |
| Status | Resolved |
Description: AddressSpaceComposition carries the UNS topology (UnsAreas + UnsLines), and
AddressSpaceApplier.MaterialiseHierarchy uses each area's/line's DisplayName for the OPC UA
folder display name. But AddressSpacePlanner.Compute only diffs Equipment / DriverInstance /
ScriptedAlarm / EquipmentTag / EquipmentVirtualTag — it never diffs UnsAreas / UnsLines. So a
deployment whose ONLY change is a UNS Area or Line rename (no equipment/driver/alarm/tag/vtag
delta) produces a plan whose IsEmpty is true. In OpcUaPublishActor.HandleRebuild
(src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs:313-318) an empty plan
short-circuits BEFORE MaterialiseHierarchy runs, so the renamed Area/Line folder keeps its stale
display name until some unrelated structural change forces a full rebuild. Compounding it,
OtOpcUaNodeManager.EnsureFolder (OtOpcUaNodeManager.cs:1272) early-returns for an
already-present folder id and never updates an existing folder's DisplayName, so even if
MaterialiseHierarchy were reached on a rename it would be a no-op until RebuildAddressSpace
clears _folders.
Recommendation: Carry UNS area/line diff sets into AddressSpacePlan (mirroring the
init-only EquipmentTag/VirtualTag pattern) so a rename is no longer "empty", and have the apply
path drive a hierarchy refresh (or make EnsureFolder update an existing folder's DisplayName
in place + ClearChangeMasks). Deferred: a complete fix spans the Runtime module
(OpcUaPublishActor must honour the new plan flag and call a hierarchy-refresh / rebuild path),
which is outside this module's edit boundary.
Resolution: Resolved — 2026-06-20 (SHA pending): a UNS Area / Line rename-only deploy now produces a
non-empty plan that refreshes the existing folder's DisplayName IN PLACE (no rebuild, subscriptions
preserved). No EF migration, no entity-schema change, and no wire/proto/Commons-contract break — the
AddressSpacePlan is a pure in-process runtime diff (never serialized), and DeploymentArtifact already
round-trips each area/line Name into its UnsAreaProjection/UnsLineProjection.DisplayName, so the
artifact mirror needed NO change to carry the new name (byte-parity preserved). Changes span OpcUaServer +
Commons + Runtime:
AddressSpacePlan.cs(OpcUaServer) — added an init-onlyRenamedFoldersdiff set (new nestedFolderRename(FolderNodeId, NewDisplayName)record) mirroring the EquipmentTag/VirtualTag init-only pattern, and folded it intoIsEmpty.AddressSpacePlanner.Computenow diffsprev.UnsAreas/UnsLinesvsnext.UnsAreas/UnsLinesby stable id (UnsAreaId/UnsLineId— the exact NodeId schemeMaterialiseHierarchyuses), emitting a rename only when a surviving folder'sDisplayNamediffers (ordinal). Added/removed areas are NOT renames (handled by the hierarchy/rebuild path). Output is deterministic (areas first, then lines; each id-sorted).ISurgicalAddressSpaceSink.cs(Commons) — added an in-placebool UpdateFolderDisplayName(folderNodeId, displayName)to the existing optional surgical-capability interface (already forwarded byDeferredAddressSpaceSink); returns false when the folder is missing so the caller rebuilds.OtOpcUaNodeManager.UpdateFolderDisplayName(OpcUaServer) — the surgical counterpart ofEnsureFolder(which early-returns on an existing folder and never touched its name): underLockit mutates the liveFolderState.DisplayNameand callsClearChangeMasks(the SDK gotcha) so subscribers see the new name immediately; NodeId/BrowseName unchanged.SdkAddressSpaceSink+DeferredAddressSpaceSinkforward it (the deferred forward is load-bearing: actors inject the wrapper, so without it the optimization would be inert on every driver-role host — same trap as the F10b surgical-tag forward).AddressSpaceApplier.Apply(OpcUaServer) — when no structural rebuild fires, a rename-only (or rename + surgical-tag) plan applies each rename viaUpdateFolderDisplayName; a sink lacking the surgical capability or a missing folder falls back to a full rebuild (safe default). When a structural rebuild DOES fire (any add/remove/structural change),MaterialiseHierarchyre-creates every folder with the new names, so renames are covered for free and no separate surgical call is made.RenamedFolders.Countis added to the outcome'sChangedNodes.OpcUaPublishActor.HandleRebuild(Runtime) — no edit needed beyond the now-non-empty plan: the existing_applier.Apply(plan)drives the in-place refresh, and the subsequent idempotentMaterialiseHierarchy(composition)leaves the just-renamed folder alone (EnsureFolder early-returns on the existing id). The IsEmpty short-circuit now correctly proceeds for a rename-only deploy.
Tests (all green): AddressSpacePlannerTests — area-rename / line-rename / both-renames-ordered yield a
non-empty plan with the rename present, and no-change / added-area yield NO rename (no false positives).
AddressSpaceApplierTests — rename-only updates the folder in place with no rebuild; mixed-with-structural
rebuilds; non-surgical sink + surgical-returns-false both fall back to rebuild. AddressSpaceApplierHierarchyTests
— a real-SDK end-to-end apply asserts the existing area+line FolderState.DisplayName is swapped in place
with no node-count change. DeferredAddressSpaceSinkTests (OpcUaServer + Commons) — the deferred wrapper
forwards the new capability and returns the inner's result / false when not surgical.
OpcUaPublishActorRebuildTests.Rebuild_with_area_rename_only_updates_folder_in_place_without_rebuild — drives
the actor end-to-end: a second deploy changing ONLY the area Name reaches the apply path and issues a surgical
UpdateFolderDisplayName("area-1", "Plant South") without a second full rebuild. Suites:
OpcUaServer.Tests 297/297, Runtime.Tests 278/278, Commons.Tests deferred-sink 11/11.
OpcUaServer-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | OtOpcUaNodeManager.cs:1748 (HistoryReadEvents), OtOpcUaNodeManager.cs:1814 (ClampToInt) |
| Status | Resolved |
Description: For HistoryRead-Events, HistoryReadEvents passes
ClampToInt(details.NumValuesPerNode) to IHistorianDataSource.ReadEventsAsync(maxEvents) and
always returns the result with ContinuationPoint = null ("the full window in one shot"). The
events arm never issues continuation points. By the IHistoryProvider.ReadEventsAsync contract
(src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:88-95), maxEvents <= 0 is a
sentinel meaning "use the backend's default cap". But OPC UA Part 4/11 define
NumValuesPerNode == 0 on a HistoryRead as "no limit — return ALL values". So a client that asks
for all events in a window (NumValuesPerNode == 0) is silently truncated at the backend's
default cap, and because no continuation point (and no GoodMoreData) is returned, the client has
no signal that the result was truncated — it believes it received the complete window. This is a
silent-data-loss / spec-deviation on the events arm.
Recommendation: Either (a) translate NumValuesPerNode == 0 to "unbounded" for the events
backend (e.g. saturate to a very large cap) rather than the "default cap" sentinel, or (b) detect
backend truncation and surface a continuation point / GoodMoreData for events. Deferred: option
(a) changes the documented maxEvents <= 0 sentinel semantics shared with the Wonderware/OpcUaClient
event backends (cross-module, Core.Abstractions contract); option (b) requires the backend to report
truncation. Both cross this module's boundary.
Resolution: Resolved — 2026-06-20 (SHA pending): fixed locally inside OpcUaServer without touching
the cross-module IHistoryProvider.ReadEventsAsync maxEvents <= 0 sentinel. Added a small testable
internal static int EventMaxEvents(uint numValuesPerNode) helper next to ClampToInt that translates
the OPC UA Part 4/11 "no limit" request (NumValuesPerNode == 0) to UNBOUNDED (int.MaxValue, a very
large positive cap) rather than the backend's <= 0 "use the default cap" sentinel; a positive value
still passes through ClampToInt unchanged. HistoryReadEvents now calls EventMaxEvents(details.NumValuesPerNode)
instead of ClampToInt(details.NumValuesPerNode), so a "give me the whole window" events read is no
longer silently truncated at the backend default. The sentinel contract + the Wonderware/OpcUaClient
backends are untouched (a positive int.MaxValue is never the <= 0 sentinel). Tests:
NodeManagerEventMaxEventsTests (helper purity — 0u→int.MaxValue, normal passthrough,
>int.MaxValue→int.MaxValue clamp, exact-int.MaxValue boundary) plus
NodeManagerHistoryReadEventsTests.Events_unbounded_request_passes_int_max_to_backend (the recording fake
IHistorianDataSource receives int.MaxValue when NumValuesPerNode == 0). Note: option (b) — surfacing
a continuation point / GoodMoreData on backend truncation — remains a cross-module/backend change and is
out of scope; option (a) here removes the silent-truncation defect for the common "all events" request.
OpcUaServer-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | OtOpcUaNodeManager.cs:1978 (ServeRawPaged), HistoryPaging.cs (whole), HistoryPaging.cs:213 (SliceTieCluster next <= endUtc) |
| Status | Deferred |
Description: The Raw paging chain treats endUtc as an inclusive upper bound throughout —
the HistoryContinuationState/HistoryPaging XML docs all say "the original (inclusive) end of
the window", and SliceTieCluster advances with next <= endUtc. But the backend contract
(IHistoryProvider.ReadRawAsync, IHistoryProvider.cs:25) defines endUtc as an exclusive
upper bound. The two arms therefore disagree at the exact endUtc boundary. Impact is small
because the backend is the authority on which samples actually exist (a sample at exactly endUtc
never appears in an exclusive-end read), so the disagreement only ever yields one extra empty
resume page ([endUtc, endUtc) → GoodNoData, no CP) rather than duplicated or dropped data.
Separately, the SDK permits a Raw read with StartTime == DateTime.MinValue + a NumValuesPerNode
cap (the OPC UA "read N values backward from EndTime" request — validated in
CustomNodeManager.HistoryRead); the forward-only resume cursor (ComputeResumeCursor resumes from
the LAST sample forward) is not defined for a backward read, so paging a backward Raw read is
unsound if the backend honours the backward direction.
Recommendation: Pin the endUtc inclusive-vs-exclusive convention against the actual backend
(align the paging XML docs + SliceTieCluster comparison to the contract), and either explicitly
reject a StartTime == MinValue paged Raw read with a clear status or document that the backend is
assumed to always return chronological forward results regardless of direction. Deferred: resolving
the inclusive/exclusive question requires confirming the Wonderware backend's actual boundary
semantics (cross-module / infra), and changing a comparison without that confirmation risks the
opposite off-by-one.
Resolution: Deferred — 2026-06-20: infra-gated. Resolving the endUtc inclusive-vs-exclusive
disagreement requires confirming the actual Wonderware historian backend's boundary semantics, which is
hardware/infra-gated and not reachable from this macOS dev host. The impact is benign and bounded — because
the backend is the authority on which samples exist (a sample at exactly endUtc never appears in an
exclusive-end read), the disagreement only ever yields ONE extra empty resume page ([endUtc, endUtc) →
GoodNoData, no continuation point) rather than any duplicated or dropped data. Changing the
SliceTieCluster comparison / paging XML docs without confirming the live backend boundary risks
introducing the opposite off-by-one, so no code is changed here pending that live confirmation.
OpcUaServer-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | OtOpcUaNodeManager.cs:1597 (ResolveParentFolder), and every public sink mutator that calls it (EnsureFolder 1278, EnsureVariable 1335, MaterialiseAlarmCondition 597, plus WriteValue/WriteAlarmCondition CreateVariable) |
| Status | Resolved |
Description: ResolveParentFolder dereferences _root! with the null-forgiving operator, and
CreateVariable uses _root (AddChild). _root is only assigned in CreateAddressSpace, which
the SDK invokes during StandardServer start. Every public address-space mutator
(WriteValue, WriteAlarmCondition, EnsureFolder, EnsureVariable, MaterialiseAlarmCondition)
assumes CreateAddressSpace has already run. If any of these is ever called before the server has
started (e.g. a sink wired or a publish replayed before StartAsync completes), _root is null
and the call NREs out of the node manager. In the current boot ordering the host wires the
SdkAddressSpaceSink only after start, so this is latent rather than live, but it is an unguarded
ordering hazard on the highest-risk class.
Recommendation: Add an explicit guard (e.g. throw a clear InvalidOperationException("address space not yet created") or no-op with a logged warning) when _root is null at the top of the
mutators, so a too-early call fails legibly instead of with a bare NRE. Low priority — defensive
hardening, not a live defect. Left Open to avoid an unscoped change to the mutator entry points on
this critical class without a regression scenario that reproduces the early-call ordering.
Resolution: Resolved — 2026-06-20 (SHA pending): added a private EnsureAddressSpaceCreated() helper
that throws InvalidOperationException("OPC UA address space has not been created yet (server not started.)")
when _root is null, and call it at the top of ResolveParentFolder and at every public address-space
mutator entry point (WriteValue, WriteAlarmCondition, EnsureFolder, EnsureVariable,
MaterialiseAlarmCondition) — right after argument validation, before any _root dereference. A too-early
call (a sink wired or a publish replayed before StartAsync drives CreateAddressSpace) now fails legibly
instead of with a bare NRE out of ResolveParentFolder / CreateVariable. Happy-path behaviour is
unchanged. The guard was test-feasible after all: NodeManagerPreStartGuardTests boots a real host,
borrows the live node manager's real IServerInternal, constructs a SECOND, never-started
OtOpcUaNodeManager from it (so _root is null), and asserts each of the four lock-taking mutators
(EnsureFolder/EnsureVariable/WriteValue/MaterialiseAlarmCondition) throws InvalidOperationException
(not NRE), with the folder case asserting the message text. (WriteAlarmCondition's guard is identical and
sits on the same path; it is build-verified.) Full OpcUaServer.Tests suite green (284/284).
OpcUaServer-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | OtOpcUaNodeManager.cs:2049 (ServeRawPaged tie-cluster stall path), OtOpcUaNodeManager.cs:2068 (absurd-burst backstop) |
| Status | Won't Fix |
Description: (Re-triaged to a non-issue.) Initial concern was that the oversized-tie-cluster
paging path in ServeRawPaged (over-fetch the whole cluster, SliceTieCluster, the
MaxTieClusterOverfetch absurd-burst BadHistoryOperationUnsupported backstop) lacked an
end-to-end node-manager regression and was only covered at the pure level by HistoryPagingTests.
Recommendation: None required.
Resolution: Won't Fix — 2026-06-19: coverage already exists end-to-end. Verified that
NodeManagerHistoryReadPagingTests.Raw_oversized_tie_cluster_pages_within_the_timestamp drives a
real multi-page resume INTO the stall (a 5-way tie cluster larger than the page cap) through a
booted OtOpcUaSdkServer + InMemoryHistoryContinuationStore and asserts lossless duplicate-free
paging, and Raw_tie_cluster_beyond_overfetch_bound_fails_loudly exercises the
> MaxTieClusterOverfetch backstop. The module's HistoryRead paging surface is comprehensively
tested; no test gap.
OpcUaServer-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | OtOpcUaNodeManager.cs:11-30 (class XML doc), OpcUaApplicationHost.cs:88-93 / OpcUaApplicationHost.cs:421-423 (F13/F13c follow-up notes) |
| Status | Resolved |
Description: Several XML comments reference superseded plan milestones as if still pending. The
OtOpcUaNodeManager class doc says equipment-folder hierarchy + type metadata "still come from the
AddressSpaceApplier / EquipmentNodeWalker integration (F14b, tracked under #85)", but
EquipmentNodeWalker is no longer the integration path (the composer → applier → sink → node-manager
chain replaced it, per AddressSpaceComposer's own header) and #85 hierarchy materialisation has
shipped. OpcUaApplicationHost still carries "Full extraction … is tracked as follow-up F13" and
"F13c will plug a real LDAP-bound validator" notes, but the IOpcUaUserAuthenticator /
RoleCarryingUserIdentity impersonation wiring is implemented and live (the doc/docs/OpcUaServer.md
describe it as done). These are stale-comment / undocumented-as-shipped issues, not behaviour bugs.
Recommendation: Refresh the class- and method-level doc comments to drop the
EquipmentNodeWalker/F13/F13c/F14b "pending" framing and describe the shipped behaviour, matching
docs/OpcUaServer.md.
Resolution: Resolved — 2026-06-19 (SHA pending): rewrote the OtOpcUaNodeManager class XML
doc to describe the shipped folder-hierarchy / typed-variable / historized-node / Part 9
condition materialisation (dropping the false "treats every id as a flat BaseDataVariableState"
claim and the retired EquipmentNodeWalker/F14b framing), and the two OpcUaApplicationHost
doc blocks (class summary + BuildUserTokenPolicies) to describe the shipped impersonation/auth
wiring instead of the "F13/F13c pending" framing. Doc-comment-only — no behaviour change; build
re-verified green.
OpcUaServer-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | OtOpcUaNodeManager.cs:674 (MaterialiseAlarmCondition → alarm.OnTimedUnshelve) |
| Status | Resolved |
Description: The system-timer auto-unshelve callback (alarm.OnTimedUnshelve, wired in
MaterialiseAlarmCondition) — fired by the SDK when a TimedShelve duration expires — built its
AlarmCommand with User: string.Empty:
AlarmCommandRouter?.Invoke(new AlarmCommand(alarmId, "Unshelve", string.Empty, null, null)). That
command is routed to the scripted-alarm engine, whose ScriptedAlarmEngine.UnshelveAsync calls
Part9StateMachine.ApplyUnshelve(cur, user, _clock())
(src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/Part9StateMachine.cs:211-213), and that method
opens with if (string.IsNullOrWhiteSpace(user)) throw new ArgumentException("User required.", …).
The exception is swallowed downstream (caught by ScriptedAlarmHostActor), so this is NOT a crash —
but the auto-unshelve silently no-ops: a TimedShelve never auto-expires, leaving the alarm
permanently shelved with no operator-visible error. The bug was operationally invisible and even had
a green node-manager test that asserted cmd.User == string.Empty, encoding the defect as expected
behaviour. The separate, correct design rule — OnTimedUnshelve BYPASSES the AlarmAck role gate
because it is a session-less system timer with no client principal (routing through the gated
HandleAlarmCommand would return BadUserAccessDenied) — is intentional and was preserved; only the
empty User string was the defect.
Recommendation: Pass a non-empty system user at the OnTimedUnshelve call site so ApplyUnshelve
accepts it, while keeping the gate bypass. Use the codebase-wide canonical "system" system-actor
label (matching Part9StateMachine's own AutoUnshelve audit user, AlarmConditionState's
"engine-internal events ⇒ system" doc, and AuditActor.SystemFallback = "system").
Resolution: Resolved — 2026-06-19 (SHA pending): changed the OnTimedUnshelve delegate in
MaterialiseAlarmCondition (OtOpcUaNodeManager.cs:674) to build the AlarmCommand with
User: "system" instead of string.Empty, so the engine's ApplyUnshelve user-required guard
accepts the system-initiated unshelve and the timed auto-unshelve actually applies. The AlarmAck
gate bypass for the session-less system timer is unchanged. TDD: repurposed the existing
AlarmCommandRouterTests.OnTimedUnshelve_with_system_context_returns_good_and_routes_unshelve
(which previously asserted cmd.User == string.Empty) to assert a non-empty "system" user —
verified it failed against the unfixed source (cmd.User should not be null or white space) and
passes after the fix. Surgical one-line src change; no public-contract/wire change. Full
OpcUaServer.Tests suite green (275/275).