# Code Review — OpcUaServer | Field | Value | |---|---| | Module | `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer` | | Reviewer | Claude Code | | Review date | 2026-06-19 | | Commit reviewed | `7286d320` | | Status | Reviewed | | Open findings | 0 | ## Checklist coverage A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank. | # | Category | Result | |---|---|---| | 1 | Correctness & logic bugs | OpcUaServer-001, -002, -003, -007 | | 2 | OtOpcUa conventions | No issues found | | 3 | Concurrency & thread safety | No issues found (Lock discipline + fire-and-forget dispatch verified correct) | | 4 | Error handling & resilience | OpcUaServer-004 | | 5 | Security | No issues found (WriteOperate / AlarmAck gates fail closed; anonymous → BadUserAccessDenied; HistoryRead AccessLevel bits correct) | | 6 | Performance & resource management | No issues found | | 7 | Design-document adherence | No issues found (ConditionId=dotted-FullName routing, MapSeverity buckets, fire-and-forget dispatch all match CLAUDE.md / docs) | | 8 | Code organization & conventions | No issues found | | 9 | Testing coverage | No issues found (HistoryRead paging incl. oversized-tie-cluster + backstop is covered end-to-end; OpcUaServer-005 re-triaged to Won't Fix) | | 10 | Documentation & comments | OpcUaServer-006 (Resolved) | ## Findings ### OpcUaServer-001 | Field | Value | |---|---| | Severity | Medium | | Category | Correctness & logic bugs | | Location | `AddressSpacePlan.cs:56` (`AddressSpacePlan.IsEmpty`), `AddressSpacePlan.cs:80` (`AddressSpacePlanner.Compute`) | | Status | Resolved | **Description:** `AddressSpaceComposition` carries the UNS topology (`UnsAreas` + `UnsLines`), and `AddressSpaceApplier.MaterialiseHierarchy` uses each area's/line's `DisplayName` for the OPC UA folder display name. But `AddressSpacePlanner.Compute` only diffs Equipment / DriverInstance / ScriptedAlarm / EquipmentTag / EquipmentVirtualTag — it never diffs `UnsAreas` / `UnsLines`. So a deployment whose ONLY change is a UNS Area or Line **rename** (no equipment/driver/alarm/tag/vtag delta) produces a plan whose `IsEmpty` is `true`. In `OpcUaPublishActor.HandleRebuild` (`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs:313-318`) an empty plan short-circuits BEFORE `MaterialiseHierarchy` runs, so the renamed Area/Line folder keeps its stale display name until some unrelated structural change forces a full rebuild. Compounding it, `OtOpcUaNodeManager.EnsureFolder` (`OtOpcUaNodeManager.cs:1272`) early-returns for an already-present folder id and never updates an existing folder's `DisplayName`, so even if `MaterialiseHierarchy` were reached on a rename it would be a no-op until `RebuildAddressSpace` clears `_folders`. **Recommendation:** Carry UNS area/line diff sets into `AddressSpacePlan` (mirroring the init-only EquipmentTag/VirtualTag pattern) so a rename is no longer "empty", and have the apply path drive a hierarchy refresh (or make `EnsureFolder` update an existing folder's `DisplayName` in place + `ClearChangeMasks`). Deferred: a complete fix spans the Runtime module (`OpcUaPublishActor` must honour the new plan flag and call a hierarchy-refresh / rebuild path), which is outside this module's edit boundary. **Resolution:** Resolved — 2026-06-20 (SHA pending): a UNS Area / Line **rename-only** deploy now produces a non-empty plan that refreshes the existing folder's `DisplayName` IN PLACE (no rebuild, subscriptions preserved). No EF migration, no entity-schema change, and no wire/proto/Commons-contract break — the `AddressSpacePlan` is a pure in-process runtime diff (never serialized), and `DeploymentArtifact` already round-trips each area/line `Name` into its `UnsAreaProjection`/`UnsLineProjection.DisplayName`, so the artifact mirror needed NO change to carry the new name (byte-parity preserved). Changes span OpcUaServer + Commons + Runtime: - **`AddressSpacePlan.cs`** (OpcUaServer) — added an init-only `RenamedFolders` diff set (new nested `FolderRename(FolderNodeId, NewDisplayName)` record) mirroring the EquipmentTag/VirtualTag init-only pattern, and folded it into `IsEmpty`. `AddressSpacePlanner.Compute` now diffs `prev.UnsAreas/UnsLines` vs `next.UnsAreas/UnsLines` by stable id (`UnsAreaId`/`UnsLineId` — the exact NodeId scheme `MaterialiseHierarchy` uses), emitting a rename only when a surviving folder's `DisplayName` differs (ordinal). Added/removed areas are NOT renames (handled by the hierarchy/rebuild path). Output is deterministic (areas first, then lines; each id-sorted). - **`ISurgicalAddressSpaceSink.cs`** (Commons) — added an in-place `bool UpdateFolderDisplayName(folderNodeId, displayName)` to the existing optional surgical-capability interface (already forwarded by `DeferredAddressSpaceSink`); returns false when the folder is missing so the caller rebuilds. - **`OtOpcUaNodeManager.UpdateFolderDisplayName`** (OpcUaServer) — the surgical counterpart of `EnsureFolder` (which early-returns on an existing folder and never touched its name): under `Lock` it mutates the live `FolderState.DisplayName` and calls `ClearChangeMasks` (the SDK gotcha) so subscribers see the new name immediately; NodeId/BrowseName unchanged. `SdkAddressSpaceSink` + `DeferredAddressSpaceSink` forward it (the deferred forward is load-bearing: actors inject the wrapper, so without it the optimization would be inert on every driver-role host — same trap as the F10b surgical-tag forward). - **`AddressSpaceApplier.Apply`** (OpcUaServer) — when no structural rebuild fires, a rename-only (or rename + surgical-tag) plan applies each rename via `UpdateFolderDisplayName`; a sink lacking the surgical capability or a missing folder falls back to a full rebuild (safe default). When a structural rebuild DOES fire (any add/remove/structural change), `MaterialiseHierarchy` re-creates every folder with the new names, so renames are covered for free and no separate surgical call is made. `RenamedFolders.Count` is added to the outcome's `ChangedNodes`. - **`OpcUaPublishActor.HandleRebuild`** (Runtime) — no edit needed beyond the now-non-empty plan: the existing `_applier.Apply(plan)` drives the in-place refresh, and the subsequent idempotent `MaterialiseHierarchy(composition)` leaves the just-renamed folder alone (EnsureFolder early-returns on the existing id). The IsEmpty short-circuit now correctly proceeds for a rename-only deploy. **Tests** (all green): `AddressSpacePlannerTests` — area-rename / line-rename / both-renames-ordered yield a non-empty plan with the rename present, and no-change / added-area yield NO rename (no false positives). `AddressSpaceApplierTests` — rename-only updates the folder in place with no rebuild; mixed-with-structural rebuilds; non-surgical sink + surgical-returns-false both fall back to rebuild. `AddressSpaceApplierHierarchyTests` — a real-SDK end-to-end apply asserts the existing area+line `FolderState.DisplayName` is swapped in place with no node-count change. `DeferredAddressSpaceSinkTests` (OpcUaServer + Commons) — the deferred wrapper forwards the new capability and returns the inner's result / false when not surgical. `OpcUaPublishActorRebuildTests.Rebuild_with_area_rename_only_updates_folder_in_place_without_rebuild` — drives the actor end-to-end: a second deploy changing ONLY the area Name reaches the apply path and issues a surgical `UpdateFolderDisplayName("area-1", "Plant South")` without a second full rebuild. Suites: `OpcUaServer.Tests` 297/297, `Runtime.Tests` 278/278, `Commons.Tests` deferred-sink 11/11. ### OpcUaServer-002 | Field | Value | |---|---| | Severity | Medium | | Category | Correctness & logic bugs | | Location | `OtOpcUaNodeManager.cs:1748` (`HistoryReadEvents`), `OtOpcUaNodeManager.cs:1814` (`ClampToInt`) | | Status | Resolved | **Description:** For HistoryRead-Events, `HistoryReadEvents` passes `ClampToInt(details.NumValuesPerNode)` to `IHistorianDataSource.ReadEventsAsync(maxEvents)` and always returns the result with `ContinuationPoint = null` ("the full window in one shot"). The events arm never issues continuation points. By the `IHistoryProvider.ReadEventsAsync` contract (`src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:88-95`), `maxEvents <= 0` is a sentinel meaning "use the backend's **default cap**". But OPC UA Part 4/11 define `NumValuesPerNode == 0` on a HistoryRead as "no limit — return ALL values". So a client that asks for all events in a window (`NumValuesPerNode == 0`) is silently truncated at the backend's default cap, and because no continuation point (and no `GoodMoreData`) is returned, the client has no signal that the result was truncated — it believes it received the complete window. This is a silent-data-loss / spec-deviation on the events arm. **Recommendation:** Either (a) translate `NumValuesPerNode == 0` to "unbounded" for the events backend (e.g. saturate to a very large cap) rather than the "default cap" sentinel, or (b) detect backend truncation and surface a continuation point / `GoodMoreData` for events. Deferred: option (a) changes the documented `maxEvents <= 0` sentinel semantics shared with the Wonderware/OpcUaClient event backends (cross-module, Core.Abstractions contract); option (b) requires the backend to report truncation. Both cross this module's boundary. **Resolution:** Resolved — 2026-06-20 (SHA pending): fixed locally inside OpcUaServer without touching the cross-module `IHistoryProvider.ReadEventsAsync` `maxEvents <= 0` sentinel. Added a small testable `internal static int EventMaxEvents(uint numValuesPerNode)` helper next to `ClampToInt` that translates the OPC UA Part 4/11 "no limit" request (`NumValuesPerNode == 0`) to UNBOUNDED (`int.MaxValue`, a very large positive cap) rather than the backend's `<= 0` "use the default cap" sentinel; a positive value still passes through `ClampToInt` unchanged. `HistoryReadEvents` now calls `EventMaxEvents(details.NumValuesPerNode)` instead of `ClampToInt(details.NumValuesPerNode)`, so a "give me the whole window" events read is no longer silently truncated at the backend default. The sentinel contract + the Wonderware/OpcUaClient backends are untouched (a positive `int.MaxValue` is never the `<= 0` sentinel). Tests: `NodeManagerEventMaxEventsTests` (helper purity — `0u→int.MaxValue`, normal passthrough, `>int.MaxValue→int.MaxValue` clamp, exact-`int.MaxValue` boundary) plus `NodeManagerHistoryReadEventsTests.Events_unbounded_request_passes_int_max_to_backend` (the recording fake `IHistorianDataSource` receives `int.MaxValue` when `NumValuesPerNode == 0`). Note: option (b) — surfacing a continuation point / `GoodMoreData` on backend truncation — remains a cross-module/backend change and is out of scope; option (a) here removes the silent-truncation defect for the common "all events" request. ### OpcUaServer-003 | Field | Value | |---|---| | Severity | Low | | Category | Correctness & logic bugs | | Location | `OtOpcUaNodeManager.cs:1978` (`ServeRawPaged`), `HistoryPaging.cs` (whole), `HistoryPaging.cs:213` (`SliceTieCluster` `next <= endUtc`) | | Status | Deferred | **Description:** The Raw paging chain treats `endUtc` as an **inclusive** upper bound throughout — the `HistoryContinuationState`/`HistoryPaging` XML docs all say "the original (inclusive) end of the window", and `SliceTieCluster` advances with `next <= endUtc`. But the backend contract (`IHistoryProvider.ReadRawAsync`, `IHistoryProvider.cs:25`) defines `endUtc` as an **exclusive** upper bound. The two arms therefore disagree at the exact `endUtc` boundary. Impact is small because the backend is the authority on which samples actually exist (a sample at exactly `endUtc` never appears in an exclusive-end read), so the disagreement only ever yields one extra empty resume page (`[endUtc, endUtc)` → GoodNoData, no CP) rather than duplicated or dropped data. Separately, the SDK permits a Raw read with `StartTime == DateTime.MinValue` + a `NumValuesPerNode` cap (the OPC UA "read N values backward from EndTime" request — validated in `CustomNodeManager.HistoryRead`); the forward-only resume cursor (`ComputeResumeCursor` resumes from the LAST sample forward) is not defined for a backward read, so paging a backward Raw read is unsound if the backend honours the backward direction. **Recommendation:** Pin the `endUtc` inclusive-vs-exclusive convention against the actual backend (align the paging XML docs + `SliceTieCluster` comparison to the contract), and either explicitly reject a `StartTime == MinValue` paged Raw read with a clear status or document that the backend is assumed to always return chronological forward results regardless of direction. Deferred: resolving the inclusive/exclusive question requires confirming the Wonderware backend's actual boundary semantics (cross-module / infra), and changing a comparison without that confirmation risks the opposite off-by-one. **Resolution:** Deferred — 2026-06-20: infra-gated. Resolving the `endUtc` inclusive-vs-exclusive disagreement requires confirming the actual Wonderware historian backend's boundary semantics, which is hardware/infra-gated and not reachable from this macOS dev host. The impact is benign and bounded — because the backend is the authority on which samples exist (a sample at exactly `endUtc` never appears in an exclusive-end read), the disagreement only ever yields ONE extra empty resume page (`[endUtc, endUtc)` → GoodNoData, no continuation point) rather than any duplicated or dropped data. Changing the `SliceTieCluster` comparison / paging XML docs without confirming the live backend boundary risks introducing the opposite off-by-one, so no code is changed here pending that live confirmation. ### OpcUaServer-004 | Field | Value | |---|---| | Severity | Low | | Category | Error handling & resilience | | Location | `OtOpcUaNodeManager.cs:1597` (`ResolveParentFolder`), and every public sink mutator that calls it (`EnsureFolder` 1278, `EnsureVariable` 1335, `MaterialiseAlarmCondition` 597, plus `WriteValue`/`WriteAlarmCondition` `CreateVariable`) | | Status | Resolved | **Description:** `ResolveParentFolder` dereferences `_root!` with the null-forgiving operator, and `CreateVariable` uses `_root` (`AddChild`). `_root` is only assigned in `CreateAddressSpace`, which the SDK invokes during `StandardServer` start. Every public address-space mutator (`WriteValue`, `WriteAlarmCondition`, `EnsureFolder`, `EnsureVariable`, `MaterialiseAlarmCondition`) assumes `CreateAddressSpace` has already run. If any of these is ever called before the server has started (e.g. a sink wired or a publish replayed before `StartAsync` completes), `_root` is `null` and the call NREs out of the node manager. In the current boot ordering the host wires the `SdkAddressSpaceSink` only after start, so this is latent rather than live, but it is an unguarded ordering hazard on the highest-risk class. **Recommendation:** Add an explicit guard (e.g. throw a clear `InvalidOperationException("address space not yet created")` or no-op with a logged warning) when `_root` is null at the top of the mutators, so a too-early call fails legibly instead of with a bare NRE. Low priority — defensive hardening, not a live defect. Left Open to avoid an unscoped change to the mutator entry points on this critical class without a regression scenario that reproduces the early-call ordering. **Resolution:** Resolved — 2026-06-20 (SHA pending): added a private `EnsureAddressSpaceCreated()` helper that throws `InvalidOperationException("OPC UA address space has not been created yet (server not started.)")` when `_root` is null, and call it at the top of `ResolveParentFolder` and at every public address-space mutator entry point (`WriteValue`, `WriteAlarmCondition`, `EnsureFolder`, `EnsureVariable`, `MaterialiseAlarmCondition`) — right after argument validation, before any `_root` dereference. A too-early call (a sink wired or a publish replayed before `StartAsync` drives `CreateAddressSpace`) now fails legibly instead of with a bare NRE out of `ResolveParentFolder` / `CreateVariable`. Happy-path behaviour is unchanged. The guard was test-feasible after all: `NodeManagerPreStartGuardTests` boots a real host, borrows the live node manager's real `IServerInternal`, constructs a SECOND, never-started `OtOpcUaNodeManager` from it (so `_root` is null), and asserts each of the four lock-taking mutators (`EnsureFolder`/`EnsureVariable`/`WriteValue`/`MaterialiseAlarmCondition`) throws `InvalidOperationException` (not NRE), with the folder case asserting the message text. (`WriteAlarmCondition`'s guard is identical and sits on the same path; it is build-verified.) Full `OpcUaServer.Tests` suite green (284/284). ### OpcUaServer-005 | Field | Value | |---|---| | Severity | Low | | Category | Testing coverage | | Location | `OtOpcUaNodeManager.cs:2049` (`ServeRawPaged` tie-cluster stall path), `OtOpcUaNodeManager.cs:2068` (absurd-burst backstop) | | Status | Won't Fix | **Description:** (Re-triaged to a non-issue.) Initial concern was that the oversized-tie-cluster paging path in `ServeRawPaged` (over-fetch the whole cluster, `SliceTieCluster`, the `MaxTieClusterOverfetch` absurd-burst `BadHistoryOperationUnsupported` backstop) lacked an end-to-end node-manager regression and was only covered at the pure level by `HistoryPagingTests`. **Recommendation:** None required. **Resolution:** Won't Fix — 2026-06-19: coverage already exists end-to-end. Verified that `NodeManagerHistoryReadPagingTests.Raw_oversized_tie_cluster_pages_within_the_timestamp` drives a real multi-page resume INTO the stall (a 5-way tie cluster larger than the page cap) through a booted `OtOpcUaSdkServer` + `InMemoryHistoryContinuationStore` and asserts lossless duplicate-free paging, and `Raw_tie_cluster_beyond_overfetch_bound_fails_loudly` exercises the `> MaxTieClusterOverfetch` backstop. The module's HistoryRead paging surface is comprehensively tested; no test gap. ### OpcUaServer-006 | Field | Value | |---|---| | Severity | Low | | Category | Documentation & comments | | Location | `OtOpcUaNodeManager.cs:11-30` (class XML doc), `OpcUaApplicationHost.cs:88-93` / `OpcUaApplicationHost.cs:421-423` (F13/F13c follow-up notes) | | Status | Resolved | **Description:** Several XML comments reference superseded plan milestones as if still pending. The `OtOpcUaNodeManager` class doc says equipment-folder hierarchy + type metadata "still come from the AddressSpaceApplier / EquipmentNodeWalker integration (F14b, tracked under #85)", but `EquipmentNodeWalker` is no longer the integration path (the composer → applier → sink → node-manager chain replaced it, per `AddressSpaceComposer`'s own header) and #85 hierarchy materialisation has shipped. `OpcUaApplicationHost` still carries "Full extraction … is tracked as follow-up F13" and "F13c will plug a real LDAP-bound validator" notes, but the `IOpcUaUserAuthenticator` / `RoleCarryingUserIdentity` impersonation wiring is implemented and live (the doc/`docs/OpcUaServer.md` describe it as done). These are stale-comment / undocumented-as-shipped issues, not behaviour bugs. **Recommendation:** Refresh the class- and method-level doc comments to drop the `EquipmentNodeWalker`/F13/F13c/F14b "pending" framing and describe the shipped behaviour, matching `docs/OpcUaServer.md`. **Resolution:** Resolved — 2026-06-19 (SHA pending): rewrote the `OtOpcUaNodeManager` class XML doc to describe the shipped folder-hierarchy / typed-variable / historized-node / Part 9 condition materialisation (dropping the false "treats every id as a flat BaseDataVariableState" claim and the retired `EquipmentNodeWalker`/F14b framing), and the two `OpcUaApplicationHost` doc blocks (class summary + `BuildUserTokenPolicies`) to describe the shipped impersonation/auth wiring instead of the "F13/F13c pending" framing. Doc-comment-only — no behaviour change; build re-verified green. ### OpcUaServer-007 | Field | Value | |---|---| | Severity | Medium | | Category | Correctness & logic bugs | | Location | `OtOpcUaNodeManager.cs:674` (`MaterialiseAlarmCondition` → `alarm.OnTimedUnshelve`) | | Status | Resolved | **Description:** The system-timer auto-unshelve callback (`alarm.OnTimedUnshelve`, wired in `MaterialiseAlarmCondition`) — fired by the SDK when a `TimedShelve` duration expires — built its `AlarmCommand` with `User: string.Empty`: `AlarmCommandRouter?.Invoke(new AlarmCommand(alarmId, "Unshelve", string.Empty, null, null))`. That command is routed to the scripted-alarm engine, whose `ScriptedAlarmEngine.UnshelveAsync` calls `Part9StateMachine.ApplyUnshelve(cur, user, _clock())` (`src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/Part9StateMachine.cs:211-213`), and that method opens with `if (string.IsNullOrWhiteSpace(user)) throw new ArgumentException("User required.", …)`. The exception is swallowed downstream (caught by `ScriptedAlarmHostActor`), so this is NOT a crash — but the auto-unshelve **silently no-ops**: a `TimedShelve` never auto-expires, leaving the alarm permanently shelved with no operator-visible error. The bug was operationally invisible and even had a green node-manager test that asserted `cmd.User == string.Empty`, encoding the defect as expected behaviour. The separate, correct design rule — `OnTimedUnshelve` BYPASSES the `AlarmAck` role gate because it is a session-less system timer with no client principal (routing through the gated `HandleAlarmCommand` would return `BadUserAccessDenied`) — is intentional and was preserved; only the empty `User` string was the defect. **Recommendation:** Pass a non-empty system user at the `OnTimedUnshelve` call site so `ApplyUnshelve` accepts it, while keeping the gate bypass. Use the codebase-wide canonical `"system"` system-actor label (matching `Part9StateMachine`'s own `AutoUnshelve` audit user, `AlarmConditionState`'s "engine-internal events ⇒ `system`" doc, and `AuditActor.SystemFallback = "system"`). **Resolution:** Resolved — 2026-06-19 (SHA pending): changed the `OnTimedUnshelve` delegate in `MaterialiseAlarmCondition` (`OtOpcUaNodeManager.cs:674`) to build the `AlarmCommand` with `User: "system"` instead of `string.Empty`, so the engine's `ApplyUnshelve` user-required guard accepts the system-initiated unshelve and the timed auto-unshelve actually applies. The `AlarmAck` gate bypass for the session-less system timer is unchanged. TDD: repurposed the existing `AlarmCommandRouterTests.OnTimedUnshelve_with_system_context_returns_good_and_routes_unshelve` (which previously asserted `cmd.User == string.Empty`) to assert a non-empty `"system"` user — verified it failed against the unfixed source (`cmd.User should not be null or white space`) and passes after the fix. Surgical one-line src change; no public-contract/wire change. Full `OpcUaServer.Tests` suite green (275/275).