From bac6613dd278a42b73df1d8f682adfaaff6e5010 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Fri, 19 Jun 2026 10:37:00 -0400 Subject: [PATCH] review(OpcUaServer): record findings + fix stale node-manager/host docs First review of the v2 OPC UA core at HEAD 7286d320. 6 findings (2 Medium, 4 Low). OpcUaServer-006 fixed (stale NodeManager/ApplicationHost XML docs). 001-004 deferred (cross-module: Runtime publish-actor / Core.Abstractions history contract / Wonderware boundary semantics, or latent-only). 005 re-triaged Won't-Fix (coverage already exists). High-scrutiny paths (Lock discipline, OnWriteValue fire-and-forget, WriteOperate/AlarmAck gates, HistoryRead AccessLevel bits) verified correct. --- code-reviews/OpcUaServer/findings.md | 211 ++++++++++++++++++ .../OpcUaApplicationHost.cs | 20 +- .../OtOpcUaNodeManager.cs | 13 +- 3 files changed, 231 insertions(+), 13 deletions(-) create mode 100644 code-reviews/OpcUaServer/findings.md diff --git a/code-reviews/OpcUaServer/findings.md b/code-reviews/OpcUaServer/findings.md new file mode 100644 index 00000000..0317fd76 --- /dev/null +++ b/code-reviews/OpcUaServer/findings.md @@ -0,0 +1,211 @@ +# Code Review — OpcUaServer + +| Field | Value | +|---|---| +| Module | `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer` | +| Reviewer | Claude Code | +| Review date | 2026-06-19 | +| Commit reviewed | `7286d320` | +| Status | Reviewed | +| Open findings | 4 | + +## Checklist coverage + +A comprehensive review completes every category, recording "No issues found" where +a category produced nothing rather than leaving it blank. + +| # | Category | Result | +|---|---|---| +| 1 | Correctness & logic bugs | OpcUaServer-001, -002, -003 | +| 2 | OtOpcUa conventions | No issues found | +| 3 | Concurrency & thread safety | No issues found (Lock discipline + fire-and-forget dispatch verified correct) | +| 4 | Error handling & resilience | OpcUaServer-004 | +| 5 | Security | No issues found (WriteOperate / AlarmAck gates fail closed; anonymous → BadUserAccessDenied; HistoryRead AccessLevel bits correct) | +| 6 | Performance & resource management | No issues found | +| 7 | Design-document adherence | No issues found (ConditionId=dotted-FullName routing, MapSeverity buckets, fire-and-forget dispatch all match CLAUDE.md / docs) | +| 8 | Code organization & conventions | No issues found | +| 9 | Testing coverage | No issues found (HistoryRead paging incl. oversized-tie-cluster + backstop is covered end-to-end; OpcUaServer-005 re-triaged to Won't Fix) | +| 10 | Documentation & comments | OpcUaServer-006 (Resolved) | + +## Findings + + + +### OpcUaServer-001 + +| Field | Value | +|---|---| +| Severity | Medium | +| Category | Correctness & logic bugs | +| Location | `AddressSpacePlan.cs:56` (`AddressSpacePlan.IsEmpty`), `AddressSpacePlan.cs:80` (`AddressSpacePlanner.Compute`) | +| Status | Open | + +**Description:** `AddressSpaceComposition` carries the UNS topology (`UnsAreas` + `UnsLines`), and +`AddressSpaceApplier.MaterialiseHierarchy` uses each area's/line's `DisplayName` for the OPC UA +folder display name. But `AddressSpacePlanner.Compute` only diffs Equipment / DriverInstance / +ScriptedAlarm / EquipmentTag / EquipmentVirtualTag — it never diffs `UnsAreas` / `UnsLines`. So a +deployment whose ONLY change is a UNS Area or Line **rename** (no equipment/driver/alarm/tag/vtag +delta) produces a plan whose `IsEmpty` is `true`. In `OpcUaPublishActor.HandleRebuild` +(`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs:313-318`) an empty plan +short-circuits BEFORE `MaterialiseHierarchy` runs, so the renamed Area/Line folder keeps its stale +display name until some unrelated structural change forces a full rebuild. Compounding it, +`OtOpcUaNodeManager.EnsureFolder` (`OtOpcUaNodeManager.cs:1272`) early-returns for an +already-present folder id and never updates an existing folder's `DisplayName`, so even if +`MaterialiseHierarchy` were reached on a rename it would be a no-op until `RebuildAddressSpace` +clears `_folders`. + +**Recommendation:** Carry UNS area/line diff sets into `AddressSpacePlan` (mirroring the +init-only EquipmentTag/VirtualTag pattern) so a rename is no longer "empty", and have the apply +path drive a hierarchy refresh (or make `EnsureFolder` update an existing folder's `DisplayName` +in place + `ClearChangeMasks`). Deferred: a complete fix spans the Runtime module +(`OpcUaPublishActor` must honour the new plan flag and call a hierarchy-refresh / rebuild path), +which is outside this module's edit boundary. + +**Resolution:** _(Open — deferred: needs a coordinated change in the Runtime module's `OpcUaPublishActor` to act on a UNS-changed plan; an in-`AddressSpacePlan` change alone is inert because `Apply`/`MaterialiseHierarchy` do not refresh existing folder names.)_ + +### OpcUaServer-002 + +| Field | Value | +|---|---| +| Severity | Medium | +| Category | Correctness & logic bugs | +| Location | `OtOpcUaNodeManager.cs:1748` (`HistoryReadEvents`), `OtOpcUaNodeManager.cs:1814` (`ClampToInt`) | +| Status | Open | + +**Description:** For HistoryRead-Events, `HistoryReadEvents` passes +`ClampToInt(details.NumValuesPerNode)` to `IHistorianDataSource.ReadEventsAsync(maxEvents)` and +always returns the result with `ContinuationPoint = null` ("the full window in one shot"). The +events arm never issues continuation points. By the `IHistoryProvider.ReadEventsAsync` contract +(`src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:88-95`), `maxEvents <= 0` is a +sentinel meaning "use the backend's **default cap**". But OPC UA Part 4/11 define +`NumValuesPerNode == 0` on a HistoryRead as "no limit — return ALL values". So a client that asks +for all events in a window (`NumValuesPerNode == 0`) is silently truncated at the backend's +default cap, and because no continuation point (and no `GoodMoreData`) is returned, the client has +no signal that the result was truncated — it believes it received the complete window. This is a +silent-data-loss / spec-deviation on the events arm. + +**Recommendation:** Either (a) translate `NumValuesPerNode == 0` to "unbounded" for the events +backend (e.g. saturate to a very large cap) rather than the "default cap" sentinel, or (b) detect +backend truncation and surface a continuation point / `GoodMoreData` for events. Deferred: option +(a) changes the documented `maxEvents <= 0` sentinel semantics shared with the Wonderware/OpcUaClient +event backends (cross-module, Core.Abstractions contract); option (b) requires the backend to report +truncation. Both cross this module's boundary. + +**Resolution:** _(Open — deferred: rooted in the cross-module `IHistoryProvider.ReadEventsAsync` `maxEvents <= 0` sentinel contract (Core.Abstractions-006) and the Wonderware/OpcUaClient event backends; cannot be fixed safely inside OpcUaServer alone.)_ + +### OpcUaServer-003 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Correctness & logic bugs | +| Location | `OtOpcUaNodeManager.cs:1978` (`ServeRawPaged`), `HistoryPaging.cs` (whole), `HistoryPaging.cs:213` (`SliceTieCluster` `next <= endUtc`) | +| Status | Open | + +**Description:** The Raw paging chain treats `endUtc` as an **inclusive** upper bound throughout — +the `HistoryContinuationState`/`HistoryPaging` XML docs all say "the original (inclusive) end of +the window", and `SliceTieCluster` advances with `next <= endUtc`. But the backend contract +(`IHistoryProvider.ReadRawAsync`, `IHistoryProvider.cs:25`) defines `endUtc` as an **exclusive** +upper bound. The two arms therefore disagree at the exact `endUtc` boundary. Impact is small +because the backend is the authority on which samples actually exist (a sample at exactly `endUtc` +never appears in an exclusive-end read), so the disagreement only ever yields one extra empty +resume page (`[endUtc, endUtc)` → GoodNoData, no CP) rather than duplicated or dropped data. +Separately, the SDK permits a Raw read with `StartTime == DateTime.MinValue` + a `NumValuesPerNode` +cap (the OPC UA "read N values backward from EndTime" request — validated in +`CustomNodeManager.HistoryRead`); the forward-only resume cursor (`ComputeResumeCursor` resumes from +the LAST sample forward) is not defined for a backward read, so paging a backward Raw read is +unsound if the backend honours the backward direction. + +**Recommendation:** Pin the `endUtc` inclusive-vs-exclusive convention against the actual backend +(align the paging XML docs + `SliceTieCluster` comparison to the contract), and either explicitly +reject a `StartTime == MinValue` paged Raw read with a clear status or document that the backend is +assumed to always return chronological forward results regardless of direction. Deferred: resolving +the inclusive/exclusive question requires confirming the Wonderware backend's actual boundary +semantics (cross-module / infra), and changing a comparison without that confirmation risks the +opposite off-by-one. + +**Resolution:** _(Open — deferred: needs the backend's authoritative endUtc boundary semantics confirmed before the comparison/doc is changed; flipping it blindly risks an off-by-one in the other direction.)_ + +### OpcUaServer-004 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Error handling & resilience | +| Location | `OtOpcUaNodeManager.cs:1597` (`ResolveParentFolder`), and every public sink mutator that calls it (`EnsureFolder` 1278, `EnsureVariable` 1335, `MaterialiseAlarmCondition` 597, plus `WriteValue`/`WriteAlarmCondition` `CreateVariable`) | +| Status | Open | + +**Description:** `ResolveParentFolder` dereferences `_root!` with the null-forgiving operator, and +`CreateVariable` uses `_root` (`AddChild`). `_root` is only assigned in `CreateAddressSpace`, which +the SDK invokes during `StandardServer` start. Every public address-space mutator +(`WriteValue`, `WriteAlarmCondition`, `EnsureFolder`, `EnsureVariable`, `MaterialiseAlarmCondition`) +assumes `CreateAddressSpace` has already run. If any of these is ever called before the server has +started (e.g. a sink wired or a publish replayed before `StartAsync` completes), `_root` is `null` +and the call NREs out of the node manager. In the current boot ordering the host wires the +`SdkAddressSpaceSink` only after start, so this is latent rather than live, but it is an unguarded +ordering hazard on the highest-risk class. + +**Recommendation:** Add an explicit guard (e.g. throw a clear `InvalidOperationException("address +space not yet created")` or no-op with a logged warning) when `_root` is null at the top of the +mutators, so a too-early call fails legibly instead of with a bare NRE. Low priority — defensive +hardening, not a live defect. Left Open to avoid an unscoped change to the mutator entry points on +this critical class without a regression scenario that reproduces the early-call ordering. + +**Resolution:** _(Open — defensive-only; latent given current boot ordering. Deferred to avoid an unscoped guard-add across five mutators without a reproducing pre-start ordering scenario.)_ + +### OpcUaServer-005 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Testing coverage | +| Location | `OtOpcUaNodeManager.cs:2049` (`ServeRawPaged` tie-cluster stall path), `OtOpcUaNodeManager.cs:2068` (absurd-burst backstop) | +| Status | Won't Fix | + +**Description:** (Re-triaged to a non-issue.) Initial concern was that the oversized-tie-cluster +paging path in `ServeRawPaged` (over-fetch the whole cluster, `SliceTieCluster`, the +`MaxTieClusterOverfetch` absurd-burst `BadHistoryOperationUnsupported` backstop) lacked an +end-to-end node-manager regression and was only covered at the pure level by `HistoryPagingTests`. + +**Recommendation:** None required. + +**Resolution:** Won't Fix — 2026-06-19: coverage already exists end-to-end. Verified that +`NodeManagerHistoryReadPagingTests.Raw_oversized_tie_cluster_pages_within_the_timestamp` drives a +real multi-page resume INTO the stall (a 5-way tie cluster larger than the page cap) through a +booted `OtOpcUaSdkServer` + `InMemoryHistoryContinuationStore` and asserts lossless duplicate-free +paging, and `Raw_tie_cluster_beyond_overfetch_bound_fails_loudly` exercises the +`> MaxTieClusterOverfetch` backstop. The module's HistoryRead paging surface is comprehensively +tested; no test gap. + +### OpcUaServer-006 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Documentation & comments | +| Location | `OtOpcUaNodeManager.cs:11-30` (class XML doc), `OpcUaApplicationHost.cs:88-93` / `OpcUaApplicationHost.cs:421-423` (F13/F13c follow-up notes) | +| Status | Resolved | + +**Description:** Several XML comments reference superseded plan milestones as if still pending. The +`OtOpcUaNodeManager` class doc says equipment-folder hierarchy + type metadata "still come from the +AddressSpaceApplier / EquipmentNodeWalker integration (F14b, tracked under #85)", but +`EquipmentNodeWalker` is no longer the integration path (the composer → applier → sink → node-manager +chain replaced it, per `AddressSpaceComposer`'s own header) and #85 hierarchy materialisation has +shipped. `OpcUaApplicationHost` still carries "Full extraction … is tracked as follow-up F13" and +"F13c will plug a real LDAP-bound validator" notes, but the `IOpcUaUserAuthenticator` / +`RoleCarryingUserIdentity` impersonation wiring is implemented and live (the doc/`docs/OpcUaServer.md` +describe it as done). These are stale-comment / undocumented-as-shipped issues, not behaviour bugs. + +**Recommendation:** Refresh the class- and method-level doc comments to drop the +`EquipmentNodeWalker`/F13/F13c/F14b "pending" framing and describe the shipped behaviour, matching +`docs/OpcUaServer.md`. + +**Resolution:** Resolved — 2026-06-19 (SHA pending): rewrote the `OtOpcUaNodeManager` class XML +doc to describe the shipped folder-hierarchy / typed-variable / historized-node / Part 9 +condition materialisation (dropping the false "treats every id as a flat BaseDataVariableState" +claim and the retired `EquipmentNodeWalker`/F14b framing), and the two `OpcUaApplicationHost` +doc blocks (class summary + `BuildUserTokenPolicies`) to describe the shipped impersonation/auth +wiring instead of the "F13/F13c pending" framing. Doc-comment-only — no behaviour change; build +re-verified green. diff --git a/src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs b/src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs index 93b5230c..a591d281 100644 --- a/src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs +++ b/src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs @@ -83,13 +83,12 @@ public sealed class OpcUaApplicationHostOptions /// /// Thin facade over the OPC Foundation .NET Standard SDK's application bootstrap. -/// Owns the + lifetime -/// and starts a with the supplied node-manager factory. -/// -/// Full extraction from legacy OtOpcUa.Server (security wiring, ScriptedAlarmDescriptor -/// pipeline, ResilienceController, history backend, observability hooks) is tracked as -/// follow-up F13. This facade compiles + boots the SDK so Task 53 can wire the fused Host's -/// driver-role startup against it. +/// Owns the + lifetime, +/// builds the programmatically (security profiles + user +/// token policies + PKI stores), starts a , attaches the +/// ImpersonateUser hook for UserName-token authentication via +/// , and publishes the redundancy peer URIs through +/// Server.ServerArray. See docs/OpcUaServer.md and docs/security.md. /// public sealed class OpcUaApplicationHost : IAsyncDisposable { @@ -419,8 +418,11 @@ public sealed class OpcUaApplicationHost : IAsyncDisposable /// /// Anonymous + UserName token policies. UserName tokens are always SDK-encrypted with /// the server certificate (see docs/security.md "UserName token encryption") so the - /// policy works on None endpoints too. F13c will plug a real LDAP-bound validator into - /// StandardServer.SessionManager.ImpersonateUser. + /// policy works on None endpoints too. The real LDAP-bound validator is wired through + /// on StandardServer.SessionManager.ImpersonateUser + /// (see / ); the Host + /// project supplies the LDAP adapter, and denies all + /// UserName logins when none is registered. /// internal static IEnumerable BuildUserTokenPolicies() { diff --git a/src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs b/src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs index 456b3196..8b591777 100644 --- a/src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs +++ b/src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs @@ -24,10 +24,15 @@ namespace ZB.MOM.WW.OtOpcUa.OpcUaServer; /// materialise as real OPC UA Variable updates that clients can browse + subscribe to. /// /// Node-id encoding uses the manager's default namespace + the caller-supplied string id -/// as the identifier portion (e.g. "ns=2;s=eq-1/temp"). Equipment-folder hierarchy -/// and OPC UA type metadata still come from the AddressSpaceApplier / EquipmentNodeWalker -/// integration (F14b, tracked under #85) — this manager treats every id as a flat -/// under the namespace root. +/// as the identifier portion (e.g. "ns=2;s=eq-1/temp"). Beyond lazily-created flat +/// variables, the manager also materialises the full UNS Area/Line/Equipment folder hierarchy +/// (), typed equipment-tag variables with the correct built-in +/// DataType / array shape / access levels (), historized +/// nodes (the HistoryRead access bit + HistoryRead overrides), and real Part 9 +/// nodes (). The +/// AddressSpaceApplier drives these passes from the composed deployment artifact; +/// the legacy EquipmentNodeWalker server-side integration was retired in favour of the +/// (composer → applier → sink → node-manager) chain. /// public sealed class OtOpcUaNodeManager : CustomNodeManager2 {