review(OpcUaServer): record findings + fix stale node-manager/host docs

First review of the v2 OPC UA core at HEAD 7286d320. 6 findings (2 Medium, 4 Low).
OpcUaServer-006 fixed (stale NodeManager/ApplicationHost XML docs). 001-004 deferred
(cross-module: Runtime publish-actor / Core.Abstractions history contract / Wonderware
boundary semantics, or latent-only). 005 re-triaged Won't-Fix (coverage already exists).
High-scrutiny paths (Lock discipline, OnWriteValue fire-and-forget, WriteOperate/AlarmAck
gates, HistoryRead AccessLevel bits) verified correct.
This commit is contained in:
Joseph Doherty
2026-06-19 10:37:00 -04:00
parent e4abe186a1
commit bac6613dd2
3 changed files with 231 additions and 13 deletions
+211
View File
@@ -0,0 +1,211 @@
# Code Review — OpcUaServer
| Field | Value |
|---|---|
| Module | `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer` |
| Reviewer | Claude Code |
| Review date | 2026-06-19 |
| Commit reviewed | `7286d320` |
| Status | Reviewed |
| Open findings | 4 |
## Checklist coverage
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | OpcUaServer-001, -002, -003 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found (Lock discipline + fire-and-forget dispatch verified correct) |
| 4 | Error handling & resilience | OpcUaServer-004 |
| 5 | Security | No issues found (WriteOperate / AlarmAck gates fail closed; anonymous → BadUserAccessDenied; HistoryRead AccessLevel bits correct) |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found (ConditionId=dotted-FullName routing, MapSeverity buckets, fire-and-forget dispatch all match CLAUDE.md / docs) |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found (HistoryRead paging incl. oversized-tie-cluster + backstop is covered end-to-end; OpcUaServer-005 re-triaged to Won't Fix) |
| 10 | Documentation & comments | OpcUaServer-006 (Resolved) |
## Findings
<!-- One ### entry per finding. IDs are OpcUaServer-NNN, sequential within the module,
never reused. Findings are never deleted — close them by changing Status and
completing Resolution. -->
### OpcUaServer-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `AddressSpacePlan.cs:56` (`AddressSpacePlan.IsEmpty`), `AddressSpacePlan.cs:80` (`AddressSpacePlanner.Compute`) |
| Status | Open |
**Description:** `AddressSpaceComposition` carries the UNS topology (`UnsAreas` + `UnsLines`), and
`AddressSpaceApplier.MaterialiseHierarchy` uses each area's/line's `DisplayName` for the OPC UA
folder display name. But `AddressSpacePlanner.Compute` only diffs Equipment / DriverInstance /
ScriptedAlarm / EquipmentTag / EquipmentVirtualTag — it never diffs `UnsAreas` / `UnsLines`. So a
deployment whose ONLY change is a UNS Area or Line **rename** (no equipment/driver/alarm/tag/vtag
delta) produces a plan whose `IsEmpty` is `true`. In `OpcUaPublishActor.HandleRebuild`
(`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs:313-318`) an empty plan
short-circuits BEFORE `MaterialiseHierarchy` runs, so the renamed Area/Line folder keeps its stale
display name until some unrelated structural change forces a full rebuild. Compounding it,
`OtOpcUaNodeManager.EnsureFolder` (`OtOpcUaNodeManager.cs:1272`) early-returns for an
already-present folder id and never updates an existing folder's `DisplayName`, so even if
`MaterialiseHierarchy` were reached on a rename it would be a no-op until `RebuildAddressSpace`
clears `_folders`.
**Recommendation:** Carry UNS area/line diff sets into `AddressSpacePlan` (mirroring the
init-only EquipmentTag/VirtualTag pattern) so a rename is no longer "empty", and have the apply
path drive a hierarchy refresh (or make `EnsureFolder` update an existing folder's `DisplayName`
in place + `ClearChangeMasks`). Deferred: a complete fix spans the Runtime module
(`OpcUaPublishActor` must honour the new plan flag and call a hierarchy-refresh / rebuild path),
which is outside this module's edit boundary.
**Resolution:** _(Open — deferred: needs a coordinated change in the Runtime module's `OpcUaPublishActor` to act on a UNS-changed plan; an in-`AddressSpacePlan` change alone is inert because `Apply`/`MaterialiseHierarchy` do not refresh existing folder names.)_
### OpcUaServer-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `OtOpcUaNodeManager.cs:1748` (`HistoryReadEvents`), `OtOpcUaNodeManager.cs:1814` (`ClampToInt`) |
| Status | Open |
**Description:** For HistoryRead-Events, `HistoryReadEvents` passes
`ClampToInt(details.NumValuesPerNode)` to `IHistorianDataSource.ReadEventsAsync(maxEvents)` and
always returns the result with `ContinuationPoint = null` ("the full window in one shot"). The
events arm never issues continuation points. By the `IHistoryProvider.ReadEventsAsync` contract
(`src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:88-95`), `maxEvents <= 0` is a
sentinel meaning "use the backend's **default cap**". But OPC UA Part 4/11 define
`NumValuesPerNode == 0` on a HistoryRead as "no limit — return ALL values". So a client that asks
for all events in a window (`NumValuesPerNode == 0`) is silently truncated at the backend's
default cap, and because no continuation point (and no `GoodMoreData`) is returned, the client has
no signal that the result was truncated — it believes it received the complete window. This is a
silent-data-loss / spec-deviation on the events arm.
**Recommendation:** Either (a) translate `NumValuesPerNode == 0` to "unbounded" for the events
backend (e.g. saturate to a very large cap) rather than the "default cap" sentinel, or (b) detect
backend truncation and surface a continuation point / `GoodMoreData` for events. Deferred: option
(a) changes the documented `maxEvents <= 0` sentinel semantics shared with the Wonderware/OpcUaClient
event backends (cross-module, Core.Abstractions contract); option (b) requires the backend to report
truncation. Both cross this module's boundary.
**Resolution:** _(Open — deferred: rooted in the cross-module `IHistoryProvider.ReadEventsAsync` `maxEvents <= 0` sentinel contract (Core.Abstractions-006) and the Wonderware/OpcUaClient event backends; cannot be fixed safely inside OpcUaServer alone.)_
### OpcUaServer-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `OtOpcUaNodeManager.cs:1978` (`ServeRawPaged`), `HistoryPaging.cs` (whole), `HistoryPaging.cs:213` (`SliceTieCluster` `next <= endUtc`) |
| Status | Open |
**Description:** The Raw paging chain treats `endUtc` as an **inclusive** upper bound throughout —
the `HistoryContinuationState`/`HistoryPaging` XML docs all say "the original (inclusive) end of
the window", and `SliceTieCluster` advances with `next <= endUtc`. But the backend contract
(`IHistoryProvider.ReadRawAsync`, `IHistoryProvider.cs:25`) defines `endUtc` as an **exclusive**
upper bound. The two arms therefore disagree at the exact `endUtc` boundary. Impact is small
because the backend is the authority on which samples actually exist (a sample at exactly `endUtc`
never appears in an exclusive-end read), so the disagreement only ever yields one extra empty
resume page (`[endUtc, endUtc)` → GoodNoData, no CP) rather than duplicated or dropped data.
Separately, the SDK permits a Raw read with `StartTime == DateTime.MinValue` + a `NumValuesPerNode`
cap (the OPC UA "read N values backward from EndTime" request — validated in
`CustomNodeManager.HistoryRead`); the forward-only resume cursor (`ComputeResumeCursor` resumes from
the LAST sample forward) is not defined for a backward read, so paging a backward Raw read is
unsound if the backend honours the backward direction.
**Recommendation:** Pin the `endUtc` inclusive-vs-exclusive convention against the actual backend
(align the paging XML docs + `SliceTieCluster` comparison to the contract), and either explicitly
reject a `StartTime == MinValue` paged Raw read with a clear status or document that the backend is
assumed to always return chronological forward results regardless of direction. Deferred: resolving
the inclusive/exclusive question requires confirming the Wonderware backend's actual boundary
semantics (cross-module / infra), and changing a comparison without that confirmation risks the
opposite off-by-one.
**Resolution:** _(Open — deferred: needs the backend's authoritative endUtc boundary semantics confirmed before the comparison/doc is changed; flipping it blindly risks an off-by-one in the other direction.)_
### OpcUaServer-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `OtOpcUaNodeManager.cs:1597` (`ResolveParentFolder`), and every public sink mutator that calls it (`EnsureFolder` 1278, `EnsureVariable` 1335, `MaterialiseAlarmCondition` 597, plus `WriteValue`/`WriteAlarmCondition` `CreateVariable`) |
| Status | Open |
**Description:** `ResolveParentFolder` dereferences `_root!` with the null-forgiving operator, and
`CreateVariable` uses `_root` (`AddChild`). `_root` is only assigned in `CreateAddressSpace`, which
the SDK invokes during `StandardServer` start. Every public address-space mutator
(`WriteValue`, `WriteAlarmCondition`, `EnsureFolder`, `EnsureVariable`, `MaterialiseAlarmCondition`)
assumes `CreateAddressSpace` has already run. If any of these is ever called before the server has
started (e.g. a sink wired or a publish replayed before `StartAsync` completes), `_root` is `null`
and the call NREs out of the node manager. In the current boot ordering the host wires the
`SdkAddressSpaceSink` only after start, so this is latent rather than live, but it is an unguarded
ordering hazard on the highest-risk class.
**Recommendation:** Add an explicit guard (e.g. throw a clear `InvalidOperationException("address
space not yet created")` or no-op with a logged warning) when `_root` is null at the top of the
mutators, so a too-early call fails legibly instead of with a bare NRE. Low priority — defensive
hardening, not a live defect. Left Open to avoid an unscoped change to the mutator entry points on
this critical class without a regression scenario that reproduces the early-call ordering.
**Resolution:** _(Open — defensive-only; latent given current boot ordering. Deferred to avoid an unscoped guard-add across five mutators without a reproducing pre-start ordering scenario.)_
### OpcUaServer-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `OtOpcUaNodeManager.cs:2049` (`ServeRawPaged` tie-cluster stall path), `OtOpcUaNodeManager.cs:2068` (absurd-burst backstop) |
| Status | Won't Fix |
**Description:** (Re-triaged to a non-issue.) Initial concern was that the oversized-tie-cluster
paging path in `ServeRawPaged` (over-fetch the whole cluster, `SliceTieCluster`, the
`MaxTieClusterOverfetch` absurd-burst `BadHistoryOperationUnsupported` backstop) lacked an
end-to-end node-manager regression and was only covered at the pure level by `HistoryPagingTests`.
**Recommendation:** None required.
**Resolution:** Won't Fix — 2026-06-19: coverage already exists end-to-end. Verified that
`NodeManagerHistoryReadPagingTests.Raw_oversized_tie_cluster_pages_within_the_timestamp` drives a
real multi-page resume INTO the stall (a 5-way tie cluster larger than the page cap) through a
booted `OtOpcUaSdkServer` + `InMemoryHistoryContinuationStore` and asserts lossless duplicate-free
paging, and `Raw_tie_cluster_beyond_overfetch_bound_fails_loudly` exercises the
`> MaxTieClusterOverfetch` backstop. The module's HistoryRead paging surface is comprehensively
tested; no test gap.
### OpcUaServer-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `OtOpcUaNodeManager.cs:11-30` (class XML doc), `OpcUaApplicationHost.cs:88-93` / `OpcUaApplicationHost.cs:421-423` (F13/F13c follow-up notes) |
| Status | Resolved |
**Description:** Several XML comments reference superseded plan milestones as if still pending. The
`OtOpcUaNodeManager` class doc says equipment-folder hierarchy + type metadata "still come from the
AddressSpaceApplier / EquipmentNodeWalker integration (F14b, tracked under #85)", but
`EquipmentNodeWalker` is no longer the integration path (the composer → applier → sink → node-manager
chain replaced it, per `AddressSpaceComposer`'s own header) and #85 hierarchy materialisation has
shipped. `OpcUaApplicationHost` still carries "Full extraction … is tracked as follow-up F13" and
"F13c will plug a real LDAP-bound validator" notes, but the `IOpcUaUserAuthenticator` /
`RoleCarryingUserIdentity` impersonation wiring is implemented and live (the doc/`docs/OpcUaServer.md`
describe it as done). These are stale-comment / undocumented-as-shipped issues, not behaviour bugs.
**Recommendation:** Refresh the class- and method-level doc comments to drop the
`EquipmentNodeWalker`/F13/F13c/F14b "pending" framing and describe the shipped behaviour, matching
`docs/OpcUaServer.md`.
**Resolution:** Resolved — 2026-06-19 (SHA pending): rewrote the `OtOpcUaNodeManager` class XML
doc to describe the shipped folder-hierarchy / typed-variable / historized-node / Part 9
condition materialisation (dropping the false "treats every id as a flat BaseDataVariableState"
claim and the retired `EquipmentNodeWalker`/F14b framing), and the two `OpcUaApplicationHost`
doc blocks (class summary + `BuildUserTokenPolicies`) to describe the shipped impersonation/auth
wiring instead of the "F13/F13c pending" framing. Doc-comment-only — no behaviour change; build
re-verified green.
@@ -83,13 +83,12 @@ public sealed class OpcUaApplicationHostOptions
/// <summary>
/// Thin facade over the OPC Foundation .NET Standard SDK's application bootstrap.
/// Owns the <see cref="ApplicationInstance"/> + <see cref="ApplicationConfiguration"/> lifetime
/// and starts a <see cref="StandardServer"/> with the supplied node-manager factory.
///
/// Full extraction from legacy <c>OtOpcUa.Server</c> (security wiring, ScriptedAlarmDescriptor
/// pipeline, ResilienceController, history backend, observability hooks) is tracked as
/// follow-up F13. This facade compiles + boots the SDK so Task 53 can wire the fused Host's
/// driver-role startup against it.
/// Owns the <see cref="ApplicationInstance"/> + <see cref="ApplicationConfiguration"/> lifetime,
/// builds the <see cref="ApplicationConfiguration"/> programmatically (security profiles + user
/// token policies + PKI stores), starts a <see cref="StandardServer"/>, attaches the
/// <c>ImpersonateUser</c> hook for UserName-token authentication via
/// <see cref="IOpcUaUserAuthenticator"/>, and publishes the redundancy peer URIs through
/// <c>Server.ServerArray</c>. See <c>docs/OpcUaServer.md</c> and <c>docs/security.md</c>.
/// </summary>
public sealed class OpcUaApplicationHost : IAsyncDisposable
{
@@ -419,8 +418,11 @@ public sealed class OpcUaApplicationHost : IAsyncDisposable
/// <summary>
/// Anonymous + UserName token policies. UserName tokens are always SDK-encrypted with
/// the server certificate (see docs/security.md "UserName token encryption") so the
/// policy works on None endpoints too. F13c will plug a real LDAP-bound validator into
/// <c>StandardServer.SessionManager.ImpersonateUser</c>.
/// policy works on None endpoints too. The real LDAP-bound validator is wired through
/// <see cref="IOpcUaUserAuthenticator"/> on <c>StandardServer.SessionManager.ImpersonateUser</c>
/// (see <see cref="AttachUserAuthenticator"/> / <see cref="HandleImpersonation"/>); the Host
/// project supplies the LDAP adapter, and <see cref="NullOpcUaUserAuthenticator"/> denies all
/// UserName logins when none is registered.
/// </summary>
internal static IEnumerable<UserTokenPolicy> BuildUserTokenPolicies()
{
@@ -24,10 +24,15 @@ namespace ZB.MOM.WW.OtOpcUa.OpcUaServer;
/// materialise as real OPC UA Variable updates that clients can browse + subscribe to.
///
/// Node-id encoding uses the manager's default namespace + the caller-supplied string id
/// as the identifier portion (e.g. <c>"ns=2;s=eq-1/temp"</c>). Equipment-folder hierarchy
/// and OPC UA type metadata still come from the AddressSpaceApplier / EquipmentNodeWalker
/// integration (F14b, tracked under #85) — this manager treats every id as a flat
/// <see cref="BaseDataVariableState"/> under the namespace root.
/// as the identifier portion (e.g. <c>"ns=2;s=eq-1/temp"</c>). Beyond lazily-created flat
/// variables, the manager also materialises the full UNS Area/Line/Equipment folder hierarchy
/// (<see cref="EnsureFolder"/>), typed equipment-tag variables with the correct built-in
/// <c>DataType</c> / array shape / access levels (<see cref="EnsureVariable"/>), historized
/// nodes (the <c>HistoryRead</c> access bit + HistoryRead overrides), and real Part 9
/// <see cref="AlarmConditionState"/> nodes (<see cref="MaterialiseAlarmCondition"/>). The
/// <c>AddressSpaceApplier</c> drives these passes from the composed deployment artifact;
/// the legacy <c>EquipmentNodeWalker</c> server-side integration was retired in favour of the
/// (composer → applier → sink → node-manager) chain.
/// </summary>
public sealed class OtOpcUaNodeManager : CustomNodeManager2
{