- OpcUaServer-002: HistoryRead-Events NumValuesPerNode==0 now maps to unbounded (int.MaxValue) instead of the backend default-cap sentinel; no Core.Abstractions contract change (+EventMaxEvents helper tests) - OpcUaServer-004: EnsureAddressSpaceCreated guard on public mutators -> clear InvalidOperationException instead of bare NRE if called pre-start (+tests) - OpcUaServer-003: Deferred (endUtc inclusive/exclusive needs live Wonderware boundary confirmation) - Configuration-013: wire DraftValidator.ValidateClusterTopology into AdminOperationsActor deploy gate (read-only, no migration) (+2 tests)
19 KiB
Code Review — OpcUaServer
| Field | Value |
|---|---|
| Module | src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer |
| Reviewer | Claude Code |
| Review date | 2026-06-19 |
| Commit reviewed | 7286d320 |
| Status | Reviewed |
| Open findings | 1 |
Checklist coverage
A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | OpcUaServer-001, -002, -003, -007 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found (Lock discipline + fire-and-forget dispatch verified correct) |
| 4 | Error handling & resilience | OpcUaServer-004 |
| 5 | Security | No issues found (WriteOperate / AlarmAck gates fail closed; anonymous → BadUserAccessDenied; HistoryRead AccessLevel bits correct) |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found (ConditionId=dotted-FullName routing, MapSeverity buckets, fire-and-forget dispatch all match CLAUDE.md / docs) |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found (HistoryRead paging incl. oversized-tie-cluster + backstop is covered end-to-end; OpcUaServer-005 re-triaged to Won't Fix) |
| 10 | Documentation & comments | OpcUaServer-006 (Resolved) |
Findings
OpcUaServer-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | AddressSpacePlan.cs:56 (AddressSpacePlan.IsEmpty), AddressSpacePlan.cs:80 (AddressSpacePlanner.Compute) |
| Status | Open |
Description: AddressSpaceComposition carries the UNS topology (UnsAreas + UnsLines), and
AddressSpaceApplier.MaterialiseHierarchy uses each area's/line's DisplayName for the OPC UA
folder display name. But AddressSpacePlanner.Compute only diffs Equipment / DriverInstance /
ScriptedAlarm / EquipmentTag / EquipmentVirtualTag — it never diffs UnsAreas / UnsLines. So a
deployment whose ONLY change is a UNS Area or Line rename (no equipment/driver/alarm/tag/vtag
delta) produces a plan whose IsEmpty is true. In OpcUaPublishActor.HandleRebuild
(src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs:313-318) an empty plan
short-circuits BEFORE MaterialiseHierarchy runs, so the renamed Area/Line folder keeps its stale
display name until some unrelated structural change forces a full rebuild. Compounding it,
OtOpcUaNodeManager.EnsureFolder (OtOpcUaNodeManager.cs:1272) early-returns for an
already-present folder id and never updates an existing folder's DisplayName, so even if
MaterialiseHierarchy were reached on a rename it would be a no-op until RebuildAddressSpace
clears _folders.
Recommendation: Carry UNS area/line diff sets into AddressSpacePlan (mirroring the
init-only EquipmentTag/VirtualTag pattern) so a rename is no longer "empty", and have the apply
path drive a hierarchy refresh (or make EnsureFolder update an existing folder's DisplayName
in place + ClearChangeMasks). Deferred: a complete fix spans the Runtime module
(OpcUaPublishActor must honour the new plan flag and call a hierarchy-refresh / rebuild path),
which is outside this module's edit boundary.
Resolution: (Open — deferred: needs a coordinated change in the Runtime module's OpcUaPublishActor to act on a UNS-changed plan; an in-AddressSpacePlan change alone is inert because Apply/MaterialiseHierarchy do not refresh existing folder names.)
OpcUaServer-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | OtOpcUaNodeManager.cs:1748 (HistoryReadEvents), OtOpcUaNodeManager.cs:1814 (ClampToInt) |
| Status | Resolved |
Description: For HistoryRead-Events, HistoryReadEvents passes
ClampToInt(details.NumValuesPerNode) to IHistorianDataSource.ReadEventsAsync(maxEvents) and
always returns the result with ContinuationPoint = null ("the full window in one shot"). The
events arm never issues continuation points. By the IHistoryProvider.ReadEventsAsync contract
(src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:88-95), maxEvents <= 0 is a
sentinel meaning "use the backend's default cap". But OPC UA Part 4/11 define
NumValuesPerNode == 0 on a HistoryRead as "no limit — return ALL values". So a client that asks
for all events in a window (NumValuesPerNode == 0) is silently truncated at the backend's
default cap, and because no continuation point (and no GoodMoreData) is returned, the client has
no signal that the result was truncated — it believes it received the complete window. This is a
silent-data-loss / spec-deviation on the events arm.
Recommendation: Either (a) translate NumValuesPerNode == 0 to "unbounded" for the events
backend (e.g. saturate to a very large cap) rather than the "default cap" sentinel, or (b) detect
backend truncation and surface a continuation point / GoodMoreData for events. Deferred: option
(a) changes the documented maxEvents <= 0 sentinel semantics shared with the Wonderware/OpcUaClient
event backends (cross-module, Core.Abstractions contract); option (b) requires the backend to report
truncation. Both cross this module's boundary.
Resolution: Resolved — 2026-06-20 (SHA pending): fixed locally inside OpcUaServer without touching
the cross-module IHistoryProvider.ReadEventsAsync maxEvents <= 0 sentinel. Added a small testable
internal static int EventMaxEvents(uint numValuesPerNode) helper next to ClampToInt that translates
the OPC UA Part 4/11 "no limit" request (NumValuesPerNode == 0) to UNBOUNDED (int.MaxValue, a very
large positive cap) rather than the backend's <= 0 "use the default cap" sentinel; a positive value
still passes through ClampToInt unchanged. HistoryReadEvents now calls EventMaxEvents(details.NumValuesPerNode)
instead of ClampToInt(details.NumValuesPerNode), so a "give me the whole window" events read is no
longer silently truncated at the backend default. The sentinel contract + the Wonderware/OpcUaClient
backends are untouched (a positive int.MaxValue is never the <= 0 sentinel). Tests:
NodeManagerEventMaxEventsTests (helper purity — 0u→int.MaxValue, normal passthrough,
>int.MaxValue→int.MaxValue clamp, exact-int.MaxValue boundary) plus
NodeManagerHistoryReadEventsTests.Events_unbounded_request_passes_int_max_to_backend (the recording fake
IHistorianDataSource receives int.MaxValue when NumValuesPerNode == 0). Note: option (b) — surfacing
a continuation point / GoodMoreData on backend truncation — remains a cross-module/backend change and is
out of scope; option (a) here removes the silent-truncation defect for the common "all events" request.
OpcUaServer-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | OtOpcUaNodeManager.cs:1978 (ServeRawPaged), HistoryPaging.cs (whole), HistoryPaging.cs:213 (SliceTieCluster next <= endUtc) |
| Status | Deferred |
Description: The Raw paging chain treats endUtc as an inclusive upper bound throughout —
the HistoryContinuationState/HistoryPaging XML docs all say "the original (inclusive) end of
the window", and SliceTieCluster advances with next <= endUtc. But the backend contract
(IHistoryProvider.ReadRawAsync, IHistoryProvider.cs:25) defines endUtc as an exclusive
upper bound. The two arms therefore disagree at the exact endUtc boundary. Impact is small
because the backend is the authority on which samples actually exist (a sample at exactly endUtc
never appears in an exclusive-end read), so the disagreement only ever yields one extra empty
resume page ([endUtc, endUtc) → GoodNoData, no CP) rather than duplicated or dropped data.
Separately, the SDK permits a Raw read with StartTime == DateTime.MinValue + a NumValuesPerNode
cap (the OPC UA "read N values backward from EndTime" request — validated in
CustomNodeManager.HistoryRead); the forward-only resume cursor (ComputeResumeCursor resumes from
the LAST sample forward) is not defined for a backward read, so paging a backward Raw read is
unsound if the backend honours the backward direction.
Recommendation: Pin the endUtc inclusive-vs-exclusive convention against the actual backend
(align the paging XML docs + SliceTieCluster comparison to the contract), and either explicitly
reject a StartTime == MinValue paged Raw read with a clear status or document that the backend is
assumed to always return chronological forward results regardless of direction. Deferred: resolving
the inclusive/exclusive question requires confirming the Wonderware backend's actual boundary
semantics (cross-module / infra), and changing a comparison without that confirmation risks the
opposite off-by-one.
Resolution: Deferred — 2026-06-20: infra-gated. Resolving the endUtc inclusive-vs-exclusive
disagreement requires confirming the actual Wonderware historian backend's boundary semantics, which is
hardware/infra-gated and not reachable from this macOS dev host. The impact is benign and bounded — because
the backend is the authority on which samples exist (a sample at exactly endUtc never appears in an
exclusive-end read), the disagreement only ever yields ONE extra empty resume page ([endUtc, endUtc) →
GoodNoData, no continuation point) rather than any duplicated or dropped data. Changing the
SliceTieCluster comparison / paging XML docs without confirming the live backend boundary risks
introducing the opposite off-by-one, so no code is changed here pending that live confirmation.
OpcUaServer-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | OtOpcUaNodeManager.cs:1597 (ResolveParentFolder), and every public sink mutator that calls it (EnsureFolder 1278, EnsureVariable 1335, MaterialiseAlarmCondition 597, plus WriteValue/WriteAlarmCondition CreateVariable) |
| Status | Resolved |
Description: ResolveParentFolder dereferences _root! with the null-forgiving operator, and
CreateVariable uses _root (AddChild). _root is only assigned in CreateAddressSpace, which
the SDK invokes during StandardServer start. Every public address-space mutator
(WriteValue, WriteAlarmCondition, EnsureFolder, EnsureVariable, MaterialiseAlarmCondition)
assumes CreateAddressSpace has already run. If any of these is ever called before the server has
started (e.g. a sink wired or a publish replayed before StartAsync completes), _root is null
and the call NREs out of the node manager. In the current boot ordering the host wires the
SdkAddressSpaceSink only after start, so this is latent rather than live, but it is an unguarded
ordering hazard on the highest-risk class.
Recommendation: Add an explicit guard (e.g. throw a clear InvalidOperationException("address space not yet created") or no-op with a logged warning) when _root is null at the top of the
mutators, so a too-early call fails legibly instead of with a bare NRE. Low priority — defensive
hardening, not a live defect. Left Open to avoid an unscoped change to the mutator entry points on
this critical class without a regression scenario that reproduces the early-call ordering.
Resolution: Resolved — 2026-06-20 (SHA pending): added a private EnsureAddressSpaceCreated() helper
that throws InvalidOperationException("OPC UA address space has not been created yet (server not started.)")
when _root is null, and call it at the top of ResolveParentFolder and at every public address-space
mutator entry point (WriteValue, WriteAlarmCondition, EnsureFolder, EnsureVariable,
MaterialiseAlarmCondition) — right after argument validation, before any _root dereference. A too-early
call (a sink wired or a publish replayed before StartAsync drives CreateAddressSpace) now fails legibly
instead of with a bare NRE out of ResolveParentFolder / CreateVariable. Happy-path behaviour is
unchanged. The guard was test-feasible after all: NodeManagerPreStartGuardTests boots a real host,
borrows the live node manager's real IServerInternal, constructs a SECOND, never-started
OtOpcUaNodeManager from it (so _root is null), and asserts each of the four lock-taking mutators
(EnsureFolder/EnsureVariable/WriteValue/MaterialiseAlarmCondition) throws InvalidOperationException
(not NRE), with the folder case asserting the message text. (WriteAlarmCondition's guard is identical and
sits on the same path; it is build-verified.) Full OpcUaServer.Tests suite green (284/284).
OpcUaServer-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | OtOpcUaNodeManager.cs:2049 (ServeRawPaged tie-cluster stall path), OtOpcUaNodeManager.cs:2068 (absurd-burst backstop) |
| Status | Won't Fix |
Description: (Re-triaged to a non-issue.) Initial concern was that the oversized-tie-cluster
paging path in ServeRawPaged (over-fetch the whole cluster, SliceTieCluster, the
MaxTieClusterOverfetch absurd-burst BadHistoryOperationUnsupported backstop) lacked an
end-to-end node-manager regression and was only covered at the pure level by HistoryPagingTests.
Recommendation: None required.
Resolution: Won't Fix — 2026-06-19: coverage already exists end-to-end. Verified that
NodeManagerHistoryReadPagingTests.Raw_oversized_tie_cluster_pages_within_the_timestamp drives a
real multi-page resume INTO the stall (a 5-way tie cluster larger than the page cap) through a
booted OtOpcUaSdkServer + InMemoryHistoryContinuationStore and asserts lossless duplicate-free
paging, and Raw_tie_cluster_beyond_overfetch_bound_fails_loudly exercises the
> MaxTieClusterOverfetch backstop. The module's HistoryRead paging surface is comprehensively
tested; no test gap.
OpcUaServer-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | OtOpcUaNodeManager.cs:11-30 (class XML doc), OpcUaApplicationHost.cs:88-93 / OpcUaApplicationHost.cs:421-423 (F13/F13c follow-up notes) |
| Status | Resolved |
Description: Several XML comments reference superseded plan milestones as if still pending. The
OtOpcUaNodeManager class doc says equipment-folder hierarchy + type metadata "still come from the
AddressSpaceApplier / EquipmentNodeWalker integration (F14b, tracked under #85)", but
EquipmentNodeWalker is no longer the integration path (the composer → applier → sink → node-manager
chain replaced it, per AddressSpaceComposer's own header) and #85 hierarchy materialisation has
shipped. OpcUaApplicationHost still carries "Full extraction … is tracked as follow-up F13" and
"F13c will plug a real LDAP-bound validator" notes, but the IOpcUaUserAuthenticator /
RoleCarryingUserIdentity impersonation wiring is implemented and live (the doc/docs/OpcUaServer.md
describe it as done). These are stale-comment / undocumented-as-shipped issues, not behaviour bugs.
Recommendation: Refresh the class- and method-level doc comments to drop the
EquipmentNodeWalker/F13/F13c/F14b "pending" framing and describe the shipped behaviour, matching
docs/OpcUaServer.md.
Resolution: Resolved — 2026-06-19 (SHA pending): rewrote the OtOpcUaNodeManager class XML
doc to describe the shipped folder-hierarchy / typed-variable / historized-node / Part 9
condition materialisation (dropping the false "treats every id as a flat BaseDataVariableState"
claim and the retired EquipmentNodeWalker/F14b framing), and the two OpcUaApplicationHost
doc blocks (class summary + BuildUserTokenPolicies) to describe the shipped impersonation/auth
wiring instead of the "F13/F13c pending" framing. Doc-comment-only — no behaviour change; build
re-verified green.
OpcUaServer-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | OtOpcUaNodeManager.cs:674 (MaterialiseAlarmCondition → alarm.OnTimedUnshelve) |
| Status | Resolved |
Description: The system-timer auto-unshelve callback (alarm.OnTimedUnshelve, wired in
MaterialiseAlarmCondition) — fired by the SDK when a TimedShelve duration expires — built its
AlarmCommand with User: string.Empty:
AlarmCommandRouter?.Invoke(new AlarmCommand(alarmId, "Unshelve", string.Empty, null, null)). That
command is routed to the scripted-alarm engine, whose ScriptedAlarmEngine.UnshelveAsync calls
Part9StateMachine.ApplyUnshelve(cur, user, _clock())
(src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/Part9StateMachine.cs:211-213), and that method
opens with if (string.IsNullOrWhiteSpace(user)) throw new ArgumentException("User required.", …).
The exception is swallowed downstream (caught by ScriptedAlarmHostActor), so this is NOT a crash —
but the auto-unshelve silently no-ops: a TimedShelve never auto-expires, leaving the alarm
permanently shelved with no operator-visible error. The bug was operationally invisible and even had
a green node-manager test that asserted cmd.User == string.Empty, encoding the defect as expected
behaviour. The separate, correct design rule — OnTimedUnshelve BYPASSES the AlarmAck role gate
because it is a session-less system timer with no client principal (routing through the gated
HandleAlarmCommand would return BadUserAccessDenied) — is intentional and was preserved; only the
empty User string was the defect.
Recommendation: Pass a non-empty system user at the OnTimedUnshelve call site so ApplyUnshelve
accepts it, while keeping the gate bypass. Use the codebase-wide canonical "system" system-actor
label (matching Part9StateMachine's own AutoUnshelve audit user, AlarmConditionState's
"engine-internal events ⇒ system" doc, and AuditActor.SystemFallback = "system").
Resolution: Resolved — 2026-06-19 (SHA pending): changed the OnTimedUnshelve delegate in
MaterialiseAlarmCondition (OtOpcUaNodeManager.cs:674) to build the AlarmCommand with
User: "system" instead of string.Empty, so the engine's ApplyUnshelve user-required guard
accepts the system-initiated unshelve and the timed auto-unshelve actually applies. The AlarmAck
gate bypass for the session-less system timer is unchanged. TDD: repurposed the existing
AlarmCommandRouterTests.OnTimedUnshelve_with_system_context_returns_good_and_routes_unshelve
(which previously asserted cmd.User == string.Empty) to assert a non-empty "system" user —
verified it failed against the unfixed source (cmd.User should not be null or white space) and
passes after the fix. Surgical one-line src change; no public-contract/wire change. Full
OpcUaServer.Tests suite green (275/275).