Files
lmxopcua/code-reviews/Server/findings.md
Joseph Doherty 571066130b fix(server): stop WriteNodeIdUnknown infinite recursion (Server-001)
WriteNodeIdUnknown called itself unconditionally as its first statement
— unbounded recursion with no base case → StackOverflowException, an
uncatchable process crash reachable by any client issuing a HistoryRead
on an unresolvable NodeId (remote DoS).

Replace the self-call with the result-slot assignment, mirroring
WriteUnsupported / WriteInternalError. The helper is now internal so the
regression test can pin the StatusCode without a server fixture.

Resolves code-review finding Server-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:53:44 -04:00

15 KiB

Code Review — Server

Field Value
Module src/Server/ZB.MOM.WW.OtOpcUa.Server
Reviewer Claude Code
Review date 2026-05-22
Commit reviewed 76d35d1
Status Reviewed
Open findings 14

Checklist coverage

# Category Result
1 Correctness & logic bugs Server-001, Server-002, Server-003
2 OtOpcUa conventions Server-004
3 Concurrency & thread safety Server-005, Server-006
4 Error handling & resilience Server-007, Server-008
5 Security Server-009, Server-010, Server-011
6 Performance & resource management Server-012
7 Design-document adherence Server-013
8 Code organization & conventions Server-014
9 Testing coverage No issues found
10 Documentation & comments Server-015

Findings

Server-001

Field Value
Severity Critical
Category Correctness & logic bugs
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:1791
Status Resolved

Description: WriteNodeIdUnknown calls itself unconditionally as its first statement, then sets errors[i]. Unbounded recursion with no base case overflows the stack. Called from all four HistoryRead* overrides whenever a HistoryRead targets a node whose NodeId cannot be resolved to a driver full reference. Any client issuing such a HistoryRead triggers an uncatchable StackOverflowException that terminates the process — a remotely-triggerable DoS.

Recommendation: Replace the self-call with the result-slot assignment mirroring WriteUnsupported/WriteInternalError: results[i] = new OpcHistoryReadResult { StatusCode = StatusCodes.BadNodeIdUnknown }; then errors[i] = StatusCodes.BadNodeIdUnknown;.

Resolution: Resolved 2026-05-22 — replaced the unconditional self-call in WriteNodeIdUnknown with the result-slot assignment (results[i] = new OpcHistoryReadResult { StatusCode = StatusCodes.BadNodeIdUnknown }), mirroring WriteUnsupported/WriteInternalError; the helper is now internal for testability. Regression test DriverNodeManagerHistoryMappingTests.WriteNodeIdUnknown_returns_BadNodeIdUnknown_without_unbounded_recursion runs the helper on a small-stack worker thread and asserts it returns promptly with BadNodeIdUnknown.

Server-002

Field Value
Severity High
Category Correctness & logic bugs
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs:60-63
Status Open

Description: IsAllowed does if (decision.IsAllowed) return true; return !_strictMode;. When a session carries resolved LDAP groups and the evaluator returns an explicit deny, lax mode (default) overrides it to true. The lax fallback is intended only for sessions lacking LDAP groups / missing tries, but here it also nullifies authored NodeAcl deny rules for fully-resolved sessions. Per-tag deny ACLs do nothing until StrictMode is on.

Recommendation: Distinguish "indeterminate / no grant" from "explicit deny." Fall through to !_strictMode only when indeterminate; an explicit deny returns false regardless of mode. Extend AuthorizeDecision with an IsExplicitDeny flag if needed.

Resolution: (open)

Server-003

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/RingBufferHistoryWriter.cs:96-119
Status Open

Description: ReadRawAsync's XML doc claims "newest-first," but TagRingBuffer.Snapshot() returns oldest-to-newest and the loop preserves that order — so results are oldest-first. Also maxValuesPerNode is capped against total buffer size before the [startUtc, endUtc) filter, so a paged read returns the oldest in-window samples, contradicting the doc and usual HistoryRead expectations.

Recommendation: Make code and doc agree on ordering (raw HistoryRead is normally ascending source-timestamp). Apply maxValuesPerNode to the in-window count, not the whole buffer.

Resolution: (open)

Server-004

Field Value
Severity Low
Category OtOpcUa conventions
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200
Status Open

Description: RoleBasedIdentity declares its own Display property, but the base UserIdentity already has a settable DisplayName. DriverNodeManager.ResolveCallUser/RouteScriptedAlarmMethodCalls read the base DisplayName, never Display. Since the ctor passes only userName to base, DisplayName resolves to the username — so scripted-alarm Ack/Confirm/Shelve audit entries record the raw username, not the LDAP-resolved display name the comment promises. Display is dead code.

Recommendation: Drop Display; set the base DisplayName = displayName ?? userName;. Verify ResolveCallUser yields the resolved display name.

Resolution: (open)

Server-005

Field Value
Severity Medium
Category Concurrency & thread safety
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs:166, src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:303-311
Status Open

Description: OnValueChanged raises TransitionRaised on the value-change thread; the subscriber OnAlarmServiceTransition drives ConditionSink.OnTransitionalarm.ReportEvent. DriverNodeManager.Dispose detaches the handler but does not synchronise against an in-flight Invoke. The service is process-shared across drivers, so a transition can dispatch to a ConditionSink whose DriverNodeManager is concurrently being disposed → ReportEvent on a torn-down node manager.

Recommendation: Guard OnAlarmServiceTransition with a _disposed check under Lock before sink.OnTransition. Document that handlers must tolerate invocation during their owner's disposal.

Resolution: (open)

Server-006

Field Value
Severity Low
Category Concurrency & thread safety
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348
Status Open

Description: OnReadValue/OnWriteValue are synchronous stack hooks that block on async driver calls via .GetAwaiter().GetResult() with CancellationToken.None. With MaxRequestThreadCount = 100, a burst of reads/writes into a stalled driver pins request threads for the full pipeline timeout, exhausting the pool and stalling unrelated sessions. The call cannot be cancelled by a client timeout.

Recommendation: Derive a CancellationToken from the OperationContext / TransportQuotas.OperationTimeout so a stuck driver call is abandoned. Longer term, use the stack's async service overrides if available.

Resolution: (open)

Server-007

Field Value
Severity Medium
Category Error handling & resilience
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:179-183
Status Open

Description: HealthEndpointsHost is built without a configDbHealthy delegate, so the default () => true is used — /healthz always reports configDbReachable = true and never 503s on a DB outage. _staleConfigFlag is also never supplied by Program.cs, so the stale-config signal is inert too. /healthz degenerates to a pure liveness probe; operators get a false-healthy during a DB outage.

Recommendation: Wire a real config-DB probe (cheap cached SELECT 1) into HealthEndpointsHost, and register StaleConfigFlag in Program.cs. Or move DB health to /readyz and drop the misleading configDbReachable field.

Resolution: (open)

Server-008

Field Value
Severity Low
Category Error handling & resilience
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736
Status Open

Description: RouteScriptedAlarmMethodCalls marks a handled slot by setting errors[i] = ServiceResult.Good, assuming base.Call skips non-null Good error slots. The stack and GateCallMethodRequests only ever pre-populate Bad slots; the skip-on-Good assumption is not a guaranteed SDK contract. If base.Call re-dispatches, the engine method and the stack's built-in Part 9 handler both fire — double transition.

Recommendation: Verify against the pinned SDK whether base.Call skips Good-pre-populated slots. If not, exclude routed slots from methodsToCall before base.Call. Add a test asserting exactly-once engine transition for a routed Acknowledge.

Resolution: (open)

Server-009

Field Value
Severity High
Category Security
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/LdapOptions.cs:44, src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:74
Status Open

Description: AllowInsecureLdap defaults to true (and Program.cs reads ?? true); UseTls defaults to false. Out of the box, usernames and plaintext passwords are bound to LDAP over an unencrypted socket. A production deployment enabling LDAP without explicitly setting AllowInsecureLdap=false ships credentials in clear text on the server→LDAP hop.

Recommendation: Default AllowInsecureLdap to false in both the property initializer and the Program.cs fallback. Log a startup warning when LDAP is enabled with UseTls=false && AllowInsecureLdap=true.

Resolution: (open)

Server-010

Field Value
Severity Medium
Category Security
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59, src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291
Status Open

Description: AutoAcceptUntrustedClientCertificates defaults to true (Program.cs reads ?? true). BuildConfiguration wires a handler that accepts any client cert failing with BadCertificateUntrusted. A deployment that forgets to flip the flag accepts every untrusted client cert, defeating the PKI trust list. With the always-present None policy, the default posture is fully open.

Recommendation: Default AutoAcceptUntrustedClientCertificates to false; keep auto-accept as opt-in dev convenience. docs/security.md already shows false — align code to doc.

Resolution: (open)

Server-011

Field Value
Severity Medium
Category Security
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346
Status Open

Description: BuildUserTokenPolicies advertises a UserName token policy only when SecurityProfile == Basic256Sha256SignAndEncrypt && Ldap.Enabled. With the default SecurityProfile = None and Ldap.Enabled = true, the LDAP authenticator is wired but no UserName policy is advertised — clients cannot present credentials; the only path in is Anonymous. The operator's intent is silently not honoured, with no diagnostic.

Recommendation: Validate config at startup and warn/fail when Ldap.Enabled = true but no UserName policy is advertised. Allow UserName tokens on any non-None profile (they are stack-encrypted regardless, per docs/security.md).

Resolution: (open)

Server-012

Field Value
Severity Low
Category Performance & resource management
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79
Status Open

Description: ProbeAsync creates an IHttpClientFactory client and mutates client.Timeout on every 2-second probe tick. The timeout belongs on the request or on the named-client registration, not set per call on a factory-vended instance.

Recommendation: Configure the timeout once via AddHttpClient(HttpClientName).ConfigureHttpClient(...), or use a per-request linked CancellationTokenSource(_options.HttpProbeTimeout); drop the per-call client.Timeout mutation.

Resolution: (open)

Server-013

Field Value
Severity Medium
Category Design-document adherence
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19, src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346, src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89
Status Open

Description: docs/security.md documents 7 transport security profiles and CLAUDE.md references a SecurityProfileResolver. The code's OpcUaSecurityProfile enum has only None and Basic256Sha256SignAndEncrypt; BuildSecurityPolicies adds a policy only for the latter; SecurityProfileResolver does not exist in the repo (grep finds it only in docs). Basic256Sha256-Sign and all Aes profiles are unimplemented, and Program.cs:89's Enum.TryParse silently selects None for an unrecognised profile string.

Recommendation: Reconcile code and docs — implement the missing profiles + SecurityProfileResolver, or trim docs/security.md / CLAUDE.md to the two supported profiles. At minimum, log a warning when a configured SecurityProfile fails to parse instead of silently using None.

Resolution: (open)

Server-014

Field Value
Severity Low
Category Code organization & conventions
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs
Status Open

Description: SealedBootstrap claims in its xml-doc to "close release blocker #2" by consuming the generation-sealed cache + resilient reader + stale-config flag, but Program.cs registers and uses NodeBootstrap instead. SealedBootstrap is never registered in DI nor referenced by OpcUaServerService — it and its StaleConfigFlag plumbing are dead in the production wire-up; the release blocker remains open in practice.

Recommendation: Either register SealedBootstrap (with GenerationSealedCache/ResilientConfigReader/StaleConfigFlag) and wire StaleConfigFlag into the health host, or delete SealedBootstrap and correct the release-readiness doc.

Resolution: (open)

Server-015

Field Value
Severity Low
Category Documentation & comments
Location src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21, src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26
Status Open

Description: OtOpcUaServer's class doc still says "PR 16 minimum-viable scope ... no security ... LDAP + security profiles are deferred." OpcUaServerOptions's says "PR 17 minimum-viable scope: no LDAP, no security profiles beyond None." Both are stale — the class now does LDAP UserName auth, anonymous-role mapping, and a configurable security profile. A reader would wrongly conclude the server has no authentication.

Recommendation: Update both class summaries to describe current behaviour and drop the "deferred to a future PR" language.

Resolution: (open)