Replace silent Enum.TryParse fallback to None with a ParseSecurityProfile helper that emits a startup Log.Warning naming the unsupported value and listing recognised profiles; operators now see the misconfiguration before any client connects rather than getting an unexplained None posture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
18 KiB
Code Review — Server
| Field | Value |
|---|---|
| Module | src/Server/ZB.MOM.WW.OtOpcUa.Server |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | 76d35d1 |
| Status | Reviewed |
| Open findings | 12 |
Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Server-001, Server-002, Server-003 |
| 2 | OtOpcUa conventions | Server-004 |
| 3 | Concurrency & thread safety | Server-005, Server-006 |
| 4 | Error handling & resilience | Server-007, Server-008 |
| 5 | Security | Server-009, Server-010, Server-011 |
| 6 | Performance & resource management | Server-012 |
| 7 | Design-document adherence | Server-013 |
| 8 | Code organization & conventions | Server-014 |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | Server-015 |
Findings
Server-001
| Field | Value |
|---|---|
| Severity | Critical |
| Category | Correctness & logic bugs |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:1791 |
| Status | Resolved |
Description: WriteNodeIdUnknown calls itself unconditionally as its first statement, then sets errors[i]. Unbounded recursion with no base case overflows the stack. Called from all four HistoryRead* overrides whenever a HistoryRead targets a node whose NodeId cannot be resolved to a driver full reference. Any client issuing such a HistoryRead triggers an uncatchable StackOverflowException that terminates the process — a remotely-triggerable DoS.
Recommendation: Replace the self-call with the result-slot assignment mirroring WriteUnsupported/WriteInternalError: results[i] = new OpcHistoryReadResult { StatusCode = StatusCodes.BadNodeIdUnknown }; then errors[i] = StatusCodes.BadNodeIdUnknown;.
Resolution: Resolved 2026-05-22 — replaced the unconditional self-call in WriteNodeIdUnknown with the result-slot assignment (results[i] = new OpcHistoryReadResult { StatusCode = StatusCodes.BadNodeIdUnknown }), mirroring WriteUnsupported/WriteInternalError; the helper is now internal for testability. Regression test DriverNodeManagerHistoryMappingTests.WriteNodeIdUnknown_returns_BadNodeIdUnknown_without_unbounded_recursion runs the helper on a small-stack worker thread and asserts it returns promptly with BadNodeIdUnknown.
Server-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs:60-63 |
| Status | Resolved |
Description: IsAllowed does if (decision.IsAllowed) return true; return !_strictMode;. When a session carries resolved LDAP groups and the evaluator returns an explicit deny, lax mode (default) overrides it to true. The lax fallback is intended only for sessions lacking LDAP groups / missing tries, but here it also nullifies authored NodeAcl deny rules for fully-resolved sessions. Per-tag deny ACLs do nothing until StrictMode is on.
Recommendation: Distinguish "indeterminate / no grant" from "explicit deny." Fall through to !_strictMode only when indeterminate; an explicit deny returns false regardless of mode. Extend AuthorizeDecision with an IsExplicitDeny flag if needed.
Resolution: Resolved 2026-05-22 — AuthorizationGate.IsAllowed now switches on the evaluator's AuthorizationVerdict: Allow returns true, Denied (explicit deny rule matched) returns false in both strict and lax mode, and only the indeterminate NotGranted case falls through to !_strictMode. The existing AuthorizationVerdict.Denied tri-state member is now honoured rather than collapsed into the lax fallback. Regression tests ExplicitDeny_LaxMode_Denies / ExplicitDeny_StrictMode_Denies / NotGranted_LaxMode_Allows / NotGranted_StrictMode_Denies in AuthorizationGateTests cover all four verdict×mode combinations via a fixed-verdict evaluator stub.
Server-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/RingBufferHistoryWriter.cs:96-119 |
| Status | Resolved |
Description: ReadRawAsync's XML doc claims "newest-first," but TagRingBuffer.Snapshot() returns oldest-to-newest and the loop preserves that order — so results are oldest-first. Also maxValuesPerNode is capped against total buffer size before the [startUtc, endUtc) filter, so a paged read returns the oldest in-window samples, contradicting the doc and usual HistoryRead expectations.
Recommendation: Make code and doc agree on ordering (raw HistoryRead is normally ascending source-timestamp). Apply maxValuesPerNode to the in-window count, not the whole buffer.
Resolution: Resolved 2026-05-22 — corrected XML doc from "newest-first" to "oldest-first (ascending source timestamp, matching OPC UA Part 11 §6.4 raw-values default)"; moved maxValuesPerNode cap inside the time-window loop so the limit applies only to in-window results, not the whole buffer snapshot.
Server-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200 |
| Status | Open |
Description: RoleBasedIdentity declares its own Display property, but the base UserIdentity already has a settable DisplayName. DriverNodeManager.ResolveCallUser/RouteScriptedAlarmMethodCalls read the base DisplayName, never Display. Since the ctor passes only userName to base, DisplayName resolves to the username — so scripted-alarm Ack/Confirm/Shelve audit entries record the raw username, not the LDAP-resolved display name the comment promises. Display is dead code.
Recommendation: Drop Display; set the base DisplayName = displayName ?? userName;. Verify ResolveCallUser yields the resolved display name.
Resolution: (open)
Server-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs:166, src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:303-311 |
| Status | Resolved |
Description: OnValueChanged raises TransitionRaised on the value-change thread; the subscriber OnAlarmServiceTransition drives ConditionSink.OnTransition → alarm.ReportEvent. DriverNodeManager.Dispose detaches the handler but does not synchronise against an in-flight Invoke. The service is process-shared across drivers, so a transition can dispatch to a ConditionSink whose DriverNodeManager is concurrently being disposed → ReportEvent on a torn-down node manager.
Recommendation: Guard OnAlarmServiceTransition with a _disposed check under Lock before sink.OnTransition. Document that handlers must tolerate invocation during their owner's disposal.
Resolution: Resolved 2026-05-22 — added _nodeManagerDisposed field; Dispose(bool) now sets it under Lock before detaching the handler; OnAlarmServiceTransition checks the flag under the same Lock and exits early, preventing forwarding to a sink after the node manager has begun disposal.
Server-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348 |
| Status | Open |
Description: OnReadValue/OnWriteValue are synchronous stack hooks that block on async driver calls via .GetAwaiter().GetResult() with CancellationToken.None. With MaxRequestThreadCount = 100, a burst of reads/writes into a stalled driver pins request threads for the full pipeline timeout, exhausting the pool and stalling unrelated sessions. The call cannot be cancelled by a client timeout.
Recommendation: Derive a CancellationToken from the OperationContext / TransportQuotas.OperationTimeout so a stuck driver call is abandoned. Longer term, use the stack's async service overrides if available.
Resolution: (open)
Server-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:179-183 |
| Status | Resolved |
Description: HealthEndpointsHost is built without a configDbHealthy delegate, so the default () => true is used — /healthz always reports configDbReachable = true and never 503s on a DB outage. _staleConfigFlag is also never supplied by Program.cs, so the stale-config signal is inert too. /healthz degenerates to a pure liveness probe; operators get a false-healthy during a DB outage.
Recommendation: Wire a real config-DB probe (cheap cached SELECT 1) into HealthEndpointsHost, and register StaleConfigFlag in Program.cs. Or move DB health to /readyz and drop the misleading configDbReachable field.
Resolution: Resolved 2026-05-22 — added Func<bool>? configDbHealthy parameter to OpcUaApplicationHost (defaults null, backward-compatible); Program.cs constructs a DbHealthCache that calls CanConnectAsync every 10 s and caches the result, then passes () => dbHealthCache.IsHealthy; /healthz now reflects real DB reachability and returns 503 on a DB outage (unless stale-config cache is warm).
Server-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736 |
| Status | Open |
Description: RouteScriptedAlarmMethodCalls marks a handled slot by setting errors[i] = ServiceResult.Good, assuming base.Call skips non-null Good error slots. The stack and GateCallMethodRequests only ever pre-populate Bad slots; the skip-on-Good assumption is not a guaranteed SDK contract. If base.Call re-dispatches, the engine method and the stack's built-in Part 9 handler both fire — double transition.
Recommendation: Verify against the pinned SDK whether base.Call skips Good-pre-populated slots. If not, exclude routed slots from methodsToCall before base.Call. Add a test asserting exactly-once engine transition for a routed Acknowledge.
Resolution: (open)
Server-009
| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/LdapOptions.cs:44, src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:74 |
| Status | Resolved |
Description: AllowInsecureLdap defaults to true (and Program.cs reads ?? true); UseTls defaults to false. Out of the box, usernames and plaintext passwords are bound to LDAP over an unencrypted socket. A production deployment enabling LDAP without explicitly setting AllowInsecureLdap=false ships credentials in clear text on the server→LDAP hop.
Recommendation: Default AllowInsecureLdap to false in both the property initializer and the Program.cs fallback. Log a startup warning when LDAP is enabled with UseTls=false && AllowInsecureLdap=true.
Resolution: Resolved 2026-05-22 — LdapOptions.AllowInsecureLdap now defaults to false (secure-by-default) and Program.cs's config fallback reads ?? false. Program.cs logs a startup Log.Warning when LDAP is enabled with UseTls=false && AllowInsecureLdap=true, flagging the clear-text credential hop. Regression tests in LdapOptionsTests assert the new secure defaults.
Server-010
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59, src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291 |
| Status | Resolved |
Description: AutoAcceptUntrustedClientCertificates defaults to true (Program.cs reads ?? true). BuildConfiguration wires a handler that accepts any client cert failing with BadCertificateUntrusted. A deployment that forgets to flip the flag accepts every untrusted client cert, defeating the PKI trust list. With the always-present None policy, the default posture is fully open.
Recommendation: Default AutoAcceptUntrustedClientCertificates to false; keep auto-accept as opt-in dev convenience. docs/security.md already shows false — align code to doc.
Resolution: Resolved 2026-05-22 — OpcUaServerOptions.AutoAcceptUntrustedClientCertificates property initialiser changed from true to false (secure by default, aligning with docs/security.md); Program.cs config fallback changed from ?? true to ?? false.
Server-011
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346 |
| Status | Resolved |
Description: BuildUserTokenPolicies advertises a UserName token policy only when SecurityProfile == Basic256Sha256SignAndEncrypt && Ldap.Enabled. With the default SecurityProfile = None and Ldap.Enabled = true, the LDAP authenticator is wired but no UserName policy is advertised — clients cannot present credentials; the only path in is Anonymous. The operator's intent is silently not honoured, with no diagnostic.
Recommendation: Validate config at startup and warn/fail when Ldap.Enabled = true but no UserName policy is advertised. Allow UserName tokens on any non-None profile (they are stack-encrypted regardless, per docs/security.md).
Resolution: Resolved 2026-05-22 — BuildUserTokenPolicies now advertises a UserName token policy whenever Ldap.Enabled && SecurityProfile != None (previously required == Basic256Sha256SignAndEncrypt); StartAsync logs a LogWarning at startup when Ldap.Enabled = true but SecurityProfile = None, surfacing the misconfiguration before clients connect.
Server-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79 |
| Status | Open |
Description: ProbeAsync creates an IHttpClientFactory client and mutates client.Timeout on every 2-second probe tick. The timeout belongs on the request or on the named-client registration, not set per call on a factory-vended instance.
Recommendation: Configure the timeout once via AddHttpClient(HttpClientName).ConfigureHttpClient(...), or use a per-request linked CancellationTokenSource(_options.HttpProbeTimeout); drop the per-call client.Timeout mutation.
Resolution: (open)
Server-013
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19, src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346, src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89 |
| Status | Resolved |
Description: docs/security.md documents 7 transport security profiles and CLAUDE.md references a SecurityProfileResolver. The code's OpcUaSecurityProfile enum has only None and Basic256Sha256SignAndEncrypt; BuildSecurityPolicies adds a policy only for the latter; SecurityProfileResolver does not exist in the repo (grep finds it only in docs). Basic256Sha256-Sign and all Aes profiles are unimplemented, and Program.cs:89's Enum.TryParse silently selects None for an unrecognised profile string.
Recommendation: Reconcile code and docs — implement the missing profiles + SecurityProfileResolver, or trim docs/security.md / CLAUDE.md to the two supported profiles. At minimum, log a warning when a configured SecurityProfile fails to parse instead of silently using None.
Resolution: Resolved 2026-05-22 — replaced the silent Enum.TryParse ?? None fallback in Program.cs with a ParseSecurityProfile helper that produces a warning string listing supported profiles when the configured value is unrecognised; the warning is emitted via Log.Warning at startup before the host builds, making the misconfiguration immediately visible. Implementing the missing 5 profiles is tracked as a doc-to-code gap rather than a single finding fix.
Server-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs |
| Status | Open |
Description: SealedBootstrap claims in its xml-doc to "close release blocker #2" by consuming the generation-sealed cache + resilient reader + stale-config flag, but Program.cs registers and uses NodeBootstrap instead. SealedBootstrap is never registered in DI nor referenced by OpcUaServerService — it and its StaleConfigFlag plumbing are dead in the production wire-up; the release blocker remains open in practice.
Recommendation: Either register SealedBootstrap (with GenerationSealedCache/ResilientConfigReader/StaleConfigFlag) and wire StaleConfigFlag into the health host, or delete SealedBootstrap and correct the release-readiness doc.
Resolution: (open)
Server-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21, src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26 |
| Status | Open |
Description: OtOpcUaServer's class doc still says "PR 16 minimum-viable scope ... no security ... LDAP + security profiles are deferred." OpcUaServerOptions's says "PR 17 minimum-viable scope: no LDAP, no security profiles beyond None." Both are stale — the class now does LDAP UserName auth, anonymous-role mapping, and a configurable security profile. A reader would wrongly conclude the server has no authentication.
Recommendation: Update both class summaries to describe current behaviour and drop the "deferred to a future PR" language.
Resolution: (open)