docs(code-reviews): comprehensive per-module review pass at 76d35d1
Reviewed all 31 src/ production projects against the 10-category checklist in REVIEW-PROCESS.md. Each module gets its own findings.md; code-reviews/README.md is regenerated from them. 334 findings: 6 Critical, 46 High, 126 Medium, 156 Low. Critical findings: - Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead on an unresolvable node crashes the process (remote DoS). - Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView) plus unauthenticated mutating routes. - Core.Scripting-001: System.Environment reachable from operator scripts; Environment.Exit() terminates the server. - Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt payload misapplies outcomes — silent alarm-event data loss. - Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so a transient gateway drop permanently kills the event stream. All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md section 4. Review only — no source code changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
237
code-reviews/Server/findings.md
Normal file
237
code-reviews/Server/findings.md
Normal file
@@ -0,0 +1,237 @@
|
||||
# Code Review — Server
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Module | `src/Server/ZB.MOM.WW.OtOpcUa.Server` |
|
||||
| Reviewer | Claude Code |
|
||||
| Review date | 2026-05-22 |
|
||||
| Commit reviewed | `76d35d1` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 15 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
| # | Category | Result |
|
||||
|---|---|---|
|
||||
| 1 | Correctness & logic bugs | Server-001, Server-002, Server-003 |
|
||||
| 2 | OtOpcUa conventions | Server-004 |
|
||||
| 3 | Concurrency & thread safety | Server-005, Server-006 |
|
||||
| 4 | Error handling & resilience | Server-007, Server-008 |
|
||||
| 5 | Security | Server-009, Server-010, Server-011 |
|
||||
| 6 | Performance & resource management | Server-012 |
|
||||
| 7 | Design-document adherence | Server-013 |
|
||||
| 8 | Code organization & conventions | Server-014 |
|
||||
| 9 | Testing coverage | No issues found |
|
||||
| 10 | Documentation & comments | Server-015 |
|
||||
|
||||
## Findings
|
||||
|
||||
### Server-001
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Critical |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:1791` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `WriteNodeIdUnknown` calls itself unconditionally as its first statement, then sets `errors[i]`. Unbounded recursion with no base case overflows the stack. Called from all four `HistoryRead*` overrides whenever a HistoryRead targets a node whose `NodeId` cannot be resolved to a driver full reference. Any client issuing such a HistoryRead triggers an uncatchable `StackOverflowException` that terminates the process — a remotely-triggerable DoS.
|
||||
|
||||
**Recommendation:** Replace the self-call with the result-slot assignment mirroring `WriteUnsupported`/`WriteInternalError`: `results[i] = new OpcHistoryReadResult { StatusCode = StatusCodes.BadNodeIdUnknown };` then `errors[i] = StatusCodes.BadNodeIdUnknown;`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-002
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs:60-63` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `IsAllowed` does `if (decision.IsAllowed) return true; return !_strictMode;`. When a session carries resolved LDAP groups and the evaluator returns an explicit deny, lax mode (default) overrides it to `true`. The lax fallback is intended only for sessions lacking LDAP groups / missing tries, but here it also nullifies authored `NodeAcl` deny rules for fully-resolved sessions. Per-tag deny ACLs do nothing until `StrictMode` is on.
|
||||
|
||||
**Recommendation:** Distinguish "indeterminate / no grant" from "explicit deny." Fall through to `!_strictMode` only when indeterminate; an explicit deny returns `false` regardless of mode. Extend `AuthorizeDecision` with an `IsExplicitDeny` flag if needed.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-003
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/RingBufferHistoryWriter.cs:96-119` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `ReadRawAsync`'s XML doc claims "newest-first," but `TagRingBuffer.Snapshot()` returns oldest-to-newest and the loop preserves that order — so results are oldest-first. Also `maxValuesPerNode` is capped against total buffer size *before* the `[startUtc, endUtc)` filter, so a paged read returns the oldest in-window samples, contradicting the doc and usual HistoryRead expectations.
|
||||
|
||||
**Recommendation:** Make code and doc agree on ordering (raw HistoryRead is normally ascending source-timestamp). Apply `maxValuesPerNode` to the in-window count, not the whole buffer.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-004
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | OtOpcUa conventions |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `RoleBasedIdentity` declares its own `Display` property, but the base `UserIdentity` already has a settable `DisplayName`. `DriverNodeManager.ResolveCallUser`/`RouteScriptedAlarmMethodCalls` read the base `DisplayName`, never `Display`. Since the ctor passes only `userName` to base, `DisplayName` resolves to the username — so scripted-alarm Ack/Confirm/Shelve audit entries record the raw username, not the LDAP-resolved display name the comment promises. `Display` is dead code.
|
||||
|
||||
**Recommendation:** Drop `Display`; set the base `DisplayName = displayName ?? userName;`. Verify `ResolveCallUser` yields the resolved display name.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-005
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs:166`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:303-311` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `OnValueChanged` raises `TransitionRaised` on the value-change thread; the subscriber `OnAlarmServiceTransition` drives `ConditionSink.OnTransition` → `alarm.ReportEvent`. `DriverNodeManager.Dispose` detaches the handler but does not synchronise against an in-flight `Invoke`. The service is process-shared across drivers, so a transition can dispatch to a `ConditionSink` whose `DriverNodeManager` is concurrently being disposed → `ReportEvent` on a torn-down node manager.
|
||||
|
||||
**Recommendation:** Guard `OnAlarmServiceTransition` with a `_disposed` check under `Lock` before `sink.OnTransition`. Document that handlers must tolerate invocation during their owner's disposal.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-006
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `OnReadValue`/`OnWriteValue` are synchronous stack hooks that block on async driver calls via `.GetAwaiter().GetResult()` with `CancellationToken.None`. With `MaxRequestThreadCount = 100`, a burst of reads/writes into a stalled driver pins request threads for the full pipeline timeout, exhausting the pool and stalling unrelated sessions. The call cannot be cancelled by a client timeout.
|
||||
|
||||
**Recommendation:** Derive a `CancellationToken` from the `OperationContext` / `TransportQuotas.OperationTimeout` so a stuck driver call is abandoned. Longer term, use the stack's async service overrides if available.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-007
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:179-183` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `HealthEndpointsHost` is built without a `configDbHealthy` delegate, so the default `() => true` is used — `/healthz` always reports `configDbReachable = true` and never 503s on a DB outage. `_staleConfigFlag` is also never supplied by `Program.cs`, so the stale-config signal is inert too. `/healthz` degenerates to a pure liveness probe; operators get a false-healthy during a DB outage.
|
||||
|
||||
**Recommendation:** Wire a real config-DB probe (cheap cached `SELECT 1`) into `HealthEndpointsHost`, and register `StaleConfigFlag` in `Program.cs`. Or move DB health to `/readyz` and drop the misleading `configDbReachable` field.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-008
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `RouteScriptedAlarmMethodCalls` marks a handled slot by setting `errors[i] = ServiceResult.Good`, assuming `base.Call` skips non-null *Good* error slots. The stack and `GateCallMethodRequests` only ever pre-populate *Bad* slots; the skip-on-Good assumption is not a guaranteed SDK contract. If `base.Call` re-dispatches, the engine method and the stack's built-in Part 9 handler both fire — double transition.
|
||||
|
||||
**Recommendation:** Verify against the pinned SDK whether `base.Call` skips Good-pre-populated slots. If not, exclude routed slots from `methodsToCall` before `base.Call`. Add a test asserting exactly-once engine transition for a routed Acknowledge.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-009
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Security |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/LdapOptions.cs:44`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:74` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `AllowInsecureLdap` defaults to `true` (and `Program.cs` reads `?? true`); `UseTls` defaults to `false`. Out of the box, usernames and plaintext passwords are bound to LDAP over an unencrypted socket. A production deployment enabling LDAP without explicitly setting `AllowInsecureLdap=false` ships credentials in clear text on the server→LDAP hop.
|
||||
|
||||
**Recommendation:** Default `AllowInsecureLdap` to `false` in both the property initializer and the `Program.cs` fallback. Log a startup warning when LDAP is enabled with `UseTls=false && AllowInsecureLdap=true`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-010
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Security |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `AutoAcceptUntrustedClientCertificates` defaults to `true` (`Program.cs` reads `?? true`). `BuildConfiguration` wires a handler that accepts any client cert failing with `BadCertificateUntrusted`. A deployment that forgets to flip the flag accepts every untrusted client cert, defeating the PKI trust list. With the always-present `None` policy, the default posture is fully open.
|
||||
|
||||
**Recommendation:** Default `AutoAcceptUntrustedClientCertificates` to `false`; keep auto-accept as opt-in dev convenience. `docs/security.md` already shows `false` — align code to doc.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-011
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Security |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `BuildUserTokenPolicies` advertises a `UserName` token policy only when `SecurityProfile == Basic256Sha256SignAndEncrypt && Ldap.Enabled`. With the default `SecurityProfile = None` and `Ldap.Enabled = true`, the LDAP authenticator is wired but no UserName policy is advertised — clients cannot present credentials; the only path in is Anonymous. The operator's intent is silently not honoured, with no diagnostic.
|
||||
|
||||
**Recommendation:** Validate config at startup and warn/fail when `Ldap.Enabled = true` but no UserName policy is advertised. Allow UserName tokens on any non-None profile (they are stack-encrypted regardless, per `docs/security.md`).
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-012
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Performance & resource management |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `ProbeAsync` creates an `IHttpClientFactory` client and mutates `client.Timeout` on every 2-second probe tick. The timeout belongs on the request or on the named-client registration, not set per call on a factory-vended instance.
|
||||
|
||||
**Recommendation:** Configure the timeout once via `AddHttpClient(HttpClientName).ConfigureHttpClient(...)`, or use a per-request linked `CancellationTokenSource(_options.HttpProbeTimeout)`; drop the per-call `client.Timeout` mutation.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-013
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Design-document adherence |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `docs/security.md` documents 7 transport security profiles and `CLAUDE.md` references a `SecurityProfileResolver`. The code's `OpcUaSecurityProfile` enum has only `None` and `Basic256Sha256SignAndEncrypt`; `BuildSecurityPolicies` adds a policy only for the latter; `SecurityProfileResolver` does not exist in the repo (grep finds it only in docs). `Basic256Sha256-Sign` and all Aes profiles are unimplemented, and `Program.cs:89`'s `Enum.TryParse` silently selects `None` for an unrecognised profile string.
|
||||
|
||||
**Recommendation:** Reconcile code and docs — implement the missing profiles + `SecurityProfileResolver`, or trim `docs/security.md` / `CLAUDE.md` to the two supported profiles. At minimum, log a warning when a configured `SecurityProfile` fails to parse instead of silently using `None`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-014
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Code organization & conventions |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `SealedBootstrap` claims in its xml-doc to "close release blocker #2" by consuming the generation-sealed cache + resilient reader + stale-config flag, but `Program.cs` registers and uses `NodeBootstrap` instead. `SealedBootstrap` is never registered in DI nor referenced by `OpcUaServerService` — it and its `StaleConfigFlag` plumbing are dead in the production wire-up; the release blocker remains open in practice.
|
||||
|
||||
**Recommendation:** Either register `SealedBootstrap` (with `GenerationSealedCache`/`ResilientConfigReader`/`StaleConfigFlag`) and wire `StaleConfigFlag` into the health host, or delete `SealedBootstrap` and correct the release-readiness doc.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Server-015
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `OtOpcUaServer`'s class doc still says "PR 16 minimum-viable scope ... no security ... LDAP + security profiles are deferred." `OpcUaServerOptions`'s says "PR 17 minimum-viable scope: no LDAP, no security profiles beyond None." Both are stale — the class now does LDAP UserName auth, anonymous-role mapping, and a configurable security profile. A reader would wrongly conclude the server has no authentication.
|
||||
|
||||
**Recommendation:** Update both class summaries to describe current behaviour and drop the "deferred to a future PR" language.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
Reference in New Issue
Block a user