fix(server): resolve Low code-review findings (Server-004,006,008,012,014,015)
- Server-004: pass the role-derived display name to UserIdentity's base ctor (the SDK's DisplayName has no public setter) and drop the dead Display property; make RoleBasedIdentity internal sealed. - Server-006: derive a bounded CancellationToken from the SDK's OperationContext.OperationDeadline in OnReadValue / OnWriteValue so a stalled driver call can no longer pin the request thread. - Server-008: mark handled slots via CallMethodRequest.Processed = true in RouteScriptedAlarmMethodCalls (the SDK skips on Processed, not on a Good error slot). - Server-012: PeerHttpProbeLoop.ProbeAsync stops mutating client.Timeout per call; uses a per-request CancellationTokenSource linked to the shutdown token instead. - Server-014: wire SealedBootstrap into Program.cs via AddSealedBootstrap + OpcUaServerService so the generation-sealed cache + stale-config flag + resilient reader actually run; /healthz now reflects cache-fallback state. - Server-015: replace the stale 'PR 16 / PR 17 minimum-viable scope' class summaries on OtOpcUaServer and OpcUaServerOptions with the shipped LDAP + anonymous-role + configurable security-profile prose. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -7,7 +7,7 @@
|
||||
| Review date | 2026-05-22 |
|
||||
| Commit reviewed | `76d35d1` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 6 |
|
||||
| Open findings | 0 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
@@ -74,13 +74,13 @@
|
||||
| Severity | Low |
|
||||
| Category | OtOpcUa conventions |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `RoleBasedIdentity` declares its own `Display` property, but the base `UserIdentity` already has a settable `DisplayName`. `DriverNodeManager.ResolveCallUser`/`RouteScriptedAlarmMethodCalls` read the base `DisplayName`, never `Display`. Since the ctor passes only `userName` to base, `DisplayName` resolves to the username — so scripted-alarm Ack/Confirm/Shelve audit entries record the raw username, not the LDAP-resolved display name the comment promises. `Display` is dead code.
|
||||
|
||||
**Recommendation:** Drop `Display`; set the base `DisplayName = displayName ?? userName;`. Verify `ResolveCallUser` yields the resolved display name.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-23 — re-triaged: in the pinned SDK version (1.5.374.126) `UserIdentity.DisplayName` is a sealed-virtual auto-property with no public setter, so the base `DisplayName = …` assignment the original recommendation suggested won't compile. Instead the fix passes `displayName ?? userName` as the first arg to the base `UserIdentity(string, string)` ctor — the SDK seeds `DisplayName` from that arg internally — and removes the dead `Display` property. `RoleBasedIdentity` is now `internal sealed` so `DriverNodeManager.ResolveCallUser` can be unit-tested against the production identity type. Regression tests `RoleBasedIdentityTests.DisplayName_returns_LDAP_resolved_display_name_when_present`, `DisplayName_falls_back_to_userName_when_LDAP_display_name_is_null`, and `ResolveCallUser_yields_LDAP_resolved_display_name` cover the behaviour.
|
||||
|
||||
### Server-005
|
||||
| Field | Value |
|
||||
@@ -102,13 +102,13 @@
|
||||
| Severity | Low |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `OnReadValue`/`OnWriteValue` are synchronous stack hooks that block on async driver calls via `.GetAwaiter().GetResult()` with `CancellationToken.None`. With `MaxRequestThreadCount = 100`, a burst of reads/writes into a stalled driver pins request threads for the full pipeline timeout, exhausting the pool and stalling unrelated sessions. The call cannot be cancelled by a client timeout.
|
||||
|
||||
**Recommendation:** Derive a `CancellationToken` from the `OperationContext` / `TransportQuotas.OperationTimeout` so a stuck driver call is abandoned. Longer term, use the stack's async service overrides if available.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-23 — added `DriverNodeManager.DeriveOperationCancellation(ISystemContext, TimeSpan fallback)` helper that reads `SystemContext.OperationContext.OperationDeadline` (which the stack sets from the client's `RequestHeader.TimeoutHint`). `OnReadValue` and `OnWriteValue` now pass `cts.Token` to `_invoker.ExecuteAsync` / `ExecuteWriteAsync` instead of `CancellationToken.None`, and surface `BadTimeout` (instead of `BadInternalError`) when the deadline fires. Handles both the SDK's sentinel deadlines: `DateTime.MinValue` (no deadline plumbed) and `DateTime.MaxValue` (TimeoutHint=0, the SDK default) collapse to a 30-s fallback. A deadline > Int32.MaxValue ms in the future also clamps to the fallback so the read path never throws `ArgumentOutOfRangeException` from inside `CancellationTokenSource(TimeSpan)`. Regression tests in `DriverNodeManagerCancellationTests` cover all five paths (future / past / missing / MinValue / MaxValue).
|
||||
|
||||
### Server-007
|
||||
| Field | Value |
|
||||
@@ -130,13 +130,13 @@
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `RouteScriptedAlarmMethodCalls` marks a handled slot by setting `errors[i] = ServiceResult.Good`, assuming `base.Call` skips non-null *Good* error slots. The stack and `GateCallMethodRequests` only ever pre-populate *Bad* slots; the skip-on-Good assumption is not a guaranteed SDK contract. If `base.Call` re-dispatches, the engine method and the stack's built-in Part 9 handler both fire — double transition.
|
||||
|
||||
**Recommendation:** Verify against the pinned SDK whether `base.Call` skips Good-pre-populated slots. If not, exclude routed slots from `methodsToCall` before `base.Call`. Add a test asserting exactly-once engine transition for a routed Acknowledge.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-23 — verified against the pinned SDK (DeepWiki query against OPCFoundation/UA-.NETStandard): `CustomNodeManager2.Call` / `CallInternalAsync` skip slots whose `CallMethodRequest.Processed` flag is `true`, not slots whose `errors[i]` is a non-Bad `ServiceResult`. `RouteScriptedAlarmMethodCalls` now sets `request.Processed = true` on every handled slot — success, `ArgumentException`, and generic exception paths — so `base.Call` never re-dispatches a routed Acknowledge / Confirm / AddComment to the stack's built-in Part 9 handler. Regression tests in `ScriptedAlarmMethodRoutingProcessedFlagTests` assert `Processed` is `true` after each engine path and `false` for slots the helper passes through to `base.Call`.
|
||||
|
||||
### Server-009
|
||||
| Field | Value |
|
||||
@@ -186,13 +186,13 @@
|
||||
| Severity | Low |
|
||||
| Category | Performance & resource management |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `ProbeAsync` creates an `IHttpClientFactory` client and mutates `client.Timeout` on every 2-second probe tick. The timeout belongs on the request or on the named-client registration, not set per call on a factory-vended instance.
|
||||
|
||||
**Recommendation:** Configure the timeout once via `AddHttpClient(HttpClientName).ConfigureHttpClient(...)`, or use a per-request linked `CancellationTokenSource(_options.HttpProbeTimeout)`; drop the per-call `client.Timeout` mutation.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-23 — `ProbeAsync` no longer mutates `client.Timeout`. Replaced with a per-call `CancellationTokenSource(_options.HttpProbeTimeout)` linked to the loop's shutdown token; `GetAsync` consumes the linked token so the per-request deadline is enforced via cancellation instead of via the factory-vended `HttpClient` instance. Regression test `PeerHttpProbeLoopTests.Tick_does_not_mutate_factory_vended_client_Timeout` asserts the timeout-on-client mutation is gone.
|
||||
|
||||
### Server-013
|
||||
| Field | Value |
|
||||
@@ -214,13 +214,13 @@
|
||||
| Severity | Low |
|
||||
| Category | Code organization & conventions |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `SealedBootstrap` claims in its xml-doc to "close release blocker #2" by consuming the generation-sealed cache + resilient reader + stale-config flag, but `Program.cs` registers and uses `NodeBootstrap` instead. `SealedBootstrap` is never registered in DI nor referenced by `OpcUaServerService` — it and its `StaleConfigFlag` plumbing are dead in the production wire-up; the release blocker remains open in practice.
|
||||
|
||||
**Recommendation:** Either register `SealedBootstrap` (with `GenerationSealedCache`/`ResilientConfigReader`/`StaleConfigFlag`) and wire `StaleConfigFlag` into the health host, or delete `SealedBootstrap` and correct the release-readiness doc.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-23 — added `ServerWiring.AddSealedBootstrap` DI helper that registers `GenerationSealedCache` (rooted at a `.sealed` sibling of `NodeOptions.LocalCachePath`), `StaleConfigFlag`, `ResilientConfigReader`, and `SealedBootstrap`. `Program.cs` calls it after `AddSingleton<NodeBootstrap>()`; `OpcUaServerService` now consumes `SealedBootstrap` instead of `NodeBootstrap`; `OpcUaApplicationHost` is constructed with `staleConfigFlag` resolved from DI so `/healthz`'s `usingStaleConfig` reflects the cache-fallback state. The legacy `NodeBootstrap` registration stays for back-compat with the integration tests that construct it directly. Regression test `SealedBootstrapWiringTests.SealedBootstrap_and_its_dependencies_are_registered_in_DI` asserts the registrations compose without missing-service exceptions; `SealedBootstrap.cs`'s xml-doc updated to describe the live wire-up rather than the deferred plan.
|
||||
|
||||
### Server-015
|
||||
| Field | Value |
|
||||
@@ -228,10 +228,10 @@
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `OtOpcUaServer`'s class doc still says "PR 16 minimum-viable scope ... no security ... LDAP + security profiles are deferred." `OpcUaServerOptions`'s says "PR 17 minimum-viable scope: no LDAP, no security profiles beyond None." Both are stale — the class now does LDAP UserName auth, anonymous-role mapping, and a configurable security profile. A reader would wrongly conclude the server has no authentication.
|
||||
|
||||
**Recommendation:** Update both class summaries to describe current behaviour and drop the "deferred to a future PR" language.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-23 — rewrote both class summaries. `OtOpcUaServer` now describes the live LDAP UserName / Anonymous identity-token flow, the `RoleBasedIdentity` wrapper, and the configurable `SecurityProfile` driven by `OpcUaServerOptions`. `OpcUaServerOptions` now describes endpoint + identity + PKI + health + LDAP + anonymous-role surfaces and points at `docs/security.md`. The stale "PR 16 / PR 17 minimum-viable scope" and "deferred to their own PR" language is gone.
|
||||
|
||||
Reference in New Issue
Block a user