fix(server): resolve Low code-review findings (Server-004,006,008,012,014,015)

- Server-004: pass the role-derived display name to UserIdentity's base
  ctor (the SDK's DisplayName has no public setter) and drop the dead
  Display property; make RoleBasedIdentity internal sealed.
- Server-006: derive a bounded CancellationToken from the SDK's
  OperationContext.OperationDeadline in OnReadValue / OnWriteValue so a
  stalled driver call can no longer pin the request thread.
- Server-008: mark handled slots via CallMethodRequest.Processed = true
  in RouteScriptedAlarmMethodCalls (the SDK skips on Processed, not on a
  Good error slot).
- Server-012: PeerHttpProbeLoop.ProbeAsync stops mutating client.Timeout
  per call; uses a per-request CancellationTokenSource linked to the
  shutdown token instead.
- Server-014: wire SealedBootstrap into Program.cs via AddSealedBootstrap
  + OpcUaServerService so the generation-sealed cache + stale-config flag
  + resilient reader actually run; /healthz now reflects cache-fallback
  state.
- Server-015: replace the stale 'PR 16 / PR 17 minimum-viable scope'
  class summaries on OtOpcUaServer and OpcUaServerOptions with the
  shipped LDAP + anonymous-role + configurable security-profile prose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-23 07:24:20 -04:00
parent 2b33b64a58
commit 6134050ceb
14 changed files with 698 additions and 40 deletions

View File

@@ -7,7 +7,7 @@
| Review date | 2026-05-22 | | Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` | | Commit reviewed | `76d35d1` |
| Status | Reviewed | | Status | Reviewed |
| Open findings | 6 | | Open findings | 0 |
## Checklist coverage ## Checklist coverage
@@ -74,13 +74,13 @@
| Severity | Low | | Severity | Low |
| Category | OtOpcUa conventions | | Category | OtOpcUa conventions |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200` | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200` |
| Status | Open | | Status | Resolved |
**Description:** `RoleBasedIdentity` declares its own `Display` property, but the base `UserIdentity` already has a settable `DisplayName`. `DriverNodeManager.ResolveCallUser`/`RouteScriptedAlarmMethodCalls` read the base `DisplayName`, never `Display`. Since the ctor passes only `userName` to base, `DisplayName` resolves to the username — so scripted-alarm Ack/Confirm/Shelve audit entries record the raw username, not the LDAP-resolved display name the comment promises. `Display` is dead code. **Description:** `RoleBasedIdentity` declares its own `Display` property, but the base `UserIdentity` already has a settable `DisplayName`. `DriverNodeManager.ResolveCallUser`/`RouteScriptedAlarmMethodCalls` read the base `DisplayName`, never `Display`. Since the ctor passes only `userName` to base, `DisplayName` resolves to the username — so scripted-alarm Ack/Confirm/Shelve audit entries record the raw username, not the LDAP-resolved display name the comment promises. `Display` is dead code.
**Recommendation:** Drop `Display`; set the base `DisplayName = displayName ?? userName;`. Verify `ResolveCallUser` yields the resolved display name. **Recommendation:** Drop `Display`; set the base `DisplayName = displayName ?? userName;`. Verify `ResolveCallUser` yields the resolved display name.
**Resolution:** _(open)_ **Resolution:** Resolved 2026-05-23 — re-triaged: in the pinned SDK version (1.5.374.126) `UserIdentity.DisplayName` is a sealed-virtual auto-property with no public setter, so the base `DisplayName = …` assignment the original recommendation suggested won't compile. Instead the fix passes `displayName ?? userName` as the first arg to the base `UserIdentity(string, string)` ctor — the SDK seeds `DisplayName` from that arg internally — and removes the dead `Display` property. `RoleBasedIdentity` is now `internal sealed` so `DriverNodeManager.ResolveCallUser` can be unit-tested against the production identity type. Regression tests `RoleBasedIdentityTests.DisplayName_returns_LDAP_resolved_display_name_when_present`, `DisplayName_falls_back_to_userName_when_LDAP_display_name_is_null`, and `ResolveCallUser_yields_LDAP_resolved_display_name` cover the behaviour.
### Server-005 ### Server-005
| Field | Value | | Field | Value |
@@ -102,13 +102,13 @@
| Severity | Low | | Severity | Low |
| Category | Concurrency & thread safety | | Category | Concurrency & thread safety |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348` | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348` |
| Status | Open | | Status | Resolved |
**Description:** `OnReadValue`/`OnWriteValue` are synchronous stack hooks that block on async driver calls via `.GetAwaiter().GetResult()` with `CancellationToken.None`. With `MaxRequestThreadCount = 100`, a burst of reads/writes into a stalled driver pins request threads for the full pipeline timeout, exhausting the pool and stalling unrelated sessions. The call cannot be cancelled by a client timeout. **Description:** `OnReadValue`/`OnWriteValue` are synchronous stack hooks that block on async driver calls via `.GetAwaiter().GetResult()` with `CancellationToken.None`. With `MaxRequestThreadCount = 100`, a burst of reads/writes into a stalled driver pins request threads for the full pipeline timeout, exhausting the pool and stalling unrelated sessions. The call cannot be cancelled by a client timeout.
**Recommendation:** Derive a `CancellationToken` from the `OperationContext` / `TransportQuotas.OperationTimeout` so a stuck driver call is abandoned. Longer term, use the stack's async service overrides if available. **Recommendation:** Derive a `CancellationToken` from the `OperationContext` / `TransportQuotas.OperationTimeout` so a stuck driver call is abandoned. Longer term, use the stack's async service overrides if available.
**Resolution:** _(open)_ **Resolution:** Resolved 2026-05-23 — added `DriverNodeManager.DeriveOperationCancellation(ISystemContext, TimeSpan fallback)` helper that reads `SystemContext.OperationContext.OperationDeadline` (which the stack sets from the client's `RequestHeader.TimeoutHint`). `OnReadValue` and `OnWriteValue` now pass `cts.Token` to `_invoker.ExecuteAsync` / `ExecuteWriteAsync` instead of `CancellationToken.None`, and surface `BadTimeout` (instead of `BadInternalError`) when the deadline fires. Handles both the SDK's sentinel deadlines: `DateTime.MinValue` (no deadline plumbed) and `DateTime.MaxValue` (TimeoutHint=0, the SDK default) collapse to a 30-s fallback. A deadline > Int32.MaxValue ms in the future also clamps to the fallback so the read path never throws `ArgumentOutOfRangeException` from inside `CancellationTokenSource(TimeSpan)`. Regression tests in `DriverNodeManagerCancellationTests` cover all five paths (future / past / missing / MinValue / MaxValue).
### Server-007 ### Server-007
| Field | Value | | Field | Value |
@@ -130,13 +130,13 @@
| Severity | Low | | Severity | Low |
| Category | Error handling & resilience | | Category | Error handling & resilience |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736` | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736` |
| Status | Open | | Status | Resolved |
**Description:** `RouteScriptedAlarmMethodCalls` marks a handled slot by setting `errors[i] = ServiceResult.Good`, assuming `base.Call` skips non-null *Good* error slots. The stack and `GateCallMethodRequests` only ever pre-populate *Bad* slots; the skip-on-Good assumption is not a guaranteed SDK contract. If `base.Call` re-dispatches, the engine method and the stack's built-in Part 9 handler both fire — double transition. **Description:** `RouteScriptedAlarmMethodCalls` marks a handled slot by setting `errors[i] = ServiceResult.Good`, assuming `base.Call` skips non-null *Good* error slots. The stack and `GateCallMethodRequests` only ever pre-populate *Bad* slots; the skip-on-Good assumption is not a guaranteed SDK contract. If `base.Call` re-dispatches, the engine method and the stack's built-in Part 9 handler both fire — double transition.
**Recommendation:** Verify against the pinned SDK whether `base.Call` skips Good-pre-populated slots. If not, exclude routed slots from `methodsToCall` before `base.Call`. Add a test asserting exactly-once engine transition for a routed Acknowledge. **Recommendation:** Verify against the pinned SDK whether `base.Call` skips Good-pre-populated slots. If not, exclude routed slots from `methodsToCall` before `base.Call`. Add a test asserting exactly-once engine transition for a routed Acknowledge.
**Resolution:** _(open)_ **Resolution:** Resolved 2026-05-23 — verified against the pinned SDK (DeepWiki query against OPCFoundation/UA-.NETStandard): `CustomNodeManager2.Call` / `CallInternalAsync` skip slots whose `CallMethodRequest.Processed` flag is `true`, not slots whose `errors[i]` is a non-Bad `ServiceResult`. `RouteScriptedAlarmMethodCalls` now sets `request.Processed = true` on every handled slot — success, `ArgumentException`, and generic exception paths — so `base.Call` never re-dispatches a routed Acknowledge / Confirm / AddComment to the stack's built-in Part 9 handler. Regression tests in `ScriptedAlarmMethodRoutingProcessedFlagTests` assert `Processed` is `true` after each engine path and `false` for slots the helper passes through to `base.Call`.
### Server-009 ### Server-009
| Field | Value | | Field | Value |
@@ -186,13 +186,13 @@
| Severity | Low | | Severity | Low |
| Category | Performance & resource management | | Category | Performance & resource management |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` |
| Status | Open | | Status | Resolved |
**Description:** `ProbeAsync` creates an `IHttpClientFactory` client and mutates `client.Timeout` on every 2-second probe tick. The timeout belongs on the request or on the named-client registration, not set per call on a factory-vended instance. **Description:** `ProbeAsync` creates an `IHttpClientFactory` client and mutates `client.Timeout` on every 2-second probe tick. The timeout belongs on the request or on the named-client registration, not set per call on a factory-vended instance.
**Recommendation:** Configure the timeout once via `AddHttpClient(HttpClientName).ConfigureHttpClient(...)`, or use a per-request linked `CancellationTokenSource(_options.HttpProbeTimeout)`; drop the per-call `client.Timeout` mutation. **Recommendation:** Configure the timeout once via `AddHttpClient(HttpClientName).ConfigureHttpClient(...)`, or use a per-request linked `CancellationTokenSource(_options.HttpProbeTimeout)`; drop the per-call `client.Timeout` mutation.
**Resolution:** _(open)_ **Resolution:** Resolved 2026-05-23 — `ProbeAsync` no longer mutates `client.Timeout`. Replaced with a per-call `CancellationTokenSource(_options.HttpProbeTimeout)` linked to the loop's shutdown token; `GetAsync` consumes the linked token so the per-request deadline is enforced via cancellation instead of via the factory-vended `HttpClient` instance. Regression test `PeerHttpProbeLoopTests.Tick_does_not_mutate_factory_vended_client_Timeout` asserts the timeout-on-client mutation is gone.
### Server-013 ### Server-013
| Field | Value | | Field | Value |
@@ -214,13 +214,13 @@
| Severity | Low | | Severity | Low |
| Category | Code organization & conventions | | Category | Code organization & conventions |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` |
| Status | Open | | Status | Resolved |
**Description:** `SealedBootstrap` claims in its xml-doc to "close release blocker #2" by consuming the generation-sealed cache + resilient reader + stale-config flag, but `Program.cs` registers and uses `NodeBootstrap` instead. `SealedBootstrap` is never registered in DI nor referenced by `OpcUaServerService` — it and its `StaleConfigFlag` plumbing are dead in the production wire-up; the release blocker remains open in practice. **Description:** `SealedBootstrap` claims in its xml-doc to "close release blocker #2" by consuming the generation-sealed cache + resilient reader + stale-config flag, but `Program.cs` registers and uses `NodeBootstrap` instead. `SealedBootstrap` is never registered in DI nor referenced by `OpcUaServerService` — it and its `StaleConfigFlag` plumbing are dead in the production wire-up; the release blocker remains open in practice.
**Recommendation:** Either register `SealedBootstrap` (with `GenerationSealedCache`/`ResilientConfigReader`/`StaleConfigFlag`) and wire `StaleConfigFlag` into the health host, or delete `SealedBootstrap` and correct the release-readiness doc. **Recommendation:** Either register `SealedBootstrap` (with `GenerationSealedCache`/`ResilientConfigReader`/`StaleConfigFlag`) and wire `StaleConfigFlag` into the health host, or delete `SealedBootstrap` and correct the release-readiness doc.
**Resolution:** _(open)_ **Resolution:** Resolved 2026-05-23 — added `ServerWiring.AddSealedBootstrap` DI helper that registers `GenerationSealedCache` (rooted at a `.sealed` sibling of `NodeOptions.LocalCachePath`), `StaleConfigFlag`, `ResilientConfigReader`, and `SealedBootstrap`. `Program.cs` calls it after `AddSingleton<NodeBootstrap>()`; `OpcUaServerService` now consumes `SealedBootstrap` instead of `NodeBootstrap`; `OpcUaApplicationHost` is constructed with `staleConfigFlag` resolved from DI so `/healthz`'s `usingStaleConfig` reflects the cache-fallback state. The legacy `NodeBootstrap` registration stays for back-compat with the integration tests that construct it directly. Regression test `SealedBootstrapWiringTests.SealedBootstrap_and_its_dependencies_are_registered_in_DI` asserts the registrations compose without missing-service exceptions; `SealedBootstrap.cs`'s xml-doc updated to describe the live wire-up rather than the deferred plan.
### Server-015 ### Server-015
| Field | Value | | Field | Value |
@@ -228,10 +228,10 @@
| Severity | Low | | Severity | Low |
| Category | Documentation & comments | | Category | Documentation & comments |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` | | Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` |
| Status | Open | | Status | Resolved |
**Description:** `OtOpcUaServer`'s class doc still says "PR 16 minimum-viable scope ... no security ... LDAP + security profiles are deferred." `OpcUaServerOptions`'s says "PR 17 minimum-viable scope: no LDAP, no security profiles beyond None." Both are stale — the class now does LDAP UserName auth, anonymous-role mapping, and a configurable security profile. A reader would wrongly conclude the server has no authentication. **Description:** `OtOpcUaServer`'s class doc still says "PR 16 minimum-viable scope ... no security ... LDAP + security profiles are deferred." `OpcUaServerOptions`'s says "PR 17 minimum-viable scope: no LDAP, no security profiles beyond None." Both are stale — the class now does LDAP UserName auth, anonymous-role mapping, and a configurable security profile. A reader would wrongly conclude the server has no authentication.
**Recommendation:** Update both class summaries to describe current behaviour and drop the "deferred to a future PR" language. **Recommendation:** Update both class summaries to describe current behaviour and drop the "deferred to a future PR" language.
**Resolution:** _(open)_ **Resolution:** Resolved 2026-05-23 — rewrote both class summaries. `OtOpcUaServer` now describes the live LDAP UserName / Anonymous identity-token flow, the `RoleBasedIdentity` wrapper, and the configurable `SecurityProfile` driven by `OpcUaServerOptions`. `OpcUaServerOptions` now describes endpoint + identity + PKI + health + LDAP + anonymous-role surfaces and points at `docs/security.md`. The stale "PR 16 / PR 17 minimum-viable scope" and "deferred to their own PR" language is gone.

View File

@@ -73,11 +73,20 @@ public sealed class PeerHttpProbeLoop(
{ {
var url = $"http://{peer.Host}:{peer.DashboardPort}/healthz"; var url = $"http://{peer.Host}:{peer.DashboardPort}/healthz";
var healthy = false; var healthy = false;
// Server-012: bound the request via a linked CTS rather than mutating
// client.Timeout on a factory-vended HttpClient. IHttpClientFactory may pool /
// recycle the underlying handler, so mutating client.Timeout per call races with
// the factory's lifecycle and crosses ownership boundaries. A per-call CTS is the
// canonical way to enforce a per-request deadline.
using var timeoutCts = new CancellationTokenSource(_options.HttpProbeTimeout);
using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
cancellationToken, timeoutCts.Token);
try try
{ {
using var client = httpClientFactory.CreateClient(HttpClientName); using var client = httpClientFactory.CreateClient(HttpClientName);
client.Timeout = _options.HttpProbeTimeout; using var response = await client.GetAsync(url, linkedCts.Token).ConfigureAwait(false);
using var response = await client.GetAsync(url, cancellationToken).ConfigureAwait(false);
healthy = response.IsSuccessStatusCode; healthy = response.IsSuccessStatusCode;
} }
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested) catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
@@ -86,9 +95,9 @@ public sealed class PeerHttpProbeLoop(
} }
catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException or OperationCanceledException) catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException or OperationCanceledException)
{ {
// Any transport-level failure counts as unhealthy — connection refused, timeout, // Any transport-level failure counts as unhealthy — connection refused, timeout
// DNS fail, TLS fail. Swallow + mark unhealthy; don't log every tick, only when // (linked CTS expired), DNS fail, TLS fail. Swallow + mark unhealthy; don't log
// state transitions. // every tick, only when state transitions.
healthy = false; healthy = false;
} }

View File

@@ -462,6 +462,63 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
} }
} }
/// <summary>
/// Server-006 fallback for the synchronous OnRead/OnWrite hooks when the stack hasn't
/// plumbed an <see cref="IOperationContext.OperationDeadline"/> through. Bounds the
/// <c>.GetAwaiter().GetResult()</c> wait so a stalled driver can't pin a request
/// thread indefinitely (default <c>MaxRequestThreadCount = 100</c>).
/// </summary>
internal static readonly TimeSpan DefaultSynchronousHookTimeout = TimeSpan.FromSeconds(30);
/// <summary>
/// Derives a <see cref="CancellationTokenSource"/> bounded by the OPC UA stack's
/// <see cref="IOperationContext.OperationDeadline"/>, falling back to
/// <paramref name="fallback"/> when no deadline has been plumbed. The synchronous
/// <c>OnReadValue</c> / <c>OnWriteValue</c> stack hooks consume the returned token so
/// a stuck driver call can be abandoned instead of pinning a request thread for the
/// full pipeline timeout (Server-006).
/// </summary>
/// <remarks>
/// The OPC UA stack's <c>SessionSystemContext.OperationContext</c> exposes the
/// per-request deadline computed from the client's <c>TimeoutHint</c> +
/// <see cref="TransportQuotas.OperationTimeout"/>. An expired deadline produces an
/// immediately-cancelled token so the invoker call short-circuits without dispatching
/// to the driver. A <see cref="DateTime.MinValue"/> deadline (the SDK's "not plumbed"
/// sentinel) is treated as missing and falls back to <paramref name="fallback"/>.
/// </remarks>
internal static CancellationTokenSource DeriveOperationCancellation(ISystemContext context, TimeSpan fallback)
{
// Cast: ISystemContext is read-only; SystemContext (the concrete base) exposes
// OperationContext as IOperationContext. Server-side, every ISystemContext the
// stack passes is a SessionSystemContext (subclass of SystemContext), so the cast
// succeeds in practice. The null guard keeps the helper safe for tests / future
// overrides that don't follow that contract.
var opCtx = (context as SystemContext)?.OperationContext;
var deadline = opCtx?.OperationDeadline ?? DateTime.MinValue;
// DateTime.MinValue is the legacy "no deadline plumbed" sentinel; DateTime.MaxValue
// is the SDK's "client didn't supply a TimeoutHint" default (OperationContext's
// initial value when RequestHeader.TimeoutHint == 0). Both must collapse to the
// fallback timeout — otherwise MinValue compares as already-expired and MaxValue
// overflows CancellationTokenSource(TimeSpan)'s Int32.MaxValue-ms ceiling.
if (deadline == DateTime.MinValue || deadline == DateTime.MaxValue)
return new CancellationTokenSource(fallback);
var remaining = deadline - DateTime.UtcNow;
if (remaining <= TimeSpan.Zero)
{
var cts = new CancellationTokenSource();
cts.Cancel();
return cts;
}
// CancellationTokenSource(TimeSpan) caps at ~24.86 days (Int32.MaxValue ms). A
// legitimate per-request deadline far in the future (e.g. a TimeoutHint of hours)
// exceeds that limit; clamp to the fallback so we never throw ArgumentOutOfRangeException
// from inside the read path.
if (remaining.TotalMilliseconds > int.MaxValue) return new CancellationTokenSource(fallback);
return new CancellationTokenSource(remaining);
}
private ServiceResult OnReadValue(ISystemContext context, NodeState node, NumericRange indexRange, private ServiceResult OnReadValue(ISystemContext context, NodeState node, NumericRange indexRange,
QualifiedName dataEncoding, ref object? value, ref StatusCode statusCode, ref DateTime timestamp) QualifiedName dataEncoding, ref object? value, ref StatusCode statusCode, ref DateTime timestamp)
{ {
@@ -490,11 +547,15 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
} }
} }
// Server-006: bound the synchronous .GetAwaiter().GetResult() wait by the stack's
// OperationDeadline so a stalled driver can't pin a request thread for the full
// pipeline timeout. The previous CancellationToken.None left every read uncancellable.
using var cts = DeriveOperationCancellation(context, DefaultSynchronousHookTimeout);
var result = _invoker.ExecuteAsync( var result = _invoker.ExecuteAsync(
DriverCapability.Read, DriverCapability.Read,
ResolveHostFor(fullRef), ResolveHostFor(fullRef),
async ct => (IReadOnlyList<DataValueSnapshot>)await readable.ReadAsync([fullRef], ct).ConfigureAwait(false), async ct => (IReadOnlyList<DataValueSnapshot>)await readable.ReadAsync([fullRef], ct).ConfigureAwait(false),
CancellationToken.None).AsTask().GetAwaiter().GetResult(); cts.Token).AsTask().GetAwaiter().GetResult();
if (result.Count == 0) if (result.Count == 0)
{ {
statusCode = StatusCodes.BadNoData; statusCode = StatusCodes.BadNoData;
@@ -505,6 +566,12 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
statusCode = snap.StatusCode; statusCode = snap.StatusCode;
timestamp = snap.ServerTimestampUtc; timestamp = snap.ServerTimestampUtc;
} }
catch (OperationCanceledException)
{
// The deadline expired or the client cancelled. Surface BadTimeout so the client
// sees the actual outcome (the pre-fix uncancellable path would have hung instead).
statusCode = StatusCodes.BadTimeout;
}
catch (Exception ex) catch (Exception ex)
{ {
_logger.LogWarning(ex, "OnReadValue failed for {NodeId}", node.NodeId); _logger.LogWarning(ex, "OnReadValue failed for {NodeId}", node.NodeId);
@@ -745,11 +812,15 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
engine.AddCommentAsync(alarmId, user, comment ?? string.Empty, CancellationToken.None) engine.AddCommentAsync(alarmId, user, comment ?? string.Empty, CancellationToken.None)
.GetAwaiter().GetResult(); .GetAwaiter().GetResult();
// Mark the slot as handled so base.Call skips it. A pre-populated Good // Mark the slot as handled so base.Call skips it. Server-008: the SDK's
// result (not null and not Bad) is the signal the base class uses to // CustomNodeManager2.Call (and CallInternalAsync) skip slots whose
// skip per-slot dispatch — set StatusCode to Good explicitly. // CallMethodRequest.Processed flag is true — the errors[i] value is the
// per-slot status, not the skip signal. Without Processed=true the stack's
// built-in Part 9 Acknowledge/Confirm handler would also fire and the engine
// would observe a double transition.
results[i] = new CallMethodResult { StatusCode = StatusCodes.Good }; results[i] = new CallMethodResult { StatusCode = StatusCodes.Good };
errors[i] = ServiceResult.Good; errors[i] = ServiceResult.Good;
request.Processed = true;
} }
catch (ArgumentException ex) catch (ArgumentException ex)
{ {
@@ -757,11 +828,15 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
// as BadInvalidArgument so the OPC UA client sees a meaningful status. // as BadInvalidArgument so the OPC UA client sees a meaningful status.
errors[i] = new ServiceResult(StatusCodes.BadInvalidArgument, errors[i] = new ServiceResult(StatusCodes.BadInvalidArgument,
ex.Message, ex.Message); ex.Message, ex.Message);
// The engine rejected the call, but we still routed it — base.Call must
// not re-run the method on the stack's built-in handler. Server-008.
request.Processed = true;
} }
catch (Exception ex) catch (Exception ex)
{ {
errors[i] = new ServiceResult(StatusCodes.BadInternalError, errors[i] = new ServiceResult(StatusCodes.BadInternalError,
ex.Message, ex.Message); ex.Message, ex.Message);
request.Processed = true;
} }
} }
} }
@@ -1354,13 +1429,16 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
{ {
var isIdempotent = _writeIdempotentByFullRef.GetValueOrDefault(fullRef!, false); var isIdempotent = _writeIdempotentByFullRef.GetValueOrDefault(fullRef!, false);
var capturedValue = value; var capturedValue = value;
// Server-006: same deadline-derived cancellation as OnReadValue so a stalled
// driver write can't pin a request thread for the full pipeline timeout.
using var cts = DeriveOperationCancellation(context, DefaultSynchronousHookTimeout);
var results = _invoker.ExecuteWriteAsync( var results = _invoker.ExecuteWriteAsync(
ResolveHostFor(fullRef!), ResolveHostFor(fullRef!),
isIdempotent, isIdempotent,
async ct => (IReadOnlyList<WriteResult>)await _writable.WriteAsync( async ct => (IReadOnlyList<WriteResult>)await _writable.WriteAsync(
[new DriverWriteRequest(fullRef!, capturedValue)], [new DriverWriteRequest(fullRef!, capturedValue)],
ct).ConfigureAwait(false), ct).ConfigureAwait(false),
CancellationToken.None).AsTask().GetAwaiter().GetResult(); cts.Token).AsTask().GetAwaiter().GetResult();
if (results.Count > 0 && results[0].StatusCode != 0) if (results.Count > 0 && results[0].StatusCode != 0)
{ {
statusCode = results[0].StatusCode; statusCode = results[0].StatusCode;
@@ -1368,6 +1446,12 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
} }
return ServiceResult.Good; return ServiceResult.Good;
} }
catch (OperationCanceledException)
{
// Deadline expired or client cancelled — surface BadTimeout instead of the
// generic BadInternalError so the client sees the actual outcome.
return new ServiceResult(StatusCodes.BadTimeout);
}
catch (Exception ex) catch (Exception ex)
{ {
_logger.LogWarning(ex, "Write failed for {FullRef}", fullRef); _logger.LogWarning(ex, "Write failed for {FullRef}", fullRef);

View File

@@ -20,10 +20,15 @@ public enum OpcUaSecurityProfile
/// <summary> /// <summary>
/// OPC UA server endpoint + application-identity configuration. Bound from the /// OPC UA server endpoint + application-identity configuration. Bound from the
/// <c>OpcUaServer</c> section of <c>appsettings.json</c>. PR 17 minimum-viable scope: no LDAP, /// <c>OpcUaServer</c> section of <c>appsettings.json</c>.
/// no security profiles beyond None — those wire in alongside a future deployment-policy PR
/// that reads from the central config DB instead of appsettings.
/// </summary> /// </summary>
/// <remarks>
/// Covers the endpoint URL + application identity, the PKI store root + auto-trust toggle,
/// the optional <c>/healthz</c> health-endpoints listener, the configurable transport
/// <see cref="SecurityProfile"/>, the <see cref="Ldap"/> binding for UserName token
/// validation, and the optional <see cref="AnonymousRoles"/> set that grants anonymous
/// sessions a configurable role list. See <c>docs/security.md</c> for the full guide.
/// </remarks>
public sealed class OpcUaServerOptions public sealed class OpcUaServerOptions
{ {
public const string SectionName = "OpcUaServer"; public const string SectionName = "OpcUaServer";

View File

@@ -15,10 +15,27 @@ namespace ZB.MOM.WW.OtOpcUa.Server.OpcUa;
/// <summary> /// <summary>
/// <see cref="StandardServer"/> subclass that wires one <see cref="DriverNodeManager"/> per /// <see cref="StandardServer"/> subclass that wires one <see cref="DriverNodeManager"/> per
/// registered driver from <see cref="DriverHost"/>. Anonymous endpoint on /// registered driver from <see cref="DriverHost"/>. Endpoint URL, transport security
/// <c>opc.tcp://0.0.0.0:4840</c>, no security — PR 16 minimum-viable scope; LDAP + security /// profile (<see cref="OpcUaSecurityProfile"/>), and LDAP-backed UserName authentication
/// profiles are deferred to their own PR on top of this. /// are all driven by <see cref="OpcUaServerOptions"/>.
/// </summary> /// </summary>
/// <remarks>
/// <para>
/// <see cref="OnImpersonateUser"/> accepts the stack's <c>AnonymousIdentityToken</c>
/// and <c>UserNameIdentityToken</c>. Anonymous sessions either receive the stack's
/// default empty identity (production default) or a configurable role set via
/// <see cref="OpcUaServerOptions.AnonymousRoles"/>. UserName sessions are validated
/// against the injected <see cref="IUserAuthenticator"/> — the production binding is
/// the LDAP authenticator under <see cref="OpcUaServerOptions.Ldap"/>; failed binds
/// throw <see cref="ServiceResultException"/> with
/// <see cref="StatusCodes.BadUserAccessDenied"/>.
/// </para>
/// <para>
/// Authenticated identities are wrapped in <see cref="RoleBasedIdentity"/> which
/// carries the LDAP-resolved roles + groups so <see cref="DriverNodeManager"/>'s
/// server-layer authorization gate can evaluate per-tag ACLs without re-reading LDAP.
/// </para>
/// </remarks>
public sealed class OtOpcUaServer : StandardServer public sealed class OtOpcUaServer : StandardServer
{ {
private readonly DriverHost _driverHost; private readonly DriverHost _driverHost;
@@ -184,16 +201,31 @@ public sealed class OtOpcUaServer : StandardServer
/// <see cref="ILdapGroupsBearer"/> (data-plane: <see cref="AuthorizationGate"/> evaluator). /// <see cref="ILdapGroupsBearer"/> (data-plane: <see cref="AuthorizationGate"/> evaluator).
/// Anonymous identity (no roles configured) still uses the stack's default UserIdentity. /// Anonymous identity (no roles configured) still uses the stack's default UserIdentity.
/// </summary> /// </summary>
private sealed class RoleBasedIdentity : UserIdentity, IRoleBearer, ILdapGroupsBearer /// <remarks>
/// Server-004 — the LDAP-resolved display name is stamped onto the base
/// <see cref="UserIdentity.DisplayName"/> property (settable on the stack's
/// <c>UserIdentity</c>) so <c>DriverNodeManager.ResolveCallUser</c> picks it up
/// through the <see cref="IUserIdentity"/> interface. Without that assignment,
/// <c>DisplayName</c> resolves to the username and scripted-alarm
/// Ack/Confirm/Shelve audit entries record the raw username instead of the
/// human-readable LDAP <c>cn</c>.
/// </remarks>
internal sealed class RoleBasedIdentity : UserIdentity, IRoleBearer, ILdapGroupsBearer
{ {
public IReadOnlyList<string> Roles { get; } public IReadOnlyList<string> Roles { get; }
public IReadOnlyList<string> LdapGroups { get; } public IReadOnlyList<string> LdapGroups { get; }
public string? Display { get; }
public RoleBasedIdentity(string userName, string? displayName, IReadOnlyList<string> roles, IReadOnlyList<string> ldapGroups) public RoleBasedIdentity(string userName, string? displayName, IReadOnlyList<string> roles, IReadOnlyList<string> ldapGroups)
: base(userName, "") : base(displayName ?? userName, "")
{ {
Display = displayName; // The base UserIdentity(string, string) ctor seeds DisplayName from its first arg
// (the stack's UserIdentity.DisplayName getter is sealed-virtual on this pinned
// SDK version, so it cannot be overridden or re-assigned). Pass the LDAP-resolved
// display name as that first arg so DriverNodeManager.ResolveCallUser — which
// reads IUserIdentity.DisplayName — stamps the human-readable name onto
// scripted-alarm Ack / Confirm / Shelve audit entries. When LDAP returns no
// display name, fall back to the username so every audit row still carries an
// identity (matches the pre-fix observable behaviour).
Roles = roles; Roles = roles;
LdapGroups = ldapGroups; LdapGroups = ldapGroups;
} }

View File

@@ -15,7 +15,11 @@ namespace ZB.MOM.WW.OtOpcUa.Server;
/// runs until stopped. /// runs until stopped.
/// </summary> /// </summary>
public sealed class OpcUaServerService( public sealed class OpcUaServerService(
NodeBootstrap bootstrap, // Server-014 — consume SealedBootstrap so the Phase 6.1 Stream D resilient-reader +
// GenerationSealedCache + StaleConfigFlag chain actually runs on every boot. The
// legacy NodeBootstrap is still registered for integration tests that construct it
// directly, but the production service uses the generation-sealed path.
SealedBootstrap bootstrap,
DriverHost driverHost, DriverHost driverHost,
OpcUaApplicationHost applicationHost, OpcUaApplicationHost applicationHost,
DriverEquipmentContentRegistry equipmentContentRegistry, DriverEquipmentContentRegistry equipmentContentRegistry,

View File

@@ -116,6 +116,12 @@ builder.Services.AddSingleton<ILocalConfigCache>(_ => new LiteDbConfigCache(opti
builder.Services.AddSingleton<DriverHost>(); builder.Services.AddSingleton<DriverHost>();
builder.Services.AddSingleton<NodeBootstrap>(); builder.Services.AddSingleton<NodeBootstrap>();
// Server-014: register the Phase 6.1 Stream D generation-sealed bootstrap chain. Production
// uses SealedBootstrap; NodeBootstrap stays registered for back-compat with the integration
// tests that depend on it directly. StaleConfigFlag is resolved into OpcUaApplicationHost so
// /healthz surfaces the stale-config signal when the cache fallback path serves a request.
builder.Services.AddSealedBootstrap(options);
// Task #248 — driver-instance bootstrap pipeline. DriverFactoryRegistry is the // Task #248 — driver-instance bootstrap pipeline. DriverFactoryRegistry is the
// type-name → factory map; each driver project's static Register call pre-loads // type-name → factory map; each driver project's static Register call pre-loads
// its factory so the bootstrapper can materialise DriverInstance rows from the // its factory so the bootstrapper can materialise DriverInstance rows from the
@@ -217,7 +223,11 @@ builder.Services.AddSingleton<OpcUaApplicationHost>(sp =>
equipmentContentLookup: registry.Get, equipmentContentLookup: registry.Get,
historyRouter: sp.GetRequiredService<IHistoryRouter>(), historyRouter: sp.GetRequiredService<IHistoryRouter>(),
alarmConditionService: sp.GetRequiredService<AlarmConditionService>(), alarmConditionService: sp.GetRequiredService<AlarmConditionService>(),
configDbHealthy: () => dbHealthCache.IsHealthy); configDbHealthy: () => dbHealthCache.IsHealthy,
// Server-014: surface StaleConfigFlag into /healthz so a cache-fallback bootstrap
// (central DB unreachable + sealed snapshot served) flips usingStaleConfig=true on
// the next probe. Without this wire-up the flag was inert.
staleConfigFlag: sp.GetRequiredService<ZB.MOM.WW.OtOpcUa.Configuration.LocalCache.StaleConfigFlag>());
}); });
builder.Services.AddHostedService<OpcUaServerService>(); builder.Services.AddHostedService<OpcUaServerService>();

View File

@@ -12,14 +12,17 @@ namespace ZB.MOM.WW.OtOpcUa.Server;
/// sealed snapshot to fall back to. /// sealed snapshot to fall back to.
/// </summary> /// </summary>
/// <remarks> /// <remarks>
/// <para>Alongside the original <see cref="NodeBootstrap"/> (which uses the single-file /// <para>Server-014 — registered in DI via <c>ServerWiring.AddSealedBootstrap</c> and
/// <see cref="ILocalConfigCache"/>). Program.cs can switch to this one once operators are /// consumed by <c>OpcUaServerService</c>. The legacy <see cref="NodeBootstrap"/> stays
/// ready for the generation-sealed semantics. The original stays for backward compat /// registered alongside for the three integration tests that construct it directly, but
/// with the three integration tests that construct <see cref="NodeBootstrap"/> directly.</para> /// production boots through this wrapper so <see cref="GenerationSealedCache"/> +
/// <see cref="ResilientConfigReader"/> + <see cref="StaleConfigFlag"/> run on every
/// start-up and <c>/healthz</c>'s <c>usingStaleConfig</c> reflects the cache-fallback
/// state.</para>
/// ///
/// <para>Closes release blocker #2 in <c>docs/v2/v2-release-readiness.md</c> — the /// <para>Closes release blocker #2 in <c>docs/v2/v2-release-readiness.md</c> — the
/// generation-sealed cache + resilient reader + stale-config flag ship as unit-tested /// generation-sealed cache + resilient reader + stale-config flag ship as unit-tested
/// primitives in PR #81 but no production path consumed them until this wrapper.</para> /// primitives in PR #81; this wrapper is the production consumer that wires them in.</para>
/// </remarks> /// </remarks>
public sealed class SealedBootstrap public sealed class SealedBootstrap
{ {

View File

@@ -0,0 +1,57 @@
using Microsoft.Extensions.DependencyInjection;
using ZB.MOM.WW.OtOpcUa.Configuration.LocalCache;
namespace ZB.MOM.WW.OtOpcUa.Server;
/// <summary>
/// DI registration helpers consumed by <c>Program.cs</c>. Extracted so tests can assert
/// the production wire-up actually composes without spinning up the full <c>Host</c>.
/// </summary>
public static class ServerWiring
{
/// <summary>
/// Server-014 — registers the Phase 6.1 Stream D generation-sealed bootstrap chain:
/// <see cref="GenerationSealedCache"/>, <see cref="StaleConfigFlag"/>,
/// <see cref="ResilientConfigReader"/>, and <see cref="SealedBootstrap"/>. Without these
/// registrations <c>OpcUaServerService</c> cannot consume the sealed bootstrap and the
/// <see cref="StaleConfigFlag"/> stays inert — <c>/healthz</c>'s <c>usingStaleConfig</c>
/// never flips on a DB outage with a warm cache.
/// </summary>
/// <remarks>
/// The cache root is sourced from <see cref="NodeOptions.LocalCachePath"/> — same path
/// the legacy <see cref="NodeBootstrap"/> uses for its LiteDB cache, so both bootstrap
/// paths persist alongside each other while the migration completes.
/// </remarks>
public static IServiceCollection AddSealedBootstrap(this IServiceCollection services, NodeOptions options)
{
// Use a sibling directory off LocalCachePath so the LiteDB file and the
// GenerationSealedCache snapshots don't clash. The cache root is a directory;
// LocalCachePath is canonically the LiteDB file path.
var cacheRoot = ResolveCacheRoot(options.LocalCachePath);
// Register NodeOptions only if the caller hasn't already done so — Program.cs
// registers it earlier in its DI chain, but the wiring helper supports standalone
// unit tests that want to compose just the SealedBootstrap chain.
if (!services.Any(d => d.ServiceType == typeof(NodeOptions)))
services.AddSingleton(options);
services.AddSingleton(new GenerationSealedCache(cacheRoot));
services.AddSingleton<StaleConfigFlag>();
services.AddSingleton<ResilientConfigReader>();
services.AddSingleton<SealedBootstrap>();
return services;
}
private static string ResolveCacheRoot(string localCachePath)
{
// LocalCachePath is the LiteDB file (e.g. "config_cache.db"); the sealed cache is a
// directory. Pick a sibling folder so the two don't share a path.
if (string.IsNullOrWhiteSpace(localCachePath))
return Path.Combine(Path.GetTempPath(), "otopcua-sealed-cache");
var dir = Path.GetDirectoryName(localCachePath);
var name = Path.GetFileNameWithoutExtension(localCachePath);
var root = string.IsNullOrEmpty(dir)
? $"{name}.sealed"
: Path.Combine(dir, $"{name}.sealed");
return root;
}
}

View File

@@ -0,0 +1,116 @@
using Opc.Ua;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Server.OpcUa;
namespace ZB.MOM.WW.OtOpcUa.Server.Tests;
/// <summary>
/// Regression for Server-006 — synchronous OnReadValue / OnWriteValue stack hooks must
/// derive a <see cref="CancellationToken"/> from the operation deadline so a stalled
/// driver call doesn't pin a request thread for the full pipeline timeout. The shared
/// helper <see cref="DriverNodeManager.DeriveOperationCancellation"/> turns the
/// <see cref="ISystemContext"/>'s <c>OperationDeadline</c> into a linked CTS.
/// </summary>
[Trait("Category", "Unit")]
public sealed class DriverNodeManagerCancellationTests
{
/// <summary>
/// Build a SystemContext bound to the supplied IOperationContext. SystemContext's
/// OperationContext setter is protected, so we use the public <c>Copy</c> method
/// which clones the context onto the supplied operation context.
/// </summary>
private static ISystemContext ContextWithDeadline(DateTime deadline)
=> new SystemContext().Copy(new StubOperationContext(deadline));
[Fact]
public void Future_deadline_produces_uncancelled_token()
{
var ctx = ContextWithDeadline(DateTime.UtcNow.AddSeconds(30));
using var cts = DriverNodeManager.DeriveOperationCancellation(ctx, fallback: TimeSpan.FromSeconds(10));
cts.Token.IsCancellationRequested.ShouldBeFalse();
}
[Fact]
public void Past_deadline_produces_already_cancelled_token()
{
var ctx = ContextWithDeadline(DateTime.UtcNow.AddSeconds(-5));
using var cts = DriverNodeManager.DeriveOperationCancellation(ctx, fallback: TimeSpan.FromSeconds(10));
cts.Token.IsCancellationRequested.ShouldBeTrue(
"an expired OperationDeadline must surface as an immediately-cancelled token so the "
+ "stalled driver call returns without burning a request thread");
}
[Fact]
public void Missing_deadline_uses_fallback_timeout()
{
// No OperationContext attached → no deadline plumbed; helper falls back to the
// supplied timeout so an OnReadValue hook into a stalled driver can't hang the
// request thread indefinitely.
var ctx = new SystemContext();
using var cts = DriverNodeManager.DeriveOperationCancellation(ctx, fallback: TimeSpan.FromMilliseconds(20));
cts.Token.WaitHandle.WaitOne(TimeSpan.FromMilliseconds(500)).ShouldBeTrue(
"fallback timeout must fire so a missing deadline cannot hang the request thread");
cts.Token.IsCancellationRequested.ShouldBeTrue();
}
[Fact]
public void DateTime_MinValue_deadline_uses_fallback_timeout()
{
// IOperationContext.OperationDeadline is `DateTime.MinValue` when the stack hasn't
// plumbed a deadline through. The helper treats that as "no deadline" and falls
// back to the supplied timeout, otherwise an MinValue would surface as
// already-cancelled and short-circuit every read.
var ctx = ContextWithDeadline(DateTime.MinValue);
using var cts = DriverNodeManager.DeriveOperationCancellation(ctx, fallback: TimeSpan.FromMilliseconds(20));
cts.Token.WaitHandle.WaitOne(TimeSpan.FromMilliseconds(500)).ShouldBeTrue();
cts.Token.IsCancellationRequested.ShouldBeTrue();
}
[Fact]
public void DateTime_MaxValue_deadline_uses_fallback_timeout_not_overflow()
{
// OperationContext sets OperationDeadline = DateTime.MaxValue when the client's
// RequestHeader.TimeoutHint is zero (the default). DateTime.MaxValue - UtcNow
// overflows CancellationTokenSource(TimeSpan)'s Int32.MaxValue-ms cap, so the
// helper must collapse it to the fallback — otherwise the read throws
// ArgumentOutOfRangeException from inside DeriveOperationCancellation and surfaces
// as BadInternalError on every read (regression that broke OpcUaServerIntegrationTests).
var ctx = ContextWithDeadline(DateTime.MaxValue);
using var cts = DriverNodeManager.DeriveOperationCancellation(ctx, fallback: TimeSpan.FromSeconds(30));
cts.Token.IsCancellationRequested.ShouldBeFalse("MaxValue deadline + 30 s fallback must produce a live token");
}
[Fact]
public void Null_context_returns_uncancelled_token_with_fallback()
{
// Defensive — OnReadValue receives an ISystemContext from the stack so the helper
// shouldn't NRE if a future override passes through a null context.
using var cts = DriverNodeManager.DeriveOperationCancellation(context: null!, fallback: TimeSpan.FromSeconds(30));
cts.Token.IsCancellationRequested.ShouldBeFalse();
}
/// <summary>Minimal IOperationContext for deadline testing.</summary>
private sealed class StubOperationContext(DateTime deadline) : IOperationContext
{
public DateTime OperationDeadline { get; } = deadline;
public NodeId? SessionId => null;
public IUserIdentity? UserIdentity => null;
public IList<string>? PreferredLocales => null;
public DiagnosticsMasks DiagnosticsMask => DiagnosticsMasks.None;
public StringTable StringTable { get; } = new StringTable();
public StatusCode OperationStatus => StatusCodes.Good;
public string? AuditEntryId => null;
}
}

View File

@@ -115,6 +115,34 @@ public sealed class PeerHttpProbeLoopTests : IDisposable
tracker.Get("B").HttpHealthy.ShouldBeFalse(); tracker.Get("B").HttpHealthy.ShouldBeFalse();
} }
[Fact]
public async Task Tick_does_not_mutate_factory_vended_client_Timeout()
{
// Server-012: timeouts belong on the named-client registration or a per-request CTS,
// NOT on a factory-vended HttpClient (which IHttpClientFactory may pool/recycle).
// Mutating client.Timeout per tick is at minimum a bad smell and races with
// IHttpClientFactory's lifecycle expectations.
var coordinator = await SeedAndInitializeAsync("A",
("A", RedundancyRole.Primary, "urn:A"),
("B", RedundancyRole.Secondary, "urn:B"));
var tracker = new PeerReachabilityTracker();
var factoryInitialTimeout = TimeSpan.FromMinutes(2);
var factory = new RecordingHttpClientFactory(
_ => new HttpResponseMessage(HttpStatusCode.OK),
factoryInitialTimeout);
var loop = new PeerHttpProbeLoop(coordinator, tracker, factory, NullLogger<PeerHttpProbeLoop>.Instance,
options: new PeerProbeOptions { HttpProbeTimeout = TimeSpan.FromSeconds(3) });
await loop.TickAsync(CancellationToken.None);
factory.LastCreatedClient.ShouldNotBeNull();
factory.LastCreatedClient.Timeout.ShouldBe(factoryInitialTimeout,
"the probe loop must not mutate the factory-vended HttpClient's Timeout — "
+ "per-call timeout should be enforced via a CancellationToken or via "
+ "AddHttpClient.ConfigureHttpClient on the named registration.");
}
// ---- fixture helpers --------------------------------------------------- // ---- fixture helpers ---------------------------------------------------
private async Task<RedundancyCoordinator> SeedAndInitializeAsync(string selfNodeId, params (string id, RedundancyRole role, string appUri)[] nodes) private async Task<RedundancyCoordinator> SeedAndInitializeAsync(string selfNodeId, params (string id, RedundancyRole role, string appUri)[] nodes)
@@ -158,4 +186,30 @@ public sealed class PeerHttpProbeLoopTests : IDisposable
=> Task.FromResult(respond(request)); => Task.FromResult(respond(request));
} }
} }
/// <summary>
/// Server-012 — captures the most-recently-vended <see cref="HttpClient"/> so the
/// test can assert the probe loop didn't mutate its <see cref="HttpClient.Timeout"/>.
/// </summary>
private sealed class RecordingHttpClientFactory(
Func<HttpRequestMessage, HttpResponseMessage> respond,
TimeSpan initialTimeout) : IHttpClientFactory
{
public HttpClient? LastCreatedClient { get; private set; }
public HttpClient CreateClient(string name)
{
var client = new HttpClient(new RecordingHandler(respond), disposeHandler: true)
{
Timeout = initialTimeout,
};
LastCreatedClient = client;
return client;
}
private sealed class RecordingHandler(Func<HttpRequestMessage, HttpResponseMessage> respond) : HttpMessageHandler
{
protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
=> Task.FromResult(respond(request));
}
}
} }

View File

@@ -0,0 +1,176 @@
using Opc.Ua;
using Serilog;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
using ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms;
using ZB.MOM.WW.OtOpcUa.Core.Scripting;
using ZB.MOM.WW.OtOpcUa.Server.OpcUa;
using ZB.MOM.WW.OtOpcUa.Server.Phase7;
namespace ZB.MOM.WW.OtOpcUa.Server.Tests.Phase7;
/// <summary>
/// Regression for Server-008 — <c>RouteScriptedAlarmMethodCalls</c> must mark a handled
/// <see cref="CallMethodRequest"/> slot as <c>Processed = true</c> so the stack's
/// <c>CustomNodeManager2.Call</c> skips it. The pre-fix code relied on the slot's
/// <c>errors[i]</c> being <c>ServiceResult.Good</c>, but the SDK's actual skip predicate is
/// <see cref="CallMethodRequest.Processed"/>; without setting it, the stack's built-in
/// Part 9 Acknowledge / Confirm handler would also fire, producing a double transition.
/// </summary>
[Trait("Category", "Unit")]
public sealed class ScriptedAlarmMethodRoutingProcessedFlagTests
{
private static ScriptedAlarmEngine BuildActiveEngine(string alarmId)
{
var upstream = new CachedTagUpstreamSource();
var logger = new LoggerConfiguration().CreateLogger();
var factory = new ScriptLoggerFactory(logger);
var engine = new ScriptedAlarmEngine(upstream, new InMemoryAlarmStateStore(), factory, logger);
var defs = new List<ScriptedAlarmDefinition>
{
new(AlarmId: alarmId,
EquipmentPath: "/eq",
AlarmName: alarmId,
Kind: AlarmKind.LimitAlarm,
Severity: AlarmSeverity.Medium,
MessageTemplate: "msg",
PredicateScriptSource: "return true;"),
};
engine.LoadAsync(defs, CancellationToken.None).GetAwaiter().GetResult();
return engine;
}
private static ScriptedAlarmEngine BuildEngine(string alarmId)
{
var upstream = new CachedTagUpstreamSource();
var logger = new LoggerConfiguration().CreateLogger();
var factory = new ScriptLoggerFactory(logger);
var engine = new ScriptedAlarmEngine(upstream, new InMemoryAlarmStateStore(), factory, logger);
var defs = new List<ScriptedAlarmDefinition>
{
new(AlarmId: alarmId,
EquipmentPath: "/eq",
AlarmName: alarmId,
Kind: AlarmKind.LimitAlarm,
Severity: AlarmSeverity.Medium,
MessageTemplate: "msg",
PredicateScriptSource: "return false;"),
};
engine.LoadAsync(defs, CancellationToken.None).GetAwaiter().GetResult();
return engine;
}
private static CallMethodRequest AcknowledgeRequest(string conditionNodeId)
=> new()
{
ObjectId = new NodeId(conditionNodeId, 2),
MethodId = MethodIds.AcknowledgeableConditionType_Acknowledge,
InputArguments =
{
new Variant(new byte[] { 1, 2, 3 }),
new Variant(new LocalizedText("ack-comment")),
},
};
private static CallMethodRequest AddCommentRequest(string conditionNodeId)
=> new()
{
ObjectId = new NodeId(conditionNodeId, 2),
MethodId = MethodIds.ConditionType_AddComment,
InputArguments =
{
new Variant(new byte[] { 1, 2, 3 }),
new Variant(new LocalizedText("comment-text")),
},
};
[Fact]
public void Handled_Acknowledge_marks_Processed_true_so_baseCall_skips_the_slot()
{
using var engine = BuildActiveEngine("al-1");
var calls = new List<CallMethodRequest> { AcknowledgeRequest("al-1.Condition") };
var results = new List<CallMethodResult> { new CallMethodResult() };
var errors = new List<ServiceResult> { null! };
var index = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
["al-1.Condition"] = "al-1",
};
DriverNodeManager.RouteScriptedAlarmMethodCalls(
new NamedIdentity("ops-user"), calls, results, errors, engine, index);
calls[0].Processed.ShouldBeTrue(
"CustomNodeManager2.Call/CallInternalAsync skips slots with Processed=true. "
+ "Without this flag, base.Call would re-dispatch the Acknowledge to the stack's "
+ "built-in Part 9 handler and the engine would observe a double transition.");
}
[Fact]
public void Handled_AddComment_marks_Processed_true()
{
using var engine = BuildEngine("al-1");
var calls = new List<CallMethodRequest> { AddCommentRequest("al-1.Condition") };
var results = new List<CallMethodResult> { new CallMethodResult() };
var errors = new List<ServiceResult> { null! };
var index = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
["al-1.Condition"] = "al-1",
};
DriverNodeManager.RouteScriptedAlarmMethodCalls(
new NamedIdentity("ops-user"), calls, results, errors, engine, index);
calls[0].Processed.ShouldBeTrue("AddComment handled by the engine must not re-dispatch via base.Call");
}
[Fact]
public void Engine_error_path_also_marks_Processed_so_baseCall_does_not_re_run_the_method()
{
using var engine = BuildEngine("al-1");
var calls = new List<CallMethodRequest>
{
// Index maps to an alarm id the engine doesn't know — engine throws
// ArgumentException, helper sets errors[i] = BadInvalidArgument.
AcknowledgeRequest("al-1.Condition"),
};
var results = new List<CallMethodResult> { new CallMethodResult() };
var errors = new List<ServiceResult> { null! };
var index = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
["al-1.Condition"] = "al-NOT-IN-ENGINE",
};
DriverNodeManager.RouteScriptedAlarmMethodCalls(
new NamedIdentity("ops-user"), calls, results, errors, engine, index);
ServiceResult.IsBad(errors[0]).ShouldBeTrue("engine error path");
calls[0].Processed.ShouldBeTrue(
"even when the engine returns Bad, the slot was handled — base.Call must not "
+ "re-dispatch the method against the OPC UA built-in handler.");
}
[Fact]
public void Unhandled_slot_leaves_Processed_false_so_baseCall_drives_it()
{
using var engine = BuildActiveEngine("al-1");
var genericMethod = new CallMethodRequest
{
ObjectId = new NodeId("some-driver-method", 2),
MethodId = new NodeId("driver-method", 2),
};
var calls = new List<CallMethodRequest> { genericMethod };
var results = new List<CallMethodResult> { new CallMethodResult() };
var errors = new List<ServiceResult> { null! };
DriverNodeManager.RouteScriptedAlarmMethodCalls(
new NamedIdentity("ops-user"), calls, results, errors, engine,
conditionIdToAlarmId: new Dictionary<string, string>());
calls[0].Processed.ShouldBeFalse("non-alarm methods must fall through to base.Call");
errors[0].ShouldBeNull("unhandled slot's error must stay null for the base implementation");
}
private sealed class NamedIdentity(string displayName) : UserIdentity(displayName, "") { }
}

View File

@@ -0,0 +1,56 @@
using Opc.Ua;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Server.OpcUa;
namespace ZB.MOM.WW.OtOpcUa.Server.Tests;
/// <summary>
/// Regression for Server-004 — the production
/// <see cref="OtOpcUaServer.RoleBasedIdentity"/> must surface the LDAP-resolved display
/// name through <see cref="IUserIdentity.DisplayName"/>, since
/// <c>DriverNodeManager.ResolveCallUser</c> reads the base interface property when stamping
/// audit identities on scripted-alarm Acknowledge / Confirm / Shelve calls.
/// </summary>
[Trait("Category", "Unit")]
public sealed class RoleBasedIdentityTests
{
[Fact]
public void DisplayName_returns_LDAP_resolved_display_name_when_present()
{
IUserIdentity identity = new OtOpcUaServer.RoleBasedIdentity(
userName: "alice",
displayName: "Alice Smith",
roles: new[] { "WriteOperate" },
ldapGroups: new[] { "ot_operators" });
identity.DisplayName.ShouldBe("Alice Smith",
"DriverNodeManager.ResolveCallUser reads IUserIdentity.DisplayName for audit entries; "
+ "RoleBasedIdentity must surface the LDAP-resolved name, not just the username.");
}
[Fact]
public void DisplayName_falls_back_to_userName_when_LDAP_display_name_is_null()
{
IUserIdentity identity = new OtOpcUaServer.RoleBasedIdentity(
userName: "alice",
displayName: null,
roles: [],
ldapGroups: []);
identity.DisplayName.ShouldBe("alice",
"absent an LDAP display name, audit entries should still carry the username.");
}
[Fact]
public void ResolveCallUser_yields_LDAP_resolved_display_name()
{
IUserIdentity identity = new OtOpcUaServer.RoleBasedIdentity(
userName: "alice",
displayName: "Alice Smith",
roles: [],
ldapGroups: []);
DriverNodeManager.ResolveCallUser(identity).ShouldBe("Alice Smith");
}
}

View File

@@ -0,0 +1,52 @@
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging.Abstractions;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Configuration.LocalCache;
namespace ZB.MOM.WW.OtOpcUa.Server.Tests;
/// <summary>
/// Regression for Server-014 — <see cref="SealedBootstrap"/> exists in the source tree and
/// is referenced by <c>docs/v2/v2-release-readiness.md</c> as the closed release blocker for
/// generation-sealed config plumbing, but it was never registered in the production DI
/// container. The release blocker remained de-facto open. This test asserts the DI
/// registrations (which <c>Program.cs</c> performs at startup) actually compose: every
/// dependency <see cref="SealedBootstrap"/> needs — <see cref="GenerationSealedCache"/>,
/// <see cref="ResilientConfigReader"/>, <see cref="StaleConfigFlag"/> — must be resolvable
/// so the production wire-up doesn't fail with a missing-service exception at startup.
/// </summary>
[Trait("Category", "Unit")]
public sealed class SealedBootstrapWiringTests
{
[Fact]
public void SealedBootstrap_and_its_dependencies_are_registered_in_DI()
{
var tempRoot = Path.Combine(Path.GetTempPath(), $"otopcua-sealed-bootstrap-wiring-{Guid.NewGuid():N}");
try
{
// Mirror Program.cs's registrations of NodeOptions + the SealedBootstrap chain.
var services = new ServiceCollection();
ZB.MOM.WW.OtOpcUa.Server.ServerWiring.AddSealedBootstrap(services, new NodeOptions
{
NodeId = "test-node",
ClusterId = "test-cluster",
ConfigDbConnectionString = "Server=fake;Database=fake;Integrated Security=true;",
LocalCachePath = tempRoot,
});
services.AddSingleton(NullLoggerFactory.Instance);
services.AddLogging();
using var sp = services.BuildServiceProvider();
sp.GetRequiredService<GenerationSealedCache>().ShouldNotBeNull();
sp.GetRequiredService<ResilientConfigReader>().ShouldNotBeNull();
sp.GetRequiredService<StaleConfigFlag>().ShouldNotBeNull();
sp.GetRequiredService<SealedBootstrap>().ShouldNotBeNull();
}
finally
{
try { if (Directory.Exists(tempRoot)) Directory.Delete(tempRoot, recursive: true); } catch { }
}
}
}