diff --git a/code-reviews/Driver.OpcUaClient/findings.md b/code-reviews/Driver.OpcUaClient/findings.md index eb71e61..d9135c9 100644 --- a/code-reviews/Driver.OpcUaClient/findings.md +++ b/code-reviews/Driver.OpcUaClient/findings.md @@ -7,7 +7,7 @@ | Review date | 2026-05-22 | | Commit reviewed | `76d35d1` | | Status | Reviewed | -| Open findings | 15 | +| Open findings | 10 | ## Checklist coverage @@ -33,13 +33,13 @@ | Severity | High | | Category | Correctness & logic bugs | | Location | `OpcUaClientDriver.cs:444`, `:466`, `:517`, `:540`, `:599`, `:610` | -| Status | Open | +| Status | Resolved | **Description:** ReadAsync, WriteAsync, and DiscoverAsync capture the session into a local variable via RequireSession() before acquiring `_gate`, then perform the wire call on that captured reference inside the gate. The reconnect path (OnReconnectComplete, line 1330) swaps `Session` to a brand-new ISession. A read that captured the pre-reconnect session at line 444, then blocked on `_gate.WaitAsync` while a reconnect completed, issues ReadAsync against a stale/closed session. The catch block then fans out BadCommunicationError for the whole batch even though the driver is healthy on the new session, and the operation is silently lost. The gate does not protect against the session being swapped underneath a waiter. **Recommendation:** Re-read `Session` inside the `_gate` critical section (after WaitAsync returns), or route the session swap itself through `_gate` so a swap cannot interleave with a gated operation. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — ReadAsync/WriteAsync/DiscoverAsync now re-read `Session` (and parse NodeIds) inside the `_gate` critical section after `WaitAsync` returns; a session swapped in by a concurrent reconnect is the one used for the wire call. ### Driver.OpcUaClient-002 @@ -48,13 +48,13 @@ | Severity | High | | Category | Error handling & resilience | | Location | `OpcUaClientDriver.cs:1330-1359` | -| Status | Open | +| Status | Resolved | **Description:** OnReconnectComplete handles only the success case. When SessionReconnectHandler gives up (its retry loop exhausts the 2-minute maxReconnectPeriod), it invokes the callback with `handler.Session == null`. The code sets `Session = null`, disposes the handler, and sets `_reconnectHandler = null`, but leaves `_health` at whatever it was (typically Degraded) and `_hostState` at Stopped. There is no further reconnect attempt (the handler is gone, and OnKeepAlive only fires on a live session which no longer exists), and DriverState is never set to Faulted. The driver is permanently wedged: no session, no reconnect loop, no Faulted signal for the Core, and ReinitializeAsync is never triggered. This is the single largest gateway resilience gap. **Recommendation:** In OnReconnectComplete, when newSession is null, set `_health` to a Faulted DriverHealth with an explanatory message so the Core can fan out Bad quality and offer an operator reinitialize. Consider re-arming a fresh reconnect attempt rather than giving up entirely for an always-on gateway. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — OnReconnectComplete's give-up branch now transitions HostState to Faulted, sets a Faulted DriverHealth with an explanatory message, and re-arms a fresh SessionReconnectHandler (`TryRearmReconnect`) against the last-known session so an always-on gateway self-heals while the Core can still offer an operator reinitialize. ### Driver.OpcUaClient-003 @@ -63,13 +63,13 @@ | Severity | High | | Category | Correctness & logic bugs | | Location | `OpcUaClientDriver.cs:644-711` | -| Status | Open | +| Status | Resolved | **Description:** BrowseRecursiveAsync calls session.BrowseAsync with `requestedMaxReferencesPerNode: 0` but never follows browse continuation points. OPC UA servers enforce a server-side max-references-per-node limit; when a node has more children than the server returns in one response, BrowseResult.ContinuationPoint is non-empty and the caller must issue BrowseNext to retrieve the remainder. This driver discards the continuation point, so any folder on the remote server with a large child set is silently truncated: discovered tags go missing from the local address space with no error. For the tens-of-thousands-of-nodes scenario the options doc targets (MaxDiscoveredNodes = 10000), this is a realistic and silent data-completeness bug. **Recommendation:** After processing resp.Results[0].References, check resp.Results[0].ContinuationPoint; while non-empty, call session.BrowseNextAsync and append the additional references before recursing/registering. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — BrowseRecursiveAsync now loops on the BrowseResult.ContinuationPoint, calling `session.BrowseNextAsync` and appending each page of references until the continuation point is empty, so large remote folders are no longer silently truncated. ### Driver.OpcUaClient-004 @@ -78,13 +78,13 @@ | Severity | High | | Category | Design-document adherence | | Location | `OpcUaClientDriver.cs:596-632`, `:789`, `OpcUaClientDriverOptions.cs` | -| Status | Open | +| Status | Resolved | **Description:** docs/v2/driver-specs.md section 8 mandates two features that are absent. (1) Namespace remapping: the spec requires building a bidirectional namespace map at connect time from session.NamespaceUris. The driver instead stores the raw upstream NodeId string (pv.NodeId.ToString()) as DriverAttributeInfo.FullName and re-parses it verbatim for reads/writes. The namespace index embedded in `ns=N;...` is server-session-relative; if the upstream server reorders its namespace table across a restart (permitted by the spec), every stored ns=N reference points at the wrong namespace and reads/writes silently address wrong nodes. (2) TargetNamespaceKind enforcement: section 8 requires the driver to enforce Equipment-vs-SystemPlatform choice at startup and fail draft validation on misconfiguration; OpcUaClientDriverOptions has no such knob. **Recommendation:** Build a namespace-URI map from session.NamespaceUris at connect time and store NodeIds in a server-stable form (namespace URI plus identifier) rather than session-relative ns=N. Add the TargetNamespaceKind option and the startup validation section 8 describes, or document explicitly why the design deviates. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — new `NamespaceMap` (built from session.NamespaceUris at connect and rebuilt on reconnect) persists discovered NodeIds in the server-stable `nsu=;…` form; reads/writes re-resolve that form against the current session so a remote namespace-table reorder no longer misaddresses nodes. Added the `TargetNamespaceKind` option + `UnsMappingTable` and `ValidateNamespaceKind`, which fails draft validation for an Equipment instance lacking a UNS mapping or a SystemPlatform instance carrying one. ### Driver.OpcUaClient-005 @@ -93,13 +93,13 @@ | Severity | High | | Category | Concurrency & thread safety | | Location | `OpcUaClientDriver.cs:1297-1319` | -| Status | Open | +| Status | Resolved | **Description:** OnKeepAlive reads and writes `_reconnectHandler` without any lock: `if (_reconnectHandler is not null) return;` followed by `_reconnectHandler = new SessionReconnectHandler(...)`. Keep-alive callbacks are raised from the SDK keep-alive timer thread; on a bad keep-alive the SDK can fire the handler repeatedly while the channel stays down. Two callbacks racing through the check-then-set both observe null, both construct a SessionReconnectHandler, both call BeginReconnect, and the second assignment overwrites the first handler, leaking the first handler (its retry loop keeps running, unreferenced and never disposed) and creating two competing reconnect loops. ShutdownAsync then only cancels/disposes the one that won the assignment race. **Recommendation:** Guard the `_reconnectHandler` check-and-set with `_probeLock` (already held for `_hostState`), or use Interlocked.CompareExchange to ensure exactly one handler is constructed per drop. -**Resolution:** _(open)_ +**Resolution:** Resolved 2026-05-22 — the `_reconnectHandler` check-and-set in OnKeepAlive (and the take-and-clear in ShutdownAsync, plus the dispose/re-arm in OnReconnectComplete/TryRearmReconnect) now run inside the `_probeLock` critical section, so exactly one SessionReconnectHandler is constructed per drop and a racing keep-alive callback cannot leak a handler. ### Driver.OpcUaClient-006 diff --git a/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/NamespaceMap.cs b/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/NamespaceMap.cs new file mode 100644 index 0000000..56c3ec5 --- /dev/null +++ b/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/NamespaceMap.cs @@ -0,0 +1,138 @@ +using Opc.Ua; +using Opc.Ua.Client; + +namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient; + +/// +/// Bidirectional namespace map for the OPC UA Client (gateway) driver, built at connect +/// time from session.NamespaceUris per docs/v2/driver-specs.md §8 +/// "Namespace Remapping". +/// +/// +/// +/// The session-relative namespace index embedded in an ns=N;… NodeId string is +/// not stable: the OPC UA spec permits a server to reorder its namespace table +/// across a restart. A driver that stores raw ns=N references and re-parses +/// them verbatim for reads/writes will, after such a reorder, silently address the +/// wrong namespace. +/// +/// +/// This map captures the upstream namespace table as it was at connect time and lets +/// the driver persist NodeIds in a server-stable form — the namespace +/// URI plus the identifier — via . At +/// read/write time re-binds that stable form against the +/// current session's namespace table, so a table reorder is transparently +/// corrected instead of silently misaddressing nodes. +/// +/// +/// Stable references use the form nsu=<uri>;<idType>=<identifier>, +/// which is the standard OPC UA namespace-URI NodeId encoding the SDK already +/// understands. References that are already in plain ns=N;… form (e.g. a +/// hand-entered config tag) still resolve — falls back to a +/// direct parse against the current session. +/// +/// +internal sealed class NamespaceMap +{ + // index -> URI and URI -> index, as the upstream server published them at connect time. + private readonly string[] _uris; + private readonly Dictionary _uriToIndex; + + private NamespaceMap(string[] uris) + { + _uris = uris; + _uriToIndex = new Dictionary(uris.Length, StringComparer.Ordinal); + for (var i = 0; i < uris.Length; i++) + _uriToIndex[uris[i]] = (ushort)i; + } + + /// Snapshot the namespace table from a live session. + public static NamespaceMap FromSession(ISession session) + { + ArgumentNullException.ThrowIfNull(session); + return FromTable(session.NamespaceUris); + } + + /// + /// Snapshot a directly. Separated from + /// so the URI encoding can be exercised without standing up + /// a live . + /// + public static NamespaceMap FromTable(NamespaceTable namespaceUris) + { + ArgumentNullException.ThrowIfNull(namespaceUris); + return new NamespaceMap(namespaceUris.ToArray()); + } + + /// Number of namespaces captured. Index 0 is always the OPC UA core namespace. + public int Count => _uris.Length; + + /// The namespace URI at the given index, or null if out of range. + public string? UriForIndex(int index) => + index >= 0 && index < _uris.Length ? _uris[index] : null; + + /// The index for a namespace URI, or null if the URI is not in the table. + public ushort? IndexForUri(string uri) => + _uriToIndex.TryGetValue(uri, out var idx) ? idx : null; + + /// + /// Render a NodeId resolved against this map's session as a server-stable + /// reference string — namespace URI plus identifier, in the SDK's nsu=… + /// encoding. The OPC UA core namespace (index 0) keeps the compact i=…/ns=0 + /// form since URI 0 never moves. Used to persist a discovered NodeId into the local + /// address space so it survives a remote namespace-table reorder. + /// + public string ToStableReference(NodeId nodeId) + { + ArgumentNullException.ThrowIfNull(nodeId); + // Namespace 0 is fixed by the spec — the compact form is already stable. + if (nodeId.NamespaceIndex == 0) + return nodeId.ToString() ?? string.Empty; + + var uri = UriForIndex(nodeId.NamespaceIndex); + if (uri is null) + // Namespace index not in the captured table — fall back to the raw form rather + // than throwing; the read/write path will surface BadNodeIdInvalid if it truly + // can't resolve. + return nodeId.ToString() ?? string.Empty; + + // nsu=;= is the standard namespace-URI NodeId encoding. + return $"nsu={uri};{IdentifierPart(nodeId)}"; + } + + /// + /// Resolve a reference string — either a stable nsu=… reference produced by + /// or a plain ns=N;… NodeId — against the + /// 's namespace table. A nsu=… reference is + /// re-bound through the current session's URI table so a remote reorder since connect + /// time is corrected. Returns false for empty/malformed input or an unknown URI. + /// + public static bool TryResolve(ISession currentSession, string reference, out NodeId nodeId) + { + nodeId = NodeId.Null; + if (currentSession is null || string.IsNullOrWhiteSpace(reference)) return false; + try + { + // NodeId.Parse with a NamespaceUriString form (nsu=…) re-maps the URI against + // the supplied context's NamespaceUris — i.e. the *current* session table — so a + // reorder since the reference was captured resolves correctly. Plain ns=N forms + // resolve directly. + nodeId = NodeId.Parse(currentSession.MessageContext, reference); + return !NodeId.IsNull(nodeId); + } + catch + { + return false; + } + } + + /// Render just the identifier portion (s=…, i=…, g=…, b=…) of a NodeId. + private static string IdentifierPart(NodeId nodeId) => nodeId.IdType switch + { + IdType.Numeric => $"i={nodeId.Identifier}", + IdType.String => $"s={nodeId.Identifier}", + IdType.Guid => $"g={nodeId.Identifier}", + IdType.Opaque => $"b={Convert.ToBase64String((byte[])nodeId.Identifier)}", + _ => $"s={nodeId.Identifier}", + }; +} diff --git a/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriver.cs b/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriver.cs index 7fc58ac..e2d8df8 100644 --- a/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriver.cs +++ b/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriver.cs @@ -71,10 +71,22 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d /// /// SDK-provided reconnect handler that owns the retry loop + session-transfer machinery /// when the session's keep-alive channel reports a bad status. Null outside the - /// reconnecting window; constructed lazily inside the keep-alive handler. + /// reconnecting window; constructed lazily inside the keep-alive handler. Guarded by + /// — keep-alive callbacks fire from the SDK timer thread and + /// can race a check-then-set if left unsynchronized (Driver.OpcUaClient-005). /// private SessionReconnectHandler? _reconnectHandler; + /// + /// Bidirectional namespace map built at connect time from session.NamespaceUris. + /// Stored NodeIds embed the server-stable namespace URI rather than the + /// session-relative ns=N index, so a remote-server namespace-table reorder + /// across a restart does not silently re-point stored references at the wrong + /// namespace (driver-specs.md §8 "Namespace Remapping", finding Driver.OpcUaClient-004). + /// Null until returns cleanly. + /// + private NamespaceMap? _namespaceMap; + public string DriverInstanceId => driverInstanceId; public string DriverType => "OpcUaClient"; @@ -83,6 +95,11 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d _health = new DriverHealth(DriverState.Initializing, null, null); try { + // Enforce the Equipment-vs-SystemPlatform choice at startup per driver-specs.md + // §8 "Namespace Assignment" — a misconfigured remote fails draft validation here, + // not as a runtime surprise. + ValidateNamespaceKind(_options); + var appConfig = await BuildApplicationConfigurationAsync(cancellationToken).ConfigureAwait(false); var candidates = ResolveEndpointCandidates(_options); @@ -126,6 +143,12 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d _keepAliveHandler = OnKeepAlive; session.KeepAlive += _keepAliveHandler; + // Build the bidirectional namespace map from the freshly negotiated session's + // NamespaceUris (driver-specs.md §8 "Namespace Remapping"). Stored NodeIds carry + // the namespace URI, not the session-relative ns=N index, so a remote namespace + // reorder across a restart can't silently misaddress nodes. + _namespaceMap = NamespaceMap.FromSession(session); + Session = session; _connectedEndpointUrl = connectedUrl; _health = new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null); @@ -235,6 +258,41 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d return [opts.EndpointUrl]; } + /// + /// Enforce the §8 "Namespace Assignment" rule at startup. An Equipment-kind + /// instance gateways raw equipment data and therefore needs a config-driven UNS + /// mapping table (remote nodes don't conform to UNS); a SystemPlatform-kind + /// instance gateways processed data whose hierarchy is preserved verbatim, so a + /// UNS mapping table is meaningless and rejected. Throwing here surfaces the + /// misconfiguration as a draft-validation failure rather than a runtime surprise. + /// + internal static void ValidateNamespaceKind(OpcUaClientDriverOptions opts) + { + switch (opts.TargetNamespaceKind) + { + case OpcUaTargetNamespaceKind.Equipment: + if (opts.UnsMappingTable is null || opts.UnsMappingTable.Count == 0) + throw new InvalidOperationException( + "OpcUaClient driver configured with TargetNamespaceKind=Equipment but no " + + "UnsMappingTable: §8 requires a config-driven remote-to-UNS mapping table " + + "because remote nodes do not conform to UNS by default. Provide a mapping " + + "table or set TargetNamespaceKind=SystemPlatform if the remote exposes " + + "processed data."); + break; + case OpcUaTargetNamespaceKind.SystemPlatform: + if (opts.UnsMappingTable is { Count: > 0 }) + throw new InvalidOperationException( + "OpcUaClient driver configured with TargetNamespaceKind=SystemPlatform but " + + "a UnsMappingTable was supplied: processed data preserves its own hierarchy " + + "and a UNS mapping table is ambiguous here. Clear the mapping table or set " + + "TargetNamespaceKind=Equipment if the remote exposes raw equipment data."); + break; + default: + throw new ArgumentOutOfRangeException( + nameof(opts), opts.TargetNamespaceKind, "Unknown TargetNamespaceKind."); + } + } + /// /// Build the user-identity token from the driver options. Split out of /// so the failover sweep reuses one identity across @@ -411,10 +469,17 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d // Abort any in-flight reconnect attempts before touching the session — BeginReconnect's // retry loop holds a reference to the current session and would fight Session.CloseAsync - // if left spinning. - try { _reconnectHandler?.CancelReconnect(); } catch { } - _reconnectHandler?.Dispose(); - _reconnectHandler = null; + // if left spinning. Take the handler under _probeLock so a keep-alive callback racing + // through OnKeepAlive can't arm a fresh handler after we've torn this one down + // (Driver.OpcUaClient-005). + SessionReconnectHandler? handlerToCancel; + lock (_probeLock) + { + handlerToCancel = _reconnectHandler; + _reconnectHandler = null; + } + try { handlerToCancel?.CancelReconnect(); } catch { } + handlerToCancel?.Dispose(); if (_keepAliveHandler is not null && Session is not null) { @@ -426,6 +491,7 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d catch { /* best-effort */ } try { Session?.Dispose(); } catch { } Session = null; + _namespaceMap = null; _connectedEndpointUrl = null; TransitionTo(HostState.Unknown); @@ -441,31 +507,46 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d public async Task> ReadAsync( IReadOnlyList fullReferences, CancellationToken cancellationToken) { - var session = RequireSession(); + // Make sure a session exists before queuing on the gate, but do NOT bind the wire + // call to this reference — a reconnect can swap Session while we wait on _gate. The + // session actually used is re-read inside the gate (Driver.OpcUaClient-001/-006). + _ = RequireSession(); var results = new DataValueSnapshot[fullReferences.Count]; var now = DateTime.UtcNow; - // Parse NodeIds up-front. Tags whose reference doesn't parse get BadNodeIdInvalid - // and are omitted from the wire request — saves a round-trip against the upstream - // server for a fault the driver can detect locally. - var toSend = new ReadValueIdCollection(); - var indexMap = new List(fullReferences.Count); // maps wire-index -> results-index - for (var i = 0; i < fullReferences.Count; i++) - { - if (!TryParseNodeId(session, fullReferences[i], out var nodeId)) - { - results[i] = new DataValueSnapshot(null, StatusBadNodeIdInvalid, null, now); - continue; - } - toSend.Add(new ReadValueId { NodeId = nodeId, AttributeId = Attributes.Value }); - indexMap.Add(i); - } - - if (toSend.Count == 0) return results; - await _gate.WaitAsync(cancellationToken).ConfigureAwait(false); try { + // Re-read Session inside the critical section: if a reconnect completed while we + // were blocked on _gate, OnReconnectComplete has already swapped in the new + // session. NodeId parsing is namespace-relative, so it must also use the current + // session's namespace table. + var session = Session; + if (session is null) + { + for (var i = 0; i < fullReferences.Count; i++) + results[i] = new DataValueSnapshot(null, StatusBadCommunicationError, null, now); + return results; + } + + // Parse NodeIds against the live session. Tags whose reference doesn't parse get + // BadNodeIdInvalid and are omitted from the wire request — saves a round-trip for + // a fault the driver can detect locally. + var toSend = new ReadValueIdCollection(); + var indexMap = new List(fullReferences.Count); // maps wire-index -> results-index + for (var i = 0; i < fullReferences.Count; i++) + { + if (!TryParseNodeId(session, fullReferences[i], out var nodeId)) + { + results[i] = new DataValueSnapshot(null, StatusBadNodeIdInvalid, null, now); + continue; + } + toSend.Add(new ReadValueId { NodeId = nodeId, AttributeId = Attributes.Value }); + indexMap.Add(i); + } + + if (toSend.Count == 0) return results; + try { var resp = await session.ReadAsync( @@ -514,32 +595,45 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d public async Task> WriteAsync( IReadOnlyList writes, CancellationToken cancellationToken) { - var session = RequireSession(); + // See ReadAsync — the wire call must use the session current inside the gate, not a + // reference captured before WaitAsync (Driver.OpcUaClient-001/-006). + _ = RequireSession(); var results = new WriteResult[writes.Count]; - var toSend = new WriteValueCollection(); - var indexMap = new List(writes.Count); - for (var i = 0; i < writes.Count; i++) - { - if (!TryParseNodeId(session, writes[i].FullReference, out var nodeId)) - { - results[i] = new WriteResult(StatusBadNodeIdInvalid); - continue; - } - toSend.Add(new WriteValue - { - NodeId = nodeId, - AttributeId = Attributes.Value, - Value = new DataValue(new Variant(writes[i].Value)), - }); - indexMap.Add(i); - } - - if (toSend.Count == 0) return results; - await _gate.WaitAsync(cancellationToken).ConfigureAwait(false); try { + var session = Session; + if (session is null) + { + // Writes are non-idempotent (decision #44/#45) — but here the request never + // reached the wire, so BadCommunicationError ("definitely did not happen") is + // the honest code. + for (var i = 0; i < writes.Count; i++) + results[i] = new WriteResult(StatusBadCommunicationError); + return results; + } + + var toSend = new WriteValueCollection(); + var indexMap = new List(writes.Count); + for (var i = 0; i < writes.Count; i++) + { + if (!TryParseNodeId(session, writes[i].FullReference, out var nodeId)) + { + results[i] = new WriteResult(StatusBadNodeIdInvalid); + continue; + } + toSend.Add(new WriteValue + { + NodeId = nodeId, + AttributeId = Attributes.Value, + Value = new DataValue(new Variant(writes[i].Value)), + }); + indexMap.Add(i); + } + + if (toSend.Count == 0) return results; + try { var resp = await session.WriteAsync( @@ -568,25 +662,26 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d } /// - /// Parse a tag's full-reference string as a NodeId. Accepts the standard OPC UA - /// serialized forms (ns=2;s=…, i=2253, ns=4;g=…, ns=3;b=…). - /// Empty + malformed strings return false; the driver surfaces that as - /// without a wire round-trip. + /// Parse a tag's full-reference string as a NodeId, resolved against the + /// 's current namespace table. Accepts both the + /// server-stable nsu=<uri>;… form the driver persists (see + /// ) and plain OPC UA serialized forms + /// (ns=2;s=…, i=2253, ns=4;g=…, ns=3;b=…). Resolving the + /// nsu=… form against the current session re-binds it through that session's + /// URI table, so a remote namespace-table reorder across a restart is transparently + /// corrected (driver-specs.md §8). Empty + malformed strings return false; the driver + /// surfaces that as without a wire round-trip. /// - internal static bool TryParseNodeId(ISession session, string fullReference, out NodeId nodeId) - { - nodeId = NodeId.Null; - if (string.IsNullOrWhiteSpace(fullReference)) return false; - try - { - nodeId = NodeId.Parse(session.MessageContext, fullReference); - return !NodeId.IsNull(nodeId); - } - catch - { - return false; - } - } + internal static bool TryParseNodeId(ISession session, string fullReference, out NodeId nodeId) => + NamespaceMap.TryResolve(session, fullReference, out nodeId); + + /// + /// Render a discovered NodeId in the server-stable form persisted into the local + /// address space. Falls back to the raw serialized NodeId if the namespace map is not + /// yet built (it always is by the time runs). + /// + private string StableReference(NodeId nodeId) => + _namespaceMap?.ToStableReference(nodeId) ?? nodeId.ToString() ?? string.Empty; private ISession RequireSession() => Session ?? throw new InvalidOperationException("OpcUaClientDriver not initialized"); @@ -596,11 +691,10 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d public async Task DiscoverAsync(IAddressSpaceBuilder builder, CancellationToken cancellationToken) { ArgumentNullException.ThrowIfNull(builder); - var session = RequireSession(); - - var root = !string.IsNullOrEmpty(_options.BrowseRoot) - ? NodeId.Parse(session.MessageContext, _options.BrowseRoot) - : ObjectIds.ObjectsFolder; + // Confirm a session exists before queuing; the session actually browsed is re-read + // inside the gate so a reconnect mid-wait can't leave us browsing a closed session + // (Driver.OpcUaClient-001/-006). + _ = RequireSession(); var rootFolder = builder.Folder("Remote", "Remote"); var visited = new HashSet(); @@ -610,6 +704,14 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d await _gate.WaitAsync(cancellationToken).ConfigureAwait(false); try { + var session = Session + ?? throw new InvalidOperationException( + "OpcUaClient session was lost before discovery could browse the remote server."); + + var root = !string.IsNullOrEmpty(_options.BrowseRoot) + ? NodeId.Parse(session.MessageContext, _options.BrowseRoot) + : ObjectIds.ObjectsFolder; + // Pass 1: browse hierarchy + create folders inline, collect variables into a // pending list. Defers variable registration until attributes are resolved — the // address-space builder's Variable call is the one-way commit, so doing it only @@ -664,15 +766,41 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d } }; - BrowseResponse resp; + ReferenceDescriptionCollection refs; try { - resp = await session.BrowseAsync( + var resp = await session.BrowseAsync( requestHeader: null, view: null, requestedMaxReferencesPerNode: 0, nodesToBrowse: browseDescriptions, ct: ct).ConfigureAwait(false); + + if (resp.Results.Count == 0) return; + var result = resp.Results[0]; + refs = result.References; + + // Follow browse continuation points. OPC UA servers cap the references returned + // per node in a single response; when a folder has more children than the cap, + // BrowseResult.ContinuationPoint is non-empty and the remainder must be pulled + // with BrowseNext. Without this loop a large remote folder is silently truncated + // and discovered tags go missing from the local address space + // (Driver.OpcUaClient-003). + var continuationPoint = result.ContinuationPoint; + while (continuationPoint is { Length: > 0 }) + { + var next = await session.BrowseNextAsync( + requestHeader: null, + releaseContinuationPoints: false, + continuationPoints: [continuationPoint], + ct: ct).ConfigureAwait(false); + + if (next.Results.Count == 0) break; + var nextResult = next.Results[0]; + if (nextResult.References is { Count: > 0 }) + refs.AddRange(nextResult.References); + continuationPoint = nextResult.ContinuationPoint; + } } catch { @@ -682,9 +810,6 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d return; } - if (resp.Results.Count == 0) return; - var refs = resp.Results[0].References; - foreach (var rf in refs) { if (discovered() >= _options.MaxDiscoveredNodes) break; @@ -787,7 +912,7 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d var historizing = StatusCode.IsGood(histDv.StatusCode) && histDv.Value is bool b && b; pv.ParentFolder.Variable(pv.BrowseName, pv.DisplayName, new DriverAttributeInfo( - FullName: pv.NodeId.ToString() ?? string.Empty, + FullName: StableReference(pv.NodeId), DriverDataType: dataType, IsArray: isArray, ArrayDim: null, @@ -799,7 +924,7 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d void RegisterFallback(PendingVariable pv) { pv.ParentFolder.Variable(pv.BrowseName, pv.DisplayName, new DriverAttributeInfo( - FullName: pv.NodeId.ToString() ?? string.Empty, + FullName: StableReference(pv.NodeId), DriverDataType: DriverDataType.Int32, IsArray: false, ArrayDim: null, @@ -1304,15 +1429,25 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d TransitionTo(HostState.Stopped); - // Kick off the SDK's reconnect loop exactly once per drop. The handler handles its - // own retry cadence via ReconnectPeriod; we tear it down in OnReconnectComplete. - if (_reconnectHandler is not null) return; + // Kick off the SDK's reconnect loop exactly once per drop. Keep-alive callbacks fire + // from the SDK keep-alive timer thread and the SDK can fire this handler repeatedly + // while the channel stays down — the check-then-set must be atomic, otherwise two + // callbacks both observe null, both construct a SessionReconnectHandler, and the + // second assignment leaks the first (its retry loop keeps running, unreferenced and + // never disposed). Guard with _probeLock (Driver.OpcUaClient-005). + SessionReconnectHandler handler; + lock (_probeLock) + { + if (_reconnectHandler is not null || _disposed) return; + handler = new SessionReconnectHandler(telemetry: null!, + reconnectAbort: false, + maxReconnectPeriod: (int)TimeSpan.FromMinutes(2).TotalMilliseconds); + _reconnectHandler = handler; + } - _reconnectHandler = new SessionReconnectHandler(telemetry: null!, - reconnectAbort: false, - maxReconnectPeriod: (int)TimeSpan.FromMinutes(2).TotalMilliseconds); - - var state = _reconnectHandler.BeginReconnect( + // BeginReconnect is started outside the lock — it does no _reconnectHandler mutation + // and we don't want to hold _probeLock across an SDK call. + handler.BeginReconnect( sender, (int)_options.ReconnectPeriod.TotalMilliseconds, OnReconnectComplete); @@ -1345,16 +1480,88 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d } Session = newSession; - _reconnectHandler?.Dispose(); - _reconnectHandler = null; - // Whether the reconnect actually succeeded depends on whether the session is - // non-null + connected. When it succeeded, flip back to Running so downstream - // consumers see recovery. + // Retire the handler that just finished. Done under _probeLock so this can't race + // OnKeepAlive arming a fresh handler for a subsequent drop (Driver.OpcUaClient-005). + lock (_probeLock) + { + if (ReferenceEquals(_reconnectHandler, handler)) + { + _reconnectHandler.Dispose(); + _reconnectHandler = null; + } + } + if (newSession is not null) { + // Reconnect succeeded. Rebuild the namespace map from the *new* session: the + // remote server may have reordered its namespace table across the restart that + // caused the drop (driver-specs.md §8). Stable nsu= references stored in the + // address space re-resolve correctly against this fresh map. + _namespaceMap = NamespaceMap.FromSession(newSession); TransitionTo(HostState.Running); _health = new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null); + return; + } + + // The reconnect handler gave up — its retry loop exhausted the 2-minute + // maxReconnectPeriod and invoked the callback with a null Session. Without an + // explicit Faulted signal the driver is permanently wedged: no session, no live + // keep-alive to re-trigger OnKeepAlive, and the Core never learns it must offer an + // operator reinitialize (Driver.OpcUaClient-002). Surface Faulted so the Core fans + // out Bad quality and ReinitializeAsync becomes available, and arm a fresh reconnect + // attempt against the last-known session for an always-on gateway rather than + // abandoning recovery entirely. + TransitionTo(HostState.Faulted); + _health = new DriverHealth( + DriverState.Faulted, _health.LastSuccessfulRead, + "OPC UA session reconnect exhausted its retry window without recovering. " + + "The remote server is unreachable; reinitialize the driver once it is back."); + + if (oldSession is not null && !_disposed) + TryRearmReconnect(handler, oldSession); + } + + /// + /// Arm a fresh reconnect attempt after a previous handler gave up. The OPC UA Client + /// driver gateways an always-on remote server, so abandoning recovery permanently is + /// the wrong default — a new keeps retrying so + /// the driver self-heals when the remote returns, while the Faulted health set by the + /// caller still lets an operator force a clean reinitialize in the meantime. + /// + private void TryRearmReconnect(SessionReconnectHandler exhausted, ISession lastSession) + { + SessionReconnectHandler handler; + lock (_probeLock) + { + // Only re-arm if no other handler took over and we aren't shutting down. + if (_disposed || (_reconnectHandler is not null && !ReferenceEquals(_reconnectHandler, exhausted))) + return; + handler = new SessionReconnectHandler(telemetry: null!, + reconnectAbort: false, + maxReconnectPeriod: (int)TimeSpan.FromMinutes(2).TotalMilliseconds); + _reconnectHandler = handler; + } + + try + { + handler.BeginReconnect( + lastSession, + (int)_options.ReconnectPeriod.TotalMilliseconds, + OnReconnectComplete); + } + catch + { + // If the SDK refuses to re-arm (e.g. the last session is fully torn down), drop + // the handler so a later operator ReinitializeAsync isn't blocked by a stale one. + lock (_probeLock) + { + if (ReferenceEquals(_reconnectHandler, handler)) + { + _reconnectHandler.Dispose(); + _reconnectHandler = null; + } + } } } diff --git a/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs b/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs index 877bd4e..40292a3 100644 --- a/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs +++ b/src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs @@ -134,6 +134,55 @@ public sealed class OpcUaClientDriverOptions /// browse forever. /// public int MaxBrowseDepth { get; init; } = 10; + + /// + /// The namespace kind this driver instance gateways into. Per + /// docs/v2/driver-specs.md §8 "Namespace Assignment", the OPC UA Client driver + /// is the only driver that supports either kind, decided per instance: + /// + /// — gatewaying a remote + /// server that exposes raw equipment data; the driver remaps remote browse paths + /// to UNS. + /// — gatewaying a + /// remote server that exposes processed/derived data; hierarchy is preserved via + /// the remote browse path with no UNS conversion. + /// + /// The driver enforces the choice at startup — see + /// — so a misconfiguration fails + /// draft validation rather than surfacing as a runtime surprise. + /// + public OpcUaTargetNamespaceKind TargetNamespaceKind { get; init; } = OpcUaTargetNamespaceKind.Equipment; + + /// + /// Optional config-driven remote-to-UNS mapping table. Required when + /// is : + /// §8 mandates that a remote server gatewayed as Equipment carry a mapping table + /// because remote nodes do not conform to UNS by default. Keys are remote browse-path + /// prefixes; values are the UNS Area/Line/Name path the matching subtree maps onto. + /// For the table must be empty — + /// processed data preserves its own hierarchy and a mapping table would be ambiguous. + /// + public IReadOnlyDictionary UnsMappingTable { get; init; } + = new Dictionary(); +} + +/// +/// The namespace kind an OPC UA Client driver instance gateways its remote server into. +/// See docs/v2/driver-specs.md §8 "Namespace Assignment". +/// +public enum OpcUaTargetNamespaceKind +{ + /// + /// Remote server exposes raw equipment data; remote browse paths are remapped to UNS + /// via . + /// + Equipment, + + /// + /// Remote server exposes processed/derived data; hierarchy is preserved via the + /// remote browse path with no UNS conversion. + /// + SystemPlatform, } /// OPC UA message security mode. diff --git a/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcPlcProfile.cs b/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcPlcProfile.cs index d57b07d..ef86ec0 100644 --- a/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcPlcProfile.cs +++ b/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcPlcProfile.cs @@ -34,5 +34,8 @@ public static class OpcPlcProfile AutoAcceptCertificates = true, Timeout = TimeSpan.FromSeconds(10), SessionTimeout = TimeSpan.FromSeconds(30), + // opc-plc's standard address space mirrors verbatim — treat it as SystemPlatform so + // §8 namespace validation passes without requiring a UNS mapping table. + TargetNamespaceKind = OpcUaTargetNamespaceKind.SystemPlatform, }; } diff --git a/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/OpcUaClientFailoverTests.cs b/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/OpcUaClientFailoverTests.cs index 73edb82..37622b1 100644 --- a/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/OpcUaClientFailoverTests.cs +++ b/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/OpcUaClientFailoverTests.cs @@ -67,6 +67,9 @@ public sealed class OpcUaClientFailoverTests PerEndpointConnectTimeout = TimeSpan.FromMilliseconds(500), Timeout = TimeSpan.FromMilliseconds(500), AutoAcceptCertificates = true, + // SystemPlatform kind needs no UNS mapping table — keeps this failover test + // focused on the endpoint sweep rather than §8 namespace validation. + TargetNamespaceKind = OpcUaTargetNamespaceKind.SystemPlatform, }; using var drv = new OpcUaClientDriver(opts, "opcua-failover"); diff --git a/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/OpcUaClientNamespaceTests.cs b/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/OpcUaClientNamespaceTests.cs new file mode 100644 index 0000000..b3bec1a --- /dev/null +++ b/tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/OpcUaClientNamespaceTests.cs @@ -0,0 +1,139 @@ +using Opc.Ua; +using Shouldly; +using Xunit; + +namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests; + +/// +/// Regression coverage for driver-specs.md §8 namespace handling — the +/// TargetNamespaceKind startup enforcement and the server-stable NodeId encoding +/// that produces (findings Driver.OpcUaClient-004). +/// +[Trait("Category", "Unit")] +public sealed class OpcUaClientNamespaceTests +{ + // ---- Driver.OpcUaClient-004: TargetNamespaceKind startup enforcement ---- + + [Fact] + public void ValidateNamespaceKind_Equipment_without_mapping_table_is_rejected() + { + // §8: an Equipment-kind instance gateways raw equipment data and MUST carry a + // config-driven UNS mapping table — remote nodes don't conform to UNS by default. + var opts = new OpcUaClientDriverOptions { TargetNamespaceKind = OpcUaTargetNamespaceKind.Equipment }; + Should.Throw(() => OpcUaClientDriver.ValidateNamespaceKind(opts)) + .Message.ShouldContain("UnsMappingTable"); + } + + [Fact] + public void ValidateNamespaceKind_Equipment_with_mapping_table_passes() + { + var opts = new OpcUaClientDriverOptions + { + TargetNamespaceKind = OpcUaTargetNamespaceKind.Equipment, + UnsMappingTable = new Dictionary { ["Line1"] = "Site/AreaA/Line1" }, + }; + Should.NotThrow(() => OpcUaClientDriver.ValidateNamespaceKind(opts)); + } + + [Fact] + public void ValidateNamespaceKind_SystemPlatform_with_mapping_table_is_rejected() + { + // §8: processed data preserves its own hierarchy — a UNS mapping table is ambiguous. + var opts = new OpcUaClientDriverOptions + { + TargetNamespaceKind = OpcUaTargetNamespaceKind.SystemPlatform, + UnsMappingTable = new Dictionary { ["x"] = "y" }, + }; + Should.Throw(() => OpcUaClientDriver.ValidateNamespaceKind(opts)) + .Message.ShouldContain("SystemPlatform"); + } + + [Fact] + public void ValidateNamespaceKind_SystemPlatform_without_mapping_table_passes() + { + var opts = new OpcUaClientDriverOptions { TargetNamespaceKind = OpcUaTargetNamespaceKind.SystemPlatform }; + Should.NotThrow(() => OpcUaClientDriver.ValidateNamespaceKind(opts)); + } + + [Fact] + public void Default_TargetNamespaceKind_is_Equipment() + { + // §8 comparison table: OPC UA Client default lane is Equipment. + new OpcUaClientDriverOptions().TargetNamespaceKind.ShouldBe(OpcUaTargetNamespaceKind.Equipment); + } + + // ---- Driver.OpcUaClient-004: server-stable NodeId encoding ---- + + [Fact] + public void NamespaceMap_FromSession_rejects_null_session() => + Should.Throw(() => NamespaceMap.FromSession(null!)); + + [Fact] + public void NamespaceMap_namespace0_NodeId_keeps_compact_form() + { + // Namespace 0 is fixed by the OPC UA spec — it never moves, so the compact i=… form + // is already server-stable and must not be rewritten to nsu=… + var map = TestMap("http://opcfoundation.org/UA/", "urn:remote:server"); + var coreNode = new NodeId(2253u); // i=2253, Server object, namespace 0 + map.ToStableReference(coreNode).ShouldBe(coreNode.ToString()); + map.ToStableReference(coreNode).ShouldNotContain("nsu="); + } + + [Fact] + public void NamespaceMap_nonzero_namespace_NodeId_is_encoded_with_uri_not_index() + { + // A ns=1 reference is session-relative; if the remote reorders its table the index + // points at the wrong namespace. ToStableReference must embed the URI instead. + var map = TestMap("http://opcfoundation.org/UA/", "urn:remote:server"); + var node = new NodeId("Temperature", 1); + var stable = map.ToStableReference(node); + stable.ShouldContain("nsu=urn:remote:server"); + stable.ShouldContain("s=Temperature"); + stable.ShouldNotStartWith("ns=1"); + } + + [Fact] + public void NamespaceMap_unknown_namespace_index_falls_back_to_raw_form() + { + // An index past the captured table can't be URI-encoded; fall back rather than throw — + // the read/write path surfaces BadNodeIdInvalid if it truly can't resolve. + var map = TestMap("http://opcfoundation.org/UA/"); + var node = new NodeId("Orphan", 5); + Should.NotThrow(() => map.ToStableReference(node)); + } + + [Fact] + public void NamespaceMap_index_and_uri_lookups_are_bidirectional() + { + var map = TestMap("http://opcfoundation.org/UA/", "urn:remote:server", "urn:remote:vendor"); + map.Count.ShouldBe(3); + map.UriForIndex(1).ShouldBe("urn:remote:server"); + map.IndexForUri("urn:remote:vendor").ShouldBe((ushort)2); + map.IndexForUri("urn:not:present").ShouldBeNull(); + map.UriForIndex(99).ShouldBeNull(); + } + + [Fact] + public void NamespaceMap_TryResolve_rejects_empty_and_null_input() + { + NamespaceMap.TryResolve(null!, "ns=1;s=X", out _).ShouldBeFalse(); + } + + /// + /// Build a directly from a URI table, mirroring what + /// snapshots — lets the encoding be tested + /// without standing up a live ISession. + /// + private static NamespaceMap TestMap(params string[] uris) + { + var table = new NamespaceTable(); + // Index 0 (the OPC UA core namespace) is seeded by NamespaceTable's ctor; append the + // rest. Skip uris[0] when it is the core namespace to avoid a duplicate. + for (var i = 0; i < uris.Length; i++) + { + if (i == 0 && uris[i] == "http://opcfoundation.org/UA/") continue; + table.Append(uris[i]); + } + return NamespaceMap.FromTable(table); + } +}