# OPC UA Client driver

Tier-A in-process driver that opens a `Session` against a remote OPC UA server and re-exposes its address space through the local OtOpcUa server. This is the "gateway / aggregation" direction — opposite to the usual "server exposes PLC data" flow.

For the test fixture (opc-plc) see [`OpcUaClient-Test-Fixture.md`](OpcUaClient-Test-Fixture.md). For the configuration surface see `OpcUaClientDriverOptions` in [`src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs).

## Auto re-import on `ModelChangeEvent`

The driver subscribes to `BaseModelChangeEventType` (and its subtype `GeneralModelChangeEventType`) on the upstream `Server` node (`i=2253`) at the end of `InitializeAsync`. When the upstream server advertises a topology change, the driver coalesces events over a debounce window and runs a single re-import (equivalent to calling `ReinitializeAsync` — internally `ShutdownAsync` + `InitializeAsync`).

### Configuration

| Option | Default | Notes |
| --- | --- | --- |
| `WatchModelChanges` | `true` | Disable to skip the watch entirely (no extra subscription, no re-import on topology change). |
| `ModelChangeDebounce` | `5s` | Coalescing window. The first event starts the timer; further events extend it; when it elapses with no new events, the driver fires one re-import. |

### Behaviour

- One model-change subscription per driver instance, separate from the data + alarm subscriptions. Created best-effort: a server that doesn't advertise the event types or rejects the `EventFilter` falls through to no-watch — `InitializeAsync` still succeeds.
- The `EventFilter` selects only the `EventType` field (a `WhereClause` constrains by `OfType BaseModelChangeEventType`). Payload fields like `Changes[]` are intentionally ignored: the driver always re-imports the full upstream root, so per-event delta tracking would just add wire overhead.
- Debounce is implemented via a single-shot `Timer`; every event calls `Timer.Change(window, Infinite)` so a burst of N events triggers exactly one re-import after the window elapses with no further events (see the sketch at the end of this section).
- The re-import path acquires the same `_gate` semaphore that `ReadAsync` / `WriteAsync` / `BrowseAsync` / `SubscribeAsync` use. Downstream callers see a brief browse-gap (≈ the upstream `DiscoverAsync` duration) while the gate is held — but no torn reads or split-batch writes.
- Re-import failures are handled best-effort: the next `ModelChangeEvent` triggers another attempt, and the keep-alive watchdog covers permanent upstream loss. Operators see failures through `DriverHealth.LastError` + the diagnostics counters.

### When to disable

Flip `WatchModelChanges` to `false` when:

- The upstream topology is known-static (e.g. firmware-pinned PLC) and the driver should never run a re-import unprompted.
- The brief browse-gap during re-import is unacceptable and a manual `ReinitializeAsync` call from the operator is preferred.
- The upstream server fires spurious `ModelChangeEvent`s that don't reflect real topology changes, causing wasted re-imports. Tighten or disable rather than chasing the noise downstream.
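The debounce behaviour above boils down to the standard single-shot `System.Threading.Timer` coalescing pattern. A minimal sketch, assuming hypothetical names (`ModelChangeDebouncer`, `reimport`); the driver's actual types and fields may differ:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch only: the type and member names are hypothetical, not the
// driver's actual identifiers. It mirrors the single-shot Timer pattern above.
internal sealed class ModelChangeDebouncer : IDisposable
{
    private readonly TimeSpan _window;      // corresponds to ModelChangeDebounce
    private readonly Func<Task> _reimport;  // corresponds to ReinitializeAsync
    private readonly Timer _timer;

    public ModelChangeDebouncer(TimeSpan window, Func<Task> reimport)
    {
        _window = window;
        _reimport = reimport;
        // Single-shot timer, initially disarmed.
        _timer = new Timer(s => _ = FireAsync(), null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
    }

    // Called for every BaseModelChangeEventType notification; each call pushes the
    // due time out, so a burst of N events yields exactly one re-import.
    public void OnModelChangeEvent() => _timer.Change(_window, Timeout.InfiniteTimeSpan);

    private async Task FireAsync()
    {
        try { await _reimport().ConfigureAwait(false); }
        catch { /* best-effort: the next event re-arms the timer and retries */ }
    }

    public void Dispose() => _timer.Dispose();
}
```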
## Reverse Connect (server-initiated)

OPC UA's reverse-connect mode flips the transport direction: instead of the client dialling the server, the **server** dials the client's listener. The upstream sends a `ReverseHello` and the client continues the OPC UA handshake on the inbound socket.

This is required for OT-DMZ deployments where the plant firewall only permits outbound traffic from the upstream — the gateway opens a listener, the upstream reaches out.

### Configuration

| Option | Default | Notes |
| --- | --- | --- |
| `ReverseConnect.Enabled` | `false` | Opt-in. When `true`, replaces the failover dial-sweep with a `WaitForConnection` call. |
| `ReverseConnect.ListenerUrl` | `null` | Local listener URL the SDK binds. Typically `opc.tcp://0.0.0.0:4844` (any interface) or a specific NIC for multi-homed gateways. **Required when `Enabled` is `true`.** |
| `ReverseConnect.ExpectedServerUri` | `null` | Upstream's `ApplicationUri` to filter inbound dials. `null` accepts the first connection (only safe with one upstream targeting the listener). |

### Shared listener (singleton)

The driver keeps a single underlying `Opc.Ua.Client.ReverseConnectManager` per process, keyed on `ListenerUrl`. Two driver instances that share a listener URL multiplex onto one TCP socket; the SDK demuxes inbound dials by the upstream's reported `ServerUri`. The wrapper (`ReverseConnectListener`) is reference-counted — first `Acquire` binds the port, last `Release` tears it down — letting drivers come and go independently without races on port-bind / port-unbind (see the sketch at the end of this section).

When two drivers share a listener:

- They MUST set `ExpectedServerUri` to disambiguate; otherwise the first upstream to dial in wins regardless of which driver is waiting.
- They CAN come and go independently; the listener stays alive while at least one driver references it.

### Behaviour

- The dial path is bypassed entirely when `Enabled` is `true`. Failover across multiple `EndpointUrls` doesn't apply — there's no client-side dial to fail over.
- `ExpectedServerUri` is the SDK's filter parameter to `WaitForConnectionAsync`. Inbound `ReverseHello`s from a different upstream are ignored and the caller keeps waiting.
- The same `EndpointDescription` derivation runs as on the dial path — the first `EndpointUrl` in the candidate list seeds `SecurityPolicy` / `SecurityMode` / `EndpointUrl` for the session-create call. The actual endpoint lives on the upstream and the SDK reconciles after the `ReverseHello`.
- Cancellation: `Timeout` bounds the wait. A stuck listener with no inbound dial throws after `Timeout` rather than hanging init forever.
- Shutdown releases the listener reference. The last release stops the listener so the port can be re-bound by a future driver lifecycle.

### Wiring it up on the upstream

The upstream OPC UA server has to be configured to dial out. The `opc-plc` simulator does this with `--rc=opc.tcp://<gateway-host>:4844`; for a real upstream see your server's reverse-connect docs (most major implementations expose a "ReverseConnect.Endpoint" config knob).

### When NOT to use

- Standard plant networks where the gateway can dial the upstream — the conventional dial path is simpler and supports failover natively.
- Public-internet OPC UA: reverse-connect is a network-policy workaround, not a security primitive. Always pair with `Sign` or `SignAndEncrypt` + a vetted user-token policy.
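The reference counting described under "Shared listener (singleton)" can be pictured roughly like this. It is a simplified, hypothetical sketch: the real `ReverseConnectListener` wraps the SDK's `ReverseConnectManager`, which is stood in for here by a plain bind delegate.

```csharp
using System;
using System.Collections.Generic;

// Simplified, hypothetical sketch of a reference-counted shared listener.
// The port bind/unbind that the SDK's ReverseConnectManager performs is represented
// by a delegate returning an IDisposable.
internal sealed class SharedListenerRegistry
{
    private sealed class Entry
    {
        public Entry(IDisposable listener) => Listener = listener;
        public int RefCount;
        public IDisposable Listener { get; }
    }

    private readonly object _lock = new();
    private readonly Dictionary<string, Entry> _entries = new();
    private readonly Func<string, IDisposable> _bind; // binds the port for a ListenerUrl

    public SharedListenerRegistry(Func<string, IDisposable> bind) => _bind = bind;

    // First Acquire for a ListenerUrl binds the port; later callers share the socket.
    public IDisposable Acquire(string listenerUrl)
    {
        lock (_lock)
        {
            if (!_entries.TryGetValue(listenerUrl, out var entry))
            {
                entry = new Entry(_bind(listenerUrl));
                _entries[listenerUrl] = entry;
            }
            entry.RefCount++;
        }
        return new Lease(() => Release(listenerUrl));
    }

    // Last Release unbinds the port so a future driver lifecycle can re-bind it.
    private void Release(string listenerUrl)
    {
        lock (_lock)
        {
            if (_entries.TryGetValue(listenerUrl, out var entry) && --entry.RefCount == 0)
            {
                entry.Listener.Dispose();
                _entries.Remove(listenerUrl);
            }
        }
    }

    private sealed class Lease : IDisposable
    {
        private Action? _release;
        public Lease(Action release) => _release = release;
        public void Dispose() { _release?.Invoke(); _release = null; }
    }
}
```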
## HistoryRead Events

The driver passes OPC UA `HistoryReadEvents` requests through to the upstream server. HistoryRead Raw / Processed / AtTime ship in the same code path (`ExecuteHistoryReadAsync`); event history takes a slightly different shape because the client sends an `EventFilter` (SelectClauses + WhereClause) rather than a plain numeric / time-based detail block.

### Wire path

`IHistoryProvider.ReadEventsAsync(fullReference, EventHistoryRequest, ct)` translates to:

```
new ReadEventDetails
{
    StartTime,
    EndTime,
    NumValuesPerNode,
    Filter = EventFilter { SelectClauses, WhereClause }
}
```

…and is sent through `Session.HistoryReadAsync` to the upstream server. The returned `HistoryEvent.Events` collection (one `HistoryEventFieldList` per historical event) is unwrapped into `HistoricalEventBatch.Events`, where each `HistoricalEventRow.Fields` dictionary is keyed by the `SimpleAttributeSpec.FieldName` the caller supplied. The server-side history dispatcher uses those keys to align fields with the wire-side SelectClause order — drivers don't have to honour the entire OPC UA `EventFilter` shape verbatim.

### SelectClauses

When `EventHistoryRequest.SelectClauses` is `null` the driver falls back to a default set that matches `BuildHistoryEvent` on the server side:

| Field | Browse path | Notes |
| --- | --- | --- |
| `EventId` | `EventId` | BaseEventType — stable unique id. |
| `SourceName` | `SourceName` | Source-object name. |
| `Time` | `Time` | Process-side event timestamp. Used for `OccurrenceTime`. |
| `Message` | `Message` | LocalizedText payload. |
| `Severity` | `Severity` | OPC UA 1-1000 scale. |
| `ReceiveTime` | `ReceiveTime` | Server-side ingest timestamp. |

Custom SelectClauses are supported — pass any `IReadOnlyList<SimpleAttributeSpec>`. Each entry's `TypeDefinitionId` defaults to `BaseEventType` when `null`; pass an explicit NodeId text (e.g. `"i=2782"` for `ConditionType`) to reach typed-condition fields.

### WhereClause

`ContentFilterSpec.EncodedOperands` carries the binary-encoded `ContentFilter` from the wire. The driver decodes it into the SDK `ContentFilter` and attaches it to the outgoing `EventFilter` verbatim — the OPC UA Client driver is a passthrough for filter semantics, it does not evaluate them. A malformed filter is dropped silently; the SelectClause projection still goes out.

### Continuation points

Returned in `HistoricalEventBatch.ContinuationPoint`. The server-side HistoryRead facade is responsible for round-tripping these so a paged event read against a chatty upstream completes incrementally. The driver itself doesn't track them — every `ReadEventsAsync` call issues a fresh `HistoryReadAsync`.
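For orientation, here is roughly what the default SelectClause set looks like once translated into the SDK's `EventFilter` and wrapped in the `ReadEventDetails` shown under "Wire path". This is a hedged sketch, not the driver's literal code; the helper name and the `NumValuesPerNode` choice are illustrative.

```csharp
using System;
using Opc.Ua;

// Sketch only: builds an EventFilter equivalent to the default SelectClause set above
// and wraps it in the ReadEventDetails sent via Session.HistoryReadAsync. The helper
// name is illustrative; the driver derives this from EventHistoryRequest at runtime.
internal static class EventHistoryExample
{
    public static ReadEventDetails BuildDefaultEventDetails(DateTime startUtc, DateTime endUtc)
    {
        var filter = new EventFilter();
        foreach (var field in new[] { "EventId", "SourceName", "Time", "Message", "Severity", "ReceiveTime" })
        {
            filter.SelectClauses.Add(new SimpleAttributeOperand
            {
                TypeDefinitionId = ObjectTypeIds.BaseEventType,  // default when no TypeDefinitionId is supplied
                BrowsePath = new QualifiedNameCollection { new QualifiedName(field) },
                AttributeId = Attributes.Value
            });
        }

        return new ReadEventDetails
        {
            StartTime = startUtc,
            EndTime = endUtc,
            NumValuesPerNode = 0,   // 0 lets the server choose its own batch size
            Filter = filter
        };
    }
}
```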
## HistoryRead Aggregates (Part 13 catalog)

`IHistoryProvider.ReadProcessedAsync` takes a `HistoryAggregateType` and the driver maps it to the standard `Opc.Ua.ObjectIds.AggregateFunction_*` NodeId in `MapAggregateToNodeId`. PR-13 (issue #285) extended the enum from the original 5 values (Average / Minimum / Maximum / Total / Count) to the full OPC UA Part 13 §5 catalog — ~30 aggregates.

The mapping is best-effort: not every upstream OPC UA server implements every aggregate. Aggregates the upstream rejects come back with `StatusCode=BadAggregateNotSupported` on the per-row HistoryRead result; the driver passes that through verbatim (cascading-quality rule, Part 11 §8) — it does not throw. Servers advertise the aggregates they support via the `AggregateConfiguration` object on the `Server` node; clients can probe it at runtime.

### Catalog

| Enum value | SDK NodeId field | Part 13 § | Server-side support | Typical use |
| --- | --- | --- | --- | --- |
| `Average` | `AggregateFunction_Average` | §5.4 | almost always | smoothing |
| `Minimum` | `AggregateFunction_Minimum` | §5.5 | almost always | low watermark |
| `Maximum` | `AggregateFunction_Maximum` | §5.6 | almost always | high watermark |
| `Total` | `AggregateFunction_Total` | §5.10 | usually | totalisation |
| `Count` | `AggregateFunction_Count` | §5.18 | almost always | sample count |
| `TimeAverage` | `AggregateFunction_TimeAverage` | §5.4.2 | usually | time-weighted mean |
| `TimeAverage2` | `AggregateFunction_TimeAverage2` | §5.4.3 | sometimes | bounded time-weighted mean |
| `Interpolative` | `AggregateFunction_Interpolative` | §5.3 | usually | trend snapshot |
| `MinimumActualTime` | `AggregateFunction_MinimumActualTime` | §5.5.4 | sometimes | when low occurred |
| `MaximumActualTime` | `AggregateFunction_MaximumActualTime` | §5.6.4 | sometimes | when high occurred |
| `Range` | `AggregateFunction_Range` | §5.7 | usually | spread |
| `Range2` | `AggregateFunction_Range2` | §5.7 | sometimes | bounded spread |
| `AnnotationCount` | `AggregateFunction_AnnotationCount` | §5.21 | rarely | operator notes |
| `DurationGood` | `AggregateFunction_DurationGood` | §5.16 | sometimes | quality coverage |
| `DurationBad` | `AggregateFunction_DurationBad` | §5.16 | sometimes | gap accounting |
| `PercentGood` | `AggregateFunction_PercentGood` | §5.17 | sometimes | quality % |
| `PercentBad` | `AggregateFunction_PercentBad` | §5.17 | sometimes | gap % |
| `WorstQuality` | `AggregateFunction_WorstQuality` | §5.20 | sometimes | worst seen |
| `WorstQuality2` | `AggregateFunction_WorstQuality2` | §5.20 | rarely | bounded worst |
| `StandardDeviationSample` | `AggregateFunction_StandardDeviationSample` | §5.13 | sometimes | n-1 stddev |
| `StandardDeviationPopulation` | `AggregateFunction_StandardDeviationPopulation` | §5.13 | sometimes | n stddev |
| `VarianceSample` | `AggregateFunction_VarianceSample` | §5.13 | sometimes | n-1 variance |
| `VariancePopulation` | `AggregateFunction_VariancePopulation` | §5.13 | sometimes | n variance |
| `NumberOfTransitions` | `AggregateFunction_NumberOfTransitions` | §5.12 | sometimes | event count |
| `DurationInStateZero` | `AggregateFunction_DurationInStateZero` | §5.19 | sometimes | OFF time |
| `DurationInStateNonZero` | `AggregateFunction_DurationInStateNonZero` | §5.19 | sometimes | ON time |
| `Start` | `AggregateFunction_Start` | §5.8 | usually | first sample |
| `End` | `AggregateFunction_End` | §5.9 | usually | last sample |
| `Delta` | `AggregateFunction_Delta` | §5.11 | usually | end-start |
| `StartBound` | `AggregateFunction_StartBound` | §5.8 | sometimes | extrapolated start |
| `EndBound` | `AggregateFunction_EndBound` | §5.9 | sometimes | extrapolated end |

"Server-side support" is heuristic — see your upstream's `AggregateConfiguration` node for the authoritative list. AVEVA Historian, KEPServerEX, Prosys, and opc-plc each implement different subsets.

### Driver-side validation

The mapping itself is unit-tested over the full enum (`OpcUaClientAggregateMappingTests`) — every value resolves to a non-null namespace-0 NodeId, and the original 5 ordinals stay pinned. Wire-side behaviour against a live server is exercised by `OpcUaClientAggregateSweepTests` (build-only scaffold pending an opc-plc history-sim profile).
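A condensed sketch of the kind of switch `MapAggregateToNodeId` performs, trimmed to a handful of catalog entries. The `HistoryAggregateType` enum below is a trimmed stand-in for the project's real enum; the mapping targets are the actual `Opc.Ua.ObjectIds` fields.

```csharp
using System;
using Opc.Ua;

// Trimmed stand-in for the project's HistoryAggregateType enum (the original 5 first).
internal enum HistoryAggregateType { Average, Minimum, Maximum, Total, Count, TimeAverage, Delta }

// Condensed sketch of the MapAggregateToNodeId mapping: a handful of entries only;
// the real method covers the whole Part 13 catalog listed above.
internal static class AggregateMappingExample
{
    public static NodeId MapAggregateToNodeId(HistoryAggregateType aggregate) => aggregate switch
    {
        HistoryAggregateType.Average     => ObjectIds.AggregateFunction_Average,
        HistoryAggregateType.Minimum     => ObjectIds.AggregateFunction_Minimum,
        HistoryAggregateType.Maximum     => ObjectIds.AggregateFunction_Maximum,
        HistoryAggregateType.Total       => ObjectIds.AggregateFunction_Total,
        HistoryAggregateType.Count       => ObjectIds.AggregateFunction_Count,
        HistoryAggregateType.TimeAverage => ObjectIds.AggregateFunction_TimeAverage,
        HistoryAggregateType.Delta       => ObjectIds.AggregateFunction_Delta,
        _ => throw new ArgumentOutOfRangeException(nameof(aggregate))
    };
}
```

Aggregates the upstream doesn't implement still map to a valid namespace-0 NodeId here; the rejection only surfaces later as `BadAggregateNotSupported` on the per-row result, as described above.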
## Upstream redundancy (`ServerArray`)

When the upstream OPC UA server is itself a redundant pair (warm or hot per OPC UA Part 4 §6.6.2), the driver supports **mid-session failover** driven by the upstream's own `Server.ServerRedundancy.RedundancySupport` + `ServerUriArray` + `Server.ServiceLevel` nodes.

This is distinct from the static boot-time failover sweep on `EndpointUrls`: that path picks a single survivor at session-create time; this path swaps the active session live when the upstream signals degradation, transferring subscriptions onto the secondary so monitored-item handles stay valid.

### Configuration

| Option | Default | Notes |
| --- | --- | --- |
| `Redundancy.Enabled` | `false` | Opt-in. When `false`, the driver doesn't read `RedundancySupport` / `ServerUriArray` and doesn't subscribe to `ServiceLevel`. |
| `Redundancy.ServiceLevelThreshold` | `200` | Byte value below which the driver triggers failover. OPC UA spec convention: 200+ = healthy primary, 100..199 = degraded, 0..99 = unrecoverable. |
| `Redundancy.RecheckInterval` | `5s` | Lower bound between two consecutive failovers — suppresses oscillation when ServiceLevel flaps around the threshold. |

### Behaviour

- At session activation the driver reads `Server.ServerRedundancy.RedundancySupport`. When `None`, the driver records an empty peer list and the failover path becomes a no-op (`ServiceLevel` drops are still observable via diagnostics but trigger nothing).
- When the upstream advertises `Cold` / `Warm` / `WarmActive` / `Hot`, the driver pulls `Server.ServerRedundancy.ServerUriArray` for the peer list, falling back to the top-level `Server.ServerArray` for legacy upstreams that don't expose the redundancy node.
- A dedicated subscription on `Server.ServiceLevel` (publish interval 1s, separate from the alarm + data subscriptions) drives every failover decision via the SDK's notification path — no polling loop.
- On a drop below `ServiceLevelThreshold` the driver picks the next URI in the peer list that isn't the active one, opens a parallel session against it, and calls `Session.TransferSubscriptionsAsync(other, sendInitialValues:true)` to migrate every live subscription (data + alarm + model-change + service-level itself). On success the driver swaps `Session`, closes the old one, and bumps `RedundancyFailoverCount`. (See the sketch after the diagnostics table below.)
- On any failure (`BadSecureChannelClosed`, `BadCertificateUntrusted`, `TransferSubscriptions` returning `false`, secondary unreachable) the driver leaves the existing session untouched, increments `RedundancyFailoverFailures`, and waits for the next ServiceLevel notification. The keep-alive watchdog continues to cover full upstream-loss scenarios.

### Shared client-cert prerequisite

`TransferSubscriptionsAsync` requires the secondary's secure channel to accept the same client certificate the primary did. Operators running heterogeneous secondaries (different cert trust stores) will see `BadCertificateUntrusted` on every failover attempt and the failures counter climbing. The fix is to push the gateway driver's application-instance certificate into both upstreams' `TrustedPeerCertificates` store before enabling redundancy. A follow-up adds a fallback path that re-creates subscriptions instead of transferring when the secondary rejects the channel.

### Diagnostics

The `driver-diagnostics` RPC surfaces three new counters via `DriverHealth.Diagnostics`:

| Key | Type | Notes |
| --- | --- | --- |
| `RedundancyFailoverCount` | `double` (long-counted) | Successful mid-session swaps since driver start. |
| `RedundancyFailoverFailures` | `double` (long-counted) | Swap attempts that bailed (TransferSubscriptions false, secondary unreachable, etc.). |
| `ActiveServerUri` | string (in `OpcUaClientDiagnostics.ActiveServerUri`) | URI of the upstream the driver is currently bound to. Updates on every successful failover. |
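To make the ServiceLevel-driven decision concrete, here is a heavily condensed, hypothetical sketch of the failover path described under "Behaviour". Type and member names are illustrative, session creation is elided behind a stub, and the SDK's `TransferSubscriptionsAsync` overload taking a `SubscriptionCollection` is assumed; the real driver also re-wires keep-alive and notification handlers.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Opc.Ua.Client;

// Hypothetical, condensed sketch of the ServiceLevel-driven failover. Names and the
// session-creation helper are illustrative, not the driver's actual code.
internal sealed class RedundancyFailoverSketch
{
    private readonly string[] _peerUris;            // from ServerUriArray (or ServerArray fallback)
    private readonly byte _serviceLevelThreshold;   // Redundancy.ServiceLevelThreshold
    private readonly TimeSpan _recheckInterval;     // Redundancy.RecheckInterval
    private DateTime _lastFailoverUtc = DateTime.MinValue;
    private ISession _session;                      // active upstream session
    private string _activeServerUri;

    public RedundancyFailoverSketch(ISession initial, string activeUri, string[] peerUris,
        byte serviceLevelThreshold, TimeSpan recheckInterval)
    {
        _session = initial;
        _activeServerUri = activeUri;
        _peerUris = peerUris;
        _serviceLevelThreshold = serviceLevelThreshold;
        _recheckInterval = recheckInterval;
    }

    // Invoked from the dedicated Server.ServiceLevel subscription's notification callback.
    public async Task OnServiceLevelAsync(byte serviceLevel, CancellationToken ct)
    {
        if (serviceLevel >= _serviceLevelThreshold) return;                 // primary still healthy
        if (DateTime.UtcNow - _lastFailoverUtc < _recheckInterval) return;  // suppress oscillation

        // Pick the next peer URI that isn't the one we're currently bound to.
        var target = Array.Find(_peerUris, uri => uri != _activeServerUri);
        if (target is null) return;                                         // empty peer list: no-op

        var candidate = await CreateSessionAsync(target, ct);               // parallel session (elided)
        var subscriptions = new SubscriptionCollection(_session.Subscriptions);

        // Migrate data, alarm, model-change and service-level subscriptions onto the secondary.
        if (await candidate.TransferSubscriptionsAsync(subscriptions, sendInitialValues: true, ct))
        {
            var old = _session;
            _session = candidate;
            _activeServerUri = target;
            _lastFailoverUtc = DateTime.UtcNow;
            await old.CloseAsync(ct);            // success: bump RedundancyFailoverCount
        }
        else
        {
            // Failure: keep the existing session and bump RedundancyFailoverFailures instead.
            await candidate.CloseAsync(ct);
        }
    }

    private Task<ISession> CreateSessionAsync(string serverUri, CancellationToken ct)
        => throw new NotImplementedException("elided: endpoint selection + Session.Create against " + serverUri);
}
```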
### Forced-failover runbook

To validate the wiring against a real redundant upstream pair:

1. Confirm the upstream advertises `RedundancySupport != None` and a non-empty `ServerUriArray`. Use the Client CLI: `dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- redundancy -u <endpoint-url>`.
2. Set `Redundancy.Enabled = true` on the gateway's `OpcUaClient` driver instance and restart.
3. Tail driver diagnostics: `driver-diagnostics --instance <instance-id>` — note `RedundancyFailoverCount = 0` pre-test.
4. Drive a `ServiceLevel` drop on the primary. On AVEVA / KEPServer this is typically a "force standby" Admin action; on a custom server it's a write to the simulated ServiceLevel node.
5. Observe `RedundancyFailoverCount = 1` within `RecheckInterval` of the drop, `ActiveServerUri` swapping to the secondary URI, and downstream reads/subscriptions continuing without interruption.

For non-redundant upstreams (single-server deployments) the recommended configuration is to leave `Redundancy.Enabled = false` and rely on `EndpointUrls` for boot-time failover only.
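As a closing reference, here is roughly how the two recommended configurations could look when set in code. The option shapes below are stand-ins inferred from the configuration tables in this document; check `OpcUaClientDriverOptions.cs` for the authoritative property names.

```csharp
using System;

// Redundant upstream pair: mid-session failover driven by ServiceLevel.
var redundantUpstream = new OpcUaClientDriverOptions
{
    EndpointUrls = new[] { "opc.tcp://primary:4840" },   // dial target; peers come from ServerUriArray
    Redundancy = new RedundancyOptions
    {
        Enabled = true,
        ServiceLevelThreshold = 200,                      // fail over below "healthy primary"
        RecheckInterval = TimeSpan.FromSeconds(5)         // suppress oscillation around the threshold
    }
};

// Single-server deployment: boot-time failover only, first reachable URL wins.
var singleUpstream = new OpcUaClientDriverOptions
{
    EndpointUrls = new[] { "opc.tcp://plc-a:4840", "opc.tcp://plc-b:4840" },
    Redundancy = new RedundancyOptions { Enabled = false }
};

// Minimal stand-in option types so this sketch compiles on its own; the real shapes
// live in OpcUaClientDriverOptions.cs and may differ.
internal sealed class OpcUaClientDriverOptions
{
    public string[] EndpointUrls { get; init; } = Array.Empty<string>();
    public RedundancyOptions Redundancy { get; init; } = new();
}

internal sealed class RedundancyOptions
{
    public bool Enabled { get; init; }
    public byte ServiceLevelThreshold { get; init; } = 200;
    public TimeSpan RecheckInterval { get; init; } = TimeSpan.FromSeconds(5);
}
```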