# OPC UA Client driver
Tier-A in-process driver that opens a `Session` against a remote OPC UA server
and re-exposes its address space through the local OtOpcUa server. This is the
"gateway / aggregation" direction, the opposite of the usual "server exposes PLC
data" flow.
For the test fixture (opc-plc) see [`OpcUaClient-Test-Fixture.md`](OpcUaClient-Test-Fixture.md).
For the configuration surface see `OpcUaClientDriverOptions` in
[`src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs).
## Auto re-import on `ModelChangeEvent`
The driver subscribes to `BaseModelChangeEventType` (and its subtype
`GeneralModelChangeEventType`) on the upstream `Server` node (`i=2253`) at
the end of `InitializeAsync`. When the upstream server advertises a
topology change, the driver coalesces events over a debounce window and
runs a single re-import (equivalent to calling `ReinitializeAsync`, which
internally runs `ShutdownAsync` + `InitializeAsync`).
### Configuration
| Option | Default | Notes |
| --- | --- | --- |
| `WatchModelChanges` | `true` | Disable to skip the watch entirely (no extra subscription, no re-import on topology change). |
| `ModelChangeDebounce` | `5s` | Coalescing window. The first event starts the timer; further events extend it; when it elapses with no new events, the driver fires one re-import. |
### Behaviour
- One model-change subscription per driver instance, separate from the
data + alarm subscriptions. Created best-effort: a server that doesn't
advertise the event types or rejects the `EventFilter` falls through to
no-watch — `InitializeAsync` still succeeds.
- The `EventFilter` selects only the `EventType` field (a `WhereClause`
constrains by `OfType BaseModelChangeEventType`). Payload fields like
`Changes[]` are intentionally ignored: the driver always re-imports the
full upstream root, so per-event delta tracking would just add wire
overhead.
- Debounce is implemented via a single-shot `Timer`; every event calls
`Timer.Change(window, Infinite)` so a burst of N events triggers exactly
one re-import after the window elapses with no further events (see the
sketch after this list).
- The re-import path acquires the same `_gate` semaphore that `ReadAsync`
/ `WriteAsync` / `BrowseAsync` / `SubscribeAsync` use. Downstream callers
see a brief browse-gap (≈ the upstream `DiscoverAsync` duration) while
the gate is held — but no torn reads or split-batch writes.
- Failure during the re-import is best-effort: the next `ModelChangeEvent`
triggers another attempt, and the keep-alive watchdog covers permanent
upstream loss. Operators see failures through `DriverHealth.LastError`
+ the diagnostics counters.
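A minimal sketch of that coalescing pattern, assuming nothing beyond
`System.Threading.Timer`; the type and member names are illustrative, not the
driver's actual code, and `reimport` stands in for the internal shutdown +
re-initialize path:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative debouncer, not the driver's code. A single-shot timer is re-armed on every
// ModelChangeEvent; the re-import callback runs only once the window elapses quietly.
sealed class ModelChangeDebouncer : IDisposable
{
    private readonly Timer _timer;
    private readonly TimeSpan _window;

    public ModelChangeDebouncer(TimeSpan window, Func<Task> reimport)
    {
        _window = window;
        // Starts disarmed (Infinite due time); the first event arms it. Fire-and-forget on purpose.
        _timer = new Timer(state => _ = reimport(), null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
    }

    // Called for every upstream ModelChangeEvent: pushes the deadline out by the full window,
    // so a burst of N events collapses into exactly one re-import.
    public void OnModelChangeEvent() => _timer.Change(_window, Timeout.InfiniteTimeSpan);

    public void Dispose() => _timer.Dispose();
}
```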
### When to disable
Flip `WatchModelChanges` to `false` when:
- The upstream topology is known-static (e.g. firmware-pinned PLC) and
the driver should never run a re-import unprompted.
- The brief browse-gap during re-import is unacceptable and a manual
`ReinitializeAsync` call from the operator is preferred.
- The upstream server fires spurious `ModelChangeEvent`s that don't
reflect real topology changes, causing wasted re-imports. Tighten or
disable rather than chasing the noise downstream.
## Reverse Connect (server-initiated)
OPC UA's reverse-connect mode flips the transport direction: instead of the
client dialling the server, the **server** dials the client's listener. The
upstream sends a `ReverseHello` and the client continues the OPC UA
handshake on the inbound socket. Required for OT-DMZ deployments where the
plant firewall only permits outbound traffic from the upstream — the
gateway opens a listener, the upstream reaches out.
### Configuration
| Option | Default | Notes |
| --- | --- | --- |
| `ReverseConnect.Enabled` | `false` | Opt-in. When `true`, replaces the failover dial-sweep with a `WaitForConnection` call. |
| `ReverseConnect.ListenerUrl` | `null` | Local listener URL the SDK binds. Typically `opc.tcp://0.0.0.0:4844` (any interface) or a specific NIC for multi-homed gateways. **Required when `Enabled` is `true`.** |
| `ReverseConnect.ExpectedServerUri` | `null` | Upstream's `ApplicationUri` to filter inbound dials. `null` accepts the first connection (only safe with one upstream targeting the listener). |
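A minimal wiring sketch, assuming the nested options object is pre-instantiated
on `OpcUaClientDriverOptions` and that the property names follow the table
above; the concrete values are examples only:

```csharp
// Hypothetical wiring example; property names follow the table above.
var options = new OpcUaClientDriverOptions();
options.ReverseConnect.Enabled = true;
options.ReverseConnect.ListenerUrl = "opc.tcp://0.0.0.0:4844";       // bind on all interfaces
options.ReverseConnect.ExpectedServerUri = "urn:plant:upstream-plc"; // upstream ApplicationUri (example)
```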
### Shared listener (singleton)
The driver keeps a single underlying `Opc.Ua.Client.ReverseConnectManager` per
process, keyed on `ListenerUrl`. Two driver instances that share a listener URL
multiplex onto one TCP socket; the SDK demuxes inbound dials by the upstream's
reported `ServerUri`. The wrapper (`ReverseConnectListener`) is
reference-counted: the first `Acquire` binds the port, the last `Release` tears
it down. This lets drivers come and go independently without races on
port-bind / port-unbind.
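A compact sketch of the reference-counting idea; the type and member names are
illustrative and the listener stand-ins are placeholders, not the repo's
`ReverseConnectListener` API:

```csharp
using System.Collections.Generic;

// Illustrative refcount wrapper: first Acquire per URL binds the listener, last Release stops it.
static class SharedListeners
{
    private static readonly object Gate = new();
    private static readonly Dictionary<string, (object Listener, int RefCount)> ByUrl = new();

    public static object Acquire(string listenerUrl)
    {
        lock (Gate)
        {
            if (!ByUrl.TryGetValue(listenerUrl, out var entry))
                entry = (Listener: StartListener(listenerUrl), RefCount: 0); // first acquire binds the port
            ByUrl[listenerUrl] = (entry.Listener, entry.RefCount + 1);
            return entry.Listener;
        }
    }

    public static void Release(string listenerUrl)
    {
        lock (Gate)
        {
            if (!ByUrl.TryGetValue(listenerUrl, out var entry)) return;
            if (entry.RefCount <= 1)
            {
                StopListener(entry.Listener);   // last release: unbind so the port can be reused
                ByUrl.Remove(listenerUrl);
            }
            else
            {
                ByUrl[listenerUrl] = (entry.Listener, entry.RefCount - 1);
            }
        }
    }

    // Stand-ins for starting/stopping the underlying ReverseConnectManager.
    private static object StartListener(string url) => new();
    private static void StopListener(object listener) { }
}
```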
When two drivers share a listener:
- They MUST set `ExpectedServerUri` to disambiguate; otherwise the first
upstream to dial in wins regardless of which driver is waiting.
- They CAN come and go independently; the listener stays alive while at
least one driver references it.
### Behaviour
- The dial path is bypassed entirely when `Enabled` is `true`. Failover
across multiple `EndpointUrls` doesn't apply — there's no client-side
dial to fail over.
- `ExpectedServerUri` is the SDK's filter parameter to `WaitForConnectionAsync`.
Inbound `ReverseHello`s from a different upstream are ignored and the
caller keeps waiting.
- The same `EndpointDescription` derivation runs as the dial path — the
first `EndpointUrl` in the candidate list seeds `SecurityPolicy` /
`SecurityMode` / `EndpointUrl` for the session-create call. The actual
endpoint lives on the upstream and the SDK reconciles after the
`ReverseHello`.
- Cancellation: `Timeout` bounds the wait. A stuck listener with no inbound
dial throws after `Timeout` rather than hanging init forever (see the sketch
after this list).
- Shutdown releases the listener reference. The last release stops the
listener so the port can be re-bound by a future driver lifecycle.
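A sketch of the bounded wait, assuming the OPC Foundation SDK's
`ReverseConnectManager` surface (`AddEndpoint` / `StartService` /
`WaitForConnection`); treat the exact signatures as assumptions, and note the
driver's real path also goes through the shared listener and session creation:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Opc.Ua;
using Opc.Ua.Client;

// Illustrative only: wait for one inbound ReverseHello from the expected upstream,
// bounded by a timeout, then hand the connection to the normal session-create path.
static async Task<ITransportWaitingConnection> WaitForUpstreamAsync(
    ApplicationConfiguration appConfig, Uri listenerUrl, string expectedServerUri, TimeSpan timeout)
{
    var manager = new ReverseConnectManager();
    manager.AddEndpoint(listenerUrl);   // bind the local listener port
    manager.StartService(appConfig);

    using var cts = new CancellationTokenSource(timeout);   // Timeout bounds the wait
    // ReverseHellos from other ApplicationUris are ignored; the call keeps waiting until
    // the expected upstream dials in or the token fires.
    return await manager.WaitForConnection(listenerUrl, expectedServerUri, cts.Token);
}
```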
### Wiring it up on the upstream
The upstream OPC UA server has to be configured to dial out. The `opc-plc`
simulator does this with `--rc=opc.tcp://<gateway-host>:4844`; for a real
upstream see your server's reverse-connect docs (most major implementations
expose a "ReverseConnect.Endpoint" config knob).
### When NOT to use
- Standard plant networks where the gateway can dial the upstream — the
conventional dial path is simpler and supports failover natively.
- Public-internet OPC UA: reverse-connect is a network-policy workaround,
not a security primitive. Always pair with `Sign` or `SignAndEncrypt`
+ a vetted user-token policy.
## HistoryRead Events
The driver passes through OPC UA `HistoryReadEvents` to the upstream server.
HistoryRead Raw / Processed / AtTime ship in the same code path
(`ExecuteHistoryReadAsync`); event history takes a slightly different shape
because the client sends an `EventFilter` (SelectClauses + WhereClause) rather
than a plain numeric / time-based detail block.
### Wire path
`IHistoryProvider.ReadEventsAsync(fullReference, EventHistoryRequest, ct)`
translates to:
```
new ReadEventDetails {
    StartTime,
    EndTime,
    NumValuesPerNode,
    Filter = EventFilter { SelectClauses, WhereClause }
}
```
…and is sent through `Session.HistoryReadAsync` to the upstream server. The
returned `HistoryEvent.Events` collection (one `HistoryEventFieldList` per
historical event) is unwrapped into `HistoricalEventBatch.Events`, where each
`HistoricalEventRow.Fields` dictionary is keyed by the
`SimpleAttributeSpec.FieldName` the caller supplied. The server-side history
dispatcher uses those keys to align fields with the wire-side SelectClause
order — drivers don't have to honour the entire OPC UA `EventFilter` shape
verbatim.
### SelectClauses
When `EventHistoryRequest.SelectClauses` is `null` the driver falls back to a
default set that matches `BuildHistoryEvent` on the server side:
| Field | Browse path | Notes |
| --- | --- | --- |
| `EventId` | `EventId` | BaseEventType — stable unique id. |
| `SourceName` | `SourceName` | Source-object name. |
| `Time` | `Time` | Process-side event timestamp. Used for `OccurrenceTime`. |
| `Message` | `Message` | LocalizedText payload. |
| `Severity` | `Severity` | OPC UA 1-1000 scale. |
| `ReceiveTime` | `ReceiveTime` | Server-side ingest timestamp. |
Custom SelectClauses are supported — pass any
`IReadOnlyList<SimpleAttributeSpec>`. Each entry's `TypeDefinitionId`
defaults to `BaseEventType` when `null`; pass an explicit NodeId text (e.g.
`"i=2782"` for `ConditionType`) to reach typed-condition fields.
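A hedged sketch of a custom SelectClause list that reaches a `ConditionType`
field; the exact `SimpleAttributeSpec` shape and property names are assumptions
based on the description above, not verified against the repo:

```csharp
using System.Collections.Generic;

// Illustrative only: two BaseEventType fields plus one ConditionType-scoped field.
var selectClauses = new List<SimpleAttributeSpec>
{
    new() { FieldName = "EventId" },                                     // TypeDefinitionId null => BaseEventType
    new() { FieldName = "Message" },
    new() { FieldName = "ConditionName", TypeDefinitionId = "i=2782" },  // ConditionType (i=2782)
};
// Passed on the event-history request's SelectClauses; returned rows key their
// Fields dictionary by these FieldName values.
```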
### WhereClause
`ContentFilterSpec.EncodedOperands` carries the binary-encoded
`ContentFilter` from the wire. The driver decodes it into the SDK
`ContentFilter` and attaches it to the outgoing `EventFilter` verbatim — the
OPC UA Client driver is a passthrough for filter semantics, it does not
evaluate them. A malformed filter is dropped silently; the SelectClause
projection still goes out.
### Continuation points
Returned in `HistoricalEventBatch.ContinuationPoint`. The server-side
HistoryRead facade is responsible for round-tripping these so a paged event
read against a chatty upstream completes incrementally. The driver itself
doesn't track them — every `ReadEventsAsync` call issues a fresh
`HistoryReadAsync`.
## HistoryRead Aggregates (Part 13 catalog)
`IHistoryProvider.ReadProcessedAsync` takes a `HistoryAggregateType` and the
driver maps it to the standard `Opc.Ua.ObjectIds.AggregateFunction_*` NodeId
in `MapAggregateToNodeId`. PR-13 (issue #285) extended the enum from the
original 5 values (Average / Minimum / Maximum / Total / Count) to the full
OPC UA Part 13 §5 catalog — ~30 aggregates.
The mapping is best-effort: not every upstream OPC UA server implements every
aggregate. Aggregates the upstream rejects come back with
`StatusCode=BadAggregateNotSupported` on the per-row HistoryRead result; the
driver passes that through verbatim (cascading-quality rule, Part 11 §8) — it
does not throw. Servers advertise the aggregates they support via the
`AggregateConfiguration` object on the `Server` node; clients can probe it at
runtime.
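The mapping itself is a plain switch onto the SDK's well-known namespace-0
NodeIds. An illustrative excerpt follows; the real `MapAggregateToNodeId`
covers the full catalog, and the `HistoryAggregateType` member names are taken
from the table below:

```csharp
using System;
using Opc.Ua;

// Illustrative excerpt of the enum -> NodeId mapping; the ObjectIds fields are the SDK's
// standard AggregateFunction identifiers from OPC UA Part 13.
static NodeId MapAggregateToNodeId(HistoryAggregateType aggregate) => aggregate switch
{
    HistoryAggregateType.Average     => ObjectIds.AggregateFunction_Average,
    HistoryAggregateType.Minimum     => ObjectIds.AggregateFunction_Minimum,
    HistoryAggregateType.Maximum     => ObjectIds.AggregateFunction_Maximum,
    HistoryAggregateType.Total       => ObjectIds.AggregateFunction_Total,
    HistoryAggregateType.Count       => ObjectIds.AggregateFunction_Count,
    HistoryAggregateType.TimeAverage => ObjectIds.AggregateFunction_TimeAverage,
    HistoryAggregateType.Delta       => ObjectIds.AggregateFunction_Delta,
    // ...remaining Part 13 aggregates follow the same pattern...
    _ => throw new ArgumentOutOfRangeException(nameof(aggregate)),
};
```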
### Catalog
| Enum value | SDK NodeId field | Part 13 § | Server-side support | Typical use |
| --- | --- | --- | --- | --- |
| `Average` | `AggregateFunction_Average` | §5.4 | almost always | smoothing |
| `Minimum` | `AggregateFunction_Minimum` | §5.5 | almost always | low watermark |
| `Maximum` | `AggregateFunction_Maximum` | §5.6 | almost always | high watermark |
| `Total` | `AggregateFunction_Total` | §5.10 | usually | totalisation |
| `Count` | `AggregateFunction_Count` | §5.18 | almost always | sample count |
| `TimeAverage` | `AggregateFunction_TimeAverage` | §5.4.2 | usually | time-weighted mean |
| `TimeAverage2` | `AggregateFunction_TimeAverage2` | §5.4.3 | sometimes | bounded time-weighted mean |
| `Interpolative` | `AggregateFunction_Interpolative` | §5.3 | usually | trend snapshot |
| `MinimumActualTime` | `AggregateFunction_MinimumActualTime` | §5.5.4 | sometimes | when low occurred |
| `MaximumActualTime` | `AggregateFunction_MaximumActualTime` | §5.6.4 | sometimes | when high occurred |
| `Range` | `AggregateFunction_Range` | §5.7 | usually | spread |
| `Range2` | `AggregateFunction_Range2` | §5.7 | sometimes | bounded spread |
| `AnnotationCount` | `AggregateFunction_AnnotationCount` | §5.21 | rarely | operator notes |
| `DurationGood` | `AggregateFunction_DurationGood` | §5.16 | sometimes | quality coverage |
| `DurationBad` | `AggregateFunction_DurationBad` | §5.16 | sometimes | gap accounting |
| `PercentGood` | `AggregateFunction_PercentGood` | §5.17 | sometimes | quality % |
| `PercentBad` | `AggregateFunction_PercentBad` | §5.17 | sometimes | gap % |
| `WorstQuality` | `AggregateFunction_WorstQuality` | §5.20 | sometimes | worst seen |
| `WorstQuality2` | `AggregateFunction_WorstQuality2` | §5.20 | rarely | bounded worst |
| `StandardDeviationSample` | `AggregateFunction_StandardDeviationSample` | §5.13 | sometimes | n-1 stddev |
| `StandardDeviationPopulation` | `AggregateFunction_StandardDeviationPopulation` | §5.13 | sometimes | n stddev |
| `VarianceSample` | `AggregateFunction_VarianceSample` | §5.13 | sometimes | n-1 variance |
| `VariancePopulation` | `AggregateFunction_VariancePopulation` | §5.13 | sometimes | n variance |
| `NumberOfTransitions` | `AggregateFunction_NumberOfTransitions` | §5.12 | sometimes | event count |
| `DurationInStateZero` | `AggregateFunction_DurationInStateZero` | §5.19 | sometimes | OFF time |
| `DurationInStateNonZero` | `AggregateFunction_DurationInStateNonZero` | §5.19 | sometimes | ON time |
| `Start` | `AggregateFunction_Start` | §5.8 | usually | first sample |
| `End` | `AggregateFunction_End` | §5.9 | usually | last sample |
| `Delta` | `AggregateFunction_Delta` | §5.11 | usually | end-start |
| `StartBound` | `AggregateFunction_StartBound` | §5.8 | sometimes | extrapolated start |
| `EndBound` | `AggregateFunction_EndBound` | §5.9 | sometimes | extrapolated end |
"Server-side support" is heuristic — see your upstream's `AggregateConfiguration`
node for the authoritative list. AVEVA Historian, KEPServerEX, Prosys, and
opc-plc each implement different subsets.
### Driver-side validation
The mapping itself is unit-tested over the full enum
(`OpcUaClientAggregateMappingTests`) — every value resolves to a non-null
namespace-0 NodeId, and the original 5 ordinals stay pinned. Wire-side
behaviour against a live server is exercised by
`OpcUaClientAggregateSweepTests` (build-only scaffold pending an opc-plc
history-sim profile).
## Upstream redundancy (`ServerArray`)
When the upstream OPC UA server is itself a redundant pair (warm or hot per
OPC UA Part 4 §6.6.2), the driver supports **mid-session failover** driven by
the upstream's own `Server.ServerRedundancy.RedundancySupport` +
`ServerUriArray` + `Server.ServiceLevel` nodes. Distinct from the static
boot-time failover sweep on `EndpointUrls`: that path picks a single survivor
at session-create time; this path swaps the active session live when the
upstream signals degradation, transferring subscriptions onto the secondary so
monitored-item handles stay valid.
### Configuration
| Option | Default | Notes |
| --- | --- | --- |
| `Redundancy.Enabled` | `false` | Opt-in. When `false`, the driver doesn't read `RedundancySupport` / `ServerUriArray` and doesn't subscribe to `ServiceLevel`. |
| `Redundancy.ServiceLevelThreshold` | `200` | Byte value below which the driver triggers failover. OPC UA spec convention: 200+ = healthy primary, 100..199 = degraded, 0..99 = unrecoverable. |
| `Redundancy.RecheckInterval` | `5s` | Lower bound between two consecutive failovers — suppresses oscillation when ServiceLevel flaps around the threshold. |
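A small sketch of how those two options combine into a single failover
decision; the names are illustrative, not the driver's actual members:

```csharp
using System;

// Illustrative gate: fail over only when ServiceLevel drops below the threshold AND the
// previous swap is older than RecheckInterval (suppresses oscillation around the threshold).
sealed class FailoverGate
{
    private readonly byte _threshold;            // Redundancy.ServiceLevelThreshold
    private readonly TimeSpan _recheckInterval;  // Redundancy.RecheckInterval
    private DateTime _lastFailoverUtc = DateTime.MinValue;

    public FailoverGate(byte threshold = 200, TimeSpan? recheckInterval = null)
    {
        _threshold = threshold;
        _recheckInterval = recheckInterval ?? TimeSpan.FromSeconds(5);
    }

    // Evaluated on every ServiceLevel notification.
    public bool ShouldFailover(byte serviceLevel, DateTime nowUtc)
    {
        if (serviceLevel >= _threshold) return false;                   // healthy enough, do nothing
        if (nowUtc - _lastFailoverUtc < _recheckInterval) return false; // too soon after the last swap
        _lastFailoverUtc = nowUtc;
        return true;
    }
}
```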
### Behaviour
- At session activation the driver reads
`Server.ServerRedundancy.RedundancySupport`. When `None`, the driver records
an empty peer list and the failover path becomes a no-op (`ServiceLevel`
drops are still observable via diagnostics but trigger nothing).
- When the upstream advertises `Cold` / `Warm` / `WarmActive` / `Hot`, the
driver pulls `Server.ServerRedundancy.ServerUriArray` for the peer list,
falling back to the top-level `Server.ServerArray` for legacy upstreams that
don't expose the redundancy node.
- A dedicated subscription on `Server.ServiceLevel` (publish interval 1s,
separate from the alarm + data subscriptions) drives every failover decision
via the SDK's notification path — no polling loop.
- On a drop below `ServiceLevelThreshold` the driver picks the next URI in the
peer list that isn't the active one, opens a parallel session against it,
and calls `Session.TransferSubscriptionsAsync(other, sendInitialValues:true)`
to migrate every live subscription (data + alarm + model-change +
service-level itself). On success the driver swaps `Session`, closes the
old one, and bumps `RedundancyFailoverCount` (see the sketch after this list).
- On any failure (`BadSecureChannelClosed`, `BadCertificateUntrusted`,
`TransferSubscriptions` returning `false`, secondary unreachable) the driver
leaves the existing session untouched, increments
`RedundancyFailoverFailures`, and waits for the next ServiceLevel
notification. The keep-alive watchdog continues to cover full
upstream-loss scenarios.
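A hedged sketch of the swap itself, assuming the `TransferSubscriptionsAsync`
call named above and a hypothetical `CreateSecondarySessionAsync` helper; the
real driver also holds `_gate`, updates diagnostics, and handles the
certificate caveat below:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Opc.Ua.Client;

// Illustrative only: open a parallel session, migrate subscriptions, then swap.
static async Task<Session?> TryFailoverAsync(Session active, string candidateUri, CancellationToken ct)
{
    // Hypothetical helper: creates a session against the secondary with the same client certificate.
    Session secondary = await CreateSecondarySessionAsync(candidateUri, ct);

    // Move every live subscription (data, alarm, model-change, service-level) onto the new
    // session so downstream monitored-item handles stay valid.
    var subscriptions = new SubscriptionCollection(active.Subscriptions);
    bool transferred = await secondary.TransferSubscriptionsAsync(subscriptions, sendInitialValues: true);
    if (!transferred)
    {
        await secondary.CloseAsync(ct);   // leave the existing session untouched on any failure
        return null;
    }

    await active.CloseAsync(ct);          // old session is no longer needed
    return secondary;                     // caller swaps its Session reference and bumps the counter
}

// Stub for the sketch; the driver's real session factory applies the security options etc.
static Task<Session> CreateSecondarySessionAsync(string endpointUrl, CancellationToken ct)
    => throw new System.NotImplementedException();
```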
### Shared client-cert prerequisite
`TransferSubscriptionsAsync` requires the secondary's secure channel to accept
the same client certificate the primary did. Operators running heterogeneous
secondaries (different cert trust stores) will see `BadCertificateUntrusted`
on every failover attempt and the failures counter climbing. The fix is to
push the gateway driver's application-instance certificate into both
upstreams' `TrustedPeerCertificates` store before enabling redundancy. A
follow-up adds a fallback path that re-creates subscriptions instead of
transferring when the secondary rejects the channel.
### Diagnostics
The `driver-diagnostics` RPC surfaces three new entries via
`DriverHealth.Diagnostics`:
| Key | Type | Notes |
| --- | --- | --- |
| `RedundancyFailoverCount` | `double` (long-counted) | Successful mid-session swaps since driver start. |
| `RedundancyFailoverFailures` | `double` (long-counted) | Swap attempts that bailed (TransferSubscriptions false, secondary unreachable, etc.). |
| `ActiveServerUri` | string (in `OpcUaClientDiagnostics.ActiveServerUri`) | URI of the upstream the driver is currently bound to. Updates on every successful failover. |
### Forced-failover runbook
To validate the wiring against a real redundant upstream pair:
1. Confirm the upstream advertises `RedundancySupport != None` and a
non-empty `ServerUriArray`. Use the Client CLI:
`dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- redundancy -u <primary>`.
2. Set `Redundancy.Enabled = true` on the gateway's `OpcUaClient` driver
instance and restart.
3. Tail driver diagnostics:
`driver-diagnostics --instance <id>` — note `RedundancyFailoverCount = 0`
pre-test.
4. Drive a `ServiceLevel` drop on the primary. On AVEVA / KEPServer this is
typically a "force standby" Admin action; on a custom server it's a write
to the simulated ServiceLevel node.
5. Observe `RedundancyFailoverCount = 1` within `RecheckInterval` of the
drop, the gateway's `HostName` swap to the secondary URI, and downstream
reads/subscriptions continuing without interruption.
For non-redundant upstreams (single-server deployments) the recommended
configuration is to leave `Redundancy.Enabled = false` and rely on
`EndpointUrls` for boot-time failover only.