# OPC UA Client driver

A Tier-A in-process driver that opens a `Session` against a remote OPC UA server and re-exposes its address space through the local OtOpcUa server. This is the "gateway / aggregation" direction — opposite to the usual "server exposes PLC data" flow.

For the test fixture (opc-plc) see OpcUaClient-Test-Fixture.md. For the configuration surface see `OpcUaClientDriverOptions` in `src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs`.
## Auto re-import on ModelChangeEvent

The driver subscribes to `BaseModelChangeEventType` (and its subtype `GeneralModelChangeEventType`) on the upstream Server node (`i=2253`) at the end of `InitializeAsync`. When the upstream server advertises a topology change, the driver coalesces events over a debounce window and runs a single re-import (equivalent to calling `ReinitializeAsync` — internally `ShutdownAsync` + `InitializeAsync`).
### Configuration

| Option | Default | Notes |
|---|---|---|
| `WatchModelChanges` | `true` | Disable to skip the watch entirely (no extra subscription, no re-import on topology change). |
| `ModelChangeDebounce` | `5s` | Coalescing window. The first event starts the timer; further events extend it; when it elapses with no new events, the driver fires one re-import. |
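A minimal wiring sketch, assuming `OpcUaClientDriverOptions` exposes these members as a `bool` and a `TimeSpan` (the option names come from the table above; the property types are assumptions):

```csharp
var options = new OpcUaClientDriverOptions
{
    WatchModelChanges   = true,                     // keep the model-change watch on
    ModelChangeDebounce = TimeSpan.FromSeconds(5),  // coalescing window for event bursts
};
```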
### Behaviour

- One model-change subscription per driver instance, separate from the data + alarm subscriptions. Created best-effort: a server that doesn't advertise the event types or rejects the `EventFilter` falls through to no-watch — `InitializeAsync` still succeeds.
- The `EventFilter` selects only the `EventType` field (a `WhereClause` constrains by `OfType BaseModelChangeEventType`). Payload fields like `Changes[]` are intentionally ignored: the driver always re-imports the full upstream root, so per-event delta tracking would just add wire overhead.
- Debounce is implemented via a single-shot `Timer`; every event calls `Timer.Change(window, Infinite)`, so a burst of N events triggers exactly one re-import after the window elapses with no further events (see the sketch after this list).
- The re-import path acquires the same `_gate` semaphore that `ReadAsync` / `WriteAsync` / `BrowseAsync` / `SubscribeAsync` use. Downstream callers see a brief browse-gap (≈ the upstream `DiscoverAsync` duration) while the gate is held — but no torn reads or split-batch writes.
- Failure during the re-import is best-effort: the next `ModelChangeEvent` triggers another attempt, and the keep-alive watchdog covers permanent upstream loss. Operators see failures through `DriverHealth.LastError` and the diagnostics counters.
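A hypothetical condensed sketch of that watch, using the OPC Foundation .NET stack's `Subscription` / `MonitoredItem` types. The class shape and the `reimport` callback are illustrative stand-ins for the driver's internals; the one-shot `Timer` pattern is the one described above.

```csharp
using System;
using System.Threading;
using Opc.Ua;
using Opc.Ua.Client;

// Illustrative model-change watch: one extra subscription on the upstream
// Server object (i=2253), filtered to BaseModelChangeEventType, plus a
// single-shot timer that coalesces event bursts into one re-import.
sealed class ModelChangeWatch : IDisposable
{
    readonly Timer _debounce;
    readonly TimeSpan _window;

    public ModelChangeWatch(Session session, TimeSpan window, Action reimport)
    {
        _window = window;
        // Timer starts disarmed; the first event arms it.
        _debounce = new Timer(_ => reimport(), null, Timeout.Infinite, Timeout.Infinite);

        var filter = new EventFilter();
        // Select only the EventType field; Changes[] is intentionally not selected.
        filter.SelectClauses.Add(new SimpleAttributeOperand
        {
            TypeDefinitionId = ObjectTypeIds.BaseEventType,
            BrowsePath       = new QualifiedNameCollection { new QualifiedName(BrowseNames.EventType) },
            AttributeId      = Attributes.Value,
        });
        // WhereClause: OfType(BaseModelChangeEventType); subtypes such as
        // GeneralModelChangeEventType match as well.
        filter.WhereClause.Push(FilterOperator.OfType,
            new LiteralOperand(ObjectTypeIds.BaseModelChangeEventType));

        var item = new MonitoredItem
        {
            StartNodeId = ObjectIds.Server,           // i=2253
            AttributeId = Attributes.EventNotifier,
            Filter      = filter,
        };
        // Every event (re)arms the one-shot timer: a burst of N events yields
        // one re-import once the window elapses with no further events.
        item.Notification += (_, _) => _debounce.Change(_window, Timeout.InfiniteTimeSpan);

        var subscription = new Subscription(session.DefaultSubscription) { PublishingEnabled = true };
        subscription.AddItem(item);
        session.AddSubscription(subscription);
        subscription.Create();  // best-effort in the driver: a rejected filter means no-watch
    }

    public void Dispose() => _debounce.Dispose();
}
```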
### When to disable

Flip `WatchModelChanges` to `false` when:

- The upstream topology is known-static (e.g. firmware-pinned PLC) and the driver should never run a re-import unprompted.
- The brief browse-gap during re-import is unacceptable and a manual `ReinitializeAsync` call from the operator is preferred.
- The upstream server fires spurious `ModelChangeEvent`s that don't reflect real topology changes, causing wasted re-imports. Tighten or disable rather than chasing the noise downstream.
## Reverse Connect (server-initiated)

OPC UA's reverse-connect mode flips the transport direction: instead of the client dialling the server, the server dials the client's listener. The upstream sends a `ReverseHello` and the client continues the OPC UA handshake on the inbound socket. Required for OT-DMZ deployments where the plant firewall only permits outbound traffic from the upstream — the gateway opens a listener, the upstream reaches out.
### Configuration

| Option | Default | Notes |
|---|---|---|
| `ReverseConnect.Enabled` | `false` | Opt-in. When `true`, replaces the failover dial-sweep with a `WaitForConnection` call. |
| `ReverseConnect.ListenerUrl` | `null` | Local listener URL the SDK binds. Typically `opc.tcp://0.0.0.0:4844` (any interface) or a specific NIC for multi-homed gateways. Required when `Enabled` is `true`. |
| `ReverseConnect.ExpectedServerUri` | `null` | Upstream's `ApplicationUri`, used to filter inbound dials. `null` accepts the first connection (only safe with one upstream targeting the listener). |
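A minimal configuration sketch. The option names come from the table above; the nested options type name and the property types are assumptions about how `OpcUaClientDriverOptions` is shaped:

```csharp
var options = new OpcUaClientDriverOptions
{
    ReverseConnect = new ReverseConnectOptions          // nested type name assumed
    {
        Enabled           = true,
        ListenerUrl       = "opc.tcp://0.0.0.0:4844",   // bind all interfaces
        ExpectedServerUri = "urn:plant:upstream-plc",   // illustrative ApplicationUri
    },
};
```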
### Shared listener (singleton)

A single underlying `Opc.Ua.Client.ReverseConnectManager` per process, keyed on `ListenerUrl`. Two driver instances that share a listener URL multiplex onto one TCP socket; the SDK demuxes inbound dials by the upstream's reported `ServerUri`. The wrapper (`ReverseConnectListener`) is reference-counted — the first `Acquire` binds the port, the last `Release` tears it down. This lets drivers come and go independently without races on port-bind / port-unbind (see the sketch after this list).

When two drivers share a listener:

- They MUST set `ExpectedServerUri` to disambiguate; otherwise the first upstream to dial in wins regardless of which driver is waiting.
- They CAN come and go independently; the listener stays alive while at least one driver references it.
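A minimal sketch of that reference counting. `ReverseConnectListener` is the wrapper named above; the registry shape, field names, and where the SDK manager gets bound are all illustrative:

```csharp
using System;
using System.Collections.Generic;

// Illustrative shared-listener registry: the first Acquire for a URL binds
// the port, the last Release tears it down; drivers in between just bump refs.
sealed class ReverseConnectListener : IDisposable
{
    static readonly object Sync = new();
    static readonly Dictionary<string, ReverseConnectListener> ByUrl = new();

    readonly string _url;
    int _refs;

    ReverseConnectListener(string url)
    {
        _url = url;
        // Bind the underlying Opc.Ua.Client.ReverseConnectManager to `url` here.
    }

    public static ReverseConnectListener Acquire(string listenerUrl)
    {
        lock (Sync)
        {
            if (!ByUrl.TryGetValue(listenerUrl, out var listener))
                ByUrl[listenerUrl] = listener = new ReverseConnectListener(listenerUrl); // first ref binds the port
            listener._refs++;
            return listener;
        }
    }

    public void Dispose()  // Release
    {
        lock (Sync)
        {
            if (--_refs > 0) return;   // other drivers still hold the listener
            ByUrl.Remove(_url);
            // Last ref gone: stop the manager so the port can be re-bound later.
        }
    }
}
```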
### Behaviour

- The dial path is bypassed entirely when `Enabled` is `true`. Failover across multiple `EndpointUrls` doesn't apply — there's no client-side dial to fail over.
- `ExpectedServerUri` is the SDK's filter parameter to `WaitForConnectionAsync`. Inbound `ReverseHello`s from a different upstream are ignored and the caller keeps waiting.
- The same `EndpointDescription` derivation runs as on the dial path — the first `EndpointUrl` in the candidate list seeds `SecurityPolicy` / `SecurityMode` / `EndpointUrl` for the session-create call. The actual endpoint lives on the upstream and the SDK reconciles after the `ReverseHello`.
- Cancellation: `Timeout` bounds the wait. A stuck listener with no inbound dial throws after `Timeout` rather than hanging init forever.
- Shutdown releases the listener reference. The last release stops the listener so the port can be re-bound by a future driver lifecycle.
### Wiring it up on the upstream

The upstream OPC UA server has to be configured to dial out. The opc-plc simulator does this with `--rc=opc.tcp://<gateway-host>:4844`; for a real upstream see your server's reverse-connect docs (most major implementations expose a "ReverseConnect.Endpoint" config knob).
### When NOT to use

- Standard plant networks where the gateway can dial the upstream — the conventional dial path is simpler and supports failover natively.
- Public-internet OPC UA: reverse-connect is a network-policy workaround, not a security primitive. Always pair with `Sign` or `SignAndEncrypt` and a vetted user-token policy.
## HistoryRead Events

The driver passes OPC UA HistoryRead Events through to the upstream server. HistoryRead Raw / Processed / AtTime ship in the same code path (`ExecuteHistoryReadAsync`); event history takes a slightly different shape because the client sends an `EventFilter` (`SelectClauses` + `WhereClause`) rather than a plain numeric / time-based detail block.
### Wire path

`IHistoryProvider.ReadEventsAsync(fullReference, EventHistoryRequest, ct)` translates to:

```csharp
// Schematic: the request's time window, count cap, and filter map 1:1.
new ReadEventDetails
{
    StartTime        = /* EventHistoryRequest start */,
    EndTime          = /* EventHistoryRequest end */,
    NumValuesPerNode = /* EventHistoryRequest max values per node */,
    Filter           = new EventFilter { /* SelectClauses + WhereClause */ },
}
```
…and is sent through `Session.HistoryReadAsync` to the upstream server. The returned `HistoryEvent.Events` collection (one `HistoryEventFieldList` per historical event) is unwrapped into `HistoricalEventBatch.Events`, where each `HistoricalEventRow.Fields` dictionary is keyed by the `SimpleAttributeSpec.FieldName` the caller supplied. The server-side history dispatcher uses those keys to align fields with the wire-side `SelectClause` order — drivers don't have to honour the entire OPC UA `EventFilter` shape verbatim.
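A hedged sketch of that unwrap. `HistoryEvent` and `HistoryEventFieldList` are the SDK types; `HistoricalEventRow` and `SimpleAttributeSpec` are named in the text above, but their constructor and member shapes here are assumptions:

```csharp
using System.Collections.Generic;
using Opc.Ua;

// Pair each event's fields positionally with the caller's SelectClauses and
// key the row dictionary by the caller-supplied FieldName.
static List<HistoricalEventRow> Unwrap(
    HistoryEvent history, IReadOnlyList<SimpleAttributeSpec> selectSpecs)
{
    var rows = new List<HistoricalEventRow>();
    foreach (HistoryEventFieldList evt in history.Events)
    {
        var fields = new Dictionary<string, object?>();
        // EventFields come back in SelectClause order; align by index.
        for (int i = 0; i < selectSpecs.Count && i < evt.EventFields.Count; i++)
            fields[selectSpecs[i].FieldName] = evt.EventFields[i].Value;
        rows.Add(new HistoricalEventRow(fields));  // ctor shape assumed
    }
    return rows;
}
```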
### SelectClauses

When `EventHistoryRequest.SelectClauses` is `null` the driver falls back to a default set that matches `BuildHistoryEvent` on the server side:

| Field | Browse path | Notes |
|---|---|---|
| `EventId` | `EventId` | `BaseEventType` — stable unique id. |
| `SourceName` | `SourceName` | Source-object name. |
| `Time` | `Time` | Process-side event timestamp. Used for `OccurrenceTime`. |
| `Message` | `Message` | `LocalizedText` payload. |
| `Severity` | `Severity` | OPC UA 1-1000 scale. |
| `ReceiveTime` | `ReceiveTime` | Server-side ingest timestamp. |
Custom `SelectClauses` are supported — pass any `IReadOnlyList<SimpleAttributeSpec>`. Each entry's `TypeDefinitionId` defaults to `BaseEventType` when `null`; pass an explicit NodeId text (e.g. `"i=2782"` for `ConditionType`) to reach typed-condition fields.
### WhereClause

`ContentFilterSpec.EncodedOperands` carries the binary-encoded `ContentFilter` from the wire. The driver decodes it into the SDK `ContentFilter` and attaches it to the outgoing `EventFilter` verbatim — the OPC UA Client driver is a passthrough for filter semantics; it does not evaluate them. A malformed filter is dropped silently; the `SelectClause` projection still goes out.
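For shape, a sketch of the kind of filter a caller might encode into `EncodedOperands` ("Severity >= 500"), built with the stack's `ContentFilter` type. This is only to show what travels on the wire; the driver never builds or evaluates filters itself:

```csharp
using Opc.Ua;

// "Severity >= 500": the classic event-severity threshold filter.
var whereClause = new ContentFilter();
whereClause.Push(FilterOperator.GreaterThanOrEqual, new object[]
{
    new SimpleAttributeOperand
    {
        TypeDefinitionId = ObjectTypeIds.BaseEventType,
        BrowsePath       = new QualifiedNameCollection { new QualifiedName("Severity") },
        AttributeId      = Attributes.Value,
    },
    new LiteralOperand((ushort)500),   // Severity is a UInt16 on the wire
});
```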
### Continuation points

Returned in `HistoricalEventBatch.ContinuationPoint`. The server-side HistoryRead facade is responsible for round-tripping these so a paged event read against a chatty upstream completes incrementally. The driver itself doesn't track them — every `ReadEventsAsync` call issues a fresh `HistoryReadAsync`.
## HistoryRead Aggregates (Part 13 catalog)

`IHistoryProvider.ReadProcessedAsync` takes a `HistoryAggregateType` and the driver maps it to the standard `Opc.Ua.ObjectIds.AggregateFunction_*` NodeId in `MapAggregateToNodeId` (sketched below). PR-13 (issue #285) extended the enum from the original 5 values (Average / Minimum / Maximum / Total / Count) to the full OPC UA Part 13 §5 catalog — ~30 aggregates.

The mapping is best-effort: not every upstream OPC UA server implements every aggregate. Aggregates the upstream rejects come back with `StatusCode = BadAggregateNotSupported` on the per-row HistoryRead result; the driver passes that through verbatim (cascading-quality rule, Part 11 §8) — it does not throw. Servers advertise the aggregates they support via the `AggregateConfiguration` object on the Server node; clients can probe it at runtime.
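The mapping itself is a plain lookup. A condensed sketch over the `HistoryAggregateType` enum; the `ObjectIds.AggregateFunction_*` fields are real namespace-0 constants in the SDK, and the switch is abbreviated to a few of the ~30 arms:

```csharp
using System;
using Opc.Ua;

// Condensed MapAggregateToNodeId: every enum value resolves to a standard
// namespace-0 AggregateFunction node id (abbreviated; the real switch covers
// the full Part 13 catalog).
static NodeId MapAggregateToNodeId(HistoryAggregateType aggregate) => aggregate switch
{
    HistoryAggregateType.Average     => ObjectIds.AggregateFunction_Average,
    HistoryAggregateType.Minimum     => ObjectIds.AggregateFunction_Minimum,
    HistoryAggregateType.Maximum     => ObjectIds.AggregateFunction_Maximum,
    HistoryAggregateType.Total       => ObjectIds.AggregateFunction_Total,
    HistoryAggregateType.Count       => ObjectIds.AggregateFunction_Count,
    HistoryAggregateType.TimeAverage => ObjectIds.AggregateFunction_TimeAverage,
    // ...remaining Part 13 catalog values elided...
    _ => throw new ArgumentOutOfRangeException(nameof(aggregate)),
};
```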
### Catalog

| Enum value | SDK NodeId field | Part 13 § | Server-side support | Typical use |
|---|---|---|---|---|
| `Average` | `AggregateFunction_Average` | §5.4 | almost always | smoothing |
| `Minimum` | `AggregateFunction_Minimum` | §5.5 | almost always | low watermark |
| `Maximum` | `AggregateFunction_Maximum` | §5.6 | almost always | high watermark |
| `Total` | `AggregateFunction_Total` | §5.10 | usually | totalisation |
| `Count` | `AggregateFunction_Count` | §5.18 | almost always | sample count |
| `TimeAverage` | `AggregateFunction_TimeAverage` | §5.4.2 | usually | time-weighted mean |
| `TimeAverage2` | `AggregateFunction_TimeAverage2` | §5.4.3 | sometimes | bounded time-weighted mean |
| `Interpolative` | `AggregateFunction_Interpolative` | §5.3 | usually | trend snapshot |
| `MinimumActualTime` | `AggregateFunction_MinimumActualTime` | §5.5.4 | sometimes | when low occurred |
| `MaximumActualTime` | `AggregateFunction_MaximumActualTime` | §5.6.4 | sometimes | when high occurred |
| `Range` | `AggregateFunction_Range` | §5.7 | usually | spread |
| `Range2` | `AggregateFunction_Range2` | §5.7 | sometimes | bounded spread |
| `AnnotationCount` | `AggregateFunction_AnnotationCount` | §5.21 | rarely | operator notes |
| `DurationGood` | `AggregateFunction_DurationGood` | §5.16 | sometimes | quality coverage |
| `DurationBad` | `AggregateFunction_DurationBad` | §5.16 | sometimes | gap accounting |
| `PercentGood` | `AggregateFunction_PercentGood` | §5.17 | sometimes | quality % |
| `PercentBad` | `AggregateFunction_PercentBad` | §5.17 | sometimes | gap % |
| `WorstQuality` | `AggregateFunction_WorstQuality` | §5.20 | sometimes | worst seen |
| `WorstQuality2` | `AggregateFunction_WorstQuality2` | §5.20 | rarely | bounded worst |
| `StandardDeviationSample` | `AggregateFunction_StandardDeviationSample` | §5.13 | sometimes | n-1 stddev |
| `StandardDeviationPopulation` | `AggregateFunction_StandardDeviationPopulation` | §5.13 | sometimes | n stddev |
| `VarianceSample` | `AggregateFunction_VarianceSample` | §5.13 | sometimes | n-1 variance |
| `VariancePopulation` | `AggregateFunction_VariancePopulation` | §5.13 | sometimes | n variance |
| `NumberOfTransitions` | `AggregateFunction_NumberOfTransitions` | §5.12 | sometimes | event count |
| `DurationInStateZero` | `AggregateFunction_DurationInStateZero` | §5.19 | sometimes | OFF time |
| `DurationInStateNonZero` | `AggregateFunction_DurationInStateNonZero` | §5.19 | sometimes | ON time |
| `Start` | `AggregateFunction_Start` | §5.8 | usually | first sample |
| `End` | `AggregateFunction_End` | §5.9 | usually | last sample |
| `Delta` | `AggregateFunction_Delta` | §5.11 | usually | end-start |
| `StartBound` | `AggregateFunction_StartBound` | §5.8 | sometimes | extrapolated start |
| `EndBound` | `AggregateFunction_EndBound` | §5.9 | sometimes | extrapolated end |
"Server-side support" is heuristic — see your upstream's AggregateConfiguration
node for the authoritative list. AVEVA Historian, KEPServerEX, Prosys, and
opc-plc each implement different subsets.
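One way to probe at runtime: browse the standard `Server/ServerCapabilities/AggregateFunctions` folder, which organizes the aggregate-function objects a server supports (the node id is a namespace-0 SDK constant; the `Browser` usage is the stock Opc.Ua.Client pattern):

```csharp
using System;
using Opc.Ua;
using Opc.Ua.Client;

// List the aggregates the upstream actually advertises.
static void DumpSupportedAggregates(Session session)
{
    var browser = new Browser(session)
    {
        BrowseDirection = BrowseDirection.Forward,
        NodeClassMask   = (int)NodeClass.Object,
    };

    foreach (ReferenceDescription rd in
             browser.Browse(ObjectIds.Server_ServerCapabilities_AggregateFunctions))
        Console.WriteLine(rd.BrowseName);  // e.g. "Average", "TimeAverage", ...
}
```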
### Driver-side validation

The mapping itself is unit-tested over the full enum (`OpcUaClientAggregateMappingTests`) — every value resolves to a non-null namespace-0 NodeId, and the original 5 ordinals stay pinned. Wire-side behaviour against a live server is exercised by `OpcUaClientAggregateSweepTests` (build-only scaffold pending an opc-plc history-sim profile).
## Upstream redundancy (ServerArray)

When the upstream OPC UA server is itself a redundant pair (warm or hot, per OPC UA Part 4 §6.6.2), the driver supports mid-session failover driven by the upstream's own `Server.ServerRedundancy.RedundancySupport` + `ServerUriArray` + `Server.ServiceLevel` nodes. This is distinct from the static boot-time failover sweep on `EndpointUrls`: that path picks a single survivor at session-create time; this path swaps the active session live when the upstream signals degradation, transferring subscriptions onto the secondary so monitored-item handles stay valid.
### Configuration

| Option | Default | Notes |
|---|---|---|
| `Redundancy.Enabled` | `false` | Opt-in. When `false`, the driver doesn't read `RedundancySupport` / `ServerUriArray` and doesn't subscribe to `ServiceLevel`. |
| `Redundancy.ServiceLevelThreshold` | `200` | Byte value below which the driver triggers failover. OPC UA spec convention: 200+ = healthy primary, 100..199 = degraded, 0..99 = unrecoverable. |
| `Redundancy.RecheckInterval` | `5s` | Lower bound between two consecutive failovers — suppresses oscillation when `ServiceLevel` flaps around the threshold. |
### Behaviour

- At session activation the driver reads `Server.ServerRedundancy.RedundancySupport`. When `None`, the driver records an empty peer list and the failover path becomes a no-op (`ServiceLevel` drops are still observable via diagnostics but trigger nothing).
- When the upstream advertises `Cold` / `Warm` / `WarmActive` / `Hot`, the driver pulls `Server.ServerRedundancy.ServerUriArray` for the peer list, falling back to the top-level `Server.ServerArray` for legacy upstreams that don't expose the redundancy node.
- A dedicated subscription on `Server.ServiceLevel` (publish interval 1 s, separate from the alarm + data subscriptions) drives every failover decision via the SDK's notification path — no polling loop.
- On a drop below `ServiceLevelThreshold` the driver picks the next URI in the peer list that isn't the active one, opens a parallel session against it, and calls `Session.TransferSubscriptionsAsync(other, sendInitialValues: true)` to migrate every live subscription (data + alarm + model-change + service-level itself). On success the driver swaps `Session`, closes the old one, and bumps `RedundancyFailoverCount` (see the sketch after this list).
- On any failure (`BadSecureChannelClosed`, `BadCertificateUntrusted`, `TransferSubscriptions` returning `false`, secondary unreachable) the driver leaves the existing session untouched, increments `RedundancyFailoverFailures`, and waits for the next `ServiceLevel` notification. The keep-alive watchdog continues to cover full upstream-loss scenarios.
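A hedged sketch of the swap step. `TransferSubscriptionsAsync` is the SDK call named above; the class shape and counter fields are illustrative stand-ins for the driver's internals, and opening the parallel session against the secondary is elided:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Opc.Ua.Client;

// Illustrative mid-session failover: transfer every live subscription onto a
// freshly opened session against the secondary, then swap. On failure the
// current session is left untouched.
sealed class RedundancyManager
{
    Session _session;        // active upstream session
    long _failoverCount;     // mirrors RedundancyFailoverCount
    long _failoverFailures;  // mirrors RedundancyFailoverFailures

    public RedundancyManager(Session initial) => _session = initial;

    // `candidate` is a parallel session already opened against the secondary URI.
    public async Task<bool> TrySwapAsync(Session candidate, CancellationToken ct)
    {
        var subscriptions = new SubscriptionCollection(_session.Subscriptions);
        bool transferred = await candidate.TransferSubscriptionsAsync(
            subscriptions, sendInitialValues: true, ct);

        if (!transferred)
        {
            Interlocked.Increment(ref _failoverFailures);  // leave _session untouched
            candidate.Close();
            return false;
        }

        var old = Interlocked.Exchange(ref _session, candidate);
        old.Close();
        Interlocked.Increment(ref _failoverCount);
        return true;  // caller records the new ActiveServerUri
    }
}
```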
### Shared client-cert prerequisite

`TransferSubscriptionsAsync` requires the secondary's secure channel to accept the same client certificate the primary did. Operators running heterogeneous secondaries (different cert trust stores) will see `BadCertificateUntrusted` on every failover attempt and the failures counter climbing. The fix is to push the gateway driver's application-instance certificate into both upstreams' `TrustedPeerCertificates` store before enabling redundancy. A follow-up adds a fallback path that re-creates subscriptions instead of transferring when the secondary rejects the channel.
### Diagnostics

The driver-diagnostics RPC surfaces three new counters via `DriverHealth.Diagnostics`:

| Key | Type | Notes |
|---|---|---|
| `RedundancyFailoverCount` | double (long-counted) | Successful mid-session swaps since driver start. |
| `RedundancyFailoverFailures` | double (long-counted) | Swap attempts that bailed (`TransferSubscriptions` false, secondary unreachable, etc.). |
| `ActiveServerUri` | string (in `OpcUaClientDiagnostics.ActiveServerUri`) | URI of the upstream the driver is currently bound to. Updates on every successful failover. |
### Forced-failover runbook

To validate the wiring against a real redundant upstream pair:

1. Confirm the upstream advertises `RedundancySupport != None` and a non-empty `ServerUriArray`. Use the Client CLI: `dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- redundancy -u <primary>`.
2. Set `Redundancy.Enabled = true` on the gateway's `OpcUaClient` driver instance and restart.
3. Tail driver diagnostics: `driver-diagnostics --instance <id>` — note `RedundancyFailoverCount = 0` pre-test.
4. Drive a `ServiceLevel` drop on the primary. On AVEVA / KEPServer this is typically a "force standby" Admin action; on a custom server it's a write to the simulated ServiceLevel node.
5. Observe `RedundancyFailoverCount = 1` within `RecheckInterval` of the drop, the gateway's `ActiveServerUri` swap to the secondary URI, and downstream reads/subscriptions continuing without interruption.
For non-redundant upstreams (single-server deployments) the recommended configuration is to leave `Redundancy.Enabled = false` and rely on `EndpointUrls` for boot-time failover only.