# OPC UA Client driver
Tier-A in-process driver that opens a `Session` against a remote OPC UA server
and re-exposes its address space through the local OtOpcUa server. This is the
"gateway / aggregation" direction — opposite to the usual "server exposes PLC
data" flow.

For the test fixture (opc-plc) see [`OpcUaClient-Test-Fixture.md`](OpcUaClient-Test-Fixture.md).
For the configuration surface see `OpcUaClientDriverOptions` in
[`src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs).
|
## Auto re-import on `ModelChangeEvent`
The driver subscribes to `BaseModelChangeEventType` (and its subtype
`GeneralModelChangeEventType`) on the upstream `Server` node (`i=2253`) at
the end of `InitializeAsync`. When the upstream server advertises a
topology change, the driver coalesces events over a debounce window and
runs a single re-import (equivalent to calling `ReinitializeAsync` —
internally `ShutdownAsync` + `InitializeAsync`).
### Configuration
| Option | Default | Notes |
| --- | --- | --- |
| `WatchModelChanges` | `true` | Disable to skip the watch entirely (no extra subscription, no re-import on topology change). |
| `ModelChangeDebounce` | `5s` | Coalescing window. The first event starts the timer; further events extend it; when it elapses with no new events, the driver fires one re-import. |
### Behaviour
- One model-change subscription per driver instance, separate from the
  data + alarm subscriptions. Created best-effort: a server that doesn't
  advertise the event types or rejects the `EventFilter` falls through to
  no-watch — `InitializeAsync` still succeeds.
- The `EventFilter` selects only the `EventType` field (a `WhereClause`
  constrains by `OfType BaseModelChangeEventType`). Payload fields like
  `Changes[]` are intentionally ignored: the driver always re-imports the
  full upstream root, so per-event delta tracking would just add wire
  overhead.
- Debounce is implemented via a single-shot `Timer`; every event calls
  `Timer.Change(window, Infinite)`, so a burst of N events triggers exactly
  one re-import after the window elapses with no further events.
- The re-import path acquires the same `_gate` semaphore that `ReadAsync`
  / `WriteAsync` / `BrowseAsync` / `SubscribeAsync` use. Downstream callers
  see a brief browse-gap (≈ the upstream `DiscoverAsync` duration) while
  the gate is held — but no torn reads or split-batch writes.
- Re-import failures are handled best-effort: the next `ModelChangeEvent`
  triggers another attempt, and the keep-alive watchdog covers permanent
  upstream loss. Operators see failures through `DriverHealth.LastError`
  + the diagnostics counters.
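The single-shot-timer coalescing can be sketched in a few lines of Python. This is an illustrative stand-in, not the driver's code: `threading.Timer` plays the role of the .NET `Timer`, and the class name is invented.

```python
import threading

class Debouncer:
    """Coalesces a burst of events into one callback that fires only after
    a quiet window with no further events (single-shot-timer pattern)."""

    def __init__(self, window_s, callback):
        self._window = window_s
        self._callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def signal(self):
        # Every event cancels and restarts the timer, extending the window —
        # the moral equivalent of Timer.Change(window, Infinite).
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._window, self._callback)
            self._timer.start()
```

A burst of N `signal()` calls inside the window produces exactly one callback, matching the "one re-import per burst" behaviour described above.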
### When to disable
Flip `WatchModelChanges` to `false` when:

- The upstream topology is known-static (e.g. firmware-pinned PLC) and
  the driver should never run a re-import unprompted.
- The brief browse-gap during re-import is unacceptable and a manual
  `ReinitializeAsync` call from the operator is preferred.
- The upstream server fires spurious `ModelChangeEvent`s that don't
  reflect real topology changes, causing wasted re-imports. Tighten or
  disable rather than chasing the noise downstream.
## Reverse Connect (server-initiated)
OPC UA's reverse-connect mode flips the transport direction: instead of the
client dialling the server, the **server** dials the client's listener. The
upstream sends a `ReverseHello` and the client continues the OPC UA
handshake on the inbound socket. This is required for OT-DMZ deployments
where the plant firewall only permits outbound traffic from the upstream —
the gateway opens a listener, the upstream reaches out.
### Configuration
| Option | Default | Notes |
| --- | --- | --- |
| `ReverseConnect.Enabled` | `false` | Opt-in. When `true`, replaces the failover dial-sweep with a `WaitForConnection` call. |
| `ReverseConnect.ListenerUrl` | `null` | Local listener URL the SDK binds. Typically `opc.tcp://0.0.0.0:4844` (any interface) or a specific NIC for multi-homed gateways. **Required when `Enabled` is `true`.** |
| `ReverseConnect.ExpectedServerUri` | `null` | Upstream's `ApplicationUri` to filter inbound dials. `null` accepts the first connection (only safe with one upstream targeting the listener). |
### Shared listener (singleton)
A single underlying `Opc.Ua.Client.ReverseConnectManager` exists per process,
keyed on `ListenerUrl`. Two driver instances that share a listener URL
multiplex onto one TCP socket; the SDK demuxes inbound dials by the
upstream's reported `ServerUri`. The wrapper (`ReverseConnectListener`) is
reference-counted — the first `Acquire` binds the port, the last `Release`
tears it down. This lets drivers come and go independently without races on
port-bind / port-unbind.

When two drivers share a listener:

- They MUST set `ExpectedServerUri` to disambiguate; otherwise the first
  upstream to dial in wins regardless of which driver is waiting.
- They CAN come and go independently; the listener stays alive while at
  least one driver references it.
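The reference-counting contract can be sketched as follows (Python stand-in; the real wrapper manages an SDK `ReverseConnectManager`, here a plain `bound` flag stands in for the port bind, and all names are illustrative):

```python
class SharedListener:
    """Process-wide, reference-counted listener registry keyed on URL."""
    _registry = {}

    def __init__(self, url):
        self.url = url
        self.refs = 0
        self.bound = False

    @classmethod
    def acquire(cls, url):
        # Same URL -> same listener instance (multiplexed onto one socket).
        listener = cls._registry.setdefault(url, cls(url))
        if listener.refs == 0:
            listener.bound = True  # first Acquire binds the port
        listener.refs += 1
        return listener

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            self.bound = False  # last Release tears the listener down
            del type(self)._registry[self.url]
```

Because bind/unbind happen only at the 0→1 and 1→0 reference transitions, two drivers can acquire and release in any order without racing on the port.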
### Behaviour
- The dial path is bypassed entirely when `Enabled` is `true`. Failover
  across multiple `EndpointUrls` doesn't apply — there's no client-side
  dial to fail over.
- `ExpectedServerUri` is the SDK's filter parameter to `WaitForConnectionAsync`.
  Inbound `ReverseHello`s from a different upstream are ignored and the
  caller keeps waiting.
- The same `EndpointDescription` derivation runs as in the dial path — the
  first `EndpointUrl` in the candidate list seeds `SecurityPolicy` /
  `SecurityMode` / `EndpointUrl` for the session-create call. The actual
  endpoint lives on the upstream and the SDK reconciles after the
  `ReverseHello`.
- Cancellation: `Timeout` bounds the wait. A stuck listener with no inbound
  dial throws after `Timeout` rather than hanging init forever.
- Shutdown releases the listener reference. The last release stops the
  listener so the port can be re-bound by a future driver lifecycle.
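The `ExpectedServerUri` filtering rule reduces to a few lines. This Python sketch treats inbound `ReverseHello`s as a stream of dicts — a hypothetical shape, not the SDK's:

```python
def wait_for_connection(inbound_hellos, expected_server_uri=None):
    """First matching inbound dial wins; ReverseHellos from other upstreams
    are skipped and the caller keeps waiting (here: keeps iterating)."""
    for hello in inbound_hellos:
        if expected_server_uri is None or hello["server_uri"] == expected_server_uri:
            return hello
    # In the driver, a wait with no matching dial throws after `Timeout`.
    raise TimeoutError("no matching inbound dial")
```

With `expected_server_uri=None` the first dial is accepted unconditionally — which is why `null` is only safe when a single upstream targets the listener.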
### Wiring it up on the upstream
The upstream OPC UA server has to be configured to dial out. The `opc-plc`
simulator does this with `--rc=opc.tcp://<gateway-host>:4844`; for a real
upstream see your server's reverse-connect docs (most major implementations
expose a "ReverseConnect.Endpoint" config knob).
### When NOT to use
- Standard plant networks where the gateway can dial the upstream — the
  conventional dial path is simpler and supports failover natively.
- Public-internet OPC UA: reverse-connect is a network-policy workaround,
  not a security primitive. Always pair it with `Sign` or `SignAndEncrypt`
  + a vetted user-token policy.
## HistoryRead Events
The driver passes `HistoryRead` event requests through to the upstream server.
HistoryRead Raw / Processed / AtTime ship in the same code path
(`ExecuteHistoryReadAsync`); event history takes a slightly different shape
because the client sends an `EventFilter` (SelectClauses + WhereClause) rather
than a plain numeric / time-based detail block.
### Wire path
`IHistoryProvider.ReadEventsAsync(fullReference, EventHistoryRequest, ct)`
translates to:

```
new ReadEventDetails {
    StartTime,
    EndTime,
    NumValuesPerNode,
    Filter = EventFilter { SelectClauses, WhereClause }
}
```

…and is sent through `Session.HistoryReadAsync` to the upstream server. The
returned `HistoryEvent.Events` collection (one `HistoryEventFieldList` per
historical event) is unwrapped into `HistoricalEventBatch.Events`, where each
`HistoricalEventRow.Fields` dictionary is keyed by the
`SimpleAttributeSpec.FieldName` the caller supplied. The server-side history
dispatcher uses those keys to align fields with the wire-side SelectClause
order — drivers don't have to honour the entire OPC UA `EventFilter` shape
verbatim.
### SelectClauses
When `EventHistoryRequest.SelectClauses` is `null` the driver falls back to a
default set that matches `BuildHistoryEvent` on the server side:

| Field | Browse path | Notes |
| --- | --- | --- |
| `EventId` | `EventId` | BaseEventType — stable unique id. |
| `SourceName` | `SourceName` | Source-object name. |
| `Time` | `Time` | Process-side event timestamp. Used for `OccurrenceTime`. |
| `Message` | `Message` | LocalizedText payload. |
| `Severity` | `Severity` | OPC UA 1-1000 scale. |
| `ReceiveTime` | `ReceiveTime` | Server-side ingest timestamp. |

Custom SelectClauses are supported — pass any
`IReadOnlyList<SimpleAttributeSpec>`. Each entry's `TypeDefinitionId`
defaults to `BaseEventType` when `null`; pass an explicit NodeId text (e.g.
`"i=2782"` for `ConditionType`) to reach typed-condition fields.
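The field-keying step — positional wire values zipped against the caller's `FieldName`s, with the default set as fallback — can be sketched like this (Python; function and variable names are illustrative):

```python
# Default SelectClause field names, mirroring the table above.
DEFAULT_SELECT = ["EventId", "SourceName", "Time", "Message", "Severity", "ReceiveTime"]

def unwrap_history_events(field_lists, select_field_names=None):
    """Turns each positional field list (one per historical event) into a
    dict keyed by the caller's SelectClause field names."""
    names = select_field_names or DEFAULT_SELECT
    return [dict(zip(names, values)) for values in field_lists]
```

Keying rows by `FieldName` rather than by wire position is what frees the server-side dispatcher from re-deriving the SelectClause order per driver.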
### WhereClause
`ContentFilterSpec.EncodedOperands` carries the binary-encoded
`ContentFilter` from the wire. The driver decodes it into the SDK
`ContentFilter` and attaches it to the outgoing `EventFilter` verbatim — the
OPC UA Client driver is a passthrough for filter semantics; it does not
evaluate them. A malformed filter is dropped silently; the SelectClause
projection still goes out.
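The drop-malformed-filters-silently rule is roughly the following (Python sketch; `decode` stands in for the SDK's binary decoder and the dict shape is invented):

```python
def attach_where_clause(event_filter, encoded_operands, decode):
    """Best-effort attach: a filter that fails to decode is dropped and the
    SelectClause projection goes out unchanged."""
    if encoded_operands is None:
        return event_filter
    try:
        event_filter["WhereClause"] = decode(encoded_operands)
    except ValueError:
        pass  # malformed filter: silently ignored, no error to the caller
    return event_filter
```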
### Continuation points
Returned in `HistoricalEventBatch.ContinuationPoint`. The server-side
HistoryRead facade is responsible for round-tripping these so a paged event
read against a chatty upstream completes incrementally. The driver itself
doesn't track them — every `ReadEventsAsync` call issues a fresh
`HistoryReadAsync`.
## HistoryRead Aggregates (Part 13 catalog)
`IHistoryProvider.ReadProcessedAsync` takes a `HistoryAggregateType` and the
driver maps it to the standard `Opc.Ua.ObjectIds.AggregateFunction_*` NodeId
in `MapAggregateToNodeId`. PR-13 (issue #285) extended the enum from the
original 5 values (Average / Minimum / Maximum / Total / Count) to the full
OPC UA Part 13 §5 catalog — ~30 aggregates.

The mapping is best-effort: not every upstream OPC UA server implements every
aggregate. Aggregates the upstream rejects come back with
`StatusCode=BadAggregateNotSupported` on the per-row HistoryRead result; the
driver passes that through verbatim (cascading-quality rule, Part 11 §8) — it
does not throw. Servers advertise the aggregates they support via the
`AggregateFunctions` folder under the `HistoryServerCapabilities` node;
clients can probe it at runtime.
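The best-effort contract — per-row status codes passed through verbatim, never exceptions — looks like this in miniature (Python; status strings stand in for OPC UA StatusCodes and the result shape is invented):

```python
def read_processed(upstream_results, aggregate):
    """Per-row HistoryRead result: an unsupported aggregate surfaces as a
    status code the caller can inspect, not as a thrown error."""
    if aggregate not in upstream_results:
        return {"status": "BadAggregateNotSupported", "values": []}
    return {"status": "Good", "values": upstream_results[aggregate]}
```

Downstream consumers therefore get a uniform row shape whether the upstream supports the aggregate or not, which is exactly the cascading-quality behaviour the spec asks for.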
### Catalog
| Enum value | SDK NodeId field | Part 13 § | Server-side support | Typical use |
| --- | --- | --- | --- | --- |
| `Average` | `AggregateFunction_Average` | §5.4 | almost always | smoothing |
| `Minimum` | `AggregateFunction_Minimum` | §5.5 | almost always | low watermark |
| `Maximum` | `AggregateFunction_Maximum` | §5.6 | almost always | high watermark |
| `Total` | `AggregateFunction_Total` | §5.10 | usually | totalisation |
| `Count` | `AggregateFunction_Count` | §5.18 | almost always | sample count |
| `TimeAverage` | `AggregateFunction_TimeAverage` | §5.4.2 | usually | time-weighted mean |
| `TimeAverage2` | `AggregateFunction_TimeAverage2` | §5.4.3 | sometimes | bounded time-weighted mean |
| `Interpolative` | `AggregateFunction_Interpolative` | §5.3 | usually | trend snapshot |
| `MinimumActualTime` | `AggregateFunction_MinimumActualTime` | §5.5.4 | sometimes | when low occurred |
| `MaximumActualTime` | `AggregateFunction_MaximumActualTime` | §5.6.4 | sometimes | when high occurred |
| `Range` | `AggregateFunction_Range` | §5.7 | usually | spread |
| `Range2` | `AggregateFunction_Range2` | §5.7 | sometimes | bounded spread |
| `AnnotationCount` | `AggregateFunction_AnnotationCount` | §5.21 | rarely | operator notes |
| `DurationGood` | `AggregateFunction_DurationGood` | §5.16 | sometimes | quality coverage |
| `DurationBad` | `AggregateFunction_DurationBad` | §5.16 | sometimes | gap accounting |
| `PercentGood` | `AggregateFunction_PercentGood` | §5.17 | sometimes | quality % |
| `PercentBad` | `AggregateFunction_PercentBad` | §5.17 | sometimes | gap % |
| `WorstQuality` | `AggregateFunction_WorstQuality` | §5.20 | sometimes | worst seen |
| `WorstQuality2` | `AggregateFunction_WorstQuality2` | §5.20 | rarely | bounded worst |
| `StandardDeviationSample` | `AggregateFunction_StandardDeviationSample` | §5.13 | sometimes | n-1 stddev |
| `StandardDeviationPopulation` | `AggregateFunction_StandardDeviationPopulation` | §5.13 | sometimes | n stddev |
| `VarianceSample` | `AggregateFunction_VarianceSample` | §5.13 | sometimes | n-1 variance |
| `VariancePopulation` | `AggregateFunction_VariancePopulation` | §5.13 | sometimes | n variance |
| `NumberOfTransitions` | `AggregateFunction_NumberOfTransitions` | §5.12 | sometimes | event count |
| `DurationInStateZero` | `AggregateFunction_DurationInStateZero` | §5.19 | sometimes | OFF time |
| `DurationInStateNonZero` | `AggregateFunction_DurationInStateNonZero` | §5.19 | sometimes | ON time |
| `Start` | `AggregateFunction_Start` | §5.8 | usually | first sample |
| `End` | `AggregateFunction_End` | §5.9 | usually | last sample |
| `Delta` | `AggregateFunction_Delta` | §5.11 | usually | end-start |
| `StartBound` | `AggregateFunction_StartBound` | §5.8 | sometimes | extrapolated start |
| `EndBound` | `AggregateFunction_EndBound` | §5.9 | sometimes | extrapolated end |

"Server-side support" is heuristic — see the `AggregateFunctions` folder under
your upstream's `HistoryServerCapabilities` node for the authoritative list.
AVEVA Historian, KEPServerEX, Prosys, and opc-plc each implement different
subsets.
### Driver-side validation
The mapping itself is unit-tested over the full enum
(`OpcUaClientAggregateMappingTests`) — every value resolves to a non-null
namespace-0 NodeId, and the original 5 ordinals stay pinned. Wire-side
behaviour against a live server is exercised by
`OpcUaClientAggregateSweepTests` (build-only scaffold pending an opc-plc
history-sim profile).
## Upstream redundancy (`ServerArray`)
When the upstream OPC UA server is itself a redundant pair (warm or hot per
OPC UA Part 4 §6.6.2), the driver supports **mid-session failover** driven by
the upstream's own `Server.ServerRedundancy.RedundancySupport` +
`ServerUriArray` + `Server.ServiceLevel` nodes. This is distinct from the
static boot-time failover sweep on `EndpointUrls`: that path picks a single
survivor at session-create time; this path swaps the active session live when
the upstream signals degradation, transferring subscriptions onto the
secondary so monitored-item handles stay valid.
### Configuration
| Option | Default | Notes |
| --- | --- | --- |
| `Redundancy.Enabled` | `false` | Opt-in. When `false`, the driver doesn't read `RedundancySupport` / `ServerUriArray` and doesn't subscribe to `ServiceLevel`. |
| `Redundancy.ServiceLevelThreshold` | `200` | Byte value below which the driver triggers failover. OPC UA spec convention: 200+ = healthy primary, 100..199 = degraded, 0..99 = unrecoverable. |
| `Redundancy.RecheckInterval` | `5s` | Lower bound between two consecutive failovers — suppresses oscillation when `ServiceLevel` flaps around the threshold. |
### Behaviour
- At session activation the driver reads
  `Server.ServerRedundancy.RedundancySupport`. When `None`, the driver records
  an empty peer list and the failover path becomes a no-op (`ServiceLevel`
  drops are still observable via diagnostics but trigger nothing).
- When the upstream advertises `Cold` / `Warm` / `WarmActive` / `Hot`, the
  driver pulls `Server.ServerRedundancy.ServerUriArray` for the peer list,
  falling back to the top-level `Server.ServerArray` for legacy upstreams that
  don't expose the redundancy node.
- A dedicated subscription on `Server.ServiceLevel` (publish interval 1s,
  separate from the alarm + data subscriptions) drives every failover decision
  via the SDK's notification path — no polling loop.
- On a drop below `ServiceLevelThreshold` the driver picks the next URI in the
  peer list that isn't the active one, opens a parallel session against it,
  and calls `Session.TransferSubscriptionsAsync(other, sendInitialValues: true)`
  to migrate every live subscription (data + alarm + model-change +
  service-level itself). On success the driver swaps `Session`, closes the
  old one, and bumps `RedundancyFailoverCount`.
- On any failure (`BadSecureChannelClosed`, `BadCertificateUntrusted`,
  `TransferSubscriptions` returning `false`, secondary unreachable) the driver
  leaves the existing session untouched, increments
  `RedundancyFailoverFailures`, and waits for the next `ServiceLevel`
  notification. The keep-alive watchdog continues to cover full
  upstream-loss scenarios.
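The threshold-plus-recheck decision can be sketched with an injected clock (Python; the class and its defaults mirror the options table above, but the names are illustrative, not the driver's):

```python
import time

class FailoverGovernor:
    """Decides whether a ServiceLevel notification should trigger failover:
    fire below the threshold, but never twice within the recheck interval."""

    def __init__(self, threshold=200, recheck_s=5.0, clock=time.monotonic):
        self._threshold = threshold
        self._recheck = recheck_s
        self._clock = clock
        self._last_failover = None

    def should_fail_over(self, service_level):
        if service_level >= self._threshold:
            return False  # healthy primary: nothing to do
        now = self._clock()
        if self._last_failover is not None and now - self._last_failover < self._recheck:
            return False  # suppress oscillation while ServiceLevel flaps
        self._last_failover = now
        return True
```

Injecting `clock` keeps the oscillation-suppression logic unit-testable without real sleeps, which is presumably why `RecheckInterval` is a plain lower bound rather than a schedule.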
### Shared client-cert prerequisite
`TransferSubscriptionsAsync` requires the secondary's secure channel to accept
the same client certificate the primary did. Operators running heterogeneous
secondaries (different cert trust stores) will see `BadCertificateUntrusted`
on every failover attempt and the failures counter climbing. The fix is to
push the gateway driver's application-instance certificate into both
upstreams' `TrustedPeerCertificates` store before enabling redundancy. A
follow-up adds a fallback path that re-creates subscriptions instead of
transferring when the secondary rejects the channel.
### Diagnostics
The `driver-diagnostics` RPC surfaces three new counters via
`DriverHealth.Diagnostics`:

| Key | Type | Notes |
| --- | --- | --- |
| `RedundancyFailoverCount` | `double` (long-counted) | Successful mid-session swaps since driver start. |
| `RedundancyFailoverFailures` | `double` (long-counted) | Swap attempts that bailed (TransferSubscriptions false, secondary unreachable, etc.). |
| `ActiveServerUri` | string (in `OpcUaClientDiagnostics.ActiveServerUri`) | URI of the upstream the driver is currently bound to. Updates on every successful failover. |
### Forced-failover runbook
To validate the wiring against a real redundant upstream pair:

1. Confirm the upstream advertises `RedundancySupport != None` and a
   non-empty `ServerUriArray`. Use the Client CLI:
   `dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- redundancy -u <primary>`.
2. Set `Redundancy.Enabled = true` on the gateway's `OpcUaClient` driver
   instance and restart.
3. Tail driver diagnostics:
   `driver-diagnostics --instance <id>` — note `RedundancyFailoverCount = 0`
   pre-test.
4. Drive a `ServiceLevel` drop on the primary. On AVEVA / KEPServer this is
   typically a "force standby" Admin action; on a custom server it's a write
   to the simulated ServiceLevel node.
5. Observe `RedundancyFailoverCount = 1` within `RecheckInterval` of the
   drop, the gateway's `ActiveServerUri` swap to the secondary URI, and
   downstream reads/subscriptions continuing without interruption.
For non-redundant upstreams (single-server deployments) the recommended
|
|
configuration is to leave `Redundancy.Enabled = false` and rely on
|
|
`EndpointUrls` for boot-time failover only.
|