# OPC UA Client driver

Tier-A in-process driver that opens a `Session` against a remote OPC UA server and re-exposes its address space through the local OtOpcUa server. This is the "gateway / aggregation" direction — opposite to the usual "server exposes PLC data" flow.

For the test fixture (opc-plc) see [`OpcUaClient-Test-Fixture.md`](OpcUaClient-Test-Fixture.md). For the configuration surface see `OpcUaClientDriverOptions` in [`src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs).

## Auto re-import on `ModelChangeEvent`

The driver subscribes to `BaseModelChangeEventType` (and its subtype `GeneralModelChangeEventType`) on the upstream `Server` node (`i=2253`) at the end of `InitializeAsync`. When the upstream server advertises a topology change, the driver coalesces events over a debounce window and runs a single re-import (equivalent to calling `ReinitializeAsync` — internally `ShutdownAsync` + `InitializeAsync`).

### Configuration

| Option | Default | Notes |
| --- | --- | --- |
| `WatchModelChanges` | `true` | Disable to skip the watch entirely (no extra subscription, no re-import on topology change). |
| `ModelChangeDebounce` | `5s` | Coalescing window. The first event starts the timer; further events extend it; when it elapses with no new events, the driver fires one re-import. |

### Behaviour

- One model-change subscription per driver instance, separate from the data + alarm subscriptions. Created best-effort: a server that doesn't advertise the event types or rejects the `EventFilter` falls through to no-watch — `InitializeAsync` still succeeds.
- The `EventFilter` selects only the `EventType` field (a `WhereClause` constrains by `OfType BaseModelChangeEventType`). Payload fields like `Changes[]` are intentionally ignored: the driver always re-imports the full upstream root, so per-event delta tracking would just add wire overhead.
- Debounce is implemented via a single-shot `Timer`; every event calls `Timer.Change(window, Infinite)` so a burst of N events triggers exactly one re-import after the window elapses with no further events (see the sketch at the end of this section).
- The re-import path acquires the same `_gate` semaphore that `ReadAsync` / `WriteAsync` / `BrowseAsync` / `SubscribeAsync` use. Downstream callers see a brief browse-gap (≈ the upstream `DiscoverAsync` duration) while the gate is held — but no torn reads or split-batch writes.
- Re-import failures are handled best-effort: the next `ModelChangeEvent` triggers another attempt, and the keep-alive watchdog covers permanent upstream loss. Operators see failures through `DriverHealth.LastError` + the diagnostics counters.

### When to disable

Flip `WatchModelChanges` to `false` when:

- The upstream topology is known-static (e.g. firmware-pinned PLC) and the driver should never run a re-import unprompted.
- The brief browse-gap during re-import is unacceptable and a manual `ReinitializeAsync` call from the operator is preferred.
- The upstream server fires spurious `ModelChangeEvent`s that don't reflect real topology changes, causing wasted re-imports. Tighten or disable rather than chasing the noise downstream.
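The debounce behaviour above boils down to the standard single-shot `System.Threading.Timer` coalescing pattern. A minimal sketch, assuming hypothetical names (`ModelChangeDebouncer`, `reimport`); the driver's actual types and fields may differ:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch only: the type and member names are hypothetical, not the
// driver's actual identifiers. It mirrors the single-shot Timer pattern above.
internal sealed class ModelChangeDebouncer : IDisposable
{
    private readonly TimeSpan _window;      // corresponds to ModelChangeDebounce
    private readonly Func<Task> _reimport;  // corresponds to ReinitializeAsync
    private readonly Timer _timer;

    public ModelChangeDebouncer(TimeSpan window, Func<Task> reimport)
    {
        _window = window;
        _reimport = reimport;
        // Single-shot timer, initially disarmed.
        _timer = new Timer(s => _ = FireAsync(), null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
    }

    // Called for every BaseModelChangeEventType notification; each call pushes the
    // due time out, so a burst of N events yields exactly one re-import.
    public void OnModelChangeEvent() => _timer.Change(_window, Timeout.InfiniteTimeSpan);

    private async Task FireAsync()
    {
        try { await _reimport().ConfigureAwait(false); }
        catch { /* best-effort: the next event re-arms the timer and retries */ }
    }

    public void Dispose() => _timer.Dispose();
}
```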
## Reverse Connect (server-initiated)

OPC UA's reverse-connect mode flips the transport direction: instead of the client dialling the server, the **server** dials the client's listener. The upstream sends a `ReverseHello` and the client continues the OPC UA handshake on the inbound socket.

This is required for OT-DMZ deployments where the plant firewall only permits outbound traffic from the upstream — the gateway opens a listener, the upstream reaches out.

### Configuration

| Option | Default | Notes |
| --- | --- | --- |
| `ReverseConnect.Enabled` | `false` | Opt-in. When `true`, replaces the failover dial-sweep with a `WaitForConnection` call. |
| `ReverseConnect.ListenerUrl` | `null` | Local listener URL the SDK binds. Typically `opc.tcp://0.0.0.0:4844` (any interface) or a specific NIC for multi-homed gateways. **Required when `Enabled` is `true`.** |
| `ReverseConnect.ExpectedServerUri` | `null` | Upstream's `ApplicationUri` to filter inbound dials. `null` accepts the first connection (only safe with one upstream targeting the listener). |

### Shared listener (singleton)

The driver keeps a single underlying `Opc.Ua.Client.ReverseConnectManager` per process, keyed on `ListenerUrl`. Two driver instances that share a listener URL multiplex onto one TCP socket; the SDK demuxes inbound dials by the upstream's reported `ServerUri`. The wrapper (`ReverseConnectListener`) is reference-counted — first `Acquire` binds the port, last `Release` tears it down — letting drivers come and go independently without races on port-bind / port-unbind (see the sketch at the end of this section).

When two drivers share a listener:

- They MUST set `ExpectedServerUri` to disambiguate; otherwise the first upstream to dial in wins regardless of which driver is waiting.
- They CAN come and go independently; the listener stays alive while at least one driver references it.

### Behaviour

- The dial path is bypassed entirely when `Enabled` is `true`. Failover across multiple `EndpointUrls` doesn't apply — there's no client-side dial to fail over.
- `ExpectedServerUri` is the SDK's filter parameter to `WaitForConnectionAsync`. Inbound `ReverseHello`s from a different upstream are ignored and the caller keeps waiting.
- The same `EndpointDescription` derivation runs as on the dial path — the first `EndpointUrl` in the candidate list seeds `SecurityPolicy` / `SecurityMode` / `EndpointUrl` for the session-create call. The actual endpoint lives on the upstream and the SDK reconciles after the `ReverseHello`.
- Cancellation: `Timeout` bounds the wait. A stuck listener with no inbound dial throws after `Timeout` rather than hanging init forever.
- Shutdown releases the listener reference. The last release stops the listener so the port can be re-bound by a future driver lifecycle.

### Wiring it up on the upstream

The upstream OPC UA server has to be configured to dial out. The `opc-plc` simulator does this with `--rc=opc.tcp://<gateway-host>:4844`; for a real upstream see your server's reverse-connect docs (most major implementations expose a "ReverseConnect.Endpoint" config knob).

### When NOT to use

- Standard plant networks where the gateway can dial the upstream — the conventional dial path is simpler and supports failover natively.
- Public-internet OPC UA: reverse-connect is a network-policy workaround, not a security primitive. Always pair with `Sign` or `SignAndEncrypt` + a vetted user-token policy.
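The reference counting described under "Shared listener (singleton)" can be pictured roughly like this. It is a simplified, hypothetical sketch: the real `ReverseConnectListener` wraps the SDK's `ReverseConnectManager`, which is stood in for here by a plain bind delegate.

```csharp
using System;
using System.Collections.Generic;

// Simplified, hypothetical sketch of a reference-counted shared listener.
// The port bind/unbind that the SDK's ReverseConnectManager performs is represented
// by a delegate returning an IDisposable.
internal sealed class SharedListenerRegistry
{
    private sealed class Entry
    {
        public Entry(IDisposable listener) => Listener = listener;
        public int RefCount;
        public IDisposable Listener { get; }
    }

    private readonly object _lock = new();
    private readonly Dictionary<string, Entry> _entries = new();
    private readonly Func<string, IDisposable> _bind; // binds the port for a ListenerUrl

    public SharedListenerRegistry(Func<string, IDisposable> bind) => _bind = bind;

    // First Acquire for a ListenerUrl binds the port; later callers share the socket.
    public IDisposable Acquire(string listenerUrl)
    {
        lock (_lock)
        {
            if (!_entries.TryGetValue(listenerUrl, out var entry))
            {
                entry = new Entry(_bind(listenerUrl));
                _entries[listenerUrl] = entry;
            }
            entry.RefCount++;
        }
        return new Lease(() => Release(listenerUrl));
    }

    // Last Release unbinds the port so a future driver lifecycle can re-bind it.
    private void Release(string listenerUrl)
    {
        lock (_lock)
        {
            if (_entries.TryGetValue(listenerUrl, out var entry) && --entry.RefCount == 0)
            {
                entry.Listener.Dispose();
                _entries.Remove(listenerUrl);
            }
        }
    }

    private sealed class Lease : IDisposable
    {
        private Action? _release;
        public Lease(Action release) => _release = release;
        public void Dispose() { _release?.Invoke(); _release = null; }
    }
}
```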
## HistoryRead Events

The driver passes OPC UA `HistoryReadEvents` requests through to the upstream server. HistoryRead Raw / Processed / AtTime ship in the same code path (`ExecuteHistoryReadAsync`); event history takes a slightly different shape because the client sends an `EventFilter` (SelectClauses + WhereClause) rather than a plain numeric / time-based detail block.

### Wire path

`IHistoryProvider.ReadEventsAsync(fullReference, EventHistoryRequest, ct)` translates to:

```
new ReadEventDetails
{
    StartTime,
    EndTime,
    NumValuesPerNode,
    Filter = EventFilter { SelectClauses, WhereClause }
}
```

…and is sent through `Session.HistoryReadAsync` to the upstream server. The returned `HistoryEvent.Events` collection (one `HistoryEventFieldList` per historical event) is unwrapped into `HistoricalEventBatch.Events`, where each `HistoricalEventRow.Fields` dictionary is keyed by the `SimpleAttributeSpec.FieldName` the caller supplied. The server-side history dispatcher uses those keys to align fields with the wire-side SelectClause order — drivers don't have to honour the entire OPC UA `EventFilter` shape verbatim.

### SelectClauses

When `EventHistoryRequest.SelectClauses` is `null` the driver falls back to a default set that matches `BuildHistoryEvent` on the server side:

| Field | Browse path | Notes |
| --- | --- | --- |
| `EventId` | `EventId` | BaseEventType — stable unique id. |
| `SourceName` | `SourceName` | Source-object name. |
| `Time` | `Time` | Process-side event timestamp. Used for `OccurrenceTime`. |
| `Message` | `Message` | LocalizedText payload. |
| `Severity` | `Severity` | OPC UA 1-1000 scale. |
| `ReceiveTime` | `ReceiveTime` | Server-side ingest timestamp. |

Custom SelectClauses are supported — pass any `IReadOnlyList<SimpleAttributeSpec>`. Each entry's `TypeDefinitionId` defaults to `BaseEventType` when `null`; pass an explicit NodeId text (e.g. `"i=2782"` for `ConditionType`) to reach typed-condition fields.

### WhereClause

`ContentFilterSpec.EncodedOperands` carries the binary-encoded `ContentFilter` from the wire. The driver decodes it into the SDK `ContentFilter` and attaches it to the outgoing `EventFilter` verbatim — the OPC UA Client driver is a passthrough for filter semantics, it does not evaluate them. A malformed filter is dropped silently; the SelectClause projection still goes out.

### Continuation points

Returned in `HistoricalEventBatch.ContinuationPoint`. The server-side HistoryRead facade is responsible for round-tripping these so a paged event read against a chatty upstream completes incrementally. The driver itself doesn't track them — every `ReadEventsAsync` call issues a fresh `HistoryReadAsync`.
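For orientation, here is roughly what the default SelectClause set looks like once translated into the SDK's `EventFilter` and wrapped in the `ReadEventDetails` shown under "Wire path". This is a hedged sketch, not the driver's literal code; the helper name and the `NumValuesPerNode` choice are illustrative.

```csharp
using System;
using Opc.Ua;

// Sketch only: builds an EventFilter equivalent to the default SelectClause set above
// and wraps it in the ReadEventDetails sent via Session.HistoryReadAsync. The helper
// name is illustrative; the driver derives this from EventHistoryRequest at runtime.
internal static class EventHistoryExample
{
    public static ReadEventDetails BuildDefaultEventDetails(DateTime startUtc, DateTime endUtc)
    {
        var filter = new EventFilter();
        foreach (var field in new[] { "EventId", "SourceName", "Time", "Message", "Severity", "ReceiveTime" })
        {
            filter.SelectClauses.Add(new SimpleAttributeOperand
            {
                TypeDefinitionId = ObjectTypeIds.BaseEventType,  // default when no TypeDefinitionId is supplied
                BrowsePath = new QualifiedNameCollection { new QualifiedName(field) },
                AttributeId = Attributes.Value
            });
        }

        return new ReadEventDetails
        {
            StartTime = startUtc,
            EndTime = endUtc,
            NumValuesPerNode = 0,   // 0 lets the server choose its own batch size
            Filter = filter
        };
    }
}
```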
## HistoryRead Aggregates (Part 13 catalog)

`IHistoryProvider.ReadProcessedAsync` takes a `HistoryAggregateType` and the driver maps it to the standard `Opc.Ua.ObjectIds.AggregateFunction_*` NodeId in `MapAggregateToNodeId`. PR-13 (issue #285) extended the enum from the original 5 values (Average / Minimum / Maximum / Total / Count) to the full OPC UA Part 13 §5 catalog — ~30 aggregates.

The mapping is best-effort: not every upstream OPC UA server implements every aggregate. Aggregates the upstream rejects come back with `StatusCode=BadAggregateNotSupported` on the per-row HistoryRead result; the driver passes that through verbatim (cascading-quality rule, Part 11 §8) — it does not throw. Servers advertise the aggregates they support via the `AggregateConfiguration` object on the `Server` node; clients can probe it at runtime.

### Catalog

| Enum value | SDK NodeId field | Part 13 § | Server-side support | Typical use |
| --- | --- | --- | --- | --- |
| `Average` | `AggregateFunction_Average` | §5.4 | almost always | smoothing |
| `Minimum` | `AggregateFunction_Minimum` | §5.5 | almost always | low watermark |
| `Maximum` | `AggregateFunction_Maximum` | §5.6 | almost always | high watermark |
| `Total` | `AggregateFunction_Total` | §5.10 | usually | totalisation |
| `Count` | `AggregateFunction_Count` | §5.18 | almost always | sample count |
| `TimeAverage` | `AggregateFunction_TimeAverage` | §5.4.2 | usually | time-weighted mean |
| `TimeAverage2` | `AggregateFunction_TimeAverage2` | §5.4.3 | sometimes | bounded time-weighted mean |
| `Interpolative` | `AggregateFunction_Interpolative` | §5.3 | usually | trend snapshot |
| `MinimumActualTime` | `AggregateFunction_MinimumActualTime` | §5.5.4 | sometimes | when low occurred |
| `MaximumActualTime` | `AggregateFunction_MaximumActualTime` | §5.6.4 | sometimes | when high occurred |
| `Range` | `AggregateFunction_Range` | §5.7 | usually | spread |
| `Range2` | `AggregateFunction_Range2` | §5.7 | sometimes | bounded spread |
| `AnnotationCount` | `AggregateFunction_AnnotationCount` | §5.21 | rarely | operator notes |
| `DurationGood` | `AggregateFunction_DurationGood` | §5.16 | sometimes | quality coverage |
| `DurationBad` | `AggregateFunction_DurationBad` | §5.16 | sometimes | gap accounting |
| `PercentGood` | `AggregateFunction_PercentGood` | §5.17 | sometimes | quality % |
| `PercentBad` | `AggregateFunction_PercentBad` | §5.17 | sometimes | gap % |
| `WorstQuality` | `AggregateFunction_WorstQuality` | §5.20 | sometimes | worst seen |
| `WorstQuality2` | `AggregateFunction_WorstQuality2` | §5.20 | rarely | bounded worst |
| `StandardDeviationSample` | `AggregateFunction_StandardDeviationSample` | §5.13 | sometimes | n-1 stddev |
| `StandardDeviationPopulation` | `AggregateFunction_StandardDeviationPopulation` | §5.13 | sometimes | n stddev |
| `VarianceSample` | `AggregateFunction_VarianceSample` | §5.13 | sometimes | n-1 variance |
| `VariancePopulation` | `AggregateFunction_VariancePopulation` | §5.13 | sometimes | n variance |
| `NumberOfTransitions` | `AggregateFunction_NumberOfTransitions` | §5.12 | sometimes | event count |
| `DurationInStateZero` | `AggregateFunction_DurationInStateZero` | §5.19 | sometimes | OFF time |
| `DurationInStateNonZero` | `AggregateFunction_DurationInStateNonZero` | §5.19 | sometimes | ON time |
| `Start` | `AggregateFunction_Start` | §5.8 | usually | first sample |
| `End` | `AggregateFunction_End` | §5.9 | usually | last sample |
| `Delta` | `AggregateFunction_Delta` | §5.11 | usually | end-start |
| `StartBound` | `AggregateFunction_StartBound` | §5.8 | sometimes | extrapolated start |
| `EndBound` | `AggregateFunction_EndBound` | §5.9 | sometimes | extrapolated end |

"Server-side support" is heuristic — see your upstream's `AggregateConfiguration` node for the authoritative list. AVEVA Historian, KEPServerEX, Prosys, and opc-plc each implement different subsets.

### Driver-side validation

The mapping itself is unit-tested over the full enum (`OpcUaClientAggregateMappingTests`) — every value resolves to a non-null namespace-0 NodeId, and the original 5 ordinals stay pinned. Wire-side behaviour against a live server is exercised by `OpcUaClientAggregateSweepTests` (build-only scaffold pending an opc-plc history-sim profile).
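A condensed sketch of the kind of switch `MapAggregateToNodeId` performs, trimmed to a handful of catalog entries. The `HistoryAggregateType` enum below is a trimmed stand-in for the project's real enum; the mapping targets are the actual `Opc.Ua.ObjectIds` fields.

```csharp
using System;
using Opc.Ua;

// Trimmed stand-in for the project's HistoryAggregateType enum (the original 5 first).
internal enum HistoryAggregateType { Average, Minimum, Maximum, Total, Count, TimeAverage, Delta }

// Condensed sketch of the MapAggregateToNodeId mapping: a handful of entries only;
// the real method covers the whole Part 13 catalog listed above.
internal static class AggregateMappingExample
{
    public static NodeId MapAggregateToNodeId(HistoryAggregateType aggregate) => aggregate switch
    {
        HistoryAggregateType.Average     => ObjectIds.AggregateFunction_Average,
        HistoryAggregateType.Minimum     => ObjectIds.AggregateFunction_Minimum,
        HistoryAggregateType.Maximum     => ObjectIds.AggregateFunction_Maximum,
        HistoryAggregateType.Total       => ObjectIds.AggregateFunction_Total,
        HistoryAggregateType.Count       => ObjectIds.AggregateFunction_Count,
        HistoryAggregateType.TimeAverage => ObjectIds.AggregateFunction_TimeAverage,
        HistoryAggregateType.Delta       => ObjectIds.AggregateFunction_Delta,
        _ => throw new ArgumentOutOfRangeException(nameof(aggregate))
    };
}
```

Aggregates the upstream doesn't implement still map to a valid namespace-0 NodeId here; the rejection only surfaces later as `BadAggregateNotSupported` on the per-row result, as described above.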
## Upstream redundancy (`ServerArray`)

When the upstream OPC UA server is itself a redundant pair (warm or hot per OPC UA Part 4 §6.6.2), the driver supports **mid-session failover** driven by the upstream's own `Server.ServerRedundancy.RedundancySupport` + `ServerUriArray` + `Server.ServiceLevel` nodes.

This is distinct from the static boot-time failover sweep on `EndpointUrls`: that path picks a single survivor at session-create time; this path swaps the active session live when the upstream signals degradation, transferring subscriptions onto the secondary so monitored-item handles stay valid.

### Configuration

| Option | Default | Notes |
| --- | --- | --- |
| `Redundancy.Enabled` | `false` | Opt-in. When `false`, the driver doesn't read `RedundancySupport` / `ServerUriArray` and doesn't subscribe to `ServiceLevel`. |
| `Redundancy.ServiceLevelThreshold` | `200` | Byte value below which the driver triggers failover. OPC UA spec convention: 200+ = healthy primary, 100..199 = degraded, 0..99 = unrecoverable. |
| `Redundancy.RecheckInterval` | `5s` | Lower bound between two consecutive failovers — suppresses oscillation when ServiceLevel flaps around the threshold. |

### Behaviour

- At session activation the driver reads `Server.ServerRedundancy.RedundancySupport`. When `None`, the driver records an empty peer list and the failover path becomes a no-op (`ServiceLevel` drops are still observable via diagnostics but trigger nothing).
- When the upstream advertises `Cold` / `Warm` / `WarmActive` / `Hot`, the driver pulls `Server.ServerRedundancy.ServerUriArray` for the peer list, falling back to the top-level `Server.ServerArray` for legacy upstreams that don't expose the redundancy node.
- A dedicated subscription on `Server.ServiceLevel` (publish interval 1s, separate from the alarm + data subscriptions) drives every failover decision via the SDK's notification path — no polling loop.
- On a drop below `ServiceLevelThreshold` the driver picks the next URI in the peer list that isn't the active one, opens a parallel session against it, and calls `Session.TransferSubscriptionsAsync(other, sendInitialValues:true)` to migrate every live subscription (data + alarm + model-change + service-level itself). On success the driver swaps `Session`, closes the old one, and bumps `RedundancyFailoverCount`. (See the sketch after the diagnostics table below.)
- On any failure (`BadSecureChannelClosed`, `BadCertificateUntrusted`, `TransferSubscriptions` returning `false`, secondary unreachable) the driver leaves the existing session untouched, increments `RedundancyFailoverFailures`, and waits for the next ServiceLevel notification. The keep-alive watchdog continues to cover full upstream-loss scenarios.

### Shared client-cert prerequisite

`TransferSubscriptionsAsync` requires the secondary's secure channel to accept the same client certificate the primary did. Operators running heterogeneous secondaries (different cert trust stores) will see `BadCertificateUntrusted` on every failover attempt and the failures counter climbing. The fix is to push the gateway driver's application-instance certificate into both upstreams' `TrustedPeerCertificates` store before enabling redundancy. A follow-up adds a fallback path that re-creates subscriptions instead of transferring when the secondary rejects the channel.

### Diagnostics

The `driver-diagnostics` RPC surfaces three new counters via `DriverHealth.Diagnostics`:

| Key | Type | Notes |
| --- | --- | --- |
| `RedundancyFailoverCount` | `double` (long-counted) | Successful mid-session swaps since driver start. |
| `RedundancyFailoverFailures` | `double` (long-counted) | Swap attempts that bailed (TransferSubscriptions false, secondary unreachable, etc.). |
| `ActiveServerUri` | string (in `OpcUaClientDiagnostics.ActiveServerUri`) | URI of the upstream the driver is currently bound to. Updates on every successful failover. |
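To make the ServiceLevel-driven decision concrete, here is a heavily condensed, hypothetical sketch of the failover path described under "Behaviour". Type and member names are illustrative, session creation is elided behind a stub, and the SDK's `TransferSubscriptionsAsync` overload taking a `SubscriptionCollection` is assumed; the real driver also re-wires keep-alive and notification handlers.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Opc.Ua.Client;

// Hypothetical, condensed sketch of the ServiceLevel-driven failover. Names and the
// session-creation helper are illustrative, not the driver's actual code.
internal sealed class RedundancyFailoverSketch
{
    private readonly string[] _peerUris;            // from ServerUriArray (or ServerArray fallback)
    private readonly byte _serviceLevelThreshold;   // Redundancy.ServiceLevelThreshold
    private readonly TimeSpan _recheckInterval;     // Redundancy.RecheckInterval
    private DateTime _lastFailoverUtc = DateTime.MinValue;
    private ISession _session;                      // active upstream session
    private string _activeServerUri;

    public RedundancyFailoverSketch(ISession initial, string activeUri, string[] peerUris,
        byte serviceLevelThreshold, TimeSpan recheckInterval)
    {
        _session = initial;
        _activeServerUri = activeUri;
        _peerUris = peerUris;
        _serviceLevelThreshold = serviceLevelThreshold;
        _recheckInterval = recheckInterval;
    }

    // Invoked from the dedicated Server.ServiceLevel subscription's notification callback.
    public async Task OnServiceLevelAsync(byte serviceLevel, CancellationToken ct)
    {
        if (serviceLevel >= _serviceLevelThreshold) return;                 // primary still healthy
        if (DateTime.UtcNow - _lastFailoverUtc < _recheckInterval) return;  // suppress oscillation

        // Pick the next peer URI that isn't the one we're currently bound to.
        var target = Array.Find(_peerUris, uri => uri != _activeServerUri);
        if (target is null) return;                                         // empty peer list: no-op

        var candidate = await CreateSessionAsync(target, ct);               // parallel session (elided)
        var subscriptions = new SubscriptionCollection(_session.Subscriptions);

        // Migrate data, alarm, model-change and service-level subscriptions onto the secondary.
        if (await candidate.TransferSubscriptionsAsync(subscriptions, sendInitialValues: true, ct))
        {
            var old = _session;
            _session = candidate;
            _activeServerUri = target;
            _lastFailoverUtc = DateTime.UtcNow;
            await old.CloseAsync(ct);            // success: bump RedundancyFailoverCount
        }
        else
        {
            // Failure: keep the existing session and bump RedundancyFailoverFailures instead.
            await candidate.CloseAsync(ct);
        }
    }

    private Task<ISession> CreateSessionAsync(string serverUri, CancellationToken ct)
        => throw new NotImplementedException("elided: endpoint selection + Session.Create against " + serverUri);
}
```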
### Forced-failover runbook

To validate the wiring against a real redundant upstream pair:

1. Confirm the upstream advertises `RedundancySupport != None` and a non-empty `ServerUriArray`. Use the Client CLI: `dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- redundancy -u <endpoint-url>`.
2. Set `Redundancy.Enabled = true` on the gateway's `OpcUaClient` driver instance and restart.
3. Tail driver diagnostics: `driver-diagnostics --instance <instance-id>` — note `RedundancyFailoverCount = 0` pre-test.
4. Drive a `ServiceLevel` drop on the primary. On AVEVA / KEPServer this is typically a "force standby" Admin action; on a custom server it's a write to the simulated ServiceLevel node.
5. Observe `RedundancyFailoverCount = 1` within `RecheckInterval` of the drop, `ActiveServerUri` swapping to the secondary URI, and downstream reads/subscriptions continuing without interruption.

For non-redundant upstreams (single-server deployments) the recommended configuration is to leave `Redundancy.Enabled = false` and rely on `EndpointUrls` for boot-time failover only.
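As a closing reference, here is roughly how the two recommended configurations could look when set in code. The option shapes below are stand-ins inferred from the configuration tables in this document; check `OpcUaClientDriverOptions.cs` for the authoritative property names.

```csharp
using System;

// Redundant upstream pair: mid-session failover driven by ServiceLevel.
var redundantUpstream = new OpcUaClientDriverOptions
{
    EndpointUrls = new[] { "opc.tcp://primary:4840" },   // dial target; peers come from ServerUriArray
    Redundancy = new RedundancyOptions
    {
        Enabled = true,
        ServiceLevelThreshold = 200,                      // fail over below "healthy primary"
        RecheckInterval = TimeSpan.FromSeconds(5)         // suppress oscillation around the threshold
    }
};

// Single-server deployment: boot-time failover only, first reachable URL wins.
var singleUpstream = new OpcUaClientDriverOptions
{
    EndpointUrls = new[] { "opc.tcp://plc-a:4840", "opc.tcp://plc-b:4840" },
    Redundancy = new RedundancyOptions { Enabled = false }
};

// Minimal stand-in option types so this sketch compiles on its own; the real shapes
// live in OpcUaClientDriverOptions.cs and may differ.
internal sealed class OpcUaClientDriverOptions
{
    public string[] EndpointUrls { get; init; } = Array.Empty<string>();
    public RedundancyOptions Redundancy { get; init; } = new();
}

internal sealed class RedundancyOptions
{
    public bool Enabled { get; init; }
    public byte ServiceLevelThreshold { get; init; } = 200;
    public TimeSpan RecheckInterval { get; init; } = TimeSpan.FromSeconds(5);
}
```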