Files
lmxopcua/docs/Redundancy.md
T

188 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Redundancy (v2)
## Overview
OtOpcUa supports OPC UA **non-transparent** warm/hot redundancy. Two or more `OtOpcUa.Host` processes run side-by-side, share the same Config DB, and join the same Akka.NET cluster. Each process owns a distinct `ApplicationUri`; OPC UA clients discover both endpoints by reading `Server.ServerArray` (NodeId `i=2254`) on either node and pick one based on the `ServiceLevel` byte that each server publishes.
> **Discovery surface.** The `ServerArray` path on the `Server` object is what each node populates with self + peer `ApplicationUri`s — see `OpcUaApplicationHost.PopulateServerArray` and the per-node `PeerApplicationUris` option below. The redundancy-object-type `ServerUriArray` proper (a child of `Server.ServerRedundancy`) remains deferred pending an SDK object-type upgrade; clients should read `Server.ServerArray` for peer discovery today.
> **v2 change.** v1's operator-managed `ClusterNode.RedundancyRole` column + `RedundancyCoordinator` / `ApplyLeaseRegistry` / `PeerHttpProbeLoop` are gone. Primary/secondary is now derived from **Akka cluster role-leader** for the `driver` role. The operator no longer writes a role into the DB; cluster topology (specifically the `driver` role-leader) drives ServiceLevel automatically.
The runtime pieces live in:
| Component | Project | Role |
|---|---|---|
| `RedundancyStateActor` | `OtOpcUa.ControlPlane.Redundancy` | Admin-role cluster singleton; subscribes to cluster topology events, debounces 250ms, broadcasts `RedundancyStateChanged` on the `redundancy-state` DPS topic. |
| `OpcUaPublishActor` | `OtOpcUa.Runtime.OpcUa` | Per-driver-node; subscribes to the `redundancy-state` topic, maps the local node's role to a ServiceLevel byte (see below), and forwards it to `IServiceLevelPublisher`. |
| `IServiceLevelPublisher` / `SdkServiceLevelPublisher` | `OtOpcUa.Commons.OpcUa` / `OtOpcUa.OpcUaServer` | Writes the byte into the SDK's `Server.ServiceLevel` Variable. Production binds `DeferredServiceLevelPublisher`, which swaps in the real `SdkServiceLevelPublisher` once the SDK is up (it needs `IServerInternal`, available only after `StandardServer.Start`); until then writes route through `NullServiceLevelPublisher`. |
| `ServiceLevelCalculator` | `OtOpcUa.Cluster.Redundancy` (`Core.Cluster`) | Pure function `(NodeHealthInputs) → byte` — the DB/probe-aware tiering (see truth table below). Covered by `ServiceLevelCalculatorTests`. **Now the live publish path**`OpcUaPublishActor` calls it on every `HealthTick` and `RedundancyStateChanged` event. Moved to `Core.Cluster` so Runtime can reach it without a Runtime→ControlPlane reference. |
| `DbHealthProbeActor` | `OtOpcUa.Runtime.Health` | Per-node; runs `SELECT 1` against ConfigDb every 5s. Read by health endpoint. |
| `PeerOpcUaProbeActor` | `OtOpcUa.Runtime.Health` | Per-node; pings peer `opc.tcp://peer:4840` with a TCP connect (2s timeout) and publishes the result on the `redundancy-state` topic. A full secure-channel Hello handshake is a possible future upgrade; the TCP connect is the current real probe. |
| `ClusterRoleInfo` | `OtOpcUa.Cluster` | Live view of cluster membership + role-leader; exposes `IClusterRoleInfo` to the rest of the host. |
## ServiceLevel tiers
### Health-aware tiering (`ServiceLevelCalculator` — live path)
`ServiceLevelCalculator.Compute(NodeHealthInputs)` is the live publish path.
`OpcUaPublishActor` calls it on every `HealthTick` (~5 s) and on each
`RedundancyStateChanged` snapshot, then forwards the result through
`IServiceLevelPublisher` to the SDK's `Server.ServiceLevel` Variable.
The four inputs are sourced locally per driver node:
| Input | Source |
|---|---|
| `MemberState` | Local `SelfMember.Status` from the Akka cluster (Up / Joining / Leaving / …). |
| `DbReachable` | Local `DbHealthProbeActor``OpcUaPublishActor` Asks it on each `HealthTick`; an Ask timeout is treated as `Reachable=false`. |
| `OpcUaProbeOk` | Result of a peer probing THIS node's OPC UA endpoint: `PeerProbeSupervisor` spawns one `PeerOpcUaProbeActor` per OTHER driver-role peer; each probe publishes `OpcUaProbeResult(probed-node, ok)` on the `redundancy-state` topic; the publish actor consumes only results whose target is itself. Freshness-debounced: absent or stale (>30 s) → `true` (benefit of the doubt — single-node clusters and a departed peer never demote); only an actively-observed RECENT `false` demotes. |
| `Stale` (derived) | `!DbReachable \|\| (now lastDbHealth.AsOfUtc) > 30 s \|\| (now snapshotEntry.AsOfUtc) > 30 s`. |
| `IsDriverRoleLeader` | The local node's entry in the `RedundancyStateChanged` snapshot from `RedundancyStateActor`. |
The resulting truth table (all tiers are now reachable at runtime):
| Tier | Byte | Condition |
|---|---|---|
| Down / Detached | 0 | Member status is not `Up` or `Joining` (leaving, removed, exiting), OR node has no `driver` role (Detached). Published immediately — a starting or detached node never leaves the SDK default 255. |
| Critically degraded | 100 | ConfigDb unreachable AND data is stale. |
| Stale | 200 | Data stale but ConfigDb reachable. |
| Healthy follower | 240 | DB reachable + OPC UA probe ok + not stale + not role-leader. |
| Healthy leader | 250 | Same as healthy follower + this node is the `driver` role-leader (+10 bonus). |
> **Secondary 100 → 240 (behavior change).** Previously a healthy Secondary
> published 100 (coarse role-only mapping). It now publishes **240** — both
> nodes sit at 240/250 under healthy conditions, with the leader still preferred
> by the +10 bonus. Clients with the standard "pick highest ServiceLevel"
> heuristic continue to prefer the primary.
#### Backward-compatible fallback (legacy seam)
A node with no `DbHealthStatus` wired (e.g. early bootstrap window before the
first `DbHealthProbeActor` reply) falls back to the old role-only mapping:
Primary-leader → 240, Primary → 200, Secondary → 100, Detached → 0. Once the
first `DbHealthStatus` arrives the calculator takes over. The first computed
ServiceLevel (even 0) is always published so no node lingers at the SDK default
255.
Roles come from `RedundancyStateActor.BuildSnapshot`: a node with the `driver`
role is `Primary` when it holds the `driver` role-leader lease, otherwise
`Secondary`; a node without the `driver` role is `Detached`.
## Data flow
```
Cluster topology event ──────────────────────────────────────────┐
RedundancyStateActor (admin singleton)
│ debounce 250ms
DPS topic "redundancy-state"
│ ▲
┌───────────────────────┘ │
│ │
▼ │
Driver node: OpcUaPublishActor │
┌─────────────────────────────────────────────────────────┐ │
│ Inputs collected per ~5s HealthTick: │ │
│ • MemberState ← Akka SelfMember.Status │ │
│ • DbReachable ← DbHealthProbeActor (Ask, timeout→F) │ │
│ • OpcUaProbeOk ← OpcUaProbeResult about THIS node │──────┘
│ • Stale ← derived from above timestamps │ PeerProbeSupervisor
│ • IsLeader ← RedundancyStateChanged snapshot │ → PeerOpcUaProbeActor(s)
│ │ publish OpcUaProbeResult
│ ServiceLevelCalculator.Compute(NodeHealthInputs) │ on "redundancy-state"
│ → byte (0/100/200/240/250) │
└───────────────────────────────────────────────────────-─┘
IServiceLevelPublisher (SdkServiceLevelPublisher)
OPC UA Server.ServiceLevel Variable
```
Both `DbHealthProbeActor` and `PeerOpcUaProbeActor` feed the live publish path.
The peer probe publishes `OpcUaProbeResult` on the `redundancy-state` topic;
`OpcUaPublishActor` consumes only results whose target is itself and applies
freshness-debouncing before passing them to the calculator. `DbHealthProbeActor`
is queried directly via Ask on each `HealthTick`.
The admin singleton is the cluster's only `RedundancyStateActor`. If the admin leader fails over, the new admin node spins up its replacement, re-subscribes to cluster events, and publishes a fresh snapshot from the current `Cluster.State`. There is no DB-persisted state to recover.
## Configuration
Per-node identity comes from `appsettings.json` + the `OTOPCUA_ROLES` env var:
```json
{
"Cluster": {
"Hostname": "0.0.0.0",
"Port": 4053,
"PublicHostname": "node-a.lan",
"SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
"Roles": ["admin", "driver"]
}
}
```
```
OTOPCUA_ROLES=admin,driver
```
Both nodes share the same `ConfigDb` connection string; `Cluster.PublicHostname` + `Roles` are what makes them distinct in cluster gossip. The first node bootstraps the cluster (its address goes in `SeedNodes`); the second node joins via the same `SeedNodes` list.
There is no longer a `Node:NodeId` setting and no `ClusterNode.RedundancyRole` column (the V2 migration dropped it — primary/secondary is now derived from cluster role-leadership). NodeId is derived as `host:port` of the cluster `PublicHostname` (see `ClusterRoleInfo.LocalNode` for the formula).
> **`RedundancyStateActor` NodeId consistency (fixed).** `RedundancyStateActor` now keys each node's `NodeRedundancyState` entry by the canonical `host:port` node id (via a `ToNodeId(Address)` helper mirroring `ClusterRoleInfo.ToNodeId`). Previously it keyed by `member.Address.Host` (host-only, e.g. `central-2`); since every subscriber matches by the canonical `host:port` form, the mismatch silently meant no node ever matched its own entry — all nodes stayed at the default ServiceLevel 255 and never learned their role. This fix makes `RedundancyStateActor` consistent with the stated contract above. Additionally, `RedundancyStateActor` now **re-publishes the current snapshot on a periodic heartbeat (default 10 s)** so any node that subscribes after the last topology-change publish converges within the interval (DistributedPubSub does not replay to late subscribers).
The `ClusterNode.ServiceLevelBase` column still exists and is editable in the Admin UI (NodeEdit / Cluster Redundancy pages), but it no longer drives the runtime ServiceLevel — that value is computed by `ServiceLevelCalculator` from cluster role and live health inputs, independent of this stored preference.
### Peer URI advertising
Each node advertises its partner via `OpcUaApplicationHostOptions.PeerApplicationUris` (an `IList<string>`, default empty). `OpcUaApplicationHost.PopulateServerArray` appends each configured peer URI to the SDK's `IServerInternal.ServerUris` string table after server startup, so that `Server.ServerArray` reads served by `OnReadServerArray` return both self + peers. The options bind from the `OpcUa` config section (see `Program.cs``AddValidatedOptions<OpcUaApplicationHostOptions>(…, "OpcUa")`). Set this per-node in `appsettings.json`:
```json
{
"OpcUa": {
"PeerApplicationUris": ["urn:node-b:OtOpcUa"]
}
}
```
Node A lists Node B's `ApplicationUri` and vice-versa. Validated by `DualEndpointTests` in `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/` — boots two `OpcUaApplicationHost` instances on loopback, asserts a real OPCFoundation client `Session` reading `Server.ServerArray` from Node A sees both URIs.
## Split-brain
`akka.conf` configures Akka's split-brain resolver with `active-strategy = keep-oldest`, `stable-after = 15s`, and `failure-detector.threshold = 10.0`. Under a clean partition: the oldest member stays up + the smaller (or younger) side downs itself within ~15 seconds. The `RedundancyStateActor` on the surviving partition re-computes from the post-partition `Cluster.State`.
There is no operator-driven role swap during a partition. Failover is what the cluster does automatically.
## Primary-gated alarm emission and historization
Under warm/hot redundancy both cluster nodes run `ScriptedAlarmHostActor` and evaluate scripted alarms, keeping each node's address space and engine state warm for instant failover. However, to avoid duplicate rows on `/alerts` and duplicate historian writes, only the Primary node publishes externally:
- **`alerts` topic emission** — `ScriptedAlarmHostActor` subscribes to the `redundancy-state` DPS topic and caches the local node's `RedundancyRole`. Each alarm transition is published to the cluster `alerts` topic **only when the node's role is `Primary`**. The default behaviour before any `redundancy-state` message arrives is to emit, so single-node deployments and the boot window never drop transitions. The OPC UA condition-node write and inbound ack/shelve command processing remain **ungated** on both nodes so the secondary is always ready to serve clients after a failover.
- **`HistorianAdapterActor` historization** — likewise Primary-gated so alarm historization is exactly-once across all alarm sources. The actor subscribes to the `alerts` DPS topic and translates each `AlarmTransitionEvent``AlarmHistorianEvent` before enqueuing it on the sink; scripted alarms therefore historize exactly once regardless of cluster size.
Net effect: each alarm transition appears **once** on `/alerts` and would historize once, not once per node.
See [ScriptedAlarms.md](ScriptedAlarms.md) and [AlarmTracking.md](AlarmTracking.md) for the scripted-alarm engine internals.
## Client-side failover
The OtOpcUa Client CLI at `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` supports `-F` / `--failover-urls` for automatic client-side failover; for long-running subscriptions the CLI monitors session KeepAlive and reconnects to the next available server, recreating the subscription on the new endpoint. See [`Client.CLI.md`](Client.CLI.md).
## Observability
`OpcUaPublishActor` emits one metric on every ServiceLevel transition (it suppresses no-op repeats of the same byte):
| Metric | Type | Notes |
|---|---|---|
| `otopcua.redundancy.service_level_change` | Counter (`{change}`) | OPC UA `Server.ServiceLevel` transitions emitted by the redundancy state. Tagged with `level` = the new byte. |
The meter is defined on `OtOpcUaTelemetry` (`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/OtOpcUaTelemetry.cs`); it surfaces through whatever OpenTelemetry exporter the host configures.
## Depth reference
For the full design — message contracts, tiered calculator truth table, recovery semantics — see `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §6.