
# Redundancy (v2)

## Overview

OtOpcUa supports OPC UA **non-transparent** warm/hot redundancy. Two or more `OtOpcUa.Host` processes run side-by-side, share the same Config DB, and join the same Akka.NET cluster. Each process owns a distinct `ApplicationUri`; OPC UA clients discover both endpoints by reading `Server.ServerArray` (NodeId `i=2254`) on either node and pick one based on the `ServiceLevel` byte that each server publishes.

> **Discovery surface.** The `ServerArray` path on the `Server` object is what each node populates with self + peer `ApplicationUri`s — see `OpcUaApplicationHost.PopulateServerArray` and the per-node `PeerApplicationUris` option below. The redundancy-object-type `ServerUriArray` proper (a child of `Server.ServerRedundancy`) remains deferred pending an SDK object-type upgrade; clients should read `Server.ServerArray` for peer discovery today.

> **v2 change.** v1's operator-managed `ClusterNode.RedundancyRole` column + `RedundancyCoordinator` / `ApplyLeaseRegistry` / `PeerHttpProbeLoop` are gone. Primary/secondary is now derived from **Akka cluster role-leader** for the `driver` role. The operator no longer writes a role into the DB; cluster topology (specifically the `driver` role-leader) drives ServiceLevel automatically.

The runtime pieces live in:

| Component | Project | Role |
|---|---|---|
| `RedundancyStateActor` | `OtOpcUa.ControlPlane.Redundancy` | Admin-role cluster singleton; subscribes to cluster topology events, debounces 250ms, broadcasts `RedundancyStateChanged` on the `redundancy-state` DPS topic. |
| `OpcUaPublishActor` | `OtOpcUa.Runtime.OpcUa` | Per-driver-node; subscribes to the `redundancy-state` topic, maps the local node's role to a ServiceLevel byte (see below), and forwards it to `IServiceLevelPublisher`. |
| `IServiceLevelPublisher` / `SdkServiceLevelPublisher` | `OtOpcUa.Commons.OpcUa` / `OtOpcUa.OpcUaServer` | Writes the byte into the SDK's `Server.ServiceLevel` Variable. Production binds `DeferredServiceLevelPublisher`, which swaps in the real `SdkServiceLevelPublisher` once the SDK is up (it needs `IServerInternal`, available only after `StandardServer.Start`); until then writes route through `NullServiceLevelPublisher`. |
| `ServiceLevelCalculator` | `OtOpcUa.Cluster.Redundancy` (`Core.Cluster`) | Pure function `(NodeHealthInputs) → byte` — the DB/probe-aware tiering (see truth table below). Covered by `ServiceLevelCalculatorTests`. **Now the live publish path** — `OpcUaPublishActor` calls it on every `HealthTick` and `RedundancyStateChanged` event. Moved to `Core.Cluster` so Runtime can reach it without a Runtime→ControlPlane reference. |
| `DbHealthProbeActor` | `OtOpcUa.Runtime.Health` | Per-node; runs `SELECT 1` against ConfigDb every 5s. Read by the health endpoint and Asked by `OpcUaPublishActor` on each `HealthTick`. |
| `PeerOpcUaProbeActor` | `OtOpcUa.Runtime.Health` | Per-node; pings peer `opc.tcp://peer:4840` with a TCP connect (2s timeout) and publishes the result on the `redundancy-state` topic. A full secure-channel Hello handshake is a possible future upgrade; the TCP connect is the current real probe. |
| `ClusterRoleInfo` | `OtOpcUa.Cluster` | Live view of cluster membership + role-leader; exposes `IClusterRoleInfo` to the rest of the host. |

## ServiceLevel tiers

### Health-aware tiering (`ServiceLevelCalculator` — live path)

`ServiceLevelCalculator.Compute(NodeHealthInputs)` is the live publish path.
`OpcUaPublishActor` calls it on every `HealthTick` (~5 s) and on each
`RedundancyStateChanged` snapshot, then forwards the result through
`IServiceLevelPublisher` to the SDK's `Server.ServiceLevel` Variable.

The four raw inputs, plus the derived `Stale` flag, are sourced locally per driver node:

| Input | Source |
|---|---|
| `MemberState` | Local `SelfMember.Status` from the Akka cluster (Up / Joining / Leaving / …). |
| `DbReachable` | Local `DbHealthProbeActor` — `OpcUaPublishActor` Asks it on each `HealthTick`; an Ask timeout is treated as `Reachable=false`. |
| `OpcUaProbeOk` | Result of a peer probing THIS node's OPC UA endpoint: `PeerProbeSupervisor` spawns one `PeerOpcUaProbeActor` per OTHER driver-role peer; each probe publishes `OpcUaProbeResult(probed-node, ok)` on the `redundancy-state` topic; the publish actor consumes only results whose target is itself. Freshness-debounced: absent or stale (>30 s) → `true` (benefit of the doubt — single-node clusters and a departed peer never demote); only an actively-observed RECENT `false` demotes. |
| `Stale` (derived) | `!DbReachable \|\| (now − lastDbHealth.AsOfUtc) > 30 s \|\| (now − snapshotEntry.AsOfUtc) > 30 s`. |
| `IsDriverRoleLeader` | The local node's entry in the `RedundancyStateChanged` snapshot from `RedundancyStateActor`. |
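
The freshness rules for `OpcUaProbeOk` and `Stale` can be sketched as below. This is a minimal Python illustration of the logic described in the table, not the C# implementation; `ProbeObservation`, `effective_probe_ok`, and `is_stale` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FRESHNESS_WINDOW = timedelta(seconds=30)  # the 30 s threshold from the table


@dataclass(frozen=True)
class ProbeObservation:
    """Hypothetical record of the latest OpcUaProbeResult about this node."""
    ok: bool
    as_of_utc: datetime


def effective_probe_ok(last: Optional[ProbeObservation], now: datetime) -> bool:
    """Only a recent, actively observed failure demotes the node."""
    if last is None:
        return True  # nobody has probed us, e.g. a single-node cluster
    if now - last.as_of_utc > FRESHNESS_WINDOW:
        return True  # stale observation: benefit of the doubt
    return last.ok


def is_stale(db_reachable: bool, db_as_of_utc: datetime,
             snapshot_as_of_utc: datetime, now: datetime) -> bool:
    """Stale = DB unreachable, or either timestamp older than 30 s."""
    return (not db_reachable
            or now - db_as_of_utc > FRESHNESS_WINDOW
            or now - snapshot_as_of_utc > FRESHNESS_WINDOW)
```

Note that an unreachable ConfigDb always implies `Stale`, which is why the two conditions collapse into a single tier in the truth table.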

The resulting truth table (all tiers are now reachable at runtime):

| Tier | Byte | Condition |
|---|---|---|
| Down / Detached | 0 | Member status is not `Up` or `Joining` (leaving, removed, exiting), OR node has no `driver` role (Detached). Published immediately, so a starting or detached node never lingers at the SDK default 255. |
| Critically degraded | 100 | ConfigDb unreachable AND data is stale. |
| Stale | 200 | Data stale but ConfigDb reachable. |
| Healthy follower | 240 | DB reachable + OPC UA probe ok + not stale + not role-leader. |
| Healthy leader | 250 | DB reachable + OPC UA probe ok + not stale + this node is the `driver` role-leader (the healthy-follower level plus a +10 leader bonus). |
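
Read top to bottom, the table can be sketched as a small decision function. This is a hedged Python illustration only: the function and parameter names (`service_level`, `has_driver_role`, and so on) are hypothetical, and the real logic lives in `ServiceLevelCalculator.Compute`. The table does not list a row for "DB reachable, not stale, probe failed", so the sketch returns `None` for that combination rather than guess.

```python
from typing import Optional

# Member states that keep a node out of the Down tier, per the table.
SERVING_STATES = {"Up", "Joining"}


def service_level(member_state: str, has_driver_role: bool, db_reachable: bool,
                  stale: bool, probe_ok: bool, is_leader: bool) -> Optional[int]:
    """Map health inputs to a ServiceLevel byte following the documented rows."""
    if member_state not in SERVING_STATES or not has_driver_role:
        return 0    # Down / Detached
    if not db_reachable:
        return 100  # Critically degraded (an unreachable DB implies stale)
    if stale:
        return 200  # Stale but ConfigDb reachable
    if not probe_ok:
        return None  # combination not documented in the table
    return 250 if is_leader else 240  # Healthy leader / healthy follower
```

For example, `service_level("Up", True, True, False, True, False)` yields the healthy-follower byte, and flipping `is_leader` to `True` adds the +10 bonus.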

> **Secondary 100 → 240 (behavior change).** Previously a healthy Secondary
> published 100 (coarse role-only mapping). It now publishes **240** — both
> nodes sit at 240/250 under healthy conditions, with the leader still preferred
> by the +10 bonus. Clients with the standard "pick highest ServiceLevel"
> heuristic continue to prefer the primary.
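
The "pick highest ServiceLevel" heuristic mentioned above can be sketched from the client's point of view. This is an assumption-laden Python illustration: the function name `pick_endpoint` and the `node-b.lan` URL are hypothetical, the levels are passed in as already-read values rather than fetched over OPC UA, and skipping endpoints that report 0 is a choice made for this sketch.

```python
from typing import Dict, Optional


def pick_endpoint(service_levels: Dict[str, int]) -> Optional[str]:
    """Return the endpoint URL with the highest ServiceLevel.

    Endpoints reporting 0 (Down / Detached) are skipped; returns None
    when no endpoint reports a usable level.
    """
    candidates = {url: level for url, level in service_levels.items() if level > 0}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Levels a client might have read from Server.ServiceLevel on each node.
levels = {
    "opc.tcp://node-a.lan:4840": 250,  # healthy leader
    "opc.tcp://node-b.lan:4840": 240,  # healthy follower
}
preferred = pick_endpoint(levels)
```

With both nodes healthy, the 10-point gap is enough for this heuristic to select the leader.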

#### Backward-compatible fallback (legacy seam)

A node with no `DbHealthStatus` wired (e.g. early bootstrap window before the
first `DbHealthProbeActor` reply) falls back to the old role-only mapping:
Primary-leader → 240, Primary → 200, Secondary → 100, Detached → 0. Once the
first `DbHealthStatus` arrives the calculator takes over. The first computed
ServiceLevel (even 0) is always published so no node lingers at the SDK default
255.

Roles come from `RedundancyStateActor.BuildSnapshot`: a node with the `driver`
role is `Primary` when it is the `driver` role-leader, otherwise
`Secondary`; a node without the `driver` role is `Detached`.

## Data flow

```
Cluster topology event ──────────────────────────────────────────┐
                                                                   ▼
                                               RedundancyStateActor (admin singleton)
                                                                   │  debounce 250ms
                                                                   ▼
                                               DPS topic "redundancy-state"
                                                    │                         ▲
                            ┌───────────────────────┘                         │
                            │                                                  │
                            ▼                                                  │
              Driver node: OpcUaPublishActor                                   │
              ┌─────────────────────────────────────────────────────────┐      │
              │  Inputs collected per ~5s HealthTick:                   │      │
              │   • MemberState  ← Akka SelfMember.Status               │      │
              │   • DbReachable  ← DbHealthProbeActor (Ask, timeout→F) │      │
              │   • OpcUaProbeOk ← OpcUaProbeResult about THIS node    │──────┘
              │   • Stale        ← derived from above timestamps        │  PeerProbeSupervisor
              │   • IsLeader     ← RedundancyStateChanged snapshot      │  → PeerOpcUaProbeActor(s)
              │                                                         │  publish OpcUaProbeResult
              │  ServiceLevelCalculator.Compute(NodeHealthInputs)       │  on "redundancy-state"
              │  → byte (0/100/200/240/250)                             │
              └─────────────────────────────────────────────────────────┘
                            │
                            ▼
              IServiceLevelPublisher (SdkServiceLevelPublisher)
                            │
                            ▼
              OPC UA Server.ServiceLevel Variable
```

Both `DbHealthProbeActor` and `PeerOpcUaProbeActor` feed the live publish path.
The peer probe publishes `OpcUaProbeResult` on the `redundancy-state` topic;
`OpcUaPublishActor` consumes only results whose target is itself and applies
freshness-debouncing before passing them to the calculator. `DbHealthProbeActor`
is queried directly via Ask on each `HealthTick`.

The admin singleton is the cluster's only `RedundancyStateActor`. If the admin leader fails over, the new admin node spins up its replacement, re-subscribes to cluster events, and publishes a fresh snapshot from the current `Cluster.State`. There is no DB-persisted state to recover.

## Configuration

Per-node identity comes from `appsettings.json` + the `OTOPCUA_ROLES` env var:

```json
{
  "Cluster": {
    "Hostname": "0.0.0.0",
    "Port": 4053,
    "PublicHostname": "node-a.lan",
    "SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
    "Roles": ["admin", "driver"]
  }
}
```

```
OTOPCUA_ROLES=admin,driver
```

Both nodes share the same `ConfigDb` connection string; `Cluster.PublicHostname` and `Roles` are what make them distinct in cluster gossip. The first node bootstraps the cluster (its address goes in `SeedNodes`); the second node joins via the same `SeedNodes` list.

There is no longer a `Node:NodeId` setting and no `ClusterNode.RedundancyRole` column (the V2 migration dropped it — primary/secondary is now derived from cluster role-leadership). NodeId is derived as `host:port` of the cluster `PublicHostname` (see `ClusterRoleInfo.LocalNode` for the formula).

> **`RedundancyStateActor` NodeId consistency (fixed).** `RedundancyStateActor` now keys each node's `NodeRedundancyState` entry by the canonical `host:port` node id (via a `ToNodeId(Address)` helper mirroring `ClusterRoleInfo.ToNodeId`). Previously it keyed by `member.Address.Host` (host-only, e.g. `central-2`); since every subscriber matches by the canonical `host:port` form, the mismatch silently meant no node ever matched its own entry — all nodes stayed at the default ServiceLevel 255 and never learned their role. This fix makes `RedundancyStateActor` consistent with the stated contract above. Additionally, `RedundancyStateActor` now **re-publishes the current snapshot on a periodic heartbeat (default 10 s)** so any node that subscribes after the last topology-change publish converges within the interval (DistributedPubSub does not replay to late subscribers).
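
The key mismatch described above is easy to reproduce in miniature. The following Python sketch works on address strings purely for illustration; `to_node_id` and `host_only` are hypothetical stand-ins, not the `ToNodeId(Address)` helper itself, and they assume the `system@host:port` address shape shown in `SeedNodes`.

```python
def to_node_id(address: str) -> str:
    """Canonical host:port id, assuming an address like akka.tcp://system@host:port."""
    return address.rsplit("@", 1)[1]


def host_only(address: str) -> str:
    """The host-only key that the earlier behavior used."""
    return to_node_id(address).rsplit(":", 1)[0]


address = "akka.tcp://otopcua@node-a.lan:4053"
local_node_id = to_node_id(address)  # what every subscriber looks up

snapshot_keyed_by_host = {host_only(address): "Primary"}      # old keying
snapshot_keyed_by_node_id = {to_node_id(address): "Primary"}  # fixed keying

found_with_old_keys = local_node_id in snapshot_keyed_by_host
found_with_new_keys = local_node_id in snapshot_keyed_by_node_id
```

With host-only keys the lookup silently misses, which matches the symptom of nodes never learning their role.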

The `ClusterNode.ServiceLevelBase` column still exists and is editable in the Admin UI (NodeEdit / Cluster Redundancy pages), but it no longer drives the runtime ServiceLevel — that value is computed by `ServiceLevelCalculator` from cluster role and live health inputs, independent of this stored preference.

### Peer URI advertising

Each node advertises its partner via `OpcUaApplicationHostOptions.PeerApplicationUris` (an `IList<string>`, default empty). `OpcUaApplicationHost.PopulateServerArray` appends each configured peer URI to the SDK's `IServerInternal.ServerUris` string table after server startup, so that `Server.ServerArray` reads served by `OnReadServerArray` return both self + peers. The options bind from the `OpcUa` config section (see `Program.cs` — `AddValidatedOptions<OpcUaApplicationHostOptions>(…, "OpcUa")`). Set this per-node in `appsettings.json`:

```json
{
  "OpcUa": {
    "PeerApplicationUris": ["urn:node-b:OtOpcUa"]
  }
}
```

Node A lists Node B's `ApplicationUri` and vice-versa. This is validated by `DualEndpointTests` in `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests/`, which boots two `OpcUaApplicationHost` instances on loopback and asserts that a real OPCFoundation client `Session` reading `Server.ServerArray` from Node A sees both URIs.

## Split-brain

`akka.conf` configures Akka's split-brain resolver with `active-strategy = keep-oldest`, `stable-after = 15s`, and `failure-detector.threshold = 10.0`. Under a clean partition, the side that contains the oldest member stays up and the other side downs itself once membership has been stable for the `stable-after` window (~15 seconds), regardless of which side is larger. The `RedundancyStateActor` on the surviving partition re-computes from the post-partition `Cluster.State`.
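
The core `keep-oldest` decision can be sketched as follows. This is a simplified Python illustration, not the resolver's code: `surviving_side` and `up_order` are hypothetical names, node names are made up, and the sketch deliberately ignores the resolver's `down-if-alone` option and the `stable-after` timing.

```python
from typing import Dict, List, Set


def surviving_side(partitions: List[Set[str]], up_order: Dict[str, int]) -> Set[str]:
    """Return the partition holding the oldest member (lowest up_order value)."""
    oldest = min(up_order, key=up_order.get)
    for side in partitions:
        if oldest in side:
            return side
    raise ValueError("oldest member is not present in any partition")


# Hypothetical five-node cluster; a lower number means the node came up earlier.
up_order = {"node-a": 1, "node-b": 2, "node-c": 3, "node-d": 4, "node-e": 5}
sides = [{"node-a", "node-b"}, {"node-c", "node-d", "node-e"}]
survivor = surviving_side(sides, up_order)
```

In this example the two-node side survives because it holds `node-a`, even though the other side has more members.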

There is no operator-driven role swap during a partition. Failover is what the cluster does automatically.

## Primary-gated alarm emission and historization

Under warm/hot redundancy both cluster nodes run `ScriptedAlarmHostActor` and evaluate scripted alarms, keeping each node's address space and engine state warm for instant failover. However, to avoid duplicate rows on `/alerts` and duplicate historian writes, only the Primary node publishes externally:

- **`alerts` topic emission** — `ScriptedAlarmHostActor` subscribes to the `redundancy-state` DPS topic and caches the local node's `RedundancyRole`. Each alarm transition is published to the cluster `alerts` topic **only when the node's role is `Primary`**. The default behaviour before any `redundancy-state` message arrives is to emit, so single-node deployments and the boot window never drop transitions. The OPC UA condition-node write and inbound ack/shelve command processing remain **ungated** on both nodes so the secondary is always ready to serve clients after a failover.
- **`HistorianAdapterActor` historization** — likewise Primary-gated so alarm historization is exactly-once across all alarm sources. The actor subscribes to the `alerts` DPS topic and translates each `AlarmTransitionEvent` → `AlarmHistorianEvent` before enqueuing it on the sink; scripted alarms therefore historize exactly once regardless of cluster size.

Net effect: each alarm transition appears **once** on `/alerts` and is historized once, not once per node.
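
The gating rule for the `alerts` topic, including the emit-by-default boot window, can be sketched as below. This is a minimal Python illustration under stated assumptions: `AlertGate` and its methods are hypothetical names, transitions are plain strings, and the real behavior lives in `ScriptedAlarmHostActor`.

```python
from typing import List, Optional


class AlertGate:
    """Publishes alarm transitions only while the cached role allows it."""

    def __init__(self) -> None:
        self.role: Optional[str] = None  # None until a redundancy-state message arrives
        self.published: List[str] = []   # stands in for the cluster alerts topic

    def on_redundancy_state(self, role: str) -> None:
        """Cache the local node's role from the latest snapshot."""
        self.role = role

    def on_transition(self, transition: str) -> bool:
        """Emit when Primary, or when no role is known yet (boot / single node)."""
        should_emit = self.role is None or self.role == "Primary"
        if should_emit:
            self.published.append(transition)
        return should_emit
```

A node that has not yet heard a `redundancy-state` message emits, a `Secondary` stays silent, and a node promoted to `Primary` resumes emitting.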

See [ScriptedAlarms.md](ScriptedAlarms.md) and [AlarmTracking.md](AlarmTracking.md) for the scripted-alarm engine internals.

## Client-side failover

The OtOpcUa Client CLI at `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` supports `-F` / `--failover-urls` for automatic client-side failover; for long-running subscriptions the CLI monitors session KeepAlive and reconnects to the next available server, recreating the subscription on the new endpoint. See [`Client.CLI.md`](Client.CLI.md).

## Observability

`OpcUaPublishActor` emits one metric on every ServiceLevel transition (it suppresses no-op repeats of the same byte):

| Metric | Type | Notes |
|---|---|---|
| `otopcua.redundancy.service_level_change` | Counter (`{change}`) | OPC UA `Server.ServiceLevel` transitions emitted by the redundancy state. Tagged with `level` = the new byte. |

The meter is defined on `OtOpcUaTelemetry` (`src/Core/ZB.MOM.WW.OtOpcUa.Commons/Observability/OtOpcUaTelemetry.cs`); it surfaces through whatever OpenTelemetry exporter the host configures.
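
The "count transitions, suppress repeats" behavior can be sketched as follows. This Python illustration uses a plain `collections.Counter` in place of the real meter; `ServiceLevelChangeCounter` is a hypothetical name, and treating the first published byte as a transition is an assumption of the sketch.

```python
from collections import Counter
from typing import Optional


class ServiceLevelChangeCounter:
    """Counts ServiceLevel transitions, tagged by the new level."""

    def __init__(self) -> None:
        self.last: Optional[int] = None
        self.changes: Counter = Counter()  # key = new level byte

    def publish(self, level: int) -> bool:
        """Record a change; return False for a no-op repeat of the same byte."""
        if level == self.last:
            return False
        self.last = level
        self.changes[level] += 1
        return True
```

Publishing 240 twice and then 250 would record one change tagged `240` and one tagged `250`.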

## Depth reference

For the full design — message contracts, tiered calculator truth table, recovery semantics — see `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §6.