From 5de6c8d0529038c090b565de741aa558919a7314 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Sun, 22 Mar 2026 08:43:59 -0400
Subject: [PATCH] docs(dcl): document primary/backup endpoint redundancy across requirements and test infra

---
 docs/requirements/Component-CentralUI.md      |  2 +
 .../Component-DataConnectionLayer.md          | 39 ++++++++++++++++++-
 docs/requirements/HighLevelReqs.md            |  1 +
 docs/test_infra/test_infra.md                 |  2 +
 4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/docs/requirements/Component-CentralUI.md b/docs/requirements/Component-CentralUI.md
index 4a051fd..469f203 100644
--- a/docs/requirements/Component-CentralUI.md
+++ b/docs/requirements/Component-CentralUI.md
@@ -68,6 +68,8 @@ Central cluster only. Sites have no user interface.
 ### Site & Data Connection Management (Admin Role)
 - Create, edit, and delete site definitions, including Akka node addresses (NodeA/NodeB) and gRPC node addresses (GrpcNodeA/GrpcNodeB).
 - Define data connections and assign them to sites (name, protocol type, connection details).
+- **Data connection form**: "Primary Endpoint Configuration" (required JSON text area) and optional "Backup Endpoint Configuration" (collapsible section, hidden by default, revealed via "Add Backup Endpoint" button; "Remove Backup" button when editing an existing backup). "Failover Retry Count" numeric input (default 3, min 1, max 20) is visible only when a backup endpoint is configured.
+- **Data connection list page**: Shows Primary Config and Backup Config columns. Active Endpoint column populated from health reports.
 
 ### Area Management (Admin Role)
 - Define hierarchical area structures per site.
diff --git a/docs/requirements/Component-DataConnectionLayer.md b/docs/requirements/Component-DataConnectionLayer.md
index 9bcd98a..9662c51 100644
--- a/docs/requirements/Component-DataConnectionLayer.md
+++ b/docs/requirements/Component-DataConnectionLayer.md
@@ -104,9 +104,46 @@ LmxProxy is a gRPC-based protocol for communicating with LMX data servers. The D
 
 **Test Infrastructure**: The `infra/lmxfakeproxy/` project provides a fake LmxProxy server that bridges to the OPC UA test server. It implements the full `scada.ScadaService` proto, enabling end-to-end testing of `RealLmxProxyClient` without a Windows LmxProxy deployment. See [test_infra_lmxfakeproxy.md](../test_infra/test_infra_lmxfakeproxy.md) for setup.
 
+## Endpoint Redundancy
+
+Data connections support an optional backup endpoint for automatic failover when the active endpoint becomes unreachable. Both endpoints use the same protocol.
+
+**Entity fields:**
+
+| Field | Type | Notes |
+|-------|------|-------|
+| `PrimaryConfiguration` | string? (max 4000) | Required. Renamed from `Configuration` |
+| `BackupConfiguration` | string? (max 4000) | Optional. Null = no backup |
+| `FailoverRetryCount` | int (default 3) | Retries on active endpoint before switching |
+
+**Failover state machine:**
+
+```
+Connected → disconnect → push bad quality → retry active endpoint (5s)
+  → N failures (≥ FailoverRetryCount) → switch to other endpoint
+  → dispose adapter, create fresh adapter with other config
+  → reconnect → ReSubscribeAll → Connected
+```
+
+- **Round-robin**: primary → backup → primary → backup. No preferred endpoint after first failover — the connection stays on whichever endpoint is working.
+- **No auto-failback**: The connection remains on the active endpoint until it fails.
+- **Single-endpoint connections** (no backup): Retry indefinitely on the same endpoint, preserving existing behavior.
+- **Adapter lifecycle on failover**: The actor disposes the current `IDataConnection` adapter and creates a fresh one via `DataConnectionFactory.Create()` with the other endpoint's configuration. Clean slate — no stale state.
+
+**Health reporting:**
+
+- `DataConnectionHealthReport` includes `ActiveEndpoint`: `"Primary"`, `"Backup"`, or `"Primary (no backup)"`.
+
+**Site event log entries:**
+
+- `DataConnectionFailover` (Warning) — connection name, from-endpoint, to-endpoint, failure count.
+- `DataConnectionRestored` (Info) — connection name, active endpoint.
+
+See [`2026-03-22-primary-backup-data-connections-design.md`](../plans/2026-03-22-primary-backup-data-connections-design.md) for the full design.
+
 ## Connection Configuration Reference
 
-All settings are parsed from the data connection's `Configuration` JSON dictionary (stored as `IDictionary` connection details). Invalid numeric values fall back to defaults silently.
+All settings are parsed from the data connection's configuration JSON dictionaries (`PrimaryConfiguration` and optional `BackupConfiguration`, stored as `IDictionary` connection details). Both endpoints use the same protocol-specific keys. Invalid numeric values fall back to defaults silently.
 
 ### OPC UA Settings
 
diff --git a/docs/requirements/HighLevelReqs.md b/docs/requirements/HighLevelReqs.md
index fa322ab..0c45992 100644
--- a/docs/requirements/HighLevelReqs.md
+++ b/docs/requirements/HighLevelReqs.md
@@ -65,6 +65,7 @@
 - Additional protocols can be added by implementing the common interface.
 - The Data Connection Layer is a **clean data pipe** — it publishes tag value updates to Instance Actors but performs no evaluation of triggers or alarm conditions.
 - **Initial attribute quality**: Attributes bound to a data connection start with **uncertain** quality when the Instance Actor initializes. The quality remains uncertain until the first value update is received from the Data Connection Layer.
This distinguishes "never received a value" from "received a known-good value" or "connection lost" (bad quality).
+- Data connections support optional **backup endpoints** with automatic failover after a configurable retry count. On failover, all subscriptions are transparently re-created on the new endpoint.
 
 ### 2.5 Scale
 - Approximately **10 sites**.
diff --git a/docs/test_infra/test_infra.md b/docs/test_infra/test_infra.md
index e4ec9e9..4673b4f 100644
--- a/docs/test_infra/test_infra.md
+++ b/docs/test_infra/test_infra.md
@@ -64,6 +64,8 @@ API key (ReadWrite): `c4559c7c6acc60a997135c1381162e3c30f4572ece78dd933c1a626e6f
 
 Full details: [`lmxproxy/instances_config.md`](../../lmxproxy/instances_config.md)
 
+**Primary/backup testing**: The dual OPC UA test servers (ports 50000 and 50010) in local Docker and the dual LmxProxy v2 instances on windev (ports 50100 and 50101) provide primary/backup endpoint pairs for testing Data Connection Layer failover. Use `docker compose stop opcua` to simulate primary failure and verify automatic failover to the backup.
+
 ## Connection Strings
 
 For use in `appsettings.Development.json`: