docs(dcl): document primary/backup endpoint redundancy across requirements and test infra

Joseph Doherty
2026-03-22 08:43:59 -04:00
parent e8df71ea64
commit 5de6c8d052
4 changed files with 43 additions and 1 deletions


@@ -68,6 +68,8 @@ Central cluster only. Sites have no user interface.
### Site & Data Connection Management (Admin Role)
- Create, edit, and delete site definitions, including Akka node addresses (NodeA/NodeB) and gRPC node addresses (GrpcNodeA/GrpcNodeB).
- Define data connections and assign them to sites (name, protocol type, connection details).
- **Data connection form**: "Primary Endpoint Configuration" is a required JSON text area. "Backup Endpoint Configuration" is an optional collapsible section, hidden by default and revealed via the "Add Backup Endpoint" button; a "Remove Backup" button appears when editing a connection that already has a backup. The "Failover Retry Count" numeric input (default 3, min 1, max 20) is visible only when a backup endpoint is configured.
- **Data connection list page**: Shows Primary Config and Backup Config columns, plus an Active Endpoint column populated from health reports.
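
The form rules above can be sketched as a small validator. This is an illustration only: the function name, argument names, and return shape are hypothetical, and storing the default retry count when no backup is configured is an assumption rather than documented behavior.

```python
import json
from typing import Optional

# Limits taken from the form description: default 3, min 1, max 20.
RETRY_DEFAULT, RETRY_MIN, RETRY_MAX = 3, 1, 20


def validate_connection_form(primary: str,
                             backup: Optional[str] = None,
                             retry_count: Optional[int] = None) -> dict:
    """Hypothetical validator: returns normalized values or raises ValueError."""
    if not primary or not primary.strip():
        raise ValueError("Primary Endpoint Configuration is required")
    json.loads(primary)  # json.JSONDecodeError is a ValueError subclass

    if backup is None or not backup.strip():
        # No backup: the retry count input is hidden.
        # Assumption: the default value is kept in that case.
        return {"primary": primary, "backup": None, "retry_count": RETRY_DEFAULT}

    json.loads(backup)
    count = RETRY_DEFAULT if retry_count is None else retry_count
    if not RETRY_MIN <= count <= RETRY_MAX:
        raise ValueError(
            f"Failover Retry Count must be between {RETRY_MIN} and {RETRY_MAX}"
        )
    return {"primary": primary, "backup": backup, "retry_count": count}
```

Calling it with only a primary configuration yields `backup=None`; a retry count outside 1..20 is rejected only when a backup is present, mirroring the visibility rule for the input.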
### Area Management (Admin Role)
- Define hierarchical area structures per site.


@@ -104,9 +104,46 @@ LmxProxy is a gRPC-based protocol for communicating with LMX data servers. The D
**Test Infrastructure**: The `infra/lmxfakeproxy/` project provides a fake LmxProxy server that bridges to the OPC UA test server. It implements the full `scada.ScadaService` proto, enabling end-to-end testing of `RealLmxProxyClient` without a Windows LmxProxy deployment. See [test_infra_lmxfakeproxy.md](../test_infra/test_infra_lmxfakeproxy.md) for setup.
## Endpoint Redundancy
Data connections support an optional backup endpoint for automatic failover when the active endpoint becomes unreachable. Both endpoints use the same protocol.
**Entity fields:**
| Field | Type | Notes |
|-------|------|-------|
| `PrimaryConfiguration` | string? (max 4000) | Required. Renamed from `Configuration` |
| `BackupConfiguration` | string? (max 4000) | Optional. Null = no backup |
| `FailoverRetryCount` | int (default 3) | Retries on active endpoint before switching |
**Failover state machine:**
```
Connected → disconnect → push bad quality → retry active endpoint (5s)
→ N failures (≥ FailoverRetryCount) → switch to other endpoint
→ dispose adapter, create fresh adapter with other config
→ reconnect → ReSubscribeAll → Connected
```
- **Round-robin**: primary → backup → primary → backup. No preferred endpoint after first failover — the connection stays on whichever endpoint is working.
- **No auto-failback**: The connection remains on the active endpoint until it fails.
- **Single-endpoint connections** (no backup): Retry indefinitely on the same endpoint, preserving existing behavior.
- **Adapter lifecycle on failover**: The actor disposes the current `IDataConnection` adapter and creates a fresh one via `DataConnectionFactory.Create()` with the other endpoint's configuration. Clean slate — no stale state.
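
The endpoint-selection rules above can be modeled in a few lines. The sketch below is not the actor implementation: `FailoverTracker` and its method names are hypothetical, it covers only the counting and switching logic (no adapters, timers, or subscriptions), and resetting the failure counter on a successful connect is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverTracker:
    """Hypothetical model of which endpoint a data connection should use."""
    has_backup: bool
    failover_retry_count: int = 3
    active: str = "Primary"
    failures: int = 0

    def on_connect_failed(self) -> Optional[str]:
        """Record one failed retry; return the new endpoint if a switch occurs."""
        self.failures += 1
        if not self.has_backup:
            return None  # single endpoint: keep retrying indefinitely
        if self.failures >= self.failover_retry_count:
            # Round-robin: flip to the other endpoint and start counting again.
            self.active = "Backup" if self.active == "Primary" else "Primary"
            self.failures = 0
            return self.active  # caller would dispose and recreate the adapter
        return None

    def on_connected(self) -> None:
        """Assumed: a successful connect clears the counter; no failback."""
        self.failures = 0
```

With `failover_retry_count=3`, the third consecutive failure returns `"Backup"`. After a later outage on the backup, the third failure returns `"Primary"` again, which is the round-robin behavior described above.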
**Health reporting:**
- `DataConnectionHealthReport` includes `ActiveEndpoint`: `"Primary"`, `"Backup"`, or `"Primary (no backup)"`.
**Site event log entries:**
- `DataConnectionFailover` (Warning) — connection name, from-endpoint, to-endpoint, failure count.
- `DataConnectionRestored` (Info) — connection name, active endpoint.
See [`2026-03-22-primary-backup-data-connections-design.md`](../plans/2026-03-22-primary-backup-data-connections-design.md) for the full design.
## Connection Configuration Reference
All settings are parsed from the data connection's `Configuration` JSON dictionary (stored as `IDictionary<string, string>` connection details). Invalid numeric values fall back to defaults silently.
All settings are parsed from the data connection's configuration JSON dictionaries (`PrimaryConfiguration` and optional `BackupConfiguration`, stored as `IDictionary<string, string>` connection details). Both endpoints use the same protocol-specific keys. Invalid numeric values fall back to defaults silently.
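
As an illustration of the silent fallback for numeric values, a helper along these lines would behave as described. The helper name and the `TimeoutMs` key in the usage note are hypothetical and do not refer to actual setting names.

```python
from typing import Mapping


def get_int_setting(details: Mapping[str, str], key: str, default: int) -> int:
    """Return details[key] as an int, or `default` if missing or not an integer."""
    raw = details.get(key)
    if raw is None:
        return default
    try:
        return int(raw.strip())
    except ValueError:
        # Invalid numeric value: fall back to the default without raising.
        return default
```

For example, `get_int_setting({"TimeoutMs": "abc"}, "TimeoutMs", 1000)` returns the default of 1000 instead of raising. The same helper would apply to either endpoint's dictionary, since both use the same protocol-specific keys.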
### OPC UA Settings


@@ -65,6 +65,7 @@
- Additional protocols can be added by implementing the common interface.
- The Data Connection Layer is a **clean data pipe** — it publishes tag value updates to Instance Actors but performs no evaluation of triggers or alarm conditions.
- **Initial attribute quality**: Attributes bound to a data connection start with **uncertain** quality when the Instance Actor initializes. The quality remains uncertain until the first value update is received from the Data Connection Layer. This distinguishes "never received a value" from "received a known-good value" or "connection lost" (bad quality).
- Data connections support optional **backup endpoints** with automatic failover after a configurable retry count. On failover, all subscriptions are transparently re-created on the new endpoint.
### 2.5 Scale
- Approximately **10 sites**.


@@ -64,6 +64,8 @@ API key (ReadWrite): `c4559c7c6acc60a997135c1381162e3c30f4572ece78dd933c1a626e6f
Full details: [`lmxproxy/instances_config.md`](../../lmxproxy/instances_config.md)
**Primary/backup testing**: The dual OPC UA test servers (ports 50000 and 50010) in local Docker and the dual LmxProxy v2 instances on windev (ports 50100 and 50101) provide primary/backup endpoint pairs for testing Data Connection Layer failover. Use `docker compose stop opcua` to simulate primary failure and verify automatic failover to the backup.
## Connection Strings
For use in `appsettings.Development.json`: