Add configurable non-transparent OPC UA server redundancy

Separates ApplicationUri from namespace identity so each instance in a
redundant pair has a unique server URI while sharing the same Galaxy
namespace. Exposes RedundancySupport, ServerUriArray, and dynamic
ServiceLevel through the standard OPC UA server object. ServiceLevel
is computed from role (Primary/Secondary) and runtime health (MXAccess
and DB connectivity). Adds CLI redundancy command, second deployed
service instance, and 31 new tests including paired-server integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-03-28 13:32:17 -04:00
parent a3c2d9b243
commit a55153d7d5
27 changed files with 1475 additions and 248 deletions


@@ -222,3 +222,29 @@ The command builds an `EventFilter` with select clauses for 12 fields from the O
When an `EventFieldList` notification arrives, the handler extracts these fields by index and prints a structured alarm event to the console showing the source name, condition name, active/acknowledged state, severity, message, retain flag, and suppressed/shelved status.
The `--refresh` flag calls `subscription.ConditionRefreshAsync()` after the subscription is created, which asks the server to re-emit retained condition events so the operator sees the current alarm state immediately rather than waiting for the next transition.
### redundancy
Reads the OPC UA redundancy state from a server: `RedundancySupport`, `ServiceLevel`, `ServerUriArray`, and `ApplicationUri`.
```bash
dotnet run -- redundancy -u opc.tcp://localhost:4840/LmxOpcUa
```
Example output:
```text
Redundancy Mode: Warm
Service Level: 200
Server URIs:
- urn:localhost:LmxOpcUa:instance1
- urn:localhost:LmxOpcUa:instance2
Application URI: urn:localhost:LmxOpcUa:instance1
```
| Flag | Description |
|------|-------------|
| `-u` | OPC UA server endpoint URL (required) |
| `-S` | Transport security mode (default: none) |
| `-U` | Username for authentication |
| `-P` | Password for authentication |


@@ -55,6 +55,7 @@ Controls the OPC UA server endpoint and session limits. Defined in `OpcUaConfigu
| `MaxSessions` | `int` | `100` | Maximum simultaneous OPC UA sessions |
| `SessionTimeoutMinutes` | `int` | `30` | Idle session timeout in minutes |
| `AlarmTrackingEnabled` | `bool` | `false` | Enables `AlarmConditionState` nodes for alarm attributes |
| `ApplicationUri` | `string?` | `null` | Explicit application URI for this server instance. Required when redundancy is enabled. Defaults to `urn:{GalaxyName}:LmxOpcUa` when null |
### MxAccess
@@ -156,6 +157,30 @@ Example — production deployment with encrypted transport:
}
```
### Redundancy
Controls non-transparent OPC UA redundancy. Defined in `RedundancyConfiguration`. See [Redundancy Guide](Redundancy.md) for detailed usage.
| Property | Type | Default | Description |
|----------|------|---------|-------------|
| `Enabled` | `bool` | `false` | Enables redundancy mode and ServiceLevel computation |
| `Mode` | `string` | `"Warm"` | Redundancy mode: `Warm` or `Hot` |
| `Role` | `string` | `"Primary"` | Instance role: `Primary` (higher ServiceLevel) or `Secondary` |
| `ServerUris` | `List<string>` | `[]` | ApplicationUri values for all servers in the redundant set |
| `ServiceLevelBase` | `int` | `200` | Base ServiceLevel when healthy (1-255). A `Secondary` instance uses `ServiceLevelBase - 50` |
Example — two-instance redundant pair (Primary):
```json
"Redundancy": {
"Enabled": true,
"Mode": "Warm",
"Role": "Primary",
"ServerUris": ["urn:localhost:LmxOpcUa:instance1", "urn:localhost:LmxOpcUa:instance2"],
"ServiceLevelBase": 200
}
```
## Feature Flags
Three boolean properties act as feature flags that control optional subsystems:
@@ -176,6 +201,10 @@ Three boolean properties act as feature flags that control optional subsystems:
- Unknown security profile names are logged as warnings
- `AutoAcceptClientCertificates = true` emits a warning
- Only-`None` profile configuration emits a warning
- `OpcUa.ApplicationUri` must be set when `Redundancy.Enabled = true`
- `Redundancy.ServiceLevelBase` must be between 1 and 255
- `Redundancy.ServerUris` should contain at least 2 entries when enabled
- Local `ApplicationUri` should appear in `Redundancy.ServerUris`
If validation fails, the service throws `InvalidOperationException` and does not start.
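The redundancy rules in the list above can be sketched as a small check. This is a minimal Python illustration, not the service's validator: the function name `validate_redundancy` and the dictionary-based input are hypothetical, and treating the "must" rules as errors and the "should" rules as warnings is an assumption drawn from the wording of the list. Only the `ApplicationUri` error text is taken from the documentation; the other messages are placeholders.

```python
def validate_redundancy(opcua, redundancy):
    """Return (errors, warnings) for the redundancy rules (hypothetical helper)."""
    errors, warnings = [], []

    # Documented range rule: ServiceLevelBase must be between 1 and 255.
    base = redundancy.get("ServiceLevelBase", 200)
    if not 1 <= base <= 255:
        errors.append("Redundancy.ServiceLevelBase must be between 1 and 255")

    # Assumption: the remaining rules only apply when redundancy is enabled.
    if not redundancy.get("Enabled", False):
        return errors, warnings

    app_uri = opcua.get("ApplicationUri")
    if not app_uri:  # null or empty
        errors.append("OpcUa.ApplicationUri must be set when redundancy is enabled")

    server_uris = redundancy.get("ServerUris", [])
    if len(server_uris) < 2:
        warnings.append("Redundancy.ServerUris should contain at least 2 entries")
    if app_uri and app_uri not in server_uris:
        warnings.append("OpcUa.ApplicationUri should appear in Redundancy.ServerUris")

    return errors, warnings
```

In this sketch a non-empty `errors` list corresponds to the case where the service refuses to start, while `warnings` would only be logged.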
@@ -206,7 +235,8 @@ Integration tests use this constructor to inject substitute implementations of `
"GalaxyName": "ZB",
"MaxSessions": 100,
"SessionTimeoutMinutes": 30,
"AlarmTrackingEnabled": false
"AlarmTrackingEnabled": false,
"ApplicationUri": null
},
"MxAccess": {
"ClientName": "LmxOpcUa",
@@ -249,6 +279,13 @@ Integration tests use this constructor to inject substitute implementations of `
"MinimumCertificateKeySize": 2048,
"PkiRootPath": null,
"CertificateSubject": null
},
"Redundancy": {
"Enabled": false,
"Mode": "Warm",
"Role": "Primary",
"ServerUris": [],
"ServiceLevelBase": 200
}
}
```


@@ -19,7 +19,7 @@ The OPC UA server component hosts the Galaxy-backed namespace on a configurable
The resulting endpoint URL is `opc.tcp://{BindAddress}:{Port}{EndpointPath}`, e.g., `opc.tcp://0.0.0.0:4840/LmxOpcUa`.
-The namespace URI follows the pattern `urn:{GalaxyName}:LmxOpcUa` and serves as both the `ApplicationUri` and `ProductUri`.
+The namespace URI follows the pattern `urn:{GalaxyName}:LmxOpcUa` and is used as the `ProductUri`. The `ApplicationUri` can be set independently via `OpcUa.ApplicationUri` to support redundant deployments where each instance needs a unique identity. When `ApplicationUri` is null, it defaults to the namespace URI.
## Programmatic ApplicationConfiguration
@@ -48,6 +48,16 @@ Supported Phase 1 profiles:
For production deployments, configure `["Basic256Sha256-SignAndEncrypt"]` or `["None", "Basic256Sha256-SignAndEncrypt"]` and set `AutoAcceptClientCertificates` to `false`. See the [Security Guide](security.md) for hardening details.
## Redundancy
When `Redundancy.Enabled = true`, `LmxOpcUaServer` exposes the standard OPC UA redundancy nodes on startup:
- `Server/ServerRedundancy/RedundancySupport` — set to `Warm` or `Hot` based on configuration
- `Server/ServerRedundancy/ServerUriArray` — populated with the configured `ServerUris`
- `Server/ServiceLevel` — computed dynamically from role and runtime health
The `ServiceLevel` is recomputed whenever the MXAccess connection state or the Galaxy DB health changes. See [Redundancy Guide](Redundancy.md) for full details.
### User token policies
`UserTokenPolicies` are dynamically configured based on the `Authentication` settings in `appsettings.json`:

docs/Redundancy.md (new file, 175 lines)

@@ -0,0 +1,175 @@
# Redundancy
## Overview
LmxOpcUa supports OPC UA **non-transparent redundancy** in Warm or Hot mode. In a non-transparent redundancy deployment, two independent server instances run side by side. Both connect to the same Galaxy repository database and the same MXAccess runtime, but each maintains its own OPC UA sessions and subscriptions. Clients discover the redundant set through the `ServerUriArray` exposed in each server's address space and are responsible for managing failover between the two endpoints.
When redundancy is disabled (the default), the server reports `RedundancySupport.None` and a fixed `ServiceLevel` of 255.
## Namespace vs Application Identity
Both servers in the redundant set share the same **namespace URI** so that clients see identical node IDs regardless of which instance they are connected to. The namespace URI follows the pattern `urn:{GalaxyName}:LmxOpcUa` (e.g., `urn:ZB:LmxOpcUa`).
The **ApplicationUri**, on the other hand, must be unique per instance. This is how the OPC UA stack and clients distinguish one server from the other within the redundant set. Each instance sets its own ApplicationUri via the `OpcUa.ApplicationUri` configuration property (e.g., `urn:localhost:LmxOpcUa:instance1` and `urn:localhost:LmxOpcUa:instance2`).
When redundancy is disabled, `ApplicationUri` defaults to `urn:{GalaxyName}:LmxOpcUa` if left null.
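The relationship between the two identities can be illustrated with a short Python sketch. The helper names `namespace_uri` and `effective_application_uri` are hypothetical and are not part of the service; the sketch only mirrors the defaulting rule described above.

```python
def namespace_uri(galaxy_name):
    """Namespace URI shared by every instance in the redundant set."""
    return f"urn:{galaxy_name}:LmxOpcUa"


def effective_application_uri(galaxy_name, application_uri=None):
    """Per-instance ApplicationUri, falling back to the namespace URI when null."""
    if application_uri is None:
        return namespace_uri(galaxy_name)
    return application_uri
```

With `GalaxyName` set to `ZB`, both instances share the namespace URI `urn:ZB:LmxOpcUa`, while an explicit `ApplicationUri` such as `urn:localhost:LmxOpcUa:instance1` keeps each server identity distinct.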
## Configuration
### Redundancy Section
| Property | Type | Default | Description |
|---|---|---|---|
| `Enabled` | bool | `false` | Enables non-transparent redundancy. When false, the server reports `RedundancySupport.None` and `ServiceLevel = 255`. |
| `Mode` | string | `"Warm"` | The redundancy mode advertised to clients. Valid values: `Warm`, `Hot`. |
| `Role` | string | `"Primary"` | This instance's role in the redundant pair. Valid values: `Primary`, `Secondary`. The Primary advertises a higher ServiceLevel than the Secondary when both are healthy. |
| `ServerUris` | string[] | `[]` | The ApplicationUri values of all servers in the redundant set. Must include this instance's own `OpcUa.ApplicationUri`. Should contain at least 2 entries. |
| `ServiceLevelBase` | int | `200` | The base ServiceLevel when the server is fully healthy. Valid range: 1-255. The Secondary automatically receives `ServiceLevelBase - 50`. |
### OpcUa.ApplicationUri
| Property | Type | Default | Description |
|---|---|---|---|
| `ApplicationUri` | string | `null` | Explicit application URI for this server instance. When null, defaults to `urn:{GalaxyName}:LmxOpcUa`. **Required when redundancy is enabled** -- each instance needs a unique identity. |
## ServiceLevel Computation
ServiceLevel is a standard OPC UA `Byte` property (0-255) on the Server object that indicates how well the server can currently provide its data. Clients in a redundant deployment should prefer the server advertising the highest ServiceLevel.
**Baseline values:**
| Role | Baseline |
|---|---|
| Primary | `ServiceLevelBase` (default 200) |
| Secondary | `ServiceLevelBase - 50` (default 150) |
**Penalties applied to the baseline:**
| Condition | Penalty |
|---|---|
| MXAccess disconnected | -100 |
| Galaxy DB unreachable | -50 |
| Both MXAccess and DB down | ServiceLevel forced to 0 |
The final value is clamped to the range 0-255.
**Examples (with default ServiceLevelBase = 200):**
| Scenario | Primary | Secondary |
|---|---|---|
| Both healthy | 200 | 150 |
| MXAccess down | 100 | 50 |
| DB down | 150 | 100 |
| Both down | 0 | 0 |
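The baseline and penalty rules above can be condensed into one function. The following is a minimal Python sketch for illustration only, not the service's implementation: the name `compute_service_level` and its parameters are hypothetical, and the order of operations (forced zero first, then penalties, then clamping) is an assumption that is consistent with the tables.

```python
def compute_service_level(role, mxaccess_connected, db_reachable,
                          service_level_base=200):
    """Illustrative ServiceLevel calculation (hypothetical helper)."""
    # Both dependencies down: the documented rule forces ServiceLevel to 0.
    if not mxaccess_connected and not db_reachable:
        return 0

    # Baseline: Primary uses the base, Secondary uses base - 50.
    level = service_level_base if role == "Primary" else service_level_base - 50

    # Penalties for each unhealthy dependency.
    if not mxaccess_connected:
        level -= 100
    if not db_reachable:
        level -= 50

    # Clamp to the valid ServiceLevel range.
    return max(0, min(255, level))
```

For example, `compute_service_level("Secondary", False, True)` evaluates to 50, which matches the "MXAccess down" row of the examples table. With a low `ServiceLevelBase`, the clamp keeps a penalized value from going negative.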
## Two-Instance Deployment
When deploying a redundant pair, the following configuration properties must differ between the two instances. All other settings (GalaxyName, ConnectionString, etc.) are shared.
| Property | Instance 1 (Primary) | Instance 2 (Secondary) |
|---|---|---|
| `OpcUa.Port` | 4840 | 4841 |
| `OpcUa.ServerName` | `LmxOpcUa-1` | `LmxOpcUa-2` |
| `OpcUa.ApplicationUri` | `urn:localhost:LmxOpcUa:instance1` | `urn:localhost:LmxOpcUa:instance2` |
| `Dashboard.Port` | 8081 | 8082 |
| `MxAccess.ClientName` | `LmxOpcUa-1` | `LmxOpcUa-2` |
| `Redundancy.Role` | `Primary` | `Secondary` |
### Instance 1 -- Primary (appsettings.json)
```json
{
"OpcUa": {
"Port": 4840,
"ServerName": "LmxOpcUa-1",
"GalaxyName": "ZB",
"ApplicationUri": "urn:localhost:LmxOpcUa:instance1"
},
"MxAccess": {
"ClientName": "LmxOpcUa-1"
},
"Dashboard": {
"Port": 8081
},
"Redundancy": {
"Enabled": true,
"Mode": "Warm",
"Role": "Primary",
"ServerUris": [
"urn:localhost:LmxOpcUa:instance1",
"urn:localhost:LmxOpcUa:instance2"
],
"ServiceLevelBase": 200
}
}
```
### Instance 2 -- Secondary (appsettings.json)
```json
{
"OpcUa": {
"Port": 4841,
"ServerName": "LmxOpcUa-2",
"GalaxyName": "ZB",
"ApplicationUri": "urn:localhost:LmxOpcUa:instance2"
},
"MxAccess": {
"ClientName": "LmxOpcUa-2"
},
"Dashboard": {
"Port": 8082
},
"Redundancy": {
"Enabled": true,
"Mode": "Warm",
"Role": "Secondary",
"ServerUris": [
"urn:localhost:LmxOpcUa:instance1",
"urn:localhost:LmxOpcUa:instance2"
],
"ServiceLevelBase": 200
}
}
```
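Before deploying, it can help to compare the two configuration files mechanically. The sketch below is a hypothetical Python helper, not a tool shipped with the project: `check_redundant_pair` takes two already-parsed `appsettings.json` dictionaries and reports any property from the table above that fails to differ, plus any mismatch in `Redundancy.ServerUris`. Comparing the URI lists as sets, so that ordering is ignored, is an assumption.

```python
# Properties the deployment table says must differ between the two instances.
MUST_DIFFER = [
    ("OpcUa", "Port"),
    ("OpcUa", "ServerName"),
    ("OpcUa", "ApplicationUri"),
    ("Dashboard", "Port"),
    ("MxAccess", "ClientName"),
    ("Redundancy", "Role"),
]


def check_redundant_pair(first, second):
    """Return a list of human-readable problems found in a config pair."""
    problems = []

    for section, key in MUST_DIFFER:
        a = first.get(section, {}).get(key)
        b = second.get(section, {}).get(key)
        if a == b:
            problems.append(f"{section}.{key} must differ between instances")

    # Both instances should advertise the same redundant set.
    uris_a = set(first.get("Redundancy", {}).get("ServerUris", []))
    uris_b = set(second.get("Redundancy", {}).get("ServerUris", []))
    if uris_a != uris_b:
        problems.append("Redundancy.ServerUris must match on both instances")

    return problems
```

The two dictionaries would typically come from `json.load` on each instance's `appsettings.json`. An empty result means the pair satisfies the differences listed in the table.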
## CLI `redundancy` Command
The CLI tool at `tools/opcuacli-dotnet/` includes a `redundancy` command that reads the redundancy state from a running server.
```bash
dotnet run -- redundancy -u opc.tcp://localhost:4840/LmxOpcUa
dotnet run -- redundancy -u opc.tcp://localhost:4841/LmxOpcUa
```
The command reads the following standard OPC UA nodes and displays their values:
- **Redundancy Mode** -- from `Server_ServerRedundancy_RedundancySupport` (None, Warm, or Hot)
- **Service Level** -- from `Server_ServiceLevel` (0-255)
- **Server URIs** -- from `Server_ServerRedundancy_ServerUriArray` (list of ApplicationUri values in the redundant set)
- **Application URI** -- from `Server_ServerArray` (by OPC UA convention, the first entry is this instance's ApplicationUri)
Example output for a healthy Primary:
```text
Redundancy Mode: Warm
Service Level: 200
Server URIs:
- urn:localhost:LmxOpcUa:instance1
- urn:localhost:LmxOpcUa:instance2
Application URI: urn:localhost:LmxOpcUa:instance1
```
The command also supports `--username`/`--password` and `--security` options for authenticated or encrypted connections.
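Because the output format is line-oriented, a script can compare both endpoints and pick the one a client should prefer. The following Python sketch is an illustration only: it assumes the output looks exactly like the example above, and the function names `parse_service_level` and `pick_preferred` are hypothetical. It works on captured output text rather than invoking the CLI itself.

```python
import re


def parse_service_level(cli_output):
    """Extract the integer after 'Service Level:' or return None if absent."""
    match = re.search(r"^Service Level:\s*(\d+)\s*$", cli_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def pick_preferred(outputs):
    """Given {endpoint: cli_output}, return the endpoint with the highest level.

    Endpoints whose output has no readable Service Level are ignored.
    Returns None when no endpoint reported a value.
    """
    levels = {}
    for endpoint, text in outputs.items():
        level = parse_service_level(text)
        if level is not None:
            levels[endpoint] = level
    if not levels:
        return None
    return max(levels, key=levels.get)
```

Feeding it the output captured from ports 4840 and 4841 of a healthy pair would select the Primary, since it reports 200 against the Secondary's 150.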
## Troubleshooting
**Mismatched ServerUris between instances** -- Both instances must list the exact same set of ApplicationUri values in `Redundancy.ServerUris`. If they differ, clients may not discover the full redundant set. Check the startup log for the `Redundancy.ServerUris` line on each instance.
**ServiceLevel stuck at 255** -- This indicates redundancy is not enabled. When `Redundancy.Enabled` is false (the default), the server always reports `ServiceLevel = 255` and `RedundancySupport.None`. Verify that `Redundancy.Enabled` is set to `true` in the configuration and that the configuration section is correctly bound.
**ApplicationUri not set** -- The configuration validator rejects startup when redundancy is enabled but `OpcUa.ApplicationUri` is null or empty. Each instance must have a unique ApplicationUri. Check the error log for: `OpcUa.ApplicationUri must be set when redundancy is enabled`.
**Both servers report the same ServiceLevel** -- Verify that one instance has `Redundancy.Role` set to `Primary` and the other to `Secondary`. Both set to `Primary` (or both to `Secondary`) will produce identical baseline values, preventing clients from distinguishing the preferred server.
**ServerUriArray not readable** -- When `RedundancySupport` is `None` (redundancy disabled), the OPC UA SDK may not expose the `ServerUriArray` node or it may return an empty value. The CLI `redundancy` command handles this gracefully by catching the read error. Enable redundancy to populate this array.


@@ -132,6 +132,25 @@ Log files are written relative to the executable directory (see Working Director
`Log.CloseAndFlush()` is called in the `finally` block of `Program.Main()` to ensure all buffered log entries are written before process exit.
## Multi-Instance Deployment
The service supports running multiple instances for redundancy. Each instance requires:
- A unique Windows service name (e.g., `LmxOpcUa`, `LmxOpcUa2`)
- A unique OPC UA port and dashboard port
- A unique `OpcUa.ApplicationUri` and `OpcUa.ServerName`
- A unique `MxAccess.ClientName`
- Matching `Redundancy.ServerUris` arrays on all instances
Install additional instances using Topshelf's `-servicename` flag:
```bash
cd C:\publish\lmxopcua\instance2
ZB.MOM.WW.LmxOpcUa.Host.exe install -servicename "LmxOpcUa2" -displayname "LMX OPC UA Server (Instance 2)"
```
See [Redundancy Guide](Redundancy.md) for full deployment details.
## Platform Target
The service must be compiled and run as x86 (32-bit). The MXAccess COM toolkit DLLs in `Program Files (x86)\ArchestrA\Framework\bin` are 32-bit only. Running the service as x64 or AnyCPU (64-bit preferred) causes COM interop failures when creating the `LMXProxyServer` object on the STA thread.