Separates ApplicationUri from namespace identity so each instance in a redundant pair has a unique server URI while sharing the same Galaxy namespace. Exposes RedundancySupport, ServerUriArray, and dynamic ServiceLevel through the standard OPC UA server object. ServiceLevel is computed from role (Primary/Secondary) and runtime health (MXAccess and DB connectivity). Adds CLI redundancy command, second deployed service instance, and 31 new tests including paired-server integration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# OPC UA Server Redundancy Plan

## Summary

Add configurable non-transparent warm/hot redundancy to the LmxOpcUa server so that two instances sharing the same Galaxy repository can operate as a redundant pair. Each instance should advertise the redundant set through the standard OPC UA redundancy nodes, publish a dynamic `ServiceLevel` based on runtime health, and allow clients to discover and fail over between the instances. The CLI tool should gain a `redundancy` command for inspecting the redundant server set.

This review tightens the original draft in a few important ways:

- It separates **namespace identity** from **application identity**. The current host uses `urn:{GalaxyName}:LmxOpcUa` as both the namespace URI and `ApplicationUri`; that must change for redundancy because each server in the pair needs a unique server URI.
- It avoids hand-wavy "write the redundancy nodes directly" language and instead targets the OPC UA SDK's built-in `ServerObjectState` / `ServerRedundancyState` model.
- It removes a few inaccurate hardcoded assumptions, including the `ServerUriArray` node id and the deployment port examples.
- It fixes execution order so test-builder and helper changes happen before integration coverage depends on them.

This plan still covers server-side redundancy exposure, client-side discovery, a second deployed service instance, documentation, and tests. It does **not** implement automatic server-side failover or subscription transfer; those remain client responsibilities per the OPC UA specification.

---

## Background: OPC UA Redundancy Model

OPC UA exposes redundancy through standard nodes under `Server/ServerRedundancy` plus the `Server/ServiceLevel` property:

| Node | Type | Purpose |
|---|---|---|
| `RedundancySupport` | `RedundancySupport` enum | Declares the redundancy mode: `None`, `Cold`, `Warm`, `Hot`, `Transparent`, `HotAndMirrored` |
| `ServerUriArray` | `String[]` | Lists the `ApplicationUri` values of all servers in the redundant set for non-transparent redundancy |
| `ServiceLevel` | `Byte` (0-255) | Indicates current operational quality; clients prefer the server with the highest value |

### Non-Transparent Redundancy (our target)

In non-transparent redundancy (`Warm` or `Hot`), both servers run independently with their own sessions and subscriptions. Clients discover the redundant set by reading `ServerUriArray`, monitor `ServiceLevel` on each server, and manage their own failover. This fits the current architecture, where each instance independently connects to the same Galaxy repository and MXAccess runtime.

### ServiceLevel semantics

| Range | Meaning |
|---|---|
| 0 | Server is not operational |
| 1-99 | Degraded |
| 100-199 | Healthy secondary |
| 200-255 | Healthy primary |

The primary should advertise a higher `ServiceLevel` than the secondary so clients prefer it when both are healthy.

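
From the client's side, the ranges above reduce to a simple preference rule. The following is a minimal sketch, assuming the client has already read `ServiceLevel` from every endpoint in the set; `FailoverSelector` and `PickPreferred` are hypothetical names for illustration and are not part of the OPC UA SDK or this codebase.

```csharp
using System.Collections.Generic;

// Hypothetical client-side helper: choose which server to use from
// already-read ServiceLevel values (endpoint URL -> ServiceLevel).
public static class FailoverSelector
{
    // Returns the endpoint with the highest ServiceLevel, or null when every
    // server reports 0 (not operational). Ties keep the earliest entry.
    public static string? PickPreferred(IReadOnlyList<KeyValuePair<string, byte>> levels)
    {
        string? best = null;
        byte bestLevel = 0;
        foreach (var entry in levels)
        {
            if (entry.Value > bestLevel)
            {
                best = entry.Key;
                bestLevel = entry.Value;
            }
        }
        return best;
    }
}
```

A real client would re-evaluate this choice whenever a monitored `ServiceLevel` changes; how and when to move sessions and subscriptions stays a client decision.
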

---

## Current State

- `LmxOpcUaServer` extends `StandardServer` but does not expose redundancy state
- `ServerRedundancy/RedundancySupport` remains the SDK default (`None`)
- `Server/ServiceLevel` remains the SDK default (`255`)
- No configuration exists for redundancy mode, role, or redundant partner URIs
- `OpcUaServerHost` currently sets `ApplicationUri = urn:{GalaxyName}:LmxOpcUa`
- `LmxNodeManager` uses the same `urn:{GalaxyName}:LmxOpcUa` as the published namespace URI
- A single deployed instance is documented in [service_info.md](C:\Users\dohertj2\Desktop\lmxopcua\service_info.md)
- No CLI command exists for reading redundancy information

## Key gap to fix first

For redundancy, each server in the set must advertise a unique `ApplicationUri`, and `ServerUriArray` must contain those unique values. The current implementation cannot do that because it reuses the namespace URI as the server `ApplicationUri`. Phase 1 therefore needs an application-identity change before the redundancy nodes can be correct.

---

## Scope

### In scope (Phase 1)

1. Add explicit application-identity configuration so each instance can have a unique `ApplicationUri`
2. Add redundancy configuration for mode, role, and server URI membership
3. Expose `RedundancySupport`, `ServerUriArray`, and dynamic `ServiceLevel`
4. Compute `ServiceLevel` from runtime health and preferred role
5. Add a CLI `redundancy` command
6. Document two-instance deployment
7. Add unit and integration coverage

### Deferred

- Automatic subscription transfer
- Server-initiated failover
- Transparent redundancy mode
- Load-balancer-specific HTTP health endpoints
- Mirrored data/session state

---

## Configuration Design

### 1. Add explicit `OpcUa.ApplicationUri`

**File:** `src/ZB.MOM.WW.LmxOpcUa.Host/Configuration/OpcUaConfiguration.cs`

Add:

```csharp
public string? ApplicationUri { get; set; }
```

Rules:

- `ApplicationUri = null` preserves the current behavior for non-redundant deployments
- When `Redundancy.Enabled = true`, `ApplicationUri` must be explicitly set and unique per instance
- `LmxNodeManager` should continue using `urn:{GalaxyName}:LmxOpcUa` as the namespace URI so both redundant servers expose the same namespace
- `Redundancy.ServerUris` must contain the exact `ApplicationUri` values for all servers in the redundant set

Example:

```json
{
  "OpcUa": {
    "ServerName": "LmxOpcUa",
    "GalaxyName": "ZB",
    "ApplicationUri": "urn:localhost:LmxOpcUa:instance1"
  }
}
```

### 2. New `Redundancy` section in `appsettings.json`

```json
{
  "Redundancy": {
    "Enabled": false,
    "Mode": "Warm",
    "Role": "Primary",
    "ServerUris": [],
    "ServiceLevelBase": 200
  }
}
```

### 3. Configuration model

**File:** `src/ZB.MOM.WW.LmxOpcUa.Host/Configuration/RedundancyConfiguration.cs` (new)

```csharp
using System.Collections.Generic;

// Bound from the "Redundancy" section of appsettings.json.
public class RedundancyConfiguration
{
    public bool Enabled { get; set; } = false;
    public string Mode { get; set; } = "Warm";        // "Warm" or "Hot" in Phase 1
    public string Role { get; set; } = "Primary";     // "Primary" or "Secondary"
    public List<string> ServerUris { get; set; } = new List<string>();
    public int ServiceLevelBase { get; set; } = 200;  // valid range 1-255
}
```

### 4. Configuration rules

- `Enabled` defaults to `false`
- `Mode` supports `Warm` and `Hot` in Phase 1
- `Role` supports `Primary` and `Secondary`
- `ServerUris` must contain the local `OpcUa.ApplicationUri` when redundancy is enabled
- `ServerUris` should contain at least two unique entries when redundancy is enabled
- `ServiceLevelBase` should be in the range `1-255`
- Effective baseline:
  - Primary: `ServiceLevelBase`
  - Secondary: `max(0, ServiceLevelBase - 50)`

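
The effective-baseline rule can be sketched as a small helper. This is an illustration under stated assumptions: the name `ServiceLevelBaseline.ForRole` is hypothetical, role matching is assumed to be case-insensitive, and any role other than `Secondary` is treated as primary.

```csharp
using System;

// Hypothetical helper illustrating the effective-baseline rule.
public static class ServiceLevelBaseline
{
    // Primary keeps ServiceLevelBase; Secondary gets max(0, ServiceLevelBase - 50).
    // Assumption: anything that is not "Secondary" is treated as primary.
    public static int ForRole(string role, int serviceLevelBase)
    {
        bool isSecondary = string.Equals(role, "Secondary", StringComparison.OrdinalIgnoreCase);
        return isSecondary ? Math.Max(0, serviceLevelBase - 50) : serviceLevelBase;
    }
}
```

With the default `ServiceLevelBase` of 200, this yields 200 for the primary and 150 for the secondary, which places each role inside its range in the ServiceLevel semantics table.
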

### App root updates

**File:** `src/ZB.MOM.WW.LmxOpcUa.Host/Configuration/AppConfiguration.cs`

- Add `public RedundancyConfiguration Redundancy { get; set; } = new RedundancyConfiguration();`

---

## Implementation Steps

### Step 1: Separate application identity from namespace identity

**Files:**

- `src/.../Configuration/OpcUaConfiguration.cs`
- `src/.../OpcUa/OpcUaServerHost.cs`
- `docs/OpcUaServer.md`
- `tests/.../Configuration/ConfigurationLoadingTests.cs`

Changes:

1. Add optional `OpcUa.ApplicationUri`
2. Keep `urn:{GalaxyName}:LmxOpcUa` as the namespace URI used by `LmxNodeManager`
3. Set `ApplicationConfiguration.ApplicationUri` from `OpcUa.ApplicationUri` when supplied
4. Keep `ApplicationUri` and namespace URI distinct in docs and tests

This step is required before redundancy can be correct.

### Step 2: Add `RedundancyConfiguration` and bind it

**Files:**

- `src/.../Configuration/RedundancyConfiguration.cs` (new)
- `src/.../Configuration/AppConfiguration.cs`
- `src/.../OpcUaService.cs`

Changes:

1. Create `RedundancyConfiguration`
2. Add `Redundancy` to `AppConfiguration`
3. Bind `configuration.GetSection("Redundancy").Bind(_config.Redundancy);`
4. Pass `_config.Redundancy` through to `OpcUaServerHost` and `LmxOpcUaServer`

### Step 3: Add `RedundancyModeResolver`

**File:** `src/.../OpcUa/RedundancyModeResolver.cs` (new)

Responsibilities:

- map `Mode` to `RedundancySupport`
- validate supported Phase 1 modes
- fall back safely when disabled or invalid

```csharp
public static class RedundancyModeResolver
{
    public static RedundancySupport Resolve(string mode, bool enabled);
}
```
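
One possible body for the resolver, consistent with the responsibilities listed for this step and with the test plan's expectations (case-insensitive parsing, unknown modes falling back to `None`). This is a sketch, not the final implementation: the local `RedundancySupport` enum is a stand-in for the SDK's `Opc.Ua.RedundancySupport` so the example compiles on its own, and trimming whitespace is an added assumption.

```csharp
using System;

// Stand-in for Opc.Ua.RedundancySupport so this sketch is self-contained;
// real code would reference the SDK enum instead of redefining it.
public enum RedundancySupport { None, Cold, Warm, Hot, Transparent, HotAndMirrored }

public static class RedundancyModeResolver
{
    public static RedundancySupport Resolve(string mode, bool enabled)
    {
        // Disabled or missing mode: advertise no redundancy.
        if (!enabled || string.IsNullOrWhiteSpace(mode))
            return RedundancySupport.None;

        string normalized = mode.Trim(); // assumption: tolerate stray whitespace

        // Phase 1 supports only Warm and Hot; everything else falls back to None.
        if (string.Equals(normalized, "Warm", StringComparison.OrdinalIgnoreCase))
            return RedundancySupport.Warm;
        if (string.Equals(normalized, "Hot", StringComparison.OrdinalIgnoreCase))
            return RedundancySupport.Hot;

        return RedundancySupport.None;
    }
}
```

Falling back to `None` rather than throwing keeps a misconfigured instance running as a plain non-redundant server; whether the fallback should also be logged is left to the validator step.
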

### Step 4: Add `ServiceLevelCalculator`

**File:** `src/.../OpcUa/ServiceLevelCalculator.cs` (new)

Purpose:

- compute the current `ServiceLevel` from a baseline plus health inputs

Suggested signature:

```csharp
public sealed class ServiceLevelCalculator
{
    public byte Calculate(int baseLevel, bool mxAccessConnected, bool dbConnected);
}
```

Suggested logic:

- start with the role-adjusted baseline supplied by the caller
- subtract 100 if MXAccess is disconnected
- subtract 50 if the Galaxy DB is unreachable
- return `0` if both are down
- clamp to `0-255`

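
The suggested logic translates almost line for line into code. A minimal sketch follows; the penalties (100 and 50) come from the list above, and `Math.Max`/`Math.Min` are used for clamping on the assumption that the host may target a framework without `Math.Clamp`.

```csharp
using System;

public sealed class ServiceLevelCalculator
{
    // baseLevel is the role-adjusted baseline supplied by the caller.
    public byte Calculate(int baseLevel, bool mxAccessConnected, bool dbConnected)
    {
        // Both dependencies down: report "not operational".
        if (!mxAccessConnected && !dbConnected)
            return 0;

        int level = baseLevel;
        if (!mxAccessConnected) level -= 100; // MXAccess disconnected
        if (!dbConnected) level -= 50;        // Galaxy DB unreachable

        // Clamp into the valid Byte range before narrowing.
        return (byte)Math.Max(0, Math.Min(255, level));
    }
}
```

With the default baselines, a primary (200) that loses MXAccess computes 100, below a healthy secondary (150), so clients would prefer the secondary. A primary that loses only the DB computes 150, the same value as a healthy secondary; the penalties may need tuning if that tie matters.
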

### Step 5: Extend `ConfigurationValidator`

**File:** `src/.../Configuration/ConfigurationValidator.cs`

Add validation/logging for:

- `OpcUa.ApplicationUri`
- `Redundancy.Enabled`, `Mode`, `Role`
- `ServerUris` membership and uniqueness
- `ServiceLevelBase`
- local `OpcUa.ApplicationUri` must appear in `Redundancy.ServerUris` when enabled
- warning when fewer than 2 unique server URIs are configured

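
The checks above could be expressed as a pure function that returns a list of problems, which keeps them easy to unit test. This is a sketch only: `RedundancyValidation.Validate`, its flat parameter list, and the message texts are hypothetical and do not reflect the existing `ConfigurationValidator` API. URI comparison is exact (ordinal), matching the rule that `ServerUris` must contain the exact `ApplicationUri` values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical standalone validator for the redundancy settings.
public static class RedundancyValidation
{
    // Returns human-readable problems; an empty list means the settings pass.
    public static List<string> Validate(
        bool enabled, string mode, string role,
        IReadOnlyCollection<string> serverUris,
        int serviceLevelBase, string? applicationUri)
    {
        var problems = new List<string>();
        if (!enabled) return problems; // nothing to check when redundancy is off

        if (!IsOneOf(mode, "Warm", "Hot"))
            problems.Add("Redundancy.Mode must be Warm or Hot");
        if (!IsOneOf(role, "Primary", "Secondary"))
            problems.Add("Redundancy.Role must be Primary or Secondary");
        if (serviceLevelBase < 1 || serviceLevelBase > 255)
            problems.Add("Redundancy.ServiceLevelBase must be in the range 1-255");

        if (string.IsNullOrWhiteSpace(applicationUri))
            problems.Add("OpcUa.ApplicationUri must be set when redundancy is enabled");
        else if (!serverUris.Contains(applicationUri))
            problems.Add("Redundancy.ServerUris must contain the local OpcUa.ApplicationUri");

        if (serverUris.Distinct().Count() < 2)
            problems.Add("Warning: fewer than 2 unique server URIs are configured");

        return problems;
    }

    private static bool IsOneOf(string value, params string[] allowed) =>
        allowed.Any(a => string.Equals(a, value, StringComparison.OrdinalIgnoreCase));
}
```
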

### Step 6: Expose redundancy through the standard OPC UA server object

**File:** `src/.../OpcUa/LmxOpcUaServer.cs`

Changes:

1. Accept `RedundancyConfiguration` and local `ApplicationUri`
2. On startup, locate the built-in `ServerObjectState`
3. Configure `ServerObjectState.ServiceLevel`
4. Configure the server redundancy object using the SDK's standard server-state types instead of writing guessed node ids directly
5. If the default `ServerRedundancyState` does not expose `ServerUriArray`, replace or upgrade it with the appropriate non-transparent redundancy state type from the SDK before populating values
6. Expose an internal method such as `UpdateServiceLevel(bool mxConnected, bool dbConnected)` for service-layer health updates

Important: the implementation should use SDK types/constants (`ServerObjectState`, `ServerRedundancyState`, `NonTransparentRedundancyState`, `VariableIds.*`) rather than hand-maintained numeric literals.

### Step 7: Update `OpcUaServerHost`

**File:** `src/.../OpcUa/OpcUaServerHost.cs`

Changes:

1. Accept `RedundancyConfiguration`
2. Pass redundancy config and resolved local `ApplicationUri` into `LmxOpcUaServer`
3. Log redundancy mode/role/server URIs at startup

### Step 8: Wire health updates in `OpcUaService`

**File:** `src/.../OpcUaService.cs`

Changes:

1. Bind and pass redundancy config
2. After startup, initialize the starting `ServiceLevel`
3. Subscribe to `IMxAccessClient.ConnectionStateChanged`
4. Update DB health whenever startup repository checks, change-detection work, or rebuild attempts succeed/fail
5. Prefer event-driven updates; add a lightweight periodic refresh only if necessary

Avoid introducing a second large standalone polling loop when existing connection and repository activity already gives most of the needed health signals.

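
One way to keep the updates event-driven is a small tracker that remembers the latest health flags and republishes only when the computed level changes. This is a sketch under assumptions: `ServiceLevelTracker` is a hypothetical name, the `calculate` delegate stands in for the calculator plus baseline, and the `publish` delegate stands in for whatever `LmxOpcUaServer` exposes for updating `ServiceLevel`.

```csharp
using System;

// Hypothetical helper: holds the latest health flags and pushes a new
// ServiceLevel only when the computed value actually changes.
public sealed class ServiceLevelTracker
{
    private readonly object _gate = new object();
    private readonly Func<bool, bool, byte> _calculate; // (mxConnected, dbConnected) -> level
    private readonly Action<byte> _publish;             // writes the level to the server
    private bool _mxConnected;
    private bool _dbConnected;
    private byte? _last;

    public ServiceLevelTracker(Func<bool, bool, byte> calculate, Action<byte> publish)
    {
        _calculate = calculate;
        _publish = publish;
    }

    // Call from the MXAccess connection-state handler.
    public void SetMxAccess(bool connected) => Update(() => _mxConnected = connected);

    // Call when a repository check, change-detection pass, or rebuild succeeds or fails.
    public void SetDatabase(bool reachable) => Update(() => _dbConnected = reachable);

    private void Update(Action mutate)
    {
        lock (_gate)
        {
            mutate();
            byte level = _calculate(_mxConnected, _dbConnected);
            if (_last == level) return; // suppress no-op updates
            _last = level;
            _publish(level);
        }
    }
}
```

Suppressing unchanged values keeps `ServiceLevel` from generating data-change notifications for clients when repeated health checks report the same state.
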

### Step 9: Update test builders and helpers before integration coverage

**Files:**

- `src/.../OpcUaServiceBuilder.cs`
- `tests/.../Helpers/OpcUaServerFixture.cs`
- `tests/.../Helpers/OpcUaTestClient.cs`

Changes:

- add `WithRedundancy(...)`
- add `WithApplicationUri(...)` or allow full `OpcUaConfiguration` override
- ensure two in-process redundancy tests can run with distinct `ServerName`, `ApplicationUri`, and certificate identity
- when needed, use separate PKI roots in tests so paired fixtures do not collide on certificate state

### Step 10: Update `appsettings.json`

**File:** `src/.../appsettings.json`

Add:

- `OpcUa.ApplicationUri` example/commentary in docs
- `Redundancy` section with `Enabled = false` defaults

### Step 11: Add CLI `redundancy` command

**Files:**

- `tools/opcuacli-dotnet/Commands/RedundancyCommand.cs` (new)
- `tools/opcuacli-dotnet/README.md`
- `docs/CliTool.md`

Command: `redundancy`

Read:

- `VariableIds.Server_ServerRedundancy_RedundancySupport`
- `VariableIds.Server_ServiceLevel`
- `VariableIds.Server_ServerRedundancy_ServerUriArray`

Output example:

```text
Redundancy Mode: Warm
Service Level: 200
Server URIs:
- urn:localhost:LmxOpcUa:instance1
- urn:localhost:LmxOpcUa:instance2
```

Use SDK constants instead of hardcoded numeric ids in the command implementation.

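
Keeping the output formatting separate from the SDK read calls makes the command easy to test without a running server. A minimal sketch of such a formatter, assuming the three values have already been read; `RedundancyReport.Format` is a hypothetical name, and the session/read code is intentionally not shown.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical formatter for the `redundancy` command output.
public static class RedundancyReport
{
    // Renders already-read node values in the layout of the output example.
    public static string Format(string mode, byte serviceLevel, IEnumerable<string> serverUris)
    {
        var sb = new StringBuilder();
        sb.Append("Redundancy Mode: ").Append(mode).Append('\n');
        sb.Append("Service Level: ").Append(serviceLevel).Append('\n');
        sb.Append("Server URIs:\n");
        foreach (var uri in serverUris)
            sb.Append("- ").Append(uri).Append('\n');
        return sb.ToString();
    }
}
```
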

### Step 12: Deploy the second service instance

**Deployment target:** `C:\publish\lmxopcua\instance2`

Suggested configuration differences:

| Setting | instance1 | instance2 |
|---|---|---|
| `OpcUa.Port` | `4840` | `4841` |
| `Dashboard.Port` | `8081` | `8082` |
| `OpcUa.ServerName` | `LmxOpcUa` | `LmxOpcUa2` |
| `OpcUa.ApplicationUri` | `urn:localhost:LmxOpcUa:instance1` | `urn:localhost:LmxOpcUa:instance2` |
| `Redundancy.Enabled` | `true` | `true` |
| `Redundancy.Role` | `Primary` | `Secondary` |
| `Redundancy.Mode` | `Warm` | `Warm` |
| `Redundancy.ServerUris` | same two-entry set | same two-entry set |

Deployment notes:

- both instances should share the same `GalaxyName` and namespace URI
- each instance must have a distinct application certificate identity
- if certificate handling is sensitive, give each instance an explicit `Security.CertificateSubject` or separate PKI root

Update [service_info.md](C:\Users\dohertj2\Desktop\lmxopcua\service_info.md) with the second instance details after deployment is real, not speculative.

---

## Test Plan

### Unit tests: `RedundancyModeResolver`

**New file:** `tests/ZB.MOM.WW.LmxOpcUa.Tests/Redundancy/RedundancyModeResolverTests.cs`

| Test | Description |
|---|---|
| `Resolve_Disabled_ReturnsNone` | `Enabled=false` returns `None` |
| `Resolve_Warm_ReturnsWarm` | `Mode="Warm"` maps correctly |
| `Resolve_Hot_ReturnsHot` | `Mode="Hot"` maps correctly |
| `Resolve_Unknown_FallsBackToNone` | Unknown mode falls back safely |
| `Resolve_CaseInsensitive` | Case-insensitive parsing works |

### Unit tests: `ServiceLevelCalculator`

**New file:** `tests/ZB.MOM.WW.LmxOpcUa.Tests/Redundancy/ServiceLevelCalculatorTests.cs`

| Test | Description |
|---|---|
| `FullyHealthy_Primary_ReturnsBase` | Healthy primary baseline is preserved |
| `FullyHealthy_Secondary_ReturnsBaseMinusFifty` | Healthy secondary baseline is lower |
| `MxAccessDown_ReducesServiceLevel` | MXAccess failure reduces score |
| `DbDown_ReducesServiceLevel` | DB failure reduces score |
| `BothDown_ReturnsZero` | Both unavailable returns 0 |
| `ClampedTo255` | Upper clamp works |
| `ClampedToZero` | Lower clamp works |

### Unit tests: `RedundancyConfiguration`

**New file:** `tests/ZB.MOM.WW.LmxOpcUa.Tests/Redundancy/RedundancyConfigurationTests.cs`

| Test | Description |
|---|---|
| `DefaultConfig_Disabled` | `Enabled` defaults to `false` |
| `DefaultConfig_ModeWarm` | `Mode` defaults to `Warm` |
| `DefaultConfig_RolePrimary` | `Role` defaults to `Primary` |
| `DefaultConfig_EmptyServerUris` | `ServerUris` defaults to empty |
| `DefaultConfig_ServiceLevelBase200` | `ServiceLevelBase` defaults to `200` |

### Updates to existing configuration tests

**File:** `tests/ZB.MOM.WW.LmxOpcUa.Tests/Configuration/ConfigurationLoadingTests.cs`

Add coverage for:

- `OpcUa.ApplicationUri`
- `Redundancy` section binding
- redundancy validation when `ApplicationUri` is missing
- redundancy validation when local `ApplicationUri` is absent from `ServerUris`
- invalid `ServiceLevelBase`

### Integration tests

**New file:** `tests/ZB.MOM.WW.LmxOpcUa.Tests/Integration/RedundancyTests.cs`

Cover:

- redundancy disabled reports `None`
- warm redundancy reports configured mode
- `ServerUriArray` matches configuration
- primary reports higher `ServiceLevel` than secondary
- both servers expose the same namespace URI but different `ApplicationUri` values
- service level drops when MXAccess disconnects

Pattern:

- use two fixture instances
- give each fixture a distinct `ServerName`, `ApplicationUri`, and port
- if secure transport is enabled in those tests, isolate PKI roots to avoid certificate cross-talk

---

## Documentation Plan

### New file

- `docs/Redundancy.md`

Contents:

1. overview of OPC UA non-transparent redundancy
2. difference between namespace URI and server `ApplicationUri`
3. redundancy configuration reference
4. service-level computation
5. two-instance deployment guide
6. CLI `redundancy` command usage
7. troubleshooting

### Updates to existing docs

| File | Changes |
|---|---|
| `docs/Configuration.md` | Add `OpcUa.ApplicationUri` and `Redundancy` sections |
| `docs/OpcUaServer.md` | Correct the current `ApplicationUri == namespace` description and add redundancy behavior |
| `docs/CliTool.md` | Add `redundancy` command |
| `docs/ServiceHosting.md` | Add multi-instance deployment notes |
| `README.md` | Mention redundancy support and link docs |
| `CLAUDE.md` | Add redundancy architecture note |

### Update after real deployment

- `service_info.md`

Only update this once the second instance is actually deployed and verified.

---

## File Change Summary

| File | Action | Description |
|---|---|---|
| `src/.../Configuration/OpcUaConfiguration.cs` | Modify | Add explicit `ApplicationUri` |
| `src/.../Configuration/RedundancyConfiguration.cs` | New | Redundancy config model |
| `src/.../Configuration/AppConfiguration.cs` | Modify | Add `Redundancy` section |
| `src/.../Configuration/ConfigurationValidator.cs` | Modify | Validate/log redundancy and application identity |
| `src/.../OpcUa/RedundancyModeResolver.cs` | New | Map config mode to `RedundancySupport` |
| `src/.../OpcUa/ServiceLevelCalculator.cs` | New | Compute `ServiceLevel` from health inputs |
| `src/.../OpcUa/LmxOpcUaServer.cs` | Modify | Expose redundancy state via SDK server object |
| `src/.../OpcUa/OpcUaServerHost.cs` | Modify | Pass local application identity and redundancy config |
| `src/.../OpcUaService.cs` | Modify | Bind config and wire health updates |
| `src/.../OpcUaServiceBuilder.cs` | Modify | Support redundancy/application identity injection |
| `src/.../appsettings.json` | Modify | Add redundancy settings |
| `tools/opcuacli-dotnet/Commands/RedundancyCommand.cs` | New | Read redundancy state from a server |
| `tests/.../Redundancy/*.cs` | New | Unit tests for redundancy config and calculators |
| `tests/.../Configuration/ConfigurationLoadingTests.cs` | Modify | Bind/validate new settings |
| `tests/.../Integration/RedundancyTests.cs` | New | Paired-server integration tests |
| `tests/.../Helpers/OpcUaServerFixture.cs` | Modify | Support paired redundancy fixtures |
| `tests/.../Helpers/OpcUaTestClient.cs` | Modify | Read redundancy nodes in integration tests |
| `docs/Redundancy.md` | New | Dedicated redundancy guide |
| `docs/Configuration.md` | Modify | Document new config |
| `docs/OpcUaServer.md` | Modify | Correct application identity and add redundancy details |
| `docs/CliTool.md` | Modify | Document `redundancy` command |
| `docs/ServiceHosting.md` | Modify | Multi-instance deployment notes |
| `README.md` | Modify | Link redundancy docs |
| `CLAUDE.md` | Modify | Architecture note |
| `service_info.md` | Modify later | Document real second-instance deployment |

---

## Verification Guardrails

### Gate 1: Build

```bash
dotnet build ZB.MOM.WW.LmxOpcUa.slnx
```

### Gate 2: Unit tests

```bash
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests
```

### Gate 3: Redundancy integration tests

```bash
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests --filter "FullyQualifiedName~Redundancy"
```

### Gate 4: CLI build

```bash
cd tools/opcuacli-dotnet
dotnet build
```

### Gate 5: Manual single-instance check

```bash
opcuacli-dotnet.exe connect -u opc.tcp://localhost:4840/LmxOpcUa
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4840/LmxOpcUa
```

Expected:

- `RedundancySupport=None`
- `ServiceLevel=255`

### Gate 6: Manual paired-instance check

```bash
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4840/LmxOpcUa
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4841/LmxOpcUa
```

Expected:

- both report the same `ServerUriArray`
- each reports its own unique local `ApplicationUri`
- primary reports a higher `ServiceLevel`

### Gate 7: Full test suite

```bash
dotnet test ZB.MOM.WW.LmxOpcUa.slnx
```

---

## Risks and Considerations

1. **Application identity is the main correctness risk.** Without unique `ApplicationUri` values, the redundant set is invalid even if `ServerUriArray` is populated.
2. **SDK wiring may require replacing the default redundancy state node.** The base `ServerRedundancyState` does not expose `ServerUriArray`; the implementation may need the non-transparent subtype from the SDK.
3. **Two in-process servers can collide on certificates.** Tests and deployment need distinct application identities and, when necessary, isolated PKI roots.
4. **Both instances hit the same MXAccess runtime and Galaxy DB.** Verify client-registration and polling behavior under paired load.
5. **`ServiceLevel` should remain meaningful, not noisy.** Prefer deterministic role + health inputs over frequent arbitrary adjustments.
6. **`service_info.md` is deployment documentation, not design.** Do not prefill it with speculative values before the second instance actually exists.

---

## Execution Order

1. Step 1: add `OpcUa.ApplicationUri` and separate it from namespace identity
2. Steps 2-5: config model, resolver, calculator, validator
3. Gate 1 + Gate 2
4. Step 9: update builders/helpers so tests can express paired servers cleanly
5. Steps 6-8: server exposure and service-layer health wiring
6. Gate 1 + Gate 2 + Gate 3
7. Step 10: update `appsettings.json`
8. Step 11: add CLI `redundancy` command
9. Gate 4 + Gate 5
10. Step 12: deploy and verify the second instance, then run Gate 6
11. Update `service_info.md` with real deployment details
12. Documentation updates
13. Gate 7