OPC UA Server Redundancy Plan
Summary
Add configurable non-transparent warm/hot redundancy to the LmxOpcUa server so that two instances sharing the same Galaxy repository can operate as a redundant pair. Each instance should advertise the redundant set through the standard OPC UA redundancy nodes, publish a dynamic ServiceLevel based on runtime health, and allow clients to discover and fail over between the instances. The CLI tool should gain a redundancy command for inspecting the redundant server set.
This review tightens the original draft in a few important ways:
- It separates namespace identity from application identity. The current host uses urn:{GalaxyName}:LmxOpcUa as both the namespace URI and ApplicationUri; that must change for redundancy because each server in the pair needs a unique server URI.
- It avoids hand-wavy "write the redundancy nodes directly" language and instead targets the OPC UA SDK's built-in ServerObjectState/ServerRedundancyState model.
- It removes a few inaccurate hardcoded assumptions, including the ServerUriArray node id and the deployment port examples.
- It fixes execution order so test-builder and helper changes happen before integration coverage depends on them.
This plan still covers server-side redundancy exposure, client-side discovery, a second deployed service instance, documentation, and tests. It does not implement automatic server-side failover or subscription transfer; those remain client responsibilities per the OPC UA specification.
Background: OPC UA Redundancy Model
OPC UA exposes redundancy through standard nodes under Server/ServerRedundancy plus the Server/ServiceLevel property:
| Node | Type | Purpose |
|---|---|---|
| RedundancySupport | RedundancySupport enum | Declares the redundancy mode: None, Cold, Warm, Hot, Transparent, HotAndMirrored |
| ServerUriArray | String[] | Lists the ApplicationUri values of all servers in the redundant set for non-transparent redundancy |
| ServiceLevel | Byte (0-255) | Indicates current operational quality; clients prefer the server with the highest value |
Non-Transparent Redundancy (our target)
In non-transparent redundancy (Warm or Hot), both servers run independently with their own sessions and subscriptions. Clients discover the redundant set by reading ServerUriArray, monitor ServiceLevel on each server, and manage their own failover. This fits the current architecture, where each instance independently connects to the same Galaxy repository and MXAccess runtime.
ServiceLevel semantics
| Range | Meaning |
|---|---|
| 0 | Server is not operational |
| 1-99 | Degraded |
| 100-199 | Healthy secondary |
| 200-255 | Healthy primary |
The primary should advertise a higher ServiceLevel than the secondary so clients prefer it when both are healthy.
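A minimal sketch of the client-side preference this implies, assuming the client has already read ServiceLevel from each endpoint. The FailoverPolicy class and its PickPreferred method are hypothetical names for illustration; they are not part of the OPC UA SDK or of this codebase.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical client-side helper: chooses which server of the redundant set to use.
public static class FailoverPolicy
{
    // Input: endpoint URL -> last ServiceLevel read from that server.
    // Returns the endpoint with the highest non-zero ServiceLevel,
    // or null when every server reports 0 (not operational).
    public static string? PickPreferred(IReadOnlyDictionary<string, byte> serviceLevels)
    {
        return serviceLevels
            .Where(kv => kv.Value > 0)
            .OrderByDescending(kv => kv.Value)
            .Select(kv => kv.Key)
            .FirstOrDefault();
    }
}
```

With a healthy primary at 200 and a healthy secondary at 150, the primary is chosen; if the primary drops below the secondary, the same call returns the secondary.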
Current State
- LmxOpcUaServer extends StandardServer but does not expose redundancy state
- ServerRedundancy/RedundancySupport remains the SDK default (None)
- Server/ServiceLevel remains the SDK default (255)
- No configuration exists for redundancy mode, role, or redundant partner URIs
- OpcUaServerHost currently sets ApplicationUri = urn:{GalaxyName}:LmxOpcUa
- LmxNodeManager uses the same urn:{GalaxyName}:LmxOpcUa as the published namespace URI
- A single deployed instance is documented in service_info.md
- No CLI command exists for reading redundancy information
Key gap to fix first
For redundancy, each server in the set must advertise a unique ApplicationUri, and ServerUriArray must contain those unique values. The current implementation cannot do that because it reuses the namespace URI as the server ApplicationUri. Phase 1 therefore needs an application-identity change before the redundancy nodes can be correct.
Scope
In scope (Phase 1)
- Add explicit application-identity configuration so each instance can have a unique ApplicationUri
- Add redundancy configuration for mode, role, and server URI membership
- Expose RedundancySupport, ServerUriArray, and dynamic ServiceLevel
- Compute ServiceLevel from runtime health and preferred role
- Add a CLI redundancy command
- Document two-instance deployment
- Add unit and integration coverage
Deferred
- Automatic subscription transfer
- Server-initiated failover
- Transparent redundancy mode
- Load-balancer-specific HTTP health endpoints
- Mirrored data/session state
Configuration Design
1. Add explicit OpcUa.ApplicationUri
File: src/ZB.MOM.WW.LmxOpcUa.Host/Configuration/OpcUaConfiguration.cs
Add:
public string? ApplicationUri { get; set; }
Rules:
- ApplicationUri = null preserves the current behavior for non-redundant deployments
- when Redundancy.Enabled = true, ApplicationUri must be explicitly set and unique per instance
- LmxNodeManager should continue using urn:{GalaxyName}:LmxOpcUa as the namespace URI so both redundant servers expose the same namespace
- Redundancy.ServerUris must contain the exact ApplicationUri values for all servers in the redundant set
Example:
{
  "OpcUa": {
    "ServerName": "LmxOpcUa",
    "GalaxyName": "ZB",
    "ApplicationUri": "urn:localhost:LmxOpcUa:instance1"
  }
}
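The identity rules can be sketched as two small functions: one for the shared namespace URI and one for the per-instance ApplicationUri. ApplicationIdentity and both method names are hypothetical; the fallback for a null setting assumes that "current behavior" means reusing urn:{GalaxyName}:LmxOpcUa, as described under Current State.

```csharp
#nullable enable
using System;

// Hypothetical helper illustrating the split between namespace and application identity.
public static class ApplicationIdentity
{
    // Namespace URI: derived from the Galaxy name only, so both redundant
    // servers publish the same namespace.
    public static string NamespaceUri(string galaxyName) => $"urn:{galaxyName}:LmxOpcUa";

    // ApplicationUri: the explicit OpcUa.ApplicationUri when configured,
    // otherwise the legacy value shared with the namespace URI.
    public static string ResolveApplicationUri(string? configuredUri, string galaxyName) =>
        configuredUri ?? NamespaceUri(galaxyName);
}
```

The fallback is only acceptable for a non-redundant deployment; with Redundancy.Enabled = true, validation should reject a null ApplicationUri before this point is reached.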
2. New Redundancy section in appsettings.json
{
  "Redundancy": {
    "Enabled": false,
    "Mode": "Warm",
    "Role": "Primary",
    "ServerUris": [],
    "ServiceLevelBase": 200
  }
}
3. Configuration model
File: src/ZB.MOM.WW.LmxOpcUa.Host/Configuration/RedundancyConfiguration.cs (new)
public class RedundancyConfiguration
{
    public bool Enabled { get; set; } = false;
    public string Mode { get; set; } = "Warm";
    public string Role { get; set; } = "Primary";
    public List<string> ServerUris { get; set; } = new List<string>();
    public int ServiceLevelBase { get; set; } = 200;
}
4. Configuration rules
- Enabled defaults to false
- Mode supports Warm and Hot in Phase 1
- Role supports Primary and Secondary
- ServerUris must contain the local OpcUa.ApplicationUri when redundancy is enabled
- ServerUris should contain at least two unique entries when redundancy is enabled
- ServiceLevelBase should be in the range 1-255
- Effective baseline:
  - Primary: ServiceLevelBase
  - Secondary: max(0, ServiceLevelBase - 50)
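The effective-baseline rule is small enough to show directly. This is a sketch only: ServiceLevelBaseline and ForRole are hypothetical names, and treating any role other than "Secondary" as primary (with case-insensitive matching) is an assumption that relies on the validator having already rejected unknown roles.

```csharp
using System;

// Hypothetical helper: turns the configured base into the role-adjusted baseline.
public static class ServiceLevelBaseline
{
    public static int ForRole(string role, int serviceLevelBase)
    {
        // Assumption: role matching is case-insensitive and anything that is
        // not "Secondary" is treated as the primary.
        bool isSecondary = string.Equals(role, "Secondary", StringComparison.OrdinalIgnoreCase);

        // Secondary: max(0, ServiceLevelBase - 50); Primary: ServiceLevelBase unchanged.
        return isSecondary ? Math.Max(0, serviceLevelBase - 50) : serviceLevelBase;
    }
}
```

With the default ServiceLevelBase of 200, the primary starts at 200 and the secondary at 150.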
App root updates
File: src/ZB.MOM.WW.LmxOpcUa.Host/Configuration/AppConfiguration.cs
- Add public RedundancyConfiguration Redundancy { get; set; } = new RedundancyConfiguration();
Implementation Steps
Step 1: Separate application identity from namespace identity
Files:
- src/.../Configuration/OpcUaConfiguration.cs
- src/.../OpcUa/OpcUaServerHost.cs
- docs/OpcUaServer.md
- tests/.../Configuration/ConfigurationLoadingTests.cs
Changes:
- Add optional OpcUa.ApplicationUri
- Keep urn:{GalaxyName}:LmxOpcUa as the namespace URI used by LmxNodeManager
- Set ApplicationConfiguration.ApplicationUri from OpcUa.ApplicationUri when supplied
- Keep ApplicationUri and namespace URI distinct in docs and tests
This step is required before redundancy can be correct.
Step 2: Add RedundancyConfiguration and bind it
Files:
- src/.../Configuration/RedundancyConfiguration.cs (new)
- src/.../Configuration/AppConfiguration.cs
- src/.../OpcUaService.cs
Changes:
- Create RedundancyConfiguration
- Add Redundancy to AppConfiguration
- Bind configuration.GetSection("Redundancy").Bind(_config.Redundancy);
- Pass _config.Redundancy through to OpcUaServerHost and LmxOpcUaServer
Step 3: Add RedundancyModeResolver
File: src/.../OpcUa/RedundancyModeResolver.cs (new)
Responsibilities:
- map Mode to RedundancySupport
- validate supported Phase 1 modes
- fall back safely when disabled or invalid
public static class RedundancyModeResolver
{
    public static RedundancySupport Resolve(string mode, bool enabled);
}
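One possible body for that signature, written so it satisfies the resolver tests listed in the Test Plan. To keep the sketch compilable on its own, it declares a local stand-in for the SDK's RedundancySupport enum; a real implementation would reference the SDK type instead. Trimming whitespace from the mode string is an assumption.

```csharp
using System;

// Stand-in for the SDK's RedundancySupport enum so this sketch compiles alone.
public enum RedundancySupport { None, Cold, Warm, Hot, Transparent, HotAndMirrored }

public static class RedundancyModeResolver
{
    public static RedundancySupport Resolve(string mode, bool enabled)
    {
        // Disabled or missing mode: report None, the same as the SDK default.
        if (!enabled || string.IsNullOrWhiteSpace(mode)) return RedundancySupport.None;

        // Case-insensitive match on the two modes supported in Phase 1.
        switch (mode.Trim().ToLowerInvariant())
        {
            case "warm": return RedundancySupport.Warm;
            case "hot": return RedundancySupport.Hot;
            default: return RedundancySupport.None; // unknown or unsupported mode
        }
    }
}
```

Falling back to None rather than throwing keeps a misconfigured instance running as a plain non-redundant server; the validator is the place to log the bad value.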
Step 4: Add ServiceLevelCalculator
File: src/.../OpcUa/ServiceLevelCalculator.cs (new)
Purpose:
- compute the current ServiceLevel from a baseline plus health inputs
Suggested signature:
public sealed class ServiceLevelCalculator
{
    public byte Calculate(int baseLevel, bool mxAccessConnected, bool dbConnected);
}
Suggested logic:
- start with the role-adjusted baseline supplied by the caller
- subtract 100 if MXAccess is disconnected
- subtract 50 if the Galaxy DB is unreachable
- return 0 if both are down
- clamp to 0-255
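The suggested logic translates almost line for line into code. This is a sketch of one way to fill in the signature, not a settled implementation; it uses Math.Max/Math.Min for the clamp so it does not depend on Math.Clamp being available on the target framework.

```csharp
using System;

public sealed class ServiceLevelCalculator
{
    public byte Calculate(int baseLevel, bool mxAccessConnected, bool dbConnected)
    {
        // Both dependencies down: the server is not operational.
        if (!mxAccessConnected && !dbConnected) return 0;

        int level = baseLevel;                    // role-adjusted baseline from the caller
        if (!mxAccessConnected) level -= 100;     // MXAccess loss is the larger penalty
        if (!dbConnected) level -= 50;            // Galaxy DB loss is the smaller penalty

        // Clamp into the byte range used by ServiceLevel.
        return (byte)Math.Max(0, Math.Min(255, level));
    }
}
```

With the default baselines, a primary (200) that loses MXAccess reports 100, which is below a healthy secondary (150), so clients comparing ServiceLevel would move to the secondary. A primary that loses only the Galaxy DB reports 150, level with a healthy secondary.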
Step 5: Extend ConfigurationValidator
File: src/.../Configuration/ConfigurationValidator.cs
Add validation/logging for:
- OpcUa.ApplicationUri
- Redundancy.Enabled, Mode, Role
- ServerUris membership and uniqueness
- ServiceLevelBase
- local OpcUa.ApplicationUri must appear in Redundancy.ServerUris when enabled
- warning when fewer than 2 unique server URIs are configured
Step 6: Expose redundancy through the standard OPC UA server object
File: src/.../OpcUa/LmxOpcUaServer.cs
Changes:
- Accept RedundancyConfiguration and local ApplicationUri
- On startup, locate the built-in ServerObjectState
- Configure ServerObjectState.ServiceLevel
- Configure the server redundancy object using the SDK's standard server-state types instead of writing guessed node ids directly
- If the default ServerRedundancyState does not expose ServerUriArray, replace or upgrade it with the appropriate non-transparent redundancy state type from the SDK before populating values
- Expose an internal method such as UpdateServiceLevel(bool mxConnected, bool dbConnected) for service-layer health updates
Important: the implementation should use SDK types/constants (ServerObjectState, ServerRedundancyState, NonTransparentRedundancyState, VariableIds.*) rather than hand-maintained numeric literals.
Step 7: Update OpcUaServerHost
File: src/.../OpcUa/OpcUaServerHost.cs
Changes:
- Accept RedundancyConfiguration
- Pass redundancy config and resolved local ApplicationUri into LmxOpcUaServer
- Log redundancy mode/role/server URIs at startup
Step 8: Wire health updates in OpcUaService
File: src/.../OpcUaService.cs
Changes:
- Bind and pass redundancy config
- After startup, initialize the starting ServiceLevel
- Subscribe to IMxAccessClient.ConnectionStateChanged
- Update DB health whenever startup repository checks, change-detection work, or rebuild attempts succeed/fail
- Prefer event-driven updates; add a lightweight periodic refresh only if necessary
Avoid introducing a second large standalone polling loop when existing connection and repository activity already gives most of the needed health signals.
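One way to keep the wiring event-driven is a small tracker that remembers the last known MXAccess and DB state and invokes a callback only when either changes. HealthTracker and its members are hypothetical names; the callback stands in for whatever eventually calls UpdateServiceLevel(bool mxConnected, bool dbConnected) on the server.

```csharp
using System;

// Hypothetical helper: coalesces health signals and publishes only on change.
public sealed class HealthTracker
{
    private readonly object _gate = new object();
    private readonly Action<bool, bool> _publish;
    private bool _mxConnected;
    private bool _dbConnected;

    public HealthTracker(Action<bool, bool> publish, bool mxConnected, bool dbConnected)
    {
        _publish = publish ?? throw new ArgumentNullException(nameof(publish));
        _mxConnected = mxConnected;
        _dbConnected = dbConnected;
        _publish(mxConnected, dbConnected); // publish the starting state once
    }

    // Call from the MXAccess connection-state handler.
    public void ReportMxAccess(bool connected) => Update(connected, null);

    // Call wherever a repository check, change-detection pass, or rebuild succeeds or fails.
    public void ReportDatabase(bool reachable) => Update(null, reachable);

    private void Update(bool? mx, bool? db)
    {
        lock (_gate)
        {
            bool newMx = mx ?? _mxConnected;
            bool newDb = db ?? _dbConnected;
            if (newMx == _mxConnected && newDb == _dbConnected) return; // unchanged: stay quiet

            _mxConnected = newMx;
            _dbConnected = newDb;
            _publish(newMx, newDb); // invoked under the lock to keep updates ordered
        }
    }
}
```

Because repeated reports of the same state are dropped, frequent repository activity does not translate into ServiceLevel churn.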
Step 9: Update test builders and helpers before integration coverage
Files:
- src/.../OpcUaServiceBuilder.cs
- tests/.../Helpers/OpcUaServerFixture.cs
- tests/.../Helpers/OpcUaTestClient.cs
Changes:
- add WithRedundancy(...)
- add WithApplicationUri(...) or allow full OpcUaConfiguration override
- ensure two in-process redundancy tests can run with distinct ServerName, ApplicationUri, and certificate identity
- when needed, use separate PKI roots in tests so paired fixtures do not collide on certificate state
Step 10: Update appsettings.json
File: src/.../appsettings.json
Add:
- OpcUa.ApplicationUri example/commentary in docs
- Redundancy section with Enabled = false defaults
Step 11: Add CLI redundancy command
Files:
- tools/opcuacli-dotnet/Commands/RedundancyCommand.cs (new)
- tools/opcuacli-dotnet/README.md
- docs/CliTool.md
Command: redundancy
Read:
- VariableIds.Server_ServerRedundancy_RedundancySupport
- VariableIds.Server_ServiceLevel
- VariableIds.Server_ServerRedundancy_ServerUriArray
Output example:
Redundancy Mode: Warm
Service Level: 200
Server URIs:
- urn:localhost:LmxOpcUa:instance1
- urn:localhost:LmxOpcUa:instance2
Use SDK constants instead of hardcoded numeric ids in the command implementation.
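The node reads themselves go through the SDK session and are not shown here. The sketch below covers only the formatting half of the command, assuming the three values have already been read; RedundancyReport and Format are hypothetical names, and the layout follows the output example above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: renders already-read redundancy values for the CLI.
public static class RedundancyReport
{
    public static string Format(string mode, byte serviceLevel, IReadOnlyList<string> serverUris)
    {
        var sb = new StringBuilder();
        sb.Append("Redundancy Mode: ").Append(mode).Append('\n');
        sb.Append("Service Level: ").Append(serviceLevel).Append('\n');
        sb.Append("Server URIs:");

        // One line per member of the redundant set; an empty set prints only the header.
        foreach (var uri in serverUris)
            sb.Append('\n').Append("- ").Append(uri);

        return sb.ToString();
    }
}
```

Keeping formatting separate from the session reads lets the output be unit tested without a running server.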
Step 12: Deploy the second service instance
Deployment target: C:\publish\lmxopcua\instance2
Suggested configuration differences:
| Setting | instance1 | instance2 |
|---|---|---|
| OpcUa.Port | 4840 | 4841 |
| Dashboard.Port | 8081 | 8082 |
| OpcUa.ServerName | LmxOpcUa | LmxOpcUa2 |
| OpcUa.ApplicationUri | urn:localhost:LmxOpcUa:instance1 | urn:localhost:LmxOpcUa:instance2 |
| Redundancy.Enabled | true | true |
| Redundancy.Role | Primary | Secondary |
| Redundancy.Mode | Warm | Warm |
| Redundancy.ServerUris | same two-entry set | same two-entry set |
Deployment notes:
- both instances should share the same GalaxyName and namespace URI
- each instance must have a distinct application certificate identity
- if certificate handling is sensitive, give each instance an explicit Security.CertificateSubject or separate PKI root
Update service_info.md with the second instance details after deployment is real, not speculative.
Test Plan
Unit tests: RedundancyModeResolver
New file: tests/ZB.MOM.WW.LmxOpcUa.Tests/Redundancy/RedundancyModeResolverTests.cs
| Test | Description |
|---|---|
| Resolve_Disabled_ReturnsNone | Enabled=false returns None |
| Resolve_Warm_ReturnsWarm | Mode="Warm" maps correctly |
| Resolve_Hot_ReturnsHot | Mode="Hot" maps correctly |
| Resolve_Unknown_FallsBackToNone | Unknown mode falls back safely |
| Resolve_CaseInsensitive | Case-insensitive parsing works |
Unit tests: ServiceLevelCalculator
New file: tests/ZB.MOM.WW.LmxOpcUa.Tests/Redundancy/ServiceLevelCalculatorTests.cs
| Test | Description |
|---|---|
| FullyHealthy_Primary_ReturnsBase | Healthy primary baseline is preserved |
| FullyHealthy_Secondary_ReturnsBaseMinusFifty | Healthy secondary baseline is lower |
| MxAccessDown_ReducesServiceLevel | MXAccess failure reduces score |
| DbDown_ReducesServiceLevel | DB failure reduces score |
| BothDown_ReturnsZero | Both unavailable returns 0 |
| ClampedTo255 | Upper clamp works |
| ClampedToZero | Lower clamp works |
Unit tests: RedundancyConfiguration
New file: tests/ZB.MOM.WW.LmxOpcUa.Tests/Redundancy/RedundancyConfigurationTests.cs
| Test | Description |
|---|---|
| DefaultConfig_Disabled | Enabled defaults to false |
| DefaultConfig_ModeWarm | Mode defaults to Warm |
| DefaultConfig_RolePrimary | Role defaults to Primary |
| DefaultConfig_EmptyServerUris | ServerUris defaults to empty |
| DefaultConfig_ServiceLevelBase200 | ServiceLevelBase defaults to 200 |
Updates to existing configuration tests
File: tests/ZB.MOM.WW.LmxOpcUa.Tests/Configuration/ConfigurationLoadingTests.cs
Add coverage for:
- OpcUa.ApplicationUri
- Redundancy section binding
- redundancy validation when ApplicationUri is missing
- redundancy validation when local ApplicationUri is absent from ServerUris
- invalid ServiceLevelBase
Integration tests
New file: tests/ZB.MOM.WW.LmxOpcUa.Tests/Integration/RedundancyTests.cs
Cover:
- redundancy disabled reports None
- warm redundancy reports configured mode
- ServerUriArray matches configuration
- primary reports higher ServiceLevel than secondary
- both servers expose the same namespace URI but different ApplicationUri values
- service level drops when MXAccess disconnects
Pattern:
- use two fixture instances
- give each fixture a distinct ServerName, ApplicationUri, and port
- if secure transport is enabled in those tests, isolate PKI roots to avoid certificate cross-talk
Documentation Plan
New file
docs/Redundancy.md
Contents:
- overview of OPC UA non-transparent redundancy
- difference between namespace URI and server ApplicationUri
- redundancy configuration reference
- service-level computation
- two-instance deployment guide
- CLI redundancy command usage
- troubleshooting
Updates to existing docs
| File | Changes |
|---|---|
| docs/Configuration.md | Add OpcUa.ApplicationUri and Redundancy sections |
| docs/OpcUaServer.md | Correct the current ApplicationUri == namespace description and add redundancy behavior |
| docs/CliTool.md | Add redundancy command |
| docs/ServiceHosting.md | Add multi-instance deployment notes |
| README.md | Mention redundancy support and link docs |
| CLAUDE.md | Add redundancy architecture note |
Update after real deployment
service_info.md
Only update this once the second instance is actually deployed and verified.
File Change Summary
| File | Action | Description |
|---|---|---|
| src/.../Configuration/OpcUaConfiguration.cs | Modify | Add explicit ApplicationUri |
| src/.../Configuration/RedundancyConfiguration.cs | New | Redundancy config model |
| src/.../Configuration/AppConfiguration.cs | Modify | Add Redundancy section |
| src/.../Configuration/ConfigurationValidator.cs | Modify | Validate/log redundancy and application identity |
| src/.../OpcUa/RedundancyModeResolver.cs | New | Map config mode to RedundancySupport |
| src/.../OpcUa/ServiceLevelCalculator.cs | New | Compute ServiceLevel from health inputs |
| src/.../OpcUa/LmxOpcUaServer.cs | Modify | Expose redundancy state via SDK server object |
| src/.../OpcUa/OpcUaServerHost.cs | Modify | Pass local application identity and redundancy config |
| src/.../OpcUaService.cs | Modify | Bind config and wire health updates |
| src/.../OpcUaServiceBuilder.cs | Modify | Support redundancy/application identity injection |
| src/.../appsettings.json | Modify | Add redundancy settings |
| tools/opcuacli-dotnet/Commands/RedundancyCommand.cs | New | Read redundancy state from a server |
| tests/.../Redundancy/*.cs | New | Unit tests for redundancy config and calculators |
| tests/.../Configuration/ConfigurationLoadingTests.cs | Modify | Bind/validate new settings |
| tests/.../Integration/RedundancyTests.cs | New | Paired-server integration tests |
| tests/.../Helpers/OpcUaServerFixture.cs | Modify | Support paired redundancy fixtures |
| tests/.../Helpers/OpcUaTestClient.cs | Modify | Read redundancy nodes in integration tests |
| docs/Redundancy.md | New | Dedicated redundancy guide |
| docs/Configuration.md | Modify | Document new config |
| docs/OpcUaServer.md | Modify | Correct application identity and add redundancy details |
| docs/CliTool.md | Modify | Document redundancy command |
| docs/ServiceHosting.md | Modify | Multi-instance deployment notes |
| README.md | Modify | Link redundancy docs |
| CLAUDE.md | Modify | Architecture note |
| service_info.md | Modify later | Document real second-instance deployment |
Verification Guardrails
Gate 1: Build
dotnet build ZB.MOM.WW.LmxOpcUa.slnx
Gate 2: Unit tests
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests
Gate 3: Redundancy integration tests
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests --filter "FullyQualifiedName~Redundancy"
Gate 4: CLI build
cd tools/opcuacli-dotnet
dotnet build
Gate 5: Manual single-instance check
opcuacli-dotnet.exe connect -u opc.tcp://localhost:4840/LmxOpcUa
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4840/LmxOpcUa
Expected:
- RedundancySupport=None
- ServiceLevel=255
Gate 6: Manual paired-instance check
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4840/LmxOpcUa
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4841/LmxOpcUa
Expected:
- both report the same ServerUriArray
- each reports its own unique local ApplicationUri
- primary reports a higher ServiceLevel
Gate 7: Full test suite
dotnet test ZB.MOM.WW.LmxOpcUa.slnx
Risks and Considerations
- Application identity is the main correctness risk. Without unique ApplicationUri values, the redundant set is invalid even if ServerUriArray is populated.
- SDK wiring may require replacing the default redundancy state node. The base ServerRedundancyState does not expose ServerUriArray; the implementation may need the non-transparent subtype from the SDK.
- Two in-process servers can collide on certificates. Tests and deployment need distinct application identities and, when necessary, isolated PKI roots.
- Both instances hit the same MXAccess runtime and Galaxy DB. Verify client-registration and polling behavior under paired load.
- ServiceLevel should remain meaningful, not noisy. Prefer deterministic role + health inputs over frequent arbitrary adjustments.
- service_info.md is deployment documentation, not design. Do not prefill it with speculative values before the second instance actually exists.
Execution Order
- Step 1: add OpcUa.ApplicationUri and separate it from namespace identity
- Steps 2-5: config model, resolver, calculator, validator
- Gate 1 + Gate 2
- Step 9: update builders/helpers so tests can express paired servers cleanly
- Steps 6-8: server exposure and service-layer health wiring
- Gate 1 + Gate 2 + Gate 3
- Step 10: update appsettings.json
- Step 11: add CLI redundancy command
- Gate 4 + Gate 5
- Step 12: deploy and verify the second instance
- Update service_info.md with real deployment details
- Documentation updates
- Gate 7