Add OPC UA server redundancy implementation plan
Covers non-transparent warm/hot redundancy with configurable roles, dynamic ServiceLevel, CLI support, second service instance deployment, and verification guardrails across unit, integration, and manual tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
508
redundancy.md
Normal file
@@ -0,0 +1,508 @@
# OPC UA Server Redundancy Plan

## Summary

Add configurable non-transparent warm/hot redundancy to the LmxOpcUa server so that two instances sharing the same Galaxy repository can operate as a redundant pair. Each instance advertises itself and its partner through the OPC UA `ServerRedundancy` node, publishes a dynamic `ServiceLevel` reflecting runtime health, and allows clients to discover the redundant set and fail over between instances. The CLI tool gains a `redundancy` command for inspecting the redundant server set.

This plan covers server-side redundancy exposure, client-side discovery, a second deployed service instance, documentation, and tests. It does **not** implement automatic server-side failover or subscription transfer; in non-transparent redundancy those are client responsibilities per the OPC UA specification.

---

## Background: OPC UA Redundancy Model

OPC UA exposes redundancy state through three address-space nodes: `RedundancySupport` and `ServerUriArray` under `Server/ServerRedundancy`, and `ServiceLevel` directly under `Server`:

| Node | Type | Purpose |
|---|---|---|
| `RedundancySupport` | `RedundancySupport` enum | Declares the redundancy mode: `None`, `Cold`, `Warm`, `Hot`, `Transparent`, `HotAndMirrored` |
| `ServerUriArray` | `String[]` | Lists the `ApplicationUri` values of all servers in the redundant set (non-transparent modes) |
| `ServiceLevel` | `Byte` (0–255) | Indicates current operational quality; clients prefer the server with the highest value |

### Non-Transparent Redundancy (our target)

In non-transparent redundancy (`Warm` or `Hot` in this plan), both servers run independently with their own sessions and subscriptions. Clients discover the redundant set by reading `ServerUriArray`, monitor `ServiceLevel` on each server, and manage their own failover. This model fits our architecture where each instance connects to the same Galaxy repository and MXAccess runtime independently.

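
As a rough illustration of client-managed failover, the sketch below picks the server with the highest `ServiceLevel` from a set of readings. `ServerReading` and `FailoverSelector` are hypothetical names invented for this example, and the readings are assumed to have been collected already by reading `ServiceLevel` from each endpoint; no OPC UA SDK calls are shown.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical snapshot of one server in the redundant set (not an SDK type).
public sealed class ServerReading
{
    public ServerReading(string applicationUri, byte serviceLevel)
    {
        ApplicationUri = applicationUri;
        ServiceLevel = serviceLevel;
    }

    public string ApplicationUri { get; }
    public byte ServiceLevel { get; }
}

public static class FailoverSelector
{
    // Returns the reading with the highest ServiceLevel, or null when no
    // server reports a value above 0 (0 is treated as not operational).
    // Ties keep the input order because OrderByDescending is a stable sort.
    public static ServerReading PickPreferred(IEnumerable<ServerReading> readings)
    {
        return readings
            .Where(r => r.ServiceLevel > 0)
            .OrderByDescending(r => r.ServiceLevel)
            .FirstOrDefault();
    }
}
```

A client would typically re-run this selection whenever a monitored `ServiceLevel` changes and reconnect only if the preferred server differs from the current one.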
### ServiceLevel Semantics

| Range | Meaning |
|---|---|
| 0 | Server is not operational |
| 1–99 | Degraded (e.g., MXAccess disconnected, DB unreachable) |
| 100–199 | Healthy secondary |
| 200–255 | Healthy primary (preferred) |

The primary server should advertise a higher `ServiceLevel` than the secondary so clients prefer it when both are healthy. These bands are this plan's convention; OPC UA Part 4 defines its own sub-ranges (0 Maintenance, 1 NoData, 2–199 Degraded, 200–255 Healthy), so a spec-aware client may classify a healthy secondary at 150 as degraded while still ranking it below the primary.

---

## Current State

- `LmxOpcUaServer` extends `StandardServer` but does not override any redundancy-related methods
- `ServerRedundancy/RedundancySupport` defaults to `None` (SDK default)
- `ServiceLevel` defaults to `255` (SDK default, "fully operational")
- No configuration for redundant partner URIs or role designation
- Single deployed instance at `C:\publish\lmxopcua\instance1` on port 4840
- No CLI support for reading redundancy information

---

## Scope

### In Scope (Phase 1)

1. **Redundancy configuration model**: role, partner URIs, ServiceLevel weights
2. **Server redundancy node exposure**: `RedundancySupport`, `ServerUriArray`, dynamic `ServiceLevel`
3. **ServiceLevel computation**: based on runtime health (MXAccess state, DB connectivity, role)
4. **CLI redundancy command**: read `RedundancySupport`, `ServerUriArray`, `ServiceLevel` from a server
5. **Second service instance**: deployed at `C:\publish\lmxopcua\instance2` with non-overlapping ports
6. **Documentation**: new `docs/Redundancy.md` component doc, updates to existing docs
7. **Unit tests**: config, ServiceLevel computation, resolver tests
8. **Integration tests**: two-server redundancy E2E test in the integration test project

### Deferred

- Automatic subscription transfer (client-side responsibility)
- Server-initiated failover (Galaxy `redundancy` table / engine flags)
- Transparent redundancy mode
- Health-check HTTP endpoint for load balancers

---

## Configuration Design

### New `Redundancy` section in `appsettings.json`

```json
{
  "Redundancy": {
    "Enabled": false,
    "Mode": "Warm",
    "Role": "Primary",
    "ServerUris": [],
    "ServiceLevelBase": 200
  }
}
```

### Configuration model

**File:** `src/ZB.MOM.WW.LmxOpcUa.Host/Configuration/RedundancyConfiguration.cs` (new)

```csharp
using System.Collections.Generic;

// Bound from the "Redundancy" section of appsettings.json.
public class RedundancyConfiguration
{
    public bool Enabled { get; set; } = false;        // off by default for backward compatibility
    public string Mode { get; set; } = "Warm";        // "Warm" or "Hot"
    public string Role { get; set; } = "Primary";     // "Primary" or "Secondary"
    public List<string> ServerUris { get; set; } = new List<string>();  // ApplicationUris of the full set
    public int ServiceLevelBase { get; set; } = 200;  // ServiceLevel when fully healthy
}
```

### Configuration rules

- `Enabled` defaults to `false` for backward compatibility. When `false`, `RedundancySupport = None` and `ServiceLevel = 255` (SDK defaults).
- `Mode` must be `Warm` or `Hot` (Phase 1). Maps to `RedundancySupport.Warm` or `RedundancySupport.Hot`.
- `Role` must be `Primary` or `Secondary`. Controls the base `ServiceLevel` (Primary gets `ServiceLevelBase`, Secondary gets `ServiceLevelBase - 50`).
- `ServerUris` lists the `ApplicationUri` values for **all** servers in the redundant set, including the local server. The OPC UA spec requires this to contain the full set. These are application URIs such as `urn:ZB:LmxOpcUa`, not endpoint URLs.
- `ServiceLevelBase` is the starting ServiceLevel when the server is fully healthy. Degraded conditions subtract from this value.

### App root updates

**File:** `src/ZB.MOM.WW.LmxOpcUa.Host/Configuration/AppConfiguration.cs`

- Add `public RedundancyConfiguration Redundancy { get; set; } = new RedundancyConfiguration();`

---

## Implementation Steps

### Step 1: Add RedundancyConfiguration model and bind it

**Files:**
- `src/.../Configuration/RedundancyConfiguration.cs` (new)
- `src/.../Configuration/AppConfiguration.cs`
- `src/.../OpcUaService.cs`

Changes:
1. Create `RedundancyConfiguration` class with properties above
2. Add `Redundancy` property to `AppConfiguration`
3. Bind the section with `configuration.GetSection("Redundancy").Bind(_config.Redundancy);`
4. Pass `_config.Redundancy` through to `OpcUaServerHost` and `LmxOpcUaServer`

### Step 2: Add RedundancyModeResolver

**File:** `src/.../OpcUa/RedundancyModeResolver.cs` (new)

Responsibilities:
- Map `Mode` string to `RedundancySupport` enum value
- Validate against supported Phase 1 modes (`Warm`, `Hot`)
- Fall back to `None` with warning for unknown modes

```csharp
public static class RedundancyModeResolver
{
    // Signature only; the body is written as part of this step.
    public static RedundancySupport Resolve(string mode, bool enabled);
}
```

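
One possible shape for the resolver body is sketched below. To keep the example compilable on its own it declares a local stand-in for the SDK's `RedundancySupport` enum, and it adds an optional `warn` callback (an assumption, not part of the planned signature) in place of whatever logger the host actually uses.

```csharp
using System;

// Stand-in for Opc.Ua.RedundancySupport so the sketch compiles on its own;
// the real code would reference the SDK enum instead.
public enum RedundancySupport { None, Cold, Warm, Hot, Transparent, HotAndMirrored }

public static class RedundancyModeResolver
{
    // Resolves the configured mode string, ignoring case.
    // Disabled redundancy always yields None. Any mode other than
    // Warm or Hot (the Phase 1 modes) falls back to None with a warning.
    public static RedundancySupport Resolve(string mode, bool enabled, Action<string> warn = null)
    {
        if (!enabled)
            return RedundancySupport.None;

        if (string.Equals(mode, "Warm", StringComparison.OrdinalIgnoreCase))
            return RedundancySupport.Warm;

        if (string.Equals(mode, "Hot", StringComparison.OrdinalIgnoreCase))
            return RedundancySupport.Hot;

        // 'warn' is a hypothetical hook standing in for the host's logger.
        if (warn != null)
            warn("Unsupported redundancy mode '" + mode + "'; falling back to None.");

        return RedundancySupport.None;
    }
}
```

Comparing against the two supported strings directly, rather than parsing the enum, keeps modes such as `Cold` or `Transparent` from being accepted before they are implemented.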
### Step 3: Add ServiceLevelCalculator

**File:** `src/.../OpcUa/ServiceLevelCalculator.cs` (new)

Computes the dynamic `ServiceLevel` byte from runtime health:

```csharp
public class ServiceLevelCalculator
{
    // Signature only; the body is written as part of this step.
    public byte Calculate(int baseLine, bool mxAccessConnected, bool dbConnected, bool isPrimary);
}
```

Logic:
- Return 0 if both MXAccess and the Galaxy DB are down
- Otherwise start with `baseLine` (`ServiceLevelBase` from config, e.g., 200) and subtract 50 when `isPrimary` is false (e.g., 150 for a Secondary)
- Subtract 100 if MXAccess is disconnected
- Subtract 50 if Galaxy DB is unreachable
- Clamp to 0–255

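
The rules for this step can be sketched as follows. The sketch assumes the secondary offset of 50 is applied inside the calculator from `isPrimary` (consistent with the planned `FullyHealthy_Secondary_ReturnsBaseMinusFifty` test), so `baseLine` is passed as the raw `ServiceLevelBase`; the constant names are illustrative only.

```csharp
using System;

public class ServiceLevelCalculator
{
    // Illustrative constants taken from the penalties listed in this plan.
    private const int SecondaryOffset = 50;
    private const int MxAccessPenalty = 100;
    private const int DbPenalty = 50;

    public byte Calculate(int baseLine, bool mxAccessConnected, bool dbConnected, bool isPrimary)
    {
        // With both dependencies down the server cannot serve data at all.
        if (!mxAccessConnected && !dbConnected)
            return 0;

        int level = baseLine;
        if (!isPrimary) level -= SecondaryOffset;
        if (!mxAccessConnected) level -= MxAccessPenalty;
        if (!dbConnected) level -= DbPenalty;

        // ServiceLevel is a Byte, so keep the result inside 0..255.
        return (byte)Math.Max(0, Math.Min(255, level));
    }
}
```

With a base of 200 this gives 200 for a healthy primary, 150 for a healthy secondary, and 100 for a primary whose MXAccess connection is down.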
### Step 4: Extend ConfigurationValidator for redundancy

**File:** `src/.../Configuration/ConfigurationValidator.cs`

Add validation/logging for:
- `Redundancy.Enabled`, `Mode`, `Role`
- `ServerUris` should not be empty when `Enabled = true` (reported as a warning; validation still passes)
- `ServiceLevelBase` must be 1–255 (validation fails otherwise)
- Warning when `Enabled = true` but `ServerUris` has fewer than 2 entries
- Log effective redundancy configuration at startup

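
A minimal sketch of these checks is shown below. It is not the existing `ConfigurationValidator`; `RedundancyValidationResult` and `RedundancyValidation` are hypothetical names, and the choice to treat an unknown `Mode` as a warning (matching the resolver's fallback) and an unknown `Role` as an error is an assumption the plan does not spell out.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the RedundancyConfiguration model from the Configuration Design section.
public class RedundancyConfiguration
{
    public bool Enabled { get; set; } = false;
    public string Mode { get; set; } = "Warm";
    public string Role { get; set; } = "Primary";
    public List<string> ServerUris { get; set; } = new List<string>();
    public int ServiceLevelBase { get; set; } = 200;
}

// Hypothetical result holder; the real validator may report differently.
public sealed class RedundancyValidationResult
{
    public List<string> Errors { get; } = new List<string>();
    public List<string> Warnings { get; } = new List<string>();
    public bool IsValid { get { return Errors.Count == 0; } }
}

public static class RedundancyValidation
{
    public static RedundancyValidationResult Validate(RedundancyConfiguration config)
    {
        var result = new RedundancyValidationResult();
        if (!config.Enabled)
            return result; // nothing to check while redundancy is off

        if (!IsOneOf(config.Mode, "Warm", "Hot"))
            result.Warnings.Add("Mode '" + config.Mode + "' is not supported; None will be used.");

        if (!IsOneOf(config.Role, "Primary", "Secondary"))
            result.Errors.Add("Role must be Primary or Secondary.");

        if (config.ServiceLevelBase < 1 || config.ServiceLevelBase > 255)
            result.Errors.Add("ServiceLevelBase must be between 1 and 255.");

        // Covers the empty case as well as a single-entry list.
        if (config.ServerUris == null || config.ServerUris.Count < 2)
            result.Warnings.Add("ServerUris should list every server in the redundant set.");

        return result;
    }

    private static bool IsOneOf(string value, string first, string second)
    {
        return string.Equals(value, first, StringComparison.OrdinalIgnoreCase)
            || string.Equals(value, second, StringComparison.OrdinalIgnoreCase);
    }
}
```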
### Step 5: Update LmxOpcUaServer to expose redundancy state

**File:** `src/.../OpcUa/LmxOpcUaServer.cs`

Changes:
1. Accept `RedundancyConfiguration` in the constructor
2. Override `OnServerStarted` to write redundancy nodes:
   - Set `Server/ServerRedundancy/RedundancySupport` to the resolved mode
   - Set `Server/ServerRedundancy/ServerUriArray` to the configured URIs
3. Override `SetServerState` or use a timer to update `Server/ServiceLevel` periodically based on `ServiceLevelCalculator`
4. Expose a method `UpdateServiceLevel(bool mxConnected, bool dbConnected)` that the service layer can call when health state changes

### Step 6: Update OpcUaServerHost to pass redundancy config

**File:** `src/.../OpcUa/OpcUaServerHost.cs`

Changes:
1. Accept `RedundancyConfiguration` in the constructor
2. Pass it through to `LmxOpcUaServer`
3. Log active redundancy mode at startup

### Step 7: Wire ServiceLevel updates in OpcUaService

**File:** `src/.../OpcUaService.cs`

Changes:
1. Bind redundancy config section
2. Pass redundancy config to `OpcUaServerHost`
3. Subscribe to `MxAccessClient.ConnectionStateChanged` to trigger `ServiceLevel` updates
4. After Galaxy DB health checks, trigger `ServiceLevel` updates
5. Use a periodic timer (e.g., every 5 seconds) to refresh `ServiceLevel` based on current component health

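
The combination of event-driven and periodic refresh could be wired roughly as in the sketch below. `ServiceLevelRefresher` is a hypothetical helper, not an existing class; its `publish` callback stands in for `LmxOpcUaServer.UpdateServiceLevel(mxConnected, dbConnected)` and is assumed to be thread-safe, since it can be invoked from event handlers and from the timer thread.

```csharp
using System;
using System.Threading;

// Hypothetical helper: remembers the latest health flags and forwards them
// to a sink whenever a flag changes or the periodic timer fires.
public sealed class ServiceLevelRefresher : IDisposable
{
    private readonly object _gate = new object();
    private readonly Action<bool, bool> _publish;
    private readonly Timer _timer;
    private bool _mxConnected;
    private bool _dbConnected;

    public ServiceLevelRefresher(Action<bool, bool> publish, TimeSpan interval)
    {
        _publish = publish;
        // The periodic tick re-publishes the current state in case an event was missed.
        _timer = new Timer(_ => Refresh(), null, interval, interval);
    }

    // Call from the MXAccess connection-state event handler.
    public void OnMxAccessStateChanged(bool connected)
    {
        lock (_gate) { _mxConnected = connected; }
        Refresh();
    }

    // Call after each Galaxy DB health check.
    public void OnDbHealthChecked(bool reachable)
    {
        lock (_gate) { _dbConnected = reachable; }
        Refresh();
    }

    public void Refresh()
    {
        bool mx, db;
        lock (_gate) { mx = _mxConnected; db = _dbConnected; }
        _publish(mx, db);
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

The service layer would create one instance with a 5-second interval, route the two health signals into it, and dispose it on shutdown.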
### Step 8: Update appsettings.json

**File:** `src/.../appsettings.json`

Add the `Redundancy` section with backward-compatible defaults (`Enabled: false`).

### Step 9: Update OpcUaServiceBuilder for test injection

**File:** `src/.../OpcUaServiceBuilder.cs`

Add `WithRedundancy(RedundancyConfiguration)` builder method so tests can inject redundancy configuration.

### Step 10: Add CLI `redundancy` command

**Files:**
- `tools/opcuacli-dotnet/Commands/RedundancyCommand.cs` (new)

Command: `redundancy`

Reads from the target server:
- `Server/ServerRedundancy/RedundancySupport` (i=3709)
- `Server/ServiceLevel` (i=2267)
- `Server/ServerRedundancy/ServerUriArray` (i=11314, if non-transparent redundancy)

Output format:
```
Redundancy Mode: Warm
Service Level: 200
Server URIs:
  - urn:ZB:LmxOpcUa
  - urn:ZB:LmxOpcUa2
```

Options: `--url`, `--username`, `--password`, `--security` (same shared options as other commands).

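
Once the three values have been read, producing the output is plain string formatting. The sketch below assumes the values are already available as a mode name, a byte, and a list of URIs; `RedundancyReport` is a hypothetical helper, the SDK read calls are deliberately left out, and the two-space indent before each URI is an assumption about the intended layout.

```csharp
using System.Collections.Generic;
using System.Text;

public static class RedundancyReport
{
    // Builds the text printed by the redundancy command.
    // The "Server URIs" block is omitted when the server reports none.
    public static string Format(string mode, byte serviceLevel, IReadOnlyList<string> serverUris)
    {
        var sb = new StringBuilder();
        sb.Append("Redundancy Mode: ").Append(mode).Append('\n');
        sb.Append("Service Level: ").Append(serviceLevel).Append('\n');

        if (serverUris != null && serverUris.Count > 0)
        {
            sb.Append("Server URIs:\n");
            foreach (var uri in serverUris)
                sb.Append("  - ").Append(uri).Append('\n');
        }

        return sb.ToString();
    }
}
```

Keeping the formatting separate from the session code makes the command's output testable without a running server.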
### Step 11: Deploy second service instance

**Deployment target:** `C:\publish\lmxopcua\instance2`

Configuration differences from instance1:

| Setting | instance1 | instance2 |
|---|---|---|
| `OpcUa.Port` | `4840` | `4841` |
| `OpcUa.ServerName` | `LmxOpcUa` | `LmxOpcUa2` |
| `Dashboard.Port` | `8083` | `8084` |
| `Redundancy.Enabled` | `true` | `true` |
| `Redundancy.Role` | `Primary` | `Secondary` |
| `Redundancy.Mode` | `Warm` | `Warm` |
| `Redundancy.ServerUris` | `["urn:ZB:LmxOpcUa", "urn:ZB:LmxOpcUa2"]` | `["urn:ZB:LmxOpcUa", "urn:ZB:LmxOpcUa2"]` |
| `Redundancy.ServiceLevelBase` | `200` | `200` |

Windows service for instance2:
- Name: `LmxOpcUa2`
- Display name: `LMX OPC UA Server (Instance 2)`
- Executable: `C:\publish\lmxopcua\instance2\ZB.MOM.WW.LmxOpcUa.Host.exe`

Both instances share the same Galaxy DB (`ZB`) and MXAccess runtime. The `GalaxyName` remains `ZB` for both so they expose the same namespace.

Update `service_info.md` with the second instance details.

---

## Test Plan

### Unit tests: RedundancyModeResolver

**New file:** `tests/ZB.MOM.WW.LmxOpcUa.Tests/Redundancy/RedundancyModeResolverTests.cs`

| Test | Description |
|---|---|
| `Resolve_Disabled_ReturnsNone` | `Enabled=false` always returns `RedundancySupport.None` |
| `Resolve_Warm_ReturnsWarm` | `Mode="Warm"` maps to `RedundancySupport.Warm` |
| `Resolve_Hot_ReturnsHot` | `Mode="Hot"` maps to `RedundancySupport.Hot` |
| `Resolve_Unknown_FallsBackToNone` | Unknown mode falls back safely |
| `Resolve_CaseInsensitive` | `"warm"` and `"WARM"` both resolve |

### Unit tests: ServiceLevelCalculator

**New file:** `tests/ZB.MOM.WW.LmxOpcUa.Tests/Redundancy/ServiceLevelCalculatorTests.cs`

| Test | Description |
|---|---|
| `FullyHealthy_Primary_ReturnsBase` | All healthy, primary role → `ServiceLevelBase` |
| `FullyHealthy_Secondary_ReturnsBaseMinusFifty` | All healthy, secondary role → `ServiceLevelBase - 50` |
| `MxAccessDown_ReducesServiceLevel` | MXAccess disconnected subtracts 100 |
| `DbDown_ReducesServiceLevel` | DB unreachable subtracts 50 |
| `BothDown_ReturnsZero` | MXAccess + DB both down → 0 |
| `ClampedTo255` | Base of 255 with healthy → 255 |
| `ClampedToZero` | Heavy penalties don't go negative |

### Unit tests: RedundancyConfiguration defaults

**New file:** `tests/ZB.MOM.WW.LmxOpcUa.Tests/Redundancy/RedundancyConfigurationTests.cs`

| Test | Description |
|---|---|
| `DefaultConfig_Disabled` | `Enabled` defaults to `false` |
| `DefaultConfig_ModeWarm` | `Mode` defaults to `"Warm"` |
| `DefaultConfig_RolePrimary` | `Role` defaults to `"Primary"` |
| `DefaultConfig_EmptyServerUris` | `ServerUris` defaults to empty |
| `DefaultConfig_ServiceLevelBase200` | `ServiceLevelBase` defaults to `200` |

### Updates to existing configuration tests

**File:** `tests/ZB.MOM.WW.LmxOpcUa.Tests/Configuration/ConfigurationLoadingTests.cs`

Add:
- `Redundancy_Section_BindsCorrectly`: verify binding from appsettings.json
- `Redundancy_Section_BindsCustomValues`: in-memory override test
- `Validator_RedundancyEnabled_EmptyServerUris_ReturnsTrue_WithWarning`: validates but warns
- `Validator_RedundancyEnabled_InvalidServiceLevelBase_ReturnsFalse`: rejects 0 or >255

### Integration tests: redundancy E2E

**New file:** `tests/ZB.MOM.WW.LmxOpcUa.Tests/Integration/RedundancyTests.cs`

These tests start two in-process OPC UA servers with redundancy enabled and verify client-visible behavior:

| Test | Description |
|---|---|
| `Server_WithRedundancyDisabled_ReportsNone` | Default config → `RedundancySupport.None`, `ServiceLevel=255` |
| `Server_WithRedundancyEnabled_ReportsConfiguredMode` | `Enabled=true, Mode=Warm` → `RedundancySupport.Warm` |
| `Server_WithRedundancyEnabled_ExposesServerUriArray` | Client can read `ServerUriArray` and it matches config |
| `Server_Primary_HasHigherServiceLevel_ThanSecondary` | Primary server reports higher `ServiceLevel` than secondary |
| `TwoServers_BothExposeSameRedundantSet` | Two server fixtures, both report the same `ServerUriArray` |
| `Server_ServiceLevel_DropsWith_MxAccessDisconnect` | Simulate MXAccess disconnect → `ServiceLevel` decreases |

Pattern: Use `OpcUaServerFixture.WithFakeMxAccessClient()` with redundancy config injected, connect with `OpcUaTestClient`, read the standard OPC UA redundancy nodes.

---

## Documentation Plan

### New file: `docs/Redundancy.md`

Contents:
1. Overview of OPC UA non-transparent redundancy
2. Redundancy configuration section reference (`Enabled`, `Mode`, `Role`, `ServerUris`, `ServiceLevelBase`)
3. ServiceLevel computation logic and degraded-state penalties
4. How clients discover and fail over between instances
5. Deployment guide for a two-instance redundant pair (ports, service names, shared Galaxy DB)
6. CLI `redundancy` command usage
7. Troubleshooting: mismatched `ServerUris`, ServiceLevel stuck at 0, etc.

### Updates to existing docs

| File | Changes |
|---|---|
| `docs/Configuration.md` | Add `Redundancy` section table, example JSON, add to validation rules list, update example appsettings.json |
| `docs/OpcUaServer.md` | Add redundancy state exposure section, link to `Redundancy.md` |
| `docs/CliTool.md` | Add `redundancy` command documentation |
| `docs/ServiceHosting.md` | Add multi-instance deployment notes |
| `README.md` | Add `Redundancy` to the component documentation table, mention redundancy in Quick Start |
| `CLAUDE.md` | Add redundancy architecture note |

### Update: `service_info.md`

Add a second section documenting `instance2`:
- Path: `C:\publish\lmxopcua\instance2`
- Windows service name: `LmxOpcUa2`
- Port: `4841`
- Dashboard port: `8084`
- Redundancy role: `Secondary`
- Endpoint: `opc.tcp://localhost:4841/LmxOpcUa`

---

## File Change Summary

| File | Action | Description |
|---|---|---|
| `src/.../Configuration/RedundancyConfiguration.cs` | New | Redundancy config model |
| `src/.../Configuration/AppConfiguration.cs` | Modify | Add `Redundancy` section |
| `src/.../Configuration/ConfigurationValidator.cs` | Modify | Validate/log redundancy settings |
| `src/.../OpcUa/RedundancyModeResolver.cs` | New | Mode string → `RedundancySupport` enum |
| `src/.../OpcUa/ServiceLevelCalculator.cs` | New | Dynamic ServiceLevel from health state |
| `src/.../OpcUa/LmxOpcUaServer.cs` | Modify | Expose redundancy nodes, accept ServiceLevel updates |
| `src/.../OpcUa/OpcUaServerHost.cs` | Modify | Pass redundancy config through |
| `src/.../OpcUaService.cs` | Modify | Bind redundancy config, wire ServiceLevel updates |
| `src/.../OpcUaServiceBuilder.cs` | Modify | Add `WithRedundancy()` builder |
| `src/.../appsettings.json` | Modify | Add `Redundancy` section |
| `tools/opcuacli-dotnet/Commands/RedundancyCommand.cs` | New | CLI command to read redundancy info |
| `tests/.../Redundancy/RedundancyModeResolverTests.cs` | New | Mode resolver unit tests |
| `tests/.../Redundancy/ServiceLevelCalculatorTests.cs` | New | ServiceLevel computation tests |
| `tests/.../Redundancy/RedundancyConfigurationTests.cs` | New | Config defaults tests |
| `tests/.../Configuration/ConfigurationLoadingTests.cs` | Modify | Binding + validation tests |
| `tests/.../Integration/RedundancyTests.cs` | New | E2E two-server redundancy tests |
| `tests/.../Helpers/OpcUaServerFixture.cs` | Modify | Accept redundancy config |
| `docs/Redundancy.md` | New | Dedicated redundancy component doc |
| `docs/Configuration.md` | Modify | Add Redundancy section |
| `docs/OpcUaServer.md` | Modify | Add redundancy state section |
| `docs/CliTool.md` | Modify | Add redundancy command |
| `docs/ServiceHosting.md` | Modify | Multi-instance notes |
| `README.md` | Modify | Add Redundancy to component table |
| `CLAUDE.md` | Modify | Add redundancy architecture note |
| `service_info.md` | Modify | Add instance2 details |

---

## Verification Guardrails

Each step must pass these gates before proceeding to the next:

### Gate 1: Build (after each implementation step)
```bash
dotnet build ZB.MOM.WW.LmxOpcUa.slnx
```
Must produce 0 errors. Proceed only when green.

### Gate 2: Unit tests (after steps 1–4, 9)
```bash
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests
```
All existing + new tests must pass. No regressions.

### Gate 3: Integration tests (after steps 5–7)
```bash
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests --filter "FullyQualifiedName~Integration.RedundancyTests"
```
All redundancy E2E tests must pass.

### Gate 4: CLI tool builds (after step 10)
```bash
cd tools/opcuacli-dotnet && dotnet build
```
Must compile without errors.

### Gate 5: Manual verification, single instance (after step 8)
```bash
# Publish and start with Redundancy.Enabled=false
opcuacli-dotnet.exe connect -u opc.tcp://localhost:4840/LmxOpcUa
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4840/LmxOpcUa
# Should report: RedundancySupport=None, ServiceLevel=255
```

### Gate 6: Manual verification, redundant pair (after step 11)
```bash
# Start both instances
sc start LmxOpcUa
sc start LmxOpcUa2

# Verify instance1 (Primary)
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4840/LmxOpcUa
# Should report: RedundancySupport=Warm, ServiceLevel=200, ServerUris=[urn:ZB:LmxOpcUa, urn:ZB:LmxOpcUa2]

# Verify instance2 (Secondary)
opcuacli-dotnet.exe redundancy -u opc.tcp://localhost:4841/LmxOpcUa
# Should report: RedundancySupport=Warm, ServiceLevel=150, ServerUris=[urn:ZB:LmxOpcUa, urn:ZB:LmxOpcUa2]

# Both instances should serve the same Galaxy address space
opcuacli-dotnet.exe browse -u opc.tcp://localhost:4840/LmxOpcUa -r -d 2
opcuacli-dotnet.exe browse -u opc.tcp://localhost:4841/LmxOpcUa -r -d 2
```

### Gate 7: Full test suite (final)
```bash
dotnet test ZB.MOM.WW.LmxOpcUa.slnx
```
All tests across all projects must pass.

### Gate 8: Documentation review
- All new/modified doc files render correctly in Markdown
- Example JSON snippets match the actual `appsettings.json`
- CLI examples use correct flags and expected output
- `service_info.md` accurately reflects both deployed instances

---

## Risks and Considerations

1. **Backward compatibility**: `Redundancy.Enabled = false` must be the default so existing single-instance deployments are unaffected.
2. **ServiceLevel timing**: Updates must not race with OPC UA publish cycles. Use the server's internal lock or `ServerInternal` APIs.
3. **ServerUriArray immutability**: The OPC UA spec expects this to be static during a server session. Changes require a server restart.
4. **MXAccess shared state**: Both instances connect to the same MXAccess runtime. If MXAccess has per-client registration limits, verify that two clients can coexist.
5. **Galaxy DB contention**: Both instances poll for deploy changes. Ensure change detection doesn't trigger duplicate rebuilds or locking issues.
6. **Port conflicts**: The second instance must use different ports for OPC UA (4841) and Dashboard (8084).
7. **Certificate identity**: Each instance needs its own application certificate with a distinct `SubjectName` matching its `ServerName`.

---

## Execution Order

1. Steps 1–4: Config model, resolver, calculator, validator (unit-testable in isolation)
2. **Gate 1 + Gate 2**: Build + unit tests pass
3. Steps 5–7: Server integration (redundancy nodes, ServiceLevel wiring)
4. **Gate 1 + Gate 2 + Gate 3**: Build + all tests including E2E
5. Step 8: Update appsettings.json
6. **Gate 5**: Manual single-instance verification
7. Step 9: Update service builder for tests
8. Step 10: CLI redundancy command
9. **Gate 4**: CLI builds
10. Step 11: Deploy second instance + update service_info.md
11. **Gate 6**: Manual two-instance verification
12. Documentation updates (all doc files)
13. **Gate 7 + Gate 8**: Full test suite + documentation review
14. Commit and push