Phase 2 PR 4 — close 4 open MXAccess findings (push frames + reconnect + write-await + read-cancel) #3

Merged
dohertj2 merged 14 commits from phase-2-pr4-findings into v2 2026-04-18 06:57:22 -04:00
5 changed files with 460 additions and 0 deletions
Showing only changes of commit 7403b92b72


@@ -0,0 +1,103 @@
# Stream D — Legacy `OtOpcUa.Host` Removal Procedure
> Sequenced playbook for the next session that takes Phase 2 to its full exit gate.
> All Stream A/B/C work is committed. The blocker is structural: the 494 tests in v1
> `OtOpcUa.Tests` instantiate v1 `Host` classes directly, so they must be
> retargeted (or archived) before the Host project can be deleted.
## Decision: Option A or Option B
### Option A — Rewrite the 494 v1 tests to use v2 topology
**Effort**: 3-5 days. Highest fidelity (full v1 test coverage carries forward).
**Steps**:
1. Build a `ProxyMxAccessClientAdapter` in a new `OtOpcUa.LegacyTestCompat/` project that
implements v1's `IMxAccessClient` by forwarding to `Driver.Galaxy.Proxy.GalaxyProxyDriver`.
Maps v1 `Vtq` ↔ v2 `DataValueSnapshot`, v1 `Quality` enum ↔ v2 `StatusCode` u32, the v1
`OnTagValueChanged` event ↔ v2 `ISubscribable.OnDataChange`.
2. Same idea for `IGalaxyRepository` — adapter that wraps v2's `Backend.Galaxy.GalaxyRepository`.
3. Replace `MxAccessClient` constructions in `OtOpcUa.Tests` test fixtures with the adapter.
Most tests use a single fixture so the change-set is concentrated.
4. For each test class: run; iterate on parity defects until green. Expected defect families:
timing-sensitive assertions (IPC adds ~5ms latency; widen tolerances), Quality enum vs
StatusCode mismatches, value-byte-encoding differences.
5. Once all 494 pass: proceed to deletion checklist below.
**When to pick A**: regulatory environments that need the full historical test suite green,
or when the v2 parity gate is itself a release-blocking artifact downstream consumers will
look for.
### Option B — Archive the 494 v1 tests, build a smaller v2 parity suite
**Effort**: 1-2 days. Faster to green; less coverage initially, accreted over time.
**Steps**:
1. Rename `tests/ZB.MOM.WW.OtOpcUa.Tests/` → `tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive/`.
Add `<IsTestProject>false</IsTestProject>` so CI doesn't run them; mark every class with
`[Trait("Category", "v1Archive")]` so a future operator can opt in via `--filter`.
2. New `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/` project (.NET 10):
- `ParityFixture` spawns Galaxy.Host EXE per test class with `OTOPCUA_GALAXY_BACKEND=mxaccess`
pointing at the dev box's live Galaxy. Pattern from `HostSubprocessParityTests`.
- 10-20 representative tests covering the core paths: hierarchy shape, attribute count,
read-Manufacturer-Boolean, write-Operate-Float roundtrip, subscribe-receives-OnDataChange,
Bad-quality on disconnect, alarm-event-shape.
3. The four 2026-04-13 stability findings get individual regression tests in this project.
4. Once green: proceed to deletion checklist below.
**When to pick B**: the typical case, where dev velocity matters more than carrying the full
v1 suite forward. The v1 archive is kept as reference; the new suite is the live parity bar.
## Deletion checklist (after Option A or B is green)
Pre-conditions:
- [ ] Chosen-option test suite green (494 retargeted OR new E2E suite passing on this box)
- [ ] `phase-2-compliance.ps1` runs and exits 0
- [ ] `Get-Service aaGR, aaBootstrap` → Running
- [ ] `Driver.Galaxy.Host` x86 publish output verified at
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Release/net48/`
- [ ] Migration script tested: `scripts/migration/Migrate-AppSettings-To-DriverConfig.ps1
-AppSettingsPath src/ZB.MOM.WW.OtOpcUa.Host/appsettings.json -DryRun` produces a
well-formed DriverConfig
- [ ] Service installer scripts dry-run on a test box: `scripts/install/Install-Services.ps1
-InstallRoot C:\OtOpcUa -ServiceAccount LOCALHOST\testuser` registers both services
and they start
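The machine-checkable preconditions above could be gathered into a single gate. The following is a minimal sketch, not part of this PR: the function name `Test-StreamDPreconditions` and its parameters are hypothetical, the location of `phase-2-compliance.ps1` is left to the caller, and it assumes `pwsh` is on the PATH and the call is made from the repo root on the dev box.

```powershell
# Hypothetical helper: returns $true only when every checkable precondition holds.
function Test-StreamDPreconditions {
    [CmdletBinding()]
    param(
        # Path to phase-2-compliance.ps1; where it lives in the repo is an assumption.
        [Parameter(Mandatory)] [string]$ComplianceScript,
        [string]$PublishDir = 'src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Release/net48'
    )
    $failures = @()

    # Both Galaxy platform services must be Running.
    foreach ($name in 'aaGR', 'aaBootstrap') {
        $svc = Get-Service -Name $name -ErrorAction SilentlyContinue
        if ($null -eq $svc -or $svc.Status -ne 'Running') {
            $failures += "Service $name is not Running"
        }
    }

    # The x86 Host publish output must exist.
    if (-not (Test-Path -Path $PublishDir -PathType Container)) {
        $failures += "Publish output missing: $PublishDir"
    }

    # Run the compliance script in a child process so the exit code is unambiguous.
    & pwsh -NoProfile -File $ComplianceScript | Out-Host
    if ($LASTEXITCODE -ne 0) {
        $failures += "Compliance script exited with $LASTEXITCODE"
    }

    $failures | ForEach-Object { Write-Warning $_ }
    return ($failures.Count -eq 0)
}
```

The test-suite, migration dry-run, and installer dry-run items still need a human to read the output, so this sketch deliberately leaves them out.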
Steps:
1. Delete `src/ZB.MOM.WW.OtOpcUa.Host/` (the legacy in-process Host project).
2. Edit `ZB.MOM.WW.OtOpcUa.slnx` — remove the legacy Host `<Project>` line; keep all v2
project lines.
3. Migrate the dev `appsettings.json` Galaxy sections to `DriverConfig` JSON via the
migration script; insert into the Configuration DB for the dev cluster's Galaxy driver
instance.
4. Run the chosen test suite once more — confirm zero regressions from the deletion.
5. Build full solution (`dotnet build ZB.MOM.WW.OtOpcUa.slnx`) — confirm clean build with
no references to the deleted project.
6. Commit:
`git rm -r src/ZB.MOM.WW.OtOpcUa.Host` followed by the slnx + cleanup edits in one
atomic commit titled "Phase 2 Stream D — retire legacy OtOpcUa.Host".
7. Run `/codex:adversarial-review --base v2` on the merged Phase 2 diff.
8. Record `exit-gate-phase-2-final.md` with: Option chosen, deletion-commit SHA, parity
test count + duration, adversarial-review findings (each closed or deferred with link).
9. Open PR against `v2`, link the exit-gate doc + compliance script output + parity report.
10. Merge after one reviewer signoff.
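Steps 1, 5, and 6 can be scripted once the slnx edit from step 2 has been made by hand. This is a rough sketch under stated assumptions: `Remove-LegacyHost` is a hypothetical name, it must be run from the repo root, and the commit title is passed in by the operator rather than hard-coded. It does not run the test suite (step 4), which should still happen before the commit is pushed.

```powershell
# Hypothetical wrapper for steps 1, 5 and 6. Supports -WhatIf for a dry run.
function Remove-LegacyHost {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        # Commit title from step 6, supplied by the operator.
        [Parameter(Mandatory)] [string]$Message,
        [string]$Solution = 'ZB.MOM.WW.OtOpcUa.slnx',
        [string]$LegacyProject = 'ZB.MOM.WW.OtOpcUa.Host'
    )

    # Guard: the solution file must no longer mention the legacy project.
    # A literal match on the project name does not hit Driver.Galaxy.Host.
    if (Select-String -Path $Solution -SimpleMatch $LegacyProject -Quiet) {
        throw "$Solution still references $LegacyProject; finish step 2 first"
    }

    if ($PSCmdlet.ShouldProcess("src/$LegacyProject", 'git rm -r, build, commit')) {
        git rm -r "src/$LegacyProject"
        if ($LASTEXITCODE -ne 0) { throw 'git rm failed' }

        # A clean build is the proof that nothing still references the project.
        dotnet build $Solution
        if ($LASTEXITCODE -ne 0) { throw 'build failed after deletion' }

        # Stage the slnx edit so deletion and cleanup land in one commit.
        git add $Solution
        git commit -m $Message
        if ($LASTEXITCODE -ne 0) { throw 'git commit failed' }
    }
}
```

Running it first with `-WhatIf` exercises only the slnx guard, which is a cheap way to confirm step 2 is complete before anything is deleted.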
## Rollback
If Stream D causes downstream consumer failures (ScadaBridge / Ignition / SystemPlatform IO
clients seeing different OPC UA behavior), the rollback is `git revert` of the deletion
commit — the whole v2 codebase keeps Galaxy.Proxy + Galaxy.Host installed alongside the
restored legacy Host. Production can run either topology. `OtOpcUa.Driver.Galaxy.Proxy`
becomes dormant until the next attempt.
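Because the deletion is a single atomic commit, the rollback reduces to one revert plus a build check. A minimal sketch, assuming the deletion-commit SHA recorded in `exit-gate-phase-2-final.md` is passed in by the operator; `Restore-LegacyHost` is a hypothetical name:

```powershell
# Hypothetical rollback helper: reverts the deletion commit and verifies the build.
function Restore-LegacyHost {
    [CmdletBinding()]
    param(
        # SHA of the Stream D deletion commit (recorded in the exit-gate doc).
        [Parameter(Mandatory)] [string]$DeletionCommit,
        [string]$Solution = 'ZB.MOM.WW.OtOpcUa.slnx'
    )

    # Restores src/ZB.MOM.WW.OtOpcUa.Host and the slnx project line together.
    git revert --no-edit $DeletionCommit
    if ($LASTEXITCODE -ne 0) { throw "git revert of $DeletionCommit failed" }

    dotnet build $Solution
    if ($LASTEXITCODE -ne 0) { throw 'build failed after revert' }
}
```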
## Why this can't one-shot in an autonomous session
- The parity-defect debug cycle is intrinsically interactive: each iteration requires running
the test suite against live Galaxy, inspecting the diff, deciding if the difference is a
legitimate v2 improvement or a regression, then either widening the assertion or fixing the
v2 code. That decision-making is the bottleneck, not the typing.
- The legacy-Host deletion is destructive — needs explicit operator authorization on a real
PR review, not unattended automation.
- The downstream consumer cutover (ScadaBridge, Ignition, AppServer) lives outside this repo
and on an integration-team track; "Phase 2 done" inside this repo is a precondition, not
the full release.


@@ -0,0 +1,102 @@
<#
.SYNOPSIS
Registers the two v2 Windows services on a node: OtOpcUa (main server, net10) and
OtOpcUaGalaxyHost (out-of-process Galaxy COM host, net48 x86).
.DESCRIPTION
Phase 2 Stream D.2 — replaces the v1 single-service install (TopShelf-based OtOpcUa.Host).
Installs both services with the correct service-account SID + per-process shared secret
provisioning per `driver-stability.md §"IPC Security"`. OtOpcUa depends on Galaxy.Host
(Galaxy.Host should be reachable when OtOpcUa starts; the service dependency is wired via
sc.exe depend= and connect retries are handled by OtOpcUa.Server NodeBootstrap).
.PARAMETER InstallRoot
Where the binaries live (typically C:\Program Files\OtOpcUa).
.PARAMETER ServiceAccount
Service account SID or DOMAIN\name. Both services run under this account; the
Galaxy.Host pipe ACL only allows this SID to connect (decision #76).
.PARAMETER GalaxySharedSecret
Per-process secret passed to Galaxy.Host via env var. Generated freshly per install.
.PARAMETER ZbConnection
Galaxy ZB SQL connection string (passed to Galaxy.Host via env var).
.EXAMPLE
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' -ServiceAccount 'OTOPCUA\svc-otopcua'
#>
[CmdletBinding()]
param(
[Parameter(Mandatory)] [string]$InstallRoot,
[Parameter(Mandatory)] [string]$ServiceAccount,
[string]$GalaxySharedSecret,
[string]$ZbConnection = 'Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;',
[string]$GalaxyClientName = 'OtOpcUa-Galaxy.Host',
[string]$GalaxyPipeName = 'OtOpcUaGalaxy'
)
$ErrorActionPreference = 'Stop'
if (-not (Test-Path "$InstallRoot\OtOpcUa.Server.exe")) {
Write-Error "OtOpcUa.Server.exe not found at $InstallRoot — copy the publish output first"
exit 1
}
if (-not (Test-Path "$InstallRoot\Galaxy\OtOpcUa.Driver.Galaxy.Host.exe")) {
Write-Error "OtOpcUa.Driver.Galaxy.Host.exe not found at $InstallRoot\Galaxy — copy the publish output first"
exit 1
}
# Generate a fresh shared secret per install if not supplied. Note: this script hands the
# secret to Galaxy.Host through the service's registry Environment value (set below); it does
# not write a DPAPI-protected file.
if (-not $GalaxySharedSecret) {
$bytes = New-Object byte[] 32
[System.Security.Cryptography.RandomNumberGenerator]::Create().GetBytes($bytes)
$GalaxySharedSecret = [Convert]::ToBase64String($bytes)
}
# Resolve the SID — the IPC ACL needs the SID, not the down-level name.
$sid = if ($ServiceAccount.StartsWith('S-1-')) {
$ServiceAccount
} else {
(New-Object System.Security.Principal.NTAccount $ServiceAccount).Translate([System.Security.Principal.SecurityIdentifier]).Value
}
# --- Install OtOpcUaGalaxyHost first (OtOpcUa starts after, depends on it being up).
$galaxyEnv = @(
"OTOPCUA_GALAXY_PIPE=$GalaxyPipeName"
"OTOPCUA_ALLOWED_SID=$sid"
"OTOPCUA_GALAXY_SECRET=$GalaxySharedSecret"
"OTOPCUA_GALAXY_BACKEND=mxaccess"
"OTOPCUA_GALAXY_ZB_CONN=$ZbConnection"
"OTOPCUA_GALAXY_CLIENT_NAME=$GalaxyClientName"
) -join "`0"
$galaxyEnv += "`0`0"
Write-Host "Installing OtOpcUaGalaxyHost..."
& sc.exe create OtOpcUaGalaxyHost binPath= "`"$InstallRoot\Galaxy\OtOpcUa.Driver.Galaxy.Host.exe`"" `
DisplayName= 'OtOpcUa Galaxy Host (out-of-process MXAccess)' `
start= auto `
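obj= $ServiceAccount | Out-Null
# sc.exe is a native command, so $ErrorActionPreference does not catch its failures.
if ($LASTEXITCODE -ne 0) { throw "sc.exe create OtOpcUaGalaxyHost failed with exit code $LASTEXITCODE" }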
# Set per-service environment variables via the registry — sc.exe doesn't expose them directly.
$svcKey = "HKLM:\SYSTEM\CurrentControlSet\Services\OtOpcUaGalaxyHost"
$envValue = $galaxyEnv.Split("`0") | Where-Object { $_ -ne '' }
Set-ItemProperty -Path $svcKey -Name 'Environment' -Type MultiString -Value $envValue
# --- Install OtOpcUa (depends on Galaxy host being installed; doesn't strictly require it
# started — OtOpcUa.Server NodeBootstrap retries on the IPC connect path).
Write-Host "Installing OtOpcUa..."
& sc.exe create OtOpcUa binPath= "`"$InstallRoot\OtOpcUa.Server.exe`"" `
DisplayName= 'OtOpcUa Server' `
start= auto `
depend= 'OtOpcUaGalaxyHost' `
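obj= $ServiceAccount | Out-Null
# sc.exe is a native command, so $ErrorActionPreference does not catch its failures.
if ($LASTEXITCODE -ne 0) { throw "sc.exe create OtOpcUa failed with exit code $LASTEXITCODE" }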
Write-Host ""
Write-Host "Installed. Start with:"
Write-Host " sc.exe start OtOpcUaGalaxyHost"
Write-Host " sc.exe start OtOpcUa"
Write-Host ""
Write-Host "Galaxy shared secret (record this offline — required for service rebinding):"
Write-Host " $GalaxySharedSecret"


@@ -0,0 +1,18 @@
<#
.SYNOPSIS
Stops + removes the two v2 services. Mirrors Install-Services.ps1.
#>
[CmdletBinding()] param()
$ErrorActionPreference = 'Continue'
foreach ($svc in 'OtOpcUa', 'OtOpcUaGalaxyHost') {
if (Get-Service $svc -ErrorAction SilentlyContinue) {
Write-Host "Stopping $svc..."
Stop-Service $svc -Force -ErrorAction SilentlyContinue
Write-Host "Removing $svc..."
& sc.exe delete $svc | Out-Null
} else {
Write-Host "$svc not installed — skipping"
}
}
Write-Host "Done."


@@ -0,0 +1,107 @@
<#
.SYNOPSIS
Translates a v1 OtOpcUa.Host appsettings.json into a v2 DriverInstance.DriverConfig JSON
blob suitable for upserting into the central Configuration DB.
.DESCRIPTION
Phase 2 Stream D.3 — moves the legacy MxAccess + GalaxyRepository + Historian sections out
of node-local appsettings.json and into the central DB so each node only needs Cluster.NodeId
+ ClusterId + DB conn (per decision #18). Idempotent + dry-run-able.
Output shape matches the Galaxy DriverType schema in `docs/v2/plan.md` §"Galaxy DriverConfig":
{
"MxAccess": { "ClientName": "...", "RequestTimeoutSeconds": 30 },
"Database": { "ConnectionString": "...", "PollIntervalSeconds": 60 },
"Historian": { "Enabled": false }
}
.PARAMETER AppSettingsPath
Path to the v1 appsettings.json. Defaults to ../../src/ZB.MOM.WW.OtOpcUa.Host/appsettings.json
relative to the script.
.PARAMETER OutputPath
Where to write the generated DriverConfig JSON. Defaults to stdout.
.PARAMETER DryRun
Print what would be written without writing.
.EXAMPLE
pwsh ./Migrate-AppSettings-To-DriverConfig.ps1 -AppSettingsPath C:\OtOpcUa\appsettings.json -OutputPath C:\tmp\galaxy-driverconfig.json
#>
[CmdletBinding()]
param(
[string]$AppSettingsPath,
[string]$OutputPath,
[switch]$DryRun
)
$ErrorActionPreference = 'Stop'
if (-not $AppSettingsPath) {
$AppSettingsPath = Join-Path (Split-Path -Parent $PSScriptRoot) '..\src\ZB.MOM.WW.OtOpcUa.Host\appsettings.json'
}
if (-not (Test-Path $AppSettingsPath)) {
Write-Error "AppSettings file not found: $AppSettingsPath"
exit 1
}
$src = Get-Content -Raw $AppSettingsPath | ConvertFrom-Json
$mx = $src.MxAccess
$gr = $src.GalaxyRepository
$hi = $src.Historian
$driverConfig = [ordered]@{
MxAccess = [ordered]@{
ClientName = $mx.ClientName
NodeName = $mx.NodeName
GalaxyName = $mx.GalaxyName
RequestTimeoutSeconds = $mx.ReadTimeoutSeconds
WriteTimeoutSeconds = $mx.WriteTimeoutSeconds
MaxConcurrentOps = $mx.MaxConcurrentOperations
MonitorIntervalSec = $mx.MonitorIntervalSeconds
AutoReconnect = $mx.AutoReconnect
ProbeTag = $mx.ProbeTag
}
Database = [ordered]@{
ConnectionString = $gr.ConnectionString
ChangeDetectionIntervalSec = $gr.ChangeDetectionIntervalSeconds
CommandTimeoutSeconds = $gr.CommandTimeoutSeconds
ExtendedAttributes = $gr.ExtendedAttributes
Scope = $gr.Scope
PlatformName = $gr.PlatformName
}
Historian = [ordered]@{
Enabled = if ($null -ne $hi -and $null -ne $hi.Enabled) { $hi.Enabled } else { $false }
}
}
# Strip null-valued leaves so the resulting JSON is compact and round-trippable.
function Remove-Nulls($obj) {
$keys = @($obj.Keys)
foreach ($k in $keys) {
if ($null -eq $obj[$k]) { $obj.Remove($k) | Out-Null }
elseif ($obj[$k] -is [System.Collections.Specialized.OrderedDictionary]) { Remove-Nulls $obj[$k] }
}
}
Remove-Nulls $driverConfig
$json = $driverConfig | ConvertTo-Json -Depth 8
if ($DryRun) {
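# Name the real destination: stdout when no -OutputPath was given.
$target = if ($OutputPath) { $OutputPath } else { 'stdout' }
Write-Host "=== DriverConfig (dry-run, would write to $target) ==="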
Write-Host $json
return
}
if ($OutputPath) {
$dir = Split-Path -Parent $OutputPath
if ($dir -and -not (Test-Path $dir)) { New-Item -ItemType Directory -Path $dir | Out-Null }
Set-Content -Path $OutputPath -Value $json -Encoding UTF8
Write-Host "Wrote DriverConfig to $OutputPath"
}
else {
$json
}


@@ -0,0 +1,130 @@
using System.Diagnostics;
using System.Reflection;
using System.Security.Principal;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Ipc;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests;
/// <summary>
/// The honest cross-FX parity test — spawns the actual <c>OtOpcUa.Driver.Galaxy.Host.exe</c>
/// subprocess (net48 x86), the Proxy connects via real named pipe, exercises Discover
/// against the live Galaxy ZB DB, and asserts gobjects come back. This is the production
/// deployment shape (Tier C: separate process, IPC over named pipe, Proxy in the .NET 10
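/// server process). Returns early (and so reports as passed, not skipped) when not on
/// Windows, when running as Administrator, when the Host EXE isn't built, or when the
/// Galaxy ZB SQL port is unreachable.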
/// </summary>
[Trait("Category", "ProcessSpawnParity")]
public sealed class HostSubprocessParityTests : IDisposable
{
private Process? _hostProcess;
public void Dispose()
{
if (_hostProcess is not null && !_hostProcess.HasExited)
{
try { _hostProcess.Kill(entireProcessTree: true); } catch { /* ignore */ }
try { _hostProcess.WaitForExit(5_000); } catch { /* ignore */ }
}
_hostProcess?.Dispose();
}
private static string? FindHostExe()
{
// The test assembly lives at tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/bin/Debug/net10.0/.
// The Host EXE lives at src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/.
var asmDir = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location)!;
var solutionRoot = asmDir;
for (var i = 0; i < 8 && solutionRoot is not null; i++)
{
if (File.Exists(Path.Combine(solutionRoot, "ZB.MOM.WW.OtOpcUa.slnx")))
break;
solutionRoot = Path.GetDirectoryName(solutionRoot);
}
if (solutionRoot is null) return null;
var candidate = Path.Combine(solutionRoot,
"src", "ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host", "bin", "Debug", "net48",
"OtOpcUa.Driver.Galaxy.Host.exe");
return File.Exists(candidate) ? candidate : null;
}
private static bool IsAdministrator()
{
if (!OperatingSystem.IsWindows()) return false;
using var identity = WindowsIdentity.GetCurrent();
return new WindowsPrincipal(identity).IsInRole(WindowsBuiltInRole.Administrator);
}
private static async Task<bool> ZbReachableAsync()
{
try
{
using var client = new System.Net.Sockets.TcpClient();
var task = client.ConnectAsync("localhost", 1433);
return await Task.WhenAny(task, Task.Delay(1_500)) == task && client.Connected;
}
catch { return false; }
}
[Fact]
public async Task Spawned_Host_in_db_mode_lets_Proxy_Discover_real_Galaxy_gobjects()
{
if (!OperatingSystem.IsWindows() || IsAdministrator()) return;
if (!await ZbReachableAsync()) return;
var hostExe = FindHostExe();
if (hostExe is null) return; // skip when the Host hasn't been built
using var identity = WindowsIdentity.GetCurrent();
var sid = identity.User!;
var pipeName = $"OtOpcUaGalaxyParity-{Guid.NewGuid():N}";
const string secret = "parity-secret";
var psi = new ProcessStartInfo(hostExe)
{
UseShellExecute = false,
CreateNoWindow = true,
RedirectStandardOutput = true,
RedirectStandardError = true,
EnvironmentVariables =
{
["OTOPCUA_GALAXY_PIPE"] = pipeName,
["OTOPCUA_ALLOWED_SID"] = sid.Value,
["OTOPCUA_GALAXY_SECRET"] = secret,
["OTOPCUA_GALAXY_BACKEND"] = "db", // SQL-only — doesn't need MXAccess
["OTOPCUA_GALAXY_ZB_CONN"] = "Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;",
},
};
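_hostProcess = Process.Start(psi)
?? throw new InvalidOperationException("Failed to spawn Galaxy.Host");
// Drain the redirected streams so the Host cannot block on a full stdout/stderr pipe buffer.
_hostProcess.BeginOutputReadLine();
_hostProcess.BeginErrorReadLine();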
// Wait for the pipe to come up — the Host's PipeServer takes ~100ms to bind.
await Task.Delay(2_000);
await using var client = await GalaxyIpcClient.ConnectAsync(
pipeName, secret, TimeSpan.FromSeconds(5), CancellationToken.None);
var sessionResp = await client.CallAsync<OpenSessionRequest, OpenSessionResponse>(
MessageKind.OpenSessionRequest,
new OpenSessionRequest { DriverInstanceId = "parity", DriverConfigJson = "{}" },
MessageKind.OpenSessionResponse,
CancellationToken.None);
sessionResp.Success.ShouldBeTrue(sessionResp.Error);
var discoverResp = await client.CallAsync<DiscoverHierarchyRequest, DiscoverHierarchyResponse>(
MessageKind.DiscoverHierarchyRequest,
new DiscoverHierarchyRequest { SessionId = sessionResp.SessionId },
MessageKind.DiscoverHierarchyResponse,
CancellationToken.None);
discoverResp.Success.ShouldBeTrue(discoverResp.Error);
discoverResp.Objects.Length.ShouldBeGreaterThan(0,
"live Galaxy ZB has at least one deployed gobject");
await client.SendOneWayAsync(MessageKind.CloseSessionRequest,
new CloseSessionRequest { SessionId = sessionResp.SessionId }, CancellationToken.None);
}
}