lmxopcua/docs/plans/v2-ga-lab-gates-plan.md
Joseph Doherty 16a87b08f3 docs: add four planning runbooks for Phase 6.3 interop, v2 GA gates, live-hardware validation, and alarms worker wiring
Produces docs/plans/ entries for tasks #13, #15, #16, and #17-#20:
- phase-6-3-redundancy-interop-plan.md: automation boundary analysis,
  concrete test matrix (A/B/C blocks), and a step-by-step cutover
  runbook for the deferred Stream F client interop work
- v2-ga-lab-gates-plan.md: exact gate list with command, pass criterion,
  and owner for each of the nine v2 GA exit criteria
- live-hardware-validation-runbooks.md: one runbook per driver (FOCAS
  CNC smoke #54, AB CIP live-boot, TwinCAT wire-live) with preconditions,
  procedure, expected results, and recording template
- alarms-worker-wiring-plan.md: focused plan for A.2/A.3-A.4/C.1/D.1
  worker wiring in the mxaccessgw sibling repo, documenting the
  discovered AVEVA API surface, the architectural decision that blocks
  A.2, the dependency order, and what each item needs to unblock

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:53:36 -04:00

v2 GA Lab Gates Plan

Canonical tracker: docs/v2/v2-release-readiness.md. All code-path release blockers are closed as of 2026-04-24. This document maps the remaining exit criteria from that tracker to concrete commands, automation boundaries, operator procedures, and pass criteria.

Status: RELEASE-READY (code-path). Manual/lab gates remain open.

The gate list

From docs/v2/v2-release-readiness.md §"Release-readiness exit criteria":

| # | Gate | Kind | Automatable here |
|---|------|------|------------------|
| G1 | All four Phase 6.N compliance scripts exit 0 | Script | Yes (run on this box) |
| G2 | dotnet test ZB.MOM.WW.OtOpcUa.slnx passes with <= 1 known-flake failure | Script | Yes (run on this box) |
| G3 | Release blockers closed | Audit | Already closed (code-path) |
| G4 | Phase 5 driver complement shipped | Audit | Already closed |
| G5 | Production deployment checklist signed off by Fleet Admin | Operator | No (separate doc, human signoff) |
| G6 | At least one end-to-end integration run against live Galaxy succeeds | Dev rig | No (requires AVEVA platform) |
| G7 | FOCAS live-CNC wire-level smoke (#54) passes against a real FANUC control | Lab hardware | No (requires FANUC CNC) |
| G8 | OPC UA CTT / UA Compliance Test Tool passes against the live endpoint | Operator tool | No (requires CTT binary + live endpoint) |
| G9 | Non-transparent redundancy cutover validated with >= 1 production client | Lab | No (see docs/plans/phase-6-3-redundancy-interop-plan.md) |

G1 — Phase 6 compliance scripts

Command

pwsh ./scripts/compliance/phase-6-all.ps1

This meta-runner at scripts/compliance/phase-6-all.ps1 invokes each sub-script in a separate powershell.exe process to isolate exit codes:

| Sub-script | Phase | What it checks |
|------------|-------|----------------|
| phase-6-1-compliance.ps1 | 6.1 Resilience & Observability | Polly resilience classes, health endpoints, LiteDB sealed cache, observability sinks |
| phase-6-2-compliance.ps1 | 6.2 Authorization runtime | AuthorizationGate, TriePermissionEvaluator, NodeScopeResolver, dispatch wiring in DriverNodeManager |
| phase-6-3-compliance.ps1 | 6.3 Redundancy runtime | ServiceLevelCalculator 8-state band values, RecoveryStateManager, ApplyLeaseRegistry, ServerRedundancyNodeWriter; also invokes dotnet test with a baseline of 1097 |
| phase-6-4-compliance.ps1 | 6.4 Admin UI completion | Data-layer types, Identification folder, deferred Blazor items marked [DEFERRED] |

Pass criterion

Phase 6 aggregate: PASS

Exit code 0. Any [FAIL] line is a blocker. [DEFERRED] lines are expected for the known-deferred surfaces listed in the implementation docs; they do not fail the run.
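
The pass criterion above could be checked mechanically along the following lines. This is a minimal sketch, assuming the meta-runner writes its `[FAIL]` and `[DEFERRED]` markers to standard output as described; the function name `Test-ComplianceRun` is hypothetical and not part of the repo.

```powershell
# Hypothetical helper: evaluates captured meta-runner output against the G1 criterion.
function Test-ComplianceRun {
    param(
        [string[]] $OutputLines,   # captured output of phase-6-all.ps1
        [int]      $ExitCode       # $LASTEXITCODE after the run
    )
    # -SimpleMatch treats the bracketed marker as a literal string, not a regex class.
    $fails    = @($OutputLines | Select-String -SimpleMatch '[FAIL]')
    $deferred = @($OutputLines | Select-String -SimpleMatch '[DEFERRED]')
    [pscustomobject]@{
        Passed        = ($ExitCode -eq 0) -and ($fails.Count -eq 0)
        FailCount     = $fails.Count
        DeferredCount = $deferred.Count   # informational only; does not fail the run
    }
}

# Usage from the repo root:
# $out = pwsh ./scripts/compliance/phase-6-all.ps1 2>&1 | ForEach-Object { "$_" }
# Test-ComplianceRun -OutputLines $out -ExitCode $LASTEXITCODE
```

Keeping the evaluation separate from the run makes it possible to re-check a saved log without re-running the scripts.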

Prerequisites

  • SQL Server 10.100.0.35,14330 reachable (Config DB tests use it).
  • dotnet SDK on PATH (.NET 10).
  • Run from repo root.

G2 — Full solution test suite

Command

dotnet test ZB.MOM.WW.OtOpcUa.slnx --logger "console;verbosity=minimal"

The integration suites only pass when their fixtures are running. Bring the relevant fixture up first, then run the same command:

# bring modbus fixture up first
lmxopcua-fix up modbus standard

dotnet test ZB.MOM.WW.OtOpcUa.slnx --logger "console;verbosity=minimal"

Pass criterion

  • Passed count >= 1159 (2026-04-19 baseline after Phase 5 driver complement).
  • Failed count <= 1 (the pre-existing SubscribeCommandTests.Execute_PrintsSubscriptionMessage flake in Client.CLI is the only tolerated failure).
  • No new [FAILED] tests relative to the baseline.
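
The three thresholds above can be expressed as a small check. This is a sketch only: the function name `Test-G2Result` is hypothetical, and the passed count and failed-test names have to be taken from the run by the operator (for example from a `--logger trx` results file).

```powershell
# Hypothetical helper: applies the G2 thresholds to counts taken from a test run.
function Test-G2Result {
    param(
        [int]      $PassedCount,
        [string[]] $FailedTests = @(),
        [int]      $Baseline    = 1159,   # 2026-04-19 baseline from this plan
        [string]   $KnownFlake  = 'SubscribeCommandTests.Execute_PrintsSubscriptionMessage'
    )
    # Any failure other than the tolerated flake counts as a new failure.
    $unexpected = @($FailedTests | Where-Object { $_ -notlike "*$KnownFlake" })
    [pscustomobject]@{
        Passed     = ($PassedCount -ge $Baseline) -and
                     ($FailedTests.Count -le 1) -and
                     ($unexpected.Count -eq 0)
        Unexpected = $unexpected
    }
}

# Usage:
# Test-G2Result -PassedCount 1160 -FailedTests @()
```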

Known flake

ZB.MOM.WW.OtOpcUa.Client.CLI.Tests::SubscribeCommandTests.Execute_PrintsSubscriptionMessage is a timing-sensitive subscribe-then-cancel test. Rerun the specific project if it appears:

# dotnet test has no built-in repeat option, so run the filtered test three times
1..3 | ForEach-Object {
    dotnet test tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests `
        --filter "FullyQualifiedName~SubscribeCommandTests.Execute_PrintsSubscriptionMessage"
}

If it fails all three runs, investigate; otherwise treat as flake.

Docker fixtures needed for integration suites

| Driver | Command | Endpoint used |
|--------|---------|---------------|
| Modbus | lmxopcua-fix up modbus standard | 10.100.0.35:5020 |
| AB CIP | lmxopcua-fix up abcip controllogix | 10.100.0.35:44818 |
| S7 | lmxopcua-fix up s7 s7_1500 | 10.100.0.35:1102 |
| OPC UA Client | lmxopcua-fix up opcuaclient | opc.tcp://10.100.0.35:50000 |
| FOCAS | lmxopcua-fix up focas (mock server) | 10.100.0.35:8193 |
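
Before starting an integration run it can help to confirm that each fixture port actually accepts connections. The following is a minimal sketch using the .NET `TcpClient` class; the function name `Test-TcpEndpoint` and the `$fixturePorts` variable are hypothetical, and a successful TCP connect only shows that something is listening, not that the fixture is healthy.

```powershell
# Hypothetical pre-flight check: confirms that a TCP port accepts connections.
function Test-TcpEndpoint {
    param(
        [string] $HostName,
        [int]    $Port,
        [int]    $TimeoutMs = 2000
    )
    $client = [System.Net.Sockets.TcpClient]::new()
    try {
        $task = $client.ConnectAsync($HostName, $Port)
        # Wait returns $false on timeout; a refused connection throws instead.
        return $task.Wait($TimeoutMs) -and $client.Connected
    }
    catch {
        return $false
    }
    finally {
        $client.Dispose()
    }
}

# Ports copied from the fixture table above (all on 10.100.0.35).
$fixturePorts = [ordered]@{
    'Modbus' = 5020; 'AB CIP' = 44818; 'S7' = 1102; 'OPC UA Client' = 50000; 'FOCAS' = 8193
}

# Usage:
# foreach ($name in $fixturePorts.Keys) {
#     '{0,-14} {1}' -f $name, (Test-TcpEndpoint -HostName '10.100.0.35' -Port $fixturePorts[$name])
# }
```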

TwinCAT integration tests require the TCBSD ESXi VM at 10.100.0.128 (AmsNetId 41.169.163.43.1.1). Set these environment variables before running:

$env:TWINCAT_TARGET_HOST   = "10.100.0.128"
$env:TWINCAT_TARGET_NETID  = "41.169.163.43.1.1"
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests

Galaxy integration tests run against the live mxaccessgw on the dev box (gate G6).


G3 — Release blockers closed (audit, already satisfied)

All three code-path release blockers are closed per v2-release-readiness.md:

  • Authorization dispatch wiring (task #143, PR #94) — CLOSED.
  • Config fallback Phase 6.1 Stream D (task #136, PR #96) — CLOSED.
  • Redundancy Phase 6.3 Streams A/C core (tasks #145/#147, PRs #98-99) — CLOSED.

No action required. Record the PR numbers in the release notes.


G4 — Driver complement (audit, already satisfied)

All eight drivers shipped:

Galaxy, Modbus (+ DL205/S7/MELSEC profiles), S7 native, OPC UA Client, AB CIP, AB Legacy, TwinCAT ADS, FOCAS (managed wire client — Tier-C isolation retired, FOCAS is now Tier A in-process via WireFocasClient).

No action required.


G5 — Production deployment checklist (operator action)

The deployment checklist is a separate document covering:

  • Windows service install (scripts/install/Install-Services.ps1)
  • Config DB migration (scripts/db/Apply-Migrations.ps1)
  • Certificate provisioning and trust
  • LDAP / GLAuth configuration for production AD target
  • mxaccessgw API key provisioning (apikey create-key in the sibling repo)
  • Service account permissions
  • Prometheus / OpenTelemetry export configuration
  • Firewall rules (port 4840 OPC UA, port 5120 gRPC to mxaccessgw, Admin port 5000/5001)

Sign-off party: Fleet Admin (operator). Not automatable.

Record sign-off as a comment on the v2 release issue.


G6 — Live Galaxy end-to-end integration run

Requires: AVEVA System Platform installed on dev box (confirmed available per project memory project_aveva_platform_installed.md); mxaccessgw running with a provisioned API key; at least one Galaxy object deployed.

Procedure

  1. Start mxaccessgw:

    # in sibling repo C:\Users\dohertj2\Desktop\mxaccessgw\
    dotnet run --project src/MxGateway.Server -- --apikey-path .local/api-key.txt
    
  2. Start OtOpcUa server with Galaxy driver instance configured:

    # use sc.exe explicitly: in Windows PowerShell 5.1, "sc" is an alias for Set-Content
    sc.exe start OtOpcUa
    
  3. Browse via Client CLI:

    dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
        browse -u opc.tcp://localhost:4840 -r -d 3
    
  4. Read a known Galaxy tag (e.g. a deployed $UserDefined object attribute):

    dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
        read -u opc.tcp://localhost:4840 -n "ns=2;s=<tag_name.AttributeName>"
    
  5. Subscribe and verify live updates:

    dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
        subscribe -u opc.tcp://localhost:4840 -n "ns=2;s=<tag_name.AttributeName>" -i 1000
    

Pass criterion

  • Browse returns a non-empty node tree mirroring the Galaxy hierarchy.
  • Read returns Good quality with a non-null value.
  • Subscribe receives at least one data-change notification within 5 s (or within the configured publishing interval).
  • No BadNoCommunication or BadTimeout errors in the server log.

Record: Galaxy version, deployed object count, OtOpcUa git SHA.
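
The log criterion above could be checked with a simple scan. This is a sketch under assumptions: the function name `Find-BadStatusLines` is hypothetical, and the server log location is not stated in this plan, so the path in the usage comment is a placeholder.

```powershell
# Hypothetical helper: returns server log lines that would fail the G6 log criterion.
function Find-BadStatusLines {
    param(
        [string[]] $LogLines,
        [string[]] $StatusCodes = @('BadNoCommunication', 'BadTimeout')
    )
    # -SimpleMatch treats each status code as a literal substring, not a regex.
    @($LogLines | Select-String -SimpleMatch -Pattern $StatusCodes |
        ForEach-Object { $_.Line })
}

# Usage (placeholder path; substitute the server's actual log file):
# $hits = @(Find-BadStatusLines -LogLines (Get-Content 'C:\path\to\server.log'))
# if ($hits.Count -gt 0) { $hits; throw 'G6 log criterion failed' }
```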


G7 — FOCAS live-CNC smoke (task #54)

Requires: real FANUC CNC with Ethernet option, accessible on TCP port 8193 from the dev box; CNC series known (e.g. 0i-F, 30i-B).

See docs/plans/live-hardware-validation-runbooks.md §FOCAS for the full runbook.

Pass criterion

  • WireFocasClient opens a FOCAS2 session (cnc_allclibhndl3 succeeds).
  • Identity nodes (Identity/SeriesNumber, Identity/MaxAxes) return non-null values matching the physical control panel display.
  • At least one axis position (Axes/X/AbsolutePosition or similar) returns Good quality with a plausible double value.
  • Subscribe on a polled tag delivers at least three updates within 5 s.
  • No EW_SOCKET (-16) or EW_HANDLE (-8) errors in the server log during a 2-minute soak.

Record: CNC series, firmware version, test date, OtOpcUa git SHA.
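
The subscription criterion above (at least three updates within 5 s) can be evaluated from recorded notification timestamps. A minimal sketch, assuming the window starts when the subscription is created; the function name `Test-UpdateRate` is hypothetical, and how the timestamps are captured from the client is left to the operator.

```powershell
# Hypothetical helper: counts notifications that arrived inside the window
# starting at the moment the subscription was created.
function Test-UpdateRate {
    param(
        [datetime]   $SubscribedAt,
        [datetime[]] $Timestamps,          # arrival time of each data-change notification
        [int]        $MinUpdates    = 3,
        [double]     $WindowSeconds = 5
    )
    $deadline = $SubscribedAt.AddSeconds($WindowSeconds)
    $inWindow = @($Timestamps | Where-Object { $_ -ge $SubscribedAt -and $_ -le $deadline })
    return $inWindow.Count -ge $MinUpdates
}

# Usage:
# Test-UpdateRate -SubscribedAt $start -Timestamps $arrivalTimes
```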


G8 — OPC UA Compliance Test Tool (CTT) pass

Requires: OPC Foundation OPC UA Compliance Test Tool (CTT) or the open-source UA Compliance Test Tool installed on the client machine; live OtOpcUa server endpoint.

Run the following profile set:

  • Attribute Read
  • Attribute Write
  • Browse
  • Subscription (DataChange)
  • Server-side monitoring
  • Security — None profile (if server configured with Security:Profiles=[None])

Procedure

  1. Launch CTT. Add server endpoint: opc.tcp://localhost:4840.
  2. Run the profile set above.
  3. Capture the CTT report HTML/XML.

Pass criterion

All mandatory test cases in each profile: PASS or NOT APPLICABLE.

Zero mandatory failures. Advisory failures may be documented with rationale (e.g. optional capability not implemented).

Record: CTT version, profile set, OtOpcUa git SHA, report artifact.
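
The pass rule above (mandatory cases must be PASS or NOT APPLICABLE; advisory failures are documented, not blocking) could be applied to exported results as follows. This is a sketch only: the function name `Test-CttResults` is hypothetical, and each record is assumed to carry `Name`, `Mandatory`, and `Result` properties. The real CTT report schema is not described in this plan, so mapping the report to these records is left to the operator.

```powershell
# Hypothetical helper: applies the G8 rule to a list of test-case result records.
# Assumed record shape: Name (string), Mandatory (bool), Result (string).
function Test-CttResults {
    param([object[]] $Results)
    $accepted = @('PASS', 'NOT APPLICABLE')
    $mandatoryFailures = @($Results | Where-Object {
        $_.Mandatory -and ($_.Result -notin $accepted) })
    $advisoryFailures = @($Results | Where-Object {
        (-not $_.Mandatory) -and ($_.Result -notin $accepted) })
    [pscustomobject]@{
        Passed            = $mandatoryFailures.Count -eq 0
        MandatoryFailures = $mandatoryFailures
        AdvisoryFailures  = $advisoryFailures   # document each with a rationale
    }
}
```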


G9 — Non-transparent redundancy cutover with production client

See docs/plans/phase-6-3-redundancy-interop-plan.md for the full runbook.

Minimum acceptable result: one complete pass of the A-block (UaExpert OPC UA signal verification) plus scenario B2 (UaExpert failover on Primary kill).

Ignition 8.3 is the recommended production client per decision #85. If Ignition is not available on the lab machine, UaExpert is accepted for v2 GA.

Record: client name + version, OtOpcUa git SHA, test date.
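
Each manual gate above asks for the OtOpcUa git SHA and a few gate-specific facts to be recorded. One possible way to keep those records uniform is sketched below; the function name `New-GateRecord` and the record layout are hypothetical, and the only external call is `git rev-parse HEAD`, which is used as the default SHA when none is supplied.

```powershell
# Hypothetical helper: builds the evidence record that each manual gate asks for.
function New-GateRecord {
    param(
        [string]    $Gate,                           # e.g. 'G9'
        [hashtable] $Details  = @{},                 # gate-specific fields (client, CNC series, ...)
        [string]    $GitSha   = (git rev-parse HEAD),
        [datetime]  $TestDate = (Get-Date)
    )
    $record = [ordered]@{
        Gate     = $Gate
        GitSha   = $GitSha
        TestDate = $TestDate.ToString('yyyy-MM-dd')
    }
    foreach ($key in $Details.Keys) { $record[$key] = $Details[$key] }
    [pscustomobject]$record
}

# Usage, run from the OtOpcUa repo root so that git resolves the right SHA:
# New-GateRecord -Gate 'G9' -Details @{ Client = 'UaExpert' } | ConvertTo-Json
```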


Gate summary table

| Gate | Command / Procedure | Pass criterion | Owner |
|------|---------------------|----------------|-------|
| G1 | pwsh ./scripts/compliance/phase-6-all.ps1 | Exit 0, no [FAIL] | Dev |
| G2 | dotnet test ZB.MOM.WW.OtOpcUa.slnx | >= 1159 passing, <= 1 failure | Dev |
| G3 | Audit PR list in release-readiness.md | All blockers show CLOSED | Dev |
| G4 | Audit driver table | All 8 drivers listed as shipped | Dev |
| G5 | Run deployment checklist doc | All items checked; Fleet Admin signs off | Fleet Admin |
| G6 | Browse/read/subscribe against live Galaxy | Good quality, non-empty tree | Dev (dev box) |
| G7 | FOCAS CNC smoke (see live-hardware runbook) | Session open, Good quality reads | Dev + lab hardware |
| G8 | CTT profile run against live endpoint | Zero mandatory failures | Dev + CTT tool |
| G9 | Redundancy cutover runbook | A-block + B2 pass with >= 1 client | Dev + two instances |