docs: add four planning runbooks for Phase 6.3 interop, v2 GA gates, live-hardware validation, and alarms worker wiring
Produces docs/plans/ entries for tasks #13, #15, #16, and #17-#20:

- phase-6-3-redundancy-interop-plan.md: automation boundary analysis, concrete test matrix (A/B/C blocks), and a step-by-step cutover runbook for the deferred Stream F client interop work
- v2-ga-lab-gates-plan.md: exact gate list with command, pass criterion, and owner for each of the nine v2 GA exit criteria
- live-hardware-validation-runbooks.md: one runbook per driver (FOCAS CNC smoke #54, AB CIP live-boot, TwinCAT wire-live) with preconditions, procedure, expected results, and recording template
- alarms-worker-wiring-plan.md: focused plan for A.2/A.3-A.4/C.1/D.1 worker wiring in the mxaccessgw sibling repo, documenting the discovered AVEVA API surface, the architectural decision that blocks A.2, the dependency order, and what each item needs to unblock

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
307
docs/plans/v2-ga-lab-gates-plan.md
Normal file
@@ -0,0 +1,307 @@
# v2 GA Lab Gates Plan

> **Canonical tracker**: `docs/v2/v2-release-readiness.md` — all code-path
> release blockers are closed as of 2026-04-24. This document maps the
> remaining exit-criteria from that tracker to concrete commands, automation
> boundaries, operator procedures, and pass criteria.
>
> **Status**: RELEASE-READY (code-path). Manual/lab gates remain open.

## The gate list

From `docs/v2/v2-release-readiness.md` §"Release-readiness exit criteria":

| # | Gate | Kind | Automatable here |
|---|------|------|-----------------|
| G1 | All four Phase 6.N compliance scripts exit 0 | Script | Yes — run on this box |
| G2 | `dotnet test ZB.MOM.WW.OtOpcUa.slnx` passes with <= 1 known-flake failure | Script | Yes — run on this box |
| G3 | Release blockers closed | Audit | Already closed (code-path) |
| G4 | Phase 5 driver complement shipped | Audit | Already closed |
| G5 | Production deployment checklist signed off by Fleet Admin | Operator | No — separate doc, human signoff |
| G6 | At least one end-to-end integration run against live Galaxy succeeds | Dev rig | No — requires AVEVA platform |
| G7 | FOCAS live-CNC wire-level smoke (#54) passes against a real FANUC control | Lab hardware | No — requires FANUC CNC |
| G8 | OPC UA CTT / UA Compliance Test Tool passes against the live endpoint | Operator tool | No — requires CTT binary + live endpoint |
| G9 | Non-transparent redundancy cutover validated with >= 1 production client | Lab | No — see `docs/plans/phase-6-3-redundancy-interop-plan.md` |

---

## G1 — Phase 6 compliance scripts

### Command

```powershell
pwsh ./scripts/compliance/phase-6-all.ps1
```

This meta-runner at `scripts/compliance/phase-6-all.ps1` invokes each
sub-script in a separate `powershell.exe` process to isolate exit codes:

| Sub-script | Phase | What it checks |
|-----------|-------|---------------|
| `phase-6-1-compliance.ps1` | 6.1 Resilience & Observability | Polly resilience classes, health endpoints, LiteDB sealed cache, observability sinks |
| `phase-6-2-compliance.ps1` | 6.2 Authorization runtime | `AuthorizationGate`, `TriePermissionEvaluator`, `NodeScopeResolver`, dispatch wiring in `DriverNodeManager` |
| `phase-6-3-compliance.ps1` | 6.3 Redundancy runtime | `ServiceLevelCalculator` 8-state band values, `RecoveryStateManager`, `ApplyLeaseRegistry`, `ServerRedundancyNodeWriter`; also invokes `dotnet test` with a baseline of 1097 |
| `phase-6-4-compliance.ps1` | 6.4 Admin UI completion | Data-layer types, Identification folder, deferred Blazor items marked `[DEFERRED]` |

### Pass criterion

```
Phase 6 aggregate: PASS
```

Exit code 0. Any `[FAIL]` line is a blocker. `[DEFERRED]` lines are expected
for the known-deferred surfaces listed in the implementation docs; they do not
fail the run.
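
The pass criterion above can be checked mechanically once the runner's output has been captured. The following is a minimal sketch, not something that exists in the repo: `Test-Phase6Output` is a hypothetical helper name, and it assumes the meta-runner writes the aggregate line and any `[FAIL]` markers to standard output exactly as described above.

```powershell
# Hypothetical helper (not part of the repo): evaluates captured meta-runner
# output against the G1 pass criterion.
function Test-Phase6Output {
    param(
        [string[]]$Lines,   # captured stdout lines from phase-6-all.ps1
        [int]$ExitCode      # exit code of the meta-runner process
    )
    # Any [FAIL] line is a blocker.
    $failLines = @($Lines | Where-Object { $_ -cmatch '\[FAIL\]' })
    # The aggregate PASS line must be present.
    $aggregate = @($Lines | Where-Object { $_ -cmatch 'Phase 6 aggregate: PASS' })
    # [DEFERRED] lines are deliberately ignored: they do not fail the run.
    return ($ExitCode -eq 0) -and ($failLines.Count -eq 0) -and ($aggregate.Count -gt 0)
}

# Usage from the repo root (assumes the script writes to stdout):
#   $out = pwsh ./scripts/compliance/phase-6-all.ps1
#   Test-Phase6Output -Lines $out -ExitCode $LASTEXITCODE
```

Keeping the evaluation separate from the invocation means the same check can be replayed against a saved log.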

### Prerequisites

- SQL Server `10.100.0.35,14330` reachable (Config DB tests use it).
- `dotnet` SDK on PATH (`.NET 10`).
- Run from repo root.

---

## G2 — Full solution test suite

### Command

```powershell
dotnet test ZB.MOM.WW.OtOpcUa.slnx --logger "console;verbosity=minimal"
```

To include the integration suites that depend on fixtures, bring the
fixtures up first, then run the same command:

```powershell
# bring modbus fixture up first
lmxopcua-fix up modbus standard

dotnet test ZB.MOM.WW.OtOpcUa.slnx --logger "console;verbosity=minimal"
```

### Pass criterion

- Passed count >= 1159 (2026-04-19 baseline after Phase 5 driver complement).
- Failed count <= 1 (the pre-existing
  `SubscribeCommandTests.Execute_PrintsSubscriptionMessage` flake in
  `Client.CLI` is the only tolerated failure).
- No new `[FAILED]` tests relative to the baseline.
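
A sketch of how the three criteria combine into one decision. It assumes the caller has already collected the passed count and the names of the failed tests (for example from a TRX log); `Test-G2Result` and its parameter names are hypothetical, not something in the repo.

```powershell
# Hypothetical helper: applies the G2 pass criterion to already-collected results.
function Test-G2Result {
    param(
        [int]$PassedCount,
        [string[]]$FailedTests = @(),
        [int]$BaselinePassed = 1159,
        [string]$KnownFlake = 'SubscribeCommandTests.Execute_PrintsSubscriptionMessage'
    )
    # Drop empty entries so a blank line in the input does not count as a failure.
    $failed = @($FailedTests | Where-Object { $_ })
    # Any failure other than the known flake counts as a new failure.
    $unexpected = @($failed | Where-Object { $_ -notlike "*$KnownFlake" })
    return ($PassedCount -ge $BaselinePassed) -and
           ($failed.Count -le 1) -and
           ($unexpected.Count -eq 0)
}

# Usage:
#   Test-G2Result -PassedCount 1162 -FailedTests @()
```

The baseline and flake name are parameters so the check can follow the tracker if either changes.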

### Known flake

`ZB.MOM.WW.OtOpcUa.Client.CLI.Tests::SubscribeCommandTests.Execute_PrintsSubscriptionMessage`
is a timing-sensitive subscribe-then-cancel test. Rerun the specific project
if it appears:

```powershell
# dotnet test has no repeat-count option, so run the filtered test three times
1..3 | ForEach-Object {
    dotnet test tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests `
      --filter "FullyQualifiedName~SubscribeCommandTests.Execute_PrintsSubscriptionMessage"
}
```

If it fails all three runs, investigate; otherwise treat as flake.

### Docker fixtures needed for integration suites

| Driver | Command | Endpoint used |
|--------|---------|---------------|
| Modbus | `lmxopcua-fix up modbus standard` | `10.100.0.35:5020` |
| AB CIP | `lmxopcua-fix up abcip controllogix` | `10.100.0.35:44818` |
| S7 | `lmxopcua-fix up s7 s7_1500` | `10.100.0.35:1102` |
| OPC UA Client | `lmxopcua-fix up opcuaclient` | `opc.tcp://10.100.0.35:50000` |
| FOCAS | `lmxopcua-fix up focas` (mock server) | `10.100.0.35:8193` |
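
Before starting a long test run it can help to confirm that each fixture endpoint in the table accepts TCP connections. This is a rough preflight sketch under stated assumptions: `Test-TcpEndpoint` and `Get-FixtureReachability` are hypothetical helpers, the ports are copied from the table, and an open port only shows that something is listening, not that the fixture is healthy.

```powershell
# Hypothetical helper: $true when a TCP connection opens within the timeout.
function Test-TcpEndpoint {
    param([string]$HostName, [int]$Port, [int]$TimeoutMs = 2000)
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $task = $client.ConnectAsync($HostName, $Port)
        return $task.Wait($TimeoutMs) -and $client.Connected
    } catch {
        return $false   # refused, unreachable, or name resolution failed
    } finally {
        $client.Close()
    }
}

# Hypothetical helper: probes every fixture port listed in the table above.
function Get-FixtureReachability {
    param([string]$HostName = '10.100.0.35')
    $fixtures = [ordered]@{
        'Modbus'        = 5020
        'AB CIP'        = 44818
        'S7'            = 1102
        'OPC UA Client' = 50000
        'FOCAS'         = 8193
    }
    foreach ($name in $fixtures.Keys) {
        [pscustomobject]@{
            Driver    = $name
            Port      = $fixtures[$name]
            Reachable = Test-TcpEndpoint -HostName $HostName -Port $fixtures[$name]
        }
    }
}

# Usage:
#   Get-FixtureReachability | Format-Table
```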

TwinCAT integration tests require the TCBSD ESXi VM at `10.100.0.128`
(AmsNetId `41.169.163.43.1.1`). Set env var before running:

```powershell
$env:TWINCAT_TARGET_HOST = "10.100.0.128"
$env:TWINCAT_TARGET_NETID = "41.169.163.43.1.1"
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests
```

Galaxy integration tests run against the live mxaccessgw on the dev box
(gate G6).

---

## G3 — Release blockers closed (audit, already satisfied)

All three code-path release blockers are closed per `v2-release-readiness.md`:

- Authorization dispatch wiring (task #143, PR #94) — CLOSED.
- Config fallback Phase 6.1 Stream D (task #136, PR #96) — CLOSED.
- Redundancy Phase 6.3 Streams A/C core (tasks #145/#147, PRs #98-99) — CLOSED.

No action required. Record the PR numbers in the release notes.

---

## G4 — Driver complement (audit, already satisfied)

All eight drivers shipped:

Galaxy, Modbus (+ DL205/S7/MELSEC profiles), S7 native, OPC UA Client, AB CIP,
AB Legacy, TwinCAT ADS, FOCAS (managed wire client — Tier-C isolation retired,
FOCAS is now Tier A in-process via `WireFocasClient`).

No action required.

---

## G5 — Production deployment checklist (operator action)

The deployment checklist is a separate document covering:

- Windows service install (`scripts/install/Install-Services.ps1`)
- Config DB migration (`scripts/db/Apply-Migrations.ps1`)
- Certificate provisioning and trust
- LDAP / GLAuth configuration for production AD target
- mxaccessgw API key provisioning (`apikey create-key` in the sibling repo)
- Service account permissions
- Prometheus / OpenTelemetry export configuration
- Firewall rules (port 4840 OPC UA, port 5120 gRPC to mxaccessgw,
  Admin port 5000/5001)

**Sign-off party**: Fleet Admin (operator). Not automatable.

Record sign-off as a comment on the v2 release issue.

---

## G6 — Live Galaxy end-to-end integration run

**Requires**: AVEVA System Platform installed on dev box (confirmed available
per project memory `project_aveva_platform_installed.md`); mxaccessgw running
with a provisioned API key; at least one Galaxy object deployed.

### Procedure

1. Start mxaccessgw:

   ```powershell
   # in sibling repo C:\Users\dohertj2\Desktop\mxaccessgw\
   dotnet run --project src/MxGateway.Server -- --apikey-path .local/api-key.txt
   ```

2. Start OtOpcUa server with Galaxy driver instance configured:

   ```powershell
   # sc.exe, not sc: in Windows PowerShell, sc is an alias for Set-Content
   sc.exe start OtOpcUa
   ```

3. Browse via Client CLI:

   ```powershell
   dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
     browse -u opc.tcp://localhost:4840 -r -d 3
   ```

4. Read a known Galaxy tag (e.g. a deployed `$UserDefined` object attribute):

   ```powershell
   dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
     read -u opc.tcp://localhost:4840 -n "ns=2;s=<tag_name.AttributeName>"
   ```

5. Subscribe and verify live updates:

   ```powershell
   dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
     subscribe -u opc.tcp://localhost:4840 -n "ns=2;s=<tag_name.AttributeName>" -i 1000
   ```

### Pass criterion

- Browse returns a non-empty node tree mirroring the Galaxy hierarchy.
- Read returns `Good` quality with a non-null value.
- Subscribe receives at least one data-change notification within 5 s
  (or within the configured publishing interval).
- No `BadNoCommunication` or `BadTimeout` errors in the server log.
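
The log criterion in the list above can be checked with a simple scan. This is a minimal sketch: `Find-BadStatusLines` is a hypothetical helper, the log path in the usage comment is a placeholder because this plan does not name the server log location, and the match is a plain case-sensitive text search rather than a parse of the log format.

```powershell
# Hypothetical helper: returns the log lines that mention any of the given
# status codes. An empty result satisfies the "no errors in the log" criterion.
function Find-BadStatusLines {
    param(
        [string[]]$LogLines,
        [string[]]$StatusCodes = @('BadNoCommunication', 'BadTimeout')
    )
    # Build one alternation pattern, escaping each code so it is matched literally.
    $pattern = ($StatusCodes | ForEach-Object { [regex]::Escape($_) }) -join '|'
    return @($LogLines | Where-Object { $_ -cmatch $pattern })
}

# Usage (placeholder path; substitute the real server log):
#   $hits = @(Find-BadStatusLines -LogLines (Get-Content 'C:\path\to\server.log'))
#   if ($hits.Count -gt 0) { $hits; throw 'G6 log criterion failed' }
```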

Record: Galaxy version, deployed object count, OtOpcUa git SHA.

---

## G7 — FOCAS live-CNC smoke (task #54)

**Requires**: real FANUC CNC with Ethernet option, accessible on TCP port 8193
from the dev box; CNC series known (e.g. 0i-F, 30i-B).

See `docs/plans/live-hardware-validation-runbooks.md` §FOCAS for the full
runbook.

### Pass criterion

- `WireFocasClient` opens a FOCAS2 session (`cnc_allclibhndl3` succeeds).
- Identity nodes (`Identity/SeriesNumber`, `Identity/MaxAxes`) return non-null
  values matching the physical control panel display.
- At least one axis position (`Axes/X/AbsolutePosition` or similar) returns
  `Good` quality with a plausible double value.
- Subscribe on a polled tag delivers at least three updates within 5 s.
- No `EW_SOCKET` (-16) or `EW_HANDLE` (-8) errors in the server log during a
  2-minute soak.
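
The subscription criterion in the list above reduces to counting timestamps. The sketch below assumes "within 5 s" is measured from the moment the subscription was created, which this plan does not state explicitly; `Test-UpdateRate` and its parameters are hypothetical, and the timestamps would come from whatever the client prints or logs.

```powershell
# Hypothetical helper: $true when at least $MinUpdates notifications arrived
# within $WindowSeconds of the subscription start.
function Test-UpdateRate {
    param(
        [datetime]$SubscribedAt,
        [datetime[]]$UpdateTimes,
        [int]$MinUpdates = 3,
        [double]$WindowSeconds = 5
    )
    $deadline = $SubscribedAt.AddSeconds($WindowSeconds)
    # Keep only the notifications that fall inside the window.
    $inWindow = @($UpdateTimes | Where-Object { $_ -ge $SubscribedAt -and $_ -le $deadline })
    return $inWindow.Count -ge $MinUpdates
}

# Usage:
#   Test-UpdateRate -SubscribedAt $start -UpdateTimes $observedTimestamps
```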

Record: CNC series, firmware version, test date, OtOpcUa git SHA.

---

## G8 — OPC UA Conformance Test Tool (CTT) pass

**Requires**: OPC Foundation OPC UA Compliance Test Tool (CTT) or the
open-source UA Compliance Test Tool installed on the client machine;
live OtOpcUa server endpoint.

### Recommended minimum profile set

- `Attribute Read`
- `Attribute Write`
- `Browse`
- `Subscription` (DataChange)
- `Server-side monitoring`
- `Security — None profile` (if server configured with `Security:Profiles=[None]`)

### Procedure

1. Launch CTT. Add server endpoint: `opc.tcp://localhost:4840`.
2. Run the profile set above.
3. Capture the CTT report HTML/XML.

### Pass criterion

All mandatory test cases in each profile: **PASS** or **NOT APPLICABLE**.

Zero mandatory failures. Advisory failures may be documented with rationale
(e.g. optional capability not implemented).

Record: CTT version, profile set, OtOpcUa git SHA, report artifact.

---

## G9 — Non-transparent redundancy cutover with production client

See `docs/plans/phase-6-3-redundancy-interop-plan.md` for the full runbook.

**Minimum acceptable result**: one complete pass of the A-block (UaExpert
OPC UA signal verification) plus scenario B2 (UaExpert failover on Primary
kill).

Ignition 8.3 is the recommended production client per decision #85. If
Ignition is not available on the lab machine, UaExpert is accepted for v2 GA.

Record: client name + version, OtOpcUa git SHA, test date.

---

## Gate summary table

| Gate | Command / Procedure | Pass criterion | Owner |
|------|---------------------|----------------|-------|
| G1 | `pwsh ./scripts/compliance/phase-6-all.ps1` | Exit 0, no `[FAIL]` | Dev |
| G2 | `dotnet test ZB.MOM.WW.OtOpcUa.slnx` | >= 1159 passing, <= 1 failure | Dev |
| G3 | Audit PR list in release-readiness.md | All blockers show CLOSED | Dev |
| G4 | Audit driver table | All 8 drivers listed as shipped | Dev |
| G5 | Run deployment checklist doc | All items checked; Fleet Admin signs off | Fleet Admin |
| G6 | Browse/read/subscribe against live Galaxy | Good quality, non-empty tree | Dev (dev box) |
| G7 | FOCAS CNC smoke — see live-hardware runbook | Session open, Good quality reads | Dev + lab hardware |
| G8 | CTT profile run against live endpoint | Zero mandatory failures | Dev + CTT tool |
| G9 | Redundancy cutover runbook | A-block + B2 pass with >= 1 client | Dev + two instances |
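
One way to roll the table up into a single release decision is to record a status per gate and list whatever is not yet passed. This is only an illustrative sketch: `Get-OpenGates` is a hypothetical helper, the status strings are an assumed convention, and the values in the example are placeholders rather than recorded results (only G3 and G4 are described in this plan as already satisfied).

```powershell
# Hypothetical roll-up: returns every gate G1..G9 not recorded as 'Pass'.
# A gate missing from the table counts as open.
function Get-OpenGates {
    param([hashtable]$Results)
    $gates = 1..9 | ForEach-Object { "G$_" }
    return @($gates | Where-Object { $Results[$_] -ne 'Pass' })
}

# Placeholder statuses for illustration only.
$results = @{
    G1 = 'Open'; G2 = 'Open'; G3 = 'Pass'; G4 = 'Pass'; G5 = 'Open'
    G6 = 'Open'; G7 = 'Open'; G8 = 'Open'; G9 = 'Open'
}
$open = @(Get-OpenGates -Results $results)
if ($open.Count -eq 0) {
    'v2 GA gates: all passed'
} else {
    "Open gates: $($open -join ', ')"
}
```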