docs: add four planning runbooks for Phase 6.3 interop, v2 GA gates, live-hardware validation, and alarms worker wiring

Produces docs/plans/ entries for tasks #13, #15, #16, and #17-#20:
- phase-6-3-redundancy-interop-plan.md: automation boundary analysis,
  concrete test matrix (A/B/C blocks), and a step-by-step cutover
  runbook for the deferred Stream F client interop work
- v2-ga-lab-gates-plan.md: exact gate list with command, pass criterion,
  and owner for each of the nine v2 GA exit criteria
- live-hardware-validation-runbooks.md: one runbook per driver (FOCAS
  CNC smoke #54, AB CIP live-boot, TwinCAT wire-live) with preconditions,
  procedure, expected results, and recording template
- alarms-worker-wiring-plan.md: focused plan for A.2/A.3-A.4/C.1/D.1
  worker wiring in the mxaccessgw sibling repo, documenting the
  discovered AVEVA API surface, the architectural decision that blocks
  A.2, the dependency order, and what each item needs to unblock

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-18 04:52:07 -04:00
parent da8a3e46f7
commit 16a87b08f3
4 changed files with 1422 additions and 0 deletions


@@ -0,0 +1,497 @@
# Live-Hardware Driver Validation Runbooks
> **Scope**: These runbooks cover the three driver validation tasks that
> require physical hardware or a hardware-equivalent live environment and
> cannot be satisfied by the Docker-based simulator fixtures or unit tests
> alone.
>
> Driver implementation is complete. The runbooks document the preconditions,
> step-by-step procedure, expected results, and how to record the outcome for
> each driver that has an open live-hardware gap.
---
## 1. FANUC FOCAS — Live CNC Smoke (task #54)
### Background
The FOCAS driver (`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/`) uses the
pure-managed `WireFocasClient` that speaks FOCAS2 over TCP directly (no
`Fwlib64.dll`, no P/Invoke). The integration test suite at
`tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/` runs against
the `focas-mock` Python server (PDU-verified against `fwlibe64.dll` upstream)
and covers all call-shapes the driver issues. What the mock cannot cover:
- Series-specific firmware quirks (e.g. 0i-F vs 30i-B parameter range limits)
- Real CNC Ethernet stack behaviour (TCP keep-alive, session-close edge cases)
- Series gating: some driver nodes are conditionally emitted based on
`CncSeries` — only a physical CNC can confirm the suppression works
### Preconditions
| Item | Requirement |
|------|-------------|
| CNC hardware | FANUC CNC with Ethernet option enabled; TCP port 8193 reachable from the dev box or from the host running OtOpcUa |
| CNC series | Any of: 0i-D, 0i-F, 0i-MF, 0i-TF, 16i, 30i-B, 31i, 32i, Power Motion i |
| CNC state | Running state (not E-stop, not alarm) for live axis-data reads |
| Network | TCP reachability from OtOpcUa server host to CNC port 8193 |
| OtOpcUa | Server built and deployed (`dotnet publish` or running via `dotnet run`) |
| Config | DriverInstance row for FOCAS in Config DB (`Type="FOCAS"`, `Backend="wire"`, `Devices[0].HostAddress="focas://<cnc-ip>:8193"`, `Devices[0].Series="<series>"`) |
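
Before inserting the DriverInstance row, it can help to sanity-check the `HostAddress` value. The Python sketch below is a hypothetical pre-flight helper (`parse_focas_host_address` is not part of OtOpcUa and is not the driver's own parser). It assumes the `focas://<host>:<port>` shape shown in the table and deliberately requires an explicit port rather than guessing at a driver default.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_focas_host_address(value: str) -> Tuple[str, int]:
    """Split a focas://<host>:<port> address into (host, port).

    Hypothetical pre-flight check that mirrors the HostAddress format in the
    runbook's Config row; it is not the FOCAS driver's own parser.
    """
    parts = urlsplit(value.strip())
    if parts.scheme != "focas":
        raise ValueError(f"expected the focas:// scheme, got {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("HostAddress has no CNC host")
    # .port raises ValueError by itself for non-numeric or out-of-range ports
    port = parts.port
    if port is None:
        raise ValueError("HostAddress must state the port explicitly (e.g. :8193)")
    return parts.hostname, port
```

For example, `parse_focas_host_address("focas://10.0.0.5:8193")` returns `("10.0.0.5", 8193)`, while a missing scheme, host, or port raises `ValueError`.
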
### Procedure
**Step 1 — Verify TCP reachability**
```powershell
Test-NetConnection -ComputerName <cnc-ip> -Port 8193
```
Pass: `TcpTestSucceeded: True`.
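
`Test-NetConnection` is Windows-only. On a Linux or macOS host, the Python sketch below is a rough equivalent (`tcp_reachable` is a hypothetical helper, not a script in the repo). The same check applies unchanged to the AB CIP port 44818 and the ADS port 48898 later in this document. As with `Test-NetConnection`, success only shows that something accepts TCP on the port; it does not exercise the FOCAS protocol itself.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout subclasses OSError) and DNS failures
        return False
```

Usage: `tcp_reachable("<cnc-ip>", 8193)` should return `True` before moving on to Step 2.
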
**Step 2 — Start OtOpcUa with FOCAS driver configured**
Ensure the Config DB has the DriverInstance row. Start the server:
```powershell
# use sc.exe: in Windows PowerShell, plain `sc` is the Set-Content alias
sc.exe start OtOpcUa
# or for a dev run:
dotnet run --project src/Server/ZB.MOM.WW.OtOpcUa.Server
```
Watch the Serilog log for:
```
[INF] FocasDriver initializing device focas://<cnc-ip>:8193 series=<series>
[INF] FocasDriver device <cnc-ip>:8193 Connected
```
If `EW_SOCKET (-1)` appears, the TCP endpoint is unreachable or the CNC
Ethernet option is not active.
**Step 3 — Browse the address space**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
browse -u opc.tcp://localhost:4840 -r -d 3
```
Expected: a node tree containing at minimum:
```
FOCAS/
  <device>/
    Identity/
      SeriesNumber
      Version
      MaxAxes
    Status/
      RunState
      Mode
      EmergencyStop
    Axes/
      <X|Y|Z>/
        AbsolutePosition
        MachinePosition
```
Nodes suppressed by the `Series` capability gate will be absent — this is
correct behaviour.
**Step 4 — Read identity nodes**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=FOCAS/<device>/Identity/SeriesNumber"
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=FOCAS/<device>/Identity/MaxAxes"
```
Pass: `Good` quality; `SeriesNumber` matches the string printed on the CNC
control panel (e.g. `"0i-F"`); `MaxAxes` is a non-zero integer.
**Step 5 — Read live status and axis data**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=FOCAS/<device>/Status/RunState"
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=FOCAS/<device>/Axes/X/AbsolutePosition"
```
Pass: both return `Good` quality. `AbsolutePosition` is a `Double` (e.g.
`-12.3456` mm). Manually compare against the machine's position display.
**Step 6 — Subscribe and observe polling**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
subscribe -u opc.tcp://localhost:4840 `
-n "ns=2;s=FOCAS/<device>/Status/RunState" -i 500
```
Let it run for 30 s while jogging an axis or changing mode on the CNC operator
panel. Pass: at least one data-change event arrives within 5 s, and further
events arrive at roughly the 500 ms interval while the value keeps changing.
**Step 7 — 2-minute soak**
Let the server run for 2 minutes with the subscription active. Pass: no
`EW_SOCKET`, `EW_HANDLE`, `EW_BUSY` errors in the Serilog output; subscribed
node continues delivering updates.
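
To make the soak check repeatable, scan the Serilog output for the three FOCAS error codes instead of eyeballing it. The sketch assumes the log is available as plain-text lines (a file sink or captured console output); `scan_soak_log` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable

# The FOCAS return codes that fail the soak criterion
FOCAS_SOAK_ERRORS = ("EW_SOCKET", "EW_HANDLE", "EW_BUSY")
_ERROR_RE = re.compile(r"\b(" + "|".join(FOCAS_SOAK_ERRORS) + r")\b")


def scan_soak_log(lines: Iterable[str]) -> Counter:
    """Count occurrences of each soak-failing FOCAS error code in the given log lines."""
    hits: Counter = Counter()
    for line in lines:
        for match in _ERROR_RE.finditer(line):
            hits[match.group(1)] += 1
    return hits
```

Usage: `with open("<serilog-file>") as log: hits = scan_soak_log(log)`. The soak passes when `hits` is empty.
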
**Step 8 — Run the FOCAS e2e script**
```powershell
pwsh scripts/e2e/test-focas.ps1 -ServerUrl opc.tcp://localhost:4840 `
-DriverInstance "<device>" -Series "<series>"
```
Pass: script exits 0.
### Expected results
| Check | Expected |
|-------|----------|
| TCP connect to CNC port 8193 | Success |
| FOCAS session open (`cnc_allclibhndl3`) | EW_OK (0) in driver log |
| `Identity/SeriesNumber` | Matches CNC panel, `Good` quality |
| `Identity/MaxAxes` | Non-zero integer, `Good` quality |
| `Status/RunState` | Integer 0 to 3, `Good` quality |
| `Axes/X/AbsolutePosition` | Double, `Good` quality, matches display |
| Subscribe: events delivered | First event within 5 s; updates continue through the 30 s run |
| 2-minute soak: no FOCAS errors | Clean Serilog log |
### Recording the outcome
```
FOCAS live-CNC smoke — task #54
Date: YYYY-MM-DD
CNC: <manufacturer> <model> series=<series> firmware=<version>
IP: <cnc-ip>:8193
OtOpcUa SHA: <git sha>
TCP connect: PASS
Session open: PASS
Identity reads: PASS SeriesNumber="<series>" MaxAxes=<n>
Status read: PASS RunState=<n>
Axis read: PASS X/AbsolutePosition=<value>
Subscribe: PASS <n> events in 30s
2-min soak: PASS no errors
e2e script: PASS
```
---
## 2. Allen-Bradley CIP — Live Boot (ControlLogix)
### Background
The AB CIP driver (`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip/`) uses
`libplctag` 1.6.x. The Docker `ab_server` simulator covers connectivity and
atomic type reads (7 integration tests). Live-boot validation is needed to
confirm UDT shape-reading, array tag access, and the CIP packing behaviour on
a real ControlLogix backplane — all gaps acknowledged in
`docs/drivers/AbServer-Test-Fixture.md`.
AB CIP live-boot was first verified against a ControlLogix rig in PR #222.
Re-run it before each release.
### Preconditions
| Item | Requirement |
|------|-------------|
| PLC hardware | ControlLogix (preferred) or CompactLogix; firmware 20+ for request packing |
| Network | TCP port 44818 reachable from OtOpcUa server host |
| PLC state | Running; at least one DINT / REAL / BOOL / STRING controller-scoped tag defined |
| OtOpcUa | Server built and deployed |
| Config | DriverInstance row: `Type="AbCip"`, `Host="<plc-ip>"`, `Path="1,0"`, `PlcType="ControlLogix"` |
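
`Path="1,0"` is a libplctag-style CIP route path: comma-separated (port, link) pairs, where `1,0` means port 1 (the backplane), slot 0. When routing out through an Ethernet module, a link can be an IP address instead of a slot number. The hypothetical helper below splits a path into pairs so that a mistyped path (an odd number of elements, say) is caught before the server starts. It is a sketch, not the driver's or libplctag's parser.

```python
from typing import List, Tuple, Union

Link = Union[int, str]


def parse_cip_path(path: str) -> List[Tuple[int, Link]]:
    """Split a CIP route path such as "1,0" into (port, link) pairs."""
    items = [p.strip() for p in path.split(",")]
    if not path.strip() or len(items) % 2 != 0:
        raise ValueError(f"CIP path must contain port,link pairs: {path!r}")
    pairs: List[Tuple[int, Link]] = []
    for port_text, link_text in zip(items[0::2], items[1::2]):
        port = int(port_text)  # raises ValueError if the port is not numeric
        # Slot numbers become ints; anything else (e.g. an IP address) stays text
        link: Link = int(link_text) if link_text.isdigit() else link_text
        pairs.append((port, link))
    return pairs
```

For the table's value, `parse_cip_path("1,0")` returns `[(1, 0)]`: backplane, controller in slot 0.
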
### Procedure
**Step 1 — Verify TCP reachability**
```powershell
Test-NetConnection -ComputerName <plc-ip> -Port 44818
```
Pass: `TcpTestSucceeded: True`.
**Step 2 — Start OtOpcUa and watch driver log**
```powershell
sc.exe start OtOpcUa
```
Look for:
```
[INF] AbCipDriver device <plc-ip> Connected path=1,0 plcType=ControlLogix
```
**Step 3 — Browse the address space**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
browse -u opc.tcp://localhost:4840 -r -d 3
```
Pass: node tree shows the tags defined in the ControlLogix project (controller-
and program-scoped). UDT members appear as child nodes.
**Step 4 — Read atomic tags**
```powershell
# Read a DINT tag
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=AbCip/<device>/<TagName>"
```
Pass: `Good` quality; value type matches the PLC tag type.
**Step 5 — Read a UDT member**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=AbCip/<device>/<UDT>/<MemberName>"
```
Pass: `Good` quality; value matches the live PLC value.
**Step 6 — Write a DINT tag (if in ReadWrite mode)**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
write -u opc.tcp://localhost:4840 `
-n "ns=2;s=AbCip/<device>/<TagName>" -v 42 -t Int32
```
Verify the new value via a subsequent read or on the PLC HMI.
Pass: read back returns 42 with `Good` quality.
**Step 7 — Subscribe to a tag that changes**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
subscribe -u opc.tcp://localhost:4840 `
-n "ns=2;s=AbCip/<device>/<ChangingTag>" -i 500
```
Jog or trigger a value change on the PLC. Pass: events received within 2 s.
**Step 8 — Point the integration tests at the live PLC and confirm parity with the Docker sim**
```powershell
$env:AB_SERVER_ENDPOINT = "<plc-ip>:44818"
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests `
--filter "AbServerFact"
```
Pass: all 7 integration tests pass against the live PLC.
### Expected results
| Check | Expected |
|-------|----------|
| TCP connect | Success |
| Driver log `Connected` | Present, no error |
| Browse | Node tree mirrors PLC tag list |
| Atomic read | `Good` quality, correct type |
| UDT member read | `Good` quality, correct value |
| Write round-trip | Written value reads back |
| Subscribe | Events delivered on value change |
| Integration tests with live PLC | 7/7 pass |
### Recording the outcome
```
AB CIP live-boot
Date: YYYY-MM-DD
PLC: Allen-Bradley <model> firmware=<version>
IP: <plc-ip>:44818 path=1,0
OtOpcUa SHA: <git sha>
TCP connect: PASS
Driver connected: PASS
Browse: PASS <n> tags visible
Atomic read: PASS
UDT read: PASS
Write round-trip: PASS
Subscribe: PASS
Integration tests: 7/7 PASS
```
---
## 3. Beckhoff TwinCAT — Wire-Live Validation
### Background
The TwinCAT driver (`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/`) uses the
Beckhoff `TwinCAT.Ads` .NET SDK v6. The integration test suite at
`tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/`
(`TwinCAT3SmokeTests.cs`) covers 14 `[TwinCATFact]` methods + one 16-case
`[TwinCATTheory]` (30 cases total) against a live ADS runtime. The TCBSD ESXi
VM at `10.100.0.128` (AmsNetId `41.169.163.43.1.1`) is the primary fixture
runtime (project memory `project_tcbsd_fixture.md`) and bypasses the
TwinCAT/Hyper-V conflict on the dev box.
Live-hardware validation extends beyond the TCBSD VM to confirm the driver
works against a production PLC (not just the ESXi test VM) and that the three
defects found during original integration testing do not regress on newer
firmware:
1. Notification cycle time unit (250 ms was being set to ~41 min — fixed).
2. `STRING(N)` / `WSTRING(N)` type mapper (fixed).
3. Bit-indexed BOOL path (fixed).
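
The list above does not spell out the mechanism of the cycle-time bug, but the ~41 min figure is consistent with a unit mix-up by a factor of 10,000. Native ADS notification attributes express cycle time in 100 ns units, so 250 ms is 2,500,000 ticks, and 2,500,000 read as milliseconds is about 41.7 minutes. The arithmetic below is a hedged reconstruction of that reading, useful for recognising the regression signature (a notification that seems never to fire); it is not taken from the driver's code.

```python
TICKS_PER_MS = 10_000  # native ADS cycle times count 100 ns ticks; 1 ms = 10,000 ticks


def ms_to_ads_ticks(cycle_ms: int) -> int:
    """Convert a cycle time in milliseconds to 100 ns ADS ticks."""
    return cycle_ms * TICKS_PER_MS


def minutes_if_ticks_misread_as_ms(cycle_ms: int) -> float:
    """Cycle time, in minutes, if the tick count were mistakenly treated as milliseconds."""
    return ms_to_ads_ticks(cycle_ms) / 1_000 / 60


print(ms_to_ads_ticks(250))                           # 2500000
print(round(minutes_if_ticks_misread_as_ms(250), 1))  # 41.7
```
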
### Preconditions
**TCBSD ESXi fixture (primary — no physical hardware needed)**
| Item | Requirement |
|------|-------------|
| TCBSD VM | Running on ESXi at `10.100.0.128` |
| AMS Net ID | `41.169.163.43.1.1` |
| ADS port | `851` (TwinCAT 3 PLC runtime 1) |
| PLC project | TwinCAT project from `tests/.../TwinCatProject/` loaded and in Run state |
| Network | TCP port 48898 reachable from dev box to `10.100.0.128` |
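
An AMS Net ID is six dot-separated byte values. It often looks like the device's IPv4 address followed by `.1.1`, but it need not match the IP: the TCBSD VM answers on `10.100.0.128` yet uses `41.169.163.43.1.1`. Putting the host IP into `TWINCAT_TARGET_NETID` is therefore an easy mistake. The hypothetical check below rejects anything that is not six values in the range 0 to 255.

```python
from typing import Tuple


def parse_ams_net_id(value: str) -> Tuple[int, ...]:
    """Parse a dotted AMS Net ID such as "41.169.163.43.1.1" into six byte values."""
    parts = value.strip().split(".")
    if len(parts) != 6:
        raise ValueError(f"AMS Net ID needs 6 parts, got {len(parts)}: {value!r}")
    octets = []
    for part in parts:
        # isascii() guards against non-ASCII digit characters that isdigit() accepts
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"non-numeric AMS Net ID part {part!r} in {value!r}")
        number = int(part)
        if number > 255:
            raise ValueError(f"AMS Net ID part {number} exceeds 255 in {value!r}")
        octets.append(number)
    return tuple(octets)
```

Usage: `parse_ams_net_id(os.environ["TWINCAT_TARGET_NETID"])` before launching the test run catches a host IP pasted into the wrong variable.
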
**Production PLC (for true wire-live validation)**
| Item | Requirement |
|------|-------------|
| TwinCAT hardware | Beckhoff IPC or CX series, TwinCAT 3 (TC3); TC2 is a known gap per fixture doc |
| AMS route | Route configured on TwinCAT device back to the OtOpcUa host |
| PLC state | Run state |
| GVL | At least a `GVL_Fixture.nCounter` DINT and `GVL_Fixture.rSetpoint` REAL present |
### Procedure — TCBSD ESXi fixture
**Step 1 — Verify TCBSD VM is reachable**
```powershell
Test-NetConnection -ComputerName 10.100.0.128 -Port 48898
```
Pass: `TcpTestSucceeded: True`.
**Step 2 — Run the integration test suite**
```powershell
$env:TWINCAT_TARGET_HOST = "10.100.0.128"
$env:TWINCAT_TARGET_NETID = "41.169.163.43.1.1"
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests `
--logger "console;verbosity=normal"
```
Pass: all 30 test cases pass (14 `[TwinCATFact]` + 16-case `[TwinCATTheory]`).
No `[TwinCATFact]` / `[TwinCATTheory]` skips — the env var is set, so the
runtime probe is expected to succeed.
Key tests to watch:
| Test | Validates |
|------|-----------|
| `Driver_subscribe_receives_native_ADS_notifications_on_counter_changes` | Native ADS notification path (the cycle-time-unit bug regression) |
| `Driver_reads_every_primitive_type_with_correct_mapping` | 16-type theory incl. `STRING(N)` |
| `Driver_reads_bit_indexed_BOOL_from_word` | Bit-indexed BOOL fix regression |
| `Driver_auto_reconnects_after_underlying_client_is_disposed` | Reconnect on ADS client dispose |
| `Driver_routes_reads_per_device_and_isolates_unreachable_peers` | Multi-device isolation |
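
Two of the regression tests above guard type handling that is easy to get subtly wrong. In TwinCAT 3, `STRING(N)` occupies N + 1 bytes (the extra byte is the null terminator) and `WSTRING(N)` occupies 2 * (N + 1) bytes; an unsized `STRING` or `WSTRING` defaults to 80 characters. A bit-indexed BOOL reads a single bit out of an integer word. The sketch below restates those rules in Python so a failing test can be reasoned about quickly; the function names are hypothetical and this is not the driver's type mapper.

```python
import re

# STRING, WSTRING, STRING(n) or WSTRING(n), matched after upper-casing
_STRING_TYPE_RE = re.compile(r"^(W?STRING)(?:\((\d+)\))?$")
DEFAULT_STRING_CHARS = 80  # TwinCAT length for an unsized STRING/WSTRING


def plc_string_byte_size(type_name: str) -> int:
    """Bytes occupied by a TwinCAT 3 STRING(N) or WSTRING(N) variable."""
    match = _STRING_TYPE_RE.match(type_name.strip().upper())
    if match is None:
        raise ValueError(f"not a STRING/WSTRING type: {type_name!r}")
    kind, length = match.group(1), match.group(2)
    chars = (int(length) if length else DEFAULT_STRING_CHARS) + 1  # + null terminator
    return chars * 2 if kind == "WSTRING" else chars  # WSTRING uses 2 bytes per char


def bit_from_word(word: int, bit: int, width: int = 16) -> bool:
    """Value of bit `bit` (0 = least significant) in an unsigned integer word."""
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} is outside a {width}-bit word")
    return bool((word >> bit) & 1)
```

For instance, `plc_string_byte_size("STRING(80)")` is 81 and `bit_from_word(0x8000, 15)` is `True`.
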
**Step 3 — OtOpcUa server browse/read via Client CLI**
Start OtOpcUa with a TwinCAT DriverInstance pointing at the TCBSD VM:
```powershell
# appsettings.json DriverInstance: Type=TwinCAT, AmsNetId=41.169.163.43.1.1, AmsPort=851
sc.exe start OtOpcUa
# or dev run
dotnet run --project src/Server/ZB.MOM.WW.OtOpcUa.Server
```
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
browse -u opc.tcp://localhost:4840 -r -d 4
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=TwinCAT/<device>/GVL_Fixture/nCounter"
```
Pass: browse shows the PLC symbol tree; read returns `Good` quality with an
integer value.
### Procedure — Production PLC (optional, for full wire-live signoff)
If a Beckhoff production IPC is available in the lab:
**Step 1** — Configure the AMS route on the TwinCAT device (TwinCAT System
Manager → Routes → Add static route from the TwinCAT device back to the
OtOpcUa server machine).
**Step 2** — Set env vars and run the integration suite against the production
target:
```powershell
$env:TWINCAT_TARGET_HOST = "<production-plc-ip>"
$env:TWINCAT_TARGET_NETID = "<production-ams-net-id>"
$env:TWINCAT_TARGET_PORT = "851"
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests
```
**Step 3** — Subscribe to a counter tag for 30 s to confirm native
notifications arrive:
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
subscribe -u opc.tcp://localhost:4840 `
-n "ns=2;s=TwinCAT/<device>/GVL_Fixture/nCounter" -i 100
```
Pass: events arrive every ~100 ms driven by the PLC's ADS notification, not
by polling.
### Expected results
| Check | TCBSD VM | Production PLC |
|-------|----------|----------------|
| ADS port 48898 reachable | Required | Required |
| Integration tests: all 30 pass | Required | Optional (same 30) |
| Notification cycle-time test passes | Required | Required |
| Server browse shows symbol tree | Required | Optional |
| Read `Good` quality | Required | Optional |
| Native ADS notifications deliver in subscribe | Required | Recommended |
### Known gaps (documented — not blockers for v2 GA)
Per `docs/drivers/TwinCAT-Test-Fixture.md` §"What it does NOT cover":
- Multi-hop AMS routing — single-hop only.
- TC2 (ADS v1) compatibility — TC3 only.
- Notification coalescing under sustained CPU load.
- `Symbol version changed (0x0702)` storm handling under rapid PLC re-downloads.
These are deferred to v3 per `docs/v3/twincat-backlog.md`.
### Recording the outcome
```
TwinCAT wire-live validation
Date: YYYY-MM-DD
Target: TCBSD VM 10.100.0.128 AmsNetId=41.169.163.43.1.1 (and/or production PLC details)
TwinCAT version: <version>
OtOpcUa SHA: <git sha>
ADS port reachable: PASS
Integration tests: 30/30 PASS
notification-cycle-time test: PASS (regression check)
STRING(N) type test: PASS (regression check)
bit-indexed BOOL test: PASS (regression check)
Server browse: PASS
Read Good quality: PASS
Native subscription delivery: PASS <n> events in 30s
```