lmxopcua/docs/plans/live-hardware-validation-runbooks.md
Joseph Doherty 16a87b08f3 docs: add four planning runbooks for Phase 6.3 interop, v2 GA gates, live-hardware validation, and alarms worker wiring
Produces docs/plans/ entries for tasks #13, #15, #16, and #17-#20:
- phase-6-3-redundancy-interop-plan.md: automation boundary analysis,
  concrete test matrix (A/B/C blocks), and a step-by-step cutover
  runbook for the deferred Stream F client interop work
- v2-ga-lab-gates-plan.md: exact gate list with command, pass criterion,
  and owner for each of the nine v2 GA exit criteria
- live-hardware-validation-runbooks.md: one runbook per driver (FOCAS
  CNC smoke #54, AB CIP live-boot, TwinCAT wire-live) with preconditions,
  procedure, expected results, and recording template
- alarms-worker-wiring-plan.md: focused plan for A.2/A.3-A.4/C.1/D.1
  worker wiring in the mxaccessgw sibling repo, documenting the
  discovered AVEVA API surface, the architectural decision that blocks
  A.2, the dependency order, and what each item needs to unblock

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:53:36 -04:00

# Live-Hardware Driver Validation Runbooks
> **Scope**: These runbooks cover the three driver validation tasks that
> require physical hardware or a hardware-equivalent live environment and
> cannot be satisfied by the Docker-based simulator fixtures or unit tests
> alone.
>
> Driver implementation is complete. The runbooks document the preconditions,
> step-by-step procedure, expected results, and how to record the outcome for
> each driver that has an open live-hardware gap.
---
## 1. FANUC FOCAS — Live CNC Smoke (task #54)
### Background
The FOCAS driver (`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/`) uses the
pure-managed `WireFocasClient` that speaks FOCAS2 over TCP directly (no
`Fwlib64.dll`, no P/Invoke). The integration test suite at
`tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/` runs against
the `focas-mock` Python server (PDU-verified against `fwlibe64.dll` upstream)
and covers all call-shapes the driver issues. What the mock cannot cover:
- Series-specific firmware quirks (e.g. 0i-F vs 30i-B parameter range limits)
- Real CNC Ethernet stack behaviour (TCP keep-alive, session-close edge cases)
- Series gating: some driver nodes are conditionally emitted based on
`CncSeries` — only a physical CNC can confirm the suppression works
### Preconditions
| Item | Requirement |
|------|-------------|
| CNC hardware | FANUC CNC with Ethernet option enabled; TCP port 8193 reachable from the dev box or from the host running OtOpcUa |
| CNC series | Any of: 0i-D, 0i-F, 0i-MF, 0i-TF, 16i, 30i-B, 31i, 32i, Power Motion i |
| CNC state | Running state (not E-stop, not alarm) for live axis-data reads |
| Network | TCP reachability from OtOpcUa server host to CNC port 8193 |
| OtOpcUa | Server built and deployed (`dotnet publish` or running via `dotnet run`) |
| Config | DriverInstance row for FOCAS in Config DB (`Type="FOCAS"`, `Backend="wire"`, `Devices[0].HostAddress="focas://<cnc-ip>:8193"`, `Devices[0].Series="<series>"`) |
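
The `HostAddress` value in the Config row has the shape `focas://<cnc-ip>:8193`. As a pre-flight sanity check before Step 1, a small Python sketch like the one below could split that string into host and port. The helper name `parse_focas_address` and the fallback to port 8193 when the port is omitted are assumptions of this sketch, not documented driver behaviour; `192.0.2.10` is a placeholder address.

```python
from typing import Tuple
from urllib.parse import urlsplit

FOCAS_DEFAULT_PORT = 8193  # TCP port named in the preconditions table


def parse_focas_address(address: str) -> Tuple[str, int]:
    """Split a focas://host:port string into (host, port).

    Hypothetical pre-flight helper; it is not part of the OtOpcUa driver.
    """
    parts = urlsplit(address)
    if parts.scheme != "focas":
        raise ValueError(f"expected focas:// scheme, got {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing CNC host")
    # Assumption of this sketch: fall back to 8193 when no port is given.
    return parts.hostname, parts.port or FOCAS_DEFAULT_PORT


print(parse_focas_address("focas://192.0.2.10:8193"))
```

The returned pair can be fed straight into the `Test-NetConnection` check in Step 1.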
### Procedure
**Step 1 — Verify TCP reachability**
```powershell
Test-NetConnection -ComputerName <cnc-ip> -Port 8193
```
Pass: `TcpTestSucceeded: True`.
**Step 2 — Start OtOpcUa with FOCAS driver configured**
Ensure the Config DB has the DriverInstance row. Start the server:
```powershell
sc start OtOpcUa
# or for a dev run:
dotnet run --project src/Server/ZB.MOM.WW.OtOpcUa.Server
```
Watch the Serilog log for:
```
[INF] FocasDriver initializing device focas://<cnc-ip>:8193 series=<series>
[INF] FocasDriver device <cnc-ip>:8193 Connected
```
If `EW_SOCKET (-16)` appears, the TCP endpoint is unreachable or the CNC
Ethernet option is not active.
**Step 3 — Browse the address space**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
browse -u opc.tcp://localhost:4840 -r -d 3
```
Expected: a node tree containing at minimum:
```
FOCAS/
  <device>/
    Identity/
      SeriesNumber
      Version
      MaxAxes
    Status/
      RunState
      Mode
      EmergencyStop
    Axes/
      <X|Y|Z>/
        AbsolutePosition
        MachinePosition
```
Nodes suppressed by the `Series` capability gate will be absent — this is
correct behaviour.
**Step 4 — Read identity nodes**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=FOCAS/<device>/Identity/SeriesNumber"
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=FOCAS/<device>/Identity/MaxAxes"
```
Pass: `Good` quality; `SeriesNumber` matches the string printed on the CNC
control panel (e.g. `"0i-F"`); `MaxAxes` is a non-zero integer.
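
The node IDs used in these reads all follow the pattern `ns=2;s=<Driver>/<device>/<path>`. When many nodes need checking, a short Python sketch such as the following could generate the strings to pass to the CLI's `-n` option. The function name `node_id` and the device name `cnc-01` are hypothetical; the namespace index 2 is taken from the commands above.

```python
def node_id(driver: str, device: str, *path: str, namespace: int = 2) -> str:
    """Build a string NodeId such as ns=2;s=FOCAS/<device>/Identity/MaxAxes."""
    if not driver or not device or not path:
        raise ValueError("driver, device and at least one path segment are required")
    return f"ns={namespace};s=" + "/".join((driver, device, *path))


# "cnc-01" is a placeholder device name for illustration only.
for leaf in ("SeriesNumber", "MaxAxes"):
    print(node_id("FOCAS", "cnc-01", "Identity", leaf))
```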
**Step 5 — Read live status and axis data**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=FOCAS/<device>/Status/RunState"
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=FOCAS/<device>/Axes/X/AbsolutePosition"
```
Pass: both return `Good` quality. `AbsolutePosition` is a `Double` (e.g.
`-12.3456` mm). Manually compare against the machine's position display.
**Step 6 — Subscribe and observe polling**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
subscribe -u opc.tcp://localhost:4840 `
-n "ns=2;s=FOCAS/<device>/Status/RunState" -i 500
```
Let it run for 30 s while jogging an axis or changing mode on the CNC operator
panel. Pass: at least one data-change event is received within 5 s, and events
continue arriving every ~500 ms.
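
If event arrival times are noted during the run, the pass criterion can be checked mechanically. The Python sketch below assumes the times are recorded as seconds since the subscription started. The 1.0 s maximum gap is an assumed tolerance (two missed 500 ms publishes), not a figure from the driver or from this runbook.

```python
from typing import Sequence


def subscription_passes(
    event_times: Sequence[float],
    first_within: float = 5.0,  # "at least one event within 5 s"
    max_gap: float = 1.0,       # assumed tolerance around the ~500 ms cadence
) -> bool:
    """Check ascending event times (seconds since subscribe) against Step 6."""
    if not event_times or event_times[0] > first_within:
        return False
    gaps = [later - earlier for earlier, later in zip(event_times, event_times[1:])]
    return all(gap <= max_gap for gap in gaps)


print(subscription_passes([0.4, 0.9, 1.4, 1.9]))
print(subscription_passes([6.0, 6.5]))
```

The first call reports a pass; the second fails because the first event arrived after the 5 s window.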
**Step 7 — 2-minute soak**
Let the server run for 2 minutes with the subscription active. Pass: no
`EW_SOCKET`, `EW_HANDLE`, `EW_BUSY` errors in the Serilog output; subscribed
node continues delivering updates.
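
Reading two minutes of log output by eye is error-prone, so the soak check can be scripted. The sketch below is a minimal Python filter for the three error names listed above. The `[ERR]` sample line is invented for illustration, and where the Serilog output is written depends on the sink configuration, which this runbook does not specify.

```python
import re
from typing import Iterable, List

# Error names listed in the soak pass criterion.
FOCAS_ERROR = re.compile(r"\bEW_(SOCKET|HANDLE|BUSY)\b")


def focas_errors(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention one of the soak-failing error names."""
    return [line.rstrip("\n") for line in log_lines if FOCAS_ERROR.search(line)]


sample = [
    "[INF] FocasDriver device 192.0.2.10:8193 Connected",
    "[ERR] FocasDriver read failed EW_SOCKET",  # hypothetical line, for illustration
]
print(focas_errors(sample))
```

An open file object can be passed in place of `sample`; an empty result corresponds to a clean soak.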
**Step 8 — Run the FOCAS e2e script**
```powershell
pwsh scripts/e2e/test-focas.ps1 -ServerUrl opc.tcp://localhost:4840 `
-DriverInstance "<device>" -Series "<series>"
```
Pass: script exits 0.
### Expected results
| Check | Expected |
|-------|----------|
| TCP connect to CNC port 8193 | Success |
| FOCAS session open (`cnc_allclibhndl3`) | EW_OK (0) in driver log |
| `Identity/SeriesNumber` | Matches CNC panel, `Good` quality |
| `Identity/MaxAxes` | Non-zero integer, `Good` quality |
| `Status/RunState` | Integer 0–3, `Good` quality |
| `Axes/X/AbsolutePosition` | Double, `Good` quality, matches display |
| Subscribe: events delivered | First event within 5 s, then updates every ~500 ms during the 30 s run |
| 2-minute soak: no FOCAS errors | Clean Serilog log |
### Recording the outcome
```
FOCAS live-CNC smoke — task #54
Date: YYYY-MM-DD
CNC: <manufacturer> <model> series=<series> firmware=<version>
IP: <cnc-ip>:8193
OtOpcUa SHA: <git sha>
TCP connect: PASS
Session open: PASS
Identity reads: PASS SeriesNumber="<>" MaxAxes=<n>
Status read: PASS RunState=<n>
Axis read: PASS X/AbsolutePosition=<value>
Subscribe: PASS <n> events in 30s
2-min soak: PASS no errors
e2e script: PASS
```
---
## 2. Allen-Bradley CIP — Live Boot (ControlLogix)
### Background
The AB CIP driver (`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip/`) uses
`libplctag` 1.6.x. The Docker `ab_server` simulator covers connectivity and
atomic type reads (7 integration tests). Live-boot validation is needed to
confirm UDT shape-reading, array tag access, and the CIP packing behaviour on
a real ControlLogix backplane — all gaps acknowledged in
`docs/drivers/AbServer-Test-Fixture.md`.
AB CIP live-boot was first verified against a ControlLogix rig in PR #222.
Repeat this runbook before each release.
### Preconditions
| Item | Requirement |
|------|-------------|
| PLC hardware | ControlLogix (preferred) or CompactLogix; firmware 20+ for request packing |
| Network | TCP port 44818 reachable from OtOpcUa server host |
| PLC state | Running; at least one DINT / REAL / BOOL / STRING controller-scoped tag defined |
| OtOpcUa | Server built and deployed |
| Config | DriverInstance row: `Type="AbCip"`, `Host="<plc-ip>"`, `Path="1,0"`, `PlcType="ControlLogix"` |
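
The `Path="1,0"` value is a CIP route written as comma-separated port/address pairs; `1,0` conventionally means backplane port 1, slot 0. The Python sketch below shows one way to sanity-check such a value before starting the server. The helper name `parse_cip_path` is hypothetical, and the sketch only handles purely numeric hops, which is an assumption; it does not cover routes that embed IP addresses.

```python
from typing import List, Tuple


def parse_cip_path(path: str) -> List[Tuple[int, int]]:
    """Split a numeric CIP path like "1,0" into (port, address) hops."""
    numbers = [int(part) for part in path.split(",")]
    if len(numbers) % 2 != 0:
        raise ValueError("CIP path must contain port,address pairs")
    return list(zip(numbers[0::2], numbers[1::2]))


print(parse_cip_path("1,0"))
```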
### Procedure
**Step 1 — Verify TCP reachability**
```powershell
Test-NetConnection -ComputerName <plc-ip> -Port 44818
```
Pass: `TcpTestSucceeded: True`.
**Step 2 — Start OtOpcUa and watch driver log**
```powershell
sc start OtOpcUa
```
Look for:
```
[INF] AbCipDriver device <plc-ip> Connected path=1,0 plcType=ControlLogix
```
**Step 3 — Browse the address space**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
browse -u opc.tcp://localhost:4840 -r -d 3
```
Pass: node tree shows the tags defined in the ControlLogix project (controller-
and program-scoped). UDT members appear as child nodes.
**Step 4 — Read atomic tags**
```powershell
# Read a DINT tag
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=AbCip/<device>/<TagName>"
```
Pass: `Good` quality; value type matches the PLC tag type.
**Step 5 — Read a UDT member**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=AbCip/<device>/<UDT>/<MemberName>"
```
Pass: `Good` quality; value matches the live PLC value.
**Step 6 — Write a DINT tag (if in ReadWrite mode)**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
write -u opc.tcp://localhost:4840 `
-n "ns=2;s=AbCip/<device>/<TagName>" -v 42 -t Int32
```
Verify the new value via a subsequent read or on the PLC HMI.
Pass: read back returns 42 with `Good` quality.
**Step 7 — Subscribe to a tag that changes**
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
subscribe -u opc.tcp://localhost:4840 `
-n "ns=2;s=AbCip/<device>/<ChangingTag>" -i 500
```
Trigger a value change on the PLC. Pass: events received within 2 s.
**Step 8 — Point the integration tests at the live PLC and confirm parity with the Docker sim**
```powershell
$env:AB_SERVER_ENDPOINT = "<plc-ip>:44818"
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests `
--filter "AbServerFact"
```
Pass: all 7 integration tests pass against the live PLC.
### Expected results
| Check | Expected |
|-------|----------|
| TCP connect | Success |
| Driver log `Connected` | Present, no error |
| Browse | Node tree mirrors PLC tag list |
| Atomic read | `Good` quality, correct type |
| UDT member read | `Good` quality, correct value |
| Write round-trip | Written value reads back |
| Subscribe | Events delivered on value change |
| Integration tests with live PLC | 7/7 pass |
### Recording the outcome
```
AB CIP live-boot
Date: YYYY-MM-DD
PLC: Allen-Bradley <model> firmware=<version>
IP: <plc-ip>:44818 path=1,0
OtOpcUa SHA: <git sha>
TCP connect: PASS
Driver connected: PASS
Browse: PASS <n> tags visible
Atomic read: PASS
UDT read: PASS
Write round-trip: PASS
Subscribe: PASS
Integration tests: 7/7 PASS
```
---
## 3. Beckhoff TwinCAT — Wire-Live Validation
### Background
The TwinCAT driver (`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/`) uses the
Beckhoff `TwinCAT.Ads` .NET SDK v6. The integration test suite at
`tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/`
(`TwinCAT3SmokeTests.cs`) covers 14 `[TwinCATFact]` methods + one 16-case
`[TwinCATTheory]` (30 cases total) against a live ADS runtime. The TCBSD ESXi
VM at `10.100.0.128` (AmsNetId `41.169.163.43.1.1`) is the primary fixture
runtime (project memory `project_tcbsd_fixture.md`) and bypasses the
TwinCAT/Hyper-V conflict on the dev box.
Live-hardware validation extends beyond the TCBSD VM to confirm the driver
works against a production PLC (not just the ESXi test VM) and that the three
defects found during original integration testing do not regress on newer
firmware:
1. Notification cycle time unit (250 ms was being set to ~41 min — fixed).
2. `STRING(N)` / `WSTRING(N)` type mapper (fixed).
3. Bit-indexed BOOL path (fixed).
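
The source does not spell out the mechanism behind the first defect, but the ~41 min figure is consistent with a unit mix-up between milliseconds and the 100 ns ticks that ADS uses for notification cycle times. The arithmetic below is a sketch under that assumption, not a description of the actual fix.

```python
TICKS_PER_MS = 10_000  # one millisecond is 10,000 ticks of 100 ns


def ms_to_ads_ticks(milliseconds: int) -> int:
    """Convert milliseconds to 100 ns ticks."""
    return milliseconds * TICKS_PER_MS


def ticks_misread_as_ms_in_minutes(ticks: int) -> float:
    """Minutes you get when a tick count is wrongly treated as milliseconds."""
    return ticks / 1000 / 60


ticks = ms_to_ads_ticks(250)  # 2,500,000 ticks for the intended 250 ms
print(ticks, round(ticks_misread_as_ms_in_minutes(ticks), 1))
```

Under this reading, 250 ms becomes 2,500,000, and treating that number as milliseconds gives 2,500 s, or roughly 41.7 minutes.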
### Preconditions
**TCBSD ESXi fixture (primary — no physical hardware needed)**
| Item | Requirement |
|------|-------------|
| TCBSD VM | Running on ESXi at `10.100.0.128` |
| AMS Net ID | `41.169.163.43.1.1` |
| ADS port | `851` (TwinCAT 3 PLC runtime 1) |
| PLC project | TwinCAT project from `tests/.../TwinCatProject/` loaded and in Run state |
| Network | TCP port 48898 reachable from dev box to `10.100.0.128` |
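
An AMS Net ID is six dot-separated octets, which makes it easy to paste an IPv4 address (four octets) into `TWINCAT_TARGET_NETID` by mistake. A minimal Python sketch for validating the value before running the suite follows. The helper name `parse_ams_net_id` is hypothetical.

```python
from typing import Tuple


def parse_ams_net_id(text: str) -> Tuple[int, ...]:
    """Validate an AMS Net ID such as 41.169.163.43.1.1 and return its octets."""
    parts = text.strip().split(".")
    if len(parts) != 6:
        raise ValueError(f"AMS Net ID needs 6 octets, got {len(parts)}")
    if not all(part.isdigit() for part in parts):
        raise ValueError("AMS Net ID octets must be decimal digits")
    octets = tuple(int(part) for part in parts)
    if any(octet > 255 for octet in octets):
        raise ValueError("AMS Net ID octets must be in the range 0-255")
    return octets


print(parse_ams_net_id("41.169.163.43.1.1"))
```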
**Production PLC (for true wire-live validation)**
| Item | Requirement |
|------|-------------|
| TwinCAT hardware | Beckhoff IPC or CX series, TwinCAT 3 (TC3); TC2 is a known gap per fixture doc |
| AMS route | Route configured on TwinCAT device back to the OtOpcUa host |
| PLC state | Run state |
| GVL | At least a `GVL_Fixture.nCounter` DINT and `GVL_Fixture.rSetpoint` REAL present |
### Procedure — TCBSD ESXi fixture
**Step 1 — Verify TCBSD VM is reachable**
```powershell
Test-NetConnection -ComputerName 10.100.0.128 -Port 48898
```
Pass: `TcpTestSucceeded: True`.
**Step 2 — Run the integration test suite**
```powershell
$env:TWINCAT_TARGET_HOST = "10.100.0.128"
$env:TWINCAT_TARGET_NETID = "41.169.163.43.1.1"
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests `
--logger "console;verbosity=normal"
```
Pass: all 30 test cases pass (14 `[TwinCATFact]` + 16-case `[TwinCATTheory]`).
No `[TwinCATFact]` / `[TwinCATTheory]` skips: both env vars are set, so the
runtime probe is expected to succeed.
Key tests to watch:
| Test | Validates |
|------|-----------|
| `Driver_subscribe_receives_native_ADS_notifications_on_counter_changes` | Native ADS notification path (the cycle-time-unit bug regression) |
| `Driver_reads_every_primitive_type_with_correct_mapping` | 16-type theory incl. `STRING(N)` |
| `Driver_reads_bit_indexed_BOOL_from_word` | Bit-indexed BOOL fix regression |
| `Driver_auto_reconnects_after_underlying_client_is_disposed` | Reconnect on ADS client dispose |
| `Driver_routes_reads_per_device_and_isolates_unreachable_peers` | Multi-device isolation |
**Step 3 — OtOpcUa server browse/read via Client CLI**
Start OtOpcUa with a TwinCAT DriverInstance pointing at the TCBSD VM:
```powershell
# appsettings.json DriverInstance: Type=TwinCAT, AmsNetId=41.169.163.43.1.1, AmsPort=851
sc start OtOpcUa
# or dev run
dotnet run --project src/Server/ZB.MOM.WW.OtOpcUa.Server
```
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
browse -u opc.tcp://localhost:4840 -r -d 4
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
read -u opc.tcp://localhost:4840 -n "ns=2;s=TwinCAT/<device>/GVL_Fixture/nCounter"
```
Pass: browse shows the PLC symbol tree; read returns `Good` quality with an
integer value.
### Procedure — Production PLC (optional, for full wire-live signoff)
If a Beckhoff production IPC is available in the lab:
**Step 1** — Configure the AMS route on the TwinCAT device (TwinCAT System
Manager → Routes → Add static route from the TwinCAT device back to the
OtOpcUa server machine).
**Step 2** — Set env vars and run the integration suite against the production
target:
```powershell
$env:TWINCAT_TARGET_HOST = "<production-plc-ip>"
$env:TWINCAT_TARGET_NETID = "<production-ams-net-id>"
$env:TWINCAT_TARGET_PORT = "851"
dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests
```
**Step 3** — Subscribe to a counter tag for 30 s to confirm native
notifications arrive:
```powershell
dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
subscribe -u opc.tcp://localhost:4840 `
-n "ns=2;s=TwinCAT/<device>/GVL_Fixture/nCounter" -i 100
```
Pass: events arrive every ~100 ms driven by the PLC's ADS notification, not
by polling.
### Expected results
| Check | TCBSD VM | Production PLC |
|-------|----------|----------------|
| ADS port 48898 reachable | Required | Required |
| Integration tests: all 30 pass | Required | Optional (same 30) |
| Notification cycle-time test passes | Required | Required |
| Server browse shows symbol tree | Required | Optional |
| Read `Good` quality | Required | Optional |
| Native ADS notifications deliver in subscribe | Required | Recommended |
### Known gaps (documented — not blockers for v2 GA)
Per `docs/drivers/TwinCAT-Test-Fixture.md` §"What it does NOT cover":
- Multi-hop AMS routing — single-hop only.
- TC2 (ADS v1) compatibility — TC3 only.
- Notification coalescing under sustained CPU load.
- `Symbol version changed (0x0702)` storm handling under rapid PLC re-downloads.
These are deferred to v3 per `docs/v3/twincat-backlog.md`.
### Recording the outcome
```
TwinCAT wire-live validation
Date: YYYY-MM-DD
Target: TCBSD VM 10.100.0.128 AmsNetId=41.169.163.43.1.1 (and/or production PLC details)
TwinCAT version: <version>
OtOpcUa SHA: <git sha>
ADS port reachable: PASS
Integration tests: 30/30 PASS
notification-cycle-time test: PASS (regression check)
STRING(N) type test: PASS (regression check)
bit-indexed BOOL test: PASS (regression check)
Server browse: PASS
Read Good quality: PASS
Native subscription delivery: PASS <n> events in 30s
```