`lmxopcua/docs/plans/2026-06-19-followups-batch.md` · commit `ad359c5cd3` by Joseph Doherty · 2026-06-19 01:19:37 -04:00 · `docs(plan): design + implementation plan + tasklist for non-arch follow-ups batch (A/B/C)`

# Non-architectural follow-ups batch — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development to execute this plan task-by-task. Each task is self-contained; honor its Classification for the review chain.

**Goal:** Close the actionable non-architectural follow-ups (A/B groups) and capture the
operational/verify and blocked items (C) so nothing is lost.

**Design:** `docs/plans/2026-06-19-followups-batch-design.md`

**Base:** master `f57aa8fa`. **Branch (at execution):** `feat/followups-batch` (off master).

**Standing guardrails:** no EF migration, no Commons/proto/wire change, no bUnit; stage by explicit
path; never stage `sql_login.txt`/`Host/pki/`/`docker-dev/docker-compose.yml`/`pending.md`/
`current.md`/`stillpending.md`; no `--no-verify`/force-push; use `dangerouslyDisableSandbox` for
build/test/rig runs. Finishing a batch means ff-merge to master + push.

**Recommended execution order / waves** (disjoint files → concurrent):
- **Wave 1 (code, concurrent):** T1 (OpcUaClient) ∥ T2 (Client.CLI) ∥ T3 (cert-audit AdminUI) ∥ T4 (Galaxy modal) ∥ T5 (vtag modal) — all disjoint projects/files.
- **Wave 2 (code):** T6 (write-outcome, OpcUaServer/Runtime), run on its own.
- **Gates (do NOT build without explicit go-ahead):** T7, T8 (reconsider).
- **Operator/rig:** T9, T10 (verify). **Blocked:** T11.
- Each wave: per-task review by classification + a final integration review, then merge+push.
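The "disjoint files → concurrent" rule can be checked before a wave is launched. A minimal Python sketch (the codebase itself is C#; Python is used only for illustration), assuming each task's touched paths are listed up front; `overlapping_pairs` and the abbreviated `WAVE_1` path sets are hypothetical, not the exact file lists:

```python
from itertools import combinations


def overlapping_pairs(wave: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of tasks in a wave whose touched-path sets intersect."""
    clashes = []
    for (a, paths_a), (b, paths_b) in combinations(sorted(wave.items()), 2):
        shared = paths_a & paths_b
        if shared:
            clashes.append((a, b, shared))
    return clashes


# Hypothetical, abbreviated path sets for Wave 1 (one entry per task).
WAVE_1 = {
    "T1": {"src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriver.cs"},
    "T2": {"src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI/Program.cs"},
    "T3": {
        "src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Certificates.razor",
        "src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Certificates/CertificateStoreManager.cs",
    },
    "T4": {"Components/Shared/Uns/TagModal.razor"},
    "T5": {"VirtualTagModal.razor"},  # hypothetical file name
}

if __name__ == "__main__":
    # An empty list means every pair in the wave is file-disjoint.
    print(overlapping_pairs(WAVE_1) or "wave is file-disjoint")
```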
---
### Task 1 (A1): OpcUaClient history session-capture-before-gate race
**Classification:** standard · **Parallelizable with:** T2,T3,T4,T5,T6

**Files:** `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriver.cs` · tests in `tests/Drivers/.../OpcUaClient*Tests`

**Steps:**
1. Audit every `var session = RequireSession()` that precedes `await _gate.WaitAsync` (known sites: 1134, 1299, 1413, **1618 `ExecuteHistoryReadAsync`**, 1788). Compare each to the correct idiom at `:622-628`: a `_ = RequireSession()` fast-fail guard outside the gate, then a re-read of the session *inside* the gate.
2. Write a failing regression test: acquire `_gate`, start the method under test (it blocks on the gate), swap `Session` (simulate `OnReconnectComplete`), release; assert the method uses the NEW session, not the captured one. (Use the existing OpcUaClient test harness; if the session swap isn't fakeable, assert via the `Gate` internal + a seam.)
3. Refactor each site to re-resolve inside the gate (keep the outside guard). Run the driver unit suite green; `dotnet build` the driver.
4. Commit `fix(opcuaclient): re-resolve session inside _gate in history/read paths (stale-session race)`.
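The stale-session race in step 1 and the regression test in step 2 can be illustrated outside .NET. A minimal Python `asyncio` sketch, assuming `asyncio.Lock` stands in for the C# `_gate`; `DriverSketch`, `history_read_stale`/`history_read_fixed` and `on_reconnect_complete` are hypothetical stand-ins for `ExecuteHistoryReadAsync` and `OnReconnectComplete`, and the real fix lives in `OpcUaClientDriver.cs`:

```python
import asyncio


class Session:
    def __init__(self, name: str) -> None:
        self.name = name

    async def history_read(self, node_id: str) -> str:
        return f"{self.name}:{node_id}"


class DriverSketch:
    """Python stand-in for the C# driver; `_gate` plays the role of the semaphore."""

    def __init__(self, session: Session) -> None:
        self.session = session
        self._gate = asyncio.Lock()

    def require_session(self) -> Session:
        if self.session is None:
            raise RuntimeError("not connected")
        return self.session

    async def history_read_stale(self, node_id: str) -> str:
        session = self.require_session()  # captured BEFORE the gate: the bug
        async with self._gate:
            return await session.history_read(node_id)

    async def history_read_fixed(self, node_id: str) -> str:
        self.require_session()  # fast-fail guard outside the gate
        async with self._gate:
            session = self.require_session()  # re-resolve INSIDE the gate
            return await session.history_read(node_id)

    def on_reconnect_complete(self, new_session: Session) -> None:
        self.session = new_session


async def race(method_name: str) -> str:
    driver = DriverSketch(Session("old"))
    await driver._gate.acquire()  # hold the gate, as the regression test does
    task = asyncio.create_task(getattr(driver, method_name)("ns=2;s=Tag"))
    await asyncio.sleep(0)  # let the call run its guard and block on the gate
    driver.on_reconnect_complete(Session("new"))  # swap the session while gated
    driver._gate.release()
    return await task


if __name__ == "__main__":
    print(asyncio.run(race("history_read_stale")))  # uses the old session
    print(asyncio.run(race("history_read_fixed")))  # uses the new session
```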
### Task 2 (A2): Client.CLI `enable`/`disable` command (H4 client path)
**Classification:** standard · **Parallelizable with:** T1,T3,T4,T5,T6

**Files:** `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI/` (Program.cs + the ack/shelve/confirm command template; explore for the actual command structure) · the client-side `IOpcUaClientService` (+ impl) · CLI/Client.Shared tests

**Steps:**
1. Find the existing `ack`/`shelve`/`confirm` CLI command + the `IOpcUaClientService.{Acknowledge,Shelve,Confirm}AlarmAsync` they call (template). Confirm whether `Enable`/`Disable` already exist on the service (grep) — if not, add `EnableAsync(nodeId)`/`DisableAsync(nodeId)` that call the OPC UA ConditionType Enable/Disable methods (mirror the ack call shape). **Client app interface only — NOT Commons/wire.**
2. Add CLI `enable`/`disable` commands mirroring `ack` (node-id arg, connect, call, print status).
3. Unit-test the service/VM call + the command wiring. Build; CLI/Client.Shared tests green.
4. Live (later): drive `enable`/`disable` against the rig's scripted condition node → AlarmAck-gated → engine Enable/DisableAsync (closes the deferred H4 live `/run`).
5. Commit `feat(cli): add enable/disable condition commands (H4 client path)`.
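The verb-to-service mapping in steps 1-2 can be sketched in Python with `argparse` subcommands. This is an illustration under assumptions, not the C# CLI: `AlarmService`, `enable`/`disable`, `RecordingService` and the `otopcua-cli` program name are hypothetical stand-ins for `IOpcUaClientService` and its `EnableAsync`/`DisableAsync`, which in the real client would invoke the OPC UA ConditionType Enable/Disable methods.

```python
import argparse
import asyncio
from typing import Protocol


class AlarmService(Protocol):
    """Hypothetical stand-in for the client-side IOpcUaClientService alarm calls."""

    async def acknowledge_alarm(self, node_id: str) -> str: ...
    async def enable(self, node_id: str) -> str: ...
    async def disable(self, node_id: str) -> str: ...


# Each CLI verb maps to one service coroutine; enable/disable mirror the ack shape.
VERBS = {"ack": "acknowledge_alarm", "enable": "enable", "disable": "disable"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="otopcua-cli")
    sub = parser.add_subparsers(dest="verb", required=True)
    for verb in VERBS:
        cmd = sub.add_parser(verb, help=f"{verb} a condition")
        cmd.add_argument("node_id", help="NodeId of the condition node")
    return parser


async def run(argv: list[str], service: AlarmService) -> str:
    """Parse argv, call the mapped service method, print and return its status."""
    args = build_parser().parse_args(argv)
    status = await getattr(service, VERBS[args.verb])(args.node_id)
    print(f"{args.verb} {args.node_id}: {status}")
    return status


class RecordingService:
    """Test double that records which service method each verb reached."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    async def _record(self, method: str, node_id: str) -> str:
        self.calls.append((method, node_id))
        return "Good"

    async def acknowledge_alarm(self, node_id: str) -> str:
        return await self._record("acknowledge_alarm", node_id)

    async def enable(self, node_id: str) -> str:
        return await self._record("enable", node_id)

    async def disable(self, node_id: str) -> str:
        return await self._record("disable", node_id)


if __name__ == "__main__":
    asyncio.run(run(["enable", "ns=2;s=Demo"], RecordingService()))
```

Keeping the verb table as data means `enable`/`disable` reuse the `ack` wiring rather than duplicating it.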
### Task 3 (A3): Cert-audit minor review nits
**Classification:** trivial · **Parallelizable with:** T1,T2,T4,T5,T6

**Files:** `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Certificates.razor` · `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Certificates/CertificateStoreManager.cs`

**Steps:**
1. (a) The two unreachable `ConfirmAction` fallthrough arms (`"cannot delete from {Kind}"`, `"unknown action"`): add an explicit `// unreachable defensive guard — buttons only render for Trusted/Rejected + 3 literal verbs` comment (simplest), OR route through the manager so they audit. Pick the comment unless trivial to route.
2. (b) Expose a `PkiRoot` property on `CertificateStoreManager`; have `Certificates.razor:130` read it instead of re-reading `OpcUa:PkiStoreRoot` independently.
3. Build AdminUI (0 errors); existing AdminUI.Tests green.
4. Commit `refactor(adminui): tidy cert-audit review nits (fallthrough comment + single PkiStoreRoot read)`.
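Both nits can be sketched together: the manager owns the single read of `OpcUa:PkiStoreRoot`, and the fallthrough arms carry the explicit unreachable-guard comments. A minimal Python sketch; `CertificateStoreManagerSketch`, the three verb names and the returned strings are hypothetical stand-ins for the C# manager and the Razor page, which would read `PkiRoot` from the manager instead of the configuration key:

```python
class CertificateStoreManagerSketch:
    """Hypothetical stand-in that owns the only read of OpcUa:PkiStoreRoot."""

    ACTIONABLE_KINDS = ("Trusted", "Rejected")
    VERBS = ("trust", "reject", "delete")  # hypothetical names for the 3 literal verbs

    def __init__(self, config: dict[str, str]) -> None:
        self._pki_root = config["OpcUa:PkiStoreRoot"]  # read exactly once

    @property
    def pki_root(self) -> str:
        # The page reads this property instead of re-reading configuration.
        return self._pki_root

    def confirm_action(self, kind: str, verb: str) -> str:
        if kind not in self.ACTIONABLE_KINDS:
            # Unreachable defensive guard: buttons only render for Trusted/Rejected.
            return f"cannot delete from {kind}"
        if verb not in self.VERBS:
            # Unreachable defensive guard: the page only emits the 3 literal verbs.
            return "unknown action"
        return f"{verb} in {kind}"
```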
### Task 4 (B2): AdminUI — Galaxy re-pick preserves prior alarm-field edits
**Classification:** small · **Parallelizable with:** T1,T2,T3,T5,T6

**Files:** the Galaxy-address-picked handler on the equipment Tag modal (explore: `Components/Shared/Uns/TagModal.razor` + the Galaxy picker callback `OnGalaxyAddressPicked`/similar) · a pure merge helper + its unit test

**Steps:**
1. Reproduce: re-picking a Galaxy address resets manually-edited `alarm` fields. Find the picked-handler that overwrites the config.
2. Extract/extend a pure merge that applies picked defaults WITHOUT clobbering already-edited alarm fields (preserve-existing idiom); unit-test the merge.
3. Wire it into the handler. Build; AdminUI.Tests green. Live-verify on docker-dev (re-pick keeps edits).
4. Commit `fix(adminui): preserve edited alarm fields on Galaxy address re-pick`.
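Step 2's preserve-existing merge can be kept pure by treating a field as user-edited when its current value no longer matches what the previous pick supplied. A minimal Python sketch under that assumption (the real modal may track dirty fields explicitly instead); `merge_repick` and the `severity`/`message` field names are hypothetical:

```python
from typing import Any


def merge_repick(
    current: dict[str, Any],
    previous_defaults: dict[str, Any],
    new_defaults: dict[str, Any],
) -> dict[str, Any]:
    """Apply defaults from a new Galaxy pick without clobbering hand-edited fields.

    A field counts as untouched when its current value still equals what the
    previous pick supplied (or both are absent); only untouched fields take the
    new default. Fields the pick doesn't mention are carried over unchanged.
    """
    merged = dict(current)  # never mutate the caller's config
    for key, new_value in new_defaults.items():
        if current.get(key) == previous_defaults.get(key):
            merged[key] = new_value
    return merged
```

For example, if the previous pick set `severity` 500 and the user changed it to 800, a re-pick offering 300 keeps 800 while untouched fields take the new defaults.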
### Task 5 (B3): AdminUI — inline-create-script dropdown label drift
**Classification:** small · **Parallelizable with:** T1,T2,T3,T4,T6

**Files:** `VirtualTagModal` + its inline create-script handler (explore) · a test, if a pure binding helper exists

**Steps:**
1. Reproduce the label drift after "New script" inline-creates + binds (`SC-…`).
2. Refresh the bound-script label/selection from the created id after creation. Build; tests green; live-verify.
3. Commit `fix(adminui): refresh script dropdown label after inline create`.
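One way to make the label drift impossible is to derive the label from the bound id on every read instead of caching it separately. A minimal Python sketch of that design choice; `ScriptPickerSketch`, `ScriptOption` and the ids used in testing are hypothetical stand-ins for the `VirtualTagModal` state, and the real fix may instead re-query the script list after creation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScriptOption:
    script_id: str
    name: str


class ScriptPickerSketch:
    """Hypothetical stand-in for the VirtualTagModal script dropdown state."""

    def __init__(self, options: list[ScriptOption]) -> None:
        self.options = list(options)
        self.selected_id: Optional[str] = None

    @property
    def label(self) -> str:
        # Derived from the selected id on every read, so it cannot drift away
        # from the binding the way a separately cached label string can.
        for option in self.options:
            if option.script_id == self.selected_id:
                return option.name
        return "(none)"

    def on_inline_created(self, created: ScriptOption) -> None:
        # Refresh the option list first, then bind by the created id.
        self.options = [o for o in self.options if o.script_id != created.script_id]
        self.options.append(created)
        self.selected_id = created.script_id
```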
### Task 6 (B1): Write-outcome residuals (Bad-quality blip + AuditWriteUpdateEvent + sync fail-fast)
**Classification:** standard · **Parallelizable with:** T1,T2,T3,T4,T5

**Files:** node-manager write path (`OtOpcUaNodeManager` `OnWriteValue` / the `IOpcUaNodeWriteGateway` outcome continuation; this is the write-outcome self-correction site, master `1d797c1c`) · Runtime gateway · tests

**Steps:**
1. Locate the failed-write revert continuation. Add behind the existing failure branch: (i) a brief Bad-quality status blip on the node before/with the revert; (ii) raise an OPC UA `AuditWriteUpdateEvent`; (iii) synchronous structural fail-fast for pre-dispatch-rejectable writes.
2. TDD each sub-behaviour (protocol-driver path only — Galaxy is fire-and-forget). Use the modbus exception-injector recipe for live proof (FC06 reject).
3. If any sub-part balloons >~300 LOC, split it out. Build; OpcUaServer + Runtime tests green.
4. Commit `feat(opcua): emit Bad blip + AuditWriteUpdateEvent + sync fail-fast on failed device write`.
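The three sub-behaviours of step 1 can be sketched as a pre-dispatch check plus one failure continuation. A minimal Python sketch, protocol-driver path only, assuming the node is updated optimistically on an accepted write (which the existing revert implies) and that status codes are plain strings; `NodeState`, `WriteGatewaySketch` and the audit dict (a loose subset of the fields of OPC UA's `AuditWriteUpdateEventType`) are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class NodeState:
    value: Any
    status: str = "Good"
    history: list[tuple[Any, str]] = field(default_factory=list)

    def publish(self, value: Any, status: str) -> None:
        self.value, self.status = value, status
        self.history.append((value, status))  # what a subscriber would observe


@dataclass
class WriteGatewaySketch:
    """Hypothetical stand-in for the node-manager write path (protocol drivers)."""

    emit_audit: Callable[[dict], None]
    expected_type: type = int

    def on_write(self, node: NodeState, new_value: Any) -> Optional[str]:
        # (iii) synchronous structural fail-fast: reject before any dispatch.
        if not isinstance(new_value, self.expected_type):
            return "BadTypeMismatch"
        node.publish(new_value, "Good")  # optimistic update (assumption)
        return None  # accepted; the device outcome arrives later

    def on_device_outcome(self, node: NodeState, old_value: Any,
                          new_value: Any, ok: bool) -> None:
        if ok:
            return  # the optimistic value stands
        node.publish(new_value, "Bad")  # (i) brief Bad-quality blip
        self.emit_audit({  # (ii) audit event, loose field subset
            "EventType": "AuditWriteUpdateEventType",
            "Status": False,
            "OldValue": old_value,
            "NewValue": new_value,
        })
        node.publish(old_value, "Good")  # existing revert to the last-good value
```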
### Task 7 (B4): F10b surgical DataType/IsArray in-place writes — **RECONSIDER GATE**
**Classification:** standard · **Do NOT build without an explicit fresh go-ahead** (previously decided against as dirty — brief value-type mismatch, no ModelChangeEvents, rare edits). If approved: extend `ISurgicalAddressSpaceSink.UpdateTagAttributes` to swap DataType/ValueRank in place + emit ModelChangeEvents; widen `Phase7Applier.TagDeltaIsSurgicalEligible`; **live-`/run` the rebuild=False path** (the prod-inertness trap, see the F10b deferred-wrapper lesson). Until approved this stays a deferred record.
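Should the gate ever be opened, "widen `TagDeltaIsSurgicalEligible`" and "swap DataType/ValueRank in place" reduce to a subset check plus an IsArray-to-ValueRank mapping. A minimal Python sketch; the ValueRank constants are the OPC UA ones (Scalar = -1, OneDimension = 1), while the function names, attribute names and the base eligible set are hypothetical placeholders for what `Phase7Applier` actually checks. ModelChangeEvents and the live rebuild=False `/run` are not represented:

```python
# OPC UA ValueRank constants for the two shapes a tag can take.
SCALAR = -1
ONE_DIMENSION = 1

# Hypothetical placeholder sets; the real eligible attributes live in Phase7Applier.
BASE_SURGICAL = frozenset({"Description"})
WIDENED_SURGICAL = BASE_SURGICAL | {"DataType", "IsArray"}


def value_rank_for(is_array: bool) -> int:
    """Map the authored IsArray flag to the ValueRank the node must advertise."""
    return ONE_DIMENSION if is_array else SCALAR


def tag_delta_is_surgical_eligible(changed: set[str], widened: bool = False) -> bool:
    """True when every changed attribute can be patched in place (no rebuild)."""
    allowed = WIDENED_SURGICAL if widened else BASE_SURGICAL
    return changed <= allowed
```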
### Task 8 (B5): Alarm-severity `SetSeverity` surgical update — **RECONSIDER GATE**
**Classification:** small · **Do NOT build without an explicit fresh go-ahead** (operationally invisible — the alarm engine overwrites authored severity on first eval). Recorded so the decision isn't a silent gap.
### Task 9 (C1): Modbus-Int64 full live authoring — **VERIFY-ONLY (operator/rig)**
**Classification:** verify · Seed a Modbus driver on docker-dev → sim `10.100.0.35:5020`, author an Int64 equipment tag, deploy, confirm the OPC UA node advertises `DataTypeIds.Int64` + reads changing. No code unless a gap surfaces.
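When confirming that the Int64 node reads changing values, it helps to know how the value maps onto the wire: a 64-bit integer spans four consecutive 16-bit registers. A minimal Python sketch; big-endian byte order within each register is standard Modbus, but the word order used by the driver and the sim at `10.100.0.35:5020` is an assumption to confirm on the rig, and the function names are hypothetical:

```python
import struct


def int64_to_registers(value: int, word_swap: bool = False) -> list[int]:
    """Split a signed 64-bit value into four 16-bit Modbus register values.

    Bytes are big-endian within each register; word order is device-specific,
    so word_swap=True puts the least-significant word first.
    """
    words = list(struct.unpack(">4H", struct.pack(">q", value)))
    return words[::-1] if word_swap else words


def registers_to_int64(registers: list[int], word_swap: bool = False) -> int:
    """Reassemble four 16-bit register values into a signed 64-bit integer."""
    words = registers[::-1] if word_swap else registers
    return struct.unpack(">q", struct.pack(">4H", *words))[0]
```

If the OPC UA value looks wildly wrong but changes in step with the sim, a word-order mismatch is the first thing to rule out.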
### Task 10 (C2): S7 + AbCip Test-Connect probe happy-path — **VERIFY-ONLY (needs Windows-VM fixtures)**
**Classification:** verify · `lmxopcua-fix up s7 s7_1500` / `up abcip controllogix` from the Windows VM, then run the skip-gated probe E2E green path (`DriverProbeHandshakeE2eTests`).
### Task 11 (C3): Device-gated proofs — **BLOCKED (hardware)**
**Classification:** blocked · H6 native-ack→AVEVA, Galaxy Phase C historian T7, Phase B T9, AbLegacy/TwinCAT/FOCAS probe happy-paths — need Wonderware+AVEVA (`10.100.0.48`), a Galaxy native alarm, PLC5/SLC sim, ADS target, CNC+FWLIB. Captured; not executable here.