docs(plan): design + implementation plan + tasklist for non-arch follow-ups batch (A/B/C)
v2-ci / build (push) Failing after 40s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
@@ -0,0 +1,128 @@
# Non-architectural follow-ups batch — Design

**Date:** 2026-06-19
**Status:** Approved (planning) — for later subagent-driven execution
**Base:** master `f57aa8fa`

This batches every **non-architectural** open follow-up (the A/B/C survey) into one design +
plan so it can be executed task-by-task later. Items are grouped by how actionable they are:

- **A — actionable code now** (buildable, no infra/contract change). Do these.
- **B — design-deferred** (decided not-worth-it, or out-of-scope by a prior mechanism choice).
  Included per request; two carry a *reconsider* flag (we previously rejected them).
- **C — operational / live-verify** (code already shipped; needs a rig/fixtures/devices up).
  Captured as verify tasks, not code; the device-gated ones are blocked on hardware.

**Excluded (architectural / infra-gated, by request):** H2 HistoryUpdate service, H5b durable
`IHistoryWriter`, per-Akka-member `/hosts` nesting (needs a Commons field on `DriverHealthChanged`),
driver `IHistoryProvider`→server HistoryRead bridge, modified-value history, array writes /
S7 wide-type arrays, the Wonderware SDK-semantic permanent boundary, full-stack WebSocket+JWT
DriverStatusHub test.

**Standing guardrails (all tasks):** no EF migration, no Commons/proto/wire change, no bUnit;
stage by explicit path; never stage `sql_login.txt`/`Host/pki/`/`docker-dev/docker-compose.yml`/
`pending.md`/`current.md`/`stillpending.md`; no `--no-verify`/force-push; `dangerouslyDisableSandbox`
for build/test/rig. Each code task = its own commit; finish a batch = merge to master + push.

---

## A — Actionable code

### A1. OpcUaClient history session-capture-before-gate race *(strongest — real latent bug)*
`OpcUaClientDriver.cs` captures `var session = RequireSession()` **before** acquiring `_gate` at
lines **1134, 1299, 1413, 1618 (`ExecuteHistoryReadAsync`, the funnel for ReadRaw/Processed/AtTime),
and 1788** — whereas the read/write paths deliberately re-resolve *inside* the gate
(`_ = RequireSession()` guard at 624/714/829, then re-read after `await _gate.WaitAsync`; see the
`:622-628` comment "a reconnect can swap Session while we wait on _gate"). So a reconnect that swaps
the session while a caller waits on `_gate` leaves these methods using a **stale session**.
**Approach:** for every `var session = RequireSession()` that precedes `await _gate.WaitAsync`, move
the authoritative read to *inside* the gate (keep an outside `_ = RequireSession()` fast-fail guard),
matching the existing read/write idiom. Add a regression test that swaps the session across the gate
wait and asserts the post-gate call uses the new session. Driver-internal; no contract change.
**Classification:** standard.
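
A minimal sketch of the A1 idiom and the regression check it calls for, under stated assumptions: `GatedDriverSketch`, `FakeSession`, `SwapSession`, `HoldGateAsync`, and `ReleaseGate` are hypothetical stand-ins invented for illustration, and `_gate` is assumed to be a `SemaphoreSlim(1, 1)`. Only the shape (discarded fast-fail guard outside the gate, authoritative `RequireSession()` inside it) is taken from the description above; the real `OpcUaClientDriver` may differ.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Regression-style check: hold the gate, start a call, swap the session, release.
var driver = new GatedDriverSketch(new FakeSession("old"));
await driver.HoldGateAsync();
Task<string> pending = driver.HistoryReadAsync();   // parks on _gate.WaitAsync
driver.SwapSession(new FakeSession("new"));         // simulated reconnect
driver.ReleaseGate();
string used = await pending;
if (used != "new")
    throw new InvalidOperationException($"stale session used: {used}");
Console.WriteLine($"post-gate call used session: {used}");

// Hypothetical stand-in for the OPC UA session type.
public sealed record FakeSession(string Name);

// Hypothetical stand-in for the driver; not the real OpcUaClientDriver.
public sealed class GatedDriverSketch
{
    private readonly SemaphoreSlim _gate = new(1, 1);   // assumed gate type
    private volatile FakeSession? _session;

    public GatedDriverSketch(FakeSession initial) => _session = initial;

    // Simulates a reconnect replacing the live session.
    public void SwapSession(FakeSession next) => _session = next;

    // Test hooks so the check above can force a caller to wait on the gate.
    public Task HoldGateAsync() => _gate.WaitAsync();
    public void ReleaseGate() => _gate.Release();

    private FakeSession RequireSession() =>
        _session ?? throw new InvalidOperationException("Not connected.");

    public async Task<string> HistoryReadAsync(CancellationToken ct = default)
    {
        _ = RequireSession();                 // fast-fail guard only; result discarded
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            var session = RequireSession();   // authoritative read, inside the gate
            return session.Name;              // stands in for the real history call
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

If the `RequireSession()` result captured before `WaitAsync` were used instead, the same check would observe `"old"`, which is the stale-session behaviour A1 targets.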

### A2. Client.CLI `enable` / `disable` command (H4 client-driven path)
Phase 3 shipped the node-manager `OnEnableDisable` → engine Enable/DisableAsync, but there's no
client verb to invoke the OPC UA ConditionType Enable/Disable methods, so H4 was never live-driven.
**Approach:** mirror the existing `ack`/`shelve`/`confirm` command + service-method pattern — add
`EnableAsync`/`DisableAsync` to the client-side `IOpcUaClientService` (calls the OPC UA Enable/Disable
condition methods; **client app interface, NOT a Commons/wire change**) + the CLI `enable`/`disable`
commands. Unit-test the VM/service call; live-drive against the rig's scripted condition.
**Classification:** standard. (Unblocks the deferred H4 live `/run`.)

### A3. Cert-audit minor review nits *(from this session's final integration review)*
Two cleanups in the just-shipped cert-audit code:
(a) `Certificates.razor` `ConfirmAction` has two **unreachable** fallthrough arms
(`"cannot delete from {Kind}"`, `"unknown action"`) that build a `CertActionResult.Fail` *inside*
the razor, bypassing `CertificateStoreManager` → those (unreachable) failures aren't audited. Either
route them through the manager or add an explicit "unreachable defensive guard" comment.
(b) `OpcUa:PkiStoreRoot` is read in BOTH `Certificates.razor:130` and the `CertificateStoreManager`
ctor — centralize on the manager (expose a `PkiRoot` property the razor reads).
**Classification:** trivial (combined into one task; same two files).

---

## B — Design-deferred (included per request)

### B1. Write-outcome residuals
Extend the shipped compare-and-revert write path (write-outcome self-correction, master `1d797c1c`):
on a failed inbound device write, additionally (i) emit a **Bad-quality blip** on the node, (ii) raise
an OPC UA **`AuditWriteUpdateEvent`**, and (iii) add **synchronous structural fail-fast** for writes
that can be rejected before dispatch. These were out-of-scope by the chosen revert-only mechanism.
**Approach:** locate the revert continuation (node-manager `OnWriteValue`/the `IOpcUaNodeWriteGateway`
outcome handler), add the three behaviours behind the existing failure branch. Galaxy can't surface a
write failure (fire-and-forget gateway), so this only exercises on protocol drivers.
**Classification:** standard. *(Each of i/ii/iii could split if it balloons.)*

### B2. AdminUI — Galaxy re-pick preserves prior alarm-field edits
On the equipment Tag modal, re-picking a Galaxy address currently resets manually-edited alarm
fields. **Approach:** in the Galaxy-address-picked handler, merge the picked defaults *without*
clobbering already-edited `alarm` fields (same preserve-unknown idiom used elsewhere). No bUnit;
live-verify on docker-dev. **Classification:** small.
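
One way the B2 merge could look, as a sketch only. It assumes the modal can tell which fields the user has edited; `AlarmFieldMergeSketch`, `MergePickedDefaults`, the `editedKeys` set, and the field names (`address`, `alarm.severity`, `alarm.message`) are all hypothetical, and the actual preserve-unknown idiom in the AdminUI may track edits differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical field values: the operator changed alarm.severity before re-picking.
var current = new Dictionary<string, string>
{
    ["address"] = "Tank1.Level",
    ["alarm.severity"] = "700",
    ["alarm.message"] = "default message",
};
var picked = new Dictionary<string, string>
{
    ["address"] = "Tank2.Level",
    ["alarm.severity"] = "500",
    ["alarm.message"] = "Tank2 level high",
};
var edited = new HashSet<string> { "alarm.severity" };

var merged = AlarmFieldMergeSketch.MergePickedDefaults(current, picked, edited);

// address and alarm.message follow the new pick; alarm.severity keeps the edit.
foreach (var (key, value) in merged)
    Console.WriteLine($"{key} = {value}");

public static class AlarmFieldMergeSketch
{
    // Applies the picked defaults on top of the current values, skipping any
    // key the user has already edited so a re-pick does not clobber it.
    public static Dictionary<string, string> MergePickedDefaults(
        IReadOnlyDictionary<string, string> current,
        IReadOnlyDictionary<string, string> pickedDefaults,
        ISet<string> editedKeys)
    {
        var merged = new Dictionary<string, string>(current);
        foreach (var (key, value) in pickedDefaults)
        {
            if (editedKeys.Contains(key)) continue;   // preserve the manual edit
            merged[key] = value;
        }
        return merged;
    }
}
```

Returning a new dictionary rather than mutating `current` keeps the handler easy to unit-test without bUnit, which fits the standing guardrails.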

### B3. AdminUI — inline-create-script dropdown label drift
After "New script" inline-creates + binds a script from the VirtualTagModal, the script dropdown
label can drift from the selection. **Approach:** refresh the bound-script label from the created
`SC-…` id after creation. **Classification:** trivial/small.

### B4. F10b surgical DataType / IsArray in-place writes *(RECONSIDER — previously rejected)*
Previously **decided against**: the in-place swap is dirty (brief value-type mismatch on the live
node, no ModelChangeEvents) and such edits are rare, so the full rebuild was kept. Included here only
so the decision is a task, not a silent gap. To actually build: extend
`ISurgicalAddressSpaceSink.UpdateTagAttributes` to also swap DataType/ValueRank in place + emit
ModelChangeEvents, and widen `TagDeltaIsSurgicalEligible`.
**Recommendation:** keep deferred unless a concrete need appears. **Classification:** standard.

### B5. Alarm-severity `SetSeverity` surgical update *(RECONSIDER — previously rejected)*
Previously **decided against**: operationally invisible — the live alarm engine
(`ScriptedAlarmHostActor`→`AlarmStateUpdate`) overwrites the authored severity on first eval, so an
in-place node severity change is shadowed immediately. Included for completeness.
**Recommendation:** keep deferred. **Classification:** small.

---

## C — Operational / live-verify (not code; needs a rig)

### C1. Modbus-Int64 full live authoring
Code shipped (Phase 4 T1, `bd8fee61`); never live-authored because docker-dev has no seeded Modbus
driver. **Verify steps:** seed a Modbus driver on docker-dev pointed at sim `10.100.0.35:5020`, author
an Int64 equipment tag, deploy, confirm the OPC UA node advertises `DataTypeIds.Int64` and that a
read succeeds.
**No code** (unless seeding surfaces a gap). **Classification:** verify-only.

### C2. S7 + AbCip Test-Connect probe happy-path live-verify
Probes shipped (Phase 5); green-path skipped because the fixtures aren't on this Mac. **Verify steps:**
bring up `lmxopcua-fix up s7 s7_1500` / `up abcip controllogix` from the Windows VM, run the skip-gated
probe E2E green path. **Classification:** verify-only (needs the Windows-VM fixtures).

### C3. Device-gated proofs *(BLOCKED on hardware — capture only)*
H6 native-ack→AVEVA round-trip, Galaxy Phase C historian T7 live HistoryRead, Phase B native-alarm T9,
and AbLegacy/TwinCAT/FOCAS probe happy-paths all need real devices (Wonderware sidecar+AVEVA on
`10.100.0.48`, a Galaxy native alarm, PLC5/SLC sim, ADS target, CNC+FWLIB). Recorded so they're not
lost; **not executable** without the hardware. **Classification:** blocked.

---

## Execution shape
Independent code tasks (A1/A2/A3, B1/B2/B3) touch disjoint projects → dispatchable concurrently
in a subagent-driven run, each its own commit, per-task review by classification, final integration
review, merge+push. B4/B5 are *reconsider* gates (don't build without a fresh go-ahead). C1/C2 are
operator/rig verify; C3 is blocked. See `2026-06-19-followups-batch.md` for the executable task list.