# Non-architectural follow-ups batch — Design

**Date:** 2026-06-19
**Status:** Approved (planning) — for later subagent-driven execution
**Base:** master `f57aa8fa`

This batches every **non-architectural** open follow-up (the A/B/C survey) into one design +
plan so it can be executed task-by-task later. Items are grouped by how actionable they are:

- **A — actionable code now** (buildable, no infra/contract change). Do these.
- **B — design-deferred** (decided not-worth-it, or out-of-scope by a prior mechanism choice).
  Included per request; two carry a *reconsider* flag (we previously rejected them).
- **C — operational / live-verify** (code already shipped; needs a rig/fixtures/devices up).
  Captured as verify tasks, not code; the device-gated ones are blocked on hardware.

**Excluded (architectural / infra-gated, by request):** H2 HistoryUpdate service, H5b durable
`IHistoryWriter`, per-Akka-member `/hosts` nesting (needs a Commons field on `DriverHealthChanged`),
driver `IHistoryProvider`→server HistoryRead bridge, modified-value history, array writes /
S7 wide-type arrays, the Wonderware SDK-semantic permanent boundary, full-stack WebSocket+JWT
DriverStatusHub test.

**Standing guardrails (all tasks):** no EF migration, no Commons/proto/wire change, no bUnit;
stage by explicit path; never stage `sql_login.txt`/`Host/pki/`/`docker-dev/docker-compose.yml`/
`pending.md`/`current.md`/`stillpending.md`; no `--no-verify`/force-push; `dangerouslyDisableSandbox`
for build/test/rig. Each code task = its own commit; finish a batch = merge to master + push.

---

## A — Actionable code

### A1. OpcUaClient history session-capture-before-gate race *(strongest — real latent bug)*
`OpcUaClientDriver.cs` captures `var session = RequireSession()` **before** acquiring `_gate` at
lines **1134, 1299, 1413, 1618 (`ExecuteHistoryReadAsync`, the funnel for ReadRaw/Processed/AtTime),
and 1788** — whereas the read/write paths deliberately re-resolve *inside* the gate
(`_ = RequireSession()` guard at 624/714/829, then re-read after `await _gate.WaitAsync`; see the
`:622-628` comment "a reconnect can swap Session while we wait on _gate"). So a reconnect that swaps
the session while a caller waits on `_gate` leaves these methods using a **stale session**.
**Approach:** for every `var session = RequireSession()` that precedes `await _gate.WaitAsync`, move
the authoritative read to *inside* the gate (keep an outside `_ = RequireSession()` fast-fail guard),
matching the existing read/write idiom. Add a regression test that swaps the session across the gate
wait and asserts the post-gate call uses the new session. Driver-internal; no contract change.
**Classification:** standard.
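
The race and the fix described above can be sketched outside the driver. This is a minimal, language-neutral illustration in Python (the driver itself is C#), not the driver's code: `Driver`, `require_session`, `history_read_stale`, `history_read_fixed`, and `swap_across_gate` are hypothetical names, and an `asyncio.Lock` stands in for `_gate`. It also suggests one possible shape for the regression test: hold the gate, start a read, swap the session, release the gate, then check which session the read used.

```python
import asyncio


class Driver:
    """Toy stand-in for the driver: one session reference guarded by a gate."""

    def __init__(self) -> None:
        self.session = "session-1"
        self.gate = asyncio.Lock()  # plays the role of _gate

    def require_session(self) -> str:
        if self.session is None:
            raise RuntimeError("not connected")
        return self.session

    async def history_read_stale(self) -> str:
        session = self.require_session()  # captured BEFORE the gate (the bug)
        async with self.gate:
            return session

    async def history_read_fixed(self) -> str:
        self.require_session()  # fast-fail guard only; result discarded
        async with self.gate:
            return self.require_session()  # authoritative read inside the gate


async def swap_across_gate(read_name: str) -> str:
    """Swap the session while a reader is parked on the gate."""
    driver = Driver()
    await driver.gate.acquire()  # simulate a reconnect holding the gate
    task = asyncio.create_task(getattr(driver, read_name)())
    await asyncio.sleep(0)  # let the reader run until it waits on the gate
    driver.session = "session-2"  # the reconnect swaps the session
    driver.gate.release()
    return await task


stale_result = asyncio.run(swap_across_gate("history_read_stale"))
fixed_result = asyncio.run(swap_across_gate("history_read_fixed"))
```

With the session swapped while the reader waits, the capture-before-gate variant still returns the old session (`stale_result` is `"session-1"`), and the re-resolve-inside-gate variant returns the new one (`fixed_result` is `"session-2"`). The second result is the assertion the regression test needs.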

### A2. Client.CLI `enable` / `disable` command (H4 client-driven path)
Phase 3 shipped the node-manager `OnEnableDisable` → engine Enable/DisableAsync, but there's no
client verb to invoke the OPC UA ConditionType Enable/Disable methods, so H4 was never live-driven.
**Approach:** mirror the existing `ack`/`shelve`/`confirm` command + service-method pattern — add
`EnableAsync`/`DisableAsync` to the client-side `IOpcUaClientService` (calls the OPC UA Enable/Disable
condition methods; **client app interface, NOT a Commons/wire change**) + the CLI `enable`/`disable`
commands. Unit-test the VM/service call; live-drive against the rig's scripted condition.
**Classification:** standard. (Unblocks the deferred H4 live `/run`.)

### A3. Cert-audit minor review nits *(from this session's final integration review)*
Two cleanups in the just-shipped cert-audit code:
(a) `Certificates.razor` `ConfirmAction` has two **unreachable** fallthrough arms
(`"cannot delete from {Kind}"`, `"unknown action"`) that build a `CertActionResult.Fail` *inside*
the razor, bypassing `CertificateStoreManager` → those (unreachable) failures aren't audited. Either
route them through the manager or add an explicit "unreachable defensive guard" comment.
(b) `OpcUa:PkiStoreRoot` is read in BOTH `Certificates.razor:130` and the `CertificateStoreManager`
ctor — centralize on the manager (expose a `PkiRoot` property the razor reads).
**Classification:** trivial (combined into one task; same two files).

---

## B — Design-deferred (included per request)

### B1. Write-outcome residuals
Extend the shipped compare-and-revert write path (write-outcome self-correction, master `1d797c1c`):
on a failed inbound device write, additionally (i) emit a **Bad-quality blip** on the node, (ii) raise
an OPC UA **`AuditWriteUpdateEvent`**, and (iii) add **synchronous structural fail-fast** for writes
that can be rejected before dispatch. These were out-of-scope by the chosen revert-only mechanism.
**Approach:** locate the revert continuation (node-manager `OnWriteValue`/the `IOpcUaNodeWriteGateway`
outcome handler), add the three behaviours behind the existing failure branch. Galaxy can't surface a
write failure (fire-and-forget gateway), so this only exercises on protocol drivers.
**Classification:** standard. *(Each of i/ii/iii could split if it balloons.)*

### B2. AdminUI — Galaxy re-pick preserves prior alarm-field edits
On the equipment Tag modal, re-picking a Galaxy address currently resets manually-edited alarm
fields. **Approach:** in the Galaxy-address-picked handler, merge the picked defaults *without*
clobbering already-edited `alarm` fields (same preserve-unknown idiom used elsewhere). No bUnit;
live-verify on docker-dev. **Classification:** small.
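
One way to express that merge, as a hedged sketch in Python rather than the AdminUI's actual C#/Razor code. The field names, the flat dictionary shape, and the idea of tracking edited fields in a set are all assumptions made for illustration; the real handler may detect edits differently.

```python
def merge_picked_defaults(current: dict, picked: dict, edited: set) -> dict:
    """Apply picked defaults to the form state, skipping fields the user edited."""
    merged = dict(current)  # copy so the caller's state is not mutated
    for field, default in picked.items():
        if field in edited:
            continue  # preserve the manual edit
        merged[field] = default
    return merged


# Hypothetical form state: the user changed the alarm severity by hand.
form = {"address": "Tank1.Level", "alarm_severity": 700, "alarm_message": ""}
picked_defaults = {
    "address": "Tank2.Level",
    "alarm_severity": 500,
    "alarm_message": "Level high",
}

result = merge_picked_defaults(form, picked_defaults, edited={"alarm_severity"})
```

In this example the re-pick updates `address` and fills `alarm_message` from the picked defaults, while `alarm_severity` keeps the hand-edited value of 700.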

### B3. AdminUI — inline-create-script dropdown label drift
After "New script" inline-creates + binds a script from the VirtualTagModal, the script dropdown
label can drift from the selection. **Approach:** refresh the bound-script label from the created
`SC-…` id after creation. **Classification:** trivial/small.

### B4. F10b surgical DataType / IsArray in-place writes *(RECONSIDER — previously rejected)*
Previously **decided against**: dirty (brief value-type mismatch on the live node, no
ModelChangeEvents) for rare edits → kept full rebuild. Included here only so the decision is a task,
not a silent gap. To actually build: extend `ISurgicalAddressSpaceSink.UpdateTagAttributes` to also
swap DataType/ValueRank in place + emit ModelChangeEvents, and widen `TagDeltaIsSurgicalEligible`.
**Recommendation:** keep deferred unless a concrete need appears. **Classification:** standard.

### B5. Alarm-severity `SetSeverity` surgical update *(RECONSIDER — previously rejected)*
Previously **decided against**: operationally invisible — the live alarm engine
(`ScriptedAlarmHostActor`→`AlarmStateUpdate`) overwrites the authored severity on first eval, so an
in-place node severity change is shadowed immediately. Included for completeness.
**Recommendation:** keep deferred. **Classification:** small.

---

## C — Operational / live-verify (not code; needs a rig)

### C1. Modbus-Int64 full live authoring
Code shipped (Phase 4 T1, `bd8fee61`); never live-authored because docker-dev has no seeded Modbus
driver. **Verify steps:** seed a Modbus driver on docker-dev pointed at sim `10.100.0.35:5020`, author
an Int64 equipment tag, deploy, confirm the OPC UA node advertises `DataTypeIds.Int64` and reads.
**No code** (unless seeding surfaces a gap). **Classification:** verify-only.

### C2. S7 + AbCip Test-Connect probe happy-path live-verify
Probes shipped (Phase 5); green-path skipped because the fixtures aren't on this Mac. **Verify steps:**
bring up `lmxopcua-fix up s7 s7_1500` / `up abcip controllogix` from the Windows VM, run the skip-gated
probe E2E green path. **Classification:** verify-only (needs the Windows-VM fixtures).

### C3. Device-gated proofs *(BLOCKED on hardware — capture only)*
H6 native-ack→AVEVA round-trip, Galaxy Phase C historian T7 live HistoryRead, Phase B native-alarm T9,
and AbLegacy/TwinCAT/FOCAS probe happy-paths all need real devices (Wonderware sidecar+AVEVA on
`10.100.0.48`, a Galaxy native alarm, PLC5/SLC sim, ADS target, CNC+FWLIB). Recorded so they're not
lost; **not executable** without the hardware. **Classification:** blocked.

---

## Execution shape
Independent code tasks (A1/A2/A3, B1/B2/B3) touch disjoint projects → dispatchable concurrently
in a subagent-driven run, each its own commit, per-task review by classification, final integration
review, merge+push. B4/B5 are *reconsider* gates (don't build without a fresh go-ahead). C1/C2 are
operator/rig verify; C3 is blocked. See `2026-06-19-followups-batch.md` for the executable task list.