Non-architectural follow-ups batch — Design
Date: 2026-06-19
Status: Approved (planning) — for later subagent-driven execution
Base: master f57aa8fa
This batches every non-architectural open follow-up (the A/B/C survey) into one design + plan so it can be executed task-by-task later. Items are grouped by how actionable they are:
- A — actionable code now (buildable, no infra/contract change). Do these.
- B — design-deferred (decided not-worth-it, or out-of-scope by a prior mechanism choice). Included per request; two carry a reconsider flag (we previously rejected them).
- C — operational / live-verify (code already shipped; needs a rig/fixtures/devices up). Captured as verify tasks, not code; the device-gated ones are blocked on hardware.
Excluded (architectural / infra-gated, by request): H2 HistoryUpdate service, H5b durable
IHistoryWriter, per-Akka-member /hosts nesting (needs a Commons field on DriverHealthChanged),
driver IHistoryProvider→server HistoryRead bridge, modified-value history, array writes /
S7 wide-type arrays, the Wonderware SDK-semantic permanent boundary, full-stack WebSocket+JWT
DriverStatusHub test.
Standing guardrails (all tasks): no EF migration, no Commons/proto/wire change, no bUnit;
stage by explicit path; never stage sql_login.txt, Host/pki/, docker-dev/docker-compose.yml,
pending.md, current.md, or stillpending.md; no --no-verify or force-push; dangerouslyDisableSandbox
for build/test/rig. Each code task = its own commit; finish a batch = merge to master + push.
A — Actionable code
A1. OpcUaClient history session-capture-before-gate race (strongest — real latent bug)
OpcUaClientDriver.cs captures var session = RequireSession() before acquiring _gate at
lines 1134, 1299, 1413, 1618 (ExecuteHistoryReadAsync, the funnel for ReadRaw/Processed/AtTime),
and 1788 — whereas the read/write paths deliberately re-resolve inside the gate
(_ = RequireSession() guard at 624/714/829, then re-read after await _gate.WaitAsync; see the
:622-628 comment "a reconnect can swap Session while we wait on _gate"). So a reconnect that swaps
the session while a caller waits on _gate leaves these methods using a stale session.
Approach: for every var session = RequireSession() that precedes await _gate.WaitAsync, move
the authoritative read to inside the gate (keep an outside _ = RequireSession() fast-fail guard),
matching the existing read/write idiom. Add a regression test that swaps the session across the gate
wait and asserts the post-gate call uses the new session. Driver-internal; no contract change.
Classification: standard.
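The race and the proposed fix can be sketched in a few lines. This is a minimal, language-neutral illustration written in Python with asyncio rather than the driver's C#; the Driver class, its method names, and the string "sessions" are hypothetical stand-ins, and only the shape (outside fast-fail guard, authoritative read inside the gate, a test that swaps the session during the gate wait) is taken from the approach above.

```python
import asyncio
from typing import Optional


class Driver:
    """Toy stand-in for the driver: a swappable session guarded by a gate."""

    def __init__(self, session: str) -> None:
        self._session: Optional[str] = session
        self._gate = asyncio.Lock()  # plays the role of _gate

    def require_session(self) -> str:
        # Plays the role of RequireSession(): fail fast when not connected.
        if self._session is None:
            raise RuntimeError("not connected")
        return self._session

    def swap_session(self, new_session: str) -> None:
        # What a reconnect does: replace the session reference.
        self._session = new_session

    async def history_read_racy(self) -> str:
        session = self.require_session()  # captured BEFORE the gate wait
        async with self._gate:
            return session  # may be stale by the time the gate is held

    async def history_read_fixed(self) -> str:
        self.require_session()  # outside guard: fast-fail only, result discarded
        async with self._gate:
            return self.require_session()  # authoritative read inside the gate


async def read_across_reconnect(method_name: str) -> str:
    """Hold the gate, start a read, swap the session, then release the gate."""
    driver = Driver("old")
    await driver._gate.acquire()
    task = asyncio.create_task(getattr(driver, method_name)())
    await asyncio.sleep(0)  # let the reader run until it blocks on the gate
    driver.swap_session("new")
    driver._gate.release()
    return await task
```

In this sketch, read_across_reconnect("history_read_racy") returns the old session and read_across_reconnect("history_read_fixed") returns the new one, which is the assertion the regression test needs to make against the real driver.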
A2. Client.CLI enable / disable command (H4 client-driven path)
Phase 3 shipped the node-manager OnEnableDisable → engine Enable/DisableAsync, but there's no
client verb to invoke the OPC UA ConditionType Enable/Disable methods, so H4 was never live-driven.
Approach: mirror the existing ack/shelve/confirm command + service-method pattern — add
EnableAsync/DisableAsync to the client-side IOpcUaClientService (calls the OPC UA Enable/Disable
condition methods; client app interface, NOT a Commons/wire change) + the CLI enable/disable
commands. Unit-test the VM/service call; live-drive against the rig's scripted condition.
Classification: standard. (Unblocks the deferred H4 live /run.)
A3. Cert-audit minor review nits (from this session's final integration review)
Two cleanups in the just-shipped cert-audit code:
(a) Certificates.razor ConfirmAction has two unreachable fallthrough arms
("cannot delete from {Kind}", "unknown action") that build a CertActionResult.Fail inside
the razor, bypassing CertificateStoreManager, so those failures would not be audited if they
were ever reached. Either route them through the manager or add an explicit "unreachable
defensive guard" comment.
(b) OpcUa:PkiStoreRoot is read in BOTH Certificates.razor:130 and the CertificateStoreManager
ctor — centralize on the manager (expose a PkiRoot property the razor reads).
Classification: trivial (combined into one task; same two files).
B — Design-deferred (included per request)
B1. Write-outcome residuals
Extend the shipped compare-and-revert write path (write-outcome self-correction, master 1d797c1c):
on a failed inbound device write, additionally (i) emit a Bad-quality blip on the node, (ii) raise
an OPC UA AuditWriteUpdateEvent, and (iii) add synchronous structural fail-fast for writes
that can be rejected before dispatch. These were out-of-scope by the chosen revert-only mechanism.
Approach: locate the revert continuation (node-manager OnWriteValue/the IOpcUaNodeWriteGateway
outcome handler), add the three behaviours behind the existing failure branch. Galaxy can't surface a
write failure (fire-and-forget gateway), so this path is only exercised on protocol drivers.
Classification: standard. (Each of i/ii/iii could split if it balloons.)
B2. AdminUI — Galaxy re-pick preserves prior alarm-field edits
On the equipment Tag modal, re-picking a Galaxy address currently resets manually-edited alarm
fields. Approach: in the Galaxy-address-picked handler, merge the picked defaults without
clobbering already-edited alarm fields (same preserve-unknown idiom used elsewhere). No bUnit;
live-verify on docker-dev. Classification: small.
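A minimal sketch of the merge, in Python for brevity rather than the AdminUI's own code. The field names and the idea of tracking user-edited fields in a set are assumptions made for illustration; the modal may track edits differently.

```python
def merge_picked_defaults(current: dict, picked_defaults: dict, edited: set) -> dict:
    """Apply the picked defaults, but keep every field the user already edited."""
    merged = dict(current)  # copy so the caller's state is not mutated
    for field, value in picked_defaults.items():
        if field not in edited:
            merged[field] = value
    return merged


# Hypothetical modal state: the user changed the severity by hand.
current = {"address": "Tank1.Level", "alarm_severity": 800, "alarm_message": "Tank1 high"}
picked = {"address": "Tank2.Level", "alarm_severity": 500, "alarm_message": "Tank2 high"}
result = merge_picked_defaults(current, picked, edited={"alarm_severity"})
```

Here result takes the new address and message from the re-pick but keeps the hand-edited severity of 800.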
B3. AdminUI — inline-create-script dropdown label drift
After "New script" inline-creates + binds a script from the VirtualTagModal, the script dropdown
label can drift from the selection. Approach: refresh the bound-script label from the created
SC-… id after creation. Classification: trivial/small.
B4. F10b surgical DataType / IsArray in-place writes (RECONSIDER — previously rejected)
Previously decided against: the in-place swap is dirty (brief value-type mismatch on the live node,
no ModelChangeEvents) and such edits are rare, so the full rebuild was kept. Included here only so the decision is a task,
not a silent gap. To actually build: extend ISurgicalAddressSpaceSink.UpdateTagAttributes to also
swap DataType/ValueRank in place + emit ModelChangeEvents, and widen TagDeltaIsSurgicalEligible.
Recommendation: keep deferred unless a concrete need appears. Classification: standard.
B5. Alarm-severity SetSeverity surgical update (RECONSIDER — previously rejected)
Previously decided against: operationally invisible — the live alarm engine
(ScriptedAlarmHostActor→AlarmStateUpdate) overwrites the authored severity on first eval, so an
in-place node severity change is shadowed immediately. Included for completeness.
Recommendation: keep deferred. Classification: small.
C — Operational / live-verify (not code; needs a rig)
C1. Modbus-Int64 full live authoring
Code shipped (Phase 4 T1, bd8fee61); never live-authored because docker-dev has no seeded Modbus
driver. Verify steps: seed a Modbus driver on docker-dev pointed at sim 10.100.0.35:5020, author
an Int64 equipment tag, deploy, confirm the OPC UA node advertises DataTypeIds.Int64 and reads.
No code (unless seeding surfaces a gap). Classification: verify-only.
C2. S7 + AbCip Test-Connect probe happy-path live-verify
Probes shipped (Phase 5); green-path skipped because the fixtures aren't on this Mac. Verify steps:
run lmxopcua-fix up s7 s7_1500 and lmxopcua-fix up abcip controllogix from the Windows VM, then run
the skip-gated probe E2E green path. Classification: verify-only (needs the Windows-VM fixtures).
C3. Device-gated proofs (BLOCKED on hardware — capture only)
H6 native-ack→AVEVA round-trip, Galaxy Phase C historian T7 live HistoryRead, Phase B native-alarm T9,
and AbLegacy/TwinCAT/FOCAS probe happy-paths all need real devices (Wonderware sidecar+AVEVA on
10.100.0.48, a Galaxy native alarm, PLC5/SLC sim, ADS target, CNC+FWLIB). Recorded so they're not
lost; not executable without the hardware. Classification: blocked.
Execution shape
Independent code tasks (A1/A2/A3, B1/B2/B3) touch disjoint projects → dispatchable concurrently
in a subagent-driven run, each its own commit, per-task review by classification, final integration
review, merge+push. B4/B5 are reconsider gates (don't build without a fresh go-ahead). C1/C2 are
operator/rig verify; C3 is blocked. See 2026-06-19-followups-batch.md for the executable task list.