Phase 3 — OPC UA standards completeness (H4 + H2-bit + H6) — design
Status: approved 2026-06-15. Parent roadmap: docs/plans/2026-06-15-stillpending-backlog-design.md (Phase 3). Backlog items H4 + H2 + H6 in stillpending.md §1. Branch: feat/stillpending-phase-3-opcua-standards off master 4af8e65a. Classification: high-risk (touches the OPC UA node-manager method seam, the alarm command path, the driver inbound-ack route, and the authorization permission model).
Problem
Three OPC UA standards gaps remain in the alarm + authorization surface:
- H4: alarm Enable/Disable not callable over OPC UA. The node manager wires OnAcknowledge / OnConfirm / OnAddComment / OnShelve / OnTimedUnshelve but not OnEnableDisable (AlarmCommand.cs itself flags it as "a future task"). The engine side is already done (ScriptedAlarmEngine.EnableAsync / DisableAsync, dispatched at ScriptedAlarmHostActor.cs:408-412) and the AlarmCommand vocabulary already includes Enable / Disable; only the node-manager delegate is missing.
- H2 (permission bit only): NodePermissions.HistoryUpdate missing. The flags enum has no HistoryUpdate member, so TriePermissionEvaluator.cs:86 maps OpcUaOperation.HistoryUpdate → the HistoryRead bit. A read-only HistoryRead grant would therefore also authorize a future HistoryUpdate. The full HistoryUpdate service stays deferred (infra-gated: no backend insert/replace/delete RPC).
- H6: inbound ack of a NATIVE (Galaxy) alarm never reaches the driver / AVEVA. MaterialiseAlarmCondition is the single shared materializer; both native and scripted conditions get OnAcknowledge wired to the scripted AlarmCommandRouter → ScriptedAlarmHostActor → engine. For a native alarm the engine doesn't own it, so the ack updates only the local AlarmConditionState and never flows through IAlarmSource.AcknowledgeAsync → AVEVA. There is currently no inbound path from an OPC UA ack to a driver. (The authenticated principal is already carried on the scripted path: AlarmCommand.User = identity.DisplayName; the generic "opcua-client" lives only in the raw IAlarmSource fallback in ScriptedAlarmSource.cs:75.)
Goal
Make an OPC UA client able to Enable/Disable a Part 9 condition (H4); give HistoryUpdate its own permission bit so a HistoryRead grant can't authorize it (H2-bit); and route a native-condition Acknowledge to the owning driver (→ AVEVA) carrying the authenticated principal (H6). No EF migration; the only contract touches are additive (a sink-seam bool, a Core.Abstractions record field).
Locked decisions (brainstorming 2026-06-15)
- H6 native-ack routing = a new NativeAlarmAckRouter seam on the node manager (a second Action, mirroring AlarmCommandRouter). Native conditions wire OnAcknowledge to it; scripted conditions keep the existing router. Rejected: a single router disambiguated downstream (couples the scripted host to native routing).
- H4 scoped to scripted alarms only. Enable/Disable has clear engine semantics there. Native-alarm Enable/Disable is deferred: the driver has no enable/disable surface, so a native condition's OnEnableDisable returns BadNotSupported (honest) rather than a silent no-op.
- H6 live-provability accepted as code+unit-proven locally; the AVEVA round-trip is operator-gated. The route is driver-agnostic via IAlarmSource, verified with Galaxy (the only real native-alarm source, with GatewayGalaxyAlarmAcknowledger). The gateway ack → AVEVA commit needs the real gateway (10.100.0.48), the same infra-gating class as H5's durable sink / the Galaxy live-gw smokes.
- Sequencing: all three this phase, in order of increasing risk: H2-bit → H4 → H6.
Architecture
H2-bit (small, independent)
- NodePermissions: add HistoryUpdate = 1 << 12 (next free bit after MethodCall = 1 << 11). Optionally fold into the Admin composite (write-side history op); leave Engineer / Operator without it.
- TriePermissionEvaluator.MapOperationToPermission: OpcUaOperation.HistoryUpdate => NodePermissions.HistoryUpdate (was HistoryRead). Drop the stale TODO comment.
- Pure mapping change, fully unit-testable; the HistoryUpdate service remains unimplemented (a client calling HistoryUpdate still gets the SDK's default reject, unchanged).
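The mapping change above can be sketched in isolation. This is a minimal illustration, not the production evaluator: only MethodCall = 1 << 11 and HistoryUpdate = 1 << 12 are taken from this design; the HistoryRead bit position, the trimmed OpcUaOperation enum, and the PermissionSketch helper are assumptions made so the example is self-contained.

```csharp
using System;

var readOnlyGrant = NodePermissions.HistoryRead;
var updateGrant = NodePermissions.HistoryRead | NodePermissions.HistoryUpdate;

// A HistoryRead-only grant no longer authorizes HistoryUpdate.
Console.WriteLine(PermissionSketch.IsGranted(readOnlyGrant, OpcUaOperation.HistoryUpdate)); // False
Console.WriteLine(PermissionSketch.IsGranted(updateGrant, OpcUaOperation.HistoryUpdate));   // True

[Flags]
public enum NodePermissions
{
    None = 0,
    HistoryRead = 1 << 3,    // assumed position, for illustration only
    MethodCall = 1 << 11,    // value stated in this design
    HistoryUpdate = 1 << 12, // new: next free bit after MethodCall
}

// Trimmed stand-in: the real enum has more operations.
public enum OpcUaOperation { HistoryRead, HistoryUpdate }

public static class PermissionSketch
{
    // Stand-in for TriePermissionEvaluator.MapOperationToPermission.
    public static NodePermissions MapOperationToPermission(OpcUaOperation op) => op switch
    {
        OpcUaOperation.HistoryRead => NodePermissions.HistoryRead,
        OpcUaOperation.HistoryUpdate => NodePermissions.HistoryUpdate, // was HistoryRead
        _ => throw new ArgumentOutOfRangeException(nameof(op)),
    };

    // Hypothetical helper: a grant authorizes an operation only if it carries the mapped bit.
    public static bool IsGranted(NodePermissions granted, OpcUaOperation op)
    {
        var required = MapOperationToPermission(op);
        return (granted & required) == required;
    }
}
```

Because each operation maps to its own bit, existing HistoryRead grants keep working for reads and simply stop implying the write-side permission.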
H4 (small-medium) — wire OnEnableDisable
- In MaterialiseAlarmCondition, for scripted conditions wire alarm.OnEnableDisable = (context, condition, enabling) => HandleAlarmCommand(context, condition, enabling ? "Enable" : "Disable", comment: null, unshelveAt: null); this reuses the existing AlarmAck-gated HandleAlarmCommand (which builds AlarmCommand(Enable|Disable) and routes via AlarmCommandRouter). ScriptedAlarmHostActor:408-412 already dispatches those to ScriptedAlarmEngine.Enable/DisableAsync.
- For native conditions, OnEnableDisable returns BadNotSupported (decision #2).
- Verify the SDK OnEnableDisable delegate's exact signature against the decompiled AlarmConditionState / ConditionState (the OnShelve wiring was verified the same way) and that the enable-state round-trips to the node (the engine re-projects via WriteAlarmCondition, whose delta-gate suppresses the double-event; this is the same mechanism the ack path documents at :591).
- Authz: AlarmAck (consistent with the other Part 9 alarm commands).
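The scripted/native split for Enable/Disable can be sketched with stand-in types. Nothing here uses the real OPC UA SDK: Status, Identity, AlarmCommand, and NodeManagerSketch are hypothetical, and the delegate shape is deliberately simplified because the SDK signature still has to be verified as noted above. The sketch also assumes a native condition returns BadNotSupported without consulting the identity; this design does not specify the ordering of those two checks.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var routed = new List<AlarmCommand>();
var manager = new NodeManagerSketch { AlarmCommandRouter = routed.Add };
manager.Materialise("Tank1.HiHi", isNative: false);
manager.Materialise("Pump01.Fault", isNative: true);

var alice = new Identity("alice", HasAlarmAck: true);
Console.WriteLine(manager.EnableDisable("Tank1.HiHi", alice, enabling: false));  // Good
Console.WriteLine(manager.EnableDisable("Tank1.HiHi", null, enabling: true));    // BadUserAccessDenied
Console.WriteLine(manager.EnableDisable("Pump01.Fault", alice, enabling: true)); // BadNotSupported
Console.WriteLine(routed.Count);                                                 // 1

public enum Status { Good, BadUserAccessDenied, BadNotSupported }
public sealed record Identity(string DisplayName, bool HasAlarmAck);
public sealed record AlarmCommand(string ConditionNodeId, string Kind, string User);

public sealed class NodeManagerSketch
{
    // Per-condition handler, standing in for the SDK's OnEnableDisable delegate.
    private readonly Dictionary<string, Func<Identity?, bool, Status>> _onEnableDisable = new();

    public Action<AlarmCommand>? AlarmCommandRouter { get; set; }

    public void Materialise(string nodeId, bool isNative)
    {
        if (isNative)
        {
            // Decision #2: no driver enable/disable surface, so reject honestly.
            _onEnableDisable[nodeId] = (_, _) => Status.BadNotSupported;
        }
        else
        {
            _onEnableDisable[nodeId] = (identity, enabling) =>
                HandleAlarmCommand(nodeId, identity, enabling ? "Enable" : "Disable");
        }
    }

    public Status EnableDisable(string nodeId, Identity? identity, bool enabling) =>
        _onEnableDisable[nodeId](identity, enabling);

    // Fail-closed: anonymous or missing AlarmAck never reaches the router.
    private Status HandleAlarmCommand(string nodeId, Identity? identity, string kind)
    {
        if (identity is null || !identity.HasAlarmAck)
            return Status.BadUserAccessDenied;

        AlarmCommandRouter?.Invoke(new AlarmCommand(nodeId, kind, identity.DisplayName));
        return Status.Good;
    }
}
```

Only the authorized scripted call reaches the router, which matches the node-manager test cases listed in the testing strategy.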
H6 (the meaty one) — native ack → driver → AVEVA + principal
Four moving parts:
- Mark native conditions at materialization. Add bool isNative to IOpcUaAddressSpaceSink.MaterialiseAlarmCondition (+ DeferredAddressSpaceSink, SdkAddressSpaceSink, NullOpcUaAddressSpaceSink, OtOpcUaNodeManager). Phase7Applier already knows which is which: pass isNative: true at the native equipment-tag-alarm site (:204) and false at the scripted site (:295). The node manager tracks native node ids (e.g. a _nativeAlarmNodeIds set) and wires their OnAcknowledge (and OnEnableDisable) accordingly.
- NativeAlarmAckRouter seam. A new Action<NativeAlarmAck>? on the node manager (mirrors AlarmCommandRouter). A native condition's OnAcknowledge → resolve the AlarmAck-gated principal the same way HandleAlarmCommand does (fail-closed) → invoke the router with NativeAlarmAck(ConditionNodeId, Comment, OperatorUser = identity.DisplayName) → return Good.
- Reverse map in DriverHostActor. DriverHostActor already builds the forward map _alarmNodeIdByDriverRef: (DriverInstanceId, FullName) → condition NodeId(s) (:127, rebuilt at :902). Build the inverse in the same clear-and-rebuild pass: conditionNodeId → (DriverInstanceId, FullName). The host wires the NativeAlarmAckRouter to a fire-and-forget Tell into DriverHostActor; on receipt it looks up the inverse map → (driverId, alarmRef) → routes to the DriverInstanceActor → driver.AcknowledgeAsync([ new AlarmAcknowledgeRequest(SourceNodeId, ConditionId: alarmRef, Comment, OperatorUser) ], ct).
- Carry the principal to the driver. AlarmAcknowledgeRequest(SourceNodeId, ConditionId, Comment?) gains string? OperatorUser (the field the base-interface comment already anticipated: "Stream G provides the authenticated principal"). GalaxyDriver.AcknowledgeAsync passes it to IGalaxyAlarmAcknowledger.AcknowledgeAsync(alarmFullReference, comment, operatorUser, ct) → gateway → AVEVA. Also tidy ScriptedAlarmSource.cs:75's "opcua-client" default to honor a supplied principal.
Native-alarm-source drivers that don't implement a real acknowledger (AbCip/FOCAS projections) accept the request as today (no behavioral regression); only Galaxy commits to its upstream.
OPC UA client Acknowledge on a NATIVE condition
→ OtOpcUaNodeManager.OnAcknowledge (native) [AlarmAck gate, fail-closed]
→ NativeAlarmAckRouter(NativeAlarmAck{node, comment, operatorUser})
→ DriverHostActor [inverse map conditionNodeId → (driverId, alarmRef)]
→ DriverInstanceActor → driver.AcknowledgeAsync(AlarmAcknowledgeRequest{…, OperatorUser})
→ GalaxyDriver → GatewayGalaxyAlarmAcknowledger → gateway → AVEVA
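The DriverHostActor half of this flow (build the inverse map in the same clear-and-rebuild pass, then resolve an inbound ack or drop it) might look roughly like the sketch below. It is an illustration under stated assumptions: AckRoutingSketch, _driverRefByAlarmNodeId, Rebuild, and Resolve are hypothetical names, node ids are plain strings, the actor messaging is omitted, the default null on OperatorUser is one possible way to keep the field additive, and passing the condition node id as SourceNodeId is a placeholder since this design does not say where that value comes from.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var sketch = new AckRoutingSketch();
sketch.Rebuild(new[] { ("galaxy-1", "Pump01.Fault", "ns=2;s=Pump01.Fault.Condition") });

var hit = sketch.Resolve(new NativeAlarmAck("ns=2;s=Pump01.Fault.Condition", "checked", "alice"));
Console.WriteLine(hit?.Request.OperatorUser); // alice

var miss = sketch.Resolve(new NativeAlarmAck("ns=2;s=Unknown", null, "alice"));
Console.WriteLine(miss is null);              // True

public sealed record NativeAlarmAck(string ConditionNodeId, string? Comment, string? OperatorUser);

// OperatorUser is the additive field; the null default is an assumption of this sketch.
public sealed record AlarmAcknowledgeRequest(
    string SourceNodeId, string ConditionId, string? Comment, string? OperatorUser = null);

public sealed class AckRoutingSketch
{
    // Forward map, as described for DriverHostActor.
    private readonly Dictionary<(string DriverInstanceId, string FullName), List<string>> _alarmNodeIdByDriverRef = new();
    // Inverse map (hypothetical field name).
    private readonly Dictionary<string, (string DriverInstanceId, string FullName)> _driverRefByAlarmNodeId = new();

    // Both maps are cleared and rebuilt together so they cannot drift apart.
    public void Rebuild(IEnumerable<(string DriverInstanceId, string FullName, string ConditionNodeId)> alarms)
    {
        _alarmNodeIdByDriverRef.Clear();
        _driverRefByAlarmNodeId.Clear();
        foreach (var (driverId, fullName, nodeId) in alarms)
        {
            var key = (driverId, fullName);
            if (!_alarmNodeIdByDriverRef.TryGetValue(key, out var nodes))
                _alarmNodeIdByDriverRef[key] = nodes = new List<string>();
            nodes.Add(nodeId);
            _driverRefByAlarmNodeId[nodeId] = key;
        }
    }

    // Returns null (log + drop, never throw) when the condition node has no driver match.
    public (string DriverInstanceId, AlarmAcknowledgeRequest Request)? Resolve(NativeAlarmAck ack)
    {
        if (!_driverRefByAlarmNodeId.TryGetValue(ack.ConditionNodeId, out var driverRef))
        {
            Console.Error.WriteLine($"native ack dropped: no driver for {ack.ConditionNodeId}");
            return null;
        }

        var request = new AlarmAcknowledgeRequest(
            SourceNodeId: ack.ConditionNodeId, // placeholder; the real source is not specified here
            ConditionId: driverRef.FullName,
            Comment: ack.Comment,
            OperatorUser: ack.OperatorUser);
        return (driverRef.DriverInstanceId, request);
    }
}
```

In the real actor the resolved pair would be forwarded to the owning DriverInstanceActor, and the unmatched case would only log, since the ack has already been applied to the local condition state.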
Error handling / edge cases
- Anonymous / missing AlarmAck role on any of H4/H6 → BadUserAccessDenied (fail-closed, same as the existing alarm methods).
- Native ack with no driver match (condition node not in the inverse map, e.g. a scripted alarm wrongly routed) → log + drop; never throw under the node-manager Lock. The router Tell is fire-and-forget (safe under Lock, like AlarmCommandRouter).
- Driver faulted / reconnecting → AcknowledgeAsync fails or no-ops per the driver; the local condition state still reflects the SDK-applied ack (the OPC UA client sees Good). Surfacing a failed upstream ack back to the client is out of scope (mirrors the Galaxy write-outcome fire-and-forget limitation).
- H4 native Enable/Disable → BadNotSupported (decision #2).
- H2-bit: no runtime behavior change for existing grants except closing the HistoryRead ⇒ HistoryUpdate authorization hole; the service itself stays rejected.
Testing strategy (xUnit + Shouldly, TDD; NO bUnit)
- H2-bit: TriePermissionEvaluator test: a node granting only HistoryRead does not authorize OpcUaOperation.HistoryUpdate (now NotGranted); a grant including the new HistoryUpdate bit does. Enum bit-value/round-trip test (no collision with MethodCall).
- H4: node-manager test: OnEnableDisable with an AlarmAck identity routes an Enable / Disable AlarmCommand through a captured AlarmCommandRouter; anonymous → BadUserAccessDenied, no route; a native condition → BadNotSupported. Engine path (ScriptedAlarmHostActor Enable/Disable → ScriptedAlarmEngine) is already covered.
- H6: node-manager test: a native condition's OnAcknowledge (AlarmAck identity) invokes the NativeAlarmAckRouter with the right (node, comment, operatorUser=DisplayName) and not the scripted AlarmCommandRouter; a scripted condition still uses AlarmCommandRouter. DriverHostActor test: the inverse map resolves a condition node → (driverId, alarmRef) and a NativeAlarmAck routes to a stub driver's AcknowledgeAsync with OperatorUser populated. AlarmAcknowledgeRequest byte/field test for the new OperatorUser. Galaxy: a unit test that GalaxyDriver.AcknowledgeAsync forwards OperatorUser to the acknowledger; the live AVEVA commit is the existing skip-gated GatewayGalaxy…LiveTests pattern.
- Live /run: H4 Enable/Disable proven end-to-end on docker-dev (author a scripted alarm, Disable via Client.CLI, confirm the condition disables); H6's local half proven (native ack routes to the driver, with log evidence), the AVEVA round-trip operator-gated against 10.100.0.48 (decision #3).
Verification per item
- dotnet build clean (TreatWarningsAsErrors on production); full dotnet test green.
- High-risk review chain (serial spec → code) per task + a final integration review.
- Live /run: H4 driven by the agent on docker-dev (login disabled); H6 local route agent-driven, AVEVA commit recorded as operator-gated (or driven against the reachable gateway if feasible this session).
Alternatives considered
- Single command router, disambiguate native vs scripted in the host — rejected (decision #1): pushes native routing knowledge into the scripted host; the dual-router split keeps each path's concern local.
- Native Enable/Disable now — rejected (decision #2): no driver enable/disable surface; murky semantics (the driver keeps producing events).
- Add the HistoryUpdate service — out of scope / deferred (infra-gated; on the roadmap defer list).
- Pass the principal out-of-band instead of on AlarmAcknowledgeRequest: rejected, because the record is the natural carrier and the base-interface comment already anticipated it; an additive field is the least surprising.
Hard constraints (carried from the roadmap)
NO Configuration entity / EF migration. The only contract touches are additive: a bool isNative on the
internal IOpcUaAddressSpaceSink sink seam, and string? OperatorUser on the AlarmAcknowledgeRequest
Core.Abstractions record. Stage by path — never git add .; never stage sql_login.txt,
src/Server/.../Host/pki/, pending.md, current.md, docker-dev/docker-compose.yml, stillpending.md.
Never echo or commit secrets (gateway API key). No force-push, no --no-verify. Razor/runtime behavior proven
only by live /run, never bUnit.