Files
lmxopcua/docs/plans/2026-06-15-stillpending-phase-3-opcua-standards-design.md
T

13 KiB

Phase 3 — OPC UA standards completeness (H4 + H2-bit + H6) — design

Status: approved 2026-06-15. Parent roadmap: docs/plans/2026-06-15-stillpending-backlog-design.md (Phase 3). Backlog items H4 + H2 + H6 in stillpending.md §1. Branch: feat/stillpending-phase-3-opcua-standards off master 4af8e65a. Classification: high-risk (touches the OPC UA node-manager method seam, the alarm command path, the driver inbound-ack route, and the authorization permission model).

Problem

Three OPC UA standards gaps remain in the alarm + authorization surface:

  • H4 — alarm Enable/Disable not callable over OPC UA. The node manager wires OnAcknowledge/OnConfirm/OnAddComment/OnShelve/OnTimedUnshelve but not OnEnableDisable (AlarmCommand.cs itself flags it as "a future task"). The engine side is already done (ScriptedAlarmEngine.EnableAsync/DisableAsync, dispatched at ScriptedAlarmHostActor.cs:408-412) and the AlarmCommand vocabulary already includes Enable/Disable — only the node-manager delegate is missing.
  • H2 (permission bit only) — NodePermissions.HistoryUpdate missing. The flags enum has no HistoryUpdate member, so TriePermissionEvaluator.cs:86 maps OpcUaOperation.HistoryUpdate → the HistoryRead bit. A read-only HistoryRead grant would therefore also authorize a future HistoryUpdate. The full HistoryUpdate service stays deferred (infra-gated — no backend insert/replace/delete RPC).
  • H6 — inbound ack of a NATIVE (Galaxy) alarm never reaches the driver / AVEVA. MaterialiseAlarmCondition is the single shared materializer; both native and scripted conditions get OnAcknowledge wired to the scripted AlarmCommandRouterScriptedAlarmHostActor → engine. For a native alarm the engine doesn't own it, so the ack updates only the local AlarmConditionState and never flows through IAlarmSource.AcknowledgeAsync → AVEVA. There is currently no inbound path from an OPC UA ack to a driver. (The authenticated principal is already carried on the scripted path — AlarmCommand.User = identity.DisplayName; the generic "opcua-client" lives only in the raw IAlarmSource fallback in ScriptedAlarmSource.cs:75.)

Goal

Make an OPC UA client able to Enable/Disable a Part 9 condition (H4); give HistoryUpdate its own permission bit so a HistoryRead grant can't authorize it (H2-bit); and route a native-condition Acknowledge to the owning driver (→ AVEVA) carrying the authenticated principal (H6). No EF migration; the only contract touches are additive (a sink-seam bool, a Core.Abstractions record field).

Locked decisions (brainstorming 2026-06-15)

  1. H6 native-ack routing = a new NativeAlarmAckRouter seam on the node manager (a second Action, mirroring AlarmCommandRouter). Native conditions wire OnAcknowledge to it; scripted conditions keep the existing router. Rejected: a single router disambiguated downstream (couples the scripted host to native routing).
  2. H4 scoped to scripted alarms only. Enable/Disable has clear engine semantics there. Native-alarm Enable/Disable is deferred — the driver has no enable/disable surface; a native condition's OnEnableDisable returns BadNotSupported (honest) rather than a silent no-op.
  3. H6 live-provability accepted as code+unit-proven locally; the AVEVA round-trip is operator-gated. The route is driver-agnostic via IAlarmSource, verified with Galaxy (the only real native-alarm source, with GatewayGalaxyAlarmAcknowledger). The gateway ack→AVEVA commit needs the real gateway (10.100.0.48), the same infra-gating class as H5's durable sink / the Galaxy live-gw smokes.
  4. Sequencing: all three this phase, increasing risk — H2-bit → H4 → H6.

Architecture

H2-bit (small, independent)

  • NodePermissions: add HistoryUpdate = 1 << 12 (next free bit after MethodCall = 1 << 11). Optionally fold into the Admin composite (write-side history op); leave Engineer/Operator without it.
  • TriePermissionEvaluator.MapOperationToPermission: OpcUaOperation.HistoryUpdate => NodePermissions.HistoryUpdate (was HistoryRead). Drop the stale TODO comment.
  • Pure mapping change — fully unit-testable; the HistoryUpdate service remains unimplemented (a client calling HistoryUpdate still gets the SDK's default reject — unchanged).

H4 (small-medium) — wire OnEnableDisable

  • In MaterialiseAlarmCondition, for scripted conditions wire alarm.OnEnableDisable = (context, condition, enabling) => HandleAlarmCommand(context, condition, enabling ? "Enable" : "Disable", comment: null, unshelveAt: null); — reusing the existing AlarmAck-gated HandleAlarmCommand (which builds AlarmCommand(Enable|Disable) and routes via AlarmCommandRouter). ScriptedAlarmHostActor:408-412 already dispatches those to ScriptedAlarmEngine.Enable/DisableAsync.
  • For native conditions, OnEnableDisable returns BadNotSupported (decision #2).
  • Verify the SDK OnEnableDisable delegate's exact signature against the decompiled AlarmConditionState/ConditionState (the OnShelve wiring was verified the same way) and that the enable-state round-trips to the node (the engine re-projects via WriteAlarmCondition, whose delta-gate suppresses the double-event — same mechanism the ack path documents at :591).
  • Authz: AlarmAck (consistent with the other Part 9 alarm commands).

H6 (the meaty one) — native ack → driver → AVEVA + principal

Four moving parts:

  1. Mark native conditions at materialization. Add bool isNative to IOpcUaAddressSpaceSink.MaterialiseAlarmCondition (+ DeferredAddressSpaceSink, SdkAddressSpaceSink, NullOpcUaAddressSpaceSink, OtOpcUaNodeManager). Phase7Applier already knows which is which: pass isNative: true at the native equipment-tag-alarm site (:204) and false at the scripted site (:295). The node manager tracks native node ids (e.g. a _nativeAlarmNodeIds set) and wires their OnAcknowledge (and OnEnableDisable) accordingly.
  2. NativeAlarmAckRouter seam. A new Action<NativeAlarmAck>? on the node manager (mirrors AlarmCommandRouter). A native condition's OnAcknowledge → resolve the AlarmAck-gated principal the same way HandleAlarmCommand does (fail-closed) → invoke the router with NativeAlarmAck(ConditionNodeId, Comment, OperatorUser = identity.DisplayName) → return Good.
  3. Reverse map in DriverHostActor. DriverHostActor already builds the forward map _alarmNodeIdByDriverRef: (DriverInstanceId, FullName) → condition NodeId(s) (:127, rebuilt at :902). Build the inverse in the same clear-and-rebuild pass: conditionNodeId → (DriverInstanceId, FullName). The host wires the NativeAlarmAckRouter to a fire-and-forget Tell into DriverHostActor; on receipt it looks up the inverse map → (driverId, alarmRef) → routes to the DriverInstanceActordriver.AcknowledgeAsync([ new AlarmAcknowledgeRequest(SourceNodeId, ConditionId: alarmRef, Comment, OperatorUser) ], ct).
  4. Carry the principal to the driver. AlarmAcknowledgeRequest(SourceNodeId, ConditionId, Comment?) gains string? OperatorUser (the field the base-interface comment already anticipated — "Stream G provides the authenticated principal"). GalaxyDriver.AcknowledgeAsync passes it to IGalaxyAlarmAcknowledger.AcknowledgeAsync(alarmFullReference, comment, operatorUser, ct) → gateway → AVEVA. Also tidy ScriptedAlarmSource.cs:75's "opcua-client" default to honor a supplied principal.

Native-alarm-source drivers that don't implement a real acknowledger (AbCip/FOCAS projections) accept the request as today (no behavioral regression); only Galaxy commits to its upstream.

OPC UA client Acknowledge on a NATIVE condition
  → OtOpcUaNodeManager.OnAcknowledge (native)         [AlarmAck gate, fail-closed]
  → NativeAlarmAckRouter(NativeAlarmAck{node, comment, operatorUser})
  → DriverHostActor  [inverse map conditionNodeId → (driverId, alarmRef)]
  → DriverInstanceActor → driver.AcknowledgeAsync(AlarmAcknowledgeRequest{…, OperatorUser})
  → GalaxyDriver → GatewayGalaxyAlarmAcknowledger → gateway → AVEVA

Error handling / edge cases

  • Anonymous / missing AlarmAck role on any of H4/H6 → BadUserAccessDenied (fail-closed, same as the existing alarm methods).
  • Native ack with no driver match (condition node not in the inverse map — e.g. a scripted alarm wrongly routed) → log + drop; never throw under the node-manager Lock. The router Tell is fire-and-forget (safe under Lock, like AlarmCommandRouter).
  • Driver faulted / reconnectingAcknowledgeAsync fails or no-ops per the driver; the local condition state still reflects the SDK-applied ack (the OPC UA client sees Good). Surfacing a failed upstream ack back to the client is out of scope (mirrors the Galaxy write-outcome fire-and-forget limitation).
  • H4 native Enable/DisableBadNotSupported (decision #2).
  • H2-bit: no runtime behavior change for existing grants except closing the HistoryRead⇒HistoryUpdate authorization hole; the service itself stays rejected.

Testing strategy (xUnit + Shouldly, TDD; NO bUnit)

  • H2-bit: TriePermissionEvaluator test — a node granting only HistoryRead does not authorize OpcUaOperation.HistoryUpdate (now NotGranted); a grant including the new HistoryUpdate bit does. Enum bit-value/round-trip test (no collision with MethodCall).
  • H4: node-manager test — OnEnableDisable with an AlarmAck identity routes an Enable/Disable AlarmCommand through a captured AlarmCommandRouter; anonymous → BadUserAccessDenied, no route; a native condition → BadNotSupported. Engine path (ScriptedAlarmHostActor Enable/Disable → ScriptedAlarmEngine) is already covered.
  • H6: node-manager test — a native condition's OnAcknowledge (AlarmAck identity) invokes the NativeAlarmAckRouter with the right (node, comment, operatorUser=DisplayName) and not the scripted AlarmCommandRouter; a scripted condition still uses AlarmCommandRouter. DriverHostActor test — the inverse map resolves a condition node → (driverId, alarmRef) and a NativeAlarmAck routes to a stub driver's AcknowledgeAsync with OperatorUser populated. AlarmAcknowledgeRequest byte/field test for the new OperatorUser. Galaxy: a unit test that GalaxyDriver.AcknowledgeAsync forwards OperatorUser to the acknowledger; the live AVEVA commit is the existing skip-gated GatewayGalaxy…LiveTests pattern.
  • Live /run: H4 Enable/Disable proven end-to-end on docker-dev (author a scripted alarm, Disable via Client.CLI, confirm the condition disables); H6's local half proven (native ack routes to the driver — log evidence), the AVEVA round-trip operator-gated against 10.100.0.48 (decision #3).

Verification per item

  • dotnet build clean (TreatWarningsAsErrors on production); full dotnet test green.
  • High-risk review chain (serial spec → code) per task + a final integration review.
  • Live /run: H4 driven by the agent on docker-dev (login disabled); H6 local route agent-driven, AVEVA commit recorded as operator-gated (or driven against the reachable gateway if feasible this session).

Alternatives considered

  • Single command router, disambiguate native vs scripted in the host — rejected (decision #1): pushes native routing knowledge into the scripted host; the dual-router split keeps each path's concern local.
  • Native Enable/Disable now — rejected (decision #2): no driver enable/disable surface; murky semantics (the driver keeps producing events).
  • Add the HistoryUpdate service — out of scope / deferred (infra-gated; on the roadmap defer list).
  • Pass the principal out-of-band instead of on AlarmAcknowledgeRequest — rejected: the record is the natural carrier and the base-interface comment already anticipated it; an additive field is the least surprising.

Hard constraints (carried from the roadmap)

NO Configuration entity / EF migration. The only contract touches are additive: a bool isNative on the internal IOpcUaAddressSpaceSink sink seam, and string? OperatorUser on the AlarmAcknowledgeRequest Core.Abstractions record. Stage by path — never git add .; never stage sql_login.txt, src/Server/.../Host/pki/, pending.md, current.md, docker-dev/docker-compose.yml, stillpending.md. Never echo or commit secrets (gateway API key). No force-push, no --no-verify. Razor/runtime behavior proven only by live /run, never bUnit.