docs(alarms): Phase 3 design — OPC UA standards completeness (H4 Enable/Disable + H2 HistoryUpdate bit + H6 native-ack→AVEVA)
This commit is contained in:
@@ -0,0 +1,173 @@
|
||||
# Phase 3 — OPC UA standards completeness (H4 + H2-bit + H6) — design
|
||||
|
||||
> **Status:** approved 2026-06-15. Parent roadmap: `docs/plans/2026-06-15-stillpending-backlog-design.md`
|
||||
> (Phase 3). Backlog items **H4 + H2 + H6** in `stillpending.md` §1.
|
||||
> **Branch:** `feat/stillpending-phase-3-opcua-standards` off master `4af8e65a`.
|
||||
> **Classification:** high-risk (touches the OPC UA node-manager method seam, the alarm command path,
|
||||
> the driver inbound-ack route, and the authorization permission model).
|
||||
|
||||
## Problem
|
||||
|
||||
Three OPC UA standards gaps remain in the alarm + authorization surface:
|
||||
|
||||
- **H4 — alarm `Enable`/`Disable` not callable over OPC UA.** The node manager wires
|
||||
`OnAcknowledge`/`OnConfirm`/`OnAddComment`/`OnShelve`/`OnTimedUnshelve` but **not** `OnEnableDisable`
|
||||
(`AlarmCommand.cs` itself flags it as "a future task"). The engine side is **already done**
|
||||
(`ScriptedAlarmEngine.EnableAsync`/`DisableAsync`, dispatched at `ScriptedAlarmHostActor.cs:408-412`) and
|
||||
the `AlarmCommand` vocabulary already includes `Enable`/`Disable` — only the node-manager delegate is missing.
|
||||
- **H2 (permission bit only) — `NodePermissions.HistoryUpdate` missing.** The flags enum has no
|
||||
`HistoryUpdate` member, so `TriePermissionEvaluator.cs:86` maps `OpcUaOperation.HistoryUpdate → the
|
||||
HistoryRead bit`. A read-only HistoryRead grant would therefore also authorize a future HistoryUpdate. The
|
||||
full HistoryUpdate **service** stays **deferred** (infra-gated — no backend insert/replace/delete RPC).
|
||||
- **H6 — inbound ack of a NATIVE (Galaxy) alarm never reaches the driver / AVEVA.**
|
||||
`MaterialiseAlarmCondition` is the **single shared** materializer; both native and scripted conditions get
|
||||
`OnAcknowledge` wired to the **scripted** `AlarmCommandRouter` → `ScriptedAlarmHostActor` → engine. For a
|
||||
native alarm the engine doesn't own it, so the ack updates only the local `AlarmConditionState` and never
|
||||
flows through `IAlarmSource.AcknowledgeAsync` → AVEVA. There is currently **no** inbound path from an OPC UA
|
||||
ack to a driver. (The authenticated principal is *already* carried on the scripted path —
|
||||
`AlarmCommand.User = identity.DisplayName`; the generic `"opcua-client"` lives only in the raw
|
||||
`IAlarmSource` fallback in `ScriptedAlarmSource.cs:75`.)
|
||||
|
||||
## Goal
|
||||
|
||||
Make an OPC UA client able to Enable/Disable a Part 9 condition (H4); give HistoryUpdate its own permission
|
||||
bit so a HistoryRead grant can't authorize it (H2-bit); and route a native-condition Acknowledge to the
|
||||
owning driver (→ AVEVA) carrying the authenticated principal (H6). No EF migration; the only contract
|
||||
touches are additive (a sink-seam bool, a Core.Abstractions record field).
|
||||
|
||||
## Locked decisions (brainstorming 2026-06-15)
|
||||
|
||||
1. **H6 native-ack routing = a new `NativeAlarmAckRouter` seam** on the node manager (a second `Action`,
|
||||
mirroring `AlarmCommandRouter`). Native conditions wire `OnAcknowledge` to it; scripted conditions keep the
|
||||
existing router. Rejected: a single router disambiguated downstream (couples the scripted host to native
|
||||
routing).
|
||||
2. **H4 scoped to scripted alarms only.** Enable/Disable has clear engine semantics there. Native-alarm
|
||||
Enable/Disable is deferred — the driver has no enable/disable surface; a native condition's
|
||||
`OnEnableDisable` returns `BadNotSupported` (honest) rather than a silent no-op.
|
||||
3. **H6 live-provability accepted as code+unit-proven locally; the AVEVA round-trip is operator-gated.** The
|
||||
route is driver-agnostic via `IAlarmSource`, verified with **Galaxy** (the only real native-alarm source,
|
||||
with `GatewayGalaxyAlarmAcknowledger`). The gateway ack→AVEVA commit needs the real gateway
|
||||
(`10.100.0.48`), the same infra-gating class as H5's durable sink / the Galaxy live-gw smokes.
|
||||
4. **Sequencing:** all three this phase, increasing risk — **H2-bit → H4 → H6.**
|
||||
|
||||
## Architecture
|
||||
|
||||
### H2-bit (small, independent)
|
||||
- `NodePermissions`: add `HistoryUpdate = 1 << 12` (next free bit after `MethodCall = 1 << 11`). Optionally
|
||||
fold into the `Admin` composite (write-side history op); leave `Engineer`/`Operator` without it.
|
||||
- `TriePermissionEvaluator.MapOperationToPermission`: `OpcUaOperation.HistoryUpdate => NodePermissions.HistoryUpdate`
|
||||
(was `HistoryRead`). Drop the stale TODO comment.
|
||||
- Pure mapping change — fully unit-testable; the HistoryUpdate **service** remains unimplemented (a client
|
||||
calling HistoryUpdate still gets the SDK's default reject — unchanged).
|
||||
|
||||
### H4 (small-medium) — wire `OnEnableDisable`
|
||||
- In `MaterialiseAlarmCondition`, for **scripted** conditions wire
|
||||
`alarm.OnEnableDisable = (context, condition, enabling) => HandleAlarmCommand(context, condition,
|
||||
enabling ? "Enable" : "Disable", comment: null, unshelveAt: null);` — reusing the existing AlarmAck-gated
|
||||
`HandleAlarmCommand` (which builds `AlarmCommand(Enable|Disable)` and routes via `AlarmCommandRouter`).
|
||||
`ScriptedAlarmHostActor:408-412` already dispatches those to `ScriptedAlarmEngine.Enable/DisableAsync`.
|
||||
- For **native** conditions, `OnEnableDisable` returns `BadNotSupported` (decision #2).
|
||||
- Verify the SDK `OnEnableDisable` delegate's exact signature against the decompiled
|
||||
`AlarmConditionState`/`ConditionState` (the `OnShelve` wiring was verified the same way) and that the
|
||||
enable-state round-trips to the node (the engine re-projects via `WriteAlarmCondition`, whose delta-gate
|
||||
suppresses the double-event — same mechanism the ack path documents at `:591`).
|
||||
- **Authz:** AlarmAck (consistent with the other Part 9 alarm commands).
|
||||
|
||||
### H6 (the meaty one) — native ack → driver → AVEVA + principal
|
||||
Four moving parts:
|
||||
|
||||
1. **Mark native conditions at materialization.** Add `bool isNative` to
|
||||
`IOpcUaAddressSpaceSink.MaterialiseAlarmCondition` (+ `DeferredAddressSpaceSink`, `SdkAddressSpaceSink`,
|
||||
`NullOpcUaAddressSpaceSink`, `OtOpcUaNodeManager`). `Phase7Applier` already knows which is which: pass
|
||||
`isNative: true` at the native equipment-tag-alarm site (`:204`) and `false` at the scripted site (`:295`).
|
||||
The node manager tracks native node ids (e.g. a `_nativeAlarmNodeIds` set) and wires their `OnAcknowledge`
|
||||
(and `OnEnableDisable`) accordingly.
|
||||
2. **`NativeAlarmAckRouter` seam.** A new `Action<NativeAlarmAck>?` on the node manager (mirrors
|
||||
`AlarmCommandRouter`). A native condition's `OnAcknowledge` → resolve the AlarmAck-gated principal the same
|
||||
way `HandleAlarmCommand` does (fail-closed) → invoke the router with
|
||||
`NativeAlarmAck(ConditionNodeId, Comment, OperatorUser = identity.DisplayName)` → return `Good`.
|
||||
3. **Reverse map in `DriverHostActor`.** `DriverHostActor` already builds the forward map
|
||||
`_alarmNodeIdByDriverRef: (DriverInstanceId, FullName) → condition NodeId(s)` (`:127`, rebuilt at `:902`).
|
||||
Build the **inverse** in the same clear-and-rebuild pass: `conditionNodeId → (DriverInstanceId, FullName)`.
|
||||
The host wires the `NativeAlarmAckRouter` to a fire-and-forget `Tell` into `DriverHostActor`; on receipt it
|
||||
looks up the inverse map → `(driverId, alarmRef)` → routes to the `DriverInstanceActor` →
|
||||
`driver.AcknowledgeAsync([ new AlarmAcknowledgeRequest(SourceNodeId, ConditionId: alarmRef, Comment,
|
||||
OperatorUser) ], ct)`.
|
||||
4. **Carry the principal to the driver.** `AlarmAcknowledgeRequest(SourceNodeId, ConditionId, Comment?)` gains
|
||||
`string? OperatorUser` (the field the base-interface comment already anticipated — "Stream G provides the
|
||||
authenticated principal"). `GalaxyDriver.AcknowledgeAsync` passes it to
|
||||
`IGalaxyAlarmAcknowledger.AcknowledgeAsync(alarmFullReference, comment, operatorUser, ct)` → gateway → AVEVA.
|
||||
Also tidy `ScriptedAlarmSource.cs:75`'s `"opcua-client"` default to honor a supplied principal.
|
||||
|
||||
Native-alarm-source drivers that don't implement a real acknowledger (AbCip/FOCAS projections) accept the
|
||||
request as today (no behavioral regression); only Galaxy commits to its upstream.
|
||||
|
||||
```
|
||||
OPC UA client Acknowledge on a NATIVE condition
|
||||
→ OtOpcUaNodeManager.OnAcknowledge (native) [AlarmAck gate, fail-closed]
|
||||
→ NativeAlarmAckRouter(NativeAlarmAck{node, comment, operatorUser})
|
||||
→ DriverHostActor [inverse map conditionNodeId → (driverId, alarmRef)]
|
||||
→ DriverInstanceActor → driver.AcknowledgeAsync(AlarmAcknowledgeRequest{…, OperatorUser})
|
||||
→ GalaxyDriver → GatewayGalaxyAlarmAcknowledger → gateway → AVEVA
|
||||
```
|
||||
|
||||
## Error handling / edge cases
|
||||
|
||||
- **Anonymous / missing AlarmAck role** on any of H4/H6 → `BadUserAccessDenied` (fail-closed, same as the
|
||||
existing alarm methods).
|
||||
- **Native ack with no driver match** (condition node not in the inverse map — e.g. a scripted alarm wrongly
|
||||
routed) → log + drop; never throw under the node-manager Lock. The router `Tell` is fire-and-forget (safe
|
||||
under Lock, like `AlarmCommandRouter`).
|
||||
- **Driver faulted / reconnecting** → `AcknowledgeAsync` fails or no-ops per the driver; the local condition
|
||||
state still reflects the SDK-applied ack (the OPC UA client sees Good). Surfacing a *failed* upstream ack
|
||||
back to the client is out of scope (mirrors the Galaxy write-outcome fire-and-forget limitation).
|
||||
- **H4 native Enable/Disable** → `BadNotSupported` (decision #2).
|
||||
- **H2-bit:** no runtime behavior change for existing grants except closing the HistoryRead⇒HistoryUpdate
|
||||
authorization hole; the service itself stays rejected.
|
||||
|
||||
## Testing strategy (xUnit + Shouldly, TDD; NO bUnit)
|
||||
|
||||
- **H2-bit:** `TriePermissionEvaluator` test — a node granting only `HistoryRead` does **not** authorize
|
||||
`OpcUaOperation.HistoryUpdate` (now `NotGranted`); a grant including the new `HistoryUpdate` bit does. Enum
|
||||
bit-value/round-trip test (no collision with `MethodCall`).
|
||||
- **H4:** node-manager test — `OnEnableDisable` with an AlarmAck identity routes an `Enable`/`Disable`
|
||||
`AlarmCommand` through a captured `AlarmCommandRouter`; anonymous → `BadUserAccessDenied`, no route; a native
|
||||
condition → `BadNotSupported`. Engine path (`ScriptedAlarmHostActor` Enable/Disable → `ScriptedAlarmEngine`)
|
||||
is already covered.
|
||||
- **H6:** node-manager test — a **native** condition's `OnAcknowledge` (AlarmAck identity) invokes the
|
||||
`NativeAlarmAckRouter` with the right `(node, comment, operatorUser=DisplayName)` and **not** the scripted
|
||||
`AlarmCommandRouter`; a **scripted** condition still uses `AlarmCommandRouter`. `DriverHostActor` test — the
|
||||
inverse map resolves a condition node → `(driverId, alarmRef)` and a `NativeAlarmAck` routes to a stub
|
||||
driver's `AcknowledgeAsync` with `OperatorUser` populated. `AlarmAcknowledgeRequest` byte/field test for the
|
||||
new `OperatorUser`. Galaxy: a unit test that `GalaxyDriver.AcknowledgeAsync` forwards `OperatorUser` to the
|
||||
acknowledger; the live AVEVA commit is the existing skip-gated `GatewayGalaxy…LiveTests` pattern.
|
||||
- **Live `/run`:** H4 Enable/Disable proven end-to-end on docker-dev (author a scripted alarm, Disable via
|
||||
Client.CLI, confirm the condition disables); H6's local half proven (native ack routes to the driver — log
|
||||
evidence), the **AVEVA round-trip operator-gated** against `10.100.0.48` (decision #3).
|
||||
|
||||
## Verification per item
|
||||
|
||||
- `dotnet build` clean (TreatWarningsAsErrors on production); full `dotnet test` green.
|
||||
- High-risk review chain (serial spec → code) per task + a final integration review.
|
||||
- Live `/run`: H4 driven by the agent on docker-dev (login disabled); H6 local route agent-driven, AVEVA
|
||||
commit recorded as operator-gated (or driven against the reachable gateway if feasible this session).
|
||||
|
||||
## Alternatives considered
|
||||
|
||||
- **Single command router, disambiguate native vs scripted in the host** — rejected (decision #1): pushes
|
||||
native routing knowledge into the scripted host; the dual-router split keeps each path's concern local.
|
||||
- **Native Enable/Disable now** — rejected (decision #2): no driver enable/disable surface; murky semantics
|
||||
(the driver keeps producing events).
|
||||
- **Add the HistoryUpdate *service*** — out of scope / deferred (infra-gated; on the roadmap defer list).
|
||||
- **Pass the principal out-of-band instead of on `AlarmAcknowledgeRequest`** — rejected: the record is the
|
||||
natural carrier and the base-interface comment already anticipated it; an additive field is the least
|
||||
surprising.
|
||||
|
||||
## Hard constraints (carried from the roadmap)
|
||||
|
||||
NO Configuration entity / EF migration. The only contract touches are additive: a `bool isNative` on the
|
||||
internal `IOpcUaAddressSpaceSink` sink seam, and `string? OperatorUser` on the `AlarmAcknowledgeRequest`
|
||||
Core.Abstractions record. Stage by path — never `git add .`; never stage `sql_login.txt`,
|
||||
`src/Server/.../Host/pki/`, `pending.md`, `current.md`, `docker-dev/docker-compose.yml`, `stillpending.md`.
|
||||
Never echo or commit secrets (gateway API key). No force-push, no `--no-verify`. Razor/runtime behavior proven
|
||||
only by live `/run`, never bUnit.
|
||||
Reference in New Issue
Block a user