docs(galaxy): Phase B native-alarms design (equipment-tag path)
@@ -0,0 +1,265 @@
# Galaxy Phase B — Native Alarms on the Equipment-Tag Path — Design

**Date:** 2026-06-14
**Status:** Approved (brainstorming) — ready for implementation planning
**Scope:** Phase B of the Galaxy standard-driver design
(`docs/plans/2026-06-12-galaxy-standard-driver-design.md`). Restores native
`IAlarmSource` alarms in the post-Phase-A Equipment model, delivered over a new
driver→server alarm seam that mirrors the scripted-alarm seam.
**Builds on:** master (Milestone 1b complete — equipment-tag live values + write
pipeline shipped). Feature branch off master.

## Goal

A Galaxy equipment `Tag` bound to `GalaxyMxGateway` and marked as a **native
alarm** materializes a real OPC UA Part 9 `AlarmConditionState` under its
equipment folder, and the driver's live `IAlarmSource.OnAlarmEvent` transitions
drive that condition (active / severity / message / ack-state) and fan out to the
`alerts` topic (AdminUI `/alerts` + historian), exactly as scripted alarms do
today. **No EF/schema migration.**

## Why this is needed (the gap, grounded in current code)

Phase A retired the `SystemPlatform` mirror (`MaterialiseGalaxyTags` +
`GenericDriverNodeManager`), which was the **only** path that wired native
alarms. Three concrete consequences, all verified against current code:

1. **No cross-actor transport for native alarms.** In the fused-host actor model,
   `DriverInstanceActor` subscribes the driver's `ISubscribable.OnDataChange`
   (value seam) but **never** subscribes `IAlarmSource.OnAlarmEvent`. No actor
   message carries an alarm transition. `GenericDriverNodeManager`
   (`src/Core/.../OpcUa/GenericDriverNodeManager.cs`) subscribed `OnAlarmEvent`
   **in-process**; it is now orphaned (only tests construct it).
2. **No condition-update sink survives.** There is no production
   `IAlarmConditionSink.OnTransition` implementation anymore — only the interface
   (`IAddressSpaceBuilder.cs:115`) and an empty CLI stub
   (`TwinCAT.Cli/.../BrowseCommand.cs:150`). The real sink lived in the mirror's
   builder and died with it. The surviving condition path is the **snapshot**
   path: `OpcUaPublishActor.AlarmStateUpdate` → `OtOpcUaNodeManager.WriteAlarmCondition`.
3. **Authored tags carry no alarm flag.** Native-alarm metadata came from driver
   discovery (`DriverAttributeInfo.IsAlarm`), but Phase 7 composes from authored
   `Tag` rows only. `EquipmentTagPlan` carries `{TagId, EquipmentId,
   DriverInstanceId, FolderPath, Name, DataType, FullName, Writable}` — **no
   alarm field**. The `Tag` entity has no alarm column.

**Decisive design fact:** `AlarmEventArgs` (the `OnAlarmEvent` payload) does
**not** carry the transition kind. `GalaxyAlarmTransition` has an explicit
`GalaxyAlarmTransitionKind {Raise, Acknowledge, Clear, Retrigger}`, but
`GalaxyDriver.OnAlarmFeedTransition` drops it when building `AlarmEventArgs`. A
consumer therefore cannot tell raise-from-clear. Phase B fixes this at the source
(additive contract change) rather than guessing.

## Locked decisions (from brainstorming)

| Decision | Choice |
|---|---|
| How the server learns a tag is a native alarm | **TagConfig JSON (no migration).** The alarm intent rides in the schemaless `TagConfig` blob — `{"FullName":"…","alarm":{"alarmType":"OffNormalAlarm","severity":500}}` — parsed byte-parity in `Phase7Composer` + `DeploymentArtifact`, exactly as `FullName` is today. No EF/schema change. |
| Phase B scope line | **Live condition + alerts fan-out; defer device-ack.** Trip → Part 9 condition + `/alerts` + historian (Primary-gated). A client Acknowledge updates the **local** condition state; routing it back to the driver's `IAlarmSource.AcknowledgeAsync` (→ AVEVA) is a deferred follow-up. |
| Transition→state model | **Snapshot projection.** Reuse the scripted-alarm sink path (`WriteAlarmCondition` + delta-gate + `ReportConditionEvent` + Part-9 ack mechanics) unchanged; a new pure projector derives an `AlarmConditionSnapshot` from each transition. The retired `OnTransition` sink path is **not** resurrected. |
| Transition-kind plumbing | **Additive contract change.** Add `AlarmTransitionKind Kind` to `AlarmEventArgs` (default `Unspecified`); `GalaxyDriver` populates it from the kind it already has. A record default keeps every other `IAlarmSource` implementer compiling. |

### Approaches considered and rejected

- **Resurrect the `IAlarmConditionSink`/`OnTransition` builder path.** Rejected:
  reintroduces a second condition-state mechanism alongside the live, delta-gated,
  ack-wired `WriteAlarmCondition` path. The snapshot path is the one that works
  for scripted alarms today.
- **Auto-discover the alarm flag at deploy** (query driver `DriverAttributeInfo`
  during composition). Rejected (this session): couples Phase 7 composition to a
  cross-process discovery query; larger and higher-risk than reading the authored
  blob. The TagConfig route matches the protocol-linkage precedent.
- **Extend the `Tag` entity with `IsAlarm`/`AlarmType` columns.** Rejected:
  requires a Configuration/EF migration, out of scope for this phase.
- **Infer raise/clear from `AlarmEventArgs` heuristically.** Rejected: the record
  has no active flag, so any inference would be fragile and error-prone. Fix the
  contract instead.

## Architecture (target end-state)

```
Author Tag{Equipment, GalaxyMxGateway, TagConfig={FullName, alarm:{alarmType,severity}}}
  → Phase7Composer / DeploymentArtifact (ExtractTagAlarm, byte-parity)
  → EquipmentTagPlan.Alarm != null
  → Phase7Applier.MaterialiseEquipmentTags
      • Alarm == null : SafeEnsureVariable (today's value variable, unchanged)
      • Alarm != null : SafeMaterialiseAlarmCondition(nodeId, equip, name, alarmType, severity)
          → real Part 9 AlarmConditionState under the equipment folder (reused)

Runtime (the NEW seam — mirrors the scripted-alarm seam):
  GalaxyDriver.OnAlarmEvent (AlarmEventArgs{SourceNodeId=FullName, Kind, Severity, …})
    → DriverInstanceActor (subscribes OnAlarmEvent; marshals via Self.Tell)
    → AttributeAlarmPublished(DriverInstanceId, AlarmEventArgs) [NEW msg, parallels AttributeValuePublished]
    → DriverHostActor.ForwardNativeAlarm
        • resolve (DriverInstanceId, SourceNodeId) → condition NodeId(s) via _alarmNodeIdByDriverRef [NEW map]
        • NativeAlarmProjector.Project(nodeId, args) → AlarmConditionSnapshot [NEW pure helper]
        • Tell OpcUaPublishActor.AlarmStateUpdate(nodeId, snapshot, ts) (UNGATED — warm on all nodes) [REUSED]
        • Publish AlarmTransitionEvent → `alerts` topic (Primary-gated — reuse _localRole) [REUSED]
    → OtOpcUaNodeManager.WriteAlarmCondition → delta-gate → ReportConditionEvent (real Part 9 event) [REUSED]
```

`OpcUaPublishActor` and `OtOpcUaNodeManager` are **unchanged** — Phase B reuses
`AlarmStateUpdate` → `WriteAlarmCondition` verbatim.

## Components / workstreams

### WS-1 — Transition-kind contract (additive, Core.Abstractions)

- New enum `AlarmTransitionKind { Unspecified = 0, Raise, Acknowledge, Clear, Retrigger }`
  (mirrors the internal `GalaxyAlarmTransitionKind`).
- `AlarmEventArgs` gains a trailing `AlarmTransitionKind Kind = AlarmTransitionKind.Unspecified`
  param (record default → all existing implementers compile untouched).
- `GalaxyDriver.OnAlarmFeedTransition` (`GalaxyDriver.cs:~1128-1167`) populates
  `Kind` from `transition.TransitionKind`.
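
The additive change in WS-1 can be sketched as follows, assuming `AlarmEventArgs` is a positional record. The parameters other than `Kind`, the `AlarmSeverity` member names, and the `ContractDemo` class are placeholders for illustration and are not the actual Core.Abstractions signature.

```csharp
#nullable enable
using System;

// Mirrors the enum named in WS-1; Unspecified = 0 is the record default.
public enum AlarmTransitionKind { Unspecified = 0, Raise, Acknowledge, Clear, Retrigger }

// Assumed 4-bucket severity; member names are illustrative.
public enum AlarmSeverity { Low, Medium, High, Critical }

// Hypothetical shape: only the trailing Kind parameter comes from this design.
public sealed record AlarmEventArgs(
    string SourceNodeId,
    AlarmSeverity Severity,
    string Message,
    DateTime TimestampUtc,
    AlarmTransitionKind Kind = AlarmTransitionKind.Unspecified);

public static class ContractDemo
{
    // A call site written before Kind existed still compiles and gets Unspecified.
    public static AlarmEventArgs LegacyCall() =>
        new("Tank1.HiLevel", AlarmSeverity.High, "High level", DateTime.UtcNow);

    // A Galaxy-style call site forwards the kind it already knows.
    public static AlarmEventArgs GalaxyCall(AlarmTransitionKind kind) =>
        new("Tank1.HiLevel", AlarmSeverity.High, "High level", DateTime.UtcNow, kind);
}
```

Because the new parameter is trailing and defaulted, existing constructor calls keep their meaning, which is the property WS-1 relies on for the other `IAlarmSource` implementers.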

### WS-2 — Alarm intent in TagConfig + compose plan (no EF)

- New never-throw `ExtractTagAlarm(string tagConfig) → EquipmentTagAlarmInfo?`
  (parses the optional `"alarm"` object: `alarmType` default `"AlarmCondition"`,
  `severity` default `500`; absent/malformed → null). Lives next to
  `ExtractTagFullName`; **used by both** `Phase7Composer` and `DeploymentArtifact`.
- `EquipmentTagPlan` (`Phase7Composer.cs:~80`) gains `EquipmentTagAlarmInfo? Alarm`
  (`record EquipmentTagAlarmInfo(string AlarmType, int Severity)`); null ⇒ plain
  variable. Populated in `Phase7Composer` `Select(...)` **and**
  `DeploymentArtifact.BuildEquipmentTagPlans` — **byte-parity invariant**, covered
  by a round-trip test.
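
A minimal sketch of a never-throw `ExtractTagAlarm`, assuming the blob is parsed with `System.Text.Json`. The containing class name `TagConfigParsing` is hypothetical, and treating a wrong-typed `alarmType` or `severity` as "use the default" is an assumption; the design only states that an absent or malformed `alarm` yields null.

```csharp
#nullable enable
using System;
using System.Text.Json;

public sealed record EquipmentTagAlarmInfo(string AlarmType, int Severity);

public static class TagConfigParsing
{
    // Returns null when there is no usable "alarm" object; never throws.
    public static EquipmentTagAlarmInfo? ExtractTagAlarm(string? tagConfig)
    {
        if (string.IsNullOrWhiteSpace(tagConfig)) return null;
        try
        {
            using var doc = JsonDocument.Parse(tagConfig);
            var root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object) return null;
            if (!root.TryGetProperty("alarm", out var alarm)) return null;
            if (alarm.ValueKind != JsonValueKind.Object) return null;

            // Defaults taken from the WS-2 description.
            var alarmType = "AlarmCondition";
            if (alarm.TryGetProperty("alarmType", out var t) && t.ValueKind == JsonValueKind.String)
                alarmType = t.GetString() ?? alarmType;

            var severity = 500;
            if (alarm.TryGetProperty("severity", out var s)
                && s.ValueKind == JsonValueKind.Number
                && s.TryGetInt32(out var parsed))
                severity = parsed;

            return new EquipmentTagAlarmInfo(alarmType, severity);
        }
        catch (JsonException)
        {
            return null; // malformed JSON is treated as "no alarm intent"
        }
    }
}
```

Keeping this as one shared static helper is what lets `Phase7Composer` and `DeploymentArtifact` produce identical plans from the same blob.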

### WS-3 — Materialize the condition node (reuse)

- `Phase7Applier.MaterialiseEquipmentTags` (`:~162-199`) branches per tag:
  `tag.Alarm is not null` → `SafeMaterialiseAlarmCondition(nodeId, parentEquipment,
  tag.Name, tag.Alarm.AlarmType, tag.Alarm.Severity)` (the **same** method scripted
  alarms use; condition NodeId = the tag's equipment-scoped NodeId); else →
  `SafeEnsureVariable(...)` (today). `RebuildAddressSpace` already clears
  `_alarmConditions`, so redeploy teardown is covered.

### WS-4 — The driver→server alarm seam (the new plumbing)

- **`DriverInstanceActor`**: on connect, if `_driver is IAlarmSource src`, subscribe
  `src.OnAlarmEvent += handler`; the handler marshals to the actor thread via
  `Self.Tell(new NativeAlarmRaised(e))` (mirrors the `_dataChangeHandler` pattern,
  `:409/:456`). `Receive<NativeAlarmRaised>` → `Context.Parent.Tell(new
  AttributeAlarmPublished(DriverInstanceId, Args))`. Unsubscribe on
  disconnect/teardown (mirror the `OnDataChange` unsubscribe). New messages
  `NativeAlarmRaised` (internal) + `AttributeAlarmPublished` (parallels
  `AttributeValuePublished`, `:65`). Phase B follows the mirror's model: subscribe
  the event and let the server filter by `SourceNodeId`; it does **not** drive
  `SubscribeAlarmsAsync` (Galaxy's feed auto-starts session-less in
  `InitializeAsync` and fires `OnAlarmEvent` regardless). Driving
  `SubscribeAlarmsAsync` from the materialized alarm-ref set, for drivers that gate
  on it, is a noted follow-up.
- **`DriverHostActor`**: build `_alarmNodeIdByDriverRef: (DriverInstanceId,
  FullName) → HashSet<NodeId>` from equipment-tag plans where `Alarm != null`
  (alongside the existing `_nodeIdByDriverRef`, in the same apply pass). Add
  `Receive<AttributeAlarmPublished>` in the steady + applying states. Handler
  `ForwardNativeAlarm`: resolve nodeIds (unknown ref → drop silently, mirror
  behavior); per nodeId `NativeAlarmProjector.Project(...)` → snapshot →
  `_publishActor.Tell(AlarmStateUpdate(nodeId, snapshot, ts))` **ungated**; then
  publish `AlarmTransitionEvent` to `alerts` **Primary-gated** via the existing
  `_localRole` the write-routing already tracks.
- **`NativeAlarmProjector`** (new pure class; unit-tested): per-condition-NodeId
  prior-state `(Active, Acked, Severity, Message)`; `Project(nodeId, AlarmEventArgs)
  → AlarmConditionSnapshot` by `Kind`:
  - `Raise`/`Retrigger` → `Active=true, Acked=false`, severity+message from event.
  - `Acknowledge` → `Acked=true` (keep prior Active), carry `OperatorComment`.
  - `Clear` → `Active=false` (keep prior Acked).
  - `Unspecified` → keep prior Active/Acked, refresh severity+message.
  - `Enabled=true`, `Confirmed=true`, `Shelving=Unshelved` (shelving is a
    server/local concern). Severity: `AlarmSeverity` 4-bucket → 1..1000 ushort
    (Low→200, Medium→500, High→700, Critical→900).
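
The projection rules above can be sketched as a small class. This is an illustration under stated assumptions, not the planned implementation: the `AlarmEventArgs` and `AlarmConditionSnapshot` shapes are simplified stand-ins (the `Enabled`, `Confirmed`, and `Shelving` constants are omitted), node ids are plain strings, and seeding a cold start as inactive and acknowledged is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum AlarmTransitionKind { Unspecified = 0, Raise, Acknowledge, Clear, Retrigger }
public enum AlarmSeverity { Low, Medium, High, Critical }

// Simplified stand-ins for the real contracts (assumed shapes).
public sealed record AlarmEventArgs(
    string SourceNodeId, AlarmSeverity Severity, string Message,
    string? OperatorComment = null,
    AlarmTransitionKind Kind = AlarmTransitionKind.Unspecified);

public sealed record AlarmConditionSnapshot(
    bool Active, bool Acked, ushort Severity, string Message, string? Comment);

public sealed class NativeAlarmProjector
{
    // Prior state per condition node id.
    private readonly Dictionary<string, AlarmConditionSnapshot> _prior = new();

    // 4-bucket severity to the OPC UA 1..1000 range, per the mapping in WS-4.
    public static ushort MapSeverity(AlarmSeverity severity) => severity switch
    {
        AlarmSeverity.Low => (ushort)200,
        AlarmSeverity.Medium => (ushort)500,
        AlarmSeverity.High => (ushort)700,
        _ => (ushort)900, // Critical
    };

    public AlarmConditionSnapshot Project(string nodeId, AlarmEventArgs e)
    {
        var severity = MapSeverity(e.Severity);
        // Cold start: the first event seeds the state (assumed inactive + acked).
        var prior = _prior.TryGetValue(nodeId, out var known)
            ? known
            : new AlarmConditionSnapshot(false, true, severity, e.Message, null);

        var next = e.Kind switch
        {
            AlarmTransitionKind.Raise or AlarmTransitionKind.Retrigger =>
                prior with { Active = true, Acked = false, Severity = severity, Message = e.Message },
            AlarmTransitionKind.Acknowledge =>
                prior with { Acked = true, Comment = e.OperatorComment },
            AlarmTransitionKind.Clear =>
                prior with { Active = false },
            _ => prior with { Severity = severity, Message = e.Message }, // Unspecified
        };

        _prior[nodeId] = next;
        return next;
    }
}
```

Keeping the prior state inside the projector and the rules in a single `switch` makes each transition kind testable offline without any actor or OPC UA dependency.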

### WS-5 — Historian / alerts parity (reuse)

- The `AlarmTransitionEvent` published in WS-4 is the same contract
  `ScriptedAlarmHostActor` publishes; `HistorianAdapterActor` + AdminUI `/alerts`
  consume it unchanged. Populate `AlarmId` = condition NodeId, `EquipmentPath` +
  `AlarmName` from the plan, `TransitionKind` = `Kind.ToString()`, `AlarmTypeName`
  = the configured OPC UA alarm type, `User`/`Comment` from the event.

## Data type / severity mapping

- OPC UA alarm subtype string → SDK type via the existing
  `CreateAlarmConditionOfType` (`OffNormalAlarm`/`DiscreteAlarm`/`LimitAlarm`/base).
- `AlarmSeverity` (4-bucket) → 1..1000 via the projector map above; the authored
  `severity` seeds the condition's initial severity at materialization
  (`MaterialiseAlarmCondition`'s `MapSeverity`).

## Error handling / edge cases

- **Unknown `SourceNodeId`** (no materialized condition for the ref): drop
  silently — preserves `GenericDriverNodeManager`'s documented behavior.
- **Byte-parity** between `Phase7Composer` and `DeploymentArtifact` for alarm tags:
  parity round-trip test (the established invariant).
- **Redeploy double-delivery**: `DriverInstanceActor` unsubscribes `OnAlarmEvent`
  on teardown; `WriteAlarmCondition`'s delta-gate independently suppresses
  duplicate events; `RebuildAddressSpace` clears `_alarmConditions`.
- **Transition before condition materialized / after rebuild**: unknown-ref drop
  handles it; the projector's prior-state dict is keyed by NodeId and tolerates a
  cold start (first event seeds state).
- **A tag with both a value and an alarm intent**: Phase B treats an `alarm`-marked
  tag as a **condition node only** (not also a plain variable) — matching the
  retired mirror, where an alarm attribute surfaced as a condition.
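
The unknown-ref drop and the Primary gate can be illustrated without Akka. The sketch below is a simplified stand-in for `ForwardNativeAlarm`: the `NativeAlarmRouter` class, the `NodeRole` enum, string node ids, and the two callbacks are all hypothetical; only the `(DriverInstanceId, FullName) → HashSet<NodeId>` map and the gating rule come from the design.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical redundancy role; the design refers to an existing _localRole.
public enum NodeRole { Primary, Secondary }

public sealed class NativeAlarmRouter
{
    private readonly Dictionary<(string DriverInstanceId, string FullName), HashSet<string>>
        _alarmNodeIdByDriverRef = new();

    public NodeRole LocalRole { get; set; } = NodeRole.Secondary;

    // Called while applying equipment-tag plans whose Alarm is not null.
    public void Register(string driverInstanceId, string fullName, string nodeId)
    {
        var key = (driverInstanceId, fullName);
        if (!_alarmNodeIdByDriverRef.TryGetValue(key, out var nodeIds))
            _alarmNodeIdByDriverRef[key] = nodeIds = new HashSet<string>();
        nodeIds.Add(nodeId);
    }

    // Returns the number of condition nodes updated; 0 means the event was dropped.
    public int Forward(
        string driverInstanceId, string sourceNodeId,
        Action<string> updateCondition, Action<string> publishAlert)
    {
        // Unknown ref: drop silently, matching the retired mirror's behavior.
        if (!_alarmNodeIdByDriverRef.TryGetValue((driverInstanceId, sourceNodeId), out var nodeIds))
            return 0;

        foreach (var nodeId in nodeIds)
        {
            updateCondition(nodeId);             // ungated: conditions stay warm on all nodes
            if (LocalRole == NodeRole.Primary)
                publishAlert(nodeId);            // alerts topic is Primary-gated
        }
        return nodeIds.Count;
    }
}
```

A transition that arrives before its condition is registered, or after a rebuild has cleared the map, takes the same early-return path as any other unknown ref.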

## Testing (no bUnit)

**xUnit + Shouldly (offline):**
- `ExtractTagAlarm`: present / absent / malformed / defaults / unknown-keys-preserved.
- `Phase7Composer` ↔ `DeploymentArtifact` byte-parity with alarm-bearing equipment tags.
- `NativeAlarmProjector`: Raise→active+unacked, Acknowledge→acked, Clear→inactive,
  Retrigger, Unspecified, severity-bucket map, prior-state carry.
- `GalaxyDriver.OnAlarmFeedTransition` populates `Kind`.
- Akka.TestKit — `DriverInstanceActor`: a fake `IAlarmSource` driver fires
  `OnAlarmEvent` → the actor publishes `AttributeAlarmPublished` to its parent;
  unsubscribes on teardown.
- Akka.TestKit — `DriverHostActor`: `AttributeAlarmPublished` resolves the ref →
  Tells `AlarmStateUpdate` with the projected snapshot; unknown ref dropped;
  `alerts` publish is Primary-gated (secondary suppresses).

**Live docker-dev `/run` (user-driven; the agent does NOT sign in)** — the gate:
- Author a Galaxy alarm equipment tag (raw `TagConfig` carrying the `alarm` object)
  on the live-gateway-backed `MAIN-galaxy-eq`; deploy.
- Trip the Galaxy alarm → a Part 9 `AlarmConditionState` appears active under the
  equipment via Client.CLI `alarms` (and `read`); the AdminUI `/alerts` row appears.
- Clear → condition goes inactive. (Device-ack round-trip is the deferred
  follow-up, not part of this gate.)

## Suggested slicing (for the plan)

1. **WS-1** — `AlarmTransitionKind` + `AlarmEventArgs.Kind` + Galaxy populates it
   (small/standard; touches a Core.Abstractions contract → ripples to implementers).
2. **WS-2** — `ExtractTagAlarm` + `EquipmentTagPlan.Alarm` in both composer +
   artifact + parity test (high-risk: data-contract byte-parity).
3. **WS-3** — `MaterialiseEquipmentTags` alarm branch (standard; reuses
   `MaterialiseAlarmCondition`).
4. **WS-4a** — `NativeAlarmProjector` (standard; pure, fully TDD-able offline).
5. **WS-4b** — `DriverInstanceActor` `OnAlarmEvent` subscription + publish (high-risk:
   actor state machine + driver-thread marshaling).
6. **WS-4c** — `DriverHostActor` alarm map + `ForwardNativeAlarm` + Primary-gated
   alerts publish (high-risk: actor/concurrency/redundancy gate).
7. **WS-5** — wire `AlarmTransitionEvent` fields (folds into WS-4c; verify historian
   + `/alerts` consume it).
8. **Docs** — document the `TagConfig` `alarm` schema (a Galaxy/alarms doc note).
9. **Live `/run`** — the gate above (user-driven).

## Deferred follow-ups (explicitly out of Phase B)

- **Inbound device-ack**: client Acknowledge → `IAlarmSource.AcknowledgeAsync` →
  AVEVA (its own inbound pipeline, mirrors the write-through work).
- **`SubscribeAlarmsAsync` from the materialized alarm-ref set** for drivers that
  gate their feed on it (Galaxy doesn't).
- **AdminUI Galaxy picker pre-fill** of the `alarm` object from discovery
  (`IsAlarm`/`SecurityClass` already known) — a UI convenience; raw-JSON authoring
  works without it and avoids live-only Razor binding risk.
- Carrying the raw OPC UA severity (vs. the 4-bucket) end-to-end.

## Hard rules (carried into implementation)

- Stage by path; never `git add .`. Never stage `sql_login.txt`,
  `src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/`, `pending.md`, `current.md`, or
  `docker-dev/docker-compose.yml`.
- Never echo the gateway API key or any secret into a tracked file.
- No force-push, no `--no-verify`.
- **No Configuration entity / EF migration change** (the TagConfig route is chosen
  specifically to honor this).
- No bUnit; Razor/JS proven only by live `/run`.
- Build on a feature branch off master.

## Authoritative touched-code list (for planning)

- `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs` (`AlarmEventArgs.Kind`, new enum)
- `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/GalaxyDriver.cs` (`OnAlarmFeedTransition` populates `Kind`)
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs` (`EquipmentTagPlan.Alarm`, `ExtractTagAlarm`)
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs` (`BuildEquipmentTagPlans` parity)
- `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Applier.cs` (`MaterialiseEquipmentTags` alarm branch)
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs` (`OnAlarmEvent` sub + publish)
- `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs` (alarm map + `ForwardNativeAlarm` + gated publish)
- NEW `NativeAlarmProjector` (Runtime or Commons) + its tests
- `OpcUaPublishActor` / `OtOpcUaNodeManager` — **no change** (reuse `AlarmStateUpdate`/`WriteAlarmCondition`)
- A docs note for the `TagConfig` `alarm` schema