Galaxy Phase B — Native Alarms on the Equipment-Tag Path — Design
Date: 2026-06-14
Status: Approved (brainstorming) — ready for implementation planning
Scope: Phase B of the Galaxy standard-driver design
(docs/plans/2026-06-12-galaxy-standard-driver-design.md). Restores native
IAlarmSource alarms in the post-Phase-A Equipment model, delivered over a new
driver→server alarm seam that mirrors the scripted-alarm seam.
Builds on: master (Milestone 1b complete — equipment-tag live values + write
pipeline shipped). Feature branch off master.
Goal
A Galaxy equipment Tag bound to GalaxyMxGateway and marked as a native
alarm materializes a real OPC UA Part 9 AlarmConditionState under its
equipment folder, and the driver's live IAlarmSource.OnAlarmEvent transitions
drive that condition (active / severity / message / ack-state) and fan out to the
alerts topic (AdminUI /alerts + historian), exactly as scripted alarms do
today. No EF/schema migration.
Why this is needed (the gap, grounded in current code)
Phase A retired the SystemPlatform mirror (MaterialiseGalaxyTags +
GenericDriverNodeManager), which was the only path that wired native
alarms. Three concrete consequences, all verified against current code:
- No cross-actor transport for native alarms. In the fused-host actor model, DriverInstanceActor subscribes the driver's ISubscribable.OnDataChange (value seam) but never subscribes IAlarmSource.OnAlarmEvent. No actor message carries an alarm transition. GenericDriverNodeManager (src/Core/.../OpcUa/GenericDriverNodeManager.cs) subscribed OnAlarmEvent in-process; it is now orphaned (only tests construct it).
- No condition-update sink survives. There is no production IAlarmConditionSink.OnTransition implementation anymore, only the interface (IAddressSpaceBuilder.cs:115) and an empty CLI stub (TwinCAT.Cli/.../BrowseCommand.cs:150). The real sink lived in the mirror's builder and died with it. The surviving condition path is the snapshot path: OpcUaPublishActor.AlarmStateUpdate → OtOpcUaNodeManager.WriteAlarmCondition.
- Authored tags carry no alarm flag. Native-alarm metadata came from driver discovery (DriverAttributeInfo.IsAlarm), but Phase 7 composes from authored Tag rows only. EquipmentTagPlan carries {TagId, EquipmentId, DriverInstanceId, FolderPath, Name, DataType, FullName, Writable}, with no alarm field. The Tag entity has no alarm column.
Decisive design fact: AlarmEventArgs (the OnAlarmEvent payload) does
not carry the transition kind. GalaxyAlarmTransition has an explicit
GalaxyAlarmTransitionKind {Raise, Acknowledge, Clear, Retrigger}, but
GalaxyDriver.OnAlarmFeedTransition drops it when building AlarmEventArgs. A
consumer therefore cannot tell raise-from-clear. Phase B fixes this at the source
(additive contract change) rather than guessing.
Locked decisions (from brainstorming)
| Decision | Choice |
|---|---|
| How the server learns a tag is a native alarm | TagConfig JSON (no migration). The alarm intent rides in the schemaless TagConfig blob — {"FullName":"…","alarm":{"alarmType":"OffNormalAlarm","severity":500}} — parsed byte-parity in Phase7Composer + DeploymentArtifact, exactly as FullName is today. No EF/schema change. |
| Phase B scope line | Live condition + alerts fan-out; defer device-ack. Trip → Part 9 condition + /alerts + historian (Primary-gated). A client Acknowledge updates the local condition state; routing it back to the driver's IAlarmSource.AcknowledgeAsync (→ AVEVA) is a deferred follow-up. |
| Transition→state model | Snapshot projection. Reuse the scripted-alarm sink path (WriteAlarmCondition + delta-gate + ReportConditionEvent + Part-9 ack mechanics) unchanged; a new pure projector derives an AlarmConditionSnapshot from each transition. The retired OnTransition sink path is not resurrected. |
| Transition-kind plumbing | Additive contract change. Add AlarmTransitionKind Kind to AlarmEventArgs (default Unspecified); GalaxyDriver populates it from the kind it already has. A record default keeps every other IAlarmSource implementer compiling. |
Approaches considered and rejected
- Resurrect the IAlarmConditionSink/OnTransition builder path. Rejected: reintroduces a second condition-state mechanism alongside the live, delta-gated, ack-wired WriteAlarmCondition path. The snapshot path is the one that works for scripted alarms today.
- Auto-discover the alarm flag at deploy (query driver DriverAttributeInfo during composition). Rejected (this session): couples Phase 7 composition to a cross-process discovery query; larger and higher-risk than reading the authored blob. The TagConfig route matches the protocol-linkage precedent.
- Extend the Tag entity with IsAlarm/AlarmType columns. Rejected: requires a Configuration/EF migration, out of scope for this phase.
- Infer raise/clear from AlarmEventArgs heuristically. Rejected: the record has no active flag; inference is fragile and wrong. Fix the contract instead.
Architecture (target end-state)
Author Tag{Equipment, GalaxyMxGateway, TagConfig={FullName, alarm:{alarmType,severity}}}
→ Phase7Composer / DeploymentArtifact (ExtractTagAlarm, byte-parity)
→ EquipmentTagPlan.Alarm != null
→ Phase7Applier.MaterialiseEquipmentTags
• Alarm == null : SafeEnsureVariable (today's value variable, unchanged)
• Alarm != null : SafeMaterialiseAlarmCondition(nodeId, equip, name, alarmType, severity)
→ real Part 9 AlarmConditionState under the equipment folder (reused)
Runtime (the NEW seam — mirrors the scripted-alarm seam):
GalaxyDriver.OnAlarmEvent (AlarmEventArgs{SourceNodeId=FullName, Kind, Severity, …})
→ DriverInstanceActor (subscribes OnAlarmEvent; marshals via Self.Tell)
→ AttributeAlarmPublished(DriverInstanceId, AlarmEventArgs) [NEW msg, parallels AttributeValuePublished]
→ DriverHostActor.ForwardNativeAlarm
• resolve (DriverInstanceId, SourceNodeId) → condition NodeId(s) via _alarmNodeIdByDriverRef [NEW map]
• NativeAlarmProjector.Project(nodeId, args) → AlarmConditionSnapshot [NEW pure helper]
• Tell OpcUaPublishActor.AlarmStateUpdate(nodeId, snapshot, ts) (UNGATED — warm on all nodes) [REUSED]
• Publish AlarmTransitionEvent → `alerts` topic (Primary-gated — reuse _localRole) [REUSED]
→ OtOpcUaNodeManager.WriteAlarmCondition → delta-gate → ReportConditionEvent (real Part 9 event) [REUSED]
OpcUaPublishActor and OtOpcUaNodeManager are unchanged — Phase B reuses
AlarmStateUpdate → WriteAlarmCondition verbatim.
Components / workstreams
WS-1 — Transition-kind contract (additive, Core.Abstractions)
- New enum AlarmTransitionKind { Unspecified = 0, Raise, Acknowledge, Clear, Retrigger } (mirrors the internal GalaxyAlarmTransitionKind).
- AlarmEventArgs gains a trailing AlarmTransitionKind Kind = AlarmTransitionKind.Unspecified param (record default → all existing implementers compile untouched).
- GalaxyDriver.OnAlarmFeedTransition (GalaxyDriver.cs:~1128-1167) populates Kind from transition.TransitionKind.
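A minimal sketch of the additive contract change, to show why a trailing defaulted record parameter leaves existing callers compiling. The AlarmSeverity enum and the Message field are assumptions for illustration (this design names only SourceNodeId, Severity, and Kind on the payload), and KindContractDemo is a hypothetical helper, not project code.

```csharp
// Enum as described in WS-1; Unspecified = 0 is the value callers get by default.
public enum AlarmTransitionKind { Unspecified = 0, Raise, Acknowledge, Clear, Retrigger }

// Assumed 4-bucket severity (the buckets are named in WS-4).
public enum AlarmSeverity { Low, Medium, High, Critical }

// Hypothetical payload shape. Kind is the new trailing parameter with a default,
// so positional construction without it stays valid.
public sealed record AlarmEventArgs(
    string SourceNodeId,
    AlarmSeverity Severity,
    string Message,
    AlarmTransitionKind Kind = AlarmTransitionKind.Unspecified);

public static class KindContractDemo
{
    // A caller written before Kind existed: unchanged, Kind falls back to Unspecified.
    public static AlarmEventArgs LegacyCall() =>
        new("Tank1.HiAlarm", AlarmSeverity.Medium, "High level");

    // A Galaxy-style caller that forwards the kind it already knows.
    public static AlarmEventArgs GalaxyCall() =>
        new("Tank1.HiAlarm", AlarmSeverity.Medium, "High level", AlarmTransitionKind.Raise);
}
```

Note that a defaulted parameter keeps source compatibility only; assemblies built against the old constructor would still need a recompile.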
WS-2 — Alarm intent in TagConfig + compose plan (no EF)
- New never-throw ExtractTagAlarm(string tagConfig) → EquipmentTagAlarmInfo? (parses the optional "alarm" object: alarmType default "AlarmCondition", severity default 500; absent/malformed → null). Lives next to ExtractTagFullName; used by both Phase7Composer and DeploymentArtifact.
- EquipmentTagPlan (Phase7Composer.cs:~80) gains EquipmentTagAlarmInfo? Alarm (record EquipmentTagAlarmInfo(string AlarmType, int Severity)); null ⇒ plain variable. Populated in Phase7Composer Select(...) and DeploymentArtifact.BuildEquipmentTagPlans: a byte-parity invariant, covered by a round-trip test.
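One way the never-throw parser could look, sketched with System.Text.Json. The class name TagConfigParsing is hypothetical, and two behaviors are assumptions rather than decisions from this design: a wrong-typed alarmType or severity falls back to its default instead of returning null, and a non-object "alarm" value counts as malformed.

```csharp
#nullable enable
using System.Text.Json;

public sealed record EquipmentTagAlarmInfo(string AlarmType, int Severity);

public static class TagConfigParsing
{
    // Never throws: absent or malformed input yields null, which means "plain variable".
    public static EquipmentTagAlarmInfo? ExtractTagAlarm(string tagConfig)
    {
        if (string.IsNullOrWhiteSpace(tagConfig)) return null;
        try
        {
            using var doc = JsonDocument.Parse(tagConfig);
            var root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object) return null;
            if (!root.TryGetProperty("alarm", out var alarm)) return null;
            if (alarm.ValueKind != JsonValueKind.Object) return null;

            // Defaults named in WS-2.
            var alarmType = "AlarmCondition";
            var severity = 500;

            if (alarm.TryGetProperty("alarmType", out var t) && t.ValueKind == JsonValueKind.String)
                alarmType = t.GetString() ?? alarmType;

            if (alarm.TryGetProperty("severity", out var s)
                && s.ValueKind == JsonValueKind.Number
                && s.TryGetInt32(out var parsed))
                severity = parsed;

            return new EquipmentTagAlarmInfo(alarmType, severity);
        }
        catch (JsonException)
        {
            // Malformed JSON is treated the same as "no alarm intent".
            return null;
        }
    }
}
```

Because both Phase7Composer and DeploymentArtifact would call the same helper, the byte-parity invariant reduces to both sides passing the identical TagConfig string.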
WS-3 — Materialize the condition node (reuse)
- Phase7Applier.MaterialiseEquipmentTags (:~162-199) branches per tag: tag.Alarm is not null → SafeMaterialiseAlarmCondition(nodeId, parentEquipment, tag.Name, tag.Alarm.AlarmType, tag.Alarm.Severity) (the same method scripted alarms use; condition NodeId = the tag's equipment-scoped NodeId); else → SafeEnsureVariable(...) (today).
- RebuildAddressSpace already clears _alarmConditions, so redeploy teardown is covered.
WS-4 — The driver→server alarm seam (the new plumbing)
- DriverInstanceActor: on connect, if _driver is IAlarmSource src, subscribe src.OnAlarmEvent += handler; the handler marshals to the actor thread via Self.Tell(new NativeAlarmRaised(e)) (mirrors the _dataChangeHandler pattern, :409/:456). Receive<NativeAlarmRaised> → Context.Parent.Tell(new AttributeAlarmPublished(DriverInstanceId, Args)). Unsubscribe on disconnect/teardown (mirror the OnDataChange unsubscribe). New messages NativeAlarmRaised (internal) + AttributeAlarmPublished (parallels AttributeValuePublished, :65). Phase B follows the mirror's model: subscribe the event and let the server filter by SourceNodeId; it does not drive SubscribeAlarmsAsync (Galaxy's feed auto-starts session-less in InitializeAsync and fires OnAlarmEvent regardless). Driving SubscribeAlarmsAsync from the materialized alarm-ref set, for drivers that gate on it, is a noted follow-up.
- DriverHostActor: build _alarmNodeIdByDriverRef: (DriverInstanceId, FullName) → HashSet<NodeId> from equipment-tag plans where Alarm != null (alongside the existing _nodeIdByDriverRef, in the same apply pass). Add Receive<AttributeAlarmPublished> in the steady + applying states. Handler ForwardNativeAlarm: resolve nodeIds (unknown ref → drop silently, mirror behavior); per nodeId NativeAlarmProjector.Project(...) → snapshot → _publishActor.Tell(AlarmStateUpdate(nodeId, snapshot, ts)) ungated; then publish AlarmTransitionEvent to alerts, Primary-gated via the existing _localRole the write-routing already tracks.
- NativeAlarmProjector (new pure class; unit-tested): per-condition-NodeId prior-state (Active, Acked, Severity, Message); Project(nodeId, AlarmEventArgs) → AlarmConditionSnapshot by Kind:
  - Raise / Retrigger → Active=true, Acked=false, severity+message from event.
  - Acknowledge → Acked=true (keep prior Active), carry OperatorComment.
  - Clear → Active=false (keep prior Acked).
  - Unspecified → keep prior Active/Acked, refresh severity+message.
  - Always: Enabled=true, Confirmed=true, Shelving=Unshelved (shelving is a server/local concern).
  - Severity: AlarmSeverity 4-bucket → 1..1000 ushort (Low→200, Medium→500, High→700, Critical→900).
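The projector rules above can be sketched as a small pure class. Treat this as an illustration under stated assumptions, not the implementation: the record shapes for AlarmEventArgs and AlarmConditionSnapshot are hypothetical, node IDs are plain strings, a cold start seeds Active=false / Acked=true, and Acknowledge and Clear keep the prior severity and message when a prior state exists (the design does not specify either point).

```csharp
#nullable enable
using System.Collections.Generic;

public enum AlarmTransitionKind { Unspecified = 0, Raise, Acknowledge, Clear, Retrigger }
public enum AlarmSeverity { Low, Medium, High, Critical }
public enum ShelvingState { Unshelved, OneShotShelved, TimedShelved }

// Hypothetical shapes; only the field names mentioned in this design are used.
public sealed record AlarmEventArgs(
    string SourceNodeId, AlarmSeverity Severity, string Message,
    string? OperatorComment = null,
    AlarmTransitionKind Kind = AlarmTransitionKind.Unspecified);

public sealed record AlarmConditionSnapshot(
    bool Active, bool Acked, ushort Severity, string Message, string? Comment,
    bool Enabled = true, bool Confirmed = true,
    ShelvingState Shelving = ShelvingState.Unshelved);

public sealed class NativeAlarmProjector
{
    // Prior state per condition NodeId; the last emitted snapshot doubles as that state.
    private readonly Dictionary<string, AlarmConditionSnapshot> _prior = new();

    // 4-bucket severity mapped onto the OPC UA 1..1000 range, per WS-4.
    public static ushort MapSeverity(AlarmSeverity s) => (ushort)(s switch
    {
        AlarmSeverity.Low => 200,
        AlarmSeverity.Medium => 500,
        AlarmSeverity.High => 700,
        AlarmSeverity.Critical => 900,
        _ => 500,
    });

    public AlarmConditionSnapshot Project(string nodeId, AlarmEventArgs e)
    {
        _prior.TryGetValue(nodeId, out var prior);   // null on cold start
        var active = prior?.Active ?? false;          // assumed cold-start seed
        var acked = prior?.Acked ?? true;             // assumed cold-start seed
        var severity = MapSeverity(e.Severity);
        var message = e.Message;
        var comment = prior?.Comment;

        switch (e.Kind)
        {
            case AlarmTransitionKind.Raise:
            case AlarmTransitionKind.Retrigger:
                active = true;
                acked = false;
                break;
            case AlarmTransitionKind.Acknowledge:
                acked = true;
                comment = e.OperatorComment;
                severity = prior?.Severity ?? severity;   // assumption: keep prior
                message = prior?.Message ?? message;
                break;
            case AlarmTransitionKind.Clear:
                active = false;
                severity = prior?.Severity ?? severity;   // assumption: keep prior
                message = prior?.Message ?? message;
                break;
            default:
                // Unspecified: keep Active/Acked, refresh severity and message.
                break;
        }

        var next = new AlarmConditionSnapshot(active, acked, severity, message, comment);
        _prior[nodeId] = next;
        return next;
    }
}
```

Keeping the class free of actor and SDK types is what makes it testable offline; the first event for a NodeId simply seeds the dictionary.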
WS-5 — Historian / alerts parity (reuse)
- The AlarmTransitionEvent published in WS-4 is the same contract ScriptedAlarmHostActor publishes; HistorianAdapterActor + AdminUI /alerts consume it unchanged. Populate AlarmId = condition NodeId, EquipmentPath + AlarmName from the plan, TransitionKind = Kind.ToString(), AlarmTypeName = the configured OPC UA alarm type, User/Comment from the event.
Data type / severity mapping
- OPC UA alarm subtype string → SDK type via the existing CreateAlarmConditionOfType (OffNormalAlarm / DiscreteAlarm / LimitAlarm / base).
- AlarmSeverity (4-bucket) → 1..1000 via the projector map above; the authored severity seeds the condition's initial severity at materialization (MaterialiseAlarmCondition's MapSeverity).
Error handling / edge cases
- Unknown SourceNodeId (no materialized condition for the ref): drop silently, which preserves GenericDriverNodeManager's documented behavior.
- Byte-parity between Phase7Composer and DeploymentArtifact for alarm tags: parity round-trip test (the established invariant).
- Redeploy double-delivery: DriverInstanceActor unsubscribes OnAlarmEvent on teardown; WriteAlarmCondition's delta-gate independently suppresses duplicate events; RebuildAddressSpace clears _alarmConditions.
- Transition before condition materialized / after rebuild: unknown-ref drop handles it; the projector's prior-state dict is keyed by NodeId and tolerates a cold start (first event seeds state).
- A tag with both a value and an alarm intent: Phase B treats an alarm-marked tag as a condition node only (not also a plain variable), matching the retired mirror, where an alarm attribute surfaced as a condition.
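The unknown-ref drop and the Primary gate can be illustrated without Akka. In this sketch NativeAlarmRouter is a hypothetical stand-in for the ForwardNativeAlarm logic: node IDs are plain strings, the two Action delegates stand in for the AlarmStateUpdate Tell and the alerts publish, and emitting one alert per resolved node ID is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class NativeAlarmRouter
{
    // (DriverInstanceId, FullName) → condition node IDs, built during apply.
    private readonly Dictionary<(string DriverInstanceId, string FullName), HashSet<string>>
        _alarmNodeIdByDriverRef = new();

    private readonly Func<bool> _isPrimary;            // stand-in for the _localRole check
    private readonly Action<string> _updateCondition;  // stand-in for the AlarmStateUpdate Tell
    private readonly Action<string> _publishAlert;     // stand-in for the alerts publish

    public NativeAlarmRouter(
        Func<bool> isPrimary, Action<string> updateCondition, Action<string> publishAlert)
    {
        _isPrimary = isPrimary;
        _updateCondition = updateCondition;
        _publishAlert = publishAlert;
    }

    public void Register(string driverInstanceId, string fullName, string nodeId)
    {
        var key = (driverInstanceId, fullName);
        if (!_alarmNodeIdByDriverRef.TryGetValue(key, out var set))
            _alarmNodeIdByDriverRef[key] = set = new HashSet<string>();
        set.Add(nodeId);
    }

    // Returns how many conditions were updated; 0 means the event was dropped.
    public int Forward(string driverInstanceId, string sourceNodeId)
    {
        if (!_alarmNodeIdByDriverRef.TryGetValue((driverInstanceId, sourceNodeId), out var nodeIds))
            return 0; // unknown ref: drop silently

        var primary = _isPrimary();
        foreach (var nodeId in nodeIds)
        {
            _updateCondition(nodeId);           // ungated: every node keeps conditions warm
            if (primary) _publishAlert(nodeId); // gated: only the Primary fans out to alerts
        }
        return nodeIds.Count;
    }
}
```

Rebuilding the map on each apply pass is what lets a transition that arrives before materialization, or after a rebuild, fall through the same drop path.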
Testing (no bUnit)
xUnit + Shouldly (offline):
- ExtractTagAlarm: present / absent / malformed / defaults / unknown-keys-preserved.
- Phase7Composer ↔ DeploymentArtifact byte-parity with alarm-bearing equipment tags.
- NativeAlarmProjector: Raise→active+unacked, Acknowledge→acked, Clear→inactive, Retrigger, Unspecified, severity-bucket map, prior-state carry.
- GalaxyDriver.OnAlarmFeedTransition populates Kind.
- Akka.TestKit, DriverInstanceActor: a fake IAlarmSource driver fires OnAlarmEvent → the actor publishes AttributeAlarmPublished to its parent; unsubscribes on teardown.
- Akka.TestKit, DriverHostActor: AttributeAlarmPublished resolves the ref → Tells AlarmStateUpdate with the projected snapshot; unknown ref dropped; alerts publish is Primary-gated (secondary suppresses).
Live docker-dev /run (user-driven; the agent does NOT sign in) — the gate:
- Author a Galaxy alarm equipment tag (raw TagConfig carrying the alarm object) on the live-gateway-backed MAIN-galaxy-eq; deploy.
- Trip the Galaxy alarm → a Part 9 AlarmConditionState appears active under the equipment via Client.CLI alarms (and read); the AdminUI /alerts row appears.
- Clear → condition goes inactive. (Device-ack round-trip is the deferred follow-up, not part of this gate.)
Suggested slicing (for the plan)
- WS-1: AlarmTransitionKind + AlarmEventArgs.Kind + Galaxy populates it (small/standard; touches a Core.Abstractions contract → ripples to implementers).
- WS-2: ExtractTagAlarm + EquipmentTagPlan.Alarm in both composer + artifact + parity test (high-risk: data-contract byte-parity).
- WS-3: MaterialiseEquipmentTags alarm branch (standard; reuses MaterialiseAlarmCondition).
- WS-4a: NativeAlarmProjector (standard; pure, fully TDD-able offline).
- WS-4b: DriverInstanceActor OnAlarmEvent subscription + publish (high-risk: actor state machine + driver-thread marshaling).
- WS-4c: DriverHostActor alarm map + ForwardNativeAlarm + Primary-gated alerts publish (high-risk: actor/concurrency/redundancy gate).
- WS-5: wire AlarmTransitionEvent fields (folds into WS-4c; verify historian / alerts consume it).
- Docs: document the TagConfig alarm schema (a Galaxy/alarms doc note).
- Live /run: the gate above (user-driven).
Deferred follow-ups (explicitly out of Phase B)
- Inbound device-ack: client Acknowledge → IAlarmSource.AcknowledgeAsync → AVEVA (its own inbound pipeline, mirrors the write-through work).
- SubscribeAlarmsAsync from the materialized alarm-ref set for drivers that gate their feed on it (Galaxy doesn't).
- AdminUI Galaxy picker pre-fill of the alarm object from discovery (IsAlarm/SecurityClass already known): a UI convenience; raw-JSON authoring works without it and avoids live-only Razor binding risk.
- Carrying the raw OPC UA severity (vs. the 4-bucket) end-to-end.
Hard rules (carried into implementation)
- Stage by path; never `git add .`. Never stage sql_login.txt, src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/, pending.md, current.md, or docker-dev/docker-compose.yml.
- Never echo the gateway API key or any secret into a tracked file.
- No force-push, no --no-verify.
- No Configuration entity / EF migration change (the TagConfig route is chosen specifically to honor this).
- No bUnit; Razor/JS proven only by live /run.
- Build on a feature branch off master.
Authoritative touched-code list (for planning)
- src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs (AlarmEventArgs.Kind, new enum)
- src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/GalaxyDriver.cs (OnAlarmFeedTransition populates Kind)
- src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs (EquipmentTagPlan.Alarm, ExtractTagAlarm)
- src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs (BuildEquipmentTagPlans parity)
- src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Applier.cs (MaterialiseEquipmentTags alarm branch)
- src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverInstanceActor.cs (OnAlarmEvent sub + publish)
- src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs (alarm map + ForwardNativeAlarm + gated publish)
- NEW NativeAlarmProjector (Runtime or Commons) + its tests
- OpcUaPublishActor / OtOpcUaNodeManager: no change (reuse AlarmStateUpdate / WriteAlarmCondition)
- A docs note for the TagConfig alarm schema