Doc refresh (task #202) — core architecture docs for multi-driver OtOpcUa

Rewrite seven core-architecture docs to match the shipped multi-driver platform.
The v1 single-driver LmxNodeManager framing is replaced with the Core +
capability-interface model — Galaxy is now one driver of seven, and each doc
points at the current class names + source paths.

What changed per file:
- OpcUaServer.md — OtOpcUaServer as StandardServer host; per-driver
  DriverNodeManager + CapabilityInvoker wiring; Config-DB-driven configuration
  (sp_PublishGeneration, DraftRevisionToken, Admin UI); Phase 6.2
  AuthorizationGate integration.
- AddressSpace.md — GenericDriverNodeManager.BuildAddressSpaceAsync walks
  ITagDiscovery.DiscoverAsync and streams DriverAttributeInfo through
  IAddressSpaceBuilder; CapturingBuilder registers alarm-condition sinks;
  per-driver NodeId schemes replace the fixed ns=1;s=ZB root.
- ReadWriteOperations.md — OnReadValue / OnWriteValue dispatch to
  IReadable.ReadAsync / IWritable.WriteAsync through CapabilityInvoker,
  honoring WriteIdempotentAttribute (#143); two-layer authorization
  (WriteAuthzPolicy + Phase 6.2 AuthorizationGate).
- Subscriptions.md — ISubscribable.SubscribeAsync/UnsubscribeAsync is the
  capability surface; STA-thread story is now Galaxy-specific (StaPump inside
  Driver.Galaxy.Host), other drivers are free-threaded.
- AlarmTracking.md — IAlarmSource is optional; AlarmSurfaceInvoker wraps
  Subscribe/Ack/Unsubscribe with fan-out by IPerCallHostResolver and the
  no-retry AlarmAcknowledge pipeline (#143); CapturingBuilder registers sinks
  at build time.
- DataTypeMapping.md — DriverDataType + SecurityClassification are the
  driver-agnostic enums; per-driver mappers (GalaxyProxyDriver inline,
  AbCipDataType, ModbusDriver, etc.); SecurityClassification is metadata only,
  ACL enforcement is at the server layer.
- IncrementalSync.md — IRediscoverable covers backend-change signals;
  sp_ComputeGenerationDiff + DiffViewer drive generation-level change
  detection; IDriver.ReinitializeAsync is the in-process recovery path.
This commit is contained in:
Joseph Doherty
2026-04-20 01:27:25 -04:00
parent 48970af416
commit 985b7aba26
7 changed files with 290 additions and 699 deletions

View File

@@ -1,234 +1,76 @@
# Alarm Tracking
`LmxNodeManager` generates OPC UA alarm conditions from Galaxy attributes marked as alarms. The system detects alarm-capable attributes during address space construction, creates `AlarmConditionState` nodes, auto-subscribes to the runtime alarm tags via MXAccess, and reports state transitions as OPC UA events.
Alarm surfacing is an optional driver capability exposed via `IAlarmSource` (`src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs`). Drivers whose backends have an alarm concept implement it — today: Galaxy (MXAccess alarms), FOCAS (CNC alarms), OPC UA Client (A&C events from the upstream server). Modbus / S7 / AB CIP / AB Legacy / TwinCAT do not implement the interface and the feature is simply absent from their subtrees.
## AlarmInfo Structure
Each tracked alarm is represented by an `AlarmInfo` instance stored in the `_alarmInAlarmTags` dictionary, keyed by the `InAlarm` tag reference:
## IAlarmSource surface
```csharp
private sealed class AlarmInfo
{
public string SourceTagReference { get; set; } // e.g., "Tag_001.Temperature"
public NodeId SourceNodeId { get; set; }
public string SourceName { get; set; } // attribute name for event messages
public bool LastInAlarm { get; set; } // tracks previous state for edge detection
public AlarmConditionState? ConditionNode { get; set; }
public string PriorityTagReference { get; set; } // e.g., "Tag_001.Temperature.Priority"
public string DescAttrNameTagReference { get; set; } // e.g., "Tag_001.Temperature.DescAttrName"
public ushort CachedSeverity { get; set; }
public string CachedMessage { get; set; }
}
Task<IAlarmSubscriptionHandle> SubscribeAlarmsAsync(
IReadOnlyList<string> sourceNodeIds, CancellationToken cancellationToken);
Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken);
Task AcknowledgeAsync(IReadOnlyList<AlarmAcknowledgeRequest> acknowledgements,
CancellationToken cancellationToken);
event EventHandler<AlarmEventArgs>? OnAlarmEvent;
```
`LastInAlarm` enables edge detection so only actual transitions (inactive-to-active or active-to-inactive) generate events, not repeated identical values.
The driver fires `OnAlarmEvent` for every transition (`Active`, `Acknowledged`, `Inactive`) with an `AlarmEventArgs` carrying the source node id, condition id, alarm type, message, severity (`AlarmSeverity` enum), and source timestamp.
## Alarm Detection via is_alarm Flag
## AlarmSurfaceInvoker
During `BuildAddressSpace` (and `BuildSubtree` for incremental sync), the node manager scans each non-area Galaxy object for attributes where `IsAlarm == true` and `PrimitiveName` is empty (direct attributes only, not primitive children):
`AlarmSurfaceInvoker` (`src/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs`) wraps the three mutating surfaces through `CapabilityInvoker`:
```csharp
var alarmAttrs = objAttrs.Where(a => a.IsAlarm && string.IsNullOrEmpty(a.PrimitiveName)).ToList();
```
- `SubscribeAlarmsAsync` / `UnsubscribeAlarmsAsync` run through the `DriverCapability.AlarmSubscribe` pipeline — retries apply under the tier configuration.
- `AcknowledgeAsync` runs through `DriverCapability.AlarmAcknowledge` which does NOT retry per decision #143. A timed-out ack may have already registered at the plant floor; replay would silently double-acknowledge.
The `IsAlarm` flag originates from the `AlarmExtension` primitive in the Galaxy repository database. When a Galaxy attribute has an associated `AlarmExtension` primitive, the SQL query sets `is_alarm = 1` on the corresponding `GalaxyAttributeInfo`.
Multi-host fan-out: when the driver implements `IPerCallHostResolver`, each source node id is resolved individually and batches are grouped by host so a dead PLC inside a multi-device driver doesn't poison sibling breakers. Single-host drivers fall back to `IDriver.DriverInstanceId` as the pipeline-key host.
For each alarm attribute, the code verifies that a corresponding `InAlarm` sub-attribute variable node exists in `_tagToVariableNode` (constructed from `FullTagReference + ".InAlarm"`). If the variable node is missing, the alarm is skipped -- this prevents creating orphaned alarm conditions for attributes whose extension primitives were not published.
## Condition-node creation via CapturingBuilder
## Template-Based Alarm Object Filter
Alarm-condition nodes are materialized at address-space build time. During `GenericDriverNodeManager.BuildAddressSpaceAsync` the builder is wrapped in a `CapturingBuilder` that observes every `Variable()` call. When a driver calls `IVariableHandle.MarkAsAlarmCondition(AlarmConditionInfo)` on a returned handle, the server-side `DriverNodeManager.VariableHandle` creates a sibling `AlarmConditionState` node and returns an `IAlarmConditionSink`. The wrapper stores the sink in `_alarmSinks` keyed by the variable's full reference, then `GenericDriverNodeManager` registers a forwarder on `IAlarmSource.OnAlarmEvent` that routes each push to the matching sink by `SourceNodeId`. Unknown source ids are dropped silently — they may belong to another driver.
When large galaxies contain more alarm-bearing objects than clients need, `OpcUa.AlarmFilter.ObjectFilters` restricts alarm condition creation to a subset of objects selected by **template name pattern**. The filter is applied at both alarm creation sites -- the full build in `BuildAddressSpace` and the subtree rebuild path triggered by Galaxy redeployment -- so the included set is recomputed on every rebuild against the fresh hierarchy.
The `AlarmConditionState` layout matches OPC UA Part 9:
### Matching rules
- `SourceNode` → the originating variable
- `SourceName` / `ConditionName` → from `AlarmConditionInfo.SourceName`
- Initial state: enabled, inactive, acknowledged, severity per `InitialSeverity`, retain false
- `HasCondition` references wire the source variable ↔ the condition node bidirectionally
- `*` is the only wildcard (glob-style, zero or more characters). All other regex metacharacters are escaped and matched literally.
- Matching is case-insensitive.
- The leading `$` used by Galaxy template `tag_name` values is normalized away on both the stored chain entry and the operator pattern, so `TestMachine*` matches the stored `$TestMachine`.
- Each configured entry may itself be comma-separated for operator convenience (`"TestMachine*, Pump_*"`).
- An empty list disables the filter and restores the prior behavior: every alarm-bearing object is tracked when `AlarmTrackingEnabled=true`.
Drivers flag alarm-bearing variables at discovery time via `DriverAttributeInfo.IsAlarm = true`. The Galaxy driver, for example, sets this on attributes that have an `AlarmExtension` primitive in the Galaxy repository DB; FOCAS sets it on the CNC alarm register.
### What gets included
## State transitions
Every Galaxy object whose **template derivation chain** contains any template matching any pattern is included. The chain walks `gobject.derived_from_gobject_id` from the instance through its immediate template and each ancestor template, up to `$Object`. An instance of `TestCoolMachine` whose chain is `$TestCoolMachine -> $TestMachine -> $UserDefined` matches the pattern `TestMachine` via the ancestor hit.
`ConditionSink.OnTransition` runs under the node manager's `Lock` and maps the `AlarmEventArgs.AlarmType` string to Part 9 state:
Inclusion propagates down the **containment hierarchy**: if an object matches, all of its descendants are included as well, regardless of their own template chains. This lets operators target a parent and pick up all its alarm-bearing children with one pattern.
| AlarmType | Action |
|---|---|
| `Active` | `SetActiveState(true)`, `SetAcknowledgedState(false)`, `Retain = true` |
| `Acknowledged` | `SetAcknowledgedState(true)` |
| `Inactive` | `SetActiveState(false)`; `Retain = false` once both inactive and acknowledged |
Each object is evaluated exactly once. Overlapping matches (multiple patterns hit, or both an ancestor and descendant match independently) never produce duplicate alarm condition subscriptions -- the filter operates on object identity via a `HashSet<int>` of included `GobjectId` values.
Severity is remapped: `AlarmSeverity.Low/Medium/High/Critical` → OPC UA numeric 250 / 500 / 700 / 900. `Message.Value` is set from `AlarmEventArgs.Message` on every transition. `ClearChangeMasks(true)` and `ReportEvent(condition)` fire the OPC UA event notification for clients subscribed to any ancestor notifier.
### Resolution algorithm
## Acknowledge dispatch
`AlarmObjectFilter.ResolveIncludedObjects(hierarchy)` runs once per build:
Alarm acknowledgement initiated by an OPC UA client flows:
1. Compile each pattern into a regex with `IgnoreCase | CultureInvariant | Compiled`.
2. Build a `parent -> children` map from the hierarchy. Orphans (parent id not in the hierarchy) are treated as roots.
3. BFS from each root with a `(nodeId, parentIncluded)` queue and a `visited` set for cycle defense.
4. At each node: if the parent was included OR any chain entry matches any pattern, add the node and mark its subtree as included.
5. Return the `HashSet<int>` of included object IDs. When no patterns are configured the filter is disabled and the method returns `null`, which the alarm loop treats as "no filtering".
1. The SDK invokes the `AlarmConditionState.OnAcknowledge` method delegate.
2. The handler checks the session's roles for `AlarmAck` — drivers never see a request the session wasn't entitled to make.
3. `AlarmSurfaceInvoker.AcknowledgeAsync` is called with the source / condition / comment tuple. The invoker groups by host and runs each batch through the no-retry `AlarmAcknowledge` pipeline.
After each resolution, `UnmatchedPatterns` exposes any raw pattern that matched zero objects so the startup log can warn about operator typos without failing startup.
Drivers return normally for success or throw to signal the ack failed at the backend.
### How the alarm loop applies the filter
## EventNotifier propagation
```csharp
// LmxNodeManager.BuildAddressSpace (and the subtree rebuild path)
if (_alarmTrackingEnabled)
{
var includedIds = ResolveAlarmFilterIncludedIds(sorted); // null if no filter
foreach (var obj in sorted)
{
if (obj.IsArea) continue;
if (includedIds != null && !includedIds.Contains(obj.GobjectId)) continue;
// ... existing alarm-attribute collection + AlarmConditionState creation
}
}
```
Drivers that want hierarchical alarm subscriptions propagate `EventNotifier.SubscribeToEvents` up the containment chain during discovery — the Galaxy driver flips the flag on every ancestor of an alarm-bearing object up to the driver root, mirroring v1 behavior. Clients subscribed at the driver root, a mid-level folder, or the `Objects/` root see alarm events from every descendant with an `AlarmConditionState` sibling. The driver-root `FolderState` is created in `DriverNodeManager.CreateAddressSpace` with `EventNotifier = SubscribeToEvents | HistoryRead` so alarm event subscriptions and alarm history both have a single natural target.
`ResolveAlarmFilterIncludedIds` also emits a one-line summary (`Alarm filter: X of Y objects included (Z pattern(s))`) and per-pattern warnings for patterns that matched nothing. The included count is published to the dashboard via `AlarmFilterIncludedObjectCount`.
## ConditionRefresh
### Runtime telemetry
The OPC UA `ConditionRefresh` service queues the current state of every retained condition back to the requesting monitored items. `DriverNodeManager` iterates the node manager's `AlarmConditionState` collection and queues each condition whose `Retain.Value == true` — matching the Part 9 requirement.
`LmxNodeManager` exposes three read-only properties populated by the filter:
## Key source files
- `AlarmFilterEnabled` -- true when patterns are configured.
- `AlarmFilterPatternCount` -- number of compiled patterns.
- `AlarmFilterIncludedObjectCount` -- number of objects in the most recent included set.
`StatusReportService` reads these into `AlarmStatusInfo.FilterEnabled`, `FilterPatternCount`, and `FilterIncludedObjectCount`. The Alarms panel on the dashboard renders `Filter: N pattern(s), M object(s) included` only when the filter is enabled. See [Status Dashboard](StatusDashboard.md#alarms).
### Validator warning
`ConfigurationValidator.ValidateAndLog()` logs the effective filter at startup and emits a `Warning` if `AlarmFilter.ObjectFilters` is non-empty while `AlarmTrackingEnabled` is `false`, because the filter would have no effect.
## AlarmConditionState Creation
Each detected alarm attribute produces an `AlarmConditionState` node:
```csharp
var condition = new AlarmConditionState(sourceVariable);
condition.Create(SystemContext, conditionNodeId,
new QualifiedName(alarmAttr.AttributeName + "Alarm", NamespaceIndex),
new LocalizedText("en", alarmAttr.AttributeName + " Alarm"),
true);
```
Key configuration on the condition node:
- **SourceNode** -- Set to the OPC UA NodeId of the source variable, linking the condition to the attribute that triggered it.
- **SourceName / ConditionName** -- Set to the Galaxy attribute name for identification in event notifications.
- **AutoReportStateChanges** -- Set to `true` so the OPC UA framework automatically generates event notifications when condition properties change.
- **Initial state** -- Enabled, inactive, acknowledged, severity Medium, retain false.
- **HasCondition references** -- Bidirectional references are added between the source variable and the condition node.
The condition's `OnReportEvent` callback forwards events to `Server.ReportEvent` so they reach clients subscribed at the server level.
### Condition Methods
Each alarm condition supports the following OPC UA Part 9 methods:
- **Acknowledge** (`OnAcknowledge`) -- Writes the acknowledgment message to the Galaxy `AckMsg` tag. Requires the `AlarmAck` role.
- **Confirm** (`OnConfirm`) -- Confirms a previously acknowledged alarm. The SDK manages the `ConfirmedState` transition.
- **AddComment** (`OnAddComment`) -- Attaches an operator comment to the condition for audit trail purposes.
- **Enable / Disable** (`OnEnableDisable`) -- Activates or deactivates alarm monitoring for the specific condition. The SDK manages the `EnabledState` transition.
- **Shelve** (`OnShelve`) -- Supports `TimedShelve`, `OneShotShelve`, and `Unshelve` operations. The SDK manages the `ShelvedStateMachineType` state transitions including automatic timed unshelve.
- **TimedUnshelve** (`OnTimedUnshelve`) -- Automatically called by the SDK when a timed shelve period expires.
### Event Fields
Alarm events include the following fields:
- `EventId` -- Unique GUID for each event, used as reference for Acknowledge/Confirm
- `ActiveState`, `AckedState`, `ConfirmedState` -- State transitions
- `Message` -- Alarm message from Galaxy `DescAttrName` or default text
- `Severity` -- Galaxy Priority clamped to OPC UA range 1-1000
- `Retain` -- True while alarm is active or unacknowledged
- `LocalTime` -- Server timezone offset with daylight saving flag
- `Quality` -- Set to Good for alarm events
## Auto-subscription to Alarm Tags
After alarm condition nodes are created, `SubscribeAlarmTags` opens MXAccess subscriptions for three tags per alarm:
1. **InAlarm** (`Tag_001.Temperature.InAlarm`) -- The boolean trigger for alarm activation/deactivation.
2. **Priority** (`Tag_001.Temperature.Priority`) -- Numeric priority that maps to OPC UA severity.
3. **DescAttrName** (`Tag_001.Temperature.DescAttrName`) -- String description used as the alarm event message.
These subscriptions are opened unconditionally (not ref-counted) because they serve the server's own alarm tracking, not client-initiated monitoring. Tags that do not have corresponding variable nodes in `_tagToVariableNode` are skipped.
## EventNotifier Propagation
When a Galaxy object contains at least one alarm attribute, `EventNotifiers.SubscribeToEvents` is set on the object node **and all its ancestors** up to the root. This allows OPC UA clients to subscribe to events at any level in the hierarchy and receive alarm notifications from all descendants:
```csharp
if (hasAlarms && _nodeMap.TryGetValue(obj.GobjectId, out var objNode))
EnableEventNotifierUpChain(objNode);
```
For example, an alarm on `TestMachine_001.SubObject.Temperature` will be visible to clients subscribed on `SubObject`, `TestMachine_001`, or the root `ZB` folder. The root `ZB` folder also has `EventNotifiers.SubscribeToEvents` enabled during initial construction.
## InAlarm Transition Detection in DispatchLoop
Alarm state changes are detected in the dispatch loop's Phase 1 (outside `Lock`), which runs on the background dispatch thread rather than the STA thread. This placement is intentional because the detection logic reads Priority and DescAttrName values from MXAccess, which would block the STA thread if done inside the `OnMxAccessDataChange` callback.
For each pending data change, the loop checks whether the address matches a key in `_alarmInAlarmTags`:
```csharp
if (_alarmInAlarmTags.TryGetValue(address, out var alarmInfo))
{
var newInAlarm = vtq.Value is true || vtq.Value is 1
|| (vtq.Value is int intVal && intVal != 0);
if (newInAlarm != alarmInfo.LastInAlarm)
{
alarmInfo.LastInAlarm = newInAlarm;
// Read Priority and DescAttrName via MXAccess (outside Lock)
...
pendingAlarmEvents.Add((alarmInfo, newInAlarm));
}
}
```
The boolean coercion handles multiple value representations: `true`, integer `1`, or any non-zero integer. When the value changes state, Priority and DescAttrName are read synchronously from MXAccess to populate `CachedSeverity` and `CachedMessage`. These reads happen outside `Lock` because they call into the STA thread.
Priority values are clamped to the OPC UA severity range (1-1000). Both `int` and `short` types are handled.
## ReportAlarmEvent
`ReportAlarmEvent` runs inside `Lock` during Phase 2 of the dispatch loop. It updates the `AlarmConditionState` and generates an OPC UA event:
```csharp
condition.SetActiveState(SystemContext, active);
condition.Message.Value = new LocalizedText("en", message);
condition.SetSeverity(SystemContext, (EventSeverity)severity);
condition.Retain.Value = active || (condition.AckedState?.Id?.Value == false);
```
Key behaviors:
- **Active state** -- Set to `true` on activation, `false` on clearing.
- **Message** -- Uses `CachedMessage` (from DescAttrName) when available on activation. Falls back to a generated `"Alarm active: {SourceName}"` string. Cleared alarms always use `"Alarm cleared: {SourceName}"`.
- **Severity** -- Set from `CachedSeverity`, which was read from the Priority tag.
- **Retain** -- `true` while the alarm is active or unacknowledged. This keeps the condition visible in condition refresh responses.
- **Acknowledged state** -- Reset to `false` when the alarm activates, requiring explicit client acknowledgment. When role-based auth is active, alarm acknowledgment requires the `AlarmAck` role on the session (checked via `GrantedRoleIds`). Users without this role receive `BadUserAccessDenied`.
The event is reported by walking up the notifier chain from the source variable's parent through all ancestor nodes. Each ancestor with `EventNotifier` set receives the event via `ReportEvent`, so clients subscribed at any level in the Galaxy hierarchy see alarm transitions from descendant objects.
## Condition Refresh Override
The `ConditionRefresh` override iterates all tracked alarms and queues retained conditions to the requesting monitored items:
```csharp
public override ServiceResult ConditionRefresh(OperationContext context,
IList<IEventMonitoredItem> monitoredItems)
{
foreach (var kvp in _alarmInAlarmTags)
{
var info = kvp.Value;
if (info.ConditionNode == null || info.ConditionNode.Retain?.Value != true)
continue;
foreach (var item in monitoredItems)
item.QueueEvent(info.ConditionNode);
}
return ServiceResult.Good;
}
```
Only conditions where `Retain.Value == true` are included. This means only active or unacknowledged alarms appear in condition refresh responses, matching the OPC UA specification requirement that condition refresh returns the current state of all retained conditions.
- `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs` — capability contract + `AlarmEventArgs`
- `src/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs` — per-host fan-out + no-retry ack
- `src/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs``CapturingBuilder` + alarm forwarder
- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs``VariableHandle.MarkAsAlarmCondition` + `ConditionSink`
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/Alarms/GalaxyAlarmTracker.cs` — Galaxy-specific alarm-event production