docs: plan — alarms over the mxaccessgw gateway
Coordinated cross-repo epic to restore the three v1 alarm capabilities that PR 7.2 regressed: rich MxAccess alarm-event metadata, native Acknowledge semantics, and the IAlarmHistorianWriter write-back path.

Architectural split: gateway owns MxAccess transport (new OnAlarmTransition event family + AcknowledgeAlarm / QueryActiveAlarms / WriteHistorianEvent RPCs); lmxopcua keeps the OPC UA Part 9 state machine, ACL/role enforcement, and multi-source aggregation. The existing value-driven sub-attribute path stays as fallback.

10 PRs total — 5 in mxaccessgw, 5 in lmxopcua — sequenced so each side's work is independently reviewable. End-of-epic gate is a parity matrix run with five new alarm scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
617
docs/plans/alarms-over-gateway.md
Normal file
@@ -0,0 +1,617 @@
# Plan — alarms over the mxaccessgw gateway

Coordinated epic across two repos:

- **`lmxopcua`** (this repo) — `c:\Users\dohertj2\Desktop\lmxopcua\`
- **`mxaccessgw`** — `c:\Users\dohertj2\Desktop\mxaccessgw\`

## Why

PR 7.2 (2026-04-30, commit `ae7106d`) retired the in-process v1 Galaxy stack
(`Driver.Galaxy.Host` / `.Proxy` / `.Shared` + `OtOpcUaGalaxyHost` Windows
service) and migrated Galaxy access to the in-process `GalaxyDriver` over
mxaccessgw's gRPC. In doing so, three v1 capabilities regressed:

1. **Native MxAccess alarm-event metadata** — v1's `GalaxyAlarmTracker`
   surfaced rich alarm transitions (operator comment, original raise time,
   ack time, alarm category, native severity). The current architecture
   reconstructs Part 9 transitions by subscribing to four sub-attribute
   value updates (`InAlarm`, `Acked`, `Priority`, `Description`) — fine for
   raise/clear but loses everything else.
2. **Native MxAccess Acknowledge semantics** — v1 called the MxAccess ack
   API directly from `GalaxyAlarmTracker`. Today, OPC UA acks are written
   into the `AckMsgWriteRef` sub-attribute — semantically valid but a
   round-trip through the value path that loses operator-comment fidelity.
3. **Alarm-historian write-back path** — `GalaxyHistorianWriter`
   implemented `IAlarmHistorianWriter` and forwarded scripted-alarm and
   Galaxy-native alarm transitions back to AVEVA Historian via
   `aahClientManaged`. PR 7.2 deleted it. `Phase7Composer.ResolveHistorianSink`
   now finds no writer and falls back to `NullAlarmHistorianSink`, so
   **scripted-alarm transitions are queued locally and then silently discarded.**
   (Galaxy-native alarms still reach AVEVA Historian via the Galaxy template's
   own `HistorizeToAveva` toggle, independent of our sink — that path
   wasn't broken.)

`gateway.md` (mxaccessgw, line 8) explicitly commits the gateway to "full
MXAccess parity… preserve MXAccess behavior first… **native MXAccess event
families**." Today's gateway proto exposes only data-change families. Closing
the alarm regression and fulfilling that parity statement are the same task.

## Goals

- Restore all three regressed capabilities to feature parity with v1.
- Keep the v2 architectural split — gateway owns MxAccess transport;
  lmxopcua owns OPC UA Part 9 semantics, ACL/role enforcement, and
  multi-source aggregation (driver-native + scripted + sub-attribute).
- Preserve the value-driven sub-attribute path as a fallback for Galaxy
  templates that don't carry `$Alarm*` extensions.
- Land the work as a sequence of small, independently-reviewable PRs that
  alternate between repos in dependency order.

## Non-goals

- Reimplementing the Part 9 state machine inside mxaccessgw. The gateway
  stays UA-agnostic.
- Reworking the LDAP role-grant or OPC UA AlarmAck ACL surface — those
  already exist and route through `Server/Alarms/IAlarmAcknowledger`.
- Adding alarm support to non-Galaxy drivers (AbCip / FOCAS / OpcUaClient
  already have their own `IAlarmSource` implementations; Modbus / S7 /
  AbLegacy / TwinCAT don't have a native alarm bus and are out of scope).
- Altering Galaxy template conventions or `$Alarm*` extensions in the
  customer's Galaxy.

## Before → after

**Today (post-PR 7.2):**

```
MxAccess COM (gateway worker)
   │  data-change events only on the MxEvent stream
   ▼
GalaxyDriver (no IAlarmSource)
   │  IWritable / ISubscribable / ITagDiscovery only
   ▼
DriverNodeManager
   ├─ subscribes to four $Alarm* sub-attributes per condition
   ├─ AlarmConditionService rebuilds Part 9 transitions from value updates
   └─ DriverWritableAcknowledger writes AckMsgWriteRef on ack

Phase7Composer.ResolveHistorianSink → NullAlarmHistorianSink
   (scripted-alarm transitions queue → silently discarded)
```

**After this epic:**

```
MxAccess COM (gateway worker)
   │  data-change      ──┐
   │  alarm-transition   │
   │  write-complete     ├─► single MxEvent stream (new family added)
   ▼                     ▼
GalaxyDriver : ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable,
               IHostConnectivityProbe, IAlarmSource ← restored
   ├─ EventPump dispatches OnAlarmTransition family → IAlarmSource.OnAlarmEvent
   ├─ AcknowledgeAsync → gateway RPC AcknowledgeAlarm
   └─ QueryActiveAlarmsAsync → gateway RPC QueryActiveAlarms (ConditionRefresh)

DriverNodeManager
   ├─ rich alarm events from IAlarmSource.OnAlarmEvent → AlarmConditionService
   ├─ value-driven sub-attribute path STILL WORKS for templates without $Alarm
   ├─ DriverWritableAcknowledger preserved as fallback for the value path
   └─ ScriptedAlarmEngine output continues to feed AlarmConditionService

Phase7Composer.ResolveHistorianSink → GatewayAlarmHistorianWriter
   ├─ scripted-alarm transitions → SqliteStoreAndForwardSink
   └─ drain worker → gateway RPC WriteHistorianEvent → AVEVA Historian
```

## Architecture decisions

**D1 — Where the Part 9 state machine runs.** Stays in lmxopcua's
`AlarmConditionService`. Gateway is UA-agnostic. ScriptedAlarmEngine produces
Part 9 transitions with no MxAccess origin; the aggregator must live where all
sources converge.

**D2 — Where authz on Acknowledge runs.** Stays in lmxopcua. The OPC UA
`AlarmConditionState.OnAcknowledge` delegate already checks the session's
roles for `AlarmAck` against the LDAP/role-grant ACL. The gateway should
never be reachable in a way that bypasses that check.

**D3 — How rich alarm events reach OPC UA clients.** New `MxEventFamily`
on the existing `StreamEvents` RPC (no second stream). This gives alarm
events latency parity with data-change events and reuses the
bounded-channel + worker-side delivery semantics already documented in
`gateway.md`.

**D4 — Sub-attribute fallback path stays.** Some Galaxy templates won't
have `$Alarm*` extensions yet; the existing value-driven path remains the
only way to surface alarms for those templates. Both paths feed
`AlarmConditionService`. Driver-native events take precedence when both
are present (more authoritative, lower latency).

**D5 — Where the historian writer lives.** As a new RPC on the gateway
(`WriteHistorianEvent`). The Wonderware sidecar's existing
`WriteAlarmEvents` IPC slot stays unwired and is deleted as part of this
epic — the gateway is the canonical place for "write to AVEVA Historian"
since the gateway already owns AVEVA-COM access. This also means the
sidecar (long term) only does *reads* and could potentially retire entirely
if the historian-client REST migration (`docs/plans/...`) lands.

## Track A — mxaccessgw changes

All five PRs land in `c:\Users\dohertj2\Desktop\mxaccessgw\`.

### PR A.1 — proto: add alarm-transition event family + ack/query RPCs

**Files** (`src\MxGateway.Contracts\Protos\mxaccess_gateway.proto`):

1. Extend `MxEventFamily` (line 403):
   ```
   MX_EVENT_FAMILY_ON_ALARM_TRANSITION = 5;
   ```

2. Extend `MxEvent.body` oneof (line 395) with:
   ```
   OnAlarmTransitionEvent on_alarm_transition = 24;
   ```

3. New message `OnAlarmTransitionEvent` after the existing event-family
   bodies (line 425+). Carry the full MxAccess alarm payload — alarm name,
   source object reference, alarm-type-name (e.g. "AnalogLimitAlarm.HiHi"),
   transition kind enum (`Raise` / `Acknowledge` / `Clear`), severity (raw
   numeric — keep MxAccess scale; mapping to OPC UA 0-1000 happens
   server-side in lmxopcua), `original_raise_timestamp`,
   `transition_timestamp`, optional `operator_user`, optional
   `operator_comment`, alarm `category` string, alarm `description`. Mirror
   the field set documented in v1's `GalaxyAlarmTracker`.

4. New RPCs on `MxAccessGateway` service (line 11):
   ```
   rpc AcknowledgeAlarm(AcknowledgeAlarmRequest) returns (AcknowledgeAlarmReply);
   rpc QueryActiveAlarms(QueryActiveAlarmsRequest) returns (stream ActiveAlarmSnapshot);
   ```

   `AcknowledgeAlarmRequest` carries `session_id`, `alarm_full_reference`,
   `comment`, `user_principal`. Reply carries `MxStatusProxy`.

   `QueryActiveAlarmsRequest` carries `session_id`, optional
   `alarm_filter_prefix` (for ConditionRefresh on a sub-tree).
   `ActiveAlarmSnapshot` carries the same fields as
   `OnAlarmTransitionEvent` plus `current_state` enum (`Active` /
   `ActiveAcked` / `Inactive`).

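
As a rough illustration of how the three transition kinds relate to the three `current_state` values, the following C# sketch folds a transition sequence into a state. The enum and class names are hypothetical stand-ins for whatever the generated proto types end up being called, and the rule that an `Acknowledge` is ignored while the alarm is inactive is an assumption, not documented MxAccess behaviour.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the generated proto enums.
public enum TransitionKind { Raise, Acknowledge, Clear }
public enum AlarmState { Inactive, Active, ActiveAcked }

public static class AlarmStateFold
{
    // Applies a single transition to the previous state.
    public static AlarmState Apply(AlarmState previous, TransitionKind kind)
    {
        switch (kind)
        {
            case TransitionKind.Raise:
                return AlarmState.Active;
            case TransitionKind.Acknowledge:
                // Assumption: an ack only changes state while the alarm is active.
                return previous == AlarmState.Inactive
                    ? AlarmState.Inactive
                    : AlarmState.ActiveAcked;
            case TransitionKind.Clear:
                return AlarmState.Inactive;
            default:
                throw new ArgumentOutOfRangeException(nameof(kind));
        }
    }

    // Folds an ordered transition sequence, starting from Inactive.
    public static AlarmState Fold(IEnumerable<TransitionKind> transitions)
    {
        var state = AlarmState.Inactive;
        foreach (var kind in transitions)
        {
            state = Apply(state, kind);
        }
        return state;
    }
}
```

Under these assumptions, folding `Raise` then `Acknowledge` yields `ActiveAcked`, which is the state the A.3 integration test expects after an ack.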

**Tests** (`MxGateway.Tests` — proto/codegen sanity):

- Round-trip Serialize→Deserialize for the new messages with all-fields
  populated and empty-optional-fields cases.
- `MxEvent.body` oneof selection guard — assert that at most one body is
  ever populated (under protobuf oneof semantics, setting a second body
  clears the first).

**Out of scope:** worker-side wiring (PR A.2), gateway-side dispatch (PR A.3).
PR A.1 is a pure contract-surface change; nothing functional yet.

### PR A.2 — worker: subscribe to MxAccess alarm event source

**Files** (`src\MxGateway.Worker\` — net48/x86):

The MxAccess Toolkit exposes alarm subscription separately from data
subscription. Per AVEVA's MXAccess C++ Toolkit reference (canonical doc
referenced from `gateway.md`), alarm events arrive through the
`IAlarmEventSink` interface registered against the MxAccess `Alarms`
collection of an open session, OR via the MxAccess "alarm provider"
subscription pattern (depends on Toolkit version on the worker host —
verify against the version actually deployed in the worker bin during
PR A.2).

1. Worker subscribes to MxAccess alarms once per session, with a single
   sink that fans out into the same bounded channel the data-change pump
   uses (`MxGateway.Worker\Eventing\EventChannel.cs` or whatever the worker
   currently calls its sink — verify name during the PR).
2. Sink translates each MxAccess alarm event into a `WorkerEvent` proto
   (defined in `mxaccess_worker.proto`) carrying the new
   `OnAlarmTransitionEvent` body. Reuses the existing `worker_sequence`
   counter so ordering is preserved across families.
3. Worker honours the same backpressure rules as data-change events —
   newest-dropped on full channel, single dropped-counter metric per
   family.

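
The backpressure rule in step 3 could look roughly like the C# sketch below: a bounded buffer that rejects the incoming (newest) event when full and keeps one dropped counter per event family. `BoundedEventBuffer`, `EventFamily`, and the method names are hypothetical; the worker's real channel type is not shown in this plan and may be built differently.

```csharp
using System.Collections.Generic;

// Hypothetical family tag; the real worker would use its proto enum.
public enum EventFamily { DataChange, AlarmTransition }

public sealed class BoundedEventBuffer<T>
{
    private readonly Queue<(EventFamily Family, T Payload)> _queue =
        new Queue<(EventFamily Family, T Payload)>();
    private readonly Dictionary<EventFamily, long> _dropped =
        new Dictionary<EventFamily, long>();
    private readonly object _gate = new object();
    private readonly int _capacity;

    public BoundedEventBuffer(int capacity)
    {
        _capacity = capacity;
    }

    // Returns false and bumps the family's dropped counter when the buffer is full.
    // Queued events are never evicted; only the incoming event is discarded.
    public bool TryEnqueue(EventFamily family, T payload)
    {
        lock (_gate)
        {
            if (_queue.Count >= _capacity)
            {
                _dropped.TryGetValue(family, out var count);
                _dropped[family] = count + 1;
                return false;
            }
            _queue.Enqueue((family, payload));
            return true;
        }
    }

    public bool TryDequeue(out (EventFamily Family, T Payload) item)
    {
        lock (_gate)
        {
            if (_queue.Count == 0)
            {
                item = default;
                return false;
            }
            item = _queue.Dequeue();
            return true;
        }
    }

    public long DroppedCount(EventFamily family)
    {
        lock (_gate)
        {
            return _dropped.TryGetValue(family, out var count) ? count : 0;
        }
    }
}
```

Because both families share one buffer, a burst of data-change events can still crowd out alarm transitions; the per-family counter at least makes that visible in metrics.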

**Tests** (`MxGateway.Worker.Tests`):

- Fake `IAlarmEventSink` source emits canned transitions; assert the
  worker forwards each as the right `WorkerEvent` shape.
- Cancellation test — closing the session unsubscribes from MxAccess
  alarms cleanly (no leaked sinks if the worker is recycled mid-session).

**Out of scope:** any gateway-side dispatch, any RPC handler — PR A.2
is worker-internal.

### PR A.3 — gateway: dispatch OnAlarmTransition + implement AcknowledgeAlarm

**Files** (`src\MxGateway.Server\`):

1. The session-level event multiplexer (`Sessions\SessionEventStream.cs`
   or equivalent — verify name during PR) recognizes the new
   `WorkerEvent` body and forwards it as an `MxEvent` with family
   `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` to the gRPC
   `StreamEvents` consumer.
2. New RPC handler `AcknowledgeAlarm` builds an MxAccess `WorkerCommand`
   carrying an `AlarmAcknowledgeCommand` (new in `mxaccess_worker.proto`
   under PR A.1). Forwarded to the worker; reply mapped to
   `AcknowledgeAlarmReply` with the MxAccess `MxStatus` proxy populated.
3. AuthN — same API-key + scope check as existing RPCs. Add a new scope
   `invoke:alarm-ack` (mirrors `invoke:write` granularity); calls made
   with existing keys that lack it are rejected with `PERMISSION_DENIED`.

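
The scope rule in step 3 can be sketched in C# as follows. This is a minimal illustration only: `ScopeGuardSketch` and `ScopeCheckResult` are hypothetical names, and the gateway's real check presumably lives in its existing auth pipeline and maps the denial onto the gRPC `PERMISSION_DENIED` status.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type; the real handler would surface a gRPC status instead.
public enum ScopeCheckResult { Allowed, PermissionDenied }

public static class ScopeGuardSketch
{
    public const string Write = "invoke:write";
    public const string AlarmAck = "invoke:alarm-ack";

    // A key is allowed only if it explicitly holds the required scope.
    // Holding invoke:write does not imply invoke:alarm-ack.
    public static ScopeCheckResult Require(ISet<string> grantedScopes, string requiredScope)
    {
        if (grantedScopes == null) throw new ArgumentNullException(nameof(grantedScopes));
        return grantedScopes.Contains(requiredScope)
            ? ScopeCheckResult.Allowed
            : ScopeCheckResult.PermissionDenied;
    }
}
```

A key that holds only `invoke:write` is denied here, which is the behaviour the AuthN test for this PR asserts.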

**Tests** (`MxGateway.Tests`, `MxGateway.IntegrationTests`):

- Unit: dispatch test — fake worker emits an `AlarmTransition` event;
  assert the gateway forwards it on the live `StreamEvents` channel of
  every subscribed session.
- Integration: end-to-end against the real worker (requires the parity
  rig setup — see `docs\v2\Galaxy.ParityRig.md` in lmxopcua for the
  MxAccess-installed dev box prerequisites). Trigger a real Galaxy
  alarm, assert the gateway emits `OnAlarmTransition`. Acknowledge via
  the new RPC, assert the alarm transitions to `ActiveAcked` and an
  `Acknowledge` transition event is emitted back.
- AuthN: existing key without `invoke:alarm-ack` scope rejected.

### PR A.4 — gateway: ConditionRefresh snapshot via QueryActiveAlarms

**Files** (`src\MxGateway.Server\`, `src\MxGateway.Worker\`):

1. Worker exposes a `QueryActiveAlarmsCommand` that walks the session's
   active-alarm collection and streams snapshots back through the
   existing command-reply channel. The MxAccess Toolkit's
   `Alarms.GetActive()` (verify exact API name during PR) is the
   underlying call.
2. Gateway RPC `QueryActiveAlarms` opens a server-streaming reply and
   batches snapshots through it.
3. AuthN — new scope `invoke:alarm-query` (separate from ack so a
   read-only client can refresh without ack rights).

**Tests:**

- Worker-test: synthetic active set of 0 / 1 / 100 alarms; assert
  pagination respects worker channel capacity.
- Integration: against the parity rig, assert a ConditionRefresh after
  reconnect returns every alarm currently `Active` or `ActiveAcked` in
  the Galaxy.

### PR A.5 — gateway: WriteHistorianEvent RPC for sink write-back

**Files** (`src\MxGateway.Server\`, `src\MxGateway.Worker\`,
`src\MxGateway.Contracts\Protos\mxaccess_gateway.proto`).

1. New RPC `WriteHistorianEvent(WriteHistorianEventRequest) →
   WriteHistorianEventReply`. Request carries an
   `AlarmHistorianRecord` mirroring the existing
   `Core.AlarmHistorian.AlarmHistorianEvent` payload (alarm id,
   equipment path, alarm name, alarm-type-name, severity, event kind,
   message, user, comment, timestamp).
2. Worker maps the record onto `aahClientManaged`'s alarm-event
   write API (the same path v1's `GalaxyHistorianWriter` used). Worker
   batches up to N records per write to amortize the COM round-trip.
3. AuthN — new scope `invoke:historian-write`. Cross-cutting with
   `invoke:write` — keys for OPC UA servers that publish historian
   data must hold both.

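
The "up to N records per write" batching in step 2 could be sketched as below. `RecordBatcher` is a hypothetical helper, not the worker's actual code, and the batch size N is left as a parameter because the plan does not fix a value.

```csharp
using System;
using System.Collections.Generic;

public static class RecordBatcher
{
    // Splits an ordered record stream into batches of at most maxBatchSize,
    // preserving order. Because this is an iterator, the argument check runs
    // when enumeration starts rather than when the method is called.
    public static IEnumerable<IReadOnlyList<T>> Batch<T>(IEnumerable<T> records, int maxBatchSize)
    {
        if (maxBatchSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxBatchSize));

        var current = new List<T>(maxBatchSize);
        foreach (var record in records)
        {
            current.Add(record);
            if (current.Count == maxBatchSize)
            {
                yield return current;
                current = new List<T>(maxBatchSize);
            }
        }

        // Flush the final partial batch, if any.
        if (current.Count > 0)
        {
            yield return current;
        }
    }
}
```

Each yielded batch would correspond to one write call against the historian API, so five records with N = 2 cost three round-trips instead of five.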

**Tests:**

- Worker test: fake `aahClientManaged` writer; assert batching
  semantics + retry-on-Bad-status-code behaviour matches v1's
  `GalaxyHistorianWriter` (per-row outcome reporting).
- Integration: write a record, query it back via existing Historian
  read APIs, assert round-trip fidelity.

**Sequencing within Track A:** A.1 → A.2 → A.3 → A.4 → A.5. A.1 is
mechanical; A.2 + A.3 are the load-bearing changes that unlock the lmxopcua
side. A.4 + A.5 can ship after lmxopcua starts consuming A.3 output.

## Track B — lmxopcua changes

All five PRs land in `c:\Users\dohertj2\Desktop\lmxopcua\`. Each B PR
lists the PRs it depends on — see also the sequencing matrix below.

### PR B.1 — EventPump: dispatch OnAlarmTransition family

**Depends on:** A.1 (proto), A.3 (gateway dispatching the new family).

**Files:**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\EventPump.cs:160` —
  current `Dispatch(MxEvent ev)` returns early for any non-`OnDataChange`
  family. Add a branch:
  ```csharp
  switch (ev.Family) {
      case MxEventFamily.OnDataChange: DispatchDataChange(ev); break;
      case MxEventFamily.OnAlarmTransition: DispatchAlarmTransition(ev); break;
      default: return;
  }
  ```
- New `DispatchAlarmTransition` translates the proto event into an
  `AlarmEventArgs` (existing type from `Core.Abstractions`) and raises an
  internal event the driver subscribes to.
- New `MxAccessSeverityMapper` in `Driver.Galaxy\Runtime\` — maps the
  MxAccess raw severity into the `AlarmSeverity` enum + the OPC UA
  numeric severity (250 / 500 / 700 / 900 ladder per v1's
  `AlarmTracking.md`).

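
To make the shape of the severity mapping concrete, here is a hedged C# sketch. Only the 250 / 500 / 700 / 900 output ladder comes from this plan. The enum member names, the raw cut-offs (250 / 500 / 750), and the assumption that a lower raw value means a more urgent alarm are all placeholders that must be checked against v1's `AlarmTracking.md` before the real `MxAccessSeverityMapper` is written.

```csharp
// Hypothetical member names; the real AlarmSeverity enum lives in lmxopcua.
public enum AlarmSeverity { Low, Medium, High, Critical }

public static class SeverityLadderSketch
{
    // Maps a raw severity onto (enum bucket, OPC UA numeric severity).
    // Placeholder cut-offs; assumes a lower raw value is more urgent.
    public static (AlarmSeverity Severity, ushort OpcUaSeverity) Map(int rawSeverity)
    {
        if (rawSeverity <= 250) return (AlarmSeverity.Critical, 900);
        if (rawSeverity <= 500) return (AlarmSeverity.High, 700);
        if (rawSeverity <= 750) return (AlarmSeverity.Medium, 500);
        return (AlarmSeverity.Low, 250);
    }
}
```

Returning both values from one function keeps the enum bucket and the numeric severity from drifting apart, which is what the table-driven tests for this PR are meant to pin down.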

**Tests** (`tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests\Runtime\`):

- `EventPumpAlarmTests` — feed three synthetic MxEvents (raise / ack /
  clear); assert each fires `OnAlarmEvent` on the driver with correct
  payload.
- Severity-mapping table tests — every documented MxAccess severity
  level → expected (`AlarmSeverity`, OPC UA numeric) tuple.

### PR B.2 — GalaxyDriver re-implements IAlarmSource

**Depends on:** A.3 (`AcknowledgeAlarm` RPC available), B.1 (event
dispatch).

**Files:**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriver.cs:28` — extend the
  class declaration:
  ```csharp
  public sealed class GalaxyDriver
      : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable,
        IRediscoverable, IHostConnectivityProbe, IAlarmSource, IDisposable
  ```
- Implement the four `IAlarmSource` members:
  - `SubscribeAlarmsAsync` — no-op returning a sentinel handle. The
    driver is already subscribed for data; alarm events arrive on the
    same event stream once the gateway emits the new family. (Same
    pattern AbCip uses today — see `Driver.AbCip\AbCipDriver.cs:208`.)
  - `UnsubscribeAlarmsAsync` — no-op.
  - `OnAlarmEvent` — wired to the EventPump branch added in B.1.
  - `AcknowledgeAsync` — calls the new gateway RPC via the
    `IGalaxyAlarmAcknowledger` abstraction (new file, mirrors the
    `IGalaxyDataWriter` pattern), with `GatewayGalaxyAlarmAcknowledger`
    as the production implementation in `Runtime\`. Resilience wrapping
    via `AlarmSurfaceInvoker` per existing pattern.
- `DriverInstanceFactory` for Galaxy registers
  `IGalaxyAlarmAcknowledger` alongside the existing data writer.

**Tests:**

- Subscribe-noop returns a non-null handle; unsubscribe accepts it.
- Acknowledge — fake `IGalaxyAlarmAcknowledger` records the call; assert
  the request shape and resilience-pipeline routing.
- End-to-end test in `Driver.Galaxy.Tests` — fake gateway emits a
  raise-then-ack event sequence; assert the driver fires `OnAlarmEvent`
  twice with matching alarm-id correlation.

### PR B.3 — DriverNodeManager: route to driver-native when present

**Depends on:** B.2.

**Files:**

- `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs` — when
  registering an `AlarmConditionState` for a Galaxy variable, check
  whether the driver is `IAlarmSource`. If yes, prefer the
  `OnAlarmEvent`-driven path; the value-driven sub-attribute path
  becomes the secondary path that handles transitions the driver-native
  stream missed (network blip, gateway restart, or a template that lacks
  the `$Alarm*` extension).
- `Server\Alarms\AlarmConditionService` — already accepts events from
  multiple sources; only addition is a `DriverEventOrigin` enum on
  internal transitions so the dedup logic prefers the richer
  driver-native record over a stale sub-attribute synthesis.
- `IAlarmAcknowledger` resolution in `DriverNodeManager` —
  prefer the driver's `IAlarmSource.AcknowledgeAsync` over
  `DriverWritableAcknowledger` when both are available. Keep
  `DriverWritableAcknowledger` as the fallback for templates without
  `$Alarm*` extensions.

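
One possible shape for the dedup rule described above is sketched here in C#. It is an assumption-laden illustration: `TransitionDeduplicator`, `TransitionKind`, and the "same kind as the last accepted transition means duplicate" rule are hypothetical, and the real `AlarmConditionService` logic may key on timestamps or other fields instead. Only the `DriverEventOrigin` name comes from the plan.

```csharp
using System.Collections.Generic;

public enum DriverEventOrigin { SubAttribute, DriverNative }

// Hypothetical stand-in for the internal transition kind.
public enum TransitionKind { Raise, Acknowledge, Clear }

public sealed class TransitionDeduplicator
{
    private readonly Dictionary<string, (TransitionKind Kind, DriverEventOrigin Origin)> _last =
        new Dictionary<string, (TransitionKind Kind, DriverEventOrigin Origin)>();

    // Returns true when the caller should fire a Part 9 transition.
    // A repeat of the last accepted kind for the same condition is treated
    // as the second source reporting the same transition and is suppressed.
    public bool ShouldFire(string conditionId, TransitionKind kind, DriverEventOrigin origin)
    {
        if (_last.TryGetValue(conditionId, out var last) && last.Kind == kind)
        {
            // Remember that the richer driver-native record has been seen,
            // but do not fire the transition a second time.
            if (origin == DriverEventOrigin.DriverNative)
            {
                _last[conditionId] = (kind, origin);
            }
            return false;
        }

        _last[conditionId] = (kind, origin);
        return true;
    }
}
```

A state-based rule like this suppresses the duplicate regardless of which source arrives first. When the sub-attribute update wins the race, a fuller implementation would presumably also merge the later driver-native metadata into the already-fired condition rather than just recording the origin.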

**Tests:**

- Two-source-fan-in test: same alarm condition receives both a
  driver-native ack event and a sub-attribute value update for the same
  transition; assert no duplicate Part 9 transition fires.
- Acknowledger routing — driver implements `IAlarmSource` →
  ack-via-RPC; driver implements only `IWritable` → ack-via-write
  (existing path).

### PR B.4 — IAlarmHistorianWriter via gateway

**Depends on:** A.5 (`WriteHistorianEvent` RPC available).

**Files:**

- New `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\GatewayAlarmHistorianWriter.cs`
  implementing `IAlarmHistorianWriter`. Calls the gateway RPC from
  Track A.5 with the same batch + per-row outcome semantics v1's
  `GalaxyHistorianWriter` exposed.
- `GalaxyDriverFactory` registers it as a singleton tied to the
  `DriverInstance`.
- `Server\Phase7\Phase7Composer.ResolveHistorianSink` — already scans
  registered drivers for an `IAlarmHistorianWriter`. Once GalaxyDriver
  exposes one, `SqliteStoreAndForwardSink` boots with a real writer
  attached and the `NullAlarmHistorianSink` fallback no longer applies
  on Galaxy installs.
- Delete `WriteAlarmEventsRequest` / `WriteAlarmEventsReply` /
  `IAlarmEventWriter` from the Wonderware sidecar
  (`src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Ipc\Contracts.cs`,
  `Ipc\HistorianFrameHandler.cs`, `Ipc\Framing.cs`). The historian
  sidecar becomes read-only — matches the audit done earlier.

**Tests:**

- `GatewayAlarmHistorianWriter` against a fake gRPC server — single
  record, batch, per-row failure modes (Ack / RetryPlease /
  PermanentFail).
- `Phase7Composer` end-to-end — register a Galaxy driver, assert
  `ResolveHistorianSink` picks `SqliteStoreAndForwardSink` with the
  new writer attached.

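
The per-row outcome handling that the first test exercises might be routed along these lines. The three outcome names are taken from the test description above; `WriteOutcome`, `DrainResult`, `OutcomeRouter`, and the mapping of `PermanentFail` to a dead-letter list are hypothetical and only meant to show how a partially successful batch can be split.

```csharp
using System;
using System.Collections.Generic;

// Outcome names follow the plan; the enum type itself is a stand-in.
public enum WriteOutcome { Ack, RetryPlease, PermanentFail }

public sealed class DrainResult<T>
{
    public List<T> Completed { get; } = new List<T>();
    public List<T> Retry { get; } = new List<T>();
    public List<T> DeadLetter { get; } = new List<T>();
}

public static class OutcomeRouter
{
    // Pairs each record with its outcome by index and sorts it into the
    // list that decides what the store-and-forward queue does next.
    public static DrainResult<T> Route<T>(IReadOnlyList<T> batch, IReadOnlyList<WriteOutcome> outcomes)
    {
        if (batch.Count != outcomes.Count)
        {
            throw new ArgumentException("One outcome is required per record.", nameof(outcomes));
        }

        var result = new DrainResult<T>();
        for (var i = 0; i < batch.Count; i++)
        {
            switch (outcomes[i])
            {
                case WriteOutcome.Ack: result.Completed.Add(batch[i]); break;
                case WriteOutcome.RetryPlease: result.Retry.Add(batch[i]); break;
                case WriteOutcome.PermanentFail: result.DeadLetter.Add(batch[i]); break;
                default: throw new ArgumentOutOfRangeException(nameof(outcomes));
            }
        }
        return result;
    }
}
```

Keeping the outcome per row rather than per batch means one bad record does not force the whole batch to be retried or dropped.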

### PR B.5 — docs + memory housekeeping

**Depends on:** B.1 / B.2 / B.3 / B.4 all green on the parity rig.

**Files:**

- `docs\drivers\Galaxy.md` — current text says the driver implements
  five capability interfaces; update to seven (`IAlarmSource`,
  `IAlarmHistorianWriter`-via-companion).
- `docs\AlarmTracking.md` — promote a fresh top-level doc that
  describes the v2-final architecture (driver-native primary path +
  sub-attribute fallback + scripted-alarm aggregation). Cross-link from
  `docs\README.md`. The v1 archive stays as historical record.
- `docs\v1\AlarmTracking.md` — extend the existing historical banner
  with "Restored to functional parity in this epic — see
  `docs\AlarmTracking.md` for current state."
- Memory entries (`C:\Users\dohertj2\.claude\projects\…\memory\`):
  - Update `project_galaxy_via_mxgateway.md` — add the alarm path
    restoration.
  - Update `project_server_history_alarm_subsystems.md` — note that
    `Phase7Composer.ResolveHistorianSink` now finds a writer on
    Galaxy installs.
- `docs\plans\alarms-over-gateway.md` (this file) — banner the doc
  `✅ Completed YYYY-MM-DD — historical record.` matching the existing
  v2-mxgw plan retirement convention.

## Sequencing matrix

```
Track A (mxaccessgw)                  Track B (lmxopcua)
─────────────────────────             ─────────────────────────
A.1 proto                             (waits)
  │
  ├──────────────────────────►        B.1 EventPump branch
A.2 worker subscription               │   uses proto types only
  │                                   │   unit-testable without live gw
  │
A.3 gateway dispatch + ack RPC ──►    B.2 GalaxyDriver : IAlarmSource
  │                                   │
  │                              ──►  B.3 DriverNodeManager routing
  │
A.4 ConditionRefresh                  │   (B.3 closes the loop with A.4
  │                                       once ConditionRefresh wired)
  │
A.5 WriteHistorianEvent ─────────►    B.4 GatewayAlarmHistorianWriter
                                      │   + sidecar write-path deletion
                                 ──►  B.5 docs + memory
```

Once A.1 has merged, A.2 and B.1 can land in parallel (B.1's tests use the
proto types without needing a running gateway). B.1 stays inert until A.3
ships the gateway dispatch — which is fine; the dispatch branch is a no-op
until events arrive.

## Test gates

Per PR: unit tests pass + build green + analyzer clean (Roslyn analyzer
OTOPCUA0001 still verifies that every alarm-capability call goes through
`AlarmSurfaceInvoker`).

End-of-epic gate: re-run the parity rig (`docs\v2\Galaxy.ParityRig.md`)
with these scenarios added:

1. **Native alarm raise** — Galaxy `$Alarm*` raise with operator-time
   metadata appears as an OPC UA Part 9 transition with full payload
   (no longer reconstructed from sub-attribute value updates).
2. **Native ack** — OPC UA client acks; assert the gateway records the
   ack against MxAccess directly (not via sub-attribute write); operator
   comment present in the resulting `Acknowledged` transition.
3. **ConditionRefresh after reconnect** — disconnect the GalaxyDriver,
   raise three alarms in Galaxy, reconnect; assert all three appear in
   the next ConditionRefresh.
4. **Historian write-back** — fire a scripted alarm; assert it arrives in
   AVEVA Historian via the gateway path (use the existing Historian
   sidecar's read API to query it back).
5. **Sub-attribute fallback still works** — disable `IAlarmSource` on
   the GalaxyDriver via test seam, fire a sub-attribute value change;
   assert Part 9 transition still raised.

Soak target: 24h × 1k tags (light) — same parity-rig harness but
extended to also subscribe to alarms. Pass criterion: zero dropped
alarm transitions, zero state-machine inversions, zero unhandled
exceptions in the AlarmSurfaceInvoker pipeline.

## Risks and mitigations

| Risk | Mitigation |
|---|---|
| MxAccess Toolkit alarm subscription API differs across installed AVEVA versions | PR A.2 verifies against the worker-host's installed Toolkit version; documents the exact API used. Pin the worker DLL set per major MxAccess version if needed. |
| Worker-side alarm subscription leaks between sessions if cleanup is wrong | PR A.2 includes a session-recycle test that asserts no `IAlarmEventSink` instances remain registered after Close. |
| Gateway adds a new auth scope (`invoke:alarm-ack`); existing keys lack it | PR A.3 + A.5 ship with a one-time bootstrap migration: keys with `invoke:write` get the new scope auto-granted on the dev rig and parity rig. Production keys are reissued via `apikey rotate-key` (existing CLI). |
| Two simultaneous alarm sources (driver-native + sub-attribute) double-fire transitions | PR B.3 dedup is the load-bearing design. End-to-end test #1 covers it explicitly. |
| Historian write-back batch fails mid-batch — partial success | The existing `SqliteStoreAndForwardSink.HistorianWriteOutcome` per-row enum + dead-letter retention already handles this; PR A.5 just exposes the same outcome shape over gRPC. |
| Sidecar write-path deletion in B.4 leaves orphan IPC frames in old client builds | The frame-kind enum is forward-compatible (`MessageKind.WriteAlarmEventsRequest = 0x20`). Old clients sending the request to a new sidecar receive `Unsupported message kind`; new clients never send it. Acceptable — same-version deploy is the existing rollout convention. |

## Roll-out

Track A lands first onto `mxaccessgw/main`, deployed to the parity rig.
Track B lands onto `lmxopcua/master` once A.3 is live on the rig — earlier
Track B PRs can target a feature branch (`feat/alarms-over-gateway`) and
merge to master after the rig is fully green.

## Back-out

Each PR is individually revertable. The cleanest back-out point is at
the gateway-side enum extension: removing `MX_EVENT_FAMILY_ON_ALARM_TRANSITION`
from the proto means EventPump silently drops alarm events again and
GalaxyDriver's `OnAlarmEvent` never fires — but the sub-attribute fallback
path still produces functional alarms, so the OPC UA surface degrades to
v2-current behaviour without breaking. PR B.4 is the only one with a
non-trivial back-out (re-add the deleted sidecar IPC slot if a revert is
needed); land B.4 last and only after the end-of-epic gate is green.

## Out of scope (explicit)

- **Other alarm sources beyond Galaxy.** AbCip / FOCAS / OpcUaClient
  drivers already implement `IAlarmSource`; they're untouched.
- **Modbus / S7 / AbLegacy / TwinCAT alarms.** None of those protocols
  has a native alarm bus. Alarms on those drivers, if needed, ship via
  the scripted-alarm path.
- **Multi-Galaxy ack routing.** Today's gateway model is one Galaxy per
  session; if a deployment splits across galaxies, each gets its own
  GalaxyDriver and they don't cross-talk. No change.
- **OPC UA Part 9 advanced features** beyond the current scope —
  shelving, subscribed-to-events-only, branch-state for re-trigger
  semantics. Future epic if a customer asks.
- **Insight / cloud Historian write-back path.** Track A.5 targets the
  on-prem AVEVA Historian via aahClientManaged. The cloud variant
  would mirror the same gateway RPC over the REST API discussed in
  `docs/histsdk` — separate epic.

## File inventory (touched)

**mxaccessgw:**

- `src\MxGateway.Contracts\Protos\mxaccess_gateway.proto` (A.1, A.5)
- `src\MxGateway.Contracts\Protos\mxaccess_worker.proto` (A.2, A.4, A.5)
- `src\MxGateway.Worker\…\Eventing\` (A.2, A.3, A.4)
- `src\MxGateway.Worker\…\Commands\` (A.3, A.4, A.5)
- `src\MxGateway.Server\Sessions\SessionEventStream.cs` (A.3)
- `src\MxGateway.Server\Rpc\` (A.3, A.4, A.5)
- `src\MxGateway.Server\Auth\Scopes.cs` (A.3, A.4, A.5)
- `MxGateway.Tests`, `MxGateway.Worker.Tests`, `MxGateway.IntegrationTests`

**lmxopcua:**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\EventPump.cs` (B.1)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\MxAccessSeverityMapper.cs` *(new — B.1)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\IGalaxyAlarmAcknowledger.cs` *(new — B.2)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\GatewayGalaxyAlarmAcknowledger.cs` *(new — B.2)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\GatewayAlarmHistorianWriter.cs` *(new — B.4)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriver.cs` (B.2)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriverFactory.cs` (B.2, B.4)
- `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs` (B.3)
- `src\ZB.MOM.WW.OtOpcUa.Server\Alarms\AlarmConditionService.cs` (B.3)
- `src\ZB.MOM.WW.OtOpcUa.Server\Phase7\Phase7Composer.cs` (B.4)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Ipc\Contracts.cs` (B.4 — deletions)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Ipc\HistorianFrameHandler.cs` (B.4 — deletions)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Ipc\Framing.cs` (B.4 — deletions)
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests\Runtime\` (B.1, B.2)
- `tests\ZB.MOM.WW.OtOpcUa.Server.Tests\Alarms\` (B.3)
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\` (B.4 — drop deleted-contract tests)
- `docs\drivers\Galaxy.md` (B.5)
- `docs\AlarmTracking.md` *(new — B.5)*
- `docs\v1\AlarmTracking.md` (B.5 — banner update)
- `docs\plans\alarms-over-gateway.md` (B.5 — completion banner)

Total: ~12 source files added/modified in mxaccessgw; ~17 in lmxopcua;
~10 test files. Should land in 4-6 weeks of focused work given the
parity-rig dependency for end-to-end validation.