Replaces the "ships as a follow-up gated on dev-rig validation" banner with the actual finding from the dev-rig inspection: the MXAccess COM Toolkit on this AVEVA install does not expose any alarm-event family, and the AVEVA alarm-subscription managed assemblies (aaAlarmManagedClient, ArchestrAAlarmsAndEvents.SDK) are x64-only and incompatible with the worker's x86 bitness.

Two operator-facing paths forward documented inline:

1. Stay on the value-driven sub-attribute path (current production behaviour). Operator-comment fidelity is the only v1 regression.
2. Add an x64 alarm-helper sub-process alongside the worker that loads aaAlarmManagedClient and forwards transitions over a named-pipe IPC. Recovers full v1 fidelity but adds operational complexity.

The full architectural notes live in the mxaccessgw repo at src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1173 lines
57 KiB
Markdown
# Plan — alarms over the mxaccessgw gateway
> ✅ **All 19 PRs merged 2026-04-30 — historical record.**
> A.1 / A.2 / A.3 / A.4 (gateway proto + handlers + worker scaffold),
> B.1 / B.2 / B.3 / B.4 / B.5 (driver, server, docs), C.1 / C.2
> (sidecar alarm historian writer), D.1 (deploy script),
> E.1 / E.2 / E.3 / E.4 / E.5 / E.6 / E.7 (5 client SDKs + lmxopcua
> client surface). Public contract surface is live; client SDKs ship
> the new RPCs; the sub-attribute fallback path keeps Galaxy alarms
> functional today.
>
> ⚠️ **Worker-side native alarm subscription blocked on a dev-rig
> finding (2026-04-30):** the MXAccess COM Toolkit at
> `C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll`
> exposes no alarm-event family, only `OnDataChange`,
> `OnWriteComplete`, `OperationComplete`, `OnBufferedDataChange`.
> AVEVA's `aaAlarmManagedClient` / `ArchestrAAlarmsAndEvents.SDK`
> assemblies are x64-only and incompatible with the worker's x86
> bitness. **Operator decision needed before
> `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` carries any events:** either
> accept the value-driven sub-attribute path as the production
> architecture (operator-comment fidelity is the only v1 regression)
> or add an x64 alarm-helper sub-process alongside the worker. See
> `src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs` in the
> mxaccessgw repo for the architectural notes. The live
> `aahClientManaged` alarm-event write call site
> (`SdkAlarmHistorianWriteBackend` placeholder from PR C.1) and the
> D.1 smoke artifact will ship once those decisions resolve. The
> remainder of this document is preserved as the design record.

Coordinated epic across two repos:

- **`lmxopcua`** (this repo): `c:\Users\dohertj2\Desktop\lmxopcua\`
- **`mxaccessgw`**: `c:\Users\dohertj2\Desktop\mxaccessgw\`

## Why

PR 7.2 (2026-04-30, commit `ae7106d`) retired the in-process v1 Galaxy stack
(`Driver.Galaxy.Host` / `.Proxy` / `.Shared` + `OtOpcUaGalaxyHost` Windows
service) and migrated Galaxy access to the in-process `GalaxyDriver` over
mxaccessgw's gRPC. In doing so, three v1 capabilities regressed:

1. **Native MxAccess alarm-event metadata.** v1's `GalaxyAlarmTracker`
   surfaced rich alarm transitions (operator comment, original raise time,
   ack time, alarm category, native severity). The current architecture
   reconstructs Part 9 transitions by subscribing to four sub-attribute
   value updates (`InAlarm`, `Acked`, `Priority`, `Description`). That is
   fine for raise/clear but loses everything else.

2. **Native MxAccess Acknowledge semantics.** v1 called the MxAccess ack
   API directly from `GalaxyAlarmTracker`. Today, OPC UA acks are written
   into the `AckMsgWriteRef` sub-attribute, which is semantically valid but
   is a round-trip through the value path that loses operator-comment
   fidelity.

3. **Alarm-historian write-back path for non-Galaxy alarm sources.**
   v1's `GalaxyHistorianWriter` implemented `IAlarmHistorianWriter` and
   forwarded *scripted-alarm* transitions (and any future non-Galaxy
   alarm source: AB CIP ALMD, OpcUaClient A&E, etc.) back to AVEVA
   Historian via `aahClientManaged`. PR 7.2 deleted it.
   `Phase7Composer.ResolveHistorianSink` now finds no writer and falls
   back to `NullAlarmHistorianSink`, so **scripted-alarm transitions
   are queued locally and then silently discarded.** Galaxy-native alarms
   (with `$Alarm*` extensions) reach AVEVA Historian via System Platform's
   own `HistorizeToAveva` toggle on the Galaxy template; that path
   was never broken and is not in scope for this epic.

`gateway.md` (mxaccessgw, line 8) explicitly commits the gateway to "full
MXAccess parity… preserve MXAccess behavior first… **native MXAccess event
families**." Today's gateway proto exposes only data-change families. Closing
the alarm regression and fulfilling that parity statement are the same task.

## Goals

- Restore all three regressed capabilities to feature parity with v1.
- Keep the v2 architectural split: the gateway owns MxAccess transport;
  lmxopcua owns OPC UA Part 9 semantics, ACL/role enforcement, and
  multi-source aggregation (driver-native + scripted + sub-attribute).
- Preserve the value-driven sub-attribute path as a fallback for Galaxy
  templates that don't carry `$Alarm*` extensions.
- Land the work as a sequence of small, independently-reviewable PRs that
  alternate between repos in dependency order.

## Non-goals

- Reimplementing the Part 9 state machine inside mxaccessgw. The gateway
  stays UA-agnostic.
- Reworking the LDAP role-grant or OPC UA AlarmAck ACL surface; those
  already exist and route through `Server/Alarms/IAlarmAcknowledger`.
- Adding alarm support to non-Galaxy drivers (AbCip / FOCAS / OpcUaClient
  already have their own `IAlarmSource` implementations; Modbus / S7 /
  AbLegacy / TwinCAT don't have a native alarm bus and are out of scope).
- Altering Galaxy template conventions or `$Alarm*` extensions in the
  customer's Galaxy.

## Before → after

**Today (post-PR 7.2):**

```
MxAccess COM (gateway worker)
    │  data-change events only on the MxEvent stream
    ▼
GalaxyDriver (no IAlarmSource)
    │  IWritable / ISubscribable / ITagDiscovery only
    ▼
DriverNodeManager
    ├─ subscribes to four $Alarm* sub-attributes per condition
    ├─ AlarmConditionService rebuilds Part 9 transitions from value updates
    └─ DriverWritableAcknowledger writes AckMsgWriteRef on ack

Phase7Composer.ResolveHistorianSink → NullAlarmHistorianSink
    (scripted-alarm transitions queue → silently discarded)
```

**After this epic:**

```
MxAccess COM (gateway worker)
    │  data-change       ──┐
    │  alarm-transition    │
    │  write-complete      ├─► single MxEvent stream (new family added)
    ▼                      ▼
GalaxyDriver : ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable,
               IHostConnectivityProbe, IAlarmSource   ← restored
    ├─ EventPump dispatches OnAlarmTransition family → IAlarmSource.OnAlarmEvent
    ├─ AcknowledgeAsync → gateway RPC AcknowledgeAlarm
    └─ QueryActiveAlarmsAsync → gateway RPC QueryActiveAlarms (ConditionRefresh)

DriverNodeManager
    ├─ rich alarm events from IAlarmSource.OnAlarmEvent → AlarmConditionService
    ├─ value-driven sub-attribute path STILL WORKS for templates without $Alarm
    ├─ DriverWritableAcknowledger preserved as fallback for the value path
    └─ ScriptedAlarmEngine output continues to feed AlarmConditionService

Phase7Composer.ResolveHistorianSink → GatewayAlarmHistorianWriter
    ├─ scripted-alarm transitions → SqliteStoreAndForwardSink
    └─ drain worker → gateway RPC WriteHistorianEvent → AVEVA Historian
```

## Architecture decisions

**D1: Where the Part 9 state machine runs.** Stays in lmxopcua's
`AlarmConditionService`. Gateway is UA-agnostic. ScriptedAlarmEngine produces
Part 9 transitions with no MxAccess origin; the aggregator must live where all
sources converge.

**D2: Where authz on Acknowledge runs.** Stays in lmxopcua. The OPC UA
`AlarmConditionState.OnAcknowledge` delegate already checks the session's
roles for `AlarmAck` against the LDAP/role-grant ACL. The gateway should
never be reachable in a way that bypasses that check.

**D3: How rich alarm events reach OPC UA clients.** New `MxEventFamily`
on the existing `StreamEvents` RPC (no second stream). Adds latency
parity with data-change events, reuses the bounded-channel + worker-side
delivery semantics already documented in `gateway.md`.

**D4: Sub-attribute fallback path stays.** Some Galaxy templates won't
have `$Alarm*` extensions yet; the existing value-driven path remains the
only way to surface alarms for those templates. Both paths feed
`AlarmConditionService`. Driver-native events take precedence when both
are present (more authoritative, lower latency).

**D5: Where the historian writer lives.** In the **Wonderware historian
sidecar**, not in the gateway. The sidecar already owns `aahClientManaged`,
already has a `WriteAlarmEvents` IPC slot defined in `Ipc/Contracts.cs`, and
already dispatches to an `IAlarmEventWriter` interface; it is simply unwired
in `Program.cs:57`. The gateway is for MxAccess (live data + Galaxy
hierarchy); the historian sidecar is for `aahClientManaged` (time-series +
alarms historian). Two different SDKs, two different concerns; keep the
split. Bonus: completing the sidecar's write path also gives it a clearer
long-term role. Once the REST-API migration in `histsdk\instructions.md`
takes over reads, write-back keeps the sidecar relevant rather than
retiring it as a read-only relic. **Galaxy-native alarms bypass this
entirely:** System Platform's own `HistorizeToAveva` toggle on the
Galaxy template publishes them directly. The sidecar write path is
exclusively for non-Galaxy producers (today: scripted alarms; future: AB
CIP ALMD or any other lmxopcua-side alarm source the customer wants
unified into AVEVA Historian).

## Track A — mxaccessgw changes

All four PRs (A.1 through A.4) land in `c:\Users\dohertj2\Desktop\mxaccessgw\`.

### PR A.1 — proto: add alarm-transition event family + ack/query RPCs

**Files** (`src\MxGateway.Contracts\Protos\mxaccess_gateway.proto`):

1. Extend `MxEventFamily` (line 403):
   ```
   MX_EVENT_FAMILY_ON_ALARM_TRANSITION = 5;
   ```

2. Extend `MxEvent.body` oneof (line 395) with:
   ```
   OnAlarmTransitionEvent on_alarm_transition = 24;
   ```

3. New message `OnAlarmTransitionEvent` after the existing event-family
   bodies (line 425+). Carry the full MxAccess alarm payload: alarm name,
   source object reference, alarm-type-name (e.g. "AnalogLimitAlarm.HiHi"),
   transition kind enum (`Raise` / `Acknowledge` / `Clear`), severity (raw
   numeric; keep the MxAccess scale, because mapping to OPC UA 0-1000
   happens server-side in lmxopcua), `original_raise_timestamp`,
   `transition_timestamp`, optional `operator_user`, optional
   `operator_comment`, alarm `category` string, alarm `description`. Mirror
   the field set documented in v1's `GalaxyAlarmTracker`.

4. New RPCs on the `MxAccessGateway` service (line 11):
   ```
   rpc AcknowledgeAlarm(AcknowledgeAlarmRequest) returns (AcknowledgeAlarmReply);
   rpc QueryActiveAlarms(QueryActiveAlarmsRequest) returns (stream ActiveAlarmSnapshot);
   ```

   `AcknowledgeAlarmRequest` carries `session_id`, `alarm_full_reference`,
   `comment`, `user_principal`. Reply carries `MxStatusProxy`.

   `QueryActiveAlarmsRequest` carries `session_id`, optional
   `alarm_filter_prefix` (for ConditionRefresh on a sub-tree).
   `ActiveAlarmSnapshot` carries the same fields as
   `OnAlarmTransitionEvent` plus `current_state` enum (`Active` /
   `ActiveAcked` / `Inactive`).

**Tests** (`MxGateway.Tests`, proto/codegen sanity):

- Round-trip Serialize→Deserialize for the new messages with all-fields
  populated and empty-optional-fields cases.
- `MxEvent.body` oneof selection guard: supplying multiple bodies is
  rejected.

**Out of scope:** worker-side wiring (PR A.2), gateway-side dispatch (PR A.3).
PR A.1 is a pure contract-surface change; nothing functional yet.

### PR A.2 — worker: subscribe to MxAccess alarm event source

**Files** (`src\MxGateway.Worker\`, net48/x86):

The MxAccess Toolkit exposes alarm subscription separately from data
subscription. Per AVEVA's MXAccess C++ Toolkit reference (canonical doc
referenced from `gateway.md`), alarm events arrive through the
`IAlarmEventSink` interface registered against the MxAccess `Alarms`
collection of an open session, OR via the MxAccess "alarm provider"
subscription pattern (depends on the Toolkit version on the worker host;
verify against the version actually deployed in the worker bin during
PR A.2).

1. Worker subscribes to MxAccess alarms once per session, with a single
   sink that fans out into the same bounded channel the data-change pump
   uses (`MxGateway.Worker\Eventing\EventChannel.cs` or whatever the worker
   currently calls its sink; verify the name during the PR).
2. Sink translates each MxAccess alarm event into a `WorkerEvent` proto
   (defined in `mxaccess_worker.proto`) carrying the new
   `OnAlarmTransitionEvent` body. Reuses the existing `worker_sequence`
   counter so ordering is preserved across families.
3. Worker honours the same backpressure rules as data-change events:
   newest-dropped on full channel, single dropped-counter metric per
   family.

**Tests** (`MxGateway.Worker.Tests`):

- Fake `IAlarmEventSink` source emits canned transitions; assert the
  worker forwards each as the right `WorkerEvent` shape.
- Cancellation test: closing the session unsubscribes from MxAccess
  alarms cleanly (no leaked sinks if the worker is recycled mid-session).

**Out of scope:** any gateway-side dispatch, any RPC handler. PR A.2
is worker-internal.

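
The backpressure rule in step 3 (newest-dropped on a full channel, one dropped counter per family) could be sketched roughly as below. This is a minimal illustration, not the worker's actual code: `EventFamily`, `WorkerEventSketch`, and `BoundedEventChannel` are hypothetical names, and it assumes the `System.Threading.Channels` package is referenced (on net48 it comes from NuGet).

```csharp
using System.Collections.Concurrent;
using System.Threading.Channels;

// Hypothetical stand-ins; the real worker types live in the mxaccessgw repo.
public enum EventFamily { DataChange, AlarmTransition }

public sealed class WorkerEventSketch
{
    public EventFamily Family { get; set; }
    public long WorkerSequence { get; set; }
}

public sealed class BoundedEventChannel
{
    private readonly Channel<WorkerEventSketch> _channel;
    private readonly ConcurrentDictionary<EventFamily, long> _dropped =
        new ConcurrentDictionary<EventFamily, long>();

    public BoundedEventChannel(int capacity)
    {
        // In Wait mode, TryWrite returns false when the channel is full, so the
        // event being offered (the newest one) is the one that is dropped.
        _channel = Channel.CreateBounded<WorkerEventSketch>(
            new BoundedChannelOptions(capacity) { FullMode = BoundedChannelFullMode.Wait });
    }

    public ChannelReader<WorkerEventSketch> Reader => _channel.Reader;

    // Returns true if queued; false if dropped and counted against its family.
    public bool TryPublish(WorkerEventSketch ev)
    {
        if (_channel.Writer.TryWrite(ev)) return true;
        _dropped.AddOrUpdate(ev.Family, 1, (_, n) => n + 1);
        return false;
    }

    public long DroppedCount(EventFamily family) =>
        _dropped.TryGetValue(family, out var n) ? n : 0;
}
```

Sharing one channel across families keeps the single `worker_sequence` ordering, while the per-family counter still shows which family is losing events under load.
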
### PR A.3 — gateway: dispatch OnAlarmTransition + implement AcknowledgeAlarm

**Files** (`src\MxGateway.Server\`):

1. The session-level event multiplexer (`Sessions\SessionEventStream.cs`
   or equivalent; verify the name during the PR) recognizes the new
   `WorkerEvent` body and forwards it as an `MxEvent` with family
   `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` to the gRPC
   `StreamEvents` consumer.
2. New RPC handler `AcknowledgeAlarm` builds an MxAccess `WorkerCommand`
   carrying an `AlarmAcknowledgeCommand` (new in `mxaccess_worker.proto`
   under PR A.1). Forwarded to the worker; reply mapped to
   `AcknowledgeAlarmReply` with the MxAccess `MxStatus` proxy populated.
3. AuthN: same API-key + scope check as existing RPCs. Add a new scope
   `invoke:alarm-ack` (mirrors `invoke:write` granularity); existing keys
   without it return `PERMISSION_DENIED`.

**Tests** (`MxGateway.Tests`, `MxGateway.IntegrationTests`):

- Unit: dispatch test. A fake worker emits an `AlarmTransition` event;
  assert the gateway forwards it on the live `StreamEvents` channel of
  every subscribed session.
- Integration: end-to-end against the real worker (requires the parity
  rig setup; see `docs\v2\Galaxy.ParityRig.md` in lmxopcua for the
  MxAccess-installed dev box prerequisites). Trigger a real Galaxy
  alarm, assert the gateway emits `OnAlarmTransition`. Acknowledge via
  the new RPC, assert the alarm transitions to `ActiveAcked` and an
  `Acknowledge` transition event is emitted back.
- AuthN: existing key without `invoke:alarm-ack` scope rejected.

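
The scope rule in step 3 amounts to a lookup from RPC name to required scope. The sketch below is one hedged way to express it; `AlarmScopes` and `IsAllowed` are hypothetical names, the scope strings and RPC names are the ones this plan defines (`invoke:alarm-query` arrives with PR A.4), and the real gateway presumably reuses its existing API-key pipeline rather than a standalone helper like this.

```csharp
using System;
using System.Collections.Generic;

public static class AlarmScopes
{
    public const string AlarmAck = "invoke:alarm-ack";     // introduced in PR A.3
    public const string AlarmQuery = "invoke:alarm-query"; // introduced in PR A.4

    // Hypothetical RPC-to-scope table; the RPC names come from PR A.1.
    private static readonly Dictionary<string, string> RequiredScope =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            ["AcknowledgeAlarm"] = AlarmAck,
            ["QueryActiveAlarms"] = AlarmQuery,
        };

    // Returns true when the key's scopes permit the RPC. A false result is
    // what a handler would translate into gRPC PERMISSION_DENIED.
    public static bool IsAllowed(string rpcName, ISet<string> keyScopes)
    {
        if (!RequiredScope.TryGetValue(rpcName, out var scope))
            throw new ArgumentException("Not an alarm RPC: " + rpcName, nameof(rpcName));
        return keyScopes.Contains(scope);
    }
}
```

Keeping ack and query as separate scopes means a key that already holds `invoke:write` gains neither automatically, which matches the "existing keys without it are rejected" test above.
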
### PR A.4 — gateway: ConditionRefresh snapshot via QueryActiveAlarms

**Files** (`src\MxGateway.Server\`, `src\MxGateway.Worker\`):

1. Worker exposes a `QueryActiveAlarmsCommand` that walks the session's
   active-alarm collection and streams snapshots back through the
   existing command-reply channel. The MxAccess Toolkit's
   `Alarms.GetActive()` (verify exact API name during PR) is the
   underlying call.
2. Gateway RPC `QueryActiveAlarms` opens a server-streaming reply and
   batches snapshots through.
3. AuthN: new scope `invoke:alarm-query` (separate from ack so a
   read-only client can refresh without ack rights).

**Tests:**

- Worker-test: synthetic active set of 0 / 1 / 100 alarms; assert
  pagination respects worker channel capacity.
- Integration: against the parity rig, assert a ConditionRefresh after
  reconnect returns every alarm currently `Active` or `ActiveAcked` in
  the Galaxy.

**Sequencing within Track A:** A.1 → A.2 → A.3 → A.4. A.1 is
mechanical; A.2 + A.3 are the load-bearing changes that unlock the
lmxopcua side. A.4 can ship after lmxopcua starts consuming A.3 output.
The historian-write capability moved to **Track C** below; the gateway
intentionally stays out of `aahClientManaged`.

## Track B — lmxopcua changes

All five PRs land in `c:\Users\dohertj2\Desktop\lmxopcua\`. Each B-PR
depends on a specific A-PR (or, for B.4, a C-PR), as listed under each
PR's **Depends on** line.

### PR B.1 — EventPump: dispatch OnAlarmTransition family

**Depends on:** A.1 (proto), A.3 (gateway dispatching the new family).

**Files:**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\EventPump.cs:160`:
  current `Dispatch(MxEvent ev)` returns early for any non-`OnDataChange`
  family. Add a branch:
  ```csharp
  switch (ev.Family) {
      case MxEventFamily.OnDataChange:      DispatchDataChange(ev); break;
      case MxEventFamily.OnAlarmTransition: DispatchAlarmTransition(ev); break;
      default: return;
  }
  ```
- New `DispatchAlarmTransition` translates the proto event into an
  `AlarmEventArgs` (existing type from `Core.Abstractions`) and raises an
  internal event the driver subscribes to.
- New `MxAccessSeverityMapper` in `Driver.Galaxy\Runtime\`: maps the
  MxAccess raw severity into the `AlarmSeverity` enum + the OPC UA
  numeric severity (250 / 500 / 700 / 900 ladder per v1's
  `AlarmTracking.md`).

**Tests** (`tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests\Runtime\`):

- `EventPumpAlarmTests`: feed three synthetic MxEvents (raise / ack /
  clear); assert each fires `OnAlarmEvent` on the driver with correct
  payload.
- Severity-mapping table tests: every documented MxAccess severity
  level → expected (`AlarmSeverity`, OPC UA numeric) tuple.

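
To make the severity ladder concrete, here is a small sketch of a threshold-based mapper. Only the 250 / 500 / 700 / 900 rungs come from this plan. Everything else is an assumption for illustration: the band names, the raw thresholds supplied by the caller, and the premise that a larger raw value means a more severe alarm. The real `MxAccessSeverityMapper` should follow v1's `AlarmTracking.md`.

```csharp
using System;

// Hypothetical band names; the real AlarmSeverity enum lives in lmxopcua.
public enum AlarmSeverityBand { Low, Medium, High, Critical }

public sealed class SeverityLadder
{
    // OPC UA numeric severities, one per band (ladder taken from the plan).
    private static readonly ushort[] OpcUaRungs = { 250, 500, 700, 900 };

    // Inclusive raw upper bounds for Low, Medium, High (assumed, caller-supplied).
    private readonly int[] _upperBounds;

    public SeverityLadder(int lowMax, int mediumMax, int highMax)
    {
        if (!(lowMax < mediumMax && mediumMax < highMax))
            throw new ArgumentException("Bounds must be strictly ascending.");
        _upperBounds = new[] { lowMax, mediumMax, highMax };
    }

    // Assumes a larger raw value is more severe; invert if the scale runs the other way.
    public (AlarmSeverityBand Band, ushort OpcUaSeverity) Map(int raw)
    {
        int i = 0;
        while (i < _upperBounds.Length && raw > _upperBounds[i]) i++;
        return ((AlarmSeverityBand)i, OpcUaRungs[i]);
    }
}
```

Passing the thresholds in, rather than hard-coding them, keeps the table-driven severity tests above independent of whichever raw scale the deployed Toolkit reports.
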
### PR B.2 — GalaxyDriver re-implements IAlarmSource

**Depends on:** A.3 (`AcknowledgeAlarm` RPC available), B.1 (event
dispatch).

**Files:**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriver.cs:28`: extend the
  class declaration:
  ```csharp
  public sealed class GalaxyDriver
      : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable,
        IRediscoverable, IHostConnectivityProbe, IAlarmSource, IDisposable
  ```
- Implement the four `IAlarmSource` members:
  - `SubscribeAlarmsAsync`: no-op returning a sentinel handle. The
    driver is already subscribed for data; alarm events arrive on the
    same event stream once the gateway emits the new family. (Same
    pattern AbCip uses today; see `Driver.AbCip\AbCipDriver.cs:208`.)
  - `UnsubscribeAlarmsAsync`: no-op.
  - `OnAlarmEvent`: wired to the EventPump branch added in B.1.
  - `AcknowledgeAsync`: calls the new gateway RPC via the
    `IGalaxyAlarmAcknowledger` abstraction (new file, mirrors the
    `IGalaxyDataWriter` pattern), with `GatewayGalaxyAlarmAcknowledger`
    as the production implementation in `Runtime\`. Resilience wrapping
    via `AlarmSurfaceInvoker` per existing pattern.
- `DriverInstanceFactory` for Galaxy registers
  `IGalaxyAlarmAcknowledger` alongside the existing data writer.

**Tests:**

- Subscribe-noop returns a non-null handle; unsubscribe accepts it.
- Acknowledge: fake `IGalaxyAlarmAcknowledger` records the call; assert
  the request shape and resilience-pipeline routing.
- End-to-end test in `Driver.Galaxy.Tests`: fake gateway emits a
  raise-then-ack event sequence; assert the driver fires `OnAlarmEvent`
  twice with matching alarm-id correlation.

### PR B.3 — DriverNodeManager: route to driver-native when present

**Depends on:** B.2.

**Files:**

- `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs`: when
  registering an `AlarmConditionState` for a Galaxy variable, check
  whether the driver is `IAlarmSource`. If yes, prefer the
  `OnAlarmEvent`-driven path; the value-driven sub-attribute path
  becomes the secondary path that handles transitions the driver-native
  stream missed (network blip, gateway restart, or a template that
  lacks the `$Alarm*` extension).
- `Server\Alarms\AlarmConditionService`: already accepts events from
  multiple sources; the only addition is a `DriverEventOrigin` enum on
  internal transitions so the dedup logic prefers the richer
  driver-native record over a stale sub-attribute synthesis.
- `IAlarmAcknowledger` resolution in `DriverNodeManager`:
  prefer the driver's `IAlarmSource.AcknowledgeAsync` over
  `DriverWritableAcknowledger` when both are available. Keep
  `DriverWritableAcknowledger` as the fallback for templates without
  `$Alarm*` extensions.

**Tests:**

- Two-source-fan-in test: same alarm condition receives both a
  driver-native ack event and a sub-attribute value update for the same
  transition; assert no duplicate Part 9 transition fires.
- Acknowledger routing: driver implements `IAlarmSource` →
  ack-via-RPC; driver implements only `IWritable` → ack-via-write
  (existing path).

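
One possible reading of the origin-aware dedup described above is sketched here. Only the `DriverEventOrigin` enum name comes from the plan; its members, the `TransitionSketch` shape, the time-window heuristic, and the three-way `DedupDecision` are assumptions made for illustration, not the `AlarmConditionService` design.

```csharp
using System;
using System.Collections.Generic;

public enum DriverEventOrigin { SubAttribute = 0, DriverNative = 1 } // members assumed
public enum TransitionKind { Raise, Acknowledge, Clear }
public enum DedupDecision { Apply, EnrichOnly, Drop }                // hypothetical

public sealed class TransitionSketch
{
    public string ConditionId { get; set; }
    public TransitionKind Kind { get; set; }
    public DriverEventOrigin Origin { get; set; }
    public DateTime TimestampUtc { get; set; }
}

public sealed class TransitionDeduplicator
{
    private readonly TimeSpan _window;
    private readonly Dictionary<(string, TransitionKind), TransitionSketch> _last =
        new Dictionary<(string, TransitionKind), TransitionSketch>();

    public TransitionDeduplicator(TimeSpan window) { _window = window; }

    public DedupDecision Decide(TransitionSketch t)
    {
        var key = (t.ConditionId, t.Kind);
        if (_last.TryGetValue(key, out var prev)
            && (t.TimestampUtc - prev.TimestampUtc).Duration() <= _window)
        {
            // Same transition already seen through the other path.
            if (t.Origin > prev.Origin)
            {
                _last[key] = t;                  // keep the richer record
                return DedupDecision.EnrichOnly; // update metadata, fire nothing new
            }
            return DedupDecision.Drop;           // stale or equal-rank duplicate
        }
        _last[key] = t;
        return DedupDecision.Apply;              // first sighting: fire the transition
    }
}
```

The `EnrichOnly` outcome is what lets a late driver-native record add the operator comment without firing a second Part 9 transition, which is the property the two-source fan-in test checks.
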
### PR B.4 — IAlarmHistorianWriter via the historian sidecar IPC

**Depends on:** C.2 (sidecar wires its `IAlarmEventWriter`). See Track C
for the sidecar-side work; B.4 is the lmxopcua-side consumer.

**Files:**

- New `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client\SidecarAlarmHistorianWriter.cs`
  implementing `IAlarmHistorianWriter`. Sends batches over the existing
  named-pipe IPC using the **already-defined**
  `WriteAlarmEventsRequest` / `WriteAlarmEventsReply` contracts at
  `Ipc\Contracts.cs:153`. No protocol changes: the slot already exists
  on the contract side; only the production behaviour and the consumer
  on this side need to land.
- `Server\Phase7\Phase7Composer.ResolveHistorianSink`: already scans
  for registered `IAlarmHistorianWriter` instances. Register the new
  sidecar-backed writer at server bootstrap when the historian sidecar
  is enabled (`appsettings.json` `Historian:Wonderware:Enabled = true`).
  `SqliteStoreAndForwardSink` then boots with a real writer attached
  and the `NullAlarmHistorianSink` fallback no longer applies on
  installs that have the sidecar deployed.

**Tests:**

- `SidecarAlarmHistorianWriter` against a fake `PipeServer`:
  single record, batch, per-row failure modes (Ack / RetryPlease /
  PermanentFail) mapped from the sidecar's `PerEventOk[]` reply.
- `Phase7Composer` end-to-end: start the server with the historian
  sidecar enabled; assert `ResolveHistorianSink` picks
  `SqliteStoreAndForwardSink` with the new sidecar writer attached.

**Note on producer scope:** This path historizes **non-Galaxy alarms
only.** Galaxy-native alarms (with `$Alarm*` extensions) reach AVEVA
Historian directly via System Platform's `HistorizeToAveva` toggle on
the alarm primitive, with no involvement from us. Today the only live
producer feeding `SqliteStoreAndForwardSink` is
`Phase7EngineComposer.RouteToHistorianAsync` for scripted alarms; future
producers (AB CIP ALMD, FOCAS CNC alarms if a customer wants unified
storage) plug into the same path.

### PR B.5 — docs + memory housekeeping

**Depends on:** B.1 / B.2 / B.3 / B.4 all green on the parity rig + D.1
(deployment refresh) verified on the dev rig.

**Files:**

- `docs\drivers\Galaxy.md`: current text says the driver implements
  five capability interfaces; update to seven (`IAlarmSource`,
  `IAlarmHistorianWriter`-via-companion).
- `docs\AlarmTracking.md`: promote a fresh top-level doc that
  describes the v2-final architecture (driver-native primary path +
  sub-attribute fallback + scripted-alarm aggregation). Cross-link from
  `docs\README.md`. The v1 archive stays as historical record.
- `docs\v1\AlarmTracking.md`: extend the existing historical banner
  with "Restored to functional parity in this epic; see
  `docs\AlarmTracking.md` for current state."
- Memory entries (`C:\Users\dohertj2\.claude\projects\…\memory\`):
  - Update `project_galaxy_via_mxgateway.md`: add the alarm path
    restoration.
  - Update `project_server_history_alarm_subsystems.md`: note that
    `Phase7Composer.ResolveHistorianSink` now finds a writer on
    Galaxy installs.
- `docs\plans\alarms-over-gateway.md` (this file) — banner the doc
`✅ Completed YYYY-MM-DD — historical record.` matching the existing
v2-mxgw plan retirement convention.
## Track C — historian sidecar wires the dormant write path

The Wonderware historian sidecar at
`src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\` is a separately
deployable Windows service (NSSM-wrapped) that already loads
`aahClientManaged` x64 and serves a named-pipe IPC for read operations.
The `WriteAlarmEvents` IPC slot is defined but unwired (`Program.cs:57`
constructs `HistorianFrameHandler` without an `alarmWriter`). Track C
completes that slot. Two PRs in the sidecar + one consumer-side PR
(B.4) in lmxopcua finish the path.

### PR C.1 — sidecar: AahClientManagedAlarmEventWriter

**Files** (`src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Backend\`):

1. New `AahClientManagedAlarmEventWriter.cs` implementing the existing
   `IAlarmEventWriter` interface (defined in `Ipc\HistorianFrameHandler.cs:242`).
2. Implementation calls `aahClientManaged`'s alarm-event write API,
   the same path v1's `GalaxyHistorianWriter` used. Use the existing
   `HistorianClusterEndpointPicker` for multi-node routing so write
   failures fail over the same way reads do.
3. Batch size + retry behaviour mirrors v1's `GalaxyHistorianWriter`
   per-row outcome reporting (`HistorianWriteOutcome` enum: Ack /
   PermanentFail / RetryPlease). Map MxStatus codes onto outcomes.
4. Reuses `HistorianDataSource`'s existing connection-pool / health
   gating. No new TCP work is needed; the same session that serves
   reads can issue writes too.

**Tests** (`tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\`):

- Outcome-mapping table: every documented MxStatus on alarm-write →
  expected `HistorianWriteOutcome`.
- Batching: 1 / 100 / 1000 events through a fake `aahClientManaged`
  writer; assert per-row outcome list parallel to input order.
- Cluster failover: primary node returns `BadCommunicationError`;
  picker rotates to secondary; assert eventual success.

### PR C.2 — sidecar: wire IAlarmEventWriter into Program.cs

**Files** (`src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Program.cs`):

1. Build an `AahClientManagedAlarmEventWriter` next to the existing
   `BuildHistorian()` call.
2. Pass it to `HistorianFrameHandler` (currently constructed at line 57
   without an `alarmWriter`). The dispatcher already routes
   `WriteAlarmEventsRequest` through `_alarmWriter` when non-null
   (`HistorianFrameHandler.cs:158-172`); supplying it makes the slot
   functional.
3. Gate behind a new env var `OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED`
   (default `true` when `OTOPCUA_HISTORIAN_ENABLED=true`). Lets a
   read-only deployment skip the writer registration if needed.
4. Update `Install-Services.ps1` install-time env block in
   lmxopcua's `scripts\install\` to include the new toggle.

**Tests:**

- `Program.cs` unit-test seam: assert handler is constructed with
  alarm writer when enabled and without when disabled.
- Live integration (parity rig): write a synthetic alarm event
  through the IPC; query it back via `ReadEvents`; assert
  round-trip fidelity.

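
The gating rule in step 3 (default on whenever the historian itself is on, with an explicit opt-out) can be sketched as below. The two variable names come from the plan. The `AlarmWriteToggle` helper, the injected `getEnv` delegate, and the choice to accept only `true` / `false` via `bool.TryParse` are assumptions; the sidecar may parse its flags differently.

```csharp
using System;

public static class AlarmWriteToggle
{
    public const string HistorianEnabledVar = "OTOPCUA_HISTORIAN_ENABLED";
    public const string AlarmWriteEnabledVar = "OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED";

    // getEnv is injected so the rule can be tested without touching the
    // process environment (this is the hypothetical unit-test seam).
    public static bool IsAlarmWriteEnabled(Func<string, string> getEnv)
    {
        // The alarm writer is never registered when the historian is off.
        if (!IsTrue(getEnv(HistorianEnabledVar))) return false;

        // Unset or blank means "use the default", which is true here.
        var raw = getEnv(AlarmWriteEnabledVar);
        return string.IsNullOrWhiteSpace(raw) || IsTrue(raw);
    }

    // Assumed parse rule: only "true" (any casing) counts as enabled.
    private static bool IsTrue(string value) =>
        bool.TryParse(value, out var parsed) && parsed;
}
```

In `Program.cs` the call would presumably look like `AlarmWriteToggle.IsAlarmWriteEnabled(Environment.GetEnvironmentVariable)`, with the result deciding whether an `alarmWriter` is passed to `HistorianFrameHandler`.
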
### Sequencing within Track C: C.1 → C.2.

C.2's lmxopcua-side consumer is **PR B.4 in Track B**, which depends
on C.2 being deployed.

## Track E — client surface refresh

Two surfaces become user-visible when the alarm path lights up: the
**mxaccessgw client SDKs** (5 languages, each with its own CLI) that
consume the new `OnAlarmTransition` event family + `AcknowledgeAlarm`
/ `QueryActiveAlarms` RPCs directly, and the **lmxopcua OPC UA-facing
clients** (Client.CLI, Client.UI) that consume the richer Part 9
condition payload through the OPC UA server. Both need updates so the
new fields actually reach end-users; without Track E, the data
arrives at the gateway / OPC UA server but the off-the-shelf clients
display the same five columns they did before this epic.

Track E is split per-language so each PR stays small and reviewable.
PRs E.2 through E.6 are independent (they share only the proto
regen from E.1) and can land in parallel by whoever owns each
language binding.

### PR E.1 — regenerate proto across all client SDKs

**Depends on:** A.1 merged (proto change live).

**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\clients\`):

1. **.NET**: codegen runs on csproj rebuild via `Grpc.Tools`; just
   rebuild `MxGateway.Client.csproj` after pulling A.1.
2. **Python**: run `clients\python\generate-proto.ps1`; commit the
   regenerated `_pb2.py` + `_pb2_grpc.py` files under
   `clients\python\src\`.
3. **Go**: run `clients\go\generate-proto.ps1`; commit the
   regenerated `*.pb.go` + `*_grpc.pb.go` files under
   `clients\go\mxgateway\`.
4. **Java**: Gradle's `protobuf-gradle-plugin` regenerates on
   `gradle build`; verify the new types appear in the build
   output. Commit any pinned generated source under
   `clients\java\mxgateway-client\src\main\java\` if that's the
   convention (check `JavaClientDesign.md`).
5. **Rust**: `build.rs` runs `tonic-build` on the proto; just
   `cargo build`. Generated code lives under
   `clients\rust\target\` (gitignored), so there is nothing to commit;
   verify the new types compile.

No hand-written code in this PR. Pure regen + commit of generated
artifacts. Per-language pre-existing proto-regen tests in each
client's test suite must stay green.

### PR E.2 — .NET client SDK + CLI

**Depends on:** E.1, A.3 (gateway alarm dispatch + ack RPC live).

**Files** (`clients\dotnet\MxGateway.Client\` + `MxGateway.Client.Cli\`):

1. `MxGatewayClient.cs` — new public methods:
   ```csharp
   IAsyncEnumerable<AlarmTransition> SubscribeAlarmsAsync(
       MxGatewaySession session,
       AlarmFilter? filter = null,
       CancellationToken ct = default);
   Task<MxStatus> AcknowledgeAlarmAsync(
       MxGatewaySession session,
       string alarmFullReference,
       string comment,
       string userPrincipal,
       CancellationToken ct = default);
   IAsyncEnumerable<ActiveAlarmSnapshot> QueryActiveAlarmsAsync(
       MxGatewaySession session,
       string? filterPrefix = null,
       CancellationToken ct = default);
   ```
   Existing `MxGatewayClientRetryPolicy` covers the new operations
   without bespoke retry config.
2. `MxGateway.Client.Cli` — add `alarms` verb with subcommands:
   `subscribe` (streams transitions until cancelled),
   `acknowledge --ref <full-ref> --comment "<text>"`,
   `query-active [--prefix <equipment>]`. Output formatting mirrors
   the existing `events stream` verb (default human-readable +
   `--json` flag for machine output).
3. AuthN — `MxGatewayClientOptions` validates new scopes
   `invoke:alarm-ack` / `invoke:alarm-query` exist on the API key
   when those operations are invoked; pre-flight check fails fast
   with a clear error rather than letting the gateway return
   `PERMISSION_DENIED` mid-stream.

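The pre-flight check in item 3 could look roughly like the Python sketch below. Only the scope names `invoke:alarm-ack` and `invoke:alarm-query` come from this plan; the `REQUIRED_SCOPES` table, the `MissingScopeError` type, and the `preflight` helper are hypothetical names for illustration, not the SDK's actual surface.

```python
# Hypothetical sketch: fail fast on a missing API-key scope before any RPC is sent.
# Operation names and the error type are illustrative, not the real SDK API.
REQUIRED_SCOPES = {
    "acknowledge_alarm": "invoke:alarm-ack",
    "query_active_alarms": "invoke:alarm-query",
}


class MissingScopeError(Exception):
    """Raised client-side when the API key lacks the scope an operation needs."""


def preflight(operation: str, granted_scopes: set[str]) -> None:
    """Raise MissingScopeError if `operation` needs a scope the key does not hold."""
    required = REQUIRED_SCOPES.get(operation)
    if required is not None and required not in granted_scopes:
        raise MissingScopeError(
            f"{operation} requires scope '{required}'; "
            f"key has {sorted(granted_scopes)}"
        )
```

Calling `preflight("acknowledge_alarm", scopes)` before opening the stream keeps the failure local and descriptive, which is the behaviour item 3 asks for.
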
**Tests** (`clients\dotnet\MxGateway.Client.Tests\`):

- `FakeGatewayTransport` extended to emit `OnAlarmTransition`
  events; assert `SubscribeAlarmsAsync` yields each with the right
  payload shape.
- Ack: assert request shape, retry policy, and error wrapping
  (Unauthenticated → `MxGatewayAuthenticationException`,
  PermissionDenied → `MxGatewayAuthorizationException`,
  ResourceExhausted → `MxGatewayException` with the right
  message).
- CLI verb tests in `MxGatewayClientCliTests.cs` — argument
  parsing, JSON output shape, exit codes.

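The error-wrapping expectation in the ack test can be pictured as a lookup from gRPC status to exception type. The sketch below is a Python analogue, not the .NET implementation: the exception names mirror the ones listed above, while `wrap_status` and the string status keys are assumptions made for illustration.

```python
# Hypothetical Python analogue of the status-to-exception wrapping asserted above.
class MxGatewayException(Exception):
    """Base error; keeps the original status name alongside the message."""

    def __init__(self, status: str, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status


class MxGatewayAuthenticationException(MxGatewayException):
    """The gateway rejected the caller's identity."""


class MxGatewayAuthorizationException(MxGatewayException):
    """The caller is known but lacks the required scope."""


_STATUS_TO_EXCEPTION = {
    "UNAUTHENTICATED": MxGatewayAuthenticationException,
    "PERMISSION_DENIED": MxGatewayAuthorizationException,
}


def wrap_status(status: str, message: str) -> MxGatewayException:
    """Map a status name to the most specific exception; fall back to the base."""
    exc_type = _STATUS_TO_EXCEPTION.get(status, MxGatewayException)
    return exc_type(status, message)
```

Statuses without a dedicated type, such as `RESOURCE_EXHAUSTED`, fall through to the base exception with the message preserved.
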
### PR E.3 — Python client SDK + CLI

**Depends on:** E.1.

**Files** (`clients\python\src\mxgateway\` + the existing CLI entry
point — verify the exact name during PR; `PythonClientDesign.md`
documents it):

1. New module `alarms.py` exposing async helpers:
   ```python
   async def subscribe_alarms(session, *, filter=None) -> AsyncIterator[AlarmTransition]: ...
   async def acknowledge_alarm(session, *, alarm_ref, comment, user) -> MxStatus: ...
   async def query_active_alarms(session, *, prefix=None) -> AsyncIterator[ActiveAlarmSnapshot]: ...
   ```
2. CLI: add `alarms subscribe / acknowledge / query-active` verbs.
   Use the same JSON output schema as E.2's CLI so cross-language
   tooling can parse either.
3. Type stubs (`*.pyi`) updated for the new types.

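One way to keep the `--json` output identical across language CLIs is to serialize every transition through a fixed, ordered field list. The sketch below assumes a field set purely for illustration; the real schema is whatever E.2's CLI emits, and `FIELD_ORDER` and `to_json_line` are hypothetical names.

```python
import json

# Hypothetical field set and order; the authoritative schema is E.2's CLI output.
FIELD_ORDER = (
    "alarmFullReference",
    "state",
    "severity",
    "timestampUtc",
    "operatorComment",
)


def to_json_line(transition: dict) -> str:
    """Render one transition as a single JSON line with a fixed key order."""
    missing = [key for key in FIELD_ORDER if key not in transition]
    if missing:
        raise ValueError(f"transition is missing fields: {missing}")
    # Rebuild the dict in schema order and drop any keys outside the schema.
    ordered = {key: transition[key] for key in FIELD_ORDER}
    return json.dumps(ordered, separators=(",", ":"), ensure_ascii=False)
```

Emitting absent values as explicit `null` rather than omitting the key keeps downstream parsers free of per-language special cases.
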
**Tests** (`clients\python\tests\`):

- pytest-asyncio fixtures using a stub gRPC server; assert each
  helper's request/response shape.
- CLI smoke via `subprocess` + captured stdout JSON comparison.

### PR E.4 — Go client SDK + CLI

**Depends on:** E.1.

**Files** (`clients\go\mxgateway\` + `clients\go\cmd\`):

1. New `alarms.go` exposing:
   ```go
   func (c *Client) SubscribeAlarms(ctx context.Context, opts ...SubscribeOption) (<-chan AlarmTransition, error)
   func (c *Client) AcknowledgeAlarm(ctx context.Context, ref, comment, user string) (MxStatus, error)
   func (c *Client) QueryActiveAlarms(ctx context.Context, prefix string) ([]ActiveAlarmSnapshot, error)
   ```
2. CLI: add `alarms` subcommand under `clients\go\cmd\mxgateway-cli\`
   (verify the binary name in `GoClientDesign.md`). Same verb shape
   as E.2 / E.3.
3. Errors wrap named sentinels (`ErrAuthFailed`,
   `ErrPermissionDenied`, etc.) so callers can programmatically
   distinguish failure modes with `errors.Is`.

**Tests:** standard Go table-driven tests against a stub gRPC server
under `clients\go\internal\testserver\`.

### PR E.5 — Java client SDK + CLI

**Depends on:** E.1.

**Files** (`clients\java\mxgateway-client\src\main\java\` +
`clients\java\mxgateway-cli\`):

1. New methods on the existing client class (verify in
   `JavaClientDesign.md`):
   ```java
   Flowable<AlarmTransition> subscribeAlarms(Session s, AlarmFilter filter);
   Single<MxStatus> acknowledgeAlarm(Session s, String alarmRef, String comment, String user);
   Flowable<ActiveAlarmSnapshot> queryActiveAlarms(Session s, String prefix);
   ```
   (RxJava idiom matching the existing data-change subscription
   API; if the existing API uses `CompletableFuture` instead, follow
   that convention — verify during PR.)
2. CLI: same `alarms subscribe / acknowledge / query-active`
   verbs.

**Tests:** JUnit 5 + a stub gRPC server. CLI tested via
`ProcessBuilder` exec + JSON output comparison.

### PR E.6 — Rust client SDK

**Depends on:** E.1.

**Files** (`clients\rust\crates\mxgateway-client\src\` +
likely a `mxgateway-cli` crate — verify in `RustClientDesign.md`):

1. New methods on the client struct:
   ```rust
   pub fn subscribe_alarms(&self, filter: Option<AlarmFilter>) -> impl Stream<Item = Result<AlarmTransition>>;
   pub async fn acknowledge_alarm(&self, alarm_ref: &str, comment: &str, user: &str) -> Result<MxStatus>;
   pub fn query_active_alarms(&self, prefix: Option<&str>) -> impl Stream<Item = Result<ActiveAlarmSnapshot>>;
   ```
2. CLI: same verb shape.
3. `thiserror`-based error enum extended with `AlarmAckPermissionDenied`
   etc. variants if the existing pattern uses one.

**Tests:** `tokio::test` against a stub gRPC server built from the
`tonic-build`-generated service code. CLI tested via `assert_cmd`.

### PR E.7 — lmxopcua OPC UA-facing client refresh

**Depends on:** B.2 + B.3 (server-side payload final on the OPC UA
wire). Independent of E.2-E.6 — different consumer surface (OPC UA
Part 9, not gateway gRPC).

**Files** (`c:\Users\dohertj2\Desktop\lmxopcua\src\`):

1. `Core.Abstractions\AlarmEventArgs.cs` *(extend, not new)* — add
   optional fields the new path surfaces:
   - `OperatorComment` (nullable string — populated by the native
     ack path; null on sub-attribute fallback path)
   - `OriginalRaiseTimestampUtc` (nullable; null on fallback path)
   - `AlarmCategory` (nullable string)
   - `AlarmTypeName` (already exists per v1 docs — leave alone)
2. `Server\OpcUa\DriverNodeManager.cs` — populate the corresponding
   OPC UA Part 9 condition fields when the new payload is non-null:
   `Comment` (from OperatorComment), `Time` (from OriginalRaiseTimestampUtc
   when present, else event arrival time), `ConditionClassName` (from
   AlarmCategory if mapping is defined).
3. `Client.Shared\Models\AlarmEventArgs.cs` — mirror the new fields
   on the client-side DTO.
4. `Client.CLI\Commands\AlarmsCommand.cs` — add columns under a new
   `--verbose` flag, plus full payload under `--json`. Default output
   stays five-column compatible.
5. `Client.UI\ViewModels\AlarmEventViewModel.cs` — bind the new
   fields. Add columns to `Views\AlarmsView.axaml` (collapsible
   under a "Show details" toggle so the default view stays compact).
   Surface `OperatorComment` in `AckAlarmWindow.axaml` as a
   prepopulated default when re-acknowledging an already-acked
   alarm.
6. `docs\Client.CLI.md` — add the new `--verbose` and `--json`
   flag examples to the alarms section.
7. `docs\Client.UI.md` — add a screenshot or description of the
   "Show details" expansion behavior.
8. `docs\reqs\ClientRequirements.md` — lines 116 and 153 reference
   the alarm subscription contract; extend the field list to cover
   the new payload.
9. `docs\AlarmTracking.md` (new in B.5) — wire in client-side
   examples.

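The field-population rule in item 2 is a set of per-field fallbacks. The Python sketch below illustrates it under stated assumptions: the dataclass mirrors the three new nullable fields, while `CATEGORY_TO_CONDITION_CLASS` and its single entry are a hypothetical mapping, since the plan only says a mapping may be defined.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AlarmEventArgs:
    """Python stand-in for the three new nullable fields from item 1."""
    operator_comment: Optional[str] = None
    original_raise_timestamp_utc: Optional[datetime] = None
    alarm_category: Optional[str] = None


# Hypothetical category mapping; the real table (if any) is defined elsewhere.
CATEGORY_TO_CONDITION_CLASS = {"Process": "ProcessConditionClassType"}


def condition_fields(args: AlarmEventArgs, arrival_time_utc: datetime) -> dict:
    """Build the Part 9 fields, falling back when the native payload is absent."""
    # Time: original raise timestamp when present, else event arrival time.
    fields = {"Time": args.original_raise_timestamp_utc or arrival_time_utc}
    # Comment: only set on the native path, where OperatorComment is non-null.
    if args.operator_comment is not None:
        fields["Comment"] = args.operator_comment
    # ConditionClassName: only set when a mapping exists for the category.
    condition_class = CATEGORY_TO_CONDITION_CLASS.get(args.alarm_category)
    if condition_class is not None:
        fields["ConditionClassName"] = condition_class
    return fields
```

On the sub-attribute fallback path all three inputs are null, so only `Time` is populated and it carries the arrival time.
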
**Tests:**

- `Client.Shared.Tests` — DTO round-trip through the alarm event
  pump with all fields populated and all-null cases.
- `Client.CLI.Tests` — `--verbose` column ordering, `--json`
  schema validation, default output stays five-column.
- `Client.UI.Tests` — `AlarmEventViewModel` bindings exposed,
  collapsible-detail toggle behavior.

### Sequencing within Track E

E.1 first (mechanical). E.2-E.7 can land in parallel. E.7 has its own
dependency chain inside lmxopcua (B.2 + B.3) and doesn't gate any
other E PR. The .NET client (E.2) is the only language SDK
**lmxopcua** consumes today; if the gateway repo's release schedule
prefers landing E.2 first and shipping E.3-E.6 in a follow-up release,
that's a valid sequence — the customer-facing constraint is "at
least one language SDK ships at the same time as A.3 lights up the
gateway dispatch."

## Track D — deployment refresh

The dev box at `DESKTOP-6JL3KKO` runs three live services from
`C:\publish\` (installed in the session that produced commit
`ea04547`'s install scripts). Once Tracks A / B / C and PR E.2 are
merged, the deployed binaries need to be refreshed so the running
services pick up the new alarm path. Track D is one PR — pure ops, no
code change.

### PR D.1 — refresh C:\publish + restart services

**Depends on:** A.4 + B.4 + C.2 + E.2 merged (every code-change PR the
deployed binaries depend on has landed).

**Order matters** — services must stop in reverse-dependency order
(`OtOpcUa` → `OtOpcUaWonderwareHistorian` → `MxAccessGw`) and start in
forward-dependency order (`MxAccessGw` → `OtOpcUaWonderwareHistorian`
→ `OtOpcUa`). Touching binaries while a dependent service holds them
locked produces the publish-time `MSB3027` file-lock error caught
during the original install (see commit `80104ca`).

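The ordering rule reduces to "stop in the reverse of the start order, refresh, then start forward". A minimal Python sketch of that invariant follows; the service names come from this plan, while `stop_order` and `refresh_plan` are hypothetical helpers and not part of `Refresh-Services.ps1`.

```python
# Forward-dependency (start) order as stated in the plan.
START_ORDER = ["MxAccessGw", "OtOpcUaWonderwareHistorian", "OtOpcUa"]


def stop_order(start_order: list[str]) -> list[str]:
    """Services stop in the exact reverse of the order they start in."""
    return list(reversed(start_order))


def refresh_plan(start_order: list[str]) -> list[tuple[str, str]]:
    """Full sequence: stop dependents first, refresh binaries, start providers first."""
    steps = [("stop", name) for name in stop_order(start_order)]
    steps.append(("refresh", "binaries"))
    steps.extend(("start", name) for name in start_order)
    return steps
```

Deriving the stop order from a single list means a newly added service cannot end up with inconsistent stop and start positions.
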
**Steps (run as a single PowerShell session on the deploy host):**

1. **Stop in reverse order**:
   ```powershell
   nssm stop OtOpcUa
   nssm stop OtOpcUaWonderwareHistorian
   nssm stop MxAccessGw
   Start-Sleep -Seconds 3
   Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Server, `
       OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
       Stop-Process -Force
   ```

2. **Refresh mxaccessgw binaries** (Track A output):
   ```powershell
   $gwSrc = "C:\Users\dohertj2\Desktop\mxaccessgw"
   dotnet build "$gwSrc\src\MxGateway.Worker" -c Release
   dotnet build "$gwSrc\src\MxGateway.Server" -c Release

   Copy-Item -Recurse -Force `
       "$gwSrc\src\MxGateway.Server\bin\Release\net10.0\*" `
       "C:\publish\mxaccessgw\Server\"
   Copy-Item -Recurse -Force `
       "$gwSrc\src\MxGateway.Worker\bin\x86\Release\net48\*" `
       "C:\publish\mxaccessgw\Worker\"
   ```

3. **Refresh OtOpcUa + historian sidecar binaries** (Tracks B + C
   output):
   ```powershell
   $repo = "C:\Users\dohertj2\Desktop\lmxopcua"
   dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Server" `
       -c Release -o "C:\publish\lmxopcua"
   dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware" `
       -c Release -o "C:\publish\lmxopcua\WonderwareHistorian"
   ```

4. **Update service env block if Track C added the new toggle**:
   ```powershell
   # Pull existing env, append OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true
   # (default-on per C.2 design, but explicit assignment lets us flip false
   # for read-only deployments without re-installing)
   nssm set OtOpcUaWonderwareHistorian AppEnvironmentExtra `
       (((nssm get OtOpcUaWonderwareHistorian AppEnvironmentExtra) `
           + "`r`nOTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true"))
   ```

5. **Start in forward order**:
   ```powershell
   nssm start MxAccessGw
   Start-Sleep -Seconds 4
   nssm start OtOpcUaWonderwareHistorian
   Start-Sleep -Seconds 4
   nssm start OtOpcUa
   Start-Sleep -Seconds 8
   ```

6. **Smoke verification:**
   ```powershell
   foreach ($s in 'MxAccessGw','OtOpcUaWonderwareHistorian','OtOpcUa') {
       (Get-Service $s).Status
   }
   foreach ($p in 5120, 4840, 4841) {
       Get-NetTCPConnection -LocalPort $p -State Listen `
           -ErrorAction SilentlyContinue
   }
   Get-Content "C:\publish\lmxopcua\logs\otopcua-*.log" -Tail 20
   Get-Content "C:\publish\mxaccessgw\stdout.log" -Tail 20
   Get-Content "C:\ProgramData\OtOpcUa\historian-wonderware-*.log" -Tail 10
   ```

   Pass criterion: all three services `Running`; ports 5120 + 4840
   listening; sidecar log shows `Wonderware historian sidecar
   serving — pipe=OtOpcUaWonderwareHistorian`; OtOpcUa log shows
   `OPC UA server started — endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa`
   and a new line `IAlarmHistorianWriter resolved: Sidecar` (added
   in B.4).

7. **Functional verification — fire one alarm of each kind and assert
   it propagates:**
   - **Galaxy-native** — raise the `OtOpcUaParityTest_001.Counter`
     `$Alarm*` extension via Galaxy's alarm-fire mechanism; assert an
     OPC UA Part 9 transition reaches a connected `otopcua-cli alarms`
     subscriber with rich payload (operator-comment field non-null,
     original-raise-timestamp present). This validates Track A + B.1
     + B.2 + B.3.
   - **Scripted** — author a one-line scripted alarm in the Admin UI
     against any always-true predicate; assert the transition lands in
     AVEVA Historian via `aaHistClientTrend` query (or
     `Driver.Historian.Wonderware.IntegrationTests` with a query for
     the alarm event). Validates Track C + B.4.
   - **Sub-attribute fallback** — disable `IAlarmSource` on the
     GalaxyDriver via the test seam (B.3 will introduce one); fire an
     alarm; assert Part 9 transition still raised by the value-driven
     path. Validates the fallback wasn't broken.

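The log half of the step 6 pass criterion can be checked mechanically by searching each log tail for the expected markers. The Python sketch below assumes the tails have already been read into strings; the marker substrings are taken from the pass criterion, while `EXPECTED_MARKERS`, the source keys, and `missing_markers` are hypothetical names.

```python
# Marker substrings copied from the step 6 pass criterion; the keys are
# illustrative labels for the two log files being tailed.
EXPECTED_MARKERS = {
    "sidecar": ["pipe=OtOpcUaWonderwareHistorian"],
    "otopcua": [
        "endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa",
        "IAlarmHistorianWriter resolved: Sidecar",
    ],
}


def missing_markers(log_tails: dict[str, str]) -> list[str]:
    """Return 'source: marker' for every expected marker not found in its log tail."""
    missing = []
    for source, markers in EXPECTED_MARKERS.items():
        text = log_tails.get(source, "")
        for marker in markers:
            if marker not in text:
                missing.append(f"{source}: {marker}")
    return missing
```

An empty result means the log portion of the smoke check passed; service status and port checks still need their own assertions.
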
**Files:**

- `scripts\install\Refresh-Services.ps1` *(new — automates the above)*
- `docs\v2\dev-environment.md` — add the refresh script to the dev
  workflow section.

**Tests:** smoke run on the dev rig (`DESKTOP-6JL3KKO`) producing
`docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` with the captured log
tails + smoke-test assertions. Captured artifact lands as part of the
PR.

**Rollback:** the refresh script keeps a timestamped backup of the
existing `C:\publish\mxaccessgw\` and `C:\publish\lmxopcua\` trees
before overwriting (mirrored to `C:\publish\.backup-YYYY-MM-DD\`).
Rollback is a stop / restore-from-backup / start sequence; no service
re-install needed since the NSSM service definitions don't change.

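The backup layout described for rollback can be sketched as a pure path computation. In the Python sketch below the two source trees and the `.backup-YYYY-MM-DD` root come from this plan; placing each tree under the root by its directory name is an assumption, and `backup_root` and `backup_plan` are hypothetical helpers.

```python
from datetime import date
from pathlib import PureWindowsPath

# Trees named in the rollback paragraph.
TREES = [
    PureWindowsPath(r"C:\publish\mxaccessgw"),
    PureWindowsPath(r"C:\publish\lmxopcua"),
]


def backup_root(day: date) -> PureWindowsPath:
    """Dated backup directory, e.g. C:\\publish\\.backup-2026-04-30."""
    return PureWindowsPath(r"C:\publish") / f".backup-{day.isoformat()}"


def backup_plan(day: date) -> list[tuple[PureWindowsPath, PureWindowsPath]]:
    """Pair each live tree with its destination (assumed: same directory name)."""
    root = backup_root(day)
    return [(tree, root / tree.name) for tree in TREES]
```

Restoring is the same pairing applied in the opposite direction, between the stop and start phases.
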
**Production deploy:** out of scope for D.1 — the dev rig is the only
deployment in scope at this point. A separate PR-or-runbook lands the
production refresh once the dev rig has soaked for the documented
duration (parity-rig validation gate; see "Test gates" above).

## Sequencing matrix

```
Track A (mxaccessgw)              Track B (lmxopcua)                  Track C (sidecar)             Track E (clients)
─────────────────────────         ─────────────────────────           ─────────────────────         ──────────────────────────
A.1 proto                         (waits)                             C.1 AahClientManagedWriter    E.1 proto regen ×5 langs
  │                                 │                                   │                           (mechanical, after A.1)
  ├──────────────────────────►    B.1 EventPump branch                  │                             │
A.2 worker subscription             │ uses proto types only             │                             │
  │                                 │ unit-testable                     │                             │
  │                                                                   C.2 Program.cs wires            │
A.3 gateway dispatch + ack RPC ──►B.2 GalaxyDriver : IAlarmSource       │                        ──►E.2 .NET SDK + CLI
  │                                 │                                   │                        ──►E.3 Python SDK + CLI
  │                            ──►B.3 DriverNodeManager routing         │                        ──►E.4 Go SDK + CLI
  │                                                                     │                        ──►E.5 Java SDK + CLI
  │                                                                     │                        ──►E.6 Rust SDK
A.4 ConditionRefresh                │                                   │                             │
                                    │                                   │                             │
                                  B.4 SidecarAlarmHistorianWriter                                     │
                                  (depends on C.2 deployed)             │                             │
                                    │                                                                 │
                                  (B.2 + B.3 done) ───────────────────────────────────────────────► E.7 lmxopcua client refresh
                                    │                                                                 │
                                    ▼                                                                 │
                                  Track D (deployment)                                                │
                                  ─────────────────────────                                           │
                                  D.1 Refresh C:\publish + restart services                           │
                                  (depends on A.4 + B.4 + C.2 + E.2 merged)                           │
                                    ▼                                                                 │
                               ──►B.5 docs + memory + completion banner ◄─────────────────(E.7 done)──┘
```

A.1 + B.1 + C.1 + E.1 can all land in parallel — none have cross-repo
runtime dependencies. B.1's tests use proto types without needing a
running gateway. C.1 is purely sidecar-internal. E.1 is mechanical
codegen.

The gateway-side dispatch (A.3) gates B.2 and E.2-E.6. The
sidecar-side wiring (C.2) gates B.4. E.7 gates on B.2 + B.3 only —
it's the OPC UA client surface, not the gateway client surface.

D.1 (deployment refresh) requires E.2 to also be merged because the
deployed `MxGateway.Client.dll` consumed by GalaxyDriver needs the new
methods. E.3-E.6 (other-language SDKs) don't gate D.1 — they ship on
their own release cadence.

B.5 (docs sweep) gates on D.1 + E.7 both merged — it's the final
"snapshot the as-shipped state" pass.

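The gating rules above form a small dependency graph, and grouping it into waves shows which PRs can land together. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+) and encodes only a subset of the dependencies this plan states explicitly; intra-track ordering (for example A.1 before A.2) and E.3-E.6 are deliberately left out, so the earliest waves are broader than the real schedule. `DEPENDS_ON` and `landing_waves` are hypothetical names.

```python
from graphlib import TopologicalSorter

# PR -> prerequisites, limited to gates stated explicitly in this plan.
# Intra-track chains (A.1 -> A.2 -> ...) are intentionally omitted.
DEPENDS_ON = {
    "E.1": {"A.1"},
    "B.2": {"A.3"},
    "E.2": {"E.1", "A.3"},
    "B.4": {"C.2"},
    "E.7": {"B.2", "B.3"},
    "D.1": {"A.4", "B.4", "C.2", "E.2"},
    "B.5": {"D.1", "E.7"},
}


def landing_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group PRs into waves; every PR in a wave has all prerequisites in earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the gates are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Running `landing_waves(DEPENDS_ON)` places B.5 alone in the final wave, matching its role as the closing docs sweep.
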
## Test gates

Per PR: unit tests pass + build green + analyzer clean (Roslyn
OTOPCUA0001 still wraps every alarm-capability call through
`AlarmSurfaceInvoker`).

End-of-epic gate: re-run the parity rig (`docs\v2\Galaxy.ParityRig.md`)
with these scenarios added:

1. **Native alarm raise** — Galaxy `$Alarm*` raise with operator-time
   metadata appears as an OPC UA Part 9 transition with full payload
   (no longer reconstructed from sub-attribute writes).
2. **Native ack** — OPC UA client acks; assert the gateway records the
   ack against MxAccess directly (not via sub-attribute write); operator
   comment present in the resulting `Acknowledged` transition.
3. **ConditionRefresh after reconnect** — disconnect the GalaxyDriver,
   raise three alarms in Galaxy, reconnect; assert all three appear in
   the next ConditionRefresh.
4. **Historian write-back** — fire a scripted alarm; assert it arrives in
   AVEVA Historian via the gateway path (use the existing Historian
   sidecar's read API to query it back).
5. **Sub-attribute fallback still works** — disable `IAlarmSource` on
   the GalaxyDriver via test seam, fire a sub-attribute value change;
   assert Part 9 transition still raised.

Soak target: 24h × 1k tags (light) — same parity-rig harness but
extended to also subscribe to alarms. Pass criterion: zero dropped
alarm transitions, zero state-machine inversions, zero unhandled
exceptions in the AlarmSurfaceInvoker pipeline.

## Risks and mitigations

| Risk | Mitigation |
|---|---|
| MxAccess Toolkit alarm subscription API differs across installed AVEVA versions | PR A.2 verifies against the worker-host's installed Toolkit version; documents the exact API used. Pin the worker DLL set per major MxAccess version if needed. |
| Worker-side alarm subscription leaks between sessions if cleanup is wrong | PR A.2 includes a session-recycle test that asserts no `IAlarmEventSink` instances remain registered after Close. |
| Gateway adds a new auth scope (`invoke:alarm-ack`); existing keys lack it | PR A.3 + A.5 ship with a one-time bootstrap migration: keys with `invoke:write` get the new scope auto-granted on the dev rig and parity rig. Production keys are reissued via `apikey rotate-key` (existing CLI). |
| Two simultaneous alarm sources (driver-native + sub-attribute) double-fire transitions | PR B.3 dedup is the load-bearing design. End-to-end test #1 covers it explicitly. |
| Historian write-back batch fails mid-batch — partial success | The existing `SqliteStoreAndForwardSink.HistorianWriteOutcome` per-row enum + dead-letter retention already handles this; PR A.5 just exposes the same outcome shape over gRPC. |
| Sidecar starts honouring the `WriteAlarmEvents` slot — old lmxopcua-side consumers can now reach a previously inert path | The slot returns `Success=false, Error="not configured"` today; flipping to live writes means a build that *speculatively* sent the frame would suddenly start producing real historian rows. Inventory of any such caller is empty — `WriteAlarmEvents` was never invoked from the lmxopcua side; `Phase7EngineComposer.RouteToHistorianAsync` queues into `SqliteStoreAndForwardSink` and the drain worker is gated on `IAlarmHistorianWriter` registration which only the new B.4 path provides. So enabling C.2 without B.4 is safe. |

## Roll-out

Track A lands first onto `mxaccessgw/main`, deployed to the parity rig.
Track B lands onto `lmxopcua/master` once A.3 is live on the rig — earlier
Track B PRs can target a feature branch (`feat/alarms-over-gateway`) and
merge to master after the rig is fully green.

## Back-out

Each PR is individually revertable. The cleanest back-out point is at
the gateway-side enum extension: removing `MX_EVENT_FAMILY_ON_ALARM_TRANSITION`
from the proto means EventPump silently drops alarm events again and
GalaxyDriver's `OnAlarmEvent` never fires — but the sub-attribute fallback
path still produces functional alarms, so the OPC UA surface degrades to
v2-current behaviour without breaking. PR B.4 is the only one with a
non-trivial back-out (re-add the deleted sidecar IPC slot if revert
needed); land B.4 last and only after end-of-epic gate is green.

## Out of scope (explicit)

- **Other alarm sources beyond Galaxy.** AbCip / FOCAS / OpcUaClient
  drivers already implement `IAlarmSource`; they're untouched.
- **Modbus / S7 / AbLegacy / TwinCAT alarms.** None of those protocols
  has a native alarm bus. Alarms on those drivers, if needed, ship via
  the scripted-alarm path.
- **Multi-Galaxy ack routing.** Today's gateway model is one Galaxy per
  session; if a deployment splits across galaxies, each gets its own
  GalaxyDriver and they don't cross-talk. No change.
- **OPC UA Part 9 advanced features** beyond the current scope —
  shelving, subscribed-to-events-only, branch-state for re-trigger
  semantics. Future epic if a customer asks.
- **Insight / cloud Historian write-back path.** Track A.5 targets the
  on-prem AVEVA Historian via aahClientManaged. The cloud variant
  would mirror the same gateway RPC over the REST API discussed in
  `docs/histsdk` — separate epic.

## File inventory (touched)

**mxaccessgw (Track A):**

- `src\MxGateway.Contracts\Protos\mxaccess_gateway.proto` (A.1)
- `src\MxGateway.Contracts\Protos\mxaccess_worker.proto` (A.2, A.4)
- `src\MxGateway.Worker\…\Eventing\` (A.2, A.3, A.4)
- `src\MxGateway.Worker\…\Commands\` (A.3, A.4)
- `src\MxGateway.Server\Sessions\SessionEventStream.cs` (A.3)
- `src\MxGateway.Server\Rpc\` (A.3, A.4)
- `src\MxGateway.Server\Auth\Scopes.cs` (A.3, A.4)
- `MxGateway.Tests`, `MxGateway.Worker.Tests`, `MxGateway.IntegrationTests`

**lmxopcua — Galaxy driver + server (Track B):**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\EventPump.cs` (B.1)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\MxAccessSeverityMapper.cs` *(new — B.1)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\IGalaxyAlarmAcknowledger.cs` *(new — B.2)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\GatewayGalaxyAlarmAcknowledger.cs` *(new — B.2)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriver.cs` (B.2)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriverFactory.cs` (B.2)
- `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs` (B.3)
- `src\ZB.MOM.WW.OtOpcUa.Server\Alarms\AlarmConditionService.cs` (B.3)
- `src\ZB.MOM.WW.OtOpcUa.Server\Phase7\Phase7Composer.cs` (B.4)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client\SidecarAlarmHistorianWriter.cs` *(new — B.4)*
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests\Runtime\` (B.1, B.2)
- `tests\ZB.MOM.WW.OtOpcUa.Server.Tests\Alarms\` (B.3)
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests\` (B.4 — new tests)
- `docs\drivers\Galaxy.md` (B.5)
- `docs\AlarmTracking.md` *(new — B.5)*
- `docs\v1\AlarmTracking.md` (B.5 — banner update)
- `docs\plans\alarms-over-gateway.md` (B.5 — completion banner)

**lmxopcua — Wonderware historian sidecar (Track C):**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Backend\AahClientManagedAlarmEventWriter.cs` *(new — C.1)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Program.cs` (C.2 — wire writer)
- `scripts\install\Install-Services.ps1` (C.2 — env-var toggle for write-enable)
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\` (C.1 — outcome mapping + batch + cluster failover)

**lmxopcua — deployment refresh (Track D):**

- `scripts\install\Refresh-Services.ps1` *(new — D.1)*
- `docs\v2\dev-environment.md` (D.1 — document the refresh workflow)
- `docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` *(new — D.1 captured smoke run)*

**mxaccessgw — client SDKs (Track E):**

- `clients\proto\` — no source change; downstream codegen consumes A.1
- **.NET (E.2)**:
  - `clients\dotnet\MxGateway.Client\MxGatewayClient.cs`
  - `clients\dotnet\MxGateway.Client\Alarms\` *(new namespace)*
  - `clients\dotnet\MxGateway.Client.Cli\Verbs\AlarmsVerb.cs` *(new)*
  - `clients\dotnet\MxGateway.Client.Tests\AlarmsTests.cs` *(new)*
- **Python (E.3)**:
  - `clients\python\src\mxgateway\alarms.py` *(new)*
  - `clients\python\src\mxgateway\cli\alarms.py` *(new — verify CLI module path)*
  - `clients\python\tests\test_alarms.py` *(new)*
- **Go (E.4)**:
  - `clients\go\mxgateway\alarms.go` *(new)*
  - `clients\go\cmd\mxgateway-cli\alarms.go` *(new — verify dir name)*
  - `clients\go\internal\testserver\alarms_test.go` *(new)*
- **Java (E.5)**:
  - `clients\java\mxgateway-client\src\main\java\…\AlarmsApi.java` *(new)*
  - `clients\java\mxgateway-cli\src\main\java\…\AlarmsCommand.java` *(new)*
  - `clients\java\mxgateway-client\src\test\java\…\AlarmsApiTest.java` *(new)*
- **Rust (E.6)**:
  - `clients\rust\crates\mxgateway-client\src\alarms.rs` *(new)*
  - `clients\rust\crates\mxgateway-cli\src\alarms.rs` *(new — verify crate name)*
  - `clients\rust\tests\alarms.rs` *(new)*

**lmxopcua — OPC UA client refresh (Track E.7):**

- `src\ZB.MOM.WW.OtOpcUa.Core.Abstractions\AlarmEventArgs.cs` (extend)
- `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs` (Part 9 field population)
- `src\ZB.MOM.WW.OtOpcUa.Client.Shared\Models\AlarmEventArgs.cs` (DTO mirror)
- `src\ZB.MOM.WW.OtOpcUa.Client.CLI\Commands\AlarmsCommand.cs` (verbose / json flags)
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\ViewModels\AlarmEventViewModel.cs`
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\ViewModels\AlarmsViewModel.cs`
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\Views\AlarmsView.axaml` (+ `.cs`)
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\Views\AckAlarmWindow.axaml` (+ `.cs`)
- `docs\Client.CLI.md` (alarms section examples)
- `docs\Client.UI.md` (Show-details toggle description)
- `docs\reqs\ClientRequirements.md` (extend AlarmEventArgs contract)
- `docs\AlarmTracking.md` (B.5 — cross-link client examples)
- `tests\ZB.MOM.WW.OtOpcUa.Client.Shared.Tests\` (DTO round-trip)
- `tests\ZB.MOM.WW.OtOpcUa.Client.CLI.Tests\` (flag behaviour)
- `tests\ZB.MOM.WW.OtOpcUa.Client.UI.Tests\` (view-model bindings)

Total: ~10 source files added/modified in mxaccessgw server/worker
side; ~14 in lmxopcua server/driver side; ~3 in the historian sidecar;
~2 deployment scripts; ~30 across the five gateway-client SDK
languages; ~12 in lmxopcua client surfaces; ~25 test files across
all repos. The gateway-client multi-language work is parallelizable
across maintainers, so wall-clock effort lands in 4-6 weeks of
coordinated work given the parity-rig dependency for end-to-end
validation. If only the .NET SDK ships at first (E.2 only) and
E.3-E.6 follow asynchronously, lmxopcua's critical path stays
unchanged.