# Plan — alarms over the mxaccessgw gateway

> **17 of 19 PRs merged. Public contract surface and the lmxopcua /
> sidecar consumers are live; four merged PRs ship as scaffolds
> pending worker-side wiring.** Status reconciled against the source
> tree on 2026-05-01.
>
> **Functional end-to-end today:** B.1 / B.2 / B.3 / B.4 / B.5
> (EventPump branch, GalaxyDriver `IAlarmSource`, DriverNodeManager
> ack routing, `WonderwareHistorianClient : IAlarmHistorianWriter`,
> docs sweep), C.2 (sidecar wires the alarm-write slot), D.1 script
> (`scripts/install/Refresh-Services.ps1`), E.1 – E.7 (proto regen +
> .NET / Python / Go / Java / Rust SDK alarm methods + lmxopcua client
> surface). The value-driven sub-attribute fallback path keeps Galaxy
> alarms functional today.
>
> **Merged-but-inert scaffolds (gated on worker AlarmClient wiring):**
>
> - **A.2** — `MxAccessAlarmEventSink.Attach` is a no-op; the COM-side
>   `aaAlarmManagedClient.AlarmClient` registration / subscription has
>   not landed yet, so the gateway's
>   `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` is reserved on the wire but
>   never emitted.
> - **A.3** AcknowledgeAlarm + **A.4** QueryActiveAlarms — public RPC
>   handlers in `MxAccessGatewayService.cs` route through
>   `NotWiredAlarmRpcDispatcher` (Ack returns OK with a
>   `worker dispatch pending dev-rig wiring` diagnostic; Query yields
>   an empty stream).
> - **C.1** sidecar — `AahClientManagedAlarmEventWriter` exists and the
>   IPC slot is wired, but the production backend
>   `SdkAlarmHistorianWriteBackend.WriteBatchAsync` returns
>   `RetryPlease` for every event with a placeholder log — the live
>   `aahClientManaged` SDK call site is pinned during the D.1 dev-rig
>   smoke. Effect: scripted-alarm transitions queue locally in
>   `SqliteStoreAndForwardSink` and the drain worker repeatedly retries.
>
> **Architectural decision RESOLVED 2026-04-30** (recorded in the
> mxaccessgw repo at `src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs`
> xmldoc): the worker hosts `aaAlarmManagedClient.AlarmClient` (x86
> .NET Framework 4.8 — same bitness as the existing MxAccess COM
> consumer) alongside the COM consumer, sharing the worker's STA +
> WM_APP message pump. The discovered API surface
> (`RegisterConsumer`, `Subscribe`, `GetStatistics`,
> `GetAlarmExtendedRec`, `AlarmAckByGUID`) is documented in that
> file's xmldoc. The earlier concern that AVEVA's alarm SDK was
> x64-only proved wrong against the deployed assemblies. What remains
> is wiring PRs in the worker — session-startup `RegisterConsumer` +
> `Subscribe`, an STA WM_APP handler that routes
> alarm-changed messages into `EnqueueTransition`, and the worker
> command path that calls `AlarmAckByGUID` from a gateway
> `AcknowledgeAlarm` RPC.
>
> **D.1 smoke artifact**
> (`docs/plans/artifacts/d1-rollout-YYYY-MM-DD.md`, called for in the
> Track D test plan below) not yet captured — gated on the worker
> AlarmClient wiring being live on the dev rig so the smoke can
> exercise the alarm scenarios end-to-end and pin the
> `SdkAlarmHistorianWriteBackend` SDK entry point.
>
> The remainder of this document is preserved as the design record.

Coordinated epic across two repos:

- **`lmxopcua`** (this repo) — `c:\Users\dohertj2\Desktop\lmxopcua\`
- **`mxaccessgw`** — `c:\Users\dohertj2\Desktop\mxaccessgw\`

## Why

PR 7.2 (2026-04-30, commit `ae7106d`) retired the out-of-process v1
Galaxy stack (`Driver.Galaxy.Host` / `.Proxy` / `.Shared` +
`OtOpcUaGalaxyHost` Windows service) and migrated Galaxy access to the
in-process `GalaxyDriver` over mxaccessgw's gRPC. In doing so, three v1
capabilities regressed:

1. **Native MxAccess alarm-event metadata** — v1's `GalaxyAlarmTracker`
   surfaced rich alarm transitions (operator comment, original raise
   time, ack time, alarm category, native severity).
The current architecture reconstructs Part 9 transitions by subscribing to four sub-attribute value updates (`InAlarm`, `Acked`, `Priority`, `Description`) — fine for raise/clear but loses everything else. 2. **Native MxAccess Acknowledge semantics** — v1 called the MxAccess ack API directly from `GalaxyAlarmTracker`. Today, OPC UA acks are written into the `AckMsgWriteRef` sub-attribute — semantically valid but a round-trip through the value path that loses operator-comment fidelity. 3. **Alarm-historian write-back path for non-Galaxy alarm sources.** v1's `GalaxyHistorianWriter` implemented `IAlarmHistorianWriter` and forwarded *scripted-alarm* transitions (and any future non-Galaxy alarm source — AB CIP ALMD, OpcUaClient A&E, etc.) back to AVEVA Historian via `aahClientManaged`. PR 7.2 deleted it. `Phase7Composer.ResolveHistorianSink` now finds no writer and falls back to `NullAlarmHistorianSink`, so **scripted-alarm transitions queue locally and silently discard.** Galaxy-native alarms (with `$Alarm*` extensions) reach AVEVA Historian via System Platform's own `HistorizeToAveva` toggle on the Galaxy template — that path was never broken and is not in scope for this epic. `gateway.md` (mxaccessgw, line 8) explicitly commits the gateway to "full MXAccess parity… preserve MXAccess behavior first… **native MXAccess event families**." Today's gateway proto exposes only data-change families. Closing the alarm regression and fulfilling that parity statement are the same task. ## Goals - Restore all three regressed capabilities to feature parity with v1. - Keep the v2 architectural split — gateway owns MxAccess transport; lmxopcua owns OPC UA Part 9 semantics, ACL/role enforcement, and multi-source aggregation (driver-native + scripted + sub-attribute). - Preserve the value-driven sub-attribute path as a fallback for Galaxy templates that don't carry `$Alarm*` extensions. 
- Land the work as a sequence of small, independently-reviewable PRs that alternate between repos in dependency order. ## Non-goals - Reimplementing the Part 9 state machine inside mxaccessgw. The gateway stays UA-agnostic. - Reworking the LDAP role-grant or OPC UA AlarmAck ACL surface — those already exist and route through `Server/Alarms/IAlarmAcknowledger`. - Adding alarm support to non-Galaxy drivers (AbCip / FOCAS / OpcUaClient already have their own `IAlarmSource` implementations; Modbus / S7 / AbLegacy / TwinCAT don't have a native alarm bus and are out of scope). - Altering Galaxy template conventions or `$Alarm*` extensions in the customer's Galaxy. ## Before → after **Today (post-PR 7.2):** ``` MxAccess COM (gateway worker) │ data-change events only on the MxEvent stream ▼ GalaxyDriver (no IAlarmSource) │ IWritable / ISubscribable / ITagDiscovery only ▼ DriverNodeManager ├─ subscribes to four $Alarm* sub-attributes per condition ├─ AlarmConditionService rebuilds Part 9 transitions from value updates └─ DriverWritableAcknowledger writes AckMsgWriteRef on ack Phase7Composer.ResolveHistorianSink → NullAlarmHistorianSink (scripted-alarm transitions queue → silently discarded) ``` **After this epic:** ``` MxAccess COM (gateway worker) │ data-change ──┐ │ alarm-transition │ │ write-complete ├─► single MxEvent stream (new family added) ▼ ▼ GalaxyDriver : ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IAlarmSource ← restored ├─ EventPump dispatches OnAlarmTransition family → IAlarmSource.OnAlarmEvent ├─ AcknowledgeAsync → gateway RPC AcknowledgeAlarm └─ QueryActiveAlarmsAsync → gateway RPC QueryActiveAlarms (ConditionRefresh) DriverNodeManager ├─ rich alarm events from IAlarmSource.OnAlarmEvent → AlarmConditionService ├─ value-driven sub-attribute path STILL WORKS for templates without $Alarm ├─ DriverWritableAcknowledger preserved as fallback for the value path └─ ScriptedAlarmEngine output continues to feed 
AlarmConditionService Phase7Composer.ResolveHistorianSink → GatewayAlarmHistorianWriter ├─ scripted-alarm transitions → SqliteStoreAndForwardSink └─ drain worker → gateway RPC WriteHistorianEvent → AVEVA Historian ``` ## Architecture decisions **D1 — Where the Part 9 state machine runs.** Stays in lmxopcua's `AlarmConditionService`. Gateway is UA-agnostic. ScriptedAlarmEngine produces Part 9 transitions with no MxAccess origin; the aggregator must live where all sources converge. **D2 — Where authz on Acknowledge runs.** Stays in lmxopcua. The OPC UA `AlarmConditionState.OnAcknowledge` delegate already checks the session's roles for `AlarmAck` against the LDAP/role-grant ACL. The gateway should never be reachable in a way that bypasses that check. **D3 — How rich alarm events reach OPC UA clients.** New `MxEventFamily` on the existing `StreamEvents` RPC (no second stream). Adds latency parity with data-change events, reuses the bounded-channel + worker-side delivery semantics already documented in `gateway.md`. **D4 — Sub-attribute fallback path stays.** Some Galaxy templates won't have `$Alarm*` extensions yet; the existing value-driven path remains the only way to surface alarms for those templates. Both paths feed `AlarmConditionService`. Driver-native events take precedence when both are present (more authoritative, lower latency). **D5 — Where the historian writer lives.** In the **Wonderware historian sidecar**, not in the gateway. The sidecar already owns `aahClientManaged`, already has a `WriteAlarmEvents` IPC slot defined in `Ipc/Contracts.cs`, and already dispatches to an `IAlarmEventWriter` interface — it's just unwired in `Program.cs:57`. The gateway is for MxAccess (live data + Galaxy hierarchy); the historian sidecar is for `aahClientManaged` (time-series + alarms historian). Two different SDKs, two different concerns; keep the split. 
Bonus: completing the sidecar's write path also gives it a clearer
long-term role — once the REST-API migration in
`histsdk\instructions.md` takes over reads, write-back keeps the
sidecar relevant rather than retiring it as a read-only relic.

**Galaxy-native alarms bypass this entirely** — System Platform's own
`HistorizeToAveva` toggle on the Galaxy template publishes them
directly. The sidecar write path is exclusively for non-Galaxy
producers (today: scripted alarms; future: AB CIP ALMD or any other
lmxopcua-side alarm source the customer wants unified into AVEVA
Historian).

## Track A — mxaccessgw changes

All four PRs land in `c:\Users\dohertj2\Desktop\mxaccessgw\`.

### PR A.1 — proto: add alarm-transition event family + ack/query RPCs

**Files** (`src\MxGateway.Contracts\Protos\mxaccess_gateway.proto`):

1. Extend `MxEventFamily` (line 403):
   ```
   MX_EVENT_FAMILY_ON_ALARM_TRANSITION = 5;
   ```
2. Extend `MxEvent.body` oneof (line 395) with:
   ```
   OnAlarmTransitionEvent on_alarm_transition = 24;
   ```
3. New message `OnAlarmTransitionEvent` after the existing event-family
   bodies (line 425+). Carry the full MxAccess alarm payload — alarm
   name, source object reference, alarm-type-name (e.g.
   "AnalogLimitAlarm.HiHi"), transition kind enum (`Raise` /
   `Acknowledge` / `Clear`), severity (raw numeric — keep MxAccess
   scale; mapping to OPC UA 0-1000 happens server-side in lmxopcua),
   `original_raise_timestamp`, `transition_timestamp`, optional
   `operator_user`, optional `operator_comment`, alarm `category`
   string, alarm `description`. Mirror the field set documented in
   v1's `GalaxyAlarmTracker`.
4. New RPC on `MxAccessGateway` service (line 11):
   ```
   rpc AcknowledgeAlarm(AcknowledgeAlarmRequest) returns (AcknowledgeAlarmReply);
   rpc QueryActiveAlarms(QueryActiveAlarmsRequest) returns (stream ActiveAlarmSnapshot);
   ```
   `AcknowledgeAlarmRequest` carries `session_id`, `alarm_full_reference`,
   `comment`, `user_principal`. Reply carries `MxStatusProxy`.
`QueryActiveAlarmsRequest` carries `session_id`, optional `alarm_filter_prefix` (for ConditionRefresh on a sub-tree). `ActiveAlarmSnapshot` carries the same fields as `OnAlarmTransitionEvent` plus `current_state` enum (`Active` / `ActiveAcked` / `Inactive`). **Tests** (`MxGateway.Tests` — proto/codegen sanity): - Round-trip Serialize→Deserialize for the new messages with all-fields populated and empty-optional-fields cases. - `MxEvent.body` oneof selection guard — supplying multiple bodies rejected. **Out of scope:** worker-side wiring (PR A.2), gateway-side dispatch (PR A.3). PR A.1 is a pure contract-surface change; nothing functional yet. ### PR A.2 — worker: subscribe to MxAccess alarm event source **Files** (`src\MxGateway.Worker\` — net48/x86): The MxAccess Toolkit exposes alarm subscription separately from data subscription. Per AVEVA's MXAccess C++ Toolkit reference (canonical doc referenced from `gateway.md`), alarm events arrive through the `IAlarmEventSink` interface registered against the MxAccess `Alarms` collection of an open session, OR via the MxAccess "alarm provider" subscription pattern (depends on Toolkit version on the worker host — verify against the version actually deployed in the worker bin during PR A.2). 1. Worker subscribes to MxAccess alarms once per session, with a single sink that fans out into the same bounded channel the data-change pump uses (`MxGateway.Worker\Eventing\EventChannel.cs` or whatever the worker currently calls its sink — verify name during the PR). 2. Sink translates each MxAccess alarm event into a `WorkerEvent` proto (defined in `mxaccess_worker.proto`) carrying the new `OnAlarmTransitionEvent` body. Reuses the existing `worker_sequence` counter so ordering is preserved across families. 3. Worker honours the same backpressure rules as data-change events — newest-dropped on full channel, single dropped-counter metric per family. 
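
The backpressure rule in step 3 and the shared `worker_sequence` counter in step 2 can be illustrated with a small model. This is a sketch in Python (one of the client languages this plan already uses), not the worker's .NET code: `BoundedEventChannel`, `publish`, and `drain` are hypothetical names, and it assumes a dropped event does not consume a sequence number, which the plan does not specify.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class WorkerEvent:
    sequence: int  # one counter shared by all families, so ordering is global
    family: str    # e.g. "data_change" or "alarm_transition"
    payload: dict


class BoundedEventChannel:
    """Hypothetical stand-in for the worker's bounded event channel."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._queue = deque()
        self._sequence = count(1)
        self.dropped = {}  # family name -> number of events dropped

    def publish(self, family: str, payload: dict) -> bool:
        """Enqueue an event; when the channel is full, drop the incoming one."""
        if len(self._queue) >= self._capacity:
            # Newest-dropped: already-queued events are kept untouched.
            self.dropped[family] = self.dropped.get(family, 0) + 1
            return False
        self._queue.append(WorkerEvent(next(self._sequence), family, payload))
        return True

    def drain(self) -> list:
        """Hand every queued event to the consumer in sequence order."""
        events = list(self._queue)
        self._queue.clear()
        return events
```

With a capacity of two, a data-change event and an alarm transition are queued with sequences 1 and 2, and a third publish is rejected and counted under its own family.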
**Tests** (`MxGateway.Worker.Tests`): - Fake `IAlarmEventSink` source emits canned transitions; assert the worker forwards each as the right `WorkerEvent` shape. - Cancellation test — closing the session unsubscribes from MxAccess alarms cleanly (no leaked sinks if the worker is recycled mid-session). **Out of scope:** any gateway-side dispatch, any RPC handler — PR A.2 is worker-internal. ### PR A.3 — gateway: dispatch OnAlarmTransition + implement AcknowledgeAlarm **Files** (`src\MxGateway.Server\`): 1. The session-level event multiplexer (`Sessions\SessionEventStream.cs` or equivalent — verify name during PR) recognizes the new `WorkerEvent` body and forwards as an `MxEvent` with family `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` to the gRPC `StreamEvents` consumer. 2. New RPC handler `AcknowledgeAlarm` builds an MxAccess `WorkerCommand` carrying an `AlarmAcknowledgeCommand` (new in `mxaccess_worker.proto` under PR A.1). Forwarded to the worker; reply mapped to `AcknowledgeAlarmReply` with the MxAccess `MxStatus` proxy populated. 3. AuthN — same API-key + scope check as existing RPCs. Add a new scope `invoke:alarm-ack` (mirrors `invoke:write` granularity); existing keys without it return `PERMISSION_DENIED`. **Tests** (`MxGateway.Tests`, `MxGateway.IntegrationTests`): - Unit: dispatch test — fake worker emits an `AlarmTransition` event; assert the gateway forwards it on the live `StreamEvents` channel of every subscribed session. - Integration: end-to-end against the real worker (requires the parity rig setup — see `docs\v2\Galaxy.ParityRig.md` in lmxopcua for the MxAccess-installed dev box prerequisites). Trigger a real Galaxy alarm, assert the gateway emits `OnAlarmTransition`. Acknowledge via the new RPC, assert the alarm transitions to `ActiveAcked` and an `Acknowledge` transition event is emitted back. - AuthN: existing key without `invoke:alarm-ack` scope rejected. 
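
The scope gate described for PR A.3 amounts to a lookup from RPC name to required scope. The following Python sketch is illustrative only: `REQUIRED_SCOPE`, `authorize`, and the `StatusCode` enum are hypothetical names, and the real gateway performs this check in its existing .NET API-key pipeline. The `invoke:alarm-query` entry is the scope PR A.4 introduces.

```python
from enum import Enum
from typing import AbstractSet


class StatusCode(Enum):
    OK = "OK"
    PERMISSION_DENIED = "PERMISSION_DENIED"


# Hypothetical table: alarm RPC name -> scope the API key must carry.
REQUIRED_SCOPE = {
    "AcknowledgeAlarm": "invoke:alarm-ack",
    "QueryActiveAlarms": "invoke:alarm-query",
}


def authorize(rpc: str, key_scopes: AbstractSet[str]) -> StatusCode:
    """Reject an alarm RPC when the key lacks its dedicated scope."""
    required = REQUIRED_SCOPE.get(rpc)
    if required is None:
        # Other RPCs keep their existing checks, which are not modelled here.
        return StatusCode.OK
    if required not in key_scopes:
        return StatusCode.PERMISSION_DENIED
    return StatusCode.OK
```

Keeping ack and query as separate scopes means a key that already holds `invoke:write` still cannot acknowledge alarms until `invoke:alarm-ack` is granted explicitly.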
### PR A.4 — gateway: ConditionRefresh snapshot via QueryActiveAlarms **Files** (`src\MxGateway.Server\`, `src\MxGateway.Worker\`): 1. Worker exposes a `QueryActiveAlarmsCommand` that walks the session's active-alarm collection and streams snapshots back through the existing command-reply channel. The MxAccess Toolkit's `Alarms.GetActive()` (verify exact API name during PR) is the underlying call. 2. Gateway RPC `QueryActiveAlarms` opens a server-streaming reply, batches snapshots through. 3. AuthN — new scope `invoke:alarm-query` (separate from ack so a read-only client can refresh without ack rights). **Tests:** - Worker-test: synthetic active set of 0 / 1 / 100 alarms; assert pagination respects worker channel capacity. - Integration: against the parity rig, assert a ConditionRefresh after reconnect returns every alarm currently `Active` or `ActiveAcked` in the Galaxy. **Sequencing within Track A:** A.1 → A.2 → A.3 → A.4. A.1 is mechanical; A.2 + A.3 are the load-bearing changes that unlock lmxopcua side. A.4 can ship after lmxopcua starts consuming A.3 output. The historian-write capability moved to **Track C** below — the gateway intentionally stays out of `aahClientManaged`. ## Track B — lmxopcua changes All five PRs land in `c:\Users\dohertj2\Desktop\lmxopcua\`. Each B-PR depends on a specific A-PR — see the sequencing matrix below. ### PR B.1 — EventPump: dispatch OnAlarmTransition family **Depends on:** A.1 (proto), A.3 (gateway dispatching the new family). **Files:** - `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\EventPump.cs:160` — current `Dispatch(MxEvent ev)` returns early for any non-`OnDataChange` family. 
  Add a branch:
  ```csharp
  switch (ev.Family)
  {
      case MxEventFamily.OnDataChange:
          DispatchDataChange(ev);
          break;
      case MxEventFamily.OnAlarmTransition:
          DispatchAlarmTransition(ev);
          break;
      default:
          return;
  }
  ```
- New `DispatchAlarmTransition` translates the proto event into an
  `AlarmEventArgs` (existing type from `Core.Abstractions`) and raises
  an internal event the driver subscribes to.
- New `MxAccessSeverityMapper` in `Driver.Galaxy\Runtime\` — maps the
  MxAccess raw severity into the `AlarmSeverity` enum + the OPC UA
  numeric severity (250 / 500 / 700 / 900 ladder per v1's
  `AlarmTracking.md`).

**Tests** (`tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests\Runtime\`):

- `EventPumpAlarmTests` — feed three synthetic MxEvents (raise / ack /
  clear); assert each fires `OnAlarmEvent` on the driver with correct
  payload.
- Severity-mapping table tests — every documented MxAccess severity
  level → expected (`AlarmSeverity`, OPC UA numeric) tuple.

### PR B.2 — GalaxyDriver re-implements IAlarmSource

**Depends on:** A.3 (`AcknowledgeAlarm` RPC available), B.1 (event dispatch).

**Files:**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriver.cs:28` — extend the
  class declaration:
  ```csharp
  public sealed class GalaxyDriver
      : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable,
        IRediscoverable, IHostConnectivityProbe, IAlarmSource, IDisposable
  ```
- Implement the four `IAlarmSource` members:
  - `SubscribeAlarmsAsync` — no-op returning a sentinel handle. The
    driver is already subscribed for data; alarm events arrive on the
    same event stream once the gateway emits the new family. (Same
    pattern AbCip uses today — see `Driver.AbCip\AbCipDriver.cs:208`.)
  - `UnsubscribeAlarmsAsync` — no-op.
  - `OnAlarmEvent` — wired to the EventPump branch added in B.1.
  - `AcknowledgeAsync` — calls the new gateway RPC via the
    `IGalaxyAlarmAcknowledger` abstraction (new file, mirrors the
    `IGalaxyDataWriter` pattern), with `GatewayGalaxyAlarmAcknowledger`
    as the production implementation in `Runtime\`.
Resilience wrapping via `AlarmSurfaceInvoker` per existing pattern. - `DriverInstanceFactory` for Galaxy registers `IGalaxyAlarmAcknowledger` alongside the existing data writer. **Tests:** - Subscribe-noop returns a non-null handle; unsubscribe accepts it. - Acknowledge — fake `IGalaxyAlarmAcknowledger` records the call; assert the request shape and resilience-pipeline routing. - End-to-end test in `Driver.Galaxy.Tests` — fake gateway emits a raise-then-ack event sequence; assert the driver fires `OnAlarmEvent` twice with matching alarm-id correlation. ### PR B.3 — DriverNodeManager: route to driver-native when present **Depends on:** B.2. **Files:** - `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs` — when registering an `AlarmConditionState` for a Galaxy variable, check whether the driver is `IAlarmSource`. If yes, prefer the `OnAlarmEvent`-driven path; the value-driven sub-attribute path becomes the secondary path that handles transitions the driver-native stream missed (network blip, gateway restart, gw missing the `$Alarm*` extension on this template). - `Server\Alarms\AlarmConditionService` — already accepts events from multiple sources; only addition is a `DriverEventOrigin` enum on internal transitions so the dedup logic prefers the richer driver-native record over a stale sub-attribute synthesis. - `IAlarmAcknowledger` resolution in `DriverNodeManager` — prefer the driver's `IAlarmSource.AcknowledgeAsync` over `DriverWritableAcknowledger` when both are available. Keep `DriverWritableAcknowledger` as the fallback for templates without `$Alarm*` extensions. **Tests:** - Two-source-fan-in test: same alarm condition receives both a driver-native ack event and a sub-attribute value update for the same transition; assert no duplicate Part 9 transition fires. - Acknowledger routing — driver implements `IAlarmSource` → ack-via-RPC; driver implements only `IWritable` → ack-via-write (existing path). 
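
The two-source fan-in test above implies a dedup rule that can be sketched as follows. This Python model is a guess at the shape, not the `AlarmConditionService` implementation: `TransitionDeduper`, `offer`, and `latest` are hypothetical names, and it assumes "the same transition" means the same condition and the same transition kind as the last one recorded.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class DriverEventOrigin(IntEnum):
    SUB_ATTRIBUTE = 0  # synthesized from sub-attribute value updates
    DRIVER_NATIVE = 1  # rich event from IAlarmSource.OnAlarmEvent


class Kind(Enum):
    RAISE = "raise"
    ACKNOWLEDGE = "acknowledge"
    CLEAR = "clear"


@dataclass(frozen=True)
class Transition:
    condition_id: str
    kind: Kind
    origin: DriverEventOrigin
    operator_comment: Optional[str] = None


class TransitionDeduper:
    def __init__(self) -> None:
        self._last = {}   # condition_id -> last recorded Transition
        self.fired = []   # transitions that produced a Part 9 event

    def offer(self, t: Transition) -> bool:
        """Return True when a new Part 9 transition fires for this event."""
        last = self._last.get(t.condition_id)
        if last is not None and last.kind == t.kind:
            # Same transition seen via the other path: never fire twice,
            # but keep the richer record when the newcomer outranks it.
            if t.origin > last.origin:
                self._last[t.condition_id] = t
            return False
        self._last[t.condition_id] = t
        self.fired.append(t)
        return True

    def latest(self, condition_id: str) -> Optional[Transition]:
        return self._last.get(condition_id)
```

If the sub-attribute ack arrives first and the driver-native ack follows, only one transition fires, and the stored record ends up carrying the operator comment from the native event.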
### PR B.4 — IAlarmHistorianWriter via the historian sidecar IPC **Depends on:** C.2 (sidecar wires its `IAlarmEventWriter`). See Track C for the sidecar-side work; B.4 is the lmxopcua-side consumer. **Files:** - New `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client\SidecarAlarmHistorianWriter.cs` implementing `IAlarmHistorianWriter`. Sends batches over the existing named-pipe IPC using the **already-defined** `WriteAlarmEventsRequest` / `WriteAlarmEventsReply` contracts at `Ipc\Contracts.cs:153`. No protocol changes — the slot is wired today on the contract side; only the production behaviour and the consumer on this side need to land. - `Server\Phase7\Phase7Composer.ResolveHistorianSink` — already scans for registered `IAlarmHistorianWriter` instances. Register the new sidecar-backed writer at server bootstrap when the historian sidecar is enabled (`appsettings.json` `Historian:Wonderware:Enabled = true`). `SqliteStoreAndForwardSink` then boots with a real writer attached and the `NullAlarmHistorianSink` fallback no longer applies on installs that have the sidecar deployed. **Tests:** - `SidecarAlarmHistorianWriter` against a fake `PipeServer` — single record, batch, per-row failure modes (Ack / RetryPlease / PermanentFail) mapped from the sidecar's `PerEventOk[]` reply. - `Phase7Composer` end-to-end — start the server with the historian sidecar enabled; assert `ResolveHistorianSink` picks `SqliteStoreAndForwardSink` with the new sidecar writer attached. **Note on producer scope:** This path historizes **non-Galaxy alarms only.** Galaxy-native alarms (with `$Alarm*` extensions) reach AVEVA Historian directly via System Platform's `HistorizeToAveva` toggle on the alarm primitive, with no involvement from us. Today the only live producer feeding `SqliteStoreAndForwardSink` is `Phase7EngineComposer.RouteToHistorianAsync` for scripted alarms; future producers (AB CIP ALMD, FOCAS CNC alarms if a customer wants unified storage) plug into the same path. 
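
One drain pass over the store-and-forward queue, with the three per-row outcomes named above, might look like the sketch below. It is a Python model under stated assumptions, not the `SqliteStoreAndForwardSink` code: `drain_once` is a hypothetical name, the queue is a plain list rather than SQLite, and routing `PermanentFail` rows to a dead-letter list is an assumption the plan does not spell out.

```python
from enum import Enum
from typing import Callable, List, Sequence


class HistorianWriteOutcome(Enum):
    ACK = "Ack"
    RETRY_PLEASE = "RetryPlease"
    PERMANENT_FAIL = "PermanentFail"


WriteBatch = Callable[[Sequence[dict]], Sequence[HistorianWriteOutcome]]


def drain_once(queue: List[dict], write_batch: WriteBatch,
               dead_letters: List[dict]) -> int:
    """Send every queued event once; return how many were acknowledged."""
    if not queue:
        return 0
    batch = list(queue)
    outcomes = write_batch(batch)
    if len(outcomes) != len(batch):
        raise ValueError("writer must return one outcome per event, in input order")
    acked = 0
    retry: List[dict] = []
    for event, outcome in zip(batch, outcomes):
        if outcome is HistorianWriteOutcome.ACK:
            acked += 1                  # delivered: leaves the queue
        elif outcome is HistorianWriteOutcome.RETRY_PLEASE:
            retry.append(event)         # stays queued for the next pass
        else:
            dead_letters.append(event)  # assumption: parked, never retried
    queue[:] = retry
    return acked
```

A writer that answers `RetryPlease` for every row, as the C.1 scaffold currently does, leaves the queue unchanged on every pass, which matches the "repeatedly retries" effect noted in the status banner.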
### PR B.5 — docs + memory housekeeping **Depends on:** B.1 / B.2 / B.3 / B.4 all green on the parity rig + D.1 (deployment refresh) verified on the dev rig. **Files:** - `docs\drivers\Galaxy.md` — current text says the driver implements five capability interfaces; update to seven (`IAlarmSource`, `IAlarmHistorianWriter`-via-companion). - `docs\AlarmTracking.md` — promote a fresh top-level doc that describes the v2-final architecture (driver-native primary path + sub-attribute fallback + scripted-alarm aggregation). Cross-link from `docs\README.md`. The v1 archive stays as historical record. - `docs\v1\AlarmTracking.md` — extend the existing historical banner with "Restored to functional parity in this epic — see `docs\AlarmTracking.md` for current state." - Memory entries (`C:\Users\dohertj2\.claude\projects\…\memory\`): - Update `project_galaxy_via_mxgateway.md` — add the alarm path restoration. - Update `project_server_history_alarm_subsystems.md` — note that `Phase7Composer.ResolveHistorianSink` now finds a writer on Galaxy installs. - `docs\plans\alarms-over-gateway.md` (this file) — banner the doc `✅ Completed YYYY-MM-DD — historical record.` matching the existing v2-mxgw plan retirement convention. ## Track C — historian sidecar wires the dormant write path The Wonderware historian sidecar at `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\` is a separately deployable Windows service (NSSM-wrapped) that already loads `aahClientManaged` x64 and serves a named-pipe IPC for read operations. The `WriteAlarmEvents` IPC slot is defined but unwired (`Program.cs:57` constructs `HistorianFrameHandler` without an `alarmWriter`). Track C completes that slot. Two PRs in the sidecar + one consumer-side PR (B.4) in lmxopcua finishes the path. ### PR C.1 — sidecar: AahClientManagedAlarmEventWriter **Files** (`src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Backend\`): 1. 
New `AahClientManagedAlarmEventWriter.cs` implementing the existing `IAlarmEventWriter` interface (defined in `Ipc\HistorianFrameHandler.cs:242`). 2. Implementation calls `aahClientManaged`'s alarm-event write API — the same path v1's `GalaxyHistorianWriter` used. Use the existing `HistorianClusterEndpointPicker` for multi-node routing so write failures fail over the same way reads do. 3. Batch size + retry behaviour mirrors v1's `GalaxyHistorianWriter` per-row outcome reporting (`HistorianWriteOutcome` enum: Ack / PermanentFail / RetryPlease). Map MxStatus codes onto outcomes. 4. Reuses `HistorianDataSource`'s existing connection-pool / health gating — no new TCP work needed; the same session that serves reads can issue writes too. **Tests** (`tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\`): - Outcome-mapping table: every documented MxStatus on alarm-write → expected `HistorianWriteOutcome`. - Batching: 1 / 100 / 1000 events through a fake `aahClientManaged` writer; assert per-row outcome list parallel to input order. - Cluster failover: primary node returns `BadCommunicationError`; picker rotates to secondary; assert eventual success. ### PR C.2 — sidecar: wire IAlarmEventWriter into Program.cs **Files** (`src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Program.cs`): 1. Build an `AahClientManagedAlarmEventWriter` next to the existing `BuildHistorian()` call. 2. Pass it to `HistorianFrameHandler` (currently constructed at line 57 without an `alarmWriter`). The dispatcher already routes `WriteAlarmEventsRequest` through `_alarmWriter` when non-null (`HistorianFrameHandler.cs:158-172`); supplying it makes the slot functional. 3. Gate behind a new env var `OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED` (default `true` when `OTOPCUA_HISTORIAN_ENABLED=true`). Lets a read-only deployment skip the writer registration if needed. 4. Update `Install-Services.ps1` install-time env block in lmxopcua's `scripts\install\` to include the new toggle. 
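
The gating rule in step 3 can be written down as a small truth function. This is a Python sketch rather than the sidecar's `Program.cs`: `alarm_write_enabled` and `_parse_bool` are hypothetical helpers, the accepted truthy spellings are an assumption, and so is the choice that a disabled historian forces the writer off even when the alarm toggle is set.

```python
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings, not from the plan


def _parse_bool(raw: Optional[str], default: bool) -> bool:
    """Interpret an env-var string, falling back when it is unset or blank."""
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in _TRUTHY


def alarm_write_enabled(env: Mapping[str, str]) -> bool:
    """Decide whether the sidecar should register the alarm writer."""
    historian_on = _parse_bool(env.get("OTOPCUA_HISTORIAN_ENABLED"), False)
    if not historian_on:
        # Assumption: no historian means no writer, whatever the toggle says.
        return False
    # Defaults to true once the historian itself is enabled.
    return _parse_bool(env.get("OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED"), True)
```

Passing the environment in as a mapping keeps the decision testable without touching process state, which is the same seam the `Program.cs` unit test in this PR relies on.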
**Tests:** - `Program.cs` unit-test seam: assert handler is constructed with alarm writer when enabled and without when disabled. - Live integration (parity rig): write a synthetic alarm event through the IPC; query it back via `ReadEvents`; assert round-trip fidelity. ### Sequencing within Track C: C.1 → C.2. C.2's lmxopcua-side consumer is **PR B.4 in Track B**, which depends on C.2 being deployed. ## Track E — client surface refresh Two surfaces become user-visible when the alarm path lights up: the **mxaccessgw client SDKs** (5 languages, each with its own CLI) that consume the new `OnAlarmTransition` event family + `AcknowledgeAlarm` / `QueryActiveAlarms` RPCs directly, and the **lmxopcua OPC UA-facing clients** (Client.CLI, Client.UI) that consume the richer Part 9 condition payload through the OPC UA server. Both need updates so the new fields actually reach end-users; without Track E, the data arrives at the gateway / OPC UA server but the off-the-shelf clients display the same five columns they did under v2-pre-this-epic. Track E is split per-language so each PR stays small and reviewable. PRs E.2 through E.6 are independent — they share only the proto regen from E.1 — and can land in parallel by whoever owns each language binding. ### PR E.1 — regenerate proto across all client SDKs **Depends on:** A.1 merged (proto change live). **Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\clients\`): 1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; just rebuild `MxGateway.Client.csproj` after pulling A.1. 2. **Python** — run `clients\python\generate-proto.ps1`; commit the regenerated `_pb2.py` + `_pb2_grpc.py` files under `clients\python\src\`. 3. **Go** — run `clients\go\generate-proto.ps1`; commit the regenerated `*.pb.go` + `*_grpc.pb.go` files under `clients\go\mxgateway\`. 4. **Java** — Gradle's `protobuf-gradle-plugin` regenerates on `gradle build`; verify the new types appear in the build output. 
   Commit any pinned generated source under
   `clients\java\mxgateway-client\src\main\java\` if that's the
   convention (check `JavaClientDesign.md`).
5. **Rust** — `build.rs` runs `tonic-build` on the proto; just
   `cargo build`. Generated code lives under `clients\rust\target\`
   (gitignored) — nothing to commit; verify the new types compile.

No hand-written code in this PR. Pure regen + commit of generated
artifacts. Per-language pre-existing proto-regen tests in each client's
test suite must stay green.

### PR E.2 — .NET client SDK + CLI

**Depends on:** E.1, A.3 (gateway alarm dispatch + ack RPC live).

**Files** (`clients\dotnet\MxGateway.Client\` + `MxGateway.Client.Cli\`):

1. `MxGatewayClient.cs` — new public methods:
   ```csharp
   IAsyncEnumerable<AlarmTransition> SubscribeAlarmsAsync(
       MxGatewaySession session,
       AlarmFilter? filter = null,
       CancellationToken ct = default);

   Task<MxStatus> AcknowledgeAlarmAsync(
       MxGatewaySession session,
       string alarmFullReference,
       string comment,
       string userPrincipal,
       CancellationToken ct = default);

   IAsyncEnumerable<ActiveAlarmSnapshot> QueryActiveAlarmsAsync(
       MxGatewaySession session,
       string? filterPrefix = null,
       CancellationToken ct = default);
   ```
   Existing `MxGatewayClientRetryPolicy` covers the new operations
   without bespoke retry config.
2. `MxGateway.Client.Cli` — add `alarms` verb with subcommands:
   `subscribe` (streams transitions until cancelled),
   `acknowledge --ref <ref> --comment "<comment>"`,
   `query-active [--prefix <prefix>]`. Output formatting mirrors the
   existing `events stream` verb (default human-readable + `--json`
   flag for machine output).
3. AuthN — `MxGatewayClientOptions` validates that the new scopes
   `invoke:alarm-ack` / `invoke:alarm-query` exist on the API key when
   those operations are invoked; the pre-flight check fails fast with a
   clear error rather than letting the gateway return
   `PERMISSION_DENIED` mid-stream.
**Tests** (`clients\dotnet\MxGateway.Client.Tests\`):

- `FakeGatewayTransport` extended to emit `OnAlarmTransition` events;
  assert `SubscribeAlarmsAsync` yields each as the right payload shape.
- Ack: assert request shape, retry policy, and error wrapping
  (Unauthenticated → `MxGatewayAuthenticationException`,
  PermissionDenied → `MxGatewayAuthorizationException`,
  ResourceExhausted → `MxGatewayException` with the right message).
- CLI verb tests in `MxGatewayClientCliTests.cs` — argument parsing,
  JSON output shape, exit codes.

### PR E.3 — Python client SDK + CLI

**Depends on:** E.1.

**Files** (`clients\python\src\mxgateway\` + the existing CLI entry
point — verify the exact name during PR; `PythonClientDesign.md`
documents it):

1. New module `alarms.py` exposing async helpers:
   ```python
   async def subscribe_alarms(session, *, filter=None) -> AsyncIterator[AlarmTransition]: ...
   async def acknowledge_alarm(session, *, alarm_ref, comment, user) -> MxStatus: ...
   async def query_active_alarms(session, *, prefix=None) -> AsyncIterator[ActiveAlarmSnapshot]: ...
   ```
2. CLI: add `alarms subscribe / acknowledge / query-active` verbs. Use
   the same JSON output schema as E.2's CLI so cross-language tooling
   can parse either.
3. Type stubs (`*.pyi`) updated for the new types.

**Tests** (`clients\python\tests\`):

- pytest-asyncio fixtures using a stub gRPC server; assert each
  helper's request/response shape.
- CLI smoke via `subprocess` + captured stdout JSON comparison.

### PR E.4 — Go client SDK + CLI

**Depends on:** E.1.

**Files** (`clients\go\mxgateway\` + `clients\go\cmd\`):

1. New `alarms.go` exposing:
   ```go
   func (c *Client) SubscribeAlarms(ctx context.Context, opts ...SubscribeOption) (<-chan AlarmTransition, error)
   func (c *Client) AcknowledgeAlarm(ctx context.Context, ref, comment, user string) (MxStatus, error)
   func (c *Client) QueryActiveAlarms(ctx context.Context, prefix string) ([]ActiveAlarmSnapshot, error)
   ```
2. CLI: add `alarms` subcommand under `clients\go\cmd\mxgateway-cli\`
   (verify the binary name in `GoClientDesign.md`). Same verb shape as
   E.2 / E.3.
3. Errors wrap named sentinels (`ErrAuthFailed`, `ErrPermissionDenied`,
   etc.) so callers can distinguish failure modes programmatically
   with `errors.Is`.

**Tests:** standard Go table-driven tests against a stub gRPC server
under `clients\go\internal\testserver\`.

### PR E.5 — Java client SDK + CLI

**Depends on:** E.1.

**Files** (`clients\java\mxgateway-client\src\main\java\` +
`clients\java\mxgateway-cli\`):

1. New methods on the existing client class (verify in
   `JavaClientDesign.md`):
   ```java
   // Type arguments follow the Python and Go signatures in E.3 / E.4;
   // confirm against the regenerated proto types during PR.
   Flowable<AlarmTransition> subscribeAlarms(Session s, AlarmFilter filter);
   Single<MxStatus> acknowledgeAlarm(Session s, String alarmRef, String comment, String user);
   Flowable<ActiveAlarmSnapshot> queryActiveAlarms(Session s, String prefix);
   ```
   (RxJava idiom matching the existing data-change subscription API;
   if the existing API uses `CompletableFuture` instead, follow that
   convention — verify during PR.)
2. CLI: same `alarms subscribe / acknowledge / query-active` verbs.

**Tests:** JUnit 5 + a stub gRPC server. CLI tested via
`ProcessBuilder` exec + JSON output comparison.

### PR E.6 — Rust client SDK

**Depends on:** E.1.

**Files** (`clients\rust\crates\mxgateway-client\src\` + likely a
`mxgateway-cli` crate — verify in `RustClientDesign.md`):

1. New methods on the client struct:
   ```rust
   // `Error` stands for the crate's existing error type; confirm its
   // name during PR. Item types follow the E.3 / E.4 signatures.
   pub fn subscribe_alarms(&self, filter: Option<AlarmFilter>)
       -> impl Stream<Item = Result<AlarmTransition, Error>>;
   pub async fn acknowledge_alarm(&self, alarm_ref: &str, comment: &str, user: &str)
       -> Result<MxStatus, Error>;
   pub fn query_active_alarms(&self, prefix: Option<&str>)
       -> impl Stream<Item = Result<ActiveAlarmSnapshot, Error>>;
   ```
2. CLI: same verb shape.
3. `thiserror`-based error enum extended with
   `AlarmAckPermissionDenied` etc. variants if the existing pattern
   uses one.

**Tests:** `tokio::test` against a stub gRPC server built on the
`tonic-build`-generated server trait. CLI tested via `assert_cmd`.
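The E.2 test list and the error items in E.4 and E.6 all describe one idea: translate the gateway's gRPC status into a typed client-side error that callers can branch on. A minimal sketch of that mapping follows. It uses plain status-name strings instead of the `grpc` package so it runs standalone, and the Python class names are hypothetical mirrors of the .NET exception names in E.2, not confirmed names in any SDK.

```python
class MxGatewayError(Exception):
    """Base error; also used for statuses with no dedicated subclass."""

    def __init__(self, status: str, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


class MxGatewayAuthenticationError(MxGatewayError):
    """Mirrors Unauthenticated in the E.2 test list."""


class MxGatewayAuthorizationError(MxGatewayError):
    """Mirrors PermissionDenied in the E.2 test list."""


# Statuses that get a dedicated error type; everything else
# (for example RESOURCE_EXHAUSTED) falls back to the base class.
_STATUS_TO_ERROR = {
    "UNAUTHENTICATED": MxGatewayAuthenticationError,
    "PERMISSION_DENIED": MxGatewayAuthorizationError,
}


def wrap_status(status: str, message: str) -> MxGatewayError:
    """Build the typed error for a failed RPC status."""
    error_type = _STATUS_TO_ERROR.get(status, MxGatewayError)
    return error_type(status, message)
```

Because the dedicated types subclass the base error, a caller that only catches `MxGatewayError` still sees every failure, while a caller that cares about authorization can catch the narrower type. That is the same property the Go sentinels and the Rust enum variants are meant to provide.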
### PR E.7 — lmxopcua OPC UA-facing client refresh

**Depends on:** B.2 + B.3 (server-side payload final on the OPC UA
wire). Independent of E.2-E.6 — different consumer surface (OPC UA
Part 9, not gateway gRPC).

**Files** (`c:\Users\dohertj2\Desktop\lmxopcua\src\`):

1. `Core.Abstractions\AlarmEventArgs.cs` *(extend, not new)* — add
   optional fields the new path surfaces:
   - `OperatorComment` (nullable string — populated by the native ack
     path; null on sub-attribute fallback path)
   - `OriginalRaiseTimestampUtc` (nullable; null on fallback path)
   - `AlarmCategory` (nullable string)
   - `AlarmTypeName` (already exists per v1 docs — leave alone)
2. `Server\OpcUa\DriverNodeManager.cs` — populate the corresponding
   OPC UA Part 9 condition fields when the new payload is non-null:
   `Comment` (from OperatorComment), `Time` (from
   OriginalRaiseTimestampUtc when present, else event arrival time),
   `ConditionClassName` (from AlarmCategory if mapping is defined).
3. `Client.Shared\Models\AlarmEventArgs.cs` — mirror the new fields on
   the client-side DTO.
4. `Client.CLI\Commands\AlarmsCommand.cs` — add columns under a new
   `--verbose` flag, plus full payload under `--json`. Default output
   stays five-column compatible.
5. `Client.UI\ViewModels\AlarmEventViewModel.cs` — bind the new
   fields. Add columns to `Views\AlarmsView.axaml` (collapsible under
   a "Show details" toggle so the default view stays compact).
   Surface `OperatorComment` in `AckAlarmWindow.axaml` as a
   prepopulated default when re-acknowledging an already-acked alarm.
6. `docs\Client.CLI.md` — add the new `--verbose` and `--json` flag
   examples to the alarms section.
7. `docs\Client.UI.md` — add a screenshot or description of the
   "Show details" expansion behavior.
8. `docs\reqs\ClientRequirements.md` — lines 116 and 153 reference the
   alarm subscription contract; extend the field list to cover the
   new payload.
9. `docs\AlarmTracking.md` (new in B.5) — wire in client-side
   examples.
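The field-population rule in item 2 has one fallback (`Time`) and two conditional fields, which a short sketch makes concrete. This is Python rather than the server's C#, and it is only an illustration: `AlarmEvent`, `to_condition_fields`, and the `category_map` argument are hypothetical stand-ins, not the real `AlarmEventArgs` or `DriverNodeManager` code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class AlarmEvent:
    """Hypothetical stand-in for AlarmEventArgs with the new optional fields."""

    arrival_utc: datetime
    operator_comment: Optional[str] = None
    original_raise_utc: Optional[datetime] = None
    alarm_category: Optional[str] = None


def to_condition_fields(event: AlarmEvent, category_map: Dict[str, str]) -> dict:
    """Map an alarm event onto the Part 9 condition fields named in item 2."""
    fields = {}
    # Time: prefer the original raise time, else fall back to arrival time.
    if event.original_raise_utc is not None:
        fields["Time"] = event.original_raise_utc
    else:
        fields["Time"] = event.arrival_utc
    # Comment: only set when the native ack path supplied one.
    if event.operator_comment is not None:
        fields["Comment"] = event.operator_comment
    # ConditionClassName: only set when a mapping exists for the category.
    if event.alarm_category is not None and event.alarm_category in category_map:
        fields["ConditionClassName"] = category_map[event.alarm_category]
    return fields
```

On the sub-attribute fallback path all three new inputs are null, so the result carries only `Time` (the arrival time), which matches the current five-column behaviour.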
**Tests:**

- `Client.Shared.Tests` — DTO round-trip through the alarm event pump
  with all fields populated and all-null cases.
- `Client.CLI.Tests` — `--verbose` column ordering, `--json` schema
  validation, default output stays five-column.
- `Client.UI.Tests` — `AlarmEventViewModel` bindings exposed,
  collapsible-detail toggle behavior.

### Sequencing within Track E:

E.1 first (mechanical). E.2-E.7 can land in parallel. E.7 has its own
dependency chain inside lmxopcua (B.2 + B.3) and doesn't gate any
other E PR. The .NET client (E.2) is the only language SDK
**lmxopcua** consumes today; if the gateway repo's release schedule
prefers landing E.2 first and shipping E.3-E.6 in a follow-up release,
that's a valid sequence — the customer-facing constraint is "at least
one language SDK ships at the same time as A.4 lights up the gateway
dispatch."

## Track D — deployment refresh

The dev box at `DESKTOP-6JL3KKO` runs three live services from
`C:\publish\` (installed in the session that produced commit
`ea04547`'s install scripts). Once Tracks A / B / C are merged, the
deployed binaries need to be refreshed so the running services pick
up the new alarm path. Track D is one PR — pure ops, no code change.

### PR D.1 — refresh C:\publish + restart services

**Depends on:** A.4 + B.4 + C.2 + E.2 merged (every code-change PR on
the deploy path landed; E.2 is included because the deployed
`MxGateway.Client.dll` consumed by GalaxyDriver needs the new
methods).

**Order matters** — services must stop in reverse-dependency order
(`OtOpcUa` → `OtOpcUaWonderwareHistorian` → `MxAccessGw`) and start
in forward-dependency order (`MxAccessGw` →
`OtOpcUaWonderwareHistorian` → `OtOpcUa`). Touching binaries while a
dependent service holds them locked produces the publish-time
`MSB3027` file-lock error caught during the original install (see
commit `80104ca`).

**Steps (run as a single PowerShell session on the deploy host):**

1. **Stop in reverse order**:
   ```powershell
   nssm stop OtOpcUa
   nssm stop OtOpcUaWonderwareHistorian
   nssm stop MxAccessGw
   Start-Sleep -Seconds 3
   Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Server, `
       OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
       Stop-Process -Force
   ```

2. **Refresh mxaccessgw binaries** (Track A output):
   ```powershell
   $gwSrc = "C:\Users\dohertj2\Desktop\mxaccessgw"
   dotnet build "$gwSrc\src\MxGateway.Worker" -c Release
   dotnet build "$gwSrc\src\MxGateway.Server" -c Release
   Copy-Item -Recurse -Force `
       "$gwSrc\src\MxGateway.Server\bin\Release\net10.0\*" `
       "C:\publish\mxaccessgw\Server\"
   Copy-Item -Recurse -Force `
       "$gwSrc\src\MxGateway.Worker\bin\x86\Release\net48\*" `
       "C:\publish\mxaccessgw\Worker\"
   ```

3. **Refresh OtOpcUa + historian sidecar binaries** (Tracks B + C
   output):
   ```powershell
   $repo = "C:\Users\dohertj2\Desktop\lmxopcua"
   dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Server" `
       -c Release -o "C:\publish\lmxopcua"
   dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware" `
       -c Release -o "C:\publish\lmxopcua\WonderwareHistorian"
   ```

4. **Update service env block if Track C added the new toggle**:
   ```powershell
   # Pull existing env, append OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true
   # (default-on per C.2 design, but explicit assignment lets us flip false
   # for read-only deployments without re-installing)
   nssm set OtOpcUaWonderwareHistorian AppEnvironmentExtra `
       (((nssm get OtOpcUaWonderwareHistorian AppEnvironmentExtra) `
       + "`r`nOTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true"))
   ```

5. **Start in forward order**:
   ```powershell
   nssm start MxAccessGw
   Start-Sleep -Seconds 4
   nssm start OtOpcUaWonderwareHistorian
   Start-Sleep -Seconds 4
   nssm start OtOpcUa
   Start-Sleep -Seconds 8
   ```

6. **Smoke verification:**
   ```powershell
   foreach ($s in 'MxAccessGw','OtOpcUaWonderwareHistorian','OtOpcUa') {
       (Get-Service $s).Status
   }
   foreach ($p in 5120, 4840, 4841) {
       Get-NetTCPConnection -LocalPort $p -State Listen `
           -ErrorAction SilentlyContinue
   }
   Get-Content "C:\publish\lmxopcua\logs\otopcua-*.log" -Tail 20
   Get-Content "C:\publish\mxaccessgw\stdout.log" -Tail 20
   Get-Content "C:\ProgramData\OtOpcUa\historian-wonderware-*.log" -Tail 10
   ```
   Pass criterion: all three services `Running`; ports 5120 + 4840
   listening; sidecar log shows `Wonderware historian sidecar serving
   — pipe=OtOpcUaWonderwareHistorian`; OtOpcUa log shows `OPC UA
   server started — endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa` and a
   new line `IAlarmHistorianWriter resolved: Sidecar` (added in B.4).

7. **Functional verification — fire one alarm of each kind and assert
   it propagates:**
   - **Galaxy-native** — raise the `OtOpcUaParityTest_001.Counter`
     `$Alarm*` extension via Galaxy's alarm-fire mechanism; assert an
     OPC UA Part 9 transition reaches a connected `otopcua-cli alarms`
     subscriber with rich payload (operator-comment field non-null,
     original-raise-timestamp present). This validates Track A + B.1 +
     B.2 + B.3.
   - **Scripted** — author a one-line scripted alarm in the Admin UI
     against any always-true predicate; assert the transition lands
     in AVEVA Historian via `aaHistClientTrend` query (or
     `Driver.Historian.Wonderware.IntegrationTests` with a query for
     the alarm event). Validates Track C + B.4.
   - **Sub-attribute fallback** — disable `IAlarmSource` on the
     GalaxyDriver via the test seam (B.3 will introduce one); fire an
     alarm; assert Part 9 transition still raised by the value-driven
     path. Validates the fallback wasn't broken.

**Files:**

- `scripts\install\Refresh-Services.ps1` *(new — automates the above)*
- `docs\v2\dev-environment.md` — add the refresh script to the dev
  workflow section.
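The ordering rule behind steps 1 and 5 is that the stop order is the start order reversed, with the binary refresh in between. A minimal sketch of that rule, in Python for illustration only: `refresh_plan` is a hypothetical helper and does not describe the contents of `Refresh-Services.ps1`; the service names are the ones listed in this plan.

```python
from typing import List, Tuple

# Forward-dependency (start) order, as stated in the plan.
START_ORDER = ["MxAccessGw", "OtOpcUaWonderwareHistorian", "OtOpcUa"]


def refresh_plan(start_order: List[str]) -> List[Tuple[str, str]]:
    """Return (action, target) steps for a refresh.

    Dependents stop first (reverse order), binaries are refreshed while
    nothing holds them locked, then dependencies start first.
    """
    stops = [("stop", name) for name in reversed(start_order)]
    starts = [("start", name) for name in start_order]
    return stops + [("refresh", "C:\\publish")] + starts


if __name__ == "__main__":
    for action, target in refresh_plan(START_ORDER):
        print(f"{action:8} {target}")
```

Deriving both sequences from one list means a service added later only has to be placed once, in start order, and the stop order follows automatically.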
**Tests:** smoke run on the dev rig (`DESKTOP-6JL3KKO`) producing
`docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` with the captured log
tails + smoke-test assertions. Captured artifact lands as part of the
PR.

**Rollback:** the refresh script keeps a timestamped backup of the
existing `C:\publish\mxaccessgw\` and `C:\publish\lmxopcua\` trees
before overwriting (mirrored to `C:\publish\.backup-YYYY-MM-DD\`).
Rollback is a stop / restore-from-backup / start sequence; no service
re-install needed since the NSSM service definitions don't change.

**Production deploy:** out of scope for D.1 — the dev rig is the only
deployment in scope at this point. A separate PR-or-runbook lands the
production refresh once the dev rig has soaked for the documented
duration (parity-rig validation gate; see "Test gates" below).

## Sequencing matrix

```
Track A (mxaccessgw)              Track B (lmxopcua)                 Track C (sidecar)           Track E (clients)
─────────────────────────         ─────────────────────────          ─────────────────────       ──────────────────────────
A.1 proto                         (waits)                            C.1 AahClientManagedWriter  E.1 proto regen ×5 langs
│                                                                    │                           │ (mechanical, after A.1)
├──────────────────────────►      B.1 EventPump branch               │                           │
A.2 worker subscription           │ uses proto types only            │                           │
│                                 │ unit-testable                    │                           │
│                                 │                                  C.2 Program.cs wires        │
A.3 gateway dispatch + ack RPC ──►B.2 GalaxyDriver : IAlarmSource    │                           ──►E.2 .NET SDK + CLI
│                                 │                                  │                           ──►E.3 Python SDK + CLI
│                              ──►B.3 DriverNodeManager routing      │                           ──►E.4 Go SDK + CLI
│                                                                    │                           ──►E.5 Java SDK + CLI
│                                                                    │                           ──►E.6 Rust SDK
A.4 ConditionRefresh              │                                  │
│                                 │                                  │
│                                 B.4 SidecarAlarmHistorianWriter    │
                                  (depends on C.2 deployed)          │
│                                 │                                  │
│                                 (B.2 + B.3 done) ────────────────────────────────────────────► E.7 lmxopcua client refresh
│                                                                                                │
▼                                                                                                │
Track D (deployment)                                                                             │
─────────────────────────                                                                        │
D.1 Refresh C:\publish + restart services                                                        │
(depends on A.4 + B.4 + C.2 + E.2 merged)                                                        │
▼                                                                                                │
                               ──►B.5 docs + memory + completion banner ◄─────────(E.7 done)─────┘
```

A.1 + B.1 + C.1 + E.1 can all land in parallel — none have cross-repo
runtime dependencies.
B.1's tests use proto types without needing a running gateway. C.1 is
purely sidecar-internal. E.1 is mechanical codegen.

The gateway-side dispatch (A.3) gates B.2 and E.2-E.6. The
sidecar-side wiring (C.2) gates B.4. E.7 gates on B.2 + B.3 only —
it's the OPC UA client surface, not the gateway client surface.

D.1 (deployment refresh) requires E.2 to also be merged because the
deployed `MxGateway.Client.dll` consumed by GalaxyDriver needs the new
methods. E.3-E.6 (other-language SDKs) don't gate D.1 — they ship on
their own release cadence.

B.5 (docs sweep) gates on D.1 + E.7 both merged — it's the final
"snapshot the as-shipped state" pass.

## Test gates

Per PR: unit tests pass + build green + analyzer clean (Roslyn rule
OTOPCUA0001 still requires every alarm-capability call to go through
`AlarmSurfaceInvoker`).

End-of-epic gate: re-run the parity rig
(`docs\v2\Galaxy.ParityRig.md`) with these scenarios added:

1. **Native alarm raise** — Galaxy `$Alarm*` raise with operator-time
   metadata appears as an OPC UA Part 9 transition with full payload
   (no longer reconstructed from sub-attribute writes).
2. **Native ack** — OPC UA client acks; assert the gateway records the
   ack against MxAccess directly (not via sub-attribute write);
   operator comment present in the resulting `Acknowledged`
   transition.
3. **ConditionRefresh after reconnect** — disconnect the GalaxyDriver,
   raise three alarms in Galaxy, reconnect; assert all three appear
   in the next ConditionRefresh.
4. **Historian write-back** — fire a scripted alarm; assert it arrives
   in AVEVA Historian via the gateway path (use the existing
   Historian sidecar's read API to query it back).
5. **Sub-attribute fallback still works** — disable `IAlarmSource` on
   the GalaxyDriver via test seam, fire a sub-attribute value change;
   assert Part 9 transition still raised.

Soak target: 24h × 1k tags (light) — same parity-rig harness but
extended to also subscribe to alarms.
Pass criterion: zero dropped alarm transitions, zero state-machine
inversions, zero unhandled exceptions in the AlarmSurfaceInvoker
pipeline.

## Risks and mitigations

| Risk | Mitigation |
|---|---|
| MxAccess Toolkit alarm subscription API differs across installed AVEVA versions | PR A.2 verifies against the worker-host's installed Toolkit version; documents the exact API used. Pin the worker DLL set per major MxAccess version if needed. |
| Worker-side alarm subscription leaks between sessions if cleanup is wrong | PR A.2 includes a session-recycle test that asserts no `IAlarmEventSink` instances remain registered after Close. |
| Gateway adds a new auth scope (`invoke:alarm-ack`); existing keys lack it | PR A.3 + A.5 ship with a one-time bootstrap migration: keys with `invoke:write` get the new scope auto-granted on the dev rig and parity rig. Production keys are reissued via `apikey rotate-key` (existing CLI). |
| Two simultaneous alarm sources (driver-native + sub-attribute) double-fire transitions | PR B.3 dedup is the load-bearing design. End-to-end test #1 covers it explicitly. |
| Historian write-back batch fails mid-batch — partial success | The existing `SqliteStoreAndForwardSink.HistorianWriteOutcome` per-row enum + dead-letter retention already handles this; PR A.5 just exposes the same outcome shape over gRPC. |
| Sidecar starts honouring the `WriteAlarmEvents` slot — old lmxopcua-side consumers can now reach a previously inert path | The slot returns `Success=false, Error="not configured"` today; flipping to live writes means a build that *speculatively* sent the frame would suddenly start producing real historian rows. Inventory of any such caller is empty — `WriteAlarmEvents` was never invoked from the lmxopcua side; `Phase7EngineComposer.RouteToHistorianAsync` queues into `SqliteStoreAndForwardSink` and the drain worker is gated on `IAlarmHistorianWriter` registration which only the new B.4 path provides. So enabling C.2 without B.4 is safe. |

## Roll-out

Track A lands first onto `mxaccessgw/main`, deployed to the parity
rig. Track B lands onto `lmxopcua/master` once A.3 is live on the rig
— earlier Track B PRs can target a feature branch
(`feat/alarms-over-gateway`) and merge to master after the rig is
fully green.

## Back-out

Each PR is individually revertable. The cleanest back-out point is at
the gateway-side enum extension: removing
`MX_EVENT_FAMILY_ON_ALARM_TRANSITION` from the proto means EventPump
silently drops alarm events again and GalaxyDriver's `OnAlarmEvent`
never fires — but the sub-attribute fallback path still produces
functional alarms, so the OPC UA surface degrades to v2-current
behaviour without breaking.

PR B.4 is the only one with a non-trivial back-out (re-add the deleted
sidecar IPC slot if revert needed); land B.4 last and only after
end-of-epic gate is green.

## Out of scope (explicit)

- **Other alarm sources beyond Galaxy.** AbCip / FOCAS / OpcUaClient
  drivers already implement `IAlarmSource`; they're untouched.
- **Modbus / S7 / AbLegacy / TwinCAT alarms.** None of those protocols
  has a native alarm bus. Alarms on those drivers, if needed, ship
  via the scripted-alarm path.
- **Multi-Galaxy ack routing.** Today's gateway model is one Galaxy
  per session; if a deployment splits across galaxies, each gets its
  own GalaxyDriver and they don't cross-talk. No change.
- **OPC UA Part 9 advanced features** beyond the current scope —
  shelving, subscribed-to-events-only, branch-state for re-trigger
  semantics. Future epic if a customer asks.
- **Insight / cloud Historian write-back path.** Track A.5 targets the
  on-prem AVEVA Historian via aahClientManaged. The cloud variant
  would mirror the same gateway RPC over the REST API discussed in
  `docs/histsdk` — separate epic.
## File inventory (touched)

**mxaccessgw (Track A):**

- `src\MxGateway.Contracts\Protos\mxaccess_gateway.proto` (A.1)
- `src\MxGateway.Contracts\Protos\mxaccess_worker.proto` (A.2, A.4)
- `src\MxGateway.Worker\…\Eventing\` (A.2, A.3, A.4)
- `src\MxGateway.Worker\…\Commands\` (A.3, A.4)
- `src\MxGateway.Server\Sessions\SessionEventStream.cs` (A.3)
- `src\MxGateway.Server\Rpc\` (A.3, A.4)
- `src\MxGateway.Server\Auth\Scopes.cs` (A.3, A.4)
- `MxGateway.Tests`, `MxGateway.Worker.Tests`, `MxGateway.IntegrationTests`

**lmxopcua — Galaxy driver + server (Track B):**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\EventPump.cs` (B.1)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\MxAccessSeverityMapper.cs` *(new — B.1)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\IGalaxyAlarmAcknowledger.cs` *(new — B.2)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\GatewayGalaxyAlarmAcknowledger.cs` *(new — B.2)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriver.cs` (B.2)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriverFactory.cs` (B.2)
- `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs` (B.3)
- `src\ZB.MOM.WW.OtOpcUa.Server\Alarms\AlarmConditionService.cs` (B.3)
- `src\ZB.MOM.WW.OtOpcUa.Server\Phase7\Phase7Composer.cs` (B.4)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client\SidecarAlarmHistorianWriter.cs` *(new — B.4)*
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests\Runtime\` (B.1, B.2)
- `tests\ZB.MOM.WW.OtOpcUa.Server.Tests\Alarms\` (B.3)
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests\` (B.4 — new tests)
- `docs\drivers\Galaxy.md` (B.5)
- `docs\AlarmTracking.md` *(new — B.5)*
- `docs\v1\AlarmTracking.md` (B.5 — banner update)
- `docs\plans\alarms-over-gateway.md` (B.5 — completion banner)

**lmxopcua — Wonderware historian sidecar (Track C):**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Backend\AahClientManagedAlarmEventWriter.cs` *(new — C.1)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Program.cs` (C.2 — wire writer)
- `scripts\install\Install-Services.ps1` (C.2 — env-var toggle for write-enable)
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\` (C.1 — outcome mapping + batch + cluster failover)

**lmxopcua — deployment refresh (Track D):**

- `scripts\install\Refresh-Services.ps1` *(new — D.1)*
- `docs\v2\dev-environment.md` (D.1 — document the refresh workflow)
- `docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` *(new — D.1 captured smoke run)*

**mxaccessgw — client SDKs (Track E):**

- `clients\proto\` — no source change; downstream codegen consumes A.1
- **.NET (E.2)**:
  - `clients\dotnet\MxGateway.Client\MxGatewayClient.cs`
  - `clients\dotnet\MxGateway.Client\Alarms\` *(new namespace)*
  - `clients\dotnet\MxGateway.Client.Cli\Verbs\AlarmsVerb.cs` *(new)*
  - `clients\dotnet\MxGateway.Client.Tests\AlarmsTests.cs` *(new)*
- **Python (E.3)**:
  - `clients\python\src\mxgateway\alarms.py` *(new)*
  - `clients\python\src\mxgateway\cli\alarms.py` *(new — verify CLI module path)*
  - `clients\python\tests\test_alarms.py` *(new)*
- **Go (E.4)**:
  - `clients\go\mxgateway\alarms.go` *(new)*
  - `clients\go\cmd\mxgateway-cli\alarms.go` *(new — verify dir name)*
  - `clients\go\internal\testserver\alarms_test.go` *(new)*
- **Java (E.5)**:
  - `clients\java\mxgateway-client\src\main\java\…\AlarmsApi.java` *(new)*
  - `clients\java\mxgateway-cli\src\main\java\…\AlarmsCommand.java` *(new)*
  - `clients\java\mxgateway-client\src\test\java\…\AlarmsApiTest.java` *(new)*
- **Rust (E.6)**:
  - `clients\rust\crates\mxgateway-client\src\alarms.rs` *(new)*
  - `clients\rust\crates\mxgateway-cli\src\alarms.rs` *(new — verify crate name)*
  - `clients\rust\tests\alarms.rs` *(new)*

**lmxopcua — OPC UA client refresh (Track E.7):**

- `src\ZB.MOM.WW.OtOpcUa.Core.Abstractions\AlarmEventArgs.cs` (extend)
- `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs` (Part 9 field population)
- `src\ZB.MOM.WW.OtOpcUa.Client.Shared\Models\AlarmEventArgs.cs` (DTO mirror)
- `src\ZB.MOM.WW.OtOpcUa.Client.CLI\Commands\AlarmsCommand.cs` (verbose / json flags)
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\ViewModels\AlarmEventViewModel.cs`
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\ViewModels\AlarmsViewModel.cs`
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\Views\AlarmsView.axaml` (+ `.cs`)
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\Views\AckAlarmWindow.axaml` (+ `.cs`)
- `docs\Client.CLI.md` (alarms section examples)
- `docs\Client.UI.md` (Show-details toggle description)
- `docs\reqs\ClientRequirements.md` (extend AlarmEventArgs contract)
- `docs\AlarmTracking.md` (B.5 — cross-link client examples)
- `tests\ZB.MOM.WW.OtOpcUa.Client.Shared.Tests\` (DTO round-trip)
- `tests\ZB.MOM.WW.OtOpcUa.Client.CLI.Tests\` (flag behaviour)
- `tests\ZB.MOM.WW.OtOpcUa.Client.UI.Tests\` (view-model bindings)

Total: ~10 source files added/modified in mxaccessgw server/worker
side; ~14 in lmxopcua server/driver side; ~3 in the historian
sidecar; ~2 deployment scripts; ~30 across the five gateway-client
SDK languages; ~12 in lmxopcua client surfaces; ~25 test files across
all repos. The gateway-client multi-language work is parallelizable
across maintainers, so wall-clock effort lands in 4-6 weeks of
coordinated work given the parity-rig dependency for end-to-end
validation. If only the .NET SDK ships at first (E.2 only) and
E.3-E.6 follow asynchronously, lmxopcua's critical path stays
unchanged.