# Alarms Worker Wiring Plan

> **Context**: The alarms-over-gateway epic shipped 19 PRs across the `lmxopcua` and `mxaccessgw` repos (merged 2026-04-30). Contracts are live; the sub-attribute fallback path keeps Galaxy alarms functional today. Four items remain as inert scaffolds gated on a dev-rig finding. This document is the focused implementation plan for those four items only.
>
> **Do not duplicate `docs/plans/alarms-over-gateway.md`** — that document is the full historical record of all 19 PRs. This document covers only what is still to be done and exactly what blocks each item.
>
> **This work lives in the mxaccessgw sibling repo** at `C:\Users\dohertj2\Desktop\mxaccessgw\` — not in this (lmxopcua) repo, except where lmxopcua changes are noted explicitly.

---

## Dev-rig finding that blocks everything (2026-04-30)

During PR A.2 work the following was discovered on the dev box:

> The MXAccess COM Toolkit at `C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll` exposes **no alarm-event family** — only `OnDataChange`, `OnWriteComplete`, `OperationComplete`, `OnBufferedDataChange`.
>
> AVEVA's `aaAlarmManagedClient` / `ArchestrAAlarmsAndEvents.SDK` assemblies are **x64-only** and incompatible with the worker's x86 net48 bitness.

The architectural decision required before any of A.2, A.3/A.4, C.1 can ship:

> **Either** accept the value-driven sub-attribute path as the production architecture (operator-comment fidelity is the only v1 regression), **or** add an x64 alarm-helper sub-process alongside the x86 worker.

Resolution drives the implementation shape of every item below. The plan presented here assumes the x64 alarm-helper sub-process route (the higher parity option), but notes the sub-attribute-only exit at each step.


---

## Discovered AVEVA API surface

Before implementing, verify the following against the AVEVA SDK actually installed on the dev box and in the mxaccessgw worker's deployment folder:

| Assembly | Bitness | Likely location | Key types |
|----------|---------|-----------------|-----------|
| `ArchestrA.MXAccess.dll` | x86 | `C:\Program Files (x86)\ArchestrA\Framework\Bin\` | `IMxAlarmEventSink`, `MxAlarmEventArgs` — **confirm exists at actual version** (the 2026-04-30 dev-rig finding saw no alarm-event family in this assembly) |
| `aaAlarmManagedClient.dll` | x64 | `C:\Program Files\ArchestrA\Framework\Bin\` | `AlarmClient`, `IAlarmConsumer`, `AlarmEventArgs` |
| `ArchestrAAlarmsAndEvents.SDK.dll` | x64 | Same or Historian SDK folder | `AlarmHistorianWriter`, `GetAlarmExtendedRec` |

The AVEVA MXAccess Toolkit reference in the mxaccessgw repo (`gateway.md`) is the canonical API doc for the gateway worker's side. The alarm-client API is documented separately; verify the following call shapes during PR A.2:

| Operation | Likely API | Notes |
|-----------|-----------|-------|
| Subscribe to alarm events | `AlarmClient.RegisterConsumer(IAlarmConsumer)` + `AlarmClient.Subscribe(filterSpec)` | Confirm exact method signatures against the SDK version on the dev box |
| Receive alarm event | `IAlarmConsumer.OnAlarmEvent(AlarmEventArgs)` callback | Field set: alarm name, source, type, transition kind, severity, timestamps, operator fields |
| Acknowledge alarm | `AlarmClient.AcknowledgeAlarm(alarmRef, comment, userPrincipal)` or equivalent | Confirm whether this is synchronous or returns a status |
| Query active alarms | `AlarmClient.GetAlarmExtendedRec(filter)` or `GetActiveAlarms()` | Returns current active set for ConditionRefresh |
| Get statistics | `AlarmClient.GetStatistics()` | Optional — useful for worker health checks |

Record the exact method signatures against the installed SDK before starting A.2 — the proto field set in `OnAlarmTransitionEvent` must match the SDK's actual payload.


---

## Dependency order

```
A.2 (worker: AlarmClient subscription)
  └─► A.3 (gateway: dispatch OnAlarmTransition + AcknowledgeAlarm RPC handler)
        └─► A.4 (gateway: QueryActiveAlarms RPC handler)
              └─► lmxopcua B.2 (GalaxyDriver IAlarmSource live)
                    └─► C.1 (sidecar: AahClientManagedAlarmEventWriter live)
                          └─► D.1 (smoke artifact captured)
```

A.2 is the single blocking item on the native alarm path. The items after it unblock serially once A.2 delivers alarm events through the channel; the exception is C.1, which does not share the worker's bitness constraint and can proceed in parallel (see Item C.1).

---

## Item A.2 — Worker: subscribe to MxAccess alarm event source

**Repo**: `mxaccessgw` — `src\MxGateway.Worker\` (net48, x86)

**What it needs**: The worker must subscribe to AVEVA's alarm events and fan them into the same bounded channel the data-change pump uses, translating each MxAccess alarm event into a `WorkerEvent` proto with family `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` (defined in PR A.1, already merged).

**Architectural choice determines the implementation path**:

**Option X1 — aaAlarmManagedClient in a new x64 alarm-helper process**

Add a second worker-mode sub-process (`MxGateway.AlarmWorker`, net8.0 x64) alongside the existing x86 worker. The AlarmWorker:

1. Loads `aaAlarmManagedClient.dll` (x64) on startup.
2. Calls `AlarmClient.RegisterConsumer` with a `WorkerAlarmConsumer` sink.
3. Calls `AlarmClient.Subscribe` with a session-level filter (all alarms for the session's Galaxy scope).
4. Translates each `IAlarmConsumer.OnAlarmEvent` callback into a protobuf `WorkerEvent` (family `ON_ALARM_TRANSITION`) and writes it to an IPC channel readable by the gateway server-side multiplexer.
5. Handles session lifecycle: re-subscribes after reconnect; unsubscribes on session close.

IPC from AlarmWorker to gateway: the simplest option is a named pipe, or an in-process queue if the AlarmWorker is hosted inside the gateway process as a separate `IHostedService`.

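
Step 4 of Option X1 (translate each callback, then write it to a bounded channel) can be sketched as follows. The real implementation would be C# against the AVEVA SDK; Python is used here only as executable pseudocode. `AlarmCallback`, `to_worker_event`, `AlarmEventPump` and the dict layout are hypothetical names, not the actual proto or SDK types, and the drop-on-full overflow policy is an assumption that should be aligned with whatever the data-change pump already does.

```python
import queue
from dataclasses import dataclass

# Family name as written in this plan (defined in PR A.1).
ALARM_FAMILY = "MX_EVENT_FAMILY_ON_ALARM_TRANSITION"


@dataclass(frozen=True)
class AlarmCallback:
    """Hypothetical stand-in for the SDK's alarm callback payload."""
    alarm_ref: str
    source: str
    transition: str          # assumed values, e.g. "Raise", "Acknowledge", "Clear"
    severity: int
    timestamp_utc: str
    operator_comment: str = ""


def to_worker_event(session_id: str, cb: AlarmCallback) -> dict:
    """Translate one callback into a WorkerEvent-shaped dict (assumed layout)."""
    return {
        "family": ALARM_FAMILY,
        "session_id": session_id,
        "body": {
            "alarm_ref": cb.alarm_ref,
            "source": cb.source,
            "transition": cb.transition,
            "severity": cb.severity,
            "timestamp_utc": cb.timestamp_utc,
            "operator_comment": cb.operator_comment,
        },
    }


class AlarmEventPump:
    """Writes translated alarm events into a bounded channel."""

    def __init__(self, channel: queue.Queue, session_id: str):
        self._channel = channel
        self._session_id = session_id
        self.dropped = 0

    def on_alarm_event(self, cb: AlarmCallback) -> bool:
        event = to_worker_event(self._session_id, cb)
        try:
            # Never block the SDK callback thread on a full channel.
            self._channel.put_nowait(event)
            return True
        except queue.Full:
            # Assumed policy: count and drop. The real pump may block or evict instead.
            self.dropped += 1
            return False
```

Keeping the translation in a pure function separate from the channel write means the `WorkerAlarmConsumerTests` body-shape assertions would not need a channel or a live SDK at all.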

**Option X2 — Accept sub-attribute fallback as production (no A.2 work)**

If the architectural decision is to accept the sub-attribute path as permanent:

- `MxAccessAlarmEventSink.Attach()` in the worker remains a no-op (as currently coded with the architectural comment).
- The `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` proto family stays defined but the gateway never emits events on it.
- lmxopcua's `GalaxyDriver` does not implement `IAlarmSource` for the native path; the value-driven sub-attribute path remains the production path.
- The only regression vs. v1 is operator-comment fidelity on Galaxy alarms.
- C.1 is still needed if scripted-alarm historian write-back is required.

**What blocks it**: the architectural decision above. Once made, A.2 becomes a 2–3 day implementation task (sub-process plumbing + proto translation + unit tests for the consumer sink cancellation behaviour).

**Tests to write (when A.2 proceeds)**:

- `WorkerAlarmConsumerTests` — fake `IAlarmConsumer` source emits canned transitions; assert each produces the correct `WorkerEvent` body shape.
- Cancellation/session-close test — closing the session unsubscribes from the AlarmClient cleanly (no leaked `IAlarmConsumer` reference if the worker is recycled mid-session).
- Re-subscribe-after-reconnect test — `ReconnectSupervisor` triggers a reconnect; assert the alarm consumer re-attaches to the new session.

---

## Item A.3 / A.4 — Gateway: dispatch and RPC handlers

**Repo**: `mxaccessgw` — `src\MxGateway.Server\`

**Depends on**: A.2 delivering `WorkerEvent` bodies with family `MX_EVENT_FAMILY_ON_ALARM_TRANSITION`.

**What it needs**:

### A.3 — Dispatch + AcknowledgeAlarm

1. The session-level event multiplexer (`Sessions\SessionEventStream.cs` or equivalent — verify name in the mxaccessgw repo) must recognise the new `WorkerEvent` body and forward it as an `MxEvent` with family `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` to every `StreamEvents` subscriber for that session.
2. New RPC handler `AcknowledgeAlarm` builds an `AlarmAcknowledgeCommand` worker command and forwards it to the alarm-helper process (Option X1) or the worker's MxAccess session (Option X2 if MxAccess exposes ack). Maps the reply status to `AcknowledgeAlarmReply.MxStatusProxy`.
3. Authorization: new API scope `invoke:alarm-ack` on the API key. Keys without it receive `PERMISSION_DENIED`. Follow the existing scope-check pattern used by `invoke:write`.

### A.4 — QueryActiveAlarms

1. New RPC handler `QueryActiveAlarms` calls `AlarmClient.GetAlarmExtendedRec` (or `GetActiveAlarms` — confirm the method name during implementation) on the alarm-helper process, batches results into `ActiveAlarmSnapshot` proto messages, and streams them back to the caller.
2. New API scope `invoke:alarm-query` (separate from ack so read-only clients can refresh without ack rights).

**What blocks A.3/A.4**: A.2 must deliver `WorkerEvent` bodies on the channel. A.3/A.4 are pure dispatch wiring once the events arrive.

**Tests to write**:

- A.3 dispatch test — fake worker emits an `AlarmTransition` event; assert the gateway forwards it on the `StreamEvents` channel of every subscribed session (mirrors existing `OnDataChange` dispatch tests).
- A.3 AcknowledgeAlarm auth test — existing key without `invoke:alarm-ack` scope returns `PERMISSION_DENIED`.
- A.4 pagination test — synthetic active-alarm set of 0 / 1 / 100 entries; assert each streams back as separate `ActiveAlarmSnapshot` messages.
- Integration (parity rig — requires dev box with AVEVA platform): trigger a real Galaxy alarm, call `QueryActiveAlarms`, assert the alarm appears in the stream; call `AcknowledgeAlarm`, assert the alarm transitions to `ActiveAcked` and an `Acknowledge` transition event appears on `StreamEvents`.

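
The two scope gates and the one-message-per-alarm streaming described for A.3 and A.4 can be sketched as below. This is Python used as executable pseudocode for what would be C# gRPC handlers; `PermissionDenied`, `require_scope`, the handler functions and the `forward` / `fetch_active` callables are all hypothetical stand-ins, and the existing `invoke:write` check in the mxaccessgw repo remains the pattern to copy.

```python
from typing import Callable, Iterable, Iterator


class PermissionDenied(Exception):
    """Stand-in for the gRPC PERMISSION_DENIED status."""


def require_scope(key_scopes: Iterable[str], needed: str) -> None:
    """Reject the call unless the API key carries the needed scope."""
    if needed not in set(key_scopes):
        raise PermissionDenied(f"API key lacks scope {needed!r}")


def acknowledge_alarm(key_scopes: Iterable[str], alarm_ref: str, comment: str,
                      forward: Callable[[dict], object]) -> object:
    """A.3: check the ack scope, then forward an ack command (assumed shape)."""
    require_scope(key_scopes, "invoke:alarm-ack")
    return forward({"alarm_ref": alarm_ref, "comment": comment})


def _stream_snapshots(alarms: Iterable[dict]) -> Iterator[dict]:
    # One snapshot message per active alarm.
    for alarm in alarms:
        yield {"snapshot": alarm}


def query_active_alarms(key_scopes: Iterable[str],
                        fetch_active: Callable[[], Iterable[dict]]) -> Iterator[dict]:
    """A.4: check the query scope eagerly, then stream snapshots lazily."""
    require_scope(key_scopes, "invoke:alarm-query")
    return _stream_snapshots(fetch_active())
```

The scope check in `query_active_alarms` sits outside the generator on purpose, so a key without `invoke:alarm-query` is rejected at call time rather than on the first read from the stream.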

---

## Item C.1 — Historian sidecar: AahClientManagedAlarmEventWriter

**Repo**: `lmxopcua` — `src\Drivers\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\`

**Depends on**: Architectural decision (the sidecar uses `aahClientManaged` x64, which is not bitness-constrained like the worker). C.1 can be unblocked independently of A.2 if the goal is to wire up the scripted-alarm historian path.

**Current state**: `SdkAlarmHistorianWriteBackend` in `src\MxGateway.Worker\MxAccess\` is a placeholder returning `RetryPlease`. The lmxopcua sidecar's `WriteAlarmEvents` IPC slot is defined in `Ipc\Contracts.cs` but `Program.cs` constructs `HistorianFrameHandler` without an `alarmWriter` (line 57 per the alarms plan). The `IAlarmEventWriter` interface exists; only the production implementation and the consumer wiring are missing.

**What it needs**:

1. New `AahClientManagedAlarmEventWriter.cs` implementing `IAlarmEventWriter` (defined in `Ipc\HistorianFrameHandler.cs`). Calls `aahClientManaged`'s alarm-event write API — same path v1's `GalaxyHistorianWriter` used. Uses `HistorianClusterEndpointPicker` for multi-node routing. Maps `MxStatus` write outcomes to `HistorianWriteOutcome` enum (Ack / PermanentFail / RetryPlease).
2. `Program.cs` — build `AahClientManagedAlarmEventWriter` next to the existing `BuildHistorian()` call; pass it to `HistorianFrameHandler`. Gate behind `OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED` env var (default `true` when `OTOPCUA_HISTORIAN_ENABLED=true`).
3. `Install-Services.ps1` — add the new env var to the install-time block.

**What blocks C.1**: access to the `aahClientManaged` SDK on the dev box (confirmed available per `project_aveva_platform_installed.md` — AVEVA Historian SDK is present). C.1 can proceed without A.2 since the sidecar's `aahClientManaged` is x64 and does not share the worker's x86 bitness constraint.

**Tests to write**:

- Outcome-mapping table: every `MxStatus` on alarm-write → expected `HistorianWriteOutcome`.
- Batch test: 1 / 100 / 1000 events through a fake `aahClientManaged` writer; assert per-row outcome list parallel to input order.
- Cluster failover: primary Historian node returns `BadCommunicationError`; picker rotates to secondary; eventual success.
- `Program.cs` seam: assert handler constructed with alarm writer when env var enabled; without it when disabled.
- Live integration (parity rig): write a synthetic alarm event through the IPC; query it back via `ReadEvents`; assert round-trip fidelity.

---

## Item D.1 — Smoke artifact

**Repo**: `lmxopcua` (deployment refresh) + `mxaccessgw` (rig verification)

**Depends on**: A.2, A.3, A.4, and C.1 all passing on the dev rig with a live Galaxy and live Historian.

**Current state**: The deployment script `Refresh-Services.ps1` (task D.1) has shipped as PR #417 (merged 2026-04-30). What was NOT captured at that time was a smoke artifact — a log snippet or test output confirming that:

1. An alarm transition event from a live Galaxy alarm reaches lmxopcua's `AlarmConditionService` via the new `IAlarmSource` path (not the fallback).
2. A scripted-alarm historian write-back reaches AVEVA Historian via the sidecar `IAlarmEventWriter`.

**What it needs**:

Once A.2, A.3, C.1 are wired on the parity rig:

1. Deploy the updated mxaccessgw (with A.2 / A.3 / A.4 changes).
2. Deploy the updated sidecar (with C.1 changes).
3. Run `Refresh-Services.ps1` to confirm clean service restarts.
4. Trigger a Galaxy alarm (e.g. set an AnalogLimitAlarm attribute out of range in Galaxy IDE).
5. Observe the lmxopcua OPC UA alarm surface via the Client CLI:

   ```powershell
   dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
     alarms -u opc.tcp://localhost:4840 --subscribe
   ```

   Pass: the alarm condition appears on the OPC UA A&E surface within 2 × publishing interval.
6. Trigger a scripted alarm via the lmxopcua `ScriptedAlarmEngine` (or an OPC UA method call if one is wired).
7. Confirm in the AVEVA Historian that the scripted alarm event is stored (query via the Historian client or HistorianWatch tool).
8. Capture log snippets:
   - mxaccessgw log: `[INF] AlarmTransition dispatched sessionId=<> alarmRef=<>`
   - lmxopcua log: `[INF] AlarmConditionService: IAlarmSource event alarmRef=<> origin=Driver`
   - Sidecar log: `[INF] AahClientManagedAlarmEventWriter: Wrote alarm events`
9. Commit the log snippets as `docs/plans/alarms-d1-smoke-artifact.md` (a new doc, not this one).

**What blocks D.1**: all of A.2, A.3, C.1, plus the operator decision on the x64 alarm-helper architecture (or explicit acceptance of the sub-attribute fallback as production).

---

## Summary of blocks

| Item | Blocked by | Estimated effort once unblocked |
|------|-----------|--------------------------------|
| A.2 | Architectural decision (x64 alarm-helper vs. sub-attribute fallback as production) | 2–3 days implementation; 1 day tests |
| A.3 | A.2 delivering WorkerEvent bodies | 1–2 days |
| A.4 | A.2 (active-alarm query needs AlarmClient session) | 1 day |
| C.1 | aahClientManaged SDK access (available on dev box); NOT blocked by A.2 | 1–2 days |
| D.1 | A.2 + A.3 + C.1 all passing on parity rig | 0.5 day (smoke + artifact capture) |

C.1 can proceed in parallel with A.2 / A.3 since the sidecar's `aahClientManaged` is x64 and does not share the worker bitness constraint.

---

## What this plan does NOT cover

- The value-driven sub-attribute fallback path — already shipped and functional (not being changed).
- Track B (lmxopcua EventPump, GalaxyDriver IAlarmSource re-implementation) and Track E (client SDK surface refresh) from the alarms-over-gateway plan — those are in `lmxopcua` and depend on A.3 being live; they follow naturally once A.3 ships.
- Galaxy-native alarm historian path — System Platform's own `HistorizeToAveva` toggle on the Galaxy template; not in scope.
- Alarm ACL / role-grant surface — already shipped in Phase 6.2.