# Plan — alarms over the mxaccessgw gateway

Coordinated epic across two repos:

- **`lmxopcua`** (this repo) — `c:\Users\dohertj2\Desktop\lmxopcua\`
- **`mxaccessgw`** — `c:\Users\dohertj2\Desktop\mxaccessgw\`

## Why

PR 7.2 (2026-04-30, commit `ae7106d`) retired the out-of-process v1 Galaxy stack (`Driver.Galaxy.Host` / `.Proxy` / `.Shared` + `OtOpcUaGalaxyHost` Windows service) and migrated Galaxy access to the in-process `GalaxyDriver` over mxaccessgw's gRPC. In doing so, three v1 capabilities regressed:

1. **Native MxAccess alarm-event metadata** — v1's `GalaxyAlarmTracker` surfaced rich alarm transitions (operator comment, original raise time, ack time, alarm category, native severity). The current architecture reconstructs Part 9 transitions by subscribing to four sub-attribute value updates (`InAlarm`, `Acked`, `Priority`, `Description`) — fine for raise/clear but loses everything else.
2. **Native MxAccess Acknowledge semantics** — v1 called the MxAccess ack API directly from `GalaxyAlarmTracker`. Today, OPC UA acks are written into the `AckMsgWriteRef` sub-attribute — semantically valid but a round-trip through the value path that loses operator-comment fidelity.
3. **Alarm-historian write-back path** — `GalaxyHistorianWriter` implemented `IAlarmHistorianWriter` and forwarded scripted-alarm and Galaxy-native alarm transitions back to AVEVA Historian via `aahClientManaged`. PR 7.2 deleted it. `Phase7Composer.ResolveHistorianSink` now finds no writer and falls back to `NullAlarmHistorianSink`, so **scripted-alarm transitions queue locally and are silently discarded.** (Galaxy-native alarms still reach AVEVA Historian via the Galaxy template's own `HistorizeToAveva` toggle, independent of our sink — that path wasn't broken.)

`gateway.md` (mxaccessgw, line 8) explicitly commits the gateway to "full MXAccess parity… preserve MXAccess behavior first… **native MXAccess event families**." Today's gateway proto exposes only data-change families. Closing the alarm regression and fulfilling that parity statement are the same task.

## Goals

- Restore all three regressed capabilities to feature parity with v1.
- Keep the v2 architectural split — gateway owns MxAccess transport; lmxopcua owns OPC UA Part 9 semantics, ACL/role enforcement, and multi-source aggregation (driver-native + scripted + sub-attribute).
- Preserve the value-driven sub-attribute path as a fallback for Galaxy templates that don't carry `$Alarm*` extensions.
- Land the work as a sequence of small, independently-reviewable PRs that alternate between repos in dependency order.

## Non-goals

- Reimplementing the Part 9 state machine inside mxaccessgw. The gateway stays UA-agnostic.
- Reworking the LDAP role-grant or OPC UA AlarmAck ACL surface — those already exist and route through `Server/Alarms/IAlarmAcknowledger`.
- Adding alarm support to non-Galaxy drivers (AbCip / FOCAS / OpcUaClient already have their own `IAlarmSource` implementations; Modbus / S7 / AbLegacy / TwinCAT don't have a native alarm bus and are out of scope).
- Altering Galaxy template conventions or `$Alarm*` extensions in the customer's Galaxy.
## Before → after

**Today (post-PR 7.2):**

```
MxAccess COM (gateway worker)
   │  data-change events only on the MxEvent stream
   ▼
GalaxyDriver (no IAlarmSource)
   │  IWritable / ISubscribable / ITagDiscovery only
   ▼
DriverNodeManager
   ├─ subscribes to four $Alarm* sub-attributes per condition
   ├─ AlarmConditionService rebuilds Part 9 transitions from value updates
   └─ DriverWritableAcknowledger writes AckMsgWriteRef on ack

Phase7Composer.ResolveHistorianSink → NullAlarmHistorianSink
   (scripted-alarm transitions queue → silently discarded)
```

**After this epic:**

```
MxAccess COM (gateway worker)
   │  data-change       ──┐
   │  alarm-transition    │
   │  write-complete      ├─► single MxEvent stream (new family added)
   ▼                      ▼
GalaxyDriver : ITagDiscovery, IReadable, IWritable, ISubscribable,
               IRediscoverable, IHostConnectivityProbe,
               IAlarmSource  ← restored
   ├─ EventPump dispatches OnAlarmTransition family → IAlarmSource.OnAlarmEvent
   ├─ AcknowledgeAsync → gateway RPC AcknowledgeAlarm
   └─ QueryActiveAlarmsAsync → gateway RPC QueryActiveAlarms (ConditionRefresh)

DriverNodeManager
   ├─ rich alarm events from IAlarmSource.OnAlarmEvent → AlarmConditionService
   ├─ value-driven sub-attribute path STILL WORKS for templates without $Alarm
   ├─ DriverWritableAcknowledger preserved as fallback for the value path
   └─ ScriptedAlarmEngine output continues to feed AlarmConditionService

Phase7Composer.ResolveHistorianSink → GatewayAlarmHistorianWriter
   ├─ scripted-alarm transitions → SqliteStoreAndForwardSink
   └─ drain worker → gateway RPC WriteHistorianEvent → AVEVA Historian
```

## Architecture decisions

**D1 — Where the Part 9 state machine runs.** Stays in lmxopcua's `AlarmConditionService`. Gateway is UA-agnostic. ScriptedAlarmEngine produces Part 9 transitions with no MxAccess origin; the aggregator must live where all sources converge.

**D2 — Where authz on Acknowledge runs.** Stays in lmxopcua.
The OPC UA `AlarmConditionState.OnAcknowledge` delegate already checks the session's roles for `AlarmAck` against the LDAP/role-grant ACL. The gateway should never be reachable in a way that bypasses that check.

**D3 — How rich alarm events reach OPC UA clients.** New `MxEventFamily` on the existing `StreamEvents` RPC (no second stream). Adds latency parity with data-change events, reuses the bounded-channel + worker-side delivery semantics already documented in `gateway.md`.

**D4 — Sub-attribute fallback path stays.** Some Galaxy templates won't have `$Alarm*` extensions yet; the existing value-driven path remains the only way to surface alarms for those templates. Both paths feed `AlarmConditionService`. Driver-native events take precedence when both are present (more authoritative, lower latency).

**D5 — Where the historian writer lives.** As a new RPC on the gateway (`WriteHistorianEvent`). The Wonderware sidecar's existing `WriteAlarmEvents` IPC slot stays unwired and is deleted as part of this epic — the gateway is the canonical place for "write to AVEVA Historian" since the gateway already owns AVEVA-COM access. This also means the sidecar (long term) only does *reads* and could potentially retire entirely if the historian-client REST migration (`docs/plans/...`) lands.

## Track A — mxaccessgw changes

All five PRs land in `c:\Users\dohertj2\Desktop\mxaccessgw\`.

### PR A.1 — proto: add alarm-transition event family + ack/query RPCs

**Files** (`src\MxGateway.Contracts\Protos\mxaccess_gateway.proto`):

1. Extend `MxEventFamily` (line 403):

   ```
   MX_EVENT_FAMILY_ON_ALARM_TRANSITION = 5;
   ```

2. Extend `MxEvent.body` oneof (line 395) with:

   ```
   OnAlarmTransitionEvent on_alarm_transition = 24;
   ```

3. New message `OnAlarmTransitionEvent` after the existing event-family bodies (line 425+). Carry the full MxAccess alarm payload — alarm name, source object reference, alarm-type-name (e.g. "AnalogLimitAlarm.HiHi"), transition kind enum (`Raise` / `Acknowledge` / `Clear`), severity (raw numeric — keep MxAccess scale; mapping to OPC UA 0-1000 happens server-side in lmxopcua), `original_raise_timestamp`, `transition_timestamp`, optional `operator_user`, optional `operator_comment`, alarm `category` string, alarm `description`. Mirror the field set documented in v1's `GalaxyAlarmTracker`.

4. New RPC on `MxAccessGateway` service (line 11):

   ```
   rpc AcknowledgeAlarm(AcknowledgeAlarmRequest) returns (AcknowledgeAlarmReply);
   rpc QueryActiveAlarms(QueryActiveAlarmsRequest) returns (stream ActiveAlarmSnapshot);
   ```

   `AcknowledgeAlarmRequest` carries `session_id`, `alarm_full_reference`, `comment`, `user_principal`. Reply carries `MxStatusProxy`.

   `QueryActiveAlarmsRequest` carries `session_id`, optional `alarm_filter_prefix` (for ConditionRefresh on a sub-tree). `ActiveAlarmSnapshot` carries the same fields as `OnAlarmTransitionEvent` plus `current_state` enum (`Active` / `ActiveAcked` / `Inactive`).

**Tests** (`MxGateway.Tests` — proto/codegen sanity):

- Round-trip Serialize→Deserialize for the new messages with all-fields populated and empty-optional-fields cases.
- `MxEvent.body` oneof selection guard — supplying multiple bodies rejected.

**Out of scope:** worker-side wiring (PR A.2), gateway-side dispatch (PR A.3). PR A.1 is a pure contract-surface change; nothing functional yet.

### PR A.2 — worker: subscribe to MxAccess alarm event source

**Files** (`src\MxGateway.Worker\` — net48/x86):

The MxAccess Toolkit exposes alarm subscription separately from data subscription.
Per AVEVA's MXAccess C++ Toolkit reference (canonical doc referenced from `gateway.md`), alarm events arrive through the `IAlarmEventSink` interface registered against the MxAccess `Alarms` collection of an open session, OR via the MxAccess "alarm provider" subscription pattern (depends on Toolkit version on the worker host — verify against the version actually deployed in the worker bin during PR A.2).

1. Worker subscribes to MxAccess alarms once per session, with a single sink that fans out into the same bounded channel the data-change pump uses (`MxGateway.Worker\Eventing\EventChannel.cs` or whatever the worker currently calls its sink — verify name during the PR).
2. Sink translates each MxAccess alarm event into a `WorkerEvent` proto (defined in `mxaccess_worker.proto`) carrying the new `OnAlarmTransitionEvent` body. Reuses the existing `worker_sequence` counter so ordering is preserved across families.
3. Worker honours the same backpressure rules as data-change events — newest-dropped on full channel, single dropped-counter metric per family.

**Tests** (`MxGateway.Worker.Tests`):

- Fake `IAlarmEventSink` source emits canned transitions; assert the worker forwards each as the right `WorkerEvent` shape.
- Cancellation test — closing the session unsubscribes from MxAccess alarms cleanly (no leaked sinks if the worker is recycled mid-session).

**Out of scope:** any gateway-side dispatch, any RPC handler — PR A.2 is worker-internal.

### PR A.3 — gateway: dispatch OnAlarmTransition + implement AcknowledgeAlarm

**Files** (`src\MxGateway.Server\`):

1. The session-level event multiplexer (`Sessions\SessionEventStream.cs` or equivalent — verify name during PR) recognizes the new `WorkerEvent` body and forwards as an `MxEvent` with family `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` to the gRPC `StreamEvents` consumer.
2. New RPC handler `AcknowledgeAlarm` builds an MxAccess `WorkerCommand` carrying an `AlarmAcknowledgeCommand` (new in `mxaccess_worker.proto` under PR A.1).
   Forwarded to the worker; reply mapped to `AcknowledgeAlarmReply` with the MxAccess `MxStatus` proxy populated.
3. AuthN — same API-key + scope check as existing RPCs. Add a new scope `invoke:alarm-ack` (mirrors `invoke:write` granularity); existing keys without it return `PERMISSION_DENIED`.

**Tests** (`MxGateway.Tests`, `MxGateway.IntegrationTests`):

- Unit: dispatch test — fake worker emits an `AlarmTransition` event; assert the gateway forwards it on the live `StreamEvents` channel of every subscribed session.
- Integration: end-to-end against the real worker (requires the parity rig setup — see `docs\v2\Galaxy.ParityRig.md` in lmxopcua for the MxAccess-installed dev box prerequisites). Trigger a real Galaxy alarm, assert the gateway emits `OnAlarmTransition`. Acknowledge via the new RPC, assert the alarm transitions to `ActiveAcked` and an `Acknowledge` transition event is emitted back.
- AuthN: existing key without `invoke:alarm-ack` scope rejected.

### PR A.4 — gateway: ConditionRefresh snapshot via QueryActiveAlarms

**Files** (`src\MxGateway.Server\`, `src\MxGateway.Worker\`):

1. Worker exposes a `QueryActiveAlarmsCommand` that walks the session's active-alarm collection and streams snapshots back through the existing command-reply channel. The MxAccess Toolkit's `Alarms.GetActive()` (verify exact API name during PR) is the underlying call.
2. Gateway RPC `QueryActiveAlarms` opens a server-streaming reply, batches snapshots through.
3. AuthN — new scope `invoke:alarm-query` (separate from ack so a read-only client can refresh without ack rights).

**Tests:**

- Worker-test: synthetic active set of 0 / 1 / 100 alarms; assert pagination respects worker channel capacity.
- Integration: against the parity rig, assert a ConditionRefresh after reconnect returns every alarm currently `Active` or `ActiveAcked` in the Galaxy.
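
The selection rule that PR A.4 implies for `QueryActiveAlarms` (return only alarms that are `Active` or `ActiveAcked`, optionally narrowed by `alarm_filter_prefix`) can be sketched as follows. This is a minimal illustration under stated assumptions, not the gateway's code: `AlarmState`, `Snapshot`, and `SnapshotFilter.Select` are hypothetical stand-ins for the generated proto types, and treating the filter as an ordinal prefix match on the full alarm reference is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the proto-generated types; names are illustrative only.
public enum AlarmState { Active, ActiveAcked, Inactive }

public sealed record Snapshot(string AlarmFullReference, AlarmState CurrentState);

public static class SnapshotFilter
{
    // Returns the alarms a ConditionRefresh should replay: everything that is
    // still Active or ActiveAcked, optionally restricted to a sub-tree.
    // Assumption: a null or empty prefix means "no filter".
    public static IReadOnlyList<Snapshot> Select(IEnumerable<Snapshot> all, string prefix)
    {
        return all
            .Where(s => s.CurrentState != AlarmState.Inactive)
            .Where(s => string.IsNullOrEmpty(prefix)
                        || s.AlarmFullReference.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }
}
```

Keeping the rule in one pure function would let the 0 / 1 / 100 alarm worker tests exercise it without a live MxAccess session; batching onto the reply stream stays a separate concern.
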
### PR A.5 — gateway: WriteHistorianEvent RPC for sink write-back

**Files** (`src\MxGateway.Server\`, `src\MxGateway.Worker\`, `src\MxGateway.Contracts\Protos\mxaccess_gateway.proto`).

1. New RPC `WriteHistorianEvent(WriteHistorianEventRequest) → WriteHistorianEventReply`. Request carries an `AlarmHistorianRecord` mirroring the existing `Core.AlarmHistorian.AlarmHistorianEvent` payload (alarm id, equipment path, alarm name, alarm-type-name, severity, event kind, message, user, comment, timestamp).
2. Worker maps the record onto `aahClientManaged`'s alarm-event write API (the same path v1's `GalaxyHistorianWriter` used). Worker batches up to N records per write to amortize the COM round-trip.
3. AuthN — new scope `invoke:historian-write`. Cross-cutting with `invoke:write` — keys for OPC UA servers that publish historian data must hold both.

**Tests:**

- Worker test: fake `aahClientManaged` writer; assert batching semantics + retry-on-Bad-status-code behaviour matches v1's `GalaxyHistorianWriter` (per-row outcome reporting).
- Integration: write a record, query it back via existing Historian read APIs, assert round-trip fidelity.

**Sequencing within Track A:** A.1 → A.2 → A.3 → A.4 → A.5. A.1 is mechanical; A.2 + A.3 are the load-bearing changes that unlock lmxopcua side. A.4 + A.5 can ship after lmxopcua starts consuming A.3 output.

## Track B — lmxopcua changes

All five PRs land in `c:\Users\dohertj2\Desktop\lmxopcua\`. Each B-PR depends on a specific A-PR — see the sequencing matrix below.

### PR B.1 — EventPump: dispatch OnAlarmTransition family

**Depends on:** A.1 (proto), A.3 (gateway dispatching the new family).

**Files:**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\EventPump.cs:160` — current `Dispatch(MxEvent ev)` returns early for any non-`OnDataChange` family.
  Add a branch:

  ```csharp
  switch (ev.Family)
  {
      case MxEventFamily.OnDataChange:
          DispatchDataChange(ev);
          break;
      // New in B.1: route alarm transitions to the alarm dispatch path.
      case MxEventFamily.OnAlarmTransition:
          DispatchAlarmTransition(ev);
          break;
      // Any other family is still ignored, as before.
      default:
          return;
  }
  ```

- New `DispatchAlarmTransition` translates the proto event into an `AlarmEventArgs` (existing type from `Core.Abstractions`) and raises an internal event the driver subscribes to.
- New `MxAccessSeverityMapper` in `Driver.Galaxy\Runtime\` — maps the MxAccess raw severity into the `AlarmSeverity` enum + the OPC UA numeric severity (250 / 500 / 700 / 900 ladder per v1's `AlarmTracking.md`).

**Tests** (`tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests\Runtime\`):

- `EventPumpAlarmTests` — feed three synthetic MxEvents (raise / ack / clear); assert each fires `OnAlarmEvent` on the driver with correct payload.
- Severity-mapping table tests — every documented MxAccess severity level → expected (`AlarmSeverity`, OPC UA numeric) tuple.

### PR B.2 — GalaxyDriver re-implements IAlarmSource

**Depends on:** A.3 (`AcknowledgeAlarm` RPC available), B.1 (event dispatch).

**Files:**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriver.cs:28` — extend the class declaration:

  ```csharp
  public sealed class GalaxyDriver
      : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable,
        IRediscoverable, IHostConnectivityProbe, IAlarmSource, IDisposable
  ```

- Implement the four `IAlarmSource` members:
  - `SubscribeAlarmsAsync` — no-op returning a sentinel handle. The driver is already subscribed for data; alarm events arrive on the same event stream once the gateway emits the new family. (Same pattern AbCip uses today — see `Driver.AbCip\AbCipDriver.cs:208`.)
  - `UnsubscribeAlarmsAsync` — no-op.
  - `OnAlarmEvent` — wired to the EventPump branch added in B.1.
  - `AcknowledgeAsync` — calls the new gateway RPC via the `IGalaxyAlarmAcknowledger` abstraction (new file, mirrors the `IGalaxyDataWriter` pattern), with `GatewayGalaxyAlarmAcknowledger` as the production implementation in `Runtime\`.
    Resilience wrapping via `AlarmSurfaceInvoker` per existing pattern.
- `DriverInstanceFactory` for Galaxy registers `IGalaxyAlarmAcknowledger` alongside the existing data writer.

**Tests:**

- Subscribe-noop returns a non-null handle; unsubscribe accepts it.
- Acknowledge — fake `IGalaxyAlarmAcknowledger` records the call; assert the request shape and resilience-pipeline routing.
- End-to-end test in `Driver.Galaxy.Tests` — fake gateway emits a raise-then-ack event sequence; assert the driver fires `OnAlarmEvent` twice with matching alarm-id correlation.

### PR B.3 — DriverNodeManager: route to driver-native when present

**Depends on:** B.2.

**Files:**

- `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs` — when registering an `AlarmConditionState` for a Galaxy variable, check whether the driver is `IAlarmSource`. If yes, prefer the `OnAlarmEvent`-driven path; the value-driven sub-attribute path becomes the secondary path that handles transitions the driver-native stream missed (network blip, gateway restart, gateway missing the `$Alarm*` extension on this template).
- `Server\Alarms\AlarmConditionService` — already accepts events from multiple sources; only addition is a `DriverEventOrigin` enum on internal transitions so the dedup logic prefers the richer driver-native record over a stale sub-attribute synthesis.
- `IAlarmAcknowledger` resolution in `DriverNodeManager` — prefer the driver's `IAlarmSource.AcknowledgeAsync` over `DriverWritableAcknowledger` when both are available. Keep `DriverWritableAcknowledger` as the fallback for templates without `$Alarm*` extensions.

**Tests:**

- Two-source-fan-in test: same alarm condition receives both a driver-native ack event and a sub-attribute value update for the same transition; assert no duplicate Part 9 transition fires.
- Acknowledger routing — driver implements `IAlarmSource` → ack-via-RPC; driver implements only `IWritable` → ack-via-write (existing path).
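
One way to picture the B.3 dedup rule (fire a Part 9 transition once, and let the richer driver-native record win over a sub-attribute synthesis of the same transition) is the sketch below. Only the `DriverEventOrigin` name comes from this plan; its members, the `Transition` record, the `Decision` enum, the `TransitionDeduper` class, and the `(ConditionId, Kind)` correlation key are hypothetical. A real implementation would likely also need a time window or sequence number to decide that two reports describe the same transition.

```csharp
using System;
using System.Collections.Generic;

// Ordered so that a higher value means a more authoritative source (assumption).
public enum DriverEventOrigin { SubAttribute = 0, DriverNative = 1 }
public enum TransitionKind { Raise, Acknowledge, Clear }
public enum Decision { Fire, Enrich, Drop }

public sealed record Transition(
    string ConditionId, TransitionKind Kind, DriverEventOrigin Origin, string OperatorComment);

public sealed class TransitionDeduper
{
    private readonly Dictionary<(string, TransitionKind), Transition> _seen = new();

    // Fire   : first report of this transition, raise the Part 9 event.
    // Enrich : a more authoritative record arrived later, update metadata only.
    // Drop   : an equal or richer record was already accepted.
    public Decision Offer(Transition t)
    {
        var key = (t.ConditionId, t.Kind);
        if (!_seen.TryGetValue(key, out var prior))
        {
            _seen[key] = t;
            return Decision.Fire;
        }
        if (t.Origin > prior.Origin)
        {
            _seen[key] = t;
            return Decision.Enrich;
        }
        return Decision.Drop;
    }

    // Call when the condition moves on to a different state so the key can be reused.
    public void Forget(string conditionId, TransitionKind kind) =>
        _seen.Remove((conditionId, kind));
}
```

With this shape, the two-source fan-in test reduces to offering a sub-attribute ack and a driver-native ack for the same condition in either order and asserting that exactly one `Fire` comes back.
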
### PR B.4 — IAlarmHistorianWriter via gateway

**Depends on:** A.5 (`WriteHistorianEvent` RPC available).

**Files:**

- New `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\GatewayAlarmHistorianWriter.cs` implementing `IAlarmHistorianWriter`. Calls the gateway RPC from Track A.5 with the same batch + per-row outcome semantics v1's `GalaxyHistorianWriter` exposed.
- `GalaxyDriverFactory` registers it as a singleton tied to the `DriverInstance`.
- `Server\Phase7\Phase7Composer.ResolveHistorianSink` — already scans registered drivers for an `IAlarmHistorianWriter`. Once GalaxyDriver exposes one, `SqliteStoreAndForwardSink` boots with a real writer attached and the `NullAlarmHistorianSink` fallback no longer applies on Galaxy installs.
- Delete `WriteAlarmEventsRequest` / `WriteAlarmEventsReply` / `IAlarmEventWriter` from the Wonderware sidecar (`src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Ipc\Contracts.cs`, `Ipc\HistorianFrameHandler.cs`, `Ipc\Framing.cs`). The historian sidecar becomes read-only — matches the audit done earlier.

**Tests:**

- `GatewayAlarmHistorianWriter` against a fake gRPC server — single record, batch, per-row failure modes (Ack / RetryPlease / PermanentFail).
- `Phase7Composer` end-to-end — register a Galaxy driver, assert `ResolveHistorianSink` picks `SqliteStoreAndForwardSink` with the new writer attached.

### PR B.5 — docs + memory housekeeping

**Depends on:** B.1 / B.2 / B.3 / B.4 all green on the parity rig.

**Files:**

- `docs\drivers\Galaxy.md` — current text says the driver implements five capability interfaces; update to seven (`IAlarmSource`, `IAlarmHistorianWriter`-via-companion).
- `docs\AlarmTracking.md` — promote a fresh top-level doc that describes the v2-final architecture (driver-native primary path + sub-attribute fallback + scripted-alarm aggregation). Cross-link from `docs\README.md`. The v1 archive stays as historical record.
- `docs\v1\AlarmTracking.md` — extend the existing historical banner with "Restored to functional parity in this epic — see `docs\AlarmTracking.md` for current state."
- Memory entries (`C:\Users\dohertj2\.claude\projects\…\memory\`):
  - Update `project_galaxy_via_mxgateway.md` — add the alarm path restoration.
  - Update `project_server_history_alarm_subsystems.md` — note that `Phase7Composer.ResolveHistorianSink` now finds a writer on Galaxy installs.
- `docs\plans\alarms-over-gateway.md` (this file) — banner the doc `✅ Completed YYYY-MM-DD — historical record.` matching the existing v2-mxgw plan retirement convention.

## Sequencing matrix

```
Track A (mxaccessgw)              Track B (lmxopcua)
─────────────────────────         ─────────────────────────
A.1 proto                         (waits)
  │
  ├──────────────────────────►    B.1 EventPump branch
A.2 worker subscription             │ uses proto types only
  │                                 │ unit-testable without live gw
  │
A.3 gateway dispatch + ack RPC ──►B.2 GalaxyDriver : IAlarmSource
  │                                 │
  │                              ──►B.3 DriverNodeManager routing
  │
A.4 ConditionRefresh                │ (B.3 closes the loop with A.4
  │                                   once ConditionRefresh wired)
  │
A.5 WriteHistorianEvent ─────────►B.4 GatewayAlarmHistorianWriter
                                    │ + sidecar write-path deletion
                                 ──►B.5 docs + memory
```

A.1 + B.1 can land in parallel (B.1's tests use proto types without needing a running gateway). B.1 stays inert until A.3 ships the gateway dispatch — which is fine; the dispatch branch is a no-op until events arrive.

## Test gates

Per PR: unit tests pass + build green + analyzer clean (Roslyn OTOPCUA0001 still wraps every alarm-capability call through `AlarmSurfaceInvoker`).

End-of-epic gate: re-run the parity rig (`docs\v2\Galaxy.ParityRig.md`) with these scenarios added:

1. **Native alarm raise** — Galaxy `$Alarm*` raise with operator-time metadata appears as an OPC UA Part 9 transition with full payload (no longer reconstructed from sub-attribute writes).
2. **Native ack** — OPC UA client acks; assert the gateway records the ack against MxAccess directly (not via sub-attribute write); operator comment present in the resulting `Acknowledged` transition.
3. **ConditionRefresh after reconnect** — disconnect the GalaxyDriver, raise three alarms in Galaxy, reconnect; assert all three appear in the next ConditionRefresh.
4. **Historian write-back** — fire a scripted alarm; assert it arrives in AVEVA Historian via the gateway path (use the existing Historian sidecar's read API to query it back).
5. **Sub-attribute fallback still works** — disable `IAlarmSource` on the GalaxyDriver via test seam, fire a sub-attribute value change; assert Part 9 transition still raised.

Soak target: 24h × 1k tags (light) — same parity-rig harness but extended to also subscribe to alarms. Pass criterion: zero dropped alarm transitions, zero state-machine inversions, zero unhandled exceptions in the AlarmSurfaceInvoker pipeline.

## Risks and mitigations

| Risk | Mitigation |
|---|---|
| MxAccess Toolkit alarm subscription API differs across installed AVEVA versions | PR A.2 verifies against the worker-host's installed Toolkit version; documents the exact API used. Pin the worker DLL set per major MxAccess version if needed. |
| Worker-side alarm subscription leaks between sessions if cleanup is wrong | PR A.2 includes a session-recycle test that asserts no `IAlarmEventSink` instances remain registered after Close. |
| Gateway adds a new auth scope (`invoke:alarm-ack`); existing keys lack it | PR A.3 + A.5 ship with a one-time bootstrap migration: keys with `invoke:write` get the new scope auto-granted on the dev rig and parity rig. Production keys are reissued via `apikey rotate-key` (existing CLI). |
| Two simultaneous alarm sources (driver-native + sub-attribute) double-fire transitions | PR B.3 dedup is the load-bearing design. End-to-end test #1 covers it explicitly. |
| Historian write-back batch fails mid-batch — partial success | The existing `SqliteStoreAndForwardSink.HistorianWriteOutcome` per-row enum + dead-letter retention already handles this; PR A.5 just exposes the same outcome shape over gRPC. |
| Sidecar write-path deletion in B.4 leaves orphan IPC frames in old client builds | The frame-kind enum is forward-compatible (`MessageKind.WriteAlarmEventsRequest = 0x20`). Old clients sending the request to a new sidecar receive `Unsupported message kind`; new clients never send it. Acceptable — same-version deploy is the existing rollout convention. |

## Roll-out

Track A lands first onto `mxaccessgw/main`, deployed to the parity rig. Track B lands onto `lmxopcua/master` once A.3 is live on the rig — earlier Track B PRs can target a feature branch (`feat/alarms-over-gateway`) and merge to master after the rig is fully green.

## Back-out

Each PR is individually revertable. The cleanest back-out point is at the gateway-side enum extension: removing `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` from the proto means EventPump silently drops alarm events again and GalaxyDriver's `OnAlarmEvent` never fires — but the sub-attribute fallback path still produces functional alarms, so the OPC UA surface degrades to v2-current behaviour without breaking.

PR B.4 is the only one with a non-trivial back-out (re-add the deleted sidecar IPC slot if revert needed); land B.4 last and only after end-of-epic gate is green.

## Out of scope (explicit)

- **Other alarm sources beyond Galaxy.** AbCip / FOCAS / OpcUaClient drivers already implement `IAlarmSource`; they're untouched.
- **Modbus / S7 / AbLegacy / TwinCAT alarms.** None of those protocols has a native alarm bus. Alarms on those drivers, if needed, ship via the scripted-alarm path.
- **Multi-Galaxy ack routing.** Today's gateway model is one Galaxy per session; if a deployment splits across galaxies, each gets its own GalaxyDriver and they don't cross-talk. No change.
- **OPC UA Part 9 advanced features** beyond the current scope — shelving, subscribed-to-events-only, branch-state for re-trigger semantics. Future epic if a customer asks.
- **Insight / cloud Historian write-back path.** Track A.5 targets the on-prem AVEVA Historian via aahClientManaged. The cloud variant would mirror the same gateway RPC over the REST API discussed in `docs/histsdk` — separate epic.

## File inventory (touched)

**mxaccessgw:**

- `src\MxGateway.Contracts\Protos\mxaccess_gateway.proto` (A.1, A.5)
- `src\MxGateway.Contracts\Protos\mxaccess_worker.proto` (A.2, A.4, A.5)
- `src\MxGateway.Worker\…\Eventing\` (A.2, A.3, A.4)
- `src\MxGateway.Worker\…\Commands\` (A.3, A.4, A.5)
- `src\MxGateway.Server\Sessions\SessionEventStream.cs` (A.3)
- `src\MxGateway.Server\Rpc\` (A.3, A.4, A.5)
- `src\MxGateway.Server\Auth\Scopes.cs` (A.3, A.4, A.5)
- `MxGateway.Tests`, `MxGateway.Worker.Tests`, `MxGateway.IntegrationTests`

**lmxopcua:**

- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\EventPump.cs` (B.1)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\MxAccessSeverityMapper.cs` *(new — B.1)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\IGalaxyAlarmAcknowledger.cs` *(new — B.2)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\GatewayGalaxyAlarmAcknowledger.cs` *(new — B.2)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\GatewayAlarmHistorianWriter.cs` *(new — B.4)*
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriver.cs` (B.2)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriverFactory.cs` (B.2, B.4)
- `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs` (B.3)
- `src\ZB.MOM.WW.OtOpcUa.Server\Alarms\AlarmConditionService.cs` (B.3)
- `src\ZB.MOM.WW.OtOpcUa.Server\Phase7\Phase7Composer.cs` (B.4)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Ipc\Contracts.cs` (B.4 — deletions)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Ipc\HistorianFrameHandler.cs` (B.4 — deletions)
- `src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Ipc\Framing.cs` (B.4 — deletions)
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests\Runtime\` (B.1, B.2)
- `tests\ZB.MOM.WW.OtOpcUa.Server.Tests\Alarms\` (B.3)
- `tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\` (B.4 — drop deleted-contract tests)
- `docs\drivers\Galaxy.md` (B.5)
- `docs\AlarmTracking.md` *(new — B.5)*
- `docs\v1\AlarmTracking.md` (B.5 — banner update)
- `docs\plans\alarms-over-gateway.md` (B.5 — completion banner)

Total: ~12 source files added/modified in mxaccessgw; ~17 in lmxopcua; ~10 test files. Should land in 4-6 weeks of focused work given the parity-rig dependency for end-to-end validation.