lmxopcua/docs/plans/alarms-over-gateway.md
Joseph Doherty 5ed26d2ec6 docs: alarms-over-gateway plan banner — record A.2 dev-rig finding
Replaces the "ships as a follow-up gated on dev-rig validation"
banner with the actual finding from the dev-rig inspection: the
MXAccess COM Toolkit on this AVEVA install does not expose any
alarm-event family, and the AVEVA alarm-subscription managed
assemblies (aaAlarmManagedClient, ArchestrAAlarmsAndEvents.SDK)
are x64-only and incompatible with the worker's x86 bitness.

Two operator-facing paths forward documented inline:

1. Stay on the value-driven sub-attribute path (current production
   behaviour). Operator-comment fidelity is the only v1 regression.

2. Add an x64 alarm-helper sub-process alongside the worker that
   loads aaAlarmManagedClient and forwards transitions over a
   named-pipe IPC. Recovers full v1 fidelity but adds operational
   complexity.

The full architectural notes live in the mxaccessgw repo at
src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:29:16 -04:00


Plan — alarms over the mxaccessgw gateway

All 19 PRs merged 2026-04-30 — historical record. A.1 / A.2 / A.3 / A.4 (gateway proto + handlers + worker scaffold), B.1 / B.2 / B.3 / B.4 / B.5 (driver, server, docs), C.1 / C.2 (sidecar alarm historian writer), D.1 (deploy script), E.1 / E.2 / E.3 / E.4 / E.5 / E.6 / E.7 (5 client SDKs + lmxopcua client surface). Public contract surface is live; client SDKs ship the new RPCs; the sub-attribute fallback path keeps Galaxy alarms functional today.

⚠️ Worker-side native alarm subscription blocked on a dev-rig finding (2026-04-30): the MXAccess COM Toolkit at C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll exposes no alarm-event family — only OnDataChange, OnWriteComplete, OperationComplete, OnBufferedDataChange. AVEVA's aaAlarmManagedClient / ArchestrAAlarmsAndEvents.SDK assemblies are x64-only and incompatible with the worker's x86 bitness. Operator decision needed before MX_EVENT_FAMILY_ON_ALARM_TRANSITION carries any events: either accept the value-driven sub-attribute path as the production architecture (operator-comment fidelity is the only v1 regression) or add an x64 alarm-helper sub-process alongside the worker. See src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs in the mxaccessgw repo for the architectural notes. Live aahClientManaged alarm-event write call site (SdkAlarmHistorianWriteBackend placeholder from PR C.1) and the D.1 smoke artifact ship once those decisions resolve. The remainder of this document is preserved as the design record.

Coordinated epic across two repos:

  • lmxopcua (this repo) — c:\Users\dohertj2\Desktop\lmxopcua\
  • mxaccessgw — c:\Users\dohertj2\Desktop\mxaccessgw\

Why

PR 7.2 (2026-04-30, commit ae7106d) retired the in-process v1 Galaxy stack (Driver.Galaxy.Host / .Proxy / .Shared + OtOpcUaGalaxyHost Windows service) and migrated Galaxy access to the in-process GalaxyDriver over mxaccessgw's gRPC. In doing so, three v1 capabilities regressed:

  1. Native MxAccess alarm-event metadata — v1's GalaxyAlarmTracker surfaced rich alarm transitions (operator comment, original raise time, ack time, alarm category, native severity). The current architecture reconstructs Part 9 transitions by subscribing to four sub-attribute value updates (InAlarm, Acked, Priority, Description) — fine for raise/clear but loses everything else.
  2. Native MxAccess Acknowledge semantics — v1 called the MxAccess ack API directly from GalaxyAlarmTracker. Today, OPC UA acks are written into the AckMsgWriteRef sub-attribute — semantically valid but a round-trip through the value path that loses operator-comment fidelity.
  3. Alarm-historian write-back path for non-Galaxy alarm sources. v1's GalaxyHistorianWriter implemented IAlarmHistorianWriter and forwarded scripted-alarm transitions (and any future non-Galaxy alarm source — AB CIP ALMD, OpcUaClient A&E, etc.) back to AVEVA Historian via aahClientManaged. PR 7.2 deleted it. Phase7Composer.ResolveHistorianSink now finds no writer and falls back to NullAlarmHistorianSink, so scripted-alarm transitions are queued locally and then silently discarded. Galaxy-native alarms (with $Alarm* extensions) reach AVEVA Historian via System Platform's own HistorizeToAveva toggle on the Galaxy template — that path was never broken and is not in scope for this epic.

gateway.md (mxaccessgw, line 8) explicitly commits the gateway to "full MXAccess parity… preserve MXAccess behavior first… native MXAccess event families." Today's gateway proto exposes only data-change families. Closing the alarm regression and fulfilling that parity statement are the same task.

Goals

  • Restore all three regressed capabilities to feature parity with v1.
  • Keep the v2 architectural split — gateway owns MxAccess transport; lmxopcua owns OPC UA Part 9 semantics, ACL/role enforcement, and multi-source aggregation (driver-native + scripted + sub-attribute).
  • Preserve the value-driven sub-attribute path as a fallback for Galaxy templates that don't carry $Alarm* extensions.
  • Land the work as a sequence of small, independently-reviewable PRs that alternate between repos in dependency order.

Non-goals

  • Reimplementing the Part 9 state machine inside mxaccessgw. The gateway stays UA-agnostic.
  • Reworking the LDAP role-grant or OPC UA AlarmAck ACL surface — those already exist and route through Server/Alarms/IAlarmAcknowledger.
  • Adding alarm support to non-Galaxy drivers (AbCip / FOCAS / OpcUaClient already have their own IAlarmSource implementations; Modbus / S7 / AbLegacy / TwinCAT don't have a native alarm bus and are out of scope).
  • Altering Galaxy template conventions or $Alarm* extensions in the customer's Galaxy.

Before → after

Today (post-PR 7.2):

MxAccess COM (gateway worker)
   │ data-change events only on the MxEvent stream
   ▼
GalaxyDriver (no IAlarmSource)
   │ IWritable / ISubscribable / ITagDiscovery only
   ▼
DriverNodeManager
   ├─ subscribes to four $Alarm* sub-attributes per condition
   ├─ AlarmConditionService rebuilds Part 9 transitions from value updates
   └─ DriverWritableAcknowledger writes AckMsgWriteRef on ack

Phase7Composer.ResolveHistorianSink → NullAlarmHistorianSink
   (scripted-alarm transitions queue → silently discarded)

After this epic:

MxAccess COM (gateway worker)
   │ data-change   ──┐
   │ alarm-transition │
   │ write-complete   ├─► single MxEvent stream (new family added)
   ▼                  ▼
GalaxyDriver : ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable,
               IHostConnectivityProbe, IAlarmSource             ← restored
   ├─ EventPump dispatches OnAlarmTransition family → IAlarmSource.OnAlarmEvent
   ├─ AcknowledgeAsync → gateway RPC AcknowledgeAlarm
   └─ QueryActiveAlarmsAsync → gateway RPC QueryActiveAlarms (ConditionRefresh)

DriverNodeManager
   ├─ rich alarm events from IAlarmSource.OnAlarmEvent → AlarmConditionService
   ├─ value-driven sub-attribute path STILL WORKS for templates without $Alarm
   ├─ DriverWritableAcknowledger preserved as fallback for the value path
   └─ ScriptedAlarmEngine output continues to feed AlarmConditionService

Phase7Composer.ResolveHistorianSink → GatewayAlarmHistorianWriter
   ├─ scripted-alarm transitions → SqliteStoreAndForwardSink
   └─ drain worker → gateway RPC WriteHistorianEvent → AVEVA Historian

Architecture decisions

D1 — Where the Part 9 state machine runs. Stays in lmxopcua's AlarmConditionService. Gateway is UA-agnostic. ScriptedAlarmEngine produces Part 9 transitions with no MxAccess origin; the aggregator must live where all sources converge.

D2 — Where authz on Acknowledge runs. Stays in lmxopcua. The OPC UA AlarmConditionState.OnAcknowledge delegate already checks the session's roles for AlarmAck against the LDAP/role-grant ACL. The gateway should never be reachable in a way that bypasses that check.

D3 — How rich alarm events reach OPC UA clients. New MxEventFamily on the existing StreamEvents RPC (no second stream). Adds latency parity with data-change events, reuses the bounded-channel + worker-side delivery semantics already documented in gateway.md.

D4 — Sub-attribute fallback path stays. Some Galaxy templates won't have $Alarm* extensions yet; the existing value-driven path remains the only way to surface alarms for those templates. Both paths feed AlarmConditionService. Driver-native events take precedence when both are present (more authoritative, lower latency).

D5 — Where the historian writer lives. In the Wonderware historian sidecar, not in the gateway. The sidecar already owns aahClientManaged, already has a WriteAlarmEvents IPC slot defined in Ipc/Contracts.cs, and already dispatches to an IAlarmEventWriter interface — it's just unwired in Program.cs:57. The gateway is for MxAccess (live data + Galaxy hierarchy); the historian sidecar is for aahClientManaged (time-series + alarms historian). Two different SDKs, two different concerns; keep the split. Bonus: completing the sidecar's write path also gives it a clearer long-term role — once the REST-API migration in histsdk\instructions.md takes over reads, write-back keeps the sidecar relevant rather than retiring it as a read-only relic. Galaxy-native alarms bypass this entirely — System Platform's own HistorizeToAveva toggle on the Galaxy template publishes them directly. The sidecar write path is exclusively for non-Galaxy producers (today: scripted alarms; future: AB CIP ALMD or any other lmxopcua-side alarm source the customer wants unified into AVEVA Historian).

Track A — mxaccessgw changes

All four PRs (A.1 through A.4) land in c:\Users\dohertj2\Desktop\mxaccessgw\.

PR A.1 — proto: add alarm-transition event family + ack/query RPCs

Files (src\MxGateway.Contracts\Protos\mxaccess_gateway.proto):

  1. Extend MxEventFamily (line 403):

    MX_EVENT_FAMILY_ON_ALARM_TRANSITION = 5;
    
  2. Extend MxEvent.body oneof (line 395) with:

    OnAlarmTransitionEvent on_alarm_transition = 24;
    
  3. New message OnAlarmTransitionEvent after the existing event-family bodies (line 425+). Carry the full MxAccess alarm payload — alarm name, source object reference, alarm-type-name (e.g. "AnalogLimitAlarm.HiHi"), transition kind enum (Raise / Acknowledge / Clear), severity (raw numeric — keep MxAccess scale; mapping to OPC UA 0-1000 happens server-side in lmxopcua), original_raise_timestamp, transition_timestamp, optional operator_user, optional operator_comment, alarm category string, alarm description. Mirror the field set documented in v1's GalaxyAlarmTracker.

  4. New RPC on MxAccessGateway service (line 11):

    rpc AcknowledgeAlarm(AcknowledgeAlarmRequest) returns (AcknowledgeAlarmReply);
    rpc QueryActiveAlarms(QueryActiveAlarmsRequest) returns (stream ActiveAlarmSnapshot);
    

    AcknowledgeAlarmRequest carries session_id, alarm_full_reference, comment, user_principal. Reply carries MxStatusProxy.

    QueryActiveAlarmsRequest carries session_id, optional alarm_filter_prefix (for ConditionRefresh on a sub-tree). ActiveAlarmSnapshot carries the same fields as OnAlarmTransitionEvent plus current_state enum (Active / ActiveAcked / Inactive).

Tests (MxGateway.Tests — proto/codegen sanity):

  • Round-trip Serialize→Deserialize for the new messages with all-fields populated and empty-optional-fields cases.
  • MxEvent.body oneof selection guard — supplying multiple bodies rejected.

Out of scope: worker-side wiring (PR A.2), gateway-side dispatch (PR A.3). PR A.1 is a pure contract-surface change; nothing functional yet.
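The field set listed in item 3 and the round-trip tests can be pictured with a plain record type. This is only an illustrative Python sketch, not the generated proto code: the snake_case field names follow the prose above, while the types, the timestamp encoding, the enum values and the `to_wire` / `from_wire` helpers are assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional


class TransitionKind(Enum):
    # Numeric values are placeholders, not the proto's actual tag numbers.
    RAISE = 1
    ACKNOWLEDGE = 2
    CLEAR = 3


@dataclass(frozen=True)
class AlarmTransition:
    alarm_name: str
    source_object_reference: str
    alarm_type_name: str                 # e.g. "AnalogLimitAlarm.HiHi"
    transition_kind: TransitionKind
    severity: int                        # raw MxAccess scale, not OPC UA 0-1000
    original_raise_timestamp: float      # assumed encoding: seconds since epoch
    transition_timestamp: float
    category: str = ""
    description: str = ""
    operator_user: Optional[str] = None
    operator_comment: Optional[str] = None


def to_wire(transition: AlarmTransition) -> dict:
    """Serialize to a dict, omitting optional fields that are unset."""
    data = asdict(transition)
    data["transition_kind"] = transition.transition_kind.name
    return {key: value for key, value in data.items() if value is not None}


def from_wire(data: dict) -> AlarmTransition:
    """Rebuild the record; missing optionals fall back to their defaults."""
    fields = dict(data)
    fields["transition_kind"] = TransitionKind[fields["transition_kind"]]
    return AlarmTransition(**fields)
```

The two test cases named above (all fields populated, optionals empty) map onto one `from_wire(to_wire(x)) == x` check each.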

PR A.2 — worker: subscribe to MxAccess alarm event source

Files (src\MxGateway.Worker\ — net48/x86):

The MxAccess Toolkit exposes alarm subscription separately from data subscription. Per AVEVA's MXAccess C++ Toolkit reference (canonical doc referenced from gateway.md), alarm events arrive through the IAlarmEventSink interface registered against the MxAccess Alarms collection of an open session, OR via the MxAccess "alarm provider" subscription pattern (depends on Toolkit version on the worker host — verify against the version actually deployed in the worker bin during PR A.2).

  1. Worker subscribes to MxAccess alarms once per session, with a single sink that fans out into the same bounded channel the data-change pump uses (MxGateway.Worker\Eventing\EventChannel.cs or whatever the worker currently calls its sink — verify name during the PR).
  2. Sink translates each MxAccess alarm event into a WorkerEvent proto (defined in mxaccess_worker.proto) carrying the new OnAlarmTransitionEvent body. Reuses the existing worker_sequence counter so ordering is preserved across families.
  3. Worker honours the same backpressure rules as data-change events — newest-dropped on full channel, single dropped-counter metric per family.

Tests (MxGateway.Worker.Tests):

  • Fake IAlarmEventSink source emits canned transitions; assert the worker forwards each as the right WorkerEvent shape.
  • Cancellation test — closing the session unsubscribes from MxAccess alarms cleanly (no leaked sinks if the worker is recycled mid-session).

Out of scope: any gateway-side dispatch, any RPC handler — PR A.2 is worker-internal.
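The backpressure and ordering rules in items 2 and 3 can be sketched as a small bounded channel. The worker itself is .NET; the Python below is illustrative only, and the class name, the family strings and the choice to consume a sequence number even for dropped events are assumptions, not the worker's confirmed behaviour.

```python
from collections import Counter, deque
from itertools import count


class EventChannel:
    """Bounded channel shared by all event families (hypothetical sketch)."""

    def __init__(self, capacity: int):
        self._capacity = capacity
        self._queue = deque()
        self._sequence = count(1)   # one counter shared across families
        self.dropped = Counter()    # one dropped-counter per family

    def publish(self, family: str, body) -> bool:
        # Assumption: the sequence advances even when the event is dropped,
        # so a consumer can notice the gap.
        sequence = next(self._sequence)
        if len(self._queue) >= self._capacity:
            # Newest-dropped: the incoming event is discarded, queued ones stay.
            self.dropped[family] += 1
            return False
        self._queue.append((sequence, family, body))
        return True

    def drain(self) -> list:
        items = list(self._queue)
        self._queue.clear()
        return items
```

Because both families draw from the same counter, a consumer that sorts by sequence sees data-change and alarm-transition events in the order the worker observed them.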

PR A.3 — gateway: dispatch OnAlarmTransition + implement AcknowledgeAlarm

Files (src\MxGateway.Server\):

  1. The session-level event multiplexer (Sessions\SessionEventStream.cs or equivalent — verify name during PR) recognizes the new WorkerEvent body and forwards as an MxEvent with family MX_EVENT_FAMILY_ON_ALARM_TRANSITION to the gRPC StreamEvents consumer.
  2. New RPC handler AcknowledgeAlarm builds an MxAccess WorkerCommand carrying an AlarmAcknowledgeCommand (new in mxaccess_worker.proto under PR A.1). Forwarded to the worker; reply mapped to AcknowledgeAlarmReply with the MxAccess MxStatus proxy populated.
  3. AuthN — same API-key + scope check as existing RPCs. Add a new scope invoke:alarm-ack (mirrors invoke:write granularity); existing keys without it return PERMISSION_DENIED.

Tests (MxGateway.Tests, MxGateway.IntegrationTests):

  • Unit: dispatch test — fake worker emits an AlarmTransition event; assert the gateway forwards it on the live StreamEvents channel of every subscribed session.
  • Integration: end-to-end against the real worker (requires the parity rig setup — see docs\v2\Galaxy.ParityRig.md in lmxopcua for the MxAccess-installed dev box prerequisites). Trigger a real Galaxy alarm, assert the gateway emits OnAlarmTransition. Acknowledge via the new RPC, assert the alarm transitions to ActiveAcked and an Acknowledge transition event is emitted back.
  • AuthN: existing key without invoke:alarm-ack scope rejected.
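The scope rule from item 3 amounts to a lookup from RPC name to required scope. A minimal Python sketch follows, assuming a hypothetical `check_scope` helper and a set of scope strings per API key; the `invoke:alarm-query` entry comes from PR A.4, and the real gateway's authorization code may be shaped quite differently.

```python
# RPC name -> scope the API key must carry (scope names taken from this plan).
REQUIRED_SCOPE = {
    "AcknowledgeAlarm": "invoke:alarm-ack",
    "QueryActiveAlarms": "invoke:alarm-query",
}


def check_scope(rpc_name: str, key_scopes: set) -> str:
    """Return a gRPC-style status name for the scope check only.

    RPCs that are not in the table are treated as out of scope for this
    sketch and pass through; their own checks live elsewhere.
    """
    required = REQUIRED_SCOPE.get(rpc_name)
    if required is None or required in key_scopes:
        return "OK"
    return "PERMISSION_DENIED"
```

An existing key that only holds `invoke:write` therefore fails `AcknowledgeAlarm` until the new scope is granted, which is the case the AuthN test covers.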

PR A.4 — gateway: ConditionRefresh snapshot via QueryActiveAlarms

Files (src\MxGateway.Server\, src\MxGateway.Worker\):

  1. Worker exposes a QueryActiveAlarmsCommand that walks the session's active-alarm collection and streams snapshots back through the existing command-reply channel. The MxAccess Toolkit's Alarms.GetActive() (verify exact API name during PR) is the underlying call.
  2. Gateway RPC QueryActiveAlarms opens a server-streaming reply, batches snapshots through.
  3. AuthN — new scope invoke:alarm-query (separate from ack so a read-only client can refresh without ack rights).

Tests:

  • Worker-test: synthetic active set of 0 / 1 / 100 alarms; assert pagination respects worker channel capacity.
  • Integration: against the parity rig, assert a ConditionRefresh after reconnect returns every alarm currently Active or ActiveAcked in the Galaxy.
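The snapshot walk can be sketched as a generator that filters by prefix and yields fixed-size batches. This is a Python illustration under stated assumptions: snapshots are plain dicts, the `reference` key and the `batch_size` parameter are hypothetical, and only the `current_state` values come from the plan.

```python
from typing import Iterable, Iterator, List, Optional


def query_active_alarms(
    snapshots: Iterable[dict],
    prefix: Optional[str] = None,
    batch_size: int = 50,
) -> Iterator[List[dict]]:
    """Yield batches of alarms that are Active or ActiveAcked."""
    batch: List[dict] = []
    for snapshot in snapshots:
        if snapshot["current_state"] == "Inactive":
            continue
        if prefix is not None and not snapshot["reference"].startswith(prefix):
            continue
        batch.append(snapshot)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final partial batch.
        yield batch
```

With an empty active set the generator yields nothing, which corresponds to the zero-alarm case in the worker test.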

Sequencing within Track A: A.1 → A.2 → A.3 → A.4. A.1 is mechanical; A.2 + A.3 are the load-bearing changes that unlock lmxopcua side. A.4 can ship after lmxopcua starts consuming A.3 output. The historian-write capability moved to Track C below — the gateway intentionally stays out of aahClientManaged.

Track B — lmxopcua changes

All five PRs land in c:\Users\dohertj2\Desktop\lmxopcua\. Each B-PR depends on a specific A-PR — see the sequencing matrix below.

PR B.1 — EventPump: dispatch OnAlarmTransition family

Depends on: A.1 (proto), A.3 (gateway dispatching the new family).

Files:

  • src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\EventPump.cs:160 — current Dispatch(MxEvent ev) returns early for any non-OnDataChange family. Add a branch:
    switch (ev.Family) {
      case MxEventFamily.OnDataChange: DispatchDataChange(ev); break;
      case MxEventFamily.OnAlarmTransition: DispatchAlarmTransition(ev); break;
      default: return;
    }
    
  • New DispatchAlarmTransition translates the proto event into an AlarmEventArgs (existing type from Core.Abstractions) and raises an internal event the driver subscribes to.
  • New MxAccessSeverityMapper in Driver.Galaxy\Runtime\ — maps the MxAccess raw severity into the AlarmSeverity enum + the OPC UA numeric severity (250 / 500 / 700 / 900 ladder per v1's AlarmTracking.md).

Tests (tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests\Runtime\):

  • EventPumpAlarmTests — feed three synthetic MxEvents (raise / ack / clear); assert each fires OnAlarmEvent on the driver with correct payload.
  • Severity-mapping table tests — every documented MxAccess severity level → expected (AlarmSeverity, OPC UA numeric) tuple.
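A table-driven mapper keeps the severity tests trivial to write. The Python sketch below is illustrative only: the 250 / 500 / 700 / 900 ladder is from this plan, but the band boundaries, the enum member names and the assumption that a larger raw value means a more severe alarm are placeholders that would need checking against v1's AlarmTracking.md.

```python
from bisect import bisect_right
from enum import Enum


class AlarmSeverity(Enum):
    # Member names are assumed for this sketch.
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"
    CRITICAL = "Critical"


# Placeholder boundaries on the raw MxAccess scale (NOT taken from v1).
_BAND_STARTS = [250, 500, 750]

# One (enum, OPC UA numeric severity) pair per band, lowest band first.
_LADDER = [
    (AlarmSeverity.LOW, 250),
    (AlarmSeverity.MEDIUM, 500),
    (AlarmSeverity.HIGH, 700),
    (AlarmSeverity.CRITICAL, 900),
]


def map_severity(raw: int) -> tuple:
    """Map a raw severity to (AlarmSeverity, OPC UA numeric severity)."""
    return _LADDER[bisect_right(_BAND_STARTS, raw)]
```

Changing the mapping then means editing two lists, and the table test iterates the same lists.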

PR B.2 — GalaxyDriver re-implements IAlarmSource

Depends on: A.3 (AcknowledgeAlarm RPC available), B.1 (event dispatch).

Files:

  • src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriver.cs:28 — extend the class declaration:
    public sealed class GalaxyDriver
        : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable,
          IRediscoverable, IHostConnectivityProbe, IAlarmSource, IDisposable
    
  • Implement the four IAlarmSource members:
    • SubscribeAlarmsAsync — no-op returning a sentinel handle. The driver is already subscribed for data; alarm events arrive on the same event stream once the gateway emits the new family. (Same pattern AbCip uses today — see Driver.AbCip\AbCipDriver.cs:208.)
    • UnsubscribeAlarmsAsync — no-op.
    • OnAlarmEvent — wired to the EventPump branch added in B.1.
    • AcknowledgeAsync — calls the new gateway RPC via the IGalaxyAlarmAcknowledger abstraction (new file, mirrors the IGalaxyDataWriter pattern), with GatewayGalaxyAlarmAcknowledger as the production implementation in Runtime\. Resilience wrapping via AlarmSurfaceInvoker per existing pattern.
  • DriverInstanceFactory for Galaxy registers IGalaxyAlarmAcknowledger alongside the existing data writer.

Tests:

  • Subscribe-noop returns a non-null handle; unsubscribe accepts it.
  • Acknowledge — fake IGalaxyAlarmAcknowledger records the call; assert the request shape and resilience-pipeline routing.
  • End-to-end test in Driver.Galaxy.Tests — fake gateway emits a raise-then-ack event sequence; assert the driver fires OnAlarmEvent twice with matching alarm-id correlation.
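The shape of the four members and the fake-acknowledger test can be sketched in a few lines. The real driver is C#; this Python version is illustrative, and every class and method name in it is a hypothetical stand-in for the interfaces named above.

```python
import asyncio


class RecordingAcknowledger:
    """Fake stand-in for IGalaxyAlarmAcknowledger that records each call."""

    def __init__(self):
        self.calls = []

    async def acknowledge(self, alarm_ref: str, comment: str, user: str) -> str:
        self.calls.append((alarm_ref, comment, user))
        return "OK"


class GalaxyDriverSketch:
    _SENTINEL_HANDLE = object()

    def __init__(self, acknowledger):
        self._acknowledger = acknowledger
        self._handlers = []

    async def subscribe_alarms(self):
        # No-op: alarm events already arrive on the shared event stream.
        return self._SENTINEL_HANDLE

    async def unsubscribe_alarms(self, handle) -> None:
        return None

    def on_alarm_event(self, handler) -> None:
        self._handlers.append(handler)

    def dispatch_alarm_transition(self, event) -> None:
        # Called by the event pump branch for the alarm-transition family.
        for handler in self._handlers:
            handler(event)

    async def acknowledge(self, alarm_ref: str, comment: str, user: str) -> str:
        return await self._acknowledger.acknowledge(alarm_ref, comment, user)
```

Keeping the acknowledger behind its own abstraction is what lets the test assert the request shape without a gateway in the loop.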

PR B.3 — DriverNodeManager: route to driver-native when present

Depends on: B.2.

Files:

  • src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs — when registering an AlarmConditionState for a Galaxy variable, check whether the driver is IAlarmSource. If yes, prefer the OnAlarmEvent-driven path; the value-driven sub-attribute path becomes the secondary path that handles transitions the driver-native stream missed (network blip, gateway restart, gw missing the $Alarm* extension on this template).
  • Server\Alarms\AlarmConditionService — already accepts events from multiple sources; only addition is a DriverEventOrigin enum on internal transitions so the dedup logic prefers the richer driver-native record over a stale sub-attribute synthesis.
  • IAlarmAcknowledger resolution in DriverNodeManager — prefer the driver's IAlarmSource.AcknowledgeAsync over DriverWritableAcknowledger when both are available. Keep DriverWritableAcknowledger as the fallback for templates without $Alarm* extensions.

Tests:

  • Two-source-fan-in test: same alarm condition receives both a driver-native ack event and a sub-attribute value update for the same transition; assert no duplicate Part 9 transition fires.
  • Acknowledger routing — driver implements IAlarmSource → ack-via-RPC; driver implements only IWritable → ack-via-write (existing path).
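The two-source fan-in rule can be illustrated with a small deduplicator keyed on condition id. This Python sketch is deliberately simplified: it compares only the last transition kind per condition and ignores timestamps, and the class, the origin constants and the string transition kinds are assumptions rather than the actual AlarmConditionService logic.

```python
DRIVER_NATIVE = "driver_native"
SUB_ATTRIBUTE = "sub_attribute"


class TransitionDeduplicator:
    """Tracks the last transition per condition (simplified, no timestamps)."""

    def __init__(self):
        self._last = {}     # condition_id -> (kind, origin)
        self.emitted = []   # transitions that would fire as Part 9 events

    def offer(self, condition_id: str, kind: str, origin: str) -> bool:
        previous = self._last.get(condition_id)
        if previous is not None and previous[0] == kind:
            # Same transition reported again by the other path: never fire
            # twice, but keep the richer driver-native record if it is late.
            if origin == DRIVER_NATIVE:
                self._last[condition_id] = (kind, origin)
            return False
        self._last[condition_id] = (kind, origin)
        self.emitted.append((condition_id, kind, origin))
        return True

    def origin_of(self, condition_id: str) -> str:
        return self._last[condition_id][1]
```

A production version would also need a time window or sequence check so that a genuine second raise after a clear is not mistaken for a duplicate of an older one.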

PR B.4 — IAlarmHistorianWriter via the historian sidecar IPC

Depends on: C.2 (sidecar wires its IAlarmEventWriter). See Track C for the sidecar-side work; B.4 is the lmxopcua-side consumer.

Files:

  • New src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client\SidecarAlarmHistorianWriter.cs implementing IAlarmHistorianWriter. Sends batches over the existing named-pipe IPC using the already-defined WriteAlarmEventsRequest / WriteAlarmEventsReply contracts at Ipc\Contracts.cs:153. No protocol changes — the slot is wired today on the contract side; only the production behaviour and the consumer on this side need to land.
  • Server\Phase7\Phase7Composer.ResolveHistorianSink — already scans for registered IAlarmHistorianWriter instances. Register the new sidecar-backed writer at server bootstrap when the historian sidecar is enabled (appsettings.json Historian:Wonderware:Enabled = true). SqliteStoreAndForwardSink then boots with a real writer attached and the NullAlarmHistorianSink fallback no longer applies on installs that have the sidecar deployed.

Tests:

  • SidecarAlarmHistorianWriter against a fake PipeServer — single record, batch, per-row failure modes (Ack / RetryPlease / PermanentFail) mapped from the sidecar's PerEventOk[] reply.
  • Phase7Composer end-to-end — start the server with the historian sidecar enabled; assert ResolveHistorianSink picks SqliteStoreAndForwardSink with the new sidecar writer attached.

Note on producer scope: This path historizes non-Galaxy alarms only. Galaxy-native alarms (with $Alarm* extensions) reach AVEVA Historian directly via System Platform's HistorizeToAveva toggle on the alarm primitive, with no involvement from us. Today the only live producer feeding SqliteStoreAndForwardSink is Phase7EngineComposer.RouteToHistorianAsync for scripted alarms; future producers (AB CIP ALMD, FOCAS CNC alarms if a customer wants unified storage) plug into the same path.
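How the per-row outcomes drive the store-and-forward queue can be sketched as one drain step. The Python below is illustrative, with an in-memory deque standing in for the SQLite store; the function name, the `write_batch` callable and the dead-letter list are hypothetical, and only the three outcome names come from the plan.

```python
from collections import deque
from enum import Enum


class Outcome(Enum):
    ACK = "Ack"
    RETRY_PLEASE = "RetryPlease"
    PERMANENT_FAIL = "PermanentFail"


def drain_once(queue: deque, write_batch, batch_size: int = 100) -> list:
    """Send one batch; return the events that failed permanently.

    `write_batch` takes a list of events and returns one Outcome per event,
    in input order.
    """
    batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
    outcomes = write_batch(batch)
    retry, dead = [], []
    for event, outcome in zip(batch, outcomes):
        if outcome is Outcome.RETRY_PLEASE:
            retry.append(event)
        elif outcome is Outcome.PERMANENT_FAIL:
            dead.append(event)
        # Outcome.ACK: the event is done and simply leaves the queue.
    # Put retryable events back at the front, preserving their order.
    queue.extendleft(reversed(retry))
    return dead
```

Acknowledged rows leave the queue, retryable rows stay at the head for the next pass, and permanent failures are surfaced to the caller instead of blocking the drain.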

PR B.5 — docs + memory housekeeping

Depends on: B.1 / B.2 / B.3 / B.4 all green on the parity rig + D.1 (deployment refresh) verified on the dev rig.

Files:

  • docs\drivers\Galaxy.md — current text says the driver implements five capability interfaces; update to seven (IAlarmSource, IAlarmHistorianWriter-via-companion).
  • docs\AlarmTracking.md — promote a fresh top-level doc that describes the v2-final architecture (driver-native primary path + sub-attribute fallback + scripted-alarm aggregation). Cross-link from docs\README.md. The v1 archive stays as historical record.
  • docs\v1\AlarmTracking.md — extend the existing historical banner with "Restored to functional parity in this epic — see docs\AlarmTracking.md for current state."
  • Memory entries (C:\Users\dohertj2\.claude\projects\…\memory\):
    • Update project_galaxy_via_mxgateway.md — add the alarm path restoration.
    • Update project_server_history_alarm_subsystems.md — note that Phase7Composer.ResolveHistorianSink now finds a writer on Galaxy installs.
  • docs\plans\alarms-over-gateway.md (this file) — banner the doc "✅ Completed YYYY-MM-DD — historical record", matching the existing v2-mxgw plan retirement convention.

Track C — historian sidecar wires the dormant write path

The Wonderware historian sidecar at src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\ is a separately deployable Windows service (NSSM-wrapped) that already loads aahClientManaged x64 and serves a named-pipe IPC for read operations. The WriteAlarmEvents IPC slot is defined but unwired (Program.cs:57 constructs HistorianFrameHandler without an alarmWriter). Track C completes that slot. Two PRs in the sidecar + one consumer-side PR (B.4) in lmxopcua finish the path.

PR C.1 — sidecar: AahClientManagedAlarmEventWriter

Files (src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Backend\):

  1. New AahClientManagedAlarmEventWriter.cs implementing the existing IAlarmEventWriter interface (defined in Ipc\HistorianFrameHandler.cs:242).
  2. Implementation calls aahClientManaged's alarm-event write API — the same path v1's GalaxyHistorianWriter used. Use the existing HistorianClusterEndpointPicker for multi-node routing so write failures fail over the same way reads do.
  3. Batch size + retry behaviour mirrors v1's GalaxyHistorianWriter per-row outcome reporting (HistorianWriteOutcome enum: Ack / PermanentFail / RetryPlease). Map MxStatus codes onto outcomes.
  4. Reuses HistorianDataSource's existing connection-pool / health gating — no new TCP work needed; the same session that serves reads can issue writes too.

Tests (tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\):

  • Outcome-mapping table: every documented MxStatus on alarm-write → expected HistorianWriteOutcome.
  • Batching: 1 / 100 / 1000 events through a fake aahClientManaged writer; assert per-row outcome list parallel to input order.
  • Cluster failover: primary node returns BadCommunicationError; picker rotates to secondary; assert eventual success.
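The failover behaviour in item 2 and the last test reduces to trying nodes in picker order until one accepts the batch. A minimal Python sketch, assuming a hypothetical `write` callable and a local `BadCommunicationError` exception standing in for the status the real backend reports:

```python
class BadCommunicationError(Exception):
    """Stand-in for the communication failure returned by a historian node."""


def write_with_failover(nodes: list, write, events: list):
    """Try each node in order; return the first successful result."""
    if not nodes:
        raise ValueError("no historian nodes configured")
    last_error = None
    for node in nodes:
        try:
            return write(node, events)
        except BadCommunicationError as error:
            # Rotate to the next node; remember the failure for reporting.
            last_error = error
    raise last_error
```

Only communication failures trigger rotation in this sketch; any other exception propagates immediately so a malformed batch is not replayed against every node.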

PR C.2 — sidecar: wire IAlarmEventWriter into Program.cs

Files (src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Program.cs):

  1. Build an AahClientManagedAlarmEventWriter next to the existing BuildHistorian() call.
  2. Pass it to HistorianFrameHandler (currently constructed at line 57 without an alarmWriter). The dispatcher already routes WriteAlarmEventsRequest through _alarmWriter when non-null (HistorianFrameHandler.cs:158-172); supplying it makes the slot functional.
  3. Gate behind a new env var OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED (default true when OTOPCUA_HISTORIAN_ENABLED=true). Lets a read-only deployment skip the writer registration if needed.
  4. Update Install-Services.ps1 install-time env block in lmxopcua's scripts\install\ to include the new toggle.

Tests:

  • Program.cs unit-test seam: assert handler is constructed with alarm writer when enabled and without when disabled.
  • Live integration (parity rig): write a synthetic alarm event through the IPC; query it back via ReadEvents; assert round-trip fidelity.
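The gating rule in item 3 is small enough to state as code. The Python sketch below is illustrative: the two variable names are from this plan, while the accepted truthy spellings and the choice to keep the writer off whenever the historian itself is disabled are assumptions.

```python
def _flag(env: dict, name: str, default: bool) -> bool:
    raw = env.get(name)
    if raw is None:
        return default
    # Assumed truthy spellings; the sidecar's real parser may differ.
    return raw.strip().lower() in ("1", "true", "yes")


def alarm_write_enabled(env: dict) -> bool:
    """Decide whether the alarm writer should be registered."""
    if not _flag(env, "OTOPCUA_HISTORIAN_ENABLED", False):
        return False
    # Defaults to on once the historian is enabled; an explicit false opts out.
    return _flag(env, "OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED", True)
```

In a real process the `env` argument would be `os.environ`; passing a dict keeps the seam testable without touching the environment.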

Sequencing within Track C: C.1 → C.2.

C.2's lmxopcua-side consumer is PR B.4 in Track B, which depends on C.2 being deployed.

Track E — client surface refresh

Two surfaces become user-visible when the alarm path lights up: the mxaccessgw client SDKs (5 languages, each with its own CLI) that consume the new OnAlarmTransition event family + AcknowledgeAlarm / QueryActiveAlarms RPCs directly, and the lmxopcua OPC UA-facing clients (Client.CLI, Client.UI) that consume the richer Part 9 condition payload through the OPC UA server. Both need updates so the new fields actually reach end-users; without Track E, the data arrives at the gateway / OPC UA server but the off-the-shelf clients display the same five columns they did under v2-pre-this-epic.

Track E is split per-language so each PR stays small and reviewable. PRs E.2 through E.6 are independent — they share only the proto regen from E.1 — and can land in parallel by whoever owns each language binding.

PR E.1 — regenerate proto across all client SDKs

Depends on: A.1 merged (proto change live).

Files (c:\Users\dohertj2\Desktop\mxaccessgw\clients\):

  1. .NET — codegen runs on csproj rebuild via Grpc.Tools; just rebuild MxGateway.Client.csproj after pulling A.1.
  2. Python — run clients\python\generate-proto.ps1; commit the regenerated _pb2.py + _pb2_grpc.py files under clients\python\src\.
  3. Go — run clients\go\generate-proto.ps1; commit the regenerated *.pb.go + *_grpc.pb.go files under clients\go\mxgateway\.
  4. Java — Gradle's protobuf-gradle-plugin regenerates on gradle build; verify the new types appear in the build output. Commit any pinned generated source under clients\java\mxgateway-client\src\main\java\ if that's the convention (check JavaClientDesign.md).
  5. Rust — build.rs runs tonic-build on the proto; just cargo build. Generated code lives under clients\rust\target\ (gitignored) — nothing to commit; verify the new types compile.

No hand-written code in this PR. Pure regen + commit of generated artifacts. Per-language pre-existing proto-regen tests in each client's test suite must stay green.

PR E.2 — .NET client SDK + CLI

Depends on: E.1, A.3 (gateway alarm dispatch + ack RPC live).

Files (clients\dotnet\MxGateway.Client\ + MxGateway.Client.Cli\):

  1. MxGatewayClient.cs — new public methods:
    IAsyncEnumerable<AlarmTransition> SubscribeAlarmsAsync(
        MxGatewaySession session,
        AlarmFilter? filter = null,
        CancellationToken ct = default);
    Task<MxStatus> AcknowledgeAlarmAsync(
        MxGatewaySession session,
        string alarmFullReference,
        string comment,
        string userPrincipal,
        CancellationToken ct = default);
    IAsyncEnumerable<ActiveAlarmSnapshot> QueryActiveAlarmsAsync(
        MxGatewaySession session,
        string? filterPrefix = null,
        CancellationToken ct = default);
    
    Existing MxGatewayClientRetryPolicy covers the new operations without bespoke retry config.
  2. MxGateway.Client.Cli — add alarms verb with subcommands: subscribe (streams transitions until cancelled), acknowledge --ref <full-ref> --comment "<text>", query-active [--prefix <equipment>]. Output formatting mirrors the existing events stream verb (default human-readable + --json flag for machine output).
  3. AuthN — MxGatewayClientOptions validates new scopes invoke:alarm-ack / invoke:alarm-query exist on the API key when those operations are invoked; pre-flight check fails fast with a clear error rather than letting the gateway return PERMISSION_DENIED mid-stream.

Tests (clients\dotnet\MxGateway.Client.Tests\):

  • FakeGatewayTransport extended to emit OnAlarmTransition events; assert SubscribeAlarmsAsync yields each as the right payload shape.
  • Ack: assert request shape, retry policy, and error wrapping (Unauthenticated → MxGatewayAuthenticationException, PermissionDenied → MxGatewayAuthorizationException, resource-exhausted → MxGatewayException with the right message).
  • CLI verb tests in MxGatewayClientCliTests.cs — argument parsing, JSON output shape, exit codes.

PR E.3 — Python client SDK + CLI

Depends on: E.1.

Files (clients\python\src\mxgateway\ + the existing CLI entry point — verify the exact name during PR; PythonClientDesign.md documents it):

  1. New module alarms.py exposing async helpers:
    async def subscribe_alarms(session, *, filter=None) -> AsyncIterator[AlarmTransition]: ...
    async def acknowledge_alarm(session, *, alarm_ref, comment, user) -> MxStatus: ...
    async def query_active_alarms(session, *, prefix=None) -> AsyncIterator[ActiveAlarmSnapshot]: ...
    
  2. CLI: add alarms subscribe / acknowledge / query-active verbs. Use the same JSON output schema as E.2's CLI so cross-language tooling can parse either.
  3. Type stubs (*.pyi) updated for the new types.

Tests (clients\python\tests\):

  • pytest-asyncio fixtures using a stub gRPC server; assert each helper's request/response shape.
  • CLI smoke via subprocess + captured stdout JSON comparison.
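One way the `subscribe_alarms` helper could sit on top of the existing event stream is as a filtering async generator. This is a sketch under assumptions: `StubSession`, its `stream_events` method, the dict-shaped events and the `"on_alarm_transition"` family string are all hypothetical stand-ins, not the SDK's actual types.

```python
import asyncio
from typing import AsyncIterator, Callable, Optional


class StubSession:
    """Test double that replays canned events (hypothetical shape)."""

    def __init__(self, events: list):
        self._events = events

    async def stream_events(self) -> AsyncIterator[dict]:
        for event in self._events:
            await asyncio.sleep(0)  # yield control like a real stream would
            yield event


async def subscribe_alarms(
    session, *, filter: Optional[Callable[[dict], bool]] = None
) -> AsyncIterator[dict]:
    """Yield only alarm-transition events, optionally narrowed by `filter`."""
    async for event in session.stream_events():
        if event.get("family") != "on_alarm_transition":
            continue
        if filter is not None and not filter(event):
            continue
        yield event


async def collect(stream: AsyncIterator[dict]) -> list:
    return [item async for item in stream]
```

A pytest-asyncio test would await `collect(...)` directly; outside a test runner, `asyncio.run(collect(subscribe_alarms(session)))` does the same job.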

PR E.4 — Go client SDK + CLI

Depends on: E.1.

Files (clients\go\mxgateway\ + clients\go\cmd\):

  1. New alarms.go exposing:
    func (c *Client) SubscribeAlarms(ctx context.Context, opts ...SubscribeOption) (<-chan AlarmTransition, error)
    func (c *Client) AcknowledgeAlarm(ctx context.Context, ref, comment, user string) (MxStatus, error)
    func (c *Client) QueryActiveAlarms(ctx context.Context, prefix string) ([]ActiveAlarmSnapshot, error)
    
  2. CLI: add alarms subcommand under clients\go\cmd\mxgateway-cli\ (verify the binary name in GoClientDesign.md). Same verb shape as E.2 / E.3.
  3. Errors wrapped via errors.Is against named sentinels (ErrAuthFailed, ErrPermissionDenied, etc.) so callers can programmatically distinguish failure modes.

Tests: standard Go table-driven tests against a stub gRPC server under clients\go\internal\testserver\.

PR E.5 — Java client SDK + CLI

Depends on: E.1.

Files (clients\java\mxgateway-client\src\main\java\ + clients\java\mxgateway-cli\):

  1. New methods on the existing client class (verify in JavaClientDesign.md):
    Flowable<AlarmTransition> subscribeAlarms(Session s, AlarmFilter filter);
    Single<MxStatus> acknowledgeAlarm(Session s, String alarmRef, String comment, String user);
    Flowable<ActiveAlarmSnapshot> queryActiveAlarms(Session s, String prefix);
    
    (RxJava idiom matching the existing data-change subscription API; if the existing API uses CompletableFuture instead, follow that convention — verify during PR.)
  2. CLI: same alarms subscribe / acknowledge / query-active verbs.

Tests: JUnit 5 + a stub gRPC server. CLI tested via ProcessBuilder exec + JSON output comparison.

PR E.6 — Rust client SDK

Depends on: E.1.

Files (clients\rust\crates\mxgateway-client\src\ + likely a mxgateway-cli crate — verify in RustClientDesign.md):

  1. New methods on the client struct:
    pub fn subscribe_alarms(&self, filter: Option<AlarmFilter>) -> impl Stream<Item = Result<AlarmTransition>>;
    pub async fn acknowledge_alarm(&self, alarm_ref: &str, comment: &str, user: &str) -> Result<MxStatus>;
    pub fn query_active_alarms(&self, prefix: Option<&str>) -> impl Stream<Item = Result<ActiveAlarmSnapshot>>;
    
  2. CLI: same verb shape.
  3. thiserror-based error enum extended with AlarmAckPermissionDenied etc. variants if the existing pattern uses one.

Tests: tokio::test against a stub gRPC server built on the tonic-build-generated service traits. CLI tested via assert_cmd.

PR E.7 — lmxopcua OPC UA-facing client refresh

Depends on: B.2 + B.3 (server-side payload final on the OPC UA wire). Independent of E.2-E.6 — different consumer surface (OPC UA Part 9, not gateway gRPC).

Files (c:\Users\dohertj2\Desktop\lmxopcua\src\):

  1. Core.Abstractions\AlarmEventArgs.cs (extend, not new) — add optional fields the new path surfaces:
    • OperatorComment (nullable string — populated by the native ack path; null on sub-attribute fallback path)
    • OriginalRaiseTimestampUtc (nullable; null on fallback path)
    • AlarmCategory (nullable string)
    • AlarmTypeName (already exists per v1 docs — leave alone)
  2. Server\OpcUa\DriverNodeManager.cs — populate the corresponding OPC UA Part 9 condition fields when the new payload is non-null: Comment (from OperatorComment), Time (from OriginalRaiseTimestampUtc when present, else event arrival time), ConditionClassName (from AlarmCategory if mapping is defined).
  3. Client.Shared\Models\AlarmEventArgs.cs — mirror the new fields on the client-side DTO.
  4. Client.CLI\Commands\AlarmsCommand.cs — add columns under a new --verbose flag, plus full payload under --json. Default output stays five-column compatible.
  5. Client.UI\ViewModels\AlarmEventViewModel.cs — bind the new fields. Add columns to Views\AlarmsView.axaml (collapsible under a "Show details" toggle so the default view stays compact). Surface OperatorComment in AckAlarmWindow.axaml as a prepopulated default when re-acknowledging an already-acked alarm.
  6. docs\Client.CLI.md — add the new --verbose and --json flag examples to the alarms section.
  7. docs\Client.UI.md — add a screenshot or description of the "Show details" expansion behavior.
  8. docs\reqs\ClientRequirements.md — line 116 + 153 reference the alarm subscription contract; extend the field list to cover the new payload.
  9. docs\AlarmTracking.md (new in B.5) — wire in client-side examples.
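
The field-population rule in item 2 can be sketched as follows. This is an illustration in Python rather than the server's C#; the dataclass, the function name, and the category-to-class mapping table are hypothetical, and only the fallback behaviour (Comment only when present, Time falling back to arrival time, ConditionClassName only when a mapping is defined) is taken from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AlarmEventArgs:
    # Mirrors the optional fields listed in item 1; all are None on the fallback path.
    operator_comment: Optional[str] = None
    original_raise_timestamp_utc: Optional[datetime] = None
    alarm_category: Optional[str] = None


# Hypothetical mapping; the plan only says "if mapping is defined".
CATEGORY_TO_CONDITION_CLASS = {"Process": "ProcessConditionClassType"}


def part9_fields(args: AlarmEventArgs, arrival_utc: datetime) -> dict:
    """Build the Part 9 condition fields, falling back where the payload is null."""
    raised = args.original_raise_timestamp_utc
    fields = {"Time": raised if raised is not None else arrival_utc}
    if args.operator_comment is not None:
        fields["Comment"] = args.operator_comment
    condition_class = CATEGORY_TO_CONDITION_CLASS.get(args.alarm_category)
    if condition_class is not None:
        fields["ConditionClassName"] = condition_class
    return fields


ARRIVAL = datetime(2026, 4, 30, 12, 0, 5, tzinfo=timezone.utc)
RAISED = datetime(2026, 4, 30, 12, 0, 0, tzinfo=timezone.utc)

native = part9_fields(AlarmEventArgs("checked valve", RAISED, "Process"), ARRIVAL)
fallback = part9_fields(AlarmEventArgs(), ARRIVAL)
```

With the native payload all three fields are set; on the sub-attribute fallback path only Time is populated, from the arrival timestamp.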

Tests:

  • Client.Shared.Tests: DTO round-trip through the alarm event pump with all fields populated and all-null cases.
  • Client.CLI.Tests: --verbose column ordering, --json schema validation, default output stays five-column.
  • Client.UI.Tests: AlarmEventViewModel bindings exposed, collapsible-detail toggle behavior.

Sequencing within Track E:

E.1 first (mechanical). E.2-E.7 can land in parallel. E.7 has its own dependency chain inside lmxopcua (B.2 + B.3) and doesn't gate any other E PR. The .NET client (E.2) is the only language SDK lmxopcua consumes today; if the gateway repo's release schedule prefers landing E.2 first and shipping E.3-E.6 in a follow-up release, that's a valid sequence — the customer-facing constraint is "at least one language SDK ships at the same time as A.4 lights up the gateway dispatch."

Track D — deployment refresh

The dev box at DESKTOP-6JL3KKO runs three live services from C:\publish\ (installed in the session that produced commit ea04547's install scripts). Once Tracks A / B / C are merged, the deployed binaries need to be refreshed so the running services pick up the new alarm path. Track D is one PR — pure ops, no code change.

PR D.1 — refresh C:\publish + restart services

Depends on: A.4 + B.4 + C.2 merged (every code-change PR landed).

Order matters: services must stop in reverse-dependency order (OtOpcUa → OtOpcUaWonderwareHistorian → MxAccessGw) and start in forward-dependency order (MxAccessGw → OtOpcUaWonderwareHistorian → OtOpcUa). Touching binaries while a dependent service holds them locked produces the publish-time MSB3027 file-lock error caught during the original install (see commit 80104ca).
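
A small sketch of the ordering rule: keep the start order as the single source of truth and derive the stop order by reversing it. The function names are illustrative and not part of Refresh-Services.ps1; the sketch only builds the nssm command lines and does not run them.

```python
# Forward-dependency (start) order, as listed above.
START_ORDER = ["MxAccessGw", "OtOpcUaWonderwareHistorian", "OtOpcUa"]


def stop_order(start_order):
    """Stop order is the start order reversed, so dependents go down first."""
    return list(reversed(start_order))


def nssm_commands(start_order):
    """Return the stop commands followed by the start commands, as strings only."""
    stops = [f"nssm stop {name}" for name in stop_order(start_order)]
    starts = [f"nssm start {name}" for name in start_order]
    return stops + starts


commands = nssm_commands(START_ORDER)
```

Deriving one list from the other avoids the two sequences drifting apart if a fourth service is added later.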

Steps (run as a single PowerShell session on the deploy host):

  1. Stop in reverse order:

    nssm stop OtOpcUa
    nssm stop OtOpcUaWonderwareHistorian
    nssm stop MxAccessGw
    Start-Sleep -Seconds 3
    Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Server, `
                OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
        Stop-Process -Force
    
  2. Refresh mxaccessgw binaries (Track A output):

    $gwSrc = "C:\Users\dohertj2\Desktop\mxaccessgw"
    dotnet build "$gwSrc\src\MxGateway.Worker" -c Release
    dotnet build "$gwSrc\src\MxGateway.Server" -c Release
    
    Copy-Item -Recurse -Force `
        "$gwSrc\src\MxGateway.Server\bin\Release\net10.0\*" `
        "C:\publish\mxaccessgw\Server\"
    Copy-Item -Recurse -Force `
        "$gwSrc\src\MxGateway.Worker\bin\x86\Release\net48\*" `
        "C:\publish\mxaccessgw\Worker\"
    
  3. Refresh OtOpcUa + historian sidecar binaries (Tracks B + C output):

    $repo = "C:\Users\dohertj2\Desktop\lmxopcua"
    dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Server" `
        -c Release -o "C:\publish\lmxopcua"
    dotnet publish "$repo\src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware" `
        -c Release -o "C:\publish\lmxopcua\WonderwareHistorian"
    
  4. Update service env block if Track C added the new toggle:

    # Pull existing env, append OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true
    # (default-on per C.2 design, but explicit assignment lets us flip false
    # for read-only deployments without re-installing)
    nssm set OtOpcUaWonderwareHistorian AppEnvironmentExtra `
        (((nssm get OtOpcUaWonderwareHistorian AppEnvironmentExtra) `
          + "`r`nOTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true"))
    
  5. Start in forward order:

    nssm start MxAccessGw
    Start-Sleep -Seconds 4
    nssm start OtOpcUaWonderwareHistorian
    Start-Sleep -Seconds 4
    nssm start OtOpcUa
    Start-Sleep -Seconds 8
    
  6. Smoke verification:

    foreach ($s in 'MxAccessGw','OtOpcUaWonderwareHistorian','OtOpcUa') {
        (Get-Service $s).Status
    }
    foreach ($p in 5120, 4840, 4841) {
        Get-NetTCPConnection -LocalPort $p -State Listen `
            -ErrorAction SilentlyContinue
    }
    Get-Content "C:\publish\lmxopcua\logs\otopcua-*.log" -Tail 20
    Get-Content "C:\publish\mxaccessgw\stdout.log" -Tail 20
    Get-Content "C:\ProgramData\OtOpcUa\historian-wonderware-*.log" -Tail 10
    

    Pass criterion: all three services Running; ports 5120 + 4840 listening; sidecar log shows Wonderware historian sidecar serving — pipe=OtOpcUaWonderwareHistorian; OtOpcUa log shows OPC UA server started — endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa and a new line IAlarmHistorianWriter resolved: Sidecar (added in B.4).

  7. Functional verification — fire one alarm of each kind and assert it propagates:

    • Galaxy-native: raise the OtOpcUaParityTest_001.Counter $Alarm* extension via Galaxy's alarm-fire mechanism; assert an OPC UA Part 9 transition reaches a connected otopcua-cli alarms subscriber with rich payload (operator-comment field non-null, original-raise-timestamp present). This validates Track A + B.1 + B.2 + B.3.
    • Scripted: author a one-line scripted alarm in the Admin UI against any always-true predicate; assert the transition lands in AVEVA Historian via aaHistClientTrend query (or Driver.Historian.Wonderware.IntegrationTests with a query for the alarm event). Validates Track C + B.4.
    • Sub-attribute fallback: disable IAlarmSource on the GalaxyDriver via the test seam (B.3 will introduce one); fire an alarm; assert Part 9 transition still raised by the value-driven path. Validates the fallback wasn't broken.

Files:

  • scripts\install\Refresh-Services.ps1 (new — automates the above)
  • docs\v2\dev-environment.md — add the refresh script to the dev workflow section.

Tests: smoke run on the dev rig (DESKTOP-6JL3KKO) producing docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md with the captured log tails + smoke-test assertions. Captured artifact lands as part of the PR.
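
One way the smoke-test assertions in the captured artifact could be evaluated is a plain substring check over the log tails. The marker strings below are the prefixes of the log lines quoted in the step 6 pass criterion; the function name and the sample tail are made up for illustration.

```python
# Substrings taken from the step 6 pass criterion, keyed by which log they belong to.
REQUIRED_MARKERS = {
    "sidecar": ["Wonderware historian sidecar serving"],
    "otopcua": ["OPC UA server started", "IAlarmHistorianWriter resolved: Sidecar"],
}


def missing_markers(log_name, tail_lines):
    """Return the required markers that do not appear anywhere in the log tail."""
    text = "\n".join(tail_lines)
    return [m for m in REQUIRED_MARKERS[log_name] if m not in text]


# Illustrative tail only; real lines come from Get-Content ... -Tail on the rig.
sample_otopcua_tail = [
    "OPC UA server started (sample line)",
    "IAlarmHistorianWriter resolved: Sidecar",
]

otopcua_missing = missing_markers("otopcua", sample_otopcua_tail)
sidecar_missing = missing_markers("sidecar", ["unrelated line"])
```

An empty list for every log means the log half of the pass criterion holds; the service-status and port checks remain separate assertions.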

Rollback: the refresh script keeps a timestamped backup of the existing C:\publish\mxaccessgw\ and C:\publish\lmxopcua\ trees before overwriting (mirrored to C:\publish\.backup-YYYY-MM-DD\). Rollback is a stop / restore-from-backup / start sequence; no service re-install needed since the NSSM service definitions don't change.
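
The backup-then-restore flow can be sketched with the standard library. This is an illustration in Python, not the PowerShell the refresh script uses; the function names are hypothetical, and the demo runs in a temporary directory rather than C:\publish.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def backup_tree(src: Path, backup_root: Path, today: date) -> Path:
    """Copy src to <backup_root>/.backup-YYYY-MM-DD/<src name> and return the copy."""
    dest = backup_root / f".backup-{today.isoformat()}" / src.name
    # copytree creates missing parents; it raises if dest already exists,
    # so a second refresh on the same day would need a different suffix.
    shutil.copytree(src, dest)
    return dest


def restore_tree(backup: Path, dest: Path) -> None:
    """Replace dest with the backed-up tree (services must already be stopped)."""
    if dest.exists():
        shutil.rmtree(dest)
    shutil.copytree(backup, dest)


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        live = root / "lmxopcua"
        live.mkdir()
        (live / "version.txt").write_text("old")
        backup = backup_tree(live, root, date(2026, 4, 30))
        (live / "version.txt").write_text("new")  # simulate the refresh overwrite
        restore_tree(backup, live)                # simulate the rollback
        return (live / "version.txt").read_text()
```

The restore step deletes the live tree first so that files added by the refreshed build do not survive the rollback.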

Production deploy: out of scope for D.1 — the dev rig is the only deployment in scope at this point. A separate PR-or-runbook lands the production refresh once the dev rig has soaked for the documented duration (parity-rig validation gate; see "Test gates" above).

Sequencing matrix

Track A (mxaccessgw)              Track B (lmxopcua)              Track C (sidecar)        Track E (clients)
─────────────────────────         ─────────────────────────       ─────────────────────    ──────────────────────────
A.1 proto                         (waits)                         C.1 AahClientManagedWriter   E.1 proto regen ×5 langs
   │                                                                 │                          │ (mechanical, after A.1)
   ├──────────────────────────►   B.1 EventPump branch                │                          │
A.2 worker subscription                  │ uses proto types only      │                          │
   │                                     │ unit-testable               │                          │
   │                                                                 C.2 Program.cs wires        │
A.3 gateway dispatch + ack RPC ──►B.2 GalaxyDriver : IAlarmSource     │                       ──►E.2 .NET SDK + CLI
   │                                     │                            │                       ──►E.3 Python SDK + CLI
   │                              ──►B.3 DriverNodeManager routing    │                       ──►E.4 Go SDK + CLI
   │                                                                  │                       ──►E.5 Java SDK + CLI
   │                                                                  │                       ──►E.6 Rust SDK
A.4 ConditionRefresh                     │                            │                          │
                                         │                            │                          │
                                         B.4 SidecarAlarmHistorianWriter                          │
                                            (depends on C.2 deployed) │                          │
                                                                      │                          │
                                  (B.2 + B.3 done) ────────────────────────────────────────────► E.7 lmxopcua client refresh
                                            │                                                     │
                                            ▼                                                     │
                              Track D (deployment)                                                │
                              ─────────────────────────                                           │
                              D.1 Refresh C:\publish + restart services                           │
                                  (depends on A.4 + B.4 + C.2 + E.2 merged)                       │
                                            ▼                                                     │
                                  ──►B.5 docs + memory + completion banner ◄─────────(E.7 done)──┘

A.1 + B.1 + C.1 + E.1 can all land in parallel — none have cross-repo runtime dependencies. B.1's tests use proto types without needing a running gateway. C.1 is purely sidecar-internal. E.1 is mechanical codegen.

The gateway-side dispatch (A.3) gates B.2 and E.2-E.6. The sidecar-side wiring (C.2) gates B.4. E.7 gates on B.2 + B.3 only — it's the OPC UA client surface, not the gateway client surface.

D.1 (deployment refresh) requires E.2 to also be merged because the deployed MxGateway.Client.dll consumed by GalaxyDriver needs the new methods. E.3-E.6 (other-language SDKs) don't gate D.1 — they ship on their own release cadence.

B.5 (docs sweep) gates on D.1 + E.7 both merged — it's the final "snapshot the as-shipped state" pass.
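
The gates stated in the paragraphs above can be checked mechanically with a topological sort. The sketch below encodes only the gates named in this section (E.3-E.6 are omitted for brevity and share E.2's gate on A.3), using Python's standard graphlib module; the GATES table and function name are illustrative.

```python
from graphlib import TopologicalSorter

# Each PR maps to the set of PRs that must merge before it.
GATES = {
    "B.2": {"A.3"},
    "E.2": {"A.3"},
    "B.4": {"C.2"},
    "E.7": {"B.2", "B.3"},
    "D.1": {"A.4", "B.4", "C.2", "E.2"},
    "B.5": {"D.1", "E.7"},
}


def merge_order(gates):
    """Return one valid merge order; raises graphlib.CycleError on circular gates."""
    return list(TopologicalSorter(gates).static_order())


order = merge_order(GATES)
position = {pr: index for index, pr in enumerate(order)}
```

Because every other PR in the table is a direct or transitive prerequisite of B.5, any valid order ends with B.5, which matches its role as the final docs sweep.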

Test gates

Per PR: unit tests pass + build green + analyzer clean (Roslyn OTOPCUA0001 still wraps every alarm-capability call through AlarmSurfaceInvoker).

End-of-epic gate: re-run the parity rig (docs\v2\Galaxy.ParityRig.md) with these scenarios added:

  1. Native alarm raise — Galaxy $Alarm* raise with operator-time metadata appears as an OPC UA Part 9 transition with full payload (no longer reconstructed from sub-attribute writes).
  2. Native ack — OPC UA client acks; assert the gateway records the ack against MxAccess directly (not via sub-attribute write); operator comment present in the resulting Acknowledged transition.
  3. ConditionRefresh after reconnect — disconnect the GalaxyDriver, raise three alarms in Galaxy, reconnect; assert all three appear in the next ConditionRefresh.
  4. Historian write-back — fire a scripted alarm; assert it arrives in AVEVA Historian via the gateway path (use the existing Historian sidecar's read API to query it back).
  5. Sub-attribute fallback still works — disable IAlarmSource on the GalaxyDriver via test seam, fire a sub-attribute value change; assert Part 9 transition still raised.

Soak target: 24h × 1k tags (light) — same parity-rig harness but extended to also subscribe to alarms. Pass criterion: zero dropped alarm transitions, zero state-machine inversions, zero unhandled exceptions in the AlarmSurfaceInvoker pipeline.

Risks and mitigations

| Risk | Mitigation |
| --- | --- |
| MxAccess Toolkit alarm subscription API differs across installed AVEVA versions | PR A.2 verifies against the worker-host's installed Toolkit version; documents the exact API used. Pin the worker DLL set per major MxAccess version if needed. |
| Worker-side alarm subscription leaks between sessions if cleanup is wrong | PR A.2 includes a session-recycle test that asserts no IAlarmEventSink instances remain registered after Close. |
| Gateway adds a new auth scope (invoke:alarm-ack); existing keys lack it | PR A.3 + A.5 ship with a one-time bootstrap migration: keys with invoke:write get the new scope auto-granted on the dev rig and parity rig. Production keys are reissued via apikey rotate-key (existing CLI). |
| Two simultaneous alarm sources (driver-native + sub-attribute) double-fire transitions | PR B.3 dedup is the load-bearing design. End-to-end test #1 covers it explicitly. |
| Historian write-back batch fails mid-batch, leaving partial success | The existing SqliteStoreAndForwardSink.HistorianWriteOutcome per-row enum + dead-letter retention already handles this; PR A.5 just exposes the same outcome shape over gRPC. |
| Sidecar starts honouring the WriteAlarmEvents slot, so old lmxopcua-side consumers can now reach a previously inert path | The slot returns Success=false, Error="not configured" today; flipping to live writes means a build that speculatively sent the frame would suddenly start producing real historian rows. Inventory of any such caller is empty: WriteAlarmEvents was never invoked from the lmxopcua side; Phase7EngineComposer.RouteToHistorianAsync queues into SqliteStoreAndForwardSink and the drain worker is gated on IAlarmHistorianWriter registration which only the new B.4 path provides. So enabling C.2 without B.4 is safe. |

Roll-out

Track A lands first onto mxaccessgw/main, deployed to the parity rig. Track B lands onto lmxopcua/master once A.3 is live on the rig — earlier Track B PRs can target a feature branch (feat/alarms-over-gateway) and merge to master after the rig is fully green.

Back-out

Each PR is individually revertable. The cleanest back-out point is at the gateway-side enum extension: removing MX_EVENT_FAMILY_ON_ALARM_TRANSITION from the proto means EventPump silently drops alarm events again and GalaxyDriver's OnAlarmEvent never fires — but the sub-attribute fallback path still produces functional alarms, so the OPC UA surface degrades to v2-current behaviour without breaking. PR B.4 is the only one with a non-trivial back-out (re-add the deleted sidecar IPC slot if revert needed); land B.4 last and only after end-of-epic gate is green.

Out of scope (explicit)

  • Other alarm sources beyond Galaxy. AbCip / FOCAS / OpcUaClient drivers already implement IAlarmSource; they're untouched.
  • Modbus / S7 / AbLegacy / TwinCAT alarms. None of those protocols has a native alarm bus. Alarms on those drivers, if needed, ship via the scripted-alarm path.
  • Multi-Galaxy ack routing. Today's gateway model is one Galaxy per session; if a deployment splits across galaxies, each gets its own GalaxyDriver and they don't cross-talk. No change.
  • OPC UA Part 9 advanced features beyond the current scope — shelving, subscribed-to-events-only, branch-state for re-trigger semantics. Future epic if a customer asks.
  • Insight / cloud Historian write-back path. Track A.5 targets the on-prem AVEVA Historian via aahClientManaged. The cloud variant would mirror the same gateway RPC over the REST API discussed in docs/histsdk — separate epic.

File inventory (touched)

mxaccessgw (Track A):

  • src\MxGateway.Contracts\Protos\mxaccess_gateway.proto (A.1)
  • src\MxGateway.Contracts\Protos\mxaccess_worker.proto (A.2, A.4)
  • src\MxGateway.Worker\…\Eventing\ (A.2, A.3, A.4)
  • src\MxGateway.Worker\…\Commands\ (A.3, A.4)
  • src\MxGateway.Server\Sessions\SessionEventStream.cs (A.3)
  • src\MxGateway.Server\Rpc\ (A.3, A.4)
  • src\MxGateway.Server\Auth\Scopes.cs (A.3, A.4)
  • MxGateway.Tests, MxGateway.Worker.Tests, MxGateway.IntegrationTests

lmxopcua — Galaxy driver + server (Track B):

  • src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\EventPump.cs (B.1)
  • src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\MxAccessSeverityMapper.cs (new — B.1)
  • src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\IGalaxyAlarmAcknowledger.cs (new — B.2)
  • src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\Runtime\GatewayGalaxyAlarmAcknowledger.cs (new — B.2)
  • src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriver.cs (B.2)
  • src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy\GalaxyDriverFactory.cs (B.2)
  • src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs (B.3)
  • src\ZB.MOM.WW.OtOpcUa.Server\Alarms\AlarmConditionService.cs (B.3)
  • src\ZB.MOM.WW.OtOpcUa.Server\Phase7\Phase7Composer.cs (B.4)
  • src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client\SidecarAlarmHistorianWriter.cs (new — B.4)
  • tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests\Runtime\ (B.1, B.2)
  • tests\ZB.MOM.WW.OtOpcUa.Server.Tests\Alarms\ (B.3)
  • tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests\ (B.4 — new tests)
  • docs\drivers\Galaxy.md (B.5)
  • docs\AlarmTracking.md (new — B.5)
  • docs\v1\AlarmTracking.md (B.5 — banner update)
  • docs\plans\alarms-over-gateway.md (B.5 — completion banner)

lmxopcua — Wonderware historian sidecar (Track C):

  • src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Backend\AahClientManagedAlarmEventWriter.cs (new — C.1)
  • src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\Program.cs (C.2 — wire writer)
  • scripts\install\Install-Services.ps1 (C.2 — env-var toggle for write-enable)
  • tests\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests\ (C.1 — outcome mapping + batch + cluster failover)

lmxopcua — deployment refresh (Track D):

  • scripts\install\Refresh-Services.ps1 (new — D.1)
  • docs\v2\dev-environment.md (D.1 — document the refresh workflow)
  • docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md (new — D.1 captured smoke run)

mxaccessgw — client SDKs (Track E):

  • clients\proto\ — no source change; downstream codegen consumes A.1
  • .NET (E.2):
    • clients\dotnet\MxGateway.Client\MxGatewayClient.cs
    • clients\dotnet\MxGateway.Client\Alarms\ (new namespace)
    • clients\dotnet\MxGateway.Client.Cli\Verbs\AlarmsVerb.cs (new)
    • clients\dotnet\MxGateway.Client.Tests\AlarmsTests.cs (new)
  • Python (E.3):
    • clients\python\src\mxgateway\alarms.py (new)
    • clients\python\src\mxgateway\cli\alarms.py (new — verify CLI module path)
    • clients\python\tests\test_alarms.py (new)
  • Go (E.4):
    • clients\go\mxgateway\alarms.go (new)
    • clients\go\cmd\mxgateway-cli\alarms.go (new — verify dir name)
    • clients\go\internal\testserver\alarms_test.go (new)
  • Java (E.5):
    • clients\java\mxgateway-client\src\main\java\…\AlarmsApi.java (new)
    • clients\java\mxgateway-cli\src\main\java\…\AlarmsCommand.java (new)
    • clients\java\mxgateway-client\src\test\java\…\AlarmsApiTest.java (new)
  • Rust (E.6):
    • clients\rust\crates\mxgateway-client\src\alarms.rs (new)
    • clients\rust\crates\mxgateway-cli\src\alarms.rs (new — verify crate name)
    • clients\rust\tests\alarms.rs (new)

lmxopcua — OPC UA client refresh (Track E.7):

  • src\ZB.MOM.WW.OtOpcUa.Core.Abstractions\AlarmEventArgs.cs (extend)
  • src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs (Part 9 field population)
  • src\ZB.MOM.WW.OtOpcUa.Client.Shared\Models\AlarmEventArgs.cs (DTO mirror)
  • src\ZB.MOM.WW.OtOpcUa.Client.CLI\Commands\AlarmsCommand.cs (verbose / json flags)
  • src\ZB.MOM.WW.OtOpcUa.Client.UI\ViewModels\AlarmEventViewModel.cs
  • src\ZB.MOM.WW.OtOpcUa.Client.UI\ViewModels\AlarmsViewModel.cs
  • src\ZB.MOM.WW.OtOpcUa.Client.UI\Views\AlarmsView.axaml (+ .cs)
  • src\ZB.MOM.WW.OtOpcUa.Client.UI\Views\AckAlarmWindow.axaml (+ .cs)
  • docs\Client.CLI.md (alarms section examples)
  • docs\Client.UI.md (Show-details toggle description)
  • docs\reqs\ClientRequirements.md (extend AlarmEventArgs contract)
  • docs\AlarmTracking.md (B.5 — cross-link client examples)
  • tests\ZB.MOM.WW.OtOpcUa.Client.Shared.Tests\ (DTO round-trip)
  • tests\ZB.MOM.WW.OtOpcUa.Client.CLI.Tests\ (flag behaviour)
  • tests\ZB.MOM.WW.OtOpcUa.Client.UI.Tests\ (view-model bindings)

Total: ~10 source files added/modified in mxaccessgw server/worker side; ~14 in lmxopcua server/driver side; ~3 in the historian sidecar; ~2 deployment scripts; ~30 across the five gateway-client SDK languages; ~12 in lmxopcua client surfaces; ~25 test files across all repos. The gateway-client multi-language work is parallelizable across maintainers, so wall-clock effort lands in 4-6 weeks of coordinated work given the parity-rig dependency for end-to-end validation. If only the .NET SDK ships at first (E.2 only) and E.3-E.6 follow asynchronously, lmxopcua's critical path stays unchanged.