SdkAlarmHistorianWriteBackend.WriteBatchAsync replaces the RetryPlease placeholder with the real entry point, HistorianAccess.AddStreamedValue(HistorianEvent, out HistorianAccessError) in aahClientManaged, pinned by decompiling the installed SDK. The write path opens its own ReadOnly=false connection: the query-side HistorianDataSource opens ReadOnly sessions and AddStreamedValue fails on those with WriteToReadOnlyFile. IHistorianConnectionFactory gains a readOnly parameter (default true, query path unchanged); BuildConnectionArgs is extracted as a pure helper. HistorianClusterEndpointPicker is shared for node failover; connection-class errors abort the batch as RetryPlease and reset the connection, malformed-input codes map to PermanentFail. Tests: connection-unavailable batch deferral, ClassifyOutcome error-code table, BuildConnectionArgs read-vs-write shaping (80 pass, 2 rig-skipped). Live_* round-trip tests stay Skip-gated for the D.1 rollout smoke.
Alarms Worker Wiring Plan
Context: The alarms-over-gateway epic shipped 19 PRs across the lmxopcua and mxaccessgw repos (merged 2026-04-30). Contracts are live; the sub-attribute fallback path keeps Galaxy alarms functional today. Four items remain as inert scaffolds gated on a dev-rig finding. This document is the focused implementation plan for those four items only.

Do not duplicate docs/plans/alarms-over-gateway.md; that document is the full historical record of all 19 PRs. This document covers only what is still to be done and exactly what blocks each item.

This work lives in the mxaccessgw sibling repo at C:\Users\dohertj2\Desktop\mxaccessgw\, not in this (lmxopcua) repo, except where lmxopcua changes are noted explicitly.
Dev-rig finding that blocks everything (2026-04-30)
During PR A.2 work the following was discovered on the dev box:
- The MXAccess COM Toolkit at C:\Program Files (x86)\ArchestrA\Framework\Bin\ArchestrA.MXAccess.dll exposes no alarm-event family, only OnDataChange, OnWriteComplete, OperationComplete, OnBufferedDataChange.
- AVEVA's aaAlarmManagedClient / ArchestrAAlarmsAndEvents.SDK assemblies are x64-only and incompatible with the worker's x86 net48 bitness.
The architectural decision required before any of A.2, A.3/A.4, C.1 can ship:
Either accept the value-driven sub-attribute path as the production architecture (operator-comment fidelity is the only v1 regression), or add an x64 alarm-helper sub-process alongside the x86 worker.
Resolution drives the implementation shape of every item below. The plan presented here assumes the x64 alarm-helper sub-process route (the higher parity option), but notes the sub-attribute-only exit at each step.
Discovered AVEVA API surface
Before implementing, verify the following against the AVEVA SDK actually installed on the dev box and in the mxaccessgw worker's deployment folder:
| Assembly | Bitness | Likely location | Key types |
|---|---|---|---|
| ArchestrA.MXAccess.dll | x86 | C:\Program Files (x86)\ArchestrA\Framework\Bin\ | IMxAlarmEventSink, MxAlarmEventArgs (confirm exists at actual version) |
| aaAlarmManagedClient.dll | x64 | C:\Program Files\ArchestrA\Framework\Bin\ | AlarmClient, IAlarmConsumer, AlarmEventArgs |
| ArchestrAAlarmsAndEvents.SDK.dll | x64 | Same or Historian SDK folder | AlarmHistorianWriter, GetAlarmExtendedRec |
The AVEVA MXAccess Toolkit reference in the mxaccessgw repo (gateway.md) is
the canonical API doc for the gateway worker's side. The alarm-client API is
documented separately; verify the following call shapes during PR A.2:
| Operation | Likely API | Notes |
|---|---|---|
| Subscribe to alarm events | AlarmClient.RegisterConsumer(IAlarmConsumer) + AlarmClient.Subscribe(filterSpec) | Confirm exact method signatures against the SDK version on the dev box |
| Receive alarm event | IAlarmConsumer.OnAlarmEvent(AlarmEventArgs) callback | Field set: alarm name, source, type, transition kind, severity, timestamps, operator fields |
| Acknowledge alarm | AlarmClient.AcknowledgeAlarm(alarmRef, comment, userPrincipal) or equivalent | Confirm whether this is synchronous or returns a status |
| Query active alarms | AlarmClient.GetAlarmExtendedRec(filter) or GetActiveAlarms() | Returns current active set for ConditionRefresh |
| Get statistics | AlarmClient.GetStatistics() | Optional; useful for worker health checks |
Record the exact method signatures against the installed SDK before starting
A.2 — the proto field set in OnAlarmTransitionEvent must match the SDK's
actual payload.
Dependency order
A.2 (worker: AlarmClient subscription)
└─► A.3 (gateway: dispatch OnAlarmTransition + AcknowledgeAlarm RPC handler)
└─► A.4 (gateway: QueryActiveAlarms RPC handler)
└─► lmxopcua B.2 (GalaxyDriver IAlarmSource live)
└─► C.1 (sidecar: AahClientManagedAlarmEventWriter live)
└─► D.1 (smoke artifact captured)
A.2 is the single blocking item. All subsequent items unblock serially once A.2 delivers alarm events through the channel.
Item A.2 — Worker: subscribe to MxAccess alarm event source
Repo: mxaccessgw — src\MxGateway.Worker\ (net48, x86)
What it needs:
The worker must subscribe to AVEVA's alarm events and fan them into the same
bounded channel the data-change pump uses, translating each MxAccess alarm
event into a WorkerEvent proto with family MX_EVENT_FAMILY_ON_ALARM_TRANSITION
(defined in PR A.1, already merged).
Architectural choice determines the implementation path:
Option X1 — aaAlarmManagedClient in a new x64 alarm-helper process
Add a second worker-mode sub-process (MxGateway.AlarmWorker, net8.0 x64)
alongside the existing x86 worker. The AlarmWorker:
- Loads aaAlarmManagedClient.dll (x64) on startup.
- Calls AlarmClient.RegisterConsumer with a WorkerAlarmConsumer sink.
- Calls AlarmClient.Subscribe with a session-level filter (all alarms for the session's Galaxy scope).
- Translates each IAlarmConsumer.OnAlarmEvent callback into a protobuf WorkerEvent (family ON_ALARM_TRANSITION) and writes it to an IPC channel readable by the gateway server-side multiplexer.
- Handles session lifecycle: re-subscribes after reconnect; unsubscribes on session close.
IPC from AlarmWorker to gateway: the simplest option is a named pipe, or an
in-process queue if the AlarmWorker is hosted inside the gateway process
as a separate IHostedService.
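The translate-and-enqueue step of Option X1 can be modelled in a few lines. The sketch below is in Python for brevity (the real AlarmWorker would be .NET) and rests on assumptions: `AlarmCallback`, its fields, and the drop-on-full policy are hypothetical, since the SDK payload and the channel's overflow behaviour are not yet confirmed. Only the family name comes from this plan.

```python
import queue
from dataclasses import dataclass

# Family name taken from the plan; every other name below is hypothetical.
ON_ALARM_TRANSITION = "MX_EVENT_FAMILY_ON_ALARM_TRANSITION"


@dataclass(frozen=True)
class AlarmCallback:
    """Stand-in for the SDK callback payload (real field set unconfirmed)."""
    alarm_name: str
    source: str
    transition: str
    severity: int


class WorkerAlarmConsumer:
    """Translates alarm callbacks into WorkerEvent-shaped dicts."""

    def __init__(self, channel: queue.Queue):
        self._channel = channel
        self.dropped = 0

    def on_alarm_event(self, cb: AlarmCallback) -> bool:
        event = {
            "family": ON_ALARM_TRANSITION,
            "alarm_name": cb.alarm_name,
            "source": cb.source,
            "transition": cb.transition,
            "severity": cb.severity,
        }
        try:
            # Assumption: never block the SDK callback thread on a full channel.
            self._channel.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False


channel = queue.Queue(maxsize=2)  # bounded, like the data-change pump's channel
consumer = WorkerAlarmConsumer(channel)
results = [
    consumer.on_alarm_event(AlarmCallback(f"Tank{i}.Hi", "Galaxy", "Raise", 500))
    for i in range(3)
]
```

With a capacity of two, the third callback is counted as dropped instead of stalling the callback thread. Whether dropping, blocking, or coalescing is the right overflow policy is a decision for A.2, not something this sketch settles.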
Option X2 — Accept sub-attribute fallback as production (no A.2 work)
If the architectural decision is to accept the sub-attribute path as permanent:
- MxAccessAlarmEventSink.Attach() in the worker remains a no-op (as currently coded with the architectural comment).
- The MX_EVENT_FAMILY_ON_ALARM_TRANSITION proto family stays defined but the gateway never emits events on it.
- lmxopcua's GalaxyDriver does not implement IAlarmSource for the native path; the value-driven sub-attribute path remains the production path.
- The only regression vs. v1 is operator-comment fidelity on Galaxy alarms.
- C.1 is still needed if scripted-alarm historian write-back is required.
What blocks it: the architectural decision above. Once made, A.2 becomes a 2–3 day implementation task (sub-process plumbing + proto translation + unit tests for the consumer sink cancellation behaviour).
Tests to write (when A.2 proceeds):
- WorkerAlarmConsumerTests: fake IAlarmConsumer source emits canned transitions; assert each produces the correct WorkerEvent body shape.
- Cancellation/session-close test: closing the session unsubscribes from the AlarmClient cleanly (no leaked IAlarmConsumer reference if the worker is recycled mid-session).
- Re-subscribe-after-reconnect test: ReconnectSupervisor triggers a reconnect; assert the alarm consumer re-attaches to the new session.
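The session-close and re-subscribe tests above both check one invariant: a consumer is registered with at most one client at a time. A minimal Python model of that invariant follows; `FakeAlarmClient` and `AlarmSubscription` are hypothetical names, and the real register/unregister calls must be taken from the installed SDK.

```python
class FakeAlarmClient:
    """Hypothetical test double; records which consumers are registered."""

    def __init__(self):
        self.consumers = []

    def register(self, consumer):
        self.consumers.append(consumer)

    def unregister(self, consumer):
        self.consumers.remove(consumer)


class AlarmSubscription:
    """Keeps one consumer attached to at most one client at a time."""

    def __init__(self, consumer):
        self._consumer = consumer
        self._client = None

    def attach(self, client):
        # Detach first so a reconnect never leaves a stale registration.
        self.detach()
        client.register(self._consumer)
        self._client = client

    def detach(self):
        # Idempotent: safe to call on session close and again on dispose.
        if self._client is not None:
            self._client.unregister(self._consumer)
            self._client = None


old_session, new_session = FakeAlarmClient(), FakeAlarmClient()
subscription = AlarmSubscription(consumer="sink")
subscription.attach(old_session)
subscription.attach(new_session)  # simulated reconnect
```

After the simulated reconnect the old client holds no reference to the sink, which is the property the leak test is meant to assert.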
Item A.3 / A.4 — Gateway: dispatch and RPC handlers
Repo: mxaccessgw — src\MxGateway.Server\
Depends on: A.2 delivering WorkerEvent bodies with family
MX_EVENT_FAMILY_ON_ALARM_TRANSITION.
What it needs:
A.3 — Dispatch + AcknowledgeAlarm
- The session-level event multiplexer (Sessions\SessionEventStream.cs or equivalent; verify name in the mxaccessgw repo) must recognise the new WorkerEvent body and forward it as an MxEvent with family MX_EVENT_FAMILY_ON_ALARM_TRANSITION to every StreamEvents subscriber for that session.
- New RPC handler AcknowledgeAlarm builds an AlarmAcknowledgeCommand worker command and forwards it to the alarm-helper process (Option X1) or the worker's MxAccess session (Option X2 if MxAccess exposes ack). Maps the reply status to AcknowledgeAlarmReply.MxStatusProxy.
- Authorization: new API scope invoke:alarm-ack on the API key. Keys without it receive PERMISSION_DENIED. Follow the existing scope-check pattern used by invoke:write.
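The A.3 behaviour reduces to two small rules: fan every alarm event out to all subscribers of the session, and reject an acknowledge call before it reaches the worker when the key lacks the scope. The Python sketch below models both. The function names and the string status values are illustrative assumptions; only the scope name and PERMISSION_DENIED come from this plan.

```python
ACK_SCOPE = "invoke:alarm-ack"  # scope name from the plan


def dispatch(event, subscribers):
    """Forward one alarm event to every StreamEvents subscriber of a session."""
    delivered = 0
    for deliver in subscribers:
        deliver(event)
        delivered += 1
    return delivered


def acknowledge_alarm(key_scopes, alarm_ref, comment, forward):
    """Check the scope first so an unauthorised key never reaches the worker."""
    if ACK_SCOPE not in key_scopes:
        return "PERMISSION_DENIED"
    return forward(alarm_ref, comment)


inbox_a, inbox_b, forwarded = [], [], []
count = dispatch({"alarm_ref": "Tank1.Hi"}, [inbox_a.append, inbox_b.append])


def fake_forward(alarm_ref, comment):
    """Hypothetical worker hop; records the call and reports success."""
    forwarded.append((alarm_ref, comment))
    return "OK"


denied = acknowledge_alarm({"invoke:write"}, "Tank1.Hi", "seen", fake_forward)
allowed = acknowledge_alarm({ACK_SCOPE}, "Tank1.Hi", "seen", fake_forward)
```

The denied call leaves `forwarded` untouched, which is the property the AcknowledgeAlarm auth test needs to pin down.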
A.4 — QueryActiveAlarms
- New RPC handler QueryActiveAlarms calls AlarmClient.GetAlarmExtendedRec (or GetActiveAlarms; confirm the method name during implementation) on the alarm-helper process, batches results into ActiveAlarmSnapshot proto messages, and streams them back to the caller.
- New API scope invoke:alarm-query (separate from ack so read-only clients can refresh without ack rights).
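The streaming shape of QueryActiveAlarms can be modelled as a generator that emits one snapshot per active alarm. This is a Python sketch under assumptions: the dict keys stand in for the ActiveAlarmSnapshot proto fields, which this plan does not list.

```python
from typing import Iterable, Iterator


def stream_active_alarms(alarms: Iterable[dict]) -> Iterator[dict]:
    """Yield one ActiveAlarmSnapshot-shaped message per active alarm."""
    for alarm in alarms:
        # Field names are hypothetical placeholders for the proto fields.
        yield {
            "kind": "ActiveAlarmSnapshot",
            "alarm_ref": alarm["alarm_ref"],
            "state": alarm["state"],
        }


def synthetic_alarms(count: int) -> list:
    """Build a synthetic active-alarm set for the 0 / 1 / 100 cases."""
    return [{"alarm_ref": f"Tank{i}.Hi", "state": "Active"} for i in range(count)]


sizes = [len(list(stream_active_alarms(synthetic_alarms(n)))) for n in (0, 1, 100)]
```

Because the generator is lazy, an empty active set produces an empty stream and never a placeholder message, which matches the zero-entry case of the pagination test.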
What blocks A.3/A.4: A.2 must deliver WorkerEvent bodies on the channel.
A.3/A.4 are pure dispatch wiring once the events arrive.
Tests to write:
- A.3 dispatch test: fake worker emits an AlarmTransition event; assert the gateway forwards it on the StreamEvents channel of every subscribed session (mirrors existing OnDataChange dispatch tests).
- A.3 AcknowledgeAlarm auth test: existing key without invoke:alarm-ack scope returns PERMISSION_DENIED.
- A.4 pagination test: synthetic active-alarm set of 0 / 1 / 100 entries; assert each streams back as separate ActiveAlarmSnapshot messages.
- Integration (parity rig; requires dev box with AVEVA platform): trigger a real Galaxy alarm, call QueryActiveAlarms, assert the alarm appears in the stream; call AcknowledgeAlarm, assert the alarm transitions to ActiveAcked and an Acknowledge transition event appears on StreamEvents.
Item C.1 — Historian sidecar: AahClientManagedAlarmEventWriter
Repo: lmxopcua — src\Drivers\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\
Depends on: Architectural decision (the sidecar uses aahClientManaged
x64, which is not bitness-constrained like the worker). C.1 can be unblocked
independently of A.2 if the goal is to wire up the scripted-alarm historian
path.
Current state (DONE — code):
C.1 shipped. SdkAlarmHistorianWriteBackend.WriteBatchAsync writes through the
real SDK entry point — HistorianAccess.AddStreamedValue(HistorianEvent, out HistorianAccessError) in aahClientManaged — pinned 2026-05-18 by
decompiling the installed SDK. Program.cs and Install-Services.ps1 were
already wired in the PR C.1 scaffolding. Two corrections to the assumptions
this doc was written under:
- There is no ArchestrAAlarmsAndEvents.SDK writer. That assembly (ArchestrAAlarmsAndEvents.SDK.Common.dll, the only one installed) is a WCF query-proxy base with no AlarmHistorianWriter type. The write path is the aahClientManaged HistorianAccess surface.
- The write path needs its own connection. The query-side HistorianDataSource opens ReadOnly sessions; AddStreamedValue on a read-only session fails with WriteToReadOnlyFile. SdkAlarmHistorianWriteBackend opens a dedicated ReadOnly=false connection and shares only HistorianClusterEndpointPicker (not the connection object).
What it needed (all done):
- SdkAlarmHistorianWriteBackend builds a HistorianEvent per AlarmHistorianEventDto, calls AddStreamedValue, and maps HistorianAccessError.ErrorValue codes through AahClientManagedAlarmEventWriter.MapOutcome (Ack / PermanentFail / RetryPlease).
- HistorianClusterEndpointPicker drives multi-node failover.
- Program.cs: BuildAlarmWriter() constructs the backend gated behind OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED.
- Install-Services.ps1: env var present in the install-time block.
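The per-row outcome mapping and the abort-on-connection-error rule can be modelled as below. This is a Python sketch, not the shipped C# code: the error-code names in the two sets are hypothetical placeholders (the real HistorianAccessError values must be read off the SDK), and treating unknown codes as RetryPlease is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    ACK = "Ack"
    PERMANENT_FAIL = "PermanentFail"
    RETRY_PLEASE = "RetryPlease"


# Hypothetical code names; the real values come from the installed SDK.
CONNECTION_CODES = frozenset({"ConnectionLost", "NotConnected"})
MALFORMED_CODES = frozenset({"InvalidArgument"})


def classify(code):
    """Map one error code (None means success) to a write outcome."""
    if code is None:
        return Outcome.ACK
    if code in MALFORMED_CODES:
        return Outcome.PERMANENT_FAIL
    # Connection-class codes, plus unknown codes (assumption), are retried.
    return Outcome.RETRY_PLEASE


def write_batch(events, add_streamed_value, reset_connection):
    """Return one outcome per event, parallel to the input order."""
    outcomes = []
    for index, event in enumerate(events):
        code = add_streamed_value(event)
        outcomes.append(classify(code))
        if code in CONNECTION_CODES:
            # Abort: reset the connection and defer every remaining row.
            reset_connection()
            outcomes.extend([Outcome.RETRY_PLEASE] * (len(events) - index - 1))
            break
    return outcomes


canned = {"a": None, "b": "InvalidArgument", "c": "ConnectionLost", "d": None}
attempted, resets = [], []


def fake_add(event):
    """Hypothetical stand-in for the SDK write call; returns a canned code."""
    attempted.append(event)
    return canned[event]


outcomes = write_batch(["a", "b", "c", "d"], fake_add, lambda: resets.append(1))
```

In this run the fourth event is never sent: the connection-class failure on the third row resets the connection once and marks the remainder RetryPlease, so the outcome list stays parallel to the input.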
What remains for C.1: only the live-rig write smoke — the Live_* tests
in SdkAlarmHistorianWriteBackendTests stay Skip-gated until D.1 confirms a
round-trip against a real AVEVA Historian, including the exact mandatory
HistorianEvent field set.
Tests to write:
- Outcome-mapping table: every MxStatus on alarm-write → expected HistorianWriteOutcome.
- Batch test: 1 / 100 / 1000 events through a fake aahClientManaged writer; assert per-row outcome list parallel to input order.
- Cluster failover: primary Historian node returns BadCommunicationError; picker rotates to secondary; eventual success.
- Program.cs seam: assert handler constructed with alarm writer when env var enabled; without it when disabled.
- Live integration (parity rig): write a synthetic alarm event through the IPC; query it back via ReadEvents; assert round-trip fidelity.
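The cluster-failover test above can be modelled with a round-robin picker. The Python sketch is illustrative only: `EndpointPicker`, `write_with_failover`, and the node names are hypothetical and do not describe the actual HistorianClusterEndpointPicker API.

```python
class EndpointPicker:
    """Round-robin stand-in for a cluster endpoint picker (hypothetical API)."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._index = 0

    def current(self):
        return self._nodes[self._index]

    def mark_failed(self):
        # Rotate to the next node; wraps around to the first after the last.
        self._index = (self._index + 1) % len(self._nodes)


def write_with_failover(picker, write, attempts):
    """Try up to `attempts` nodes; return the node that accepted the write."""
    for _ in range(attempts):
        node = picker.current()
        if write(node):
            return node
        picker.mark_failed()
    return None


tried = []


def fake_write(node):
    """Primary fails with a communication error; secondary succeeds."""
    tried.append(node)
    return node == "hist-secondary"


picker = EndpointPicker(["hist-primary", "hist-secondary"])
winner = write_with_failover(picker, fake_write, attempts=2)
```

The picker stays on the node that last succeeded, so the next batch starts from the secondary and does not retry the failed primary first. Whether the real picker behaves this way should be checked against its implementation.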
Item D.1 — Smoke artifact
Repo: lmxopcua (deployment refresh) + mxaccessgw (rig verification)
Depends on: A.2, A.3, A.4, and C.1 all passing on the dev rig with a live Galaxy and live Historian.
Current state: The deployment script Refresh-Services.ps1 (task D.1) has
shipped as PR #417 (merged 2026-04-30). What was NOT captured at that time was
a smoke artifact — a log snippet or test output confirming that:
- An alarm transition event from a live Galaxy alarm reaches lmxopcua's AlarmConditionService via the new IAlarmSource path (not the fallback).
- A scripted-alarm historian write-back reaches AVEVA Historian via the sidecar IAlarmEventWriter.
What it needs:
Once A.2, A.3, C.1 are wired on the parity rig:
1. Deploy the updated mxaccessgw (with A.2 / A.3 / A.4 changes).
2. Deploy the updated sidecar (with C.1 changes).
3. Run Refresh-Services.ps1 to confirm clean service restarts.
4. Trigger a Galaxy alarm (e.g. set an AnalogLimitAlarm attribute out of range in Galaxy IDE).
5. Observe the lmxopcua OPC UA alarm surface via the Client CLI:

   ```powershell
   dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- `
     alarms -u opc.tcp://localhost:4840 --subscribe
   ```

   Pass: the alarm condition appears on the OPC UA A&E surface within 2 × publishing interval.
6. Trigger a scripted alarm via the lmxopcua ScriptedAlarmEngine (or an OPC UA method call if one is wired).
7. Confirm in the AVEVA Historian that the scripted alarm event is stored (query via the Historian client or HistorianWatch tool).
8. Capture log snippets:
   - mxaccessgw log: [INF] AlarmTransition dispatched sessionId=<> alarmRef=<>
   - lmxopcua log: [INF] AlarmConditionService: IAlarmSource event alarmRef=<> origin=Driver
   - Sidecar log: [INF] AahClientManagedAlarmEventWriter: Wrote <n> alarm events
9. Commit the log snippets as docs/plans/alarms-d1-smoke-artifact.md (a new doc, not this one).
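Checking the captured logs for the three expected lines can be scripted. The Python sketch below assumes the log lines will match the shapes listed in the capture step, with `<>` and `<n>` replaced by non-blank values; the real message formats should be confirmed against the services before relying on these patterns.

```python
import re

# Patterns derived from the expected log lines in this plan (unconfirmed).
EXPECTED = {
    "mxaccessgw": re.compile(
        r"\[INF\] AlarmTransition dispatched sessionId=\S+ alarmRef=\S+"
    ),
    "lmxopcua": re.compile(
        r"\[INF\] AlarmConditionService: IAlarmSource event alarmRef=\S+ origin=Driver"
    ),
    "sidecar": re.compile(
        r"\[INF\] AahClientManagedAlarmEventWriter: Wrote \d+ alarm events"
    ),
}


def missing_evidence(logs):
    """Return the services whose log text lacks the expected smoke line."""
    return [
        name
        for name, pattern in EXPECTED.items()
        if not pattern.search(logs.get(name, ""))
    ]


# Synthetic sample text, not captured output.
sample_logs = {
    "mxaccessgw": "[INF] AlarmTransition dispatched sessionId=s1 alarmRef=Tank1.Hi",
    "lmxopcua": "[INF] AlarmConditionService: IAlarmSource event alarmRef=Tank1.Hi origin=Driver",
}
gaps = missing_evidence(sample_logs)
```

An empty result means all three pieces of evidence are present and the snippets can be committed; a non-empty list names the services that still need a capture.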
What blocks D.1: all of A.2, A.3, C.1, plus the operator decision on the x64 alarm-helper architecture (or explicit acceptance of the sub-attribute fallback as production).
Summary of blocks
| Item | Blocked by | Estimated effort once unblocked |
|---|---|---|
| A.2 | Architectural decision (x64 alarm-helper vs. sub-attribute fallback as production) | 2–3 days implementation; 1 day tests |
| A.3 | A.2 delivering WorkerEvent bodies | 1–2 days |
| A.4 | A.2 (active-alarm query needs AlarmClient session) | 1 day |
| C.1 | aahClientManaged SDK access (available on dev box); NOT blocked by A.2 | 1–2 days |
| D.1 | A.2 + A.3 + C.1 all passing on parity rig | 0.5 day (smoke + artifact capture) |
C.1 can proceed in parallel with A.2 / A.3 since the sidecar's aahClientManaged
is x64 and does not share the worker bitness constraint.
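The "Blocked by" column of the table above is a small dependency graph, and encoding it makes the parallelism explicit. The Python sketch uses the standard-library `graphlib` module; the graph is transcribed from the table, with "architectural decision" added as a pseudo-item, and `ready_items` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Item -> set of items it is blocked by, transcribed from the summary table.
BLOCKED_BY = {
    "A.2": {"architectural decision"},
    "A.3": {"A.2"},
    "A.4": {"A.2"},
    "C.1": set(),
    "D.1": {"A.2", "A.3", "C.1"},
}


def ready_items(done):
    """Items not yet done whose blockers are all done."""
    return sorted(
        item
        for item, blockers in BLOCKED_BY.items()
        if item not in done and blockers <= done
    )


# One valid serial order; items with no mutual dependency may be swapped.
order = list(TopologicalSorter(BLOCKED_BY).static_order())

before_decision = ready_items(set())
after_a2 = ready_items({"architectural decision", "A.2", "C.1"})
```

Before the architectural decision only C.1 is workable. Once A.2 lands, A.3 and A.4 become ready together, and D.1 stays blocked until A.3 is also done.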
What this plan does NOT cover
- The value-driven sub-attribute fallback path — already shipped and functional (not being changed).
- Track B (lmxopcua EventPump, GalaxyDriver IAlarmSource re-implementation) and Track E (client SDK surface refresh) from the alarms-over-gateway plan: those are in lmxopcua and depend on A.3 being live; they follow naturally once A.3 ships.
- Galaxy-native alarm historian path: System Platform's own HistorizeToAveva toggle on the Galaxy template; not in scope.
- Alarm ACL / role-grant surface: already shipped in Phase 6.2.