alarms-over-gateway: wire worker AlarmClient + pin SDK call site (4 inert scaffolds + D.1 smoke) #420

Open
opened 2026-05-01 06:34:40 -04:00 by dohertj2 · 2 comments

Tracking the remaining work after PR #419 reconciled the plan banner against the audited source. Architectural decision was already resolved 2026-04-30 (aaAlarmManagedClient.AlarmClient is x86 net48, same bitness as the worker; API surface discovered via reflection probe). What remains is wiring.

mxaccessgw repo — worker AlarmClient wiring

A.2 — replace MxAccessAlarmEventSink.Attach no-op with a real subscription. Per the file's own xmldoc:

  1. Wire AlarmClient.RegisterConsumer(hWnd, productName, applicationName, version, retainHidden) against the worker's existing STA hWnd at session startup.
  2. Call AlarmClient.Subscribe(provider, fromPri, toPri, queryType, sortFlags, filterMask, filterSpec) with the Galaxy provider name and a permissive priority/filter range.
  3. STA WM_APP handler routes alarm-changed messages into MxAccessAlarmEventSink.EnqueueTransition after pulling each changed alarm via GetStatistics + GetAlarmExtendedRec.
  4. Detach unsubscribes cleanly (worker recycle test).

A.3 worker dispatcher — replace NotWiredAlarmRpcDispatcher. Build a WorkerAlarmRpcDispatcher that translates AcknowledgeAlarmRequest into a worker command calling AlarmClient.AlarmAckByGUID(alarmGuid, comment, oprName, oprNode, oprDomain, oprFullName) with the OPC UA operator's resolved identity. Swap into DI in place of the NotWired impl.
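
The A.3 translation can be sketched as follows. This is a minimal illustration in Python rather than the worker's own C#, and it runs against a recording test double, not the AVEVA assembly. Only the `AlarmAckByGUID` parameter order comes from the text above; `OperatorIdentity`, the request field names, `FakeAlarmClient`, and the "0 means success" convention are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorIdentity:
    """Resolved OPC UA operator (hypothetical shape)."""
    name: str
    node: str
    domain: str
    full_name: str


@dataclass(frozen=True)
class AcknowledgeAlarmRequest:
    """Stand-in for the RPC message; field names are assumptions."""
    alarm_guid: str
    comment: str
    operator: OperatorIdentity


class FakeAlarmClient:
    """Test double that records calls instead of talking to AVEVA."""

    def __init__(self, result: int = 0):
        self.result = result
        self.calls = []

    def AlarmAckByGUID(self, alarmGuid, comment, oprName, oprNode, oprDomain, oprFullName):
        self.calls.append((alarmGuid, comment, oprName, oprNode, oprDomain, oprFullName))
        return self.result


class WorkerAlarmRpcDispatcher:
    """Translates an acknowledge request into one AlarmAckByGUID call."""

    def __init__(self, client):
        self._client = client

    def acknowledge(self, request: AcknowledgeAlarmRequest) -> bool:
        op = request.operator
        rc = self._client.AlarmAckByGUID(
            request.alarm_guid, request.comment,
            op.name, op.node, op.domain, op.full_name,
        )
        # Assumption: a return code of 0 means success. Unverified for this call.
        return rc == 0
```

Keeping the client behind a constructor argument is what lets the dispatcher be swapped into DI in place of the `NotWired` implementation and still be tested without a live Galaxy.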

A.4 worker dispatcher — QueryActiveAlarms server-streaming reply. Walk AlarmClient's active-alarm collection (use GetStatistics to enumerate hAlarm handles, then GetAlarmExtendedRec per handle) and stream ActiveAlarmSnapshot messages back through the existing command-reply channel.
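
The enumerate-then-read walk in A.4 maps naturally onto a generator. The Python sketch below assumes a hypothetical `source` object whose `list_handles()` and `read_record()` methods stand in for `GetStatistics` and `GetAlarmExtendedRec`; the real signatures and the record fields (`guid`, `state`, `priority`) are not confirmed by anything in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class ActiveAlarmSnapshot:
    """Stand-in for the streamed reply message; fields are assumptions."""
    alarm_guid: str
    state: str
    priority: int


class FakeAlarmSource:
    """In-memory stand-in for the AlarmClient enumeration calls."""

    def __init__(self, records: Dict[int, Optional[dict]]):
        self._records = records

    def list_handles(self) -> List[int]:
        # Plays the role of GetStatistics: which hAlarm handles exist.
        return list(self._records)

    def read_record(self, handle: int) -> Optional[dict]:
        # Plays the role of GetAlarmExtendedRec for one handle.
        return self._records.get(handle)


def stream_active_alarms(source) -> Iterator[ActiveAlarmSnapshot]:
    """Yield one snapshot per readable handle, in enumeration order."""
    for handle in source.list_handles():
        record = source.read_record(handle)
        if record is None:
            # The alarm vanished between enumeration and read; skip it.
            continue
        yield ActiveAlarmSnapshot(record["guid"], record["state"], record["priority"])
```

Yielding per handle rather than building a list keeps the reply server-streaming end to end: each snapshot can be written to the command-reply channel as soon as it is read.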

lmxopcua repo — sidecar SDK pin

C.1 — pin SdkAlarmHistorianWriteBackend.WriteBatchAsync. Replace the placeholder RetryPlease body with the live aahClientManaged alarm-event write call. The outcome-mapping helper AahClientManagedAlarmEventWriter.MapOutcome is already shared, so the smoke-pinned change is small. Performed on the dev rig as part of D.1.

D.1 smoke artifact

Capture docs/plans/artifacts/d1-rollout-YYYY-MM-DD.md per the test plan in docs/plans/alarms-over-gateway.md Track D: log tails from all three services after refresh, plus the three functional verifications (Galaxy-native alarm, scripted alarm, sub-attribute fallback). The docs/plans/artifacts/ directory does not exist yet.
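
Because the artifacts directory is missing, whatever captures the smoke run has to create it first. A small Python helper along these lines would do it; the function name and the `repo_root` argument are illustrative, and only the path layout and the file-name pattern come from the plan.

```python
from datetime import date
from pathlib import Path


def d1_artifact_path(repo_root: Path, run_date: date) -> Path:
    """Create docs/plans/artifacts/ if needed and return the dated file path."""
    artifacts = repo_root / "docs" / "plans" / "artifacts"
    artifacts.mkdir(parents=True, exist_ok=True)
    # date.isoformat() yields YYYY-MM-DD, matching the naming pattern.
    return artifacts / f"d1-rollout-{run_date.isoformat()}.md"
```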

Acceptance criteria

  • mxaccessgw PR: A.2 worker registration + STA WM_APP handler shipped; integration test against a live Galaxy raises one alarm and MX_EVENT_FAMILY_ON_ALARM_TRANSITION carries it.
  • mxaccessgw PR: A.3 WorkerAlarmRpcDispatcher implemented; NotWiredAlarmRpcDispatcher removed from DI.
  • mxaccessgw PR: A.4 active-alarm walk implemented; ConditionRefresh after reconnect returns every active+acked alarm.
  • lmxopcua PR: SdkAlarmHistorianWriteBackend calls the live aahClientManaged write API.
  • D.1 smoke artifact captured and committed.
  • docs/plans/alarms-over-gateway.md banner updated to ✅ historical record (final pass).

References

  • Plan: docs/plans/alarms-over-gateway.md
  • Worker sink xmldoc: mxaccessgw src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs (architecture pinned + API surface discovered)
  • Reflection probe: mxaccessgw MxGateway.Worker.Tests AlarmClientDiscoveryTests.DumpAlarmClientPublicSurface (Skip-gated)
  • Reconciliation PR: docs: reconcile alarms-over-gateway banner with audited source (#419)

Update 2026-05-01 — A.2 architecture finding

Attempted to start A.2 by wiring MxAccessAlarmEventSink against the existing PR A.5 AlarmClientConsumer. Ran the Skip-gated reflection probe (MxGateway.Worker.Tests.AlarmClientDiscoveryTests.DumpAlarmClientPublicSurface) against the deployed aaAlarmManagedClient.dll (v1.0.7368.41290) and discovered:

The aaAlarmManagedClient.AlarmClient class has zero public events. PR A.5's xmldoc claim that the AVEVA alarm client exposes a managed-event surface is wrong against this assembly. The actual notification mechanism is WM_APP messaging: RegisterConsumer(hWnd, ...) takes a window handle for a reason. AVEVA's alarm provider WM_APP-pokes the registered window, then GetStatistics + GetAlarmExtendedRec pull the change set on each poke.

Practical impact

  • AlarmClientConsumer.AlarmRecordReceived has no production caller. RaiseAlarmRecordReceived is invoked only from tests.
  • AlarmClientConsumer.Subscribe(...) succeeds (both underlying calls, RegisterConsumer and Subscribe, return OK), but no notifications reach the consumer at runtime because no real window is attached.
  • The gateway's MX_EVENT_FAMILY_ON_ALARM_TRANSITION family is reserved on the wire but cannot carry events until A.2 lands a real WM_APP pump.
  • AcknowledgeByGuid and SnapshotActiveAlarms are pull-style and remain correct as written.

What landed

mxaccessgw PR #118 — doc-only commit recording the finding:

  • New docs/AlarmClientDiscovery.md with the reflection probe summary, full AlarmClient method list, and open questions for the A.2 implementation.
  • AlarmClientConsumer.cs xmldoc fixed (managed-event premise → WM_APP).
  • MxAccessAlarmEventSink.cs xmldoc fixed ("verify on dev rig" hedge → resolved finding + expanded open-questions list).

Open questions blocking implementation

Documented in mxaccessgw/docs/AlarmClientDiscovery.md "Implications for A.2":

  1. WM_APP message ID. Not in the public surface. Need either AVEVA's C++ Toolkit reference (canonical doc per gateway.md) or a runtime probe (subclass a window, log every WM arriving while a live alarm is fired, identify the AVEVA one). Worth doing once on the dev rig and checking the result in.
  2. wParam / lParam semantics. Probably none — pattern is "got poked, pull state via GetStatistics." Confirm during the probe.
  3. Threading. AVEVA almost certainly delivers the WM on the thread that owns the window. The worker's STA is the natural home; the existing StaRuntime already runs a pump there. If AVEVA assumes a UI thread inside GetStatistics, the alarm path may need its own STA.
  4. Subscription scope. Subscribe(szSubscription, ...) takes an AVEVA-syntax string for the alarm provider. The configured Galaxy name is already known to the worker via the existing data session — reuse it.

Next A.2 PR is a code change that:

  1. Creates a hidden message-only window inside the worker's STA.
  2. Implements a WindowProc that intercepts the AVEVA WM_APP message and routes change-enumeration into MxAccessAlarmEventSink.EnqueueTransition.
  3. Replaces AlarmClientConsumer.Subscribe's hWnd: 0 placeholder with the real window handle.
  4. Wires the consumer into worker session-startup alongside the existing data-change sink (composite-sink pattern).
  5. Tests against the dev rig with at least one Galaxy alarm raise/ack/clear cycle.

The doc-only PR keeps that future code-change PR tightly scoped — the discovery / re-architecture rationale is already captured.
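
Step 4 of the list above (the composite-sink pattern) can be sketched as a fan-out wrapper. This is an illustrative Python model, not the worker's C# code: `CompositeSink`, `RecordingSink`, and the lowercase `attach`/`detach` signatures are assumptions, loosely mirroring the `Attach`/`Detach` pair on `MxAccessAlarmEventSink`.

```python
class RecordingSink:
    """Test double that logs lifecycle calls; can be told to fail on attach."""

    def __init__(self, name, log, fail_on_attach=False):
        self.name = name
        self.log = log
        self.fail_on_attach = fail_on_attach

    def attach(self, session):
        if self.fail_on_attach:
            raise RuntimeError(f"{self.name} attach failed")
        self.log.append(f"attach:{self.name}")

    def detach(self):
        self.log.append(f"detach:{self.name}")


class CompositeSink:
    """Attaches several sinks as one unit and detaches them in reverse order."""

    def __init__(self, *sinks):
        self._sinks = sinks
        self._attached = []

    def attach(self, session):
        try:
            for sink in self._sinks:
                sink.attach(session)
                self._attached.append(sink)
        except Exception:
            # Roll back the sinks that did attach, then surface the failure.
            self.detach()
            raise

    def detach(self):
        while self._attached:
            self._attached.pop().detach()
```

The rollback in `attach` means a failing alarm sink cannot leave the data-change sink half-wired, which is the property the worker recycle test would exercise.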


Update 2026-05-01 — live runtime probe results

Added AlarmClientWmProbeTests.cs to mxaccessgw and ran it against the live AVEVA install on this dev rig. Results in PR #118 (now also includes the probe code + revised findings in docs/AlarmClientDiscovery.md):

RegisterConsumer and Subscribe both return 0 (success). The lifecycle calls are valid against the deployed assembly.

A registered-message-class WM (ID 0xC275 in this OS session) fires every ~1 second after Subscribe completes. wParam=0x1100 and lParam=0x079E46D8 (looks like a stable internal pointer) were constant across all 20 hits, with no manual alarm fired. The constant payload and ~1 Hz cadence suggest a heartbeat/keepalive, not a per-change notification.

Critically: this WM is delivered to AVEVA's own internal window (hwnd=0x18032E), NOT to the consumer hWnd we registered. The consumer window receives only the standard WM_CREATE / WM_DESTROY lifecycle sequence — nothing from AVEVA in between.
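
For reading the probe log, it helps to classify each message ID by the standard Win32 ranges. The Python helper below encodes those documented ranges; the function name and the returned labels are illustrative. Note that 0xC275 falls in the registered-message range (IDs handed out by `RegisterWindowMessage`), not in the WM_APP range the earlier design assumed.

```python
WM_USER = 0x0400           # start of private window-class messages
WM_APP = 0x8000            # start of application-defined messages
REGISTERED_FIRST = 0xC000  # start of RegisterWindowMessage IDs
REGISTERED_LAST = 0xFFFF   # end of RegisterWindowMessage IDs


def classify_window_message(msg_id: int) -> str:
    """Return the Win32 range a message ID belongs to."""
    if msg_id < 0:
        raise ValueError("message IDs are non-negative")
    if msg_id < WM_USER:
        return "system"       # e.g. WM_CREATE (0x0001), WM_DESTROY (0x0002)
    if msg_id < WM_APP:
        return "WM_USER"
    if msg_id < REGISTERED_FIRST:
        return "WM_APP"
    if msg_id <= REGISTERED_LAST:
        return "registered"   # value is assigned at runtime, per session
    return "reserved"
```

Because registered IDs are assigned at runtime, a probe or hook should match on the registered message name if it can be discovered, not on the literal 0xC275 seen in this session.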

What this changes

The WM_APP-pump design from the original plan banner does not match how AVEVA actually delivers notifications. The hWnd parameter to RegisterConsumer appears to be a registration identity only — AVEVA's notification path runs entirely against its own internal window. A worker hWnd would never receive any of AVEVA's alarm traffic.

New A.2 design options

Replace the previous WM_APP-pump approach with one of:

  1. Polling. Call GetStatistics on a 500 ms (or configurable) timer in the worker's STA and react to whatever change set it reports. No window plumbing needed. Latency floor = poll period. Same order of magnitude as AVEVA's own ~1 Hz internal heartbeat. Cheap to implement and robust against AVEVA-internal change.
  2. Hook AVEVA's internal window. SetWindowSubclass on AVEVA's hwnd, intercept WM 0xC275 on AVEVA's thread. Lower latency but invasive, fragile across AVEVA upgrades, requires same-process/thread coupling. Likely a non-starter.

Recommendation: option 1 (polling). The unanswered question is whether GetStatistics is safe to call outside AVEVA's own message-pump thread — confirmable with a follow-up probe.
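
If option 1 is chosen and `GetStatistics` turns out to report full state instead of per-change deltas (still unconfirmed), transitions can be derived by diffing consecutive snapshots. The Python sketch below shows that logic only. `AlarmPoller`, `read_snapshot`, the state strings, and the GUID-to-state mapping are all assumptions, and the timer itself is left out so the behaviour stays deterministic.

```python
from typing import Dict, List, Optional, Tuple

# (alarm_guid, old_state, new_state); None means "not present".
Transition = Tuple[str, Optional[str], Optional[str]]


def diff_alarm_states(previous: Dict[str, str], current: Dict[str, str]) -> List[Transition]:
    """Compare two full snapshots and list what changed between them."""
    transitions: List[Transition] = []
    for guid, state in current.items():
        old = previous.get(guid)
        if old != state:
            # Newly raised (old is None) or state changed.
            transitions.append((guid, old, state))
    for guid, old in previous.items():
        if guid not in current:
            transitions.append((guid, old, None))  # cleared
    return transitions


class AlarmPoller:
    """Runs one poll step; the caller decides when (e.g. a 500 ms timer)."""

    def __init__(self, read_snapshot, enqueue_transition):
        self._read_snapshot = read_snapshot  # hypothetical GetStatistics wrapper
        self._enqueue = enqueue_transition   # e.g. the sink's EnqueueTransition
        self._last: Dict[str, str] = {}

    def poll_once(self) -> int:
        current = dict(self._read_snapshot())
        changes = diff_alarm_states(self._last, current)
        for change in changes:
            self._enqueue(change)
        self._last = current
        return len(changes)
```

One limit of snapshot diffing is that an alarm which raises and clears within a single poll period is never observed, so the poll period bounds both latency and the shortest visible alarm.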

Open follow-up probes (documented in docs/AlarmClientDiscovery.md)

  1. Fire a real Galaxy alarm during pump and check whether WM 0xC275 cadence changes or GetStatistics returns non-empty arrays. This needs human interaction with the System Platform IDE.
  2. GetStatistics threading-affinity test.
  3. Hook AVEVA's internal window 0x18032E to confirm option-2 viability.
  4. Decompile aaAlarmManagedClient.dll IL for RegisterConsumer to find whether WNAL_Register's callback surface is wrapped (the alarmlst.dll strings include WNAL_CallBack and an 'Invalid callbacks' error, suggesting the underlying C API takes callbacks the managed wrapper might hide).

Next concrete step is probe (1) — fire a real alarm with the probe running. Without that, option 1 vs 2 is a guess; with it we'll know whether GetStatistics actually reports per-change deltas or whether AVEVA's notification layer is fundamentally one-way-into-AVEVA-internal.

Reference: dohertj2/lmxopcua#420