mxaccessgw/docs/AlarmClientDiscovery.md
Joseph Doherty 4e8928cf71 probe: InitializeConsumer required — provider visible after, alarms still absent
2026-05-01 07:43:06 -04:00


aaAlarmManagedClient discovery — public surface, 2026-05-01

Result of running MxGateway.Worker.Tests.AlarmClientDiscoveryTests.DumpAlarmClientPublicSurface against the deployed AVEVA assembly:

  • File: C:\Program Files (x86)\ArchestrA\Framework\Bin\ViewAppFramework\Content\MA\aaAlarmManagedClient.dll
  • Assembly identity: aaAlarmManagedClient, Version=1.0.7368.41290, Culture=neutral, PublicKeyToken=7ebd82b507d9e10c

Public types

  • aaAlarmManagedClient.AlarmClient (class)
  • aaAlarmManagedClient.PriorityData (class)

That's the entire exported surface — two types, no interfaces, no delegates.

AlarmClient events

None. The class has no public events at all: the reflection probe's GetEvents(BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static) returned an empty list.

AlarmClient methods (relevant subset)

  • Lifecycle: RegisterConsumer(int hWnd, string szProductName, string szApplicationName, string szVersion, bool bRetainHiddenAlarms) → int, DeregisterConsumer() → int, InitializeConsumer(string szApplicationName) → int, UninitializeConsumer() → int, Dispose().
  • Subscription: Subscribe(string szSubscription, short wFromPri, short wToPri, eQueryType QueryType, eSortFlags SortFlags, eAlarmFilterState FilterMask, eAlarmFilterState FilterSpecification) → int.
  • Change enumeration (pull on poke): GetStatistics(out int lPercentQuery, out int lTotalAlarms, out int lActiveAlarms, out int lSuppressedAlarms, out int lSuppressedFilters, out int lNewAlarms, out int lChangesCount, out int[] ChangeCodes, out int[] ChangePos, out int[] hAlarm) → int.
  • Record fetch: GetAlarmExtendedRec(int lIndex, out AlarmRecord almRec) → int, GetAlarmExtendedRec2(...), GetHighPriAlarm(out AlarmRecord almRec) → int.
  • Selection model (used by ack-selected-* family): DeselectAll, SelectAlaramEntry(short select, int from, int to), SelectByGUID(Guid), SelectAlarmCount(int from, int to).
  • Acknowledge: AlarmAckByGUID(Guid alarmGuid, string ackComment, string ackOprName, string ackOprNode, string ackOprDomain, string ackOprFullName) → int is the per-alarm full-fidelity native ack. AlarmAckSelected(string ackComment, string ackOprName, string ackOprNode, string ackOprDomain, string ackOprFullName) → int acks whatever the selection model currently has selected. Several AckSelected*Group/Tag/Priority/All/Visible*Alarms_Ex(...) variants exist for bulk ack scoped to a group / tag / priority range.
  • Suppress / shelve: SupressSelected* and ShelveSelected* families plus DoAlarmShelveAction(...). Out of scope for the v1 alarm path.
  • Snapshot/filter (SF* prefix): SFSetSortA / SFSetFilterA / SFCreateSnapshot / SFGetListCount / SFDeleteSnapshot / SFRefreshAlarm / SFGetStatistics. Snapshot-style query API, distinct from the consumer-subscription path. Not currently used.
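AlarmAckSelected acks whatever the selection model currently holds, so acking a single alarm through the selection family needs the selection reset first. AlarmAckByGUID avoids that hazard entirely. The Python model below shows the hazard and the reset. FakeSelectionClient and ack_via_selection are hypothetical names. The method names mirror the AlarmClient list above, but the return codes (0 = success, -1 = not found) and the in-memory semantics are assumptions, not observed AVEVA behavior.

```python
from dataclasses import dataclass, field
from uuid import UUID


@dataclass
class FakeSelectionClient:
    """Hypothetical in-memory model of AlarmClient's selection model (not AVEVA code)."""
    alarms: set = field(default_factory=set)     # GUIDs of alarms that exist
    selected: set = field(default_factory=set)   # current selection
    acked: dict = field(default_factory=dict)    # GUID -> (comment, operator name)

    def DeselectAll(self) -> int:
        self.selected.clear()
        return 0

    def SelectByGUID(self, guid: UUID) -> int:
        if guid not in self.alarms:
            return -1  # assumed "not found" code
        self.selected.add(guid)
        return 0

    def AlarmAckSelected(self, comment: str, opr_name: str, opr_node: str,
                         opr_domain: str, opr_full_name: str) -> int:
        # Acks everything currently selected, per the description above.
        for guid in self.selected:
            self.acked[guid] = (comment, opr_name)
        return 0


def ack_via_selection(client, guid: UUID, comment: str, opr_name: str,
                      opr_node: str, opr_domain: str, opr_full_name: str) -> int:
    """Hypothetical single-alarm ack through the selection family."""
    # Clear first: a stale selection would silently widen the ack.
    rc = client.DeselectAll()
    if rc == 0:
        rc = client.SelectByGUID(guid)
    if rc != 0:
        return rc
    return client.AlarmAckSelected(comment, opr_name, opr_node, opr_domain, opr_full_name)
```

Without the DeselectAll step, a GUID left selected from an earlier operation would be acked alongside the intended one.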

What this means

The architecture comment on src/MxGateway.Worker/MxAccess/AlarmClientConsumer.cs (PR A.5) is wrong against this deployed assembly:

"The AVEVA alarm-manager surface (IAlarmMgrDataProvider) exposes the events we need as plain .NET events — no Windows message pump required."

There is no managed event surface. AlarmClient.RegisterConsumer takes an hWnd because WM_APP messaging is the actual notification mechanism: AVEVA's alarm provider WM_APP-pokes the registered window, and the consumer is expected to call GetStatistics on each poke to pull ChangeCodes / ChangePos / hAlarm arrays, then GetAlarmExtendedRec(pos, …) per index to fetch each changed record.
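The pull-on-poke step above can be sketched in Python to show the control flow only (the real consumer is C# against aaAlarmManagedClient). Stats, drain_changes, and the callable parameters are hypothetical stand-ins for GetStatistics' out-parameters and for GetAlarmExtendedRec. The sketch assumes ChangeCodes / ChangePos / hAlarm line up index by index, which this doc has not confirmed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Stats:
    """Hypothetical bundle of the GetStatistics out-parameters this sketch uses."""
    rc: int
    percent_query: int
    total: int
    active: int
    change_codes: List[int] = field(default_factory=list)
    change_pos: List[int] = field(default_factory=list)
    handles: List[int] = field(default_factory=list)  # hAlarm[]


def drain_changes(get_statistics: Callable[[], Stats],
                  get_record: Callable[[int], Tuple[int, Any]]) -> List[Tuple[int, int, Any]]:
    """Handle one poke: pull the change set, then fetch each changed record by position.

    get_record stands in for GetAlarmExtendedRec(pos, out rec) and returns (rc, rec).
    Returns (change_code, handle, record) triples for successful fetches.
    """
    stats = get_statistics()
    if stats.rc != 0:
        raise RuntimeError(f"GetStatistics failed: rc={stats.rc}")
    changed = []
    # zip stops at the shortest array, so a code with no position fetches nothing.
    for code, pos, handle in zip(stats.change_codes, stats.change_pos, stats.handles):
        rc, rec = get_record(pos)
        if rc == 0:
            changed.append((code, handle, rec))
    return changed
```

Because zip truncates, a change code with no matching position (for example a codes=[7] sample with empty positions[]) produces no record fetch.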

AlarmClientConsumer.AlarmRecordReceived has no production callers as a result — RaiseAlarmRecordReceived is internal for tests and never gets invoked at runtime. Until A.2 lands a WM_APP pump, MX_EVENT_FAMILY_ON_ALARM_TRANSITION cannot carry events.

Live runtime probe — 2026-05-01

MxGateway.Worker.Tests.AlarmClientWmProbeTests.ProbeAlarmClientWmMessages is a Skip-gated runtime probe that creates a real message-only window, calls AlarmClient.RegisterConsumer(hWnd, …) + Subscribe(@"\Galaxy!", …), and pumps for 20s while logging every window message that arrives. Run results below — this turned the "WM_APP pump" design assumption upside down.

RegisterConsumer and Subscribe both returned 0 (success). The calls are valid against the deployed assembly; no parameter pinning needed.

A registered window message (ID 0xC275 in this OS session) fired roughly every 1s after Subscribe completed. All 20 hits carried the same wParam = 0x00001100 and the same lParam = 0x079E46D8 (apparently a stable pointer into AVEVA-internal state). A constant payload, with no Galaxy alarm being fired, suggests a heartbeat/keepalive rather than a per-change notification.
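The heartbeat reading rests on two observations: the payload never changes, and the hits arrive on a steady ~1s cadence. A small Python heuristic that encodes exactly those two checks over logged hits is sketched below. WmHit, looks_like_heartbeat, and the tolerance values are illustrative assumptions, not part of the probe.

```python
from statistics import mean
from typing import List, NamedTuple


class WmHit(NamedTuple):
    """Hypothetical record of one logged window message."""
    t: float      # seconds since Subscribe returned
    wparam: int
    lparam: int


def looks_like_heartbeat(hits: List[WmHit], period_s: float = 1.0,
                         tolerance_s: float = 0.25) -> bool:
    """Heuristic: identical payload on every hit and a steady cadence near period_s."""
    if len(hits) < 3:
        return False  # too few hits to judge a cadence
    if len({(h.wparam, h.lparam) for h in hits}) != 1:
        return False  # payload varies, so it may carry per-change data
    gaps = [b.t - a.t for a, b in zip(hits, hits[1:])]
    steady = max(gaps) - min(gaps) <= 2 * tolerance_s
    return steady and abs(mean(gaps) - period_s) <= tolerance_s
```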

Critically: this WM is delivered to AVEVA's own internal window (hwnd=0x18032E) — NOT to the consumer's hWnd we passed in. The consumer window's WndProc received only the standard creation sequence (WM_GETMINMAXINFO, WM_NCCREATE, WM_NCCALCSIZE, WM_CREATE) and the destruction sequence (WM_NCDESTROY, WM_DESTROY, WM_NCCALCSIZE) — nothing in between. AVEVA's notification path runs entirely against AVEVA's internal window; it never forwards to the user-supplied hWnd.

The message ID itself is dynamic (a RegisterWindowMessage allocation in the >= 0xC000 range), so it cannot be hard-coded — each consumer process must call RegisterWindowMessage with the correct string and use whatever ID the OS returns.
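For reference, Win32 divides message IDs into fixed ranges, and 0xC275 falls in the RegisterWindowMessage range (0xC000 to 0xFFFF), not the WM_APP range (0x8000 to 0xBFFF). A pump filtering on WM_APP would therefore never match it. The classifier below is a Python restatement of those documented ranges; classify_message is a hypothetical helper name.

```python
WM_USER = 0x0400           # first class-private message ID
WM_APP = 0x8000            # first application-private message ID
REGISTERED_FIRST = 0xC000  # first ID handed out by RegisterWindowMessage
REGISTERED_LAST = 0xFFFF   # last ID in the registered (string-message) range


def classify_message(msg: int) -> str:
    """Map a Win32 message ID to its documented range."""
    if msg < 0:
        raise ValueError("message IDs are unsigned")
    if msg < WM_USER:
        return "system"
    if msg < WM_APP:
        return "WM_USER range (class-private)"
    if msg < REGISTERED_FIRST:
        return "WM_APP range (application-private)"
    if msg <= REGISTERED_LAST:
        return "registered (RegisterWindowMessage)"
    return "reserved"
```

For example, classify_message(0xC275) returns "registered (RegisterWindowMessage)", while classify_message(0x0001) (WM_CREATE) returns "system".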

What this means for A.2

The "WM_APP pump on the user hWnd" design — what the original plan banner described and what the previous version of this doc recommended — does not match how AVEVA actually delivers notifications. The hWnd parameter to RegisterConsumer does not appear to receive any of AVEVA's alarm traffic; it's likely used only as a registration identity (and perhaps as a parent for modal dialogs).

Two viable A.2 designs given the probe data:

  1. Polling. Call GetStatistics on a timer (e.g. every 500ms in the worker's STA) and react to the change set it reports. No window plumbing is needed. Trade-off: the latency floor equals the poll period, plus a modest CPU floor since the call is cheap. This fits the heartbeat-style WM 0xC275 semantics, which suggest AVEVA itself runs a poll loop internally.
  2. Hook AVEVA's internal window. Discover AVEVA's own window (hwnd=0x18032E in the probe), SetWindowsHookEx or SetWindowSubclass on it, and intercept WM 0xC275 on AVEVA's thread. Higher fidelity, near-zero latency, but invasive, fragile across AVEVA upgrades, and requires running on the same process / thread as the AVEVA window. Probably a non-starter without further AVEVA documentation.

Recommendation: the polling path (option 1) is cheaper to implement, more robust against AVEVA-internal change, and acceptable for a typical alarm cadence. The worker's existing STA already provides a thread-affinitized timer surface. The unanswered question is whether GetStatistics can be safely called outside AVEVA's own message-pump thread — confirmable by extending the probe to fire GetStatistics on its own thread and check the result.
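If the polling path is taken, the loop itself is small. The Python sketch below stands in for the worker's STA timer under stated assumptions: run_poller and poll_once are hypothetical names, and poll_once would wrap the GetStatistics drain. It keeps every call on the thread that invokes run_poller, which sidesteps the open cross-thread question by design.

```python
import threading
import time
from typing import Callable


def run_poller(poll_once: Callable[[], None], period_s: float,
               stop: threading.Event) -> int:
    """Hypothetical poll loop: call poll_once every period_s until stop is set.

    All calls happen on the thread that invokes run_poller, standing in for the
    STA that did RegisterConsumer + Subscribe. Returns the number of polls.
    """
    polls = 0
    next_due = time.monotonic()
    while not stop.is_set():
        poll_once()  # e.g. GetStatistics plus per-position record fetches
        polls += 1
        # Schedule against a monotonic deadline so a slow poll shortens the
        # next wait instead of pushing every later poll back.
        next_due += period_s
        stop.wait(max(0.0, next_due - time.monotonic()))  # interruptible sleep
    return polls
```

Setting stop from another thread ends the loop promptly, because Event.wait returns as soon as the event is set.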

Alarm-provider visibility — third probe run, 2026-05-01

Extended the probe to call AlarmClient.GetProviders after RegisterConsumer. Result on this rig:

GetProviders -> rc=0 count=0 list=[]

Zero alarm providers visible to the consumer process. This explains every preceding probe run: no providers means no alarm events, regardless of how many times any value (including a bool with an $Alarm extension) flips. Subscribe(@"\Galaxy!") returns 0 (success) but matches nothing because the alarm-manager chain that provides the matching feed doesn't expose any provider to this consumer.

A System Platform script flipping TestMachine_001.TestAlarm001 every 10s during this probe run produced no observable GetStatistics transitions: no positions[] / handles[] entries and no change in any field. That confirms the silence is not about subscription scope or the message pump but about provider absence.

Possible causes

  1. No $Alarm extension on the test bool. If TestMachine_001.TestAlarm001 is a regular UDA without a BoolAlarm extension wired to it, flipping the value just writes a new value — no alarm fires.
  2. Alarm manager service not running. AVEVA's aaAlarmMgr (or the equivalent on this rig's Platform version) needs to be running for providers to register.
  3. Process security context. A consumer running under a normal user account may not see providers that registered under LocalSystem / a Platform service identity. The gateway-worker installation runs under a service account that may have access where dotnet test doesn't.

InitializeConsumer required — fourth probe run, 2026-05-01

Adding InitializeConsumer("AlarmProbe.Tests") before RegisterConsumer made \Galaxy! appear in GetProviders (count=1, status 0 → 100 within 500ms). So #2 and #3 above are NOT the cause — the consumer can see the alarm provider once it calls Initialize. That's a missing API-call ordering, not a permission or service issue.

InitializeConsumer -> 0
RegisterConsumer -> 0
GetProviders [after Register] -> rc=0 count=0 list=[]
Subscribe('\Galaxy!') -> 0
GetProviders [after Subscribe] -> rc=0 count=1 list=[  0 \Galaxy!]
GetProviders [poll #1] -> rc=0 count=1 list=[100 \Galaxy!]

Despite the provider being visible at "100% query complete" for the entire 60s window, GetStatistics continued to report total=0 active=0 codes=[7] — no alarm transitions reached the consumer even with a System Platform script flipping the test boolean every 10s during the run.

That isolates the remaining unknown to whether the test bool's alarm extension is actually generating MxAccess alarm-provider events when its value flips. The probe has confirmed every link in the consumer chain works (Initialize → Register → Subscribe → provider visible at 100%) — what's missing is alarm traffic from the producer side. ObjectViewer or another live consumer running alongside the script is the next discriminator: does it visibly see the alarm fire?

API-ordering finding: InitializeConsumer MUST precede RegisterConsumer (or at least, must be called before GetProviders returns anything). PR A.5's AlarmClientConsumer omits InitializeConsumer entirely — that's a bug fix to apply even before A.2 lands, since without it the provider chain never becomes visible.
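The ordering fix can be made self-checking. Subscribe returns 0 even when no provider is visible, so return codes alone cannot detect a missing InitializeConsumer, but a short wait for the provider to reach 100% in GetProviders can. The Python sketch below assumes a simplified client: the real RegisterConsumer / Subscribe take more parameters (hWnd, priority range, query type, and so on), and GetProviders' signature is not recorded in this doc, so the fake returns (rc, [(status, name)]) to mirror the probe log. start_consumer and ConsumerStartError are hypothetical names.

```python
import time
from typing import List, Tuple

Provider = Tuple[int, str]  # (percent query complete, provider name), as in the probe log


class ConsumerStartError(RuntimeError):
    """Hypothetical error type for a failed consumer start-up."""


def start_consumer(client, app_name: str, subscription: str, provider: str,
                   timeout_s: float = 5.0, poll_s: float = 0.1) -> List[Provider]:
    """Initialize -> Register -> Subscribe, then wait until `provider` reports 100%.

    `client` is a simplified, hypothetical stand-in for AlarmClient.
    """
    steps = (
        ("InitializeConsumer", lambda: client.InitializeConsumer(app_name)),  # must come first
        ("RegisterConsumer", lambda: client.RegisterConsumer()),
        ("Subscribe", lambda: client.Subscribe(subscription)),
    )
    for name, call in steps:
        rc = call()
        if rc != 0:
            raise ConsumerStartError(f"{name} -> {rc}")
    # rc == 0 everywhere is not enough: Subscribe succeeds even with no provider.
    deadline = time.monotonic() + timeout_s
    while True:
        rc, providers = client.GetProviders()
        if rc == 0 and any(n == provider and s == 100 for s, n in providers):
            return providers
        if time.monotonic() >= deadline:
            raise ConsumerStartError(f"{provider} not at 100% after {timeout_s}s: {providers}")
        time.sleep(poll_s)
```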

Implications for A.2 implementation

The A.2 PR's value is unmeasurable until at least one real alarm transition reaches the consumer. The choice between polling via GetStatistics and the callback path can only be decided by observing what populates first when a real alarm fires. Without alarm traffic, both paths return the same "nothing happening" answer.

Until that's resolved, A.2 implementation work is genuinely blocked on a dev-rig configuration issue on the producer side (for example, possible cause #1 above), not on architectural choice or code structure.

GetStatistics polling — second probe run, 2026-05-01

Extended the probe to call GetStatistics every ~2s alongside the WM logger. Key findings:

  • GetStatistics is safely callable from the same thread that did RegisterConsumer + Subscribe. Every poll returned rc=0, with no exceptions, across 9 polls in the 20s window.
  • The deployed Galaxy currently has zero active alarms. Every poll reported total=0 active=0 suppressed=0 newAlarms=0. The positions[] and handles[] arrays were empty.
  • changes=1 codes=[7] was constant across all polls, matching the constant 1 Hz WM 0xC275 cadence. Code 7 is consistent with a "heartbeat / subscription healthy" sentinel — same semantics as the WM but reported through the pull-side API.
  • percent=100 (query-complete percentage) was constant — the subscription is steady-state.

This confirms the polling design (option 1 under "What this means for A.2") is mechanically viable. The remaining open question is whether GetStatistics populates positions[] / handles[] with real entries when an alarm transition actually fires; answering that requires firing an alarm.
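The discriminator for that open question can be written down ahead of time. The Python classifier below labels one GetStatistics sample. Treating code 7 as a heartbeat is an inference from this probe, not AVEVA documentation, and StatsSample / classify are hypothetical names.

```python
from typing import List, NamedTuple

HEARTBEAT_CODE = 7  # inferred from the probe's constant codes=[7]; not documented by AVEVA


class StatsSample(NamedTuple):
    """Hypothetical snapshot of one GetStatistics poll."""
    percent: int
    total: int
    active: int
    codes: List[int]
    positions: List[int]
    handles: List[int]


def classify(sample: StatsSample) -> str:
    if sample.positions or sample.handles:
        return "change"     # per-change data present: the polling path can see it
    if sample.codes and all(c == HEARTBEAT_CODE for c in sample.codes):
        return "heartbeat"  # matches the steady-state samples seen so far
    if not sample.codes:
        return "idle"
    return "unknown"        # non-7 codes with no positions: worth investigating
```

If every sample taken while an alarm is being fired still classifies as "heartbeat", that points toward the callback path rather than per-change polling.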

Open follow-up probes

Each can be added to AlarmClientWmProbeTests as a separate Skip-gated test:

  1. Fire a real Galaxy alarm during the pump window. The cleanest programmatic trigger is an MxAccess write that flips a $Alarm-extended boolean to true (alarm in) and back to false (alarm out). Pinning the exact tag reference is pending — needs either a documented test-fixture tag or an interactive selection in System Platform IDE. Once the trigger fires, this resolves whether AVEVA's pulled change set arrives via GetStatistics positions[] / handles[] (per-change polling works) or only via the AVEVA-internal window (callback path needed).
  2. Hook AVEVA's internal window to log what WMs it actually processes — only relevant if probe 1 shows GetStatistics does NOT report per-change activity.
  3. Decompile aaAlarmManagedClient.dll's IL for the RegisterConsumer method to find what RegisterWindowMessage string is used and whether there's a callback-registration surface on WNAL_Register that the managed client wraps. The alarmlst.dll strings (WNAL_CallBack, "Invalid callbacks" error) suggest the underlying C API takes callbacks, but the managed wrapper exposes none of them.

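PR A.5's Subscribe / AcknowledgeByGuid / SnapshotActiveAlarms are pull-style and don't depend on the notification mechanism, so they stand as written, apart from the missing InitializeConsumer call before RegisterConsumer noted in the fourth probe run.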