Files
mxaccessgw/docs/AlarmClientDiscovery.md
T
Joseph Doherty f490ae2593 docs: revise interop fix path — wnwrapConsumer.dll is the right surface
Reflection on aaAlarmManagedClient.AlarmClient shows it implements
only IDisposable (no [ComImport] interface, no class GUID) and
has a single field "CwwAlarmConsumer* m_almUnmanaged". So
AlarmClient is a C++/CLI managed wrapper around a native C++
class -- NOT a COM-interop class. The DateTime conversion happens
INSIDE AVEVA's wrapper IL, not at the .NET-COM marshaling
boundary. There's no separate COM interface to QI to.

Revised approach (in docs/AlarmClientDiscovery.md):

A. wnwrapConsumer.dll -- separate standalone COM library AVEVA
   ships at "C:\Program Files (x86)\Common Files\ArchestrA"
   exposing WNWRAPCONSUMERLib.wwAlarmConsumerClass with
   SetXmlAlarmQuery / GetXmlCurrentAlarms. XML-string output
   bypasses FILETIME marshaling entirely. Best fit -- real COM,
   self-contained, conventional production-grade approach.
B. Patch aaAlarmManagedClient.dll IL -- direct but modifies a
   vendor binary, brittle to upgrades.
C. Reflect into m_almUnmanaged and call native vtable directly
   -- requires reverse-engineering the C++ class layout.

Picking A. Probe restored to Skip; next commit starts the
wnwrapConsumer integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:15:37 -04:00

495 lines
22 KiB
Markdown

# aaAlarmManagedClient discovery — public surface, 2026-05-01
Result of running
`MxGateway.Worker.Tests.AlarmClientDiscoveryTests.DumpAlarmClientPublicSurface`
against the deployed AVEVA assembly:
- File:
`C:\Program Files (x86)\ArchestrA\Framework\Bin\ViewAppFramework\Content\MA\aaAlarmManagedClient.dll`
- Assembly identity: `aaAlarmManagedClient, Version=1.0.7368.41290,
Culture=neutral, PublicKeyToken=7ebd82b507d9e10c`
## Public types
- `aaAlarmManagedClient.AlarmClient` (class)
- `aaAlarmManagedClient.PriorityData` (class)
That's the entire exported surface — two types, no interfaces, no
delegates.
## `AlarmClient` events
**None.** The class has no public events at all. The reflection probe's
`GetEvents(BindingFlags.Public | Instance | Static)` returned an empty
list.
## `AlarmClient` methods (relevant subset)
- **Lifecycle:**
`RegisterConsumer(int hWnd, string szProductName, string
szApplicationName, string szVersion, bool bRetainHiddenAlarms) → int`,
`DeregisterConsumer() → int`,
`InitializeConsumer(string szApplicationName) → int`,
`UninitializeConsumer() → int`,
`Dispose()`.
- **Subscription:**
`Subscribe(string szSubscription, short wFromPri, short wToPri,
eQueryType QueryType, eSortFlags SortFlags, eAlarmFilterState
FilterMask, eAlarmFilterState FilterSpecification) → int`.
- **Change enumeration (pull on poke):**
`GetStatistics(out int lPercentQuery, out int lTotalAlarms, out int
lActiveAlarms, out int lSuppressedAlarms, out int lSuppressedFilters,
out int lNewAlarms, out int lChangesCount, out int[] ChangeCodes,
out int[] ChangePos, out int[] hAlarm) → int`.
- **Record fetch:**
`GetAlarmExtendedRec(int lIndex, out AlarmRecord almRec) → int`,
`GetAlarmExtendedRec2(...)`,
`GetHighPriAlarm(out AlarmRecord almRec) → int`.
- **Selection model** (used by ack-selected-* family):
`DeselectAll`, `SelectAlaramEntry(short select, int from, int to)`,
`SelectByGUID(Guid)`, `SelectAlarmCount(int from, int to)`.
- **Acknowledge:**
`AlarmAckByGUID(Guid alarmGuid, string ackComment, string ackOprName,
string ackOprNode, string ackOprDomain, string ackOprFullName) → int`
is the per-alarm full-fidelity native ack.
`AlarmAckSelected(string ackComment, string ackOprName, string
ackOprNode, string ackOprDomain, string ackOprFullName) → int`
acks whatever the selection model currently has selected.
Several `AckSelected*Group/Tag/Priority/All/Visible*Alarms_Ex(...)`
variants exist for bulk ack scoped to a group / tag / priority range.
- **Suppress / shelve:** `SupressSelected*` and `ShelveSelected*`
families plus `DoAlarmShelveAction(...)`. Out of scope for the v1
alarm path.
- **Snapshot/filter** (`SF*` prefix): `SFSetSortA / SFSetFilterA /
SFCreateSnapshot / SFGetListCount / SFDeleteSnapshot / SFRefreshAlarm /
SFGetStatistics`. Snapshot-style query API, distinct from the
consumer-subscription path. Not currently used.
## What this means
The architecture comment on
`src/MxGateway.Worker/MxAccess/AlarmClientConsumer.cs` (PR A.5) is
**wrong against this deployed assembly**:
> "The AVEVA alarm-manager surface (`IAlarmMgrDataProvider`) exposes
> the events we need as plain .NET events — no Windows message pump
> required."
There is no managed event surface. `AlarmClient.RegisterConsumer`
takes an `hWnd` because **WM_APP messaging is the actual notification
mechanism**: AVEVA's alarm provider WM_APP-pokes the registered window,
and the consumer is expected to call `GetStatistics` on each poke to
pull `ChangeCodes` / `ChangePos` / `hAlarm` arrays, then
`GetAlarmExtendedRec(pos, …)` per index to fetch each changed record.
`AlarmClientConsumer.AlarmRecordReceived` has no production callers as
a result — `RaiseAlarmRecordReceived` is `internal` for tests and
never gets invoked at runtime. Until A.2 lands a WM_APP pump,
`MX_EVENT_FAMILY_ON_ALARM_TRANSITION` cannot carry events.
## Live runtime probe — 2026-05-01
`MxGateway.Worker.Tests.AlarmClientWmProbeTests.ProbeAlarmClientWmMessages`
is a Skip-gated runtime probe that creates a real message-only
window, calls `AlarmClient.RegisterConsumer(hWnd, …)` +
`Subscribe(@"\Galaxy!", …)`, and pumps for 20s while logging every
window message that arrives. Run results below — this turned the
"WM_APP pump" design assumption upside down.
**`RegisterConsumer` and `Subscribe` both returned 0 (success).** The
calls are valid against the deployed assembly; no parameter pinning
needed.
**A registered-message-class WM (ID `0xC275` in this OS session)
fired every ~1s after `Subscribe` completed.** Constant
`wParam = 0x00001100`, constant `lParam = 0x079E46D8` (looks like a
stable pointer into AVEVA-internal state) for all 20 hits. The
constant payload across hits with no Galaxy alarm being fired
suggests this is a **heartbeat/keepalive**, not a per-change
notification.
**Critically: this WM is delivered to AVEVA's own internal window
(`hwnd=0x18032E`) — NOT to the consumer's `hWnd` we passed in.** The
consumer window's `WndProc` received only the standard creation
sequence (`WM_GETMINMAXINFO`, `WM_NCCREATE`, `WM_NCCALCSIZE`,
`WM_CREATE`) and the destruction sequence (`WM_NCDESTROY`,
`WM_DESTROY`, `WM_NCCALCSIZE`) — nothing in between. AVEVA's
notification path runs entirely against AVEVA's internal window;
it never forwards to the user-supplied hWnd.
The message ID itself is dynamic (a `RegisterWindowMessage`
allocation in the >= 0xC000 range), so it cannot be hard-coded —
each consumer process must call `RegisterWindowMessage` with the
correct *string* and use whatever ID the OS returns.
## What this means for A.2
The "WM_APP pump on the user hWnd" design — what the original plan
banner described and what the previous version of this doc
recommended — does not match how AVEVA actually delivers
notifications. The hWnd parameter to `RegisterConsumer` does not
appear to receive any of AVEVA's alarm traffic; it's likely used
only as a registration identity (and perhaps as a parent for modal
dialogs).
Two viable A.2 designs given the probe data:
1. **Polling.** Just call `GetStatistics` on a timer (e.g. every
500ms in the worker's STA) and react to the change set it
reports. No window plumbing needed. Trade-off: latency floor =
poll period; modest CPU floor because the call is cheap. Matches
the heartbeat-style WM 0xC275 semantics — AVEVA itself runs a
poll loop internally.
2. **Hook AVEVA's internal window.** Discover AVEVA's own window
(`hwnd=0x18032E` in the probe), `SetWindowsHookEx` or
`SetWindowSubclass` on it, and intercept WM 0xC275 on AVEVA's
thread. Higher fidelity, near-zero latency, but invasive,
fragile across AVEVA upgrades, and requires running on the same
process / thread as the AVEVA window. Probably a non-starter
without further AVEVA documentation.
**Recommendation:** the polling path (option 1) is cheaper to
implement, more robust against AVEVA-internal change, and
acceptable for a typical alarm cadence. The worker's existing STA
already provides a thread-affinitized timer surface. The unanswered
question is whether `GetStatistics` can be safely called outside
AVEVA's own message-pump thread — confirmable by extending the
probe to fire `GetStatistics` on its own thread and check the
result.
## Alarm-provider visibility — third probe run, 2026-05-01
Extended the probe to call `AlarmClient.GetProviders` after
`RegisterConsumer`. Result on this rig:
```
GetProviders -> rc=0 count=0 list=[]
```
**Zero alarm providers visible to the consumer process.** This
explains every preceding probe run: no providers means no alarm
events, regardless of how many times any value (including a
bool with an `$Alarm` extension) flips. `Subscribe(@"\Galaxy!")`
returns 0 (success) but matches nothing because the alarm-manager
chain that provides the matching feed doesn't expose any provider
to this consumer.
A System Platform script flipping `TestMachine_001.TestAlarm001`
every 10s during this probe run produced no observable
`GetStatistics` transitions, no `positions[]` / `handles[]`
entries, no change in any field — confirms the silence is not
about subscription-scope / message-pump but about provider
absence.
### Possible causes
1. **No `$Alarm` extension on the test bool.** If
`TestMachine_001.TestAlarm001` is a regular UDA without a
`BoolAlarm` extension wired to it, flipping the value just
writes a new value — no alarm fires.
2. **Alarm manager service not running.** AVEVA's `aaAlarmMgr`
(or the equivalent on this rig's Platform version) needs to
be running for providers to register.
3. **Process security context.** A consumer running under a
normal user account may not see providers that registered
under `LocalSystem` / a Platform service identity. The
gateway-worker installation runs under a service account
that may have access where `dotnet test` doesn't.
## InitializeConsumer required — fourth probe run, 2026-05-01
Adding `InitializeConsumer("AlarmProbe.Tests")` before
`RegisterConsumer` made `\Galaxy!` appear in `GetProviders`
(count=1, status 0 → 100 within 500ms). So #2 and #3 above are
NOT the cause — the consumer can see the alarm provider once it
calls Initialize. That's a missing API-call ordering, not a
permission or service issue.
```
InitializeConsumer -> 0
RegisterConsumer -> 0
GetProviders [after Register] -> rc=0 count=0 list=[]
Subscribe('\Galaxy!') -> 0
GetProviders [after Subscribe] -> rc=0 count=1 list=[ 0 \Galaxy!]
GetProviders [poll #1] -> rc=0 count=1 list=[100 \Galaxy!]
```
Despite the provider being visible at "100% query complete" for
the entire 60s window, `GetStatistics` continued to report
`total=0 active=0 codes=[7]` — no alarm transitions reached the
consumer even with a System Platform script flipping the test
boolean every 10s during the run.
That isolates the remaining unknown to whether the test bool's
alarm extension is actually generating MxAccess alarm-provider
events when its value flips. The probe has confirmed every link
in the consumer chain works (Initialize → Register → Subscribe →
provider visible at 100%) — what's missing is alarm traffic from
the producer side. ObjectViewer or another live consumer running
alongside the script is the next discriminator: does it visibly
see the alarm fire?
API-ordering finding: `InitializeConsumer` MUST precede
`RegisterConsumer` (or at least, must be called before
`GetProviders` returns anything). PR A.5's `AlarmClientConsumer`
omits `InitializeConsumer` entirely — that's a bug fix to apply
even before A.2 lands, since without it the provider chain never
becomes visible.
## Subscribe-parameter sweep — fifth probe run, 2026-05-01
Even with `InitializeConsumer` + provider visible at status 100,
no alarm transitions arrived during a 60s window with the user's
script flipping the test bool every 10s. Tried:
- `qtSummary` and `qtHistory` (the only `eQueryType` values).
- Priority 1..999 and 0..32767.
- `eAlarmFilterState.asNone` and `asAlarmActiveNow` for both
`FilterMask` and `FilterSpecification`.
`eAlarmFilterState` is single-state-valued (asNone=0,
asAlarmActiveNow=1, asAlarmAcked=2, asShelved=3), not flag bits.
None of these knobs surfaced any alarm activity.
User confirmation 2026-05-01: the test bool does have a
`BoolAlarm` extension on it; in `aaObjectViewer` the
`$Alarm.InAlarm` sub-attribute flips true/false in lockstep with
the script's writes. So the alarm extension is **evaluating**
its condition, just not visibly producing transitions on the
`aaAlarmManagedClient` consumer stream.
## Multi-channel + multi-subscription probe — sixth run, 2026-05-01
Extended the probe to try every consumer-side approach in
parallel:
- **Subscription expressions** (sequential): `\Galaxy!`,
`\Galaxy!*`, `\\Galaxy!`, `\Galaxy!TestArea`, `\\.\Galaxy!`.
All Subscribe calls returned rc=0; the last one
(`\\.\Galaxy!`) is reflected in `GetProviders` (count=1).
- **Read channels** polled at 500ms cadence: `GetStatistics`,
`GetHighPriAlarm`, `SFCreateSnapshot` + `SFGetStatistics`.
- **Filter+sort**: priority 0..32767, `qtSummary`,
state=`asAlarmActiveNow`, sort=`sfReturnNewestFirst`.
- **AlarmRecord init** (worked around `Not a valid Win32
FileTime` exception): all DateTime fields pre-set to FILETIME
epoch (1601-01-01 UTC) before the call, since
`default(DateTime)` is outside FILETIME range and trips the
interop marshaler.
Result of the 60s run with `TestMachine_001.TestAlarm001` being
flipped every 10s:
```
Subscribe('\Galaxy!') -> 0
Subscribe('\Galaxy!*') -> 0
Subscribe('\\Galaxy!') -> 0
Subscribe('\Galaxy!TestArea') -> 0
Subscribe('\\.\Galaxy!') -> 0
GetProviders [after Subscribe-multi] -> count=1 list=[ 0 \\.\Galaxy!]
GetStatistics #1: total=0 active=0 changes=1 codes=[7] positions=[] handles=[]
GetHighPriAlarm #1: rc=0 { }
SF channel #1: SFCreate=0 numAlarms=0 SFStats=0 unackRet=0 unackAlm=0 ackAlm=0 others=0 events=0 idxNewest=-1
```
**No further "(changed)" entries for the entire 60s window.**
Every read API returned the same empty result on every poll.
User confirms the alarm IS firing — `aaObjectViewer` sees
`$Alarm.InAlarm` flip in lockstep with the script. Historian
records exist (per user — needs verification by querying the
historian directly).
## Conclusion of consumer-side probing
`aaAlarmManagedClient.AlarmClient` is **not** the receive
surface AVEVA's alarm pipeline routes to in this Galaxy
configuration. The consumer chain is verified end-to-end:
- `InitializeConsumer` + `RegisterConsumer` + `Subscribe` all
succeed (rc=0).
- `GetProviders` finds `\Galaxy!` once Initialize is called.
- All read APIs (`GetStatistics`, `GetHighPriAlarm`,
`SFCreateSnapshot`/`SFGetStatistics`) return empty even with
every documented filter combination.
- The consumer's hWnd receives zero AVEVA messages between
`WM_CREATE` and `WM_DESTROY`; AVEVA's traffic goes to its own
internal hwnd.
The next investigation directions are not consumer-side:
1. **Inspect `aaObjectViewer`'s alarm SDK** to see what library
it uses to read alarms. If different from
`aaAlarmManagedClient`, switch the worker over.
2. **Query the historian directly** (`aahEventStorage` /
`aahEventSvc`) to confirm alarms are recorded — and use the
same path for v2 alarm capture.
3. **Inspect AVEVA's alarm-routing config** for this Galaxy in
System Platform IDE — area assignments, alarm provider
bindings, "publish alarm events to" settings on the platform.
For A.2 implementation: the `aaAlarmManagedClient` path the
gateway-worker is currently architected around may be a
dead-end on customer Galaxies configured this way. If the
alarms truly only flow through the historian event-storage path,
A.2 needs to consume from `aahEventStorage` instead — a
fundamental architecture pivot.
## BREAKTHROUGH — seventh probe run, 2026-05-01
Two changes finally produced a signal:
1. **Subscription scope:** `\\<MachineName>\Galaxy!<TopArea>` is the
canonical AlarmClient subscription format (per ArchestrA Alarm
Client docs at `archestra6.rssing.com/chan-12008125/article13.html`):
`\\Node\Provider!Area!Filter`, where Node is the *machine* name,
Provider is **literally `Galaxy`**, and Area is a hosted area
object. For this rig (`\\DESKTOP-6JL3KKO\Galaxy!DEV`) the DEV
area — the platform's primary area — is the right scope. Earlier
`\Galaxy!`, `\Galaxy!TestArea`, `\\.\Galaxy!`, etc., all returned
rc=0 but matched no traffic — they were not the canonical form.
2. **`InitializeConsumer` before `RegisterConsumer`** — already
discovered earlier; bug-fix for PR A.5's `AlarmClientConsumer`.
With both in place, `GetHighPriAlarm` returned a record on every
poll for 60s straight (117/117 calls), but threw
`ArgumentOutOfRangeException: Not a valid Win32 FileTime` instead
of returning successfully — the AlarmRecord struct contains five
DateTime fields (`ar_Time`, `ar_OrigTime`, `ar_AckTime`,
`ar_RtnTime`, `ar_SubTime`) and AVEVA writes sentinel/invalid
FILETIME values for unset ones (e.g., `ar_AckTime` for an
unacknowledged alarm). The .NET interop that AVEVA ships
(`aaAlarmManagedClient.dll`) auto-converts FILETIME→DateTime and
rejects out-of-range values.
`GetStatistics` continues to report `total=0 active=0` even with
GetHighPriAlarm returning records — those two API surfaces have
genuinely different views in AVEVA's data model.
So: **alarms flow through `aaAlarmManagedClient.AlarmClient` once
the subscription expression is canonical**. The blocking issue is
extracting the payload past the .NET interop's DateTime
auto-marshaling.
## Remaining work to capture alarm payloads
Define a custom COM interop that uses `long` (FILETIME-as-int64)
instead of `DateTime` for the timestamp fields. Approach options:
1. **Patch the AVEVA-shipped `aaAlarmManagedClient.dll`** — ildasm
the assembly, replace `DateTime` with `long` on AlarmRecord's
timestamp fields, ilasm back. Brittle across AVEVA upgrades.
2. **Write our own `[ComImport]` interface** — declare
`IRawAlarmConsumer` ourselves with safe-blittable types,
discover the underlying COM IID (via reflection on
`AlarmClient`'s `[Guid]` attribute), and `(IRawAlarmConsumer)
alarmClient` cast. Cleaner; requires the IID.
3. **Use `IDispatch` late binding** — dispatch-Invoke bypasses
strong-typed marshaling. Verbose but doesn't need IIDs.
For PR A.2's worker integration, option 2 is the least
disruptive. Once the interop is custom, `AlarmClient.Subscribe` +
`GetHighPriAlarm` + `GetAlarmExtendedRec` form a viable
polling-style alarm consumer.
**REVISED 2026-05-01 — option 1 not directly applicable.**
Reflection on `aaAlarmManagedClient.AlarmClient` shows it
implements only `IDisposable` (no `[ComImport]` interface, no
class GUID). It has a single field `CwwAlarmConsumer*
m_almUnmanaged` — meaning `AlarmClient` is a **C++/CLI managed
wrapper around a native C++ class**, NOT a COM-interop class.
The DateTime conversion happens inside the AVEVA wrapper's IL,
not at a .NET-to-COM marshaling boundary. There is no separate
COM interface IID we can QI to.
Revised approach options:
A. **Switch to `wnwrapConsumer.dll`** — a separate standalone
COM library AVEVA ships at
`C:\Program Files (x86)\Common Files\ArchestrA\wnwrapConsumer.dll`
exposing `WNWRAPCONSUMERLib.wwAlarmConsumerClass` with
`SetXmlAlarmQuery` / `GetXmlCurrentAlarms`. XML-string output
bypasses FILETIME marshaling entirely.
B. **Patch `aaAlarmManagedClient.dll` IL** — wrap the unsafe
`DateTime.FromFileTime` calls with a safe variant. Direct
fix but modifies a vendor binary.
C. **Reflect into `m_almUnmanaged` and call native vtable** —
get the IntPtr, walk the MSVC C++ vtable, call
`__thiscall` methods via `Marshal.GetDelegateForFunctionPointer`.
Doable but requires reverse-engineering the C++ class layout.
Option A is the best fit: real COM-based, self-contained in
our code, conventional production-grade approach (the WIN-911
consumer pattern referenced in AVEVA support forums uses it).
The polling-vs-WM_APP-callback question from earlier is now
moot: `GetStatistics`'s `positions[]/handles[]` arrays remained
empty even when alarms were demonstrably present. The active
read API for current alarms is `GetHighPriAlarm`, not
`GetStatistics`'s change array.
### Implications for A.2 implementation
The A.2 PR's value is unmeasurable until at least one alarm
provider is visible. The choice between polling-via-`GetStatistics`
and the callback path can only be decided by observing what
populates first when a real alarm fires. Without a provider,
both paths return the same "nothing happening" answer.
Until that's resolved, A.2 implementation work is genuinely
blocked on a dev-rig configuration issue — not on architectural
choice or code structure.
## GetStatistics polling — second probe run, 2026-05-01
Extended the probe to call `GetStatistics` every ~2s alongside the
WM logger. Key findings:
- **`GetStatistics` is safely callable from the same thread that
did `RegisterConsumer` + `Subscribe`.** Every poll returned rc=0
with no exceptions over 9 polls / 20s window.
- **The deployed Galaxy currently has zero active alarms.** Every
poll reported `total=0 active=0 suppressed=0 newAlarms=0`. The
`positions[]` and `handles[]` arrays were empty.
- **`changes=1 codes=[7]` was constant across all polls**, matching
the constant 1 Hz WM 0xC275 cadence. Code 7 is consistent with a
"heartbeat / subscription healthy" sentinel — same semantics as
the WM but reported through the pull-side API.
- `percent=100` (query-complete percentage) was constant — the
subscription is steady-state.
This confirms the polling design (option 1 in the previous section)
is mechanically viable. The remaining open question is whether
`GetStatistics` populates `positions[] / handles[]` with real
entries when an alarm transition actually fires — proving that
requires firing an alarm.
## Open follow-up probes
Each can be added to `AlarmClientWmProbeTests` as a separate
Skip-gated test:
1. **Fire a real Galaxy alarm during the pump window.** The cleanest
programmatic trigger is an MxAccess write that flips a
`$Alarm`-extended boolean to true (alarm in) and back to false
(alarm out). Pinning the exact tag reference is pending — needs
either a documented test-fixture tag or an interactive selection
in System Platform IDE. Once the trigger fires, this resolves
whether AVEVA's pulled change set arrives via `GetStatistics`
`positions[] / handles[]` (per-change polling works) or only via
the AVEVA-internal window (callback path needed).
2. **Hook AVEVA's internal window** to log what WMs it actually
processes — only relevant if probe 1 shows `GetStatistics` does
NOT report per-change activity.
3. **Decompile `aaAlarmManagedClient.dll`'s IL** for the
`RegisterConsumer` method to find what `RegisterWindowMessage`
string is used and whether there's a callback-registration
surface on `WNAL_Register` that the managed client wraps. The
alarmlst.dll strings (`WNAL_CallBack`, "Invalid callbacks" error)
suggest the underlying C API takes callbacks, but the managed
wrapper exposes none of them.
PR A.5's `Subscribe` / `AcknowledgeByGuid` / `SnapshotActiveAlarms`
are correct — they're pull-style and don't depend on the
notification mechanism.