# aaAlarmManagedClient discovery — public surface, 2026-05-01

Result of running `MxGateway.Worker.Tests.AlarmClientDiscoveryTests.DumpAlarmClientPublicSurface` against the deployed AVEVA assembly:

- File: `C:\Program Files (x86)\ArchestrA\Framework\Bin\ViewAppFramework\Content\MA\aaAlarmManagedClient.dll`
- Assembly identity: `aaAlarmManagedClient, Version=1.0.7368.41290, Culture=neutral, PublicKeyToken=7ebd82b507d9e10c`

## Public types

- `aaAlarmManagedClient.AlarmClient` (class)
- `aaAlarmManagedClient.PriorityData` (class)

That's the entire exported surface — two types, no interfaces, no delegates.

## `AlarmClient` events

**None.** The class has no public events at all. The reflection probe's `GetEvents(BindingFlags.Public | Instance | Static)` returned an empty list.

## `AlarmClient` methods (relevant subset)

- **Lifecycle:** `RegisterConsumer(int hWnd, string szProductName, string szApplicationName, string szVersion, bool bRetainHiddenAlarms) → int`, `DeregisterConsumer() → int`, `InitializeConsumer(string szApplicationName) → int`, `UninitializeConsumer() → int`, `Dispose()`.
- **Subscription:** `Subscribe(string szSubscription, short wFromPri, short wToPri, eQueryType QueryType, eSortFlags SortFlags, eAlarmFilterState FilterMask, eAlarmFilterState FilterSpecification) → int`.
- **Change enumeration (pull on poke):** `GetStatistics(out int lPercentQuery, out int lTotalAlarms, out int lActiveAlarms, out int lSuppressedAlarms, out int lSuppressedFilters, out int lNewAlarms, out int lChangesCount, out int[] ChangeCodes, out int[] ChangePos, out int[] hAlarm) → int`.
- **Record fetch:** `GetAlarmExtendedRec(int lIndex, out AlarmRecord almRec) → int`, `GetAlarmExtendedRec2(...)`, `GetHighPriAlarm(out AlarmRecord almRec) → int`.
- **Selection model** (used by ack-selected-* family): `DeselectAll`, `SelectAlaramEntry(short select, int from, int to)`, `SelectByGUID(Guid)`, `SelectAlarmCount(int from, int to)`.
- **Acknowledge:** `AlarmAckByGUID(Guid alarmGuid, string ackComment, string ackOprName, string ackOprNode, string ackOprDomain, string ackOprFullName) → int` is the per-alarm full-fidelity native ack. `AlarmAckSelected(string ackComment, string ackOprName, string ackOprNode, string ackOprDomain, string ackOprFullName) → int` acks whatever the selection model currently has selected. Several `AckSelected*Group/Tag/Priority/All/Visible*Alarms_Ex(...)` variants exist for bulk ack scoped to a group / tag / priority range.
- **Suppress / shelve:** `SupressSelected*` and `ShelveSelected*` families plus `DoAlarmShelveAction(...)`. Out of scope for the v1 alarm path.
- **Snapshot/filter** (`SF*` prefix): `SFSetSortA / SFSetFilterA / SFCreateSnapshot / SFGetListCount / SFDeleteSnapshot / SFRefreshAlarm / SFGetStatistics`. Snapshot-style query API, distinct from the consumer-subscription path. Not currently used.

## What this means

The architecture comment on `src/MxGateway.Worker/MxAccess/AlarmClientConsumer.cs` (PR A.5) is **wrong against this deployed assembly**:

> "The AVEVA alarm-manager surface (`IAlarmMgrDataProvider`) exposes
> the events we need as plain .NET events — no Windows message pump
> required."

There is no managed event surface. `AlarmClient.RegisterConsumer` takes an `hWnd` because **WM_APP messaging is the actual notification mechanism**: AVEVA's alarm provider WM_APP-pokes the registered window, and the consumer is expected to call `GetStatistics` on each poke to pull `ChangeCodes` / `ChangePos` / `hAlarm` arrays, then `GetAlarmExtendedRec(pos, …)` per index to fetch each changed record.

`AlarmClientConsumer.AlarmRecordReceived` has no production callers as a result — `RaiseAlarmRecordReceived` is `internal` for tests and never gets invoked at runtime. Until A.2 lands a WM_APP pump, `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` cannot carry events.
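The pull sequence described in this section (one `GetStatistics` call per notification, then one `GetAlarmExtendedRec` call per reported position) can be sketched as below. This is only an illustration in Python against a stand-in object, not the AVEVA API: `FakeAlarmClient`, its snake_case methods, the reduced return tuples, and the change-code values are all hypothetical, and the sketch assumes `ChangePos` entries are the indices to pass to the record fetch.

```python
class FakeAlarmClient:
    """Hypothetical stand-in for AlarmClient; holds canned data for the sketch."""

    def __init__(self, records, change_codes, change_pos):
        self._records = records
        self._change_codes = change_codes
        self._change_pos = change_pos

    def get_statistics(self):
        # Mirrors only the change-set outputs of GetStatistics: (rc, codes, positions).
        return 0, list(self._change_codes), list(self._change_pos)

    def get_alarm_extended_rec(self, index):
        # Mirrors GetAlarmExtendedRec: (rc, record) for the given position.
        return 0, self._records[index]


def on_poke(client):
    """Handle one notification: pull the change set, then fetch each changed record."""
    rc, change_codes, change_pos = client.get_statistics()
    if rc != 0:
        return []
    changed = []
    for code, pos in zip(change_codes, change_pos):
        rec_rc, record = client.get_alarm_extended_rec(pos)
        if rec_rc == 0:
            changed.append((code, record))
    return changed


# Arbitrary canned data: two changes, at positions 0 and 2.
client = FakeAlarmClient(
    records=["rec-a", "rec-b", "rec-c"],
    change_codes=[1, 2],
    change_pos=[0, 2],
)
result = on_poke(client)
```

The handler is deliberately independent of how the poke arrives, so the same function could be driven by a window message or by a timer.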
## Live runtime probe — 2026-05-01

`MxGateway.Worker.Tests.AlarmClientWmProbeTests.ProbeAlarmClientWmMessages` is a Skip-gated runtime probe that creates a real message-only window, calls `AlarmClient.RegisterConsumer(hWnd, …)` + `Subscribe(@"\Galaxy!", …)`, and pumps for 20s while logging every window message that arrives. Run results below — this turned the "WM_APP pump" design assumption upside down.

**`RegisterConsumer` and `Subscribe` both returned 0 (success).** The calls are valid against the deployed assembly; no parameter pinning needed.

**A registered-message-class WM (ID `0xC275` in this OS session) fired every ~1s after `Subscribe` completed.** Constant `wParam = 0x00001100`, constant `lParam = 0x079E46D8` (looks like a stable pointer into AVEVA-internal state) for all 20 hits. The constant payload across hits with no Galaxy alarm being fired suggests this is a **heartbeat/keepalive**, not a per-change notification.

**Critically: this WM is delivered to AVEVA's own internal window (`hwnd=0x18032E`) — NOT to the consumer's `hWnd` we passed in.** The consumer window's `WndProc` received only the standard creation sequence (`WM_GETMINMAXINFO`, `WM_NCCREATE`, `WM_NCCALCSIZE`, `WM_CREATE`) and the destruction sequence (`WM_NCDESTROY`, `WM_DESTROY`, `WM_NCCALCSIZE`) — nothing in between. AVEVA's notification path runs entirely against AVEVA's internal window; it never forwards to the user-supplied hWnd.

The message ID itself is dynamic (a `RegisterWindowMessage` allocation in the >= 0xC000 range), so it cannot be hard-coded — each consumer process must call `RegisterWindowMessage` with the correct *string* and use whatever ID the OS returns.

## What this means for A.2

The "WM_APP pump on the user hWnd" design — what the original plan banner described and what the previous version of this doc recommended — does not match how AVEVA actually delivers notifications.
The hWnd parameter to `RegisterConsumer` does not appear to receive any of AVEVA's alarm traffic; it's likely used only as a registration identity (and perhaps as a parent for modal dialogs).

Two viable A.2 designs given the probe data:

1. **Polling.** Just call `GetStatistics` on a timer (e.g. every 500ms in the worker's STA) and react to the change set it reports. No window plumbing needed. Trade-off: latency floor = poll period; modest CPU floor because the call is cheap. Matches the heartbeat-style WM 0xC275 semantics — AVEVA itself runs a poll loop internally.
2. **Hook AVEVA's internal window.** Discover AVEVA's own window (`hwnd=0x18032E` in the probe), `SetWindowsHookEx` or `SetWindowSubclass` on it, and intercept WM 0xC275 on AVEVA's thread. Higher fidelity, near-zero latency, but invasive, fragile across AVEVA upgrades, and requires running on the same process / thread as the AVEVA window. Probably a non-starter without further AVEVA documentation.

**Recommendation:** the polling path (option 1) is cheaper to implement, more robust against AVEVA-internal change, and acceptable for a typical alarm cadence. The worker's existing STA already provides a thread-affinitized timer surface. The unanswered question is whether `GetStatistics` can be safely called outside AVEVA's own message-pump thread — confirmable by extending the probe to fire `GetStatistics` on its own thread and check the result.

## Alarm-provider visibility — third probe run, 2026-05-01

Extended the probe to call `AlarmClient.GetProviders` after `RegisterConsumer`. Result on this rig:

```
GetProviders -> rc=0 count=0 list=[]
```

**Zero alarm providers visible to the consumer process.** This explains every preceding probe run: no providers means no alarm events, regardless of how many times any value (including a bool with an `$Alarm` extension) flips.
`Subscribe(@"\Galaxy!")` returns 0 (success) but matches nothing because the alarm-manager chain that provides the matching feed doesn't expose any provider to this consumer.

A System Platform script flipping `TestMachine_001.TestAlarm001` every 10s during this probe run produced no observable `GetStatistics` transitions, no `positions[]` / `handles[]` entries, no change in any field — confirms the silence is not about subscription-scope / message-pump but about provider absence.

### Possible causes

1. **No `$Alarm` extension on the test bool.** If `TestMachine_001.TestAlarm001` is a regular UDA without a `BoolAlarm` extension wired to it, flipping the value just writes a new value — no alarm fires.
2. **Alarm manager service not running.** AVEVA's `aaAlarmMgr` (or the equivalent on this rig's Platform version) needs to be running for providers to register.
3. **Process security context.** A consumer running under a normal user account may not see providers that registered under `LocalSystem` / a Platform service identity. The gateway-worker installation runs under a service account that may have access where `dotnet test` doesn't.

## InitializeConsumer required — fourth probe run, 2026-05-01

Adding `InitializeConsumer("AlarmProbe.Tests")` before `RegisterConsumer` made `\Galaxy!` appear in `GetProviders` (count=1, status 0 → 100 within 500ms). So #2 and #3 above are NOT the cause — the consumer can see the alarm provider once it calls Initialize. That's a missing API-call ordering, not a permission or service issue.

```
InitializeConsumer -> 0
RegisterConsumer -> 0
GetProviders [after Register] -> rc=0 count=0 list=[]
Subscribe('\Galaxy!') -> 0
GetProviders [after Subscribe] -> rc=0 count=1 list=[ 0 \Galaxy!]
GetProviders [poll #1] -> rc=0 count=1 list=[100 \Galaxy!]
```

Despite the provider being visible at "100% query complete" for the entire 60s window, `GetStatistics` continued to report `total=0 active=0 codes=[7]` — no alarm transitions reached the consumer even with a System Platform script flipping the test boolean every 10s during the run.

That isolates the remaining unknown to whether the test bool's alarm extension is actually generating MxAccess alarm-provider events when its value flips. The probe has confirmed every link in the consumer chain works (Initialize → Register → Subscribe → provider visible at 100%) — what's missing is alarm traffic from the producer side. ObjectViewer or another live consumer running alongside the script is the next discriminator: does it visibly see the alarm fire?

API-ordering finding: `InitializeConsumer` MUST precede `RegisterConsumer` (or at least, must be called before `GetProviders` returns anything). PR A.5's `AlarmClientConsumer` omits `InitializeConsumer` entirely — that's a bug fix to apply even before A.2 lands, since without it the provider chain never becomes visible.

## Subscribe-parameter sweep — fifth probe run, 2026-05-01

Even with `InitializeConsumer` + provider visible at status 100, no alarm transitions arrived during a 60s window with the user's script flipping the test bool every 10s. Tried:

- `qtSummary` and `qtHistory` (the only `eQueryType` values).
- Priority 1..999 and 0..32767.
- `eAlarmFilterState.asNone` and `asAlarmActiveNow` for both `FilterMask` and `FilterSpecification`. `eAlarmFilterState` is single-state-valued (asNone=0, asAlarmActiveNow=1, asAlarmAcked=2, asShelved=3), not flag bits.

None of these knobs surfaced any alarm activity.

User confirmation 2026-05-01: the test bool does have a `BoolAlarm` extension on it; in `aaObjectViewer` the `$Alarm.InAlarm` sub-attribute flips true/false in lockstep with the script's writes.
So the alarm extension is **evaluating** its condition, just not visibly producing transitions on the `aaAlarmManagedClient` consumer stream.

## Multi-channel + multi-subscription probe — sixth run, 2026-05-01

Extended the probe to try every consumer-side approach in parallel:

- **Subscription expressions** (sequential): `\Galaxy!`, `\Galaxy!*`, `\\Galaxy!`, `\Galaxy!TestArea`, `\\.\Galaxy!`. All Subscribe calls returned rc=0; the last one (`\\.\Galaxy!`) is reflected in `GetProviders` (count=1).
- **Read channels** polled at 500ms cadence: `GetStatistics`, `GetHighPriAlarm`, `SFCreateSnapshot` + `SFGetStatistics`.
- **Filter+sort**: priority 0..32767, `qtSummary`, state=`asAlarmActiveNow`, sort=`sfReturnNewestFirst`.
- **AlarmRecord init** (worked around `Not a valid Win32 FileTime` exception): all DateTime fields pre-set to FILETIME epoch (1601-01-01 UTC) before the call, since `default(DateTime)` is outside FILETIME range and trips the interop marshaler.

Result of the 60s run with `TestMachine_001.TestAlarm001` being flipped every 10s:

```
Subscribe('\Galaxy!') -> 0
Subscribe('\Galaxy!*') -> 0
Subscribe('\\Galaxy!') -> 0
Subscribe('\Galaxy!TestArea') -> 0
Subscribe('\\.\Galaxy!') -> 0
GetProviders [after Subscribe-multi] -> count=1 list=[ 0 \\.\Galaxy!]
GetStatistics #1: total=0 active=0 changes=1 codes=[7] positions=[] handles=[]
GetHighPriAlarm #1: rc=0 { }
SF channel #1: SFCreate=0 numAlarms=0 SFStats=0 unackRet=0 unackAlm=0 ackAlm=0 others=0 events=0 idxNewest=-1
```

**No further "(changed)" entries for the entire 60s window.** Every read API returned the same empty result on every poll.

User confirms the alarm IS firing — `aaObjectViewer` sees `$Alarm.InAlarm` flip in lockstep with the script. Historian records exist (per user — needs verification by querying the historian directly).

## Conclusion of consumer-side probing

`aaAlarmManagedClient.AlarmClient` is **not** the receive surface AVEVA's alarm pipeline routes to in this Galaxy configuration.
The consumer chain is verified end-to-end:

- `InitializeConsumer` + `RegisterConsumer` + `Subscribe` all succeed (rc=0).
- `GetProviders` finds `\Galaxy!` once Initialize is called.
- All read APIs (`GetStatistics`, `GetHighPriAlarm`, `SFCreateSnapshot`/`SFGetStatistics`) return empty even with every documented filter combination.
- The consumer's hWnd receives zero AVEVA messages between `WM_CREATE` and `WM_DESTROY`; AVEVA's traffic goes to its own internal hwnd.

The next investigation directions are not consumer-side:

1. **Inspect `aaObjectViewer`'s alarm SDK** to see what library it uses to read alarms. If different from `aaAlarmManagedClient`, switch the worker over.
2. **Query the historian directly** (`aahEventStorage` / `aahEventSvc`) to confirm alarms are recorded — and use the same path for v2 alarm capture.
3. **Inspect AVEVA's alarm-routing config** for this Galaxy in System Platform IDE — area assignments, alarm provider bindings, "publish alarm events to" settings on the platform.

For A.2 implementation: the `aaAlarmManagedClient` path the gateway-worker is currently architected around may be a dead-end on customer Galaxies configured this way. If the alarms truly only flow through the historian event-storage path, A.2 needs to consume from `aahEventStorage` instead — a fundamental architecture pivot.

## BREAKTHROUGH — seventh probe run, 2026-05-01

Two changes finally produced a signal:

1. **Subscription scope:** `\\<node>\Galaxy!<area>` is the canonical AlarmClient subscription format (per ArchestrA Alarm Client docs at `archestra6.rssing.com/chan-12008125/article13.html`): `\\Node\Provider!Area!Filter`, where Node is the *machine* name, Provider is **literally `Galaxy`**, and Area is a hosted area object. For this rig (`\\DESKTOP-6JL3KKO\Galaxy!DEV`) the DEV area — the platform's primary area — is the right scope. Earlier `\Galaxy!`, `\Galaxy!TestArea`, `\\.\Galaxy!`, etc., all returned rc=0 but matched no traffic — they were not the canonical form.
2.
   **`InitializeConsumer` before `RegisterConsumer`** — already discovered earlier; bug-fix for PR A.5's `AlarmClientConsumer`.

With both in place, `GetHighPriAlarm` returned a record on every poll for 60s straight (117/117 calls), but threw `ArgumentOutOfRangeException: Not a valid Win32 FileTime` instead of returning successfully — the AlarmRecord struct contains five DateTime fields (`ar_Time`, `ar_OrigTime`, `ar_AckTime`, `ar_RtnTime`, `ar_SubTime`) and AVEVA writes sentinel/invalid FILETIME values for unset ones (e.g., `ar_AckTime` for an unacknowledged alarm). The .NET interop that AVEVA ships (`aaAlarmManagedClient.dll`) auto-converts FILETIME→DateTime and rejects out-of-range values.

`GetStatistics` continues to report `total=0 active=0` even with GetHighPriAlarm returning records — those two API surfaces have genuinely different views in AVEVA's data model.

So: **alarms flow through `aaAlarmManagedClient.AlarmClient` once the subscription expression is canonical**. The blocking issue is extracting the payload past the .NET interop's DateTime auto-marshaling.

## Remaining work to capture alarm payloads

Define a custom COM interop that uses `long` (FILETIME-as-int64) instead of `DateTime` for the timestamp fields. Approach options:

1. **Patch the AVEVA-shipped `aaAlarmManagedClient.dll`** — ildasm the assembly, replace `DateTime` with `long` on AlarmRecord's timestamp fields, ilasm back. Brittle across AVEVA upgrades.
2. **Write our own `[ComImport]` interface** — declare `IRawAlarmConsumer` ourselves with safe-blittable types, discover the underlying COM IID (via reflection on `AlarmClient`'s `[Guid]` attribute), and `(IRawAlarmConsumer) alarmClient` cast. Cleaner; requires the IID.
3. **Use `IDispatch` late binding** — dispatch-Invoke bypasses strong-typed marshaling. Verbose but doesn't need IIDs.

For PR A.2's worker integration, option 2 is the least disruptive.
Once the interop is custom, `AlarmClient.Subscribe` + `GetHighPriAlarm` + `GetAlarmExtendedRec` form a viable polling-style alarm consumer.

**REVISED 2026-05-01 — option 2 not directly applicable.** Reflection on `aaAlarmManagedClient.AlarmClient` shows it implements only `IDisposable` (no `[ComImport]` interface, no class GUID). It has a single field `CwwAlarmConsumer* m_almUnmanaged` — meaning `AlarmClient` is a **C++/CLI managed wrapper around a native C++ class**, NOT a COM-interop class. The DateTime conversion happens inside the AVEVA wrapper's IL, not at a .NET-to-COM marshaling boundary. There is no separate COM interface IID we can QI to.

Revised approach options:

A. **Switch to `wnwrapConsumer.dll`** — a separate standalone COM library AVEVA ships at `C:\Program Files (x86)\Common Files\ArchestrA\wnwrapConsumer.dll` exposing `WNWRAPCONSUMERLib.wwAlarmConsumerClass` with `SetXmlAlarmQuery` / `GetXmlCurrentAlarms`. XML-string output bypasses FILETIME marshaling entirely.

B. **Patch `aaAlarmManagedClient.dll` IL** — wrap the unsafe `DateTime.FromFileTime` calls with a safe variant. Direct fix but modifies a vendor binary.

C. **Reflect into `m_almUnmanaged` and call native vtable** — get the IntPtr, walk the MSVC C++ vtable, call `__thiscall` methods via `Marshal.GetDelegateForFunctionPointer`. Doable but requires reverse-engineering the C++ class layout.

Option A is the best fit: real COM-based, self-contained in our code, conventional production-grade approach (the WIN-911 consumer pattern referenced in AVEVA support forums uses it).

The polling-vs-WM_APP-callback question from earlier is now moot: `GetStatistics`'s `positions[]/handles[]` arrays remained empty even when alarms were demonstrably present. The active read API for current alarms is `GetHighPriAlarm`, not `GetStatistics`'s change array.

### Implications for A.2 implementation

The A.2 PR's value is unmeasurable until at least one alarm provider is visible.
The choice between polling-via-`GetStatistics` and the callback path can only be decided by observing what populates first when a real alarm fires. Without a provider, both paths return the same "nothing happening" answer.

Until that's resolved, A.2 implementation work is genuinely blocked on a dev-rig configuration issue — not on architectural choice or code structure.

## GetStatistics polling — second probe run, 2026-05-01

Extended the probe to call `GetStatistics` every ~2s alongside the WM logger. Key findings:

- **`GetStatistics` is safely callable from the same thread that did `RegisterConsumer` + `Subscribe`.** Every poll returned rc=0 with no exceptions over 9 polls / 20s window.
- **The deployed Galaxy currently has zero active alarms.** Every poll reported `total=0 active=0 suppressed=0 newAlarms=0`. The `positions[]` and `handles[]` arrays were empty.
- **`changes=1 codes=[7]` was constant across all polls**, matching the constant 1 Hz WM 0xC275 cadence. Code 7 is consistent with a "heartbeat / subscription healthy" sentinel — same semantics as the WM but reported through the pull-side API.
- `percent=100` (query-complete percentage) was constant — the subscription is steady-state.

This confirms the polling design (option 1 in the previous section) is mechanically viable. The remaining open question is whether `GetStatistics` populates `positions[] / handles[]` with real entries when an alarm transition actually fires — proving that requires firing an alarm.

## Open follow-up probes

Each can be added to `AlarmClientWmProbeTests` as a separate Skip-gated test:

1. **Fire a real Galaxy alarm during the pump window.** The cleanest programmatic trigger is an MxAccess write that flips a `$Alarm`-extended boolean to true (alarm in) and back to false (alarm out). Pinning the exact tag reference is pending — needs either a documented test-fixture tag or an interactive selection in System Platform IDE.
   Once the trigger fires, this resolves whether AVEVA's pulled change set arrives via `GetStatistics` `positions[] / handles[]` (per-change polling works) or only via the AVEVA-internal window (callback path needed).
2. **Hook AVEVA's internal window** to log what WMs it actually processes — only relevant if probe 1 shows `GetStatistics` does NOT report per-change activity.
3. **Decompile `aaAlarmManagedClient.dll`'s IL** for the `RegisterConsumer` method to find what `RegisterWindowMessage` string is used and whether there's a callback-registration surface on `WNAL_Register` that the managed client wraps. The alarmlst.dll strings (`WNAL_CallBack`, "Invalid callbacks" error) suggest the underlying C API takes callbacks, but the managed wrapper exposes none of them.

PR A.5's `Subscribe` / `AcknowledgeByGuid` / `SnapshotActiveAlarms` are correct — they're pull-style and don't depend on the notification mechanism.

## Option A — captured, 2026-05-01

`wnwrapConsumer.dll` (`C:\Program Files (x86)\Common Files\ArchestrA\wnwrapConsumer.dll`) hosts the standalone COM class `WNWRAPCONSUMERLib.wwAlarmConsumerClass`. Type library imports cleanly via `tlbimp` (output stored under `mxaccessgw/lib/Interop.WNWRAPCONSUMERLib.dll`). The COM class is registered in `HKLM:\SOFTWARE\WOW6432Node\Classes\CLSID\{7AB52E5F-36B2-4A30-AE46-952A746F667C}` with `ThreadingModel=Apartment` — `new wwAlarmConsumerClass()` succeeds via `CoCreateInstance`.

The probe `MxGateway.Worker.Tests/WnWrapConsumerProbeTests.cs` (Skip-gated, archival) drove the captured run. Lifecycle:

1. `new wwAlarmConsumerClass()` — instantiated.
2. `InitializeConsumer("MxGatewayProbe.WnWrap")` -> 0.
3. `RegisterConsumer(hWnd: 0, productName, applicationName, version)` -> 0. **Note:** wnwrap's `RegisterConsumer` is 4-arg (no `bRetainHiddenAlarms`); `aaAlarmManagedClient`'s is 5-arg. Different surface.
4.
   `Subscribe(@"\\<node>\Galaxy!DEV", priLow=1, priHigh=999, qtSummary, sfReturnNewestFirst, asAlarmActiveNow, asAlarmActiveNow)` -> 0. Same canonical scope that worked for `aaAlarmManagedClient`.
5. `SetXmlAlarmQuery(...)` was called too but the round-trip `GetXmlAlarmQuery` returned a mangled echo (NODE became `DESKTOP-6JL3KKO\Galaxy!DEV`, PROVIDER became `Galaxy!DEV`, ALARM_STATE shortened to `All`, DISPLAY_MODE truncated to `Sum`). The XML-query path looks broken in this build; rely on `Subscribe` for the filter and skip `SetXmlAlarmQuery` in production. Confirming "Subscribe alone is sufficient" is one follow-up probe (call `Subscribe` and read XML, no `SetXmlAlarmQuery`) — out of scope for the breakthrough run but easy to verify.

### Captured XML (60 polls over 30s, 500ms cadence)

`GetXmlCurrentAlarms2(maxAlmCnt: 100, out vartCurrentXmlAlarms)` returned BSTR XML cleanly on every call — 60/60 ok, zero throws. `GetXmlCurrentAlarms` (the v1 method) returned identical content on the same cadence; either method is viable.

Empty state:

```xml
```

With alarm active (`UNACK_ALM`, value=true after the flip script set the bool true):

```xml
BCC4705395424D65BDAABCDEA6A32A73 2026/5/1 240 0 DESKTOP-6JL3KKO Galaxy TestArea TestMachine_001.TestAlarm001 DSC true true 500 UNACK_ALM Test alarm #1
```

After the script set the bool false (`UNACK_RTN`, value=false):

```xml
BCC4705395424D65BDAABCDEA6A32A73 2026/5/1 ... false UNACK_RTN ...
```

The 10s cadence between transitions matches the System Platform script's flip frequency exactly. **GUID is stable across the in→out cycle** (`BCC4705…` carried through both states), so the XML stream represents the alarm record's lifecycle, not separate event records — this is "current alarms snapshot," not "transition stream." For an OPC UA `AlarmConditionService` adapter this is fine: condition-state changes per-snapshot is the supported model.
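Because the XML is a current-alarms snapshot rather than a transition stream, a consumer has to derive transitions by comparing consecutive snapshots. A minimal Python sketch of that comparison, keyed by GUID, follows. The record shape (a dict with `guid` and `state` keys) and the name `diff_snapshots` are illustrative assumptions, not part of the wnwrap API, and XML parsing is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape for the sketch: {"guid": ..., "state": ...}.
Record = Dict[str, str]


def diff_snapshots(previous: List[Record], current: List[Record]) -> List[Tuple[str, Record]]:
    """Compare two current-alarm snapshots keyed by GUID.

    Returns (kind, record) pairs where kind is "added", "changed", or "removed".
    """
    prev_by_guid = {rec["guid"]: rec for rec in previous}
    curr_by_guid = {rec["guid"]: rec for rec in current}
    changes: List[Tuple[str, Record]] = []
    for guid, rec in curr_by_guid.items():
        old = prev_by_guid.get(guid)
        if old is None:
            changes.append(("added", rec))
        elif old != rec:
            changes.append(("changed", rec))
    for guid, rec in prev_by_guid.items():
        if guid not in curr_by_guid:
            changes.append(("removed", rec))
    return changes


# The three snapshots seen in the capture: empty, alarm active, returned to normal.
empty: List[Record] = []
alm = [{"guid": "BCC4705395424D65BDAABCDEA6A32A73", "state": "UNACK_ALM"}]
rtn = [{"guid": "BCC4705395424D65BDAABCDEA6A32A73", "state": "UNACK_RTN"}]

raised = diff_snapshots(empty, alm)     # one "added" entry
returned = diff_snapshots(alm, rtn)     # one "changed" entry, same GUID
unchanged = diff_snapshots(rtn, rtn)    # no entries
```

Keying on GUID is what makes the in→out cycle show up as a single record changing state instead of as one record disappearing and another appearing.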
`STATE` enum values observed: `UNACK_RTN` (the alarm has returned to normal but is unacknowledged — i.e., visible in the "current alarms" list because operator hasn't acked it yet) and `UNACK_ALM` (the alarm is currently active and unacknowledged). The other states from `eAlmState` (`ACK_RTN`, `ACK_ALM`) would appear when an ack is performed — `wwAlarmConsumerClass.AlarmAckByGUID` is the method to call.

### `GetStatistics` AV — unrelated quirk

Every `GetStatistics` call threw `AccessViolationException` in the probe. Cause: the wnwrap interop signature uses `IntPtr` for the three array out-parameters (`pChangeCode`, `pChangePos`, `phAlarm`); passing `IntPtr.Zero` is wrong — the COM impl is writing into the buffer pointer without null-checking. Pre-allocate three int-arrays and pass pinned pointers (or use `Marshal.AllocCoTaskMem`) to fix. Not required for the production path — the XML methods give us everything we need.

### Implications for PR A.2 worker integration

Replacing `aaAlarmManagedClient.AlarmClient` with `WNWRAPCONSUMERLib.wwAlarmConsumerClass` in the worker's alarm-consumer surface unblocks A.2 fully. Outline:

1. **Reference path:** drop `aaAlarmManagedClient.dll` reference from `MxGateway.Worker.csproj`; add `Interop.WNWRAPCONSUMERLib.dll` reference from `mxaccessgw/lib/`. (Or commit the interop dll in-tree under `lib/` and reference relatively.)
2. **`AlarmClientConsumer` → `WnWrapAlarmConsumer`:** rewrite the consumer wrapper to:
   - `new wwAlarmConsumerClass()` on the worker's STA thread.
   - `InitializeConsumer(applicationName)` then `RegisterConsumer(hWnd: 0, …)`.
   - `Subscribe(@"\\<node>\Galaxy!<area>", …)` per configured area. The `<node>` and `<area>` are configurable (default `Environment.MachineName` + the platform's primary area).
   - Poll `GetXmlCurrentAlarms2(maxAlmCnt, out xml)` on a timer (500ms-1s cadence is comfortable).
     Parse XML payload; diff against the previous snapshot (keyed by `GUID`); emit `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` events for added/changed/removed records.
   - `AlarmAckByGUID(VBGUID, comment, oprName, node, domain, fullName)` for client-driven acknowledgements (matches PR A.5's `AlarmAckCommand` payload).
   - Lifecycle teardown: `DeregisterConsumer` + `UninitializeConsumer` + `Marshal.FinalReleaseComObject`.
3. **Conversion layer:** map XML record fields to `MxAlarmConditionRecord` proto:
   - `GUID` → `condition_id` (canonicalize the no-dashes hex to a UUID string).
   - `STATE` enum → `inAlarm` + `acked` booleans (`UNACK_ALM` → in_alarm=true, acked=false; `UNACK_RTN` → in_alarm=false, acked=false; `ACK_ALM` → in_alarm=true, acked=true; `ACK_RTN` → in_alarm=false, acked=true).
   - `DATE + TIME + GMTOFFSET + DSTADJUST` → reassemble UTC timestamp; matches the worker's existing `Timestamp` wire format.
   - `PRIORITY` → severity (already 1-1000-ish range).
   - `TAGNAME` → reference; `PROVIDER_NAME` + `GROUP` for scope metadata.
4. **PR A.5 fix carry-over:** `InitializeConsumer` MUST be called before `RegisterConsumer` (rediscovered with `aaAlarmManagedClient`, also true here). The existing `AlarmClientConsumer` skips Initialize entirely; the new `WnWrapAlarmConsumer` includes it from day one.
5. **Test reuse:** PR A.5's snapshot/ack contract tests can stay — they don't touch the underlying COM API. Add a new integration test against the wnwrap surface (live-AVEVA-only, Skip-gated like the probe).

### Settled API-ordering and surface knowledge

- `InitializeConsumer` first, then `RegisterConsumer` — both on `aaAlarmManagedClient.AlarmClient` and `wwAlarmConsumerClass`.
- `RegisterConsumer` arity differs: `aaAlarmManagedClient.AlarmClient.RegisterConsumer(hWnd, product, app, version, bRetainHiddenAlarms)` — 5 args; `wwAlarmConsumerClass.RegisterConsumer(hWnd, product, app, version)` — 4 args. The wnwrap class has no `bRetainHiddenAlarms` parameter at all.
- Subscription expression format: `\\<node>\Galaxy!<area>` (literal `Galaxy` provider) for both libraries.
- Native ack: `AlarmAckByGUID(VBGUID guid, comment, oprName, node, domain, fullName)` on the v2 surface; ID 5-arg variant on the legacy `IwwAlarmConsumer` interface.

These findings retire the open follow-up probes from the "polling-vs-pump" debate above — `wwAlarmConsumerClass` plus poll-on-timer is the implementation.

## Live smoke-test discoveries — 2026-05-01

The Skip-gated `AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip` ran the full `WnWrapAlarmConsumer` + `AlarmDispatcher` + `MxAccessAlarmEventSink` pipeline against the dev rig with the flip script running. End-to-end verified: 6 real transitions captured on the 10s cadence, ack-by-name returned rc=0, pipeline stayed healthy through 5 more transitions afterwards. Three production-relevant quirks surfaced and were fixed in the consumer:

### 1. `SetXmlAlarmQuery` is mandatory for reads despite the mangled echo

Without `SetXmlAlarmQuery`, the first `GetXmlCurrentAlarms2` call fails with `E_FAIL` (HRESULT `0x80004005`). The discovery doc above flagged the round-trip echo as mangled and recommended skipping the call — that recommendation is **wrong**. The echo *is* mangled (AVEVA parses NODE/PROVIDER/ALARM_STATE/DISPLAY_MODE incorrectly), but the call itself is required as some kind of subscription enabler. Even the Subscribe call setting the actual filter doesn't avoid the need for `SetXmlAlarmQuery`.

`WnWrapAlarmConsumer.ComposeXmlAlarmQuery(subscription)` decomposes the canonical `\\<node>\Galaxy!<area>` form into the XML's NODE/PROVIDER/GROUP fields. Mangled or not, the call enables reads.

### 2. Two consumers required: read-side vs. ack-side

`SetXmlAlarmQuery` enables reads but **breaks `AlarmAckByName` on the same consumer instance**. With SetXml applied, AlarmAckByName returns -55 even with valid name+provider+group+operator. Without SetXml, AlarmAckByName succeeds with rc=0.
The production consumer therefore provisions **two** wnwrap COM instances:

- Primary consumer (`client`): runs full lifecycle including `SetXmlAlarmQuery` for `GetXmlCurrentAlarms2` polls.
- Ack-only consumer (`ackClient`): runs Initialize → Register → Subscribe via the v1-prefixed methods, **no SetXmlAlarmQuery**. All `AcknowledgeByName` calls dispatch through this instance.

Both consumers subscribe to the same expression. Disposal cleans up both via a shared `ReleaseConsumerCom` helper.

### 3. `AlarmAckByName` v2 8-arg vs. v1 6-arg

`wwAlarmConsumerClass` exposes two `AlarmAckByName` overloads:

- `IwwAlarmConsumer2` v2: 8 args (`name, provider, group, comment, oprName, node, domainName, oprFullName`).
- `IwwAlarmConsumer` v1: 6 args (no domain, no full-name).

The v2 8-arg method returns -55 on this AVEVA build regardless of operator-identity inputs — looks like a stub. The v1 6-arg method works. Production `WnWrapAlarmConsumer.AcknowledgeByName` calls the 6-arg overload and discards the proto's `domain` + `full_name` fields. The proto contract keeps the 8 fields for forward compatibility if AVEVA fixes the v2 method later.

### 4. `AlarmAckByGUID` is not implemented

The v2 `AlarmAckByGUID(VBGUID, …)` throws `NotImplementedException` (COM `E_NOTIMPL`) on `wwAlarmConsumerClass` against this AVEVA build. The reference→GUID lookup that we initially planned to wire through `AlarmAckByGUID` is therefore not viable on wnwrap; all acks must go through `AlarmAckByName`.

The proto `AcknowledgeAlarmCommand` (GUID-based) and the worker's `MxAccessCommandExecutor.ExecuteAcknowledgeAlarm` switch arm remain in the codebase for the forward-compat shape, but the gateway-side `WorkerAlarmRpcDispatcher.AcknowledgeAsync` now always routes through `AcknowledgeAlarmByName` when the public RPC supplies a recognizable `Provider!Group.Tag` reference.

### 5. STA / threading — production fix needed

The wnwrap COM is `ThreadingModel=Apartment`.
The consumer's internal `Timer` fires on threadpool threads and would block forever on cross-apartment marshaling unless the host STA pumps Win32 messages. The smoke test sidesteps this by setting `pollIntervalMilliseconds=0` (Timer disabled) and driving `PollOnce` manually from the test's STA. Production hosting will route polls through the worker's `StaRuntime` in a follow-up — the consumer's `PollOnce` is `public` and idempotent so the wire-up is mechanical.

### Capture summary

```
Transition: kind=Clear ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' …
Transition: kind=Raise ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' …
SnapshotActiveAlarms count=1
active: ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' state=Active
AcknowledgeByName(real identity) -> rc=0
Post-ack transition: kind=Clear …
+1: kind=Raise … (10s after ack)
+2: kind=Clear … (20s)
+3: kind=Raise … (30s)
+4: kind=Clear … (40s)
```

10s cadence held throughout; full proto fields populated correctly; ack registered server-side without errors.
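Two of the pure-data steps described earlier, splitting the canonical subscription expression into NODE/PROVIDER/GROUP and mapping a snapshot record's `STATE` and `GUID` into proto-friendly values, are easy to sketch outside the COM layer. The Python below is an illustration under assumptions: the function names are hypothetical (this is not `WnWrapAlarmConsumer.ComposeXmlAlarmQuery`), the `STATE` table is copied from the conversion-layer outline (`ACK_ALM` and `ACK_RTN` were not observed in any captured run), and the GUID canonicalization assumes the 32-character hex is already in display order.

```python
import uuid
from typing import Dict, Tuple

# STATE -> (in_alarm, acked), as listed in the conversion-layer outline.
STATE_FLAGS: Dict[str, Tuple[bool, bool]] = {
    "UNACK_ALM": (True, False),
    "UNACK_RTN": (False, False),
    "ACK_ALM": (True, True),
    "ACK_RTN": (False, True),
}


def decompose_subscription(expr: str) -> Dict[str, str]:
    r"""Split a canonical \\node\provider!group expression into its parts."""
    if not expr.startswith("\\\\"):
        raise ValueError(f"not a canonical subscription expression: {expr!r}")
    node, sep, rest = expr[2:].partition("\\")
    provider, bang, group = rest.partition("!")
    if not (node and sep and provider and bang and group):
        raise ValueError(f"not a canonical subscription expression: {expr!r}")
    return {"NODE": node, "PROVIDER": provider, "GROUP": group}


def state_to_flags(state: str) -> Tuple[bool, bool]:
    """Map a STATE string to (in_alarm, acked); unknown states are rejected."""
    if state not in STATE_FLAGS:
        raise ValueError(f"unrecognized alarm STATE: {state!r}")
    return STATE_FLAGS[state]


def canonicalize_guid(raw: str) -> str:
    """Turn the 32-character no-dashes hex GUID into a dashed UUID string."""
    return str(uuid.UUID(hex=raw))


parts = decompose_subscription(r"\\DESKTOP-6JL3KKO\Galaxy!DEV")
flags = state_to_flags("UNACK_RTN")
condition_id = canonicalize_guid("BCC4705395424D65BDAABCDEA6A32A73")
```

Rejecting non-canonical expressions up front matters here because, as the earlier probe runs showed, `Subscribe` returns 0 for malformed scopes and then silently matches nothing.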