aaAlarmManagedClient discovery — public surface, 2026-05-01
Result of running
ZB.MOM.WW.MxGateway.Worker.Tests.AlarmClientDiscoveryTests.DumpAlarmClientPublicSurface
against the deployed AVEVA assembly:
- File: C:\Program Files (x86)\ArchestrA\Framework\Bin\ViewAppFramework\Content\MA\aaAlarmManagedClient.dll
- Assembly identity: aaAlarmManagedClient, Version=1.0.7368.41290, Culture=neutral, PublicKeyToken=7ebd82b507d9e10c
Public types
- aaAlarmManagedClient.AlarmClient (class)
- aaAlarmManagedClient.PriorityData (class)
That's the entire exported surface — two types, no interfaces, no delegates.
AlarmClient events
None. The class has no public events at all. The reflection probe's
GetEvents(BindingFlags.Public | Instance | Static) returned an empty
list.
AlarmClient methods (relevant subset)
- Lifecycle: RegisterConsumer(int hWnd, string szProductName, string szApplicationName, string szVersion, bool bRetainHiddenAlarms) → int, DeregisterConsumer() → int, InitializeConsumer(string szApplicationName) → int, UninitializeConsumer() → int, Dispose().
- Subscription: Subscribe(string szSubscription, short wFromPri, short wToPri, eQueryType QueryType, eSortFlags SortFlags, eAlarmFilterState FilterMask, eAlarmFilterState FilterSpecification) → int.
- Change enumeration (pull on poke): GetStatistics(out int lPercentQuery, out int lTotalAlarms, out int lActiveAlarms, out int lSuppressedAlarms, out int lSuppressedFilters, out int lNewAlarms, out int lChangesCount, out int[] ChangeCodes, out int[] ChangePos, out int[] hAlarm) → int.
- Record fetch: GetAlarmExtendedRec(int lIndex, out AlarmRecord almRec) → int, GetAlarmExtendedRec2(...), GetHighPriAlarm(out AlarmRecord almRec) → int.
- Selection model (used by the ack-selected-* family): DeselectAll, SelectAlaramEntry(short select, int from, int to), SelectByGUID(Guid), SelectAlarmCount(int from, int to).
- Acknowledge: AlarmAckByGUID(Guid alarmGuid, string ackComment, string ackOprName, string ackOprNode, string ackOprDomain, string ackOprFullName) → int is the per-alarm, full-fidelity native ack. AlarmAckSelected(string ackComment, string ackOprName, string ackOprNode, string ackOprDomain, string ackOprFullName) → int acks whatever the selection model currently has selected. Several AckSelected*Group/Tag/Priority/All/Visible*Alarms_Ex(...) variants exist for bulk ack scoped to a group / tag / priority range.
- Suppress / shelve: SupressSelected* and ShelveSelected* families plus DoAlarmShelveAction(...). Out of scope for the v1 alarm path.
- Snapshot/filter (SF* prefix): SFSetSortA / SFSetFilterA / SFCreateSnapshot / SFGetListCount / SFDeleteSnapshot / SFRefreshAlarm / SFGetStatistics. A snapshot-style query API, distinct from the consumer-subscription path. Not currently used.
What this means
The architecture comment on
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmClientConsumer.cs (PR A.5) is
wrong against this deployed assembly:
"The AVEVA alarm-manager surface (IAlarmMgrDataProvider) exposes the events we need as plain .NET events — no Windows message pump required."
There is no managed event surface. AlarmClient.RegisterConsumer
takes an hWnd because WM_APP messaging is the actual notification
mechanism: AVEVA's alarm provider WM_APP-pokes the registered window,
and the consumer is expected to call GetStatistics on each poke to
pull ChangeCodes / ChangePos / hAlarm arrays, then
GetAlarmExtendedRec(pos, …) per index to fetch each changed record.
AlarmClientConsumer.AlarmRecordReceived has no production callers as
a result — RaiseAlarmRecordReceived is internal for tests and
never gets invoked at runtime. Until A.2 lands a WM_APP pump,
MX_EVENT_FAMILY_ON_ALARM_TRANSITION cannot carry events.
Live runtime probe — 2026-05-01
ZB.MOM.WW.MxGateway.Worker.Tests.AlarmClientWmProbeTests.ProbeAlarmClientWmMessages
is a Skip-gated runtime probe that creates a real message-only
window, calls AlarmClient.RegisterConsumer(hWnd, …) +
Subscribe(@"\Galaxy!", …), and pumps for 20s while logging every
window message that arrives. Run results below — this turned the
"WM_APP pump" design assumption upside down.
RegisterConsumer and Subscribe both returned 0 (success). The
calls are valid against the deployed assembly; no parameter pinning
needed.
A registered-message-class WM (ID 0xC275 in this OS session)
fired every ~1s after Subscribe completed. Constant
wParam = 0x00001100, constant lParam = 0x079E46D8 (looks like a
stable pointer into AVEVA-internal state) for all 20 hits. The
constant payload across hits with no Galaxy alarm being fired
suggests this is a heartbeat/keepalive, not a per-change
notification.
Critically: this WM is delivered to AVEVA's own internal window
(hwnd=0x18032E) — NOT to the consumer's hWnd we passed in. The
consumer window's WndProc received only the standard creation
sequence (WM_GETMINMAXINFO, WM_NCCREATE, WM_NCCALCSIZE,
WM_CREATE) and the destruction sequence (WM_NCDESTROY,
WM_DESTROY, WM_NCCALCSIZE) — nothing in between. AVEVA's
notification path runs entirely against AVEVA's internal window;
it never forwards to the user-supplied hWnd.
The message ID itself is dynamic (a RegisterWindowMessage
allocation in the >= 0xC000 range), so it cannot be hard-coded —
each consumer process must call RegisterWindowMessage with the
correct string and use whatever ID the OS returns.
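The ID range alone shows that 0xC275 is not a WM_APP message. Win32 splits message IDs into fixed ranges, and only 0xC000 through 0xFFFF is handed out by RegisterWindowMessage. Below is a small Python sketch of that classification. The function name is illustrative and is not part of any AVEVA or Win32 API.

```python
# Documented Win32 window-message ranges.
WM_USER = 0x0400
WM_APP = 0x8000
REGISTERED_FIRST = 0xC000
REGISTERED_LAST = 0xFFFF


def classify_message(msg_id: int) -> str:
    """Return which Win32 range a window-message ID falls into."""
    if msg_id < 0:
        raise ValueError("window-message IDs are unsigned")
    if msg_id < WM_USER:
        return "system"         # 0x0000-0x03FF: reserved for system messages
    if msg_id < WM_APP:
        return "private-class"  # WM_USER-0x7FFF: private to a window class
    if msg_id < REGISTERED_FIRST:
        return "application"    # WM_APP-0xBFFF: private to an application
    if msg_id <= REGISTERED_LAST:
        return "registered"     # 0xC000-0xFFFF: allocated by RegisterWindowMessage
    return "reserved"           # above 0xFFFF: reserved by the system


print(classify_message(0xC275))  # registered
```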
What this means for A.2
The "WM_APP pump on the user hWnd" design — what the original plan
banner described and what the previous version of this doc
recommended — does not match how AVEVA actually delivers
notifications. The hWnd parameter to RegisterConsumer does not
appear to receive any of AVEVA's alarm traffic; it's likely used
only as a registration identity (and perhaps as a parent for modal
dialogs).
Two viable A.2 designs given the probe data:
- Polling. Just call GetStatistics on a timer (e.g. every 500ms in the worker's STA) and react to the change set it reports. No window plumbing needed. Trade-off: the latency floor equals the poll period, with a modest CPU floor because the call is cheap. This matches the heartbeat-style WM 0xC275 semantics, since AVEVA itself appears to run a poll loop internally.
- Hook AVEVA's internal window. Discover AVEVA's own window (hwnd=0x18032E in the probe), apply SetWindowsHookEx or SetWindowSubclass to it, and intercept WM 0xC275 on AVEVA's thread. Higher fidelity and near-zero latency, but invasive, fragile across AVEVA upgrades, and it requires running in the same process and on the same thread as the AVEVA window. Probably a non-starter without further AVEVA documentation.
Recommendation: the polling path (option 1) is cheaper to
implement, more robust against AVEVA-internal change, and
acceptable for a typical alarm cadence. The worker's existing STA
already provides a thread-affinitized timer surface. The unanswered
question is whether GetStatistics can be safely called outside
AVEVA's own message-pump thread — confirmable by extending the
probe to fire GetStatistics on its own thread and check the
result.
Alarm-provider visibility — third probe run, 2026-05-01
Extended the probe to call AlarmClient.GetProviders after
RegisterConsumer. Result on this rig:
GetProviders -> rc=0 count=0 list=[]
Zero alarm providers visible to the consumer process. This
explains every preceding probe run: no providers means no alarm
events, regardless of how many times any value (including a
bool with an $Alarm extension) flips. Subscribe(@"\Galaxy!")
returns 0 (success) but matches nothing because the alarm-manager
chain that provides the matching feed doesn't expose any provider
to this consumer.
A System Platform script flipping TestMachine_001.TestAlarm001
every 10s during this probe run produced no observable
GetStatistics transitions, no positions[] / handles[]
entries, no change in any field — confirms the silence is not
about subscription-scope / message-pump but about provider
absence.
Possible causes
1. No $Alarm extension on the test bool. If TestMachine_001.TestAlarm001 is a regular UDA without a BoolAlarm extension wired to it, flipping the value just writes a new value and no alarm fires.
2. Alarm manager service not running. AVEVA's aaAlarmMgr (or the equivalent on this rig's Platform version) needs to be running for providers to register.
3. Process security context. A consumer running under a normal user account may not see providers that registered under LocalSystem / a Platform service identity. The gateway-worker installation runs under a service account that may have access where dotnet test doesn't.
InitializeConsumer required — fourth probe run, 2026-05-01
Adding InitializeConsumer("AlarmProbe.Tests") before
RegisterConsumer made \Galaxy! appear in GetProviders
(count=1, status 0 → 100 within 500ms). So #2 and #3 above are
NOT the cause — the consumer can see the alarm provider once it
calls Initialize. That's a missing API-call ordering, not a
permission or service issue.
InitializeConsumer -> 0
RegisterConsumer -> 0
GetProviders [after Register] -> rc=0 count=0 list=[]
Subscribe('\Galaxy!') -> 0
GetProviders [after Subscribe] -> rc=0 count=1 list=[ 0 \Galaxy!]
GetProviders [poll #1] -> rc=0 count=1 list=[100 \Galaxy!]
Despite the provider being visible at "100% query complete" for
the entire 60s window, GetStatistics continued to report
total=0 active=0 codes=[7] — no alarm transitions reached the
consumer even with a System Platform script flipping the test
boolean every 10s during the run.
That isolates the remaining unknown to whether the test bool's alarm extension is actually generating MxAccess alarm-provider events when its value flips. The probe has confirmed every link in the consumer chain works (Initialize → Register → Subscribe → provider visible at 100%) — what's missing is alarm traffic from the producer side. ObjectViewer or another live consumer running alongside the script is the next discriminator: does it visibly see the alarm fire?
API-ordering finding: InitializeConsumer MUST precede
RegisterConsumer (or at least, must be called before
GetProviders returns anything). PR A.5's AlarmClientConsumer
omits InitializeConsumer entirely — that's a bug fix to apply
even before A.2 lands, since without it the provider chain never
becomes visible.
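One way to keep this ordering from regressing is to make the consumer wrapper refuse out-of-order calls. Below is a minimal Python sketch, assuming a backend object whose InitializeConsumer / RegisterConsumer / Subscribe methods return an int status (0 = success). The wrapper, the error type and the fake backend are all hypothetical names used for illustration.

```python
class LifecycleError(RuntimeError):
    """Raised when consumer calls are made out of order or fail."""


class ConsumerLifecycle:
    """Hypothetical guard enforcing InitializeConsumer -> RegisterConsumer -> Subscribe."""

    def __init__(self, backend):
        self._backend = backend
        self.state = "new"

    def initialize(self, application_name):
        self._require("new")
        self._check("InitializeConsumer", self._backend.InitializeConsumer(application_name))
        self.state = "initialized"

    def register(self, *args):
        # Registering before Initialize is what left GetProviders empty in the probe.
        self._require("initialized")
        self._check("RegisterConsumer", self._backend.RegisterConsumer(*args))
        self.state = "registered"

    def subscribe(self, expression, *args):
        # Several Subscribe calls are allowed once registered.
        self._require("registered", "subscribed")
        self._check("Subscribe", self._backend.Subscribe(expression, *args))
        self.state = "subscribed"

    def _require(self, *allowed):
        if self.state not in allowed:
            raise LifecycleError(f"call not allowed in state {self.state!r}; expected {allowed}")

    @staticmethod
    def _check(name, rc):
        if rc != 0:
            raise LifecycleError(f"{name} returned rc={rc}")


class FakeBackend:
    """Stand-in backend for illustration; every call succeeds."""

    def InitializeConsumer(self, application_name):
        return 0

    def RegisterConsumer(self, *args):
        return 0

    def Subscribe(self, expression, *args):
        return 0


lifecycle = ConsumerLifecycle(FakeBackend())
lifecycle.initialize("MxGatewayProbe")
lifecycle.register(0, "product", "app", "1.0")
lifecycle.subscribe(r"\\NODE\Galaxy!AREA")
print(lifecycle.state)  # subscribed
```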
Subscribe-parameter sweep — fifth probe run, 2026-05-01
Even with InitializeConsumer + provider visible at status 100,
no alarm transitions arrived during a 60s window with the user's
script flipping the test bool every 10s. Tried:
- qtSummary and qtHistory (the only eQueryType values).
- Priority 1..999 and 0..32767.
- eAlarmFilterState.asNone and asAlarmActiveNow for both FilterMask and FilterSpecification.
eAlarmFilterState is single-state-valued (asNone=0,
asAlarmActiveNow=1, asAlarmAcked=2, asShelved=3), not flag bits.
None of these knobs surfaced any alarm activity.
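The values are sequential integers, not distinct bits, so OR-ing them does not combine filters. It produces another state. The Python illustration below uses the enum values recorded above. The mirroring class is only for illustration and is not the AVEVA type.

```python
from enum import IntEnum


class eAlarmFilterState(IntEnum):
    """Values as reported by the fifth probe run."""
    asNone = 0
    asAlarmActiveNow = 1
    asAlarmAcked = 2
    asShelved = 3


# OR-ing two states as if they were flag bits yields a third, unrelated state.
combined = eAlarmFilterState.asAlarmActiveNow | eAlarmFilterState.asAlarmAcked
print(eAlarmFilterState(combined).name)  # asShelved
```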
User confirmation 2026-05-01: the test bool does have a
BoolAlarm extension on it; in aaObjectViewer the
$Alarm.InAlarm sub-attribute flips true/false in lockstep with
the script's writes. So the alarm extension is evaluating
its condition, just not visibly producing transitions on the
aaAlarmManagedClient consumer stream.
Multi-channel + multi-subscription probe — sixth run, 2026-05-01
Extended the probe to try every consumer-side approach in parallel:
- Subscription expressions (sequential): \Galaxy!, \Galaxy!*, \\Galaxy!, \Galaxy!TestArea, \\.\Galaxy!. All Subscribe calls returned rc=0; only the last one (\\.\Galaxy!) is reflected in GetProviders (count=1).
- Read channels polled at 500ms cadence: GetStatistics, GetHighPriAlarm, SFCreateSnapshot + SFGetStatistics.
- Filter+sort: priority 0..32767, qtSummary, state=asAlarmActiveNow, sort=sfReturnNewestFirst.
- AlarmRecord init (worked around the "Not a valid Win32 FileTime" exception): all DateTime fields were pre-set to the FILETIME epoch (1601-01-01 UTC) before the call, since default(DateTime) is outside the FILETIME range and trips the interop marshaler.
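This explains why the FILETIME epoch works as a pre-set value while default(DateTime) does not. FILETIME counts 100 ns ticks from 1601-01-01 UTC, so any earlier date has a negative tick count, and .NET's DateTime-to-FILETIME conversion rejects negative counts. Below is a Python sketch of the arithmetic; Python stands in for the .NET side here.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def to_filetime(dt: datetime) -> int:
    """Convert a timezone-aware datetime to a FILETIME tick count."""
    elapsed = dt - FILETIME_EPOCH
    # Whole microseconds since the epoch, then x10 for 100 ns ticks.
    return (elapsed // timedelta(microseconds=1)) * 10


# The value the probe pre-sets maps to tick 0, the smallest valid FILETIME.
print(to_filetime(FILETIME_EPOCH))  # 0

# .NET default(DateTime) is 0001-01-01 00:00:00, which precedes the epoch.
print(to_filetime(datetime(1, 1, 1, tzinfo=timezone.utc)) < 0)  # True
```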
Result of the 60s run with TestMachine_001.TestAlarm001 being
flipped every 10s:
Subscribe('\Galaxy!') -> 0
Subscribe('\Galaxy!*') -> 0
Subscribe('\\Galaxy!') -> 0
Subscribe('\Galaxy!TestArea') -> 0
Subscribe('\\.\Galaxy!') -> 0
GetProviders [after Subscribe-multi] -> count=1 list=[ 0 \\.\Galaxy!]
GetStatistics #1: total=0 active=0 changes=1 codes=[7] positions=[] handles=[]
GetHighPriAlarm #1: rc=0 { }
SF channel #1: SFCreate=0 numAlarms=0 SFStats=0 unackRet=0 unackAlm=0 ackAlm=0 others=0 events=0 idxNewest=-1
No further "(changed)" entries for the entire 60s window. Every read API returned the same empty result on every poll.
User confirms the alarm IS firing — aaObjectViewer sees
$Alarm.InAlarm flip in lockstep with the script. Historian
records exist (per user — needs verification by querying the
historian directly).
Conclusion of consumer-side probing
aaAlarmManagedClient.AlarmClient is not the receive
surface AVEVA's alarm pipeline routes to in this Galaxy
configuration. The consumer chain is verified end-to-end:
- InitializeConsumer + RegisterConsumer + Subscribe all succeed (rc=0).
- GetProviders finds \Galaxy! once Initialize is called.
- All read APIs (GetStatistics, GetHighPriAlarm, SFCreateSnapshot/SFGetStatistics) return empty even with every documented filter combination.
- The consumer's hWnd receives zero AVEVA messages between WM_CREATE and WM_DESTROY; AVEVA's traffic goes to its own internal hwnd.
The next investigation directions are not consumer-side:
- Inspect aaObjectViewer's alarm SDK to see what library it uses to read alarms. If it differs from aaAlarmManagedClient, switch the worker over.
- Query the historian directly (aahEventStorage/aahEventSvc) to confirm alarms are recorded, and use the same path for v2 alarm capture.
- Inspect AVEVA's alarm-routing config for this Galaxy in System Platform IDE: area assignments, alarm provider bindings, and "publish alarm events to" settings on the platform.
For A.2 implementation: the aaAlarmManagedClient path the
gateway-worker is currently architected around may be a
dead-end on customer Galaxies configured this way. If the
alarms truly only flow through the historian event-storage path,
A.2 needs to consume from aahEventStorage instead — a
fundamental architecture pivot.
BREAKTHROUGH — seventh probe run, 2026-05-01
Two changes finally produced a signal:
- Subscription scope: \\<MachineName>\Galaxy!<TopArea> is the canonical AlarmClient subscription format (per ArchestrA Alarm Client docs at archestra6.rssing.com/chan-12008125/article13.html): \\Node\Provider!Area!Filter, where Node is the machine name, Provider is literally Galaxy, and Area is a hosted area object. For this rig (\\DESKTOP-6JL3KKO\Galaxy!DEV) the DEV area, the platform's primary area, is the right scope. The earlier \Galaxy!, \Galaxy!TestArea, \\.\Galaxy!, etc., all returned rc=0 but matched no traffic because they were not in the canonical form.
- InitializeConsumer before RegisterConsumer: already discovered earlier, and a bug fix for PR A.5's AlarmClientConsumer.
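The Python sketch below composes the canonical expression and checks an expression's shape. The helper names are hypothetical. The check covers only the textual \\Node\Provider!Area[!Filter] shape, not whether the node or area actually exists.

```python
import re

# Two backslashes, node, one backslash, provider, '!', area, optional '!filter'.
CANONICAL_SUBSCRIPTION = re.compile(
    r"^\\\\(?P<node>[^\\!]+)\\(?P<provider>[^\\!]+)!(?P<area>[^!]+)(?:!(?P<filter>.*))?$"
)


def compose_subscription(node: str, area: str, provider: str = "Galaxy") -> str:
    """Build the canonical subscription expression for one area."""
    return "\\\\" + node + "\\" + provider + "!" + area


def is_canonical(expression: str) -> bool:
    """True when the expression has the node / provider / area shape."""
    return CANONICAL_SUBSCRIPTION.match(expression) is not None


print(compose_subscription("DESKTOP-6JL3KKO", "DEV"))  # \\DESKTOP-6JL3KKO\Galaxy!DEV

# The forms tried before the breakthrough all fail the shape check.
earlier = [r"\Galaxy!", r"\Galaxy!*", r"\\Galaxy!", r"\Galaxy!TestArea", r"\\.\Galaxy!"]
print([is_canonical(e) for e in earlier])  # [False, False, False, False, False]
```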
With both in place, GetHighPriAlarm had a record to return on every
poll for 60s straight (117/117 calls), but every call threw
ArgumentOutOfRangeException: Not a valid Win32 FileTime instead
of returning successfully — the AlarmRecord struct contains five
DateTime fields (ar_Time, ar_OrigTime, ar_AckTime,
ar_RtnTime, ar_SubTime) and AVEVA writes sentinel/invalid
FILETIME values for unset ones (e.g., ar_AckTime for an
unacknowledged alarm). The .NET interop that AVEVA ships
(aaAlarmManagedClient.dll) auto-converts FILETIME→DateTime and
rejects out-of-range values.
GetStatistics continues to report total=0 active=0 even with
GetHighPriAlarm returning records — those two API surfaces have
genuinely different views in AVEVA's data model.
So: alarms flow through aaAlarmManagedClient.AlarmClient once
the subscription expression is canonical. The blocking issue is
extracting the payload past the .NET interop's DateTime
auto-marshaling.
Remaining work to capture alarm payloads
Define a custom COM interop that uses long (FILETIME-as-int64)
instead of DateTime for the timestamp fields. Approach options:
1. Patch the AVEVA-shipped aaAlarmManagedClient.dll: ildasm the assembly, replace DateTime with long on AlarmRecord's timestamp fields, then ilasm it back. Brittle across AVEVA upgrades.
2. Write our own [ComImport] interface: declare IRawAlarmConsumer ourselves with safe blittable types, discover the underlying COM IID (via reflection on AlarmClient's [Guid] attribute), and cast with (IRawAlarmConsumer) alarmClient. Cleaner, but requires the IID.
3. Use IDispatch late binding: dispatch-Invoke bypasses strong-typed marshaling. Verbose, but doesn't need IIDs.
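All three options come down to carrying each timestamp as a raw 64-bit FILETIME and decoding it defensively. Below is a Python sketch of that decode step. The function name and the policy of returning None for values .NET would reject are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
# Largest tick count Python's datetime can represent (microsecond precision;
# .NET's DateTime.MaxValue is 100 ns finer, which this sketch ignores).
MAX_FILETIME = (
    (datetime.max.replace(tzinfo=timezone.utc) - FILETIME_EPOCH) // timedelta(microseconds=1)
) * 10 + 9


def filetime_to_utc(raw: int) -> Optional[datetime]:
    """Decode a raw FILETIME carried as a 64-bit integer.

    Out-of-range values (which include whatever sentinels AVEVA writes for
    unset fields) become None instead of raising, so one bad field cannot
    sink the whole record.
    """
    if raw < 0 or raw > MAX_FILETIME:
        return None
    # Sub-microsecond ticks are truncated because datetime stops at microseconds.
    return FILETIME_EPOCH + timedelta(microseconds=raw // 10)


print(filetime_to_utc(116444736000000000))  # 1970-01-01 00:00:00+00:00
print(filetime_to_utc(-1))                   # None
```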
For PR A.2's worker integration, option 2 is the least
disruptive. Once the interop is custom, AlarmClient.Subscribe +
GetHighPriAlarm + GetAlarmExtendedRec form a viable
polling-style alarm consumer.
REVISED 2026-05-01: option 2 not directly applicable.
Reflection on aaAlarmManagedClient.AlarmClient shows it
implements only IDisposable (no [ComImport] interface, no
class GUID). It has a single field CwwAlarmConsumer* m_almUnmanaged — meaning AlarmClient is a C++/CLI managed
wrapper around a native C++ class, NOT a COM-interop class.
The DateTime conversion happens inside the AVEVA wrapper's IL,
not at a .NET-to-COM marshaling boundary. There is no separate
COM interface IID we can QI to.
Revised approach options:
A. Switch to wnwrapConsumer.dll — a separate standalone
COM library AVEVA ships at
C:\Program Files (x86)\Common Files\ArchestrA\wnwrapConsumer.dll
exposing WNWRAPCONSUMERLib.wwAlarmConsumerClass with
SetXmlAlarmQuery / GetXmlCurrentAlarms. XML-string output
bypasses FILETIME marshaling entirely.
B. Patch aaAlarmManagedClient.dll IL — wrap the unsafe
DateTime.FromFileTime calls with a safe variant. Direct
fix but modifies a vendor binary.
C. Reflect into m_almUnmanaged and call native vtable —
get the IntPtr, walk the MSVC C++ vtable, call
__thiscall methods via Marshal.GetDelegateForFunctionPointer.
Doable but requires reverse-engineering the C++ class layout.
Option A is the best fit. It is a genuine COM component, it keeps all integration logic in our own code without modifying vendor binaries, and it is the conventional production-grade approach (the WIN-911 consumer pattern referenced in AVEVA support forums uses it).
The polling-vs-WM_APP-callback question from earlier is now
moot: GetStatistics's positions[]/handles[] arrays remained
empty even when alarms were demonstrably present. The active
read API for current alarms is GetHighPriAlarm, not
GetStatistics's change array.
Implications for A.2 implementation
The A.2 PR's value is unmeasurable until at least one alarm
provider is visible. The choice between polling-via-GetStatistics
and the callback path can only be decided by observing what
populates first when a real alarm fires. Without a provider,
both paths return the same "nothing happening" answer.
Until that's resolved, A.2 implementation work is genuinely blocked on a dev-rig configuration issue — not on architectural choice or code structure.
GetStatistics polling — second probe run, 2026-05-01
Extended the probe to call GetStatistics every ~2s alongside the
WM logger. Key findings:
- GetStatistics is safely callable from the same thread that did RegisterConsumer + Subscribe. Every poll returned rc=0 with no exceptions over 9 polls in a 20s window.
- The deployed Galaxy currently has zero active alarms. Every poll reported total=0 active=0 suppressed=0 newAlarms=0. The positions[] and handles[] arrays were empty.
- changes=1 codes=[7] was constant across all polls, matching the constant 1 Hz WM 0xC275 cadence. Code 7 is consistent with a "heartbeat / subscription healthy" sentinel: the same semantics as the WM, but reported through the pull-side API.
- percent=100 (query-complete percentage) was constant, so the subscription is in steady state.
This confirms the polling design (option 1 in the previous section)
is mechanically viable. The remaining open question is whether
GetStatistics populates positions[] / handles[] with real
entries when an alarm transition actually fires — proving that
requires firing an alarm.
Open follow-up probes
Each can be added to AlarmClientWmProbeTests as a separate
Skip-gated test:
1. Fire a real Galaxy alarm during the pump window. The cleanest programmatic trigger is an MxAccess write that flips a $Alarm-extended boolean to true (alarm in) and back to false (alarm out). Pinning the exact tag reference is pending; it needs either a documented test-fixture tag or an interactive selection in System Platform IDE. Once the trigger fires, this resolves whether AVEVA's pulled change set arrives via GetStatistics positions[] / handles[] (per-change polling works) or only via the AVEVA-internal window (callback path needed).
2. Hook AVEVA's internal window to log what WMs it actually processes. Only relevant if probe 1 shows GetStatistics does NOT report per-change activity.
3. Decompile aaAlarmManagedClient.dll's IL for the RegisterConsumer method to find what RegisterWindowMessage string is used and whether there's a callback-registration surface on WNAL_Register that the managed client wraps. The alarmlst.dll strings (WNAL_CallBack, "Invalid callbacks" error) suggest the underlying C API takes callbacks, but the managed wrapper exposes none of them.
PR A.5's Subscribe / AcknowledgeByGuid / SnapshotActiveAlarms
are correct — they're pull-style and don't depend on the
notification mechanism.
Option A — captured, 2026-05-01
wnwrapConsumer.dll (C:\Program Files (x86)\Common Files\ArchestrA\wnwrapConsumer.dll) hosts the standalone COM class
WNWRAPCONSUMERLib.wwAlarmConsumerClass. Its type library imports
cleanly via tlbimp (output stored under mxaccessgw/lib/Interop.WNWRAPCONSUMERLib.dll). The COM class is registered in
HKLM:\SOFTWARE\WOW6432Node\Classes\CLSID\{7AB52E5F-36B2-4A30-AE46-952A746F667C} with ThreadingModel=Apartment, and new wwAlarmConsumerClass() succeeds via
CoCreateInstance.
The probe ZB.MOM.WW.MxGateway.Worker.Tests/WnWrapConsumerProbeTests.cs
(Skip-gated, archival) drove the captured run. Lifecycle:
1. new wwAlarmConsumerClass(): instantiated.
2. InitializeConsumer("MxGatewayProbe.WnWrap") -> 0.
3. RegisterConsumer(hWnd: 0, productName, applicationName, version) -> 0. Note: wnwrap's RegisterConsumer takes 4 args (no bRetainHiddenAlarms), while aaAlarmManagedClient's takes 5. Different surface.
4. Subscribe(@"\\<machine>\Galaxy!DEV", priLow=1, priHigh=999, qtSummary, sfReturnNewestFirst, asAlarmActiveNow, asAlarmActiveNow) -> 0. Same canonical scope that worked for aaAlarmManagedClient.
5. SetXmlAlarmQuery(...) was called too, but the round-trip GetXmlAlarmQuery returned a mangled echo (NODE became DESKTOP-6JL3KKO\Galaxy!DEV, PROVIDER became Galaxy!DEV, ALARM_STATE shortened to All, DISPLAY_MODE truncated to Sum). The XML-query path looks broken in this build; rely on Subscribe for the filter and skip SetXmlAlarmQuery in production. Confirming "Subscribe alone is sufficient" takes one follow-up probe (call Subscribe and read XML, no SetXmlAlarmQuery). That was out of scope for the breakthrough run but is easy to verify.
Captured XML (60 polls over 30s, 500ms cadence)
GetXmlCurrentAlarms2(maxAlmCnt: 100, out vartCurrentXmlAlarms)
returned BSTR XML cleanly on every call — 60/60 ok, zero throws.
GetXmlCurrentAlarms (the v1 method) returned identical content
on the same cadence; either method is viable.
Empty state:
<?xml version="1.0"?><ALARM_RECORDS COUNT="0"></ALARM_RECORDS>
With alarm active (UNACK_ALM, value=true after the flip
script set the bool true):
<?xml version="1.0"?>
<ALARM_RECORDS COUNT="1">
<ALARM>
<GUID>BCC4705395424D65BDAABCDEA6A32A73</GUID>
<DATE>2026/5/1</DATE>
<TIME>13:26:14.709</TIME>
<GMTOFFSET>240</GMTOFFSET>
<DSTADJUST>0</DSTADJUST>
<PROVIDER_NODE>DESKTOP-6JL3KKO</PROVIDER_NODE>
<PROVIDER_NAME>Galaxy</PROVIDER_NAME>
<GROUP>TestArea</GROUP>
<TAGNAME>TestMachine_001.TestAlarm001</TAGNAME>
<TYPE>DSC</TYPE>
<VALUE>true</VALUE>
<LIMIT>true</LIMIT>
<PRIORITY>500</PRIORITY>
<STATE>UNACK_ALM</STATE>
<OPERATOR_NODE></OPERATOR_NODE>
<OPERATOR_NAME></OPERATOR_NAME>
<ALARM_COMMENT>Test alarm #1</ALARM_COMMENT>
</ALARM>
</ALARM_RECORDS>
After the script set the bool false (UNACK_RTN, value=false):
<?xml version="1.0"?>
<ALARM_RECORDS COUNT="1">
<ALARM>
<GUID>BCC4705395424D65BDAABCDEA6A32A73</GUID>
<DATE>2026/5/1</DATE>
<TIME>13:26:24.710</TIME>
...
<VALUE>false</VALUE>
<STATE>UNACK_RTN</STATE>
...
</ALARM>
</ALARM_RECORDS>
The 10s cadence between transitions matches the System Platform
script's flip frequency exactly. GUID is stable across the
in→out cycle (BCC4705… carried through both states), so the
XML stream represents the alarm record's lifecycle, not separate
event records — this is "current alarms snapshot," not
"transition stream." For an OPC UA AlarmConditionService
adapter this is fine: condition-state changes per-snapshot is
the supported model.
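Each poll returns the whole current-alarms list, so transitions have to be derived by comparing consecutive snapshots keyed by the stable GUID. The Python sketch below does both steps: it parses a trimmed copy of the captured payload and diffs two polls. The function names are hypothetical. Treating any field difference as a change is a simplification; a consumer might compare only STATE and VALUE.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# GUID -> flattened <ALARM> fields from one GetXmlCurrentAlarms2 poll.
Snapshot = Dict[str, Dict[str, str]]


def parse_snapshot(xml_text: str) -> Snapshot:
    """Flatten each <ALARM> element into {tag: text}, keyed by its GUID."""
    root = ET.fromstring(xml_text)
    snapshot: Snapshot = {}
    for alarm in root.findall("ALARM"):
        # Empty elements such as <OPERATOR_NAME></OPERATOR_NAME> map to "".
        fields = {field.tag: (field.text or "") for field in alarm}
        snapshot[fields["GUID"]] = fields  # every captured <ALARM> carried a GUID
    return snapshot


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Tuple[str, str]]:
    """Return (kind, guid) pairs: 'added', 'changed' or 'removed'."""
    events: List[Tuple[str, str]] = []
    for guid, record in current.items():
        if guid not in previous:
            events.append(("added", guid))
        elif record != previous[guid]:
            events.append(("changed", guid))
    for guid in previous:
        if guid not in current:
            events.append(("removed", guid))
    return events


def sample(state: str, value: str, at: str) -> str:
    # Trimmed copy of the captured payload; only a few fields are kept.
    return (
        '<?xml version="1.0"?><ALARM_RECORDS COUNT="1"><ALARM>'
        "<GUID>BCC4705395424D65BDAABCDEA6A32A73</GUID>"
        f"<TIME>{at}</TIME><VALUE>{value}</VALUE><STATE>{state}</STATE>"
        "<OPERATOR_NAME></OPERATOR_NAME>"
        "</ALARM></ALARM_RECORDS>"
    )


empty = parse_snapshot('<?xml version="1.0"?><ALARM_RECORDS COUNT="0"></ALARM_RECORDS>')
raised = parse_snapshot(sample("UNACK_ALM", "true", "13:26:14.709"))
returned = parse_snapshot(sample("UNACK_RTN", "false", "13:26:24.710"))

print(diff_snapshots(empty, raised))     # [('added', 'BCC4705395424D65BDAABCDEA6A32A73')]
print(diff_snapshots(raised, returned))  # [('changed', 'BCC4705395424D65BDAABCDEA6A32A73')]
```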
STATE enum values observed: UNACK_RTN (the alarm has
returned to normal but is unacknowledged — i.e., visible in the
"current alarms" list because operator hasn't acked it yet) and
UNACK_ALM (the alarm is currently active and unacknowledged).
The other states from eAlmState (ACK_RTN, ACK_ALM) would
appear when an ack is performed — wwAlarmConsumerClass.AlarmAckByGUID
is the method to call.
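The four STATE values reduce to two booleans. Below is a Python sketch of that mapping, following the MxAlarmConditionRecord conversion plan later in this document. ACK_ALM and ACK_RTN have not yet appeared in a capture, so their entries are assumptions until an ack is observed. Unknown states raise an error instead of being guessed.

```python
from typing import NamedTuple


class ConditionFlags(NamedTuple):
    in_alarm: bool
    acked: bool


# STATE text from the wnwrap XML -> (in_alarm, acked).
_STATE_FLAGS = {
    "UNACK_ALM": ConditionFlags(in_alarm=True, acked=False),
    "UNACK_RTN": ConditionFlags(in_alarm=False, acked=False),
    "ACK_ALM": ConditionFlags(in_alarm=True, acked=True),   # not yet observed
    "ACK_RTN": ConditionFlags(in_alarm=False, acked=True),  # not yet observed
}


def state_to_flags(state: str) -> ConditionFlags:
    """Map a STATE value to condition flags, failing loudly on unseen values."""
    try:
        return _STATE_FLAGS[state]
    except KeyError:
        raise ValueError(f"unrecognized alarm STATE {state!r}") from None


print(state_to_flags("UNACK_RTN"))  # ConditionFlags(in_alarm=False, acked=False)
```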
GetStatistics AV — unrelated quirk
Every GetStatistics call threw AccessViolationException in
the probe. Likely cause: the wnwrap interop signature uses IntPtr for
the three array out-parameters (pChangeCode, pChangePos,
phAlarm), and passing IntPtr.Zero is wrong because the COM impl appears
to write into the buffer pointer without null-checking. Preallocate
three int-arrays and pass pinned pointers (or use
Marshal.AllocCoTaskMem) to fix. Not required for the
production path — the XML methods give us everything we need.
Implications for PR A.2 worker integration
Replacing aaAlarmManagedClient.AlarmClient with
WNWRAPCONSUMERLib.wwAlarmConsumerClass in the worker's
alarm-consumer surface unblocks A.2 fully. Outline:
- Reference path: drop the aaAlarmManagedClient.dll reference from ZB.MOM.WW.MxGateway.Worker.csproj and add an Interop.WNWRAPCONSUMERLib.dll reference from mxaccessgw/lib/. (Or commit the interop dll in-tree under lib/ and reference it relatively.)
- AlarmClientConsumer → WnWrapAlarmConsumer: rewrite the consumer wrapper to:
  - new wwAlarmConsumerClass() on the worker's STA thread.
  - InitializeConsumer(applicationName), then RegisterConsumer(hWnd: 0, …).
  - Subscribe(@"\\<node>\Galaxy!<area>", …) per configured area. The <node> and <area> are configurable (default Environment.MachineName + the platform's primary area).
  - Poll GetXmlCurrentAlarms2(maxAlmCnt, out xml) on a timer (a 500ms-1s cadence is comfortable). Parse the XML payload, diff it against the previous snapshot (keyed by GUID), and emit MX_EVENT_FAMILY_ON_ALARM_TRANSITION events for added/changed/removed records.
  - AlarmAckByGUID(VBGUID, comment, oprName, node, domain, fullName) for client-driven acknowledgements (matches PR A.5's AlarmAckCommand payload).
  - Lifecycle teardown: DeregisterConsumer + UninitializeConsumer + Marshal.FinalReleaseComObject.
- Conversion layer: map XML record fields to the MxAlarmConditionRecord proto:
  - GUID → condition_id (canonicalize the no-dashes hex to a UUID string).
  - STATE enum → inAlarm + acked booleans (UNACK_ALM → in_alarm=true, acked=false; UNACK_RTN → in_alarm=false, acked=false; ACK_ALM → in_alarm=true, acked=true; ACK_RTN → in_alarm=false, acked=true).
  - DATE + TIME + GMTOFFSET + DSTADJUST → reassemble a UTC timestamp that matches the worker's existing Timestamp wire format.
  - PRIORITY → severity (already in a 1-1000-ish range).
  - TAGNAME → reference; PROVIDER_NAME + GROUP for scope metadata.
- PR A.5 fix carry-over: InitializeConsumer MUST be called before RegisterConsumer (rediscovered with aaAlarmManagedClient, and also true here). The existing AlarmClientConsumer skips Initialize entirely; the new WnWrapAlarmConsumer includes it from day one.
- Test reuse: PR A.5's snapshot/ack contract tests can stay, since they don't touch the underlying COM API. Add a new integration test against the wnwrap surface (live-AVEVA-only, Skip-gated like the probe).
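Two of the conversions above involve string handling that is easy to get subtly wrong. Below is a Python sketch of both, built on two assumptions that should be checked against a known alarm before relying on them. First, the 32-digit GUID text is in standard UUID digit order. Second, GMTOFFSET and DSTADJUST are minutes to add to local time to reach UTC (the Windows time-zone Bias convention). Function names are hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone


def canonical_guid(raw: str) -> str:
    """32 hex digits without dashes -> lowercase 8-4-4-4-12 UUID string.

    ASSUMPTION: the digits are already in standard UUID order.
    """
    return str(uuid.UUID(hex=raw))


def alarm_time_utc(date_text: str, time_text: str, gmt_offset: str, dst_adjust: str) -> datetime:
    """Reassemble DATE / TIME / GMTOFFSET / DSTADJUST into a UTC datetime.

    ASSUMPTION: both offsets are minutes to ADD to local time to get UTC.
    """
    year, month, day = (int(part) for part in date_text.split("/"))  # '2026/5/1'
    hms, _, frac = time_text.partition(".")                          # '13:26:14', '709'
    hour, minute, second = (int(part) for part in hms.split(":"))
    micros = int((frac + "000000")[:6])                              # '709' -> 709000
    local = datetime(year, month, day, hour, minute, second, micros)
    utc = local + timedelta(minutes=int(gmt_offset) + int(dst_adjust))
    return utc.replace(tzinfo=timezone.utc)


print(canonical_guid("BCC4705395424D65BDAABCDEA6A32A73"))
# bcc47053-9542-4d65-bdaa-bcdea6a32a73
print(alarm_time_utc("2026/5/1", "13:26:14.709", "240", "0"))
# 2026-05-01 17:26:14.709000+00:00
```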
Settled API-ordering and surface knowledge
- InitializeConsumer first, then RegisterConsumer, on both aaAlarmManagedClient.AlarmClient and wwAlarmConsumerClass.
- RegisterConsumer arity differs: aaAlarmManagedClient.AlarmClient.RegisterConsumer(hWnd, product, app, version, bRetainHiddenAlarms) takes 5 args; wwAlarmConsumerClass.RegisterConsumer(hWnd, product, app, version) takes 4 args. The wnwrap class has no bRetainHiddenAlarms parameter at all.
- Subscription expression format: \\<machine>\Galaxy!<area> (literal Galaxy provider) for both libraries.
- Native ack: AlarmAckByGUID(VBGUID guid, comment, oprName, node, domain, fullName) on the v2 surface; ID 5-arg variant on the legacy IwwAlarmConsumer interface.
These findings retire the open follow-up probes from the
"polling-vs-pump" debate above — wwAlarmConsumerClass plus
poll-on-timer is the implementation.
Live smoke-test discoveries — 2026-05-01
The Skip-gated AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip
ran the full
WnWrapAlarmConsumer + AlarmDispatcher + MxAccessAlarmEventSink
pipeline against the dev rig with the flip script running. End-to-end
verified: 6 real transitions captured on the 10s cadence, ack-by-name
returned rc=0, pipeline stayed healthy through 5 more transitions
afterwards. Five production-relevant quirks surfaced. The first four
were worked around in code; the fifth still needs a production fix:
1. SetXmlAlarmQuery is mandatory for reads despite the mangled echo
Without SetXmlAlarmQuery, the first GetXmlCurrentAlarms2 call
fails with E_FAIL (HRESULT 0x80004005). The discovery doc above
flagged the round-trip echo as mangled and recommended skipping the
call — that recommendation is wrong. The echo is mangled (AVEVA
parses NODE/PROVIDER/ALARM_STATE/DISPLAY_MODE incorrectly), but the
call itself is required as some kind of subscription enabler. Calling
Subscribe with the actual filter does not remove the need for
SetXmlAlarmQuery.
WnWrapAlarmConsumer.ComposeXmlAlarmQuery(subscription) decomposes
the canonical \\<machine>\Galaxy!<area> form into the XML's
NODE/PROVIDER/GROUP fields. Mangled or not, the call enables reads.
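The Python sketch below covers only the decomposition step. It does not reproduce ComposeXmlAlarmQuery itself, and it omits the XML wrapping because the query schema is not recorded here. The function name is hypothetical.

```python
from typing import Dict


def decompose_subscription(expression: str) -> Dict[str, str]:
    """Split a canonical subscription into its NODE / PROVIDER / GROUP parts."""
    # Expected shape: two backslashes, machine, one backslash, provider, '!', area.
    if not expression.startswith("\\\\"):
        raise ValueError(f"not a canonical subscription: {expression!r}")
    node, sep, rest = expression[2:].partition("\\")
    provider, bang, group = rest.partition("!")
    if not (node and sep and provider and bang and group):
        raise ValueError(f"not a canonical subscription: {expression!r}")
    return {"NODE": node, "PROVIDER": provider, "GROUP": group}


print(decompose_subscription(r"\\DESKTOP-6JL3KKO\Galaxy!DEV"))
# {'NODE': 'DESKTOP-6JL3KKO', 'PROVIDER': 'Galaxy', 'GROUP': 'DEV'}
```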
2. Two consumers required: read-side vs. ack-side
SetXmlAlarmQuery enables reads but breaks AlarmAckByName on
the same consumer instance. With SetXml applied, AlarmAckByName
returns -55 even with valid name+provider+group+operator. Without
SetXml, AlarmAckByName succeeds with rc=0.
The production consumer therefore provisions two wnwrap COM instances:
- Primary consumer (client): runs the full lifecycle, including SetXmlAlarmQuery, for GetXmlCurrentAlarms2 polls.
- Ack-only consumer (ackClient): runs Initialize → Register → Subscribe via the v1-prefixed methods, with no SetXmlAlarmQuery. All AcknowledgeByName calls dispatch through this instance.
Both consumers subscribe to the same expression. Disposal cleans up
both via a shared ReleaseConsumerCom helper.
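The split can be modeled as a small facade that routes each operation to the right instance and releases both on disposal, even if the first release fails. Below is a Python sketch with hypothetical names. The real consumer wraps COM objects, and the COM out-parameter is modeled here as a return value.

```python
from typing import Any, Callable


class DualConsumer:
    """Hypothetical facade: polls use `client`, acks use `ack_client`.

    `client` went through the full lifecycle including SetXmlAlarmQuery;
    `ack_client` deliberately skipped it so AlarmAckByName keeps working.
    `release` stands in for the ReleaseConsumerCom helper.
    """

    def __init__(self, client: Any, ack_client: Any, release: Callable[[Any], None]):
        self._client = client
        self._ack_client = ack_client
        self._release = release

    def poll(self, max_count: int) -> str:
        # The COM out-parameter is modeled as a plain return value here.
        return self._client.GetXmlCurrentAlarms2(max_count)

    def acknowledge_by_name(self, *args: str) -> int:
        return self._ack_client.AlarmAckByName(*args)

    def dispose(self) -> None:
        try:
            self._release(self._client)
        finally:
            # Runs even if releasing the primary consumer raised.
            self._release(self._ack_client)
```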
3. AlarmAckByName v2 8-arg vs. v1 6-arg
wwAlarmConsumerClass exposes two AlarmAckByName overloads:
- IwwAlarmConsumer2 (v2): 8 args (name, provider, group, comment, oprName, node, domainName, oprFullName).
- IwwAlarmConsumer (v1): 6 args (no domain, no full name).
The v2 8-arg method returns -55 on this AVEVA build regardless of
operator-identity inputs — looks like a stub. The v1 6-arg method
works. Production WnWrapAlarmConsumer.AcknowledgeByName calls the
6-arg overload and discards the proto's domain + full_name fields.
The proto contract keeps the 8 fields for forward compatibility if
AVEVA fixes the v2 method later.
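Below is a Python sketch of the argument narrowing. It assumes the v1 overload takes the first six v2 arguments in the same order. The "no domain, no full-name" description suggests this but does not spell it out. The request type is a hypothetical stand-in for the proto message.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AckByNameRequest:
    # Field order mirrors the v2 8-arg AlarmAckByName signature.
    name: str
    provider: str
    group: str
    comment: str
    opr_name: str
    node: str
    domain_name: str
    opr_full_name: str


def to_v1_ack_args(req: AckByNameRequest) -> Tuple[str, str, str, str, str, str]:
    """Arguments for the working v1 6-arg overload; domain and full name are dropped."""
    return (req.name, req.provider, req.group, req.comment, req.opr_name, req.node)
```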
4. AlarmAckByGUID is not implemented
The v2 AlarmAckByGUID(VBGUID, …) throws NotImplementedException
(COM E_NOTIMPL) on wwAlarmConsumerClass against this AVEVA
build. The reference→GUID lookup that we initially planned to wire
through AlarmAckByGUID is therefore not viable on wnwrap; all acks
must go through AlarmAckByName.
The proto AcknowledgeAlarmCommand (GUID-based) and the worker's
MxAccessCommandExecutor.ExecuteAcknowledgeAlarm switch arm remain
in the codebase for the forward-compat shape, but the gateway-side
WorkerAlarmRpcDispatcher.AcknowledgeAsync now always routes through
AcknowledgeAlarmByName when the public RPC supplies a recognizable
Provider!Group.Tag reference.
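The Python sketch below shows how a Provider!Group.Tag reference could be split into the name/provider/group inputs AcknowledgeAlarmByName needs. The reference format comes from the captured transitions (for example 'Galaxy!TestArea.TestMachine_001.TestAlarm001'). Splitting on the first '.' after the '!' assumes area names never contain a dot, which is not confirmed here. Names are hypothetical.

```python
from typing import NamedTuple


class AlarmName(NamedTuple):
    provider: str
    group: str
    tag: str


def parse_alarm_reference(reference: str) -> AlarmName:
    """Split 'Provider!Group.Tag' on the first '!' and then the first '.'.

    ASSUMPTION: the group (area) has no '.', while the tag may contain one.
    """
    provider, bang, rest = reference.partition("!")
    group, dot, tag = rest.partition(".")
    if not (provider and bang and group and dot and tag):
        raise ValueError(f"not a Provider!Group.Tag reference: {reference!r}")
    return AlarmName(provider, group, tag)


print(parse_alarm_reference("Galaxy!TestArea.TestMachine_001.TestAlarm001"))
# AlarmName(provider='Galaxy', group='TestArea', tag='TestMachine_001.TestAlarm001')
```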
5. STA / threading — production fix needed
The wnwrap COM object is registered with ThreadingModel=Apartment. The consumer's
internal Timer fires on threadpool threads, so each poll would block forever
on cross-apartment marshaling unless the host STA pumps Win32
messages. The smoke test sidesteps this by setting
pollIntervalMilliseconds=0 (Timer disabled) and driving PollOnce
manually from the test's STA. Production hosting will route polls
through the worker's StaRuntime in a follow-up; the consumer's
PollOnce is public and idempotent, so the wire-up is mechanical.
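The intended fix can be modeled as a single owner thread that runs every poll, so no timer thread touches the apartment-bound object directly. StaLikeRuntime is a hypothetical analogue of the worker's StaRuntime. Python has no COM apartments or Win32 message pump, so this sketch models only the pattern of marshaling each call onto the owner thread.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable


class StaLikeRuntime:
    """Hypothetical single-owner-thread runtime (not the real StaRuntime).

    Every submitted callable runs on one dedicated thread, so an object bound
    to that thread is never touched from a timer or threadpool thread.
    """

    def __init__(self) -> None:
        self._work: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            item = self._work.get()
            if item is None:  # shutdown sentinel
                return
            fn, fut = item
            try:
                fut.set_result(fn())
            except Exception as exc:  # hand errors back to the caller
                fut.set_exception(exc)

    def invoke(self, fn: Callable[[], Any]) -> Any:
        """Run fn on the owner thread and block until it finishes."""
        fut: Future = Future()
        self._work.put((fn, fut))
        return fut.result()

    def shutdown(self) -> None:
        self._work.put(None)
        self._thread.join()
```

A timer callback would then call runtime.invoke(consumer.poll_once) instead of calling poll_once itself. Calling invoke from the owner thread would deadlock in this sketch. A real implementation would detect that case and run the call inline.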
Capture summary
Transition: kind=Clear ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' …
Transition: kind=Raise ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' …
SnapshotActiveAlarms count=1
active: ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' state=Active
AcknowledgeByName(real identity) -> rc=0
Post-ack transition: kind=Clear …
+1: kind=Raise … (10s after ack)
+2: kind=Clear … (20s)
+3: kind=Raise … (30s)
+4: kind=Clear … (40s)
The 10 s cadence held throughout, all proto fields were populated correctly, and the ack registered server-side without errors.
Subtag-monitoring fallback provider
When the wnwrap alarm-manager source fails, the gateway worker switches to
SubtagAlarmConsumer, a synthetic alarm source that advises each alarm
attribute's subtags through the existing MXAccess AddItem/Advise pipeline and
derives alarm transitions from the resulting value-change stream. It is a
degraded-mode source without parity to the alarm manager; every transition and
snapshot it produces carries degraded = true.
Watch-list discovery
GatewayAlarmMonitor resolves the subtag watch-list at subscribe time by
calling IAlarmWatchListResolver.GetAlarmAttributesAsync. The resolver merges:
- Galaxy Repository SQL (GetAlarmAttributesAsync): objects that have alarm extensions in the configured area.
- Config overrides: IncludeAttributes adds explicit entries; ExcludeAttributes removes Repository-derived ones. The config list takes effect even when UseGalaxyRepository is false.
The resolved list is a set of AlarmSubtagTarget messages sent to the worker
inside SubscribeAlarmsCommand.watch_list. Each target carries the composed
MXAccess item addresses for the InAlarm, Acked, AckMsg, and Priority
subtags (confirmed AVEVA AlarmExtension field names, verified against the live
ZB Galaxy attribute_definition rows). The gateway re-runs discovery on its
reconcile cadence and pushes an updated watch-list when the model changes.
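The sketch below merges the watch-list and composes the item addresses. It assumes that item addresses take the form <attribute>.<Subtag>. It also assumes that an attribute listed in both IncludeAttributes and ExcludeAttributes stays included, because the text only says that Exclude removes Repository-derived entries. SubtagTargetSketch and resolve_watch_list are hypothetical stand-ins for AlarmSubtagTarget and the resolver.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Subtag names as listed in the text above.
SUBTAGS = ("InAlarm", "Acked", "AckMsg", "Priority")


@dataclass(frozen=True)
class SubtagTargetSketch:
    """Hypothetical stand-in for an AlarmSubtagTarget message."""
    attribute: str
    addresses: Tuple[Tuple[str, str], ...]  # (subtag name, MXAccess item address)


def resolve_watch_list(
    repo_attributes: Iterable[str],
    include: Iterable[str],
    exclude: Iterable[str],
    use_galaxy_repository: bool,
) -> List[SubtagTargetSketch]:
    base = set(repo_attributes) if use_galaxy_repository else set()
    # Exclude trims only Repository-derived entries; Include always applies.
    merged = (base - set(exclude)) | set(include)
    return [
        SubtagTargetSketch(attr, tuple((s, f"{attr}.{s}") for s in SUBTAGS))
        for attr in sorted(merged)
    ]
```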
Subtag advise and LmxSubtagAlarmSource
LmxSubtagAlarmSource (implements ISubtagAlarmSource) owns a separate
LMXProxyServerClass instance on the worker STA; it does not share the
session's main MXAccess object. For each watch-list target it calls
AddItem/Advise on the configured subtag addresses. When a subtag value
changes, it raises ValueChanged on the STA and SubtagAlarmConsumer
forwards it to SubtagAlarmStateMachine.
PollOnce() on the subtag consumer is a no-op: the path is event-driven
through Advise, not poll-driven.
Synthesis rules
SubtagAlarmStateMachine tracks (active, acked) per watch-list entry and
emits MxAlarmTransitionEvent records on change:
| Subtag change | Emitted transition | Notes |
|---|---|---|
| InAlarm false → true | Raise (UNACK_ALM) | original_raise_timestamp = first observed active time for this episode |
| Acked false → true, while InAlarm | Acknowledge (ACK_ALM) | AckedDuringEpisode latch set |
| InAlarm true → false | Clear | AckRtn if AckedDuringEpisode is set, else UnackRtn |
| Acked true → false, while InAlarm | (none) | Latch is NOT cleared; the episode retains its acknowledged status at clear |
The AckedDuringEpisode latch addresses out-of-order subtag delivery:
MXAccess does not guarantee the relative order of the Acked = false and
InAlarm = false updates. If Acked = false arrives first, the clear would
otherwise look unacknowledged. The latch ensures a clear always emits ACK_RTN
when the alarm was acknowledged at any point during the active episode.
SnapshotActive() returns one MxAlarmSnapshotRecord per currently-active
alarm. State mapping:
- InAlarm && !Acked → UNACK_ALM
- InAlarm && Acked → ACK_ALM
- !InAlarm → not included in the snapshot
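The synthesis rules and snapshot mapping above can be modeled as a small state machine. This sketch is not SubtagAlarmStateMachine itself. The class, its method names, and the (kind, detail) return tuples are hypothetical, and subtag values are modeled as plain booleans.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class EntryState:
    active: bool = False                # last observed InAlarm
    acked: bool = False                 # last observed Acked
    acked_during_episode: bool = False  # the AckedDuringEpisode latch
    raise_ts: Optional[float] = None    # first observed active time


class SubtagStateMachineSketch:
    def __init__(self) -> None:
        self.entries: Dict[str, EntryState] = {}

    def on_change(self, ref: str, subtag: str, value: bool, ts: float) -> Optional[Tuple[str, str]]:
        """Apply one subtag update; return (kind, detail) or None if nothing is emitted."""
        e = self.entries.setdefault(ref, EntryState())
        if subtag == "InAlarm":
            if value and not e.active:  # InAlarm false -> true: new episode
                e.active, e.raise_ts, e.acked_during_episode = True, ts, False
                return ("Raise", "UNACK_ALM")
            if not value and e.active:  # InAlarm true -> false
                detail = "AckRtn" if e.acked_during_episode else "UnackRtn"
                e.active, e.acked_during_episode = False, False
                return ("Clear", detail)
        elif subtag == "Acked":
            was_acked, e.acked = e.acked, value
            if e.active and value and not was_acked:  # Acked false -> true while InAlarm
                e.acked_during_episode = True
                return ("Acknowledge", "ACK_ALM")
            # Acked true -> false while InAlarm emits nothing and keeps the latch.
        return None

    def snapshot_active(self) -> List[Tuple[str, str]]:
        """One (reference, state) pair per active entry; inactive entries are omitted."""
        return sorted(
            (ref, "ACK_ALM" if e.acked else "UNACK_ALM")
            for ref, e in self.entries.items()
            if e.active
        )
```

Suppose Acked = false arrives before InAlarm = false. The latch survives the Acked reset, so the clear still comes out as ("Clear", "AckRtn").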
Synthetic GUID
The alarmmgr provider supplies a native GUID per alarm record. The subtag
provider has no native GUID. SubtagAlarmConsumer derives a deterministic
GUID by hashing alarm_full_reference (via SyntheticAlarmGuid.ForReference).
The same reference always produces the same GUID within a session, so
GUID-based ack routing resolves correctly. The synthetic GUID is not an
AVEVA-internal GUID and does not match the GUID the alarm manager assigns
to the same alarm.
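A name-based (version 5) UUID is one way to map references to GUIDs deterministically. This is an assumption: SyntheticAlarmGuid.ForReference may use a different hash, and the namespace below is made up for illustration.

```python
import uuid

# Hypothetical namespace; the real SyntheticAlarmGuid.ForReference may use a
# different hash and namespace entirely.
SUBTAG_ALARM_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:mxgateway:subtag-alarm")


def synthetic_alarm_guid(alarm_full_reference: str) -> uuid.UUID:
    # Name-based SHA-1 UUID (version 5): the same reference always maps to the same GUID.
    return uuid.uuid5(SUBTAG_ALARM_NAMESPACE, alarm_full_reference)
```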
Acknowledge in subtag mode
AlarmDispatcher routes ack calls by active provider mode:
- Alarm-manager mode: AlarmAckByName on wwAlarmConsumerClass (unchanged).
- Subtag mode: SubtagAlarmConsumer.AcknowledgeByName resolves the watch-list entry's ack_comment_subtag and issues a Write(comment) on the STA via LmxSubtagAlarmSource. Writing the AckMsg subtag performs the acknowledge in AVEVA (AckMsg is the confirmed AlarmExtension ack-comment write target).
If the alarm has no writable ack-comment subtag (the AckComment config key is
empty, or the entry's ack_comment_subtag field is empty), the ack call
returns a failure code that the gateway surfaces as FailedPrecondition.
AcknowledgeByGuid maps the synthetic GUID back to its reference via an
internal dictionary, then calls the same write path.
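Here is a minimal sketch of the subtag-mode ack path, with hypothetical names throughout. write stands in for LmxSubtagAlarmSource's STA write, and FAILED_PRECONDITION is a made-up return code that the gateway would map to FailedPrecondition. The text does not say what happens for an unknown GUID, so returning that code in that case is an assumption.

```python
import uuid
from typing import Callable, Dict

OK = 0
FAILED_PRECONDITION = -1  # hypothetical code; the gateway maps failures to FailedPrecondition


class SubtagAckSketch:
    def __init__(self, write: Callable[[str, str], int]) -> None:
        self._write = write  # stands in for the STA Write via LmxSubtagAlarmSource
        self._ack_subtag_by_ref: Dict[str, str] = {}
        self._ref_by_guid: Dict[uuid.UUID, str] = {}

    def register(self, ref: str, ack_comment_subtag: str, guid: uuid.UUID) -> None:
        self._ack_subtag_by_ref[ref] = ack_comment_subtag
        self._ref_by_guid[guid] = ref  # the internal GUID -> reference dictionary

    def acknowledge_by_name(self, ref: str, comment: str) -> int:
        subtag = self._ack_subtag_by_ref.get(ref, "")
        if not subtag:
            return FAILED_PRECONDITION  # no writable ack-comment subtag
        return self._write(subtag, comment)  # writing AckMsg performs the ack

    def acknowledge_by_guid(self, guid: uuid.UUID, comment: str) -> int:
        ref = self._ref_by_guid.get(guid)
        if ref is None:
            return FAILED_PRECONDITION  # assumption: unknown GUID treated as a precondition failure
        return self.acknowledge_by_name(ref, comment)
```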
Fidelity limitations
The following fields are not available or have lower quality in subtag mode:
| Field | Subtag-mode behavior |
|---|---|
| alarm_guid | Synthetic deterministic GUID from alarm_full_reference; not an AVEVA-native GUID |
| original_raise_timestamp | First observed active = true time; no AVEVA-native raise time |
| transition_timestamp | OnDataChange source timestamp from MXAccess |
| severity | From priority subtag if advised; 0 otherwise |
| category / description | Not populated (no subtag for these) |
| current_value / limit_value | Not populated unless corresponding subtags are in the watch-list |
| alarm_type_name | Not populated |
| operator_user / operator_comment | Not populated on synthesized raise/clear transitions |
| retrigger transition | Not synthesized (no re-alarm counter subtag is observed) |
Every transition and snapshot record carries degraded = true and
source_provider = ALARM_PROVIDER_MODE_SUBTAG. Clients that require full
fidelity must wait for failback to the alarm manager.
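The table can be read as a record with subtag-mode defaults. The dataclass below is a hypothetical illustration whose field names follow the table. It is not the proto message, and it omits current_value and limit_value for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubtagTransitionSketch:
    """Hypothetical subtag-mode record; field names follow the table above."""
    alarm_full_reference: str
    alarm_guid: str                  # synthetic, not AVEVA-native
    original_raise_timestamp: float  # first observed active time
    transition_timestamp: float      # OnDataChange source timestamp
    severity: int = 0                # priority subtag value when advised
    category: str = ""               # no subtag, so never populated
    description: str = ""
    alarm_type_name: str = ""
    operator_user: str = ""          # empty on synthesized raise/clear
    operator_comment: str = ""
    degraded: bool = True
    source_provider: str = "ALARM_PROVIDER_MODE_SUBTAG"


def make_transition(ref: str, guid: str, raise_ts: float, ts: float,
                    priority: Optional[int]) -> SubtagTransitionSketch:
    # Fall back to severity 0 when the priority subtag is not advised.
    return SubtagTransitionSketch(ref, guid, raise_ts, ts,
                                  severity=priority if priority is not None else 0)
```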
Provider mode reflection
When FailoverAlarmConsumer switches between providers, it raises
ProviderModeChanged. AlarmDispatcher enqueues an
OnAlarmProviderModeChangedEvent (carried as an MxEvent), which the
gateway receives and reflects into:
- AlarmFeedMessage.provider_status, emitted to every StreamAlarms subscriber.
- The /hubs/alarms SignalR hub for the dashboard.
- Metrics: the mxgateway.alarms.provider_mode gauge and the mxgateway.alarms.provider_switches counter.
On every switch GatewayAlarmMonitor also forces a reconcile
(QueryActiveAlarms) against the now-active provider so the gateway cache
reflects the post-switch state without a spurious raise/clear storm.
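The sketch below shows how the gateway might react to a mode switch, under these assumptions:

- MonitorSketch is hypothetical.
- Metrics are plain attributes, not real gauges and counters.
- The mode strings other than ALARM_PROVIDER_MODE_SUBTAG are made up.
- The reconcile replaces the cache with the now-active provider's QueryActiveAlarms result instead of replaying individual raise/clear events.

```python
from typing import Callable, Dict, List, Tuple


class MonitorSketch:
    """Hypothetical gateway-side handler for provider-mode switches."""

    def __init__(self, query_active_alarms: Callable[[], Dict[str, str]]) -> None:
        self._query = query_active_alarms
        self.cache: Dict[str, str] = {}                # alarm reference -> state
        self.provider_mode = "ALARM_PROVIDER_MODE_ALARM_MANAGER"  # hypothetical name; gauge stand-in
        self.provider_switches = 0                     # counter stand-in
        self.feed: List[Tuple[str, str]] = []          # provider_status messages to subscribers

    def on_provider_mode_changed(self, new_mode: str) -> None:
        self.provider_mode = new_mode
        self.provider_switches += 1
        self.feed.append(("provider_status", new_mode))
        # Reconcile against the now-active provider: swap in its view wholesale
        # rather than emitting a raise/clear per cache difference.
        self.cache = dict(self._query())
```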