A.2: replace AlarmClientConsumer with wnwrap-based polling consumer

Switch the worker's alarm-consumer surface from `aaAlarmManagedClient.AlarmClient`
to `WNWRAPCONSUMERLib.wwAlarmConsumerClass` (CLSID 7AB52E5F-…) hosted by
`wnwrapConsumer.dll`. The new path returns alarm records as a BSTR XML
payload via `GetXmlCurrentAlarms2`, bypassing the FILETIME→DateTime
auto-marshaling that crashed `GetHighPriAlarm` with
ArgumentOutOfRangeException on every poll. A live capture ran 60/60
polls cleanly against `\\DESKTOP-6JL3KKO\Galaxy!DEV` while a System
Platform script flipped `TestMachine_001.TestAlarm001` every 10s; the
GUID, priority, state (UNACK_ALM ↔ UNACK_RTN), and ASCII-formatted
timestamps arrived end-to-end.

Implementation:
- `Interop.WNWRAPCONSUMERLib.dll` generated via tlbimp, checked in under
  `lib/` so dev boxes don't need the SDK to build.
- New `WnWrapAlarmConsumer` (replaces `AlarmClientConsumer`): owns a
  500ms polling timer, parses `GetXmlCurrentAlarms2` output, diffs the
  snapshot keyed by alarm GUID, and raises one
  `MxAlarmTransitionEvent` per state change. Includes the
  Initialize→Register-before-Subscribe ordering fix found during
  Discovery probe runs.
- New library-agnostic types `MxAlarmSnapshotRecord` /
  `MxAlarmStateKind` / `MxAlarmTransitionEvent` so the proto-build
  path is testable without an AVEVA install.
- `AlarmRecordTransitionMapper` retired the COM-coupled
  `MapTransitionKind(eAlmTransitions)`; new pure helpers
  `ParseStateKind`, `MapTransition(prev, curr)`, and
  `ParseTransitionTimestampUtc` cover XML decode + state-delta logic.
- `IMxAccessAlarmConsumer` event surface changed from
  `EventHandler<AlarmRecord>` to `EventHandler<MxAlarmTransitionEvent>`
  and `SnapshotActiveAlarms()` returns `MxAlarmSnapshotRecord` —
  decoupling the interface from any specific COM library.
- Worker csproj drops `aaAlarmManagedClient` / `IAlarmMgrDataProvider`
  refs; adds `Interop.WNWRAPCONSUMERLib`.

Tests:
- 36 new unit tests (state-string mapping, prev/current → proto kind
  decision table, timestamp UTC reassembly, XML payload parser, 32-char
  hex GUID round-trip) covering everything that doesn't touch the live
  COM surface — all passing.
- Skip-gated `WnWrapConsumerProbeTests.ProbeWnWrapConsumer` archives
  the live capture flow for regression / future probes.

Docs:
- `docs/AlarmClientDiscovery.md` "Option A — captured" section records
  sample XML payloads, the mangled `SetXmlAlarmQuery` round-trip
  (prefer `Subscribe` for filtering), the `GetStatistics`
  AccessViolationException quirk, and the worker-integration outline.

Pre-existing failure noted (separate):
`MxAccessInteropReference_ExistsOnlyInWorkerProject` was already
failing on HEAD — the test project still references `ArchestrA.MxAccess`
for the Skip-gated discovery probes. Not regressed by this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-01 09:44:15 -04:00
parent f490ae2593
commit f711a55be4
13 changed files with 1326 additions and 318 deletions
@@ -492,3 +492,199 @@ Skip-gated test:
PR A.5's `Subscribe` / `AcknowledgeByGuid` / `SnapshotActiveAlarms`
are correct — they're pull-style and don't depend on the
notification mechanism.
## Option A — captured, 2026-05-01
`wnwrapConsumer.dll` (`C:\Program Files (x86)\Common Files\
ArchestrA\wnwrapConsumer.dll`) hosts the standalone COM class
`WNWRAPCONSUMERLib.wwAlarmConsumerClass`. Type library imports
cleanly via `tlbimp` (output stored under `mxaccessgw/lib/
Interop.WNWRAPCONSUMERLib.dll`). The COM class is registered in
`HKLM:\SOFTWARE\WOW6432Node\Classes\CLSID\
{7AB52E5F-36B2-4A30-AE46-952A746F667C}` with `ThreadingModel=
Apartment` — `new wwAlarmConsumerClass()` succeeds via
`CoCreateInstance`.
The probe `MxGateway.Worker.Tests/WnWrapConsumerProbeTests.cs`
(Skip-gated, archival) drove the captured run. Lifecycle:
1. `new wwAlarmConsumerClass()` — instantiated.
2. `InitializeConsumer("MxGatewayProbe.WnWrap")` -> 0.
3. `RegisterConsumer(hWnd: 0, productName, applicationName,
version)` -> 0. **Note:** wnwrap's `RegisterConsumer` is
4-arg (no `bRetainHiddenAlarms`); `aaAlarmManagedClient`'s
is 5-arg. Different surface.
4. `Subscribe(@"\\<machine>\Galaxy!DEV", priLow=1, priHigh=999,
qtSummary, sfReturnNewestFirst, asAlarmActiveNow,
asAlarmActiveNow)` -> 0. Same canonical scope that worked
for `aaAlarmManagedClient`.
5. `SetXmlAlarmQuery(...)` was called too but the round-trip
`GetXmlAlarmQuery` returned a mangled echo (NODE became
`DESKTOP-6JL3KKO\Galaxy!DEV`, PROVIDER became `Galaxy!DEV`,
ALARM_STATE shortened to `All`, DISPLAY_MODE truncated to
`Sum`). The XML-query path looks broken in this build; rely
on `Subscribe` for the filter and skip `SetXmlAlarmQuery` in
production. Confirming "Subscribe alone is sufficient" is
one follow-up probe (call `Subscribe` and read XML, no
`SetXmlAlarmQuery`) — out of scope for the breakthrough run
but easy to verify.
### Captured XML (60 polls over 30s, 500ms cadence)
`GetXmlCurrentAlarms2(maxAlmCnt: 100, out vartCurrentXmlAlarms)`
returned BSTR XML cleanly on every call — 60/60 ok, zero throws.
`GetXmlCurrentAlarms` (the v1 method) returned identical content
on the same cadence; either method is viable.
Empty state:
```xml
<?xml version="1.0"?><ALARM_RECORDS COUNT="0"></ALARM_RECORDS>
```
With alarm active (`UNACK_ALM`, value=true after the flip
script set the bool true):
```xml
<?xml version="1.0"?>
<ALARM_RECORDS COUNT="1">
<ALARM>
<GUID>BCC4705395424D65BDAABCDEA6A32A73</GUID>
<DATE>2026/5/1</DATE>
<TIME>13:26:14.709</TIME>
<GMTOFFSET>240</GMTOFFSET>
<DSTADJUST>0</DSTADJUST>
<PROVIDER_NODE>DESKTOP-6JL3KKO</PROVIDER_NODE>
<PROVIDER_NAME>Galaxy</PROVIDER_NAME>
<GROUP>TestArea</GROUP>
<TAGNAME>TestMachine_001.TestAlarm001</TAGNAME>
<TYPE>DSC</TYPE>
<VALUE>true</VALUE>
<LIMIT>true</LIMIT>
<PRIORITY>500</PRIORITY>
<STATE>UNACK_ALM</STATE>
<OPERATOR_NODE></OPERATOR_NODE>
<OPERATOR_NAME></OPERATOR_NAME>
<ALARM_COMMENT>Test alarm #1</ALARM_COMMENT>
</ALARM>
</ALARM_RECORDS>
```
After the script set the bool false (`UNACK_RTN`, value=false):
```xml
<?xml version="1.0"?>
<ALARM_RECORDS COUNT="1">
<ALARM>
<GUID>BCC4705395424D65BDAABCDEA6A32A73</GUID>
<DATE>2026/5/1</DATE>
<TIME>13:26:24.710</TIME>
...
<VALUE>false</VALUE>
<STATE>UNACK_RTN</STATE>
...
</ALARM>
</ALARM_RECORDS>
```
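The payload shape shown above can be decoded with any XML parser. The following is a minimal Python sketch of that step, not the worker's actual C# parser: `parse_alarm_records` and the two sample constants are hypothetical names, and the active sample is an abbreviated copy of the captured record.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the captured payload; only a few fields are kept.
SAMPLE_ACTIVE = """<?xml version="1.0"?>
<ALARM_RECORDS COUNT="1">
<ALARM>
<GUID>BCC4705395424D65BDAABCDEA6A32A73</GUID>
<TAGNAME>TestMachine_001.TestAlarm001</TAGNAME>
<PRIORITY>500</PRIORITY>
<STATE>UNACK_ALM</STATE>
<OPERATOR_NODE></OPERATOR_NODE>
</ALARM>
</ALARM_RECORDS>"""

SAMPLE_EMPTY = '<?xml version="1.0"?><ALARM_RECORDS COUNT="0"></ALARM_RECORDS>'


def parse_alarm_records(xml_text):
    """Return one {element name: text} dict per <ALARM> element (hypothetical helper)."""
    # The XML declaration must be the first thing in the string, so trim
    # any surrounding whitespace before parsing.
    root = ET.fromstring(xml_text.strip())
    if root.tag != "ALARM_RECORDS":
        raise ValueError(f"unexpected root element: {root.tag}")
    records = []
    for alarm in root.findall("ALARM"):
        # Empty elements such as <OPERATOR_NODE></OPERATOR_NODE> carry
        # text None; normalise them to an empty string.
        records.append({child.tag: (child.text or "") for child in alarm})
    return records


if __name__ == "__main__":
    for record in parse_alarm_records(SAMPLE_ACTIVE):
        print(record["GUID"], record["STATE"])
```

Keeping every field as a string at this stage leaves type conversion (priority, state, timestamp) to a separate mapping layer, which keeps the parser testable without an AVEVA install.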
The 10s gap between the two captured transitions (13:26:14.709 →
13:26:24.710) matches the System Platform script's flip interval.
**GUID is stable across the in→out cycle** (`BCC4705…` carried
through both states), so the XML stream represents the alarm
record's lifecycle, not separate event records: this is a "current
alarms snapshot," not a "transition stream." For an OPC UA
`AlarmConditionService` adapter this is fine, since per-snapshot
condition-state changes are the supported model.
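Because each poll yields a snapshot rather than events, transitions have to be derived by comparing consecutive snapshots keyed by GUID. A minimal Python sketch of that diff, assuming each snapshot has already been reduced to a `{guid: state}` dict (`diff_snapshots` is a hypothetical name, not the worker's API):

```python
def diff_snapshots(previous, current):
    """Compare two {guid: state} snapshots and list what changed.

    Returns (guid, previous_state, current_state) tuples. A state of
    None means the record was absent from that snapshot.
    """
    changes = []
    # Records that are new or whose state differs from the last poll.
    for guid, state in current.items():
        old_state = previous.get(guid)
        if old_state != state:
            changes.append((guid, old_state, state))
    # Records that dropped out of the current-alarms list.
    for guid, old_state in previous.items():
        if guid not in current:
            changes.append((guid, old_state, None))
    return changes


if __name__ == "__main__":
    guid = "BCC4705395424D65BDAABCDEA6A32A73"
    first_poll = {guid: "UNACK_ALM"}
    second_poll = {guid: "UNACK_RTN"}
    for change in diff_snapshots(first_poll, second_poll):
        print(change)
```

An unchanged record produces no entry, so a poll that returns the same snapshot twice raises nothing; only the first appearance, a state change, or a disappearance is reported.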
`STATE` enum values observed: `UNACK_RTN` (the alarm has
returned to normal but is unacknowledged, i.e., still visible in the
"current alarms" list because the operator hasn't acked it yet) and
`UNACK_ALM` (the alarm is currently active and unacknowledged).
The other states from `eAlmState` (`ACK_RTN`, `ACK_ALM`) would
appear when an ack is performed — `wwAlarmConsumerClass.AlarmAckByGUID`
is the method to call.
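The four state strings decompose into two independent booleans: whether the alarm is active and whether it has been acknowledged. A small Python sketch of that decoding, where `AlarmFlags` and `parse_state` are hypothetical names and tolerating surrounding whitespace is an assumption rather than observed behaviour:

```python
from typing import NamedTuple


class AlarmFlags(NamedTuple):
    in_alarm: bool
    acked: bool


# One entry per STATE string named in this section.
_STATE_FLAGS = {
    "UNACK_ALM": AlarmFlags(in_alarm=True, acked=False),
    "UNACK_RTN": AlarmFlags(in_alarm=False, acked=False),
    "ACK_ALM": AlarmFlags(in_alarm=True, acked=True),
    "ACK_RTN": AlarmFlags(in_alarm=False, acked=True),
}


def parse_state(state_text):
    """Map a STATE string to (in_alarm, acked); reject anything unknown."""
    key = state_text.strip()  # assumption: tolerate stray whitespace
    if key not in _STATE_FLAGS:
        raise ValueError(f"unrecognised alarm state: {state_text!r}")
    return _STATE_FLAGS[key]


if __name__ == "__main__":
    print(parse_state("UNACK_ALM"))
    print(parse_state("UNACK_RTN"))
```

Raising on an unknown string, rather than defaulting, makes a new or misspelled state visible immediately instead of silently reporting an inactive alarm.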
### `GetStatistics` AV — unrelated quirk
Every `GetStatistics` call threw `AccessViolationException` in
the probe. Cause: the wnwrap interop signature uses `IntPtr` for
the three array out-parameters (`pChangeCode`, `pChangePos`,
`phAlarm`); passing `IntPtr.Zero` is wrong because the COM impl
writes into the buffer pointer without null-checking. Pre-allocate
three int arrays and pass pinned pointers (or use
`Marshal.AllocCoTaskMem`) to fix. Not required for the
production path — the XML methods give us everything we need.
### Implications for PR A.2 worker integration
Replacing `aaAlarmManagedClient.AlarmClient` with
`WNWRAPCONSUMERLib.wwAlarmConsumerClass` in the worker's
alarm-consumer surface unblocks A.2 fully. Outline:
1. **Reference path:** drop `aaAlarmManagedClient.dll` reference
   from `MxGateway.Worker.csproj`; add `Interop.WNWRAPCONSUMERLib.dll`
   reference from `mxaccessgw/lib/`. (Or commit the interop dll
   in-tree under `lib/` and reference relatively.)
2. **`AlarmClientConsumer` → `WnWrapAlarmConsumer`:** rewrite
   the consumer wrapper to:
   - `new wwAlarmConsumerClass()` on the worker's STA thread.
   - `InitializeConsumer(applicationName)` then
     `RegisterConsumer(hWnd: 0, …)`.
   - `Subscribe(@"\\<node>\Galaxy!<area>", …)` per configured
     area. The `<node>` and `<area>` are configurable (default
     `Environment.MachineName` + the platform's primary area).
   - Poll `GetXmlCurrentAlarms2(maxAlmCnt, out xml)` on a
     timer (500ms-1s cadence is comfortable). Parse XML
     payload; diff against the previous snapshot (keyed by
     `GUID`); emit `MX_EVENT_FAMILY_ON_ALARM_TRANSITION`
     events for added/changed/removed records.
   - `AlarmAckByGUID(VBGUID, comment, oprName, node, domain,
     fullName)` for client-driven acknowledgements (matches
     PR A.5's `AlarmAckCommand` payload).
   - Lifecycle teardown: `DeregisterConsumer` +
     `UninitializeConsumer` + `Marshal.FinalReleaseComObject`.
3. **Conversion layer:** map XML record fields to
   `MxAlarmConditionRecord` proto:
   - `GUID` → `condition_id` (canonicalize the no-dashes hex
     to a UUID string).
   - `STATE` enum → `inAlarm` + `acked` booleans
     (`UNACK_ALM` → in_alarm=true, acked=false;
     `UNACK_RTN` → in_alarm=false, acked=false;
     `ACK_ALM` → in_alarm=true, acked=true;
     `ACK_RTN` → in_alarm=false, acked=true).
   - `DATE + TIME + GMTOFFSET + DSTADJUST` → reassemble UTC
     timestamp; matches the worker's existing `Timestamp`
     wire format.
   - `PRIORITY` → severity (already 1-1000-ish range).
   - `TAGNAME` → reference; `PROVIDER_NAME` + `GROUP` for
     scope metadata.
4. **PR A.5 fix carry-over:** `InitializeConsumer` MUST be
   called before `RegisterConsumer` (rediscovered with
   `aaAlarmManagedClient`, also true here). The existing
   `AlarmClientConsumer` skips Initialize entirely; the new
   `WnWrapAlarmConsumer` includes it from day one.
5. **Test reuse:** PR A.5's snapshot/ack contract tests can
   stay; they don't touch the underlying COM API. Add a new
   integration test against the wnwrap surface (live-AVEVA-only,
   Skip-gated like the probe).
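Two of the conversion-layer steps in the outline above are pure string handling and can be sketched without COM. The Python below is illustrative only: `canonicalize_guid`, `guid_to_wire`, and `parse_record_clock` are hypothetical names. It deliberately stops at a naive (timezone-less) datetime, because how `GMTOFFSET` and `DSTADJUST` combine into UTC is not established in this section.

```python
import re
import uuid
from datetime import datetime

_HEX32 = re.compile(r"[0-9A-Fa-f]{32}")


def canonicalize_guid(raw):
    """Turn the 32-char no-dashes hex GUID into a dashed UUID string."""
    text = raw.strip()
    if not _HEX32.fullmatch(text):
        raise ValueError(f"expected 32 hex characters, got {raw!r}")
    # str(UUID) yields the lowercase 8-4-4-4-12 form.
    return str(uuid.UUID(hex=text))


def guid_to_wire(canonical):
    """Reverse step: dashed UUID string back to 32 upper-case hex chars."""
    return uuid.UUID(canonical).hex.upper()


def parse_record_clock(date_text, time_text):
    """Combine DATE ('2026/5/1') and TIME ('13:26:14.709') fields.

    Returns a naive datetime. Applying GMTOFFSET/DSTADJUST is left to
    the caller; their sign convention is an open assumption here.
    """
    return datetime.strptime(
        f"{date_text.strip()} {time_text.strip()}", "%Y/%m/%d %H:%M:%S.%f"
    )


if __name__ == "__main__":
    wire = "BCC4705395424D65BDAABCDEA6A32A73"
    canonical = canonicalize_guid(wire)
    print(canonical)
    print(guid_to_wire(canonical) == wire)
    print(parse_record_clock("2026/5/1", "13:26:14.709"))
```

The strict 32-hex check runs before `uuid.UUID` because the standard-library constructor also accepts braces, dashes, and `urn:uuid:` prefixes, which would let malformed payload values through unnoticed.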
### Settled API-ordering and surface knowledge
- `InitializeConsumer` first, then `RegisterConsumer` — both
on `aaAlarmManagedClient.AlarmClient` and
`wwAlarmConsumerClass`.
- `RegisterConsumer` arity differs:
`aaAlarmManagedClient.AlarmClient.RegisterConsumer(hWnd,
product, app, version, bRetainHiddenAlarms)` — 5 args;
`wwAlarmConsumerClass.RegisterConsumer(hWnd, product, app,
version)` — 4 args. The wnwrap class has no
`bRetainHiddenAlarms` parameter at all.
- Subscription expression format: `\\<machine>\Galaxy!<area>`
(literal `Galaxy` provider) for both libraries.
- Native ack: `AlarmAckByGUID(VBGUID guid, comment, oprName,
node, domain, fullName)` on the v2 surface; the legacy
`IwwAlarmConsumer` interface exposes the 5-arg ID variant instead.
These findings retire the open follow-up probes from the
"polling-vs-pump" debate above — `wwAlarmConsumerClass` plus
poll-on-timer is the implementation.