A.3 (live smoke): full alarms-over-gateway pipeline verified end-to-end
Skip-gated AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip ran
against the dev rig with the flip script firing
TestMachine_001.TestAlarm001 every 10s. Verified:
- Subscribe + 1st PollOnce yield real transition events
- Field-by-field decode correct (provider, group, tag, severity,
UTC timestamp, comment, type)
- SnapshotActiveAlarms reflects current state
- AcknowledgeByName(real identity) -> rc=0
- Pipeline keeps streaming transitions on the 10s cadence post-ack
Three production quirks surfaced and were fixed in
WnWrapAlarmConsumer:
1. SetXmlAlarmQuery is mandatory for reads. Skipping it (per the
earlier discovery-doc recommendation) makes the first
GetXmlCurrentAlarms2 fail with E_FAIL. The doc's claim that the
call is unnecessary because the round-trip echo is mangled was
wrong — mangled echo or not, the call is required.
2. SetXmlAlarmQuery breaks AlarmAckByName on the same consumer
instance (returns -55). Workaround: provision a parallel
"ack-only" wnwrap consumer that runs Initialize → Register →
Subscribe via the v1-prefixed methods, no SetXmlAlarmQuery.
Production WnWrapAlarmConsumer now holds two COM clients;
AcknowledgeByName always dispatches through the ack-only one.
3. AlarmAckByName has v2 (8-arg) and v1 (6-arg) overloads. The v2
8-arg overload returns -55 on this AVEVA build (apparently a
stub); the v1 6-arg overload works. Production now calls the
6-arg overload, discarding the proto's operator_domain and
operator_full_name fields. The proto contract keeps both for
forward-compat if AVEVA fixes the v2 method.
Bonus finding (not fixed here): AlarmAckByGUID throws
NotImplementedException on wnwrap. Reference→GUID lookup that we
initially planned to plumb is therefore not viable; all acks must
go through AlarmAckByName. WorkerAlarmRpcDispatcher.AcknowledgeAsync
already routes references through the by-name path, so this only
affects the GUID-input branch (which the worker tries first if the
input parses as a GUID — that branch will surface
NotImplementedException as MxaccessFailure if a client supplies one).
Threading caveat: wnwrap is ThreadingModel=Apartment, so the
consumer's internal Timer (firing on threadpool threads) blocks on
cross-apartment marshaling without an STA message pump. The smoke
test sidesteps this with pollIntervalMilliseconds=0 (Timer disabled)
+ manual PollOnce calls from the test STA. Production hosting will
route polls through the worker's StaRuntime in a follow-up; PollOnce
is now public so the wire-up is straightforward.
Test counts after this slice:
Worker: 195 pass / 4 skipped (live probes incl. new live smoke) /
1 pre-existing structure-fail (untouched)
Server: 308 pass / 0 fail
Solution builds clean.
docs/AlarmClientDiscovery.md "Live smoke-test discoveries" section
records all five findings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -688,3 +688,105 @@ alarm-consumer surface unblocks A.2 fully. Outline:
These findings retire the open follow-up probes from the
"polling-vs-pump" debate above: `wwAlarmConsumerClass` plus
poll-on-timer is the implementation.

## Live smoke-test discoveries (2026-05-01)

The Skip-gated `AlarmsLiveSmokeTests.Alarms_full_pipeline_round_trip`
ran the full
`WnWrapAlarmConsumer` + `AlarmDispatcher` + `MxAccessAlarmEventSink`
pipeline against the dev rig with the flip script running. Verified
end-to-end: 6 real transitions captured on the 10s cadence, ack-by-name
returned rc=0, and the pipeline stayed healthy through 5 more transitions
afterwards. Five findings surfaced. The first three are production-relevant
quirks that were fixed in the consumer; the fourth is a wnwrap limitation
and the fifth needs a production follow-up:

### 1. `SetXmlAlarmQuery` is mandatory for reads despite the mangled echo

Without `SetXmlAlarmQuery`, the first `GetXmlCurrentAlarms2` call
fails with `E_FAIL` (HRESULT `0x80004005`). The discovery doc above
flagged the round-trip echo as mangled and recommended skipping the
call; that recommendation is **wrong**. The echo *is* mangled (AVEVA
parses NODE/PROVIDER/ALARM_STATE/DISPLAY_MODE incorrectly), but the
call itself is required: it appears to act as a subscription enabler
for reads. Setting the actual filter through the Subscribe call does
not remove the need for `SetXmlAlarmQuery`.

`WnWrapAlarmConsumer.ComposeXmlAlarmQuery(subscription)` decomposes
the canonical `\\<machine>\Galaxy!<area>` form into the XML's
NODE/PROVIDER/GROUP fields. Mangled or not, the call enables reads.
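
The decomposition step can be sketched as below. This is a minimal
illustration, not the real `ComposeXmlAlarmQuery`: the `SubscriptionParser`
name, the tuple shape, and the machine name `DEVRIG` in the usage note are
all hypothetical, and the XML layout itself is deliberately not shown
because this document does not record it.

```csharp
using System;

// Hypothetical helper: splits \\<machine>\<provider>!<group> into the three
// parts that feed the NODE/PROVIDER/GROUP fields. Not the production code.
public static class SubscriptionParser
{
    public static (string Node, string Provider, string Group) Parse(string subscription)
    {
        if (subscription == null) throw new ArgumentNullException(nameof(subscription));
        if (!subscription.StartsWith(@"\\", StringComparison.Ordinal))
            throw new FormatException("Expected a leading \\\\<machine> prefix.");

        // The node name runs from after the leading "\\" to the next "\".
        int nodeEnd = subscription.IndexOf('\\', 2);
        if (nodeEnd < 0) throw new FormatException("Missing '\\' after the machine name.");

        // The remainder is <provider>!<group>.
        int bang = subscription.IndexOf('!', nodeEnd + 1);
        if (bang < 0) throw new FormatException("Missing '!' between provider and group.");

        string node = subscription.Substring(2, nodeEnd - 2);
        string provider = subscription.Substring(nodeEnd + 1, bang - nodeEnd - 1);
        string group = subscription.Substring(bang + 1);
        if (node.Length == 0 || provider.Length == 0 || group.Length == 0)
            throw new FormatException("Node, provider and group must all be non-empty.");

        return (node, provider, group);
    }
}
```

For example, `SubscriptionParser.Parse(@"\\DEVRIG\Galaxy!TestArea")` yields
node `DEVRIG`, provider `Galaxy`, and group `TestArea`.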

### 2. Two consumers required: read-side vs. ack-side

`SetXmlAlarmQuery` enables reads but **breaks `AlarmAckByName` on
the same consumer instance**. With `SetXmlAlarmQuery` applied,
`AlarmAckByName` returns -55 even with valid name+provider+group+operator.
Without it, `AlarmAckByName` succeeds with rc=0.

The production consumer therefore provisions **two** wnwrap COM
instances:
- Primary consumer (`client`): runs full lifecycle including
  `SetXmlAlarmQuery` for `GetXmlCurrentAlarms2` polls.
- Ack-only consumer (`ackClient`): runs Initialize → Register →
  Subscribe via the v1-prefixed methods, **no SetXmlAlarmQuery**.
  All `AcknowledgeByName` calls dispatch through this instance.

Both consumers subscribe to the same expression. Disposal cleans up
both via a shared `ReleaseConsumerCom` helper.
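
The two-instance arrangement can be sketched as follows. Everything here is
hypothetical scaffolding for illustration: `IAlarmComClient`,
`FakeAlarmComClient`, and `DualClientConsumer` are invented names, the fake
only mimics the -55 behaviour observed on the rig, and the six-argument
order is assumed from the v2 parameter list minus its last two entries.

```csharp
using System;

// Hypothetical abstraction over one wnwrap COM instance. The real consumer
// talks to the COM object directly.
public interface IAlarmComClient
{
    void ApplyXmlAlarmQuery(string xml); // stands in for SetXmlAlarmQuery
    int AckByName(string name, string provider, string group,
                  string comment, string operatorName, string node); // assumed v1 6-arg shape
}

// Fake that mimics the observed behaviour: once the XML query has been
// applied, acks on that same instance return -55.
public sealed class FakeAlarmComClient : IAlarmComClient
{
    public bool QueryApplied { get; private set; }

    public void ApplyXmlAlarmQuery(string xml) => QueryApplied = true;

    public int AckByName(string name, string provider, string group,
                         string comment, string operatorName, string node)
        => QueryApplied ? -55 : 0;
}

public sealed class DualClientConsumer
{
    private readonly IAlarmComClient client;    // read side, XML query applied
    private readonly IAlarmComClient ackClient; // ack side, XML query never applied

    public DualClientConsumer(IAlarmComClient client, IAlarmComClient ackClient, string xmlQuery)
    {
        this.client = client ?? throw new ArgumentNullException(nameof(client));
        this.ackClient = ackClient ?? throw new ArgumentNullException(nameof(ackClient));
        this.client.ApplyXmlAlarmQuery(xmlQuery); // only the read side gets the query
    }

    // Every ack goes through the instance that never saw the XML query.
    public int AcknowledgeByName(string name, string provider, string group,
                                 string comment, string operatorName, string node)
        => ackClient.AckByName(name, provider, group, comment, operatorName, node);
}
```

The point of the split is that the read-side instance is never asked to
ack, so the -55 failure mode cannot be reached through the public method.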

### 3. `AlarmAckByName` v2 8-arg vs. v1 6-arg

`wwAlarmConsumerClass` exposes two `AlarmAckByName` overloads:
- `IwwAlarmConsumer2` v2: 8 args (`name, provider, group, comment,
  oprName, node, domainName, oprFullName`).
- `IwwAlarmConsumer` v1: 6 args (no domain, no full-name).

The v2 8-arg method returns -55 on this AVEVA build regardless of
operator-identity inputs, which suggests it is a stub. The v1 6-arg method
works. Production `WnWrapAlarmConsumer.AcknowledgeByName` calls the
6-arg overload and discards the proto's `operator_domain` and
`operator_full_name` fields. The proto contract keeps both fields for
forward compatibility if AVEVA fixes the v2 method later.

### 4. `AlarmAckByGUID` is not implemented

The v2 `AlarmAckByGUID(VBGUID, …)` throws `NotImplementedException`
(COM `E_NOTIMPL`) on `wwAlarmConsumerClass` against this AVEVA
build. The reference→GUID lookup that we initially planned to wire
through `AlarmAckByGUID` is therefore not viable on wnwrap; all acks
must go through `AlarmAckByName`.

The proto `AcknowledgeAlarmCommand` (GUID-based) and the worker's
`MxAccessCommandExecutor.ExecuteAcknowledgeAlarm` switch arm remain
in the codebase to preserve the forward-compat shape, but the gateway-side
`WorkerAlarmRpcDispatcher.AcknowledgeAsync` now always routes through
`AcknowledgeAlarmByName` when the public RPC supplies a recognizable
`Provider!Group.Tag` reference. If a client supplies input that parses
as a GUID, the worker still tries the GUID branch first and surfaces the
`NotImplementedException` as `MxaccessFailure`.
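
The routing decision can be sketched as a small classifier. This is an
illustration under assumptions, not the dispatcher's code: `AckRoute` and
`AckRouting` are hypothetical names, and splitting the group from the tag at
the first `.` after the `!` is an assumption based on the
`Galaxy!TestArea.TestMachine_001.TestAlarm001` reference seen in the capture.

```csharp
using System;

public enum AckRoute { ByGuid, ByName, Unrecognized }

// Hypothetical classifier for the acknowledge input.
public static class AckRouting
{
    public static AckRoute Classify(string input,
                                    out string provider, out string group, out string tag)
    {
        provider = group = tag = null;
        if (string.IsNullOrWhiteSpace(input)) return AckRoute.Unrecognized;

        // GUID input takes the GUID branch, which fails on wnwrap today.
        if (Guid.TryParse(input, out _)) return AckRoute.ByGuid;

        // Otherwise expect Provider!Group.Tag with all three parts non-empty.
        int bang = input.IndexOf('!');
        int dot = bang < 0 ? -1 : input.IndexOf('.', bang + 1);
        if (bang <= 0 || dot <= bang + 1 || dot == input.Length - 1)
            return AckRoute.Unrecognized;

        provider = input.Substring(0, bang);
        group = input.Substring(bang + 1, dot - bang - 1);
        tag = input.Substring(dot + 1);
        return AckRoute.ByName;
    }
}
```

With the reference from the capture, this yields provider `Galaxy`, group
`TestArea`, and tag `TestMachine_001.TestAlarm001`, which are the pieces an
ack-by-name call needs.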

### 5. STA / threading: production fix needed

The wnwrap COM component is `ThreadingModel=Apartment`. The consumer's
internal `Timer` fires on threadpool threads and would block forever
on cross-apartment marshaling unless the host STA pumps Win32
messages. The smoke test sidesteps this by setting
`pollIntervalMilliseconds=0` (Timer disabled) and driving `PollOnce`
manually from the test's STA. Production hosting will route polls
through the worker's `StaRuntime` in a follow-up; the consumer's
`PollOnce` is `public` and idempotent, so the wire-up is mechanical.
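
One possible shape for driving polls from a dedicated STA thread is sketched
below. `StaPollLoop` is a hypothetical stand-in, not the worker's
`StaRuntime`, whose API is not shown in this document. The sketch assumes
the consumer is also created on the same thread, so that its calls need no
cross-apartment marshaling, and it is Windows-only because
`SetApartmentState(ApartmentState.STA)` is not supported elsewhere.

```csharp
using System;
using System.Threading;

// Hypothetical poll driver: invokes a callback at a fixed interval on a
// dedicated STA thread until disposed.
public sealed class StaPollLoop : IDisposable
{
    private readonly Thread thread;
    private readonly ManualResetEventSlim stop = new ManualResetEventSlim(false);

    public StaPollLoop(Action pollOnce, TimeSpan interval)
    {
        if (pollOnce == null) throw new ArgumentNullException(nameof(pollOnce));

        thread = new Thread(() =>
        {
            // Wait returns false on each timeout and true once Dispose signals stop.
            while (!stop.Wait(interval))
            {
                pollOnce(); // an unhandled exception here would end the thread
            }
        });
        thread.IsBackground = true;
        thread.SetApartmentState(ApartmentState.STA); // must be set before Start
        thread.Start();
    }

    public void Dispose()
    {
        stop.Set();
        thread.Join(); // do not call Dispose from inside the poll callback
        stop.Dispose();
    }
}
```

A caller would pass something like `() => consumer.PollOnce()` as the
callback, keeping the consumer's own timer disabled as the smoke test does.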

### Capture summary

```
Transition: kind=Clear ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' …
Transition: kind=Raise ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' …
SnapshotActiveAlarms count=1
active: ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' state=Active
AcknowledgeByName(real identity) -> rc=0
Post-ack transition: kind=Clear …
+1: kind=Raise … (10s after ack)
+2: kind=Clear … (20s)
+3: kind=Raise … (30s)
+4: kind=Clear … (40s)
```

10s cadence held throughout; full proto fields populated correctly;
ack registered server-side without errors.