Cluster 09 — Alarms
Audited doc: docs/AlarmClientDiscovery.md
Code base verified against:
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmDispatcher.cs
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmCommandHandler.cs
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmRecordTransitionMapper.cs
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmStateKind.cs
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmTransitionEvent.cs
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IMxAccessAlarmConsumer.cs
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IAlarmCommandHandler.cs
src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs (alarm arms)
src/ZB.MOM.WW.MxGateway.Server/Alarms/GatewayAlarmMonitor.cs
src/ZB.MOM.WW.MxGateway.Server/Alarms/IGatewayAlarmService.cs
src/ZB.MOM.WW.MxGateway.Server/Alarms/AlarmsServiceCollectionExtensions.cs
src/ZB.MOM.WW.MxGateway.Server/Configuration/AlarmsOptions.cs
src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto
DOC / LINES / 71-74 (comment about AlarmClientConsumer.cs)
CLAIM / The file src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmClientConsumer.cs exists in the repo.
CLAIM_TYPE / path
VERDICT / wrong
EVIDENCE / find /Users/dohertj2/Desktop/mxaccessgateway/src -name "AlarmClientConsumer*" returns nothing. The file no longer exists; WnWrapAlarmConsumer.cs comments confirm it was replaced: WnWrapAlarmConsumer.cs:18-19.
CODE_AREA / alarm.subscribe
SEVERITY / medium
PROPOSED_FIX / Update references to the obsolete AlarmClientConsumer.cs throughout the doc to note that the file was retired and replaced by WnWrapAlarmConsumer.cs.
DOC / LINES / 71-74
CLAIM / The architecture comment on AlarmClientConsumer.cs (PR A.5) describing IAlarmMgrDataProvider managed events is wrong against the deployed assembly — there is no managed event surface.
CLAIM_TYPE / behavior-rule
VERDICT / stale
EVIDENCE / The source file it critiques (AlarmClientConsumer.cs) no longer exists in the repo. The critique is historically accurate but refers to a file that was removed during the wnwrap migration. No live code contains IAlarmMgrDataProvider. WnWrapAlarmConsumer.cs:1-575.
CODE_AREA / alarm.subscribe
SEVERITY / low
PROPOSED_FIX / Note that the critique is a historical record; the offending file has been removed. The section remains valid as probe context but should clarify the current state.
DOC / LINES / 87-88
CLAIM / AlarmClientConsumer.AlarmRecordReceived has no production callers; RaiseAlarmRecordReceived is internal for tests and never invoked at runtime.
CLAIM_TYPE / behavior-rule
VERDICT / stale
EVIDENCE / Neither AlarmRecordReceived nor RaiseAlarmRecordReceived appears anywhere in the current source tree (grep -rn "AlarmRecordReceived\|RaiseAlarmRecordReceived" src returns zero results outside tests or binaries). The entire AlarmClientConsumer class was removed; the observation is a dead historical probe note.
CODE_AREA / alarm.subscribe
SEVERITY / low
PROPOSED_FIX / Flag as historical only; the code path no longer exists.
DOC / LINES / 492
CLAIM / "PR A.5's Subscribe / AcknowledgeByGuid / SnapshotActiveAlarms are correct — they're pull-style and don't depend on the notification mechanism."
CLAIM_TYPE / behavior-rule
VERDICT / stale
EVIDENCE / The method names in the current interface are IMxAccessAlarmConsumer.AcknowledgeByGuid and SnapshotActiveAlarms (IMxAccessAlarmConsumer.cs:64,104), so the names are accurate. However, this statement refers to PR A.5's AlarmClientConsumer, which no longer exists. The claim implicitly endorses AlarmClientConsumer code that has been replaced by WnWrapAlarmConsumer. The successor also exposes AcknowledgeByGuid but routes it through AlarmAckByGUID on wwAlarmConsumerClass.
CODE_AREA / alarm.ack
SEVERITY / low
PROPOSED_FIX / Note that PR A.5 was superseded; the current production path is WnWrapAlarmConsumer.
DOC / LINES / 604-605
CLAIM / After an alarm return-to-normal (UNACK_RTN), wwAlarmConsumerClass.AlarmAckByGUID is "the method to call" for acknowledgement.
CLAIM_TYPE / behavior-rule
VERDICT / wrong
EVIDENCE / The doc itself contradicts this eleven sections later ("Section 4. AlarmAckByGUID is not implemented", lines 750-756): AlarmAckByGUID(VBGUID, …) throws NotImplementedException (COM E_NOTIMPL) on wwAlarmConsumerClass. The doc at line 604 presents it as the correct ack method before the discovery in the live-smoke section, creating a contradiction within the document that integrators reading top-to-bottom will encounter.
CODE_AREA / alarm.ack
SEVERITY / high
PROPOSED_FIX / Add a forward-reference warning at line 604 ("Note: see 'Live smoke-test discoveries — section 4' below; AlarmAckByGUID is E_NOTIMPL on wnwrap and must not be called directly; use AlarmAckByName via the ack-only consumer.") or reorder the section.
DOC / LINES / 750-756
CLAIM / AlarmAckByGUID(VBGUID, …) throws NotImplementedException (E_NOTIMPL) on wwAlarmConsumerClass, so all acks must go through AlarmAckByName.
CLAIM_TYPE / behavior-rule
VERDICT / accurate
EVIDENCE / WnWrapAlarmConsumer.cs:215-239 provides AcknowledgeByGuid which calls com.AlarmAckByGUID directly (the COM interop). The method is present in the consumer and called from AlarmCommandHandler.Acknowledge (AlarmCommandHandler.cs:141-158) and AlarmDispatcher.Acknowledge (AlarmDispatcher.cs:87-103). The code path is plumbed through and compiles. Whether it still throws E_NOTIMPL at runtime on the deployed AVEVA build is a runtime-only observable — the doc's claim was empirically confirmed 2026-05-01.
CODE_AREA / alarm.ack
SEVERITY / medium
PROPOSED_FIX / Flag: the code now calls AlarmAckByGUID without a try/catch for E_NOTIMPL; document that the GUID path will surface a COMException at runtime on affected AVEVA builds and that the gateway routes canonical Provider!Group.Tag references through AcknowledgeAlarmByName to avoid this.
DOC / LINES / 758-762
CLAIM / "The proto AcknowledgeAlarmCommand (GUID-based) and MxAccessCommandExecutor.ExecuteAcknowledgeAlarm switch arm remain in the codebase for forward-compat, but the gateway-side WorkerAlarmRpcDispatcher.AcknowledgeAsync now always routes through AcknowledgeAlarmByName when the public RPC supplies a recognizable Provider!Group.Tag reference."
CLAIM_TYPE / cross-ref
VERDICT / wrong
EVIDENCE / (a) WorkerAlarmRpcDispatcher does not exist in the source tree. The class that routes acknowledge requests is GatewayAlarmMonitor.AcknowledgeAsync + BuildAcknowledgeCommand (GatewayAlarmMonitor.cs:437,516). (b) The gateway does NOT always route through AcknowledgeAlarmByName: BuildAcknowledgeCommand first tries Guid.TryParse; if the alarm_full_reference is a canonical GUID it still dispatches MxCommandKind.AcknowledgeAlarm (the GUID path) (GatewayAlarmMonitor.cs:528-543). Only when the reference is not a GUID does it fall through to AcknowledgeAlarmByName (GatewayAlarmMonitor.cs:545-563).
CODE_AREA / alarm.ack
SEVERITY / high
PROPOSED_FIX / (1) Replace WorkerAlarmRpcDispatcher with the actual class name GatewayAlarmMonitor. (2) Correct the routing description: GUID-shaped references still go through AcknowledgeAlarmCommand (GUID path); Provider!Group.Tag references go through AcknowledgeAlarmByNameCommand. The claim that it "always routes through AcknowledgeAlarmByName" is false.
DOC / LINES / 636-639 (A.2 outline step 2)
CLAIM / Production WnWrapAlarmConsumer polls GetXmlCurrentAlarms2(maxAlmCnt, out xml) on a timer (500ms–1s cadence).
CLAIM_TYPE / behavior-rule
VERDICT / wrong
EVIDENCE / WnWrapAlarmConsumer.cs:38-43 explicitly states "the consumer owns no internal timer." PollOnce() is driven externally by StaRuntime.InvokeAsync (WnWrapAlarmConsumer.cs:39, AlarmCommandHandler.cs:29-33). The 500ms–1s timer cadence mentioned in the doc was a design proposal; the implementation delegates all poll scheduling to the caller (STA). The doc's step 2 reads as if the consumer self-schedules.
CODE_AREA / alarm.subscribe
SEVERITY / medium
PROPOSED_FIX / Correct to: "Poll GetXmlCurrentAlarms2 via PollOnce() called externally by the worker's STA through StaRuntime.InvokeAsync; the consumer owns no internal timer."
DOC / LINES / 641-643 (A.2 outline step 2)
CLAIM / "AlarmAckByGUID(VBGUID, comment, oprName, node, domain, fullName) for client-driven acknowledgements (matches PR A.5's AlarmAckCommand payload)."
CLAIM_TYPE / rpc/proto
VERDICT / wrong
EVIDENCE / The proto message is named AcknowledgeAlarmCommand (not AlarmAckCommand): mxaccess_gateway.proto:337. The consumer also exposes AcknowledgeByGuid (not AlarmAckByGUID) as its interface method (IMxAccessAlarmConsumer.cs:64). The doc uses the COM method name where it should use the proto/interface name, and uses the wrong proto message name.
CODE_AREA / alarm.ack
SEVERITY / medium
PROPOSED_FIX / Replace "PR A.5's AlarmAckCommand payload" with "the proto's AcknowledgeAlarmCommand message" (mxaccess_gateway.proto:337).
DOC / LINES / 644-647 (A.2 outline step 3)
CLAIM / STATE mapping: UNACK_ALM → in_alarm=true, acked=false; UNACK_RTN → in_alarm=false, acked=false; ACK_ALM → in_alarm=true, acked=true; ACK_RTN → in_alarm=false, acked=true.
CLAIM_TYPE / term
VERDICT / wrong
EVIDENCE / The production proto uses AlarmConditionState (Active / ActiveAcked / Inactive), not boolean in_alarm/acked fields. AlarmDispatcher.MapConditionState (AlarmDispatcher.cs:221-234): UnackAlm→Active, AckAlm→ActiveAcked, UnackRtn→Inactive, AckRtn→Inactive. Both Rtn states collapse to Inactive — the acked distinction on a cleared alarm is not surfaced. The doc's proposed boolean decomposition was a design proposal that was not adopted; the final proto shape uses the enum.
CODE_AREA / alarm.state
SEVERITY / high
PROPOSED_FIX / Replace the boolean mapping table with the actual AlarmConditionState enum mapping used in AlarmDispatcher.MapConditionState. Document that UnackRtn and AckRtn both map to Inactive (ack-vs-unack on a cleared alarm is not exposed through the proto).
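The enum mapping cited in the evidence above can be sketched as a lookup table. This is a minimal Python model, not the production C#; the Python names `MxAlarmStateKind`, `AlarmConditionState`, and `map_condition_state` are illustrative stand-ins for the identifiers cited from AlarmDispatcher.cs, and only the four pairings themselves come from the audit evidence.

```python
from enum import Enum


class MxAlarmStateKind(Enum):
    """The four STATE values named in the audit (worker side)."""
    UNACK_ALM = "UnackAlm"
    ACK_ALM = "AckAlm"
    UNACK_RTN = "UnackRtn"
    ACK_RTN = "AckRtn"


class AlarmConditionState(Enum):
    """The three proto-level states named in the audit."""
    ACTIVE = "Active"
    ACTIVE_ACKED = "ActiveAcked"
    INACTIVE = "Inactive"


# Pairings as reported for AlarmDispatcher.MapConditionState.
_STATE_MAP = {
    MxAlarmStateKind.UNACK_ALM: AlarmConditionState.ACTIVE,
    MxAlarmStateKind.ACK_ALM: AlarmConditionState.ACTIVE_ACKED,
    MxAlarmStateKind.UNACK_RTN: AlarmConditionState.INACTIVE,
    MxAlarmStateKind.ACK_RTN: AlarmConditionState.INACTIVE,
}


def map_condition_state(state: MxAlarmStateKind) -> AlarmConditionState:
    """Illustrative model: both Rtn states collapse to INACTIVE."""
    return _STATE_MAP[state]
```

Because the table is many-to-one, the mapping cannot be inverted: a consumer that sees Inactive cannot tell from the state alone whether the cleared alarm was acknowledged.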
DOC / LINES / 648-649 (A.2 outline step 3)
CLAIM / "GUID → condition_id (canonicalize the no-dashes hex to a UUID string)."
CLAIM_TYPE / term
VERDICT / wrong
EVIDENCE / The production code stores the GUID as MxAlarmSnapshotRecord.AlarmGuid (a System.Guid). The proto has no condition_id field and does not expose the GUID as a field on OnAlarmTransitionEvent. The stable identifier used for condition correlation is alarm_full_reference (mxaccess_gateway.proto:720-723, OnAlarmTransitionEvent.alarm_full_reference).
CODE_AREA / alarm.state
SEVERITY / medium
PROPOSED_FIX / Replace condition_id with the actual stable identifier: alarm_full_reference (OnAlarmTransitionEvent.alarm_full_reference). The GUID is used internally by WnWrapAlarmConsumer as a snapshot key but is not exposed as a proto field.
DOC / LINES / 651-654 (A.2 outline step 3 — timestamp)
CLAIM / "DATE + TIME + GMTOFFSET + DSTADJUST → reassemble UTC timestamp; matches the worker's existing Timestamp wire format."
CLAIM_TYPE / behavior-rule
VERDICT / accurate
EVIDENCE / AlarmRecordTransitionMapper.ParseTransitionTimestampUtc (AlarmRecordTransitionMapper.cs:116-188) parses all four fields and computes UTC. The proto uses google.protobuf.Timestamp (mxaccess_gateway.proto:747). Wire-up matches.
CODE_AREA / alarm.state
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / 656-657 (A.2 outline step 3)
CLAIM / "PRIORITY → severity (already 1-1000-ish range)."
CLAIM_TYPE / behavior-rule
VERDICT / accurate
EVIDENCE / WnWrapAlarmConsumer.ParseSnapshotXml reads PRIORITY as int (WnWrapAlarmConsumer.cs:433), stored as MxAlarmSnapshotRecord.Priority. AlarmDispatcher.OnTransition passes it as severity: record.Priority (AlarmDispatcher.cs:187). OnAlarmTransitionEvent.severity is int32 in the proto (mxaccess_gateway.proto). The 1-1000 range is consistent with AVEVA's alarm priority range.
CODE_AREA / alarm.state
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / 658-659 (A.2 outline step 3)
CLAIM / "TAGNAME → reference; PROVIDER_NAME + GROUP for scope metadata."
CLAIM_TYPE / behavior-rule
VERDICT / accurate
EVIDENCE / AlarmDispatcher.OnTransition calls AlarmRecordTransitionMapper.ComposeFullReference(record.ProviderName, record.Group, record.TagName) and passes the result as alarmFullReference (AlarmDispatcher.cs:180-183). ComposeFullReference formats it as Provider!Group.TagName (AlarmRecordTransitionMapper.cs:90-102). TAGNAME alone is passed as sourceObjectReference (AlarmDispatcher.cs:184).
CODE_AREA / alarm.state
SEVERITY / low
PROPOSED_FIX / flag only
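The Provider!Group.TagName composition described in the evidence above can be illustrated with a one-line formatter. This is a hedged Python sketch: `compose_full_reference` is a hypothetical stand-in for AlarmRecordTransitionMapper.ComposeFullReference, and any handling of empty or missing fields in the real method is not modeled here.

```python
def compose_full_reference(provider: str, group: str, tag_name: str) -> str:
    """Illustrative model of the {provider}!{group}.{name} format.

    Assumption: all three parts are non-empty; the production method may
    treat blanks differently.
    """
    return f"{provider}!{group}.{tag_name}"


# Hypothetical decomposition of the smoke-capture example reference.
example = compose_full_reference("Galaxy", "TestArea", "TestMachine_001.TestAlarm001")
```

In this sketch the tag name itself contains a dot, which is why the by-name ack parser splits the group from the tag on the first dot only.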
DOC / LINES / 672-676 (A.2 outline step 5)
CLAIM / "PR A.5's snapshot/ack contract tests can stay — they don't touch the underlying COM API."
CLAIM_TYPE / cross-ref
VERDICT / stale
EVIDENCE / PR A.5's AlarmClientConsumer was retired; there is no class by that name. The test files for alarm command handling now cover AlarmCommandHandler, AlarmDispatcher, and WnWrapAlarmConsumerXmlTests — none named as "PR A.5 tests." The statement implies a test corpus that doesn't exist under the described label.
CODE_AREA / alarm.subscribe
SEVERITY / low
PROPOSED_FIX / Remove or update the PR label; reference actual test files: AlarmCommandHandlerTests.cs, AlarmDispatcherTests.cs, WnWrapAlarmConsumerXmlTests.cs.
DOC / LINES / 673-675 (settled API ordering section)
CLAIM / "InitializeConsumer first, then RegisterConsumer — both on aaAlarmManagedClient.AlarmClient and wwAlarmConsumerClass."
CLAIM_TYPE / behavior-rule
VERDICT / accurate
EVIDENCE / WnWrapAlarmConsumer.Subscribe calls IwwAlarmConsumer_InitializeConsumer before IwwAlarmConsumer_RegisterConsumer (WnWrapAlarmConsumer.cs:117-137). Same ordering for ackClient (WnWrapAlarmConsumer.cs:188-208).
CODE_AREA / alarm.subscribe
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / 676-682 (settled API section)
CLAIM / "aaAlarmManagedClient.AlarmClient.RegisterConsumer is 5-arg (includes bRetainHiddenAlarms); wwAlarmConsumerClass.RegisterConsumer is 4-arg (no bRetainHiddenAlarms)."
CLAIM_TYPE / behavior-rule
VERDICT / accurate
EVIDENCE / WnWrapAlarmConsumer.Subscribe calls IwwAlarmConsumer_RegisterConsumer with 4 args: hWnd, szProductName, szApplicationName, szVersion (WnWrapAlarmConsumer.cs:128-132). Consistent with the doc.
CODE_AREA / alarm.subscribe
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / 683-685 (settled API section)
CLAIM / "Subscription expression format: \\<machine>\Galaxy!<area> (literal Galaxy provider) for both libraries."
CLAIM_TYPE / path
VERDICT / accurate
EVIDENCE / WnWrapAlarmConsumer.ComposeXmlAlarmQuery parses this format and treats Galaxy as the provider (WnWrapAlarmConsumer.cs:489-530). IMxAccessAlarmConsumer.Subscribe doc comment confirms: "Subscription string follows AVEVA's canonical format: \\<node>\Galaxy!<area>. The literal 'Galaxy' is the provider name (regardless of the configured Galaxy database name)." (IMxAccessAlarmConsumer.cs:44-46). AlarmsOptions.cs:16-17 also confirms.
CODE_AREA / alarm.subscribe
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / 684-685 (settled API section)
CLAIM / "Native ack: AlarmAckByGUID(VBGUID guid, comment, oprName, node, domain, fullName) on the v2 surface."
CLAIM_TYPE / behavior-rule
VERDICT / accurate
EVIDENCE / WnWrapAlarmConsumer.AcknowledgeByGuid calls com.AlarmAckByGUID with exactly those args (WnWrapAlarmConsumer.cs:232-238).
CODE_AREA / alarm.ack
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / 695-699 (live smoke quirk 1)
CLAIM / "Without SetXmlAlarmQuery, the first GetXmlCurrentAlarms2 call fails with E_FAIL (HRESULT 0x80004005)."
CLAIM_TYPE / behavior-rule
VERDICT / accurate
EVIDENCE / WnWrapAlarmConsumer.Subscribe calls SetXmlAlarmQuery and wraps it with a COMException guard that would surface as InvalidOperationException with the E_FAIL message (WnWrapAlarmConsumer.cs:156-182). The call is mandatory per production code structure.
CODE_AREA / alarm.subscribe
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / 719-733 (live smoke quirk 2)
CLAIM / "Two consumers required: read-side consumer (with SetXmlAlarmQuery) and ack-only consumer (without SetXmlAlarmQuery). All AcknowledgeByName calls dispatch through the ack-only instance."
CLAIM_TYPE / behavior-rule
VERDICT / accurate
EVIDENCE / WnWrapAlarmConsumer.Subscribe provisions ackClient = new wwAlarmConsumerClass() with full lifecycle but no SetXmlAlarmQuery (WnWrapAlarmConsumer.cs:184-210). AcknowledgeByName uses ackClient (WnWrapAlarmConsumer.cs:256-278). AcknowledgeByGuid uses client (read-side) (WnWrapAlarmConsumer.cs:224-238).
CODE_AREA / alarm.ack
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / 736-748 (live smoke quirk 3)
CLAIM / "The v2 8-arg AlarmAckByName returns -55 on this AVEVA build. The v1 6-arg AlarmAckByName works. Production WnWrapAlarmConsumer.AcknowledgeByName calls the 6-arg overload. Operator domain and full-name fields are accepted by the proto but not propagated to AVEVA (discarded)."
CLAIM_TYPE / behavior-rule
VERDICT / accurate
EVIDENCE / WnWrapAlarmConsumer.AcknowledgeByName calls com.AlarmAckByName (6-arg) and explicitly discards ackOperatorDomain and ackOperatorFullName with _ = ... (WnWrapAlarmConsumer.cs:268-278). The proto AcknowledgeAlarmByNameCommand retains operator_domain and operator_full_name fields (mxaccess_gateway.proto:359-373).
CODE_AREA / alarm.ack
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / 750-756 (live smoke quirk 4)
CLAIM / "AlarmAckByGUID is not implemented on wwAlarmConsumerClass; it throws NotImplementedException / E_NOTIMPL. The reference→GUID lookup is not viable; all acks must go through AlarmAckByName."
CLAIM_TYPE / behavior-rule
VERDICT / stale
EVIDENCE / The production code at WnWrapAlarmConsumer.AcknowledgeByGuid still calls com.AlarmAckByGUID directly without a guard for E_NOTIMPL (WnWrapAlarmConsumer.cs:215-239). The gateway's BuildAcknowledgeCommand still dispatches MxCommandKind.AcknowledgeAlarm (GUID path) when alarm_full_reference parses as a GUID (GatewayAlarmMonitor.cs:528-543). The doc says all acks must go through AcknowledgeByName, but the code still routes GUID-shaped references through AlarmAckByGUID. The E_NOTIMPL runtime behavior is unguarded.
CODE_AREA / alarm.ack
SEVERITY / high
PROPOSED_FIX / Either (a) add a COMException/NotImplementedException guard around AlarmAckByGUID in WnWrapAlarmConsumer.AcknowledgeByGuid that falls back to AcknowledgeByName, or (b) make the gateway never dispatch the GUID arm. Document whichever approach is taken. The current state silently sends a doomed IPC command.
DOC / LINES / 761-762
CLAIM / "WorkerAlarmRpcDispatcher.AcknowledgeAsync now always routes through AcknowledgeAlarmByName when the public RPC supplies a recognizable Provider!Group.Tag reference."
CLAIM_TYPE / cross-ref
VERDICT / wrong
EVIDENCE / (a) No class named WorkerAlarmRpcDispatcher exists in the source tree. The gateway-side routing is in GatewayAlarmMonitor.BuildAcknowledgeCommand (GatewayAlarmMonitor.cs:516). (b) The routing is conditional: GUID-shaped alarm_full_reference → AcknowledgeAlarmCommand (GUID path); Provider!Group.Tag → AcknowledgeAlarmByNameCommand. The claim that the routing "always" goes through AcknowledgeAlarmByName is incorrect.
CODE_AREA / alarm.ack
SEVERITY / high
PROPOSED_FIX / Replace the entire sentence. The correct description: "The gateway's GatewayAlarmMonitor.BuildAcknowledgeCommand (GatewayAlarmMonitor.cs:516) dispatches MxCommandKind.AcknowledgeAlarm for GUID-shaped references and MxCommandKind.AcknowledgeAlarmByName for Provider!Group.Tag references."
DOC / LINES / 765-773 (STA quirk 5)
CLAIM / "The consumer's internal Timer fires on threadpool threads and would block on cross-apartment marshaling unless the host STA pumps Win32 messages. The smoke test sidesteps this by setting pollIntervalMilliseconds=0 (Timer disabled) and driving PollOnce manually."
CLAIM_TYPE / behavior-rule
VERDICT / stale
EVIDENCE / The production WnWrapAlarmConsumer has no internal Timer at all — the design was revised so PollOnce() is always external (WnWrapAlarmConsumer.cs:38-43: "the consumer owns no internal timer"). There is no pollIntervalMilliseconds constructor parameter (WnWrapAlarmConsumer.cs:69-87). The constructor takes only wwAlarmConsumerClass client and int maxAlarmsPerFetch. The smoke test mention of pollIntervalMilliseconds=0 refers to a superseded design.
CODE_AREA / alarm.subscribe
SEVERITY / medium
PROPOSED_FIX / Update to reflect the final design: WnWrapAlarmConsumer has no internal timer; PollOnce() is always called externally by the STA. Remove the pollIntervalMilliseconds=0 test-workaround reference.
DOC / LINES / 599-601 (XML STATE enum section)
CLAIM / "STATE enum values observed: UNACK_RTN (alarm returned to normal, unacknowledged) and UNACK_ALM (alarm active and unacknowledged). Other states (ACK_RTN, ACK_ALM) would appear when an ack is performed."
CLAIM_TYPE / term
VERDICT / accurate
EVIDENCE / MxAlarmStateKind.cs:1-17 defines all four values. AlarmRecordTransitionMapper.ParseStateKind handles all four (AlarmRecordTransitionMapper.cs:27-38).
CODE_AREA / alarm.state
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / 628-630 (reference format in smoke capture)
CLAIM / Reference format in the capture: ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' — the alarm_full_reference is composed as ProviderName!Group.TagName.
CLAIM_TYPE / term
VERDICT / accurate
EVIDENCE / AlarmRecordTransitionMapper.ComposeFullReference formats as {provider}!{group}.{name} (AlarmRecordTransitionMapper.cs:90-102). The example matches this pattern exactly.
CODE_AREA / alarm.state
SEVERITY / low
PROPOSED_FIX / flag only
DOC / LINES / (entire doc — RPC names)
CLAIM / The document mentions IPC commands SubscribeAlarms, AcknowledgeByGuid, SnapshotActiveAlarms, QueryActiveAlarms but never names the public gRPC RPCs — AcknowledgeAlarm, StreamAlarms, QueryActiveAlarms — or the config keys governing the always-on monitor (MxGateway:Alarms:Enabled, MxGateway:Alarms:SubscriptionExpression, MxGateway:Alarms:DefaultArea, MxGateway:Alarms:ReconcileIntervalSeconds).
CLAIM_TYPE / gap
VERDICT / gap
EVIDENCE / mxaccess_gateway.proto:22-37 (RPCs); AlarmsOptions.cs:21-47 (config keys); GatewayAlarmMonitor.cs:17-51 (always-on broker). None documented in AlarmClientDiscovery.md.
CODE_AREA / alarm.subscribe
SEVERITY / high
PROPOSED_FIX / The doc is a probe/research log, not an operator/integrator guide. However, the gap means no other document covers these public-surface items. Add a section or cross-reference to the public alarm API: RPCs AcknowledgeAlarm, StreamAlarms, QueryActiveAlarms; config keys MxGateway:Alarms:Enabled, MxGateway:Alarms:SubscriptionExpression, MxGateway:Alarms:DefaultArea, MxGateway:Alarms:ReconcileIntervalSeconds.
DOC / LINES / (entire doc — always-on broker architecture)
CLAIM / (gap) The doc describes a model where individual client sessions subscribe to alarms. The production architecture uses a gateway-owned always-on GatewayAlarmMonitor that holds one dedicated worker session and fans the alarm feed to all clients. No client opens its own alarm subscription; StreamAlarms is session-less.
CLAIM_TYPE / gap
VERDICT / gap
EVIDENCE / GatewayAlarmMonitor.cs:1-697; IGatewayAlarmService.cs:27-63; AlarmsOptions.cs:1-48. AlarmClientDiscovery.md describes the worker alarm consumer (IPC layer) but never describes the gateway-level brokering architecture that wraps it.
CODE_AREA / alarm.subscribe
SEVERITY / high
PROPOSED_FIX / Add a section describing GatewayAlarmMonitor as the always-on broker: one gateway-owned session, periodic reconcile loop (ReconcileIntervalSeconds), StreamAsync fan-out to per-client Channel<AlarmFeedMessage>, subscriber capacity (2048 messages), fail-open restart-backoff (5s).
DOC / LINES / (entire doc — AlarmFeedMessage / snapshot_complete protocol)
CLAIM / (gap) The doc does not document the AlarmFeedMessage stream protocol: initial burst of active_alarm messages, then snapshot_complete sentinel, then transition messages for live changes.
CLAIM_TYPE / gap
VERDICT / gap
EVIDENCE / mxaccess_gateway.proto:857-868 (message definition); GatewayAlarmMonitor.StreamAsync:386-434. This is the key integrator-facing streaming contract.
CODE_AREA / alarm.subscribe
SEVERITY / high
PROPOSED_FIX / Document the StreamAlarms protocol: AlarmFeedMessage union with active_alarm, snapshot_complete, and transition fields; the invariant that the snapshot precedes the sentinel which precedes live transitions.
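The ordering invariant of the StreamAlarms feed can be sketched from the client side as a small state machine. This is an illustrative Python model under stated assumptions: messages are represented as `(kind, payload)` tuples rather than the real AlarmFeedMessage union, and the `alarm_full_reference` key on the snapshot payload is assumed for the example.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

FeedMessage = Tuple[str, Optional[Dict[str, Any]]]


def consume_feed(
    messages: Iterable[FeedMessage],
) -> Tuple[Dict[str, Dict[str, Any]], List[Dict[str, Any]]]:
    """Collect the initial snapshot, then the live transitions.

    Raises ValueError if the feed violates the documented order:
    active_alarm burst, then snapshot_complete, then transition messages.
    """
    active: Dict[str, Dict[str, Any]] = {}
    transitions: List[Dict[str, Any]] = []
    snapshot_done = False

    for kind, payload in messages:
        if kind == "active_alarm":
            if snapshot_done:
                raise ValueError("active_alarm after snapshot_complete")
            # Key name is an assumption made for this sketch.
            active[payload["alarm_full_reference"]] = payload
        elif kind == "snapshot_complete":
            if snapshot_done:
                raise ValueError("duplicate snapshot_complete")
            snapshot_done = True
        elif kind == "transition":
            if not snapshot_done:
                raise ValueError("transition before snapshot_complete")
            transitions.append(payload)
        else:
            raise ValueError(f"unknown message kind: {kind}")

    return active, transitions
```

A client built this way can treat snapshot_complete as the point where its local view is consistent and start applying transitions on top of it.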
DOC / LINES / (entire doc — reconcile mechanism)
CLAIM / (gap) The periodic reconcile loop (ReconcileIntervalSeconds, default 30s, floor 5s) that snapshots the worker's active-alarm set and broadcasts synthetic raise/clear transitions for missed alarms is not documented.
CLAIM_TYPE / gap
VERDICT / gap
EVIDENCE / GatewayAlarmMonitor.ReconcileLoopAsync:235-260; GatewayAlarmMonitor.ApplyReconcile:315-354; AlarmsOptions.ReconcileIntervalSeconds:47.
CODE_AREA / alarm.subscribe
SEVERITY / medium
PROPOSED_FIX / Document the reconcile pass: cadence, purpose (catches missed poll-and-diff transitions), synthetic transition kind (Raise/Clear), and that it does not emit Acknowledge transitions.
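The reconcile pass described above amounts to a set difference between what the gateway believes is active and what the worker snapshot reports. The following Python sketch shows that idea only; `reconcile` and its string-keyed sets are hypothetical, and the real ApplyReconcile may carry richer records.

```python
from typing import List, Set, Tuple


def reconcile(known: Set[str], snapshot: Set[str]) -> List[Tuple[str, str]]:
    """Return synthetic transitions that bring `known` in line with `snapshot`.

    Alarms present only in the snapshot were missed raises; alarms present
    only in the known set were missed clears. No Acknowledge transitions
    are produced, matching the audit's description.
    """
    raised = sorted(snapshot - known)
    cleared = sorted(known - snapshot)
    return [("Raise", ref) for ref in raised] + [("Clear", ref) for ref in cleared]
```

For example, with `known = {"Galaxy!A.T1", "Galaxy!A.T2"}` and `snapshot = {"Galaxy!A.T2", "Galaxy!A.T3"}`, the sketch yields a Raise for T3 and a Clear for T1, and nothing for T2.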
DOC / LINES / (entire doc — subscriber backpressure / drop behavior)
CLAIM / (gap) A subscriber that cannot keep up with the alarm feed is dropped with an error ("Alarm feed subscriber fell behind and was dropped; reconnect to re-snapshot"). The queue capacity is 2048. This behavior is not documented.
CLAIM_TYPE / gap
VERDICT / gap
EVIDENCE / GatewayAlarmMonitor.Broadcast:358-375; SubscriberQueueCapacity = 2048 (GatewayAlarmMonitor.cs:21).
CODE_AREA / alarm.subscribe
SEVERITY / medium
PROPOSED_FIX / Document the backpressure model: bounded 2048-message channel per subscriber; slow subscribers are completed with error and must reconnect; reconnect re-snapshots the active set.
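The drop-on-overflow behavior can be modeled with a bounded queue per subscriber. This is a hedged Python sketch, not the gateway's Channel-based implementation: the `Subscriber` class and `broadcast` function are hypothetical, while the capacity of 2048 and the error text come from the audit evidence.

```python
import queue
from typing import Any, List, Optional

SUBSCRIBER_QUEUE_CAPACITY = 2048
DROP_MESSAGE = (
    "Alarm feed subscriber fell behind and was dropped; reconnect to re-snapshot"
)


class Subscriber:
    """One client stream with a bounded, non-blocking inbox."""

    def __init__(self, capacity: int = SUBSCRIBER_QUEUE_CAPACITY) -> None:
        self.inbox: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)
        self.error: Optional[str] = None


def broadcast(subscribers: List[Subscriber], message: Any) -> None:
    """Offer `message` to every subscriber; drop any whose inbox is full."""
    for sub in list(subscribers):  # copy so removal during iteration is safe
        try:
            sub.inbox.put_nowait(message)
        except queue.Full:
            sub.error = DROP_MESSAGE
            subscribers.remove(sub)
```

The broadcaster never blocks on a slow reader in this model, so one stalled client cannot delay delivery to the others; the cost is that the dropped client must reconnect and take a fresh snapshot.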
DOC / LINES / (entire doc — alarm_full_reference parse format for ack)
CLAIM / (gap) The doc does not document the alarm_full_reference parse contract for AcknowledgeAlarm: a canonical GUID string triggers the GUID path; Provider!Group.Tag (first ! splits provider, first . splits group from tag) triggers the by-name path; anything else is rejected.
CLAIM_TYPE / gap
VERDICT / gap
EVIDENCE / GatewayAlarmMonitor.BuildAcknowledgeCommand and TryParseAlarmReference (GatewayAlarmMonitor.cs:516-610). Error message: "alarm_full_reference must be a canonical GUID or 'Provider!Group.Tag' format."
CODE_AREA / alarm.ack
SEVERITY / high
PROPOSED_FIX / Document the AcknowledgeAlarm.alarm_full_reference field's two accepted formats and how the gateway routes each.
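The two accepted formats and their routing can be sketched as follows. This is an illustrative Python model of the behavior reported for BuildAcknowledgeCommand and TryParseAlarmReference; `AckRoute` and `route_ack` are hypothetical names, and Python's `uuid.UUID` parser is assumed to be a close enough stand-in for .NET's Guid.TryParse for the purposes of the example.

```python
import uuid
from typing import NamedTuple, Optional


class AckRoute(NamedTuple):
    kind: str  # "AcknowledgeAlarm" (GUID path) or "AcknowledgeAlarmByName"
    guid: Optional[uuid.UUID] = None
    provider: Optional[str] = None
    group: Optional[str] = None
    tag: Optional[str] = None


def route_ack(alarm_full_reference: str) -> AckRoute:
    """Pick the ack command for a reference, or raise ValueError."""
    # 1. GUID-shaped references take the GUID path.
    try:
        return AckRoute("AcknowledgeAlarm", guid=uuid.UUID(alarm_full_reference))
    except ValueError:
        pass

    # 2. Otherwise expect Provider!Group.Tag: the first '!' ends the provider,
    #    the first '.' after it ends the group; the rest is the tag.
    provider, bang, rest = alarm_full_reference.partition("!")
    group, dot, tag = rest.partition(".")
    if not (bang and dot and provider and group and tag):
        raise ValueError(
            "alarm_full_reference must be a canonical GUID or "
            "'Provider!Group.Tag' format."
        )
    return AckRoute("AcknowledgeAlarmByName", provider=provider, group=group, tag=tag)
```

Note that in this model a GUID-shaped reference still selects the GUID command, which is the path the audit flags as likely to fail with E_NOTIMPL on affected AVEVA builds.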
DOC / LINES / (entire doc — AlarmConditionState on snapshot)
CLAIM / (gap) The ActiveAlarmSnapshot.current_state field uses AlarmConditionState (Active / ActiveAcked / Inactive) — the distinction between UnackRtn and AckRtn is lost in the snapshot (both collapse to Inactive). This is not documented.
CLAIM_TYPE / gap
VERDICT / gap
EVIDENCE / AlarmDispatcher.MapConditionState (AlarmDispatcher.cs:221-234): both UnackRtn and AckRtn map to AlarmConditionState.Inactive.
CODE_AREA / alarm.state
SEVERITY / medium
PROPOSED_FIX / Document the state collapse rule: the ActiveAlarmSnapshot.current_state field does not distinguish between acknowledged-cleared and unacknowledged-cleared alarms; both surface as Inactive. Consumers that need this distinction must track the transition stream.
DOC / LINES / (entire doc — transition kind table)
CLAIM / (gap) The AlarmTransitionKind enum has a Retrigger value (ALARM_TRANSITION_KIND_RETRIGGER = 4), but the doc only describes Raise / Acknowledge / Clear.
CLAIM_TYPE / gap
VERDICT / gap
EVIDENCE / mxaccess_gateway.proto:777; AlarmRecordTransitionMapper.MapTransition does not produce Retrigger — it is defined in the proto but unused by the current mapping logic (AlarmRecordTransitionMapper.cs:54-78).
CODE_AREA / alarm.state
SEVERITY / low
PROPOSED_FIX / Note that AlarmTransitionKind.Retrigger exists in the proto but is not emitted by the current worker (the *Rtn→*Alm re-trigger case maps to Raise). Flag as reserved for future use or remove from the proto if unused.