Files
mxaccessgw/docs/audit/fragments/09-alarms.md
T

29 KiB
Raw Blame History

Cluster 09 — Alarms

Audited doc: docs/AlarmClientDiscovery.md Code base verified against:

  • src/ZB.MOM.WW.MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs
  • src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmDispatcher.cs
  • src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmCommandHandler.cs
  • src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmRecordTransitionMapper.cs
  • src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs
  • src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmStateKind.cs
  • src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAlarmTransitionEvent.cs
  • src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IMxAccessAlarmConsumer.cs
  • src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IAlarmCommandHandler.cs
  • src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs (alarm arms)
  • src/ZB.MOM.WW.MxGateway.Server/Alarms/GatewayAlarmMonitor.cs
  • src/ZB.MOM.WW.MxGateway.Server/Alarms/IGatewayAlarmService.cs
  • src/ZB.MOM.WW.MxGateway.Server/Alarms/AlarmsServiceCollectionExtensions.cs
  • src/ZB.MOM.WW.MxGateway.Server/Configuration/AlarmsOptions.cs
  • src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto

DOC / LINES / 71-74 (comment about AlarmClientConsumer.cs) CLAIM / The file src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmClientConsumer.cs exists in the repo. CLAIM_TYPE / path VERDICT / wrong EVIDENCE / find /Users/dohertj2/Desktop/mxaccessgateway/src -name "AlarmClientConsumer*" returns nothing. The file no longer exists; WnWrapAlarmConsumer.cs comments confirm it was replaced: WnWrapAlarmConsumer.cs:18-19. CODE_AREA / alarm.subscribe SEVERITY / medium PROPOSED_FIX / Update references to the obsolete AlarmClientConsumer.cs throughout the doc to note that the file was retired and replaced by WnWrapAlarmConsumer.cs.


DOC / LINES / 71-74 CLAIM / The architecture comment on AlarmClientConsumer.cs (PR A.5) describing IAlarmMgrDataProvider managed events is wrong against the deployed assembly — there is no managed event surface. CLAIM_TYPE / behavior-rule VERDICT / stale EVIDENCE / The source file it critiques (AlarmClientConsumer.cs) no longer exists in the repo. The critique is historically accurate but refers to a file that was removed during the wnwrap migration. No live code contains IAlarmMgrDataProvider. WnWrapAlarmConsumer.cs:1-575. CODE_AREA / alarm.subscribe SEVERITY / low PROPOSED_FIX / Note that the critique is a historical record; the offending file has been removed. The section remains valid as probe context but should clarify the current state.


DOC / LINES / 87-88 CLAIM / AlarmClientConsumer.AlarmRecordReceived has no production callers; RaiseAlarmRecordReceived is internal for tests and never invoked at runtime. CLAIM_TYPE / behavior-rule VERDICT / stale EVIDENCE / Neither AlarmRecordReceived nor RaiseAlarmRecordReceived appear anywhere in the current source tree (grep -rn "AlarmRecordReceived\|RaiseAlarmRecordReceived" src — zero results outside tests or binaries). The entire AlarmClientConsumer class was removed; the observation is a dead historical probe note. CODE_AREA / alarm.subscribe SEVERITY / low PROPOSED_FIX / Flag as historical only; the code path no longer exists.


DOC / LINES / 492 CLAIM / "PR A.5's Subscribe / AcknowledgeByGuid / SnapshotActiveAlarms are correct — they're pull-style and don't depend on the notification mechanism." CLAIM_TYPE / behavior-rule VERDICT / stale EVIDENCE / The method names in the current interface are IMxAccessAlarmConsumer.AcknowledgeByGuid and SnapshotActiveAlarms (IMxAccessAlarmConsumer.cs:64,104), so the names are accurate. However, this statement refers to PR A.5's AlarmClientConsumer, which no longer exists. The claim implicitly endorses AlarmClientConsumer code that has been replaced by WnWrapAlarmConsumer. The successor also exposes AcknowledgeByGuid but routes it through AlarmAckByGUID on wwAlarmConsumerClass. CODE_AREA / alarm.ack SEVERITY / low PROPOSED_FIX / Note that PR A.5 was superseded; the current production path is WnWrapAlarmConsumer.


DOC / LINES / 604-605 CLAIM / After an alarm return-to-normal (UNACK_RTN), wwAlarmConsumerClass.AlarmAckByGUID is "the method to call" for acknowledgement. CLAIM_TYPE / behavior-rule VERDICT / wrong EVIDENCE / The doc itself contradicts this eleven sections later ("Section 4. AlarmAckByGUID is not implemented", lines 750-756): AlarmAckByGUID(VBGUID, …) throws NotImplementedException (COM E_NOTIMPL) on wwAlarmConsumerClass. The doc at line 604 presents it as the correct ack method before the discovery in the live-smoke section, creating a contradiction within the document that integrators reading top-to-bottom will encounter. CODE_AREA / alarm.ack SEVERITY / high PROPOSED_FIX / Add a forward-reference warning at line 604 ("Note: see 'Live smoke-test discoveries — section 4' below; AlarmAckByGUID is E_NOTIMPL on wnwrap and must not be called directly; use AlarmAckByName via the ack-only consumer.") or reorder the section.


DOC / LINES / 750-756 CLAIM / AlarmAckByGUID(VBGUID, …) throws NotImplementedException (E_NOTIMPL) on wwAlarmConsumerClass, so all acks must go through AlarmAckByName. CLAIM_TYPE / behavior-rule VERDICT / accurate EVIDENCE / WnWrapAlarmConsumer.cs:215-239 provides AcknowledgeByGuid which calls com.AlarmAckByGUID directly (the COM interop). The method is present in the consumer and called from AlarmCommandHandler.Acknowledge (AlarmCommandHandler.cs:141-158) and AlarmDispatcher.Acknowledge (AlarmDispatcher.cs:87-103). The code path is plumbed through and compiles. Whether it still throws E_NOTIMPL at runtime on the deployed AVEVA build is a runtime-only observable — the doc's claim was empirically confirmed 2026-05-01. CODE_AREA / alarm.ack SEVERITY / medium PROPOSED_FIX / Flag: the code now calls AlarmAckByGUID without a try/catch for E_NOTIMPL; document that the GUID path will surface a COMException at runtime on affected AVEVA builds and that the gateway routes canonical Provider!Group.Tag references through AcknowledgeAlarmByName to avoid this.


DOC / LINES / 758-762 CLAIM / "The proto AcknowledgeAlarmCommand (GUID-based) and MxAccessCommandExecutor.ExecuteAcknowledgeAlarm switch arm remain in the codebase for forward-compat, but the gateway-side WorkerAlarmRpcDispatcher.AcknowledgeAsync now always routes through AcknowledgeAlarmByName when the public RPC supplies a recognizable Provider!Group.Tag reference." CLAIM_TYPE / cross-ref VERDICT / wrong EVIDENCE / (a) WorkerAlarmRpcDispatcher does not exist in the source tree. The class that routes acknowledge requests is GatewayAlarmMonitor.AcknowledgeAsync + BuildAcknowledgeCommand (GatewayAlarmMonitor.cs:437,516). (b) The gateway does NOT always route through AcknowledgeAlarmByName: BuildAcknowledgeCommand first tries Guid.TryParse; if the alarm_full_reference is a canonical GUID it still dispatches MxCommandKind.AcknowledgeAlarm (the GUID path) (GatewayAlarmMonitor.cs:528-543). Only when the reference is not a GUID does it fall through to AcknowledgeAlarmByName (GatewayAlarmMonitor.cs:545-563). CODE_AREA / alarm.ack SEVERITY / high PROPOSED_FIX / (1) Replace WorkerAlarmRpcDispatcher with the actual class name GatewayAlarmMonitor. (2) Correct the routing description: GUID-shaped references still go through AcknowledgeAlarmCommand (GUID path); Provider!Group.Tag references go through AcknowledgeAlarmByNameCommand. The claim that it "always routes through AcknowledgeAlarmByName" is false.


DOC / LINES / 636-639 (A.2 outline step 2) CLAIM / Production WnWrapAlarmConsumer polls GetXmlCurrentAlarms2(maxAlmCnt, out xml) on a timer (500ms1s cadence). CLAIM_TYPE / behavior-rule VERDICT / wrong EVIDENCE / WnWrapAlarmConsumer.cs:38-43 explicitly states "the consumer owns no internal timer." PollOnce() is driven externally by StaRuntime.InvokeAsync (WnWrapAlarmConsumer.cs:39, AlarmCommandHandler.cs:29-33). The 500ms1s timer cadence mentioned in the doc was a design proposal; the implementation delegates all poll scheduling to the caller (STA). The doc's step 2 reads as if the consumer self-schedules. CODE_AREA / alarm.subscribe SEVERITY / medium PROPOSED_FIX / Correct to: "Poll GetXmlCurrentAlarms2 via PollOnce() called externally by the worker's STA through StaRuntime.InvokeAsync; the consumer owns no internal timer."


DOC / LINES / 641-643 (A.2 outline step 2) CLAIM / "AlarmAckByGUID(VBGUID, comment, oprName, node, domain, fullName) for client-driven acknowledgements (matches PR A.5's AlarmAckCommand payload)." CLAIM_TYPE / rpc/proto VERDICT / wrong EVIDENCE / The proto message is named AcknowledgeAlarmCommand (not AlarmAckCommand): mxaccess_gateway.proto:337. The consumer also exposes AcknowledgeByGuid (not AlarmAckByGUID) as its interface method (IMxAccessAlarmConsumer.cs:64). The doc uses the COM method name where it should use the proto/interface name, and uses the wrong proto message name. CODE_AREA / alarm.ack SEVERITY / medium PROPOSED_FIX / Replace "PR A.5's AlarmAckCommand payload" with "the proto's AcknowledgeAlarmCommand message" (mxaccess_gateway.proto:337).


DOC / LINES / 644-647 (A.2 outline step 3) CLAIM / STATE mapping: UNACK_ALMin_alarm=true, acked=false; UNACK_RTNin_alarm=false, acked=false; ACK_ALMin_alarm=true, acked=true; ACK_RTNin_alarm=false, acked=true. CLAIM_TYPE / term VERDICT / wrong EVIDENCE / The production proto uses AlarmConditionState (Active / ActiveAcked / Inactive), not boolean in_alarm/acked fields. AlarmDispatcher.MapConditionState (AlarmDispatcher.cs:221-234): UnackAlm→Active, AckAlm→ActiveAcked, UnackRtn→Inactive, AckRtn→Inactive. Both Rtn states collapse to Inactive — the acked distinction on a cleared alarm is not surfaced. The doc's proposed boolean decomposition was a design proposal that was not adopted; the final proto shape uses the enum. CODE_AREA / alarm.state SEVERITY / high PROPOSED_FIX / Replace the boolean mapping table with the actual AlarmConditionState enum mapping used in AlarmDispatcher.MapConditionState. Document that UnackRtn and AckRtn both map to Inactive (ack-vs-unack on a cleared alarm is not exposed through the proto).


DOC / LINES / 648-649 (A.2 outline step 3) CLAIM / "GUIDcondition_id (canonicalize the no-dashes hex to a UUID string)." CLAIM_TYPE / term VERDICT / wrong EVIDENCE / The production code stores the GUID as MxAlarmSnapshotRecord.AlarmGuid (a System.Guid) and the proto carries it inside OnAlarmTransitionEvent only implicitly (there is no condition_id field in the proto). The alarm_full_reference field is used as the stable identifier for condition correlation, not a condition_id. mxaccess_gateway.proto:720-723, OnAlarmTransitionEvent.alarm_full_reference. The field name condition_id does not exist in the proto. CODE_AREA / alarm.state SEVERITY / medium PROPOSED_FIX / Replace condition_id with the actual stable identifier: alarm_full_reference (OnAlarmTransitionEvent.alarm_full_reference). The GUID is used internally by WnWrapAlarmConsumer as a snapshot key but is not exposed as a proto field.


DOC / LINES / 651-654 (A.2 outline step 3 — timestamp) CLAIM / "DATE + TIME + GMTOFFSET + DSTADJUST → reassemble UTC timestamp; matches the worker's existing Timestamp wire format." CLAIM_TYPE / behavior-rule VERDICT / accurate EVIDENCE / AlarmRecordTransitionMapper.ParseTransitionTimestampUtc (AlarmRecordTransitionMapper.cs:116-188) parses all four fields and computes UTC. The proto uses google.protobuf.Timestamp (mxaccess_gateway.proto:747). Wire-up matches. CODE_AREA / alarm.state SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 656-657 (A.2 outline step 3) CLAIM / "PRIORITY → severity (already 1-1000-ish range)." CLAIM_TYPE / behavior-rule VERDICT / accurate EVIDENCE / WnWrapAlarmConsumer.ParseSnapshotXml reads PRIORITY as int (WnWrapAlarmConsumer.cs:433), stored as MxAlarmSnapshotRecord.Priority. AlarmDispatcher.OnTransition passes it as severity: record.Priority (AlarmDispatcher.cs:187). OnAlarmTransitionEvent.severity is int32 in the proto (mxaccess_gateway.proto). The 1-1000 range is consistent with AVEVA's alarm priority range. CODE_AREA / alarm.state SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 658-659 (A.2 outline step 3) CLAIM / "TAGNAME → reference; PROVIDER_NAME + GROUP for scope metadata." CLAIM_TYPE / behavior-rule VERDICT / accurate EVIDENCE / AlarmDispatcher.OnTransition calls AlarmRecordTransitionMapper.ComposeFullReference(record.ProviderName, record.Group, record.TagName) and passes the result as alarmFullReference (AlarmDispatcher.cs:180-183). ComposeFullReference formats it as Provider!Group.TagName (AlarmRecordTransitionMapper.cs:90-102). TAGNAME alone is passed as sourceObjectReference (AlarmDispatcher.cs:184). CODE_AREA / alarm.state SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 672-676 (A.2 outline step 5) CLAIM / "PR A.5's snapshot/ack contract tests can stay — they don't touch the underlying COM API." CLAIM_TYPE / cross-ref VERDICT / stale EVIDENCE / PR A.5's AlarmClientConsumer was retired; there is no class by that name. The test files for alarm command handling now cover AlarmCommandHandler, AlarmDispatcher, and WnWrapAlarmConsumerXmlTests — none named as "PR A.5 tests." The statement implies a test corpus that doesn't exist under the described label. CODE_AREA / alarm.subscribe SEVERITY / low PROPOSED_FIX / Remove or update the PR label; reference actual test files: AlarmCommandHandlerTests.cs, AlarmDispatcherTests.cs, WnWrapAlarmConsumerXmlTests.cs.


DOC / LINES / 673-675 (settled API ordering section) CLAIM / "InitializeConsumer first, then RegisterConsumer — both on aaAlarmManagedClient.AlarmClient and wwAlarmConsumerClass." CLAIM_TYPE / behavior-rule VERDICT / accurate EVIDENCE / WnWrapAlarmConsumer.Subscribe calls IwwAlarmConsumer_InitializeConsumer before IwwAlarmConsumer_RegisterConsumer (WnWrapAlarmConsumer.cs:117-137). Same ordering for ackClient (WnWrapAlarmConsumer.cs:188-208). CODE_AREA / alarm.subscribe SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 676-682 (settled API section) CLAIM / "aaAlarmManagedClient.AlarmClient.RegisterConsumer is 5-arg (includes bRetainHiddenAlarms); wwAlarmConsumerClass.RegisterConsumer is 4-arg (no bRetainHiddenAlarms)." CLAIM_TYPE / behavior-rule VERDICT / accurate EVIDENCE / WnWrapAlarmConsumer.Subscribe calls IwwAlarmConsumer_RegisterConsumer with 4 args: hWnd, szProductName, szApplicationName, szVersion (WnWrapAlarmConsumer.cs:128-132). Consistent with the doc. CODE_AREA / alarm.subscribe SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 683-685 (settled API section) CLAIM / "Subscription expression format: \\<machine>\Galaxy!<area> (literal Galaxy provider) for both libraries." CLAIM_TYPE / path VERDICT / accurate EVIDENCE / WnWrapAlarmConsumer.ComposeXmlAlarmQuery parses this format and treats Galaxy as the provider (WnWrapAlarmConsumer.cs:489-530). IMxAccessAlarmConsumer.Subscribe doc comment confirms: "Subscription string follows AVEVA's canonical format: \\<node>\Galaxy!<area>. The literal 'Galaxy' is the provider name (regardless of the configured Galaxy database name)." (IMxAccessAlarmConsumer.cs:44-46). AlarmsOptions.cs:16-17 also confirms. CODE_AREA / alarm.subscribe SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 684-685 (settled API section) CLAIM / "Native ack: AlarmAckByGUID(VBGUID guid, comment, oprName, node, domain, fullName) on the v2 surface." CLAIM_TYPE / behavior-rule VERDICT / accurate EVIDENCE / WnWrapAlarmConsumer.AcknowledgeByGuid calls com.AlarmAckByGUID with exactly those args (WnWrapAlarmConsumer.cs:232-238). CODE_AREA / alarm.ack SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 695-699 (live smoke quirk 1) CLAIM / "Without SetXmlAlarmQuery, the first GetXmlCurrentAlarms2 call fails with E_FAIL (HRESULT 0x80004005)." CLAIM_TYPE / behavior-rule VERDICT / accurate EVIDENCE / WnWrapAlarmConsumer.Subscribe calls SetXmlAlarmQuery and wraps it with a COMException guard that would surface as InvalidOperationException with the E_FAIL message (WnWrapAlarmConsumer.cs:156-182). The call is mandatory per production code structure. CODE_AREA / alarm.subscribe SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 719-733 (live smoke quirk 2) CLAIM / "Two consumers required: read-side consumer (with SetXmlAlarmQuery) and ack-only consumer (without SetXmlAlarmQuery). All AcknowledgeByName calls dispatch through the ack-only instance." CLAIM_TYPE / behavior-rule VERDICT / accurate EVIDENCE / WnWrapAlarmConsumer.Subscribe provisions ackClient = new wwAlarmConsumerClass() with full lifecycle but no SetXmlAlarmQuery (WnWrapAlarmConsumer.cs:184-210). AcknowledgeByName uses ackClient (WnWrapAlarmConsumer.cs:256-278). AcknowledgeByGuid uses client (read-side) (WnWrapAlarmConsumer.cs:224-238). CODE_AREA / alarm.ack SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 736-748 (live smoke quirk 3) CLAIM / "The v2 8-arg AlarmAckByName returns -55 on this AVEVA build. The v1 6-arg AlarmAckByName works. Production WnWrapAlarmConsumer.AcknowledgeByName calls the 6-arg overload. Operator domain and full-name fields are accepted by the proto but not propagated to AVEVA (discarded)." CLAIM_TYPE / behavior-rule VERDICT / accurate EVIDENCE / WnWrapAlarmConsumer.AcknowledgeByName calls com.AlarmAckByName (6-arg) and explicitly discards ackOperatorDomain and ackOperatorFullName with _ = ... (WnWrapAlarmConsumer.cs:268-278). The proto AcknowledgeAlarmByNameCommand retains operator_domain and operator_full_name fields (mxaccess_gateway.proto:359-373). CODE_AREA / alarm.ack SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 750-756 (live smoke quirk 4) CLAIM / "AlarmAckByGUID is not implemented on wwAlarmConsumerClass; it throws NotImplementedException / E_NOTIMPL. The reference→GUID lookup is not viable; all acks must go through AlarmAckByName." CLAIM_TYPE / behavior-rule VERDICT / stale EVIDENCE / The production code at WnWrapAlarmConsumer.AcknowledgeByGuid still calls com.AlarmAckByGUID directly without a guard for E_NOTIMPL (WnWrapAlarmConsumer.cs:215-239). The gateway's BuildAcknowledgeCommand still dispatches MxCommandKind.AcknowledgeAlarm (GUID path) when alarm_full_reference parses as a GUID (GatewayAlarmMonitor.cs:528-543). The doc says all acks must go through AcknowledgeByName, but the code still routes GUID-shaped references through AlarmAckByGUID. The E_NOTIMPL runtime behavior is unguarded. CODE_AREA / alarm.ack SEVERITY / high PROPOSED_FIX / Either (a) add a COMException/NotImplementedException guard around AlarmAckByGUID in WnWrapAlarmConsumer.AcknowledgeByGuid that falls back to AcknowledgeByName, or (b) make the gateway never dispatch the GUID arm. Document whichever approach is taken. The current state silently sends a doomed IPC command.


DOC / LINES / 761-762 CLAIM / "WorkerAlarmRpcDispatcher.AcknowledgeAsync now always routes through AcknowledgeAlarmByName when the public RPC supplies a recognizable Provider!Group.Tag reference." CLAIM_TYPE / cross-ref VERDICT / wrong EVIDENCE / (a) No class named WorkerAlarmRpcDispatcher exists in the source tree. The gateway-side routing is in GatewayAlarmMonitor.BuildAcknowledgeCommand (GatewayAlarmMonitor.cs:516). (b) The routing is conditional: GUID-shaped alarm_full_referenceAcknowledgeAlarmCommand (GUID path); Provider!Group.TagAcknowledgeAlarmByNameCommand. The claim that the routing "always" goes through AcknowledgeAlarmByName is incorrect. CODE_AREA / alarm.ack SEVERITY / high PROPOSED_FIX / Replace the entire sentence. The correct description: "The gateway's GatewayAlarmMonitor.BuildAcknowledgeCommand (GatewayAlarmMonitor.cs:516) dispatches MxCommandKind.AcknowledgeAlarm for GUID-shaped references and MxCommandKind.AcknowledgeAlarmByName for Provider!Group.Tag references."


DOC / LINES / 765-773 (STA quirk 5) CLAIM / "The consumer's internal Timer fires on threadpool threads and would block on cross-apartment marshaling unless the host STA pumps Win32 messages. The smoke test sidesteps this by setting pollIntervalMilliseconds=0 (Timer disabled) and driving PollOnce manually." CLAIM_TYPE / behavior-rule VERDICT / stale EVIDENCE / The production WnWrapAlarmConsumer has no internal Timer at all — the design was revised so PollOnce() is always external (WnWrapAlarmConsumer.cs:38-43: "the consumer owns no internal timer"). There is no pollIntervalMilliseconds constructor parameter (WnWrapAlarmConsumer.cs:69-87). The constructor takes only wwAlarmConsumerClass client and int maxAlarmsPerFetch. The smoke test mention of pollIntervalMilliseconds=0 refers to a superseded design. CODE_AREA / alarm.subscribe SEVERITY / medium PROPOSED_FIX / Update to reflect the final design: WnWrapAlarmConsumer has no internal timer; PollOnce() is always called externally by the STA. Remove the pollIntervalMilliseconds=0 test-workaround reference.


DOC / LINES / 599-601 (XML STATE enum section) CLAIM / "STATE enum values observed: UNACK_RTN (alarm returned to normal, unacknowledged) and UNACK_ALM (alarm active and unacknowledged). Other states (ACK_RTN, ACK_ALM) would appear when an ack is performed." CLAIM_TYPE / term VERDICT / accurate EVIDENCE / MxAlarmStateKind.cs:1-17 defines all four values. AlarmRecordTransitionMapper.ParseStateKind handles all four (AlarmRecordTransitionMapper.cs:27-38). CODE_AREA / alarm.state SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / 628-630 (reference format in smoke capture) CLAIM / Reference format in the capture: ref='Galaxy!TestArea.TestMachine_001.TestAlarm001' — the alarm_full_reference is composed as ProviderName!Group.TagName. CLAIM_TYPE / term VERDICT / accurate EVIDENCE / AlarmRecordTransitionMapper.ComposeFullReference formats as {provider}!{group}.{name} (AlarmRecordTransitionMapper.cs:90-102). The example matches this pattern exactly. CODE_AREA / alarm.state SEVERITY / low PROPOSED_FIX / flag only


DOC / LINES / (entire doc — RPC names) CLAIM / The document mentions IPC commands SubscribeAlarms, AcknowledgeByGuid, SnapshotActiveAlarms, QueryActiveAlarms but never names the public gRPC RPCs — AcknowledgeAlarm, StreamAlarms, QueryActiveAlarms — or the config keys governing the always-on monitor (MxGateway:Alarms:Enabled, MxGateway:Alarms:SubscriptionExpression, MxGateway:Alarms:DefaultArea, MxGateway:Alarms:ReconcileIntervalSeconds). CLAIM_TYPE / gap VERDICT / gap EVIDENCE / mxaccess_gateway.proto:22-37 (RPCs); AlarmsOptions.cs:21-47 (config keys); GatewayAlarmMonitor.cs:17-51 (always-on broker). None documented in AlarmClientDiscovery.md. CODE_AREA / alarm.subscribe SEVERITY / high PROPOSED_FIX / The doc is a probe/research log, not an operator/integrator guide. However, the gap means no other document covers these public-surface items. Add a section or cross-reference to the public alarm API: RPCs AcknowledgeAlarm, StreamAlarms, QueryActiveAlarms; config keys MxGateway:Alarms:Enabled, MxGateway:Alarms:SubscriptionExpression, MxGateway:Alarms:DefaultArea, MxGateway:Alarms:ReconcileIntervalSeconds.


DOC / LINES / (entire doc — always-on broker architecture) CLAIM / (gap) The doc describes a model where individual client sessions subscribe to alarms. The production architecture uses a gateway-owned always-on GatewayAlarmMonitor that holds one dedicated worker session and fans the alarm feed to all clients. No client opens its own alarm subscription; StreamAlarms is session-less. CLAIM_TYPE / gap VERDICT / gap EVIDENCE / GatewayAlarmMonitor.cs:1-697; IGatewayAlarmService.cs:27-63; AlarmsOptions.cs:1-48. AlarmClientDiscovery.md describes the worker alarm consumer (IPC layer) but never describes the gateway-level brokering architecture that wraps it. CODE_AREA / alarm.subscribe SEVERITY / high PROPOSED_FIX / Add a section describing GatewayAlarmMonitor as the always-on broker: one gateway-owned session, periodic reconcile loop (ReconcileIntervalSeconds), StreamAsync fan-out to per-client Channel<AlarmFeedMessage>, subscriber capacity (2048 messages), fail-open restart-backoff (5s).


DOC / LINES / (entire doc — AlarmFeedMessage / snapshot_complete protocol) CLAIM / (gap) The doc does not document the AlarmFeedMessage stream protocol: initial burst of active_alarm messages, then snapshot_complete sentinel, then transition messages for live changes. CLAIM_TYPE / gap VERDICT / gap EVIDENCE / mxaccess_gateway.proto:857-868 (message definition); GatewayAlarmMonitor.StreamAsync:386-434. This is the key integrator-facing streaming contract. CODE_AREA / alarm.subscribe SEVERITY / high PROPOSED_FIX / Document the StreamAlarms protocol: AlarmFeedMessage union with active_alarm, snapshot_complete, and transition fields; the invariant that the snapshot precedes the sentinel which precedes live transitions.


DOC / LINES / (entire doc — reconcile mechanism) CLAIM / (gap) The periodic reconcile loop (ReconcileIntervalSeconds, default 30s, floor 5s) that snapshots the worker's active-alarm set and broadcasts synthetic raise/clear transitions for missed alarms is not documented. CLAIM_TYPE / gap VERDICT / gap EVIDENCE / GatewayAlarmMonitor.ReconcileLoopAsync:235-260; GatewayAlarmMonitor.ApplyReconcile:315-354; AlarmsOptions.ReconcileIntervalSeconds:47. CODE_AREA / alarm.subscribe SEVERITY / medium PROPOSED_FIX / Document the reconcile pass: cadence, purpose (catches missed poll-and-diff transitions), synthetic transition kind (Raise/Clear), and that it does not emit Acknowledge transitions.


DOC / LINES / (entire doc — subscriber backpressure / drop behavior) CLAIM / (gap) A subscriber that cannot keep up with the alarm feed is dropped with an error ("Alarm feed subscriber fell behind and was dropped; reconnect to re-snapshot"). The queue capacity is 2048. This behavior is not documented. CLAIM_TYPE / gap VERDICT / gap EVIDENCE / GatewayAlarmMonitor.Broadcast:358-375; SubscriberQueueCapacity = 2048 (GatewayAlarmMonitor.cs:21). CODE_AREA / alarm.subscribe SEVERITY / medium PROPOSED_FIX / Document the backpressure model: bounded 2048-message channel per subscriber; slow subscribers are completed with error and must reconnect; reconnect re-snapshots the active set.


DOC / LINES / (entire doc — alarm_full_reference parse format for ack) CLAIM / (gap) The doc does not document the alarm_full_reference parse contract for AcknowledgeAlarm: a canonical GUID string triggers the GUID path; Provider!Group.Tag (first ! splits provider, first . splits group from tag) triggers the by-name path; anything else is rejected. CLAIM_TYPE / gap VERDICT / gap EVIDENCE / GatewayAlarmMonitor.BuildAcknowledgeCommand and TryParseAlarmReference (GatewayAlarmMonitor.cs:516-610). Error message: "alarm_full_reference must be a canonical GUID or 'Provider!Group.Tag' format." CODE_AREA / alarm.ack SEVERITY / high PROPOSED_FIX / Document the AcknowledgeAlarm.alarm_full_reference field's two accepted formats and how the gateway routes each.


DOC / LINES / (entire doc — AlarmConditionState on snapshot) CLAIM / (gap) The ActiveAlarmSnapshot.current_state field uses AlarmConditionState (Active / ActiveAcked / Inactive) — the distinction between UnackRtn and AckRtn is lost in the snapshot (both collapse to Inactive). This is not documented. CLAIM_TYPE / gap VERDICT / gap EVIDENCE / AlarmDispatcher.MapConditionState (AlarmDispatcher.cs:221-234): both UnackRtn and AckRtn map to AlarmConditionState.Inactive. CODE_AREA / alarm.state SEVERITY / medium PROPOSED_FIX / Document the state collapse rule: the ActiveAlarmSnapshot.current_state field does not distinguish between acknowledged-cleared and unacknowledged-cleared alarms; both surface as Inactive. Consumers that need this distinction must track the transition stream.


DOC / LINES / (entire doc — transition kind table) CLAIM / (gap) The AlarmTransitionKind enum has a Retrigger value (ALARM_TRANSITION_KIND_RETRIGGER = 4), but the doc only describes Raise / Acknowledge / Clear. CLAIM_TYPE / gap VERDICT / gap EVIDENCE / mxaccess_gateway.proto:777; AlarmRecordTransitionMapper.MapTransition does not produce Retrigger — it is defined in the proto but unused by the current mapping logic (AlarmRecordTransitionMapper.cs:54-78). CODE_AREA / alarm.state SEVERITY / low PROPOSED_FIX / Note that AlarmTransitionKind.Retrigger exists in the proto but is not emitted by the current worker (the *Rtn→*Alm re-trigger case maps to Raise). Flag as reserved for future use or remove from the proto if unused.