docs(alarms): document alarmmgr->subtag fallback (providers, failover, config, contract, parity)

Joseph Doherty
2026-06-13 10:43:37 -04:00
parent 27f6c9e6b7
commit 2f30f0c7c0
5 changed files with 368 additions and 0 deletions
@@ -94,6 +94,73 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
`StreamAlarms` is a server-streaming, **session-less** RPC that attaches to the gateway's central alarm feed. The handler delegates to `IGatewayAlarmService.StreamAsync`. The stream opens with one `AlarmFeedMessage` carrying an `active_alarm` per currently-active alarm (the ConditionRefresh snapshot), then a single `snapshot_complete`, then a `transition` for every subsequent raise / acknowledge / clear. It is served by the always-on `GatewayAlarmMonitor`, which owns a single gateway-managed worker session and fans out to every attached client — clients no longer open a session of their own. `alarm_filter_prefix`, when set, scopes the stream to a sub-tree.
#### Provider status on the alarm feed
`AlarmFeedMessage` has a fourth `payload` case, `provider_status`, carrying
an `AlarmProviderStatus` message:
```protobuf
message AlarmProviderStatus {
AlarmProviderMode mode = 1;
bool degraded = 2; // true whenever mode == SUBTAG
string reason = 3; // human-readable switch reason
google.protobuf.Timestamp since = 4;
}
```
The gateway emits `provider_status` once when a client first subscribes
(immediately after the initial snapshot and before the first live transition)
and again on every failover or failback. A late-joining client therefore
always learns the current provider mode without waiting for the next switch.
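
The ordering described above can be sketched from the client side. This is a minimal illustration, not the gateway's client library: each feed message is modeled as a `(case, body)` tuple standing in for the `payload` oneof, and `FeedState`, `apply_message`, and `consume` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional, Tuple

Message = Tuple[str, Dict[str, Any]]  # (payload case name, decoded body); an assumed shape


@dataclass
class FeedState:
    """Client-side view of the alarm feed, built up message by message."""
    active: List[Dict[str, Any]] = field(default_factory=list)
    transitions: List[Dict[str, Any]] = field(default_factory=list)
    snapshot_complete: bool = False
    provider_mode: Optional[str] = None  # None until the first provider_status arrives
    degraded: bool = False


def apply_message(state: FeedState, case: str, body: Dict[str, Any]) -> None:
    """Fold one feed message into the state, keyed on the payload case."""
    if case == "active_alarm":
        state.active.append(body)
    elif case == "snapshot_complete":
        state.snapshot_complete = True
    elif case == "provider_status":
        # Sent once after the snapshot and again on every failover or failback.
        state.provider_mode = body["mode"]
        state.degraded = body["degraded"]
    elif case == "transition":
        state.transitions.append(body)
    # Unknown cases are ignored so newer payload cases do not break older clients.


def consume(messages: Iterable[Message]) -> FeedState:
    state = FeedState()
    for case, body in messages:
        apply_message(state, case, body)
    return state
```

A client built this way knows the provider mode before it processes its first live transition, which is the property the emission order is meant to guarantee.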
`AlarmProviderMode` is an enum with three values:
| Value | Meaning |
|-------|---------|
| `ALARM_PROVIDER_MODE_UNSPECIFIED` (0) | Default / unset |
| `ALARM_PROVIDER_MODE_ALARMMGR` (1) | Native wnwrap alarm-manager source |
| `ALARM_PROVIDER_MODE_SUBTAG` (2) | Subtag-monitoring fallback (degraded) |
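
For clients that decode the raw wire value themselves, the table can be mirrored as below. This is a sketch only: the Python names are hypothetical, and mapping unrecognized values to `UNSPECIFIED` is an assumption made here, not documented gateway behavior.

```python
from enum import IntEnum


class AlarmProviderMode(IntEnum):
    """Mirror of the enum values listed in the table above."""
    UNSPECIFIED = 0
    ALARMMGR = 1
    SUBTAG = 2


def parse_mode(raw: int) -> AlarmProviderMode:
    """Map a wire value to the enum; unknown values fall back to UNSPECIFIED (assumed policy)."""
    try:
        return AlarmProviderMode(raw)
    except ValueError:
        return AlarmProviderMode.UNSPECIFIED


def is_degraded(mode: AlarmProviderMode) -> bool:
    """Matches the `degraded` comment in AlarmProviderStatus: true whenever mode is SUBTAG."""
    return mode is AlarmProviderMode.SUBTAG
```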
#### Degraded and source-provider fields on transitions and snapshots
`OnAlarmTransitionEvent` and `ActiveAlarmSnapshot` both carry two new fields:
- `bool degraded` (field 14) — `true` when the record came from the subtag
fallback, not the native alarmmgr.
- `AlarmProviderMode source_provider` (field 15) — which provider produced
this record (`ALARMMGR` or `SUBTAG`).
Both fields keep their proto3 default values (`false` / `UNSPECIFIED`) in
alarmmgr mode, so existing clients that do not read them continue to work unchanged.
Clients that care about provenance — for example, an OPC UA server that
applies different quality flags to degraded alarms — should inspect `degraded`
before consuming the transition.
Subtag-mode records are not at parity with native alarmmgr records. They carry
synthetic GUIDs, best-effort timestamps, and reduced field coverage. See
`docs/AlarmClientDiscovery.md` for the full fidelity table.
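
As an illustration of the provenance check suggested above, a consumer might map `degraded` to an OPC UA status code before republishing a record. This is a sketch under assumptions: the record is modeled as a plain mapping, `quality_for` is a hypothetical helper, and choosing `Uncertain` for degraded records is one possible policy rather than something the gateway prescribes.

```python
from typing import Any, Mapping

# Top-level OPC UA status code severities.
OPCUA_GOOD = 0x00000000
OPCUA_UNCERTAIN = 0x40000000


def quality_for(record: Mapping[str, Any]) -> int:
    """Pick a quality code for a transition or snapshot record.

    A missing `degraded` key is treated as False, mirroring the proto3
    default that alarmmgr-mode records carry.
    """
    return OPCUA_UNCERTAIN if record.get("degraded", False) else OPCUA_GOOD
```

Treating an absent field as `false` keeps the helper usable against records produced before fields 14 and 15 existed.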
#### Provider-mode-changed event
The worker emits `OnAlarmProviderModeChangedEvent` (family
`MX_EVENT_FAMILY_ON_ALARM_PROVIDER_MODE_CHANGED`) on each switch between
providers:
```protobuf
message OnAlarmProviderModeChangedEvent {
AlarmProviderMode mode = 1;
string reason = 2;
int32 hresult = 3; // COM HRESULT that triggered failover; 0 on failback
google.protobuf.Timestamp at = 4;
}
```
This event arrives only on the `StreamEvents` stream of the alarm monitor's
internal gateway session. `GatewayAlarmMonitor` consumes it and reflects the
new mode into the `StreamAlarms` feed's `provider_status`, the dashboard hub,
and metrics. Client sessions never receive this event directly; they observe
the switch through `provider_status`.
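
The hand-off from the worker event to the feed's `provider_status` could look roughly like this. The dataclasses mirror the two proto messages, but the field-by-field mapping (in particular copying `at` into `since`) is an assumption made for illustration, and `to_provider_status` and `format_hresult` are hypothetical helpers rather than gateway APIs.

```python
from dataclasses import dataclass
from datetime import datetime

MODE_ALARMMGR = 1  # ALARM_PROVIDER_MODE_ALARMMGR
MODE_SUBTAG = 2    # ALARM_PROVIDER_MODE_SUBTAG


@dataclass(frozen=True)
class ModeChangedEvent:
    mode: int
    reason: str
    hresult: int  # signed int32 on the wire; 0 on failback
    at: datetime


@dataclass(frozen=True)
class ProviderStatus:
    mode: int
    degraded: bool
    reason: str
    since: datetime


def format_hresult(hresult: int) -> str:
    """Render a signed int32 HRESULT in the usual unsigned hex form."""
    return f"0x{hresult & 0xFFFFFFFF:08X}"


def to_provider_status(event: ModeChangedEvent) -> ProviderStatus:
    """Derive the feed-level status from a worker mode-change event (assumed mapping)."""
    return ProviderStatus(
        mode=event.mode,
        degraded=event.mode == MODE_SUBTAG,  # degraded is true whenever mode is SUBTAG
        reason=event.reason,
        since=event.at,
    )
```

Because `hresult` is an `int32`, a COM failure code such as `E_FAIL` arrives as a negative number; masking to 32 bits before formatting gives the familiar `0x80004005` form for logs and dashboards.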
## Validation Rules
`MxAccessGrpcRequestValidator` rejects requests with `StatusCode.InvalidArgument` before any session work happens. The rules are intentionally narrow — anything that requires session state (for example, "session does not exist") is left for `ISessionManager` so the validator can stay synchronous and side-effect free.