docs(alarms): document alarmmgr->subtag fallback (providers, failover, config, contract, parity)
@@ -790,3 +790,127 @@ Post-ack transition: kind=Clear …
10s cadence held throughout; full proto fields populated correctly;
ack registered server-side without errors.

## Subtag-monitoring fallback provider

When the wnwrap alarm-manager source fails, the gateway worker switches to
`SubtagAlarmConsumer`, a synthetic alarm source that advises each alarm
attribute's subtags via the existing MXAccess `AddItem`/`Advise` pipeline and
derives alarm transitions from the resulting value-change stream. This is a
non-parity, degraded-mode source; every transition and snapshot it produces
carries `degraded = true`.

### Watch-list discovery

`GatewayAlarmMonitor` resolves the subtag watch-list at subscribe time by
calling `IAlarmWatchListResolver.GetAlarmAttributesAsync`. The resolver merges:

1. Galaxy Repository SQL (`GetAlarmAttributesAsync`): objects that have alarm
   extensions in the configured area.
2. Config overrides: `IncludeAttributes` adds explicit entries;
   `ExcludeAttributes` removes Repository-derived ones. `IncludeAttributes`
   takes effect even when `UseGalaxyRepository` is `false`, in which case it
   is the entire watch-list.

The resolved list is a set of `AlarmSubtagTarget` messages sent to the worker
inside `SubscribeAlarmsCommand.watch_list`. Each target carries the composed
MXAccess item addresses for the `.active`, `.acked`, ack-comment, and priority
subtags. The gateway re-runs discovery on its reconcile cadence and pushes an
updated watch-list when the model changes.
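
The merge rules above can be sketched as follows. This is a minimal illustration, not the resolver's implementation: `SubtagTarget`, `resolve_watch_list`, and `compose_target` are hypothetical names, the `"<attribute>.<subtag>"` address form is an assumption, and letting `IncludeAttributes` win over `ExcludeAttributes` follows from exclusions applying only to Repository-derived entries.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SubtagTarget:
    """Hypothetical stand-in for the AlarmSubtagTarget proto message."""
    alarm_full_reference: str
    active_address: str
    acked_address: str
    ack_comment_address: str  # "" when Subtags:AckComment is not configured
    priority_address: str


def resolve_watch_list(
    repository_attrs: Iterable[str],
    include_attrs: Iterable[str],
    exclude_attrs: Iterable[str],
    use_galaxy_repository: bool,
) -> List[str]:
    """Merge Repository-derived alarm attributes with config overrides."""
    excluded = set(exclude_attrs)
    merged: List[str] = []
    if use_galaxy_repository:
        # ExcludeAttributes only filters Repository-derived entries.
        for attr in repository_attrs:
            if attr not in excluded and attr not in merged:
                merged.append(attr)
    # IncludeAttributes always applies, even without the Repository.
    for attr in include_attrs:
        if attr not in merged:
            merged.append(attr)
    return merged


def compose_target(attr: str, active: str = "active", acked: str = "acked",
                   ack_comment: str = "", priority: str = "priority") -> SubtagTarget:
    """Build item addresses, assuming an "<attribute>.<subtag>" form."""
    def address(subtag: str) -> str:
        # An empty subtag name (e.g. unconfigured AckComment) yields no address.
        return f"{attr}.{subtag}" if subtag else ""
    return SubtagTarget(attr, address(active), address(acked),
                        address(ack_comment), address(priority))
```

The keyword defaults mirror the documented `Subtags` defaults, so an unconfigured `AckComment` produces an empty ack-comment address.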
|

### Subtag advise and `LmxSubtagAlarmSource`

`LmxSubtagAlarmSource` (implements `ISubtagAlarmSource`) owns a separate
`LMXProxyServerClass` instance on the worker STA; it does not share the
session's main MXAccess object. For each watch-list target it calls
`AddItem`/`Advise` on the configured subtag addresses. When a subtag value
changes, it raises `ValueChanged` on the STA, and `SubtagAlarmConsumer`
forwards the change to `SubtagAlarmStateMachine`.

`PollOnce()` on the subtag consumer is a no-op: the path is event-driven
through `Advise`, not poll-driven.

### Synthesis rules

`SubtagAlarmStateMachine` tracks `(active, acked)` per watch-list entry and
emits `MxAlarmTransitionEvent` records on change:

| Subtag change | Emitted transition | Notes |
|---|---|---|
| `active` false → true | Raise (`UNACK_ALM`) | `original_raise_timestamp` = first observed active time for this episode |
| `acked` false → true, while `active` | Acknowledge (`ACK_ALM`) | `AckedDuringEpisode` latch set |
| `active` true → false | Clear | `ACK_RTN` if `AckedDuringEpisode` is set, else `UNACK_RTN` |
| `acked` true → false, while `active` | (none) | Latch is NOT cleared; the episode retains its acknowledged status at clear |

The `AckedDuringEpisode` latch addresses out-of-order subtag delivery.
MXAccess does not guarantee that the `active = false` update arrives before
the `acked = false` update; if the acked reset is delivered first, the raw
`(active, acked)` pair at clear time would make an acknowledged episode look
unacknowledged. The latch ensures a clear emits `ACK_RTN` whenever the alarm
was acknowledged at any point during the active episode.

`SnapshotActive()` returns one `MxAlarmSnapshotRecord` per currently-active
alarm. State mapping:

- `active && !acked` → `UNACK_ALM`
- `active && acked` → `ACK_ALM`
- `!active` → not included in the snapshot
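
The transition table, the latch, and the snapshot mapping can be modeled together in a few lines. This is a hedged sketch of the documented rules, not `SubtagAlarmStateMachine` itself: the class and field names are hypothetical, transitions are returned as `(kind, state)` tuples instead of `MxAlarmTransitionEvent` records, and timestamps are plain floats.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class EntryState:
    active: bool = False
    acked: bool = False
    acked_during_episode: bool = False  # the AckedDuringEpisode latch
    raise_time: Optional[float] = None  # first observed active time


class SubtagStateMachineSketch:
    """Hypothetical model of the documented rules, keyed by alarm reference."""

    def __init__(self) -> None:
        self._entries: Dict[str, EntryState] = {}

    def on_change(self, ref: str, subtag: str, value: bool,
                  timestamp: float) -> Optional[Tuple[str, str]]:
        e = self._entries.setdefault(ref, EntryState())
        if subtag == "active":
            if value and not e.active:
                # New episode: record the raise time and reset the latch.
                e.active, e.raise_time, e.acked_during_episode = True, timestamp, False
                return ("Raise", "UNACK_ALM")
            if not value and e.active:
                e.active = False
                return ("Clear", "ACK_RTN" if e.acked_during_episode else "UNACK_RTN")
            return None
        if subtag == "acked":
            was_acked, e.acked = e.acked, value
            if value and not was_acked and e.active:
                e.acked_during_episode = True
                return ("Acknowledge", "ACK_ALM")
            # acked true -> false emits nothing and leaves the latch set.
            return None
        return None

    def snapshot_active(self) -> List[Tuple[str, str]]:
        """One (reference, state) pair per active alarm; inactive ones are omitted."""
        return [(ref, "ACK_ALM" if e.acked else "UNACK_ALM")
                for ref, e in sorted(self._entries.items()) if e.active]
```

Feeding it `acked = false` before `active = false` still yields `("Clear", "ACK_RTN")`, which is the out-of-order case the latch exists for.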
|

### Synthetic GUID

The alarmmgr provider supplies a native GUID per alarm record. The subtag
provider has no native GUID, so `SubtagAlarmConsumer` derives a deterministic
GUID by hashing `alarm_full_reference` (via `SyntheticAlarmGuid.ForReference`).
The same reference always produces the same GUID within a session, so
GUID-based ack routing resolves correctly. The synthetic GUID is not an
AVEVA-native GUID: it will not match the GUID the alarm manager assigns to the
same alarm, so `alarm_guid` cannot be used to correlate records across a
provider switch. It is not an
AVEVA-internal GUID.
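
A deterministic, hash-derived GUID can be sketched with Python's name-based `uuid.uuid5`, which hashes a namespace UUID plus a name with SHA-1. This is a swapped-in technique for illustration only: the actual inputs and hash used by `SyntheticAlarmGuid.ForReference` are not specified here, and `ALARM_GUID_NAMESPACE` is a hypothetical constant.

```python
import uuid

# Hypothetical namespace UUID; the real SyntheticAlarmGuid.ForReference
# hashing inputs are not specified in this document.
ALARM_GUID_NAMESPACE = uuid.UUID("3f6b8c2e-9a41-4d7e-8b25-0c1d2e3f4a5b")


def synthetic_alarm_guid(alarm_full_reference: str) -> uuid.UUID:
    """Deterministic, name-based GUID: uuid5 hashes namespace + name with SHA-1."""
    return uuid.uuid5(ALARM_GUID_NAMESPACE, alarm_full_reference)
```

Because the result depends only on the reference string, repeated calls agree, which is the property GUID-based ack routing relies on.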
|

### Acknowledge in subtag mode

`AlarmDispatcher` routes ack calls by the active provider mode:

- **Alarm-manager mode:** `AlarmAckByName` on `wwAlarmConsumerClass` (unchanged).
- **Subtag mode:** `SubtagAlarmConsumer.AcknowledgeByName` resolves the
  watch-list entry's `ack_comment_subtag` and issues a `Write(comment)` on
  the STA via `LmxSubtagAlarmSource`. The write performs the ack in AVEVA.

If the alarm has no writable ack-comment subtag (the `AckComment` config key is
empty, or the entry's `ack_comment_subtag` field is empty), the ack call
returns a failure code that the gateway surfaces as `FailedPrecondition`.

`AcknowledgeByGuid` maps the synthetic GUID back to its reference via an
internal dictionary, then calls the same write path.
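
The subtag-mode ack path can be sketched as a small router. Everything here is hypothetical: `SubtagAckRouterSketch` and `AckResult` are illustrative names, the `write` callable stands in for the STA `Write` through `LmxSubtagAlarmSource`, and the `UNKNOWN_ALARM` result for an unrecognized reference or GUID is an assumption this document does not specify.

```python
import uuid
from enum import Enum
from typing import Callable, Dict


class AckResult(Enum):
    OK = "ok"
    FAILED_PRECONDITION = "failed_precondition"  # surfaced as gRPC FailedPrecondition
    UNKNOWN_ALARM = "unknown_alarm"  # hypothetical; not specified in this document


class SubtagAckRouterSketch:
    """Hypothetical model of AcknowledgeByName / AcknowledgeByGuid in subtag mode."""

    def __init__(self, ack_comment_addresses: Dict[str, str],
                 write: Callable[[str, str], None]) -> None:
        self._ack_comment_addresses = ack_comment_addresses  # reference -> address ("" if none)
        self._write = write  # stands in for the STA Write via LmxSubtagAlarmSource
        self._ref_by_guid: Dict[uuid.UUID, str] = {}

    def register(self, guid: uuid.UUID, reference: str) -> None:
        self._ref_by_guid[guid] = reference

    def acknowledge_by_name(self, reference: str, comment: str) -> AckResult:
        if reference not in self._ack_comment_addresses:
            return AckResult.UNKNOWN_ALARM
        address = self._ack_comment_addresses[reference]
        if not address:
            return AckResult.FAILED_PRECONDITION
        self._write(address, comment)  # the write itself performs the ack
        return AckResult.OK

    def acknowledge_by_guid(self, guid: uuid.UUID, comment: str) -> AckResult:
        reference = self._ref_by_guid.get(guid)
        if reference is None:
            return AckResult.UNKNOWN_ALARM
        return self.acknowledge_by_name(reference, comment)
```

Routing the GUID path through `acknowledge_by_name` keeps a single write path, matching the description above.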
|

### Fidelity limitations

The following fields are unavailable or lower quality in subtag mode:

| Field | Subtag-mode behavior |
|-------|---------------------|
| `alarm_guid` | Synthetic deterministic GUID from `alarm_full_reference`; not an AVEVA-native GUID |
| `original_raise_timestamp` | First observed `active = true` time; no AVEVA-native raise time |
| `transition_timestamp` | `OnDataChange` source timestamp from MXAccess |
| `severity` | From priority subtag if advised; 0 otherwise |
| `category` / `description` | Not populated (no subtag for these) |
| `current_value` / `limit_value` | Not populated unless corresponding subtags are in the watch-list |
| `alarm_type_name` | Not populated |
| `operator_user` / `operator_comment` | Not populated on synthesized raise/clear transitions |
| `retrigger` transition | Not synthesized (no re-alarm counter subtag is observed) |

Every transition and snapshot record carries `degraded = true` and
`source_provider = ALARM_PROVIDER_MODE_SUBTAG`. Clients that require full
fidelity must wait for failback to the alarm manager.

### Provider mode reflection

When `FailoverAlarmConsumer` switches between providers, it raises
`ProviderModeChanged`. `AlarmDispatcher` enqueues an
`OnAlarmProviderModeChangedEvent` (carried as an `MxEvent`), which the
gateway receives and reflects into:

- `AlarmFeedMessage.provider_status`, emitted to every `StreamAlarms`
  subscriber.
- The `/hubs/alarms` SignalR hub for the dashboard.
- Metrics: the `mxgateway.alarms.provider_mode` gauge and the
  `mxgateway.alarms.provider_switches` counter.

On every switch, `GatewayAlarmMonitor` also forces a reconcile
(`QueryActiveAlarms`) against the now-active provider so the gateway cache
reflects the post-switch state without a spurious raise/clear storm.

@@ -411,6 +411,58 @@ a per-channel skip-verify hook:
See [Gateway Configuration — Automatic self-signed certificate](./GatewayConfiguration.md#automatic-self-signed-certificate)
and the per-client READMEs for the as-built behavior.

## Alarm-Manager to Subtag Fallback

Decision: add a second alarm provider (subtag monitoring) that the worker
activates automatically when the native wnwrap alarm manager fails, and fail
back to the alarm manager automatically once it recovers.

### Worker-side synthesis

Synthesis of alarm transitions from subtag value changes happens entirely in
the worker (`SubtagAlarmConsumer` / `SubtagAlarmStateMachine`). The gateway
still forwards only events the worker emits and synthesizes nothing itself.
This satisfies the parity rule even though the subtag path is inherently
non-parity: the parity rule governs where synthesis lives, not whether
synthesis is permitted when the native source is unavailable.

### Degraded is explicit

Every subtag-mode transition and snapshot carries `degraded = true` on the
`OnAlarmTransitionEvent` and `ActiveAlarmSnapshot` proto messages, and the
`AlarmFeedMessage` feed carries an `AlarmProviderStatus` payload on stream
open and on every switch. A client that reads these fields cannot mistake a
subtag-mode alarm for an authoritative alarmmgr record. Subtag mode has lower
fidelity: a synthetic deterministic GUID (SHA-derived from the alarm
reference), a best-effort original-raise timestamp, and a narrower field set.
Clients that need full fidelity must wait for failback.

### Failover trigger

The failover trigger is N consecutive wnwrap COM failures: a `COMException`
thrown by `Subscribe` or `PollOnce`, or a failure HRESULT from
`GetXmlCurrentAlarms2`. The threshold N (`ConsecutiveFailureThreshold`,
default 3, floored at 1) guards against transient COM hiccups, so with the
default a single poll failure does not trigger a switch. The counter resets on
any clean poll, so only an unbroken run of failures counts; isolated failures
separated by clean polls never accumulate into a failover.
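
The trigger rule reduces to a counter that a clean poll resets. `ConsecutiveFailureCounter` is a hypothetical helper for illustration, not a class named in the codebase; it only models the counting, not the COM calls.

```python
class ConsecutiveFailureCounter:
    """Hypothetical helper illustrating the failover trigger rule."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = max(1, threshold)  # floored at 1
        self.consecutive = 0

    def record(self, poll_ok: bool) -> bool:
        """Record one wnwrap poll result; return True once failover should fire."""
        if poll_ok:
            self.consecutive = 0  # any clean poll breaks the run
            return False
        self.consecutive += 1
        return self.consecutive >= self.threshold
```

With the default threshold, the sequence fail, ok, fail, fail never fires; a third failure in a row does.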
|

### Acknowledge via ack-comment write

In subtag mode, `AcknowledgeAlarm` writes the operator comment to the alarm
attribute's ack-comment subtag (`Fallback:Subtags:AckComment`). The write
performs the native ack in AVEVA. This differs from alarmmgr mode, where
`AlarmAckByName` on `wwAlarmConsumerClass` is called directly. The `AckComment`
subtag name is empty by default, so it must be configured for ack to work in
subtag mode. The exact AVEVA subtag names are not hard-coded: the `Subtags`
config block exists so that names are validated against the live MXAccess
attribute set rather than guessed.

### Related documentation

- [Gateway Configuration — Alarm Fallback options](./GatewayConfiguration.md#alarm-fallback-options)
- [Alarm Client Discovery — Subtag provider](./AlarmClientDiscovery.md)
- [gRPC Contract — provider_status and degraded fields](./Grpc.md)

## Later Revisit Items

These are explicit post-v1 revisit items, not open blockers:

@@ -230,6 +230,74 @@ behavior.
The alarm monitor is independent of client sessions: `AcknowledgeAlarm` and
`StreamAlarms` are session-less RPCs served by the monitor.

### Alarm fallback options

The `Fallback` sub-section controls how the alarm feed selects between the
native wnwrap alarm-manager provider and the subtag-monitoring fallback.

| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Alarms:Fallback:Mode` | `Auto` | Provider selection mode. `Auto` uses the alarm manager as primary and fails over to subtag monitoring after consecutive COM failures, then fails back automatically. `ForceAlarmManager` disables failover. `ForceSubtag` forces subtag monitoring on from startup. Values are case-insensitive. |
| `MxGateway:Alarms:Fallback:ConsecutiveFailureThreshold` | `3` | Number of consecutive wnwrap COM failures (a `COMException` from `Subscribe` / `PollOnce`, or a failure HRESULT from `GetXmlCurrentAlarms2`) before the alarm feed switches to subtag mode. Floored at 1. |
| `MxGateway:Alarms:Fallback:FailbackProbeIntervalSeconds` | `30` | While in subtag mode, how often (in seconds) the worker probes the wnwrap provider to detect recovery. Floored at 1. |
| `MxGateway:Alarms:Fallback:FailbackStableProbes` | `3` | Number of consecutive clean wnwrap probes required before the worker switches back to the alarm manager. Floored at 1. |
| `MxGateway:Alarms:Fallback:Discovery:UseGalaxyRepository` | `true` | When `true`, the monitor queries the Galaxy Repository SQL database to build the subtag watch-list for the configured area. |
| `MxGateway:Alarms:Fallback:Discovery:Area` | _(empty)_ | Galaxy area to scope the Repository query to. Falls back to `MxGateway:Alarms:DefaultArea` when empty. Ignored when `UseGalaxyRepository` is `false`. |
| `MxGateway:Alarms:Fallback:Discovery:IncludeAttributes` | _(empty)_ | Explicit MXAccess attribute paths to add to the subtag watch-list, supplementing (or replacing, when `UseGalaxyRepository` is `false`) the Repository-derived list. |
| `MxGateway:Alarms:Fallback:Discovery:ExcludeAttributes` | _(empty)_ | Attribute paths to remove from the Repository-derived watch-list. Ignored when `UseGalaxyRepository` is `false`. |
| `MxGateway:Alarms:Fallback:Subtags:Active` | `active` | Subtag name for the in-alarm boolean. |
| `MxGateway:Alarms:Fallback:Subtags:Acked` | `acked` | Subtag name for the acknowledged boolean. |
| `MxGateway:Alarms:Fallback:Subtags:AckComment` | _(empty)_ | Subtag name for the acknowledgement comment attribute. When empty, acknowledgement is unavailable in subtag mode and `AcknowledgeAlarm` returns `FailedPrecondition`. Must be verified against the live MXAccess attribute set before use. |
| `MxGateway:Alarms:Fallback:Subtags:Priority` | `priority` | Subtag name for the alarm priority / severity value. |

Validation rules:

- `Mode` must be `Auto`, `ForceAlarmManager`, or `ForceSubtag` (case-insensitive).
- `Mode = ForceSubtag` with both `UseGalaxyRepository = false` and an empty
  `IncludeAttributes` list produces a startup validation warning: the subtag
  provider has no attributes to advise.
- `ConsecutiveFailureThreshold`, `FailbackProbeIntervalSeconds`, and
  `FailbackStableProbes` are floored at 1 by `GatewayOptionsValidator`.
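
The validation rules can be expressed as a short normalization pass. This sketch is not `GatewayOptionsValidator`: `FallbackOptionsSketch` and `validate_fallback` are hypothetical, only the options the rules mention are modeled, and raising `ValueError` for an unknown `Mode` is an assumption about how an invalid value is reported.

```python
from dataclasses import dataclass, field
from typing import List

# Canonical spellings keyed by lower-case input (Mode is case-insensitive).
_MODES = {m.lower(): m for m in ("Auto", "ForceAlarmManager", "ForceSubtag")}


@dataclass
class FallbackOptionsSketch:
    """Hypothetical subset of the Fallback options relevant to validation."""
    mode: str = "Auto"
    consecutive_failure_threshold: int = 3
    failback_probe_interval_seconds: int = 30
    failback_stable_probes: int = 3
    use_galaxy_repository: bool = True
    include_attributes: List[str] = field(default_factory=list)


def validate_fallback(opts: FallbackOptionsSketch) -> List[str]:
    """Normalize opts in place and return startup warnings.

    Raising ValueError for an unknown Mode is an assumption of this sketch.
    """
    canonical = _MODES.get(opts.mode.lower())
    if canonical is None:
        raise ValueError(
            f"Fallback:Mode must be Auto, ForceAlarmManager, or ForceSubtag, got {opts.mode!r}")
    opts.mode = canonical
    # Floor the threshold, probe interval, and stable-probe count at 1.
    opts.consecutive_failure_threshold = max(1, opts.consecutive_failure_threshold)
    opts.failback_probe_interval_seconds = max(1, opts.failback_probe_interval_seconds)
    opts.failback_stable_probes = max(1, opts.failback_stable_probes)

    warnings: List[str] = []
    if (opts.mode == "ForceSubtag" and not opts.use_galaxy_repository
            and not opts.include_attributes):
        warnings.append("ForceSubtag with no Galaxy Repository and no "
                        "IncludeAttributes: the subtag provider has no attributes to advise")
    return warnings
```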
|

Full example with every fallback option set explicitly (the `Fallback` values
shown are the defaults; the surrounding `Alarms` settings are site-specific):

```json
{
  "MxGateway": {
    "Alarms": {
      "Enabled": true,
      "SubscriptionExpression": "\\\\SCADA01\\Galaxy!PlantArea",
      "DefaultArea": "PlantArea",
      "ReconcileIntervalSeconds": 30,
      "Fallback": {
        "Mode": "Auto",
        "ConsecutiveFailureThreshold": 3,
        "FailbackProbeIntervalSeconds": 30,
        "FailbackStableProbes": 3,
        "Discovery": {
          "UseGalaxyRepository": true,
          "Area": "",
          "IncludeAttributes": [],
          "ExcludeAttributes": []
        },
        "Subtags": {
          "Active": "active",
          "Acked": "acked",
          "AckComment": "",
          "Priority": "priority"
        }
      }
    }
  }
}
```

The exact AVEVA subtag names for `Active`, `Acked`, `AckComment`, and
`Priority` are not hard-coded. The `Subtags` block exists so names can be
confirmed against the live MXAccess attribute set and configured without a
code change. See `docs/AlarmClientDiscovery.md` for the synthesis rules that
depend on these names.

## Host Endpoints and Transport Security (Kestrel)

The listening endpoints are **not** part of the `MxGateway` section. The gateway

@@ -94,6 +94,73 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
`StreamAlarms` is a server-streaming, **session-less** RPC that attaches to the gateway's central alarm feed. The handler delegates to `IGatewayAlarmService.StreamAsync`. The stream opens with one `AlarmFeedMessage` carrying an `active_alarm` per currently-active alarm (the ConditionRefresh snapshot), then a single `snapshot_complete`, then a `transition` for every subsequent raise / acknowledge / clear. It is served by the always-on `GatewayAlarmMonitor`, which owns a single gateway-managed worker session and fans out to every attached client — clients no longer open a session of their own. `alarm_filter_prefix`, when set, scopes the stream to a sub-tree.

#### Provider status on the alarm feed

`AlarmFeedMessage` has a fourth `payload` case, `provider_status`, carrying
an `AlarmProviderStatus` message:

```protobuf
message AlarmProviderStatus {
  AlarmProviderMode mode = 1;
  bool degraded = 2; // true whenever mode == SUBTAG
  string reason = 3; // human-readable switch reason
  google.protobuf.Timestamp since = 4;
}
```

The gateway emits `provider_status` once when a client first subscribes
(immediately after the initial snapshot and before the first live transition)
and again on every failover or failback. A late-joining client therefore
always learns the current provider mode without waiting for the next switch.
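
A client consuming the feed can track the current provider mode alongside the snapshot. The sketch below is hypothetical: `FeedMessage` is a flattened stand-in for `AlarmFeedMessage` (with generated protobuf classes, the case name would come from `msg.WhichOneof("payload")`), and `FeedClientSketch` only records what it sees rather than applying alarm state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeedMessage:
    """Hypothetical flattened stand-in for one AlarmFeedMessage."""
    case: str  # "active_alarm" | "snapshot_complete" | "provider_status" | "transition"
    reference: str = ""
    mode: Optional[str] = None  # set only for provider_status
    degraded: bool = False


class FeedClientSketch:
    def __init__(self) -> None:
        self.snapshot: List[str] = []
        self.snapshot_complete = False
        self.mode = "ALARM_PROVIDER_MODE_UNSPECIFIED"
        self.degraded = False
        self.mode_changes: List[str] = []
        self.transitions: List[str] = []

    def apply(self, msg: FeedMessage) -> None:
        if msg.case == "active_alarm":
            self.snapshot.append(msg.reference)
        elif msg.case == "snapshot_complete":
            self.snapshot_complete = True
        elif msg.case == "provider_status":
            # Sent once after the snapshot and again on every failover/failback.
            if msg.mode is not None:
                self.mode = msg.mode
            self.degraded = msg.degraded
            self.mode_changes.append(self.mode)
        elif msg.case == "transition":
            self.transitions.append(msg.reference)
```

Because the first `provider_status` arrives right after the snapshot, the client knows the mode before it processes any live transition.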
|

`AlarmProviderMode` is an enum with three values:

| Value | Meaning |
|-------|---------|
| `ALARM_PROVIDER_MODE_UNSPECIFIED` (0) | Default / unset |
| `ALARM_PROVIDER_MODE_ALARMMGR` (1) | Native wnwrap alarm-manager source |
| `ALARM_PROVIDER_MODE_SUBTAG` (2) | Subtag-monitoring fallback (degraded) |

#### Degraded and source-provider fields on transitions and snapshots

`OnAlarmTransitionEvent` and `ActiveAlarmSnapshot` both carry two new fields:

- `bool degraded` (field 14): `true` when the record came from the subtag
  fallback rather than the native alarmmgr.
- `AlarmProviderMode source_provider` (field 15): which provider produced
  this record; subtag-mode records carry `SUBTAG`.

In alarmmgr mode both fields stay at their proto3 defaults (`false` /
`UNSPECIFIED`), so existing clients that do not read them continue to function
without change, and a client should not rely on seeing `ALARMMGR` in
`source_provider`. Clients that care about provenance (for example, an OPC UA
server that applies different quality flags to degraded alarms) should inspect
`degraded` before consuming the transition.
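
A provenance check on the client side can treat either field as the fallback signal. This is a hypothetical policy for illustration: `quality_for` and the `Quality` enum are not part of the contract, and mapping degraded records to an "Uncertain" quality is one possible choice for an OPC UA-style consumer, not a documented requirement.

```python
from enum import Enum


class Quality(Enum):
    GOOD = "Good"
    UNCERTAIN = "Uncertain"


SUBTAG = "ALARM_PROVIDER_MODE_SUBTAG"


def quality_for(degraded: bool, source_provider: str) -> Quality:
    """Hypothetical policy: any subtag-mode signal downgrades quality."""
    # In alarmmgr mode both fields keep proto3 defaults (False / UNSPECIFIED),
    # so only an explicit degraded flag or SUBTAG provider marks fallback origin.
    if degraded or source_provider == SUBTAG:
        return Quality.UNCERTAIN
    return Quality.GOOD
```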
|

Subtag-mode records are a non-parity source. They carry synthetic GUIDs,
best-effort timestamps, and reduced field coverage. See
`docs/AlarmClientDiscovery.md` for the full fidelity table.

#### Provider-mode-changed event

The worker emits `OnAlarmProviderModeChangedEvent` (family
`MX_EVENT_FAMILY_ON_ALARM_PROVIDER_MODE_CHANGED`) on each switch between
providers:

```protobuf
message OnAlarmProviderModeChangedEvent {
  AlarmProviderMode mode = 1;
  string reason = 2;
  int32 hresult = 3; // COM HRESULT that triggered failover; 0 on failback
  google.protobuf.Timestamp at = 4;
}
```

This event arrives only on the `StreamEvents` stream of the alarm monitor's
internal gateway session; client sessions never receive it directly.
`GatewayAlarmMonitor` consumes it and reflects the new mode into the
`StreamAlarms` feed's `provider_status`, the dashboard hub, and metrics.

## Validation Rules

`MxAccessGrpcRequestValidator` rejects requests with `StatusCode.InvalidArgument` before any session work happens. The rules are intentionally narrow — anything that requires session state (for example, "session does not exist") is left for `ISessionManager` so the validator can stay synchronous and side-effect free.

@@ -143,6 +143,63 @@ session if the worker faults. Gated by `MxGateway:Alarms:Enabled` — see
`docs/DesignDecisions.md` for why this reverses the v1 single-subscriber rule
for the alarm subsystem.

### Alarm providers and failover

The alarm feed has two providers, both implemented worker-side:

- **Alarm manager (primary):** `WnWrapAlarmConsumer` polls
  `wwAlarmConsumerClass.GetXmlCurrentAlarms2` on the worker STA. This is the
  authoritative native source.
- **Subtag monitoring (standby):** `SubtagAlarmConsumer` advises each alarm
  attribute's subtags (`.active`, `.acked`, optionally `.priority`) via the
  existing `AddItem`/`Advise` pipeline through `LmxSubtagAlarmSource` and
  synthesizes alarm transitions with `SubtagAlarmStateMachine`. This is a
  non-parity, lower-fidelity source: synthetic GUIDs, no native raise
  timestamps, and a narrower field set.

`FailoverAlarmConsumer` wraps both and owns the state machine:

- **Auto-failover:** after `ConsecutiveFailureThreshold` (default 3)
  consecutive wnwrap COM failures (`Subscribe` or `PollOnce` throws or
  returns a failure HRESULT), it activates the standby. The standby is armed
  (subscribed and advising) from the start, so its state is already warm at
  the moment of the switch.
- **Auto-failback:** while degraded, it re-probes the still-subscribed primary
  every `FailbackProbeIntervalSeconds` (default 30). After
  `FailbackStableProbes` (default 3) consecutive clean probes, it switches back
  to the alarm manager.
- **On every switch:** the consumer snapshots the now-active provider and
  emits `OnAlarmProviderModeChangedEvent` so the gateway can reconcile its
  cache without a raise/clear storm.
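
The failover and failback rules combine into a two-state machine. `FailoverSketch` below is a hypothetical model of the switching logic only: the caller drives it with one result per primary poll (alarm-manager mode) or per failback probe (subtag mode), probe timing is left to the caller, and the snapshot and event emission are reduced to appending to a list.

```python
from typing import List, Tuple

ALARMMGR, SUBTAG = "ALARMMGR", "SUBTAG"


class FailoverSketch:
    """Hypothetical model of FailoverAlarmConsumer's mode switching.

    The caller drives it: one call per primary poll while in ALARMMGR mode,
    and one call per failback probe (every FailbackProbeIntervalSeconds)
    while in SUBTAG mode.
    """

    def __init__(self, failure_threshold: int = 3, stable_probes: int = 3) -> None:
        self.failure_threshold = max(1, failure_threshold)
        self.stable_probes = max(1, stable_probes)
        self.mode = ALARMMGR
        self._failures = 0
        self._clean_probes = 0
        self.switches: List[Tuple[str, str]] = []  # (new mode, reason)

    def on_primary_result(self, ok: bool) -> None:
        if self.mode == ALARMMGR:
            # Only an unbroken run of failures triggers failover.
            self._failures = 0 if ok else self._failures + 1
            if self._failures >= self.failure_threshold:
                self._switch(SUBTAG, "consecutive wnwrap COM failures")
        else:
            # Failback needs consecutive clean probes; any failure restarts the count.
            self._clean_probes = self._clean_probes + 1 if ok else 0
            if self._clean_probes >= self.stable_probes:
                self._switch(ALARMMGR, "alarm manager recovered")

    def _switch(self, mode: str, reason: str) -> None:
        self.mode = mode
        self._failures = self._clean_probes = 0
        # The real consumer also snapshots the now-active provider and emits
        # OnAlarmProviderModeChangedEvent at this point.
        self.switches.append((mode, reason))
```

Resetting both counters on every switch means each direction starts its count fresh, so a probe failure just before failback cannot carry over into the next failover decision.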
|

Synthesis is worker-side. This preserves the parity rule: the gateway
forwards only events the worker emits and never synthesizes transitions
itself. The synthesis rules are documented in
`docs/AlarmClientDiscovery.md`.

**Acknowledge in subtag mode:** the ack-by-name path writes the operator
comment to the alarm attribute's ack-comment subtag, and that write performs
the ack. If the attribute has no writable ack-comment subtag configured, the
RPC returns `FailedPrecondition`. In alarm-manager mode, `AlarmAckByName` is
used as before.

**Degraded state visibility:** every subtag-mode transition and snapshot
carries `degraded = true` and `source_provider = ALARM_PROVIDER_MODE_SUBTAG` on
the `OnAlarmTransitionEvent` and `ActiveAlarmSnapshot` proto fields. The
`AlarmFeedMessage` feed emits an `AlarmProviderStatus` message (the
`provider_status` oneof case) on stream open and on every switch. The
dashboard shows a Bootstrap badge (green for alarm manager, amber when
degraded). Metrics: `mxgateway.alarms.provider_mode` gauge (1 = alarmmgr,
2 = subtag) and `mxgateway.alarms.provider_switches` counter.

Forced modes are available via `MxGateway:Alarms:Fallback:Mode`:
`ForceAlarmManager` disables failover; `ForceSubtag` forces the standby
on from startup; `Auto` (default) enables failover and failback. Watch-list
discovery for the subtag provider uses Galaxy Repository SQL with config
overrides. See `docs/GatewayConfiguration.md` for the full `Fallback` option
block and `docs/AlarmClientDiscovery.md` for synthesis rules and fidelity
limitations.

Dashboard authentication is LDAP-backed (distinct from the API-key model on
the gRPC API). `/login` accepts username and password in a form body, binds
against `MxGateway:Ldap`, maps the user's LDAP groups to `Admin` or `Viewer`