Document the session-less StreamAlarms feed and alarm config

Update the gateway docs for the central alarm monitor reversal:
Grpc.md replaces QueryActiveAlarms with the session-less StreamAlarms
RPC and notes AcknowledgeAlarm no longer needs a session;
Authorization.md maps StreamAlarmsRequest to events:read;
GatewayConfiguration.md adds the MxGateway:Alarms options block; and
GatewayDashboardDesign.md points the Alarms page at the central
monitor cache instead of a per-session subscription.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-21 16:59:48 -04:00
parent 6777d49030
commit de58872435
4 changed files with 51 additions and 27 deletions
+3 -3
View File
@@ -103,7 +103,7 @@ public string ResolveRequiredScope(object request)
StreamEventsRequest => GatewayScopes.EventsRead,
MxCommandRequest commandRequest => ResolveCommandScope(commandRequest.Command?.Kind ?? MxCommandKind.Unspecified),
AcknowledgeAlarmRequest => GatewayScopes.InvokeWrite,
QueryActiveAlarmsRequest => GatewayScopes.EventsRead,
StreamAlarmsRequest => GatewayScopes.EventsRead,
TestConnectionRequest or
GetLastDeployTimeRequest or
DiscoverHierarchyRequest or
@@ -113,7 +113,7 @@ public string ResolveRequiredScope(object request)
}
```
The `_ => GatewayScopes.Admin` fallback is intentional: any future request type that the resolver does not recognize fails closed, requiring the strongest scope until the resolver is updated. `AcknowledgeAlarm` is treated as a write — it mutates alarm state, mirroring `MxCommandKind.Write*` — and `QueryActiveAlarms` shares the alarm/event surface with `StreamEvents` and `MxCommandKind.DrainEvents`, so it carries `events:read`.
The `_ => GatewayScopes.Admin` fallback is intentional: any future request type that the resolver does not recognize fails closed, requiring the strongest scope until the resolver is updated. `AcknowledgeAlarm` is treated as a write — it mutates alarm state, mirroring `MxCommandKind.Write*` — and `StreamAlarms` shares the alarm/event surface with `StreamEvents` and `MxCommandKind.DrainEvents`, so it carries `events:read`. Both alarm RPCs are session-less: the scope check is the only authorization gate, since there is no per-session ownership to enforce.
`MxCommandRequest` is special because it multiplexes many MxAccess operations through a single RPC. The resolver inspects the embedded `MxCommandKind` so each operation gets its own scope:
@@ -205,7 +205,7 @@ blocking constraint; secured values and raw credentials are never logged.
|----------|-------|--------------|
| `SessionOpen` | `session:open` | `OpenSessionRequest` |
| `SessionClose` | `session:close` | `CloseSessionRequest` |
| `EventsRead` | `events:read` | `StreamEventsRequest`, `QueryActiveAlarmsRequest`, `MxCommandKind.DrainEvents` |
| `EventsRead` | `events:read` | `StreamEventsRequest`, `StreamAlarmsRequest`, `MxCommandKind.DrainEvents` |
| `InvokeRead` | `invoke:read` | `MxCommandRequest` for read-style command kinds (`Register`, `AddItem`, `Advise`, `ReadBulk`, and any kind not otherwise mapped) |
| `InvokeWrite` | `invoke:write` | `AcknowledgeAlarmRequest`, `MxCommandKind.Write`, `MxCommandKind.Write2`, `MxCommandKind.WriteBulk`, `MxCommandKind.Write2Bulk` |
| `InvokeSecure` | `invoke:secure` | `MxCommandKind.WriteSecured`, `MxCommandKind.WriteSecured2`, `MxCommandKind.WriteSecuredBulk`, `MxCommandKind.WriteSecured2Bulk`, `MxCommandKind.AuthenticateUser` |
+20
View File
@@ -61,6 +61,12 @@ paths, timeouts, queue sizes, enum values, or protocol values are invalid.
"ConnectionString": "Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;",
"CommandTimeoutSeconds": 60,
"DashboardRefreshIntervalSeconds": 30
},
"Alarms": {
"Enabled": false,
"SubscriptionExpression": "",
"DefaultArea": "",
"ReconcileIntervalSeconds": 30
}
}
}
@@ -168,6 +174,20 @@ at startup.
See [Galaxy Repository Browse](./GalaxyRepository.md) for the RPC surface and
behavior.
## Alarm Options
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Alarms:Enabled` | `false` | Gates the gateway's always-on central alarm monitor. When `true`, the gateway opens one gateway-owned worker session dedicated to alarms, caches the active-alarm set, and fans it out to every client through the `StreamAlarms` RPC — no client opens its own session to see alarms. |
| `MxGateway:Alarms:SubscriptionExpression` | _(empty)_ | AVEVA alarm-subscription expression the monitor subscribes on startup, in canonical `\\<machine>\Galaxy!<area>` form. The literal `Galaxy` provider is correct regardless of the Galaxy database name. When empty and `Enabled` is `true`, the gateway falls back to `\\<MachineName>\Galaxy!<DefaultArea>` if `DefaultArea` is set. |
| `MxGateway:Alarms:DefaultArea` | _(empty)_ | Area name used to compose a default subscription when `SubscriptionExpression` is empty. If both are empty while `Enabled` is `true`, the monitor faults with a configuration diagnostic. |
| `MxGateway:Alarms:ReconcileIntervalSeconds` | `30` | How often the monitor reconciles its in-process alarm cache against the worker's authoritative active-alarm snapshot, catching transitions the live poll-and-diff feed missed. Floored at 5 seconds. |
The alarm monitor is independent of client sessions: `AcknowledgeAlarm` and
`StreamAlarms` are session-less RPCs served by the monitor. See
[Alarm Client Discovery](./AlarmClientDiscovery.md) for the AVEVA consumer
surface the monitor's worker session drives.
## Related Documentation
- [Gateway Process Detailed Design](./GatewayProcessDesign.md)
+22 -18
View File
@@ -274,28 +274,32 @@ diagnostic session/worker views.
### Alarms page
`/dashboard/alarms` lists the alarms the dashboard session's worker currently
reports as Active or ActiveAcked, refreshed every three seconds. It defaults to
showing unacknowledged `Active` alarms; filters add acknowledged alarms and
narrow by area, severity range, and a reference/source/description text search.
Cleared alarms are not retained — the gateway holds no alarm-history store, so
the page reflects only the live active set. The page is read-only; it does not
acknowledge alarms. If `MxGateway:Alarms:Enabled` is false the session is never
subscribed to an alarm provider, and the page says so instead of showing an
empty list with no explanation.
`/dashboard/alarms` lists the alarms the gateway's central alarm monitor
currently holds as Active or ActiveAcked, refreshed every three seconds. It
defaults to showing unacknowledged `Active` alarms; filters add acknowledged
alarms and narrow by area, severity range, and a reference/source/description
text search. Cleared alarms are not retained — the gateway holds no
alarm-history store, so the page reflects only the live active set. The page is
read-only; it does not acknowledge alarms. If `MxGateway:Alarms:Enabled` is
false the central monitor never starts, and the page says so instead of showing
an empty list with no explanation.
### Live data source
Both the Browse subscription panel and the Alarms page read live MXAccess data
through `IDashboardLiveDataService` (`DashboardLiveDataService`). It owns one
shared gateway session for the whole dashboard, opened lazily on first use via
`ISessionManager` and re-opened transparently when it faults or its lease
expires. One session means one worker process backs every dashboard circuit;
all access is serialised so the worker sees one in-flight command at a time.
Tag reads go through `GatewaySession.SubscribeBulkAsync` / `ReadBulkAsync`;
alarm queries go through `IAlarmRpcDispatcher`. Alarm subscription is the
gateway's existing auto-subscribe-on-open hook, so the dashboard session is
alarm-subscribed only when `MxGateway:Alarms:Enabled` is set.
through `IDashboardLiveDataService` (`DashboardLiveDataService`). For tag data
it owns one shared gateway session for the whole dashboard, opened lazily on
first use via `ISessionManager` and re-opened transparently when it faults or
its lease expires. One session means one worker process backs every dashboard
circuit; all access is serialised so the worker sees one in-flight command at a
time. Tag reads go through `GatewaySession.SubscribeBulkAsync` / `ReadBulkAsync`.
The Alarms page does **not** use the dashboard session: alarm data comes from
the gateway's always-on central monitor. `QueryAlarmsAsync` reads
`IGatewayAlarmService.CurrentAlarms` — the monitor's in-process cache — so the
dashboard sees the same active-alarm set as every `StreamAlarms` client, with
no per-dashboard alarm subscription. When `MxGateway:Alarms:Enabled` is false
the monitor never starts and the cache stays empty.
### API keys page
+6 -6
View File
@@ -29,7 +29,7 @@ A second gRPC service, `GalaxyRepositoryGrpcService`, is mapped alongside it. It
## RPC Handlers
`MxAccessGatewayService` derives from the generated `MxAccessGateway.MxAccessGatewayBase` and implements every RPC declared in `mxaccess_gateway.proto` — six in total: `OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, and `QueryActiveAlarms`. The proto contract itself is documented in [Contracts](./Contracts.md); this section covers only what the server-side handler does on top of that contract.
`MxAccessGatewayService` derives from the generated `MxAccessGateway.MxAccessGatewayBase` and implements every RPC declared in `mxaccess_gateway.proto` — six in total: `OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, and `StreamAlarms`. The proto contract itself is documented in [Contracts](./Contracts.md); this section covers only what the server-side handler does on top of that contract.
Public gRPC send and receive message sizes are configured from
`MxGateway:Protocol:MaxGrpcMessageBytes` (default 16 MiB). Official clients use
@@ -88,11 +88,11 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
### `AcknowledgeAlarm`
`AcknowledgeAlarm` is a unary RPC that acknowledges a single alarm. The handler validates `session_id` and `alarm_full_reference` inline (it does not run through `MxAccessGrpcRequestValidator`, because the alarm surface routes through `IAlarmRpcDispatcher` rather than the generic `Invoke` path), resolves the session, then delegates to the registered `IAlarmRpcDispatcher`. The production `WorkerAlarmRpcDispatcher` routes the ack over the worker IPC by GUID (`AcknowledgeAlarmCommand`) when the reference parses as a canonical GUID, or by `Provider!Group.Tag` reference (`AcknowledgeAlarmByNameCommand`) otherwise. The handler-level RPC behaviour and the alarm contract itself are documented in [Alarm Client Discovery](./AlarmClientDiscovery.md).
`AcknowledgeAlarm` is a unary, **session-less** RPC that acknowledges a single alarm. The handler validates `alarm_full_reference` inline (it does not run through `MxAccessGrpcRequestValidator`) and delegates to `IGatewayAlarmService.AcknowledgeAsync`. The always-on `GatewayAlarmMonitor` routes the ack over its own gateway-managed worker session — clients no longer open a session to acknowledge an alarm. A reference that parses as a canonical GUID forwards to `AcknowledgeAlarmCommand`; a `Provider!Group.Tag` reference forwards to `AcknowledgeAlarmByNameCommand`. The alarm contract and the central monitor are documented in [Alarm Client Discovery](./AlarmClientDiscovery.md).
### `QueryActiveAlarms`
### `StreamAlarms`
`QueryActiveAlarms` is a server-streaming RPC that returns an `ActiveAlarmSnapshot` per currently active alarm. The handler validates `session_id` inline, resolves the session, and delegates to `IAlarmRpcDispatcher`; `WorkerAlarmRpcDispatcher` issues a `QueryActiveAlarmsCommand` over the worker IPC and streams each snapshot from the worker reply.
`StreamAlarms` is a server-streaming, **session-less** RPC that attaches to the gateway's central alarm feed. The handler delegates to `IGatewayAlarmService.StreamAsync`. The stream opens with one `AlarmFeedMessage` carrying an `active_alarm` per currently-active alarm (the ConditionRefresh snapshot), then a single `snapshot_complete`, then a `transition` for every subsequent raise / acknowledge / clear. It is served by the always-on `GatewayAlarmMonitor`, which owns a single gateway-managed worker session and fans out to every attached client — clients no longer open a session of their own. `alarm_filter_prefix`, when set, scopes the stream to a sub-tree.
## Validation Rules
@@ -104,8 +104,8 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
| `CloseSession` | `session_id` must be non-empty. | `InvalidArgument` |
| `StreamEvents` | `session_id` must be non-empty. | `InvalidArgument` |
| `Invoke` | `session_id` non-empty, `command` present, `kind` not `Unspecified`, payload oneof must match `kind`. | `InvalidArgument` |
| `AcknowledgeAlarm` | `session_id` and `alarm_full_reference` must be non-empty. Validated inline in the handler, not by `MxAccessGrpcRequestValidator`. | `InvalidArgument` |
| `QueryActiveAlarms` | `session_id` must be non-empty. Validated inline in the handler, not by `MxAccessGrpcRequestValidator`. | `InvalidArgument` |
| `AcknowledgeAlarm` | `alarm_full_reference` must be non-empty. Validated inline in the handler, not by `MxAccessGrpcRequestValidator`. | `InvalidArgument` |
| `StreamAlarms` | No required fields — `alarm_filter_prefix` is optional. | — |
The payload-vs-kind check matters because the `MxCommand.payload` oneof is non-discriminated on the wire — a misaligned client could send `kind = Write` with a `Register` payload and silently confuse the worker. The validator turns that into a clear client error: