From de58872435e5e2fdac91c3c7b4d5c19528e81de1 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Thu, 21 May 2026 16:59:48 -0400
Subject: [PATCH] Document the session-less StreamAlarms feed and alarm config

Update the gateway docs for the central alarm monitor reversal: Grpc.md
replaces QueryActiveAlarms with the session-less StreamAlarms RPC and
notes AcknowledgeAlarm no longer needs a session; Authorization.md maps
StreamAlarmsRequest to events:read; GatewayConfiguration.md adds the
MxGateway:Alarms options block; and GatewayDashboardDesign.md points the
Alarms page at the central monitor cache instead of a per-session
subscription.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/Authorization.md          |  6 ++---
 docs/GatewayConfiguration.md   | 20 +++++++++++++++++
 docs/GatewayDashboardDesign.md | 40 +++++++++++++++++++---------------
 docs/Grpc.md                   | 12 +++++-----
 4 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/docs/Authorization.md b/docs/Authorization.md
index 2d2cdb6..ff270db 100644
--- a/docs/Authorization.md
+++ b/docs/Authorization.md
@@ -103,7 +103,7 @@ public string ResolveRequiredScope(object request)
         StreamEventsRequest => GatewayScopes.EventsRead,
         MxCommandRequest commandRequest => ResolveCommandScope(commandRequest.Command?.Kind ?? MxCommandKind.Unspecified),
         AcknowledgeAlarmRequest => GatewayScopes.InvokeWrite,
-        QueryActiveAlarmsRequest => GatewayScopes.EventsRead,
+        StreamAlarmsRequest => GatewayScopes.EventsRead,
         TestConnectionRequest or
         GetLastDeployTimeRequest or
         DiscoverHierarchyRequest or
@@ -113,7 +113,7 @@ public string ResolveRequiredScope(object request)
 }
 ```
 
-The `_ => GatewayScopes.Admin` fallback is intentional: any future request type that the resolver does not recognize fails closed, requiring the strongest scope until the resolver is updated. `AcknowledgeAlarm` is treated as a write — it mutates alarm state, mirroring `MxCommandKind.Write*` — and `QueryActiveAlarms` shares the alarm/event surface with `StreamEvents` and `MxCommandKind.DrainEvents`, so it carries `events:read`.
+The `_ => GatewayScopes.Admin` fallback is intentional: any future request type that the resolver does not recognize fails closed, requiring the strongest scope until the resolver is updated. `AcknowledgeAlarm` is treated as a write — it mutates alarm state, mirroring `MxCommandKind.Write*` — and `StreamAlarms` shares the alarm/event surface with `StreamEvents` and `MxCommandKind.DrainEvents`, so it carries `events:read`. Both alarm RPCs are session-less: the scope check is the only authorization gate, since there is no per-session ownership to enforce.
 
 `MxCommandRequest` is special because it multiplexes many MxAccess operations through a single RPC. The resolver inspects the embedded `MxCommandKind` so each operation gets its own scope:
 
@@ -205,7 +205,7 @@ blocking constraint; secured values and raw credentials are never logged.
 |----------|-------|--------------|
 | `SessionOpen` | `session:open` | `OpenSessionRequest` |
 | `SessionClose` | `session:close` | `CloseSessionRequest` |
-| `EventsRead` | `events:read` | `StreamEventsRequest`, `QueryActiveAlarmsRequest`, `MxCommandKind.DrainEvents` |
+| `EventsRead` | `events:read` | `StreamEventsRequest`, `StreamAlarmsRequest`, `MxCommandKind.DrainEvents` |
 | `InvokeRead` | `invoke:read` | `MxCommandRequest` for read-style command kinds (`Register`, `AddItem`, `Advise`, `ReadBulk`, and any kind not otherwise mapped) |
 | `InvokeWrite` | `invoke:write` | `AcknowledgeAlarmRequest`, `MxCommandKind.Write`, `MxCommandKind.Write2`, `MxCommandKind.WriteBulk`, `MxCommandKind.Write2Bulk` |
 | `InvokeSecure` | `invoke:secure` | `MxCommandKind.WriteSecured`, `MxCommandKind.WriteSecured2`, `MxCommandKind.WriteSecuredBulk`, `MxCommandKind.WriteSecured2Bulk`, `MxCommandKind.AuthenticateUser` |
diff --git a/docs/GatewayConfiguration.md b/docs/GatewayConfiguration.md
index 7213e7a..1bb21bc 100644
--- a/docs/GatewayConfiguration.md
+++ b/docs/GatewayConfiguration.md
@@ -61,6 +61,12 @@ paths, timeouts, queue sizes, enum values, or protocol values are invalid.
       "ConnectionString": "Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;",
       "CommandTimeoutSeconds": 60,
       "DashboardRefreshIntervalSeconds": 30
+    },
+    "Alarms": {
+      "Enabled": false,
+      "SubscriptionExpression": "",
+      "DefaultArea": "",
+      "ReconcileIntervalSeconds": 30
     }
   }
 }
@@ -168,6 +174,20 @@ at startup.
 See [Galaxy Repository Browse](./GalaxyRepository.md) for the RPC surface and
 behavior.
 
+## Alarm Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `MxGateway:Alarms:Enabled` | `false` | Gates the gateway's always-on central alarm monitor. When `true`, the gateway opens one gateway-owned worker session dedicated to alarms, caches the active-alarm set, and fans it out to every client through the `StreamAlarms` RPC — no client opens its own session to see alarms. |
+| `MxGateway:Alarms:SubscriptionExpression` | _(empty)_ | AVEVA alarm-subscription expression the monitor subscribes on startup, in canonical `\\\Galaxy!` form. The literal `Galaxy` provider is correct regardless of the Galaxy database name. When empty and `Enabled` is `true`, the gateway falls back to `\\\Galaxy!` if `DefaultArea` is set. |
+| `MxGateway:Alarms:DefaultArea` | _(empty)_ | Area name used to compose a default subscription when `SubscriptionExpression` is empty. If both are empty while `Enabled` is `true`, the monitor faults with a configuration diagnostic. |
+| `MxGateway:Alarms:ReconcileIntervalSeconds` | `30` | How often the monitor reconciles its in-process alarm cache against the worker's authoritative active-alarm snapshot, catching transitions the live poll-and-diff feed missed. Floored at 5 seconds. |
+
+The alarm monitor is independent of client sessions: `AcknowledgeAlarm` and
+`StreamAlarms` are session-less RPCs served by the monitor. See
+[Alarm Client Discovery](./AlarmClientDiscovery.md) for the AVEVA consumer
+surface the monitor's worker session drives.
+
 ## Related Documentation
 
 - [Gateway Process Detailed Design](./GatewayProcessDesign.md)
diff --git a/docs/GatewayDashboardDesign.md b/docs/GatewayDashboardDesign.md
index 8c285bb..c93f1bf 100644
--- a/docs/GatewayDashboardDesign.md
+++ b/docs/GatewayDashboardDesign.md
@@ -274,28 +274,32 @@ diagnostic session/worker views.
 
 ### Alarms page
 
-`/dashboard/alarms` lists the alarms the dashboard session's worker currently
-reports as Active or ActiveAcked, refreshed every three seconds. It defaults to
-showing unacknowledged `Active` alarms; filters add acknowledged alarms and
-narrow by area, severity range, and a reference/source/description text search.
-Cleared alarms are not retained — the gateway holds no alarm-history store, so
-the page reflects only the live active set. The page is read-only; it does not
-acknowledge alarms. If `MxGateway:Alarms:Enabled` is false the session is never
-subscribed to an alarm provider, and the page says so instead of showing an
-empty list with no explanation.
+`/dashboard/alarms` lists the alarms the gateway's central alarm monitor
+currently holds as Active or ActiveAcked, refreshed every three seconds. It
+defaults to showing unacknowledged `Active` alarms; filters add acknowledged
+alarms and narrow by area, severity range, and a reference/source/description
+text search. Cleared alarms are not retained — the gateway holds no
+alarm-history store, so the page reflects only the live active set. The page is
+read-only; it does not acknowledge alarms. If `MxGateway:Alarms:Enabled` is
+false the central monitor never starts, and the page says so instead of showing
+an empty list with no explanation.
 
 ### Live data source
 
 Both the Browse subscription panel and the Alarms page read live MXAccess data
-through `IDashboardLiveDataService` (`DashboardLiveDataService`). It owns one
-shared gateway session for the whole dashboard, opened lazily on first use via
-`ISessionManager` and re-opened transparently when it faults or its lease
-expires. One session means one worker process backs every dashboard circuit;
-all access is serialised so the worker sees one in-flight command at a time.
-Tag reads go through `GatewaySession.SubscribeBulkAsync` / `ReadBulkAsync`;
-alarm queries go through `IAlarmRpcDispatcher`. Alarm subscription is the
-gateway's existing auto-subscribe-on-open hook, so the dashboard session is
-alarm-subscribed only when `MxGateway:Alarms:Enabled` is set.
+through `IDashboardLiveDataService` (`DashboardLiveDataService`). For tag data
+it owns one shared gateway session for the whole dashboard, opened lazily on
+first use via `ISessionManager` and re-opened transparently when it faults or
+its lease expires. One session means one worker process backs every dashboard
+circuit; all access is serialised so the worker sees one in-flight command at a
+time. Tag reads go through `GatewaySession.SubscribeBulkAsync` / `ReadBulkAsync`.
+
+The Alarms page does **not** use the dashboard session: alarm data comes from
+the gateway's always-on central monitor. `QueryAlarmsAsync` reads
+`IGatewayAlarmService.CurrentAlarms` — the monitor's in-process cache — so the
+dashboard sees the same active-alarm set as every `StreamAlarms` client, with
+no per-dashboard alarm subscription. When `MxGateway:Alarms:Enabled` is false
+the monitor never starts and the cache stays empty.
 
 ### API keys page
 
diff --git a/docs/Grpc.md b/docs/Grpc.md
index b134462..126361a 100644
--- a/docs/Grpc.md
+++ b/docs/Grpc.md
@@ -29,7 +29,7 @@ A second gRPC service, `GalaxyRepositoryGrpcService`, is mapped alongside it. It
 
 ## RPC Handlers
 
-`MxAccessGatewayService` derives from the generated `MxAccessGateway.MxAccessGatewayBase` and implements every RPC declared in `mxaccess_gateway.proto` — six in total: `OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, and `QueryActiveAlarms`. The proto contract itself is documented in [Contracts](./Contracts.md); this section covers only what the server-side handler does on top of that contract.
+`MxAccessGatewayService` derives from the generated `MxAccessGateway.MxAccessGatewayBase` and implements every RPC declared in `mxaccess_gateway.proto` — six in total: `OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, and `StreamAlarms`. The proto contract itself is documented in [Contracts](./Contracts.md); this section covers only what the server-side handler does on top of that contract.
 
 Public gRPC send and receive message sizes are configured from
 `MxGateway:Protocol:MaxGrpcMessageBytes` (default 16 MiB). Official clients use
@@ -88,11 +88,11 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
 
 ### `AcknowledgeAlarm`
 
-`AcknowledgeAlarm` is a unary RPC that acknowledges a single alarm. The handler validates `session_id` and `alarm_full_reference` inline (it does not run through `MxAccessGrpcRequestValidator`, because the alarm surface routes through `IAlarmRpcDispatcher` rather than the generic `Invoke` path), resolves the session, then delegates to the registered `IAlarmRpcDispatcher`. The production `WorkerAlarmRpcDispatcher` routes the ack over the worker IPC by GUID (`AcknowledgeAlarmCommand`) when the reference parses as a canonical GUID, or by `Provider!Group.Tag` reference (`AcknowledgeAlarmByNameCommand`) otherwise. The handler-level RPC behaviour and the alarm contract itself are documented in [Alarm Client Discovery](./AlarmClientDiscovery.md).
+`AcknowledgeAlarm` is a unary, **session-less** RPC that acknowledges a single alarm. The handler validates `alarm_full_reference` inline (it does not run through `MxAccessGrpcRequestValidator`) and delegates to `IGatewayAlarmService.AcknowledgeAsync`. The always-on `GatewayAlarmMonitor` routes the ack over its own gateway-managed worker session — clients no longer open a session to acknowledge an alarm. A reference that parses as a canonical GUID forwards to `AcknowledgeAlarmCommand`; a `Provider!Group.Tag` reference forwards to `AcknowledgeAlarmByNameCommand`. The alarm contract and the central monitor are documented in [Alarm Client Discovery](./AlarmClientDiscovery.md).
 
-### `QueryActiveAlarms`
+### `StreamAlarms`
 
-`QueryActiveAlarms` is a server-streaming RPC that returns an `ActiveAlarmSnapshot` per currently active alarm. The handler validates `session_id` inline, resolves the session, and delegates to `IAlarmRpcDispatcher`; `WorkerAlarmRpcDispatcher` issues a `QueryActiveAlarmsCommand` over the worker IPC and streams each snapshot from the worker reply.
+`StreamAlarms` is a server-streaming, **session-less** RPC that attaches to the gateway's central alarm feed. The handler delegates to `IGatewayAlarmService.StreamAsync`. The stream opens with one `AlarmFeedMessage` carrying an `active_alarm` per currently-active alarm (the ConditionRefresh snapshot), then a single `snapshot_complete`, then a `transition` for every subsequent raise / acknowledge / clear. It is served by the always-on `GatewayAlarmMonitor`, which owns a single gateway-managed worker session and fans out to every attached client — clients no longer open a session of their own. `alarm_filter_prefix`, when set, scopes the stream to a sub-tree.
 
 ## Validation Rules
 
@@ -104,8 +104,8 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
 | `CloseSession` | `session_id` must be non-empty. | `InvalidArgument` |
 | `StreamEvents` | `session_id` must be non-empty. | `InvalidArgument` |
 | `Invoke` | `session_id` non-empty, `command` present, `kind` not `Unspecified`, payload oneof must match `kind`. | `InvalidArgument` |
-| `AcknowledgeAlarm` | `session_id` and `alarm_full_reference` must be non-empty. Validated inline in the handler, not by `MxAccessGrpcRequestValidator`. | `InvalidArgument` |
-| `QueryActiveAlarms` | `session_id` must be non-empty. Validated inline in the handler, not by `MxAccessGrpcRequestValidator`. | `InvalidArgument` |
+| `AcknowledgeAlarm` | `alarm_full_reference` must be non-empty. Validated inline in the handler, not by `MxAccessGrpcRequestValidator`. | `InvalidArgument` |
+| `StreamAlarms` | No required fields — `alarm_filter_prefix` is optional. | — |
 
 The payload-vs-kind check matters because the `MxCommand.payload` oneof is non-discriminated on the wire — a misaligned client could send `kind = Write` with a `Register` payload and silently confuse the worker. The validator turns that into a clear client error: