mxaccessgw/docs/GatewayDashboardDesign.md
Joseph Doherty e80f3c70b6 docs: cover admin dashboard actions + API key Delete
Update the design docs so they match the implemented Admin-only
dashboard surface. GatewayDashboardDesign now documents the Close
session / Kill worker controls and the new Delete action on revoked
API keys, plus the ConfirmDialog gate for every destructive action.
Sessions.md adds the SessionManager.KillWorkerAsync entry alongside
CloseSessionAsync and explains the immediate-kill semantics. Authentication.md adds the IApiKeyAdminStore.DeleteAsync write path
and the dashboard-delete-key audit event. DashboardInterfaceDesign
drops the "read-only until admin workflows have a separate design"
line in favor of the confirm-before-act invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:35:25 -04:00

# Gateway Dashboard Detailed Design
## Purpose
The gateway should host a basic web dashboard for operators and developers. For
v1 the dashboard is primarily a diagnostic and operational-visibility surface:
it shows gateway health, active MXAccess worker instances, session state, and
basic statistics in real time. The only write paths are the Admin-only actions
described under Pages, each gated by a confirmation dialog.
## Technology Choice
Decision: Blazor Server with Bootstrap CSS/JS.
Allowed UI stack:
- ASP.NET Core Blazor Server,
- Bootstrap CSS,
- Bootstrap JavaScript,
- small local CSS for layout and status styling,
- built-in Blazor components.
Not allowed for v1:
- MudBlazor,
- Radzen,
- Syncfusion,
- Telerik,
- other Blazor UI component libraries,
- client-side SPA framework replacement.
Rationale: Blazor Server keeps the dashboard in the gateway process, avoids a
separate frontend build, and gives real-time UI updates through the Blazor
SignalR circuit. Bootstrap is sufficient for a basic dashboard.
## Hosting Model
The dashboard is hosted by `ZB.MOM.WW.MxGateway.Server` alongside the gRPC API. When
`MxGateway:Dashboard:Enabled` is `true`, `MapGatewayDashboard()` mounts the
Blazor Server app at the host root and registers the login, logout, denied,
SignalR hub, and hub-token endpoints beside it. When dashboard hosting is
disabled, none of those routes are mapped — the same listener still serves
gRPC.
Endpoint layout:
```text
/
/sessions
/sessions/{sessionId}
/workers
/events
/alarms
/galaxy
/browse
/apikeys
/settings
/login (POST also)
/logout (POST)
/denied
/hubs/snapshot
/hubs/alarms
/hubs/events
/hubs/token
/_blazor
```
The `/galaxy` page surfaces the Galaxy Repository browse summary
(deployed object hierarchy size, last deploy timestamp, attribute totals,
template usage, and connectivity sync info). The summary is fed by
`GalaxySummaryCache`, which is refreshed off the request path by
`GalaxySummaryRefreshService` on the
`MxGateway:Galaxy:DashboardRefreshIntervalSeconds` cadence so the dashboard
never blocks on SQL. See [Galaxy Repository Browse](./GalaxyRepository.md) for
the underlying gRPC service.
## High-Level Components
```text
ZB.MOM.WW.MxGateway.Server
  Dashboard/
    Components/
      App.razor
      Routes.razor
      DashboardPageBase.cs
      DashboardDisplay.cs
      Layout/
        DashboardLayout.razor
      Pages/
        DashboardHome.razor
        SessionsPage.razor
        SessionDetailsPage.razor
        WorkersPage.razor
        EventsPage.razor
        ApiKeysPage.razor
        SettingsPage.razor
      Shared/
        MetricCard.razor
        StatusBadge.razor
        FaultList.razor
    DashboardSnapshotService.cs
    DashboardAuthorizationHandler.cs
    DashboardAuthenticator.cs
    DashboardApiKeyAuthorization.cs
    DashboardApiKeyManagementService.cs
    DashboardApiKeySummary.cs
    DashboardSnapshot.cs
    DashboardSessionSummary.cs
    DashboardWorkerSummary.cs
    DashboardMetricSummary.cs
```
The dashboard exposes three named SignalR hubs in addition to Blazor Server's
internal circuit; pages connect to those hubs from within the circuit via the
`DashboardHubConnectionFactory` helper. The hubs publish snapshot, alarm, and
per-session event updates that the pages render in place of polling.
## Dashboard Data Source
The dashboard should consume read-only snapshots from gateway services:
- `SessionRegistry`,
- `SessionManager`,
- `WorkerClient`,
- `GatewayMetrics`,
- health checks,
- structured fault/event counters.
Do not let Razor components directly mutate gateway session or worker objects.
Create a small read-only dashboard service that projects gateway state into
plain DTOs.
`GatewayMetrics.GetSnapshot()` is the metrics input for the first dashboard
projection. It carries current session and worker gauges, command and event
counters, queue depth, and fault totals. The dashboard reads that snapshot
instead of reading raw `Meter` instruments because exporter configuration is an
operations concern, not a UI dependency.
Suggested service:
```csharp
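using System.Collections.Generic;
using System.Threading;

public interface IDashboardSnapshotService
{
    // Returns the current point-in-time projection of gateway state.
    DashboardSnapshot GetSnapshot();

    // Streams snapshots as they are produced, until cancellation is requested.
    IAsyncEnumerable<DashboardSnapshot> WatchSnapshotsAsync(
        CancellationToken cancellationToken);
}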
```
Snapshot updates can be driven by:
- periodic timer, default every 1 second,
- session lifecycle notifications,
- worker heartbeat updates,
- event counter updates,
- fault notifications.
Use immutable snapshot DTOs so Razor components can render without locking
gateway internals.
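
As a rough illustration of the timer-driven case, the sketch below shows one way `WatchSnapshotsAsync` could be implemented with `PeriodicTimer`. `SnapshotSketch` and `TimerSnapshotSource` are hypothetical names used only for this example, the real `DashboardSnapshot` carries more fields, and the notification-driven triggers are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, trimmed-down DTO. Records are immutable by default,
// so a renderer can hold one without locking gateway internals.
public sealed record SnapshotSketch(
    DateTimeOffset CapturedAt, int OpenSessions, int WorkersRunning);

public sealed class TimerSnapshotSource
{
    private readonly Func<SnapshotSketch> _project;
    private readonly TimeSpan _interval;

    // project: a callback that reads gateway state and builds a fresh DTO.
    public TimerSnapshotSource(Func<SnapshotSketch> project, TimeSpan interval)
    {
        _project = project;
        _interval = interval;
    }

    public SnapshotSketch GetSnapshot() => _project();

    public async IAsyncEnumerable<SnapshotSketch> WatchSnapshotsAsync(
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        // Emit the current state first so a new consumer is never empty.
        yield return GetSnapshot();

        using var timer = new PeriodicTimer(_interval);
        // WaitForNextTickAsync throws OperationCanceledException on cancel.
        while (await timer.WaitForNextTickAsync(cancellationToken))
        {
            yield return GetSnapshot();
        }
    }
}
```

Emitting one snapshot before the first tick matches the hub behavior described later, where a new connection receives the current snapshot immediately.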
## Realtime Updates
Updates flow over three SignalR hubs, all guarded by the
`MxGateway.Dashboard.HubClients` policy (cookie OR `MxGateway.Dashboard.HubToken`
bearer). Each hub class is `[Authorize(Policy = HubClientsPolicy)]`.
| Hub | Path | Producer | Payload | Routing |
|---|---|---|---|---|
| `DashboardSnapshotHub` | `/hubs/snapshot` | `DashboardSnapshotPublisher` (BackgroundService consuming `IDashboardSnapshotService.WatchSnapshotsAsync`) | `DashboardSnapshot` | Sent to all connected clients on every snapshot tick; new connections receive the current snapshot synchronously in `OnConnectedAsync`. |
| `AlarmsHub` | `/hubs/alarms` | `AlarmsHubPublisher` (BackgroundService consuming `IGatewayAlarmService.StreamAsync(filter: null)`) | `AlarmFeedMessage` (`active_alarm` / `snapshot_complete` / `transition`) | Connected clients auto-join `__alarms__`; all clients receive every message. Publisher auto-reconnects every 5s on stream faults. |
| `EventsHub` | `/hubs/events` | `DashboardEventBroadcaster` invoked by `EventStreamService` for each event it forwards to a gRPC client | `MxEvent` | Clients call `SubscribeSession(sessionId)` to join `session:{id}`. Events appear only while a gRPC client is also consuming that session's events — the dashboard is a passive mirror, not a separate worker subscriber. |
`DashboardPageBase` opens a `DashboardSnapshotHub` connection via the connection
factory in `OnInitializedAsync`, seeds `Snapshot` synchronously from
`IDashboardSnapshotService.GetSnapshot()` so the first render is non-empty, and
calls `InvokeAsync(StateHasChanged)` on every `SnapshotUpdated` push. SignalR's
`WithAutomaticReconnect` handles transient disconnects.
`SessionDetailsPage` additionally opens an `EventsHub` connection for the
current session id and renders the most recent N events (default 50) in a
"Recent events" table with a live/offline connection pill.
Default cadences:
- snapshot service produces one snapshot per
`MxGateway:Dashboard:SnapshotIntervalMilliseconds` (default 1s);
- alarm publisher emits on each transition observed by the central monitor;
- event publisher emits per event forwarded by `StreamEvents`.
Avoid pushing every MXAccess data-change event into a wider broadcast group.
The current design routes events strictly through `session:{id}` groups; the
snapshot hub continues to carry aggregate event counters and rates.
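
The "most recent N events" table on `SessionDetailsPage` implies a bounded buffer that drops the oldest entry once it is full. A minimal sketch of such a buffer follows; `RecentEventBuffer<T>` is a hypothetical helper, not a type named in this design, and the default capacity of 50 is taken from the text above.

```csharp
using System;
using System.Collections.Generic;

public sealed class RecentEventBuffer<T>
{
    private readonly Queue<T> _items = new();
    private readonly object _gate = new();
    private readonly int _capacity;

    public RecentEventBuffer(int capacity = 50)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    // Called from the hub callback; evicts the oldest event when full.
    public void Add(T item)
    {
        lock (_gate)
        {
            if (_items.Count == _capacity) _items.Dequeue();
            _items.Enqueue(item);
        }
    }

    // Returns a copy ordered newest first, so rendering never races with Add.
    public IReadOnlyList<T> Snapshot()
    {
        lock (_gate)
        {
            var copy = _items.ToArray();
            Array.Reverse(copy);
            return copy;
        }
    }
}
```

A page would call `Add` from its event handler and render `Snapshot()` after `InvokeAsync(StateHasChanged)`.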
## Pages
### Dashboard home
Show top-level status:
- gateway status,
- gateway version,
- uptime,
- open sessions,
- workers running,
- sessions faulted,
- command rate,
- command failure count,
- event rate,
- event queue depth,
- worker restart/kill count.
Use Bootstrap cards for individual metric summaries. Keep the layout compact
and operational.
### Sessions page
Show active and recent sessions in a table:
- session id,
- client identity or API key display name,
- state,
- backend,
- worker process id,
- open time,
- last client activity,
- last worker heartbeat,
- active event subscribers,
- pending commands,
- event queue depth,
- last fault summary.
Rows should link to session details.
### Session details page
Show:
- session metadata,
- worker metadata,
- command counters by method,
- event counters by family,
- active server handles and item counts if gateway shadow state has them,
- latest faults,
- last heartbeat payload,
- admin Close session / Kill worker controls (Admin role only).
The Sessions list, the Workers list, and this details page all render the same
admin controls when the signed-in principal carries the `Admin` role; viewers
and the localhost-anonymous bypass see no action affordances and the server
re-checks the role on every invocation. Every destructive admin action is
gated by a confirmation dialog before it reaches `ISessionManager`.
- **Close session** routes through `ISessionManager.CloseSessionAsync`: the
worker is asked to shut down gracefully and is killed only as a fallback if
shutdown fails.
- **Kill worker** routes through `ISessionManager.KillWorkerAsync`: the worker
is killed immediately with no graceful-shutdown attempt. The session is
removed from the registry and the open-session slot is released either way.
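
The difference between the two actions can be sketched as follows. This is an illustration only: the delegates stand in for the real worker and registry calls, `SessionTeardownSketch` is a hypothetical name, and the shutdown timeout is an assumption (the text above only says the kill happens if shutdown fails).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SessionTeardownSketch
{
    // Close: try a graceful shutdown, kill only as a fallback.
    // Returns true when the fallback kill was needed.
    public static async Task<bool> CloseAsync(
        Func<CancellationToken, Task> shutdown,
        Action kill,
        Action releaseSlot,
        TimeSpan timeout) // assumption: a bounded wait for graceful shutdown
    {
        try
        {
            using var cts = new CancellationTokenSource(timeout);
            // WaitAsync stops waiting at the timeout even if the worker
            // ignores the token.
            await shutdown(cts.Token).WaitAsync(cts.Token);
            return false;
        }
        catch (Exception)
        {
            kill();
            return true;
        }
        finally
        {
            releaseSlot(); // the session slot is released either way
        }
    }

    // Kill: no graceful attempt at all.
    public static void KillNow(Action kill, Action releaseSlot)
    {
        try { kill(); }
        finally { releaseSlot(); }
    }
}
```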
### Workers page
Show:
- worker process id,
- session id,
- executable path/version,
- state,
- startup duration,
- memory and CPU if available,
- last heartbeat,
- current command correlation id,
- pending command count,
- event queue depth,
- restart/kill reason if terminal.
### Events page
Show aggregate event diagnostics:
- event rate by session,
- event rate by event family,
- total events since start,
- queue overflow count,
- stream disconnect count,
- recent terminal faults.
Do not display full tag values by default. If value display is later added, make
it opt-in and redacted.
### Browse page
`/browse` lets an operator explore the Galaxy tag hierarchy and watch
live values. The tree is built in-process by `DashboardBrowseTreeBuilder` from
`IGalaxyHierarchyCache.Current` — the same cache the Galaxy page reads — so a
render costs no gRPC call and no SQL round-trip. Each node shows its child
objects and, when expanded, its attributes with attribute name, data type
(including array dimension), and the alarm / historized flags. Galaxy SQL
carries no attribute description, so none is shown. A filter box switches the
tree to a flat list of matching attributes.
Right-clicking an attribute (or double-clicking it) adds it to the subscription
panel. The panel shows each subscribed tag's live value, MXAccess data type,
quality and source timestamp, refreshed every two seconds. The subscription
panel is the explicit opt-in tag-value surface: it always shows values
regardless of `Dashboard:ShowTagValues`, which continues to govern only the
diagnostic session/worker views.
### Alarms page
`/alarms` lists the alarms the gateway's central alarm monitor
currently holds as Active or ActiveAcked, refreshed every three seconds. It
defaults to showing unacknowledged `Active` alarms; filters add acknowledged
alarms and narrow by area, severity range, and a reference/source/description
text search. Cleared alarms are not retained — the gateway holds no
alarm-history store, so the page reflects only the live active set. The page is
read-only; it does not acknowledge alarms. If `MxGateway:Alarms:Enabled` is
false the central monitor never starts, and the page says so instead of showing
an empty list with no explanation.
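
The filter behavior on the Alarms page could look roughly like the sketch below. All type and member names here (`AlarmRowSketch`, `AlarmFilterSketch`, and so on) are hypothetical stand-ins for whatever the alarm service actually exposes; only the filter semantics come from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum AlarmStateSketch { Active, ActiveAcked }

public sealed record AlarmRowSketch(
    string Reference, string Source, string Description,
    string Area, int Severity, AlarmStateSketch State);

// Defaults reproduce the page default: unacknowledged alarms only.
public sealed record AlarmFilterSketch(
    bool IncludeAcknowledged = false,
    string? Area = null,
    int MinSeverity = int.MinValue,
    int MaxSeverity = int.MaxValue,
    string? Text = null);

public static class AlarmFilterExtensions
{
    public static IEnumerable<AlarmRowSketch> Apply(
        this IEnumerable<AlarmRowSketch> alarms, AlarmFilterSketch f) =>
        alarms.Where(a =>
            (f.IncludeAcknowledged || a.State == AlarmStateSketch.Active) &&
            (f.Area is null ||
             string.Equals(a.Area, f.Area, StringComparison.OrdinalIgnoreCase)) &&
            a.Severity >= f.MinSeverity && a.Severity <= f.MaxSeverity &&
            (string.IsNullOrEmpty(f.Text) ||
             a.Reference.Contains(f.Text, StringComparison.OrdinalIgnoreCase) ||
             a.Source.Contains(f.Text, StringComparison.OrdinalIgnoreCase) ||
             a.Description.Contains(f.Text, StringComparison.OrdinalIgnoreCase)));
}
```

Because cleared alarms are never retained, the input sequence only ever contains the two live states.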
### Live data source
Both the Browse subscription panel and the Alarms page read live MXAccess data
through `IDashboardLiveDataService` (`DashboardLiveDataService`). For tag data
it owns one shared gateway session for the whole dashboard, opened lazily on
first use via `ISessionManager` and re-opened transparently when it faults or
its lease expires. One session means one worker process backs every dashboard
circuit; all access is serialised so the worker sees one in-flight command at a
time. Tag reads go through `GatewaySession.SubscribeBulkAsync` / `ReadBulkAsync`.
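
One way to get the "one shared session, one in-flight command" behavior is a single-permit `SemaphoreSlim` around a lazily opened session. The sketch below is an assumption about the shape, not the actual `DashboardLiveDataService`; `SharedSessionGate<TSession>` and its delegates are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class SharedSessionGate<TSession> where TSession : class
{
    private readonly SemaphoreSlim _gate = new(1, 1);
    private readonly Func<CancellationToken, Task<TSession>> _open;
    private readonly Func<TSession, bool> _isUsable;
    private TSession? _session;

    // open: opens a new session. isUsable: false once the session has
    // faulted or its lease has expired.
    public SharedSessionGate(
        Func<CancellationToken, Task<TSession>> open,
        Func<TSession, bool> isUsable)
    {
        _open = open;
        _isUsable = isUsable;
    }

    public async Task<TResult> RunAsync<TResult>(
        Func<TSession, CancellationToken, Task<TResult>> command,
        CancellationToken cancellationToken = default)
    {
        // Only one caller at a time gets past this point.
        await _gate.WaitAsync(cancellationToken);
        try
        {
            // Open lazily on first use, and re-open transparently when needed.
            if (_session is null || !_isUsable(_session))
            {
                _session = await _open(cancellationToken);
            }
            return await command(_session, cancellationToken);
        }
        finally
        {
            _gate.Release();
        }
    }
}
```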
The Alarms page does **not** use the dashboard session: alarm data comes from
the gateway's always-on central monitor. `QueryAlarmsAsync` reads
`IGatewayAlarmService.CurrentAlarms` — the monitor's in-process cache — so the
dashboard sees the same active-alarm set as every `StreamAlarms` client, with
no per-dashboard alarm subscription. When `MxGateway:Alarms:Enabled` is false
the monitor never starts and the cache stays empty.
### API keys page
`/apikeys` lists the gateway's API keys and, for authorized
operators, manages them. It reads key metadata through the same
`IApiKeyAdminStore` the `apikey` CLI uses, so the dashboard and the CLI act
on one source of truth.
The table shows one row per key:
- key id,
- status (`Active` or `Revoked`),
- display name,
- scopes,
- constraints (rendered as `unconstrained` when none are set),
- created timestamp,
- last-used timestamp.
Key secrets are never listed. Only the peppered hash is stored, and the page
never reconstructs a key. See [Authorization](./Authorization.md#constraint-enforcement)
for what each constraint means and how it is enforced on the gRPC path.
#### Management actions
Create, Rotate, Revoke, and Delete controls render only when the signed-in
user is authorized. `DashboardApiKeyAuthorization.CanManage` requires an
authenticated principal carrying the `Admin` role claim (resolved at login
from the user's LDAP groups via `MxGateway:Dashboard:GroupToRole`). A
`Viewer` role can read the table but sees no action controls, and an
anonymous localhost session shows the same read-only view.
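
The rule reduces to a small predicate over the signed-in principal. The sketch below is a guess at the shape of `DashboardApiKeyAuthorization.CanManage`, written against `System.Security.Claims` only; `ApiKeyManagementGate` is a hypothetical name.

```csharp
using System;
using System.Security.Claims;

public static class ApiKeyManagementGate
{
    // True only for an authenticated principal that carries the Admin role.
    // An anonymous localhost session has no authenticated identity, and a
    // Viewer has no Admin role claim, so both get the read-only view.
    public static bool CanManage(ClaimsPrincipal? user) =>
        user?.Identity?.IsAuthenticated == true && user.IsInRole("Admin");
}
```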
- **Create** opens a dialog for the key id, display name, scope checkboxes
(the `GatewayScopes` catalog), and the optional constraint fields: read and
write subtrees, read and write tag globs, browse subtrees, max write
classification, and the read-alarm-only / read-historized-only flags.
- **Rotate** issues a new secret for an existing key id and invalidates the
old one. Active keys only — rotating a revoked key would un-revoke it, so
the button is not shown on revoked rows.
- **Revoke** marks a key revoked; a revoked key cannot be un-revoked.
- **Delete** permanently removes a key row from the auth database, but only
when the key is already revoked. `IApiKeyAdminStore.DeleteAsync` rejects
active keys (returns false) so the revoke event lands in the audit log
before the row disappears. Revoked rows show a Delete button in place of
the previous "No actions" placeholder.
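
The revoke-before-delete rule can be illustrated with a toy in-memory store. This is not `IApiKeyAdminStore` (whose `DeleteAsync` is asynchronous and backed by the auth database); `InMemoryKeyStoreSketch` is a hypothetical stand-in that only models the state transitions.

```csharp
using System;
using System.Collections.Generic;

public sealed class InMemoryKeyStoreSketch
{
    // keyId -> revoked flag
    private readonly Dictionary<string, bool> _revoked = new();

    public void Create(string keyId) => _revoked[keyId] = false;

    // One-way transition: there is no method that clears the flag.
    public bool Revoke(string keyId)
    {
        if (!_revoked.ContainsKey(keyId)) return false;
        _revoked[keyId] = true;
        return true;
    }

    // Rejects unknown and active keys by returning false; only a revoked
    // row is actually removed.
    public bool Delete(string keyId) =>
        _revoked.TryGetValue(keyId, out var revoked) && revoked && _revoked.Remove(keyId);
}
```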
Every destructive action (Rotate / Revoke / Delete) is gated by the shared
`ConfirmDialog` component before reaching the service; Create uses its own
form modal as the implicit confirmation step.
Create and Rotate return the assembled `mxgw_<keyId>_<secret>` token **once**,
in a one-time banner. It is never shown again, so the operator must copy it
immediately. This mirrors the `apikey create-key` / `rotate-key` CLI.
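
For orientation, the sketch below assembles a token in the `mxgw_<keyId>_<secret>` shape and derives a value that could be stored instead of the secret. The secret encoding (32 random bytes as lowercase hex) and the use of HMAC-SHA256 as the "peppered hash" are assumptions made for this example; the actual scheme is defined by the authentication store, not by this document.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ApiKeyTokenSketch
{
    // Assumption: the secret is random bytes rendered as lowercase hex.
    public static string NewSecret(int byteCount = 32) =>
        Convert.ToHexString(RandomNumberGenerator.GetBytes(byteCount)).ToLowerInvariant();

    // Token shape taken from the text: mxgw_<keyId>_<secret>.
    public static string Assemble(string keyId, string secret) =>
        $"mxgw_{keyId}_{secret}";

    // Assumption: the peppered hash is modelled as HMAC-SHA256 keyed by the
    // pepper. Only this value would be persisted, never the secret itself.
    public static string PepperedHash(string secret, byte[] pepper) =>
        Convert.ToHexString(HMACSHA256.HashData(pepper, Encoding.UTF8.GetBytes(secret)));
}
```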
Every management action appends an `api_key_audit` entry
(`dashboard-create-key`, `dashboard-rotate-key`, `dashboard-revoke-key`,
`dashboard-delete-key`) with the key id and the caller's remote address.
Secrets and pepper values are never logged.
### Settings page
Show read-only effective configuration:
- worker executable path,
- configured timeouts,
- queue capacities,
- auth mode,
- SQLite auth database path with sensitive parts redacted if needed,
- dashboard enabled state,
- protocol version.
Do not show API key secrets or pepper values.
## Authentication And Authorization
Dashboard authentication is LDAP-backed, distinct from the API-key model used
on the gRPC API. Users sign in with directory credentials; the gateway maps
their LDAP groups to one of two dashboard roles (`Admin` or `Viewer`) and
issues a cookie carrying those role claims.
Implemented behavior:
- a static `/login` HTML form posts username/password to the gateway;
- `DashboardAuthenticator` binds against `MxGateway:Ldap` (service-account bind,
user search, candidate bind) using `Novell.Directory.Ldap.NETStandard`;
- the user's `memberOf` (or short CN) is matched against
`MxGateway:Dashboard:GroupToRole`; the resolved role(s) are emitted as
`ClaimTypes.Role` claims, alongside the per-group `mxgateway:ldap_group`
claims;
- a successful login signs in the `MxGateway.Dashboard` cookie scheme
(`__Host-MxGatewayDashboard`, HttpOnly, SameSite=Strict, Secure);
- a user with no matching group cannot sign in — the login screen returns the
generic credential-rejected message;
- antiforgery tokens guard the login and logout POSTs.
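
The group-to-role step can be sketched as a pure function over the `memberOf` values and the `GroupToRole` map. `GroupRoleMapperSketch` is a hypothetical helper; the DN parsing here is deliberately naive (it ignores escaped commas) and is not the parsing the real `DashboardAuthenticator` performs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupRoleMapperSketch
{
    // "CN=GwAdmin,OU=Groups,DC=example,DC=com" -> "GwAdmin".
    // Input that does not start with CN= is returned trimmed.
    public static string ShortName(string group)
    {
        var first = group.Split(',')[0].Trim();
        return first.StartsWith("CN=", StringComparison.OrdinalIgnoreCase)
            ? first[3..]
            : group.Trim();
    }

    // Tries the full value first, then the short CN. An empty result means
    // the user has no mapped group and must not be signed in.
    public static IReadOnlyCollection<string> ResolveRoles(
        IEnumerable<string> memberOf,
        IReadOnlyDictionary<string, string> groupToRole)
    {
        var roles = new SortedSet<string>(StringComparer.Ordinal);
        foreach (var group in memberOf)
        {
            if (groupToRole.TryGetValue(group, out var role) ||
                groupToRole.TryGetValue(ShortName(group), out role))
            {
                roles.Add(role);
            }
        }
        return roles;
    }
}
```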
Three authorization policies are registered:
- `MxGateway.Dashboard.Viewer` — Razor component routes. Satisfied by Admin or
Viewer.
- `MxGateway.Dashboard.Admin` — Admin-only write surfaces (API-key CRUD).
- `MxGateway.Dashboard.HubClients` — SignalR hubs. Accepts the dashboard
cookie OR a `MxGateway.Dashboard.HubToken` bearer (used by WebSocket upgrades
where the cookie can't be forwarded).
Two environmental bypasses still apply: `MxGateway:Authentication:Mode = Disabled`
authorizes every request, and `MxGateway:Dashboard:AllowAnonymousLocalhost`
(default `true`) authorizes any loopback request without a role check. Remote
requests always require an authenticated principal carrying at least the
Viewer role.
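
Taken together, the Viewer policy and the two bypasses amount to the decision order sketched below. This is an illustration of the rules as stated, not the code of `DashboardAuthorizationHandler`; `DashboardAccessSketch` and its parameter names are hypothetical.

```csharp
using System;
using System.Net;
using System.Security.Claims;

public static class DashboardAccessSketch
{
    public static bool IsAuthorized(
        ClaimsPrincipal user,
        IPAddress? remoteAddress,
        bool authenticationDisabled,   // MxGateway:Authentication:Mode = Disabled
        bool allowAnonymousLocalhost)  // MxGateway:Dashboard:AllowAnonymousLocalhost
    {
        // Bypass 1: authentication is switched off entirely.
        if (authenticationDisabled) return true;

        // Bypass 2: loopback requests skip the role check when allowed.
        if (allowAnonymousLocalhost &&
            remoteAddress is not null &&
            IPAddress.IsLoopback(remoteAddress))
        {
            return true;
        }

        // Everyone else needs an authenticated principal with a dashboard role.
        return user.Identity?.IsAuthenticated == true
            && (user.IsInRole("Admin") || user.IsInRole("Viewer"));
    }
}
```

Because the loopback bypass defaults to `true`, a deployment that fronts the gateway with a local reverse proxy would likely want it turned off, since every proxied request would then arrive from loopback.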
### Hub bearer flow
SignalR connections cannot rely on the `__Host-` cookie when the JS client
upgrades to WebSocket: in some edge cases the cookie's `SameSite=Strict; Path=/`
attributes keep the browser's WebSocket layer from forwarding it. The dashboard
therefore mints short-lived bearer tokens for the connection:
1. The cookie-authenticated Blazor page calls `GET /hubs/token`
(gated by `ViewerPolicy`, cookie-only).
2. `HubTokenService.Issue(user)` serializes the user's name, NameIdentifier,
and role claims to JSON, encrypts with the ASP.NET Core data-protection
time-limited protector under purpose
`ZB.MOM.WW.MxGateway.Dashboard.HubToken.v1`, and returns the protected
string. Lifetime is 30 minutes.
3. The SignalR client passes the token as either `Authorization: Bearer …` or
`?access_token=…` (WebSocket upgrade query string).
4. `HubTokenAuthenticationHandler` validates the protected payload and
rebuilds the `ClaimsPrincipal` with the carried roles.
5. The hubs' `[Authorize(Policy = HubClientsPolicy)]` accepts the resulting
identity.
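
Steps 2 and 4 can be sketched with the ASP.NET Core data-protection time-limited protector. The purpose string and the 30-minute lifetime come from the steps above; `HubTokenSketch`, `HubTokenPayloadSketch`, and the exact JSON shape are assumptions, and the real `HubTokenService` / `HubTokenAuthenticationHandler` may differ.

```csharp
using System;
using System.Linq;
using System.Security.Claims;
using System.Security.Cryptography;
using System.Text.Json;
using Microsoft.AspNetCore.DataProtection;

// Hypothetical payload shape: name, NameIdentifier, and role claims.
public sealed record HubTokenPayloadSketch(
    string? Name, string? NameIdentifier, string[] Roles);

public sealed class HubTokenSketch
{
    private static readonly TimeSpan Lifetime = TimeSpan.FromMinutes(30);
    private readonly ITimeLimitedDataProtector _protector;

    public HubTokenSketch(IDataProtectionProvider provider) =>
        _protector = provider
            .CreateProtector("ZB.MOM.WW.MxGateway.Dashboard.HubToken.v1")
            .ToTimeLimitedDataProtector();

    // Step 2: serialize the claims and protect them with an expiry.
    public string Issue(ClaimsPrincipal user)
    {
        var payload = new HubTokenPayloadSketch(
            user.Identity?.Name,
            user.FindFirst(ClaimTypes.NameIdentifier)?.Value,
            user.FindAll(ClaimTypes.Role).Select(c => c.Value).ToArray());
        return _protector.Protect(JsonSerializer.Serialize(payload), Lifetime);
    }

    // Step 4: unprotect and rebuild the principal; null when invalid.
    public ClaimsPrincipal? Validate(string token)
    {
        try
        {
            var payload = JsonSerializer.Deserialize<HubTokenPayloadSketch>(
                _protector.Unprotect(token));
            if (payload is null) return null;

            var claims = payload.Roles
                .Select(r => new Claim(ClaimTypes.Role, r)).ToList();
            if (payload.Name is not null)
                claims.Add(new Claim(ClaimTypes.Name, payload.Name));
            if (payload.NameIdentifier is not null)
                claims.Add(new Claim(ClaimTypes.NameIdentifier, payload.NameIdentifier));
            return new ClaimsPrincipal(new ClaimsIdentity(claims, "HubToken"));
        }
        catch (Exception ex) when (ex is CryptographicException or FormatException)
        {
            // Tampered, expired, malformed, or protected under another purpose.
            return null;
        }
    }
}
```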
`DashboardHubConnectionFactory` (scoped to the Blazor circuit) wraps the
HubConnectionBuilder and supplies a fresh token via `AccessTokenProvider` on
every (re)connect.
## Configuration
Effective configuration:
```json
{
  "MxGateway": {
    "Dashboard": {
      "Enabled": true,
      "AllowAnonymousLocalhost": true,
      "SnapshotIntervalMilliseconds": 1000,
      "RecentFaultLimit": 100,
      "RecentSessionLimit": 200,
      "ShowTagValues": false,
      "GroupToRole": {
        "GwAdmin": "Admin",
        "GwReader": "Viewer"
      }
    }
  }
}
```
See [Gateway Configuration](./GatewayConfiguration.md#dashboard-options) for
the full option table and the policies/hubs that derive from these values.
## Security Rules
- Do not display API key secrets.
- Do not display credential-bearing MXAccess command values.
- Do not display full tag values by default.
- Do not expose worker pipe names with nonce or sensitive details.
- Protect dashboard auth cookies with `HttpOnly`, `Secure`, and `SameSite`.
- Require TLS for remote dashboard access.
- Use anti-forgery protection for login/logout and any future admin actions.
## Styling
The dashboard serves Bootstrap 5.3.3 assets from
`src/ZB.MOM.WW.MxGateway.Server/wwwroot/lib/bootstrap/` and local layout/status styling
from `src/ZB.MOM.WW.MxGateway.Server/wwwroot/css/dashboard.css`.
Recommended visual language:
- compact tables,
- status badges,
- metric cards,
- Bootstrap alerts for faults,
- restrained colors,
- no decorative hero sections,
- no charting dependency for v1.
If charts are added later, prefer simple server-generated data tables first. Do
not add a JavaScript charting dependency without a specific need.
The reusable visual rules for replicating this interface in other projects are
documented in [Dashboard Interface Design](./DashboardInterfaceDesign.md).
## Testing
Dashboard unit/component tests should cover:
- snapshot projection,
- dashboard auth authorization decisions,
- login credential validation behavior,
- pages render with empty state,
- pages render with active sessions,
- pages render with faulted sessions,
- realtime subscription disposal,
- redaction of API keys and credential values.
Use bUnit if component testing is added. Otherwise keep the first tests focused
on snapshot services and authorization logic.
Integration tests should verify:
- dashboard disabled returns not found or configured fallback,
- dashboard requires auth when enabled,
- a user in an Admin-mapped LDAP group can access the dashboard and the
API-key CRUD surface,
- a user in a Viewer-mapped LDAP group can render every page but cannot
invoke the Admin-only management actions,
- a user with no mapped LDAP group cannot sign in at all,
- live snapshot updates when a fake session changes state are delivered
via the `/hubs/snapshot` push, not by polling.
## Initial Implementation Slice
The first dashboard slice implements:
1. Blazor Server hosting in `ZB.MOM.WW.MxGateway.Server`.
2. local Bootstrap static assets.
3. dashboard configuration binding.
4. dashboard auth using LDAP bind + role-mapped HTTP-only cookie.
5. `DashboardSnapshotService` projecting gateway state for read views.
6. home page with metric cards.
7. sessions page with active session table and session details.
8. workers page with worker table.
9. events page with aggregate counters.
10. settings page with redacted effective configuration.
11. periodic realtime refresh through Blazor Server.
12. route-mapping tests, disabled-dashboard tests, auth tests, and snapshot
projection/redaction tests.
Subsequent slices added Admin-gated destructive actions: API-key
Create/Rotate/Revoke (and Delete on revoked keys), and session/worker
Close/Kill via `IDashboardSessionAdminService` → `ISessionManager`. Every
destructive action passes through the shared `ConfirmDialog` component
before reaching its service.
## Related Documentation
- [Dashboard Interface Design](./DashboardInterfaceDesign.md)
- [Gateway Process Detailed Design](./GatewayProcessDesign.md)
- [Authentication](./Authentication.md)
- [Authorization](./Authorization.md)
- [Sessions](./Sessions.md)
- [Metrics](./Metrics.md)
- [Diagnostics](./Diagnostics.md)