mxaccessgw/docs/GatewayDashboardDesign.md
Joseph Doherty e80f3c70b6 docs: cover admin dashboard actions + API key Delete
Update the design docs so they match the implemented Admin-only
dashboard surface. GatewayDashboardDesign now documents the Close
session / Kill worker controls and the new Delete action on revoked
API keys, plus the ConfirmDialog gate for every destructive action.
Sessions.md adds the SessionManager.KillWorkerAsync entry alongside
CloseSessionAsync and explains the immediate-kill semantics.
Authentication.md adds the IApiKeyAdminStore.DeleteAsync write path
and the dashboard-delete-key audit event. DashboardInterfaceDesign
drops the "read-only until admin workflows have a separate design"
line in favor of the confirm-before-act invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:35:25 -04:00

Gateway Dashboard Detailed Design

Purpose

The gateway should host a basic web dashboard for operators and developers. For v1 the dashboard is primarily a diagnostic and operational-visibility surface: it shows gateway health, active MXAccess worker instances, session state, and basic statistics in real time. The only write paths are the Admin-gated actions described under Pages.

Technology Choice

Decision: Blazor Server with Bootstrap CSS/JS.

Allowed UI stack:

  • ASP.NET Core Blazor Server,
  • Bootstrap CSS,
  • Bootstrap JavaScript,
  • small local CSS for layout and status styling,
  • built-in Blazor components.

Not allowed for v1:

  • MudBlazor,
  • Radzen,
  • Syncfusion,
  • Telerik,
  • other Blazor UI component libraries,
  • client-side SPA framework replacement.

Rationale: Blazor Server keeps the dashboard in the gateway process, avoids a separate frontend build, and gives real-time UI updates through the Blazor SignalR circuit. Bootstrap is sufficient for a basic dashboard.

Hosting Model

The dashboard is hosted by ZB.MOM.WW.MxGateway.Server alongside the gRPC API. When MxGateway:Dashboard:Enabled is true, MapGatewayDashboard() mounts the Blazor Server app at the host root and registers the login, logout, denied, SignalR hub, and hub-token endpoints beside it. When dashboard hosting is disabled, none of those routes are mapped — the same listener still serves gRPC.

Endpoint layout:

/
/sessions
/sessions/{sessionId}
/workers
/events
/alarms
/galaxy
/browse
/apikeys
/settings
/login           (POST also)
/logout          (POST)
/denied
/hubs/snapshot
/hubs/alarms
/hubs/events
/hubs/token
/_blazor

The /galaxy page surfaces the Galaxy Repository browse summary (deployed object hierarchy size, last deploy timestamp, attribute totals, template usage, and connectivity sync info). The summary is fed by GalaxySummaryCache, which is refreshed off the request path by GalaxySummaryRefreshService on the MxGateway:Galaxy:DashboardRefreshIntervalSeconds cadence so the dashboard never blocks on SQL. See Galaxy Repository Browse for the underlying gRPC service.

High-Level Components

ZB.MOM.WW.MxGateway.Server
  Dashboard/
    Components/
      App.razor
      Routes.razor
      DashboardPageBase.cs
      DashboardDisplay.cs
      Layout/
        DashboardLayout.razor
      Pages/
        DashboardHome.razor
        SessionsPage.razor
        SessionDetailsPage.razor
        WorkersPage.razor
        EventsPage.razor
        ApiKeysPage.razor
        SettingsPage.razor
      Shared/
        MetricCard.razor
        StatusBadge.razor
        FaultList.razor
    DashboardSnapshotService.cs
    DashboardAuthorizationHandler.cs
    DashboardAuthenticator.cs
    DashboardApiKeyAuthorization.cs
    DashboardApiKeyManagementService.cs
    DashboardApiKeySummary.cs
    DashboardSnapshot.cs
    DashboardSessionSummary.cs
    DashboardWorkerSummary.cs
    DashboardMetricSummary.cs

The dashboard exposes three named SignalR hubs in addition to Blazor Server's internal circuit; pages connect to those hubs from within the circuit via the DashboardHubConnectionFactory helper. The hubs publish snapshot, alarm, and per-session event updates that the pages render in place of polling.

Dashboard Data Source

The dashboard should consume read-only snapshots from gateway services:

  • SessionRegistry,
  • SessionManager,
  • WorkerClient,
  • GatewayMetrics,
  • health checks,
  • structured fault/event counters.

Do not let Razor components directly mutate gateway session or worker objects. Create a small read-only dashboard service that projects gateway state into plain DTOs.

GatewayMetrics.GetSnapshot() is the metrics input for the first dashboard projection. It carries current session and worker gauges, command and event counters, queue depth, and fault totals. The dashboard reads that snapshot instead of reading raw Meter instruments because exporter configuration is an operations concern, not a UI dependency.

Suggested service:

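using System.Collections.Generic;
using System.Threading;

public interface IDashboardSnapshotService
{
    // Returns the current immutable snapshot without waiting for the next tick.
    DashboardSnapshot GetSnapshot();
    // Streams a new snapshot on each update until cancellation is requested.
    IAsyncEnumerable<DashboardSnapshot> WatchSnapshotsAsync(
        CancellationToken cancellationToken);
}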

Snapshot updates can be driven by:

  • periodic timer, default every 1 second,
  • session lifecycle notifications,
  • worker heartbeat updates,
  • event counter updates,
  • fault notifications.

Use immutable snapshot DTOs so Razor components can render without locking gateway internals.
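
As a minimal sketch of the timer-driven case, the code below pairs immutable record DTOs with a PeriodicTimer loop. The names SnapshotSketch, SessionSummarySketch, and TimerSnapshotSource are hypothetical, and the fields shown are assumptions; the real DashboardSnapshot shape is not given in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical DTO shapes: records with immutable collections, so a
// component can render a snapshot without locking gateway internals.
public sealed record SessionSummarySketch(string SessionId, string State);

public sealed record SnapshotSketch(
    DateTimeOffset CapturedAt,
    int OpenSessions,
    ImmutableArray<SessionSummarySketch> Sessions);

// Hypothetical timer-driven source. The projection delegate stands in for
// whatever reads SessionRegistry, GatewayMetrics, and the other inputs.
public sealed class TimerSnapshotSource
{
    private readonly Func<SnapshotSketch> _project;
    private readonly TimeSpan _interval;

    public TimerSnapshotSource(Func<SnapshotSketch> project, TimeSpan interval)
    {
        _project = project;
        _interval = interval;
    }

    public SnapshotSketch GetSnapshot() => _project();

    public async IAsyncEnumerable<SnapshotSketch> WatchSnapshotsAsync(
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        using var timer = new PeriodicTimer(_interval);

        // Emit the current state first so a new watcher is never empty.
        yield return _project();

        // WaitForNextTickAsync throws OperationCanceledException when the
        // token is cancelled, which ends the stream for the consumer.
        while (await timer.WaitForNextTickAsync(cancellationToken))
        {
            yield return _project();
        }
    }
}
```

Lifecycle, heartbeat, and fault notifications could feed the same stream by signalling an extra wake-up; this sketch covers only the periodic trigger.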

Realtime Updates

Updates flow over three SignalR hubs, all guarded by the MxGateway.Dashboard.HubClients policy (cookie OR MxGateway.Dashboard.HubToken bearer). Each hub class is [Authorize(Policy = HubClientsPolicy)].

DashboardSnapshotHub

  • Path: /hubs/snapshot
  • Producer: DashboardSnapshotPublisher (BackgroundService consuming IDashboardSnapshotService.WatchSnapshotsAsync)
  • Payload: DashboardSnapshot
  • Routing: sent to all connected clients on every snapshot tick; new connections receive the current snapshot synchronously in OnConnectedAsync.

AlarmsHub

  • Path: /hubs/alarms
  • Producer: AlarmsHubPublisher (BackgroundService consuming IGatewayAlarmService.StreamAsync(filter: null))
  • Payload: AlarmFeedMessage (active_alarm / snapshot_complete / transition)
  • Routing: connected clients auto-join __alarms__; all clients receive every message. The publisher auto-reconnects every 5s on stream faults.

EventsHub

  • Path: /hubs/events
  • Producer: DashboardEventBroadcaster, invoked by EventStreamService for each event it forwards to a gRPC client
  • Payload: MxEvent
  • Routing: clients call SubscribeSession(sessionId) to join session:{id}. Events appear only while a gRPC client is also consuming that session's events; the dashboard is a passive mirror, not a separate worker subscriber.

DashboardPageBase opens a DashboardSnapshotHub connection via the connection factory in OnInitializedAsync, seeds Snapshot synchronously from IDashboardSnapshotService.GetSnapshot() so the first render is non-empty, and calls InvokeAsync(StateHasChanged) on every SnapshotUpdated push. SignalR's WithAutomaticReconnect handles transient disconnects.

SessionDetailsPage additionally opens an EventsHub connection for the current session id and renders the most recent N events (default 50) in a "Recent events" table with a live/offline connection pill.
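
The "most recent N events" behaviour amounts to a bounded buffer that drops its oldest entry when full. The sketch below is one way to express that; RecentEventBuffer is a hypothetical name, and the page's actual storage is not described in this document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bounded buffer for a "Recent events" table.
// Keeps at most `capacity` items and evicts the oldest first.
public sealed class RecentEventBuffer<T>
{
    private readonly Queue<T> _items = new();
    private readonly object _gate = new();
    private readonly int _capacity;

    public RecentEventBuffer(int capacity = 50)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    // Called from the hub callback, which may run on a different thread
    // than the renderer, hence the lock.
    public void Add(T item)
    {
        lock (_gate)
        {
            if (_items.Count == _capacity) _items.Dequeue();
            _items.Enqueue(item);
        }
    }

    // Returns an oldest-first copy that is safe to enumerate while rendering.
    public IReadOnlyList<T> Snapshot()
    {
        lock (_gate)
        {
            return _items.ToArray();
        }
    }
}
```

Rendering from the copy returned by Snapshot() keeps the table stable even if new events arrive mid-render.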

Default cadences:

  • snapshot service produces one snapshot per MxGateway:Dashboard:SnapshotIntervalMilliseconds (default 1000 ms, i.e. one per second);
  • alarm publisher emits on each transition observed by the central monitor;
  • event publisher emits per event forwarded by StreamEvents.

Avoid pushing every MXAccess data-change event into a wider broadcast group. The current design routes events strictly through session:{id} groups; the snapshot hub continues to carry aggregate event counters and rates.

Pages

Dashboard home

Show top-level status:

  • gateway status,
  • gateway version,
  • uptime,
  • open sessions,
  • workers running,
  • sessions faulted,
  • command rate,
  • command failure count,
  • event rate,
  • event queue depth,
  • worker restart/kill count.

Use Bootstrap cards for individual metric summaries. Keep the layout compact and operational.

Sessions page

Show active and recent sessions in a table:

  • session id,
  • client identity or API key display name,
  • state,
  • backend,
  • worker process id,
  • open time,
  • last client activity,
  • last worker heartbeat,
  • active event subscribers,
  • pending commands,
  • event queue depth,
  • last fault summary.

Rows should link to session details.

Session details page

Show:

  • session metadata,
  • worker metadata,
  • command counters by method,
  • event counters by family,
  • active server handles and item counts if gateway shadow state has them,
  • latest faults,
  • last heartbeat payload,
  • admin Close session / Kill worker controls (Admin role only).

The Sessions list, the Workers list, and this details page all render the same admin controls when the signed-in principal carries the Admin role; viewers and the localhost-anonymous bypass see no action affordances and the server re-checks the role on every invocation. Every destructive admin action is gated by a confirmation dialog before it reaches ISessionManager.

  • Close session routes through ISessionManager.CloseSessionAsync: the worker is asked to shut down gracefully and is killed only as a fallback if shutdown fails.
  • Kill worker routes through ISessionManager.KillWorkerAsync: the worker is killed immediately with no graceful-shutdown attempt. The session is removed from the registry and the open-session slot is released either way.
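
The difference between the two actions can be sketched as follows. Everything here is hypothetical (IWorkerHandleSketch, SessionAdminSketch, RecordingWorker, the shutdown timeout); it illustrates the graceful-then-fallback path versus the immediate kill, not the real ISessionManager implementation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical worker abstraction.
public interface IWorkerHandleSketch
{
    Task ShutdownAsync(CancellationToken cancellationToken);
    void Kill();
}

public sealed class SessionAdminSketch
{
    private readonly ConcurrentDictionary<string, IWorkerHandleSketch> _registry = new();
    private readonly TimeSpan _shutdownTimeout; // assumed knob, not from the design

    public SessionAdminSketch(TimeSpan shutdownTimeout) => _shutdownTimeout = shutdownTimeout;

    public void Register(string sessionId, IWorkerHandleSketch worker) => _registry[sessionId] = worker;
    public bool Contains(string sessionId) => _registry.ContainsKey(sessionId);

    // Close: ask for a graceful shutdown, kill only if that fails.
    public async Task<bool> CloseSessionAsync(string sessionId)
    {
        // Removing first means the session leaves the registry either way.
        if (!_registry.TryRemove(sessionId, out var worker)) return false;
        try
        {
            using var cts = new CancellationTokenSource(_shutdownTimeout);
            // The timeout only helps if the worker honours the token.
            await worker.ShutdownAsync(cts.Token);
        }
        catch (Exception)
        {
            worker.Kill();
        }
        return true;
    }

    // Kill: no graceful attempt at all.
    public Task<bool> KillWorkerAsync(string sessionId)
    {
        if (!_registry.TryRemove(sessionId, out var worker)) return Task.FromResult(false);
        worker.Kill();
        return Task.FromResult(true);
    }
}

// Test double used to observe which path ran.
public sealed class RecordingWorker : IWorkerHandleSketch
{
    private readonly bool _shutdownFails;
    public RecordingWorker(bool shutdownFails) => _shutdownFails = shutdownFails;
    public bool ShutdownRequested { get; private set; }
    public bool Killed { get; private set; }

    public Task ShutdownAsync(CancellationToken cancellationToken)
    {
        ShutdownRequested = true;
        return _shutdownFails
            ? Task.FromException(new InvalidOperationException("shutdown failed"))
            : Task.CompletedTask;
    }

    public void Kill() => Killed = true;
}
```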

Workers page

Show:

  • worker process id,
  • session id,
  • executable path/version,
  • state,
  • startup duration,
  • memory and CPU if available,
  • last heartbeat,
  • current command correlation id,
  • pending command count,
  • event queue depth,
  • restart/kill reason if terminal.

Events page

Show aggregate event diagnostics:

  • event rate by session,
  • event rate by event family,
  • total events since start,
  • queue overflow count,
  • stream disconnect count,
  • recent terminal faults.

Do not display full tag values by default. If value display is later added, make it opt-in and redacted.

Browse page

/browse lets an operator explore the Galaxy tag hierarchy and watch live values. The tree is built in-process by DashboardBrowseTreeBuilder from IGalaxyHierarchyCache.Current (the same cache the Galaxy page reads), so a render costs no gRPC call and no SQL round-trip. Each node shows its child objects and, when expanded, its attributes with attribute name, data type (including array dimension), and the alarm / historized flags. Galaxy SQL carries no attribute description, so none is shown. A filter box switches the tree to a flat list of matching attributes.

Right-clicking an attribute (or double-clicking it) adds it to the subscription panel. The panel shows each subscribed tag's live value, MXAccess data type, quality and source timestamp, refreshed every two seconds. The subscription panel is the explicit opt-in tag-value surface: it always shows values regardless of Dashboard:ShowTagValues, which continues to govern only the diagnostic session/worker views.

Alarms page

/alarms lists the alarms the gateway's central alarm monitor currently holds as Active or ActiveAcked, refreshed every three seconds. It defaults to showing unacknowledged Active alarms; filters add acknowledged alarms and narrow by area, severity range, and a reference/source/description text search. Cleared alarms are not retained: the gateway holds no alarm-history store, so the page reflects only the live active set. The page is read-only; it does not acknowledge alarms. If MxGateway:Alarms:Enabled is false the central monitor never starts, and the page says so instead of showing an empty list with no explanation.

Live data source

Both the Browse subscription panel and the Alarms page read live MXAccess data through IDashboardLiveDataService (DashboardLiveDataService). For tag data it owns one shared gateway session for the whole dashboard, opened lazily on first use via ISessionManager and re-opened transparently when it faults or its lease expires. One session means one worker process backs every dashboard circuit; all access is serialised so the worker sees one in-flight command at a time. Tag reads go through GatewaySession.SubscribeBulkAsync / ReadBulkAsync.
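
A minimal sketch of the "one shared session, lazily opened, serialised access" pattern follows. SharedSessionGate and its delegates are hypothetical stand-ins; the real service opens the session via ISessionManager and detects faults or lease expiry in ways this document does not spell out.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical gate around a single shared session.
public sealed class SharedSessionGate<TSession> where TSession : class
{
    private readonly Func<CancellationToken, Task<TSession>> _open;
    private readonly Func<TSession, bool> _isUsable;
    private readonly SemaphoreSlim _gate = new(1, 1);
    private TSession? _session;

    public SharedSessionGate(
        Func<CancellationToken, Task<TSession>> open,
        Func<TSession, bool> isUsable)
    {
        _open = open;
        _isUsable = isUsable;
    }

    public async Task<TResult> RunAsync<TResult>(
        Func<TSession, CancellationToken, Task<TResult>> command,
        CancellationToken cancellationToken = default)
    {
        // One permit: the worker sees at most one in-flight command.
        await _gate.WaitAsync(cancellationToken);
        try
        {
            // Open on first use, and re-open if the session has faulted
            // or its lease has expired (as reported by _isUsable).
            if (_session is null || !_isUsable(_session))
            {
                _session = await _open(cancellationToken);
            }
            return await command(_session, cancellationToken);
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Because every circuit funnels through the same gate, a slow command delays the others; that is the cost of backing the whole dashboard with one worker process.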

The Alarms page does not use the dashboard session: alarm data comes from the gateway's always-on central monitor. QueryAlarmsAsync reads IGatewayAlarmService.CurrentAlarms — the monitor's in-process cache — so the dashboard sees the same active-alarm set as every StreamAlarms client, with no per-dashboard alarm subscription. When MxGateway:Alarms:Enabled is false the monitor never starts and the cache stays empty.

API keys page

/apikeys lists the gateway's API keys and, for authorized operators, manages them. It reads key metadata through the same IApiKeyAdminStore the apikey CLI uses, so the dashboard and the CLI act on one source of truth.

The table shows one row per key:

  • key id,
  • status (Active or Revoked),
  • display name,
  • scopes,
  • constraints (rendered as unconstrained when none are set),
  • created timestamp,
  • last-used timestamp.

Key secrets are never listed. Only the peppered hash is stored, and the page never reconstructs a key. See Authorization for what each constraint means and how it is enforced on the gRPC path.

Management actions

Create, Rotate, Revoke, and Delete controls render only when the signed-in user is authorized. DashboardApiKeyAuthorization.CanManage requires an authenticated principal carrying the Admin role claim (resolved at login from the user's LDAP groups via MxGateway:Dashboard:GroupToRole). A Viewer role can read the table but sees no action controls, and an anonymous localhost session shows the same read-only view.

  • Create opens a dialog for the key id, display name, scope checkboxes (the GatewayScopes catalog), and the optional constraint fields: read and write subtrees, read and write tag globs, browse subtrees, max write classification, and the read-alarm-only / read-historized-only flags.
  • Rotate issues a new secret for an existing key id and invalidates the old one. Active keys only — rotating a revoked key would un-revoke it, so the button is not shown on revoked rows.
  • Revoke marks a key revoked; a revoked key cannot be un-revoked.
  • Delete permanently removes a key row from the auth database, but only when the key is already revoked. IApiKeyAdminStore.DeleteAsync rejects active keys (returns false) so the revoke event lands in the audit log before the row disappears. Revoked rows show a Delete button in place of the previous "No actions" placeholder.

Every destructive action (Rotate / Revoke / Delete) is gated by the shared ConfirmDialog component before reaching the service; Create uses its own form modal as the implicit confirmation step.
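
The revoke-before-delete rule can be illustrated with a small in-memory store. InMemoryKeyStoreSketch and KeyStatusSketch are hypothetical; only the behaviour (DeleteAsync returns false for an active key, and a revoked key cannot be un-revoked) comes from the description above.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public enum KeyStatusSketch { Active, Revoked }

// Hypothetical in-memory stand-in for the key store.
public sealed class InMemoryKeyStoreSketch
{
    private readonly ConcurrentDictionary<string, KeyStatusSketch> _keys = new();

    public bool Create(string keyId) => _keys.TryAdd(keyId, KeyStatusSketch.Active);

    // Active -> Revoked only; there is no path back to Active.
    public Task<bool> RevokeAsync(string keyId) =>
        Task.FromResult(_keys.TryUpdate(keyId, KeyStatusSketch.Revoked, KeyStatusSketch.Active));

    // Removes the row only if it is currently Revoked. The pair overload of
    // TryRemove compares key and value atomically, so an active key is
    // rejected (false) rather than deleted.
    public Task<bool> DeleteAsync(string keyId)
    {
        var revokedRow = new KeyValuePair<string, KeyStatusSketch>(keyId, KeyStatusSketch.Revoked);
        return Task.FromResult(_keys.TryRemove(revokedRow));
    }
}
```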

Create and Rotate return the assembled mxgw_<keyId>_<secret> token once, in a one-time banner. It is never shown again, so the operator must copy it immediately. This mirrors the apikey create-key / rotate-key CLI.
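
For illustration, the token shape can be sketched as below. Only the mxgw_<keyId>_<secret> layout and the fact that a peppered hash is stored come from this document; the secret length, hex encoding, and HMAC-SHA256 as the peppered hash are assumptions, and ApiKeyTokenSketch is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ApiKeyTokenSketch
{
    // ASSUMPTION: 32 random bytes, hex-encoded. The real secret format
    // is not specified in this document.
    public static string NewSecret(int byteLength = 32) =>
        Convert.ToHexString(RandomNumberGenerator.GetBytes(byteLength));

    // Layout taken from the document: mxgw_<keyId>_<secret>.
    public static string Assemble(string keyId, string secret) =>
        $"mxgw_{keyId}_{secret}";

    // ASSUMPTION: HMAC-SHA256 keyed with the pepper stands in for the
    // "peppered hash". Only this value would be persisted, never the secret.
    public static string PepperedHash(string secret, byte[] pepper)
    {
        using var hmac = new HMACSHA256(pepper);
        return Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(secret)));
    }
}
```

Since the store holds only the hash, the assembled token exists solely in the one-time banner, which is why it cannot be shown again.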

Every management action appends an api_key_audit entry (dashboard-create-key, dashboard-rotate-key, dashboard-revoke-key, dashboard-delete-key) with the key id and the caller's remote address. Secrets and pepper values are never logged.

Settings page

Show read-only effective configuration:

  • worker executable path,
  • configured timeouts,
  • queue capacities,
  • auth mode,
  • SQLite auth database path with sensitive parts redacted if needed,
  • dashboard enabled state,
  • protocol version.

Do not show API key secrets or pepper values.

Authentication And Authorization

Dashboard authentication is LDAP-backed, distinct from the API-key model used on the gRPC API. Users sign in with directory credentials; the gateway maps their LDAP groups to one of two dashboard roles (Admin or Viewer) and issues a cookie carrying those role claims.

Implemented behavior:

  • a static /login HTML form posts username/password to the gateway;
  • DashboardAuthenticator binds against MxGateway:Ldap (service-account bind, user search, candidate bind) using Novell.Directory.Ldap.NETStandard;
  • the user's memberOf (or short CN) is matched against MxGateway:Dashboard:GroupToRole; the resolved role(s) are emitted as ClaimTypes.Role claims, alongside the per-group mxgateway:ldap_group claims;
  • a successful login signs in the MxGateway.Dashboard cookie scheme (__Host-MxGatewayDashboard, HttpOnly, SameSite=Strict, Secure);
  • a user with no matching group cannot sign in — the login screen returns the generic credential-rejected message;
  • antiforgery tokens guard the login and logout POSTs.
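
The group-to-role step can be sketched as a lookup that tries the full memberOf value first and the short CN second. GroupRoleMapperSketch is hypothetical, and the CN extraction is deliberately naive (it does not handle escaped commas in distinguished names).

```csharp
using System;
using System.Collections.Generic;

public static class GroupRoleMapperSketch
{
    // "CN=GwAdmin,OU=Groups,DC=example,DC=com" -> "GwAdmin".
    // Values without a leading "CN=" are returned unchanged.
    public static string ShortName(string group)
    {
        if (!group.StartsWith("CN=", StringComparison.OrdinalIgnoreCase)) return group;
        var comma = group.IndexOf(',');
        return comma < 0 ? group[3..] : group[3..comma];
    }

    // Returns the distinct roles mapped from the user's groups.
    // An empty result corresponds to "no matching group, cannot sign in".
    public static IReadOnlySet<string> ResolveRoles(
        IEnumerable<string> memberOf,
        IReadOnlyDictionary<string, string> groupToRole)
    {
        var roles = new HashSet<string>(StringComparer.Ordinal);
        foreach (var group in memberOf)
        {
            if (groupToRole.TryGetValue(group, out var role) ||
                groupToRole.TryGetValue(ShortName(group), out role))
            {
                roles.Add(role);
            }
        }
        return roles;
    }
}
```

Whether group names compare case-insensitively depends on the comparer of the dictionary passed in; the sketch leaves that choice to the caller.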

Three authorization policies are registered:

  • MxGateway.Dashboard.Viewer — Razor component routes. Satisfied by Admin or Viewer.
  • MxGateway.Dashboard.Admin — Admin-only write surfaces (API-key CRUD).
  • MxGateway.Dashboard.HubClients — SignalR hubs. Accepts the dashboard cookie OR a MxGateway.Dashboard.HubToken bearer (used by WebSocket upgrades where the cookie can't be forwarded).

Two environmental bypasses still apply: MxGateway:Authentication:Mode = Disabled authorizes every request, and MxGateway:Dashboard:AllowAnonymousLocalhost (default true) authorizes any loopback request without a role check. Remote requests always require an authenticated principal carrying at least the Viewer role.
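
The order of those checks can be written out as a pure function. This is a sketch of the decision described above, not the real DashboardAuthorizationHandler; the enum and its Enabled member are hypothetical names (the document only names the Disabled mode).

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical mode enum; only "Disabled" is named in the design.
public enum AuthModeSketch { Enabled, Disabled }

public static class ViewerAccessSketch
{
    public static bool IsAllowed(
        AuthModeSketch mode,
        bool allowAnonymousLocalhost,
        IPAddress? remoteAddress,
        IEnumerable<string> roles)
    {
        // Bypass 1: authentication disabled authorizes every request.
        if (mode == AuthModeSketch.Disabled) return true;

        // Bypass 2: loopback requests skip the role check when allowed.
        if (allowAnonymousLocalhost &&
            remoteAddress is not null &&
            IPAddress.IsLoopback(remoteAddress))
        {
            return true;
        }

        // Everything else needs at least the Viewer role.
        return roles.Any(r => r is "Admin" or "Viewer");
    }
}
```

Note that this covers only the Viewer-level decision; the Admin-only actions re-check the Admin role and are not satisfied by the localhost bypass.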

Hub bearer flow

SignalR connections cannot rely on the __Host- cookie when the JS client upgrades to WebSocket: in some edge cases the cookie's SameSite=Strict; Path=/ attributes keep the browser's WebSocket layer from forwarding it. The dashboard therefore mints short-lived bearer tokens for the connection:

  1. The cookie-authenticated Blazor page calls GET /hubs/token (gated by ViewerPolicy, cookie-only).
  2. HubTokenService.Issue(user) serializes the user's name, NameIdentifier, and role claims to JSON, encrypts with the ASP.NET Core data-protection time-limited protector under purpose ZB.MOM.WW.MxGateway.Dashboard.HubToken.v1, and returns the protected string. Lifetime is 30 minutes.
  3. The SignalR client passes the token as either Authorization: Bearer … or ?access_token=… (WebSocket upgrade query string).
  4. HubTokenAuthenticationHandler validates the protected payload and rebuilds the ClaimsPrincipal with the carried roles.
  5. The hubs' [Authorize(Policy = HubClientsPolicy)] accepts the resulting identity.

DashboardHubConnectionFactory (scoped to the Blazor circuit) wraps the HubConnectionBuilder and supplies a fresh token via AccessTokenProvider on every (re)connect.

Configuration

Effective configuration:

{
  "MxGateway": {
    "Dashboard": {
      "Enabled": true,
      "AllowAnonymousLocalhost": true,
      "SnapshotIntervalMilliseconds": 1000,
      "RecentFaultLimit": 100,
      "RecentSessionLimit": 200,
      "ShowTagValues": false,
      "GroupToRole": {
        "GwAdmin": "Admin",
        "GwReader": "Viewer"
      }
    }
  }
}
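
As a rough illustration of how this section maps onto a typed options object, the sketch below reads it with System.Text.Json. This swaps in plain JSON parsing for whatever configuration binding the gateway really uses; DashboardOptionsSketch is a hypothetical class, and its initializer defaults simply mirror the fragment above (only the AllowAnonymousLocalhost and snapshot-interval defaults are stated elsewhere in this document).

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical options class mirroring the MxGateway:Dashboard section.
public sealed class DashboardOptionsSketch
{
    public bool Enabled { get; set; } = true;
    public bool AllowAnonymousLocalhost { get; set; } = true;
    public int SnapshotIntervalMilliseconds { get; set; } = 1000;
    public int RecentFaultLimit { get; set; } = 100;
    public int RecentSessionLimit { get; set; } = 200;
    public bool ShowTagValues { get; set; }
    public Dictionary<string, string> GroupToRole { get; set; } = new();

    // Navigates to MxGateway -> Dashboard and deserializes that element.
    // Properties missing from the JSON keep their initializer values.
    public static DashboardOptionsSketch Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var section = doc.RootElement
            .GetProperty("MxGateway")
            .GetProperty("Dashboard");
        return section.Deserialize<DashboardOptionsSketch>() ?? new DashboardOptionsSketch();
    }
}
```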

See Gateway Configuration for the full option table and the policies/hubs that derive from these values.

Security Rules

  • Do not display API key secrets.
  • Do not display credential-bearing MXAccess command values.
  • Do not display full tag values by default.
  • Do not expose worker pipe names with nonce or sensitive details.
  • Protect dashboard auth cookies with HttpOnly, Secure, and SameSite.
  • Require TLS for remote dashboard access.
  • Use anti-forgery protection for login/logout and for any admin actions.

Styling

The dashboard serves Bootstrap 5.3.3 assets from src/ZB.MOM.WW.MxGateway.Server/wwwroot/lib/bootstrap/ and local layout/status styling from src/ZB.MOM.WW.MxGateway.Server/wwwroot/css/dashboard.css.

Recommended visual language:

  • compact tables,
  • status badges,
  • metric cards,
  • Bootstrap alerts for faults,
  • restrained colors,
  • no decorative hero sections,
  • no charting dependency for v1.

If charts are added later, prefer simple server-generated data tables first. Do not add a JavaScript charting dependency without a specific need.

The reusable visual rules for replicating this interface in other projects are documented in Dashboard Interface Design.

Testing

Dashboard unit/component tests should cover:

  • snapshot projection,
  • dashboard auth authorization decisions,
  • login LDAP credential validation behavior,
  • pages render with empty state,
  • pages render with active sessions,
  • pages render with faulted sessions,
  • realtime subscription disposal,
  • redaction of API keys and credential values.

Use bUnit if component testing is added. Otherwise keep the first tests focused on snapshot services and authorization logic.

Integration tests should verify:

  • dashboard disabled returns not found or configured fallback,
  • dashboard requires auth when enabled,
  • a user in an Admin-mapped LDAP group can access the dashboard and the API-key CRUD surface,
  • a user in a Viewer-mapped LDAP group can render every page but cannot invoke the Admin-only management actions,
  • a user with no mapped LDAP group cannot sign in at all,
  • live snapshot updates when a fake session changes state are delivered via the /hubs/snapshot push, not by polling.

Initial Implementation Slice

The first dashboard slice implements:

  1. Blazor Server hosting in ZB.MOM.WW.MxGateway.Server.
  2. local Bootstrap static assets.
  3. dashboard configuration binding.
  4. dashboard auth using LDAP bind + role-mapped HTTP-only cookie.
  5. DashboardSnapshotService projecting gateway state for read views.
  6. home page with metric cards.
  7. sessions page with active session table and session details.
  8. workers page with worker table.
  9. events page with aggregate counters.
  10. settings page with redacted effective configuration.
  11. periodic realtime refresh through Blazor Server.
  12. route-mapping tests, disabled-dashboard tests, auth tests, and snapshot projection/redaction tests.

Subsequent slices added Admin-gated management actions: API-key Create/Rotate/Revoke (and Delete on revoked keys), and session/worker Close/Kill via IDashboardSessionAdminService, which routes to ISessionManager. Every destructive action passes through the shared ConfirmDialog component before reaching its service.