mxaccessgw/docs/gateway-dashboard-design.md
2026-04-26 16:10:58 -04:00


# Gateway Dashboard Detailed Design
## Purpose
The gateway should host a basic web dashboard for operators and developers. For
v1 the dashboard is limited to diagnostics and operational visibility. It should
show gateway health, active MXAccess worker instances, session state, and basic
statistics in real time.
## Technology Choice
Decision: Blazor Server with Bootstrap CSS/JS.
Allowed UI stack:
- ASP.NET Core Blazor Server,
- Bootstrap CSS,
- Bootstrap JavaScript,
- small local CSS for layout and status styling,
- built-in Blazor components.
Not allowed for v1:
- MudBlazor,
- Radzen,
- Syncfusion,
- Telerik,
- other Blazor UI component libraries,
- client-side SPA framework replacement.
Rationale: Blazor Server keeps the dashboard in the gateway process, avoids a
separate frontend build, and gives real-time UI updates through the Blazor
SignalR circuit. Bootstrap is sufficient for a basic dashboard.
## Hosting Model
The dashboard is hosted by `MxGateway.Server` alongside the gRPC API.
Suggested endpoint layout:
```text
/dashboard
/dashboard/sessions
/dashboard/sessions/{sessionId}
/dashboard/workers
/dashboard/events
/dashboard/settings
/_blazor
```
The app should redirect `/` to `/dashboard` only if the deployment wants the
dashboard as the default web page. Otherwise leave gRPC/API hosting unaffected.
## High-Level Components
```text
MxGateway.Server
  Dashboard/
    Components/
      App.razor
      Routes.razor
      Layout/
        DashboardLayout.razor
        NavMenu.razor
      Pages/
        DashboardHome.razor
        SessionsPage.razor
        SessionDetailsPage.razor
        WorkersPage.razor
        EventsPage.razor
        SettingsPage.razor
      Components/
        MetricCard.razor
        SessionTable.razor
        WorkerTable.razor
        EventRatePanel.razor
        FaultList.razor
    Services/
      DashboardSnapshotService.cs
      DashboardUpdateHub.cs
      DashboardAuthorization.cs
    Models/
      DashboardSnapshot.cs
      SessionSummary.cs
      WorkerSummary.cs
      MetricSummary.cs
```
`DashboardUpdateHub` is an internal application update service, not a separate
public SignalR hub. Add a dedicated hub only if implementation proves one is
needed, because Blazor Server already uses SignalR for its UI circuits.
## Dashboard Data Source
The dashboard should consume read-only snapshots from gateway services:
- `SessionRegistry`,
- `SessionManager`,
- `WorkerClient`,
- `GatewayMetrics`,
- health checks,
- structured fault/event counters.
Do not let Razor components directly mutate gateway session or worker objects.
Create a small read-only dashboard service that projects gateway state into
plain DTOs.
`GatewayMetrics.GetSnapshot()` is the metrics input for the first dashboard
projection. It carries current session and worker gauges, command and event
counters, queue depth, and fault totals. The dashboard reads that snapshot
instead of reading raw `Meter` instruments because exporter configuration is an
operations concern, not a UI dependency.
Suggested service:
```csharp
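using System.Collections.Generic;
using System.Threading;

public interface IDashboardSnapshotService
{
    // Returns the most recent snapshot without waiting for an update.
    DashboardSnapshot GetSnapshot();

    // Streams snapshots until the caller cancels the token.
    IAsyncEnumerable<DashboardSnapshot> WatchSnapshotsAsync(
        CancellationToken cancellationToken);
}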
```
Snapshot updates can be driven by:
- periodic timer, default every 1 second,
- session lifecycle notifications,
- worker heartbeat updates,
- event counter updates,
- fault notifications.
Use immutable snapshot DTOs so Razor components can render without locking
gateway internals.
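
As a minimal sketch of how an immutable snapshot and a bounded channel could fit together, the example below uses a C# `record` and `System.Threading.Channels`. The field list on `DashboardSnapshot` is an illustrative subset, and `ChannelSnapshotFeed` and its `Publish` method are hypothetical names that do not come from this design. The sketch supports a single reader for brevity; a real service would need one channel per subscriber.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;

// Illustrative subset of fields; the real DTO shape is not specified here.
public sealed record DashboardSnapshot(
    DateTimeOffset CapturedAt,
    int OpenSessions,
    int WorkersRunning,
    long EventsTotal);

// Hypothetical single-reader feed that mirrors the two interface methods.
public sealed class ChannelSnapshotFeed
{
    // Capacity 1 with DropOldest: a slow reader only ever sees the newest snapshot.
    private readonly Channel<DashboardSnapshot> _channel =
        Channel.CreateBounded<DashboardSnapshot>(new BoundedChannelOptions(1)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
            SingleReader = true,
        });

    private volatile DashboardSnapshot _latest =
        new(DateTimeOffset.MinValue, 0, 0, 0);

    public DashboardSnapshot GetSnapshot() => _latest;

    // Assumed entry point, called by a timer tick or a lifecycle notification.
    public void Publish(DashboardSnapshot snapshot)
    {
        _latest = snapshot;
        _channel.Writer.TryWrite(snapshot);
    }

    public IAsyncEnumerable<DashboardSnapshot> WatchSnapshotsAsync(
        CancellationToken cancellationToken) =>
        _channel.Reader.ReadAllAsync(cancellationToken);
}
```

Because the record is immutable, a component can hold a reference to a snapshot and render from it while the feed publishes newer ones.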
## Realtime Updates
Use Blazor Server component state updates for real-time dashboard refresh.
Recommended pattern:
1. Page/component subscribes to `WatchSnapshotsAsync`.
2. Snapshot service emits updates from a bounded channel or timer.
3. Component stores the latest snapshot.
4. Component calls `InvokeAsync(StateHasChanged)`.
5. Component cancels subscription on dispose.
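
The five steps above can be sketched without any Blazor dependency. `SnapshotSubscription<T>` is a hypothetical helper, not part of this design; in a component, `T` would be `DashboardSnapshot`, `watch` would be `WatchSnapshotsAsync`, and `onChanged` would be `() => InvokeAsync(StateHasChanged)`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper that owns one subscription for one component.
public sealed class SnapshotSubscription<T> : IDisposable where T : class
{
    private readonly CancellationTokenSource _cts = new();
    private volatile T? _latest;

    public T? Latest => _latest;

    // Completes when the stream ends or the subscription is disposed.
    public Task Completion { get; }

    public SnapshotSubscription(
        Func<CancellationToken, IAsyncEnumerable<T>> watch,
        Func<Task> onChanged)
    {
        Completion = PumpAsync(watch, onChanged);   // step 1: subscribe
    }

    private async Task PumpAsync(
        Func<CancellationToken, IAsyncEnumerable<T>> watch,
        Func<Task> onChanged)
    {
        try
        {
            await foreach (var snapshot in watch(_cts.Token))
            {
                _latest = snapshot;   // step 3: store the latest snapshot
                await onChanged();    // step 4: request a re-render
            }
        }
        catch (OperationCanceledException)
        {
            // Expected when the owner is disposed.
        }
    }

    public void Dispose()
    {
        _cts.Cancel();                // step 5: cancel on dispose
        _cts.Dispose();
    }
}
```

A component would create the subscription in `OnInitialized` and dispose it from its own `Dispose` method, so a closed circuit does not keep a reader alive.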
Default update cadence:
- immediate update on session create/close/fault,
- immediate update on worker fault,
- periodic metrics refresh every 1 second,
- event-rate windows updated every 1 second.
Avoid pushing every MXAccess data-change event to the dashboard. Aggregate event
counts and rates instead.
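
One way to aggregate instead of forwarding each event is a counter that the event path only increments and that a 1-second sampler converts into a rate. `EventRateCounter` is a hypothetical name, and the sketch assumes `Sample` is called from a single sampling thread.

```csharp
using System;
using System.Threading;

// Hypothetical aggregator: the hot path only increments a counter.
public sealed class EventRateCounter
{
    private long _total;
    private long _lastSampledTotal;

    public long Total => Interlocked.Read(ref _total);

    // Called for every MXAccess event; no dashboard work happens here.
    public void Increment() => Interlocked.Increment(ref _total);

    // Called once per sampling interval; returns events per second
    // since the previous call. Assumes a single sampling thread.
    public double Sample(TimeSpan elapsed)
    {
        if (elapsed <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(elapsed));
        }

        long total = Interlocked.Read(ref _total);
        long delta = total - _lastSampledTotal;
        _lastSampledTotal = total;
        return delta / elapsed.TotalSeconds;
    }
}
```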
## Pages
### Dashboard Home
Show top-level status:
- gateway status,
- gateway version,
- uptime,
- open sessions,
- workers running,
- sessions faulted,
- command rate,
- command failure count,
- event rate,
- event queue depth,
- worker restart/kill count.
Use Bootstrap cards for individual metric summaries. Keep the layout compact
and operational.
### Sessions Page
Show active and recent sessions in a table:
- session id,
- client identity or API key display name,
- state,
- backend,
- worker process id,
- open time,
- last client activity,
- last worker heartbeat,
- active event subscribers,
- pending commands,
- event queue depth,
- last fault summary.
Rows should link to session details.
### Session Details Page
Show:
- session metadata,
- worker metadata,
- command counters by method,
- event counters by family,
- active server handles and item counts if gateway shadow state has them,
- latest faults,
- last heartbeat payload,
- close/kill controls only if admin actions are later enabled.
For v1, details should be read-only unless an explicit admin action design is
added.
### Workers Page
Show:
- worker process id,
- session id,
- executable path/version,
- state,
- startup duration,
- memory and CPU if available,
- last heartbeat,
- current command correlation id,
- pending command count,
- event queue depth,
- restart/kill reason if terminal.
### Events Page
Show aggregate event diagnostics:
- event rate by session,
- event rate by event family,
- total events since start,
- queue overflow count,
- stream disconnect count,
- recent terminal faults.
Do not display full tag values by default. If value display is later added, make
it opt-in and redacted.
### Settings Page
Show read-only effective configuration:
- worker executable path,
- configured timeouts,
- queue capacities,
- auth mode,
- SQLite auth database path with sensitive parts redacted if needed,
- dashboard enabled state,
- protocol version.
Do not show API key secrets or pepper values.
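
A small projection helper can enforce these rules before any value reaches a Razor component. The sketch below is an assumption about how redaction might look: `DashboardRedaction`, the `<redacted>` prefix, and the `(set)` / `(not set)` labels are all hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper used while building the settings DTO.
public static class DashboardRedaction
{
    // Keeps only the file name so directory layout and user names are hidden.
    public static string RedactPath(string? path) =>
        string.IsNullOrEmpty(path)
            ? "(not set)"
            : "<redacted>/" + Path.GetFileName(path);

    // Never returns any part of the secret, only whether one is configured.
    public static string RedactSecret(string? secret) =>
        string.IsNullOrEmpty(secret) ? "(not set)" : "(set)";
}
```

Applying redaction in the snapshot projection rather than in the page keeps secrets out of the DTOs entirely, which also makes the redaction tests independent of rendering.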
## Authentication And Authorization
Dashboard access should use the same API-key authentication model as gRPC where
practical.
Recommended v1 behavior:
- dashboard disabled by default unless configured,
- when enabled, require API key auth,
- require `admin` scope for dashboard access,
- accept API key through a secure cookie established by a simple login form, or
through reverse-proxy/header configuration for local deployments,
- do not put API keys in query strings.
Simplest implementation path:
1. Add `/dashboard/login`.
2. User submits API key over HTTPS.
3. Gateway validates key and `admin` scope.
4. Gateway issues an HTTP-only secure auth cookie for the dashboard.
5. Dashboard pages require that cookie.
6. Logout clears the cookie.
For local development, allow an explicit `Dashboard:AllowAnonymousLocalhost`
option. It must default to false.
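
The access rules in this section can be kept in one pure function so they are easy to unit test. This is a sketch under assumptions: `DashboardAccess`, `DashboardAccessPolicy`, and the parameter names are hypothetical, and returning `NotFound` for a disabled dashboard is one of the two behaviors the testing section allows.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public enum DashboardAccess { NotFound, Challenge, Forbidden, Allowed }

// Hypothetical policy; inputs come from configuration and the current request.
public static class DashboardAccessPolicy
{
    public static DashboardAccess Evaluate(
        bool dashboardEnabled,
        bool requireAdminScope,
        bool allowAnonymousLocalhost,
        bool isLoopbackRequest,
        IReadOnlyCollection<string>? scopes)   // null means no validated API key
    {
        if (!dashboardEnabled)
        {
            return DashboardAccess.NotFound;
        }

        // Development-only escape hatch; the option defaults to false.
        if (allowAnonymousLocalhost && isLoopbackRequest)
        {
            return DashboardAccess.Allowed;
        }

        if (scopes is null)
        {
            return DashboardAccess.Challenge;   // redirect to /dashboard/login
        }

        if (requireAdminScope && !scopes.Contains("admin", StringComparer.Ordinal))
        {
            return DashboardAccess.Forbidden;
        }

        return DashboardAccess.Allowed;
    }
}
```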
## Configuration
Suggested configuration:
```json
{
  "MxGateway": {
    "Dashboard": {
      "Enabled": true,
      "PathBase": "/dashboard",
      "RequireAdminScope": true,
      "AllowAnonymousLocalhost": false,
      "SnapshotIntervalMilliseconds": 1000,
      "RecentFaultLimit": 100,
      "RecentSessionLimit": 200,
      "ShowTagValues": false
    }
  }
}
```
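
A possible options class for binding this section is sketched below. The class name `DashboardOptions` and the `SectionName` constant are assumptions. The defaults for `Enabled`, `AllowAnonymousLocalhost`, and the snapshot interval follow statements elsewhere in this document; the remaining defaults simply mirror the sample values above.

```csharp
// Hypothetical options class; it could be bound at startup with
// services.Configure<DashboardOptions>(
//     configuration.GetSection(DashboardOptions.SectionName)).
public sealed class DashboardOptions
{
    public const string SectionName = "MxGateway:Dashboard";

    // The dashboard stays off unless configuration enables it.
    public bool Enabled { get; set; } = false;

    public string PathBase { get; set; } = "/dashboard";

    public bool RequireAdminScope { get; set; } = true;

    // Development-only option; must default to false.
    public bool AllowAnonymousLocalhost { get; set; } = false;

    public int SnapshotIntervalMilliseconds { get; set; } = 1000;

    public int RecentFaultLimit { get; set; } = 100;

    public int RecentSessionLimit { get; set; } = 200;

    public bool ShowTagValues { get; set; } = false;
}
```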
## Security Rules
- Do not display API key secrets.
- Do not display credential-bearing MXAccess command values.
- Do not display full tag values by default.
- Do not expose worker pipe names that include a nonce or other sensitive details.
- Protect dashboard auth cookies with `HttpOnly`, `Secure`, and `SameSite`.
- Require TLS for remote dashboard access.
- Use anti-forgery protection for login/logout and any future admin actions.
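
The cookie rule above could be applied through ASP.NET Core cookie authentication options, as in this sketch. It assumes the project references the ASP.NET Core shared framework. The class name, the cookie name, and the choice of `SameSiteMode.Strict` are assumptions; this document only requires that `SameSite` be set.

```csharp
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Http;

// Hypothetical helper, intended to be passed to AddCookie(...) at startup.
public static class DashboardCookieSettings
{
    public static void Configure(CookieAuthenticationOptions options)
    {
        options.Cookie.Name = "mxgw.dashboard";                   // assumed name
        options.Cookie.HttpOnly = true;                           // not readable from script
        options.Cookie.SecurePolicy = CookieSecurePolicy.Always;  // HTTPS only
        options.Cookie.SameSite = SameSiteMode.Strict;            // assumed mode
        options.LoginPath = "/dashboard/login";
    }
}
```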
## Styling
Use Bootstrap utility classes and a small local stylesheet.
Recommended visual language:
- compact tables,
- status badges,
- metric cards,
- Bootstrap alerts for faults,
- restrained colors,
- no decorative hero sections,
- no charting dependency for v1.
If charts are added later, prefer simple server-generated data tables first. Do
not add a JavaScript charting dependency without a specific need.
## Testing
Dashboard unit/component tests should cover:
- snapshot projection,
- dashboard authorization decisions,
- login API-key validation behavior,
- pages render with empty state,
- pages render with active sessions,
- pages render with faulted sessions,
- realtime subscription disposal,
- redaction of API keys and credential values.
Use bUnit if component testing is added. Otherwise keep the first tests focused
on snapshot services and authorization logic.
Integration tests should verify:
- dashboard disabled returns not found or configured fallback,
- dashboard requires auth when enabled,
- admin-scoped key can access dashboard,
- non-admin key is denied,
- live snapshot updates when a fake session changes state.
## Initial Implementation Slice
The first dashboard slice should implement:
1. Blazor Server hosting in `MxGateway.Server`.
2. Bootstrap static assets.
3. Dashboard configuration binding.
4. Dashboard auth using API key login and HTTP-only cookie.
5. Read-only `DashboardSnapshotService`.
6. Home page with metric cards.
7. Sessions page with active session table.
8. Workers page with worker table.
9. 1-second realtime refresh through Blazor Server.
10. Redaction tests for secrets.