Improve gateway reliability and dashboard docs

This commit is contained in:
Joseph Doherty
2026-04-28 00:13:22 -04:00
parent bd4a09a35e
commit 4fc355b357
61 changed files with 1722 additions and 150 deletions
+294
View File
@@ -0,0 +1,294 @@
# Dashboard Interface Design
This guide describes the visual and interaction patterns used by the MXAccess
Gateway dashboard so the same interface style can be reused in other
operations-focused projects.
## Design Goal
The dashboard is an operational interface, not a landing page. It prioritizes
fast scanning, low visual noise, and stable layouts while live data changes.
The design uses Bootstrap for common behavior and a small local stylesheet for
project identity, spacing, and status presentation.
Use this style for applications where users repeatedly check system state,
compare rows, inspect details, and diagnose faults. Avoid promotional layouts,
large hero areas, decorative imagery, or oversized cards that reduce data
density.
## Visual Language
The interface uses a quiet, work-focused visual system:
- A light gray page background separates the application shell from white data
surfaces.
- White cards and sections carry the actual operational content.
- Borders define structure more often than shadows.
- Accent color is reserved for metric values and important numeric signals.
- Bootstrap status badges provide state color without custom status art.
- Tables remain compact and responsive so long identifiers and timestamps stay
readable.
The resulting page should look like a control surface: restrained, predictable,
and dense enough for repeated use.
## Layout Structure
Every page follows the same structure:
1. A top navigation bar with the product or service name on the left.
2. A full-width `container-fluid` content area.
3. A page header with the page title, short context text, and optional status
badge.
4. Metric cards when a page has top-level numeric state.
5. Bordered content sections for tables, details, faults, or empty states.
The shell does not use a sidebar. A horizontal navigation bar is enough for the
current page count and keeps the content width available for tables.
```html
<div class="dashboard-shell">
<nav class="navbar navbar-expand-lg bg-body border-bottom dashboard-navbar">
<!-- brand, page links, sign-out action -->
</nav>
<main class="container-fluid dashboard-content">
<!-- page header, metric grid, sections -->
</main>
</div>
```
## Color Tokens
Use a small token set and let Bootstrap provide the rest. The current dashboard
uses these local tokens:
```css
:root {
--mxgw-surface: #f7f8fa;
--mxgw-border: #d8dee6;
--mxgw-ink-muted: #667085;
--mxgw-accent: #146c64;
}
```
| Token | Purpose |
|-------|---------|
| `--mxgw-surface` | Page background behind all content. |
| `--mxgw-border` | Borders on cards, tables, sections, and empty states. |
| `--mxgw-ink-muted` | Secondary labels, details, and empty-state text. |
| `--mxgw-accent` | Metric values and important numeric summaries. |
Keep the palette small. Add new colors only when they encode state or improve
readability. Prefer Bootstrap badge classes for states such as ready, closing,
closed, and faulted.
## Typography
Typography stays compact and consistent:
- Page headings use `1.35rem`, weight `650`, and normal letter spacing.
- Section headings use the same size as page headings when they introduce a
table or details group.
- Metric labels use uppercase text at `.78rem` and weight `650`.
- Metric values use `1.7rem`, weight `700`, and the accent color.
- Body and table text inherit Bootstrap defaults for readability.
Do not scale text with viewport width. Long values use `overflow-wrap:
anywhere` so session IDs, paths, and fault messages do not break the layout.
## Spacing And Shape
The dashboard uses modest spacing:
- Page content has `1.25rem` padding on desktop and `.75rem` on small screens.
- Metric grids use `.75rem` gaps.
- Content sections start with a top border and `1rem` top padding.
- Cards and empty states use Bootstrap's small radius shape, `.375rem`.
- Metric cards have no shadow.
This keeps information grouped without turning each section into a decorative
panel. Use cards for repeated metric summaries, login forms, and individual
items. Use unframed sections with a top border for page-level groups.
## Navigation
Navigation is a Bootstrap responsive navbar. It includes:
- Brand text for the service name.
- Short page labels: `Overview`, `Sessions`, `Workers`, `Events`, `Settings`.
- Active route styling through `NavLink`.
- A right-aligned sign-out button when authentication is enabled.
Keep navigation labels short. Operational users should be able to predict what
each page contains without reading explanatory copy.
## Page Headers
Each page starts with a `dashboard-page-header`:
- The title is the primary anchor.
- A single secondary line gives timestamp, row count, or configuration context.
- A status badge appears on the right when the page has an overall state.
On narrow screens, the header stacks vertically. This prevents long context
text or status badges from overlapping the title.
```html
<div class="dashboard-page-header">
<div>
<h1>Overview</h1>
<div class="text-secondary">Generated 2026-04-27 17:30:00</div>
</div>
<span class="badge text-bg-success">Healthy</span>
</div>
```
## Metric Cards
Metric cards summarize numeric state at the top of overview and diagnostic
pages. They use Bootstrap cards with a local `metric-card` class:
- Label: uppercase, muted, compact.
- Value: large enough to scan, accent colored, wraps safely.
- Detail: optional muted text for version, rate context, or explanatory state.
Use auto-fit CSS grid tracks so the cards fill available width without custom
breakpoints:
```css
.metric-grid {
display: grid;
gap: .75rem;
grid-template-columns: repeat(auto-fit, minmax(12rem, 1fr));
}
.metric-grid.compact {
grid-template-columns: repeat(auto-fit, minmax(10rem, 1fr));
}
```
Metrics should be formatted before rendering. Counts use thousands separators,
durations use stable units, and missing values render as `-`.
## Tables
Tables are the main information surface. Use Bootstrap `table table-sm` with a
local `dashboard-table` class:
- `table-sm` keeps rows dense.
- `align-middle` improves status badge alignment.
- `table-responsive` wraps every table that can exceed the viewport.
- Header cells use weight `650` and no wrapping.
- Body cells allow wrapping so identifiers, paths, and messages stay visible.
- Detail tables reserve a fixed header width.
Use code formatting for machine identifiers such as session IDs, file paths,
and protocol values. Link rows only where navigation is useful; avoid making
entire rows clickable when a single identifier link is clearer.
## Status Badges
Status uses Bootstrap badge classes with a small mapping layer:
| State | Badge class |
|-------|-------------|
| `Ready`, `Healthy` | `text-bg-success` |
| `Creating`, `StartingWorker`, `WaitingForPipe`, `InitializingWorker`, `Closing` | `text-bg-info` |
| `Closed` | `text-bg-secondary` |
| `Faulted` | `text-bg-danger` |
| Unknown state | `text-bg-light text-dark border` |
Keep status text literal. Operators benefit from seeing the same state names
that appear in logs and APIs.
## Empty And Loading States
Empty states are explicit and quiet. They use a white background, dashed border,
small radius, muted text, and one sentence:
```html
<div class="empty-state">No worker processes are attached.</div>
```
Loading states use the same component shape. Avoid spinners for snapshot pages
that update on a timer; a stable text placeholder is less distracting.
## Detail Pages
Detail pages use stacked sections instead of nested cards:
- The page header identifies the selected entity.
- The first section shows entity metadata in a two-column details table.
- Additional sections show related runtime state, such as worker metadata.
- Missing entities render a single section with a concise not-found message.
This structure keeps details comparable across pages and avoids card nesting.
## Responsive Behavior
The dashboard uses one small-screen breakpoint:
```css
@media (max-width: 700px) {
.dashboard-content {
padding: .75rem;
}
.dashboard-page-header {
align-items: flex-start;
flex-direction: column;
}
.details-table th {
width: 9rem;
}
}
```
Do not hide important columns by default. Use horizontal table scrolling for
dense operational data, and reserve column hiding for data that is clearly
duplicative.
## Data Formatting
Use a small display helper instead of formatting inline in every component.
The helper should provide consistent rendering for:
- empty text as `-`,
- counts with thousands separators,
- dates and times in a consistent local or configured format,
- durations in stable units,
- metric lookup by name and dimension.
Centralizing formatting prevents visual drift between overview cards, tables,
and detail pages.
## Security And Redaction
The interface is read-only unless an explicit administrative action is
designed. It should not display secrets or raw credential-bearing values.
Apply redaction before values reach Razor components. The UI treats redacted
values as normal display text; it does not need to know why a value is hidden.
This keeps security policy in the dashboard projection layer rather than in
markup.
## Replication Checklist
Use this checklist when applying the design to another project:
- Define four local tokens: surface, border, muted ink, and accent.
- Use a Bootstrap top navbar with short route labels.
- Keep page content inside a full-width fluid container.
- Start every page with the same header structure.
- Put primary numeric state in `metric-grid` cards.
- Put detailed runtime state in compact responsive tables.
- Use status badges mapped from real domain states.
- Use dashed bordered empty states for loading and no-data cases.
- Use top-bordered sections for page groups instead of nested cards.
- Centralize formatting and redaction outside Razor markup.
- Keep the dashboard read-only until admin workflows have a separate design.
## Related Documentation
- [Gateway Dashboard Detailed Design](./gateway-dashboard-design.md)
+153
View File
@@ -0,0 +1,153 @@
# Gateway Configuration
This document describes every option bound under the `MxGateway` configuration
section by `GatewayOptions`.
The gateway binds configuration at startup and validates it with
`GatewayOptionsValidator`. Startup fails before the server listens when required
paths, timeouts, queue sizes, enum values, or protocol values are invalid.
## Configuration Shape
```json
{
"MxGateway": {
"Authentication": {
"Mode": "ApiKey",
"SqlitePath": "C:\\ProgramData\\MxGateway\\gateway-auth.db",
"PepperSecretName": "MxGateway:ApiKeyPepper",
"RunMigrationsOnStartup": true
},
"Worker": {
"ExecutablePath": "src\\MxGateway.Worker\\bin\\x86\\Release\\MxGateway.Worker.exe",
"WorkingDirectory": null,
"RequiredArchitecture": "X86",
"StartupTimeoutSeconds": 30,
"StartupProbeRetryAttempts": 3,
"StartupProbeRetryDelayMilliseconds": 250,
"PipeConnectAttemptTimeoutMilliseconds": 2000,
"ShutdownTimeoutSeconds": 10,
"HeartbeatIntervalSeconds": 5,
"HeartbeatGraceSeconds": 15,
"MaxMessageBytes": 16777216
},
"Sessions": {
"DefaultCommandTimeoutSeconds": 30,
"MaxSessions": 64,
"MaxPendingCommandsPerSession": 128,
"AllowMultipleEventSubscribers": false
},
"Events": {
"QueueCapacity": 10000,
"BackpressurePolicy": "FailFast"
},
"Dashboard": {
"Enabled": true,
"PathBase": "/dashboard",
"RequireAdminScope": true,
"AllowAnonymousLocalhost": true,
"SnapshotIntervalMilliseconds": 1000,
"RecentFaultLimit": 100,
"RecentSessionLimit": 200,
"ShowTagValues": false
},
"Protocol": {
"WorkerProtocolVersion": 1
}
}
}
```
Environment variables use the normal .NET double-underscore form. For example,
`MxGateway__Sessions__MaxSessions=20` overrides
`MxGateway:Sessions:MaxSessions`.
## Authentication Options
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Authentication:Mode` | `ApiKey` | Selects public gRPC authentication. Supported values are `ApiKey` and `Disabled`. `Disabled` bypasses API-key verification and is for local development only. |
| `MxGateway:Authentication:SqlitePath` | `C:\ProgramData\MxGateway\gateway-auth.db` | SQLite database path for API-key records and audit rows when API-key authentication is enabled. |
| `MxGateway:Authentication:PepperSecretName` | `MxGateway:ApiKeyPepper` | Configuration key used to read the HMAC pepper for API-key secret hashing. The dashboard effective configuration redacts this value. |
| `MxGateway:Authentication:RunMigrationsOnStartup` | `true` | Runs SQLite auth schema migrations at gateway startup when API-key authentication is enabled. |
When `Mode` is `ApiKey`, `SqlitePath` and `PepperSecretName` must be present.
`SqlitePath` must be a valid filesystem path.
## Worker Options
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Worker:ExecutablePath` | `src\MxGateway.Worker\bin\x86\Release\MxGateway.Worker.exe` | Path to the x86 worker executable launched for each gateway session. The path must be valid and point to a `.exe` file. |
| `MxGateway:Worker:WorkingDirectory` | `null` | Optional working directory for the worker process. When set, it must be a valid filesystem path. |
| `MxGateway:Worker:RequiredArchitecture` | `X86` | Required Portable Executable architecture for the worker. Supported values are `X86` and `X64`; MXAccess parity uses `X86`. |
| `MxGateway:Worker:StartupTimeoutSeconds` | `30` | Total startup budget for process launch, startup probe, pipe connect, handshake, and worker readiness. |
| `MxGateway:Worker:StartupProbeRetryAttempts` | `3` | Number of retry attempts for transient worker startup probe failures before pipe connection and handshake continue. |
| `MxGateway:Worker:StartupProbeRetryDelayMilliseconds` | `250` | Delay between transient startup probe retry attempts. |
| `MxGateway:Worker:PipeConnectAttemptTimeoutMilliseconds` | `2000` | Per-attempt timeout used by the worker named-pipe connect retry path. The overall pipe connection still stays under the startup budget. |
| `MxGateway:Worker:ShutdownTimeoutSeconds` | `10` | Grace period for worker shutdown before the gateway treats shutdown as failed and may kill the worker process tree. |
| `MxGateway:Worker:HeartbeatIntervalSeconds` | `5` | Worker heartbeat send interval and gateway heartbeat check cadence input. |
| `MxGateway:Worker:HeartbeatGraceSeconds` | `15` | Maximum age of the last worker heartbeat before the gateway faults the worker. This must be greater than or equal to `HeartbeatIntervalSeconds`. |
| `MxGateway:Worker:MaxMessageBytes` | `16777216` | Maximum worker IPC frame payload size in bytes. The validator allows values from `1024` through `268435456`. |
`StartupProbeRetryAttempts`, `StartupProbeRetryDelayMilliseconds`,
`PipeConnectAttemptTimeoutMilliseconds`, timeout values, heartbeat values, and
`MaxMessageBytes` must be positive. `MaxMessageBytes` is intentionally bounded
to avoid accidental large allocations from malformed or oversized frames.
## Session Options
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Sessions:DefaultCommandTimeoutSeconds` | `30` | Default timeout used while the gateway waits for a worker command reply when an open-session request does not provide a positive command timeout. |
| `MxGateway:Sessions:MaxSessions` | `64` | Maximum number of concurrently open gateway sessions. Session opens reserve a slot atomically before worker creation. |
| `MxGateway:Sessions:MaxPendingCommandsPerSession` | `128` | Maximum number of pending worker commands for one session. Excess commands fail fast instead of queueing indefinitely. |
| `MxGateway:Sessions:AllowMultipleEventSubscribers` | `false` | Controls whether multiple `StreamEvents` subscribers may attach to one session. `true` is rejected until event fan-out is implemented. |
All numeric session options must be greater than zero. The current event stream
implementation supports one active subscriber per session; this preserves event
ordering and avoids competing consumers.
## Event Options
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Events:QueueCapacity` | `10000` | Capacity for bounded per-session event queues used by the gateway worker event channel and the public gRPC event stream queue. |
| `MxGateway:Events:BackpressurePolicy` | `FailFast` | Event backpressure behavior. `FailFast` is the only supported value. |
`QueueCapacity` must be greater than zero. With `FailFast`, queue overflow
faults the affected worker or session instead of silently dropping MXAccess
events.
## Dashboard Options
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Dashboard:Enabled` | `true` | Enables Blazor Server dashboard route mapping. |
| `MxGateway:Dashboard:PathBase` | `/dashboard` | Base path for dashboard routes. When the dashboard is enabled, this value is required and must start with `/`. |
| `MxGateway:Dashboard:RequireAdminScope` | `true` | Requires API keys used for dashboard login to carry the `admin` scope. |
| `MxGateway:Dashboard:AllowAnonymousLocalhost` | `true` | Allows loopback dashboard requests to bypass the dashboard cookie requirement for local development. Remote requests still require dashboard authentication. |
| `MxGateway:Dashboard:SnapshotIntervalMilliseconds` | `1000` | Dashboard snapshot refresh interval used by realtime Blazor pages. |
| `MxGateway:Dashboard:RecentFaultLimit` | `100` | Maximum number of fault summaries projected into each dashboard snapshot. |
| `MxGateway:Dashboard:RecentSessionLimit` | `200` | Maximum number of session summaries projected into each dashboard snapshot. |
| `MxGateway:Dashboard:ShowTagValues` | `false` | Reserved display control for tag values. The dashboard does not show full tag values by default. |
`SnapshotIntervalMilliseconds` must be greater than zero. `RecentFaultLimit`
and `RecentSessionLimit` must be greater than or equal to zero.
## Protocol Options
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Protocol:WorkerProtocolVersion` | `1` | Worker IPC protocol version expected by the gateway and worker. This must match `GatewayContractInfo.WorkerProtocolVersion`. |
The protocol option is exposed for diagnostics and explicit deployment
configuration, not for compatibility negotiation. A mismatch fails validation
at startup.
## Related Documentation
- [Gateway Process Detailed Design](./gateway-process-design.md)
- [Gateway Dashboard Detailed Design](./gateway-dashboard-design.md)
- [Worker Process Launcher](./WorkerProcessLauncher.md)
- [Worker Frame Protocol](./WorkerFrameProtocol.md)
+4
View File
@@ -247,6 +247,10 @@ Each client should expose event streaming as the idiomatic streaming primitive:
Events must preserve gateway order. Libraries should not reorder, coalesce, or
drop events by default.
Long-lived event streams do not inherit unary call deadlines. Clients apply the
default call timeout to unary operations only, and streams run until the caller
cancels them or an explicit stream timeout is configured.
The event surface must include:
- `OnDataChange`
+3
View File
@@ -336,6 +336,9 @@ Recommended visual language:
If charts are added later, prefer simple server-generated data tables first. Do
not add a JavaScript charting dependency without a specific need.
The reusable visual rules for replicating this interface in other projects are
documented in [Dashboard Interface Design](./DashboardInterfaceDesign.md).
## Testing
Dashboard unit/component tests should cover:
+21 -7
View File
@@ -361,13 +361,13 @@ worker startup, and removes the session if startup fails. A successful
`OpenSession` attaches the ready `IWorkerClient` and transitions the session to
`Ready`.
Only `Ready` sessions accept command and event operations. `CloseSession` is
idempotent for sessions still known to the registry: the first close shuts down
the worker, and later closes return the final `Closed` state. Lease handling is
exposed as a session hook so a monitor can close expired sessions without
embedding lease policy in the worker client. Gateway shutdown walks the
registry, closes each known session, and kills a worker if graceful shutdown
fails.
Only `Ready` sessions accept command and event operations. `CloseSession` shuts
down the worker, disposes the worker client, and removes the session from the
registry so closed sessions do not retain pipe or process handles. A later close
for the same id returns `SessionNotFound`. Lease handling is exposed as a
session hook so a monitor can close expired sessions without embedding lease
policy in the worker client. Gateway shutdown walks the registry, closes each
known session, and kills a worker if graceful shutdown fails.
## Worker Launch
@@ -813,6 +813,11 @@ It emits .NET `Meter` instruments for collectors and keeps a
the dashboard needs current counters and queue depths without depending on a
specific metrics exporter.
Event metrics use low-cardinality tags such as event family. Per-session event
counts are kept only in the in-process snapshot for active dashboard sessions
and are purged when the session is removed. Worker event queue depth and gRPC
event stream queue depth are reported as separate gauges.
HTTP request handling uses `UseGatewayRequestLoggingScope()` to attach common
structured log fields when request metadata is present:
@@ -842,6 +847,8 @@ Suggested configuration shape:
},
"Worker": {
"ExecutablePath": "src/MxGateway.Worker/bin/x86/Release/MxGateway.Worker.exe",
"WorkingDirectory": null,
"RequiredArchitecture": "X86",
"StartupTimeoutSeconds": 30,
"StartupProbeRetryAttempts": 3,
"StartupProbeRetryDelayMilliseconds": 250,
@@ -854,6 +861,7 @@ Suggested configuration shape:
"Sessions": {
"DefaultCommandTimeoutSeconds": 30,
"MaxSessions": 64,
"MaxPendingCommandsPerSession": 128,
"AllowMultipleEventSubscribers": false
},
"Events": {
@@ -869,6 +877,9 @@ Suggested configuration shape:
"RecentFaultLimit": 100,
"RecentSessionLimit": 200,
"ShowTagValues": false
},
"Protocol": {
"WorkerProtocolVersion": 1
}
}
}
@@ -888,6 +899,9 @@ diagnostics, so it redacts secret-related fields such as
`Authentication:PepperSecretName` and does not include raw API keys or key
material.
The complete option reference, including defaults and validation rules, is in
[Gateway Configuration](./GatewayConfiguration.md).
## Galaxy Repository Metadata
Galaxy hierarchy and tag metadata can be discovered through SQL Server when