docs(components): reference docs batch 4/4 — ManagementService, CLI, Transport, CentralUI, TraefikProxy, TreeView

Joseph Doherty
2026-06-03 15:57:32 -04:00
parent c1c8e35687
commit d14fc3f68f
6 changed files with 1352 additions and 0 deletions
# Management Service
The Management Service is the Akka.NET actor that provides programmatic access to every admin operation on the central cluster — the same operations the Central UI exposes, made available over an HTTP API and, optionally, a `ClusterClient` path for cross-cluster callers.
## Overview
Management Service (#18) runs on the central cluster only. The component code lives in `src/ZB.MOM.WW.ScadaBridge.ManagementService/`; its four main source files are:
- `ManagementActor.cs` — the `ReceiveActor` that owns authorization, dispatch, and error mapping for all commands.
- `ManagementEndpoints.cs` — the `POST /management` minimal-API endpoint that authenticates over HTTP Basic Auth and forwards to the actor.
- `AuditEndpoints.cs` — dedicated REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`) for the centralized Audit Log (#23); these bypass the actor because the workload is read-only and keyset-paged.
- `DebugStreamHub.cs` — a SignalR hub for real-time debug stream subscriptions (attribute and alarm state changes).
`ServiceCollectionExtensions.AddManagementService` registers `ManagementActorHolder` (a DI singleton that holds the live `IActorRef`) and binds `ManagementServiceOptions` from `ScadaBridge:ManagementService`.
The `ManagementActor` is not a cluster singleton. Because it is completely stateless — it opens a new DI scope per command and delegates all work to repositories and domain services — every central node runs its own instance. Either node can serve any request independently, so no singleton coordination is needed.
## Key Concepts
### `ManagementEnvelope` and the wire protocol
Every command arrives wrapped in a `ManagementEnvelope`:
```csharp
public record AuthenticatedUser(
    string Username, string DisplayName,
    string[] Roles, string[] PermittedSiteIds);

public record ManagementEnvelope(AuthenticatedUser User, object Command, string CorrelationId);
```
The HTTP endpoint constructs the envelope after LDAP authentication and role resolution. The `CorrelationId` is a `Guid` formatted with the `"N"` specifier (32 hex digits, no hyphens) and ties server-log entries to the caller's request. The actor never authenticates a second time: the envelope carries an already-resolved `AuthenticatedUser`.
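A minimal sketch of envelope construction, assuming the record shapes above. `EnvelopeFactory` and `ListSitesCommand` are hypothetical stand-ins (the real command types live in Commons), so this illustrates the shape of the data rather than the endpoint's actual code.

```csharp
using System;

// Record shapes as documented above.
public record AuthenticatedUser(
    string Username, string DisplayName,
    string[] Roles, string[] PermittedSiteIds);

public record ManagementEnvelope(AuthenticatedUser User, object Command, string CorrelationId);

// Hypothetical stand-in for a command type from Messages/Management.
public record ListSitesCommand;

public static class EnvelopeFactory
{
    // Wraps an already-authenticated user and a deserialized command.
    // The "N" format yields 32 hex digits with no hyphens.
    public static ManagementEnvelope Create(AuthenticatedUser user, object command) =>
        new(user, command, Guid.NewGuid().ToString("N"));
}
```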
### Role enforcement and site scope
Authorization is a two-level check. `GetRequiredRole` maps each command type to the minimum role required:
| Role | Commands |
|------|----------|
| `Administrator` | Site management, role mappings, API key management, scope rules, `QueryAuditLogCommand`, `PreviewBundle`, `ImportBundle` |
| `Designer` | Template authoring (members, folders, compositions), external systems, data connections, notification lists, shared scripts, database connections, inbound API methods, `ExportBundle` |
| `Deployer` | Instance lifecycle, connection bindings, overrides, deployments, debug snapshot, parked message queries |
| _(any authenticated user)_ | Read-only list/get queries, health summary |
Within `Deployer` commands, `EnforceSiteScope` applies a second check: users whose role mapping carries `PermittedSiteIds` can only touch instances and sites within their permitted set. Administrators and system-wide deployers (empty `PermittedSiteIds`) are unrestricted. A violation throws `SiteScopeViolationException`, which `MapFault` converts to `ManagementUnauthorized`.
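The two-level check can be sketched in plain C# with no Akka dependency. The command records, the dictionary-based role table, and the class name `ManagementAuthorization` are hypothetical; the real mapping lives in `ManagementActor.GetRequiredRole` and covers every command.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record AuthenticatedUser(
    string Username, string DisplayName,
    string[] Roles, string[] PermittedSiteIds);

// Hypothetical command stand-ins; the real types live in Commons.
public record CreateSiteCommand(string SiteId);
public record MgmtDeployInstanceCommand(string SiteId, string InstanceName);
public record ListSitesCommand;

public sealed class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string message) : base(message) { }
}

public static class ManagementAuthorization
{
    // Minimum role per command type; a missing entry means "any authenticated user".
    private static readonly Dictionary<Type, string> RequiredRoles = new()
    {
        [typeof(CreateSiteCommand)] = "Administrator",
        [typeof(MgmtDeployInstanceCommand)] = "Deployer",
    };

    public static string? GetRequiredRole(object command) =>
        RequiredRoles.TryGetValue(command.GetType(), out var role) ? role : null;

    // First level: role membership, compared case-insensitively.
    public static bool HasRole(AuthenticatedUser user, string? requiredRole) =>
        requiredRole is null ||
        user.Roles.Contains(requiredRole, StringComparer.OrdinalIgnoreCase);

    // Second level: site scope for Deployer commands that target a site.
    public static void EnforceSiteScope(AuthenticatedUser user, string siteId)
    {
        bool isAdmin = user.Roles.Contains("Administrator", StringComparer.OrdinalIgnoreCase);
        bool systemWide = user.PermittedSiteIds.Length == 0;
        if (isAdmin || systemWide) return;

        if (!user.PermittedSiteIds.Contains(siteId))
            throw new SiteScopeViolationException(
                $"User '{user.Username}' is not permitted to act on site '{siteId}'.");
    }
}
```

Because a missing table entry means "any authenticated user", a newly added mutating command that nobody maps becomes open to every caller; keeping the mapping next to the dispatch table makes that omission easier to spot.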
### Command registry
`ManagementCommandRegistry` (in Commons) maps wire names to CLR types via reflection at startup. It scans the `ZB.MOM.WW.ScadaBridge.Commons.Messages.Management` namespace for non-abstract types whose name ends in `"Command"` and stores them in a `FrozenDictionary`. The HTTP endpoint calls `ManagementCommandRegistry.Resolve(commandName)` to get the target type, then deserializes the `payload` JSON into it.
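A minimal sketch of a reflection-built registry in the same spirit, using .NET 8's `FrozenDictionary`. It assumes the wire name is the type name with the `Command` suffix removed (as the `ListSites` example under Usage suggests) and that lookup is case-insensitive; the `Demo.Messages.Management` namespace, the sample records, and `CommandRegistry` itself are hypothetical.

```csharp
using System;
using System.Collections.Frozen;
using System.Linq;
using System.Text.Json;

namespace Demo.Messages.Management
{
    // Hypothetical command types standing in for the real Commons contracts.
    public record ListSitesCommand;
    public record CreateSiteCommand(string Name);
    public abstract record BaseCommand;      // abstract: skipped by the scan
    public record SiteSummary(string Name);  // no "Command" suffix: skipped
}

public static class CommandRegistry
{
    private const string Namespace = "Demo.Messages.Management";
    private const string Suffix = "Command";

    // Built once on first use: wire name ("ListSites") -> CLR type (ListSitesCommand).
    private static readonly FrozenDictionary<string, Type> Map =
        typeof(CommandRegistry).Assembly.GetTypes()
            .Where(t => t.Namespace == Namespace && !t.IsAbstract
                        && t.Name.EndsWith(Suffix, StringComparison.Ordinal))
            .ToFrozenDictionary(t => t.Name[..^Suffix.Length], t => t, StringComparer.OrdinalIgnoreCase);

    private static readonly JsonSerializerOptions WebOptions = new(JsonSerializerDefaults.Web);

    public static Type? Resolve(string commandName) =>
        Map.TryGetValue(commandName, out var type) ? type : null;

    // Resolve the type, then deserialize the "payload" JSON into it.
    public static object? Deserialize(string commandName, string payloadJson)
    {
        var type = Resolve(commandName)
            ?? throw new ArgumentException($"Unknown command '{commandName}'.");
        return JsonSerializer.Deserialize(payloadJson, type, WebOptions);
    }
}
```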
### Audit contract
Mutating handlers that write through repositories themselves call `AuditAsync` (backed by `IAuditService`) after each successful write. Handlers that delegate to domain services (`TemplateService`, `InstanceService`, `DeploymentService`, `ArtifactDeploymentService`, `TemplateFolderService`, `SharedScriptService`) do not call `AuditAsync`, because those services audit internally; this avoids double-logging. For SMTP configuration and API keys, secrets are projected out before the audit entry is written.
## Architecture
### Actor lifecycle and registration
`AkkaHostedService` (in the Host) creates the `ManagementActor` under the path `/user/management` and registers it with `ClusterClientReceptionist`:
```csharp
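// Create the actor at /user/management on this central node.
var mgmtActor = _actorSystem!.ActorOf(
    Props.Create(() => new ManagementActor(_serviceProvider, mgmtLogger)),
    "management");

// Advertise it to ClusterClient senders outside the cluster.
ClusterClientReceptionist.Get(_actorSystem).RegisterService(mgmtActor);

// Publish the IActorRef so the HTTP endpoint can Ask it.
var mgmtHolder = _serviceProvider.GetRequiredService<ManagementActorHolder>();
mgmtHolder.ActorRef = mgmtActor;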
```
`ClusterClientReceptionist` advertises the actor to `ClusterClient` senders without requiring them to join the Akka cluster. The `ManagementActorHolder.ActorRef` property is then the bridge from the HTTP endpoint (which runs in ASP.NET Core middleware) into the Akka actor world.
The actor declares an explicit supervisor strategy — one-for-one with Resume and no retry limit — to match the coordinator-actor convention and remain correct if child actors are added later.
### HTTP Management API (`POST /management`)
`ManagementEndpoints.MapManagementAPI` registers the endpoint. Each request goes through six steps:
1. Raise the per-request body size cap to 200 MB (needed for Transport bundle imports).
2. Decode `Authorization: Basic <base64>` and split username/password.
3. Authenticate via `ILdapAuthService`.
4. Resolve roles via `RoleMapper`, building the `AuthenticatedUser` with any site-scope limits.
5. Deserialize the JSON body (`command` + `payload`) via `ManagementCommandRegistry`.
6. `Ask` the `ManagementActor` with a `ManagementEnvelope` and map the response:
```csharp
return response switch
{
    ManagementSuccess success => Results.Text(success.JsonData, "application/json", statusCode: 200),
    ManagementError error => Results.Json(new { error = error.Error, code = error.ErrorCode }, statusCode: 400),
    ManagementUnauthorized u => Results.Json(new { error = u.Message, code = "UNAUTHORIZED" }, statusCode: 403),
    _ => Results.Json(new { error = "Unexpected response.", code = "INTERNAL_ERROR" }, statusCode: 500)
};
```
The `Ask` timeout defaults to 30 seconds and is overridable via `ScadaBridge:ManagementService:CommandTimeout`. An elapsed timeout returns HTTP 504.
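Step 2 of the flow (decoding the `Authorization` header) can be sketched as follows. `BasicAuth.TryDecode` is a hypothetical helper, not the endpoint's actual code, and it assumes UTF-8 credentials. It splits on the first colon only, because RFC 7617 forbids a colon in the user-id but allows one in the password.

```csharp
using System;
using System.Text;

public static class BasicAuth
{
    // Parses "Basic <base64(username:password)>"; returns false for anything malformed.
    public static bool TryDecode(string? header, out string username, out string password)
    {
        username = password = string.Empty;
        const string Prefix = "Basic ";
        if (header is null || !header.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
            return false;

        string decoded;
        try
        {
            decoded = Encoding.UTF8.GetString(Convert.FromBase64String(header[Prefix.Length..].Trim()));
        }
        catch (FormatException)
        {
            return false; // not valid base64
        }

        // First colon separates user-id from password; an empty user-id is rejected.
        int colon = decoded.IndexOf(':');
        if (colon <= 0) return false;
        username = decoded[..colon];
        password = decoded[(colon + 1)..];
        return true;
    }
}
```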
### Actor dispatch and error mapping
`ManagementActor.HandleEnvelope` checks the required role, then calls `ProcessCommand`, which opens a DI scope, runs `DispatchCommand`, and wraps the result in `ManagementSuccess`. The `PipeTo` pattern keeps the actor's message loop free during async work; the failure continuation maps exceptions to `ManagementError` or `ManagementUnauthorized`:
```csharp
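private void HandleEnvelope(ManagementEnvelope envelope)
{
    // Capture Sender now: it is only valid while this message is being processed,
    // not inside the async continuation.
    var sender = Sender;
    var correlationId = envelope.CorrelationId;
    var user = envelope.User;

    // First-level check: the caller must hold the command's minimum role (null = any authenticated user).
    var requiredRole = GetRequiredRole(envelope.Command);
    if (requiredRole != null && !user.Roles.Contains(requiredRole, StringComparer.OrdinalIgnoreCase))
    {
        sender.Tell(new ManagementUnauthorized(correlationId,
            $"Role '{requiredRole}' required for {envelope.Command.GetType().Name}"));
        return;
    }

    // Run the command asynchronously and pipe the outcome back to the caller,
    // mapping any exception through MapFault.
    ProcessCommand(envelope, user)
        .PipeTo(sender,
            success: result => result,
            failure: ex => MapFault(ex, correlationId, envelope.Command));
}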
```
`ManagementCommandException` carries a message safe to surface to callers. Any other exception is an unanticipated fault; only the correlation ID is returned so internal detail (server names, constraint names) is not disclosed.
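The mapping rule can be sketched without Akka. The response and exception shapes below are assumptions inferred from how the endpoint reads them (`Error`, `ErrorCode`, `Message`); the generic message text, the use of `COMMAND_FAILED` for both error branches, and the `AggregateException` unwrap (a defensive step in case a fault arrives wrapped) are illustrative choices, not the actor's actual code.

```csharp
using System;

// Assumed response shapes.
public record ManagementError(string CorrelationId, string Error, string ErrorCode);
public record ManagementUnauthorized(string CorrelationId, string Message);

// Hypothetical exception shapes.
public sealed class ManagementCommandException : Exception
{
    public ManagementCommandException(string message) : base(message) { }
}
public sealed class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string message) : base(message) { }
}

public static class FaultMapper
{
    public static object MapFault(Exception ex, string correlationId)
    {
        // Defensive: unwrap a single-inner AggregateException.
        if (ex is AggregateException { InnerExceptions: { Count: 1 } } agg)
            ex = agg.InnerExceptions[0];

        return ex switch
        {
            // Scope violations surface as authorization failures (HTTP 403).
            SiteScopeViolationException sv => new ManagementUnauthorized(correlationId, sv.Message),
            // Anticipated failures: the message is written to be safe for callers.
            ManagementCommandException mc => new ManagementError(correlationId, mc.Message, "COMMAND_FAILED"),
            // Anything else: hide the detail, return only the correlation ID.
            _ => new ManagementError(correlationId,
                $"An internal error occurred. Correlation ID: {correlationId}", "COMMAND_FAILED")
        };
    }
}
```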
### Audit REST API (`/api/audit/*`)
`AuditEndpoints.MapAuditAPI` registers two GET endpoints that go directly to `IAuditLogRepository`, bypassing the actor:
- `GET /api/audit/query` — keyset-paged JSON result. Requires `OperationalAudit` permission (Admin / Audit / AuditReadOnly roles). Accepts `channel`, `kind`, `status`, `sourceSiteId`, `correlationId`, `executionId`, `parentExecutionId`, `fromUtc`, `toUtc`, `pageSize`, and cursor params `afterOccurredAtUtc`/`afterEventId`. Returns `{ events, nextCursor }` where `nextCursor` is explicit `null` on the last page.
- `GET /api/audit/export` — server-side streaming export (CSV or JSONL) of all matching rows, paging the repository internally at 1 000 rows per batch and flushing after each batch. Requires `AuditExport` permission (Admin / Audit roles). `format=parquet` returns HTTP 501 (deferred).
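The keyset contract in the two bullets above can be illustrated with an in-memory sketch. It assumes ascending order on (`OccurredAtUtc`, `EventId`), models `EventId` as a `long`, treats 1 as the lower page-size bound, and detects the last page by fetching one extra row. The real `IAuditLogRepository` runs the equivalent predicate in SQL, and its sort direction and key types are not stated here. `AuditPaging`, `Cursor`, and the three-column CSV layout are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical row and cursor shapes.
public record AuditEvent(DateTime OccurredAtUtc, long EventId, string Channel);
public record Cursor(DateTime AfterOccurredAtUtc, long AfterEventId);
public record AuditPage(IReadOnlyList<AuditEvent> Events, Cursor? NextCursor);

public static class AuditPaging
{
    public const int MaxPageSize = 1000;

    // One keyset page: rows strictly after the cursor in (OccurredAtUtc, EventId) order.
    public static AuditPage QueryPage(IEnumerable<AuditEvent> source, int pageSize, Cursor? after)
    {
        pageSize = Math.Clamp(pageSize, 1, MaxPageSize); // oversized requests are clamped, not rejected

        IEnumerable<AuditEvent> query = source.OrderBy(e => e.OccurredAtUtc).ThenBy(e => e.EventId);
        if (after is { } c)
            query = query.Where(e => e.OccurredAtUtc > c.AfterOccurredAtUtc ||
                (e.OccurredAtUtc == c.AfterOccurredAtUtc && e.EventId > c.AfterEventId));

        // Fetch one extra row to learn whether another page exists.
        var rows = query.Take(pageSize + 1).ToList();
        bool hasMore = rows.Count > pageSize;
        if (hasMore) rows.RemoveAt(rows.Count - 1);

        // NextCursor stays null on the last page.
        Cursor? next = hasMore ? new Cursor(rows[^1].OccurredAtUtc, rows[^1].EventId) : null;
        return new AuditPage(rows, next);
    }

    // Export: page internally, write and flush each batch, stop when the cursor comes back null.
    public static int ExportCsv(IEnumerable<AuditEvent> source, TextWriter writer, int batchSize = MaxPageSize)
    {
        writer.WriteLine("OccurredAtUtc,EventId,Channel");
        int written = 0;
        Cursor? cursor = null;
        do
        {
            var page = QueryPage(source, batchSize, cursor);
            foreach (var e in page.Events)
                writer.WriteLine($"{e.OccurredAtUtc:O},{e.EventId},{Csv(e.Channel)}");
            writer.Flush();
            written += page.Events.Count;
            cursor = page.NextCursor;
        } while (cursor is not null);
        return written;
    }

    // Quote a field only when it contains a delimiter, quote, or line break.
    private static string Csv(string s) =>
        s.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0 ? "\"" + s.Replace("\"", "\"\"") + "\"" : s;
}
```

The `EventId` tie-breaker matters because several events can share a timestamp; without it, a page boundary that falls inside a run of equal timestamps would skip or repeat rows.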
Both endpoints apply the same HTTP Basic Auth / LDAP / role flow as `/management`. Site-scoped callers have their `sourceSiteId` filter intersected with their `PermittedSiteIds`; an explicit out-of-scope filter returns HTTP 403 rather than silently empty results.
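The scope rule for the audit endpoints can be sketched as a pure function. `AuditScope` and `ScopeResult` are hypothetical names, and treating an empty `PermittedSiteIds` as unrestricted follows the convention described for the management commands; whether the audit endpoints apply any additional role-based exemption is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sites == null means "no site restriction"; Forbidden maps to HTTP 403.
public record ScopeResult(bool Forbidden, IReadOnlyCollection<string>? Sites);

public static class AuditScope
{
    public static ScopeResult Resolve(string? requestedSiteId, IReadOnlyCollection<string> permittedSiteIds)
    {
        // Empty PermittedSiteIds: system-wide caller, keep whatever filter was requested.
        if (permittedSiteIds.Count == 0)
            return new ScopeResult(false, requestedSiteId is null ? null : new[] { requestedSiteId });

        // No explicit filter: narrow the query to the caller's permitted sites.
        if (requestedSiteId is null)
            return new ScopeResult(false, permittedSiteIds);

        // Explicit filter: allowed only inside the permitted set; otherwise 403 rather than empty results.
        return permittedSiteIds.Contains(requestedSiteId)
            ? new ScopeResult(false, new[] { requestedSiteId })
            : new ScopeResult(true, null);
    }
}
```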
### Debug stream (`/debug-stream`)
`DebugStreamHub` is a SignalR hub registered alongside the management endpoints. It authenticates on `OnConnectedAsync` (same Basic Auth / LDAP / role flow), requires the `Deployer` role, and enforces per-instance site scope on `SubscribeInstance`. Accepted connections receive an initial `DebugViewSnapshot` followed by incremental `AttributeValueChanged` and `AlarmStateChanged` events pushed from `DebugStreamService`.
## Usage
### Sending a command from the CLI
The CLI sends a single `POST /management` request with a JSON body and Basic Auth credentials; it does not use `ClusterClient`. A typical request:
```http
POST /management
Authorization: Basic base64(username:password)
Content-Type: application/json

{
  "command": "ListSites",
  "payload": {}
}
```
A successful response is HTTP 200 with the JSON result. An authorization failure is HTTP 403 with `{ "error": "...", "code": "UNAUTHORIZED" }`.
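The same request can be built with `HttpClient` types from the base class library. `ManagementClient.BuildRequest` is a hypothetical helper, not the CLI's actual code; it only builds the message, so sending it and interpreting the status codes described above is left to the caller.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

public static class ManagementClient
{
    // Builds (but does not send) a POST /management request with Basic Auth.
    public static HttpRequestMessage BuildRequest(
        Uri baseAddress, string username, string password, string command, object payload)
    {
        var credentials = Convert.ToBase64String(Encoding.UTF8.GetBytes($"{username}:{password}"));
        // Property names come from the variable names: {"command":...,"payload":...}
        var body = JsonSerializer.Serialize(new { command, payload });
        return new HttpRequestMessage(HttpMethod.Post, new Uri(baseAddress, "/management"))
        {
            Headers = { Authorization = new AuthenticationHeaderValue("Basic", credentials) },
            Content = new StringContent(body, Encoding.UTF8, "application/json"),
        };
    }
}
```

Usage would look like `await http.SendAsync(ManagementClient.BuildRequest(new Uri("https://central.example"), "alice", "password", "ListSites", new { }))`, where the host name is a placeholder.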
### Sending a command via ClusterClient
The `ManagementActor` is also reachable from any `ClusterClient` that has a contact point into the central cluster. The `ClusterClientReceptionist` (at `/system/receptionist` on each central node) advertises it under its actor path, `/user/management`. Callers `Tell` or `Ask` their client with `new ClusterClient.Send("/user/management", envelope)`, where `envelope` is a `ManagementEnvelope`, and expect one of `ManagementSuccess`, `ManagementError`, or `ManagementUnauthorized` in reply.
## Command Groups
`DispatchCommand` in `ManagementActor.cs` is the canonical enumeration of every supported command. The table below organizes them by domain area.
| Group | Commands | Minimum role |
|-------|----------|--------------|
| Templates | `ListTemplates`, `GetTemplate`, `CreateTemplate`, `UpdateTemplate`, `DeleteTemplate`, `ValidateTemplate` | Designer (mutations) |
| Template members | `AddTemplateAttribute`, `UpdateTemplateAttribute`, `DeleteTemplateAttribute`, `AddTemplateAlarm`, `UpdateTemplateAlarm`, `DeleteTemplateAlarm`, `AddTemplateNativeAlarmSource`, `UpdateTemplateNativeAlarmSource`, `DeleteTemplateNativeAlarmSource`, `ListTemplateNativeAlarmSources`, `AddTemplateScript`, `UpdateTemplateScript`, `DeleteTemplateScript`, `AddTemplateComposition`, `DeleteTemplateComposition` | Designer (mutations) |
| Template folders | `ListTemplateFolders`, `CreateTemplateFolder`, `RenameTemplateFolder`, `MoveTemplateFolder`, `DeleteTemplateFolder`, `MoveTemplateToFolder` | Designer (mutations) |
| Instances | `ListInstances`, `GetInstance`, `CreateInstance`, `MgmtDeployInstance`, `MgmtEnableInstance`, `MgmtDisableInstance`, `MgmtDeleteInstance`, `SetConnectionBindings`, `SetInstanceOverrides`, `SetInstanceArea`, `SetInstanceAlarmOverride`, `DeleteInstanceAlarmOverride`, `ListInstanceAlarmOverrides`, `SetInstanceNativeAlarmSourceOverride`, `DeleteInstanceNativeAlarmSourceOverride`, `ListInstanceNativeAlarmSourceOverrides` | Deployer (mutations) |
| Sites & areas | `ListSites`, `GetSite`, `CreateSite`, `UpdateSite`, `DeleteSite`, `ListAreas`, `CreateArea`, `UpdateArea`, `DeleteArea` | Administrator (mutations) |
| Data connections | `ListDataConnections`, `GetDataConnection`, `CreateDataConnection`, `UpdateDataConnection`, `DeleteDataConnection` | Designer (mutations) |
| External systems | `ListExternalSystems`, `GetExternalSystem`, `CreateExternalSystem`, `UpdateExternalSystem`, `DeleteExternalSystem`, `ListExternalSystemMethods`, `GetExternalSystemMethod`, `CreateExternalSystemMethod`, `UpdateExternalSystemMethod`, `DeleteExternalSystemMethod` | Designer (mutations) |
| Notification lists / SMTP | `ListNotificationLists`, `GetNotificationList`, `CreateNotificationList`, `UpdateNotificationList`, `DeleteNotificationList`, `ListSmtpConfigs`, `UpdateSmtpConfig` | Designer (mutations) |
| Shared scripts | `ListSharedScripts`, `GetSharedScript`, `CreateSharedScript`, `UpdateSharedScript`, `DeleteSharedScript` | Designer (mutations) |
| Database connections | `ListDatabaseConnections`, `GetDatabaseConnection`, `CreateDatabaseConnectionDef`, `UpdateDatabaseConnectionDef`, `DeleteDatabaseConnectionDef` | Designer (mutations) |
| Inbound API methods | `ListApiMethods`, `GetApiMethod`, `CreateApiMethod`, `UpdateApiMethod`, `DeleteApiMethod` | Designer (mutations) |
| Security | `ListRoleMappings`, `CreateRoleMapping`, `UpdateRoleMapping`, `DeleteRoleMapping`, `ListApiKeys`, `CreateApiKey`, `UpdateApiKey`, `DeleteApiKey`, `SetApiKeyMethods`, `ListScopeRules`, `AddScopeRule`, `DeleteScopeRule` | Administrator |
| Deployments | `MgmtDeployArtifacts`, `QueryDeployments`, `GetDeploymentDiff` | Deployer |
| Health | `GetHealthSummary`, `GetSiteHealth` | Any authenticated user |
| Remote queries | `QueryEventLogs`, `QueryParkedMessages`, `RetryParkedMessage`, `DiscardParkedMessage`, `DebugSnapshot` | Deployer |
| Audit (legacy) | `QueryAuditLog` | Administrator |
| Transport | `ExportBundle` (Designer), `PreviewBundle`, `ImportBundle` (Administrator) | Varies |
`ValidateTemplate` builds a `FlattenedConfiguration` from the template's attributes, alarms, and scripts, runs the full `ValidationService` pipeline (collision detection, script compilation, trigger reference checks), and merges in naming-collision errors from `TemplateService.DetectCollisionsAsync` — all without a deployment.
`SetInstanceOverrides` validates every attribute name and lock status against the template before applying any write, making the batch all-or-nothing at the validation layer.
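Validate-then-write can be sketched as follows. `TemplateAttribute`, `OverrideValidator`, the in-memory `store`, and the use of `InvalidOperationException` are hypothetical simplifications (the real handler presumably reports failures through the management error path). The point is that every problem is collected before the first write happens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record TemplateAttribute(string Name, bool IsLocked);

public static class OverrideValidator
{
    // Check the whole batch and return every problem found.
    public static IReadOnlyList<string> Validate(
        IReadOnlyDictionary<string, string> overrides, IEnumerable<TemplateAttribute> templateAttributes)
    {
        var byName = templateAttributes.ToDictionary(a => a.Name, StringComparer.Ordinal);
        var errors = new List<string>();
        foreach (var name in overrides.Keys)
        {
            if (!byName.TryGetValue(name, out var attr))
                errors.Add($"Attribute '{name}' does not exist on the template.");
            else if (attr.IsLocked)
                errors.Add($"Attribute '{name}' is locked and cannot be overridden.");
        }
        return errors;
    }

    // Writes begin only after the entire batch has passed validation.
    public static void Apply(IReadOnlyDictionary<string, string> overrides,
        IEnumerable<TemplateAttribute> templateAttributes, IDictionary<string, string> store)
    {
        var errors = Validate(overrides, templateAttributes);
        if (errors.Count > 0)
            throw new InvalidOperationException(string.Join(" ", errors));
        foreach (var (name, value) in overrides)
            store[name] = value;
    }
}
```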
## Configuration
| Section | Key | Default | Description |
|---------|-----|---------|-------------|
| `ScadaBridge:ManagementService` | `CommandTimeout` | `00:00:30` | `Ask` timeout the `ManagementEndpoints` applies when forwarding to the `ManagementActor`. A non-positive value falls back to the 30-second default. |
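The fallback rule for `CommandTimeout` fits in one expression. The nullable `TimeSpan?` property shape and the `TimeoutResolver` class are assumptions for illustration; in configuration the value would be written as a `TimeSpan` string such as `"00:01:00"` under `ScadaBridge:ManagementService`.

```csharp
using System;

// Assumed options shape; null when the key is absent from configuration.
public class ManagementServiceOptions
{
    public TimeSpan? CommandTimeout { get; set; }
}

public static class TimeoutResolver
{
    public static readonly TimeSpan DefaultCommandTimeout = TimeSpan.FromSeconds(30);

    // A missing or non-positive value falls back to the 30-second default.
    public static TimeSpan Resolve(ManagementServiceOptions options) =>
        options.CommandTimeout is { } t && t > TimeSpan.Zero ? t : DefaultCommandTimeout;
}
```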
The 200 MB per-request body cap (`ManagementEndpoints.MaxManagementRequestBodyBytes`) is hard-coded; it exists to accommodate Transport (#24) Import calls where a 100 MB raw bundle base64-inflates to roughly 140 MB plus the envelope overhead.
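Base64 emits 4 output characters for every 3 input bytes (padding the final group), so an `n`-byte bundle encodes to `4 × ceil(n / 3)` bytes. Reading "100 MB" as 100 MiB (104,857,600 bytes) gives 139,810,136 encoded bytes, which is about 140 MB in decimal units and matches the figure above; that unit interpretation is an assumption. `BodySizeMath` is a hypothetical helper.

```csharp
using System;

public static class BodySizeMath
{
    // 4 characters per 3-byte group, rounding the last partial group up.
    public static long Base64Length(long rawBytes) => (rawBytes + 2) / 3 * 4;
}
```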
## Dependencies & Interactions
- [Commons (#16)](./Commons.md) — owns the message contracts (`Messages/Management/`), `ManagementEnvelope`, `ManagementCommandRegistry`, `AuthenticatedUser`, and `ManagementSuccess`/`ManagementError`/`ManagementUnauthorized` response types.
- [Configuration Database (#17)](./ConfigurationDatabase.md) — every repository (`ITemplateEngineRepository`, `ISiteRepository`, `IExternalSystemRepository`, `INotificationRepository`, `ISecurityRepository`, `IInboundApiRepository`, `IDeploymentManagerRepository`, `ICentralUiRepository`) and `IAuditService` are backed by EF Core against the central MS SQL database. Management Service resolves them per-command through scoped DI.
- [Template Engine (#1)](./TemplateEngine.md) — `TemplateService`, `TemplateFolderService`, `SharedScriptService`, and the `ValidationService` handle template authoring and validation. Management Service is the sole entry point for template mutations from outside the Central UI.
- [Deployment Manager (#2)](./DeploymentManager.md) — `DeploymentService` and `ArtifactDeploymentService` own the deployment pipeline. `MgmtDeployInstance` and `MgmtDeployArtifacts` delegate here.
- [CentralSite Communication (#5)](./Communication.md) — `CommunicationService` routes `QueryEventLogs`, `QueryParkedMessages`, `RetryParkedMessage`, `DiscardParkedMessage`, and `DebugSnapshot` to site actors via `ClusterClient`. Deployment commands also flow through the communication layer.
- [Security & Auth (#10)](./Security.md) — `ILdapAuthService` and `RoleMapper` authenticate and map roles on every HTTP request; the `Roles` constants and `IInboundApiKeyAdmin` are also consumed here.
- [Health Monitoring (#11)](./HealthMonitoring.md) — `ICentralHealthAggregator` answers `GetHealthSummary` and `GetSiteHealth` queries synchronously from its in-memory state.
- [Audit Log (#23)](./AuditLog.md) — `AuditEndpoints` reads the central `AuditLog` table via `IAuditLogRepository` directly (no actor hop). `QueryAuditLogCommand` through `/management` is a legacy path for the configuration-change audit via `ICentralUiRepository`.
- [CLI (#19)](./CLI.md) — the primary consumer of `POST /management` and the `/api/audit/*` endpoints. Constructs `ManagementEnvelope`-shaped JSON, sends Basic Auth, and deserializes the response.
- [Host (#15)](./Host.md) — `AkkaHostedService` creates the `ManagementActor`, registers it with `ClusterClientReceptionist`, and sets `ManagementActorHolder.ActorRef` so the HTTP endpoint can reach it.
- Design spec: [Component-ManagementService.md](../requirements/Component-ManagementService.md).
## Troubleshooting
### Actor not ready (HTTP 503)
If `POST /management` returns `503 SERVICE_UNAVAILABLE`, `ManagementActorHolder.ActorRef` is null — the actor system has not finished starting. This resolves itself once `AkkaHostedService.StartAsync` completes. The `/health/ready` endpoint is the gating signal; traffic should not reach `/management` before it returns 200.
### Command timeout (HTTP 504)
A 504 response means the `Ask` to `ManagementActor` did not return within the configured `CommandTimeout`. The server log entry includes the `CorrelationId` from the response body. Common causes: a long-running deployment waiting on a site that is offline, or a database query against a cold EF Core connection. Increasing `ScadaBridge:ManagementService:CommandTimeout` buys time while the root cause is investigated.
### Unexpected internal error
Any exception that is not a `ManagementCommandException` or `SiteScopeViolationException` maps to a generic `COMMAND_FAILED` error with the correlation ID. The server log at `Error` level will contain the full exception, keyed by `CorrelationId`. `ManagementCommandException` messages are intentionally surfaced verbatim; all other exception messages are suppressed on the wire to avoid leaking internal detail.
### Audit log export stalls mid-stream
`GET /api/audit/export` streams rows in pages of 1 000 and flushes after each page. If the response body stops arriving, check whether a proxy between the caller and the server is buffering the response. The endpoint sets `Cache-Control: no-store`, which prevents caching but does not by itself disable response buffering in every proxy. Separately, the `pageSize` parameter on `/api/audit/query` caps at 1 000; larger values are silently clamped rather than rejected.
## Related Documentation
- [Management Service design specification](../requirements/Component-ManagementService.md)
- [CLI](./CLI.md)
- [CentralSite Communication](./Communication.md)
- [Commons](./Commons.md)
- [Configuration Database](./ConfigurationDatabase.md)
- [Template Engine](./TemplateEngine.md)
- [Deployment Manager](./DeploymentManager.md)
- [Security](./Security.md)
- [Audit Log](./AuditLog.md)
- [Host](./Host.md)