ScadaBridge/docs/components/ManagementService.md
Joseph Doherty 0c3837c778 docs(components): accuracy fixes from deep review (batch 4)
ManagementService (role table: queries any-auth, area mutations Designer;
audit contract exception), CLI (missing instance/api-key subcommands; server
JSON printed verbatim; bundle preview timeout), Transport (BundleFormatVersion
exact-match gate; dependency scan fields; three flushes), CentralUI
(/api/script-analysis endpoints; LoginLayout minimal; Health tile components),
TreeView (Topology no RevealNode; ContextMenu Site branch; InitiallyExpanded).
2026-06-03 16:39:29 -04:00

# Management Service
The Management Service is the Akka.NET actor that provides programmatic access to every admin operation on the central cluster — the same operations the Central UI exposes, made available over an HTTP API and, optionally, a `ClusterClient` path for cross-cluster callers.
## Overview
Management Service (#18) runs on the central cluster only. The component code lives in `src/ZB.MOM.WW.ScadaBridge.ManagementService/`, with four source files:
- `ManagementActor.cs` — the `ReceiveActor` that owns authorization, dispatch, and error mapping for all commands.
- `ManagementEndpoints.cs` — the `POST /management` minimal-API endpoint that authenticates over HTTP Basic Auth and forwards to the actor.
- `AuditEndpoints.cs` — dedicated REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`) for the centralized Audit Log (#23); these bypass the actor because the workload is read-only and keyset-paged.
- `DebugStreamHub.cs` — a SignalR hub for real-time debug stream subscriptions (attribute and alarm state changes).
`ServiceCollectionExtensions.AddManagementService` registers `ManagementActorHolder` (a DI singleton that holds the live `IActorRef`) and binds `ManagementServiceOptions` from `ScadaBridge:ManagementService`.
The `ManagementActor` is not a cluster singleton. Because it is completely stateless — it opens a new DI scope per command and delegates all work to repositories and domain services — every central node runs its own instance. Either node can serve any request independently, so no singleton coordination is needed.
## Key Concepts
### `ManagementEnvelope` and the wire protocol
Every command arrives wrapped in a `ManagementEnvelope`:
```csharp
public record AuthenticatedUser(
    string Username, string DisplayName,
    string[] Roles, string[] PermittedSiteIds);

public record ManagementEnvelope(AuthenticatedUser User, object Command, string CorrelationId);
```
The HTTP endpoint constructs the envelope after LDAP authentication and role resolution; the `CorrelationId` (a `Guid` formatted as `"N"`) ties server-log entries to the caller's request. The actor never authenticates a second time — the envelope carries an already-resolved `AuthenticatedUser`.
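As a minimal sketch of the envelope construction described above, the snippet below wraps a resolved user and a command with a fresh correlation ID. `EnvelopeFactory` and `ListSitesCommand` are hypothetical names introduced for illustration; only the two record shapes come from this document.

```csharp
using System;

// Record shapes as shown in this section.
public record AuthenticatedUser(
    string Username, string DisplayName,
    string[] Roles, string[] PermittedSiteIds);

public record ManagementEnvelope(AuthenticatedUser User, object Command, string CorrelationId);

// Hypothetical command type, for illustration only.
public record ListSitesCommand();

// Hypothetical helper; the real endpoint may build the envelope inline.
public static class EnvelopeFactory
{
    // Wraps an already-authenticated user and a command, generating a
    // correlation ID in "N" format (32 hex digits, no dashes or braces).
    public static ManagementEnvelope Create(AuthenticatedUser user, object command) =>
        new(user, command, Guid.NewGuid().ToString("N"));
}
```

Because the `"N"` format carries no punctuation, the ID can be matched verbatim between a response body and a server-log line.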
### Role enforcement and site scope
Authorization is a two-level check. `GetRequiredRole` maps each command type to the minimum role required:
| Role | Commands |
|------|----------|
| `Administrator` | Site management, role mappings, API key management, scope rules, `QueryAuditLogCommand`, `PreviewBundle`, `ImportBundle` |
| `Designer` | Template authoring (members, folders, compositions), area mutations (`CreateArea`, `UpdateArea`, `DeleteArea`), external systems, data connections, notification lists, shared scripts, database connections, inbound API methods, `ExportBundle` |
| `Deployer` | Instance lifecycle, connection bindings, overrides, deployments, debug snapshot, `RetryParkedMessageCommand`, `DiscardParkedMessageCommand` |
| _(any authenticated user)_ | Read-only list/get queries, health summary |
Within `Deployer` commands, `EnforceSiteScope` applies a second check: users whose role mapping carries `PermittedSiteIds` can only touch instances and sites within their permitted set. Administrators and system-wide deployers (empty `PermittedSiteIds`) are unrestricted. A violation throws `SiteScopeViolationException`, which `MapFault` converts to `ManagementUnauthorized`.
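The site-scope rule can be sketched as follows. This is an illustration under stated assumptions, not the actual `EnforceSiteScope` implementation: the method signature, the string site IDs, and the case-insensitive comparison are all hypothetical.

```csharp
using System;
using System.Linq;

public sealed class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string message) : base(message) { }
}

// Hypothetical stand-in for the second-level check described above.
public static class SiteScope
{
    public static void Enforce(string[] roles, string[] permittedSiteIds, string targetSiteId)
    {
        // Administrators are unrestricted.
        if (roles.Contains("Administrator", StringComparer.OrdinalIgnoreCase)) return;

        // An empty permitted set means a system-wide deployer: no restriction.
        if (permittedSiteIds.Length == 0) return;

        // Otherwise the target site must be in the permitted set.
        if (!permittedSiteIds.Contains(targetSiteId, StringComparer.OrdinalIgnoreCase))
            throw new SiteScopeViolationException(
                $"Site '{targetSiteId}' is outside the caller's permitted sites.");
    }
}
```

Throwing rather than returning a flag mirrors the text: the exception propagates to the fault mapper, which turns it into `ManagementUnauthorized`.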
### Command registry
`ManagementCommandRegistry` (in Commons) maps wire names to CLR types via reflection at startup. It scans the `ZB.MOM.WW.ScadaBridge.Commons.Messages.Management` namespace for non-abstract types whose name ends in `"Command"` and stores them in a `FrozenDictionary`. The HTTP endpoint calls `ManagementCommandRegistry.Resolve(commandName)` to get the target type, then deserializes the `payload` JSON into it.
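A reflection scan of this kind might look like the sketch below. It assumes that the wire name is the type name with the `Command` suffix removed (the `ListSites` request shown under Usage suggests this, but the document does not state it), and it uses a demo namespace and hypothetical types in place of the real Commons messages. `FrozenDictionary` requires .NET 8 or later.

```csharp
using System;
using System.Collections.Frozen;
using System.Linq;
using System.Reflection;

namespace Demo.Messages.Management
{
    public record ListSitesCommand();
    public record GetSiteCommand(string SiteId);
    public abstract record BaseCommand();     // skipped: abstract
    public record SiteSummary(string Name);   // skipped: name does not end in "Command"
}

// Hypothetical sketch; not the actual ManagementCommandRegistry.
public static class CommandRegistrySketch
{
    private const string Suffix = "Command";

    public static FrozenDictionary<string, Type> Build(Assembly assembly, string ns) =>
        assembly.GetTypes()
            .Where(t => t.Namespace == ns
                        && !t.IsAbstract
                        && t.Name.EndsWith(Suffix, StringComparison.Ordinal))
            .ToFrozenDictionary(
                t => t.Name[..^Suffix.Length],   // assumed wire name: suffix stripped
                t => t,
                StringComparer.Ordinal);
}
```

A frozen dictionary suits this case because the map is built once at startup and only read afterwards.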
### Audit contract
Mutating handlers that write to repositories directly call `AuditAsync` (backed by `IAuditService`) themselves after a successful write. Most handlers that delegate to a domain service (`TemplateService`, `DeploymentService`, `ArtifactDeploymentService`, `TemplateFolderService`, `SharedScriptService`) do not call `AuditAsync`, because those services audit internally and a second call would double-log. There are exceptions: `HandleCreateInstance`, for example, delegates to `InstanceService.CreateInstanceAsync` and then calls `AuditAsync` itself. SMTP configuration and API key responses project out secrets before the audit entry is written.
## Architecture
### Actor lifecycle and registration
`AkkaHostedService` (in the Host) creates the `ManagementActor` under the path `/user/management` and registers it with `ClusterClientReceptionist`:
```csharp
var mgmtActor = _actorSystem!.ActorOf(
    Props.Create(() => new ManagementActor(_serviceProvider, mgmtLogger)),
    "management");
ClusterClientReceptionist.Get(_actorSystem).RegisterService(mgmtActor);

var mgmtHolder = _serviceProvider.GetRequiredService<ManagementActorHolder>();
mgmtHolder.ActorRef = mgmtActor;
```
`ClusterClientReceptionist` advertises the actor to `ClusterClient` senders without requiring them to join the Akka cluster. The `ManagementActorHolder.ActorRef` property is then the bridge from the HTTP endpoint (which runs in ASP.NET Core middleware) into the Akka actor world.
The actor declares an explicit supervisor strategy — one-for-one with Resume and no retry limit — to match the coordinator-actor convention and remain correct if child actors are added later.
### HTTP Management API (`POST /management`)
`ManagementEndpoints.MapManagementAPI` registers the endpoint. Each request goes through six steps:
1. Raise the per-request body size cap to 200 MB (needed for Transport bundle imports).
2. Decode `Authorization: Basic <base64>` and split username/password.
3. Authenticate via `ILdapAuthService`.
4. Resolve roles via `RoleMapper`, building the `AuthenticatedUser` with any site-scope limits.
5. Deserialize the JSON body (`command` + `payload`) via `ManagementCommandRegistry`.
6. `Ask` the `ManagementActor` with a `ManagementEnvelope` and map the response:
```csharp
return response switch
{
    ManagementSuccess success => Results.Text(success.JsonData, "application/json", statusCode: 200),
    ManagementError error => Results.Json(new { error = error.Error, code = error.ErrorCode }, statusCode: 400),
    ManagementUnauthorized u => Results.Json(new { error = u.Message, code = "UNAUTHORIZED" }, statusCode: 403),
    _ => Results.Json(new { error = "Unexpected response.", code = "INTERNAL_ERROR" }, statusCode: 500)
};
```
The `Ask` timeout defaults to 30 seconds and is overridable via `ScadaBridge:ManagementService:CommandTimeout`. An elapsed timeout returns HTTP 504.
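Step 2 of the request flow (decoding the Basic credentials) can be sketched as below. `BasicAuthHeader.TryParse` is a hypothetical helper, not the endpoint's actual code; the UTF-8 decoding and the split on the first colon are choices made for this sketch.

```csharp
using System;
using System.Text;

// Hypothetical helper illustrating step 2 of the request flow.
public static class BasicAuthHeader
{
    // Returns false for a missing, non-Basic, or malformed header.
    public static bool TryParse(string? header, out string username, out string password)
    {
        username = string.Empty;
        password = string.Empty;

        const string prefix = "Basic ";
        if (header is null || !header.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return false;

        string decoded;
        try
        {
            var bytes = Convert.FromBase64String(header[prefix.Length..].Trim());
            decoded = Encoding.UTF8.GetString(bytes);
        }
        catch (FormatException)
        {
            return false; // payload was not valid base64
        }

        // Split on the first colon only: the password may itself contain colons.
        var colon = decoded.IndexOf(':');
        if (colon < 0) return false;

        username = decoded[..colon];
        password = decoded[(colon + 1)..];
        return true;
    }
}
```

A `false` result would typically map to HTTP 401 before LDAP is ever consulted, though the status the real endpoint returns for a malformed header is not stated in this document.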
### Actor dispatch and error mapping
`ManagementActor.HandleEnvelope` checks the required role, then calls `ProcessCommand`, which opens a DI scope, runs `DispatchCommand`, and wraps the result in `ManagementSuccess`. The `PipeTo` pattern keeps the actor's message loop free during async work; the failure continuation maps exceptions to `ManagementError` or `ManagementUnauthorized`:
```csharp
private void HandleEnvelope(ManagementEnvelope envelope)
{
    var sender = Sender;
    var correlationId = envelope.CorrelationId;
    var user = envelope.User;

    var requiredRole = GetRequiredRole(envelope.Command);
    if (requiredRole != null && !user.Roles.Contains(requiredRole, StringComparer.OrdinalIgnoreCase))
    {
        sender.Tell(new ManagementUnauthorized(correlationId,
            $"Role '{requiredRole}' required for {envelope.Command.GetType().Name}"));
        return;
    }

    ProcessCommand(envelope, user)
        .PipeTo(sender,
            success: result => result,
            failure: ex => MapFault(ex, correlationId, envelope.Command));
}
```
`ManagementCommandException` carries a message safe to surface to callers. Any other exception is an unanticipated fault; only the correlation ID is returned so internal detail (server names, constraint names) is not disclosed.
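The fault-mapping rule can be illustrated with the sketch below. It is not the actual `MapFault`: the real method also receives the command, the constructor parameter order of `ManagementError` is assumed, and reusing `COMMAND_FAILED` as the code for `ManagementCommandException` is an assumption (this document only names that code for unanticipated faults).

```csharp
using System;

// Response shapes inferred from how they are used in this document;
// the parameter order of ManagementError is an assumption.
public record ManagementError(string CorrelationId, string Error, string ErrorCode);
public record ManagementUnauthorized(string CorrelationId, string Message);

public class ManagementCommandException : Exception
{
    public ManagementCommandException(string message) : base(message) { }
}

public class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string message) : base(message) { }
}

// Hypothetical sketch of the mapping rule.
public static class FaultMapping
{
    public static object MapFault(Exception ex, string correlationId) => ex switch
    {
        // Scope violations become an authorization failure.
        SiteScopeViolationException s =>
            new ManagementUnauthorized(correlationId, s.Message),

        // Messages on this exception type are safe to surface verbatim.
        ManagementCommandException c =>
            new ManagementError(correlationId, c.Message, "COMMAND_FAILED"),

        // Anything else: return only the correlation ID, never ex.Message.
        _ => new ManagementError(
            correlationId,
            $"Command failed. Correlation ID: {correlationId}",
            "COMMAND_FAILED"),
    };
}
```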
### Audit REST API (`/api/audit/*`)
`AuditEndpoints.MapAuditAPI` registers two GET endpoints that go directly to `IAuditLogRepository`, bypassing the actor:
- `GET /api/audit/query` — keyset-paged JSON result. Requires `OperationalAudit` permission (Admin / Audit / AuditReadOnly roles). Accepts `channel`, `kind`, `status`, `sourceSiteId`, `correlationId`, `executionId`, `parentExecutionId`, `fromUtc`, `toUtc`, `pageSize`, and cursor params `afterOccurredAtUtc`/`afterEventId`. Returns `{ events, nextCursor }` where `nextCursor` is explicit `null` on the last page.
- `GET /api/audit/export` — server-side streaming export (CSV or JSONL) of all matching rows, paging the repository internally at 1 000 rows per batch and flushing after each batch. Requires `AuditExport` permission (Admin / Audit roles). `format=parquet` returns HTTP 501 (deferred).
Both endpoints apply the same HTTP Basic Auth / LDAP / role flow as `/management`. Site-scoped callers have their `sourceSiteId` filter intersected with their `PermittedSiteIds`; an explicit out-of-scope filter returns HTTP 403 rather than silently returning empty results.
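Keyset paging of the kind `/api/audit/query` uses can be sketched in memory as follows. The ascending sort on `(OccurredAtUtc, EventId)`, the `Guid` type for `EventId`, and the lower clamp of 1 are assumptions made for this sketch; the real repository may order and type these differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record AuditEvent(DateTime OccurredAtUtc, Guid EventId, string Kind);
public record Cursor(DateTime AfterOccurredAtUtc, Guid AfterEventId);
public record Page(IReadOnlyList<AuditEvent> Events, Cursor? NextCursor);

// Hypothetical in-memory illustration of keyset paging.
public static class KeysetPaging
{
    public static Page Query(IEnumerable<AuditEvent> source, Cursor? after, int pageSize)
    {
        // Oversized requests are clamped rather than rejected.
        pageSize = Math.Clamp(pageSize, 1, 1000);

        IEnumerable<AuditEvent> rows = source
            .OrderBy(e => e.OccurredAtUtc)
            .ThenBy(e => e.EventId);

        if (after is { } c)
        {
            // Keyset predicate: strictly after the cursor in sort order.
            rows = rows.Where(e =>
                e.OccurredAtUtc > c.AfterOccurredAtUtc ||
                (e.OccurredAtUtc == c.AfterOccurredAtUtc &&
                 e.EventId.CompareTo(c.AfterEventId) > 0));
        }

        // Fetch one extra row to learn whether another page exists.
        var fetched = rows.Take(pageSize + 1).ToList();
        var events = fetched.Take(pageSize).ToList();
        Cursor? next = fetched.Count > pageSize
            ? new Cursor(events[^1].OccurredAtUtc, events[^1].EventId)
            : null; // explicit null marks the last page

        return new Page(events, next);
    }
}
```

Unlike offset paging, the cursor stays stable when new rows are appended between requests, which matters for an audit table that grows continuously.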
### Debug stream (`/debug-stream`)
`DebugStreamHub` is a SignalR hub registered alongside the management endpoints. It authenticates on `OnConnectedAsync` (same Basic Auth / LDAP / role flow), requires the `Deployer` role, and enforces per-instance site scope on `SubscribeInstance`. Accepted connections receive an initial `DebugViewSnapshot` followed by incremental `AttributeValueChanged` and `AlarmStateChanged` events pushed from `DebugStreamService`.
## Usage
### Sending a command from the CLI
The CLI sends a single `POST /management` with JSON body and Basic Auth; it does not use `ClusterClient` directly. A typical request:
```http
POST /management
Authorization: Basic <base64(username:password)>
Content-Type: application/json

{
  "command": "ListSites",
  "payload": {}
}
```
A successful response is HTTP 200 with the JSON result. An authorization failure is HTTP 403 with `{ "error": "...", "code": "UNAUTHORIZED" }`.
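For callers other than the CLI, the same request could be built with `HttpClient` types along these lines. This is a sketch under assumptions: `ManagementRequest.Build` is a hypothetical helper, and the base address is whatever the central node is hosted on.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Hypothetical client-side helper; not part of the CLI.
public static class ManagementRequest
{
    public static HttpRequestMessage Build(
        Uri baseAddress, string username, string password,
        string command, object payload)
    {
        // Basic credentials: base64 of "username:password".
        var credentials = Convert.ToBase64String(
            Encoding.UTF8.GetBytes($"{username}:{password}"));

        // Body shape from this section: { "command": ..., "payload": ... }.
        var body = JsonSerializer.Serialize(new { command, payload });

        var request = new HttpRequestMessage(HttpMethod.Post, new Uri(baseAddress, "/management"))
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json"),
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Basic", credentials);
        return request;
    }
}
```

The message would then be sent with `HttpClient.SendAsync`; a caller should branch on the status code (200, 400, 403, 503, 504) before attempting to parse the body as a command result.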
### Sending a command via ClusterClient
The `ManagementActor` is also reachable from any `ClusterClient` that has a contact point into the central cluster. The actor is registered with the `ClusterClientReceptionist` (which runs at `/system/receptionist`) under the path `/user/management`. Callers construct a `ManagementEnvelope`, `Tell` it to the client wrapped in a `ClusterClient.Send` addressed to `/user/management`, and expect one of `ManagementSuccess`, `ManagementError`, or `ManagementUnauthorized` in reply.
## Command Groups
`DispatchCommand` in `ManagementActor.cs` is the canonical enumeration of every supported command. The table below organizes them by domain area.
| Group | Commands | Minimum role |
|-------|----------|--------------|
| Templates | `ListTemplates`, `GetTemplate`, `CreateTemplate`, `UpdateTemplate`, `DeleteTemplate`, `ValidateTemplate` | Designer (mutations) |
| Template members | `AddTemplateAttribute`, `UpdateTemplateAttribute`, `DeleteTemplateAttribute`, `AddTemplateAlarm`, `UpdateTemplateAlarm`, `DeleteTemplateAlarm`, `AddTemplateNativeAlarmSource`, `UpdateTemplateNativeAlarmSource`, `DeleteTemplateNativeAlarmSource`, `ListTemplateNativeAlarmSources`, `AddTemplateScript`, `UpdateTemplateScript`, `DeleteTemplateScript`, `AddTemplateComposition`, `DeleteTemplateComposition` | Designer (mutations) |
| Template folders | `ListTemplateFolders`, `CreateTemplateFolder`, `RenameTemplateFolder`, `MoveTemplateFolder`, `DeleteTemplateFolder`, `MoveTemplateToFolder` | Designer (mutations) |
| Instances | `ListInstances`, `GetInstance`, `CreateInstance`, `MgmtDeployInstance`, `MgmtEnableInstance`, `MgmtDisableInstance`, `MgmtDeleteInstance`, `SetConnectionBindings`, `SetInstanceOverrides`, `SetInstanceArea`, `SetInstanceAlarmOverride`, `DeleteInstanceAlarmOverride`, `ListInstanceAlarmOverrides`, `SetInstanceNativeAlarmSourceOverride`, `DeleteInstanceNativeAlarmSourceOverride`, `ListInstanceNativeAlarmSourceOverrides` | Deployer (mutations) |
| Sites & areas | `ListSites`, `GetSite`, `CreateSite`, `UpdateSite`, `DeleteSite`, `ListAreas`, `CreateArea`, `UpdateArea`, `DeleteArea` | Administrator (site mutations); Designer (`CreateArea`, `UpdateArea`, `DeleteArea`) |
| Data connections | `ListDataConnections`, `GetDataConnection`, `CreateDataConnection`, `UpdateDataConnection`, `DeleteDataConnection` | Designer (mutations) |
| External systems | `ListExternalSystems`, `GetExternalSystem`, `CreateExternalSystem`, `UpdateExternalSystem`, `DeleteExternalSystem`, `ListExternalSystemMethods`, `GetExternalSystemMethod`, `CreateExternalSystemMethod`, `UpdateExternalSystemMethod`, `DeleteExternalSystemMethod` | Designer (mutations) |
| Notification lists / SMTP | `ListNotificationLists`, `GetNotificationList`, `CreateNotificationList`, `UpdateNotificationList`, `DeleteNotificationList`, `ListSmtpConfigs`, `UpdateSmtpConfig` | Designer (mutations) |
| Shared scripts | `ListSharedScripts`, `GetSharedScript`, `CreateSharedScript`, `UpdateSharedScript`, `DeleteSharedScript` | Designer (mutations) |
| Database connections | `ListDatabaseConnections`, `GetDatabaseConnection`, `CreateDatabaseConnectionDef`, `UpdateDatabaseConnectionDef`, `DeleteDatabaseConnectionDef` | Designer (mutations) |
| Inbound API methods | `ListApiMethods`, `GetApiMethod`, `CreateApiMethod`, `UpdateApiMethod`, `DeleteApiMethod` | Designer (mutations) |
| Security | `ListRoleMappings`, `CreateRoleMapping`, `UpdateRoleMapping`, `DeleteRoleMapping`, `ListApiKeys`, `CreateApiKey`, `UpdateApiKey`, `DeleteApiKey`, `SetApiKeyMethods`, `ListScopeRules`, `AddScopeRule`, `DeleteScopeRule` | Administrator |
| Deployments | `MgmtDeployArtifacts`, `QueryDeployments`, `GetDeploymentDiff` | Deployer |
| Health | `GetHealthSummary`, `GetSiteHealth` | Any authenticated user |
| Remote queries | `QueryEventLogsCommand`, `QueryParkedMessagesCommand` (any authenticated user); `RetryParkedMessageCommand`, `DiscardParkedMessageCommand`, `DebugSnapshotCommand` (Deployer) | Varies |
| Audit (legacy) | `QueryAuditLog` | Administrator |
| Transport | `ExportBundle` (Designer), `PreviewBundle`, `ImportBundle` (Administrator) | Varies |
`ValidateTemplate` builds a `FlattenedConfiguration` from the template's attributes, alarms, and scripts, runs the full `ValidationService` pipeline (collision detection, script compilation, trigger reference checks), and merges in naming-collision errors from `TemplateService.DetectCollisionsAsync` — all without a deployment.
`SetInstanceOverrides` validates every attribute name and lock status against the template before applying any write, making the batch all-or-nothing at the validation layer.
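The validate-everything-then-write pattern behind `SetInstanceOverrides` can be sketched as below. The types, the string-valued overrides, and the reading of "locked" as "cannot be overridden" are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public record TemplateAttribute(string Name, bool IsLocked);

// Hypothetical sketch of an all-or-nothing override batch.
public static class OverrideBatch
{
    // Collects every validation error; an empty list means the batch is valid.
    public static IReadOnlyList<string> Validate(
        IReadOnlyDictionary<string, TemplateAttribute> templateAttributes,
        IReadOnlyDictionary<string, string> overrides)
    {
        var errors = new List<string>();
        foreach (var name in overrides.Keys)
        {
            if (!templateAttributes.TryGetValue(name, out var attribute))
                errors.Add($"Unknown attribute '{name}'.");
            else if (attribute.IsLocked)
                errors.Add($"Attribute '{name}' is locked.");
        }
        return errors;
    }

    // Applies the batch only if every entry passed validation.
    public static bool TryApply(
        IReadOnlyDictionary<string, TemplateAttribute> templateAttributes,
        IReadOnlyDictionary<string, string> overrides,
        IDictionary<string, string> target,
        out IReadOnlyList<string> errors)
    {
        errors = Validate(templateAttributes, overrides);
        if (errors.Count > 0) return false; // nothing has been written

        foreach (var (name, value) in overrides)
            target[name] = value;
        return true;
    }
}
```

Note that this guarantees atomicity only at the validation layer, as the text says; a failure during the write phase itself would still need a transaction to roll back.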
## Configuration
| Section | Key | Default | Description |
|---------|-----|---------|-------------|
| `ScadaBridge:ManagementService` | `CommandTimeout` | `00:00:30` | `Ask` timeout the `ManagementEndpoints` applies when forwarding to the `ManagementActor`. A non-positive value falls back to the 30-second default. |
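
The fallback rule for `CommandTimeout` amounts to a one-line guard. The sketch below uses hypothetical names (`ManagementServiceOptionsSketch`, `CommandTimeouts.Effective`); the real `ManagementServiceOptions` may be shaped differently.

```csharp
using System;

// Hypothetical options shape; bound from ScadaBridge:ManagementService.
public sealed class ManagementServiceOptionsSketch
{
    public TimeSpan CommandTimeout { get; set; } = TimeSpan.FromSeconds(30);
}

public static class CommandTimeouts
{
    public static readonly TimeSpan Default = TimeSpan.FromSeconds(30);

    // Zero or negative values fall back to the 30-second default.
    public static TimeSpan Effective(TimeSpan configured) =>
        configured > TimeSpan.Zero ? configured : Default;
}
```

In `appsettings.json` the value would be written in `TimeSpan` string form, for example `"00:02:00"` for two minutes.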
The 200 MB per-request body cap (`ManagementEndpoints.MaxManagementRequestBodyBytes`) is hard-coded; it exists to accommodate Transport (#24) Import calls where a 100 MB raw bundle base64-inflates to roughly 140 MB plus the envelope overhead.
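The inflation figure follows from base64 emitting 4 output characters for every 3 input bytes, with the final group padded. A small helper makes the arithmetic checkable; `Base64Sizing` is a hypothetical name, and reading "100 MB" as 100 × 1024 × 1024 bytes is an assumption.

```csharp
// Hypothetical helper for sizing base64 payloads.
public static class Base64Sizing
{
    // Padded base64 length: 4 characters per 3-byte group, rounding up.
    public static long EncodedLength(long rawBytes) => 4 * ((rawBytes + 2) / 3);
}
```

For a raw bundle of 104,857,600 bytes this gives 139,810,136 characters, which matches the "roughly 140 MB" figure and leaves ample room under the 200 MB cap for the JSON envelope.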
## Dependencies & Interactions
- [Commons (#16)](./Commons.md) — owns the message contracts (`Messages/Management/`), `ManagementEnvelope`, `ManagementCommandRegistry`, `AuthenticatedUser`, and `ManagementSuccess`/`ManagementError`/`ManagementUnauthorized` response types.
- [Configuration Database (#17)](./ConfigurationDatabase.md) — every repository (`ITemplateEngineRepository`, `ISiteRepository`, `IExternalSystemRepository`, `INotificationRepository`, `ISecurityRepository`, `IInboundApiRepository`, `IDeploymentManagerRepository`, `ICentralUiRepository`) and `IAuditService` are backed by EF Core against the central MS SQL database. Management Service resolves them per-command through scoped DI.
- [Template Engine (#1)](./TemplateEngine.md) — `TemplateService`, `TemplateFolderService`, `SharedScriptService`, and the `ValidationService` handle template authoring and validation. Management Service is the sole entry point for template mutations from outside the Central UI.
- [Deployment Manager (#2)](./DeploymentManager.md) — `DeploymentService` and `ArtifactDeploymentService` own the deployment pipeline. `MgmtDeployInstance` and `MgmtDeployArtifacts` delegate here.
- [CentralSite Communication (#5)](./Communication.md) — `CommunicationService` routes `QueryEventLogsCommand`, `QueryParkedMessagesCommand`, `RetryParkedMessageCommand`, `DiscardParkedMessageCommand`, and `DebugSnapshotCommand` to site actors via `ClusterClient`. Deployment commands also flow through the communication layer.
- [Security & Auth (#10)](./Security.md) — `ILdapAuthService` and `RoleMapper` authenticate and map roles on every HTTP request; the `Roles` constants and `IInboundApiKeyAdmin` are also consumed here.
- [Health Monitoring (#11)](./HealthMonitoring.md) — `ICentralHealthAggregator` answers `GetHealthSummary` and `GetSiteHealth` queries synchronously from its in-memory state.
- [Audit Log (#23)](./AuditLog.md) — `AuditEndpoints` reads the central `AuditLog` table via `IAuditLogRepository` directly (no actor hop). `QueryAuditLogCommand` through `/management` is a legacy path for the configuration-change audit via `ICentralUiRepository`.
- [CLI (#19)](./CLI.md) — the primary consumer of `POST /management` and the `/api/audit/*` endpoints. Builds the `{ command, payload }` JSON body, sends Basic Auth, and deserializes the response; the `ManagementEnvelope` itself is constructed server-side by the HTTP endpoint.
- [Host (#15)](./Host.md) — `AkkaHostedService` creates the `ManagementActor`, registers it with `ClusterClientReceptionist`, and sets `ManagementActorHolder.ActorRef` so the HTTP endpoint can reach it.
- Design spec: [Component-ManagementService.md](../requirements/Component-ManagementService.md).
## Troubleshooting
### Actor not ready (HTTP 503)
If `POST /management` returns `503 SERVICE_UNAVAILABLE`, `ManagementActorHolder.ActorRef` is null — the actor system has not finished starting. This resolves itself once `AkkaHostedService.StartAsync` completes. The `/health/ready` endpoint is the gating signal; traffic should not reach `/management` before it returns 200.
### Command timeout (HTTP 504)
A 504 response means the `Ask` to `ManagementActor` did not return within the configured `CommandTimeout`. The server log entry includes the `CorrelationId` from the response body. Common causes: a long-running deployment waiting on a site that is offline, or a database query against a cold EF Core connection. Increasing `ScadaBridge:ManagementService:CommandTimeout` buys time while the root cause is investigated.
### Unexpected internal error
Any exception that is not a `ManagementCommandException` or `SiteScopeViolationException` maps to a generic `COMMAND_FAILED` error with the correlation ID. The server log at `Error` level will contain the full exception, keyed by `CorrelationId`. `ManagementCommandException` messages are intentionally surfaced verbatim; all other exception messages are suppressed on the wire to avoid leaking internal detail.
### Audit log export stalls mid-stream
`GET /api/audit/export` streams rows in pages of 1 000 and flushes after each page. If the response body stops arriving, check whether a proxy is buffering the response (the endpoint sets `Cache-Control: no-store` to defeat most buffers). The `pageSize` parameter on `/api/audit/query` caps at 1 000; requests above that are silently clamped.
## Related Documentation
- [Management Service design specification](../requirements/Component-ManagementService.md)
- [CLI](./CLI.md)
- [CentralSite Communication](./Communication.md)
- [Commons](./Commons.md)
- [Configuration Database](./ConfigurationDatabase.md)
- [Template Engine](./TemplateEngine.md)
- [Deployment Manager](./DeploymentManager.md)
- [Security](./Security.md)
- [Audit Log](./AuditLog.md)
- [Host](./Host.md)