Joseph Doherty 0c3837c778 docs(components): accuracy fixes from deep review (batch 4)
ManagementService (role table: queries any-auth, area mutations Designer;
audit contract exception), CLI (missing instance/api-key subcommands; server
JSON printed verbatim; bundle preview timeout), Transport (BundleFormatVersion
exact-match gate; dependency scan fields; three flushes), CentralUI
(/api/script-analysis endpoints; LoginLayout minimal; Health tile components),
TreeView (Topology no RevealNode; ContextMenu Site branch; InitiallyExpanded).
2026-06-03 16:39:29 -04:00

Management Service

The Management Service is the Akka.NET actor that provides programmatic access to every admin operation on the central cluster — the same operations the Central UI exposes, made available over an HTTP API and, optionally, a ClusterClient path for cross-cluster callers.

Overview

Management Service (#18) runs on the central cluster only. The component code lives in src/ZB.MOM.WW.ScadaBridge.ManagementService/, with four source files:

  • ManagementActor.cs — the ReceiveActor that owns authorization, dispatch, and error mapping for all commands.
  • ManagementEndpoints.cs — the POST /management minimal-API endpoint that authenticates over HTTP Basic Auth and forwards to the actor.
  • AuditEndpoints.cs — dedicated REST endpoints (GET /api/audit/query, GET /api/audit/export) for the centralized Audit Log (#23); these bypass the actor because the workload is read-only and keyset-paged.
  • DebugStreamHub.cs — a SignalR hub for real-time debug stream subscriptions (attribute and alarm state changes).

ServiceCollectionExtensions.AddManagementService registers ManagementActorHolder (a DI singleton that holds the live IActorRef) and binds ManagementServiceOptions from ScadaBridge:ManagementService.

The ManagementActor is not a cluster singleton. Because it is completely stateless — it opens a new DI scope per command and delegates all work to repositories and domain services — every central node runs its own instance. Either node can serve any request independently, so no singleton coordination is needed.

Key Concepts

ManagementEnvelope and the wire protocol

Every command arrives wrapped in a ManagementEnvelope:

public record AuthenticatedUser(
    string Username, string DisplayName,
    string[] Roles, string[] PermittedSiteIds);

public record ManagementEnvelope(AuthenticatedUser User, object Command, string CorrelationId);

The HTTP endpoint constructs the envelope after LDAP authentication and role resolution; the CorrelationId (a Guid formatted as "N") ties server-log entries to the caller's request. The actor never authenticates a second time — the envelope carries an already-resolved AuthenticatedUser.
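
As a minimal sketch of how an endpoint might assemble the envelope, the snippet below repeats the two records above and adds a hypothetical EnvelopeFactory helper (an illustration only, not part of the documented code). It stamps a fresh correlation ID using the "N" format, which produces 32 hexadecimal digits with no hyphens.

```csharp
using System;

public record AuthenticatedUser(
    string Username, string DisplayName,
    string[] Roles, string[] PermittedSiteIds);

public record ManagementEnvelope(AuthenticatedUser User, object Command, string CorrelationId);

// Hypothetical helper for illustration; the real endpoint code may differ.
public static class EnvelopeFactory
{
    public static ManagementEnvelope Create(AuthenticatedUser user, object command) =>
        // "N" formats the Guid as 32 hex digits without hyphens or braces.
        new(user, command, Guid.NewGuid().ToString("N"));
}
```

Because the user is resolved before the envelope is built, anything downstream can treat the envelope as already authenticated.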

Role enforcement and site scope

Authorization is a two-level check. GetRequiredRole maps each command type to the minimum role required:

Role | Commands
Administrator | Site management, role mappings, API key management, scope rules, QueryAuditLogCommand, PreviewBundle, ImportBundle
Designer | Template authoring (members, folders, compositions), external systems, data connections, notification lists, shared scripts, database connections, inbound API methods, ExportBundle
Deployer | Instance lifecycle, connection bindings, overrides, deployments, debug snapshot, RetryParkedMessageCommand, DiscardParkedMessageCommand
(any authenticated user) | Read-only list/get queries, health summary

Within Deployer commands, EnforceSiteScope applies a second check: users whose role mapping carries PermittedSiteIds can only touch instances and sites within their permitted set. Administrators and system-wide deployers (empty PermittedSiteIds) are unrestricted. A violation throws SiteScopeViolationException, which MapFault converts to ManagementUnauthorized.
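
The two levels can be sketched as a pair of pure functions. This is an illustration under stated assumptions, not the actor's actual code: the SiteScopeSketch class and its method names are hypothetical, the role name is taken from the table above as a string literal, and site IDs are assumed to compare ordinally.

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical helper; names and comparison rules are assumptions.
public static class SiteScopeSketch
{
    // Level 1: the caller must hold the required role (case-insensitive),
    // unless the command requires no role at all.
    public static bool HasRole(string[] roles, string? requiredRole) =>
        requiredRole is null ||
        roles.Contains(requiredRole, StringComparer.OrdinalIgnoreCase);

    // Level 2: Administrators and callers with an empty permitted set are
    // unrestricted; everyone else must have the site in their permitted set.
    public static bool IsSitePermitted(string[] roles, string[] permittedSiteIds, string siteId) =>
        roles.Contains("Administrator", StringComparer.OrdinalIgnoreCase) ||
        permittedSiteIds.Length == 0 ||
        permittedSiteIds.Contains(siteId, StringComparer.Ordinal);
}
```

In this sketch a false result from IsSitePermitted corresponds to the point where the documented code throws SiteScopeViolationException.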

Command registry

ManagementCommandRegistry (in Commons) maps wire names to CLR types via reflection at startup. It scans the ZB.MOM.WW.ScadaBridge.Commons.Messages.Management namespace for non-abstract types whose name ends in "Command" and stores them in a FrozenDictionary. The HTTP endpoint calls ManagementCommandRegistry.Resolve(commandName) to get the target type, then deserializes the payload JSON into it.
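
A reflection scan of this kind might look roughly like the sketch below. The namespace, the sample record types, and the CommandRegistrySketch class are all hypothetical. It also assumes that the wire name is the type name with the "Command" suffix removed and that lookups are case-sensitive; the source does not state either rule.

```csharp
#nullable enable
using System;
using System.Collections.Frozen;
using System.Linq;
using System.Reflection;

namespace Sketch.Messages.Management
{
    public record ListSitesCommand;               // registered as "ListSites"
    public record GetSiteCommand(string SiteId);  // registered as "GetSite"
    public abstract record AbstractCommand;       // skipped: abstract
    public record SiteSummary(string Name);       // skipped: name lacks the suffix
}

namespace Sketch
{
    public static class CommandRegistrySketch
    {
        private const string Suffix = "Command";

        // Scans one namespace of one assembly and freezes the result for fast reads.
        public static FrozenDictionary<string, Type> Build(Assembly assembly, string ns) =>
            assembly.GetTypes()
                .Where(t => t.Namespace == ns
                            && !t.IsAbstract
                            && t.Name.EndsWith(Suffix, StringComparison.Ordinal))
                // Assumption: wire name = type name without the "Command" suffix.
                .ToFrozenDictionary(
                    t => t.Name[..^Suffix.Length],
                    t => t,
                    StringComparer.Ordinal);

        // Returns null for an unknown command name.
        public static Type? Resolve(FrozenDictionary<string, Type> registry, string name) =>
            registry.TryGetValue(name, out var type) ? type : null;
    }
}
```

Building the dictionary once at startup keeps reflection off the request path; each request then costs a single dictionary lookup.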

Audit contract

Mutating handlers that write through repositories themselves call AuditAsync (backed by IAuditService) after a successful write. Most handlers that delegate to a domain service (TemplateService, DeploymentService, ArtifactDeploymentService, TemplateFolderService, SharedScriptService) do not call AuditAsync; those services audit internally, which avoids double-logging. There are exceptions: HandleCreateInstance delegates to InstanceService.CreateInstanceAsync and then calls AuditAsync itself. Handlers for SMTP configuration and API keys project secrets out of the response before the audit entry is written.

Architecture

Actor lifecycle and registration

AkkaHostedService (in the Host) creates the ManagementActor under the path /user/management and registers it with ClusterClientReceptionist:

var mgmtActor = _actorSystem!.ActorOf(
    Props.Create(() => new ManagementActor(_serviceProvider, mgmtLogger)),
    "management");
ClusterClientReceptionist.Get(_actorSystem).RegisterService(mgmtActor);
var mgmtHolder = _serviceProvider.GetRequiredService<ManagementActorHolder>();
mgmtHolder.ActorRef = mgmtActor;

ClusterClientReceptionist advertises the actor to ClusterClient senders without requiring them to join the Akka cluster. The ManagementActorHolder.ActorRef property is then the bridge from the HTTP endpoint (which runs in ASP.NET Core middleware) into the Akka actor world.

The actor declares an explicit supervisor strategy — one-for-one with Resume and no retry limit — to match the coordinator-actor convention and remain correct if child actors are added later.

HTTP Management API (POST /management)

ManagementEndpoints.MapManagementAPI registers the endpoint. Each request goes through six steps:

  1. Raise the per-request body size cap to 200 MB (needed for Transport bundle imports).
  2. Decode Authorization: Basic <base64> and split username/password.
  3. Authenticate via ILdapAuthService.
  4. Resolve roles via RoleMapper, building the AuthenticatedUser with any site-scope limits.
  5. Deserialize the JSON body (command + payload) via ManagementCommandRegistry.
  6. Ask the ManagementActor with a ManagementEnvelope and map the response:
return response switch
{
    ManagementSuccess success => Results.Text(success.JsonData, "application/json", statusCode: 200),
    ManagementError error    => Results.Json(new { error = error.Error, code = error.ErrorCode }, statusCode: 400),
    ManagementUnauthorized u => Results.Json(new { error = u.Message, code = "UNAUTHORIZED" }, statusCode: 403),
    _                        => Results.Json(new { error = "Unexpected response.", code = "INTERNAL_ERROR" }, statusCode: 500)
};

The Ask timeout defaults to 30 seconds and is overridable via ScadaBridge:ManagementService:CommandTimeout. An elapsed timeout returns HTTP 504.
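
Step 2 of the request flow (decoding the Basic credentials) can be sketched with the base class library alone. BasicAuthSketch is a hypothetical name, and the exact rejection rules are assumptions; the real endpoint may handle malformed headers differently.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical helper; not the endpoint's actual parsing code.
public static class BasicAuthSketch
{
    // Returns null for a missing, non-Basic, or malformed header.
    public static (string Username, string Password)? Parse(string? header)
    {
        const string prefix = "Basic ";
        if (header is null || !header.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return null;

        string decoded;
        try
        {
            decoded = Encoding.UTF8.GetString(
                Convert.FromBase64String(header[prefix.Length..].Trim()));
        }
        catch (FormatException)
        {
            return null; // payload was not valid base64
        }

        // Split on the first colon only: passwords may themselves contain colons.
        var idx = decoded.IndexOf(':');
        if (idx <= 0) return null; // no separator, or empty username
        return (decoded[..idx], decoded[(idx + 1)..]);
    }
}
```

Splitting on the first colon matters because only the username is forbidden from containing one.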

Actor dispatch and error mapping

ManagementActor.HandleEnvelope checks the required role, then calls ProcessCommand, which opens a DI scope, runs DispatchCommand, and wraps the result in ManagementSuccess. The PipeTo pattern keeps the actor's message loop free during async work; the failure continuation maps exceptions to ManagementError or ManagementUnauthorized:

private void HandleEnvelope(ManagementEnvelope envelope)
{
    var sender = Sender;
    var correlationId = envelope.CorrelationId;
    var user = envelope.User;

    var requiredRole = GetRequiredRole(envelope.Command);
    if (requiredRole != null && !user.Roles.Contains(requiredRole, StringComparer.OrdinalIgnoreCase))
    {
        sender.Tell(new ManagementUnauthorized(correlationId,
            $"Role '{requiredRole}' required for {envelope.Command.GetType().Name}"));
        return;
    }

    ProcessCommand(envelope, user)
        .PipeTo(sender,
            success: result => result,
            failure: ex => MapFault(ex, correlationId, envelope.Command));
}

ManagementCommandException carries a message safe to surface to callers. Any other exception is an unanticipated fault; only the correlation ID is returned so internal detail (server names, constraint names) is not disclosed.
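
The mapping described above can be sketched as a single switch expression. The types below are simplified stand-ins declared only so the example is self-contained; their constructor shapes, the "COMMAND_ERROR" code, and the wording of the generic message are assumptions. Only the COMMAND_FAILED code for unanticipated faults comes from this document.

```csharp
using System;

// Stand-in types for illustration; the real ones live in Commons and may differ.
public record ManagementError(string CorrelationId, string Error, string ErrorCode);
public record ManagementUnauthorized(string CorrelationId, string Message);

public class ManagementCommandException : Exception
{
    public ManagementCommandException(string message) : base(message) { }
}

public class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string message) : base(message) { }
}

public static class FaultMappingSketch
{
    public static object MapFault(Exception ex, string correlationId) => ex switch
    {
        // Scope violations become an authorization failure.
        SiteScopeViolationException scope =>
            new ManagementUnauthorized(correlationId, scope.Message),

        // Caller-safe message is surfaced verbatim ("COMMAND_ERROR" is hypothetical).
        ManagementCommandException known =>
            new ManagementError(correlationId, known.Message, "COMMAND_ERROR"),

        // Anything else: return only the correlation ID, never ex.Message.
        _ => new ManagementError(
            correlationId,
            $"Command failed. Correlation ID: {correlationId}",
            "COMMAND_FAILED"),
    };
}
```

The default arm deliberately ignores the exception text, so details such as server or constraint names stay in the server log.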

Audit REST API (/api/audit/*)

AuditEndpoints.MapAuditAPI registers two GET endpoints that go directly to IAuditLogRepository, bypassing the actor:

  • GET /api/audit/query — keyset-paged JSON result. Requires OperationalAudit permission (Admin / Audit / AuditReadOnly roles). Accepts channel, kind, status, sourceSiteId, correlationId, executionId, parentExecutionId, fromUtc, toUtc, pageSize, and cursor params afterOccurredAtUtc/afterEventId. Returns { events, nextCursor } where nextCursor is explicit null on the last page.
  • GET /api/audit/export — server-side streaming export (CSV or JSONL) of all matching rows, paging the repository internally at 1 000 rows per batch and flushing after each batch. Requires AuditExport permission (Admin / Audit roles). format=parquet returns HTTP 501 (deferred).

Both endpoints apply the same HTTP Basic Auth / LDAP / role flow as /management. Site-scoped callers have their sourceSiteId filter intersected with their PermittedSiteIds; an explicit out-of-scope filter returns HTTP 403 rather than silently empty results.
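
To illustrate the keyset paging behind /api/audit/query, here is an in-memory sketch. It is not the repository's query: the record shapes, ascending sort order, a numeric event ID, the lower clamp bound of 1, and the "fetch one extra row" trick for detecting the last page are all assumptions made for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes; the real event and cursor types are not shown in this document.
public record AuditEvent(DateTime OccurredAtUtc, long EventId, string Kind);
public record Cursor(DateTime AfterOccurredAtUtc, long AfterEventId);
public record Page(IReadOnlyList<AuditEvent> Events, Cursor? NextCursor);

public static class KeysetPagingSketch
{
    public static Page Query(IEnumerable<AuditEvent> source, int pageSize, Cursor? after)
    {
        // Oversized requests are clamped to 1 000 rather than rejected.
        pageSize = Math.Clamp(pageSize, 1, 1000);

        var rows = source
            // Keyset predicate: strictly after the (timestamp, id) pair of the cursor.
            .Where(e => after is null
                || e.OccurredAtUtc > after.AfterOccurredAtUtc
                || (e.OccurredAtUtc == after.AfterOccurredAtUtc && e.EventId > after.AfterEventId))
            .OrderBy(e => e.OccurredAtUtc).ThenBy(e => e.EventId)
            .Take(pageSize + 1) // one extra row reveals whether another page exists
            .ToList();

        var events = rows.Take(pageSize).ToList();
        var next = rows.Count > pageSize
            ? new Cursor(events[^1].OccurredAtUtc, events[^1].EventId)
            : null; // explicit null on the last page
        return new Page(events, next);
    }
}
```

Unlike offset paging, the cursor stays stable when new rows arrive, and the event ID breaks ties between rows that share a timestamp.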

Debug stream (/debug-stream)

DebugStreamHub is a SignalR hub registered alongside the management endpoints. It authenticates on OnConnectedAsync (same Basic Auth / LDAP / role flow), requires the Deployer role, and enforces per-instance site scope on SubscribeInstance. Accepted connections receive an initial DebugViewSnapshot followed by incremental AttributeValueChanged and AlarmStateChanged events pushed from DebugStreamService.

Usage

Sending a command from the CLI

The CLI sends a single POST /management request with a JSON body and Basic Auth; it does not use ClusterClient directly. A typical request:

POST /management
Authorization: Basic base64(username:password)
Content-Type: application/json

{
  "command": "ListSites",
  "payload": {}
}

A successful response is HTTP 200 with the JSON result. An authorization failure is HTTP 403 with { "error": "...", "code": "UNAUTHORIZED" }.
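
For callers other than the CLI, the same request could be assembled with HttpClient types. The sketch below only builds the HttpRequestMessage and does not send it; ManagementRequestSketch and the base address used in the usage note are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Hypothetical helper; the CLI's own request code is not shown in this document.
public static class ManagementRequestSketch
{
    public static HttpRequestMessage Build(
        Uri baseAddress, string username, string password, string command, object payload)
    {
        // Basic credentials are base64("username:password").
        var credentials = Convert.ToBase64String(
            Encoding.UTF8.GetBytes($"{username}:{password}"));

        // Body shape from the example above: { "command": ..., "payload": ... }.
        var body = JsonSerializer.Serialize(new { command, payload });

        var request = new HttpRequestMessage(HttpMethod.Post, new Uri(baseAddress, "/management"))
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json"),
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Basic", credentials);
        return request;
    }
}
```

A caller would pass the result to HttpClient.SendAsync, for example with Build(new Uri("https://central.example"), user, password, "ListSites", new { }), and then branch on the status codes described above.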

Sending a command via ClusterClient

The ManagementActor is also reachable from any ClusterClient that has a contact point into the central cluster. The actor is registered with ClusterClientReceptionist (which listens at /system/receptionist) under the service path /user/management. Callers wrap a ManagementEnvelope in a ClusterClient.Send addressed to /user/management and expect one of ManagementSuccess, ManagementError, or ManagementUnauthorized in reply.

Command Groups

DispatchCommand in ManagementActor.cs is the canonical enumeration of every supported command. The table below organizes them by domain area.

Group | Commands | Minimum role
Templates | ListTemplates, GetTemplate, CreateTemplate, UpdateTemplate, DeleteTemplate, ValidateTemplate | Designer (mutations)
Template members | AddTemplateAttribute, UpdateTemplateAttribute, DeleteTemplateAttribute, AddTemplateAlarm, UpdateTemplateAlarm, DeleteTemplateAlarm, AddTemplateNativeAlarmSource, UpdateTemplateNativeAlarmSource, DeleteTemplateNativeAlarmSource, ListTemplateNativeAlarmSources, AddTemplateScript, UpdateTemplateScript, DeleteTemplateScript, AddTemplateComposition, DeleteTemplateComposition | Designer (mutations)
Template folders | ListTemplateFolders, CreateTemplateFolder, RenameTemplateFolder, MoveTemplateFolder, DeleteTemplateFolder, MoveTemplateToFolder | Designer (mutations)
Instances | ListInstances, GetInstance, CreateInstance, MgmtDeployInstance, MgmtEnableInstance, MgmtDisableInstance, MgmtDeleteInstance, SetConnectionBindings, SetInstanceOverrides, SetInstanceArea, SetInstanceAlarmOverride, DeleteInstanceAlarmOverride, ListInstanceAlarmOverrides, SetInstanceNativeAlarmSourceOverride, DeleteInstanceNativeAlarmSourceOverride, ListInstanceNativeAlarmSourceOverrides | Deployer (mutations)
Sites & areas | ListSites, GetSite, CreateSite, UpdateSite, DeleteSite, ListAreas, CreateArea, UpdateArea, DeleteArea | Administrator (site mutations); Designer (CreateArea, UpdateArea, DeleteArea)
Data connections | ListDataConnections, GetDataConnection, CreateDataConnection, UpdateDataConnection, DeleteDataConnection | Designer (mutations)
External systems | ListExternalSystems, GetExternalSystem, CreateExternalSystem, UpdateExternalSystem, DeleteExternalSystem, ListExternalSystemMethods, GetExternalSystemMethod, CreateExternalSystemMethod, UpdateExternalSystemMethod, DeleteExternalSystemMethod | Designer (mutations)
Notification lists / SMTP | ListNotificationLists, GetNotificationList, CreateNotificationList, UpdateNotificationList, DeleteNotificationList, ListSmtpConfigs, UpdateSmtpConfig | Designer (mutations)
Shared scripts | ListSharedScripts, GetSharedScript, CreateSharedScript, UpdateSharedScript, DeleteSharedScript | Designer (mutations)
Database connections | ListDatabaseConnections, GetDatabaseConnection, CreateDatabaseConnectionDef, UpdateDatabaseConnectionDef, DeleteDatabaseConnectionDef | Designer (mutations)
Inbound API methods | ListApiMethods, GetApiMethod, CreateApiMethod, UpdateApiMethod, DeleteApiMethod | Designer (mutations)
Security | ListRoleMappings, CreateRoleMapping, UpdateRoleMapping, DeleteRoleMapping, ListApiKeys, CreateApiKey, UpdateApiKey, DeleteApiKey, SetApiKeyMethods, ListScopeRules, AddScopeRule, DeleteScopeRule | Administrator
Deployments | MgmtDeployArtifacts, QueryDeployments, GetDeploymentDiff | Deployer
Health | GetHealthSummary, GetSiteHealth | Any authenticated user
Remote queries | QueryEventLogsCommand, QueryParkedMessagesCommand (any authenticated user); RetryParkedMessageCommand, DiscardParkedMessageCommand, DebugSnapshotCommand (Deployer) | Varies
Audit (legacy) | QueryAuditLog | Administrator
Transport | ExportBundle (Designer), PreviewBundle, ImportBundle (Administrator) | Varies

ValidateTemplate builds a FlattenedConfiguration from the template's attributes, alarms, and scripts, runs the full ValidationService pipeline (collision detection, script compilation, trigger reference checks), and merges in naming-collision errors from TemplateService.DetectCollisionsAsync — all without a deployment.

SetInstanceOverrides validates every attribute name and lock status against the template before applying any write, making the batch all-or-nothing at the validation layer.
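
The validate-everything-first pattern can be sketched as follows. The TemplateAttribute record, the dictionary shapes, and the error messages are hypothetical; the real handler works against template and instance entities that this document does not show.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape: just the two facts the validation needs.
public record TemplateAttribute(string Name, bool IsLocked);

public static class OverrideValidationSketch
{
    // Collects every problem; an empty list means the whole batch is acceptable.
    public static List<string> Validate(
        IReadOnlyDictionary<string, TemplateAttribute> templateAttributes,
        IReadOnlyDictionary<string, string> overrides)
    {
        var errors = new List<string>();
        foreach (var name in overrides.Keys)
        {
            if (!templateAttributes.TryGetValue(name, out var attribute))
                errors.Add($"Unknown attribute '{name}'.");
            else if (attribute.IsLocked)
                errors.Add($"Attribute '{name}' is locked.");
        }
        return errors;
    }

    // Writes nothing unless the entire batch validated.
    public static bool TryApply(
        IReadOnlyDictionary<string, TemplateAttribute> templateAttributes,
        IReadOnlyDictionary<string, string> overrides,
        IDictionary<string, string> target,
        out List<string> errors)
    {
        errors = Validate(templateAttributes, overrides);
        if (errors.Count > 0) return false;

        foreach (var (name, value) in overrides)
            target[name] = value;
        return true;
    }
}
```

Reporting all errors at once, rather than stopping at the first, lets a caller fix a rejected batch in a single round trip.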

Configuration

Section | Key | Default | Description
ScadaBridge:ManagementService | CommandTimeout | 00:00:30 | Ask timeout the ManagementEndpoints applies when forwarding to the ManagementActor. A non-positive value falls back to the 30-second default.
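
The fallback rule for CommandTimeout amounts to a single comparison. CommandTimeoutSketch is a hypothetical name used only for this illustration.

```csharp
using System;

// Hypothetical helper showing the documented fallback rule.
public static class CommandTimeoutSketch
{
    public static readonly TimeSpan Default = TimeSpan.FromSeconds(30);

    // Zero or negative configured values fall back to the 30-second default.
    public static TimeSpan Effective(TimeSpan configured) =>
        configured > TimeSpan.Zero ? configured : Default;
}
```

With this rule a misconfigured value such as 00:00:00 cannot make every Ask time out immediately.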

The 200 MB per-request body cap (ManagementEndpoints.MaxManagementRequestBodyBytes) is hard-coded; it exists to accommodate Transport (#24) Import calls where a 100 MB raw bundle base64-inflates to roughly 140 MB plus the envelope overhead.
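
The inflation figure follows from base64 emitting four characters for every three input bytes. The sketch below works through the arithmetic, assuming "100 MB" means 100 × 1024 × 1024 bytes; Base64SizeSketch is a hypothetical name.

```csharp
using System;

public static class Base64SizeSketch
{
    // Padded base64 length: 4 output characters per started group of 3 input bytes.
    public static long EncodedLength(long rawBytes) => (rawBytes + 2) / 3 * 4;
}
```

For a raw bundle of 104 857 600 bytes, EncodedLength returns 139 810 136 characters, in line with the "roughly 140 MB" quoted above.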

Dependencies & Interactions

  • Commons (#16) — owns the message contracts (Messages/Management/), ManagementEnvelope, ManagementCommandRegistry, AuthenticatedUser, and ManagementSuccess/ManagementError/ManagementUnauthorized response types.
  • Configuration Database (#17) — every repository (ITemplateEngineRepository, ISiteRepository, IExternalSystemRepository, INotificationRepository, ISecurityRepository, IInboundApiRepository, IDeploymentManagerRepository, ICentralUiRepository) and IAuditService are backed by EF Core against the central MS SQL database. Management Service resolves them per-command through scoped DI.
  • Template Engine (#1) — TemplateService, TemplateFolderService, SharedScriptService, and the ValidationService handle template authoring and validation. Management Service is the sole entry point for template mutations from outside the Central UI.
  • Deployment Manager (#2) — DeploymentService and ArtifactDeploymentService own the deployment pipeline. MgmtDeployInstance and MgmtDeployArtifacts delegate here.
  • Central-Site Communication (#5) — CommunicationService routes QueryEventLogsCommand, QueryParkedMessagesCommand, RetryParkedMessageCommand, DiscardParkedMessageCommand, and DebugSnapshotCommand to site actors via ClusterClient. Deployment commands also flow through the communication layer.
  • Security & Auth (#10) — ILdapAuthService and RoleMapper authenticate and map roles on every HTTP request; the Roles constants and IInboundApiKeyAdmin are also consumed here.
  • Health Monitoring (#11) — ICentralHealthAggregator answers GetHealthSummary and GetSiteHealth queries synchronously from its in-memory state.
  • Audit Log (#23) — AuditEndpoints reads the central AuditLog table via IAuditLogRepository directly (no actor hop). QueryAuditLogCommand through /management is a legacy path for the configuration-change audit via ICentralUiRepository.
  • CLI (#19) — the primary consumer of POST /management and the /api/audit/* endpoints. Sends the { command, payload } JSON body with Basic Auth and deserializes the response; the HTTP endpoint, not the CLI, builds the ManagementEnvelope.
  • Host (#15) — AkkaHostedService creates the ManagementActor, registers it with ClusterClientReceptionist, and sets ManagementActorHolder.ActorRef so the HTTP endpoint can reach it.
  • Design spec: Component-ManagementService.md.

Troubleshooting

Actor not ready (HTTP 503)

If POST /management returns 503 SERVICE_UNAVAILABLE, ManagementActorHolder.ActorRef is null — the actor system has not finished starting. This resolves itself once AkkaHostedService.StartAsync completes. The /health/ready endpoint is the gating signal; traffic should not reach /management before it returns 200.

Command timeout (HTTP 504)

A 504 response means the Ask to ManagementActor did not return within the configured CommandTimeout. The server log entry includes the CorrelationId from the response body. Common causes: a long-running deployment waiting on a site that is offline, or a database query against a cold EF Core connection. Increasing ScadaBridge:ManagementService:CommandTimeout buys time while the root cause is investigated.

Unexpected internal error

Any exception that is not a ManagementCommandException or SiteScopeViolationException maps to a generic COMMAND_FAILED error with the correlation ID. The server log at Error level will contain the full exception, keyed by CorrelationId. ManagementCommandException messages are intentionally surfaced verbatim; all other exception messages are suppressed on the wire to avoid leaking internal detail.

Audit log export stalls mid-stream

GET /api/audit/export streams rows in pages of 1 000 and flushes after each page. If the response body stops arriving, check whether a proxy is buffering the response (the endpoint sets Cache-Control: no-store to defeat most buffers). The pageSize parameter on /api/audit/query caps at 1 000; requests above that are silently clamped.