41 KiB
Code Review — ManagementService
| Field | Value |
|---|---|
| Module | src/ScadaLink.ManagementService |
| Design doc | docs/requirements/Component-ManagementService.md |
| Status | Reviewed |
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | 39d737e |
| Open findings | 0 (1 Deferred — see ManagementService-012) |
Summary
The ManagementService module is a thin command-dispatch layer: a single ManagementActor
fronts every administrative operation, an HTTP POST /management endpoint authenticates and
forwards to it, and a SignalR DebugStreamHub provides real-time debug streaming. The code
is consistently structured and the role-based authorization gate (GetRequiredRole) is
broadly correct and well tested. However, the review surfaced a significant security
theme: site-scope enforcement, which the design document requires for instance- and
site-targeted Deployment operations, is applied inconsistently — several query handlers and
all remote-query/debug handlers perform no site-scope check at all, allowing a site-scoped
Deployment user to read or act on sites outside their scope. A second theme is Akka.NET
convention drift: the actor offloads all work to Task.Run instead of using PipeTo,
declares no supervision strategy, and the contract messages carry a loosely-typed object
payload. There are also resource-management defects in the HTTP endpoint (JsonDocument
instances never disposed) and dead/unused configuration. None of the findings are
crash-class, but the site-scope gaps are High severity because they are a real
authorization bypass with no workaround.
Re-review 2026-05-17 (commit 39d737e)
All thirteen prior findings remain correctly closed; the source under
src/ScadaLink.ManagementService is byte-identical between the previously reviewed state
and 39d737e (the resolution commits of findings 001–013 are folded into the history at or
before 39d737e). ManagementService-012 was re-checked and its Deferred status still
holds: ManagementEnvelope.Command is still typed object, and the marker-interface fix
still belongs in the Commons module, outside this module's edit scope — nothing has changed
to make it actionable here. This re-review re-ran the full 10-category checklist against the
current sources and surfaced four new findings. The dominant theme is the same
site-scope authorization gap that findings 001/002 closed: HandleQueryDeployments
(QueryDeploymentsCommand) was overlooked by that sweep and still performs no site-scope
enforcement, letting a site-scoped Deployment user read deployment history for any site
(014, High). The remaining three are lower severity: a non-atomic multi-override mutation
that can leave an instance partially modified after an error (015, Medium), raw exception
messages from unexpected faults being returned verbatim to HTTP callers (016, Low), and
QueryDeploymentsCommand having no test coverage at all (017, Low).
Checklist coverage
| # | Category | Examined | Notes |
|---|---|---|---|
| 1 | Correctness & logic bugs | + | HandleResolveRoles builds RoleMapper by hand; ResolveRolesCommand is a stale dispatch path. See 008, 011. |
| 2 | Akka.NET conventions | + | Task.Run instead of PipeTo, no supervision strategy, object-typed message payload. See 004, 005, 012. |
| 3 | Concurrency & thread safety | + | Actor is stateless so Task.Run does not corrupt state, but it defeats actor-thread serialization (004). Sender correctly captured to a local before the closure. |
| 4 | Error handling & resilience | + | Exceptions are caught and mapped uniformly; SiteScopeViolationException mapped to Unauthorized. Audit-logging consistency issue noted in 009. |
| 5 | Security | + | Site-scope enforcement missing on query/remote/debug paths. See 001, 002, 003. |
| 6 | Performance & resource management | + | JsonDocument instances never disposed in the HTTP endpoint. See 006. |
| 7 | Design-document adherence | + | Design doc states remote queries enforce site scoping; code does not. ManagementServiceOptions reserved-for-future config is unused. See 001, 010. |
| 8 | Code organization & conventions | + | Mixed serializers (Newtonsoft in actor, System.Text.Json in endpoint); inconsistent audit logging across mutations. See 007, 009. |
| 9 | Testing coverage | + | Authorization is well covered; site-scope enforcement, the HTTP endpoint, DebugStreamHub, and remote-query handlers have no tests. See 013. |
| 10 | Documentation & comments | + | XML docs are accurate where present; ManagementServiceOptions and ResolveRolesCommand paths are undocumented dead code (010, 011). |
Findings
ManagementService-001 — Remote-query and debug-snapshot handlers bypass site-scope enforcement
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:1465, :1481, :1493, :641, :649 |
Description
The design document (Component-ManagementService.md, Authorization section) states that for
Deployment users "Site scoping is enforced for site-scoped Deployment users" and lists
"debug snapshot, parked message queries, site event log queries" among the Deployment-role
operations. HandleQueryEventLogs, HandleQueryParkedMessages, HandleDebugSnapshot,
HandleRetryParkedMessage, and HandleDiscardParkedMessage make no call to EnforceSiteScope
or EnforceSiteScopeForInstance. A Deployment user scoped to site A can therefore query event
logs / parked messages of site B, retry or discard another site's parked messages, and pull a
debug snapshot of any instance simply by supplying a different SiteIdentifier or InstanceId.
This is an authorization bypass with no workaround.
Recommendation
In each of these handlers resolve the target site and call site-scope enforcement before
delegating to CommunicationService. For the SiteIdentifier-keyed handlers, look up the
Site by identifier and enforce against Site.Id; for DebugSnapshotCommand the instance
is already loaded — call EnforceSiteScope(user, instance.SiteId) (which requires threading
AuthenticatedUser into these handlers, currently dropped).
Resolution
Resolved 2026-05-16 (commit <pending>). Threaded AuthenticatedUser into
HandleQueryEventLogs, HandleQueryParkedMessages, HandleRetryParkedMessage,
HandleDiscardParkedMessage, and HandleDebugSnapshot; added an
EnforceSiteScopeForIdentifier helper that resolves the site by identifier and applies
EnforceSiteScope. HandleDebugSnapshot enforces against the already-loaded instance's
SiteId. Regression tests: QueryEventLogs_OutOfScopeForSiteScopedUser_ReturnsUnauthorized,
QueryParkedMessages_OutOfScopeForSiteScopedUser_ReturnsUnauthorized,
RetryParkedMessage_OutOfScopeForSiteScopedUser_ReturnsUnauthorized,
DiscardParkedMessage_OutOfScopeForSiteScopedUser_ReturnsUnauthorized,
DebugSnapshot_OutOfScopeForSiteScopedUser_ReturnsUnauthorized.
ManagementService-002 — Single-entity query handlers leak data across site scope
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:510, :673, :733, :774, :631, :624 |
Description
HandleListInstances and HandleListSites correctly filter their results by the user's
PermittedSiteIds, but the single-entity query handlers do not. HandleGetInstance,
HandleGetSite, HandleListAreas, and HandleGetDataConnection fetch by ID with no
site-scope check, so a site-scoped Deployment user can read any instance, site, area tree,
or data connection by ID even though that site is excluded from their scope. The list
endpoints having a filter while the get-by-id endpoints do not is an inconsistency that
undermines the scoping model. (HandleGetDeploymentDiff and HandleListInstanceAlarmOverrides
do enforce scope, confirming the omission elsewhere is unintentional.)
Recommendation
Apply EnforceSiteScopeForInstance in HandleGetInstance, and EnforceSiteScope against
the resolved site ID in HandleGetSite, HandleListAreas, and HandleGetDataConnection
(for data connections, scope by the connection's SiteId).
Resolution
Resolved 2026-05-16 (commit <pending>). HandleGetInstance, HandleGetSite,
HandleGetDataConnection now take AuthenticatedUser and call EnforceSiteScope against
the resolved entity's site ID (instance SiteId, site Id, data-connection SiteId);
HandleListAreas enforces against the requested SiteId before querying. Regression tests:
GetInstance_OutOfScopeForSiteScopedUser_ReturnsUnauthorized,
GetInstance_InScopeForSiteScopedUser_ReturnsSuccess,
GetSite_OutOfScopeForSiteScopedUser_ReturnsUnauthorized,
GetSite_OutOfScopeForAdminUser_ReturnsSuccess,
ListAreas_OutOfScopeForSiteScopedUser_ReturnsUnauthorized,
GetDataConnection_OutOfScopeForSiteScopedUser_ReturnsUnauthorized.
ManagementService-003 — DebugStreamHub.SubscribeInstance performs no per-instance authorization
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/DebugStreamHub.cs:104 |
Description
OnConnectedAsync authenticates the WebSocket connection and verifies the caller holds the
Deployment role, but SubscribeInstance(int instanceId) accepts any instance ID and starts
a stream without checking that the authenticated user is scoped to that instance's site. A
site-scoped Deployment user can therefore subscribe to the live debug stream (attribute
values, alarm states) of an instance belonging to a site outside their scope. This is the
streaming equivalent of finding 001/002.
Recommendation
Resolve the instance's site inside SubscribeInstance and reject the subscription if the
authenticated user's permitted-site set does not include it. The authenticated identity
established in OnConnectedAsync must be persisted on the connection (e.g. in
Context.Items) so it is available to SubscribeInstance.
Resolution
Resolved 2026-05-16 (commit <pending>). OnConnectedAsync now persists the resolved
roles and PermittedSiteIds in Context.Items. SubscribeInstance resolves the
instance's site via ITemplateEngineRepository and rejects the subscription (sending
OnStreamTerminated) when the new pure DebugStreamHub.IsInstanceAccessAllowed check
fails. The check grants access for the Admin role or system-wide Deployment (empty
permitted set) and otherwise requires the instance's site in the permitted set. Regression
tests: IsInstanceAccessAllowed_SiteScopedUser_OutOfScopeInstance_Denied,
IsInstanceAccessAllowed_SiteScopedUser_InScopeInstance_Allowed,
IsInstanceAccessAllowed_SystemWideDeployment_AnySiteAllowed,
IsInstanceAccessAllowed_AdminRole_BypassesSiteScope,
IsInstanceAccessAllowed_AdminRoleCheck_IsCaseInsensitive.
ManagementService-004 — Actor offloads work to Task.Run instead of using PipeTo
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:61 |
Description
HandleEnvelope runs every command on a thread-pool thread via Task.Run(async () => ...)
and replies from inside the continuation. This is the anti-pattern the project's Akka.NET
conventions warn against — the canonical approach is to start the async work and PipeTo
its result back to Self/Sender. Although Sender is correctly copied to a local before
the closure, the current code: (a) lets multiple commands execute fully concurrently with no
actor-thread serialization, so the actor provides no ordering or back-pressure guarantees
and is an actor in name only; (b) cannot be paused, supervised, or made to honour a mailbox
bound; (c) is shielded from synchronous faults only because every path is inside the
try/catch — any future code path that throws synchronously before the Task.Run body would
escape it.
Recommendation
Replace Task.Run with a method that returns the Task and PipeTo the mapped result
(ManagementSuccess/ManagementError/ManagementUnauthorized) back to the captured sender,
mapping faults in the PipeTo failure continuation. If genuine parallelism is desired, make
that explicit with a router/dispatcher rather than ad-hoc Task.Run.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: HandleEnvelope ran every command via
Task.Run and replied from inside the continuation, contrary to the project's PipeTo
convention. Replaced it with a ProcessCommand method returning a Task<object> and
PipeTo(sender, success, failure); faults are now mapped uniformly in a MapFault failure
continuation (SiteScopeViolationException -> ManagementUnauthorized, otherwise
ManagementError), which also unwraps AggregateException. Regression test:
UnknownCommandType_FaultMappedToManagementError. Existing success/error/unauthorized
mapping tests confirm behaviour is preserved.
ManagementService-005 — ManagementActor declares no supervision strategy
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:33 |
Description
The project conventions call for explicit supervision strategies (Resume for coordinator
actors). ManagementActor is a long-lived coordinator-style actor but overrides no
SupervisorStrategy and defines no PreRestart/PostRestart behaviour. In practice it
spawns no children so the default strategy is rarely exercised, but an explicit strategy
should still be declared for clarity and to match the documented convention; it also matters
if children are added later (e.g. if finding 004 introduces worker actors).
Recommendation
Add an explicit protected override SupervisorStrategy SupervisorStrategy() returning a
Resume-based strategy, consistent with other central coordinator actors.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: ManagementActor declared no
SupervisorStrategy. Added a public static SupervisorStrategy CreateSupervisorStrategy()
factory returning an unbounded OneForOneStrategy with a Directive.Resume decider, and a
protected override SupervisorStrategy() that delegates to it — matching the Resume-based
convention of CentralCommunicationActor/SiteCommunicationActor. The actor spawns no
children today, so this is a forward-looking correctness fix. Regression tests:
CreateSupervisorStrategy_ReturnsOneForOneStrategy,
CreateSupervisorStrategy_ResumesOnArbitraryException,
CreateSupervisorStrategy_ResumesIndefinitely (new ManagementActorSupervisionTests.cs).
ManagementService-006 — JsonDocument instances never disposed in the HTTP endpoint
| Severity | Medium |
| Category | Performance & resource management |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementEndpoints.cs:83, :112 |
Description
JsonDocument is IDisposable (it rents buffers from a pooled ArrayPool). HandleRequest
parses the request body into doc at line 83 and never disposes it, and line 112
(JsonDocument.Parse("{}")) allocates a second document inline that is also never disposed.
Every management HTTP call therefore leaks pooled buffers, increasing GC pressure and pool
churn under load.
Recommendation
Wrap the parsed document in using var doc = .... For the empty-payload fallback, avoid
allocating a JsonDocument entirely — deserialize from the literal string "{}"/an empty
object, or restructure so the fallback path does not parse a throwaway document.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: the request JsonDocument was never
disposed and the empty-payload path allocated a second throwaway JsonDocument. Extracted
request parsing into a testable ManagementEndpoints.ParseCommand helper that wraps the
document in using; the missing-payload case now deserializes from the "{}" literal
string rather than parsing a throwaway document. Regression tests:
ParseCommand_WithExplicitPayload_DeserializesIntoCommandType,
ParseCommand_WithMissingPayload_DeserializesParameterlessCommand,
ParseCommand_WithInvalidJson_ReturnsFailure,
ParseCommand_WithMissingCommandField_ReturnsFailure,
ParseCommand_WithUnknownCommand_ReturnsFailure.
ManagementService-007 — Inconsistent and cycle-prone serialization of repository entities
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:67; src/ScadaLink.ManagementService/ManagementEndpoints.cs:113 |
Description
The actor serializes every command result with Newtonsoft.Json (JsonConvert.SerializeObject)
while the HTTP endpoint deserializes payloads with System.Text.Json. Beyond the
inconsistency, JsonConvert.SerializeObject is applied directly to EF-backed entities
returned by repositories (e.g. Site, DataConnection, NotificationList with a
Recipients collection, Template with children). With default Newtonsoft settings any
bidirectional navigation property produces a JsonSerializationException for self-referencing
loops, and even without cycles this serializes lazy/navigation state the CLI does not expect.
Recommendation
Standardise on one serializer (the rest of the HTTP path uses System.Text.Json). Serialize
explicit DTOs / projections rather than EF entities, or configure
ReferenceLoopHandling.Ignore and ignore navigation properties. Verify that handlers
returning rich entity graphs (HandleGetTemplate, HandleUpdateNotificationList) round-trip
correctly.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: the actor serialized results with
Newtonsoft.Json (not even a direct package reference) while the HTTP endpoint uses
System.Text.Json. Standardised the actor on System.Text.Json via a new
ManagementActor.SerializeResult helper using a shared JsonSerializerOptions with
ReferenceHandler.IgnoreCycles (cycle-safe for EF entity graphs) and camelCase naming
(matches the CLI's case-insensitive deserializer). Removed the Newtonsoft.Json import.
Regression tests: SerializeResult_WithCyclicGraph_DoesNotThrow,
SerializeResult_UsesCamelCasePropertyNames.
ManagementService-008 — HandleResolveRoles constructs RoleMapper manually instead of via DI
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:285 |
Description
Every other handler resolves its collaborators from the scoped IServiceProvider.
HandleResolveRoles instead does new RoleMapper(sp.GetRequiredService<ISecurityRepository>()),
bypassing DI. If RoleMapper ever gains a dependency, caching, or options, this hand-built
instance silently diverges from the DI-registered one. It is also inconsistent with
ManagementEndpoints, which resolves RoleMapper from DI.
Recommendation
Resolve RoleMapper via sp.GetRequiredService<RoleMapper>() like every other dependency.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: HandleResolveRoles did
new RoleMapper(sp.GetRequiredService<ISecurityRepository>()), bypassing the
AddScoped<RoleMapper>() DI registration. The hand-built RoleMapper lived only inside
HandleResolveRoles, which is itself the dead-code dispatch path removed under finding 011
(the two-step ResolveRoles flow is retired). Resolving 011 by deleting the
ResolveRolesCommand dispatch case and HandleResolveRoles handler also removes the only
manually-constructed RoleMapper in the module, so the DI-bypass no longer exists. No
remaining new RoleMapper in src/ScadaLink.ManagementService. Regression covered by
ResolveRolesCommand_IsNoLongerDispatched_ReturnsManagementError.
ManagementService-009 — Audit logging applied inconsistently across mutating handlers
| Severity | Low — re-triaged from Medium; the claimed audit gap does not exist (see Description), leaving only an undocumented-convention issue. |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:357, :1134, :1085, :526, :1275 |
Description
The design doc states "All mutating operations are audit logged." Some handlers call
AuditAsync explicitly (HandleCreateInstance, HandleCreateSite, all repository-direct
external-system/notification/security/area mutations), but the handlers that delegate to a
domain service do not — HandleCreateTemplate/HandleUpdateTemplate/HandleDeleteTemplate,
all template-member handlers (HandleAddAttribute ... HandleDeleteComposition), template-folder
handlers, shared-script handlers, HandleDeployArtifacts, HandleDeployInstance,
HandleEnableInstance/Disable/Delete, and the instance-binding/override handlers.
Re-triage (2026-05-16): the original finding claimed this "creates a real risk of silent
audit gaps for template authoring and deployment operations." That claim was verified against
the actual sources and is false. Every domain service the delegating handlers call —
TemplateService, SharedScriptService, InstanceService, AreaService, SiteService,
TemplateFolderService, DeploymentService, ArtifactDeploymentService — injects
IAuditService and calls LogAsync on every mutation (grep confirms an _auditService.LogAsync
call after each Create/Update/Delete in TemplateService.cs, DeploymentService.cs,
ArtifactDeploymentService.cs, etc.). There is therefore no audit gap; if anything, adding
explicit AuditAsync to a delegating handler would double-log. The genuine issue is purely
organizational: the two-layer split (actor audits repo-direct mutations, services audit their
own) was undocumented, which is what made it look risky. This is a Low-severity
code-organization issue, not a Medium error-handling/resilience defect.
Recommendation
Document the chosen contract so the split cannot be misread as a gap. (The original alternative — moving all auditing into the actor — would require un-auditing eight services and is not warranted given they already audit correctly.)
Resolution
Resolved 2026-05-16 (commit pending). Re-triaged to Low / Code organization after verifying
all eight delegated-to services audit internally — no audit gap exists. Documented the
two-layer audit contract in an XML <remarks> block on ManagementActor.AuditAsync:
repository-direct mutations call AuditAsync; service-delegating handlers must not, because
the services own auditing and a duplicate call would double-log. No behavioural change, so
no new regression test; existing CreateInstanceCommand_WithDeploymentRole_ReturnsSuccess
covers the explicit-audit path.
ManagementService-010 — ManagementServiceOptions.CommandTimeout is defined but never used
| Severity | Low |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementServiceOptions.cs:5; src/ScadaLink.ManagementService/ManagementEndpoints.cs:16 |
Description
ManagementServiceOptions.CommandTimeout is bound from configuration in
ServiceCollectionExtensions, but no code reads it. The HTTP endpoint instead hard-codes
AskTimeout = TimeSpan.FromSeconds(30). The design doc describes the options section as
"Reserved for future configuration — e.g., command timeout overrides", yet a concrete
CommandTimeout property already exists and is silently ignored, so an operator who sets it
in appsettings.json gets no effect.
Recommendation
Either consume ManagementServiceOptions.CommandTimeout in ManagementEndpoints.HandleRequest
(inject IOptions<ManagementServiceOptions>), or remove the property until it is wired up so
configuration cannot be set with no effect.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: ManagementEndpoints hard-coded
AskTimeout = TimeSpan.FromSeconds(30) and never read ManagementServiceOptions.CommandTimeout.
HandleRequest now resolves IOptions<ManagementServiceOptions> from context.RequestServices
and computes the Ask timeout via a new ManagementEndpoints.ResolveAskTimeout helper, which
returns the configured CommandTimeout when strictly positive and otherwise falls back to the
30s default (guarding against a misconfigured zero/negative value that would fail every call).
Regression tests: ResolveAskTimeout_UsesConfiguredCommandTimeout,
ResolveAskTimeout_WithNullOptions_FallsBackToDefault,
ResolveAskTimeout_WithNonPositiveTimeout_FallsBackToDefault.
ManagementService-011 — ResolveRolesCommand dispatch path is stale dead code
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:273, :283 |
Description
The design doc states the HTTP endpoint "collapses the CLI's previous two-step flow
(ResolveRoles + actual command) into a single HTTP round-trip", and indeed ManagementEndpoints
performs LDAP auth and role resolution itself before dispatching. The ResolveRolesCommand
case in DispatchCommand is therefore unreachable from the HTTP path. It remains reachable
only via a raw ClusterClient sender, but a caller able to send ResolveRolesCommand could
enumerate role mappings for arbitrary LDAP groups with no role requirement
(GetRequiredRole returns null for it) — a minor information-disclosure surface for a path
the design says no longer exists.
Recommendation
If the two-step flow is genuinely retired, remove ResolveRolesCommand, its handler, and the
class. If it must remain for non-HTTP clients, document why and confirm exposing role-mapping
data unauthenticated is intended.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed dead path: a repository-wide search found no
ResolveRolesCommand sender outside ManagementActor itself — the CLI and HTTP endpoint
perform LDAP auth + role resolution inline. Removed the ResolveRolesCommand dispatch case
and the HandleResolveRoles handler from ManagementActor; a stray ClusterClient sender now
falls through to the NotSupportedException default and gets a uniform ManagementError
(closing the unauthenticated role-mapping enumeration surface, since GetRequiredRole
returned null for it). A code comment at the former dispatch site documents the intentional
omission. Note: the ResolveRolesCommand record itself lives in
src/ScadaLink.Commons/Messages/Management/SecurityCommands.cs and was left in place — that
file is outside this module's permitted edit scope; deleting the orphan record should be done
as a Commons-module follow-up. With the handler removed it is now an inert,
registry-only type with no behaviour. Regression test:
ResolveRolesCommand_IsNoLongerDispatched_ReturnsManagementError.
ManagementService-012 — ManagementEnvelope carries a loosely-typed object payload
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Deferred |
| Location | src/ScadaLink.Commons/Messages/Management/ManagementEnvelope.cs:7; src/ScadaLink.ManagementService/ManagementActor.cs:132 |
Description
ManagementEnvelope.Command is typed object, so the actor relies on a large open-ended
switch with a NotSupportedException default for unknown types. While the individual
command records are immutable, object defeats compile-time exhaustiveness — adding a new
command record produces no compiler signal that DispatchCommand (and GetRequiredRole)
need updating, and a typo or unregistered command surfaces only as a runtime exception. The
message contract is also harder to evolve safely under the additive-only rule.
Recommendation
Introduce a marker interface (e.g. IManagementCommand) implemented by every command record
and type the envelope payload as that interface. This documents the contract, lets analyzers
flag unhandled cases, and keeps ManagementCommandRegistry's reflection scan precise.
Resolution
Deferred 2026-05-16. Finding verified as genuine: ManagementEnvelope.Command is typed
object and the recommended IManagementCommand marker-interface fix is sound. However, the
fix cannot be implemented within the ManagementService module: both ManagementEnvelope and
all ~50 *Command records live in src/ScadaLink.Commons/Messages/Management/ (17 files),
which is outside this work item's permitted edit scope (src/ScadaLink.ManagementService/**,
its tests, and this findings file only). Adding the marker interface, retyping the envelope,
and having ManagementCommandRegistry constrain its reflection scan to IManagementCommand
implementers is a cohesive Commons-module change and must be done there — also so the Commons
message-contract additive-only evolution rules are respected. Deferred to a Commons-module
work item; no ManagementService-local change is appropriate.
ManagementService-013 — No tests for site-scope enforcement, the HTTP endpoint, or DebugStreamHub
| Severity | Medium |
| Category | Testing coverage |
| Status | Resolved |
| Location | tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1 |
Description
ManagementActorTests covers role-based authorization, success/error mapping, and correlation
IDs thoroughly, but several critical paths are untested: (a) site-scope enforcement —
EnforceSiteScope/EnforceSiteScopeForInstance and SiteScopeViolationException -> Unauthorized
mapping have no test, which is why the gaps in findings 001/002 went unnoticed; (b)
ManagementEndpoints — Basic Auth decoding, malformed-header handling, LDAP/role resolution,
command deserialization, and HTTP status mapping have zero coverage; (c) DebugStreamHub
authentication, subscribe/unsubscribe lifecycle, and ManagementCommandRegistry.Resolve are
untested. The Envelope test helper always passes Array.Empty<string>() for permitted
sites, so no test ever exercises a site-scoped user.
Recommendation
Add tests that exercise a site-scoped Deployment user against in-scope and out-of-scope
targets for instance and site operations, asserting ManagementUnauthorized on violations.
Add WebApplicationFactory-based tests for ManagementEndpoints covering auth failures,
malformed bodies, unknown commands, and the 200/400/403/401/504 mappings.
Resolution
Resolved 2026-05-16 (commit pending). The site-scope and DebugStreamHub coverage gaps
were closed by the resolution of findings 001/002/003 (the ScopedEnvelope helper plus the
*_OutOfScopeForSiteScopedUser_ReturnsUnauthorized tests and DebugStreamHubTests). The
remaining HTTP-endpoint gap is now covered by a new ManagementEndpointsTests.cs exercising
ManagementEndpoints.ParseCommand — command deserialization, malformed JSON, missing
command field, and unknown commands. Full WebApplicationFactory auth-flow tests were
deliberately not added: ManagementEndpoints depends on LdapAuthService and live LDAP
infrastructure, so the testable command-parsing/dispatch logic was extracted into the pure
ParseCommand helper and covered instead. Tests: ParseCommand_* (5),
SerializeResult_* (2), UnknownCommandType_FaultMappedToManagementError, plus the
pre-existing site-scope and DebugStreamHub suites. dotnet test -> 48 passed.
ManagementService-014 — HandleQueryDeployments bypasses site-scope enforcement
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:306, :1174 |
Description
QueryDeploymentsCommand is gated to the Deployment role by GetRequiredRole
(ManagementActor.cs:170–:177), and the design document's Authorization section states
"Site scoping is enforced for site-scoped Deployment users" and explicitly lists deployments
among the Deployment-role operations. HandleQueryDeployments makes no call to
EnforceSiteScope / EnforceSiteScopeForInstance / EnforceSiteScopeForIdentifier: with no
InstanceId it returns IDeploymentManagerRepository.GetAllDeploymentRecordsAsync() (every
deployment record across all sites), and with an InstanceId it returns that instance's
deployment history with no check that the instance's site is within the caller's permitted
set. A site-scoped Deployment user scoped to site A can therefore enumerate deployment
records for instances at site B — instance IDs, DeployedBy (operator usernames),
timestamps, deployment status, and ErrorMessage content — by issuing QueryDeployments
with or without an out-of-scope InstanceId. This is the same authorization-bypass class as
the resolved findings 001/002, on a handler that sweep did not cover; it is DispatchCommand's
only Deployment-role handler with no scope enforcement.
Recommendation
Thread AuthenticatedUser into HandleQueryDeployments (the dispatch case at line 306
already has user in scope). When cmd.InstanceId is supplied, call
EnforceSiteScopeForInstance before querying. When it is not supplied, filter the returned
DeploymentRecord list to the caller's permitted sites — resolve each record's instance to
its SiteId (or join through a site-aware repository query) and drop records for sites
outside PermittedSiteIds, mirroring the HandleListInstances / HandleListSites filter
pattern. Add a regression test for a site-scoped user against in-scope and out-of-scope
instances.
Resolution
Resolved 2026-05-17 (commit pending). Re-triage: the finding understated the gap — it
claimed QueryDeploymentsCommand was already "gated to the Deployment role by
GetRequiredRole (lines 170–177)". Verified against the source: QueryDeploymentsCommand
appeared nowhere in GetRequiredRole, so it required no role at all — any authenticated
user could read every deployment record system-wide. Fix applied both gates: added
QueryDeploymentsCommand to the Deployment-role group in GetRequiredRole, and threaded
AuthenticatedUser into HandleQueryDeployments — the InstanceId branch now calls
EnforceSiteScopeForInstance; the unfiltered branch resolves each record's instance to its
SiteId (cached) and drops records outside PermittedSiteIds, mirroring HandleListInstances.
Regression tests: QueryDeployments_WithDesignRole_ReturnsUnauthorized,
QueryDeployments_FilteredByOutOfScopeInstance_ReturnsUnauthorized,
QueryDeployments_FilteredByInScopeInstance_ReturnsRecords,
QueryDeployments_UnfilteredForSiteScopedUser_DropsOutOfScopeRecords,
QueryDeployments_UnfilteredForAdminUser_ReturnsAllRecords.
ManagementService-015 — HandleSetInstanceOverrides applies overrides non-atomically
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:647–:659 |
Description
HandleSetInstanceOverrides iterates cmd.Overrides and calls
InstanceService.SetAttributeOverrideAsync once per attribute, throwing
InvalidOperationException on the first result.IsSuccess == false. Each
SetAttributeOverrideAsync call persists independently, so if the command supplies five
overrides and the third fails (e.g. an unknown attribute name or a validation error), the
first two overrides are already committed to the configuration database while the caller
receives a ManagementError. The instance is left partially mutated in a state the operator
neither sees nor requested, and the per-instance operation lock referenced in the design's
deployment decisions does not protect against this because the partial writes are committed
before the throw. A retry of the same command then re-applies the already-applied overrides.
Recommendation
Make the multi-override mutation all-or-nothing: either validate every requested override up
front before applying any, or apply all overrides within a single transaction / unit-of-work
so a mid-batch failure rolls back the earlier writes. If InstanceService cannot offer a
batch method, at minimum document the partial-application behaviour on SetInstanceOverridesCommand
and have the handler report which overrides were applied before the failure so the caller
can reconcile.
Resolution
Resolved 2026-05-17 (commit pending). Confirmed: each SetAttributeOverrideAsync call
commits independently, so a mid-batch failure left earlier overrides persisted.
HandleSetInstanceOverrides now validates every requested attribute up front against the
instance's template (exists, not locked) and only begins applying once the whole batch is
known valid — eliminating the realistic partial-mutation failure modes (unknown / locked
attribute). InstanceService is outside this module's edit scope and offers no batch/
transactional method, so a genuine database fault mid-apply remains theoretically possible;
this residual is documented in a code comment on the handler. Regression tests:
SetInstanceOverrides_WithOneInvalidAttribute_PersistsNoOverrides,
SetInstanceOverrides_AllValid_PersistsAllOverrides.
ManagementService-016 — Unexpected exception messages returned verbatim to HTTP callers
| Severity | Low |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.ManagementService/ManagementActor.cs:121–:131 |
Description
MapFault maps any non-SiteScopeViolationException fault to
new ManagementError(correlationId, cause.Message, "COMMAND_FAILED"), and
ManagementEndpoints.HandleRequest returns that Error string directly in the HTTP 400
body. For handler-thrown InvalidOperationExceptions carrying a curated result.Error
message this is intended and safe. But the same path also surfaces the raw .Message of
unanticipated exceptions — a SqlException (which can include server/database names and
constraint details), a DbUpdateException, an ArgumentException from Enum.Parse on a
malformed DataType/TriggerType value, or a NullReferenceException — straight to the
external CLI/HTTP client. This is a minor internal-detail disclosure surface: the exception
text is already logged server-side with full context, so the client does not need the raw
message.
Recommendation
Distinguish handler-curated failures from unexpected faults. Have handlers throw a dedicated
exception type (e.g. ManagementCommandException) for messages that are safe to surface, and
in MapFault return that message for the known type while returning a generic
"An internal error occurred (CorrelationId=...)" string for everything else — the operator
can still correlate to the server log via the correlation ID.
Resolution
Resolved 2026-05-17 (commit pending). Added a ManagementCommandException type for curated,
caller-safe failures and converted every curated throw in ManagementActor (the 34
result.Error rethrows and 15 "not found" / delete-blocked messages) to it. MapFault now
returns the message verbatim only for ManagementCommandException; any other fault (DB
errors, Enum.Parse ArgumentException, NullReferenceException, the unknown-command
NotSupportedException, etc.) yields a generic "An internal error occurred (CorrelationId=...)" string while the full exception is still logged server-side. Regression
tests: UnexpectedFault_ReturnsGenericMessage_NotRawExceptionText,
CuratedHandlerFailure_SurfacesTheCuratedMessage; the two pre-existing
*_WhenRepoThrows_* tests were updated to assert the raw repository message is no longer
leaked.
ManagementService-017 — QueryDeploymentsCommand has no test coverage
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1 |
Description
QueryDeploymentsCommand / HandleQueryDeployments is exercised by no test in
ManagementActorTests. There is no test that it requires the Deployment role, no test of
the InstanceId-filtered versus unfiltered branches, and — because the handler performs no
site-scope enforcement at all — no test that would have caught finding 014. The deployment
query is one of the operations the design's Authorization section calls out for site
scoping, yet it is the only Deployment-role command with neither an authorization test nor
a site-scope test.
Recommendation
Add tests for QueryDeploymentsCommand: a role test (Design/no-role caller →
ManagementUnauthorized), branch coverage for the InstanceId-filtered and unfiltered
repository calls, and — once finding 014 is fixed — site-scope tests for a site-scoped
Deployment user against in-scope and out-of-scope deployment records.
Resolution
Resolved 2026-05-17 (commit pending). Added seven QueryDeployments_* tests to
ManagementActorTests: role gate (_WithDesignRole_ReturnsUnauthorized), the
InstanceId-filtered and unfiltered branches (_FilteredByInstanceId_ReturnsInstanceRecords,
_UnfilteredWithDeploymentRole_ReturnsAllRecords), and site-scope coverage for a site-scoped
Deployment user and an Admin user, in- and out-of-scope
(_FilteredByOutOfScopeInstance_ReturnsUnauthorized, _FilteredByInScopeInstance_ReturnsRecords,
_UnfilteredForSiteScopedUser_DropsOutOfScopeRecords, _UnfilteredForAdminUser_ReturnsAllRecords).