21 new findings: Host-012..015, InboundAPI-014..017, ManagementService-014..017, NotificationService-014..018, Security-012..015.
713 lines
38 KiB
Markdown
713 lines
38 KiB
Markdown
# Code Review — ManagementService
|
||
|
||
| Field | Value |
|
||
|-------|-------|
|
||
| Module | `src/ScadaLink.ManagementService` |
|
||
| Design doc | `docs/requirements/Component-ManagementService.md` |
|
||
| Status | Reviewed |
|
||
| Last reviewed | 2026-05-17 |
|
||
| Reviewer | claude-agent |
|
||
| Commit reviewed | `39d737e` |
|
||
| Open findings | 4 (1 Deferred — see ManagementService-012) |
|
||
|
||
## Summary
|
||
|
||
The ManagementService module is a thin command-dispatch layer: a single `ManagementActor`
|
||
fronts every administrative operation, an HTTP `POST /management` endpoint authenticates and
|
||
forwards to it, and a SignalR `DebugStreamHub` provides real-time debug streaming. The code
|
||
is consistently structured and the role-based authorization gate (`GetRequiredRole`) is
|
||
broadly correct and well tested. However, the review surfaced a significant **security
|
||
theme**: site-scope enforcement, which the design document requires for instance- and
|
||
site-targeted Deployment operations, is applied inconsistently — several query handlers and
|
||
all remote-query/debug handlers perform no site-scope check at all, allowing a site-scoped
|
||
Deployment user to read or act on sites outside their scope. A second theme is **Akka.NET
|
||
convention drift**: the actor offloads all work to `Task.Run` instead of using `PipeTo`,
|
||
declares no supervision strategy, and the contract messages carry a loosely-typed `object`
|
||
payload. There are also resource-management defects in the HTTP endpoint (`JsonDocument`
|
||
instances never disposed) and dead/unused configuration. None of the findings are
|
||
crash-class, but the site-scope gaps are High severity because they are a real
|
||
authorization bypass with no workaround.
|
||
|
||
#### Re-review 2026-05-17 (commit `39d737e`)
|
||
|
||
All thirteen prior findings remain correctly closed; the source under
|
||
`src/ScadaLink.ManagementService` is byte-identical between the previously reviewed state
|
||
and `39d737e` (the resolution commits of findings 001–013 are folded into the history at or
|
||
before `39d737e`). ManagementService-012 was re-checked and its **Deferred** status still
|
||
holds: `ManagementEnvelope.Command` is still typed `object`, and the marker-interface fix
|
||
still belongs in the Commons module, outside this module's edit scope — nothing has changed
|
||
to make it actionable here. This re-review re-ran the full 10-category checklist against the
|
||
current sources and surfaced **four new findings**. The dominant theme is the same
|
||
site-scope authorization gap that findings 001/002 closed: `HandleQueryDeployments`
|
||
(`QueryDeploymentsCommand`) was overlooked by that sweep and still performs no site-scope
|
||
enforcement, letting a site-scoped Deployment user read deployment history for any site
|
||
(014, High). The remaining three are lower severity: a non-atomic multi-override mutation
|
||
that can leave an instance partially modified after an error (015, Medium), raw exception
|
||
messages from unexpected faults being returned verbatim to HTTP callers (016, Low), and
|
||
`QueryDeploymentsCommand` having no test coverage at all (017, Low).
|
||
|
||
## Checklist coverage
|
||
|
||
| # | Category | Examined | Notes |
|
||
|---|----------|----------|-------|
|
||
| 1 | Correctness & logic bugs | + | `HandleResolveRoles` builds `RoleMapper` by hand; `ResolveRolesCommand` is a stale dispatch path. See 008, 011. |
|
||
| 2 | Akka.NET conventions | + | `Task.Run` instead of `PipeTo`, no supervision strategy, `object`-typed message payload. See 004, 005, 012. |
|
||
| 3 | Concurrency & thread safety | + | Actor is stateless so `Task.Run` does not corrupt state, but it defeats actor-thread serialization (004). `Sender` correctly captured to a local before the closure. |
|
||
| 4 | Error handling & resilience | + | Exceptions are caught and mapped uniformly; `SiteScopeViolationException` mapped to `Unauthorized`. Audit-logging consistency issue noted in 009. |
|
||
| 5 | Security | + | Site-scope enforcement missing on query/remote/debug paths. See 001, 002, 003. |
|
||
| 6 | Performance & resource management | + | `JsonDocument` instances never disposed in the HTTP endpoint. See 006. |
|
||
| 7 | Design-document adherence | + | Design doc states remote queries enforce site scoping; code does not. `ManagementServiceOptions` reserved-for-future config is unused. See 001, 010. |
|
||
| 8 | Code organization & conventions | + | Mixed serializers (Newtonsoft in actor, System.Text.Json in endpoint); inconsistent audit logging across mutations. See 007, 009. |
|
||
| 9 | Testing coverage | + | Authorization is well covered; site-scope enforcement, the HTTP endpoint, `DebugStreamHub`, and remote-query handlers have no tests. See 013. |
|
||
| 10 | Documentation & comments | + | XML docs are accurate where present; `ManagementServiceOptions` and `ResolveRolesCommand` paths are undocumented dead code (010, 011). |
|
||
|
||
## Findings
|
||
|
||
### ManagementService-001 — Remote-query and debug-snapshot handlers bypass site-scope enforcement
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | High |
|
||
| Category | Security |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:1465`, `:1481`, `:1493`, `:641`, `:649` |
|
||
|
||
**Description**
|
||
|
||
The design document (`Component-ManagementService.md`, Authorization section) states that for
|
||
Deployment users "Site scoping is enforced for site-scoped Deployment users" and lists
|
||
"debug snapshot, parked message queries, site event log queries" among the Deployment-role
|
||
operations. `HandleQueryEventLogs`, `HandleQueryParkedMessages`, `HandleDebugSnapshot`,
|
||
`HandleRetryParkedMessage`, and `HandleDiscardParkedMessage` make no call to `EnforceSiteScope`
|
||
or `EnforceSiteScopeForInstance`. A Deployment user scoped to site A can therefore query event
|
||
logs / parked messages of site B, retry or discard another site's parked messages, and pull a
|
||
debug snapshot of any instance simply by supplying a different `SiteIdentifier` or `InstanceId`.
|
||
This is an authorization bypass with no workaround.
|
||
|
||
**Recommendation**
|
||
|
||
In each of these handlers resolve the target site and call site-scope enforcement before
|
||
delegating to `CommunicationService`. For the `SiteIdentifier`-keyed handlers, look up the
|
||
`Site` by identifier and enforce against `Site.Id`; for `DebugSnapshotCommand` the instance
|
||
is already loaded — call `EnforceSiteScope(user, instance.SiteId)` (which requires threading
|
||
`AuthenticatedUser` into these handlers, currently dropped).
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit `<pending>`). Threaded `AuthenticatedUser` into
|
||
`HandleQueryEventLogs`, `HandleQueryParkedMessages`, `HandleRetryParkedMessage`,
|
||
`HandleDiscardParkedMessage`, and `HandleDebugSnapshot`; added an
|
||
`EnforceSiteScopeForIdentifier` helper that resolves the site by identifier and applies
|
||
`EnforceSiteScope`. `HandleDebugSnapshot` enforces against the already-loaded instance's
|
||
`SiteId`. Regression tests: `QueryEventLogs_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
|
||
`QueryParkedMessages_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
|
||
`RetryParkedMessage_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
|
||
`DiscardParkedMessage_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
|
||
`DebugSnapshot_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`.
|
||
|
||
### ManagementService-002 — Single-entity query handlers leak data across site scope
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | High |
|
||
| Category | Security |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:510`, `:673`, `:733`, `:774`, `:631`, `:624` |
|
||
|
||
**Description**
|
||
|
||
`HandleListInstances` and `HandleListSites` correctly filter their results by the user's
|
||
`PermittedSiteIds`, but the single-entity query handlers do not. `HandleGetInstance`,
|
||
`HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection` fetch by ID with no
|
||
site-scope check, so a site-scoped Deployment user can read any instance, site, area tree,
|
||
or data connection by ID even though that site is excluded from their scope. The list
|
||
endpoints having a filter while the get-by-id endpoints do not is an inconsistency that
|
||
undermines the scoping model. (`HandleGetDeploymentDiff` and `HandleListInstanceAlarmOverrides`
|
||
do enforce scope, confirming the omission elsewhere is unintentional.)
|
||
|
||
**Recommendation**
|
||
|
||
Apply `EnforceSiteScopeForInstance` in `HandleGetInstance`, and `EnforceSiteScope` against
|
||
the resolved site ID in `HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection`
|
||
(for data connections, scope by the connection's `SiteId`).
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit `<pending>`). `HandleGetInstance`, `HandleGetSite`,
|
||
`HandleGetDataConnection` now take `AuthenticatedUser` and call `EnforceSiteScope` against
|
||
the resolved entity's site ID (instance `SiteId`, site `Id`, data-connection `SiteId`);
|
||
`HandleListAreas` enforces against the requested `SiteId` before querying. Regression tests:
|
||
`GetInstance_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
|
||
`GetInstance_InScopeForSiteScopedUser_ReturnsSuccess`,
|
||
`GetSite_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
|
||
`GetSite_OutOfScopeForAdminUser_ReturnsSuccess`,
|
||
`ListAreas_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`,
|
||
`GetDataConnection_OutOfScopeForSiteScopedUser_ReturnsUnauthorized`.
|
||
|
||
### ManagementService-003 — DebugStreamHub.SubscribeInstance performs no per-instance authorization
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | High |
|
||
| Category | Security |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/DebugStreamHub.cs:104` |
|
||
|
||
**Description**
|
||
|
||
`OnConnectedAsync` authenticates the WebSocket connection and verifies the caller holds the
|
||
`Deployment` role, but `SubscribeInstance(int instanceId)` accepts any instance ID and starts
|
||
a stream without checking that the authenticated user is scoped to that instance's site. A
|
||
site-scoped Deployment user can therefore subscribe to the live debug stream (attribute
|
||
values, alarm states) of an instance belonging to a site outside their scope. This is the
|
||
streaming equivalent of finding 001/002.
|
||
|
||
**Recommendation**
|
||
|
||
Resolve the instance's site inside `SubscribeInstance` and reject the subscription if the
|
||
authenticated user's permitted-site set does not include it. The authenticated identity
|
||
established in `OnConnectedAsync` must be persisted on the connection (e.g. in
|
||
`Context.Items`) so it is available to `SubscribeInstance`.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit `<pending>`). `OnConnectedAsync` now persists the resolved
|
||
roles and `PermittedSiteIds` in `Context.Items`. `SubscribeInstance` resolves the
|
||
instance's site via `ITemplateEngineRepository` and rejects the subscription (sending
|
||
`OnStreamTerminated`) when the new pure `DebugStreamHub.IsInstanceAccessAllowed` check
|
||
fails. The check grants access for the Admin role or system-wide Deployment (empty
|
||
permitted set) and otherwise requires the instance's site in the permitted set. Regression
|
||
tests: `IsInstanceAccessAllowed_SiteScopedUser_OutOfScopeInstance_Denied`,
|
||
`IsInstanceAccessAllowed_SiteScopedUser_InScopeInstance_Allowed`,
|
||
`IsInstanceAccessAllowed_SystemWideDeployment_AnySiteAllowed`,
|
||
`IsInstanceAccessAllowed_AdminRole_BypassesSiteScope`,
|
||
`IsInstanceAccessAllowed_AdminRoleCheck_IsCaseInsensitive`.
|
||
|
||
### ManagementService-004 — Actor offloads work to Task.Run instead of using PipeTo
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Akka.NET conventions |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:61` |
|
||
|
||
**Description**
|
||
|
||
`HandleEnvelope` runs every command on a thread-pool thread via `Task.Run(async () => ...)`
|
||
and replies from inside the continuation. This is the anti-pattern the project's Akka.NET
|
||
conventions warn against — the canonical approach is to start the async work and `PipeTo`
|
||
its result back to `Self`/`Sender`. Although `Sender` is correctly copied to a local before
|
||
the closure, the current code: (a) lets multiple commands execute fully concurrently with no
|
||
actor-thread serialization, so the actor provides no ordering or back-pressure guarantees
|
||
and is an actor in name only; (b) cannot be paused, supervised, or made to honour a mailbox
|
||
bound; (c) is shielded from synchronous faults only because every path is inside the
|
||
try/catch — any future code path that throws synchronously before the `Task.Run` body would
|
||
escape it.
|
||
|
||
**Recommendation**
|
||
|
||
Replace `Task.Run` with a method that returns the `Task` and `PipeTo` the mapped result
|
||
(`ManagementSuccess`/`ManagementError`/`ManagementUnauthorized`) back to the captured sender,
|
||
mapping faults in the `PipeTo` failure continuation. If genuine parallelism is desired, make
|
||
that explicit with a router/dispatcher rather than ad-hoc `Task.Run`.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Confirmed: `HandleEnvelope` ran every command via
|
||
`Task.Run` and replied from inside the continuation, contrary to the project's PipeTo
|
||
convention. Replaced it with a `ProcessCommand` method returning a `Task<object>` and
|
||
`PipeTo(sender, success, failure)`; faults are now mapped uniformly in a `MapFault` failure
|
||
continuation (`SiteScopeViolationException` -> `ManagementUnauthorized`, otherwise
|
||
`ManagementError`), which also unwraps `AggregateException`. Regression test:
|
||
`UnknownCommandType_FaultMappedToManagementError`. Existing success/error/unauthorized
|
||
mapping tests confirm behaviour is preserved.
|
||
|
||
### ManagementService-005 — ManagementActor declares no supervision strategy
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Akka.NET conventions |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:33` |
|
||
|
||
**Description**
|
||
|
||
The project conventions call for explicit supervision strategies (Resume for coordinator
|
||
actors). `ManagementActor` is a long-lived coordinator-style actor but overrides no
|
||
`SupervisorStrategy` and defines no `PreRestart`/`PostRestart` behaviour. In practice it
|
||
spawns no children so the default strategy is rarely exercised, but an explicit strategy
|
||
should still be declared for clarity and to match the documented convention; it also matters
|
||
if children are added later (e.g. if finding 004 introduces worker actors).
|
||
|
||
**Recommendation**
|
||
|
||
Add an explicit `protected override SupervisorStrategy SupervisorStrategy()` returning a
|
||
Resume-based strategy, consistent with other central coordinator actors.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Confirmed: `ManagementActor` declared no
|
||
`SupervisorStrategy`. Added a `public static SupervisorStrategy CreateSupervisorStrategy()`
|
||
factory returning an unbounded `OneForOneStrategy` with a `Directive.Resume` decider, and a
|
||
`protected override SupervisorStrategy()` that delegates to it — matching the Resume-based
|
||
convention of `CentralCommunicationActor`/`SiteCommunicationActor`. The actor spawns no
|
||
children today, so this is a forward-looking correctness fix. Regression tests:
|
||
`CreateSupervisorStrategy_ReturnsOneForOneStrategy`,
|
||
`CreateSupervisorStrategy_ResumesOnArbitraryException`,
|
||
`CreateSupervisorStrategy_ResumesIndefinitely` (new `ManagementActorSupervisionTests.cs`).
|
||
|
||
### ManagementService-006 — JsonDocument instances never disposed in the HTTP endpoint
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Performance & resource management |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementEndpoints.cs:83`, `:112` |
|
||
|
||
**Description**
|
||
|
||
`JsonDocument` is `IDisposable` (it rents buffers from a pooled `ArrayPool`). `HandleRequest`
|
||
parses the request body into `doc` at line 83 and never disposes it, and line 112
|
||
(`JsonDocument.Parse("{}")`) allocates a second document inline that is also never disposed.
|
||
Every management HTTP call therefore leaks pooled buffers, increasing GC pressure and pool
|
||
churn under load.
|
||
|
||
**Recommendation**
|
||
|
||
Wrap the parsed document in `using var doc = ...`. For the empty-payload fallback, avoid
|
||
allocating a `JsonDocument` entirely — deserialize from the literal string `"{}"`/an empty
|
||
object, or restructure so the fallback path does not parse a throwaway document.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Confirmed: the request `JsonDocument` was never
|
||
disposed and the empty-payload path allocated a second throwaway `JsonDocument`. Extracted
|
||
request parsing into a testable `ManagementEndpoints.ParseCommand` helper that wraps the
|
||
document in `using`; the missing-payload case now deserializes from the `"{}"` literal
|
||
string rather than parsing a throwaway document. Regression tests:
|
||
`ParseCommand_WithExplicitPayload_DeserializesIntoCommandType`,
|
||
`ParseCommand_WithMissingPayload_DeserializesParameterlessCommand`,
|
||
`ParseCommand_WithInvalidJson_ReturnsFailure`,
|
||
`ParseCommand_WithMissingCommandField_ReturnsFailure`,
|
||
`ParseCommand_WithUnknownCommand_ReturnsFailure`.
|
||
|
||
### ManagementService-007 — Inconsistent and cycle-prone serialization of repository entities
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Code organization & conventions |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:67`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:113` |
|
||
|
||
**Description**
|
||
|
||
The actor serializes every command result with `Newtonsoft.Json` (`JsonConvert.SerializeObject`)
|
||
while the HTTP endpoint deserializes payloads with `System.Text.Json`. Beyond the
|
||
inconsistency, `JsonConvert.SerializeObject` is applied directly to EF-backed entities
|
||
returned by repositories (e.g. `Site`, `DataConnection`, `NotificationList` with a
|
||
`Recipients` collection, `Template` with children). With default Newtonsoft settings any
|
||
bidirectional navigation property produces a `JsonSerializationException` for self-referencing
|
||
loops, and even without cycles this serializes lazy/navigation state the CLI does not expect.
|
||
|
||
**Recommendation**
|
||
|
||
Standardise on one serializer (the rest of the HTTP path uses `System.Text.Json`). Serialize
|
||
explicit DTOs / projections rather than EF entities, or configure
|
||
`ReferenceLoopHandling.Ignore` and ignore navigation properties. Verify that handlers
|
||
returning rich entity graphs (`HandleGetTemplate`, `HandleUpdateNotificationList`) round-trip
|
||
correctly.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Confirmed: the actor serialized results with
|
||
`Newtonsoft.Json` (not even a direct package reference) while the HTTP endpoint uses
|
||
`System.Text.Json`. Standardised the actor on `System.Text.Json` via a new
|
||
`ManagementActor.SerializeResult` helper using a shared `JsonSerializerOptions` with
|
||
`ReferenceHandler.IgnoreCycles` (cycle-safe for EF entity graphs) and camelCase naming
|
||
(matches the CLI's case-insensitive deserializer). Removed the `Newtonsoft.Json` import.
|
||
Regression tests: `SerializeResult_WithCyclicGraph_DoesNotThrow`,
|
||
`SerializeResult_UsesCamelCasePropertyNames`.
|
||
|
||
### ManagementService-008 — HandleResolveRoles constructs RoleMapper manually instead of via DI
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Correctness & logic bugs |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:285` |
|
||
|
||
**Description**
|
||
|
||
Every other handler resolves its collaborators from the scoped `IServiceProvider`.
|
||
`HandleResolveRoles` instead does `new RoleMapper(sp.GetRequiredService<ISecurityRepository>())`,
|
||
bypassing DI. If `RoleMapper` ever gains a dependency, caching, or options, this hand-built
|
||
instance silently diverges from the DI-registered one. It is also inconsistent with
|
||
`ManagementEndpoints`, which resolves `RoleMapper` from DI.
|
||
|
||
**Recommendation**
|
||
|
||
Resolve `RoleMapper` via `sp.GetRequiredService<RoleMapper>()` like every other dependency.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Confirmed: `HandleResolveRoles` did
|
||
`new RoleMapper(sp.GetRequiredService<ISecurityRepository>())`, bypassing the
|
||
`AddScoped<RoleMapper>()` DI registration. The hand-built `RoleMapper` lived only inside
|
||
`HandleResolveRoles`, which is itself the dead-code dispatch path removed under finding 011
|
||
(the two-step ResolveRoles flow is retired). Resolving 011 by deleting the
|
||
`ResolveRolesCommand` dispatch case and `HandleResolveRoles` handler also removes the only
|
||
manually-constructed `RoleMapper` in the module, so the DI-bypass no longer exists. No
|
||
remaining `new RoleMapper` in `src/ScadaLink.ManagementService`. Regression covered by
|
||
`ResolveRolesCommand_IsNoLongerDispatched_ReturnsManagementError`.
|
||
|
||
### ManagementService-009 — Audit logging applied inconsistently across mutating handlers
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low — re-triaged from Medium; the claimed audit gap does not exist (see Description), leaving only an undocumented-convention issue. |
|
||
| Category | Code organization & conventions |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:357`, `:1134`, `:1085`, `:526`, `:1275` |
|
||
|
||
**Description**
|
||
|
||
The design doc states "All mutating operations are audit logged." Some handlers call
|
||
`AuditAsync` explicitly (`HandleCreateInstance`, `HandleCreateSite`, all repository-direct
|
||
external-system/notification/security/area mutations), but the handlers that delegate to a
|
||
domain service do **not** — `HandleCreateTemplate`/`HandleUpdateTemplate`/`HandleDeleteTemplate`,
|
||
all template-member handlers (`HandleAddAttribute` ... `HandleDeleteComposition`), template-folder
|
||
handlers, shared-script handlers, `HandleDeployArtifacts`, `HandleDeployInstance`,
|
||
`HandleEnableInstance`/`Disable`/`Delete`, and the instance-binding/override handlers.
|
||
|
||
**Re-triage (2026-05-16):** the original finding claimed this "creates a real risk of silent
|
||
audit gaps for template authoring and deployment operations." That claim was verified against
|
||
the actual sources and is **false**. Every domain service the delegating handlers call —
|
||
`TemplateService`, `SharedScriptService`, `InstanceService`, `AreaService`, `SiteService`,
|
||
`TemplateFolderService`, `DeploymentService`, `ArtifactDeploymentService` — injects
|
||
`IAuditService` and calls `LogAsync` on every mutation (`grep` confirms an `_auditService.LogAsync`
|
||
call after each `Create`/`Update`/`Delete` in `TemplateService.cs`, `DeploymentService.cs`,
|
||
`ArtifactDeploymentService.cs`, etc.). There is therefore no audit gap; if anything, adding
|
||
explicit `AuditAsync` to a delegating handler would *double-log*. The genuine issue is purely
|
||
organizational: the two-layer split (actor audits repo-direct mutations, services audit their
|
||
own) was undocumented, which is what made it look risky. This is a Low-severity
|
||
code-organization issue, not a Medium error-handling/resilience defect.
|
||
|
||
**Recommendation**
|
||
|
||
Document the chosen contract so the split cannot be misread as a gap. (The original
|
||
alternative — moving all auditing into the actor — would require un-auditing eight services
|
||
and is not warranted given they already audit correctly.)
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Re-triaged to Low / Code organization after verifying
|
||
all eight delegated-to services audit internally — no audit gap exists. Documented the
|
||
two-layer audit contract in an XML `<remarks>` block on `ManagementActor.AuditAsync`:
|
||
repository-direct mutations call `AuditAsync`; service-delegating handlers must not, because
|
||
the services own auditing and a duplicate call would double-log. No behavioural change, so
|
||
no new regression test; existing `CreateInstanceCommand_WithDeploymentRole_ReturnsSuccess`
|
||
covers the explicit-audit path.
|
||
|
||
### ManagementService-010 — ManagementServiceOptions.CommandTimeout is defined but never used
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Design-document adherence |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementServiceOptions.cs:5`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:16` |
|
||
|
||
**Description**
|
||
|
||
`ManagementServiceOptions.CommandTimeout` is bound from configuration in
|
||
`ServiceCollectionExtensions`, but no code reads it. The HTTP endpoint instead hard-codes
|
||
`AskTimeout = TimeSpan.FromSeconds(30)`. The design doc describes the options section as
|
||
"Reserved for future configuration — e.g., command timeout overrides", yet a concrete
|
||
`CommandTimeout` property already exists and is silently ignored, so an operator who sets it
|
||
in `appsettings.json` gets no effect.
|
||
|
||
**Recommendation**
|
||
|
||
Either consume `ManagementServiceOptions.CommandTimeout` in `ManagementEndpoints.HandleRequest`
|
||
(inject `IOptions<ManagementServiceOptions>`), or remove the property until it is wired up so
|
||
configuration cannot be set with no effect.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Confirmed: `ManagementEndpoints` hard-coded
|
||
`AskTimeout = TimeSpan.FromSeconds(30)` and never read `ManagementServiceOptions.CommandTimeout`.
|
||
`HandleRequest` now resolves `IOptions<ManagementServiceOptions>` from `context.RequestServices`
|
||
and computes the Ask timeout via a new `ManagementEndpoints.ResolveAskTimeout` helper, which
|
||
returns the configured `CommandTimeout` when strictly positive and otherwise falls back to the
|
||
30s default (guarding against a misconfigured zero/negative value that would fail every call).
|
||
Regression tests: `ResolveAskTimeout_UsesConfiguredCommandTimeout`,
|
||
`ResolveAskTimeout_WithNullOptions_FallsBackToDefault`,
|
||
`ResolveAskTimeout_WithNonPositiveTimeout_FallsBackToDefault`.
|
||
|
||
### ManagementService-011 — ResolveRolesCommand dispatch path is stale dead code
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Correctness & logic bugs |
|
||
| Status | Resolved |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:273`, `:283` |
|
||
|
||
**Description**
|
||
|
||
The design doc states the HTTP endpoint "collapses the CLI's previous two-step flow
|
||
(ResolveRoles + actual command) into a single HTTP round-trip", and indeed `ManagementEndpoints`
|
||
performs LDAP auth and role resolution itself before dispatching. The `ResolveRolesCommand`
|
||
case in `DispatchCommand` is therefore unreachable from the HTTP path. It remains reachable
|
||
only via a raw ClusterClient sender, but a caller able to send `ResolveRolesCommand` could
|
||
enumerate role mappings for arbitrary LDAP groups with no role requirement
|
||
(`GetRequiredRole` returns null for it) — a minor information-disclosure surface for a path
|
||
the design says no longer exists.
|
||
|
||
**Recommendation**
|
||
|
||
If the two-step flow is genuinely retired, remove `ResolveRolesCommand`, its handler, and the
|
||
class. If it must remain for non-HTTP clients, document why and confirm exposing role-mapping
|
||
data unauthenticated is intended.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). Confirmed dead path: a repository-wide search found no
|
||
`ResolveRolesCommand` sender outside `ManagementActor` itself — the CLI and HTTP endpoint
|
||
perform LDAP auth + role resolution inline. Removed the `ResolveRolesCommand` dispatch case
|
||
and the `HandleResolveRoles` handler from `ManagementActor`; a stray ClusterClient sender now
|
||
falls through to the `NotSupportedException` default and gets a uniform `ManagementError`
|
||
(closing the unauthenticated role-mapping enumeration surface, since `GetRequiredRole`
|
||
returned null for it). A code comment at the former dispatch site documents the intentional
|
||
omission. Note: the `ResolveRolesCommand` *record* itself lives in
|
||
`src/ScadaLink.Commons/Messages/Management/SecurityCommands.cs` and was left in place — that
|
||
file is outside this module's permitted edit scope; deleting the orphan record should be done
|
||
as a Commons-module follow-up. With the handler removed it is now an inert,
|
||
registry-only type with no behaviour. Regression test:
|
||
`ResolveRolesCommand_IsNoLongerDispatched_ReturnsManagementError`.
|
||
|
||
### ManagementService-012 — ManagementEnvelope carries a loosely-typed object payload
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Akka.NET conventions |
|
||
| Status | Deferred |
|
||
| Location | `src/ScadaLink.Commons/Messages/Management/ManagementEnvelope.cs:7`; `src/ScadaLink.ManagementService/ManagementActor.cs:132` |
|
||
|
||
**Description**
|
||
|
||
`ManagementEnvelope.Command` is typed `object`, so the actor relies on a large open-ended
|
||
`switch` with a `NotSupportedException` default for unknown types. While the individual
|
||
command records are immutable, `object` defeats compile-time exhaustiveness — adding a new
|
||
command record produces no compiler signal that `DispatchCommand` (and `GetRequiredRole`)
|
||
need updating, and a typo or unregistered command surfaces only as a runtime exception. The
|
||
message contract is also harder to evolve safely under the additive-only rule.
|
||
|
||
**Recommendation**
|
||
|
||
Introduce a marker interface (e.g. `IManagementCommand`) implemented by every command record
|
||
and type the envelope payload as that interface. This documents the contract, lets analyzers
|
||
flag unhandled cases, and keeps `ManagementCommandRegistry`'s reflection scan precise.
|
||
|
||
**Resolution**
|
||
|
||
Deferred 2026-05-16. Finding verified as genuine: `ManagementEnvelope.Command` is typed
|
||
`object` and the recommended `IManagementCommand` marker-interface fix is sound. However, the
|
||
fix cannot be implemented within the `ManagementService` module: both `ManagementEnvelope` and
|
||
all ~50 `*Command` records live in `src/ScadaLink.Commons/Messages/Management/` (17 files),
|
||
which is outside this work item's permitted edit scope (`src/ScadaLink.ManagementService/**`,
|
||
its tests, and this findings file only). Adding the marker interface, retyping the envelope,
|
||
and having `ManagementCommandRegistry` constrain its reflection scan to `IManagementCommand`
|
||
implementers is a cohesive Commons-module change and must be done there — also so the Commons
|
||
message-contract additive-only evolution rules are respected. Deferred to a Commons-module
|
||
work item; no `ManagementService`-local change is appropriate.
|
||
|
||
### ManagementService-013 — No tests for site-scope enforcement, the HTTP endpoint, or DebugStreamHub
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Testing coverage |
|
||
| Status | Resolved |
|
||
| Location | `tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1` |
|
||
|
||
**Description**
|
||
|
||
`ManagementActorTests` covers role-based authorization, success/error mapping, and correlation
|
||
IDs thoroughly, but several critical paths are untested: (a) site-scope enforcement —
|
||
`EnforceSiteScope`/`EnforceSiteScopeForInstance` and `SiteScopeViolationException` -> `Unauthorized`
|
||
mapping have no test, which is why the gaps in findings 001/002 went unnoticed; (b)
|
||
`ManagementEndpoints` — Basic Auth decoding, malformed-header handling, LDAP/role resolution,
|
||
command deserialization, and HTTP status mapping have zero coverage; (c) `DebugStreamHub`
|
||
authentication, subscribe/unsubscribe lifecycle, and `ManagementCommandRegistry.Resolve` are
|
||
untested. The `Envelope` test helper always passes `Array.Empty<string>()` for permitted
|
||
sites, so no test ever exercises a site-scoped user.
|
||
|
||
**Recommendation**
|
||
|
||
Add tests that exercise a site-scoped Deployment user against in-scope and out-of-scope
|
||
targets for instance and site operations, asserting `ManagementUnauthorized` on violations.
|
||
Add `WebApplicationFactory`-based tests for `ManagementEndpoints` covering auth failures,
|
||
malformed bodies, unknown commands, and the 200/400/403/401/504 mappings.
|
||
|
||
**Resolution**
|
||
|
||
Resolved 2026-05-16 (commit pending). The site-scope and `DebugStreamHub` coverage gaps
|
||
were closed by the resolution of findings 001/002/003 (the `ScopedEnvelope` helper plus the
|
||
`*_OutOfScopeForSiteScopedUser_ReturnsUnauthorized` tests and `DebugStreamHubTests`). The
|
||
remaining HTTP-endpoint gap is now covered by a new `ManagementEndpointsTests.cs` exercising
|
||
`ManagementEndpoints.ParseCommand` — command deserialization, malformed JSON, missing
|
||
`command` field, and unknown commands. Full `WebApplicationFactory` auth-flow tests were
|
||
deliberately not added: `ManagementEndpoints` depends on `LdapAuthService` and live LDAP
|
||
infrastructure, so the testable command-parsing/dispatch logic was extracted into the pure
|
||
`ParseCommand` helper and covered instead. Tests: `ParseCommand_*` (5),
|
||
`SerializeResult_*` (2), `UnknownCommandType_FaultMappedToManagementError`, plus the
|
||
pre-existing site-scope and DebugStreamHub suites. `dotnet test` -> 48 passed.
|
||
|
||
### ManagementService-014 — HandleQueryDeployments bypasses site-scope enforcement
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | High |
|
||
| Category | Security |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:306`, `:1174` |
|
||
|
||
**Description**
|
||
|
||
`QueryDeploymentsCommand` is gated to the `Deployment` role by `GetRequiredRole`
|
||
(`ManagementActor.cs:170`–`:177`), and the design document's Authorization section states
|
||
"Site scoping is enforced for site-scoped Deployment users" and explicitly lists deployments
|
||
among the Deployment-role operations. `HandleQueryDeployments` makes no call to
|
||
`EnforceSiteScope` / `EnforceSiteScopeForInstance` / `EnforceSiteScopeForIdentifier`: with no
|
||
`InstanceId` it returns `IDeploymentManagerRepository.GetAllDeploymentRecordsAsync()` (every
|
||
deployment record across all sites), and with an `InstanceId` it returns that instance's
|
||
deployment history with no check that the instance's site is within the caller's permitted
|
||
set. A site-scoped Deployment user scoped to site A can therefore enumerate deployment
|
||
records for instances at site B — instance IDs, `DeployedBy` (operator usernames),
|
||
timestamps, deployment status, and `ErrorMessage` content — by issuing `QueryDeployments`
|
||
with or without an out-of-scope `InstanceId`. This is the same authorization-bypass class as
|
||
the resolved findings 001/002, on a handler that sweep did not cover; it is `DispatchCommand`'s
|
||
only `Deployment`-role handler with no scope enforcement.
|
||
|
||
**Recommendation**
|
||
|
||
Thread `AuthenticatedUser` into `HandleQueryDeployments` (the dispatch case at line 306
|
||
already has `user` in scope). When `cmd.InstanceId` is supplied, call
|
||
`EnforceSiteScopeForInstance` before querying. When it is not supplied, filter the returned
|
||
`DeploymentRecord` list to the caller's permitted sites — resolve each record's instance to
|
||
its `SiteId` (or join through a site-aware repository query) and drop records for sites
|
||
outside `PermittedSiteIds`, mirroring the `HandleListInstances` / `HandleListSites` filter
|
||
pattern. Add a regression test for a site-scoped user against in-scope and out-of-scope
|
||
instances.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### ManagementService-015 — HandleSetInstanceOverrides applies overrides non-atomically
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Medium |
|
||
| Category | Error handling & resilience |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:647`–`:659` |
|
||
|
||
**Description**
|
||
|
||
`HandleSetInstanceOverrides` iterates `cmd.Overrides` and calls
|
||
`InstanceService.SetAttributeOverrideAsync` once per attribute, throwing
|
||
`InvalidOperationException` on the first `result.IsSuccess == false`. Each
|
||
`SetAttributeOverrideAsync` call persists independently, so if the command supplies five
|
||
overrides and the third fails (e.g. an unknown attribute name or a validation error), the
|
||
first two overrides are already committed to the configuration database while the caller
|
||
receives a `ManagementError`. The instance is left partially mutated in a state the operator
|
||
neither sees nor requested, and the per-instance operation lock referenced in the design's
|
||
deployment decisions does not protect against this because the partial writes are committed
|
||
before the throw. A retry of the same command then re-applies the already-applied overrides.
|
||
|
||
**Recommendation**
|
||
|
||
Make the multi-override mutation all-or-nothing: either validate every requested override up
|
||
front before applying any, or apply all overrides within a single transaction / unit-of-work
|
||
so a mid-batch failure rolls back the earlier writes. If `InstanceService` cannot offer a
|
||
batch method, at minimum document the partial-application behaviour on `SetInstanceOverridesCommand`
|
||
and have the handler report which overrides were applied before the failure so the caller
|
||
can reconcile.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### ManagementService-016 — Unexpected exception messages returned verbatim to HTTP callers
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Security |
|
||
| Status | Open |
|
||
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:121`–`:131` |
|
||
|
||
**Description**
|
||
|
||
`MapFault` maps any non-`SiteScopeViolationException` fault to
|
||
`new ManagementError(correlationId, cause.Message, "COMMAND_FAILED")`, and
|
||
`ManagementEndpoints.HandleRequest` returns that `Error` string directly in the HTTP 400
|
||
body. For handler-thrown `InvalidOperationException`s carrying a curated `result.Error`
|
||
message this is intended and safe. But the same path also surfaces the raw `.Message` of
|
||
unanticipated exceptions — a `SqlException` (which can include server/database names and
|
||
constraint details), a `DbUpdateException`, an `ArgumentException` from `Enum.Parse` on a
|
||
malformed `DataType`/`TriggerType` value, or a `NullReferenceException` — straight to the
|
||
external CLI/HTTP client. This is a minor internal-detail disclosure surface: the exception
|
||
text is already logged server-side with full context, so the client does not need the raw
|
||
message.
|
||
|
||
**Recommendation**
|
||
|
||
Distinguish handler-curated failures from unexpected faults. Have handlers throw a dedicated
|
||
exception type (e.g. `ManagementCommandException`) for messages that are safe to surface, and
|
||
in `MapFault` return that message for the known type while returning a generic
|
||
"An internal error occurred (CorrelationId=...)" string for everything else — the operator
|
||
can still correlate to the server log via the correlation ID.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|
||
|
||
### ManagementService-017 — QueryDeploymentsCommand has no test coverage
|
||
|
||
| | |
|
||
|--|--|
|
||
| Severity | Low |
|
||
| Category | Testing coverage |
|
||
| Status | Open |
|
||
| Location | `tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1` |
|
||
|
||
**Description**
|
||
|
||
`QueryDeploymentsCommand` / `HandleQueryDeployments` is exercised by no test in
|
||
`ManagementActorTests`. There is no test that it requires the `Deployment` role, no test of
|
||
the `InstanceId`-filtered versus unfiltered branches, and — because the handler performs no
|
||
site-scope enforcement at all — no test that would have caught finding 014. The deployment
|
||
query is one of the operations the design's Authorization section calls out for site
|
||
scoping, yet it is the only `Deployment`-role command with neither an authorization test nor
|
||
a site-scope test.
|
||
|
||
**Recommendation**
|
||
|
||
Add tests for `QueryDeploymentsCommand`: a role test (Design/no-role caller →
|
||
`ManagementUnauthorized`), branch coverage for the `InstanceId`-filtered and unfiltered
|
||
repository calls, and — once finding 014 is fixed — site-scope tests for a site-scoped
|
||
Deployment user against in-scope and out-of-scope deployment records.
|
||
|
||
**Resolution**
|
||
|
||
_Unresolved._
|