scadalink-design/code-reviews/ManagementService/findings.md
Joseph Doherty 977d7369a7 docs: add code review process and baseline review of all 19 modules
Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
2026-05-16 18:09:09 -04:00


# Code Review — ManagementService
| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.ManagementService` |
| Design doc | `docs/requirements/Component-ManagementService.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 13 |
## Summary
The ManagementService module is a thin command-dispatch layer: a single `ManagementActor`
fronts every administrative operation, an HTTP `POST /management` endpoint authenticates and
forwards to it, and a SignalR `DebugStreamHub` provides real-time debug streaming. The code
is consistently structured and the role-based authorization gate (`GetRequiredRole`) is
broadly correct and well tested. However, the review surfaced a significant **security
theme**: site-scope enforcement, which the design document requires for instance- and
site-targeted Deployment operations, is applied inconsistently — several query handlers and
all remote-query/debug handlers perform no site-scope check at all, allowing a site-scoped
Deployment user to read or act on sites outside their scope. A second theme is **Akka.NET
convention drift**: the actor offloads all work to `Task.Run` instead of using `PipeTo`,
declares no supervision strategy, and the contract messages carry a loosely-typed `object`
payload. There are also resource-management defects in the HTTP endpoint (`JsonDocument`
instances never disposed) and dead/unused configuration. None of the findings are
crash-class, but the site-scope gaps are High severity because they are a real
authorization bypass with no workaround.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | + | `HandleResolveRoles` builds `RoleMapper` by hand; `ResolveRolesCommand` is a stale dispatch path. See 008, 011. |
| 2 | Akka.NET conventions | + | `Task.Run` instead of `PipeTo`, no supervision strategy, `object`-typed message payload. See 004, 005, 012. |
| 3 | Concurrency & thread safety | + | Actor is stateless so `Task.Run` does not corrupt state, but it defeats actor-thread serialization (004). `Sender` correctly captured to a local before the closure. |
| 4 | Error handling & resilience | + | Exceptions are caught and mapped uniformly; `SiteScopeViolationException` mapped to `Unauthorized`. Audit-logging consistency issue noted in 009. |
| 5 | Security | + | Site-scope enforcement missing on query/remote/debug paths. See 001, 002, 003. |
| 6 | Performance & resource management | + | `JsonDocument` instances never disposed in the HTTP endpoint. See 006. |
| 7 | Design-document adherence | + | Design doc states remote queries enforce site scoping; code does not. `ManagementServiceOptions` reserved-for-future config is unused. See 001, 010. |
| 8 | Code organization & conventions | + | Mixed serializers (Newtonsoft in actor, System.Text.Json in endpoint); inconsistent audit logging across mutations. See 007, 009. |
| 9 | Testing coverage | + | Authorization is well covered; site-scope enforcement, the HTTP endpoint, `DebugStreamHub`, and remote-query handlers have no tests. See 013. |
| 10 | Documentation & comments | + | XML docs are accurate where present; `ManagementServiceOptions` and `ResolveRolesCommand` paths are undocumented dead code (010, 011). |
## Findings
### ManagementService-001 — Remote-query and debug-snapshot handlers bypass site-scope enforcement
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:1465`, `:1481`, `:1493`, `:641`, `:649` |
**Description**
The design document (`Component-ManagementService.md`, Authorization section) states that
"Site scoping is enforced for site-scoped Deployment users" and lists
"debug snapshot, parked message queries, site event log queries" among the Deployment-role
operations. `HandleQueryEventLogs`, `HandleQueryParkedMessages`, `HandleDebugSnapshot`,
`HandleRetryParkedMessage`, and `HandleDiscardParkedMessage` make no call to `EnforceSiteScope`
or `EnforceSiteScopeForInstance`. A Deployment user scoped to site A can therefore query event
logs / parked messages of site B, retry or discard another site's parked messages, and pull a
debug snapshot of any instance simply by supplying a different `SiteIdentifier` or `InstanceId`.
This is an authorization bypass with no workaround.
**Recommendation**
In each of these handlers resolve the target site and call site-scope enforcement before
delegating to `CommunicationService`. For the `SiteIdentifier`-keyed handlers, look up the
`Site` by identifier and enforce against `Site.Id`; for `DebugSnapshotCommand` the instance
is already loaded — call `EnforceSiteScope(user, instance.SiteId)` (which requires threading
`AuthenticatedUser` into these handlers, currently dropped).
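A minimal sketch of the resolve-then-enforce order recommended above. The shapes of `AuthenticatedUser` and `SiteScopeViolationException`, the `SiteScope` helper, and the two delegate parameters are hypothetical stand-ins, since the real definitions are not shown in this review. It also assumes that an empty `PermittedSiteIds` means the user is not site-scoped.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins; the real types live elsewhere in the codebase.
public sealed record AuthenticatedUser(string Username, IReadOnlyCollection<string> PermittedSiteIds);

public sealed class SiteScopeViolationException : Exception
{
    public SiteScopeViolationException(string message) : base(message) { }
}

public static class SiteScope
{
    // Assumption: an empty permitted-site set means "not site-scoped" (all sites allowed).
    public static void Enforce(AuthenticatedUser user, string siteId)
    {
        if (user.PermittedSiteIds.Count == 0) return;
        if (!user.PermittedSiteIds.Contains(siteId))
            throw new SiteScopeViolationException(
                $"User '{user.Username}' is not permitted to access site '{siteId}'.");
    }
}

public static class ParkedMessageHandlers
{
    // Hypothetical handler shape: resolve the site, enforce scope, then delegate.
    public static async Task<string> HandleQueryParkedMessages(
        AuthenticatedUser user,
        string siteIdentifier,
        Func<string, Task<string?>> resolveSiteId,   // stand-in for the Site lookup
        Func<string, Task<string>> queryRemote)      // stand-in for CommunicationService
    {
        var siteId = await resolveSiteId(siteIdentifier)
            ?? throw new InvalidOperationException($"Unknown site '{siteIdentifier}'.");

        SiteScope.Enforce(user, siteId);             // throws before any remote call is made
        return await queryRemote(siteIdentifier);
    }
}
```

The point of the ordering is that the scope check runs before the remote call, so an out-of-scope request never reaches the target site.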
**Resolution**
_Unresolved._
### ManagementService-002 — Single-entity query handlers leak data across site scope
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:510`, `:673`, `:733`, `:774`, `:631`, `:624` |
**Description**
`HandleListInstances` and `HandleListSites` correctly filter their results by the user's
`PermittedSiteIds`, but the single-entity query handlers do not. `HandleGetInstance`,
`HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection` fetch by ID with no
site-scope check, so a site-scoped Deployment user can read any instance, site, area tree,
or data connection by ID even though that site is excluded from their scope. The list
endpoints having a filter while the get-by-id endpoints do not is an inconsistency that
undermines the scoping model. (`HandleGetDeploymentDiff` and `HandleListInstanceAlarmOverrides`
do enforce scope, confirming the omission elsewhere is unintentional.)
**Recommendation**
Apply `EnforceSiteScopeForInstance` in `HandleGetInstance`, and `EnforceSiteScope` against
the resolved site ID in `HandleGetSite`, `HandleListAreas`, and `HandleGetDataConnection`
(for data connections, scope by the connection's `SiteId`).
**Resolution**
_Unresolved._
### ManagementService-003 — DebugStreamHub.SubscribeInstance performs no per-instance authorization
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/DebugStreamHub.cs:104` |
**Description**
`OnConnectedAsync` authenticates the WebSocket connection and verifies the caller holds the
`Deployment` role, but `SubscribeInstance(int instanceId)` accepts any instance ID and starts
a stream without checking that the authenticated user is scoped to that instance's site. A
site-scoped Deployment user can therefore subscribe to the live debug stream (attribute
values, alarm states) of an instance belonging to a site outside their scope. This is the
streaming equivalent of findings 001 and 002.
**Recommendation**
Resolve the instance's site inside `SubscribeInstance` and reject the subscription if the
authenticated user's permitted-site set does not include it. The authenticated identity
established in `OnConnectedAsync` must be persisted on the connection (e.g. in
`Context.Items`) so it is available to `SubscribeInstance`.
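One way this could look, as a sketch only. `IDebugStreamAuthorizer`, `DebugStreamScope`, the `ScopedDebugStreamHub` class name, and the `"PermittedSiteIds"` key are hypothetical; `Hub`, `Context.Items`, and `HubException` are the standard ASP.NET Core SignalR members. As elsewhere, an empty permitted-site set is assumed to mean the user is not site-scoped.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// Hypothetical abstraction over authentication and the instance-to-site lookup.
public interface IDebugStreamAuthorizer
{
    Task<IReadOnlyCollection<string>> GetPermittedSiteIdsAsync(HubCallerContext context);
    Task<string?> GetSiteIdForInstanceAsync(int instanceId);
}

public static class DebugStreamScope
{
    // Assumption: an empty set means the user is not site-scoped.
    // An unknown instance (null site) is always rejected.
    public static bool IsPermitted(IReadOnlyCollection<string> permittedSiteIds, string? siteId) =>
        siteId is not null && (permittedSiteIds.Count == 0 || permittedSiteIds.Contains(siteId));
}

public sealed class ScopedDebugStreamHub : Hub
{
    private const string PermittedSitesKey = "PermittedSiteIds"; // hypothetical key name
    private readonly IDebugStreamAuthorizer _authorizer;

    public ScopedDebugStreamHub(IDebugStreamAuthorizer authorizer) => _authorizer = authorizer;

    public override async Task OnConnectedAsync()
    {
        // Persist the resolved scope on the connection for later hub calls.
        Context.Items[PermittedSitesKey] = await _authorizer.GetPermittedSiteIdsAsync(Context);
        await base.OnConnectedAsync();
    }

    public async Task SubscribeInstance(int instanceId)
    {
        if (!Context.Items.TryGetValue(PermittedSitesKey, out var value)
            || value is not IReadOnlyCollection<string> permitted)
            throw new HubException("Connection is not authenticated.");

        var siteId = await _authorizer.GetSiteIdForInstanceAsync(instanceId);
        if (!DebugStreamScope.IsPermitted(permitted, siteId))
            throw new HubException("Instance is outside the caller's site scope.");

        // Start the stream here (omitted).
    }
}
```

Keeping the decision in a small static helper makes it testable without standing up a SignalR connection.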
**Resolution**
_Unresolved._
### ManagementService-004 — Actor offloads work to Task.Run instead of using PipeTo
| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:61` |
**Description**
`HandleEnvelope` runs every command on a thread-pool thread via `Task.Run(async () => ...)`
and replies from inside the continuation. This is the anti-pattern the project's Akka.NET
conventions warn against — the canonical approach is to start the async work and `PipeTo`
its result back to `Self`/`Sender`. Although `Sender` is correctly copied to a local before
the closure, the current code: (a) lets multiple commands execute fully concurrently with no
actor-thread serialization, so the actor provides no ordering or back-pressure guarantees
and is an actor in name only; (b) cannot be paused, supervised, or made to honour a mailbox
bound; (c) is shielded from synchronous faults only because every path is inside the
try/catch — any future code path that throws synchronously before the `Task.Run` body would
escape it.
**Recommendation**
Replace `Task.Run` with a method that returns the `Task` and `PipeTo` the mapped result
(`ManagementSuccess`/`ManagementError`/`ManagementUnauthorized`) back to the captured sender,
mapping faults in the `PipeTo` failure continuation. If genuine parallelism is desired, make
that explicit with a router/dispatcher rather than ad-hoc `Task.Run`.
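The shape of a `PipeTo`-based handler might look like the following sketch. The message records here are simplified, hypothetical versions of the real contracts, and `ProcessAsync` stands in for the existing dispatch logic.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

// Simplified, hypothetical message shapes for illustration only.
public sealed record ManagementEnvelope(object Command, string CorrelationId);
public sealed record ManagementSuccess(string CorrelationId, string Json);
public sealed record ManagementError(string CorrelationId, string Message);

public sealed class PipeToManagementActor : ReceiveActor
{
    public PipeToManagementActor()
    {
        Receive<ManagementEnvelope>(envelope =>
        {
            // Capture Sender before the async boundary; it is only valid on the actor thread.
            var replyTo = Sender;

            ProcessAsync(envelope).PipeTo(
                replyTo,
                Self,
                success: json => new ManagementSuccess(envelope.CorrelationId, json),
                failure: ex => new ManagementError(
                    envelope.CorrelationId, ex.GetBaseException().Message));
        });
    }

    // Stand-in for the real command dispatch. Because this is an async method,
    // exceptions thrown inside it fault the Task and reach the failure mapper.
    private static async Task<string> ProcessAsync(ManagementEnvelope envelope)
    {
        await Task.Yield();
        return "{}";
    }
}
```

Note that piping directly to the sender removes the ad-hoc `Task.Run` and centralises fault mapping, but the tasks themselves still run concurrently. If ordering or back-pressure matters, the result would need to be piped to `Self` and processed there, or the work bounded by a router as the recommendation suggests.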
**Resolution**
_Unresolved._
### ManagementService-005 — ManagementActor declares no supervision strategy
| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:33` |
**Description**
The project conventions call for explicit supervision strategies (Resume for coordinator
actors). `ManagementActor` is a long-lived coordinator-style actor but overrides no
`SupervisorStrategy` and defines no `PreRestart`/`PostRestart` behaviour. In practice it
spawns no children so the default strategy is rarely exercised, but an explicit strategy
should still be declared for clarity and to match the documented convention; it also matters
if children are added later (e.g. if finding 004 introduces worker actors).
**Recommendation**
Add an explicit `protected override SupervisorStrategy SupervisorStrategy()` returning a
Resume-based strategy, consistent with other central coordinator actors.
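A minimal sketch of such an override, assuming a blanket Resume decision is what the project convention intends (the convention text itself is not quoted in this review). The class name is hypothetical.

```csharp
using System;
using Akka.Actor;

public sealed class SupervisedManagementActor : ReceiveActor
{
    public SupervisedManagementActor()
    {
        ReceiveAny(_ => { /* command handling omitted */ });
    }

    // Applies to children of this actor: a failing child keeps its state
    // and continues with its next message instead of being restarted.
    protected override SupervisorStrategy SupervisorStrategy() =>
        new OneForOneStrategy(ex => Directive.Resume);
}
```

A narrower decider that resumes only on known-recoverable exception types and escalates the rest may be preferable once real children exist.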
**Resolution**
_Unresolved._
### ManagementService-006 — JsonDocument instances never disposed in the HTTP endpoint
| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementEndpoints.cs:83`, `:112` |
**Description**
`JsonDocument` is `IDisposable`: it rents its backing buffers from a shared `ArrayPool<byte>`,
and they are returned only on `Dispose`. `HandleRequest` parses the request body into `doc` at
line 83 and never disposes it, and line 112 (`JsonDocument.Parse("{}")`) allocates a second
document inline that is also never disposed. Every management HTTP call therefore strands
rented buffers that are never returned to the pool, increasing GC pressure and pool churn
under load.
**Recommendation**
Wrap the parsed document in `using var doc = ...`. For the empty-payload fallback, avoid
allocating a `JsonDocument` entirely — deserialize from the literal string `"{}"`/an empty
object, or restructure so the fallback path does not parse a throwaway document.
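The following sketch shows both halves of the recommendation: a scoped `using` for the parsed body, and a fallback that constructs the empty command directly instead of parsing `"{}"`. The `payload` property name, `PayloadReader`, and `GetSiteCommand` are hypothetical, and the generic signature is a simplification (the real endpoint presumably resolves the command type at runtime).

```csharp
using System.Text.Json;

// Hypothetical command type for illustration.
public sealed class GetSiteCommand
{
    public int SiteId { get; set; }
}

public static class PayloadReader
{
    public static T ReadPayload<T>(string body) where T : class, new()
    {
        // Disposed at the end of the method, returning the rented buffers to the pool.
        using var doc = JsonDocument.Parse(body);

        // Deserialize while the document is still alive; a JsonElement must not
        // be used after its parent JsonDocument has been disposed.
        if (doc.RootElement.ValueKind == JsonValueKind.Object
            && doc.RootElement.TryGetProperty("payload", out var payload))
        {
            return payload.Deserialize<T>() ?? new T();
        }

        // Empty-payload fallback: no throwaway JsonDocument is allocated.
        return new T();
    }
}
```

If a `JsonElement` ever has to outlive the document, `JsonElement.Clone()` produces a copy that is safe to keep.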
**Resolution**
_Unresolved._
### ManagementService-007 — Inconsistent and cycle-prone serialization of repository entities
| | |
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:67`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:113` |
**Description**
The actor serializes every command result with `Newtonsoft.Json` (`JsonConvert.SerializeObject`)
while the HTTP endpoint deserializes payloads with `System.Text.Json`. Beyond the
inconsistency, `JsonConvert.SerializeObject` is applied directly to EF-backed entities
returned by repositories (e.g. `Site`, `DataConnection`, `NotificationList` with a
`Recipients` collection, `Template` with children). With default Newtonsoft settings any
bidirectional navigation property produces a `JsonSerializationException` for self-referencing
loops, and even without cycles this serializes lazy/navigation state the CLI does not expect.
**Recommendation**
Standardise on one serializer (the rest of the HTTP path uses `System.Text.Json`). Serialize
explicit DTOs / projections rather than EF entities, or configure
`ReferenceLoopHandling.Ignore` and ignore navigation properties. Verify that handlers
returning rich entity graphs (`HandleGetTemplate`, `HandleUpdateNotificationList`) round-trip
correctly.
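As an illustration of the DTO option, the sketch below projects an entity with a bidirectional navigation into flat records before serializing with `System.Text.Json`. The entity shapes are hypothetical; the real `Site` and `DataConnection` classes may differ.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical entity shapes with a Site <-> DataConnection cycle.
public sealed class Site
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<DataConnection> DataConnections { get; } = new();
}

public sealed class DataConnection
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public Site? Site { get; set; } // back-reference that creates the cycle
}

// Explicit wire contracts: only the fields a client is meant to see.
public sealed record DataConnectionDto(int Id, string Name);
public sealed record SiteDto(int Id, string Name, IReadOnlyList<DataConnectionDto> DataConnections);

public static class SiteMapping
{
    public static SiteDto ToDto(this Site site) =>
        new(site.Id,
            site.Name,
            site.DataConnections.Select(c => new DataConnectionDto(c.Id, c.Name)).ToList());

    // The DTO graph is a tree, so no reference-loop handling is needed.
    public static string ToJson(this Site site) => JsonSerializer.Serialize(site.ToDto());
}
```

Projecting to DTOs also pins the wire format, so adding a navigation property to an entity later cannot silently change what the CLI receives.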
**Resolution**
_Unresolved._
### ManagementService-008 — HandleResolveRoles constructs RoleMapper manually instead of via DI
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:285` |
**Description**
Every other handler resolves its collaborators from the scoped `IServiceProvider`.
`HandleResolveRoles` instead does `new RoleMapper(sp.GetRequiredService<ISecurityRepository>())`,
bypassing DI. If `RoleMapper` ever gains a dependency, caching, or options, this hand-built
instance silently diverges from the DI-registered one. It is also inconsistent with
`ManagementEndpoints`, which resolves `RoleMapper` from DI.
**Recommendation**
Resolve `RoleMapper` via `sp.GetRequiredService<RoleMapper>()` like every other dependency.
**Resolution**
_Unresolved._
### ManagementService-009 — Audit logging applied inconsistently across mutating handlers
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:357`, `:1134`, `:1085`, `:526`, `:1275` |
**Description**
The design doc states "All mutating operations are audit logged." Some handlers call
`AuditAsync` explicitly (`HandleCreateInstance`, `HandleCreateSite`, all repository-direct
external-system/notification/security/area mutations), but the handlers that delegate to a
domain service do **not**: `HandleCreateTemplate`/`HandleUpdateTemplate`/`HandleDeleteTemplate`,
all template-member handlers (`HandleAddAttribute` ... `HandleDeleteComposition`), template-folder
handlers, shared-script handlers, `HandleDeployArtifacts`, `HandleDeployInstance`,
`HandleEnableInstance`/`Disable`/`Delete`, and the instance-binding/override handlers. This is
correct only if every one of those services performs its own audit logging internally; the
mixed pattern makes that impossible to verify by reading this module and creates a real risk
of silent audit gaps for template authoring and deployment operations.
**Recommendation**
Decide on one layer that owns auditing. Either route all mutations through services that audit
internally (and remove the explicit `AuditAsync` calls here), or audit uniformly in the actor
after every successful mutation. Document the chosen contract so the inconsistency cannot
recur, and confirm template/deployment services actually audit.
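If the actor-level option is chosen, a small wrapper could make the contract hard to bypass. This is a sketch under that assumption; `AuditedMutation` is a hypothetical helper, and the calls shown in the trailing comment use made-up service and `AuditAsync` signatures.

```csharp
using System;
using System.Threading.Tasks;

public static class AuditedMutation
{
    // Runs a mutation, then records one audit entry for its result.
    // If the mutation throws, no audit entry is written and the exception propagates.
    public static async Task<T> RunAsync<T>(Func<Task<T>> mutate, Func<T, Task> audit)
    {
        var result = await mutate();
        await audit(result);
        return result;
    }
}

// Hypothetical usage inside a handler:
//
// var template = await AuditedMutation.RunAsync(
//     () => templateService.CreateAsync(cmd),
//     t => AuditAsync(user, "CreateTemplate", t.Id));
```

Routing every mutating handler through one entry point like this would let a reviewer confirm audit coverage by searching for handlers that do not use it.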
**Resolution**
_Unresolved._
### ManagementService-010 — ManagementServiceOptions.CommandTimeout is defined but never used
| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementServiceOptions.cs:5`; `src/ScadaLink.ManagementService/ManagementEndpoints.cs:16` |
**Description**
`ManagementServiceOptions.CommandTimeout` is bound from configuration in
`ServiceCollectionExtensions`, but no code reads it. The HTTP endpoint instead hard-codes
`AskTimeout = TimeSpan.FromSeconds(30)`. The design doc describes the options section as
"Reserved for future configuration — e.g., command timeout overrides", yet a concrete
`CommandTimeout` property already exists and is silently ignored, so an operator who sets it
in `appsettings.json` gets no effect.
**Recommendation**
Either consume `ManagementServiceOptions.CommandTimeout` in `ManagementEndpoints.HandleRequest`
(inject `IOptions<ManagementServiceOptions>`), or remove the property until it is wired up so
configuration cannot be set with no effect.
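A sketch of the first option. The options class below is a simplified stand-in for the real one, and `AskTimeoutProvider` is a hypothetical helper; the 30-second fallback mirrors the value the review reports as hard-coded.

```csharp
using System;
using Microsoft.Extensions.Options;

// Simplified stand-in for the real options class.
public sealed class ManagementServiceOptions
{
    public TimeSpan CommandTimeout { get; set; } = TimeSpan.FromSeconds(30);
}

public sealed class AskTimeoutProvider
{
    private static readonly TimeSpan Fallback = TimeSpan.FromSeconds(30);

    public AskTimeoutProvider(IOptions<ManagementServiceOptions> options)
    {
        var configured = options.Value.CommandTimeout;

        // Guard against zero or negative values from misconfiguration.
        AskTimeout = configured > TimeSpan.Zero ? configured : Fallback;
    }

    // The value the endpoint would pass to Ask instead of a hard-coded constant.
    public TimeSpan AskTimeout { get; }
}
```

In a minimal-API endpoint, `IOptions<ManagementServiceOptions>` can be taken as a handler parameter once the options are registered, so a separate provider class is optional.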
**Resolution**
_Unresolved._
### ManagementService-011 — ResolveRolesCommand dispatch path is stale dead code
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ManagementService/ManagementActor.cs:273`, `:283` |
**Description**
The design doc states the HTTP endpoint "collapses the CLI's previous two-step flow
(ResolveRoles + actual command) into a single HTTP round-trip", and indeed `ManagementEndpoints`
performs LDAP auth and role resolution itself before dispatching. The `ResolveRolesCommand`
case in `DispatchCommand` is therefore unreachable from the HTTP path. It remains reachable
only via a raw ClusterClient sender, but a caller able to send `ResolveRolesCommand` could
enumerate role mappings for arbitrary LDAP groups with no role requirement
(`GetRequiredRole` returns null for it) — a minor information-disclosure surface for a path
the design says no longer exists.
**Recommendation**
If the two-step flow is genuinely retired, remove `ResolveRolesCommand`, its handler, and the
class. If it must remain for non-HTTP clients, document why and confirm that exposing
role-mapping data with no role requirement is intended.
**Resolution**
_Unresolved._
### ManagementService-012 — ManagementEnvelope carries a loosely-typed object payload
| | |
|--|--|
| Severity | Low |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.Commons/Messages/Management/ManagementEnvelope.cs:7`; `src/ScadaLink.ManagementService/ManagementActor.cs:132` |
**Description**
`ManagementEnvelope.Command` is typed `object`, so the actor relies on a large open-ended
`switch` with a `NotSupportedException` default for unknown types. While the individual
command records are immutable, `object` defeats compile-time exhaustiveness — adding a new
command record produces no compiler signal that `DispatchCommand` (and `GetRequiredRole`)
need updating, and a typo or unregistered command surfaces only as a runtime exception. The
message contract is also harder to evolve safely under the additive-only rule.
**Recommendation**
Introduce a marker interface (e.g. `IManagementCommand`) implemented by every command record
and type the envelope payload as that interface. This documents the contract, lets analyzers
flag unhandled cases, and keeps `ManagementCommandRegistry`'s reflection scan precise.
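A sketch of the marker-interface approach, paired with a reflection check that a unit test could run to catch commands missing from the dispatch table. The two command records and the `CommandCoverage` helper are hypothetical examples, and the envelope is reduced to two fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IManagementCommand { }

// Hypothetical command records for illustration.
public sealed record ListSitesCommand : IManagementCommand;
public sealed record GetInstanceCommand(int InstanceId) : IManagementCommand;

// The payload is now constrained at compile time.
public sealed record ManagementEnvelope(IManagementCommand Command, string CorrelationId);

public static class CommandCoverage
{
    // Returns every concrete IManagementCommand in this assembly
    // that is missing from the supplied set of handled types.
    public static IReadOnlyList<Type> FindUnhandled(IEnumerable<Type> handledTypes)
    {
        var handled = new HashSet<Type>(handledTypes);
        return typeof(IManagementCommand).Assembly.GetTypes()
            .Where(t => typeof(IManagementCommand).IsAssignableFrom(t)
                        && !t.IsInterface && !t.IsAbstract)
            .Where(t => !handled.Contains(t))
            .ToList();
    }
}
```

A marker interface alone does not give compiler-enforced exhaustiveness in C#, so a check like `FindUnhandled` in the test suite is one way to get a build-time signal when a new command is added without a handler or role mapping.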
**Resolution**
_Unresolved._
### ManagementService-013 — No tests for site-scope enforcement, the HTTP endpoint, or DebugStreamHub
| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.ManagementService.Tests/ManagementActorTests.cs:1` |
**Description**
`ManagementActorTests` covers role-based authorization, success/error mapping, and correlation
IDs thoroughly, but several critical paths are untested: (a) site-scope enforcement —
`EnforceSiteScope`/`EnforceSiteScopeForInstance` and `SiteScopeViolationException` -> `Unauthorized`
mapping have no test, which is why the gaps in findings 001/002 went unnoticed; (b)
`ManagementEndpoints` — Basic Auth decoding, malformed-header handling, LDAP/role resolution,
command deserialization, and HTTP status mapping have zero coverage; (c) `DebugStreamHub`
authentication, subscribe/unsubscribe lifecycle, and `ManagementCommandRegistry.Resolve` are
untested. The `Envelope` test helper always passes `Array.Empty<string>()` for permitted
sites, so no test ever exercises a site-scoped user.
**Recommendation**
Add tests that exercise a site-scoped Deployment user against in-scope and out-of-scope
targets for instance and site operations, asserting `ManagementUnauthorized` on violations.
Add `WebApplicationFactory`-based tests for `ManagementEndpoints` covering auth failures,
malformed bodies, unknown commands, and the 200/400/403/401/504 mappings.
**Resolution**
_Unresolved._